1. Difference between WHERE and HAVING in SQL

  • WHERE: Filters rows before grouping.
  • HAVING: Filters groups after grouping (used with aggregate functions).
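
A quick way to see the difference is to run both clauses on a toy table. Below is a minimal sketch using Python's built-in sqlite3; the orders table and its values are made up for illustration:

```python
import sqlite3

# In-memory database with a tiny (hypothetical) orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("East", 100), ("East", 250), ("West", 80), ("West", 40)],
)

# WHERE filters individual rows *before* grouping;
# HAVING filters the aggregated groups *after* grouping.
rows = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM orders
    WHERE amount > 50          -- drops the 40 row before grouping
    GROUP BY region
    HAVING SUM(amount) > 200   -- then keeps only high-total groups
""").fetchall()
print(rows)  # [('East', 350.0)]
```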

2. Basics of Logistic Regression

Logistic Regression predicts probabilities for a binary outcome using the sigmoid function. It models the log-odds (the logit) as a linear combination of the features and fits the coefficients by maximum likelihood estimation.
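
A minimal NumPy sketch of the idea, with made-up coefficients standing in for a fitted model:

```python
import numpy as np

def sigmoid(z):
    # Maps log-odds (any real number) to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical fitted coefficients: log-odds = b0 + b1 * x.
b0, b1 = -1.0, 0.8
x = 2.0
log_odds = b0 + b1 * x   # linear in the features
p = sigmoid(log_odds)    # predicted probability of the positive class
print(round(p, 3))       # 0.646
```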


3. How do you treat outliers?

  • Identify using methods like IQR or Z-scores.
  • Handle by:
    • Removing (if they are errors or noise).
    • Transforming (e.g., log transformation).
    • Capping values at a threshold.
    • Using robust models.
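
For example, the IQR rule from the first bullet, sketched in Python on made-up data:

```python
import numpy as np

data = np.array([10, 12, 11, 13, 12, 95, 11, 10])  # 95 looks suspicious

# IQR rule: flag points beyond 1.5 * IQR from the quartiles.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = data[(data < lower) | (data > upper)]
capped = np.clip(data, lower, upper)  # the "capping" option from above
print(outliers)  # [95]
```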

4. Explain Confusion Matrix

A confusion matrix summarizes prediction results:

  • True Positive (TP): Correct positive prediction.
  • False Positive (FP): Incorrect positive prediction.
  • True Negative (TN): Correct negative prediction.
  • False Negative (FN): Incorrect negative prediction.
Metrics derived from these four counts: Accuracy, Precision, Recall, F1-score.
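
A quick illustration with scikit-learn (the y_true and y_pred vectors are made up):

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# scikit-learn's convention: rows = actual, columns = predicted:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("recall:   ", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("f1:       ", f1_score(y_true, y_pred))         # harmonic mean of the two
```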

5. Explain PCA

Principal Component Analysis reduces dimensionality by transforming data into a new coordinate system:

  • Covariance Matrix: Measures variance and relationships between features.
  • Eigenvalues: Magnitudes of variance captured by principal components.
  • Eigenvectors: Directions of principal components.
  • Steps: Compute covariance matrix, find eigenvalues/eigenvectors, sort by eigenvalue, and transform data.
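
The steps above map directly onto a few lines of NumPy (random data used purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # 100 samples, 3 features

# Step 1: center the data and compute the covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# Step 2: eigen-decompose (eigh, since covariance matrices are symmetric).
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 3: sort components by descending eigenvalue (variance captured).
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: project onto the top-2 principal components.
X_reduced = Xc @ eigvecs[:, :2]
print(X_reduced.shape)  # (100, 2)
```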

6. Cut a Cake into 8 Equal Parts Using 3 Cuts

  1. First cut horizontally, slicing the cake into two layers.
  2. Make a vertical cut through the center, dividing it into four pieces.
  3. A perpendicular vertical cut divides it into eight equal pieces.

7. Explain k-means Clustering

  • Unsupervised algorithm to group data into k clusters based on similarity.
  • Steps:
    1. Initialize k centroids.
    2. Assign points to nearest centroid.
    3. Recalculate centroids.
    4. Repeat until convergence.
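
A minimal NumPy sketch of these four steps; it assumes every cluster keeps at least one point, which a real implementation (e.g., scikit-learn's KMeans) handles more robustly:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initialize k centroids from randomly chosen data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recalculate each centroid as the mean of its cluster
        # (assumes no cluster goes empty).
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: repeat until the centroids stop moving (convergence).
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```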

8. Difference Between KNN and k-means Clustering

  • KNN (K-Nearest Neighbors): Supervised; classifies a point by majority vote among its k nearest labeled neighbors.
  • k-means: Unsupervised, clusters data based on similarity.
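
The contrast shows up directly in the scikit-learn API: KNN needs labels to fit, k-means does not (the toy points below are made up):

```python
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

X = [[1, 2], [2, 1], [8, 9], [9, 8]]
y = [0, 0, 1, 1]  # labels exist only for the supervised method

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)  # supervised: needs y
print(knn.predict([[8, 8]]))                         # [1]

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # unsupervised: no y
print(km.labels_)
```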

9. Handle Imbalanced Dataset

  • Resampling techniques:
    • Oversampling (e.g., SMOTE).
    • Undersampling.
  • Use metrics like Precision, Recall, F1-score.
  • Algorithm adjustments: Use weighted loss functions.
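
As one example of an algorithm adjustment, scikit-learn estimators accept class_weight='balanced', which reweights the loss inversely to class frequency (the tiny dataset below is made up; SMOTE-style oversampling would come from a separate library such as imbalanced-learn):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Toy imbalanced data: positives are the minority class.
X = [[0], [1], [0], [0], [1], [0], [0], [0], [0], [1]]
y = [0, 1, 0, 0, 1, 0, 0, 0, 0, 1]

# Misclassifying a rare positive now costs more than a common negative.
clf = LogisticRegression(class_weight="balanced").fit(X, y)
print(classification_report(y, clf.predict(X)))
```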

10. Stock Market Prediction: Classification or Regression?

  • Classification: used when predicting a discrete outcome, e.g., whether bankruptcy will occur (Yes/No).
  • Regression: used when predicting a continuous value, e.g., the stock price itself.

11. Key Performance Indicators for a Product

  • Customer Satisfaction Score (CSAT).
  • Net Promoter Score (NPS).
  • Conversion Rate.
  • Retention Rate.
  • Revenue Growth.

12. Technique for Predicting Categorical Responses

  • Logistic Regression.
  • Decision Trees.
  • Naive Bayes.

13. What is Logistic Regression?

Logistic Regression predicts the probability of a categorical outcome using the sigmoid function.
Example: Predicting if a customer will churn or not.


14. Importance of Data Cleaning

  • Removes inconsistencies, duplicates, and errors.
  • Improves data quality for better model performance.
  • Reduces bias and ensures accuracy.
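
A typical cleaning pass in pandas might look like this (the DataFrame and the 0–120 age rule are illustrative assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    "age":  [25, 25, None, 40, 200],  # duplicate, missing, and implausible values
    "city": ["NY", "NY", "LA", "SF", "LA"],
})

df = df.drop_duplicates()                         # remove exact duplicate rows
df["age"] = df["age"].fillna(df["age"].median())  # impute missing values
df = df[df["age"].between(0, 120)]                # drop implausible ages
print(df)
```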

15. Normal Distribution

  • A symmetric, bell-shaped curve.
  • Mean = Median = Mode.
  • Defined by mean (μ) and standard deviation (σ).
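
Its probability density function is the standard formula:

    f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}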

16. Cross-Validation

  • Technique to assess model performance by splitting data into training and validation sets.
  • Popular method: K-Fold Cross-Validation.
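
For example, 5-fold cross-validation in scikit-learn (using the bundled iris dataset for convenience):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Train on 4 folds, validate on the held-out fold, rotate 5 times.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean(), scores.std())
```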

17. Variants of Back Propagation

  • Stochastic Gradient Descent (SGD).
  • Mini-batch Gradient Descent.
  • Adaptive methods (e.g., Adam, RMSProp).
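
A bare-bones mini-batch SGD loop on a least-squares problem shows the core update that the adaptive methods build on (synthetic data; Adam and RMSProp would additionally adapt the step size per parameter):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=256)

w, lr, batch = np.zeros(3), 0.1, 32
for epoch in range(50):
    idx = rng.permutation(len(X))           # reshuffle each epoch
    for start in range(0, len(X), batch):
        b = idx[start:start + batch]
        # Gradient of mean squared error, computed on the mini-batch only.
        grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)
        w -= lr * grad                      # the SGD update
print(w)  # close to [1.0, -2.0, 0.5]
```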

18. What is a Random Forest?

  • Ensemble learning method using multiple decision trees.
  • Aggregates results via majority voting (classification) or averaging (regression).
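
A minimal scikit-learn example (iris again, for convenience):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# 100 trees, each grown on a bootstrap sample with random feature subsets;
# the final class is the majority vote across trees.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(rf.predict(X[:3]))
```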

19. Collaborative Filtering

  • Recommender system technique.
  • Types:
    • User-based: Finds similar users.
    • Item-based: Finds similar items.
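
A sketch of the item-based variant using cosine similarity in NumPy; the ratings matrix is made up, and treating unrated items as zero is a simplification real systems avoid:

```python
import numpy as np

# Rows = users, columns = items; 0 means "not rated" (hypothetical ratings).
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# Item-based: cosine similarity between item (column) vectors.
norms = np.linalg.norm(R, axis=0)
sim = (R.T @ R) / np.outer(norms, norms)

# Predict user 0's rating for item 2 as a similarity-weighted average
# of the items that user has actually rated.
user = R[0]
rated = user > 0
pred = sim[2, rated] @ user[rated] / sim[2, rated].sum()
print(round(pred, 2))
```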

20. Interpolation and Extrapolation

  • Interpolation: Estimating within the range of known data points.
  • Extrapolation: Predicting outside the known range.
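
np.interp makes the distinction concrete: inside the known range it estimates between points, while outside it simply clamps to the endpoint value, a reminder that extrapolation requires an explicit model of the trend:

```python
import numpy as np

x_known = np.array([0.0, 1.0, 2.0, 3.0])
y_known = np.array([0.0, 2.0, 4.0, 6.0])  # underlying trend: y = 2x

# Interpolation: 1.5 lies inside [0, 3], so the estimate is well supported.
print(np.interp(1.5, x_known, y_known))   # 3.0

# Extrapolation: 5.0 lies outside the range; np.interp just clamps to 6.0,
# and any trend-based guess (here, 10.0) would carry extra risk.
print(np.interp(5.0, x_known, y_known))   # 6.0
```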

21. Power Analysis

  • Determines sample size needed for detecting an effect.
  • Factors: Effect size, significance level (α), and power (1 − β).
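
If statsmodels is available, its power module solves for the sample size directly (the medium effect size of 0.5 and the conventional 0.8 power target are example values):

```python
from statsmodels.stats.power import TTestIndPower

# Sample size per group for a two-sample t-test detecting Cohen's d = 0.5
# at significance level 0.05 with power 0.8.
n = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n))  # roughly 64 per group
```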

22. Difference Between Cluster and Systematic Sampling

  • Cluster Sampling: Randomly selects groups, then samples within them.
  • Systematic Sampling: Selects every n-th item in an ordered list.
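
A toy NumPy sketch of both schemes on a population of 100 items (the cluster layout and sample sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
population = np.arange(100)

# Systematic: every n-th item after a random start.
n = 10
start = rng.integers(n)
systematic = population[start::n]

# Cluster: split into 10 groups of 10, then sample 3 whole groups.
clusters = population.reshape(10, 10)
chosen = rng.choice(10, size=3, replace=False)
cluster_sample = clusters[chosen].ravel()

print(systematic)
print(cluster_sample)
```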

23. Are Expected Value and Mean Value Different?

  • Same for a probability distribution.
  • Mean refers to empirical data; Expected Value is theoretical.

24. Box-Cox Transformation for Normality

  • Applies a power transformation:
    y' = \frac{y^\lambda - 1}{\lambda} \quad (\lambda \neq 0), \qquad y' = \ln y \quad (\lambda = 0)
  • Stabilizes variance and normalizes data.
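
SciPy implements this directly, estimating the best λ by maximum likelihood (the exponential sample below is just a convenient right-skewed example):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
skewed = rng.exponential(scale=2.0, size=1000)  # strictly positive, right-skewed

# boxcox returns the transformed data and the fitted lambda.
transformed, lam = stats.boxcox(skewed)
print(lam)
print(stats.skew(skewed), "->", stats.skew(transformed))  # skew shrinks toward 0
```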

25. Eigenvalue and Eigenvector

  • Eigenvalue: Magnitude of a transformation.
  • Eigenvector: Direction unaffected by transformation.
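
The defining relation A v = λ v is easy to verify with NumPy:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, eigvecs = np.linalg.eig(A)
lam, v = eigvals[0], eigvecs[:, 0]

# A stretches v by the factor lam without changing its direction.
print(np.allclose(A @ v, lam * v))  # True
```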

26. Do Gradient Descent Methods Always Converge?

No, convergence depends on factors like:

  • Learning rate: too large can oscillate or diverge; too small converges slowly.
  • Loss surface: non-convex losses have local minima and saddle points where descent can stall.
  • Initialization: a poor starting point can slow or derail training.
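
A two-line experiment on f(x) = x², whose gradient is 2x, makes the learning-rate point concrete (for this function the update contracts only when 0 < lr < 1):

```python
# Gradient descent on f(x) = x^2: x <- x - lr * 2x = x * (1 - 2*lr).
def run(lr, steps=20, x=1.0):
    for _ in range(steps):
        x -= lr * 2 * x
    return x

print(run(lr=0.1))  # shrinks toward 0 (converges)
print(run(lr=1.1))  # grows without bound (diverges)
```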