1. Difference between WHERE and HAVING in SQL
- WHERE: Filters rows before grouping.
- HAVING: Filters groups after grouping (used with aggregate functions).
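The distinction can be seen end to end with Python's built-in `sqlite3` module; the table and values below are made up for illustration:

```python
import sqlite3

# In-memory demo table; the schema and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100), ("east", 250), ("west", 80), ("west", 40)])

# WHERE filters individual rows BEFORE grouping.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "WHERE amount > 50 GROUP BY region"
).fetchall()

# HAVING filters whole groups AFTER aggregation.
groups = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "GROUP BY region HAVING SUM(amount) > 150"
).fetchall()

print(dict(rows))    # WHERE dropped west's 40-amount row before summing
print(dict(groups))  # only the 'east' group (total 350) survives HAVING
```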
2. Basics of Logistic Regression
Logistic Regression predicts probabilities for a binary outcome using the sigmoid function. It models the log-odds of the outcome as a linear function of the features and fits the coefficients by maximum likelihood estimation.
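A minimal sketch of the prediction step, with a hypothetical intercept `b0` and weight `b1` standing in for fitted coefficients:

```python
import math

def sigmoid(z):
    """Map log-odds z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted coefficients: intercept b0 and one feature weight b1.
b0, b1 = -1.0, 2.0
x = 1.5
log_odds = b0 + b1 * x    # linear predictor = log(p / (1 - p))
p = sigmoid(log_odds)     # probability of the positive class
```

The sigmoid is what turns the unbounded linear predictor into a valid probability; maximum likelihood chooses `b0`, `b1` to make the observed labels most probable.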
3. How do you treat outliers?
- Identify using methods like IQR or Z-scores.
- Handle by:
- Removing (if they are errors or noise).
- Transforming (e.g., log transformation).
- Capping values at a threshold.
- Using robust models.
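The IQR rule plus capping can be sketched with the standard library alone (the data values are invented):

```python
import statistics

data = [10, 12, 11, 13, 12, 95, 11, 10]  # 95 is an obvious outlier

q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # Tukey fences

flagged = [x for x in data if x < low or x > high]   # identify
capped = [min(max(x, low), high) for x in data]      # winsorize to the fences
```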
4. Explain Confusion Matrix
A confusion matrix summarizes prediction results:
- True Positive (TP): Correct positive prediction.
- False Positive (FP): Incorrect positive prediction.
- True Negative (TN): Correct negative prediction.
- False Negative (FN): Incorrect negative prediction.
Metrics derived: Accuracy, Precision, Recall, F1-score.
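The derived metrics follow directly from the four counts; the counts below are hypothetical:

```python
# Hypothetical counts from a confusion matrix.
tp, fp, tn, fn = 40, 10, 35, 15

accuracy  = (tp + tn) / (tp + fp + tn + fn)
precision = tp / (tp + fp)  # of predicted positives, how many are right
recall    = tp / (tp + fn)  # of actual positives, how many are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
```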
5. Explain PCA
Principal Component Analysis reduces dimensionality by transforming data into a new coordinate system:
- Covariance Matrix: Measures variance and relationships between features.
- Eigenvalues: Magnitudes of variance captured by principal components.
- Eigenvectors: Directions of principal components.
- Steps: Compute covariance matrix, find eigenvalues/eigenvectors, sort by eigenvalue, and transform data.
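The steps above can be walked through by hand for two features, where the 2x2 eigenvalues come from the quadratic formula (the data points are invented):

```python
import math

# Tiny 2-feature dataset (hypothetical values).
xs = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3]
ys = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Step 1: covariance matrix of the centered data.
sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
syy = sum((y - my) ** 2 for y in ys) / (n - 1)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

# Step 2: eigenvalues of the 2x2 matrix via the quadratic formula.
tr, det = sxx + syy, sxx * syy - sxy ** 2
disc = math.sqrt(tr ** 2 - 4 * det)
lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2   # sorted: lam1 >= lam2

# Step 3: eigenvector for lam1 is the first principal direction.
vx, vy = sxy, lam1 - sxx
norm = math.hypot(vx, vy)
pc1 = (vx / norm, vy / norm)

# Step 4: project the centered points onto the first component.
scores = [(x - mx) * pc1[0] + (y - my) * pc1[1] for x, y in zip(xs, ys)]
```

A sanity check: the variance of the projected scores equals the top eigenvalue, i.e. the first component captures the largest share of variance.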
6. Cut a Cake into 8 Equal Parts Using 3 Cuts
- First cut horizontally, slicing the cake into two layers.
- Then make a vertical cut through the center, giving four pieces.
- A second vertical cut perpendicular to the first yields eight equal pieces.
7. Explain k-means Clustering
- Unsupervised algorithm to group data into clusters based on similarity.
- Steps:
- Initialize centroids.
- Assign points to nearest centroid.
- Recalculate centroids.
- Repeat until convergence.
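The steps above can be sketched in a few lines for one-dimensional data (the values and the initialization choice are hypothetical):

```python
# Minimal 1-D k-means sketch with k = 2 (pure Python, hypothetical data).
data = [1.0, 1.5, 1.2, 8.0, 8.3, 7.9]
centroids = [data[0], data[3]]           # naive initialization

for _ in range(10):                      # iteration cap as a safety net
    # Assignment step: each point goes to its nearest centroid.
    clusters = [[], []]
    for x in data:
        idx = min(range(2), key=lambda i: abs(x - centroids[i]))
        clusters[idx].append(x)
    # Update step: recompute each centroid as its cluster mean.
    new = [sum(c) / len(c) if c else centroids[i]
           for i, c in enumerate(clusters)]
    if new == centroids:                 # convergence: centroids stable
        break
    centroids = new
```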
8. Difference Between KNN and k-means Clustering
- KNN (K-Nearest Neighbors): Supervised, classifies based on nearest neighbors.
- k-means: Unsupervised, clusters data based on similarity.
9. Handle Imbalanced Dataset
- Resampling techniques:
- Oversampling (e.g., SMOTE).
- Undersampling.
- Use metrics like Precision, Recall, F1-score.
- Algorithm adjustments: Use weighted loss functions.
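One common way to derive the weights for a weighted loss is the inverse-frequency ("balanced") heuristic; the label counts below are made up:

```python
from collections import Counter

# Hypothetical imbalanced labels: 90 negatives, 10 positives.
labels = [0] * 90 + [1] * 10
counts = Counter(labels)
n, k = len(labels), len(counts)

# Inverse-frequency class weights: rare classes get a
# proportionally larger weight in the loss function.
weights = {cls: n / (k * cnt) for cls, cnt in counts.items()}
```

Here the minority class is weighted 9x more heavily than the majority class, so the model pays for ignoring it.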
10. Stock Market Prediction: Classification or Regression?
- Classification: if predicting a discrete outcome, e.g., whether the price will rise or fall, or whether bankruptcy will occur (Yes/No).
- Regression: if predicting a continuous value, such as the stock price itself.
11. Key Performance Indicators for a Product
- Customer Satisfaction Score (CSAT).
- Net Promoter Score (NPS).
- Conversion Rate.
- Retention Rate.
- Revenue Growth.
12. Technique for Predicting Categorical Responses
- Logistic Regression.
- Decision Trees.
- Naive Bayes.
13. What is Logistic Regression?
Logistic Regression predicts the probability of a categorical outcome using the sigmoid function.
Example: Predicting if a customer will churn or not.
14. Importance of Data Cleaning
- Removes inconsistencies, duplicates, and errors.
- Improves data quality for better model performance.
- Reduces bias and ensures accuracy.
15. Normal Distribution
- A symmetric, bell-shaped curve.
- Mean = Median = Mode.
- Defined by mean (μ) and standard deviation (σ).
16. Cross-Validation
- Technique to assess model performance by splitting data into training and validation sets.
- Popular method: K-Fold Cross-Validation.
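A minimal K-Fold index generator, sketched in pure Python (the dataset size and fold count are arbitrary):

```python
# Minimal K-Fold split sketch: each sample is in the validation
# set exactly once; the model trains on the remaining k-1 folds.
def kfold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        val_set = set(val)
        train = [i for i in range(n) if i not in val_set]
        yield train, val
        start += size

folds = list(kfold_indices(10, 5))  # 5 folds over 10 samples
```

In practice the data is shuffled before splitting; this sketch keeps contiguous folds for clarity.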
17. Variants of Back Propagation
- Stochastic Gradient Descent (SGD).
- Mini-batch Gradient Descent.
- Adaptive methods (e.g., Adam, RMSProp).
18. What is a Random Forest?
- Ensemble learning method using multiple decision trees.
- Aggregates results via majority voting (classification) or averaging (regression).
19. Collaborative Filtering
- Recommender system technique.
- Types:
- User-based: Finds similar users.
- Item-based: Finds similar items.
20. Interpolation and Extrapolation
- Interpolation: Estimating within the range of known data points.
- Extrapolation: Predicting outside the known range.
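Both can be illustrated with one least-squares line; the data below lies exactly on y = 2x, so the predictions are exact:

```python
def linear_fit_predict(xs, ys, x_new):
    """Fit a least-squares line through (xs, ys), evaluate at x_new."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my + slope * (x_new - mx)

xs, ys = [1, 2, 3, 4], [2, 4, 6, 8]        # lies exactly on y = 2x
inside  = linear_fit_predict(xs, ys, 2.5)  # interpolation: within [1, 4]
outside = linear_fit_predict(xs, ys, 10)   # extrapolation: beyond the data
```

Interpolation is generally safer: outside the observed range there is no evidence that the fitted relationship still holds.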
21. Power Analysis
- Determines sample size needed for detecting an effect.
- Factors: Effect size, significance level (α), and power (1 − β).
22. Difference Between Cluster and Systematic Sampling
- Cluster Sampling: Randomly selects groups, then samples within them.
- Systematic Sampling: Selects every k-th item in an ordered list.
23. Are Expected Value and Mean Value Different?
- They coincide for a given probability distribution.
- In practice, "mean" usually refers to the empirical average of observed data, while "expected value" is the theoretical average of a distribution.
24. Box-Cox Transformation for Normality
- Applies a power transformation: y(λ) = (y^λ − 1) / λ for λ ≠ 0, and ln(y) for λ = 0.
- Stabilizes variance and normalizes data.
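The transform itself is a one-liner; in practice λ is chosen by maximizing the log-likelihood, but here a fixed λ = 0.5 is used on invented right-skewed data:

```python
import math

def box_cox(y, lam):
    """Box-Cox power transform for positive y."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

# Hypothetical right-skewed data; lam = 0.5 (a square-root-like
# transform) compresses the long right tail.
data = [1, 4, 9, 100]
transformed = [box_cox(y, 0.5) for y in data]
```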
25. Eigenvalue and Eigenvector
- Eigenvalue: Magnitude of a transformation.
- Eigenvector: Direction unaffected by transformation.
26. Do Gradient Descent Methods Always Converge?
No, convergence depends on factors like:
- Learning rate.
- Non-convex loss functions.
- Proper initialization.