1. What are the differences between supervised and unsupervised learning?

  • Supervised Learning: Involves labelled data, where the model learns to map inputs to outputs. Examples include classification and regression.

  • Unsupervised Learning: Deals with unlabelled data, aiming to identify hidden structures or patterns. Examples include clustering and dimensionality reduction.

2. Why is dimensionality reduction significant?

Dimensionality reduction simplifies datasets by reducing the number of features while retaining essential information. It helps:

  • Improve model performance.
  • Reduce computation time.
  • Mitigate the curse of dimensionality.

3. What is the law of large numbers?

The law of large numbers states that as the sample size increases, the sample mean converges to the population mean (the expected value). It underpins the reliability of statistical estimates.
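
A minimal simulation sketch, assuming a fair six-sided die (population mean 3.5):

```python
# A minimal law-of-large-numbers simulation; the fair die is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)  # fair six-sided die, population mean 3.5

for n in (10, 100, 10_000, 100_000):
    print(n, rolls[:n].mean())            # the sample mean drifts toward 3.5 as n grows
```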


4. What are prior probability and likelihood?

  • Prior Probability: The probability of an event before observing new data.
  • Likelihood: The probability of observing the data given a specific hypothesis or parameter value.
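
The two combine through Bayes' theorem: posterior ∝ likelihood × prior. A minimal worked sketch, with made-up probabilities for a disease-screening test:

```python
# Bayes' theorem with illustrative, made-up numbers for a disease-screening test.
prior = 0.01           # P(disease) before seeing any test result
likelihood = 0.95      # P(positive test | disease)
false_positive = 0.05  # P(positive test | no disease)

evidence = likelihood * prior + false_positive * (1 - prior)  # P(positive test)
posterior = likelihood * prior / evidence                     # P(disease | positive test)
print(posterior)       # ≈ 0.16: a positive result raises the 1% prior to ~16%
```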

5. What is Back Propagation?

Backpropagation is an algorithm used to train neural networks. 

It computes the gradient of the loss function with respect to each weight by applying the chain rule backwards through the network, enabling the weights to be updated via gradient descent.
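
A minimal numpy sketch of the idea, using an assumed one-hidden-layer regression network with illustrative sizes, data, and learning rate:

```python
# Backpropagation for a tiny one-hidden-layer network; all sizes and data are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))                    # toy inputs
y = (X @ np.array([1.0, -2.0, 0.5]))[:, None]   # toy regression targets

W1 = rng.normal(scale=0.5, size=(3, 4))         # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(4, 1))         # hidden -> output weights
lr = 0.05

for _ in range(500):
    h = np.tanh(X @ W1)                         # forward pass
    y_hat = h @ W2
    err = (y_hat - y) / len(X)                  # dLoss/dy_hat for 0.5 * mean squared error
    dW2 = h.T @ err                             # chain rule, output layer
    dh = err @ W2.T * (1 - h ** 2)              # chain rule through tanh: tanh' = 1 - tanh^2
    dW1 = X.T @ dh                              # chain rule, hidden layer
    W1 -= lr * dW1                              # gradient-descent updates
    W2 -= lr * dW2
print(float(np.mean((y_hat - y) ** 2)))         # training loss shrinks over the iterations
```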


6. What are confounding variables?

Confounding variables are extraneous factors that correlate with both independent and dependent variables, potentially biasing results in experiments or studies.


7. Explain the cluster sampling technique.

Cluster sampling divides the population into groups (clusters) and randomly selects entire clusters for analysis. 

It is cost-effective and useful when the population is geographically dispersed.
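
A minimal pandas sketch, assuming hypothetical city clusters:

```python
# Cluster sampling sketch; the 'city' clusters and incomes are made up for illustration.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "city": np.repeat(["A", "B", "C", "D", "E"], 20),   # 5 clusters of 20 people
    "income": rng.normal(50_000, 10_000, 100),
})

chosen = rng.choice(df["city"].unique(), size=2, replace=False)  # pick whole clusters
sample = df[df["city"].isin(chosen)]  # keep every unit inside the chosen clusters
print(list(chosen), len(sample))      # 2 cities, 40 rows
```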


8. Do gradient descent methods always converge to similar points?

No, gradient descent may converge to different points depending on:

  • The cost function's convexity.
  • Initialization of parameters.
  • Learning rate.

9. What is principal component analysis (PCA)?

PCA is a dimensionality reduction technique that transforms features into a smaller set of uncorrelated variables called principal components, capturing the most variance in the data.
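
A minimal scikit-learn sketch; the data shape and number of components are illustrative choices:

```python
# PCA sketch with scikit-learn; data shape and component count are illustrative.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(200, 10))  # 200 samples, 10 features

pca = PCA(n_components=3)             # keep the 3 highest-variance directions
X_reduced = pca.fit_transform(X)      # project onto the principal components
print(X_reduced.shape)                # (200, 3)
print(pca.explained_variance_ratio_)  # share of total variance per component
```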


10. When is the mean imputation of missing data acceptable?

Mean imputation is acceptable when data are missing at random and the feature has no severe outliers; even then, it shrinks the variance and weakens correlations, so it should be used with care.
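
A minimal pandas sketch with an illustrative column:

```python
# Mean imputation with pandas; the 'age' column is made up for illustration.
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [22, 35, np.nan, 41, np.nan, 29]})
df["age"] = df["age"].fillna(df["age"].mean())  # NaNs replaced by the column mean
print(df["age"].tolist())  # [22.0, 35.0, 31.75, 41.0, 31.75, 29.0]
```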


11. Which language is best for text analytics, R or Python?

  • Python is preferred for text analytics due to libraries like NLTK, spaCy, and Transformers.
  • R excels in statistical modelling but is less versatile for NLP tasks.

12. Why is MSE a bad measure of model performance? What would you suggest instead?

MSE squares the errors, so it is dominated by outliers and is expressed in squared units that are hard to interpret. Alternatives (compared on toy data below):

  • MAE (Mean Absolute Error): More robust to outliers.
  • R² (Coefficient of Determination): Measures model fit.
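
```python
# MSE vs. MAE vs. R² on toy data; the values are made up to show outlier sensitivity.
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [3.0, 5.0, 2.0, 7.0, 4.0]
y_pred = [2.8, 5.1, 2.2, 7.3, 14.0]  # last prediction is a gross outlier

print(mean_squared_error(y_true, y_pred))   # ≈ 20.0, dominated by the one outlier
print(mean_absolute_error(y_true, y_pred))  # ≈ 2.2, grows only linearly with it
print(r2_score(y_true, y_pred))             # negative here: worse than predicting the mean
```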

13. What are Auto-Encoders?

Auto-encoders are neural networks used for unsupervised learning, primarily for dimensionality reduction and data reconstruction. 

They consist of an encoder that compresses the input into a lower-dimensional representation and a decoder that reconstructs the input from it.
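
A minimal Keras sketch; the 784-dimensional input and layer widths are illustrative assumptions:

```python
# Minimal auto-encoder in Keras; input size and layer widths are illustrative.
from tensorflow import keras
from tensorflow.keras import layers

autoencoder = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(64, activation="relu"),      # encoder: compress to 64 dimensions
    layers.Dense(784, activation="sigmoid"),  # decoder: reconstruct the input
])
autoencoder.compile(optimizer="adam", loss="mse")
# Trained to reproduce its own input, e.g.:
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)
```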


14. Discuss Artificial Neural Network (ANN).

An ANN is a computational model inspired by the human brain, consisting of layers of nodes (neurons) connected by weighted links. 

It processes inputs, applies activation functions, and learns through backpropagation.


15. Name various types of Deep Learning Frameworks.

Popular frameworks include:

  • TensorFlow
  • PyTorch
  • Keras
  • Caffe
  • MXNet

16. Name commonly used algorithms.

Common ML algorithms:

  • Linear Regression
  • Decision Trees
  • Random Forest
  • Gradient Boosting (e.g., XGBoost)
  • K-Means Clustering

17. What are the disadvantages of using a linear model?

  • Assumes a linear relationship between predictors and the target.
  • Sensitive to outliers.
  • Cannot capture complex non-linear patterns or interactions.

18. What is the difference between convex and non-convex cost functions?

  • Convex: Single global minimum; easier optimization.
  • Non-convex: Multiple local minima; harder to optimize.

19. What is univariate analysis?

Univariate analysis examines a single variable's distribution, central tendency, and dispersion. Examples: histograms, box plots.
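
A minimal sketch of both, using simulated heights as an illustrative example:

```python
# Univariate analysis sketch; the height data are simulated for illustration.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

s = pd.Series(np.random.default_rng(0).normal(170, 10, 500), name="height_cm")
print(s.describe())         # count, mean, std, and quartiles in one call

fig, (ax1, ax2) = plt.subplots(1, 2)
s.hist(ax=ax1)              # distribution shape
s.plot.box(ax=ax2)          # median, quartiles, outliers
plt.show()
```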


20. Explain Collaborative filtering.

Collaborative filtering recommends items based on user-item interactions, using similarities among users or items.
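
A minimal user-based sketch; the ratings matrix is made up for illustration:

```python
# User-based collaborative filtering with cosine similarity; the ratings are made up.
import numpy as np

# rows = users, columns = items; 0 means "not yet rated"
ratings = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 2.0, 4.0, 1.0],
    [1.0, 1.0, 5.0, 5.0],
])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# How similar is user 0 to each of the other users?
sims = np.array([cosine(ratings[0], ratings[u]) for u in (1, 2)])

# Predict user 0's rating for item 2 as a similarity-weighted average
others = ratings[1:, 2]
prediction = sims @ others / sims.sum()
print(sims, prediction)
```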


21. When does underfitting occur in a statistical model?

Underfitting occurs when a model is too simple, failing to capture data patterns, often due to insufficient features or training.


22. Overfitting VS Underfitting

  • Overfitting: The model fits the training data too closely, including its noise, and fails to generalize to new data.
  • Underfitting: The model is too simple to learn the underlying pattern and performs poorly even on the training data.

23. What are the top three technical skills of a data scientist?

  • Programming (Python, R)
  • Data Manipulation (SQL, Pandas)
  • Machine Learning/Statistics

24. What are the model parameters used in iterative methods?

Model parameters are the values updated at each iteration of training. Examples:

  • Weights in neural networks.
  • Coefficients in linear models.

25. What is imbalanced data, and why is it a problem?

Imbalanced data occurs when class distribution is skewed, leading to biased models favouring the majority class.


26. What are eigenvectors and eigenvalues?

An eigenvector of a matrix is a direction that the corresponding transformation only scales, and its eigenvalue is the scaling factor. In PCA, the eigenvectors of the covariance matrix define the principal directions, and the eigenvalues quantify the variance captured along them.
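
A minimal numpy sketch with an illustrative matrix:

```python
# Eigen-decomposition with numpy; the matrix is illustrative.
import numpy as np

A = np.array([[4.0, 2.0],
              [1.0, 3.0]])

values, vectors = np.linalg.eig(A)
print(values)    # eigenvalues, here 5 and 2 (order may vary): the scaling factors
print(vectors)   # columns are the corresponding eigenvectors

v = vectors[:, 0]                          # defining property: A v = λ v
print(np.allclose(A @ v, values[0] * v))   # True
```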


27. Grid search vs. random search tuning?

  • Grid Search: Exhaustively evaluates every combination in a predefined hyperparameter grid.
  • Random Search: Samples combinations at random; it often finds good settings faster when only a few hyperparameters matter.
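
A minimal scikit-learn sketch of both; the estimator and parameter ranges are illustrative choices:

```python
# Grid search vs. random search in scikit-learn; estimator and ranges are illustrative.
from scipy.stats import uniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=300, random_state=0)

grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    param_grid={"C": [0.01, 0.1, 1, 10]}, cv=5)  # every combination
grid.fit(X, y)
print(grid.best_params_)

rand = RandomizedSearchCV(LogisticRegression(max_iter=1000),
                          param_distributions={"C": uniform(0.01, 10)},
                          n_iter=10, cv=5, random_state=0)       # 10 random draws
rand.fit(X, y)
print(rand.best_params_)
```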

28. What are the assumptions used in linear regression?

  • Linearity, independence of errors, homoscedasticity (constant error variance), and normality of residuals. Violations lead to biased or unreliable estimates.

29. Differences between the test set and the validation set?

  • Validation Set: Tuning hyperparameters.
  • Test Set: Final model evaluation.

30. What is the bias-variance trade-off?

The trade-off between bias (error from an overly simple model, causing underfitting) and variance (error from an overly complex model, causing overfitting); total error is minimized by balancing the two.


31. Does overfitting occur with large data?

Not necessarily; more data reduces the risk, but overfitting can still occur if the model is overly complex relative to the signal in the data.


32. Dimensionality reduction before SVM?

Yes. Reducing the dimensionality lowers computation cost and can improve performance, especially for high-dimensional data.


33. What is the kernel trick?

The kernel trick implicitly maps data into a higher-dimensional space where it is more easily separable, computing the required inner products via a kernel function without ever constructing the transformation explicitly. It is central to algorithms like SVM.


34. Correlation vs. Covariance?

  • Correlation: Scaled relationship (bounded between -1 and 1).
  • Covariance: Unscaled measure of how two variables vary together; its magnitude depends on the variables' units.
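
A small numpy sketch showing that rescaling changes covariance but not correlation; the data are illustrative:

```python
# Covariance vs. correlation under rescaling; the data are made up for illustration.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x + np.array([0.1, -0.2, 0.0, 0.3, -0.1])

print(np.cov(x, y)[0, 1])       # covariance: depends on the variables' units
print(np.corrcoef(x, y)[0, 1])  # correlation: unit-free, in [-1, 1], ≈ 1 here

# Rescaling x changes the covariance but leaves the correlation untouched
print(np.cov(100 * x, y)[0, 1])
print(np.corrcoef(100 * x, y)[0, 1])
```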

35. Why TensorFlow in deep learning?

TensorFlow offers flexibility, scalability, and support for distributed computing.


36. Exploding vs. Vanishing Gradients?

  • Exploding: Gradients grow excessively large during backpropagation, destabilizing training.
  • Vanishing: Gradients shrink toward zero, stalling learning in the early layers.

37. What is a computational graph?

A directed graph of the mathematical operations in a model, where nodes represent operations and edges carry the data (tensors) flowing between them. Deep learning frameworks use it to perform automatic differentiation.


38. What is GAN?

A Generative Adversarial Network pits two networks against each other: a generator produces synthetic samples, and a discriminator tries to distinguish them from real data. Adversarial training pushes the generator toward realistic output.


39. What are support vectors?

The training points in an SVM that lie closest to the decision boundary; they alone determine the maximum-margin hyperplane.


40. What is a gradient and gradient descent?

  • Gradient: The vector of partial derivatives of a function, pointing in the direction of steepest ascent.
  • Gradient Descent: An optimization algorithm that repeatedly steps opposite the gradient to minimize a loss function.
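
A minimal sketch minimizing an assumed one-dimensional loss f(w) = (w - 3)²:

```python
# Gradient descent on f(w) = (w - 3)^2; start point and learning rate are illustrative.
w = 0.0        # starting point
lr = 0.1       # learning rate

for _ in range(50):
    grad = 2 * (w - 3)   # f'(w): slope at the current point
    w -= lr * grad       # step against the gradient
print(w)                 # ≈ 3.0, the minimizer
```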

41. What is a p-value?

The probability, assuming the null hypothesis is true, of observing data at least as extreme as the current sample; small values are evidence against the null.


42. How are time series problems different?

Observations are ordered in time and temporally dependent, so the usual assumption of independent samples fails; specialized methods such as ARIMA are required.


43. When are both false positives and false negatives important?

In domains such as medical testing, both errors matter: a false negative misses a disease, while a false positive triggers unnecessary treatment.


44. Box plots vs. histograms?

  • Box Plots: Summarize a distribution with its quartiles and flag outliers.
  • Histograms: Show the frequency of values across bins, revealing the distribution's shape.

45. How often to update algorithms?

Depends on data drift, domain changes, and model performance.


46. Expected value vs. mean?

The expected value is the theoretical long-run average of a random variable; the mean is its empirical estimate computed from observed data.


47. How to identify a biased coin?

Use hypothesis testing to compare observed results against expected probabilities.
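
A minimal sketch with scipy's binomial test; the flip counts are illustrative:

```python
# Binomial test for coin bias; 62 heads in 100 flips is an illustrative outcome.
from scipy.stats import binomtest

result = binomtest(k=62, n=100, p=0.5)  # null hypothesis: the coin is fair
print(result.pvalue)  # ≈ 0.02 here; a small p-value is evidence of bias
```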


48. When is resampling done?

To balance classes, estimate uncertainty, or validate models.


49. Long vs. wide-format data?

  • Long: Each row holds a single measurement (e.g., id, variable, value), so one subject spans multiple rows.
  • Wide: Each subject occupies a single row, with each measured variable in its own column.

50. Define KPI, lift, model fitting, robustness, and DOE.

  • KPI: Key Performance Indicator; measures success.
  • Lift: The improvement in response rate a model achieves over a random baseline.
  • Model fitting: Optimizing model parameters.
  • Robustness: A model's ability to maintain performance under variations or noise in the data.
  • DOE (Design of Experiments): Planning tests to identify causal relationships.

51. Probability of events (shooting stars)?

Estimate the per-interval probability from event frequency, then combine intervals with the complement rule: if the probability of seeing at least one shooting star in 15 minutes is p, the probability over an hour is 1 - (1 - p)^4, assuming independent intervals. For rare-event counts, a Poisson distribution may be applicable.
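
A worked sketch of both approaches; the 0.2 probability and the rate of 2 stars per hour are illustrative assumptions:

```python
# Shooting-star probability sketch; all numbers are illustrative assumptions.
from scipy.stats import poisson

p_15min = 0.2                     # assumed P(at least one star in a 15-minute window)
p_hour = 1 - (1 - p_15min) ** 4   # complement rule over four independent windows
print(p_hour)                     # 0.5904

lam = 2.0                         # assumed average stars per hour (Poisson rate)
print(1 - poisson.pmf(0, lam))    # P(at least one in an hour) ≈ 0.865
```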