Data Analytics Interview Questions
🔹 Basics
-
What is Data Analytics?
-
What are the different types of data analytics?
-
Difference between Data Analysis and Data Analytics?
-
What is structured vs unstructured data?
-
What is data cleaning and why is it important?
-
Explain missing values and how to handle them.
-
What is exploratory data analysis (EDA)?
-
What are normalization and standardization?
-
What are outliers? How do you detect them?
-
Difference between qualitative and quantitative data?
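For the normalization and standardization question in this section, a minimal sketch of the two rescalings may help when answering. The function names and the sample values are made up for illustration, and the standardization here assumes the population standard deviation.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to mean 0 and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


data = [2.0, 4.0, 6.0, 8.0]  # hypothetical sample
normalized = min_max_normalize(data)
standardized = standardize(data)
print(normalized)
print(standardized)
```

Both helpers divide by zero on a constant column, so a real pipeline would need a guard for that case.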
🔹 Statistics & Mathematics
-
What are mean, median, and mode?
-
What are variance and standard deviation?
-
What is correlation vs covariance?
-
What is a probability distribution?
-
Explain normal distribution.
-
What are skewness and kurtosis?
-
What is hypothesis testing?
-
What is p-value?
-
Explain confidence interval.
-
What are Type I and Type II errors?
🔹 SQL & Data Handling
-
What are primary keys and foreign keys?
-
Difference between WHERE and HAVING?
-
What are joins? Types of joins?
-
Difference between DELETE, TRUNCATE, and DROP?
-
What is indexing?
-
Write a query to find the second highest salary.
-
What is a subquery?
-
What are window functions?
-
What is normalization in databases?
-
Difference between OLTP and OLAP?
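One possible answer to the second-highest-salary question in this section, sketched with Python's built-in sqlite3 module so it runs without a database server. The employees table, its columns, and the sample rows are assumptions made for illustration.

```python
import sqlite3

# Hypothetical table and rows, used only to illustrate the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Asha", 90000), ("Ben", 120000), ("Chen", 120000), ("Dev", 75000)],
)

# The subquery finds the top salary; the outer MAX picks the largest
# value strictly below it, so ties for first place are skipped.
SECOND_HIGHEST = """
    SELECT MAX(salary)
    FROM employees
    WHERE salary < (SELECT MAX(salary) FROM employees)
"""
second_highest = conn.execute(SECOND_HIGHEST).fetchone()[0]
print(second_highest)
```

If the table holds only one distinct salary, this query returns NULL (None in Python), which is worth mentioning in an interview.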
🔹 Data Visualization
-
What is data visualization?
-
Which charts are used for categorical data?
-
When would you use a box plot?
-
What is dashboarding?
-
Tools used for data visualization?
-
How do you choose the right chart?
-
What is storytelling with data?
-
Difference between bar chart and histogram?
-
What are KPIs?
-
What makes a dashboard effective?
🤖 Machine Learning Interview Questions
🔹 Fundamentals
-
What is Machine Learning?
-
Types of Machine Learning?
-
Difference between AI, ML, and Deep Learning?
-
What is supervised learning?
-
What is unsupervised learning?
-
What is reinforcement learning?
-
What is a feature?
-
What is a label?
-
What are training and testing data?
-
What is overfitting?
🔹 Algorithms
-
Explain Linear Regression.
-
What is Logistic Regression?
-
Difference between regression and classification?
-
What is K-Means clustering?
-
How does KNN work?
-
What is Naive Bayes?
-
Explain Decision Tree.
-
What is Random Forest?
-
What is SVM?
-
Difference between bagging and boosting?
🔹 Model Evaluation
-
What is accuracy?
-
What are precision and recall?
-
What is F1-score?
-
What is confusion matrix?
-
What is ROC curve?
-
What is AUC?
-
What is cross-validation?
-
Difference between bias and variance?
-
What is underfitting?
-
How do you improve model performance?
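The accuracy, precision, recall, and F1 questions in this section all reduce to the four confusion-matrix counts. The sketch below assumes binary labels encoded as 0 and 1; the helper name and the sample labels are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


# Made-up labels and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)   # of predicted positives, how many are right
recall = tp / (tp + fn)      # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)
```

The divisions assume at least one predicted positive and one actual positive; production code would handle the zero-denominator cases explicitly.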
🔹 Feature Engineering
-
What is feature engineering?
-
How do you handle categorical data?
-
What is one-hot encoding?
-
What is label encoding?
-
What is feature scaling?
-
When do you apply normalization?
-
What is PCA?
-
What is dimensionality reduction?
-
What is multicollinearity?
-
How do you detect multicollinearity?
🔹 Advanced ML & Practical
-
What is ensemble learning?
-
Explain Gradient Boosting.
-
What is XGBoost?
-
Difference between XGBoost and Random Forest?
-
What is hyperparameter tuning?
-
What is GridSearchCV?
-
What is random search (RandomizedSearchCV)?
-
What is model deployment?
-
What is data leakage?
-
How do you handle imbalanced datasets?
🧠 Scenario-Based / Real Interview Questions
-
How would you handle missing data in a real project?
-
How do you choose the best ML model?
-
Explain a data analytics project you worked on.
-
How do you explain ML results to non-technical people?
-
What steps do you follow before model building?
-
How do you detect outliers in real data?
-
How do you deal with noisy data?
-
How do you validate business impact of a model?
-
What challenges did you face in ML projects?
-
How do you keep learning new ML trends?
HR + Concept Mixing
-
Why should we hire you as a Data Analyst / ML Engineer?
-
Difference between Data Scientist and Data Analyst?
-
What tools are you comfortable with?
-
Python vs R for data analytics?
-
SQL vs NoSQL?
-
How do you handle tight deadlines?
-
What is your strongest ML skill?
-
What is your weakness?
-
Explain a failure in your project.
-
Where do you see yourself in 5 years?
Advanced Data Analytics Interview Questions
🔹 Business & Case Study Based
-
How do you translate a business problem into a data problem?
-
How do you decide which metrics matter most?
-
What KPIs would you track for an e-commerce app?
-
How do you measure customer churn?
-
How do you evaluate campaign performance?
-
How do you handle conflicting data from multiple sources?
-
What is cohort analysis?
-
What is A/B testing?
-
How do you design an experiment?
-
How do you avoid misleading insights?
🔹 Advanced Statistics
-
What is Central Limit Theorem?
-
Difference between parametric and non-parametric tests?
-
When do you use t-test vs ANOVA?
-
What is Chi-square test?
-
What is power of a statistical test?
-
What is multivariate analysis?
-
What is Bayesian statistics?
-
Explain regression assumptions.
-
What is heteroscedasticity?
-
How do you detect heteroscedasticity?
🔹 SQL – Advanced & Optimization
-
What is query optimization?
-
What are indexes and how do they work internally?
-
What is execution plan?
-
What is CTE?
-
Difference between CTE and subquery?
-
What are window functions with example?
-
What is partitioning?
-
What is sharding?
-
How do you handle large datasets in SQL?
-
What causes slow queries?
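For the CTE and window-function questions in this section, a small top-N-per-group query is a common example. The sketch below runs it through Python's sqlite3 module and assumes a SQLite build with window-function support (3.25 or later); the sales table and its rows are made up.

```python
import sqlite3

# Hypothetical table and rows, used only to illustrate the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", "a", 100), ("east", "b", 300), ("east", "c", 200),
        ("west", "d", 50), ("west", "e", 400),
    ],
)

# The CTE numbers rows within each region by descending amount;
# the outer query keeps the first two rows of every region.
TOP_N_PER_GROUP = """
    WITH ranked AS (
        SELECT region, rep, amount,
               ROW_NUMBER() OVER (
                   PARTITION BY region ORDER BY amount DESC
               ) AS rn
        FROM sales
    )
    SELECT region, rep, amount
    FROM ranked
    WHERE rn <= 2
    ORDER BY region, amount DESC
"""
top_two = conn.execute(TOP_N_PER_GROUP).fetchall()
print(top_two)
```

ROW_NUMBER breaks ties arbitrarily; RANK or DENSE_RANK would be the alternative if tied rows should share a position.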
🤖 Advanced Machine Learning Interview Questions
🔹 Theory + Depth
-
What assumptions does Linear Regression make?
-
Why is Logistic Regression called regression?
-
Explain kernel trick in SVM.
-
How does entropy work in Decision Trees?
-
What is Gini index?
-
Difference between CART and ID3?
-
What is gradient descent?
-
Types of gradient descent?
-
What is the impact of the learning rate?
-
What happens if learning rate is too high?
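The learning-rate questions in this section can be illustrated on a toy objective. The sketch below assumes f(x) = x**2, whose gradient is 2x, so each update multiplies x by (1 - 2 * lr); the function name and step counts are illustrative choices.

```python
def gradient_descent(lr, start=1.0, steps=20):
    """Minimize f(x) = x**2 with fixed-step gradient descent."""
    x = start
    for _ in range(steps):
        grad = 2 * x          # derivative of x**2
        x = x - lr * grad     # step against the gradient
    return x


small = gradient_descent(lr=0.1)   # factor 0.8 per step: shrinks toward 0
large = gradient_descent(lr=1.1)   # factor -1.2 per step: overshoots and grows
print(abs(small), abs(large))
```

On this objective any learning rate above 1.0 makes the iterates oscillate with growing magnitude, which is the divergence behavior the last question asks about.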
🔹 Deep Learning (Frequently Asked)
-
Difference between ML and Deep Learning?
-
What is a neural network?
-
Explain backpropagation.
-
What is activation function?
-
Types of activation functions?
-
What is vanishing gradient problem?
-
What is exploding gradient?
-
Difference between CNN and RNN?
-
What is LSTM?
-
When do you use CNN vs RNN?
🔹 Model Optimization
-
What is regularization?
-
Difference between L1 and L2 regularization?
-
What is dropout?
-
What is early stopping?
-
What is batch normalization?
-
How do you tune hyperparameters?
-
What is learning curve?
-
What is validation curve?
-
How do you reduce overfitting?
-
How do you handle high bias?
🧪 Production ML & MLOps Questions (High Value)
-
What is MLOps?
-
How do you deploy an ML model?
-
Difference between offline and online inference?
-
What is model drift?
-
What is data drift vs concept drift?
-
How do you monitor model performance?
-
How do you retrain models?
-
What tools are used in MLOps?
-
What is model versioning?
-
How do you ensure reproducibility?
🧠 Scenario-Based / Problem Solving
-
Model has 99% accuracy on the test set but fails in production. Why?
-
How do you handle imbalanced classes?
-
What if features are highly correlated?
-
How would you design a recommendation system?
-
How do you build a fraud detection system?
-
How do you predict demand?
-
How do you detect anomalies?
-
What would you do if data is noisy?
-
How do you explain model decisions?
-
How do you select features for a new dataset?
🧑‍💻 Python for Data & ML (Interview Favorite)
-
Difference between list, tuple, and set?
-
What is NumPy?
-
Pandas vs NumPy?
-
What is vectorization?
-
apply() vs map() in pandas?
-
What is lambda function?
-
What is iterator vs generator?
-
What is shallow vs deep copy?
-
What is time complexity?
-
How do you optimize Python code?
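For the shallow vs deep copy question in this section, a nested list shows the difference in a few lines. The variable names and values are made up for illustration.

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)      # new outer list, same inner lists
deep = copy.deepcopy(original)     # inner lists are copied as well

original[0].append(99)             # mutate an inner list in place

print(shallow[0])  # [1, 2, 99] because the inner list is shared
print(deep[0])     # [1, 2] because the deep copy is independent
```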
🧩 Real Coding / Whiteboard Questions
-
Detect duplicates in a dataset.
-
Handle missing values using Python.
-
Implement Linear Regression from scratch.
-
Find outliers using IQR.
-
Normalize a dataset.
-
Write SQL to get top N records per group.
-
Confusion matrix from predictions.
-
Feature importance extraction.
-
Train-test split logic.
-
Cross-validation implementation.
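As one way to approach the IQR outlier task in this section, the sketch below uses only the standard library. The helper name, the conventional 1.5 multiplier, and the sample values are assumptions; note that statistics.quantiles defaults to the "exclusive" method, and other quantile definitions give slightly different fences.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)   # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [10, 12, 11, 13, 12, 11, 14, 100]  # hypothetical sample
outliers = iqr_outliers(data)
print(outliers)
```

With pandas, the same idea is usually written with Series.quantile(0.25) and Series.quantile(0.75) plus a boolean mask.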
Expert-Level Data Analytics Interview Questions
🔹 Metrics, KPIs & Business Thinking
-
How do you define a good metric?
-
What is a north-star metric?
-
Difference between leading and lagging indicators?
-
How do you prevent metric gaming?
-
Vanity metrics vs actionable metrics?
-
How do you design metrics for a new product?
-
What metrics would you track for a ride-sharing app, a food delivery app, and an OTT platform?
-
How do you validate metrics statistically?
-
What happens when metrics conflict?
-
How do you sunset a metric?
🔹 Experimentation & A/B Testing
-
How do you design an A/B test end-to-end?
-
What assumptions does A/B testing make?
-
How do you calculate sample size?
-
What is statistical power?
-
What is p-hacking?
-
How do you handle multiple hypothesis testing?
-
What is CUPED?
-
When should you stop an experiment?
-
Can A/B testing give wrong results?
-
What are guardrail metrics?
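For the sample-size and statistical-power questions in this section, a common textbook approximation for comparing two conversion rates is n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2 per group. The sketch below assumes a two-sided test with equal group sizes; the function name and the example rates are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_base, p_variant, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect p_base -> p_variant."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)            # quantile for desired power
    variance = p_base * (1 - p_base) + p_variant * (1 - p_variant)
    effect = p_variant - p_base
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Example: baseline conversion 10%, hoping to detect a lift to 12%.
n_per_group = sample_size_per_group(0.10, 0.12)
print(n_per_group)
```

Halving the detectable lift roughly quadruples the required sample, which is why the minimum detectable effect is usually fixed before the experiment starts.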
🤖 Very Advanced Machine Learning Interview Questions
🔹 Mathematical Depth
-
Derive the cost function for Linear Regression.
-
Why is MSE differentiable?
-
Why do we use log loss for classification?
-
Explain bias-variance decomposition mathematically.
-
What is convex optimization?
-
Why does gradient descent converge?
-
What is Hessian matrix?
-
When do second-order methods help?
-
What is eigenvalue significance in PCA?
-
Why does normalization help convergence?
🔹 Algorithms – Deep Dive
-
Why does Random Forest reduce variance?
-
Why does boosting reduce bias?
-
Explain XGBoost objective function.
-
Why does XGBoost handle missing values well?
-
What is LightGBM leaf-wise growth?
-
CatBoost vs XGBoost?
-
Why does SVM work well in high dimensions?
-
What happens when C → ∞ in SVM?
-
Why does Naive Bayes work despite the independence assumption?
-
Why is KNN called a lazy learner?
🧠 Model Failure & Debugging (Interview Gold)
-
Model performs well offline but fails online. Why?
-
How do you debug a bad ML model?
-
How do you identify data leakage?
-
How do you detect label noise?
-
What causes training-serving skew?
-
How do you handle unseen categories?
-
How do you handle missing values at inference?
-
Why does accuracy suddenly drop?
-
How do you validate feature importance?
-
How do you rollback a model safely?
🧪 Production ML & System Design
-
Design an ML system for spam detection.
-
Design a recommendation system for news.
-
How do you choose batch vs real-time inference?
-
What is feature store?
-
Why do we need offline + online features?
-
What is idempotency in ML pipelines?
-
How do you design data pipelines?
-
How do you ensure low-latency predictions?
-
What trade-offs exist in model size vs speed?
-
How do you handle cold start?
Ethics, Fairness & Explainability
-
What is algorithmic bias?
-
How do you detect bias in models?
-
What is fairness vs accuracy trade-off?
-
What is SHAP?
-
What is LIME?
-
When should models be interpretable?
-
Explain counterfactual explanations.
-
How do you handle sensitive attributes?
-
What regulations affect ML systems?
-
Can an ML model be ethical?
💻 Hands-On Coding & Whiteboard (Advanced)
-
Implement gradient descent from scratch.
-
Implement logistic regression without sklearn.
-
Implement K-Means from scratch.
-
Compute ROC-AUC manually.
-
Write SQL for running totals.
-
Optimize a slow pandas pipeline.
-
Detect data drift programmatically.
-
Feature selection using correlation matrix.
-
Custom cross-validation strategy.
-
Train model with time-series split.
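For the manual ROC-AUC task in this section, one approach uses the pairwise interpretation: AUC is the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as half. The sketch below assumes binary labels encoded as 0 and 1; the function name and the sample data are illustrative.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```

This version is quadratic in the number of samples; a rank-based formulation or a threshold sweep with the trapezoidal rule scales better on large datasets.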
🕵️ Trick / Trap Interview Questions
-
Can a model have high precision and low recall?
-
Can R² be negative?
-
Can adding features reduce performance?
-
Can unsupervised learning use labels?
-
Is deep learning always better?
-
Does more data always help?
-
Why can accuracy be misleading?
-
When does PCA hurt performance?
-
Is cross-validation always needed?
-
Can models learn causality?
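The negative R² question in this section has a short numerical answer: R² = 1 - SS_res / SS_tot drops below zero whenever the model's squared error exceeds that of always predicting the mean. The helper and the values below are made up for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0]
reversed_fit = r_squared(y_true, [3.0, 2.0, 1.0])   # predictions in reverse order
mean_baseline = r_squared(y_true, [2.0, 2.0, 2.0])  # always predict the mean
print(reversed_fit, mean_baseline)  # -3.0 0.0
```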
🧑‍💼 Leadership & Senior Role Questions
-
How do you review another analyst’s work?
-
How do you explain uncertainty to stakeholders?
-
How do you push back on bad metrics?
-
How do you prioritize ML projects?
-
How do you mentor juniors?
-
How do you decide build vs buy?
-
How do you estimate ROI of ML?
-
How do you manage technical debt?
-
How do you handle production incidents?
-
How do you scale ML teams?