📊 Data Manipulation & Analysis

Pandas

DataFrames, Series, indexing, grouping, merging, pivoting, time series operations

NumPy

Arrays, broadcasting, vectorization, linear algebra operations, random number generation
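
As a small illustration of broadcasting and vectorization, the sketch below centers a feature matrix by its column means and computes row norms without an explicit loop. The array values are made up for the example.

```python
import numpy as np

# Hypothetical data: 3 samples, 2 features.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# Broadcasting: the (2,) vector of column means is stretched across all 3 rows.
col_means = X.mean(axis=0)      # shape (2,)
centered = X - col_means        # shape (3, 2)

# Vectorization: one array expression replaces a Python loop over rows.
row_norms = np.sqrt((centered ** 2).sum(axis=1))
```

Broadcasting works here because the trailing dimension of X (2) matches the length of col_means.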

Data Cleaning

Handling missing values, outliers, duplicates, data type conversions, normalization
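
The cleaning steps listed above could look roughly like this in plain Python. The rows are invented for illustration, and median imputation is only one of several reasonable choices for filling missing values.

```python
from statistics import median

# Hypothetical raw rows: (id, value) with strings, a missing value, and a duplicate.
raw_rows = [("a", "3"), ("b", None), ("c", "7"), ("a", "3"), ("d", "5")]

# Drop exact duplicates while preserving the original order.
unique_rows = list(dict.fromkeys(raw_rows))

# Data type conversion: strings become floats, missing values stay None.
values = [float(v) if v is not None else None for _, v in unique_rows]

# Impute missing values with the median of the observed values.
observed = [v for v in values if v is not None]
fill = median(observed)
imputed = [fill if v is None else v for v in values]

# Min-max normalization to the range [0, 1].
lo, hi = min(imputed), max(imputed)
normalized = [(v - lo) / (hi - lo) for v in imputed]
```

In practice the same steps are usually done with Pandas; the pure-Python version only makes each step explicit.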

Exploratory Data Analysis (EDA)

Statistical summaries, distributions, correlations, visualizations, hypothesis testing

🤖 Machine Learning Fundamentals

Supervised Learning

Classification, regression, evaluation metrics (classification: accuracy, precision, recall, F1, AUC; regression: RMSE, MAE)

Unsupervised Learning

Clustering (K-means, DBSCAN), dimensionality reduction (PCA, t-SNE), anomaly detection

Linear Models

Linear regression, logistic regression, regularization: Lasso (L1), Ridge (L2), Elastic Net (L1 + L2)

Tree-Based Models

Decision trees, Random Forest, Gradient Boosting (XGBoost, LightGBM, CatBoost), feature importance

Support Vector Machines

SVM theory, kernels, hyperparameter tuning, use cases

Neural Networks

Perceptrons, backpropagation, activation functions, loss functions, optimization algorithms
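
To make the perceptron concrete, here is a minimal sketch of the classic error-driven update rule on a toy problem (logical AND). The function names, learning rate, and epoch count are illustrative assumptions, not a reference implementation.

```python
def predict(w, b, x):
    """Step activation: output 1 when the weighted sum is positive."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """Train a single perceptron; weights move only when a prediction is wrong."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = y - predict(w, b, x)
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Toy data: logical AND, which is linearly separable.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
predictions = [predict(w, b, x) for x in X]
```

A single perceptron only converges on linearly separable data; problems such as XOR need hidden layers trained with backpropagation.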

🧠 Deep Learning

Feedforward Networks

Multi-layer perceptrons, activation functions, weight initialization, batch normalization

Convolutional Neural Networks (CNN)

Convolutions, pooling, architectures (ResNet, VGG, EfficientNet), transfer learning

Recurrent Neural Networks (RNN)

LSTM, GRU, sequence modeling, attention mechanisms, transformers (attention-based rather than recurrent)

Regularization Techniques

Dropout, early stopping, data augmentation, weight decay, batch normalization

🔧 Feature Engineering

Categorical Encoding

One-hot encoding, label encoding, target encoding, frequency encoding, embeddings
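
Three of the encodings above can be sketched in a few lines of plain Python. The color column is hypothetical; in practice a library encoder would normally be used.

```python
from collections import Counter

# Hypothetical categorical column.
colors = ["red", "green", "red", "blue", "red", "green"]

# Label encoding: map each category to an integer (sorted for determinism).
categories = sorted(set(colors))                 # ['blue', 'green', 'red']
label_map = {c: i for i, c in enumerate(categories)}
label_encoded = [label_map[c] for c in colors]

# One-hot encoding: one binary indicator per category.
one_hot = [[1 if c == cat else 0 for cat in categories] for c in colors]

# Frequency encoding: replace each category with its relative frequency.
counts = Counter(colors)
freq_encoded = [counts[c] / len(colors) for c in colors]
```

Label encoding imposes an arbitrary order on the categories, which tree-based models tolerate better than linear models do.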

Numerical Features

Scaling (StandardScaler, MinMaxScaler), binning, polynomial features, log transformations
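
A rough sketch of standardization and a log transformation on a skewed feature follows. The values are invented; the population standard deviation is used because that is what StandardScaler computes.

```python
import math
from statistics import mean, pstdev

values = [1.0, 10.0, 100.0, 1000.0]  # hypothetical right-skewed feature

# Standardization: subtract the mean and divide by the population standard deviation.
mu, sigma = mean(values), pstdev(values)
standardized = [(v - mu) / sigma for v in values]

# Log transformation compresses the long right tail; log1p also accepts zeros.
logged = [math.log1p(v) for v in values]
```

When a validation or test set exists, mu and sigma should be computed on the training data only and then reused, so that no information leaks from held-out data.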

Feature Selection

Correlation analysis, mutual information, recursive feature elimination, importance-based selection

Time Series Features

Lag features, rolling statistics, seasonality, trend extraction, Fourier transforms
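
Lag features and rolling statistics can be illustrated without any library. The series and the helper names lag and rolling_mean are assumptions made for this sketch.

```python
series = [10, 12, 11, 15, 14, 18]  # hypothetical daily values

def lag(values, k):
    """Shift values k steps later; the first k positions have no history."""
    return [None] * k + values[:len(values) - k]

def rolling_mean(values, window):
    """Trailing mean over the current value and the previous window - 1 values."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out

lag_1 = lag(series, 1)
roll_3 = rolling_mean(series, 3)
```

When the current value is the prediction target, rolling statistics should be computed on the lagged series so the feature does not contain the target itself.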

Text Features

TF-IDF, word embeddings (Word2Vec, GloVe), contextual embeddings (BERT), text preprocessing, n-grams

📈 Model Evaluation & Validation

Cross-Validation

K-fold, stratified K-fold, time series splits, leave-one-out, nested CV
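
The idea behind K-fold splitting can be sketched as an index generator. This is an unshuffled, contiguous-fold variant written for illustration; k_fold_indices is a hypothetical helper, not a library function.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, near-equal folds."""
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(10, 3))
```

Every sample lands in exactly one validation fold. For time series, this scheme is unsuitable because training folds would include data from after the validation period.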

Evaluation Metrics

Classification: accuracy, precision, recall, F1, ROC-AUC, PR-AUC, log loss
Regression: RMSE, MAE, MAPE, R², adjusted R²
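
For reference, several of these metrics can be computed by hand as follows. This is a sketch for binary labels (0/1) with made-up predictions; the function names are illustrative.

```python
import math

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def regression_metrics(y_true, y_pred):
    """RMSE and MAE from the per-sample errors."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"rmse": rmse, "mae": mae}

clf = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
reg = regression_metrics([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
```

RMSE penalizes the single large error more heavily than MAE does, which is why the two can rank models differently.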

Bias-Variance Tradeoff

Understanding overfitting/underfitting, learning curves, validation curves

Hyperparameter Tuning

Grid search, random search, Bayesian optimization (Optuna, Hyperopt), early stopping
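
The difference between grid search and random search can be sketched with a toy objective. Here validation_score is a hypothetical stand-in for training a model and scoring it on validation data; the parameter names and ranges are assumptions.

```python
import itertools
import random

def validation_score(lr, depth):
    """Hypothetical objective that peaks at lr=0.1, depth=6 (higher is better)."""
    return -((lr - 0.1) ** 2) - 0.01 * (depth - 6) ** 2

# Grid search: evaluate every combination of the listed values.
grid = {"lr": [0.01, 0.1, 1.0], "depth": [2, 6, 10]}
grid_best = max(itertools.product(grid["lr"], grid["depth"]),
                key=lambda params: validation_score(*params))

# Random search: same budget (9 trials), sampled from continuous/integer ranges.
rng = random.Random(0)
candidates = [(10 ** rng.uniform(-3, 0), rng.randint(2, 10)) for _ in range(9)]
random_best = max(candidates, key=lambda params: validation_score(*params))
```

Sampling the learning rate on a log scale is a common choice because useful values often span several orders of magnitude.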

🎯 Ensemble Methods

Bagging

Bootstrap aggregating, Random Forest, Extra Trees

Boosting

AdaBoost, Gradient Boosting, XGBoost, LightGBM, CatBoost

Stacking & Blending

Meta-learning, stacking architectures, weighted averaging, rank averaging
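
Weighted averaging and rank averaging, the two simplest blending schemes above, might be sketched like this. The model outputs and weights are invented, and ties in ranking are not handled specially in this version.

```python
def weighted_average(predictions, weights):
    """Blend per-model predictions using normalized weights."""
    total = sum(weights)
    n = len(predictions[0])
    return [sum(w * p[i] for w, p in zip(weights, predictions)) / total
            for i in range(n)]

def to_ranks(scores):
    """Map scores to ranks scaled to [0, 1] (lowest score gets 0)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for rank, idx in enumerate(order):
        ranks[idx] = rank / (len(scores) - 1)
    return ranks

def rank_average(predictions):
    """Average the rank-transformed predictions of each model."""
    ranked = [to_ranks(p) for p in predictions]
    return [sum(r[i] for r in ranked) / len(ranked) for i in range(len(ranked[0]))]

model_a = [0.9, 0.2, 0.6, 0.4]  # hypothetical scores from two models
model_b = [0.8, 0.1, 0.3, 0.7]
blend = weighted_average([model_a, model_b], weights=[3, 1])
ranked_blend = rank_average([model_a, model_b])
```

Rank averaging is useful when models are poorly calibrated relative to each other and the metric depends only on ordering, as with ROC-AUC.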

📊 Statistics & Mathematics

Probability & Statistics

Distributions, hypothesis testing, confidence intervals, Bayesian statistics

Linear Algebra

Matrix operations, eigenvalues/eigenvectors, SVD, PCA

Calculus

Derivatives, gradients, optimization, chain rule (for backpropagation)
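
The role of the chain rule in backpropagation can be shown on a single sigmoid unit. The sketch below derives the gradient by hand, checks it against a finite-difference estimate, and runs a few gradient-descent steps; the input, target, and step size are arbitrary choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, y):
    """Squared error of a single sigmoid unit: L = (sigmoid(w * x) - y)^2."""
    return (sigmoid(w * x) - y) ** 2

def grad(w, x, y):
    """Chain rule: dL/dw = dL/da * da/dz * dz/dw, with z = w * x and a = sigmoid(z)."""
    a = sigmoid(w * x)
    dL_da = 2.0 * (a - y)
    da_dz = a * (1.0 - a)
    dz_dw = x
    return dL_da * da_dz * dz_dw

w, x, y = 0.5, 2.0, 1.0
analytic = grad(w, x, y)

# Central finite difference as an independent check of the derivation.
eps = 1e-6
numeric = (loss(w + eps, x, y) - loss(w - eps, x, y)) / (2 * eps)

# Gradient descent: repeatedly step against the gradient.
initial_loss = loss(w, x, y)
for _ in range(100):
    w -= 0.5 * grad(w, x, y)
final_loss = loss(w, x, y)
```

Comparing analytic and numeric gradients in this way is a standard sanity check when implementing backpropagation by hand.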

💻 Programming & Tools

Python

Object-oriented programming, list comprehensions, generators, decorators, context managers
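
A compact sketch of three of these features working together follows: a generator, a decorator, and a context manager. The helpers batches, count_calls, and timer are hypothetical names chosen for the example.

```python
import time
from contextlib import contextmanager
from functools import wraps

def batches(items, size):
    """Generator: yield fixed-size chunks lazily instead of building them all."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def count_calls(func):
    """Decorator: record how many times the wrapped function is called."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@contextmanager
def timer(results, label):
    """Context manager: store elapsed seconds under `label` on exit."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

@count_calls
def square(x):
    return x * x

timings = {}
with timer(timings, "squares"):
    # List comprehension over the batches produced by the generator.
    squares = [square(x) for batch in batches(list(range(5)), 2) for x in batch]
```

The try/finally inside timer ensures the elapsed time is recorded even if the body of the with block raises.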

Libraries

Scikit-learn, XGBoost, LightGBM, CatBoost, TensorFlow, PyTorch, Keras

Data Visualization

Matplotlib, Seaborn, Plotly, creating effective visualizations

Version Control

Git, GitHub, managing code versions, collaboration

🎓 Recommended Learning Path

1. Foundation

Python basics → Pandas/NumPy → Data visualization → Basic statistics

2. Machine Learning Basics

Linear models → Tree models → Evaluation metrics → Cross-validation

3. Advanced ML

Feature engineering → Ensemble methods → Hyperparameter tuning → Model selection

4. Deep Learning

Neural networks → CNNs → RNNs/LSTMs → Transfer learning

5. Competition Skills

EDA techniques → Advanced feature engineering → Ensemble strategies → Time management