HikeCatalyst

Become a Data Scientist / ML Engineer

Take ML models from Jupyter to production-grade serving systems with measurable business impact.

CREATED BY
Sneha T. 4.9
Business Analyst Lead at ConsultPro | 8+ years of experience

About this Path

Designed for data scientists who can train models but struggle to take them to production, and for ML engineers aiming for senior or staff roles. You will master feature stores, model serving at scale, LLM fine-tuning, RAG pipelines, and ML system design, and you will build the experiment tracking and monitoring discipline that separates research from reliable ML products.

Path Overview

  • Advanced Level
  • Certificate of Completion
  • About 76 hours to complete
  • English language
  • 24+ curated videos
  • Learn online at your own pace
  • 6 modules with resources
  • Gamified & interactive

Path Curriculum

Feature Engineering: Feature Stores with Feast & Tecton
Point-in-time correctness, online/offline consistency, feature sharing across teams.
Experiment Tracking: MLflow, Weights & Biases & Hydra Configs
Run comparison, artifact versioning, hyperparameter sweeps, reproducible configs.
Training Pipeline Design: Kubeflow Pipelines & Metaflow
Step caching, conditional branches, resource annotations for GPU nodes.
Model Registry & Governance: Lineage, Approval Gates & Rollback
Stage transitions, signature enforcement, automated smoke tests before promotion.
Transformer Architecture: Attention, Positional Encoding & Scaling Laws
Multi-head attention complexity, KV cache, Chinchilla-optimal compute allocation.
Efficient Training: Mixed Precision, Gradient Checkpointing & FSDP
BF16 vs FP16, activation offloading, ZeRO stages for multi-GPU training.
Recommender Systems: Two-Tower Models & HNSW Retrieval
Hard negative mining, ANN index trade-offs, real-time feature lookup latency.
Time-Series Forecasting: TFT, N-BEATS & Probabilistic Outputs
Quantile loss, conformal prediction intervals, data leakage prevention in windows.
LoRA & QLoRA: Parameter-Efficient Fine-Tuning on Domain Data
Rank selection, adapter merging, quantisation trade-offs with bitsandbytes.
RLHF & DPO: Aligning Models with Human Preferences
Reward model training, KL divergence penalty, Direct Preference Optimisation mechanics.
LLM Evaluation: RAGAS, LLM-as-Judge & Custom Benchmarks
Faithfulness, answer relevance, factuality metrics, evals CI integration.
LLM Safety: Guardrails, Prompt Injection & Jailbreak Mitigations
NeMo Guardrails, constitutional AI patterns, red-teaming methodology.
Vector Databases: Pinecone, Weaviate & pgvector at Scale
HNSW vs IVF index selection, namespace isolation, filtered ANN performance.
Advanced RAG: Re-Ranking, HyDE & Query Decomposition
Cross-encoder re-rankers, hypothetical document embeddings, multi-hop retrieval.
Agentic Workflows: LangGraph, Tool Calling & Memory Patterns
ReAct vs plan-and-execute, tool error recovery, long-term memory with episodic stores.
Triton Inference Server: Dynamic Batching & Model Ensembles
Backend selection (TensorRT, ONNX, PyTorch), gRPC vs HTTP performance benchmarks.
LLM Inference: vLLM, PagedAttention & Speculative Decoding
Throughput vs latency trade-offs, tensor parallelism, continuous batching.
Serving Infrastructure: KServe, KEDA Scaling & GPU Autoscaling
Custom resource requests, cold-start mitigation, GPU memory fragmentation avoidance.
A/B Testing, Shadow Deployments & Champion-Challenger Frameworks
Traffic splitting via Istio, statistical significance gates, online metric degradation alerts.
Data & Concept Drift Detection: Evidently, Arize & NannyML
PSI, Jensen-Shannon divergence, drift alerting thresholds and retraining triggers.
ML System Design: Ads Ranking, Fraud Detection, Search Relevance
Two-stage retrieval-ranking, real-time feature pipelines, business metric alignment.
DS/MLE Interview Playbook: Stats, ML Concepts & Case Studies
Bias-variance decomposition, causal inference basics, A/B test pitfall questions.
Communicating ML Results to Stakeholders & Product Partners
Confidence intervals for business leaders, model card templates, launch readiness checklist.

What you'll learn

  • Design end-to-end ML systems covering feature engineering, training pipelines, model registry, and low-latency serving.
  • Build and operate a feature store using Feast or Tecton, eliminating training-serving skew in production pipelines.
  • Fine-tune and evaluate large language models using LoRA/QLoRA, RLHF pipelines, and rigorous LLM evaluation frameworks.
  • Architect RAG systems with retrieval quality metrics, re-ranking, hybrid search, and production-grade vector databases.
  • Deploy models at scale using TorchServe, Triton Inference Server, and Kubernetes-based autoscaling serving stacks.
  • Implement ML observability: data drift, concept drift, shadow deployments, and champion-challenger A/B frameworks.