cv
Basics
| Name | Kaustubh Sharma |
| Label | Undergraduate Student / Researcher |
| Email | kaustubh_s@ee.iitr.ac.in |
| Url | https://kaustubh202.github.io |
| Summary | B.Tech (Electrical Engineering) student at IIT Roorkee (CGPA 9.04/10). Research interests and experience in mechanistic interpretability, diffusion models, Gaussian process inference, and ML for science & engineering. Author of multiple papers (ICCV / ICLR workshops) and incoming quantitative research intern at Goldman Sachs. |
Work
-
2025.05 - Present Undergraduate Research Assistant
P-square Lab, IIT Roorkee
Research on attention mechanisms for Prior-Data Fitted Networks (PFNs) and amortized kernel/hyperparameter inference for GP-style problems.
- Developed Decoupled-Value Attention (DVA) for PFNs to accelerate GP inference for physical equation solving (code: PSquare-Lab/DVA-PFN).
- Working on foundational architectures for amortized kernel hyperparameter inference and scaling PFNs.
-
2025.03 - 2025.03 Educator
Edufabrica Pvt. Ltd.
Delivered a 2-day workshop on Generative AI to 200+ students across India.
-
Incoming Quantitative Research Intern (Quantitative Strategist)
Goldman Sachs
Selected for the quantitative strategist internship involving financial modelling, statistical analysis and quantitative research. (Upcoming)
Education
-
2023.07 - 2027.06 B.Tech, Electrical Engineering
IIT Roorkee
CGPA 9.04/10
Awards
- 2025.01.01
Micron AI Hackathon 2025 — Second Runner Up
Micron Technology
Second Runner Up at Micron AI Hackathon 2025.
- 2023.06.01
JEE Advanced 2023
IIT / Joint Entrance Examination
Secured All India Rank 1624 among 100k+ applicants.
- 2021.01.01
NTSE Scholar 2021
NCERT
Awarded the NTSE scholarship; among the top ~2,000 candidates out of 900k+ applicants.
- 2022.01.01
KVPY Fellow 2022
IISc Bangalore / KVPY
Secured All India Rank (AIR) 509 among 200k+ applicants; KVPY fellowship recipient.
Publications
-
2026 Dissecting Attention and MLP Roles: A Study of Domain Specialization in Large Language Models
Under review
First author (Kaustubh Sharma). Study on domain specialization in LLaMA 3-3B: forward pass profiling, probing, zero-out tests and analysis of fine-tuning shifts.
-
2025.09 Decoupled-Value Attention for Prior-Data Fitted Networks: GP Inference for Physical Equations
arXiv / Under review
First author (Kaustubh Sharma). Proposed DVA to improve the speed and accuracy of Gaussian process inference inside PFNs for physical equation solving.
-
2025 Image-Alchemy: Advancing Subject Fidelity in Personalized Text-to-Image Generation
DeLTa Workshop, ICLR 2025
First author (Kaustubh Sharma). Pipeline for personalizing Stable Diffusion XL improving subject fidelity using LoRA and segmentation-guided Img2Img to reduce overfitting/forgetting.
-
2025 Explainable AI-Generated Image Forensics: A Low-Resolution Perspective with Novel Artifact Taxonomy
APAI Workshop, ICCV 2025 (Proceedings)
Solo author (Kaustubh Sharma). Developed an interpretable pipeline for AI-generated image detection at low (32×32) resolution with an artifact taxonomy and explainability framework.
Skills
| Technical Skills | Python (PyTorch, Transformers, NumPy, Diffusion), C++, Git / GitHub, Linux |
| Machine Learning | Mechanistic Interpretability, Diffusion Models, Autoregressive Modelling, ML for Science & Engineering, Language Models |
| Mathematics & Statistics | Probability Theory, Gaussian Processes, Optimization, Statistics |
Languages
| Hindi | Native speaker |
| English | Advanced |
| French | Beginner |
Interests
| Music | Pianist (Music Section, IIT Roorkee) |
| Swimming | Member, IIT Roorkee Swimming Team; Competitive Swimming |
| Data Science | Data Science Group, IIT Roorkee; Mechanistic interpretability projects |
Projects
- 2025.04 - Present
Domain Circuit Discovery in LLMs - Mechanistic Interpretability
Investigating how domain-specific knowledge emerges in LLaMA 3-3B: mapping domain-specific 'rooms' across the transformer architecture and evaluating them through causal effects, probe separability, zero-out tests, the hydra effect, and fine-tuning shifts.
- Forward pass profiling
- Probe separability
- Zero-out tests
- Hydra effect analysis
- Fine-tuning shift evaluation
- 2025.05 - Present
Decoupled-Value Attention (DVA) for PFNs
Designed and implemented DVA to speed up GP-like inference in Prior-Data Fitted Networks for physical equation solving; codebase and experiments developed with P-square Lab collaborators.
- DVA design and implementation
- GP inference time reduction for physical equation solving
- Scaling PFNs
-
Sparsity-Aware Representation Learning for Jet Image Generation via Guided Latent Diffusion
Sparsity-aware latent diffusion framework for generating high-energy physics jet images with a custom VAE and a mean-pulling mechanism to improve reconstruction quality.
- Custom VAE with sparsity reconstruction loss
- Latent diffusion mean-pulling mechanism