Gaurav Bhole
Research Intern at the Laboratory of Integrative Systems Physiology, École Polytechnique Fédérale de Lausanne (EPFL)
Master's Research Student at the Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology Hyderabad (IIITH)
Building solutions to decode the complexities of life sciences using AI
About Me
I'm pursuing an integrated B.Tech in Computer Science and Master of Science by Research in Computational Natural Sciences at IIIT Hyderabad (CGPA: 7.52/10), where I was recognized as a Research List Award recipient for the academic year 2024-25. I have also been awarded the IHub-Data Research-Translation Fellowship for 2025-26. Under Dr. Nita Parekh's guidance at CCNSB, I've contributed to research on long-read DNA sequencing for structural variant detection and on multi-modal mammographic analysis.
Simultaneously, I am working as a Research Intern at the Laboratory of Integrative Systems Physiology (LISP) at EPFL, where I conduct aging and behavioral analysis research under the guidance of Dr. Johan Auwerx. My current focus is developing hierarchical masked autoencoder frameworks to model mouse movement trajectories and decode aging patterns from behavioral time series data, using the Healthspan Diversity Panel (~4000 mice from 82 genetically diverse strains).
My research philosophy centers on bridging the gap between computational innovation and biological understanding. I believe that the most profound scientific breakthroughs emerge at the intersection of rigorous computational methods and deep biological insight. By leveraging the power of artificial intelligence and machine learning, I aim to uncover patterns and relationships in complex biological systems that would otherwise remain hidden, contributing to advances that can improve human health and our understanding of life itself.
Deep Learning
Neural networks, transformers, and autoencoder architectures for biological data analysis
Medical Imaging
Mammography analysis, fMRI processing, and computer-aided diagnosis systems
Genomics
Long-read sequencing, structural variant detection, and multi-omics integration
Behavioral Analysis
Movement trajectory modeling and aging pattern recognition in biological systems
NLP & LLMs
Large language models, machine unlearning, and conversational AI for healthcare
Computational Biology
Systems-level modeling and computational approaches to biological problems
Current Research Focus
At EPFL (LISP)
Developing hierarchical masked autoencoder frameworks for modeling aging trajectories from continuous mouse behavioral data. Working with the Healthspan Diversity Panel to link natural movement patterns with genetic variation and molecular aging signatures.
At IIIT Hyderabad (CCNSB)
Advancing structural variant detection in human genomes using Long-Read DNA sequencing technologies. Developing multi-modal deep learning approaches for mammographic analysis and cancer subtype classification using hypergraph contrastive learning.
Professional Journey
Research Intern
LISP, École Polytechnique Fédérale de Lausanne (EPFL)
May 2025 - Present • Lausanne, Switzerland
Adapting hierarchical masked autoencoder frameworks to model mouse movement trajectories and decode aging patterns from time series of behavioral pose vectors, working with the Healthspan Diversity Panel (~1800 mice from 82 strains).

Research Student
CCNSB, IIIT Hyderabad
May 2023 - Present • Hyderabad, India
Conducting research in genetics and medical imaging under Dr. Nita Parekh, with a focus on long-read DNA sequencing for structural variant detection and multi-modal classification strategies for mammographic analysis.

Head Teaching Assistant
IIIT Hyderabad
Aug 2023 - May 2025 • Hyderabad, India
Designed examination papers and led tutorial sessions for Non-Linear Dynamics and Bioinformatics courses.

Research Intern
Global Health X
Jun 2024 - Sep 2024 • Hyderabad, India
Developed conversational AI agents for mental health support using LLM frameworks such as DSPy and LangChain. Fine-tuned Meta-Llama-3.1-8B with PEFT for therapeutic contexts.
Publications & Research
Mammo-Bench: A Large-scale Benchmark Dataset of Mammography Images
Accepted at The 13th International Conference on Computational Advances in Bio and Medical Sciences 2025, Atlanta, USA
HyperCLSA: A Hypergraph Contrastive Learning Pipeline for Multi-Omics Data Integration
Accepted at The 11th International Conference on Pattern Recognition and Machine Intelligence 2025, Delhi, India
DFANet: A Difference Fusion Attention-based method for Semantic Change Detection
Under review at Journal of the Indian Society of Remote Sensing
Deep phenotyping via hierarchical learning of mouse movement
Oral and Poster Presentation at the Computational Biology Symposium 2025, UNIL, Switzerland
Featured Projects
Modeling Brain Activity During Naturalistic Movie Watching
Developed encoding and decoding models using deep MLPs and LSTMs to predict fMRI brain activity from video embeddings, achieving high intra-subject accuracy in identifying which short film was being watched from brain activity patterns.
Machine Unlearning for PII Removal from LLMs
Developed adaptive Representation Misdirection Unlearning techniques to selectively remove personally identifiable information from large language models. Ranked 4th on both the 1B and 7B parameter model leaderboards in SemEval-2025 Task 4.
Image Captioning using FAISS-Accelerated Retrieval
Used FAISS within a Distributed Representation-Based Query Expansion approach to significantly reduce computation time on large datasets, demonstrating that classical retrieval-based methods can achieve competitive performance on image captioning tasks.
Parameter Efficient Fine Tuning for Text Summarization
Implemented two parameter-efficient fine-tuning approaches, Prompt Tuning and LoRA, and compared them against traditional full fine-tuning of GPT-2 for text summarization on the CNN/Daily Mail dataset. The results confirmed that the parameter-efficient methods achieve comparable performance with significantly reduced computational requirements.
Brain Encoding and Decoding for Visual Cognition
Developed bidirectional computational neuroscience pipelines using the Natural Scenes Dataset to map between visual stimuli and fMRI responses. Compared CNN architectures, achieving correlations of up to 0.43, and drew insights into how deep networks model human visual processing mechanisms.
Tokenization Effects in Psycholinguistic Surprisal Analysis
Extended research on surprisal theory by comparing character-level n-gram models, token-level GPT-2 surprisal, and character-level surprisal via beam-based marginalization across four eye-tracking corpora. Found that marginalized character-level surprisal consistently outperformed token-based approaches.
Quantization and Model Compression
Implemented quantization techniques for LLMs, both as custom implementations and through the bitsandbytes library, to reduce model size while maintaining performance.
Age Prediction from Facial Images
Developed and compared CNN and Vision Transformer models for age prediction from facial images, placing 7th among 200 contestants in a Kaggle competition.
Neural Machine Translation with Transformer
Built a Transformer model from scratch for English-French translation, following the "Attention Is All You Need" paper. Implemented a custom encoder-decoder architecture with self-attention mechanisms and positional encodings.
Text-Based Brain Encoding and Decoding for Cognitive Science
Developed a computational neuroscience pipeline for bidirectional mapping between textual stimuli and fMRI brain activations. Improved decoding performance, as measured by 2V2 accuracy and correlation, by integrating multiple regions of interest (ROIs).
Analysis of Song Lyrics for Global and Indian Top Charts
Analyzed songs from Indian and global top charts, using self-similarity matrices for lyrics segmentation and NLP techniques for sentiment analysis. Developed extractive summarization methods and applied the Valence-Arousal framework for cross-cultural comparison.
IMDB Movie Review Sentiment Analysis using RNNs and LSTMs
Implemented and compared RNN and LSTM architectures for binary sentiment classification on IMDB movie reviews, discovering that mean pooling across all timesteps significantly outperformed last-hidden-state approaches.
Technical Expertise
Programming Languages
Deep Learning Frameworks
Bioinformatics
LLM & NLP
Tools & Technologies
Web Development
Leadership & Service
Entrepreneurship Cell
Corporate Relations Head
Techno-Cultural Fest
Corporate Relations Head
Server Administrator
Computational Biology Server
Football Captain
University Team
A Bit of Research Humor
As someone who has made the academic journey from undergraduate to master's research student, I find this representation of how people in science see each other both hilarious and surprisingly accurate!

Current Status: Somewhere between the Master's and PhD student phases, definitely seeing my professors as the wise mentors they are (most of the time)! 😄
Let's Connect
Interested in collaboration, research opportunities, or just want to discuss the latest in AI and computational biology? I'd love to hear from you.