I am a tenure-track Assistant Professor in the Department of Biostatistics and Bioinformatics at Duke University, where I develop AI-driven methods for biomedical data science. My research focuses on building intelligent, generalizable systems that integrate machine learning, large language models, and multimodal biological data to advance scientific discovery.
A major direction of my work is the use of large language models in biomedical research. I study how models such as GPT-4 can perform tasks including cell-type annotation, code generation, and automated data analysis, and how they compare to human experts. My goal is to transform biomedical data analysis from manual, fragmented pipelines into adaptive, AI-guided systems that improve accessibility, efficiency, and reproducibility.
In parallel, I develop AI and deep learning methods for spatial transcriptomics and biomedical imaging, enabling the integration of molecular and spatial information to better understand tissue organization. I also apply these approaches to study cellular senescence and aging at single-cell resolution.
Overall, my research aims to build next-generation AI systems that not only analyze complex biomedical data but also guide how science is conducted.
Assessing GPT-4 for cell type annotation in single-cell RNA-seq analysis. 2024. Nature Methods
Comparing large language models and human programmers for generating programming code. 2025. Advanced Science
Evaluating large language models in biomedical data science challenges through a classroom experiment. 2025. PNAS
PreTSA: computationally efficient modeling of temporal and spatial gene expression patterns. 2026. Genome Biology
Identifying cell-type-specific spatially variable genes with ctSVG. 2025. Genome Biology
Vispro improves imaging analysis for Visium spatial transcriptomics. 2025. Genome Biology
GeneSegNet: a deep learning framework for cell segmentation by integrating gene expression and imaging. 2023. Genome Biology
Single-cell and spatial detection of senescent cells using DeepScence. 2025. Cell Genomics
BIOSTAT 824, Case Studies in Biomedical Data Science (2025).