Keynote Speaker
Prof. Dong Xu

Department of EECS and C.S. Bond Life Sciences Center
University of Missouri-Columbia, USA
Speech Title: Applications of Large Language Models, Prompt Engineering, and AI Agents for Biology

Abstract: Large language models (LLMs), trained on massive datasets, are opening new frontiers in biology, especially when combined with prompt-based learning, retrieval-augmented generation (RAG), and AI agents. This presentation showcases our work leveraging these tools across multiple biological domains, such as plant science. We developed RAG and prompt refinement techniques to improve gene relationship prediction. We built AI agents for protein annotation and Fatplants (https://fatplants.net), our database of plant lipid-related genes and metabolism. In protein modeling, we introduced S-PLM, a contrastive learning-based, 3D structure-aware protein language model that enhances sequence-based predictions. Prompting protein language models further boosted tasks like signal peptide and targeting signal prediction. We also applied prompt-based learning to large single-cell RNA-seq models, improving several single-cell analysis tasks. In addition, we developed scPlantAnnotate, a plant-specific large single-cell RNA-seq model, for plant cell type annotation that significantly outperforms current reference-based methods across four plant species. Our findings demonstrate the transformative potential of LLMs and AI agents in advancing biological research.


Biography: Dong Xu is a Curators’ Distinguished Professor in the Department of Electrical Engineering and Computer Science, with appointments in the Christopher S. Bond Life Sciences Center and the Informatics Institute at the University of Missouri-Columbia. He obtained his Ph.D. from the University of Illinois, Urbana-Champaign in 1995 and did two years of postdoctoral work at the US National Cancer Institute. He was a Staff Scientist at Oak Ridge National Laboratory until 2003 before joining the University of Missouri, where he served as Department Chair of Computer Science from 2007 to 2016. Over the past 30 years, he has conducted research in many areas of computational biology and bioinformatics, including single-cell data analysis, protein structure prediction and modeling, protein post-translational modifications, protein localization prediction, computational systems biology, biological information systems, and bioinformatics applications in humans, microbes, and plants. His research since 2012 has focused on the interface between bioinformatics and deep learning. He has published more than 500 papers with more than 28,000 citations and an H-index of 89 according to Google Scholar. He was elected a Fellow of the American Association for the Advancement of Science (AAAS) in 2015 and a Fellow of the American Institute for Medical and Biological Engineering (AIMBE) in 2020.