Invited Speaker
Prof. Yuan-Ming Zhang
College of Plant Science and Technology, Huazhong Agricultural University, China
Speech Title: Fast3VmrMLM: a fast and efficient GWAS algorithm that identifies QTNs, QTN-by-environment interactions, and QTN-by-QTN interactions for polygenic traits in big data and artificial intelligence era
Abstract: The rapid advancement of omics technologies and AI presents new challenges for genome-wide association studies (GWAS), including large population sizes, diverse marker types and quantities, and various dependent variable types. Climate change is another challenge. However, studies identifying genes, gene-by-environment interactions (GEIs) and gene-by-gene interactions (GGIs), as well as breeding by design, remain limited. To identify genes, GEIs, GGIs and key genes for polygenic or complex traits at scale in big datasets, five algorithms and two computer science technologies were integrated into a compressed variance component mixed model and the 3VmrMLM algorithm framework, combining 'genome-wide scanning + machine learning' to develop the Fast3VmrMLM algorithm. In a reanalysis of 18K rice lines in a single environment, Fast3VmrMLM detected 211 known functional genes for 14 traits, far exceeding the 100 genes identified by FarmCPU as reported in Science. In a joint analysis across three environments, Fast3VmrMLM detected 103 known functional genes and 26 functional GEIs for six yield-related traits. In an epistasis analysis of 100K markers per environment across 18K rice lines, Fast3VmrMLM identified 133 known functional genes and 41 gene pairs with experimental interaction evidence for six yield-related traits. A gene interaction network was constructed using the known and candidate genes, GEIs and GGIs from the above analyses, identifying 23 key hub genes related to rice yield traits. The analysis of superior haplotypes for early heading genes identified 38 known major-effect genes that could advance heading by 1–15 days for breeding purposes, as well as 14 GEIs that could advance heading by 1–19 days in Hangzhou. Ten early-heading breeding lines suitable for all three environments and ten region-adapted breeding lines for Hangzhou were identified.
In a twelve-environment maize dataset, six GEIs interacting with five meteorological factors and two MEJA-detected GEIs helped to explain flowering time plasticity. Thirteen known genes, eight known GEIs and seven plasticity genes advanced flowering by 1.10 to 6.61 days, whereas nine known genes, one known GEI and three plasticity genes increased yield by 0.51 to 3.56 Mg·ha⁻¹; together, these results identified fifteen hybrids with high breeding potential and 29 genes. Fast3VmrMLM took 12.96 hours and 4.88 GB of memory to jointly analyze phenotypes across 40 environments in 1,000 varieties, each with one million markers, on a small server with 60 CPUs and 1 TB of memory. Additionally, genetic analyses of maize NCII breeding populations, soybean structural variation data, cotton multi-omics data, bin haplotype data and Monte Carlo simulation datasets further validated Fast3VmrMLM. Fast3VmrMLM effectively overcomes the 'blind spots' of traditional approaches in detecting dominant, small-effect, small allelic substitution effect and rare loci, and expands GEI detection to gene-by-meteorological factor interactions. The size of association mapping populations has increased significantly, from thousands to millions, overcoming the 'computational barrier' of big crop data and the 'bottleneck' challenge of high-end chips. This study presents a method and software platform for large-scale GWAS that is highly effective, fast, broadly applicable, compact and low-power.
