Research

Reconstructing evolutionary history of cancer genomes

Reconstructing the evolutionary history of a cancer is important for early detection, prognosis, and treatment of patients. Phylogenetic inference methods have been widely used to reconstruct the evolutionary history of cancer cells from different types of markers.

Phylogeny inference from copy number profiles of multiple samples

Using copy number alterations (CNAs) as markers, we have developed CNETML, the first program to jointly infer the tree topology, node ages, and CNA rates from longitudinal samples.

With the recent MRC award, we will develop better methods to enrich the toolbox of cancer phylogenetics.


Computational modelling of human genomes

Linking genomics with stochastic modelling and Bayesian inference provides a powerful approach to quantify somatic evolution, which may help to predict disease progression and drug response.

Parameter inference with approximate Bayesian computation (ABC)

We have applied this approach to model CNAs and structural variants (SVs) resulting from chromosomal instability, using both experimental and real cancer patient data.

Our previous inferences of important prognosis-related parameters, including chromosome mis-segregation rates and selection strengths, were limited by the mixture of signals in bulk sequencing data. We are therefore interested in improving the inference at a higher resolution with increasingly available single-cell data.
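As a rough illustration of the idea behind ABC, the sketch below uses rejection sampling to estimate a per-division mis-segregation rate from a single summary statistic. It is a toy under stated assumptions, not the model used in our work: the binomial simulator, the uniform prior on [0, 0.5], the observed count, and all function names are hypothetical.

```python
import random


def simulate_missegregations(rate, n_divisions, rng):
    """Toy simulator (assumption): each division mis-segregates independently."""
    return sum(rng.random() < rate for _ in range(n_divisions))


def abc_rejection(observed, n_divisions, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary lies within `tolerance` of the data."""
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.0, 0.5)  # hypothetical uniform prior
        simulated = simulate_missegregations(rate, n_divisions, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(rate)
    return accepted


rng = random.Random(1)
# Hypothetical data: 20 mis-segregation events observed over 200 divisions.
posterior = abc_rejection(observed=20, n_divisions=200,
                          n_draws=5000, tolerance=2, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
print(len(posterior), round(posterior_mean, 3))
```

The accepted draws approximate the posterior distribution of the rate. A smaller tolerance gives a closer approximation but accepts fewer draws, which is one reason richer single-cell summaries are attractive.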

We are also interested in applying the latest genomic language models to tackle challenges related to human genomics.


Investigating intratumour heterogeneity and clonal evolution in cancer genomes

Intratumour heterogeneity (ITH) and clonal evolution often cause therapy failure and drug resistance in cancer patients. We previously analysed genomic ITH in patients with lung adenocarcinoma (LUAD) and hepatocellular carcinoma (HCC).

The flow chart of PSiTE

To evaluate the performance of different variant callers and clonal decomposition methods, we also developed a phylogeny-guided simulator for tumour evolution (PSiTE).

We are interested in developing new methods to decompose ITH and decipher clonal evolution.


Understanding the evolution of microbial genomes

Lateral gene transfer (LGT) and recombination are common and important evolutionary processes in microbes. We have developed two machine learning methods to predict genomic islands (GIs), large genomic regions that were probably acquired by LGT and may contain genes related to pathogenesis and antibiotic resistance. We are also interested in developing and applying new methods to other important problems in microbial evolution.

The flow chart of GI-Cluster


Phylogenetic networks are becoming essential to represent complex evolutionary relationships when LGT and other reticulation events are involved.

An example of a phylogenetic network

We previously developed algorithms related to two fundamental problems in phylogenetic networks, the tree containment problem (TCP) and the cluster containment problem (CCP). We are interested in solving other related problems.
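To make the CCP concrete, the sketch below checks by brute force whether a set of taxa is a cluster of some tree displayed by a small network. It tries every way of keeping one parent edge per reticulation node, so it takes exponential time and only illustrates the problem statement; it is not one of the algorithms we developed. The edge-list encoding and the function name `cluster_contained` are assumptions made for the example.

```python
from itertools import product


def cluster_contained(edges, cluster):
    """Return True if `cluster` is the leaf set below some node in a displayed tree."""
    parents, nodes = {}, set()
    for u, v in edges:
        parents.setdefault(v, []).append(u)
        nodes.update((u, v))
    leaves = nodes - {u for u, _ in edges}
    reticulations = [v for v, ps in parents.items() if len(ps) > 1]
    target = frozenset(cluster)

    # Each choice keeps exactly one incoming edge per reticulation node.
    for choice in product(*(parents[r] for r in reticulations)):
        kept = dict(zip(reticulations, choice))
        children = {}
        for u, v in edges:
            if kept.get(v, u) == u:  # tree edges are always kept
                children.setdefault(u, []).append(v)

        def leaves_below(v):
            if v in leaves:
                return frozenset([v])
            found = frozenset()
            for child in children.get(v, []):
                found |= leaves_below(child)
            return found

        if any(leaves_below(v) == target for v in nodes):
            return True
    return False


# Hypothetical network: leaf y hangs below reticulation h, whose parents are a and b.
edges = [("r", "a"), ("r", "b"), ("a", "x"), ("a", "h"),
         ("b", "h"), ("b", "z"), ("h", "y")]
print(cluster_contained(edges, {"x", "y"}))
print(cluster_contained(edges, {"x", "z"}))
```

In this example {x, y} and {y, z} each appear in one of the two displayed trees, whereas {x, z} appears in neither.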


Integrating multi-omics data to solve problems in cancer and microbial biology

To understand the processes shaping cancer evolution and to inform treatment, it is necessary to integrate measurements of various types. High-throughput sequencing has generated huge amounts of multi-omics data, which provide a rich resource for addressing important questions in cancer evolution. However, it is challenging to integrate these heterogeneous data types systematically. Machine learning has emerged as a promising technique for multi-omics data integration, but many challenges remain, including high-dimensional data with small sample sizes, noisy and missing data, and biological interpretability. We are interested in developing new machine learning methods that leverage multi-omics data to tackle key challenges in cancer and microbial biology.

Although many drugs and treatment options are available for cancer, treatments often fail due to intratumour heterogeneity, metastasis, and drug resistance. Developing new cancer drugs also remains very expensive and time-consuming. Drug repurposing is a cost-effective way to provide patients with affordable and effective individualized treatments. We are interested in developing new machine learning methods to identify effective drug combinations and to make personalized drug recommendations.


Reconstructing and analyzing genome graphs

SVs often alter large genomic regions and play an important role in both species evolution and cancer development. However, the complete landscape of SVs in genomes has been understudied because of technical limitations, and it is only gradually being revealed by new sequencing techniques such as long-read sequencing.

An example of reconstructed cancer genome graphs

Because SVs are extremely varied, a graph-based genome representation provides a natural way to analyze them, but the utility of these graphs has not been fully exploited. We are interested in developing new approaches to better understand the patterns and mechanisms of SVs with genome graphs.
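As a minimal sketch of why a graph is a natural fit, the example below stores genome segments as nodes and joins between them as edges, then labels each non-reference join by how it departs from the reference order. Everything here is an assumption made for illustration: the `Adjacency` class, the segment names, and the `classify` rule, which ignores strand orientation (so inversions are not modelled) and joins between chromosomes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adjacency:
    left: str           # segment whose end is joined
    right: str          # segment whose start is joined
    is_reference: bool  # True if the join exists in the reference genome


# Hypothetical chromosome split into four consecutive segments.
segments = ["A", "B", "C", "D"]
adjacencies = [
    Adjacency("A", "B", True),
    Adjacency("B", "C", True),
    Adjacency("C", "D", True),
    Adjacency("A", "C", False),  # skips segment B
    Adjacency("C", "B", False),  # joins the end of C back to the start of B
]


def classify(adj, order):
    """Label a join by comparing segment positions in the reference order."""
    if adj.is_reference:
        return "reference"
    i, j = order.index(adj.left), order.index(adj.right)
    if j > i + 1:
        return "deletion-like"     # intermediate segments are skipped
    if j <= i:
        return "duplication-like"  # the path re-enters an earlier segment
    return "unclassified"


labels = [classify(adj, segments) for adj in adjacencies]
print(labels)
```

Real cancer genome graphs also carry copy numbers on segments and orientations on joins, which this toy deliberately leaves out.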