Broadly speaking, my research focuses on developing computational approaches that translate genomic data into biological insights. In particular, I focus on the analysis of DNA sequencing data, primarily derived from microbial communities. I have also worked on, and continue to be interested in, the reconstruction of individual bacterial or eukaryotic genomes.
In terms of methods, my research deals primarily with sequence (string) algorithms and graph algorithms, though we also draw on techniques from statistical inference, machine learning, and software engineering.
I am particularly interested in research that leads to the development of software tools usable by biologists, or that provides insights into the context within which certain computational approaches are most effective.
Several broad themes of my research are described below.
Uncovering diversity within microbial communities. An often overlooked area of microbiome research is the study of differences between strains of the same organism within a sample. This area is overlooked in part because it is very difficult to reconstruct individual organisms from a metagenomic mixture, let alone to determine whether multiple variants of these organisms are present. My lab tackles this problem through careful analysis of genome assembly graphs, including through new techniques for visualizing these graphs.
Taxonomic/functional identification of organisms. The output of a metagenomic experiment (sequencing of the DNA in a microbial community) comprises many small fragments of DNA. Even after additional processing, such as sequence assembly or clustering, researchers need to decode these strings of As, Cs, Gs, and Ts to determine which organisms are present in the sample (taxonomic identification) and what those organisms may do (functional identification). My lab is exploring combinations of database search and statistical inference to improve both the speed and the accuracy of taxonomic and functional classification.
Validation. Increasingly, the conclusions of our work rest on results that complex software systems generate from massive datasets. How do we know that these results are correct, especially when we cannot verify them by hand? This question underlies all the research conducted in our lab. While we focus primarily on evaluating the output of genome assembly algorithms, we are also tackling the problem of software and result validation in a broader context.