Listed below are several vignettes related to my contributions to science, in reverse chronological order. Please click here for a full list of publications.
I took the opportunity to participate in a large consortium, the Human Microbiome Project (HMP), whose aim was to identify what constitutes a “normal” human microbiome. By sequencing microbial communities from 300 individuals sampled at multiple body sites, the consortium described the range of normal flora in a large cohort. This study of healthy individuals can now serve as a reference point for studies of the human microbiota and its association with disease. Involvement in the HMP led to my joining the lab of one of the project’s PIs, Dr. George Weinstock, as Director of Microbial Genomics Computing. I led a team that provided informatics infrastructure for research analysts and augmented the analytic capabilities of the lab. Areas of applied research included microbial assembly and annotation, comparative genomics, automating analytic workflows, R&D support, and updating and extending older pipelines. Administrative duties included overseeing data submission to archives/repositories, web site development, and grant writing. I also interfaced with PIs and colleagues in clinical collaborations, which ultimately led to the formation of a new Pediatric Microbial Genomics group in the Department of Pediatrics, and to my faculty position there. The Pediatric Microbial Genomics group focuses on improving medical practice through the application of high-throughput multi-omics data and associated computational methods in the clinic. My colleagues and I have recently developed a comprehensive targeted sequence capture panel called ViroCap, designed to enrich nucleic acid from DNA and RNA viruses. This tool will greatly enhance the study of eukaryotic DNA and RNA viruses and takes us closer to using high-throughput sequencing as a comprehensive viral diagnostic tool.
- Enhanced virome sequencing using targeted sequence capture. PMID: 26395152
- Development and Evaluation of an Enterovirus D68 Real-Time Reverse Transcriptase PCR Assay. PMID: 26063859
- Sepsis from the gut: the enteric habitat of bacteria that cause late-onset neonatal bloodstream infections. PMID: 24647013
- Human Microbiome Project Consortium. A framework for human microbiome research. PMID: 22699610
- Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. PMID: 22699609
My analytical work on early next-generation sequencing data provided the background needed to manage a small group of bioinformaticians in the context of technology development. As the Manager of Application Programming and Development for the Technology Development group under Dr. Elaine Mardis, I oversaw several areas of bioinformatics development: performing ad hoc and high-priority analyses; developing analysis workflows and pipelines (e.g. sncRNA, RNA-seq, and targeted sequencing analysis automation); evaluating and giving feedback on data produced by early-access and cutting-edge technologies; and contributing writing and data visualizations to presentations, publications, and grants. I contributed analyses, experience, and suggestions to the sequencing and publication of the first human cancer genome, that of a patient with acute myeloid leukemia (AML), as well as to subsequent projects analyzing sequence data from human cancers. Much of my group’s effort was in the emerging area of cancer genomics. My group formulated approaches for assessing custom and whole-exome targeted sequence capture platforms, while troubleshooting early-access capture products. We also engineered workflows for comparing coverage and efficiency among multiple commercial exome capture kits. My group was responsible for providing analysis support and tool development for AML methylation studies. I collaborated with a team of oncologists and research biologists to formulate an approach to miRNA generation and analysis from cancer patient specimens. My team also provided analysis, insight, and workflows for an innovative, exploratory project aimed at creating personalized breast cancer vaccines based on epitope binding efficacy, as predicted in silico from underlying SNV mutations from tumor/normal pairs.
- DNMT3A mutations in acute myeloid leukemia. PMID: 21067377
- Next-generation sequencing identifies the natural killer cell microRNA transcriptome. PMID: 20935160
- Recurring mutations found by sequencing an acute myeloid leukemia genome. PMID: 19657110
- DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. PMID: 18987736
The advent of next-generation sequencing (NGS) technologies dramatically increased genomic data throughput well beyond the capabilities of the computational tools current at the time. Against this backdrop, I was a charter member of a team of “CompBio” analysts responsible for handling and assessing NGS data. Our foci included: reviewing, testing, and applying cutting-edge sequence alignment algorithms; developing coverage modeling algorithms for NGS alignment data; variant calling; and proposing best practices for handling and assessing high-throughput data, as required by emerging NGS technologies. Our group was one of the first to sequence and evaluate a multicellular organism (C. elegans) using NGS technology. During this time, I greatly increased my proficiency in creating, analyzing, and manipulating large data sets, while expanding my understanding of core bioinformatics algorithms, applications, and data repositories. Our team also developed computational methods for genome assembly appraisal (e.g. platypus, macaque), as well as software to guide draft genome assembly with cDNA sequences. Many of the approaches formulated to handle NGS data would go on to be incorporated into the center’s automated analysis pipeline, the Genome Modeling System. Several of my CompBio colleagues went on to found Cofactor Genomics.
- Genome Modeling System: A Knowledge Management Platform for Genomics. PMID: 26158448
- Genome analysis of the platypus reveals unique signatures of evolution. PMID: 18464734
- Whole-genome sequencing and variant discovery in C. elegans. PMID: 18204455
- Evolutionary and biomedical insights from the rhesus macaque genome. PMID: 17431167
The Human Genome and Model Organisms
My earliest work in genomics involved sequencing and analyzing expressed sequence tags (ESTs) from multiple model organisms, e.g. human, mouse, chicken, soybean, zebrafish, toxoplasma, and numerous parasitic and free-living nematode species. Under Dr. Marco A. Marra, I contributed to the management of a research laboratory (15+ technicians) specializing in large-scale cDNA sequencing, at the time the largest provider of open-access EST data in the world. During this time, I became fluent in Unix/Linux operating systems and associated programming languages, as well as core bioinformatics applications. I helped define the analysis pipeline and methods used to process ~6 million ESTs from over 90 species and co-authored over 2.3 million ESTs submitted to NCBI’s dbEST repository. I filled a key analysis role as a member of the Parasitic Nematode EST Project team, which entailed the sequencing, cataloging, and comparative analysis of ESTs from over 30 free-living and parasitic nematode species. I was lead software engineer for Nematode.net. Designing this site combined many disparate aspects of my skill set: biology, database administration, programming, and web design. I also implemented the NemaPath software, which reports the presence, absence, and composition of enzymatic pathways in a given organism based on transcript data, while providing higher-order comparisons across clades and hosts. These activities were formative in building skills and knowledge related to genomics, bioinformatics, and transcriptome analysis.
- NemaPath: online exploration of KEGG-based metabolic pathways for nematodes. PMID: 18983679
- Nematode.net: a tool for navigating sequences from parasitic and free-living nematodes. PMID: 14681448
- A compilation of soybean ESTs: generation and analysis. PMID: 11962630
- An encyclopedia of mouse genes. PMID: 9988271