Matthew Segar, ’12

Project: A probabilistic method for assembly of next generation sequencing instrumentation
Duration: Summer 2011 – Spring 2012
Funding: Bucknell PUR, Provost’s Office, CS Dept. Funds

ABSTRACT

Cheaper and faster DNA sequencing technologies have greatly changed assembly methods. Instead of outputting reads that are thousands of base pairs long, new sequencers parallelize the task and produce millions of short reads between 35 and 400 base pairs in length. Reconstructing an organism’s genome from these millions of reads is a computationally expensive task. Our algorithm addresses this problem by organizing and indexing the reads using n-grams, which are short, fixed-length DNA sequences of length n. These n-grams are used to efficiently locate putative read joins, thereby eliminating the need to perform an exhaustive search over all possible read pairs. Our goal is to develop a novel n-gram method for the assembly of genomes from next-generation sequencers. Specifically, a probabilistic, iterative approach is used to determine the most likely reads to join, through development of a new metric that models the probability of any two arbitrary reads being joined together. Tests were run using simulated short-read data based on randomly created genomes ranging in length from 10,000 to 100,000 nucleotides with 16x to 20x coverage. We have been able to successfully reassemble entire genomes up to 100,000 nucleotides in length.
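
The n-gram indexing idea in the abstract can be illustrated with a small sketch. This is not the thesis implementation: the function names, the choice of n, and the exact suffix/prefix check are assumptions made for illustration, and the probabilistic join metric described above is not reproduced here because the abstract does not specify it. The sketch only shows how an n-gram index narrows the search to read pairs that share at least one n-gram, rather than comparing every pair.

```python
from collections import defaultdict
from itertools import combinations


def build_ngram_index(reads, n):
    """Map each n-gram to the set of read ids that contain it (hypothetical helper)."""
    index = defaultdict(set)
    for read_id, read in enumerate(reads):
        for i in range(len(read) - n + 1):
            index[read[i:i + n]].add(read_id)
    return index


def candidate_pairs(reads, n):
    """Return pairs of read ids that share at least one n-gram.

    Only these pairs need to be examined as putative joins, which avoids
    an exhaustive comparison of all possible read pairs.
    """
    pairs = set()
    for ids in build_ngram_index(reads, n).values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def suffix_prefix_overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b.

    Returns 0 if no overlap of at least min_len exists (min_len >= 1).
    This exact-match check stands in for the probabilistic metric,
    which is not described in enough detail to reproduce.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0
```

For example, with the toy reads "ACGTAC", "TACGGA", and "TTTTTT" and n = 3, only the first two reads share an n-gram, so only that one pair is checked for an overlap.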

ACHIEVEMENTS

  • Honors Thesis: A probabilistic method for assembly of next generation sequencing instrumentation
    • Matt was awarded the Harold W. Miller Prize, a competitive university-wide award given at graduation to one or two students who complete a highly successful honors thesis. CONGRATULATIONS, MATT! The award was well-deserved.
  • Poster (International Conference) – Presented at the 20th Annual International Conference on Intelligent Systems for Molecular Biology, ISMB 2012, July 15-17, Long Beach, CA
  • Poster – Susquehanna Valley Undergraduate Research Symposium, August 9, 2011, Geisinger Research, Danville, PA
  • Poster – Sigma Xi Summer Research Symposium, July 27, 2011, Bucknell University, Lewisburg, PA

POST GRADUATION UPDATES

Matt completed a master’s in bioinformatics at Indiana University – Purdue University Indianapolis in Spring 2014. He has now been accepted into the School of Medicine at Indiana University.