We have an active summer in store. Three students are working on entirely different research projects:
- Son Pham is investigating the use of Deep Learning for protein sequence classification. Deep Learning has recently gained substantial recognition due to its success in automated image recognition and speech classification. Very few have examined its use in bioinformatics. Son will help me explore this untapped area.
- Jason Hammett will be applying data mining techniques to years of regional climate data, including local stats for the Susquehanna River, to develop explanatory and predictive models for anomalistic weather events around the Susquehanna River Valley.
- Robert Cowen will be continuing the wonderful work on word prediction that I started with Bucknell student Stephanie Gonthier last year. Robert will be collaborating with me and with speech pathologists at Geisinger Research to develop a preliminary version of a new augmentative and alternative communication (AAC) app that will use my word prediction model. This first version will be developed to run on Android tablets.
- Rachel Ren is graciously staying for a month after graduating to help submit a paper based on the extensive work she completed for her honors thesis. Stay tuned!
Rachel Ren successfully defended her honors thesis, titled "Predicting Protein Contact Maps by Bagging Decision Trees". Congratulations, Rachel! Additionally, Rachel will be attending graduate school at Columbia University starting in the fall, where she will pursue a master's degree in Computer Science. Rachel intends to focus on research in machine learning.
Congratulations, Rachel! Bucknell is proud of you! We wish you the very best as you pursue your graduate work.
Stephanie Gonthier received well-deserved recognition on Bucknell's news pages highlighting student research:
Stephanie has also been recognized for her work in the College of Engineering newsletter:
Summer is winding down. My sabbatical is nearly complete. I wish to express sincere thanks to Geisinger Health System for their generous support of my research endeavors this past year. In particular, Dr. Gerard Tromp has been an invaluable source of knowledge in many areas, and an absolute pleasure to collaborate with.
- August 5, 2014 - The 4th annual Susquehanna Valley Undergraduate Research Symposium was held today at the Henry Hood Center for Research at Geisinger. As last year, four institutions participated (Bucknell, Susquehanna U., Bloomsburg U., and students and clinicians at Geisinger Research). There were 86 abstract submissions for poster presentations this year, up from 67 last year! Of those, only three were chosen through a blind review process for an oral presentation and awarded a monetary prize. I am proud to report that our own Stephanie Gonthier '15, a Computer Science and Linguistics double major, was one of the three chosen out of 86 submissions! Her work is titled "Using Statistical Learning to Improve Word Prediction for Augmentative and Alternative Communication". We used data mining methods on a large corpus of crowdsourced text to develop a word prediction model that can be used in devices for people with severe speech or language impairments.
- Rachel Ren '15 had a short paper and poster accepted into ACM BCB, which is ACM's international conference on bioinformatics and computational biology. Congratulations, Rachel!
- I will be teaching my data mining class again for this coming Fall 2014 semester. The course was very popular in Spring 2013. To address some questions I'm already receiving:
- The course has an official number now: CSCI 349
- I am reserving two rooms: BRKI 164 (Computer Lab) and BRKI 166. The primary room will be 166. However, I will hold several classes in the lab once we start using the various tools required for the course.
- I am not screening enrollments this time. The course is open to students on a first-come, first-served basis, assuming prerequisites have been met.
- CSCI 311 is a prerequisite. Students will be required to intimately understand and implement tree- and graph-based data structures, and to understand their associated algorithms beyond what is taught in CSCI 204.
- I cannot increase enrollment size.
- Yes, the course will be taught at 8am. Sorry! Please arrive with coffee for me. If there are extras, I'll share with the class.
- I work to improve the course each time I teach it. I cannot guarantee what I will be using until the semester starts. However, as of right now, I plan to teach the course using the R programming language and the Weka machine learning / data mining software. I am currently investigating some other tools in Python that might prove quite useful.
- I still plan on using the Data Mining book by Han, Kamber, and Pei. Other material online will be used as needed.
- Charles Cole '14 won an award at the annual Susquehanna Valley Undergraduate Research Symposium held on Tuesday, August 6, 2013. He was recognized for his work on analyzing HIV genomic sequence data. He was one of three chosen for the award, out of 67 total submissions. (He was also recognized in the Daily Item!) Congratulations, Charles!
- Charles Cole and Brigitte Hofmeister are both presenting a poster for their work at ACM BCB 2013, in Washington DC. (Details are on my student research page.) Congratulations to both of them for great work this summer!
- Rachel Ren '15 was profiled on Bucknell's Facebook page on August 7, 2013, for the research she started with me this summer on protein contact map prediction. I look forward to more great work with Rachel this coming year.
- I gave a talk at the ASEE 2013 conference, titled "Teaching Data Mining in the Era of Big Data." (Paper can be found here.) The conference was held in late June 2013 in Atlanta, GA.
Student Research Projects
For more information about current and former student research projects, please see my student research page.
- Bioinformatics projects are available each semester! I am looking
for solid computer science students with strong programming skills
to work on some interesting research projects in data mining, mostly
in bioinformatics. There are numerous opportunities for publication
or conference posters and/or presentations. An introductory background in biology (especially
molecular biology) is a plus, but not required. A solid background
in probability and statistical analysis is a plus. Tool development
projects will ideally culminate with delivery of a desktop or
web-based application for public community use. All projects will
culminate with a paper or poster reporting results.
- Data mining projects - I'm interested in
working on other data mining projects outside of the bioinformatics
realm. If you have an idea, let's discuss it!
- My work
SCHEDULE is now available online.
Official office hours are posted there. I encourage all students
to use them. If you cannot make any of my office hours, e-mail
me for an appointment.
- Faculty profile on Bucknell's main site (released on Sept. 27, 2010):