Updating CSCI 205

CSCI 205 has been a highly successful course for our majors. It is a lot of work for students, and likewise for the instructors who teach it (myself and Prof. Chris Dancy). The rewards have been plenty, as the course teaches a lot. However, much of the material had become out of date. The course was still relying on Java 7 and used NetBeans.

I’ve taken quite a bit of extra time this semester to update some of the course. As of Fall 2019, the course has been updated in the following ways:

  • We are now using IntelliJ IDEA
  • The course has been updated to Java 12
  • Many videos have been re-recorded to address the updated content, including:
    • Heavier emphasis on lambda expressions than ever before
    • Teaching more java.nio (including NIO.2, java.nio.file) along with java.io
    • Added new material on socket programming with java.net
    • Added new material on multithreading and concurrency in Java
    • Introduced the Java Stream API (not to be confused with the I/O streams)
  • The final project has been overhauled. Every student now must use GitLab for their Scrum task board and sprint management. (This has worked surprisingly well!)
  • Expanded the JavaFX material 

It’s a start. There is a lot to be done still.

In a recent discussion, Chris Dancy expressed significant interest in incorporating elements of engineering social justice into the course. This reflects a broader move by the engineering community at large to help our students recognize the impact that their choices have on humanity. I am at fault here. Like many of us, I focus on the goals without teaching my students the impacts their choices have. Well, that’s not entirely true. I discuss impact: computational resource impact. That’s not enough. I do not give enough attention to social, moral or ethical impact. So, I believe this will be the next set of revisions we make to the course. Chris will likely start on some of those changes in Spring 2020, but we’ll likely make a more substantial effort to incorporate this into the project over the summer. It’s time for engineers to treat people as more important than profits.

Posting from MarsEdit

I’m still slacking on my goal of doing a better job keeping a presence online. I need massive simplicity, and as much as I dig WordPress, I don’t find the workflow intuitive. So, on my quest to find the simplest, easiest editor that will let me publish posts quickly, with relatively rich content, I continue to stumble around. Nothing out there does exactly what I want. And frankly, this is nowhere near a high priority. Blogo was easy to use, but it seems to have died; I’m not sure what happened to it. IFTTT supports an automatic hook that publishes a new Evernote note as a WP post, but not to a self-hosted WP blog like this one. I am close to just giving up and editing directly in the WP interface. Given the great functionality that the Gutenberg project has given WP users with Blocks, that’s not really a bad option. My main frustration is dealing with media files, mostly images.

So, I’m checking out MarsEdit, available on the App Store. Do not let the free download fool you. It’s free with complete features for 14 days; then your ability to publish content is disabled. To continue, you must pay $49 for a full license.

Here’s the interface running on my Mac in Dark Mode. You can see it fully supports Dark Mode in Catalina:

There are also options to edit your slug and tags, select the WP categories this post is assigned to, select your featured image, and set server options such as post status, author, whether comments are closed, etc. Overall, it seems quite simple. But I mostly care about editing. You can see above it’s a basic, functional rich-text editor. It supports the most common formatting commands.

Yet, we know that WP has made a substantial commitment to Gutenberg, its new Blocks editor. So, what happens when you publish a brand new post? It comes up in “classic” mode when you open the post in WP:

I can attempt to convert my post to Blocks…

but sometimes it results in no change and leaves the post in classic mode. Other times, it does indeed work. I’m not certain what prevents a post from converting to separate blocks, though this is not a big deal to me.

At first glance, this tool seems a bit pricey. However, the workflow I need to quickly publish updates from my Mac with minimal effort is definitely there. I might adopt this. Why? There are two things I rely on extensively when writing documentation: quick screen captures and quick little GIF animations. Having to save a file, upload it to my Media library, and then reference it is an absolute pain.

I’ll try this for a bit…

Past Research Projects

The following are research projects that, for one reason or another, fell down the priority list and are no longer being actively worked on. I list them here as possible conversation starters with students looking for interesting work.

  • [IN PREP] Cowen R, Mitchel MW, Hare-Harris A, King BR. Incorporation of Brown’s stages of syntactic and morphological development in a word prediction model of conversational speech from young children.
  • [IN PREP] Cowen R, Mitchel MW, Hare-Harris A, King BR. An adaptive n-gram based stochastic word prediction model for conversational speech.
  • [IN PREP] Hare A, Essae E, King BR, Ledbetter DH, Martin CL. Determining the dosage effect of copy number variants in the human genome.
  • [IN PREP] Ren C, King BR. Protein residue contact map prediction using bagged decision trees.

Current Student Research

These are ongoing projects as of Summer 2019.

Bhagawat Acharya ’20 – Using deep learning for handwritten text recognition

  • This is a collaborative, interdisciplinary project with Katherine Faull (Comparative Humanities and German Studies) and Carrie Pirmann (Research Services Librarian). We are working together to develop an improved handwriting translation pipeline to increase the handwritten text recognition (HTR) throughput for 17th- and 18th-century Moravian handwritten literature held in the Moravian archives.
  • Funding – Bucknell Emerging Scholars Summer Research Program

Taehwan Kim ’20 – Using Deep Learning to Forecast Monthly Extreme Temperatures over the United States

  • Undoubtedly, climate change is one of the most pressing, disconcerting issues of our time. Collaborating with atmospheric and aerosol science expert Dabrina Dutcher, Assistant Prof. in Chemistry and Chemical Engineering, we are exploring the use of deep learning to develop advanced models that can improve future temperature predictions.
  • Funding – Katherine Mabis McKenna Environmental Internship

Lily Romano ’20 – Software for Aerosol Analysis

  • We are developing a new software toolkit to aid in the aerosol research of my colleagues in Chemical Engineering, Dabrina Dutcher, PhD and Timothy Raymond, PhD. Lily is resuming work that was initiated by former student Khai Nguyen ’18 on the software, including advancing the data analysis tools available for aerosol researchers.
  • Funding – Clare Boothe Luce Research Scholars Program

Kartikeya Sharma ’20 – Trajectory Gaze Path Analysis on Eye Tracking Data for Autism Spectrum Disorder Studies

  • This is a collaborative project with my colleagues, Vanessa Troiani, PhD and Antoinette Sabatino DiCriscio, PhD at the Geisinger Autism and Developmental Medicine Institute. The primary aim is to develop a toolkit for the eye tracking research community that incorporates my novel method for extracting scanpath trends from group-level eye tracking data.
  • Funding – Ciffolillo Healthcare Technology Inventors Program

Yili Wang ’21 – Using deep learning to identify discriminative features of images with high interest of autistic children

  • This is a collaborative project with my colleague Vanessa Troiani, PhD at the Geisinger Autism and Developmental Medicine Institute. It is also a continuation of a project with former student Tongyu Yang ’17, who is continuing to assist with the effort.
  • Funding – Bucknell Program for Undergraduate Research (PUR)

Including a Jupyter Notebook file on WordPress

I’ve been exploring different mechanisms to post Python Jupyter notebook files on WordPress. Of course, I can use nbconvert to convert my notebook files to other formats, including HTML, right from the command line. I can then post the result as an embedded HTML block in a WordPress post. However, this sounded like an unnecessary step, since I also wanted the notebook to be available on GitHub. I did not want to deal with generating this HTML file AND managing a published notebook on GitHub as well. That smells a lot like duplicated effort and wasted time. Thanks to a great WordPress plugin from Andy Challis, called nbconvert, I was able to achieve what I wanted! See his page at https://www.andrewchallis.co.uk/portfolio/php-nbconvert-a-wordpress-plugin-for-jupyter-notebooks/ for complete instructions.
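Part of why conversion is so easy is that an .ipynb file is just JSON. The usual command-line route is something like `jupyter nbconvert --to html hierarchical.ipynb`. To show the idea only, here is a rough, stdlib-only sketch (not how nbconvert or the plugin actually work) that assumes the nbformat 4 layout, where `cells` is a list of objects with `cell_type` and `source`. The helper name `notebook_to_html` is hypothetical; it ignores outputs, images and real Markdown rendering.

```python
import html
import json


def _source_text(cell):
    """nbformat 4 stores a cell's source as either a string or a list of lines."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def notebook_to_html(nb_json: str) -> str:
    """Hypothetical helper: render a notebook's cells as bare-bones HTML.

    Assumes the nbformat 4 layout. Outputs, images and Markdown formatting
    are ignored here; nbconvert handles all of that properly.
    """
    nb = json.loads(nb_json)
    parts = ['<div class="notebook">']
    for cell in nb.get("cells", []):
        text = html.escape(_source_text(cell))  # escape <, >, &, quotes
        if cell.get("cell_type") == "code":
            parts.append(f'<pre><code class="python">{text}</code></pre>')
        else:
            # Markdown and raw cells are shown as plain paragraphs here.
            parts.append(f"<p>{text}</p>")
    parts.append("</div>")
    return "\n".join(parts)


if __name__ == "__main__":
    sample = json.dumps({
        "nbformat": 4,
        "nbformat_minor": 2,
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Hierarchical Clustering"]},
            {"cell_type": "code", "metadata": {}, "execution_count": 8,
             "outputs": [], "source": ['d = pdist(X, metric="euclidean")']},
        ],
    })
    print(notebook_to_html(sample))
```

Even this toy version makes the duplication problem obvious: every edit to the notebook means regenerating and re-pasting the HTML, which is exactly what the plugin avoids.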

  1. If you haven’t yet, install WP Pusher as a plugin on your WordPress site. (See my post on WP Pusher for more info.)
  2. Go to his web page for nbconvert and copy the custom CSS code displayed on the page.
  3. Go to your WordPress dashboard and add that custom CSS under Appearance -> Customize -> Additional CSS.
  4. Go to https://github.com/ghandic/nbconvert and verify the latest instructions. Install the nbconvert shortcode plugin through WP Pusher. Activate it.
  5. That’s it!

Follow the instructions to include your own Jupyter notebook file available on GitHub.
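The shortcode takes an ordinary GitHub "blob" page URL, but what a renderer actually needs is the raw .ipynb JSON. I have not checked how the nbconvert plugin fetches it, so treat this as an assumption; the usual mapping is from `github.com/<user>/<repo>/blob/<branch>/<path>` to `raw.githubusercontent.com/<user>/<repo>/<branch>/<path>`. A minimal sketch, with a hypothetical helper name:

```python
from urllib.parse import urlparse


def github_blob_to_raw(url: str) -> str:
    """Hypothetical helper: map a GitHub 'blob' URL to its raw-content URL.

    https://github.com/<user>/<repo>/blob/<branch>/<path>
      -> https://raw.githubusercontent.com/<user>/<repo>/<branch>/<path>
    Assumes the branch name contains no '/'.
    """
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    if parsed.netloc != "github.com" or len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    user, repo, _blob, branch, *path = parts
    return "https://raw.githubusercontent.com/" + "/".join([user, repo, branch, *path])


if __name__ == "__main__":
    print(github_blob_to_raw(
        "https://github.com/bkingcs/python_snippets/blob/master/clustering/hierarchical.ipynb"))
```

Branch names that contain slashes would need a smarter split; the sketch rejects anything that doesn't look like a blob URL rather than guessing.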

Example

Here is an example. In a standalone text (or paragraph) block, I included the following shortcode:

[nbconvert url="https://github.com/bkingcs/python_snippets/blob/master/clustering/hierarchical.ipynb" /]

This generates the following:

Hierarchical Clustering

Example code for hierarchical clustering

In [4]:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()  # for plot styling
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage,fcluster,dendrogram, cophenet
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics.cluster import adjusted_rand_score, \
                                    homogeneity_completeness_v_measure, contingency_matrix
In [5]:
# sklearn.datasets.samples_generator was removed in newer scikit-learn releases
from sklearn.datasets import make_blobs
X, y_true = make_blobs(n_samples=60, centers=5,
                       cluster_std=(0.3,0.4,0.5,0.7,0.7),
                       center_box=(0, 8), random_state=1234)

y_true = pd.Categorical([["A","B","C","D","E"][x] for x in y_true])
df = pd.DataFrame(data={"x" : X[:,0],
                        "y" : X[:,1],
                        "target": y_true })
X = df.iloc[:,0:2]
y_true = df.iloc[:,2]
In [6]:
sns.scatterplot(x="x",y="y",hue="target",data=df )
plt.show()

Let's make our first hierarchical clustering. We'll do it piecewise, using some functions in the scipy.cluster.hierarchy package.

We start by computing a distance matrix over all of our data:

In [8]:
d = pdist(X,metric="euclidean")
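Note that `pdist` returns a condensed distance matrix: a flat vector holding only the upper triangle (pairs with i < j), so n points give n(n-1)/2 entries, and `squareform` expands it into the full square matrix. A stdlib-only sketch of that layout, with hypothetical helper names (scipy's own implementation is vectorized C):

```python
import math


def condensed_euclidean(points):
    """Pairwise Euclidean distances in pdist's condensed order: (0,1), (0,2), ..., (1,2), ..."""
    n = len(points)
    return [math.dist(points[i], points[j]) for i in range(n) for j in range(i + 1, n)]


def condensed_index(i, j, n):
    """Position of the (i, j) distance in a condensed vector over n points (i != j)."""
    if i > j:
        i, j = j, i
    # Skip the rows for 0..i-1, then offset within row i.
    return n * i - i * (i + 1) // 2 + (j - i - 1)


if __name__ == "__main__":
    pts = [(0, 0), (3, 4), (6, 8)]
    d = condensed_euclidean(pts)
    print(d)                           # [5.0, 10.0, 5.0]
    print(d[condensed_index(2, 0, 3)]) # 10.0
```

Storing only the upper triangle halves the memory, which matters once the number of samples grows well beyond the 60 used here.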

Now, let's perform the hierarchical clustering using single linkage:

In [9]:
lnk = linkage(d,method="single")
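Single linkage defines the distance between two clusters as the smallest distance between any pair of their members, and it repeatedly merges the closest pair of clusters. Here is a deliberately naive, roughly cubic-time sketch (not scipy's algorithm) that records each merge and its height, which corresponds to the third column of scipy's linkage matrix:

```python
import math


def single_linkage_heights(points):
    """Naive agglomerative clustering with single linkage.

    Returns (members_a, members_b, height) for each merge, in merge order.
    The height is the minimum pairwise distance between the two merged clusters.
    """
    clusters = [[i] for i in range(len(points))]
    dist = [[math.dist(p, q) for q in points] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: closest pair of members across the two clusters.
                h = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or h < best[0]:
                    best = (h, a, b)
        h, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), h))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
    return merges


if __name__ == "__main__":
    for merge in single_linkage_heights([(0,), (1,), (3,), (10,)]):
        print(merge)
```

On the four 1-D points above the merge heights are 1.0, 2.0 and 7.0. Single linkage tends to "chain" nearby points together like this, which is worth keeping in mind when reading the dendrogram.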

Finally, let's plot a basic dendrogram using the dendrogram function. Notice some of the options we'll use to get some more informative results:

In [10]:
# Plot the dendrogram, labeling the leaves with the actual labels in the data
plt.figure(figsize=(11,6))
plt.title("Hierarchical Clustering: Single Linkage")
plt.xlabel("sample index")
plt.ylabel("distance")
dnd = dendrogram(lnk,labels=list(y_true),leaf_rotation=0,leaf_font_size=9,
                 color_threshold=2)
plt.show()
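The `color_threshold=2` option colors links below height 2 by subtree, which roughly corresponds to cutting the tree at distance 2. For single linkage specifically, the flat clusters at a cut height t are exactly the connected components of the graph that joins every pair of points at distance at most t. In scipy you would typically get them with `fcluster(lnk, t=2, criterion="distance")`. A stdlib-only sketch of that equivalence using union-find (hypothetical helper name):

```python
import math


def single_linkage_cut(points, t):
    """Flat clusters from cutting a single-linkage tree at height t.

    Equivalent to the connected components of the graph whose edges join
    points at distance <= t. Returns a label (0, 1, ...) for each point.
    """
    parent = list(range(len(points)))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= t:
                parent[find(i)] = find(j)  # union the two components

    labels, out = {}, []
    for i in range(len(points)):
        root = find(i)
        labels.setdefault(root, len(labels))  # number clusters in order of first appearance
        out.append(labels[root])
    return out


if __name__ == "__main__":
    pts = [(0,), (1,), (3,), (10,)]
    print(single_linkage_cut(pts, 2))    # [0, 0, 0, 1]
    print(single_linkage_cut(pts, 1.5))  # [0, 0, 1, 2]
```

Comparing such flat labels against `y_true` is where the imported `adjusted_rand_score` and `contingency_matrix` come in handy.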

Moving to WordPress

So, you’re interested in contributing some code back to the wonderful Internet community. Well, your first stop should surely be GitHub. If you’ve been a student in Computer Science, or pretty much any discipline where you need some code or libraries to accomplish some task, then surely you know that 1) GitHub is your primary go-to, and 2) you’d better have set up an account and started sharing some examples of code so your future employers and collaborators can see what you do.

Some of us, however, want a platform that gives us more flexibility to be reflective and educational about the code we share, rather than simply sharing code. And, of course, there are dozens of options out there for doing that as well. I spent years managing my own site using Adobe Dreamweaver. However, it was becoming a bit overkill for what I was trying to achieve. I’ve used WordPress for various classes I teach, but not for my own endeavors. Well, as of 2019, that has changed. I narrowed my search down to Medium and WordPress, and concluded that both would have worked. However, I ultimately chose WordPress because I convinced myself that it had a bit more flexibility to do some cool things.

Plugins

At some point, I’ll start a log of plugins I use…

There is one plugin that is one of my favs…

WP Pusher

One plugin I’ll mention right off the bat is WP Pusher. While WordPress offers a pretty rich set of plugins, some developers have complained that it takes a while to get their plugins listed in the official WordPress plugin directory. Also, you’ll quickly notice that there are a lot of outdated plugins still available that may not work properly on your version of WordPress. With WP Pusher, you can instead deploy plugins directly from GitHub, Bitbucket or GitLab.

It’s quite simple to install WP Pusher:

  1. Go to the WP Pusher website.
  2. Download the zip file (wppusher.zip).
  3. Go to your WordPress dashboard, and select Plugins -> Add Plugins -> Upload Plugin. Upload your zip file.
  4. Activate it!

You should now notice that you have a new menu item available called WP Pusher.

That’s it! Find a WordPress plugin on GitHub, and readily install it using WP Pusher! All plugins you install and activate from WP Pusher will then be available and managed from your standard Plugin interface. Very cool.

My joy with ILTM

I had the joy of becoming a core faculty member of the Institute of Leadership in Technology and Management for the past two summers (Summer 2017, 2018). I found this to be one of the most transformative experiences available to Bucknell students since I’ve been here. I was honored to be part of this program. I worked with some absolutely wonderful students in ILTM! However, as a result of this opportunity, my scholarship was substantially halted for the last two summers. Thus, I have not taken on any new students for quite some time.

I was also on sabbatical during the entire 2017-18 academic year. During this time, I continued to work on interesting projects collaboratively with Dr. Vanessa Troiani at the Geisinger Autism and Developmental Medicine Institute. As much pleasure as I’ve found working in various areas of bioinformatics, I decided it was time for me to explore other areas of sequential data analysis. Dr. Troiani and her lab members have invigorated me with new opportunities in mining patterns from massive quantities of eye-tracking data. This ultimately led to another collaborative project involving Dr. Troiani and our own Prof. Evan Peck. Slowly, the research agenda is ramping back up again. I applied for five different grants; to date, one has been awarded, and a much larger one is currently under review.

I’ve also become more involved in interdisciplinary teaching and research opportunities across the university. Bucknell is now at a point where we can truly provide some very interesting transformative experiences to our students, rare opportunities that very few colleges can offer. To do so, however, we must leverage the opportunities that exist across disciplines. Thus, I’ve been intentional about seeking new opportunities outside my own department and my home college, the College of Engineering. For instance, I’ve had great joy working with my colleague, Prof. Abby Flynt, on both teaching and research projects. (We both recently received the Presidential Award for Teaching Excellence for 2018, and co-mentored a wonderful student, Alexander Murph, who completed an honors thesis and is now at UNC Chapel Hill pursuing his PhD in Statistics!)

Speaking of new, unique opportunities for interdisciplinary work, I’m looking forward to seeing new things happen with our new College of Management, where I expect some interesting collaborations with the new faculty who will be part of its new Analytics and Operations Management program. I’ve recently been serving on a committee to help them hire new faculty for this exciting program.

Of course, I can’t forget our wonderful friends in Biology, who were so instrumental in collaborating on my bioinformatics projects very early on during my pre-tenure days here. Needless to say, there are great colleagues across this university, with lots of data! It’s a rich place for a data scientist!

Sequential data mining and analysis will always remain my primary area of focus, and it’s exciting that tenure lets me afford the risk of stretching my core interests toward new areas. Fortunately, sequential data are ubiquitous. Thus, I’ve branched away from biological sequence analysis and delved into numerous other areas of sequential data. I will post updates soon.

My post tenure feelings

So, is tenure all it’s cracked up to be? Well, I’m now in the midst of my third year post tenure. Or is it my second? I don’t even know. I’m burnt out, thanks to the vicious downside of tenure: SERVICE! Once a faculty member receives tenure, it seems as though you are put on a list by the administration throughout the college and university. This list is special. I believe the title of the list is, “PEOPLE WHO WE CAN GUILT INTO SERVING ON COMMITTEES NOW THAT THEY HAVE TENURE.” This semester, I have honestly lost count of the committees and other opportunities where I have said “yes” to helping my colleagues. The result: I regularly have a minimum of 10-15 additional hours per week dedicated to service obligations, and that has recently reached 20+. Those hours are on top of my normal teaching hours in and out of the classroom, which are easily 40+ hours, and that doesn’t include my other routine duties, such as academic advising, department meetings, mandatory caffeine pursuits, and so on. (An academic has no concept of a 40-hour work week. It doesn’t exist.)

This is a huge challenge that I’m struggling with. It is due, in part, to a very young, vibrant department of faculty who are going through the tenure process. Thus, the relatively few of us who have tenure take on a lot of the service obligations to protect them as they work through tenure. And, of course, I know, I know… the real reason? Because I often find it difficult to say, “NO!” Like I said, I work at a great place with wonderful colleagues. I believe it’s important to pay it forward. I had senior colleagues who were once in my shoes and protected me from excessive service obligations, and I will do the same. The challenge is the imbalance in the department. We’ve had a lot of people retire in recent years. In time, the balance should return to normal as others get through the tenure review process and can share the service burden.

Anyway, what has me most excited? First, I’m teaching BOTH a data mining AND a data science course this Spring! Second, summer 2019 is mine! All mine! [Insert-evil-laugh-here]. I have not had a summer for research since 2016. So, I have several projects ramping up, and I am looking for new students to work with me this summer. Funding is available. Send me an e-mail if you are interested.