Data Visualization Discovery Platform

Project Description

This past summer, we (a collaboration between Bucknell and my colleague Dr. Lane Harrison at WPI) built a website, sightlinevis.com, that collects the metadata of data visualizations as people encounter them on the internet. Although it has been live for less than a month, we are using this crowdsourced data to construct a database that is growing by more than 100 data visualizations a week (650 as of August 10). Each visualization URL has a number of interesting data properties associated with it, collected by IBM’s Alchemy API: URL, title, authors, keywords, taxonomy information, concept information, and even emotion analysis. We also have data about how many of our users have visited each visualization and which other visualizations each user has visited.

We would like to leverage this growing store of unique data. Maybe we can construct a recommendation system that suggests data analyses based on the content of a user’s current website (for example, when a user is reading an article about the Election, we suggest interactive visualizations by the NYTimes and 538 on the topic). This raises the question of what it means for visualizations to be similar. Maybe we can suggest data visualizations that may be of interest to a user given their browsing history. Maybe we can cluster and organize our existing data visualizations in an interesting way to promote discovery. We now have data on the visualizations an individual visits over days, weeks, and (soon) months. How can we use this information to improve their experience of data visualization on the web? There are a lot of options here, and the project is very open-ended.
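To make the similarity question concrete, here is a minimal sketch of two possible starting points: content similarity from keyword overlap (Jaccard index) and behavioral similarity from co-visits. Everything in it is an assumption for illustration: the record layout, the function names, and the sample URLs and user ids are hypothetical and do not reflect the actual Sightline schema or data.

```python
from collections import Counter

# Hypothetical sample data, NOT real Sightline records.
# Keyword sets per visualization, as a keyword extractor might return them.
keywords = {
    "viz/election-forecast": {"election", "polls", "forecast"},
    "viz/electoral-map": {"election", "map", "states"},
    "viz/climate-trends": {"climate", "temperature", "trend"},
}

# Hypothetical visit history: user id -> set of visualizations visited.
visits = {
    "u1": {"viz/election-forecast", "viz/electoral-map"},
    "u2": {"viz/election-forecast", "viz/electoral-map", "viz/climate-trends"},
    "u3": {"viz/climate-trends"},
}


def jaccard(a, b):
    """Intersection size over union size; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_by_content(target, keywords, top_n=3):
    """Rank other visualizations by keyword overlap with the target."""
    scores = [
        (other, jaccard(keywords[target], kw))
        for other, kw in keywords.items()
        if other != target
    ]
    # Highest score first; ties broken alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


def co_visited(target, visits, top_n=3):
    """Count how often other visualizations share a visitor with the target."""
    counts = Counter()
    for seen in visits.values():
        if target in seen:
            counts.update(seen - {target})
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))[:top_n]


if __name__ == "__main__":
    print(similar_by_content("viz/election-forecast", keywords))
    print(co_visited("viz/election-forecast", visits))
```

The two signals answer different questions: keyword overlap works for a visualization nobody has visited yet, while co-visits capture relationships the metadata misses. Which one (or which combination) is appropriate is exactly the kind of design decision left open to the team.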

Goals

The goal is simply to leverage the information that sightlinevis.com has collected. We think we’re sitting on a unique dataset, and we would like to use it to empower everyday people to understand the complexities of 2016 through data visualization.

Impact

Data visualization is a big buzzword right now because people believe it is a significant tool for facilitating comprehension of complex data (and the complex problems we’re dealing with in 2016). While there are thousands of data visualizations on the internet, no one has leveraged the internet as an ecosystem … increasing access to the diversity of data analyses on any given topic. We think that any compelling tool that moves in this direction could see significant use by visualization practitioners (at least) and, we hope, more broadly by everyday people. Dr. Harrison and I would also hope to publish a paper on any tool developed and include the student team as co-authors.

Constraints

Whatever is built, it should be launched publicly.

Resources

The team will need access to Sightline’s PostgreSQL database (which I can provide). Depending on the project, it might also make sense to give the team access to the sightlinevis.com domain and hosting.