Overview of Problem: Comcast’s Internet customers use the Comcast DNS to translate names like www.google.com into IP addresses. Comcast is one of the largest DNS platforms in the world (larger than Google Public DNS, for example) and receives more than 100 billion queries per day. Until the advent of “big data” systems, it was not economically feasible to store all of these queries in order to analyze them. Now that this has changed, students will work with Comcast engineers to design and deploy a massive DNS big data system.

Just as importantly, students need to determine what questions to ask of the data, and what other data to merge with it, to come up with compelling insights into how customers of the largest ISP in the U.S. are using the DNS, and thus the Internet.

Questions that such a system might answer:
• What are the top sites that customers visit, and how does that change by time of day? How do these change over time (e.g., increasing or decreasing traffic to netflix.com or hulu.com)?
• Can you combine the DNS data with domain categorization data (gambling, games, video, sports, news, social media, etc.) to group DNS data into categories?
• Are sites that use tracking cookies also using highly personalized fully qualified domain names (FQDNs) in order to enhance user tracking?
• Can this data be combined with malware / abuse data to identify spam and malware-hosting sites?
• If a domain name is blocked for some reason (e.g., seized by DHS), how rapidly do users discover and access alternative domain names?
• How do Content Delivery Networks (CDNs) used by Akamai, Amazon, Facebook, Google, Netflix, and others use DNS records differently than other sites? Do the records change more often? Do they vary geographically?
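
As a rough illustration of the first question (top sites by time of day), the following is a minimal Python sketch rather than a proposed design. It assumes a hypothetical log format of (Unix timestamp, queried name) pairs, buckets queries by UTC hour, and approximates the site as the last two labels of the name. The function names and the sample records are invented for illustration; a real system would run this kind of aggregation on a distributed data store and would need a public suffix list to handle names such as example.co.uk correctly.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone


def base_domain(qname):
    """Naive approximation of the site: keep the last two labels of the name.

    This is an assumption for illustration; it is wrong for suffixes such as
    co.uk, where a public suffix list would be needed.
    """
    labels = qname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def top_domains_by_hour(records, n=3):
    """Count queries per base domain for each UTC hour of the day.

    records: iterable of (unix_timestamp, queried_name) pairs (hypothetical format).
    Returns {hour: [(domain, count), ...]} with the n most queried domains per hour.
    """
    counts = defaultdict(Counter)
    for ts, qname in records:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        counts[hour][base_domain(qname)] += 1
    return {hour: c.most_common(n) for hour, c in sorted(counts.items())}


# Hypothetical sample log, not real Comcast data.
sample = [
    (0, "www.google.com."),
    (60, "mail.google.com."),
    (120, "www.netflix.com."),
    (3600, "www.hulu.com."),
    (3660, "www.netflix.com."),
    (3720, "api.netflix.com."),
]

result = top_domains_by_hour(sample)
for hour, ranking in result.items():
    print(hour, ranking)
```

Joining the same per-domain counts against a categorization table (the second question) would follow the same pattern, with the category in place of the base domain as the counting key.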
Information on Prior Work / Other Information: None

Resources Needed to Complete Work: Work with Comcast to determine how to design & create a big data store, work with them to specify a list of what equipment to buy (max budget $100,000), and work to establish remote VPN access to build queries/reports/access data.
