While we are known for our participation in structured prediction challenges, MDST has picked up at least two community projects in the last year. MDST members of all experience levels got to participate in both our efforts in Flint and our work with UMS’s ticket purchase data. Around the time that we hit milestones in both projects, news of the Bloomberg Data 4 Good Exchange call for papers reached some members of MDST and we decided to take a shot.
The results of our foray into volunteer, remote, academic paper collaboration can be found below in the form of two successfully written MDST papers! We’re incredibly proud of the results and even prouder of our membership, who worked so hard to produce such quality work.
The Michigan Data Science Team is excited to have partnered with Google and the University of Michigan-Flint to engineer a data platform and accompanying app as a part of our continued efforts to help the community of Flint. This app will provide users with information regarding key public services, such as the locations of water bottle distribution centers and instructions to request new water testing kits. Users will also be able to report concerns about the water quality at their location, and access our predictive model, which flags homes that are potentially at high risk of lead contamination.
Google.org is providing the University of Michigan-Flint a grant of $150,000 to build the platform and accompanying app. Google is also providing access to several of its engineering consultants, who will aid in producing interactive visualizations and oversee the app’s user interface design. MDST has created a multidisciplinary engineering team to oversee and manage the creation of our predictive model and data platform.
We will continue our efforts to ask and answer the data-related questions surrounding this crisis in order to provide as much value as we can to the people of Flint. We are incredibly grateful for the support from Google and for the chance to collaborate with our friends and fellow researchers at the University of Michigan-Flint campus.
Last week, we held the FARS Dataset Visualization Challenge, where teams were tasked with visualizing more than a decade of fatal traffic accident records to address the question – “What causes drunk driving accidents?”
First prize went to Team Bidiu (Chengyu Dai, Cyrus Anderson, Cupjin Huang, and Wenbo Shen), whose presentation addressed the questions: who is driving drunk, where are they driving, and when do fatal accidents occur? For their first-place finish, each member of Team Bidiu will receive a $25 gift card to Amazon.com! You can view Team Bidiu’s presentation and source code at the team’s GitHub page.
It’s my great pleasure to announce the next MDST competition! We are very fortunate to be partnering with the Michigan Institute for Data Science (or ‘MIDAS’) for this event. We will be holding the kickoff meeting this Thursday at 5:00pm in 3150 DOW. Through this partnership, we’ve been able to obtain a particularly interesting dataset. We have compiled records of every fatal car accident reported in the United States between 2003 and 2014, drawn from the Fatality Analysis Reporting System, or FARS. The challenge will be to predict whether or not a drunk driver was involved in each accident.
More information about logistics, prizes, and the dataset itself will be given at the kickoff ceremony this Thursday. Additionally, we will be awarding prizes to the winners of our last competition, the RateMyProfessor challenge, so if you won a prize, please show up to claim it.
If you have any further questions, feel free to email me at firstname.lastname@example.org
ARC-TS was kind enough to let MDST pick up the unused cores on their computing cluster, which are usually used for computationally intensive research simulations and big data algorithms. You can read more about it at this link!
2nd: The Data Miners – Alexander Zaitzeff, Ryan Sandberg
$100 Amazon Gift Card + MDST T-shirts
3rd: Arya Farahi
Congratulations are in order to DaBrain, who in the last week of the competition rocketed through the leaderboard, displacing the incumbent finalists and finishing out the competition in first place! DaBrain employed the only neural network algorithm in the competition, leveraging the power of the LSTM architecture for Recurrent Neural Networks against this document-based dataset. LSTM-RNNs have seen great experimental results on NLP tasks in recent years, making this choice of algorithm particularly exciting for the Rate My Professor Challenge.
Our second place finalists, The Data Miners, led the leaderboard for many weeks after the beginning of the challenge. The Data Miners managed an impressive number of submissions, more than double that of the next most frequently submitting team. In the end, an ensembling method combining ridge regression, random forest regression, and gradient boosting (as well as some hacky tricks beyond explanation) allowed The Data Miners to seize second place!
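For readers new to ensembling, here is a minimal sketch of the general idea: average the predictions of several models and compare the error of the blend with the error of each model on its own. This is not The Data Miners' actual code, and the names `rmse` and `blend`, the equal weights, and every number below are made up purely for illustration.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root-mean-squared error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return sqrt(squared / len(actual))


def blend(model_predictions, weights=None):
    """Weighted average of several models' predictions, item by item."""
    n_models = len(model_predictions)
    if weights is None:
        # Assumption: with no weights given, every model counts equally.
        weights = [1.0] * n_models
    total = sum(weights)
    n_items = len(model_predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, model_predictions)) / total
        for i in range(n_items)
    ]


# Hypothetical held-out ratings and three models' predictions (invented numbers).
actual = [4.0, 2.0, 5.0, 3.0]
ridge_preds = [3.5, 2.5, 4.5, 3.5]
forest_preds = [4.5, 1.5, 4.5, 2.5]
boosting_preds = [4.0, 2.5, 5.5, 3.0]

blended = blend([ridge_preds, forest_preds, boosting_preds])

for name, preds in [
    ("ridge", ridge_preds),
    ("random forest", forest_preds),
    ("gradient boosting", boosting_preds),
    ("blend", blended),
]:
    print(f"{name}: RMSE = {rmse(preds, actual):.3f}")
```

In this toy example the individual models err in different directions, so their mistakes partly cancel and the blend scores better than any single model. Real ensembles usually tune the weights on held-out data rather than assuming they are equal.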
Our third place finisher was Arya Farahi, a PhD student in Physics. His final approach was well reasoned, relying on a small family of simple, easily interpreted predictors. Employing ridge, lasso, and logistic regressions, as well as a measure of ‘happiness’ described in an academic paper (Dodds et al., 2015), Arya was able to build out a highly transparent, robust model with very little additional tinkering.
The MDST administration would like to thank everyone who participated. We deeply appreciate all the time and energy our members put into these competitions. We’ve learned a lot from our first internal competition and we hope you all did as well!
As always, send us your questions, comments, concerns, and suggestions to email@example.com.
The Michigan Data Science Team will be kicking off our second data science competition of the year next week! At this meeting, we will be introducing the competition dataset and giving a live demo of a script to help get you started. You do not need to have participated in the last challenge to compete! Unlike the Springleaf challenge, this will be an entirely internal competition, and will be especially geared towards those new to data science.
This challenge will feature ratings of professors, written by university students. Your task is to infer, from the text of each rating, the numerical score the student assigned to the professor. To be successful, your solutions will need to extract interesting, useful features from this text and apply them effectively. The best solutions will draw inspiration from research in natural language processing, sentiment analysis, and deep learning.
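To give a feel for what "extracting features from text" can mean, here is a minimal baseline sketch. It is not the starter script from the kickoff meeting: the function names, the tokenizer, and the toy reviews are all invented for illustration. Each word is scored by the mean rating of the training reviews that contain it, and a new review is rated by averaging the scores of its known words.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def fit_word_scores(reviews):
    """Learn a score per word from (text, rating) pairs.

    A word's score is the mean rating of the reviews that contain it.
    Also returns the overall mean rating, used as a fallback.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, rating in reviews:
        for word in set(tokenize(text)):  # count each word once per review
            totals[word] += rating
            counts[word] += 1
    global_mean = sum(rating for _, rating in reviews) / len(reviews)
    scores = {word: totals[word] / counts[word] for word in totals}
    return scores, global_mean


def predict(text, scores, global_mean):
    """Average the scores of known words; fall back to the overall mean."""
    known = [scores[word] for word in tokenize(text) if word in scores]
    return sum(known) / len(known) if known else global_mean


# Invented toy reviews, not taken from the competition dataset.
train = [
    ("Great lectures, very clear", 5.0),
    ("Clear and helpful", 4.0),
    ("Boring lectures, unfair exams", 2.0),
    ("Unfair grading and boring", 1.0),
]

scores, global_mean = fit_word_scores(train)
print(predict("clear and helpful lectures", scores, global_mean))
print(predict("boring and unfair", scores, global_mean))
```

A baseline like this ignores word order and negation ("not clear" looks the same as "clear"), which is exactly where ideas from sentiment analysis and deep learning can help.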
The Springleaf Marketing Response Challenge is now over! Thank you and congratulations to all the dedicated students who competed. In total, we had over 50 students and 20 teams participate in this competition.
1st Place: $200 Amazon Gift Card + T-Shirts
#34 – Cantseetherandomforestforthetrees
Alexander Zaitzeff and Jared Webb
2nd Place: $100 Amazon Gift Card + T-Shirts
#545 – Physteam
Arya Farahi and Anthony Kremin
3rd Place: T-Shirts
#552 – GGBrown
Xiang Li, Xinyu Tan, Tianpei Xie, and Jianming Sang
We gratefully acknowledge Soartech for funding the MDST Springleaf Challenge.
MDST will be participating in its first ever online data science competition, the Springleaf Marketing Response Challenge, during the first few weeks of the Fall semester. In this challenge, competitors are tasked with preemptively identifying customers who will respond to a direct mail offer for a personal or auto loan. For privacy reasons, features are completely anonymized, encouraging competitors to employ creative data-driven feature selection methods.
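When feature names carry no meaning, one common starting point is to screen columns using only the data itself. The sketch below is a hedged illustration, not a method prescribed by the competition or used by any team: it drops columns that never vary and ranks the rest by the absolute Pearson correlation with a binary response. The column names `f1` to `f3` and all values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / sqrt(var_x * var_y)


def rank_features(columns, target):
    """Drop constant columns, then rank the rest by |correlation| with target."""
    ranked = []
    for name, values in columns.items():
        if len(set(values)) <= 1:
            continue  # a column with a single value carries no signal
        ranked.append((name, abs(pearson(values, target))))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Hypothetical anonymized columns and a binary "responded to the offer" target.
columns = {
    "f1": [1, 1, 1, 1, 1, 1],
    "f2": [0.1, 0.9, 0.2, 0.8, 0.3, 0.7],
    "f3": [5, 3, 4, 4, 3, 5],
}
target = [0, 1, 0, 1, 0, 1]

for name, strength in rank_features(columns, target):
    print(f"{name}: |r| = {strength:.3f}")
```

Correlation only captures linear, one-column-at-a-time relationships, so a screen like this is a first pass at best; model-based importance measures can surface interactions that it misses.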
FLINT—A partnership between Google and the University of Michigan’s Flint and Ann Arbor campuses aims to provide a smartphone app and other digital tools to Flint residents and officials to help them manage the ongoing water crisis.
The app and other tools will help predict where lead levels will be highest in the city’s water, and they’ll pull together information and resources to make the crisis easier to navigate for those affected. The project is made possible by a $150,000 grant from Google.
“This investment by Google is an outstanding commitment to our community. It creates an ideal combination of an industry powerhouse with faculty expertise. It will create new opportunities for students and continue building community partnerships—all so that we can provide quick and critically important information and analysis for our community as we move forward,” said Chancellor Susan E. Borrego of the University of Michigan-Flint.
The Android app is slated for roll-out this summer. It could help residents determine whether their homes are at high risk of having lead-contaminated water. It could also help them locate day-to-day resources for lead testing, water distribution, water bottle recycling, water filters, and volunteer opportunities. A website will offer similar resources and will be accessible on any computer, including those in public libraries.
Additional web-based tools for researchers and government officials could provide detailed insight on how to deploy repairs and resources. For example, they could help identify and prioritize the water service line replacements.
A student team at UM-Flint has already developed a prototype smartphone app for Flint residents. Google and U-M Ann Arbor will work with them through the spring and summer to add mapping features that use predictive analytics from U-M Ann Arbor’s Michigan Data Science Team. The team will also develop an improved user interface with assistance from Google.
Google has pledged a variety of resources to the project including a grant and remote and on-site assistance from its user experience and app development team. The company will also donate data resources to the Michigan Data Science Team including mapping, satellite imagery, and geo-location data.
Initial work by the data science team has already shown some success at predicting which homes and neighborhoods have a high risk of lead contamination. In the coming months, they’ll continue to apply predictive algorithms and machine learning techniques to data from a wide variety of sources including Google, the State of Michigan and the City of Flint. The data includes existing lead testing data; detailed information on the type and location of water infrastructure; and information on the size, age, type, and condition of every parcel of property in the city.
“There’s a lot of data on the water crisis, but it’s scattered over many different agencies and places,” said Jacob Abernethy, an assistant professor of computer science and engineering at U-M Ann Arbor and faculty advisor to the Michigan Data Science Team. “By organizing it in one place and analyzing it, we can predict which areas are likely to be at risk. We can help planners determine which infrastructure repairs will benefit the most residents, and how to allocate resources like bottled water most efficiently.”
Google and U-M also plan to create a separate set of web tools for city planners and other officials. They will include extensive mapping and predictive analytics, with details on waterline type and location and other infrastructure data.
Mark Allison is an assistant professor of computer science at UM-Flint and the faculty leader of the Flint student team. He says the project will be an opportunity for students to make a difference in the water crisis and pick up valuable real-world development experience along the way.
“Finding the best way to put resources close to where high lead levels are is a big part of managing this crisis, and it’s the kind of problem that analytics can solve. We also want to give residents more transparency by making it easier for anyone to get access to the most up-to-date information,” Allison said. “I think this project will be transformative. And for all of us here in Flint, it’s about much more than grades.”
Allison said the team is working to keep the tools they develop flexible, enabling them to be used by other cities that face similar crises. His team is developing the tools as part of UM-Flint Computer Science’s community-based learning program, which puts students to work on real-world challenges in and around Flint.
The Michigan Data Science Team is a competitive extra-curricular team at U-M Ann Arbor. Founded by Abernethy, the team builds and applies advanced computer algorithms that can analyze and “learn” from large sets of data. By finding connections and patterns within that data, they can make predictions about future events. The techniques are already widely used in areas like online retailing and advertising.
“Access to clean drinking water is a concern all over the world, but in the United States it’s often a foregone conclusion. That is not the case recently for the residents of Flint, Michigan,” said Mike Miller, head of Google Michigan. “I am proud that we can contribute to help with the recovery, and we hope we can help to support a resolution to this crisis and get the residents of Flint the resources and respect they so rightly deserve.”
The Flint water crisis began after April 2014, when the city’s drinking water source was changed from Lake Huron, via Detroit’s water system, to the Flint River. The water supply was not properly monitored for corrosion control, which caused lead to leach from service lines into the city’s drinking water. While the city has since switched its water supply back to the Detroit system, residents are still being advised not to drink unfiltered tap water.