Thank you so, so, soooo much for an excellent info session. We had more than eighty new signups on the spreadsheet and about twenty on the website! For everyone who couldn’t make it, here’s a list of resources you can use to get up to speed.
The Michigan Data Science Team will be holding its Fall info session Thursday, the 8th of September at 6pm in 1670 BBB! If you can’t make it, don’t worry! Just head over to the contact page to drop us a line using the form and we’ll make sure you receive our updates.
You can check out our flyer here. Distribute at your own discretion.
Earlier this summer, MDST submitted two papers to the Bloomberg Data for Good Exchange conference: one on our work on the Flint Water Crisis and one on our work with the University Musical Society. It is my great pleasure to announce that both of our papers have been selected for presentation at the conference in New York on September 25th!
Needless to say, we’re all very excited.
Our very own Jacob Abernethy was recently interviewed on the popular machine learning podcast, Talking Machines. Among other things, Jake was asked about his experiences working with the trove of municipal data available in Flint, his path to research at the University of Michigan, and our work with Google and UM-Flint.
While we are known for our participation in structured prediction challenges, MDST has picked up at least two community projects in the last year. MDST members of all experience levels got to participate in both our efforts in Flint and our work with UMS’s ticket purchase data. Around the time that we hit milestones in both projects, news of the Bloomberg Data for Good Exchange call for papers reached some members of MDST and we decided to take a shot.
The results of our foray into volunteer, remote, academic paper collaboration can be found below in the form of two successfully written MDST papers! We’re incredibly proud of the results and even prouder of our membership, who worked so hard to produce such quality work.
The Michigan Data Science Team is excited to have partnered with Google and the University of Michigan-Flint to engineer a data platform and accompanying app as a part of our continued efforts to help the community of Flint. This app will provide users with information regarding key public services, such as the locations of water bottle distribution centers and instructions to request new water testing kits. Users will also be able to report concerns about the water quality at their location, and access our predictive model, which flags homes that are potentially at high risk of lead contamination.
Google.org is providing the University of Michigan-Flint a grant of $150,000 to build the platform and accompanying app. They are also providing access to several Google engineering consultants who will aid in producing interactive visualizations and oversee the app’s user interface design. MDST has created a multidisciplinary engineering team to oversee and manage the creation of our predictive model and data platform.
We will continue our efforts to ask and answer the data-related questions surrounding this crisis in order to provide as much value as we can to the people of Flint. We are incredibly grateful for the support from Google and for the chance to collaborate with our friends and fellow researchers at the University of Michigan-Flint campus.
Last week, we held the FARS Dataset Visualization Challenge, where teams were tasked with visualizing more than a decade of fatal traffic accident records to address the question – “What causes drunk driving accidents?”
First prize went to Team Bidiu (Chengyu Dai, Cyrus Anderson, Cupjin Huang, and Wenbo Shen), whose presentation addressed the questions: who is driving drunk, where are they driving, and when do fatal accidents occur? For their first-place finish, each member of Team Bidiu will receive a $25 gift card to Amazon.com! You can view Team Bidiu’s presentation and source code at the team’s GitHub page.
It’s my great pleasure to announce the next MDST competition! We are very fortunate to be partnering with the Michigan Institute for Data Science (or ‘MIDAS’) for this event. We will be holding the kickoff meeting this Thursday at 5:00pm in 3150 DOW. Through this partnership, we’ve been able to obtain a particularly interesting dataset. We have compiled records of every fatal car accident reported in the United States between 2003 and 2014, drawn from the Fatality Analysis Reporting System, or FARS. The challenge will be to predict whether or not a drunk driver was involved in each accident.
More information about logistics, prizes, and the dataset itself will be given at the kickoff ceremony this Thursday. Additionally, we will be awarding prizes to the winners of our last competition, the RateMyProfessor challenge, so if you won a prize, please show up to claim it.
If you have any further questions, feel free to email me at firstname.lastname@example.org.
ARC-TS was kind enough to let MDST pick up the unused cores on their computing cluster, which are usually used for computationally intensive research simulations and big data algorithms. You can read more about it at this link!
Placement & Prizes
1st: DaBrain – Guangsha Shi, Sean Ma, Sheng Yang
$200 Amazon Gift Card + MDST T-shirts
2nd: The Data Miners – Alexander Zaitzeff, Ryan Sandberg
$100 Amazon Gift Card + MDST T-shirts
3rd: Arya Farahi
Congratulations are in order to DaBrain, who in the last week of the competition rocketed through the leaderboard, displacing the incumbent finalists and finishing out the competition in first place! DaBrain employed the only neural network algorithm in the competition, leveraging the power of the LSTM architecture for Recurrent Neural Networks against this document-based dataset. LSTM-RNNs have seen great experimental results on NLP tasks in recent years, making this choice of algorithm particularly exciting for the Rate My Professor Challenge.
Our second place finalists, The Data Miners, led the leaderboard for many weeks after the beginning of the challenge. The Data Miners managed an impressive number of submissions, more than double that of the next most frequently submitting team. In the end, an ensembling method combining ridge regression, random forest regression, and gradient boosting (as well as some hacky tricks beyond explanation) allowed The Data Miners to seize second place!
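For readers new to ensembling, here is a minimal sketch of the general idea: take the predictions from several models and average them. This is not The Data Miners' actual code, and the post does not say how they combined their models. The `blend` helper, the equal weighting, and the prediction values below are all hypothetical and made up for illustration.

```python
def blend(predictions, weights=None):
    """Return the weighted average of several models' predictions.

    predictions: dict mapping a model name to its list of predicted values.
    weights: optional dict mapping the same names to non-negative weights;
             defaults to equal weighting (an assumption, not the team's method).
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    names = list(predictions)
    n = len(predictions[names[0]])
    if any(len(predictions[name]) != n for name in names):
        raise ValueError("all models must predict the same number of rows")
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Average row by row across the models.
    return [
        sum(weights[name] * predictions[name][i] for name in names) / total
        for i in range(n)
    ]


# Hypothetical predictions from three models on three rows (made-up numbers).
model_predictions = {
    "ridge": [3.0, 4.0, 2.0],
    "random_forest": [3.5, 3.5, 2.5],
    "gradient_boosting": [4.0, 4.5, 3.0],
}

blended = blend(model_predictions)
print(blended)
```

Passing a `weights` dict lets a stronger model count for more. In practice, such weights would usually be tuned on held-out data rather than picked by hand.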
Our third place finisher was Arya Farahi, a PhD student in physics. His final approach was well reasoned, relying on a small family of simple, easily interpreted predictors. Employing ridge, lasso, and logistic regressions, as well as a measure of ‘happiness’ described in an academic paper (Dodds et al., 2015), Arya was able to build out a highly transparent, robust model with very little additional tinkering.
The MDST administration would like to thank everyone who participated. We deeply appreciate all the time and energy our members put into these competitions. We’ve learned a lot from our first internal competition and we hope you all did as well!
As always, send us your questions, comments, concerns, and suggestions to email@example.com.