Student data science competition winners visit Quicken Loans headquarters in Detroit

By | Educational, General Interest, MDSTPosts, News

Earlier this year, three Michigan Data Science Team (MDST) members, winners of the Quicken Loans (QL) Lending Strategies Prediction Challenge, traveled to Detroit to visit QL headquarters, accept their prizes, and present their findings to the company’s Data Science team.

Back row left to right: Reddy Rachamallu, Alexandr, Alex, Mark Nuppnau, Brian Ball
Front row left to right: Jingshu Chen, Patrick, Alex’s wife Kenzie, Yvette Tian, Mike Tan, and Catherine Tu.

Alexander Zaitzeff, a graduate student in the Applied and Interdisciplinary Mathematics program, won first place; Alexandr Kalinin, a Bioinformatics graduate student, earned second; and Patrick Belancourt, a graduate student in Climate and Space Sciences and Engineering, took third.

The goal of the competition was to create a model that would predict whether potential clients would end up getting a mortgage, based on the loan product originally offered to them. To build these models, each participant was given access to proprietary, de-identified financial data from recent QL clients. The accuracy of each model was then evaluated on one month of client data.

Alexander Zaitzeff

“Every time I participate in a competition I try out a new technique,” Zaitzeff said. “MDST puts me in competitions with other U-M students who I can team up with and learn from.”

“This was a very valuable competition because it gives people experience working with real datasets, on actual problems that companies work on day to day,” said Jonathan Stroud, organizational chair of MDST.

Brian Ball, a data scientist at QL and a U-M alum, said the input gained from MDST students through the competition helped confirm the company’s hope that “our system is predictable from a mathematical standpoint.”

“In that regard, we can use the results produced and the methods used to drive good decisions to most benefit our clients,” he added. “We view this as a total success as it was our hypothesis — and underlying hope — from the beginning.”

About 20 people from QL’s Data Science team, as well as vice presidents of the Business Intelligence unit, gathered to hear how the MDST winners developed their models.

The winning entry was an “ensemble model,” in which several models are synthesized into one predictive framework.

Finding that so many different kinds of models performed similarly confirmed that “the data tells the story,” Ball said.

“Allowing for each technique to contribute more strongly to the final score in areas where the model type performs well (referred to as ‘blending’ or ‘stacking’) is an especially strong method and one we should consider moving forward,” he said.
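To make the idea of blending more concrete, here is a minimal Python sketch. It is hypothetical and is not the winning team’s code or QL’s system. It assumes two already-trained models that each return a probability that a client gets a mortgage, plus a hand-chosen rule (`region`) that splits inputs into segments. On held-out data, it grid-searches one blending weight per segment, so whichever model is more accurate in a segment contributes more to the final score there. All names (`fit_regional_blend`, `blended_predict`, the toy models and data) are illustrative assumptions.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# A "model" here is any function mapping one input to P(outcome = 1).
Model = Callable[[float], float]


def log_loss(ys: Sequence[int], probs: Sequence[float]) -> float:
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(ys, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(ys)


def best_weight(model_a: Model, model_b: Model,
                xs: Sequence[float], ys: Sequence[int], steps: int = 101) -> float:
    """Grid-search w in [0, 1] minimizing the log loss of w*p_a + (1 - w)*p_b."""
    best_w, best_loss = 0.0, float("inf")
    for i in range(steps):
        w = i / (steps - 1)
        blended = [w * model_a(x) + (1 - w) * model_b(x) for x in xs]
        loss = log_loss(ys, blended)
        if loss < best_loss:  # strict "<" keeps the smallest w on ties
            best_w, best_loss = w, loss
    return best_w


def fit_regional_blend(model_a: Model, model_b: Model,
                       region: Callable[[float], Hashable],
                       xs: Sequence[float], ys: Sequence[int]) -> Dict[Hashable, float]:
    """Fit one blending weight per region using held-out (not training) data."""
    groups: Dict[Hashable, Tuple[List[float], List[int]]] = defaultdict(lambda: ([], []))
    for x, y in zip(xs, ys):
        gx, gy = groups[region(x)]
        gx.append(x)
        gy.append(y)
    return {r: best_weight(model_a, model_b, gx, gy) for r, (gx, gy) in groups.items()}


def blended_predict(model_a: Model, model_b: Model,
                    region: Callable[[float], Hashable],
                    weights: Dict[Hashable, float], x: float,
                    default_weight: float = 0.5) -> float:
    """Blend the two models' probabilities using the weight learned for x's region."""
    w = weights.get(region(x), default_weight)
    return w * model_a(x) + (1 - w) * model_b(x)


# Toy usage: model A is informative only for small x, model B only for large x.
def model_a(x: float) -> float:
    return 0.5 if x >= 5 else (0.9 if x >= 2 else 0.1)


def model_b(x: float) -> float:
    return 0.5 if x < 5 else (0.9 if x >= 12 else 0.1)


def region(x: float) -> str:
    return "low" if x < 5 else "high"


holdout_x = [0, 1, 2, 3, 10, 11, 12, 13]
holdout_y = [0, 0, 1, 1, 0, 0, 1, 1]
weights = fit_regional_blend(model_a, model_b, region, holdout_x, holdout_y)
print(weights)  # {'low': 1.0, 'high': 0.0}
```

Each model ends up dominating the segment where it is more accurate. This is a deliberately simple variant: full stacking usually trains a separate meta-model on the base models’ out-of-fold predictions and lets it discover such segments itself, rather than relying on a hand-written `region` rule.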

The competition began in September and ran until the end of the fall semester. More than 70 students competed, including both graduate students and undergraduates from several schools and departments across the University.

MDST typically runs two or three competitions each year; the current competition involves predicting the value of NFL free agents and is being conducted in partnership with the Baltimore Ravens. For more information, please visit MDST’s webpage: midas.umich.edu/mdst

HPC training workshops begin Tuesday, Feb. 13

By | Educational, Events, General Interest, Happenings, HPC, News

A series of training workshops in high performance computing will be held Feb. 13 through March 6, 2018, presented by CSCAR in conjunction with Advanced Research Computing – Technology Services (ARC-TS).

Introduction to the Linux command line
This course will familiarize the student with the basics of accessing and interacting with Linux computers using the GNU/Linux operating system’s Bash shell, also known as the “command line.”
Location: East Hall, Room B254, 530 Church St.
Dates: (Please sign up for only one)
• Tuesday, Feb. 13, 1 – 4 p.m. (full description | registration)
• Friday, Feb. 16, 9 a.m. – noon (full description | registration)

Introduction to the Flux cluster and batch computing
This workshop will provide a brief overview of the components of the Flux cluster, including the resource manager and scheduler, and will offer students hands-on experience.
Location: East Hall, Room B254, 530 Church St.
Dates: (Please sign up for only one)
• Monday, Feb. 19, 1 – 4 p.m. (full description | registration)
• Tuesday, March 6, 1 – 4 p.m. (full description | registration)

Advanced batch computing on the Flux cluster
This course will cover advanced areas of cluster computing on the Flux cluster, including common parallel programming models, dependent and array scheduling, and a brief introduction to scientific computing with Python, among other topics.
Location: East Hall, Room B250, 530 Church St.
Dates: (Please sign up for only one)
• Wednesday, Feb. 21, 1 – 5 p.m. (full description | registration)
• Friday, Feb. 23, 1 – 5 p.m. (full description | registration)

Hadoop and Spark workshop
Learn how to process large amounts (up to terabytes) of data using SQL and/or simple programming models available in Python, R, Scala, and Java.
Location: East Hall, Room B250, 530 Church St.
Dates: (Please sign up for only one)
• Thursday, Feb. 22, 1 – 5 p.m. (full description | registration)

New Data Science Course – Winter 2018

By | Educational, News

Computational Data Science
(EECS 598 / BIOINF 505)

A new graduate course, now enrolling students for Winter 2018, provides an in-depth introduction to computational methods in data science for identifying, fitting, extracting, and making sense of patterns in large data sets.

Lectures will typically begin with an introduction to a core data science method. Students will then implement the method computationally, with the computer certifying when their program is correct; this work is interleaved with ‘just-in-time’ theory that exposes students to the mathematics underpinning the methodology. Once the method has been correctly implemented, students will work with a real-world example, or ‘success story,’ that illustrates when the algorithm ‘works’ as expected, followed by an instructor-guided computational exploration of the algorithm’s subtleties and weaknesses.

A full course description, prerequisites and schedule are available.

Please share this announcement with students who might be interested.

U-M students make strong showing at Michigan Datathon

By | Data, Educational, Events, General Interest, Happenings, News

University of Michigan students won first and third places in the Michigan Datathon, held Nov. 4, 2017, in the Michigan Union and hosted by Citadel LLC, Correlation One, and the U-M Statistics Department.

1st-place team from the University of Michigan:

Ruofei (Brad) Zhao, Statistics Ph.D. student

Zheng Gao, Statistics Ph.D. student

You Wu, Master’s in Applied Statistics student

Kevin Zheng, Computer Science sophomore

2nd-place team:

Zi Yi, Statistics Master’s student, University of Chicago

Tian Gu, Biostatistics Ph.D. student, University of Michigan

Shuo Zhang, Statistics Master’s student, University of Chicago

Shiyang Lu, Robotics & Naval Architecture and Marine Engineering Master’s student, University of Michigan

3rd-place team from the University of Michigan:

Hanbo Sun, Master’s in Applied Statistics student

Xinghui Song, Master’s in Applied Statistics student

Tuo Wang, Master’s in Applied Statistics student

Hang Yuan, Master’s in Applied Statistics student

For more, see https://lsa.umich.edu/stats/news-events/all-news/graduatenews/MichiganDatathonWinners0.html

Reading and discussion group: Data science in understanding and addressing climate change

By | Educational, Events, General Interest, Happenings

CSCAR announces a reading and discussion group, “Data science in understanding and addressing climate change,” that will meet from 3 to 5 p.m. on the third or fourth Friday of every month (depending on participants’ preferences). We will discuss reports and significant papers that illuminate fundamental issues in climate change science, policy, and management. The suggested format at this stage is to discuss one science paper or chapter and one policy (or management) paper or chapter per meeting. The focus will be on the spatial (and temporal) dimensions of the issue, and we will concentrate on methods and techniques, keeping the required domain knowledge relatively low. We will emphasize the conceptual side of the tools and techniques so that they are accessible to a wider set of participants, but we will also get into the technical details.

This is an effort to bring together people working on climate change from a data science perspective. The idea is to learn together in a fun environment and foster dialogue, with a focus on how data science can provide common ground for mutual learning and understanding.

We will meet in Rackham but are open to rotating the location. Remote participation is also possible.

If you are interested, send an email to Manish Verma at manishve@umich.edu.

If you have suggestions for readings or discussion topics, let us know. We will include chapters from IPCC reports and from U.S. global change science programs in our discussion.

MDST – NFL Free Agency Value Prediction Competition Kick-Off – Nov. 9, 6pm

By | Data, Data sets, Educational, Events, Happenings, MDSTPosts, MDSTProjects, News

In this competition, student teams at the University of Michigan will use historical free agent data to predict the value of new contracts signed in the 2018 free agency period. These predictions will be evaluated against the actual contracts as they are signed. The competition is organized by the Michigan Data Science Team (MDST) in collaboration with the Baltimore Ravens and the Michigan Sports Analytics Society (MSAS). This is the competition’s initial kick-off meeting, and food will be provided.

RSVP

Date, Time

Thursday, November 9, 6 – 7 p.m. EST

Location

Weiser Hall 10th Floor Auditorium
500 Church St., Ann Arbor, MI 48104

Host

Michigan Data Science Team

CSCAR provides walk-in support for new Flux users

By | Data, Educational, Flux, General Interest, HPC, News

CSCAR now provides walk-in support during business hours for students, faculty, and staff seeking help getting started with the Flux computing environment. CSCAR consultants can walk researchers through applying for a Flux account, installing and configuring a terminal client, connecting to Flux, using basic SSH and the Unix command line, and obtaining or accessing allocations.

In addition to walk-in support, CSCAR has several staff consultants with expertise in advanced and high performance computing who can work with clients on a variety of topics such as installing, optimizing, and profiling code.  

Email support is also available at hpc-support@umich.edu.

CSCAR is located in room 3550 of the Rackham Building (915 E. Washington St.). Walk-in hours are 9 a.m. to 5 p.m., Monday through Friday, except from noon to 1 p.m. on Tuesdays.

See the CSCAR web site (cscar.research.umich.edu) for more information.

Info session: Consulting and computing resources for data science — Nov. 8

By | Data, Educational, Events, General Interest, Happenings, HPC

Advanced Research Computing at U-M (ARC) will host an information session for graduate students in all disciplines who are interested in new computing and data science resources and services available to U-M researchers.

Brief presentations from members of ARC Technology Services (ARC-TS) on computing infrastructure, and from Consulting for Statistics, Computing, and Analytics Research (CSCAR) on statistics, data science, and computing training and consulting will be followed by a Q&A session, and opportunities to interact individually with ARC and CSCAR staff.

ARC and CSCAR are interested in connecting with graduate students whose research would benefit from customized or innovative computational or analytic approaches, and can provide guidance for students aiming to adopt such approaches. ARC and CSCAR are also interested in developing training and documentation materials for a diverse range of application areas, and would welcome input from student researchers on opportunities to tailor our training offerings to new areas.

Speakers:

  • Kerby Shedden, Director, CSCAR
  • Brock Palen, Director, ARC-TS

Date/Time/Location:

Wednesday, Nov. 8, 2017, 2 – 4 p.m., West Conference Room, 4th Floor, Rackham Building (915 E. Washington St.)