MIDAS organizes many events throughout the year specifically geared towards students’ technical skill development, job search and career preparation, and engagement with industry professionals in the data science field.

Research Projects

For MIDAS Affiliated Faculty:
If you are a faculty member with a research project on which you would like student assistance or collaboration, please submit the form below:

Submit Research Project

The following faculty members are seeking student research assistants:

Murali Mani (Professor of Computer Science, University of Michigan, Flint)

Project Summary
Seeking students for 2 main projects:

  1. Secure query processing, which I discussed as part of the MIDAS faculty pitch. For this, I am looking for students with a background in cryptography and/or database systems/algorithms.
  2. Dataset search, where we are investigating which datasets could be explored further to answer a user question. For this, I am looking for students interested in database systems/algorithms/IR.

Responsibilities
Both projects will require literature analysis. Students will also be involved in algorithm development, prototype building, evaluation, and analysis.

Required Experience
Completion of an undergraduate course in data structures and algorithms is required. Additional coursework in database systems or IR (for the dataset search project) or cryptography (for the secure query processing project) would be helpful.

Compensation
Students may register for independent research credit.

Contact
mmani@umich.edu

Date Posted
10/1/2020

Ivo Dinov (Professor, Computational Medicine and Bioinformatics, University of Michigan)

Project Summary
Please see the SOCR Lab projects (computing, data science, health analytics, mathematical physics, bioinformatics, statistical inference):
https://socr.umich.edu/docs/uploads/2021/SOCR_MDP_2021_Projects.pdf
https://wiki.socr.umich.edu/index.php/Available_SOCR_Development_Projects

Responsibilities
Students will develop mathematical models, build computational tools and apps, fit statistical inference models, analyze data, and design AI/ML forecasting and prediction models.

Required Experience
Extremely strong motivation/self-drive, ability to work in a trans-disciplinary team, deep scientific background.

Compensation
Students may register for independent research credit.

Contact
dinov@umich.edu

Date Posted
10/1/2020

Greg Rybarczyk (Associate Professor, University of Michigan-Flint)

Project Summary
The projects I am currently investigating include human travel behavior and attitudes when bicycling, walking, and utilizing “third spaces” pre- and post-COVID. I am also interested in forecasting bicycle-share use and bicycling crash rates in response to urban design. Other related and important projects currently under consideration include urban foraging behavior and public health, and senior citizen response to green/blue space during and after a pandemic.

Responsibilities
Students will assist with data collection/cleaning, survey development, analysis, mapping, literature reviews, and writing.

Required Experience
Graduate student standing, experience with Python programming, data science, crowd-source data, public health modeling, predictive modeling, sentiment analysis, mixed-methods, GIS, survey development, or spatial analysis.

Compensation
Hourly pay is available. Students may register for independent research credit.

Contact
grybar@umich.edu

Date Posted
10/27/2020

Somangshu Mukherji (Assistant Professor, Music Theory, SMTD)

Project Summary
The Telemann Chorale Project: developing models for automatically composing Baroque chorales, using ideas from music theory, generative linguistics, and computer science, and preparing a new edition of Telemann’s 430 chorales as a result.

Responsibilities
Students will assist with developing new computational frameworks for music composition, natural language processing, and optical score recognition; analyzing the music theory and linguistics literature; and writing code.

Required Experience
Prior coursework in music theory and/or generative linguistics and an interest in computational modeling of music. Some experience with music-related coding tools such as music21 is preferable.

Compensation
Students may register for independent research credit.

Contact
somangsh@umich.edu

Date Posted
11/2/2020

Mathematics Research Institution Coding and Disambiguation

Background
Mathematical Reviews (MR) is a division of the American Mathematical Society.  Since 1940, MR has been collecting data on the research literature in mathematics.  From the beginning, institutional affiliations of authors have been collected, primarily as an aid in distinguishing authors with similar names. However, institutional identities are themselves susceptible to ambiguities, especially since organizations are identified down to the level of departments, not just universities or colleges.  MR follows the guidelines of descriptive cataloging, meaning that the information from the paper or book is used, rather than matching department names to an existing list or authority.

Authors are not consistent in naming their own departments or institutions. It is possible for one department to be called by several names, even by the same author. At the same time, other universities may have genuinely separate departments, with “Mathematics Department” being pure math and “Department of Mathematical Sciences” being applied math. Finally, political divisions such as countries are not stable: countries can split, merge, or just change names.

Dataset
The dataset consists of a table of 207,334 active Institution Codes. Codes generally follow the pattern A-BB-CCC, where A is one or more characters for the country, BB is two or more characters that represent the university or other larger organization, and CCC represents the unit at the level of a department. The dataset also includes inactive codes: 22,331 codes identified as invalid and 10,212 codes that have been retired. There are separate tables for locations (cities), states, and countries.
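
As a rough illustration of the A-BB-CCC pattern described above, the following Python sketch splits a code into its country, organization, and unit parts. The regular expression, the name `parse_institution_code`, and the sample code strings are assumptions made for illustration. The actual character set, and the handling of codes that deviate from the general pattern, would need to be checked against the dataset.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: COUNTRY-ORG-UNIT, with alphanumeric country and organization parts.
# The real character set used in the dataset is not specified in this posting.
CODE_PATTERN = re.compile(
    r"^(?P<country>[A-Za-z0-9]+)-(?P<org>[A-Za-z0-9]{2,})-(?P<unit>.+)$"
)


class InstitutionCode(NamedTuple):
    country: str
    org: str
    unit: str


def parse_institution_code(code: str) -> Optional[InstitutionCode]:
    """Return the three parts of a code, or None if it does not fit the assumed pattern."""
    match = CODE_PATTERN.match(code.strip())
    if match is None:
        return None
    return InstitutionCode(match["country"], match["org"], match["unit"])


# Hypothetical code strings, not taken from the dataset.
print(parse_institution_code("X-ABC-MATH"))
print(parse_institution_code("X-A"))  # too few parts for the assumed pattern
```

Codes that return `None` are not necessarily invalid; since the pattern is only described as general, they would simply need separate handling.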

Research Prompts:

  1. Identify possible duplications in the set of institution codes.  
  2. Devise a method for resolving duplications. Solution methods may include scripted querying of the database via MathSciNet, the user interface to the Mathematical Reviews Database.  
  3. Create a database of the disambiguated codes that includes tables for countries, cities, primary institutions (university, college), and department-level units.  
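
One possible starting point for the first prompt is to normalize unit names and compare them pairwise within the same parent organization. The sketch below uses `difflib.SequenceMatcher` from the Python standard library. The record layout (`code`, `name`), the abbreviation table, the normalization rules, and the 0.85 threshold are all assumptions for illustration, not properties of the actual dataset.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical abbreviations to expand before comparing names.
ABBREVIATIONS = {"dept": "department", "math": "mathematics", "univ": "university"}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation and filler words, expand abbreviations, sort the words."""
    words = re.findall(r"[a-z]+", name.lower())
    words = [ABBREVIATIONS.get(w, w) for w in words if w not in {"of", "the"}]
    return " ".join(sorted(words))


def candidate_duplicates(records, threshold=0.85):
    """Yield (code_a, code_b, score) for similar names sharing a country-organization prefix."""
    groups = defaultdict(list)
    for code, name in records:
        prefix = "-".join(code.split("-")[:2])  # assumed COUNTRY-ORG prefix
        groups[prefix].append((code, normalize(name)))
    for members in groups.values():
        for (code_a, name_a), (code_b, name_b) in combinations(members, 2):
            score = SequenceMatcher(None, name_a, name_b).ratio()
            if score >= threshold:
                yield code_a, code_b, round(score, 2)


# Hypothetical records, not taken from the dataset.
records = [
    ("X-ABC-M", "Department of Mathematics"),
    ("X-ABC-DM", "Dept. of Math"),
    ("X-ABC-MS", "Department of Mathematical Sciences"),
]
for pair in candidate_duplicates(records):
    print(pair)
```

Grouping by organization prefix keeps the pairwise comparison manageable, since comparing every code against every other would be quadratic in the number of codes. Flagged pairs are only candidates for review: as the background notes, similarly named departments can be separate units.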

Required Coursework and Skills
Participating students must have taken EECS 484 (Database Management Systems) or its equivalent and be experienced in programming in Python or another general-purpose programming language.

Number of Participants
One team of up to four (4) undergraduate or graduate students will be chosen to work on this project.

Contact
If interested, please send your CV to Jonathan Gryak (gryakj@med.umich.edu), Senior Scientist at MIDAS.

Date Posted
11/13/2020

Mathematics Research Collaboration Graph 2020

Background
Mathematical Reviews (MR) is a division of the American Mathematical Society.  Since 1940, Mathematical Reviews has been collecting data on research literature in mathematics.  The collected authorship data are remarkably accurate (though not perfect). The Mathematics Research Collaboration Graph has the authors from mathematics publications as its vertices, and two vertices are joined by an edge if the corresponding authors have published an article together.  MathSciNet contains information for over one million authors and nearly four million publications.  Understanding the mathematical structure of the collaboration graph provides interesting and useful information about mathematics as a profession.   

Research Prompts:

  1. In this graph, what is the average degree of a vertex?  In other words, what is the average number of co-authors?  What are other parameters for the graph? 
  2. If all authors with no co-authors are eliminated, what is the average degree of a vertex in the resulting graph? What are other parameters for this subgraph?
  3. Are there any Super Collaborators, i.e., authors with a significantly large number of co-authors?  
  4. Continuing from Question 3: for any Super Collaborator, consider the subgraph of the Collaboration Graph consisting of this author, their collaborators, the collaborators of their collaborators, the collaborators of the collaborators of their collaborators, and so on.  
      1. How big is this subgraph?  
      2. What is the average degree of a vertex in this subgraph? 
      3. What is the maximum distance between any two co-authors of this subgraph?  
      4. What happens to this subgraph if the Super Collaborator is removed? Is the subgraph still connected? If not, how many components does it have?  What is the distribution of sizes of these components?  
  5. Find the connected components of the Collaboration Graph, and in particular, compare the two largest components. What structural differences are there between these two components? Is the largest component one of the subgraphs found in Question 4? Are there any authors whose removal results in two disconnected subgraphs of comparable size?
  6. Some measures of centrality in social networks include betweenness, closeness, and eigencentrality. Investigate these and other parameters for the largest component of the Collaboration Graph.
  7. What general statements can you make about collaboration in mathematics?
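
Several of the prompts above reduce to standard graph computations. The following is a minimal Python sketch, using only the standard library, of average degree, connected components, and the effect of removing one author. The toy graph and all function names are hypothetical, and the adjacency-dictionary representation is an assumption rather than a prescribed approach.

```python
from collections import deque


def average_degree(graph):
    """Mean number of co-authors per author; equals 2 * |E| / |V|."""
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)


def connected_components(graph):
    """Return the components as a list of sets, largest first, using breadth-first search."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            for nbr in graph[queue.popleft()]:
                if nbr not in seen:
                    seen.add(nbr)
                    component.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def without_author(graph, author):
    """Copy of the graph with one author and all of their edges removed."""
    return {a: nbrs - {author} for a, nbrs in graph.items() if a != author}


# Hypothetical toy graph: author -> set of co-authors (symmetric). "F" has no co-authors.
graph = {
    "A": {"B", "C", "D", "E"},
    "B": {"A"},
    "C": {"A", "D"},
    "D": {"A", "C"},
    "E": {"A"},
    "F": set(),
}
print(average_degree(graph))                           # prompt 1
collaborating = {a: n for a, n in graph.items() if n}  # prompt 2: drop isolated authors
print(average_degree(collaborating))
print(max(graph, key=lambda a: len(graph[a])))         # author with the most co-authors
print([len(c) for c in connected_components(without_author(graph, "A"))])
```

At the scale of the full graph, a dedicated library is likely more practical. NetworkX, for example, provides `betweenness_centrality`, `closeness_centrality`, and `eigenvector_centrality`, although exact betweenness on a graph with around a million vertices may require sampling or approximation.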

Datasets
The dataset consists of a list of author pairs, along with an identifier (MR number) of a paper that connects them.  
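
A list of author pairs maps naturally onto an undirected graph. The sketch below shows one way to load such records into an adjacency structure in Python. The record layout `(author_a, author_b, mr_number)`, the identifiers, and the function name are assumptions for illustration; the actual file format is not described here.

```python
from collections import defaultdict


def build_collaboration_graph(records):
    """Build author -> set of co-authors from (author_a, author_b, mr_number) records."""
    graph = defaultdict(set)
    papers = defaultdict(set)  # unordered author pair -> MR numbers linking them
    for author_a, author_b, mr_number in records:
        if author_a == author_b:
            continue  # ignore self-pairs, should any occur
        graph[author_a].add(author_b)
        graph[author_b].add(author_a)
        papers[frozenset((author_a, author_b))].add(mr_number)
    return dict(graph), dict(papers)


# Hypothetical records; identifiers are placeholders, not real MR numbers.
records = [
    ("author1", "author2", "MR0000001"),
    ("author2", "author1", "MR0000002"),
    ("author2", "author3", "MR0000002"),
]
graph, papers = build_collaboration_graph(records)
num_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
print(len(graph), num_edges)
```

Storing co-authors in sets means that two authors with several joint papers still contribute a single edge. Authors with no co-authors would presumably not appear in a list of pairs, so counting them would require a separate author list.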

Required Coursework and Skills
Participating students must have taken Math 465 (Introduction to Combinatorics), EECS 444/544 (Analysis of Societal Networks), or an equivalent course, and be experienced in programming in Python or another general-purpose programming language.

Number of Participants
One undergraduate or graduate student will be chosen to work on this project.

Contact
If interested, please send your CV to Jonathan Gryak (gryakj@med.umich.edu), Senior Scientist at MIDAS.

Date Posted
11/13/2020

DATA CHALLENGES AND HACKATHONS

Participating as either teams or individuals, students use real-world data sets from industry sponsors, community organizations, or University research projects to answer pre-defined research questions. These events are often run as competitions, and MIDAS aims to promote student work by using judges from industry who may (and often do) offer outstanding participants internship and permanent job opportunities.

Current Events:

Previous Events:

INDUSTRY TALK-BACKS

Students learn about how data science is utilized in industry and how best to prepare for careers through conversations with real-world professionals.  Sessions are either centered around a theme (interviewing, company specific job openings, etc.) or feature panelists from various fields to give a broad overview of career opportunities in data science.  

Previous Events:

SKILLS WORKSHOPS

MIDAS coordinates workshops with industry professionals that center around a specific data science tool, software package, or company-specific methodology.  Participants actively engage with presenters using real-world data to gain practical experience. Workshop providers include AWS, Google, and Databricks. See the event page for more details on upcoming and past events.

UPCOMING EVENTS

December 1st, 2020 | 6:00-7:30pm

The Student Leadership Board of the Michigan Institute for Data Science invites undergraduates interested in learning more about graduate education in data science at the University of Michigan. Planned and hosted by students of the Leadership Board, the event is open to all undergraduates with an interest in data science, regardless of experience or background. It will take place on December 1st, 2020, from 6:00 to 7:30 PM EST via Zoom, and will consist of a series of lightning talks from professors and graduate students from various data science programs at the University of Michigan. Following these presentations, attendees will be split into breakout rooms for an opportunity to ask further questions of our panelists.
Data Science programs from the following UM Schools and Colleges will be represented on the panel:
Computer Science and Engineering - College of Engineering
Applied Data Science - School of Information
Statistics and Data Science - College of Literature, Science and the Arts
Biostatistics - School of Public Health
Please fill out this form to register by November 30th.