MIDAS organizes many events throughout the year specifically geared towards students’ technical skill development, job search and career preparation, and engagement with industry professionals in the data science field.

Research Projects

For MIDAS Affiliated Faculty:
If you are a faculty member and would like to submit a research project for which you are seeking student assistance or collaboration, please submit the form below:

Submit Research Project

The following projects are seeking student research assistants:

Cancer Metabolism and Precision Medicine (Sriram Chandrasekaran, Biomedical Engineering)

Project Summary
This project involves the application of computer models to simulate the metabolic properties of tumors. The computer models will be built using genomics, metabolomics and transcriptomics data from various types of cancer cell lines. By understanding the unique metabolic properties of each cell type, we can design drugs that target specific tumors. Further, knowledge of these differences will be used to design synergistic drug combinations tailored to each patient.

Responsibilities
The project involves data collection, model construction, simulation, analysis, testing, and literature review. Estimated 8 hours per week of work.

Required Experience
Preferred skills: Familiarity with MATLAB or Python. Basic knowledge of biochemistry, molecular biology, and genetics. Experience working with big data (genomics, transcriptomics) and knowledge of machine learning.

Compensation
Students may register for independent research credit.

Contact
csriram@umich.edu

Date Posted
1/19/2021

Large Scale Text and AI with Human Intelligence (Paramveer Dhillon, School of Information)

Project Summary
Seeking students for 2 main projects:

  1. Multiple RA positions available on projects that work with large-scale text datasets to extract insights from data.
  2. Multiple RA positions available to perform online studies via Amazon Mechanical Turk to understand users’ perception of AI-based automation and how AI can augment human intelligence.

Responsibilities

  1. The first project involves data cleaning, analysis, writing code, literature review, and communication of results via writing. Estimated 10-12 hours per week of work.
  2. The second project involves data collection, data cleaning, and running online user experiments. Estimated 10-12 hours per week of work.

Required Experience
BS or MS in a quantitative major; development skills to build interfaces for collecting data online via crowdsourcing; completion of a few data science courses; and some knowledge of statistical modeling.

Compensation
Students may register for independent research credit.

Contact
dhillonp@umich.edu

Date Posted
1/15/2021

Mathematics Research Institution Coding and Disambiguation (MIDAS Student Project)

Background: Mathematical Reviews (MR) is a division of the American Mathematical Society. Since 1940, MR has been collecting data on the research literature in mathematics. From the beginning, institutional affiliations of authors have been collected, primarily as an aid in distinguishing authors with similar names. However, institutional identities are themselves susceptible to ambiguities, especially since organizations are identified down to the level of departments, not just universities or colleges. MR follows the guidelines of descriptive cataloging, meaning that the information from the paper or book is used, rather than matching department names to an existing list or authority.

Authors are not consistent in naming their own departments or institutions. It is possible for one department to be called by several names, even by the same author. At the same time, at other universities, there may be separate departments, with “Mathematics Department” being pure math and “Department of Mathematical Sciences” being applied math. Finally, political divisions such as countries are not stable, i.e., countries can split, merge, or just change names.

Dataset: The dataset consists of a table of 207,334 active Institution Codes. Codes generally follow the pattern A-BB-CCC, where A is one or more characters for the country, BB is two or more characters that represent the university or other larger organization, and CCC represents the unit at the level of a department. The dataset also includes inactive codes: 22,331 codes identified as invalid and 10,212 codes that have been retired. There are separate tables for locations (cities), states, and countries.
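As an illustration of the A-BB-CCC pattern, the following is a minimal sketch of how a code might be split into its three parts. It assumes the parts are separated by hyphens and contain no hyphens themselves, which the description above does not guarantee (codes only "generally" follow the pattern). The names `InstitutionCode` and `parse_code` and the sample code `X-YY-ZZZ` are hypothetical and not taken from the Mathematical Reviews data.

```python
from typing import NamedTuple, Optional


class InstitutionCode(NamedTuple):
    country: str       # "A": one or more characters
    organization: str  # "BB": two or more characters
    unit: str          # "CCC": department-level unit


def parse_code(code: str) -> Optional[InstitutionCode]:
    """Split a code of the form A-BB-CCC; return None if it does not fit.

    Assumption: hyphens appear only as separators between the three parts.
    """
    parts = code.strip().split("-")
    if len(parts) != 3:
        return None
    country, organization, unit = parts
    # Enforce the minimum lengths stated in the dataset description.
    if len(country) < 1 or len(organization) < 2 or len(unit) < 1:
        return None
    return InstitutionCode(country, organization, unit)
```

Codes that return `None` here are not necessarily wrong; they would simply need a closer look, since the pattern is described as a general rule rather than a strict format.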

Research Prompts:

  1. Identify possible duplications in the set of institution codes.
  2. Devise a method for resolving duplications. Solution methods may include scripted querying of the database via MathSciNet, the user interface to the Mathematical Reviews Database.
  3. Create a database of the disambiguated codes that includes tables for countries, cities, primary institutions (university, college), and department-level units.
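One possible starting point for the first prompt is sketched below: group codes whose department names become identical after simple normalization within the same organization. This is only an illustration under stated assumptions, not the project's method. The record layout `(code, organization, unit_name)`, the stopword list, and the sample values are all hypothetical. Because some universities really do have separate departments with similar names, the output should be treated as candidates for review, not confirmed duplicates.

```python
import re
from collections import defaultdict

# Hypothetical list of words that carry no distinguishing information.
STOPWORDS = {"department", "dept", "of", "the"}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation and stopwords, and sort the remaining words."""
    tokens = re.findall(r"[a-z]+", name.lower())
    return " ".join(sorted(t for t in tokens if t not in STOPWORDS))


def candidate_duplicates(records):
    """Group codes that share an organization and a normalized unit name.

    records: iterable of (code, organization, unit_name) tuples (assumed layout).
    Returns only the groups that contain more than one code.
    """
    groups = defaultdict(list)
    for code, organization, unit_name in records:
        groups[(organization, normalize(unit_name))].append(code)
    return {key: sorted(codes) for key, codes in groups.items() if len(codes) > 1}
```

With this normalization, "Mathematics Department" and "Dept. of Mathematics" at the same organization are flagged together, while "Department of Mathematical Sciences" stays separate, in line with the caveat in the background section.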

Required Coursework and Skills: Participating students must have taken EECS 484 (Database Management Systems) or its equivalent and be experienced in programming in Python or another general-purpose programming language.

Number of Participants: One team of up to four (4) undergraduate or graduate students will be chosen to work on this project.

Expected Project Start Date: January 25, 2021

Contact: If interested, please send your CV to Jonathan Gryak (gryakj@med.umich.edu), Senior Scientist at MIDAS, by January 18.

Mathematics Research Collaboration Graph 2020 (MIDAS Student Project)

Background: Mathematical Reviews (MR) is a division of the American Mathematical Society. Since 1940, Mathematical Reviews has been collecting data on research literature in mathematics. The collected authorship data are remarkably accurate (though not perfect). The Mathematics Research Collaboration Graph has the authors from mathematics publications as its vertices, and two vertices are joined by an edge if the corresponding authors have published an article together. MathSciNet contains information for over one million authors and nearly four million publications. Understanding the mathematical structure of the collaboration graph provides interesting and useful information about mathematics as a profession.

Research Prompts:

  1. In this graph, what is the average degree of a vertex? In other words, what is the average number of co-authors? What are other parameters for the graph?
  2. If all authors with no co-authors are eliminated, what is the average degree of a vertex in the resulting graph? What are other parameters for this subgraph?
  3. Are there any Super Collaborators, i.e., authors with a significantly large number of co-authors?
  4. Continuing from Question 3: for any Super Collaborator, consider the subgraph of the Collaboration Graph consisting of this author, their collaborators, the collaborators of their collaborators, the collaborators of the collaborators of their collaborators, and so on.
    i) How big is this subgraph?
    ii) What is the average degree of a vertex in this subgraph?
    iii) What is the maximum distance between any two co-authors of this subgraph?
    iv) What happens to this subgraph if the Super Collaborator is removed? Is the subgraph still connected? If not, how many components does it have? What is the distribution of sizes of these components?
  5. Find the connected components of the Collaboration Graph, and in particular, compare the two largest components. What structural differences are there between these two components? Is the largest component one of the subgraphs found in Question 4? Are there any authors whose removal results in two disconnected subgraphs of comparable size?
  6. Some measures of centrality in social networks include betweenness, closeness, and eigencentrality. Investigate these and other parameters for the largest component of the Collaboration Graph.
  7. What general statements can you make about collaboration in mathematics?

Datasets: The dataset consists of a list of author pairs, along with an identifier (MR number) of a paper that connects them.
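To make the prompts concrete, here is a minimal sketch of how a list of author pairs could be turned into a graph and queried for average degree and connected components. It assumes each record is a tuple `(author_a, author_b, mr_number)`; the actual file layout is not described above, and the function names are hypothetical. Repeated collaborations between the same two authors are counted as a single edge.

```python
from collections import defaultdict, deque


def build_graph(pairs):
    """Build an undirected adjacency map from (author_a, author_b, mr_number) tuples."""
    adjacency = defaultdict(set)
    for author_a, author_b, _mr_number in pairs:
        if author_a != author_b:  # ignore self-pairs, if any occur
            adjacency[author_a].add(author_b)
            adjacency[author_b].add(author_a)
    return adjacency


def average_degree(adjacency):
    """Mean number of distinct co-authors per author (equals 2E / V)."""
    if not adjacency:
        return 0.0
    return sum(len(neighbors) for neighbors in adjacency.values()) / len(adjacency)


def connected_components(adjacency):
    """Return the components as lists of authors, largest first (breadth-first search)."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            vertex = queue.popleft()
            component.append(vertex)
            for neighbor in adjacency[vertex]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return sorted(components, key=len, reverse=True)
```

A graph built only from author pairs contains no isolated vertices, so the average computed this way corresponds most closely to the second prompt; answering the first prompt would presumably also require a count of authors who have no co-authors.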

Required Coursework and Skills: Participating students must have taken Math 465 (Introduction to Combinatorics), EECS 444/544 (Analysis of Societal Networks), or their equivalents and be experienced in programming in Python or another general-purpose programming language.

Number of Participants: One undergraduate or graduate student will be chosen to work on this project.

Expected Project Start Date: January 25, 2021

Contact: If interested, please send your CV to Jonathan Gryak (gryakj@med.umich.edu), Senior Scientist at MIDAS, by January 18.

Medical Analysis (Sardar Ansari, MCIRCC, Department of Emergency Medicine)

Project Summary
MCIRCC is seeking volunteer students for the following projects:

1. Analysis of electrocardiogram (ECG) signals: ECG sensors are widely used in clinical settings for diagnosis and monitoring of patients. Hence, automatic analysis of the ECG signal can provide a powerful tool for diagnosis and prognosis of several acute and chronic conditions such as cardiac arrhythmias, myocardial infarction, heart failure and cardiogenic shock. The main objective of this project is to create a machine learning model that interprets ECG signals agnostic of any specific medical condition, similar to how a physician reads an ECG. The model can then be used in various medical applications using transfer learning techniques.

2. Prediction of unplanned transfers in rehabilitation unit using structured and unstructured medical data: MCIRCC has built an early warning system, PICTURE-Rehab, to predict unplanned readmissions of rehabilitation inpatients using structured medical data such as lab results and diagnosis codes. The aim of this project is to augment this model with unstructured medical data obtained from clinical notes using natural language processing.

3. Identifying racial bias in continuous oxygen saturation (SpO2) readings: a recent publication by Michigan Medicine researchers has uncovered racial bias in SpO2 readings from pulse oximeter sensors that are commonly used in hospitals and clinics, leading to underdiagnosis of black patients. The goal of this project is to expand this study to long-term continuous SpO2 readings from bedside monitors and to derive correction factors that would alleviate the existing bias.

4. Using heart rate variability to predict successful weaning from mechanical ventilation: adult patients in the ICU who undergo mechanical ventilation need to be weaned off (taken off) the ventilator when the therapy is no longer needed. Some percentage of patients who are taken off fail weaning and need to be put back on the ventilator. It is important to predict the outcome of this procedure to identify the best time for weaning the patient off the ventilator.

Responsibilities
All students will be responsible for conducting literature reviews, writing code and communicating the results via writing. In addition, each project has specific requirements:

1. Analysis of electrocardiogram (ECG) signals: the student(s) will convert diagnosis statements available for each ECG to labels that define the output for each record. Then, they will train and compare multiple unsupervised machine learning models to create a representation of the ECG that is agnostic of any specific disease. Finally, they will test the models in multiple medical applications to evaluate their efficacy.

2. Prediction of unplanned transfers in rehabilitation unit using structured and unstructured medical data: the student(s) will engage in data cleaning and creating a word embedding to represent the vocabulary of words in the medical notes. Then, they will use a multi-layered approach to model the discharge notes associated with each patient encounter. The output of the model will be a patient profile that is fed to PICTURE-Rehab to better predict unplanned patient transfers.

3. Identifying racial bias in continuous oxygen saturation (SpO2) readings: the student(s) will engage in data cleaning and aligning the readings derived from bedside monitors with those obtained from lab results. They will then conduct statistical analysis to identify racial bias in the data and derive the correction factors.

4. Using heart rate variability to predict successful weaning from mechanical ventilation: the student(s) will engage in data cleaning and deriving heart rate variability features from ECG waveforms. Then, they will train machine learning models with the features as input and the outcome of the weaning attempt as output.

Estimated at least 8 hours per week of work.
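For readers unfamiliar with the heart rate variability features mentioned in project 4, the sketch below computes two widely used time-domain measures, SDNN and RMSSD, from a sequence of RR intervals (the times between successive heartbeats, in milliseconds). The listing does not say which features or tools the project uses, so this is only an illustration; the function name is hypothetical, and detecting beats in the raw ECG waveform is assumed to have been done already.

```python
import math


def hrv_features(rr_ms):
    """Compute simple time-domain HRV features from RR intervals in milliseconds."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    n = len(rr_ms)
    mean_rr = sum(rr_ms) / n
    # SDNN: sample standard deviation of the RR intervals (n - 1 denominator).
    sdnn = math.sqrt(sum((x - mean_rr) ** 2 for x in rr_ms) / (n - 1))
    # RMSSD: root mean square of successive differences between RR intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"mean_rr": mean_rr, "sdnn": sdnn, "rmssd": rmssd}
```

Features like these, computed over windows before a weaning attempt, could serve as model inputs with the weaning outcome as the label.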

Required Experience
Junior or senior undergraduate or graduate students with the following specific requirements for each project:

1. Analysis of electrocardiogram (ECG) signals: the student(s) need to be fluent in Python, TensorFlow, and Keras. Knowledge of MATLAB is desirable.

2. Prediction of unplanned transfers in rehabilitation unit using structured and unstructured medical data: the student(s) need to be fluent in Python, TensorFlow, and Keras and have some knowledge of and/or coursework in natural language processing.

3. Identifying racial bias in continuous oxygen saturation (SpO2) readings: the student(s) need to be fluent in either Python or MATLAB and be able to perform statistical analysis.

4. Using heart rate variability to predict successful weaning from mechanical ventilation: the student(s) need to be fluent in MATLAB and general machine learning techniques.

Compensation
Students may register for independent research credit.

Contact
sardara@umich.edu

Date Posted
1/19/2021

Representation Learning of Brain Activity (Zhongming Liu, Biomedical Engineering, Electrical & Computer Engineering)

Project Summary
Representation learning of brain activity. Learning algorithms are designed to represent and decode brain activity, e.g., to reconstruct human vision, speech, language, or dreams. Abundant data are available from human or animal brains.

Responsibilities
The project involves data analysis, writing code, and literature analysis. Estimated 5-10 hours per week of work.

Required Experience
Graduate or senior undergraduate students in computer science, electrical engineering, biomedical engineering, statistics, or mathematics. Ideal candidates should have completed courses related to machine learning, especially deep learning, and have experience with PyTorch or TensorFlow.

Compensation
Hourly pay is available. Students may register for independent research credit.

Contact
zmliu@umich.edu

Date Posted
1/19/2021

DATA CHALLENGES AND HACKATHONS

Participating as teams or individuals, students use real-world data sets from industry sponsors, community organizations, or University research projects to answer predefined research questions. These events are often run as competitions, and MIDAS aims to promote student work by inviting judges from industry, who may (and often do) offer outstanding participants internships and permanent job opportunities.

INDUSTRY TALK-BACKS

Students learn how data science is used in industry and how best to prepare for careers through conversations with real-world professionals. Sessions are either centered around a theme (interviewing, company-specific job openings, etc.) or feature panelists from various fields to give a broad overview of career opportunities in data science.

SKILLS WORKSHOPS

MIDAS coordinates workshops with industry professionals that center around a specific data science tool, software package, or company-specific methodology. Participants actively engage with presenters using real-world data to gain practical experience. Workshop providers include AWS, Google, and Databricks. See the event page for more details on upcoming and past events.

UPCOMING EVENTS

Domino’s Information Session

February 23rd, 2021 | 12:00-1:30pm

Join us for a virtual opportunity to hear from members of Domino’s Global Analytics and Insights team! We’re a collaborative team of just over 50 supporting a brand that serves up an average of 3 million pizzas a day around the globe. Our team includes people with a wide variety of backgrounds including Data Science, Statistics, Economics, Computer Science, Industrial Engineering, Finance, Psychology, Marketing, Chemistry, Physics and even Nuclear Engineering. Come find out how you might fit into the A&I team and hear about professional training, networking and Employee Resource Group opportunities available to all Domino’s employees.