Tag

MIDAS

MIDAS Special Seminar: Qianying Lin – MIDAS Data Science Fellow


Qianying Lin

Michigan Institute for Data Science – Data Science Fellow


COVID-19 outbreak in Wuhan, China: in retrospect and in prospect

Since its first confirmation in December 2019, the novel coronavirus disease (COVID-19) has infected more than 50,000 people and claimed over 2,000 lives in Wuhan, China. It spread across the whole country shortly afterward and has now swept the world, causing more than 20,000 infections in countries other than China. Using officially reported cases and assuming a changing reporting ratio, we investigated the early stage of the COVID-19 epidemic in Wuhan and analysed its transmissibility. We then built a conceptual model that incorporated zoonotic introduction, emigration, individual reaction, and governmental action to simulate the trends of the outbreak in Wuhan, and predicted that the disease would be completely controlled by the end of April under current policies. These studies provide insights into not only the characteristics of COVID-19 itself but also the impact of governmental actions.

Read Global Reach’s article here. Read full transcript here.

For more information on MIDAS or the Seminar Series, please contact midas-contact@umich.edu.

#UMTweetCon2019

A Conference on the Use of Twitter Data for Research and Analytics

#UMTweetCon2019 will connect U-M scholars across a diverse set of disciplines in an interdisciplinary exchange about common challenges and lessons learned. We further seek to facilitate new connections to help U-M scholars create opportunities for future joint research, collaborative grant writing, training and other activities. Conference attendance will be open to anyone interested in learning about the wide array of Twitter data applications in current research at the University. The conference is sponsored by the Social Science and Social Media Collaborative, the Michigan Institute for Data Science, the #Parenting Rackham Interdisciplinary Group, and coordinated by the Center for Political Studies and the Institute for Social Research.

Call for Abstracts

Do you use Twitter data in your research? Then you are invited to submit an abstract for the first university-wide conference at the University of Michigan (Ann Arbor, Dearborn, and Flint) on the use of Twitter data in research and analytics.

To reflect the wide range of ongoing research across disciplines, we invite submissions that 1) directly examine dynamics of Tweet behavior and Twitter networks, 2) explore the representativeness and validity of Twitter data for making scientific inference, 3) develop new computational methodology for obtaining, processing, or archiving Twitter data, or 4) present applications of Twitter data for studying diverse social phenomena. During the 2-day conference, research presentations will be complemented with participatory sessions that give participants an opportunity to plan future activities and help create a regular user community across campuses (e.g., seminar series, computational training sessions, hackathons, and regular coding meetups).

Interested U-M researchers are asked to use the form linked here to submit a short abstract of 200-300 words that describes their research project, along with information about participating co-authors. Submissions are due by Friday, April 12, 2019.

Click here to submit an abstract for a panel or poster presentation.

Attending #UMTweetCon2019 will require a small, non-refundable registration fee from presenters and attendees alike (students/post-docs: $15 pre-conference online, $20 on-site; faculty/staff/other: $30 pre-conference online, $40 on-site). Presenters and attendees from Dearborn and Flint campuses will receive a registration discount (students/post-docs: $15, faculty/staff/other: $20). We will use the revenue from registration fees to fund best paper awards.

Understanding How the Brain Processes Music Through the Bach Trio Sonatas


This event is open to the public.

Daniel Forger, Professor of Mathematics and Computational Medicine and Bioinformatics
James Kibbie, Professor of Music and Chair of the Organ Department, University Organist
Caleb Mayer, Graduate Student Research Assistant (Mathematics)
Sarah Simko, Graduate Student Research Assistant (Organ Performance)

With support from the Data Science for Music Challenge Initiative through MIDAS, the team is taking a big data approach to understanding the patterns and principles of music. The project is developing and analyzing a library of digitized performances of the Trio Sonatas for organ by Johann Sebastian Bach, applying novel algorithms to study the music structure from a data science perspective. Organ students from the School of Music, Theatre & Dance will demonstrate how the Frieze Memorial Organ in Hill Auditorium is used to create big data files of live performances. The team will discuss how its analysis compares different performances to determine features that make performances artistic, as well as the common mistakes performers make. The digitized performances will be shared with researchers and will enable research and pedagogy in many disciplines, including data science, music performance, mathematics and music psychology.

MIDAS adds Associate Directors to boost campus engagement

By | General Interest, Happenings, News

The Michigan Institute for Data Science (MIDAS) has added two Associate Directors who will help increase outreach to all academic units at the University of Michigan.

  • Pamela Davis-Kean, Professor of Psychology and Research Professor at the Institute for Social Research, will be the new MIDAS Associate Director for Humanities and Social Sciences.
  • Kayvan Najarian, Professor of Computational Medicine and Bioinformatics and Emergency Medicine, will be the new MIDAS Associate Director for Health Sciences.
  • Ivo Dinov, Professor of Health Behavior and Biological Science, will continue as the MIDAS Associate Director for Education and Training.
  • H.V. Jagadish, Professor of Electrical Engineering and Computer Science, and the recently appointed Director of MIDAS, will lead outreach efforts for Engineering and the Physical Sciences.

“The goal is for each associate director to engage with corresponding parts of the University,” said Prof. Jagadish. “At times, that will mean simply being a primary point of contact for researchers engaged in data-driven science. But it will also entail developing data science activities or programs of particular interest to researchers in their respective parts of campus.”

Davis-Kean and Najarian will take their positions on March 1, 2019.

MIDAS was established in 2015 as part of the university-wide Data Science Initiative to promote interdisciplinary collaboration in data science and education. The institute has built a cohort of more than 200 affiliated faculty members who span all three U-M campuses. Institute funding has catalyzed several multidisciplinary research projects, many of which have generated significant external funding. MIDAS also plays a key role in establishing new educational opportunities, such as the graduate certificate in data science, and provides additional support for student groups, including one team that used data science to help address the Flint water crisis.

Graduate Studies in Computational & Data Sciences Info Session – Central Campus


Learn about graduate programs that will prepare you for success in computationally intensive fields. Pizza and pop provided.

  • The Ph.D. in Scientific Computing is open to all Ph.D. students who will make extensive use of large-scale computation, computational methods, or algorithms for advanced computer architectures in their studies. It is a joint degree program, with students earning a Ph.D. from their current departments, “… and Scientific Computing” — for example, “Ph.D. in Aerospace Engineering and Scientific Computing.”
  • The Graduate Certificate in Computational Discovery and Engineering trains graduate students in computationally intensive research so they can excel in interdisciplinary HPC-focused research and product development environments. The certificate is open to all students currently pursuing Master’s or Ph.D. degrees at the University of Michigan.
  • The Graduate Certificate in Data Science is focused on developing core proficiencies in data analytics:
    1) Modeling — Understanding of core data science principles, assumptions and applications;
    2) Technology — Knowledge of basic protocols for data management, processing, computation, information extraction, and visualization;
    3) Practice — Hands-on experience with real data, modeling tools, and technology resources.
  • The Graduate Certificate in Computational Neuroscience provides training in interdisciplinary computational neuroscience to graduate students in experimental neuroscience programs and to graduate students in quantitative science programs, such as physics, biophysics, mathematics and engineering. The curriculum includes required core computational neuroscience courses and coursework outside of the student’s home department research focus, i.e. quantitative coursework for students in experimental programs, and neuroscience coursework for students in quantitative programs.

Graduate Studies in Computational & Data Sciences Info Session – North Campus


Learn about graduate programs that will prepare you for success in computationally intensive fields. Pizza and pop provided.

  • The Ph.D. in Scientific Computing is open to all Ph.D. students who will make extensive use of large-scale computation, computational methods, or algorithms for advanced computer architectures in their studies. It is a joint degree program, with students earning a Ph.D. from their current departments, “… and Scientific Computing” — for example, “Ph.D. in Aerospace Engineering and Scientific Computing.”
  • The Graduate Certificate in Computational Discovery and Engineering trains graduate students in computationally intensive research so they can excel in interdisciplinary HPC-focused research and product development environments. The certificate is open to all students currently pursuing Master’s or Ph.D. degrees at the University of Michigan.
  • The Graduate Certificate in Data Science is focused on developing core proficiencies in data analytics:
    1) Modeling — Understanding of core data science principles, assumptions and applications;
    2) Technology — Knowledge of basic protocols for data management, processing, computation, information extraction, and visualization;
    3) Practice — Hands-on experience with real data, modeling tools, and technology resources.
  • The Graduate Certificate in Computational Neuroscience provides training in interdisciplinary computational neuroscience to graduate students in experimental neuroscience programs and to graduate students in quantitative science programs, such as physics, biophysics, mathematics and engineering. The curriculum includes required core computational neuroscience courses and coursework outside of the student’s home department research focus, i.e. quantitative coursework for students in experimental programs, and neuroscience coursework for students in quantitative programs.

H.V. Jagadish appointed director of MIDAS

By | General Interest, Happenings, News

H.V. Jagadish has been appointed director of the Michigan Institute for Data Science (MIDAS), effective February 15, 2019.

Jagadish, the Bernard A. Galler Collegiate Professor of Electrical Engineering and Computer Science at the University of Michigan, was one of the initiators of an earlier concept of a data science initiative on campus. With support from all academic units and the Institute for Social Research, the Office of the Provost and Office of the Vice President for Research, MIDAS was established in 2015 as part of the university-wide Data Science Initiative to promote interdisciplinary collaboration in data science and education.

“I have a longstanding passion for data science, and I understand its importance in addressing a variety of important societal issues,” Jagadish said. “As the focal point for data science research at Michigan, I am thrilled to help lead MIDAS into its next stage and further expand our data science efforts across disciplines.”

Jagadish replaces MIDAS co-directors Brian Athey and Alfred Hero, who completed their leadership appointments in December 2018.

“Professor Jagadish is a leader in the field of data science, and over the past two decades, he has exhibited national and international leadership in this area,” said S. Jack Hu, U-M vice president for research. “His leadership will help continue the advancement of data science methodologies and the application of data science in research in all disciplines.”

MIDAS has built a cohort of 26 active core faculty members and more than 200 affiliated faculty members who span all three U-M campuses. Institute funding has catalyzed several multidisciplinary research projects in health, transportation, learning analytics, social sciences and the arts, many of which have generated significant external funding. MIDAS also plays a key role in establishing new educational opportunities, such as the graduate certificate in data science, and provides additional support for student groups, including one team that used data science to help address the Flint water crisis.

As director, Jagadish aims to expand the institute’s research focus and strengthen its partnerships with industry.

“The number of academic fields taking advantage of data science techniques and tools has been growing dramatically,” Jagadish said. “Over the next several years, MIDAS will continue to leverage the university’s strengths in data science methodologies to advance research in a wide array of fields, including the humanities and social sciences.”

Jagadish joined U-M in 1999. He previously led the Database Research Department at AT&T Labs.

His research, which focuses on information management, has resulted in more than 200 journal articles and 37 patents. Jagadish is a fellow of the Association for Computing Machinery and the American Association for the Advancement of Science, and he served nine years on the Computing Research Association board.

Women in Big Data at Michigan Symposium


Please join us for the Women in Big Data at Michigan symposium. This day-long symposium will highlight women data science researchers at U-M, provide resources and support for women pursuing careers in data science, and feature a poster session, lunchtime roundtable discussions, a faculty panel, and ample time for networking.

Please fill out the registration form if you plan to attend, and consider submitting a poster.

For more information, see the event page at https://midas.umich.edu/2018-wbdm/.

Keynote Speaker:

Xihong Lin
Henry Pickering Walcott Professor of Biostatistics
Harvard T.H. Chan School of Public Health

Dr. Lin’s research focuses on the development and application of statistical and computational methods to analyze high-throughput genetic and genomic data in epidemiological, environmental and clinical studies, and to analyze complex exposure and phenotype data in observational studies.

U-M Speakers:

Presenters:

  • Jenna Wiens, Computer Science and Engineering
  • Snigdha Panagrahi, Statistics
  • Heather Mayes, Chemical Engineering
  • Danai Koutra, Computer Science and Engineering
  • Veronica Berrocal, Biostatistics
  • Maureen Sartor, DCMB

Panel Participants, “Charting a Career in Data Science”:

  • Liza Levina (Moderator), Statistics
  • Bhramar Mukherjee, Biostatistics
  • Rada Mihalcea, Computer Science and Engineering
  • Amy Cohn, Industrial and Operations Engineering
  • Rocio Titiunik, Political Science
  • Jennifer Linderman, Chemical Engineering

MIDAS Learning Analytics Challenge Symposium


Learning analytics is one of the research focus areas that MIDAS supports with its Challenge Awards.  Our long-term goal is to support this research area more broadly, using the Challenge Award projects as the starting point to build a critical mass.  This symposium offers a platform for all participants to explore collaboration opportunities and aims to attract more researchers to our hub.  It will feature in-depth presentations from two Challenge Award teams, and all participants are encouraged to submit posters on research related to Learning Analytics.

Agenda

9 am to 11:30 am: Welcome and Challenge Award presentations

11:30 am to 1 pm: Lunch, Poster Session, Networking [poster dimensions: up to 6 ft wide x 4 ft high]

1 to 2 pm: Panel discussion: The Future of Data Science for Learning Analytics at U-M

Panelists:

  • Steve DesJardins, Education, Public Policy
  • Cynthia Finelli, Engineering Education Research Program
  • Al Hero (Moderator), MIDAS, Electrical Engineering and Computer Science
  • Rada Mihalcea, Computer Science and Engineering
  • Stephanie Teasley, Information

Please register online.  Please submit poster abstracts (< 300 words).  Submission Deadline: May 15.

For questions: midas-research@umich.edu.

Recommended Visitor Parking: Palmer Parking Structure, Palmer Drive, Ann Arbor

U-M, MIDAS researchers supported by Chan Zuckerberg Initiative

By | General Interest, Happenings, News, Research

Several University of Michigan researchers, including faculty affiliated with MIDAS, recently received support from the Chan Zuckerberg Initiative under its Human Cell Atlas project.

The project seeks to create a shared, open reference atlas of all cells in the healthy human body as a resource for studies of health and disease. The project is funding a variety of software tools and analytic methods. The U-M projects are listed below:

Identifying genetic markers: dimension reduction and feature selection for sparse data
Investigator: Anna Gilbert, Department of Mathematics, MIDAS Core Faculty Member
Description: One of the modalities that scientists participating in the Human Cell Atlas will use to gather data is single cell RNA sequencing (scRNA-seq). The analysis of scRNA-seq data, however, poses novel biological and algorithmic challenges. The data are high dimensional and not necessarily in distinct clusters (indeed, some cell types exist along a continuum or developmental trajectory). In addition, data values are missing. To analyze these data, we must adjust our dimension reduction algorithms accordingly and either fill in the values or determine quantitatively the impact of the missing values. Furthermore, none of these steps is performed in isolation; they are part of a principled data analysis pipeline. This work will leverage over a decade of modern, sparsity-based machine learning methods and apply them to dimension reduction, marker selection, and data imputation for scRNA-seq data. In one of our two feature selection methods, we adapt a 1-bit compressed sensing algorithm (1CS) introduced by Genzel and Conrad. To select markers, the algorithm finds optimal hyperplanes that separate the given clusters of cells and that depend only on a small number of genes. The second method is based on a previously developed mutual information (MI) framework. This algorithm greedily builds a set of markers out of a set of statistically significant genes that maximizes information about the target clusters and minimizes redundancy between markers. The imputation algorithms use sparse data models to impute missing values and are tailored to integer counts.
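
The greedy, mutual-information-based marker selection described above can be pictured with a small sketch. This is not the project's algorithm: it assumes expression values have already been discretized, it skips the statistical significance filter, and it uses a simple "relevance minus average redundancy" score as a stand-in criterion. The gene names and the helper names `mutual_information` and `greedy_markers` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def marker_score(gene, genes, clusters, selected):
    """Relevance to the clusters minus mean redundancy with chosen markers.

    This scoring rule is an assumption made for illustration only.
    """
    relevance = mutual_information(genes[gene], clusters)
    if not selected:
        return relevance
    redundancy = sum(
        mutual_information(genes[gene], genes[s]) for s in selected
    ) / len(selected)
    return relevance - redundancy


def greedy_markers(genes, clusters, k):
    """Greedily pick up to k marker genes.

    genes: dict mapping gene name -> list of discretized values, one per cell.
    clusters: list of cluster labels, one per cell.
    """
    selected = []
    candidates = set(genes)
    while candidates and len(selected) < k:
        # Sorting makes tie-breaking deterministic (first name wins).
        best = max(
            sorted(candidates),
            key=lambda g: marker_score(g, genes, clusters, selected),
        )
        selected.append(best)
        candidates.remove(best)
    return selected
```

In a toy data set where one gene is an exact copy of an already selected marker, the redundancy term cancels its relevance, so the sketch prefers a gene that carries complementary information about the clusters.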

Computational tools for integrating single-cell RNA sequencing studies with genome-wide association studies
Investigator: Xiang Zhou, Biostatistics
Description: Single cell RNA sequencing (scRNAseq) has emerged as a powerful tool in genomics. Unlike previous bulk RNAseq that measures average expression levels across many cells, scRNAseq can measure gene expression at the single cell level. The high resolution of scRNAseq has thus far transformed genomics: scRNAseq has been applied to classify novel cell-subpopulations and states, quantify progressive gene expression, perform spatial mapping, identify differentially expressed genes, and investigate the genetic basis of expression variation. While many computational tools have been developed for analyzing scRNAseq data, tools for effective integrative analysis of scRNAseq with other existing genetic/genomic data types are underdeveloped. Here, we propose to extend our previous integrative methods and develop novel computational tools for integrating scRNAseq data with genome-wide association studies (GWASs). Our proposed tools will identify cell-subpopulations relevant to GWAS diseases or traits, facilitate the interpretation of association results, catalyze more powerful future association studies, and help understand disease etiology and the genetic basis of phenotypic variation. The proposed tools will be applied to integrate summary statistics from various GWASs with fine-scale cell-subpopulations identified from the Human Cell Atlas (HCA) project, to maximize the impact of HCA and facilitate our understanding of the genetic architecture of various human traits and diseases — a question of central importance to human health.

Joint analysis of single cell and bulk RNA data via matrix factorization
Investigator: Clayton Scott, Electrical Engineering and Computer Science, MIDAS Affiliated Faculty
Description: Single cell RNA sequencing (ssRNAseq) is a recently developed platform that enables the measurement of thousands of gene expression levels across individual cells in a tissue sample of interest. The ability to quantify gene expression at the cell level has great potential for advancing our understanding of the cellular processes that characterize a broad range of biological phenomena. However, compared with older bulk RNA technology, which measures expression levels of large numbers of cells in aggregate, ssRNAseq data has higher levels of measurement noise, which complicates its analysis. Furthermore, inferring cell type from ssRNAseq data is an unsupervised machine learning problem, which is difficult even without high measurement noise. To address these issues, we propose a mathematical and algorithmic framework to infer cellular characteristics by analyzing single cell and bulk RNA data simultaneously, via an approach grounded in matrix factorization. The developed algorithms will be evaluated on real data gathered by researchers at the University of Michigan who study breast cancer and spermatogenesis.
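
One simple way to picture analyzing single cell and bulk data simultaneously through matrix factorization is to factor both matrices with a shared set of gene signatures. The sketch below is an assumption for illustration, not the framework proposed by the project: it stacks the two genes-by-samples matrices side by side and runs plain non-negative matrix factorization with the standard multiplicative updates. The function names and the choice of NMF are hypothetical, and the code is written in pure Python for small toy inputs only.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(x, w, h):
    """Frobenius norm of x - w @ h."""
    wh = matmul(w, h)
    return sum(
        (xv - av) ** 2 for xr, ar in zip(x, wh) for xv, av in zip(xr, ar)
    ) ** 0.5


def joint_nmf(single_cell, bulk, rank, n_iter=200, seed=0):
    """Factor [single_cell | bulk] ~= W @ [H_sc | H_bulk] with a shared W.

    single_cell, bulk: non-negative genes-by-samples matrices (same gene order).
    Returns (W, H_sc, H_bulk), all non-negative.
    """
    rng = random.Random(seed)
    x = [sc_row + bulk_row for sc_row, bulk_row in zip(single_cell, bulk)]
    n_genes, n_cols = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_genes)]
    h = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(rank)]
    eps = 1e-9  # guards against division by zero
    for _ in range(n_iter):
        # Multiplicative update for H: H <- H * (W^T X) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(rank)]
        # Multiplicative update for W: W <- W * (X H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_genes)]
    n_cells = len(single_cell[0])
    return w, [row[:n_cells] for row in h], [row[n_cells:] for row in h]
```

Because the signature matrix W is shared, the less noisy bulk columns help constrain the same gene signatures that are used to describe individual cells. A realistic implementation would use a numerical library and a noise model suited to count data.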

Integrating single cell profiles across modalities using manifold alignment
Investigator: Joshua Welch, Computational Medicine and Bioinformatics
Description: Integrating the variation underlying different types of single cell measurements is a critical step toward a comprehensive catalog of human cell types. The ideal approach to construct a cell type atlas would use high-throughput single cell multi-omic profiling to simultaneously measure all cellular modalities of interest within each cell. Although this approach is currently out of reach, it is possible to separately perform high-throughput transcriptomic, epigenomic, and proteomic measurements at the single cell level. Computationally integrating multiple data modalities measured on different individual cells can circumvent the experimental challenges of multi-omic profiling. If different types of single cell measurements are performed on distinct single cells from a common population, each modality will sample a similar set of cells. Matching up similar cells to infer multimodal profiles enables some analyses for which multi-omic profiling is desirable, including multimodal cell type definition and studying covariance among different data types. Manifold alignment is a powerful computational technique for integrating multiple sources of data that describe the same set of events by discovering the common manifold (general geometric shape) that underlies them. Previously, we showed that transcriptomic and epigenomic measurements performed on distinct single cells share underlying sources of variation. We developed a computational method, MATCHER, which uses manifold alignment to integrate cell trajectories constructed from these measurements and infer single cell multi-omic profiles. Here, we will extend this approach to match multimodal single cell profiles sampled from an entire tissue.

Computational methods to enable robust and cost-effective multiplexing of single cell RNA-seq experiments at population scale
Investigator: Hyun Min Kang, Biostatistics
Description: With the advent of single-cell genomic technologies, the Human Cell Atlas (HCA) seeks to create reference maps of each individual cell type and to understand how they develop and maintain their functions, how they interact with each other, and which environmental and/or genetic changes trigger molecular dysfunction that leads to disease. To achieve these goals, it becomes increasingly important to creatively integrate single-cell genomic technologies with novel computational methods to maximize the potential of the new technological advances. Recently, our group has developed a computational tool, demuxlet, that enables population-scale multiplexing of droplet-based single-cell RNA-seq (dscRNA-seq) experiments. Our approach harnesses natural genetic variation carried within dscRNA-seq reads to multiplex cells from many samples in a single library prep and to statistically deconvolute the sample identity of each barcoded droplet while filtering out multiplets (droplets that contain two or more cells). In this proposal, we aim to extend our method to increase its accuracy by harnessing cell-specific expression levels, and to eliminate the constraint requiring external genotype data. We will enable application of these methods through production, distribution, and support of efficient, well-documented, open-source software, and test these tools through analysis of simulated data and of real dscRNA-seq data.
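
The idea of using genetic variation in reads to recover each droplet's sample identity can be sketched as a likelihood comparison. This is a simplified illustration and not the demuxlet implementation: it assumes known genotypes for every candidate sample, a single fixed sequencing error rate, independent reads, and no multiplet detection. The SNP identifiers, sample names, and function names are hypothetical.

```python
import math


def alt_read_probability(genotype, error_rate=0.01):
    """Probability that a read shows the alternate allele.

    genotype: number of alternate allele copies (0, 1, or 2).
    """
    alt_fraction = genotype / 2
    return alt_fraction * (1 - error_rate) + (1 - alt_fraction) * error_rate


def assign_droplet(reads, genotypes, error_rate=0.01):
    """Pick the sample whose genotypes best explain a droplet's reads.

    reads: list of (snp_id, is_alt) pairs observed in one barcoded droplet.
    genotypes: dict mapping sample name -> dict of snp_id -> 0/1/2.
    Returns (best_sample, log_likelihoods).
    """
    log_likelihoods = {}
    for sample, calls in genotypes.items():
        total = 0.0
        for snp_id, is_alt in reads:
            p_alt = alt_read_probability(calls[snp_id], error_rate)
            total += math.log(p_alt if is_alt else 1 - p_alt)
        log_likelihoods[sample] = total
    best = max(log_likelihoods, key=log_likelihoods.get)
    return best, log_likelihoods
```

SNPs where the candidate samples share a genotype contribute equally to every sample's likelihood, so only sites that differ between samples drive the assignment. Detecting multiplets would require additionally scoring mixtures of two samples, which this sketch omits.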