
ARC-TS seeks pilot users for two new research storage services

By | General Interest, Happenings, HPC, News

Advanced Research Computing – Technology Services (ARC-TS) is seeking pilot users for two new research storage services.

The first, Locker, is group project storage for large data sets, available at less than half the cost of current primary storage services. Despite the lower cost, Locker still provides encryption, replication, snapshots, and workstation access. Example use cases include research projects in climate studies, genomics, imaging, and other data-intensive sciences.

The second service, Data Den, provides archive-class storage for research data that is not actively used. As our lowest-cost research storage offering, Data Den provides “cold storage” for massive amounts of data, with 20 petabytes of encrypted and replicated capacity. Data Den allows researchers to preserve data between rounds of funding and management plans, and to free up space in more expensive primary storage by moving valuable but currently unused data there.

Those interested in participating in the pilots should contact ARC-TS at hpc-support@umich.edu.

MDST group wins KDD best paper award

By | General Interest, Happenings, MDSTPosts, Research

A paper by members and faculty leaders of the Michigan Data Science Team (co-authors: Jacob Abernethy, Alex Chojnacki, Arya Farahi, Eric Schwartz, and Jared Webb) won the Best Student Paper award in the Applied Data Science track at the KDD 2018 conference in August in London.

The paper, ActiveRemediation: The Search for Lead Pipes in Flint, Michigan, details the group’s ongoing work in Flint to detect pipes made of lead and other hazardous material.

For more on the team’s work, see this recent U-M press release.

U-M part of new software institute on high-energy physics

By | General Interest, Happenings, News, Research

The University of Michigan is part of an NSF-supported 17-university coalition dedicated to creating next-generation computing power to support high-energy physics research.

Led by Princeton University, the Institute for Research and Innovation in Software for High Energy Physics (IRIS-HEP) will focus on developing software and expertise to enable a new era of discovery at the Large Hadron Collider (LHC) at CERN in Geneva, Switzerland.

Shawn McKee, Research Scientist in the U-M Department of Physics, is a co-PI of the institute. His work will focus on integrating and extending the Open Science Grid networking activities with similar efforts at the LHC.

For more information, see Princeton’s press release, and the NSF’s announcement.

New course for fall 2018: On-Ramp to Data Science for Chemical Engineers

By | Educational, General Interest, Happenings, News

Description: Engineers are encountering and generating an ever-growing body of data and recognizing the utility of applying data science (DataSci) approaches to extract knowledge from that data. A common barrier to learning DataSci is the stack of prerequisite courses that cannot fit into the typical engineering student schedule. This class will remove this barrier by covering, in one semester, essential foundational concepts that are not part of many engineering disciplines’ core curricula. These include good programming practices, data structures, linear algebra, numerical methods, algorithms, probability, and statistics. The class will focus on how these topics relate to data science and will provide context for further self-study.

Eligibility: College of Engineering students, pending instructor approval.

More information: http://myumi.ch/LzqPq

Instructor: Heather Mayes, Assistant Professor, Chemical Engineering, hbmayes@umich.edu.

University of Michigan awarded Women in High Performance Computing chapter

By | General Interest, News

The University of Michigan has been recognized as one of the first Chapters in the new Women in High Performance Computing (WHPC) Pilot Program.

“The WHPC Chapter Pilot will enable us to reach an ever-increasing community of women, provide these women with the networks that we recognize are essential for them excelling in their career, and retaining them in the workforce,” says Dr. Sharon Broude Geva, WHPC’s Director of Chapters and Director of Advanced Research Computing (ARC) at the University of Michigan (U-M). “At the same time, we envisage that the new Chapters will be able to tailor their activities to the needs of their local community, as we know that there is no ‘one size fits all’ solution to diversity.”

“At WHPC we are delighted to be accepting the University of Michigan as a Chapter under the pilot program, and working with them to build a sustainable solution to diversifying the international HPC landscape,” said Dr. Toni Collis, Chair and co-founder of WHPC, and Chief Business Development Officer at Appentra Solutions.

The process of selecting organizations to participate in the program accounted for potential conflicts of interest; Geva did not vote on U-M’s application.

About Women in High Performance Computing (WHPC) and the Chapters and Affiliates Pilot Program

Women in High Performance Computing (WHPC) was created with the vision to encourage women to participate in the HPC community by providing fellowship, education, and support to women and the organizations that employ them. Through collaboration and networking, WHPC strives to bring together women in HPC and technical computing while encouraging women to engage in outreach activities and improve the visibility of inspirational role models.

WHPC has launched a pilot program for groups to become Affiliates or Chapters. The program will share the knowledge and expertise of WHPC as well as help to tailor activities and develop diversity and inclusion goals suitable to the needs of local HPC communities. During the pilot, WHPC will work with the Chapters and Affiliates to support and promote the work of women in their organizations, develop crucial role models, and assist employers in the recruitment and retention of a diverse and inclusive HPC workforce.

WHPC is stewarded by EPCC at the University of Edinburgh. For more information visit http://www.womeninhpc.org.  

For more information on the U-M chapter, contact Dr. Geva at sgeva@umich.edu.

MIDAS researchers’ papers accepted at ACM KDD data science conference in London

By | General Interest, Happenings, News, Research

Several U-M faculty affiliated with MIDAS will participate in the KDD 2018 conference in London in August. The meeting is held by the Association for Computing Machinery’s Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD).

U-M researchers had the following papers accepted:

Learning Adversarial Networks for Semi-Supervised Text Classification via Policy Gradient
Yan Li (U-M); Jieping Ye (U-M)

TINET: Learning Invariant Networks via Knowledge Transfer
Chen Luo (Rice University); Zhengzhang Chen (NEC Laboratories America); Lu-An Tang (NEC Laboratories America); Anshumali Shrivastava (Rice University); Zhichun Li (NEC Laboratories America); Haifeng Chen (NEC Laboratories America); Jieping Ye (U-M)

Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts
Jiaqi Ma (U-M); Zhe Zhao (Google); Xinyang Yi (Google); Jilin Chen (Google); Lichan Hong (Google); Ed Chi (Google)

Learning Credible Models
Jiaxuan Wang (U-M); Jeeheh Oh (U-M); Haozhu Wang (U-M); Jenna Wiens (U-M)

Deep Multi-Output Forecasting: Learning to Accurately Predict Blood Glucose Trajectories
Ian Fox (U-M); Lynn Ang (U-M); Mamta Jaiswal (U-M); Rodica Pop-Busui (U-M); Jenna Wiens (U-M)

ActiveRemediation: The Search for Lead Pipes in Flint, Michigan
Jacob Abernethy (Georgia Institute of Technology); Alex Chojnacki (U-M); Arya Farahi (U-M); Eric Schwartz (U-M); Jared Webb (Brigham Young University)

Career Transitions and Trajectories: A Case Study in Computing
Tara Safavi (U-M); Maryam Davoodi (Purdue University); Danai Koutra (U-M)

In addition, U-M Professor Jieping Ye will present at the event’s Artificial Intelligence in Transportation tutorial, and U-M Assistant Professor Qiaozhu Mei will speak as part of Deep Learning Day.

CASC image competition open for submissions

By | General Interest, Happenings, News

The image competition for the Coalition for Academic Scientific Computation (CASC) 2019 annual brochure is now open. Winning images will be featured in the brochure, which is distributed to industry, government and academia. An image from U-M Aerospace Engineering Professor Joaquim Martins was on the cover of the 2016 edition, and several U-M investigators have had their work featured in the brochure in other years.

Images will be judged on the following criteria:

  • Illustrative of research underway at the center submitting the proposed images
  • Focus on research that offers a broad representation of what CASC members have undertaken
  • Timeliness of visualization relative to events currently in the news
  • Exhibits intellectual merit
  • Provides scientific, cultural, economic impact
  • Compelling, visually interesting, lively, colorful images in a high-resolution format

Please send potential submissions to Dan Meisler, ARC Communications Manager, at dmeisler@umich.edu. The deadline is June 11, 2018.

ARC-TS joins Cloud Native Computing Foundation

By | General Interest, Happenings, News

Advanced Research Computing – Technology Services (ARC-TS) at the University of Michigan has become the first U.S. academic institution to join the Cloud Native Computing Foundation (CNCF), a foundation that advances the development and use of cloud native applications and services. Founded in 2015, CNCF is part of the Linux Foundation.

CNCF announced ARC-TS’s membership at the KubeCon and CloudNativeCon event in Copenhagen. A video of the opening remarks by CNCF Executive Director Dan Kohn can be viewed on the event website.

“Our membership in the CNCF signals our commitment to bringing cloud computing and containers technology to researchers across campus,” said Brock Palen, Director of ARC-TS. “Kubernetes and other CNCF platforms are becoming crucial tools for advanced machine learning, pipelining, and other research methods. We also look forward to bringing an academic perspective to the foundation.”

ARC-TS’s membership and participation in the group signal its adoption of, and commitment to, cloud-native technologies and practices. Users of containers and other CNCF services will have access to experts in the field.

Membership gives the U-M research community input into the continuing development of cloud-native applications and into CNCF-managed and ancillary projects. U-M is the second academic institution to join the foundation, and the only one in the U.S.

U-M, MIDAS researchers supported by Chan Zuckerberg Initiative

By | General Interest, Happenings, News, Research

Several University of Michigan researchers, including faculty affiliated with MIDAS, recently received support from the Chan Zuckerberg Initiative under its Human Cell Atlas project.

The project seeks to create a shared, open reference atlas of all cells in the healthy human body as a resource for studies of health and disease. The project is funding a variety of software tools and analytic methods. The U-M projects are listed below:

Identifying genetic markers: dimension reduction and feature selection for sparse data
Investigator: Anna Gilbert, Department of Mathematics, MIDAS Core Faculty Member
Description: One of the modalities that scientists participating in the Human Cell Atlas will use to gather data is single cell RNA sequencing (scRNA-seq). The analysis of scRNA-seq data, however, poses novel biological and algorithmic challenges. The data are high dimensional and not necessarily in distinct clusters (indeed, some cell types exist along a continuum or developmental trajectory). In addition, some data values are missing. To analyze these data, we must adjust our dimension reduction algorithms accordingly and either fill in the missing values or quantify their impact. Furthermore, none of these steps is performed in isolation; they are part of a principled data analysis pipeline. This work will leverage over a decade of modern, sparsity-based machine learning methods and apply them to dimension reduction, marker selection, and data imputation for scRNA-seq data. In one of our two feature selection methods, we adapt a 1-bit compressed sensing algorithm (1CS) introduced by Genzel and Conrad. To select markers, the algorithm finds optimal hyperplanes that separate the given clusters of cells and that depend only on a small number of genes. The second method is based on the mutual information (MI) framework developed in. This algorithm greedily builds, out of a set of statistically significant genes, a set of markers that maximizes information about the target clusters and minimizes redundancy between markers. The imputation algorithms use sparse data models to impute missing values and are tailored to integer counts.
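The greedy MI-based marker selection described above can be illustrated with a minimal sketch in the style of minimum-redundancy maximum-relevance (mRMR) selection. It assumes expression has already been discretized (for example, detected vs. not detected) and that candidate genes have already been filtered for statistical significance. The plug-in MI estimator, the mean-redundancy penalty, and the names `mutual_information` and `greedy_markers` are illustrative assumptions, not the project's actual implementation.

```python
import math
from collections import Counter
from typing import Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Plug-in estimate of I(X; Y), in nats, for two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # Sum p(a, b) * log(p(a, b) / (p(a) * p(b))) over observed pairs.
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def greedy_markers(genes: dict[str, list[int]], clusters: list[int], k: int) -> list[str]:
    """Greedily pick up to k genes, scoring each candidate by its MI with the
    cluster labels (relevance) minus its mean MI with already-chosen markers
    (redundancy). Hypothetical helper for illustration."""
    relevance = {g: mutual_information(v, clusters) for g, v in genes.items()}
    selected: list[str] = []
    candidates = sorted(genes)  # sorted so equal scores break alphabetically

    def score(g: str) -> float:
        if not selected:
            return relevance[g]
        redundancy = sum(mutual_information(genes[g], genes[s]) for s in selected)
        return relevance[g] - redundancy / len(selected)

    while candidates and len(selected) < k:
        # Round to absorb floating-point noise so equal scores tie deterministically.
        best = max(candidates, key=lambda g: round(score(g), 12))
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    clusters = [0, 0, 1, 1, 2, 2]
    genes = {
        "A": [1, 1, 0, 0, 0, 0],       # marks cluster 0
        "A_copy": [1, 1, 0, 0, 0, 0],  # carries the same information as A
        "B": [0, 0, 1, 1, 0, 0],       # marks cluster 1
    }
    print(greedy_markers(genes, clusters, k=2))  # ['A', 'B']
```

In the toy example, `A_copy` is as relevant as `B` on its own but adds nothing once `A` is chosen, so the redundancy term steers the second pick to `B`.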

Computational tools for integrating single-cell RNA sequencing studies with genome-wide association studies
Investigator: Xiang Zhou, Biostatistics
Description: Single cell RNA sequencing (scRNAseq) has emerged as a powerful tool in genomics. Unlike bulk RNAseq, which measures average expression levels across many cells, scRNAseq can measure gene expression at the single cell level. The high resolution of scRNAseq has already transformed genomics: scRNAseq has been applied to classify novel cell-subpopulations and states, quantify progressive gene expression, perform spatial mapping, identify differentially expressed genes, and investigate the genetic basis of expression variation. While many computational tools have been developed for analyzing scRNAseq data, tools for effective integrative analysis of scRNAseq with other existing genetic/genomic data types are underdeveloped. Here, we propose to extend our previous integrative methods and develop novel computational tools for integrating scRNAseq data with genome-wide association studies (GWASs). Our proposed tools will identify cell-subpopulations relevant to GWAS diseases or traits, facilitate the interpretation of association results, catalyze more powerful future association studies, and help understand disease etiology and the genetic basis of phenotypic variation. The proposed tools will be applied to integrate summary statistics from various GWASs with fine-scale cell-subpopulations identified from the Human Cell Atlas (HCA) project, to maximize the impact of HCA and facilitate our understanding of the genetic architecture of various human traits and diseases — a question of central importance to human health.

Joint analysis of single cell and bulk RNA data via matrix factorization
Investigator: Clayton Scott, Electrical Engineering and Computer Science, MIDAS Affiliated Faculty
Description: Single cell RNA sequencing (scRNAseq) is a recently developed platform that enables the measurement of thousands of gene expression levels across individual cells in a tissue sample of interest. The ability to quantify gene expression at the cell level has great potential for advancing our understanding of the cellular processes that characterize a broad range of biological phenomena. However, compared with older bulk RNA technology, which measures expression levels of large numbers of cells in aggregate, scRNAseq data have higher levels of measurement noise, which complicates their analysis. Furthermore, inferring cell type from scRNAseq data is an unsupervised machine learning problem, which is difficult even without high measurement noise. To address these issues, we propose a mathematical and algorithmic framework that infers cellular characteristics by analyzing single cell and bulk RNA data simultaneously, using an approach grounded in matrix factorization. The developed algorithms will be evaluated on real data gathered by University of Michigan researchers who study breast cancer and spermatogenesis.

Integrating single cell profiles across modalities using manifold alignment
Investigator: Joshua Welch, Computational Medicine and Bioinformatics
Description: Integrating the variation underlying different types of single cell measurements is a critical step toward a comprehensive catalog of human cell types. The ideal approach to construct a cell type atlas would use high-throughput single cell multi-omic profiling to simultaneously measure all cellular modalities of interest within each cell. Although this approach is currently out of reach, it is possible to separately perform high-throughput transcriptomic, epigenomic, and proteomic measurements at the single cell level. Computationally integrating multiple data modalities measured on different individual cells can circumvent the experimental challenges of multi-omic profiling. If different types of single cell measurements are performed on distinct single cells from a common population, each modality will sample a similar set of cells. Matching up similar cells to infer multimodal profiles enables some analyses for which multi-omic profiling is desirable, including multimodal cell type definition and studying covariance among different data types. Manifold alignment is a powerful computational technique for integrating multiple sources of data that describe the same set of events by discovering the common manifold (general geometric shape) that underlies them. Previously, we showed that transcriptomic and epigenomic measurements performed on distinct single cells share underlying sources of variation. We developed a computational method, MATCHER, which uses manifold alignment to integrate cell trajectories constructed from these measurements and infer single cell multi-omic profiles. Here, we will extend this approach to match multimodal single cell profiles sampled from an entire tissue.

Computational methods to enable robust and cost-effective multiplexing of single cell RNA-seq experiments at population scale
Investigator: Hyun Min Kang, Biostatistics
Description: With the advent of single-cell genomic technologies, the Human Cell Atlas (HCA) seeks to create reference maps of each individual cell type and to understand how cell types develop and maintain their functions, how they interact with each other, and which environmental and/or genetic changes trigger molecular dysfunction that leads to disease. To achieve these goals, it is increasingly important to creatively integrate single-cell genomic technologies with novel computational methods that maximize the potential of these technological advances. Recently, our group developed a computational tool, demuxlet, that enables population-scale multiplexing of droplet-based single-cell RNA-seq (dscRNA-seq) experiments. Our approach harnesses natural genetic variation carried within dscRNA-seq reads to multiplex cells from many samples in a single library prep, and to statistically deconvolute the sample identity of each barcoded droplet while filtering out multiplets (droplets that contain two or more cells). In this proposal, we aim to extend our method to increase its accuracy by harnessing cell-specific expression levels, and to eliminate the constraint requiring external genotype data. We will enable application of these methods through production, distribution, and support of efficient, well-documented, open-source software, and will test these tools through analysis of simulated data and of real dscRNA-seq data.
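The core idea of genotype-based demultiplexing described above can be sketched as a likelihood comparison. This is a heavily simplified illustration, not demuxlet's actual model. It assumes known per-sample genotypes (alt-allele dosage 0, 1, or 2 at each SNP), per-droplet reference/alternate read counts at those SNPs, a fixed symmetric sequencing error rate, and equal mixing of two cells in a doublet. Each droplet is scored under every single-sample and two-sample hypothesis with a binomial likelihood and flagged as a doublet when the best pair beats the best singlet by a margin. `ERR`, `doublet_margin`, and all function names are hypothetical.

```python
import math
from itertools import combinations

ERR = 0.01  # hypothetical per-base sequencing error rate

Reads = list[tuple[int, int]]     # (ref_count, alt_count) per SNP for one droplet
Genotypes = dict[str, list[int]]  # sample -> alt-allele dosage (0, 1, 2) per SNP


def alt_prob(dosages: list[int]) -> float:
    """Chance a read shows the alt allele, given the dosages of the cells in the
    droplet (assumed to contribute reads equally), with symmetric error ERR."""
    frac = sum(d / 2 for d in dosages) / len(dosages)
    return frac * (1 - ERR) + (1 - frac) * ERR


def log_lik(reads: Reads, genotypes: Genotypes, members: tuple[str, ...]) -> float:
    """Binomial log-likelihood of the droplet's counts, minus the binomial
    coefficient (constant across hypotheses, so it cannot change the winner)."""
    total = 0.0
    for snp, (ref, alt) in enumerate(reads):
        p = alt_prob([genotypes[m][snp] for m in members])
        total += alt * math.log(p) + ref * math.log(1 - p)
    return total


def assign_droplet(reads: Reads, genotypes: Genotypes, doublet_margin: float = 2.0):
    """Return ("singlet", (sample,)) or ("doublet", (s1, s2)) for one droplet."""
    samples = sorted(genotypes)
    singlets = {(s,): log_lik(reads, genotypes, (s,)) for s in samples}
    pairs = {p: log_lik(reads, genotypes, p) for p in combinations(samples, 2)}
    best_single = max(singlets, key=singlets.get)
    if pairs:
        best_pair = max(pairs, key=pairs.get)
        if pairs[best_pair] - singlets[best_single] > doublet_margin:
            return "doublet", best_pair
    return "singlet", best_single


if __name__ == "__main__":
    genotypes = {"S1": [0, 0, 2, 2], "S2": [2, 2, 0, 0], "S3": [0, 2, 0, 2]}
    print(assign_droplet([(5, 0), (4, 0), (0, 6), (0, 5)], genotypes))  # ('singlet', ('S1',))
    print(assign_droplet([(3, 3)] * 4, genotypes))  # ('doublet', ('S1', 'S2'))
```

Comparing log-likelihoods under a fixed margin is the simplest possible decision rule; a fuller treatment would weigh hypotheses with priors and genotype uncertainty, which this sketch deliberately omits.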