The Michigan Data Science Team, with support from MIDAS, won the best poster competition at the Meeting the Challenges of Safe Transportation in an Aging Society Symposium, held Sept. 14-15, 2016.
The MIDAS Graduate Certificate in Data Science was established in 2015 to offer students a way to enhance their skills and prepare for a workforce that values multidisciplinary knowledge, broad analytical skills, and agile technological abilities. Nearly 50 students have enrolled in the program, which requires 9 credits of courses and 3 credits of experiential training, and involves mentorship opportunities with MIDAS-affiliated faculty members.
Chaoyi Jiao, who recently received his Ph.D. from the Department of Climate and Space Sciences and Engineering and is now a post-doc there, was the first recipient of the MIDAS Graduate Certificate in Data Science. He recently answered a few questions about the program.
What are your research interests, and how does “data science,” broadly speaking, pertain to them?
My research focuses primarily on Arctic climate change and climate modeling. Observations show that the Arctic is warming at a much more rapid pace than the middle latitudes and tropics. Thus, further warming of the climate system may pose an increasing threat to the climate and ecosystem in the Arctic. I hope to gain a better understanding of Arctic climate change and improve the numerical representation of Arctic climate in climate models. As the data generated by the current generation of climate models and observational networks are growing rapidly, more sophisticated data analysis skills are becoming more and more important for this research area.
Why did you decide to pursue the Graduate Certificate in Data Science?
As I started to conduct my Ph.D. research project, I realized that statistics and data analysis skills are quite important, so I started to take some statistics classes in my second year. Later I learned that there is a Data Science certificate program. I was very interested in the learning opportunities and academic experience offered by this program, and I also thought it could greatly benefit my career. So I decided to apply.
How hard or easy was it to meet the academic requirements?
Some classes were quite challenging when I started. But generally speaking, I think the academic requirements of this certificate are quite reasonable.
Were you required to take courses that you wouldn’t otherwise have taken? If so, how did they help you broaden your view of data science?
I would say probably not. I was planning to take some courses related to statistics and machine learning before this certificate became available. But I think if I had enrolled in the data science program earlier, I might have taken one or two extra classes. My experience tells me that taking classes outside one’s own research field often helps one think with a broader perspective.
Why should other U-M students pursue this certificate?
I think that as many research fields become more and more data driven, mastering cutting-edge data analysis skills can greatly benefit one’s career. I would say that if you believe your research field is data driven and you hope to learn more advanced data science topics, you definitely should consider this certificate. Moreover, the data science certificate also provides great opportunities for networking with other students in the program.
The new Data Acquisition for Data Science (DADS) program supports acquisition, preparation, management, and maintenance of specialized research data sets used in current and future data science-enabled research projects across U-M, with special focus on the four challenge initiative areas pursued by the Michigan Institute for Data Science (MIDAS): transportation science, health science, social science, and learning analytics.
DADS is meant to provide datasets that can be used by multiple U-M researchers and departments.
DADS is funded through the Data Science Initiative (DSI); total funding is capped at $200,000 per year for 5 years.
DADS will be managed jointly by the Library and Advanced Research Computing (ARC), with support from ARC’s Consulting for Statistics, Computing, and Analytics Research (CSCAR), MIDAS, and ARC-Technology Services (ARC-TS) units.
For more information, see arc.umich.edu/dads.
ABSTRACT: Recovery from the Flint Water Crisis has been hindered by uncertainty in both the water testing process and the causes of contamination. On the other hand, city, state, and federal officials have been collecting and organizing a significant amount of data, including many thousands of water samples, information on pipe materials, and city records. Combining all of this information, and utilizing state-of-the-art algorithmic and statistical tools, we have been able to develop a clearer picture of the source of the problems, to accurately estimate the greatest risks, and to more efficiently direct resources towards recovery.
CONTACT: Dan Meisler, ARC Communications Manager, 734-764-7414, firstname.lastname@example.org
A strategic partnership between the University of Michigan and software company Yottabyte promises to unleash a new wave of data-intensive research by providing a flexible computing cloud for complex computational analyses of sensitive and restricted data.
The Yottabyte Research Cloud will provide scientists high performance, secure and flexible computing environments that enable the analysis of sensitive data sets restricted by federal privacy laws, proprietary access agreements, or confidentiality requirements. Previously, the complexity of building secure and project-specific IT platforms often made the computational analysis of sensitive data prohibitively costly and time consuming.
The system is built on $5.5 million worth of hardware and software donated to the University by Yottabyte; U-M will provide $2 million to support delivery of services to researchers and general operations.
Brahmajee Nallamothu, professor of internal medicine, tested a pilot installation of the Yottabyte Research Cloud at the U-M Institute of Healthcare Policy and Innovation for his research on such topics as predictors of opioid use after surgery and the costs and uses of cancer screenings under the Affordable Care Act.
“We recently moved a healthcare claims database, which is multiple terabytes in size and requires a great deal of memory and fast storage to process, onto the pilot platform,” Nallamothu said. “The platform allows us to immediately increase or decrease computing resources to meet demand while permitting multiple users to access the data safely and remotely. Our previous setup relied on network storage and self-managed hardware, which was extremely inefficient compared to what we can do now.”
“The Yottabyte Research Cloud will improve research productivity by reducing the cost and time required to create the individualized, secure computing platforms that are increasingly necessary to support scientific discovery in the age of Big Data,” said Eric Michielssen, associate vice president for advanced research computing at U-M.
“With the Yottabyte Research Cloud, researchers will be able to ask more questions, faster, of the ever-expanding and massive sets of data collected for their work,” said Yottabyte CEO Paul E. Hodges, III. “We are very pleased to be a part of the diverse and challenging research environment at U-M. This partnership is a great opportunity to develop and refine computing tools that will increase the productivity of U-M’s world class researchers.”
Many U-M scientists are working on a variety of research projects that could benefit from use of the Yottabyte Research Cloud:
- Healthcare research, for example in precision medicine, often requires working with sensitive patient information and large volumes of diverse data types. This research can yield results that positively impact patients’ lives, but often involves the analysis of millions of clinical observations that can include genomic, hospital, outpatient, pharmaceutical, laboratory and cost data. This requires a secure high performance computing ecosystem coupled to massive amounts of multi-tiered storage.
- In the social sciences, U-M research requires secure, remote access to sensitive research data about substance abuse, mental health, and other topics.
- Transportation researchers who mine large and sensitive datasets (for example, a 24-terabyte dataset that includes videos of drivers’ faces and GPS traces of their journeys) also stand to benefit from the security features and computing power.
- In learning analytics, studies of the persistence of teacher effects on student learning could benefit from the enclaves to store and analyze data that includes observational measures scored from classroom videos, and elementary and middle school students’ scores on standardized tests.
- Researchers in brain science will be able to use the Yottabyte Research Cloud to investigate a wide range of topics including the effects of aging on brain function and structure and how we focus our attention in the presence of distraction.
The Yottabyte Research Cloud is U-M’s first foray into software-defined infrastructure for research, allowing on-the-fly personalized configuration of any-scale computing resources, which promises to change the way traditional IT infrastructure systems are deployed across the research community.
More about Yottabyte: www.yottabyte.com.
More about Yottabyte Research Cloud: arc-ts.umich.edu/yrc