MIDAS Data Science Fellow Elyse Thulin Awarded Best Poster by a Trainee at the UCSF Promoting Research in Social Media and Health Symposium 

By | Feature, News, Research, research papers

At the 2022 UCSF Promoting Research in Social Media and Health Symposium, Elyse Thulin, a Postdoctoral Fellow at MIDAS and at the Addiction Center in the Department of Psychiatry at Michigan Medicine, was awarded Best Poster by a Trainee.

Elyse’s broad program of research uses traditional epidemiologic, mixed qualitative and quantitative, and computational machine learning methods to examine how people use online and virtual spaces to interact in ways that both hinder and support wellbeing, mental health, and changes in substance use behaviors. More specifically, Elyse’s areas of research include cyber dating violence, online substance use recovery support groups, and online support groups for traumatic change/loss. Computational skills greatly enhance her work: they enable her to scrape data from online sources, use natural language processing to identify top terms, themes, and sentiment in text, and extend traditional qualitative methods to efficiently code thousands of posts. Elyse’s long-term goal is to become a faculty member who teaches, mentors students, and conducts research on expanded applications of computational social science for health and wellbeing.

Online public support group for recovery from problematic cannabis use: trends of use and topics of discussion
Elyse J. Thulin, PhD, Anne Fernandez, PhD, Erin Bonar, PhD, Maureen Walton, PhD

Download a PDF version of the poster here.

Elyse provided the following statement about her research:

“Over the past two decades, there have been significant increases in cannabis consumption in the U.S., tied to greater state legalization of recreational (21 states) and medical (37 states) cannabis use, new routes of administration (e.g., vaping, dabbing, edibles), and increased potency of THC. This is worrying given increases in emergency-room injuries related to cannabis use and the increased prevalence of cannabis use disorders (CUD). Despite increased risk of injury related to cannabis and growing prevalence of cannabis use disorder, admission rates for clinical treatment are down, and more than 85% of those who would qualify as having CUD do not receive clinical forms of treatment. In contrast, in recent years there has been an uptick in the use of online nonclinical services for those looking to change their cannabis use behaviors. Despite this uptick, very little is known at this time about the functionality, content and interactions occurring within non-clinical, online spaces. In this poster presentation, I aimed to begin to fill this gap by identifying the major themes of conversation, contextual information about those themes, and the overlap between those themes and the four domains of recovery proposed by the US Substance Abuse and Mental Health Services Administration (SAMHSA), in a publicly available online community of individuals who are aiming to cease using cannabis.

I used a data-driven approach to inform the methods of this study. I scraped 10 years of data from a popular Reddit forum on cannabis cessation. I then evaluated the growth of the community across the 10-year period. I next used pre-processing NLP methods (e.g., case uniformity, stemming, etc.) to ready the data for analysis, then identified the top words and terms present in posts. Finally, I extracted a subset of posts to analyze by hand using qualitative methods, to determine the context around top words and phrases. The growth of the community and top words can be found on the poster. Most importantly, we found five major themes in posts to the online cannabis cessation community: individual identity and cannabis use; consequences of cannabis use; reasons for change; cessation strategies; and consequences of change. While examples within these five themes overlapped with the three SAMHSA domains of health, community and purpose, the domain of home was less common and may be less pertinent to this community. At the same time, many posts referenced individual identity and cannabis use. Examples were “I smoked daily for ten years” and “I took my first tok at 14, and by 16 I was using in the morning, afternoon and night”. In the context of a common (but incorrect) public narrative that cannabis is not harmful or addictive, individuals in this community may find it important to share the frequency or longevity of their experiences to help emphasize the significant role that cannabis had in their day-to-day lives. It may be that increased public awareness that cannabis can be addictive and harmful, particularly when use begins in adolescence or early adulthood or is heavy and frequent, would create greater opportunities for individuals who have experienced dependence or are wanting to change their cannabis use behaviors.”
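The pre-processing and top-term steps mentioned in the statement (case uniformity, stemming, counting frequent words) can be illustrated with a minimal Python sketch. This is not the study's code: the stop-word list, the crude suffix-stripping "stemmer" and the example posts are all assumptions made up for illustration, and a real analysis would more likely rely on an established NLP library.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"i", "a", "the", "and", "to", "for", "my", "is", "of", "in"}


def crude_stem(token):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(post):
    """Lowercase the post, keep alphabetic tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z']+", post.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def top_terms(posts, n=5):
    """Count stemmed tokens across all posts and return the n most frequent."""
    counts = Counter()
    for post in posts:
        counts.update(preprocess(post))
    return counts.most_common(n)


# Made-up posts, not data from the study.
EXAMPLE_POSTS = [
    "I smoked daily for ten years",
    "Quitting smoking helped my sleep",
    "Day 3 of quitting and sleeping better",
]

if __name__ == "__main__":
    print(top_terms(EXAMPLE_POSTS, 3))
```

Because stemming folds "smoked" and "smoking" (or "sleep" and "sleeping") into one term, the counts reflect topics rather than surface word forms, which is the reason for doing it before ranking terms.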

U-M partners with Cavium on Big Data computing platform

By | Feature, General Interest, Happenings, HPC, News

A new partnership between the University of Michigan and Cavium Inc., a San Jose-based provider of semiconductor products, will create a powerful new Big Data computing cluster available to all U-M researchers.

The $3.5 million ThunderX computing cluster will enable U-M researchers to, for example, process massive amounts of data generated by remote sensors in distributed manufacturing environments, or by test fleets of automated and connected vehicles.

The cluster will run the Hortonworks Data Platform providing Spark, Hadoop MapReduce and other tools for large-scale data processing.

“U-M scientists are conducting groundbreaking research in Big Data already, in areas like connected and automated transportation, learning analytics, precision medicine and social science. This partnership with Cavium will accelerate the pace of data-driven research and open up new avenues of inquiry,” said Eric Michielssen, U-M associate vice president for advanced research computing and the Louise Ganiard Johnson Professor of Engineering in the Department of Electrical Engineering and Computer Science.

“I know from experience that U-M researchers are capable of amazing discoveries. Cavium is honored to help break new ground in Big Data research at one of the top universities in the world,” said Cavium founder and CEO Syed Ali, who received a master of science in electrical engineering from U-M in 1981.

Cavium Inc. is a leading provider of semiconductor products that enable secure and intelligent processing for enterprise, data center, wired and wireless networking. The new U-M system will use dual-socket servers powered by Cavium’s ThunderX ARMv8-A workload-optimized processors.

The ThunderX product family is Cavium’s 64-bit ARMv8-A server processor for next generation Data Center and Cloud applications, and features high performance custom cores, single and dual socket configurations, high memory bandwidth and large memory capacity.

Alec Gallimore, the Robert J. Vlasic Dean of Engineering at U-M, said the Cavium partnership represents a milestone in the development of the College of Engineering and the university.

“It is clear that the ability to rapidly gain insights into vast amounts of data is key to the next wave of engineering and science breakthroughs. Without a doubt, the Cavium platform will allow our faculty and researchers to harness the power of Big Data, both in the classroom and in their research,” said Gallimore, who is also the Richard F. and Eleanor A. Towner Professor, an Arthur F. Thurnau Professor, and a professor both of aerospace engineering and of applied physics.

Along with applications in fields like manufacturing and transportation, the platform will enable researchers in the social, health and information sciences to more easily mine large, structured and unstructured datasets. This will eventually allow researchers, for example, to correlate health outcomes and disease outbreaks with information derived from socioeconomic, geospatial and environmental data streams.

U-M and Cavium chose to run the cluster on Hortonworks Data Platform, which is based on open source Apache Hadoop. The ThunderX cluster will deliver high-performance computing services for Hadoop analytics and, ultimately, a total of three petabytes of storage space.

“Hortonworks is excited to be a part of forward-leading research at the University of Michigan exploring low-powered, high-performance computing,” said Nadeem Asghar, vice president and global head of technical alliances at Hortonworks. “We see this as a great opportunity to further expand the platform and segment enablement for Hortonworks and the ARM community.”

Liza Levina, PhD, Chosen IMS Medallion Lecturer in 2019

By | Events, Feature, General Interest, Happenings, News, Research

Professor Liza Levina has been selected to present an Institute of Mathematical Statistics (IMS) Medallion Lecture at the 2019 Joint Statistical Meetings (JSM).

Each year eight Medallion Lecturers are chosen from across all areas of statistics and probability by the IMS Committee on Special Lectures. The Medallion nomination is an honor and an acknowledgment of a significant research contribution to one or more areas of research. Each Medallion Lecturer will receive a Medallion in a brief ceremony preceding the lecture.

Raising the next generation of data scientists at the MIDAS Summer Camp

By | Educational, Feature, General Interest, News

This summer, 10 high school students from around the country gathered in Ann Arbor for the first annual Michigan Institute for Data Science Summer Camp on the campus of the University of Michigan.

The weeklong camp, titled “From Simple Building Blocks to Complex Shapes: A Visual Tour of Fourier Series,” drew students from as far away as Kansas City, MO, and as nearby as Ypsilanti and Ann Arbor.

The camp was organized by Raj Nadakuditi, assistant professor in the Electrical Engineering and Computer Science Department. Other U-M faculty instructors at the camp were Prof. Jenna Wiens and MIDAS co-directors Prof. Al Hero and Prof. Brian Athey.

The camp was well received by the participants, who ranged from high school sophomores to seniors. A total of 10 students attended, five boys and five girls. Students used Fourier series to make art, diagnose disease, and “play detective.”

“I’ve been looking to learn about what’s been going on with Big Data,” said Daniel Neamati, a 16-year-old from Ann Arbor who hopes to someday study deep space with NASA. “I was really surprised by this camp. Math is basically everywhere.”

Elizabeth Fitzgerald, 16, traveled from South Carolina to take part in the camp. She said she wants to study artificial intelligence and machine learning, but was interested to see what else data science can explain.

“It was enlightening to see all the different applications of data science,” she said.

Building a Community of Social Scientists with Big Data Skills: The ICOS Big Data Summer Camp

By | Educational, Feature, General Interest, News

As the use of data science techniques continues to grow across disciplines, a group of University of Michigan researchers are working to build a community of social scientists with skills in Big Data through a week-long summer camp for faculty and graduate students.

Having recently completed its fourth annual session, the Big Data Summer Camp held by the Interdisciplinary Committee for Organizational Studies (ICOS) trains approximately 50 people each spring in skills and methods such as Python, SQL, and social media APIs. Campers split into several groups, each of which tries to answer a research question using these newly acquired skills.

Working with researchers from other fields is a key component of the camp, and of creating a Big Data social science community, said co-coordinator Todd Schifeling, a Research Fellow at the Erb Institute in the School of Natural Resources and Environment.

“Students meet from across social science disciplines who wouldn’t meet otherwise,” said Schifeling. “And every year we bring back more and more past campers to present on what they’ve been doing.”

Schifeling himself participated in the camp as a student before taking on the role of coordinator this year.

Teddy DeWitt, the other co-coordinator of the camp and a doctoral student at the Ross School of Business, added that the camp presents the curriculum in a way that is unique on campus.

“This set of material does not seem to be available in other parts of the university, at least … with an applied perspective in mind,” he said. “So we’re glad we have this set of resources that is both accessible and well-received by students.”

Participants range in skill from beginning to advanced, but even a relatively advanced student like Jeff Lockhart, a doctoral student in sociology and population studies who describes himself as “super-committed to computational social science,” said that it’s hard to find classes in computational methods in social science departments.

“[The ICOS camp] doesn’t expect a lot of prior knowledge, which I think is critical,” Lockhart said.

Lockhart, DeWitt, and Dylan Nelson, also a sociology doctoral student, are working on setting up a series of workshops in Computational Social Science for fall 2016 (contact Lockhart at jwlock@umich.edu for more information). Lockhart said it’s critical that social scientists learn Big Data skills.

“If we don’t have skills like this, there’s no way for us to enter into these fields of research that are going to be more and more important,” he said.

“A lot of the skills we’ve learned are sort of the on-ramp for doing data science,” DeWitt added.

The camp is co-sponsored by Advanced Research Computing (ARC).

Great Lakes Observing System Data Challenge: Call for Issue Experts, Sponsors

By | Educational, Feature, General Interest, Happenings, News

CALL FOR ISSUE EXPERTS AND SPONSORS

The Great Lakes Observing System (GLOS) is hosting the Great Lakes Data Challenge in the summer of 2016. As part of our 10-year anniversary, GLOS will be taking open data to the next level by using open innovation to broaden our community and create new partnerships to engage people in problem solving for the Great Lakes. GLOS is currently soliciting support from sponsors and issue experts.

GOALS

  • Inspire a wider audience to engage with Great Lakes issues
  • Use technologies, innovation and creativity to solve Great Lakes problems
  • Encourage the use of open data resources from GLOS and beyond

TIMELINE

  • Late May 2016: Launch challenge
  • June: Kick-off event(s), including IAGLR
  • August 15: Submissions due. Submissions can include an app, data “mash-up”, visualization, story, or other innovative idea for using, collecting, analyzing, visualizing, and/or communicating Great Lakes data.
  • August 15-31: Judging
  • September 15: Winners notified
  • October 12-13: Award presentation at GLOS Annual Meeting in Ann Arbor, MI

GLOS PROVIDES

  • Baseline prize money: $5,000
  • Data, technical support, and resources for developer guidelines, rules, etc.
  • Data Challenge(s) coordination

WE NEED YOU

Sponsors: by May 20. The Great Lakes Data Challenge is a unique opportunity to network the region’s environmental, governmental and non-profit sectors with the information technology sector. Sponsors must commit by May 20 to ensure inclusion in event promotions.

Consider sponsoring the challenge at one of our suggested levels (listed below) to help support prize money, event costs, and promotional giveaways. This is a great way to promote your business/organization to a diverse audience of environmental data and technology stakeholders.

Issue Experts: by June 1. We are looking for volunteers with expertise in areas including invasive species, nutrients and algae, and boater safety, among others. You would agree to be a resource to teams who have specific questions about the topic at hand. The commitment could be flexible according to your interest and availability.

Please contact GLOS at kpaige@glos.us if you are interested in supporting the data challenge in any of these areas.

Be a part of the Great Lakes Observing System’s Data Challenge

  • SUPERIOR $5,000
    All lower level sponsorship benefits as well as…
    Top billing as Data Challenge co-sponsor in all event promotions and media releases
    Large, prominent logo on event giveaways, promotional signage, and website
  • MICHIGAN $2,500
    All lower level sponsorship benefits as well as…
    Acknowledgement as co-sponsor for a custom challenge category
    Logo on event giveaways
  • HURON $1,000
    All lower level sponsorship benefits as well as…
    Sponsorship acknowledgement at promotional events including kick-off and award presentation
    Logo on Data Challenge website and promotional signage
  • ONTARIO $500
    All lower level sponsorship benefits as well as…
    Sponsorship acknowledgement on promotional signage
    Complimentary individual (for 1 person) GLOS membership and registration to the GLOS Annual Meeting
  • ERIE $250
    Sponsorship acknowledgement and website link on Data Challenge website
    Acknowledgement in GLOS Annual Report

UMHS – PUHSC Joint Institute 2016 Symposium: Call for Posters

By | clinical, Events, Feature, Happenings, Paper/Presentation Solicitation, Translational

University of Michigan Health System & Peking University Health Science Center

Joint Institute for Translational and Clinical Research

2016 Symposium

Call for Posters

[Downloadable Directions]

Call for Poster Abstracts: Submission Information
Showcase your research to Peking University Health Science Center counterparts as the JI looks to expand by offering funding to non-medical school faculty for new health-related joint research projects. A great venue to meet potential collaborators, the poster session will be Thursday, Oct. 13. Details, including times, will follow poster acceptance.

How to submit
Abstracts should relate to clinical and translational research studies and should be submitted electronically in a Microsoft Word document.

Send abstracts to globalreach@umich.edu by September 9, 2016. Please include the following:

Title

  • Title should be brief but should not contain abbreviations.
  • Do not use bold letters in the title unless necessary.
  • Do not capitalize all letters in title, only the first word and key words.

Authors

  • Include all authors and their affiliations. To associate authors and their institutional affiliations, please place a number in parentheses after each author’s name (if more than one author) and the corresponding number before each affiliated institution’s name (if more than one institution).
  • Put the submitting/presenting author’s name in bold.
  • Do not capitalize all letters in speaker information, only as appropriate.

Abstract

  • Abstracts are limited to 300 words. Use size 11 Arial or Calibri font.
  • Submit text only. Do not include tables, graphics, or charts.
  • Do not include title, authors, or author affiliations in the abstract text.
  • Abstracts may include background, methods, results, conclusions, and funding-source acknowledgements, if applicable.

Submitter contact information

  • First and last name, degrees
  • Email address

Please proofread carefully – information submitted with errors may be published as is. Use a word processing program to assist with checking for grammar and spelling errors, as well as word count.
The deadline to submit abstracts is Sept. 9, 2016. For more information, contact globalreach@umich.edu.

New on-campus data-science and computational research services available

By | Feature, General Interest, News

Researchers across campus now have access to several new services to help them navigate the new tools and methodologies emerging for data-intensive and computational research.

As part of the U-M Data Science Initiative announced in fall 2015, Consulting for Statistics, Computing and Analytics Research (CSCAR) is offering new and expanded services, including guidance on:

  • Research methodology for data science.
  • Large scale data processing using high performance computing systems.
  • Optimization of code and use of Flux and other advanced computing systems.
  • Advanced data management.
  • Geospatial data analyses.
  • Exploratory analysis and data visualization.
  • Obtaining licensed data from commercial sources.
  • Scraping, aggregating and integrating data from public sources.
  • Analysis of restricted data.

“With Big Data and computational simulations playing an ever-larger role in research in a variety of fields, it’s increasingly important to provide researchers with a comprehensive ecosystem of support and services that address those methodologies,” said CSCAR Director Kerby Shedden.

As part of this significant expansion of its scope, the campuswide statistical consulting service CSCAR has been renamed Consulting for Statistics, Computing and Analytics Research. It was formerly known as the Center for Statistical Consultation and Research.

For more information, see the University Record article.

New ARC-TS program offers unused cycles on Flux to undergraduates

By | Feature

The MDST Team: L-R: Anthony Kremin, Ben Bray, Wei Lee, Curtis Fenner, Jimmy Hsu, Alex Chojnacki, Alexander Zaitzeff, Jonathan Stroud, Jared Webb, Tianpei Xie, Helena Zeng, Xiang Li, Xinyu Tan, Jianming Sang, Guangsha Shi

Undergraduates working on research that requires high performance computing resources can now use the Flux HPC cluster at no cost.

Flux is the shared computing cluster available across campus, operated by Advanced Research Computing – Technology Services (ARC-TS). Under ARC-TS’s new Flux for Undergraduates program, student groups and individuals with faculty sponsors can access unused computing cycles on Flux for free.

The first student group to take advantage of this program is the Michigan Data Science Team, which was created in Fall 2015 with the goal of helping U-M students enter Big Data competitions. The team enters competitions through sites like Kaggle, and is one of the first such teams affiliated with a university.

The group’s organizer, Jonathan Stroud, a Computer Science and Engineering graduate student, said team members were maxing out the capabilities of their laptops when they first started.

“For the first couple of competitions, we made sure we picked a problem that people could do on their laptops. Still, every night before bed, they would set up their experiments and they ran all night.”
— Jonathan Stroud

He said success in the data science competitions typically depends on trying several approaches simultaneously, which can be taxing on computing resources. Stroud said the team typically uses software such as Python, R, and Matlab. Team members come from a wide range of disciplines, including Engineering, Applied Math, Physics, and one from the Music School, Stroud said.

Jacob Abernethy, assistant professor of Electrical Engineering and Computer Science, is the group’s faculty advisor. He wrote some funding for the group into his NSF CAREER proposal that was awarded in 2015. He said after the group’s first competition, he surveyed the students as to what worked and what didn’t. He said one of the clearest responses was the need for more robust computing resources.

“Our top two competitors talked about maxing out the resources on not only their own laptop, but also on the clusters provided them by their advisors,” Abernethy said. “It became clear that we needed to talk about Flux.”

He said a key method in the machine learning and data science experimentation process is cross-validation, that is, testing the performance of a set of parameters on several subsets of data simultaneously. “This leads to a very obvious need for a distributed system in which we can execute a large number of ‘embarrassingly parallel’ tasks quickly,” Abernethy said.
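To make the cross-validation point concrete, here is a minimal Python sketch in which each fold is scored as an independent task. It is an illustration only, not the team's code: the toy "shrunken mean" model, the made-up data and all function names are assumptions. Threads are used just to show that the folds do not depend on one another; on a cluster, each fold (or each parameter setting) could be submitted as its own job.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def k_fold_splits(n_samples, k):
    """Return k (train, test) index pairs; fold i tests on indices i, i+k, ..."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


def fold_error(ys, train, test, shrink):
    """Toy model: predict shrink * (training mean); return test mean squared error."""
    prediction = shrink * mean(ys[i] for i in train)
    return mean((ys[i] - prediction) ** 2 for i in test)


def cross_validate(ys, k, shrink):
    """Score one parameter value on all k folds as independent tasks, then average."""
    splits = k_fold_splits(len(ys), k)
    with ThreadPoolExecutor(max_workers=k) as pool:
        errors = list(pool.map(lambda s: fold_error(ys, s[0], s[1], shrink), splits))
    return mean(errors)


def best_parameter(ys, k, candidates):
    """Pick the candidate value with the lowest cross-validated error."""
    return min(candidates, key=lambda c: cross_validate(ys, k, c))


TOY_DATA = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # made-up values for illustration

if __name__ == "__main__":
    print(best_parameter(TOY_DATA, 3, [0.0, 0.5, 1.0]))
```

No fold needs results from any other fold, which is what makes the workload "embarrassingly parallel": the total work grows with the number of folds times the number of parameter settings, but the wall-clock time can stay close to that of a single fold when enough cores are available.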

Being able to use Flux “has been helping us a lot,” Stroud added. “We’ve been contacted by other schools to see how they can do the same thing.”

Jobs submitted under Flux for Undergraduates will run only when unused cycles are available and will be requeued when those resources are needed by standard Flux jobs. To be most efficient, student groups should use short or checkpointed jobs to take advantage of these available cycles.
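As a rough illustration of what a checkpointed job looks like, the Python sketch below saves its progress after every step and resumes from the saved state when restarted. It is a generic pattern, not a Flux-specific recipe: the file name, the toy workload (summing integers) and the `max_steps_this_run` parameter, which merely simulates the job being requeued, are all assumptions.

```python
import json
import os


def run_with_checkpoints(total_steps, checkpoint_path, max_steps_this_run=None):
    """Sum the integers 0..total_steps-1, saving progress after every step."""
    state = {"next_step": 0, "running_sum": 0}
    if os.path.exists(checkpoint_path):
        # Resume from the last saved state instead of starting over.
        with open(checkpoint_path) as f:
            state = json.load(f)

    steps_done = 0
    while state["next_step"] < total_steps:
        if max_steps_this_run is not None and steps_done >= max_steps_this_run:
            break  # simulates the job being interrupted and requeued
        state["running_sum"] += state["next_step"]
        state["next_step"] += 1
        steps_done += 1
        # Write to a temporary file, then rename it, so an interruption
        # during the write cannot leave a corrupted checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state


if __name__ == "__main__":
    # Hypothetical file name; the first call stops early, the second resumes.
    print(run_with_checkpoints(10, "job_checkpoint.json", max_steps_this_run=4))
    print(run_with_checkpoints(10, "job_checkpoint.json"))
```

A real job would usually checkpoint less often (for example, every few minutes) to keep I/O overhead low, at the cost of redoing slightly more work after a requeue.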

Student groups can also purchase Flux allocations for jobs that are higher priority or time constrained; those allocations can also work in conjunction with the free Flux for Undergraduates jobs.

“The goal is to provide undergraduates with experience in high performance computing, and access to computational resources for their projects,” said Brock Palen, Associate Director of ARC-TS.

Undergraduate groups and individuals must have sponsorship from a faculty member. To request resources through Flux for Undergraduates, please fill out this form. An abstract of the intended activity must be submitted.