Natural Language Processing Workshop Series

Written text contains a wealth of information that can be turned into research data to study almost every aspect of human behavior, human health, and society. However, converting text into usable data requires an understanding of standard techniques from the field of Natural Language Processing (NLP). The Michigan Institute for Data Science (MIDAS) and the AI Lab jointly organized a monthly series of NLP workshops during the Fall 2022 semester.

This workshop series helped researchers assess whether to incorporate NLP and text data into their research agendas, determine what expertise to seek from collaborators on future projects, identify the skills they wanted to develop further in this area, and find additional learning opportunities.

We especially welcome faculty and staff researchers who do not usually have access to training opportunities.

Workshop 1 | Introduction to Natural Language Processing

3:00 – 5:00pm, Monday, Sept. 19, 2022 | 340 West Hall

In the first workshop of the series, we will provide a broad overview of NLP and introduce basic concepts used in NLP, including keyword counting, sentiment classification, and topic modeling. Additional topics include how NLP can be used, what the data looks like, what social science questions could be answered using NLP, and more.

Lead Instructor: Dr. Xuan Lu, Research Fellow and Research Investigator, School of Information

Dr. Xuan Lu is a research fellow in the School of Information, University of Michigan. She earned her Ph.D. in Computer Science from Peking University. She is interested in large-scale analysis of user behavior data. Her current research uses emoji as a lens for understanding the languages, sentiments, health, behaviors, and cultural differences of social media users. She received the WWW Best Paper Award in 2019 and a Microsoft Research Asia Fellowship in 2017.

Workshop 2 | What's in Text Data?

3:00 – 5:00pm, Monday, Oct. 10, 2022 | 340 West Hall

Text contains rich information about human knowledge, opinions, and communication styles, but how do we extract insight about all this from the data? In this second workshop in the series, we will first introduce Jupyter notebooks, a popular platform for performing data science research. Then we will discuss how to choose a dataset for your research question, extract your own dataset from social media sites like Reddit and Twitter, and convert the raw text data to a usable format. We will then explore several methods to extract information and gain insight from text data, including named entity recognition and sentiment analysis.

Lead Instructor: Winston Wu, Research Fellow, Computer Science and Engineering

Dr. Winston Wu is currently a postdoctoral researcher in Computer Science and Engineering at the University of Michigan. His research interests broadly include natural language processing across multiple languages and cultures. He currently works with Professors Rada Mihalcea and Lu Wang on understanding societal biases in existing and machine-generated texts in various domains including political news and fairy tales. He received his PhD from Johns Hopkins University, where he worked on multilingual NLP for low-resource languages.

Workshop 3 | NLP Research Project Workflows

2:00 – 5:00pm, Wednesday, Nov. 9, 2022 | Earl Lewis Room, Rackham Building

In this afternoon workshop, we will discuss how to incorporate NLP into broader project workflows and present work from several U-M labs that apply NLP to the study of mental health.

Co-Instructors:

  • Elyse Thulin, Data Science Fellow, Michigan Institute for Data Science
    Dr. Elyse Thulin’s research focuses on applications of computational methods to better understand human behaviors. One of her main projects applies natural language processing methods to examine interactions in an online substance use recovery group to better understand substance use recovery pathways, mental health, and social relationships.
  • Winston Wu, Research Fellow, Computer Science and Engineering
    Dr. Winston Wu is currently a postdoctoral researcher in Computer Science and Engineering at the University of Michigan. His research interests broadly include natural language processing across multiple languages and cultures. He currently works with Professors Rada Mihalcea and Lu Wang on understanding societal biases in existing and machine-generated texts in various domains including political news and fairy tales. He received his PhD from Johns Hopkins University, where he worked on multilingual NLP for low-resource languages.

Prerequisites and software requirements: Knowledge of Python is helpful but not required.
Registration is required.
