Natural Language Processing Workshop Series

Written text contains a wealth of information that can be turned into research data to study almost every aspect of human behavior, human health, and our society. However, converting text to usable data requires an understanding of standard techniques from the field of Natural Language Processing (NLP). MIDAS and the AI Lab are jointly organizing a monthly series of NLP workshops during the Fall 2022 semester. The first installment is on Sept. 19.

Completing the workshop series will help you assess whether to incorporate NLP and text data into your research agenda, determine what expertise to seek from collaborators for future projects, and identify which skills to develop further in this area and where to look for additional learning opportunities.

The workshops are open to all U-M researchers, but the content will be geared toward using NLP in social science research.

We especially welcome faculty and staff researchers who don’t normally have training opportunities.

Series Schedule

Additional workshop sessions coming soon.

Workshop 1 | Introduction to Natural Language Processing

3:00 – 5:00pm, Monday, Sept. 19, 2022 | 340 West Hall

In the first workshop of the series, we will provide a broad overview of NLP and introduce basic concepts used in NLP, including keyword counting, sentiment classification, and topic modeling. Additional topics include how NLP can be used, what the data looks like, what social science questions could be answered using NLP, and more.

Lead Instructor: Dr. Xuan Lu, Research Fellow and Research Investigator, School of Information

Dr. Xuan Lu is a research fellow in the School of Information at the University of Michigan. She earned her Ph.D. in Computer Science from Peking University. She is interested in large-scale user behavior data analysis. Her current research focuses on using emoji as a lens for understanding the languages, sentiments, health, behaviors, and cultural differences of social media users. She received the WWW Best Paper Award in 2019 and the Microsoft Research Asia Fellowship in 2017.

Prerequisites and software requirements: None
Registration is required. 

Workshop 2 | What's in Text Data?

3:00 – 5:00pm, Monday, Oct. 10, 2022 | 340 West Hall

Written text contains rich information about human knowledge, opinions, and communication styles, but how do we extract insight about all this from the data? In this second workshop in the series, we will first introduce Jupyter notebooks, a popular platform for performing data science research. Then we will discuss how to choose a dataset for your research question, extract your own dataset from social media sites like Reddit and Twitter, and convert the raw text data to a usable format. We will then explore several methods to extract information and gain insight from text data, including named entity recognition and sentiment analysis.

Lead Instructor: Dr. Winston Wu, Research Fellow, Computer Science and Engineering

Dr. Winston Wu is a postdoctoral researcher in Computer Science and Engineering at the University of Michigan. His research interests broadly include natural language processing across multiple languages and cultures. He currently works with Professors Rada Mihalcea and Lu Wang on understanding societal biases in existing and machine-generated texts in various domains, including political news and fairy tales. He received his Ph.D. from Johns Hopkins University, where he worked on multilingual NLP for low-resource languages.

Prerequisites and software requirements: Knowledge of Python would be helpful, but not required.
Registration is required. 

Each workshop builds on the previous one in the series. However, attending earlier workshops is not a prerequisite for later ones if you already know the material they cover.

Have Questions?

Please contact MIDAS Senior Scientist Shane Redman (