Workshop 2 | What’s in Text Data?

October 10, 2022, 3:00 PM - 5:00 PM

340 West Hall

Text contains rich information about human knowledge, opinions, and communication styles, but how do we extract insight from it? In this second workshop in the series, we will first introduce Jupyter notebooks, a popular platform for data science research. We will then discuss how to choose a dataset for your research question, extract your own dataset from social media sites like Reddit and Twitter, and convert the raw text data to a usable format. Finally, we will explore several methods to extract information and gain insight from text data, including named entity recognition and sentiment analysis.

Lead Instructor: Winston Wu, Research Fellow, Computer Science and Engineering

Dr. Winston Wu is currently a postdoctoral researcher in Computer Science and Engineering at the University of Michigan. His research interests broadly include natural language processing across multiple languages and cultures. He works with Professors Rada Mihalcea and Lu Wang on understanding societal biases in existing and machine-generated texts in various domains, including political news and fairy tales. He received his PhD from Johns Hopkins University, where he worked on multilingual NLP for low-resource languages.