MIDAS organizes many events throughout the year specifically geared towards students’ technical skill development, job search and career preparation, and engagement with industry professionals in the data science field.


Participating as teams or individuals, students use real-world data sets from industry sponsors, community organizations, or University research projects to answer pre-defined research questions.  These events are often run as competitions judged by industry professionals, who may (and many times do) offer outstanding participants internships and permanent job opportunities.

Previous Events:


Through conversations with industry professionals, students learn how data science is used in practice and how best to prepare for careers in the field.  Sessions are either centered on a theme (interviewing, company-specific job openings, etc.) or feature panelists from various fields who give a broad overview of career opportunities in data science.

Previous Events:


MIDAS coordinates workshops with industry professionals, each centered on a specific data science tool, software package, or company-specific methodology.  Participants actively engage with presenters using real-world data to gain practical experience. Workshop providers include AWS, Google, and Databricks. See the event page for more details on upcoming and past events.

Upcoming event details are shared below:


  • Thursday, August 20, 2020, 12-2pm
    Introduction to Apache Spark
    Apache Spark is a lightning-fast unified analytics engine for big data and machine learning, originally developed at UC Berkeley in 2009. Since its release, Apache Spark has seen rapid adoption by enterprises across a wide range of industries. Internet powerhouses such as Netflix, Yahoo, and eBay have deployed Spark at massive scale, collectively processing multiple petabytes of data on clusters of over 8,000 nodes. It has quickly become the largest open source community in big data, with over 1,000 contributors from 250+ organizations. The Apache Spark ecosystem includes Spark Core, Spark SQL, Spark Streaming, Spark ML, and GraphX, and supports languages including Java, Scala, Python, R, and SQL.

    The lecture is divided into two parts. The first part introduces Spark fundamentals and core concepts via slides. The second part uses a Databricks notebook to provide hands-on examples of how to read, transform, and write data in Spark, along with advanced topics such as performance optimization.

    To follow the hands-on lab yourself, you need a free Databricks Community Edition account; register here: https://databricks.com/try-databricks. Knowledge of SQL and experience with Python are helpful for understanding the lecture.

    Register at: https://forms.gle/oSCrhEYt99Q9CQkS6