Recent advances in machine learning and artificial intelligence are amazing! Yet, to deliver real value within a company, data scientists must be able to get their models off their laptops and deploy them within the company’s data pipelines and infrastructure. Those models must also scale to production-sized data. In this talk, we will implement a model locally in Python. We will then take that model and deploy both its training and its inference in a scalable manner to a production cluster with Pachyderm, an open source framework for distributed pipelining and data versioning. We will also learn how to update the production model online, track changes in our model and data, and explore our results.
Daniel Whitenack (@dwhitena) is a Ph.D.-trained data scientist working with Pachyderm (@pachydermIO). Daniel develops innovative, distributed data pipelines that include predictive models, data visualizations, statistical analyses, and more. He has spoken at conferences around the world (ODSC, Spark Summit, Datapalooza, DevFest Siberia, GopherCon, and others), teaches data science and engineering with Ardan Labs (@ardanlabs), maintains the Go kernel for Jupyter, and actively helps organize contributions to various open source data science projects.
PyData Ann Arbor is a group for amateurs, academics, and professionals exploring various data ecosystems. Specifically, we seek to engage with others around data analysis, visualization, and management. We focus primarily on how Python data tools can be used in innovative ways, but we also maintain a healthy interest in tools based in other languages such as R, Java/Scala, Rust, and Julia.
PyData Ann Arbor strives to be a welcoming and fully inclusive group, and we observe the PyData Code of Conduct. PyData is organized by NumFOCUS.org, a 501(c)(3) non-profit in the United States.
“use what you have learned to make something better and share with others”