Web Scraping with Python

This workshop will provide an overview of how to scrape data from HTML pages and website APIs using Python. This will mostly be accomplished using the Python requests, beautifulsoup and retry modules, along with the browser developer tools. The workshop is intended for users with basic Python knowledge. Anaconda Python 3.5 will be used.

Classification, Regression and Model Selection using Python’s Scikit-learn

This workshop will introduce participants to machine learning in Python. We’ll start with a brief explanation of Anaconda and the Jupyter notebook environment (participants are not required to use these tools, but the instructor will be using them). After an introduction to classification, regression and model selection, we’ll use a couple of example datasets to demonstrate how to create, apply and evaluate models in Scikit-learn. Although it is not required, we recommend that all participants have a basic knowledge of Python.

Data Processing in Python using Pandas

This workshop will introduce participants to Python’s Pandas library. We’ll start with a brief explanation of Anaconda and the Jupyter notebook environment (participants are not required to use these tools, but the instructor will be using them). After a brief introduction to Python’s main standard data types as well as Pandas data structures, we’ll demonstrate how to retrieve information from Pandas Series and DataFrames. We’ll also demonstrate basic input/output, selection, dropping, sorting, ranking, grouping and apply operations. Although it is not required, we recommend that all participants have a basic knowledge of Python.

Intro to Natural Language Processing with Python

This workshop will provide a quick overview of natural language processing using Python. We’ll cover the basics: segmenting text into tokens, assigning part-of-speech tags, assigning dependency labels, and detecting and labeling named entities. We’ll also cover sentiment analysis, topic modelling and possibly some visualizations. The workshop will be conducted in Python and is intended for users with basic Python programming knowledge. Anaconda Python 3.5 and a Jupyter Notebook will be used.

Parallel Processing with Python

Modern computers have a CPU with multiple cores (usually between 4 and 8). Come learn how to take advantage of them to parallelize and speed up your code. We’ll show you how to structure your code so you can parallelize it in 5 lines or fewer. We will also cover some theory and a few practical considerations, along with some basic exercises. We’ll be using the multiprocessing module in Python. The workshop is intended for users with basic Python knowledge and assumes you know how to do the following in Python: i) write a for loop, ii) write a function that has inputs and outputs. Anaconda Python 3.5 will be used.
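As a rough illustration of the kind of restructuring described above (not the workshop's own material), the sketch below moves the body of a for loop into a function and hands it to a pool of worker processes from the multiprocessing module. The names slow_square and parallel_squares, and the choice of four workers, are hypothetical and chosen only for this example.

```python
from multiprocessing import Pool


def slow_square(n):
    """Hypothetical stand-in for an expensive, independent computation."""
    return n * n


def parallel_squares(numbers, workers=4):
    """Apply slow_square to every item using `workers` processes."""
    with Pool(processes=workers) as pool:
        # Pool.map splits the input across the worker processes and
        # returns the results in the same order as the input.
        return pool.map(slow_square, numbers)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block
    # when they import the module.
    print(parallel_squares(range(10)))
```

This pattern only pays off when each call is independent of the others and expensive enough to outweigh the cost of starting processes and passing data between them.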

Regular Expressions II

Regular expressions are perfectly suited for people who like puzzles. A regular expression is a sequence of characters used to define a search pattern. Regular expressions are commonly used for “find” and “find and replace” string operations. They are also used to validate strings such as phone numbers and passwords during data entry. Regular expression capabilities can be found in a variety of programming languages and software, including ArcGIS, Java, JavaScript, MATLAB, Perl, PHP, Python, R and Visual Basic, as well as some text editors. This workshop is part II of a two-part series and will cover more advanced topics like captured groups, backreferences and assertions. The workshop will consist of hands-on example problems. A basic understanding of regular expressions is required. You should be able to understand expressions like “\w{3,}-\d{1,2}-\d{4}” and “des*ert?s?”. The tutorials will be conducted using Python. A basic programming background is helpful but not required for this workshop.
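For readers unsure what the advanced topics named above look like in practice, here is a minimal sketch using Python's re module. It is not taken from the workshop materials: the sample strings are invented, and the date pattern assumes the first prerequisite expression was meant to be written with backslashes (\w and \d).

```python
import re

# Captured groups: parentheses capture parts of the overall match.
date = re.search(r"(\w{3,})-(\d{1,2})-(\d{4})", "Due on Sep-14-2017.")
month, day, year = date.groups()

# Backreference: \1 must repeat whatever group 1 captured,
# which finds accidentally doubled words.
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test of of regex")

# Lookahead assertion: match digits only when " USD" follows,
# without including " USD" in the match.
usd_amounts = re.findall(r"\d+(?= USD)", "10 USD, 20 EUR, 30 USD")

# Lookbehind assertion: match digits only when preceded by "$".
dollar_amounts = re.findall(r"(?<=\$)\d+", "cost $15 or 20")

print(month, day, year)   # Sep 14 2017
print(doubled)            # ['is', 'of']
print(usd_amounts)        # ['10', '30']
print(dollar_amounts)     # ['15']
```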

Mini-course: Introduction to Python — Sept. 11-14

By | Data, Educational, Events, General Interest, News

Asst. Prof. Emanuel Gull, Physics, is offering a mini-course introducing the Python programming language in a four-lecture series. Beginners without any programming experience as well as programmers who usually use other languages (C, C++, Fortran, Java, …) are encouraged to come; no prior knowledge of programming languages is required!

For the first two lectures we will mostly follow the book Learning Python. This book is available at our library. An earlier edition (with small differences, equivalent for all practical purposes) is available as an e-book. The last two lectures will introduce some useful Python libraries: numpy, scipy and matplotlib.

By the end of the mini-course you will know enough about Python to use it for your grad class homework and your research.

Special meeting place: we will meet in 340 West Hall on Monday September 11 at 5 PM.

Please bring a laptop computer along to follow the exercises!

Syllabus (Dates & Location for Fall 2017)

  1. Monday September 11 5:00 – 6:30 PM: Welcome & Getting Started (hello.py). Location: 340 West Hall
  2. Tuesday September 12 5:00 – 6:30 PM: Numbers, Strings, Lists, Dictionaries, Tuples, Functions, Modules, Control flow. Location: 335 West Hall
  3. Wednesday September 13 5:00 – 6:30 PM: Useful Python libraries (part I): numpy, scipy, matplotlib. Location: 335 West Hall
  4. Thursday September 14 5:00 – 6:30 PM: Useful Python libraries (part II): 3D plotting in matplotlib and exercises. Location: 335 West Hall

For more information: https://sites.lsa.umich.edu/gull-lab/teaching/physics-514-fall-2017/introduction-to-python/

 

PyData July Meetup: Designing an Algorithmic Trading Strategy with Python

Join us for a PyData Ann Arbor Meetup on Thursday, July 13th, at 6 PM, hosted by TD Ameritrade and MIDAS.

Gus Gordon is a Data Engineer at Quantopian, an algorithmic investing platform and hedge fund manager. He works on the research team, developing tools for analyzing financial data and evaluating the performance of trading strategies. Gus studied physics and economics at Bucknell University.

In this talk, Gus will go through a clean example of how to design a financial trading strategy using open source Python tools. We’ll start off by analyzing a raw trading signal in alphalens, then transition that signal into an algorithm that we can backtest with zipline. Finally, we’ll review the results of the backtest by going through some plots generated by pyfolio.

PyData Ann Arbor is a group for amateurs, academics, and professionals currently exploring various data ecosystems. Specifically, we seek to engage with others around analysis, visualization, and management. We are primarily focused on how Python data tools can be used in innovative ways but also maintain a healthy interest in leveraging tools based in other languages such as R, Java/Scala, Rust, and Julia.

PyData Ann Arbor strives to be a welcoming and fully inclusive group, and we observe the PyData Code of Conduct. PyData is organized by NumFOCUS.org, a 501(c)(3) non-profit in the United States.

“use what you have learned to make something better and share with others”

PyData May Meetup: Scalable, Distributed, and Reproducible Machine Learning

Join us for a PyData Ann Arbor Meetup on Thursday, May 25th at 6 PM, hosted by TD Ameritrade and MIDAS.

The recent advances in machine learning and artificial intelligence are amazing! Yet, in order to have real value within a company, data scientists must be able to get their models off of their laptops and deployed within a company’s data pipelines and infrastructure. Those models must also scale to production-size data. In this talk, we will implement a model locally in Python. We will then take that model and deploy both its training and inference in a scalable manner to a production cluster with Pachyderm, an open source framework for distributed pipelining and data versioning. We will also learn how to update the production model online, track changes in our model and data, and explore our results.

Daniel Whitenack (@dwhitena) is a Ph.D.-trained data scientist working with Pachyderm (@pachydermIO). Daniel develops innovative, distributed data pipelines which include predictive models, data visualizations, statistical analyses, and more. He has spoken at conferences around the world (ODSC, Spark Summit, Datapalooza, DevFest Siberia, GopherCon, and more), teaches data science/engineering with Ardan Labs (@ardanlabs), maintains the Go kernel for Jupyter, and is actively helping to organize contributions to various open source data science projects.

PyData April Meetup: Interactive Data Visualization in Jupyter Notebook Using bqplot

Join us for a PyData Ann Arbor Meetup on Thursday, April 13th at 6 PM, hosted by TD Ameritrade and MIDAS.

This month’s meetup will focus on bqplot, a Python plotting library based on d3.js that offers its functionality directly in the Jupyter Notebook, including selections, interactions, and arbitrary CSS customization. In bqplot, every element of a chart is an interactive widget that can be bound to a Python function, which serves as the callback when an interaction takes place. This allows the user to generate full-fledged interactive applications directly in the Notebook with just a few lines of Python code. In the second part of the talk, drawing on fields like Data Science and Finance, we show examples of building interactive charts and dashboards using bqplot and the ipywidgets framework.

The talk will also cover bqplot’s interaction with the new JupyterLab IDE and what we plan for the future.

Presenter: Dhruv Madeka is a Quantitative Researcher at Bloomberg LP. His current research interests focus on Machine Learning, Quantitative Finance, Data Visualization and Applied Mathematics. Having graduated from the University of Michigan with a BS in Operations Research and from Boston University with an MS in Mathematical Finance, Dhruv is part of one of the leading research teams in Finance, developing models, software and tools for users to make their data analysis experience richer.

 
