New ARC-TS program offers unused cycles on Flux to undergraduates

February 29, 2016 | Feature

The MDST team (L-R): Anthony Kremin, Ben Bray, Wei Lee, Curtis Fenner, Jimmy Hsu, Alex Chojnacki, Alexander Zaitzeff, Jonathan Stroud, Jared Webb, Tianpei Xie, Helena Zeng, Xiang Li, Xinyu Tan, Jianming Sang, Guangsha Shi

Undergraduates working on research that requires high performance computing resources can now use the Flux HPC cluster at no cost.

Flux is the shared computing cluster available across campus, operated by Advanced Research Computing – Technology Services (ARC-TS). Under ARC-TS’s new Flux for Undergraduates program, student groups and individuals with faculty sponsors can access unused computing cycles on Flux for free.

The first student group to take advantage of this program is the Michigan Data Science Team, which was created in Fall 2015 with the goal of helping U-M students enter Big Data competitions. The team enters competitions through sites like Kaggle, and is one of the first such teams affiliated with a university.

The group’s organizer, Jonathan Stroud, a Computer Science and Engineering graduate student, said team members were maxing out the capabilities of their laptops when they first started.

“For the first couple of competitions, we made sure we picked a problem that people could do on their laptops. Still, every night before bed, they would set up their experiments and they ran all night.”
— Jonathan Stroud

He said success in data science competitions typically depends on trying several approaches simultaneously, which can be taxing on computing resources. Stroud said the team typically uses software such as Python, R, and Matlab. Team members come from a wide range of disciplines, including Engineering, Applied Math, and Physics, as well as one member from the Music School, Stroud said.

Jacob Abernethy, assistant professor of Electrical Engineering and Computer Science, is the group’s faculty advisor. He wrote some funding for the group into his NSF CAREER proposal, which was awarded in 2015. After the group’s first competition, he surveyed the students about what worked and what didn’t. He said one of the clearest responses was the need for more robust computing resources.

“Our top two competitors talked about maxing out the resources on not only their own laptop, but also on the clusters provided them by their advisors,” Abernethy said. “It became clear that we needed to talk about Flux.”

He said a key method in machine learning and data science experimentation is cross-validation: testing the performance of a set of parameters on several subsets of the data, often simultaneously. “This leads to a very obvious need for a distributed system in which we can execute a large number of ‘embarrassingly parallel’ tasks quickly,” Abernethy said.
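To illustrate why cross-validation parallelizes so readily, here is a minimal Python sketch. It is not the team's code: the function names, the toy "model" (a shrunken training mean), and the sample data are all assumptions made for illustration. A local thread pool stands in for the separate cluster jobs that would each handle one fold on a system like Flux.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous (train, test) index pairs."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def evaluate_fold(task):
    """Score one fold. Needs no information from any other fold."""
    ys, train, test, shrinkage = task
    # Toy model (assumption): predict the training mean, shrunk toward zero.
    prediction = (1.0 - shrinkage) * mean(ys[i] for i in train)
    # Mean squared error on the held-out subset.
    return mean((ys[i] - prediction) ** 2 for i in test)


def cross_validate(ys, shrinkage, k=5):
    """Average the held-out error of one parameter setting over k folds."""
    tasks = [(ys, train, test, shrinkage) for train, test in k_fold_indices(len(ys), k)]
    # The folds are independent, so they can be evaluated concurrently.
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(evaluate_fold, tasks))
    return mean(scores)


# Hypothetical data and candidate parameter values.
ys = [float(i % 7) for i in range(50)]
for shrinkage in (0.0, 0.5, 0.9):
    print(shrinkage, round(cross_validate(ys, shrinkage), 3))
```

Because no fold depends on another, the same pattern scales from a few threads on a laptop to many independent jobs on a cluster, with one job per fold and parameter setting.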

Being able to use Flux “has been helping us a lot,” Stroud added. “We’ve been contacted by other schools to see how they can do the same thing.”

Jobs submitted under Flux for Undergraduates will run only when unused cycles are available and will be requeued when those resources are needed by standard Flux jobs. To be most efficient, student groups should use short or checkpointed jobs to take advantage of these available cycles.
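A checkpointed job periodically saves its progress so that, if it is interrupted and requeued, it resumes from the last save rather than starting over. The Python sketch below shows one way this might look. The file name, the JSON state format, and the stand-in workload are assumptions for illustration, and the `stop_after` parameter only simulates an interruption; it does not model how the Flux scheduler actually behaves.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}


def save_checkpoint(path, state):
    """Write to a temporary file, then rename, so a crash mid-write
    cannot leave a half-written checkpoint behind."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(path, n_steps, checkpoint_every=10, stop_after=None):
    """Run up to n_steps, resuming from the checkpoint at `path` if present.

    stop_after simulates an interruption: the run ends abruptly after that
    many steps, and any work since the last checkpoint is lost.
    """
    state = load_checkpoint(path)
    steps_this_run = 0
    while state["next_step"] < n_steps:
        if stop_after is not None and steps_this_run >= stop_after:
            return state  # interrupted: no final save
        state["total"] += state["next_step"] ** 2  # stand-in for real work
        state["next_step"] += 1
        steps_this_run += 1
        if state["next_step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state


# Simulate one interrupted run followed by a requeued run.
with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "checkpoint.json")  # hypothetical file name
    run(ckpt, n_steps=100, stop_after=25)   # interrupted after 25 steps
    print("resuming from step", load_checkpoint(ckpt)["next_step"])
    final = run(ckpt, n_steps=100)          # picks up from the last checkpoint
    print("finished at step", final["next_step"])
```

The checkpoint interval is a trade-off: saving more often loses less work on interruption but spends more time writing state.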

Student groups can also purchase Flux allocations for jobs that are higher priority or time constrained; those allocations can also work in conjunction with the free Flux for Undergraduates jobs.

“The goal is to provide undergraduates with experience in high performance computing, and access to computational resources for their projects,” said Brock Palen, Associate Director of ARC-TS.

Undergraduate groups and individuals must have sponsorship from a faculty member. To request resources through Flux for Undergraduates, please fill out this form. An abstract of the intended activity must be submitted.