I primarily work on developing scalable parallel algorithms to solve large scientific problems, in collaboration with teams from several different disciplines and application areas. I’m most concerned with algorithms emphasizing in-memory approaches. In another area of research, I have developed serial algorithms for nonparametric regression. This is a flexible form of regression that assumes only a general shape, such as an increasing trend, rather than a parametric form such as a linear one. It can be applied to a range of learning and classification problems, such as taxonomy trees. I also do some work in adaptive learning, designing efficient sampling procedures.
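As a small illustration of regression under a shape assumption, the sketch below fits a sequence that is assumed only to be nondecreasing, using the classical pool-adjacent-violators idea. It is a minimal serial example and not the algorithms from this research; the function name `isotonic_fit` and the sample data are hypothetical.

```python
def isotonic_fit(y):
    """Least-squares nondecreasing fit to the sequence y.

    Uses pool-adjacent-violators: whenever a value breaks the
    nondecreasing order, it is pooled with its neighbours and the
    pooled block is replaced by its mean.
    """
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count

    # Expand each block back into one fitted value per original point.
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


if __name__ == "__main__":
    # Hypothetical data: the dip from 3 to 2 is pooled into a flat step.
    print(isotonic_fit([1, 3, 2, 4]))
```

No slope or functional form is specified anywhere; the only assumption is the direction of the trend, which is what distinguishes this from fitting a line.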
The Smith lab group is primarily interested in examining evolutionary processes using new data sources and analysis techniques. We develop new methods to address questions about the rates and modes of evolution using the large data sources that have become more common in the biological disciplines over the last ten years. In particular, we use DNA sequence data to construct phylogenetic trees and conduct further analyses of evolutionary processes on these trees. In addition to this research program, we also address how new data sources can facilitate new research in evolutionary biology. To this end, we sequence transcriptomes, primarily in plants, with the goal of better understanding where, within the genome and within the phylogeny, processes like gene duplication and loss, horizontal gene transfer, and increased rates of molecular evolution occur.
My interests include randomized approximation algorithms for massive data sets, in particular sublinear-time algorithms for sparse recovery in the Fourier and other domains. I am also interested in data privacy, including the privacy of energy usage data.