I will be an ETH Postdoctoral Fellow at ETH Zurich, working with Professor Torsten Hoefler in the Scalable Parallel Computing Laboratory, beginning Fall, 2019. I previously completed my PhD in computer science at the University of Illinois at Urbana-Champaign, advised by Professor Marc Snir. I work heavily with members of the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory. My research focuses on scaling deep neural network training on HPC resources (large clusters and supercomputers) by developing new algorithms and optimized implementations, and applying DNNs to scientific and computational simulation data sets. I also work on tools and applications for exascale supercomputers.
- My contact information
- My publications
- My current resume
- My current CV
- My Github
- My current committed identity, using SHA-512, is:
- My Erdos number is currently 3.
Aluminum is a generic communication framework enabling high-performance asynchronous point-to-point and collective operations, especially on GPUs. It includes more GPU-friendly semantics than MPI, and a suite of latency- and bandwidth-optimized algorithms, both from existing library and custom implementations. Aluminum has been integrated into both the LBANN deep learning toolkit and the Hydrogen distributed linear algebra library.
LBANN (Livermore Big Artificial Neural Network Toolkit) is a research toolkit for scaling the training of deep neural networks on HPC systems. My work includes optimized communication algorithms and patterns, communication sparsification and quantization, more general distributed-memory convolution algorithms, and more scalable data-parallel training algorithms.
PPL is an experimental C++11 runtime system for exploring different implementation tradeoffs, especially in the context of future exa-scale systems. See the paper on it below. If you’re interested in further details, contact me.
The name (probably) inventively stands for “Parallel Programming Library”, and is certainly not meant to be confused with the nice people right next door at the other PPL (Parallel Programming Laboratory).
PGDB is a parallel debugger for large-scale MPI applications, written primarily in Python with some C/C++. I haven’t found time to work on it in quite a while, but I continually find situations where it would be useful.
Xenos was a web-based RPG I did back-end PHP (the horror!) and database work for between 2005 and 2008, primarily in collaboration with Alistair Lynn, Nick Farley, Taylor Vaughan, and Alec Ingulsrud. At its height, we had several hundred players.