Deep learning is a rapidly growing field of machine learning and has proven successful in many domains, including computer vision, language translation, and speech recognition. Training deep neural networks is resource intensive, requiring compute accelerators such as GPUs as well as large amounts of storage, memory, and network bandwidth. Additionally, preparing the training data requires a lot of tooling for data cleansing, data merging, ambiguity resolution, etc. Sophisticated middleware abstractions are needed to schedule resources, manage the distributed training job, and visualize how well the training is progressing. Likewise, serving large neural network models under low-latency constraints can require middleware to manage model caching, selection, and refinement.
All the major cloud providers, including Amazon, Google, IBM, and Microsoft, have started in the last year or so to offer cloud services to train and/or serve deep neural network models. In addition, there is a lot of activity in open source middleware for deep learning, including TensorFlow, Theano, Caffe2, PyTorch, and MXNet. There are also efforts to extend existing platforms such as Spark for deep learning workloads.
This workshop focuses on the tools, frameworks, and algorithms that support executing deep learning algorithms in a distributed environment. As new hardware and accelerators become available, the middleware and systems need to be able to exploit their capabilities and ensure they are utilized efficiently.
Bishwaranjan Bhattacharjee, IBM Research
Vatche Ishakian, Bentley University
Vinod Muthusamy, IBM Research
Parag Chandakkar, Walmart Labs
Ian Foster, Argonne National Laboratory and the University of Chicago
Benoit Huet, Eurecom
Pietro Michiardi, Eurecom
Peter Pietzuch, Imperial College
Evgenia Smirni, College of William and Mary
Yandong Wang, Citadel Securities
Chuan Wu, University of Hong Kong
Ruben Mayer, Technical University of Munich
Gauri Joshi, Carnegie Mellon University