Second Workshop on Distributed Infrastructures for Deep Learning (DIDL) 2018


The DIDL workshop is co-located with ACM/IFIP Middleware 2018 and takes place on December 10 in Rennes, France.

Deep learning is a rapidly growing field of machine learning that has proven successful in many domains, including computer vision, language translation, and speech recognition. Training deep neural networks is resource intensive, requiring compute accelerators such as GPUs as well as large amounts of storage, memory, and network bandwidth. Additionally, preparing the training data requires substantial tooling for data cleansing, data merging, ambiguity resolution, and so on. Sophisticated middleware abstractions are needed to schedule resources, manage distributed training jobs, and visualize how well training is progressing. Likewise, serving large neural network models under low-latency constraints can require middleware to manage model caching, selection, and refinement.

All the major cloud providers, including Amazon, Google, IBM, and Microsoft, have in the last year or so started to offer cloud services to train and/or serve deep neural network models. In addition, there is a lot of activity in open source middleware for deep learning, including TensorFlow, Theano, Caffe2, PyTorch, and MXNet. There are also efforts to extend existing platforms such as Spark to deep learning workloads.
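To make concrete the kind of abstraction these frameworks expose to distributed-infrastructure middleware, the following is a minimal, hypothetical sketch of distributed data-parallel training with PyTorch's torch.distributed package. The gloo backend, the toy linear model, and the synthetic data are illustrative assumptions, not drawn from any workshop paper.

import torch
import torch.distributed as dist
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel

def main():
    # Each worker process is launched with RANK, WORLD_SIZE, MASTER_ADDR,
    # and MASTER_PORT in its environment (e.g. by a cluster scheduler).
    dist.init_process_group(backend="gloo", init_method="env://")

    model = nn.Linear(10, 1)          # toy model standing in for a deep network
    ddp_model = DistributedDataParallel(model)
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    for step in range(100):
        inputs = torch.randn(32, 10)  # placeholder for a real, sharded data loader
        labels = torch.randn(32, 1)
        loss = F.mse_loss(ddp_model(inputs), labels)
        optimizer.zero_grad()
        loss.backward()               # gradients are all-reduced across workers here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

The middleware this workshop targets sits above this layer: it allocates GPUs or CPU nodes, sets up the rendezvous environment for each worker, and monitors how training is progressing.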

This workshop focuses on the tools, frameworks, and algorithms that support executing deep learning algorithms in a distributed environment. As new hardware and accelerators become available, middleware and systems need to be able to exploit their capabilities and ensure that they are utilized efficiently.

Agenda

Introduction and tutorial on deep learning (9:00 - 9:30)
Bishwaranjan Bhattacharjee (IBM Research)

Keynote #1: Building a Distributed Deep Learning Product - From Idea to 250 Live Customers - Lessons Learned (9:30 - 10:30)
Zbigniew Jerzak (SAP)

In this talk we will present the full two-year journey from an initial idea to a medium-sized, cloud-based deep learning product that serves over 200 customers in production. We will focus specifically on the technical challenges faced by the team as well as on the organizational aspects. We will discuss the solutions that worked as well as those that did not, giving the audience a glimpse behind the curtains of commercial business-to-business software development.

Bio: Zbigniew Jerzak is the Head of the Deep Learning Center of Excellence and Machine Learning Research at SAP. The center's mission is to research and develop the machine learning technology behind existing and new SAP products; its technology serves hundreds of SAP customers and touches millions of transactions every month. Zbigniew holds a PhD in Distributed Systems from TU Dresden, Germany.

Break (10:30 - 11:00)

Paper presentations #1 (11:00 - 12:00)

A performance evaluation of federated learning algorithms
Adrian Nilsson, Simon Smith, Gregor Ulm, Emil Gustavsson, Mats Jirstrand (Fraunhofer-Chalmers Centre & Fraunhofer Center for Machine Learning)
Presentation [PDF]

Distributed C++-Python embedding for fast predictions and fast prototyping
Georgios Varisteas (University of Luxembourg), Tigran Avanesov (OlaMobile), Radu State (University of Luxembourg)
Presentation [PDF]

Lunch (12:00 - 1:30)

Keynote #2: Robust scheduling and elastic scaling of deep learning workloads (1:30 - 2:30)
Jayaram K. R. (IBM Research)

Jayaram K. R. is a research scientist at IBM Research (T. J. Watson Research Center), NY, USA. He is interested in distributed systems and distributed programming; specific topics of interest include large-scale machine/deep learning platforms, elasticity in cloud computing, and event-based systems. He holds M.S. and PhD degrees in Computer Science from Purdue University, USA. More information at http://www.jayaramkr.com.
Presentation [PDF]

Paper presentations #2 (2:30 - 3:00)

Parallelized training of deep NN – comparison of current concepts and frameworks
Sebastian Jäger (inovex GmbH, Karlsruhe, Germany), Stefan Igel (inovex GmbH, Karlsruhe, Germany), Christian Zirpins (Karlsruhe University of Applied Sciences), Hans-Peter Zorn (inovex GmbH, Karlsruhe, Germany)
Presentation [PDF]

Break (3:00 - 3:30)

Paper presentations #3 (3:30 - 4:15)

Object Storage for Deep Learning Frameworks
Or Ozeri, Effi Ofer, Ronen Kat (IBM Research)
Presentation [PDF]

Gossiping GANs
Corentin Hardy (INRIA/Technicolor), Erwan Le Merrer (Technicolor), Bruno Sericola (INRIA)
Presentation [PDF]

Closing remarks (4:15 - 4:30)

Workshop call for papers


Workshop Co-chairs

Bishwaranjan Bhattacharjee, IBM Research
Vatche Ishakian, Bentley University
Vinod Muthusamy, IBM Research

Program Committee

Parag Chandakkar, Walmart Labs
Ian Foster, Argonne National Laboratory and the University of Chicago
Benoit Huet, Eurecom
Gauri Joshi, Carnegie Mellon University
Ruben Mayer, Technical University of Munich
Pietro Michiardi, Eurecom
Peter Pietzuch, Imperial College
Evgenia Smirni, College of William and Mary
Yandong Wang, Citadel Securities
Chuan Wu, University of Hong Kong