Takeaways from the International Conference on Machine Learning

Dhruv Guliani

Software Engineer

August 10, 2018


As an engineer focused on applied machine learning, I was lucky to attend this year’s International Conference on Machine Learning (ICML) in Stockholm, Sweden. Here are some of my biggest takeaways.

1. Imitation Has Key Benefits for Reinforcement Learning

Presenters: Yisong Yue (Caltech) & Hoang M. Le (Caltech)

Also known as “learning from demonstration” or “apprenticeship learning,” imitation learning is a promising area of ML research because it helps learning agents overcome the challenges of learning in open-world environments by mimicking pre-existing target behaviours. Disney Research, for example, used imitation learning to create facial animations that mimic the way people’s jaws move when they speak various languages, a project on which Yue and Le partnered with Disney.

Other applications explored included the training of autonomous vehicles based on human decision making, tracking where athletes should be on the field based on winning examples, and the picking and placing of objects for fulfillment (think Amazon warehouses).

Demo of RL agent “mouthing” German by imitating a human.

Imitation learning is compelling because it uses “winning” examples to address some of the fundamental challenges of its parent discipline, reinforcement learning. The whole point of reinforcement learning, after all, is to train an agent to maximize the cumulative reward it receives by taking actions in an environment (think of an AI dog trying to maximize the number of treats it can get by figuring out which tricks to perform). By finding winning pre-existing behaviours to target (from “experts” or “demonstrators,” as they’re called), imitation learning helps a) narrow the field of possible actions that might result in a reward (without an expert, the “action space” can be prohibitively large) and b) establish a direct correlation between actions and rewards (as opposed to being unsure which action resulted in a reward in a given episode).

Techniques covered in this talk included Behavioural Cloning, Direct Policy Learning, and Inverse RL.
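To make the simplest of these techniques concrete, here is a minimal sketch of behavioural cloning: treat the expert's (state, action) pairs as a supervised dataset and fit a policy to them. The linear "expert" controller, the state dimensions, and the names `K_expert` and `cloned_policy` are assumptions made for illustration; they are not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expert: a linear controller a = K_expert @ s (illustrative only).
K_expert = np.array([[1.5, -0.5]])

# Demonstrations: states the expert visited and the (slightly noisy) actions it took.
states = rng.normal(size=(500, 2))
actions = states @ K_expert.T + 0.01 * rng.normal(size=(500, 1))

# Behavioural cloning is supervised regression from states to expert actions.
solution, *_ = np.linalg.lstsq(states, actions, rcond=None)
K_learned = solution.T  # shape (1, 2), same layout as K_expert


def cloned_policy(state):
    """Return the action the cloned policy takes in `state`."""
    return K_learned @ state
```

No reward signal is used anywhere in this sketch: the agent never has to explore the action space, which is the narrowing effect described above. The known weakness is that the clone is only trained on states the expert visited.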

Video and slides here.

2. Model-Free Reinforcement Learning Is Not Ready for the Real World

Presenter: Benjamin Recht (UC Berkeley)

Model-free reinforcement learning tries to maximize the rewards an agent receives by having it interact with its environment directly rather than by first establishing a well-defined model¹ of that environment and interacting with that model as a proxy. Model-free approaches are appealing because they don’t require the effort and overhead of trying to build a model of the environment in question.
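As a rough illustration of what "model-free" means in practice, the sketch below runs tabular Q-learning on a tiny, made-up five-state corridor. The environment, the `step` function, and all hyperparameters are assumptions chosen for illustration. The point is that the agent only ever sees sampled transitions and never builds or queries a model of the environment.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 5, 2          # actions: 0 = left, 1 = right
GOAL = N_STATES - 1


def step(state, action):
    """Hypothetical environment; the agent only observes what it returns."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, epsilon = 0.1, 0.9, 0.2

for _ in range(500):                 # episodes
    state = 0
    for _ in range(100):             # cap on episode length
        if rng.random() < epsilon:
            action = int(rng.integers(N_ACTIONS))      # explore
        else:
            best = np.flatnonzero(Q[state] == Q[state].max())
            action = int(rng.choice(best))             # exploit, random tie-break
        next_state, reward, done = step(state, action)
        # Q-learning update: bootstrap from the best next action.
        target = reward + (0.0 if done else gamma * Q[next_state].max())
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state
        if done:
            break

greedy_policy = Q.argmax(axis=1)     # learned action for each state
```

Even in this toy setting, learning depends on hundreds of trial-and-error episodes, which is cheap in simulation and expensive on physical hardware.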

On the surface, it looks like model-free RL would be a great candidate to help with real-world scenarios, and in particular with continuous control problems². Continuous control problems involve controlling inputs to real-world dynamical systems to minimize cost (e.g. robots learning to move, self-driving cars navigating busy roads, and unmanned aerial vehicles learning to stabilize mid-air). Because continuous control problems resemble RL problems closely in their formulation as minimization problems consisting of states, actions, transitions, and rewards, wouldn’t it be great if we could learn to solve these problems without first modelling complex real-world environments?

Image shot by a stable unmanned aerial vehicle.

Yet as promising as this seems, there are some fundamental challenges in having model-free RL agents control real-world systems. As Dr. Recht cogently argued in his tutorial, model-free reinforcement learning is only really effective in highly structured, closed environments, such as simulations and games. As soon as an agent is brought into real-world continuous environments, performance becomes highly unstable and sensitive to even the smallest perturbations and changes in initialization. The likely reason for this drop in performance is that closed environments don’t simulate all real-world conditions, and so an agent that performs well in a closed environment may not be well suited to anticipate unseen perturbations in the real world.

A second reason why model-free reinforcement learning fails in real-world situations is that the real world doesn’t allow for as much experimentation, which is a key basis of model-free learning, since the stakes are way higher. Consider an RL agent learning to control a motorcycle on a highway, for example: it won’t be allowed to fail to the same extent as in a simulation because of the cost of repeatedly destroying motorcycles — to say nothing of situations where human beings are involved — and so the agent can’t learn by trial-and-error in the same way as in simulation.

Humanoid trained to walk in simulation using model-free RL.

Considering that reinforcement learning sessions were among the most popular at ICML this year (to the extent that the conference was jokingly referred to as ICRL), Dr. Recht proposed that the community either focus on model-based RL — specifically model-predictive control — or look for fundamental techniques to further stabilize model-free reinforcement learning in real-world environments. If we continue to only optimize model-free RL control algorithms in simulation, Recht warns, we’re barking up the wrong tree.

For a deeper dive into Dr. Recht’s work, I recommend reading his concise yet highly educational series “An Outsider’s Tour of Reinforcement Learning.”

Video and slides here.

3. AI Is Crucial for Security

Presenter: Dawn Song (UC Berkeley)

This talk explored challenges and opportunities at the intersection of AI and security: how machine learning and security have an entangled path forward if we are to put ML research into products. To show just how vulnerable our learning algorithms are, Dr. Song discussed different techniques for attacking deep learning systems, and in particular vision systems (for an example of this, check out my introductory work here). On the other hand, machine learning can also be used to create new attacks on different software systems (e.g. web services and IoT clouds).
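To give a feel for how little it can take to fool a model, here is a minimal sketch of one well-known attack, the fast gradient sign method (FGSM), applied to a toy logistic-regression classifier instead of a vision network. The weights, the input, and the perturbation budget `eps` are hypothetical values picked for illustration; this is not a reconstruction of the specific attacks shown in the talk.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical, already-trained linear classifier (weights are illustrative).
w = np.array([2.0, -1.0, 0.5])
b = 0.0


def predict_proba(x):
    """Probability that x belongs to class 1."""
    return sigmoid(w @ x + b)


def fgsm(x, y, eps):
    """Move x by eps per coordinate in the direction that increases the loss."""
    # For cross-entropy loss with a sigmoid output, d(loss)/dx = (p - y) * w.
    grad = (predict_proba(x) - y) * w
    return x + eps * np.sign(grad)


x = np.array([0.3, -0.2, 0.1])   # clean input with true label 1
y = 1.0
x_adv = fgsm(x, y, eps=0.3)

p_clean = predict_proba(x)       # above 0.5, so classified as class 1
p_adv = predict_proba(x_adv)     # below 0.5, so the predicted label flips
```

Each coordinate of the input moves by at most `eps`, yet the predicted class changes. For image models the same idea applies per pixel, with perturbations that can be too small for a person to notice.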

Another crucial part of the narrative on security and ML is the treatment of privacy, and specifically the issue of preventing the extraction of sensitive information from machine learning models. Although introductory in nature, this talk was a great view into the growing relationship between security and machine learning that highlighted some of the most important problems to look out for going forward.

4. Popular Paper Themes from ICML

I came across a number of interesting papers at ICML that I thought it would be helpful to group by theme.

a) The Adversary

One of the biggest themes I encountered was adversarial machine learning, which is the broad field of using competition between learning algorithms to optimize a generator that comes up with new samples. Within this field, there are two main subfields — one which generates new samples based on learnings from a training set (e.g. training a generator to come up with paintings in a particular style), and another which generates adversarial samples intended to fool neural networks.

Image generated through generative adversarial learning.


Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples

Provable defenses against adversarial examples via the convex outer adversarial polytope

Black-box Adversarial Attacks with Limited Queries and Information

Which Training Methods for GANs do actually Converge?

b) New Architectures in Deep Learning

This theme refers to advancements in introducing new architectural components to neural networks, something I saw a lot of discussion around. This work furthers deep learning as a field overall, and provides new tools to solve prediction problems in different domains.


Differentiable plasticity: training plastic neural networks with backpropagation

Conditional Neural Processes

Generative Temporal Models with Spatial Memory for Partially Observed Environments

c) Deploying Advances in Machine Learning to Products

Research related to the deployment of machine learning systems in products.


Parallel WaveNet: Fast High-Fidelity Speech Synthesis

d) Reinforcement Learning

Advancements made to solve the problem of maximizing cumulative reward through decision making in a state-action space.


Learning by Playing — Solving Sparse Reward Tasks from Scratch

Hierarchical Imitation and Reinforcement Learning

Unsupervised Meta-Learning for Reinforcement Learning

Implicit Quantile Networks for Distributional Reinforcement Learning

e) Theory

The work here looks to further our theoretical understanding of why and how machine learning (and specifically deep learning) works. This work is critical in enabling us to move ever-forward with the deployment of models and networks everywhere, as it provides foundational knowledge to contrast the highly empirical advancements made by the deep learning community.


Machine Theory of Mind

The Mechanics of n-Player Differentiable Games

f) Privacy & Fairness

As deep learning becomes applicable in contexts that deal broadly with populations of humans (e.g. healthcare, lending, banking, credit scoring, online payday loans application assessment, etc.), we need tangential tools that allow us to remove historical biases present in training data. At the same time, we need ways of knowing that private information cannot be extracted from models through adversarial attack. This work makes progress in these areas.


Residual Unfairness in Fair Machine Learning from Prejudiced Data

Fairness Without Demographics in Repeated Loss Minimization

5. Overheard at ICML

Finally, some fun and meaningful quotes from ICML:

“All of machine learning becomes reinforcement learning once we put it into production. Today, you have these massive companies retraining models regularly with user feedback. That’s reinforcement learning.”

– Dr. Benjamin Recht, Associate Professor (UC Berkeley)

Context: Dr. Benjamin Recht advocated for more fundamental research to stabilize reinforcement learning agents in continuous real-world environments. He reflected that many non-RL models in production are retrained often based on feedback, effectively making them RL systems.

“That single line that we all love to write as machine learning people, namely ‘(acc, loss) = model.evaluate(x_test, y_test)’ is no longer enough.”

– Nicholas Carlini, Research Scientist (Google Brain)

Context: Nicholas Carlini, upon winning the Best Paper award for his work “Obfuscated Gradients Give a False Sense of Security,” advocated for a more structured approach to validating models, namely by creating ways to identify the extent to which a model is secure.

“S’up? Thanks for your useful constructive suggestions and for including your valuable deep question. We will clarify what we learn and update the final appendix if we have space to test our estimate!”

– ICML 2018 Organizing Team

Context: For all the papers submitted to ICML that required revisions before being accepted, the organizers used classical NLP techniques to identify the words and phrases in authors’ responses to criticism that had the highest probability of changing the outcome. This was a fun mash-up of key phrases intended to increase an author’s chance of acceptance to ICML.


With over 5,000 attendees and more than 600 papers, keynotes, poster presentations, and awards given out over a packed six-day schedule, ICML this year was a lot to take in. Beyond academic institutions, numerous companies also came out in droves and set up their own booths (I’ve written about who was there specifically in another post). It was impossible to miss the growth in attendance and new research submitted to ICML this year, indicating that the machine learning community is more vibrant than ever.

The most substantial non-technical takeaway for me was the continued coupling of fundamental theoretical research and product development in machine learning. ML remains a field that has yet to be truly commoditized. Although we have tools that are ever-more sophisticated, advanced, and easy to use (TensorFlow, PyTorch, Cloud Services etc.), genuine and impactful innovation is still coming primarily from research. This means that large companies are continuing to invest substantially in the unknown, setting up research labs where top academics can earn hefty salaries while setting their own work agenda.

Some things I’d love to see more of next year include:

  • Distillation research, which is research that solves the problem of increasing research debt as the community blazes forward. It would be great to see more researchers willing to comprehensively process, simplify, and place research in broader contexts to reduce the research debt that has built up over the last decade.
  • I’d also love to see a broader exploration of optimizations outside of deep learning and reinforcement learning. It’s just my opinion, but I believe that Artificial General Intelligence will not be about one specific field so much as about combining optimization with a semi-structured understanding of the brain. I believe the community should continue looking to animal and human brains for inspiration when solving problems of intelligence.

All in all, ICML was a phenomenal experience. I left the conference energized and inspired to continue working on machine learning problems, focussing my efforts on distilling specific fields of research so that others may understand which problems have been solved and draw inspiration for other challenges to tackle. If you have a quantitative background and are interested in collaborating, comment here or email me at dhruv [ dot ] guliani [ at ] gmail [ dot ] com.

Dhruv is an ML Software Engineer at Connected, a product development firm that works with clients to drive impact through software-powered products. More stories here.

Author’s Remarks

I wanted to give a huge shoutout to Eli Burnstein for helping proofread and edit this article through multiple revisions.

As always, leave comments and call out any mistakes you find here. I will work to fix them as quickly as possible!


  1. A “model” in model-based RL captures the transition probabilities between states and actions and contains some information regarding the possible reward coming from these transitions. Effectively, this understanding of transition dynamics allows the RL problem to be reduced to a problem of planning the optimal trajectory within an environment to maximize overall reward.
  2. The resemblance between continuous control problems and RL problems is due to the fundamental formulation of both problems. Most reinforcement learning problems are thought of as Markov Decision Processes (MDPs). An MDP is a mathematical framework for modelling decision-making scenarios consisting of 1) a finite set of states, 2) a finite set of actions, 3) the probability that an action will lead to a particular state, 4) the immediate reward (or expected immediate reward) received after transitioning between states due to an action, and 5) a discount factor. Continuous control problems can also be thought of as MDPs, having the same fundamental components described above, and so it seems natural that reinforcement learning techniques would be effective in this domain as well.
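The two notes above can be tied together with a small sketch: once the five components of an MDP are written down explicitly (that is, once we have a "model"), finding a good policy becomes a planning problem. The example below uses value iteration, one standard planning method, on a made-up three-state MDP. The transition table `P`, reward table `R`, and discount `gamma` are all hypothetical numbers chosen for illustration.

```python
import numpy as np

# P[a, s, s2] = probability that action a taken in state s leads to state s2.
P = np.array([
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],   # action 0: move left
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],   # action 1: move right
])
# R[a, s] = expected immediate reward for taking action a in state s.
R = np.array([
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
])
gamma = 0.9  # discount factor

# Value iteration: repeatedly back up values through the known model.
V = np.zeros(3)
for _ in range(200):
    Q = R + gamma * (P @ V)          # Q[a, s]: value of action a in state s
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

policy = Q.argmax(axis=0)            # best action in each state
```

Here state 2 is absorbing and the only reward comes from moving right out of state 1, so the planner learns to head right from states 0 and 1 without a single trial-and-error interaction. A model-free method has to discover the same thing from sampled experience.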

Article originally posted on Towards Data Science
