I have tested numerous RL algorithms on simulated production systems for my PhD thesis. It was a complete failure.
Most production planning problems are inherently high-dimensional and come with complex constraints. When you add noise, like random demand, every generic RL algorithm I tried ran into trouble as soon as I added complexity (like producing multiple products on the same line). So if you’re looking for a good example to break RL code, throw it a nice stochastic economic lot-sizing problem with sequence-dependent setup times.
By contrast, what works well is policy search over parametric policies that incorporate domain knowledge. Sounds boring, but it is quite hard to beat.
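To make that concrete, here is a toy sketch of the idea: a classic (s, S) reorder policy, which encodes domain knowledge in just two parameters, tuned by plain random search against a simulated demand stream. All cost numbers and demand ranges are made up for illustration; none of this comes from my thesis code.

```python
import random

def run_sS_policy(s, S, horizon=200, seed=0):
    """Average per-period cost of an (s, S) reorder policy under random demand.

    Illustrative costs: holding 1/unit/period, fixed order cost 20,
    lost-sale penalty 10/unit. None of these numbers are from a real model.
    """
    rng = random.Random(seed)
    inventory, cost = S, 0.0
    for _ in range(horizon):
        demand = rng.randint(0, 6)
        lost = max(0, demand - inventory)          # unmet demand is lost
        inventory = max(0, inventory - demand)
        cost += inventory * 1.0 + lost * 10.0
        if inventory <= s:                         # reorder-point structure
            cost += 20.0                           # fixed ordering cost
            inventory = S                          # replenish up to S
    return cost / horizon

def policy_search(trials=500, seed=1):
    """Plain random search over the two policy parameters."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        s = rng.randint(0, 15)
        S = rng.randint(s + 1, 40)
        cost = run_sS_policy(s, S)
        if best is None or cost < best[0]:
            best = (cost, s, S)
    return best
```

The point is that the policy class already bakes in the right structure, so the search space is two-dimensional instead of a DNN's millions of weights.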
Our team at Pathmind has applied deep RL to multiple use cases in industrial control and supply chain management, notably various forms of scheduling and multi-echelon inventory optimization (MEIO), respectively.
Yes, these show RL applied to simulations, and yes, real data and real physical plants are more complex than that.
But those physical systems are already controlled by optimizers (usually mathematical solvers like IBM CPLEX or Gurobi), and deep RL happens to be an optimizer that can handle a few things better:
* data variability
* multiple objectives in complex scenarios
* multiple agents making simultaneous decisions in coordination
Solving the third problem gives rise to very interesting emergent behavior among teams of machines, which learn to behave in ways that are almost impossible for a hard-coded set of rules to specify. This is the source of many of the gains that RL produces.
I see a lot of preconceptions about RL in this thread that are partially true, but also solvable.
Yes, RL is data hungry. I acknowledged that in the TC piece. The problems with data are pretty much the same problems you get with ML in general. It's messy, partial, and usually not gathered with ML in mind. RL is no more impossible than other ML approaches, based on that argument.
A valid simulation of a physical system is a source of synthetic data that helps solve that problem.
Deep RL is already controlling machines on factory floors, and it will slowly help optimize larger and larger systems, from groups of assembly lines, to networks of plants.
In some cases it will serve as a "decisioning layer" (not my word) with a human in the loop (HITL), and in others it will drive more autonomous systems with low-latency requirements.
Believe me or not, it's happening.
Without getting into too many details, the underlying algorithms are a flavor of policy gradient implemented with the open source projects Ray/RLlib (supported by Anyscale) and Gym (from OpenAI). They're great tools and I highly recommend them to anyone who wants to explore deep RL.
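For anyone curious, the interface those tools expect is tiny. Here is a toy environment of my own (a made-up counting game, not Pathmind's code) that follows Gym's classic reset()/step() contract without importing gym itself:

```python
import random

class CountdownEnv:
    """Toy environment following Gym's classic reset()/step() interface.

    Not a registered Gym env (no gym import), just the API shape an
    RLlib-style trainer expects: step() returns (obs, reward, done, info).
    """
    def __init__(self, start=10):
        self.start = start
        self.state = start

    def reset(self):
        self.state = self.start
        return self.state

    def step(self, action):
        # action 1 decrements the counter; action 0 wastes the step
        self.state -= action
        done = self.state <= 0
        reward = 1.0 if done else -0.1   # small penalty for every extra step
        return self.state, reward, done, {}

# A random rollout, the same loop a trainer runs many times over:
env = CountdownEnv()
obs, done, total = env.reset(), False, 0.0
while not done:
    obs, reward, done, info = env.step(random.choice([0, 1]))
    total += reward
```

Wrapping a real simulator mostly means implementing these two methods around it; the trainer then only ever sees observations and rewards.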
Fwiw, DeepMind says RL may be the path to AGI, and smarter people than me have placed their bets on that team, which has consistently surpassed expectations about what RL can do.
AWS, while not particularly focused on RL, is making a big push into operational technology (OT) as well.
The incentives driving their cloud business will spur them to transform the software that controls the factory floor, or at least prompt existing OT software makers like Siemens to move faster.
If you have any questions about how deep RL is being applied, my email is in my profile. Pls hit me up!
Super interesting work - and thanks for the links! Maybe the parent was talking about applying DRL to dynamical systems? For instance, making a robot learn how to tighten a screw, or pick and place from arbitrary locations? I think these are still very hard to solve with DRL because it's very difficult to simulate things like contact forces, even with generous error bounds.
I suspect DRL will become extremely useful once we figure out how to use learned priors of the physical world to build more and more complex systems.
P&P (and peg-in-hole) are two popular problem domains in robotics, and there have been a couple of academic successes applying DRL plus domain randomization (or sim2real techniques more generally) to them. There are some daunting limitations, though, and other DL/analytic methods perform at the same level.
It is great to see that you are using AnyLogic, which I think is an excellent workbench for creating discrete-event models of production systems and even entire supply chains. AnyLogic also comes with an integrated tool for parameter tuning using global optimization.
Now, the model I was looking into was a stochastic economic lot-sizing problem with random orders. At the end of a production run, or upon arrival of a new order, the policy decided which product to produce next on a single machine. Each changeover, however, incurred a switching cost, and any orders that hit an empty inventory were lost (the backlog case was easier).
The difficulty with this problem was the noisy reward after each state transition, due to order randomness, combined with the training algorithm having to propagate the future value of an action in a particular state far enough backwards in time. As I said, it worked for easy problems, but not for problems with 5 products or more (which I think is a ridiculously small and simple instance).
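For readers who haven't seen this problem class, here is a minimal sketch of one state transition, with made-up costs and order sizes, just to show where the reward noise comes from:

```python
import random

def lot_sizing_step(state, action, rng, switch_cost=5.0, lost_penalty=10.0):
    """One transition of a toy stochastic lot-sizing model.

    state = (current_product, inventory_tuple); action = product to run next.
    All numbers are illustrative. The reward is noisy on every single step
    because random orders arrive, which is what makes credit assignment hard.
    """
    current, inv = state
    inv = list(inv)
    reward = 0.0
    if action != current:
        reward -= switch_cost          # (sequence-dependent costs omitted)
    inv[action] += 3                   # produce one lot of the chosen product
    for p in range(len(inv)):          # random orders arrive for each product
        orders = rng.randint(0, 2)
        filled = min(orders, inv[p])
        inv[p] -= filled
        reward -= (orders - filled) * lost_penalty   # lost sales, no backlog
    return (action, tuple(inv)), reward
```

Even a fixed policy sees a different reward for the same (state, action) pair on every visit, so the learner must average over many noisy transitions before any value estimate is trustworthy.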
Now, I do not know what you guys are using: whether your policy is a DNN whose parameters you tune with REINFORCE / evolution strategies, or whether you use value-function approximation with fitted Q-iteration (FQI) or similar (like DeepMind's Atari players). While I think that policy search may actually work (though it is rather slow with AnyLogic), I'd expect FQI to run into trouble on more difficult problems.
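For readers following along, the REINFORCE side of that comparison fits in a few lines. This is a generic tabular toy (a two-armed bandit, nothing to do with either of our actual setups), just to show the shape of the update:

```python
import math, random

def reinforce_bandit(steps=2000, lr=0.1, seed=0):
    """Tabular REINFORCE on a 2-armed bandit where arm 1 pays more on average.

    theta holds one logit per arm; the gradient of log-softmax at the chosen
    action is (one_hot(action) - probs), which we scale by the observed reward.
    """
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    probs = [0.5, 0.5]
    for _ in range(steps):
        exps = [math.exp(t) for t in theta]
        z = sum(exps)
        probs = [e / z for e in exps]
        action = 0 if rng.random() < probs[0] else 1
        reward = rng.gauss(0.2 if action == 0 else 1.0, 0.5)  # noisy payouts
        for i in range(2):
            grad = (1.0 if i == action else 0.0) - probs[i]
            theta[i] += lr * reward * grad
    return probs
```

On a bandit the noisy rewards average out quickly; in the lot-sizing problem the same update has to assign credit across long chains of transitions, which is exactly where it degrades.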
That said, deep RL is a great piece of tech, but I am a bit skeptical of the claim that RL will transform the future of manufacturing, as my experience is that it can easily struggle even with simple problems as soon as the action space becomes high-dimensional and reward information must be propagated across hundreds or even thousands of state transitions.
No it won’t transform manufacturing. Deep RL in physical systems is a pipe dream.
Deep RL is one of the most data hungry methods. And in physical systems you don’t get enough samples.
There are loads of data. But no shop-floor manager lets you produce scrap just so an agent can potentially learn.
There is an opportunity to combine ML with classical engineering models in manufacturing. Think differential equations for chemical processes. Then you end up with something like ML-augmented control theory. That does work.
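A tiny sketch of that idea (all numbers invented): take a textbook cooling model with a slightly wrong rate constant, fit a learned residual to the mismatch against "measurements", and simulate with the corrected dynamics. A real system would use a neural net for the residual; a one-parameter least-squares fit shows the structure:

```python
def simulate(steps, k, residual=lambda T: 0.0, T0=100.0, T_env=20.0, dt=0.1):
    """Euler-integrate Newton cooling: dT/dt = -k*(T - T_env) + residual(T)."""
    traj = [T0]
    for _ in range(steps):
        T = traj[-1]
        traj.append(T + dt * (-k * (T - T_env) + residual(T)))
    return traj

T_ENV, DT = 20.0, 0.1
K_MODEL = 0.25                      # engineering model, rate constant a bit off
truth = simulate(50, k=0.30)        # stand-in for real plant measurements

# "Learn" a linear residual c*(T - T_env) by least squares on the mismatch
# between measured and modeled temperature derivatives.
num = den = 0.0
for i in range(50):
    x = truth[i] - T_ENV
    measured_deriv = (truth[i + 1] - truth[i]) / DT
    model_deriv = -K_MODEL * x
    num += x * (measured_deriv - model_deriv)
    den += x * x
c = num / den                       # recovers the missing -0.05

corrected = simulate(50, k=K_MODEL, residual=lambda T: c * (T - T_ENV))
```

The physics does most of the work, so the learned part needs far less data than learning the dynamics from scratch, which is the whole appeal in manufacturing.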
It's not easy (or it would have been done already!) but I don't think it's a pipe dream. A lot of the computer vision capabilities that are taken for granted today would've been pipe dreams a few years ago.
It's true that the process is data hungry. But this isn't such a big problem in some cases. Having a few dozen or more robot arms just figuring stuff out by trial and error isn't such a big deal (you don't have to do this on the customer's shop floor...). For self-driving cars and drones this isn't such a great idea (though it has been tried to some extent). That's where simulation and more clever algorithms come in, including, of course, some prior knowledge, like you are implying. I don't think anybody expects to "transform manufacturing" by just throwing REINFORCE at a random KUKA arm.
Your simulated environment needs to accurately represent everything relevant for RL to be useful; otherwise the agent will simply exploit artifacts and flaws of the sim itself, or fail to discover nth-order effects because they never occur. This is exactly where DNNs have excelled and where we most want to use them: in all the places where we don't know the foundational theory. That may be why they are so successful compared to our traditional reductionist approaches: they can catch all the non-reduced aspects we never put into the sims we know how to build. We don't need DNNs doing QED or Newtonian mechanics (in general); we have nice solutions there that in many (not all) cases are pretty darn efficient.
In some cases you are computationally bound and will never achieve a sim that represents your desired environment. In many others, you lack the theory to correctly build a sim at all. You would need RL to build the very sims that you plan to use RL to explore for optimal processes. We have a few disciplines that are well-formulated enough for sims to be useful, but in the vast majority they will simply give you garbage, or you are computationally bound. At best, many sims provide guidance that experts must interpret.
Manufacturers already install all sorts of sensors to control the production process. Of course if you produce high end luxury cars there is probably not enough data for a useful model, but if you operate e.g. a hot rolling mill it may be useful.
Hot rolling is a good example.
For the last 30 years neural networks have been used to control the process. There’s a nice NeurIPS paper from 1991 [1].
And even with the non-deep networks from back in the day, people felt compelled to include prior knowledge and even used Bayesian priors to deal with a lack of data.
Other comments point out valid technical concerns, but even if those were solved, there's still not a clear application in manufacturing. If it has already been automated (e.g. semiconductors, welding car frames) then there's not a lot to gain from Deep RL.
On the other hand, if it hasn't been automated yet, who is going to invest hundreds of millions into a deep RL system (never mind an army of robots) when they can just pay a slightly higher variable cost for humans? For example, sure, a robot could theoretically sew jeans for $0.20 vs. a human at $15.00 (a living wage in the US), but does any one firm have the scale to justify $500M to train deep RL?
Nike has a market capitalization of over $200B. I’d guess if they could do away with human labor for $500M they would just do it. Cut labor costs and eliminate their sweatshop labor practice PR problems in one fell swoosh.
It seems to me that the $500M you quote is an exaggeration. I might not understand what it takes to train deep RL at this time, but it consistently seems that, with time, technologies like this become cheaper and more user-friendly. If not now (you may be right), isn't it likely that in the near future we could have tools easy enough to chain together that people fresh out of an intense bootcamp could code up a deep RL architecture to produce a useful-for-profit model?
GPT-3 reportedly cost $10M, and Deep RL is notorious for requiring way more epochs than supervised learning tasks like transformers or CNNs.
And that's not even accounting for the cost of robots (which will need frequent repairs when the untrained policies crash into physical objects), salaries for your engineers and all the raw materials that you're going to have to scrap because the robots didn't make saleable product.
I would assume that automating a previously manual industrial process is a lot more difficult than just training a neural network. My gut feeling is that 500M is an underestimate.
You probably have more experience to estimate price better, but I guess I'm working off a different premise. I thought the claim to defend is that "if you go far enough into the future, it will be cheap".
A parallel - facial recognition today is a matter of `npm install face-api.js` (or some other library). Advances in hardware and ML architecture design can bring the currently-challenging world of RL to similar levels.
Another way to put my argument is through a question: do you think that even 20 years from now, the cost of a useful RL system will be around $500M (inflation adjusted)?
There are a couple ways to think about this. Yes, automation has occurred, often at the level of a single machine. There is a difference between automation and optimization. And there are multiple levels to which both can be applied. One area where most previous optimizers fail, and where deep RL can make real contributions, is in the coordination of larger systems. An analogy would be the teams of agents that OpenAI trained to win at Dota 2. Similar teamwork can produce better results in industrial settings than they are currently achieving by automating or optimizing the behavior at a machine-level. The trick is to optimize groups of machines, entire plants, and networks of plants (e.g. through planning where to send items to be processed).
In the world of supply chain and industrial control, people typically fight hard to get an additional 1% gain in optimization. Deep RL can often surface decision paths that lead to double-digit gains, and those gains, for many companies, will be worth a lot of money.
Fwiw, it does not cost anywhere near $500M to train deep RL.
Big companies with big markets, they’re the ones with the money who can take those risks and collect the rewards in the future. That’s why the “big get bigger” and that’s been true since forever.
I’ve worked at a startup providing a similar solution. Even for an ad piece, the article is wildly underestimating the difficulty in getting this to work and the problem is not lack of state of the art methods.
Most customers don’t have the kind of data needed to control these systems, even if they think they do. While Industry 4.0 has taught companies to measure everything, there is rarely any notion of data integrity or quality in place, and even your smart new ML technique will not be able to control a system with very low correlation between observations and controls. The reality for companies providing these ML solutions is that they end up spending all of their resources fixing each client’s data problems, rendering their smart solution irrelevant.
It's true that a lot of companies are at different stages of what we would call "the journey of digital transformation." Some of them do not know what's happening inside their plants. The problem before the problem -- the upstream chokepoint of ML -- is gathering, collecting and cleaning the right data. That's true across the industry, and it's the main reason why ML is not moving faster and why we don't see more huge ML-based startups. Scale, one of the unicorns, is chiefly devoted to data annotation... But there are a lot of smaller, well-defined problems in machine scheduling and/or inventory management where companies have exactly the data they need and are already feeding it to an optimizer. Deep RL can often outperform those existing technologies.
I think there are lower-hanging fruit in manufacturing than this, although someone more familiar with the industry can tell me how realistic this is. I was speaking to my father-in-law recently, who has helped design production lines his entire life. One interesting thing he said was that companies often have huge investments in machines that are effectively no longer supported by manufacturers. Many of these machines are based on simple circuits or microcontrollers, and their behaviour is more or less fixed at this point. But in many cases the performance of the machine is not optimal, with wasteful pauses and inefficient transitions between parts of a job.

It occurred to me that there'd be a reasonably useful business in learning the behaviour of the controllers in these machines, and in some cases it must be possible to just rip out chips and replace them with tweaked versions. I don't believe the IO can be very complex in this generation of machines (we're talking various cutting machines etc.), but you'd be offering savings of millions if you could build a scalable business tweaking them. One approach would be treating the microcontrollers as black boxes and just using deep learning to pick up the mappings between inputs and outputs.

Anyway, completely idle speculation on my part, but a free business idea for someone if they thought it was in any way possible, and entirely possible this is big business already.
This is called 'retrofitting', and it commonly costs $100k to $300k per machine. It's a niche business as people smart enough to pull it off are fairly rare.
So after reading the article I started reading up on deep RL, and came across what is essentially a blog post that shows you how to train a model to play Super Mario Bros [1]. I have of course done all the MNIST handwritten-digit recognition stuff, and have done quite a bit of real-world analysis with scikit-learn, but this is... pretty cool!
RL theory is cool and all, but a large part of the trick seems to be the neural network architecture itself (that is, having a good parametrized model for the policy) and millions of evaluations. For some Atari games it turned out to be almost as sample-efficient to try random weights instead of RL. Not for everything, and e.g. policy gradient is certainly worth learning about. But see for example https://openai.com/blog/evolution-strategies/
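The ES idea from that post is surprisingly small. Here's a toy sketch of one update on a scalar parameter (my own example, with a baseline subtraction for variance reduction; the real thing adds rank normalization and parallelism):

```python
import random

def es_step(theta, fitness, rng, pop=50, sigma=0.1, alpha=0.05):
    """One OpenAI-style evolution strategies update on a scalar parameter.

    Perturb theta with Gaussian noise, score each perturbation, and move
    theta along the noise directions weighted by (fitness - baseline).
    """
    noises = [rng.gauss(0.0, 1.0) for _ in range(pop)]
    scores = [fitness(theta + sigma * e) for e in noises]
    baseline = sum(scores) / pop            # simple variance reduction
    grad = sum((f - baseline) * e for f, e in zip(scores, noises)) / (pop * sigma)
    return theta + alpha * grad

# Toy objective with its maximum at x = 3; ES only ever sees fitness values.
fitness = lambda x: -(x - 3.0) ** 2
rng = random.Random(0)
theta = 0.0
for _ in range(300):
    theta = es_step(theta, fitness, rng)
```

No backprop anywhere, which is why it parallelizes so well and why the architecture (the thing being perturbed) carries so much of the load.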
Personally I expect all real-world applications of RL to be trained in simulation, with tricks to make sure they also learn to adapt to reality at startup (meta-learning). For example by simulating each episode with parameters that are slightly off.
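A sketch of that per-episode perturbation idea (parameter names and ranges are mine, purely illustrative, in the spirit of domain randomization):

```python
import random

def make_randomized_episode(rng):
    """Sample slightly-off physics parameters for one training episode.

    The idea: a policy trained across the whole distribution of simulators
    cannot overfit to any single instance, so it has a better chance of
    covering the real plant's parameters at deployment time.
    """
    return {
        "friction":           rng.uniform(0.8, 1.2),  # +/- 20% around nominal
        "motor_gain":         rng.uniform(0.9, 1.1),
        "sensor_delay_steps": rng.randint(0, 3),
        "payload_kg":         rng.uniform(0.0, 0.5),
    }

rng = random.Random(42)
episodes = [make_randomized_episode(rng) for _ in range(3)]
```

Each training episode would then instantiate the simulator from one of these dicts before rolling out the policy.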
Many comments here actually state that RL is really data hungry.
That is true, and indeed one of the difficulties to overcome.
However, there is a lot of work in the field on using transfer learning / meta-learning to tackle this, and I have seen successful implementations in the manufacturing industry.
The bigger blocker right now is the simulation part itself (and there is a lot of work on that, too!)
Training DRL on real-world physical processes has the problem that a failure is extremely expensive.
You don’t want to see a manufacturing process moving too fast (because it’s still learning / optimizing) and then breaking a 500k router spindle.
Simulation is certainly one way to deal with this, though.
The third big blocker is cost: for many manufacturing processes, pretty good parameters have already been found over many years. The trade-off between the costs of a new DRL system (training it, potential failures, deploying it, and training the users) and the gains needs to be big enough to justify DRL financially. Again, standardization will help with this, but it requires significant upfront R&D costs that only a few are willing to pay.
In the end, in my opinion, we will see these applications more often, but it will take time and effort to improve them and make them cost-effective.
If you want to revolutionize manufacturing, you need to open a large number of small job shops each specializing in a different kind of work, then do applied research and engineering in this real world environment. Step one is making the shops successful, step two is throwing a bunch of geniuses into the room who can help the people who understand what's needed build interesting new technologies.
Large existing manufacturing firms with the kind of money to do this can't get out of their own way, and will never be a source of innovation. Small companies don't have the funds to throw at maintaining a staff of genius engineers and researchers.
Most outsiders with access to capital looking at innovating in the space probably don't even appreciate the difference between a job shop and an OEM, and are likely to lose lots of cash on projects headed by former OEM enterprise middle managers who exclusively have experience getting in the way of production and wasting money.
I'd love to be part of a highly funded team of smart people whose job was to reinvent manufacturing in a practical environment.
How about inverse reinforcement learning? MaxEnt and Deep MaxEnt seem to estimate reward functions pretty well. But I've only seen limited applications of IRL. Researchers mostly focus on gridworld examples or robot arms learning certain tasks from (human) experts. But how can IRL play a more influential role in society?