
I have tested numerous RL algorithms on simulated production systems for my PhD thesis. It was a complete failure.

Most production planning problems are inherently high-dimensional and come with complex constraints. When you add noise to the problem, such as random demand, every generic RL algorithm I tried ran into trouble as soon as I added complexity (like producing multiple products on the same line). So if you're looking for good examples to break RL code, throw it a nice stochastic economic lot-sizing problem with sequence-dependent setup times.

By contrast, what works well is policy search over parametric policies that incorporate domain knowledge. Sounds boring, but it is quite hard to beat.
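To make "policy search over a parametric policy" concrete, here is a minimal sketch in Python: a classic (s, S) reorder policy whose two parameters are tuned by random search against a toy stochastic-demand simulator. All names and cost numbers are made up for illustration; the commenter's actual setup is surely richer.

```python
# Hypothetical sketch: tuning a parametric (s, S) inventory policy by
# random search over a simulated horizon. Not anyone's production code.
import random

def simulate(s, S, horizon=200, seed=0):
    """Average per-period cost of an (s, S) policy under random demand."""
    rng = random.Random(seed)
    inv, cost = S, 0.0
    for _ in range(horizon):
        demand = rng.randint(0, 9)
        sold = min(inv, demand)
        cost += 5.0 * (demand - sold)   # lost-sales penalty
        inv -= sold
        cost += 0.1 * inv               # holding cost
        if inv < s:                     # reorder up to S
            cost += 20.0                # fixed setup cost
            inv = S
    return cost / horizon

def random_search(trials=200, seed=1):
    """Crudest possible policy search: sample (s, S) pairs, keep the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        s = rng.randint(0, 20)
        S = rng.randint(s + 1, 40)
        c = simulate(s, S)
        if best is None or c < best[0]:
            best = (c, s, S)
    return best

cost, s, S = random_search()
```

The point is that the policy class itself (order-up-to logic) encodes the domain knowledge, so the search space is two numbers instead of a neural network's weights.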



TechCrunch article author here.

Our team at Pathmind has applied deep RL to multiple use cases in industrial control and supply chain management, notably various forms of scheduling and multi-echelon inventory optimization (MEIO), respectively.

You can see some demo simulations here:

https://pathmind.com/examples/

https://cloud.anylogic.com/model/15ee2fe9-c45c-425d-8b98-920...

https://cloud.anylogic.com/model/9c0725cc-7eb5-44ed-a2de-ff8...

https://cloud.anylogic.com/model/8769d942-43c4-4530-bc68-243...

https://cloud.anylogic.com/model/b7da42a0-734c-461f-9f68-382...

Yes, these show RL applied to simulations, and yes, real data and real physical plants are more complex than that.

But those physical systems are already controlled by optimizers (usually mathematical solvers like IBM CPLEX or Gurobi), and deep RL happens to be an optimizer that can handle a few things better:

* data variability

* multiple objectives in complex scenarios

* multiple agents making simultaneous decisions in coordination

Solving the third problem gives rise to very interesting emergent behavior among teams of machines, which learn to behave in ways that are almost impossible for a hard-coded set of rules to specify. This is the source of many of the gains that RL produces.

I see a lot of preconceptions about RL in this thread that are partially true, but also solvable.

Yes, RL is data hungry. I acknowledged that in the TC piece. The problems with data are pretty much the same problems you get with ML in general. It's messy, partial, and usually not gathered with ML in mind. RL is no more impossible than other ML approaches, based on that argument.

A valid simulation of a physical system is a source of synthetic data that helps solve that problem.

Deep RL is already controlling machines on factory floors, and it will slowly help optimize larger and larger systems, from groups of assembly lines, to networks of plants.

In some cases it will serve as a "decisioning layer" (not my word) with HITL, and in others, it will drive more autonomous systems with low latency requirements.

Believe me or not, it's happening.

Without getting into too many details, the underlying algorithms are a flavor of policy gradient implemented with the open source projects Ray/RLlib (supported by Anyscale) and Gym (from OpenAI). They're great tools and I highly recommend them to anyone who wants to explore deep RL.
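For anyone who hasn't used Gym: its contract boils down to reset() and step(action) returning (observation, reward, done, info), and that is the interface RLlib trainers consume. Here is a toy Gym-style environment for a single line producing two products with changeover costs. This is purely illustrative, not Pathmind's code; every number in it is invented.

```python
# Illustrative toy environment exposing the Gym reset()/step() contract.
# A real version would subclass gym.Env and declare observation/action
# spaces before registering it with RLlib.
import random

class TwoProductLine:
    """Single machine, two products, changeover cost, random demand."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.current = 0            # product the machine is set up for
        self.backlog = [0, 0]
        return self._obs()

    def step(self, action):
        # action: which product (0 or 1) to produce this period
        reward = 0.0
        if action != self.current:
            reward -= 3.0           # sequence-dependent setup cost
            self.current = action
        produced = min(self.backlog[action], 2)
        self.backlog[action] -= produced
        reward += 1.0 * produced
        # random demand arrival
        self.backlog[self.rng.randint(0, 1)] += self.rng.randint(0, 2)
        reward -= 0.2 * sum(self.backlog)   # backlog penalty
        self.t += 1
        done = self.t >= 100
        return self._obs(), reward, done, {}

    def _obs(self):
        return (self.current, self.backlog[0], self.backlog[1])
```

A policy-gradient trainer then just collects rollouts through step() and nudges the policy toward higher episode return.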

Fwiw, DeepMind says RL may be the path to AGI, and smarter people than me have placed their bets on that team, which has consistently surpassed expectations about what RL can do.

https://www.sciencedirect.com/science/article/pii/S000437022...

Microsoft is making a major push into this area using their Bonsai acquisition to power autonomous systems.

https://www.microsoft.com/en-us/ai/autonomous-systems

AWS, while not particularly focused on RL, is making a big push into operational technology (OT) as well.

The incentives driving their cloud business will spur them to transform the software that controls the factory floor, or at least prompt existing OT software makers like Siemens to move faster.

If you have any questions about how deep RL is being applied, my email is in my profile. Pls hit me up!


Super interesting work - and thanks for the links! Maybe the parent was talking about applying DRL to dynamical systems? For instance, making a robot learn how to tighten a screw, or pick and place from arbitrary locations? I think these are still very hard to solve with DRL because it's very difficult to simulate things like contact forces, even with generous error bounds.

I suspect DRL will become extremely useful once we figure out how to use learned priors of the physical world to build more and more complex systems.


P&P (and peg-in-hole) are two popular problem domains in robotics, and there are a couple of academic successes applying DRL + domain randomization techniques (or sim2real more generally) to them. There are some daunting limitations though, and there are other DL/analytic methods that perform at the same level.


Thanks for sharing all of these examples.

It is great to see that you are using AnyLogic, which I think is an excellent workbench for creating discrete-event models of production systems and even entire supply chains. AnyLogic also comes with an integrated tool for parameter tuning using global optimization.

Now, the model that I was looking into was a stochastic economic lot-sizing problem with random orders. At the end of a production run, or upon arrival of a new order, the policy decided which product to produce next on a single machine. Each changeover, however, incurred a switching cost, and any orders that met an empty inventory were lost (the backlog case was easier).

The difficulty with this problem was noisy rewards after each state transition due to order randomness, as well as the need for the training algorithm to propagate the future value of an action in a particular state far enough backwards in time. As I said, it worked for easy problems, but not for problems with five products or more (which I think is a ridiculously small and simple problem).

Now, I do not know what you guys are using: whether your policy is a DNN and you are tuning parameters using REINFORCE / evolution strategies, or whether you are using value function approximation with fitted Q-iteration (FQI) or similar (like DeepMind's Atari players). While I think that policy search may actually work (though it is rather slow with AnyLogic), I'd expect that FQI will run into trouble on more difficult problems.
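For reference, the REINFORCE update mentioned above is tiny in isolation; here is a minimal sketch on a two-armed bandit with a softmax policy, just to show the parameter-update rule (one-hot(a) minus the action probabilities, scaled by the reward). The arm means and learning rate are made up.

```python
# Minimal REINFORCE sketch: softmax policy over two actions, no baseline.
# Purely didactic; real problems need baselines/advantages to tame variance.
import math
import random

def softmax(theta):
    m = max(theta)
    e = [math.exp(t - m) for t in theta]
    z = sum(e)
    return [x / z for x in e]

def reinforce(steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    means = [0.2, 1.0]                    # arm 1 is better in expectation
    for _ in range(steps):
        p = softmax(theta)
        a = 0 if rng.random() < p[0] else 1
        r = means[a] + rng.gauss(0, 0.5)  # noisy reward
        # grad of log pi(a|theta) for a softmax: one-hot(a) - p
        for i in range(2):
            grad = (1.0 if i == a else 0.0) - p[i]
            theta[i] += lr * r * grad
    return softmax(theta)

p = reinforce()   # probabilities should concentrate on the better arm
```

The noisy-reward, long-horizon credit assignment problem you describe is exactly where this vanilla form falls apart: the gradient signal drowns in return variance.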

That said, deep RL is a great piece of tech, but I am a bit skeptical of the claim that RL will transform the future of manufacturing, as in my experience it can easily struggle even with simple problems as soon as the action space becomes high-dimensional and reward information must be propagated across hundreds or even thousands of state transitions.

Btw: are there any scientific papers on this?



