
What are some good references on neural ODEs that don't come from the Julia community? I'm looking for theory and applications - when are they good and who is using them for what?

I'm asking for sources outside of Julia because I find the coupling of algorithm types to tools kind of strange and the whole SciML trend is kind of opaque to me. (Are people applying ML as a solution to newer problems? Are they using new approaches to solve ML problems? How legit is the whole thing? I just don't know.)



Neural ODEs are essentially a rebranding of adjoint sensitivity analysis, which has been around in various forms in established solver suites such as Sundials, PETSc, etc. The machine learning community got hold of it, cited one book, and otherwise happily reinvented everything.


The actual content of that work was good, but the presentation was misleading, with an excess of backpropaganda and a poor literature review. The training procedure makes sense as continuous-time backprop, but it is mostly a special case of adjoint sensitivity analysis. Using a NN to define an ODE system seems fair to call a Neural ODE; imho it's a good name, although again it was not as novel as the writing style throughout the paper makes it look.
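To make the connection concrete, here is a minimal sketch of the continuous adjoint that both "neural ODE training" and classical adjoint sensitivity analysis boil down to: solve the state forward, solve the adjoint backward, and accumulate the parameter gradient along the way. This is plain Python with a scalar ODE and forward Euler, purely illustrative; it is not how any production solver (or the Neural ODE paper's method) actually implements it.

```python
import math

def f(x, theta):
    # Right-hand side of the toy ODE: dx/dt = theta * x
    return theta * x

def adjoint_gradient(theta, x0=1.0, T=1.0, n=10_000):
    """dL/dtheta for L = x(T), via the continuous adjoint method."""
    dt = T / n
    # Forward pass: store the state trajectory.
    xs = [x0]
    for _ in range(n):
        xs.append(xs[-1] + dt * f(xs[-1], theta))
    # Backward pass: dlam/dt = -lam * (df/dx), with lam(T) = dL/dx(T) = 1.
    lam, grad = 1.0, 0.0
    for k in range(n, 0, -1):
        grad += dt * lam * xs[k - 1]  # accumulate lam * (df/dtheta); here df/dtheta = x
        lam += dt * lam * theta       # step the adjoint backward; here df/dx = theta
    return grad

theta, T = 0.7, 1.0
g = adjoint_gradient(theta, T=T)
exact = T * math.exp(theta * T)  # analytic gradient of x(T) = x0 * e^(theta*T)
```

For dx/dt = theta*x with L = x(T), the analytic answer is dL/dtheta = T*e^(theta*T), which the sketch reproduces to O(dt).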


I agree that it is a good name, and evaluating this on "ML tasks" was novel. However, this and subsequent papers did a really poor job of delineating what is novel from what is well known. Moreover, this is a pattern in almost all subsequent papers, where people literally pretend that they were the first to consider parameter gradient computation of controlled differential equations, hybrid dynamical systems, etc., while in fact this has been worked out in full generality since essentially the 60s.


Yes, that's mostly right (instead of PETSc put FATODE, since PETSc TS Adjoint was only published in 2019 and is based heavily on FATODE's techniques https://epubs.siam.org/doi/10.1137/130912335?mobileUi=0). And that's why the Julia tools were so ready for it: we already had adjoint sensitivity analysis (implemented for parameter estimation in systems pharmacology), so neural ODEs were a freebie. Similar to DEQs using the adjoint of a nonlinear solve: a DEQ is just a neural network inside a Julia nonlinear solver, and when you take gradients you hit the right overloads, which were originally implemented for parameter estimation of elliptic PDEs. There are definitely some aspects of the ML community that have muddied the waters, so to speak.
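The "DEQ is just the adjoint of a nonlinear solve" point is the implicit function theorem in action. A toy scalar sketch in plain Python (the layer function is made up for illustration; a real DEQ uses a vector fixed point and a linear/Krylov solve instead of the scalar division):

```python
import math

def f(z, x):
    # Toy 'layer': the DEQ output is the fixed point z* = f(z*, x).
    return math.tanh(z) + x

def solve_fixed_point(x, z=0.0, tol=1e-13, max_iter=500):
    # The 'forward pass': any nonlinear solver works; plain iteration here.
    for _ in range(max_iter):
        z_new = f(z, x)
        if abs(z_new - z) < tol:
            return z_new
        z = z_new
    return z

def dz_dx_implicit(x):
    # Implicit function theorem at the fixed point:
    #   dz*/dx = (1 - df/dz)^(-1) * df/dx
    # (in the vector case, a linear solve instead of a division).
    z = solve_fixed_point(x)
    df_dz = 1.0 - math.tanh(z) ** 2  # derivative of tanh(z) + x w.r.t. z
    df_dx = 1.0
    return df_dx / (1.0 - df_dz)
```

The gradient never needs to backprop through the solver iterations, only through one application of f at the solution; that is why hitting "the right overloads" on the nonlinear solver is all it takes.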

The recent one that I found funny was the Second-Order Neural ODE (https://arxiv.org/abs/2109.14158). The second order adjoint is rather old, with a canonical implementation in Sundials (https://github.com/LLNL/sundials/raw/master/doc/cvodes/cvs_g...) based on a very good analysis of how to do second order adjoints fast (https://epubs.siam.org/doi/abs/10.1137/030601582?journalCode...). But what about putting a neural network in there? In Julia it's just composing forward-over-reverse to get the optimal adjoint that matches Sundials, so there's been a tutorial on it in DiffEqFlux since 2019 (https://diffeqflux.sciml.ai/dev/examples/second_order_adjoin...) with both Newton and Newton-Krylov methods. The rest of the optimizations from the paper follow using BacksolveAdjoint and relying on dead code elimination (DCE), which is a compiler pass that wouldn't exist in Python so I guess they have to care? I assumed that was too trivial to publish given that history of prior work, but somehow that paper got a NeurIPS Spotlight with "a novel computational framework for computing higher-order derivatives of deep continuous-time models". At this point I just kind of shrug though, I think it's a symptom of conference culture not giving people enough time to review the literature thoroughly so random things seem to seep through. Take it as one reason among many to treat any ML conference paper similar to an unreviewed preprint.
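Forward-over-reverse is composable enough to show in a few lines: write a reverse-mode gradient (hand-derived here), then push forward-mode dual numbers through it; the tangents of the gradient are a Hessian-vector product, which is the core trick behind second-order adjoints. A toy Python sketch (the loss L and its gradient are invented for illustration, not from any of the papers above):

```python
class Dual:
    """Minimal forward-mode AD: value + epsilon * tangent."""
    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.tan + o.tan)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.val * o.tan + self.tan * o.val)
    __rmul__ = __mul__

def grad_L(a, b):
    # Hand-derived 'reverse-mode' gradient of L(a, b) = a^2 * b + b^3.
    return (2 * a * b, a * a + 3 * b * b)

def hvp(a, b, v):
    # Forward-over-reverse: seed the gradient's inputs with tangents v;
    # the tangent parts of the outputs are the Hessian-vector product H @ v.
    ga, gb = grad_L(Dual(a, v[0]), Dual(b, v[1]))
    return (ga.tan, gb.tan)
```

For L = a^2*b + b^3 the Hessian is [[2b, 2a], [2a, 6b]], so hvp(1.5, 2.0, (1, 0)) gives (4, 3). Composing this over the adjoint ODE solve instead of a hand-written gradient is what the DiffEqFlux tutorial does.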

All of that together, that's why the Julia universe of tools has mostly been focusing on improvements and performance vs Sundials, PETSc, etc., since those are the real challengers. You can see that we spend our time benchmarking against Sundials all of the time in our stiff ODE benchmarks and recently started outperforming it, with QNDF pulling 2x-5x wins in various ways (https://benchmarks.sciml.ai/html/Bio/BCR.html, see https://sciml.ai/news/2021/05/24/QNDF/). The stiff neural ODE paper describes 3 ways to improve on the Sundials adjoint in terms of complexity (https://aip.scitation.org/doi/10.1063/5.0060697), and our adjoint benchmarking paper shows how putting all of these various pieces together leads to about 2-3 orders of magnitude improvement over the naive CVODES adjoint (https://arxiv.org/abs/1812.01892) (against the method; the follow-up of course will be against the direct wrapping).

What's funny though is that people then get upset when we show benchmarks against some of the Python tools, like >100x improvements in solver speeds on physical and biological problems (https://gist.github.com/ChrisRackauckas/cc6ac746e2dfd285c28e...) and adjoints (https://gist.github.com/ChrisRackauckas/4a4d526c15cc4170ce37...). I don't understand why anyone would be surprised though: we've spent years "competing" against the C and Fortran codes and only recently started pulling ahead due to the combined effort of a whole community, while the Python tools were just a few people with simple methods who never benchmarked against the previous tools. If they had benchmarked enough, they would see that we're not an outlier claiming to be 100x faster than everyone else; instead, we're in (and slightly ahead of) the pack, and they are the outlier that is 100x behind the whole group. Personally, I would require every paper to at least have a benchmark against Sundials (and/or its methods) as a baseline, which is the standard we tend to hold.


Ha, yeah, I knew about that analysis of how to do second order adjoints fast :), in what is basically documentation no less. I figured someone would probably do an ML paper on that eventually. I have had a hard time judging what is non-trivial in the past as well. The next thing they will probably discover is jet spaces and Hopf algebras.

I'm also slightly salty because a recent paper (https://arxiv.org/pdf/2011.03902.pdf) claimed "Differentiable event handling generalizes many numerical methods that often have specialized methods for gradient computation,...", citing one of our results. We instead went through the trouble of finding a semi-complete list of prior work, which makes it abundantly clear that what they claim as new is in fact traceable to a paper by Rozenvasser from ~1965 (we translated it from Russian to make sure), with a long history of more recent work. In fact, I think DifferentialEquations.jl just supports this out of the box and has non-broken event handling (their implementation tests against a product of jump conditions).
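The event-sensitivity result in question is, at its core, a one-line formula: differentiate the event condition g(x(t*, p)) = 0 to get dt*/dp = -(dg/dx * dx/dp)/(dg/dx * f) at the crossing. A toy Python check on a hypothetical constant-velocity system (not the DifferentialEquations.jl implementation):

```python
def event_time(v, dt=1e-3):
    # Integrate dx/dt = v with explicit Euler until the event
    # g(x) = x - 1 = 0 fires, then locate the crossing inside
    # the last step by interpolation.
    t, x = 0.0, 0.0
    while (x + dt * v) - 1.0 < 0.0:
        x += dt * v
        t += dt
    return t + (1.0 - x) / v  # g is linear in x, so interpolation is exact

def dtstar_dv(v):
    # Event ('jump condition') sensitivity for g(x) = x - 1:
    #   dt*/dv = -(dg/dx * dx/dv) / (dg/dx * f),
    # with dx/dv = t* and f = v for this system, giving -t*/v = -1/v^2.
    return -event_time(v) / v
```

Here the analytic event time is t* = 1/v, so dt*/dv = -1/v^2, and a finite-difference check on event_time agrees with the jump-condition formula.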

I am aware of those benchmarks :). I will probably adopt your library for this very reason, although I have to say that I really don't like Julia as a language. It is pretty clear that, besides Sundials, PETSc, and other C++-based libraries, you are rapidly becoming the only game in town. As much as I am tempted, I really shouldn't be handwriting integration routines :). I don't think you need a random internet stranger to tell you this, but it is abundantly clear that DifferentialEquations.jl provides far more long-term value than these types of machine learning papers.


Just wondering, are the reasons you don't like Julia existential (e.g., it's a dynamic language without static types), or stuff that we can fix? Also, I wouldn't be surprised if at some point in the next decade, there start to be good C/C++ libraries written in Julia.


I would say I'm relatively pragmatic. My main use case would involve DifferentialEquations.jl modelling either certain quantum processes or model equations of biological neurons. I have a rough idea how that would work and what kind of abstractions I would like to implement. Mainly, I have not been able to get a good developer experience with it so far. Things tend to either work or fail rather drastically (spectacularly long backtraces). But it clearly is very useful, so I will probably bite the bullet and try to use it again.


There's some info from the Python/Pytorch camp: https://towardsdatascience.com/neural-odes-with-pytorch-ligh...

I suspect neural ODE work was done in Julia earlier because it was easier given some language features and libraries. But there does seem to be some work on neural ODEs in Python/Pytorch.


The original Neural ODEs paper is quite readable, and by now there are loads of blog posts and even a few talks on the subject.

The basic idea is inspired by the “adjoint method” for ODE solving (so you don’t have to hold in memory all the intermediate layer outputs — which is otherwise necessary to compute the backpropagated gradient signal).


Yeah, though with the method described in that paper you do have to be very careful, since it has exponential error growth in the Lipschitz constant of the ODE. See https://aip.scitation.org/doi/10.1063/5.0060697 for details. But that is generally the case in numerical analysis: there's always a simple way to do things, and then there's the way that prevents error growth. Both have different pros and cons.
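That error growth is easy to demonstrate: solve a decaying linear ODE forward, then try to reconstruct the state by integrating the same ODE backward in time (the memory-saving trick in the paper's adjoint). A toy Python sketch; for mild lam*dt the reconstruction is fine, but for large lam*dt it is destroyed:

```python
def forward_euler(x0, lam, dt, n):
    # Solve dx/dt = -lam * x forward with explicit Euler; keep only x(T).
    x = x0
    for _ in range(n):
        x += dt * (-lam * x)
    return x

def backsolve(xT, lam, dt, n):
    # Reconstruct x(0) by running the same ODE backward in time,
    # as the memory-free 'backsolve' adjoint does.
    x = xT
    for _ in range(n):
        x -= dt * (-lam * x)
    return x

errs = {}
for lam in (0.5, 50.0):
    xT = forward_euler(1.0, lam, dt=0.01, n=100)
    errs[lam] = abs(backsolve(xT, lam, dt=0.01, n=100) - 1.0)
# errs[0.5] stays well under 1%, while errs[50.0] is ~100%: the
# reconstruction error compounds roughly exponentially in lam.
```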


What do you mean by "the coupling of algorithm types to tools" (not judging, just curious).


More bluntly, my question is: if SciML is that good, why aren't more people doing it yet? Why is it limited to a small group of Julia developers and packages?

(There are good possible explanations - it could be very new, have only niche applications, Julia is somehow uniquely suited for it etc. I don’t know)


Some minor clarifications: Neural ODEs are not a Julia invention. I am pretty sure the first papers on the topic were using a Python package implementing a rather crude ODE solver in torch or tensorflow. Julia just happens to be light years ahead of any other tool when it comes to solving ODEs, while also having many high-quality automatic differentiation packages, so it feels natural to use it for these problems. But more importantly, SciML is not just for your typical machine learning tasks: being able to solve ODEs and have autodiff over them is incredibly empowering for boring old science and engineering, and SciML has become one of the most popular sets of libraries when it comes to unwieldy ODEs.


Lots to say here. First of all, the community growth has been pretty tremendous and I couldn't really ask for more. We're seeing tens of thousands of visitors to the documentation of various packages, and have some high profile users. For example, NASA showing a 15,000x acceleration (https://www.youtube.com/watch?v=tQpqsmwlfY0) and the Head of Clinical Pharmacology at Moderna saying SciML-based Pumas "has emerged as our 'go-to' tool for most of our analyses in recent months" in 2020 (see https://pumas.ai/). We try to keep a showcase (https://sciml.ai/showcase/) but at this point it's hard to stay on top of the growth. I think anyone would be excited to see an open source side project reach that level of use. Since we tend to focus on core numerical issues (stiffness) and performance, we target the more "hardcore" people in scientific disciplines who really need these aspects and those communities are the ones seeing the most adoption (pharmacology, systems biology, combustion modeling, etc.). Indeed the undergrad classes using a non-stiff ODE solver on small ODEs or training a neural ODE on MNIST don't really have those issues so they aren't our major growth areas. That's okay and that's probably the last group that would move.

In terms of the developer team, throughout the SciML organization repositories we have had around 30 people with over 100 commits, which is similar in number to NumPy and SciPy. Julia naturally has a much lower barrier to entry in terms of committing to such packages (since the packages are all in Julia rather than C/Fortran), so the percentage of users who become developers is much higher, which is probably why you see a lot more developer activity in contrast to "pure" users. With things like the Python community, you have a group of people who write blog posts and teach the tools in courses without ever hacking on the packages or their ecosystem. In Julia, that background is sufficient knowledge to also be developing the packages, so everyone writing about Julia seems to also be associated with developing Julia packages somehow. I tend to think that's a positive, but it does make the community look insular, as everyone you see writing about Julia is also a developer of packages.

Lastly, since we have been focusing on people with big systems and numerically hard problems, we have had the benefit of being able to overlook some "simple user" issues so far. We are starting to do a big push to clean up things like compile times (https://github.com/SciML/DifferentialEquations.jl/issues/786), improve the documentation, throw better errors, support older versions longer, etc. One way to think about SciML is that it's somewhat the Linux to the monolithic Python packages' Windows. We give modular tools in a bunch of different packages that work together, get high performance, and become "more than the sum of the parts", but sometimes people are fine with the simple app made for one purpose. Take DEQs: there's a Python package specifically for DEQs (https://github.com/locuslab/deq). Does it have all of the Newton-Krylov choices for the different classes of Jacobians and all of that? No, but it gets something simple and puts an easily Google-able face on it. So while all it takes in Julia with SciML is to stick a nonlinear solver in the right spot in the right way and know how the adjoint codegen will bring it all together, the majority want Visual Studio instead of Awk+Sed or Vim. We understand that, and so the DiffEqFlux.jl package is essentially just a repository of tutorials and prebuilt architectures that people tend to want (https://diffeqflux.sciml.ai/dev/), but we need to continue improving that "simplified experience". The age of Linux is more about making desktop managers that act sufficiently like Windows and less about trying to get everyone building ArchLinux from source. Right now we are too much like ArchLinux and need to build more of the Ubuntu-like pieces. We thus have similarly loyal hardcore followers but need to focus a bit on making that installation process easier and the error messages shorter to attract a larger crowd.


What do you mean by "more" people? Perhaps you mean more people knowing about it? Anyone who solves a differential equation in Julia is using the SciML ecosystem of packages. The Julia ecosystem has about 1M users, and lots of people in that ecosystem use these tools.

There are over 100 dependent packages: https://juliahub.com/ui/Packages/OrdinaryDiffEq/DlSvy/5.64.1...


> Why is it limited to a small group of Julia developers and packages?

I don't think there are any gatekeepers limiting its use. Articles like the one highlighted here help get the word out to more potential users.



