From about 2000 to 2005 there were maybe a dozen people worldwide doing interesting theoretical work in this area, much of which was never published. Unfortunately, it received a lot of criticism from AI academics who, frankly, didn't understand the problem space, and that criticism still gets repeated today.
Most of the criticisms of using approximated Solomonoff induction for AI share a caveat: they presume a naive and "obvious" approximation that has little connection to the theoretical computer science constructs actually being used to solve the problem of efficient, general approximation. In other words, the criticisms are a bit of a strawman.
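To make "naive and obvious" concrete, here is a toy sketch of what such an approximation might look like. This is an assumption about what is meant, not a reconstruction of any published method: the hypothesis class (repeating bit patterns), the function name `predict_next_bit`, and the `max_len` cutoff are all made up for illustration. Real Solomonoff induction sums over all programs for a universal machine; this just brute-forces a tiny, non-universal stand-in and weights each hypothesis by 2^-length.

```python
from itertools import product


def predict_next_bit(observed: str, max_len: int = 8) -> float:
    """Return the weighted probability that the next bit is '1'.

    Toy stand-in for Solomonoff induction (illustrative assumption):
    each "program" is a bit pattern that is repeated forever, and its
    prior weight is 2 ** -len(pattern), so shorter patterns dominate.
    """
    weight_one = 0.0
    weight_total = 0.0
    for length in range(1, max_len + 1):
        for bits in product("01", repeat=length):
            pattern = "".join(bits)
            # "Run" the hypothesis far enough to cover one extra bit.
            repeats = len(observed) // length + 2
            generated = pattern * repeats
            # Discard hypotheses that contradict the data seen so far.
            if not generated.startswith(observed):
                continue
            weight = 2.0 ** (-length)
            weight_total += weight
            if generated[len(observed)] == "1":
                weight_one += weight
    # With no surviving hypothesis, fall back to a coin flip.
    return weight_one / weight_total if weight_total else 0.5


if __name__ == "__main__":
    print(predict_next_bit("010101"))  # favours '0' as the next bit
    print(predict_next_bit("1111"))    # favours '1' as the next bit
```

The exhaustive enumeration is exponential in `max_len`, which is exactly why this style of approximation is easy to criticise and says little about more sophisticated constructions.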
Sophisticated approximations with excellent theoretical properties for general-purpose computation had a different critical defect: they scaled terribly on real silicon because of the strong non-locality of operations when implemented in any way that was obvious at the time. Basically, you could prove that a different computing substrate would scale up to incredible capability, but vanilla silicon was pathological, which limited these approaches to toy problems in practice. In principle, these Solomonoff induction approximations should easily outperform any of the "deep learning" technologies currently being used.
However, I would observe that these computing models should now be practically scalable using recent topological parallelization techniques, computer science that did not exist in 2005. I suspect few people have noticed that this is now a solvable problem.
I'd argue that NNs (i) need a strong bias towards sensible structure to work well (e.g., convolutional networks exploit image topology), and that (ii) a trained NN models one particular mode of the distribution, similar to how Variational Bayes fits one particular mode.
This is an argument for ensembles of neural networks, which, as people have found, are even more effective than single NNs.