More anecdata: we consistently outperform lightgbm, xgboost, random forests, lin...

suresk · on Oct 11, 2019

This was a really interesting and insightful comment, thanks for sharing. I think the conclusion I shared in my sibling comment was probably a little too broad.

I particularly like this:

> In my experience it really comes down to how many “tricks” you know for each algorithm and how well can you apply and combine these “tricks”. The difference is that neural networks have many more of these tricks and a broader coverage of research detailing the interactions between them.

This is pretty true - the lack of knobs to turn on something like XGBoost or LightGBM both make it pretty easy to get good results and harder to fine tune results for your specific problem. Maybe this isn't the most correct way to look at it, but I've always sort of pictured it as curve where you are plotting effort vs results, and the one for LightGBM/XGBoost starts out higher but is more flat, and the NN one is steeper.

I guess reading your post makes me wonder where the two curves cross? Do you have good intuition for that, or do you feel so comfortable with neural networks that they are sort of your default? I peeked at the company you have listed in your bio, and it looks like you have pretty deep experience with neural networks and work with other people who have been in research roles in that area too, and I wonder how that changes your curve compared to the average ML practitioner? Certainly figuring out how to pick the best layer combinations, optimizer, loss functions, etc benefits hugely from intuition gained over years of experience.

jimfleming · on Oct 11, 2019

I think your conclusions are accurate. For many problems LightGBM or xgboost can often yield decent results in short amounts of time and for many problems that’s sufficient. A lot of the work we do is about pushing the results as far as we can take them and the business case justifies the extra time it can take to get there. For those types of problems, today, we would probably choose a neural network because then we have a lot more knobs as you mentioned.

Just like the rest of ML, whether neural networks are the right choice still depends on the problem at hand and the team implementing the solution. It definitely impacts where the performance / time curves intersect. If we just need something decent fast, or we’re working with another team that doesn’t have the same background, we tend to focus on approaches with fewer moving pieces. If we need the best possible performance, have a qualified team to get there, and have the time to iterate on development then the curves would favor neural networks.

oli5679 · on Oct 11, 2019

Do you have any advice on how to increase performance of NN? Id be interested to see some examples of NN doing better than lgbm benchmark on medium size tabular data. What black magic is needed to achieve this? Would be super valuable to my job:)

This tuning approach gets good results for Lightgbm. I'd recommend using TimeSeriesSplit.

https://www.kaggle.com/nanomathias/bayesian-optimization-of-...

I've seen colleagues do something like this, or random search over NN architecture (NUM layers, nodes per layer, learning rate, dropout rate), always falling short of results this archives, despite far longer time to code up an tune model.

jimfleming · on Oct 11, 2019

It’s really very problem dependent. I allude to a few low-hanging things in my post above: e.g. feature engineering. Just because neural networks have an easier time learning non-linear feature transformations doesn’t mean its good to ignore feature engineering entirely.

Possibly more important is to focus on the process for how you derive and apply model changes. You get some model performance and then what? Rather than throwing something else at the model in a “guess-and-check” fashion, be methodical about what you try next. Have analysis and a hypothesis going in to each change that you make and why it’s worth spending time on and why it will help. Back that hypothesis by research, when possible, to save yourself some time verifying something that someone else has already done the legwork on. Then verify the hypothesis with empirical results and further analysis to understand the impact of the change. This sounds obvious (it’s just the scientific method) but in my experience ML practitioners and data scientists tend to forget or were never taught the “science” part. (I’m not accusing you of this; it just tends to be my experience.)

Random search, AutoML, hyperparameter searches, etc. are incredibly inefficient at model development so they’ll rarely land you in a better place unless a lot of upfront work has been put in. For us, they’re useful for two things: analysis and finalization. For analysis the search should be heavily constrained since you’re trying to understand something specific. For finalization of a model before going into production, a search on only the most sensitive parameters identified during development usually yields additional gains.

Stasis5001 · on Oct 11, 2019

Any good references on the art part, or is it just an intuition you develop over time? In my experience, all the ML education will teach you a ton of theory and basics but none of the practical details you're referring to.

jimfleming · on Oct 11, 2019

It takes time and a lot of hands-on experience. Many ML teams tend to work on one or just a few tightly coupled project for years. By contrast, we’ve worked on a lot of unique projects with real-world constraints so it gives us a different perspective. An important part has been developing a rigorous process; sort of a framework for applying the “art”. As you mention, this often isn’t covered in ML or data science education, which tends to focus on (important) fundamentals.