A great and insightful article. A common theme I've seen in practice is folks wh...

jmount · on April 16, 2012

My definition of a "deep understanding of ML" definitely excludes people who immediately try "the most sophisticated algorithm." Buzzword jockeys try all the cool stuff first, but most practitioners I have met try basic statistical methods first. Then, when they see what issue they need to overcome, they bring in a method designed to help with that issue.

ma2rten · on April 16, 2012

Really? I'd think that most ML people are well aware of the importance of data cleansing and feature extraction. Also my experience is that domain knowledge often (but not always - depends on the domain) helps surprisingly little. Feature extraction is mostly an iterative approach anyway: you define some very simple features, you look at the mistakes, you add some features and repeat until you are happy. Ideally you also do some visualization in there somewhere.

tel · on April 16, 2012

I think there are essentially two "deep" understandings of ML prevalent today. The first is more common: the ability to do the calculus, algebra, and probability derivations required to design complex ML algorithms combined with the CS knowledge to find/design a good algorithm and the software design skill to actually implement it on real, "big" data.

No doubt this is a difficult position to master and those who perform well are able to tackle lots of mathematical and computational challenges. They are also model builders who (have a tendency to) relentlessly seek complex models in order to solve complex problems.

The other, rarer side is the learning theorist who may or may not understand the model building, algorithmic, and computational tools but understands well the theories which allow us to have reasonable expectations that the tools of the first group will work at all. These guys have a funny story in that they were the old statisticians who got a major egg-on-the-face after proclaiming that essentially all of ML was impossible. Turns out the first group managed to redefine the problem slightly and make major headway (and money).

---

The thing I want to bring to light, however, is that the second group knows the math that bounds the capacities of ML algorithms. This isn't easy. It's one thing to say you recognize that the curse of dimensionality exists, but it's another to have felt its mathematical curves and to build an intuition for what forces are sufficient to cause disruption.

The more experience you have with the learning maths, the more likely you are, I feel, to apply very simple algorithms, to be scared enough of "little x's" (real data) to treat them with great care, and to explore the problem space with a good sense of which steps will lead you to folly.
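
One way to get a feel for the curse of dimensionality is to watch distances stop being informative as the dimension grows. The following is a minimal sketch, not a rigorous experiment: it assumes points drawn uniformly from the unit cube, and the function name `relative_contrast` and the sample sizes are made up for illustration.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest distance from one random
    query point to n_points uniform random points in [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return (max(dists) - min(dists)) / min(dists)

if __name__ == "__main__":
    # As dim grows, the nearest and farthest neighbours end up at
    # nearly the same distance, so the contrast shrinks.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

When the contrast is close to zero, "nearest neighbour" carries very little information, which is one concrete form of the disruption described above.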

---

It's a fine line between the two, though. Stray too far to the first group and you'll spend a month building an algorithm that does a millionth of a percentage point better than Fisher's LDA. Spend too much time in the second camp and you'll confidently state that no algorithm exists that does better than a millionth of a percentage point over Fisher's LDA... and then lose purely by never trying.

sireat · on April 17, 2012

Our Data Mining (different field but somewhat related) professor had this quote "All models are wrong, but some models are useful" on the first day of university.

You can build an extremely complicated model that is not useful, where a simpler one might suffice.

tel · on April 17, 2012

I've heard that line referred to as Box's Razor. It's definitely the right heuristic, but it's interesting to see that even if your model is right, you're still in trouble if it's too complex. This is a sort of bias/variance tradeoff.

irahul · on April 16, 2012

> On the other hand, people who know a bit about ML but understand the domain better start by applying intuition to data cleansing and then follow up with simpler algorithms.

I find data cleansing (if you are including feature selection) hard, and I consider it a refinement. If I am working on a classification problem, I start with Naive Bayes with a trivial feature generator (if words are features, split on whitespace and discard some symbols), train it, and cross-validate. Depending on the results of the cross-validation on differently sized data sets (say 100 tweets, 200, 500, 1000, 2000, 5000), I decide whether to refine Naive Bayes further or pick another algorithm.
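
For concreteness, here is a rough sketch of that workflow using only the standard library. It is an illustration under stated assumptions, not anyone's actual code: the "tweets" are synthetic and generated by a made-up `make_toy_data` helper, the classifier is a plain multinomial Naive Bayes with add-one smoothing, and all function names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

SYMBOLS = ".,!?:;\"'()"

def tokenize(text):
    """Trivial feature generator: lowercase, split on whitespace, strip symbols."""
    tokens = (w.strip(SYMBOLS) for w in text.lower().split())
    return [t for t in tokens if t]

def train_nb(docs, labels):
    """Count words per class and documents per class."""
    word_counts, class_counts, vocab = defaultdict(Counter), Counter(labels), set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab

def predict_nb(model, text):
    """Pick the class with the highest log prior + summed log likelihoods."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def cross_validate(docs, labels, k=5):
    """Mean accuracy over k folds; fold i holds out every k-th example."""
    accs = []
    for i in range(k):
        train = [j for j in range(len(docs)) if j % k != i]
        test = [j for j in range(len(docs)) if j % k == i]
        model = train_nb([docs[j] for j in train], [labels[j] for j in train])
        hits = sum(predict_nb(model, docs[j]) == labels[j] for j in test)
        accs.append(hits / len(test))
    return sum(accs) / k

def make_toy_data(n, seed=0):
    """Synthetic stand-in for labelled tweets (entirely made up)."""
    rng = random.Random(seed)
    pos, neg = ["great", "love", "happy", "good"], ["awful", "hate", "sad", "bad"]
    filler = ["the", "movie", "today", "phone", "really", "was"]
    docs, labels = [], []
    for _ in range(n):
        label = rng.choice(["pos", "neg"])
        signal = pos if label == "pos" else neg
        words = [rng.choice(signal) for _ in range(2)]
        words += [rng.choice(filler + pos + neg) for _ in range(3)]
        rng.shuffle(words)
        docs.append(" ".join(words))
        labels.append(label)
    return docs, labels

def learning_curve(sizes=(100, 200, 500, 1000), k=5):
    """Cross-validated accuracy on the first n examples, for each n in sizes."""
    docs, labels = make_toy_data(max(sizes))
    return {n: cross_validate(docs[:n], labels[:n], k) for n in sizes}

if __name__ == "__main__":
    for n, acc in learning_curve().items():
        print(n, round(acc, 3))
```

If accuracy is still climbing as the data set grows, more data may help; if it has flattened out at a poor level, that suggests refining the features or trying a different model.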

I avoid SVM because I have a hard time figuring out the kernel and relation between data. I mostly don't use linear classifiers because the relation is very rarely linear.

Generally, if the features are pseudo-independent (Naive Bayes assumes independent events, but it might work fine even if the events aren't independent), Naive Bayes does the job. If not, it's time to refine the feature generator and selector.

robrenaud · on April 16, 2012

Naive Bayes is a linear classifier, and it makes much stronger assumptions than other linear classifiers.
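
To see why, here is a short derivation for the two-class multinomial case (an assumption made for simplicity; the symbols are introduced here and are not from the thread). With $x_j$ the count of word $j$, $\pi_c$ the class prior, and $\theta_{cj} = P(\text{word } j \mid y = c)$, the log-odds come out linear in $x$:

```latex
% The multinomial coefficient and P(x) cancel in the ratio.
\begin{align*}
\log \frac{P(y=1 \mid x)}{P(y=0 \mid x)}
  &= \log \frac{\pi_1 \prod_j \theta_{1j}^{x_j}}{\pi_0 \prod_j \theta_{0j}^{x_j}} \\
  &= \underbrace{\log \frac{\pi_1}{\pi_0}}_{b}
     + \sum_j x_j \underbrace{\log \frac{\theta_{1j}}{\theta_{0j}}}_{w_j} \\
  &= b + w^\top x
\end{align*}
```

The decision boundary $b + w^\top x = 0$ is a hyperplane, with the weights fixed by per-feature counts rather than fit jointly.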

irahul · on April 16, 2012

My bad.

Regarding stronger assumptions, is there anything other than independence (thus the name naive) that it assumes?

tensor · on April 16, 2012

That's the main assumption that people care about. In contrast, logistic regression (aka maximum entropy) does not make this assumption. As a go-to first classifier I would suggest multi-class logistic regression with regularization.
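
As a rough illustration of what that looks like, here is a minimal sketch of multi-class (softmax) logistic regression with an L2 penalty, trained by plain batch gradient descent. Everything in it is an assumption for the example: the function names, the hyperparameters (`l2`, `lr`, `epochs`), and the synthetic three-blob data set. In practice you would more likely reach for a library implementation.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_scores(W, b, x):
    """Linear score for each class: W[c] . x + b[c]."""
    return [sum(w * xi for w, xi in zip(W[c], x)) + b[c] for c in range(len(W))]

def train_softmax(X, y, n_classes, l2=0.01, lr=0.2, epochs=500):
    """Minimise mean cross-entropy + (l2 / 2) * ||W||^2; biases are not penalised."""
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        grad_W = [[0.0] * d for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, label in zip(X, y):
            probs = softmax(class_scores(W, b, x))
            for c in range(n_classes):
                err = probs[c] - (1.0 if c == label else 0.0)
                grad_b[c] += err
                for j in range(d):
                    grad_W[c][j] += err * x[j]
        for c in range(n_classes):
            b[c] -= lr * grad_b[c] / n
            for j in range(d):
                # The l2 * W term is the gradient of the regulariser.
                W[c][j] -= lr * (grad_W[c][j] / n + l2 * W[c][j])
    return W, b

def predict(W, b, x):
    """Return the index of the highest-scoring class."""
    scores = class_scores(W, b, x)
    return max(range(len(scores)), key=lambda c: scores[c])

def make_blobs(n_per_class=50, seed=0):
    """Synthetic data: three Gaussian blobs in 2D (made up for this example)."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
    X, y = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            X.append([rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)])
            y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_blobs()
    W, b = train_softmax(X, y, n_classes=3)
    acc = sum(predict(W, b, x) == t for x, t in zip(X, y)) / len(X)
    print("training accuracy:", round(acc, 3))
```

Unlike Naive Bayes, the weights here are fit jointly, so correlated features share credit instead of being counted twice; the penalty keeps the weights from growing without bound on separable data.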

joe_the_user · on April 17, 2012

That has been my experience with AI programming.

But I would take a more pessimistic interpretation of this.

That is: all our "learning algorithms" have failed to learn, and those with some clever heuristics succeed versus the broken methods we have so far.

cema · on April 17, 2012

I would offer a somewhat more optimistic take: humans are still better learners than machines are, and our algorithms still do not capture well the way our thinking (and intuition) works. Really, we only started with machine learning a few decades ago; catching up with millions of years of evolution can be expected to take a bit longer.

joe_the_user · on April 17, 2012

But I don't really think of that as positive.

Maybe it's the pain of my previous AI job talking, but when the choice is just an opaque hunch of an expert, it doesn't feel like a victory for human intelligence. A victory for human intelligence looks much like the discovery of a physical law, where you both deal with a phenomenon and communicate how someone else can also deal with it.

What is quintessentially human in the modern sense is human beings understanding ourselves rather than

mturmon · on April 16, 2012

Agreed. Another ingredient is sustained engagement with the problem, so that your algorithm works not just for a pre-selected demo, but actually provides noticeable performance gains for real data.