
You'd need some additional information to determine this. Suppose, for the sake of argument, that I have only one feature, X, and I extend it into two dimensions by simply duplicating it. In two dimensions, the points (X, X) lie on a perfect straight line, which should make it clear that the additional dimension didn't gain you anything.
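To make the duplicated-feature point concrete, here is a small sketch in plain Python. The values in `x` and `z` are made up, and `gram_det` and `predict` are hypothetical helper names, not from any library. It checks two things: the Gram matrix of the columns (X, X) is singular, and different weightings of the two copies give identical predictions, so the second copy carries no new information.

```python
def gram_det(a, b):
    """Determinant of the 2x2 Gram matrix of feature columns a and b.

    It is zero exactly when the two columns are perfectly collinear.
    """
    aa = sum(u * u for u in a)
    ab = sum(u * v for u, v in zip(a, b))
    bb = sum(v * v for v in b)
    return aa * bb - ab * ab


def predict(weights, columns):
    """Linear prediction: weighted sum of the feature columns, row by row."""
    return [sum(w * v for w, v in zip(weights, row)) for row in zip(*columns)]


x = [1, 2, 3, 4]  # the single real feature (made-up values)
z = [1, 0, 2, 1]  # a genuinely different feature, for contrast

print(gram_det(x, x))  # 0: the copy adds no new direction
print(gram_det(x, z))  # 59: nonzero, so z is not a multiple of x

# Three different weightings of (X, X) that all reduce to 1 * X.
fits = [predict(w, [x, x]) for w in [(1, 0), (0, 1), (3, -2)]]
print(fits[0] == fits[1] == fits[2])  # True
```

Since the weights on the two copies can be traded off freely without changing a single prediction, no fitting procedure can do better with (X, X) than with X alone.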

Testing whether to include additional terms requires an understanding of the distribution of the response, as well as the amount of collinearity among the features (how similar they are to one another). Statistics has ways to do this, but they belong more to inference than to prediction.
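As a rough illustration of "amount of collinearity", one common diagnostic (my choice here, not something the argument above depends on) is the variance inflation factor. For the special case of exactly two features it reduces to 1 / (1 - r^2), where r is their Pearson correlation. The sketch below uses made-up data and hypothetical function names, and assumes neither column is constant.

```python
import math


def pearson_r(a, b):
    """Pearson correlation of two equal-length, non-constant columns."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def vif_two_features(a, b):
    """Variance inflation factor when the model has exactly two features."""
    r2 = pearson_r(a, b) ** 2
    # Perfect collinearity: the inflation is unbounded.
    return math.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)


x = [1, 2, 3, 4, 5]  # made-up feature
z = [2, 1, 4, 3, 5]  # made-up, partially correlated feature

print(pearson_r(x, z))         # correlation between the two features
print(vif_two_features(x, z))  # finite: some shared information
print(vif_two_features(x, x))  # inf: the duplicated-feature case
```

A value near 1 means the second feature is close to independent of the first; large values mean most of it is already explained by the first.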

Heuristically, the most common way is just to look at the cross-validated classification error and compare it with and without a feature (or set of features) in question. Asking about the distribution of the cross-validated classification rate is an interesting statistical question, though!
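The with-and-without comparison can be sketched as follows. Everything here is an assumption for illustration: the toy data is constructed by hand, the classifier is a simple nearest-centroid rule standing in for whatever model you actually use, and the folds are contiguous blocks with no shuffling. The function names are hypothetical.

```python
def fit_centroids(rows, labels):
    """Return {label: centroid} computed from the training rows."""
    centroids = {}
    for label in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict_label(centroids, row):
    """Predict the label whose centroid is closest in squared distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cv_error(rows, labels, k=5):
    """k-fold classification error with contiguous folds.

    Assumes len(rows) is divisible by k.
    """
    n = len(rows)
    fold_size = n // k
    wrong = 0
    for f in range(k):
        test = set(range(f * fold_size, (f + 1) * fold_size))
        train_rows = [rows[i] for i in range(n) if i not in test]
        train_labels = [labels[i] for i in range(n) if i not in test]
        centroids = fit_centroids(train_rows, train_labels)
        wrong += sum(predict_label(centroids, rows[i]) != labels[i] for i in test)
    return wrong / n


def design(*columns):
    """Stack feature columns into a list of rows."""
    return [list(r) for r in zip(*columns)]


# Toy data: labels alternate; `noise` takes the same value for both classes
# in each adjacent pair, so it carries no label information by construction;
# `signal` separates the classes cleanly.
labels = [i % 2 for i in range(40)]
noise = [((i // 2) * 3 % 7) / 7 for i in range(40)]
signal = [2 * y + (i % 5) / 10 for i, y in enumerate(labels)]

err_base = cv_error(design(noise), labels)
err_dup = cv_error(design(noise, noise), labels)      # add a copy of the feature
err_sig = cv_error(design(noise, signal), labels)     # add a useful feature

print(err_base, err_dup, err_sig)
```

On this toy data the duplicated feature leaves the cross-validated error unchanged, while the informative one drives it down. On real data the difference between two such error estimates is itself noisy, which is why the distribution of the cross-validated rate matters.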


