I'm summarizing an article, "Data Prophet," from the October 2012 issue of Wired. It's an interview with Nate Silver, a statistician and data scientist.
He pointed out a number of problems with our typical predictive models.
Out-of-Sample Problems
Here we have no data for a phenomenon because we never collected it in the first place. For example, if your model excludes real estate as an industry, then whatever rise, fall, or crisis happens in real estate, you will have no data on it, and your economic prediction will miss the problems and effects arising from real estate.
Overfitting Problem
Here, you're mistaking mere coincidences or co-occurrences for patterns. For example, you observe that whenever ice cream sales go up, murder cases go up too, and you assume that there is a pattern.
If you curb ice cream sales, will you then prevent some murder cases?
Nate advises that, instead of looking for ideas in the data, we should see and accept what is really there.
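To convince myself how easily coincidences can pass for patterns, here is a small sketch of my own (it is not from the interview). It compares one short series of pure noise against many other noise series and reports the strongest correlation it stumbles on. The function names, the series length of 10, and the 1,000 candidates are all my assumptions, chosen only for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_chance_correlation(n_points=10, n_candidates=1000, seed=0):
    """Strongest |correlation| between a noise target and unrelated noise series."""
    rng = random.Random(seed)
    # The "target" is pure noise: there is nothing real to predict here.
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        # Each candidate is also pure noise, unrelated to the target.
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best


if __name__ == "__main__":
    print("strongest correlation found by chance:", round(best_chance_correlation(), 2))
```

With so few data points and so many candidates, some series will line up with the target strongly even though none of them is related to it. Searching until something fits is how a coincidence ends up looking like a pattern.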
Problem of Over-reacting to New Data
Here, your model may be too flexible, or too sensitive to new data or new types of data.
Here, I can give a personal experience. Recently I read in Newsweek about a neurosurgeon's life-after-death experience. He said that he got to a strange place and met strange beings, and that our senses were fused there, so that he could see a sound and hear a color at the same time. I believed every word of his. That is, I accepted that a person like him, who has lived the way he has, can get to a place like that.
Nate Silver wisely warns that more is worse if we don't have a good framework, and that we'd better stick to basic models if we are prone to over-reacting to new data. Wise poker players, stock investors, and soldiers all know this.
I asked myself what would happen if I got there myself. I would hear-see a sound-smell-sight. Then, I would see-hear it dissolve and another sense arise ... dissolve and arise ... dissolve and arise ... dissolve and arise ... I hope my basic, simplistic model will work there too.
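As a rough illustration of over-reacting to new data, here is a sketch of my own using exponential smoothing. The technique is my choice for the example, not something Nate Silver prescribes, and the readings are made-up numbers. The `alpha` parameter controls how much weight each new observation gets: near 1 the estimate chases every new value, near 0 it barely moves.

```python
def exponential_smoothing(observations, alpha):
    """Running estimate: alpha is the weight given to each new observation."""
    estimate = observations[0]
    history = [estimate]
    for value in observations[1:]:
        # Blend the new observation with the previous estimate.
        estimate = alpha * value + (1 - alpha) * estimate
        history.append(estimate)
    return history


# Made-up readings: a steady value of 10 with one surprising spike of 50.
readings = [10.0] * 5 + [50.0] + [10.0] * 4

jumpy = exponential_smoothing(readings, alpha=0.9)   # very sensitive to new data
steady = exponential_smoothing(readings, alpha=0.1)  # basic, slow-moving model

if __name__ == "__main__":
    print("sensitive model at the spike:", jumpy[5])
    print("basic model at the spike:", steady[5])
```

At the spike, the sensitive model jumps almost all the way to the outlier, while the basic model shifts only a little. Neither setting is right in general. The sketch only shows what "too sensitive" looks like when a single surprising data point arrives.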
The Need to Test in the Right Environment
A problematic situation might go like this: you build your model with data from New Mexico, and then you test the model in New Mexico again.
According to Nate Silver, the right environment is "an environment where you don’t know what’s going to happen."
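To make the New Mexico example concrete, here is a minimal sketch of my own. It uses a model that simply memorizes its training data (a one-nearest-neighbor lookup) and scores it twice: once on the data it was built from, and once on data it has never seen. All the numbers are invented, and the names `new_mexico` and `other_state` are hypothetical labels for the two data sets.

```python
def predict(training_data, x):
    """1-nearest-neighbor: return the y of the training point whose x is closest."""
    nearest_x, nearest_y = min(training_data, key=lambda point: abs(point[0] - x))
    return nearest_y


def mean_absolute_error(training_data, evaluation_data):
    """Average absolute gap between predictions and the actual values."""
    errors = [abs(predict(training_data, x) - y) for x, y in evaluation_data]
    return sum(errors) / len(errors)


# Invented (x, y) pairs standing in for data collected in New Mexico.
new_mexico = [(1, 2.5), (2, 3.6), (3, 6.4), (4, 7.7), (5, 10.3)]

# Invented pairs from somewhere the model has never seen.
other_state = [(1.4, 2.8), (2.6, 5.2), (3.4, 6.8), (4.6, 9.2)]

in_sample_error = mean_absolute_error(new_mexico, new_mexico)
out_of_sample_error = mean_absolute_error(new_mexico, other_state)

if __name__ == "__main__":
    print("error when tested on New Mexico again:", in_sample_error)
    print("error when tested on unseen data:", out_of_sample_error)
```

Tested on its own training data, the memorizing model looks perfect, because it already knows every answer. Only the unseen data gives an honest picture of how it performs when it does not know what is going to happen.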