What is datasnooping? In the trial-run performance of CASTrader, I was quite pleased that it was able to detect price patterns in the market with no datasnooping (aka data dredging or even data mining, although "data mining" commonly refers to "good" data analysis). Current academic research on possible trading patterns in the market typically goes to great lengths to prevent datasnooping, because otherwise the results are highly suspect. You may ask, "If I find a pattern in the market that is highly profitable, shouldn't I exploit it?" Not necessarily - how many other patterns did you discard to find that one? Simply put, if it's too many, you may be guilty of datasnooping.
What happens when you datasnoop? You fool yourself by finding fleeting patterns that will not last. Those profits, however real in the past, are illusory going forward. At best, you might match the market. At worst, you could underperform it. The worst problem in my opinion, though, is when traders attempt to pick the best trading system on a so-called risk-adjusted basis. Depending on the way you measure performance, you could convince yourself you have a high-return system with some added volatility, when all you really have is mediocre returns with high volatility.
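To make the "illusory profits" point concrete, here is a minimal sketch of datasnooping on pure noise. Everything in it is an assumption made for illustration and has nothing to do with CASTrader's internals: the function name `snoop_once`, the zero-mean Gaussian "market", and the idea that a "strategy" is just a random long/short position each day. No strategy has a real edge, yet the best of many always looks good in-sample.

```python
import random

def snoop_once(rng, n_strategies=100, n_days=100):
    """Pick the best of many random strategies in-sample, then score it out-of-sample.

    Returns (in_sample_pnl, out_of_sample_pnl) for the in-sample winner.
    P&L is a plain sum of daily returns (no compounding), to keep the sketch short.
    """
    # Hypothetical market: zero-mean daily returns, so no strategy can have a real edge.
    in_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
    out_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]

    best_in, best_out = float("-inf"), 0.0
    for _ in range(n_strategies):
        # A "strategy" here is just a random long (+1) or short (-1) position each day.
        pos_in = [rng.choice((-1, 1)) for _ in range(n_days)]
        pos_out = [rng.choice((-1, 1)) for _ in range(n_days)]
        pnl_in = sum(p * r for p, r in zip(pos_in, in_sample))
        pnl_out = sum(p * r for p, r in zip(pos_out, out_sample))
        # Datasnooping step: keep only the in-sample winner, discard the rest.
        if pnl_in > best_in:
            best_in, best_out = pnl_in, pnl_out
    return best_in, best_out

# Repeat the experiment a few times and average the winner's results.
rng = random.Random(0)
results = [snoop_once(rng) for _ in range(20)]
avg_in = sum(r[0] for r in results) / len(results)
avg_out = sum(r[1] for r in results) / len(results)
print(f"in-sample winner: {avg_in:+.3f}   same strategy out-of-sample: {avg_out:+.3f}")
```

The in-sample figure is the maximum over 100 tries, so it is positive almost by construction. The out-of-sample figure is a single draw from a zero-mean process, so on average it hovers around zero. The more strategies you discard to find the winner, the wider that gap gets.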
Datasnooping Reality Checks. Academics have come up with "reality checks" for datasnooping, which certainly help build confidence in sorting out datasnooped effects from real ones. In my opinion, however, these reality checks are exactly that - a check, not a guarantee. In other words, I don't think a pass/fail on the reality check necessarily means a system will pass/fail going forward.
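For a feel of how such a check works, here is a simplified sketch in the spirit of the bootstrap-based reality checks, not a faithful reproduction of any published procedure. The function name `reality_check_pvalue` and its parameters are hypothetical, and it uses plain day-by-day resampling, whereas the academic versions generally use block-style bootstraps that respect time-series dependence. The idea: ask how often the best of all the strategies you tried would look this good if none of them had any real edge.

```python
import random

def reality_check_pvalue(returns, n_boot=500, rng=None):
    """Approximate p-value for the best strategy, accounting for all strategies tried.

    returns: list of strategies, each a list of per-period excess returns
             (all the same length, aligned on the same days).
    """
    rng = rng or random.Random()
    n = len(returns[0])
    means = [sum(s) / n for s in returns]
    observed = max(means)  # performance of the best strategy found

    # Center each strategy at zero mean to impose the "no edge" null hypothesis.
    centered = [[x - m for x in s] for s, m in zip(returns, means)]

    exceed = 0
    for _ in range(n_boot):
        # Resample the same days for every strategy (simplification: i.i.d. days).
        idx = [rng.randrange(n) for _ in range(n)]
        boot_max = max(sum(s[i] for i in idx) / n for s in centered)
        if boot_max >= observed:
            exceed += 1
    return exceed / n_boot

# Usage: 20 pure-noise strategies, then the same set plus one with a real edge.
rng = random.Random(1)
noise = [[rng.gauss(0.0, 0.01) for _ in range(100)] for _ in range(20)]
edge = [rng.gauss(0.01, 0.01) for _ in range(100)]
print("noise only:", reality_check_pvalue(noise, n_boot=200, rng=rng))
print("with edge: ", reality_check_pvalue(noise + [edge], n_boot=200, rng=rng))
```

A small p-value says the best result is hard to explain by luck across everything that was tested. It says nothing about strategies that were tried and left out of `returns`, and nothing about whether the market will behave the same way tomorrow, which is why I treat it as a check rather than a guarantee.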
CASTrader and datasnooping. The first incarnation of CASTrader was developed without datasnooping. The pattern-finding algorithms were novel (and none were discarded or biased), and incorporated no preconceived notions of what patterns should look like - they were not based on anyone else's work on technical analysis that could have been datasnooped. Furthermore, traders in CASTrader are created randomly and with equal starting capital, and thrown at the data in a "sink or swim" fashion. In other words, I'm testing all hypotheses collectively from the get-go and discarding none. This is about as far from datasnooping as one can possibly get, in my opinion. I'm fully confident that the patterns found by CASTrader I are not the result of datasnooping. Of course, I haven't exactly applied CASTrader to the market yet, and therein lies the rub.
CASTrader II will be subject to the datasnooping effect, and I'll address that in Part II.