[Sticky] Lesson 2: Statistical significance in trading.
Statistical significance is another major reason why most traders fail at auto-trading. Why? Because nobody talks about it... and I really do not know why. Statistical significance tells you one extremely important thing: whether you can 'trust' the results you observe in a backtest or forward test, or whether everything you see is just due to random luck.
To make clear what statistical significance is, let's look at the scientific concept known as the P-VALUE. I will try to explain it using two examples:
- Example 1: A P-VALUE is calculated, for instance, when pharmaceutical companies test a new drug, to check whether the drug is really effective or whether the positive results are purely accidental (due to randomness or luck). As a simplified illustration, imagine a test run on 20 different groups (each of several hundred people), where only 1 of the 20 groups gets the real drug and the other 19 groups get a fake drug (placebo, e.g. candies). After some months the results are compared: is the group with the real drug really doing much better than all 19 placebo groups? If the real-drug group beats all 19 placebo groups, the chance that this happened by pure luck (i.e. if the drug actually did nothing) is 1 in 20 (1/20 = 0.05 = 5%), so the P-VALUE is 5%. This 5% is the conventional scientific threshold: results with a P-VALUE of 5% or lower are considered statistically significant. A P-VALUE of 1% (1 out of 100) is much stronger than 5%. The same applies to trading and optimization: for every single optimization result you need to check the resulting P-VALUE and determine whether it is really a profitable setting or just a random lucky shot.
- Example 2: Imagine you claim to have a system (or a crystal ball) that can predict the results of a simple sequence of coin tosses. Of course, I do not believe you, and of course I want to test whether you are telling the truth. To test your claim I perform the following experiment: each time the coin is tossed (heads or tails), I write down your prediction upfront, but I also write down the predictions of 19 random predictors (for this I use 19 other coins, which I toss in parallel with the main coin to generate random predictions). If your claim (or system) is right, I need to see a significant difference in accuracy between your predictions and my 19 random predictions after some number X of tosses. This is because my purely random predictions should, on average, be right 50% of the time (50%/50%), while your system (if valid) should give a clearly better win/loss ratio, such as 60%/40%.
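Both examples boil down to the same calculation, often called a Monte Carlo or rank-based p-value: count how many random results are at least as good as yours. Below is a minimal Python sketch; the function names are hypothetical, the `(1 + count) / (1 + N)` form is the usual convention for this kind of test (assumed here, not taken from the lesson), and the toss data in the usage part is made up.

```python
import random


def rank_p_value(observed: float, random_results: list[float]) -> float:
    """Rank-based (Monte Carlo) p-value.

    Counts how many random results are at least as good as the observed one.
    The +1 terms count the observed result itself, so with 19 random results
    the smallest possible p-value is 1/20 = 0.05, and with 99 it is 1/100 = 0.01.
    """
    at_least_as_good = sum(1 for r in random_results if r >= observed)
    return (1 + at_least_as_good) / (1 + len(random_results))


def random_predictor_hits(tosses: list[str], n_random: int = 19, seed: int = 0) -> list[int]:
    """Number of correct calls made by each of n_random coin-flip 'predictors'."""
    rng = random.Random(seed)
    return [sum(rng.choice("HT") == t for t in tosses) for _ in range(n_random)]


if __name__ == "__main__":
    # Hypothetical data: 10 tosses and the claimed predictor's calls, written down upfront.
    tosses = list("HTTHHTHTTH")
    claimed = list("HTTHHTHTHH")
    claimed_hits = sum(c == t for c, t in zip(claimed, tosses))  # 9 of 10 correct
    print(rank_p_value(claimed_hits, random_predictor_hits(tosses)))
```

If the claimed predictor beats every one of the 19 random predictors, the result is 0.05; every random predictor that ties or beats it adds another 1/20. Note that 10 tosses is far too few to mean anything; the example only shows the mechanics.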
This simple test also tells you the number X: the minimum number of coin tosses (or FX trades) needed to tell whether a given system (or EA setting) has a statistical edge, i.e. statistical significance. So, if you see a profitable setting after only 25 trades during optimization, you need to compare it to at least 19 other random trading systems (random coin tosses). If one or more random systems produce equal or better results than your optimized system, then the result is NOT statistically significant. That is why it is almost impossible to optimize using short-term data (e.g. on a weekly basis): each EA setting will typically produce no more than 25 trades, and when compared to random systems, those random systems will often produce similar or even better results! You will not know whether your system is profitable or whether the positive result during optimization was caused by a random lucky shot.
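To get a feel for the number X, here is a hedged simulation sketch under a strong simplifying assumption: every trade is a win or a loss of equal size, the "good" system wins 60% of its trades (the hypothetical edge from Example 2), and each random system wins 50%. It estimates how often the good system strictly beats all 19 random systems, treating any tie as "not significant" as in the rule above. The function name and parameters are illustrative.

```python
import random


def share_of_clear_wins(n_trades: int, edge_win_rate: float = 0.6,
                        n_random: int = 19, trials: int = 1000,
                        seed: int = 42) -> float:
    """Estimate how often a system with a real edge beats ALL random systems.

    Simplifying assumption: every trade is a win or a loss of equal size, so
    a system's result is just its number of wins. The edge system wins each
    trade with probability edge_win_rate, each random system with 0.5.
    A tie with any random system counts as "not significant".
    """
    rng = random.Random(seed)

    def count_wins(win_prob: float) -> int:
        return sum(rng.random() < win_prob for _ in range(n_trades))

    clear_wins = 0
    for _ in range(trials):
        edge = count_wins(edge_win_rate)
        best_random = max(count_wins(0.5) for _ in range(n_random))
        if edge > best_random:
            clear_wins += 1
    return clear_wins / trials


if __name__ == "__main__":
    for n in (25, 50, 150):
        print(f"{n:>4} trades: edge beats all 19 random systems "
              f"in {share_of_clear_wins(n):.0%} of runs")
```

In this toy model a genuine 60% system clears all 19 random systems in only a minority of runs at 25 trades, and that share rises markedly as the number of trades grows. The exact figures depend on the assumed edge, the seed and the number of trials.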
You can test it yourself, but the minimum valid number of trades is 50. Only after about 50 trades can a significant difference appear between all the random systems and a profitable setting. 50 is the absolute minimum; 150 or more is considered stable (this number depends on 'degrees of freedom'... keep reading...). See the following example.
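As a complementary cross-check (a classical exact one-sided binomial test, swapped in here rather than taken from the lesson), you can compute directly how likely a purely random 50/50 system is to reach a given win rate. The sketch assumes independent trades with equal-sized wins and losses and a hypothetical setting that wins 60% of its trades; the function name is illustrative.

```python
from math import comb


def binomial_p_value(wins: int, trades: int, p_random: float = 0.5) -> float:
    """One-sided exact binomial p-value: the probability that a purely random
    system (win probability p_random) wins at least `wins` of `trades` trades."""
    return sum(
        comb(trades, k) * p_random**k * (1 - p_random) ** (trades - k)
        for k in range(wins, trades + 1)
    )


if __name__ == "__main__":
    # Hypothetical setting with a constant 60% win rate at different sample sizes.
    for trades in (25, 50, 150):
        wins = trades * 3 // 5
        print(f"{trades} trades, {wins} wins: p = {binomial_p_value(wins, trades):.4f}")
```

Under these assumptions the p-value is above 0.2 at 25 trades, around 0.1 at 50 trades and well below 0.05 at 150 trades, which is consistent with treating 50 trades as an absolute minimum and 150+ as stable. A larger edge reaches significance sooner, a smaller one later.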
Figure 1: 25 trades = example of poor P-VALUE
As you can see in the example above, after 25 trades the main strategy result (gold line) is not much better than the randomly distributed results (based on a random entry strategy, i.e. coin tosses). In that case you cannot say whether this result is due to a good strategy or just pure luck, as shown by the random trading systems. This also means that the result lies within the 'first sigma' (one standard deviation) of the probability distribution of purely random results.
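The 'first sigma' remark can be made concrete with the standard deviation of a random system's win rate, sqrt(p(1 - p)/n) with p = 0.5 and n trades. A short sketch, assuming equal-sized wins and losses and a hypothetical 60% win rate (borrowed from Example 2), evaluated at the 25 and 200 trades of Figures 1 and 4; the function names are illustrative.

```python
from math import sqrt


def win_rate_sigma(trades: int, p_random: float = 0.5) -> float:
    """Standard deviation of a purely random system's win rate after `trades` trades."""
    return sqrt(p_random * (1 - p_random) / trades)


def sigmas_above_random(win_rate: float, trades: int, p_random: float = 0.5) -> float:
    """Distance of an observed win rate from pure randomness, in standard deviations."""
    return (win_rate - p_random) / win_rate_sigma(trades, p_random)


if __name__ == "__main__":
    for trades in (25, 200):
        print(f"{trades} trades: sigma = {win_rate_sigma(trades):.3f}, "
              f"60% win rate is {sigmas_above_random(0.6, trades):.2f} sigma above random")
```

At 25 trades one sigma is 10 percentage points, so a 60% win rate sits only 1 sigma above randomness; at 200 trades the same win rate is about 2.8 sigma away.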
Figure 4: 200 trades = example of good P-VALUE
Conclusion: In this example the selected strategy (EA setting) is profitable over the long term and reaches a significant P-VALUE of 5% (since its final result beats all 19 random strategies).
Thus, to be able to say whether a given setting is profitable or not, we need to test it over a long(er) period of time with a large number of trades! Optimization/backtesting results based on too few trades (<100) have very low STATISTICAL SIGNIFICANCE and cannot be trusted!