"I've just read your September 2005 letter to the BMJ, 'The statistical fluctuation in the natural recovery rate between control and treatment group dilutes their results.' You wrote that you find a 10% probability that a 'minimum worthwhile difference between the two arms set at 1.0% syllables stuttered' occurs due to statistical fluctuation and not due to a treatment effect. The standard aims for a less than 5% chance that the observed results occur by chance (usually expressed as p<.05), whereas you're observing that this trial achieves only 10% (p<.10). Is that what you're saying? If so, 90% is obviously not as powerful a result as 95%, but it's not worthless either."

Not quite. I show that two control groups (i.e. I assume no treatment effect at all) differ by the minimum worthwhile difference in 10% of all cases, because by chance the two groups contain different percentages of natural recoverers and therefore start from different baselines. (This effect gets smaller with increasing sample size.) Then I argue: if I have a 10% chance of seeing an "effect" when there is no treatment effect, then something is wrong, because with both groups untreated I should expect an effect in essentially 0% of cases. So their p-value (which they claim is below 0.01) MUST be wrong and higher, because the t-test assumes the two groups start from the same baseline. Their statistics are wrong, and they must redo them. What the correct value is is not easy to compute.
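The baseline-imbalance argument is easy to check by simulation. The sketch below is mine, not from the letter: the group size, natural recovery probability, and "worthwhile difference" threshold are hypothetical stand-ins chosen only to illustrate the mechanism, not the trial's actual numbers. It draws two untreated groups from the same natural-recovery process and counts how often their recovery rates differ by more than the threshold purely by chance.

```python
import random

def chance_of_spurious_difference(n=27, p_recover=0.40, threshold=0.20,
                                  trials=20_000, seed=1):
    """Simulate pairs of untreated groups of size n, where each child
    recovers naturally with probability p_recover. Return the fraction
    of pairs whose recovery rates differ by more than `threshold` --
    a "worthwhile difference" produced by chance alone.
    All parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = sum(rng.random() < p_recover for _ in range(n))
        b = sum(rng.random() < p_recover for _ in range(n))
        if abs(a - b) / n > threshold:
            hits += 1
    return hits / trials

# Small groups show a spurious "effect" surprisingly often;
# with larger groups the chance shrinks, as noted above.
small = chance_of_spurious_difference(n=27)
large = chance_of_spurious_difference(n=200)
```

Under these made-up parameters the small-group rate comes out in the neighbourhood of 10%, while the large-group rate is close to zero, which is the sample-size dependence described above.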
"Possibly instead of choosing an effect size (1.0% difference in syllables stuttered) that allowed them to claim p<.10, they should have claimed a smaller effect size (for example, 0.5% difference in syllables stuttered) that would have supported the stronger claim of p<.05."

What you are doing is shifting the criteria to get a result! See my arguments above: we need a strong signal to be sure that no other systematic effect can destroy it.
"Though I don't fully understand your argument in your Sept 2005 letter, I would agree if you're merely saying that their results may not be as strong as they're suggesting. But on the other hand it seems to me that your claim that 'the random control trial did not show efficacy despite their claims' is also too strong; 90% may not be 95%, but a 90% number is still strongly suggestive."

I am saying that the p-value is much higher than they report, but I do not know how much higher.
However, there is at least one more crucial flaw in the study, which I did not discuss in the letter but did in my talk, and which supports my claim that they have not convincingly shown efficacy: they only observe the kids for 9 months. Assume that the only thing Lidcombe does is accelerate the recovery of those kids who would have recovered anyway; then I still see a treatment effect over the short term. Here is how it works. Let's say at the start 100% of the kids stutter, and after 2 years only 20% still stutter (those who do not recover naturally). I also assume that 40% recover naturally per year: 40% in the first year, 40% in the second year, and 20% never. If I run the experiment for one year, only 40% recover in the control group, but 80% recover in the treatment group (remember, I assume Lidcombe only makes the natural recoverers recover faster). So I see a 40-percentage-point higher recovery rate in the treatment group! But if I wait one more year, the other 40% of the control group recover naturally, and then there is NO difference between the two groups any more. So they need a longer observation period to control for this effect. Another aspect might be to look for relapses of kids...
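The arithmetic above can be written out as a tiny model. This is just the worked example from the text restated in code, under the stated assumption that treatment only accelerates recoveries that would have happened anyway; the function and constant names are mine.

```python
# Natural-history model from the example: of 100 stuttering children,
# 40% recover in year 1, another 40% in year 2, and 20% never recover.
NATURAL_CUMULATIVE = {1: 0.40, 2: 0.80}  # cumulative natural recovery by year
EVENTUAL_RECOVERY = 0.80                 # fraction that ever recovers

def recovered_fraction(years, treated):
    """Cumulative recovery fraction after `years`, assuming the treatment
    does nothing except pull all eventual recoveries into year 1."""
    if treated:
        return EVENTUAL_RECOVERY         # all eventual recoverers done by year 1
    return NATURAL_CUMULATIVE[years]

# One-year trial: an apparent effect of 40 percentage points.
gap_1yr = recovered_fraction(1, True) - recovered_fraction(1, False)  # 0.80 - 0.40
# Two-year follow-up: the control group catches up and the gap vanishes.
gap_2yr = recovered_fraction(2, True) - recovered_fraction(2, False)  # 0.80 - 0.80
```

A 9-month trial sits inside the first year of this model, where the apparent effect is largest, which is exactly why a longer observation period is needed to rule this mechanism out.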