A Review of the Challenge-Dechallenge-Rechallenge Design
I think it goes without saying that pointing this out doesn’t always go over well. These issues deal with the core of what different people consider to be acceptable evidence. They also inevitably tie into an extremely emotive issue: their children’s health and wellbeing. Sometimes, though, the people who have been accused of using such a fallacy do take the criticism to heart. They want to know what they could do to offer proof and be convincing.
I have been asked this several times. Frankly it is a frustrating question for me to answer, because I know the answer itself is unfortunate and unsatisfactory. And the answer is: “Not much”. The very limited list of possibilities includes:
1. Volunteering their child to be a participant in trials of their preferred treatments.
2. Conducting group-based research, such as a double-blind, placebo-controlled, crossover design.
3. Advocating for studies to be undertaken concerning their preferred treatment.
4. Conducting a well-controlled single case design.
The first option is unlikely because such trials are rare and the logistics make it difficult to get the opportunity. The second option is exceptionally hard because it requires significant technical expertise as well as considerable time and money. The third option is very indirect and often feels unsatisfying. It takes the ball out of the advocate’s court and places it in the hands of some third party.
This leaves the fourth option. This seems to be the option that many of the caregivers who accept the criticism of the post hoc fallacy attempt to use. Specifically, they attempt to make use of the Challenge-Dechallenge-Rechallenge Design. Said another way, this design has three phases: a treatment phase, followed by a baseline (non-treatment) phase, followed by another treatment phase. The advocates or caregivers claim this is a scientific design that proves a treatment was effective for their child.
To such advocates it may look like a good plan. This design is well regarded in some mainstream scientific circles; it doesn’t involve the expense and recruiting difficulty of a group design, or require detailed knowledge of inferential statistics. The basic mechanics of the design are easy enough to understand. It puts the ball back in the caregiver’s court. And maybe best of all, it seems to provide a direct answer about the effectiveness of a treatment for someone very dear to the advocate or caregiver.
Unfortunately, a certain cliché is apt here: nothing is ever easy. The treatment-baseline-treatment design, even in the best possible case, is a quasi-experimental design. It has limited validity for answering whether the treatment is what caused the improvement for the child. This has nothing to do with the fact that there is only one child. In fact, there are several excellent single subject designs that allow us to be quite confident that treatment caused the improvement. Unfortunately, this design isn’t one of them.
What is worse, even though this design might have some validity when conducted under excellent conditions and high control, it still requires years of study, guided practice, and hard work to understand. Single subject research has its own quirks and pitfalls, just like any other type of research from epidemiology to double-blind designs. The point is that you are not going to be competent in this type of design from reading a book or researching the topic on the internet. You could become so, but it takes years of hard study and practice.
Concerned caregivers or advocates who claim that they administered a treatment, withdrew it, and administered it again are not on level footing with trained researchers who use this design. They don’t have the same safeguards and controls. A good research project isn’t just the basic design; it is the level and appropriateness of the controls in the design. A trained researcher’s design might have:
1. More than one participant being investigated at the same time or in sequence to control for maturation or outside factors.
2. Specific targeted behaviors, with objective definitions, as opposed to someone’s general impression of “wellness” or competency.
3. Appropriate, reliable, and valid assessment.
4. Inter-observer agreement assessment, to make sure that those who give the assessment or record the behaviors are doing so in a valid and reliable manner.
5. Careful notation of possible confounds that emerge over the conditions.
6. Appropriate graphical analysis to observe trend, level, variability, and possible patterns indicative of additional or outside concerns (e.g., cyclical behavior).
7. Careful control of the context and environment in which the child is exposed to the treatment.
8. Submission and analysis in peer reviewed publications, where one’s knowledgeable peers can point out problems, weaknesses, or graphical shenanigans.
9. Knowledge of what type of questions the design can and cannot answer in the first place.
10. Knowledge that the conditions cannot be arbitrarily switched, and that there are rules for when we can transition to the next condition.
If one doesn’t have at least some of the above, and especially items 2, 3, 8, 9, and 10, then one has no real design to speak of.
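To make items 4 and 6 a little more concrete, here is a minimal Python sketch of the kind of numbers that sit behind a graphical analysis: the level, trend, and variability of each phase, plus a simple session-by-session agreement percentage between two observers. The function names and the scores are hypothetical, invented purely for illustration, and real single subject analysis relies on visual inspection and decision rules that this sketch does not capture.

```python
from statistics import mean, stdev

def summarize_phase(scores):
    """Summarize one phase: level (mean), trend (least-squares slope
    per session), and variability (sample standard deviation)."""
    if len(scores) < 2:
        raise ValueError("need at least two sessions per phase")
    xs = list(range(len(scores)))
    x_bar, y_bar = mean(xs), mean(scores)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, scores))
    return {"level": y_bar, "trend": sxy / sxx, "variability": stdev(scores)}

def interobserver_agreement(obs_a, obs_b):
    """Percentage of sessions on which two observers recorded the same value."""
    if len(obs_a) != len(obs_b) or not obs_a:
        raise ValueError("observers must score the same, non-empty set of sessions")
    matches = sum(1 for a, b in zip(obs_a, obs_b) if a == b)
    return 100.0 * matches / len(obs_a)

# Hypothetical scores for a treatment-baseline-treatment sequence (invented data).
phases = {
    "treatment 1": [7, 8, 8, 9],
    "baseline": [5, 4, 5, 4],
    "treatment 2": [8, 8, 9, 9],
}
for name, scores in phases.items():
    print(name, summarize_phase(scores))
```

Even with tidy summaries like these, the numbers only describe what happened in each phase; they do not by themselves rule out maturation, outside events, or cyclical patterns.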
Look at Figure 1. This is a graph of a hypothetical treatment-baseline-treatment design. There are only two points of comparison in this design: the first treatment phase to the baseline, and then the baseline to the second treatment phase. We have absolutely no idea what was going on before the first treatment phase was implemented. And even if there was a dip in the baseline condition, we have no idea if this was an artifact of an outside event or maybe based on a cyclical pattern of behavior. If you only have two points of comparison, it is very hard to know.
This is Figure 2. This is what happens when someone tries to use a treatment-baseline-treatment design for a task that involves learning. You get stability where we should see variability between the phases. So, even if the treatment was biomedical and the measure was of something not easily lost, like acquisition of language, the negative effect of withdrawing the treatment would not be observed during the language tests.
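The problem with learned skills can be sketched with a toy simulation. This is only an illustration under loud assumptions: the function `simulate_tbt`, the baseline score of 3, and the treatment effect of 5 are all invented, and the model simply assumes that a reversible measure drops back to baseline as soon as treatment is withdrawn, while a learned skill is retained once gained.

```python
def simulate_tbt(sessions_per_phase=3, baseline=3, effect=5, reversible=True):
    """Toy scores for a treatment-baseline-treatment sequence.

    Assumption: a reversible measure returns to baseline when treatment
    is withdrawn; a non-reversible (learned) measure keeps whatever
    level it has already reached.
    """
    scores = []
    retained = baseline  # highest level reached so far (learned skill)
    for treatment_on in (True, False, True):
        for _ in range(sessions_per_phase):
            if treatment_on:
                retained = baseline + effect
                scores.append(baseline + effect)
            elif reversible:
                scores.append(baseline)
            else:
                scores.append(retained)
    return scores

print(simulate_tbt(reversible=True))   # [8, 8, 8, 3, 3, 3, 8, 8, 8]
print(simulate_tbt(reversible=False))  # [8, 8, 8, 8, 8, 8, 8, 8, 8]
```

In the second case the line is flat across all three phases, so there is no change between conditions to attribute to the treatment, whether or not the treatment actually did anything.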