Herman Rubin <email@example.com> replied:
>It is claimed that the algorithm was derived by some heuristic procedure
>using the information about the 34 first rank individuals. Since
>if one looks hard enough for an algorithm providing such an association,
>one is likely to find one, this cannot be used as evidence for something
>However, IF none of the information about the second rank individuals
>was used in producing the algorithm, it could be tested on them. This
>was the test proposed by someone who knew nothing of what procedure
>they had used.
This is in accordance with the story as I heard it - the study of
first-rank individuals was, in fact, done first. But the rebuttal theory
offered by the reviewers was that given the data about the first-rank
individuals, one could concoct a study which produced amazing results
(though it's pretty difficult to produce one-in-a-thousand-million results,
no matter what!). Since the study of the second-rank individuals was
clearly set out in advance of any actual data-gathering effort, it was
considered genuine - and it produced results quite close to the previous
study (actually, just slightly _less_ likely to show up at random). I have
a copy of an earlier paper showing the studies on both groups of individuals.
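
To make the "look hard enough and you will find one" objection concrete, here
is a rough simulation sketch. It is not the codes group's procedure or their
distance measure; everything in it is an assumption for illustration: each
candidate "rule" is pure coin-flipping noise, the second group is assumed to
be the same size as the first (34), and the function names are hypothetical.
The rule that looks best on the first group is then checked on a second group
that played no part in choosing it.

```python
import math
import random


def binomial_tail(k, n):
    """Exact P(X >= k) for X ~ Binomial(n, 1/2)."""
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n


def best_of_many(n_rules=1000, n_first=34, n_second=34, seed=0):
    """Try many noise-only rules; keep the one scoring highest on group one.

    Returns (hits on first group, hits on second group) for the chosen rule.
    The second-group hits are never consulted while choosing.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_rules):
        # A "rule" with no real signal: each individual is a hit with p = 0.5.
        first = sum(rng.random() < 0.5 for _ in range(n_first))
        second = sum(rng.random() < 0.5 for _ in range(n_second))
        if best is None or first > best[0]:
            best = (first, second)
    return best


if __name__ == "__main__":
    first_hits, second_hits = best_of_many()
    print("first group :", first_hits, "hits, naive tail probability",
          binomial_tail(first_hits, 34))
    print("second group:", second_hits, "hits, tail probability",
          binomial_tail(second_hits, 34))
```

The naive tail probability for the first group ignores the fact that the rule
was picked from a thousand tries, which is exactly the reviewers' worry; the
second-group figure carries no such selection, which is why a test fixed in
advance on fresh individuals is the more persuasive one.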
However, Dr. Rubin also writes:
>As far as I know, nobody outside the "codes" group knows the algorithm used.
I believe he intended to write "knew," for all that is relevant to this
challenge is that when the codes group produced their initial results, they
could not prove their claim that they had designed their study _before_
examining the data. It is not true that today, no one else knows the
algorithm used - the experiment is laid out in sufficient detail in the
Statistical Science article, complete with distance measures, etc., that I
think anyone with good programming and statistics skills could get a copy
of the Torah on disk and replicate the entire experiment (I think I could
pull this off myself, but it would take quite a while to do). It is my
understanding that Harold Gans first got involved by doing exactly this.
>>At this point, however, it would be prudent to reserve
>>judgement on this entire enterprise rather than jumping on the
>With this I agree. I am a quite suspicious mathematical statistician.
>My Hebrew would not be good enough to check their algorithm, but I
>would very much like to see other suspicious people look at it
>carefully, as well as their claims that the Samaritan version has no
I think Harold Gans is the only other person so far to have gone to the
necessary trouble. Most other researchers have not challenged the results,
because this isn't like a sociological study where different people might
obtain different data. In this case, if the researchers "fudged their
data," this would be glaringly obvious to anyone who checked, and these
researchers would immediately become the global laughingstock of the
community of statisticians, and would probably lose their jobs (can't tenure
even be revoked in these cases?). It's not likely they would do this.
The prudent reaction is then not to ignore the results pending additional
studies, but to act as if this study may be genuine. When the first studies
demonstrated a link between smoking and lung cancer, many people decided
that the possibility that the studies were correct was sufficient for the
possible danger to outweigh the pleasure of smoking - others insisted the
results were inaccurate. Today, the link is considered proven, but many
thousands lost their lives in the meantime. What are the implications of
_this_ study? That is for the individual to decide.