## Friday, 24 November 2017

### ANOVA, t-tests and regression: different ways of showing the same thing

In my last post, I gave a brief explainer of what the term 'Analysis of variance' actually means – essentially you are comparing how much variation in a measure is associated with a group effect and how much with within-group variation.

The use of t-tests and ANOVA by psychologists is something of a historical artefact. These methods have been taught to generations of researchers in their basic statistics training, and they do the business for many basic experimental designs. Many statisticians, however, prefer variants of regression analysis. The point of this post is to explain that, if you are just comparing two groups, all three methods – ANOVA, t-test and linear regression – are equivalent. None of this is new but it is often confusing to beginners.

Anyone learning basic statistics probably started out with the t-test. This is a simple way of comparing the means of two groups, and, just like ANOVA, it looks at how big that mean difference is relative to the variation within the groups. You can't conclude anything by knowing that group A has a mean score of 40 and group B has a mean score of 44. You need to know how much overlap there is in the scores of people in the two groups, and that is related to how variable they are. If scores in group A range from to 38 to 42 and those in group B range from 43 to 45 we have a massive difference with no overlap between groups – and we don't really need to do any statistics! But if group A ranges from 20 to 60 and group B ranges from 25 to 65, then a 2-point difference in means is not going to excite us. The t-test gives a statistic that reflects how big the mean difference is relative to the within-group variation.  What many people don't realise is that the t-test is computationally equivalent to the ANOVA. If you square the value of t from a t-test, you get the F-ratio*.

 Figure 1: Simulated data from experiments A, B, and C.  Mean differences for two intervention groups are the same in all three experiments, but within-group variance differs

Now let's look at regression. Consider Figure 1. This is similar to the figure from my last post, showing three experiments with similar mean differences between groups, but very different within-group variance. These could be, for instance, scores out of 80 on a vocabulary test. Regression analysis focuses on the slope of the line between the two means, shown in black, which is referred to as b. If you've learned about regression, you'll probably have been taught about it in the context of two continuous variables, X and Y, where the slope b, tells you how much change there is in Y for every unit change in X. But if we have just two groups, b is equivalent to the difference in means.

So, how can it be that regression is equivalent to ANOVA, if the slopes are the same for A, B and C? The answer is that, just as illustrated above, we can't interpret b unless we know about the variation within each group. Typically, when you run a regression analysis, the output includes a t-value that is derived by dividing b by a measure known as the standard error, which is an index of the variation within groups.

An alternative way to show how it works is to transform data from the three experiments to be on the same scale, in a way that takes into account the within-group variation. We achieve this by transforming the data into z-scores. All three experiments now have the same overall mean (0) and standard deviation (1). Figure 2 shows the transformed data – and you see that after the data have been rescaled in this way, the y-axis now ranges from -3 to +3, and the slope is considerably larger for Experiment C than Experiment A. The slope for z- transformed data is known as beta, or the standardized regression coefficient.

 Figure 2: Same data as from Figure 1, converted to z-scores

The goal of this blogpost is to give an intuitive understanding of the relationship between ANOVA, t-tests and regression, so I am avoiding algebra as far as possible. The key point is when you are comparing two groups, t and F are different ways of representing the ratio between variation between groups and variation within groups, and t can be converted into F by simply squaring the value. You can derive t from linear regression by dividing the b or beta by its standard error - and this is automatically done by most stats programmes. If you are nerdy enough to want to use algebra to transform beta into F, or to see how Figures 1 and 2 were created, see the script Rftest_with_t_and_b.r here.

How do you choose which statistics to do? For a simple two-group comparison it really doesn't matter and you may prefer to use the method that is most likely to be familiar to your readers. The t-test has the advantage of being well-known – and most stats packages also allow you to make an adjustment to the t-value which is useful if the variances in your two groups are different. The main advantage of ANOVA is that it works when you have more than two groups. Regression is even more flexible, and can be extended in numerous ways, which is why it is often preferred.

Further explanations can be found here:

*It might not be exactly the same if your software does an adjustment for unequal variances between groups, but it should be close. It is identical if no correction is done.

## Monday, 20 November 2017

### How Analysis of Variance Works

Lots of people use Analysis of Variance (Anova) without really understanding how it works, so I thought I'd have a go at explaining the basics in an intuitive fashion.

Consider three experiments, A, B and C, each of which compares the impact of an intervention on an outcome measure. The three experiments each have 20 people in a control group and 20 in an intervention group. Figure 1 shows the individual scores on an outcome measure for the two groups as blobs, and the mean score for each group as a dotted black line.

 Figure 1: Simulated data from 3 intervention studies

In terms of average scores of control and intervention groups, the three groups look very similar, with the intervention group about .4 to .5 points higher than the control group. But we can't interpret this difference without having an idea of how variable scores are in the two groups.

For experiment A, there is considerable variation within each group, that swamps the average difference between the groups. In contrast, for experiment C, the scores within each group are tightly packed. Group B is somewhere in between.

If you enter these data into a one-way Anova, with group as a between-subjects factor, you get out a F-ratio, which can then be evaluated in terms of a p-value which gives the probability of obtaining such an extreme result if there is really no impact of the intervention. As you will see, the F-ratios are very different for A, B, and C, even though the group mean differences are the same. And in terms of the conventional .05 level of significance, the result from experiment A is not significant, experiment C is significant at the .001 level, and experiment B shows a trend (p = .051).

So how is the F-ratio computed? It just involves computing a number that reflects the ratio between the variance of the means of the groups, and the average variance within each group. When we just have two groups, as here, the first value just reflects how far away the two group means are from the overall mean. This is the Between Groups term, which is just the Variance of the two means multiplied by the number in each group (20). That will be similar for A, B and C, because the means for the two groups are similar and the numbers in each group are the same.

But the Within Groups term will differ substantially for A, B, and C, because it is computed as the average variance for the two groups. The F-ratio is obtained by just dividing the between groups term by the within groups term. If the within groups term is big, F is small, and vice versa.

The R script used to generate Figure 1 can be found here: https://github.com/oscci/intervention/blob/master/Rftest.R

PS. 20/11/2017. Thanks to Jan Vanhove for providing code to show means rather than medians in Fig 1.

## Friday, 3 November 2017

### Prisons, developmental language disorder, and base rates

There's been some interesting discussion on Twitter about the high rate of developmental language disorder (DLD) in the prison population. Some studies give an estimate as high as 50 percent (Anderson et al, 2016), and this has prompted calls for speech-language therapy services to be involved in the working with offenders. Work by Pam Snow and others has documented the difficulties of navigating the justice system if your understanding and ability to express yourself are limited.

This is important work, but I have worried from time to time about the potential for misunderstanding. In particular, if you are a parent of a child with DLD, should you be alarmed at the prospect that your offspring will be incarcerated? So I wanted to give a brief explainer that offers some reassurance.

The simplest way to explain it is to think about gender. I've been delving into the latest national statistics for this post, and found that the UK prison population this year contained 82,314 men, but a mere 4,013 women. That's a staggering difference, but we don't conclude that because most criminals are men, therefore most men are criminals. This is because we have to take into account base rates: the proportion of the general population who are in prison. Another set of government statistics estimates the UK population as around 64.6 million, about half of whom are male, and 81% are adults. So a relatively small proportion of the adult population is in prison, and the numbers of non-criminal men vastly outnumber the number of criminal men.

I did similar sums for DLD, using data from Norbury et al (2016) to estimate a population prevalence of 7% in adult males, and plugging in that relatively high figure of 50% of prisoners with DLD. The figures look like this.

 Numbers (in thousands) assuming 7% prevalence of DLD and 50% DLD in prisoners*
As you can see, according to this scenario, the probability of going to prison is much greater for those with DLD than for those without DLD (2.24% DLD vs 0.17% without DLD), but the absolute probability is still very low – 98% of those with DLD will not be incarcerated.

The so-called base rate fallacy is a common error in logical reasoning. It seems natural to conclude that if A is associated with B, then B must be associated with A. Statistically, that is true, but if A is extremely rare, then the likelihood of B given A can be considerably less than the likelihood of A given B.

So I don't think therefore that we need to seek explanations for the apparent inconsistency that's being flagged up on Twitter between rates of incarceration in studies of those with DLD, vs rates of DLD in those who are incarcerated. It could just be the consequence of the low base rate of incarceration.

References
Anderson et al (2016) Language impairments among youth offenders: A systematic review. Children and Youth Services Review, 65, 195-203.

Norbury, C. F.,  et al. (2016). The impact of nonverbal ability on prevalence and clinical presentation of language disorder: evidence from a population study. Journal of Child Psychology and Psychiatry, 57, 1247-1257.

*An R script for generating this figure can be found here.

Postscript - 4th November 2017
The Twitter discussion has continued and drawn attention to further sources of information on rates of language and related problems in prison populations. Happy to add these here if people can send sources:

Talbot, J. (2008). No One Knows: Report and Final Recommendations. Report by Prison Reform Trust.

House of Commons Justice Committee (2016) The Treatment of Young Adults in the Criminal Justice System.  Report HC 169.

## Tuesday, 17 October 2017

### Citing the research literature: the distorting lens of memory

 Corticogenesis: younger neurons migrate past older ones using radial glia as a scaffolding. Figure from https://en.wikipedia.org/wiki/Neural_development#/media/File:Corticogenesis_in_a_wild-type_mouse.png
"Billy was a likable twelve-year old boy whose major areas of difficulty were described by his parents as follows: 1) marked difficulty in reading and retaining what he read; 2) some trouble with arithmetic; 3) extreme slowness in completing homework with writing and spelling of poor quality; 4) slowness in learning to tell time (learned only during the past year); 5) lapses of attention with staring into space; 6) "dizzy spells" with "blackouts"; 7) recurring left frontal headaches always centering around and behind the left eye; 8) occasional enuresis until recently; 9) disinterest in work; 10) sudden inappropriate temper outbursts which were often violent; 11) enjoyment of irritating people; and 12) tendency to cry readily." Drake (1968), p . 488

Poor Billy would have been long forgotten, were it not for the fact that he died suddenly shortly after he had undergone extensive assessment for his specific learning difficulties. An autopsy found that death was due to a brain haemorrhage caused by an angioma in the cerebellum, but the neuropathologist also remarked on some unusual features elsewhere in his brain:

"In the cerebral hemispheres, anomalies were noted in the convolutional pattern of the parietal lobe bilaterally. The cortical pattern was disrupted by penetrating deep gyri that appeared disconnected. Related areas of the corpus callosum appeared thin (Figure 2). Microscopic examination revealed the cause of the hemorrrage to be a cerebellar angioma of the type known as capillary telangiectases (Figure 3). The cerebral cortex was more massive than normal, the lamination tended to be columnar, the nerve cells were spindle-shaped, and there were numerous ectopic neurons in the white matter that were not collected into distinct heterotopias (Figure 4)." p. 496*

I had tracked down this article in the course of writing a paper with colleagues on the neuronal migration account of dyslexia – a topic I have blogged about previously  The 'ectopic neurons' referred to by Drake are essentially misplaced neurons that,  because of disruptions of very early development, have failed to migrate to their usual location in the brain.

I realised that my hazy memory of this paper was quite different from the reality: I had thought the location of the ectopic neurons was consistent with those reported in later post mortem studies by Galaburda and colleagues. In fact, Drake says nothing about their location, other than that it is in white matter – which contrasts with the later reports.

This made me curious to see how this work had been reported by others. This was not a comprehensive exercise: I did this by identifying from Web of Science all papers that cited Drake's article, and then checking what they said about the results if  I could locate an online version of the article easily. Here's what I found:

Out of a total of 45 papers, 18 were excluded: they were behind a paywall or not readily traceable online, or (1 case) did not mention neuroanatomical findings A further 10 papers included the Drake study in a bunch of references referring to neuroanatomical abnormalities in dyslexia, without singling out any specific results. Thus they were not inaccurate, but just vague.

The remaining 17 could be divided up as follows:

Seven papers gave a broadly accurate account of the neuroanatomical findings. The most detailed accurate account was by Galaburda et al (1985) who noted:

"Drake published neuropathological findings in a well-documented case of developmental dyslexia. He described a thinned corpus callosum particularly involving the parietal connections, abnormal cortical folding in the parietal regions, and, on microscopical examination, excessive numbers of neurons in the subcortical white matter. The illustrations provided did not show the parietal lobe, and the portion of the corpus callosum that could be seen appeared normal. No mention was made as to whether the anomalies were asymmetrically distributed."p. 227.

Four (three of them from the same research group) cited Drake as though there were two patients, rather than one, and focussed only on the the corpus callosum, without mentioning ectopias.

Six gave an inaccurate account of the findings. The commonest error was to be specific about the location of the ectopias, which (as is clear from the Galaburda quote above), was not apparent in the text or figures of the original paper. Five of these articles located the ectopias in the left parietal lobe, one more generally in the parietal lobe, and one in the cerebellum (where the patient's stroke had been).

So, if we discount those available articles that just gave a rather general reference to Drake's study, over half of the remainder got some information wrong – and the bias was in the direction of making this early study consistent with later research.

The paper is hard to get hold of**, and when you do track it down, it is rather long-winded. It is largely concerned with the psychological evaluation of the patient, including aspects, such as Oedipal conflicts, that seem fanciful to modern eyes, and the organisation of material is not easy to follow. Perhaps it is not so surprising that people make errors when reporting the findings. But if nothing else, this exercise reminded me of the need to check sources when you cite them. It is all too easy to think you know what is in a paper – or to rely on someone else's summary. In fact, these days I am often dismayed to discover I have a false memory of what is in my own old papers, let alone those by other people. But once in the literature, errors can propagate, and we need to be vigilant to prevent a gradual process of distortion over time. It is all too easy to hurriedly read a secondary source or an abstract: we (and I include myself here) need to slow down.

References
Drake, W. E. (1968). Clinical and pathological findings in a child with a developmental learning disability Journal of Learning Disabilities, 1(9), 486-502.
Galaburda, A. M., Sherman, G. F., Rosen, G. D., Aboitiz, F., & Geschwind, N. (1985). Developmental dyslexia: four consecutive cases with cortical anomalies. Annals of Neurology, 18, 222-233.

* I assume the figures are copyrighted so am not reproducing them here
**I thank Michelle Dawson for pointing out that the article can be downloaded from this site: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.949.4021&rep=rep1&type=pdf

## Sunday, 1 October 2017

### Pre-registration or replication: the need for new standards in neurogenetic studies

This morning I did a very mean thing. I saw an author announce to the world on Twitter that they had just published this paper, and I tweeted a critical comment. This does not make me happy, as I know just how proud and pleased one feels when a research project at last makes it into print, and to immediately pounce on it seems unkind. Furthermore, the flaws in the paper are not all that unusual: they characterise a large swathe of literature. And the amount of work that has gone into the paper is clearly humongous, with detailed analysis of white matter structural integrity that probably represents many months of effort. But that, in a sense, is the problem. We just keep on and on doing marvellously complex neuroimaging in contexts where the published studies are likely to contain unreliable results.

Why am I so sure that this is unreliable? Well, yesterday saw the publication of a review that I had led on, which was highly relevant to the topic of the paper – genetic variants affecting brain and behaviour. In our review we closely scrutinised 30 papers on this topic that had been published in top neuroscience journals. The field of genetics was badly burnt a couple of decades ago when it was discovered that study after study reported results that failed to replicate. These days, it's not possible to publish a genetic association in a genetics journal unless you show that the finding holds up in a replication sample. However, neuroscience hasn't caught up and seems largely unaware of why this is a problem.

The focus of this latest paper was on a genetic variant known as the COMT Val158Met SNP. People can have one of three versions of this genotype: Val/Val, Val/Met and Met/Met, but it's not uncommon for researchers to just distinguish people with Val/Val from Met carriers (Val/Met and Met/Met). This COMT polymorphism is one of the most-studied genetic variants in relation to human cognition, with claims of associations with all kinds of things: intelligence, memory, executive functions, emotion, response to anti-depressants, to name just a few. Few of these, however, have replicated, and there is reason to be dubious about the robustness of findings (Barnett, Scoriels & Munafo, 2008)

In this latest COMT paper – and many, many other papers in neurogenetics – the sample size is simply inadequate.  There were 19 participants (12 males and 7 females) with the COMT Val/Val version of the variant, compared with 63 (27 males and 36 females) who had either Met/Met or Val/Met genotype. The authors reported that significant effects of genotype on corpus callosum structure were found in males only. As we noted in our review, effects of common genetic variants are typically very small. In this context, an effect size (standardized difference between means of two genotypes, Cohen's d) of .2 would be really large. Yet this study has power of .08 to detect such an effect in males – that is if there really is a difference of -0.2 SDs between the two genotypes, and you repeatedly ran studies with this sample size, then you'd fail to see the effect in 92% of studies. To look at it another way, the true effect size would need to be enormous (around 1 SD difference between groups) to have an 80% chance of being detectable, given the sample size.

When confronted with this kind of argument, people often say that maybe there really are big effect sizes. After all, the researchers were measuring characteristics of the brain, which are nearer to the gene than the behavioural measures that are often used. Unfortunately, there is another much more likely explanation for the result, which is that it is a false positive arising from a flexible analytic pipeline.

The problem is that both neuroscience and genetics are a natural environment for analytic flexibility. Put the two together, and you need to be very very careful to control for spurious false positive results. In the papers we evaluated for our review, there were numerous sources of flexibility: often researchers adopted multiple comparisons corrections for some of these, but typically not for all. In the COMT/callosum paper, the authors addressed the multiple comparisons issue using permutation testing. However, one cannot tell from a published paper how many subgroupings/genetic variants/phenotypes/analysis pathways etc were tried but not reported. If, as in mainstream genetics, the authors had included a direct replication of this result, that would be far more convincing. Perhaps the best way for the field to proceed would be by adopting pre-registration as standard. Pre-registration means you commit yourself to a specific hypothesis and analytic plan in advance; hypotheses can then be meaningfully tested using standard statistical methods. If you don’t pre-register and there are many potential ways of looking at the data, it is very easy to fool yourself into finding something that looks 'significant'.

I am sufficiently confident that this finding will not replicate that I hereby undertake to award a prize of £1000 to anyone who does a publicly preregistered replication of the El-Hage et al paper and reproduces their finding of a statistically significant male-specific effect of COMT Val158Met polymorphism on the same aspects of corpus callosum structure.

I emphasise that, though the new COMT/callosum paper is the impetus for this blogpost, I do not intend this as a specific criticism of the authors of that paper. The research approach they adopted is pretty much standard in the field, and the literature is full of small studies that aren't pre-registered and don't include a replication sample. I don't think most researchers are being deliberately misleading, but I do think we need a change of practices if we are to amass a research literature that can be built upon. Either pre-registration or replication should be conditions of publication.

PS. 3rd October 2017
An anonymous commentator (below) drew my attention to a highly relevant preprint in Bioarxiv by Jahanshad and colleagues from the ENIGMA-DTI consortium, entitled 'Do Candidate Genes Affect the Brain's White Matter Microstructure? Large-Scale Evaluation of 6,165 Diffusion MRI Scans'. They included COMT as one of the candidate genes, although they did not look at gender-specific effects. The Abstract makes for sobering reading: 'Regardless of the approach, the previously reported candidate SNPs did not show significant associations with white matter microstructure in this largest genetic study of DTI to date; the negative findings are likely not due to insufficient power.'

In addition, Kevin Mitchell (@WiringTheBrain) on Twitter alerted me to a blogpost from 2015 in which he made very similar points about neuroimaging biomarkers. Let's hope that funders and mainstream journals start to get the message.

## Sunday, 10 September 2017

### Bishopblog catalogue (updated 10 Sept 2017)

 Source: http://www.weblogcartoons.com/2008/11/23/ideas/

Those of you who follow this blog may have noticed a lack of thematic coherence. I write about whatever is exercising my mind at the time, which can range from technical aspects of statistics to the design of bathroom taps. I decided it might be helpful to introduce a bit of order into this chaotic melange, so here is a catalogue of posts by topic.

Language impairment, dyslexia and related disorders
The common childhood disorders that have been left out in the cold (1 Dec 2010) What's in a name? (18 Dec 2010) Neuroprognosis in dyslexia (22 Dec 2010) Where commercial and clinical interests collide: Auditory processing disorder (6 Mar 2011) Auditory processing disorder (30 Mar 2011) Special educational needs: will they be met by the Green paper proposals? (9 Apr 2011) Is poor parenting really to blame for children's school problems? (3 Jun 2011) Early intervention: what's not to like? (1 Sep 2011) Lies, damned lies and spin (15 Oct 2011) A message to the world (31 Oct 2011) Vitamins, genes and language (13 Nov 2011) Neuroscientific interventions for dyslexia: red flags (24 Feb 2012) Phonics screening: sense and sensibility (3 Apr 2012) What Chomsky doesn't get about child language (3 Sept 2012) Data from the phonics screen (1 Oct 2012) Auditory processing disorder: schisms and skirmishes (27 Oct 2012) High-impact journals (Action video games and dyslexia: critique) (10 Mar 2013) Overhyped genetic findings: the case of dyslexia (16 Jun 2013) The arcuate fasciculus and word learning (11 Aug 2013) Changing children's brains (17 Aug 2013) Raising awareness of language learning impairments (26 Sep 2013) Good and bad news on the phonics screen (5 Oct 2013) What is educational neuroscience? (25 Jan 2014) Parent talk and child language (17 Feb 2014) My thoughts on the dyslexia debate (20 Mar 2014) Labels for unexplained language difficulties in children (23 Aug 2014) International reading comparisons: Is England really do so poorly? (14 Sep 2014) Our early assessments of schoolchildren are misleading and damaging (4 May 2015) Opportunity cost: a new red flag for evaluating interventions (30 Aug 2015) The STEP Physical Literacy programme: have we been here before? (2 Jul 2017)

Autism
Autism diagnosis in cultural context (16 May 2011) Are our ‘gold standard’ autism diagnostic instruments fit for purpose? (30 May 2011) How common is autism? (7 Jun 2011) Autism and hypersystematising parents (21 Jun 2011) An open letter to Baroness Susan Greenfield (4 Aug 2011) Susan Greenfield and autistic spectrum disorder: was she misrepresented? (12 Aug 2011) Psychoanalytic treatment for autism: Interviews with French analysts (23 Jan 2012) The ‘autism epidemic’ and diagnostic substitution (4 Jun 2012) How wishful thinking is damaging Peta's cause (9 June 2014)

Developmental disorders/paediatrics
The hidden cost of neglected tropical diseases (25 Nov 2010) The National Children's Study: a view from across the pond (25 Jun 2011) The kids are all right in daycare (14 Sep 2011) Moderate drinking in pregnancy: toxic or benign? (21 Nov 2012) Changing the landscape of psychiatric research (11 May 2014)

Genetics
Where does the myth of a gene for things like intelligence come from? (9 Sep 2010) Genes for optimism, dyslexia and obesity and other mythical beasts (10 Sep 2010) The X and Y of sex differences (11 May 2011) Review of How Genes Influence Behaviour (5 Jun 2011) Getting genetic effect sizes in perspective (20 Apr 2012) Moderate drinking in pregnancy: toxic or benign? (21 Nov 2012) Genes, brains and lateralisation (22 Dec 2012) Genetic variation and neuroimaging (11 Jan 2013) Have we become slower and dumber? (15 May 2013) Overhyped genetic findings: the case of dyslexia (16 Jun 2013) Incomprehensibility of much neurogenetics research ( 1 Oct 2016) A common misunderstanding of natural selection (8 Jan 2017) Sample selection in genetic studies: impact of restricted range (23 Apr 2017)

Neuroscience
Neuroprognosis in dyslexia (22 Dec 2010) Brain scans show that… (11 Jun 2011)  Time for neuroimaging (and PNAS) to clean up its act (5 Mar 2012) Neuronal migration in language learning impairments (2 May 2012) Sharing of MRI datasets (6 May 2012) Genetic variation and neuroimaging (1 Jan 2013) The arcuate fasciculus and word learning (11 Aug 2013) Changing children's brains (17 Aug 2013) What is educational neuroscience? ( 25 Jan 2014) Changing the landscape of psychiatric research (11 May 2014) Incomprehensibility of much neurogenetics research ( 1 Oct 2016)

Reproducibility
Accentuate the negative (26 Oct 2011) Novelty, interest and replicability (19 Jan 2012) High-impact journals: where newsworthiness trumps methodology (10 Mar 2013) Who's afraid of open data? (15 Nov 2015) Blogging as post-publication peer review (21 Mar 2013) Research fraud: More scrutiny by administrators is not the answer (17 Jun 2013) Pressures against cumulative research (9 Jan 2014) Why does so much research go unpublished? (12 Jan 2014) Replication and reputation: Whose career matters? (29 Aug 2014) Open code: note just data and publications (6 Dec 2015) Why researchers need to understand poker ( 26 Jan 2016) Reproducibility crisis in psychology ( 5 Mar 2016) Further benefit of registered reports ( 22 Mar 2016) Would paying by results improve reproducibility? ( 7 May 2016) Serendipitous findings in psychology ( 29 May 2016) Thoughts on the Statcheck project ( 3 Sep 2016) When is a replication not a replication? (16 Dec 2016) Reproducible practices are the future for early career researchers (1 May 2017) Which neuroimaging measures are useful for individual differences research? (28 May 2017) Prospecting for kryptonite: the value of null results (17 Jun 2017)

Statistics
Book review: biography of Richard Doll (5 Jun 2010) Book review: the Invisible Gorilla (30 Jun 2010) The difference between p < .05 and a screening test (23 Jul 2010) Three ways to improve cognitive test scores without intervention (14 Aug 2010) A short nerdy post about the use of percentiles (13 Apr 2011) The joys of inventing data (5 Oct 2011) Getting genetic effect sizes in perspective (20 Apr 2012) Causal models of developmental disorders: the perils of correlational data (24 Jun 2012) Data from the phonics screen (1 Oct 2012)Moderate drinking in pregnancy: toxic or benign? (1 Nov 2012) Flaky chocolate and the New England Journal of Medicine (13 Nov 2012) Interpreting unexpected significant results (7 June 2013) Data analysis: Ten tips I wish I'd known earlier (18 Apr 2014) Data sharing: exciting but scary (26 May 2014) Percentages, quasi-statistics and bad arguments (21 July 2014) Why I still use Excel ( 1 Sep 2016) Sample selection in genetic studies: impact of restricted range (23 Apr 2017) Prospecting for kryptonite: the value of null results (17 Jun 2017)

Journalism/science communication
Orwellian prize for scientific misrepresentation (1 Jun 2010) Journalists and the 'scientific breakthrough' (13 Jun 2010) Science journal editors: a taxonomy (28 Sep 2010) Orwellian prize for journalistic misrepresentation: an update (29 Jan 2011) Academic publishing: why isn't psychology like physics? (26 Feb 2011) Scientific communication: the Comment option (25 May 2011)  Publishers, psychological tests and greed (30 Dec 2011) Time for academics to withdraw free labour (7 Jan 2012) 2011 Orwellian Prize for Journalistic Misrepresentation (29 Jan 2012) Time for neuroimaging (and PNAS) to clean up its act (5 Mar 2012) Communicating science in the age of the internet (13 Jul 2012) How to bury your academic writing (26 Aug 2012) High-impact journals: where newsworthiness trumps methodology (10 Mar 2013)  A short rant about numbered journal references (5 Apr 2013) Schizophrenia and child abuse in the media (26 May 2013) Why we need pre-registration (6 Jul 2013) On the need for responsible reporting of research (10 Oct 2013) A New Year's letter to academic publishers (4 Jan 2014) Journals without editors: What is going on? (1 Feb 2015) Editors behaving badly? (24 Feb 2015) Will Elsevier say sorry? (21 Mar 2015) How long does a scientific paper need to be? (20 Apr 2015) Will traditional science journals disappear? (17 May 2015) My collapse of confidence in Frontiers journals (7 Jun 2015) Publishing replication failures (11 Jul 2015) Psychology research: hopeless case or pioneering field? (28 Aug 2015) Desperate marketing from J. Neuroscience ( 18 Feb 2016) Editorial integrity: publishers on the front line ( 11 Jun 2016) When scientific communication is a one-way street (13 Dec 2016) Breaking the ice with buxom grapefruits: Pratiques de publication and predatory publishing (25 Jul 2017)

Social Media
A gentle introduction to Twitter for the apprehensive academic (14 Jun 2011) Your Twitter Profile: The Importance of Not Being Earnest (19 Nov 2011) Will I still be tweeting in 2013? (2 Jan 2012) Blogging in the service of science (10 Mar 2012) Blogging as post-publication peer review (21 Mar 2013) The impact of blogging on reputation ( 27 Dec 2013) WeSpeechies: A meeting point on Twitter (12 Apr 2014) Email overload ( 12 Apr 2016)

Academic life
An exciting day in the life of a scientist (24 Jun 2010) How our current reward structures have distorted and damaged science (6 Aug 2010) The challenge for science: speech by Colin Blakemore (14 Oct 2010) When ethics regulations have unethical consequences (14 Dec 2010) A day working from home (23 Dec 2010) Should we ration research grant applications? (8 Jan 2011) The one hour lecture (11 Mar 2011) The expansion of research regulators (20 Mar 2011) Should we ever fight lies with lies? (19 Jun 2011) How to survive in psychological research (13 Jul 2011) So you want to be a research assistant? (25 Aug 2011) NHS research ethics procedures: a modern-day Circumlocution Office (18 Dec 2011) The REF: a monster that sucks time and money from academic institutions (20 Mar 2012) The ultimate email auto-response (12 Apr 2012) Well, this should be easy…. (21 May 2012) Journal impact factors and REF2014 (19 Jan 2013)  An alternative to REF2014 (26 Jan 2013) Postgraduate education: time for a rethink (9 Feb 2013)  Ten things that can sink a grant proposal (19 Mar 2013)Blogging as post-publication peer review (21 Mar 2013) The academic backlog (9 May 2013)  Discussion meeting vs conference: in praise of slower science (21 Jun 2013) Why we need pre-registration (6 Jul 2013) Evaluate, evaluate, evaluate (12 Sep 2013) High time to revise the PhD thesis format (9 Oct 2013) The Matthew effect and REF2014 (15 Oct 2013) The University as big business: the case of King's College London (18 June 2014) Should vice-chancellors earn more than the prime minister? (12 July 2014)  Some thoughts on use of metrics in university research assessment (12 Oct 2014) Tuition fees must be high on the agenda before the next election (22 Oct 2014) Blaming universities for our nation's woes (24 Oct 2014) Staff satisfaction is as important as student satisfaction (13 Nov 2014) Metricophobia among academics (28 Nov 2014) Why evaluating scientists by grant income is stupid (8 Dec 2014) Dividing up the pie in relation to REF2014 (18 Dec 2014)  Shaky foundations of the TEF (7 Dec 2015) A lamentable performance by Jo Johnson (12 Dec 2015) More misrepresentation in the Green Paper (17 Dec 2015) The Green Paper’s level playing field risks becoming a morass (24 Dec 2015) NSS and teaching excellence: wrong measure, wrongly analysed (4 Jan 2016) Lack of clarity of purpose in REF and TEF ( 2 Mar 2016) Who wants the TEF? ( 24 May 2016) Cost benefit analysis of the TEF ( 17 Jul 2016)  Alternative providers and alternative medicine ( 6 Aug 2016) We know what's best for you: politicians vs. experts (17 Feb 2017) Advice for early career researchers re job applications: Work 'in preparation' (5 Mar 2017)

Celebrity scientists/quackery
Three ways to improve cognitive test scores without intervention (14 Aug 2010) What does it take to become a Fellow of the RSM? (24 Jul 2011) An open letter to Baroness Susan Greenfield (4 Aug 2011) Susan Greenfield and autistic spectrum disorder: was she misrepresented? (12 Aug 2011) How to become a celebrity scientific expert (12 Sep 2011) The kids are all right in daycare (14 Sep 2011)  The weird world of US ethics regulation (25 Nov 2011) Pioneering treatment or quackery? How to decide (4 Dec 2011) Psychoanalytic treatment for autism: Interviews with French analysts (23 Jan 2012) Neuroscientific interventions for dyslexia: red flags (24 Feb 2012) Why most scientists don't take Susan Greenfield seriously (26 Sept 2014)

Women
Academic mobbing in cyberspace (30 May 2010) What works for women: some useful links (12 Jan 2011) The burqua ban: what's a liberal response (21 Apr 2011) C'mon sisters! Speak out! (28 Mar 2012) Psychology: where are all the men? (5 Nov 2012) Should Rennard be reinstated? (1 June 2014) How the media spun the Tim Hunt story (24 Jun 2015)

Politics and Religion
Lies, damned lies and spin (15 Oct 2011) A letter to Nick Clegg from an ex liberal democrat (11 Mar 2012) BBC's 'extensive coverage' of the NHS bill (9 Apr 2012) Schoolgirls' health put at risk by Catholic view on vaccination (30 Jun 2012) A letter to Boris Johnson (30 Nov 2013) How the government spins a crisis (floods) (1 Jan 2014) The alt-right guide to fielding conference questions (18 Feb 2017) We know what's best for you: politicians vs. experts (17 Feb 2017) Barely a good word for Donald Trump in Houses of Parliament (23 Feb 2017)

Humour and miscellaneous Orwellian prize for scientific misrepresentation (1 Jun 2010) An exciting day in the life of a scientist (24 Jun 2010) Science journal editors: a taxonomy (28 Sep 2010) Parasites, pangolins and peer review (26 Nov 2010) A day working from home (23 Dec 2010) The one hour lecture (11 Mar 2011) The expansion of research regulators (20 Mar 2011) Scientific communication: the Comment option (25 May 2011) How to survive in psychological research (13 Jul 2011) Your Twitter Profile: The Importance of Not Being Earnest (19 Nov 2011) 2011 Orwellian Prize for Journalistic Misrepresentation (29 Jan 2012) The ultimate email auto-response (12 Apr 2012) Well, this should be easy…. (21 May 2012) The bewildering bathroom challenge (19 Jul 2012) Are Starbucks hiding their profits on the planet Vulcan? (15 Nov 2012) Forget the Tower of Hanoi (11 Apr 2013) How do you communicate with a communications company? ( 30 Mar 2014) Noah: A film review from 32,000 ft (28 July 2014) The rationalist spa (11 Sep 2015) Talking about tax: weasel words ( 19 Apr 2016) Controversial statues: remove or revise? (22 Dec 2016) The alt-right guide to fielding conference questions (18 Feb 2017) My most popular posts of 2016 (2 Jan 2017)

## Tuesday, 25 July 2017

### Breaking the ice with buxom grapefruits: Pratiques de publication and predatory publishing

#### Max Coltheart, Department of Cognitive Science, Macquarie University

These days it is common for academics to receive invitations from unfamiliar sources to attend conferences, submit papers, or join editorial boards. We began an attack against this practice by not ignoring such invitations – by, instead, replying to them with messages selected from the output of the wonderful Random Surrealism Generator. It generates syntactically correct but surreal sentences such as “Is that a tarantula in your bicycle clip, or are you just gold-trimmed?” (a hint of Mae West there?). This sometimes had the desired effect of generating a bemused response from the inviter; but we decided more was needed.

So we used the surrealism generator to craft an absurdist critique of “impaired” publication practices (the title of the piece says as much, albeit obliquely). The first few sentences seem relevant to the paper’s title but the piece then deteriorates rapidly into a sequence of surreal sentences (we threw in some gratuitous French and Latin for good measure) so that no one who read the paper could possibly believe that it was serious (our piece also quotes itself liberally); and we submitted the paper to a number of journals. Specifically, we submitted the paper to every journal that contacted either of us in the period 21 June 2017 to 1 July 2017 inviting us to submit a paper. There were 10 such invitations. We accepted all of them, and submitted the paper, making minor changes to the title of the paper and the first couple of sentences to generate the impression that the paper was somehow relevant to the interests of the journal; but the bulk of the paper was always the same sequence of surreal sentences.

While we were engaged in this exercise, the blogger Neuroskeptic was doing something similar: we describe that work below. Both of us were of course following the honourable tradition of  submissions as these by -->Peter Vamplew and Christoph Bartnek (More generally, there is a fine tradition of hoax articles intended as critiques of certain academic fields, e.g., postmodernism or theology).

What happened then?

All ten journals responded by informing us that our ms had been sent out for review.  We did not hear anything further from four of them. A fifth, the SM Journal of Psychiatry and Mental Health, eventually responded “The ms was plagiarized so please make some changes to the content”. We did not respond to this request, nor to a subsequent request for resubmission.

The Scientific Journal of Neurology & Neurosurgery responded by telling us that our paper had been peer-reviewed; the reviewer praised our “scientific methodology” but chided us about our poor English (specifically, they said “English should be rewritten, it is necessary a correction of typing errors (spaces)”). We ignored this advice and resubmitted. However, the journal then noticed the similarity with the article we had submitted to the International Journal of Brain Disorders and Therapy (see below for this), so ceased production of our article.

The paper was accepted by Psychiatry and Mental Disorders: “accepted for publication by our reviewers without any changes”, we were told.

The paper was accepted by Mental Health and Addiction Research, but at that point we were told that a publication fee was due. We protested on the ground that when we had been invited to submit there had been no mention of a fee, and we said that unless a full fee waiver was granted we would take our work to a more appreciative journal. In response, we were granted a full fee waiver, and our paper was published in the on-line journal.

The SM Journal of Disease Markers also accepted the paper, and sent us proofs, which we corrected and returned. At that point, we were told that an article processing fee of US\$920 was due. We protested in the same way, asking for a full fee waiver. In response, they offered a reduced fee of \$520. We did not respond, so this paper, although accepted, has not been published.

The tenth journal, the International Journal of Brain Disorders and Therapy, sent us one reviewer comment. The reviewer had entered into the spirit of the hoax by providing a review which was itself surrealistic. We incorporated this reviewer’s comment about Scottish Lithium Flying saucers and resubmitted, and the paper was accepted. The journal then noticed irregularities in some (but surprisingly not all) of the references. We replaced these problematic references with citations of recent and classic hoaxes (e.g., Kline & Saunders’ 1959 piece on “psychochemical symbolism”; Lindsay & Boyle’s recent piece on the “Conceptual Penis”), along with a citation of Pennycook et al’s article “On the reception and detection of pseudo-profound bullshit”. The paper was then published in the on-line journal.  Later this journal asked us for a testimonial about the review process, which we supplied: "The process of publishing this article was much smoother than we anticipated".

In sum: all ten journals to which we submitted the paper sent it out for review, even though any editor had only to read to the end of the first paragraph to come across this:
“Of course, neither cognitive neuropsychiatry nor cognitive neuropsychology is remotely informative when it comes to breaking the ice with buxom grapefruits. When pondering three-in-a-bed romps with broken mules, therefore, one must refrain, at all costs, from driving a manic-depressive lemon-squeezer through ham (Baumard & Brugger, 2016).”

Of these ten journals, two tentatively accepted the paper and four fully accepted it for publication. Two of these journals have already published it.

The blogger Neuroskeptic did this a little differently (see http://blogs.discovermagazine.com/neuroskeptic/2017/07/22/predatory-journals-star-wars-sting/#.WXbIstP5hTF ). A hoax paper entitled “Mitochondria: Structure, Function and Clinical Relevance” was prepared. It did not contain any nonsensical sentences, as our paper did, but its topic was the fictional cellular entities “midi-chlorians” (which feature in Star Wars). The paper was submitted to nine journals. Four accepted it. One of these charged a fee, which the author declined to pay; the other three charged no fee, and so the paper has been published in all three of these papers, the International Journal of Molecular Biology: Open Access (MedCrave), the Austin Journal of Pharmacology and Therapeutics (Austin) and American Research Journal of Biosciences (ARJ). In order to know that this paper was nonsense, one would need some knowledge of cell biology. But our paper is blatantly nonsensical to any reader; and yet it boasted an acceptance rate very similar to that of Neuroskeptic’s paper.

What can be learned from our exercise? Several things:

(a) It is clear that with these journals there is no process by which a submission is initially read by an editor to decide whether the paper should be sent out for review, because our paper could not possibly have survived any such inspection.

(b)  But nor should our paper have survived any serious review process, since any reviewer reading the paper would have pointed out its nonsensical content. Only twice did a journal send us feedback from a reviewer, one which said we should discuss Lithium Flying Saucers, and one which seemed suspect to us because its criticism of our English was expressed in such poor English.

(c) In contrast to this apparent lack of human intervention in the article-handling process, there was some software intervention: some of these journals appear routinely to apply plagiarism-detection software to submitted articles

(d) What’s in this for the journals? We assumed that they exist solely to make money by charging authors. We presume that, just as they attempt to build apparently legitimate editorial boards (see here), these journals will sometimes waive their fees so as to get some legitimate-seeming articles on their books, the better to entice others to submit.

## Sunday, 2 July 2017

### The STEP Physical Literacy programme: have we been here before?

One day in 2003, I turned on BBC Radio 4 and found myself listening to an interview on the Today Programme with Wynford Dore, the founder of an educational programme that claimed to produce dramatic improvements in children's reading and attentional skills. The impetus for the programme was a press release of a study published in the journal Dyslexia, reporting results from a trial of the programme with primary school-children.  The interview seemed more like an advertisement than a serious analysis, but the consequent publicity led many parents to sign up for the programme, both in the UK and in other countries, notably Australia.

The programme involved children doing two 10-minute sessions per day of exercises designed to improve balance and eye-hand co-ordination. These were personalised to the child, so that the specific exercises would be determined by level of progress in particular skills. The logic behind the approach was that these exercises trained the cerebellum, a part of the brain concerned with automatizing skills. For instance, when you first play the piano or drive a car, it is slow and effortful, but after practice you can do it automatically without thinking about it. The idea was that cerebellar training would lead to a general cerebellar boost, helping other tasks, such as reading, to become more automatic.

Various experts who were on the editorial board of Dyslexia were unhappy with the quality of the research and asked for the paper to be retracted. When no action was taken, a number of them resigned. In 2007, I published a detailed critique of the study, which by that time had been complemented by a follow-up – which had prompted further editorial resignations.
Meanwhile, Wynford Dore, who had considerable business acumen, continued to promote the Dore Programme, writing a popular book describing its origins, and signing up celebrities to endorse it. Among these were rugby legends Kenny Logan and Scott Quinnell. In addition, Dore was in conversations with the Welsh Assembly about the possibility of rolling the programme out in Welsh schools. He had also persuaded Conservative MP Christopher Chope that the Dore programme was enormously effective but was being suppressed by government.
Various bloggers were interested in the amazing uptake of the Dore Programme, and in 2008, Ben Goldacre wrote a trenchant piece on his Bad Science blog, noting among other things that Kenny Logan was paid for some of his promotional work. The nail in the coffin of the Dore Programme was an Australian documentary in the Four Corners series, which included interviews with Dore, some of his customers, and scientists who had been involved both in the evaluation and the criticisms. The Dore business, which had been run as a franchise, collapsed, leaving many people out of pocket: parents who had paid up-front for a long-term intervention course, and staff at Dore centres, who found themselves out of a job.
The Dore programme did not die completely, however. Scott Quinnell continued to market a scaled-down version of the programme through his company Dynevor, but was taken to task by the Advertising Standards Authority for making unsubstantiated claims. Things then went rather quiet for a while.
This year, however, I have been contacted by concerned teachers who have told me about a new programme, STEP Physical Literacy, which is being promoted for use in schools, and which bears some striking similarities to Dore.  Here are some quotes from the STEP website:
• Pupils undertake 2 ten minute exercise sessions at the start and end of each school day. The exercises focus on the core skills of balance, eye-tracking and coordination.
• STEP is a series of personalised physical exercises that stimulate the cerebellum to function more efficiently.
• The STEP focus is on the development of physical capabilities that should be automatic such as standing still, riding a bike or following words on a page.
In addition, STEP Physical Literacy is being heavily promoted by Kenny Logan, who features several times on the News section of the website.
As with Dore, STEP has been promoted to politicians, who argue it should be introduced into schools. In this case, the Christopher Chope role is fulfilled by Liz Smith MSP, who appears to be sincerely convinced that Scotland's literacy problems can be overcome by having children take two 10 minute sessions out of lessons to do physical exercises.
On Twitter, Ben Goldacre noted that the directors of Dynevor CIC, overlap substantially with directors of Step2Progress, who own STEP. The registered address is the same for the two companies.
When asked about Dore, those involved with STEP deny any links. After I tweeted about this, I was emailed by Lucinda Roberts Holmes, Managing Director of STEP, to reassure me that STEP is not a rebranding of Dore, and to suggest we meet so she could "talk through the various pilots and studies that have gone on both in the UK and the US as well as future research RCTs planned with Florida State University and the University of Edinburgh." I love evidence, but I find it best to sit down with data rather than have a conversation, so I replied explaining that and saying I'd be glad to take a look at any written reports. So far nothing has materialised. I should add that I have not been able to find any studies on STEP published in the peer-reviewed literature, and the account of the pilot study and case studies on the STEP website does not given me confidence that these would be publishable in a reputable journal.
In short, the evidence to date does not justify introducing this intervention into schools: there's no methodologically adequate study showing effectiveness, and it carries both financial costs and opportunity costs to children. It's a shame that the field of education is so far behind medicine in its attitude to evidence, and that we have politicians who will consider promoting educational interventions on the basis of persuasive marketing. I suggest Liz Smith talks to the Education Endowment Foundation, who will be able to put her in touch with experts who can offer an objective evaluation of STEP Physical Literacy.

8th July 2017: Postscript. A response from STEP
I have had a request from Lucinda Roberts-Holmes, Managing Director of Step2Progress, to remove this blogpost on the grounds that it contains defamatory and inaccurate information. I asked for more information on specific aspects of the post that were problematic and obtained a very long response, which I reproduce in full below. Readers are invited to form their own interpretation of the facts, based on the response from STEP (in italics) and my comments on the points raised.

Preamble: To be clear your blog in its current form includes a number of statements which are factually incorrect. In particular, the suggestion that STEP is simply a reincarnation of the Dore programme is not true as I have already explained to you (see my email of 29 June). The fact that you chose to ignore that assurance and instead publish the blog is very concerning to us. The suggestion, also, that I had chosen not to reply to your email ("so far nothing has materialised") is, I am afraid, disingenuous particularly in circumstances where you did not even set a deadline in your email and you waited only 72 hours to post your blog. Had you, of course, waited to receive a response to your email, we would have explained the correct position to you. Similarly, had you carried out an objective comparison of the two programmes you would have noted the many differences between STEP and Dore and, more significantly, identified the fact that STEP makes absolutely none of the assertions about cures for Dyslexia and other learning difficulties or any other of the hypotheses that Wynford Dore concocted. They are not the same programme evidenced not least by the fact that STEP states its programme is not a SEN learning intervention.

Comment: a) I did not state in the blog that STEP is 'simply a reincarnation of the Dore Programme'. I said it bears some striking similarities to Dore.

b) I did not ignore Lucinda's reassurance that STEP is not a rebranding of Dore. On the contrary, I stated in the blogpost that I had received that reassurance from her.

c) I did not suggest that Lucinda had chosen not to reply to my email: I simply observed that I had not so far received a response. As my blogpost points out, I had made it clear in my initial email that I did not want her to 'explain the correct position' to me. I had specifically requested written reports documenting the evidence for effectiveness of STEP.

1. Despite what Ben Goldacre may believe, Kenny Logan (KL) was not paid by the Dore programme for "promotional work". He was, in fact, a paying customer of the programme who went from being unable to read at the start of the programme to being literate by the end of it. KL was happy to share his experience publicly and was very clear with Dore that he would not be paid to do this. Whilst it is true that in 2006, he was contracted and paid by Wynford Dore for his professional input into a sports programme that he was seeking to develop that is an entirely different matter. The suggestion that KL was only promoting the Dore programme for his own financial benefit is clearly defamatory of him (and indeed of us).

I asked Ben Goldacre about this. The claim about Logan's payment for promotional work was made in a Comment is Free article in the Guardian. Ben told me it all went through a legal review at the Guardian to ensure everything was robust, and no complaints were received from Kenny Logan at the time. If the claim is untrue, then Kenny Logan needs to take this up with the Guardian. It's unclear to me why Kenny Logan promoting Dore would be defamatory of STEP, given that STEP claims to have no association with Dore.

2. The fact that KL previously promoted the Dore programme also does not support the allegation that the STEP programme is the same as the Dore programme. They are very different programmes and we are a very different organisation to Dore. Incorrectly stating that KL was paid for the promotion of Dore and trying to draw an inference that therefore he is paid to promote STEP (which he is not) is also misleading.

Comment: I made no claims that Kenny Logan is paid to promote STEP. He is a shareholder in STEP2Progress, which is a different matter.

3. Dynevor was never "Scott Quinnell's Company". Dynevor was primarily owned by Tim Griffiths and was the organisation that purchased the intellectual property rights in Dore after it went bankrupt. Tim Griffiths had no prior connection to Wynford Dore or the Dore programme but did have an interest in the link between exercise and ability to learn. As many thousands of people had been left in a difficult position when Dore collapsed into administration having purchased a programme they could not continue the directors at Dynevor agreed to commit the funding necessary to allow those who wanted to continue the programme the opportunity to do so. Scott Quinnell had a shareholding of less than 1% in Dynevor. STEP has absolutely no association with Scott Quinnell.

Comment: The role of Scott Quinnell in Dynevor is not central to my description of Dore, but this account of his role seems disingenuous. According to Companies House, Quinnell was appointed as one of two Directors of Dynevor C.I.C in 2009, and his interest in the company in 2011 was 2.6% of the shareholding, at a time when Wynford Dore had a shareholding of 4.3%.

I have not claimed that Scott Quinnell has any relationship with STEP. My account of his dealings was to provide a brief history of the problems with Dore for readers unfamiliar with the background.

4. You refer to the claims Ben Goldacre has made on Twitter that the directors of Dynevor CIC "overlap substantially" with the directors of STEP. In fact, of the 8 Directors of Dynevor only 2 hold directorships at STEP. In any event that misses the point which is that none of the directors of STEP had any association with the Dore Programme prior to the purchase of intellectual property rights in 2009.

Comment: According to Companies House, the one 'active person with significant control' in Dynevor CIC is Timothy Griffiths, and the 'one active person with significant control' in STEP2Progress is Conor Davey. If I have understood this correctly, this is based on shareholdings. Timothy Griffiths is one of four Directors of STEP2Progress, and Conor Davey is the Chairman of Dynevor CIC. Dynevor CIC and STEP2Progress have the same postal address.

It wasn't quite clear if Lucinda was saying that Dynevor CIC is now disassociated from Dore, but if that is the case, it would be wise to update the company's LinkedIn Profile, which states that the company 'provides the Dore Programme to individual clients and schools around the UK and licences the rights to provide the Dore Programme in a number of overseas countries'.

5. It is not correct to state that STEP denies any links to the Dore programme. There is, of course, a link, as there is also to the work of Dr Frank Belgau and his studies into balametrics. There is also a link to other movement programmes such as Better Movers and Thinkers and Move to Learn. What we have said is that the STEP programme is not the Dore programme and we stand by this. You may seek to draw similarities between them as I could between apples and pears.

Comment: Nowhere in my blogpost did I state that STEP denies any links to the Dore programme.

Re Belgau: I have just done a search on Web of Science that returned no articles for either author = Belgau or topic = balametrics.

6. May I also ask how you can state that "the evidence to date does not justify introducing this intervention in to schools" when you have refused so far to meet with me or even seen the evidence or read the full Pilot Study? Have you asked any teachers or head teachers who have experience of delivering the STEP Programme whether they would recommend to their peers the use of the programme in their schools?

Comment: There is a fundamental misunderstanding here about how scientists evaluate evidence. If you want to find out whether an intervention is effective, the worst thing you can do is to talk to people who are convinced that it is. There are people who believe passionately in all sorts of things: the healing powers of crystals, the harms of vaccines, the benefits of homeopathy, or the evils of phonics instruction. They will, understandably, try to convince you that they are right, but they will not be objective. The way to get an accurate picture of what works is not by asking people what they think about it, but by doing well-controlled studies that compare the intervention with a control condition in terms of children's outcomes. It is for this reason that I have been asking for any hard evidence that STEP2Progress has from properly conducted studies or information about future-planned studies, which I am told are in the pipeline. I would love to read the full Pilot Study, but am having difficulty accessing it (see below).

7. You say in your blog "It is a shame that... We have politicians who will consider promoting educational interventions on the basis of persuasive marketing" Presumably this is a reference to Liz Smith MSP (LS) who you refer to separately in the blog? For your information, LS has read the full research report of the 2015/2016 Pilot Study as well as the other case studies. In light of that information, she has indicated the she is impressed with the STEP programme and that the Scottish Government should consider piloting it and looking more widely at the impact of physical literacy on academic attainment. At the point she expressed this view there had not been any marketing of the STEP programme in Scotland so I do not understand the evidence to support the statement you make in the blog.

Comment: In this regard Liz Smith has the advantage. Although Lucinda has now sent me three emails since my blogpost appeared, in none of them did she send me the reports I had initially requested. In my latest email I asked to see the 'full research report' that Liz Smith had access to. I got this reply from Lucinda:

Dear Dorothy,

Thank you for your email. With the greatest respect, I think the first step should be for you to correct or remove your blog and apologise for the inaccuracies I have outlined below. Alongside that I repeat my offer to come and talk you through the STEP programme and the studies that have been carried out so far. As I say, we are not the same programme as the Dore programme and it is wrong to allege otherwise.

Kind regards
Lucinda

Nevertheless, with her penultimate email, Lucinda attached a helpful Excel spreadsheet documenting differences between Dore and STEP, as follows:

Difference 1. The Dore Programme was a paper book of 100 exercises followed sequentially. Dore's assertions that they were personalised were untrue. STEP software contains over 350 exercises delivered through an adaptive learning software platform that is individualised to the child based on previous performance. The Programme also contains 10 minutes of 1-1 time with each pupil twice per day (nurture) and involves pupils overcoming a series of physical challenges (resilience) in a non class-competitive environment (success cycle) which displays their commitment levels (engagement) and is overseen by committed members of staff who also work with them in the classroom (mentoring and translational trust building).

Comment: The question of interest is where do these exercises come from? How were they developed? Usually for an adaptive learning process, one needs to do prior research to establish difficulty levels of items for children of different ages. I raised this issue with the original Dore programme: there is no published evidence of the kind of foundational work you'd normally expect for an educational programme. Readers will no doubt be intereted to hear that STEP has more exercises than Dore and delivers these in a specific, personalised sequence, but what is missing is a clear rationale explaining how and why specific exercises were developed. It would also be of interest to know how many of Dore's original 100 exercises are incorporated in STEP.

Difference 2. Dore was an exercise programme completed by adults and children at home supervised by untrained parents. STEP is delivered in schools and overseen by teaching staff trained through industry leader Professor Geraint Jones' teacher training programme. This also includes training on how to assess pupil performance.

Comment. If the intervention is effective, then standardized administration by teachers is a good thing. If it is not effective, then teachers should not be spending time and money being trained. Everything hinges on evidence for effectiveness (see below).

Difference 3. Dore asserted that the programme was a cure for dyslexia and and other learning difficulties. It further claimed to know the cause of these learning difficulties. STEP makes absolutely no assertions about Dyslexia, ADHD or other learning difficulties and absolutely no assertions about the medical cause for these.

Comment. I am sure that there are many people who will be glad to have the clarification that STEP is not designed to treat children with specific learning difficulties or dyslexia, as there appears to be some misunderstanding of this. This may in part be the consequence of Kenny Logan's involvement in promoting STEP. Consider, for instance, this piece in the Daily Mail, which first describes how Kenny's dyslexia was remediated by the Dore programme, and then moves to talk of his worries over his son Reuben, who was having difficulties in school:

"The answer was already staring him in the face, however, and within months, Kenny decided to try putting Reuben through a similar 'brain-training' technique to the one that transformed his own life just 14 years ago. Reuben, it transpired, had mild dyspraxia - a condition affecting mental and physical co-ordination - and the outcome for him has been so successful that Kenny is currently trying to persuade education chiefs to implement the technique in the country's worst-performing state schools, to raise attainment levels."

Another reason for confusion may be because the STEP home page lists the British Dyslexia Association as a partner and has features in the News section of its website on Dyslexia Awareness Month , on unidentified dyslexia, and a case study describing use of STEP with dyslexic children in Mississippi.

The transcript of the debate in the Scottish Parliament (scroll down to the section on Motion debated: That the Parliament is impressed by the STEP physical literacy programme) shows that many of the Scottish MPs who took part in the debate with Liz Smith were under the impression that STEP was a treatment for specific learning disabilities such as dyslexia and ADHD, as evident from these quotes:

Daniel Johnson: 'It is vital that we understand that there is a direct link between physical understanding, learning, knowledge and ability and educational ability. Overall - and specifically - there would be key benefits for people who have conditions such as ADHD and dyslexia... There is a growing body of evidence about the link between spatial awareness and physical ability and dyslexia. Likewise, the improvements on focus and concentration that exercises such as those that are outlined in the STEP programme can have for people with ADHD are clear. Improvements in those areas are linked not only to training the mind to concentrate, but to the impacts on brain chemistry.'

Elaine Smith: With regard to STEP, we have already heard that it is a programme of exercises performed twice a day for 10 minutes and focuses in particular on balance, eye tracking and co-ordination with the aim of making physical activity part of children's everyday learning. Improving physical literacy is particularly advantageous for children and young people who can find it difficult to concentrate, such as those with dyslexia and autism... STEP also has the backing of the British Dyslexia Association, which supported the findings of the pilot study.

Shirley-Anne Somerville: We are aware that the STEP programme has been promoted for children who have dyslexia.

Difference 4. Dore claimed that completing the exercises would repair a damaged or underdeveloped cerebellum. It is known that repetitive physical exercises stimulate the cerebellum but STEP makes no assertions of science that any physiological changes take place. STEP involves using repetitive physical exercises to embed actions and make them automatic.

Comment: It is good to see that some of the more florid claims of Dore are avoided by STEP, but the fact remains that the underlying theory is similar, namely that cerebellar training will improve skills beyond motor skills. The idea that training motor skills will produce effects that generalise to other aspects of development is is dubious because the cerebellum is a complex organ subserving a range of functions and controlled studies typically find that training effects are task-specific. I discussed these issues in relation to the Dore programme here.

Specific statements about the cerebellum on the STEP website are:

'After going on national television to tell his heart-breaking story about facing up to the frustrations of overcoming a childhood stumbling block bigger than Mount Everest, Kenny (Logan) is determined to highlight the positive effects of using cerebellum specific teaching and learning programmes in primary school settings.'

And on this page of the website we hear: 'In the last century, academics experimenting with balametrics, dance and movement, established that specifically stimulating the cerebellum through exercise improves skill automation. The STEP Programme is built upon this foundation.'

Difference 5. Dore was a "medical" treatment that required participants to regularly visit treatment centres for "medical" evaluations to determine whether their learning difficulty was being cured. STEP is a primary school physical literacy programme delivered by teaching assistants or other teaching staff. It is to date shown to be most impactful on the lower quartile of the classroom in terms of academic improvement.

This is a rather odd interpretation of the Dore programme, which perhaps is signalled by the use of quotes around 'medical'. I never had the impression it was medical ╨ it was not prescribed or administered by doctors. It is true that Dore did establish centres for assessment and this proved to be a major reason for its commercial failure: there were substantial costs in premises, staffing and equipment. But there was no necessity to run the intervention that way: some people at the time of the collapse suggested it would be feasible to offer the exercises over the internet at much lower cost.

The second point, re the greatest benefits for the lower quartile of the classroom, is on the one hand of potential interest, but on the other hand raises the concern that the benefits could be a classic case of the regression to the mean. This is one of many ways in which scores can improve on an outcome measure for spurious reasons - which is why you need proper randomised controlled trials. Improvements are largely uninterpretable without these because increases in scores can arise because of practice, maturation, regression to the mean or placebo effects.

Difference 6. Dore determined "progress" and "cure" via a series of physical assessments. STEP empirically measures the academic progress of pupils with baseline data and presents reports against actual physical skills developed inviting schools to draw their own conclusions in the context of their school setting.

Comment. Agree that Dore's method of measuring progress and cure was a major problem, because a child could improve on the measures of balance and eye-hand co-ordination and be deemed 'cured' even though their reading had not improved at all. But the account of STEP sounds too vague to evaluate - and the evidence on their website from the pilot study is so underspecified as to be uninterpretable. It is not clear what the measures were, and which children were involved in which measures. I would like to see the full report to have a clearer idea of the methods and results.

Difference 7. Dore claimed that the exercises were developed and delivered in a formulaic manner that was a trade secret. STEP focuses on determining whether a pupils core physical capabilities in balance, eye tracking and coordination. There is no secret formula or claims of one. The genesis of STEP is in balametrics as well as other movement programmes such as Better Movers and Thinkers https://www.ncbi.nlm.nih.gov/pubmed/27247688 and Move to Learn https://www.movetolearn.com.au/research/

Comment. In STEP, how are the scores on core physical capabilities standardized for age and sex? This refers back to my earlier comment about the development work needed to underpin an effective programme. The impression is that people in this field borrow ideas from previous programmes but there is no serious science behind this.

Difference 8. The Dore Programme cost over £2000 per person and was paid for individually. STEP costs £365 per year per child and is completed over 2 years. It is largely paid for through schools that have the discretion to ask parents to fund the programme if it is an additional intervention being offered. STEP also commits a significant number of places to schools free of charge. The fee includes year round school support

Comment. Good to have the differences in charging methods clarified.

Difference 9. Dore published research based around a single school with hypotheses relating to the cerebellum and dyslexia that could not be substantiated. It used dyslexic tendencies as a measure of improvement and selection. STEP as an organisation is wholly open to independent research and evaluation. Its initial pilot study was designed and led by the IAPS Education Committee and conducted by Innovation Bubble, led by Dr Simon Moore, University of Middlesex and Chartered Psychologist. It was held across 17 schools. Further pilot studies have taken place carried out by education districts in Mississippi and ESCCO as well as independent case studies. These have always been presented openly and in the context they were compiled. STEP believes it has sufficient evidence to warrant a large scale evaluation of the Programme.

Comment. In the context of intervention evaluation, quantity of research does not equate with quality. Here is Wikipedia's definition of a pilot study: 'A small scale preliminary study conducted in order to evaluate feasibility, time, cost, adverse events, and effect size (statistical variability) in an attempt to predict an appropriate sample size and improve upon the study design prior to performance of a full-scale research project.' I agree that a large-scale evaluation of the Programme is warranted. It's a bit odd to say the results have been presented openly while at the same time refusing to send me reports unless I take down my blogpost.

It is clear that the MSPs in the debate in the Scottish Parliament were all, without exception, convinced that we already had evidence for the effectiveness of STEP. If they based these impressions on the information on the STEP website (as suggested by Liz Smith's initial statement), then this is worrying, as this came from the pilot study, where the methods were not clearly described, and the description of the results is unclear and looks incomplete, or from uncontrolled case studies.

Here are some of the statements from MSPs:

Liz Smith: As members know, the programme has been used successfully in both England and the United States, and it has been empirically evidenced to reduce the attainment gap in primary school pupils. Pupils who have completed STEP have shown significant improvements academically, behaviourally, physically and socially. A United Kingdom pilot last year compared more than 100 below-attainment primary school pupils who were on the STEP programme to a group of pupils at the same attainment level who were not. The improved learning outcomes that the study showed are extremely impressive: 86 per cent of pupils on the programme moved to on or above target in reading, compared with 56 per cent of the non-STEP group; 70 per cent of STEP pupils met their target for maths, compared with 30 per cent of the non-STEP group; and 75 per cent and 62 per cent of STEP pupils were on or above target for English comprehension and spelling respectively, compared with 43 per cent and 30 per cent of the non-STEP group.
In Mississippi, in the USA, more than 1,000 pupils have completed the programme over the past three years, and it is no coincidence that that state has seen significant improvement in fourth grade - which is the equivalent of P6 - reading and maths, which has resulted in the state being awarded a commendation for educational innovation.

Brian Whittle: The STEP programme is tried and tested, with measured physical, emotional and academic outcomes, especially in the lower percentiles.

Daniel Johnson: Perhaps most impressive is the STEP programme's achievements on academic improvement╤it has led to improved English for 76 per cent of participants, and to improved maths, reading and spelling for 70 per cent of participants. The benefits that physical literacy can bring to academic attainment are clear.

Oliver Mundell: the STEP programme has been shown to work and is popular with both the teachers and the pupils who have benefited from it in England and the USA.

Conclusion This has been a very long postscript, but it seems important to be clear about what the objections to STEP are. I have not claimed that STEP is exactly the same as Dore. My sense of déjà vu arises because of the similarities, in the people involved, in the use of cerebellar exercises involving balance and eye-hand coordination delivered in short sessions, and in the successful promotion of the programme to politicians and schools in the absence of adequate peer-reviewed evidence. Given that the basic theory does not have strong scientific plausibility, this latter point that is the source of greatest concern. We can agree that we all want children to succeed in school and any method that can help them achieve this is to be welcomed. There is also, however, a need for better education of our politicians, so that they are equipped to evaluate evidence properly. They have a responsibility to ensure we do the best for our children, but this requires a critical mindset.