In my last post about the Higgs rumors, I referred to an excellent blog post by Matt Strassler that features a long comment exchange between him and Peter Woit about the legitimacy of leaking information about experimental results before the data analysis has been completed. One thing that got me thinking was Matt's point about “blinding the data”. From the context, I could guess what they were referring to, but checking my intuition on Wikipedia made me aware of how common such blinded analyses seem to be in particle physics. From the article about blind experiments:
Modern nuclear physics and particle physics experiments often involve large numbers of data analysts working together to extract quantitative data from complex datasets. In particular, the analysts want to report accurate systematic error estimates for all of their measurements; this is difficult or impossible if one of the errors is observer bias. To remove this bias, the experimenters devise blind analysis techniques, where the experimental result is hidden from the analysts until they’ve agreed—based on properties of the data set other than the final value—that the analysis techniques are fixed.
They give an example of this:
One example of a blind analysis occurs in neutrino experiments, like the Sudbury Neutrino Observatory, where the experimenters wish to report the total number N of neutrinos seen. The experimenters have preexisting expectations about what this number should be, and these expectations must not be allowed to bias the analysis. Therefore, the experimenters are allowed to see an unknown fraction f of the dataset. They use these data to understand the backgrounds, signal-detection efficiencies, detector resolutions, etc.. However, since no one knows the “blinding fraction” f, no one has preexisting expectations about the meaningless neutrino count N’ = N x f in the visible data; therefore, the analysis does not introduce any bias into the final number N which is reported.
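The scheme quoted above is easy to sketch in code. Here is a minimal Python illustration; the seed handling, the range of the hidden fraction, and the event representation are my own assumptions for the sake of the example, not the actual SNO procedure:

```python
import random

def make_blinded_subset(events, secret_seed):
    """Hide an unknown fraction f of the data from the analysts.

    Only a designated 'blinding officer' knows the seed, so the analysts
    never learn f, and the visible count N' = N * f is meaningless to them.
    """
    rng = random.Random(secret_seed)
    f = rng.uniform(0.3, 0.9)  # hypothetical range for the hidden fraction
    return [e for e in events if rng.random() < f]

# Hypothetical usage: 10,000 simulated 'neutrino events'
events = list(range(10_000))
visible = make_blinded_subset(events, secret_seed=42)

# Analysts tune cuts, background estimates and efficiencies on `visible`
# only. Since f is unknown, len(visible) carries no information about the
# final answer, so nobody can (even unconsciously) steer the analysis
# toward an expected N. Once the analysis is frozen, the seed is revealed
# and the full dataset is processed to report the true N.
```

The key design point is that the blinding is purely organizational: nothing in the analysis code changes, only who is allowed to see the unblinded event count and when.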
That seems to me a very reasonable approach for ecology as well; indeed, for every type of experimental or empirical work, but particularly for work that draws on large datasets and databases, i.e. what you might call synthesis ecology.
I wonder if anyone has seriously done this, or at least thought about it, in ecology … ? And my cynical self wonders by how much the percentage of significant results would drop if everyone blinded their data before the analysis ;).