Over the last few months I have been looking at computationally evaluating confidence intervals and significance tests. This process has helped me sharpen up the recommendations I can give to researchers. I have updated some online papers and blog posts as a result.
This analysis has exposed a rarely discussed difference in which contingency (“χ²-type”) test is optimal, depending on whether the samples for each value of the independent variable (IV) are drawn from the same population or from independent populations.
For 2 × 2 tests, the test recommended when the IV is sociolinguistic (e.g. genre, time, different subcorpora) or otherwise divides samples by participants is the Newcombe-Wilson test. A different test is recommended when the same participant may be sampled under either value of the IV (e.g. when the IV is a lexical-grammatical variable).
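To make the independent-populations case concrete, here is a minimal Python sketch of a Newcombe-Wilson comparison of two observed proportions. It assumes the usual formulation: compute a Wilson score interval for each proportion, then combine the inner interval widths into an interval for the difference. The function names (`wilson_interval`, `newcombe_wilson`, `is_significant`) and the default critical value `z` (two-tailed, 95%) are my own choices for illustration, not taken from any particular library.

```python
import math

def wilson_interval(p, n, z=1.959964):
    """Wilson score interval (lower, upper) for an observed proportion p of n cases."""
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def newcombe_wilson(p1, n1, p2, n2, z=1.959964):
    """Interval for the difference p1 - p2 between two independent samples.

    Assumed formulation: the Wilson interval widths on the inner side of each
    proportion are combined by the square root of the sum of squares.
    """
    l1, u1 = wilson_interval(p1, n1, z)
    l2, u2 = wilson_interval(p2, n2, z)
    d = p1 - p2
    lower = d - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = d + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return lower, upper

def is_significant(p1, n1, p2, n2, z=1.959964):
    """The difference is significant if its interval excludes zero."""
    lower, upper = newcombe_wilson(p1, n1, p2, n2, z)
    return lower > 0 or upper < 0
```

For example, `is_significant(0.9, 100, 0.1, 100)` compares two subcorpora of 100 cases each. Note that the sketch only covers the case where the two samples come from independent populations.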
Meta-comment: In a way, this is another benefit of a blog. Unlike traditional publication, I can quickly correct any problems or improve papers as a result of my discoveries or those of colleagues. However, it also means I need to draw the attention of my readership to any changes.
Confidence intervals and significance tests are closely related, for reasons discussed here. So if we can evaluate a formula for a confidence interval in some way, then we can also potentially evaluate the test.