What do I need to know about millisecond timing?
If you are a psychologist, neuroscientist or vision researcher who uses a computer to run experiments, and report timing accuracy in units of a millisecond, then it's likely your timings are wrong! This can lead to replication failure, spurious results and questionable conclusions. Timing error will affect your work even when you use an experiment generator like E-Prime, SuperLab, Inquisit, MATLAB, Presentation, Paradigm or PsychoPy…

The Black Box ToolKit lets you quickly and easily check your own millisecond timing accuracy in terms of stimulus presentation accuracy, stimulus synchronisation accuracy and response time accuracy.

Timing error means that your study is not working as you intended and that your results might be spurious. Are you putting your reputation at risk?

The Black Box ToolKit guarantees your ability to replicate.

Ask yourself:

  • Are you always carrying out the experiments you assume you are?
  • Are you aware of millisecond timing error in your own experiments?
  • Are you confident you can replicate experiments using different hardware and software in another lab?

Put simply, if you are using a computer to run experiments and report timing measures in units of a millisecond, then it's likely that your presentation and response timings are wrong! Modern computers and operating systems, whilst running much faster than their predecessors, are not designed to offer the user millisecond accuracy, so you have not conducted the experiment you thought you had. Hardware is designed to be as cheap as possible to mass produce and to appeal to the widest market, whilst multitasking operating systems are designed to offer a smooth user experience and look attractive. No doubt you'll have noticed that your new computer and operating system doesn't seem to run the latest version of your word processor any faster than your old system did!

Do computers lie?
Computers don't, and more to the point can't, always do what you tell them, and you shouldn't blindly rely on the results they give you. For example, you can tell a piece of software used to run experiments to present a priming image for 11 milliseconds whilst playing a tone in the left headphone for 100 milliseconds. You've dialled in the numbers and the computer has accepted them, but the hardware can't possibly do what you've asked due to TFT panel input lag and soundcard start-up latency. The question is: does this make your experiment less valid, because you are not running the experiment you thought you were? More shockingly, different hardware and software have wildly different timing characteristics. So if you reran your study with identical stimulus materials and settings but on different hardware, would you be running a different study? Would your results be comparable?
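
To make the arithmetic concrete, the short Python sketch below works through what that 11 millisecond request might turn into on real hardware. The 60 Hz refresh rate, 25 ms of panel input lag and 30 ms of soundcard start-up latency are purely illustrative assumptions; on your own equipment these figures would have to be measured externally.

    # Illustrative sketch: what an "11 ms" visual request becomes in practice.
    # The refresh rate, input lag and sound latency below are assumed values.
    REFRESH_HZ = 60                    # assumed TFT refresh rate
    FRAME_MS = 1000 / REFRESH_HZ       # ~16.67 ms per refresh
    PANEL_INPUT_LAG_MS = 25            # assumed panel input lag
    SOUND_STARTUP_LATENCY_MS = 30      # assumed soundcard start-up latency

    requested_visual_ms = 11
    # The panel can only change the image on a refresh, so the shortest
    # possible presentation is a whole number of frames.
    frames_shown = max(1, round(requested_visual_ms / FRAME_MS))
    actual_visual_ms = frames_shown * FRAME_MS

    print(f"Image requested for {requested_visual_ms} ms, shown for "
          f"{actual_visual_ms:.2f} ms and appearing {PANEL_INPUT_LAG_MS} ms late")
    print(f"Tone requested at 0 ms actually starts about "
          f"{SOUND_STARTUP_LATENCY_MS} ms late")

Even in this idealised sketch the image is on screen for roughly 17 ms rather than 11 ms, and both stimuli start later than the script believes, so the visual and auditory events no longer line up the way the script assumes.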

In a nutshell
In a nutshell, bad timing will negatively affect the reliability and validity of your experimental work and the results you find. You may also be unable to replicate your own findings over the longer term. The cornerstones of good science are experimental control and replication.

The key question
The key question you should be asking yourself is, "Am I confident in my findings, and would I be happy for a researcher in the same field to independently check my experiments?" With the fallout from the Stapel fraud case, the push for Open Access journals, and funders requiring the sharing of research data and resources, researchers are coming under much more scrutiny. Are you putting your reputation at risk? If you need some convincing, take a look at why millisecond timing accuracy can be more difficult to achieve than you might think and how it can affect any study.

Face and faith validity
In terms of computer-based studies, researchers are often prepared to blindly believe what the computer tells them. If the computer reports that a reaction time is 300.14159265 milliseconds, then surely, with that many digits after the decimal place, this must be an accurate measure? Well, actually, no. All it tells us is that the computer is quite precise, not that it has given you an accurate measure. A wall clock can be 10 minutes slow yet still tick off the seconds precisely. If we knew this, would we still say the time we read from its face is accurate? If we didn't know the clock was 10 minutes slow, it would also achieve faith validity. In much the same way, we place our faith in computers being accurate when often they are not.
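
The same point can be made in a couple of lines of Python. The 18 ms offset below is an invented constant standing in for hidden hardware latency that the software cannot see; the reported figure looks impressively precise yet is consistently wrong:

    # Illustrative sketch: precision is not accuracy.
    TRUE_RT_MS = 300.0          # the response time that actually occurred
    HIDDEN_OFFSET_MS = 18.0     # assumed constant latency the software cannot see

    reported_rt_ms = TRUE_RT_MS + HIDDEN_OFFSET_MS
    print(f"Reported RT: {reported_rt_ms:.8f} ms")                # many decimal places
    print(f"Error:       {reported_rt_ms - TRUE_RT_MS:+.1f} ms")  # still 18 ms out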

Anti-patterns
Most researchers have a tacit understanding that they should construct well-controlled computer-based studies and that presentation and response timing is important. However, unless made a requirement by journal publishers and funders, many will continue with their current bad practices. These might be thought of as anti-patterns (Koenig 1995). In software engineering, an anti-pattern is a pattern that may be commonly used but is ineffective and/or counterproductive in practice. Two key elements distinguish an actual anti-pattern from a simple bad habit, bad practice, or bad idea:

  • Some repeated pattern of action, process or structure that initially appears to be beneficial, but ultimately produces more bad consequences than beneficial results, and
  • A refactored solution exists that is clearly documented, proven in actual practice and repeatable

In the case of computer-based experiments this means that researchers may forgo adequate testing in order to save themselves perceived additional workload. However, a proven solution exists for controlling poor presentation and response timing: the Black Box ToolKit.

More and more complex experiments
All the while, researchers continue to create ever more complex computer-administered paradigms that look for increasingly small differences between conditions. Often paradigms interact and synchronise with complex third-party equipment, e.g. fMRI, EEG, eye trackers, biofeedback devices etc., which in turn introduces many more variables that can affect timing accuracy. Unaccounted-for systematic biases and uncontrolled confounding variables can alter the whole experimental outcome and make replication difficult. Just because modern hardware and software lets you construct pretty much any experiment you like, this doesn't mean it's doing what you tell it, or for that matter that it did what it told you it did. Buying a faster computer or "better" software will not help. Unless you control for timing error you can never be totally confident of your results.

Don't commercial experiment generator packages solve all my problems?
Unfortunately, using a commercial experiment generator such as E-Prime, SuperLab, Inquisit and the like will not guarantee you accurate timing, as they are designed to run on commodity hardware and operating systems. They all quote millisecond precision, but logically "millisecond precision" refers only to the timing units the software reports in and should not be confused with "millisecond accuracy". If you write your own software you will remain just as uncertain about its timing accuracy. You should also be wary of built-in time audit measures, as they can lead to a false sense of security: they are derived by the software itself. For example, if you swap a monitor it is impossible for the software to know anything about the new TFT panel's timing characteristics, or for that matter about a response device, soundcard or other device you are working with.

What about switching to Mac/PC/Linux?
It doesn't matter which hardware you work with, PC or Mac, or which operating system you use, Microsoft Windows, Apple's OS X or a variety of Linux: you will succumb to timing error. What's more, it's getting harder to source the equipment you might have used previously. For example, CRT monitors are now virtually impossible to source at a reasonable cost. As we have seen, input lag can have a huge effect on TFT panels, whereas traditional CRTs don't suffer from this effect and can be well over 20x faster when displaying images. What's more, each TFT make and model has different timing characteristics for input lag and panel response time. This means you should check each and every TFT you use. If you can see or hear it, you know you have a problem!

Human variability and adding more trials
Finally, there has been a long-standing argument that human responses are far more variable than the hardware and software itself. In most cases this is only true if the error is truly random, falls within certain limits and you are not interacting with other external hardware. However, as can be seen from the video showing input lag on two TFT monitors, typical error is no longer random. This can make carrying out replications difficult due to spurious artefacts and biases across conditions. In the same way, carrying out an unspecified number of additional trials will not lessen the effect of any systematic presentation, synchronisation or measurement error, as the sketch below illustrates.
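
A minimal simulation makes the point, assuming a true mean response time of 300 ms, normally distributed human variability of 50 ms and a constant 20 ms of equipment bias (all invented figures). However many trials are run, the estimate converges on the biased value, never the true one:

    # Illustrative simulation: more trials shrink random noise but leave
    # systematic timing error untouched. All numbers are assumptions.
    import random

    random.seed(1)
    TRUE_MEAN_MS = 300.0        # true mean response time
    HUMAN_SD_MS = 50.0          # human trial-to-trial variability
    SYSTEMATIC_BIAS_MS = 20.0   # constant, unmeasured equipment error

    for n_trials in (20, 200, 2000):
        rts = [random.gauss(TRUE_MEAN_MS, HUMAN_SD_MS) + SYSTEMATIC_BIAS_MS
               for _ in range(n_trials)]
        mean_rt = sum(rts) / n_trials
        print(f"{n_trials:>5} trials: observed mean = {mean_rt:6.1f} ms "
              f"(true mean is {TRUE_MEAN_MS:.0f} ms)")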

Aren't humans pretty slow?
The latest research suggests that humans may actually be able to process information much faster than previously thought. For example, Thurgood et al. (2011) propose that humans can identify animals with only 1 millisecond of visual exposure. To their credit, in order to test this they had to develop their own light-emitting diode (LED) tachistoscope; put simply, off-the-shelf equipment was not fast enough. If differences as small as a millisecond can have an experimental effect, this implies that timing errors in a typical study could also have more of an effect than you might think. In the auditory domain, a lag of just 10 milliseconds can be reliably detected.

Human error when designing experiments
Human error when creating the experimental scripts themselves is also a largely unrecognised problem. For example, software commonly used for experimental work has a variety of settings which can affect the presentation of both audio and visual stimuli, and often researchers are unsure what impact those settings might have. It is also not unknown for researchers to set incorrect values or introduce bugs into their own code that affect timings. Such errors can be readily identified and corrected if studies are checked at an early stage.
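
As a hedged illustration, the sketch below shows one common class of scripting mistake: confusing milliseconds with frames. show_stimulus() is a hypothetical stand-in for whatever presentation call your package provides, not a real API; the bug lies in the units the researcher passes, not in the library.

    # Illustrative sketch of a units bug: passing milliseconds to a call that
    # expects frames. show_stimulus() is hypothetical, not a real package API.
    FRAME_MS = 1000 / 60  # assumed 60 Hz display

    def show_stimulus(duration_frames: int) -> float:
        """Pretend presentation call; duration is expected in frames."""
        return duration_frames * FRAME_MS

    # Intended: a 200 ms prime.
    buggy_ms = show_stimulus(200)                      # 200 frames, ~3.3 seconds!
    correct_ms = show_stimulus(round(200 / FRAME_MS))  # 12 frames, ~200 ms
    print(f"Buggy call keeps the prime up for {buggy_ms:.0f} ms")
    print(f"Correct call keeps it up for {correct_ms:.0f} ms")

Checking the study against external measurement hardware at an early stage would flag a mistake like this immediately.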

Reliability and validity
Writing in The Psychologist magazine, in "Replication, replication, replication", Ritchie, Wiseman and French (2012) noted that replication was one of the cornerstones of scientific progress. We would wholeheartedly agree!

Ritchie et al. highlighted their own startling attempts to replicate recently published work, attempts which failed miserably. The work they were trying to replicate was that of Bem (2011b), who tested the purported psychological phenomenon of 'time-reversing'.

By far the largest effect size was obtained in Bem's final experiment, which investigated the 'retroactive facilitation of recall'. In this procedure, participants were shown a serial list of words, which they then had to type into a computer from memory in a surprise free recall test. After the test, the computer randomly selected half of the words from the list and showed them again to the participants. Bem's results appeared to show that this post-test practice had worked backwards in time to help his participants to remember the selected words – in the recall test they had remembered more of the words they were about to (randomly) see again. If these results are true, the implications for psychology – and society – are huge. In principle, experimental results could be confounded by participants obtaining information from the future, and studying for an exam after it has finished could improve your grade! As several commentators have pointed out, Bem's (2011b) experiments were far from watertight – for instance, Alcock (2011) and Yarkoni (2011) have outlined numerous experimental flaws in the design.

We might suggest that at least a proportion of the effect could be due to timing error inherent in the equipment used. Further, we suggest that many unsuccessful replications, or indeed original results themselves, might be due to such errors. If your own replication was unsuccessful, are you certain that timing error isn't the cause?