What do I need to know about millisecond timing accuracy?
If you are a psychologist, neuroscientist or vision researcher who uses a computer to run experiments and reports timing measures in units of a millisecond, then it's likely your timings are wrong! This can lead to replication failure, spurious results and questionable conclusions. Timing error affects your work even when you use an experiment generator such as E-Prime, SuperLab, Inquisit, Presentation, Paradigm, OpenSesame or PsychoPy.

Our products' sole aim is to help you improve the quality of your research prior to publication. The Black Box ToolKit v2, for example, helps you check your own millisecond timing accuracy in terms of stimulus presentation accuracy, stimulus synchronization accuracy and response time accuracy.

A summary of the types of millisecond timing error likely to affect your computer-based experiment is shown below:

Millisecond timing error means that your experiment is not working as you intended and that your results might be invalid.
  • Are you always carrying out the experiments you assume you are?

  • Are you aware of millisecond timing error in your own experiments?

  • Are you confident you can replicate experiments using different hardware and software in another lab?

The key question you should be asking yourself is, "Am I confident in my findings and would I be happy for a researcher in the same field to independently check my experiments?"

Are you putting your reputation at risk?

 
 
Idealised experiment shown top; what may happen in reality on your own equipment shown bottom.

Put simply, if you are using a computer to run experiments and report timing measures in units of a millisecond then it's likely that your presentation and response timings are wrong! Modern computers and operating systems, whilst running much faster, are not designed to offer the user millisecond accuracy. As a result you may not have conducted the experiment you thought you had!

Hardware is designed to be as cheap as possible to mass produce and to appeal to the widest market, whilst multitasking operating systems are designed to offer a smooth user experience and look attractive. No doubt you'll have noticed that your new computer and operating system don't seem to run the latest version of your word processor any faster than your old system!


Don't commercial experiment generator packages solve all my problems?
Unfortunately, using a commercial experiment generator such as E-Prime, SuperLab, Inquisit and the like will not guarantee you accurate timing, as they are designed to run on commodity hardware and operating systems. They all quote millisecond precision, but "millisecond precision" refers only to the timing units the software reports in. It should not be confused with "millisecond accuracy", i.e. whether events actually occur in the real world with millisecond accuracy.

If you write your own software you will remain just as uncertain as to its timing accuracy. You should also be wary of in-built time audit measures. Because they are derived by the software itself, they can lead to a false sense of security. For example, if you swap a monitor it is impossible for the software to know anything about a TFT panel's timing characteristics, or for that matter about a response device, soundcard or other device you are working with.

It is also impossible to find out which experiment generator offers the most accurate presentation and response timing using generic benchmarks. Often such benchmarks have been conducted using devices such as our BBTK v2, or homemade response hardware, with the experiment generator scripts tuned to give consistent results. The fatal flaw in such an approach is that the authors have tuned the experiment generator to give good results on their own hardware within a very simple script. If you think about it for a moment, what this actually shows is that you should be checking and tuning your own experiment on your own hardware with a BBTK v2 to give better results. Results from generic benchmarks cannot possibly apply to your own hardware and experiment, as yours will be markedly different from those benchmarked.


What about switching to Mac/PC/Linux?

It doesn't matter which hardware you work with, PC or Mac, or which operating system you use, Microsoft Windows, Apple's OS X or a variety of Linux: you will succumb to timing error. What's more, it's getting harder to source the equipment you might have used previously. For example, CRT monitors are now virtually impossible to source at a reasonable cost. Input lag can have a huge effect on TFT panels, whereas traditional CRTs don't suffer from this effect and can be well over 20x faster when displaying images. In addition, each TFT make and model has different timing characteristics for input lag and panel response time. This means you should check each and every TFT you use. If you can see or hear it – you know you have a problem!


Human variability and adding more trials

There has been a long-standing argument that human responses are far more variable than the hardware and software itself, so timing error can be ignored. In most cases this is only true if the error is truly random, within certain limits, and you are not interacting with other external hardware. When those conditions are not met, timing error can make carrying out replications difficult due to spurious artefacts and conditional biases. In the same way, carrying out an unspecified number of additional trials will not lessen the effect of any systematic presentation, synchronisation or measurement error.
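The point about additional trials can be illustrated with a small simulation. This is only a sketch under assumed numbers, not measured data: a hypothetical true mean reaction time of 300 ms, Gaussian human variability with a standard deviation of 50 ms, and a constant 25 ms delay standing in for a systematic measurement error. The function name and every value are illustrative.

```python
import random
import statistics

TRUE_MEAN_MS = 300.0      # assumed true mean reaction time (illustrative)
HUMAN_SD_MS = 50.0        # assumed trial-to-trial human variability
SYSTEMATIC_LAG_MS = 25.0  # assumed constant delay added by the equipment


def simulate_mean_rt(n_trials, seed=1):
    """Return the mean measured reaction time over n_trials simulated trials."""
    rng = random.Random(seed)
    measured = [
        # Each measurement is the human response plus the fixed equipment delay.
        rng.gauss(TRUE_MEAN_MS, HUMAN_SD_MS) + SYSTEMATIC_LAG_MS
        for _ in range(n_trials)
    ]
    return statistics.fmean(measured)


if __name__ == "__main__":
    for n in (10, 100, 10_000):
        mean_rt = simulate_mean_rt(n)
        print(f"{n:>6} trials: mean = {mean_rt:7.2f} ms, "
              f"error vs truth = {mean_rt - TRUE_MEAN_MS:+6.2f} ms")
```

As the number of trials grows, the random scatter averages out but the mean settles near the true value plus the assumed 25 ms delay, not near the true value itself. More trials narrow the spread; they do not remove the offset.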


Aren't humans pretty slow?

The latest research suggests that humans may actually be able to process information much faster than previously thought. For example, Thurgood et al. (2011) propose that humans can identify animals with only 1 millisecond of visual exposure. To her credit, in order to test this her team had to develop their own light-emitting diode (LED) tachistoscope. Put simply, off-the-shelf equipment was not fast enough. If differences as small as a millisecond can have an experimental effect, this implies that timing errors in a typical study could also have more of an effect than you might think. In the auditory arena, a lag of just 10 milliseconds can be reliably detected.


Human error when designing experiments

Human error when creating the experimental scripts themselves is also an unrecognised problem. For example, software commonly used for experimental work has a variety of settings which can affect presentation of both audio and visual stimuli. Often researchers are unsure what impact various settings might have. It is also not unknown for researchers to set incorrect values or introduce bugs into their own code that affect timings. Such errors can be clearly identified and corrected if studies are checked at an early stage.


Do computers lie?

Computers don't, and more to the point can't, always do what you tell them, and you shouldn't blindly rely on the results they give you. For example, you can tell a piece of software used to run experiments to present a priming image for 11 milliseconds whilst playing a tone in the left headphone for 100 milliseconds. You've dialled in the numbers and the computer has accepted them, but the hardware can't possibly do what you've asked due to TFT panel input lag and soundcard start-up latency. The question is, does this make your experiment less valid because you are not running the experiment you thought you were? More shockingly, different hardware and software have wildly different timing characteristics. So if you reran your study with identical stimulus materials and settings but on different hardware, would you be running a different study? Would your results be comparable?
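One further reason a requested duration may not be achievable, beyond the input lag and latency mentioned above, is that a display with a fixed refresh rate can only show an image for a whole number of frames. The sketch below is illustrative only: the function name is hypothetical, the refresh rates are assumed examples, and rounding to the nearest frame is one plausible policy (real software may round differently). It ignores input lag and panel response time altogether.

```python
def achievable_duration_ms(requested_ms, refresh_hz):
    """Nearest on-screen duration a fixed-refresh display could show.

    Assumes the image is shown for a whole number of frames (at least one)
    and that the software rounds to the nearest frame; both are assumptions.
    """
    frame_ms = 1000.0 / refresh_hz
    n_frames = max(1, round(requested_ms / frame_ms))
    return n_frames * frame_ms


if __name__ == "__main__":
    requested = 11.0  # the priming image duration from the example above
    for hz in (60, 100, 144):  # assumed example refresh rates
        actual = achievable_duration_ms(requested, hz)
        print(f"{hz:>3} Hz: requested {requested:.1f} ms, "
              f"nearest achievable {actual:.2f} ms")
```

Under these assumptions the same script value of 11 milliseconds corresponds to a different on-screen duration on each display, which is one way identical settings can amount to different studies on different hardware.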


Face and faith validity

In computer-based studies, researchers are often prepared to blindly believe what the computer tells them. If the computer reports that a reaction time is 300.14159265 milliseconds, then on the face of it, because there are quite a few digits after the decimal place, surely this must be an accurate measure? Well, actually no. All it tells us is that the computer is quite precise, not that it has given you an accurate measure. A wall clock can be 10 minutes slow but still be precise to the second. If we knew this, would we still say the time we read from its face is accurate? If we didn't know the clock was 10 minutes slow then it would also achieve faith validity. In much the same way, we place our faith in computers being accurate when often they are not.
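The distinction between precision and accuracy can be made concrete by comparing software-reported timings against an external reference. The sketch below uses made-up numbers purely for illustration: a hypothetical event whose true duration is known to be 300 ms, and four hypothetical software readings of it. The function name is also an assumption.

```python
import statistics

REFERENCE_MS = 300.0  # hypothetical true duration from an external reference

# Hypothetical software-reported timings for the same reference event.
reported_ms = [318.14159265, 318.14160112, 318.14158730, 318.14159981]


def bias_and_spread(readings, reference):
    """Return (bias, spread) in ms for readings of a known reference.

    bias   = mean reading minus the reference (how inaccurate the readings are)
    spread = sample standard deviation (how imprecise the readings are)
    """
    bias = statistics.fmean(readings) - reference
    spread = statistics.stdev(readings)
    return bias, spread


if __name__ == "__main__":
    bias, spread = bias_and_spread(reported_ms, REFERENCE_MS)
    print(f"bias   = {bias:.5f} ms")
    print(f"spread = {spread:.8f} ms")
```

With these illustrative readings the spread is tiny while the bias is around 18 ms: the readings agree closely with each other and are all wrong by about the same amount. Only a comparison against an independent reference reveals the bias, since the software's own numbers look equally convincing either way.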


In a nutshell

In a nutshell, bad timing will negatively affect the reliability and validity of your experimental work and the results you find. You may also be unable to replicate your own findings over the longer term. The cornerstone of good science is experimental control and replication.