How metrologists use comparisons
National Metrology Institutes (NMIs) have always used comparisons for scientific purposes, to test their methods and especially their uncertainty analysis. In the early stages of research these scientific comparisons show up the unknown unknowns – the differences between participants that are not (yet) considered in the uncertainty analysis. At this point, comparisons tend to be informal and performed, for example, through participants visiting each other’s facilities. As a field matures and the technical approaches move from research to operational services, comparisons show increasing agreement between participants. At this point the role of comparisons changes from research into auditing and peer review.
This second purpose was formalised in 1999 by the signing of the Mutual Recognition Arrangement (MRA) by the world’s NMIs. The MRA says that ‘within an appropriate degree of equivalence’ the results of one NMI can be considered equivalent to the results of another NMI. In practice this enables world trade and the use of artefacts and instruments calibrated in another country. Being a legal process, the MRA relies on NMIs regularly reviewing each other’s calibration and measurement capabilities through a combination of formal peer review and auditing and through formal ‘key comparisons’ that compare the measurement capability of laboratories – both at the international level (by a handful of laboratories with, generally, the lowest uncertainties) and at the regional level (e.g. within Europe or within Asia-Pacific).
The formal key comparisons are run with strict guidelines and are always blind comparisons (only one ‘pilot’ laboratory has access to the results before they are published). There is ongoing discussion about the best ways of analysing such comparisons, and in particular about the choice of the Key Comparison Reference Value (KCRV) against which all participants are compared. In very mature fields, where the differences between the measured values of the different participants and the KCRV are consistent with uncertainties, the most common KCRV is the weighted mean of the results of the different NMIs. In fields where there is more spread, this may not be the appropriate choice and alternatives (including ‘weighted mean with cut-off’ which limits the weight assigned to the laboratories with the lowest uncertainties, or simply using a median value) are considered.
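The three candidate KCRV estimators mentioned above can be illustrated with a short sketch. This is not code from any official comparison analysis; the function name, signature and the simple form of the cut-off are assumptions made for illustration.

```python
import numpy as np

def kcrv_candidates(values, uncertainties, cutoff=None):
    """Illustrative KCRV estimators (name and signature are assumptions).

    values, uncertainties: measured values x_i and standard uncertainties
    u_i reported by the participating laboratories.
    cutoff: optional lower limit applied to the uncertainties before
    weighting ('weighted mean with cut-off'), which caps the weight
    assigned to the laboratories with the lowest uncertainties.
    """
    x = np.asarray(values, dtype=float)
    u = np.asarray(uncertainties, dtype=float)

    # Classical inverse-variance weighted mean: weights are 1/u_i^2
    w = 1.0 / u**2
    results = {
        "weighted mean": float(np.sum(w * x) / np.sum(w)),
        "median": float(np.median(x)),
    }

    if cutoff is not None:
        u_cut = np.maximum(u, cutoff)  # no u_i smaller than the cut-off
        w_cut = 1.0 / u_cut**2
        results["weighted mean with cut-off"] = float(np.sum(w_cut * x) / np.sum(w_cut))
    return results
```

With the cut-off set at or above the largest reported uncertainty, the weighted mean with cut-off reduces to the unweighted mean, showing how the cut-off interpolates between the weighted mean and an equal-weights estimate.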
It is important to note that for metrologists the purpose of comparisons is to test and validate uncertainty claims. Comparisons are not performed to estimate uncertainties.
Ideally a bilateral comparison is made between two independent observations, each with full uncertainty analysis, to calculate the equivalence ratio:

$$E = \frac{\left| x_1 - x_2 \right|}{k\sqrt{u_1^2 + u_2^2 + u_\mathrm{comp}^2}}$$

where $x_1$ and $x_2$ are the two independent measured values and $u_1$ and $u_2$ are the two standard uncertainties associated with those measured values. $u_\mathrm{comp}$ is the standard uncertainty associated with the comparison itself (e.g. from a known difference between the observation conditions – matchup uncertainties) and $k$ is the coverage factor for the appropriate confidence interval (usually $k = 2$ to have a confidence interval of 95 %).

An equivalence ratio $E \le 1$ suggests that the two measured values agree within their uncertainties, while a larger ratio suggests that at least one uncertainty is underestimated.
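The bilateral equivalence ratio defined above is straightforward to compute; the following is a minimal sketch (the function name and defaults are assumptions, not taken from any standard library):

```python
import math

def equivalence_ratio(x1, u1, x2, u2, u_comp=0.0, k=2.0):
    """Bilateral equivalence ratio E between two independent observations.

    E <= 1: the two measured values agree within their expanded
    uncertainties; E > 1 suggests at least one uncertainty is
    underestimated (or an unaccounted-for effect is present).
    """
    # Combine the two measurement uncertainties with the comparison
    # (matchup) uncertainty in quadrature, then expand by the coverage
    # factor k (k = 2 for a 95 % confidence interval).
    combined = math.sqrt(u1**2 + u2**2 + u_comp**2)
    return abs(x1 - x2) / (k * combined)
```

For example, two values of 10.0 and 10.1 with standard uncertainties of 0.1 each give E well below 1 (agreement), whereas values of 10.0 and 11.0 with the same uncertainties give E well above 1, flagging an underestimated uncertainty budget.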
For a multilateral comparison (with multiple independent observations, e.g. by several participants) it is possible either to perform every pairwise bilateral comparison and present the results in a table showing whether the ratio is greater or less than 1 for each pair of observations, or to determine a comparison reference value and calculate the ratio for each participant with respect to that reference, using:

$$E_i = \frac{\left| x_i - x_\mathrm{ref} \right|}{k\sqrt{u_i^2 + u_\mathrm{ref}^2 + u_\mathrm{comp}^2}}$$

where $x_i$ and $u_i$ are the measured value and standard uncertainty of participant $i$, and $x_\mathrm{ref}$ and $u_\mathrm{ref}$ are the reference value and its associated standard uncertainty.
Note that while the reference can be arbitrarily chosen, some care must be taken in choosing it, as it is easy to interpret an $E_i > 1$ as implying “bad” data, which may have political and potentially financial repercussions. Within the metrology community a weighted mean of all values may be chosen as a reference.
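A multilateral comparison against a weighted-mean reference can be sketched as below. This is a simplified illustration (name and signature are assumptions): in particular it neglects the correlation between each participant's value and a reference that includes that value, which a formal key comparison analysis would account for.

```python
import numpy as np

def multilateral_ratios(values, uncertainties, u_comp=0.0, k=2.0):
    """Equivalence ratio E_i of each participant against a weighted-mean
    reference (simplified sketch; correlation between x_i and the
    weighted mean that includes x_i is neglected here).
    """
    x = np.asarray(values, dtype=float)
    u = np.asarray(uncertainties, dtype=float)

    # Weighted-mean reference value and its standard uncertainty
    w = 1.0 / u**2
    x_ref = np.sum(w * x) / np.sum(w)
    u_ref = 1.0 / np.sqrt(np.sum(w))

    # E_i = |x_i - x_ref| / (k * sqrt(u_i^2 + u_ref^2 + u_comp^2))
    return np.abs(x - x_ref) / (k * np.sqrt(u**2 + u_ref**2 + u_comp**2))
```

Each E_i can then be reported per participant; values above 1 would prompt scrutiny of that laboratory's uncertainty budget (and, as noted above, of the choice of reference itself).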