The bridge from complex models to the clinicians' practice

In this issue of the journal, Feinstein has provided great insight and correctly pointed out a number of problem areas in current research on diagnostic tests^{1}: dichotomising a disease state into yes and no, and a test result into positive and negative, do not represent the real clinical situation. Sensitivity, specificity, and likelihood ratios are calculated from patients with known disease status and therefore do not help practising physicians to make a diagnosis. Many research studies on diagnostic tests assume a disease prevalence (prior probability) of 0.5, which is unrealistic. The premise of diagnostic research, that the sensitivity and specificity of a diagnostic test are constant, is now known to be wrong. Practising physicians often want to avoid the correct and academically recommended methods because of complex computations. Many diagnostic tests can produce additional information, but it is usually disregarded in the assessment of the tests. Tests in combination should be evaluated differently than a single test. Tests used to “rule out” or to “rule in” a disease should be evaluated differently. Tests used to identify the stage of a disease, or to offer reassurance rather than diagnosis, cannot be evaluated with conventional indices of accuracy. Tests based on subjective decisions are difficult to assess. Many research studies fail to comply with appropriate methodological standards and are often affected by “work up” bias. Current research seldom questions the accuracy of the “gold standard”. Human errors, such as intraobserver and interobserver variability and concordance, are seldom evaluated. Finally, Feinstein pointed out that the persistent focus on accuracy of diagnosis, while justified 60 years ago, is no longer appropriate, and that the current dominance of mathematical models and statistical approaches can preclude involvement of clinicians in diagnostic research.

In the past several decades, in diagnostic research that involves both clinicians and mathematicians, we have seen both a quest for simplicity and a quest for complexity.

Efforts to achieve simplicity started when Yerushalmy proposed in 1947 the indicators “sensitivity” and “specificity” for a dichotomous test, which are supposed to be simple constants for patients with and without a disease, respectively.^{2} Youden in 1950 went a step further in simplifying and created a single index by combining sensitivity and specificity.^{3} The Youden index, which is “sensitivity plus specificity minus 1”, has been shown to be the excess test performance rate.^{4} (It turns out that the likelihood ratios are relative test performance rates.^{4}) One successful example of simplicity is the development of simple graphical techniques (“nomograms”) to replace the complex Bayes's theorem for calculating posterior probabilities from prior probabilities.^{1} Clinicians can visualise and use simple approaches and therefore favour them. A number of years ago, a group of knowledgeable clinicians in Toronto questioned why the tests ordered for the diagnosis of leukaemia that came back from different laboratories seemed to frequently contradict each other. A simple interrater agreement study was conducted and the κ statistics confirmed the clinicians' suspicion. Results from four diagnostic methods (routine morphology, electron microscopy, cell surface marker, and cancer cytogenetics) correlated poorly for cell type identification in leukaemia.^{5}
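To make the two simple quantities in this paragraph concrete, the following is a minimal sketch of the Youden index and of the Bayes's theorem calculation that a nomogram replaces. The function names and the numerical inputs (sensitivity 0.9, specificity 0.8, prior probability 0.1) are illustrative assumptions and are not taken from any of the cited studies; the sketch also assumes that sensitivity and specificity are constants, the very premise questioned above.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden index: sensitivity plus specificity minus 1."""
    return sensitivity + specificity - 1.0


def posterior_probability(prior: float, sensitivity: float, specificity: float) -> float:
    """Probability of disease after a positive result, by Bayes's theorem."""
    inputs = (("prior", prior), ("sensitivity", sensitivity), ("specificity", specificity))
    for name, value in inputs:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    true_positive = sensitivity * prior                    # diseased and test positive
    false_positive = (1.0 - specificity) * (1.0 - prior)   # not diseased, test positive
    total_positive = true_positive + false_positive
    if total_positive == 0.0:
        raise ValueError("a positive result is impossible under these inputs")
    return true_positive / total_positive


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    print(round(youden_index(0.9, 0.8), 3))
    print(round(posterior_probability(0.1, 0.9, 0.8), 3))
```

With these assumed inputs a positive result raises the probability of disease from 0.1 to about one third, which is the kind of reading a clinician would take directly off a nomogram.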

While simple concepts and calculations are welcomed by clinicians, who are the primary users of diagnostic tests, mathematicians continue to find problems in the simple models and develop complex models to correct such problems. In this regard, we see an evolution from the use of “sensitivity” and “specificity” for a dichotomous diagnostic test with known disease status,^{2} to “predictive values” (posterior probabilities) for a dichotomous test with unknown disease status,^{4} “likelihood ratio” for an ordinal test with known disease status,^{1} and the use of the slope of the receiver operating characteristic (ROC) curve to estimate the likelihood ratio for a continuous test.^{6} It has even been pointed out that there are at least three likelihood ratios and three corresponding slopes on an ROC curve.^{6} (To put things into perspective, for a dichotomous test, positive likelihood ratio is simply the ratio of “sensitivity” to “1 minus specificity”, and negative likelihood ratio is the ratio of “1 minus sensitivity” to “specificity”.^{6} Furthermore, the positive likelihood ratio is the ratio of the “posterior odds” of disease to the “prior odds” of disease, while the negative likelihood ratio is the ratio of the “posterior odds” of no disease to the “prior odds” of no disease.^{4})
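The likelihood ratio definitions for a dichotomous test can be sketched in a few lines. The function names and example values below are assumptions made for illustration. The odds update uses the common textbook convention in which either ratio multiplies the prior odds of disease (the positive ratio after a positive result, the negative ratio after a negative result); this is a convention chosen for the sketch rather than a restatement of the formulation in reference 4.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (positive LR, negative LR) for a dichotomous test."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must lie between 0 and 1")
    if not 0.0 < specificity < 1.0:
        # Excluding 0 and 1 avoids dividing by zero in either ratio.
        raise ValueError("specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def update_probability(prior: float, likelihood_ratio: float) -> float:
    """Convert prior probability to odds, apply the ratio, convert back."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical test: sensitivity 0.9, specificity 0.8, prior 0.1.
    lr_pos, lr_neg = likelihood_ratios(0.9, 0.8)
    print(round(update_probability(0.1, lr_pos), 3))  # after a positive result
    print(round(update_probability(0.1, lr_neg), 3))  # after a negative result
```

Working in odds keeps the update to a single multiplication, which is why the likelihood ratio form is the one drawn on nomograms.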

The quest for complexity is sometimes needed. For example, a mass screening for rectal neoplasm was found to be affected by work up bias.^{7} When a mathematical procedure was developed to correct for the bias,^{8} results were found to be completely reversed: the sensitivity of the faecal occult blood test for rectal neoplasm dropped from 91.6% (without the correction procedure) to 27.8% (with the correction procedure).^{7} The new procedure resulted in a heated debate over which correction procedure is more correct for work up bias.^{9,10} This has subsequently led to the development of three sets of complex mathematical models, using causal modelling, to estimate sensitivity and specificity of three types of tests.^{11}

In the search for ultimate perfection, however, mathematicians often get carried away and create complex models that become alienated from the clinicians. Feinstein's call for new unconventional research using qualitative methods and a paradigm shift is very timely and welcome. The question is how to strike a balance between simplicity and complexity. The solution seems to once again lie jointly in the hands of clinicians and mathematicians.

One way to provide a balance is to create complex models with a simple model-user interface. In the olden days, if one wanted to create or use a machine, such as a horse drawn cart or a sail boat, one had to understand the six simple machines (lever, pulley, wheel and axle, inclined plane, wedge, and screw). With modern technology, however, such thorough knowledge is no longer possible or necessary. Modern automobiles and ocean liners are created by teams of experts, who themselves can understand and contribute to only a small part of the complex problems. Most users do not have a clue as to what is going on inside the machines, nor do they need to, because using such modern machines has become simpler, for example with the introduction of cruise controls and auto-pilots. In terms of diagnostic research, it could mean development of automated computer software, with specified assumptions and restrictions for users, which requires only simple inputs, does the complex calculations, and generates simple outputs that users can easily interpret and apply.
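As a rough illustration of such an interface, the sketch below takes only the four counts of a two by two table and a pre-test probability, enforces a few stated restrictions, and returns one plain sentence. Everything in it is hypothetical: the names `DiagnosticSummary` and `summarise`, the example counts, and the built-in assumptions (a complete verification of all subjects, so no work up bias, and constant sensitivity and specificity). A real tool would have to replace those assumptions with the more complex models discussed above.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticSummary:
    sensitivity: float
    specificity: float
    post_test_if_positive: float
    post_test_if_negative: float

    def plain_text(self) -> str:
        """Simple output that a user can read without knowing the model."""
        return (
            "If the test is positive, the chance of disease is about "
            f"{self.post_test_if_positive:.0%}; if it is negative, about "
            f"{self.post_test_if_negative:.0%}."
        )


def summarise(tp: int, fp: int, fn: int, tn: int, pretest: float) -> DiagnosticSummary:
    """Simple inputs in, simple output out; restrictions are checked up front."""
    if min(tp, fp, fn, tn) <= 0:
        raise ValueError("all four cell counts must be positive")
    if not 0.0 < pretest < 1.0:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)

    def post_test(likelihood_ratio: float) -> float:
        odds = pretest / (1.0 - pretest) * likelihood_ratio
        return odds / (1.0 + odds)

    return DiagnosticSummary(
        sensitivity=sensitivity,
        specificity=specificity,
        post_test_if_positive=post_test(sensitivity / (1.0 - specificity)),
        post_test_if_negative=post_test((1.0 - sensitivity) / specificity),
    )


if __name__ == "__main__":
    # Hypothetical study counts, for illustration only.
    print(summarise(tp=90, fp=40, fn=10, tn=160, pretest=0.1).plain_text())
```

The design choice is that the restrictions are rejected with an explicit message rather than silently handled, so that the user sees which assumption was violated without needing to follow the calculation.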

To quote from Major Greenwood, “The scientific purist, who will wait for medical statistics until they are nosologically exact, is no wiser than Horace's rustic waiting for the river to flow away” (page IX).^{12} Diagnostic tests are imperfect but improving, as is research on diagnostic tests.
