
The need for expanding and re-focusing of statistical approaches in diagnostic research
H Brenner1, T Stürmer1, O Gefeller2

1Department of Epidemiology, German Centre for Research on Ageing, Heidelberg, Germany
2Department of Medical Informatics, Biometry and Epidemiology, University of Erlangen-Nuremberg, Germany

Correspondence to: Dr H Brenner, Department of Epidemiology, German Centre for Research on Ageing, Bergheimer Str 20, D-69115 Heidelberg, Germany


Fitting statistical methodology to the need of diagnostic research

In his contribution “Misguided efforts and future challenges for research on diagnostic tests”,1 Dr Feinstein has identified major gaps and shortcomings in past and current diagnostic research. While we fully agree with most of his criticisms, we would like to take issue with him over the role of statistics and mathematical formalisation in diagnostic research. In particular, we would like to emphasise the need for, and the potential of, expanding and re-focusing statistical approaches in diagnostic research rather than abandoning them.

The traditional concepts of sensitivity and specificity, as well as the “posterior probabilities” (the predictive values), have certainly been useful as a methodological framework for structuring efforts to quantify the accuracy of diagnostic markers in well defined, very special settings in the past, and they may continue to be useful as such in the future. The major limitation of these concepts lies not so much in their intrinsic properties as in their uncritical application to a wide range of different, usually more complex settings. This misapplication, along with some misconceptions outlined below, has indeed often been severely misleading.
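The relation between these quantities is simply Bayes's theorem. As a minimal sketch (the sensitivity, specificity, and prevalence values are hypothetical, chosen purely for illustration), the predictive values can be computed as:

```python
def predictive_values(sens, spec, prev):
    """Positive and negative predictive values via Bayes' theorem."""
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv

# Hypothetical test: sensitivity 0.90, specificity 0.95.
for prev in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.95, prev)
    print(f"prevalence {prev:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

Even with sensitivity and specificity held fixed, the PPV here ranges from roughly 0.15 at 1% prevalence to roughly 0.95 at 50% prevalence — the familiar dependence of the posterior probabilities on prevalence.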


An important example is the dogma, still found in most textbooks of clinical epidemiology and biostatistics, that sensitivity and specificity, as well as the likelihood ratios, are constant benchmarks of test performance which, in contrast to the posterior probabilities, are independent of disease prevalence in the population studied. As pointed out by Dr Feinstein, this dogma has repeatedly been challenged in various settings by empirical counter-evidence. Furthermore, more general methodological work has demonstrated that in situations commonly encountered in practice, in which diagnostic tests are based on dichotomisation of inherently continuous traits rather than on inherently dichotomous traits, major departures from this dogma are expected to be the rule rather than the exception.2 In particular, it has been shown that for tests based on dichotomisation of inherently continuous traits, variation with disease prevalence is typically expected to be strong for sensitivity and specificity, and even more so for the likelihood ratios. Although the positive and negative predictive values also vary strongly with disease prevalence, this variation is usually much less pronounced than one would expect under the (incorrect) dogma that sensitivity and specificity are independent of disease prevalence.
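This mechanism can be illustrated with a small simulation (a sketch under assumed normal distributions and arbitrary cut-offs, not a reanalysis of reference 2): disease status is defined by dichotomising a latent continuous trait at a fixed cut-off, the test dichotomises a noisy measurement of the same trait at the same cut-off, and prevalence is varied by shifting the trait distribution between populations:

```python
import random

random.seed(1)

def diagnostic_performance(mu, n=100_000, cut=1.5, noise=0.5):
    """Disease = latent trait above cut; test = noisy measurement above cut."""
    tp = fn = tn = fp = 0
    for _ in range(n):
        t = random.gauss(mu, 1.0)          # underlying continuous trait
        m = t + random.gauss(0.0, noise)   # measured marker (trait + error)
        diseased, positive = t > cut, m > cut
        if diseased and positive:
            tp += 1
        elif diseased:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    prev = (tp + fn) / n
    return prev, tp / (tp + fn), tn / (tn + fp)

for mu in (0.0, 0.5, 1.0):
    prev, sens, spec = diagnostic_performance(mu)
    print(f"prevalence {prev:.2f}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

As the trait distribution shifts upwards, prevalence rises, sensitivity rises, and specificity falls — even though the measurement process itself never changes between populations.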


These findings have clear implications for the evaluation and interpretation of diagnostic markers. The performance of a diagnostic test must clearly be related to the characteristics of the population in which it is evaluated. Simply estimating overall values of sensitivity and specificity, or of measures derived from these parameters, should no longer be considered sufficient to characterise test performance. Furthermore, diagnostic studies should be designed in such a way that they also allow estimation of the dependence of test performance on characteristics of the study population and on the study setting.

In particular, it should become common practice in diagnostic studies to evaluate the influence of covariates, such as sex, age, comorbidity, and medication, in addition to disease prevalence in the study population, on the outcome of a diagnostic test. Multivariate approaches, which surprisingly have much less of a tradition in the evaluation of diagnostic tests than in other fields of epidemiology, may help to disentangle the independent contributions of different factors in this context. Multivariate approaches may also help to improve the validity of diagnostic tests by taking covariates into account. For example, some suitable function of the result of the diagnostic test itself and of the values of the covariates may often be more predictive of the true disease status than the result of the diagnostic test alone. In this context, the framework of generalised linear models offers an appropriate statistical modelling approach for selecting the relevant covariates and finding a suitable function for predicting the true disease status from the available information, thereby potentially enhancing the predictive ability of the diagnostic test.
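As a sketch of this idea, consider logistic regression, the canonical generalised linear model for a binary disease status, fitted here by plain gradient ascent on synthetic data. All coefficients and covariates below are invented for illustration; a real analysis would use an established statistical package and carefully chosen covariates:

```python
import math
import random

random.seed(2)

def simulate(n=2000):
    """Synthetic cohort: disease risk depends on a dichotomous test and age."""
    data = []
    for _ in range(n):
        z = random.uniform(-2.0, 2.0)            # standardised age
        test = 1.0 if random.random() < 0.3 else 0.0
        logit = -1.0 + 1.5 * test + 0.6 * z      # assumed "true" model
        y = 1 if random.random() < 1 / (1 + math.exp(-logit)) else 0
        data.append(([1.0, test, z], y))         # [intercept, test, age]
    return data

def fit_logistic(data, lr=0.5, epochs=300):
    """Batch gradient ascent on the logistic log-likelihood."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for x, y in data:
            p = 1 / (1 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
            for j in range(3):
                grad[j] += (y - p) * x[j]
        for j in range(3):
            w[j] += lr * grad[j] / len(data)
    return w

w = fit_logistic(simulate())
print("intercept, test, age coefficients:", [round(wi, 2) for wi in w])
```

The fitted model combines the test result and the covariate into a single predicted probability of disease, which can be more predictive of the true disease status than the dichotomous test result alone.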


We agree with Dr Feinstein's view that the previous predominant, or even exclusive, focus on diagnostic accuracy does not adequately capture the challenges for research on diagnostic tests. A particularly important example is discrimination between two or more alternative diagnoses, which in many practical situations is a more relevant diagnostic task than distinguishing between the absence and presence of a certain disease. For example, the most challenging part of the diagnosis of acute stroke is the differentiation of cerebral infarction from cerebral haemorrhage.3,4 Other well known examples include the challenging differentiation between Crohn's disease and ulcerative colitis,5,6 or between benign prostatic hyperplasia and prostate cancer among patients with intermediate levels of prostate specific antigen.7

Weinstein and Fineberg have briefly addressed situations in which several diseases are under diagnostic consideration. They provided a generalisation of Bayes's theorem that yields individual probabilities for each alternative diagnosis given a specific test result.8 More recently, specific statistical approaches have been proposed to quantify the value of diagnostic procedures for distinguishing between two among several alternative diagnoses,9 which may be more meaningful in this context than measures of diagnostic accuracy. However, these approaches have so far been applied in practice and followed up in diagnostic research only occasionally, and they need to be extended to more complex diagnostic decision situations.
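The generalisation is straightforward to state: given prior probabilities for several mutually exclusive candidate diagnoses and the likelihood of the observed test result under each, the posterior for each diagnosis follows by normalising the joint probabilities. A minimal sketch, with numbers invented for illustration (loosely inspired by the stroke example above, not clinical estimates):

```python
def posteriors(priors, likelihoods):
    """Generalised Bayes' theorem: P(diagnosis | test result) for several
    mutually exclusive candidate diagnoses."""
    joint = {d: priors[d] * likelihoods[d] for d in priors}
    total = sum(joint.values())
    return {d: j / total for d, j in joint.items()}

# Hypothetical pre-test probabilities and likelihoods of a positive finding.
priors = {"infarction": 0.75, "haemorrhage": 0.20, "other": 0.05}
likelihoods = {"infarction": 0.10, "haemorrhage": 0.85, "other": 0.30}

post = posteriors(priors, likelihoods)
for d, p in sorted(post.items(), key=lambda kv: -kv[1]):
    print(f"{d}: {p:.3f}")
```

Note how the test result can overturn the ranking of diagnoses: the diagnosis with the highest prior need not have the highest posterior.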

The approaches outlined above illustrate that overcoming the deficits identified by Dr Feinstein does not imply the need to abandon mathematical models and statistical approaches. The opposite may be true: we have to strive for a deeper understanding of the necessary statistical methodology in order to find better formal strategies for conceptualising what really matters in diagnostic research. A re-formulation of the mathematical models and statistical approaches in a way that better reflects the actual challenges encountered in clinical practice may be a promising avenue to pursue. Collaborative efforts of clinical investigators and methodologists who are open to departing from conventional “paradigms” seem most promising in this context and deserve particular encouragement from funding agencies and editorial boards.
