Background Dichotomisation of continuous variables before analysis has frequently been criticised but, nonetheless, remains a common approach. We were interested in the effects of dichotomisation of an outcome variable when two predictors are examined.
Methods Assuming a log-normally distributed continuous outcome, a three-level and a binary independent variable, we evaluated the results that would be obtained by logistic regression after dichotomisation. Different cut-offs, predictor effects and dispersions were examined, with a special focus on interaction terms.
Results Depending on the specific parameter combination, dichotomisation introduced sometimes substantial spurious interactions between the two predictor variables regarding their association with the outcome. These interactions could be assigned statistical significance even with modest sample sizes. Real-life data on sex×weight as determinants of γ-glutamyltransferase provided a practical example of these issues.
Conclusions The findings presented add a new aspect to the controversy surrounding dichotomisation of continuous variables. Researchers should critically examine whether the validity of their results might be hampered by such phenomena.
- spurious association
Dichotomisation of continuous variables is frequently done in many research fields, including biomedical sciences and epidemiology. This pertains to both outcome and predictor variables, and may be motivated by prevailing disease definitions (eg, hypertension1), strongly non-linear relationships with a continuous parameter (eg, cardiovascular risk and glomerular filtration rate2) or by various conventions regarding what is seen as normal or abnormal (eg, serum γ-glutamyltransferase (gGT) concentrations3). Reporting results in terms of ORs rather than linear regression coefficients may sometimes appear advantageous, and additional arguments for dichotomisation have been discussed by others.4 5
It is well known that dichotomisation (or more generally categorisation) results in a loss of information and, consequently, reduced statistical power.5–7 Furthermore, it has been demonstrated that dichotomisation of multiple predictor variables may lead to spurious associations and interactions regarding their associations with the dependent variable.4 8 However, discussion of the effects of outcome dichotomisation has largely focussed on the decreased analytical efficiency6 9 or the strength of associations observed, including the potential impact of changing cut-off definitions.6 7 10
The aims of the present study were to examine how and to what extent dichotomisation of a continuous outcome variable may introduce a spurious interaction between two independent variables. Given the current excitement about sex-specific genetic effects, our evaluations focussed on a scenario with a two-level stratum indicator (sex) and a three-level predictor (single nucleotide polymorphism). We demonstrate the issue with real-life example data of similar structure (sex×normal/overweight/obese).
Table 1 presents the nomenclature used throughout this report. We assumed the quantitative outcome variable y to be log-normally distributed with log-SD σ and log-mean μES, where μES depends on stratum S (considering two strata, say, male and female, this may take on the value of 0 or 1) and exposure E (considering a three-category exposure, say, a genotype with the values GG, GT and TT, this may take on the value of 0, 1 or 2) and satisfies the simple equation μES=μ0+βS·S+βE·E. Thus, the continuous data could be perfectly modelled by a semi-logarithmic linear regression model of the form log(y)=α+β1·x1+β2·x2+e, and there clearly was no interaction present between S and E; that is, regardless of S, y would increase exp(βE)-fold per unit increase in E.
Assuming that some researchers would prefer, for any of the above-mentioned reasons, to analyse these data after dichotomisation of the outcome, we calculated the odds of y exceeding a specified cut-off value exp(c) in each of the six stratum×exposure categories, based on the log-normal distribution function. To reveal possible interactions between the two independent variables, we derived the ratios of the stratum-specific ORs associated with each exposure category as indicated in table 1. The resulting OR interaction terms IA1 and IA2 (ie, interaction terms for the stratum S with the exposure E at E=1 and E=2, respectively) were examined over a range of values of βS, βE, c and σ, with the intercept term μ0 fixed at 0. The parameters were changed one by one, always keeping all other parameters constant at their arbitrarily chosen standard values (βS=1; βE=1; c=1; σ=1). We visualised the results by tabulation and plotting.
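The calculation described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the original analysis code: the function names are hypothetical, and the definition of IA1 and IA2 as the OR in stratum S=1 divided by the OR in stratum S=0 (each relative to E=0) is an assumption about table 1, which is not reproduced here.

```python
from math import erfc, sqrt


def odds_exceed(mu, c, sigma):
    """Odds that log(y) > c when log(y) ~ Normal(mu, sigma)."""
    z = (mu - c) / sigma
    upper = 0.5 * erfc(-z / sqrt(2.0))  # P(log y > c)
    lower = 0.5 * erfc(z / sqrt(2.0))   # P(log y <= c)
    return upper / lower                # both tails computed directly for stability


def interaction_ors(beta_s=1.0, beta_e=1.0, c=1.0, sigma=1.0, mu0=0.0):
    """Return (IA1, IA2): ratio of stratum-specific ORs at E=1 and E=2 (assumed definition)."""
    odds = {
        (s, e): odds_exceed(mu0 + beta_s * s + beta_e * e, c, sigma)
        for s in (0, 1)
        for e in (0, 1, 2)
    }

    def odds_ratio(s, e):
        # OR for exposure level e versus the reference level E=0 within stratum s
        return odds[(s, e)] / odds[(s, 0)]

    ia1 = odds_ratio(1, 1) / odds_ratio(0, 1)
    ia2 = odds_ratio(1, 2) / odds_ratio(0, 2)
    return ia1, ia2


ia1, ia2 = interaction_ors()  # standard values: beta_s = beta_e = c = sigma = 1
print(f"IA1 = {ia1:.3f}, IA2 = {ia2:.3f}")
```

Under this assumed definition and the standard values, IA1 equals 1 while IA2 comes out at about 1.53, although the continuous model contains no interaction at all.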
The most common approach to analysing the dichotomised outcome presumably would be by logistic regression. To get an impression as to the danger of assigning statistical significance to spurious interactions introduced by outcome dichotomisation, we sought the sample size for which product interaction terms between S and E would appear statistically significant (p<0.05) in likelihood ratio tests with 2 degrees of freedom (E treated as a categorical predictor) or 1 degree of freedom (E treated as a continuous predictor; commonly referred to as ‘additive genotype model’ in genetic or ‘linear trend model’ in epidemiological studies). We assumed the strata to be equal-sized (imagine a sex ratio of 1) and the exposure categories 0-1-2 to follow a distribution of n:2n:n (imagine a single nucleotide polymorphism in Hardy–Weinberg equilibrium with minor allele frequency of 50%, the generally most powerful situation in genetic association studies). Simulation analyses were carried out using R 2.6.1 for Windows.11
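One way to get a feeling for the sample sizes involved, without simulation, is to fit the logistic models to the expected cell proportions and scale the resulting likelihood ratio statistic with the study size. The sketch below takes this route; it is a swapped-in approximation rather than the simulation procedure used for this report, all function names are hypothetical, and the criterion applied (expected likelihood ratio statistic equal to the χ² critical value) is an assumption, so the output need not match the figures reported in the results.

```python
from math import erfc, exp, log, sqrt


def p_exceed(mu, c, sigma):
    """P(log y > c) for log y ~ Normal(mu, sigma)."""
    return 0.5 * erfc((c - mu) / (sigma * sqrt(2.0)))


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def max_loglik(X, w, p, n_iter=50):
    """Per-subject maximised log-likelihood of a logistic model fitted to
    expected cell proportions p (cell weights w) by Newton-Raphson."""
    k, cells = len(X[0]), range(len(X))
    beta = [0.0] * k

    def fitted():
        return [1.0 / (1.0 + exp(-sum(b * x for b, x in zip(beta, row)))) for row in X]

    for _ in range(n_iter):
        q = fitted()
        grad = [sum(w[i] * (p[i] - q[i]) * X[i][j] for i in cells) for j in range(k)]
        hess = [[sum(w[i] * q[i] * (1 - q[i]) * X[i][j] * X[i][l] for i in cells)
                 for l in range(k)] for j in range(k)]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    q = fitted()
    return sum(w[i] * (p[i] * log(q[i]) + (1 - p[i]) * log(1 - q[i])) for i in cells)


def interaction_deviance(beta_s=1.0, beta_e=1.0, c=1.0, sigma=1.0, trend=True):
    """Expected likelihood ratio statistic per subject for the S x E interaction."""
    cells = [(s, e) for s in (0, 1) for e in (0, 1, 2)]
    w = [0.5 * (0.25, 0.5, 0.25)[e] for s, e in cells]  # equal strata, n:2n:n exposure
    p = [p_exceed(beta_s * s + beta_e * e, c, sigma) for s, e in cells]
    if trend:  # E as continuous predictor, 1 degree of freedom
        reduced = [[1, s, e] for s, e in cells]
        full = [[1, s, e, s * e] for s, e in cells]
    else:      # E as categorical predictor, 2 degrees of freedom
        reduced = [[1, s, int(e == 1), int(e == 2)] for s, e in cells]
        full = [r + [r[1] * r[2], r[1] * r[3]] for r in reduced]
    return 2.0 * (max_loglik(full, w, p) - max_loglik(reduced, w, p))


CRIT = {True: 3.841, False: 5.991}  # chi-square 95% quantiles for 1 and 2 df

for trend in (True, False):
    total_n = CRIT[trend] / interaction_deviance(beta_s=2.0, trend=trend)
    # each homozygote cell holds 1/8 of the total sample
    print("1 df" if trend else "2 df", "subjects per homozygote cell:", round(total_n / 8))
```

Because the expected statistic grows linearly with the number of subjects, a single fit per model suffices; a full simulation would additionally capture sampling variability.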
Whereas we made a somewhat arbitrary choice by conducting our evaluations using log-normal data, one would expect the principal observations to be made in the present work to be of rather general nature. An advantage of using log-normally (instead of normally) distributed data is that in this setup the linear regression of the log-transformed outcome essentially is already on a multiplicative scale, and any findings made after dichotomisation and logistic modelling consequently could not be ascribed merely to a change in scales of the regression analyses.
To demonstrate the issues discussed with a practical example, we conducted an ad hoc analysis examining whether the effect of overweight and obesity on serum gGT levels differs by sex. Data originated from the baseline examination of an epidemiological study in Germany,10 12 and the dichotomisation cut-off was defined either as the upper reference limit for females (38 IU/l) of the assay used or based on the literature (50 IU/l).13 Example data analyses were conducted using SAS V.9.2 software.14
The main results are summarised in figures 1 and 2. Already with the parameter combination consisting of the chosen standard values, a positive interaction for the higher exposure category was introduced by dichotomisation.
As becomes clear from figure 1, the impact of assuming different exposure or stratum effects (βE, βS) was qualitatively very similar: the more the effect exceeded 1, the larger became both IA1 and IA2. Intriguingly (and more easily seen in table 2 than in the graphical presentation), the interaction term IA2 (that is, the stratum×exposure interaction at the exposure level E=2) grew substantially faster when varying βE than when similarly changing βS, while the development of IA1 (stratum×exposure interaction at E=1) was exactly identical for changes in these two parameters.
With either coefficient approaching 0, the interaction ORs must approach 1. For exposure effects βE <1, this happened somewhat more rapidly than for βS <1, with the difference that IA2 remained >1 throughout for the stratum effect approaching 0, while it dropped below 1 in the βE evaluations (figure 1). IA1 dropped below 1 in both cases.
The picture was rather different for varying values of the dichotomisation cut-off parameter c (figure 2A). Here, positive interactions were introduced when the cut-off was shifted to the left, that is, when c approached 0. However, when the cut-off increased beyond 1, both IA1 and IA2 pronouncedly dropped below 1. Whereas IA2 had generally exceeded IA1 in the prior evaluations, this relationship was clearly reversed for larger values of c.
Figure 2B finally presents the effect of varying the dispersion of the continuous outcome dichotomised. Given that IA1 would be constantly 1 if using only default values for the distribution shifts (table 2), a stratum effect of βS=1.5 was used for this evaluation. With increased dispersion diluting associations and rendering the y-distribution more and more uniform, the interaction terms unsurprisingly approached 1 for larger values of σ. In contrast, lowering σ led to an extremely rapid growth of both IA1 and IA2.
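The behaviour described for the cut-off and the dispersion can be retraced numerically. The following sketch works on the log-odds scale; the function names and the grid values are illustrative choices, and the interaction terms are again assumed to be the ratio of the OR in stratum S=1 to the OR in stratum S=0.

```python
from math import erfc, exp, log, sqrt


def logit_phi(z):
    """Log-odds of the standard normal CDF at z, computed from both tails."""
    return log(erfc(-z / sqrt(2.0))) - log(erfc(z / sqrt(2.0)))


def interaction_or(e, beta_s=1.0, beta_e=1.0, c=1.0, sigma=1.0, mu0=0.0):
    """Interaction OR at exposure level e (1 or 2), assumed to be OR(S=1) / OR(S=0)."""
    def log_odds(s, x):
        return logit_phi((mu0 + beta_s * s + beta_e * x - c) / sigma)

    log_ia = (log_odds(1, e) - log_odds(1, 0)) - (log_odds(0, e) - log_odds(0, 0))
    return exp(log_ia)


# Varying the cut-off c with all other parameters at their standard values
cutoff_sweep = {
    c: (interaction_or(1, c=c), interaction_or(2, c=c))
    for c in (0.0, 1.0, 2.0, 3.0)
}

# Varying the dispersion sigma with beta_s = 1.5, as in the text
sigma_sweep = {
    s: (interaction_or(1, beta_s=1.5, sigma=s), interaction_or(2, beta_s=1.5, sigma=s))
    for s in (0.5, 1.0, 2.0, 4.0)
}

for c, (ia1, ia2) in cutoff_sweep.items():
    print(f"c = {c:.1f}: IA1 = {ia1:.3f}, IA2 = {ia2:.3f}")
for s, (ia1, ia2) in sigma_sweep.items():
    print(f"sigma = {s:.1f}: IA1 = {ia1:.3f}, IA2 = {ia2:.3f}")
```

On this grid, both terms exceed 1 at c=0 and fall below 1 at c=3, where IA2 is smaller than IA1; along the σ grid, both terms shrink towards 1 as σ increases.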
Note that all of the results above are identical to those that would be obtained for a normally distributed outcome with mean μES=μ0+βS·S+βE·E, SD σ and cut-off c. Similarly, this would extend to a variable whose logit is normally distributed with means determined as above, if the cut-offs were applied on the logit scale.
When we evaluated the number of subjects required to obtain a statistically significant interaction finding in logistic regression models, the lowest number of subjects per homozygote cell for the parameter combinations shown in table 2 occurred at βS=2 (LRT1df=312; LRT2df=665). Assuming this combination of values and a stratum and exposure distribution as detailed in the methods section, a 1 degree of freedom likelihood ratio test for a stratum×exposure trend interaction term could assign significance to the spurious interaction with a total study size as small as 2×312+2×624+2×312=2496 participants.
Results of our real-life example on gGT, sex and body mass index are presented in table 3. In the population studied, gGT was—as generally the case for this serum marker—highly skewed with median (interquartile range) of 37.3 (26.8–57.1) IU/l in males and 24.8 (18.5–36.5) IU/l in females, and natural log-transformation generated reasonable normality (not shown). Modelling the transformed gGT by linear regression, the effect estimates for overweight and obesity as determinants of gGT appeared to differ somewhat between males and females, but the statistical interaction test did not reach significance at α=0.05. However, the interaction p value was already smaller for the cut-off of 38 IU/l and ultimately dropped below the significance threshold for the cut-off of 50 IU/l. Thus, in contrast to the analysis of the continuous data, modelling the arbitrarily dichotomised outcome here would lead to a finding of statistically significant effect heterogeneity, in particular an interaction of weight and sex with respect to their role as risk factors for elevated serum gGT, depending on the choice of cut-off.
The elaborations presented above demonstrate that the dichotomisation of a continuous outcome variable can introduce spurious interactions between two predictor variables. Such interaction was caused in essentially all parameter scenarios examined, and in some cases could be assigned statistical significance assuming sample sizes often analysed in modern epidemiological studies.
Clinical dichotomisation thresholds, rightly, are widely applied and often of clear advantage for classifying, diagnosing and appropriately treating patients, for example for obesity (body mass index above a certain cut-off) or hypertension (blood pressure above a certain cut-off). For analyses trying to understand the determinants of such characteristics, however, arbitrary dichotomisation may be not only unfounded but even hazardous. While multiple reasons have previously been brought forward against the widespread habit of dichotomising continuous variables prior to analysis,5–7 our findings add a further facet to the list. The impact of only varying the dichotomisation cut-off on effect estimates and statistical power has been evaluated previously for situations with a single independent variable.6 Realising that our stratified scenario can be seen as a combination of two such bivariate situations, in which the single dichotomisation cut-off appears shifted in relation to the underlying distribution in one of the strata, our intriguing findings probably could have been anticipated. Somewhat surprisingly, however, the implications of the cut-point dependence for assessment of interactions have received little if any attention to date.
The phenomenon described would be of little relevance, if realistic studies never reached the sample size required to detect these interactions with any precision—for example because extreme cut-offs would make the dichotomised outcome too rare. It was not the objective of this study to evaluate all possible study designs and our simulations evaluated only the situation in which the genotypes (our three-level predictors) are in perfect Hardy–Weinberg equilibrium, the variant genotype is as frequent as possible (minor allele frequency 50%) and strata are of equal size. Nonetheless, the minimum relevant sample size found was well below the number of subjects that nowadays are routinely included especially in genetic epidemiology—a field heavily relying on the concept of statistical significance and p values. In this field, large sample sizes are typically needed to detect the mostly very small main effects of specific polymorphisms or haplotypes. Our findings, paired with the fact that sex heterogeneities of the effects of genetic predictors in particular of quantitative traits15 are currently very much en vogue, urge caution when interpreting pertinent results.
Our real-life example presented a situation in which informed—yet, nonetheless arbitrary—usage of cut-offs could result in different conclusions regarding the statistical significance of interactions between two predictors. As real life goes, the data chosen potentially suggested some effect heterogeneity also in the continuous model, even though these analyses conducted purely for instructional purposes clearly would require more extensive considerations and more detailed modelling before drawing conclusions beyond potential pitfalls of dichotomisation.
We only examined a log-normally distributed outcome and focussed on varying effect sizes and cut-off definitions without changing the general shape of the underlying distribution. For the sake of feasibility, we did not attempt to cover all possible parameter combinations or alternative distributions. One, consequently, cannot extend the results to the entirety of continuous outcomes of interest in the health sciences or other fields. These may sometimes not even lend themselves to continuous modelling; for example, because no statistical distribution fitting the outcome well can be identified and consequently no appropriate regression model can be specified. Our findings can only prompt researchers to critically question their results if obtained after outcome dichotomisation, investigate whether phenomena such as demonstrated here might have endangered the validity of the results given the particular real data at hand and determine if the pros of dichotomisation actually outweigh the cons in the individual analysis. If the latter is the case, stratum-specific cut-offs at equivalent positions relative to the stratum-specific distributions could prevent the exposure effects from spuriously varying between the strata, if the shape of the underlying distributions otherwise was equivalent in the strata (excluding the shift accounted for by the stratum-specific cut-off). Alternatively, one should consider the possibility of presenting appropriate analyses of the continuous outcome alongside the artificially binary analyses in order to reveal discrepancies possibly due to spurious interactions and probably worth discussion. It should also be noted that efficient methods have been developed to model the risk of exceeding a dichotomisation cut-off without discarding the information inherent to the underlying continuous data,9 16 though their performance regarding the scenario described here presumably needs further investigation.
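The effect of stratum-specific cut-offs can be illustrated under the idealised model of this report. In the sketch below, the cut-off in stratum S=1 is shifted by exactly βS on the log scale, so that it sits at the same position relative to each stratum's distribution. The function names are hypothetical, the shift is assumed to be known exactly, and the interaction terms are assumed to be the ratio of the OR in stratum S=1 to the OR in stratum S=0.

```python
from math import erfc, sqrt


def odds_exceed(mu, cut, sigma):
    """Odds that log(y) > cut when log(y) ~ Normal(mu, sigma)."""
    z = (mu - cut) / sigma
    return erfc(-z / sqrt(2.0)) / erfc(z / sqrt(2.0))


def interaction_ors(cut_by_stratum, beta_s=1.0, beta_e=1.0, sigma=1.0, mu0=0.0):
    """Return (IA1, IA2) when each stratum s is dichotomised at cut_by_stratum[s]."""
    odds = {
        (s, e): odds_exceed(mu0 + beta_s * s + beta_e * e, cut_by_stratum[s], sigma)
        for s in (0, 1)
        for e in (0, 1, 2)
    }
    ia = []
    for e in (1, 2):
        or_s1 = odds[(1, e)] / odds[(1, 0)]
        or_s0 = odds[(0, e)] / odds[(0, 0)]
        ia.append(or_s1 / or_s0)
    return tuple(ia)


beta_s, c = 1.0, 1.0
common = interaction_ors({0: c, 1: c}, beta_s=beta_s)            # one cut-off for both strata
shifted = interaction_ors({0: c, 1: c + beta_s}, beta_s=beta_s)  # cut-off follows the stratum shift

print("common cut-off:          IA1 = %.3f, IA2 = %.3f" % common)
print("stratum-specific cut-off: IA1 = %.3f, IA2 = %.3f" % shifted)
```

With the shifted cut-off, the exceedance probabilities coincide between strata at every exposure level, so both interaction terms equal 1. In real data the shift would have to be estimated, and the equivalence holds only if the distributions differ between strata by location alone.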
Others have mentioned (or sung, for that matter) that, ‘Breaking Up is Hard to Do.’7 As shown in this report, making sense of relationships found after breaking up might be even harder and interactions observed might well be spurious oddities. This does not preclude consciously splitting up, but it contributes a new aspect to the controversy surrounding this common and sometimes unavoidable practice.
This work was supported by a grant within the German Research Foundation (DFG) priority programme SPP1226 “Nicotine” (Br1704/11-1). The real-life example data were collected as part of the ESTHER cohort baseline examination, which was funded by the Baden Württemberg Ministry of Research, Science and Arts. Over the years, many individuals have contributed to ESTHER and their efforts are gratefully acknowledged.
Funding Deutsche Forschungsgemeinschaft, Germany.
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.