Background In health studies, proportions and percentages often seem more informative than raw counts and are therefore of greater interest to analysts. However, it has long been acknowledged that their use is problematic in correlation and regression analyses when the independent and dependent variables of a model (exposure and outcome) share a common component, as in the regression analysis of proportions with a common denominator. We demonstrate this so-called mathematical coupling with real-world examples, aided by directed acyclic graphs (DAGs) and simulations.
Methods We consider three possible real-world scenarios: (1) the population size (N) of a geographical area causes both the number of people living in detached houses (X) and the number of people living in care homes (Y) within each area, but the number of people living in detached houses (X) does not cause the number of people living in care homes (Y) within any area, or vice versa; (2) the population size (N) of a geographical area causes both the number of people with no formal qualifications (X) and the number of people with poor self-reported health (Y), and the number of people with no formal qualifications (X) is also a cause of the number of people with poor self-reported health (Y); and (3) within a geographical area, the area wealth (X) causes the number of elderly people (N), while both area wealth (X) and the number of elderly people (N) cause social care expenditure (Y).
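As an illustration only, scenario 1 can be sketched as a small simulation. The sample size, coefficients, intercepts and noise levels below are arbitrary assumptions and are not taken from the study; the helper `ols` and the function `simulate_scenario1` are hypothetical names. The sketch generates X and Y from N with no effect of X on Y, then compares the slope for X from three models: counts alone, proportions with the common denominator N, and counts with N included as a covariate. The nonzero intercepts mean the counts are not strictly proportional to N, which is what allows the shared 1/N term to couple the two proportions in this setup.

```python
import numpy as np


def ols(y, *covariates):
    """Ordinary least squares; returns coefficients with the intercept first."""
    design = np.column_stack([np.ones(len(y)), *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def simulate_scenario1(n_areas=20000, seed=0):
    """Scenario 1: N causes X and Y; X has no effect on Y.

    All numeric values are illustrative assumptions, not study values.
    """
    rng = np.random.default_rng(seed)
    N = rng.uniform(1000, 10000, n_areas)             # population size
    X = 50 + 0.20 * N + rng.normal(0, 50, n_areas)    # people in detached houses
    Y = 20 + 0.05 * N + rng.normal(0, 20, n_areas)    # people in care homes
    return N, X, Y


N, X, Y = simulate_scenario1()

slope_counts = ols(Y, X)[1]             # Y ~ X: confounded by N
slope_ratios = ols(Y / N, X / N)[1]     # Y/N ~ X/N: coupled through 1/N
slope_adjusted = ols(Y, X, N)[1]        # Y ~ X + N: N adjusted as a covariate

print(f"counts only:        {slope_counts:.3f}")
print(f"proportions:        {slope_ratios:.3f}")
print(f"counts, adjusted N: {slope_adjusted:.3f}")
```

Under these assumptions the first two slopes should be clearly positive even though X has no effect on Y, while the slope adjusted for N should be close to zero.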
Results We show that historical solutions to the mathematical coupling caused by a common denominator hold when the denominator is a confounder of the exposure-outcome relationship; i.e. the simulated examples under scenarios 1 and 2 yield the expected regression coefficients. The same solutions do not hold in scenario 3, where the denominator is a mediator (i.e. lies on the causal path) between the exposure and outcome.
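The mediator case can be sketched in the same spirit. This is a minimal illustration that assumes the solution in question is adjustment for the denominator N as a covariate and that the quantity of interest is the total effect of X on Y; neither assumption is confirmed by the text above. All coefficients are arbitrary, the variables are on a standardised scale rather than real units, and `ols` and `simulate_scenario3` are hypothetical names.

```python
import numpy as np


def ols(y, *covariates):
    """Ordinary least squares; returns coefficients with the intercept first."""
    design = np.column_stack([np.ones(len(y)), *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def simulate_scenario3(n_areas=20000, seed=1):
    """Scenario 3: X causes N, and both X and N cause Y (N is a mediator).

    Illustrative values: direct effect of X on Y is 1.0, effect of X on N
    is 0.5, effect of N on Y is 2.0, so the total effect of X on Y is
    1.0 + 0.5 * 2.0 = 2.0.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(0, 1, n_areas)                     # area wealth
    N = 2.0 + 0.5 * X + rng.normal(0, 1, n_areas)     # number of elderly people
    Y = 1.0 * X + 2.0 * N + rng.normal(0, 1, n_areas) # social care expenditure
    return X, N, Y


X, N, Y = simulate_scenario3()

slope_unadjusted = ols(Y, X)[1]     # estimates the total effect of X on Y
slope_adjusted = ols(Y, X, N)[1]    # blocks the path through N: direct effect only

print(f"unadjusted for N: {slope_unadjusted:.2f}")
print(f"adjusted for N:   {slope_adjusted:.2f}")
```

In this setup, adjusting for N removes the part of the effect that runs through the mediator, so the adjusted slope reflects only the direct effect and no longer matches the total effect built into the simulation.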
Conclusion We show how DAGs and accompanying causal graph theory can be used to understand a problem first presented over a century ago. We highlight the issue of mathematical coupling when analysing proportions with a common denominator, showing under which circumstances historical solutions are valid or invalid. By using real-world examples to inform simulations, we demonstrate the utility of DAGs and causal graph theory in health geography and observational research to understand statistical problems and to verify proposed solutions.