Abstract
Meta-analysis is a method to obtain a weighted average of results from various studies. In addition to pooling effect sizes, meta-analysis can also be used to estimate disease frequencies, such as incidence and prevalence. In this article we present methods for the meta-analysis of prevalence. We discuss the logit and double arcsine transformations to stabilise the variance. We note the special situation of multiple category prevalence, and propose solutions to the problems that arise. We describe the implementation of these methods in the MetaXL software, and present a simulation study and the example of multiple sclerosis from the Global Burden of Disease 2010 project. We conclude that the double arcsine transformation is preferred over the logit, and that the MetaXL implementation of multiple category prevalence is an improvement in the methodology of the meta-analysis of prevalence.
 Meta Analysis
 Statistics
 Methodology
Introduction
The large majority of meta-analyses are devoted to establishing the effects of interventions, and therefore aim to obtain a pooled estimate of effect size, be it relative risk, odds ratio (OR), risk difference, or weighted or standardised mean difference. However, meta-analysis methods can also be used to obtain a more precise estimate of disease frequency, such as disease incidence rates and prevalence proportions. For example, the Global Burden of Disease 2010 project aimed to obtain frequency estimates for a large number of diseases and conditions, often based on a limited number of studies of varying quality.
This article looks at the meta-analysis of prevalence, using a concrete example from the Global Burden of Disease 2010 study: the distribution of severity in the prevalence of multiple sclerosis (MS). We first discuss the specific properties of prevalence as a variable and the options for dealing with these in the meta-analysis, next look at the multiple category case, then discuss the implementation of these methods in the MetaXL software, present the simulation study and the example, and conclude.
Prevalence as a variable
While many epidemiologists habitually speak of ‘prevalence rate’, prevalence is defined as a proportion: the number of cases of a disease in a population, divided by the population size. This definition implies that (1) prevalence is always between 0 and 1 (inclusive), and (2) the prevalences over all categories always sum to 1.
The definition of prevalence matches that of a binomial variable (the number of successes in a sample), and therefore the standard assumption is that prevalence follows a binomial distribution. Because the main meta-analysis methods are based on the inverse variance method (or modifications thereof), the binomial equation for variance (expressed as a proportion) can be used to obtain the individual study weights:
$$\operatorname{var}(p) = \frac{p(1-p)}{N} \qquad (1)$$
where p is the prevalence proportion, and N the population size.
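Thus, with studies indexed by i, the pooled prevalence estimate P becomes (according to the inverse variance method):
$$P = \frac{\sum_i p_i / \operatorname{var}(p_i)}{\sum_i 1 / \operatorname{var}(p_i)} \qquad (2)$$
with SE:
$$\operatorname{SE}(P) = \sqrt{\frac{1}{\sum_i 1 / \operatorname{var}(p_i)}} \qquad (3)$$
The CI of the pooled prevalence can then be obtained by:
$$P \pm Z_{\alpha/2} \times \operatorname{SE}(P) \qquad (4)$$
where $Z_{\alpha/2}$ denotes the appropriate factor from the standard Normal distribution for the desired confidence percentage (eg, $Z_{0.025}=1.96$).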
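As a minimal sketch of equations (1) to (4), the Python function below pools untransformed proportions with fixed effect inverse variance weights. The function name `pool_untransformed` and its inputs (lists of case counts and study sizes) are illustrative assumptions, not part of MetaXL. Note that it fails with a division by zero for any study whose proportion is exactly 0 or 1, because equation (1) then gives a variance of 0.

```python
import math


def pool_untransformed(cases, sizes, z=1.96):
    """Fixed effect inverse variance pooling of untransformed proportions.

    cases[i] is the number of cases and sizes[i] the population size of study i.
    Returns (P, lower, upper) following equations (1) to (4).
    """
    weights, props = [], []
    for n, N in zip(cases, sizes):
        p = n / N
        var = p * (1 - p) / N            # equation (1)
        weights.append(1 / var)          # inverse variance weight
        props.append(p)
    total_w = sum(weights)
    P = sum(w * p for w, p in zip(weights, props)) / total_w  # equation (2)
    se = math.sqrt(1 / total_w)                                # equation (3)
    return P, P - z * se, P + z * se                           # equation (4)


P, lower, upper = pool_untransformed([10, 20], [100, 100])
print(round(P, 3))  # 0.136
```

Even in this two-study example the pooled value sits below the simple mean of 0.15, because the study with the smaller proportion has the smaller variance and hence the larger weight.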
Transformations
While this works fine for prevalence proportions around 0.5, problems increase as the proportions get closer to the limits of the 0..1 range. The first problem is mostly cosmetic: the equation for the CI does not preclude confidence limits outside the 0..1 range. The second problem is much more substantial: when the proportion becomes very small or very large, the variance of the study is squeezed towards 0 (see equation (1)). As a consequence, in the inverse variance method, the study gets a large weight. A meta-analysis of prevalence according to the method described above therefore puts undue weight on the studies at the extremes of the 0..1 range.
The way to deal with these problems is to transform the prevalence to a variable that is not constrained to the 0..1 range, has an approximately normal distribution, and by being unconstrained avoids the squeezing of variance effect. The meta-analysis can be carried out on the transformed proportions, using the inverse of the variance of the transformed proportion as study weight. For final presentation, the pooled transformed proportion and its CI are back transformed to a proportion. We discuss two transformations that have been used: the logit and the double arcsine.
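To make the squeezing concrete, the short Python sketch below (the helper name is hypothetical) compares the inverse variance weights that equation (1) gives to two studies of identical size, one with prevalence 0.5 and one with prevalence 0.01.

```python
def untransformed_weight(p, N):
    """Inverse variance weight 1/var(p) = N / (p(1 - p)), from equation (1)."""
    return N / (p * (1 - p))


w_mid = untransformed_weight(0.5, 100)   # 400.0
w_low = untransformed_weight(0.01, 100)  # about 10101
print(round(w_low / w_mid, 1))           # 25.3
```

Although both studies have 100 subjects, the study with prevalence 0.01 receives about 25 times the weight of the study with prevalence 0.5.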
Logit
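The logit transformation is given by1:
$$l = \operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right) \qquad (5)$$
with variance:
$$\operatorname{var}(l) = \frac{1}{Np} + \frac{1}{N(1-p)} \qquad (6)$$
The back transformation to a proportion is done using:
$$p = \frac{\exp(l)}{\exp(l) + 1} \qquad (7)$$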
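Equations (5) to (7) translate directly into code. A minimal Python sketch, assuming a study is described by its case count n and population size N (the function names are illustrative); like equation (6) itself, it breaks down when n is 0 or N.

```python
import math


def logit_transform(n, N):
    """Return l = logit(n/N) (equation 5) and its variance (equation 6)."""
    p = n / N
    l = math.log(p / (1 - p))
    var = 1 / (N * p) + 1 / (N * (1 - p))
    return l, var


def logit_back(l):
    """Back transform a (pooled) logit to a proportion (equation 7)."""
    return math.exp(l) / (math.exp(l) + 1)


l, var = logit_transform(25, 100)
print(round(l, 4), round(var, 4))  # -1.0986 0.0533
print(round(logit_back(l), 4))     # 0.25
```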
However, while the logit transformation solves the problem of estimates falling outside the 0..1 limits, it does not succeed in stabilising the variance. Rather, it reverses the variance instability of the untransformed proportions: studies with proportions close to 0 or 1 get their variance estimates grossly magnified, while the variance for proportions around 0.5 is squeezed. So a large study with a small prevalence can be completely outweighed by a small study with prevalence around 0.5. The variance instability that plagues untransformed proportions thus persists after logit transformation.
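Inverting equation (6) gives a logit weight of N p (1 − p), which makes the reversal easy to check. In the hypothetical comparison below, a study of 1000 people with prevalence 0.02 receives less weight than a study of 100 people with prevalence 0.5.

```python
def logit_weight(p, N):
    """Inverse of the logit variance in equation (6); algebraically N * p * (1 - p)."""
    return 1 / (1 / (N * p) + 1 / (N * (1 - p)))


large_rare = logit_weight(0.02, 1000)  # 1000 * 0.02 * 0.98 = 19.6
small_mid = logit_weight(0.5, 100)     # 100 * 0.5 * 0.5 = 25.0
print(large_rare < small_mid)          # True
```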
Double arcsine
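The double arcsine transformation is given by2:
$$t = \arcsin\sqrt{\frac{n}{N+1}} + \arcsin\sqrt{\frac{n+1}{N+1}} \qquad (8)$$
with n the number of people in the category. The variance of t is given by:
$$\operatorname{var}(t) = \frac{1}{N+0.5} \qquad (9)$$
The back transformation to a proportion is done using3:
$$p = \frac{1}{2}\left[1 - \operatorname{sgn}(\cos t)\sqrt{1 - \left(\sin t + \frac{\sin t - 1/\sin t}{N}\right)^{2}}\,\right] \qquad (10)$$
with ‘sgn’ being the sign operator. A much simpler but less accurate alternative back transformation is:
$$p = \left(\sin\frac{t}{2}\right)^{2} \qquad (11)$$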
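The double arcsine transformation and both back transformations can be sketched in Python as follows. This illustrates equations (8) to (11) with function names of our own choosing; Miller's back transformation is applied here to a single study, so N is simply that study's population size.

```python
import math


def sgn(x):
    """Sign operator: 1, 0 or -1."""
    return (x > 0) - (x < 0)


def double_arcsine(n, N):
    """Double arcsine transform t (equation 8) and its variance (equation 9)."""
    t = math.asin(math.sqrt(n / (N + 1))) + math.asin(math.sqrt((n + 1) / (N + 1)))
    return t, 1 / (N + 0.5)


def back_miller(t, N):
    """Miller's back transformation (equation 10); unstable when sin t is near 0."""
    s = math.sin(t)
    inner = 1 - (s + (s - 1 / s) / N) ** 2
    return 0.5 * (1 - sgn(math.cos(t)) * math.sqrt(inner))


def back_simple(t):
    """Simpler, less accurate back transformation (equation 11)."""
    return math.sin(t / 2) ** 2


t, var = double_arcsine(20, 100)
print(round(back_miller(t, 100), 4))  # 0.2
print(round(back_simple(t), 4))       # 0.2029
```

In this example Miller's formula returns the original prevalence of 0.2, while the simple formula gives about 0.203.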
The double arcsine transformation addresses both the problem of confidence limits outside the 0..1 range and that of variance instability, and it is therefore preferred over the logit transformation.
Multi-category prevalence
The discussion so far has implicitly been about single category prevalence (those with the disease present, which of course implies a second category of those not in the first). But in some instances k-category prevalences with k>2 may be meta-analysed (representing different health states associated with the disease, such as mild, moderate and severe), and this complicates matters.
The first thing to mention is that a multi-category prevalence from a study is a single (vector-valued) observation, which suggests that a single study weight should apply across all categories. This is no problem in the case of the double arcsine transformation, because its variance depends on the population size (N) only. But for the untransformed and logit transformed options, the variance of both p and logit(p) depends on p itself, which implies that the same study would get a different weight in each category, and that seems hard to justify.
A second issue is the calculation of the heterogeneity statistic Cochran's Q. With multiple categories, the standard calculation of Q returns a different value for each category, again something that is hard to justify. Moreover, Q is used in further calculations: for τ² in the random effects model, and for the overdispersion correction in the quality effects model (see below).
The third issue is that, because of the non-linearities introduced by the logit and double arcsine transformations, the back transformed pooled proportions from multiple categories will not necessarily add up to 1. We discuss solutions to these three issues in the next section.
The MetaXL implementation
MetaXL is a new, freely available software program for meta-analysis in Microsoft Excel, available from http://www.epigear.com. It implements the standard fixed effects (inverse variance, Mantel–Haenszel, Peto) and random effects models. MetaXL is unique in that it also implements a quality effects model, which allows explicit accounting for study quality in the meta-analysis.4–6
MetaXL implements three variants of meta-analysis of prevalence: untransformed, logit transformed and double arcsine transformed prevalence, with the double arcsine transformation as the default option. The user can specify as many categories as desired, and can choose between fixed effects (inverse variance only), random effects and quality effects models.
The back transformation in equation (10) can become numerically unstable when sin t is close to 0; to avoid this, the double arcsine back transformation is implemented in MetaXL as follows.
Let: 12
where $\bar{t}$ is the pooled t. Then: 13
otherwise
where $P$ is the pooled prevalence and $v$ is the pooled variance. Instead of the harmonic mean of the study sizes, as suggested by Miller,3 we use $1/v$ as the estimate of N, because our simulation studies suggest that this is a more stable estimate for N than the harmonic mean. This works because the variance of a double arcsine transformed proportion is approximately 1/N (see equation (9)).
The lower (LLC) and upper (ULC) limit are given by: 14
otherwise
and 15
otherwise
where .
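The sketch below is not MetaXL's code and need not match equations (12) to (15) exactly. It shows one way, under our own assumptions, to make Miller's back transformation safe for a pooled estimate: N is estimated as 1/v, as described above, and the result is fixed at 0 or 1 when the pooled t lies at or beyond the double arcsine values of n = 0 or n = N for that N, outside which equation (10) can give meaningless results (its 1/sin t term grows without bound as t approaches 0 or π). Following the general approach of back transforming the pooled CI, the limits are obtained by applying the same function to t̄ ± Z√v. The names `t_bar`, `v` and the function names are hypothetical.

```python
import math


def sgn(x):
    """Sign operator: 1, 0 or -1."""
    return (x > 0) - (x < 0)


def back_transform_pooled(t_bar, v):
    """Guarded Miller back transformation of a pooled double arcsine value.

    Assumptions (not MetaXL's exact rules): N is estimated as 1/v, and the
    result is fixed at 0 or 1 outside the range of t attainable for that N.
    """
    n_eff = 1 / v
    # Double arcsine value for n = 0; by symmetry, n = N gives pi - t_min.
    t_min = math.asin(math.sqrt(1 / (n_eff + 1)))
    if t_bar <= t_min:
        return 0.0
    if t_bar >= math.pi - t_min:
        return 1.0
    s = math.sin(t_bar)
    inner = 1 - (s + (s - 1 / s) / n_eff) ** 2
    # inner lies in [0, 1] inside the bounds; max() only absorbs rounding error.
    return 0.5 * (1 - sgn(math.cos(t_bar)) * math.sqrt(max(inner, 0.0)))


def pooled_prevalence_ci(t_bar, v, z=1.96):
    """Back transform the pooled t and its CI limits t_bar -/+ z * sqrt(v)."""
    half_width = z * math.sqrt(v)
    return (back_transform_pooled(t_bar, v),
            back_transform_pooled(t_bar - half_width, v),
            back_transform_pooled(t_bar + half_width, v))


t_bar = math.asin(math.sqrt(20 / 101)) + math.asin(math.sqrt(21 / 101))
print(pooled_prevalence_ci(t_bar, 1 / 100.5))
```

Within the bounds the back transformation increases monotonically with t, so the back transformed limits stay ordered around the point estimate.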
The three issues caused by generalising to more than two categories are dealt with as follows. To obtain a single study weight across categories for the untransformed and logit transformed prevalences, MetaXL uses the inverse of the average of the category variances.
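As a sketch of this rule (function names hypothetical), the code below computes the untransformed and logit category variances of one study with three categories and turns each set into a single study weight.

```python
def untransformed_variances(counts, N):
    """Category variances p(1 - p)/N from equation (1)."""
    return [(n / N) * (1 - n / N) / N for n in counts]


def logit_variances(counts, N):
    """Category variances from equation (6): 1/(Np) + 1/(N(1 - p)) = 1/n + 1/(N - n)."""
    return [1 / n + 1 / (N - n) for n in counts]


def common_study_weight(variances):
    """Single weight for a study: inverse of the mean of its category variances."""
    return 1 / (sum(variances) / len(variances))


counts, N = [60, 30, 10], 100  # hypothetical mild/moderate/severe counts
print(round(common_study_weight(untransformed_variances(counts, N)), 1))  # 555.6
print(common_study_weight(logit_variances(counts, N)))
```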
For the second issue, to obtain a single estimate of Cochran's Q, we have chosen to take the maximum category Q. The reasoning is that if we believe that the variation in effect sizes across studies in one category is not independent of the variation in the other categories, the maximum category Q is the best and most conservative estimate of a common Q.
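A minimal sketch of this choice, assuming the transformed study estimates are grouped per category and share the common study weights: Cochran's Q is computed for each category and the largest value is kept.

```python
def cochran_q(estimates, weights):
    """Cochran's Q: weighted sum of squared deviations from the weighted mean."""
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))


def common_q(estimates_by_category, weights):
    """Single Q across categories: the maximum of the category Qs."""
    return max(cochran_q(estimates, weights) for estimates in estimates_by_category)


# Two studies with equal weights; category A disagrees, category B agrees.
weights = [1.0, 1.0]
print(common_q([[0.0, 2.0], [1.0, 1.0]], weights))  # 2.0
```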
For the random effects model, a single τ² is calculated in the usual way from this common Q, and this τ² is added to the previously computed average variance across categories. Inverting this inflated variance gives the random effects weight for each study. The pooled variance for each category is computed by first adding τ² to each study's category variance, inverting the result to a weight, and summing the category weights over studies. The pooled category variance is then the reciprocal of the summed category weights.
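The steps above can be sketched as follows. The text only says that τ² is calculated "in the usual way"; here we assume the DerSimonian–Laird moment estimator, τ² = max(0, (Q − df)/(Σw − Σw²/Σw)), with w the fixed effect study weights and df the number of studies minus 1. The function names and the layout `category_vars[i][j]` (variance of study i in category j) are hypothetical.

```python
def dersimonian_laird_tau2(q, weights):
    """Moment estimator of tau^2 from Q and the fixed effect weights (assumed method)."""
    df = len(weights) - 1
    sw = sum(weights)
    c = sw - sum(w * w for w in weights) / sw
    return max(0.0, (q - df) / c)


def random_effects(category_vars, q):
    """Return tau^2, random effects study weights and pooled category variances.

    category_vars[i][j] is the variance of study i in category j (transformed scale).
    """
    avg_vars = [sum(row) / len(row) for row in category_vars]
    fixed_weights = [1 / a for a in avg_vars]
    tau2 = dersimonian_laird_tau2(q, fixed_weights)
    re_weights = [1 / (a + tau2) for a in avg_vars]  # one weight per study
    n_cat = len(category_vars[0])
    pooled_vars = [
        1 / sum(1 / (row[j] + tau2) for row in category_vars)  # per category
        for j in range(n_cat)
    ]
    return tau2, re_weights, pooled_vars


tau2, re_w, pooled = random_effects([[1.0, 1.0], [1.0, 1.0]], q=3.0)
print(tau2)  # 2.0
```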
In the quality effects model, the redistribution of inverse variance weights is done using a quality parameter between 0 (lowest quality) and 1 (highest quality) that is individualised to each study, rather than the common τ² parameter across studies used in the random effects model.5,6 In addition, the quality effects model applies an overdispersion correction based on Cochran's Q χ² statistic, resulting in a more conservative CI in the presence of heterogeneity.4 Note that, unlike in the random effects model, this widening of the CI does not affect the study weights. For the multi-category prevalence analysis, the overdispersion correction is applied to each of the categories separately, thus widening the CIs in case of heterogeneity.
The third and final issue is that, due to the transformations, the back transformed proportions may not sum to 1. The difference is usually small, and MetaXL deals with it by offering the option to normalise the pooled category prevalences after pooling and back transformation, so that they sum to 1. The CIs, however, are not adjusted.
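The sum-to-one problem and the normalisation step can be illustrated with a sketch that pools each category separately on the double arcsine scale (fixed effect weights N + 0.5, the inverse of equation (9)), back transforms with the simple formula (11), and then rescales. The two studies and their counts are made-up data, and dividing by the total is our assumption about how the normalisation is done.

```python
import math


def double_arcsine(n, N):
    """Double arcsine transform of n cases out of N (equation 8)."""
    return math.asin(math.sqrt(n / (N + 1))) + math.asin(math.sqrt((n + 1) / (N + 1)))


def pool_categories(studies):
    """Fixed effect pooling per category; studies is a list of (counts, N)."""
    n_cat = len(studies[0][0])
    pooled = []
    for j in range(n_cat):
        weights = [N + 0.5 for _, N in studies]              # 1 / var(t), equation (9)
        ts = [double_arcsine(counts[j], N) for counts, N in studies]
        t_bar = sum(w * t for w, t in zip(weights, ts)) / sum(weights)
        pooled.append(math.sin(t_bar / 2) ** 2)              # equation (11)
    return pooled


def normalise(props):
    """Rescale pooled category prevalences so that they sum to 1."""
    total = sum(props)
    return [p / total for p in props]


studies = [([70, 20, 10], 100), ([40, 40, 20], 100)]  # hypothetical counts
raw = pool_categories(studies)
print(round(sum(raw), 4))              # generally not exactly 1
print(round(sum(normalise(raw)), 10))  # 1.0
```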
Prevalence and heterogeneity
Heterogeneity is the main issue in meta-analysis. When study results are heterogeneous, we cannot assume that the same phenomenon has been measured in a sufficiently similar way and that differences in results are due to sampling error only. The original random effects model assumed that the observed phenomenon follows a Normal distribution, and that individual studies are random draws from this distribution.7
We have argued that, while the observed phenomenon may well have a Normal distribution, the largest part of the heterogeneity is most likely due to differences in the way the studies were carried out: differences in case definitions, biases, etc.4–6,8 The quality effects model was developed to take these differences explicitly into account by way of a study quality score, thus giving lower weights to studies of lower quality.
In Burden of Disease studies, the aim is to get a best estimate of disease frequency (in this case prevalence), based on available data. When studies produce heterogeneous results, the first response is to look for covariates that may explain the heterogeneity, and stratify studies into more homogeneous subgroups accordingly. But within those subgroups, the assumption is that the studies have been measuring the same phenomenon, albeit with varying quality. Therefore, the quality effects model is, in our opinion, clearly the best choice for this purpose. However, this will not be discussed further, as the purpose of this paper is to compare properties of various transformations of proportions, not to compare meta-analysis models.
Simulation
We carried out a simulation study to compare bias and coverage between the transformation methods, based on a hypothetical dataset. We defined nine studies of increasing size, with the smallest comprising 20 subjects, and each subsequent study having 20 more subjects, resulting in 180 subjects for the largest study. We assumed the number of cases in each study to have a binomial distribution, with study size and prevalence proportion as the parameters.
For the fixed effects model, we assumed a prevalence of 0.05, and for the random effects model we assumed the prevalence to have a Normal distribution with parameters 0.05 and 0.005 for mean and SD, respectively. For the quality effects model, we assumed that quality-associated deviations from the true prevalence would also follow a Normal distribution with parameters 0.05 and 0.005. From the deviation we calculated a quality score between 0 and 1 using a quadratic function. (Note that the quality effects model does not require the quality score to be a function of the difference between the true effect and the study estimate; for this simulation it simply provides a convenient way to calculate a quality score.)
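A sketch of the data-generating step for the fixed and random effects scenarios, written in Python rather than the Excel implementation used in the study. The function names, the seed and the truncation of sampled prevalences to [0, 1] are our assumptions; the quality score step is omitted because its quadratic function is not specified.

```python
import random

STUDY_SIZES = [20 * (i + 1) for i in range(9)]  # 20, 40, ..., 180 subjects


def binomial(rng, N, p):
    """Number of cases among N subjects, each a case with probability p."""
    return sum(rng.random() < p for _ in range(N))


def draw_fixed(rng, prevalence=0.05):
    """Fixed effects scenario: every study has the same true prevalence."""
    return [binomial(rng, N, prevalence) for N in STUDY_SIZES]


def draw_random(rng, mean=0.05, sd=0.005):
    """Random effects scenario: each study's prevalence drawn from Normal(mean, sd)."""
    cases = []
    for N in STUDY_SIZES:
        p = min(max(rng.gauss(mean, sd), 0.0), 1.0)  # truncation is an assumption
        cases.append(binomial(rng, N, p))
    return cases


rng = random.Random(2013)
print(draw_fixed(rng))
print(draw_random(rng))
```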
We implemented the numbers and functions in Excel, using the Ersatz Monte Carlo simulation add-in, and drew randomly from the distributions 1000 times.9 Each time we recalculated the meta-analysis for the three transformation options for each model. We calculated the mean of the central estimates, the bias in this mean and the mean squared error, and determined the coverage proportion given the pooled CIs.
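For each model and transformation, these summary measures can be computed from the 1000 pooled estimates and CIs along the following lines. The helper is hypothetical; bias is taken as the mean estimate minus the true prevalence, and coverage as the share of CIs that contain the true prevalence.

```python
def summarise(estimates, intervals, true_value):
    """Mean estimate, bias, mean squared error and CI coverage over simulation runs."""
    k = len(estimates)
    mean_est = sum(estimates) / k
    bias = mean_est - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / k
    coverage = sum(lo <= true_value <= hi for lo, hi in intervals) / k
    return mean_est, bias, mse, coverage


# Two hypothetical runs: only the first CI contains the true value 0.05.
print(summarise([0.04, 0.06], [(0.03, 0.05), (0.055, 0.07)], 0.05))
```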
Results of the simulation study are presented in table 1. Without transformation, the effect of the squeezed variance for a prevalence of this size can clearly be seen: the mean pooled prevalences are much lower than the true prevalence of 0.05, dragged down by the large weights applied to the samples with low prevalence. For the logit transformation, a reverse (but much smaller) effect can be seen: the lowest sampled prevalences get less weight, and the mean prevalence therefore is overestimated. Both logit and double arcsine do much better than the no transformation option across all variables, with the double arcsine doing slightly better than the logit for most variables, but much better for bias.
For prevalences close to 0.5, there is hardly any difference between the results from the three transformation options (data not shown).
Example: multiple sclerosis
In this section we present an example from the Global Burden of Disease 2010 Study, in which systematic reviews of the scientific literature and quantitative meta-analyses were conducted to estimate the severity distribution of various disorders at the global level. We look at the distribution of cases of MS over the stages mild, moderate and severe. A total of 18 studies were found to contain usable data on the distribution of MS severity. A quality score was given to each study based on the method of measuring functional impairment, with higher preference given to studies that assessed impairment using both a physician and a standardised scale. Kurtzke's Expanded Disability Status Scale (EDSS) is considered by experts to be the gold standard for measuring MS functional impairment, and studies using it were given the highest quality score of 1. The majority of studies in table 2 used EDSS, as shown by the frequent score of 1. Three studies assessed MS severity with no emphasis on the scores or scales used and were given a final quality score of 0.6.
Table 2 shows the studies in this analysis, their distribution over the three categories, their quality scores and their population sizes. The table shows that substantial heterogeneity exists: the I² for these studies is 95.2% (95% CI 93.6% to 96.4%).
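I² expresses the share of the total variation across studies that is due to heterogeneity rather than chance. A minimal sketch of the usual definition, I² = max(0, (Q − df)/Q) with df the number of studies minus 1; the helper name is hypothetical, and the example Q is made up rather than taken from table 2.

```python
def i_squared(q, n_studies):
    """Share of variation due to heterogeneity: max(0, (Q - df) / Q)."""
    df = n_studies - 1
    return max(0.0, (q - df) / q) if q > 0 else 0.0


print(round(i_squared(40.0, 18), 3))  # 0.575
```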
Table 3 shows the pooled results, with the normalisation option of MetaXL switched on. The pooled estimates do not differ much between models and transformation methods, with the random effects model, and to a lesser extent the quality effects model, producing a slightly larger proportion for the mild category at the expense of the severe one. In the transformed analyses the CIs of the random effects model are the widest; because the quality scores are mostly 1, the quality effects model CIs are not as wide.
Conclusion
We discussed the meta-analysis of prevalence and two transformation options: logit and double arcsine transformed proportions. We identified issues with the meta-analysis of multiple categories, and proposed solutions for these issues.
We discussed the implementation of these methods in MetaXL, including a normalisation option for multiple category prevalence, and presented results from a simulation study and an example. We conclude that the double arcsine is the preferred transformation and that the MetaXL implementation of multiple category prevalence is an improvement in the methodology of the meta-analysis of prevalence.
What is already known on this subject

Meta-analysis of prevalence using inverse variance methods has the problem that the variance becomes very small when the prevalence is small or large, with the consequence that such studies get a large weight in the meta-analysis.

Transformation methods can be used to avoid an unduly large weight for studies with small or large prevalence.
What this study adds

The double arcsine transformation is discussed and we show it has properties that make it the clearly preferred option over the often used logit transformation.

The generalisation of the methods to multiple category prevalence and the implementation in the MetaXL software package are introduced.

We conclude that the resulting MetaXL software is an improvement in the meta-analysis of prevalence.
References
Footnotes

Contributors JJB and SAD developed the methods; YYL, RN and TV did the meta-analyses; all authors contributed to writing.

Competing interests JJB owns Epigear International Pty Ltd, which sells the Ersatz software used in the analysis.

Provenance and peer review Not commissioned; externally peer reviewed.

Data sharing statement The data are available from the Epigear website, and are part of the MetaXL download.