Date: 2017-11-01

Introduction

Perinatal morbidity scoring tools score or weight adverse perinatal events according to their severity.1–5 Morbidity scores are an appealing choice for a primary outcome in research evaluating the risks and benefits of new perinatal interventions or exposures. Because maternity patients are mostly young, healthy women, serious adverse events (such as maternal mortality) are extremely rare. Morbidity scoring ensures that these serious events are not treated as interchangeable with the considerably less serious events that must often be included in a composite study outcome to ensure sufficient statistical precision.6–8 Further, by producing a numeric outcome score for each pregnancy, the scoring tools provide a strategy for combining the health outcomes of both mother and infant into a single endpoint.

Although calculating a perinatal morbidity score for each study participant is relatively straightforward, difficulties arise when researchers wish to test whether perinatal morbidity scores differ significantly between two or more intervention or exposure groups. These difficulties impede the use of perinatal morbidity scores as the primary outcome of applied research studies. The objective of this report was to outline why perinatal morbidity scores can be difficult to analyse using standard statistical approaches, summarise the limitations of previous strategies used to analyse morbidity scores, and present a simple count-based (Poisson) regression approach that overcomes these limitations. We apply the approach to data from our recent study evaluating the safety of labour and delivery following the closure of planned obstetrical services in 21 rural hospitals in British Columbia, Canada.9

Overview of why perinatal morbidity scores can be problematic for standard statistical tests

The statistical distributions of perinatal morbidity scores are not compatible with common analytical approaches, which leads to challenges when testing for differences between two or more study groups. Figure 1 illustrates a typical distribution of a perinatal morbidity score. In this example, over half the pregnancies are assigned a score of zero (because most pregnancies are healthy and deliver without complications) and the distribution has a long, irregular tail (because serious events, although rare, are assigned considerably higher scores than milder adverse events). The non-Gaussian distribution means that differences in scores between study groups should not be tested using a Student’s t-test or linear regression, and log or other standard data transformations do not achieve normality. A non-parametric test comparing medians (such as the Wilcoxon’s rank sum test) is also problematic because the median score in both groups will usually be zero, ignoring important differences in the distributions of the tails. The scored variable could be collapsed into a small number of categories for statistical analyses (or even a binary variable indicating the occurrence of one or more of the adverse events). However, doing so would lead to a loss of the detailed, continuous variable that was initially generated by the scoring tool, as well as the relatively greater difference in severity associated with the most serious outcomes.

Figure 1 Distribution of newborn outcome scores in a cohort of 11 066 infants published by Novicoff et al.2
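The problem with comparing medians can be seen in a small illustration. The scores below are entirely hypothetical (they are not taken from figure 1 or from any study), and the variable names are chosen only for this sketch. Both groups have a median of zero, yet one group carries a far more severe event in its tail.

```python
from statistics import mean, median

# Hypothetical morbidity scores for two groups of 10 pregnancies each.
# Most pregnancies score 0; group_b has one very severe event in its tail.
group_a = [0, 0, 0, 0, 0, 0, 0, 5, 5, 10]
group_b = [0, 0, 0, 0, 0, 0, 0, 5, 5, 750]

median_a, median_b = median(group_a), median(group_b)
mean_a, mean_b = mean(group_a), mean(group_b)

print("medians:", median_a, median_b)  # identical, both zero
print("means:", mean_a, mean_b)        # very different
```

A summary based on the median cannot distinguish the two groups here, whereas any summary that uses the full score (such as the mean) is driven by the tail.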

Examples of strategies that have previously been used to analyse perinatal morbidity scores

The detailed numerical scores initially generated by scoring tools have often not been used to their full capacity in previous research. For example, the DIGITAT trial was a randomised trial comparing expectant monitoring versus iatrogenic early delivery in pregnancies with suspected fetal growth restriction.7 As a secondary analysis, the researchers used the Morbidity Assessment Index for Newborns (MAIN), a validated outcome scale designed for ranking neonatal morbidity beyond 28 weeks’ gestation.4 10 Although the MAIN scale is a continuous variable derived from points assigned to 47 possible birth complications, the score was collapsed into four categories (severe, moderate, mild and no morbidity) for statistical analysis, losing much of the score’s detailed information. The Adverse Outcome Index, an expert-opinion based scoring system that includes 10 adverse maternal and neonatal outcomes,1 was used as the primary outcome of a randomised trial evaluating the impact of labour and delivery room teamwork training.11 The outcomes ranged in severity from third-degree or fourth-degree perineal tear (assigned five points per event) to maternal mortality (750 points). However, the trial’s primary analysis used a dichotomised version of the index (ie, a composite outcome indicating the occurrence of any of the 10 adverse events), which meant that maternal mortality was treated as interchangeable with a third-degree or fourth-degree tear.
In a large randomised trial of expectant monitoring versus labour induction in post-term pregnancies, researchers created a detailed index of perinatal mortality and neonatal morbidity (scores ranging from 0 to 10 160).12 Analysis of the score as a continuous variable using a rank sum test found a significant difference in the median morbidity score between groups. This difference was dismissed as being ‘probably of no clinical importance’ (p 1590)12 because the test essentially compared only differences in very mild morbidity (ie, the median scores) between the two groups, not differences in the serious events of clinical interest at the extremes.12

Using a count-based regression approach to compare morbidity scores

We propose that an alternative way to compare perinatal morbidity scores between groups is through a count-based framework such as Poisson regression. The general idea behind the approach is that the points assigned to each component of the morbidity score are converted into count format data (where the outcome variable reflects the number of times an event occurred, such as the number of goals in a football match). More severe adverse events are ‘counted’ a greater number of times to reflect their increased severity. Severity-weighted rate ratios are then estimated using a Poisson regression model with CIs calculated through bootstrapping (resampling) techniques. The approach can be implemented through the following steps.

Step 1. Rescale the existing scoring system

The perinatal morbidity scores are first converted such that the adverse event with the lowest assigned score is assigned a value of 1 point, and all other adverse events are expressed multiplicatively in relation to the least severe event. The first two columns of table 1 show this conversion for the previously discussed Adverse Outcome Index.1 In this scoring system, the least severe event is a third-degree or fourth-degree perineal tear, which has a score of 5, while the most severe is a maternal death, which has a score of 750. To rescale this scoring system, one would divide all scores by 5. The third-degree or fourth-degree tear then has a value of 1 point (5/5), while maternal death has a value of 150 points (750/5).

Table 1 Safety of labour and delivery following the closure of local planned obstetrical services in 21 communities in British Columbia, Canada, modified from data published in Hutcheon et al.9

Table 2 shows a sample dataset of six pregnancies in which the Adverse Outcome Index scores have been converted for use in the subsequent regression model. The variable ‘outcome’ is a binary indicator of pregnancies that had any one of the adverse events that make up the index. For those women who had an adverse outcome (ie, Study IDs 003, 005, and 006), the variable ‘severity points’ reflects the rescaled Adverse Outcome Index scores, with more severe events (uterine rupture, intensive care unit admission) expressed as multiples of the least severe event (third-degree or fourth-degree tear). In the event of multiple adverse events (eg, Study ID 006), points are summed.

Table 2 Example dataset for severity-weighted Poisson regression

Step 2. Use Poisson regression to incorporate information on relative severity of events

A weighted rate ratio associated with one or more independent variables is then estimated using Poisson regression. In this model, the outcome of interest is the count of severity points in a given pregnancy (analogous to, eg, the count of chronic obstructive pulmonary disease exacerbations in a given patient-year). For the dataset in table 2, the model would be specified in Stata as:

poisson severitypoints X1 X2 X3…, irr vce(robust)

where X1–X3 are the independent variables of interest (exposure group and any confounders), and robust SEs are specified to correct for overdispersion (ie, increased heterogeneity, indicated by a variance substantially greater than the mean). In the event of highly overdispersed data, the negative binomial model could be used as an alternative to Poisson. If absolute rather than relative measures of association are desired, an offset for the number of births (=1 for each row) should be included (the offset will cancel out when relative measures of association such as a rate ratio are calculated). Although a zero-inflated model is often used when data contain a large number of zeros,13 we opted against its use in this context as our zeros reflect the rarity of adverse events rather than a different underlying process creating the zero values.
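The rescaling in Step 1 and the point estimate in Step 2 can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the eight-row dataset is hypothetical (it is not the paper's table 2), the names `aoi_scores`, `rescale` and `pregnancies` are made up for the example, and only the two Adverse Outcome Index scores quoted in the text are used. The sketch relies on the fact that, for a single binary exposure with no confounders, the Poisson rate ratio equals the ratio of mean severity points in the two groups; an adjusted analysis would need a fitted regression model.

```python
# Original Adverse Outcome Index scores quoted in the text.
aoi_scores = {"perineal_tear_3rd_4th": 5, "maternal_death": 750}


def rescale(scores):
    """Express every score as a multiple of the least severe event."""
    smallest = min(scores.values())
    return {event: score / smallest for event, score in scores.items()}


points = rescale(aoi_scores)  # tear -> 1 point, maternal death -> 150 points

# Hypothetical pregnancies: (exposure group, list of adverse events).
pregnancies = [
    (0, []),
    (0, []),
    (0, ["perineal_tear_3rd_4th"]),
    (0, ["perineal_tear_3rd_4th"]),
    (1, []),
    (1, []),
    (1, ["perineal_tear_3rd_4th"]),
    (1, ["perineal_tear_3rd_4th", "maternal_death"]),  # multiple events: points are summed
]

# Severity points per pregnancy (0 when no adverse event occurred).
severity_points = [sum(points[e] for e in events) for _, events in pregnancies]

exposed = [p for (group, _), p in zip(pregnancies, severity_points) if group == 1]
unexposed = [p for (group, _), p in zip(pregnancies, severity_points) if group == 0]

# Unadjusted severity-weighted rate ratio (ratio of mean points per pregnancy).
rate_ratio = (sum(exposed) / len(exposed)) / (sum(unexposed) / len(unexposed))
print(severity_points, rate_ratio)
```

In this toy dataset a single maternal death dominates the estimate, which is the behaviour the weighting is designed to produce and also the reason the model-based CIs cannot be taken at face value.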
Rather than using the CIs produced by the Poisson model, CIs are calculated through bootstrapping (SAS, R and Stata code provided in the online supplementary appendix 1).14 This is done because, although the model ‘counts’ more severe events a greater number of times, each adverse event is still only a single occurrence. The inference on a single stillbirth worth 80 points (in the rescaled Adverse Outcome Index) is much less stable than that on 80 women with third-degree or fourth-degree tears worth one point each, even though their contribution to the weighted rate ratio point estimate would be the same. The variance estimates produced by the Poisson model will therefore result in artificially narrow CIs, and CIs should instead be estimated using bootstrapping.

Supplementary Material

Supplementary Appendix 1 [SP1.pdf]
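The bootstrap step can be sketched as follows. This is an illustration under stated assumptions and not the code from the supplementary appendix: the data are hypothetical, the function names `rate_ratio` and `bootstrap_ci` are invented for the example, a percentile interval is used, and whole pregnancies are resampled with replacement (one common choice; the authors' implementation may differ). As in any unadjusted two-group comparison, the rate ratio is computed as the ratio of mean severity points.

```python
import random


def rate_ratio(rows):
    """Ratio of mean severity points, exposed (1) vs unexposed (0)."""
    exposed = [p for group, p in rows if group == 1]
    unexposed = [p for group, p in rows if group == 0]
    if not exposed or not unexposed or sum(unexposed) == 0:
        return None  # ratio is undefined for this sample
    return (sum(exposed) / len(exposed)) / (sum(unexposed) / len(unexposed))


def bootstrap_ci(rows, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI, resampling pregnancies with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(rows, k=len(rows))
        estimate = rate_ratio(sample)
        if estimate is not None:  # skip resamples where the ratio is undefined
            estimates.append(estimate)
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper


# Hypothetical data: (exposure group, rescaled severity points) per pregnancy.
# The exposed group includes one stillbirth worth 80 points.
rows = ([(0, 0)] * 180 + [(0, 1)] * 20
        + [(1, 0)] * 170 + [(1, 1)] * 29 + [(1, 80)])

point_estimate = rate_ratio(rows)
lower, upper = bootstrap_ci(rows)
print(point_estimate, lower, upper)
```

Because the single 80-point event is drawn zero, one, two or more times across resamples, the bootstrap distribution is wide, which reflects the instability of inference on one severe event that the text describes.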