Background We present probabilistic and Bayesian techniques to correct for bias in categorical and numerical measures and empirically apply them to a recent survey of female sex workers (FSW) conducted in Iran.
Methods We used bias parameters from a previous validation study to correct estimates of behaviours reported by FSW. Monte-Carlo Sensitivity Analysis and Bayesian bias analysis produced point estimates and simulation intervals (SI).
Results The difference between the apparent and corrected prevalence ranged from 1% for ‘non-condom use sexual acts’ (36.8% vs 35.8%) to 33% for ‘ever associated with a venue to sell sex’ (35.5% vs 68.0%). The negative predictive value of the questionnaire for ‘history of STI’ and ‘ever associated with a venue to sell sex’ was 36.3% (95% SI 4.2% to 69.1%) and 46.9% (95% SI 6.3% to 79.1%), respectively. Bias-adjusted numerical measures of behaviours increased by between 0.1 year for ‘age at first sex act for money’ and 1.5 contacts for ‘number of sexual contacts in last 7 days’.
Conclusions The ‘true’ estimates of most behaviours are considerably higher than those reported and the related SIs are wider than conventional CIs. Our analysis indicates the need for and applicability of bias analysis in surveys, particularly in stigmatised settings.
- RESEARCH METHODS
- HEALTH BEHAVIOUR
To monitor the national response and achievement of prevention targets, researchers and public health officials worldwide spend considerable resources estimating and tracking HIV risk using biological and behavioural surveys.1 However, measures of risk behaviour rely on self-reported information, which is highly vulnerable to social desirability response bias.2
In Iran sex work may be particularly stigmatised. Sex between persons who are not married to each other is illegal. Sex work therefore occurs underground and it may be extremely difficult to get accurate reports of behaviour by female sex workers (FSW). We believe the situation is similar in other parts of the world, but perhaps to a greater degree in Iran.
Thus, the social stigma surrounding risky behaviours, especially those related to illegal sexual contact, complicates behavioural surveys, affecting the validity of responses and biasing estimates of the prevalence of key behaviours.3 ,4 To obtain valid estimates, researchers should therefore calibrate and triangulate the apparent (self-reported) estimates, considering both random and systematic errors. Typically, researchers only briefly mention the possible and likely effects of biases in a section on limitations rather than attempting to quantify and correct for them in their principal findings. Moreover, the literature discusses how to measure and quantify random error far more than systematic error or bias.5 ,6
Sensitivity analysis is one naïve method that has been used to quantify the uncertainty in effect estimates arising from systematic measurement error. The approach has been extended to address potential confounding and selection biases, and to account for uncertainty about the bias parameters when correcting apparent effects.7 Other methods range from probabilistic sensitivity analysis8 to semi-Bayesian and fully Bayesian techniques,9 which produce more plausible CIs (intervals that include the true effects in the population) for the corrected effects. Although recent efforts have begun to expand upon these techniques, they are still rarely applied, particularly to numerical outcomes. In the field of HIV/AIDS, which is intimately linked to monitoring and measuring socially stigmatised behaviours, the application of such techniques is notably lacking.
The present study aims to describe probabilistic and Bayesian bias correction techniques for categorical and numerical measures and to empirically apply and compare them on several HIV-related stigmatised risky behaviours in a highly marginalised population. To better understand the pitfalls of the naïve methods and the process of bias adjustment, we also present and compare the above advanced techniques with the naïve methods.
Data and study subjects
From April to July 2010, 872 FSW were recruited into the first national biobehavioural survey to track the risk of HIV in key populations. They were recruited from 21 health facilities in 14 cities by convenience sampling. At each facility, one trained female interviewer identified FSW routinely receiving services and, after obtaining verbal informed consent, enrolled those who met the eligibility criteria and agreed to participate. Risk behaviour questions were asked by the interviewer in a face-to-face interview (FTFI) and the answers were recorded. All procedures were approved by the Research Review Board of the Kerman University of Medical Sciences.
Completed questionnaires were entered into STATA V.10 for data cleaning, quality checks and data analysis. The point estimates and CIs for each risky behaviour were calculated with the standard survey analysis package. We refer to these estimates as the apparent prevalence (AP) because they reflect the behaviours as reported by FSW.
Adjustment methods for systematic error in reported categorical behaviour measures
To adjust estimates for misclassification bias affecting the validity of behaviours reported by FSW, we obtained bias parameters (sensitivity (Se) and specificity (Sp)) from a previous validation study presented in detail elsewhere.2 The validation study was an independent survey of 63 FSW in Tehran and Kerman (ie, they did not participate in the larger national survey) in which the participants were interviewed both by FTFI and by in-depth interview (IDI). These FSW were consecutively approached at the same clinics in the two cities and enrolled in the validation study using the same eligibility criteria. The interviewers conducted an IDI with each of the 63 FSW as a cognitive cross-check of the answers collected in the FTFI. The IDI was open-ended and, as a consultative interview, began with trust-building questions about the participants’ general living conditions, health status and social welfare needs. In the IDI the interviewer did not read the questionnaire step by step but instead followed a natural discussion that moved towards more private topics according to the participants’ comfort and lead. IDI responses were recorded as short notes and, after the IDI was finished, transferred to a separate column aligned with the FTFI questionnaire. The IDI was considered the ‘gold standard’ for the purposes of this study, and the bias parameters were calculated accordingly. Our assumption is that responses given to professional interviewers in an open discussion, with a longer period to establish trust and check internal consistency, and with the final answer based on the interviewers’ judgement, are more truthful than responses to a structured series of closed-ended questions.
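Because the IDI is treated as the gold standard, Se and Sp for a categorical behaviour can be read off a 2×2 cross-tabulation of IDI against FTFI answers. The Python sketch below shows only that arithmetic; the function name and the counts are hypothetical and are not taken from the validation study.

```python
def sensitivity_specificity(both_yes: int, idi_yes_ftfi_no: int,
                            idi_no_ftfi_yes: int, both_no: int) -> tuple[float, float]:
    """Se and Sp of FTFI answers, treating the IDI answer as the gold standard."""
    # Se: of FSW reporting the behaviour in the IDI, the share also reporting it in the FTFI
    se = both_yes / (both_yes + idi_yes_ftfi_no)
    # Sp: of FSW denying the behaviour in the IDI, the share also denying it in the FTFI
    sp = both_no / (both_no + idi_no_ftfi_yes)
    return se, sp


# Hypothetical counts for one behaviour among the 63 validation participants
se, sp = sensitivity_specificity(both_yes=20, idi_yes_ftfi_no=10, idi_no_ftfi_yes=2, both_no=31)
print(f"Se = {se:.3f}, Sp = {sp:.3f}")
```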
The bias parameters are presented as Se and Sp for categorical behaviour variables (table 1). We applied two methods of bias analysis: a naïve sensitivity analysis and a Bayesian approach.
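In the naïve method we used the following equation to correct the biased estimates:

$$TP = \frac{AP + Sp - 1}{Se + Sp - 1} \qquad (1)$$

Since we do not know the true values of the parameters in the above model, we used their estimates instead:

$$\widehat{TP} = \frac{\widehat{AP} + \widehat{Sp} - 1}{\widehat{Se} + \widehat{Sp} - 1}$$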
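For readers who want to reproduce the arithmetic, a minimal Python sketch of the naïve correction TP = (AP + Sp − 1)/(Se + Sp − 1) follows. The function name is hypothetical, and the Se and Sp in the example are illustrative values, not those in table 1. Nothing in the formula itself keeps the result between 0 and 1.

```python
def corrected_prevalence(ap: float, se: float, sp: float) -> float:
    """Naive misclassification correction of an apparent prevalence (AP).

    TP = (AP + Sp - 1) / (Se + Sp - 1), defined only when Se + Sp - 1 > 0.
    The result is not constrained to lie in [0, 1].
    """
    denominator = se + sp - 1
    if denominator <= 0:
        raise ValueError("Se + Sp - 1 must be positive")
    return (ap + sp - 1) / denominator


# Illustrative inputs: AP = 35.5%, Se = 0.50, Sp = 0.95
print(round(corrected_prevalence(0.355, 0.50, 0.95), 3))
```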
Because of the uncertainty around the parameters (ie, Se and Sp), we re-ran the above model with different combinations of Se and Sp within their lower and upper 95% CI limits (see online supplementary appendix 1). This provided bounds around the true prevalence (TP) of each risk behaviour. We call this the naïve sensitivity analysis approach to bias analysis.
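Online supplementary appendix 1 is not reproduced here, so the following Python sketch only illustrates the idea under an assumption: Se and Sp each take a few evenly spaced values between their 95% CI limits (five per parameter, an arbitrary choice), and the smallest and largest corrected prevalences are reported. The function names and CI limits are hypothetical. With a low Se limit, the upper bound exceeds 1, the kind of implausible value (>100%) the naïve method can produce.

```python
import itertools


def corrected_prevalence(ap: float, se: float, sp: float) -> float:
    """TP = (AP + Sp - 1) / (Se + Sp - 1)."""
    return (ap + sp - 1) / (se + sp - 1)


def evenly_spaced(lo: float, hi: float, steps: int) -> list[float]:
    """`steps` values from lo to hi inclusive."""
    return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]


def naive_bounds(ap: float, se_ci: tuple[float, float], sp_ci: tuple[float, float],
                 steps: int = 5) -> tuple[float, float]:
    """Smallest and largest corrected prevalence over Se x Sp combinations within their CIs."""
    values = [
        corrected_prevalence(ap, se, sp)
        for se, sp in itertools.product(evenly_spaced(*se_ci, steps),
                                        evenly_spaced(*sp_ci, steps))
        if se + sp - 1 > 0  # the correction is undefined otherwise
    ]
    return min(values), max(values)


# Hypothetical: AP = 35.5%, Se 95% CI 0.35-0.65, Sp 95% CI 0.85-0.99
low, high = naive_bounds(0.355, (0.35, 0.65), (0.85, 0.99))
print(f"naive bounds: {low:.3f} to {high:.3f}")  # upper bound exceeds 1 (ie, >100%)
```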
To apply the Bayesian methods for bias correction we assumed that the bias parameters (Se and Sp) follow β distributions. These are presented in online supplementary appendix 2 (with shape parameters a.Se, b.Se for sensitivity and a.Sp, b.Sp for specificity). β Distributions are frequently used to describe prior uncertainty about prevalence, Se or Sp in the Gibbs sampler.10 We defined each β distribution by two parameters, α and β, with α=y+1 and β=n−y+1, where y is the number of events out of n trials. This was done separately for Se and Sp, and their β distributions were defined for every risky behaviour. We assumed that there is no informative prior information on the TP of risky behaviours and therefore assigned a uniform prior distribution to TP. To obtain posterior distributions for TP, positive predictive value (PPV) and negative predictive value (NPV), we ran a model (see online supplementary appendix 2) in WinBUGS (V.1.4)11 with 50 000 iterations and report the posterior mean and 95% credible intervals (CRI) (2.5th and 97.5th percentiles) for the TP, PPV and NPV from iterations 40 001–50 000 (ie, discarding the first 40 000 iterations as burn-in). The results are summarised in table 2 (column 3) and figure 1.
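The WinBUGS model in online supplementary appendix 2 is not reproduced here. As a rough, self-contained stand-in, the Python sketch below assumes the usual misclassification model (uniform prior on TP, Beta(y + 1, n − y + 1) priors on Se and Sp, a binomial likelihood for the number of FSW reporting the behaviour, and AP = TP·Se + (1 − TP)(1 − Sp)). Instead of a Gibbs sampler it uses importance sampling: parameters are drawn from their priors and weighted by the likelihood. It also ignores the complex survey design. The function names and all counts in the example are hypothetical.

```python
import math
import random


def beta_shape(y: int, n: int) -> tuple[int, int]:
    """Beta shape parameters alpha = y + 1, beta = n - y + 1 for y events in n trials."""
    return y + 1, n - y + 1


def weighted_quantile(values: list[float], weights: list[float], q: float) -> float:
    """Smallest value at which the cumulative normalised weight reaches q."""
    target = q * sum(weights)
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= target:
            return value
    return max(values)


def bayes_correct(x: int, n_survey: int, se_counts: tuple[int, int],
                  sp_counts: tuple[int, int], draws: int = 50_000,
                  seed: int = 1) -> dict[str, tuple[float, float, float]]:
    """Posterior mean and 95% CRI of TP, PPV and NPV by importance sampling from the priors."""
    rng = random.Random(seed)
    a_se, b_se = beta_shape(*se_counts)
    a_sp, b_sp = beta_shape(*sp_counts)
    samples: dict[str, list[float]] = {"TP": [], "PPV": [], "NPV": []}
    log_weights: list[float] = []
    for _ in range(draws):
        tp = rng.random()                   # uniform prior on the true prevalence
        se = rng.betavariate(a_se, b_se)    # beta prior on sensitivity
        sp = rng.betavariate(a_sp, b_sp)    # beta prior on specificity
        ap = tp * se + (1 - tp) * (1 - sp)  # probability of reporting the behaviour
        if not 0.0 < ap < 1.0:
            continue                        # measure-zero edge case; avoids log(0)
        # Binomial log-likelihood of x reports among n_survey FSW (constant dropped)
        log_weights.append(x * math.log(ap) + (n_survey - x) * math.log(1 - ap))
        samples["TP"].append(tp)
        samples["PPV"].append(se * tp / ap)
        samples["NPV"].append(sp * (1 - tp) / (1 - ap))
    top = max(log_weights)
    weights = [math.exp(lw - top) for lw in log_weights]  # rescaled to avoid underflow
    total = sum(weights)
    return {
        name: (sum(w * v for w, v in zip(weights, values)) / total,
               weighted_quantile(values, weights, 0.025),
               weighted_quantile(values, weights, 0.975))
        for name, values in samples.items()
    }


# Hypothetical inputs: 310 of 872 FSW report the behaviour; Se from 15 of 30 and
# Sp from 31 of 33 validation participants
result = bayes_correct(x=310, n_survey=872, se_counts=(15, 30), sp_counts=(31, 33))
for name, (mean, low, high) in result.items():
    print(f"{name}: {mean:.3f} (95% CRI {low:.3f} to {high:.3f})")
```

Because TP has a uniform prior on [0, 1], every posterior draw, and hence every interval bound, stays within plausible limits, unlike the naïve correction.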
Adjustment for systematic error in reported numerical behaviour measures
In order to correct the reported numerical behaviours (eg, number of sex acts in last 7 days) for measurement error, we also acquired the bias parameters from the validation study, this time as mean differences between the findings in the IDI and the routine FTFI and also the Pearson correlation coefficient between the differences (IDI − FTFI) and the reported amount in the FTFI.2 The bias parameters for the numerical measures are presented in table 1. Monte-Carlo Sensitivity Analysis (MCSA) was used to correct the bias in numerical risk behaviours.
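The three numerical bias parameters (the mean IDI − FTFI difference, its uncertainty, and the Pearson correlation between the differences and the FTFI reports) can be computed from paired validation data as in the Python sketch below. Using the standard error of the mean difference as its uncertainty is an assumption, and the function names and the eight paired answers are hypothetical.

```python
import math
import statistics


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def numerical_bias_parameters(idi: list[float], ftfi: list[float]) -> tuple[float, float, float]:
    """Return (mean of D, standard error of D, Pearson r between D and the FTFI report),
    where D = IDI answer - FTFI answer for each validation participant."""
    diffs = [a - b for a, b in zip(idi, ftfi)]
    d_mean = statistics.fmean(diffs)
    d_se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return d_mean, d_se, pearson(diffs, ftfi)


# Hypothetical paired answers to 'number of sex acts in the last week'
ftfi = [1, 2, 2, 3, 0, 4, 1, 2]
idi = [3, 3, 2, 5, 1, 4, 2, 4]
d_mean, d_se, r = numerical_bias_parameters(idi, ftfi)
print(f"D = {d_mean:.2f} (SE {d_se:.2f}), r = {r:.2f}")
```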
The theory behind MCSA is that values from the bias parameters (in this case, the difference between observed and true numerical behaviour means and the correlation coefficient between them) are randomly selected from their assigned probability distributions and then used to solve the bias correction equations to produce bias-adjusted estimates.
As an example, we observed that, on average, FSW report their ‘number of sex acts in the last week’ as 1.48 (0.7 to 2.3) fewer than in reality.2 We called this Delta (D). The true mean (TM) would then be a linear function of the apparent mean (AM), which is the mean of the reported behaviour, and the amount of bias (D):

$$TM = AM + D$$

Since we do not know the true values of the parameters (AM and D) in the above equation, we used their estimates instead:

$$\widehat{TM} = \widehat{AM} + \widehat{D} \qquad (2)$$
To estimate the TM while considering both random and systematic errors, we randomly drew one value from the normal distribution of AM (from the national survey) and one from the normal distribution of D (from the validation study) and solved equation 2. Finally, we averaged all the simulated TM values to obtain the adjusted estimate. We ran this simulation as a model in WinBUGS (see online supplementary appendix 3).
The model randomly selects a number from the distribution of bias parameter (D), with the mean and precision (1/variance) of 1.4 and 6.2 for ‘number of sex acts in the last week’, and adds it to a randomly selected measure from the distribution of ‘number of sex acts in the last week’ (AM), with the mean and precision of 2.2 and 6.7 reported by FSW recruited in the national survey. We repeated this for 50 000 iterations and, in every iteration, we summed these two numbers.
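The independent-draws version of this simulation, TM = AM + D with AM ~ N(2.2, precision 6.7) and D ~ N(1.4, precision 6.2) as stated in the text, can be sketched in Python as follows. It is not the WinBUGS code in online supplementary appendix 3, and it leaves out the correlation between AM and D that the text turns to next, so its output should not be expected to match table 3. The function name and seed are arbitrary.

```python
import random
import statistics


def mcsa_uncorrelated(am_mean: float, am_precision: float, d_mean: float, d_precision: float,
                      iterations: int = 50_000, keep_from: int = 40_000,
                      seed: int = 1) -> tuple[float, float, float]:
    """Mean and 95% simulation interval of TM = AM + D with independent normal AM and D."""
    rng = random.Random(seed)
    am_sd = (1 / am_precision) ** 0.5  # precision = 1 / variance
    d_sd = (1 / d_precision) ** 0.5
    tm = [rng.gauss(am_mean, am_sd) + rng.gauss(d_mean, d_sd) for _ in range(iterations)]
    # Keep iterations 40 001-50 000 as in the text; with independent draws no burn-in
    # is actually needed, so this only mirrors the reported set-up.
    kept = sorted(tm[keep_from:])
    n = len(kept)
    # Simple nearest-rank 2.5th and 97.5th percentiles
    return statistics.fmean(kept), kept[int(0.025 * n)], kept[int(0.975 * n) - 1]


# 'Number of sex acts in the last week': AM ~ N(2.2, precision 6.7), D ~ N(1.4, precision 6.2)
mean, lo, hi = mcsa_uncorrelated(2.2, 6.7, 1.4, 6.2)
print(f"TM = {mean:.2f} (95% SI {lo:.2f} to {hi:.2f})")
```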
AM and D are correlated, and this should be considered when simulating the data. Suppose that the Pearson correlation coefficient between AM and D is R. To produce the two correlated distributions (for AM and D), we first created two independent random variables (X and Y), each with a standard normal distribution. Given the Pearson correlation coefficient R, we computed Z as:

$$Z = R\,X + \sqrt{1 - R^{2}}\,Y$$

Z has a standard normal distribution and is correlated with X with correlation coefficient R. The last step is to transform the standard normal variables X and Z into AM (mean = AM_mean, standard deviation = AM_SE) and D (mean = D_mean, standard deviation = D_SE):

$$AM = AM_{mean} + AM_{SE}\,X, \qquad D = D_{mean} + D_{SE}\,Z$$
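The construction Z = R·X + √(1 − R²)·Y, followed by AM = AM_mean + AM_SE·X and D = D_mean + D_SE·Z, can be checked numerically with the short Python sketch below. The function names, the value R = 0.3 and the seed are hypothetical; the means and precisions are those given in the text for ‘number of sex acts in the last week’.

```python
import math
import random
import statistics


def correlated_am_d(am_mean: float, am_se: float, d_mean: float, d_se: float,
                    r: float, rng: random.Random) -> tuple[float, float]:
    """One draw of (AM, D) with correlation r, built from two independent standard normals."""
    x = rng.gauss(0.0, 1.0)
    y = rng.gauss(0.0, 1.0)                 # independent of x
    z = r * x + math.sqrt(1 - r * r) * y    # standard normal with corr(X, Z) = r
    return am_mean + am_se * x, d_mean + d_se * z


def sample_correlation(pairs: list[tuple[float, float]]) -> float:
    """Pearson correlation of simulated (AM, D) pairs, as a numerical check."""
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(7)
am_se, d_se = 1 / math.sqrt(6.7), 1 / math.sqrt(6.2)  # SDs from the precisions in the text
pairs = [correlated_am_d(2.2, am_se, 1.4, d_se, 0.3, rng) for _ in range(20_000)]
print(f"simulated correlation: {sample_correlation(pairs):.2f}")  # should be near R = 0.3
```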
We now have AM and D (correlated with Pearson coefficient R) and can calculate the true mean from equation 2. The best estimate of R was obtained from the validation study (table 1). We applied the Fisher z transformation so that R could be sampled from an approximately normal distribution in WinBUGS (see online supplementary appendix 3), drawing a new value of R, consistent with the evidence from the validation study, at each iteration.
The same model was used for the other numerical behaviour measures. For all numerical measures we calculated the mean and 95% simulation interval (2.5th and 97.5th percentiles) from iterations 40 001–50 000; the results are summarised in table 3 (column 2, labelled MCSA).
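Combining the steps described in the text (drawing R through the Fisher z transformation, generating correlated AM and D, and applying equation 2), one iteration of the correlated simulation can be sketched in Python as below. The sketch assumes the large-sample approximation atanh(R) ~ N(atanh(r), standard deviation 1/√(n − 3)), takes n = 63 (the size of the validation study) and uses a hypothetical r = 0.3; it is not the WinBUGS code in online supplementary appendix 3, and the function name is hypothetical.

```python
import math
import random
import statistics


def mcsa_correlated(am_mean: float, am_se: float, d_mean: float, d_se: float,
                    r_hat: float, n_validation: int, iterations: int = 50_000,
                    keep_from: int = 40_000, seed: int = 1) -> tuple[float, float, float]:
    """Mean and 95% simulation interval of TM = AM + D when corr(AM, D) = R is itself uncertain."""
    rng = random.Random(seed)
    z_mean = math.atanh(r_hat)              # Fisher z transformation of the observed r
    z_sd = 1 / math.sqrt(n_validation - 3)  # large-sample SD on the z scale
    tm = []
    for _ in range(iterations):
        r = math.tanh(rng.gauss(z_mean, z_sd))  # new R each iteration, always in (-1, 1)
        x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z = r * x + math.sqrt(1 - r * r) * y    # corr(X, Z) = r
        am = am_mean + am_se * x
        d = d_mean + d_se * z
        tm.append(am + d)                       # equation 2: TM = AM + D
    kept = sorted(tm[keep_from:])               # iterations 40 001-50 000
    n = len(kept)
    return statistics.fmean(kept), kept[int(0.025 * n)], kept[int(0.975 * n) - 1]


# AM and D as for 'number of sex acts in the last week'; r_hat = 0.3 is hypothetical
mean, lo, hi = mcsa_correlated(2.2, 1 / math.sqrt(6.7), 1.4, 1 / math.sqrt(6.2),
                               r_hat=0.3, n_validation=63)
print(f"TM = {mean:.2f} (95% SI {lo:.2f} to {hi:.2f})")
```

In this linear construction E[TM] = E[AM] + E[D] whatever the value of R, so a positive R mainly widens the simulation interval rather than shifting the point estimate.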
The 872 FSW recruited in the national behavioural survey had a mean age of 31.7 (95% CI 29.7 to 33.7) years (table 4). According to self-report, 81.5% (95% CI 71.5% to 88.6%) had ever been married, while only 35.9% (95% CI 25.7% to 47.5%) were still in a marital union. Education levels tended to be low, with most not having completed secondary education.
Table 2 presents the AP (ie, as reported in the FTFI) and the TP (ie, corrected for bias by the naïve sensitivity and Bayesian methods) for categorical risky behaviours. For ‘never tested for HIV in last 12 months’ and ‘non-condom use last sex act’, the corrected estimates were about 1–2% lower than those reported by the FSW. For all other behaviours the TP was higher than the AP. The differences between the AP and TP ranged from <1% for ‘non-condom use sex act’ to 33% for ‘ever associated with a venue to sell sex’. CIs were wider with both the naïve and Bayesian approaches than with the analysis that considered only random error. For four behaviours the upper limit calculated by the naïve sensitivity method exceeded the plausible limit (ie, >100%), whereas all the Bayesian interval bounds fell within plausible ranges. The adjusted point estimates from the naïve and Bayesian methods were similar for all behaviours except ‘ever used drugs’ (79.4% vs 69.6%) and ‘not receiving the result of the HIV test’ (31.5% vs 36.8%).
The PPV and NPV (calculated from the posterior distribution) of the FTFI responses for seven categorical behaviours are illustrated in figure 1. The PPV ranged from 80.5% (95% CRI 54.7% to 95.0%) for ‘non-condom use last sexual act’ to 97.4% (95% CRI 90.4% to 99.8%) for having ever been married.
The NPV of the behavioural questionnaire for a history of symptoms of sexually transmitted infections and ‘ever associated with a venue to sell sex’ had the lowest levels of 36.3% (95% CRI 4.2% to 69.1%) and 46.9% (95% CRI 6.3% to 79.1%), respectively. The best NPV was recorded for ‘non-condom use last sexual contact’ at 90.3% (95% CRI 76.4% to 97.8%).
As illustrated in table 3, after adjustment for systematic errors by the MCSA technique, the estimates of all numerical risky behaviour variables increased, from a minimum of 0.1 year for ‘age at first sex act for money’ to a maximum of 1.5 for ‘number of sexual contacts in last 7 days’.
Bias analysis quantifies the influence of systematic error on the estimation of associations in epidemiological studies.12 We have extended the application of bias analysis to cross-sectional behavioural surveys, to provide the best estimates of the prevalence of risky behaviours rather than of associations. Both our naïve and Bayesian techniques for adjusting for measurement error produce point estimates for sensitive risk behaviours that are considerably higher than those reported by FSW in FTFIs. To our knowledge, adjustment techniques for bias, particularly for numerical outcomes, are rarely elaborated upon in the literature.7 ,9 ,13 ,14 We propose a model for adjusting for measurement error that is applicable to surveys of high-risk populations measuring highly stigmatised behaviours.
Bias analysis has been proposed mainly to estimate the direction and magnitude of bias and to counter the human tendency towards overconfidence in the findings of a particular study. In line with Lash et al,15 we wish to emphasise a third objective of bias analysis, which is to guide new data collection. We have demonstrated in a behavioural survey (in this case, an HIV-related survey) that answers to sensitive questions are affected to differing and considerably high degrees because of the stigma surrounding them. The size of these effects, and the methods needed to adjust for them, highlight the need for more careful attention to the data collection process and for considering whether additional validation procedures should be implemented. As an empirical example, our study is aimed at people working in public health and HIV, to encourage them to include calibration steps.
Our general additive model (equation 2), considering both random and systematic types of errors, appears to be a simple but effective model to provide adjusted estimates for numerical variables. The steps are similar to the analytical approach described by Lash et al,12 applying the distribution of bias parameters to take into account the bias and random errors and summarising the final corrected estimate as a point with its simulation interval.15
We recognise that the two approaches have limitations. First, with the naïve method, using Se and Sp in equation 1 could produce implausible values (ie, upper limits >100%) for the corrected prevalence, because the uncertainty around the AP was not properly considered. In contrast, the Bayesian technique accounts for the random error around the AP as well as the uncertainty around the bias parameters (Se and Sp), producing adjusted estimates conditional on a prior for TP that is bounded by 0% and 100%.9 This constraint on the outcome removes the need for Se+Sp−1 to be positive.14
Second, the TP of ‘ever used drugs’ corrected by the Bayesian technique was considerably lower than that from the naïve method. This was probably due to the large uncertainty around one of the bias parameters for this variable (ie, Sp 15.8–100%). This effect on the prevalence point estimate was masked in the naïve method, indicating that bias analysis requires accurate, or at least the most plausible, estimates of the bias parameters.13 For the plausible range of the TP, the uncertainty around the bias parameters (Se, Sp) translated into impossible values and an uninterpretable range with the naïve method.
Third, we have assumed the bias parameters to be equal across all subgroups of the survey, but this may not always be true. Moreover, the uniform prior distribution for the TP could be replaced by more plausible or realistic distributions; unfortunately, access to such extra information is typically very limited. We therefore recommend that a sensitivity analysis be performed to see how the final bias-adjusted estimates vary with changes in the prior distributions of the target estimates, that the exact distributions of bias parameters (instead of approximate parametric distributions) be employed and that different link functions between random and systematic error be used in the sensitivity analysis formula.
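Where the implausible naïve limits come from can be read directly from equation 1, TP = (AP + Sp − 1)/(Se + Sp − 1). Assuming Se + Sp − 1 > 0, a short derivation gives the range of AP for which the naïve correction stays within [0, 1]:

```latex
\begin{aligned}
TP \le 1 &\iff AP + Sp - 1 \le Se + Sp - 1 \iff AP \le Se,\\
TP \ge 0 &\iff AP + Sp - 1 \ge 0 \iff AP \ge 1 - Sp.
\end{aligned}
```

The naïve estimate is therefore plausible only when 1 − Sp ≤ AP ≤ Se; any Se value from the lower part of its CI that falls below the observed AP pushes the corrected upper limit above 100%.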
Finally, some may find the corrections for numerical variables too small to matter. However, the magnitude of bias in most variables we assessed is large enough to change assessments of the potential spread of the epidemic. For example, the number of sexual contacts in the last week was corrected from 2.3 to 3.8. Expressed over a longer time frame, the difference between the biased and corrected rates becomes clearer: about 78 sexual contacts per year.
As part of any well-planned epidemiological study, researchers need to design and implement the bias analysis carefully, ensuring that the information on bias parameters is valid and sufficiently precise for adjusting the observed estimates and that the bias adjustment equations are properly chosen.12 ,15 Investigators planning behavioural surveys that collect data on stigmatised behaviours must also consider threats to the validity of responses and plan a systematic validation study, even in a small subsample of the whole study, to collect the required data on bias parameters. Having a quality assurance strategy for HIV serological testing to ensure that test results are accurate is well-accepted practice. The same concept should apply to the behavioural data on which prevention planning and impact assessment depend.
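The annual figure follows directly from the weekly correction of 2.3 to 3.8 contacts:

```latex
(3.8 - 2.3)\ \tfrac{\text{contacts}}{\text{week}} \times 52\ \tfrac{\text{weeks}}{\text{year}}
  = 1.5 \times 52 = 78\ \tfrac{\text{contacts}}{\text{year}}
```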
In conclusion, we believe that adjusting for measurement bias in sensitive behavioural studies will yield a higher prevalence of risky behaviours most of the time which, given the social desirability response bias, is more likely to approach the truth. We further believe that both Bayesian and MCSA methods provide more plausible corrected estimates and therefore recommend their use in the analysis of bias and its effects on key measures of risk in surveys of hidden, hard-to-reach and marginalised populations. In addition to other types of study designs such as case–control studies, cross-sectional surveys can also benefit from applying bias analysis.
What is already known on this subject
Policymakers and health authorities need accurate estimates of risky behaviours to better monitor the response to an HIV epidemic.
HIV self-reported risk behaviours are mostly stigmatised issues and are prone to under-reporting (bias).
The literature presents much discussion on how to measure and quantify random error with less discussion on systematic error or bias.
What this study adds
There is a clear need for and applicability of bias analysis in surveys, particularly in stigmatised settings.
The two proposed bias analysis methods can be applied to produce bias-adjusted estimates for both categorical and numerical HIV self-reported risk behaviours.
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
Acknowledgements The authors are indebted to their colleagues from the medical universities for helping in recruitment, interviewing and supervising the national biobehavioural survey in female sex workers.
Contributors AM contributed to the conception and design of the study; acquisition, analysis and interpretation of the data; drafting the article and revising based on reviewer comments. M-AM made substantial contributions to the conception, analysis and interpretation. SaN contributed to the conception and design of the study, interpretation of the data and revised the article for important intellectual content. SN contributed to the analysis and interpretation of the data and revised the article for important intellectual content. WM contributed to the conception, design, analysis and interpretation of data and revised the article for important intellectual content. AAH contributed to the conception of the study, interpretation of the data and revised the article for important intellectual content. KM contributed to the conception and design of the study, interpretation of data and revised the article for important intellectual content.
Funding This work (as a PhD thesis for AM) was supported jointly by HIV Research Trust (grant no. HIVRT11-052), Tehran University of Medical Sciences (grant no. 240/1626) and Regional Knowledge Hub for HIV/AIDS Surveillance—WHO collaborating center based at Kerman University of Medical Sciences (grant no 90/122).
Competing interests None.
Ethics approval All procedures were approved by the Research Review Board of the Kerman University of Medical Sciences.
Provenance and peer review Not commissioned; externally peer reviewed.