Self report in clinical and epidemiological studies with non-English speakers: the challenge of language and culture
Internationally, there is a drive for equality in health care for ethnic groups. To achieve equality, produce sound policies, and provide appropriately targeted services, good quality data are essential. Where data are based upon self report, especially from non-English speakers, there are major barriers to the accumulation of reliable and valid information. When data collection instruments designed for English speakers are simply translated into ethnic minority languages, measurement error can result from inadequate translation procedures, inappropriate content, insensitivity of items, and the failure of researchers to familiarise themselves with cultural norms and beliefs. More attention should be paid to conceptual and cultural factors, especially in epidemiological and clinical studies where self report is used to gather data. Greater interdisciplinary collaboration is needed, as is a modification of customary methods of data collection and the assumptions behind them. The essence of such modifications is participatory research with members of the linguistic communities concerned.

Assessing the health and healthcare needs of ethnic minority populations is an acknowledged priority for both health and social services in the USA and the UK,1 and increasingly in Europe. The UK Race Relations (Amendment) Act 2000 places responsibility for equality between “races” on all public sector bodies. Internationally, there are increasing numbers of policies and legal requirements based on similar principles. A sound scientific database is necessary to ensure that services and initiatives are targeted appropriately at the groups in question. In the medical domain, reliable information is required on symptoms, health related behaviours, disease patterns, healthcare needs, use of services, and outcomes. Non-English speakers have been excluded from clinical trials and epidemiological studies; one of several possible reasons is the lack of valid and reliable instruments.2,3

There are major barriers to the accumulation of reliable information on ethnic minorities, particularly newer and older immigrants and refugees who may have little or no competency in English. About 23% of immigrants to Britain born in China, Bangladesh, India, and Pakistan have no functional skill in English and 70% cannot function fully in an English speaking social environment.4

Ethnic boundaries are imprecise and fluid, and the grouping of people has too often ignored significant differences of religion, habits, and language. As this paper is concerned with gathering information by self report, we focus on preferred language rather than on ethnic group itself.

A recent editorial in the British Medical Journal summarised the general problems of obtaining high quality data from self report by non-English speakers.5 However, this is a very complex and important topic that has received scant attention in the epidemiological literature. Accordingly this paper discusses the issues in greater depth, together with the implications for comparative surveys.


The problems that general practitioners, nurses, and hospital doctors face in communicating meaningfully with non-English speakers have received some attention.6,7 However, there has been insufficient acknowledgement of these issues in relation to epidemiological studies and surveys. In the health domain, self report data are gathered for several purposes (box 1) using a variety of methods ranging in precision from casual questioning, through unstructured or semi-structured qualitative interviews, to standardised interview schedules and self completed questionnaires. These instruments are produced with varying degrees of refinement: some undergo extensive testing for validity, responsiveness, and reliability, others little or none.

Box 1 Uses of self report data

  • As an integral part of the clinical interview

  • As an adjunct to clinical measures, for example using standardised questionnaires such as the SF36,8 for the assessment of general health status, to assess degree of disability, to check for the presence and severity of symptoms, or to gather patient assessed outcomes, which are increasingly included in clinical trials and other studies of treatment efficacy.

  • As a part of epidemiological studies, for example gathering data on health related behaviour to monitor changes in personal habits, the use of diagnostic tools such as the Rose Angina Questionnaire,9 and for periodic international data collection, such as that on child and adolescent health.10

  • By the NHS in relation to health needs assessment for planning and targeting of services.

  • In studies of satisfaction with health care, for example women’s views of maternity services.11

Measurement error results from four major sources. Firstly, it can arise because latent variables may not be shared across languages, as might be the case with descriptions of mental wellbeing.11 Secondly, it can arise from the original selection of questions and response constraints: ambiguous wording, lack of clarity, awkward or inappropriate categories, and the order and context in which the items are presented. Thirdly, it can arise from the respondents themselves, through lack of understanding, misinterpretation or confusion, lack of motivation, and/or the perceived social desirability of certain answers. Fourthly, it can arise from researchers being uninformed about the populations to be sampled, for example being unfamiliar with cultural norms and conventions and failing to consult translators about the appropriateness of questions.

There has been a great deal of research in the social sciences on the way in which respondents interpret and respond to oral or written questions. When a person is confronted with a questionnaire item, a series of cognitive processes is set in train (box 2). These processes affect the quality of the data obtained.12

Box 2 Cognitive processes triggered by questioning

  • Comprehension and interpretation of the purpose of the question

  • Information retrieval and reconstruction

  • Judgment about what is required

  • Evaluation of the situation and a decision about what to reveal

  • Selection of a response.

Self report data may be used to assess some phenomenon within a particular group, such as satisfaction with care, and sometimes to monitor changes over time. This requires only that the measure used is salient to and appropriate for the group concerned, and is valid and reliable. If the data are to be used to make comparisons between groups, as in clinical trials and most epidemiological studies, then the questions must be conceptually and functionally equivalent and salient for all the groups compared.


It has been customary to translate questions originally developed for native English speakers into the requisite language(s), on the assumption that the modes of inquiry, types of assessment, and research methods appropriate for native English speakers can be applied to other linguistic groups. Experience in translating questionnaires and interview schedules began primarily in the USA, in the field of cross cultural psychology.13 Initially, one or more professional translators would take material in English and translate it into the target language. Where more than one translator was involved, translations would be compared and agreement negotiated. The focus was on achieving linguistic equivalence. Subsequent field testing led to the realisation that bi-lingual people are not at all representative of the populations from which they come: they tend to differ in age, education, and often gender, and their translations are too formal and literary for most people.14

Key points

  • Collecting self reported data by ethnic group in multi-ethnic settings is necessary and difficult

  • There are major ethnic variations in many measures of self reported health and risk factor status

  • Translation into appropriate languages and back translation are necessary but insufficient steps

  • The social sciences literature offers guidelines on how to conduct surveys in cross cultural settings

  • There is an urgent need to improve the cross cultural validity of survey methods, particularly in epidemiological and public health research in multilingual, multi-ethnic societies

Currently, the most sophisticated translation techniques are applied in the field of patient assessed outcomes, where methods have evolved into a prolonged process of item selection, testing, and retesting, with consultations with people monolingual in the target language(s). This development has been fuelled by funding from pharmaceutical companies running clinical trials in which no single country can supply sufficient patients, and by the increasing requirement of regulatory bodies to include information from patients themselves.

“State of the art” translation/adaptation uses an iterative process with several stages15,16 as summarised in box 3.

Box 3 State of the art translation/adaptation procedures

  • Translation of items by a team of bi-linguals

  • Comparison of translations

  • Negotiation of “best” items

  • Consultations with people who are monolingual in the target language(s)

  • Item refinement

  • Field testing with monolinguals

  • Refinements as needed

  • Testing for face, content, construct, and criterion validity in each language.

  • Testing for reliability and responsiveness

  • Statistical analysis of ratings of quality of translation across different countries

However, even this prolonged degree of testing has been criticised for failing to produce items in other languages that are comparable to the original English in appropriateness and meaning.17 Where multiple languages are involved, each is translated separately from the English, so although each version may bear some resemblance to the English, the different language versions are not necessarily close to one another. Considerable differences of concept often remain between languages simply because no equivalent translation exists. For example, the term “feeling blue”, which is used in the original American version of the SF36, has different connotations when translated into different languages.18 These issues are starkly highlighted in languages such as Arabic, Cantonese, Punjabi, or Swahili, whose roots differ from those of English. For example, the terms “check up” and “Pap smear” have no conceptual equivalent in any Chinese language.19


It is important to consider conceptual matters, cultural relevance, and the subtle connotations of words and phrases within a particular group.20 A distinction can be made between language and culture. For example, England and the USA are both primarily English speaking, but their citizens do not necessarily share the same values with respect to health. In addition, their differing healthcare systems (socialised versus private medicine) may affect patients’ readiness to admit to health problems, as well as their satisfaction and compliance with medical regimens. It is important to bear in mind that the content of a questionnaire reflects not only the language of the originating country but also the standards, expectations, values, and preoccupations of both the researchers and the lay people involved in its development. Even when people belonging to another culture speak fluent English, they do not necessarily share the beliefs and values of native English speakers. Moreover, their interpretation of a question may differ somewhat, because there is evidence that bi-linguals process information differently from monolinguals.21

Research and health needs assessment have assumed that data from different ethnic groups can be compared. Yet Western concepts of “health”, “risk”, and “need”, for example, are so dominant that it is easy to forget that there are alternative views. The evidence suggests that cognitive processes are universal across cultures but that their content is not, although many elements may be shared.22 One example is the distinction drawn in English between family and friends, which is not nearly so clear cut in Asian and African cultures. Thus, a question such as “Has anyone in your family had heart disease?” may elicit a response drawn from a wider frame of reference in these populations than among northern Europeans.

Comparability of data across languages is not always achievable. For example, datasets in six different languages from the World Health Organisation’s Quality of Life Scale were analysed using the Rasch model of measurement.23 Although some health related concepts were similar across cultures, others were very dissimilar, particularly those related to mental health. This study strongly suggested that equivalence of a questionnaire across several cultures is unrealistic at all but a very basic level.
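To make the idea behind such an analysis concrete, the sketch below uses the simplest, dichotomous form of the Rasch model, in which the probability of endorsing an item depends only on the difference between a person's location on the latent trait (theta) and the item's difficulty (b), both in logits: P = exp(theta − b) / (1 + exp(theta − b)). Quality of life items are usually rated on multi-point scales, so in practice a polytomous extension of the model would be needed; the difficulty values below are hypothetical and are not taken from the cited study. The point illustrated is that if the same item is "harder" to endorse in one language version, people with identical levels of the underlying trait will answer differently, which is what a test of cross language equivalence looks for.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Dichotomous Rasch model: probability that a person at latent
    location `theta` endorses an item of difficulty `b` (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical difficulties for the SAME item, calibrated separately in two
# language versions (illustrative values only, not data from any study).
difficulty = {"English version": 0.0, "Translated version": 1.0}

theta = 0.5  # one person's level of the latent trait, held constant
for version, b in difficulty.items():
    print(f"{version}: P(endorse) = {rasch_probability(theta, b):.3f}")

# A non-zero shift in difficulty across versions (here 1.0 logit) means the
# item does not function equivalently: equal trait levels give unequal answers.
shift = difficulty["Translated version"] - difficulty["English version"]
print(f"Difficulty shift across versions: {shift:.1f} logits")
```

With these assumed values, the same person has a probability of about 0.622 of endorsing the English item but only about 0.378 for the translated one, so a difference in scores between language groups could reflect the item rather than the people.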



In epidemiology and health survey research, the methods for obtaining equivalent information from different language groups trail behind those of sociolinguistics and anthropology. Indeed, the design and development of questionnaires used in epidemiology in general has been criticised as too often ad hoc and of poor quality.24 Two research projects carried out in this department have illustrated these issues.

Fifteen local and national surveys of alcohol and tobacco use that had involved ethnic minorities were examined. Prevalence data on alcohol consumption and cigarette smoking differed by ethnic group, but these differences were not consistent across studies. As the data had in many cases been obtained from translated interview schedules and questionnaires, doubts were raised about the quality of the resulting information. Accordingly, the way in which items had been translated from the English was examined and compared with state of the art criteria. Only two of the studies had used more than one translator. One had consulted members of the language group concerned to investigate cultural sensitivities, and one had consulted monolingual people on the adequacy of the translations. None of the studies had tested the questions for validity, reliability, or responsiveness, and none had compared the translated questionnaires with languages other than English. Consultations held with Bengali speakers indicated serious problems both with the translations and with cultural appropriateness: for example, asking Muslims about alcohol use, and using expressions such as “weekend”, “Christmas”, and “hangover”, which had low relevance for the respondents.25

Another study concerned translations of the Rose Angina Questionnaire (RAQ) into Punjabi and Cantonese. These translations were made by professional translators without any input from monolingual people representative of those to whom they would eventually be administered. A review of translations of the RAQ showed that most often no details of translation methods were given at all, and face and content validity were not assessed in any of the studies.26 In-depth interviews with Punjabi and Cantonese speakers have highlighted issues of incorrect translation, inappropriateness, and lack of cultural relevance (Hanna L, PhD thesis forthcoming; personal communication).


Where face to face interviews use an interview schedule, complications can arise because somewhat different forms of the same language exist, for example Bengali and the Sylheti variant of Bengali, which has no written form. For some languages, such as Arabic and Cantonese, the written and spoken forms are not the same. This means that if an interviewer asks questions in one of these languages, the questions actually asked will not be the same as the questions as written, with unknown effects on data quality.

The customary interview situation may violate the expectations of other cultures about the normal way to interact, especially face to face. Such interviews constitute highly unnatural social situations regardless of culture. For example, the questions have been decided in advance by some unknown party and not discussed with the respondent; the topic may switch abruptly; the response options may not fit the situation; the interviewer uses standard wording regardless of the respondent’s answers; and, to preserve standardisation, the interviewer is not supposed to elaborate or discuss. Thus requests for clarification meet with a standard response, and the answers may not be grounded in the everyday life of the interviewee.27 These issues may be exacerbated for people from cultures where surveys are uncommon or unknown.


In the health research and healthcare fields, translation/adaptation of interview schedules and questionnaires from English into other languages has suffered from faults relating to cultural hegemony, failure to ensure that the phenomenon of interest is present in all target groups, lack of salience of content, non-equivalence of concepts, assumptions about willingness to disclose certain types of information, and the use of levels of language not easily comprehensible to the less well educated. More interdisciplinary cooperation could help ensure that advances in one field are shared with others to which they are pertinent, for example cognitive aspects of survey methodology, health measurement, epidemiology, survey research, linguistics, and ethnomedicine.

Rather than pursuing cross cultural equivalence, an alternative is to search for both emic and etic items referring to the topic of interest. Emic refers to those issues that are salient and meaningful only within particular cultures, while etic refers to universal concerns such as the welfare of children or the inability to carry out daily activities. In relation to health behaviours, asking whether a person smokes tobacco products may be universally appropriate; asking whether a person chews paan is unlikely to be. This strategy requires a much less ethnocentric and more participatory approach, whereby monolingual and bi-lingual representatives of the target group(s) are involved at all stages of the research. Initial inquiries would ascertain which matters are important to the group concerned and generate items for inclusion in a mode of inquiry congenial to that group. The process consists of a collaborative spiral of inquiry, reflection, action, planning, and discussion. The end result would be a set of questions sharing common items, supplemented by culture specific items. Measures developed in this way would differ between languages in some of their content but would be fully appropriate and salient to each. Such a procedure would allow comparisons within groups over time, and between groups on the shared items. Rasch analysis, as mentioned above, can help test whether latent variables are similar across languages.
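The bookkeeping implied by this emic/etic design can be sketched in a few lines: each language version carries a set of shared (etic) items plus its own culture specific (emic) items, within-group comparisons over time may use every item in a version, and between-group comparisons are restricted to the etic items common to all versions. The class, function, and item identifiers below (for example `smokes_tobacco` and `chews_paan`) are hypothetical illustrations, not part of any published instrument; item wordings are omitted deliberately, since they would come from the participatory process described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class QuestionnaireVersion:
    """One language version of a questionnaire (hypothetical structure),
    holding identifiers of its shared (etic) and culture specific (emic) items."""
    language: str
    etic_items: frozenset[str]
    emic_items: frozenset[str] = field(default_factory=frozenset)

    def all_items(self) -> frozenset[str]:
        # Within-group comparisons over time may use every item in this version.
        return self.etic_items | self.emic_items


def between_group_items(*versions: QuestionnaireVersion) -> frozenset[str]:
    """Items on which groups may be compared: etic items present in ALL versions."""
    if not versions:
        return frozenset()
    common = versions[0].etic_items
    for version in versions[1:]:
        common &= version.etic_items  # rebinds to a new frozenset; inputs unchanged
    return common


# Hypothetical item identifiers, for illustration only.
english = QuestionnaireVersion(
    "English", etic_items=frozenset({"smokes_tobacco", "daily_activities"})
)
bengali = QuestionnaireVersion(
    "Bengali",
    etic_items=frozenset({"smokes_tobacco", "daily_activities"}),
    emic_items=frozenset({"chews_paan"}),
)

print(sorted(between_group_items(english, bengali)))  # shared items only
print(sorted(bengali.all_items()))                     # full Bengali version
```

Keeping the two item types separate, rather than forcing every item into every language, means a culture specific item such as one on paan can be retained where it is salient without distorting the between-group comparison.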

Another approach is to focus upon the similarity of concepts rather than upon equivalence of items. For example, we might assume that the notion of physical wellbeing exists in all cultures but that its implications differ. For groups where prayer is important, the ability to kneel may be essential to wellbeing; for others this may be less salient than the ability to play, or even stand and watch, football. Exact comparisons may not be necessary as long as the underlying purpose of the question is the same.

Translators should be required to advise not solely on the target language but also upon the cultural acceptability of the questions to be asked. In some cases bi-lingual people may have become so far removed from traditions and the community under study as to be culturally if not linguistically alienated. Unless requested to do so translators may not regard it as part of their task to comment on the salience or sensitive nature of the questions asked.

Researchers working with ethnic minorities should become cognisant of the customs, values, and beliefs of the target group(s) before designing any project. Issues of cross language data collection should be seen as a challenge rather than an obstacle: a stimulus to innovative thought and the development of new techniques of investigation. Cultural and linguistic differences have yet to be incorporated as fundamental to sound public health, primary and secondary care, and health promotion. Categories and concepts in health research based upon a Western epistemological order may impede the accumulation of high quality data from people born outside Western countries.28 Individuals’ reactions to illness and discomfort, their concepts of health, their behaviour, and their help seeking are all intimately bound up with cultural beliefs, values, and experience.


Policies to improve the health and health care of ethnic and linguistic minorities in the health and social services will not achieve their goals unless the cultural dimensions of self report are given fuller attention than hitherto. This is no small task, and it poses a formidable challenge to policy makers. In London, as in many metropolitan cities, numerous languages are represented.29 If the approaches outlined in this paper are to be implemented, the implications for research are substantial, and research funding bodies need to devise policies that take them into account.


We are indebted to Lisa Hanna, Colin Fischbacher, and Leslie Alexander for their helpful comments on the several drafts of this paper.

 SMH had the original idea and wrote the paper; RB contributed ideas, helped redraft and commented at all stages.


