Background Physical activity (PA) is important for maintaining health, but there are fundamental unanswered questions on how best it should be measured.
Methods We measured PA in the Netherlands (n=748), the USA (n=540) and England (n=254), both by a 7-day wrist-worn accelerometer and by self-reports. The self-reports included a global self-report on PA and a report on the frequency of vigorous, moderate and mild activity.
Results The self-reported data showed only minor differences across countries and across groups within countries (such as different age groups or working vs non-working respondents). The accelerometer data, however, showed large differences; the Dutch and English appeared to be much more physically active than Americans (for instance, among respondents aged 50 years or older, 38% of Americans are in the lowest activity quintile of the Dutch distribution). In addition, accelerometer data showed a sharp decline of PA with age, while no such pattern was observed in self-reports. The differences between objective measures and self-reports occurred for both types of self-reports.
Conclusion It is clear that self-reports and objective measures tell vastly different stories, suggesting that across countries people use different response scales when answering questions about how physically active they are.
- physical activity
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Physical activity (PA) is a prime component of health behaviour, and accurate measurement is necessary for a better understanding of what drives differences in PA and how PA influences health. Most large-scale studies of population PA including PA at older ages have used self-report questionnaires.1 2 Valuable though self-report data may be, there are severe limitations to their use. First, responses to questionnaires may suffer from incomplete recall, impaired cognitive ability at older ages and the influence of socially desirable answers, which may vary across place and time. Second, there may be important differences across socioeconomic and demographic groups and places in what is considered PA and vigorous PA in particular. Third, with self-reports alone, it may be difficult to assess PA in different intensities since light-intensity PA is especially hard to measure with questionnaires. With accelerometers, it is possible to better assess PA in different intensities.
Different respondents may also attach different subjective ratings to a given PA level.3 4 This is particularly likely when using cross-country data as individuals from different countries may have culturally influenced thresholds demarcating response categories commonly found in subjective questions such as ‘inactive’, ‘mildly active’, ‘moderately active’, ‘active’ and ‘very active’.5
Self-assessments involve several cognitive processes, including understanding the question asked, recall of relevant information and translation of information into response alternatives offered by the survey administrator.6 For instance, when asked to report their PA on a five-point scale from 1 being ‘inactive’ to 5 being ‘very active’, individuals will first assess their true PA and then translate the assessed activity into what it means to them to be above or below a given threshold (such as ‘active’ or ‘very active’). Individuals may attach different labels to describe the same situation, making it difficult to determine how much of the variation is attributable to true differences and how much is attributable to variations in their subjective thresholds.
One promising way to address the measurement issues inherent in self-reports of PA is to rely on wearable devices such as accelerometers that can objectively measure PA. Accelerometers have been used in several population studies including NHANES7 and the Health Survey for England,8 and provide unique information about limitations of self-reported activity data. For example, although patterns across age and gender are qualitatively similar in the USA between the self-reports and accelerometer data, the accelerometer data indicate a much lower adherence to generally recommended PA levels than the self-reports.7 Similarly, in the UK,9 activity levels of older people rarely reach generally recommended levels when measured with accelerometers, but do when measured with self-reports.
International comparisons of PA have mostly used standardised questionnaires, such as the International PA Questionnaire.10 11 When comparing activity levels across European countries, the highest PA level was found in the Netherlands and Germany, while lower PA levels were found for Belgium and Sweden.12
The goal of this paper is to compare PA across countries and across different socioeconomic groups within countries, using both self-reports and accelerometry for the same people.
Participants and cohorts
Participants were recruited from three nationally representative cohorts: the Dutch Longitudinal Internet Studies for the Social Sciences (LISS) panel (https://www.lissdata.nl/lissdata/Home), the Understanding America Study (UAS) (https://uasdata.usc.edu) and the English Longitudinal Study of Ageing (ELSA) (http://www.elsa-project.ac.uk/). The Dutch and US panels interview respondents over the internet. To maintain national representativeness, respondents without prior internet access are provided with a computer or tablet and an internet subscription. ELSA is a nationally representative panel of English respondents aged 50 years and older who are interviewed face-to-face. Respondents from all cohorts were randomly selected to participate in the accelerometer study. The final analytical sample contained 748 participants from LISS, 540 from UAS and 254 from ELSA. Informed consent was obtained from participants.
Physical activity data
PA was assessed objectively with a wrist-worn accelerometer (GENEActiv, UK). We collected data at 50 Hz in England and the USA and at 60 Hz in the Netherlands.i The data used in the analysis are based on 60 s epochs. Data were stored in gravity (g) units (1 g=9.81 m/s²). The Euclidean norm of the three raw signals minus 1 g, with negative numbers rounded to zero, was used to quantify acceleration related to the movement registered and expressed in milligravity units.13 Participants were required to wear the device for 24 hours a day for a minimum of seven consecutive days. Participants who failed to wear the device for at least 10 hours/day were excluded. There was little variation in average wear time across the three countries (23.66 hours/day in LISS, 23.26 hours/day in UAS and 23.26 hours/day in ELSA). Respondents were asked to report what time they woke up and went to bed, if and how long they were bicycling, and whether they had taken off the accelerometer and for how long.
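The acceleration metric described above can be sketched in a few lines of Python. This is a minimal illustration, not the GENEActiv software's own procedure: the function names are hypothetical, the inputs are assumed to be already calibrated in g units, and the epoch summing and the 50/60 rescaling follow the description in footnote i.

```python
import math


def enmo_mg(x, y, z):
    """Euclidean norm of one (x, y, z) sample minus 1 g, floored at zero, in milli-g.

    Inputs are assumed to be in g units (1 g = 9.81 m/s^2).
    """
    magnitude = math.sqrt(x * x + y * y + z * z)
    return max(magnitude - 1.0, 0.0) * 1000.0


def epoch_sums(samples, sample_rate_hz, epoch_seconds=60):
    """Sum per-sample values over consecutive epochs.

    An incomplete trailing epoch is dropped (an assumption of this sketch).
    """
    per_epoch = sample_rate_hz * epoch_seconds
    values = [enmo_mg(x, y, z) for x, y, z in samples]
    n_epochs = len(values) // per_epoch
    return [sum(values[i * per_epoch:(i + 1) * per_epoch]) for i in range(n_epochs)]


def rescale_to_50hz(epoch_values, sample_rate_hz):
    """Make summed epochs comparable across sampling rates (60 Hz data times 5/6)."""
    return [value * 50.0 / sample_rate_hz for value in epoch_values]
```

A device lying still with one axis aligned to gravity, for example `enmo_mg(0.0, 0.0, 1.0)`, yields zero, so only acceleration in excess of gravity contributes to the summed epochs.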
Participants were also asked to self-report their PA (‘Overall, how would you describe your level of PA?’), rated on a five-point scale: inactive, mildly active, moderately active, active and very active. This question was not specifically designed to refer to the period of accelerometer wear, but elicits a general self-evaluation of PA. Furthermore, all three surveys asked for frequency of PA at three levels of intensity. For vigorous, moderate and mild activity, respondents are asked if they engage in them ‘hardly ever, or never’, ‘one to three times a month’, ‘once a week’ or ‘more than once a week’.
All three datasets contained information on gender, age, marital status, work status, education and ethnicity.
We rescaled the objective PA measures onto a five-point scale similar to that used for the self-reports, taking the Dutch data as the basis. To translate the objective measure into categorical responses on a five-point scale, we first computed each respondent’s average of measured acceleration, conditional on having valid wear days over the 7-day period, and assigned the label ‘inactive’ if the 7-day average fell below the 20th percentile, ‘mildly active’ if the average was between the 20th and 40th percentile, ‘moderately active’ between the 40th and 60th percentile, ‘active’ between the 60th and 80th percentile, and ‘very active’ above the 80th percentile of the Dutch distribution. By construction, 20% of Dutch respondents fell in each of the five activity categories. We compared differences across groups in both PA measures using χ² tests. Separate linear regression models were estimated to explore how covariates explained subjective and objective activity measures for all three countries. The linear regressions were re-estimated as ordered probit and ordered logit models to check the sensitivity of the results to the analysis method adopted. The set of respondent characteristics included marital status (not married is the reference group); gender (male is the reference group); ethnicity (non-white is the reference group) or immigrant status (in the Netherlands, non-native Dutch is the reference group); two dummy variables indicating educational attainment levels (low is the reference group); a dummy variable for working in the labour market (not working is the reference group) and a dummy for being 65 years or older (ages 50–64 are the reference group). To facilitate comparability, virtually all analyses presented in this paper were restricted to the 50 years+ population.
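The rescaling step described above can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the authors' code: the function names are hypothetical, the percentile interpolation rule (the default of `statistics.quantiles`) is not stated in the text, and a value exactly equal to a cut-off is placed in the higher category.

```python
from bisect import bisect_right
from statistics import quantiles

LABELS = ["inactive", "mildly active", "moderately active", "active", "very active"]


def reference_cutoffs(reference_values):
    """20th, 40th, 60th and 80th percentiles of the reference (here: Dutch) sample."""
    return quantiles(reference_values, n=5)


def categorise(value, cutoffs):
    """Map a 7-day average acceleration onto the five-point scale."""
    return LABELS[bisect_right(cutoffs, value)]


def share_in_category(values, cutoffs, label):
    """Fraction of a sample that falls in one category of the reference scale."""
    labels = [categorise(value, cutoffs) for value in values]
    return labels.count(label) / len(labels)
```

Applying `reference_cutoffs` to the reference sample and then `share_in_category` to another country's sample gives statements of the form "x% of this sample would be in the bottom 20% of the reference distribution".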
Sixty per cent of our sample was married, and almost half were female. Compared with the Dutch and English samples, the US sample share of the low education group was smaller while that of the high education group was higher (online supplementary table S1).
Compared with the US sample, the Dutch and especially English samples are relatively old: 36% of Americans are among the 65+ group, which compares with 48% for Dutch respondents and 62% for English respondents. As a result of these differences in age composition, far fewer respondents in the ELSA sample are working than in either the Dutch or the US sample (30% in ELSA, 34% in LISS and 48% in UAS). The English sample has a high share of respondents with a low education and a low share of high education, which may be partly due to differences in definition of education categories.
Global self-reports and accelerometry
Results from table 1, showing distributions for the 50+ samples, suggest that the Dutch and English are more inclined to stay near the middle of the scale in self-reports: they are somewhat less likely to call themselves either inactive or very active compared with the Americans, but overall differences are modest. In contrast, the right three columns of table 1 show large differences between objectively measured PA of the Dutch and English, compared with the American sample: 38% of Americans would be in the bottom 20% of the Dutch distribution (and 21% of the English). The χ² statistics at the bottom provide tests of three bivariate comparisons. The comparisons of self-reports show that the level of PA between England and the Netherlands is not significantly different, but both are significantly different from the USA.ii In terms of statistical significance, the comparison of the objective data shows a similar pattern, but the p-values of the tests comparing the USA with the Netherlands or England are several orders of magnitude smaller, and indeed the patterns are notably different between the USA and the two European countries.
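For readers who want to reproduce this kind of comparison, the sketch below shows a Pearson χ² test on a country-by-category table of counts, followed by the Holm-Bonferroni step-down correction mentioned in footnote ii. It assumes the bivariate tests are Pearson tests of independence, which the text does not spell out; the function names are hypothetical, and the closed-form tail probability used here is valid only for an even number of degrees of freedom (a 2×5 table has 4).

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi_square_sf_even_dof(stat, dof):
    """Upper-tail probability of a chi-square variable; even dof only."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form requires a positive, even number of degrees of freedom")
    half = stat / 2.0
    term, series = 1.0, 1.0
    for k in range(1, dof // 2):
        term *= half / k
        series += term
    return math.exp(-half) * series


def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni: test the smallest p-value at alpha/m, the next at alpha/(m-1), ..."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    reject = [False] * len(p_values)
    for rank, index in enumerate(order):
        if p_values[index] <= alpha / (len(p_values) - rank):
            reject[index] = True
        else:
            break  # once one hypothesis is retained, all larger p-values are retained
    return reject
```

With three pairwise country comparisons, `holm_reject` takes the three p-values in any order and returns, in the same order, which differences remain significant after correction.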
Next consider breakdowns by salient background characteristics. The breakdown by age in table 2 presents once again a stark contrast between the conclusions based on self-reports and objective measures. In this table, to highlight the age patterns we provide the more complete age distribution statistics for the Dutch and American samples. In England and the Netherlands self-reports on activity do not show an appreciable relation with age, while the American data suggest both a modest increase in the number of inactive respondents and decrease in the number of very active individuals with age. The objective data make clear however that PA drops sharply at older ages in all three countries, while the age patterns in the USA are much steeper than for the self-reported data.
Table 3 shows no appreciable difference in self-reported PA between working and non-working respondents,iii while the objective measures show highly significant differences in all three countries; working respondents are significantly more active. Once again, individuals’ subjective standards of what constitutes PA may vary by their work status with a higher threshold for PA among workers.
This finding may be partly related to the pattern found for different age groups, as older people are less likely to be working and are also less likely to be physically active. To disentangle the effect of the various background characteristics, we therefore turn to regression analysis. Redoing the analysis with ordered probit or ordered logit leads to qualitatively similar conclusions.
Table 4 presents results for separate regressions explaining the subjective and objective activity measures for all three countries, once again restricting the samples to 50+. Using the Dutch sample to illustrate, Dutch respondents show very different patterns in the regression analyses for the subjective and objective measures of PA. The objective measure demonstrates a pronounced decline in PA with age, which is not present at all in the subjective measure. A very similar pattern is seen in the USA and England, with the English data showing a significant increase in self-reported PA with age.
The estimates suggest some other country differences that differ depending on whether we are using a subjective or an objective measure of PA. For instance, in the Dutch sample, being female is positively associated with self-reported PA, while in the USA the association is negative. This association disappears in all three countries when we use the objective measure of activity. The relationship of PA with working is stronger with the objective measure than the subjective measure in all three countries. The objective measures do not indicate a relation of PA with education in the Netherlands or the USA. The self-reports suggest a positive association between high education and PA in the Netherlands. In England both self-reports and objective data suggest a positive association with high education, but the effect is twice as large in the objective data.
Self-reported frequencies and accelerometry
Rather than the global self-report question analysed so far, many studies use a more quantitative approach by asking how often one engages in vigorous, moderate or mild activity. In particular, this question asks respondents to report how often they take part in sports or activities that are vigorous, moderate and mild; answers to each question are (1) hardly ever, or never; (2) one to three times a month; (3) once a week and (4) more than once a week. Responses to these questions reveal that among respondents 50+ Americans and Dutch report rather similar frequency and intensity of PA (table 5). However, the English self-reports are quite different. In particular, the English are much more likely to say that they frequently engage in vigorous PA. Consistent with this, the English are much less likely to say that they frequently participate in moderate or mild activity.
The results of OLS regressions of the measured weekly average of accelerometer readings on a set of dummies representing the categories listed in table 5 are shown in online supplementary table S2. First considering the coefficients of the Dutch variables we note that the self-reported frequency of activities does not appear to have a consistent relation with measured PA. For example for vigorous PA, the effect of doing that one to three times a month is about equal to the effect of engaging in vigorous PA more than once a week. For moderate activity, we observe a somewhat larger effect for engaging in moderate activity more than once a week compared with the lower frequencies. For mild activities, there is no clear pattern, although we find a significant negative effect of doing mild activities once a week on measured PA.
The US and England columns show interactions with the dummies for these countries. For both the USA and England, the coefficients for vigorous activity frequency are mostly large and negative. This implies that for the same frequency of reported vigorous activity the objectively measured level of activity is mostly much lower in the USA and England compared with the Netherlands. For moderate activity, the English interactions are once again large and negative. The interactions with the US dummy are not significantly different from zero for this level of activity. For mild activity, the England dummies are not significant, while for the USA they are significant and positive. Altogether, the relation between self-reports and measured PA is unclear, confirming once again the difficulty of measuring differences in PA based on self-reports. The most significant outcome in table S2 is the large negative US dummy, indicating that at the same level of self-reported PA, measured PA is considerably lower.
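The structure of the regression discussed above (measured activity on dummies for each reported frequency, with ‘hardly ever, or never’ as the omitted category) can be sketched as follows. This is only an illustration of the dummy coding and a plain least-squares fit: the function names are hypothetical, the country interactions are left out, and a statistics package would normally be used instead of the small solver shown here.

```python
CATEGORIES = [
    "hardly ever, or never",       # reference category, no dummy
    "one to three times a month",
    "once a week",
    "more than once a week",
]


def dummy_row(answers):
    """Design-matrix row for (vigorous, moderate, mild) answers given as indexes 0..3.

    The row holds an intercept followed by three dummies per intensity level.
    """
    row = [1.0]
    for answer in answers:
        row.extend(1.0 if answer == level else 0.0 for level in range(1, len(CATEGORIES)))
    return row


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]
```

Each fitted dummy coefficient is read as the difference in measured activity relative to respondents who report that intensity level ‘hardly ever, or never’, holding the other two answers fixed.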
The GENEActiv model was chosen on grounds of cost, ease of use and available software.14 In addition, the wrist-worn accelerometer has several advantages over waist-worn actigraphs: it is easier for participants to wear and avoids errors of positioning. It can be worn continuously, so that movements at night can be detected.
The very high compliance rate in all three cohorts reduces the need for imputation for the time that the device was not worn. Since moreover the wear time was not statistically different across countries, we deemed it unnecessary to impute activity for the limited time the device was not worn. The large differences found between patterns of self-reports and of measured activity across countries or across demographic groups within countries are unlikely to be affected materially by the minor differences in compliance across cohorts.
We have noted a number of demographic and socioeconomic differences between the three samples. Although these differences may affect conclusions about population levels of PA, it should be borne in mind that the main aim of the paper is not to compare PA by country or socioeconomic group per se, but rather to analyse how conclusions may change if we adopt different measures of PA.
It is clear that self-reports and objective measures tell vastly different stories. Both across countries and across different socioeconomic and demographic groups within countries, self-reports vary only moderately or not at all. At the same time, accelerometry indicates large differences across certain groups. We have found a sharp decline of PA with age, a much higher level of activity in the Netherlands and England than in the USA, and a higher level of activity among working than among non-working respondents. The discrepancy between the objective measures and the global self-reports points at reporting standards for PA that vary across groups. Individuals in different environments and in different age groups simply have different standards of what it means to be physically active. Respondents seem to adjust their standards for what it means to be physically active to their own circumstances, such as age.15 16 Conceivably, standards are set relative to others in the same age bracket or same demographic group, so that standards vary in proportion to the average level of PA in a group.
The global self-report question (‘Overall, how would you describe your level of PA? (1) inactive, (2) mildly active, (3) moderately active, (4) active and (5) very active.’) would seem to be an obvious approach to comparing PA in different countries. Our analysis suggests that this goal is elusive. The global question is an example of a question using ‘vague quantifiers’.17 18 The criticism often made against such questions is their inherent incomparability: for instance, ‘moderate activity’ may mean very different things to different people. This does not necessarily mean that quantitative questions are more informative, since respondents may find it difficult to recall the quantitative information with sufficient accuracy. Indeed, measures elicited with vague quantifiers have been found to be more predictive of target variables of interest than answers to quantitative questions.19 Similarly, if we run regressions with the self-reports on a five-point scale as presented in table 1, the R² is 0.250, which is almost twice as high as the R² in online supplementary table S2 (0.136).
Thus, the issue is not that simple self-reports of PA are less reliable than the more detailed questions for frequency of various levels of PA. Rather the problem with both types of questions is that they are understood systematically differently by different groups or by respondents in different countries and hence are unsuitable for use in comparisons across these groups. For that purpose, the use of accelerometry appears indispensable.
What is already known on this subject
Most large-scale population studies have used self-report questionnaires to assess physical activity.
There may be important differences in reporting bias across socioeconomic and demographic groups.
What this study adds
The self-reported physical activity data showed only minor differences across countries.
In contrast, objective data showed dramatic differences.
Across countries people use different response scales when answering questions about how physically active they are.
The authors thank Joris Mulder and Annette Scherpenzeel for designing and managing the data collection among members of the LISS panel and Tania Gutsche, Bas Weerman and Eric Esajian for the design and management of the data collection among members of the Understanding America Study. Margaret Blake, Marta Jackowska and Stephanie Schrempft contributed to the management of data collection in ELSA. The authors also thank Arthur Stone for many helpful comments.
↵i We used the GENEActiv PC procedure to generate the data. Under this procedure, the individual data points are summed to create the epochs. Since the Dutch data are based on 60 Hz, we multiplied the Dutch epochs by 5/6 to make outcomes comparable across the three samples.
↵ii This conclusion remains true when we correct significance levels for multiple hypothesis testing using Holm-Bonferroni critical values.
↵iii The seemingly significant effect for the USA goes away once we correct for multiple hypothesis testing.
Contributors AK, JB, MH, JPS and AS conceived the study and drafted the paper. SHW and AK carried out the statistical analysis, had full access to the data and take responsibility for the integrity and accuracy of the results. All authors contributed intellectually to refine the study design and to the critical revision of the manuscript.
Funding This research was funded by grants from the National Institute on Aging including R-37AG25529 to JPS at RAND and R01AG20717 to AK at USC. Funding for ELSA was provided by the National Institute on Aging (R01AG017644) and a consortium of UK government departments coordinated by the Economic and Social Research Council.
Competing interests None declared.
Patient consent Obtained.
Ethics approval Ethical approval was gained from the respective ethics committees of each study (ELSA, LISS, UAS).
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement ELSA data are publicly available from http://www.data-archive.ac.uk/.