Study objectives: To examine the internal validity of a dietary pattern analysis and its ability to discriminate clusters of people with similar dietary patterns using independently assessed nutrient intakes and heart disease risk factors.
Design and participants: Population based study characterising dietary patterns using cluster analysis applied to data from the semiquantitative Framingham food frequency questionnaire collected from 1942 women aged 18–76 years between 1984 and 1988.
Setting: Framingham, Massachusetts.
Main results: Of 1942 women included in the cluster analysis, 1828 (94%) were assigned to one of the five dietary pattern clusters: Heart Healthy, Light Eating, Wine and Moderate Eating, High Fat, and Empty Calorie. Dietary patterns differed substantially in terms of individual nutrient intakes, overall dietary risk, heart disease risk factors, and predicted heart disease risk. Women in the Heart Healthy cluster had the most nutrient dense eating pattern, the lowest level of dietary risk, more favourable risk factor levels, and the lowest probability of developing heart disease. Those in the Empty Calorie cluster had a less nutritious dietary pattern, the greatest level of dietary risk, a heavier burden of heart disease risk factors, and a relatively higher probability of developing heart disease. Cluster reproducibility using discriminant analysis showed that 80% of the sample was correctly classified. The cluster technique was highly sensitive and specific (75% to 100%).
Conclusions: These findings support the internal validity of a dietary pattern analysis for characterising dietary exposures in epidemiological research. The authors encourage other researchers to explore this technique when investigating relations between nutrition, health, and disease.
- dietary patterns
- Framingham Study
- internal validity
With advancements in dietary assessment methodologies, many of the observed epidemiological associations between diet and chronic disease risk have been supported by clinical research and now inform our public policy recommendations for health promotion. None the less, nutrition researchers continue to critically evaluate investigative methods and strive for further enhancements that minimise the well established limitations of dietary assessment and analytical techniques.1–6
Food frequency questionnaires (FFQs) have facilitated the widespread investigation of diet and disease relations in large cohorts. With their generally established reliability and validity,7–9 FFQs provide a cost effective method for assessing usual dietary intake over a specified period of time within a population. FFQ data have typically been used to rank individuals according to varying levels of nutrient intake exposure, enabling calculation of relative risk estimates. As exposures to nutrients are not independent,3,5,10,11 interest in overall dietary patterns has emerged in the recent literature.6,12 In fact, FFQs are now commonly used to characterise intake of food groups and to explore total dietary patterns in relation to cardiovascular and cancer risk.13–19
An analytical approach that characterises overall dietary patterns will provide an innovative strategy for assessing dietary exposures in chronic disease epidemiology.3–5,10 Cluster analysis is one multivariate technique that has been successfully applied to nutrition data for characterising food intake patterns.6,12 In this report, we discuss the use of cluster analysis applied to a FFQ that was administered to women in the Framingham Nutrition Studies. We examine the internal validity of the dietary patterns using two sets of independent criterion measures: (1) nutrient intake assessed by dietary records and (2) heart disease risk factor profiles.
The Framingham Study was started in 1948 as a longitudinal population based study of cardiovascular disease. In 1971, a second generation cohort was enrolled, forming the Framingham Offspring/Spouse (FOS) study.20 This cohort consisted of 5124 Framingham Study offspring, between the ages of 5 and 70, and their spouses if married.
Subjects and ethics
The data reported were collected among women in the Framingham Offspring/Spouse cohort at Exam 3 (1984–88). The data collection protocols and procedures were approved for human subjects by the Office of the Institutional Review Board at the Boston University School of Medicine, Boston Medical Center. At Exam 3, some 2005 women aged 18 to 76 years participated (83% of eligible). Dietary patterns were characterised using cluster analysis applied to food consumption data from the semiquantitative Framingham FFQ. Participants were grouped into one of five clusters based on food intake patterns identified from individual responses to a FFQ. The clusters were then compared for independent assessments of nutrient intake (based on separate three day dietary records) and cardiovascular risk factors (based on standardised Framingham exam protocols), as described in detail below.
The self administered Framingham FFQ is a modified version of the original Willett questionnaire.21 This instrument was validated in the Framingham Offspring cohort.22 It contains 145 food items with seven non-overlapping response categories, ranging from “rarely or never use” to use “four or more times each day.” Respondents reported how often, on average, they consumed a standard portion of each food item during the past year. Reported frequencies were used to estimate the number of usual daily servings of each food item. Among women who came to FOS Exam 3, 97% completed the FFQ (n=1944). Of those, only two women (0.1%) were excluded from these analyses because of incomplete FFQ data.
Analytical details of the cluster technique were previously published and the identification of the dietary clusters has been described.6 In brief, the 145 food item listings on the FFQ were first classified into 42 nutrient based food categories. The food categories were consistent with subgroups of foods defined by the American Dietetic Association's Exchange List for Meal Planning.23,24 Food item listings in a particular category contained similar levels of macronutrients and other key nutrients (for example, all vitamin C rich fruits (>30 mg/serving) were grouped together). An estimate of the usual number of daily servings for each of the 42 food categories was derived for each study participant by summing across the category's component items, weighted by the individual frequency responses.
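The aggregation step described above can be sketched in a few lines of Python. This is a minimal illustration only: the servings-per-day weights attached to the seven response categories and the item-to-category mapping are hypothetical placeholders, not the coding used for the Framingham FFQ.

```python
# Hypothetical weights: usual servings per day implied by each of the seven
# FFQ response categories. These values are illustrative assumptions.
SERVINGS_PER_DAY = {
    "rarely_or_never": 0.0,
    "1-3_per_month": 2 / 30,
    "1_per_week": 1 / 7,
    "2-4_per_week": 3 / 7,
    "5-6_per_week": 5.5 / 7,
    "1-3_per_day": 2.0,
    "4+_per_day": 4.0,
}

# Hypothetical mapping of FFQ items to nutrient based food categories.
ITEM_TO_CATEGORY = {
    "orange": "vitamin_c_rich_fruit",
    "grapefruit": "vitamin_c_rich_fruit",
    "white_bread": "refined_grains",
}


def category_servings(responses):
    """Sum frequency-weighted daily servings over the items in each category.

    responses maps an FFQ item name to the response category the
    participant selected for that item.
    """
    totals = {}
    for item, response in responses.items():
        category = ITEM_TO_CATEGORY[item]
        totals[category] = totals.get(category, 0.0) + SERVINGS_PER_DAY[response]
    return totals


example = category_servings(
    {"orange": "1_per_week", "grapefruit": "2-4_per_week", "white_bread": "1-3_per_day"}
)
```

In the study itself this calculation would run over all 145 items and 42 categories for each participant.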
We used two procedures to identify dietary patterns. Firstly, VARCLUS 25 identified and grouped the 42 food categories that women consumed with a similar level of frequency. This SAS procedure is similar to a factor analysis, with the added requirement that food categories are separated into non-overlapping groups. The VARCLUS technique does not require that foods be eaten at the same time of day, at the same meal, or in similar quantities. Thus, foods that appear in the same cluster were consumed with a similar daily frequency (that is, relatively frequent consumption versus relatively infrequent consumption). For example, women who reported relatively higher daily intakes of fish also reported relatively higher daily intakes of other lower fat foods including skinless poultry, whole grains, and low fat dairy. Using this method, 13 food groupings were identified, each containing multiple food categories (see appendix).
In the second step, Ward's clustering method 26 was used to separate women into non-overlapping groups based upon similarities in their frequency of consumption of the food group clusters. This method considered how women differed in their consumption of the 13 food groupings and used the pseudo t² statistic as the criterion for identifying the optimal number of clusters of women with distinctive food consumption patterns. The pseudo t² statistic is plotted against the number of potential clusters; when it changes little, adding further clusters is taken not to provide a better fit to the data.
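The second step can be illustrated with a small pure-Python sketch of Ward's agglomerative method. It is not the SAS procedure used in the study. The pseudo t² formula below (the increase in within-cluster sum of squares from a merge, divided by the pooled within-cluster variance of the two clusters being merged) is the commonly documented form, and treating it as exactly what the authors computed is an assumption. All function names are illustrative.

```python
def centroid(points):
    """Mean vector of a list of equal-length numeric tuples."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def within_ss(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(sum((p[d] - c[d]) ** 2 for d in range(len(c))) for p in points)


def ward_merge_cost(a, b):
    """Increase in total within-cluster sum of squares if a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * dist2


def pseudo_t2(a, b):
    """Pseudo t2 for merging a and b; None when it is undefined."""
    df = len(a) + len(b) - 2
    pooled = within_ss(a) + within_ss(b)
    if df <= 0 or pooled == 0:
        return None
    return ward_merge_cost(a, b) / (pooled / df)


def ward_clusters(points, n_clusters):
    """Merge the cheapest pair repeatedly until n_clusters remain.

    Returns the clusters and a history of
    (clusters remaining after the merge, pseudo t2 of the merged pair).
    """
    clusters = [[p] for p in points]
    history = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_merge_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        history.append((len(clusters) - 1, pseudo_t2(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, history


# Toy data: two obvious groups in two dimensions (not study data).
toy_points = [(0, 0), (0, 1), (10, 10), (10, 11)]
toy_clusters, toy_history = ward_clusters(toy_points, 2)
```

In practice each point would be a woman's vector of daily servings across the 13 food groupings, and the history values would be plotted against the number of clusters. The exhaustive pair search is cubic in the number of points, so it is suitable only for illustration.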
We evaluated the reproducibility of the dietary pattern clusters by examining their stability and classification ability using discriminant analysis. This analysis was performed on the five clusters using the 13 food groupings as discriminant variables and the 1828 subjects originally clustered as the sample. From the discriminant analysis, classification functions were developed and used to assign each subject into one of the five clusters. Subjects were then cross classified according to their original cluster classification and the one obtained from the discriminant analysis.
Nutrient intake analyses
We estimated mean nutrient intake levels among women in the clusters using three day dietary records as an independent assessment of nutrient intake. As previously published,22 independent estimates of nutrient intake in the Framingham cohort are derived from either 24 hour dietary recalls or three day dietary records. In our sample, 1802 women provided 24 hour dietary recalls (90% of the cohort) and 1265 women completed three day dietary records (63% of the cohort). Women who completed the dietary records were similar to those who did not complete them with respect to most heart disease risk factors. As estimates of nutrient intake derived from multiple days are considered more representative of usual intake than estimates from a single day,22 we present findings based on the three day dietary records. We reviewed the dietary data based on three day dietary records in comparison with data derived from the 24 hour recalls. The analysis using recall data did not change our interpretation of the differences in nutrient profiles between the clusters as presented in this report.
Dietary records (two weekdays and one weekend day) were completed by participants using published research protocols.22,27 Nutrient calculations were performed using the Minnesota Nutrition Data System (NDS) software (NDS version 2.6; Food Database 6A; Nutrient Database 23).28 Three day mean nutrient intakes were determined for each individual. Macronutrient intakes were expressed as percentage of total calories whereas all other nutrients were expressed in absolute amounts (g, mg, μg).
Cardiovascular disease risk factor analyses
Risk factors for cardiovascular disease are routinely measured at all Framingham exams. Framingham protocols for risk factor measurement are standardised and have been summarised by Cupples et al.29 The two year probability of developing coronary heart disease (coronary heart disease risk score) was estimated using a published Framingham algorithm.30 This multivariate model is gender specific and includes the following risk factors for determining primary coronary heart disease risk: age, fasting lipid levels (total cholesterol, high density lipoprotein-cholesterol, and triglycerides), systolic blood pressure, smoking status, presence of left ventricular hypertrophy, diagnosis of diabetes, alcohol consumption (ounces per week), use of antihypertensive medication, postmenopausal status, and body mass index. The coronary heart disease risk score was calculated only for women who were free of heart disease at Exam 3 (n=1722).
Analysis of variance was used to test for differences in food group consumption across clusters (table 1). Next, for each nutrient (table 2), we assigned ranks ranging from 1 to 1265 to the three day mean intake estimate for each of the 1265 women who completed the dietary records. Ranks were assigned so that an individual with a desirable nutrient intake level (for example, lower fat intake) received a lower rank, whereas an individual with an undesirable nutrient intake level (for example, lower fibre intake) received a higher rank. This facilitated a consistent interpretation linking lower risk with lower ranks and higher risk with higher ranks. For “risk nutrients” (total and saturated fat, cholesterol, alcohol, and sodium) and for energy, protein and monounsaturated fat, intakes were ranked from lowest (1) to highest (1265). For polyunsaturated fat and the “cardioprotective nutrients,” intakes were ranked in the reverse order, from highest (1) to lowest (1265). In these analyses, monounsaturated fat was not considered a protective nutrient as the majority of its consumption by Framingham Offspring/Spouse participants was derived from animal products, which also contribute saturated fat, rather than from plant sources.31 Age adjusted mean ranks were determined for each nutrient in each cluster, because age distributions differed between clusters.
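The direction-aware ranking described above can be sketched as follows. The function name and the tie-breaking rule (ties resolved by input order) are assumptions made for illustration; the text does not state how tied intakes were handled.

```python
def assign_ranks(intakes, lower_is_desirable=True):
    """Rank intakes so that rank 1 is always the most desirable level.

    For risk nutrients, lower intake is desirable, so ranks ascend with
    intake. For protective nutrients, pass lower_is_desirable=False so the
    highest intake receives rank 1. Ties are broken by input order
    (an assumption, not taken from the study).
    """
    order = sorted(
        range(len(intakes)),
        key=lambda i: intakes[i],
        reverse=not lower_is_desirable,
    )
    ranks = [0] * len(intakes)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


# Toy values, not study data.
saturated_fat_ranks = assign_ranks([30, 10, 20])                       # risk nutrient
fibre_ranks = assign_ranks([5, 25, 15], lower_is_desirable=False)      # protective nutrient
```

With this convention a lower rank always corresponds to lower risk, whichever direction the underlying intake runs.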
Ranks of overall dietary risk were computed using the mean of the ranks of the individual variables in the nutrient list. Firstly, we computed the mean rank across the 19 nutrients for each of the 1265 women with dietary records. For a given individual, this mean of the ranks over 19 nutrients represented their overall nutrient rank. The age adjusted least square mean of these overall nutrient ranks was then computed for each cluster to provide a measure of overall dietary risk.
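The composite measure can be sketched as a mean of ranks per woman followed by a mean per cluster. Note the main simplification: the cluster means below are plain, unadjusted means, whereas the analysis in this report used age adjusted least square means. Function and variable names are illustrative.

```python
from statistics import mean


def overall_nutrient_rank(ranks_by_nutrient):
    """Per-woman mean of her ranks across all nutrients.

    ranks_by_nutrient maps a nutrient name to a list of ranks, one per
    woman, with women in the same order in every list.
    """
    n_women = len(next(iter(ranks_by_nutrient.values())))
    return [
        mean(ranks[i] for ranks in ranks_by_nutrient.values())
        for i in range(n_women)
    ]


def cluster_mean(values, cluster_labels):
    """Unadjusted mean of values within each cluster (no age adjustment)."""
    groups = {}
    for value, label in zip(values, cluster_labels):
        groups.setdefault(label, []).append(value)
    return {label: mean(vals) for label, vals in groups.items()}


# Toy ranks for four women and two nutrients (not study data).
toy_ranks = {"total_fat": [1, 2, 3, 4], "fibre": [2, 1, 4, 3]}
toy_overall = overall_nutrient_rank(toy_ranks)
toy_cluster_risk = cluster_mean(toy_overall, ["A", "A", "B", "B"])
```

Reproducing the age adjustment would require fitting a model with age as a covariate, which is outside the scope of this sketch.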
For each of the cardiovascular disease risk factors considered (table 3), each woman who participated in the cluster analysis (n=1828) was assigned a rank ranging from 1 to 1828. Here too, ranks were assigned so that a woman with a desirable risk factor level (higher high density lipoprotein-cholesterol) received a lower rank and a woman with an undesirable risk factor level (raised low density lipoprotein-cholesterol) received a higher rank. All but three of the cardiovascular disease risk factors were ranked from lowest (1) to highest (1828). The exceptions were the protective factors (physical activity, high density lipoprotein-cholesterol, and oestrogen use), which were reverse ranked, from highest (1) to lowest (1828) to coincide with a consistent interpretation of desirability. Age adjusted mean ranks were determined for each risk factor in each cluster.
The coronary heart disease risk score was ranked in a similar fashion to provide a measure of overall heart disease risk. This score was ranked in ascending order for all women who were free of heart disease at Exam 3 (n=1722). Finally, the age adjusted least square mean of the coronary heart disease risk score ranks was computed for each cluster.
Identifying dietary clusters
Five groups of women were identified by the cluster analysis, representing distinct segments of the population with unique dietary patterns (fig 1). Of the 1942 women included in the cluster analysis, 1828 (94%) were assigned cluster membership. The remaining 114 women (6%) were excluded because of extreme food behaviours that deviated markedly from the food consumption patterns of the five clusters that emerged. Dietary patterns were characterised by differences in levels of consumption of 11 of the 13 food groupings (table 1). Each cluster was assigned a label based on the key distinguishing features of its dietary pattern: Heart Healthy, Light Eating, Wine and Moderate Eating, High Fat, and Empty Calorie.
Compared with other clusters, women in the Heart Healthy cluster consumed notably more daily servings of vegetables, fruits and low fat milk, other low fat foods (non-fat and low fat dairy, skinless poultry, fish, whole grains, etc), and legumes, soups and miscellaneous foods (including other vegetarian products and shellfish). They also ate fewer servings of diet beverages and firm vegetable fats and had relatively lower intakes of sweets and animal fats, desserts, and high fat dairy and snack foods. Among the dietary patterns observed in this cohort, the Heart Healthy pattern was most consistent with current population based dietary guidelines for health promotion.6
Women in the Light Eating cluster chose fewer servings of legumes, soups and miscellaneous foods, refined grains, soft margarines and oils, and sweets and animal fats. Those in the Wine and Moderate Eating cluster consumed more servings of wine and cholesterol rich foods (eggs and organ meats) and high fat dairy and (salty and fatty) snack foods. Contributing to their otherwise moderate eating pattern, their intake levels of desserts and sweetened beverages were lower than other groups of women.
The High Fat dietary pattern was distinguished by substantially higher consumption of fats of all types (animal fats, firm and soft vegetable fats, and oils). Women in this cluster consumed more servings of sweets, refined grains, and diet beverages and relatively more servings of desserts. Their dietary pattern was notable for lower intakes of fruits and low fat milk, other low fat foods, legumes, soups and miscellaneous foods, high fat dairy products, and snack foods. The Empty Calorie pattern was characterised by markedly higher intakes of desserts and sweetened beverages with fewer servings of vegetables, wine, and cholesterol rich foods. The Empty Calorie label for this pattern reflects its relative lack of nutrient density despite its relatively higher caloric content, a result of the predominant food choices made by these women.
Ranking of nutrient intake
An analysis of age adjusted cluster mean ranks of independently assessed nutrient intake was one approach to validating the dietary pattern methodology (table 2). Dietary patterns differed significantly from one another on the basis of nutrient ranks for 15 of the 19 nutrients considered (79%). As well, the patterns differed on the composite measure of overall dietary risk. These individual and composite differences in nutrient intake ranks confirmed the placement of the clusters along a continuum of dietary risk, ranging from the most desirable Heart Healthy pattern to the least desirable Empty Calorie pattern.
It is notable that the Heart Healthy pattern was most desirable (lowest ranks) in terms of protein, total, saturated and monounsaturated fat, carbohydrate, fibre, calcium, vitamins C, B6, and E, folate, and β carotene intakes. Not surprisingly, this cluster displayed the lowest level of overall dietary risk. At the other end of the spectrum, the Empty Calorie dietary pattern was least desirable (highest ranks) for intakes of these same nutrients. The only exception was for saturated fat intake. However, the mean rank for saturated fat in the Empty Calorie cluster was not significantly different from the mean rank in the High Fat cluster, which was the absolute highest. On the positive side, women in the Empty Calorie cluster had the most desirable (lowest) rank for alcohol intake. The Empty Calorie dietary pattern achieved the highest level of overall dietary risk, anchoring its position at the high risk end of the continuum.
The ranking analysis also confirmed distinguishing nutrient profiles of other dietary patterns. The Light Eating cluster displayed the lowest rank for energy intake, consistent with its lower caloric density and overall pattern of lower food and nutrient intake relative to other clusters. The Wine and Moderate Eating cluster had the highest rank for alcohol and a relatively high rank for total calories. Women in this cluster also had the least desirable (lowest intake) levels of carbohydrate and calcium in their diets. Yet their overall micronutrient profiles and total and saturated fat intake levels were relatively closer to the desirable end of the spectrum than most other clusters. Finally, the High Fat cluster was distinguished by the least desirable (highest) intake of saturated fat. Ranks for other dietary fats, protein, and most micronutrients were relatively closer to the less desirable range.
Ranking of coronary heart disease risk factors
A second level of validation of the dietary pattern analysis was supported by the differences in cardiovascular disease risk factors observed between the five clusters (table 3). As with the nutrient ranks, the analysis of age adjusted cluster mean ranks of risk factors and the estimated coronary heart disease risk score demonstrated a continuum of risk among the five groups of women. Not only did the clusters differ by age, but they also differed in terms of the predicted likelihood of developing coronary heart disease and in levels of 13 of the 15 risk factors considered (87%).
Heart Healthy women were again positioned at the most desirable end of the risk spectrum and Empty Calorie women maintained the highest risk status. Women who ate a Heart Healthy diet had the most desirable profiles for smoking, physical activity, and blood concentrations of total cholesterol and triglycerides. These women were the oldest, yet they had the lowest probability of developing coronary heart disease in the next two years. In contrast, women who exhibited the Empty Calorie dietary pattern had the least desirable profiles for BMI, overweight and obesity, smoking, physical activity, and dyslipidaemia in general. They were also relatively more likely to have undesirable ranks for blood pressure measures, hypertension prevalence, and low density lipoprotein-cholesterol levels. Though women in the Empty Calorie cluster were the youngest, they had a relatively higher likelihood of developing coronary heart disease in the next two years. Their risk was not substantially different from that of women in the cluster with the absolute highest predicted risk.
Relative to other groups, those in the Light Eating cluster had relatively lower to moderate ranks for most risk factors and for the coronary heart disease risk score. Women in the Wine and Moderate Eating cluster had the least desirable blood pressure levels and a relatively higher prevalence of hypertension. This clinical profile, along with their higher level of alcohol consumption, was a major contributor to their highest ranking on the coronary heart disease risk score. Levels of overweight and dyslipidaemia were more desirable among women in this cluster relative to other groups of women. While women in the High Fat cluster had less desirable levels of low density lipoprotein-cholesterol relative to other groups, they had more desirable profiles for BMI, obesity, hypertension, and blood pressure measures.
Reliability of the cluster analysis
Discriminant analysis correctly classified 80% of the sample. When each of the original clusters was considered separately, the sensitivity of the discriminant analysis ranged from 75% to 100% over the clusters as follows: 75% for the High Fat cluster, 78% for Heart Healthy, 80% for Light Eating, 90% for Empty Calorie, and 100% for Wine and Moderate Eating. Similarly, the specificity ranged from 75% to 100% over the five clusters as follows: 75% for the High Fat cluster, 89% for Light Eating, 90% for Heart Healthy, 98% for Empty Calorie, and 100% for Wine and Moderate Eating.
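The per-cluster sensitivity and specificity reported above follow from the cross classification of original cluster membership against the discriminant assignment, treating each cluster in turn against all others. A minimal sketch, using a small hypothetical two-cluster table rather than the study's counts:

```python
def sensitivity_specificity(table, cluster):
    """One-versus-rest sensitivity and specificity for a single cluster.

    table[original][assigned] is the number of subjects whose original
    cluster was `original` and whose discriminant assignment was `assigned`.
    """
    total = sum(sum(row.values()) for row in table.values())
    true_pos = table[cluster].get(cluster, 0)
    false_neg = sum(table[cluster].values()) - true_pos
    false_pos = sum(
        row.get(cluster, 0) for original, row in table.items() if original != cluster
    )
    true_neg = total - true_pos - false_neg - false_pos
    return true_pos / (true_pos + false_neg), true_neg / (true_neg + false_pos)


def overall_accuracy(table):
    """Share of subjects assigned to their original cluster."""
    total = sum(sum(row.values()) for row in table.values())
    correct = sum(row.get(original, 0) for original, row in table.items())
    return correct / total


# Hypothetical counts for illustration only.
toy_table = {
    "A": {"A": 8, "B": 2},
    "B": {"A": 1, "B": 9},
}
toy_sens, toy_spec = sensitivity_specificity(toy_table, "A")
toy_accuracy = overall_accuracy(toy_table)
```

Here sensitivity is the proportion of a cluster's original members that the discriminant functions returned to that cluster, and specificity is the proportion of all other subjects that were not assigned to it.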
Using cluster analysis with food frequency data from a large population based cohort of women, we characterised unique patterns of dietary exposure within five distinct segments of the population. Groups of women whose dietary patterns were distinguished by cluster analysis differed from one another in terms of food group consumption, desirability of individual nutrient intakes, overall dietary risk, desirability of cardiovascular disease risk factors, and predicted risk for developing heart disease. In fact, the dietary patterns ranged along a continuum, from a low fat, nutrient dense “Heart Healthy” eating pattern to a higher fat, less nutritious “Empty Calorie” pattern. The continuum corresponded with both overall dietary risk and the predicted risk of developing coronary heart disease. These data support the internal validity of a dietary pattern analysis for characterising dietary exposures in epidemiological research. While our analyses were focused on heart disease risk factors and predicted coronary heart disease risk, we believe that the utility of the dietary pattern approach would be comparable for exploring other chronic disease outcomes, including hypertension and cancer.12,18,34 This is an area for further research.
Multiple unique eating patterns were identified in a cohort of women.
Dietary patterns display a range of nutritional and cardiovascular risk.
Cluster analysis is a valid approach to dietary exposure measurement, enabling the exploration of relations between eating patterns, health, and disease.
Dietary patterns are informative for the development and targeting of effective nutrition intervention messages.
Analytical approaches that define the food consumption patterns of people within a population have been used, albeit relatively rarely, in a variety of research settings since as early as 1981. Their popularity has increased in the past decade as investigators have realised the utility of grouping individual food items into interpretable eating patterns that lend themselves to epidemiological investigations.12,35 Aggregating foods into meaningful food groupings or nutrient based categories (for example, fruits high in antioxidants) and then statistically clustering people on the basis of dietary behaviours is now considered to be an important feature of studies of diet and disease.36 Recently, Pryer et al 12 used cluster analysis and recorded food intake data to identify four diet groups among men and four among women in Great Britain. These authors described important differences in nutrient profiles, sociodemographic characteristics, and behavioural features that are relevant for the development and targeting of health promotion strategies and public health nutrition policy.
While the validity of FFQs has been established for relative ranking of nutrient intake,7,21,37 dietary pattern analyses using FFQ data are relatively recent. FFQs are attractive for this use because they effectively measure longer term usual diet and are cost efficient tools to use with large cohorts. Hu et al 38 used factor analysis and data from a FFQ to identify two predominant eating patterns, a prudent pattern and a Western pattern, within a small cohort of 127 men. The prudent pattern was similar to the Heart Healthy pattern observed among Framingham women, and the Western pattern was similar to the High Fat and Empty Calorie patterns described in our cohort. Recent prospective analyses involving 44 875 men followed up over eight years by Hu et al 19 demonstrated that the prudent and Western dietary patterns predict risk of coronary heart disease, independent of other lifestyle variables including smoking and body mass index.
These Framingham analyses lend support to the internal validity of the dietary pattern approach and provide further justification for its application in epidemiological research. The data of Hu et al 38 are important but are somewhat limited in scope because the validity of the dietary pattern analysis was established only at the level of food groupings using factor analysis. Our analyses identify the unique dietary patterns of individuals in the population and demonstrate the validity of this technique in comparison with two independent criteria.
Approaches that identify dietary patterns enable the examination of broader aspects of the diet, rather than single nutrient or food exposures, in relation to health outcomes.3,39,40 In light of the complexities of the biological mechanisms known to cause disease, it seems important to understand relations within the context of the overall diet and to explore analytical techniques that discern patterns of intake that confer greater or lesser risk. This perspective is supported by recent findings in the literature where dietary pattern interventions, such as the DASH trial,41 contributed to risk reduction in clinical settings.
Our five dietary clusters included eating patterns similar to the prudent and Western diets described by Hu et al 38 at the extremes of the risk continuum. Yet, differences between dietary patterns were not limited to a comparison of the two extremes within our sample. We were also able to show important differences in nutrient and risk factor profiles between patterns that fell in the middle of the continuum. One notable example is the Wine and Moderate Eating pattern, which was associated with higher blood pressure levels and higher predicted CHD risk.
We recognise that there is a level of non-specificity that accompanies dietary pattern analyses. Dietary patterns are inherently complex. Without more detailed analyses, they do not enable the specific identification of the particular nutrient or dietary components within the pattern that may be responsible for the observed differences in disease risk between population subgroups.38 While the dietary pattern approach may be complex, the dietary patterns and clusters of people are in fact unique and non-overlapping. Each person belongs to only one cluster and the clusters are distinguished from each other by levels of food group consumption.
Thus, while we do not know which specific individual component(s) of the dietary patterns relate to better health outcomes or the relative importance of the components in conferring risk, we have begun to identify what those individual components are within the context of the overall eating pattern. We have also established that certain dietary patterns are associated with healthier profiles while others are associated with disease risk. Furthermore, we have identified personal characteristics (gender and age) and other behavioural characteristics (alcohol consumption, smoking habits, etc) that are closely linked to dietary behaviour. This information is likely to be beneficial for developing interventions based upon dietary pattern messages and may facilitate the targeting of intervention messages to relevant subgroups of the population.
With established validity, the dietary pattern approach offers an innovative analytical strategy for measuring dietary exposures in epidemiological research settings and in other populations. This approach may prove well suited for addressing nutrition research questions and offers an alternative to traditional methods that consider isolated dietary components as exposures of interest. Dietary pattern analyses may be especially useful when traditional analyses have failed to identify associations between individual nutrients and the outcome of interest. The dietary pattern may also be used as a covariate in traditional single nutrient analyses to assess the effect of the exposure nutrient independent of the dietary pattern.38 We demonstrated the internal validity and stability of a dietary pattern analysis and documented how the dietary behaviours of distinct groups of people relate directly to nutritional risk and heart disease risk. The availability of this analytical approach will probably increase the ability of investigators to explore relations between nutrition, health, and disease.
Funding: this research was supported, in part, by the National Heart, Lung and Blood Institute (NHLBI) grants and contracts R01-HL-60700, by the Department of Health and Human Services, Public Health Service, National Research Service Award T32 AG00220, and by the American Dietetic Association Foundation through the Kraft Foods Fellowship program. The Framingham Study was supported by NIH/NHLBI contract N01-HC-38038.
Conflicts of interest: none.