

Validity of rapid estimates of household wealth and income for health surveys in rural Africa


STUDY OBJECTIVE To test the validity of proxy measures of household wealth and income that can be readily implemented in health surveys in rural Africa.

DESIGN Data are drawn from four different integrated household surveys. The assumptions underlying the choice of wealth proxy are described, and correlations with the true value are assessed in two different settings. The expenditure proxy is developed and then tested for replicability in two independent datasets representing the same population.

SETTING Rural areas of Mali, Malawi, and Côte d'Ivoire (two national surveys).

PARTICIPANTS Random sample of rural households in each setting (n=275, 707, 910, and 856, respectively).

MAIN RESULTS In both Mali and Malawi, the wealth proxy correlated highly (r⩾0.74) with the more complex monetary value method. For rural areas of Côte d'Ivoire, it was possible to generate a list of just 10 expenditure items, the values of which when summed correlated highly with expenditures on all items combined (r=0.74, development dataset; r=0.72, validation dataset). Total household expenditure is an accepted alternative to household income in developing country settings.

CONCLUSIONS It is feasible to approximate both household wealth and expenditures in rural African settings without dramatically lengthening questionnaires that have a primary focus on health outcomes.

  • socioeconomic status
  • indicators
  • Africa


Fifteen years ago in a classic article, Mosley and Chen proposed wedding social science and medical approaches to the study of child survival in a framework that would include both proximate and more distal socioeconomic determinants.1 Since this time, the epidemiological literature has witnessed an explosion of interest in questions relating to the socioeconomic patterning of health and disease.2 Recently, there have been suggestions that a broad paradigm shift is required in epidemiology to accommodate a new focus on systems that generate patterns of disease.3-5 It also seems likely that—as interdisciplinary research into the socioeconomic origins of health and disease advances—new data collection instruments will be needed that are able to satisfy disciplinary concerns on both sides of the epidemiology/social science divide.

Socioeconomic status has two broad, interlinked components: class and position.6 Socioeconomic class refers to social groups that arise from interdependent economic, social and legal relationships among a group of people. Socioeconomic position is an aggregate concept, making reference to holdings of assets, the income that these assets yield, and the consumption that such income permits.

Socioeconomic position is generally conceptualised as referring to the diverse components of economic and social well being that differentiate persons of different social classes, including both resource-based and prestige-based measures. Wealth and income are two important dimensions of socioeconomic position. In developed countries, there is a wealth of data both on socioeconomic class and on aspects of socioeconomic position.6 By contrast, in developing countries, especially in rural areas, such data are far less readily available, and measurement of these determinants of health outcomes is challenging. Distinctions based on social class may not be especially meaningful when the vast majority of respondents report themselves to be self employed farmers. On the other hand, measurement of socioeconomic position is complicated by the fact that few households own the kinds of major consumer durables that epidemiologists are most comfortable enumerating, such as radios, cars, refrigerators, and the like. Self reported measures of total income are unlikely to be reliable, because, quite apart from an understandable reluctance to reveal such information to a stranger, the myriad transactions undertaken by such self employed people make it unlikely that respondents know this datum.7 Faced with this difficulty, economists working in developing countries administer lengthy interviews, often running to several hours, collecting detailed information on literally hundreds of purchases and sales. Imposing an accounting framework on these data, together with imputations for the value of goods for which no price data are available, makes it possible, with considerable effort, to arrive at estimates of income, or expenditures, that can be seen as measures of socioeconomic position or the resources available to the household.

Given the extensive evidence linking socioeconomic position to health outcomes, there are good reasons for collecting such information. However, existing approaches in epidemiology are unsatisfactory or ad hoc. Using MEDLINE and the terms socioeconomic (in title) + Africa to search articles published since 1990, we found 19 studies that examined associations between health outcomes and socioeconomic status using measures of socioeconomic position. In addition to usually controlling for educational attainments, quality of housing and water supply—factors with direct links to health status—six studies used some measure of self reported income,8-13 and eight included one or two selected assets, such as radio, refrigerator or shoes, or access to electricity.9 14-20 A few others used a wider set of assets, reported individually 21 22 or aggregated via simple summations,23 weighted summations using subjectively determined weights,24 or principal components analysis.25 26 As already noted, self reported incomes are unlikely to be accurate. However, in the asset-based analyses it is not obvious why only one or two selected consumer durables are chosen, nor does this literature make it clear why a wider set of assets should be described individually, summed, or subjected to reduction by principal components analysis. This does not imply that epidemiologists should necessarily adopt the approaches taken by other disciplines. Many health scientists are either unfamiliar with the methods used by other social scientists, wary of the lengthy questionnaires required to elicit the necessary information, or sceptical of the validity of the resulting data.

In this paper, we illustrate rather simple methods for measuring aspects of household socioeconomic position in a variety of rural African contexts. These methods are more satisfactory than the ad hoc inclusion of selected household characteristics, but are considerably simpler to implement than the more complex approaches customarily encountered in economic research. Specifically, we explore two different methods. The first is an asset-based approach intended as a proxy for wealth, based on a simple weighted sum of the numbers of different items owned by the household. The second exploits the common practice of using total household expenditures as a proxy for the income generated by resources available to the household, extending this approach to show how one can identify a small number of expenditure items that, when summed, mirror expenditures on all items combined. By simplifying data collection requirements, both methods permit meaningful economic parameters to be estimated without overloading questionnaires that have a primary focus on health.



Four different datasets are used in this analysis. The first two—from rural areas of Mali and Malawi—are used to derive and test a simple measure of household wealth that does not require the assessment of the monetary value of the items possessed. The third and fourth—both representative national surveys of the rural areas of Côte d'Ivoire—are used to derive and test (respectively) a proxy for total household consumption that is estimated on the basis of responses to just 10 simple questions.


Mali

An integrated household survey was undertaken by the International Food Policy Research Institute (IFPRI) in the Zone Lacustre region in August–September 1997. Ten villages were purposively selected to cover all types of agricultural livelihood systems in the region. Within each village, systematic random sampling was used to select a 1 in 3 sample, yielding a sample of 275 households. The primary aim of the study was to test the comparative properties of different methods of identifying food insecurity. Among other questions, men in the household were asked about their ownership of 18 different types of agricultural implements and 18 consumer durables such as bicycles, gas lamps, tables and chairs. In addition, women in the household were asked about their ownership of 16 different kinds of kitchen equipment (pots, cups, calabashes, etc) and 14 types of household durables similar to those asked of the men. The full set of agricultural implements and consumer durables included in the questionnaire was identified before the survey using free listing techniques as described by Hudelson,27 in addition to reviewing previous questionnaires used in rural Mali and conducting spot observations around the study area. Questions were asked about the number of units of each item owned, and their value if they were to be sold in their current condition. The purpose of the study was fully explained to each household before the beginning of the interview, and verbal consent to participate was obtained from the household head.


Malawi

A two round survey of 700 rural households was conducted in the central region of Malawi in 1998 by IFPRI, in collaboration with Bunda College of Agriculture. The main objective of the study was to assess the income and food security impact of participation in one of two different rural development projects operating in the region. Consequently, the sample design was guided by the need to select an adequate number of respondents from each of the two groups of project participants, as well as from a control group. Approximately 200 households were selected from the list of participants in each project using a two stage procedure. Households belonging to either project were organised into farmers' clubs of variable size. The farmers' clubs were therefore used as the primary sampling units, and a number of them were selected in the first stage using simple random sampling. Because the clubs varied in size, a number of households proportional to the size of each selected club was drawn from it in the second stage. This procedure yields an equal probability of selection for each of the beneficiary households in each project domain. Because no appropriate sampling frame was available for the control group, the remaining 300 households were sampled from farmers not belonging to either of the two projects, using a variant of the EPI cluster sampling method.28 Detailed data on 22 individual assets and nine types of livestock, including the number of units owned, their monetary value and intrahousehold control, were collected from both the male and female heads of the household. The set of assets and livestock was identified as described above for the Mali case study. The purpose of the study was fully explained to each household before the beginning of the interview, and verbal consent to participate was obtained from the household head.
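The equal-probability property of the two stage design can be checked with a short derivation. The symbols here are illustrative assumptions, not notation from the survey: suppose a project has M farmers' clubs, m of which are drawn by simple random sampling, and a selected club c with N_c member households contributes n_c = f N_c households, where the sampling fraction f is the same for every club.

$$
\Pr(\text{household } i \text{ in club } c \text{ is selected})
  = \frac{m}{M} \times \frac{n_c}{N_c}
  = \frac{m}{M} \times \frac{f\,N_c}{N_c}
  = \frac{m\,f}{M}
$$

Because N_c cancels, the probability does not depend on which club a household belongs to. In practice n_c has to be rounded to a whole number, so the equality holds only approximately.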

Côte d'Ivoire

A nationally representative integrated household survey—the Côte d'Ivoire Living Standards Survey (CILSS)—was conducted each year from 1985 to 1988 by the Ivorian Direction Nationale de la Statistique in collaboration with the Living Standards Measurement initiative of the World Bank. The purpose of the surveys was to monitor changes in living standards and to “contribute to the design of development policies by providing a stronger empirical foundation for policy dialogue”.29 This analysis uses data from the second (1986) and fourth (1988) rounds of the survey. Both surveys were two stage random samples of 16 households in each of 100 primary sampling units, or clusters (43 urban and 57 rural in 1986, 45 urban and 55 rural in 1988), distributed between five large geographical strata. Only the rural segments are used in this analysis. The two samples are entirely independent in that there was no overlap between the clusters selected in 1986 and those selected in 1988.

The questionnaires used in 1986 and 1988 were virtually identical. A household roster was used to obtain basic information on all household members, and a comprehensive household accounts approach was used to estimate household expenditures.30 Specifically, household members were asked to recall amounts purchased as well as consumed from home production of 34 different food items (past two weeks), “daily” and “annual” expenditures on 39 different non-food items, rent (actual or imputed), utility bills, expenditure on education, the use value of durable goods, remittances paid out, and wage income in kind. Many similar questionnaires from a variety of different countries can be downloaded without special permission from the Living Standards Measurement Survey Web site at:


The asset-based approach

A household asset score was derived by assigning to each asset g in the list a weight w_g equal to the reciprocal of the proportion of study households that owned one or more units of that asset, multiplying that weight by the number of units of asset g owned by the household (f_gj for household j), and summing the product over all assets. Thus, for household j,

$$A_j = \sum_{g} w_g \, f_{gj}$$

where A_j is the asset score of household j.

The total value of household assets was calculated by summing, over all assets owned, the reported current values of those assets (V_g). The asset score itself rests on the assumption that households with greater resources will purchase and own a greater number of consumer durables.
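A minimal Python sketch of the asset score and of the summed reported value it is compared with. The data structures are hypothetical: each household is a dict mapping an asset name to the number of units owned (the counts that the score multiplies by w_g), and reported values are a dict mapping asset name to current value. Assets that no sampled household owns are simply given no weight.

```python
from typing import Dict, List

Counts = Dict[str, int]  # hypothetical: asset name -> units owned by one household


def asset_weights(households: List[Counts]) -> Dict[str, float]:
    """w_g = 1 / (proportion of households owning at least one unit of asset g)."""
    n = len(households)
    assets = {g for hh in households for g in hh}
    weights = {}
    for g in assets:
        owners = sum(1 for hh in households if hh.get(g, 0) > 0)
        if owners:  # an asset nobody owns gets no weight (avoids division by zero)
            weights[g] = n / owners
    return weights


def asset_score(household: Counts, weights: Dict[str, float]) -> float:
    """A_j: sum over assets of w_g * f_gj for a single household j."""
    return sum(weights.get(g, 0.0) * units for g, units in household.items())


def total_asset_value(reported_values: Dict[str, float]) -> float:
    """Total value: sum of the reported current values V_g of the assets owned."""
    return sum(reported_values.values())


if __name__ == "__main__":
    sample = [
        {"bicycle": 1, "pot": 3},
        {"pot": 2},
        {"pot": 4, "radio": 1, "bicycle": 2},
    ]
    w = asset_weights(sample)  # pot: 1.0, bicycle: 1.5, radio: 3.0
    print([asset_score(hh, w) for hh in sample])  # prints [4.5, 2.0, 10.0]
```

Items owned by few households receive large weights, which is how the score encodes the assumption that scarcer items tend to be the more valuable ones.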

Several comments regarding this asset score should be noted. Firstly, it deliberately omits housing quality. Housing is both a direct correlate of health status and a measure of wealth. The former consideration suggests that some measure of housing be included during data collection and analysis. However, in rural localities of developing countries, housing markets are almost non-existent. Most dwellings are constructed using household labour and a mix of purchased and gathered materials (for example, metal sheeting and mud, respectively). Consequently, it is rarely possible to attach a monetary value to housing stock. For these reasons, it makes sense to collect information on housing quality but to include it separately during data analysis. Secondly, the score also omits the value of land. Including land in this measure would require that it be valued, and doing so in rural areas of developing countries is fraught with difficulty. In many contexts, such as that of the Mali study, there are simply no purchases or sales of land, making valuation impossible. Even where such transactions take place, as in the Malawi study, they are rare, and because land quality is highly heterogeneous, it is not clear that these few purchases or sales can be used to value the land owned by all households. It should also be noted that the quantity of land owned may not always be a good measure of wealth. The income that a rural household can generate depends not only on the quantity of land it owns, but also on whether the household can rent in additional land, whether the land is irrigated, whether it is flat, slightly or steeply sloped, and the type of soil.

Thirdly, the choice of weighting system for the asset score was based on the assumption that households would be progressively less likely to own a particular item the higher its monetary value. The ability of the household asset score to mirror household asset value was tested by transforming both variables to a log scale to remove the asymmetries in the distributions and then calculating a Pearson correlation coefficient to assess the strength of the association between the two. High values of the correlation coefficient indicate that households are similarly classified by both measures.
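The validation step (log transformation of both variables followed by a Pearson correlation) can be sketched in plain Python as below. One assumption is flagged in the code: households with a zero score or zero value are dropped, because log(0) is undefined and the text does not say how such cases were handled.

```python
import math
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary Pearson product-moment correlation."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def log_scale_r(score: Sequence[float], value: Sequence[float]) -> Tuple[float, int]:
    """Correlate log(asset score) with log(total asset value).

    Assumption: households with a non-positive score or value are dropped.
    Returns the correlation and the number of households actually used.
    """
    pairs = [(math.log(s), math.log(v)) for s, v in zip(score, value) if s > 0 and v > 0]
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    return pearson_r(xs, ys), len(pairs)
```

Because r is computed on logs, a value that rises as any power of the score gives r=1; the log scale compares households multiplicatively rather than being dominated by the few richest.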

The expenditure approach

This method assumes that higher household expenditure indicates higher socioeconomic position (see Deaton (1997)7 for further discussion and evidence). The first step is to calculate total household expenditure by summing, for each household, the annualised values of (1) food expenses, (2) home consumption of farm products, (3) the value of output of non-farm enterprises consumed domestically, (4) rent (actual or imputed), utility bills, expenditures on education, daily and yearly non-food purchases, and the use value of household durable goods, (5) remittances paid out, and (6) wages in kind. Details on the derivation of this variable for the Côte d'Ivoire Living Standards Survey are given by Oh and Venkataraman,31 and a more general discussion of the issues involved in constructing a summary indicator of consumption is given by Hentschel and Lanjouw.32
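The summation step can be sketched as follows, assuming each component has already been valued in local currency over its recall period. The recall-period labels and annualisation factors are hypothetical illustrations; the actual CILSS derivation is the one documented by Oh and Venkataraman.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical annualisation factors (recall periods per year).
PERIODS_PER_YEAR = {"two_weeks": 365 / 14, "month": 12.0, "year": 1.0}


@dataclass
class Component:
    name: str      # e.g. "purchased food", "rent (actual or imputed)"
    amount: float  # value reported for one recall period
    recall: str    # key into PERIODS_PER_YEAR


def total_expenditure(components: Iterable[Component]) -> float:
    """Sum the annualised value of every expenditure component for one household."""
    return sum(c.amount * PERIODS_PER_YEAR[c.recall] for c in components)
```

For example, food purchases of 1,400 over a two week recall annualise to 1,400 / 14 × 365 = 36,500, so adding annual rent of 12,000 gives a total of about 48,500.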

Next, to identify a reduced list of consumption items that, taken together, would closely mirror total household expenditure, we first eliminated those components of total household consumption expenditure for which large numbers of households reported zero consumption over the recall period. The rationale for doing this was that the proxy measure of total expenditure—just like the true measure—had to be capable of distinguishing fine gradients in welfare even among the poorest subset of households. Having eliminated a number of components, we then assessed the strength of associations between total household consumption expenditure and expenditures on each of the remaining components, using the Pearson correlation coefficient with both variables expressed on a log scale.
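The screening step can be sketched as below. The cut-off for "large numbers" of zero reports (here 5% of households) is a hypothetical choice, since the text gives no threshold; each component and the total are lists with one annualised amount per household, in the same household order.

```python
import math
from typing import Dict, List


def pearson_r(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for a constant variable
    return sxy / math.sqrt(sxx * syy)


def screen_components(
    components: Dict[str, List[float]],
    total: List[float],
    max_zero_share: float = 0.05,  # hypothetical cut-off for "large numbers" of zeros
) -> Dict[str, float]:
    """Drop components with too many zero reports; correlate the rest with the total.

    Correlations are on the log scale, over households positive on both variables.
    """
    n = len(total)
    kept = {}
    for name, amounts in components.items():
        zero_share = sum(1 for a in amounts if a <= 0) / n
        if zero_share > max_zero_share:
            continue  # too many zeros to rank the poorest households finely
        pairs = [(math.log(a), math.log(t)) for a, t in zip(amounts, total) if a > 0 and t > 0]
        kept[name] = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])
    return kept
```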

Finally, the max_r procedure of Mark et al33 was used to select 10 individual items of expenditure that, when summed, would best preserve the relation between households ranked on their true total expenditures. The algorithm, which is described in detail by the authors, maximises the correlation r between the proxy measure (the sum of the 10 selected expenditure items) and the true measure, which is simply the sum of all expenditure items considered in the estimation. Maximising r minimises the attenuation of risk estimates when this exposure is subsequently related to disease outcomes in a logistic regression framework.33 Other strategies for selecting a reduced set of items that best predict the true measure (such as stepwise selection procedures) do not set out to maximise r, and the authors show that they perform less well on this criterion. The max_r algorithm directly searches for the subset of k items that correlates most closely with the true sum measure.
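The criterion that max_r optimises can be illustrated with an exhaustive search. This is not Mark et al's algorithm, whose details are in their paper; it simply enumerates every subset of k items and keeps the one whose summed expenditure correlates best with the total over all items. Two assumptions are flagged: correlations are computed on log(1 + x) so that households spending nothing on the selected items remain defined, and full enumeration is only practical for short item lists (choosing 10 of 30 items already means about 30 million subsets).

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson_r(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for a constant variable; never selected below
    return sxy / math.sqrt(sxx * syy)


def best_k_items(items: Dict[str, List[float]], k: int) -> Tuple[Tuple[str, ...], float]:
    """Exhaustively find the k items whose summed spending best tracks total spending.

    `items` maps item name -> per-household annualised expenditure (same household
    order for every item). The true measure is the sum over all items.
    """
    names = sorted(items)
    n_households = len(items[names[0]])
    total = [sum(items[g][j] for g in names) for j in range(n_households)]
    log_total = [math.log1p(t) for t in total]  # assumption: log(1 + x) scale

    best_subset, best_r = (), -math.inf
    for subset in combinations(names, k):
        proxy = [math.log1p(sum(items[g][j] for g in subset)) for j in range(n_households)]
        r = pearson_r(proxy, log_total)
        if r > best_r:  # comparisons with NaN are False, so constant proxies are skipped
            best_subset, best_r = subset, r
    return best_subset, best_r
```

A heuristic search is needed for realistic item lists; the sketch only makes explicit what is being maximised.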


key points
  • Analyses of associations between socioeconomic status and health in Africa are marred by poor conceptualisation and measurement of socioeconomic position.

  • Wealth can be assessed using a comprehensive listing of household assets, with total values adequately approximated by weighted frequencies.

  • Expenditures, a proxy for income, can be assessed by developing a shortlist of key expenditure items, exploiting existing survey data.


In the Mali survey, the household asset score was highly correlated with the total value of household assets when both variables were expressed on a log scale (fig 1; r=0.74, n=275, p<0.001). A slightly lower correlation was observed when livestock were included along with the household items (r=0.69, n=275, p<0.001). In the Malawi survey, the household asset score was very highly correlated with the total value of household assets with both variables expressed on a log scale (fig 2; r=0.83, n=707, p<0.001). A markedly lower correlation (r=0.53) was observed when livestock were included along with the household items. This proved to be attributable to non-linearities in the association that were no longer apparent when both variables were expressed on a double-log scale (r=0.87, n=707, p<0.001).

Figure 1

Association between household asset index and the total monetary value of the same assets (275 rural households, Northern Mali, 1998).

Figure 2

Association between household asset index and the total monetary value of the same assets (707 rural households, Central Malawi, 1998).


A total of 911 rural households were available for analysis in the 1986 Côte d'Ivoire survey. All but one of these households reported non-zero expenditures on purchased food (n=910), and all reported non-zero “other” expenditures (rent or imputed rent, utility bills, expenditures on education, daily and yearly non-food purchases, use value of household durable goods). Within the “other” category, all households reported non-zero “annual” expenditures, which covered 30 different non-food goods and services that are typically purchased only occasionally. Nearly all households (n=901) reported non-zero expenditures on “daily” items (nine items: street foods, soft drinks and tobacco, soap and cleaning products, and fuel for heating, cooking and vehicles). With all variables expressed on a log scale, expenditure on purchased foods was correlated with total household consumption expenditure at the r=0.76 level, the “annual” expenditures at the r=0.79 level, and the “daily” expenses less highly (r=0.52). Other types of expenditure had large numbers of households reporting zero expenditures.

The subset of 10 items identified by Mark et al's max_r method as most closely mirroring total expenditures on items in the “annual” category is shown in table 1. The sum of these 10 individual expenditure items was correlated with total household consumption expenditure at the r=0.74 level (both variables expressed on a log scale; fig 3). This association was replicated in the second dataset (CILSS 1988), where the reduced measure was correlated with total household consumption expenditure at the r=0.72 level (again, both variables expressed on a log scale; n=856).

Table 1

Subset of expenditure items mirroring total household expenditure in the Côte d'Ivoire Living Standards Survey, 1986

Figure 3

Association between total annualised household consumption expenditure and the 10 item proxy (911 rural households, Côte d'Ivoire, 1986).


Household wealth and income are important distal determinants of health that are difficult to measure in societies where wage income is negligible and savings are not generally held in the form of money. Epidemiologists have frequently sought to get around these difficulties by working with broad indicators of socioeconomic status such as the construction quality of the home, or ownership of individual, high value, durable goods. Such approaches are unsatisfactory in that: (1) they confuse genuine distal social determinants of health with more proximate ones such as the quality of the household environment, (2) they confuse the concepts of income and wealth, which social scientists understand to influence health through different pathways, and to be influenced by different aspects of national and sub-national policymaking, (3) the choice of indicators is atheoretical, as well as being unstandardised, with the result that effects cannot be compared across populations, and (4) the approach lends itself to adjusting for numerous unlinked indicators in a multiple regression framework, which is not equivalent to classifying households on a continuum capturing the whole range of possible conditions. On the other hand, the standard economic household survey approaches to estimating household welfare and asset accumulation in these circumstances entail data collection procedures that many epidemiologists find excessively burdensome for respondents and coders alike, and potentially subject to a large number of undocumented biases.

This article illustrates two simple methods that can be used to generate proxy measures of household wealth and income in a rural African context. For the wealth indicator, only a few days of preparatory activities are required to generate the appropriate list of assets; for the income proxy, on the other hand, investigators would need to have access to a recent integrated household survey dataset for the area in which they intend to work. Although this seems like a demanding requirement, there are probably now few countries in the world where such a survey has not been conducted at some time over the past decade, and many of the datasets are either freely available on the world wide web (for instance, on the Living Standards Measurement Survey pages of the World Bank's site), or on request from national statistical offices, international research organisations, or universities. The analysis that has been outlined takes no more than one day of a mid-level analyst's time, time that is more than compensated for by enormous savings in interviewing time in the field.

The proposed household asset score classifies households according to the weighted sum of the assets at their disposal. The list of assets is of course context specific, but should be as comprehensive as possible. We found that the rapid scoring method using derived weights correlated highly (r⩾0.74, both variables expressed on a log scale) with the more complex monetary value method in both sites where we were able to investigate this association (Mali and Malawi). In both locations, correlations were higher when livestock was not included, because only a few households managed livestock in addition to their crop growing activities. The utility of the rapid scoring method will depend on the difficulty that household members encounter in valuing their assets, the validity of their responses, and the time saved by omitting this question. It should be noted that, as an indicator of wealth, the measure as presented is incomplete because, for reasons outlined in the Methods section above, it does not consider what are often a rural household's most valuable assets: the family home and land holdings. Financial capital and human and social resources are also ignored, and the measure will be even more limited where livestock holdings are not included. Nevertheless, the score does give a quantitative indication of the overall value of a household's assets relative to other similar households. This value should be comparable across populations because, even when actual values are not ascertained, they can be predicted once the relation between score and value has been determined in a validation subsample. The measure stands out for its simplicity of use, and differs from more familiar indicators in that it is based on a rather comprehensive list of household assets, differentially weighted according to a systematic algorithm.
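Predicting actual asset values from the score in a validation subsample could be sketched as a log-log regression of reported value on asset score, fitted where both are available and then applied where only the score was collected. The functional form (a straight line on the log-log scale, fitted by ordinary least squares) is an assumption suggested by the log-scale correlations reported above, not a procedure described in this paper.

```python
import math
from typing import List, Tuple


def fit_log_log(scores: List[float], values: List[float]) -> Tuple[float, float]:
    """OLS fit of log(value) = a + b * log(score) in a validation subsample."""
    xs = [math.log(s) for s in scores]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b


def predict_value(score: float, a: float, b: float) -> float:
    """Back-transformed prediction for a household with only a score.

    On the original scale this estimates a geometric-mean-type value, which is
    typically below the arithmetic mean value for households with that score.
    """
    return math.exp(a + b * math.log(score))
```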

The proposed proxy for household income uses a statistical algorithm developed by Mark et al33 for the original purpose of selecting the food items to include in a dietary intake questionnaire intended to preserve the relative differences between people in nutrient intake (though not necessarily to provide accurate estimates of absolute quantities ingested). Exactly the same problem is encountered when trying to estimate total household consumption expenditures without having to ask about expenditures on hundreds of different items. Provided that accurate estimation of the absolute level of expenditures is not required (which is generally the case in studies of the determinants of health), one can search for the set of expenditure items that, when summed, correlates most highly with the overall total, thereby preserving the relations between households. In this analysis, we were able to generate a list of just 10 expenditure questions, the values of which when summed correlated highly (r=0.74) with total household expenditures. The relation was similarly strong (r=0.72) in a second, independent dataset from the same country. Clearly, there is no guarantee that such relations will be identifiable in every case, and investigators will have to weigh whether the potential benefits of obtaining a valid proxy for household income outweigh the costs of obtaining datasets and undertaking the required analysis.

The question remains as to whether the levels of precision attained by these proxy measures are adequate to permit valid inference in studies of the socioeconomic patterning of disease and health in rural Africa. Nelson and coworkers34 have shown that, assuming bivariate normality, a correlation between true and proxy measures of r=0.75 implies that 59% of observations are correctly classified by the proxy measure into the extreme quintiles of the distribution, with virtually no gross misclassification into opposite extremes. It can safely be assumed that currently used methods of classifying socioeconomic status in rural Africa are associated with much greater levels of misclassification, as well as having dubious construct validity. Where lower levels of misclassification are required, epidemiologists may need to borrow more complex and unfamiliar methods from other disciplines.
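The quintile figure attributed to Nelson and coworkers can be checked approximately by simulation. The sketch draws true and proxy values from a standard bivariate normal distribution with correlation rho (the same bivariate normality assumption), forms quintiles of each, and reports, among households in the top quintile of the true measure, the share placed in the top and in the bottom quintile by the proxy. The sample size and seed are arbitrary choices; by symmetry the bottom quintile behaves the same way.

```python
import math
import random
from typing import Tuple


def extreme_quintile_agreement(rho: float, n: int = 100_000, seed: int = 1) -> Tuple[float, float]:
    """Simulate (true, proxy) pairs with correlation rho under bivariate normality.

    Returns (share of the true top quintile also in the proxy top quintile,
             share of the true top quintile placed in the proxy bottom quintile).
    """
    rng = random.Random(seed)
    resid_sd = math.sqrt(1.0 - rho ** 2)
    true_vals, proxy_vals = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        true_vals.append(x)
        proxy_vals.append(rho * x + resid_sd * rng.gauss(0.0, 1.0))

    # Empirical quintile cut-points (20th and 80th percentiles).
    t_sorted, p_sorted = sorted(true_vals), sorted(proxy_vals)
    t_top = t_sorted[int(0.8 * n)]
    p_bottom, p_top = p_sorted[int(0.2 * n)], p_sorted[int(0.8 * n)]

    top = [p for t, p in zip(true_vals, proxy_vals) if t >= t_top]
    same = sum(1 for p in top if p >= p_top) / len(top)
    opposite = sum(1 for p in top if p < p_bottom) / len(top)
    return same, opposite


if __name__ == "__main__":
    same, opposite = extreme_quintile_agreement(0.75)
    print(f"same extreme quintile: {same:.1%}, opposite extreme: {opposite:.1%}")
```

With rho = 0.75 the first share should come out close to the 59% quoted above and the second well under 1%; rerunning with a smaller rho shows how quickly agreement in the extremes deteriorates.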


The authors would like to thank Dr Guessan Bi Kouassi, Director General of the Institut National de la Statistique, Côte d'Ivoire, for granting access to the Côte d'Ivoire Living Standards Survey.



  • Funding: data collection and analysis for the Mali and Malawi studies were supported by the International Fund for Agricultural Development (IFAD; TA Grant No. 301-IFPRI). We gratefully acknowledge this funding, but emphasise that the ideas and opinions presented here are the responsibility of the authors and should in no way be attributed to IFAD.