

Smoking in young adolescents: an approach with multilevel discrete choice models
J Pinilla, B González, P Barber, Y Santana
Department of Quantitative Methods, University of Las Palmas de Gran Canaria, Canary Islands, Spain
Correspondence to: Professor B González López-Valcárcel, University of Las Palmas de Gran Canaria, Department of Quantitative Methods, Campus de Tafira, 35017 Las Palmas, Spain


Study objective: To understand the context for tobacco smoking in young adolescents, estimating the effects of individual, family, social, and school related factors.

Design: Cross sectional analysis performed by multilevel logistic regression with pupils at the first level and schools at the second level. The data came from a stratified sample of students surveyed on their own, their families' and their friends' smoking habits, their schools, and their awareness of cigarette prices and advertising.

Setting: The study was performed on the island of Gran Canaria, Spain.

Participants: 1877 students from 30 secondary schools in the spring of 2000 (effective sample sizes of the models: 1697 and 1738).

Main results: 14.2% of the young teenagers surveyed use tobacco, almost half of them (6.3% of the total surveyed) on a daily basis. According to the ordered logistic regression model, having a smoker as best friend significantly increases the probability of smoking (odds ratio: 6.96, 95% confidence interval (CI) 4.93 to 9.84), and the same holds for having one smoker living at home compared with a smoke free home (odds ratio: 2.03, 95% CI 1.22 to 3.36). Girls smoke more than boys (odds ratio: 1.85, 95% CI 1.33 to 2.59). Experience with alcohol and lack of interest in studies are also significant factors affecting smoking. Multilevel logistic regression models showed that factors related to the school affect the smoking behaviour of young teenagers. More specifically, whether or not a school complies with antismoking rules is the main predictor of smoking prevalence in schools. The remainder of the differences can be attributed to individual and family characteristics, tobacco consumption by parents or other close relatives, and the peer group.

Conclusions: A great deal of the individual differences in smoking are explained by factors at the school level, so the context is very relevant in this case. The most relevant predictors of smoking in young adolescents include some factors related to the schools they attend. One variable stood out in accounting for the school to school differences: how well the schools enforced the no smoking rule. We can therefore prevent or delay tobacco smoking in adolescents not only by publicising health risks, but also by better enforcing no smoking rules in schools.

  • tobacco
  • schools
  • adolescents
  • smoking
  • multilevel discrete choice models


Precisely what leads adolescents to take up smoking has been the subject of exhaustive study in the recent epidemiological literature.1–5 These studies have revealed the multifactorial nature of the phenomenon, and the methodological and empirical problems to be faced when trying to model and predict smoking patterns.6 Peer7,8 and parental influences are well documented,9–12 as are personality variables.13 Differences in prevalence between girls and boys have been found.14,15 The wish to “experiment” at this age incites some teenagers to try cigarettes;16 the same holds for alcohol consumption, which has been shown to be complementary to smoking among teenagers.17

The literature provides clear empirical evidence that smokers aged 12–14 tend to have a circle of friends who are smokers. In addition, they often have at least one parent who smokes, show little interest in school, come from broken homes, and may exhibit precocious physical maturity. Experience with alcohol tends to be present, sometimes combined with other unhealthy habits.

In addition to these individual predictors, there are others specific to the environment at the school the young teenager attends. One such factor of particular importance is awareness of, and compliance with, antitobacco school regulations on the sale and consumption of cigarettes by teachers and students.18 Enforced smoking bans at school may reduce teenage smoking.19 Schools can also be an important source of information about the health risks of smoking20,21 and an optimal place to implement preventive programmes.22–25

This paper presents results and analysis from the first wave of a three year longitudinal survey. It is part of a research project on the formation of the smoking habit in young adolescents. Its main objective is to estimate the relative influence of factors based on the school setting as compared with factors related to the teenager's personal characteristics and family. The most important contribution of this study is that it estimates the effect of compliance with antismoking rules in schools on the prevalence of smoking among young adolescents. To the best of our knowledge, this is the first time that results of this kind have been reported in the public health literature.


The data used in the empirical application derive from a cross sectional survey conducted in the spring of 2000 as the first stage of a broader, three year longitudinal study. The broader study began with secondary school students in their second year (aged 13–14).*

The sampling design was cluster sampling with probabilities proportional to the number of groups of second year students in each school, doubly stratified by municipality and by kind of school (public, subsidised, and private). The reason for stratification by municipality was that schools on Gran Canaria are highly concentrated (59%) in 2 of the 21 municipalities. For this reason we defined three strata, one each for the two cities with most of the schools, Las Palmas de Gran Canaria and Telde, and the third for the other 19 municipalities.

Stratification by type of school was similarly motivated by an uneven distribution: 72% of the schools are public, 19% subsidised, and just 9% private.

The initial sample included a total of 2010 students in 33 schools, out of a total of 10 839 students and 178 schools, which yields an estimation error of 1.3% at the 95% level of confidence for p=0.1, and an error of 6.76% for p=0.5, also at the 95% level of confidence. Figure 1 shows our sampling strategy, in which 88% of the schools initially selected (30 of 33) participated in the experiment. Two schools no longer existed, and two schools had merged in the year since our list of schools had been drawn up. Including this merged school, we had a total of 1910 survey responses, of which we used 1877, rejecting 23 that were self contradictory or obviously joking.

Figure 1

Survey sampling strategy.

The questionnaire, which was self administered, had four sections. The first included sociodemographic variables: age, sex, number of people in the home, and spending money, followed by variables related to in school and out of school activities. Then it asked about consumption of cigarettes and variables that might be related: interest in studies, alcohol consumption, and tobacco consumption among the peer group, by parents, and by other family members. We also asked about cigarette brands, prices, and attitudes towards advertising, and concluded with questions about the students' knowledge of the negative effects of tobacco, their attitudes toward tobacco, and their own prediction about whether they would become smokers. The questionnaire was approved by a committee of independent researchers with experience in surveys of teenage smoking, and tested with an initial pilot sample of 200 children. When students were given the questionnaires, they were told their answers would be confidential; teachers were not allowed to be present, in order to guarantee free responses. The survey was administered in the school classrooms by the same single researcher, who provided instructions and answered students' questions.

We estimate two types of models. Firstly, we estimate an ordered logistic regression model to explain the answer to the question “Do you smoke?”, with four ordered categories from No smoking to Yes, smokes daily. The explanatory variables, at the individual level, are sex; interest in school (ordered, with four categories); alcohol consumption (never; occasionally; more than once a week); whether the best friend smokes or not; and the number of smokers living at home. The second type of model is a binary logistic multilevel regression (smokes daily or not). Its explanatory variable at the school level (context) is compliance with antismoking rules in the school. This is a dichotomous variable, based on the replies of school directors about smoking in various school areas where smoking is forbidden by the antismoking rules: cafeteria, courtyard, classrooms, bathrooms, corridors or halls. If the director says that smoking is allowed in one or more of these places, the dummy variable for compliance is set to zero; otherwise it is set to one.
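The coding rule for the school level compliance variable can be sketched in a few lines. This is only an illustration of the rule as described in the text: the dictionary layout, the area keys, and the function name are hypothetical and do not come from the survey files.

```python
# Areas where the antismoking rules forbid smoking, as listed in the text.
# The key names are illustrative assumptions.
RESTRICTED_AREAS = (
    "cafeteria",
    "courtyard",
    "classrooms",
    "bathrooms",
    "corridors_or_halls",
)


def compliance_dummy(allowed_by_area):
    """Return 1 if smoking is allowed in none of the restricted areas, else 0.

    allowed_by_area maps an area name to True when the school director
    reported that smoking is allowed there (hypothetical data layout).
    Areas missing from the mapping are treated as "not allowed".
    """
    if any(allowed_by_area.get(area, False) for area in RESTRICTED_AREAS):
        return 0
    return 1


# Two made-up schools: one fully compliant, one allowing smoking in the courtyard.
school_a = {area: False for area in RESTRICTED_AREAS}
school_b = dict(school_a, courtyard=True)
print(compliance_dummy(school_a), compliance_dummy(school_b))
```

A single permitted area is enough to set the dummy to zero, which is why the variable measures strict compliance rather than a degree of compliance.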

Multilevel models

In this study we specify and estimate a multilevel model26,27 adjusted for the traits of the adolescent smoker. It combines effects related to the individual with those for the school where the young teenager studies. Multilevel models, designed as they are for data grouped in hierarchies or levels, are particularly appropriate for this kind of analysis. With these models we can evaluate how much of the variability of the dependent variable—tobacco use in the adolescent—is attributable to individual circumstances, and how much to the effect of the group, the school attended.

Multilevel regression models are indicated when there is a hierarchical structure in levels of data, with a single dependent variable measured at the lowest level and a set of explanatory variables on each of the levels. The advantage of these models lies in their capacity to define and explore variations at each level of the hierarchy after controlling for relevant explanatory variables. The application of multilevel models in health economics has become quite common, in health administration,28 as well as in other areas.29,30

To better understand multilevel models, we suggest starting with a model of components of variance. Let us suppose that we have data from successive repetitions of experiments with $J$ groups of different individuals, in each of which there are $n_j$ persons in the sample. Taking $Y_{ij}$ as the value of a given variable or property of individual $i$ of group $j$ in the sample, we can represent this variable as

$$Y_{ij} = \beta_j + e_{ij} \qquad (1)$$

where $\beta_j$ represents the mean of the variable $Y$ of all the individuals of group $j$ and $e_{ij}$ the deviation of the value of the variable $Y$ for the individual $i$ from the mean of that variable in the other members of group $j$.

Writing each $\beta_j$ as a deviation from its mean $\beta$, that is $\beta_j = \beta + u_j$, equation (1) can be expressed as:

$$Y_{ij} = \beta + u_j + e_{ij} \qquad (2)$$

The model formulated in this equation coincides with the components of variance model with fixed and random effects. The component $\beta$ of the equation represents the so called fixed effects, while $u_j + e_{ij}$ contains the random effects. This model assumes that the random effects $u_j$ are distributed normally with mean 0 and variance $\sigma_u^2$, which stands for differences in the variable $Y$ attributable to the group. It also assumes that the error component $e_{ij}$ is distributed normally with mean 0 and variance $\sigma^2$. Finally, it assumes that the random effects $u_j$ and the error component $e_{ij}$ are independent, $r_{ue} = 0$, and that the $e_{ij}$ are all independent of one another.

The objective, therefore, consists in estimating from the available data the variances $\sigma_u^2$ and $\sigma^2$, which represent, respectively, the variability of the variable $Y$ attributable to the group the individual belongs to, and the variability of the variable $Y$ attributable to differences between individuals. The intraclass correlation $\rho$ is defined as the proportion of the total variance that lies between the $J$ groups:

$$\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma^2} \qquad (3)$$
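To make the variance components idea concrete, the following sketch simulates data from a random intercept model with a continuous response and recovers the two variances with the classical one way ANOVA (method of moments) estimator for a balanced design. It is an illustration only: the parameter values are made up, the function names are hypothetical, and this estimator is not the one used in the paper.

```python
import random
import statistics


def icc_from_variances(var_u, var_e):
    """Intraclass correlation: share of the total variance lying between groups."""
    return var_u / (var_u + var_e)


def simulate_groups(n_groups, n_per_group, beta, var_u, var_e, seed=0):
    """Draw Y_ij = beta + u_j + e_ij with normal u_j and e_ij."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n_groups):
        u_j = rng.gauss(0.0, var_u ** 0.5)  # one random effect per group
        groups.append(
            [beta + u_j + rng.gauss(0.0, var_e ** 0.5) for _ in range(n_per_group)]
        )
    return groups


def anova_variance_components(groups):
    """Method of moments estimates (between, within) for equal group sizes."""
    n = len(groups[0])
    group_means = [statistics.fmean(g) for g in groups]
    # Within mean square: average of the within group sample variances.
    msw = statistics.fmean([statistics.variance(g) for g in groups])
    # Between mean square: n times the sample variance of the group means.
    msb = n * statistics.variance(group_means)
    var_u_hat = max((msb - msw) / n, 0.0)  # truncate negative estimates at zero
    return var_u_hat, msw


# Illustrative values only: true intraclass correlation is 1 / (1 + 3) = 0.25.
groups = simulate_groups(n_groups=200, n_per_group=50, beta=10.0, var_u=1.0, var_e=3.0)
var_u_hat, var_e_hat = anova_variance_components(groups)
print(round(icc_from_variances(var_u_hat, var_e_hat), 3))
```

With many groups the estimated intraclass correlation should fall close to the value implied by the simulated variances.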

Let us suppose that we also have information about $K-1$ personal characteristics for each individual in our sample. These characteristics could be included in a vector $X_{ij}$ that when introduced in the model would involve the appearance of $K$ new fixed effects: $K-1$ variables and a constant. Equation (2) would then become:

$$Y_{ij} = X_{ij}\beta_j + e_{ij} \qquad (4)$$

in which $X_{ij}$ includes $K$ regressors and $e_{ij} \sim N(0, \sigma^2)$.

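The next step in hierarchical models is to parametrise the coefficients $\beta_j$ of equation (4) by adding $L$ explanatory variables at the group level, collected in $Z_j$ with coefficients $\gamma$, so that $\beta_j = Z_j\gamma + u_j$, leaving the model as:

$$Y_{ij} = X_{ij}Z_j\gamma + X_{ij}u_j + e_{ij} \qquad (5)$$

in which the fixed effects are now dependent on $L$ possible variables of the groups.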

The models we have seen so far represent general specifications of multilevel models in which the responses to a continuous variable are related in a linear function to a set of explanatory variables and to a simple hierarchical structure. Many of the applications to health economics are not linear, and so work better with generalised linear models with different functions: logit, probit, Poisson, negative binomial, etc. In our case we applied multilevel analysis with a logistic function.
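As an illustration of what a multilevel model with a logistic link implies for the data, the sketch below simulates a binary outcome for pupils nested in schools, with a random intercept per school and one dichotomous school level predictor. All coefficient values, sample sizes, and names are hypothetical choices for the example and are not estimates from this study.

```python
import math
import random


def inv_logit(x):
    """Logistic function mapping the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_school(n_pupils, intercept, school_coef, complies, var_u, rng):
    """Simulate binary outcomes for one school under a random intercept logit.

    logit(pi_ij) = intercept + school_coef * complies + u_j, u_j ~ N(0, var_u).
    """
    u_j = rng.gauss(0.0, math.sqrt(var_u))
    p = inv_logit(intercept + school_coef * complies + u_j)
    return [1 if rng.random() < p else 0 for _ in range(n_pupils)]


rng = random.Random(42)
# Hypothetical design: 200 schools of 60 pupils, half of them compliant.
schools = [
    (complies, simulate_school(60, -2.3, -1.0, complies, 0.4, rng))
    for complies in [0, 1] * 100
]


def prevalence(flag):
    """Share of simulated daily smokers among pupils in schools with this flag."""
    pupils = [y for complies, ys in schools if complies == flag for y in ys]
    return sum(pupils) / len(pupils)


print(round(prevalence(0), 3), round(prevalence(1), 3))
```

Because the school coefficient is negative in this made-up setting, the simulated prevalence is lower in compliant schools, while the random intercept makes schools with the same compliance status differ from one another.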

For a binary response, unlike in the Normal case, even if the only random coefficient is the intercept, the level 1 variance, $\mathrm{Var}(Y_{ij}) = \pi_{ij}(1-\pi_{ij})$, where $\pi_{ij}$ is the probability of smoking of student $i$ attending school $j$, depends on the expected value of the dependent variable. Furthermore, the level 2 variance, $\sigma_u^2$, is measured on the logistic scale, so it is not directly comparable to the level 1 variance. Therefore, in multilevel models for binary responses, the intraclass correlation cannot be calculated with equation (3). Goldstein proposes four different procedures to provide at least approximate estimates31 of the intraclass correlations in these models, but their interpretation is unclear.
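One approximation often described in the multilevel literature is the latent variable formulation, in which the level 1 variance on the logistic scale is fixed at $\pi^2/3$. The sketch below applies it, purely as an illustration, to the two school level variances quoted later in the text (0.518 and 0.447). It assumes a standard logistic level 1 distribution without extra-binomial variation, which differs from the reported estimation, so the numbers should not be read as results of this study.

```python
import math


def latent_icc(var_u):
    """Approximate intraclass correlation on the latent logistic scale.

    Assumes the level 1 residual follows a standard logistic distribution,
    whose variance is pi^2 / 3 (an assumption, not the paper's method).
    """
    return var_u / (var_u + math.pi ** 2 / 3)


for var_u in (0.518, 0.447):
    print(var_u, round(latent_icc(var_u), 3))
```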

The sequential process of modelling

To ensure the convergence of the estimations, we collapsed the dependent variable $Y_{ij}$, which we initially specified with four possible values (smokes daily, on weekends, less than once a week, and not at all), into a simple binary variable: smokes daily/does not smoke daily. Then we estimated the following sequence of four models: Model 1, the unconditional means model; Model 2, including a school level (level 2) predictor, the compliance with antismoking rules in each school; Model 3, including pupil level (level 1) predictors (fixed effects); and finally Model 4, including both level 1 and level 2 predictors. In Models 3 and 4, the random errors at the school level involve only intercepts. There is therefore only one explicit random effect, and no interaction is assumed between the context level and the individual level.
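The recoding of the four category answer into the binary dependent variable can be sketched as follows. The category labels follow the text, while the function name and the string encoding of the answers are assumptions made for the example.

```python
# The four ordered answers to "Do you smoke?", from lowest to highest.
CATEGORIES = ("not at all", "less than once a week", "on weekends", "smokes daily")


def smokes_daily(answer):
    """Collapse the four ordered categories into 1 (smokes daily) or 0 (otherwise)."""
    if answer not in CATEGORIES:
        raise ValueError(f"unknown answer: {answer!r}")
    return 1 if answer == "smokes daily" else 0


# A few made-up responses, only to show the mapping.
responses = ["not at all", "on weekends", "smokes daily", "less than once a week"]
print([smokes_daily(r) for r in responses])
```

Note that occasional smokers (weekends, less than once a week) are grouped with non-smokers under this coding.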

Statistical analysis

We analysed the data with the program MLwiN, version 1.0.32 We used RIGLS (restricted iterative generalised least squares), with and without extra-binomial variation, to compare both sets of results. The estimation without extra-binomial variation constrains the variance of the level 1 error to 1. In the other case (extra-binomial variation is allowed), the level 1 variance is estimated.33 The reported results are those with extra-binomial variation. There is some discussion about this point. It has been said that, as a “rule of thumb”, an estimate close to 1.0 indicates conformity to the binomial assumption, but a recent simulation34 investigating the effects of sample size and model misspecification on estimating the level 1 variance for binary responses has shown that “an estimate very close to 1.0 seems to be an indication for the incorrect specification rather than for a correct binomial assumption” (12). Furthermore, “in small samples, estimates around 0.8 can appear although the model is correctly specified under the binomial assumption (13)”.


Table 1 shows the descriptive analysis of the variables that appear in the models, including some characteristics of the schools that had been included at first but were excluded from the final specifications. There is no significant association between the school classification and compliance with antismoking rules. Private schools have higher rates of academic success. We have no information about the prevalence of smoking among teachers, but more than 90% of 12 year old students know that at least one of their teachers smokes, and this is the case for both private and public schools. We observe that 14.3% of the young teenagers surveyed use tobacco, almost half of them (6.3% of the total surveyed) on a daily basis. As for the no smoking rules, we see that 33% of the schools comply with them.

Table 1

Descriptive statistics of the variables in the models

Smoking prevalence was higher (9.53%) in schools with weak compliance with smoking rules than in schools with medium/strong compliance (5.45%). The difference in means is of borderline significance (t=2.017; p=0.053).

Firstly, we estimated an ordered logit model to find out the relation between the young teenager's smoking habits and other individual characteristics. This model is designed for ordered categorical dependent variables, to take into account their ordinal nature. In our case, the dependent variable has four ordered categories: does not smoke, less than once a week, on weekends, and smokes daily. Table 2 presents the results of this estimation, with the odds ratio of each explanatory variable and the corresponding 95% confidence intervals, as well as the univariate (unadjusted) odds ratios. These odds ratios are calculated to compare the pupils smoking daily with those who never smoke. Multicollinearity among the explanatory variables causes the discrepancy between the unadjusted odds ratios and the adjusted ones (obtained from the multivariate ordered logistic regression model). Pupils who have no interest in school tend to drink alcohol and also tend to have a smoker as their best friend. We find that, on the whole, girls are more inclined to smoke than boys, with an odds ratio of 1.85. Young teenagers who are less interested in school are more likely to smoke, and alcohol consumption is strongly related to smoking. Tobacco use by relatives and in the peer group has a strong positive effect on the probability of being a smoker. The peer group has the strongest effect, with an odds ratio of 6.96.
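For readers less familiar with how odds ratios relate to logit coefficients, the sketch below converts between the two. It assumes the usual Wald interval, exp(coefficient ± 1.96 × standard error), which may not be exactly how the intervals in table 2 were computed; the function names are hypothetical. The peer group odds ratio quoted above (6.96, 95% CI 4.93 to 9.84) is used as the example.

```python
import math


def odds_ratio_ci(coef, se, z=1.96):
    """Odds ratio and Wald confidence limits from a logit coefficient and its SE."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


def coef_se_from_or_ci(odds_ratio, lower, upper, z=1.96):
    """Back out the coefficient and SE implied by an odds ratio and its Wald CI."""
    coef = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return coef, se


# Best friend smokes: values quoted in the text.
coef, se = coef_se_from_or_ci(6.96, 4.93, 9.84)
print(round(coef, 3), round(se, 3))

# Round trip: the implied interval should be close to the published one.
print(tuple(round(v, 2) for v in odds_ratio_ci(coef, se)))
```

The interval is symmetric on the log odds scale, which is why the published limits are not symmetric around 6.96.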

Table 2

Smoking by young teenagers. Ordered logistic regression model estimation results

Table 3 presents the estimations obtained from the multilevel models. The sample size is 1738. The dependent variable is dichotomous (1 if he/she smokes daily, 0 if he/she does not smoke daily). According to the first multilevel logistic regression model (Model 1), the school level variance, $\sigma_u^2$, is significant, and we may attempt to explain it by differences in school related factors. As the computation and the interpretation of the intraclass correlation in this model present some problems (see discussion above), we leave it out of all models in table 3. Model 2 introduces a fixed coefficient at level 2, for compliance with no smoking rules. Its coefficient is significant. We tested other school related variables, such as the number of students per classroom, the kind of school (public, subsidised, private), and the percentage of students who pass their courses, by subject matter, but eliminated them from the final model for lack of statistical significance. In Model 2, the school level variance (also significant) is 28% lower than in the previous model.

Table 3

Differences in smoking habits explained by factors at the school level and individual characteristics (n=1738). Logistic multilevel regression results. Dependent variable: Smokes daily (yes=1/no=0)

In Model 3 we added to the unconditional model, as fixed effects, the explanatory variables related to the individual that proved significant in the initial ordered logit model. Model 4 adds the antismoking enforcement variable. The level 2 component of variance (for the differences between schools) is significant. It is reduced from 0.518 (in Model 3) to 0.447. Again, as in the shift from Model 1 to Model 2, the addition of the explanatory variable at the school level reduces the unexplained variability among schools. There are no significant cross level fixed or random effects. Because including them would also have impeded convergence, they were not included in any of the final models.
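The size of this reduction can be expressed as a proportion of the school level variance in the model without the school predictor. The short sketch below does the arithmetic for the two values quoted in the text; the function name is an illustrative choice.

```python
def variance_reduction(var_before, var_after):
    """Proportional reduction in the level 2 variance between two nested models."""
    return (var_before - var_after) / var_before


# School level variance in Model 3 and Model 4, as quoted in the text.
reduction = variance_reduction(0.518, 0.447)
print(f"{reduction:.1%}")
```

On these figures the school level predictor accounts for roughly 14% of the between school variance that remains after adjusting for pupil characteristics.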


Our study coincides with those in other European countries as to the main risk factors for tobacco consumption in adolescents,35–38 principally, smoking by close friends, tobacco consumption by parents or other close relatives, experience with alcohol, and lack of interest in school.

According to our estimates, part of the variability among teenagers' probabilities of smoking is related to individual and family characteristics, and the remainder belongs to the “context” level, attributable to school variables. The school variable that contributes significantly to explaining the probability of smoking is compliance with no smoking rules. The significance of this variable in our models shows how important strict enforcement of no smoking health laws could be for the pupils. Other characteristics of the schools have no significant effect. There is no difference among public, private, and subsidised schools; the rate of academic success in each school is not significant either. We tried some other school variables, but none of them is significant in our model.

The period of secondary schooling coincides with adolescence, in which students undergo an intense process of physical, intellectual, and emotional change. Students experience tension, not only because of these internal changes, but also because of pressure from the family and the school. Through counselling and guidance, schools should help their students form self identity, build self esteem, and resist group pressure, in order to promote values and habits against the use of tobacco and other drugs.

According to the latest statistics on unhealthy consumption in the school population, tobacco use among students has increased at an alarming rate.39,40 According to our survey, 14.3% of students aged 13–14 smoke, and almost half of these do so daily.41 These young teenage smokers may well be leaders or especially popular figures in their peer groups, which would explain why the percentage of affirmative replies to the question of whether the best friend smokes (23.5%) is higher than the 14.3% overall rate of smokers.

Girls smoke more than boys, reflecting the feminisation of tobacco consumption in Spain in recent years.

Lax enforcement of no smoking rules in schools has a double impact on the young teenagers. It makes it easier for them to begin to smoke themselves, and it also allows them to see older students, even teachers, smoke. Habits brought by teachers and by students from “smoking homes” become visible, available, and attractive to young students from “non-smoking homes.”

Tobacco is one of the most consumed drugs; it causes the most health problems. Preventing its consumption is thus one of the main objectives of public health policy.42–45 We must attempt to neutralise the association made in the social arena (induced by advertising) between the values of masculinity (competitiveness, initiative, power) and the consumption of tobacco. This association is now especially attractive to girls, as they too prepare for positions of social power. Furthermore, as we know that initiation in the habit takes place in the circle of friends, and is associated with the symbolic perception of passage to adult life, our intervention should focus on strengthening resistance to group pressure. For this reason it is in the first years of secondary school (ESO in Spain) that we must increase preventive measures against tobacco, if we want to delay teenage smoking as long as possible, if not avoid it altogether. The school itself is a key arena for this preventive action, both in what it does and in what it allows to be done.

The main limitation of our study lies in the multilevel model's assumption that teenager and school effects are uncorrelated. If some individual variables omitted from the model for lack of data (for instance, family income) turned out to be relevant and correlated with the school effects (better compliance with the antismoking rules in the “rich” schools, for example), our hypothesis would not be justified.

This study is to be continued with the second (and third) wave of the survey. Longitudinal data on our schoolchildren will provide us with new material to pursue our research. New information will emerge because some of our sample teenagers are moving to a new school.


The authors are grateful to Bill Christian, PhD, for his helpful comments on this manuscript and for his assistance in the translation, and to the anonymous reviewers for many helpful and constructive comments. The authors are also grateful to Jose Ramon Calvo, PhD, to Anselmo Lopez, PhD, and to Araceli Caballero for their comments and ideas about the survey.

Funding: this paper is a product of the research project FUNCIS PI17/99 funded by FUNCIS (Fundación Canaria de Investigación y Salud).




* Secondary education in Spain is obligatory for students aged 12–16. It is intended to prepare students for integration in the active population, to enter technical school, or to study bachillerato, the equivalent of high school. The students surveyed were in what is known in Spain as the “segundo año de ESO”.
