A multivariate approach to investigate the combined biological effects of multiple exposures

Pooja Jain; Paolo Vineis; Benoît Liquet; Jelle Vlaanderen; Barbara Bodinier; Karin van Veldhoven; Manolis Kogevinas; Toby J Athersuch; Laia Font-Ribera; Cristina M Villanueva; Roel Vermeulen; Marc Chadeau-Hyam

doi:10.1136/jech-2017-210061

Article Text

PDF

Theory and methods

A multivariate approach to investigate the combined biological effects of multiple exposures

Pooja Jain1,
Paolo Vineis1,2,
Benoît Liquet3,4,
Jelle Vlaanderen5,
Barbara Bodinier1,
Karin van Veldhoven1,
Manolis Kogevinas6,7,8,9,
Toby J Athersuch1,10,
Laia Font-Ribera6,7,8,9,
Cristina M Villanueva6,7,8,9,
Roel Vermeulen1,5,
http://orcid.org/0000-0001-8341-5436Marc Chadeau-Hyam1,5

¹ Department of Epidemiology and Biostatistics, School of Public Health, MRC-PHE Centre for Environment and Health, Imperial College London, London, UK
² Molecular and Genetic Epidemiology Unit, Italian Institute for Genomic Medicine (IIGM), Turin, Italy
³ UMR CNRS 5142, Laboratoire de Mathématiques et de leurs Applications, Université de Pau et des Pays de l’Adour, Anglet, France
⁴ School of Mathematics, ARC Centre of Excellence for Mathematical and Statistical Frontiers, Queensland University of Technology, Brisbane, Australia
⁵ Institute for Risk Assessment Sciences, Division of Environmental Epidemiology, Utrecht University, Utrecht, Netherlands
⁶ ISGlobal, Centre for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
⁷ Universitat Pompeu Fabra (UPF), Barcelona, Spain
⁸ CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona, Spain
⁹ IMIM (Hospital del Mar Medical Research Institute), Barcelona, Spain
¹⁰ Division of Computational and Systems Medicine, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, UK

Correspondence to Dr Marc Chadeau-Hyam, MRC-PHE Centre for Environment and Health, Imperial College London, London W2 1PG, UK; m.chadeau{at}imperial.ac.uk

Abstract

Epidemiological studies provide evidence that environmental exposures may affect health through complex mixtures. Formal investigation of the effect of exposure mixtures is usually achieved by modelling interactions, which relies on strong assumptions relating to the identity and the number of the exposures involved in such interactions, and on the order and parametric form of these interactions. These hypotheses become difficult to formulate and justify in an exposome context, where influential exposures are numerous and heterogeneous. To capture both the complexity of the exposome and its possibly pleiotropic effects, models handling multivariate predictors and responses, such as partial least squares (PLS) algorithms, can prove useful. As an illustrative example, we applied PLS models to data from a study investigating the inflammatory response (blood concentration of 13 immune markers) to the exposure to four disinfection by-products (one brominated and three chlorinated compounds), while swimming in a pool. To accommodate the multiple observations per participant (n=60; before and after the swim), we adopted a multilevel extension of PLS algorithms, including sparse PLS models shrinking loadings coefficients of unimportant predictors (exposures) and/or responses (protein levels). Despite the strong correlation among co-occurring exposures, our approach identified a subset of exposures (n=3/4) affecting the exhaled levels of 8 (out of 13) immune markers. PLS algorithms can easily scale to high-dimensional exposures and responses, and prove useful for exposome research to identify sparse sets of exposures jointly affecting a set of (selected) biological markers. Our descriptive work may guide these extensions for higher dimensional data.

exposome
multiple exposures
multivariate response
OMICs data
multi-level sparse PLS models

This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See: http://creativecommons.org/licenses/by/4.0/

https://doi.org/10.1136/jech-2017-210061

Statistics from Altmetric.com

Request Permissions

If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.

Introduction

Health effects of simultaneous exposure to numerous and possibly interacting chemicals is raising growing public health concerns,1–3 and such investigation calls for the definition of efficient statistical approaches.4 The exploration of the biological responses to external exposures, as formalised in the exposome concept,5–9 relies on statistical methods accommodating the multivariate and complex interrelations of exposures. Most of the proposed supervised methods rely on dimensionality reduction or (Bayesian) variable selection.10 11 These methods can handle the multidimensionality of exposome data as well as existing correlation structures. However, in practice, the use of these methods has been so far mainly restricted to the exploration of a single exposure and/or endpoint at a time. Supported by the established and possibly differential combination of exposures to which populations are subjected,12 13 developing statistical approaches to investigate the effect of mixtures has represented a further active research field and resulted in methods explicitly modelling interactions between exposures.14–16 Their usability, however, remains limited in an agnostic context, due to dimensionality and interpretability issues.4 17 Furthermore, these approaches rely on strong assumptions mainly relating to the number, the order, and parametric form of the interactions between exposures. These assumptions become even more complex to formulate and to justify when, as generally the case in exposome studies, most of the effective exposures and their effects are not measurable and even sometimes unknown/unidentified.

To address the dimensionality burden of including interaction terms in the statistical models, a two-stage strategy first identifying prioritised exposures and second investigating their potential interactions has been proposed.17 18 However, this approach assumes that the most relevant exposures potentially active in a mixture could be detected based on their marginal effects or correlation. In order to better capture the complexity of the exposure mix, including potential (un-modelled) interactions, models accommodating multivariate exposures are needed. Furthermore, to account for complex and possible pleiotropic effects of these exposures, multivariate responses need to be accounted for. As a supervised dimensionality reduction technique, partial least squares (PLS) regression aims at constructing summary latent variables as linear combinations of the original predictors and response variables. These summary variables are constructed not only so that they capture as much information as possible in each block of data, but also identifies (i) the variability in the predictors that is relevant to the outcomes, and (ii) the variability in the responses which is mostly linked to the predictors.19 20 Variable selection can also be achieved in PLS regression through L1 penalisation shrinking towards 0 the loadings coefficients of the least influential/influenced variables.21–23

In order to accommodate repeated measurements, linear mixed models or multivariate normal models24 25 can be used in an univariate context, and for dimensionality reduction techniques, multilevel approaches have been proposed.26

As a description of the application of multilevel PLS regression approaches and their sparse variants to investigate the multivariate response to multivariate exposures in the presence of multiple measurements per participants, we analyse here the inflammatory response to the exposure to a set of four trihalomethanes (THMs): chloroform (CHCl₃), bromodichloromethane (BDCM), dibromochloromethane (DBCM), bromoform (CHBr₃), measured in exhaled breath as a surrogate for the exposure to disinfection by-products (DBP). The panel of immune markers we assay include interleukins (IL), growth factors, (C-C and C-X-C motif) chemokines ligands, IL receptor antagonists and C reactive proteins (CRP), which may be involved in the reported immunotoxic effect of DBPs and similar compounds27 28 and/or in the anti-inflammatory effects of physical exercise.29

Methods

Data were collected within the ‘EXPOsOMICS’ project30 and include measurements of (i) four DBPs in exhaled breath from participants before and after a swimming session in a chlorinated pool (PISCINA II study) and (ii) at both time points, blood levels of (n=13) inflammation-related proteins.28

Study population

The PISCINA II study is an experimental investigation of the acute biological response to exposures induced by swimming in a chlorinated pool. As detailed elsewhere,31 the study included volunteers, aged 18–40 years, non-smoking and non-professional swimmers, who swam for 40 min in a 25 m long indoor chlorinated pool in Barcelona, Spain, between June and December 2013. At the time of the experiment, participants were asked to complete a questionnaire providing information on sociodemographic, dietary habits, regular physical activity, medical and anthropometric factors.

DBPs including four THMs, CHCl₃, BDCM, DBCM and CHBr₃, were measured in exhaled breath at two time points: before swimmers entered the swimming pool and immediately after they exited the swimming pool, using the Bio-VOC Sampler (Markes International Ltd, UK). These chemicals were assessed by gas chromatography coupled to a mass spectrometer. Details on sampling collection and analysis have been published previously.31

For each of these 60 participants with full exposure and questionnaire data, two blood samples collected before and 2 hours after swimming were available. These were collected in a room detached from the swimming pool area and stored at −80°C.

Informed consent was provided by each participant before commencement of the experiment.

Protein assay

As detailed elsewhere,28 a panel of 23 immune markers from both serum samples was assessed using an R & D Systems (Abingdon, UK) Luminex screening assay according to the protocol described by the manufacturer. The panel includes interleukin (IL)-1β, IL-1rA, IL-4, IL-5, IL-6, IL-8, IL-10, IL-13, IL-17, tumour necrosis factor-alpha (TNF-α), epidermal growth factor (EGF), macrophage inflammatory protein 1 beta (MIP1 β), chemokine (C-X-C motif) ligand 1 (CXCL1), myeloperoxidase (MPO), C-X-C motif chemokine 10 (CXCL10), vascular endothelial growth factor (VEGF), C-C motif chemokine 22 (CCL22), periostin, chemokine (C-C motif) ligand 2 (CCL2), basic fibroblast growth factor (FGF basic), granulocyte colony-stimulating factor (G-CSF) and C-C motif chemokine 11 (CCL11). In addition, CRP was assessed using an R & D Systems Solid Phase Sandwich ELISA. Both samples from the same individual were analysed in the same analytical batch. Serum concentrations for IL-1β, IL-4, IL-5, IL-6, IL-10, IL- 13, TNF-α, MIP1 beta, CXCL1, and FGF basic were below the limits of quantification in more 60% of the samples and were therefore excluded from the analyses. Missing values in the remaining 13 immune markers were imputed using a maximum likelihood estimation procedure.32

Statistical model

Descriptive analyses of the data were done using paired Student’s t-test comparing the mean concentration of exposures (log-transformed) and proteins before and after the swimming experiment.

We used a partial least squares (PLS) approach in regression mode, setting the (n=4) exposures as predictors and the (n=13) proteins as multivariate response variables. PLS algorithms determine components as linear combinations of predictors X (exposures) and responses Y (proteins) maximising their variance-covariance. Denoting T and U the resulting projections of X and Y, respectively, and P and Q the loadings matrices, the PLS regression decomposes X and Y as

Where E and F are error terms assumed to be independent and identically distributed Gaussian variables. In this supervised context, latent variables of X capture the variability in exposures that is relevant to the inflammatory profile, and symmetrically, the components of Y capture the variability in the inflammatory profiles that is related to exposures.

The contribution of each of the original variables in defining the PLS components (loadings coefficients) enables to identify which (combination of) exposures drive the X-Y variance covariance structure and can be used to identify the (sets of) exposures mostly related to the (sets of) inflammatory proteins. The proportion of variance in X (or Y) explained by a given component of X (or Y) measures how accurately that single component summarises the information contained in the original X (or Y) matrix. The percentage of variance of Y explained by the components of X measures relevance of information summary provided by the PLS components of X to the outcome matrix.

Variable importance in projection (VIP) measures, for a given X component, the relative contribution of each of the original predictor to the explanation of the X-related variance in the outcome. VIP thereby measures the contribution of each original predictors in the overall explanatory performance of a given X component, and helps identifying the main drivers of that component.33 Typically, a variable with a VIP >1 is considered explanatory for the variation in Y captured by that component.

To accommodate the repeated measure design, before and after swimming, where both exposure and protein levels are available, we adopted a multilevel approach, which decomposes the observed variability into within-individual and between-individual variability. The within-individual variability (ie, changes induced, within each participant, by the experiment) is included into a standard PLS model to identify linear combinations of exposures that are best able to explain within individual changes in inflammatory markers levels.

We ran our multilevel PLS analyses setting the four exposures predictors, and the 13 proteins as outcome. Four variants of the PLS approach were considered: the natural PLS, and sparse PLS (sPLS) imposing sparsity in the loadings coefficients of (i) X (exposures) components (sPLS_X), (ii) Y (exposures) components (sPLS_Y), (iii) both components of X and Y (sPLS_XY). sPLS_X models shrink the X loadings coefficients towards 0 for the least informative exposure and hence help identifying the most relevant exposures with respect to inflammatory profiles. Symmetrically, variable selection of Y (sPLS_Y) selects the proteins whose expression is mostly affected by exposures.

Calibration of the sPLS models was done using fivefold cross-validation which was independently repeated 1000 times. The cross-validation procedure is repeated for possible values of (i) the number of components to select and, (ii) the number of non-zero loadings coefficients (ie, the number of original variables contributing to the component). The number of components to be considered was determined using the average Q² statistic calculated across all folds and repeats and was defined as the maximal the number of components such that adding an additional component would yield a substantive drop in the Q² value. Sparsity of the sPLS models was controlled by setting the number of variable included to the one minimising the cross-validated prediction error.

Analyses were performed using R V.3.4.0 (21 April 2017). R-codes used for the analyses are available on request to the corresponding author.

Results

Exposure and protein data

Higher plasma proteins levels after swimming were only observed for Periostin and IL-1ra (table 1). Of the 13 assayed proteins, only 3 showed nominally significant difference (p<0.05) after the swimming experiment (CCL11, and CXCL10 lower and IL-1ra, higher).

View this table:

Table 1

Summary statistics of PISCINA II study—mean (SD) levels of exposures and proteins before and after swimming

For all four exposures, a strong contrast was observed before and after the swimming session (p values <4×10⁻¹⁶). Strong correlations among exposures were also observed, particularly in the postswimming samples (figure 1). Protein levels showed moderate correlation levels that were consistent before or after the swimming session (figure 1).

Figure 1

Spearman correlation coefficients for exposures (top) and Pearson correlation coefficients for protein levels (bottom) before (first column) and after (second column) the swim. The third column represent the correlation coefficients between differences in exposures and proteins levels. BDCM, bromodichloromethane; CCL11, C-C motif chemokine 11, CCL2 motif, chemokine (C-C motif) ligand 2; CCL22, C-C motif chemokine 22; CHBr₃, bromoform; CRP, C reactive protein; CHCl₃, chloroform; CXCL10, C-X-C motif chemokine 10; DBCM, dibromochloromethane; EGF, epidermal growth factor, G-CSF, granulocyte colony-stimulating factor; IL, interleukin; MPO, myeloperoxidase; VEGF, vascular endothelial growth factor.

PLS analyses

By construction, PLS models can calculate up to min(p_X,p_Y) components, where p_X and p_Y are the number of variables in the X and Y matrices, respectively. The loadings coefficients of the resulting 4 PLS components are reported in table 2, where (C_1X,…,C_kX) are the sets of k components for the predictor (exposure) matrix, and C_1Y,…,C_kY, the set of components relating to the response (proteins) matrix.

View this table:

Table 2

Results from the multilevel (s)PLS analyses regressing the four exposures (predictors) against the 13 assayed proteins (response). Results are presented for the PLS analyses, for sparse PLS models performing variable selection on exposures (sPLS on X), on proteins (sPLS on Y) and on both exposures and proteins sPLS on X and Y. We report in the table the loadings coefficients for the (s)PLS components of exposures (top table) and proteins (bottom table). For the (sPLS) components of exposures, we report the per-component proportion of variance (in both X and Y) explained, and for components of the proteins we only report the proportion of the variance in Y. For all sparse PLS models, results are only presented for the first PLS component, which is the only one to be retained according to the Q² criterion (see methods)

PLS models

Owing to the strong correlation among exposures, a single component (C_1X) for the exposure matrix is sufficient to explain about 95% of the variation in X. Each exposure contributed consistently to C_1X: the four loadings coefficients were similar, though slightly lower (in absolute value) for bromoform. C_1X explained around 11% of the variance in Y, suggesting that most of the variation in exposure is able to capture a limited fraction of the variation in protein levels. The remaining 3 components of X explained between 4.33% and 0.04% of the variance in X, and these explained 1.33% to 1.87% of the variance in Y.

VIP plots (figure 2A) show that BDC, DCB and to a lesser extent chloroform more importantly contributed to the explanatory performance of C_1X (VIP >1) than bromofrom (VIP around 0.9). The second component of X explained <5% of the variance in X which in-turn explained slightly >1% of the variance on Y (table 2), and this was driven by bromoform and chloroform (VIP >1, figure 2A). The proportion of variance explained by each component of X was not homogeneous across proteins and was consistent with the loadings of the Y components (table 2): C_1X explained a larger proportion of the variance for CXCL10 (>40%, highest loadings coefficient of C_1Y: 0.57), CCL22 (>20%, second largest loadings coefficient: 0.42), IL-1ra (around 20%, third highest loadings coefficient in absolute value: −0.38) and IL-8 (around 13%, loadings coefficient: 0.33). All other components explained <10% of each of the original protein levels.

Figure 2

Variable importance in projection plots and proportion of variance explained by protein. Results are presented for PLS model (A), for sparse PLS performing variable selection on exposures (B), on proteins (C), and both on exposures and proteins (D). CCL11, C-C motif chemokine 11, CCL2 motif, chemokine (C-C motif) ligand 2; CCL22, C-C motif chemokine 22; CRP, C reactive protein; CXCL10, C-X-C motif chemokine 10; EGF, epidermal growth factor, G-CSF, granulocyte colony-stimulating factor; IL, interleukin; MPO, myeloperoxidase; PLS, partial least squares; sVEGF, vascular endothelial growth factor.

Variable selection on either X or Y

In order to select the most influential exposures, and the most affected exposures we ran sparse PLS models, with penalisation of the loadings coefficients applied to exposures, and protein respectively. When performing variable selection on exposures, the calibrated sparse PLS model included a single component C_1X′, in which bromoform was not selected (table 2). C_1X′ explained 94% of the variance in X and 10.4% of the variance in Y. VIP plot (figure 2B) showed that mainly BDC and DCB contribute to the explanatory performances of C_1X′ and that this component explains a similar proportion of variance of Y than C_1X.

The calibrated sPLS models on proteins also included a single component, and in that component four proteins were excluded. These corresponded to those with the least explained variance by exposures in the non-penalised models and models penalised on X (see figure 2A and B, respectively): CCL2, EGF, IL-17 and G-CSF. The first component of X, C_1X″, was very similar to that of the non-penalised models (table 2), and explained 94.9% of the variance in X, and >14% of the variance of the shrunk set of proteins. As expected, the VIP plot for sPLS models on Y, C_1X″ (figure 2C), were similar to that of the PLS model C_1X (figure 2A), and the proportion of variance explained for each of the selected proteins were not affected.

Variable selection on both X and Y

In the final model, variable selection is performed on both X and Y, one component is retained for exposures and proteins: C_1X‴, and C_1Y‴. As in the sparse model on X, C_1X‴ does not include bromoform. C_1Y ‴ includes eight proteins (vs nine for C_1Y″) and the five proteins with null loadings were CCL2, EGF, IL-17, G-CSF (as in the sparse on Y model) as well as MPO. C_1X‴ explains 94% of the variance of X and 16.10% of the variance of the shrunk set of proteins.

In that model, as before, BDCM and DBMC mainly contribute to the explanatory performances of C_1X‴, and C_1X‴ explains the variance of eight proteins, mainly: CXCL10 (>40%), CCL22 (>20%), IL1-ra (~20%), IL-8 and CCL11 (~15%) (figure 2D).

Altogether, these sPLS models show that the exclusion of bromoform in the exposure matrix and of CCL2, EGF, IL-17, G-CSF, and in a lesser extend MPO, in the protein matrix, does not affect the performances of the model, which suggests that these excluded variables do not contribute to the exposure-protein relationship, and therefore not to the inflammatory response to DBP.

Projection of the dataset using the first PLS components of both exposures and proteins (figure 3) clearly shows that all post-swimming observations exhibit negative PLS scores along the exposure axis. Due to the negative loadings of C_1X‴ (table 2) this is indicative of increased exposure levels after the swimming experiment. Similarly, all except one of the post-swimming observations are allocated negative PLS scores along the first protein PLS component. This is suggestive of post-swimming decreased levels of IL-8, VEGF, CCL22, CCL11, CRP, and CXCL10 (all have positive loadings coefficients, see table 2), and increased levels of IL-1ra and, to a lesser extent, of Periostin (negative loadings coefficients, table 2).

Figure 3

X-Y score plot representing the PLS scores for the first exposure PLS component (C_1X‴x-axis) as a function of the scores of the first PLS component for proteins (C_1Y‴ , y-axis). Scores are presented for all (n=60) participants before (blue), and after (orange) the swimming session. Results are presented for the sparse PLS models performing variable selection of both exposures and proteins. PLS, partial least squares.

In figure 4, we compare the per-protein coefficient of determination (R², figure 4A) and the Akaike information criterion (AIC, figure 4B), calculated over the entire study population, for the four PLS models we investigated and for a series of linear mixed models we ran, setting the individual ID set as random intercept to regress all four exposures against each of the protein levels separately. From this plots, it appears that jointly modelling the effect of exposures on all proteins (as done in PLS models) improves the quality of the model fit and yields overall higher R² values and lower AIC for any of the PLS models compared with those from linear mixed models. Additionally, very small differences in both R² and AIC were observed across the four versions of the PLS approaches we used, suggesting that variable exclusion did not affect model performances and that the penalised models efficiently discarded irrelevant exposures, and/or unrelated exposures. For some proteins, in particular CXCL10, IL8, VEGF, CCL22 and CCL11, all PLS models yield better performances than linear mixed models. These proteins are the ones with the largest proportion of variance explained by exposures in the PLS models. Conversely, for some proteins (eg, MPO, CCL2 and EGF), PLS and linear mixed models similarly misperform, and these proteins correspond to proteins whose changes in concentration cannot be explained by exposures: these proteins were not selected in sPLS models.

Figure 4

Per-protein coefficient of determination (R²) (A) and Akaike information criterion (B) for the four PLS models investigated: non-penalised, with variable selection on X, on Y and on both X and Y. Results are also represented for a linear mixed model using the participant ID as random effect, and the set of four exposure are fixed effects, in relation to each protein separately. CCL11, C-C motif chemokine 11, CCL2 motif, chemokine (C-C motif) ligand 2; CCL22, C-C motif chemokine 22; CRP, C reactive protein; CXCL10, C-X-C motif chemokine 10; EGF, epidermal growth factor, G-CSF, granulocyte colony-stimulating factor; IL, interleukin; MPO, myeloperoxidase; PLS, partial least squares; VEGF, vascular endothelial growth factor.

Discussion

A common problem in environmental epidemiology is to identify the complex set of exposures affecting biological processes and ultimately contributing to health. Most environmental exposures operate in the form of complex mixtures, including air, food and water.

Unlike recent alternative explicitly modelling the variance among exposures,34 35 we used PLS models to identify subsets of exposures and proteins mostly covarying. In the present study, we chose not to explicitly model the exposure mix including its possible interaction in relation to molecular profiles, but considered instead that the set of measured exposures was a better proxy for the exposure mix than each exposure taken separately.

We propose to use an established extension of the PLS model to identify, within the entire set of measured exposures, those that are relevant to the entire set of biomarkers and, using penalisation, to quantify to which extent these exposures contribute to the overall explanatory performance of the model.

While PLS models can natively accommodate high-dimensional data,36–38 we chose to use data of limited dimensionality in our illustration to improve visualisation and interpretation of the main output of the methods. Our data featured two measurements per individual and we handled this design by adopting an establishedmultilevel approach decomposing the total variance into ’between' and ’within' individual variations. The former captures differences across individuals while the latter measures variation induced by the experiment (here, swimming) and is therefore adjusted on potential sources of heterogeneity across participants.

We observed modest changes in the proportion of variance explained across the PLS models investigated, and overall a small proportion of the variance of the proteins explained by the exposures. Our analyses identified a subset of eight proteins that were more affected by swimming-induced exposures and suggested a weaker association with bromoform in comparison to other di-bromo derivatives and chloroform. Our results show that variable selection applied to PLS models in presence of correlated predictors efficiently was able to discard (1/4) exposures not contributing (other than through their correlation with the other exposures) to the explanation of the variance of the protein levels, and (5/13) proteins whose variation was not related to the exposures, and hence improves results interpretability. Further improvements in results interpretability could be achieved by adopting orthogonal PLS approaches19 39 where the orthogonal (ie, non-predictive) information from the exposures is removed and does not contribute to the construction of the components.

Results from the sparse PLS models selecting both exposures and proteins suggest a swimming-induced immunotoxicity through decreased level of IL-8, VEGF, CCL22, CCL11, CRP and CXCL10, and increased levels of IL-1ra, which antagonises the proinflammatory IL-1. These results are consistent with previous studies showing an anti-inflammatory response in relation to exposures to DBP and/or physical activity.27–29 Our results also suggest that these inflammatory changes are concurrent to the acute exposure to DBP.

However, we observe a very strong contrast in exposure levels before and after swimming, which induces a strong correlation between exposures and factors relating to the experiment. Consequently it is not possible to statistically disentangle the effect of exposures from that of physical activity (or of any other factor related to the swimming experiment),25 28 and in the absence of independent evidence relating the identified proteins to DBP, these conclusions should be carefully interpreted.

Finally, our results show that adopting a multivariate approach accommodating multivariate predictors and responses (ie, accounting for the variance-covariance across proteins) yields improvement in the model fit (R² and AIC) over models looking at each response separately.

While our example focuses on the immune response to exposure to DBP, this approach could directly be extended to other exposures (in particular air pollutants), and due to the computational scalability of PLS models, such extensions could accommodate other omics data of higher dimensionality.

As such, variants of the PLS approach as used here may complement correlation-based approaches such as the exposome globe40 to identify ‘exposome haplotypes’ as well as related phenotypes, which could subsequently be interrogated for (potentially complex) interactions. Once limited to a promising set of exposures, interactions of the mixture components could be explicitly modelled using PLS approaches by including interaction terms in the predictor matrix and define a group structure compiling main effects and interaction terms in the same group, and adopt a sparse group PLS approach.41 toThis would identify the relevant sets of exposures (ie, groups) and within each group, the most relevant variables (main and/or interaction terms).

What is already known on this subject

The exploration of the role of exposures on human health relies on the modelling of the effect of exposure mixtures. Several approaches have been proposed to study exposure mixtures, and these rely on strong assumption regarding the identity of the involved exposures and on the form of their interactions. In an exposome context, relevant exposures are not all measured and sometimes unknown, hence making the formulation and justification of such hypotheses impossible.

What this study adds

In order to capture the complexity of the exposure effects and of their biological response, we propose to use PLS approaches combined with variable selection to identify influential subsets of exposures to identify a subset of molecular measurements preferentially affected by these exposures. Our illustrative application investigated the effect of exposure to disinfection by products (n=4) on inflammation, as measured by 13 proteins levels and can naturally be scaled up to high throughput data. It may therefore prove useful in exposome research to explore the complex and possibly pleiotropic effects of exposures on biological makers.

References

↵
2. Carlin DJ ,
3. Rider CV ,
4. Woychik R , et al
. Unraveling the health effects of environmental mixtures: an NIEHS priority. Environ Health Perspect 2013;121:a6–a8.doi:10.1289/ehp.1206182
OpenUrl
↵
2. Rider CV ,
3. Carlin DJ ,
4. Devito MJ , et al
. Mixtures research at NIEHS: an evolving program. Toxicology 2013;313:94–102.doi:10.1016/j.tox.2012.10.017
OpenUrl
↵
2. Dominici F ,
3. Peng RD ,
4. Barr CD , et al
. Protecting human health from air pollution: shifting from a single-pollutant to a multipollutant approach. Epidemiology 2010;21:187–94.doi:10.1097/EDE.0b013e3181cc86e8
OpenUrl CrossRef PubMed Web of Science
↵
2. Manrai AK ,
3. Cui Y ,
4. Bushel PR , et al
. Informatics and Data Analytics to Support Exposome-Based Discovery for Public Health. Annu Rev Public Health 2017;38:279–94.doi:10.1146/annurev-publhealth-082516-012737
OpenUrl
↵
2. Buck Louis GM ,
3. Sundaram R
. Exposome: time for transformative research. Stat Med 2012;31:2569–75.doi:10.1002/sim.5496
OpenUrl CrossRef PubMed
↵
2. Lioy PJ ,
3. Rappaport SM
. Exposure science and the exposome: an opportunity for coherence in the environmental health sciences. Environ Health Perspect 2011;119:a466–a467.doi:10.1289/ehp.1104387
OpenUrl
↵
2. Rappaport SM
. Biomarkers intersect with the exposome. Biomarkers 2012;17:483–9.doi:10.3109/1354750X.2012.691553
OpenUrl CrossRef PubMed Web of Science
↵
2. Wild CP
. Complementing the genome with an "exposome": the outstanding challenge of environmental exposure measurement in molecular epidemiology. Cancer Epidemiol Biomarkers Prev 2005;14:1847–50.doi:10.1158/1055-9965.EPI-05-0456
OpenUrl FREE Full Text
↵
2. Rappaport SM ,
3. Smith MT
. Environment and Disease Risks. Science 2010;330:460–1.doi:10.1126/science.1192603
OpenUrl Abstract/FREE Full Text
↵
2. Agier L ,
3. Portengen L ,
4. Chadeau-Hyam M , et al
. A Systematic Comparison of Linear Regression-Based Statistical Methods to Assess Exposome-Health Associations. Environ Health Perspect 2016;124:1848–56.doi:10.1289/EHP172
OpenUrl
↵
2. Chadeau-Hyam M ,
3. Campanella G ,
4. Jombart T , et al
. Deciphering the complex: methodological overview of statistical models to derive OMICS-based biomarkers. Environ Mol Mutagen 2013;54:542–57.doi:10.1002/em.21797
OpenUrl CrossRef PubMed
↵
2. Porta M ,
3. Pumarega J ,
4. Gasull M
. Number of persistent organic pollutants detected at high concentrations in a general population. Environ Int 2012;44:106–11.doi:10.1016/j.envint.2012.02.005
OpenUrl CrossRef PubMed Web of Science
↵
2. Pumarega J ,
3. Gasull M ,
4. Lee DH , et al
. Number of Persistent Organic Pollutants Detected at High Concentrations in Blood Samples of the United States Population. PLoS One 2016;11:e0160432.doi:10.1371/journal.pone.0160432
↵
2. Barrera-Gómez J ,
3. Agier L ,
4. Portengen L , et al
. A systematic comparison of statistical methods to detect interactions in exposome-health associations. Environ Health 2017;16:74.doi:10.1186/s12940-017-0277-6
OpenUrl
↵
2. Billionnet C ,
3. Sherrill D ,
4. Annesi-Maesano I , et al
. Estimating the health effects of exposure to multi-pollutant mixture. Ann Epidemiol 2012;22:126–41.doi:10.1016/j.annepidem.2011.11.004
OpenUrl CrossRef PubMed
↵
2. Sun Z ,
3. Tao Y ,
4. Li S , et al
. Statistical strategies for constructing health risk models with multiple pollutants and their interactions: possible choices and comparisons. Environ Health 2013;12:85.doi:10.1186/1476-069X-12-85
OpenUrl CrossRef PubMed
↵
2. Patel CJ
. Analytic Complexity and Challenges in Identifying Mixtures of Exposures Associated with Phenotypes in the Exposome Era. Curr Epidemiol Rep 2017;4:22–30.doi:10.1007/s40471-017-0100-5
OpenUrl
↵
2. Braun JM ,
3. Gennings C ,
4. Hauser R , et al
. What Can Epidemiological Studies Tell Us about the Impact of Chemical Mixtures on Human Health? Environ Health Perspect 2016;124:A6–9.doi:10.1289/ehp.1510569
OpenUrl
↵
2. Trygg J ,
3. Wold S
. Orthogonal projections to latent structures (O-PLS). J Chemom 2002;16:119–28.doi:10.1002/cem.695
OpenUrl CrossRef Web of Science
↵
2. Wold S ,
3. Sjöström M ,
4. Eriksson L
. PLS-regression: a basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems 2001;58:109–30.doi:10.1016/S0169-7439(01)00155-1
OpenUrl CrossRef Web of Science
↵
2. Lê Cao KA ,
3. Martin PG ,
4. Robert-Granié C , et al
. Sparse canonical methods for biological data integration: application to a cross-platform study. BMC Bioinformatics 2009;10:10.doi:10.1186/1471-2105-10-34
↵
2. Lê Cao KA ,
3. Rossouw D ,
4. Robert-Granié C , et al
. A sparse PLS for variable selection when integrating omics data. Stat Appl Genet Mol Biol 2008;7:7.doi:10.2202/1544-6115.1390
OpenUrl
↵
2. Parkhomenko E ,
3. Tritchler D ,
4. Beyene J
. Sparse canonical correlation analysis with application to genomic data integration. Stat Appl Genet Mol Biol 2009;8:1–34.doi:10.2202/1544-6115.1406
OpenUrl CrossRef
↵
2. Espín-Pérez A ,
3. Font-Ribera L ,
4. van Veldhoven K , et al
. Blood transcriptional and microRNA responses to short-term exposure to disinfection by-products in a swimming pool. Environ Int 2018;110:42–50.doi:10.1016/j.envint.2017.10.003
OpenUrl
↵
2. van Veldhoven K ,
3. Keski-Rahkonen P ,
4. Barupal DK , et al
. Effects of exposure to water disinfection by-products in a swimming pool: A metabolome-wide association study. Environ Int 2018;111:60–70.doi:10.1016/j.envint.2017.11.017
OpenUrl
↵
2. Liquet B ,
3. Lê Cao KA ,
4. Hocini H , et al
. A novel approach for biomarker selection and the integration of repeated measures experiments from two assays. BMC Bioinformatics 2012;13:325.doi:10.1186/1471-2105-13-325
↵
2. Bassig BA ,
3. Zhang L ,
4. Tang X , et al
. Occupational exposure to trichloroethylene and serum concentrations of IL-6, IL-10, and TNF-alpha. Environ Mol Mutagen 2013;54:450–4.doi:10.1002/em.21789
OpenUrl CrossRef PubMed
↵
2. Vlaanderen J ,
3. van Veldhoven K ,
4. Font-Ribera L , et al
. Acute changes in serum immune markers due to swimming in a chlorinated pool. Environ Int 2017;105:1–11.doi:10.1016/j.envint.2017.04.009
OpenUrl
↵
2. Gleeson M ,
3. Bishop NC ,
4. Stensel DJ , et al
. The anti-inflammatory effects of exercise: mechanisms and implications for the prevention and treatment of disease. Nat Rev Immunol 2011;11:607–15.doi:10.1038/nri3041
OpenUrl CrossRef PubMed
↵
2. Vineis P ,
3. Chadeau-Hyam M ,
4. Gmuender H , et al
. The exposome in practice: Design of the EXPOsOMICS project. Int J Hyg Environ Health 2017;220.doi:10.1016/j.ijheh.2016.08.001
↵
2. Font-Ribera L ,
3. Kogevinas M ,
4. Schmalz C , et al
. Environmental and personal determinants of the uptake of disinfection by-products during swimming. Environ Res 2016;149:206–15.doi:10.1016/j.envres.2016.05.013
OpenUrl
↵
2. Lubin JH ,
3. Colt JS ,
4. Camann D , et al
. Epidemiologic evaluation of measurement data in the presence of detection limits. Environ Health Perspect 2004;112:1691–6.doi:10.1289/ehp.7199
OpenUrl CrossRef PubMed Web of Science
↵
2. Chong I-G ,
3. Jun C-H
. Performance of some variable selection methods when multicollinearity is present. ‎Chemom Intell Lab Syst 2005;78:103–12.doi:10.1016/j.chemolab.2004.12.011
OpenUrl
↵
2. de Vocht F ,
3. Cherry N ,
4. Wakefield J
. A Bayesian mixture modeling approach for assessing the effects of correlated exposures in case-control studies. J Expo Sci Environ Epidemiol 2012;22:352–60.doi:10.1038/jes.2012.22
OpenUrl CrossRef PubMed
↵
2. Lenters V ,
3. Vermeulen R ,
4. Portengen L
. Performance of variable selection methods for assessing the health effects of correlated exposures in case-control studies. Occup Environ Med 2017:oemed-2016-104231.doi:10.1136/oemed-2016-104231
↵
2. Allen GI ,
3. Peterson C ,
4. Vannucci M , et al
. Regularized Partial Least Squares with an Application to NMR Spectroscopy. Stat Anal Data Min 2013;6:302–14.doi:10.1002/sam.11169
OpenUrl
↵
2. Boulesteix AL ,
3. Strimmer K
. Partial least squares: a versatile tool for the analysis of high-dimensional genomic data. Brief Bioinform 2007;8:32–44.doi:10.1093/bib/bbl016
OpenUrl CrossRef PubMed Web of Science
↵
2. Chun H ,
3. Keles S
. Expression quantitative trait loci mapping with multivariate sparse partial least squares regression. Genetics 2009;182:79–90.doi:10.1534/genetics.109.100362
OpenUrl Abstract/FREE Full Text
↵
2. Trygg J
. O2-PLS for qualitative and quantitative analysis in multivariate calibration. Journal of Chemometrics 2002;16:283–93.doi:10.1002/cem.724
OpenUrl CrossRef Web of Science
↵
2. Patel CJ ,
3. Manrai AK
. Development of exposome correlation globes to map out environment-wide associations. Pac Symp Biocomput 2015:231–42.
↵
2. Liquet B ,
3. de Micheaux PL ,
4. Hejblum BP , et al
. Group and sparse group partial least square approaches applied in genomics context. Bioinformatics 2016;32:35–42.doi:10.1093/bioinformatics/btv535
OpenUrl CrossRef PubMed

Footnotes

Contributors PJ, PV, RV and MC-H developed the idea of the study and drafted the manuscript. PJ and BB ran the analyses under the guidance of MC-H and BL. JV, KvV, MK, TJA, LF-R and CMV provided the data, pre-processed them and contributed to the results evaluation and interpretation. All authors contributed to the writing of the paper and have approved this final version.
Funding This work was carried out within the EXPOsOMICS Project, which was funded by the European Commission (grant agreement 308610-FP7 European Commission, to PV). The Centre for Environment and Health is supported by the Medical Research Council and Public Health England (MR/L01341X/1). MC-H acknowledges support from Cancer Research UK, Population Research Committee Project grant ‘Mechanomics’ (project 22184).
Competing interests None declared.
Patient consent Detail has been removed from this case description/these case descriptions to ensure anonymity. The editors and reviewers have seen the detailed information available and are satisfied that the information backs up the case the authors are making.
Ethics approval The study was approved by the ethics committee of the research centre following the international regulations, and all volunteers signed an informed consent before participation.
Provenance and peer review Commissioned; externally peer reviewed.

[1] ↵

Carlin DJ ,
Rider CV ,
Woychik R , et al
. Unraveling the health effects of environmental mixtures: an NIEHS priority. Environ Health Perspect 2013;121:a6–a8.doi:10.1289/ehp.1206182
OpenUrl

[3] Carlin DJ ,

[4] Rider CV ,

[5] Woychik R , et al

[6] ↵

Rider CV ,
Carlin DJ ,
Devito MJ , et al
. Mixtures research at NIEHS: an evolving program. Toxicology 2013;313:94–102.doi:10.1016/j.tox.2012.10.017
OpenUrl

[8] Rider CV ,

[9] Carlin DJ ,

[10] Devito MJ , et al

[11] ↵

Dominici F ,
Peng RD ,
Barr CD , et al
. Protecting human health from air pollution: shifting from a single-pollutant to a multipollutant approach. Epidemiology 2010;21:187–94.doi:10.1097/EDE.0b013e3181cc86e8
OpenUrl CrossRef PubMed Web of Science

[13] Dominici F ,

[14] Peng RD ,

[15] Barr CD , et al

[16] ↵

Manrai AK ,
Cui Y ,
Bushel PR , et al
. Informatics and Data Analytics to Support Exposome-Based Discovery for Public Health. Annu Rev Public Health 2017;38:279–94.doi:10.1146/annurev-publhealth-082516-012737
OpenUrl

[18] Manrai AK ,

[19] Cui Y ,

[20] Bushel PR , et al

[21] ↵

Buck Louis GM ,
Sundaram R
. Exposome: time for transformative research. Stat Med 2012;31:2569–75.doi:10.1002/sim.5496
OpenUrl CrossRef PubMed

[23] Buck Louis GM ,

[24] Sundaram R

[25] ↵

Lioy PJ ,
Rappaport SM
. Exposure science and the exposome: an opportunity for coherence in the environmental health sciences. Environ Health Perspect 2011;119:a466–a467.doi:10.1289/ehp.1104387
OpenUrl

[27] Lioy PJ ,

[28] Rappaport SM

[29] ↵

Rappaport SM
. Biomarkers intersect with the exposome. Biomarkers 2012;17:483–9.doi:10.3109/1354750X.2012.691553
OpenUrl CrossRef PubMed Web of Science

[31] Rappaport SM

[32] ↵

Wild CP
. Complementing the genome with an "exposome": the outstanding challenge of environmental exposure measurement in molecular epidemiology. Cancer Epidemiol Biomarkers Prev 2005;14:1847–50.doi:10.1158/1055-9965.EPI-05-0456
OpenUrl FREE Full Text

[34] Wild CP

[35] ↵

Rappaport SM ,
Smith MT
. Environment and Disease Risks. Science 2010;330:460–1.doi:10.1126/science.1192603
OpenUrl Abstract/FREE Full Text

[37] Rappaport SM ,

[38] Smith MT

[39] ↵

Agier L ,
Portengen L ,
Chadeau-Hyam M , et al
. A Systematic Comparison of Linear Regression-Based Statistical Methods to Assess Exposome-Health Associations. Environ Health Perspect 2016;124:1848–56.doi:10.1289/EHP172
OpenUrl

[41] Agier L ,

[42] Portengen L ,

[43] Chadeau-Hyam M , et al

[44] ↵

Chadeau-Hyam M ,
Campanella G ,
Jombart T , et al
. Deciphering the complex: methodological overview of statistical models to derive OMICS-based biomarkers. Environ Mol Mutagen 2013;54:542–57.doi:10.1002/em.21797
OpenUrl CrossRef PubMed

[46] Chadeau-Hyam M ,

[47] Campanella G ,

[48] Jombart T , et al

[49] ↵

Porta M ,
Pumarega J ,
Gasull M
. Number of persistent organic pollutants detected at high concentrations in a general population. Environ Int 2012;44:106–11.doi:10.1016/j.envint.2012.02.005
OpenUrl CrossRef PubMed Web of Science

[51] Porta M ,

[52] Pumarega J ,

[53] Gasull M

[54] ↵

Pumarega J ,
Gasull M ,
Lee DH , et al
. Number of Persistent Organic Pollutants Detected at High Concentrations in Blood Samples of the United States Population. PLoS One 2016;11:e0160432.doi:10.1371/journal.pone.0160432

[56] Pumarega J ,

[57] Gasull M ,

[58] Lee DH , et al

[59] ↵

Barrera-Gómez J ,
Agier L ,
Portengen L , et al
. A systematic comparison of statistical methods to detect interactions in exposome-health associations. Environ Health 2017;16:74.doi:10.1186/s12940-017-0277-6
OpenUrl

[61] Barrera-Gómez J ,

[62] Agier L ,

[63] Portengen L , et al

[64] ↵

Billionnet C ,
Sherrill D ,
Annesi-Maesano I , et al
. Estimating the health effects of exposure to multi-pollutant mixture. Ann Epidemiol 2012;22:126–41.doi:10.1016/j.annepidem.2011.11.004
OpenUrl CrossRef PubMed

[66] Billionnet C ,

[67] Sherrill D ,

[68] Annesi-Maesano I , et al

[69] ↵

Sun Z ,
Tao Y ,
Li S , et al
. Statistical strategies for constructing health risk models with multiple pollutants and their interactions: possible choices and comparisons. Environ Health 2013;12:85.doi:10.1186/1476-069X-12-85
OpenUrl CrossRef PubMed

[71] Sun Z ,

[72] Tao Y ,

[73] Li S , et al

[74] ↵

Patel CJ
. Analytic Complexity and Challenges in Identifying Mixtures of Exposures Associated with Phenotypes in the Exposome Era. Curr Epidemiol Rep 2017;4:22–30.doi:10.1007/s40471-017-0100-5
OpenUrl

[76] Patel CJ

[77] ↵

Braun JM ,
Gennings C ,
Hauser R , et al
. What Can Epidemiological Studies Tell Us about the Impact of Chemical Mixtures on Human Health? Environ Health Perspect 2016;124:A6–9.doi:10.1289/ehp.1510569
OpenUrl

[79] Braun JM ,

[80] Gennings C ,

[81] Hauser R , et al

[82] ↵

Trygg J ,
Wold S
. Orthogonal projections to latent structures (O-PLS). J Chemom 2002;16:119–28.doi:10.1002/cem.695
OpenUrl CrossRef Web of Science

[84] Trygg J ,

[85] Wold S

[86] ↵

Wold S ,
Sjöström M ,
Eriksson L
. PLS-regression: a basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems 2001;58:109–30.doi:10.1016/S0169-7439(01)00155-1
OpenUrl CrossRef Web of Science

[88] Wold S ,

[89] Sjöström M ,

[90] Eriksson L

[91] ↵

Lê Cao KA ,
Martin PG ,
Robert-Granié C , et al
. Sparse canonical methods for biological data integration: application to a cross-platform study. BMC Bioinformatics 2009;10:10.doi:10.1186/1471-2105-10-34

[93] Lê Cao KA ,

[94] Martin PG ,

[95] Robert-Granié C , et al

[96] ↵

Lê Cao KA ,
Rossouw D ,
Robert-Granié C , et al
. A sparse PLS for variable selection when integrating omics data. Stat Appl Genet Mol Biol 2008;7:7.doi:10.2202/1544-6115.1390
OpenUrl

[98] Lê Cao KA ,

[99] Rossouw D ,

[100] Robert-Granié C , et al

[101] ↵

Parkhomenko E ,
Tritchler D ,
Beyene J
. Sparse canonical correlation analysis with application to genomic data integration. Stat Appl Genet Mol Biol 2009;8:1–34.doi:10.2202/1544-6115.1406
OpenUrl CrossRef

[103] Parkhomenko E ,

[104] Tritchler D ,

[105] Beyene J

[106] ↵

Espín-Pérez A ,
Font-Ribera L ,
van Veldhoven K , et al
. Blood transcriptional and microRNA responses to short-term exposure to disinfection by-products in a swimming pool. Environ Int 2018;110:42–50.doi:10.1016/j.envint.2017.10.003
OpenUrl

[108] Espín-Pérez A ,

[109] Font-Ribera L ,

[110] van Veldhoven K , et al

[111] ↵

van Veldhoven K ,
Keski-Rahkonen P ,
Barupal DK , et al
. Effects of exposure to water disinfection by-products in a swimming pool: A metabolome-wide association study. Environ Int 2018;111:60–70.doi:10.1016/j.envint.2017.11.017
OpenUrl

[113] van Veldhoven K ,

[114] Keski-Rahkonen P ,

[115] Barupal DK , et al

[116] ↵

Liquet B ,
Lê Cao KA ,
Hocini H , et al
. A novel approach for biomarker selection and the integration of repeated measures experiments from two assays. BMC Bioinformatics 2012;13:325.doi:10.1186/1471-2105-13-325

[118] Liquet B ,

[119] Lê Cao KA ,

[120] Hocini H , et al

[121] ↵

Bassig BA ,
Zhang L ,
Tang X , et al
. Occupational exposure to trichloroethylene and serum concentrations of IL-6, IL-10, and TNF-alpha. Environ Mol Mutagen 2013;54:450–4.doi:10.1002/em.21789
OpenUrl CrossRef PubMed

[123] Bassig BA ,

[124] Zhang L ,

[125] Tang X , et al

[126] ↵

Vlaanderen J ,
van Veldhoven K ,
Font-Ribera L , et al
. Acute changes in serum immune markers due to swimming in a chlorinated pool. Environ Int 2017;105:1–11.doi:10.1016/j.envint.2017.04.009
OpenUrl

[128] Vlaanderen J ,

[129] van Veldhoven K ,

[130] Font-Ribera L , et al

[131] ↵

Gleeson M ,
Bishop NC ,
Stensel DJ , et al
. The anti-inflammatory effects of exercise: mechanisms and implications for the prevention and treatment of disease. Nat Rev Immunol 2011;11:607–15.doi:10.1038/nri3041
OpenUrl CrossRef PubMed

[133] Gleeson M ,

[134] Bishop NC ,

[135] Stensel DJ , et al

[136] ↵

Vineis P ,
Chadeau-Hyam M ,
Gmuender H , et al
. The exposome in practice: Design of the EXPOsOMICS project. Int J Hyg Environ Health 2017;220.doi:10.1016/j.ijheh.2016.08.001

[138] Vineis P ,

[139] Chadeau-Hyam M ,

[140] Gmuender H , et al

[141] ↵

Font-Ribera L ,
Kogevinas M ,
Schmalz C , et al
. Environmental and personal determinants of the uptake of disinfection by-products during swimming. Environ Res 2016;149:206–15.doi:10.1016/j.envres.2016.05.013
OpenUrl

[143] Font-Ribera L ,

[144] Kogevinas M ,

[145] Schmalz C , et al

[146] ↵

Lubin JH ,
Colt JS ,
Camann D , et al
. Epidemiologic evaluation of measurement data in the presence of detection limits. Environ Health Perspect 2004;112:1691–6.doi:10.1289/ehp.7199
OpenUrl CrossRef PubMed Web of Science

[148] Lubin JH ,

[149] Colt JS ,

[150] Camann D , et al

[151] ↵

Chong I-G ,
Jun C-H
. Performance of some variable selection methods when multicollinearity is present. ‎Chemom Intell Lab Syst 2005;78:103–12.doi:10.1016/j.chemolab.2004.12.011
OpenUrl

[153] Chong I-G ,

[154] Jun C-H

[155] ↵

de Vocht F ,
Cherry N ,
Wakefield J
. A Bayesian mixture modeling approach for assessing the effects of correlated exposures in case-control studies. J Expo Sci Environ Epidemiol 2012;22:352–60.doi:10.1038/jes.2012.22
OpenUrl CrossRef PubMed

[157] de Vocht F ,

[158] Cherry N ,

[159] Wakefield J

[160] ↵

Lenters V ,
Vermeulen R ,
Portengen L
. Performance of variable selection methods for assessing the health effects of correlated exposures in case-control studies. Occup Environ Med 2017:oemed-2016-104231.doi:10.1136/oemed-2016-104231

[162] Lenters V ,

[163] Vermeulen R ,

[164] Portengen L

[165] ↵

Allen GI ,
Peterson C ,
Vannucci M , et al
. Regularized Partial Least Squares with an Application to NMR Spectroscopy. Stat Anal Data Min 2013;6:302–14.doi:10.1002/sam.11169
OpenUrl

[167] Allen GI ,

[168] Peterson C ,

[169] Vannucci M , et al

[170] ↵

Boulesteix AL ,
Strimmer K
. Partial least squares: a versatile tool for the analysis of high-dimensional genomic data. Brief Bioinform 2007;8:32–44.doi:10.1093/bib/bbl016
OpenUrl CrossRef PubMed Web of Science

[172] Boulesteix AL ,

[173] Strimmer K

[174] ↵

Chun H ,
Keles S
. Expression quantitative trait loci mapping with multivariate sparse partial least squares regression. Genetics 2009;182:79–90.doi:10.1534/genetics.109.100362
OpenUrl Abstract/FREE Full Text

[176] Chun H ,

[177] Keles S

[178] ↵

Trygg J
. O2-PLS for qualitative and quantitative analysis in multivariate calibration. Journal of Chemometrics 2002;16:283–93.doi:10.1002/cem.724
OpenUrl CrossRef Web of Science

[180] Trygg J

[181] ↵

Patel CJ ,
Manrai AK
. Development of exposome correlation globes to map out environment-wide associations. Pac Symp Biocomput 2015:231–42.

[183] Patel CJ ,

[184] Manrai AK

[185] ↵

Liquet B ,
de Micheaux PL ,
Hejblum BP , et al
. Group and sparse group partial least square approaches applied in genomics context. Bioinformatics 2016;32:35–42.doi:10.1093/bioinformatics/btv535
OpenUrl CrossRef PubMed

[187] Liquet B ,

[188] de Micheaux PL ,

[189] Hejblum BP , et al

Log in using your username and password

Main menu

Log in using your username and password

You are here

Abstract

Statistics from Altmetric.com

Request Permissions

Introduction

Methods

Study population

Protein assay

Statistical model

Results

Exposure and protein data

PLS analyses

PLS models

Variable selection on either X or Y

Variable selection on both X and Y

Discussion

What is already known on this subject

What this study adds

References

Footnotes

Read the full text or download the PDF:

Log in using your username and password