The present paper scrutinises the European authorities’ assessment of the carcinogenic hazard posed by glyphosate under Regulation (EC) 1272/2008. We use the authorities’ own criteria as a benchmark to analyse their weight of evidence (WoE) approach. Our analysis therefore goes beyond the comparisons of the assessments made by the European Food Safety Authority and the International Agency for Research on Cancer that others have published. We show that the decision by the European authorities, including the European Chemicals Agency, not to classify glyphosate as a carcinogen appears to be inconsistent with, and in some instances a direct violation of, the applicable guidance and guideline documents. In particular, we criticise the authorities’ arbitrary attenuation of the power of statistical analyses; their disregard of existing dose–response relationships; their unjustified claim that the doses used in the mouse carcinogenicity studies were too high; and their contention that the carcinogenic effects were not reproducible, which rested on quantitative reproducibility while neglecting qualitative reproducibility. Further aspects that were used incorrectly were historical control data, multisite responses and progression of lesions to malignancy. Contrary to the authorities’ evaluations, proper application of statistical methods and WoE criteria inevitably leads to the conclusion that glyphosate is ‘probably carcinogenic’ (corresponding to category 1B in the European Union).
- weight of evidence
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
On 15 March 2017, the Risk Assessment Committee of the European Chemicals Agency (ECHA) adopted its opinion that the scientific evidence did not meet the criteria specified in the Classification, Labelling and Packaging (CLP) Regulation to classify glyphosate as a carcinogen, thereby confirming the conclusion of the European Food Safety Authority (EFSA) published in November 2015.1–3 This assessment was contrary to that of the International Agency for Research on Cancer (IARC), which categorised glyphosate in its monograph as a probable human carcinogen (Group 2A, according to IARC’s classification).4
This paper scrutinises the European authorities’ assessment of the carcinogenic potential of glyphosate, using their own criteria and their weight of evidence (WoE) approach as the benchmark and concentrating on the hazard assessment according to the CLP Regulation.2 5–7 By contrast, a previous paper that compared the divergent assessments made by EFSA and IARC concluded that the difference arose partly from the use of different data sets and partly from methodological differences in the evaluation.8 However, others have identified a number of flaws in EFSA’s analysis.9 Another paper, again comparing the assessments by the two institutions, identified serious flaws in the scientific evaluation in the Renewal Assessment Report (RAR) as the reason for EFSA’s failure to classify glyphosate as a carcinogen.10 11 Both papers focused on a comparison between the EFSA and IARC evaluations. For the present analysis, it is important to note that ECHA recently published an updated Guidance (version 5.0), which, however, was not yet in place when its opinion on glyphosate was developed.12 Here, therefore, we refer to version 4.1 of this Guidance, which was applicable during glyphosate’s evaluation.8 Based on scrutiny of the available documents,1 3 11 13 14 we show that EFSA’s and ECHA’s main reasoning for not classifying glyphosate as a carcinogen appears to be inconsistent with, and in some instances a direct violation of, the relevant guidance and guideline documents from the Organisation for Economic Co-operation and Development (OECD) and from ECHA itself.5–7
Although the European pesticide Regulation (EC) 1107/2009 in principle prohibits the marketing of a pesticide classified as a presumed human carcinogen (category 1B),15 the classification itself is governed by the CLP Regulation (EC) 1272/2008.2 According to this regulation, a chemical is classified as a category 1B carcinogenic hazard (presumed human carcinogen) if there is ‘sufficient evidence of carcinogenicity’ in experimental animals. In section 3.6.2.2.3(b) of its Annex I, ‘sufficient evidence’ is defined as a causal relationship established between the agent and an increased incidence of malignant neoplasms, or of an appropriate combination of benign and malignant neoplasms, in at least two independently conducted valid animal studies. Where some evidence of carcinogenicity exists but is ‘not sufficiently convincing’, the CLP Regulation provides for classification in category 2, as a ‘suspected human carcinogen’.2
As well as determining whether a sufficient number of studies with positive findings exists (‘sufficient evidence’), the assessment evaluates the ‘strength’ of evidence. According to section 3.6.2.2.3 of Annex I to the CLP Regulation, strength of evidence involves the enumeration of tumours in human and animal studies and the determination of their level of statistical significance.
In other words, strength of evidence normally requires that the observed increase in tumour incidence be statistically significant. If such strong (statistically significant) evidence is seen in at least two independent studies, it may be considered ‘sufficient’ for category 1B. Because of the variability of biological systems, a number of additional factors must also be taken into account. Therefore, according to ECHA, expert judgement is needed to determine the most appropriate category for carcinogenicity.5 This expert judgement, combined with the interpretative methods of the WoE approach,16 is supposed to be guided by several documents.2 5–7 In our opinion, both the WoE approach and expert judgement need to be transparent and exercised within the limits set by the guidance documents, in order to prevent them from drifting away from science-based decisions, possibly to the advantage of certain interest groups.16
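The basic counting rule described above can be sketched in Python. This is a deliberately simplified, hypothetical reading of the CLP animal-evidence criteria: the `AnimalStudy` record and the `clp_animal_category` function are illustrative names, and the sketch ignores human data, the formal independence requirements for studies, and all the WoE factors and expert judgement that the Regulation also requires. Treating a single positive study as pointing towards category 2 is our simplifying assumption, not a rule taken from the Regulation.

```python
from dataclasses import dataclass


@dataclass
class AnimalStudy:
    """Hypothetical, simplified record of one long-term animal study."""
    name: str
    valid: bool                 # study judged valid/acceptable
    significant_increase: bool  # statistically significant increase in malignant
                                # (or combined benign + malignant) neoplasms


def clp_animal_category(studies: list[AnimalStudy]) -> str:
    """Simplified starting point for classification from animal data only.

    Assumptions: studies are independently conducted; WoE factors and
    expert judgement, which may raise or lower the outcome, are not modelled.
    """
    positives = [s for s in studies if s.valid and s.significant_increase]
    if len(positives) >= 2:
        # Positive findings in at least two independent valid studies:
        # 'sufficient evidence' -> presumed human carcinogen
        return "1B"
    if len(positives) == 1:
        # Some evidence, but possibly 'not sufficiently convincing'
        return "2"
    return "no classification"


if __name__ == "__main__":
    studies = [
        AnimalStudy("hypothetical study A", valid=True, significant_increase=True),
        AnimalStudy("hypothetical study B", valid=True, significant_increase=True),
        AnimalStudy("hypothetical study C", valid=True, significant_increase=False),
    ]
    print(clp_animal_category(studies))  # prints 1B
```

The sketch only captures the arithmetic of ‘how many valid positive studies’; the controversy discussed below concerns how the inputs (which increases count as significant) and the subsequent weighting were determined.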
Concerning the classification of glyphosate’s carcinogenicity, the controversy begins with the statistical assessment, that is, the determination of the strength of evidence. This involves two distinct questions: (1) whether pairwise comparisons or trend tests are more appropriate and (2) whether one-sided or two-sided tests should be used.
Pairwise comparisons or trend tests
A pairwise comparison analyses whether the incidence in just one dose group is increased over that in the control group. In contrast, a trend test asks whether the tumour incidence increases with rising dose across all dose groups, as compared with the control.6 According to OECD, a trend test generally has a more specific hypothesis and greater power than a pairwise comparison.6 This of course presupposes a study design with more than one dose group. OECD’s flowchart for analysing tumour incidences, which depicts a trend test, implies that this method is preferred.6 In our opinion, trend test results should not be played off against those from pairwise comparisons, as EFSA and ECHA did.1 3 OECD considers significance in either kind of test sufficient,6 and ECHA itself acknowledges that any statistically significant increase in tumour incidences is generally taken as positive evidence of carcinogenicity.5
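To illustrate why a trend test can detect an effect that pairwise comparisons miss, the following Python sketch applies a Cochran–Armitage trend test (one common choice of trend test, used here as an assumption) and one-sided Fisher’s exact tests to invented incidences: 0, 1, 2 and 4 tumours in groups of 50 animals (control plus three dose groups). The equally spaced dose scores 0 to 3 are also an assumption; actual doses could be used instead. The function names are hypothetical, and the trend test uses the usual normal approximation.

```python
from math import comb, erfc, sqrt


def cochran_armitage_p(tumours, animals, scores):
    """One-sided (increasing trend) Cochran-Armitage test, normal approximation."""
    n_total = sum(animals)
    p_bar = sum(tumours) / n_total  # pooled tumour incidence under H0
    # Score-weighted excess of observed over expected tumours
    t = sum(d * (r - n * p_bar) for d, r, n in zip(scores, tumours, animals))
    s1 = sum(n * d for n, d in zip(animals, scores))
    s2 = sum(n * d * d for n, d in zip(animals, scores))
    variance = p_bar * (1 - p_bar) * (s2 - s1 * s1 / n_total)
    z = t / sqrt(variance)
    return 0.5 * erfc(z / sqrt(2))  # upper-tail probability P(Z >= z)


def fisher_one_sided_p(ctrl_tumours, ctrl_n, dose_tumours, dose_n):
    """One-sided Fisher exact test: P(dose group has >= observed tumours | margins)."""
    total = ctrl_n + dose_n
    k = ctrl_tumours + dose_tumours  # total tumours, fixed by the margins
    # Hypergeometric upper tail for the number of tumours in the dose group
    tail = sum(
        comb(k, x) * comb(total - k, dose_n - x)
        for x in range(dose_tumours, min(k, dose_n) + 1)
    )
    return tail / comb(total, dose_n)


# Invented example data: control + three dose groups, 50 animals each
TUMOURS = [0, 1, 2, 4]
ANIMALS = [50, 50, 50, 50]
SCORES = [0, 1, 2, 3]

if __name__ == "__main__":
    print(f"trend test: p = {cochran_armitage_p(TUMOURS, ANIMALS, SCORES):.3f}")
    for i in range(1, 4):
        p = fisher_one_sided_p(TUMOURS[0], ANIMALS[0], TUMOURS[i], ANIMALS[i])
        print(f"dose {i} vs control: p = {p:.3f}")
```

With these invented numbers the trend test gives p of about 0.013, whereas none of the three pairwise comparisons reaches 0.05 (the top dose gives about 0.059), even though all of them use the same one-sided direction.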
One-tailed or two-tailed tests
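A two-tailed test considers significance for either an increase or a decrease in an observation. In contrast, a one-tailed test considers significance in one direction only; for an effect in that direction, its p-value is typically half that of the corresponding two-tailed test, which increases statistical power.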
For the assessment of a carcinogenic hazard, the protection of public health is the primary concern, so only one direction of change is relevant: an increase in tumour incidence. In our opinion, the logical conclusion is that a one-tailed test is the appropriate choice. OECD, however, remains vague. It states that a one-sided test may be considered more appropriate for tumour incidences, but cautions that this can be controversial if the treatment could also be protective, in which case it considers a two-sided comparison more appropriate.6 The reason for OECD’s ambiguity is not clear, but as described above, the nature of the evaluation (hazard assessment) clearly favours one-tailed tests. EFSA and ECHA, by contrast, relied exclusively on two-tailed tests.
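The difference between the two kinds of test can be made concrete for a normally distributed test statistic. The Python sketch below is illustrative only: the function names are hypothetical, the z value of 1.8 and the true shift of 2.5 standard errors are invented, and the 5% critical values (1.6449 one-tailed, 1.9600 two-tailed) are the standard normal quantiles. It shows that the one-tailed p-value is half the two-tailed one, and that this raises power without doubling it.

```python
from math import erfc, sqrt


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * erfc(-x / sqrt(2))


def p_one_tailed(z):
    """P(Z >= z): only an increase counts as evidence."""
    return 0.5 * erfc(z / sqrt(2))


def p_two_tailed(z):
    """P(|Z| >= |z|): an increase or a decrease counts as evidence."""
    return erfc(abs(z) / sqrt(2))


# Critical values for a 5% significance level
Z_CRIT_ONE = 1.6449  # one-tailed
Z_CRIT_TWO = 1.9600  # two-tailed


def power_one_tailed(shift):
    """Power to detect a true increase of `shift` standard errors."""
    return normal_cdf(shift - Z_CRIT_ONE)


def power_two_tailed(shift):
    """Power of the two-tailed test (rejection in either tail)."""
    return normal_cdf(shift - Z_CRIT_TWO) + normal_cdf(-shift - Z_CRIT_TWO)


if __name__ == "__main__":
    z = 1.8
    print(f"z = {z}: one-tailed p = {p_one_tailed(z):.3f}, "
          f"two-tailed p = {p_two_tailed(z):.3f}")
    print(f"power at shift 2.5: one-tailed {power_one_tailed(2.5):.2f}, "
          f"two-tailed {power_two_tailed(2.5):.2f}")
```

For z = 1.8 the one-tailed p-value (about 0.036) is below 0.05 while the two-tailed one (about 0.072) is not; for a true shift of 2.5 standard errors, power rises from about 0.71 to about 0.80.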
Overall evaluation of the regulatory approach to statistical significance
The European authorities applied a double attenuation of the power of statistical analysis regarding glyphosate’s carcinogenicity in animal studies. They preferred pairwise comparisons and exclusively used two-tailed tests. This weakened the strength of evidence, even before biological relevance was considered.
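The combined effect of both choices can be illustrated with another invented data set: 0, 1, 1 and 3 tumours in four groups of 50 animals. The Python sketch below, with hypothetical function names and dose scores 0 to 3 as an assumption, computes a Cochran–Armitage trend test (normal approximation) and a Fisher’s exact test of the top dose against control, each in one-tailed and two-tailed form. The two-tailed Fisher p-value uses the common convention of summing all tables that are no more probable than the observed one.

```python
from math import comb, erfc, sqrt


def trend_p(tumours, animals, scores, two_sided):
    """Cochran-Armitage trend test (normal approximation)."""
    n_total = sum(animals)
    p_bar = sum(tumours) / n_total
    t = sum(d * (r - n * p_bar) for d, r, n in zip(scores, tumours, animals))
    s1 = sum(n * d for n, d in zip(animals, scores))
    s2 = sum(n * d * d for n, d in zip(animals, scores))
    z = t / sqrt(p_bar * (1 - p_bar) * (s2 - s1 * s1 / n_total))
    # Two-tailed: |Z| >= |z|; one-tailed: Z >= z (increase only)
    return erfc(abs(z) / sqrt(2)) if two_sided else 0.5 * erfc(z / sqrt(2))


def fisher_p(ctrl_tumours, ctrl_n, dose_tumours, dose_n, two_sided):
    """Fisher's exact test for one dose group versus control."""
    total = ctrl_n + dose_n
    k = ctrl_tumours + dose_tumours
    denom = comb(total, dose_n)
    # Hypergeometric probabilities for every possible count in the dose group
    probs = {
        x: comb(k, x) * comb(total - k, dose_n - x) / denom
        for x in range(max(0, k - ctrl_n), min(k, dose_n) + 1)
    }
    if two_sided:
        # Sum over all tables no more probable than the observed one
        observed = probs[dose_tumours]
        return sum(p for p in probs.values() if p <= observed * (1 + 1e-7))
    return sum(p for x, p in probs.items() if x >= dose_tumours)


# Invented example data: control + three dose groups, 50 animals each
TUMOURS = [0, 1, 1, 3]
ANIMALS = [50, 50, 50, 50]
SCORES = [0, 1, 2, 3]

if __name__ == "__main__":
    for two_sided in (False, True):
        label = "two-tailed" if two_sided else "one-tailed"
        p_trend = trend_p(TUMOURS, ANIMALS, SCORES, two_sided)
        p_pair = fisher_p(TUMOURS[0], ANIMALS[0], TUMOURS[3], ANIMALS[3], two_sided)
        print(f"{label}: trend p = {p_trend:.3f}, top dose vs control p = {p_pair:.3f}")
```

With these invented numbers only the one-tailed trend test is significant at the 5% level (p of about 0.034); the two-tailed trend test gives about 0.068 and the two-tailed pairwise comparison about 0.24, which shows in a simplified setting how the two choices compound.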
Carcinogenicity of glyphosate: the database
The authorities recognised significantly increased tumour incidences in 7 of 12 long-term studies (2 in rats and 5 in mice), with a total of 11 significant increases when the trend test was applied. The data, derived from an official document, the CLH Report,13 are summarised in table 1. Following EFSA’s and ECHA’s approach, the results of two-tailed tests are shown. On this basis there was, for instance, a statistically significant increase in renal tumours and malignant lymphomas in 3/5 mouse studies and in haemangiosarcoma in 2/5 mouse studies. Using one-tailed tests, four more tumour incidences would become significant. In addition, a reanalysis of the original study data revealed eight further significant increases when one-tailed trend tests were used (not shown in table 1); these were not considered by the authorities.9
IARC evaluated all published evidence and identified five case-control studies from Canada, Sweden and the USA that showed a statistically significant association between non-Hodgkin’s lymphoma (NHL) and glyphosate use, whereas a large cohort study, albeit with a follow-up time of only 6.7 years for NHL,10 was negative. IARC concluded that this represents ‘limited evidence’ for the carcinogenicity of glyphosate.4 In the RAR, three of these five publications were discussed and initially dismissed as unreliable because information about confounding factors was allegedly missing.11 An expert opinion demonstrated that these allegations had no basis,17 and the German Federal Institute for Risk Assessment (BfR) finally agreed that ‘limited evidence’ existed for an association between glyphosate and NHL.11 Moreover, two meta-analyses confirmed a statistically significant increase in NHL after occupational exposure to glyphosate.18 19 In EFSA’s conclusion, this was downgraded to ‘very limited evidence’ (a term without legal definition), which was considered ‘overall inconclusive’.3 ECHA shared this view, concluding that the criteria for assigning CLP category 2 were not fulfilled.1
Remarkably, both authorities discussed the epidemiological evidence for NHL in isolation and not as supportive evidence for the observed increased incidence of malignant lymphoma in three different mouse studies.
Weight of evidence
While there are different definitions of the WoE approach,6 its ultimate goal in hazard and risk assessment is to balance statistical significance (in this case, the observed increases in tumour incidences) against biological relevance. One problem is that WoE is often used ‘to refer to a body of scientific evidence that has been examined for some purported risk without any interpretative methodology’.16 Notably, this is how EFSA and ECHA use WoE: they refer to WoE elements as defined in the CLP Regulation, but do not make transparent how the weighting was performed. Furthermore, the way they used important WoE elements is problematic, as described below.
According to OECD Guidance 116, even non-significant increases could in principle be considered relevant based on WoE.6 However, in the case of glyphosate the authorities used their WoE approach to dismiss the 11 statistically significant increases they had previously recognised. Focusing on mouse studies, the most important WoE elements used by EFSA and ECHA are discussed below. Notably, one of the most important elements—dose-dependence—was completely neglected by both agencies. The ECHA guidance states: ‘Any statistically significant increase in tumour incidence, especially where there is a dose-response relationship, is generally taken as positive evidence of carcinogenic activity’.5 Table 2 shows that such a dose-dependent relationship existed in at least two studies each for renal tumours and malignant lymphomas.
It should be noted that in the Kumar (2001) study only renal adenomas were seen, while in the Knezevich and Hogan (1983) study the dose-dependent increase was for renal carcinoma. One striking difference between the two studies was the study duration (18 vs 24 months). Thus, the possibility exists that carcinoma could have developed in the subsequent 6 months if the Kumar (2001) study had been of 24 months duration.
Appropriateness of the top dose
BfR and EFSA incorrectly referred to an alleged ‘limit dose’ of 1000 mg/kg body weight per day to claim that the top doses used in the studies were inappropriately high and therefore irrelevant for the hazard assessment.3 8 11 13 14 However, according to the relevant guidance and guideline documents,6 20 a limit dose of 1000 mg/kg body weight per day applies only to chronic toxicity studies, not to carcinogenicity studies. ECHA, in contrast, relies on the criterion of the maximum tolerated dose (MTD) set out in OECD guidance.6 As an example, this guidance describes that the MTD can be identified by a reduction in body weight gain of up to 10%, and that a decrease of more than 10% indicates that the MTD has been exceeded. ECHA claimed that the MTD was exceeded in the Knezevich and Hogan (1983) and Sugimoto (1997) studies because body weight gain in the high dose groups was more than 15% lower than in controls. However, ECHA failed to take into account that in the Sugimoto (1997) study the decrease in body weight gain was accompanied by a similar reduction in food consumption, most likely because the high glyphosate concentrations in the test diet (above 4000 mg/kg diet) reduced its palatability (food consumption data for the other study were not available to us). Thus, the more than 15% decrease in body weight gain was unlikely to be the result of excessive toxicity. In conclusion, the top doses in all five mouse studies were appropriate, although they were rather high in two of the studies.
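The reasoning about body weight gain and food consumption can be expressed as a simple screen. The Python sketch below is a hypothetical illustration, not a procedure taken from the guidance or from ECHA: the 10% threshold follows the guidance example quoted above, while the 5-percentage-point `food_tolerance` used to decide that a drop in food intake ‘parallels’ the drop in body weight gain is our own assumption. All numbers in the usage example are invented, and a parallel drop in food intake indicates a possible palatability effect without proving it.

```python
def percent_reduction(control, treated):
    """Percentage reduction of a treated-group value relative to control."""
    return 100.0 * (control - treated) / control


def assess_top_dose(ctrl_gain, top_gain, ctrl_food=None, top_food=None,
                    mtd_limit=10.0, food_tolerance=5.0):
    """Rough screen of whether the top dose exceeded the MTD.

    mtd_limit: body weight gain reduction (%) above which the MTD is taken
        to be exceeded (the 10% example from the guidance).
    food_tolerance: hypothetical margin (percentage points) within which a
        parallel drop in food consumption is read as a palatability effect.
    """
    bw_drop = percent_reduction(ctrl_gain, top_gain)
    if bw_drop <= mtd_limit:
        return "within MTD"
    if ctrl_food is not None and top_food is not None:
        food_drop = percent_reduction(ctrl_food, top_food)
        if abs(bw_drop - food_drop) <= food_tolerance:
            return "reduced food intake (possible palatability effect)"
    return "MTD possibly exceeded"


if __name__ == "__main__":
    # Invented values: body weight gain (g) and food consumption (g/day)
    print(assess_top_dose(20.0, 16.8))            # MTD possibly exceeded
    print(assess_top_dose(20.0, 16.8, 5.0, 4.3))  # reduced food intake (possible palatability effect)
```

The same 16% reduction in body weight gain is read differently depending on whether food consumption data are available, which mirrors the point made about the Sugimoto (1997) study.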
Reproducibility of effects
Qualitatively, significant increases were reproduced for haemangiosarcoma (2/5 studies); malignant lymphoma (3/3 studies; two further studies were unsuitable for comparison) and renal tumours (3/5 studies) (table 1). EFSA did not acknowledge this reproducibility at all. ECHA acknowledged that for renal tumours there ‘was a positive trend in male mice’, but claimed that ‘the findings were not consistent across all studies’.1 Using quantitative comparisons, the authorities concluded that no evidence for carcinogenicity existed due to lack of consistency of the effects.3 However, in order to make quantitative comparisons, much more stringent requirements for comparability should be applied. We contend that the situation is analogous to the use of historical control data (HCD), where quantitative comparisons of spontaneous tumour incidences are made. Therefore, if a quantitative comparison is intended, the requirements for comparing study results should be similar to those required for HCD (see below). The glyphosate carcinogenicity studies do not offer this degree of comparability. Therefore, in our opinion, quantitative comparisons should not be made.
Historical control data
HCD (tumour incidences of control animals in earlier studies) can help to interpret the data of the study under consideration. According to applicable OECD guidance, they should only be used if the concurrent control data are appreciably ‘out of line’ with recent previous studies.6 The same guidance emphasises ‘that the concurrent control group is always the most important consideration in the testing for increased tumour rates’ and defines strict requirements for the appropriateness of HCD. According to this guidance, HCD should be from the same laboratory and the same strain of animals and should be collected within a maximum of 5 years prior to the actual study. Furthermore, the median and IQR should be used instead of the arithmetic mean and range.6 BfR and EFSA made extensive use of HCD to dismiss the significant tumour findings, but used HCD from time periods up to 17 years beyond the 5-year limit, from seven different laboratories, and sometimes different substrains, and described the HCD by arithmetic mean and simple range instead of median and IQR. ECHA also relied on these inappropriate HCD. ECHA mentioned the existence of guidance-compliant HCD but did not point out that these HCD actually supported the observation of increased tumour incidences for haemangiosarcoma (one study), malignant lymphoma (two studies) and renal tumours (one study).1 No valid HCD were presented by the authorities to support the opposite conclusion.
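The HCD requirements listed above (same laboratory, same strain, at most 5 years before the study, summarised by median and IQR) can be sketched as a filter plus a summary. In this Python sketch the `ControlGroup` record, the function names and all data are hypothetical; quartiles are computed with the default (‘exclusive’) method of Python’s `statistics.quantiles`, and other quartile conventions give slightly different values.

```python
from dataclasses import dataclass
from statistics import median, quantiles


@dataclass
class ControlGroup:
    """Hypothetical record of one historical control group."""
    laboratory: str
    strain: str
    year: int             # year the historical study was conducted
    incidence_pct: float  # tumour incidence in control animals (%)


def eligible_hcd(groups, laboratory, strain, study_year, max_years=5):
    """Keep only controls from the same laboratory and strain, collected
    at most `max_years` before the study under evaluation."""
    return [
        g for g in groups
        if g.laboratory == laboratory
        and g.strain == strain
        and 0 <= study_year - g.year <= max_years
    ]


def median_iqr(groups):
    """Median and interquartile range (Q1, Q3) of control incidences."""
    values = [g.incidence_pct for g in groups]
    q1, _, q3 = quantiles(values, n=4)  # needs at least two groups
    return median(values), (q1, q3)


if __name__ == "__main__":
    history = [
        ControlGroup("Lab A", "strain X", 1999, 2.0),
        ControlGroup("Lab A", "strain X", 1998, 4.0),
        ControlGroup("Lab A", "strain X", 1997, 6.0),
        ControlGroup("Lab A", "strain X", 1996, 8.0),
        ControlGroup("Lab A", "strain X", 1985, 20.0),  # too old
        ControlGroup("Lab B", "strain X", 1999, 15.0),  # other laboratory
        ControlGroup("Lab A", "strain Y", 1999, 12.0),  # other substrain
    ]
    valid = eligible_hcd(history, "Lab A", "strain X", study_year=2001)
    print(median_iqr(valid))  # (5.0, (2.5, 7.5))
```

Pooling all seven invented groups instead would give a range of 2% to 20%, wide enough to make almost any concurrent finding look ‘within historical range’, which illustrates why the guidance restricts the comparison.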
Multisite responses
Multisite responses (tumours in different organs/tissues in the same study) increase the level of concern for a carcinogenic effect.5 These were observed in 3/5 mouse studies. While EFSA did not discuss multisite responses at all,3 ECHA acknowledged two of the three studies with multisite responses,1 but failed to incorporate this element into its WoE assessment.
Progression of lesions to malignancy
This WoE element was not discussed by EFSA.3 ECHA acknowledged a progression for renal tumours, but considered the evidence equivocal.1 According to ECHA, progression from renal adenoma to carcinoma was seen in one study, but not in two other studies, in which only adenomas were observed. However, ECHA did not take into account the shorter duration of those two studies: the study with both renal adenoma and carcinoma lasted 24 months, whereas the two adenoma-only studies lasted 18 months (see table 2). Therefore, instead of claiming that the evidence is equivocal, ECHA should have acknowledged that the studies are not comparable.
Effects seen in only one sex
According to applicable guidance,8 effects seen only in one sex may be less convincing. However, the same guidance states that there is no requirement for a mechanistic understanding of tumour induction in one sex only in order to use such findings to support classification in a given hazard category. With one exception—malignant lymphoma in the Kumar (2001) study—significant increases for haemangiosarcoma, malignant lymphoma and renal tumours were seen in male mice only. While EFSA did not pay attention to this WoE element,3 ECHA noted that the apparent sex differences in response remain unexplained, a factor that lowers the consistency of the findings.1 This is a valid observation, but the other five WoE elements discussed above point in the opposite direction and support the classification of glyphosate as a carcinogenic hazard. Therefore ECHA, in reproducing EFSA’s conclusion that no hazard classification for carcinogenicity is warranted, failed to objectively weigh the evidence.
After the IARC monograph was published, BfR reanalysed the data and wrote an Addendum to its own report, in which it stated that it had identified 11 significantly increased tumour incidences in two rat and five mouse studies. This reanalysis was the basis for EFSA’s Conclusion and, essentially, for ECHA’s Opinion. According to the applicable regulation,2 these 11 statistically significant findings are more than sufficient to categorise glyphosate as a ‘presumed human carcinogen’ (category 1B). Nevertheless, both authorities, while claiming to have used a WoE approach, concluded that these findings did not even warrant a category 2 classification (‘suspected human carcinogen’), and did not classify glyphosate as a carcinogen. As demonstrated here, this conclusion was based on multiple deviations from the proper use of important WoE elements. Applying the existing rules and guidance together with a transparent WoE approach supports the finding of statistically significant tumour effects caused by glyphosate and warrants its classification as a presumed human carcinogen.
Contributors PC wrote the draft text, which CR and HB-S commented on and supplemented.
Funding This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests PC is a member of the executive board of Pesticide Action Network (PAN Germany), without remuneration. For writing a report related to the topic of this manuscript, he received funding from Global 2000 (Friends of the Earth Austria). CR receives a salary from GMWatch and HB-S receives a salary from Global 2000. Their work on the manuscript of this paper was, however, unpaid. The organisations the authors are affiliated with had no role in the analysis and interpretation of the data or in the preparation and review of the manuscript, though open access is funded jointly by PAN Germany and Global 2000.
Patient consent Not required.
Provenance and peer review Commissioned; externally peer reviewed.