
Environmental Pollution 234 (2018) 297–306

Contents lists available at ScienceDirect

Environmental Pollution
journal homepage: www.elsevier.com/locate/envpol

Suspect screening and non-targeted analysis of drinking water using point-of-use filters*
Seth R. Newton a, *, Rebecca L. McMahen a, b, Jon R. Sobus a, Kamel Mansouri b, c, 1,
Antony J. Williams c, Andrew D. McEachran b, c, Mark J. Strynar a
a United States Environmental Protection Agency, National Exposure Research Laboratory, Research Triangle Park, NC 27709, United States
b Oak Ridge Institute for Science and Education Research Participant, 109 T.W. Alexander Drive, Research Triangle Park, NC 27709, United States
c United States Environmental Protection Agency, National Center for Computational Toxicology, Research Triangle Park, NC 27709, United States

Article info

Article history:
Received 27 June 2017
Received in revised form 7 November 2017
Accepted 8 November 2017
Available online 26 November 2017

Keywords:
Drinking water
Exposome
Suspect screening
Non-target analysis
High resolution mass spectrometry

Abstract

Monitored contaminants in drinking water represent a small portion of the total compounds present, many of which may be relevant to human health. To understand the totality of human exposure to compounds in drinking water, broader monitoring methods are imperative. In an effort to more fully characterize the drinking water exposome, point-of-use water filtration devices (Brita® filters) were employed to collect time-integrated drinking water samples in a pilot study of nine North Carolina homes. A suspect screening analysis was performed by matching high resolution mass spectra of unknown features to molecular formulas from EPA's DSSTox database. Candidate compounds with those formulas were retrieved from the EPA's CompTox Chemistry Dashboard, a recently developed data hub for approximately 720,000 compounds. To prioritize compounds into those most relevant for human health, toxicity data from the US federal collaborative Tox21 program and the EPA ToxCast program, as well as exposure estimates from EPA's ExpoCast program, were used in conjunction with sample detection frequency and abundance to calculate a “ToxPi” score for each candidate compound. From ~15,000 molecular features in the raw data, 91 candidate compounds were ultimately grouped into the highest priority class for follow up study. Fifteen of these compounds were confirmed using analytical standards including the highest priority compound, 1,2-Benzisothiazolin-3-one, which appeared in 7 out of 9 samples. The majority of the other high priority compounds are not targets of routine monitoring, highlighting major gaps in our understanding of drinking water exposures. General product-use categories from EPA's CPCat database revealed that several of the high priority chemicals are used in industrial processes, indicating the drinking water in central North Carolina may be impacted by local industries.
Published by Elsevier Ltd.

* This paper has been recommended for acceptance by Maria Cristina Fossi.
* Corresponding author. 109 TW Alexander Dr., Durham, NC 27709, United States.
E-mail address: [email protected] (S.R. Newton).
1 Current Affiliation: Scitovation LLC, Research Triangle Park, NC 27709, United States.

https://doi.org/10.1016/j.envpol.2017.11.033
0269-7491/Published by Elsevier Ltd.

1. Introduction

Safe drinking water supplies are critical for public health and it has been estimated by the World Health Organization (WHO) that a 10% reduction in worldwide disease could be achieved by improvements related to drinking water alone, including sanitation, hygiene, and water resource management (Prüss-Üstün et al., 2008). Furthermore, it is estimated that 70–90% of disease risks are due to differences in environments (Rappaport and Smith, 2010), which includes direct exposures via consumption of drinking water. Chemicals that are present in water supplies can increase risk for disease and adverse health outcomes over long-term exposure periods (WHO, 2013). It has been demonstrated for various chemical classes, including perfluorinated chemicals, that drinking water can be one of the most important pathways for human exposure (Egeghy and Lorber, 2011; Lorber and Egeghy, 2011). Even so, it has been estimated that only 40% of US consumers used any kind of water purification device in 2014 (Anumol et al., 2015). Certain chemicals are regulated under the Safe Drinking Water Act, but these chemicals constitute only a small fraction of the number of chemicals present in drinking water (US EPA, 2016). New compounds can be added to this list if they are
discovered and deemed to pose a threat to human health. These additions, however, require developing and validating “targeted” methods, which is a slow and expensive process. Furthermore, this process requires some a priori knowledge of the compounds for which methods should be developed. As of yet, there is no reliable mechanism to identify and prioritize novel compounds. There are needs, then, for: 1) a more complete picture of chemical exposures via drinking water consumption; 2) methods of rapidly identifying emerging chemicals that may be of importance to human health; and 3) means with which to properly assess exposure-disease relationships and risks to human health (Villanueva et al., 2014).

Recent advances in analytical techniques have led to the detection of various contaminants in water which would have otherwise gone undetected using traditional targeted methods (Schymanski et al., 2015; Strynar et al., 2015). These advanced techniques often employ high resolution mass spectrometry (HRMS), or tandem HRMS, to either match unknown sample features to compounds within spectral and/or spectra-less databases (a technique known as suspect screening analysis [SSA]), or elucidate structures of unknowns that may not be contained in a database (a technique known as non-targeted analysis [NTA]). While these two techniques differ, they are often discussed together as they are complementary to each other. SSA/NTA workflows are rapidly evolving, and are becoming more frequently used to detect differences (or similarities) between two or more groups of samples in case-control style experiments. Example applications include: detecting a chemical spill in a river after a baseline chemical signature has been established (Bader et al., 2016); evaluating the contribution of various tributaries to a river (Ruff et al., 2015); or singling out unknown features that appear in landfill leachate and in downstream drinking water (Müller et al., 2011).

SSA/NTA approaches may also be applied to environmental samples in support of general monitoring, that is, to broadly screen for the occurrence of chemicals in a selected medium. The ability to rapidly identify unknown compounds during routine monitoring is essential to fully explore the exposome, defined as the sum of all exposures (exogenous and endogenous) for an individual over a lifetime (Wild, 2005). In order to sequence the exposome, it is useful and necessary, from an analytical standpoint, to compartmentalize exposures by matrix. Examples of monitoring studies that focus on a specific matrix can be found for dust (Rager et al., 2016), river water (Schymanski et al., 2015), waste water (Schymanski et al., 2014b), etc., but drinking water remains relatively unexplored with regard to SSA/NTA. This is somewhat surprising, as drinking water is a fairly simple matrix to which humans are exposed in similar amounts, in contrast to dust or waste water, which require clean-up steps after extraction, and for which exposure amounts are not well known.

When applied to environmental and biological samples, SSA/NTA methods have the potential to allow rapid chemical characterization without the need for standards or a priori knowledge of sample constituents. Confidence in the identification of unknowns can be communicated in terms of levels outlined by Schymanski et al. (2014a), where the highest level of confidence (level 1) requires confirmation by an analytical standard, and the next level of confidence (level 2) requires evidence for a probable structure. A goal for researchers using SSA/NTA methods should be to confidently classify as many unknowns as possible into level 2, and not necessarily level 1, as it is not practical, or even possible, to confirm all unknowns with analytical standards. Chemicals of highest concern can then be confirmed with standards, if possible, and categorized into level 1. Confidence in level 2 identifications will most likely come about through the development of several different tools that build increasing confidence of positive detection. As we are in the early years of a burgeoning exposomics field, researchers must find ways to prioritize unknowns into those that they believe are most likely to be relevant to human and environmental health (Sobus et al., 2017). Recently, a method to prioritize the vast number of unknowns in a sample by incorporating toxicity and exposure information was presented by Rager et al. (2016). We have sought to apply this method to drinking water in the Raleigh/Durham/Chapel Hill area of North Carolina, United States, and improve upon it using tools and data available from EPA's CompTox Chemistry Dashboard (hereafter referred to as “the Dashboard”, https://comptox.epa.gov/dashboard), a newly developed web application that supports SSA/NTA workflows (McEachran et al., 2017b). We have also sought to demonstrate that SSA/NTA methods can rapidly identify contaminants in drinking water that are not routinely monitored and would likely go undetected without these methods.

2. Materials and methods

2.1. Materials

Information about the materials used in this study can be found in the Supporting Information (SI).

2.2. Sample collection

Samples were collected in a pilot scale study by installing a Brita® Basic Faucet Filter in the homes of nine North Carolina residents. Provided in the SI is a list of chemicals that Brita® Basic Faucet Filters are known to remove from drinking water (SI, Table S1), as well as a table of organic chemicals included in the Safe Drinking Water Act (Table S2). Some residents received drinking water from their local municipalities, while other residents received their drinking water from a private well. Information about the water source and municipality can be found in Table 1. Although the samples are labeled by location, many of the drinking water treatment facilities report purchasing water from other facilities, so it is possible the sampling location is not fully indicative of the original drinking water source. The study participants were asked to use the filter for cold water during everyday use until the indicator light on the filter turned red, signaling that the filter was at its maximum capacity. This process took between 1 and 4 months for each sample with an average sampling time of 68 days. The participants were asked to return their filters for analysis upon seeing the red indicator light.

Table 1
Sample information.

Sample #  Location     Source Type   Population Served
1         Durham       Municipal     265,472
2         Durham       Municipal     265,472
3         Apex         Municipal     46,831
4         Cary         Municipal     182,088
5         Chapel Hill  Municipal     83,300
6         Chapel Hill  Private Well  –
7         Raleigh      Municipal     540,000
8         Pittsboro    Municipal     4,401
9         Pittsboro    Private Well  –

2.3. Sample extraction and processing

The filter was removed from the plastic casing using a band saw with a clean blade and placed into a plastic bag for storage until extraction. The filters were individually lyophilized for three days to remove any water which remained in the filter pores. The filters
were extracted via Soxhlet using 300 mL of a dichloromethane:methanol (80:20 v/v) mixture for 24 h. Upon completion, the flasks were cooled for 30 min before the solvent was removed under reduced pressure using a rotary evaporator. The extract was re-dissolved in 5 mL of methanol and centrifuged at 12,500 × g for 3 min to remove particles from suspension. One hundred µL of sample was mixed with 300 µL of 2 mM ammonium acetate buffer in an autosampler vial for analysis.

2.4. Instrumental analysis

Liquid Chromatography (LC) - Time-of-Flight (TOF) HRMS analysis was carried out using an Agilent 1100 HPLC (Agilent Technologies, Palo Alto, CA), interfaced with an Agilent 6210 TOF HRMS. Chromatographic separation was accomplished using an Eclipse Plus C8 column (2.1 × 50 mm, 3.5 µm; Agilent Technologies, Palo Alto, CA). The method consisted of the following conditions: 0.2 mL/min flow rate; column at 30 °C; mobile phase A as ammonium formate buffer (0.4 mM) and DI water:methanol (95:5 v/v), and mobile phase B as ammonium formate (0.4 mM) and methanol:DI water (95:5 v/v); gradient: 0–25 min linear gradient from 75:25 A:B to 15:85 A:B; 25–40 min linear gradient from 15:85 A:B to 100% B; 40–45 min hold at 100% B. The TOF-HRMS was fitted with an electrospray ionization source, which operated in both negative and positive ionization modes (separate injection for each mode), using a fragmentor voltage of 80 V. Data were collected in 4 GHz high resolution mode, collecting ions in m/z range 100–1700 in both centroid and profile data formats. Further details on instrumental parameters can be found in Table S3 (SI).

2.5. Molecular feature detection and chemical formula assignment

Molecular feature extraction and formula assignment were performed according to previously published methods (Rager et al., 2016). Briefly, molecular features (defined as an exact mass, retention time, and isotope cluster of an apparent unknown compound) were identified and extracted using Agilent MassHunter 6.0 Qualitative Software's molecular feature extractor (MFE). Features were extracted from the method blanks and solvent blanks first and the masses of those features were used in a “mass exclusion list” when extracting features from the samples. MassHunter was then used to match molecular features from the samples to chemical formulas contained in EPA's Distributed Structure-Searchable Toxicity database V2 (DSSTox_V2). This database contains a list of 16,532 unique formulas (de-salted) which correspond to 33,659 chemicals. Feature matches were scored based on neutral accurate mass, isotope distribution, and isotope ratio. While DSSTox_V2 contains chemical compounds, it was used only to assign molecular formulas since isomers cannot be distinguished using the methods described here (which consider molecular MS spectra only). Newer versions of the DSSTox database, including the version which is accessed by the Dashboard (approximately 760,000 chemicals as of November 2017), contain many more chemicals; however, the de-salted forms of the molecular formulas were not available at the time the database matching for this study was conducted. Molecular formulas were only assigned to features which attained a match score of at least 90. Further details on the software settings for the MFE and database search can be found in Table S3 (SI).

2.6. Assignment of probable structure from molecular formulas

The workflow for assigning structures to formulas and prioritizing those structures is shown in Fig. 1. Candidate structures associated with molecular formulas were retrieved from the Dashboard using the Batch Search capability (https://comptox.epa.gov/dashboard/dsstoxdb/batch_search). In this manner, the most likely candidate structures are retrieved and ordered by the number of data sources associated with each structure. Data sources in this context represent the number of times an EPA dataset, database, or list within DSSTox contains a particular chemical. This workflow follows previous reports on the identification of “known unknowns” by Little et al. (2012). Additionally, it has been demonstrated using the EPA Dashboard that the candidate compound with the greatest number of data sources is the correct compound for a given formula in over 80% of cases (McEachran et al., 2017b).

Bioactivity and exposure data for some of these structures were available from the Tox21/ToxCast (US EPA, 2015) and ExpoCast (Wambaugh et al., 2013) projects, respectively, and accessible via the Dashboard. Compounds for which toxicity and exposure data were available were labeled as “Group A” compounds, whereas compounds missing one or both of these data types were labeled as “Group B”. Multiple candidate compounds often existed for a given formula, with some being Group A compounds and some being Group B compounds. For Group A compounds, a bioactivity ratio was calculated as the number of assay hits divided by the total number of assays tested. Exposure categories were calculated from ExpoCast daily exposure estimates using the categorization described by Rager et al. (2016):

Category 1: <1 × 10^-8 mg/kg/day;
Category 2: ≥1 × 10^-8 mg/kg/day and <1 × 10^-7 mg/kg/day;
Category 3: ≥1 × 10^-7 mg/kg/day and <1 × 10^-6 mg/kg/day;
Category 4: ≥1 × 10^-6 mg/kg/day and <1 × 10^-5 mg/kg/day;
Category 5: ≥1 × 10^-5 mg/kg/day and <1 × 10^-4 mg/kg/day;
Category 6: ≥1 × 10^-4 mg/kg/day and <1 × 10^-3 mg/kg/day; and
Category 7: ≥1 × 10^-3 mg/kg/day and <1 × 10^-2 mg/kg/day.

A ToxPi score was calculated for each Group A compound (i) using its bioactivity (B) ratio, exposure category (E), detection frequency (DF), and abundance (average chromatographic peak area, A), according to equation (1). All values for E, DF, and A were log-transformed before applying equation (1) due to the skewed nature of their distributions.

\[
\text{ToxPi Score} = \frac{B_i - B_{\min}}{B_{\max} - B_{\min}} + \frac{E_i - E_{\min}}{E_{\max} - E_{\min}} + \frac{DF_i - DF_{\min}}{DF_{\max} - DF_{\min}} + \frac{A_i - A_{\min}}{A_{\max} - A_{\min}} \tag{1}
\]

Equal weight was given to each category despite the precedent of weighting some categories differently (Rager et al., 2016).

All compounds were further subcategorized with a “1” if the compound had the highest number of data sources for its formula, or a “2” if it did not. Compounds in Group A were also subcategorized with an “a” if the compound had the highest ToxPi score for its formula, and a “b” if it did not. Thus, all compounds fell into one of six categories: A1a, A2a, A1b, A2b, B1, or B2 (Fig. 1), with A1a compounds being the most likely structures and highest ToxPis for their formulas and thus the highest priority group.

2.7. Literature search

Three databases were searched to assess the prevalence of A1a compounds in the literature: SciFinder®, Google Scholar, and

Fig. 1. Workflow for processing data and categorizing candidate compounds.
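The labeling step of this workflow can be sketched in a few lines of Python. This is only an illustration of the A/B, 1/2, and a/b rules given in Section 2.6: the record fields, the function name `label_candidates`, the example values, and the treatment of ties (every candidate tied for the maximum counts as highest) are assumptions, not the study's actual implementation.

```python
from collections import defaultdict

def label_candidates(candidates):
    """Return {name: label} following the Group A/B, 1/2, a/b rules.

    Each candidate is a dict with hypothetical keys: name, formula,
    n_sources, and toxpi (None when toxicity or exposure data are missing).
    """
    by_formula = defaultdict(list)
    for c in candidates:
        by_formula[c["formula"]].append(c)

    labels = {}
    for group in by_formula.values():
        # "1" goes to the candidate(s) with the most data sources for the formula.
        max_sources = max(c["n_sources"] for c in group)
        # "a" goes to the Group A candidate(s) with the highest ToxPi score.
        scored = [c["toxpi"] for c in group if c["toxpi"] is not None]
        max_toxpi = max(scored) if scored else None
        for c in group:
            rank = "1" if c["n_sources"] == max_sources else "2"
            if c["toxpi"] is None:
                labels[c["name"]] = "B" + rank
            else:
                sub = "a" if c["toxpi"] == max_toxpi else "b"
                labels[c["name"]] = "A" + rank + sub
    return labels

# Made-up candidates for two formulas (values are illustrative only).
candidates = [
    {"name": "cand_1", "formula": "C7H5NOS", "n_sources": 40, "toxpi": 2.5},
    {"name": "cand_2", "formula": "C7H5NOS", "n_sources": 12, "toxpi": 1.1},
    {"name": "cand_3", "formula": "C7H5NOS", "n_sources": 3, "toxpi": None},
    {"name": "cand_4", "formula": "C12H20O7", "n_sources": 9, "toxpi": None},
]
labels = label_candidates(candidates)
print(labels)
```

In this toy input, `cand_1` is labeled A1a, `cand_2` A2b, `cand_3` B2, and `cand_4` B1, since it is the only candidate for its formula but lacks toxicity and exposure data.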

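Equation (1) in Section 2.6 is a sum of four min-max normalized terms, so the maximum possible score is 4. The sketch below, offered only as an illustration, bins a daily exposure estimate into the seven categories of Rager et al. (2016) and then evaluates equation (1). The base-10 logarithm, the handling of a zero range, the field names, and all example values are assumptions that the text does not specify.

```python
import math
from bisect import bisect_right

# Lower bounds (mg/kg/day) of exposure categories 2 through 7.
THRESHOLDS = [1e-8, 1e-7, 1e-6, 1e-5, 1e-4, 1e-3]

def exposure_category(mg_per_kg_day):
    """Map a daily exposure estimate to a category from 1 to 7."""
    if not 0 <= mg_per_kg_day < 1e-2:
        raise ValueError("estimate falls outside the seven listed categories")
    return bisect_right(THRESHOLDS, mg_per_kg_day) + 1

def min_max(values):
    """Scale values to [0, 1]; return zeros if all values are equal (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def toxpi_scores(compounds):
    """Evaluate equation (1); E, DF, and A are log10-transformed first."""
    b = min_max([c["bioactivity_ratio"] for c in compounds])
    e = min_max([math.log10(exposure_category(c["exposure"])) for c in compounds])
    df = min_max([math.log10(c["detection_frequency"]) for c in compounds])
    a = min_max([math.log10(c["mean_peak_area"]) for c in compounds])
    return [sum(terms) for terms in zip(b, e, df, a)]

# Three made-up Group A compounds (illustrative values only).
compounds = [
    {"bioactivity_ratio": 0.30, "exposure": 5e-4, "detection_frequency": 7, "mean_peak_area": 2e6},
    {"bioactivity_ratio": 0.10, "exposure": 2e-7, "detection_frequency": 3, "mean_peak_area": 5e5},
    {"bioactivity_ratio": 0.00, "exposure": 5e-9, "detection_frequency": 1, "mean_peak_area": 1e5},
]
scores = toxpi_scores(compounds)
print(scores)
```

Because each term is normalized against the minimum and maximum across the compounds being scored, a score is only meaningful relative to the other compounds in the same set.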
PubMed. As described in Rager et al. (2016), the SciFinder® search (SciFinder, 2017) was performed to determine whether the A1a chemicals have previously been reported as being detected in water. Each chemical's CASRN was searched by the term “water” within the SciFinder® “Research Topic” menu. The results were then refined to only include journal references and the number of results was recorded. The Google Scholar and PubMed searches were conducted using the same search terms and no filters were applied. All searches were conducted manually. This literature search was not meant to be exhaustive, but rather to provide some indication of each compound's relative prevalence in the literature and association with water.

2.8. Retention time prediction using OPERA-RT

OPERA-RT is a quantitative structure-property relationship (QSPR) model that is part of OPERA, a free and open-source suite of models used to predict physicochemical properties and environmental fate of organic chemicals (download available on GitHub: https://github.com/kmansouri/OPERA) (Mansouri et al., 2016). OPERA-RT was previously developed as described in McEachran et al. (2017a). The tool uses molecular descriptors as input to predict LC retention times for compounds and is based on the same LC method that was used in this study. Retention times were predicted for A1a compounds and a window of ±10% of the total chromatographic run time (±4.5 min) was used to compare the observed retention time with the predicted retention time of the putative A1a identification. The tool was used to increase confidence in the identification of A1a compounds, as recommended by McEachran et al., rather than to exclude compounds that fall outside their retention time window.

2.9. Product-use categories

Product-use categories for A1a compounds were taken from EPA's CPCat database (Dionisio et al., 2015). These data can be explored through the Dashboard. Principal component analysis (PCA) was performed using a matrix of summed peak areas for A1a compounds in specific samples (observations) and product-use categories (variables). PCA plots were constructed using the caret package (version 6.0-62) in the R programming language (version 3.3.1).

2.10. Quality control and quality assurance

Calibration of the instrument was performed prior to analysis in each mode. Any drift in the mass accuracy of the TOF was continuously corrected by infusion of two reference compounds (purine [m/z = 119.0363] and Hexakis(1H,1H,3H-perfluoropropoxy)phosphazene [identified in the Dashboard as DTXSID90880494, observed as a formate adduct at m/z = 966.0007]) via dual-ESI sprayer. Three unused filters were processed along with the samples as method blanks. The masses of features observed in these method blanks were used in a blank exclusion list when extracting features from samples. Solvent blanks consisting of a mix of ammonium acetate buffer and methanol were also analyzed.

3. Results and discussion

Approximately 15,000 total features were detected across all samples, with 10,606 found in positive mode and 4,317 in negative mode. The greater number of positive mode features may have been aided by the presence of H+ ions from the slightly acidic mobile phase. Positive mode features tended to be smaller in chromatographic peak area, with the median peak area (190,000) roughly half that of negative mode features (370,000). Four hundred and thirty features were matched to a formula in the DSSTox_V2 database with a match score of 90 or greater. A greater proportion of negative mode features was matched (4.2%) than positive mode features (2.3%). Across both modes, 2.9% of features were matched, yet peak areas for these matched features comprised 16.9% of the total peak area of all features. The number of features matched is similar to that reported by Rager et al. who matched less
than 2% of the total number of features in 56 dust samples but did not report the percentage of peak area that was matched. The median peak area of unmatched features was approximately 200,000 while the median peak area of matched features was approximately 1.5 million (Fig. 2). This means that while the number of features being matched is low, matching tends to favor larger peaks. This is not surprising considering that larger features are likely to contain better isotope peaks, which play a crucial role in matching to a formula (Kind and Fiehn, 2006). Another possible explanation is that larger peaks tended to be compounds that have been of interest previously and are therefore more likely to be contained within the database from prior study by researchers. Descriptive statistics for features and molecular formula matches can be found in Table 2, and a bubble plot of all features with retention time and m/z can be found in the SI (Fig. S1).

Table 2
Descriptive statistics of features, formulas, and A1a compounds between negative and positive modes.

Ionization mode                          Negative   Positive     Total
Number of features                       4,317      10,606       14,923
Average (SD) features per sample         480 (207)  1,178 (542)  1,658 (724)
Geometric mean peak area                 420,000    230,000      270,000
Features assigned a formula              181        249          430
Unique formulas                          166        231          270
Percent of features assigned a formula   4.2%       2.3%         2.9%
Percent peak area assigned a formula     12.8%      19.2%        16.9%
Features with A1a designation            74         74           148
Percent peak area of A1a compounds       8.2%       7.0%         7.4%

Kernel density plots showing the distributions of the masses, volumes, and mass defects of features can be seen in Fig. 2. The mass distribution of features matched to the database was heavily biased towards the distribution of masses in the DSSTox_V2 database. The percentage of features with masses less than 500 Da was 51% for all features, but increased to 90% for features assigned a formula. This is likely due to the fact that 92% of compounds in the DSSTox_V2 database have masses less than 500 Da. The same trend was observed in the distribution of mass defects among features assigned a formula, highlighting the importance of the content of the databases used when performing suspect screening analysis.

Mass and elemental composition of the formulas generated in this study (water filters) were compared to those of the previous study of house dust by Rager et al. (2016) on the basis that the same database and matching algorithm were used. Significant differences (Welch's two sample t-test, p < 0.001) in mass and number of carbons per formula were observed between the studies, with the house dust containing heavier compounds and 3.4 more carbons per formula, on average, than the water filters. Oxygen and phosphorus were similar in the average number per formula and percentage of formulas in which they were found. Nitrogen, however, was found in 48% of the water filter formulas but only 34% of the house dust formulas. Sulfur was found in 10% of the water filter formulas but 29% of formulas from the house dust. Summary statistics on the elemental composition and mass distributions of the formulas generated in the two studies can be found in the SI (Table S4) as well as a PCA of the element counts, retention times, and masses for each formula (Fig. S2). Despite the differences in carbon, nitrogen, and sulfur content, no clear separation or patterns were observed in the PCA.

The 430 features that were assigned a formula comprised 270 unique formulas, which generated 10,621 candidate compounds from the Dashboard, giving an average of 39 compounds per formula (range = 1 to 451 compounds per formula). Each candidate compound was then categorized into Group A, containing toxicity and exposure data, or Group B, not containing these data. Of all candidate compounds, 205 were categorized into Group A, 91 of which were sub-categorized into Group A1a, which are considered the most likely compounds based on data source rankings (McEachran et al., 2017b) as well as the most important compounds with regards to bioactivity, exposure,

Fig. 2. Kernel density plots of mass, peak area, and mass defects for negative, positive, matched (i.e., formula assigned), unmatched features, and the entire DSSTox_V2 database.
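The match rates quoted in Section 3 follow directly from the counts reported in Table 2. As a quick arithmetic check (the variable and function names are illustrative; the counts are those given in Table 2 and in the text):

```python
# Feature counts and formula assignments per ionization mode (Table 2).
features = {"negative": 4317, "positive": 10606}
assigned = {"negative": 181, "positive": 249}

def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

rates = {mode: percent(assigned[mode], features[mode]) for mode in features}
rates["total"] = percent(sum(assigned.values()), sum(features.values()))

# 10,621 candidate compounds retrieved for 270 unique formulas.
candidates_per_formula = round(10621 / 270)

print(rates, candidates_per_formula)
```

The computed rates (4.2%, 2.3%, and 2.9%) and the average of 39 candidates per formula agree with the values reported in the text.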

abundance, and detection frequency. The SciFinder® search resulted in 59 of the A1a compounds being associated with water in journal articles, meaning 32 have not been associated before with water. Among those with associated journal articles, the average number of articles was 569, highlighting the tendency for researchers to publish on already-known compounds and the need for more work in compound discovery. The PubMed search gave similar results, with 66 compounds associated with water, but the Google Scholar search returned 90 compounds associated with water.

The remaining 114 Group A compounds were sub-categorized as follows: 26 into Group A2a, 18 into Group A1b, and 70 into Group A2b. Of the remaining 10,416 Group B compounds, 196 were sub-categorized into Group B1 and 10,220 into Group B2. While the vast majority of candidate compounds fall into Group B2, these compounds are less likely to be the correct compounds for a given set of matched formulas. Group A1a features tend to be larger than most peaks: the median peak area of an A1a feature was approximately 1,900,000 counts whereas the median peak area of non-A1a features that were assigned a formula was 1,300,000 counts, and the median peak area of features that were not assigned a formula was 220,000 counts. Furthermore, 44% of the peak area that was assigned to a formula could be mapped to an A1a compound, which was 7.4% of the total peak area of all features. A list of all A1a compounds along with their bioactivity and exposure values, functional use information, results of the SciFinder® search, and other supplementary data can be found in the SI (Table S5).

3.1. ToxPi scores and confirmation by standards

ToxPi scores for Group A1a compounds ranged from 0.046 to 2.99 out of a maximum possible score of 4. All A1a ToxPi scores are displayed graphically in Fig. 3 with values given for the top 20. In general, the contribution from the four different categories to the total ToxPi score varied greatly from compound to compound.

To assess correct structure-to-formula assignments and confirm compounds with standards, sample-based formulas were matched with formulas for standards readily available in our laboratory. Sixteen unique compounds had formulas matching those of existing laboratory standards. Thirteen of the standard compounds were categorized as A1a and three as B1. For the three B1 compounds, no Group A compounds existed for their formulas. Fifteen of sixteen compounds were ultimately confirmed with standards via retention time matching and visual inspection of MS spectra. One compound did not match in retention time to its A1a-assigned feature and was therefore considered to be a false positive although its spectrum matched. The formula for this compound was C12H20O7 and the standard with this formula was triethyl citrate. Given the close spectral match but difference in retention time, the sample likely contained an isomer of triethyl citrate. Triethyl citrate was ranked by ToxPi score as 15th among A1a compounds, but removed from Fig. 3 because it was confirmed to be a false positive. All other compounds with this formula were classified as B2 compounds. The twelve A1a compounds confirmed with standards as true positives can be seen in Table S5 (SI), eight of which were among the top 20 highest ToxPis and can be seen in Fig. 3. The three B1 compounds confirmed with standards were Fipronil Sulfone, Perfluorovaleric Acid (PFPeA), and Perfluorohexanoic Acid (PFHxS). The 15 confirmed compounds have a range of log octanol-water

formulas as confirmed using standards demonstrates the utility of data source ranking described in McEachran et al., where 88% of a test set of 162 compounds ranked first by data source when using the Dashboard (McEachran et al., 2017b). For the confirmed compounds, 8 of the 15 were perfluoroalkylated substances (PFAS), two were chlorinated phosphate flame retardants, and one was a chlorinated pesticide (atrazine). The types of confirmed compounds are a reflection of the types of available standards in our laboratory and not necessarily representative of the types of compounds actually contained in the samples (see section on Product-Use Categories). The percentage of true positives (94%) relative to false positives (6%) is considered very good for SSA and it increases confidence in the method of prioritization, but it must be acknowledged that this success rate may not accurately represent the rate of correct prioritization for the rest of the compound-formula mappings because the standards were not randomly chosen. The standards used in this study were readily available in one of our laboratories and, thus, had previously been purchased due to their environmental relevance.

Eight of the top 20 ToxPi compounds were confirmed with standards, including the compound with the top ToxPi score, 1,2-Benzisothiazolin-3-one. Over 500 product use entries are listed for this compound in the EPA's CPCat database, and the Consumer Product Information Database (Consumer Product Information Database, 2017) lists it in many products that are expected to go directly to waste water after use such as hand soap, dish soap, detergent, etc. It was found in 7 of the 9 drinking water samples and was active in 173 of 565 toxicity assays tested. Although the SciFinder® search found 95 journal articles associating this compound with water, it is not regularly monitored for in drinking water and would not have been discovered without an SSA approach.

3.2. Retention time prediction

As described by McEachran et al. (2017a), the OPERA-RT model has a 95% confidence window of ±4.5 min. Of the 91 A1a compounds, 52 were never observed outside this window, giving us greater confidence in the correct identification of these compounds (SI, Table S5). These compounds include all 15 true positives that were confirmed with standards. The predicted retention time for triethyl citrate was within the 95% confidence window of the observed feature that was mislabeled as this compound and, thus, OPERA-RT would not have helped to identify this particular false positive. Only through the use of an analytical standard were we able to observe a difference in retention time large enough to confidently label this peak as a false positive, yet small enough to fall within the predicted retention time window from OPERA-RT. To date, the effectiveness and proper implementation of this retention time tool have not been fully evaluated; however, it provides an added layer of confidence for those compounds that fall within their predicted window.

3.3. Product-use categories

All A1a compounds were assigned to at least 1 of 15 product-use categories, and some to several categories, as they may have different functional uses. Thirteen of fifteen product use categories contained at least one A1a chemical from the samples. Fig. 4 shows the number of A1a compounds in each sample for a given category.
partitioning coefficients (log Kow) from 0.8 (1,2-Benzisothiazolin- A PCA was performed using the sum of the peak areas of the
3-one) to 4.8 (Perfluoroundecanoic acid). The outer bounds of the compounds represented in this matrix. The loadings plot from the
range of log Kow values for which this method is suitable cannot be PCA is given in the SI (Fig. S3). The first principal component
fully assessed due to the small number of confirmed compounds explained 39.2% of the variance and the second explained 27.5%.
but likely extends beyond this range. The two well water samples (Chapel Hill and Pittsboro) were
The high percentage of correct structure assignments to positioned very closely on the PCA score plot, indicating these

Fig. 3. ToxPis of all A1a compounds (bottom left) with the top 20 enlarged (top left) and their corresponding ToxPi scores (right).
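The PCA on category-summed peak areas described in Section 3.3 can be illustrated with a minimal sketch. It is not the authors' implementation: the sample names and peak areas below are hypothetical, the data are only mean-centered (the scaling used in the study is not stated), and the components are found by power iteration with deflation so that the example needs only the Python standard library.

```python
import math

def center_columns(rows):
    """Subtract each column mean so every category has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix (categories x categories)."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def matvec(matrix, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]

def leading_eigenpair(matrix, iterations=500):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric matrix."""
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = matvec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(matrix, v)))
    return eigenvalue, v

def pca(rows, n_components=2):
    """Return (scores, explained_variance_ratios) for the leading components."""
    centered = center_columns(rows)
    cov = covariance(centered)
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    axes, ratios = [], []
    for _ in range(n_components):
        eigenvalue, v = leading_eigenpair(cov)
        axes.append(v)
        ratios.append(eigenvalue / total_variance)
        # Deflation: remove the component just found before searching for the next.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    scores = [[sum(a * b for a, b in zip(row, v)) for v in axes] for row in centered]
    return scores, ratios

# Hypothetical summed peak areas (arbitrary units):
# rows are samples, columns are product-use categories.
samples = ["well_1", "well_2", "tap_1", "tap_2"]
areas = [
    [1.0, 5.0, 2.0],
    [1.2, 5.1, 2.1],
    [4.0, 1.0, 3.0],
    [9.0, 2.0, 8.0],
]

scores, ratios = pca(areas)
for name, (pc1, pc2) in zip(samples, scores):
    print(f"{name}: PC1={pc1:.2f}, PC2={pc2:.2f}")
print("explained variance ratios:", [round(r, 3) for r in ratios])
```

In this toy matrix the two hypothetical well samples have nearly identical category profiles and therefore land close together on the first component, which is the kind of pattern the score plot is read for.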

samples are very similar with regard to product-use categories. Tap water from Apex and Cary plotted closely on the PCA as well, which may be because these towns are in close proximity and share source water. One outlier on the PCA was the Pittsboro tap water. This sample had the largest number of features (3341 compared to an average of 1658 per sample), the largest number of formulas assigned to features (108 compared to an average of 48 per sample), and ultimately the largest number of A1a chemicals (38).

Besides the category “other”, the two categories with the largest number of A1a chemicals were “Industrial Process No Consumer” followed by “Consumer and Industrial Process”, indicating that drinking water in this area may be impacted by local industries. Other top categories included those containing pesticides (“Pesticide Active and Consumer”, “Pesticide Active No Consumer”, and “Pesticide Inert”). The category “Personal Care Products” was also significant, affecting 8 of the 9 samples.

3.4. Non-targeted analysis (NTA) of unmatched features

An exhaustive NTA is outside the scope of this article; however, some work has been done on identifying features that were not assigned a formula and therefore did not undergo the subsequent steps of our SSA workflow. Emphasis was placed on a mass defect range from -0.2 to 0, as this is indicative of halogenated organic compounds, which often contain unique isotope signatures and are often of concern for public health. A focus was also placed on the sample in which the most features were found, the Pittsboro tap water. The most abundant and the fourth most abundant features in the Pittsboro tap sample that fell into the mass defect range were recognized as being decarboxylated perfluoroalkyl acids. We have previously observed decarboxylation of perfluoroalkyl acids within the ion source, and such fragments would not match to the DSSTox database using the described method. The second largest peak, m/z 564.8848, revealed a high degree of chlorination in its spectrum and was found to co-elute with m/z 518.8796, indicating the peak at m/z 564.8848 is a formate adduct. These peaks were found in negative ESI mode, meaning the peak at m/z 518.8796 likely results from the loss of a proton, making the neutral mass approximately 519.8869. A chromatogram of these two peaks and the spectrum of the larger peak (m/z 564.8848) are shown in Fig. S4 (SI). Formula generation using MassHunter, which considers relative isotopic abundance and spacing as well as exact mass, produced C12H20Cl7O5P for the isotope cluster beginning at m/z 518.8796, with a match score of 99.5 out of a possible 100. No compounds matching this formula were found in public databases such as the Dashboard or PubChem; however, a search using SciFinder® revealed one match for this formula, (2-chloroethyl)-bis[2,2,2-trichloro-1-(1-methylethoxy)ethyl] ester phosphonic acid (CAS 71039-43-5), shown in Fig. 5. This compound is found in a patent and described, along with several other chlorinated phosphonic acids, as a plant growth regulator. However, this compound is strikingly similar to other

Fig. 4. A) First (x-axis) and second (y-axis) principal components in a principal component analysis using summed peak areas for all compounds within a category; B) box and
whisker plots representing the range of peak areas for compounds within a category; C) box and whisker plots representing the range of peak areas for compounds within each
sample; and D) heat map showing the number of compounds that fall into each category by sample. Blank squares indicate no A1a compound was present for a category in a
sample.
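The adduct arithmetic used in Section 3.4 for the peaks at m/z 518.8796 and m/z 564.8848 can be checked with a short sketch. This is only an illustrative calculation, not the MassHunter formula-generation procedure used in the study: the function names are hypothetical, and the monoisotopic masses are standard reference values rounded to eight decimals.

```python
# Monoisotopic masses in Da (standard reference values, rounded).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "O": 15.99491462,
    "P": 30.97376200,
    "Cl": 34.96885268,
}
PROTON = 1.00727647        # mass of H+
FORMIC_ACID = 46.00547930  # HCOOH

def neutral_from_deprotonated(mz):
    """Neutral monoisotopic mass M for an [M-H]- ion."""
    return mz + PROTON

def neutral_from_formate_adduct(mz):
    """Neutral monoisotopic mass M for an [M+HCOO]- ion."""
    # The formate anion weighs formic acid minus a proton.
    return mz - (FORMIC_ACID - PROTON)

def monoisotopic_mass(composition):
    """Monoisotopic mass of a neutral formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())

def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

# Values reported in the text.
candidate = {"C": 12, "H": 20, "Cl": 7, "O": 5, "P": 1}
m_from_h = neutral_from_deprotonated(518.8796)
m_from_formate = neutral_from_formate_adduct(564.8848)
m_theory = monoisotopic_mass(candidate)

print(f"neutral mass from [M-H]-:     {m_from_h:.4f}")
print(f"neutral mass from [M+HCOO]-:  {m_from_formate:.4f}")
print(f"C12H20Cl7O5P monoisotopic:    {m_theory:.4f}")
print(f"ppm error of [M-H]-:          {ppm_error(518.8796, m_theory - PROTON):.2f}")
```

With the reported m/z values, both ion assignments give a neutral mass of roughly 519.887, and the [M-H]- ion calculated for C12H20Cl7O5P lies within 1 ppm of the observed m/z 518.8796, which is consistent with the formate-adduct interpretation.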

organophosphate compounds, such as TDCPP, also found in this study and commonly used as flame retardants. Further NTA work to identify features which were not assigned a formula will continue using similar approaches as described here.

Fig. 5. (2-chloroethyl)-bis[2,2,2-trichloro-1-(1-methylethoxy)ethyl] ester phosphonic acid (CAS 71039-43-5), the only discovered structure matching the generated formula of C12H20Cl7O5P for a large unknown peak at m/z 518.8796.

4. Limitations and future directions

The use of an activated charcoal filter to capture contaminants from drinking water likely biased the experimental design towards compounds with sufficiently large Kow values to interact with the filter. It is possible that some compounds which may be of relevance to human health, probably very polar compounds, passed through the filter without capture and, thus, were not retained in the samples. The instrumental analysis could have been expanded in several ways to increase the percent of total features identified. Alternative columns, such as HILIC, can be used to separate compounds that elute in the void volume when using a C8 column. Furthermore, additional ionization sources, such as APCI or APPI, could be used to ionize compounds that were not detected under ESI conditions. Future studies should also consider including a gas chromatography (GC) component to explore a larger chemical space. At the time of formula matching, only a limited version of the DSSTox_Database (V2) was available in its de-salted form (de-salted formulas are required to match to mass spectral data). Since

then, a much larger, more extensive version of the database has become available in its de-salted form, which includes over 720,000 chemicals and can be accessed via the Dashboard's downloads page (https://comptox.epa.gov/dashboard/downloads). This increase in size would most likely have resulted in a higher percentage of features being assigned formulas. The current method was unable to identify compounds which fragmented in the ionization source, as was observed when the decarboxylated perfluoroalkyl acids were identified. Another limitation to this study, as with most SSA/NTA studies, is the inability to estimate concentration. Future studies should explore ways of estimating instrument responses for compounds without the use of standards. QSPRs appear to be the most viable path to solve this problem. However, a large training set of instrument responses based on chemical standards will be required. Future studies should also focus on better inclusion of tools to increase confidence in level 2 identification on the Schymanski scale, including better implementation of retention time predictors, fragmentation predictors for MS/MS data, etc. In any case, improved access to Open Data sets for integration into our databases will be highly beneficial and the community is encouraged to consider the benefits of such an approach (Schymanski and Williams, 2017).

5. Conclusions

Although there have been abundant research efforts directed at identifying contaminants in drinking water, to the best of our knowledge, this study is the first to use a point-of-use home filter combined with an SSA/NTA approach. Its utility in this pilot scale application is illustrated in our identification of several compounds that would not otherwise be monitored in drinking water. The need for a more comprehensive SSA/NTA approach is highlighted by the large number of features present in the samples and the limited number that were confirmed or tentatively identified.

We have demonstrated that ranking by data source correctly prioritized (Group A1a or B1) 15 out of 16 compounds for which standards were available on hand. Furthermore, ToxPi ranking allowed focus to be placed on compounds of most relevance to human health. Standards are still required for level 1 identification according to the Schymanski confidence levels (Schymanski et al., 2014a). However, confirmation of all prioritized candidate compounds is impractical, so researchers should focus on tools that add confidence to level 2 identifications, such as retention time predictors and in silico fragmentors. The retention time prediction model used in this study (OPERA-RT) was unable to identify the one false positive found, and thus further development is necessary for larger scale implementation of retention time prediction.

The number of chemicals in the A1a group is very small compared to the number of features extracted, or total chemicals, in the samples. The vast majority of these features are quite small and, thus, may represent chemicals at trace levels. That being said, trace levels of compounds may be of importance to human health. While there was a great degree of variability in the number of features, formulas, and Group A1a compounds in the samples, every sample exhibited some degree of contamination. Given the wide range of retention times and masses observed in this study, as well as the sheer number of features observed, our results indicate that activated carbon point-of-use water filtration systems likely remove compounds spanning a wide range of physicochemical properties.

Notes

The authors declare no competing financial interest.

Acknowledgments

The authors thank Kristen Isaacs for providing code to parse formulas and information from the ACToR database; Katherine Phillips for providing product use categories and coding guidance; John Wambaugh for providing ExpoCast data; and Chris Grulke for guidance on the DSSTox database structure.

Appendix A. Supplementary data

Supplementary data related to this article can be found at https://doi.org/10.1016/j.envpol.2017.11.033.

References

Anumol, T., Clarke, B.O., Merel, S., Snyder, S.A., 2015. Point-of-use devices for attenuation of trace organic compounds in water. J. AWWA 107, 9.
Bader, T., Schulz, W., Lucke, T., 2016. Application of Non-target Analysis with LC-HRMS for the Monitoring of Raw and Potable Water: Strategy and Results, Assessing Transformation Products of Chemicals by Non-target and Suspect Screening - Strategies and Workflows Volume 2. American Chemical Society, pp. 49-70.
Consumer Product Information Database, 2017. CPID.
Dionisio, K.L., Frame, A.M., Goldsmith, M.-R., Wambaugh, J.F., Liddell, A., Cathey, T., Smith, D., Vail, J., Ernstoff, A.S., Fantke, P., Jolliet, O., Judson, R.S., 2015. Exploring consumer exposure pathways and patterns of use for chemicals in the environment. Toxicol. Rep. 2, 228-237.
Egeghy, P.P., Lorber, M., 2011. An assessment of the exposure of Americans to perfluorooctane sulfonate: a comparison of estimated intake with values inferred from NHANES data. J. Expo. Sci. Environ. Epidemiol. 21, 150-168.
Kind, T., Fiehn, O., 2006. Metabolomic database annotations via query of elemental compositions: mass accuracy is insufficient even at less than 1 ppm. BMC Bioinforma. 7, 1.
Little, J.L., Williams, A.J., Pshenichnov, A., Tkachenko, V., 2012. Identification of “known unknowns” utilizing accurate mass data and chemspider. J. Am. Soc. Mass Spectrom. 23, 179-185.
Lorber, M., Egeghy, P.P., 2011. Simple intake and pharmacokinetic modeling to characterize exposure of Americans to perfluoroctanoic acid, PFOA. Environ. Sci. Technol. 45, 8006-8014.
Mansouri, K., Grulke, C.M., Richard, A.M., Judson, R.S., Williams, A.J., 2016. An automated curation procedure for addressing chemical errors and inconsistencies in public datasets used in QSAR modelling. SAR QSAR Environ. Res. 27, 911-937.
McEachran, A.D., Mansouri, K., Newton, S., Beverly, B., Sobus, J.R., Williams, A.J., 2017a. Evaluating Three Gradient HPLC Retention Time Prediction Models: 1) logP, 2) ACD/ChromGenius, and 3) a Quantitative Structure Retention Relationship Model (under review).
McEachran, A.D., Sobus, J.R., Williams, A.J., 2017b. Identifying known unknowns using the US EPA's CompTox Chemistry Dashboard. Anal. Bioanal. Chem. 409, 1729-1735.
Müller, A., Schulz, W., Ruck, W.K.L., Weber, W.H., 2011. A new approach to data evaluation in the non-target screening of organic trace substances in water analysis. Chemosphere 85, 1211-1219.
Prüss-Üstün, A., Bos, R., Gore, F., Bartram, J., 2008. Safer Water, Better Health: Costs, Benefits and Sustainability of Interventions to Protect and Promote Health. World Health Organization.
Rager, J.E., Strynar, M.J., Liang, S., McMahen, R.L., Richard, A.M., Grulke, C.M., Wambaugh, J.F., Isaacs, K.K., Judson, R., Williams, A.J., Sobus, J.R., 2016. Linking high resolution mass spectrometry data with exposure and toxicity forecasts to advance high-throughput environmental monitoring. Environ. Int. 88, 269-280.
Rappaport, S.M., Smith, M.T., 2010. Environment and disease risks. Science 330, 460-461.
Ruff, M., Mueller, M.S., Loos, M., Singer, H.P., 2015. Quantitative target and systematic non-target analysis of polar organic micro-pollutants along the river Rhine using high-resolution mass-spectrometry - identification of unknown sources and compounds. Water Res. 87, 145-154.
Schymanski, E.L., Jeon, J., Gulde, R., Fenner, K., Ruff, M., Singer, H.P., Hollender, J., 2014a. Identifying small molecules via high resolution mass spectrometry: communicating confidence. Environ. Sci. Technol. 48, 2097-2098.
Schymanski, E.L., Singer, H.P., Longrée, P., Loos, M., Ruff, M., Stravs, M.A., Ripollés Vidal, C., Hollender, J., 2014b. Strategies to characterize polar organic contamination in wastewater: exploring the capability of high resolution mass spectrometry. Environ. Sci. Technol. 48, 1811-1818.
Schymanski, E.L., Singer, H.P., Slobodnik, J., Ipolyi, I.M., Oswald, P., Krauss, M., Schulze, T., Haglund, P., Letzel, T., Grosse, S., 2015. Non-target screening with high-resolution mass spectrometry: critical review using a collaborative trial on water analysis. Anal. Bioanal. Chem. 407, 6237-6255.
Schymanski, E.L., Williams, A.J., 2017. Open Science for Identifying “Known Unknown” Chemicals. ACS Publications.

SciFinder, 2017. Chemical Abstract Services. Columbus, OH.
Sobus, J.R., Wambaugh, J.F., Isaacs, K.K., Williams, A.J., McEachran, A.D., Richard, A.M., Grulke, C.M., Ulrich, E.M., Rager, J.E., Strynar, M.J., Newton, S.R., 2017. Integrating tools for non-targeted analysis research and chemical safety evaluations at the US EPA. J. Expo. Sci. Environ. Epidemiol. (in press).
Strynar, M., Dagnino, S., McMahen, R., Liang, S., Lindstrom, A., Andersen, E., McMillan, L., Thurman, M., Ferrer, I., Ball, C., 2015. Identification of novel perfluoroalkyl ether carboxylic acids (PFECAs) and sulfonic acids (PFESAs) in natural waters using accurate mass time-of-flight mass spectrometry (TOFMS). Environ. Sci. Technol. 49, 11622-11630.
US EPA, 2015. Toxicity ForeCaster (ToxCast™) Data. US EPA. https://www.epa.gov/chemical-research/toxicity-forecaster-toxcasttm-data.
US EPA, 2016. Safe Drinking Water Act (SDWA).
Villanueva, C.M., Kogevinas, M., Cordier, S., Templeton, M.R., Vermeulen, R., Nuckols, J.R., Nieuwenhuijsen, M.J., Levallois, P., 2014. Assessing exposure and health consequences of chemicals in drinking water: current state of knowledge and research needs. Environ. Health Perspect. (Online) 122, 213.
Wambaugh, J.F., Setzer, R.W., Reif, D.M., Gangwal, S., Mitchell-Blackwood, J., Arnot, J.A., Jolliet, O., Frame, A., Rabinowitz, J., Knudsen, T.B., 2013. High-throughput models for exposure-based chemical prioritization in the ExpoCast project. Environ. Sci. Technol. 47, 8479-8488.
WHO, 2013. Chemical Safety of Drinking-water. World Health Organization.
Wild, C.P., 2005. Complementing the genome with an “exposome”: the outstanding challenge of environmental exposure measurement in molecular epidemiology. Cancer Epidemiol. Biomarkers Prev. 14, 1847-1850.
