Sarah McGough, PhD

San Francisco, California, United States

Experience

  • Genentech

  • -

    Cambridge, MA

  • -

    San Francisco, California, United States

  • -

    San Francisco Bay Area

  • -

    Cambridge, MA

  • -

    Brasília Area, Brazil

  • -

    Maputo, Mozambique

  • -

    Washington D.C. Metro Area

Education

  • Harvard University

    Activities and Societies: Presidential Scholar, Dudley House Public Service Fellow

    Dissertation: "Anticipating Outbreaks: Predictive Modeling to Improve Infectious Disease Surveillance"

  • -

    Activities and Societies: Phi Beta Kappa, Amnesty International, International Development Research Council, Social Concerns Committee

Volunteer Experience

  • Mentor, Gene Academy

    Genentech

    - Present 3 years 8 months

    Education

    STEM mentor for elementary school students in underserved South San Francisco public schools.

  • Mentor/Coach

    Futurelab

    - Present 2 years 8 months

    Education

    High school biotech curriculum coach preparing students for careers in biotechnology and STEM. The program is designed to reach 2M students by 2026.

  • Mentor, Data Science and Machine Learning for Biotechnology Certificate Program

    SFSU PINC

    - Present 2 years 8 months

    Education

    Interview prep for students enrolled in SFSU Data Science and Machine Learning for Biotechnology Certificate under the Promoting Inclusivity in Computing (PINC) program.

Publications

  • Learning from data with structured missingness

    Nature Machine Intelligence

    Missing data are an unavoidable complication in many machine learning tasks. When data are ‘missing at random’ there exist a range of tools and techniques to deal with the issue. However, as machine learning studies become more ambitious, and seek to learn from ever-larger volumes of heterogeneous data, an increasingly encountered problem arises in which missing values exhibit an association or structure, either explicitly or implicitly. Such ‘structured missingness’ raises a range of challenges that have not yet been systematically addressed, and presents a fundamental hindrance to machine learning at scale. Here we outline the current literature and propose a set of grand challenges in learning from data with structured missingness.

  • Penalized regression for left-truncated and right-censored survival data

    Statistics in Medicine

    High-dimensional data are becoming increasingly common in the medical field as large volumes of patient information are collected and processed by high-throughput screening, electronic health records, and comprehensive genomic testing. Statistical models that attempt to study the effects of many predictors on survival typically implement feature selection or penalized methods to mitigate the undesirable consequences of overfitting. In some cases survival data are also left-truncated which can give rise to an immortal time bias, but penalized survival methods that adjust for left truncation are not commonly implemented. To address these challenges, we apply a penalized Cox proportional hazards model for left-truncated and right-censored survival data and assess implications of left truncation adjustment on bias and interpretation. We use simulation studies and a high-dimensional, real-world clinico-genomic database to highlight the pitfalls of failing to account for left truncation in survival modeling.

  • A dynamic, ensemble learning approach to forecast dengue fever epidemic years in Brazil using weather and population susceptibility cycles

    Journal of the Royal Society Interface

    Transmission of dengue fever depends on a complex interplay of human, climate and mosquito dynamics, which often change in time and space. It is well known that its disease dynamics are highly influenced by multiple factors including population susceptibility to infection as well as by microclimates: small-area climatic conditions which create environments favourable for the breeding and survival of mosquitoes. Here, we present a novel machine learning dengue forecasting approach, which, dynamically in time and space, identifies local patterns in weather and population susceptibility to make epidemic predictions at the city level in Brazil, months ahead of the occurrence of disease outbreaks. Weather-based predictions are improved when information on population susceptibility is incorporated, indicating that immunity is an important predictor neglected by most dengue forecast models. Given the generalizability of our methodology to any location or input data, it may prove valuable for public health decision-making aimed at mitigating the effects of seasonal dengue outbreaks in locations globally.

  • Nowcasting for real-time COVID-19 tracking in New York City: An evaluation using reportable disease data from early in the pandemic

    JMIR Public Health and Surveillance

    Objective:
    To support real-time COVID-19 situational awareness, the New York City Department of Health and Mental Hygiene used nowcasting to account for testing and reporting delays. We conducted an evaluation to determine which implementation details would yield the most accurate estimated case counts.

    Methods:
    A time-correlated Bayesian approach called Nowcasting by Bayesian Smoothing (NobBS) was applied in real time to line lists of reportable disease surveillance data, accounting for the delay from diagnosis to reporting and the shape of the epidemic curve. We retrospectively evaluated nowcasting performance for confirmed case counts among residents diagnosed during the period from March to May 2020, a period when the median reporting delay was 2 days.

    Results:
    Nowcasts with a 2-week moving window and a negative binomial distribution had lower mean absolute error, lower relative root mean square error, and higher 95% prediction interval coverage than nowcasts conducted with a 3-week moving window or with a Poisson distribution. Nowcasts conducted toward the end of the week outperformed nowcasts performed earlier in the week, given fewer patients diagnosed on weekends and lack of day-of-week adjustments. When estimating case counts for weekdays only, metrics were similar across days when the nowcasts were conducted, with Mondays having the lowest mean absolute error of 183 cases in the context of an average daily weekday case count of 2914.

    Conclusions:
    Nowcasting using NobBS can effectively support COVID-19 trend monitoring. Accounting for overdispersion, shortening the moving window, and suppressing diagnoses on weekends—when fewer patients submitted specimens for testing—improved the accuracy of estimated case counts. Nowcasting ensured that recent decreases in observed case counts were not overinterpreted as true declines and supported officials in anticipating the magnitude and timing of hospitalizations and deaths and allocating resources geographically.

  • Rates of increase of antibiotic resistance and ambient temperature in Europe: a cross-national analysis of 28 countries between 2000 and 2016

    Eurosurveillance

    Background
    The rapid increase of bacterial antibiotic resistance could soon render our most effective method to address infections obsolete. Factors influencing pathogen resistance prevalence in human populations remain poorly described, though temperature is known to contribute to mechanisms of spread.

    Aim
    To quantify the role of temperature, spatially and temporally, as a mechanistic modulator of transmission of antibiotic resistant microbes.

    Methods
    An ecologic analysis was performed on country-level antibiotic resistance prevalence in three common bacterial pathogens across 28 European countries, collectively representing over 4 million tested isolates. Associations of minimum temperature and other predictors with change in antibiotic resistance rates over 17 years (2000–2016) were evaluated with multivariable models. The effects of predictors on the antibiotic resistance rate change across geographies were quantified.

    Results
    During 2000–2016, for Escherichia coli and Klebsiella pneumoniae, European countries with 10°C warmer ambient minimum temperatures compared to others, experienced more rapid resistance increases across all antibiotic classes. Increases ranged between 0.33%/year (95% CI: 0.2 to 0.5) and 1.2%/year (95% CI: 0.4 to 1.9), even after accounting for recognised resistance drivers including antibiotic consumption and population density. For Staphylococcus aureus a decreasing relationship of −0.4%/year (95% CI:  −0.7 to 0.0) was found for meticillin resistance, reflecting widespread declines in meticillin-resistant S. aureus across Europe over the study period.

    Conclusion
    We found evidence of a long-term effect of ambient minimum temperature on antibiotic resistance rate increases in Europe. Ambient temperature might considerably influence antibiotic resistance growth rates, and explain geographic differences observed in cross-sectional studies. Rising temperatures globally may hasten resistance spread, complicating mitigation efforts.

  • Modeling COVID-19 mortality in the US: Community context and mobility matter

    medRxiv

    The United States has become an epicenter for the coronavirus disease 2019 (COVID-19) pandemic. However, communities have been unequally affected and evidence is growing that social determinants of health may be exacerbating the pandemic. Furthermore, the impact and timing of social distancing at the community level have yet to be fully explored. We investigated the relative associations between COVID-19 mortality and social distancing, sociodemographic makeup, economic vulnerabilities, and comorbidities in 24 counties surrounding 7 major metropolitan areas in the US using a flexible and robust time series modeling approach. We found that counties with poorer health and less wealth were associated with higher daily mortality rates compared to counties with fewer economic vulnerabilities and fewer pre-existing health conditions. Declines in mobility were associated with up to 15% lower mortality rates relative to pre-social distancing levels of mobility, but effects were lagged between 25-30 days. While we cannot estimate causal impact, this study provides insight into the association of social distancing on community mortality while accounting for key community factors. For full transparency and reproducibility, we provide all data and code used in this study.

  • Nowcasting by Bayesian Smoothing: A flexible, generalizable model for real-time epidemic tracking

    PLOS Computational Biology

    Achieving accurate, real-time estimates of disease activity ('nowcasts') is challenged by delays in case reporting. However, approaches that seek to estimate cases in spite of reporting delays often do not consider the temporal relationship between cases during an outbreak, and may not generalize to surveillance contexts with very different reporting delays. This study describes a smooth Bayesian nowcasting approach that produces accurate estimates that capture the time evolution of the epidemic curve. We assess the performance for two diseases and show that relating cases between sequential time points contributes to NobBS’s performance and robustness across surveillance settings.

  • NobBS: Nowcasting by Bayesian Smoothing (R package)

    CRAN: Comprehensive R Archive Network (R programming language)

    A Bayesian approach to estimate the number of occurred-but-not-yet-reported cases from incomplete, time-stamped reporting data for disease outbreaks. 'NobBS' learns the reporting delay distribution and the time evolution of the epidemic curve to produce smoothed nowcasts in both stable and time-varying case reporting settings, as described in McGough et al. (2019) <doi:10.1101/663823>.

  • Antibiotic resistance increases with local temperature

    Nature Climate Change

    We explored the role of climate (local minimum temperature) and additional factors on the distribution of antibiotic resistance across the United States, and show that increasing local temperature as well as population density are associated with increasing antibiotic resistance (percent resistant) in common pathogens. We found that an increase in temperature of 10 °C across regions was associated with increases in antibiotic resistance of 4.2%, 2.2%, and 2.7% for the common pathogens Escherichia coli, Klebsiella pneumoniae and Staphylococcus aureus. The associations between temperature and antibiotic resistance in this ecological study are consistent across most classes of antibiotics and pathogens and may be strengthening over time. These findings suggest that current forecasts of the burden of antibiotic resistance could be significant underestimates in the face of a growing population and climate change.

  • Forecasting Zika Incidence in the 2016 Latin America Outbreak Combining Traditional Disease Surveillance with Search, Social Media, and News Report Data

    PLoS Neglected Tropical Diseases

    In the absence of access to real-time government-reported Zika case counts, we demonstrate the ability of Internet-based data sources to track the outbreak. We combined information from Zika-related Google searches, Twitter microblogs, and the HealthMap digital surveillance system with historical Zika suspected case counts to track and predict estimates of suspected weekly Zika cases during the 2015–2016 Latin American outbreak, up to three weeks ahead of the publication of official case data. Given the significant delay in the release of official government-reported Zika case counts, we show that these Internet-based data streams can be used as timely and complementary ways to assess the dynamics of the outbreak.

    Other authors
    • Jared B Hawkins, PhD, MMSc
    • John S Brownstein, PhD, MPH
    • Mauricio Santillana, PhD, MS

Honors & Awards

  • Harvard University Distinction in Teaching Award

    Harvard University

    Acknowledges a special contribution to the teaching of undergraduates in Harvard College.

  • Presidential Scholar

    Harvard University

  • Defeating Malaria: From the Genes to the Globe Student Fellowship

    Harvard University

    Grant recipient for global malaria research

  • Michael Anderson Award for Academic Excellence and Social Responsibility

    Glynn Family Honors Program, University of Notre Dame

    Awarded to the graduating honors senior who has demonstrated high academic achievement, commitment to social justice, and vision for global change.

  • Phi Beta Kappa

    Epsilon of Indiana

    Most distinguished university academic honor society for the liberal arts and sciences.
    Invited for membership by Phi Beta Kappa faculty panel as 1 of 100 seniors selected from Notre Dame's College of Arts and Letters and College of Science. Based on top academic distinction.

  • Raymond W. Murray, C.S.C. Award in Anthropology

    University of Notre Dame Department of Anthropology

    Given to the top graduating senior for exemplary work in anthropology.

  • George Monteiro Prize

    Kellogg Institute for International Studies, University of Notre Dame

    Awarded to the best paper written in the Portuguese language, for the essay "Espiritismo e etnobotânica do Candomblé no desenvolvimento da identidade dos escravos afro-brasileiros" ("The spiritualism and ethnobotany of Candomblé in the development of the Afro-Brazilian slave identity").

  • Glynn Family Honors Program Undergraduate Research Grant

    University of Notre Dame

    Awarded funds to conduct research in Brazil for summer 2012.

  • Hesburgh-Yusko Scholars Program

    University of Notre Dame

    $100,000 merit-based scholarship offered by the University of Notre Dame in recognition of “distinguished academic accomplishment, exemplary moral character, demonstrated leadership ability, and a sincere commitment to service.” Selected as 1 of 25 scholars in the inaugural class of 2014 from a pool of more than 400 applicants.

Languages

  • English

    Native or bilingual proficiency

  • Portuguese

    Professional working proficiency

  • Spanish

    Professional working proficiency
