Sarah Aerni

San Francisco, California, United States
7K followers 500+ connections

About

Technology leader with over 15 years of experience in machine learning. 9 years in…

Experience

  • Salesforce

    San Francisco

Education

  • Stanford University

    Activities and Societies: Founding member of the Stanford Association for Multi-Disciplinary Medicine and Science (SAMMS), organizing committee member for Biomedical Computing at Stanford (a student-run conference), elected student representative to the executive committee of the BMI program, organizing industry panels.

    Primary faculty advisor: Serafim Batzoglou
    Co-advisor: Stuart Kim
    William R. Hewlett Fellow (Stanford Graduate Fellowships), National Science Foundation Graduate Research Fellow

  • -

    Activities and Societies: A competitive summer program providing medical, graduate and post-graduate students education in the analytical and practical skills required to start a business.

  • -

    Activities and Societies: Phi Beta Kappa Society Member, Caledonian Honor Society at Muir College, course tutor for introductory computer science courses, research with Dr. Ben Raphael

    Minored in French Literature, graduated magna cum laude, 2006 Finalist for Computing Research Association's Outstanding Undergraduate Award Program, Provost’s Honors

Publications

  • A Bioinformatics Guide for Molecular Biologists

    Cold Spring Harbor Laboratory Press

    Informatics can vastly assist progress in research and development in cell and molecular biology and biomedicine. However, many investigators are either unaware of the ways in which informatics can improve their research or find it inaccessible due to a feeling of “informatics anxiety.” This sense of apprehension results from improper communication of the principles behind these approaches and of the value of the many tools available. In fact, many researchers are inherently distrustful of these tools. A more complete understanding of bioinformatics offered in A Bioinformatics Guide for Molecular Biologists will allow the reader to become comfortable with these techniques, encouraging their use—thus helping to make sense of the vast accumulation of data. To make these concepts more accessible, the editors approach the field of bioinformatics from the viewpoint of a molecular biologist, (1) arming the biologist with a basic understanding of the fundamental concepts in the field, (2) presenting approaches for using the tools from the standpoint of the data for which they are created, and (3) showing how the field of informatics is quickly adapting to the advancements in biology and biomedical technologies. All concepts are paired with recommendations for the appropriate programming environment and tools best suited to solve the particular problem at hand. It is a must-read for those interested in learning informatics techniques required for successful research and development in the laboratory.

  • Automated Cellular Annotation for High Resolution Images of Adult C. elegans

    Bioinformatics [ISMB/ECCB] 2013

    Motivation: Advances in high-resolution microscopy have recently made possible the analysis of gene expression at the level of individual cells. The fixed lineage of cells in the adult worm Caenorhabditis elegans makes this organism an ideal model for studying complex biological processes like development and aging. However, annotating individual cells in images of adult C. elegans typically requires expertise and significant manual effort. Automation of this task is therefore critical to enabling high-resolution studies of a large number of genes.

    Results: In this article, we describe an automated method for annotating a subset of 154 cells (including various muscle, intestinal and hypodermal cells) in high-resolution images of adult C. elegans. We formulate the task of labeling cells within an image as a combinatorial optimization problem, where the goal is to minimize a scoring function that compares cells in a test input image with cells from a training atlas of manually annotated worms according to various spatial and morphological characteristics. We propose an approach for solving this problem based on reduction to minimum-cost maximum-flow and apply a cross-entropy–based learning algorithm to tune the weights of our scoring function. We achieve 84% median accuracy across a set of 154 cell labels in this highly variable system. These results demonstrate the feasibility of the automatic annotation of microscopy-based images in adult C. elegans.

  • Reconstruction of genealogical relationships with applications to Phase III of HapMap

    Bioinformatics

    MOTIVATION:
    Accurate inference of genealogical relationships between pairs of individuals is paramount in association studies, forensics and evolutionary analyses of wildlife populations. Current methods for relationship inference consider only a small set of close relationships and have limited to no power to distinguish between relationships with the same number of meioses separating the individuals under consideration (e.g. aunt-niece versus niece-aunt or first cousins versus great aunt-niece).
    RESULTS:
    We present CARROT (ClAssification of Relationships with ROTations), a novel framework for relationship inference that leverages linkage information to differentiate between rotated relationships, that is, between relationships with the same number of common ancestors and the same number of meioses separating the individuals under consideration. We demonstrate that CARROT clearly outperforms existing methods on simulated data. We also applied CARROT on four populations from Phase III of the HapMap Project and detected previously unreported pairs of third- and fourth-degree relatives.
    AVAILABILITY:
    Source code for CARROT is freely available at http://carrot.stanford.edu.

  • Reconstruction of genealogical relationships with applications to Phase III of HapMap.

    Bioinformatics [ISMB/ECCB]

    *Authors should be regarded as joint First Authors.

    Other authors
    • Sofia Kyriazopoulou-Panagiotopoulou*
    • Dorna Kashef Haghighi*
    • Sarah J. Aerni*
    • Andreas Sundquist
    • Sivan Bercovici
    • Serafim Batzoglou
  • Analysis of gene regulation and cell fate from single-cell gene expression profiles in C. elegans

    Cell

    The C. elegans cell lineage provides a unique opportunity to look at how cell lineage affects patterns of gene expression. We developed an automatic cell lineage analyzer that converts high-resolution images of worms into a data table showing fluorescence expression with single-cell resolution. We generated expression profiles of 93 genes in 363 specific cells from L1 stage larvae and found that cells with identical fates can be formed by different gene regulatory pathways. Molecular signatures identified repeating cell fate modules within the cell lineage and enabled the generation of a molecular differentiation map that reveals points in the cell lineage when developmental fates of daughter cells begin to diverge. These results demonstrate insights that become possible using computational approaches to analyze quantitative expression from many genes in parallel using a digital gene expression atlas.

  • BJ Raphael, S Volik, P Yu, C Wu, G Huang, EV Linardopoulou, BJ Trask, FM Waldman, J Costello, KJ Pienta, GB Mills, K Bajsarowicz, Y Kobayashi, S Shivaranjani, P Paris, Q Tao, SJ Aerni, RP Brown, A Bashir, JW Gray, JF Cheng, P de Jong, M Nefedov, T Ried, H

  • BT Messmer*, B Raphael*, SJ Aerni, GF Widhopf, LZ Rassenti, JG Gribben, NE Kay, TJ Kipps "Computational Identification Of CDR3 Sequence Archetypes Among Immunoglobulin Sequences in Chronic Lymphocytic Leukemia" Leukemia Research, Volume 33, Issue 3, Pages

  • Reconstructing Cancer Genome Organization

    BMC Bioinformatics

    A cancer genome is derived from the germline genome through a series of somatic mutations. Somatic structural variants - including duplications, deletions, inversions, translocations, and other rearrangements - result in a cancer genome that is a scrambling of intervals, or "blocks" of the germline genome sequence. We present an efficient algorithm for reconstructing the block organization of a cancer genome from paired-end DNA sequencing data.

    We demonstrate that PREGO efficiently identifies complex and biologically relevant rearrangements in cancer genome sequencing data. An implementation of the PREGO algorithm is available at http://compbio.cs.brown.edu/software/.

  • SJ Aerni, E Eskin “10 Years of the International Conference on Research in Computational Molecular Biology (RECOMB)”, RECOMB 2006: 546-562

Patents

  • RAPID PROCESSING OF BIOLOGICAL SEQUENCE DATA

    Issued US 9,703,925

    In general, one aspect of the subject matter described in this specification is embodied in operations of processing sequence data by selecting a distribution key according to a type of one or more tasks to be performed on the data. The key is one or more data fields of a sequence data file, e.g., a sequence alignment/map (SAM) format or binary sequence alignment/map (BAM) format file, or derived from one or more data fields of a sequence data file. The sequence data is then distributed to multiple nodes of a parallel processing relational database system. The system performs the tasks of processing the sequence data by executing database queries. The system executes the database queries on multiple nodes in parallel. The system can use query optimization functions built into the database to expedite performance of each task.

  • IN-DATABASE SINGLE-NUCLEOTIDE GENETIC VARIANT ANALYSIS

    Issued US 9,594,777

    Genetic data in row-wise flat files, such as VCF and VCF-like files, comprising a plurality of data elements of different types is analyzed using a parallel framework in an MPP shared-nothing distributed database having a plurality of distributed segments by first parsing the data into groups of data elements of the same types, converting the data into entry-wise genetic data such that the same types of data elements are in a column, and distributing and storing the entry-wise genetic data in the distributed segments. SQL database queries are used to analyze the genetic data, including locating probable significant associations between genotype and phenotype data.

  • ELEMENT IDENTIFICATION IN DATABASE

    Issued US 9,569,464

    This document describes, among other things, a computer-implemented method. The method includes obtaining a structured data object having a plurality of nodes that represent elements in the data object. One or more tables that define a table representation of the data object can be generated. The one or more tables can include a plurality of table entries that correspond to the plurality of nodes, respectively. For each of one or more first nodes from among the plurality of nodes, the method can include identifying information about one or more second nodes that are determined to be adjacent or otherwise related to the first node by performing window functions along two or more coordinate systems in the one or more tables. The window function can be centered on a particular table entry that corresponds to the first node of the data object.


Courses

  • Machine Learning at Stanford University

    CS 229

Languages

  • English

    Native or bilingual proficiency

  • German

    Native or bilingual proficiency

  • Swiss German

    Native or bilingual proficiency

  • French

    Limited working proficiency
