Amin Vahdat
Mountain View, California, United States
11K followers
500+ connections
About
At Google, I lead the global Machine Learning, Systems and Cloud AI team, with…
Activity
-
The second half of my piece on Alan Turing's seminal #AI paper is out, plumbing the strangeness of what he wanted from #ArtificialIntelligence…
Liked by Amin Vahdat
-
Absolutely thrilled about my first PhD student, Saksham Agarwal, who has now left the nest to start at UIUC as an assistant professor. Saksham's…
Liked by Amin Vahdat
-
Today is my last day at Google. I can’t believe it. This post will not do justice to the experiences and my personal journey at Google Cloud…
Liked by Amin Vahdat
Publications
-
Pip: Detecting the Unexpected in Distributed Systems
Proceedings of NSDI
Bugs in distributed systems are often hard to find. Many bugs reflect discrepancies between a system's behavior and the programmer's assumptions about that behavior. We present Pip, an infrastructure for comparing actual behavior and expected behavior to expose structural errors and performance problems in distributed systems. Pip allows programmers to express, in a declarative language, expectations about the system's communications structure, timing, and resource consumption. Pip includes system instrumentation and annotation tools to log actual system behavior, and visualization and query tools for exploring expected and unexpected behavior. Pip allows a developer to quickly understand and debug both familiar and unfamiliar systems.
We applied Pip to several applications, including FAB, SplitStream, Bullet, and RanSub. We generated most of the instrumentation for all four applications automatically. We found the needed expectations easy to write, starting in each case with automatically generated expectations. Pip found unexpected behavior in each application, and helped to isolate the causes of poor performance and incorrect behavior.
-
WAP5: Black-Box Performance Debugging for Wide-Area Systems
Proceedings of WWW
Wide-area distributed applications are challenging to debug, optimize, and maintain. We present Wide-Area Project 5 (WAP5), which aims to make these tasks easier by exposing the causal structure of communication within an application and by exposing delays that imply bottlenecks. These bottlenecks might not otherwise be obvious, with or without the application's source code. Previous research projects have presented algorithms to reconstruct application structure and the corresponding timing information from black-box message traces of local-area systems. In this paper we present (1) a new algorithm for reconstructing application structure in both local- and wide-area distributed systems, (2) an infrastructure for gathering application traces in PlanetLab, and (3) our experiences tracing and analyzing three systems: CoDeeN and Coral, two content-distribution networks in PlanetLab; and Slurpee, an enterprise-scale incident-monitoring system.
More activity by Amin
-
After posting on our 10-year TPU retrospective, it is exciting to see one of the major impacts of our TPU systems work: the training and serving of…
Shared by Amin Vahdat
-
Yesterday, we published a 10-year retrospective on the history of TPUs at Google. The article covers the 'origin story' for how the effort began, its…
Shared by Amin Vahdat
-
I first joined Google in summer 2019 as an MBA intern, and my project was to "evaluate ROI on Alphabet's investments in ML infrastructure". I…
Liked by Amin Vahdat
-
Super proud of the work and partnership between Hyperdisk and the GKE teams! Jonas Lindberg Ruwen Hess Steven Soltis Mohit Agarwal Jackie Pan Jim…
Liked by Amin Vahdat
-
Every time I spend time with my friend will.i.am, I’m inspired, I learn a lot, and we have fun. He’s always ahead of the curve – more than a decade…
Liked by Amin Vahdat
-
Google's Hyperdisk ML is now generally available with up to **12x faster model load times** to help reduce AI/ML inference costs dramatically. So…
Liked by Amin Vahdat
-
Last week I had the privilege of presenting the keynote at Google Cloud Summit Seoul 2024. It was so great to meet Google's customers and partners in…
Liked by Amin Vahdat
-
Power efficiency has never been more important. We examine some of the implications in our just-published article "New Computer Evaluation Metrics…
Shared by Amin Vahdat
-
I was slated to run the Boston Marathon this year after an extended and difficult layoff from injury, and I had been looking forward to redemption…
Liked by Amin Vahdat
-
The tradition of having a dinner with Michael Lyu whenever in Hong Kong is upheld. This time he took me and two of his recent PhDs to a Michelin Star…
Liked by Amin Vahdat