SRE Engineer - Apple Services Engineering

Vancouver, British Columbia, Canada
Software and Services

Summary

Posted:
Weekly Hours: 37.5
Role Number: 200565576
The Apple Services Engineering (ASE) SRE team is looking for Site Reliability Engineers with experience in developing processes, tools, and automation for managing distributed systems in production environments. Our SRE team combines software engineering, systems engineering, and system administration practices to build and run large-scale, massively distributed, fault-tolerant systems. Our software ensures that Apple’s services are reliable, scalable, and secure, and we leverage both open source and home-grown technologies to provide managed data infrastructure services. You will help build next-generation search infrastructure and platform services, collaborating cross-functionally with various ASE teams, from store and commerce to search and recommendations. You’ll create platforms that can rapidly scale to serve personalized and non-personalized data with very low latencies. You should be someone who is not afraid to question assumptions, is a standout colleague under tight deadlines, and can take on problems with elegant technical solutions.

Description

The ASE SRE team develops applications and tooling that are safe, reliable, scalable, and fast. Our Data Reliability Engineering team is responsible for all aspects of managing the Voldemort distributed key-value database infrastructure deployed on on-premises bare metal and public cloud platforms, including maintenance, deployment automation, backup, observability, and telemetry, with a focus on reliability, performance, and scaling to deliver continuous data store availability to ASE Media Applications.

Success in this role requires expertise in several of the following:

  • Understanding of core SRE concepts: monitoring, alerting, and incident management
  • Performance engineering (design concepts, profile-guided optimization)
  • Preparing alert-handling procedures and runbooks, and collaborating with other SRE team members
  • Service management across bare metal and virtualized (EC2) platforms
  • Excellent communication and a high degree of customer focus when engaging with internal platform customers
  • Ability to work optimally with colleagues based in other locations is also essential; experience in this area is a plus
  • Prior experience with development or maintenance of distributed databases and operating systems is recommended

Come join us at Apple Services Engineering and help us deliver services and applications that are fluid and responsive. You will collaborate with engineers from across Apple to define the metrics, set targets, uncover optimization opportunities, and ship a service that will delight our customers. This role is for engineers who enjoy deep technical engineering that spans large cross-organizational projects. Your openness to learning and implementing new technologies will contribute to the continuous evolution of our organization. Good ideas are valued and rewarded.

Minimum Qualifications

  • At least 3 years of demonstrated experience in a Site Reliability Engineering, DevOps, or infrastructure-focused role, with preference for distributed database management.
  • Linux expertise
  • Support of internet-facing production services and distributed systems via deployments, on-call, and incident management.
  • Proficiency in implementing and coordinating telemetry using monitoring and observability tools such as Splunk, Grafana, Prometheus, or similar.
  • Hands-on scripting with Python and shell
  • Designing, building and maintaining infrastructure with a cloud provider such as AWS.
  • Automation advocate - prior history of removing operational toil via software.
  • A strong sense of ownership and team camaraderie, with clear and transparent communication.
  • Self-motivated, inquisitive, and always looking to learn more.

Preferred Qualifications

  • Demonstrated expertise in developing distributed systems or storage engines, or in performance engineering.
  • Experience developing critical internet services and/or platform infrastructure.
  • Proficient in one or more of the following programming languages: Java, Go (golang), or Python
  • Experience managing services on Kubernetes
  • Experience with Terraform
  • JVM tuning

Pay & Benefits