Site Reliability Engineer
Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems.
Behind everything my client's users see online is the architecture that the Technical Infrastructure team builds to keep it running. From developing and maintaining data centers to building the next generation of platforms for the business, the SRE team makes it all possible.
The ideal candidate has a passion for building automated systems that deploy, change, monitor and fix complex cloud-based infrastructure and applications. My client expects its engineers to solve tough problems collaboratively, build high-quality products with sophisticated designs, and establish solid lines of communication and trust across the organization.
Responsibilities:
- Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement.
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
- Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
- Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
- Practice sustainable incident response and blameless postmortems.
Qualifications:
- BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience.
- Experience with algorithms, data structures, complexity analysis and software design.
- Experience in one or more of the following: C, C++, Java, Python, Go, Perl or Ruby.
- Interest in designing, analyzing and troubleshooting large-scale distributed systems.
- Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
- Ability to debug and optimize code and automate routine tasks.