about 1 month ago
Director, Site Reliability Engineering
$200,000 – $250,000 base plus bonus and benefits
My client is an industry-leading, multinational financial investment services company with one of the best technical environments available, offering plenty of opportunities to further develop your skill set. Based here in Dallas, they are investing in Midwest and South capabilities, with amazing growth plans and progression opportunities for leaders in the SRE space.
This is a great opportunity to join a prestigious banking organization, where you will lead a team focused on technology reference data, automation frameworks, developer enablement and observability globally across the bank.
You will be driving efforts across performance, reliability and risk requirements to meet their customers' expectations.
You should have a strong background leading teams that apply a software engineering mindset to deploying, managing and maintaining the systems and services that deliver business outcomes within their requirements, while solving systemic infrastructure or application architecture gaps and implementing automation to deliver predictability.
- Manage the performance, availability and recovery requirements, along with the standards that all platforms need to adhere to
- Design and create SRE guidelines and oversee the development of their SRE functions
- Be the lead representative in all the customer support, change and control forums
- Create the balance between cost, availability and risk and be able to communicate this to their customers
- Solid proficiency with one or more of the following: Java, Go, Python, C++
- Hands-on experience with developing, debugging and optimizing code, as well as automation
- Strong experience with data structures, software design and algorithms
- Experience with distributed systems design and maintenance
- Experience with databases such as MongoDB, Hadoop, Cassandra or Elasticsearch
- Experience with open-source messaging systems such as Kafka or RabbitMQ
- Strong knowledge of cloud native solutions in AWS or GCP
- Deep knowledge of software development practices
- Programming skills with Java, Python and/or Spark
- Commercial experience with Ansible and Ansible Tower
- Understanding of monitoring and analytics best practices using Prometheus, Splunk, Grafana, Elasticsearch and Kibana
- Knowledge of data pipelines and streaming data
- Ability to configure and tune the observability platforms they use to streamline alerting and proactive issue identification
- Automate manual activities
- Expand performance and load testing capabilities
- Prior experience with DevOps CI/CD tools like Git and Jenkins
- Experience with Kubernetes and Docker