Service Reliability Engineer

December 20, 2024

Job Description

As passionate about our people as we are about our mission.
What We’re All About:
Q2 is proud to deliver our mobile banking platform and technology solutions globally to more than 22 million end users across our 1,300 financial institutions and fintech clients. At Q2, our mission is simple: Build strong, diverse communities by strengthening their financial institutions. We accomplish that by investing in the communities where both our customers and employees serve and live.
What Makes Q2 Special?
Being as passionate about our people as we are about our mission. We celebrate our employees in many ways, including our “Circle of Awesomeness” award ceremony and day of employee celebration, among others! We invest in the growth and development of our team members through ongoing learning opportunities, mentorship programs, internal mobility, and meaningful leadership relationships. We also know that nothing builds trust and collaboration like having fun. We hold an annual Dodgeball for Charity event at our Q2 Stadium in Austin, where we invite other local companies to play and the community organizations we support to join us in raising money and awareness together.
The Job At-A-Glance:
This role combines operational expertise and technical proficiency to drive service reliability, proactive monitoring, and incident response. As a Service Reliability Engineer, you’ll work closely with cross-functional teams to maintain and improve system resiliency, automate recovery processes, and enhance overall user experience. You’ll contribute to a culture that values continuous improvement, automation, and collaboration.
A Typical Day:

  • Define and measure system reliability through SLAs, SLOs, and SLIs.
  • Consult on hosting solutions to identify the best fit for specific services and optimize internal services’ interactions with hosting platforms.
  • Collaborate with Observability and Incident Response teams to implement monitoring and early-warning systems.
  • Automate recovery processes such as failure remediations, auto-rollbacks, and alerting mechanisms.
  • Support incident management processes, post-incident reviews (PIRs), and root cause analysis (RCA).
  • Perform forensic analysis to isolate issues (e.g., hosting platforms, configuration, or service).
  • Partner with developers to drive performance improvements, establish standards, and optimize changes while maintaining the right balance of reliability, speed of innovation, and cost.
  • Optimize capacity planning and resource performance for seamless scalability under high demand.
  • Foster a reliability-focused culture by partnering across engineering, operations, product, and support teams.
  • Engage in performance and chaos engineering practices to strengthen system resilience.

Bring Your Passion, Do What You Love. Here’s What We’re Looking For:

  • Bachelor’s degree in Computer Science, Engineering, or a related field (Master’s preferred)
  • 5-8 years of experience in Service Reliability Engineering, Infrastructure Engineering, Software Engineering, Implementations, or Service Optimization.
  • Proven ability to consult on hosting solutions and optimize internal services for hosting platforms and global capabilities.
  • Track record of implementing SRE principles in complex technical systems and environments.
  • Technical Proficiency: Expertise in system architecture, hosting platform performance, high availability, load balancing, and distributed systems.
  • Tooling Experience: Familiarity with tools like HashiCorp Nomad, Consul, Vault, Confluent Cloud (Kafka), Prometheus, Grafana, and Splunk.
  • Optimization Expertise: Ability to improve service health and narrow the focus of troubleshooting across hosting, configuration, and services.
  • Automation Skills: Proficient in scripting (Python, Go, etc.), orchestration, and infrastructure-as-code (Ansible, Terraform).
  • Incident Management: Experience driving monitoring strategies, root cause analysis, and recovery optimizations.
  • Performance & Chaos Engineering: Capability to implement solutions for failure testing and system improvements.
  • Service-Level Understanding: Knowledge of SLIs, SLOs, and error budget calculations.
  • Strategic Thinking: Ability to balance reliability, innovation speed, and service costs while consulting cross-functionally.
  • Analytical Problem-Solving: Strong forensic analysis skills with natural curiosity for identifying root issues.
  • Collaboration: Proven ability to partner with developers to implement standards, performance enhancements, and engineering changes.
  • Communication: Excellent written and verbal skills to simplify technical concepts for stakeholders.
  • Familiarity with Google’s SRE principles, Agile methodologies, and DevOps practices.
  • Strong belief in automation, risk assessment, and resilience as cornerstones of system reliability.

This position requires fluent written and oral communication in English.
Applicants must be authorized to work for any employer in the U.S. We are unable to sponsor or take over sponsorship of an employment visa at this time.
Health & Wellness

  • Hybrid Work Opportunities
  • Flexible Time Off
  • Career Development & Mentoring Programs
  • Health & Wellness Benefits, including competitive health insurance offerings and generous paid parental leave for eligible new parents
  • Community Volunteering & Company Philanthropy Programs
  • Employee Peer Recognition Programs – “You Earned it”

How We Give Back to the Community:
You can learn more about our Q2 Spark Program, Q2 Philanthropy fund, and our employee volunteering programs on our Q2 Community page. Q2 supports dozens of wide-reaching organizations, such as the African American Leadership Institute and The Trevor Project, promoting diversity and success in leadership and technology. Other deserving beneficiaries include Resource Center, which helps LGBTQ communities; JDRF; and Homes for our Troops, a group helping veterans rebuild their lives with specially adapted homes.
At Q2, our goal is to be a diverse and inclusive workforce that fosters mutual respect for our employees and the communities we serve. Q2 is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status.