Company Overview
Totango + Catalyst have joined forces to build a leading customer growth platform that helps businesses protect and grow their revenue. Built by an experienced team of industry leaders, our software integrates with all the tools CS teams already use to provide one centralized view of customer data. Our modern and intuitive dashboards help CS leaders develop impactful workflows and take the right actions to understand health, prevent churn, increase adoption, and drive expansion.
Position Overview
As a Senior Site Reliability Engineer at Totango + Catalyst, you will help shape our infrastructure and build the foundation our team relies on for the rapid delivery of our product. We’ll depend on you to instill best practices for building scalable distributed systems, emphasizing development experience, observability and fault tolerance. Our current stack consists of technologies such as Ruby on Rails, RDS, Elasticsearch, Java, and Kubernetes, and we are moving towards microservices and serverless. If you thrive in a growth-stage startup environment and are looking for more ownership and the ability to have a significant impact, we would love to meet you.
This role is open to candidates working remotely anywhere in Canada and the U.S.
What You’ll Do
- Manage our AWS infrastructure, with an emphasis on configuration as code
- Keep our site and our services up and running, or get them back up and running quickly when a failure occurs
- Improve monitoring and work with developers to improve performance and reliability
- Participate in technical design reviews and architecture planning
- Debug complex problems across the entire stack and create solid solutions
- Collaborate with product managers and developers to evolve our delivery pipeline
- Work closely with internal partners and teams to ensure that we ship software that meets security, SLA, performance, and budget requirements
- Help build our on-call policies and runbooks
- Take ownership of projects and demonstrate a high level of accountability
- Manage our data infrastructure and pipeline
- Focus on quality, cost-effective scalability, and distributed system reliability, and establish automated mechanisms
Who You Are
- You are passionate about learning. Obstacles and challenges don’t deter you; you see them as opportunities to learn and grow.
- You have a positive demeanor and a go-getter attitude!
- You are a strong team player. You collaborate well with others and want to work together to achieve common goals.
- You are proactive in seeking opportunities to learn and in identifying ways to improve our processes.
What You’ll Need
- 5+ years of experience building and maintaining cloud infrastructure for distributed production systems
- 1+ year of experience as a backend engineer developing enterprise web applications
- Excellent communication skills, both verbal and written
- Comfort working in a Unix/Linux shell, the ability to write shell scripts, and an understanding of Linux internals
- Experience debugging complex problems
- Experience designing, building, and operating large-scale production systems
- Proficiency in Bash, Python, or other scripting languages
- Experience in databases and data warehouses
- Experience with security requirements for SOC2/ISO
- FinOps experience
- Strong project management skills
- A strong desire to show ownership of problems you identify
- Optional: CKAD, CKS, or CKA certification; AWS certifications
Technologies You’ll Need
- Demonstrated experience with configuration and orchestration tools such as Terraform, CloudFormation and Ansible
- Experience with containers, such as Docker
- Experience with administering, securing, and optimizing Kubernetes clusters
- Experience building monitoring, observability, logging, and developer tooling
- Experience with Helm, Kustomize, ArgoCD, Grafana, Prometheus, Thanos, VictoriaMetrics, Cilium, Linkerd, Envoy, AWS App Mesh, CoreDNS
- Experience creating CI/CD pipelines for multiple programming languages
- Experience with one or more: Ruby on Rails, Python, Java, Kotlin, Go, Node.js
- Experience with version control using Git and platforms like GitHub
- Familiarity with AWS services, AWS best practices and securing AWS accounts
- Experience operating and tuning data stores such as PostgreSQL and Elasticsearch
- Experience managing the infrastructure that backs data pipelines and data lakes, including tools such as Airflow
- Experience managing streaming infrastructure such as Kafka or Kinesis
Why You’ll Love Working Here!
- Work from anywhere!
- Highly competitive compensation package, including equity
- Comprehensive benefits, including up to 100% paid medical, dental, & vision insurance coverage for you & your loved ones
- Open vacation policy, encouraging you to take the time you need
- Monthly Mental Health Days and Mental Health Weeks twice per year
- Ability to influence and drive key technical and architectural decisions
- High visibility and impact across the whole company
Your base pay is one part of your total compensation package and is determined within a range. The base salary range for this role is $140,000.00 to $175,000.00 per year. We take into account numerous factors in deciding on compensation, such as experience, job-related skills, relevant education or training, and other business and organizational requirements. The salary range provided corresponds to the level at which this position has been defined.
Totango + Catalyst is an equal opportunity employer, meaning that we do not discriminate based on race, religion, national origin, gender identity, age, sexual orientation, or any other protected class. Diversity is more than just good intentions; we are committed to creating an inclusive environment for all employees.