Site Reliability Engineer Remote Jobs

49 Results

DevOPS ● Bachelor's degree ● terraform ● Design ● azure ● api ● qa ● java ● c++ ● docker ● kubernetes ● linux ● AWS

iRhythm is hiring a Remote Senior Site Reliability Engineer

Boldly innovating to create trusted solutions that detect, predict, and prevent disease.

Discover your power to innovate while making a difference in patients' lives. iRhythm is advancing cardiac care… Join Us Now!

At iRhythm, we are dedicated, self-motivated, and driven to do the right thing for our patients, clinicians, and coworkers. Our leadership is focused and committed to iRhythm’s employees and the mission of the company. We are better together, embrace change and help one another. We are Thinking Bigger and Moving Faster.

Position: Senior Site Reliability Engineer

Responsibilities include:

iRhythm Technologies, Inc. is seeking a Senior Site Reliability Engineer to provide services for Engineers and SQA to build, provision, deploy and manage their application stacks in AWS and other Cloud Platforms. Write, build and deploy services globally and at scale. Write tools to automate the entire development lifecycle. Work with toolsets including AWS API, Java, Go, configuration management tools and more. Drive standards, tooling and services that are used by internal teams for all aspects of running an application. Work closely with Developers, QA and Operations staff to design and build automated processes for application and database migration, ensuring scalability, reproducibility, auditability and traceability. Provide automation and support throughout the processes required to get a product built and released into test and production environments. Troubleshoot build and deploy issues, and facilitate resolution. Maintain and enhance the automated continuous integration and continuous delivery environment. Evaluate and recommend new tools to improve build and release processes, with a goal of zero downtime releases. Communicate status frequently to all stakeholders. Document any new or changed processes. Telecommuting permitted. 20% domestic and 20% international travel required.

SALARY: $155,958 to $166,200 per year.

JOB REQUIREMENTS:

Requires a Bachelor’s degree in Electronic Engineering, Communications Engineering, Computer Sciences, or a related field, and 4 years of networking and devops experience. Must have experience with: AWS or Azure; Linux system administration skills such as OS deployments, patching and management; Configuration management tools such as Terraform; Container management technologies such as Docker and Kubernetes for automated deployments; Networking concepts and protocols including 5G networks, DNS, TCP/IP, IPSEC SSL VPN and firewalls; and Incident, Change and Operations Management. Telecommuting permitted. 20% domestic and 20% international travel required.

THIS POSITION IS ELIGIBLE FOR THE EMPLOYEE REFERRAL PROGRAM

Actual compensation may vary depending on job-related factors including knowledge, skills, experience, and work location.

Estimated Pay Range

$155,958—$166,200 USD

As a part of our core values, we ensure a diverse and inclusive workforce. We welcome and celebrate people of all backgrounds, experiences, skills, and perspectives. iRhythm Technologies, Inc. is an Equal Opportunity Employer. We will consider for employment all qualified applicants with arrest and conviction records in accordance with all applicable laws.

iRhythm provides reasonable accommodations for qualified individuals with disabilities in job application procedures, including those who may have any difficulty using our online system. If you need such an accommodation, you may contact us at taops@irhythmtech.com

About iRhythm Technologies
iRhythm is a leading digital healthcare company that creates trusted solutions that detect, predict, and prevent disease. Combining wearable biosensors and cloud-based data analytics with powerful proprietary algorithms, iRhythm distills data from millions of heartbeats into clinically actionable information. Through a relentless focus on patient care, iRhythm’s vision is to deliver better data, better insights, and better health for all.

Make iRhythm your path forward. Zio, the heart monitor that changed the game.

Senior Site Reliability Engineer

Invoca ● Remote

salesforce ● c++ ● docker ● kubernetes ● linux

Invoca is hiring a Remote Senior Site Reliability Engineer

About Invoca:

Invoca is the industry leader and innovator in AI and machine learning-powered Conversation Intelligence. With over 300 employees, 2,000+ customers, and $100M in revenue, there are tremendous opportunities to continue growing the business. We are building a world-class SaaS company and have raised over $184M from leading venture capitalists including Upfront Ventures, Accel, Silver Lake Waterman, H.I.G. Growth Partners, and Salesforce Ventures.

About the team

Reliability Engineering is Invoca's foundation. We provide the infrastructure, tools, and observability for Invoca to build whatever is needed. We ensure stability today and enable growth for tomorrow.

We’re organized around three major needs:
- Consulting with development teams.
- Core service ownership and building the future of Invoca infrastructure.
- Research & development to keep our skills sharp and stay ahead of the industry.

The SRE Group is responsible for production uptime, observability, and platform reliability. Invoca takes a highly balanced approach to engineer on-call requirements and believes strongly in service ownership, allowing engineering teams to have autonomy and accountability for the amazing things they build.

The position’s reporting structure is:
Engineer -> Senior SRE Manager -> Director, SRE -> CTO -> CEO

About the Role

Our engineers are thoughtful, hard-working, friendly, and curious. We recognize that problem-solvers are everywhere and encourage you to apply if you:

Are curious, thoughtful, and seek to understand first
Understand and apply systems thinking in your day-to-day work
Operate with a customer-focused approach
Enjoy building trust & relationships with your team, your peers, and your colleagues throughout the organization
Understand reliability engineering principles and can advocate for better practices
Want to show up and solve problems

What you will do:

Provide observability for infrastructure and services across the Invoca platform including tools like Prometheus, Grafana, and Kibana
Provide Kubernetes as a service to development teams
Find new and better ways to scale our infrastructure in response to customer (internal and external) needs
Help enable multi-region and international presence to meet developer expectations
Participate in a one-week on-call rotation for services owned by your team
Solve challenging problems presented by the team and the business
Use metrics and your team’s collective experience to drive development decisions

Qualifications

3 years experience in an SRE (or equivalent e.g. sysadmin, software engineer) role
A background in Linux, Docker, and/or Kubernetes
Solid experience with configuration management and infrastructure as code
Critical thinking and problem solving
Exceptional communication skills
A strong sense of accountability

Salary, Benefits & Perks:

Teammates are eligible to begin receiving benefits on the first day of the month following or coinciding with one month of continuous employment. Below are some of our offerings:

Paid Time Off - Invoca encourages a work-life balance for our employees. We have an outstanding PTO policy, starting at 20 days off, for all full-time employees. We also offer 16 paid holidays, 10 days Compassionate Leave, 3 days volunteer time and more.
Healthcare - Invoca offers a health care program that includes medical, dental and vision coverage. There are multiple plan options to choose from so you can make the best choice for yourself, partner and family.
Retirement - Invoca offers a 401(k) plan through Fidelity with a company match of up to 4%.
Stock options - All employees are invited to ownership in Invoca through stock options.
Employee Assistance Program - Invoca offers well-being support on issues ranging from personal matters to everyday life topics through the WorkLifeMatters program.
Paid Family Leave - Invoca offers up to six weeks 100% paid leave for baby bonding, adoption, and caring for family members.
Paid Medical Leave - Invoca offers up to twelve weeks 100% paid leave for childbirth and medical need.
Sabbatical - We thank our long-term team members with an additional week of PTO along with a bonus after 7 years of service.
Wellness Subsidy - In further support of your well-being, Invoca provides a wellness subsidy that can be applied to a gym membership, fitness classes and more.
Position Base Range - $127,000.00 - $150,000.00/year, plus bonus potential

Recently, we’ve noticed a rise in phishing attempts targeting individuals who are applying to our job postings. These fraudulent emails, posing as official communications from Invoca, aim to deceive individuals into sharing sensitive information. These attacks have attempted to use our name and logo, and have tried to impersonate individuals from our HR team by claiming to represent Invoca.

We will never ask you to send financial information or other sensitive information via email.

DEI Statement

We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender, gender identity or expression, or veteran status. We are proud to be an equal-opportunity workplace.

Site Reliability Engineer

Egnyte ● Remote, India

Full Time ● DevOPS ● golang ● terraform ● Design ● ansible ● api ● docker ● kubernetes ● linux ● python

Egnyte is hiring a Remote Site Reliability Engineer

Description

Site Reliability Engineer

Mumbai, India

EGNYTE YOUR CAREER. SPARK YOUR PASSION.

Egnyte is a place where we spark opportunities for amazing people. We believe that every role has meaning, and every Egnyter should be respected. With 22,000+ customers worldwide and growing, you can make an impact by protecting their valuable data. When joining Egnyte, you’re not just landing a new career, you become part of a team of Egnyters that are doers, thinkers, and collaborators who embrace and live by our values:

Invested Relationships

Fiscal Prudence

Candid Conversations

ABOUT EGNYTE

Egnyte is the secure multi-cloud platform for content security and governance that enables organizations to better protect and collaborate on their most valuable content. Established in 2008, Egnyte has democratized cloud content security for more than 22,000 organizations, helping customers improve data security, maintain compliance, prevent and detect ransomware threats, and boost employee productivity on any app, any cloud, anywhere. For more information, visit www.egnyte.com.

Right now, we are looking for a Site Reliability Engineer. You will be ensuring reliability for large-scale software - we’re talking 22k+ customers, over 6000 instances across geo-distributed Data Centers and Cloud providers, as well as an average of 2k API requests per second as per New Relic. People who own their work from start to finish are integral to Egnyte’s success. Our engineers are part of the whole process: from design through coding and testing to the deployment and back again for further iterations. We are looking for a mid-level engineer eager to apply software development approaches to operations. You can, and will, touch every infrastructure level depending on the day and the project you are working on.

WHAT YOU’LL DO:

Maintain and monitor our environments in a 24/7 rotation system, with partial night shift coverage
Improve our monitoring systems, identify repetitive tasks
Cooperate with international teams
Identify performance challenges
Document and communicate progress on resolving issues

YOUR QUALIFICATIONS:

Experience in an SRE/SysAdmin/DevOps or equivalent role - at least 4 years
Practical experience in managing Linux Operating Systems on the administrative level
Solid Monitoring & DevOps skills
Practical knowledge of container orchestration (Kubernetes, Docker)
Familiarity with at least one of the monitoring tools (e.g. Icinga, New Relic, Prometheus, Grafana, OpenTSDB)
Experience with public cloud services (GCP/AWS/Azure)
Coding skills in Python or Golang
Ability to work effectively in a globally distributed team structure
Drive to grow as a Site Reliability Engineer (we value open-mindedness and a can-do attitude)
Troubleshooting skills to hunt down the root causes of issues and persistence in preventing them from happening again
Experience handling large numbers of diverse systems with configuration management systems like Puppet, Ansible, Terraform
Solid English skills to effectively communicate with other team members (B2 level)

BONUS SKILLS:

Practical Experience using CI/CD tools like Jenkins.
Incident management experience
Experience with Linux HA solutions such as HAProxy

BENEFITS:

Competitive salaries
Company equity depending on role and level
Medical insurance and healthcare benefits for you and your family
Fully paid premiums for life insurance
Flexible hours and PTO
Mental wellness platform subscription
Gym reimbursement
Childcare reimbursement
Group term life insurance

COMMITMENT TO DIVERSITY, EQUITY, AND INCLUSION:

At Egnyte, we celebrate our differences and thrive on our diversity for our employees, our products, our customers, our investors, and our communities. Egnyters are encouraged to bring their whole selves to work and to appreciate the many differences that collectively make Egnyte a higher-performing company and a great place to be.

Site Reliability Engineer II

Oscar Health ● Remote

Design ● c++ ● kubernetes

Oscar Health is hiring a Remote Site Reliability Engineer II

Hi, we're Oscar. We're hiring an Engineer II to join our SRE team.

Oscar is the first health insurance company built around a full stack technology platform and a focus on serving our members. We started Oscar in 2012 to create the kind of health insurance company we would want for ourselves—one that behaves like a doctor in the family.

About the role

As a core member of the Site Reliability Engineering team, you will build reliable and maintainable applications, infrastructure, and interfaces that make interacting with the health care system easier for members and providers. The goal of the team is to create scalable and highly reliable software systems with a focus on building and automating systems that are resilient, fault-tolerant, and self-healing, aiming to bridge the gap between development and operations teams.

You will report to the SRE Staff Engineer.

Work Location:

Oscar is a blended work culture where everyone, regardless of work type or location, feels connected to their teammates, our culture and our mission.

If you live within commutable distance to our New York City office (in Hudson Square), our Tempe office (off the 101 at University Dr), or our Los Angeles office (in Marina Del Rey), you will be expected to come into the office at least two days each week. Otherwise, this is a remote / work-from-home role.

You must reside in one of the following states: Alabama, Arizona, California, Colorado, Connecticut, Florida, Georgia, Illinois, Iowa, Kansas, Kentucky, Maine, Maryland, Massachusetts, Michigan, Minnesota, Missouri, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, Ohio, Oregon, Pennsylvania, Rhode Island, South Carolina, Tennessee, Texas, Utah, Vermont, Virginia, Washington, or Washington, D.C. Note, this list of states is subject to change.

Pay Transparency:

The base pay for this role is $144,000 - $189,000. You are also eligible for employee benefits, participation in Oscar’s unlimited vacation program, company equity grants and annual performance bonuses.

Responsibilities

Consistently write stable, correct, and maintainable code with little oversight; write modular, adaptable code with guidance.
Demonstrate mastery over common uses of available tools, frameworks, libraries, and infrastructure; strong knowledge of available libraries; make judicious choices over what code to write versus what code to import.
Manage Kubernetes clusters and their core components.
Work with partners, product managers, and designers to solve challenging problems.
Collaborate with other engineers on the team to improve technology and apply best practices.
Implement step-wise technical migrations of our existing services and applications.
Own small to medium features or infrastructure projects from technical design through completion with little required guidance.
Act as an independent contributor to the team. Work effectively across the codebase with appropriate guidance from code owners.
Make steady, well-paced progress without requiring frequent significant feedback from more senior engineers.
Compliance with all applicable laws and regulations
Other duties as assigned

Qualifications

2+ years of professional software engineering experience, working with a variety of technologies, with increasingly impactful accomplishments
Experience proposing, experimenting, and iterating, whether it be a new shiny technology or an arcane, ill-conceived data structure; our company may be new, but the health industry isn’t!
Experience making technical contributions and improving the quality of what you create, and excitement about building fault-tolerant, scalable software systems.
Demonstrates solid understanding of the practical application of CS concepts within their team.

Bonus Points

Experience with Hashicorp Vault.
Experience managing production Kubernetes clusters
Designing a Kubernetes based Platform as a Service

This is an authentic Oscar Health job opportunity. Learn more about how you can safeguard yourself from recruitment fraud here.

At Oscar, being an Equal Opportunity Employer means more than upholding discrimination-free hiring practices. It means that we cultivate an environment where people can be their most authentic selves and find both belonging and support. We're on a mission to change health care -- an experience made whole by our unique backgrounds and perspectives.

Pay Transparency: Final offer amounts, within the base pay set forth above, are determined by factors including your relevant skills, education, and experience. Full-time employees are eligible for benefits including: medical, dental, and vision benefits, 11 paid holidays, paid sick time, paid parental leave, 401(k) plan participation, life and disability insurance, and paid wellness time and reimbursements.

Reasonable Accommodation: Oscar applicants are considered solely based on their qualifications, without regard to applicant’s disability or need for accommodation. Any Oscar applicant who requires reasonable accommodations during the application process should contact the Oscar Benefits Team (accommodations@hioscar.com) to make the need for an accommodation known.

California Residents: For information about our collection, use, and disclosure of applicants’ personal information as well as applicants’ rights over their personal information, please see our Notice to Job Applicants.

11d

Site Reliability Engineer

iManage ● Remote

Full Time ● agile ● terraform ● sql ● Design ● azure ● ruby ● c++ ● c# ● .net ● docker ● kubernetes ● linux ● python

iManage is hiring a Remote Site Reliability Engineer

Writing and designing automation, monitoring, diagnosing, and debugging tooling.

20d

Senior Site Reliability Engineer

Acquia ● Remote - Costa Rica

DevOPS ● 9 years of experience ● 6 years of experience ● 3 years of experience ● terraform ● drupal ● Design ● ansible ● azure ● ruby ● java ● kubernetes ● jenkins ● python ● AWS ● PHP

Acquia is hiring a Remote Senior Site Reliability Engineer

Acquia empowers the world’s most ambitious brands to create digital customer experiences that matter. With open source Drupal at its core, the Acquia Digital Experience Platform (DXP) enables marketers, developers, and IT operations teams at thousands of global organizations to rapidly compose and deploy digital products and services that engage customers, enhance conversions, and help businesses stand out.

Headquartered in the U.S., Acquia is positioned as a market leader by the analyst community and is listed as one of the world’s top software companies by The Software Report. We are Acquia. We are a global company with employees located in more than 30 countries, and we’re building for the future. We want you to be a part of it!

About the role:

As a Senior Site Reliability Engineer, you will be a key player in designing, implementing, and maintaining our CI/CD pipelines, cloud infrastructure, and monitoring solutions. Your expertise in tools like ArgoCD, Kubernetes, and cloud-native architecture will help us achieve operational excellence at scale. You will work closely with engineering teams to ensure they have the right infrastructure in place to deploy rapidly, safely, and reliably.

This is a hands-on role for someone who thrives in an environment where automation is the goal, reliability is the baseline, and scalability is second nature. You won’t just be maintaining systems—you’ll be innovating, designing new ways to make our infrastructure smarter and our development faster.

Job Responsibilities:

CI/CD Pipeline Mastery: Design, build, and optimize continuous integration and continuous deployment (CI/CD) pipelines using ArgoCD, Jenkins, or similar tools. Ensure zero-downtime, fully automated deployment pipelines.
Infrastructure as Code (IaC): Build and manage scalable, reliable infrastructure using Terraform, Kubernetes, and other IaC tools. Ensure everything is automated—from deployments to monitoring—so that infrastructure becomes a self-service platform.
Cloud Expertise: Architect and manage cloud environments (AWS, GCP, or Azure), focusing on cost optimization, scalability, and performance. Implement disaster recovery, fault tolerance, and high availability strategies.
Monitoring and Alerting: Implement comprehensive monitoring solutions using Prometheus, Grafana, ELK, and Datadog to detect and resolve performance bottlenecks before they impact customers. Design and implement automated alerts for proactive system health monitoring.
DevOps Advocacy: Champion the culture of DevOps across teams—promote best practices, encourage adoption of new technologies, and drive a continuous learning mindset within the engineering teams. Be the go-to person for CI/CD, infrastructure scaling, and deployment automation.
SRE Mindset: Focus on building systems that are resilient by design, automating processes that improve reliability, and implementing Service Level Objectives (SLOs) to align engineering efforts with operational goals.
Security-First Approach: Collaborate with security teams to implement robust security practices, from container security to infrastructure hardening. Automate security checks within the pipeline for compliance and vulnerability management.
Collaboration with Engineering Teams: Work hand-in-hand with product development teams to understand their needs, integrate CI/CD practices into their workflows, and provide a fast, reliable, and secure path from code to production.

Skills:

BS in Computer Science or a comparable field of study, or equivalent practical experience.
Experience working with one or more of: Go, Python, Ruby, PHP, Java or JavaScript.
Experience with Unix/Linux systems administration using the CLI.
Fundamental understanding of TCP/UDP networking concepts
Solid oral and written communications skills.
CI/CD Expertise: Extensive hands-on experience with CI/CD tools such as ArgoCD, Jenkins, CircleCI, or GitLab CI. Ability to design and implement pipelines that ensure rapid, reliable deployments.
Kubernetes Guru: Strong understanding and experience with Kubernetes, Helm, and container orchestration. Ability to scale and manage microservices in production.
Cloud Mastery: Proficient in at least one major cloud provider—AWS, GCP, or Azure. Experience with multi-cloud or hybrid-cloud architecture is a plus.
IaC Champion: Proficiency in Terraform, Ansible, or CloudFormation to manage infrastructure as code. Familiarity with GitOps workflows and version-controlled infrastructure.
Monitoring & Observability: Strong experience with monitoring tools like Prometheus, Grafana, Datadog, ELK, or New Relic. Ability to build custom dashboards and alerting systems.
Security-Focused: Deep understanding of security best practices in DevOps, including container security, CI/CD pipeline security, and cloud infrastructure hardening.
Problem Solver: Excellent troubleshooting skills with the ability to diagnose issues across a variety of environments, from code to infrastructure.
Collaboration Skills: Ability to work effectively in cross-functional teams, influencing peers and driving adoption of best practices across the organization.

Preferred Qualifications:

5-9 years of hands-on experience as a DevOps Engineer, SRE, or related role in a cloud-native environment.
Deep knowledge of CI/CD pipelines, especially using ArgoCD or similar tools.
Proven expertise in cloud platforms (AWS, GCP, Azure), with experience building and managing scalable, reliable infrastructure.
Strong scripting skills in Python, Go, or Bash.
Experience with service mesh architectures like Istio or Linkerd is a plus.
SRE Certification (or equivalent experience) is a bonus.
Certified Kubernetes Administrator (CKA) is preferred.
A passion for automation, observability, and reliability.

All qualified applicants will receive consideration for employment without regard to race, color, religion, religious creed, sex, national origin, ancestry, age, physical or mental disability, medical condition, genetic information, military and veteran status, marital status, pregnancy, gender, gender expression, gender identity, sexual orientation, or any other characteristic protected by local law, regulation, or ordinance.

20d

Staff Site Reliability Engineer

Acquia ● Remote - Costa Rica

Acquia is hiring a Remote Staff Site Reliability Engineer

Acquia empowers the world’s most ambitious brands to create digital customer experiences that matter. With open source Drupal at its core, the Acquia Digital Experience Platform (DXP) enables marketers, developers, and IT operations teams at thousands of global organizations to rapidly compose and deploy digital products and services that engage customers, enhance conversions, and help businesses stand out.

Headquartered in the U.S., Acquia is positioned as a market leader by the analyst community and is listed as one of the world’s top software companies by The Software Report. We are Acquia. We are a global company with employees located in more than 30 countries, and we’re building for the future. We want you to be a part of it!

About the role:

As a Staff Site Reliability Engineer, you will be a key player in designing, implementing, and maintaining our CI/CD pipelines, cloud infrastructure, and monitoring solutions. Your expertise in tools like ArgoCD, Kubernetes, and cloud-native architecture will help us achieve operational excellence at scale. You will work closely with engineering teams to ensure they have the right infrastructure in place to deploy rapidly, safely, and reliably.

Job Responsibilities:

CI/CD Pipeline Mastery: Design, build, and optimize continuous integration and continuous deployment (CI/CD) pipelines using ArgoCD, Jenkins, or similar tools. Ensure zero-downtime, fully automated deployment pipelines.
Infrastructure as Code (IaC): Build and manage scalable, reliable infrastructure using Terraform, Kubernetes, and other IaC tools. Ensure everything is automated—from deployments to monitoring—so that infrastructure becomes a self-service platform.
Cloud Expertise: Architect and manage cloud environments (AWS, GCP, or Azure), focusing on cost optimization, scalability, and performance. Implement disaster recovery, fault tolerance, and high availability strategies.
Monitoring and Alerting: Implement comprehensive monitoring solutions using Prometheus, Grafana, ELK, and Datadog to detect and resolve performance bottlenecks before they impact customers. Design and implement automated alerts for proactive system health monitoring.
DevOps Advocacy: Champion the culture of DevOps across teams—promote best practices, encourage adoption of new technologies, and drive a continuous learning mindset within the engineering teams. Be the go-to person for CI/CD, infrastructure scaling, and deployment automation.
SRE Mindset: Focus on building systems that are resilient by design, automating processes that improve reliability, and implementing Service Level Objectives (SLOs) to align engineering efforts with operational goals.
Security-First Approach: Collaborate with security teams to implement robust security practices, from container security to infrastructure hardening. Automate security checks within the pipeline for compliance and vulnerability management.
Collaboration with Engineering Teams: Work hand-in-hand with product development teams to understand their needs, integrate CI/CD practices into their workflows, and provide a fast, reliable, and secure path from code to production.

Skills:

BS in Computer Science or a comparable field of study, or equivalent practical experience.
Experience working with one or more of: Go, Python, Ruby, PHP, Java or JavaScript.
Experience with Unix/Linux systems administration using the CLI.
Fundamental understanding of TCP/UDP networking concepts
Solid oral and written communications skills.
CI/CD Expertise: Extensive hands-on experience with CI/CD tools such as ArgoCD, Jenkins, CircleCI, or GitLab CI. Ability to design and implement pipelines that ensure rapid, reliable deployments.
Kubernetes Guru: Strong understanding and experience with Kubernetes, Helm, and container orchestration. Ability to scale and manage microservices in production.
Cloud Mastery: Proficient in at least one major cloud provider—AWS, GCP, or Azure. Experience with multi-cloud or hybrid-cloud architecture is a plus.
IaC Champion: Proficiency in Terraform, Ansible, or CloudFormation to manage infrastructure as code. Familiarity with GitOps workflows and version-controlled infrastructure.
Monitoring & Observability: Strong experience with monitoring tools like Prometheus, Grafana, Datadog, ELK, or New Relic. Ability to build custom dashboards and alerting systems.
Security-Focused: Deep understanding of security best practices in DevOps, including container security, CI/CD pipeline security, and cloud infrastructure hardening.
Problem Solver: Excellent troubleshooting skills with the ability to diagnose issues across a variety of environments, from code to infrastructure.
Collaboration Skills: Ability to work effectively in cross-functional teams, influencing peers and driving adoption of best practices across the organization.

Preferred Qualifications:

8-13 years of hands-on experience as a DevOps Engineer, SRE, or related role in a cloud-native environment.
Proven experience mentoring junior team members.
Deep knowledge of CI/CD pipelines, especially using ArgoCD or similar tools.
Proven expertise in cloud platforms (AWS, GCP, Azure), with experience building and managing scalable, reliable infrastructure.
Strong scripting skills in Python, Go, or Bash.
Experience with service mesh architectures like Istio or Linkerd is a plus.
SRE Certification (or equivalent experience) is a bonus.
Certified Kubernetes Administrator (CKA) is preferred.
A passion for automation, observability, and reliability.

See more jobs at Acquia

Apply for this job

26d

Senior Site Reliability Engineer (Bridge) HUN, Budapest, Remote

LTG, Budapest, HU - Remote

Lambda ● jira ● terraform ● slack ● ruby ● typescript ● kubernetes ● AWS

LTG is hiring a Remote Senior Site Reliability Engineer (Bridge) HUN, Budapest, Remote

People Matter Most!

We are a global team of Engineers, Product Managers, Designers, and Program Managers across Hungary, the US, and many other countries. We help our customers create work cultures people love.

About the Product

GetBridge was founded to define, develop, and deploy world-class, easy-to-use software; and that’s what we do and will keep on doing. We make better, more usable tools for teaching, learning and career management, stuff people will actually use. Are you interested?

So here are our questions to you:

Do you have a “Challenge Accepted” attitude?

You belong with us, if you are:

A problem solver who asks questions to get at the core issue the team is grappling with before deciding on a solution and a pragmatist who knows how to make trade-offs to solve challenges while building an architecture that scales for the future.
An owner who is capable of leading and delivering complex projects involving multiple teams while also caring about cloud operations for dozens of services across multiple regions, environments, and language stacks.
A builder who loves implementing automation to reduce toil and enable healthy systems by default and building tools and resources for upskilling other engineering teams to make service creation and maintenance self-service.
A watcher who likes configuring observability systems to identify incidents before they happen, respond to incidents, and contribute to a continuous improvement culture with occasional participation in 24/7 on-call rotations.
A learner who loves to learn new things; improving yourself is encoded in your DNA.
A mentor who supports the development and growth of their colleagues.

Knowledge is power; are you armored?

Here’s our tech stack - what you will learn:

At least one modern programming language (Java/Kotlin, Ruby, React & Typescript)
Cloud-based providers (AWS, Kubernetes, Aurora, EKS, Lambda, Pulsar and Apigee)
Cloud networking configuration (VPCs, security groups, load balancers, DNS, etc).
Configuration as code (Terraform)
System observability (Datadog, Sentry)
CI/CD: GitHub, Spinnaker
CMO: SAFe, JIRA, Confluence, Slack, GSuite

Do you like things to be in balance?

Our offer focuses on your:

Healthy work-life balance: We have a great office in Allee Corner where you are welcome, but there is no mandate to get to work on a regular basis. Our employees enjoy the freedom to manage their working hours.
Personal growth: We want to bring out the best in you through several things: learning days, quarterly hack weeks, LinkedIn Learning, mentorship, a career development plan, and training opportunities from the first day.
Financial stability: We offer you a competitive salary package (1.7-2.1m gross / month depending on your seniority), bonus (based on the performance of the company), a comprehensive healthcare package provided by Medicover, SZÉP card, and other fringe benefits.

We are an Equal Opportunity Employer and do not discriminate against any employee or applicant for employment because of race, colour, sex, age, national origin, religion, sexual orientation, gender identity, status as a veteran, and basis of disability or any other federal, state or local protected class.

See more jobs at LTG

Apply for this job

27d

Site Reliability Engineer

Hack The, Alimos, Attica, Greece, Remote Hybrid

terraform ● Design ● mobile ● docker ● kubernetes ● python ● AWS

Hack The is hiring a Remote Site Reliability Engineer

Ready to embark on the quest of joining Hack The Box?

At the end of this thrilling journey, you'll become a proud member of Hack The Box, with the ultimate mission to help cybersecurity professionals and organizations enhance their cyber-attack readiness. Get ready for an exciting adventure into the world of cybersecurity!

The Core Mission of the Site Reliability Engineer (SRE):
As a Site Reliability Engineer at Hack The Box, your paramount mission is to assist the seamless migration to AWS, strategically positioning our infrastructure to scale effectively with the company. Over the next 6 months, you will participate in enhancing our capabilities for expansion, setting the stage for the addition of new systems such as Kubernetes clusters, Services, and Databases. Additionally, your focus will shift towards establishing key performance indicators, service level objectives, and incident response metrics to drive a culture of reliability and continuous improvement.

The Fellowship You'll Be Joining:
You’ll join a team of 4 SREs, while collaborating closely with engineers, data scientists, and security experts. Finally, you will report directly to the SRE Lead and will have open communications with infrastructure department management and other high-caliber technical people across the organization.

Technology Tools & Weapons You'll Be Using:

Infrastructure as Code (Terraform): Automate the provisioning of AWS resources.
Containerization and Orchestration (Kubernetes, Flux CD): Ensure seamless deployment and scaling of applications.
Monitoring and Logging (Prometheus, Mimir, Grafana, Loki): Expand monitoring capabilities for new systems.
Automation and Scripting (Go, Python, etc): Scripting for efficient and automated processes.
Cloud Platforms (AWS): Execute the migration plan with a focus on AWS.

The Adventures That Await You After Becoming a Site Reliability Engineer at Hack The Box:

Heavily contribute to the AWS Migration for Scalability: Spearhead the migration from the current cloud provider towards AWS, strategically positioning our infrastructure for scalable growth across regions.
Expand Monitoring Stack: Integrate new systems into the Monitoring Stack, enhancing visibility and alerting capabilities for a globally distributed architecture.
Architectural Design for Reliability: Contribute to the design and implementation of reliable AWS infrastructure, focusing on fault tolerance and high availability.
Establish Metrics Framework: Implement and manage Service Level Agreements (SLAs), Service Level Objectives (SLOs), and Service Level Indicators (SLIs) to measure and improve system reliability.
Incident Response Enhancement: Develop and enhance incident response processes, leveraging metrics to continually improve response times and effectiveness.
Mentorship: Mentor and guide junior SREs in adapting to the AWS environment and implementing reliability best practices.
Collaborative Planning: Work closely with cross-functional teams to plan and implement new systems effectively, ensuring alignment with reliability goals.
Team Expansion: Play a key role in the team's expansion, contributing to the mentoring of junior members.
Best Practices Advocacy: Champion best practices in AWS architecture and SRE methodologies, fostering a culture of reliability and continuous improvement.

Skills, Knowledge, and Experience Points Required to Unlock the Role of SRE at Hack The Box:

Hands-on Experience: Minimum 2 years of hands-on experience in site reliability engineering or a related field.
Automation Skills: Proficient in scripting and automation using languages such as Go, Python or Bash.
Cloud Expertise: In-depth knowledge of cloud platforms, particularly AWS.
Containerization: Experience with containerization technologies (Docker) and orchestration (Kubernetes).
Monitoring Mastery: Strong expertise in implementing and managing monitoring and logging solutions.
Metrics Framework: Proven experience establishing and managing SLAs, SLOs, and SLIs.
Problem Solving: Proven ability to troubleshoot complex system issues and implement effective solutions.
Collaborative Mindset: Excellent collaboration and communication skills, with a strong ability to work cross-functionally and mentor junior team members.

What your Hack The Box adventure will have in store:

You'll have the exhilarating opportunity to contribute to a product that is highly appreciated by users and the cybersecurity community at large.
You'll experience a highly supportive and caring environment, fostering growth, flexibility, and autonomy.
You'll embark on an exciting journey of continuous learning and problem-solving, leveling up as our organization grows.
Most importantly, you'll have a blast at HTB because fun is an essential ingredient in our recipe for success! Just wait until you see our global meet-ups!

The gems you’ll be enjoying as a Site Reliability Engineer:

Private insurance
25 annual leave days
Dedicated budget for training and professional development, participation in conferences
State-of-the-art equipment (Macbook, iPhone, and mobile plan)
Free lunch & snacks at the office
Full access to the Hack The Box lab offerings, so you can learn how to hack
Flexible/Hybrid working

The Quest of Becoming Hack The Box’s Site Reliability Engineer:

Level 1: To complete level one’s objective, submit your application.
Level 2: Meet the Talent Acquisition team. Level’s objective: highlight your past achievements, ambitions, and values.
Level 3: Meet the hiring team. Level’s objective: connect with the hiring team and share with them your achievements.
Level 4: Complete an assignment that aligns with day-to-day job-related tasks and responsibilities. Part of the assignment is discussing it with the hiring team in a debriefing session, in order to walk the team through your thinking process.
Level 5: Congratulations! Not many reach this level. Level’s objective: have a constructive, final conversation with senior leadership to explore the role and your future at HTB.
Level 6: You've officially received an offer from HTB! To complete the last level and the Quest, all you need to do is accept the offer.
Quest complete. Congratulations, you’re officially one of us! Your next quest: complete the onboarding.

Hack Your Career, Today. Join us in this epic adventure of cybersecurity at Hack The Box!

At Hack The Box, we are on a quest to find the most exceptional and enthusiastic talent to join our team. Whether or not you consider yourself a gamer, we value what makes you unique and want to know more about you. This job post provides just a glimpse of the incredible gamified experience our business and consumer customers enjoy through our platforms. So, if you're ready to embark on a journey of growth and adventure, we can't wait to meet you!

ABOUT HACK THE BOX

Hack The Box is the Cyber Performance Center with the mission to provide a human-first platform to create and maintain high-performing cybersecurity individuals and organizations. Hack The Box is the only platform that unites upskilling, workforce development, and the human focus in the cybersecurity industry, and it’s trusted by organizations worldwide for driving their teams to peak performance. Offering an all-in-one environment for continuous growth, assessment, and recruitment, Hack The Box provides solutions for all cybersecurity domains.

Launched in 2017, Hack The Box brings together the largest global cybersecurity community of more than 2.6 million platform members. Rapidly growing its international footprint and reach, Hack The Box is headquartered in the UK, with additional offices in the US, Australia, and Greece.

Exciting News:

We are super proud to share that all three HTB entities across the UK, US, and Greece have been Certified as a Great Place to Work (Oct 2023-Oct 2024).
Furthermore, HTB's Greek entity has been listed by the Great Place to Work Institute as the #4 Best Workplace in Greece and #7 in Europe for 2023, among more than 3,300 companies.
Get more insights about our HTB culture and employee experience by visiting our career site and Glassdoor.

At Hack The Box, we are committed to fostering a diverse, inclusive, and equitable workplace. We believe that diversity enriches our performance, services, and the communities we serve. As such, we ensure that all job applications are considered solely based on merit, skills, and qualifications. We do not discriminate on grounds of race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status. We are dedicated to providing a fair and respectful work environment that reflects our values.

See more jobs at Hack The

Apply for this job

28d

Senior Site Reliability Engineer (Turkey)

Sezzle, Türkiye, Remote

Sales ● DevOPS ● Bachelor's degree ● terraform ● sql ● Design ● c++ ● docker ● kubernetes ● linux ● python ● AWS

Sezzle is hiring a Remote Senior Site Reliability Engineer (Turkey)

About the Role:

We are looking for a Site Reliability Engineer to work on our core Infrastructure and Security team, to assist us with designing, building, running, improving and scaling the infrastructure that engineering and data teams use to power their services. Your duties will include the development, testing, and maintenance of our serving and data platforms, using a combination of cloud products, open source tools and internal applications. Your duties will blend software development and operations in order to continuously automate our environments. You should be able to build high-quality, scalable solutions for a variety of problems.

Our Company:

Sezzle is a cutting-edge fintech company whose long-standing mission is to financially empower the next generation. Sezzle has built a payment platform that increases purchasing power for consumers by offering interest-free installment plans. This increase in purchasing power for consumers leads to increased sales and basket sizes for the numerous eCommerce merchants that currently work with Sezzle.

What Makes Working at Sezzle Awesome?

At Sezzle, we are more than just brilliant engineers, passionate data enthusiasts, out-of-the-box thinkers, and determined innovators; we are skilled musicians, yogis, cyclists, chefs, golfers, dog-lovers, and rock-climbers. We believe in surrounding ourselves with not only the best and the brightest individuals, but those that are unique and purpose-driven in all that they do. Our culture is not defined by a certain set of perks designed to give the illusion of the traditional startup culture, but rather, it is the visible example living in every employee that we hire.

Responsibilities:

Design, build and maintain scalable infrastructure for running our systems, based on Kubernetes, Redshift and additional AWS services and products.
Help the product teams quickly build out MVP products to test new solutions on the market.
Maintain and develop monitoring and alerting solutions to improve the on-call experience.
Assist product developers in debugging and triaging production issues.
Be the first line of defense for our operational environments, triaging and resolving problems as they occur. You will be on an on-call rotation.
Design and scale platform and data architectures to sustain rapid user growth.
Level up the teams through pairing, code review, and mentoring.
Bring and share with our team extensive experience with industry best practices in software development.

Minimum Requirements:

Bachelor's in computer science (preferred) or equivalent related experience
At least 5 years of overall software, data, deployments and platform infrastructure experience.

Ideal Skills & Experience:

Experience with building and/or serving REST APIs using Go or a similar language.
Experience with Relational Databases, SQL and ORM technologies.
Strong overall Linux knowledge.
DevOps experience with CI/CD pipelines, Docker and Kubernetes, and cloud computing platforms like AWS.
Experience with deployment/provisioning tools like Terraform, Helm, Ansible.
Experience with implementing and maintaining observability and monitoring tools - Prometheus, Datadog, NewRelic, Grafana, Loki or similar.
Experience in ETL/ELT pipelines using Python and Open-source tools such as DBT.
Proficiency in building and maintaining large-scale data warehousing technologies such as Redshift.

About You:

A+ character. We are team-first here at Sezzle.
A hard-working mentality. It’s early and there is still a lot to build.
An excellent communicator.
A fun attitude. Life’s too short. We can have fun while we work hard on cool things.
Smarts. We need people that are smart enough to make decisions on their own and also smart enough to know when they need input from others.

Sezzle’s Technology Stack:

Languages: Golang, Typescript, Python
Frontend: Typescript - React and React Native
Backend: Golang
Database: MySQL, Postgres, Elasticsearch
DevOps & Cloud: AWS, Kubernetes
Version Control: Git
CI/CD: Gitlab
Testing: Developer-driven, focus on automated unit, integration, and end-to-end tests
Sezzle is focused on using open source, and we build what we can before buying!

Compensation

The compensation range for the role is as follows:

4,600 - 9,000 USD Monthly

Equal Employment Opportunity: Sezzle Inc. is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate based on race, color, religion, sex, national origin, age, disability, genetic information, pregnancy, or any other legally protected status. Sezzle recognizes and values the importance of diversity and inclusion in enriching the employment experience of its employees and in supporting our mission.

See more jobs at Sezzle

Apply for this job

28d

Principal Site Reliability Engineer

ScienceLogic, Reston, VA or Remote

DevOPS ● agile ● remote-first ● terraform ● Design ● mobile ● linux ● python ● AWS

ScienceLogic is hiring a Remote Principal Site Reliability Engineer

*This position can be remote within the United States*

Who we are...

In a world of constant change, we're leading the charge towards truly autonomous enterprises. Our cutting-edge platform harnesses the power of automation and generative AI to revolutionize how businesses manage and optimize their IT operations.

We're not just adapting to digital transformation—we're accelerating it. Our solutions bring business and operations leaders together, unlocking new levels of innovation, efficiency, and scalability. We empower organizations to deliver superior customer experiences and drive revenue growth in an always-on, always-mobile world.

At ScienceLogic, we're building the foundation for Autonomic IT—a future where IT operations are self-healing, self-optimizing, and aligned perfectly with business objectives. Our team of visionaries is reshaping the $18+ billion IT operations market, creating cost-optimized, efficient, and next-level capabilities for enterprises worldwide.

ScienceLogic is going through a product transformation and the Site Reliability Engineering (SRE) team is at the forefront of it. We are responsible for the design, deployment, and maintenance of the Cloud Infrastructure used for running the company’s revenue-generating, go-forward SaaS product line. Overall, we’re passionate about automation and solving complex business and technology challenges. Our team combines SRE, DevOps, Software Development and Information Security knowledge to help make Cloud operations agile and elastic within security and governance framework boundaries.

What we’re looking for…

We are looking for a Principal Site Reliability Engineer who is well versed in building cloud technologies in a secure manner, has an automation mindset and is an ardent follower of the SRE discipline. If this sounds like you, then our team will benefit from your skillset!

What you’ll be doing…

Enhance the company’s SaaS infrastructure security protocols.
Collaborate across the organization to design, build and operationalize SaaS services conforming to various security standards like FedRAMP, SOC2, ISO etc.
Participate in architecture, security, and operations reviews.
Lead design reviews and buildout of secure systems for delivering various SaaS services with 99.99% uptime.
Design, automate, test, and monitor the use of cloud native technologies as a foundation for a service platform.
Investigate and resolve customer and operational issues with the mentality of fixing and not just mitigating issues.
Identify and automate measurement of operations SLAs and SLOs.
Triage incidents, document SOPs and runbooks, and train NOC team members.
Write automation that can be easily supported and extended by others.
Work on special projects as assigned.

Qualities you possess…

Here at Site Reliability, we believe that if you are hungry for learning, passionate about technology and like building tools, then you are a good fit. Having experience with the following skills is an added plus:

Must be a U.S. Citizen.
7-10 years of site reliability engineering or cloud operations experience or equivalent experience.
Proven track record of operating production SaaS environments within security standards like FedRAMP, SOC2, ISO, PCI.
Bachelor's or Master's degree in Computer Science, Information Systems or similar field.
Skilled at problem solving, algorithms, and data structures conforming to the modern SaaS security requirements.
Building tools and scripting frameworks from scratch.
Working with Cloud Automation tools like CloudFormation, Terraform, CDK, aws-cli.
Scripting languages like Python, Groovy, PowerShell, Bash, Perl etc.
Exposure to Windows and Linux administration skills.
Familiarity with basic networking, security and cloud engineering concepts.
Highly collaborative with effective written and verbal communication skills.
Ability to work against tight deadlines.
Occasionally work during off-hours and participate in the weekly on-call schedule.
Take full responsibility for the availability and performance of the platform.

Benefits & Perks

A remote-first culture - work from home or come into the office, it's totally up to you.
Comprehensive medical, dental and vision plans.
401(k) plan with employer match.
Flexible Paid Time Off (FTO) so that you can take the time that you need to re-energize.
Volunteer Time Off (VTO) - take two days off per calendar year to volunteer with your preferred charitable organization.
5-year Service Milestone Sabbatical.
Paid parental leave.
Generous employee referral bonus program.
Pet insurance.
HQ Office centrally located in Reston Town Center featuring a well-stocked kitchen with rotating snacks and beverages, and catered lunch on Thursdays.
Regular virtual company-wide events, including cooking classes, yoga, meditation and more.
The opportunity to learn and develop from some of the best and brightest minds in the industry!

Don’t meet every single requirement? Studies have shown that women and people of color are less likely to apply to jobs unless they meet every single qualification. At ScienceLogic, we are dedicated to building a diverse, inclusive and authentic workplace, so if you’re excited about this role but your past experience doesn’t align perfectly with every qualification in the job description, we encourage you to apply anyway. You may be just the right candidate for this or other roles.

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other applicable legally protected characteristics in the location in which you are applying.

About ScienceLogic

ScienceLogic empowers intelligent, automated IT operations, freeing up time and resources, and driving business outcomes with actionable insights. ScienceLogic’s AIOps platform sees broadly across clouds and on-premises, enabling business service visibility with relationship mapping, and workflow automation to eliminate manual tasks. Trusted by thousands of organizations across the globe, ScienceLogic’s technology has been proven for scale by the world’s largest service providers, enterprises and government agencies.

www.sciencelogic.com

All ScienceLogic employees have the responsibility to protect information assets, adhere to access controls, report suspicious activity, and comply with security and privacy policies.

See more jobs at ScienceLogic

Apply for this job

+30d

Site Reliability Engineer (Bridge) HUN, Budapest, Remote

LTG, Budapest, HU - Remote

Lambda ● jira ● terraform ● slack ● ruby ● typescript ● kubernetes ● AWS

LTG is hiring a Remote Site Reliability Engineer (Bridge) HUN, Budapest, Remote

People Matter Most!

We are a global team of Engineers, Product Managers, Designers, and Program Managers across Hungary, the US, and many other countries. We help our customers create work cultures people love.

About the Product

So here are our questions to you:

Do you have a “Challenge Accepted” attitude?

You belong with us, if you are:

A problem solver who asks questions to get at the core issue the team is grappling with before deciding on a solution and a pragmatist who knows how to make trade-offs to solve challenges while building an architecture that scales for the future.
An owner who is capable of leading and delivering complex projects involving multiple teams while also caring about cloud operations for dozens of services across multiple regions, environments, and language stacks.
A builder who loves implementing automation to reduce toil and enable healthy systems by default and building tools and resources for upskilling other engineering teams to make service creation and maintenance self-service.
A watcher who likes configuring observability systems to identify incidents before they happen, respond to incidents, and contribute to a continuous improvement culture with occasional participation in 24/7 on-call rotations.
A learner who loves to learn new things; improving yourself is encoded in your DNA.
A mentor who supports the development and growth of their colleagues.

Knowledge is power; are you armored?

Here’s our tech stack - what you will learn:

At least one modern programming language (Java/Kotlin, Ruby, React & Typescript)
Cloud-based providers (AWS, Kubernetes, Aurora, EKS, Lambda, Pulsar and Apigee)
Cloud networking configuration (VPCs, security groups, load balancers, DNS, etc).
Configuration as code (Terraform)
System observability (Datadog, Sentry)
CI/CD: GitHub, Spinnaker
CMO: SAFe, JIRA, Confluence, Slack, GSuite

Do you like things to be in balance?

Our offer focuses on your:

Healthy work-life balance: We have a great office at MOM Park where you are welcome, but there is no mandate to get to work on a regular basis. Our employees enjoy the freedom to manage their working hours.
Personal growth: We want to bring out the best in you through several things: learning days, quarterly hack weeks, LinkedIn Learning, mentorship, a career development plan, and training opportunities from the first day.
Financial stability: We offer you a competitive salary package (1.4 - 1.9M HUF gross / month depending on your seniority), bonus (based on the performance of the company), a comprehensive healthcare package provided by Medicover, SZÉP card, and other fringe benefits.

See more jobs at LTG

Apply for this job

+30d

Site Reliability Engineer II

Signify Health, Dallas, TX, Remote

Design ● mobile ● azure ● c++ ● kubernetes ● python ● AWS

Signify Health is hiring a Remote Site Reliability Engineer II

How will this role have an Impact?

Join Signify Health's vibrant Site Reliability Engineering team as a Site Reliability Engineer. We’re seeking passionate individuals from diverse technical backgrounds. Reporting to the Manager of Site Reliability Engineering, we offer a collaborative environment that values each team member's unique contribution and fosters an inclusive culture.

Your Role:

Develop strategies to improve the stability, scalability, and availability of our products.
Maintain and deploy observability solutions to optimize system performance.
Collaborate with cross-functional teams to enhance operational processes and service management.
Design, build, and maintain application stacks for product teams.
Create sustainable systems and services through automation.

Skills We’re Seeking:

An eagerness to grow and collaborate in the field of Site Reliability Engineering.
Strong familiarity with cloud environments (Azure, AWS, or GCP) and a desire to develop further expertise.
Intermediate understanding of scripting languages, preferably with exposure to Bash or Python, and programming languages, preferably with exposure to Golang.
Novice understanding of infrastructure as code, preferably with exposure to Terraform.
Novice understanding of Kubernetes and containerization technologies.
Novice understanding of CI/CD principles and willingness to guide and enforce best practices.
Novice understanding of Site Reliability and observability principles, preferably with exposure to New Relic.
A proactive approach to identifying problems, performance bottlenecks, and areas for improvement.

The base salary hiring range for this position is $72,100 to $125,600. Compensation offered will be determined by factors such as location, level, job-related knowledge, skills, and experience. Certain roles may be eligible for incentive compensation, equity, and benefits.
In addition to your compensation, enjoy the rewards of an organization that puts our heart into caring for our colleagues and our communities. Eligible employees may enroll in a full range of medical, dental, and vision benefits, 401(k) retirement savings plan, and an Employee Stock Purchase Plan. We also offer education assistance, free development courses, paid time off programs, paid holidays, a CVS store discount, and discount programs with participating partners.

About Us:

Signify Health is helping build the healthcare system we all want to experience by transforming the home into the healthcare hub. We coordinate care holistically across individuals’ clinical, social, and behavioral needs so they can enjoy more healthy days at home. By building strong connections to primary care providers and community resources, we’re able to close critical care and social gaps, as well as manage risk for individuals who need help the most. This leads to better outcomes and a better experience for everyone involved.

Our high-performance networks are powered by more than 9,000 mobile doctors and nurses covering every county in the U.S., 3,500 healthcare providers and facilities in value-based arrangements, and hundreds of community-based organizations. Signify’s intelligent technology and decision-support services enable these resources to radically simplify care coordination for more than 1.5 million individuals each year while helping payers and providers more effectively implement value-based care programs.

To learn more about how we’re driving outcomes and making healthcare work better, please visit us at www.signifyhealth.com

Diversity and Inclusion are core values at Signify Health, and fostering a workplace culture reflective of that is critical to our continued success as an organization.

We are committed to equal employment opportunities for employees and job applicants in compliance with applicable law and to an environment where employees are valued for their differences.

See more jobs at Signify Health

Apply for this job

+30d

Senior Site Reliability Engineer

Tyk Technologies, Vancouver, British Columbia, Canada, Remote

B2B ● Design ● mobile ● scrum ● api

Tyk Technologies is hiring a Remote Senior Site Reliability Engineer

Who are Tyk, and what do we do?
The Tyk API Management platform is helping to drive the connected world and power new products and services. We’re changing the way that organisations connect any number of their systems and services. Whether internal, external, public or highly encrypted systems, Tyk helps businesses drive value across the retail, finance, telecoms, healthcare, or media industries (to name just a few!)

If you’ve banked online, used an app to check the news, or perhaps even driven a connected car, APIs, and by extension Tyk, make that possible. Founded in 2015 with offices in London - UK, London - Ontario, Atlanta and Singapore, we have many thousands of users of our B2B platform across the globe. Brands using Tyk range from Lotte, Bell, and T-Mobile to RBS, Capital One and Vinci. We have a varied user base hailing from every continent – even Antarctica.

Our Mission

Tyk is on a mission to connect every system in the world. We’ve started by building an API Management platform.

Total flexibility, default remote, radical responsibility

We offer unlimited paid holidays and remote working from anywhere in the world, for everyone. Why? Tyk was founded on the principle of offering flexibility and autonomy to our employees; we believe this allows them to achieve their best results. It also means we can build the best possible team: location and working hours are no barrier.

If this sounds like an environment that you believe could work for you then read on to find out more.

At Tyk, we’re obsessed with building software that solves problems. We count on our Site Reliability Engineers (SREs) to empower users with a rich feature set, high availability, and stellar performance level to pursue their missions.

Our customer base is growing, so we’re seeking an experienced Senior SRE to optimise, automate, and improve our performance, using insights from massive-scale data in real time. We want an original thinker, a challenger, a technical legend, an opinionated collaborator who wants to make things better.

Here’s what you’ll be getting up to:

Collaborate with the Principal SRE to shape and implement the SRE strategic plan.
Lead the SRE team in translating strategy into actionable plans, coordinating these through the SCRUM process.
Address wellbeing and performance concerns, fostering a positive and productive team environment.
Work with the Principal SRE and Scrum Master to analyse wellbeing survey outcomes and develop improvement plans.
Champion operational communication, ensuring high-quality and timely updates on team progress.
Ensure SLA compliance for our cloud environment through proactive monitoring.
Develop and oversee the roadmap for proactive alerting and monitoring.
Define and track key performance metrics for cloud services, driving continuous improvement.
Design and implement solutions to maintain and enhance KPIs.
Lead performance tuning and fault finding by analysing metrics from operating systems and applications.
Optimise system and infrastructure performance, focusing on innovation and anticipating customer needs.
Engage with commercial teams to understand growth plans and develop corresponding SRE strategies.
Direct the analysis of cloud infrastructure, focusing on automation, scalability, and management.
Align with the Principal SRE on automation strategies for cloud-operations tasks.
Model excellence in software design and automation to enhance Tyk Cloud services, creating runbooks and knowledge sharing.
Conduct blame-free root cause analysis postmortems, reporting findings and recommendations.
Document operational processes and policies, ensuring replicability and adherence.
Provide on-call support, ensuring effective response and resolution in line with SLAs.
Plan and execute software upgrades to optimise cloud services.
Assist commercial teams with data requests and account management.
Champion and adhere to SCRUM methodologies within the SRE team.

Here’s what we’re looking for:

Proven experience in a senior SRE role or similar.
Strong knowledge of cloud technologies and SLA, SLO, and SLI management.
Experience leading teams and implementing SCRUM processes.
Excellent communication and leadership skills.
Experience line managing, mentoring and coaching.
Ability to analyze and improve operational processes and performance metrics.
Experience in software design, automation, and root cause analysis.
On-call support experience and customer-focused mindset.
Collaborative attitude with commercial and technical teams.
Launching and operating production Kubernetes clusters.
Designing and operating infrastructure on AWS and other providers.
Operating MongoDB (or other document database) clusters.
Operating Redis (or other key-value storage) clusters.
Administering Linux servers.
Maintaining distributed software.
Operating Prometheus and Grafana.
Operating log collection and analysis systems.
Working hours within 16:00 – 04:00 UTC.

Skills:

Kubernetes (administrator)
Go and/or Python (advanced)
AWS (proficient)
Linux (proficient)
Terraform and IaC in general (proficient)
Helm (familiar)
MongoDB (or similar)
Redis (or similar)
Monitoring & logging
Grasp of networking concepts (subnets, routing, peering, load balancing, NAT, etc.)
Common networking protocols (DNS, TCP/IP, HTTP, TLS, UDP)

We all share the same vision - we value authenticity, respect, responsibility, independence, honesty, diversity and inclusion and, most importantly, treating others how you wish to be treated. We look for like-minded people who bring their personalities to work every day, strive to achieve their personal goals and are willing to challenge the way we do things. Why? To make what we do even better!

Our values tell the story of Tyk - here’s how:

It’s ok to screw up!

We’ve found that it’s often the ‘stupid’ or unexpected ideas that turn out to be the successful ones - so try it, at least we can say we have!

The only stupid idea is the untested one!

It’s in our DNA - starting a business with founders 12 hours apart, giving our gateway away for free - sure, we did that, and we’d do it again!

Trust starts with you - make it count!

Trust is a two-way street - instil it from day one!

Assume best intent!

We have each other’s back - we’re all on the same team. Think before you speak or act.

Make things better!

Always try to leave things better than when you found them - change is constant, inevitable and embraced! Be that change we want to see.

Here’s why you should join us:

Everyone has unlimited paid holidays.
We have total flexibility in hours, as we believe creativity flows better when our people are given freedom to decide when they are most productive. Everyone is unique after all.
Employee share scheme
Generous maternity and paternity leave
Volunteering Days
Company retreats
Employee Wellbeing platform

What’s it like to work here? Check it out: https://tyk.io/worklife/

Tyk is an equal opportunities employer and we are determined to ensure that no applicant or employee receives less favourable treatment on the grounds of gender, age, disability, religion, belief, sexual orientation, marital status, or race, or is disadvantaged by conditions or requirements which cannot be shown to be justifiable.

You can see more about us here https://tyk.io

See more jobs at Tyk Technologies

Apply for this job

+30d

Senior Site Reliability Engineer

Webflow, U.S. Remote

Sales ● Webflow ● Bachelor's degree ● remote-first ● terraform ● ansible ● mongodb ● c++ ● docker ● typescript ● kubernetes ● python ● AWS ● javascript

Webflow is hiring a Remote Senior Site Reliability Engineer

At Webflow, our mission is to bring development superpowers to everyone. Webflow is the leading visual development platform for building powerful websites without writing code. By combining modern web development technologies into one platform, Webflow enables people to build websites visually, saving engineering time, while clean code seamlessly generates in the background. From independent designers and creative agencies to Fortune 500 companies, millions worldwide use Webflow to be more nimble, creative, and collaborative. It’s the web, made better.

We’re looking for a Senior Site Reliability Engineer to improve the reliability and stability of Webflow’s customer-facing production infrastructure, serving millions of page views per hour. Our product is used by over 2 million users worldwide across 190 countries, and you’ll help ensure our platform is secure and scalable for these users as tens of thousands of projects are launched on Webflow each month.

About the role

Location: Remote-first (United States; BC & ON, Canada)
Full-time
Permanent
Exempt
The cash compensation for this role is tailored to align with the cost of labor in different geographic markets. We've structured the base pay ranges for this role into zones for our geographic markets, and the specific base pay within the range will be determined by the candidate’s geographic location, job-related experience, knowledge, qualifications, and skills.

United States (all figures cited below in USD and pertain to workers in the United States)

Zone A: $158,000 - $218,000
Zone B: $149,000 - $205,000
Zone C: $139,000 - $192,000

Canada (All figures cited below in CAD and pertain to workers in ON & BC, Canada)

CAD 180,000 - CAD 248,000

Please visit our Careers page for more information on which locations are included in each of our geographic pay zones. However, please confirm the zone for your specific location with your recruiter.
Reporting to the Engineering Manager

As a Senior Site Reliability Engineer, you’ll …

Empower engineers on other teams to take control of their services by maintaining monitoring tooling and collaborating on internal best practices for observability.
Enhance reliability of applications running in Kubernetes by optimizing resource allocation, streamlining upgrade processes, and ensuring scalability and fault tolerance.
Occasionally dive into the main Webflow application in Node, Python, or Go to better discern (and sometimes fix) behavior in production.
Work with peers on Webflow’s Customer Support, Partnerships, and Sales teams to enable customers using Webflow’s services in production.
Participate in and continuously improve on-call and incident response processes.

In addition to the responsibilities outlined above, at Webflow we will support you in identifying where your interests and development opportunities lie and we'll help you incorporate them into your role.

About you

You’ll thrive as a Senior Site Reliability Engineer if you have …

Either a background as an ops engineer with an enthusiasm for code, or a background as a software engineer with an enthusiasm for systems administration.
5+ years of experience building, maintaining, and debugging distributed systems in a customer-facing environment that allows for little to no downtime.
Experience navigating and scaling multi-tier cloud environments on either AWS or GCP.
Experience with container-centric architectures, built with Docker and tools like Kubernetes (EKS, GKE, AKS, OpenShift, etc.), ECS, Docker Swarm, or Mesos.
Experience with infrastructure-as-code tools like Terraform, Pulumi, Ansible, Puppet, or Chef.
Experience in contributing to full-stack applications built using tools like React, Node, and MongoDB.
Enthusiasm for mentoring and sponsoring less-experienced engineers.

It would be a bonus if you had even one of the following …

Experience with Kubernetes, Nginx, Terraform, or Pulumi specifically.
Experience improving on-call and incident response processes for Engineering.
Experience working in high-compliance environments or a special interest in security engineering. We are not the security team, but we are always looking to improve our security posture!

Our Core Behaviors:

Obsess over customer experience. We deeply understand what we’re building and who we’re building for and serving. We define the leading edge of what’s possible in our industry and deliver the future for our customers
Move with heartfelt urgency. We have a healthy relationship with impatience, channeling it thoughtfully to show up better and faster for our customers and for each other. Time is the most limited thing we have, and we make the most of every moment
Say the hard thing with care. Our best work often comes from intelligent debate, critique, and even difficult conversations. We speak our minds and don’t sugarcoat things — and we do so with respect, maturity, and care
Make your mark. We seek out new and unique ways to create meaningful impact, and we champion the same from our colleagues. We work as a team to get the job done, and we go out of our way to celebrate and reward those going above and beyond for our customers and our teammates

Benefits & wellness

Equity ownership (RSUs) in a growing, privately-owned company
100% employer-paid healthcare, vision, and dental insurance coverage for employees and dependents (full-time employees working 30+ hours per week), as well as Health Savings Account/Health Reimbursement Account, dependent care Flexible Spending Account (US only), dependent on insurance plan selection where applicable in the respective country of employment; Employees may also have voluntary insurance options, such as life, disability, hospital protection, accident, and critical illness where applicable in the respective country of employment
12 weeks of paid parental leave for both birthing and non-birthing caregivers, as well as an additional 6-8 weeks of pregnancy disability for birthing parents to be used before child bonding leave (where local requirements are more generous employees receive the greater benefit); Employees also have access to family planning care and reimbursement
Flexible PTO with a mandatory annual minimum of 10 days paid time off for all locations (where local requirements are more generous employees receive the greater benefit), and sabbatical program
Access to mental wellness and professional coaching, therapy, and Employee Assistance Program
Monthly stipends to support health and wellness, smart work, and professional growth
Professional career coaching, internal learning & development programs
401k plan and pension schemes (in countries where statutorily required), plus financial wellness benefits such as CPA or financial advisor coverage
Discounted Pet Insurance offering (US only)
Commuter benefits for in-office employees

Temporary employees are not eligible for paid holiday time off, accrued paid time off, paid leaves of absence, or company-sponsored perks unless otherwise required by law.

Remote, together

At Webflow, equality is a core tenet of our culture. We are an Equal Opportunity (EEO)/Veterans/Disabled Employer and are committed to building an inclusive global team that represents a variety of backgrounds, perspectives, beliefs, and experiences. Employment decisions are made on the basis of job-related criteria without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, or any other classification protected by applicable law. Pursuant to the San Francisco Fair Chance Ordinance, Webflow will consider for employment qualified applicants with arrest and conviction records.

Stay connected

Not ready to apply, but want to be part of the Webflow community? Consider following our story on our Webflow Blog, LinkedIn, X (Twitter), and/or Glassdoor.

Please note:

We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Upon interview scheduling, instructions for confidential accommodation requests will be administered.

To join Webflow, you'll need a valid right to work authorization depending on the country of employment.

If you are extended an offer, that offer may be contingent upon your successful completion of a background check, which will be conducted in accordance with applicable laws. We may obtain one or more background screening reports about you, solely for employment purposes.

For information about how Webflow processes your personal information, please review Webflow’s Applicant Privacy Notice.

See more jobs at Webflow

Apply for this job

+30d

Senior Site Reliability Engineer (Argentina)

Sezzle, Argentina, Remote

Sales ● DevOPS ● Bachelor's degree ● terraform ● sql ● Design ● c++ ● docker ● kubernetes ● linux ● python ● AWS

Sezzle is hiring a Remote Senior Site Reliability Engineer (Argentina)

About the Role:

Our Company:

What Makes Working at Sezzle Awesome?

Responsibilities:

Design, build and maintain scalable infrastructure for running our systems, based on Kubernetes, Redshift and additional AWS services and products.
Help the product teams quickly build out MVP products to test new solutions on the market.
Maintain and develop monitoring and alerting solutions to improve the on-call experience.
Assist product developers in debugging and triaging production issues.
Be the first line of defense for our operational environments, triaging and resolving problems as they occur. You will be on an on-call rotation.
Design and scale platform and data architectures to sustain rapid user growth.
Level up the teams through pairing, code review, and mentoring.
Bring and share with our team extensive experience with industry best practices in software development.

Minimum Requirements:

Bachelor's in computer science (preferred) or equivalent related experience
At least 5 years of overall software, data, deployments and platform infrastructure experience.

Ideal Skills & Experience:

Experience with building and/or serving REST APIs using Go or a similar language.
Experience with Relational Databases, SQL and ORM technologies.
Strong overall Linux knowledge.
DevOps experience with CI/CD pipelines, Docker and Kubernetes, and cloud computing platforms like AWS.
Experience with deployment/provisioning tools like Terraform, Helm, Ansible.
Experience with implementing and maintaining observability and monitoring tools - Prometheus, Datadog, NewRelic, Grafana, Loki or similar.
Experience in ETL/ELT pipelines using Python and Open-source tools such as DBT.
Proficiency in building and maintaining large-scale data warehousing technologies such as Redshift.

Sezzle’s Technology Stack:

Languages: Golang, Typescript, Python
Frontend: Typescript - React and React Native
Backend: Golang
Database: MySQL, Postgres, Elasticsearch
DevOps & Cloud: AWS, Kubernetes
Version Control: Git
CI/CD: Gitlab
Testing: Developer-driven, focus on automated unit, integration, and end-to-end tests
Sezzle is focused on using open source, and we build what we can before buying!

About You:

A+ character. We are team-first here at Sezzle.
A hard-working mentality. It’s early and there is still a lot to build.
An excellent communicator.
A fun attitude. Life’s too short. We can have fun while we work hard on cool things.
Smarts. We need people that are smart enough to make decisions on their own and also smart enough to know when they need input from others.

Compensation

The compensation range for the role is as follows:

4,600 - 9,000 USD Monthly


See more jobs at Sezzle

Apply for this job

+30d

Senior Site Reliability Engineer (Brazil)

Sezzle, Brazil, Remote

Sales ● DevOPS ● Bachelor's degree ● terraform ● sql ● Design ● c++ ● docker ● kubernetes ● linux ● python ● AWS

Sezzle is hiring a Remote Senior Site Reliability Engineer (Brazil)

About the Role:

Our Company:

What Makes Working at Sezzle Awesome?

Responsibilities:

Design, build and maintain scalable infrastructure for running our systems, based on Kubernetes, Redshift and additional AWS services and products.
Help the product teams quickly build out MVP products to test new solutions on the market.
Maintain and develop monitoring and alerting solutions to improve the on-call experience.
Assist product developers in debugging and triaging production issues.
Be the first line of defense for our operational environments, triaging and resolving problems as they occur. You will be on an on-call rotation.
Design and scale platform and data architectures to sustain rapid user growth.
Level up the teams through pairing, code review, and mentoring.
Bring and share with our team extensive experience with industry best practices in software development.

Minimum Requirements:

Bachelor's in computer science (preferred) or equivalent related experience
At least 5 years of overall software, data, deployments and platform infrastructure experience.

Ideal Skills & Experience:

Experience with building and/or serving REST APIs using Go or a similar language.
Experience with Relational Databases, SQL and ORM technologies.
Strong overall Linux knowledge.
DevOps experience with CI/CD pipelines, Docker and Kubernetes, and cloud computing platforms like AWS.
Experience with deployment/provisioning tools like Terraform, Helm, Ansible.
Experience with implementing and maintaining observability and monitoring tools - Prometheus, Datadog, NewRelic, Grafana, Loki or similar.
Experience in ETL/ELT pipelines using Python and Open-source tools such as DBT.
Proficiency in building and maintaining large-scale data warehousing technologies such as Redshift.

About You:

A+ character. We are team-first here at Sezzle.
A hard-working mentality. It’s early and there is still a lot to build.
An excellent communicator.
A fun attitude. Life’s too short. We can have fun while we work hard on cool things.
Smarts. We need people that are smart enough to make decisions on their own and also smart enough to know when they need input from others.

Compensation

The compensation range for the role is as follows:

4,600 - 9,000 USD Monthly


See more jobs at Sezzle

Apply for this job

+30d

Senior Site Reliability Engineer (Chile)

Sezzle, Chile, Remote

Sales ● DevOPS ● Bachelor's degree ● terraform ● sql ● Design ● c++ ● docker ● kubernetes ● linux ● python ● AWS

Sezzle is hiring a Remote Senior Site Reliability Engineer (Chile)

About the Role:

Our Company:

What Makes Working at Sezzle Awesome?

Responsibilities:

Design, build and maintain scalable infrastructure for running our systems, based on Kubernetes, Redshift and additional AWS services and products.
Help the product teams quickly build out MVP products to test new solutions on the market.
Maintain and develop monitoring and alerting solutions to improve the on-call experience.
Assist product developers in debugging and triaging production issues.
Be the first line of defense for our operational environments, triaging and resolving problems as they occur. You will be on an on-call rotation.
Design and scale platform and data architectures to sustain rapid user growth.
Level up the teams through pairing, code review, and mentoring.
Bring and share with our team extensive experience with industry best practices in software development.

Minimum Requirements:

Bachelor's in computer science (preferred) or equivalent related experience
At least 5 years of overall software, data, deployments and platform infrastructure experience.

Ideal Skills & Experience:

Experience with building and/or serving REST APIs using Go or a similar language.
Experience with Relational Databases, SQL and ORM technologies.
Strong overall Linux knowledge.
DevOps experience with CI/CD pipelines, Docker and Kubernetes, and cloud computing platforms like AWS.
Experience with deployment/provisioning tools like Terraform, Helm, Ansible.
Experience with implementing and maintaining observability and monitoring tools - Prometheus, Datadog, NewRelic, Grafana, Loki or similar.
Experience in ETL/ELT pipelines using Python and Open-source tools such as DBT.
Proficiency in building and maintaining large-scale data warehousing technologies such as Redshift.

About You:

A+ character. We are team-first here at Sezzle.
A hard-working mentality. It’s early and there is still a lot to build.
An excellent communicator.
A fun attitude. Life’s too short. We can have fun while we work hard on cool things.
Smarts. We need people that are smart enough to make decisions on their own and also smart enough to know when they need input from others.

Sezzle’s Technology Stack:

Languages: Golang, Typescript, Python
Frontend: Typescript - React and React Native
Backend: Golang
Database: MySQL, Postgres, Elasticsearch
DevOps & Cloud: AWS, Kubernetes
Version Control: Git
CI/CD: Gitlab
Testing: Developer-driven, focus on automated unit, integration, and end-to-end tests
Sezzle is focused on using open source, and we build what we can before buying!

Compensation

The compensation range for the role is as follows:

4,600 - 9,000 USD Monthly


See more jobs at Sezzle

Apply for this job

+30d

Senior Site Reliability Engineer (Colombia)

Sezzle, Colombia, Remote

Sales ● DevOPS ● Bachelor's degree ● terraform ● sql ● Design ● c++ ● docker ● kubernetes ● linux ● python ● AWS

Sezzle is hiring a Remote Senior Site Reliability Engineer (Colombia)

About the Role:

Our Company:

What Makes Working at Sezzle Awesome?

Responsibilities:

Design, build and maintain scalable infrastructure for running our systems, based on Kubernetes, Redshift and additional AWS services and products.
Help the product teams quickly build out MVP products to test new solutions on the market.
Maintain and develop monitoring and alerting solutions to improve the on-call experience.
Assist product developers in debugging and triaging production issues.
Be the first line of defense for our operational environments, triaging and resolving problems as they occur. You will be on an on-call rotation.
Design and scale platform and data architectures to sustain rapid user growth.
Level up the teams through pairing, code review, and mentoring.
Bring and share with our team extensive experience with industry best practices in software development.

Minimum Requirements:

Bachelor's in computer science (preferred) or equivalent related experience
At least 5 years of overall software, data, deployments and platform infrastructure experience.

Ideal Skills & Experience:

Experience with building and/or serving REST APIs using Go or a similar language.
Experience with Relational Databases, SQL and ORM technologies.
Strong overall Linux knowledge.
DevOps experience with CI/CD pipelines, Docker and Kubernetes, and cloud computing platforms like AWS.
Experience with deployment/provisioning tools like Terraform, Helm, Ansible.
Experience with implementing and maintaining observability and monitoring tools - Prometheus, Datadog, NewRelic, Grafana, Loki or similar.
Experience in ETL/ELT pipelines using Python and Open-source tools such as DBT.
Proficiency in building and maintaining large-scale data warehousing technologies such as Redshift.

About You:

A+ character. We are team-first here at Sezzle.
A hard-working mentality. It’s early and there is still a lot to build.
An excellent communicator.
A fun attitude. Life’s too short. We can have fun while we work hard on cool things.
Smarts. We need people that are smart enough to make decisions on their own and also smart enough to know when they need input from others.

Sezzle’s Technology Stack:

Languages: Golang, Typescript, Python
Frontend: Typescript - React and React Native
Backend: Golang
Database: MySQL, Postgres, Elasticsearch
DevOps & Cloud: AWS, Kubernetes
Version Control: Git
CI/CD: Gitlab
Testing: Developer-driven, focus on automated unit, integration, and end-to-end tests
Sezzle is focused on using open source, and we build what we can before buying!

Compensation

The compensation range for the role is as follows:

4,600 - 9,000 USD Monthly


See more jobs at Sezzle

Apply for this job

+30d

Site Reliability Engineer - II (SRE II)

LivePerson, Hyderabad, Telangana, India (Remote)

DevOPS ● terraform ● nosql ● postgres ● sql ● ansible ● mongodb ● azure ● elasticsearch ● MySQL ● kubernetes ● linux ● jenkins ● AWS

LivePerson is hiring a Remote Site Reliability Engineer - II (SRE II)

LivePerson (NASDAQ: LPSN) is the global leader in enterprise conversations. Hundreds of the world’s leading brands — including HSBC, Chipotle, and Virgin Media — use our award-winning Conversational Cloud platform to connect with millions of consumers. We power nearly a billion conversational interactions every month, providing a uniquely rich data set and safety tools to unlock the power of Conversational AI for better customer experiences.

At LivePerson, we foster an inclusive workplace culture that encourages meaningful connection, collaboration, and innovation. Everyone is invited to ask questions, actively seek new ways to achieve success, and reach their full potential. We are continually looking for ways to improve our products and make things better. This means spotting opportunities, solving ambiguities, and seeking effective solutions to the problems our customers care about.

Overview:

LivePerson is looking for a Site Reliability Engineer for the GPT (Global Product & Technology) Division. You will be part of the LivePerson SRE team building and managing highly available, distributed systems. You will have the opportunity to be part of a strong team and enjoy the work environment of a start-up, with a robust product and the benefits of a leading company in its field.

You will:

Ensure high product uptime and reliability 24x7.
Manage Linux servers in a multi-cloud environment
Manage high availability Kubernetes resources using Helm charts
Assist with deploying upgrades and patches using Chef/Ansible/Puppet/Helm
Monitor and troubleshoot warnings and alerts related to the reporting platform’s performance
Develop monitoring resources and alerting systems such as Grafana, Prometheus, Kibana, DataDog and PagerDuty
Coordinate with DBA and developers to manage SQL and NOSQL database systems, including MongoDB, ElasticSearch, Postgres, MySQL and others
Manage message bus systems such as Kafka and Pulsar
Build and maintain CI/CD pipelines using Jenkins/Gitlab/Teamcity

You have:

At least 4 years of experience managing cloud-based production environments (AWS, GCP, Azure, etc.)
Extensive experience working in Linux environments, with good scripting skills in Bash or Python.
Extensive experience working with configuration management systems such as OpsCode Chef, Ansible, and Puppet.
Strong experience in Terraform, CloudFormation, or other IaC tools
Experienced in SQL, including DDL and complex queries
Experienced working in the Kubernetes platform
Experience working in a microservices architecture using a message bus
Good knowledge of CI/CD pipeline orchestrators such as TeamCity, Jenkins, and Gitlab
Ability to integrate security best practices into the SRE workflow.
Highly motivated and independent.
Team player with excellent interpersonal skills.
Excellent written and verbal communication skills.
BS in Computer Science or a related field, or equivalent work experience.
A strong background in cloud, network and application security and compliance
Experience with GPT or other LLMs is a strong advantage

Benefits

Health: Medical, Dental, and Vision
Time away: Vacation and holidays
Development: Generous tuition reimbursement and access to internal professional development resources.
Equal opportunity employer

Why You’ll Love Working Here

As leaders in enterprise customer conversations, we celebrate diversity, empowering our team to forge impactful conversations globally. LivePerson is a place where uniqueness is embraced, growth is constant, and everyone is empowered to create their own success. And, we're very proud to have earned recognition from Fast Company, Newsweek, and BuiltIn for being a top innovative, beloved, and remote-friendly workplace.

Belonging At LivePerson

We are proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants with criminal histories, consistent with applicable federal, state, and local law.

We are committed to the accessibility needs of applicants and employees. We provide reasonable accommodations to job applicants with physical or mental disabilities. Applicants with a disability who require reasonable accommodation for any part of the application or hiring process should inform their recruiting contact upon initial connection.

Apply for this job