Site Reliability Engineer Remote Jobs

49 Results

+30d

Sr. Site Reliability Engineer

Signify Health
Dallas TX, Remote
terraform, airflow, mobile, azure, c++, kubernetes, python, AWS

Signify Health is hiring a Remote Sr. Site Reliability Engineer

How will this role have an impact?

Signify Health is looking for a passionate Site Reliability Engineer (SRE) to enhance our dynamic SRE team. Reporting to the Sr Director of Cloud Operations and SRE, we welcome individuals from different technical backgrounds, especially software engineers aspiring to transition into SRE/DevOps roles. 

At Signify Health, we appreciate and respect the unique experiences and perspectives that each team member brings. We are committed to providing an environment where everyone feels welcomed, respected, and empowered. So, no matter what your background is, we invite you to join us and help shape the future of healthcare while refining your skills in the SRE domain.

Diversity and Inclusion are core values at Signify Health, and fostering a workplace culture reflective of that is critical to our continued success as an organization.

What will you do?

  • Develop and implement strategies that improve the stability, scalability, and availability of our products
  • Maintain and deploy observability solutions for infrastructure and applications to ensure optimal performance
  • Participate in real-time service management, including crafting monitoring systems, alerts, playbooks, and runbooks in collaboration with our development teams
  • Utilize your on-call rotation to proactively prevent incidents and maintain uninterrupted operations
  • Work alongside colleagues from various disciplines to optimize operational processes
  • This is a remote role with some occasional travel required to Dallas, TX


Basic Requirements

  • Minimum of 4 years of relevant technical experience, with an emphasis on SRE/DevOps
  • Experience creating Python scripts to solve operational challenges
  • Experience with pipeline orchestration tooling such as Airflow, Dagster, etc.
  • Experience with ELT tooling such as Azure Data Factory
  • Experience with Databricks interface/tools
  • Practical experience with Azure or AWS, and Terraform
  • Working knowledge of Kubernetes (AKS/EKS preferred)
  • Familiarity with the deployment of CI/CD systems and practices


About Us:

Signify Health is helping build the healthcare system we all want to experience by transforming the home into the healthcare hub. We coordinate care holistically across individuals’ clinical, social, and behavioral needs so they can enjoy more healthy days at home. By building strong connections to primary care providers and community resources, we’re able to close critical care and social gaps, as well as manage risk for individuals who need help the most. This leads to better outcomes and a better experience for everyone involved.

Our high-performance networks are powered by more than 9,000 mobile doctors and nurses covering every county in the U.S., 3,500 healthcare providers and facilities in value-based arrangements, and hundreds of community-based organizations. Signify’s intelligent technology and decision-support services enable these resources to radically simplify care coordination for more than 1.5 million individuals each year while helping payers and providers more effectively implement value-based care programs.

We are committed to equal employment opportunities for employees and job applicants in compliance with applicable law and to an environment where employees are valued for their differences.

To learn more about how we’re driving outcomes and making healthcare work better, please visit us at www.signifyhealth.com


+30d

Junior Site Reliability Engineer (SRE)

Medfar
Montréal, Canada, Remote
DevOPS, 2 years of experience, terraform, sql, azure, c++, .net

Medfar is hiring a Remote Junior Site Reliability Engineer (SRE)

Job Description

As a Junior Site Reliability Engineer (SRE), you will play a crucial role within the R&D and Innovation department. You will collaborate with the Plexia product-aligned and core architecture teams. Because health and medical systems are highly sensitive, the availability and reliability of our systems are of paramount importance to MEDFAR.

The goal of the Site Reliability Engineering (SRE) team is to enable the Plexia team to deliver work with substantial autonomy. To that end, the SRE team collaborates with team members across the company to help them achieve better outcomes and to provide them with the tools and technologies needed to deliver them. As part of the SRE team, you will join the team accountable for the operation, resilience and backup of the organization’s tools, products, data and services.

What you will be working on: 

  • Refining and extending current monitoring capabilities to track essential service-level indicators and ensure visibility of these metrics.

  • Improving our infrastructure and software by collaborating extensively with the core architecture and product-aligned teams to identify and deliver improvements that enhance site availability through scalable, secure, and resilient architectures.

  • Defining and executing test plans that aim to ensure the robustness and resilience of our infrastructure and software systems.

  • Managing incidents and emergency response, tracking outages, ensuring data integrity and participating in release management to promote safe, efficient and rapid deployments.

Qualifications

Contribute to our team with your strengths:

  • 1-2 years of experience working in site reliability engineering-related projects (required) plus additional experience in system administration, DevOps or software engineering roles (an asset)

  • Knowledge of Microsoft Azure, specifically high-reliability architecture and security hardening.

  • Experience with CI/CD processes and Azure DevOps pipelines.

  • Proficient in PowerShell.

  • Experience with Windows and Network setup and management

  • Experience in C#, .NET frameworks, and SQL programming

  • Experience in SQL Database Management

  • Strong ability and rigor in documenting tasks and procedures with detail

  • Experience working with Terraform or another IaC framework, an asset 

  • Bilingual (FR/EN). The ability to communicate in English is required as many team members are located in BC.  

Working conditions:

  • Full-time permanent role, 40 hours per week schedule. 
  • 'Emergency working hours' may occasionally be necessary to ensure system stability and address critical issues promptly.
  • Flexibility in working hours is important to collaborate with team members in the Pacific Standard Time zone. 


+30d

Junior Site Reliability Engineer

Podium
Remote, US
Bachelor's degree, terraform, Design, ansible, azure, ruby, docker, kubernetes, linux, python, AWS

Podium is hiring a Remote Junior Site Reliability Engineer

At Podium, our mission is to help local businesses win. Our lead conversion platform, powered by AI and integrations, helps local businesses convert leads faster, communicate easier, and make more sales. Every day, thousands of local businesses utilize our review management, communication, marketing, and payments products. 

Our work and focus on helping local businesses thrive has been recognized across the industry, including Forbes’ Next Billion Dollar Startups, Forbes’ Cloud 100, the Inc. 5000, and Fast Company’s World’s Most Innovative Companies.

At Podium, we believe in fostering a culture that thrives on hiring and developing exceptional talent. Our operating principles serve as a compass, guiding daily behavior and decision-making, and ensure we hire people who will thrive at Podium. If you resonate with our operating principles and are energized by our mission, Podium will be a great place for you!

A Site Reliability Engineer borders the worlds of software engineering and systems engineering. At Podium, the SRE team drives our products to success by building a stable, scalable, sustainable, and slick system. We permanently sit and sup with the product engineering teams to address all of their needs, and work as an SRE guild to build a world-class platform for our products to run on. We're currently targeting a junior SRE to come in and deliver impact from day one.

What you will be doing: 

  • Working with the following technologies: Kubernetes, Helm, Docker, AWS, Terraform, Datadog, Prometheus, Ansible, StrongDM, Python, Go, Ruby, GitLab and GitLab CI.
  • Engaging with Podium's engineering community to identify potential areas of improvement or pain points and make Podium's systems more secure and pleasant to operate.
  • Participating in an on-call rotation for the services the team owns, triaging and addressing production as well as development issues.
  • Working cross-functionally with different teams to make sure that there is no downtime for our products.

What you should have: 

  • Bachelor’s degree in a technical field or relevant work experience.
  • 1-3 years of experience working alongside a production system running on Kubernetes
  • 1-3 years deploying, operating and debugging server software on Linux
  • Curiosity and the desire to learn
  • Ability to take a rotating on-call shift

What we hope you have: 

  • Experience with distributed systems and microservices
  • Practical knowledge of system design
  • Cloud computing, such as AWS, GCP, or Azure
  • SOC2, HIPAA, PCI, or other regulatory or compliance standards
  • Building and maintaining a CI/CD pipeline

BENEFITS

  • Open and transparent culture - Check out this video to see what it’s like to work at Podium
  • Life insurance, long and short-term disability coverage
  • Paid maternity and paternity leave
  • Fertility Benefits
  • Generous vacation time, plus three 4-day summer holiday weekends
  • Excellent medical, dental, and vision benefits
  • 401k Plan
  • Bi-annual swag drops with cool Podium gear and apparel 
  • A stellar HQ (Utah) gym with local professional coaches and classes offered
  • Onsite HQ (Utah) child care center, subsidized for employees
  • Additional benefits for fully remote employees

Podium is an equal opportunity employer. Podium provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, gender, national origin, sexual orientation, gender identity or expression, age, disability, genetic information, marital status or veteran status.


+30d

Staff Site Reliability Engineer

Modern Health
Remote - US
DevOPS, Django, S3, SQS, EC2, redis, terraform, Design, azure, postgresql, python, AWS

Modern Health is hiring a Remote Staff Site Reliability Engineer

Modern Health 

Modern Health is a mental health benefits platform for employers. We are the first global mental health solution to offer employees access to one-on-one, group, and self-serve digital resources for their emotional, professional, social, financial, and physical well-being needs, all within a single platform. Whether someone wants to proactively manage stress or treat depression, Modern Health guides people to the right care at the right time. We empower companies to help all their employees be the best version of themselves, and believe in meeting people wherever they are in their mental health journey.

We are a female-founded company backed by investors like Kleiner Perkins, Founders Fund, John Doerr, Y Combinator, and Battery Ventures. We partner with 500+ global companies like Lyft, Electronic Arts, Pixar, Clif Bar, Okta, and Udemy that are taking a proactive approach to mental health care for their employees. Modern Health has raised more than $170 million in less than two years with a valuation of $1.17 billion, making Modern Health the fastest entirely female-founded company in the U.S. to reach unicorn status. 

We tripled our headcount in 2021 and as a hyper-growth company with a fully remote workforce, we prioritize our people-first culture (winning awards including Fortune's Best Workplaces in the Bay Area 2021). To protect our culture and help our team stay connected, we require overlapping hours for everyone. While many roles may function from anywhere in the world—see individual job listing for more—team members who live outside the Pacific time zone must be comfortable working early in the morning or late at night; all full-time employees must work at least six hours between 8 am and 5 pm Pacific time each workday. 

We are looking for driven, creative, and passionate individuals to join in our mission. An inclusive and diverse culture is a key component of mental well-being in the workplace, and that starts with how we build our own team. If you're excited about a role, we'd love to hear from you!

The Role

In this role, you'll be given lots of responsibility and the opportunity to have true ownership as we build out the product. This is a unique opportunity to use your engineering powers to make a direct impact in people's lives. We need a Staff Site Reliability Engineer who is enthusiastic about building reliable, scalable, and flexible systems to support our growing team, product, and user base. You'll work with other engineers to reliably release and maintain services, and help define and meet internal and customer-facing SLAs and SLOs.

This position is not eligible to be performed in Hawaii.

What You’ll Do

  • Manage and orchestrate Cloud Resource (AWS) configuration using Infrastructure As Code (Terraform) to empower engineering staff to embrace a DevOps culture of Self Service Ownership
  • Develop and govern Observability (Datadog) best practices for tracking platform performance and health trends to meet customer SLAs and lead technical decisions with strong supporting evidence
  • Create solutions that dynamically scale based on demand with enough flexibility to pivot for fast changing project requirements while maintaining a balance of good versus perfect
  • Provide strong and consistent communication updates on technical progress or blockers to keep stakeholders informed while additionally creating appropriate documentation on technical design to spread knowledge and reduce information silos
  • Participate in and respond to 24/7 on-call critical alerts and follow documented incident investigation procedures to reestablish customer-facing feature availability
  • Maintain HIPAA, GDPR, SOC-2 compliance and general security through best practice implementation

Who You Are

  • 8+ years of experience in software engineering, with 4+ years of experience in DevOps
  • Cloud Provider (AWS, GCP, Azure) experience on managing resources through Infrastructure As Code (Terraform) 
  • Container Orchestration (ECS or K8s) experience to confidently build, test, and release containerized applications for multiple environments and regions
  • Knowledge of Observability best practices across common cloud resources (EC2, ECS, RDS, DynamoDB, S3, SQS, Eventbridge) with experience on rolling out enhancements across a distributed platform with scale in mind
  • Experience with shell scripting for *nix systems
  • Experience with Networking for web applications
  • Effective at communicating ideas through writing and diagramming
  • Comfortable working with a distributed development and ops team
  • Familiarity with AWS: ECS and cloud hosting, Gitlab: CI/CD, Python: Django, Flask, aiohttp, Bash, Data: PostgreSQL, Redis, Monitoring: Datadog and Sentry, IaC: Terraform, Packer

Benefits

Fundamentals:

  • Medical / Dental / Vision / Disability / Life Insurance 
  • High Deductible Health Plan with Health Savings Account (HSA) option
  • Flexible Spending Account (FSA)
  • Access to coaches and therapists through Modern Health's platform
  • Generous Time Off 
  • Company-wide Collective Pause Days 

Family Support:

  • Parental Leave Policy 
  • Family Forming Benefit through Carrot
  • Family Assistance Benefit through UrbanSitter

Professional Development:

  • Professional Development Stipend

Financial Wellness:

  • 401k
  • Financial Planning Benefit through Origin

But wait there’s more…! 

  • Annual Wellness Stipend to use on items that promote your overall well being 
  • New Hire Stipend to help cover work-from-home setup costs
  • ModSquad Community: Virtual events like active ERGs, holiday themed activities, team-building events and more
  • Monthly Cell Phone Reimbursement

Equal Pay for Equal Work Act Information

Please refer to the ranges below to find the starting annual pay range for individuals applying to work remotely from the following locations for this role.


Compensation for the role will depend on a number of factors, including a candidate’s qualifications, skills, competencies, and experience and may fall outside of the range shown. Ranges are not necessarily indicative of the associated starting pay range in other locations. Full-time employees are also eligible for Modern Health's equity program and incredible benefits package. See our Careers page for more information.

Depending on the scope of the role, some ranges are indicative of On Target Earnings (OTE) and include both base pay and commission at 100% achievement of established targets.

San Francisco Bay Area: $160,700 - $189,000 USD
All Other California Locations: $160,700 - $189,000 USD
Colorado: $136,600 - $160,700 USD
New York City: $160,700 - $189,000 USD
All Other New York Locations: $144,700 - $170,000 USD
Seattle: $160,700 - $189,000 USD
All Other Washington Locations: $144,700 - $170,000 USD

Below, we are asking you to complete identity information for the Equal Employment Opportunity Commission (EEOC). While we are required by law to ask these questions in the format provided by the EEOC, at Modern Health we know that gender is not binary, and we recognize that these categories do not reflect our employees' full range of identities.


+30d

Senior Site Reliability Engineer (m/f/x)

commercetools
Europe (Remote)
golang, terraform, scala, Design, azure, graphql, kubernetes, AWS

commercetools is hiring a Remote Senior Site Reliability Engineer (m/f/x)

commercetools - we are:

  • Engaged: We didn't become the fastest growing, highest ever valued SaaS software company in digital commerce with nearly 100% year-over-year growth by sitting on the sidelines.
  • Inspired: We continually explore what's possible. As the founder of the headless commerce concept, the leader in true composable commerce, and the visionaries behind MACH® — our patented tech has radically disrupted the world of enterprise ecommerce software. And we are just getting started!
  • Valued: Intelligent, resilient, passionate individuals hailing from over 50 countries across the globe, speaking over 43 languages, and collectively embracing diversity, encouraging inclusion, and fostering a culture of care.

 *We can only consider applicants within a commutable distance to our offices in Amsterdam, Berlin, London, Munich, or Valencia.

The Opportunity:

commercetools represents the collective work of numerous teams; each team building a fraction of the overall platform to create a singular, powerful platform for our users. The Special Delivery team focuses our energy on enabling all these teams, building in their own way, to deliver high quality software to the world.

Your Mission:

  • Communicate decisions and actions effectively and asynchronously to the team
  • Assist team members proactively and with priority
  • Divide work tickets into achievable tasks and milestones for the rest of the team
  • Act as a consistent source of knowledge and counsel for other engineers
  • Foster a culture in which the team feels psychologically safe to openly share their opinions
  • Start and execute a technical Request for Comments (RFC) process to evaluate several alternatives, lead a decision, and collect feedback along the way
  • Take leadership roles as part of an incident management team
  • Use systematic debugging to diagnose all issues within the scope of their domain 

What you need to succeed:

  • 5+ years of SRE experience 
  • 2+ years of experience mentoring and supporting team members 
  • Experience writing automation tooling in Golang
  • Experience using IaC tooling (Terraform or Kubernetes)
  • Experience running production workloads in a major cloud provider (AWS, GCP or Azure)
  • Familiarity in driving architectural discussions and initiatives across teams
  • Experience providing high-quality code reviews to peers and junior engineers, both on and off the team, for development efforts critical to the team
  • Strong time management skills
  • Written and spoken English communication skills

Team Values:

Positivity. Negativity is the enemy of progress.

Trust & Transparency. Promote direct and continuous feedback.

Learning. Be proud if you’ve failed at something. Think big, start small, learn fast!

 

Tech at commercetools:

We Are Open Source And Innovative By Design

We make rapid progress by being early adopters of React, Scala, and GraphQL

We share & contribute to the open source community: https://github.com/sangria-graphql

⚙️ We <3 Automation and Machine Learning

 

We care about your Growth and Well-being

Competitive compensation package: Generous compensation structure consisting of salary, competitive stock option package, various benefits and perks

☀️ Remote Work: Up to 60 days/year from a country different from your base country

Open Learning & Development Budget

ct Academy: Regular internal training sessions

⌚️ Flexibility: Morning person or night owl? We believe in outcome and motivated employees

Mindset & Growth: A diverse, creative workspace with an international culture & learning environment

 

Are you ready? Come grow with us!

Are you looking for something else? Check out our Career Page and our Website for more information.

 

We are all different and that is what makes us stronger! We hire great people from a wide variety of backgrounds, not just because it’s the right thing to do, but because it makes our company better.

commercetools celebrates being a diverse environment and is proud to be an equal opportunities employer. If your professional profile aligns with our specific hiring requirements and company culture, then we encourage you to apply. We will assess your competencies, future potential, approach to learning and self-development and passion, and not your age, color, national origin, religion, gender, gender identity or expression, sexual orientation, familial status, genetics, or disability.







+30d

Senior Site Reliability Engineer II

Design, metal, c++, kubernetes

Oscar Health is hiring a Remote Senior Site Reliability Engineer II

Hi, we're Oscar. We're hiring a Senior Site Reliability Engineer II, Infrastructure Metal to join our Engineering team.

Oscar is the first health insurance company built around a full stack technology platform and a focus on serving our members. We started Oscar in 2012 to create the kind of health insurance company we would want for ourselves—one that behaves like a doctor in the family.

 

About the role

Infrastructure Metal is a site reliability team with a mission to guide software and cloud-based infrastructure decisions for optimal cost, performance, and security.

Our team is focused on the Compute Platforms at Oscar and ensuring that teams have intuitive and simple self-service capabilities to run their applications. We are responsible for maintaining a suite of cloud-native applications that build up our Kubernetes-based platform.

In this role you will lead technical efforts to build reliable and maintainable applications, infrastructure, and interfaces that make interacting with the health care system easier for members and providers.

You will report to the Staff Software Engineer.

 

Work Location

Oscar is a blended work culture where everyone, regardless of work type or location, feels connected to their teammates, our culture and our mission.

If you live within commutable distance to our New York City office (in Hudson Square), our Tempe office (off the 101 at University Ave), or our Los Angeles office (in Marina Del Rey), you will be expected to come into the office at least two days each week. Otherwise, this is a remote / work-from-home role.

You must reside in one of the following states: Alabama, Arizona, California, Colorado, Connecticut, Florida, Georgia, Illinois, Iowa, Kentucky, Maryland, Massachusetts, Michigan, Minnesota, New Hampshire, New Jersey, New Mexico, New York, North Carolina, Ohio, Oregon, Pennsylvania, Rhode Island, Tennessee, Texas, Utah, Vermont, Virginia, Washington, or Washington, D.C. Note, this list of states is subject to change.

 

Pay Transparency

The base pay for this role is: $174,400 - $228,900 per year. You are also eligible for employee benefits, participation in Oscar's unlimited vacation program, company equity grants, and annual performance bonuses.

Responsibilities

  • Become the expert on your team's business and technical domains
  • Lead the planning, execution and release of complex technical projects
  • Work with partners, product managers, and designers to solve challenging problems
  • Lead and mentor engineers on the team to improve technology and apply best practices
  • Independently responsible for large or complex technology capabilities (set of components or services) within their team's domain or spanning multiple domains
  • Facilitates, encourages, and enhances cross-team execution and collaboration; knows when cross-team projects are at risk and actively mitigates risk to deliver on time
  • Prolific contributor to the objectives of their functional group, as well as (potentially) organization-wide projects
  • Drives prioritization of technical roadmap and influences prioritization of product roadmap and process enhancements within their team
  • Actively identifies and reduces failure domains
  • Builds software to minimize effort and business impact during maintenance and failures 
  • Guides the development of Service-Level Objectives (SLOs) for systems they are responsible for
  • Own medium to large features or infrastructure projects from technical design through completion
  • Compliance with all applicable laws and regulations
  • Other duties as assigned

Qualifications

  • 6+ years of professional software engineering experience, working with a variety of technologies, with increasingly impactful accomplishments
  • Experience as a major contributor to cross-pod or cross-company deliverables
  • Experience leading technical contributions and improving the quality of what your teams create, and excitement about building fault-tolerant and scalable software systems.
  • Demonstrates expertise of the practical application of CS concepts within their team.
  • Sets and enforces the standard for writing stable, correct, and maintainable code 
  • Experience mentoring and training more junior engineers

This is an authentic Oscar Health job opportunity. Learn more about how you can safeguard yourself from recruitment fraud here

At Oscar, being an Equal Opportunity Employer means more than upholding discrimination-free hiring practices. It means that we cultivate an environment where people can be their most authentic selves and find both belonging and support. We're on a mission to change health care -- an experience made whole by our unique backgrounds and perspectives.

Pay Transparency: 

Final offer amounts, within the base pay set forth above, are determined by factors including your relevant skills, education, and experience.

Full-time employees are eligible for benefits including: medical, dental, and vision benefits, 11 paid holidays, paid sick time, paid parental leave, 401(k) plan participation, life and disability insurance, and paid wellness time and reimbursements.

Reasonable Accommodation:

Oscar applicants are considered solely based on their qualifications, without regard to applicant’s disability or need for accommodation. Any Oscar applicant who requires reasonable accommodations during the application process should contact the Oscar Benefits Team (accommodations@hioscar.com) to make the need for an accommodation known.


+30d

Staff Site Reliability Engineer - Observability

Fastly
US (Remote)
agile, c++, linux

Fastly is hiring a Remote Staff Site Reliability Engineer - Observability

Fastly helps people stay better connected with the things they love. Fastly’s edge cloud platform enables customers to create great digital experiences quickly, securely, and reliably by processing, serving, and securing our customers’ applications as close to their end-users as possible — at the edge of the Internet. The platform is designed to take advantage of the modern internet, to be programmable, and to support agile software development. Fastly’s customers include many of the world’s most prominent companies, including Vimeo, Pinterest, The New York Times, and GitHub.

We're building a more trustworthy Internet. Come join us.

Fastly’s Observability team is looking for a Staff Site Reliability Engineer who is passionate about building, scaling, and automating our internal platforms to provide global visibility to the health and performance of our networks. You will be working alongside other engineering and support teams, to provide insights and recommendations on how we make our services and software stacks more observable. Your focus in logging, metrics, distributed tracing and monitoring will be vital in this role to help Fastly grow our observability platforms.

What You'll Do:

  • Focus on improving and scaling our logging pipelines, telemetry collection, and monitoring systems
  • Improve the performance and reliability of the observability platform infrastructure
  • Create and instrument critical business metrics for insights and transparency
  • Collaborate with other Fastly engineers to implement solutions that deliver value for our internal customer teams
  • Participate in incident reviews to build improved alerts for detection and potential proactive mitigations

What We're Looking For: 

  • Extensive experience scaling out Prometheus architecture i.e. you are not just a user of Prometheus but have actually built the underlying infrastructure
  • Comfortable working with tools like OpenTelemetry, Grafana, Loki, Tempo, and Mimir
  • Extensive experience working with Linux operating systems focusing on metric collection and instrumentation
  • Implementing and scaling observability pipelines using self-managed, on premises, and open source software
  • Experience developing automation, orchestrations, and writing infrastructure as code for platform management
  • Comfortable working with scripting and interpreted languages, and test driven development
  • Excellent communication and listening skills, as well as a high degree of emotional intelligence

We’ll be super impressed if you have experience in any of these: 

  • Deep understanding of challenges with high cardinality, churn, data volumes to anticipate capacity needs 
  • A track record of working across multiple cloud platforms and physical environments to provide global visibility
  • Experience working with Clickhouse for time series data
  • Development of metrics exporters for the Prometheus ecosystem

Work Hours: 

  • This position will require you to be available during core business hours
  • You’ll participate in an on-call rotation to support platform availability

Work Locations & Travel Requirements: 

This position is open to both hybrid and remote locations.

The preferred locations for this position are:

  • San Francisco, CA 
  • Los Angeles, CA
  • Denver, CO
  • New York City, NY

Fastly currently embraces a largely hybrid model for most roles which allows employees flexibility to split their time between the office and home.  

We are willing to consider remote candidates in the US.

This position may require travel as required by your role or requested by your manager.

Salary: 

The estimated salary range for this position is $181,220 to $226,520.

Starting salary may vary based on permissible, non-discriminatory factors such as experience, skills, qualifications, and location.

This role may be eligible to participate in Fastly’s equity and discretionary bonus programs.

Benefits:

We care about you. Fastly works hard to create a positive environment for our employees, and we think your life outside of work is important too. We support our teams with great benefits that start on the first day of your employment with Fastly. Curious about our offerings? 

We offer a comprehensive benefits package including medical, dental, and vision insurance. Family planning, mental health support along with an Employee Assistance Program, insurance (life, disability, and accident), a flexible vacation policy, and up to 18 days of accrued paid sick leave are there to help support our employees. We also offer a 401(k) (including company match) and an Employee Stock Purchase Program. For 2024, we offer 10 paid local holidays and 11 paid company wellness days.

 

Why Fastly?

  • We have a huge impact. Fastly is a small company with a big reach. Not only do our customers have a tremendous user base, but we also support a growing number of open source projects and initiatives. Outside of code, employees are encouraged to share causes close to their heart with others so we can help lend a supportive hand.

  • We love distributed teams. Fastly’s home-base is in San Francisco, but we have multiple offices and employees sprinkled around the globe. As a new hire, you will be able to attend our IN-PERSON new hire orientation in our San Francisco office! It is an exciting week-long experience that we offer to new employees to build connections with colleagues across Fastly, participate in hands-on learning opportunities, and immerse yourself in our culture firsthand. 

  • We value diversity. Growing and maintaining our inclusive and diverse team matters to us. We are committed to being a company where our employees feel comfortable bringing their authentic selves to work and have the ability to be successful -- every day.

  • We are passionate. Fastly is chock full of passionate people and we’re not ‘one size fits all’. Fastly employs authors, pilots, skiers, parents (of humans and animals), makeup geeks, coffee connoisseurs, and more. We love employees for who they are and what they are passionate about.

We’re always looking for humble, sharp, and creative folks to join the Fastly team. If you think you might be a fit please apply! A fully completed application and resume or CV are required when applying.

Fastly is committed to ensuring equal employment opportunity and to providing employees with a safe and welcoming work environment free of discrimination and harassment. Our employment decisions are based on business needs, job requirements and individual qualifications. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, family or parental status, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances.

Consistent with the Americans with Disabilities Act (ADA) and federal or state disability laws, Fastly will provide reasonable accommodations for applicants and employees with disabilities. If reasonable accommodation is needed to participate in the job application or interview process, to perform essential job functions, and/or to receive other benefits and privileges of employment, please contact your Recruiter, or the Fastly Employee Relations team at candidateaccommodations@fastly.com or 501-287-4901.

Fastly collects and processes personal data submitted by job applicants in accordance with our Privacy Policy. Please see our privacy notice for job applicants.

+30d

Site Reliability Engineer

Master’s Degree, Bachelor's degree, Design, ansible, azure, c++, .net, docker, kubernetes, AWS, javascript

Abarca Health is hiring a Remote Site Reliability Engineer

What you’ll do

In a few words…

Abarca is igniting a revolution in healthcare. We built our company on the belief that with smarter technology we can redefine pharmacy benefits, but this is just the beginning…

Our Site Reliability Engineering team leverages software engineering and infrastructure operations to create highly reliable and scalable software systems. The team is responsible for ensuring that Abarca’s infrastructure operates efficiently by assisting with the design, build, and maintenance of software systems that automate and optimize the deployment, monitoring, and performance of Abarca’s systems. By focusing on improving the reliability and availability of software systems through engineering best practices and tools, we manage complex distributed systems to meet our external Service Level Agreements and internal Operating Level Agreements.

As our Site Reliability Engineer, you will be responsible for collaborating on the design, build, and maintenance of reliable and scalable infrastructure and software systems. This will be accomplished by tracking error budgets against service level agreements in order to meet and maintain compliance. You will also be collaborating with our Infrastructure, Software Engineering and Security teams to identify and implement reliability and performance improvements across our systems.

The fundamentals for the job…

  • Manage error budgets and ensure that service level agreements are met, keeping our stakeholders satisfied and reducing penalties associated with performance issues.
  • Monitor systems for potential performance and reliability issues, proactively taking measures to prevent their occurrence and minimize service disruption.
  • Promptly troubleshoot and resolve production issues while also identifying opportunities for improvement in terms of reliability, to ensure timely resolution and mitigate future occurrences.
  • Collaborate with Software Development, among other teams, continuously improving systems and processes to increase efficiency, minimize downtime, and optimize overall system reliability.
  • Develop and maintain automation tools to improve system observability, reliability, and performance.
  • Design and implement disaster recovery plans to ensure business continuity.

What we expect of you

The bold requirements…

  • Bachelor’s or Master’s Degree in Information Technology, Computer Science or a related field. (In lieu of a degree, equivalent experience may be considered.)
  • 3+ years of experience as a site reliability engineer or within related areas.
  • Experience managing error budgets as well as service level agreements.
  • Experience programming with languages including, but not limited to, .NET, C#, JavaScript, PyScript, and T-SQL/SQL.
  • Experience with containerization technologies (e.g. Docker and Kubernetes).
  • Experience with cloud infrastructure platforms (e.g. AWS, Azure, or GCP).
  • Experience with monitoring and alerting tools (e.g. DataDog, AppDynamics, Dynatrace, Prometheus, SolarWinds, Grafana, or Nagios)
  • Participate in on-call rotation to provide 24/7 support for critical systems. Availability to work rotating or irregular shifts, including weekends and certain holidays, per business or operational needs.
  • Some travel to the Puerto Rico location required (15-20%).
  • Excellent oral and written communication skills.
  • We are proud to offer a flexible hybrid work model which will require certain on-site work days (Puerto Rico Location Only)

Nice to haves…

  • Experience with automation tools (e.g. Ansible, PowerShell scripting).
  • Certified SRE Foundation (SREF).

Physical requirements…

  • Must be able to access and navigate each department at the organization’s facilities.
  • Sedentary work that primarily involves sitting/standing.

At Abarca we value and celebrate diversity. Diversity, equity, inclusion, and belonging are guiding principles of Abarca and ensure Abarca’s workforce reflects the communities it serves.  We are proud to provide equal employment opportunities to all employees and applicants for employment and prohibit discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, medical condition, genetic information, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws.

Abarca Health LLC is an equal employment opportunity employer and participates in E-Verify. “Applicant must be a United States citizen. Abarca Health LLC does not sponsor employment visas at this time.”

The above description is not intended to limit the scope of the job or to exclude other duties not mentioned. It is not a final set of specifications for the position. It’s simply meant to give readers an idea of what the role entails.

+30d

Staff Site Reliability Engineer

Mozilla, Remote US
6 years of experience, terraform, airflow, sql, Design, ansible, azure, java, c++, openstack, docker, elasticsearch, kubernetes, jenkins, python, AWS, backend, Node.js

Mozilla is hiring a Remote Staff Site Reliability Engineer


Why Mozilla?

Mozilla Corporation is the non-profit-backed technology company that has shaped the internet for the better over the last 25 years. We make pioneering brands like Firefox, the privacy-minded web browser, and Pocket, a service for keeping up with the best content online. Now, with more than 225 million people around the world using our products each month, we’re shaping the next 25 years of technology. Our work focuses on diverse areas including AI, social media, security and more. And we’re doing this while never losing our focus on our core mission – to make the internet better for everyone.

The Mozilla Corporation is wholly owned by the non-profit 501(c) Mozilla Foundation. This means we aren’t beholden to any shareholders — only to our mission. Along with thousands of volunteer contributors and collaborators all over the world, Mozillians design, build and distribute open-source software that enables people to enjoy the internet on their terms.

About this team and role:

Mozilla’s Release SRE Team is looking for a Staff SRE to help us build and maintain infrastructure that supports Mozilla products. You will combine skills from DevOps/SRE, systems administration, and software development to influence product architecture and evolution by crafting reliable cloud-based infrastructure for internal and external services.

As a Staff SRE you will work closely with Mozilla’s engineering and product teams and participate in significant engineering projects across the company. You will collaborate with hardworking engineers across different levels of experience and backgrounds. Most of your work will involve improving existing systems, building new infrastructure, evaluating tools and eliminating toil.

What you’ll do:

  • Manage infrastructure in AWS and GCP
  • Write, maintain, and expand automation scripts, metrics and monitoring tooling, and orchestration recipes
  • Lead other SREs and software development teams to deliver products with an eye on reliability and automation
  • Demonstrate accountability in the delivery of work
  • Spot and raise potential issues to the team
  • Be on-call for production services and infrastructure
  • Be trusted to resolve unclear but urgent tasks

What you’ll bring:

  • Degree and 6 years of experience related to backend software development or cloud operations, or experience related to DevOps/SRE
  • Experience programming in at least one of the following languages: Python, Java, C/C++, Go, Node.js or Rust. 
  • Involvement in running services in the cloud
  • Kubernetes administration and optimization
  • Proven understanding of database systems (SQL and/or non-relational databases)
  • Infrastructure as Code and Configuration as Code tooling (Puppet, Chef, Ansible, Salt, Terraform, Amazon CloudFormation or Google Cloud Deployment Manager)
  • Strong communication skills
  • Curiosity and interest in learning new things
  • Commitment to our values:
    • Welcoming differences
    • Being relationship-minded
    • Practicing responsible participation
    • Having grit

Bonus points for…

  • CI/CD orchestration (Jenkins, CircleCI, or TravisCI)
  • ETL, data modeling, cloud-based data storage, processing
  • GCP Data Services (Dataflow, BigQuery, Dataproc)
  • Workflow and data pipeline orchestration (Airflow, Oozie, Jenkins, etc.)
  • Container orchestration technologies (Kubernetes, OpenStack, Docker Swarm, etc.)
  • Open source software involvement
  • Monitoring/logging with technologies like Splunk, Elasticsearch, Logstash/Fluentd, Stackdriver, and time-series databases like InfluxDB

What you’ll get:

  • Generous performance-based bonus plans for all regular employees - we share in our success as one team
  • Rich medical, dental, and vision coverage
  • Generous retirement contributions with 100% immediate vesting (regardless of whether you contribute)
  • Quarterly all-company wellness days where everyone takes a pause together
  • Country specific holidays plus a day off for your birthday
  • One-time home office stipend
  • Annual professional development budget
  • Quarterly well-being stipend
  • Considerable paid parental leave
  • Employee referral bonus program
  • Other benefits (life/AD&D, disability, EAP, etc. - varies by country)

About Mozilla 

Mozilla exists to build the Internet as a public resource accessible to all because we believe that open and free is better than closed and controlled. When you work at Mozilla, you give yourself a chance to make a difference in the lives of Web users everywhere. And you give us a chance to make a difference in your life every single day. Join us to work on the Web as the platform and help create more opportunity and innovation for everyone online.

Commitment to diversity, equity, inclusion, and belonging

Mozilla understands that valuing diverse creative practices and forms of knowledge is crucial to and enriches the company’s core mission. We encourage applications from everyone, including members of all equity-seeking communities, such as (but certainly not limited to) women, racialized and Indigenous persons, persons with disabilities, persons of all sexual orientations, gender identities, and expressions.

We will ensure that qualified individuals with disabilities are provided reasonable accommodations to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment, as appropriate. Please contact us at hiringaccommodation@mozilla.com to request accommodation.

We are an equal opportunity employer. We do not discriminate on the basis of race (including hairstyle and texture), religion (including religious grooming and dress practices), gender, gender identity, gender expression, color, national origin, pregnancy, ancestry, domestic partner status, disability, sexual orientation, age, genetic predisposition, medical condition, marital status, citizenship status, military or veteran status, or any other basis covered by applicable laws.  Mozilla will not tolerate discrimination or harassment based on any of these characteristics or any other unlawful behavior, conduct, or purpose.

Group: C

Req ID: R2515

Hiring Ranges:

US Tier 1 Locations
$163,000 - $239,000 USD
US Tier 2 Locations
$150,000 - $220,000 USD
US Tier 3 Locations
$138,000 - $203,000 USD
