Lambda ● Remote, USA

Lambda is hiring a Remote HPC Support Engineering Manager

Lambda's GPU cloud is used by deep learning engineers at Stanford, Berkeley, and Carnegie Mellon. Lambda's on-prem systems power research and engineering at Intel, Microsoft, Kaiser Permanente, major universities, and the Department of Defense.

If you'd like to build the world's best deep learning cloud, join us.

About the role

At Lambda Labs, we are seeking a driven and experienced Manager of High Performance Computing (HPC) Support Operations. As the HPC Support Operations leader at Lambda, you will play a pivotal role in providing design feedback on HPC solutions and in ensuring the highest level of customer satisfaction by responding to and resolving technical issues. In addition to leading, developing, and mentoring a team of HPC Support Engineers, you will engage product, engineering, and sales teams to provide input into solution and product development.

You must be flexible about working nights and weekends as needed and about maintaining an on-call schedule. This position reports to the Director of Customer Support.

What You'll Do

Ensure escalations are handled appropriately and consistently across the team.
Collaborate with the Director of Customer Support to support the development and coaching of the HPC Support team, ensuring the team continues in their technical growth and consistently delivers outstanding customer experiences.
Stay updated on the latest HPC and NVIDIA technologies and provide recommendations based on thorough research and knowledge.
Support the HPC Support team by selecting, participating in, and leading training sessions and team meetings, and by addressing roadblocks.
Ensure that departmental policies, procedures, and documentation accurately reflect best practices, making necessary changes or modifications as needed.
Provide thought leadership in the evolution of product development based on experiences from customer deployments and field installations.
Review, develop, and distribute support metrics to track team performance and customer satisfaction, constantly seeking opportunities for improvement in support processes and practices.
Assist in developing workflows and procedures for the team based on industry-standard frameworks.
Lead your team to develop tools that assist in the troubleshooting and resolution of technical issues encountered.
Manage team on-call schedule and duties.
Conduct performance reviews for members of the HPC Support team.
Lead by example, actively engaging in resolving customer cases while maintaining the necessary technical knowledge to function effectively as a team member.

You

Proven experience in a technical leadership role, preferably within the HPC or AI industry
Strong knowledge of GPU InfiniBand HPC clusters, including hardware, software, and networking components
Advanced knowledge of Linux administration and troubleshooting
Excellent leadership and team management skills, with the ability to motivate and develop a high-performing team
Exceptional customer service and communication skills, with the ability to interact effectively with internal and external customers and stakeholders
Strong problem-solving and analytical skills, with a proactive approach to identifying and resolving technical issues
You are action-oriented and humble, have a strong willingness to learn, and serve the team members you lead

Nice to Have

Advanced degree in a related field
Certifications in HPC, network, or related technologies
Experience working with AI startups and large enterprises

Salary Range Information

Based on market data and other factors, the salary range for this position is $170K - $210K/yr. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

About Lambda

We offer generous cash & equity compensation
Investors include Gradient Ventures, Google’s AI-focused venture fund
We are experiencing extremely high demand for our systems, with quarter-over-quarter, year-over-year profitability
Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
We have a wildly talented team of 300, and growing fast
Health, dental, and vision coverage for you and your dependents
Commuter/Work from home stipends for select roles
401k Plan with 2% company match
Flexible Paid Time Off Plan that we all actually use

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.

Linux Support Engineer I - 2nd Shift

Lambda ● Remote

ML ● Lambda ● azure ● metal ● linux ● python ● AWS

Lambda is hiring a Remote Linux Support Engineer I - 2nd Shift

Note: This role is a 2nd shift position with a schedule of 2 PM - 12 AM CDT, Wednesday to Saturday. During the onboarding period, you will follow a standard Monday-Friday work schedule until onboarding is complete.

What You’ll Do

Be the first point of contact for all incoming technical support questions and handle all customer interactions with understanding, empathy, and transparency.
Troubleshoot OS, hardware, and Lambda Stack issues for customers and provide guidance on the best technical solutions that suit their needs.
Route and escalate tickets, as needed, to appropriate teams and departments while owning customer communication throughout the issue lifecycle.
Work with our technical writing team to document solutions to common problems to allow for future customer self-service.
Provide feedback to internal teams on technical issues our customers are facing and, above all, be the customer’s advocate.
Work together in a cohesive, customer-first collaborative team environment, sharing your skills, knowledge, and experience.

You

Have Linux administration experience in bare-metal, virtualized, and/or cloud environments.
Familiarity with private or hybrid cloud environments, such as Azure, AWS, and/or OCI.
Experience with monitoring and alerting for enterprise and cloud environments.
Have Shell and Python scripting proficiency.
Strong ability to curate and adhere to technical standard operating procedures.
Possess excellent written and oral communication skills.
Proven experience handling multiple customer interactions in a fast-paced environment.

Nice to Have

Familiarity with datacenter-level hardware, including GPUs.
Familiarity with ML / AI / Deep Learning.
Experience with Zendesk ticketing.
Wide flexibility for scheduling as we push for 24/7 support availability.

Salary Range Information

Based on market data and other factors, the salary range for this position is $69,000-$105,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

Staff Software Engineer - Inference

Lambda ● Remote (US & CAN)

ML ● Lambda ● Design ● metal ● c++ ● kubernetes ● python

Lambda is hiring a Remote Staff Software Engineer - Inference

What You’ll Do

Help design, build and improve our new inference and ML computation platform, acting as technical lead for one or more of our implementation teams.
Work with management, product and other internal business partners to drive technical decisions based on business and market needs
Work on the architecture of our distributed systems to ensure best-in-class reliability and efficiency, while helping to minimize operational costs and toil
Provide your team empathetic leadership as well as mentorship to grow their own skills and abilities
Build products around a large range of ML models and types, including industry-leading research
Help build safety and fraud systems around both inference and other ML systems
Handle interesting scaling, hardware, and scheduling challenges in a dynamic, rapidly changing industry sector

You

Are an experienced lead software engineer with ten or more years of experience working on business-critical distributed systems.
Have a history of leading projects from inception to production, including making technical decisions, authoring design and decision documents, and advising on staffing needs.
Have significant experience architecting systems around relational databases, document databases, queue datastores, block storage, object storage, unreliable networks, and caches.
Have a deep understanding of the balance between initial build costs and operational costs, and what it takes to launch a product quickly but with a good technical foundation.
Can write both Go and Python to a high standard, and can pick up other languages as needed.
Are very familiar with building integrated test frameworks and using CI/CD systems
Are product-oriented and focused on great user experiences, and are invested in building the best product possible for users.
Are good at working cross-functionally and solving problems across teams, including empathetic conflict resolution when working alongside teams with different priorities.
Have recent team leadership experience (on a team of four or more people)

Nice to Have

Experience writing Kubernetes operators or other Kubernetes integrations
Experience running ML/GPU workloads in production
Experience with computation dispatch and orchestration systems
Bare-metal hardware experience

Salary Range Information

Based on market data and other factors, the salary range for this position is $186,000 - $294,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

Senior Software Engineer - Core Infrastructure

Lambda ● Remote (US & CAN)

DevOPS ● Lambda ● golang ● terraform ● Design ● ansible ● api ● c++ ● kubernetes ● linux ● python ● AWS

Lambda is hiring a Remote Senior Software Engineer - Core Infrastructure

What You’ll Do

Design and implement scalable, secure, and highly available Kubernetes clusters to support our growing application portfolio
Bootstrap new on-prem and managed Kubernetes environments from the ground up, including networking, storage, and security configurations
Extend our existing Kubernetes platforms with advanced features such as service mesh, serverless frameworks, and custom resource definitions (CRDs)
Develop and maintain infrastructure-as-code (IaC) templates using Cluster API (CAPI) for automated cluster provisioning and configuration management
Implement robust monitoring, logging, and alerting solutions using OpenTelemetry to ensure platform health and performance
Optimize resource utilization and cost-effectiveness of Kubernetes deployments across multiple cloud providers
Collaborate with teams to design and implement CI/CD pipelines for containerized applications
Troubleshoot complex issues in production Kubernetes environments and lead incident response efforts
Stay up-to-date with the latest Kubernetes ecosystem developments and evaluate new technologies for potential adoption
Mentor junior engineers and contribute to the development of platform engineering best practices

You

Have 5+ years of experience bootstrapping, extending, and operating K8s at scale (1,500+ nodes)
Have 5+ years of experience automating the provisioning, configuration management, and deployment of production systems
Have 5+ years of experience building resilient, scalable systems with Python/Go
Have 5+ years of experience managing and securing infrastructure at scale (2,000+ hosts)
Possess sound experience with Infrastructure as Code (Terraform, Ansible, etc.)
Possess sound knowledge of DevOps, infrastructure, and platform concepts
Possess strong development skills in Python or Golang
Possess strong proficiency with the Linux command line and debugging tools

Nice to Have

Experience building complex hybrid environments (AWS and on-premises preferred)
Experience with service mesh technologies (e.g., Istio, Linkerd) and serverless frameworks (e.g., Knative)
Experience with multi-cluster or multi-cloud Kubernetes deployments
Experience in the machine learning or computer hardware industry
Certified Kubernetes Administrator (CKA) and/or Certified Kubernetes Application Developer (CKAD) certification
Contributions to open-source Kubernetes projects or tools
Familiarity with GitOps principles and tools like ArgoCD or Flux

Salary Range Information

Based on market data and other factors, the salary range for this position is $153,000-$240,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

Machine Learning Researcher

Lambda ● Remote (US & CAN)

ML ● Lambda ● c++

Lambda is hiring a Remote Machine Learning Researcher

Note: This position requires presence in one of our San Francisco Bay Area office locations (currently San Jose, expanding to Peninsula/SF) 4 days per week; Lambda’s designated work-from-home day is currently Tuesday.

What You’ll Do

You will develop and refine AI models and publish research outcomes in the form of datasets, models, demo apps, and publications.
You will also collaborate with colleagues across the organization to benchmark and optimize ML workloads on our GPU platforms.
The ideal candidate will possess a solid research background with experience in generative models.

You

Demonstrate a proven track record of enhancing existing machine learning methodologies, evidenced by significant achievements such as first-author publications or notable projects.
Manage and advance a research agenda, including selecting meaningful research problems and independently conducting long-term projects.
Execute model training at scale, either by managing multiple experiments concurrently or overseeing critical high-cost runs to ensure optimal performance.

Nice to Have

Experience in MLOps within a collaborative environment.
Experience in training large models using distributed systems.
Experience in creating high-performance implementations of deep learning algorithms.

Salary Range Information

Based on market data and other factors, the salary range for this position is $189,000 - $260,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

Senior HPC Operations Engineer

Lambda ● Remote (United States)

Lambda ● Bachelor's degree ● Design ● c++ ● docker ● kubernetes ● linux

Lambda is hiring a Remote Senior HPC Operations Engineer

What You’ll Do

Remotely provision and manage large-scale HPC clusters for AI workloads (up to many thousands of nodes)
Remotely install and configure operating systems, firmware, software, and networking on HPC clusters both manually and using automation tools
Troubleshoot and resolve HPC cluster issues working closely with physical deployment teams on-site
Provide context and details to an automation team to further automate the deployment process
Provide clear and detailed requirements back to the HPC design team on gaps and improvement areas, specifically in the areas of simplification, stability, and operational efficiency
Contribute to the creation and maintenance of Standard Operating Procedures
Provide regular and well-communicated updates to project leads throughout each deployment
Mentor and assist less-experienced team members
Stay up-to-date on the latest HPC/AI technologies and best practices

You

Have 10+ years of experience in managing HPC clusters
Have 10+ years of everyday Linux experience
Have a strong understanding of HPC architecture (compute, networking, storage)
Have an innate attention to detail
Have experience with Bright Cluster Manager or similar cluster management tools
Are an expert in configuring and troubleshooting:

SFP+ fiber, InfiniBand (IB), and 100 GbE network fabrics
Ethernet, switching, power infrastructure, GPUDirect, RDMA, NCCL, Horovod environments
Linux-based compute nodes, firmware updates, driver installation
SLURM, Kubernetes, or other job scheduling systems

Work well under deadlines and structured project plans
Have excellent problem-solving and troubleshooting skills
Have the flexibility to travel to our North American data centers as on-site needs arise or as part of training exercises
Are able to work both independently and as part of a team

Nice to Have

Experience with machine learning and deep learning frameworks (PyTorch, TensorFlow) and benchmarking tools (DeepSpeed, MLPerf)
Experience with containerization technologies (Docker, Kubernetes)
Experience working with the technologies that underpin our cloud business (GPU acceleration, virtualization, and cloud computing)
Keen situational awareness in customer situations, employing diplomacy and tact
Bachelor's degree in EE, CS, Physics, Mathematics, or equivalent work experience

Salary Range Information

Based on market data and other factors, the salary range for this position is $170,000-$230,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

Senior HPC Systems Engineer

Lambda ● Remote (US & CAN)

ML ● Lambda ● Design ● c++ ● kubernetes ● linux ● python

Lambda is hiring a Remote Senior HPC Systems Engineer

What You’ll Do

Design and architect the state-of-the-art AI supercomputers powering our cloud
Introduce technology and software to improve the performance, resiliency, and quality of service of our HPC storage and networking infrastructure
Work closely with our ML team to benchmark, tune, and optimize our hypervisors, network, and storage
Set up monitoring, logging and alerting to ensure high availability and observability
Provide guidance and represent the interests of our HPC customers

You

Have expertise in architecting, operating, and debugging large-scale HPC network and storage infrastructure, ideally using MPI, NCCL, RDMA, InfiniBand, and parallel file systems
Are experienced with building complex, high-quality software using Python
Possess a deep understanding of Linux fundamentals, especially its networking stack
Have experience with large GPU clusters (strongly preferred)
Have experience with virtualization and Kubernetes
Come from a strong engineering background - Computer Science, Electrical Engineering, Mathematics, Physics

You will be successful in this role if you

Have led and taken full ownership of large, ambiguous, cross-team projects from conception to production
Enjoy moving fast and making a large business impact
Value working on a team of high performers that hold each other accountable
Are a self-starter, curious, and not afraid to ask when in doubt
Are a quick learner and enjoy learning new technologies
Value working on a low-ego team that emphasizes strong communication, collaboration, and getting to the right answer as a team

Salary Range Information

Based on market data and other factors, the salary range for this position is $180,000 - $250,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

Senior Software Engineer - Cloud

Lambda ● Remote (US & CAN)

Lambda ● Design ● c++ ● linux ● python ● AWS

Lambda is hiring a Remote Senior Software Engineer - Cloud

What You’ll Do

Build software for training models across hundreds of GPUs interconnected with state-of-the-art networking fabric
Build core cloud features like VMs, VPCs, firewalls, and distributed file systems within our data centers

Qualifications

8+ years of experience implementing business-critical product features from conception to launch using Python
8+ years of experience contributing to the architecture and design of resilient, large-scale distributed systems
Strong understanding of public cloud features (e.g. SDN, block storage, distributed file systems, identity management)
Strong understanding of Linux (e.g. networking, process management, security, virtualization, systemd).
Strong engineering background - EECS preferred, Mathematics, Software Engineering, Physics

You will be successful in this role if you

Have led and taken full ownership of large, ambiguous, cross-team projects from conception to production
Enjoy moving fast and making a large business impact
Value working on a team of high performers that hold each other accountable
Are a self-starter, curious, and not afraid to ask when in doubt
Are a quick learner and enjoy learning new technologies
Value working on a low-ego team that emphasizes strong communication, collaboration, and getting to the right answer as a team
Care deeply about well-tested code

Salary Range Information

Based on market data and other factors, the salary range for this position is $185,000-$280,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

About Lambda

We offer generous cash & equity compensation
Investors include Gradient Ventures, Google’s AI-focused venture fund
We are experiencing extremely high demand for our systems, with quarter over quarter, year over year profitability
Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
We have a wildly talented team of 300, and growing fast
Health, dental, and vision coverage for you and your dependents
Commuter/Work from home stipends for select roles
401k Plan with 2% company match
Flexible Paid Time Off Plan that we all actually use

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.

+30d

Linux Support Engineer - Philippines

Lambda, Remote - Philippines

ML ● Lambda ● azure ● metal ● c++ ● linux ● python ● AWS

Lambda is hiring a Remote Linux Support Engineer - Philippines

Lambda's GPU cloud is used by deep learning engineers at Stanford, Berkeley, and Carnegie Mellon. Lambda's on-prem systems power research and engineering at Intel, Microsoft, Kaiser Permanente, major universities, and the Department of Defense.

If you'd like to build the world's best deep learning cloud, join us.

What You’ll Do

Be the first point of contact for all incoming technical support questions and handle all customer interactions with understanding, empathy, and transparency.
Troubleshoot OS, hardware, and Lambda Stack issues for customers and provide guidance on the best technical solutions that suit their needs.
Route and escalate tickets, as needed, to appropriate teams and departments while owning customer communication throughout the issue lifecycle.
Work with our technical writing team to document solutions to common problems to allow for future customer self-service.
Provide feedback to internal teams on technical issues our customers are facing and, above all, be the customer’s advocate.
Work together in a cohesive, customer-first collaborative team environment, sharing your skills, knowledge, and experience.

You

Have Linux administration experience in bare-metal, virtualized, and/or cloud environments.
Are familiar with private or hybrid cloud environments, such as Azure, AWS, and/or OCI.
Have experience with monitoring and alerting for enterprise and cloud environments.
Have Shell and Python scripting proficiency.
Have a strong ability to curate and adhere to technical standard operating procedures.
Possess excellent written and oral communication skills.
Have proven experience handling multiple customer interactions in a fast-paced environment.

Nice to Have

Familiarity with datacenter level hardware, including GPUs.
Familiarity with ML / AI / Deep Learning.
Experience with Zendesk ticketing.
Wide flexibility for scheduling as we push for 24/7 support availability.

+30d

Data Center Strategy - Facility Engineering

Lambda, Remote

Lambda ● Design ● c++

Lambda is hiring a Remote Data Center Strategy - Facility Engineering

Lambda was founded in 2012 by AI engineers who published research at top machine learning conferences. We aim to be the leading AI computing platform, supporting developers throughout the entire AI development lifecycle. At Lambda, we empower AI engineers to easily, securely, and affordably build, test, and deploy AI products at scale. Our offerings include high-performance on-prem GPU hardware and flexible cloud-based GPU solutions. We aim to make access to powerful computation as effortless and ubiquitous as electricity.

If you'd like to build the world's best deep learning cloud, join us.

About the Job

Become a key member of our Data Center Infrastructure Services team as a Principal Data Center Strategist. In this role, you will be instrumental in shaping the future of our data centers. Your responsibilities will include direct engagement with data center providers to evaluate the electrical, mechanical, and operational components of our facilities. You will report to the Vice President of Infrastructure and leverage your extensive knowledge of data center construction and operations. Your expertise will drive thought leadership and ensure optimal performance of our facility portfolio. Additionally, you will spearhead efficiency and build initiatives in both existing facilities and new construction. The ideal candidate will possess profound expertise in data center facilities management, a proven track record of successfully implementing cost-saving strategies, and the ability to provide comprehensive technical guidance.

What You'll Do

Act as a technical advisor on data center infrastructure.
Assess new data centers for suitability and compliance with our operational standards.
Evaluate and interface directly with data center providers to ensure operational efficiency, appropriate power utilization, and optimal resource allocation.
Provide expert troubleshooting support for data center operational issues.
Lead after-action reporting and problem remediation processes to continually enhance data center operations.
Ensure adherence to best practices for infrastructure concurrent maintainability, server cooling and power configurations, and maintenance to ensure adherence to operational SLAs.
Serve as a customer-facing data center expert.
Provide strategic input on new technologies, building designs, and retrofitting projects to ensure future-ready infrastructure.
Collaborate closely with the VP of Infrastructure and other senior leaders to align data center strategies with Lambda's overarching infrastructure goals.
Lead the design, deployment, and optimization of data center infrastructure, focusing on power distribution, cooling systems, and environmental controls
Drive data center lifecycle controls to ensure technology deployment is aligned and right sized
Develop and maintain comprehensive documentation of data center layout and infrastructure topologies to aid in optimizing the costing controls
Establish and enforce installation standards and documentation to ensure consistency and efficiency across all data center facilities

You

Know how to build, manage, run, and operate a data center at scale
Bring 15+ years of experience in operating, designing, deploying, and optimizing critical data center infrastructure, with a focus on power systems, cooling solutions, and environmental controls
Demonstrate advanced proficiency in infrastructure deployment for high power compute environments
Have a proven track record of deploying data center operational controls across multiple data center locations
Possess strong skills in negotiating terms for the design, build, operation, and decommissioning of data center space
Are detail-oriented, with a strong commitment to following established procedures and standards
Are action-oriented, with a passion for continuous learning and professional development
Are willing to travel for the setup and optimization of new data center locations

Nice to have

Construction Management experience
Troubleshooting experience with, and theoretical knowledge of, HPC computer designs
Experience working in large-scale campus and portfolio type business models for distributed data center environments
Experience collaborating with auditors to ensure compliance with industry standards
Previous experience in a leadership or managerial capacity within a data center engineering and operations team

Salary Range Information

Based on market data and other factors, the salary range for this position is $200,000-$247,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

+30d

Senior Network Engineer - Cloud

Lambda, Remote (US & CAN)

Lambda ● terraform ● Design ● ansible ● c++ ● openstack ● linux ● python ● AWS

Lambda is hiring a Remote Senior Network Engineer - Cloud

Lambda's GPU cloud is used by deep learning engineers at Stanford, Berkeley, and Carnegie Mellon. Lambda's on-prem systems power research and engineering at Intel, Microsoft, Kaiser Permanente, major universities, and the Department of Defense.

If you'd like to build the world's best deep learning cloud, join us.

What You'll Do

Help scale Lambda’s high performance cloud network
Contribute to the reproducible automation of network configuration
Contribute to the design and development of software defined networks
Help manage Spine and Leaf networks
Ensure high availability of our network through monitoring, failover, and redundancy
Ensure VMs have predictable networking performance
Help with deploying and maintaining network monitoring and management tools

You

Have led the implementation of production-scale networking projects
Have experience managing BGP
Have experience with Spine and Leaf (Clos) network topology
Have experience with multi-data center networks and hybrid cloud networks
Have experience building and maintaining Software Defined Networks (SDN)
Are comfortable on the Linux command line, and have an understanding of the Linux networking stack
Have Python programming experience

Nice To Have

Experience with OpenStack
Experience with HPC networking, such as InfiniBand
Experience automating network configuration within public clouds, with tools like Terraform
Experience with configuration management tools like Ansible
Experience building and maintaining multi-data center networks
Experience leading the implementation of production-scale SDNs in a cloud context (e.g. helping implement the infrastructure that powers an AWS VPC-like feature)
Deep understanding of the Linux networking stack and its interaction with network virtualization
Understanding of the SDN ecosystem (e.g. OVS, Neutron, DPDK, Cisco ACI or Nexus Fabric Controller, Arista CVP)
Experience with Next-Generation Firewalls (NGFW)

Salary Range Information

Based on market data and other factors, the salary range for this position is $180,000 - $230,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

+30d

Senior Cloud Solutions Engineer

Lambda, Remote

Sales ● Lambda ● Design ● api ● c++ ● kubernetes

Lambda is hiring a Remote Senior Cloud Solutions Engineer

Lambda's GPU cloud is used by deep learning engineers at Stanford, Berkeley, and Carnegie Mellon. Lambda's on-prem systems power research and engineering at Intel, Microsoft, Kaiser Permanente, major universities, and the Department of Defense.

If you'd like to build the world's best deep learning cloud, join us.

What You’ll Do

Advocate for Lambda’s Products.

Develop and maintain expertise in Lambda’s cloud products and services
Demonstrate Lambda’s software and solutions to customers, partners and staff
Create field enablement materials for technical audiences, lead workshops, and support product advocacy efforts
Provide technical feedback from customers to Lambda’s product and marketing teams

Own the technical side of Lambda’s sales process.

Partner with Lambda account executives to drive customer adoption and ensure successful delivery
Evaluate and assess customers' needs to deeply understand pain points, bottlenecks, and expected outcomes
Recommend appropriate cloud services and configurations to design a cohesive solution that supports the customer's applications and workflow
Document proposals and designs in formats including but not limited to presentations, white papers, diagrams, bills of materials, and rack elevations

Demonstrate expertise on Lambda’s cloud infrastructure

Build structured and purposeful learning into your work routine
Develop and support the internal Lambda community as a subject matter expert
Be an expert at deploying AI/ML workloads on Lambda cloud
Stay up to date on the latest deep learning trends and best practices, and experiment with them using internal tools and resources
Develop high quality processes and documentation

Reinforce Lambda’s culture

Contribute positively throughout the organization
Maintain a high level of agility and responsiveness
Stay hyper-focused on customer satisfaction

You

Love learning both broadly and deeply
Are a skilled communicator who can translate technical concepts into plain English and vague customer needs into technical requirements on the fly
Are comfortable communicating with and crafting presentation collateral for customers, both internal and external, up to and including partnering with sales on customer interactions
Identify as a subject matter expert in the cloud & hyperscale industry
Have 4+ years of experience as a product lead or solutions architect supporting cloud infrastructure and services
Have 3+ years of experience designing, deploying and scaling cloud infrastructure
Have 1+ year of experience working with cloud-based AI/ML services
Have familiarity with container orchestration platforms like Kubernetes
Have experience working with NVIDIA GPUs
Are a self-starter, curious, and not afraid to ask when in doubt
Measure yourself on results, not effort, and constantly seek to accomplish more by becoming more efficient
Are able to build strong relationships across your entire organization

Nice to Have

Experience using deep learning frameworks such as TensorFlow or PyTorch
Experience deploying generative AI/ML applications
Experience working with LLM architectures
Experience designing, implementing and maintaining large-scale HPC infrastructure in cloud and hybrid environments
Experience working with RESTful APIs and general service-oriented architecture

Salary Range Information

Based on market data and other factors, the salary range for this position is $194,000-$278,000 OTE. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

+30d

HPC Support Engineer - Enterprise

Lambda, Remote (United States)

ML ● Lambda ● c++ ● kubernetes ● linux ● python

Lambda is hiring a Remote HPC Support Engineer - Enterprise

Lambda's GPU cloud is used by deep learning engineers at Stanford, Berkeley, and Carnegie Mellon. Lambda's on-prem systems power research and engineering at Intel, Microsoft, Kaiser Permanente, major universities, and the Department of Defense.

If you'd like to build the world's best deep learning cloud, join us.

What You’ll Do

Be a first point of contact for all incoming technical support questions
Troubleshoot software and hardware issues for customers and provide solutions
Document solutions to common problems
Collaborate in the development of new products
Provide feedback to product engineering teams

You

Linux administration experience in clustered or HPC environments
Experience with high-throughput networking technologies such as InfiniBand, RoCE, and iWARP
Shell or Python scripting proficiency
Excellent written and oral communication skills
Knowledge of data center hardware and out of band management tools
Experience working in private or hybrid cloud environments
Experience with HPC/AI technologies such as:

SFP+ fiber, InfiniBand (IB) / 100 GbE network fabric experience
Ethernet, switching, power infrastructure, GPUDirect, RDMA, NCCL, and Horovod environments
SLURM, Kubernetes, or other job scheduling systems
Distributed GPU training systems and third-party ML Ops platforms

Nice to Have

Experience with ML / AI / Deep Learning
Experience with NVIDIA data center GPUs
Experience with parallel file systems
Experience designing HPC clusters
Direct experience with HPC clusters for internal or external customers
Understanding of the hardware, software, and tools used for deep learning
Experience working with the technologies that underpin our business such as: Deep Learning frameworks, GPU acceleration, virtualization, and cloud computing

Salary Range Information

Based on market data and other factors, the salary range for this position is $100,000-$155,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

About Lambda

We offer generous cash & equity compensation
Investors include Gradient Ventures, Google’s AI-focused venture fund
We are experiencing extremely high demand for our systems, with quarter over quarter, year over year profitability
Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
We have a wildly talented team of 250, and growing fast
Health, dental, and vision coverage for you and your dependents
Commuter/Work from home stipends for select roles
401k Plan with 2% company match
Flexible Paid Time Off Plan that we all actually use

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
