Lambda


Lambda provides computation to accelerate human progress. We're a team of Deep Learning engineers building the world's best GPU workstations and servers. Our products power engineers and researchers at the forefront of human knowledge. Our customers include Apple, MIT, Los Alamos National Lab, Microsoft, Tencent, Kaiser Permanente, Stanford, Harvard, Caltech, and the Department of Defense.

Headquarter Location:
San Francisco, California

Lambda is hiring a Remote HPC Support Engineering Manager

Lambda's GPU cloud is used by deep learning engineers at Stanford, Berkeley, and Carnegie Mellon. Lambda's on-prem systems power research and engineering at Intel, Microsoft, Kaiser Permanente, major universities, and the Department of Defense.

If you'd like to build the world's best deep learning cloud, join us.

About the role 

At Lambda Labs, we are seeking a driven and experienced Manager of High Performance Computing (HPC) Support Operations. As the HPC Support Operations leader at Lambda, you will play a pivotal role in providing design feedback on HPC solutions and in ensuring the highest level of customer satisfaction by responding to and resolving technical issues. In addition to leading, developing, and mentoring a team of HPC Support Engineers, you will engage product, engineering, and sales teams to provide input into solution and product development.

You must be flexible about working nights and weekends as needed and able to maintain an on-call schedule. This position reports to the Director of Customer Support.

What You'll Do

  • Ensure escalations are handled appropriately and consistently across the team.
  • Collaborate with the Director of Customer Support to support the development and coaching of the HPC Support team, ensuring the team continues in their technical growth and consistently delivers outstanding customer experiences.
  • Stay updated on the latest HPC and Nvidia technologies and provide recommendations based on thorough research and knowledge.
  • Support the HPC Support team by selecting, participating in, and leading training sessions, team meetings, and addressing roadblocks.
  • Ensure that departmental policies, procedures, and documentation accurately reflect best practices, making necessary changes or modifications as needed.
  • Provide thought leadership in the evolution of product development based on experiences from customer deployments and field installations.
  • Review, develop, and distribute support metrics to track team performance and customer satisfaction, constantly seeking opportunities for improvement in support processes and practices.
  • Assist in developing workflows and procedures for the team based on industry-standard frameworks.
  • Lead your team to develop tools that assist in the troubleshooting and resolution of technical issues encountered.
  • Manage team on-call schedule and duties.
  • Conduct performance reviews for members of the HPC Support team.
  • Lead by example, actively engaging in resolving customer cases while maintaining the necessary technical knowledge to function effectively as a team member.

You

  • Proven experience in a technical leadership role, preferably within the HPC or AI industry
  • Strong knowledge of GPU InfiniBand HPC clusters, including hardware, software, and networking components
  • Advanced knowledge of Linux administration and troubleshooting
  • Have excellent leadership and team management skills with the ability to motivate and develop a high-performing team
  • Exceptional customer service and communication skills, with the ability to interact effectively with internal and external customers and stakeholders
  • Strong problem solving and analytical skills, with a proactive approach to identifying and resolving technical issues
  • You are action-oriented, humble, have a strong willingness to learn, and serve the team members you lead

Nice to Have

  • Advanced degree in a related field
  • Certifications in HPC, network, or related technologies
  • Experience working with AI startups and large enterprises

Salary Range Information 

Based on market data and other factors, the salary range for this position is $170K - $210K/yr.  However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

About Lambda

  • We offer generous cash & equity compensation
  • Investors include Gradient Ventures, Google’s AI-focused venture fund
  • We are experiencing extremely high demand for our systems, with quarter over quarter, year over year profitability
  • Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
  • We have a wildly talented team of 300, and growing fast
  • Health, dental, and vision coverage for you and your dependents
  • Commuter/Work from home stipends for select roles
  • 401k Plan with 2% company match
  • Flexible Paid Time Off Plan that we all actually use

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.

Lambda is hiring a Remote Linux Support Engineer I - 2nd Shift

Lambda's GPU cloud is used by deep learning engineers at Stanford, Berkeley, and Carnegie Mellon. Lambda's on-prem systems power research and engineering at Intel, Microsoft, Kaiser Permanente, major universities, and the Department of Defense.

If you'd like to build the world's best deep learning cloud, join us.

Note: This role is a 2nd shift position with a schedule of 2 PM - 12 AM CDT, Wednesday to Saturday. During the onboarding period, you will follow a standard Monday-Friday work schedule until onboarding is complete.

What You’ll Do

  • Be the first point of contact for all incoming technical support questions and handle all customer interactions with understanding, empathy, and transparency.
  • Troubleshoot OS, hardware, and Lambda Stack issues for customers and provide guidance on the best technical solutions that suit their needs.
  • Route and escalate tickets, as needed, to appropriate teams and departments while owning customer communication throughout the issue lifecycle.
  • Work with our technical writing team to document solutions to common problems to allow for future customer self-service.
  • Provide feedback to internal teams on technical issues our customers are facing and, above all, be the customer’s advocate.
  • Work together in a cohesive, customer-first collaborative team environment, sharing your skills, knowledge, and experience.

You

  • Have Linux administration experience in bare-metal, virtualized, and/or cloud environments.
  • Familiarity with private or hybrid cloud environments, such as Azure, AWS, and/or OCI.
  • Experience with monitoring and alerting for enterprise and cloud environments.
  • Have Shell and Python scripting proficiency.
  • Strong ability to curate and adhere to technical standard operating procedures.
  • Possess excellent written and oral communication skills.
  • Proven experience handling multiple customer interactions in a fast-paced environment.

Nice to Have

  • Familiarity with datacenter level hardware, including GPUs.
  • Familiarity with ML / AI / Deep Learning.
  • Experience with Zendesk ticketing.
  • Wide flexibility for scheduling as we push for 24/7 support availability.

Salary Range Information 

Based on market data and other factors, the salary range for this position is $69,000-$105,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description. 

Lambda is hiring a Remote Staff Software Engineer - Inference

Lambda's GPU cloud is used by deep learning engineers at Stanford, Berkeley, and Carnegie Mellon. Lambda's on-prem systems power research and engineering at Intel, Microsoft, Kaiser Permanente, major universities, and the Department of Defense.

If you'd like to build the world's best deep learning cloud, join us.

What You’ll Do

  • Help design, build and improve our new inference and ML computation platform, acting as technical lead for one or more of our implementation teams.
  • Work with management, product and other internal business partners to drive technical decisions based on business and market needs
  • Work on the architecture of our distributed systems to ensure best-in-class reliability and efficiency, while helping to minimize operational costs and toil work
  • Provide your team empathetic leadership as well as mentorship to grow their own skills and abilities
  • Build products around a large range of ML models and types, including industry-leading research
  • Help build safety and fraud systems, around both inference and other ML systems
  • Handle interesting scaling, hardware, and scheduling challenges in a dynamic, rapidly changing industry sector

You

  • Are an experienced lead software engineer with ten or more years of working on business-critical distributed systems.
  • Have a history of leading projects from inception to production, including making technical decisions, authoring design and decision documents, and advising on staffing needs.
  • Have significant experience architecting systems around relational databases, document databases, queue datastores, block storage, object storage, unreliable networks, and caches.
  • Have a deep understanding of the balance between initial build costs and operational costs, and what it takes to launch a product quickly but with a good technical foundation.
  • Can write both Go and Python to a high level, and can pick up other languages as needed.
  • Are very familiar with building integrated test frameworks and using CI/CD systems
  • Are product-oriented and focused on great user experiences, and are invested in building the best product possible for users.
  • Are good at working cross-functionally and solving problems across teams, including empathetic conflict resolution when working alongside teams with different priorities.
  • Have recent team leadership experience (on a team of four or more people) 

Nice to Have

  • Experience writing Kubernetes operators or other Kubernetes integrations
  • Experience running ML/GPU workloads in production
  • Experience with computation dispatch and orchestration systems
  • Bare-metal hardware experience

Salary Range Information 

Based on market data and other factors, the salary range for this position is $186,000 - $294,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

Lambda is hiring a Remote Senior Software Engineer - Core Infrastructure

Lambda's GPU cloud is used by deep learning engineers at Stanford, Berkeley, and Carnegie Mellon. Lambda's on-prem systems power research and engineering at Intel, Microsoft, Kaiser Permanente, major universities, and the Department of Defense.

If you'd like to build the world's best deep learning cloud, join us.

What You’ll Do 

  • Design and implement scalable, secure, and highly available Kubernetes clusters to support our growing application portfolio
  • Bootstrap new on-prem and managed Kubernetes environments from the ground up, including networking, storage, and security configurations
  • Extend our existing Kubernetes platforms with advanced features such as service mesh, serverless frameworks, and custom resource definitions (CRDs)
  • Develop and maintain infrastructure-as-code (IaC) templates using Cluster API (CAPI) for automated cluster provisioning and configuration management
  • Implement robust monitoring, logging, and alerting solutions using OpenTelemetry to ensure platform health and performance
  • Optimize resource utilization and cost-effectiveness of Kubernetes deployments across multiple cloud providers
  • Collaborate with teams to design and implement CI/CD pipelines for containerized applications
  • Troubleshoot complex issues in production Kubernetes environments and lead incident response efforts
  • Stay up-to-date with the latest Kubernetes ecosystem developments and evaluate new technologies for potential adoption
  • Mentor junior engineers and contribute to the development of platform engineering best practices

You

  • Have 5+ years bootstrapping, extending and operating K8s at scale (1,500+ nodes)
  • Have 5+ years automating the provisioning, configuration management, and deployment of production systems
  • Have 5+ years building resilient, scalable systems with Python/Go
  • Have 5+ years managing and securing infrastructure at scale (2,000+ hosts)
  • Possess sound experience with Infrastructure as Code (Terraform, Ansible, etc.)
  • Possess sound knowledge of DevOps, Infrastructure, and Platform concepts
  • Possess strong development skills in Python or Golang
  • Possess strong proficiency with Linux command line and debugging tools

Nice to Have

  • Experience with building complex hybrid environments (AWS and on-premise preferred)
  • Experience with service mesh technologies (e.g., Istio, Linkerd) and serverless frameworks (e.g., Knative)
  • Experience with multi-cluster or multi-cloud Kubernetes deployments
  • Experience in the machine learning or computer hardware industry
  • Certified Kubernetes Administrator (CKA) and/or Certified Kubernetes Application Developer (CKAD) certification
  • Contributions to open-source Kubernetes projects or tools
  • Familiarity with GitOps principles and tools like ArgoCD or Flux

Salary Range Information 

Based on market data and other factors, the salary range for this position is $153,000-$240,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

Machine Learning Researcher

Lambda, Remote (US & CAN)
Tags: ML, Lambda, c++

Lambda is hiring a Remote Machine Learning Researcher

Lambda's GPU cloud is used by deep learning engineers at Stanford, Berkeley, and Carnegie Mellon. Lambda's on-prem systems power research and engineering at Intel, Microsoft, Kaiser Permanente, major universities, and the Department of Defense.

If you'd like to build the world's best deep learning cloud, join us.

*Note: This position requires presence in one of our San Francisco Bay Area office locations (Currently San Jose, expanding to Peninsula/SF) 4 days per week; Lambda’s designated work from home day is currently Tuesday.

What You’ll Do

  • You will work on developing and refining AI models and publish research outcomes in the form of datasets, models, demo apps, and publications.
  • You will also collaborate with colleagues across the organization to benchmark and optimize ML workloads on our GPU platforms.
  • The ideal candidate will possess a solid research background with experience in generative models.

You 

  • Demonstrate a proven track record of enhancing existing machine learning methodologies, evidenced by significant achievements such as first author publications or notable projects.
  • Manage and advance a research agenda, including selecting meaningful research problems and independently conducting long-term projects.
  • Execute model training at scale, either by managing multiple experiments concurrently or overseeing critical high-cost runs to ensure optimal performance.

Nice to Have

  • Experience in MLOps within a collaborative environment.
  • Experience in training large models using distributed systems.
  • Experience in creating high-performance implementations of deep learning algorithms.

Salary Range Information 

Based on market data and other factors, the salary range for this position is $189,000 - $260,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

Lambda is hiring a Remote Senior HPC Operations Engineer

Lambda's GPU cloud is used by deep learning engineers at Stanford, Berkeley, and Carnegie Mellon. Lambda's on-prem systems power research and engineering at Intel, Microsoft, Kaiser Permanente, major universities, and the Department of Defense.

If you'd like to build the world's best deep learning cloud, join us.

What You’ll Do

  • Remotely provision and manage large-scale HPC clusters for AI workloads (up to many thousands of nodes)
  • Remotely install and configure operating systems, firmware, software, and networking on HPC clusters both manually and using automation tools
  • Troubleshoot and resolve HPC cluster issues working closely with physical deployment teams on-site
  • Provide context and details to an automation team to further automate the deployment process
  • Provide clear and detailed requirements back to HPC design team on gaps and improvement areas, specifically in the areas of simplification, stability, and operational efficiency
  • Contribute to the creation and maintenance of Standard Operating Procedures
  • Provide regular and well-communicated updates to project leads throughout each deployment
  • Mentor and assist less-experienced team members
  • Stay up-to-date on the latest HPC/AI technologies and best practices

You

  • Have 10+ years of experience in managing HPC clusters
  • Have 10+ years of everyday Linux experience
  • Have a strong understanding of HPC architecture (compute, networking, storage)
  • Have an innate attention to detail
  • Have experience with Bright Cluster Manager or similar cluster management tools
  • Are an expert in configuring and troubleshooting:
    • SFP+ fiber, InfiniBand (IB), and 100 GbE network fabrics
    • Ethernet, switching, power infrastructure, GPUDirect, RDMA, NCCL, Horovod environments
    • Linux-based compute nodes, firmware updates, driver installation
    • SLURM, Kubernetes, or other job scheduling systems
  • Work well under deadlines and structured project plans
  • Have excellent problem-solving and troubleshooting skills
  • Have the flexibility to travel to our North American data centers as on-site needs arise or as part of training exercises
  • Are able to work both independently and as part of a team

Nice to Have

  • Experience with machine learning and deep learning frameworks (PyTorch, TensorFlow) and benchmarking tools (DeepSpeed, MLPerf)
  • Experience with containerization technologies (Docker, Kubernetes)
  • Experience working with the technologies that underpin our cloud business (GPU acceleration, virtualization, and cloud computing)
  • Keen situational awareness in customer situations, employing diplomacy and tact
  • Bachelor's degree in EE, CS, Physics, Mathematics, or equivalent work experience

Salary Range Information 

Based on market data and other factors, the salary range for this position is $170,000-$230,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description. 

 

Senior HPC Systems Engineer

Lambda, Remote (US & CAN)
Tags: ML, Lambda, Design, c++, kubernetes, linux, python

Lambda is hiring a Remote Senior HPC Systems Engineer

Lambda's GPU cloud is used by deep learning engineers at Stanford, Berkeley, and Carnegie Mellon. Lambda's on-prem systems power research and engineering at Intel, Microsoft, Kaiser Permanente, major universities, and the Department of Defense.

If you'd like to build the world's best deep learning cloud, join us.

What You’ll Do

  • Design and architect the state-of-the-art AI supercomputers powering our cloud
  • Introduce technology and software to improve the performance, resiliency, and quality of service of our HPC storage and networking infrastructure
  • Work closely with our ML team to benchmark, tune, and optimize our hypervisors, network, and storage
  • Set up monitoring, logging and alerting to ensure high availability and observability
  • Provide guidance and represent the interests of our HPC customers

You

  • Have expertise with architecting, operating, and debugging large-scale HPC network and storage infrastructure, ideally using MPI, NCCL, RDMA, InfiniBand, and parallel file systems
  • Are experienced with building complex, high-quality software using Python
  • Possess a deep understanding of Linux fundamentals, especially its networking stack
  • Have experience with large GPU clusters (strongly preferred)
  • Have experience with virtualization and Kubernetes
  • Come from a strong engineering background - Computer Science, Electrical Engineering, Mathematics, Physics

You will be successful in this role if you

  • Have led and taken full ownership over large, ambiguous, cross-team projects from conception to production
  • Enjoy moving fast and making a large business impact
  • Value working on a team of high performers that hold each other accountable
  • Are a self-starter, curious, and not afraid to ask when in doubt
  • Are a quick learner and enjoy learning new technologies
  • Value working on a low ego team that emphasizes strong communication, collaboration, and getting to the right answer as a team 

Salary Range Information 

Based on market data and other factors, the salary range for this position is $180,000 - $250,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

Senior Software Engineer - Cloud

Lambda, Remote (US & CAN)
Tags: Lambda, Design, c++, linux, python, AWS

Lambda is hiring a Remote Senior Software Engineer - Cloud

Lambda's GPU cloud is used by deep learning engineers at Stanford, Berkeley, and Carnegie Mellon. Lambda's on-prem systems power research and engineering at Intel, Microsoft, Kaiser Permanente, major universities, and the Department of Defense.

If you'd like to build the world's best deep learning cloud, join us.

What You’ll Do

  • Build software for training models across hundreds of GPUs interconnected with state-of-the-art networking fabric
  • Build core cloud features like VMs, VPCs, firewalls, distributed file systems within our data centers

Qualifications

  • 8+ years of experience implementing business-critical product features from conception to launch using Python
  • 8+ years of experience contributing to the architecture and design of resilient, large-scale distributed systems
  • Strong understanding of public cloud features (e.g. SDN, block storage, distributed file systems, identity management)
  • Strong understanding of Linux (e.g. networking, process management, security, virtualization, systemd).
  • Strong engineering background - EECS preferred, Mathematics, Software Engineering, Physics

You will be successful in this role if you

  • Have led and taken full ownership over large, ambiguous, cross-team projects from conception to production
  • Enjoy moving fast and making a large business impact
  • Value working on a team of high performers that hold each other accountable
  • Are a self-starter, curious, and not afraid to ask when in doubt
  • Are a quick learner and enjoy learning new technologies
  • Value working on a low ego team that emphasizes strong communication, collaboration, and getting to the right answer as a team 
  • Care deeply about well-tested code

Salary Range Information 

Based on market data and other factors, the salary range for this position is $185,000-$280,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

About Lambda

  • We offer generous cash & equity compensation
  • Investors include Gradient Ventures, Google’s AI-focused venture fund
  • We are experiencing extremely high demand for our systems, with quarter over quarter, year over year profitability
  • Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
  • We have a wildly talented team of 300, and growing fast
  • Health, dental, and vision coverage for you and your dependents
  • Commuter/Work from home stipends for select roles
  • 401k Plan with 2% company match
  • Flexible Paid Time Off Plan that we all actually use

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.

Linux Support Engineer - Philippines

Lambda
Remote - Philippines
Tags: ML, Lambda, azure, metal, c++, linux, python, AWS

Lambda is hiring a Remote Linux Support Engineer - Philippines

Lambda's GPU cloud is used by deep learning engineers at Stanford, Berkeley, and Carnegie Mellon. Lambda's on-prem systems power research and engineering at Intel, Microsoft, Kaiser Permanente, major universities, and the Department of Defense.

If you'd like to build the world's best deep learning cloud, join us.

What You’ll Do

  • Be the first point of contact for all incoming technical support questions and handle all customer interactions with understanding, empathy, and transparency.
  • Troubleshoot OS, hardware, and Lambda Stack issues for customers and provide guidance on the best technical solutions that suit their needs.
  • Route and escalate tickets, as needed, to appropriate teams and departments while owning customer communication throughout the issue lifecycle.
  • Work with our technical writing team to document solutions to common problems to allow for future customer self-service.
  • Provide feedback to internal teams on technical issues our customers are facing and, above all, be the customer’s advocate.
  • Work together in a cohesive, customer-first collaborative team environment, sharing your skills, knowledge, and experience.

You

  • Have Linux administration experience in bare-metal, virtualized, and/or cloud environments.
  • Are familiar with private or hybrid cloud environments, such as Azure, AWS, and/or OCI.
  • Have experience with monitoring and alerting for enterprise and cloud environments.
  • Have Shell and Python scripting proficiency.
  • Have a strong ability to curate and adhere to technical standard operating procedures.
  • Possess excellent written and oral communication skills.
  • Have proven experience handling multiple customer interactions in a fast-paced environment.

Nice to Have

  • Familiarity with datacenter level hardware, including GPUs.
  • Familiarity with ML / AI / Deep Learning.
  • Experience with Zendesk ticketing.
  • Wide flexibility for scheduling as we push for 24/7 support availability.

About Lambda

  • We offer generous cash & equity compensation
  • Investors include Gradient Ventures, Google’s AI-focused venture fund
  • We are experiencing extremely high demand for our systems, with quarter over quarter, year over year profitability
  • Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
  • We have a wildly talented team of 300, and growing fast
  • Health, dental, and vision coverage for you and your dependents
  • Commuter/Work from home stipends for select roles
  • 401k Plan with 2% company match
  • Flexible Paid Time Off Plan that we all actually use

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.

Lambda is hiring a Remote Data Center Strategy - Facility Engineering

Lambda was founded in 2012 by AI engineers who published research at top machine learning conferences. We aim to be the leading AI computing platform, supporting developers throughout the entire AI development lifecycle. At Lambda, we empower AI engineers to easily, securely, and affordably build, test, and deploy AI products at scale. Our offerings include high-performance on-prem GPU hardware and flexible cloud-based GPU solutions. We aim to make access to powerful computation as effortless and ubiquitous as electricity.

If you'd like to build the world's best deep learning cloud, join us.

About the Job 

Become a key member of our Data Center Infrastructure Services team as a Principal Data Center Strategist. In this role, you will be instrumental in shaping the future of our data centers. Your responsibilities will include direct engagement with data center providers to evaluate the electrical, mechanical, and operational components of our facilities. You will report to the Vice President of Infrastructure and leverage your extensive knowledge of data center construction and operations. Your expertise will drive thought leadership and ensure optimal performance of our facility portfolio. Additionally, you will spearhead efficiency and build initiatives in both existing facilities and new construction. The ideal candidate will possess profound expertise in data center facilities management, a proven track record of successfully implementing cost-saving strategies, and the ability to provide comprehensive technical guidance.

What You'll Do

  • Act as a technical advisor on data center infrastructure
  • Assess new data centers for suitability and compliance with our operational standards
  • Evaluate and interface directly with data center providers to ensure operational efficiency, appropriate power utilization, and optimal resource allocation
  • Provide expert troubleshooting support for data center operational issues.
  • Lead after-action reporting and problem remediation processes to continually enhance data center operations.
  • Ensure adherence to best practices for infrastructure concurrent maintainability, server cooling and power configurations, and maintenance so that operational SLAs are met
  • Serve as a customer-facing data center expert
  • Provide strategic input on new technologies, building designs, and retrofitting projects to ensure future-ready infrastructure
  • Collaborate closely with the VP of Infrastructure and other senior leaders to align data center strategies with Lambda's overarching infrastructure goals
  • Lead the design, deployment, and optimization of data center infrastructure, focusing on power distribution, cooling systems, and environmental controls
  • Drive data center lifecycle controls to ensure technology deployment is aligned and right sized
  • Develop and maintain comprehensive documentation of data center layout and infrastructure topologies to aid in optimizing the costing controls
  • Establish and enforce installation standards and documentation to ensure consistency and efficiency across all data center facilities

You

  • Know how to build, manage, run, and operate a data center at scale
  • Bring 15+ years of experience in operating, designing, deploying, and optimizing critical data center infrastructure, with a focus on power systems, cooling solutions, and environmental controls
  • Demonstrate advanced proficiency in infrastructure deployment for high power compute environments
  • Have a proven track record of deploying data center operational controls across multiple data center locations
  • Are a strong negotiator of terms for the design, build, operation, and decommissioning of data center space
  • Are detail-oriented, with a strong commitment to following established procedures and standards
  • Are action-oriented, with a passion for continuous learning and professional development
  • Are willing to travel for the setup and optimization of new data center locations

Nice to have

  • Construction Management experience
  • Troubleshooting experience with, and theoretical knowledge of, HPC computer designs
  • Experience working in large-scale campus and portfolio type business models for distributed data center environments
  • Experience collaborating with auditors to ensure compliance with industry standards
  • Previous experience in a leadership or managerial capacity within a data center engineering and operations team

Salary Range Information 

Based on market data and other factors, the salary range for this position is $200,000-$247,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

About Lambda

  • We offer generous cash & equity compensation
  • Investors include Gradient Ventures, Google’s AI-focused venture fund
  • We are experiencing extremely high demand for our systems, with quarter over quarter, year over year profitability
  • Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
  • We have a wildly talented team of 300, and growing fast
  • Health, dental, and vision coverage for you and your dependents
  • Commuter/Work from home stipends for select roles
  • 401k Plan with 2% company match
  • Flexible Paid Time Off Plan that we all actually use

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.

Lambda is hiring a Remote Senior Network Engineer - Cloud

Lambda's GPU cloud is used by deep learning engineers at Stanford, Berkeley, and Carnegie Mellon. Lambda's on-prem systems power research and engineering at Intel, Microsoft, Kaiser Permanente, major universities, and the Department of Defense.

If you'd like to build the world's best deep learning cloud, join us.

What You'll Do

  • Help scale Lambda’s high performance cloud network
  • Contribute to the reproducible automation of network configuration
  • Contribute to the design and development of software defined networks
  • Help manage Spine and Leaf networks
  • Ensure high availability of our network through monitoring, failover, and redundancy
  • Ensure VMs have predictable networking performance
  • Help with deploying and maintaining network monitoring and management tools

You

  • Have led the implementation of production-scale networking projects
  • Have experience managing BGP
  • Have experience with Spine and Leaf (Clos) network topology
  • Have experience with multi-data center networks and hybrid cloud networks
  • Have experience building and maintaining Software Defined Networks (SDN)
  • Are comfortable on the Linux command line, and have an understanding of the Linux networking stack
  • Have Python programming experience

Nice To Have

  • Experience with OpenStack
  • Experience with HPC networking, such as InfiniBand
  • Experience automating network configuration within public clouds, with tools like Terraform
  • Experience with configuration management tools like Ansible
  • Experience building and maintaining multi-data center networks
  • Have led implementation of production-scale SDNs in a cloud context (e.g. helped implement the infrastructure that powers an AWS VPC-like feature)
  • Deep understanding of the Linux networking stack and its interaction with network virtualization
  • Understanding of the SDN ecosystem (e.g. OVS, Neutron, DPDK, Cisco ACI or Nexus Fabric Controller, Arista CVP)
  • Experience with Next-Generation Firewalls (NGFW)

Salary Range Information 

Based on market data and other factors, the salary range for this position is $180,000 - $230,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description. 

About Lambda

  • We offer generous cash & equity compensation
  • Investors include Gradient Ventures, Google’s AI-focused venture fund
  • We are experiencing extremely high demand for our systems, with quarter over quarter, year over year profitability
  • Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
  • We have a wildly talented team of 300, and growing fast
  • Health, dental, and vision coverage for you and your dependents
  • Commuter/Work from home stipends for select roles
  • 401k Plan with 2% company match
  • Flexible Paid Time Off Plan that we all actually use

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.

Lambda is hiring a Remote Senior Cloud Solutions Engineer

Lambda's GPU cloud is used by deep learning engineers at Stanford, Berkeley, and Carnegie Mellon. Lambda's on-prem systems power research and engineering at Intel, Microsoft, Kaiser Permanente, major universities, and the Department of Defense.

If you'd like to build the world's best deep learning cloud, join us.

What You’ll Do

  • Advocate for Lambda’s Products.
    • Develop and maintain expertise in Lambda’s cloud products and services
    • Demonstrate Lambda’s software and solutions to customers, partners and staff
    • Create field enablement materials for technical audiences, lead workshops, and support product advocacy efforts
    • Provide technical feedback from customers to Lambda’s product and marketing teams
  • Own the technical side of Lambda’s sales process.
    • Partner with Lambda account executives to drive customer adoption and ensure successful delivery
    • Evaluate and assess customers' needs to deeply understand pain points, bottlenecks, and expected outcomes
    • Recommend appropriate cloud services and configurations to design a cohesive solution that supports the customer's applications and workflow
    • Document proposals and designs in formats including but not limited to presentations, white papers, diagrams, Bills of Materials, and rack elevations
  • Demonstrate expertise on Lambda’s cloud infrastructure 
    • Build structured and purposeful learning into your work routine
    • Develop and support internal Lambda community as a subject matter expert
    • Be an expert at deploying AI/ML workloads on Lambda cloud
    • Stay up to date on the latest deep learning trends, best practices and experiment with them using internal tools and resources
    • Develop high quality processes and documentation
  • Reinforce Lambda’s culture
    • Contribute positively throughout the organization
    • Maintain a high level of agility and responsiveness 
    • Stay hyper-focused on customer satisfaction

You 

  • Love learning both broadly and deeply
  • Are a skilled communicator who can translate technical concepts into plain English and turn vague customer needs into technical requirements on the fly
  • Are comfortable communicating with and crafting presentation collateral for customers, both internal and external, up to and including partnering with sales on customer interactions
  • Identify as a subject matter expert in the cloud & hyperscale industry  
  • Have 4+ years of experience as a product lead or in a solutions architect role supporting cloud infrastructure and services
  • Have 3+ years of experience designing, deploying and scaling cloud infrastructure
  • Have 1+ year of experience working with cloud-based AI/ML services 
  • Have familiarity with container orchestration platforms like Kubernetes
  • Have experience working with NVIDIA GPUs
  • Are a self-starter, curious, and not afraid to ask when in doubt
  • Measure yourself on results, not effort, and constantly seek to accomplish more by becoming more efficient 
  • Are able to build strong relationships across your entire organization

Nice to Have

  • Experience using deep learning frameworks such as TensorFlow or PyTorch
  • Experience deploying generative AI/ML applications 
  • Experience working with LLM architectures
  • Experience designing, implementing and maintaining large-scale HPC infrastructure in cloud and hybrid environments
  • Experience working with RESTful API and general service-oriented architecture

Salary Range Information 

Based on market data and other factors, the salary range for this position is $194,000-$278,000 OTE. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description. 

About Lambda

  • We offer generous cash & equity compensation
  • Investors include Gradient Ventures, Google’s AI-focused venture fund
  • We are experiencing extremely high demand for our systems, with quarter over quarter, year over year profitability
  • Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
  • We have a wildly talented team of 300, and growing fast
  • Health, dental, and vision coverage for you and your dependents
  • Commuter/Work from home stipends for select roles
  • 401k Plan with 2% company match
  • Flexible Paid Time Off Plan that we all actually use

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.

HPC Support Engineer - Enterprise

Lambda
Remote (United States)
Tags: ML, Lambda, c++, kubernetes, linux, python

Lambda is hiring a Remote HPC Support Engineer - Enterprise

Lambda's GPU cloud is used by deep learning engineers at Stanford, Berkeley, and Carnegie Mellon. Lambda's on-prem systems power research and engineering at Intel, Microsoft, Kaiser Permanente, major universities, and the Department of Defense.

If you'd like to build the world's best deep learning cloud, join us.

What You’ll Do

  • Be a first point of contact for all incoming technical support questions 
  • Troubleshoot software and hardware issues for customers and provide solutions
  • Document solutions to common problems
  • Collaborate in the development of new products
  • Provide feedback to product engineering teams

You

  • Linux administration experience in clustered or HPC environments
  • Experience with high-throughput networking technologies such as InfiniBand, RoCE, iWARP, etc.
  • Shell or Python scripting proficiency
  • Excellent written and oral communication skills
  • Knowledge of data center hardware and out of band management tools
  • Experience working in private or hybrid cloud environments
  • Experience with HPC/AI technologies such as: 
    • SFP+ fiber, InfiniBand (IB) / 100 GbE network fabrics
    • Ethernet, switching, power infrastructure, GPUDirect, RDMA, NCCL, and Horovod environments
    • Slurm, Kubernetes, or other job scheduling systems
    • Distributed GPU training systems and third-party ML Ops platforms

Nice to Have

  • Experience with ML / AI / Deep Learning
  • Experience with NVIDIA data center GPUs
  • Experience with parallel file systems
  • Experience designing HPC clusters
  • Direct experience with HPC clusters for internal or external customers 
  • Understanding of the hardware, software, and tools used for deep learning
  • Experience working with the technologies that underpin our business such as: Deep Learning frameworks, GPU acceleration, virtualization, and cloud computing

Salary Range Information 

Based on market data and other factors, the salary range for this position is $100,000-$155,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description. 

About Lambda

  • We offer generous cash & equity compensation
  • Investors include Gradient Ventures, Google’s AI-focused venture fund
  • We are experiencing extremely high demand for our systems, with quarter over quarter, year over year profitability
  • Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
  • We have a wildly talented team of 250, and growing fast
  • Health, dental, and vision coverage for you and your dependents
  • Commuter/Work from home stipends for select roles
  • 401k Plan with 2% company match
  • Flexible Paid Time Off Plan that we all actually use

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
