Staff Software Engineer, Observability - ZipRecruiter : Job Details

Staff Software Engineer, Observability

ZipRecruiter

Job Location : New York, NY, USA

Posted on : 2025-08-06T01:14:46Z

Job Description :

CoreWeave is the AI Hyperscaler, delivering a cloud platform of cutting-edge services powering the next wave of AI. Our technology provides enterprises and leading AI labs with the most performant, efficient, and resilient solutions for accelerated computing. Since 2017, CoreWeave has operated a growing footprint of data centers across the US and Europe. CoreWeave was ranked as one of the TIME100 most influential companies of 2024.

As an industry leader, we thrive in an environment where adaptability and resilience are key. Our culture offers career-defining opportunities for those who excel amid change and challenge. If you thrive in a dynamic environment, enjoy solving complex problems, and are eager to make a significant impact, CoreWeave is the place for you. Join us and be part of a team tackling some of the industry's most exciting challenges.

CoreWeave powers the creation and delivery of the intelligence that drives innovation.

About the role:

We are seeking a highly experienced Staff Software Engineer to lead efforts in building, maintaining, and optimizing scalable, reliable, and secure systems.

The Observability team manages critical infrastructure, including logging, tracing, metrics platforms, and related pipelines.

Key Responsibilities:

  • Lead and mentor engineers, fostering collaboration and continuous improvement.
  • Scale logging, tracing, and metrics platforms for a global datacenter footprint.
  • Develop and refine monitoring and alerting systems to enhance reliability.
  • Advise engineers on optimal use of Observability systems.
  • Automate interactions with CoreWeave's Compute Infrastructure.
  • Manage production clusters and ensure best practices in deployments.

Required Qualifications:

  • 7+ years in Software Engineering, SRE, DevOps, or related fields.
  • Deep expertise with observability tools such as ClickHouse, Elastic, Loki, VictoriaMetrics, Prometheus, Thanos, and Grafana.
  • Proficiency in Kubernetes, containerization, and microservices architecture.
  • Experience leading incident management and post-mortem analysis.
  • Strong problem-solving, analytical, and communication skills.

Preferred Qualifications:

  • Experience scaling observability tools as a cloud provider.
  • Managing large-scale Kubernetes clusters.
  • Deep understanding of data-streaming systems.

The base salary ranges from $188,000 to $250,000, with a target total cash of $226,000 to $300,000, including bonuses, equity, and benefits. Compensation depends on qualifications, experience, location, and interview performance.

What We Offer:

Competitive salary plus benefits, including health insurance, life insurance, FSA/HSA, tuition reimbursement, stock purchase programs, wellness benefits, parental leave, childcare support, 401(k), flexible PTO, daily catered meals, a casual work environment, and a culture of innovation.

Workplace Environment:

Primarily hybrid, with remote options for certain locations. New hires attend onboarding at nearby hubs. Teams gather quarterly for collaboration.

California applicants only: We are committed to equal opportunity employment and providing accommodations under the ADA. Contact: [email protected].

Export Control:

This role involves access to export-controlled information. Applicants must meet U.S. government criteria or obtain necessary export authorizations. CoreWeave may decline to pursue export licensing processes.
