Head of IT Operations & Service Excellence - BJ's Wholesale Club : Job Details

Head of IT Operations & Service Excellence

BJ's Wholesale Club

Job Location : Marlborough, MA, USA

Posted on : 2025-09-05T20:11:52Z

Job Description :

Join our team of more than 34,000 team members, supporting our members and communities in our Club Support Center, 235+ clubs and eight distribution centers. BJ's Wholesale Club offers a collaborative and inclusive environment where all team members can learn, grow and be their authentic selves. Together, we're committed to providing outstanding service and convenience to our members, helping them save on the products and services they need for their families and homes.

The Benefits of working at BJ's

• BJ's pays weekly

• Eligible for free BJ's Inner Circle and Supplemental membership(s)*

• Generous time off programs to support busy lifestyles*

o Vacation, Personal, Holiday, Sick, Bereavement Leave, Jury Duty

• Benefit plans for your changing needs*

o Three medical plans**, Health Savings Account (HSA), two dental plans, vision plan, flexible spending

• 401(k) plan with company match (must be at least 18 years old)

*eligibility requirements vary by position

**medical plans vary by location

The Head of IT Operations & Service Excellence is the strategic and operational leader responsible for uptime and resiliency of systems across BJ's digital and enterprise technology landscape (across applications, infrastructure and security) to provide world‑class experiences to our members and team members. The role sets the north‑star for what “good” looks like — defining and publishing service‑level objectives (SLOs/SLIs) and operational key results — while building the organizational muscle to deliver them consistently. Reporting to the VP of Infrastructure & Operations, this leader balances real‑time incident response with multi‑year service‑reliability vision, enabling teams to see the forest through the trees and make data‑driven trade‑offs.

Key Responsibilities

Strategic Leadership

Define and execute the multi‑year IT Service Excellence maturity roadmap aligned to business objectives, cloud migration plans, and uptime and resiliency requirements.
Craft a multi‑year resiliency and cost‑optimization roadmap aligned to company growth goals.
Implement IT operations best practices.
Collaborate with product development teams and influence them to ensure reliability and scalability are considered at the design phase.
Partner with Enterprise Architecture to define standards for building reliable applications that are highly available and resilient.
Define Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for all critical services.
Foster a high‑trust, blameless culture that rewards learning, experimentation, and excellence.
Own the IT Operations & Service Excellence budget; optimize OpEx through automation, self‑service, and vendor management.

IT Operations & Incident Management (24×7 Command Center, NOC & Service Desk)

Oversee real‑time monitoring, incident triage, and major‑incident management, ensuring MTTR and communications SLAs are met.
Maintain a high‑performing L1 Service Desk; drive call deflection via knowledge, AI chatbots, and self‑service password reset.
Publish operational metrics (MTTA, MTTR, FCR, abandon rate) with actionable insights.
Lead the major incident management function, including defining escalation paths, coordinating cross-functional teams, and ensuring timely communication to stakeholders.
Oversee the entire incident lifecycle, from identification and triage to resolution and post-incident analysis, ensuring efficient and effective processes are in place.
Manage on-call rotations and ensure 24×7 coverage with major incident managers.
Ensure a robust playbook is developed and followed during the major incident management (MIM) process, with clearly assigned roles, communication protocols, and a well-defined triage process.
Matrix-manage people, processes, and resources, including third parties, and resolve conflicts to move incidents forward to resolution.

Change & Release Governance

Chair the Change Advisory Board (CAB); uphold 99%+ change success while accelerating deployment velocity.
Implement risk‑based change classification; ensure thorough end-to-end testing, automated pre‑deployment checks, rollback processes, and post‑implementation reviews are in place.

Site Reliability Engineering (SRE) & Observability

Develop and implement SRE policies, standards, and best practices for enterprise-wide systems.
Lead SRE squads covering AWS, colocation data centers, network/edge, and SaaS platforms.
Set error budgets, reliability targets, and chaos‑engineering practices; ensure recovery time and point objectives (RTO/RPO) meet or exceed DR objectives and business expectations.
Work with the service managers overseeing SRE functions for Digital, Membership, Enterprise, and Club & Fuel systems to deliver integrated SRE.
Drive end‑to‑end service design — service maps, dependency graphs, support models — to complement observability tooling.
Lead the roadmap for logging, metrics, tracing, and AIOps platforms, delivering actionable insights and predictive alerting.

Engineering Excellence and Practices

Understand the potential impact of system requirements and design choices across multiple cloud and on-premises technologies.
Continuously enhance the reliability, stability, and performance of key platforms by promoting engineering excellence, implementing best practices, and overseeing the integration of fully automated telemetry within modern DevOps frameworks.
Advance problem detection and ensure service restoration processes are well defined.
Apply Site Reliability Engineering methods, automated alerting, and self-healing mechanisms to improve the robustness and efficiency of both cloud-based and on-premises systems.

Process Ownership & Continuous Improvement

Codify SOPs and RACI matrices across Ops, SRE, Service Desk, and engineering partners to drive clarity of ownership.
Lead Lean/Kaizen initiatives that reduce toil and amplify engineering productivity.
Track and report OKRs; course‑correct based on data.
Drive root‑cause analysis (RCA) and problem management; close systemic gaps and prevent recurrence of major incidents.

Compliance, Security & Risk

Partner with Cybersecurity and Compliance teams to meet PCI‑DSS, SOX, and data‑privacy obligations.
Ensure operational controls withstand internal and external audits.

People Development

Possess robust technical expertise and leadership qualities, with a proven track record in Site Reliability Engineering, to lead by example.
Foster a culture of psychological safety, empowerment, and continuous learning.
Coach and develop managers; build, mentor, and retain an organization spanning Service Desk, Command Center, SRE, Change Governance, Problem Management, and Analytics.

Required Qualifications

Bachelor's degree in Computer Science, Engineering, or related discipline (Master's preferred).
15+ years of progressive IT Operations leadership, including 5+ years at a Director/Head level supporting large‑scale retail and distributed environments.
Proven track record of leading teams through complex system outages and scalability challenges.
5+ years of proven oversight of 24×7 operations (NOC, Service Desk) and SRE or DevOps functions.
Proficiency in system design and architecture, particularly in a cloud environment.
Demonstrated success operating hybrid cloud (AWS) and on‑prem data‑center environments.
Expertise with ITIL v4/Service Management frameworks; ITIL certification strongly desired.
Experience implementing observability, AIOps, and automation platforms (e.g., ServiceNow, OpsRamp, SolarWinds, New Relic, PagerDuty).
Outstanding communication skills and executive presence; able to brief C‑suite on risk and performance.

Preferred Qualifications

Retail industry experience managing store, fuel, and distribution center technologies.
Certifications in ServiceNow.
Lean Six Sigma or Continuous Improvement accreditation.

Leadership Competencies

Strategic Thinking / “Forest‑Through‑the‑Trees”: Articulates long‑term vision while executing tactically under pressure.
Influence & Communication: Excellent verbal and written communication skills, with experience presenting to C-level executives and stakeholders. Translates technical concepts into business outcomes for executives and frontline associates.
Servant Leadership: Builds inclusive teams and empowers others to experiment and learn.
Accountability: Holds self and teams to high standards; measures what matters.
Change Catalyst: Leads through ambiguity, driving adoption of new ways of working.

Work Environment & Travel

Hybrid work model (Westborough, MA HQ) with periodic visits to colocation data centers, distribution centers, and club locations. After‑hours or weekend availability required for major incidents or change windows. Occasional travel (
