Job Location: San Jose, CA, USA
Monitoring & Central Dashboard
- Lead the design and implementation of enterprise-wide monitoring solutions for infrastructure and applications.
- Architect centralized dashboards for real-time visibility into system health, performance, and alerts using AIOps platforms.
- Ensure proactive incident detection and resolution through event monitoring systems and timely intervention by technical staff.
Disaster Recovery
- Own the disaster recovery strategy and execution across business-critical systems.
- Conduct risk assessments and DR drills, and ensure alignment with business continuity objectives and compliance standards.
- Collaborate with IT and business units to ensure recovery plans are aligned and tested regularly.
ServiceNow & ITSM
- Oversee ServiceNow modules including Incident, Problem, Change, and Request Management.
- Ensure all incidents are logged, categorized, and prioritized in ServiceNow with complete lifecycle documentation.
- Integrate email-based incident creation and automate workflows for faster resolution.
KPI/SLA Governance
- Define and track KPIs and SLAs across IT operations and service areas.
- Generate regular reports for stakeholders and ensure SLA adherence, especially for P1/P2 incidents.
- Lead governance meetings (CCB-CMT, DSR) and bridge calls for critical incidents.
AIOps & Automation
- Drive AIOps initiatives to automate root cause analysis, anomaly detection, and predictive maintenance.
- Collaborate with cross-functional teams to implement AI-driven insights into operational workflows.
- Support continuous improvement through digitization and transformation programs.
Process Improvement
- Optimize ITIL-aligned processes for Incident, Problem, and Change Management.
- Maintain comprehensive documentation for IT processes and workflows.
- Implement quality control measures to reduce error rates and backlog tickets.
Required Skills
- 10+ years of experience in IT operations and infrastructure support.
- Hands-on experience with ServiceNow, monitoring tools (e.g., Moogsoft, SolarWinds, Dynatrace), and DR technologies.
- Strong understanding of ITIL processes, SLA governance, and AIOps platforms.
- Proficiency in automation scripting (PowerShell, Python) and dashboarding tools.
- Excellent communication and stakeholder management skills.