Summary: Responsible for designing, implementing, and maintaining enterprise monitoring and observability solutions, with a focus on LogicMonitor and related monitoring platforms. Works closely with IT teams and stakeholders to ensure full infrastructure visibility, reduce alert noise, automate monitoring workflows, and improve system performance. This role plays a key part in establishing best practices for logging, metrics, and alerting, while leveraging automation to optimize monitoring efficiency. Provides technical leadership and contributes to the long-term monitoring strategy for the organization.
Qualifications:
Education:
- Bachelor's degree in Engineering, Mathematics, Computer Science, or equivalent technical experience.
Licenses/Certification:
- CompTIA Server+ or Network+ (preferred).
- ITIL Foundation Certification (preferred).
- Relevant monitoring certifications (e.g., LogicMonitor, Splunk, Datadog, or SolarWinds) are a plus but not required.
Experience:
- 4-7 years of experience in IT monitoring, observability, or systems engineering in a mid-to-large-scale enterprise environment.
- Hands-on experience with monitoring platforms such as LogicMonitor, SolarWinds, Datadog, Splunk, Nagios, or similar tools.
- Experience configuring alert rules, event-based triggers, dynamic thresholds, anomaly detection, and dashboard visualizations.
- Strong understanding of IT infrastructure components, including servers, networks, cloud environments, and virtualization.
- Experience working with RBAC policies, IT service management tools (e.g., ServiceNow), and automation workflows.
- Scripting or automation experience (PowerShell, Python, API integrations) is a plus.
Essential Functions:
Monitoring Platform Administration & Optimization
- Manage and optimize monitoring platforms, ensuring proper infrastructure coverage and alerting configurations.
- Develop and maintain monitoring standards for IT infrastructure, including servers, applications, and network devices.
- Refine alerting processes by reducing noise, tuning thresholds, and improving actionable insights.
- Manage daily monitoring administration tasks, including adding/removing devices, troubleshooting issues, and maintaining system integrity.
Incident Management & Proactive Monitoring
- Analyze monitoring data to detect performance bottlenecks and outages.
- Implement proactive alerting strategies to identify issues before they impact hospital operations, ensuring rapid response to potential failures.
- Configure anomaly detection and predictive analytics to recognize early warning signs of system degradation.
- Develop dashboards and automated reports to provide real-time visibility into infrastructure health and application performance.
Collaboration & Process Improvement
- Work with IT teams (server, network, security, and applications) to align monitoring strategies and ensure visibility across environments.
- Implement RBAC policies to provide team-specific access and monitoring configurations.
- Document monitoring policies and best practices, ensuring knowledge-sharing across teams.
Automation & Integrations
- Create automation and integrations using scripts or APIs to enhance monitoring workflows and reporting.
- Develop event-driven automation to reduce manual intervention in monitoring-related tasks.
- Ensure monitoring platforms are seamlessly integrated with ITSM tools for efficient incident tracking and resolution.
Knowledge/Skills/Abilities:
- Strong expertise in LogicMonitor, Datadog, SolarWinds, Splunk, or similar monitoring platforms.
- Understanding of metrics, logs, and traces as part of a comprehensive observability strategy.
- Experience configuring alert rules, anomaly detection, event-based monitoring, and trend analysis.
- Familiarity with RBAC policies, ITSM tools (e.g., ServiceNow), and API integrations.
- Basic scripting and automation skills (PowerShell, Python, or other scripting languages) preferred.
- Strong analytical skills to identify trends, detect anomalies, and troubleshoot monitoring issues.
- Effective communication skills to collaborate with IT teams and business stakeholders.
- Ability to work independently and within a team, taking ownership of monitoring initiatives.
- Approximate percent of time required to travel: 10%
ACKNOWLEDGEMENT:
- This description is designed to indicate the general nature and level of work for this position. It is not intended to describe minor duties or other responsibilities that may be periodically assigned.
- You agree to conduct your job responsibilities in accordance with the standards set out in the Employee Handbook, Company's Code of Business Conduct, its policies and procedures, applicable federal and state laws, and applicable professional standards.