The Site Reliability Engineer (SRE) – SQL is a critical technical leader responsible for ensuring the availability, performance, and reliability of Tyler Technologies' SQL Server infrastructure. This role combines deep database expertise with operational excellence to support high-performance systems across complex client environments. The ideal candidate is confident, authoritative, and takes ownership in diagnosing and resolving issues, while influencing database design, infrastructure improvements, and long-term stability strategies.
Hybrid Work Policy: The candidate must work on-site at the Plano, TX office at least three days per week.
Responsibilities
- Serve as the lead resource for high-severity SQL Server incidents, driving triage, diagnostics, and resolution in real time.
- Own performance tuning, indexing strategies, and architecture-level improvements to optimize database systems at scale.
- Proactively monitor database performance, system health, and workload trends to identify and resolve issues before they impact customers.
- Collaborate with product and development teams to refine schema design, improve query performance, and enhance overall data architecture.
- Develop and maintain database standards, observability dashboards, and automation for patching, backups, and alerting.
- Design and execute comprehensive backup and disaster recovery strategies for critical systems.
- Contribute to continuous improvement initiatives, including cloud modernization, infrastructure as code, and capacity planning.
- Author technical documentation, including runbooks, architecture designs, internal KBs, and lessons learned.
- Provide mentorship and technical leadership to engineers across support and infrastructure teams.
- Advocate for architectural and operational improvements across teams, using data and insight to influence decisions.
Role Complexity
To be successful, this individual must:
- Demonstrate expert-level understanding of SQL Server internals, optimization techniques, and operational best practices.
- Lead high-stakes conversations and incident calls with clarity, confidence, and control.
- Understand system-level performance (I/O, memory, CPU) and how it affects SQL operations.
- Communicate technical issues and solutions to both technical and business stakeholders effectively.
- Analyze recurring incidents to identify trends and permanently resolve root causes.
- Operate autonomously with a sense of ownership and urgency.
- Balance short-term firefighting with long-term architectural planning and automation.
Qualifications
- Bachelor's degree in Computer Science, Information Systems, or a related field, or 5+ years of equivalent experience.
- Proven experience in a production-facing SQL Server environment, preferably in a SaaS or multi-tenant context.
- Experience with query tuning, indexing, and execution plan analysis.
- Experience with high availability, replication, disaster recovery, and backup strategies.
- Experience with scripting and automation using PowerShell, T-SQL, or similar.
- Experience with monitoring and observability tools for SQL Server and infrastructure health.
- Strong familiarity with virtualization, storage performance, and cloud platforms (e.g., AWS, Azure).
- Demonstrated ability to lead incident response and influence cross-functional technical decisions.
- Exceptional written and verbal communication skills, especially under pressure.
- Previous experience mentoring peers or junior engineers is a plus.