ACL Digital
Job Location :
Houston, TX, USA
Posted on :
2025-08-25T13:21:44Z
Job Description :
The customer is building an edge (80%) and cloud (20%) application for safe automation and optimization of Well Construction processes. Read about the product, DrillOps, on our website. There are teams located in the US, China, France, and Georgia.

SRE mission
Building the foundation for modern ops. Using the available monitoring system, the SRE will analyze the design and propose ways to improve environment monitoring, including the right and wrong things to monitor and why. The problem the SRE will need to solve for our team is making the current state of each edge deployment (system health, SLI, performance) available in the cloud. The SRE should be able to identify product issues as they arise in production/test environments and create automated (as much as possible) solutions for fixing them, to keep incident management sustainable.

Responsibilities
* Maintaining and improving the product monitoring system
* Incident response management (troubleshooting, resolution, documentation, post-mortem analysis)
* Knowledge sharing on the lessons learnt
* Being a bridge between operations and development

Key Requirement
Engineers with existing SRE experience. Most SREs have a cloud products background, and our focus is Edge.

Experience required
* Building solutions from scratch
* Writing code to automate processes (log analysis, testing production environments, alerts automation)
* Expertise in cloud providers

Tools
* Incident management/on-call: PagerDuty
* Logging: ELK/Kibana, SEQ logging
* Language: Python, C#, scripting
* Database: SQL, Mongo
* Network: Basic network knowledge (inbound/outbound and fw rules)
* Monitoring: Prometheus, Grafana
* Project management and issue tracking: AzureDevOps, Wiki
* Source code management: Git
* Infrastructure and orchestration: SaltStack, Docker, Zededa