Overview
This position is hybrid; candidates must be able to work in Wyomissing, PA.
About the Role
We are seeking an experienced DevOps / Site Reliability Engineer (SRE) to help build and maintain the foundational platform services that support our engineering teams. This role focuses on improving developer velocity and system reliability through infrastructure automation, scalable tooling, and seamless CI/CD workflows.
You will work across cloud (AWS/EKS) and on-prem Kubernetes environments, write internal tooling and microservices in Go, and collaborate directly with developers to improve DevX and promote best practices. You will also play a key role in implementing and optimizing GitOps workflows via GitHub Actions and ArgoCD.
Responsibilities
Containerization & Tooling
- Build and optimize container images for performance, security, and reusability.
- Establish best practices around container lifecycle management, caching, and vulnerability scanning.
- Write internal tools and services in Go to support operational tasks, release workflows, and developer experience.
Kubernetes Operations
- Operate and manage both AWS EKS and on-prem Kubernetes clusters.
- Design and automate cluster bootstrapping, upgrades, and policy enforcement using Terraform.
- Implement GitOps workflows for infrastructure and application delivery via ArgoCD.
CI/CD Pipeline Engineering
- Develop and maintain GitHub Actions workflows for builds, tests, deployments, and release automation.
- Improve pipeline efficiency, observability, and failure transparency.
- Implement guardrails and quality gates to improve the safety and speed of software delivery.
Developer Experience & Reliability
- Act as a bridge between platform engineering and application teams, identifying pain points and reducing friction.
- Build self-service tools to simplify configuration, deployment, and service integration.
- Lead efforts in incident automation, observability improvements, and performance tuning.
- Advocate for infrastructure best practices and support internal adoption of platform tooling.
Requirements
- 3+ years in DevOps, SRE, or Platform Engineering roles.
- Strong experience with Go programming, especially around internal tooling or system components.
- Deep knowledge of Kubernetes administration (EKS and on-prem), Helm, and cluster operations.
- Production experience with ArgoCD, GitHub Actions, and GitOps workflows.
- Proficient with Terraform for infrastructure provisioning and configuration management.
- Strong container expertise (Docker/OCI), including optimization, layering, scanning, and secure publishing.
- Experience designing, securing, and debugging CI/CD pipelines.
- Proficient in Linux, scripting, and distributed-system debugging, including the ability to read a flame graph.
- Effective communicator who can collaborate across teams and drive adoption of infrastructure solutions.
Preferred Qualifications
- Experience with observability tooling (Prometheus, Grafana, Loki, OpenTelemetry).
- Familiarity with service meshes (Istio, Linkerd, Consul Connect) and ingress/gateway technologies.
- Contributions to, or hands-on use of, open-source cloud-native projects.
- Knowledge of policy enforcement and compliance tooling.
- Prior work in developer platform teams or internal developer tooling initiatives.