Job Location: All cities, OH, USA
Must be local to Columbus, OH or willing to relocate.
We are seeking a highly skilled and motivated AWS Cloud DevOps / Site Reliability Engineer (SRE) to join our team. This role focuses on building, automating, and maintaining reliable, secure, and scalable AWS infrastructure while supporting CI/CD pipelines and improving system observability. The ideal candidate is passionate about automation, resilient cloud systems, and continuous improvement in software delivery and operations.
Key Responsibilities
Design, build, and maintain scalable and secure AWS cloud infrastructure using services such as Lambda, EC2, S3, RDS, API Gateway, VPC, and IAM.
Develop and manage Infrastructure as Code (IaC) using Terraform and Terragrunt.
Implement and optimize CI/CD pipelines using Azure DevOps or similar tools.
Automate deployments and integrate IaC into delivery workflows.
Maintain monitoring, alerting, and observability systems using CloudWatch, Dynatrace, and Splunk.
Troubleshoot infrastructure and application issues, conduct root cause analysis, and support incident response.
Apply security best practices, manage IAM roles and policies, and perform vulnerability assessments.
Optimize system performance and cost efficiency through automation and tuning.
Collaborate with development teams to support application reliability and deployments.
Create and maintain comprehensive runbooks and documentation for operational procedures.
Participate in on-call rotation and continuously improve system resilience and recovery processes.
Qualifications
Bachelor's degree in Computer Science, Engineering, or a related field.
3+ years of experience in a DevOps or SRE role.
Hands-on experience with AWS cloud services and Infrastructure as Code tools like Terraform.
Proficiency with scripting languages such as Python or TypeScript, and with the AWS SDK for Python (Boto3).
Strong understanding of CI/CD concepts and experience with tools such as Azure DevOps.
Familiarity with monitoring/observability platforms: CloudWatch, Splunk, Dynatrace.
Solid grasp of cloud security, networking fundamentals, and cost management.
Experience optimizing infrastructure for cost and performance.
Knowledge of serverless architectures and event-driven systems.
Passion for automation, system reliability, and continuous improvement.
Excellent communication, team collaboration, and problem-solving skills.