Cloud Site Reliability Engineer

Disability Solutions

Job Location : Dublin, CA, USA

Posted on : 2025-08-10T20:23:48Z

Job Description :
We are seeking a highly skilled Site Reliability Engineer with 3 years of experience to join our dynamic team. The ideal candidate will have a strong background in cloud technologies, with a focus on designing, implementing, and managing cloud-based solutions. As a Site Reliability Engineer, you will play a key role in ensuring the availability, performance, and security of our cloud infrastructure.

In this role you will:
* Lead day-to-day technical operations, providing the highest levels of availability, reliability, and scalability for our services.
* Implement best practices for cloud security, including identity and access management, encryption, and network security.
* Provide technical expertise to handle customer escalations and ensure stability in customer environments.
* Conduct performance analysis and lead monitoring initiatives across multiple hosted products and platforms.
* Maintain operational runbook procedures for all production systems and document the knowledge base.
* Administer incident management activities (detection, recording, classification, and closure) and provide timely escalations and notifications as required by procedure.
* Participate in the on-call rotation to respond to cloud-related incidents and emergencies.
* Troubleshoot and resolve complex technical issues in a timely manner.
* Monitor and optimize cloud infrastructure for performance, cost, and security.
* Collaborate with cross-functional teams to troubleshoot and resolve complex cloud-related issues.
* Mentor junior team members and provide technical guidance and support.

You've got what it takes if you have:
* U.S. citizenship (required).
* Minimum bachelor's degree in computer science, engineering, or a related field, or equivalent experience.
* 3+ years of experience in cloud operations.
* Comprehensive understanding of cloud computing principles and architectures.
* Extensive experience in Linux/Unix environments.
* Proficiency in containerization technologies like Docker and Kubernetes.
* Strong scripting skills in Python or Bash.
* Proficiency in debugging and optimizing Java-based applications.
* Hands-on experience in deploying, optimizing, and troubleshooting applications on Tomcat and JBoss application servers.
* Hands-on experience in managing and optimizing Memcached, Nginx, ActiveMQ, Elasticsearch, and Redis.
* Experience with monitoring and logging tools such as New Relic and the ELK stack.
* Sound knowledge of networking concepts, including TCP/IP, DNS, and VPN.
* Proficiency in automation and configuration management tools like Ansible, Jenkins, and Bitbucket.
* Thorough understanding of monitoring and alerting tools such as Nagios, New Relic, Grafana, and Checkmk.
* Experience with distributed storage technologies such as NFS, NetApp, and Amazon S3, as well as dynamic resource management frameworks (e.g., Kubernetes).
* Experience working in data center and AWS cloud platforms.
* Strong communication and collaboration skills.
* Excellent troubleshooting and problem-solving skills.