Position: Site Reliability Engineer
Location: TX
Duration: 12 Months

Primary Responsibilities:
- Build instrumentation into the development process so that applications can be monitored
- Establish measurements that detect internal problems before they result in user-visible outages
- Build processes and diagnostic tools to troubleshoot, maintain and optimize solutions, and respond to customer and production issues
- Embrace continuous learning of engineering practices to drive adoption of industry best practices and technology, including DevOps, Cloud and Agile thinking
- Tech debt reduction and tech transformation, including open source adoption, cloud adoption, and HCP assessment and adoption
- Contribute to Optum Inner Source and the wider industry community

Qualifications:
- 5+ years of experience as a Site Reliability Engineer
- 5+ years of experience creating runbooks, processes, and test plans covering the reliability, performance, etc. of infrastructure and applications
- 5+ years of experience in integrating monitoring and alerting into cloud software solutions
- 5+ years of experience defining, identifying and measuring SLOs, SLAs and SLIs
- 5+ years of experience performing root cause analysis and postmortems after each incident, and delivering resolutions for tool and automation failures
- 3+ years of experience implementing dashboards that help teams visualize logs, instrumentation and other data to ensure optimal application performance
Thanks