Role: AWS Site Reliability Engineer
Location: Atlanta, GA (Hybrid)
Job Description:
- The team's primary responsibility will be to enhance observability coverage across mission-critical Delta applications. Success will ultimately be measured by reduced MTTI (mean time to identify, i.e., the time it takes to identify issues in a failing system).
- It will be very difficult to find a single profile that meets every requirement, so we can evaluate candidates and mix and match skills. I will be available to screen the profiles and then lead the L-1 discussion.
- Primary:
- Dynatrace – On-Prem and SaaS | Hands-on experience setting up and designing dashboards
- Observability – Must have a complete understanding of SLIs/SLOs/SLAs: how to set, measure, track, and communicate them
- Open Source Observability Stack – Good understanding of OpenTelemetry and how to instrument applications to emit the desired metrics, traces, logs, etc.
- AWS Services – CloudWatch, X-Ray, Lambda, and the overall data flow
- OpenShift ROSA – Red Hat OpenShift Service on AWS
- Development Experience – Any language; should be able to read code and develop utilities as required
- Good to have:
- Extensive experience in SRE organization setup, stakeholder management, and assessment
- DevOps pipeline experience
- Experience implementing quality gates to enhance the reliability of applications
- Extensive Python development experience managing time-series data
- Chaos Engineering – Gremlin, Chaos Monkey
Thanks,
Ankit Kumar Mishra
Direct: 732-832-3488 Ext. 239
MSR Technology Group LLC
An MSRcosmos Group Company
Thanks
Gigagiglet
gigagiglet.blogspot.com