Site Reliability Engineering (SRE) - Remote - Long term

Title: Site Reliability Engineering (SRE)

Location: Remote

Duration: Long term


“Ideally, the client is looking for someone with application support experience (monitoring and incident response), in addition to core SRE principles (establishing, supporting & monitoring service levels), as well as a DevOps background. Experience with our current tools (Opsgenie and New Relic) is becoming a must.”

Candidates should have advanced skills in Java, Go, Python, shell scripting and YAML, preferably with 3-5 years of experience in each of these or similar languages. Candidates should also have 3+ years of experience in SRE plus experience in DevOps, software engineering, or both, leveraging automation extensively to achieve key deliverables.

Primary Responsibilities:


1. Independently design, implement, productionize and maintain site reliability guidelines, processes and systems

2. Service Level Definition, Configuration and Measurement:

Define SLIs, SLOs & SLAs specific to each application or system

Configure monitoring & alerting tools suited to each product and/or platform team

Measure reliability & resilience against predefined SLIs & SLOs using monitoring/alerting tools, and drive continuous improvement through data analysis

3. Incident Management

Facilitate incident response by engaging the relevant teams and stakeholders, while providing clear communication and visibility to the organization during service interruptions

Provide root cause analysis (RCA) for failures

Experience using a modern incident management platform (Opsgenie) to drive incident response and problem resolution effectively

4. Monitoring & Alerting

Debug defects and develop dashboards using modern monitoring tools (e.g., New Relic, Splunk, AIOps) to reduce MTTD (mean time to detect) & MTTR (mean time to resolve)

Build monitors and alerts designed to manage SLAs, optimize performance, and minimize outages

Build end-to-end (E2E) customer journey dashboards and alerts for custom transactions and applications

5. Automate reliability requirements into system and application implementations and updates, including self-healing solutions (e.g., Ansible, Terraform).

6. Work with the product management team to help (1) identify reliability features & requirements and (2) estimate level of effort

Vivek Kumar Yadav | Recruitment Specialist | VBeyond Corporation 

Phone: (908) 633-4182

E-mail: VivekY@VBeyond.com | Web: www.vbeyond.com

Note: VBeyond is fully committed to Diversity and Equal Employment Opportunity.
