
August 31, 2021

Sr. SRE / DevOps Engineer // Remote

Hello All,

Please find the requirement details below.

 

Position: Sr. SRE / DevOps Engineer

Location: 100% Remote

Duration: 12 Months

Rate: $75/Hr. on C2C

 

 

Skillset:


Any Configuration Tool

Automation Experience

Experience with Programming Language

Any Container Tool

 

 

Job Description:

 

Sr. Site Reliability Engineer

Job Summary:

As a member of the SRE team, you will work with other DevOps practitioners to produce mission-critical infrastructure, tools, and processes that ensure the highest levels of availability and reliability for all our websites, systems, and services. As a senior member of the team, you will be expected to work with management, peers, and customers to define and implement the technical vision of the team.

 

You are right for the job if you are comfortable with deep technical Linux and networking topics and with distributed architectures. You will work cross-functionally with a variety of teams and be a core contributor to every significant engineering service or solution that we deliver to our stakeholders. You will excel if you have enthusiasm for digging deep and a flair for sharp technical communication, prioritization, and organization. You will work directly with our Software Engineering teams to build our next-generation "always up" and "highly available" cloud-based e-commerce/Retail and Enterprise platform.

 

Site Reliability Engineers are hybrid systems and software engineers who take ownership of reliability, scalability, automation, and other issues related to the uptime and availability of Walmart's e-commerce/Retail and Enterprise platform. Our goal is to build, scale, and guard the systems that delight our customers. To do so, you will need strong skills in the following areas:

Design, write and build tools to improve the reliability, latency, availability, and scalability of Walmart e-commerce/Retail and Enterprise products.

Engender reliability and availability starting with metrics and measurements.

Enable scaling by providing tools, developing training and/or augmenting processes.

Build tools and automation to prevent recurrence of problems in mission-critical products/services.

Augment existing instrumentation to build a cohesive picture of the characteristics of our systems with special attention to points of failure.

Participate in capacity planning, demand forecasting, software performance analysis and system tuning.

Develop a deep understanding of the numerous services and applications that come together to deliver Walmart e-commerce/Retail and Enterprise products.

Design new monitoring tools and smart alerts that help discover failures/issues in a timely fashion, and work with engineers to identify root causes and fix issues.

Influence, design and create new architectures, standards, and methods for large-scale enterprise systems.

Perform root-cause analysis of complex problems involving multiple parties, networks, hardware, and software that relate to scaling and performance.

Participate in on-call rotation.

Secure the system from issues, be they real, perceived, or notional.

Maintain a strong focus on collecting metrics and producing documentation that others can use to build and maintain systems.

Scripting and Development responsibilities

Experience with configuration management tools such as Ansible, SaltStack, Chef, and Puppet.

Build and drive the automation systems that maintain system health

Eliminate single points of failure and test disaster recovery and HA regularly.

 

Additional responsibilities may include:

Drives standardization and service-focused instrumentation. Provides subject matter expertise. Resolves break/fix scenarios, engaging broader teams as necessary, and partners with or leads others to achieve continuous improvement. Contributes to command-and-control activities focused on the rapid restoration of complex outages. Participates in a 24/7 on-call rotation. May work independently or as part of a team on more complex projects. Provides mentoring and guidance to more junior team members.

Creates systems engineering and architectural designs, and develops software in several modern languages. Develops large/complex database-backed systems and understands DB schema and query performance. Applies professional best practices in day-to-day work, such as revision control and unit testing. Applies statistical data analysis techniques.

Networking responsibilities: Understands and performs packet captures using tcpdump, snoop, and other network sniffers. Understands and applies knowledge of common protocols (TCP/IP, HTTP, UDP, etc.).

Application Technologies: Provides recommendations and advice to the team and/or department in the areas of web services, OS, and storage, including being an active liaison to Development, QA, and the Business.

Analyzes systems and makes recommendations to prevent potential problems. Takes lead on issue resolution activities using knowledge of complex and company-wide systems.

Lead end-to-end audit of monitors and alarms based on subsystem knowledge.

Utilizes time management and project management skills to lead the resolution of issues in a timely and organized manner, effectively communicating necessary information. May consult directly with developers or third-party vendors; provides subject matter expertise.

Consistent exercise of independent judgment and discretion in matters of significance.

Other duties and responsibilities as assigned.

 


--

Thanks & Regards,

Ranjith Dandabathini 

Account Manager - Apex Account


 

 

Phone: (209) 392-5335

Email: Ranjith@imcsgroup.net

9901 East Valley Ranch Parkway

Suite 3020 Irving, Texas – 75063



Disclaimer
This electronic mail (including any attachments) may contain information that is privileged, confidential, and/or otherwise protected from disclosure to anyone other than its intended recipient(s). Any dissemination or use of this electronic mail or its contents (including any attachments) by persons other than the intended recipient(s) is strictly prohibited. If you have received this message in error, please notify us immediately by reply e-mail or e-mail unsubscribe@imcsgroup.net so that we may correct our internal records. Please then delete the original message (including any attachments) in its entirety. Thank you