We have the following requirement with our client. Please go through the JD and let me know if you are interested.
Role: Site Reliability Engineer (SRE)
Location: Remote
Duration: Long Term
Must have 12+ years of experience.
USC, GC, and EAD candidates only.
LinkedIn profile and passport number are required for submission.
Job Description
Key Responsibilities:
• 12+ years of experience defining and implementing monitoring solutions (alerts, telemetry, and instrumentation) for on-premises and cloud platforms in large enterprises.
• The Site Reliability Engineer will play a key role in building observability and resilience capabilities on the cloud platform (Azure). Responsibilities of the SRE include:
• Build and configure the alerts, tracing, telemetry, and instrumentation required for infrastructure monitoring and application performance management.
• Implement dashboards to monitor and share observability data at various levels (engineering teams, portfolio, senior management).
• Support resilience engineering (application and infrastructure resilience) to meet availability requirements.
• Work with development engineers, cloud engineers, product teams, and support engineers to gather requirements and to implement and evolve observability and resilience solutions.
Key Skillsets:
• Good knowledge of observability and application performance monitoring best practices and of KPIs/metrics on cloud platforms
• Experience with monitoring tools such as Splunk, Dynatrace, Prometheus, CloudWatch, Azure Monitor, New Relic, and other open-source tools
• Experience building monitoring solutions for a variety of workloads such as microservices (Java / Spring Boot desirable), databases, Kafka, and Kubernetes
• Experience in resilience engineering and implementing high-availability solutions
• Experience creating monitoring dashboards using tools such as Grafana (preferred), Splunk, Kibana, and Power BI
• Ability to work in a fast-paced, agile environment
SRE Maturity Level 3 (Expectation)
· DevOps Observability
    o DORA metrics are visible
        § Deployment frequency, Mean Time To Restore (MTTR), cycle time, change failure rate
· IaC (Infrastructure as Code)
    o Platforms leverage IaC
· Test / Release automation
    o Unit tests
        § Test in a vacuum
    o Integration tests
    o Load test results validated against SLOs
    o Tests run as part of the CI/CD pipeline
    o Automated rollback
    o Business Continuity Plan for recovering service(s)
· Capacity planning review
    o Show saturation of the service as compared to load test and production peak load
· Product Management (Security)
    o Security scanning
    o Documented procedures for vulnerability management
    o Integrated into the CI/CD pipeline (partner with security)
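For candidates who want a concrete picture of the DORA metrics named under DevOps Observability, here is a minimal Python sketch of how they might be derived from deployment records. Everything in it is an assumption for illustration, not the client's tooling: the Deployment record, its field names, and the dora_metrics helper are hypothetical. Cycle time is taken as commit-to-deploy time, and time to restore is measured from the failed deployment to recovery.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                 # first commit of the change
    deployed_at: datetime                  # change reached production
    failed: bool = False                   # deployment caused an incident
    restored_at: Optional[datetime] = None # service restored (failed deployments only)


def dora_metrics(deployments, window_days):
    """Compute the four metrics over a reporting window of window_days days."""
    total = len(deployments)
    if total == 0 or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")

    failures = [d for d in deployments if d.failed]
    restored = [d for d in failures if d.restored_at is not None]

    # Cycle time (assumption): commit-to-deploy duration, averaged.
    cycle_seconds = sum(
        (d.deployed_at - d.committed_at).total_seconds() for d in deployments
    )
    # Time to restore (assumption): failed deploy to recovery, averaged.
    restore_seconds = sum(
        (d.restored_at - d.deployed_at).total_seconds() for d in restored
    )

    return {
        "deployment_frequency_per_day": total / window_days,
        "mean_cycle_time_hours": cycle_seconds / total / 3600,
        "change_failure_rate": len(failures) / total,
        "mttr_hours": restore_seconds / len(restored) / 3600 if restored else None,
    }


# Made-up sample data: four deployments over four days, two of them failed.
t0 = datetime(2024, 1, 1, 9, 0)
day, hour = timedelta(days=1), timedelta(hours=1)
sample = [
    Deployment(t0, t0 + 4 * hour),
    Deployment(t0 + day, t0 + day + 8 * hour, failed=True,
               restored_at=t0 + day + 10 * hour),
    Deployment(t0 + 2 * day, t0 + 2 * day + 6 * hour),
    Deployment(t0 + 3 * day, t0 + 3 * day + 6 * hour, failed=True,
               restored_at=t0 + 3 * day + 7 * hour),
]
metrics = dora_metrics(sample, window_days=4)
print(metrics)
```

In practice these figures would more likely come from the CI/CD and incident tooling and be charted on a dashboard; the sketch only shows the arithmetic behind each metric.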
SRE Maturity Level 4 (Advanced)
· Modernized application
    o Deployment to Kubernetes, Azure, or SaaS via CI/CD pipeline
· Synthetic monitoring
· Canary / Blue-Green deployment
· Self-healing
· Autoscaling
· Identify KPIs for business performance
· Chaos engineering
Enterprise Process Tie-Ins
· Problem management, as part of RCA, will review the maturity level of the incident owner
Thanks & Regards
Mohd Taher
Unicom Technologies Inc., A Certified MBE
896 S Frontenac st, Ste#112, Aurora, IL 60504
Mail: Taher@unicomtec.com
Web: www.unicomtec.com