Job Description
Company: Infosys Limited
Location: Bangalore, Karnataka, India
Job Type: Full-Time
Experience: 9-13 Years
Education: Master of Engineering / Master of Technology / Bachelor of Science /
Bachelor of Engineering / BTech (or equivalent)
Expected Salary: ₹22 – ₹30 Lakhs per year
Job Category: Information Technology, Engineering
About the Role:
As a Senior Site Reliability Engineer (Application SRE) at Infosys, you will play a critical
role in supporting our application developers by providing expert guidance on reliability
best practices for both applications and infrastructure.
Your responsibilities include improving the reliability, quality, and time-to-market of our
suite of products and applications through strategic engineering, continuous monitoring,
and proactive issue resolution.
You will define system metrics (SLO/SLI), establish observability mechanisms, and
develop error budgets. In addition, you will design high-availability architectures, drive
a metrics-driven culture, and work closely with solution architects and development
teams to ensure our systems are robust, secure, and highly efficient.
Key Responsibilities:
- Reliability & Observability:
- Define suitable metrics (SLO/SLI) and set up observability mechanisms to track system
performance.
- Establish error budgets as per defined SLOs and balance feature development speed
with system reliability.
- System Architecture & Automation:
- Design strategies and implement high availability and load balancer-based
architectures.
- Optimize automation and develop self-healing capabilities for systems.
- Operational Support:
- Provide primary operational support and engineering for products and applications.
- Manage and participate in on-call incidents (Priority Incidents) and lead root cause
analysis for issues.
- Collaboration & Integration:
- Partner with solution architects and development teams to enhance service reliability
and performance.
- Participate in system design, optimize code, and automate operational tasks to reduce
toil.
- Performance & Security:
- Provide solutions for performance management, monitoring, and observability.
- Improve application security and performance through proactive measures.
- Process & Best Practices:
- Define, evangelize, and maintain SRE best practices along with DevSecOps standards.
- Work on distributed tracing to visualize workflows and analyse issues/incidents.
- Technical Excellence:
- Use scripting and programming languages (Python, Ruby, Java, Node.js, etc.) and data
formats such as JSON to develop and maintain tools.
- Leverage experience with observability tools (e.g., New Relic, Prometheus, DataDog,
Splunk) and event correlation tools like BigPanda.
- Cloud & Infrastructure:
- Work with cloud platforms (AWS, Azure, Google Cloud) and container orchestration
tools (Kubernetes, Docker Swarm) to support scalable systems.
Technical and Professional Requirements:
- Experience:
- At least 5 years of SRE experience in large-scale programs with a focus on release
engineering, observability, and reliability.
- Hands-on project experience in SAP S/4HANA public cloud is not required for this role,
but relevant SRE project experience is essential.
- Skills:
- Proficiency in one or more observability tools (e.g., New Relic, AppDynamics,
Prometheus, Dynatrace, DataDog, Splunk).
- Strong experience in scripting or development languages, such as Python, Ruby, Java,
or Node.js.
- Experience with CI/CD tooling, Agile methodologies, and configuration management
tools.
- Strong knowledge of microservices architecture, cloud cost optimization, and FinOps
is a plus.
- Additional:
- Experience with container orchestration (e.g., Kubernetes, Docker Swarm) and
infrastructure automation tools (e.g., Terraform, CloudFormation, Ansible, Puppet).
- Familiarity with ITSM tools such as ServiceNow and knowledge of SQL/NoSQL
databases.