Full Stack Software/DevOps Engineer at JPMorgan Chase Bank, N.A. (London, UK)
Location: London, UK
Type: Full Time
Created: 2021-12-23 05:01:04
We are looking for a Site Reliability Engineer (SRE) who runs, maintains and improves services and products against established Service Level Objectives by applying software engineering practices. As a member of the Site Reliability Engineering (SRE) team, you will combine software and systems expertise to help us build a world-class SRE function. SRE is responsible for the availability, performance, change management, monitoring, and capacity management of the services and products you support. Working with your team, you'll focus on creative problem-solving, optimizing existing systems, configuring and managing software, and optimizing processes through automation.
All SREs must evidence the following:
* Expert in at least one technology stack, including designing, coding, testing and delivering software
* Expert practitioner in one or more technology domains; may be a cross-domain expert able to solve complex and mission-critical problems within a business or across the firm
* Working knowledge of infrastructure components (e.g. routers, load balancers, cloud products, container systems, compute, storage and networks)
* Excellent debugging and troubleshooting skills
* Expert in performance monitoring and capacity management of large systems using various tools
* Experience with the Agile delivery methodology
* Proficiency in Continuous Integration and Continuous Delivery
* Operational knowledge of a modern DevOps stack (e.g. Git, Jules, Jenkins, ServiceNow ICA, AIM) and ability to troubleshoot its processes
* Basic knowledge of operating system administration and access control: MS Windows, Linux / UNIX
* Good knowledge of scripting languages (e.g. Go / Python / Perl / UNIX shell / PowerShell / AutoSys / batch)
* Ability to follow and enhance code in Java and / or .NET
* Experience with system integration, app troubleshooting, app optimization, server deployments, application startups, connectivity, load balancing, firewalls and app performance
* Working knowledge of messaging technologies: JMS, MQ Series, Kafka
* Advanced user or DBA experience with relational databases: MS SQL, DB2 or Oracle
* Experience with app containers: WebSphere AS clusters; IIS/Microsoft Server; Tomcat; Spring Boot; Docker
* Working knowledge of monitoring tools (e.g. AppD, Splunk, Dynatrace, Netcool, Tivoli)
* Working knowledge of the Windows stack and support for SQL Server
* Good understanding of SRE concepts and a drive for production automation
Responsibilities:
* Designs, develops, tests and delivers software to automate manual operational work
* Troubleshoots priority incidents, conducts blameless post-mortems and ensures permanent closure of incidents
* Develops solutions to business problems and new features using either the .NET or Java tech stack
* Engages with the development team throughout the life cycle to help develop software for reliability
* Applies analytics to past data, such as incidents and usage patterns, to predict issues and take proactive action
* Drives adoption of self-healing and resiliency patterns such as circuit breaker and bulkhead
* Designs and conducts performance tests; identifies bottlenecks, opportunities for optimization and capacity demand
* Defines and drives adoption of best-in-class monitoring frameworks to accomplish end-to-end flow monitoring and noiseless alerting
* Deploys software and product upgrades
* Manages the effort split between manual operational work and engineering work
* Coaches other team members and manages teams as needed
* Design and implement automations for routine processes
* Support day-to-day maintenance with a focus on reducing toil and technical debt in the applications while maintaining a high degree of system availability
* Maintain and create all application and instance alerts as they relate to site reliability
* Evaluate and modify infrastructure autoscaling as necessary for reliable operation
* Maintain (evaluate and upgrade) all applications and libraries required by the platform
* Control application log collection and analysis
* Troubleshoot issues across the entire application stack
* Perform in-depth research and identify sources of production issues
* Identify efficiencies and ways to improve processes
* Build monitors and alerts to ensure timely notifications on the health of critical systems
* Ensure automated jobs run as scheduled and logs are completed in accordance with established procedures
* Participate in Disaster Recovery events and Major Event Changes
* Provide on-call support on a rotating schedule or as needed for emergency situations, including outside of normal business hours
* Manage application risk to ensure the security and resiliency of the application comply with firm and regulatory requirements
* Maintain system documentation and provide production metrics reporting
JPMorgan Chase & Co., one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world's most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands. Our history spans over 200 years and today we are a leader in investment banking, consumer and small business banking, commercial banking, financial transaction processing and asset management.
We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. In accordance with applicable law, we make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as any mental health or physical disability needs.
The health and safety of our colleagues, candidates, clients and communities has been a top priority in light of the COVID-19 pandemic. JPMorgan Chase was awarded the "WELL Health-Safety Rating" for all of our 6,200 locations globally based on our operational policies, maintenance protocols, stakeholder engagement and emergency plans to address a post-COVID-19 environment. As a part of our commitment to health and safety, we have implemented various COVID-related health and safety requirements for our workforce. These requirements may include sharing information in the firm's vaccine record tool, vaccination or regular testing, mask wearing, social distancing and daily health checks. Requirements may change in the future with the evolving public health landscape. JPMorgan Chase will consider accommodation requests.
Equal Opportunity Employer/Disability/Veterans