Real-jobs.eu: Site Reliability Engineer

Requirements: English
Company: ION
Region: Milan, Lombardy

About us:
The ION Group is made up of innovators who provide trading and workflow automation solutions, high-value analytics, and strategic consulting to corporations, financial institutions, central banks, and governments. More than 40% of the world's largest companies use our solutions. We've achieved tremendous growth by bringing together some of the best and most successful financial technology companies in the world.

At ION, we offer careers that provide many opportunities: To invent. To design. To collaborate. To build. To transform businesses and empower people around the world to do more, faster and better than before. Imagine what you can do and experience. This is where you can do your best work. Learn more at iongroup.com.

We are looking for experienced people who are competent in the cloud and knowledgeable about the SRE (site reliability engineering) domain.

The team:
The Core Architecture Team (CAT) produces and manages the core technology, methodologies, and frameworks that underpin all new or re-engineered ION products. We provide our internal and external customers with foundations and an open platform that they can extend and evolve to manage their solutions independently and with a reduced cost of ownership.

The ION Cloud Center of Excellence aims to support the Group's strategy toward a "Cloud native offering" via a cross-functional team of empowered people who are responsible for developing and managing the strategy, governance, and best practices for the entire Group.

Some of the team deliverables:
- Create the ION Cloud Infrastructure, reusable by all the ION Divisions.
- Reduce the total cost of ownership.
- Provide guidelines and best practices for the entire organization.
- Reduce operational complexity via automated platform configuration and deployment.
- Provide tools that make it easier for developers to set up the CI environment for ION products.
- Govern the development tools to increase operational efficiency.
- Standardize technology recommendations and infrastructure and product design across the Group.

Who you are:
- Your background is either in software development or operations/infrastructure (or both!), and you enjoy coding or automating your workflows.
- You have proven experience in working with cloud providers and dealing with cloud-first applications engineered with a cloud-native mindset.
- You are a self-starter and a constantly learning engineer who enjoys working in a team of peers.
- You are open and candid about discussing solutions, problems, and improvements within your team and with others in the engineering organization.
- You have a passion for site reliability engineering (SRE) principles and their adoption, and you are keen to start conversations with teams about the reliability, performance, and security of applications, services, and systems.
- You are an advocate of the DevOps or SRE approach, promoting loosely coupled, heavily automated, constantly monitored distributed systems, and you always plan for failure and never take anything for granted.
- You are keen to raise the bar of the solutions provided by the whole engineering team (dev and ops).
- You possess strong written and verbal communication skills.
- You are happy to be involved in an on-call rotation if needed.

What you'll be doing:
It's fine to have some of these; the more, the merrier!

The Cloud Engineer side:
- Maintain our internal tooling and automation to improve the reliability, scalability, and observability of our services.
- Proactively identify and solve issues across the whole stack, together with the rest of the infrastructure and engineering teams.
- Help raise awareness of cloud security and protection, and understand how to fit these into the timelines and backlog of the end team.
- Understand how a distributed application works, including its constraints and limitations.
- Have strong coding and scripting experience and an interest in improving your programming/coding knowledge (ideally Python or Go).

The Site Reliability Engineer side:
- Promote and execute the adoption of SRE principles and raise awareness of the importance of reliability and automation.
- Help the team understand concepts like ownership, error budgets, and production readiness.
- Help define and implement SLIs and SLOs, and check SLAs, to ensure customer satisfaction.
- Work together with teams to identify and solve issues in platforms and tune services for reliability and performance.
- Aim to reduce toil and manual effort with automation, repeatable and documented tooling, and standard procedures.
- Take an active part in the incident management process to troubleshoot impacting issues in a timely manner and engage with all stakeholders involved.

Your skills, experience, and qualifications:
These are must-haves!
- Our working language is English, so it is very important to be proficient in it.
- Extensive knowledge of and experience with one of the major clouds (AWS, Azure, or GCP), with a comprehensive understanding and real-world implementation experience. We currently use AWS and Azure.
- Microser

Milan, Lombardy, Italy

Type: n/a

Category: IT & Internet & Media