Requirements: English
Company: DecisionBrain
Region: Bologna, Emilia-Romagna
Site Reliability Engineer (Junior)
Bologna, Italy

DecisionBrain develops custom decision-support applications for various clients. While each solution is unique, they share common architectural traits:

- Web-based applications with a microservice architecture
- Deployment on various Kubernetes environments
- Built using DB Gene, our proprietary development platform

As a Site Reliability Engineer (SRE), you will play a crucial role in ensuring the stability, scalability, and reliability of our applications. You will primarily provide L2 support, assisting users and resolving infrastructure-related issues.

Beyond client-facing support, you will also monitor internal tools and deployments to ensure smooth operations for both our customers and our colleagues. We use Slack for real-time communication and rapid response to incidents.

Key Responsibilities

Technical Support (L2 & Incident Response)

The first point of contact for users (L1) is managed by Business Analysts, who handle basic troubleshooting and initial issue triaging.
As an L2 Support Engineer, your role is to:

- Investigate and resolve technical issues beyond the scope of L1 support.
- Analyze software bugs, configuration problems, and system performance issues.
- Review logs, monitor infrastructure health, and validate system components.
- Provide actionable insights and recommendations to improve platform stability and production readiness.
- Document findings, actions taken, and resolutions in tickets for both users and L3 support.
- Escalate critical software bugs or advanced issues to the L3 engineering team, providing structured analysis to aid resolution.

Infrastructure & Application Troubleshooting

Your role will involve diagnosing problems and ensuring service reliability across both infrastructure and application layers.

Infrastructure Troubleshooting:

- Monitor system health via Grafana, Loki, and Prometheus (our stack) or other observability tools.
- Check Kubernetes components (pods, jobs, volumes) and logs for errors.
- Perform actions such as restarting pods, adjusting memory allocations, or resizing volumes to restore services.
- Work on the alerting stack, integrating it with internal tools to ensure proactive issue detection and resolution.
- Contribute to an Internal Developer Platform (IDP) approach, where we map and maintain the knowledge of our software assets, configurations, deployments, credentials, and related issues.

Application Troubleshooting:

- Analyze logs and error messages from microservices.
- Run data validation checks and attempt to reproduce issues in test environments.
- Work closely with the Platform team, contributing to discussions on architecture improvements and bug fixes.
- Collaborate on the development of tools and automation scripts to enhance system observability and reliability.

Required Skills & Qualifications

Education:

- Bachelor's or Master's degree in Computer Science, Information Technology, or a related technical field.

Technical skills:

- Understanding of microservice architecture: front-end, back-end, databases, REST API interactions.
- Knowledge of infrastructure & software components:
  - Memory (Java heap, stack, native memory, etc.)
  - CPU performance and throttling
  - Disk usage, logs, error handling, HTTP status codes
- Knowledge of Docker.
- Familiarity with Kubernetes: deployment management, Helm charts, command-line usage (kubectl).
- Experience with monitoring tools such as Grafana, Prometheus, Loki.
- Experience with infrastructure-as-code tools such as Terraform (optional).
- Scripting skills in languages such as Bash, Python, or equivalent.

Personal skills:

- Excellent written communication in English (support documentation, ticketing, and user communication).
- Problem-solving mindset: ability to troubleshoot issues methodically and document solutions.
- Customer-oriented: ability to work with users of varying technical expertise.
- Organized and resourceful: strong investigative and documentation skills.

Language Requirements:

- English (proficient, written & spoken)
- French and/or Italian is a plus

Working Conditions

- Contract type: Permanent
- Work schedule: Full-time, up to 2 days/week working from home

Compensation

- Gross annual salary: 25K to 30K
- Benefits: Meal vouchers, thirteenth-month salary, remote work
- Technical equipment: Laptop (Mac or PC) / dual screen

Recruitment Process

1. Online meeting: Company presentation, role discussion, candidate motivation
2. Technical test: Hands-on problem-solving
3. Manager interview: Review of technical test & discussion
4. Final interview

Practical Information

- Start date: Immediate
- To apply: Please fill out the form below and attach your CV (your CV must be in English).