Added: 2025-05-20 13:47.00
Updated: 2025-05-25 03:41.05

Lead Site Reliability Engineer @ Kontakt.io

Remote, Poland

Type: n/a

Category: IT & Internet & Media

Requirements: English
Company: Kontakt.io
Region: Remote, Poland

Kontakt.io is building the platform that care operations run on.


We reduce waste, cut costs, and improve revenue by improving throughput, asset utilization, and staff productivity. Our platform uses AI, RTLS, and EHR data to enable self-learning agents to automate workflows, adapt in real time, and orchestrate all care delivery operations.
Easy to deploy and scale, it gives a clear picture of spaces, equipment, and people, eliminating inefficiencies and enhancing the patient experience. With measurable 10X ROI and more than 20 use cases, Kontakt.io is the go-to platform for better and faster care delivery operations.


We're looking for an SRE Leader to own the reliability, performance, and automation of our cloud-based, real-time platform. This role will focus on keeping our platform running smoothly 24/7 by minimizing downtime and improving observability, incident response, and self-healing automation. You will lead and scale the SRE team to ensure our infrastructure stays ahead of demand, operates efficiently, and meets the needs of our growing healthcare customers.


Bonus Points If You Have:

Responsibilities:

- Ensure 99.99% uptime across our cloud platform, meeting strict SLAs for healthcare customers.
- Design and implement self-healing, fault-tolerant systems to prevent failures before they happen.
- Define SLIs, SLOs, and SLAs, ensuring proactive performance monitoring and incident resolution.
- Architect and manage scalable cloud infrastructure (AWS) for massive real-time data processing.
- Optimize containerized environments (Kubernetes, Docker) to support multi-region deployments.
- Lead the adoption of infrastructure as code (Terraform) to fully automate infrastructure management.
- Build and refine a world-class monitoring, alerting, and logging system using Prometheus, Grafana, OpenTelemetry, and Datadog.
- Lead incident response and on-call operations, reducing mean time to detection (MTTD) and mean time to resolution (MTTR).
- Conduct blameless postmortems and continuously improve system resilience.
- Reduce manual intervention through automated deployment, scaling, and failover mechanisms.
- Partner with Security & Compliance teams to ensure infrastructure meets HIPAA and SOC 2 standards.
- Lead disaster recovery and business continuity planning to ensure critical healthcare services are always available.
- Drive technical strategy and roadmap for scalability, monitoring, and reliability engineer