Requirements: English
Company: Global M
Region: Gorguja, Catalonia
* Design, develop, and maintain high-performance Python APIs (Flask, FastAPI, Django) to serve AI models (VLMs/LLMs).
* Architect and deploy production-ready cloud infrastructure optimized for AI workloads (AWS, GCP, Azure, and compute-optimized providers such as Vast.ai, RunPod, or Lambda Labs).
* Automate model deployment pipelines with CI/CD practices, containerization (Docker), and orchestration (Kubernetes or alternatives).
* Design scalable, secure, and cost-efficient architectures tailored for vision-language model (VLM) and large language model (LLM) inference at scale.
* Implement and manage model serving systems (e.g., custom FastAPI/Django services or Triton Inference Server).
* Monitor, log, and troubleshoot production systems and AI model performance using tools such as Prometheus, Grafana, Loki, and Sentry.
* Maintain both backend application codebases and infrastructure as code (Terraform, Pulumi) with a strong focus on reproducibility and scalability.
* Collaborate closely with AI/ML engineers to integrate, optimize, and serve AI models efficiently.