Requirements: English
Company: Global M
Region: Valladolid, Castile and León
Design, develop, and maintain high-performance Python APIs (Flask, FastAPI, Django) to serve AI models (VLMs/LLMs).
Architect and deploy production-ready cloud infrastructure optimized for AI workloads (AWS, GCP, Azure, and compute-optimized providers like Vast.ai, RunPod, or Lambda Labs).
Automate model deployment pipelines with CI/CD practices, containerization (Docker), and orchestration (Kubernetes or alternatives).
Design scalable, secure, and cost-efficient architectures tailored for vision-language (VLMs) and large language (LLMs) model inference at scale.
Implement and manage model serving systems (e.g., custom FastAPI/Django services or Triton Inference Server).
Monitor, log, and troubleshoot production systems and AI model performance using tools such as Prometheus, Grafana, Loki, and Sentry.
Handle both backend application codebases and infrastructure as code (Terraform, Pulumi) with a strong focus on reproducibility and scalability.
Collaborate closely with AI/ML engineers to integrate, optimize, and serve AI models efficiently.