DevOps / Site Reliability Engineer

About the Role

We are looking for a DevOps / Site Reliability Engineer (SRE) to design, automate, and manage the infrastructure supporting our IoT and edge AI surveillance platforms. You will ensure scalability, reliability, and security across cloud, on-premises, and edge deployments. This role requires strong expertise in container orchestration, CI/CD pipelines, monitoring, and infrastructure automation. Experience managing distributed fleets of remote edge devices is a plus.

Responsibilities

Design, implement, and maintain infrastructure-as-code using Terraform and Ansible.
Deploy and manage containerized applications with Docker, Kubernetes, and Helm.
Set up and optimize CI/CD pipelines using GitHub Actions, GitLab CI, or Jenkins.
Ensure system observability by integrating Prometheus, Grafana, the ELK stack, and OpenTelemetry.
Configure and optimize load balancers (HAProxy, NGINX, Envoy) for scalable traffic distribution.
Harden and monitor Linux-based environments, ensuring compliance with security best practices.
Automate deployment and scaling processes for cloud, on-prem, and edge infrastructure.
Collaborate with backend, AI, and embedded teams to streamline development-to-production workflows.
Troubleshoot and resolve infrastructure, networking, and performance issues.

Requirements

4+ years of experience in DevOps or SRE roles.
Proficiency with Docker, Kubernetes, and Helm charts.
Strong background in Linux administration, networking, and security hardening.
Hands-on experience with GitHub Actions, GitLab CI, or Jenkins.
Experience with Prometheus, Grafana, the ELK stack, and OpenTelemetry.
Practical knowledge of HAProxy, NGINX, or Envoy.
Experience with Terraform, Ansible, or similar infrastructure-as-code tools.

Nice to Have

Experience with edge deployments and managing large fleets of remote IoT/AI devices.
Familiarity with hybrid cloud-edge architectures.
Exposure to service mesh technologies (Istio, Linkerd).
Knowledge of hardware resource monitoring for constrained edge devices.
Experience in chaos engineering, failover, and disaster recovery strategies.

Apply for this position

Interested in this role? Send us your resume and we'll get back to you soon.

Apply via Email

careers@qareeb.io

Job Summary

Type: Full-time

Experience: 4+ years

Location: Hybrid

Department: Infrastructure