Who We Are:
Osmos, a B2B SaaS company founded by ex-Amazon ad-tech experts, is revolutionizing retail media with an AI-powered operating system that increases retailer profitability (by up to 7% of sales) and delivers superior ROAS for brands. By enabling Tier 1 retailers and marketplaces worldwide to activate more brands and leverage advanced targeting, we help them secure a lasting competitive edge.
Your Impact:
We are seeking a Staff DevOps Engineer to architect and maintain a highly available global infrastructure that serves high-QPS systems at 99.99% uptime. The role requires expertise in managing multi-region deployments, building fault-tolerant systems, and driving scalability for mission-critical applications.
What You'll Do:
Deploy, manage, and scale Kubernetes clusters for high throughput and low latency across multiple regions.
Implement and maintain Infrastructure as Code (IaC) to build fault-tolerant, globally distributed systems.
Build and optimize CI/CD pipelines to enable seamless, zero-downtime deployments.
Ensure high availability (99.99%) for mission-critical, high-QPS applications through monitoring, alerting, and incident management practices.
Support and maintain multi-region deployments to achieve low-latency, geo-redundant infrastructure.
Collaborate with engineering, product, and security teams to ensure scalability, security, and operational efficiency.
Contribute to continuous improvement by automating workflows, reducing toil, and sharing best practices.
You'll Thrive If You Have:
3–7 years of experience managing large-scale, high-availability systems in production.
Strong expertise in Kubernetes administration, including scaling clusters and handling multi-region workloads.
Hands-on experience with IaC tools like Terraform or CloudFormation.
Proficiency in designing and maintaining CI/CD pipelines for global deployments.
Solid knowledge of cloud platforms (AWS, GCP, or Azure) and geo-redundant architectures.
Proficiency in Linux administration, scripting (Bash, Python), and debugging distributed systems.
A problem-solving mindset with the ability to troubleshoot complex production issues effectively.
Why Choose Osmos?
Startup Energy, Enterprise Scale: Fast-paced innovation with global ambition
Revolutionize Retail Marketing: Be at the forefront of AI-powered solutions
Meaningful Contribution: Directly impact major brands' success
No Red Tape: Autonomy and empowerment to drive results
Growth & Fun: Continuous learning in a vibrant, collaborative culture
Competitive Rewards: We value your expertise and offer strong compensation
Ready to champion Infra & Cloud? Let's chat.
Quirky & fun: enjoy picking up new skills and hobbies like being a quiz master, playing board games, trying your hand at percussion, playing the djembe, and spreading love within the org!