The Role
As a key member of our team, you will play a critical role in designing, implementing, and maintaining highly secure and reliable cloud-based systems tailored for government and military users with air-gapped requirements. Your expertise in Python, Terraform, and other Infrastructure as Code (IaC) tools will be instrumental in managing cloud environments across multiple platforms, including AWS, GCP, and Azure.
An in-depth understanding of advanced cloud technologies such as Pub/Sub, cloud storage models, cloud APIs, VPC, IAM, and other relevant services is essential to the role. You will also collaborate closely with machine learning teams to support ML workflows, drawing on your knowledge of GPU programming, CUDA, OpenMP, and parallel computing paradigms.
Moreover, as we cater to the unique needs of government and military users with air-gapped environments, you will actively collaborate with product teams to establish specific requirements and Service Level Objectives (SLOs) for deploying software securely and seamlessly.
Finally, you bring experience performing both on-premises and VPC-based deployments in the SaaS space.
Your Responsibilities
- Manage cloud environments across AWS (including EKS), GCP, and Azure using Python, Terraform, and other IaC tools.
- Apply expertise in cloud technologies (Pub/Sub, cloud storage models, cloud APIs, VPC, IAM, Kubernetes) to meet government and military security and reliability standards.
- Collaborate with machine learning teams, supporting ML workflows with GPU programming, CUDA, and parallel computing.
- Maintain the availability of cloud and physical Linux servers in air-gapped production environments, ensuring uninterrupted operation and security.
- Integrate Google Cloud services with our core cloud management systems, using Terraform templates and developing or modifying scripts that integrate with first-party and third-party operational and observability tools.
- Collaborate closely with product teams to understand requirements and define Service Level Objectives (SLOs) for deploying software into air-gapped environments.
Requirements
- Bachelor's degree in Computer Science, Engineering, or a related field, with 5+ years of relevant professional work experience.
- Proven experience with Google Cloud, AWS, and Terraform, with a track record of successful cloud or on-premises application deployments.
- Solid understanding of Kubernetes, Ansible, and Docker, including production experience creating and administering Kubernetes-based systems.
- Experience with microservices, API design (REST and gRPC), and cloud-native application development.