About Us
🌱 At .omics, we’re building foundation models for plant biology — turning genomic and multi‑omics data into the next generation of tools for trait discovery and predictive breeding. Our goal is to develop crops that can better withstand pests, viruses, and climate stress, helping agriculture adapt to the challenges of a changing environment.
We operate a large internal GPU cluster dedicated to training and serving our models, giving the team the compute needed to iterate quickly and push the limits of model scale and complexity.
Position Overview
We’re looking for a founding ML Ops Engineer to make sure our models don’t just get built but also run efficiently at scale. You’ll design and maintain the infrastructure for training, deploying, and monitoring large AI models; optimize workflows across GPUs and cloud environments; and ensure the reproducibility and scalability that let our research translate into real-world applications in trait discovery and breeding.
The ideal candidate will be passionate about creating scalable, efficient, and reliable machine learning pipelines and infrastructure, empowering the research team to develop cutting-edge AI models that drive agricultural innovation.
You will be expected to:
- Build and manage end-to-end machine learning pipelines, from data collection and model training to deployment and monitoring.
- Collaborate with AI researchers and data scientists to streamline experimentation, testing, and validation of AI models.
- Operationalize AI models into production systems with a focus on performance, scalability, and reliability.
Key Responsibilities
- Develop, optimize, and maintain scalable and reproducible ML pipelines for model training and deployment.
- Collaborate with AI researchers to support the development of our in-house models and their deployment on platforms such as Hugging Face and Azure AI Foundry.
- Design, manage, and optimize infrastructure for large‑scale training and inference on our GPU cluster, including job scheduling, resource utilization, and observability.
- Build benchmarking systems to measure model performance and optimize transfer learning and generalization across diverse biological datasets.
- Design and manage infrastructure for handling large‑scale biological datasets, ensuring seamless integration and processing for ML tasks.
- Define infrastructure and data requirements for lab‑generated datasets to ensure they are ML‑ready.
- Ensure optimal use of cloud and/or HPC infrastructure to support both research and production workloads.