Model Serving

Deploy your models to production with auto-scaling and low-latency inference.

Quick Deploy

oneml deploy my-model --replicas 3 --gpu t4
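
This deploys my-model behind a managed endpoint, in this case with three replicas on T4 GPUs. Once the deployment is live, clients typically call it over HTTP. Below is a minimal request sketch, assuming a REST predict endpoint, a bearer-token header, and a JSON inputs payload; the URL, the ONEML_API_KEY variable, and the payload schema are illustrative assumptions, not documented oneml behavior.

# Minimal client sketch. The endpoint URL, auth header, and payload
# schema are assumptions for illustration, not documented oneml behavior.
import os

import requests

ENDPOINT = "https://serving.example.com/v1/models/my-model/predict"  # assumed URL
API_KEY = os.environ["ONEML_API_KEY"]  # assumed auth token variable

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"inputs": [[5.1, 3.5, 1.4, 0.2]]},  # example feature vector
    timeout=10,
)
response.raise_for_status()
print(response.json())  # parsed prediction payload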

Serving Options

Serverless: scale to zero, pay per request.

Dedicated: always-on instances for consistent latency.

Batch: high-throughput batch processing.

Streaming: real-time streaming inference (see the sketch after this list).
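
With the streaming option, results arrive incrementally rather than as a single response. The sketch below shows one way a client might consume such a stream, assuming the endpoint sends newline-delimited JSON over HTTP; the URL, auth variable, and framing are assumptions, not documented oneml behavior.

# Minimal streaming-consumption sketch. The URL, auth, and the
# newline-delimited JSON framing are assumptions for illustration.
import json
import os

import requests

STREAM_URL = "https://serving.example.com/v1/models/my-model/stream"  # assumed URL
API_KEY = os.environ["ONEML_API_KEY"]  # assumed auth token variable

with requests.post(
    STREAM_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"prompt": "Summarize the quarterly report."},
    stream=True,   # keep the connection open and read as data arrives
    timeout=60,
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if line:  # skip keep-alive blank lines
            print(json.loads(line))  # one partial result per line (assumed framing)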