Model Serving
Deploy your models to production with auto-scaling and low-latency inference.
Quick Deploy
oneml deploy my-model --replicas 3 --gpu t4
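Once the deployment is live, requests go to its inference endpoint over HTTPS. A minimal sketch, assuming the endpoint URL and API key are the ones reported by the deploy command; the URL, auth header, and JSON payload schema below are placeholders, not confirmed oneml conventions:

# Placeholder URL, token variable, and payload schema; use the values printed by `oneml deploy`.
curl -X POST https://serve.example.com/my-model/predict \
  -H "Authorization: Bearer $ONEML_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"inputs": [[0.1, 0.2, 0.3]]}'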
Serving Options

Serverless: scale to zero, pay per request.
Dedicated: always-on instances for consistent latency.
Batch: high-throughput batch processing.
Streaming: real-time streaming inference.
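If the serving option is chosen at deploy time, it would typically be passed as a flag on the same deploy command. A sketch under that assumption, using a hypothetical --mode flag (check oneml deploy --help for the actual option names and values):

# Hypothetical --mode flag; the real flag name and accepted values may differ.
oneml deploy my-model --mode serverless               # scale to zero, pay per request
oneml deploy my-model --mode dedicated --replicas 3   # always-on instances
oneml deploy my-model --mode batch                    # high-throughput batch jobs
oneml deploy my-model --mode streaming                # real-time streaming inference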