Data Management

Best practices for organizing, versioning, and managing ML datasets.

Dataset Structure

project/
├── data/
│   ├── raw/           # Original unprocessed data
│   ├── processed/     # Cleaned and transformed data
│   ├── train/         # Training split
│   ├── val/           # Validation split
│   └── test/          # Test split
└── oneml.yaml
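
As a rough sketch, the data/ layout above could be scaffolded with a few lines of Python. The function name `create_layout` is hypothetical and not part of oneml; only the directory names are taken from the tree. The script does not create oneml.yaml, since its contents are not described here.

```python
from pathlib import Path

# Subdirectories of data/, taken from the tree above.
DATA_SUBDIRS = ["raw", "processed", "train", "val", "test"]


def create_layout(project_root):
    """Create the data/ subdirectories under project_root and return their paths.

    Hypothetical helper for illustration; not a oneml API.
    """
    root = Path(project_root)
    created = []
    for name in DATA_SUBDIRS:
        path = root / "data" / name
        # exist_ok=True makes the call safe to repeat on an existing project.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_layout("project")` from the parent directory would produce the five data/ subdirectories shown above and leave any existing files untouched.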

Versioning

Tag dataset versions so they can be tracked alongside your models, then list the tagged versions or check one out:

oneml data version --tag v1.0
oneml data list-versions
oneml data checkout v1.0
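
How oneml identifies a version internally is not described here. As an independent sanity check, a content fingerprint of the data directory can be recorded next to each tag and compared after a checkout. The sketch below is an assumption-laden illustration in plain Python: `dataset_fingerprint` is a hypothetical helper, not a oneml command or API.

```python
import hashlib
from pathlib import Path


def dataset_fingerprint(data_dir):
    """Return a SHA-256 hex digest over relative paths and contents of all files.

    Hypothetical helper for illustration; not a oneml API.
    """
    root = Path(data_dir)
    digest = hashlib.sha256()
    # Sort so the result does not depend on filesystem listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Include the relative path so renames change the fingerprint.
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        # Read in 1 MiB chunks to avoid loading large files into memory.
        with path.open("rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
    return digest.hexdigest()
```

Two directories with identical file names and contents yield the same digest, so a mismatch after checking out a tag suggests the data on disk differs from what was fingerprinted when the tag was created.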