Data Management
Best practices for organizing, versioning, and managing ML datasets.
Dataset Structure
project/
├── data/
│   ├── raw/        # Original unprocessed data
│   ├── processed/  # Cleaned and transformed
│   ├── train/      # Training split
│   ├── val/        # Validation split
│   └── test/       # Test split
└── oneml.yaml
Versioning
Track dataset versions alongside your models:
oneml data version --tag v1.0
oneml data list-versions
oneml data checkout v1.0
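To illustrate what pinning a dataset version can mean in practice, the following Python sketch computes a content fingerprint for a data directory. It is not a description of how oneml implements versioning, which this page does not cover; the function name dataset_fingerprint and the choice of SHA-256 are assumptions made purely for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def dataset_fingerprint(data_dir: Path) -> str:
    """Return a SHA-256 hex digest covering every file under data_dir.

    Hypothetical helper for illustration; not part of the oneml CLI or API.
    """
    digest = hashlib.sha256()
    # Sort paths so the result does not depend on filesystem listing order.
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        # Mix in the relative path so that renaming a file changes the digest.
        digest.update(path.relative_to(data_dir).as_posix().encode("utf-8"))
        digest.update(b"\0")
        # Read in 1 MiB chunks so large files are not loaded into memory at once.
        with path.open("rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        digest.update(b"\0")
    return digest.hexdigest()


if __name__ == "__main__":
    # Build a tiny throwaway dataset to show that edits change the fingerprint.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "train").mkdir()
        sample = root / "train" / "sample.csv"
        sample.write_text("x,y\n1,2\n")
        before = dataset_fingerprint(root)
        sample.write_text("x,y\n1,3\n")
        after = dataset_fingerprint(root)
        print("changed:", before != after)
```

Recording such a fingerprint next to a tag like v1.0 gives a simple way to check that the files on disk still match the version a model was trained on.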