pmetal dataset
Utilities for working with training datasets.
Subcommands
analyze
Analyze a local dataset and show its format, row count, token statistics, and column info.
pmetal dataset analyze train.jsonl
pmetal dataset analyze data.parquet
download
Download a dataset from the Hugging Face Hub.
pmetal dataset download squad --output ./data/
convert
Convert between dataset formats.
# Parquet to JSONL
pmetal dataset convert data.parquet --format jsonl --output data.jsonl
# ShareGPT to Alpaca
pmetal dataset convert sharegpt.json --format alpaca --output alpaca.jsonl
Supported Formats
| Format | Extensions | Read | Write |
|---|---|---|---|
| JSONL | .jsonl | Yes | Yes |
| JSON | .json | Yes | Yes |
| Parquet | .parquet | Yes | Yes |
| CSV | .csv | Yes | Yes |
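
The table above suggests a simple batch workflow: pick input files by extension and convert each one to a single target format. The sketch below only prints the `pmetal dataset convert` commands it would run (a dry run) and does not execute them. The helper name `plan_conversions` and the directory arguments are hypothetical. It assumes only the `--format` and `--output` flags shown in the convert examples, and that the target is a file format whose name matches its extension (as with `jsonl`).

```shell
#!/bin/sh
# Dry-run sketch: print one convert command per supported input file.
# plan_conversions is a hypothetical helper, not part of pmetal.
plan_conversions() {
    src_dir=$1   # directory holding the input datasets
    out_dir=$2   # directory for the converted files
    target=$3    # target file format, e.g. jsonl (assumed to equal the extension)

    for f in "$src_dir"/*; do
        [ -f "$f" ] || continue
        case "$f" in
            *.jsonl|*.json|*.parquet|*.csv) ;;  # extensions listed in the table
            *) continue ;;                      # skip everything else
        esac
        name=$(basename "$f")
        stem=${name%.*}
        # Skip files that are already in the target format.
        [ "${name##*.}" = "$target" ] && continue
        # Note: paths are printed unquoted, so names with spaces need extra care.
        printf 'pmetal dataset convert %s --format %s --output %s\n' \
            "$f" "$target" "$out_dir/$stem.$target"
    done
}
```

Calling `plan_conversions ./raw ./out jsonl` would list the planned commands for review. Once the plan looks right, piping that output to `sh` is one way to run it.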
See Also
- pmetal train: Use datasets for training