Skip to content

pmetal dataset

Utilities for working with training datasets.

Analyze a local dataset — shows format, row count, token statistics, and column info.

Terminal window
pmetal dataset analyze train.jsonl
pmetal dataset analyze data.parquet

Download a dataset from HuggingFace Hub.

Terminal window
pmetal dataset download squad --output ./data/

Convert between dataset formats.

Terminal window
# Parquet to JSONL
pmetal dataset convert data.parquet --format jsonl --output data.jsonl
# ShareGPT to Alpaca
pmetal dataset convert sharegpt.json --format alpaca --output alpaca.jsonl
FormatExtensionsReadWrite
JSONL.jsonlYesYes
JSON.jsonYesYes
Parquet.parquetYesYes
CSV.csvYesYes