DataHerb makes clean, documented small datasets easy to retrieve. It is a metadata-driven listing service for open datasets.

For Mac users, DataHerb is your “Homebrew for small data”.

How Does It Work

DataHerb does not take your data. The datasets remain fully managed by their owners or curators. Once a dataset is linked to DataHerb, we aggregate its metadata and build a page for the dataset.
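The flow described above can be illustrated with a minimal sketch. It is not DataHerb's actual implementation: the field names (`id`, `name`, `description`, `repository`), the functions `aggregate_metadata` and `render_page`, and the example URL are all hypothetical, chosen only to show how a listing can be built from metadata alone while the data stays with its owner.

```python
# Hypothetical sketch: the field names and functions below are illustrative
# assumptions, not DataHerb's real metadata schema or API.

def aggregate_metadata(records):
    """Index metadata records by dataset id; the data files are never copied."""
    index = {}
    for record in records:
        index[record["id"]] = {
            "name": record["name"],
            "description": record.get("description", ""),
            "repository": record["repository"],  # link back to the owner's copy
        }
    return index


def render_page(entry):
    """Build a plain-text listing page for one dataset from its metadata."""
    return "\n".join([
        entry["name"],
        entry["description"],
        "Source: " + entry["repository"],
    ])


# A placeholder record standing in for metadata published by a dataset owner.
records = [
    {
        "id": "example-dataset",
        "name": "Example Dataset",
        "description": "A placeholder record for illustration.",
        "repository": "https://example.org/owner/example-dataset",
    }
]

index = aggregate_metadata(records)
page = render_page(index["example-dataset"])
print(page)
```

The point of the design is that the listing holds only metadata and a link back to the source, so the owner's repository remains the single place where the data lives.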

How to Contribute

DataHerb is an initiative for transparent data management in open data. To achieve this, we use a metadata-driven design in which every step can be inspected.

  • Contribute datasets: list your datasets on DataHerb in just two steps. Datasets that can be used to enrich machine learning datasets are preferred. Tutorial
  • Write a short article about the story behind your dataset and submit it to DataHerb Articles.
  • Use DataHerb in your projects.
  • Spread the word.
  • Help us build a better DataHerb. GitHub Organization; Leave a comment


  • Python package: load and explore any dataset together with its documentation using one line of Python code.
  • Command-line tool: create, load, and explore any dataset together with its documentation in your terminal.
  • Versioned datasets: use GitHub tags and releases to version datasets, and index the versioned datasets automatically.
  • Better APIs: API for all datasets, …
  • Your ideas


  1. Many of the landing page illustrations are from unDraw.
  2. The embedded terminal is adapted from an MIT-licensed project, Portfolio - Type help.