Published May 29, 2024 | Version v1
Presentation | Open access

Metadata Ahoy! Charting a reusable path for machine learning

  • 1. University of California San Diego

Contributors

  • 1. University of California, San Diego
  • 2. University of California San Diego

Description

Machine learning (ML) is more popular than ever, but what is needed to best document, curate, and archive ML research outputs? Data curators are largely in uncharted waters regarding the extent to which repositories can manage ML objects and their components (data, code, parameters, documentation, etc.) in ways that match researchers' needs and uses. But before we can plot a course toward a set of best practices, we must first ask: where are we now?

This presentation will provide an overview of a recent research project that assessed how well the metadata schemas and fields of eight generalist (Figshare, Zenodo, Harvard Dataverse, etc.) and specialist repositories facilitate the findability, interoperability, and reusability of ML objects. We will discuss the strengths of these repositories and opportunities for improving them, as well as what generalist repositories can learn from specialist repositories and vice versa. The presentation will also summarize the project's outputs, all of which are publicly available: a multi-repository metadata field crosswalk; complete metadata exports of nearly 20,000 ML-related items from these repositories; and a user interface and code to query repository APIs and to standardize and analyze the metadata exports. We hope the IASSIST community will dive deep into this bounty of (meta)data!

Files

Labou_IASSIST2024.pdf (1.9 MB)
md5:a90c1b22038aeccef22ee24644e87ad7

Additional details

Related works

Is published in
Journal article: 10.7191/jeslib.685 (DOI)
Is supplement to
Dataset: 10.6075/J0JS9QMH (DOI)

Software