PIDs for research data

Research data plays a central role in the reproducibility, traceability and reuse of scientific results. Data-intensive fields of research, such as work with particle accelerators, satellites or research vessels, produce extensive and complex digital data sets that must be identified and referenced reliably over the long term.

Persistent identifiers (PIDs) enable unique, permanent and machine-readable referencing of research data. They support reliable citation, increase the visibility of data and contribute significantly to transparency and openness in scientific practice. They also help research achievements beyond traditional text publications gain recognition.

Which PIDs are used - and where are they available?

In many data repositories, PIDs are automatically assigned when research data is published. An overview of available repositories and their PID usage can be found, for example, in the re3data.org directory.

Common PIDs for research data are:

  • DOI (Digital Object Identifier)
    The DOI is the most frequently used identifier for research data. DOIs are registered through registration agencies such as DataCite or Crossref.
  • ARK (Archival Resource Key)
    The ARK is a PID type used primarily in archives and long-term data collections. It is particularly suitable for the permanent referencing of extensive digital collections or complex resource structures. ARKs can be obtained through the California Digital Library, among others.
  • ePIC (European Persistent Identifier Consortium)
    ePIC is a PID system used particularly in the European research environment (e.g. in earth system sciences).
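
To illustrate what "machine-readable" means for the three PID types listed above, the following minimal sketch turns an identifier string into a resolvable URL. It rests on several assumptions: the resolvers used are the public ones at doi.org, n2t.net and hdl.handle.net; ePIC identifiers are treated as Handles; the pattern matching is deliberately simplified; and the function names and all example identifiers are hypothetical. A given repository may use a different resolver.

```python
import re

# Public resolver base URLs. Which resolver a particular repository
# advertises is an assumption of this sketch.
RESOLVERS = {
    "doi": "https://doi.org/",
    "ark": "https://n2t.net/",
    "handle": "https://hdl.handle.net/",  # ePIC PIDs are treated as Handles here
}


def classify_pid(pid: str) -> str:
    """Guess the PID type from its syntax (simplified heuristics)."""
    pid = pid.strip()
    if pid.lower().startswith("ark:"):
        return "ark"
    # DOIs start with the directory indicator "10." followed by a registrant code.
    if re.match(r"^10\.\d{4,9}/\S+$", pid):
        return "doi"
    # Anything else of the form prefix/suffix is treated as a Handle.
    if re.match(r"^[0-9A-Za-z.]+/\S+$", pid):
        return "handle"
    raise ValueError(f"Unrecognised PID syntax: {pid!r}")


def resolver_url(pid: str) -> str:
    """Build a resolver URL for a DOI, ARK or Handle string."""
    pid = pid.strip()
    return RESOLVERS[classify_pid(pid)] + pid


if __name__ == "__main__":
    # Hypothetical identifiers, used only to show the URL shapes.
    for example in ("10.1234/example", "ark:/12345/x6example", "21.T11148/abc"):
        print(example, "->", resolver_url(example))
```

The sketch only constructs URLs and makes no network requests; whether a constructed URL actually resolves depends on the identifier having been registered.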

An online seminar on PIDs for research data took place on 15 February 2024. The summary of the discussions and the presentations can be viewed here.