Citing a Dataset

Citations to your data can add to your academic impact. A citation should include enough information to locate the exact version of the data being cited. Including a Persistent Identifier (PID) in the citation ensures that, even if the location of the data changes, the PID will still link to the data that were used. You can state in your licence (for example, a Creative Commons licence) or user agreement that you want your data to be cited when reused. Data citations work much like book or journal article citations and can include the following information:

  • Author;
  • Year;
  • Dataset title;
  • Repository;
  • Version;
  • Persistent Identifier (PID), which often works as a functional link (URL).
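
The elements above can be assembled mechanically. The Python sketch below assumes an "Author (Year): Title. Repository. Version. PID" layout, following the order of the list; the class and field names are hypothetical, and real repositories and journals each prescribe their own citation style.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Hypothetical container for the citation elements listed above."""
    author: str
    year: int
    title: str
    repository: str
    version: Optional[str] = None  # optional: omitted from the output if missing
    pid: Optional[str] = None      # optional: e.g. a DOI link

    def format(self) -> str:
        # Assumed layout: Author (Year): Title. Repository. [Version X.] [PID]
        parts = [f"{self.author} ({self.year}): {self.title}.", f"{self.repository}."]
        if self.version:
            parts.append(f"Version {self.version}.")
        if self.pid:
            parts.append(self.pid)
        return " ".join(parts)


if __name__ == "__main__":
    example = DatasetCitation(
        author="Irino, T; Tada, R",
        year=2009,
        title="Chemical and mineral compositions of sediments from ODP Site 127-797",
        repository="Geological Institute, University of Tokyo",
    )
    print(example.format())
```

Run as a script, this prints the Irino and Tada citation used as an example in this section, with the version and PID left out because none are given there. Making version and PID optional keeps the sketch usable for repositories that do not version their datasets.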


A widely used PID standard is the DOI (Digital Object Identifier). A DOI is an alphanumeric string assigned to an object that allows the object to be identified over time. A DOI is often presented as a link, formed by appending the DOI to the address of the DOI resolver, https://doi.org/. Some repositories use other identifiers instead. If you deposit in a reputable repository, you should be given some type of persistent identifier that you can use to cite and link to your data.
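
As a sketch of how a bare DOI becomes a link, the Python function below (the name `doi_to_link` is hypothetical) strips a few common prefixes such as `doi:` and prepends the https://doi.org/ resolver address. The syntax check is a simplification assumed for illustration: it only requires the `10.` prefix, a registrant code of four to nine digits, a slash, and a non-empty suffix, and it does not percent-encode special characters in the suffix.

```python
import re

# Simplified DOI check (an assumption for this sketch, not the full DOI syntax):
# "10." + 4-9 digit registrant code + "/" + non-whitespace suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes that are stripped before the DOI is checked (all lowercase).
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_link(doi: str) -> str:
    """Return the https://doi.org/ link for a DOI given bare or with a common prefix."""
    d = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if d.lower().startswith(prefix):
            d = d[len(prefix):]
            break
    if not DOI_PATTERN.match(d):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + d


if __name__ == "__main__":
    print(doi_to_link("doi:10.1000/182"))
```

For example, `doi_to_link("doi:10.1000/182")` returns `https://doi.org/10.1000/182`, and an older `https://dx.doi.org/` link is normalised to the same form.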

Example of a data citation:

Irino, T; Tada, R (2009): Chemical and mineral compositions of sediments from ODP Site 127‐797. Geological Institute, University of Tokyo.


  • Tip 1: Obtain a PID from the data repository of your choice.
  • Tip 2: If your PID is a DOI and you want to cite it in the format of a specific journal, use the DOI formatter from CrossCite.
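
CrossCite also documents DOI content negotiation, where a formatted citation is requested from https://doi.org/ with an `Accept: text/x-bibliography` header naming a citation style. The Python sketch below assumes that mechanism; the function names are hypothetical, the style and locale values (`apa`, `en-US`) are assumed examples, and decoding the reply as UTF-8 is an assumption. Check the CrossCite documentation for the styles it currently supports.

```python
import urllib.request


def citation_request(doi: str, style: str = "apa", locale: str = "en-US") -> urllib.request.Request:
    """Build (but do not send) a content-negotiation request for a formatted citation."""
    accept = f"text/x-bibliography; style={style}; locale={locale}"
    return urllib.request.Request("https://doi.org/" + doi, headers={"Accept": accept})


def fetch_formatted_citation(doi: str, style: str = "apa", locale: str = "en-US",
                             timeout: float = 10.0) -> str:
    """Send the request; urlopen follows the resolver's redirects automatically."""
    req = citation_request(doi, style, locale)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Assumption: the service replies with UTF-8 text.
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Requires network access; replace the DOI with your own dataset's DOI.
    print(fetch_formatted_citation("10.1000/182", style="apa"))
```

Building the request separately from sending it keeps the header logic easy to inspect without network access; only `fetch_formatted_citation` contacts the resolver.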