What to Preserve?

These files should definitely be part of your data package

Others should be able to understand what you did. It is not enough to just provide data. Without associated information, research data quickly become useless. For all data selected for preservation, you should therefore keep a ‘data package’ consisting of:

  • Research data files themselves
    • primary (raw) data
    • secondary (processed) data
    • Other criteria may sometimes help you decide whether to include particular data, e.g.:
      • Legal/ethical requirements to keep data for a specified retention period (e.g. for clinical trials)
      • Funder, institutional or publisher policies
      • High potential reuse value of the data
      • Great scientific, historical, or cultural significance of the data
      • The data are unique and/or cannot easily be re-created.
      • The benefits outweigh the costs of data preservation.
  • Meaningful file/folder structure: an overview of the contents of the data package, stating which file contains what information and how the files are related.
  • Documentation and metadata to ensure the data remain findable, comprehensible, and (re)usable:
    • Computer code/scripts;
    • Protocols;
    • Lab journals;
    • Metadata and/or codebooks describing the data;
    • Collection methods;
    • Procedures;
    • Experimental protocol;
    • Your research question;
    • Stimuli used;
    • Sample descriptions.
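
Part of the overview of the package contents can be generated automatically. The sketch below is one possible approach, not a prescribed format. It assumes the data package is a single folder and writes a CSV file listing every file with its size and SHA-256 checksum, leaving a description column to fill in by hand. The function names (`build_manifest`, `write_manifest`), the column names, and the choice of CSV are illustrative assumptions.

```python
import csv
import hashlib
from pathlib import Path

# Column names are an illustrative choice, not a standard.
FIELDS = ["path", "bytes", "sha256", "description"]


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir):
    """List every file in the package folder with its relative path, size, and checksum."""
    rows = []
    for path in sorted(package_dir.rglob("*")):
        if path.is_file():
            rows.append({
                "path": path.relative_to(package_dir).as_posix(),
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
                "description": "",  # to be filled in by hand
            })
    return rows


def write_manifest(rows, out_file):
    """Write the manifest rows to a CSV file with a header line."""
    with out_file.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

For example, `write_manifest(build_manifest(Path("my_data_package")), Path("MANIFEST.csv"))` would write the overview for a folder called `my_data_package`; both names are placeholders. Writing the manifest outside the package folder keeps it from listing itself, and the checksums let a later user confirm that the files have not changed.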

A data package is especially practical if it can be found and used on its own. This is the case if it is published in a data repository or data journal as a data package for reuse.

Do not forget to explicitly state who is responsible for the content of the data package, who is to be contacted in case of a request for access, and under what conditions access is granted.

Alternatives to preserving raw data

If preserving your raw data poses problems, alternatives can also ensure verification. For instance, transcripts of recorded interviews could hold all important information and may be less privacy-sensitive, so it is reasonable to preserve those instead of the recordings themselves. Also, if the raw data are very large, preserving them only in some processed form could be an alternative, combined with, for instance, a demonstrable quality check on the processing.