There are many different ways to set up and organise your documentation.
Project Level
Project level documentation gives contextual information about the study/project: it explains the aims of the study, the research questions, the methodologies, etc.
Project level documentation also seeks answers to questions such as:
For what purpose was the data created? Describe the project history, its aims, objectives, concepts and hypotheses, including:
The title of the project
Authors, creators and co-workers of the dataset
The institution of the author(s)/creator(s)
Funders
Grant numbers
References to related projects
Publications based on the data
What does the dataset contain?
Kind of data (interviews, images, questionnaires, instrumental, etc.)
Organization & structure
Relationships between files
Description of data file(s): version and edition, structure of the database, associations, links between files, external links, formats, compatibility
How was the data collected?
The methodology and technique used in collecting and creating the data
Description of all the sources the data originate from
The methods and modes of data collection, for example:
The instruments, hardware and software used to collect the data
Digitisation or transcription methods
Data collection protocols
Sampling design and procedure
Target population, units of observation
What possible manipulations were done to the data? How was the data processed?
Modifications made to data over time since their original creation and identification of different versions of datasets
Describe workflow and specific tools, instruments, procedures, hardware/software or protocols you might have used to process the data
Anonymisation/pseudonymisation strategy
What were the quality assurance procedures?
Checking for equipment and transcription errors
Quality control of materials
Data integrity checks
Calibration procedures
Data capture resolution and repetitions
Other procedures related to data quality, such as weighting, calibration, reasons for missing values, checks and corrections of transcripts, and transformations
How can the data be accessed? Describe the use and access conditions of the data:
Where the data can be found
Access conditions such as embargo
Parts of the data that are restricted, protected or confidential
Licences
Permanent identifiers
Copyright and ownership issues
A complete academic thesis normally contains this information in detail, but a published article may not. If a dataset is shared, a detailed technical report should be included so that users can understand how the data were collected and processed. You should also provide a sample bibliographic citation to indicate how you would like secondary users of your data to cite it in any publication.
File or Database Level
File or database level documentation documents how all the files (or tables in a database) that make up the dataset relate to each other, what format they are in, whether they supersede or are superseded by previous files, etc.
For this purpose, a codebook is recommended. A codebook can be kept as a separate file or embedded within the data file. A separate file is more flexible but is yet another document to maintain; an embedded codebook sits close to the data and is easy to use, but it is less flexible and may get lost when the file is converted to another format.
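As an illustration of the separate-file option, the sketch below writes a small codebook as CSV and checks whether every column in the data has an entry, which is one way to catch a codebook that has drifted out of date. It is a minimal sketch, not a standard format: the column layout (`variable`, `label`, `unit`, `value_codes`), the class and function names, and the example variables and value codes (including the question number q2) are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class CodebookEntry:
    """One row of a (hypothetical) codebook: a variable and what is needed to interpret it."""
    name: str
    label: str
    unit: str = ""
    # Maps numeric codes to their meaning, e.g. 9 -> "missing"
    values: dict = field(default_factory=dict)


def write_codebook(entries, stream):
    """Write the codebook as CSV so it can be stored next to the data file."""
    writer = csv.writer(stream)
    writer.writerow(["variable", "label", "unit", "value_codes"])
    for entry in entries:
        codes = "; ".join(f"{code}={meaning}" for code, meaning in sorted(entry.values.items()))
        writer.writerow([entry.name, entry.label, entry.unit, codes])


def undocumented_columns(data_header, entries):
    """Return the data columns that have no codebook entry."""
    documented = {entry.name for entry in entries}
    return [column for column in data_header if column not in documented]


# Hypothetical example variables
ENTRIES = [
    CodebookEntry("bmi", "Body mass index", "kg/m2"),
    CodebookEntry("gender", "q2: gender of respondent",
                  values={1: "female", 2: "male", 3: "other", 9: "missing"}),
    CodebookEntry("q11bhexw", "q11b: hours spent taking physical exercise in a typical week", "hours"),
]

buffer = io.StringIO()
write_codebook(ENTRIES, buffer)
print(buffer.getvalue())
print(undocumented_columns(["bmi", "gender", "q11bhexw", "income"], ENTRIES))
```

In practice you would write to a real file opened with `open("codebook.csv", "w", newline="")` instead of an in-memory buffer; the column check is cheap enough to run every time the data file changes.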
Data level documentation should also seek to document the processing steps, answering questions such as:
What happens between data files and why?
What is the chronology like? What happens when, and why?
Use annotated scripts or cookbooks that describe all steps, decisions and the study protocol.
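An annotated processing script can record not only what was done to the data but also why, in the order it happened. The following is a minimal sketch under assumed conditions: the step functions, the log structure and the raw data are hypothetical, and the missing-value code 99 is an assumed questionnaire convention rather than a general rule.

```python
def log_step(log, description, reason):
    """Append a numbered processing step together with the reason it was taken."""
    log.append({"step": len(log) + 1, "description": description, "reason": reason})


def drop_duplicate_ids(rows, id_column, log, reason):
    """Keep the first row for each identifier and record how many rows were dropped."""
    seen = set()
    kept = []
    for row in rows:
        if row[id_column] not in seen:
            seen.add(row[id_column])
            kept.append(row)
    log_step(log, f"Dropped {len(rows) - len(kept)} duplicate row(s) on '{id_column}'", reason)
    return kept


def recode_missing(rows, column, missing_code, log, reason):
    """Replace a numeric missing-value code with None so it is not analysed as a real value."""
    recoded = []
    count = 0
    for row in rows:
        new_row = dict(row)  # copy, so the raw rows are not modified in place
        if new_row.get(column) == missing_code:
            new_row[column] = None
            count += 1
        recoded.append(new_row)
    log_step(log, f"Recoded {missing_code} to missing in '{column}' ({count} row(s))", reason)
    return recoded


# Hypothetical raw data
raw = [
    {"id": 1, "q11bhexw": 3},
    {"id": 2, "q11bhexw": 99},
    {"id": 2, "q11bhexw": 99},
]
log = []
deduplicated = drop_duplicate_ids(raw, "id", log, reason="Respondent 2 was entered twice during data entry")
clean = recode_missing(deduplicated, "q11bhexw", 99, log,
                       reason="Code 99 means 'no answer' in the questionnaire")

# The printed log doubles as a chronology of the processing steps
for entry in log:
    print(f"{entry['step']}. {entry['description']} (reason: {entry['reason']})")
```

Passing the log explicitly to each step keeps the chronology in one place, so it can be saved alongside the processed data file.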
Variable or Item Level
Variable or item level documentation documents how an object of analysis came about. For example, it does not just document a variable name at the top of a spreadsheet file, but also the full label explaining the meaning of that variable in terms of how it was operationalised.
Best practices regarding variable names:
Use valid variable names
Meaningful abbreviations, e.g. use bmi, not var1
Refer to numbering system in instrument, e.g. q1a, q1b, q2, q3a
Avoid simplistic numerical order system like v1, v2, v3
Keep names short, with no spaces or special characters, and in lower case (e.g. gender, not Gender)
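The naming practices above can be checked automatically before a dataset is shared. The sketch below is one possible approach and makes several assumptions. The 32-character limit is Stata's maximum for variable names, and other software uses different limits. The pattern of lower-case letters, digits and underscores is one reading of "valid". The generic-name pattern (`v1`, `var1`, ...) is also a hypothetical choice. Adjust all three to the rules of your own analysis software.

```python
import re

MAX_NAME_LENGTH = 32  # assumption: Stata's limit; adjust for your software
VALID_NAME = re.compile(r"[a-z][a-z0-9_]*")  # lower case, starts with a letter, no spaces
GENERIC_NAME = re.compile(r"(v|var)\d+")  # simplistic sequential names such as v1, var1


def check_variable_name(name):
    """Return a list of problems with a variable name (an empty list means no problems found)."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"longer than {MAX_NAME_LENGTH} characters")
    if not VALID_NAME.fullmatch(name):
        problems.append("use lower-case letters, digits and underscores only, starting with a letter")
    if GENERIC_NAME.fullmatch(name.lower()):
        problems.append("generic sequential name; use a meaningful abbreviation or the question number")
    return problems


for candidate in ["bmi", "q1a", "var1", "Gender", "body mass"]:
    print(candidate, check_variable_name(candidate))
```

Returning a list of problems rather than a single true/false value makes it easier to report every issue with a name at once.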
Best practices regarding variable descriptions: Variables in tabular data should have descriptive labels.
Be brief, max. 80 characters
Spaces or special characters are ok
Include unit of measurement where applicable
Refer to the question number used in the instrument, e.g. the variable q11bhexw with the label "q11b: hours spent taking physical exercise in a typical week". This description gives both the unit of measurement and a reference to the question number (q11b).
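The label guidelines can also be checked in part automatically. The sketch below enforces the 80-character maximum from the list above and, as a rough heuristic, checks that a label starts with the question number when the variable name begins with one. It assumes names follow the pattern in the q11bhexw example (question number, optional sub-question letter, then a mnemonic). Under that pattern, a name such as q11hexw would be misread as question "q11h", so treat the second check as a hint rather than a rule. The function name and regular expression are hypothetical.

```python
import re

MAX_LABEL_LENGTH = 80
# Assumed naming pattern: "q", digits, optional sub-question letter (e.g. q11b in q11bhexw)
QUESTION_PREFIX = re.compile(r"q\d+[a-z]?")


def check_variable_label(name, label):
    """Return a list of problems with a variable label (an empty list means no problems found)."""
    problems = []
    if not label.strip():
        problems.append("label is empty")
    if len(label) > MAX_LABEL_LENGTH:
        problems.append(f"label is {len(label)} characters; keep it to {MAX_LABEL_LENGTH} or fewer")
    match = QUESTION_PREFIX.match(name)
    if match and not label.startswith(match.group(0)):
        problems.append(f"label should start with the question number '{match.group(0)}'")
    return problems


print(check_variable_label("q11bhexw", "q11b: hours spent taking physical exercise in a typical week"))
print(check_variable_label("q11bhexw", "hours of exercise"))
```

A unit-of-measurement check is harder to automate reliably, so it is left to manual review here.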