What to archive?
Start thinking early about which data to select for long-term preservation.
You are responsible for selecting the subset of your dataset to preserve for the long term. The actual choices often depend on the specifics of the research involved, but some general guidelines can be given:
Data archiving concerns the final version of the original data, as far as they are relevant for verification of the conclusions or scholarly reuse.
As a general rule, the processed data that were used for a publication are archived, as well as the original source data. The “processed data” include the data used for creating maps, graphs, tables, or other analyses in publications, while “source data” may be the original excavation database, photographs, interviews, and satellite images. Sometimes one or more intermediate stages are also worth preserving, for instance when they can be reused more easily than the actual source data or when substantial effort is needed to create them (e.g. interview transcriptions).
When making a selection, ask yourself:
- Is it really data? Files like PowerPoint presentations are probably not needed for long-term preservation.
- Are the data already archived as part of another project, or with a publication?
- Are the data useful for future research?
- Are the data directly relevant for your publications?
- Are the data relevant for the verification of your results and conclusions?
- Are the data unique (impossible to recreate)?
- Are the data valuable (for instance, because of the effort needed to recreate them or their cultural heritage value)?
- Are there any obligations for long-term storage?
- Are the data subject to intellectual property rights?
- Are the data subject to privacy restrictions or similar limitations?
- Do you have source code that is necessary for reproducing your results or understanding your data?
Unique data: the data contain non-repeatable observations that are important for academic and/or non-academic purposes.
Valuable data: the data have potential value in terms of reuse, national/international standing and quality, originality, size, scale, costs of data production, or innovative nature of the research.
As a general rule, keep in mind that the data you deposit will be reused in the future, by you or by another researcher, so preserve them in a way that makes new analyses possible. First, store the data in as raw a form as possible; second, add enough supporting information to make them comprehensible.