Archiving and long-term storage of data
Archiving ensures that data remains available over time, including after a research activity has ended.
This is important for many reasons:
- It enables future re-use of the data, both by yourself and by others.
- It supports validation of your research results by allowing peer reviewers to examine your data.
- It lets you show that your scientific results are based on high-quality data and that you have followed good scientific practice, should anyone question your results.
Archiving is also a legal requirement: you must archive any data that is regarded as an official document or that has cultural heritage value. Data that is not processed further to produce results may be discarded, provided it is not needed to disseminate the results. Data assessed as having high value for future research should also be preserved. Data that needs to be preserved is handled in a much more structured process to ensure that it remains available over a very long time frame.
What data in my project counts as an official document?
Your data is regarded as an official document when it forms the basis for the derived research results. Examples of research information regarded as official documents are:
- Surveys and survey results
- Audio and media recordings
- Laboratory notebooks
- Resulting validated measurements from instruments and sensors
In general, information that is only intermediary material in an ongoing research process may be discarded if it is not necessary for interpreting or disseminating your research results. For example, if you perform measurements and then realise that the instrument was not calibrated properly, the collected data has no further value for analysis and can be discarded.
For how long do I need to store the data after the end of a project?
The main rule is that data regarded as official documents and forming the basis of your scientific results needs to be kept for at least 10 years after the end of a research project. For EU-funded projects, a stricter requirement of 15 years applies. In clinical trials, data must be kept for 25 years.
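The retention periods above can be turned into a simple date calculation. The following is a minimal sketch, not an official KTH tool: the category labels and the function name `retention_end` are hypothetical, and it assumes that the period is counted from the project end date.

```python
from datetime import date

# Minimum retention periods in years, as stated in the text above.
# The category labels are hypothetical names chosen for this sketch.
RETENTION_YEARS = {
    "default": 10,
    "eu_funded": 15,
    "clinical_trial": 25,
}


def retention_end(project_end: date, category: str = "default") -> date:
    """Return the date on which the minimum retention period has passed.

    Assumption: the period is counted from the project end date.
    """
    target_year = project_end.year + RETENTION_YEARS[category]
    try:
        return project_end.replace(year=target_year)
    except ValueError:
        # The project ended on 29 February and the target year is not a
        # leap year, so fall back to 28 February.
        return project_end.replace(year=target_year, day=28)


print(retention_end(date(2024, 6, 30)))               # 2034-06-30
print(retention_end(date(2024, 6, 30), "eu_funded"))  # 2039-06-30
```

The result is only the earliest date at which the minimum period has passed. Data assessed as having long-term value may need to be preserved for longer.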
Most but not all official documents can be public – if information is classified as confidential, access to the data needs to be restricted. Ideally, you should classify your data when you collect it. Since KTH is a public authority, anyone can request access to such data, and the request will be either approved or denied.
Read more on official documents (only in Swedish)
There are also other more administrative documents that need to be archived, where a local administrator may help you.
Read more on what information to archive from your research in the KTH Information management plan.
When should I archive research data?
A good practice is to document how data is collected, processed and analyzed from the very start. It is easier to re-use data for further research if the different processing steps are documented and data from the different steps is deposited for archiving. The data should be described with sufficient information on the context of its collection or generation. For more details on how to do this, read more on the page Plan and document.
Where and how can I archive research data?
The general recommendation is to use a high-quality repository for research data.
KTH provides a service for this, KTH Data Repository. Here you can apply for a so-called "community" where you can deposit documentation and static versions of data and source code for your research project. When it is time to wrap up the project, you can assess whether the data should be kept in an internal archive or published as open data.
For some research domains, there are trusted, high-quality domain-specific repositories where you can deposit data. In some fields, journals also require you to use specific repositories. Such repositories usually have specific guidelines on formats and documentation for that type of domain-specific data. There are also some generic repositories for open data that are deemed to ensure long-term storage for at least ten years. In some scientific domains, data can also be published in specific data journals.
However, KTH still needs to keep a registry of where that data is deposited. One criterion for a high-quality repository is that data is accessible in a machine-readable format. This makes automatic harvesting of the data possible, and KTH can harvest information from such repositories into a KTH registry. A guide listing such repositories is available.
If you choose a repository that meets the guide's criterion of machine-readable formats, you don't have to do anything more after depositing data there. If you choose any other location for archiving the data from your research, either deposit a copy of the data in KTH Data Repository or create a metadata record there that links to the location of your data.
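The choice described above, and what a linking metadata record might contain, can be sketched as follows. This is an illustration only: the function names, the field names and the example URL are hypothetical and do not reflect the actual KTH Data Repository schema or API.

```python
import json


def follow_up_action(repository_meets_criteria: bool) -> str:
    """Return the follow-up step after depositing data, per the rule above."""
    if repository_meets_criteria:
        return "No further action needed."
    return ("Deposit a copy of the data in KTH Data Repository, "
            "or create a metadata record there linking to the data.")


def build_link_record(title: str, description: str, location_url: str) -> str:
    """Build a minimal, machine-readable record pointing to externally archived data.

    The field names are hypothetical, chosen only for this sketch.
    """
    record = {
        "title": title,
        "description": description,
        "data_location": location_url,
    }
    return json.dumps(record, indent=2, ensure_ascii=False)


print(follow_up_action(repository_meets_criteria=False))
print(build_link_record(
    title="Sensor measurements, example project",
    description="Validated measurements archived in an external repository.",
    location_url="https://example.org/dataset/123",  # placeholder URL
))
```

A real record would follow whatever metadata fields the repository asks for. The point of the sketch is that the record is structured and machine-readable, so the location of the data can be picked up automatically.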
You are always welcome to contact researchdata@kth.se for more advice on data repositories!
For official documents and data that needs to be preserved for future use, additional processing of data may be needed. Contact the KTH Archive for further support on preservation.