Research Data Management

If you have any questions about research data management and research data plans, please don't hesitate to contact us.

 

Research data is "data collected or produced (e.g. measurements, questionnaires or source materials) in the course of scholarly activity which is used for the purposes of academic research (e.g. digital copies) or which document research findings [...]" (source: forschungsdaten.info).

Some examples of research data

  • Sources: Texts, images, sound recordings, films/videos
  • Observations: Real-time data, examination data
  • Experiments: Laboratory values, spectrograms
  • Simulations: Simulation measurements, model measurements
  • References: Collection of already published datasets
  • Methods: Questionnaires, software, simulations

A data management plan (DMP) forms the basis of good research data management. The DMP describes the life cycle of research data and is intended for long-term use. It describes how the data is to be produced, collected, documented, published and archived during a project.

An integral part of a DMP is the description of the research data in accordance with the FAIR principles. Among other things, a DMP should include information about the following:

  • Data collection and documentation
  • Ethical, legal and security issues
  • Data storage and preservation
  • Data exchange and reuse

The DMP is submitted along with the project proposal (SNSF) or shortly after the project has commenced (Horizon 2020), and it should be updated and extended at regular intervals. As it describes discipline-specific practices and standards, the content may differ from project to project.

Further information about requirements can be found under Funding requirements.

More and more funding agencies and institutions require publications and research data to be provided on an open access basis.

For SNSF projects, the submission of a data management plan is a mandatory part of the research proposal.

Research data must be archived in open access repositories, unless there are legal, ethical, copyright or other constraints on data sharing.

For Horizon 2020 projects, the data management plan must be submitted within the first six months of starting the project.

Research data must be archived in open access repositories, unless there are legal, ethical, copyright or other constraints on data sharing. Justified opt-outs are possible.

Funders are aware that publishing data can entail additional expense, so some costs are eligible for funding.

For SNSF projects, up to CHF 10,000 of the costs for enabling access to research data may be eligible if the conditions are met (Funding Regulations 2.13). Such costs should be included at the time of application.

Costs incurred in order to enable access to research data collected, observed or generated in connection with Horizon 2020 projects are eligible for reimbursement during the course of the project.

To avoid errors, mix-ups and long search times later on, it is worth investing some time at the very start of a project in creating a systematically organized file and folder structure. This is especially important if you are collaborating with other research groups. Everyone involved in a project should agree to a scheme and stick to it. It is advisable to record the organizational and naming scheme in a document which you subsequently deposit with the published data as an accompanying document.

  • Group related files in folders (e.g. for measurements, methods or project phases)
  • Use clear, unique folder names
  • Use a hierarchical folder structure (N.B.: too many nested levels result in long and complicated file paths)
  • Keep active and completed work in separate folders and delete any temporary files that are no longer required.
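The folder recommendations above can be sketched in a few lines of Python. This is only an illustration: the folder names in `LAYOUT` and the helper `create_layout` are hypothetical examples, not a prescribed standard, and should be replaced by the scheme your project agrees on.

```python
from pathlib import Path
import tempfile

# Hypothetical layout: related files are grouped, active and completed work
# are kept apart, and the hierarchy stays shallow to avoid long file paths.
LAYOUT = [
    "01_active/measurements",
    "01_active/methods",
    "02_completed",
    "03_documentation",
]


def create_layout(root: Path) -> list[Path]:
    """Create the folders under root and return them; existing folders are kept."""
    created = []
    for relative in LAYOUT:
        folder = root / relative
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Demonstration in a temporary directory so nothing is written to a real project.
with tempfile.TemporaryDirectory() as tmp:
    for folder in create_layout(Path(tmp) / "example_project"):
        print(folder.relative_to(tmp))
```

Keeping the layout in one list makes it easy to copy into the document that records the project's organizational scheme.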

Make sure you use file names that are unique and are also meaningful for people who are not involved in the project. General elements that can form part of a name:

  • Creation date (YYYY-MM-DD)
  • Project reference/name
  • Description of the content
  • Name of creator (initials or whole name)
  • Name of research team/department
  • Version number


To avoid operating system constraints, use the following character/naming conventions:

  • Short names
  • No special characters (: & * % $ £ ] { ! @)
  • Use underscores _ rather than blank spaces or dots
  • Include a file suffix wherever possible (.txt, .xls, etc.)
  • Do not rely on uppercase/lowercase distinctions
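As an illustration of the naming elements and character conventions above, the following Python sketch assembles a file name from its parts. The functions `clean_component` and `build_file_name`, the order of the elements and the choice to lowercase everything are assumptions made for this example; the example values are invented.

```python
import re
from datetime import date

# Anything other than ASCII letters, digits and hyphens is treated as unsafe.
_UNSAFE = re.compile(r"[^A-Za-z0-9-]+")


def clean_component(text: str) -> str:
    """Lowercase a name element and replace runs of unsafe characters with '_'."""
    return _UNSAFE.sub("_", text.strip()).strip("_").lower()


def build_file_name(created: date, project: str, description: str,
                    creator: str, version: int, suffix: str) -> str:
    """Join the name elements with underscores and append the file suffix."""
    parts = [
        created.isoformat(),          # creation date as YYYY-MM-DD
        clean_component(project),
        clean_component(description),
        clean_component(creator),
        f"v{version:02d}",            # zero-padded version number
    ]
    return "_".join(parts) + "." + suffix.lstrip(".").lower()


print(build_file_name(date(2017, 11, 17), "Soil Survey", "pH measurements",
                      "AB", 3, ".csv"))
# prints: 2017-11-17_soil_survey_ph_measurements_ab_v03.csv
```

Lowercasing every element is one simple way of not relying on uppercase/lowercase distinctions; a project may of course agree on a different rule.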

The careful choice of a file format can ensure that files can still be used after many years and consequently greatly facilitate reuse of the research data. When choosing a suitable format, various factors should be taken into consideration:

  • Future-proofing: how many software products can read the data format?
  • Open access to documentation
  • No legal constraints (patents)
  • No technical constraints (encryption, DRM)
  • Established in community


The file formats for research data can vary widely depending on the discipline in question. The following file formats are recommended:

  • Images: TIFF, TIF
  • Documents: TXT, ASC, PDF/A
  • Tabular data: CSV
  • Audio files: WAV
  • Databases: SQL, XML
  • Structured data: XML, JSON, YAML


Further information about which file formats are recommended for long-term preservation can be found here.

It is essential to use version control, especially for datasets that change over the course of a project. Individual datasets should be named sequentially and the names should include the save date (YYYY-MM-DD) along with the version number. The final version should be indicated as such. Maintaining a version table in which all changes and new names are recorded can help keep track of the datasets.

Especially when working with a number of different people, it may be advisable to regularly save a milestone version of the file which then must not be changed or deleted.

To summarize, forschungsdaten.info recommends:

  • Use sequential numbering
  • Include the date and version number in the name
  • Use a version control table
  • Specify who is responsible for providing the final files
  • Use version control software for large data volumes
  • Save milestone versions
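A version table does not need special software. The Python sketch below shows one possible way to generate versioned names and keep the table as CSV text. The helper names, the `_final` tag and the table columns are assumptions for this example, and the recorded changes are invented.

```python
import csv
import io
from datetime import date


def versioned_name(stem: str, saved: date, version: int, suffix: str,
                   final: bool = False) -> str:
    """Build a name with save date and version number; final versions are tagged."""
    tag = "_final" if final else ""
    return f"{stem}_{saved.isoformat()}_v{version:02d}{tag}.{suffix}"


def version_table(rows: list[dict]) -> str:
    """Render the recorded versions as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["file", "date", "version", "change"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented example history of one dataset.
history = [
    (date(2017, 11, 1), "initial export", False),
    (date(2017, 11, 9), "outliers removed", False),
    (date(2017, 11, 17), "approved for publication", True),
]
rows = []
for number, (saved, change, final) in enumerate(history, start=1):
    rows.append({
        "file": versioned_name("survey", saved, number, "csv", final),
        "date": saved.isoformat(),
        "version": number,
        "change": change,
    })
print(version_table(rows))
```

For large data volumes or many collaborators, dedicated version control software is likely to be more robust than a hand-maintained table.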


Further information and best practices

We recommend you back up your data using the university's IT system as it collects the data campus-wide and redundantly backs it up to two state-of-the-art tape libraries.

Click here for more information: Campus Backup/Archive

You should always adopt the 3-2-1 backup strategy:

  • 3 copies of the data (1 original + 2 backups)
  • Stored on 2 different types of media (external hard drives, USB sticks, SD cards, CDs, DVDs, Cloud)
  • 1 copy off-site


Backup should be automated to run at regular intervals. Check that the backup was successful and that the data can be retrieved again if necessary.
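One simple way to check that a backup copy can be trusted is to compare checksums of the original and the copy. The sketch below uses SHA-256 from the Python standard library; the function names are hypothetical, and the demonstration files are created in a temporary directory. It does not describe how the campus backup system itself verifies data.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original: Path, backup: Path) -> bool:
    """True if both files have identical content according to SHA-256."""
    return sha256_of(original) == sha256_of(backup)


# Demonstration with throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "data.csv"
    original.write_text("id,value\n1,0.5\n", encoding="utf-8")
    backup = Path(tmp) / "data_backup.csv"
    shutil.copy2(original, backup)
    print(backup_matches(original, backup))   # identical copy
    backup.write_text("corrupted", encoding="utf-8")
    print(backup_matches(original, backup))   # content no longer matches
```

A matching checksum shows that the copy is intact; it is still worth restoring a file from the backup now and then to confirm that retrieval works.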

Comprehensive documentation is essential to enable correct interpretation and reuse of the data at a later date. Among other things, the documentation should include details about the time and place the data was collected, the methods, tools, software and statistical models used, as well as information about the parameters chosen and any missing values, along with nomenclature and acronyms.

Click here for further information.

Metadata is information about data which is created in a structured and machine-readable form. The metadata helps other researchers find and reuse data. Depending on the particular discipline, there are various commonly used metadata standards and tools that can be used to describe datasets in different domains.  

The repository of the University of Bern (BORIS) uses the Dublin Core metadata element set. This metadata is automatically generated by filling in a form when depositing a dataset in the repository.
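To give an idea of what structured, machine-readable metadata looks like, the following Python sketch writes a few Dublin Core elements as XML. It is an illustration only: the wrapper element `metadata`, the function name and all field values are invented, and the sketch does not show how BORIS stores or exports its records.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, str]) -> str:
    """Serialize element name/value pairs as a small Dublin Core XML record."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Invented example values for an imaginary dataset.
record = dublin_core_record({
    "title": "Example soil measurements",
    "creator": "Muster, A.",
    "date": "2017-11-17",
    "type": "Dataset",
    "rights": "CC BY 4.0",
})
print(record)
```

Because every value sits in a named element, other software can find the title, creator or date without having to interpret free text.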

The decision about what data for a project should be archived and for how long depends on the academic value of the data as well as on legal, regulatory and financial factors.

As a minimum, however, all the data on which a publication is based must be stored and the corresponding metadata must be published online.

The Digital Curation Centre (DCC) and forschungsdaten.info list five steps for deciding what data to keep.

Guide (English)

Guide (German)

Wherever possible, data should be deposited in disciplinary repositories. These are designed to meet the needs of the particular field, are aware of specific data formats and often also offer specific disciplinary metadata.

On its website, the SNSF provides a checklist (pt. 5.1) that you can use to check whether your chosen repository complies with the FAIR principles.

The best starting point for finding a suitable repository is the Registry of Research Data Repositories (re3data.org). The Open Access Directory and PLoS also provide an extensive list of data repositories.

The following repositories adhere to the FAIR principles and are approved by SNF. They are open to researchers in all disciplines. The list is not exhaustive.

BORIS satisfies the requirements for a repository as specified by the SNSF. It is therefore possible to use BORIS to store supplementary data for publications for which there is as yet no disciplinary repository. However, BORIS is not suitable for large raw datasets. We recommend you store these in a general repository.

There are plans to create an institutional data repository (BORIS Research Data).

Before being published, data must be provided with a license. Wherever possible, we recommend choosing an open license such as CC0 or CC BY. You can find more information about Creative Commons licenses here.

As part of the FAIR principles, funding bodies require a unique identifier to be assigned to the published data. When depositing your data in BORIS, a Digital Object Identifier (DOI) is assigned to each dataset. Click here for further information.

Research data generated and collected during a project can often be useful beyond its original purpose. It is therefore worthwhile making the data obtained publicly accessible. For this purpose it is important to ensure that your data is assigned persistent identifiers, good metadata is generated and sufficient documentation is provided to enable the data to be reused.
There are currently three ways of publishing research data.

Research data can be published in a disciplinary or a general repository. If possible, it is preferable to publish data in a disciplinary repository rather than in a generic one. Further information about selecting a suitable repository can be found in Finding a repository.

Data papers published in data journals are documents that facilitate the dissemination and reuse of published data. These publications contain all information about data collection, methods, licenses and access rights along with information about potential reuse opportunities. The data itself is usually deposited in a repository.

The website of the Humboldt University of Berlin has a list of data journals.

Data can also be published as supplementary material for a journal article. This is usually the data on which the publication is based and which allows readers to follow how the findings were reached. The data may be deposited either directly on the journal's platform or in an external data repository.

When citing data it is advisable to use either the standards applicable to the research field in question or the form suggested by the repository in which the dataset was deposited. If there are no particular standards or recommendations, DataCite recommends providing the following details as a minimum:

  • Author
  • Year of publication (of the dataset)
  • Title
  • Edition or version (optional)
  • Publisher (for data this is usually the archive in which the data is stored)
  • Resource type (optional)
  • Persistent identifier (as a permanent linkable URL)
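The elements listed above can be combined into a citation string, for example as in the Python sketch below. The class `DatasetCitation`, the punctuation and the example values (including the DOI, which is a placeholder) are assumptions for illustration; follow the form required by your field or repository where one exists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Minimum citation details, with the two optional elements defaulting to None."""
    author: str
    year: int
    title: str
    publisher: str
    identifier: str
    version: Optional[str] = None
    resource_type: Optional[str] = None

    def format(self) -> str:
        """Join the available elements in the order of the list above."""
        parts = [f"{self.author} ({self.year})", self.title]
        if self.version:
            parts.append(f"Version {self.version}")
        parts.append(self.publisher)
        if self.resource_type:
            parts.append(self.resource_type)
        parts.append(self.identifier)
        return ". ".join(parts)


# Invented example; the DOI is a placeholder, not a real dataset.
citation = DatasetCitation(
    author="Muster, A.",
    year=2017,
    title="Example soil measurements",
    publisher="Example Repository",
    identifier="https://doi.org/10.1234/example",
    version="1.0",
    resource_type="Dataset",
)
print(citation.format())
```

Giving the persistent identifier as a full URL means that readers can follow the citation straight to the dataset.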

Is your data ready to be archived? Take a look at the pre-deposit checklist.

If you have any questions please contact us using our general e-mail address researchdata@ub.unibe.ch.

This e-mail address is monitored by everyone in the research data management team so we can answer your queries efficiently.

November 2017

Workshop: Research Data Management and Data Management Plan training course with Sarah Jones (University of Glasgow, Digital Curation Centre)
In this workshop, the key requirements for European research projects will be discussed, practical advice on Data Management Plans (DMPs) will be given, and a special focus will be placed on “best practices” in managing data.
Speaker: Sarah Jones
The number of participants for this workshop is limited.
Date & time: Friday, 17 November, 9:15-12:15, University of Bern, Hallerstrasse 6, Room 205

Recent events

SNSF informs about Open Research Data and Data Management Plans

As of October 2017 SNSF funding applications must include a Data Management Plan (DMP). Additionally, the SNSF expects that research data will be published and made freely available whenever possible.
The SNSF will hold a briefing session to provide researchers with practical information on the SNSF Open Research Data Policy and Data Management Plans.
Date & time: Thursday, 2 November 2017, 12:30
Location: Room A003 at UniS, University of Bern
Language: English
Podcast: https://boris.unibe.ch/106848/