A data management plan (DMP) is compulsory for almost all research projects funded by either the Research Council of Norway or the European Commission. An effective data management plan can save the large expense of repeating work and opens up the possibility of new research collaboration based on secondary processing of data. The scope of the present document is the data generated through research projects in the waterpower laboratory. The document describes the data management plan and the permanent repository for data generated in the waterpower laboratory through associated research projects and education. The main purpose is a data repository that allows the next generation of data to be built on, and upgraded from, existing data. The main source of the data is research conducted by master’s students, PhD candidates and postdocs in various research projects. The data management plan and the repository are essential for continuing research in the right direction and for building, block by block, towards the final technological output. The system also helps to avoid starting from scratch with each new project. New students and researchers will be able to look into previous data sets, analyze the data and then continue with new research without repeating work that has already been carried out.
This document provides summary information on the research data management plan, storage type and scope. The majority of the data falls in the category of engineering and physical science, which means that no personal or health data are included. During a project period, data are obtained in multiple layers, i.e., raw data, main (processed) data and published data (research paper, dissertation, thesis, book, etc.). The published data are deposited in the institutional repository of NTNU, including Cristin; other data can be stored locally, which allows continuous development of tools and techniques in the laboratory. The data are sorted into different categories, such as structured data, descriptive data, primary data (main data) and secondary data. Research publications and the data presented in them are fully open access and offer metadata search options to the international research community. The permanent repository in the waterpower laboratory, however, is intended for storing large data sets from experiments and simulations, as well as programs and numerical codes that are continuously developed. All these data shall be structured systematically and managed locally in the waterpower laboratory. Both raw and main data are to be stored continuously as project work progresses, and the lifetime of the data depends on the needs of the project as well as the possibility of future development.
The data management plan
ENGINEERING AND PHYSICAL SCIENCE
Background
A DMP is a document describing how data in a research project will be managed, from project start-up, throughout the research process and in the time after completion of the project. A DMP describes what type of data will be collected. The plan states how the data will be stored, described with metadata, analyzed and, if possible, shared. The plan also addresses issues related to rights, privacy and costs. A DMP is a tool for planning and raising awareness and should be a "living document" which is updated during the research project depending on need. A good plan for how to organize and describe the data can make the project work more effective by making it easier to understand and work with the data, especially in larger research projects, such as FME centres. Good documentation and data management contribute to increased data quality, as well as verifiability and reuse.
The Research Council of Norway has a policy for Open Access to Research Data, under which the projects it funds should, as a standard, have a DMP. Similarly, the European Commission has certain requirements related to data management and open access for various project schemes. It is therefore important to have a DMP in place. The waterpower laboratory aims to manage research data according to international standards, such as the FAIR principles and the CARE principles, as well as the NTNU Open Data concept, and thereby support the development of a global research community in which research data are widely shared. An outstanding example of open data is the Francis-99 series of open data, which was initiated in 2010, well before the existing open data policies. Under Francis-99, valuable experimental data collected in the waterpower laboratory are openly available to the research community.
A data management plan:
- is a document that describes how the data is to be managed both during the research project and after it has been completed;
- makes it possible to identify at an early stage significant problems to be resolved (such as obtaining consent or taking copyright into consideration);
- identifies ahead of time any additional costs or resources needed to manage the data (such as additional storage capacity, etc.);
- helps to plan the need for data management ahead of time and to monitor data activities throughout the lifetime of the project.
Summary of the national strategy on access to and sharing of research data
Research data should be shared and reused more widely
Better access to research data can boost innovation and value creation by enabling actors outside the research community to find new areas of application. Another benefit that is important in its own right is that greater transparency and insight into research can help to increase confidence in researchers and research findings. In order to make research data more available and increase reuse, researchers need the competence and tools to manage data in a sound, secure manner throughout all steps of the research process. They must have the infrastructure needed for collecting, analyzing, archiving and sharing data, as well as access to clear information about this infrastructure. The infrastructure in place must lay a foundation for cooperation and knowledge-sharing that extends across countries and sectors. It should be easy for international researchers to find Norwegian data sets.
This strategy does not cover research data from privately funded research and development activity. According to the basic principles, in cases where private actors are granted public funding for research or cooperate with public research institutes, universities, university colleges or hospitals on research and innovation projects that are publicly funded, it is possible to restrict access to data to protect trade secrets or when this is necessary in connection with commercialization of results. It will be up to private actors to assess this from case to case.
What does “publicly funded research data” mean?
- Data collected or generated for use for or as a result of publicly funded research.
- Data underpinning publications that are the result of publicly funded research, regardless of the source of the data.
Figure 1. Research data types and collection.
Basic principles
- Research data must be as open as possible, as closed as necessary.
- Research data should be managed and curated to take full advantage of their potential.
- Decisions concerning archiving and management of research data must be taken within the research community.
Government expectations and measures
The Government expects:
- the research institutions to work to raise the competency of their staff and students by providing training in data management and reuse of data.
- the research institutions to consider taking part in national and Nordic cooperation with a view to establishing educational programs for research data management and stewardship.
- the research institutions to develop procedures for (i) approving data management plans and (ii) determining whether a given research project is of a type for which an individual data management plan is not necessary or suitable.
- research institutions, administrators of research data infrastructure and researchers to work towards standardization and harmonization that facilitate sharing and reuse of data in accordance with international standards and best practice in different subject areas (for example, by establishing national, subject field-based communication arenas).
- the development of self-service solutions, when feasible, to reduce the costs of operating research data infrastructure by simplifying processes for depositing and accessing research data and metadata.
Summary from the Research Council of Norway
The policy for research data management is derived from the national strategy. Projects that receive funding from the Research Council are to assess the need to draw up a data management plan. As a general rule, R&D-performing institutions themselves are responsible for determining which archiving solution to use. If the project owner decides that a data management plan is necessary, it should develop such a plan in line with the institution's own guidelines. This plan should be submitted in connection with the revision of the application. Whenever possible, data management plans should be available to the public and be openly published by the research institution so that the academic environment may be able to follow the practice of its colleagues. Under certain circumstances, the Research Council is entitled to stipulate storage of data and/or metadata in specific national or international archives. For example, in connection with certain relevant projects in the fields of social science, humanities, medicine and health, and environmental and development research, the Research Council asks that data be archived at the Norwegian Centre for Research Data (NSD).
Recommendation and guideline
The data management plan is a living document that follows the research project and specifies the following: (1) the kind of data that will be generated, (2) how the data will be described, (3) where the data will be stored and (4) whether and how the data can be shared. The purpose is to plan how to safeguard the research data, not just during the project period, but also for future reuse of the data. A data management plan is an effective means of identifying costs associated with data management and storage and can also help you to plan how to cover these costs. Data management plans are to be made public and openly accessible. This will promote greater openness and enable scientific groups to follow peer practice. The research data that are stored must be of a quality that makes them possible to find and reuse. The Research Council recommends that you follow the international FAIR principles. In keeping with the FAIR principles, research data must be findable, accessible, interoperable and reusable. Interoperability entails that both data and metadata must be machine-readable and that a consistent terminology is used.
Current open access requirements
At present, the Research Council stipulates the following requirements:
- All articles from projects funded by the Research Council are to be made available in open repositories. The deadline for doing so is six months after date of publication in journals related to medicine, health, mathematics, natural science or technology.
- Storage in relevant repositories will normally be carried out in connection with registering and uploading the complete text via the CRIStin Research Information System.
The Research Council Stimulation Scheme for Open Access Publication (STIM-OA) provides support to institutions for activities to make publications openly accessible.
How the STIM-OA scheme works:
- Institutions may apply for funding to cover costs for open access publication independently of how the research activity was funded.
- The STIM-OA scheme covers Article Processing Charges in connection with publication in open access journals.
- The scheme covers up to 50 per cent of the total costs incurred by the institutions, to be disbursed in arrears.
- The Research Council will not cover publication costs for articles as part of the budget for an individual project.
Funding under the scheme will be available through 2022.
Summary from the EU, H2020
Projects funded by H2020 are required to develop a data management plan within six months of receiving funding. In the plan you will be asked to specify: (1) what data will be open, (2) what data the project will generate/use, (3) how the data will be utilised or made available for verification and reuse and (4) how the data are organised and stored.
Core requirements for data management plans (borrowed from Science Europe):
- Data description and collection or re-use of existing data
  - How will new data be collected or produced and/or how will existing data be re-used?
  - What data (for example the kinds, formats, and volumes) will be collected or produced?
- Documentation and data quality
  - What metadata and documentation (for example the methodology of data collection and way of organising data) will accompany data?
- Storage and backup during the research process
  - How will data and metadata be stored and backed up during the research process?
  - How will data security and protection of sensitive data be taken care of during the research?
- Legal and ethical requirements, codes of conduct
  - If personal data are processed, how will compliance with legislation on personal data and on data security be ensured?
  - How will other legal issues, such as intellectual property rights and ownership, be managed? What legislation is applicable?
  - How will possible ethical issues be taken into account, and codes of conduct followed?
- Data sharing and long-term preservation
  - How and when will data be shared? Are there possible restrictions to data sharing or embargo reasons?
  - How will data for preservation be selected, and where will data be preserved long-term (for example a data repository or archive)?
  - What methods or software tools will be needed to access and use the data?
  - How will the application of a unique and persistent identifier (such as a Digital Object Identifier (DOI)) to each data set be ensured?
- Data management responsibilities and resources
  - Who (for example role, position, and institution) will be responsible for data management (i.e. the data steward)?
  - What resources (for example financial and time) will be dedicated to data management and ensuring that data will be FAIR (Findable, Accessible, Interoperable, Re-usable)?
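The six core topics above can be used as a simple completeness checklist for a draft plan. The following is a minimal sketch in Python, not part of the Science Europe guidance itself: the function name `missing_topics`, the dictionary layout and the example answers are assumptions made for illustration.

```python
# Hypothetical sketch: the six Science Europe core topics used as a checklist.
CORE_TOPICS = [
    "Data description and collection or re-use of existing data",
    "Documentation and data quality",
    "Storage and backup during the research process",
    "Legal and ethical requirements, codes of conduct",
    "Data sharing and long-term preservation",
    "Data management responsibilities and resources",
]


def missing_topics(dmp_answers):
    """Return the core topics that have no non-empty answer in the draft plan."""
    return [
        topic for topic in CORE_TOPICS
        if not dmp_answers.get(topic, "").strip()
    ]


# Illustrative draft (invented answers): one topic answered, one left blank.
draft = {
    "Documentation and data quality": "Each data set is stored with a description file.",
    "Storage and backup during the research process": "   ",  # whitespace only: unanswered
}

for topic in missing_topics(draft):
    print("Missing:", topic)
```

A check of this kind only confirms that every topic has been addressed; whether the answers are adequate still has to be judged by the project owner and the institution.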
NTNU Open Data: NTNU’s policy for open research data 2018-2025
Making research data accessible and reusable contributes to increased reproducibility and transparency in science and may prevent the same data from being collected several times. Open data also create the basis for new and innovative digital services that have the potential to be of great societal value. It is reasonable to expect publicly funded research to be useful for society. Thus, there is considerable national and international awareness regarding open research data.
Expenses related to basic management, storage and publishing of research data should typically be covered by the individual research projects and will usually be considered a legitimate cost in applications for funding. Open access to research data should normally entail free use externally. Covering actual costs related to special preparation of data sets and similar should still be possible.
A data management plan should describe:
- who will be responsible for managing the data during and after the project, and what resources are needed.
- how to ensure that the data is well-organized and sufficiently documented (metadata).
- the volume and type of data set to be generated/used.
- how to ensure that the data is compatible with ethical and legal requirements.
- where the data is to be stored and backed up during the lifetime of the project.
- how to save and make the data accessible to others in the long term.
NTNU’s policy for open research data is based on the following principles:
- NTNU’s research data should be openly accessible by default. Data are exempt when required due to safety, privacy, legal or commercial concerns.
- Research data should be made openly accessible as early as possible in the research process, without coming in conflict with the researcher’s use, quality control and possible commercial use of the data.
- Research data published from NTNU must be clearly labelled with provenance and ownership.
- Research data must be findable, accessible and usable without system restrictions and should be reusable by others. NTNU adheres to the FAIR principles for research data management and will use research data licences required by these.
- Research data must be stored with the aim of being accessible in the long term. Classification and metadata must as far as possible follow international standards, and formats must be updated over time if necessary. Data management throughout the life cycle must be in accordance with current legislation and requirements from funding agencies, governments and relevant stakeholders.
- NTNU will make sure that researchers have access to suitable services and infrastructure for secure storage and sharing of research data. NTNU will contribute to the development of appropriate national and international solutions within the field.
- The individual researcher is responsible for managing research data in accordance with applicable regulations, principles and requirements. The procedures must be documented in a data management plan.
In addition to the institutional archive, NTNU recommends national resources such as EasyDMP (www.easydmp.sigma2.no), which is part of UNINETT Sigma2 and also allows other data to be stored and made openly available. A Digital Object Identifier (DOI) is currently not issued. EasyDMP follows the (1) Science Europe, (2) Horizon 2020 and (3) institutional templates. EasyDMP is a web form consisting of a series of questions grouped into a number of sections. The questionnaire is dynamic, meaning that the type and number of questions you are presented with at each stage depend on the answers you gave at the previous stage.
Ownership of research data
As for scientific publications, the main principle for research data is that the institution retains the intellectual property rights and the author/researcher the copyright. Normally NTNU does not own research data produced by students or guest researchers, unless this has been agreed on, for example through externally funded projects.
As a rule, NTNU owns all research data collected and processed by personnel employed or contracted at the institution. This gives NTNU the right to openly publish the material, but does not preclude:
- the researcher from claiming an embargo (delayed open publishing) on the content, but normally not on the metadata;
- NTNU from refraining from using its right to publish the material openly. However, this does not affect the ownership of the data. If the researcher still publishes the material openly through a publisher, the publisher must also give NTNU the right to share this material openly.
Issues regarding rights and ownership must be clarified and secured in an agreement in cases where data are used for commercial and patenting purposes. Agreements must also be made in situations where NTNU’s own researchers use others’ data.
Data management plan for the waterpower laboratory
A good data management plan can make project work more effective and cooperation between research groups more robust, especially in large research projects. Good documentation and data management contribute to increased data quality, as well as verifiability and reuse. Research data should be archived for as long as they are of value to the researcher and/or as specified by the funding agency, patent rules, legislation, embargo limits and other regulatory requirements. The shortest storage period for research data is three years after publication unless otherwise determined by law. In most cases, research data are stored longer than the minimum three-year requirement. In general, research data should be made accessible at the earliest possible time, but only after the research team’s first right-of-use period and the publication of the research work.
The contents of a data management plan vary between research fields. In the waterpower laboratory, the data correspond to the discipline of engineering and physical science: fluids engineering, hydropower, material erosion, fatigue and lifetime, computational fluid dynamics, vibration, LabVIEW programs, engineering codes (MATLAB, Fortran, ANSYS, Python, C, etc.) and engineering drawings. The repository contains no personal or health data; it will therefore be possible to share the data without restrictions unless restrictions are specified in the grant agreement of the relevant project. The data management plan is drawn up specifically for the laboratory, but it is a valuable input and reference for writing new research project proposals as well as for other laboratories within NTNU. The purpose is to safeguard the research data, not just during the project period, but also for future reuse that will enable continuous development of research techniques in the laboratory.
Licensing and data openness
A permanent repository will be created in the laboratory and will be accessible to the researchers working on the various associated projects. The storage capacity will be upgraded regularly depending on need. Data storage is compulsory for all publicly funded projects; however, exceptions may be granted for commercial projects to protect the confidentiality of research results. Data stored in the repository shall follow the different layers of permission of the Creative Commons 4.0 licenses (see Figure 2): (1) Attribution: CC BY, (2) Attribution-ShareAlike: CC BY-SA, (3) Attribution-NoDerivs: CC BY-ND, (4) Attribution-NonCommercial: CC BY-NC, (5) Attribution-NonCommercial-ShareAlike: CC BY-NC-SA and (6) Attribution-NonCommercial-NoDerivs: CC BY-NC-ND. The ownership will strictly follow the NTNU policy “Ownership of research data.” For commercial and classified research projects, data may or may not be stored in the repository, depending on the project conditions. When it comes to patents and intellectual property rights, the data management plan strictly follows the NTNU policy and the project’s grant agreement.
Figure 2. Types of Creative Commons licenses largely adopted by publicly funded research projects.
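The permission layers of the six licenses listed above follow directly from the elements in each short code (BY, NC, SA, ND). As an illustration, the sketch below derives the terms from the code; the function name `license_terms` and the returned dictionary keys are assumptions chosen for this example, not part of any Creative Commons tool.

```python
def license_terms(code):
    """Derive the terms of an attribution license from its short code, e.g. 'CC BY-NC-SA'."""
    name = code.strip().upper()
    if name.startswith("CC "):
        name = name[3:]
    parts = name.split("-")
    # Only the six attribution licenses are handled: BY plus optional NC, SA, ND.
    if parts[0] != "BY" or not set(parts[1:]) <= {"NC", "SA", "ND"}:
        raise ValueError(f"not one of the six attribution licenses: {code!r}")
    return {
        "attribution_required": True,               # BY is part of every code
        "commercial_use_allowed": "NC" not in parts,
        "derivatives_allowed": "ND" not in parts,
        "share_alike_required": "SA" in parts,
    }


for code in ["CC BY", "CC BY-SA", "CC BY-ND", "CC BY-NC", "CC BY-NC-SA", "CC BY-NC-ND"]:
    print(code, license_terms(code))
```

A table of this kind could be stored next to each data set so that the chosen license and its consequences for reuse are visible to later projects.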
Expected research projects and duration of data collection
The key objective of the permanent repository is to enable continuous development of research tools and techniques and, most importantly, to avoid repeated research work and data collection. The sources of the data are master’s students, PhD candidates and postdocs working on various projects. For newly funded projects or newly started researchers and students, there is always the possibility to study and analyse the previous work and data and to build upon those data and techniques. Thus, repeated work can be avoided. Another important aspect is continuous development. For example, the laboratory is involved in developing tools for turbine design, fatigue lifetime estimation and numerical techniques. The tools are developed over 10 – 15 years of work through various research projects. Data storage is therefore extremely important: new project work continues from where the last project ended instead of either starting from scratch or reproducing similar data.
Figure 3. Categories of research work/projects in the waterpower laboratory.
Research work in the laboratory is grouped into three categories (see Figure 3): short, medium and long term. Short-term research addresses a specific topic and is closely related to the needs of the hydropower industry or to solving a particular problem at a hydropower plant. The duration of this work is generally 1 – 5 years, and it may be master’s thesis work or an IPN/KPN research project. Medium-term research corresponds to developing and demonstrating a technology so that it can be commercialized. One example is the design of a turbine for flexible operations, which includes design from scratch, laboratory testing, lifetime estimation and implementation in a prototype environment. The duration of this work is generally 5 – 12 years; it includes the development of in-house tools and techniques and comprises master’s and PhD thesis work and large research projects, including H2020. Long-term research involves research carried out for FME centres, laboratory development, capacity building and the long-term strategic goals of the laboratory, as well as continuous work on fundamental academic research in distinct areas. The duration of this work is generally 8 – 25 years, and it involves several research projects, large and small, and several master’s and PhD theses. In such cases, the permanent repository plays an extremely important role in enabling the smooth transfer of data between projects as well as between PhD candidates.
Data structure of data management plan
Figure 4 shows the data structure and categories generally considered for any data management plan:
- Raw data are the actual (first) data acquired in the laboratory, without any processing.
- Metadata is data that describes other data. Meta is a prefix that, in most information technology usages, means "an underlying definition or description." Metadata summarizes basic information about data, which can make finding and working with particular instances of data easier. For example, author, date created, date modified, file type, file size and main flow phenomena are very basic document metadata. The ability to filter on that metadata makes it much easier to locate a specific document.
- Primary data are the data generated by processing the raw data. Primary data are generally publishable directly or easily converted to a publishing format (image, video, graph, plot, table, etc.) for scientific visualization and interpretation.
- Secondary data are the data generated from primary data, for example by further analysis of data that are already processed. This may be statistical analysis, the development of statistical methods such as curve fitting, the compilation of different flow phenomena, or the study of other cross-disciplinary research fields.
- Structured data/metadata are data arranged systematically. For example, all data related to the draft tube vortex rope are stored in one place.
- Descriptive data/metadata represent the consecutive development or arrangement of data of a similar group or phenomenon. An example is the data generated during the development of a turbine design tool: the data related to turbine design are stored at a certain location, the next version of the tool uses these data as its base and continues the development, and finally a summary combines the data from the first phase, second phase, etc. of the design tool development.
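To make the idea of basic document metadata concrete, the sketch below stores the fields named above (author, date created, date modified, file type, file size, main flow phenomena) in a small record and filters on one of them. The class name `DatasetMetadata`, the function `find_by_phenomenon` and the two example records are hypothetical and serve only as an illustration, not as the laboratory's actual schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatasetMetadata:
    """Basic document metadata, using the fields named in the text."""
    author: str
    date_created: date
    date_modified: date
    file_type: str
    file_size_bytes: int
    main_flow_phenomena: tuple


def find_by_phenomenon(records, phenomenon):
    """Return the records tagged with the given flow phenomenon (case-insensitive)."""
    wanted = phenomenon.lower()
    return [
        record for record in records
        if wanted in (tag.lower() for tag in record.main_flow_phenomena)
    ]


# Hypothetical records, for illustration only.
records = [
    DatasetMetadata("Researcher A", date(2020, 3, 1), date(2020, 3, 5),
                    "csv", 120_000, ("draft tube vortex rope",)),
    DatasetMetadata("Researcher B", date(2021, 6, 10), date(2021, 6, 12),
                    "txt", 45_000, ("rotor stator interaction", "cavitation")),
]

matches = find_by_phenomenon(records, "Draft tube vortex rope")
print([m.author for m in matches])
```

Keeping the phenomena as free-text tags is a simplification; a controlled vocabulary based on the research categories of the laboratory would make such filtering more reliable.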
Figure 4. Categories of research data.
Main category of research activities and field of data collection
Research projects in the waterpower laboratory belong to the field of fluids engineering and hydropower. The topics include turbine design, fatigue loading and lifetime estimation, flexible operations, vortex breakdown, rotor stator interaction, dynamics (water hammer and surging) and fundamental work related to fluid mechanics and boundary layers. Data are therefore collected mainly through experimental and numerical studies. Figure 5 gives an overview of the main research and Table 1 provides details of the research in the waterpower laboratory. The majority of data types pertain to pressure, velocity and strain measurements as well as numerical results and programs/codes.
Figure 5. Main category of research activities in the waterpower laboratory and data collections.
Table 1. Detailed classification of research in the waterpower laboratory.
SR # | Research categories | Classification of data and related research |
---|---|---|
1 | Turbine design and optimization | Design of Francis, Pelton and Kaplan turbines, related algorithms, optimization programs, codes, numerical simulations, manufacturing, model acceptance tests, hill diagrams, efficiency and pressure measurements. |
2 | Fatigue loading and lifetime | Strain gauge measurements, material testing, SN diagrams, welding techniques, development of lifetime tool, fluid structure interaction, vibration measurements, identification of resonance. |
3 | Flexible operations | Start-stop, ramping, load rejection, runaway operations, pressure and strain measurements, optimization of start-stop. |
4 | System dynamics | Water hammer, surging and other dynamics related to penstock and draft tube during load rejection process or flexible operation of turbines, bifurcation, trifurcation of penstock, water start time, main inlet valve. |
5 | Specific phenomena and loading | Fundamental research, rotor stator interactions, vortex breakdown, cavitation, draft tube vortex rope, flow measurement techniques such as Winter-Kennedy, pressure-time, sediment erosion, sound wave, pump-turbine S-characteristics, other research upon specific industrial need. |
Figure 6 describes the categories of metadata based on the main data collected either from measurements or from simulations. The classification of data and related research presented in Table 1 covers all these categories of metadata (or at least one of them). Pressure data are collected during almost all experimental research work in the laboratory. However, the data collection technique varies from project to project or according to the type of research. For example, in some cases flush-mounted sensors are used to acquire high-frequency data, mainly at locations on the turbine blades and guide vanes. Strain measurements are largely conducted on the blades, with a specific focus on fatigue lifetime and on identifying resonance conditions. Velocity measurements are performed either in the vaneless space or in the draft tube. Computational fluid dynamics (CFD), fluid structure interaction (FSI) and codes are used in almost all five categories shown in Table 1, depending on project requirements.
Figure 7 shows the descriptive metadata expected during the research work. The metadata are derived from the main data obtained either from measurements or from numerical simulations; the main data are pressure, velocity and strain. For all research projects in the waterpower laboratory, at least one of these types of data is collected. The secondary classification of descriptive metadata is presented in Table 2. The metadata can be sorted further according to file type or extension. The file type is expected to change as the data category changes. For example, raw data acquired during measurements may have a file type that depends on the software used to collect the data; the most common formats are *.csv and *.txt. After the raw data are processed, the next level of format may be a figure, graph or plot in *.png or *.jpg format for publication of the research work.
Figure 6. Categories of structured metadata classified from main data.
Figure 7. Descriptive metadata classified from the structured metadata.
Data preservation and life expectancy
Figure 8 shows the data preservation and life expectancy categories: short (0-5 years), medium (6-10 years), long (11-20 years) and infinite (>20 years). The classification is useful for saving space and managing data effectively, because some data become obsolete after a number of years owing to the constant development of new technologies and techniques. For example, Fortran code developed 20 years ago for simple flow analysis in a long pipe is in fact obsolete; such numerical analysis can easily be carried out with modern tools or ANSYS with more accuracy. Similarly, experiments carried out 20 years ago to study pressure variation in the vaneless space do not provide accurate detail compared with modern pressure measurement techniques. Hence, those data can be discarded. However, data life and preservation follow the NTNU open data policy as well as data ownership. For example, data from a commercial project may be deleted after the embargo period or as soon as the project is completed, if this is agreed in the project and grant agreement. Furthermore, if any data are published in a publication jointly with industry, those data are expected to be preserved for a minimum of three years after the final publication of the research results in a paper.
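The four life expectancy bands can be expressed as a small lookup. The sketch below is an illustration only: the function name `life_expectancy_category` is hypothetical, and the handling of non-integer values (for example, 5.5 years falling into "medium") is an assumption, since the text defines the bands in whole years.

```python
def life_expectancy_category(years):
    """Map an expected data lifetime in years to the preservation category."""
    if years < 0:
        raise ValueError("expected lifetime cannot be negative")
    if years <= 5:
        return "short"      # 0-5 years
    if years <= 10:
        return "medium"     # 6-10 years
    if years <= 20:
        return "long"       # 11-20 years
    return "infinite"       # more than 20 years


for years in (3, 8, 15, 25):
    print(years, "years:", life_expectancy_category(years))
```

The category describes only the expected useful life of the data; the minimum storage period and any embargo or grant agreement conditions described in this document still apply before data are discarded.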
Table 2. Descriptive metadata classified in the file types and extension.
SR # | Metadata types | File extensions |
---|---|---|
1 | Experimental data | *.xlsx, *.csv, *.txt, *.jpg, *.png, *.mp4 |
2 | Numerical data | *.ansys, *.cfx, *.def, *.res, *.trn, *.dat, *.cas, *.msh, *.cdb, *.wbdb, *.wbpj, *.ccl, *.db, *.gtm, *.cfx5, *.jou |
3 | Engineering drawing data | *.creo, *.x_t, *.tin, *.dxf, *.dwg, *.step, *.stp, *.iges, *.stl, *.sldprt, *.ipt, *.iam, *.amf, *.3mf, *.obj, *.fbx, *.3ds |
4 | Code | *.m, *.c, *.cpp, *.py, *.f |
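
The classification in Table 2 lends itself to automatic sorting of files by extension. The following minimal sketch shows one possible way to do this; the name `metadata_type`, the example file names and the choice to include only a subset of the extensions for the numerical and drawing categories are assumptions made to keep the example short.

```python
from pathlib import Path

# Subset of Table 2, for illustration; the full lists are given in the table.
EXTENSIONS_BY_TYPE = {
    "Experimental data": {".xlsx", ".csv", ".txt", ".jpg", ".png", ".mp4"},
    "Numerical data": {".cfx", ".def", ".res", ".trn", ".dat", ".cas", ".msh"},
    "Engineering drawing data": {".dxf", ".dwg", ".step", ".stp", ".iges", ".stl"},
    "Code": {".m", ".c", ".cpp", ".py", ".f"},
}


def metadata_type(filename):
    """Return the Table 2 metadata type for a file name, or None if the extension is unknown."""
    suffix = Path(filename).suffix.lower()
    for type_name, extensions in EXTENSIONS_BY_TYPE.items():
        if suffix in extensions:
            return type_name
    return None


# Hypothetical file names.
for name in ["pressure_run01.CSV", "runner.stp", "solver.f", "README"]:
    print(name, "->", metadata_type(name))
```

Extensions are compared in lower case so that files written by different acquisition software (for example *.CSV and *.csv) end up in the same category.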
Figure 8. Data preservation and life expectancy.
References
- NTNU Open Data
- DMP Guidance, NTNU
- National strategy on access to and sharing of research data (Norway)
- Open Access to Research Data, v2017, The Research Council of Norway
- Policy for open science, The Research Council of Norway
- Open science, European Commission
- Creative Commons