Recherche Data Gouv: the data repository
Find out about the objectives, creation and functionalities of the data repository platform supported by Recherche Data Gouv.
CNRS researchers can now deposit their data in an institutional space on the national Recherche Data Gouv platform. What are the platform's objectives?
In line with its Roadmap for Open Science, the CNRS is encouraging its researchers to make their research data accessible in the same way as their scientific publications. Some scientists can use trusted thematic repositories or solutions their community has already developed. But what can they do if no such solutions are available in their field? In response to this question, the CNRS has opened an institutional CNRS Research Data space on the Recherche Data Gouv platform.
"The CNRS is one of the major providers of research data on multidisciplinary themes. It is important to work on this issue as a whole - from big data to 'small' data that are very useful for science to progress", explains Sylvie Rousset, director of the CNRS's Open Research Data Department (DDOR).
The CNRS Research Data repository enables scientists to publish data derived from research supported by the CNRS. The repository offers a generic main collection, and laboratories can request the creation of specific collections. Around ten laboratories have already done so. Nearly 80 datasets, representing over 6,600 files, are currently available in varied fields (chemistry, physics, engineering and systems sciences, sciences of the Universe, life sciences, etc.).
A comprehensive sovereign solution
This space was created as a state-of-the-art solution by a consortium of institutions led by the National Research Institute for Agriculture, Food and Environment (INRAE) in response to the requirements of the research community as a whole. "This space is part of the national Recherche Data Gouv ecosystem. This initiative has been highly structuring for French higher education and research, particularly the CNRS which is heavily involved in its development", explains Sylvie Rousset, who is a member of the steering committee. "The early involvement of the CNRS means the whole community can benefit from the organisation's resources, skills and stakeholders that support the project. The CNRS deserves our gratitude for this", confirms Isabelle Blanc, ministerial administrator for data, algorithms and source codes at the Ministry of Higher Education and Research (MESR), which steers this ecosystem.
The Recherche Data Gouv space can be accessed through a single web portal and represents "a sovereign solution and a reliable alternative to commercial platforms for publishing data", according to Isabelle Blanc. In practical terms, this repository enables each institution to curate and moderate its data on its own institutional space like the one opened by the CNRS. Isabelle Blanc explains that the CNRS space "ensures that the maximum amount of research teams are not left without solutions as the CNRS is the supervisory authority for the most research units in France".
"France's ambition was set out in the 2016 Digital Republic Law – if 50% of a project's research work is publicly funded, there must at least be sharing and, at best, opening up of data", she continues. A commitment to open up all data is also one of the criteria used by French and European funding agencies to evaluate projects, and most publishers now stipulate that data linked to a publication should be accessible. "Recherche Data Gouv is part of this dynamic of open science, so that data can benefit the whole community", sums up Isabelle Blanc, who also stresses the importance of the objective of "leaving no scientists without a solution for opening or sharing their data".
The priority is supporting scientists
"Opening up data is more complex than opening up publications. Research teams have to carry out a whole range of additional scientific work as far in advance as possible when designing their project. This can't be entrusted to third parties", explains Isabelle Blanc. Scientists need to be able to describe the tools, conditions and protocols used to produce and collect their data. An MESR survey run between 2018 and 2020 found that 80% of research communities lacked either support or the right infrastructure. The other countries with which the Ministry discussed the subject had also identified a need for support in developing their national systems.
Support has become a central element of the system and a priority ahead of the development of technical solutions. In practice, support is organised through a network of services designed to respond to all requirements and based on a strategy of federating, promoting, enhancing and increasing existing initiatives. Firstly, the data management clusters are local solutions throughout the country which bring together complementary expertise from different establishments. Currently, 19 data management clusters are staffed by 350 people from 80 different institutions and provide a range of over 140 services. Ten more such clusters are planned and two calls for accreditation will take place in 2024 and 2025. "The CNRS is a partner in many data workshops which act as the main entry point for scientists", explains Sylvie Rousset.
The CNRS also contributes to rolling out national resource centres like OPIDoR, a portal set up by the Inist-CNRS, a pioneer in the development of data management plans, and DoRANum, which provides data sharing and management resources and training to support the scientific community. These two resource centres mean the CNRS has "a very special place" in the ecosystem, in Isabelle Blanc's view.
Most of the ecosystem's six thematic reference centres are also linked to the CNRS, which is heavily involved in research infrastructures. This is the case, for example, of Huma-Num IR* and Data Terra. The latter works on the evolution of the Earth system, covering subject areas ranging from astronomy to climate change and including the oceans, the poles, water resources and so forth. These subjects require the analysis of complex, highly heterogeneous observational data from multiple producers. This level of heterogeneity means such analysis often cannot be reproduced under exactly the same conditions. "The past situation was that good data management was an added value but now poor data management is a real obstacle to research", notes Nicolas Arnaud, director of CNRS Earth & Space. FAIR (findable, accessible, interoperable, reusable) data is "part of the Institute's DNA", but such data now needs to be in the right place, at the right time and in the right format for all users working on these themes. For this reason, Data Terra is positioned downstream of data-producing infrastructures and provides a single portal for accessing data and dedicated data processing tools. In addition to its complex curation work, this infrastructure also advises on best practices. "Data Terra is an object that functions and can be a source of inspiration for other communities, particularly on the European scale", explains Nicolas Arnaud.
EaSy Data (Earth System Data Repository) was launched on 6 November 2023. This repository is supported by Data Terra and operated by BRGM. It has been identified as the national thematic repository for so-called 'orphan' or long-tail data on the environment and the Earth system. Such data results from research with a finite duration (projects or publications), and its acquisition/development, preservation and dissemination are not organised on a permanent or community basis.
Structuring the research data environment bolsters scientific collaboration and promotes interdisciplinarity through the sharing and re-use of data. Long-term human and financial resources are required to support the development of professions and skills. This is why Sylvie Rousset calls for "recognition of all these professions and profiles in career assessments". The skills associated with the FAIRisation of data are currently being defined, and efforts are being made to give greater recognition to the specific nature of the professions involved in managing, preserving and disseminating data, such as data librarians, curators and stewards.
By 2025, the HAL open archive developed by the CNRS and the CCSD should also provide a service for direct deposits of datasets associated with publications that will then be made accessible via Recherche Data Gouv. Also that year, the Recherche Data Gouv ecosystem will apply to become a research infrastructure that is part of the national strategy.
Recherche Data Gouv is currently also working towards European recognition (see box) and needs to find a sustainable economic model and governance system. To achieve this, a unit with its own staff and resources and operating under several supervisory authorities will be set up in 2024. Its financial support from the MESR will continue until 2026. Work on the harmonisation of legal information is also ongoing to support data teams and workshops on these complex issues which often need tailor-made solutions. "We hope this ecosystem can expand so it can provide this support and make the work of all institutions and research teams easier through a kind of snowball effect that will speed up the process", concludes Isabelle Blanc.
The Recherche Data Gouv steering committee is preparing several applications for 2024 to bring it closer to the European Open Science Cloud (EOSC) project. This should enable the national platform to become part of the catalogue of services EOSC makes available to European scientists. Recherche Data Gouv's data management clusters, made up of experts who support research teams, could become expertise centres recognised at European level. "The national Recherche Data Gouv platform and the European EOSC project have complementary approaches and can feed off each other", explains Suzanne Dumouchel, the head of international cooperation at the DDOR and a member of the EOSC association's board of directors. She considers that "the strategies are similar, building on existing tools and including the issues of skills and data and metadata quality. Its scope, ambition, objectives and national dimension mean Recherche Data Gouv also meets the criteria envisaged to become a future national EOSC node". However, the French Ministry aims to go further, and Isabelle Blanc reveals that "we are currently taking part in the construction of a European consortium with counterparts who have developed similar projects to respond to a 2024 EOSC call for generic nodes that provide services to all scientific communities throughout Europe".