Augmenting the COVID-19 Data Platform with behavioural data

LS-RI - Life Sciences
SSHOC - Social Sciences and Humanities

The project aimed to add Social Sciences & Humanities data to the COVID-19 Data Platform, providing contextual data and a knowledge development environment.

The variety of the data makes it challenging to align catalogues, metadata, and protocols with the life sciences and other parts of the platform. Collecting and combining quantitative and qualitative data in many formats (text, audio, video, social media) and in multiple languages requires multilingual thesauri and ontologies to make the data findable and comparable.

In the project, a catalogue of key datasets for social, economic, and psychological analysis has been created. It is organised by data producer and national service provider in data hubs, replicating the EMBL structure, and supplemented with contextual economic, social, cultural, health, and migration data. The catalogue is supported by background infrastructures such as multilingual thesauri, controlled vocabularies, metadata profiles based on global standards, and data sensitivity tags.
Tools, expertise, and techniques for annotation, analysis, and information extraction have been used to handle qualitative and multimedia data and to address multilinguality.
A knowledge development track has made use of online surveys and AI techniques, focusing on the knowledge cycle and data interoperability, including non-hierarchical data, via semantic techniques such as Knowledge Graphs. To optimise reusability, concerted and collaborative actions to group and enrich data have been envisioned in cooperation with research communities.
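
As an illustration only, the following minimal Python sketch shows how a catalogue record carrying a data sensitivity tag might be found through a multilingual thesaurus and then expressed as knowledge-graph triples. It is not the project's implementation: the record fields, concept identifiers, thesaurus entries, predicate names, and sample datasets are all hypothetical and chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical multilingual thesaurus: concept ID -> label per language.
THESAURUS = {
    "concept:trust": {
        "en": "trust in government",
        "de": "Vertrauen in die Regierung",
        "fr": "confiance dans le gouvernement",
    },
    "concept:mobility": {"en": "mobility", "de": "Mobilität", "fr": "mobilité"},
}


@dataclass
class CatalogueRecord:
    """Hypothetical metadata profile for one dataset in the catalogue."""
    dataset_id: str
    title: str
    language: str
    producer: str
    sensitivity: str  # assumed tag values: "open" or "restricted"
    concepts: List[str] = field(default_factory=list)


def find_concepts(term: str) -> Set[str]:
    """Return concept IDs whose label in any language contains the term."""
    needle = term.casefold()
    return {
        concept_id
        for concept_id, labels in THESAURUS.items()
        if any(needle in label.casefold() for label in labels.values())
    }


def search(records: List[CatalogueRecord], term: str,
           include_restricted: bool = False) -> List[CatalogueRecord]:
    """Find records by concept, independent of the language of the query."""
    wanted = find_concepts(term)
    return [
        r for r in records
        if wanted & set(r.concepts)
        and (include_restricted or r.sensitivity == "open")
    ]


def to_triples(record: CatalogueRecord) -> List[Tuple[str, str, str]]:
    """Express a record as (subject, predicate, object) triples."""
    triples = [
        (record.dataset_id, "hasTitle", record.title),
        (record.dataset_id, "producedBy", record.producer),
        (record.dataset_id, "hasSensitivity", record.sensitivity),
    ]
    triples += [(record.dataset_id, "about", c) for c in record.concepts]
    return triples


# Invented sample records, for demonstration only.
RECORDS = [
    CatalogueRecord("survey-de", "Umfrage zur Pandemie", "de",
                    "Producer A", "open", ["concept:trust"]),
    CatalogueRecord("interviews-fr", "Entretiens sur la crise", "fr",
                    "Producer B", "restricted", ["concept:trust"]),
    CatalogueRecord("mobility-en", "Mobility during lockdown", "en",
                    "Producer C", "open", ["concept:mobility"]),
]

if __name__ == "__main__":
    # A German query term also matches the French dataset via the shared concept.
    for hit in search(RECORDS, "Vertrauen", include_restricted=True):
        print(hit.dataset_id, to_triples(hit))
```

In this sketch the query term is resolved to a language-independent concept first, so datasets described in different languages become comparable, and the sensitivity tag decides whether a record is visible outside a secured environment.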

Populating the COVID-19 Data Platform with data from the Social Sciences and Humanities made it possible to pool and augment behavioural and attitudinal data, enabling researchers to investigate the opinions and attitudes of European populations during the crisis and to monitor the occurrence and spread of the virus.

Bringing relevant data together will increase efficiency, improve comparability, and give all researchers the same opportunities to use the data. To facilitate cooperation and provide seamless access even to sensitive data, secure environments will be set up in which researchers can work with the data.