ParlaCAP project image

Science cluster

SSHOC - Social Sciences and Humanities

Summary

The ParlaCAP project uses advanced natural language processing to analyse political agendas and sentiment in debates from 27 European national parliaments. Automatically coding agendas across a dataset of more than 7 million speeches, given in more than 20 languages, has recently become possible thanks to significant developments in natural language processing and artificial intelligence: multilingual transformer models can now provide codings that are both highly consistent and accurate. By integrating the ParlaMint dataset with the Comparative Agendas Project's coding scheme, the project will create a comprehensive, FAIR dataset for comparative political research, enhancing transparency and accountability in legislative discourse across Europe.

Research domains:
Social Sciences and Humanities
Partner(s):
Jožef Stefan Institute, Institute for Contemporary History, University of Zagreb, Bulgarian Academy of Sciences, Polish Academy of Sciences
Project team member(s):
Tomaž Erjavec, Taja Kuzman, Peter Rupnik, Katja Meden, Jure Skubic, Anna Kryvenko, Daniela Širinić, Petya Osenova, Maciej Ogrodniczuk, Łukasz Kobyliński.
External collaborators: Michal Mochtak (Radboud University), Matyáš Kopp (Institute of Formal and Applied Linguistics)

Challenge

Open Science project, Open Science Service, Cross-domain/Cross-RI

Parliaments are the cornerstone of democracy in Europe, ensuring the political representation of citizens. Despite their empirical relevance, parliamentary studies have often limited their scope to a single parliamentary body or to a small group of parliaments analysed in a comparative perspective.

The main challenge of the ParlaCAP project is to bridge the gap between existing parliamentary research data and their use in political science research. It does so by integrating two key international and cross-disciplinary initiatives: the CLARIN ERIC ParlaMint project, which provides texts of parliamentary debates from 27 European national parliaments, and the Comparative Agendas Project (CAP), which offers a coding scheme of 21 topics for tracking political agendas in parliamentary proceedings.

Solution

The project will employ the Comparative Agendas Project's text-as-data methodology to analyse the parliamentary debates of all 27 parliaments, comprising more than 7 million speeches given in more than 20 languages. It will automatically code the agenda of each speech and transform the ParlaMint corpora into a structured, tabular dataset, available for complete download through CESSDA ERIC.

The project also aims to code each speech with the sentiment expressed and to cross-reference the data with the PartyFacts database of political party metadata and the V-DEM surveys on the state of democracies. This enriched and fully FAIR dataset, suitable for quantitative research, will make it possible to understand comprehensively how political attention is distributed across policy areas, by analysing topic and sentiment coding over an unprecedented number of parliaments. The dataset will be available through RIs such as CESSDA, CLARIN, and DARIAH, along with a graphical user interface and an API for broader accessibility.
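As a rough illustration of the kind of quantitative analysis a speech-level table could support, the sketch below computes each topic's share of speeches per parliament and the mean sentiment per party. It is only a minimal sketch: the column names, topic labels, sentiment scale, and party lookup are all hypothetical and do not reflect the published ParlaCAP schema or the PartyFacts identifiers.

```python
from collections import Counter, defaultdict

# Hypothetical speech-level rows; field names and values are illustrative
# assumptions, not the actual ParlaCAP dataset layout.
speeches = [
    {"parliament": "SI", "party_id": "p1", "topic": "Health", "sentiment": 0.4},
    {"parliament": "SI", "party_id": "p2", "topic": "Health", "sentiment": -0.2},
    {"parliament": "SI", "party_id": "p1", "topic": "Education", "sentiment": 0.1},
    {"parliament": "HR", "party_id": "p3", "topic": "Defense", "sentiment": -0.5},
]

# Hypothetical lookup standing in for external party metadata.
parties = {"p1": "Party A", "p2": "Party B", "p3": "Party C"}


def attention_shares(rows):
    """Return {parliament: {topic: share of that parliament's speeches}}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["parliament"]][row["topic"]] += 1
    return {
        parl: {topic: n / sum(topics.values()) for topic, n in topics.items()}
        for parl, topics in counts.items()
    }


def mean_sentiment_by_party(rows, party_names):
    """Join speeches to party names and average the sentiment per party."""
    totals = defaultdict(lambda: [0.0, 0])  # name -> [sum, count]
    for row in rows:
        name = party_names.get(row["party_id"], "unknown")
        totals[name][0] += row["sentiment"]
        totals[name][1] += 1
    return {name: total / n for name, (total, n) in totals.items()}


shares = attention_shares(speeches)
sentiment = mean_sentiment_by_party(speeches, parties)
```

Here `shares` maps each parliament to the distribution of its speeches over topics, which is one simple way to operationalise "political attention"; a real analysis would likely also weight by speech length or time period.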

Scientific Impact

ParlaCAP will revolutionise comparative parliamentary studies by providing a robust dataset for tracking political agenda-setting across European parliaments. Its open, FAIR data management approach will support a wide range of RIs and projects in social sciences, while promoting transparency in political discourse and accountability of legislative bodies. The findings will have societal relevance, fostering collaboration in political science and beyond.

Moreover, the engagement activities planned within the project will provide services and accompanying tutorials to raise awareness within the political studies community and to ensure that the project's results are FAIR for further application and research by scientists across various Social Sciences and Humanities (SSH) domains.

Open science added value

The new FAIR dataset will feature speech-level metadata on democracies, parties, speakers, topics, and sentiment, accompanied by both the original and translated text of the debates as supporting information. Providing structured data will make it possible to better serve the needs of the CESSDA ERIC infrastructure, the CAP infrastructure on agenda setting in political discourse, the MEDEM infrastructure on monitoring electoral democracies, and all RIs and research agendas interested in parliamentary debates that rely primarily on structured data analysis.

Keywords
natural language processing, parliamentary research data, ParlaMint, political science, AI, artificial intelligence
Project start date:
Project duration:
24 months

Principal investigator

Nikola Ljubešić
Jožef Stefan Institute
BIO

Nikola Ljubešić is a senior researcher at the Jožef Stefan Institute in Ljubljana. He is also affiliated with the Faculty of Computer and Information Science of the University of Ljubljana, and the Institute of Contemporary History in Ljubljana. His research interests lie in the areas of natural language processing, computational linguistics and computational social science, with a strong focus on the South-Slavic linguistic and cultural area.

QUOTE
"Open science is a snowball effect in itself. Our ParlaCAP project would not have been possible without the upstream FAIR project ParlaMint, whose results this project will, inter alia, make significantly more useful for social science research. This snowball is picking up in both pace and size!"