Science cluster
Summary
The ParlaCAP project uses advanced natural language processing to analyse political agendas and sentiments in debates from 27 European national parliaments. Automatically coding agendas across a dataset of more than 7 million speeches, given in more than 20 languages, has recently become possible thanks to significant developments in natural language processing and artificial intelligence: multilingual transformer models can now provide codings that are both highly consistent and accurate. By integrating the ParlaMint dataset and the Comparative Agendas Project's coding scheme, the project will create a comprehensive, FAIR dataset for comparative political research, enhancing transparency and accountability in legislative discourse across Europe.
External collaborators: Michal Mochtak (Radboud University), Matyáš Kopp (Institute of Formal and Applied Linguistics)
Challenge
Open Science project, Open Science Service, Cross-domain/Cross-RI
Parliaments are the cornerstone of democracy in Europe, ensuring the political representation of citizens. Despite their empirical relevance, parliamentary studies have often limited their scope to a single parliamentary body or a small group of parliaments analysed in comparative perspective.
The main challenge of the ParlaCAP project is to bridge the gap between existing parliamentary research data and their use in political science research by integrating two key international and cross-disciplinary initiatives: the CLARIN ERIC ParlaMint project, which provides texts of parliamentary debates from 27 European national parliaments, and the Comparative Agendas Project (CAP), which offers a coding scheme of 21 topics for tracking political agendas in parliamentary proceedings.
Solution
The project will apply the Comparative Agendas Project's text-as-data methodology to the parliamentary debates of all 27 parliaments, comprising more than 7 million speeches given in more than 20 languages. It will automatically code the agenda of each speech and transform the ParlaMint corpora into a structured, tabular dataset, available for complete download through CESSDA ERIC.
The project also aims to code each speech with the sentiment expressed and to cross-reference the data with the PartyFacts database of political party metadata and the V-Dem surveys on the state of democracies. This enriched, fully FAIR dataset will be suitable for quantitative research: by analysing topic and sentiment codings over an unprecedented number of parliaments, political scientists will be able to build a comprehensive picture of how political attention is distributed across policy areas. The dataset will be available through RIs such as CESSDA, CLARIN, and DARIAH, along with a graphical user interface and an API for broader accessibility.
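As a rough illustration of the kind of quantitative question such a tabular dataset is meant to support, the sketch below computes the share of speeches devoted to each topic within each parliament. It is not the project's pipeline or schema: the field names (parliament, topic, sentiment), the sample rows, and the function attention_shares are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical rows: field names and values are illustrative assumptions,
# not the actual ParlaCAP schema or data.
speeches = [
    {"parliament": "SI", "topic": "Health", "sentiment": "negative"},
    {"parliament": "SI", "topic": "Health", "sentiment": "positive"},
    {"parliament": "SI", "topic": "Education", "sentiment": "neutral"},
    {"parliament": "CZ", "topic": "Macroeconomics", "sentiment": "negative"},
    {"parliament": "CZ", "topic": "Health", "sentiment": "neutral"},
]


def attention_shares(rows):
    """Return, per parliament, the share of speeches coded with each topic."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["parliament"]][row["topic"]] += 1

    shares = {}
    for parliament, topic_counts in counts.items():
        total = sum(topic_counts.values())
        shares[parliament] = {
            topic: n / total for topic, n in topic_counts.items()
        }
    return shares


shares = attention_shares(speeches)

# Print one tab-separated line per (parliament, topic) pair.
for parliament, topic_shares in sorted(shares.items()):
    for topic, share in sorted(topic_shares.items()):
        print(f"{parliament}\t{topic}\t{share:.2f}")
```

The same grouping could be extended with the sentiment field, or joined to party and democracy indicators, once the real column names are known.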
Scientific Impact
ParlaCAP will revolutionise comparative parliamentary studies by providing a robust dataset for tracking political agenda-setting across European parliaments. Its open, FAIR data management approach will support a wide range of RIs and projects in social sciences, while promoting transparency in political discourse and accountability of legislative bodies. The findings will have societal relevance, fostering collaboration in political science and beyond.
Moreover, the engagement activities planned within the project will provide services and accompanying tutorials to raise awareness within the political science community and to ensure that the project's results are FAIR for further application and research by scientists across the Social Sciences and Humanities (SSH) domains.
Open science added value
Principal investigator
Nikola Ljubešić is a senior researcher at the Jožef Stefan Institute in Ljubljana. He is also affiliated with the Faculty of Computer and Information Science of the University of Ljubljana and the Institute of Contemporary History in Ljubljana. His research interests lie in natural language processing, computational linguistics and computational social science, with a strong focus on the South Slavic linguistic and cultural area.