
Science cluster

SSHOC - Social Sciences and Humanities

Summary

The AMIS project aims to develop an innovative web application specifically designed for humanities researchers, focusing on text analysis for metadata enrichment. It is expected to be a user-friendly tool for assessing metadata quality, enriching it with additional information, and facilitating text analysis. The web application seeks to streamline metadata creation by providing design assistance, improving academic discoverability, and offering personalised recommendations for scholarly data. By leveraging machine learning, AMIS enables a contextualised approach to metadata enrichment, enhancing research processes, particularly for heritage texts in the Social Sciences and Humanities (SSH).

Research domains:
Social Sciences and Humanities
Partner(s):
ARIANE Consortium of the Huma-Num infrastructure, CNRS, University of Poitiers, University of Sorbonne Nouvelle, University of Lyon, University of Madrid Complutense, University AL. I. Cuza
Project team member(s):
Prof. Ioana Galleron (University of Sorbonne Nouvelle, France), Prof. Fatiha Idmhand (University of Poitiers, France), Prof. Sabine Loudcher (University of Lyon 2, France), Prof. Amelia Sanz (University of Madrid Complutense, Spain), Prof. Simone Rebora (University of Verona, Italy), Dr. Roxana Patras (University AL. I. Cuza, Romania)

Challenge

Open Science Service

Metadata enrichment in digital humanities is often a manual, time-consuming task. Shared standards have been defined, but open services that support the large-scale creation of quality metadata are lacking. As a result, because expectations are interpreted differently, information extracted from the same field across a wide range of resources tends to be highly inconsistent. The challenge, therefore, is to strike a balance between speed and precision while taking the researcher’s expertise into account.

Solution

By developing a context-based web application, AMIS will provide an innovative environment in which scholars can create and refine their own metadata faster and more precisely, while also helping them identify best practices and shared vocabularies for describing text content. The service uses machine learning techniques (text classification, named entity recognition (NER), named entity linking (NEL), sentiment analysis, topic modelling, text summarisation, sequence labelling, and dependency parsing) to analyse the uploaded files, compare them with data from international repositories, and suggest enhancements in the form of metadata. The core module of AMIS will be trained on a wide variety of resources hosted in repositories such as Zenodo, Nakala, Rossio, Recolecta, Docta, and several others.
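As a rough illustration of one small step in such a workflow, the sketch below checks which expected fields a metadata record still lacks before any enhancement is suggested. It is not the AMIS implementation: the function names, the dictionary keys, and the sample record are hypothetical, and the keys simply mirror the priority fields named under Results.

```python
# Hypothetical sketch, not AMIS code. The keys below are assumed names that
# mirror the priority fields listed under Results (title, creator, subject,
# date of creation, license, language, topic, description).
PRIORITY_FIELDS = (
    "title", "creator", "subject", "date_created",
    "license", "language", "topic", "description",
)


def missing_fields(record: dict) -> list:
    """Return the priority fields that are absent or blank in a record."""
    return [
        field for field in PRIORITY_FIELDS
        if not str(record.get(field, "")).strip()
    ]


def completeness(record: dict) -> float:
    """Share of priority fields that carry a non-blank value (0.0 to 1.0)."""
    filled = len(PRIORITY_FIELDS) - len(missing_fields(record))
    return filled / len(PRIORITY_FIELDS)


# Invented sample record: three fields filled, one present but blank.
record = {
    "title": "Letter to a publisher",
    "creator": "Unknown",
    "language": "es",
    "description": "   ",
}

print(missing_fields(record))
print(completeness(record))
```

A completeness score of this kind only measures whether fields are filled, not whether their values are accurate or consistent, so it would at most be a first filter ahead of the machine learning steps described above.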

Scientific Impact

By enhancing metadata quality and coherence, AMIS will contribute to the convergence of conceptual models in text description and analysis within cultural studies and other SSH disciplines. This will increase comparability among similar research conducted in different academic contexts and corpora. The project will also benefit researchers by improving their own metadata. Existing collections will be upgraded, while new digital corpora will be designed from the outset with enriched metadata perspectives. Heritage texts will not only be made available but will also be virtually linked through a multitude of information points to other existing resources, supporting further discoveries about the circulation of ideas and themes across cultural areas and over time. Finally, AMIS promotes better practices for metadata creation, improving the discoverability and reusability of scientific content.

Results

  • AMIS web application: The technical team of the AMIS project has completed the first internal testing version of its metadata assistant. The system is designed to optimise the description of cultural and textual resources. AMIS, currently in a private evaluation phase, is based on a standardised set of Dublin Core fields specifically adapted to the needs of digital humanities. Priority metadata elements include key fields such as title, creator, subject, date of creation, license, language, topic, and description. At this initial stage, the assistant can analyse files uploaded by team members and segment the information they contain in order to generate an initial set of 15 metadata fields. Although this version focuses on core fields, its architecture has been designed to support the future integration of more complex ontologies, such as the Corpus Author Ontology CRM (CAO_CRM), enabling advanced semantic inference. The test version is currently being evaluated by the project’s technical teams to ensure proper functionality across all languages supported by the AMIS assistant (French, Spanish, Italian, Romanian, and English).
  • Review of metadata protocols in the databases of AMIS Assistant's European partners: The project team completed a manual review of metadata protocols used in the databases of our European partners. The objective was to identify datasets suitable for fine-tuning language models aimed at improving automatic metadata generation. The results reveal an uneven landscape: while platforms such as Nakala or DraCor adhere to open standards and provide APIs, others rely on proprietary systems such as DSpace, where metadata fields describe resources according to internal requirements rather than shared protocols. The review confirmed that fewer than 15% of the databases analysed allow programmatic access, and many lack structured metadata. Although this diversity (and, in many cases, the absence of standardised metadata structures) hinders interoperability with query tools, the team has identified promising datasets that could be used to refine the information-retrieval mechanisms of the AMIS assistant.
  • Ontology alignment and multilingual thesaurus integration for coherent metadata recommendations: The project team has begun aligning the ontology dedicated to author corpora (Corpus Author Ontology CRM, CAO_CRM), developed by the ARIANE consortium, with the descriptive fields of the assistant. This activity, currently in progress, aims to ensure that automatically generated metadata can be mapped onto clear and unambiguous semantic structures within the ontological model. Although both the assistant and the ontology are still under development, this integration effort lays the foundations for delivering consistent metadata recommendations across the project’s five working languages. In parallel, the project is integrating the vocabulary of the Typology of Textual Genres thesaurus, a reference resource developed within the Huma-Num CAHIER consortium and currently maintained by the ARIANE consortium and Biblissima+ Cluster 5b. A thorough revision of the thesaurus’s RDF encoding has been completed to support the integration of concept translations into Spanish, Italian, Romanian, Portuguese, and English. The updated thesaurus comprises 370 concepts structured in accordance with the ISO 25964 international standard and implemented using SKOS (Simple Knowledge Organization System). Its RDF representation has been enhanced with the technical mechanisms required to associate language tags with individual concepts. This transition from a monolingual to a multilingual thesaurus was carried out using the Opentheso platform and included a series of semantic conformity checks. It ensures semantic interoperability at the European level and gives digital humanities researchers access to controlled vocabularies in their working languages. The thesaurus remains accessible via the Huma-Num portal and continues to be integrated within the platform’s open and semantic data infrastructure.
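To make the idea of language tags on thesaurus concepts more concrete, the following minimal sketch serialises one SKOS concept with multilingual preferred labels in Turtle syntax. It uses only the Python standard library and is an illustration under stated assumptions, not the project's Opentheso workflow: the concept URI, the helper names, and the labels are invented for the example and are not taken from the Typology of Textual Genres thesaurus.

```python
# Illustrative sketch only. The URI and labels are hypothetical and do not
# come from the actual thesaurus; only the SKOS namespace is real.
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n\n"


def turtle_literal(text: str, lang: str) -> str:
    """Format a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_to_turtle(uri: str, pref_labels: dict) -> str:
    """Serialise one skos:Concept with one preferred label per language.

    Keying the labels by language code keeps at most one prefLabel per
    language, which is what SKOS expects.
    """
    labels = " ,\n        ".join(
        turtle_literal(text, lang) for lang, text in sorted(pref_labels.items())
    )
    return f"<{uri}> a skos:Concept ;\n    skos:prefLabel {labels} .\n"


# Invented example concept with labels in six languages.
labels = {
    "fr": "roman", "en": "novel", "es": "novela",
    "it": "romanzo", "ro": "roman", "pt": "romance",
}
doc = SKOS_PREFIX + concept_to_turtle("https://example.org/genres/novel", labels)
print(doc)
```

In practice a dedicated RDF library or the thesaurus platform itself would handle serialisation and validation; the point here is only that each label carries its own language tag, so a client can select the label matching a researcher's working language.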

Events 

  • 13 November 2025 | Lyon, France - Practical evaluation of the CAO_CRM model: connecting disciplinary logics and data management systems, at the General Assembly of the ARIANE consortium
  • January 2026 - The AMIS project took a decisive step in its development with the in-person presentation of the test version of its metadata assistant at a Paris meeting with European consortium partners, held to conduct hands-on testing in the project’s languages: French, Spanish, Italian, Romanian, Portuguese and English.
  • 20-22 May 2026 | Paris, France - The AMIS project will organise a dedicated workshop on metadata management and present the AMIS assistant application to the broader scholarly community at Humanistica 2026

Keywords
metadata enrichment, natural language processing, text analysis, machine learning, (meta)data quality assessment, digital humanities
Project start date:
Project duration:
24 months

Principal investigator

Fatiha Idmhand
University of Poitiers
BIO

Fatiha Idmhand is currently a Professor at the University of Poitiers and a Researcher at the Institut des Textes et Manuscrits Modernes (Archivos, UMR-8132, Paris). Her academic background is in Literature and Hispanic Studies (Spanish and Spanish American Studies), with a particular focus on creative processes (genetic criticism studies) and Digital Humanities.

Her scientific research focuses on archives and manuscripts, particularly in relation to key themes such as cultural transfers and the circulation of literature, arts, and ideas between Europe and the Americas during the major conflicts and crises of the 20th and 21st centuries.

She is currently working on cultural mediators, new typologies for manuscripts, and methods in literary computing.

QUOTE
"The AMIS project offers an innovative web tool that streamlines metadata creation, enriches research with relevant information, and enhances discoverability, advancing Open Science by promoting accessible and well-structured scholarly data in cultural and textual studies."

Resources