Science cluster

SSHOC - Social Sciences and Humanities

Summary

Lexicon and grammar are essential to spoken and written communication, and communication is more efficient when they follow the rules of the standardised language. Good-quality writing is considered a social advantage because it reduces misunderstandings. It is part of a person’s social capital.

Opravidlo 2.0 aims to bring Opravidlo, an online proofreading service currently in beta, into its operational phase. The project advances Czech automatic proofreading by integrating AI with expert linguistic rules, with the goal of improving recall while maintaining precision. Opravidlo has over 161,000 monthly users and corrects spelling, grammar, and typesetting errors with high precision, contributing to Open Science and aiding both native speakers and foreigners. Opravidlo 2.0 will enhance recall and precision, empowering users from different linguistic backgrounds to communicate more effectively in Czech.

Research domains:
Social Science and Humanities
Partner(s):
Masaryk University, Faculty of Informatics and Faculty of Arts

Challenge

Open Science project, Cross-domain/Cross-RI

Maintaining high precision in automatic proofreading while improving recall is a critical challenge. Current systems, including the beta version of Opravidlo, identify spelling and grammatical issues effectively but do not cover all orthographic phenomena. Low recall can lead to user dissatisfaction, because incomplete error detection is a problem for a wide range of language users. The risk is that users will lose trust in the tool and stop using it. Most current tools are outdated (over 15 years old) and may not account for shifts in formal Czech, such as influences from social media or the introduction of new words.
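The trade-off between precision and recall can be made concrete with a small calculation. The following is a minimal sketch, not part of Opravidlo: the function name and the error positions are hypothetical and only illustrate how a proofreader can be fully precise while still missing half of the errors.

```python
def precision_recall(detected: set[int], actual: set[int]) -> tuple[float, float]:
    """Return (precision, recall) for a set of flagged error positions.

    precision = correctly flagged errors / all flagged positions
    recall    = correctly flagged errors / all real errors
    """
    true_positives = len(detected & actual)
    precision = true_positives / len(detected) if detected else 1.0
    recall = true_positives / len(actual) if actual else 1.0
    return precision, recall


# Hypothetical data: token positions of real errors in one text,
# and the positions a proofreader actually flagged.
actual_errors = {3, 8, 15, 21}
detected_errors = {3, 8}

precision, recall = precision_recall(detected_errors, actual_errors)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this illustration every flagged position is a real error, so precision is 1.0, but only two of the four errors are found, so recall is 0.5.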

Solution

Opravidlo 2.0 plans to combine the current rule-based system with artificial intelligence (AI) approaches, enhancing both recall and precision. New neural network models will be developed and trained for the individual proofreader modules, refining error detection in areas such as punctuation, grammatical agreement, and sentence syntax. Neural models will also be applied to the explainability module.

Within the project, a set of formalised grammar rules will be published. This dataset can be adapted for languages with similar orthography rules. It will be accompanied by a corpus of corrected texts, collected with the users’ prior permission, that includes the Opravidlo 2.0 suggestions, the rules applied, and information about whether the user accepted each correction.
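One entry in such a corpus could be represented roughly as follows. This is an illustrative sketch only: the class, the field names, and the rule identifier are assumptions, since the project description does not specify the published format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class CorrectionRecord:
    """Hypothetical shape of one corpus entry (field names are assumptions)."""
    original_text: str   # text as written by the user
    suggestion: str      # correction proposed by the proofreader
    applied_rule: str    # identifier of the rule that triggered the suggestion
    accepted: bool       # whether the user accepted the correction


record = CorrectionRecord(
    original_text="Děti si hráli na zahradě.",
    suggestion="Děti si hrály na zahradě.",
    applied_rule="subject-verb-agreement",  # hypothetical rule identifier
    accepted=True,
)

# Serialise to one JSON line; ensure_ascii=False keeps Czech diacritics readable.
line = json.dumps(asdict(record), ensure_ascii=False)
print(line)
```

Keeping the acceptance flag next to the applied rule would make it possible to check, rule by rule, how often users agree with a suggestion.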

The tool will also expand its utility with an internationalised interface to support Czech language learners from diverse backgrounds, including speakers of Slovak, Ukrainian, Russian, and English.

All relevant data and methods will be published mainly via CLARIN ERIC and DARIAH ERIC.

Scientific Impact

Opravidlo 2.0 will contribute to Open Science by publishing its grammar rules and anonymised corrected texts via the CLARIN infrastructure. This will enable linguistic analysis, teaching, and tool development across languages with similar orthography systems. The project’s hybrid AI approach aims to set a new standard for proofreading technologies, benefiting both public services and future linguistic research. A high-precision proofreader will also serve the public good for native Czech speakers of all age groups and accelerate the integration of immigrants into Czechia.

All materials will be released via the CLARIN infrastructure. Among other uses, the published material can serve for linguistic analysis, language teaching, and the training and evaluation of automated proofreaders in different languages.


Keywords
proofreading, Opravidlo, Czech language, artificial intelligence, AI tool
Project start date:
Project duration:
24 months

Principal investigator

Ales Horak
Masaryk University
BIO

Ales Horak is an Associate Professor of Computer Science at Masaryk University, Brno, Czech Republic. His research focuses on natural language processing, large language models, knowledge representation and reasoning, stylometry, and grammar error checking.

QUOTE
"Mastering written communication can be challenging. Strong language skills are more than a benefit for everyone—they're crucial in everyday life. Whether you're a native speaker or a newcomer, Opravidlo 2.0 is designed to support you using a combination of linguistic expertise and advanced AI."