Common Writing Errors in Latvian: Corpus-Driven Error Analysis and Text Correction (Norma)

Year 2024 Jan–2026 Dec
Funding Latvian Council of Science
Fundamental and Applied Research Projects
lzp-2023/1-0481
Abstract The aim of the project is to create a semi-automatically error-annotated corpus of texts produced by native speakers of Latvian, in which the most common errors in Latvian will be documented, corrected and explained. The corpus creation methodology and the resulting data will be used to analyze how language errors affect the grammatical system of Latvian and to develop state-of-the-art corpus-based guidelines for improving the quality of written language. An error-annotated corpus is also required for developing high-level grammar checkers that can spot complex structural errors, complementing low-level spell checkers.

Publications

R. Dargis, G. Barzdins, I. Skadina, N. Gruzitis, B. Saulite
Evaluating Open-Source LLMs in Low-Resource Languages: Insights from Latvian High School Exams
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities, Association for Computational Linguistics, 2024