Biežākās kļūdas latviešu valodā: korpusā balstīta kļūdu analīze un teksta labošana (Norma)

Year	2024 Jan–2026 Dec
Funding	Latvian Council of Science Fundamental and Applied Research Projects lzp-2023/1-0481
Abstract	The project aims to create a semi-automatically error-annotated corpus of texts written by native speakers of Latvian, in which the most common errors in Latvian are documented, corrected, and explained. The corpus creation methodology and the resulting data will be used to analyze how language errors affect the grammatical system of Latvian and to develop state-of-the-art, corpus-based guidelines for improving the quality of written language. An error-annotated corpus is also needed to develop high-level grammar checkers that can detect complex structural errors, complementing low-level spell checkers.
Homepage	https://norma.korpuss.lv/

Common Writing Errors in Latvian: Corpus-Driven Error Analysis and Text Correction (Norma)

Publications

R. Dargis, G. Barzdins, I. Skadina, N. Gruzitis, B. Saulite
Evaluating Open-Source LLMs in Low-Resource Languages: Insights from Latvian High School Exams
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities, Association for Computational Linguistics, 2024