Research on Modern Latvian Language and Development of Language Technology (LATE)

Year 2022 Dec–2024 Dec
Funding State Research Programme
Letonika – Fostering a Latvian and European Society
VPP-LETONIKA-2021/1-0006
Partners Latvian Language Institute UL, Liepaja University, Faculty of Humanities UL, Institute of Literature, Folklore and Art UL
Abstract The aim of the project is to advance research on the grammatical, lexical-semantic, phonetic and phonological systems of the modern Latvian language and of Latvian sign language using data-driven methods, and to develop sustainable Latvian language resources and tools. To achieve this goal, a Latvian speech corpus and a pilot corpus of Latvian sign language will be developed, and Tezaurs.lv and the “Dictionary of Contemporary Latvian” will be improved. Based on Latvian grammar studies, the “Latvian Treebank” will be enhanced. These resources will be integrated into a single Latvian language research infrastructure, as well as into the CLARIN-LV repository. During the project, a LATE platform for speech transcription and subtitling will be created.
Homepage http://www.digitalhumanities.lv/projects/vpp-late/

Publications

R. Dargis, A. Znotins, I. Auzina, B. Saulite, S. Reinsone, R. Dejus, A. Klavinska, N. Gruzitis
BalsuTalka.lv – Boosting the Common Voice Corpus for Low-Resource Languages
Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING), 2024
PDF, BibTeX
I. Auzina, N. Gruzitis, R. Dargis, G. Rabante-Busa, D. Gosko, J. Vempers, R. Kivkucans, A. Znotins
Recent Latvian Speech Corpora for Linguistic Research and Technology Development
Baltic Journal of Modern Computing, 12(4), 646-658, 2024
PDF, DOI, BibTeX
A. Klints, M. Grasmanis, G. Nespore-Berzkalne, L. Pretkalnina, M. Stade, N. Gruzitis, I. Lokmane, P. Paikens, L. Rituma, A. Spektors
Tēzaurs as a Digital Multifunctional Lexical Resource
Baltic Journal of Modern Computing, 12(4), 513-525, 2024
PDF, DOI, BibTeX
L. Rituma, G. Nespore-Berzkalne, A. Klints, I. Lokmane, M. Stade, P. Paikens
Classifying Multi-Word Expressions in the Latvian Monolingual Electronic Dictionary Tēzaurs.lv
6th International Conference Computational Linguistics in Bulgaria (CLIB), 113-118, 2024
PDF, BibTeX
I. Lokmane and B. Saulite
Infinitīva palīgteikumi un teikuma tipu robežgadījumi „Nacionālajā korpusu kolekcijā”
Linguistica Lettica, 32, 308-330, 2023
PDF, DOI, BibTeX
G. Rabante-Busa
Partikulas "kaut" izrunas varianti
Vārds un tā pētīšanas aspekti, 27, 184-191, 2023
BibTeX
B. Saulite, I. Auzina, R. Dargis
Nacionālā korpusu kolekcija Korpuss.lv
Linguistica Lettica, 31(1), 202-223, 2023
PDF, DOI, BibTeX
L. Rituma, G. Nespore-Berzkalne, B. Saulite, L. Pretkalnina
Vārdkopas analogi „Latviešu valodas sintaktiski marķētajā korpusā”
Valoda: nozīme un forma, 14, 156-173, 2023
PDF, DOI, BibTeX
L. Lauze and I. Auzina
Korpusu un individuālā vākuma salīdzinājums: ģenitīva un nominatīva konkurence saistījumā ar adverbu
Valoda: nozīme un forma, 14, 111-125, 2023
PDF, DOI, BibTeX
L. Pretkalnina
Formāls latviešu valodas gramatikas modelis un tā realizācija mašīnlasāmā sintakses korpusā
2023
PDF, BibTeX
M. Grasmanis, P. Paikens, L. Pretkalnina, L. Rituma, L. Strankale, A. Znotins, N. Gruzitis
Tēzaurs.lv – The Experience of Building a Multifunctional Lexical Resource
Electronic lexicography in the 21st century (eLex): Invisible Lexicography, 2023
PDF, BibTeX
I. Skadina, I. Auzina, R. Dargis, E. Lasmanis, A. Voitkans
CLARIN-LV: Many Steps till Operation
CLARIN Annual Conference, 2022
PDF, BibTeX
B. Saulite, R. Dargis, N. Gruzitis, I. Auzina, K. Levane-Petrova, L. Pretkalnina, L. Rituma, P. Paikens, A. Znotins, L. Strankale et al.
Latvian National Corpora Collection – Korpuss.lv
13th Language Resources and Evaluation Conference (LREC), 2022
PDF, BibTeX
R. Dargis, I. Auzina, I. Kaija, K. Levane-Petrova, K. Pokratniece
LaVA – Latvian Language Learner corpus
13th Language Resources and Evaluation Conference (LREC), 2022
PDF, BibTeX
P. Paikens, M. Grasmanis, A. Klints, I. Lokmane, L. Pretkalnina, L. Rituma, M. Stade, L. Strankale
Towards Latvian WordNet
13th Language Resources and Evaluation Conference (LREC), 2022
PDF, BibTeX