Mūsdienīgas metodes latviešu valodas leksisko resursu pilnveidei

Year	Jan 2026–Dec 2028
Funding	Latvian Council of Science, Fundamental and Applied Research Projects, lzp-2025/1-0685
Abstract	The project aims to study and develop up-to-date methods for extending lexical resources. To reduce the manual linguistic work involved in preparing new word entries, the project will study the use of large language models (LLMs) for separating word senses and drafting sense definitions. It will also produce a set of criteria for evaluating automatically generated data and a methodology for applying this data in practice to extend the Latvian electronic dictionary Tēzaurs. Word candidates not yet included in existing lexical resources will be obtained by collecting newer text corpora and through crowdsourcing; crowdsourcing will also be used to validate the suggested word candidates and their senses. From a linguistic perspective, the project involves fundamental research in lexical semantics (word sense division) and lexicography (the creation of word definitions); from a computational linguistics perspective, it explores the practical application of LLMs to specific tasks.

Contemporary Methods for the Development of Latvian Lexical Resources