Text correction in Latvian using the Gemma-2 model (Tekstu korekcija latviešu valodā, izmantojot Gemma-2 modeli)

Author

Poļakova, Anna

Co-author

Latvijas Universitāte. Eksakto zinātņu un tehnoloģiju fakultāte

Advisor

Bārzdiņš, Guntis

Date

2025

Abstract

This thesis investigates whether the large language model Gemma-2 can be improved at editing Latvian text. The model was fine-tuned on the Latvian-language corpus Norma, and its output was compared with that of the original model without fine-tuning and with sentence versions produced by 26 human editors. Sentence similarity was measured with several metrics: Levenshtein distance, BLEU, ROUGE-L, and ChrF. The results show that the fine-tuned model edits better, is more conservative, and agrees more often with the editors' versions. The thesis also includes an in-depth analysis of the model's typical errors and of its disagreements with the editors.
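As an illustration of the simplest of the metrics named in the abstract, the following is a minimal sketch of character-level Levenshtein distance and a length-normalized similarity derived from it. It is not taken from the thesis: the function names `levenshtein` and `normalized_similarity`, the normalization by the longer string, and the example sentences are all assumptions made here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance: the minimum number of insertions,
    deletions, and substitutions needed to turn `a` into `b`."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical strings.
    Normalizing by the longer string is an assumption of this sketch."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    # Hypothetical sentences, not taken from the Norma corpus or the thesis.
    model_output = "Viņš iet uz skolu."
    editor_version = "Viņš iet uz skolu katru dienu."
    print(levenshtein(model_output, editor_version))
    print(normalized_similarity(model_output, editor_version))
```

Under this reading, a model whose output stays close to an editor's version scores a low distance, and a model that makes few changes overall stays close to the input sentence as well. BLEU, ROUGE-L, and ChrF measure overlap differently (word n-grams, longest common subsequence, and character n-grams respectively) and are usually computed with an existing library.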

URI

https://dspace.lu.lv/dspace/handle/7/71565

Collections

Bakalaura un maģistra darbi (EZTF) / Bachelor's and Master's theses