Technical Reports
A List by Author: Tomáš Ptáčník
- e-mail: ptacnik(a)fi.muni.cz
On Disambiguation in Czech Corpora
by Luboš Popelínský, Tomáš Pavelek, Tomáš Ptáčník, October 2000, 26 pages.
FIMU-RS-2000-07. Available as PostScript and PDF.
Abstract:
Lemma disambiguation means finding the correct basic form (lemma) of a word: typically the nominative singular for nouns or the infinitive for verbs. We developed a multistrategy method for lemma disambiguation of unannotated text, based on a combination of inductive logic programming and instance-based learning. We present results for the most important subtasks of lemma disambiguation in Czech. Although no expert knowledge of Czech grammar was used, accuracy reaches 90%, with a fraction of words remaining ambiguous. We also present first results of tag disambiguation.
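To make the ideas of lemma disambiguation and instance-based learning concrete, here is a minimal, hypothetical Python sketch. It is not the authors' method: it leaves out the inductive logic programming component entirely, and the training instances, the Jaccard context-overlap similarity, and the function names are all assumptions made for illustration. The Czech form "ženu" is genuinely ambiguous. It can be the accusative singular of the noun "žena" ("woman") or the first-person singular present of the verb "hnát" ("to drive, chase").

```python
from __future__ import annotations

from collections import Counter

# Hypothetical stored instances: (word form, context words, correct lemma).
TRAINING = [
    ("ženu", {"vidím", "krásnou"}, "žena"),
    ("ženu", {"potkal", "mladou"}, "žena"),
    ("ženu", {"se", "domů"}, "hnát"),
    ("ženu", {"stádo", "na", "pastvu"}, "hnát"),
]


def overlap(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two sets of context words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disambiguate(form: str, context: set[str], k: int = 3) -> str | None:
    """Choose a lemma for `form` by similarity-weighted vote of the k nearest instances.

    Returns None (the word stays ambiguous) when the form is unknown,
    no neighbour shares any context, or the vote is tied.
    """
    scored = sorted(
        ((overlap(context, ctx), lemma) for f, ctx, lemma in TRAINING if f == form),
        reverse=True,
    )
    votes: Counter[str] = Counter()
    for sim, lemma in scored[:k]:
        if sim > 0:  # ignore neighbours with no shared context
            votes[lemma] += sim
    if not votes:
        return None
    best = votes.most_common(2)
    if len(best) == 2 and best[0][1] == best[1][1]:
        return None  # tie: leave the word ambiguous
    return best[0][0]


print(disambiguate("ženu", {"se", "rychle", "domů"}))  # hnát
```

Returning None instead of guessing mirrors, in a simplified way, the fact that some words may remain ambiguous after disambiguation.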
Responsible contact:
veda@fi.muni.cz