N-DL questions: Digital Linguistics
- Applications of natural language processing: automatic morphological and syntactic analysis. Semantic analysis of sentences. Speech processing, dialogue systems. Text classification, information extraction. Sentiment analysis, named entity recognition, question answering, machine translation (PA153, IA161, PA156, IV029, PLIN034, PLIN063).
- Machine learning for natural language processing: corpora, language models. Text classification (Naive Bayes, neural-network-based approaches). Vector representations of words, phrases and documents. Convolutional networks for text processing. Recurrent neural networks for language modeling and sequence processing; transformers, large language models. (PA153, IA161, PA154)
- Linguistic analysis: the morphological dictionary as part of an automatic analyzer - how grammatical meanings are captured in the morphological dictionary, and how standard and substandard forms are captured in it (PLIN041, PLIN032, PLIN037, PLIN077, PLIN078).
- Linguistics in theory: Parts of speech - criteria of classification (morphological, syntactic, semantic). Sentence constituents - subject, object, adverbial, predicate, complement, attribute (how they can be recognized and what their properties are). Nouns - grammatical categories of nouns; declension paradigms. Adjectives - types of declension (compound, nominal, mixed declension of possessive adjectives). Verbs - grammatical categories of verbs; finite and non-finite forms, synthetic and analytic forms; conjugation paradigms/verb classes. (PLIN063, PLIN065, PLIN034, PLIN078)
- Lexicology and lexicography: vocabulary - its structure; developmental tendencies, neologisms. Lexicography - its subject matter; computer lexicography - dictionary writing systems, markup of dictionary entries; dictionary construction, macrostructure and microstructure illustrated on a selected dictionary; dictionary typology. Territorial stratification of the national language - outline of Czech dialects; leveling processes, interdialects and Common Czech. Norm, usage, codification - cultivation of the standard language; current codification handbooks (PLIN035, CJJ15, PLIN033).
- Corpus linguistics: history of corpus linguistics - early corpus linguistics, Chomsky's critique of corpus linguistics, building the first corpora. The development of corpus linguistics. Automatic tools for studying grammar built over language corpora - specific applications, using more complex CQL queries to study the grammatical system of a language. Selection of a suitable corpus for solving a linguistic problem - freely available corpora and their characteristics, DIY corpora, corpus-based linguistic handbooks. (CJBB105, IB047)
- Statistics: Methods of data analysis. Parametric models - parameter estimation, hypothesis testing, ANOVA, independence testing, non-parametric tests. Linear regression models. (MV013)
- Discrete mathematics: mathematical induction. Binary relations, closures, transitivity. Equivalences and partially ordered sets. Composition of relations and functions. The concept of a graph, isomorphism, connectivity, trees, spanning trees. (IB000)
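As an illustration of the Naive Bayes text classification listed under machine learning for NLP, here is a minimal sketch of a multinomial Naive Bayes classifier with add-alpha (Laplace) smoothing. It assumes documents are already tokenized into lists of words; the function names `train_naive_bayes` and `predict` and the toy sentiment data are hypothetical, chosen only for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, alpha=1.0):
    """Estimate log priors and add-alpha smoothed log likelihoods (multinomial model)."""
    vocab = {word for doc in docs for word in doc}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word frequencies within that class
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)

    # P(c): relative frequency of each class among the training documents
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}

    # P(w | c) = (count(w, c) + alpha) / (total words in c + alpha * |V|)
    log_likelihood = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_likelihood


def predict(doc, log_prior, log_likelihood):
    """Return the class maximizing log P(c) + sum of log P(w | c); unseen words are skipped."""
    scores = {
        c: log_prior[c] + sum(log_likelihood[c][w] for w in doc if w in log_likelihood[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Toy usage with hypothetical, pre-tokenized sentiment data
docs = [["good", "great", "film"], ["great", "acting"],
        ["bad", "boring", "film"], ["boring", "plot"]]
labels = ["pos", "pos", "neg", "neg"]
model = train_naive_bayes(docs, labels)
print(predict(["great", "film"], *model))  # prints pos
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied; the smoothing term keeps a word that never occurred with a class from forcing that class's probability to zero.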
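The closure and transitivity topics from the discrete mathematics item can be made concrete with Warshall's algorithm, which computes the transitive closure of a binary relation on a finite set. In this sketch a relation is assumed to be a Python set of ordered pairs; `transitive_closure` and `is_transitive` are hypothetical helper names.

```python
def transitive_closure(relation, elements):
    """Smallest transitive relation containing `relation` (Warshall's algorithm)."""
    closure = set(relation)
    # After processing k, closure contains (i, j) whenever j is reachable from i
    # using only intermediate elements that have been processed so far.
    for k in elements:
        for i in elements:
            if (i, k) in closure:
                for j in elements:
                    if (k, j) in closure:
                        closure.add((i, j))
    return closure


def is_transitive(relation):
    """Check that (a, b) and (b, c) in the relation always imply (a, c)."""
    return all((a, d) in relation
               for (a, b) in relation
               for (c, d) in relation
               if b == c)


# Toy usage: the "immediate successor" relation on {1, 2, 3, 4}
elements = [1, 2, 3, 4]
successor = {(1, 2), (2, 3), (3, 4)}
closure = transitive_closure(successor, elements)
print(sorted(closure))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

Warshall's algorithm takes O(n³) steps for n elements. An equivalent but usually slower route is to keep composing the relation with itself and taking the union (R ∪ R∘R ∪ ...) until no new pairs appear.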
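For the linear regression item in the statistics topic, the simplest case (one predictor fitted by ordinary least squares) has a closed-form solution: slope b1 = S_xy / S_xx and intercept b0 = mean(y) − b1 · mean(x). The sketch below assumes plain Python lists of equal length; `fit_simple_linear_regression` is a hypothetical name, and it omits the standard errors and coefficient tests that a statistics package would also report.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1, r_squared)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Centered sum of squares of x and cross-products of x and y
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    if s_xx == 0:
        raise ValueError("x must not be constant")
    b1 = s_xy / s_xx
    b0 = y_mean - b1 * x_mean
    # Coefficient of determination: share of the variance of y explained by the line
    # (undefined, returned as NaN, when y is constant)
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return b0, b1, r_squared


# Toy usage with hypothetical data lying exactly on y = 1 + 2x
b0, b1, r2 = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1, r2)  # prints 1.0 2.0 1.0
```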