Machine-translated page (translated using DeepL), provided for better accessibility to English-speaking readers.

N-UIZD Questions: Artificial Intelligence and Data Processing

Common part of the programme

  1. Artificial intelligence methods: state space search, local search and single-solution metaheuristics, population metaheuristics (evolutionary algorithms, swarm intelligence). Planning, problem representation, state space planning. Reasoning under uncertainty, Bayesian networks, exact and approximate inference, time and uncertainty, utility theory, Markov decision processes, value iteration, policy iteration. Robotics, robot motion planning (configuration space, combinatorial and probabilistic approaches).
  2. Statistics: basic statistical methods (point estimates, confidence intervals, statistical hypothesis testing). ANOVA. Nonparametric hypothesis tests. Multiple linear regression, autocorrelation, multicollinearity. Principal Component Analysis (PCA). (MA012)
  3. High-performance and intensive computing: superscalar, multi-core and many-core (GPU, MIC) processors, MIMD and SIMD parallelism. Memory organization, shared and distributed memory, cache coherence. Code optimization, optimizing compilers. Distributed systems, network topology. Programming parallel and distributed systems. (PA039)
  4. Databases: data storage, record addressing. Indexing and hashing on multiple attributes, bitmap indexes, dynamic hashing. Query evaluation and algorithms, statistics and cost estimation. Query and schema optimization, query transformation rules, data partitioning. Query and schema tuning. Transaction processing, failures and recovery. Security, access permissions. (PA152)
  5. Neural networks: Multilayer networks and their expressive power. Training neural networks: Gradient descent, backpropagation, practical training issues (data preparation, weight initialization, hyperparameter selection and adaptation). Regularization. Convolutional networks. Recurrent networks. (PV021)
  6. Machine learning: Basic machine learning methods (decision trees including regression trees, SVM, naive Bayes, kNN). Semi-supervised learning and active learning. Ensemble learning. Basics of anomaly analysis. Advanced methods for evaluating experiments (cross-validation, ROC curves, AUC, M learning algorithms on N datasets, bootstrapping). Theoretical foundations of machine learning (generalization relations in propositional and predicate logic, hypothesis and version space, bias-variance trade-off). (PV021, PV056)
  7. Knowledge discovery: Data preprocessing. Mining frequent patterns and association rules. Machine learning and data mining tools (in general + description of one in detail). Temporal data analysis. (PV056)
  8. Visualization: basic metrics for evaluating visualization quality (effectiveness and expressiveness), eight basic visual variables. Basic visualization techniques for 1D, 2D, 3D (explicit and implicit surface representation). Techniques for visualizing multidimensional data (parallel coordinates, RadViz, scatterplot matrices) and hierarchical structures (treemaps, dimensional stacking). Basic classes of interaction techniques (fisheye, perspective walls), specifics of applying interaction techniques in the space of the data itself and in the space of its attributes. (PV251)

Specialization - Processing and analysis of large-scale data

  1. Data analysis. Data warehouses and their lifecycle, data marts, dimensional model and its implementation (star schema, data cube). Data extraction, transformation and loading (ETL), data profiling, data integrity, data quality.
  2. Advanced search techniques. Search: principles, operators for data retrieval, evaluation of results, metrics. Distributed data processing, the MapReduce technique and its applications, distributed file systems. Processing and filtering of data streams, examples of applications. (PA212) (required for study according to the 2022/2023 or later control template)
  3. Similarity search. Principles of similarity search: metric space, descriptor extraction and its relationship with human-perceived similarity, query types and their definitions. Principles of indexing: data partitioning, data filtering (pivoting). Comparison with traditional indexes (B+ trees). (PA128) (required for study according to the 2022/2023 or later control template)
  4. Cloud computing and distributed databases. Cloud computing: basic principles, infrastructure as a service (IaaS), virtualization and containers, migration to the cloud, security of services, horizontal and vertical scalability. Current technologies and cloud service providers. Distributed databases: principles and benefits of the NoSQL approach, consistency, data distribution. Key-value stores, document databases, graph databases, column-oriented databases. (PA200, PA195)
  5. Software engineering. Software development process. Rational Unified Process methodology. Agile software development. Testing phases and test types. Software metrics, code refactoring. Software quality. Estimating the cost and time of software development. Maintenance and reusability. (PA017)
  6. Applied cryptography. Symmetric and asymmetric cryptography, differences and applications. Hash functions and their applications. Digital signature: construction, non-repudiation, public key management, certification authorities and public key infrastructures. Authentication, authorization and access control. (PV079) (required for study according to the 2021/2022 or earlier control template)
  7. Programming, file organization and administration. UNIX system: kernel architecture, kernel memory model. Program: start and exit, arguments, environment variables. Process: process attributes, process states, communication between processes (pipes, signals, reliable signals). Indexing and hashing: B+ trees, linear and extendible hashing, locality-sensitive hashing (LSH). File system: principles, data organization, external memory features, I/O operations, advanced I/O operations (multiplexing with select() and poll(), file locking, scatter-gather I/O, memory-mapped I/O), special files, distributed file systems. (PV065, PA152, PA212) (required for study according to the 2021/2022 or earlier control template)

Specialization - Machine Learning and Artificial Intelligence

  1. Probability in computer science: definition of probability space. Random variable, definition and use, Markov and Chebyshev inequalities. Random processes, Markov chains (DTMC and CTMC), invariant distributions, ergodic theorem for DTMC. Information theory (entropy, mutual information), coding theory (Kraft and McMillan theorems, Huffman coding, noisy-channel capacity theorem).
  2. Computational logic: complexity and computability of the satisfiability problem. Resolution method in propositional and predicate logic. Prolog language, relational algebra and Datalog. Tableau proofs in propositional, predicate and modal logic. Natural deduction. Inductive inference in propositional logic. Bisimulation and temporal logics. (IA008)
  3. Natural language processing: corpora, language models. Automatic morphological and syntactic tagging. Text classification, information extraction. Recurrent neural networks for language modeling, sequence processing, transformers. Question answering, machine translation. (PA153)
  4. Constraint programming. Algorithms and consistency: arc consistency, path consistency, k-consistency, generalized arc consistency, bounds consistency, directional variants, width of the constraint graph. Tree search, look-ahead, look-back, incomplete tree search. Modeling with constraints, global constraints, constraints for scheduling, programming with the CPLEX Optimization Programming Language. (PA163)
  5. Artificial intelligence in image processing: image formation (PSF, OTF, sampling). Image classification (VGGNet, GoogLeNet, ResNet, SENet). Object detection (R-CNN, Fast R-CNN, Faster R-CNN, YOLO). Image segmentation (FCN, UNet, Mask R-CNN). Conditional and unconditional generative models (autoregressive models, VAEs, GANs). Models based on convolutional networks and transformers (attention, CNN vs. ViT). (PA228) (required for study under the 2022/2023 or later control template).
  6. Machine Learning. Logic and machine learning (multirelational learning). Metalearning and automated machine learning (AutoML). Advanced anomaly analysis methods. Text categorization. Disambiguation by machine learning methods. Information extraction from text. (PV056, PA153) (required for study according to the 2021/2022 or older control template)

Specialization - Bioinformatics and Systems Biology

  1. Fundamentals of bioinformatics. Fundamentals of molecular biology: structure of prokaryotic and eukaryotic cells, structure and function of nucleic acids and proteins, replication, transcription and translation. Bioinformatics, definition, field of interest, bioinformatics data. Genomics, the genome and methods for its study, PCR, DNA sequencing, genome organization. Proteomics, the proteome and methods for its investigation. Mass spectrometry of proteins. Basics of phylogenetics, methods of constructing phylogenetic trees. Sequence similarity, sequence alignment, related algorithms (IV107, IV108).
  2. Advanced bioinformatics methods. Computational tools for genome analysis, in silico gene identification, genome browsers. Biological sequences and information theory. DNA and RNA structure, melting temperature estimation and Nussinov's algorithm. Hidden Markov models and their use in bioinformatics. Advanced techniques for working with NGS data, metagenomics. Sequence motif search and genome annotation. Analysis of protein structures and their prediction from amino acid sequence. (IV108, PV269)
  3. Modeling and analysis of biological processes. Biological model specification: biological networks and pathways, static analysis of biological networks. Modeling and simulation of biological processes. Deterministic continuous model: law of mass action, kinetics of enzymes and gene regulation. Stochastic models: continuous-time Markov chains, stochastic Petri nets, Gillespie's algorithm (SSA). Rule-based languages for specification of biological models. Hypothesis specification using temporal logics, model robustness with respect to temporal properties. Qualitative models: Boolean networks and their analysis. (PB050, PA054)
  4. Continuous and hybrid systems: definition of a system, object, model. Dynamical system, transition function, system dimension, state equations. Continuous, discrete and hybrid systems. Linear and nonlinear systems, linearization. Stability and its characterization. System identifiability, parameter estimation. Reachability in hybrid systems. Basic concepts of control theory: controllability, observability. (IV120)

Specialization - Natural Language Processing

  1. Natural language processing: corpora and their annotation. Automatic morphological and syntactic analysis. Text classification, information extraction. Sentiment analysis, named entity recognition. Recurrent neural networks for language modeling, sequence processing, transformers. Question answering, machine translation. (PA153, IA161)
  2. Language modeling: language model, noisy-channel methods, Markov models, hidden Markov models (HMMs), smoothing. Neural models such as GPT, large language models, prompt engineering, models tuned for dialogue. (PA154)
  3. Computational logic: complexity and computability of the satisfiability problem. Resolution method in propositional and predicate logic. Prolog language, relational algebra and Datalog. Tableau proofs in propositional, predicate and modal logic. Natural deduction. Inductive inference in propositional and predicate logic. Bisimulation and temporal logics. (IA008)
  4. Probability in computer science: definition of probability space. Random variable, definition and use, Markov and Chebyshev inequalities. Random processes, Markov chains (DTMC and CTMC), invariant distributions, ergodic theorem for DTMC. Information theory (entropy, mutual information), coding theory (Kraft and McMillan theorems, Huffman coding, noisy-channel capacity theorem).