N-UIZD Questions: Artificial Intelligence and Data Processing

Common core of the programme

Artificial intelligence methods: state space search, local search and single-solution metaheuristics, population-based metaheuristics (evolutionary algorithms, swarm intelligence). Planning: problem representation, state space planning. Reasoning under uncertainty: Bayesian networks, exact and approximate inference, time and uncertainty, utility theory, Markov decision processes, value iteration, policy iteration. Robotics: robot motion planning (configuration space, combinatorial and probabilistic approaches). (IV126)
Statistics: basic statistical methods (point estimates, confidence intervals, statistical hypothesis testing). ANOVA. Nonparametric hypothesis tests. Multiple linear regression, autocorrelation, multicollinearity. Principal Component Analysis (PCA). (MA012)
High-performance and intensive computing: superscalar, multi-core and many-core (GPU, MIC) processors, MIMD and SIMD parallelism. Memory organization: shared and distributed memory, cache coherence. Code optimization, optimizing compilers. Distributed systems, network topologies. Programming parallel and distributed systems. (PA039)
Databases: data storage, record addressing. Indexing and hashing on multiple attributes, bitmap indexes, dynamic hashing. Query evaluation algorithms, statistics and cost estimation. Query and schema optimization, query transformation rules, data partitioning. Query and schema tuning. Transaction processing, failures and recovery. Security, access permissions. (PA152)
Neural networks: multilayer networks and their expressive power. Training neural networks: gradient descent, backpropagation, practical training issues (data preparation, weight initialization, hyperparameter selection and adaptation). Regularization. Convolutional networks. Recurrent networks. (PV021)
Semi-supervised learning and active learning. Ensemble learning. Fundamentals of anomaly analysis. Advanced methods for evaluating experiments (cross-validation, ROC curves, AUC, comparing M learning algorithms on N datasets, bootstrapping). Theoretical foundations of machine learning (generalization relations in propositional and predicate logic, hypothesis space and version space, bias-variance trade-off). (PV056) for graduates of the course up to and including Spring 2024.
Knowledge mining: data preprocessing. Learning frequent patterns and association rules. Machine learning and data mining tools (in general, plus a detailed description of one tool). Analysis of temporal data. (PV056) for graduates of the course up to and including Spring 2024.
Machine learning: fundamentals of machine learning (supervised, semi-supervised and unsupervised learning; classification, regression and anomaly detection tasks). Metric learning (contrastive learning, triplet-loss learning). Vector/product quantization with applications to approximate search. Principles of cross-modal learning (CLIP). (PV056) for graduates of the course from Spring 2025 onwards.
Knowledge mining: association rules and algorithms for frequent pattern mining (Apriori, PCY). Principles of clustering algorithms (k-means, hierarchical clustering, DBSCAN, Chameleon). Analysis of temporal data: properties and preprocessing of time series, DTW, moving average (MA). (PA212, PV056) for graduates of PV056 from Spring 2025 onwards.
Visualization: basic criteria for evaluating visualization quality (effectiveness and expressiveness), the eight basic visual variables. Basic visualization techniques for 1D, 2D and 3D data (explicit and implicit surface representations). Techniques for visualizing multidimensional data (parallel coordinates, RadViz, scatterplot matrices, dimensional stacking) and hierarchical structures (treemaps). Basic classes of interaction techniques (fisheye views, perspective walls); specifics of applying interaction techniques in the data space itself and in the attribute space.
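
The value iteration named in the artificial intelligence methods question can be illustrated with the minimal Python sketch below; it is not course material. It assumes a finite MDP stored as a nested dictionary in which transitions[s][a] lists (probability, next_state, reward) triples. The two-state MDP, the discount factor 0.9 and the stopping threshold theta are hypothetical choices for the example.

```python
# Minimal value-iteration sketch for a finite MDP (hypothetical data format):
# transitions[state][action] -> list of (probability, next_state, reward).

def value_iteration(transitions, gamma=0.9, theta=1e-8):
    """Return (V, policy) after repeated Bellman optimality backups converge."""
    V = {s: 0.0 for s in transitions}

    def q(s, a):
        # Expected immediate reward plus discounted value of the successor.
        return sum(p * (r + gamma * V[s2]) for p, s2, r in transitions[s][a])

    while True:
        delta = 0.0
        for s in transitions:
            best = max(q(s, a) for a in transitions[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update of the value table
        if delta < theta:  # stop once no value changes noticeably
            break

    # Greedy policy with respect to the converged values.
    policy = {s: max(transitions[s], key=lambda a: q(s, a)) for s in transitions}
    return V, policy


# Hypothetical two-state MDP: staying in "high" pays 1 per step.
mdp = {
    "low": {"stay": [(1.0, "low", 0.0)], "move": [(1.0, "high", 0.0)]},
    "high": {"stay": [(1.0, "high", 1.0)], "move": [(1.0, "low", 0.0)]},
}
V, policy = value_iteration(mdp)
print(V, policy)
```

With gamma = 0.9 the value of "high" approaches 1 / (1 - 0.9) = 10 and the value of "low" approaches 0.9 * 10 = 9, so the greedy policy moves from "low" and stays in "high". Policy iteration would reach the same policy by alternating policy evaluation with greedy improvement.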
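
The DTW and moving average from the temporal data analysis question can be sketched as follows. The DTW part is the plain dynamic-programming formulation with the absolute difference as local cost (an assumption; other local distances are common) and no warping-window constraint; the moving average is a simple average over all sliding windows of length k. In time-series modelling, "MA" can also denote the MA(q) model, which this sketch does not cover.

```python
import math

def dtw_distance(x, y, dist=lambda a, b: abs(a - b)):
    """Dynamic time warping distance between sequences x and y (O(n * m))."""
    n, m = len(x), len(y)
    # D[i][j] = cost of the best warping path aligning x[:i] with y[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(x[i - 1], y[j - 1])
            # Extend the cheapest of the three predecessor cells.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def moving_average(xs, k):
    """Simple moving average over all windows of length k."""
    if not 1 <= k <= len(xs):
        raise ValueError("window length must be between 1 and len(xs)")
    return [sum(xs[i:i + k]) / k for i in range(len(xs) - k + 1)]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0: the repeated 2 is absorbed by warping
print(moving_average([1, 2, 3, 4], 2))        # [1.5, 2.5, 3.5]
```

Unlike the Euclidean distance, DTW compares sequences of different lengths and tolerates local shifts in time, at quadratic cost.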

Specialization - Processing and analysis of large-scale data

Data analysis. Data warehouses and their lifecycle, data marts (narrowly focused data warehouses), the dimensional model and its implementation (star schema, data cube). Data extraction, transformation and loading (ETL), data profiling, data integrity, data quality.
Advanced search techniques. Data processing using the MapReduce approach. Search using Locality-Sensitive Hashing (LSH) and MinHashing. Data stream processing (DGIM, Bloom filters). PageRank and its computation by the iterative method. (PA212) (required for study under the 2022/2023 or later control template)
Similarity search. Principles of similarity search: metric spaces, descriptor extraction and its relationship to human-perceived similarity, query types and their definitions. Principles of indexing: data partitioning, data filtering (pivoting). Comparison with traditional indexes (B+ trees). (PA128) (required for study under the 2022/2023 or later control template)
Cloud computing and distributed databases. Cloud computing: basic principles, infrastructure as a service (IaaS), virtualization and containers, migration to the cloud, service security, horizontal and vertical scalability. Current technologies and cloud service providers. Distributed databases: principles and benefits of the NoSQL approach, consistency, data distribution. Key-value stores, document databases, graph databases, column-oriented databases. (PA200, PA195)
Software engineering. Software development process. The Rational Unified Process methodology. Agile software development. Testing phases and types of tests. Software metrics, code refactoring. Software quality. Estimating the cost and duration of software development. Maintenance and reusability. (PA017)
Applied cryptography. Symmetric and asymmetric cryptography, their differences and applications. Hash functions and their applications. Digital signatures: construction, non-repudiation, public key management, certification authorities and public key infrastructures. Authentication, authorization and access control. (PV079) (required for study under the 2021/2022 or earlier control template)
Programming, file organization and administration. UNIX system: kernel architecture, kernel memory model. Program: start and termination, arguments, environment variables. Process: process attributes, process states, interprocess communication (pipes, signals, reliable signals). Indexing and hashing: B+ trees, linear and extendible hashing. File systems: principles, data organization, properties of external memory, I/O operations, advanced I/O operations (multiplexing using select() and poll(), file locking, scatter-gather I/O, memory-mapped I/O), special files, distributed file systems. (PV065, PA152) (required for study under the 2021/2022 or earlier control template)
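
For the PageRank item in the advanced search techniques question, here is a minimal sketch of the iterative (power) method. The following are assumptions, not taken from the syllabus: the graph is a dict mapping each node to its out-links (every link target is also a key), the damping factor is beta = 0.85 (a commonly used value), teleportation is uniform, and dead ends (nodes without out-links) spread their rank uniformly so the ranks keep summing to 1.

```python
def pagerank(links, beta=0.85, tol=1e-10, max_iter=1000):
    """Power iteration r <- beta * M r + (1 - beta) / n, with dead-end handling."""
    nodes = list(links)
    n = len(nodes)
    r = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        new = {v: (1.0 - beta) / n for v in nodes}  # teleportation share
        for v in nodes:
            outs = links[v]
            if outs:
                share = beta * r[v] / len(outs)
                for w in outs:
                    new[w] += share
            else:
                # Dead end: spread its rank over all nodes.
                for w in nodes:
                    new[w] += beta * r[v] / n
        diff = sum(abs(new[v] - r[v]) for v in nodes)  # L1 change between iterations
        r = new
        if diff < tol:
            break
    return r


# Hypothetical three-page web graph.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this toy graph page c, which receives links from both other pages, ends up with the highest rank. The teleportation term keeps the iteration a contraction (factor beta), which is why the loop converges regardless of the starting vector.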
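
The pivot-based filtering mentioned in the similarity search question relies on the triangle inequality: for any pivot p, |d(q, p) - d(o, p)| <= d(q, o), so an object whose lower bound exceeds the query radius can be discarded without computing d(q, o). The sketch below applies this to a range query. The 2D points, the choice of pivots and the Euclidean metric (math.dist, Python 3.8+) are illustrative assumptions; the function names are hypothetical.

```python
import math

def build_pivot_table(data, pivots, dist):
    """Precompute the distance from every object to every pivot (done once)."""
    return [[dist(o, p) for p in pivots] for o in data]


def range_query(q, radius, data, pivots, table, dist):
    """Return objects within radius of q and the number of d(q, o) evaluations."""
    q_to_p = [dist(q, p) for p in pivots]
    results, computed = [], 0
    for o, o_to_p in zip(data, table):
        # Triangle inequality gives a lower bound on d(q, o).
        lower = max(abs(a - b) for a, b in zip(q_to_p, o_to_p))
        if lower > radius:
            continue  # filtered out: o cannot be in the answer
        computed += 1
        if dist(q, o) <= radius:
            results.append(o)
    return results, computed


data = [(0, 0), (1, 0), (5, 5), (10, 10), (0, 1)]
pivots = [(0, 0), (10, 0)]
table = build_pivot_table(data, pivots, math.dist)
hits, evaluated = range_query((0, 0.5), 1.0, data, pivots, table, math.dist)
print(hits, evaluated)
```

Only two of the five objects need an actual distance computation here; filtering never discards a true answer because the bound is a valid lower bound in any metric space, unlike B+ trees, which need a total order on keys.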

Specialization - Machine Learning and Artificial Intelligence

Probability in computer science: definition of a probability space. Random variables: definition and use, Markov and Chebyshev inequalities. Random processes, Markov chains (DTMC and CTMC), invariant distributions, the ergodic theorem for DTMCs. Information theory (entropy, mutual information), coding theory (Kraft and McMillan theorems, Huffman coding, noisy-channel capacity theorem).
Computational logic: complexity and decidability of the satisfiability problem. Resolution method in propositional and predicate logic. The Prolog language, relational algebra and Datalog. Tableau proofs in propositional, predicate and modal logic. Natural deduction. Inductive inference in propositional logic. Bisimulation and temporal logics. (IA008)
Natural language processing: corpora, language models. Automatic morphological and syntactic tagging. Text classification, information extraction. Recurrent neural networks for language modeling, sequence processing, transformers. Question answering, machine translation. (PA153)
Constraint programming. Algorithms and consistency: arc consistency, path consistency, k-consistency, generalized arc consistency, bounds consistency, directional variants, width of the constraint graph. Tree search, look-ahead, look-back, incomplete tree search. Modeling with constraints, global constraints, scheduling constraints, programming in the CPLEX Optimization Programming Language (OPL). (PA163)
Artificial intelligence in image processing: image formation (PSF, OTF, sampling). Image classification (VGGNet, GoogLeNet, ResNet, SENet). Object detection (R-CNN, Fast R-CNN, Faster R-CNN, YOLO). Image segmentation (FCN, UNet, Mask R-CNN). Conditional and unconditional generative models (autoregressive models, VAEs, GANs). Models based on convolutional networks and transformers (attention, CNN vs. ViT). (PA228) (required for study under the 2022/2023 or later control template).
Machine learning. Logic and machine learning (multirelational learning). Metalearning and automated machine learning (AutoML). Advanced anomaly analysis methods. Text categorization. Disambiguation using machine learning methods. Information extraction from text. (PV056, PA153) (required for study under the 2021/2022 or earlier control template)
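
Arc consistency (also called edge consistency) from the constraint programming question can be enforced with the classic AC-3 algorithm; the syllabus does not prescribe a particular algorithm, so AC-3 is our choice here. In this sketch domains are Python sets, each directed arc (x, y) maps to a predicate allowed(vx, vy), and both directions of every binary constraint must be listed. The variables X, Y, Z and their constraints are a hypothetical example.

```python
import operator
from collections import deque

def revise(domains, x, y, allowed):
    """Delete values of x that have no supporting value in the domain of y."""
    unsupported = {vx for vx in domains[x]
                   if not any(allowed(vx, vy) for vy in domains[y])}
    domains[x] -= unsupported
    return bool(unsupported)


def ac3(domains, constraints):
    """Enforce arc consistency in place; return False if some domain is wiped out.

    domains: dict variable -> set of values.
    constraints: dict (x, y) -> allowed(vx, vy); list both (x, y) and (y, x).
    """
    incoming = {}  # incoming[x] = variables z with an arc (z, x)
    for z, x in constraints:
        incoming.setdefault(x, set()).add(z)
    queue = deque(constraints)  # start with every arc
    while queue:
        x, y = queue.popleft()
        if revise(domains, x, y, constraints[(x, y)]):
            if not domains[x]:
                return False
            # D(x) shrank, so arcs (z, x) must be rechecked (except z == y).
            for z in incoming.get(x, ()):
                if z != y:
                    queue.append((z, x))
    return True


domains = {"X": {1, 2, 3}, "Y": {1, 2, 3}, "Z": {1, 2, 3}}
constraints = {("X", "Y"): operator.lt, ("Y", "X"): operator.gt,   # X < Y
               ("Y", "Z"): operator.lt, ("Z", "Y"): operator.gt}   # Y < Z
print(ac3(domains, constraints), domains)
```

For the chain X < Y < Z over {1, 2, 3}, arc consistency alone already fixes X = 1, Y = 2, Z = 3. In general AC-3 only prunes domains; search (with look-ahead such as forward checking or maintaining arc consistency) is still needed.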
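
The resolution method from the computational logic question can be illustrated in the propositional case by saturation: keep adding resolvents until the empty clause appears (the clause set is unsatisfiable) or nothing new can be derived (it is satisfiable). Encoding literals as nonzero integers, with -k standing for the negation of variable k, is an assumption borrowed from the DIMACS convention. This naive loop is exponential in the worst case and is meant only to show the rule.

```python
from itertools import combinations

def resolvents(c1, c2):
    """All non-tautological resolvents of two clauses (frozensets of int literals)."""
    out = []
    for lit in c1:
        if -lit in c2:
            r = (c1 - {lit}) | (c2 - {-lit})
            if not any(-l in r for l in r):  # skip tautologies such as p or not p
                out.append(r)
    return out


def is_unsatisfiable(clauses):
    """Propositional resolution by saturation: True iff the empty clause is derivable."""
    known = {frozenset(c) for c in clauses}
    if frozenset() in known:
        return True
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for r in resolvents(c1, c2):
                if not r:
                    return True  # empty clause: refutation found
                new.add(r)
        if new <= known:
            return False  # saturated without deriving the empty clause
        known |= new


# Literals: 1 = p, 2 = q, negative numbers are negations.
print(is_unsatisfiable([[1], [-1, 2], [-2]]))  # {p}, {not p, q}, {not q}
print(is_unsatisfiable([[1, 2], [-1, 2]]))     # satisfiable with q = true
```

Dropping tautologies keeps propositional resolution refutation-complete, and because only finitely many clauses exist over a fixed set of variables, saturation always terminates.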

Specialization - Bioinformatics and Systems Biology

Fundamentals of bioinformatics. Fundamentals of molecular biology: structure of prokaryotic and eukaryotic cells, structure and function of nucleic acids and proteins, replication, transcription and translation. Bioinformatics: definition, scope, bioinformatics data. Genomics: the genome and methods of genome exploration, PCR, DNA sequencing, genome organization. Proteomics: the proteome and methods for its investigation. Mass spectrometry of proteins. Basics of phylogenetics, methods of constructing phylogenetic trees. Sequence similarity, sequence alignment and related algorithms. (IV107, IV108)
Advanced bioinformatics methods. Computational tools for genome analysis, in silico gene identification, genome browsers. Biological sequences and information theory. DNA structure, RNA, melting temperature estimation and Nussinov's algorithm. Hidden Markov models and their use in bioinformatics. Advanced techniques for working with NGS data, metagenomics. Sequence motif search and genome annotation. Analysis of protein structures and their prediction from the amino acid sequence. (IV108, PV269)
Modeling and analysis of biological processes. Biological model specification: biological networks and pathways, static analysis of biological networks. Modeling and simulation of biological processes. Deterministic continuous models: the law of mass action, enzyme kinetics and gene regulation. Stochastic models: continuous-time Markov chains, stochastic Petri nets, Gillespie's algorithm (SSA). Rule-based languages for specifying biological models. Hypothesis specification using temporal logics, model robustness with respect to temporal properties. Qualitative models: Boolean networks and their analysis. (PB050, PA054)
Continuous and hybrid systems: definition of a system, the notions of object, model and system. Dynamical systems, transition function, system dimension, state equations. Continuous, discrete and hybrid systems. Linear and nonlinear systems, linearization. Stability and its characterization. System identifiability, parameter estimation. Reachability in hybrid systems. Basic concepts of control theory: controllability, observability. (IV120)
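
Nussinov's algorithm from the advanced bioinformatics methods question maximizes the number of non-crossing base pairs in an RNA sequence by dynamic programming. This sketch assumes Watson-Crick and G-U wobble pairs and a minimum hairpin loop of three unpaired bases; both are common but adjustable choices, and the traceback that recovers the actual secondary structure is omitted.

```python
def nussinov(seq, min_loop=3):
    """Maximum number of non-crossing base pairs in an RNA sequence."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # N[i][j] = best number of pairs in seq[i..j]; entries with j - i <= min_loop stay 0.
    N = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(N[i + 1][j], N[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in pairs:
                best = max(best, N[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # bifurcation into two substructures
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N[0][n - 1]


print(nussinov("GGAAACC"))  # G-C and G-C around an AAA hairpin loop
```

Filling the table by increasing span guarantees that every subproblem is solved before it is used; the bifurcation loop makes the algorithm O(n^3) in time and O(n^2) in memory. Maximizing pair count is a crude energy model, which is why practical tools minimize free energy instead.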
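
Gillespie's stochastic simulation algorithm (SSA) from the question on modeling biological processes draws the waiting time to the next reaction from an exponential distribution whose rate is the total propensity a0, then picks the reaction that fires with probability proportional to its propensity (the direct method). The sketch uses a hypothetical birth-death model of a single species X (production at constant rate 10, degradation at rate 1 per molecule); the reaction format and function name are illustrative.

```python
import bisect
import itertools
import random

def gillespie(x0, reactions, t_max, rng=None):
    """Direct-method SSA.

    x0: dict species -> initial count.
    reactions: list of (propensity(state) -> float, change: dict species -> int).
    Returns the trajectory as a list of (time, state) pairs, one per event.
    """
    if rng is None:
        rng = random.Random()
    t, x = 0.0, dict(x0)
    trajectory = [(t, dict(x))]
    while True:
        # Cumulative propensities; the last entry is the total rate a0.
        cumulative = list(itertools.accumulate(a(x) for a, _ in reactions))
        a0 = cumulative[-1]
        if a0 <= 0:
            break  # no reaction can fire any more
        t += rng.expovariate(a0)  # waiting time ~ Exp(a0)
        if t > t_max:
            break
        # Reaction j fires with probability a_j / a0.
        j = bisect.bisect_right(cumulative, rng.random() * a0)
        for species, delta in reactions[j][1].items():
            x[species] += delta
        trajectory.append((t, dict(x)))
    return trajectory


# Hypothetical birth-death model: 0 -> X at rate 10, X -> 0 at rate 1 * X.
model = [
    (lambda s: 10.0, {"X": +1}),
    (lambda s: 1.0 * s["X"], {"X": -1}),
]
traj = gillespie({"X": 0}, model, t_max=5.0, rng=random.Random(42))
print(len(traj), traj[-1])
```

The stationary distribution of this birth-death model is Poisson with mean 10, so long runs fluctuate around X = 10, whereas the deterministic mass-action model would settle exactly at 10.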

Specialization - Natural Language Processing

Natural language processing: corpora and their annotation. Automatic morphological and syntactic analysis. Text classification, information extraction. Sentiment analysis, named entity recognition. Recurrent neural networks for language modeling, sequence processing, transformers. Question answering, machine translation. (PA153, IA161)
Language modeling: language models, the noisy channel model, Markov models, hidden Markov models (HMMs), smoothing. Neural models such as GPT, large language models, prompt engineering, dialogue-tuned models. (PA154)
Computational logic: complexity and decidability of the satisfiability problem. Resolution method in propositional and predicate logic. The Prolog language, relational algebra and Datalog. Tableau proofs in propositional, predicate and modal logic. Natural deduction. Inductive inference in propositional and predicate logic. Bisimulation and temporal logics. (IA008)
Probability in computer science: definition of a probability space. Random variables: definition and use, Markov and Chebyshev inequalities. Random processes, Markov chains (DTMC and CTMC), invariant distributions, the ergodic theorem for DTMCs. Information theory (entropy, mutual information), coding theory (Kraft and McMillan theorems, Huffman coding, noisy-channel capacity theorem).
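
For the coding theory part of the probability question, here is a minimal Huffman coding sketch together with the entropy it is usually compared against. Symbol weights are given as a hypothetical dict; ties in the heap are broken by insertion order, so other, equally optimal codes are possible. For a dyadic distribution (all probabilities powers of 1/2) the average Huffman codeword length equals the entropy, and the Kraft sum of 2^(-length) over all codewords equals 1.

```python
import heapq
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def huffman_code(weights):
    """Build a binary prefix code: dict symbol -> codeword string."""
    if len(weights) == 1:
        return {next(iter(weights)): "0"}
    # Heap items: (weight, tie_breaker, partial code table of the subtree).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two least probable subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(p)
avg_len = sum(p[s] * len(code[s]) for s in p)
print(code, avg_len, entropy(p.values()))
```

Here both the average codeword length and the entropy are 1.75 bits. For non-dyadic distributions Huffman coding only guarantees H <= L < H + 1, which matches the bound implied by the Kraft and McMillan theorems.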
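
Smoothing, listed in the language modeling question, can be illustrated with a bigram model using add-one (Laplace) smoothing, P(w | prev) = (c(prev, w) + 1) / (c(prev) + |V|). This is only one of many smoothing methods (Good-Turing, Kneser-Ney and interpolation are common alternatives). The toy corpus, the sentence-boundary tokens <s> and </s>, and the function names are assumptions for the example.

```python
from collections import Counter

def train_bigram(sentences):
    """Count context unigrams and bigrams over sentences padded with <s> ... </s>."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        vocab.update(padded[1:])            # tokens that can be predicted
        context_counts.update(padded[:-1])  # tokens that serve as history
        bigram_counts.update(zip(padded[:-1], padded[1:]))
    return context_counts, bigram_counts, vocab


def prob(word, prev, model):
    """Add-one smoothed P(word | prev)."""
    context_counts, bigram_counts, vocab = model
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))


corpus = [["the", "cat"], ["the", "dog"]]
model = train_bigram(corpus)
print(prob("cat", "the", model))  # seen bigram
print(prob("the", "cat", model))  # unseen bigram still gets probability mass
```

Adding one to every count guarantees that no bigram over the vocabulary has zero probability while each conditional distribution still sums to 1; the price is that seen bigrams are discounted heavily, which is why add-one smoothing is rarely used for real language models.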