Topic area: Data processing and machine learning
Subtopics:
Similarity search
Annotation: Similarity search is becoming an integral part of data processing tools, because more and more data collections cannot be fully structured, and the only way to compare pairs of objects is to measure their similarity. The candidate will become acquainted with modeling similarity as metric spaces, the basic types of similarity queries, the principles of metric space partitioning, and the theoretical foundations for building similarity search engines. An overview of existing tools is also included.
Syllabus:
Metric distance functions, similarity queries, principles of metric space partitioning, metric search strategies, metric transformations, approximate search; Overview of existing approaches; Indexing structures for large data collections; Approximate techniques; Scalable distributed architectures.
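As an illustration of the metric search strategies listed above, here is a minimal sketch of a range query with pivot-based filtering, assuming Euclidean vectors as the metric space. It relies only on the triangle inequality: for any pivot p, |d(q, p) - d(o, p)| <= d(q, o), so objects whose lower bound exceeds the query radius are discarded without computing their distance to the query. The function names, data, and pivots are hypothetical; this shows the generic filtering principle, not a specific index structure from the study material.

```python
import math

Point = tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """L2 distance; it satisfies the metric postulates, including the triangle inequality."""
    return math.dist(a, b)


def build_pivot_table(data: list[Point], pivots: list[Point], dist) -> list[list[float]]:
    """Precompute d(o, p) for every object o and every pivot p (done once, at indexing time)."""
    return [[dist(o, p) for p in pivots] for o in data]


def range_query(q, r, data, pivots, table, dist):
    """Return (objects within distance r of q, number of d(q, o) evaluations).

    By the triangle inequality, |d(q, p) - d(o, p)| <= d(q, o) for every pivot p,
    so an object whose largest lower bound exceeds r cannot qualify and is
    discarded without computing d(q, o).
    """
    q_to_pivots = [dist(q, p) for p in pivots]
    result, evaluations = [], 0
    for o, o_to_pivots in zip(data, table):
        lower_bound = max(abs(a - b) for a, b in zip(q_to_pivots, o_to_pivots))
        if lower_bound > r:
            continue  # pruned using precomputed distances only
        evaluations += 1
        if dist(q, o) <= r:
            result.append(o)
    return result, evaluations


# Hypothetical toy collection in the plane, with two pivots.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
pivots = [(0.0, 0.0), (5.0, 5.0)]
table = build_pivot_table(data, pivots, euclidean)
result, evaluations = range_query((0.5, 0.5), 1.0, data, pivots, table, euclidean)
print(result, evaluations)  # → [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)] 3
```

The two distant objects are pruned by the pivot bounds alone, so only three of the five query-to-object distances are computed, while the answer is the same as a sequential scan would return.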
Basic study material:
P. Zezula, G. Amato, V. Dohnal, and M. Batko, Similarity Search: The Metric Space Approach, Advances in Database Systems, vol. 32, Springer, 2006. Chapters 1, 2, and 3, plus Chapter 4 or 5.
Examiners: prof. Ing. Pavel Zezula, CSc., RNDr. Michal Batko, Ph.D., doc. RNDr. Vlastislav Dohnal, Ph.D.
Other recommended literature:
H. Samet, Foundations of Multidimensional and Metric Data Structures, Morgan Kaufmann Publishers, 2006.
Searching for information
Annotation: Search is currently considered the most widespread application of computer science. Its success rests on long-term technological development, which is constantly being revised in response to the exponential growth of data. The candidate will become acquainted with modern retrieval methods used in contemporary practice.
Syllabus:
Retrieval models; Search engine evaluation metrics; Documents and queries; Indexing and searching; Parallel and distributed search; Web search; Multimedia search; Digital libraries.
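To make the evaluation metrics topic concrete, here is a minimal sketch of set-based precision and recall and of (non-interpolated) average precision for a single ranked result list. The document IDs, ranking, and relevance judgments are hypothetical toy data, and the function names are illustrative rather than taken from any particular evaluation toolkit.

```python
def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Set-based precision and recall of a result list against relevance judgments."""
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Sum of precision values at each rank where a relevant document appears,
    divided by the total number of relevant documents (missed ones contribute 0)."""
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k  # precision at rank k
    return total / len(relevant) if relevant else 0.0


ranking = ["d3", "d1", "d7", "d2", "d9"]  # hypothetical system output, best first
relevant = {"d1", "d2", "d5"}            # hypothetical relevance judgments
print(precision_recall(ranking, relevant))   # → (0.4, 0.6666666666666666)
print(average_precision(ranking, relevant))  # → 0.3333333333333333
```

Precision and recall ignore the order of results, whereas average precision rewards placing relevant documents near the top; averaging it over a set of queries gives mean average precision (MAP).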
Basic study material:
Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, 2011. Chapters 1, 3, and 4, plus one other chapter of your choice.
Examiners: prof. Ing. Pavel Zezula, CSc., RNDr. Michal Batko, Ph.D., doc. RNDr. Vlastislav Dohnal, Ph.D.
Other recommended literature:
C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
Contemporary topics of data processing research
Annotation: Data processing methods are among the fastest-developing fields of informatics, driven by the growing variety of data types, the sharp increase in data volumes, and the development of networked hardware infrastructure. These topics are discussed every year at hundreds of professional conferences, the most important of which include VLDB, ACM SIGMOD, ACM SIGIR, IEEE ICDE, and EDBT.
Syllabus:
The exam consists of studying four articles from top conferences, chosen so that their content is as close as possible to the student's PhD research topic. The articles should come from recent years of the conference proceedings, and their content should also be related to the topics of the examination.
Basic study material:
Conference proceedings
VLDB - Very Large Data Bases
ACM SIGMOD - Management of Data
ACM SIGIR - Information Retrieval
IEEE ICDE - International Conference on Data Engineering
EDBT - Extending Data Base Technology
Examiners: prof. Ing. Pavel Zezula, CSc., RNDr. Michal Batko, Ph.D., doc. RNDr. Vlastislav Dohnal, Ph.D.