Student Developing a Next-Gen Search Engine Receives the Brno Ph.D. Talent Award and 300,000 Czech Crowns
He has been drawn to computers since a young age. To support his passion, his parents bought him Baltík, an educational programming environment by Bohumír Soukup. Now he is one of the Brno Ph.D. Talents.
Ph.D. student Vít Novotný is developing a next-generation search engine in the MIR research group at the Faculty of Informatics, Masaryk University (MU). He tackles questions that Google cannot answer, and his project aims to help researchers around the globe.
Traditional Search Engines Such as Google and Seznam Ignore Mathematics
Researchers in STEM (Science, Technology, Engineering, Mathematics) fields require access to articles full of mathematics. “Traditional search engines are sufficient for ordinary text documents, but they fall short for mathematics. They only allow the user to search using text keywords, not mathematical formulae. If you search for a scientific article using text keywords, and the article does not contain the keywords, then you may not find the article at all,” explains Vít.
Mathematical search engines started to appear at the beginning of the 21st century. The Faculty of Informatics participated in the European EuDML project and developed the MIaS mathematical search engine in 2011. However, MIaS does not take advantage of the latest machine learning and artificial intelligence techniques.
Machine Learning and Artificial Intelligence Can Improve Both Mathematical and Text Search
Today’s mathematical search engines can search for both text keywords and mathematical formulae. However, they represent text and mathematics separately. The latest machine learning and artificial intelligence techniques make it possible to develop hybrid search engines, which let the user find mathematics using text keywords and find text using mathematical formulae.
We Tested Trains of Thought on Little Red Riding Hood
“We also experiment with representing documents as curves. The curves follow the trains of thought in a document as they change from one paragraph to the next. In our experiments, we used three versions of the classic story of Little Red Riding Hood: the original version from the 17th century, the Grimm version from the 19th century, and the 2005 film Hoodwinked. While the first two versions differ only in their endings, the film shares only the major characters and locations with the classic tale. The keywords were similar in all three versions, but the trains of thought in the film differed significantly from those in the first two versions,” explains Vít.
This novel approach should make it possible to detect plagiarism and to search for semantically similar documents.
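The article does not say how the curves are built or compared, so the following is only a minimal sketch of the general idea under our own assumptions, not the MIR group's method. We assume each paragraph becomes a bag-of-words vector (one point of the curve, in reading order) and compare two curves with dynamic time warping over cosine distance. The names `document_curve` and `curve_distance` are hypothetical. The toy example mirrors the observation above: two texts with identical keywords overall but a different order of ideas get identical whole-document keyword counts, yet clearly different curves.

```python
import math
from collections import Counter


def paragraph_vector(paragraph: str) -> Counter:
    """Bag-of-words term counts for one paragraph (an assumed, simplistic representation)."""
    return Counter(paragraph.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 minus the cosine similarity of two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_sq_a = sum(c * c for c in a.values())
    norm_sq_b = sum(c * c for c in b.values())
    if norm_sq_a == 0 or norm_sq_b == 0:
        return 1.0
    # Clamp at zero so rounding can never produce a negative distance.
    return max(0.0, 1.0 - dot / math.sqrt(norm_sq_a * norm_sq_b))


def document_curve(paragraphs: list[str]) -> list[Counter]:
    """Hypothetical: a document as a curve, one point per paragraph in reading order."""
    return [paragraph_vector(p) for p in paragraphs]


def curve_distance(x: list[Counter], y: list[Counter]) -> float:
    """Dynamic time warping distance between two curves, which may differ in length."""
    n, m = len(x), len(y)
    dtw = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    dtw[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = cosine_distance(x[i - 1], y[j - 1])
            dtw[i][j] = cost + min(dtw[i - 1][j], dtw[i][j - 1], dtw[i - 1][j - 1])
    return dtw[n][m]


# Toy stand-ins for two retellings: same paragraphs, different order of ideas.
tale = ["girl visits grandmother", "wolf meets girl in forest", "wolf eats grandmother"]
retelling = ["wolf eats grandmother", "girl visits grandmother", "wolf meets girl in forest"]
```

Comparing `Counter(" ".join(tale).split())` with the same counts for `retelling` shows equal keyword counts. Meanwhile `curve_distance(document_curve(tale), document_curve(retelling))` is well above zero, and the distance of `tale` to itself is zero. Dynamic time warping is our choice here because it tolerates versions with different numbers of paragraphs. A real system would likely use learned embeddings instead of raw word counts.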
Brno Ph.D. Talent Award Provides New Opportunities
Thanks to his success in the Brno Ph.D. Talent competition, Vít will receive 300,000 Czech crowns over three years. Besides financial support, the South Moravian Centre for International Mobility (JCMM) also organizes seminars for the Brno Ph.D. Talents and provides networking opportunities. By creating conditions comparable to those at top Western research institutions, JCMM aims to keep talented young minds in Brno.
And what are Vít’s interests besides research? He produces electronic music, sings in an ensemble and co-organizes Animefest, the oldest anime convention in the Czech Republic.