Technical Reports
A List by Author: Michal Brandejs
- e-mail:
- brandejs(a)fi.muni.cz
- home page:
- https://www.fi.muni.cz/usr/brandejs/
Access Rights in Enterprise Full-text Search
by
Jan Kasprzak,
Michal Brandejs,
Matěj Čuhel,
Tomáš Obšívač
A full version of the paper presented at the ICEIS 2010 conference. July 2010, 19 pages.
FIMU-RS-2010-08.
Available as Postscript,
PDF.
Abstract:
One of the toughest problems to solve when
deploying an enterprise-wide full-text search system is to
handle the access rights of the documents and intranet web pages
correctly and efficiently. Post-processing the results of a general-purpose
full-text search engine (filtering out the documents inaccessible
to the user who sent the query) can be an expensive operation,
especially in large collections of documents.
We discuss various approaches to this problem and propose a novel
method which employs virtual tokens for encoding the access rights
directly into the search index.
We then evaluate this approach in an intranet system with several
million documents and a complex set of access rights and access
rules.
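The virtual-token idea mentioned in the abstract can be illustrated with a toy inverted index. This is only a minimal sketch under stated assumptions, not the method as implemented in the report: the class `TinyIndex`, the `__acl__:` prefix, and the group names are all hypothetical. Each document is indexed under its ordinary words plus one reserved token per group allowed to read it, so a query is answered by intersecting the word postings with the postings of the user's groups instead of filtering results afterwards.

```python
from collections import defaultdict

# Hypothetical reserved prefix; ordinary words carrying it are ignored
# so that users cannot forge access tokens via document text or queries.
ACL_PREFIX = "__acl__:"


def _words(text):
    """Lower-cased words of `text`, excluding anything that looks like a virtual token."""
    return [w for w in text.lower().split() if not w.startswith(ACL_PREFIX)]


class TinyIndex:
    """Toy inverted index that stores access rights as virtual tokens."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> set of document ids

    def add(self, doc_id, text, allowed_groups):
        # Ordinary content tokens.
        for word in _words(text):
            self.postings[word].add(doc_id)
        # One virtual token per group that may read the document.
        for group in allowed_groups:
            self.postings[ACL_PREFIX + group].add(doc_id)

    def search(self, query, user_groups):
        words = _words(query)
        if not words:
            return set()
        # Documents containing every query word.
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        # Documents readable by at least one of the user's groups.
        visible = set().union(
            *(self.postings.get(ACL_PREFIX + g, set()) for g in user_groups)
        )
        return hits & visible


index = TinyIndex()
index.add(1, "budget report 2010", ["staff"])
index.add(2, "budget draft", ["management"])
index.add(3, "lecture notes", ["staff", "students"])
print(index.search("budget", ["staff"]))
```

In this sketch the access check is part of the index lookup itself, so documents the user cannot read never enter the result set. A real engine would express the same thing as a boolean query over its own postings.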
Distributed System for Discovering Similar Documents
by
Jan Kasprzak,
Michal Brandejs,
Miroslav Křipac,
Pavel Šmerk
A full version of the paper presented at the ICEIS 2008 conference (www.iceis.org). July 2008, 14 pages.
FIMU-RS-2008-04.
Available as Postscript,
PDF.
Abstract:
One of the drawbacks of e-learning methods such as Web-based submission
and evaluation of students' papers and essays is that it has become easier
for students to plagiarize the work of other people.
In this paper we present a computer-based system for discovering
similar documents, which has been in use at Masaryk University in Brno
since August 2006, and which will also be used
in the forthcoming Czech national archive of graduate theses. We
focus on practical aspects of this system: achieving near real-time response
to newly imported documents, and the computational feasibility of handling large
sets of documents on commodity hardware. We also discuss the possibilities
and problems of parallelizing this system to run on a distributed
cluster of computers.