+Each keyword-based query was constructed from five consecutive keywords, taken in rank order starting from the top-ranked ones.\r
+Each keyword was used in only one query. \r
+%Too long keywords based queries would be overspecific and it would have resulted in a low recall.\r
+%On the other hand having constructed too short queries (one or two tokens) would have resulted in a low precision and also possibly low recall since they would be too general.\r
+To focus the search on the highest-ranked keywords, we also extracted their \r
+most frequent two- and three-term collocations.\r
+These were likewise combined into queries of 5 words.\r
+As a result, each of the 4 top-ranked keywords can appear in two different queries: one built from the keywords\r
+alone and one built from the collocations.\r
+%Collocation describes its keyword better than the keyword alone. \r
+\r
+The keyword-based queries are non-positional, since they represent the whole document. They are also non-phrasal, since\r
+they are constructed from tokens gathered from different parts of the text. Finally, they are deterministic: for a given input\r
+document the extractor always returns the same keywords.\r
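+\r
+The construction described above can be illustrated with a minimal Python sketch. The function names, the greedy packing of collocations, and the handling of a final group with fewer than five keywords are assumptions made for illustration, not details of the actual implementation.\r

```python
def keyword_queries(ranked_keywords, size=5):
    """Split a ranked keyword list into consecutive, non-overlapping queries.

    Each keyword ends up in exactly one query. Keeping a shorter final
    query when the list length is not a multiple of `size` is an assumption.
    """
    return [
        " ".join(ranked_keywords[i:i + size])
        for i in range(0, len(ranked_keywords), size)
    ]


def collocation_queries(collocations, size=5):
    """Pack collocations (tuples of 2 or 3 tokens) into queries of at most
    `size` words. The greedy packing strategy is hypothetical."""
    queries, current = [], []
    for colloc in collocations:
        # Start a new query when adding this collocation would exceed the limit.
        if current and len(current) + len(colloc) > size:
            queries.append(" ".join(current))
            current = []
        current.extend(colloc)
    if current:
        queries.append(" ".join(current))
    return queries
```

+Because both functions depend only on their input lists, repeated runs on the same ranked keywords yield the same queries, which matches the deterministic property stated above.\r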
+\r
+\subsubsection{Intrinsic Plagiarism Based Queries.}\r
+The second type of query aims to retrieve pages containing text that was detected\r
+as differing in writing style from other parts of the suspicious document.\r
+%Such a change may point out plagiarized passage which is intrinsically bound up with the text. \r
+%We implemented vocabulary richness method which computes average word frequency class value for \r
+%a given text part. The method is described in~\cite{awfc}.\r
+For this purpose we implemented the vocabulary richness method~\cite{awfc} together with\r
+the sliding window concept for text chunking, as described in~\cite{suchomel_kas_12}.\r
+%The problem is that generally methods based on the vocabulary statistics work better for longer texts.\r
+%According to authors this method scales well for shorter texts than other text style detection methods. \r
+%The usage of this method is in our case limited by relatively short texts.\r
+%It is also difficult to determine\r
+%what parts of text to compare. Therefore we used sliding window concept for text chunking with the \r
+%same settings as described in~\cite{suchomel_kas_12}.\r
+\r
+A representative sentence longer than 6 words was selected at random from the eligible sentences in the suspicious part of the document.\r
+The query was created from the representative sentence by leaving out stop words.\r
+The intrinsic plagiarism based queries are positional: they carry the position of the representative sentence.% in the document.\r
+They are phrasal, since they represent a search for a specific sentence, and they are\r
+nondeterministic, because the representative sentence is selected randomly. \r
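+\r
+The selection step can be sketched in Python as follows. The stop word list, the whitespace tokenization, and the representation of sentences as (offset, text) pairs are assumptions chosen to keep the example self-contained; they are not taken from the actual implementation.\r

```python
import random

# Illustrative stop word list (an assumption, not the list used in the paper).
STOP_WORDS = frozenset({
    "a", "an", "the", "of", "in", "on", "and", "or", "to",
    "is", "are", "was", "over", "for", "with",
})


def content_words(sentence):
    """Return the tokens of a sentence with punctuation and stop words removed."""
    tokens = (t.strip(".,;:!?\"'()") for t in sentence.split())
    return [t for t in tokens if t and t.lower() not in STOP_WORDS]


def intrinsic_query(sentences, rng=random):
    """Pick a random sentence longer than 6 words and turn it into a query.

    `sentences` is a list of (offset, text) pairs from the suspicious part.
    Returns (offset, query) so the query stays positional, or None when no
    sentence is long enough.
    """
    candidates = [(off, s) for off, s in sentences if len(s.split()) > 6]
    if not candidates:
        return None
    offset, sentence = rng.choice(candidates)
    return offset, " ".join(content_words(sentence))
```

+Passing a seeded \texttt{random.Random} instance as \texttt{rng} makes a run reproducible; with the default generator the chosen sentence, and therefore the query, may differ between runs.\r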
+ \r
+\subsubsection{Paragraph Based Queries.}\r
+The purpose of paragraph-based queries is to check some parts of the text in more depth,\r
+namely those parts for which no similarity was found during the previous searches. \r
+For this purpose we considered a paragraph to be the minimum text chunk in which plagiarism can occur. \r
+%It is discussible whether a plagiarist would be persecuted for plagiarizing only one sentence in a paragraph.\r
+%A detection of a specific sentence is very difficult if we want to avoid exhaustive search approach.\r
+%If someone is to reuse some peace of continuous text, it would probably be no shorter than a paragraph. \r
+Although paragraphs differ in length, we represent each paragraph by only one query.\r
+\r
+%The paragraph based query was created from each paragraph of suspicious document.\r
+From each paragraph we extracted the longest sentence, from which the query was constructed.\r
+Ideally, the extracted sentence should carry the highest information gain.\r
+The query was constructed from the selected sentence by omitting stop words\r
+and was at most 10 words long, which is the upper bound imposed by ChatNoir.\r
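+\r
+A minimal Python sketch of the paragraph-based query construction follows. Measuring sentence length in words, splitting sentences on terminal punctuation, enforcing the limit by keeping the first 10 content words, and the stop word list itself are all assumptions made for illustration.\r

```python
import re

# Illustrative stop word list (an assumption, not the list used in the paper).
STOP_WORDS = frozenset({
    "a", "an", "the", "of", "in", "on", "and", "or", "to",
    "is", "are", "was", "this", "it", "for", "with",
})
MAX_QUERY_WORDS = 10  # upper bound of ChatNoir, as stated in the text


def paragraph_query(paragraph):
    """Build one query from the longest sentence of a paragraph."""
    # Naive sentence splitting on '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    if not sentences:
        return ""
    # "Longest" is measured in whitespace-separated words (an assumption).
    longest = max(sentences, key=lambda s: len(s.split()))
    tokens = (t.strip(".,;:!?\"'()") for t in longest.split())
    words = [t for t in tokens if t and t.lower() not in STOP_WORDS]
    # Truncation to the first words is one possible way to respect the limit.
    return " ".join(words[:MAX_QUERY_WORDS])
```

+Since the function contains no random choice, the same paragraph always produces the same query.\r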