-function \cite{Kasprzak2010}.
-
-For PAN 2012, we experimented with using 1-, 2-, and 3-grams instead
-of only 3-grams, and with a different measure of the difference between
-the $n$-gram profiles. We used an approach similar to \cite{ngram},
-where we computed the profile as an ordered set of the 400 most frequent
-$n$-grams in a given text (the whole document or a partial window). Apart
-from ordering the set, we ignored the actual number of occurrences
-of a given $n$-gram altogether, and used a value inversely
-proportional to the rank of the $n$-gram in the profile, in accordance with
-Zipf's law \cite{zipf1935psycho}.
-
-This approach provided a more stable style-change function
-than the one proposed in \cite{pan09stamatatos}. Because of the pair-wise
-nature of the detailed comparison sub-task, we could not use the results
-of the intrinsic detection directly; we therefore wanted to use them
-as hints for the external detection.
-
-We also experimented with modifying the allowed gap size using the
-intrinsic plagiarism detection: allowing only a shorter gap if the common
-features around the gap belong to different passages detected as plagiarized
-in the suspicious document by the intrinsic detector, and allowing a larger gap
-if both surrounding common features belong to the same passage
-detected by the intrinsic detector. This approach, however, did not show
-any improvement over an allowed gap of a static size, so it was omitted
-from the final submission.
-
-\subsubsection{Language Detection}
-
-For language detection, we used $n$-gram based categorization \cite{ngram}.
-We computed the language profiles from the source documents of the
-training corpus (using the annotations from the corpus itself). The result
-of this approach was better than that of the stopword-based detection we
-used in PAN 2010. However, there were still misdetected documents,
-mainly long lists of surnames and other tabular data. We added
-an ad-hoc fix: documents whose profile was too distant from all of the
-English, German, and Spanish profiles were declared to be in English.
+function \cite{Kasprzak2010}. For PAN 2012, we made further improvements
+to the algorithm, resulting in a more stable style-change function for
+both short and long documents.
+
+We tried to use the results of the intrinsic plagiarism detection
+as hints for the post-processing phase, allowing larger intervals
+to be merged if they both belong to the same passage detected by
+the intrinsic detector. This approach did not provide any improvement
+over the static gap limits described in
+Section~\ref{postprocessing}, so we omitted it from our final submission.
+
+%\subsubsection{Language Detection}
+%
+%For language detection, we used $n$-gram based categorization \cite{ngram}.
+%We computed the language profiles from the source documents of the
+%training corpus (using the annotations from the corpus itself). The result
+%of this approach was better than that of the stopword-based detection we
+%used in PAN 2010. However, there were still misdetected documents,
+%mainly long lists of surnames and other tabular data. We added
+%an ad-hoc fix: documents whose profile was too distant from all of the
+%English, German, and Spanish profiles were declared to be in English.