Accepted Papers
Papers accepted for TSD 2002, with abstracts
Topic: Text
Topic: Speech
Topic: Dialogue
Paper ID: 3
Type: LP
Title: A Common Solution for Tokenization and Part-of-Speech Tagging
Contact author: Jorge Graña and Miguel A. Alonso and Manuel Vilares
Topic: Text - parsing and part-of-speech tagging
Abstract: Current taggers assume that input texts are already tokenized,
i.e. correctly segmented into tokens or high-level information
units that identify each individual component of the texts. This
working hypothesis is unrealistic, due to the heterogeneous nature of
the application texts and their sources. The greatest difficulties arise
when this segmentation is ambiguous. The choice of the correct
segmentation alternative depends on the context, which is precisely
what taggers study.
In this work, we develop a tagger able not only to decide the tag to
be assigned to every token, but also to decide whether or not some of them
form the same term, according to different segmentation
alternatives. For this task, we design an extension of the Viterbi
algorithm able to evaluate streams of tokens of different lengths over
the same structure. We also compare its time and space complexities
with those of the classic and iterative versions of the algorithm.
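For reference, the classic Viterbi algorithm that this work extends can be sketched as follows for a first-order HMM tagger. This is a minimal sketch of the standard algorithm only; it does not implement the authors' extension for evaluating token streams of different lengths, and the function name, tag set and toy probabilities are hypothetical.

```python
import math
from typing import Dict, List, Tuple

def viterbi(tokens: List[str], tags: List[str],
            start: Dict[str, float],
            trans: Dict[Tuple[str, str], float],
            emit: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the most probable tag sequence for tokens under a first-order HMM.
    Probabilities missing from the tables are treated as zero (log = -inf)."""
    def lg(p: float) -> float:
        return math.log(p) if p > 0 else float("-inf")

    # delta[i][t]: best log-probability of any tag path ending in tag t at position i
    delta = [{t: lg(start.get(t, 0)) + lg(emit.get((t, tokens[0]), 0)) for t in tags}]
    back: List[Dict[str, str]] = []  # back[i-1][t]: best predecessor of t at position i
    for i in range(1, len(tokens)):
        row, ptr = {}, {}
        for t in tags:
            prev = max(tags, key=lambda s: delta[-1][s] + lg(trans.get((s, t), 0)))
            row[t] = (delta[-1][prev] + lg(trans.get((prev, t), 0))
                      + lg(emit.get((t, tokens[i]), 0)))
            ptr[t] = prev
        delta.append(row)
        back.append(ptr)

    # Start from the best final tag and follow the back-pointers
    best = max(tags, key=lambda t: delta[-1][t])
    path = [best]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

With toy tables where DET strongly emits "the", NOUN emits "dog" and VERB emits "barks", `viterbi(["the", "dog", "barks"], ["DET", "NOUN", "VERB"], ...)` returns `["DET", "NOUN", "VERB"]`. The running time is O(n·|T|²) for n tokens and |T| tags.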
Paper ID: 7
Type: LP
Title: Rule Parser for Arabic Stemmer
Contact author: Imad A. Al-Sughaiyer and Ibrahim A. Al-Kharashi
Topic: Text - automatic morphology
Abstract: The Arabic language exhibits a complex but very regular morphological structure that greatly affects its automatic processing. Currently available morphological analysis techniques for Arabic are based on heavy computational processes and/or the existence of large amounts of associated data. Using existing morphological techniques greatly degrades the efficiency of some natural language applications, such as information retrieval systems.
This paper proposes a new Arabic morphological analysis technique. The technique is based on the pattern similarity of words derived from different roots. Unique patterns are extended and coded as rules that encode morphological characteristics. The technique requires neither complex computation nor associated data, yet it can be adjusted to maintain sufficient accuracy. It uses a very simple parser to scan the coded rules and decompose a given Arabic word into its morphological components.
This paper provides an introduction to the Arabic language and its morphological characteristics, followed by an overview of currently available morphological techniques. The developed stemmer and its components, including the rule set and parser, are then explained. Experimental results and conclusions are provided at the end.
Paper ID: 9
Type: LP
Title: Automatic Lexical Stress Assignment of Unknown Words for Highly Inflected Slovenian Language
Contact author: Tomaž Šef and Maja Škrjanc and Matjaž Gams
Topic: Speech - text-to-speech synthesis
Abstract: This paper presents a two-level lexical stress assignment model for out-of-vocabulary Slovenian words used in our text-to-speech system. First, each vowel (and the consonant 'r') is classified as stressed or unstressed, and a type of lexical stress is assigned to every stressed vowel (and consonant 'r'). We applied a machine-learning technique (decision trees or boosted decision trees). Then, some corrections are made at the word level, according to the number of stressed vowels and the length of the word. As data sets we used the MULTEXT-East Slovene Lexicon, which was supplemented with lexical stress marks. The accuracy achieved by decision trees significantly outperforms all previous results. However, the sizes of the trees indicate that accentuation in the Slovenian language is a very complex problem and that a solution in the form of relatively simple rules is not possible.
Paper ID: 11
Type: SP
Title: Kernel Springy Discriminant Analysis and its Application to a Phonological Awareness Teaching System
Contact author: András Kocsor and Kornél Kovács
Topic: Speech - other
Abstract: Making use of the ubiquitous kernel notion, we present a new nonlinear
supervised feature extraction technique called Kernel Springy Discriminant
Analysis. We demonstrate that this method can efficiently reduce the number of features and increase classification performance.
The improvements obtained admittedly
arise from the nonlinear nature of the extraction technique developed
here. Since phonological awareness is of great importance in learning to read, a
computer-aided training system could be most beneficial in teaching young learners.
Naturally, our system employs an effective automatic phoneme recognizer
based on the proposed feature extraction technique.
Paper ID: 14
Type: LP
Title: Achieving an Almost Correct PoS-Tagged Corpus
Contact author: Pavel Květoň and Karel Oliva
Topic: Text - parsing and part-of-speech tagging
Abstract: After some theoretical discussion of the representativity of a corpus, this paper presents a simple yet very efficient technique for the (semi-)automatic detection of those positions in a part-of-speech tagged corpus where an error is to be suspected. The approach is based on the idea of learning and later applying "negative bigrams", i.e. on the search for pairs of adjacent tags which constitute an incorrect configuration in a text of a particular language (in English, e.g., the bigram ARTICLE - FINITE VERB). Further, the paper describes the generalization of "negative bigrams" into "extended negative bigrams of length n", for any natural n, which provides a powerful tool for error detection in a corpus. The approach is illustrated throughout on the case of the NEGRA corpus (hence some command of German might be helpful, though it is not strictly necessary). Finally, some general implications for statistical taggers are mentioned.
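The basic negative-bigram check can be sketched as a single scan over the tag sequence. This is a minimal illustration, assuming the set of negative bigrams has already been learned; the tag names are hypothetical, and the "extended negative bigrams of length n" described in the paper are not implemented here.

```python
from typing import List, Set, Tuple

def suspicious_positions(tags: List[str],
                         negative_bigrams: Set[Tuple[str, str]]) -> List[int]:
    """Return every index i such that (tags[i], tags[i + 1]) is a known
    negative bigram, i.e. a tag pair that should never be adjacent."""
    return [i for i in range(len(tags) - 1)
            if (tags[i], tags[i + 1]) in negative_bigrams]

# Hypothetical English-style tags: ARTICLE directly followed by FINITE_VERB is flagged.
negative = {("ARTICLE", "FINITE_VERB")}
print(suspicious_positions(["ARTICLE", "FINITE_VERB", "NOUN"], negative))  # [0]
```

Each flagged index marks a position for a human annotator to inspect, so the check stays semi-automatic.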
Paper ID: 15
Type: LP
Title: Evaluation of a Japanese Sentence Compression Method Based on Phrase Significance and Inter-Phrase Dependency
Contact author: Rei Oguro and Hiromi Sekiya and Yuhei Morooka and Kazuyuki Takagi and Kazuhiko Ozeki
Topic: Text - text/topic summarization
Abstract: Sentence compression is a method of text summarisation,
where each sentence in a text is shortened in such a way
as to retain the original information and grammatical
correctness as much as possible. In a previous paper, we
formulated the problem of sentence compression as an
optimisation problem of extracting a subsequence of phrases
from the original sentence that maximises the sum of topical
importance and grammatical correctness. Based on this
formulation an efficient sentence compression algorithm
was derived. This paper reports the results of a subjective
evaluation of the quality of sentences compressed using
the algorithm.
Paper ID: 17
Type: SP
Title: User query understanding by the InBASE system as a source for a multilingual NLG module (first step)
Contact author: Michael V. Boldasov and Elena G. Sokolova and Michael G. Malkovsky
Topic: Text - multi-lingual issues
Abstract: In this paper we consider the NL generation component of the InBASE
system, a system for understanding NL queries to databases.
This component generates a new NL query from the internal InBASE
Q-representation of the user query. During the planning phase, a
linear positional query representation is constructed, whose positions
carry first conceptual, then syntactic information.
The realization phase deals with the NL means to express the concepts
(objects, attributes, values, relations between objects and
attributes). The NL generation component is conceived as the first
step in the direction from a one-way question-answering system, as
InBASE is now, to a larger-scale information system able to
communicate with users in different areas.
Paper ID: 18
Type: LP
Title: Prosodic Classification of Offtalk: First Experiments
Contact author: Anton Batliner and Viktor Zeissler and Elmar Nöth and Heinrich Niemann
Topic: Dialogue - prosody and emotions in dialogues
Abstract: SmartKom is a multi-modal dialogue system which combines speech with
facial expressions and gestures. In this paper, we deal with one of
the phenomena that can be observed in such elaborate systems,
which we call 'offtalk', i.e., speech that is not
directed to the system (speaking to oneself, speaking aside).
We report the classification results of first experiments
which use a large prosodic feature vector in combination
with part-of-speech information.
Paper ID: 20
Type: SP
Title: Large Vocabulary Speech Recognition of Slovenian Language Using Data-Driven Morphological Models
Contact author: Tomaž Rotovnik and Mirjam Sepesy Maučec and Bogomir Horvat and Zdravko Kačič
Topic: Speech - automatic speech recognition
Abstract: A system for large vocabulary continuous speech recognition of the Slovenian language
is described. Two types of modelling units are examined: words and sub-words.
A data-driven algorithm is used to automatically obtain word decompositions.
The performance of one-pass and two-pass decoding strategies was compared.
The new models gave promising results:
the recognition accuracy was improved by 2.5% absolute at the same recognition time.
Alternatively, a 30% increase in real-time performance was
achieved at the same recognition error rate.
Paper ID: 22
Type: LP
Title: Statistical Decision Making applied to Text and Dialogue Corpora for Effective Plan Recognition
Contact author: Manolis Maragoudakis and Aristomenis Thanopoulos and Nikos Fakotakis
Topic: Dialogue - development of dialogue strategies
Abstract: In this paper, we introduce an architecture designed to achieve effective plan recognition using Bayesian Networks which encode the semantic representation of the user’s utterances. The structure of the networks is determined from dialogue corpora, thus eliminating the costly process of hand-coding domain knowledge. The conditional probability distributions are learned during a training phase in which data are obtained from the same set of dialogue acts. Furthermore, we have incorporated a module that learns semantic similarities of words from raw text corpora and uses the extracted knowledge to resolve the issue of unknown terms, thus enhancing plan recognition accuracy and improving the quality of the discourse. We present experimental results of an implementation of our platform for a weather information system and compare its performance against a similar commercial one. Results show significant improvement in identifying the goals of the user. Moreover, we claim that our framework could straightforwardly be updated with new elements from the same domain or adapted to other domains as well.
Paper ID: 24
Type: LP
Title: Natural Language Guided Dialogues for Accessing the Web
Contact author: Marta Gatius and Horacio Rodríguez
Topic: Dialogue - dialogue systems
Abstract: This paper proposes the use of ontologies representing domain and linguistic knowledge for guiding natural language (NL) communication about Web content. This proposal deals with the problem of accessing and processing the Web data required to answer user queries. Concepts and communication acts are represented in the conceptual ontology (CO). Domain-restricted grammars and lexicons are obtained automatically by adapting the general linguistic knowledge to cover the communication acts for a particular domain. The use of domain-restricted grammars and lexicons has proved to be efficient, especially when the user is guided in entering the NL queries. Once the query has been processed, the system fires the appropriate wrappers to extract the data from the Web. The domain concepts described in the CO provide a unifying framework to represent the knowledge obtained from the various Web sources.
Paper ID: 26
Type: LP
Title: German and Czech Speech Synthesis Using HMM-Based Speech Segment Database
Contact author: Jindřich Matoušek and Daniel Tihelka and Josef Psutka and Jana Hesová
Topic: Speech - text-to-speech synthesis
Abstract: This paper presents an experimental German speech synthesis system. As in the case of the Czech text-to-speech system ARTIC, a statistical approach (using hidden Markov models) was employed to build a speech segment database. This approach was confirmed to be language-independent, and it was shown to be capable of producing a good-quality database that leads to intelligible, high-quality synthetic speech. Some experiments with clustering similar speech contexts were performed to enhance the quality of the synthetic speech. Our results show the superiority of phoneme-level clustering over subphoneme-level clustering.
Paper ID: 27
Type: SP
Title: Valency Lexicon for Czech: from Verbs to Nouns
Contact author: Markéta Lopatková and Veronika Řezníčková and Zdeněk Žabokrtský
Topic: Text - other
Abstract: The valency lexicon of Czech verbs has been intensively worked on for more than a year, and now we have at our disposal a detailed description of the valency frames of several hundred verbs. Presently, the challenge naturally arises to use the existing lexicon for capturing the valency of other word classes. In this paper, we focus on the valency of nouns derived from verbs. We propose an algorithm for the automatic prediction of the valency frames of these nouns, and we test it on a sample of data.
Paper ID: 29
Type: LP
Title: Comparison and Combination of Confidence Measures
Contact author: Georg Stemmer and Stefan Steidl and Elmar Nöth and Heinrich Niemann and Anton Batliner
Topic: Speech - automatic speech recognition
Abstract: A set of features for word-level confidence estimation is developed.
The features should be easy to implement and should require no additional
knowledge beyond the information which is available from the speech recognizer
and the training data.
We compare a number of features based on a common scoring method,
the normalized cross entropy. We also study different ways to
combine the features. An artificial neural network leads to the best performance,
and a recognition rate of 76% is achieved.
The approach is extended
not only to detect recognition errors but also to distinguish between insertion
and substitution errors.
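The normalized cross entropy used here as the common scoring method compares the confidence scores against a baseline that assigns every word the same probability of being correct. A minimal sketch, assuming the standard NIST-style definition, where p_c is the fraction of correctly recognized words and each confidence is a probability strictly between 0 and 1; the function name is hypothetical.

```python
import math
from typing import Sequence

def normalized_cross_entropy(conf: Sequence[float], correct: Sequence[bool]) -> float:
    """NCE = (H_max - H_conf) / H_max for word confidences in the open interval (0, 1).
    Requires at least one correct and one incorrect word, so that 0 < p_c < 1."""
    n_total = len(conf)
    n_correct = sum(correct)
    p_c = n_correct / n_total
    # Entropy of the baseline that always outputs p_c
    h_max = -n_correct * math.log2(p_c) - (n_total - n_correct) * math.log2(1 - p_c)
    # Cross entropy of the actual confidence scores
    h_conf = -sum(math.log2(p) if ok else math.log2(1 - p)
                  for p, ok in zip(conf, correct))
    return (h_max - h_conf) / h_max
```

A value of 0 means the confidences are no more informative than the constant baseline; higher values are better, with 1 as the upper limit for perfect confidences.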
Paper ID: 30
Type: SP
Title: Uniform Speech Recognition Platform for Evaluation of New Algorithms
Contact author: Andrej Žgank and Tomaž Rotovnik and Zdravko Kačič and Bogomir Horvat
Topic: Speech - automatic speech recognition
Abstract: This paper presents the development of a speech recognition platform
whose main area of use is the evaluation of different new and improved
algorithms for speech recognition (noise reduction, feature extraction,
language model generation, training of acoustic models, ...).
To enable wide use of the platform, different test configurations
were added, ranging from alphabet spelling to large vocabulary continuous speech
recognition. So far, the speech recognition platform has been implemented
and evaluated with a studio (SNABI) and a fixed telephone (SpeechDat(II))
speech database.
Paper ID: 31
Type: LP
Title: Strategies for Developing a Real-Time Continuous Speech Recognition System for Czech Language
Contact author: Jan Nouza
Topic: Speech - automatic speech recognition
Abstract: This paper presents a set of ‘strategies’ that enabled the development of a real-time continuous speech recognition system for the Czech language. The optimization strategies include efficient computation of HMM probability densities, pruning schemes applied to HMM states, words and word hypotheses, a bigram compression technique, and a parallel implementation of the real recognition system. In a series of off-line speaker-independent tests done with 1600 Czech sentences based on a 7033-word lexicon, we obtained a 65% recognition rate. Several on-line tests showed that similar rates can be achieved under real conditions with response times shorter than 1 second.
Paper ID: 32
Type: SP
Title: Voice Chat with a Virtual Character: The Good Soldier Svejk Case Project
Contact author: Jan Nouza and Petr Kolář and Josef Chaloupka
Topic: Dialogue - dialogue systems
Abstract: In this paper we present our initial attempt to link speech processing technology, namely continuous speech recognition, text-to-speech synthesis and an artificial talking head, with text processing techniques in order to design a Czech demonstration system that allows for informal voice chatting with virtual characters. The legendary novel figure Svejk is the first personality who can be interviewed in the recently implemented version.
Paper ID: 33
Type: SP
Title: Application of Spoken Dialogue Technology in a Medical Domain
Contact author: I. Azzini and T. Giorgino and D. Falavigna and R. Gretter
Topic: Dialogue - dialogue systems
Abstract: The paper describes the ITC-irst approach for handling spoken dialog
interactions over the telephone network. We will specifically
describe the usage of the dialog system within a tele-medicine
application scenario.
First, the system architecture will be summarized, then we will
briefly describe our approach for evaluating confidence measures for
each of the words in the "best path" provided by our recognizer.
Finally, an automatic service for home monitoring of patients affected
by hypertension will be described. Patients must
periodically enter data into a database containing their personal
medical data. The collected data are managed, according to well
established medical guidelines, by an automatic system that can
suggest therapies or alert doctors.
Paper ID: 34
Type: SP
Title: Term Clustering using a Corpus-Based Similarity Measure
Contact author: Goran Nenadić and Irena Spasić and Sophia Ananiadou
Topic: Text - knowledge representation and reasoning
Abstract: In this paper we present a method for automatic term clustering. The method uses a hybrid similarity measure to cluster terms automatically extracted from a corpus by applying the C/NC value method. The measure comprises contextual, functional and lexical similarity, and it is used to instantiate the cell values in a similarity matrix. The clustering algorithm uses either the nearest neighbour or Ward’s method to calculate the distance between clusters. The approach has been tested and evaluated in the domain of molecular biology, and the results are presented.
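The nearest-neighbour option can be sketched as agglomerative single-link clustering over a precomputed similarity matrix. This is a minimal sketch under stated assumptions: the hybrid contextual/functional/lexical measure is not reproduced, the matrix values are made up, and the stopping threshold is a hypothetical parameter; Ward's method is not shown.

```python
from typing import List

def single_link_clusters(sim: List[List[float]], threshold: float) -> List[List[int]]:
    """Agglomerative nearest-neighbour (single-link) clustering on a symmetric
    similarity matrix: repeatedly merge the two clusters whose most similar
    members are most similar, until no pair of clusters exceeds threshold."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        best, pair = threshold, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single link: cluster similarity = similarity of the closest pair of members
                s = max(sim[i][j] for i in clusters[a] for j in clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        if pair is None:
            break  # no remaining pair is similar enough to merge
        a, b = pair
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # safe because b > a
    return clusters
```

For four terms where terms 0/1 and 2/3 are strongly similar (0.9 and 0.8) and all cross pairs are weak (0.1), a threshold of 0.5 yields the clusters `[[0, 1], [2, 3]]`.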
Paper ID: 35
Type: LP
Title: Applying dialogue constraints to the understanding process in a Dialogue system
Contact author: Emilio Sanchis and Fernando García and Isabel Galiano and Encarna Segarra
Topic: Dialogue - dialogue systems
Abstract: In this paper, we present an approach to the estimation of a dialogue-dependent understanding component of a dialogue system. This work is developed in the framework of BASURDE, a Spanish dialogue system that answers telephone queries about train timetables. Modelling that is specific to the dialogue state is proposed to improve the behaviour of the understanding process. Some experimental results are presented.
Paper ID: 36
Type: LP
Title: Evaluating a Probabilistic Dialogue Model for a Railway Information Task
Contact author: Carlos D. Martínez-Hinarejos and Francisco Casacuberta
Topic: Dialogue - other
Abstract: Dialogue modelling attempts to determine the way in which a dialogue
develops. The dialogue strategy (i.e., the system behaviour) of an automatic
dialogue system is determined by the dialogue model. Most dialogue systems use
rule-based dialogue strategies, but recently probabilistic models have become
very promising. We present probabilistic models based on the dialogue act
concept, which use user turns, dialogue history and semantic information.
These models are evaluated as dialogue act labelers. The evaluation is carried
out on a railway information task.
Paper ID: 37
Type: LP
Title: Comparative Study on Bigram Language Models for Spoken Czech Recognition
Contact author: Dana Nejedlová
Topic: Speech - automatic speech recognition
Abstract: The article deals with the problem of continuous speech recognition of the Czech language. The main goal of this study is to compare various kinds of bigram language models with respect to the accuracy and speed of speech recognition. The main types of bigram language models are described, as well as multiple parameters that affect the performance of a speech recognition system. A comparison with a zerogram model is also made. Different models and parameter settings are compared by means of the accuracy rate in extensive experiments done with a large test database of 1,600 Czech sentences recorded by 40 speakers.
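As a point of reference for the kinds of bigram models being compared, a word bigram model with add-k smoothing can be estimated from counts as below. This is a minimal sketch of one common smoothing variant, assuming sentence markers `<s>` and `</s>`; it is not necessarily one of the models evaluated in the article, and the function name is hypothetical.

```python
from collections import Counter
from typing import Callable, List

def train_bigram(sentences: List[List[str]], k: float = 1.0) -> Callable[[str, str], float]:
    """Return p(w | prev) estimated with add-k smoothing. The predicted vocabulary
    is every training word plus the end marker </s>; <s> only acts as a history."""
    histories, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sent in sentences:
        words = ["<s>"] + sent + ["</s>"]
        vocab.update(sent)
        histories.update(words[:-1])              # each word counted as a history
        bigrams.update(zip(words[:-1], words[1:]))
    v = len(vocab)

    def prob(prev: str, w: str) -> float:
        return (bigrams[(prev, w)] + k) / (histories[prev] + k * v)

    return prob
```

For each history the probabilities over the vocabulary sum to one. As k grows very large, every conditional distribution approaches the uniform 1/V of a zerogram model.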
Paper ID: 38
Type: LP
Title: Integration of speech recognition and automatic lipreading
Contact author: Pascal Wiggers and Leon J. M. Rothkrantz
Topic: Speech - automatic speech recognition
Abstract: At Delft University of Technology, a project on multimodal interfaces is investigating the interaction of speech and lipreading. A large vocabulary, speaker-independent speech recognizer for the Dutch language was developed using the Hidden Markov Model Toolkit (HTK) and the Polyphone database of recorded Dutch speech. To make the system more robust to noise, visual cues provided by an automatic lipreading technique were integrated into the system. In this paper we give an outline of both systems and present results of experiments.
Paper ID: 42
Type: LP
Title: Heuristic and Statistical Methods for Speech/Non-speech Detector Design
Contact author: Michal Prcín and Luděk Müller
Topic: Speech - automatic speech recognition
Abstract: Speech/non-speech (S/NS) detection plays an important role in
automatic speech recognition (ASR) systems, especially for the recognition
of isolated words or commands. Even in continuous
speech, an S/NS decision can be made at the beginning and at the end
of a sequence, resulting in a "sleep mode" of the speech recognizer
during silence and in a reduction of computational demands. It
is very difficult, however, to precisely locate the endpoints of
the input utterance because of unpredictable background noise. The
method proposed in this paper combines the advantages
of two approaches, namely finding the best set of heuristic
features and applying a statistical induction method, to obtain the best
S/NS decision.
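To make the endpointing task concrete, a purely heuristic detector based on short-time energy is sketched below. This is a generic baseline, not the combined heuristic/statistical method proposed in the paper. It assumes that the first few frames contain only background noise, and the frame length, number of noise frames and threshold factor are hypothetical parameters.

```python
from typing import List, Optional, Tuple

def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame (a trailing partial frame is dropped)."""
    return [sum(x * x for x in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def detect_endpoints(samples: List[float], frame_len: int = 160,
                     noise_frames: int = 5, factor: float = 4.0) -> Optional[Tuple[int, int]]:
    """Return (first, last) speech frame indices, or None if no frame exceeds the threshold.
    Assumes the first noise_frames frames contain only background noise."""
    energies = frame_energies(samples, frame_len)
    if len(energies) <= noise_frames:
        return None
    noise_level = sum(energies[:noise_frames]) / noise_frames
    threshold = factor * noise_level  # fixed threshold relative to the noise estimate
    speech = [i for i, e in enumerate(energies) if e > threshold]
    if not speech:
        return None
    return speech[0], speech[-1]
```

A fixed energy threshold of this kind is exactly what unpredictable background noise tends to break, which motivates combining richer heuristic features with a statistical decision.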
Paper ID: 44
Type: LP
Title: Evaluation of prediction methods applied to an inflected language
Contact author: Nestor Garay-Vitoria and Julio Abascal and Luis Gardeazabal
Topic: Dialogue - assistive technologies based on speech and dialogue
Abstract: Prediction is one of the techniques that have been applied in Augmentative and Alternative Communication to help people increase the quality and quantity of text composed per unit of time. Most of the literature has focused on word prediction methods that may easily be applied to non-inflected languages. However, for inflected languages, other approaches that mainly distinguish roots and suffixes may enhance the results (in terms of keystroke savings and hit ratio) of predictive systems. In this paper we present the approaches we have applied to the Basque language (an inflected one) and the results they achieve with a particular text (one that was not used while creating the initial lexicons the systems use for prediction). Based on this evaluation, one of the presented approaches is suggested as the best one.
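Keystroke savings, one of the evaluation measures mentioned above, can be illustrated with a simple word-level predictor simulation. This is a minimal sketch under hypothetical assumptions: a frequency-ranked, prefix-based word list with one keystroke to select a prediction and spaces not counted. It does not model the root/suffix approaches the paper proposes for Basque.

```python
from typing import Dict, List

def keystrokes_with_prediction(text: List[str], lexicon: Dict[str, int],
                               list_size: int = 5) -> int:
    """Simulate a hypothetical prefix-based word predictor. Before each letter is typed,
    the list_size most frequent lexicon words starting with the typed prefix are offered;
    selecting the intended word costs one keystroke. Spaces are not counted."""
    total = 0
    for word in text:
        cost = len(word)  # fallback: the whole word is typed letter by letter
        for typed in range(len(word)):
            prefix = word[:typed]
            candidates = sorted((w for w in lexicon if w.startswith(prefix)),
                                key=lambda w: -lexicon[w])[:list_size]
            if word in candidates:
                cost = min(typed + 1, len(word))  # letters typed + one selection
                break
        total += cost
    return total

def keystroke_savings(text: List[str], lexicon: Dict[str, int], list_size: int = 5) -> float:
    """Keystroke savings in percent relative to typing every letter."""
    plain = sum(len(w) for w in text)
    return 100.0 * (plain - keystrokes_with_prediction(text, lexicon, list_size)) / plain
```

With the toy lexicon `{"eta": 20, "etxea": 10, "etxean": 5}` and the text `["etxea", "zer"]`, a one-item prediction list saves 12.5% of keystrokes and a two-item list saves 50%. The out-of-lexicon word "zer" always has to be typed in full, which is the kind of gap that root/suffix prediction aims to reduce for inflected word forms.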
Paper ID: 45
Type: LP
Title: The Role of WSD for Multilingual Natural Language Applications
Contact author: Andrés Montoyo and Rafael Romero and Sonia Vázquez and Carmen Calle and Susana Soler
Topic: Text - word sense disambiguation
Abstract: Nowadays, the need for advanced free-text filtering in multilingual environments is increasing. Therefore, when searching for specific keywords in a multilingual information space, it is desirable to eliminate occurrences where the word or words of each language are used in an inappropriate sense. This task could be exploited in internet browsers, resource discovery systems, relational databases containing free-text fields, electronic document management systems, data warehouse and data mining systems, etc. To address this problem, in this paper we present a Word Sense Disambiguation interface which returns word senses in different languages and could be employed in multilingual natural language applications. This interface resolves the lexical ambiguity of nouns and verbs in input texts in some European languages (English, Spanish), using the taxonomy of the EuroWordNet lexical knowledge database, and returns a multilingual output of the word senses (English, Spanish, Catalan and Basque). In addition to the relations in WordNet 1.5, EuroWordNet includes cross-language and cross-category relations, which are directly useful for multilingual Word Sense Disambiguation. This interface has been implemented in the C++ programming language and provides a visual framework.
Paper ID: 47
Type: LP
Title: A Gibbsian Context-Free Grammar for Parsing
Contact author: Antoine Rozenknop
Topic: Text - parsing and part-of-speech tagging
Abstract: Probabilistic Context-Free Grammars can be used for speech recognition or
syntactic analysis thanks to especially efficient algorithms. In this
paper, we propose an instantiation of such a grammar, whose
mathematical properties are intuitively more suitable for those tasks
than those of SCFGs (Stochastic CFGs), without requiring specific analysis
algorithms. Results on the Susanne corpus show that up to 33% of the analysis
errors made by an SCFG can be avoided with this model.
Paper ID: 48
Type: SP
Title: Speech Enhancement Using Mixtures of Gaussians for Speech and Noise
Contact author: Ilyas Potamitis and Nikos Fakotakis and Nikos Liolios and George Kokkinakis
Topic: Speech - other
Abstract: In this article we approximate the clean speech spectral magnitude, as well as the noise spectral magnitude, with a mixture of Gaussian pdfs using the Expectation-Maximization (EM) algorithm. Subsequently, we apply the Bayesian inference framework to the degraded spectral coefficients and, by employing Minimum Mean Square Error (MMSE) estimation, we derive a closed-form solution for the spectral magnitude estimation task, adapted to the spectral characteristics and noise variance of each band. We evaluate our algorithm using real, coloured, slowly and quickly varying noise types (factory and aircraft noise) and demonstrate its robustness at very low SNRs.
Paper ID: 49
Type: LP
Title: Word Sense vs. Word Domain Disambiguation: a Maximum Entropy approach
Contact author: Armando Suárez and Manuel Palomar
Topic: Text - word sense disambiguation
Abstract: In this paper, a supervised learning system for word sense
disambiguation is presented. It is based on maximum entropy
conditional probability models. The system acquires
linguistic knowledge from an annotated corpus, and this knowledge
is represented in the form of features. The system was evaluated
using both WordNet's senses and domains as the sets of classes of
each word. Domain labels are obtained from the enrichment of
WordNet with subject field codes, which produces a polysemy
reduction. Several types of features have been analyzed for a few
words selected from the DSO corpus. Currently, the system
implementation does not support any smoothing technique or complex
pre-processing, but its accuracy is good when
compared with, for example, the systems at SENSEVAL-2. Using the
domain enrichment of WordNet, a 14% accuracy improvement is
achieved.
Paper ID: 50
Type: SP
Title: From HTML to VoiceXML: A first approach.
Contact author: César González Ferreras and David Escudero Mancebo and Valentín Cardeñoso Payo
Topic: Dialogue - markup languages related to speech and dialogue
Abstract: In this work, we discuss the process of building the voice portal counterpart of a departmental web site. VoiceXML has been used as the dialog modelling language. A prototypical system has been built using our own VoiceXML interpreter, which easily integrates different implementation platforms. A general discussion of the advantages and disadvantages of VoiceXML is also presented, and a simple startup procedure is proposed as a means to build voice portals starting from legacy web sites.
Paper ID: 53
Type: LP
Title: Cross-Language Access to Recorded Speech in the MALACH project
Contact author: D.W. Oard and D. Demner-Fushman and J. Hajič and B. Ramabhadran and S. Gustman and W.J. Byrne and D. Soergel and B. Dorr and P. Resnik and M. Picheny
Topic: Text - information retrieval
Abstract: The MALACH project seeks to help users find information in vast
multilingual collections of untranscribed oral history interviews.
This paper introduces the goals of the project and focuses on
supporting access by users who are unfamiliar with the interview
language. It begins with a review of the state of the art in
cross-language speech retrieval; approaches that will be investigated
in the project are then described. Czech was selected as the first
non-English language to be supported, so results of an initial
experiment with Czech/English cross-language retrieval are reported.
Paper ID: 54
Type: LP
Title: Utterance Verification based on the Likelihood Distance to Alternative Paths
Contact author: Gies Bouwman and Lou Boves
Topic: Speech - automatic speech recognition
Abstract: Utterance verification is the process of automatically rejecting incorrectly recognised utterances while accepting as many correct results as possible. To this end, the probability of an error is often estimated by a one-dimensional confidence measure. In this paper we take a closer look at incorrect classification. We argue that errors stem from a number of different causes and that this observation must be reflected in the design of the utterance verifier.
Therefore, we developed measures to detect either out-of-vocabulary (OOV) word errors or in-vocabulary substitution errors. To this end, we compute confidence measures based on the distance between the likelihood of the first-best output and two alternative hypotheses: one corresponding to the second-best output, the other to the most likely free phone string.
The paper reports on experiments on spoken Dutch city names for a directory assistance application. The results show that a 10% reduction in Confidence Error Rate can be achieved by using a classification and regression tree instead of a linear combination of the cues with a threshold value.
Paper ID: 55
Type: LP
Title: Rejection technique based on the mumble model
Contact author: Tomáš Bartoš and Luděk Müller
Topic: Speech - other
Abstract: In this paper a technique for the detection and rejection of incorrectly recognized words is described. The speech recognition system used is based on a speaker-independent continuous density Hidden Markov Model recognizer and a so-called mumble model, whose structure and function are also described. An improved rejection technique is presented and compared with the heuristic rejection method that we previously used. The new method is fully statistically based. The selection of features for training and classification, the procedures for estimating the statistical model parameters, and experimental results are reported. The improved rejection technique achieves an error rate of approximately 12% in the detection of incorrectly recognized words.
Paper ID: 56
Type: LP
Title: Efficient Noise Estimation and its Application for Robust Speech Recognition
Contact author: Petr Motlíček and Lukáš Burget
Topic: Speech - automatic speech recognition
Abstract: An investigation of some well-known noise estimation techniques is presented. The estimated noise is applied in our noise suppression system, which is generally used for speech recognition tasks. Moreover, the algorithms are developed to form part of the front-end of Distributed Speech Recognition (DSR). Therefore, we have proposed some modifications of noise estimation techniques that adapt quickly to varying noise and need less information from past segments. We also minimized the algorithmic delay. The robustness of the proposed algorithms was tested under several noisy conditions.
Paper ID: 58
Type: LP
Title: Synthesis in Serbian Language
Contact author: Milan Sečujski and Radovan Obradović and Darko Pekar and Ljubomir Jovanov and Vlado Delić
Topic: Speech - text-to-speech synthesis
Abstract: This paper presents some basic criteria for
the design of a concatenative text-to-speech
synthesizer for the Serbian language. The paper
describes the prosody generator that was
used, and reflects upon several peculiarities of
the Serbian language that led to its adoption.
The paper also describes criteria for on-line
selection of appropriate segments from a large
speech corpus.
Paper ID: 61
Type: LP
Title: Using Salient Words to Perform Categorization of Web Sites
Contact author: Marek Trabalka and Mária Bieliková
Topic: Text - information retrieval
Abstract: In this paper we focus on the categorization of web sites. We compare some quantitative characteristics of existing web directories, analyze the vocabulary used in descriptions of web sites in the Yahoo web directory, and propose an approach to automatically categorizing web sites. Our approach is based on the novel concept of salient words. An experimental evaluation compares two realizations of the proposed concept. The first uses words typical of just one category, while the second uses words typical of one or a few categories. The results show that a single vocabulary-based method is limited in its ability to properly categorize a space as heterogeneous as the World Wide Web.
Paper ID: 70
Type: LP
Title: Discourse-Semantic Analysis of Hungarian Sign Language
Contact author: Gábor Alberti and Helga M. Szabó
Topic: Text - lexical semantics and semantic networks
Abstract: not present
Paper ID: 71
Type: LP
Title: Speech Features Extraction Using Cone-shaped Kernel Distribution
Contact author: Janez Žibert and France Mihelič and Nikola Pavešić
Topic: Speech - automatic speech recognition
Abstract: The paper reviews two basic time-frequency distributions,
the spectrogram and the cone-shaped kernel distribution, applied to speech
signals. We propose a new, modified method of speech feature
extraction based on mel-frequency cepstral coefficients computed with
the cone-shaped kernel distribution. We additionally
explore several estimates of the time derivatives, approximated
by regression coefficients and by coefficients determined by
trigonometric functions. Analyses and tests are performed for
different sets of speech features obtained from the spectrogram and the
cone-shaped kernel distribution, using a speech recognition system
based on hidden Markov acoustic models. Our main goal has been to
incorporate different time-frequency distributions into the speech
feature extraction process and potentially find an alternative
way of deriving speech features based on these distributions.
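The regression-based estimate of the time derivatives can be illustrated with the delta formula commonly used in HMM front ends (for example HTK-style delta coefficients): d_t = sum over n = 1..N of n(c_{t+n} - c_{t-n}), divided by 2(1^2 + ... + N^2). The sketch below assumes a window of N = 2 and replicates the first and last frames at the sequence edges; it is a generic illustration, not the authors' front end, and the trigonometric variant is not shown.

```python
def delta(features: list[list[float]], N: int = 2) -> list[list[float]]:
    """Regression (delta) coefficients for a sequence of feature vectors.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with the first and last frames replicated at the sequence edges.
    """
    T = len(features)
    dim = len(features[0])
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        d = [0.0] * dim
        for n in range(1, N + 1):
            nxt = features[min(t + n, T - 1)]  # clamp to the last frame
            prv = features[max(t - n, 0)]      # clamp to the first frame
            for k in range(dim):
                d[k] += n * (nxt[k] - prv[k])
        out.append([v / denom for v in d])
    return out
```

On a linear ramp the interior deltas equal the slope, which is a quick sanity check that the normalization is right.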
Paper ID: 72
Type: LP
Title: A Voice-Driven Web Browser for Blind People
Contact author: Simon Dobrišek and Jerneja Gros and Boštjan Vesnicer and France Mihelič and Nikola Pavešić
Topic: Dialogue - dialogue systems
Abstract: A specialised small Web browser with a voice-driven dialogue manager and a
text-to-speech screen reader is presented. The Web browser was built from the
GTK Web browser Dillo, a free software project released under the terms of the
GNU General Public License. The new built-in screen reader is now triggered
by pointing the mouse and uses the text-to-speech module for its output. A
dialogue module together with a spoken-command input was also introduced into
the browser. It can be used for navigating through the structure of common Web
pages. The developed browser is primarily intended to be used with the new
Web portal, exclusively dedicated to blind and visually impaired users. All
the Web pages at the portal or at sites that are linked from this portal are
expected to be arranged as common HTML/XML pages that comply with the
basic recommendations set by the Web Accessibility Initiative.
Paper ID: 74
Type: LP
Title: Automatic Transcription of Czech Language Oral History in the MALACH Project: Resources and Initial Experiments
Contact author: Josef Psutka and Pavel Ircing and Josef V. Psutka and Vlasta Radová and William J. Byrne and Jan Hajič and Samuel Gustman and Bhuvana Ramabhadran
Topic: Speech - automatic speech recognition
Abstract: In this paper we describe the initial stages of the ASR component of the MALACH (Multilingual Access to Large Spoken Archives) project. This project will attempt to provide improved access to the large multilingual spoken archives collected by the Survivors of the Shoah Visual History Foundation (VHF) by advancing the state of the art in automated speech recognition. In order to train the ASR system, it is necessary to manually transcribe a large amount of speech data, identify the appropriate vocabulary, and obtain relevant text for language modeling. We give a detailed description of the speech annotation process, show the specific properties of the spontaneous speech contained in the archives, and present baseline speech recognition results.
Paper ID: 80
Type: SP
Title: Word Sense Discrimination for Czech
Contact author: Robert Král
Topic: Text - word sense disambiguation
Abstract: This paper deals with the automatic discrimination of contexts of Czech
ambiguous words. Schütze's methodology was used, modified and adapted
for the Czech language. This algorithm is based on word space and
clustering. Semantic discrimination can be understood as a subtask of
word sense disambiguation. In this approach, the sense of a word is defined as
a cluster of contexts of the ambiguous word. We show that Schütze's method
is transferable to Czech. Our results are not as good as his because we
experimented with a highly ambiguous word.
Paper ID: 86
Type: SP
Title: Tools for Semi-Automatic Assignment of Czech Nouns to Declination Patterns
Contact author: Dita Bartůšková and Radek Sedláček
Topic: Text - automatic morphology
Abstract: In this paper, we present tools for the semi-automatic assignment of Czech nouns to
declination patterns. First, we explain the reasons for developing
such tools, and then we describe the structure of the system in detail. It
is based on a decision tree consisting of questions and answers that allow
particular declination patterns to be distinguished. Finally, we provide basic
statistical data that clarify the relation between the patterns we developed
and the classical ones.
Paper ID: 87
Type: LP
Title: Dependency Analyser Configurable by Measures
Contact author: Tomáš Holan
Topic: Text - parsing and part-of-speech tagging
Abstract: In this paper we present a dependency analyser able to perform syntactic recognition and analysis according to dependency grammars. The analyser can deal with nonprojective constructions and has means to express the level of word-order freedom and its limitations. The level of word-order freedom and the level of robustness (correctness) of sentences can be given as parameters of the analysis. Data and grammar definition languages are also presented.
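Nonprojectivity is a property of the tree itself: an arc is projective when every word lying between the head and its dependent is (transitively) governed by that head. The following minimal sketch checks this for a tree encoded as a head array, which is an assumed representation (`heads[i]` is the head of word i+1, with 0 for the artificial root); it does not reproduce the analyser, its measures, or its grammar formalism.

```python
def is_projective(heads: list[int]) -> bool:
    """Check whether a dependency tree is projective.

    heads[i] is the head of word i+1 (words are numbered 1..n, 0 is the
    artificial root). An arc h -> d is projective if every word strictly
    between h and d is (transitively) governed by h. Assumes a valid tree.
    """
    n = len(heads)
    head = [0] + heads  # head[d] for d in 1..n; index 0 stands for the root

    def dominates(h: int, k: int) -> bool:
        # Walk up the head chain from k until we reach h or the root.
        while k != 0:
            k = head[k]
            if k == h:
                return True
        return h == 0

    for d in range(1, n + 1):
        h = head[d]
        lo, hi = sorted((h, d))
        for k in range(lo + 1, hi):
            if not dominates(h, k):
                return False
    return True
```

For example, `[3, 0, 2]` (word 1 governed by word 3 across the root's child, word 2) is reported as nonprojective.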
Paper ID: 88
Type: LP
Title: Knowledge-Based Speech Interface for Handhelds
Contact author: C.K. Yang and L.J.M. Rothkrantz
Topic: Dialogue - development of dialogue strategies
Abstract: This paper describes a project carried out at CMG Trade Transport & Industry BV. The project, called SWAMP, is an example of the application of speech technology in human-computer interaction. The reasoning model behind the speech interface is based on the Belief Desire Intention (BDI) model for rational agents. Other important tools used to build the speech user interface are the Microsoft Speech API 5 and CLIPS.
Paper ID: 89
Type: SP
Title: A Flexible Framework for Evaluation of New Algorithms for Dialogue Systems
Contact author: Pavel Cenek
Topic: Dialogue - dialogue systems
Abstract: Research in the field of dialog systems often involves building a dialog system for evaluating algorithms, collecting data and running various experiments. Creating such a system requires a significant amount of time. To facilitate this task, we created a flexible, extensible and easy-to-use framework that can serve as a base for experimenting with dialog systems. The paper introduces the major features of the framework together with possible ways of using them in practice.
Paper ID: 90
Type: LP
Title: The Generation and Use of Layer Information in Multilayered Extended Semantic Networks
Contact author: Sven Hartrumpf and Hermann Helbig
Topic: Text - lexical semantics and semantic networks
Abstract: The paradigm of Multilayered Extended Semantic Networks (MultiNet) is one
of the most thoroughly described knowledge representation systems along
the line of semantic networks (Quillian 1968).
The conceptual representation of MultiNet is characterized by
embedding its nodes into a multidimensional space of layer attributes.
These layer attributes and their values play an important part during the
syntactico-semantic analysis of natural language texts and during the
inferential answer finding in question answering systems.
The paper demonstrates the automatic generation of complex layer information
for conceptual nodes and their use in the phase of assimilation of knowledge
pieces into a larger knowledge base.
Paper ID: 91
Type: SP
Title: On the First Greek-TTS Based on Festival Speech Synthesis: Architecture and Components Description
Contact author: Zervas P. and Potamitis I. and Fakotakis N. and Kokkinakis G.
Topic: Speech - text-to-speech synthesis
Abstract: In this article we describe the first text-to-speech (TTS) system for the Greek language based on the Festival architecture. We discuss practical implementation details, focusing on the preparation of the diphone database and on the phoneme duration prediction module, implemented with the CART technique. Two male-voice databases were used for two different speech synthesis engines, namely residual LPC synthesis and the MBROLA technique.
Paper ID: 93
Type: LP
Title: Enhancing Best Analysis Selection and Parser Comparison
Contact author: Aleš Horák and Vladimír Kadlec and Pavel Smrž
Topic: Text - parsing and part-of-speech tagging
Abstract: This paper discusses methods for enhancing the selection of a "best"
parsing tree from the output of natural language syntactic analysis.
It presents a method for cutting away redundant parse trees based on
the information obtained from a dependency tree-bank corpus.
The effectiveness of the enhanced parser is demonstrated by the results of
an inter-system parser comparison. The tests were run on the standard
evaluation grammars (ATIS, CT and PT), and our system outperforms the
reference implementations.
Paper ID: 94
Type: LP
Title: Exploiting Thesauri and Hierarchical Categories in Cross-Language Information Retrieval
Contact author: Fatiha Sadat and Masatoshi Yoshikawa and Shunsuke Uemura
Topic: Text - information retrieval
Abstract: As Internet resources become accessible to more and more countries, there is a need to develop efficient methods for information retrieval across languages. In the present paper, we focus on query expansion techniques to improve the effectiveness of information retrieval. A combination of dictionary-based translation and statistics-based disambiguation is indispensable for overcoming translation ambiguity. We propose a model that uses multiple sources for query reformulation and expansion to select expansion terms and retrieve the information needed by a user. Relevance feedback, thesaurus-based expansion, and a new feedback strategy based on the extraction of domain keywords to expand the user’s query are introduced and evaluated. We evaluated the effectiveness of the proposed combined method by applying it to French-English information retrieval.
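Of the expansion sources listed, relevance feedback is the most standard. As a generic point of reference (not the authors' model), the classic Rocchio update on sparse term-weight vectors can be sketched as follows; the dictionary representation and the alpha = 1.0, beta = 0.75, gamma = 0.15 defaults are illustrative assumptions.

```python
from collections import defaultdict


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio relevance feedback on sparse term-weight vectors (dicts).

    q' = alpha * q + beta * mean(relevant) - gamma * mean(nonrelevant).
    Terms whose resulting weight is not positive are dropped.
    """
    new_q = defaultdict(float)
    for term, w in query.items():
        new_q[term] += alpha * w
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, w in doc.items():
                # Each document contributes its share of the class centroid.
                new_q[term] += coef * w / len(docs)
    return {t: w for t, w in new_q.items() if w > 0}
```

New terms that appear in the relevant documents (here, potential expansion terms) enter the query with positive weight, while terms seen mainly in nonrelevant documents are suppressed.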
Paper ID: 97
Type: SP
Title: An Analysis of Limited Domains for Speech Synthesis
Contact author: Robert Batůšek
Topic: Speech - text-to-speech synthesis
Abstract: This paper deals with the problem of limited domain speech synthesis. Some experiments show that segment variability is extremely large for unlimited speech synthesis. It seems practically impossible to collect a text corpus large enough to cover all combinations of even very coarse features. A natural question arises: can restricting the synthesizer to a specific domain help to increase segment coverage? This paper provides an analysis of several limited domain text corpora and evaluates their applicability to the problem of segment selection for speech synthesis.
Paper ID: 98
Type: LP
Title: Advances in Very Low Bit Rate Speech Coding using Recognition and Synthesis Techniques
Contact author: Genevieve Baudoin and François Capman and Jan Černocký and Fadi El Chami and Maurice Charbit and Gérard Chollet and Dijana Petrovska-Delacrétaz
Topic: Speech - speech coding
Abstract: ALISP (Automatic Language Independent Speech Processing) units are an
alternative concept to using phoneme-derived units in speech
processing. This article describes advances in very low bit rate
coding using ALISP units. Results of speaker-independent
experiments are reported and speaker clustering using vector
quantization is proposed. Improvements to speech re-synthesis
using the Harmonic Noise Model and dynamic selection of units are
discussed.
Paper ID: 99
Type: LP
Title: Different Approaches to Build Multilingual Conversational Systems
Contact author: Marion Mast and Thomas Ross and Henrik Schulz and Heli Harrikari
Topic: Dialogue - dialogue systems
Abstract: The paper describes developments and
results of the work being carried out
during the European research project
CATCH-2004 (Converse in AThens Cologne
and Helsinki). The objective of
the project is multi-modal,
multi-lingual conversational access
to information systems. This paper
concentrates on issues of the
multilingual telephony-based speech
and natural language understanding
components.
Paper ID: 100
Type: LP
Title: Strategies to Overcome Problematic Input in a Spanish Dialogue System
Contact author: Victoria Arranz and Núria Castell and Montserrat Civit
Topic: Dialogue - dialogue systems
Abstract: This paper focuses on the strategies adopted to tackle problematic input
and ease communication between modules in a Spanish railway information
dialogue system for spontaneous speech. The paper describes the design
and tuning considerations followed by the understanding module, both
from a language processing and semantic information extraction point of
view. Such strategies aim to handle the problematic input received from
the speech recogniser, which is due to spontaneous speech as well as
recognition errors.
Paper ID: 101
Type: LP
Title: Fitting German into N-Gram Language Models
Contact author: Robert Hecht and Jürgen Riedler and Gerhard Backfried
Topic: Speech - automatic speech recognition
Abstract: We report on a series of experiments addressing the fact that German is
less suited than English for word-based n-gram language models. Several
systems were trained at different vocabulary sizes and various sets of
lexical units. They were evaluated against a newly created corpus of German
and Austrian broadcast news.
Paper ID: 102
Type: LP
Title: Dialogue systems and planning
Contact author: Guy Camilleri
Topic: Dialogue - other
Abstract: Planning processes are often used in dialogue systems to recognize
the intentions conveyed in dialogue. The generation of utterances
can also be achieved by a planning/execution mechanism. Advantages of this kind of mechanism include knowledge sharing,
modular design, and declarative description.
In this paper, we present some planning mechanisms and the related
models that enable dialogue management (generation and
understanding).
Paper ID: 103
Type: LP
Title: A Comparison of Different Approaches to Automatic Speech Segmentation
Contact author: Kris Demuynck and Tom Laureys
Topic: Speech - speech segmentation
Abstract: We compare different methods for obtaining accurate speech
segmentations starting from the corresponding orthography. The
complete segmentation process can be decomposed into two basic
steps. First, a phonetic transcription is automatically produced
with the help of large vocabulary continuous speech recognition
(LVCSR).
Then, the phonetic information and the speech signal serve
as input to a speech segmentation tool. We compare two automatic
approaches to segmentation, based on the Viterbi and the
Forward-Backward algorithm respectively. Further, we develop
different techniques to cope with biases between automatic and
manual segmentations. Experiments were performed to evaluate the
generation of phonetic transcriptions as well as the different
speech segmentation methods.
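The Viterbi-based variant can be illustrated by a minimal forced-alignment sketch: given per-frame log-likelihoods for each phone of the known transcription, it finds the monotonic left-to-right segmentation with the highest total score. This assumes one state per phone, uniform transition costs and precomputed acoustic scores, all simplifications; it is not the authors' segmentation tool, and the Forward-Backward variant is not shown.

```python
import math


def viterbi_align(loglik: list[list[float]]) -> list[int]:
    """Left-to-right Viterbi alignment of a known phone sequence.

    loglik[t][s] is the acoustic log-likelihood of frame t under the s-th
    phone of the transcription. Each phone covers at least one frame,
    phones appear in order, and transition costs are assumed uniform.
    Returns the start frame of each phone.
    """
    T, S = len(loglik), len(loglik[0])
    if T < S:
        raise ValueError("need at least one frame per phone")
    neg = -math.inf
    score = [[neg] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]  # 1 if we advanced from phone s-1
    score[0][0] = loglik[0][0]          # the first frame belongs to phone 0
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else neg
            if move > stay:
                score[t][s], back[t][s] = move + loglik[t][s], 1
            else:
                score[t][s] = stay + loglik[t][s]
    # Trace back from the last phone at the last frame.
    starts = [0] * S
    s = S - 1
    for t in range(T - 1, 0, -1):
        if back[t][s]:
            starts[s] = t
            s -= 1
    return starts
```

The returned start frames are the segment boundaries; systematic offsets between such automatic boundaries and manual ones are the kind of bias the abstract mentions correcting.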
Paper ID: 104
Type: LP
Title: Filtering of Large Numbers of Unstructured Text Documents by the Developed Tool TEA
Contact author: Jan Žižka and Aleš Bourek
Topic: Text - information retrieval
Abstract: This paper describes a text-document-filtering software tool TEA
(TExt Analyzer), which was originally developed for physicians to
support the selection from large numbers of unstructured medical text
documents obtained from available Internet services. TEA learns
which documents are interesting and relevant to individual users,
basically by means of the naive Bayes algorithm. Moreover, TEA
provides a number of additional functions that improve its
classification accuracy. The learning process of TEA is based on a
set of labeled positive and negative examples of text documents,
which obtain their labels from users interested in documents of
certain, usually very specific topics. Experiments and real uses
of TEA by physicians have demonstrated that the classification
accuracy (separating the documents into two classes,
interesting and uninteresting) can be expected to range from 70% up to
97%, typically 85% or better.
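The abstract does not detail TEA's model beyond naive Bayes; as a point of reference, a minimal multinomial naive Bayes classifier with add-one smoothing for the two-class (interesting/uninteresting) setting might look like this. The class name, the pre-tokenized input and the toy training data are assumptions for illustration.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in d}
        self.prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(w for d in class_docs for w in d)
            self.totals[c] = sum(self.counts[c].values())
        return self

    def predict(self, doc: list[str]) -> str:
        V = len(self.vocab)

        def score(c: str) -> float:
            s = self.prior[c]
            for w in doc:
                if w in self.vocab:  # words never seen in training are ignored
                    s += math.log((self.counts[c][w] + 1) / (self.totals[c] + V))
            return s

        return max(self.classes, key=score)
```

Usage: `NaiveBayes().fit(docs, labels).predict(tokens)` returns the label with the highest posterior log-score; the user's positive and negative examples play the role of `docs` and `labels`.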
Paper ID: 106
Type: LP
Title: Keyword Spotting Using Support Vector Machines
Contact author: Yassine Ben Ayed and Dominique Fohr and Jean Paul Haton and Gérard Chollet
Topic: Speech - automatic speech recognition
Abstract: Support Vector Machines (SVMs) are a new and promising technique in
statistical learning theory. Recently, this technique has produced very
interesting results in pattern recognition.
In this paper, we present one of the first applications of the Support Vector Machine
technique to the problem of keyword spotting. It
classifies keywords as correct or incorrect using linear and
radial basis function kernels. The aim of using SVMs in keyword
spotting is to improve recognition and rejection
accuracy. The obtained results are very promising.
Paper ID: 107
Type: LP
Title: Improved performances and automatic parameter estimation for a context-independent speech segmentation algorithm
Contact author: Guido Aversano and Anna Esposito
Topic: Speech - speech segmentation
Abstract: In the framework of a recently introduced algorithm for speech phoneme segmentation, a novel strategy has been developed for comparing different speech encoding methods and for finding the parameters that are optimal for the algorithm. The automatic procedure that implements this strategy improves on previously reported performance and lays the basis for a more accurate comparison between the investigated segmentation system and other segmentation methods proposed in the literature.
Paper ID: 110
Type: LP
Title: Phoneme Lattice Based A* Search Algorithm for Speech Recognition
Contact author: Pascal Nocera and Georges Linares and Dominique Massonié and Loic Lefort
Topic: Speech - automatic speech recognition
Abstract: This paper presents the Speeral continuous speech recognition system developed at the LIA.
Speeral uses a modified A* algorithm to find the best path in the search graph, taking into
account acoustic and linguistic constraints. Rather than working word by word, the A* search used
in Speeral operates on a previously generated phoneme lattice.
To avoid backtracking problems, the system keeps, for each
frame, the deepest nodes of the partially explored lexical tree
starting at that frame. If a new hypothesis to explore ends with a word
and the lexicon starting where this word finishes has already been developed,
then the next hypothesis "jumps" directly to the deepest nodes.
The decoding performance of Speeral is evaluated on the test set of the ARC B1 campaign
of AUPELF'97. The experiments on this French database show the efficiency
of the search strategy described in this paper.
Paper ID: 112
Type: SP
Title: Some like it Gaussian ...
Contact author: P. Matějka and P. Schwarz and M. Karafiát and J. Černocký
Topic: Speech - automatic speech recognition
Abstract: In hidden Markov models, speech features are modeled by Gaussian
distributions. In this paper, we propose to Gaussianize the features to
better fit this modeling. A distribution of the data is estimated and
a transform function is derived from it. We have tested two methods of transform
estimation (global and speaker-based). The results are reported on
recognition of isolated Czech words (SpeechDat-E) with
CI and CD models and on medium vocabulary continuous speech
recognition task (SPINE). In all three cases, the Gaussianized data gave
results superior to standard MFC coefficients, showing that
Gaussianization is a cheap way to increase recognition accuracy.
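The abstract does not specify how the transform is estimated. One common way to Gaussianize a single feature dimension is CDF matching: map x to Phi^-1(F(x)), where F is the empirical CDF of training data and Phi is the standard normal CDF. The sketch below assumes that approach and uses `statistics.NormalDist` from the Python standard library; in these terms, the global and speaker-based variants would differ only in which samples are passed to the hypothetical `fit_gaussianizer`.

```python
from bisect import bisect_left, bisect_right
from statistics import NormalDist
from typing import Callable


def fit_gaussianizer(train: list[float]) -> Callable[[float], float]:
    """Return a function mapping x towards N(0, 1) via the empirical CDF.

    F(x) is estimated from the sorted training values using the average rank
    of x with a half-sample offset, so it never reaches 0 or 1; the
    transform is y = Phi^{-1}(F(x)).
    """
    ref = sorted(train)
    n = len(ref)
    std_normal = NormalDist()  # mean 0, standard deviation 1

    def transform(x: float) -> float:
        # Average rank of x among the reference samples (handles ties).
        rank = (bisect_left(ref, x) + bisect_right(ref, x)) / 2
        p = (rank + 0.5) / (n + 1)
        return std_normal.inv_cdf(p)

    return transform
```

The transform is monotonic, so it preserves the ordering of feature values while reshaping their distribution towards a standard normal.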
Paper ID: 113
Type: LP
Title: Visualisation Techniques for Analysing Meaning
Contact author: Dominic Widdows and Scott Cederberg and Beate Dorow
Topic: Text - lexical semantics and semantic networks
Abstract: Many ways of dealing with large collections of linguistic information
involve the general principle of mapping words, larger terms and
documents into some sort of abstract space. Considerable effort has been
devoted to applying such techniques for practical tasks such as
information retrieval and word-sense disambiguation. However, the
inherent structure of these spaces is often less well-understood.
Visualisation tools can help to uncover the relationships between
meanings in this space, giving a clearer picture of the natural
structure of linguistic information. We present a variety of tools for
visualising word-meanings in vector spaces and graph models, derived
from co-occurrence information and local syntactic analysis. Our
techniques suggest new solutions, which perform well under evaluation,
to standard problems such as the automatic management of lexical
resources.
The tools presented in this paper are all available for public
use on our website.
Paper ID: 115
Type: LP
Title: Part-of-Speech Tagging for Old Chinese
Contact author: Liang Huang and Yinan Peng and Huan Wang and Zhenyu Wu
Topic: Text - parsing and part-of-speech tagging
Abstract: Old Chinese is essentially different from Modern Chinese in both grammar and morphology. While there has recently been a great deal of work on part-of-speech (POS) tagging for Modern Chinese, POS tagging of Old Chinese has been largely neglected. To the best of our knowledge, this is the first work in this area. Fortunately, however, Old Chinese is easier to tag than Modern Chinese in that most Old Chinese words consist of a single character and therefore require no segmentation. In this paper, we propose and analyze a simple statistical approach to POS tagging of Old Chinese. We first design a tagset for Old Chinese that is later shown to be accurate and efficient. We then apply a hidden Markov model (HMM) together with the Viterbi algorithm and make several improvements, such as handling of the sparse data problem and unknown word guessing, both designed especially for Chinese. As the training set grows larger, the hit rate of the bigram and trigram models increases to 94.9% and 97.6%, respectively. The importance of our work lies in the previously unseen features that are specific to Old Chinese and in the successful techniques we have developed to deal with them. Although Old Chinese is now a dead language, this work still has many applications in areas such as Ancient-Modern Chinese machine translation.
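The HMM/Viterbi approach can be illustrated with a minimal bigram tagger. The probability tables, the toy romanized tokens and the constant unknown-word penalty `unk` are assumptions for illustration only; the paper's tagset, sparse-data handling and unknown-word guessing are not reproduced here.

```python
import math


def viterbi_tag(words, tags, log_trans, log_emit, unk=-20.0):
    """Return the most likely tag sequence for `words` under a bigram HMM.

    log_trans[(prev, cur)] is log P(cur | prev), with "<s>" as the start tag;
    missing transitions are treated as impossible. log_emit[(tag, word)] is
    log P(word | tag); missing pairs get the constant penalty `unk`, a crude
    placeholder for real unknown-word handling.
    """
    neg = -math.inf
    # Each column maps tag -> (best log score ending in tag, previous tag).
    columns = [{
        t: (log_trans.get(("<s>", t), neg) + log_emit.get((t, words[0]), unk), None)
        for t in tags
    }]
    for word in words[1:]:
        prev_col = columns[-1]
        col = {}
        for t in tags:
            best_prev = max(tags, key=lambda p: prev_col[p][0] + log_trans.get((p, t), neg))
            score = prev_col[best_prev][0] + log_trans.get((best_prev, t), neg)
            col[t] = (score + log_emit.get((t, word), unk), best_prev)
        columns.append(col)
    # Trace back from the best-scoring final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for col in reversed(columns[1:]):
        tag = col[tag][1]
        path.append(tag)
    return path[::-1]
```

A trigram variant would condition transitions on the two previous tags, which enlarges the state space to tag pairs but leaves the dynamic programme the same in structure.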
Paper ID: 117
Type: LP
Title: Audio Collections of Endangered Arctic Languages in the Russian Federation
Contact author: Marina Lublinskaya and Tatiana Sherstinova
Topic: Speech - other
Abstract: In the Russian Federation, 63 minority languages are listed in the "Red
Book of the Languages of Russia", which means that they are
practically dying out. It is therefore highly important to
make and preserve original recordings of these languages and
prepare their documentation. The Arctic peoples of Russia are
demographically small, and the number of speakers of their languages is decreasing
dramatically. The paper describes three projects related to two Northern
languages, Nenets and Nganasan: the Nenets Audio Dictionary, the Nganasan Audio
Dictionary and the Russian-Nenets Online Multimedia Phrase-book.
Paper ID: 122
Type: LP
Title: Spanish Natural Language Interface for a Relational Database Querying System
Contact author: Rodolfo A. Pazos R. and Alexander Gelbukh and J. Javier González B. and Erika Alarcón R. and Alejandro Mendoza M. and A. Patricia Domínguez S.
Topic: Text - other
Abstract: The fast growth of the Internet is creating a society in which the demand for information storage, organization, access, and analysis services is continuously growing. This constantly increases the number of inexperienced users who need to access databases in a simple way. Together with the emergence of voice interfaces, this situation foretells a promising future for database query systems with natural language interfaces. We describe the architecture of a relational database querying system with a natural language (Spanish) interface and briefly explain the implementation of each of its constituent modules: lexical parser, syntax checker, and semantic analyzer.
Paper ID: 128
Type: SP
Title: An Analysis of Conditional Responses in dialogue
Contact author: Elena Karagjosova and Ivana Kruijff-Korbayová
Topic: Speech - other
Abstract: In the context of collaborative dialogue, we analyze
conditional responses of the form "Not (if)
c / Yes if c" in reply to a question under discussion "q". A
conditional response is used when the validity of "q"
depends on a condition "c": when "c" is established in the context,
the response indicates a possible need to revise "c", and thus
opens negotiation; otherwise, the response raises the question whether "c".
We discuss appropriateness conditions for conditional responses, and propose a
uniform approach to their generation and interpretation.