XML Applications

W3C Voice Browser Activity

Standards for Voice and Dialogue applications
- VoiceXML
- SRGS
- SISR
- SSML
- PLS
- Call Control XML
- State Chart XML
- …
W3C Recommendations

VoiceXML

Language for developing dialogue applications.
Specification
Primarily targeted at telephone applications:
- telephone support automation
- railway/bus timetable information
- ticket reservation
- …
Describes the algorithm for dialogue flow control (the dialogue strategy).
Alternatively, the dialogue flow can be described by a finite state automaton with output (Mealy automaton):
- SCXML
W3C standard (current version 2.1; version 3.0 is a Working Draft)

VoiceXML - processing

An application has to run on a VoiceXML platform or in a VoiceXML interpreter:
- desktop platforms - OptimTalk, publicVoiceXML, JVoiceXML, …
- open-source on-line - Asterisk+VoiceGlue, Asterisk+OpenVXI, …
- on-line commercial:
  - Bevocal Cafe
  - Voxeo Prophecy
  - …
- VoiceXML forms in XHTML documents
  - using namespaces (formerly the W3C submission XHTML+Voice Profile 1.0)
  - supported in the Opera and Firefox web browsers
- …
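The namespace approach looks roughly like the following minimal XHTML+Voice-style page. It embeds a VoiceXML form in an XHTML document and starts the form when the page loads. This is a sketch that follows the conventions of the X+V 1.0 submission: the `vxml` namespace prefix plus the XML Events attributes `ev:event` and `ev:handler`. The form id `greeting` is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>Voice greeting</title>
    <!-- VoiceXML form embedded through the vxml namespace (id is hypothetical) -->
    <vxml:form id="greeting">
      <vxml:block>Welcome to FI pizzeria</vxml:block>
    </vxml:form>
  </head>
  <!-- XML Events: run the voice form when the page has loaded -->
  <body ev:event="load" ev:handler="#greeting">
    <p>Welcome to FI pizzeria</p>
  </body>
</html>
```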

VoiceXML - example

Figure: VoiceXML example

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
   <form id="pizza-mixed">
      <grammar src="pizza.grxml"/>
      <initial name="pizzaall">
       <prompt>Welcome to FI pizzeria</prompt>
       <nomatch count="2"><assign name="pizzaall" expr="true"/></nomatch>
       <noinput count="2"><assign name="pizzaall" expr="true"/></noinput>
      </initial>
      <field name="kind">
       <prompt>What kind of pizza do you want?</prompt>
       <nomatch>We have salami, mozzarella and apollo pizza.</nomatch>
       <noinput>We have salami, mozzarella and apollo pizza.</noinput>
       <grammar src="pizza.grxml#kind"/>
      </field>
      <field name="topping">
       <prompt>What topping do you want?</prompt>
       <nomatch>We offer ketchup and chilli.</nomatch>
       <noinput>We offer ketchup and chilli.</noinput>
       <grammar src="pizza.grxml#topping"/>
      </field>
      <field name="drink">
       <prompt>What do you want to drink?</prompt>
       <nomatch>Select one of coke, sprite and water</nomatch>
       <noinput>Select one of coke, sprite and water</noinput>
       <grammar src="pizza.grxml#drink"/>
      </field>
      <field name="ack">
       <prompt>Did you order <value expr="kind"/> pizza with <value
       expr="topping"/> and <value expr="drink"/>?</prompt>
       <grammar src="yesno.grxml"/>
      </field>
      <filled>
       <if cond="ack=='yes'">
            <prompt>Order submitted</prompt>
       <else/>
            <clear namelist="kind topping drink ack"/>
       </if>
      </filled>
   </form>
</vxml>
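In the example, the `Order submitted` prompt only announces the result. A real application would usually send the collected values to a server. The sketch below rewrites the `if` branch using the standard VoiceXML `<submit>` element. The URL `http://example.com/order` is a hypothetical placeholder for an order-processing service.

```xml
<if cond="ack=='yes'">
     <prompt>Order submitted</prompt>
     <!-- hypothetical server URL; posts the three filled field values -->
     <submit next="http://example.com/order" method="post"
             namelist="kind topping drink"/>
<else/>
     <!-- order not confirmed: clear the fields so the form asks again -->
     <clear namelist="kind topping drink ack"/>
</if>
```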

SRGS (Speech Recognition Grammar Specification)

Standard for describing context-free grammars.
- describes the inputs accepted by particular VoiceXML fields
Specification
Part of W3C Voice Browser Activity standards
Current version 1.0

SRGS - motivation

- The user's voice input has to be recognized - continuous speech recognition.
- success rate 50-99 %
Ways to improve the success rate:
- improve the language model
- restrict the problem domain
- improve the user model
Problem domain restriction + language model improvement = SRGS.

SRGS - example

Figure: SRGS grammar referenced in the previous VoiceXML example (pizza.grxml)

<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" root="mixed" tag-format="semantics/1.0">
<rule id="mixed">
   <item>
      <ruleref special="GARBAGE"/>
      <ruleref uri="#kind"/> pizza <ruleref special="GARBAGE"/>
      <ruleref uri="#topping"/> and <ruleref uri="#drink"/>
   </item>
   <tag>
   {
    out.kind=rules.kind;
    out.topping=rules.topping;
    out.drink=rules.drink;
   }
   </tag>
</rule>
<rule id="kind" scope="public">
   <one-of>
    <item>salami</item>
    <item>mozzarella</item>
    <item>apollo</item>
   </one-of>
</rule>
<!-- ... (rules "topping" and "drink") -->
</grammar>
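The original omits the `topping` and `drink` rules. The sketch below shows what they might look like, assuming the options listed in the VoiceXML nomatch prompts: ketchup and chilli for toppings; coke, sprite and water for drinks. The rules are marked `scope="public"` because the VoiceXML fields reference them directly (`pizza.grxml#topping`, `pizza.grxml#drink`). In SRGS, rules are private by default, and a private rule cannot be referenced from outside its grammar.

```xml
<!-- hypothetical reconstruction based on the options named in the VoiceXML prompts -->
<rule id="topping" scope="public">
   <one-of>
    <item>ketchup</item>
    <item>chilli</item>
   </one-of>
</rule>
<rule id="drink" scope="public">
   <one-of>
    <item>coke</item>
    <item>sprite</item>
    <item>water</item>
   </one-of>
</rule>
```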

SISR (Semantic Interpretation for Speech Recognition)

Purpose:
- What is the meaning of recognized input?
Language for deriving the semantics (meaning) of recognized input.
Based on ECMAScript.
Used in speech recognition grammars (see the SRGS example above).
SISR 1.0 Specification
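The `yesno.grxml` grammar referenced by the `ack` field of the VoiceXML example is not shown. The minimal sketch below illustrates how SISR tags normalize several spoken variants to one value, assuming the VoiceXML code tests for the string `'yes'`. Each `<tag>` assigns to `out`, and that value becomes the semantic result returned to the field. The extra words (`yeah`, `correct`, `nope`, `wrong`) are illustrative choices.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" root="yesno" tag-format="semantics/1.0">
  <rule id="yesno" scope="public">
    <one-of>
      <!-- every affirmative answer yields the value "yes" -->
      <item>
        <one-of>
          <item>yes</item>
          <item>yeah</item>
          <item>correct</item>
        </one-of>
        <tag>out = "yes";</tag>
      </item>
      <!-- every negative answer yields the value "no" -->
      <item>
        <one-of>
          <item>no</item>
          <item>nope</item>
          <item>wrong</item>
        </one-of>
        <tag>out = "no";</tag>
      </item>
    </one-of>
  </rule>
</grammar>
```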

SSML (Speech Synthesis Markup Language)

W3C Standard
current version 1.1 (September 2010)
Used to control the characteristics of synthesized speech:
- loudness
- pitch and intonation
- emphasis
- speech rate
- voice kind (male, female, neutral)
- …
Provides markup for specifying the pronunciation of foreign words.
- IPA (International Phonetic Alphabet) can be used.

SSML - example of loudness and breaks

Figure: SSML Breaks and loudness control example

<?xml version="1.0" encoding="utf-8"?>
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
         http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
      xml:lang="cs-CZ">
   <!-- "Good morning." spoken loudly, followed by a pause -->
   <prosody volume="loud">
      Dobré ráno.<break/>
   </prosody>
   <!-- "How are you?" at the default volume -->
   <prosody volume="default">
      Jak se máte?
   </prosody>
</speak>

SSML - example of intonation modeling

Figure: SSML Intonation modeling

<speak ...>
   <!-- rising pitch towards the end of the question "Are you doing well?" -->
   <prosody contour="(0%,50Hz) (75%,+10%) (80%,+20%) (90%,+30%)">
   Máš se dobře?
   </prosody>
</speak>

PLS (Pronunciation Lexicon Specification)

Pronunciation Lexicon Specification
- W3C standard
- Current version: 1.0, October 2008
Developed for describing the pronunciation of words, abbreviations, etc.
Used for:
- Speech synthesis (SSML) - pronunciation of
  - foreign words
  - abbreviations
  - numeric values
  - …
- Speech recognition (SRGS) - PLS can describe alternative pronunciations of words so that they are recognized correctly.
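Both SSML and SRGS reference a PLS document through a `lexicon` element. The sketch below shows each case. In SSML 1.1, the lexicon is declared with an `xml:id` and activated with `<lookup>`. In SRGS, the `lexicon` element must appear before the rules. The file name `cs-abbreviations.pls` is hypothetical and stands for a lexicon like the one in the PLS example.

```xml
<!-- SSML 1.1: declare the lexicon, then activate it with <lookup> -->
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="cs-CZ">
  <lexicon xml:id="abbr" uri="cs-abbreviations.pls"/>
  <lookup ref="abbr">CSR</lookup>
</speak>

<!-- SRGS: the lexicon element precedes all rules -->
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="cs-CZ" root="abbr">
  <lexicon uri="cs-abbreviations.pls"/>
  <rule id="abbr">
    <item>CSR</item>
  </rule>
</grammar>
```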

PLS Structure

Root element - lexicon
- contains one or more lexicon entries - lexeme elements
  - each lexeme contains:
    - one or more word spellings - grapheme elements
    - one or more word pronunciations - phoneme elements
    - the pronunciation may be written using IPA, SAMPA, etc.

PLS - example

Figure: PLS pronunciation example

<?xml version="1.0" encoding="utf-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
      http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
    alphabet="ipa" xml:lang="cs-CZ">
   <lexeme>
   <grapheme>CSR</grapheme>
   <phoneme>tʃˈeː ˈes ˈer</phoneme>
   <phoneme>tʃˈeskaː rˈepublˌika</phoneme>
   </lexeme>
</lexicon>
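Besides `phoneme`, a lexeme may contain an `alias` element. An alias gives a textual replacement instead of a phonetic transcription, which makes it a convenient way to expand abbreviations. A minimal sketch, with an illustrative abbreviation:

```xml
<?xml version="1.0" encoding="utf-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <!-- spoken as the expanded text instead of letter by letter -->
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>
```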

Call Control XML

Voice Browser Call Control eXtensible Markup Language (CCXML)
Provides declarative markup to describe telephony call control:
- routing calls to the appropriate application or human operator
- merging multiple calls into a conference call
- placing outgoing calls
- handling a richer class of asynchronous events
- handling call queues outside of VoiceXML
- etc.
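A minimal sketch of CCXML call handling, assuming a CCXML 1.0 platform, is shown below. The document accepts an incoming call, hands the call to a VoiceXML dialog, and hangs up when the dialog ends. The file name `pizza.vxml` is a hypothetical name for the VoiceXML example above. The `src` and `connectionid` attributes are ECMAScript expressions, which is why the file name carries inner quotes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0" xmlns="http://www.w3.org/2002/09/ccxml">
  <eventprocessor>
    <!-- incoming call is ringing: answer it -->
    <transition event="connection.alerting">
      <accept connectionid="event$.connectionid"/>
    </transition>
    <!-- call connected: start the VoiceXML dialog on this connection -->
    <transition event="connection.connected">
      <dialogstart src="'pizza.vxml'" connectionid="event$.connectionid"/>
    </transition>
    <!-- dialog finished: hang up -->
    <transition event="dialog.exit">
      <disconnect connectionid="event$.connectionid"/>
    </transition>
    <!-- call ended: terminate the CCXML session -->
    <transition event="connection.disconnected">
      <exit/>
    </transition>
  </eventprocessor>
</ccxml>
```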

State Chart XML

W3C Recommendation (September 2015).
General-purpose event-based state machine language.
Based on:
- CCXML
- Harel State Tables (statecharts, used for example in UML)

State Chart XML - Relation to Dialogue

A dialogue can be modeled as a Mealy automaton.
- Mealy automaton - finite state automaton with an output function.
- States of the automaton correspond to the states of the dialogue.
- Transitions are a function of the current state and the user input.
- The output function gives the dialogue system's response.
A Mealy automaton can be described in SCXML (see the example below).
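The sketch below models the pizza dialogue from the VoiceXML example as a Mealy automaton in SCXML. The event names (`user.kind`, `user.topping`, `user.drink`, `user.yes`, `user.no`, `user.nomatch`) are hypothetical events that a recognizer would raise. The `<log>` element stands in for the system output; a real system would send the prompt to a speech synthesizer instead. Because the output is attached to transitions rather than to states, the machine is a Mealy automaton.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<scxml version="1.0" xmlns="http://www.w3.org/2005/07/scxml"
       datamodel="ecmascript" initial="Dialogue">
  <state id="Dialogue">
    <!-- initial transition outputs the first question -->
    <initial>
      <transition target="AskKind">
        <log label="prompt" expr="'What kind of pizza do you want?'"/>
      </transition>
    </initial>
    <state id="AskKind">
      <!-- input: pizza kind / output: next question -->
      <transition event="user.kind" target="AskTopping">
        <log label="prompt" expr="'What topping do you want?'"/>
      </transition>
    </state>
    <state id="AskTopping">
      <transition event="user.topping" target="AskDrink">
        <log label="prompt" expr="'What do you want to drink?'"/>
      </transition>
      <!-- unrecognized input: stay in the state, list the options -->
      <transition event="user.nomatch" target="AskTopping">
        <log label="prompt" expr="'We offer ketchup and chilli.'"/>
      </transition>
    </state>
    <state id="AskDrink">
      <transition event="user.drink" target="Confirm">
        <log label="prompt" expr="'Is your order correct?'"/>
      </transition>
    </state>
    <state id="Confirm">
      <transition event="user.yes" target="Done">
        <log label="prompt" expr="'Order submitted'"/>
      </transition>
      <!-- order rejected: start over -->
      <transition event="user.no" target="AskKind">
        <log label="prompt" expr="'What kind of pizza do you want?'"/>
      </transition>
    </state>
  </state>
  <final id="Done"/>
</scxml>
```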

SCXML - Demo

Example 1: Process planning demo

Figure: Process state diagram

Example 1: Corresponding SCXML

<?xml version="1.0" encoding="UTF-8"?>
<!-- the initial state is set with the initial attribute; <initial> is only allowed inside compound states -->
<scxml version="1.0" xmlns="http://www.w3.org/2005/07/scxml" initial="Created">
 <state id="Created">
  <transition target="Waiting" event="enqueue"/>
 </state>
 <state id="Waiting">
  <transition target="Running" event="assign"/>
 </state>
 <state id="Running">
  <!-- event is a space-separated list of descriptors, so event names must not contain spaces -->
  <transition target="Blocked" event="wait.for.resource"/>
  <transition target="Waiting" event="timeout"/>
  <transition target="Terminated" event="terminate"/>
 </state>
 <state id="Blocked">
  <transition target="Waiting" event="resource.available"/>
 </state>
 <final id="Terminated"/>
</scxml>