Summary
The STAPTk software was developed in the Speech and Hearing group at the Department of Computer Science, University of Sheffield. It is the culmination of nine years of research work by Athanassios Hatzis, spanning his MSc, his PhD, and two funded research projects: STARDUST (NHS-NEAT, 275K€, UK) and OLP (FP5 Quality of Life, 2.67M€, EC).
In those days Athanassios was a central figure in the department; he wrote most of the code for STARDUST and OLP and also acted as the initiator and group leader of software development for the two funded projects.
STAPTk in OLP
Background
One of the main components of the OLP system is the final release of STAPTk, named OPTACIA. OPTACIA was first published at the WISP 2001 conference and is strongly based on the software that Athanassios Hatzis developed during his PhD thesis, as well as the software developed during the STARDUST project. This final version is linked to the other components of the OLP system, which execute OPTACIA through a command-line interface.
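The actual command-line interface is internal to the OLP system and is not documented on this page. Purely as a sketch, the snippet below shows how another OLP component might launch OPTACIA as a subprocess; the executable name and flags (optacia, --map, --client, --session) are hypothetical placeholders, not the real interface.

    import subprocess

    # Hypothetical launch of OPTACIA from another OLP component.
    # Executable name and flags are illustrative placeholders only.
    def launch_optacia(map_file, client_id, session_dir):
        cmd = [
            "optacia",                 # assumed executable name
            "--map", map_file,         # pre-trained kinematic map to load
            "--client", client_id,     # client profile
            "--session", session_dir,  # where recordings and scores are written
        ]
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            raise RuntimeError(f"OPTACIA failed: {result.stderr}")
        return result.stdout

    # Example: launch_optacia("vowels.map", "client01", "sessions/2003-05-12")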
OPTACIA Kinematic Maps
An OPTACIA kinematic map uses a two-dimensional ANN-trained map as a visual metaphor for isolated sounds and simple utterances. We describe the map as “kinematic” (i.e. relating to movement) because its visualization technique correlates strongly with articulatory movement during speech production. Visualization of speech sounds is instantaneous, so the client can vary the articulators in response to on-screen visual feedback. Speech therapy is based on evaluating the quality of speech production and its accompanying characteristics, i.e. consistency and intelligibility, and OPTACIA is meant to assist speech therapists in that role.
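The actual maps were trained with the group's own ANN software (see the Credit section); the training details are in the cited papers. As an illustrative sketch only, the code below trains a small self-organizing map on acoustic feature vectors and projects incoming frames to 2D screen coordinates, which gives the same kind of continuous acoustic-to-position mapping described above. The SOM and the choice of features are assumptions for the sketch, not the OPTACIA implementation.

    import numpy as np

    # Illustrative self-organizing map: acoustic frames -> 2D positions.
    # A stand-in for the ANN-trained kinematic maps used in OPTACIA.
    class TinySOM:
        def __init__(self, grid=(10, 10), dim=12, seed=0):
            rng = np.random.default_rng(seed)
            self.grid = grid
            self.w = rng.normal(size=(grid[0], grid[1], dim))  # codebook vectors

        def _bmu(self, x):
            # Best-matching unit: grid cell whose codebook vector is closest to x.
            d = np.linalg.norm(self.w - x, axis=2)
            return np.unravel_index(np.argmin(d), self.grid)

        def train(self, frames, epochs=20, lr=0.5, radius=3.0):
            rows, cols = np.indices(self.grid)
            for e in range(epochs):
                a = lr * (1 - e / epochs)                 # decaying learning rate
                r = max(radius * (1 - e / epochs), 1.0)   # shrinking neighbourhood
                for x in frames:
                    bi, bj = self._bmu(x)
                    h = np.exp(-((rows - bi) ** 2 + (cols - bj) ** 2) / (2 * r * r))
                    self.w += a * h[..., None] * (x - self.w)

        def project(self, x):
            # Map one acoustic frame to normalised (x, y) screen coordinates.
            bi, bj = self._bmu(x)
            return bj / (self.grid[1] - 1), bi / (self.grid[0] - 1)

    # frames = np.stack([...])        # e.g. one feature vector per speech frame
    # som = TinySOM(); som.train(frames)
    # x, y = som.project(frames[0])   # sprite position for one frame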
Publication
This paper concentrates on therapy based on real-time audio-visual feedback of the client’s speech and presents results from building and testing visual maps for training hearing-impaired clients using the OPTACIA component. OPTACIA was developed from Optical Logo-Therapy, OLT (Hatzis, 1999; Hatzis and Green, 2001). It is based on three basic, well-founded treatment principles: visuomotor tracking, visual contrast, and visual reinforcement. Visuomotor tracking (Ziegler, Vogel, Teiwes, and Ahrndt, 1997) is a special case of biofeedback in which some dynamic physical measure of performance is portrayed visually in real time.
Demonstration
- Real-time audio-visual animation with a sprite on the map
- Real-time audio-visual animation and waveform display in WaveSurfer
Visual Feedback Main Characteristics
Speech therapy requirements translate into visual feedback requirements; the computer-based speech training aid, in our case OPTACIA, therefore has to demonstrate visual consistency (phonetic map areas and speech trajectories), visual contrast between the target areas of the map, and visual accuracy in the production of speech. In OPTACIA we achieved this with target markers; each target marker has an area associated with it that is used for scoring purposes. This functionality makes target markers very useful, because speech therapists can set them to fine-tune an OPTACIA map with intermediate targets and to establish short-term and long-term speech production targets for patients with various speech disorders.
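The scoring rules themselves are not spelled out on this page. As a minimal sketch, assuming a target marker is a point on the map with an acceptance radius set by the therapist, the code below scores a mapped speech trajectory by its distance from the target and returns an accept/reject decision; the threshold and field names are illustrative assumptions.

    import numpy as np

    # Minimal sketch of target-marker scoring on a 2D kinematic map.
    # A target is a map position plus an acceptance radius chosen by the therapist.
    def score_against_target(trajectory, target, radius):
        """trajectory: (N, 2) mapped frame positions; target: (x, y); radius: float."""
        traj = np.asarray(trajectory, dtype=float)
        dists = np.linalg.norm(traj - np.asarray(target, dtype=float), axis=1)
        mean_dist = float(dists.mean())               # distance-from-target metric
        hit_ratio = float((dists <= radius).mean())   # fraction of frames inside the target area
        accepted = hit_ratio >= 0.5                   # illustrative acceptance criterion
        return {"mean_distance": mean_dist, "hit_ratio": hit_ratio, "accepted": accepted}

    # Example: score_against_target([(0.4, 0.5), (0.45, 0.52)], target=(0.5, 0.5), radius=0.1)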
OPTACIA Main Tasks
- Record speech data
- Transcribe speech data
- Build map
  - Select speech data
  - Design layout
  - Train map
- Explore layout of the map
- Monitor/Measure performance on the map
Future Directions and Problems to solve
Researchers interested in the technique developed in OPTACIA should consider the following issues, which have been tackled in the present software:
- Acoustic to articulation mapping
- Synchronizing real-time audio-visual playback and recording
- Definition and statistical modeling of speech targets
- Automation of segmentation/labelling problem
- Automation of training maps
- The silence/speech detection problem (a minimal sketch follows this list)
- Definition of metrics
  - Distance from target
  - Acceptance/Rejection
  - Consistency
  - Intelligibility
- Visualization of metrics
- Mapping/Visualisation of unseen speech input
- Mapping/Visualisation of non-continuous sounds
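For the silence/speech detection problem, STAPTk used the MGR endpointer listed under Technical Specifications; its algorithm is not reproduced here. The sketch below is only a minimal energy-based illustration of the problem: frames whose short-term energy rises sufficiently above an estimated noise floor are flagged as speech. The frame size, margin, and percentile-based noise floor are arbitrary assumptions, not the MGR parameters.

    import numpy as np

    # Minimal energy-based speech/silence detection; illustrative only,
    # not the MGR endpointer used in STAPTk.
    def detect_speech(samples, rate, frame_ms=20, margin_db=10.0):
        frame_len = int(rate * frame_ms / 1000)
        n = len(samples) // frame_len
        frames = np.reshape(samples[:n * frame_len], (n, frame_len)).astype(float)
        energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-10)
        noise_floor = np.percentile(energy_db, 10)       # assume quietest 10% is background
        is_speech = energy_db > noise_floor + margin_db  # frames well above the floor
        return is_speech                                 # one boolean per frame

    # Example: flags = detect_speech(signal, rate=16000)
    # The first and last True frames approximate the utterance endpoints.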
Technical Specifications
STAPTk is based on open-source software, but unfortunately the University of Sheffield decided not to make it public.
- Front-end GUI in Tcl/Tk and incr-Tcl/incr-Tk
- Back-end speech and graphics processing in C
- Speech processing with Snack and WaveSurfer (Kåre Sjölander and Jonas Beskow, 2003)
- Speech detection algorithms: MGR endpointer (Bruce T. Lowerre, public domain, 1995)
- Graphics processing with MATLAB Run-Time Libraries
Credit
Credit is given to the following people who contributed to the development of STAPTk.
Domain expertise contributors
- Phil Green (Speech Technology)
- Mark Hawley (Assistive Technology)
- Pam Enderby (Speech Pathology)
- Sara Howard (Phonetics)
- Rebecca Palmer (Speech Therapy)
- Mark Parker (Speech Therapy)
- Kate Woods (Speech Therapy)
- David House (Speech Phonology)
- Anne-Marie Öster (Speech Pathology)
Code contributors
- Vincent Wan (ANN training)
- James Carmichael (GUIs, STARDUST)
- Stuart Cunningham (STARDUST command sequence recognition, speech training)
- Kåre Sjölander (synchronization of real-time audio-visual feedback with Snack/WaveSurfer)
Publications
The paper titled “An Integrated Toolkit Deploying Speech Technology for Computer Based Speech Training with Application to Dysarthric Speakers” summarizes the development and application of STAPTk.
Abstract
Computer-based speech training systems aim to provide the client with customised tools for improving articulation based on audio-visual stimuli and feedback. They require the integration of various components of speech technology, such as speech recognition and transcription tools, and a database management system which supports multiple on-the-fly configurations of the speech training application. This paper describes the requirements and development of STAPTk from the point of view of developers, clinicians, and clients in the domain of speech training for severely dysarthric speakers. Preliminary results from an extended field trial are presented.
Presentation
Eurospeech 2003 – STAPTk (ppsx)