CSLU Seminar Information


During the Spring 2012 term, the seminar will meet on Mondays at 11am.

E-mail Eric Morley (morleye [at] gmail dot com) to reserve a seminar date.

Upcoming Seminar

Time: Monday, May 14, 11am, PC401

Speakers: Emily Tucker Prud'hommeaux and Maider Lehr (joint work with Izhak Shafran and Brian Roark)

Title: Fully Automated Neuropsychological Assessment for Detecting Mild Cognitive Impairment

Abstract: In this paper we present an end-to-end system for automatically analyzing spoken responses to a narrative recall test commonly administered to seniors as part of clinical neuropsychological assessment. In this test, a patient listens to a brief narrative, immediately retells it, then retells it again later in the session, after some time has elapsed.
ASR transcripts of retellings are automatically aligned to the source narrative, and features are extracted that replicate the published clinical scoring method, which are then used for automatic assessment using a classifier.
On a test corpus of 72 subjects, we empirically evaluate different ASR adaptation strategies and analyze the errors and their relationship to clinical assessment accuracy. Despite imperfect recognition, the system presented here yields classification accuracy comparable to that of scores derived from manual transcripts. This work advances on prior work on automated assessment that relied on manual transcripts for automated scoring.

Related Info and Links

Other CSLU Events

  • Lunch-n-Leisure: Enjoy lunch with your colleagues, every school day at 12:30 in the lab/lounge.



Date | Speaker | Title | Notes

Spring 2012

Monday, 06/11/2012 | Eric Morley | |
Monday, 06/04/2012 | Brian Snider | |
Monday, 05/28/2012 | | |
Monday, 05/21/2012 | John Hale | Entropy Reduction as a psycholinguistic complexity metric in Asian Languages |
Monday, 05/14/2012 | Emily T. Prud'hommeaux and Maider Lehr | Fully Automated Neuropsychological Assessment for Detecting Mild Cognitive Impairment |
Monday, 05/07/2012 | Courtney Stevens | Imaging the Reading Brain | 10am
Monday, 04/30/2012 | Kirstin Aschbacher | Do Psychological States Really Affect Your Health? Dynamic Models of Stress-Arousal Systems Provide Novel Insights |
Monday, 04/23/2012 | Susan Kemper | Aging and Vulnerability to Dual Task Demand | Paul Kirk Conference center, 3rd floor Center for Health and Healing (CHH), Rm 4 (3070), 9am
Monday, 04/16/2012 | Géza Kiss | Quantitative Analysis of Prosody in ASD and DLD |
Monday, 04/09/2012 | Margaret Mitchell and Richard Sproat | Discourse-Based Modeling for Augmentative and Alternative Communication | 10am
Monday, 04/02/2012 | | |

Winter 2012

Tuesday, 03/20/2012 | Géza Kiss | Quantitative Analysis of Prosody in Conversational Speech in Autism Spectrum Disorders and in Developmental Language Disorders |
Friday, 03/16/2012 | Qi Miao | Perceptual Cost Function for Cross-fading based Concatenation | 10am
Tuesday, 03/13/2012 | Daryush Mehta | Parametric speech production representations for formant tracking and joint source-filter modeling |
Wednesday, 03/07/2012 | Rebecca Lunsford | Toward Improving Dialogue Coordination in Spoken Dialogue Systems | 10am
Tuesday, 03/06/2012 | | |
Tuesday, 02/28/2012 | Russell Beckley | Synchronous and Asynchronous methods for Binary Typing Using Language Models, Huffman Encoding, and Morse-like code |
Tuesday, 02/21/2012 | Meg Mitchell | Generating Descriptions of Visible Objects |
Tuesday, 02/14/2012 | Emily Tucker Prud'hommeaux | Automatic alignment of narrative retellings for diagnostic classification |
Tuesday, 02/07/2012 | | |
Tuesday, 01/31/2012 | Meysam Asgari | Robust Voiced-Unvoiced Decision Using Likelihood Ratio Test and Harmonic Model in Noisy Environments | 2pm
Tuesday, 01/24/2012 | Andrew Fowler | Towards Technology-Assisted Co-Construction with Communication Partners |
Tuesday, 01/17/2012 | Ranjani Ramakrishnan | Haplotype blocks: A multi-variate statistical definition and an algorithm for identification |
Tuesday, 01/10/2012 | Florian Metze | Automatically Assessing Personality from Speech | 10am


2011 Seminars

2010 Seminars


Seminar Abstracts

Speaker: John Hale
Title: Entropy Reduction as a psycholinguistic complexity metric in Asian Languages
Abstract:

Speaker: Emily Tucker Prud'hommeaux and Maider Lehr (joint work with Izhak Shafran and Brian Roark)
Title: Fully Automated Neuropsychological Assessment for Detecting Mild Cognitive Impairment
Abstract: In this paper we present an end-to-end system for automatically analyzing spoken responses to a narrative recall test commonly administered to seniors as part of clinical neuropsychological assessment. In this test, a patient listens to a brief narrative, immediately retells it, then retells it again later in the session, after some time has elapsed.
ASR transcripts of retellings are automatically aligned to the source narrative, and features are extracted that replicate the published clinical scoring method, which are then used for automatic assessment using a classifier.
On a test corpus of 72 subjects, we empirically evaluate different ASR adaptation strategies and analyze the errors and their relationship to clinical assessment accuracy. Despite imperfect recognition, the system presented here yields classification accuracy comparable to that of scores derived from manual transcripts. This work advances on prior work on automated assessment that relied on manual transcripts for automated scoring.

Speaker: Courtney Stevens & Madison Niermeyer
Title: Imaging the Reading Brain
Abstract: Reading is a form of perceptual expertise demanded in most cultures. However, as reading is a relatively recent cultural invention, the brain is not innately specified for this task. Thus, over the course of literacy acquisition, the brain must develop specialized processing systems to support the fast, accurate identification of written symbol strings. In this talk, we will provide an overview of the key questions in studies of the neurobiology of literacy learning. Time permitting, we will discuss the findings from a subset of recent studies examining (1) the emergence of the 'reading brain' in kindergarten children on-track and at-risk for reading failure, (2) the neural systems recruited for processing more elementary units of alphabetic scripts (e.g., single letters), and (3) differences in neural systems recruited when processing logographic scripts (Japanese Kanji).

Speaker: Kirstin Aschbacher
Title: Do Psychological States Really Affect Your Health? Dynamic Models of Stress-Arousal Systems Provide Novel Insights
Abstract: Popular culture increasingly champions the belief that psychological stress and negative emotions trigger or contribute to disease. This belief has fueled myriad stress-reduction and wellness treatments that advertise health benefits, while the reality of the scientific evidence is that the exact pathways and mechanisms for stress-health relationships are poorly understood. One key barrier to understanding is that even though physiological stress responses are highly dynamic and feedback-regulated, they are not typically modeled as such by stress researchers. In this talk, I will provide several empirical examples of how dynamic systems models and theoretical frameworks can be applied to the analysis of patient data in order to reveal novel insights about disease processes. I focus on two of the primary stress-arousal systems that mediate relationships between psychological experience and immune, metabolic, and cardiovascular system activity - the Hypothalamic-Pituitary-Adrenal (HPA) axis and the Autonomic Nervous System (ANS). This technique derives a “personalized systems model” by estimating a set of unique model parameters for each individual. Based on robustness theory, I propose an intermediate phenotype of stress-arousal system behavior likely to increase vulnerability to certain types of disease. This hypothesis is then tested using HPA data from one sample of individuals with fibromyalgia and/or chronic fatigue syndrome, and a second sample of obese individuals. The theoretical framework is then extended to understanding ANS responses to stress in relation to psychological and biological indicators of health status. In conclusion, dynamic systems approaches provide novel theoretical and quantitative tools, which may help researchers better define the complex relations between stress arousal systems and symptoms of mental and physical health.

Speaker: Susan Kemper
Title: Aging and Vulnerability to Dual Task Demand
Abstract: A digital pursuit rotor was used to monitor speech planning and production costs by examining dual-task costs to pursuit rotor tracking. In one version of the task, the auditory waveform produced as young and older adults were describing someone they admire was time-locked to the tracking task. The speech sample and time-locked tracking record were segmented at utterance boundaries and multilevel modeling was used to determine how utterance-level predictors such as utterance duration or sentence grammatical complexity and person-level predictors such as speaker age or working memory capacity predicted tracking performance. Three models evaluated the costs of speech planning, the costs of speech production, and the costs of speech output monitoring. The results suggest that planning and producing propositionally dense utterances are more costly for older adults and that older adults experience increased costs as a result of having produced a long, informative, or rapid utterance. In a second version of the task, a controlled sentence production task was combined with digital pursuit rotor tracking. Participants had to plan and produce a sentence using provided nouns and verbs. Properties of the nouns and verbs were manipulated. The analysis indicated that sentence planning was more costly than sentence production, and planning and producing utterances with long noun phrases were especially difficult for older adults. Tracking dual task demands thus reveals how aging affects both sentence planning and production.

Speaker: Géza Kiss
Title: Quantitative Analysis of Prosody in ASD and DLD
Abstract: The diagnosis of Autism Spectrum Disorders (ASD) is labor intensive and requires highly trained professionals. Automated analysis of conversational speech could potentially aid in providing more broadly accessible means for identifying high-risk individuals. This seminar will be complementary to my previous one: This time I am going to show numeric results and plots from my experiments, comparing pitch features of children with ASD, DLD (Developmental Language Disorder), and TD (Typical Development).


Speaker: Margaret Mitchell and Richard Sproat
Title: Discourse-Based Modeling for Augmentative and Alternative Communication
Abstract: This paper presents a method for an Augmentative and Alternative Communication (AAC) system to predict a whole response given features of the previous utterance from the interlocutor. It uses a large corpus of scripted dialogs, computes a variety of lexical, syntactic and whole phrase features for the previous utterance, and predicts features that the response should have, using an entropy-based measure. We evaluate the system on a held-out portion of the corpus. We find that for about 3.5% of cases in the held-out corpus, we are able to predict a response, and among those, over half are either exact or at least reasonable substitutes for the actual response. We also present some results on keystroke savings. Finally we compare our approach to a state-of-the-art chatbot, and show (not surprisingly) that a system like ours, tuned for a particular style of conversation, outperforms one that is not.
Predicting possible responses automatically by mining a corpus of dialogues is a novel contribution to the literature on whole utterance-based methods in AAC. Also useful, we believe, is our estimate that about 3.5-4.0% of utterances in dialogs are in principle predictable given previous context.

Speaker: Géza Kiss
Title: Quantitative Analysis of Prosody in Conversational Speech in Autism Spectrum Disorders and in Developmental Language Disorders
Abstract: The diagnosis of Autism Spectrum Disorders (ASD) is labor intensive and requires highly trained professionals. Automated analysis of conversational speech could potentially aid in providing more broadly accessible means for identifying high-risk individuals. Although speech prosody is often atypical in ASD, prosody plays only a minimal role in diagnostics, possibly because of reliability issues. Our goal was to identify quantitative prosodic features that can reliably differentiate children with Typical Development (TD), Developmental Language Disorder (DLD), and ASD, using ADOS recordings. We matched the subjects on relevant measures, evaluated global and per-utterance pitch statistics, and found that several prosodic features discriminated significantly (at p<0.05) between groups, with correct classification rates well above chance-level. We also analyzed the spectral content using LTAS (Long Term Average Spectrum; pitch-normalized).

Speaker: Qi Miao
Title: Perceptual Cost Function for Cross-fading based Concatenation
Abstract: Concatenative synthesis is currently the most widely used Text to Speech (TTS) framework. However, it cannot guarantee that both target cost and concatenation cost are minimized at the same time. As a result, the selected units for concatenation may come from totally different phonemic and prosodic contexts, which can lead to audible discontinuities in the output speech at the concatenation points. Various speech modification methods have been studied and applied during concatenation. In most cases, they can create a locally smooth transition between two units, but the resulting speech may be far from the target. In a previous study, a linear cross-fading weight function was used to remove spectral and time domain discontinuities during concatenative speech synthesis. We learned that concatenation through a linear weighted cross-fading function can produce smooth, yet unnaturally shaped formant trajectories; in addition, we noted that the precise details of how to cross-fade a specific pair of units may be highly context-dependent.
We propose a new algorithm that uses a unit-dependent parameterized cross-fading weight function to create more natural-looking formant trajectories and, it is hoped, better-sounding output speech. The proposed algorithm uses a perceptually-based objective function to capture differences between cross-faded and natural trajectories across the whole region of the phoneme, and uses phoneme identity, prosodic contexts, and acoustic features of the units to predict optimal cross-fading parameters. This thesis reports a study on the feasibility of developing such perceptual cost functions. A special corpus was designed to produce a variety of shapes of formant frequency trajectories in different linguistic environments. A perceptual experiment was performed to determine if we could predict perceptual quality of output speech from acoustic distance measures. We generated a range of synthetic/natural stimulus pairs, where the synthetic stimuli were generated using three types of cross-fading models, applied to different regions in the vowel. The results show that the perceptual cost function can be reliably predicted from the distance measures. Moreover, the results support our hypotheses that: a) the quality of the output speech is influenced by the shape of formant trajectories across the entire region of the vowel; and b) human perceptual scores are correlated with both absolute distance and the first derivative of absolute distance of formant trajectories.
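
For readers unfamiliar with the baseline mentioned in this abstract, a linear cross-fade can be sketched roughly as follows. This is only an illustration, not the speaker's implementation: the function name `linear_crossfade` and the formant values are hypothetical, and an actual synthesizer would blend speech units or spectral parameters rather than a short list of numbers.

```python
def linear_crossfade(left, right):
    """Blend two equal-length trajectories with a linearly increasing weight.

    The weight on `right` rises from 0 at the first frame to 1 at the last,
    so the output starts on `left` and ends on `right`.
    """
    if len(left) != len(right):
        raise ValueError("trajectories must have the same number of frames")
    n = len(left)
    if n == 1:
        return [0.5 * (left[0] + right[0])]
    blended = []
    for i, (a, b) in enumerate(zip(left, right)):
        w = i / (n - 1)  # linear weight in [0, 1]
        blended.append((1.0 - w) * a + w * b)
    return blended


# Hypothetical second-formant values (Hz) over the overlap region of two units.
left_unit = [1500.0, 1500.0, 1500.0, 1500.0, 1500.0]
right_unit = [1900.0, 1900.0, 1900.0, 1900.0, 1900.0]
blended = linear_crossfade(left_unit, right_unit)
```

With these inputs the blended trajectory is a straight ramp between the two units, which is smooth but, as the abstract notes, not necessarily shaped like a natural formant transition.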

Speaker: Daryush Mehta
Title: Parametric speech production representations for formant tracking and joint source-filter modeling
Abstract: We continue to see an urgent need for robust representations of the acoustic speech waveform, especially in speakers with speech and voice impairments. First, I will discuss our approach to the problem of formant and antiformant tracking, in which extended Kalman algorithms take advantage of a linearized mapping from formant frequencies and bandwidths to cepstral coefficients in an autoregressive moving average model. Second, we will explore joint source-filter models for representing sustained vowel phonation that exhibits nonstationary or asymmetric vocal fold vibration. These algorithms hold potential clinical significance for better understanding acoustic-physiological relationships observable with current imaging systems.

Speaker: Rebecca Lunsford
Title: Toward Improving Dialogue Coordination in Spoken Dialogue Systems
Abstract: When engaged in a conversation, speakers use both verbal and non-verbal mechanisms to help coordinate the dialogue, ensuring that, at each point, the other is engaged in the dialogue, and is capable of hearing, understanding and responding to the speaker. The problem is that current Spoken Dialog Systems (SDSs) do not take full advantage of dialogue coordination mechanisms, which can lead to interactions that are unnatural and inefficient. We posit that an SDS should anticipate, recognize and potentially emulate the full richness of dialogue coordination mechanisms. In this dissertation research, we aim to further understand dialogue coordination mechanisms, and to assess how they might be used to ease human-computer interaction. We start by investigating what cues a human speaker uses to differentiate computer-directed speech from self-directed speech, and from human-directed speech, finding that in both cases speech directed to the computer is much louder. We next conduct a perceptual study to determine what cues people attend to when determining whether a speaker is addressing a computer or nearby human. Here we found that people tended to rely on the direction of the speaker's gaze, although this led to systematic errors in their judgments of addressee. We next investigate whether 'um' and 'uh' result from the same, or different cognitive processes, using human-human interaction data collected while clinicians interacted with children with typical development, autism, and developmental language disorder. Here we found that 'um' appears to be listener-oriented, and 'uh' speaker-oriented. Next, again using the data from above, we investigated what factors impact the length of inter-turn gaps, and whether there is an interaction between gaps, disfluencies and social pressure to respond.
Here we found that, after a question, speakers tend to respond more quickly, are more likely to start their speech with a disfluency, and that the likelihood of a disfluency increased with the length of the gap. Finally, we conduct a simulation study, using Reinforcement Learning, to demonstrate that dialogue policies can be created that take advantage of dialogue coordination mechanisms.

Speaker: Russell Beckley
Title: Synchronous and Asynchronous methods for Binary Typing Using Language Models, Huffman Encoding, and Morse-like code
Abstract: When the normal means of human communication are impaired, there is a strong motive to find alternative means. The field of Augmentative and Alternative Communication is aimed at studying and improving such means. When a person types by making a series of binary choices, we call this binary typing. This talk covers efforts to improve the efficiency of binary typing through the use of context dependent symbol probabilities to dynamically optimize binary codes. I will discuss efforts to make the user experience more leisurely, by introducing "asynchronous" input, i.e. a series of dots and dashes in the style of Morse code. Lastly, we add some functionality to improve the ability to repair input errors.
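
As a rough illustration of the idea of deriving a binary code from context-dependent symbol probabilities, the sketch below builds a Huffman code with Python's standard library. It is not the speaker's system: the function name `huffman_code` and the probabilities are hypothetical, and a real binary typing interface would take its probabilities from a language model and recompute the code after each selection.

```python
import heapq
from itertools import count


def huffman_code(probs):
    """Return a dict mapping each symbol to a binary code string."""
    if len(probs) == 1:
        return {next(iter(probs)): "0"}
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees, prefixing their codes.
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical next-letter probabilities in some typing context.
context_probs = {"e": 0.6, "a": 0.2, "i": 0.1, "o": 0.1}
codes = huffman_code(context_probs)

# Expected number of binary choices needed to select one letter.
expected_choices = sum(p * len(codes[s]) for s, p in context_probs.items())
```

In this toy context the most likely letter needs a single binary choice and the least likely letters need three, so the expected number of choices per letter is below the two that a fixed-length code over four letters would require.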

Speaker: Meg Mitchell
Title: Generating Descriptions of Visible Objects
Abstract: What do people describe when they look at objects? Can we model what they say? (Why does this matter?) This talk will characterize what makes up a visual description and define some of the methods necessary to automatically generate such descriptions. Taking this a bit further, I describe an end-to-end system that reads in computer vision output and generates natural language descriptions. Time permitting, I argue that improving visual descriptions can also improve computer vision, and working on the interaction between the two may lead to advances in both computer vision and natural language generation. My prototype vision-to-language system, developed in collaboration with vision researchers at Stony Brook and language researchers at U. Maryland, is available at: http://recognition.cs.stonybrook.edu:8080/~mitchema/midge/

Speaker: Emily Tucker Prud'hommeaux
Title: Automatic alignment of narrative retellings for diagnostic classification
Abstract: Many commonly used neuropsychological assessment instruments include a narrative recall task in which a subject must listen to and retell a short narrative. Poor performance on such tasks can be indicative of deficits in memory, cognition, language, and social communication, which in turn can be associated with neurological disorders such as dementia and autism. In this talk, I will outline techniques for automatically assessing the quality of narrative retellings via alignment to the source narrative. Word alignments serve both to automate standard manual scoring procedures and to derive features related to narrative fidelity and coherence that have previously been shown to distinguish between different diagnostic groups. I will show that these automatically derived word alignments yield accurate narrative recall scores and provide sufficient information to achieve robust diagnostic classification performance. In addition, I will discuss how these methods could be adapted to enhance other language evaluation technologies, including tools for automated essay scoring and assistive communication software.

Speaker: Meysam Asgari
Title: Robust Voiced-Unvoiced Decision Using Likelihood Ratio Test and Harmonic Model in Noisy Environments
Abstract: Classification of short-time speech segments into voiced and unvoiced segments is a crucial part of several speech-processing systems, such as automatic speech recognition, speech coding, and speaker diarization. Various types of algorithms have been proposed in the literature. However, they are mostly sensitive to background noise and their performance is significantly degraded at low signal-to-noise ratios (SNRs).
In this study, we propose a voicing decision method based on the Likelihood Ratio Test (LRT). We adopt a harmonic plus noise model for the modeling of voiced speech. We also employ a Bayesian approach using the prior statistical information of speech phonemes for estimating the model parameters. In addition, to improve the voicing decision, we use a two-state Hidden Markov Model (HMM), which takes the previous decision results into account through first-order Markov process modeling. Experimental results show better performance of the proposed method compared to get-f0, an algorithm employed in many popular tools (Wavesurfer, Snack, etc.).

Speaker: Andrew Fowler
Title: Towards Technology-Assisted Co-Construction with Communication Partners
Abstract: In this talk, we examine the idea of technology-assisted co-construction, where the communication partner of an AAC user can make guesses about the text of intended messages, and those guesses are included in the user's word completion/prediction interface. We run some human trials to simulate this new interface concept, with subjects predicting words as the user's intended message is being generated in real time with specified typing speeds. Our results indicate that a human communication partner can provide substantial keystroke savings by providing word completion or prediction, but that the savings are not as high as those provided by n-gram language models. Interestingly, the language model and human predictions are complementary in certain key ways - humans doing a better job in some circumstances on contextually salient nouns. We discuss implications of the enhanced co-construction interface for real-time message generation in AAC direct selection devices.

Speaker: Ranjani Ramakrishnan
Title: Haplotype blocks: A multi-variate statistical definition and an algorithm for identification
Abstract: Large volumes of data are currently generated by next-generation sequencing projects, and compression of genomic data becomes critical for analysis and storage. Reducing the dimensionality of genomic data is important in the discovery of risk factors for disease. Previous approaches for compression used linkage disequilibrium (LD), defined as the non-random association of SNPs, to partition the genome, but this approach misses higher-order statistical dependencies. We propose an algorithm for defining haplotype blocks based on conditional independence between their component single nucleotide polymorphisms (SNPs). A SNP (pronounced Snip) is a single base pair change in the DNA sequence of an individual, when compared to a reference sequence. We characterize the performance of our algorithm using simulated data generated under two independent models and examine its performance using population data from the HapMap project. We find that the algorithm has high concordance between different runs (with differing sample values). It also achieves better levels of compression compared with pair-wise tests for picking tagSNPs and has a 58% overlap with LD blocks generated under the graphical model.

Speaker: Florian Metze
Title: Automatically Assessing Personality from Speech
Abstract: In this talk, we present results on applying a personality assessment paradigm to speech input, and compare human and automatic performance on this task. We cue a professional speaker to produce speech using different personality profiles and encode the resulting vocal personality impressions in terms of the Big Five NEO-FFI personality traits. We then have human raters, who do not know the speaker, estimate the five factors. We analyze the recordings using signal-based acoustic and prosodic methods and observe high consistency between the acted personalities, the raters' assessments, and initial automatic classification results. We further validate the application of our paradigm to speech input, and extend it towards text-independent speech. We show that human labelers can consistently label speech data generated across multiple recording sessions with respect to personality, and investigate further which of the 5 scales in the NEO-FFI scheme can be assessed from speech, and how a manipulation of one scale influences the perception of another. Finally, we present a top-down clustering of human labels of personality traits derived from speech, which will be useful in future experiments on automatic classification of personality traits. This presents a first step towards being able to handle personality traits in speech, which we envision will be used in future voice-based communication between humans and machines.