CSLU Seminar Information
During the Fall 2013 term, the seminar will meet on Mondays at 10am. Please see the table below for locations.
E-mail Eric Morley (morleye [at] gmail dot com) to reserve a seminar date.
Upcoming Seminar
Date: Monday, December 16
Location: SoN 116
Speaker: Brian Bush
Title: Coarticulation Modeling of Continuous Speech
Abstract: Coarticulation is the influence of one phoneme on another. Modeling these phenomena has previously been limited to short phonetic sequences of speech. We propose a novel method for coarticulation modeling of continuous, style-independent speech, modeling the speech trajectory as a series of overlapping triphone coarticulation models. We use a global error measure to search for optimal formant targets for all phonemes, including classes of sounds whose formants are not directly observable. The estimated formant targets are largely in agreement with acoustic-phonetic expectations, and the overall model fits well with observed formant tracks. We discuss a forthcoming intelligibility experiment that will compare resynthesized words using modeled formant trajectories versus observed formant trajectories.
Related Info and Links
|12/16/2013||Brian Bush||Coarticulation Modeling of Continuous Speech|
|12/09/2013||Shiri Azenkot||Eyes-Free Text Entry on Touchscreen Devices|
|12/05/2013||Yuan Cao||Structured Winnow and M-MIRA: Structured Predictors with Multiplicative Update||at 11 AM|
|12/02/2013||Mahsa Yarmohammadi||Faster hedge parsing via finite-state pre-processing||30 minute talk at 10:30|
|12/02/2013||Mahsa Langarani||A Novel Pitch Decomposition Method for the Generalized Linear Alignment Model||30 minute talk at 10:00|
|11/25/2013||Golnar Sheikhshabbafghi||Latent Dirichlet Allocation, Assumptions And Extensions|
|11/18/2013||Maider Lehr||Dialectal automatic speech recognition / Automatic scoring of narrative retelling tasks|
|11/11/2013||Rob Stites and Izhak Shafran||High-Performance Computing at CSLU|
|10/28/2013||Ranjani Ramakrishnan||Discovery and Characterization of Genetic and Epigenetic Variations in the Macaque Genome|
|10/23/2013||Keith Hall & Richard Sproat||Linear Ranking Models for Speech Applications||On Wednesday; at 1:30pm|
|10/21/2013||Guillaume Thibault||Mathematical morphology operators for signal processing. Application to immunogold particle detection.|
|10/14/2013||Anna Wilson||Language samples from pediatric psychology research: Potential health-related collaborations|
|10/11/2013||Ethan Selfridge||Importance-Driven Turn-taking for Spoken Dialogue Systems||Thesis defense; at 10am|
|10/10/2013||Jason Williams||Dialog state tracking: Open problems, challenge task, and recent work||At 2:30pm|
Speaker: Shiri Azenkot
Title: Eyes-Free Text Entry on Touchscreen Devices
Abstract: I will discuss two projects that explore different approaches to improve eyes-free input. The first approach involves onscreen gestures. I will present Perkinput, a text entry method for touchscreens with no soft keys. Perkinput allows users to set reference points anywhere on the screen and enter characters by tapping chords in patterns based on Braille. Next, I will discuss the patterns and challenges of using speech recognition for entering text on mobile devices. While speaking in itself is eyes-free, reviewing and editing a speech recognizer's output poses interesting challenges to blind mobile device users. Our study is the first to explore these issues and provide insight on how to design eyes-free dictation systems.
Speaker: Yuan Cao
Title: Structured Winnow and M-MIRA: Structured Predictors with Multiplicative Update
Abstract: We present two new structured prediction algorithms: structured Winnow and multiplicative MIRA (M-MIRA). Both are multiplicatively updating online learning algorithms, and the two are closely connected. Compared with popular additively updating predictors such as the structured perceptron and MIRA, the multiplicative updating strategy tends to give sparse solutions when millions of features are defined but a large fraction of them are irrelevant, a common situation in many NLP tasks. This natural feature selection process gives a lower upper bound on training and generalization loss. We give a theoretical analysis of these algorithms and show empirical results comparing predictors with different updating strategies. This is joint work with Sanjeev Khudanpur and Abe Ittycheriah (IBM Watson Research Center).
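The additive-versus-multiplicative contrast can be illustrated with a toy mistake-driven learner. This is a generic Winnow-style sketch with made-up parameter values, not the structured predictors from the talk (those update over factored feature vectors of output structures):

```python
import numpy as np

def winnow_update(w, x, y, eta=0.5):
    """Multiplicative (Winnow-style) update: each weight is scaled by
    exp(eta * y * x_i), then the vector is renormalized, so weights of
    irrelevant features decay geometrically toward zero."""
    w = w * np.exp(eta * y * x)
    return w / w.sum()

def train(X, Y, eta=0.5, epochs=10):
    """Mistake-driven training loop, analogous to the perceptron but with
    a multiplicative rather than additive weight update."""
    w = np.ones(X.shape[1]) / X.shape[1]  # uniform, normalized initial weights
    theta = 0.5                            # fixed decision threshold
    for _ in range(epochs):
        for x, y in zip(X, Y):
            pred = 1 if w @ x > theta else -1
            if pred != y:
                w = winnow_update(w, x, y, eta)
    return w
```

On data where only the first feature is predictive, the exponential update quickly concentrates nearly all the weight mass there, which is the sparsity effect the abstract refers to.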
Speaker: Mahsa Yarmohammadi
Title: Faster hedge parsing via finite-state pre-processing
Abstract: In my previous CSLU seminar, I introduced "hedge grammars" and parsing with these grammars as an approach to discovering every constituent of span up to some length L. Our experimental results on English and Chinese showed that parsing with hedge grammars is fast, and that it discovers limited-span constituents nearly as accurately as parsing with a full context-free grammar.
In this seminar, I will talk about faster parsing with hedge grammars using finite-state pre-processing of input sentences. Instead of parsing the entire sentence, we chunk it into segments that are most probably complete hedges, and parse the segments. Our chunker is a finite-state classifier that decides whether a word can begin a new hedge. I will explain some methods for chunking the input sentence with high precision on hedge boundaries.
Speaker: Mahsa Langarani
Title: A Novel Pitch Decomposition Method for the Generalized Linear Alignment Model
Abstract: Superpositional models of intonation typically propose decomposing fundamental frequency (F0) contours into phrase curves and accent curves, aligned with phrases and left-headed feet, respectively. Extracting these component curves from F0 contours without making undue assumptions is challenging. We propose a novel method for decomposing pitch curves, based on the assumption that accent curves can be described by combining skewed normal distributions and sigmoid functions. In contrast to an earlier pitch decomposition algorithm (PRISM), this allows for simple joint optimization of phrase and accent curve parameters, using fewer parameters. The proposed method was evaluated on three speech corpora containing: (1) synthetically generated pitch curves, (2) all-sonorant utterances, and (3) utterances containing both sonorant and non-sonorant speech sounds. The root weighted mean squared error is small, and, on the corpus for which comparable data are available, is significantly smaller than for PRISM.
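The superpositional decomposition can be sketched numerically. The sketch below is illustrative only: it uses a peak-normalized skew-normal density per accent and a single linear phrase segment, omits the sigmoid component the abstract mentions for interrogative endings, and all function names and parameter values are ours, not the paper's:

```python
import numpy as np
from scipy.stats import skewnorm

def accent_curve(t, height_hz, loc, scale, skew):
    """One accent curve shaped as a skew-normal density, peak-normalized
    so that height_hz is the accent's maximum F0 excursion."""
    pdf = skewnorm.pdf(t, skew, loc=loc, scale=scale)
    return height_hz * pdf / pdf.max()

def phrase_curve(t, start_hz, end_hz):
    """One piece of a piecewise-linear phrase (declination) curve."""
    return np.interp(t, [t[0], t[-1]], [start_hz, end_hz])

def f0_contour(t, phrase, accents):
    """Superpositional model: F0 = phrase curve + sum of accent curves."""
    f0 = phrase_curve(t, *phrase)
    for params in accents:
        f0 = f0 + accent_curve(t, *params)
    return f0
```

Because the contour is an explicit differentiable function of a handful of parameters per foot and phrase, all parameters can be fit jointly to an observed F0 track, which is the joint optimization the abstract contrasts with PRISM.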
Speaker: Golnar Sheikhshabbafghi
Title: Latent Dirichlet Allocation, Assumptions And Extensions
Abstract: Latent Dirichlet allocation (LDA) has proved useful not only in many NLP applications but also in other areas of research, such as image processing and population genetics. However, the model has a few problems, namely unrealistic assumptions and non-interpretable topics. In this talk, after a brief description of the model itself, I will discuss its assumptions and the extensions that relax them. I will also introduce methods in the literature for interpreting and guiding the topics.
Speaker: Maider Lehr
Title: Dialectal automatic speech recognition / Automatic scoring of narrative retelling tasks
Abstract: In the first part of the talk, I will present our preliminary work on adapting a Standard American English (SAE) speech recognizer for the recognition of dialectal speech, in particular African American Vernacular English (AAVE). So far, we have performed generative MAP adaptation of the acoustic models, and used this system as the baseline for subsequent discriminative learning. We have explored different sets of features to train the discriminative model: pronunciation-related features, as well as features used in previous work to improve Arabic automatic speech recognition.
The second part of the talk will focus on alternative approaches to the previous work on automatic scoring of narrative retelling tasks. I will present some scoring-accuracy results comparing a CRF-based linear model with a neural network-based model.
Speaker: Rob Stites and Izhak Shafran
Title: High-Performance Computing at CSLU
Abstract: We will describe the computing environment at CSLU, including the Intel-OHSU cluster. This will include the system and the user perspectives. The talk is meant to provide pointers to facilitate better usage and management of the system including bug-tracking, user environments, resources and more. Bring your HPC questions.
Speaker: Ranjani Ramakrishnan
Title: Discovery and Characterization of Genetic and Epigenetic Variations in the Macaque Genome
Abstract: In this talk I present the results of two projects. The first is the discovery of single nucleotide polymorphisms in two sub-populations of rhesus macaques. I describe the process of variant discovery, validation, and its application to population stratification and trait differences. The second project is related to the characterization of epigenetic differences in populations of macaques that have been exposed to ethanol. Using low resolution, region-wide techniques for measuring the levels of modification, I infer the levels of modification at specific locations, using a probabilistic approach.
Speakers: Keith Hall and Richard Sproat
Title: Linear Ranking Models for Speech Applications
Abstract: We describe two applications of linear ranking models to problems in speech and language processing.
First we describe a model of stress prediction in Russian using a combination of local contextual features and linguistically-motivated features associated with the word's stem and suffix. We frame this as a ranking problem, where the objective is to rank the pronunciation with the correct stress above those with incorrect stress. We train our models using a simple Maximum Entropy ranking framework allowing for efficient prediction. An empirical evaluation shows that a model combining the local contextual features and the linguistically-motivated non-local features performs best in identifying both primary and secondary stress.
Second, we describe the application of the same general approach to the problem
of determining whether an unknown token is to be pronounced as a letter sequence
("CIA"), as a word ("NATO") or as a mix of the two ("WinNT").
Speaker: Guillaume Thibault
Title: Mathematical morphology operators for signal processing. Application to immunogold particle detection.
Abstract: Mathematical morphology (MM) is a branch of mathematics created in the 1970s by Jean Serra and Georges Matheron at the École des Mines de Paris. Since then, MM has remained widely used in signal processing. This talk first presents the basic mathematical morphology operators, illustrated with a large number of 1D and 2D examples. The second part presents a real application: the detection of immunogold particles in electron microscopy images.
Speaker: Anna Wilson
Title: Language samples from pediatric psychology research: Potential health-related collaborations
Abstract: There are several completed and ongoing research projects in the IDD Division of Psychology which have resulted in recordings of youth and/or parents with chronic health conditions. Dr. Wilson will review the aims and methods of these studies, outline sample sizes and other characteristics of existing voice recordings, and highlight potential research questions that would be of interest to health researchers. CSLU graduate students and faculty are welcome to propose collaborative projects using these data.
Speaker: Ethan Selfridge
Title: Importance-Driven Turn-taking for Spoken Dialogue Systems
Abstract: As turn-taking governs the information flow in a dialogue, it plays a critical role in creating a successful interaction between a user and a spoken dialogue system.
Current approaches to determining system turn-taking behavior are overly rigid, focusing more on minimizing gaps and overlaps than on creating an efficient interaction. Viewing turn-taking as a negotiative process, with utterance importance as a central mediator of that process, we describe and evaluate Importance-Driven Turn-Taking. This approach uses reinforcement learning to determine the turn-taking behavior of the agent, and provides a greater degree of flexibility than previous approaches. We evaluate it in a live domain with Mechanical Turk users and find it to be more efficient than current approaches.
Speaker: Jason Williams
Title: Dialog state tracking: Open problems, challenge task, and recent work
Abstract: In conversational systems, "dialog state tracking" means inferring the user's goal from the conversation history up to the current turn. System responses are chosen based on the estimated dialog state, so accurate dialog state tracking is crucial to the performance of conversational systems. However, state tracking is non-trivial since input is received via the error-prone processes of speech recognition and language understanding, so the system is never sure of what the user has said.
In commercial systems, dialog state tracking is typically done with simple but rather effective hand-crafted rules. Over the past 10 years, the research community has developed several statistical methods for state tracking. These have been shown to improve the performance of conversational systems in lab settings, and have recently been tested in public deployments for the first time.
In this talk, I'll explain the dialog state tracking problem, and current solutions in industry and research. I'll then cover two recent public deployments of statistical state tracking, which revealed several unanticipated weaknesses in state-of-the-art methods for state tracking. This study led to the creation of the "Dialog state tracking challenge", in which dialog data and evaluation tools were released; I'll next cover results from the challenge, which included 27 state trackers from 9 teams. Finally, I'll describe some recent work on dialog state tracking at Microsoft.
|09/24/2013||Eric Morley||Disfluency and Maze Detection||On Tuesday|
|09/16/2013||Aaron Dunlop||Comparing and Contrasting CYK Chart Decoding Methods||SoN 122|
|09/09/2013||Ranjani Ramakrishnan||Assembling the Japanese macaque genome|
|08/26/2013||Sydney Ryan||Estimating Phoneme Formant Targets and Coarticulation Functions of Clear and Conversational Speech|
|08/19/2013||Hamidreza Mohammadi||Asynchronous Interpolation Model|
|08/12/2013||Ethan Selfridge||Continuously Predicting and Processing Barge-in During a Live Spoken Dialogue Task|
|08/05/2013||Russell Beckley||Normalization of Vernacular Orthography on Twitter|
|07/29/2013||Mahsa Yarmohammadi||Approximate Parsing with "Hedge Grammars"|
|07/17/2013||Andrew Kun||Eye tracking in driving simulator experiments||On Wednesday at 10am|
|07/15/2013||Sean Slee||Spatial and Spectro-temporal Coding in the Auditory Midbrain|
Speaker: Sean Slee
Title: Spatial and Spectro-temporal Coding in the Auditory Midbrain
In the second part of the talk I will describe the results of our ongoing research on
task-related plasticity in the IC of behaving ferrets. Previous research has demonstrated
that auditory cortical neurons can modify both their spatial and spectro-temporal
receptive fields (STRFs) when animals engage in auditory discrimination tasks. In the
mammalian auditory system, massive corticofugal projections send information from
cortical neurons to subcortical nuclei, suggesting that neurons in the earlier stages of the
auditory system may also undergo receptive field changes to enhance behavioral
discrimination. We are testing this hypothesis by recording neural activity in the IC while
ferrets engage in an auditory discrimination task. Ferrets are trained using a go/no-go
paradigm to withhold licking of a waterspout during a sequence of reference stimuli and
to receive a reward for licking during a target stimulus. To study the effects of behavior,
we compare STRFs measured during behavior and during a passive state before and after
behavior. The preliminary results suggest that rapid behaviorally-driven STRF changes in
the auditory midbrain are qualitatively similar to those described previously in the cortex
and demonstrate significant behavioral modulation of the subcortical auditory pathway.
|06/10/2013||Seyed Hamidreza Mohammadi||Transmutative Voice Conversion|
|06/05/2013||Maider Lehr||Weighted finite state transducer-based joint discriminative modeling for speech recognition||Thesis proposal; on Wednesday|
|06/03/2013||Géza Kiss||Estimating Speaker-Specific Intonation Patterns using the Linear Alignment Model and its Application for Characterizing Atypicality of Prosody|
|05/20/2013||Brian Bush||Estimating Phoneme Formant Targets and Coarticulation Parameters of Conversational and Clear Speech||Practice talk for ICASSP poster session|
|05/13/2013||Meysam Asgari||Improving the Accuracy and the Robustness of Harmonic Model for Pitch Estimation|
|05/06/2013||Alireza Bayestehtashk||Efficient and Accurate Multivariate Class-Conditional Densities Using Copula|
|05/02/2013||Andrew Fowler||Autotyping and Improved Bayesian Inference for Binary Typing Systems with Brain Computer Interfaces||RPE; On Thursday|
|04/29/2013||Mahsa Langarani||Pitch decomposition for recombinant synthesis|
|04/22/2013||Eric Morley||The Utility of Manual and Automatic Linguistic Error Codes for Identifying Neurodevelopmental Disorders|
|04/15/2013||Ranjani Ramakrishnan||Predicting Methylation Levels at Unique Locations on the Genome Using Next-Gen Sequencing Data|
|04/08/2013||Amanda Stead||Discourse in Aging & Dementia: What real speech tells us about cognitive change|
Speaker: Maider Lehr
Title: Weighted finite state transducer-based joint discriminative modeling for speech recognition
Abstract: Log-linear models have been a popular technique in the natural language processing community for classification tasks, and recently they have also been widely adopted by the speech processing community. In particular, conditional log-linear models have become a popular way to reestimate generative acoustic models and statistical n-gram language models. Their discriminative nature and flexibility to incorporate diverse features make these models especially attractive. In this thesis, we use discriminative log-linear models to jointly estimate parameters of the acoustic and language models, thereby optimizing the performance of the composite model that represents the speech recognizer. Additionally, we incorporate duration features, which are not properly captured in state-of-the-art speech recognizers but are important for distinguishing words in certain languages.
We will also apply the proposed joint discriminative model in two additional scenarios: to jointly estimate the language model and scoring system parameters of a narrative-retelling assessment tool by optimizing the task objective, and to adapt speech recognition to dialectal speech. Specifically, we will learn a pronunciation variation model together with the transition parameters of the acoustic and language model of the speech recognizer.
Speaker: Géza Kiss
Title: Estimating Speaker-Specific Intonation Patterns using the Linear Alignment Model and its Application for Characterizing Atypicality of Prosody
Abstract: Characterizing speech intonation using a few relevant parameters is important for several purposes, including finding common characteristics in the speech of children with Autism Spectrum Disorder (ASD) or Specific Language Impairment (SLI). However, the estimation of intonation parameters from spontaneous speech remains unsolved for some models, such as the Simplified Linear Alignment Model (SLAM). We compare approaches to estimating SLAM parameters from the speech of children with ASD and SLI, and show significant parameter differences compared to those of their typically developing peers. The approach holds promise for characterizing an aspect of the prosodic atypicality in ASD and SLI, as well as for contributing to our general understanding of prosody.
Speaker: Brian Bush
Title: Estimating Phoneme Formant Targets and Coarticulation Parameters of Conversational and Clear Speech
Abstract: We present a data-driven formant model and methodology for discovering its parameters, namely phoneme targets and coarticulation functions for consonant-vowel-consonant (CVC) words from fully-automatic formant data. The model uses formant targets that are speaker dependent, but independent of speaking style and phonemic context. We used a global error measure to search for optimal formant targets for all phonemes, including classes of sounds where formants are not directly observable. Analysis of coarticulation parameters found significant differences in parameters between clear and conversational speech. Estimated formant targets were largely in agreement with acoustic-phonetic expectations. An intelligibility test validated that resynthesized CVC words using modeled formant trajectories were nearly as intelligible as resynthesized CVC words using observed formant trajectories.
Speaker: Meysam Asgari
Title: Improving the Accuracy and the Robustness of Harmonic Model for Pitch Estimation
Abstract: Accurate and robust estimation of pitch plays a central role in speech processing. Various methods in the time, frequency, and cepstral domains have been proposed for generating pitch candidates. Most algorithms excel when the background noise is minimal, or for specific types of background noise. In this work, our aim is to improve the robustness and accuracy of pitch estimation across a wide variety of background noise conditions. To this end, we adopt the harmonic model of speech, a model that has gained considerable attention recently. We address two major weaknesses of this model: the problem of pitch halving and doubling, and the need to specify the number of harmonics. We exploit the energy in neighboring frequencies to alleviate halving and doubling, and choose the optimal number of harmonics using a model complexity term with a BIC criterion. We evaluated our proposed pitch estimation method against other state-of-the-art techniques on the Keele data set, in terms of gross pitch error and fine pitch error. Through extensive experiments under several noisy conditions, we demonstrate that the proposed improvements provide substantial gains over other popular methods under different noise levels and environments.
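The BIC-based choice of harmonic order can be sketched as follows. This is a generic illustration of order selection for a harmonic model, not the authors' code; the least-squares fit, the specific BIC form, and all names are our assumptions:

```python
import numpy as np

def harmonic_basis(t, f0, n_harmonics):
    """Design matrix of cosines and sines at integer multiples of f0
    (plus a DC column), i.e. the harmonic model of a voiced frame."""
    cols = [np.ones_like(t)]
    for h in range(1, n_harmonics + 1):
        cols.append(np.cos(2 * np.pi * h * f0 * t))
        cols.append(np.sin(2 * np.pi * h * f0 * t))
    return np.column_stack(cols)

def fit_bic(t, x, f0, max_harmonics=10):
    """Fit the harmonic model for each candidate order and return the
    order minimizing BIC = N*log(RSS/N) + k*log(N)."""
    n = len(t)
    best = None
    for H in range(1, max_harmonics + 1):
        A = harmonic_basis(t, f0, H)
        coef, *_ = np.linalg.lstsq(A, x, rcond=None)
        rss = np.sum((x - A @ coef) ** 2)
        k = A.shape[1]
        bic = n * np.log(rss / n + 1e-12) + k * np.log(n)
        if best is None or bic < best[0]:
            best = (bic, H)
    return best[1]
```

The complexity term k*log(N) is what stops the model from adding harmonics that only fit noise: extra sinusoid pairs are accepted only when they reduce the residual enough to pay for the penalty.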
Speaker: Alireza Bayestehtashk
Title: Efficient and Accurate Multivariate Class-Conditional Densities Using Copula
Abstract: There is a clear dichotomy between univariate and multivariate generative models for continuous random variables. Univariate densities can be modeled accurately and efficiently using nonparametric kernel density estimators, which unfortunately cannot be easily extended to the multivariate case. Gaussian mixture models, on the other hand, have become the workhorse for multivariate densities because they capture multivariate dependencies effectively and efficiently. However, multivariate Gaussian mixture models impose a particular form on the marginals, namely a Gaussian mixture; this is a strong assumption, violated in many practical applications. In this paper, we propose a simple generative method based on a copula model that combines the accuracy of the nonparametric univariate density estimator with the multivariate dependencies captured by the Gaussian mixture model, alleviating the aforementioned limitations. We show that the proposed generative model consistently outperforms Gaussian mixture models on classification tasks from the UCI repository, with performance often comparable to, and sometimes better than, a Support Vector Machine (SVM).
Speaker: Andrew Fowler
Title: Autotyping and Improved Bayesian Inference for Binary Typing Systems with Brain Computer Interfaces
Abstract: RSVP Keyboard is a successful typing system for people with severe physical disabilities, specifically those with locked-in syndrome (LIS). It uses signals from an electroencephalogram (EEG) combined with information from an n-gram language model to select letters to be typed. The main shortcoming of the system as it exists today is that it does not keep track of past EEG observations, i.e. observations made of brain signals while the user was in a different part of a typed message. We present a system for taking all past observations into account in a principled Bayesian manner, and show that this method results in an over 20% increase in simulated typing speed. We also show that our method allows for better calculation of the probability of the backspace symbol, an important feature. Finally, we demonstrate the utility of automatically typing certain letters in certain contexts, a technique that allows for increased typing speed under our new method.
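The principled fusion of a language-model prior with accumulated EEG evidence amounts to a Bayes update in log space. This is a schematic sketch of that idea under our own naming, not RSVP Keyboard's implementation:

```python
import numpy as np

def letter_posterior(lm_prior, eeg_likelihoods):
    """Fuse an n-gram language-model prior over candidate symbols with all
    EEG evidence likelihoods gathered so far, via Bayes' rule. Keeping every
    past likelihood vector (rather than only the current one) is what lets
    earlier observations of the same context keep contributing evidence."""
    log_post = np.log(np.asarray(lm_prior, dtype=float))
    for lik in eeg_likelihoods:
        log_post += np.log(np.asarray(lik, dtype=float))
    post = np.exp(log_post - log_post.max())   # stabilize before normalizing
    return post / post.sum()
```

In this framing, a backspace symbol simply receives probability mass whenever the accumulated evidence contradicts the previously typed letter, which is the improved backspace calculation the abstract mentions.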
Speaker: Mahsa Langarani
Title: Pitch decomposition for recombinant synthesis
Abstract: Recombinant synthesis is text-to-speech synthesis in which both acoustic and prosodic units are stored in a database to produce more natural-sounding speech. The prosodic units in this case are phrase curves and accent curves. Extracting these curves from raw F0 curves is not easy. The first hurdle is to start with a robust F0 curve. We discuss an approach that deals with pitch halving and doubling errors by using normalized cross-correlation to compute F0 candidates and applying the Viterbi algorithm to find the best path through the candidates, and we show an improvement in F0 curve estimation over the standard get_f0 method in Snack. Next, we discuss a new method for decomposing F0 curves into phrase and accent curves. We assume phrase curves are piecewise linear, and we model accent curves using skewed normal distributions plus sigmoid functions to handle the end of an interrogative utterance. The number of parameters depends on the number of feet and phrases, and all parameters are optimized simultaneously using the Sequential Least Squares Programming optimization algorithm.
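The candidate-plus-Viterbi step can be sketched as a small dynamic program. This is an illustrative version with made-up score and penalty values, not the system's actual tracker: per-frame candidate scores stand in for normalized cross-correlation values, and a penalty on jumps in log-F0 discourages octave (halving/doubling) errors:

```python
import numpy as np

def viterbi_pitch(cands, scores, jump_cost=0.5):
    """Choose one F0 candidate per frame by dynamic programming, maximizing
    summed candidate scores minus a penalty proportional to the jump in
    log-F0 between adjacent frames."""
    n_frames, n_cands = cands.shape
    log_f0 = np.log(cands)
    dp = scores[0].astype(float).copy()
    back = np.zeros((n_frames, n_cands), dtype=int)
    for t in range(1, n_frames):
        # trans[i, j]: penalty for moving from candidate i to candidate j
        trans = -jump_cost * np.abs(log_f0[t][None, :] - log_f0[t - 1][:, None])
        total = dp[:, None] + trans
        back[t] = np.argmax(total, axis=0)
        dp = np.max(total, axis=0) + scores[t]
    path = [int(np.argmax(dp))]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return cands[np.arange(n_frames), path]
```

A greedy per-frame maximum would take an isolated doubled candidate whenever its score is slightly higher; the jump penalty makes the consistent track win instead.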
Speaker: Eric Morley
Title: The Utility of Manual and Automatic Linguistic Error Codes for Identifying Neurodevelopmental Disorders
Abstract: We investigate the utility of linguistic features for automatically differentiating between children with varying combinations of two potentially comorbid neurodevelopmental disorders: autism spectrum disorder and specific language impairment. We find that certain manual codes for linguistic errors are useful for distinguishing between diagnostic groups. We investigate the relationship between coding detail and diagnostic classification performance, and find that a simple coding scheme is of high diagnostic utility. We propose a simple method to automate the pared down coding scheme, and find that these automatic codes are of diagnostic utility.
Speaker: Ranjani Ramakrishnan
Title: Predicting Methylation Levels at Unique Locations on the Genome Using Next-Gen Sequencing Data
Abstract: In this talk I present the approach that we are taking to predict methylation levels at CpG sites on the genome using read count data from precipitation experiments. We use data from rhesus macaques that have been exposed to ethanol and whose methylation levels have been measured using multiple techniques, followed by next-gen sequence analysis. I will talk about the external data sources that we include in the model in addition to our experimental data. I will present the results of our approach and compare it to an SVM-based classification approach.
Speaker: Amanda Stead
Title: Discourse in Aging & Dementia: What real speech tells us about cognitive change
Abstract: Engaging in discourse requires multiple cognitive systems to work in concert across various portions of the process. Discourse analysis has been a growing area of investigation in aging and dementia; however, because of some of its qualitative aspects and unpredictable nature, many researchers have yet to recognize discourse's true clinical utility. Different types of discourse rely on different types of cognitive support systems. This talk presents the potential clinical utility of discourse in the study of aging and dementia, as well as which features of language deteriorate at particular stages of cognitive decline and how discourse can be used to discover them.
|Monday, 03/18/2013||Kyle Gorman||Modeling wordlikeness|
|Monday, 03/11/2013||Golnar Sheikhshabbafghi||Identifying a Topic Mention in Private Conversations: A Semi-Supervised Approach|
|Monday, 03/04/2013||Brian Snider||Minimally-Obtrusive Respiratory Cycle Tracking for Assessing Sleep-Disordered Breathing Severity||RPE|
|Monday, 02/25/2013||Masoud Rouhizadeh||Distributional semantic models for the evaluation of disordered language|
|Monday, 02/11/2013||Christopher Whelan||Cloudbreak: A MapReduce Algorithm for Detecting Genomic Structural Variation|
|Monday, 02/04/2013||Brian Roark||Imposing marginal distribution constraints on language models|
|Monday, 01/28/2013||Aaron Dunlop||Efficient Latent-Variable Grammars: Learning and Inference||Thesis Proposal; at 11:30am|
|Tuesday, 01/22/2013||On Tuesday due to MLK Jr. Day|
Speaker: Aaron Dunlop
Title: Efficient Latent-Variable Grammars: Learning and Inference
Abstract: We propose three methods of incorporating efficiency concerns into the process of training latent-variable grammars, and present preliminary results indicating the potential of these approaches. First, we propose text normalization prior to grammar induction, and demonstrate that even simple normalizations can reduce the grammar size considerably with minimal impact on accuracy. Second, we propose a modeling approach, predicting inference time from attributes of the grammar and incorporating those predictions into the optimization criteria during split-merge grammar training; we present preliminary trials demonstrating a speedup of 30% with minimal accuracy loss. Finally, we propose a discriminative criterion for selecting state splits, allowing a controlled tradeoff between speed and accuracy in the learned grammar.