Transforming trees into hedges and parsing with "hedgebank" grammars
Finite-state chunking and tagging methods are very fast for annotating non-hierarchical syntactic information, and are often used in applications that do not require full syntactic analyses. Scenarios such as incremental machine translation may benefit from some degree of hierarchical syntactic analysis without requiring fully connected parses. We introduce hedge parsing as an approach to recovering constituents of length up to some maximum span L. This approach improves efficiency by bounding constituent size, and allows for efficient segmentation strategies prior to parsing. Unlike shallow parsing methods, hedge parsing yields the internal hierarchical structure of phrases within its span bound. We present the approach and some initial experiments on different inference strategies.
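The span-bounding idea can be sketched as a CKY chart parser that simply refuses to build constituents wider than the bound. The grammar, categories, and sentence below are toy examples, not from the paper:

```python
from collections import defaultdict

def hedge_cky(words, lexicon, rules, max_span):
    """CKY chart parsing that only builds constituents whose span is at
    most max_span, leaving wider spans unconnected (hedge parsing)."""
    n = len(words)
    chart = defaultdict(set)  # (i, j) -> nonterminals covering words[i:j]
    for i, w in enumerate(words):
        chart[(i, i + 1)] |= lexicon.get(w, set())
    for span in range(2, min(max_span, n) + 1):  # bound constituent size
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in rules:
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        chart[(i, j)].add(parent)
    return chart

# Toy CNF grammar (hypothetical categories).
lexicon = {"the": {"DT"}, "dog": {"NN"}, "barks": {"VB"}}
rules = [("NP", "DT", "NN"), ("S", "NP", "VB")]
chart = hedge_cky(["the", "dog", "barks"], lexicon, rules, max_span=2)
# With max_span=2, the NP over "the dog" is built, but the full S
# (span 3) is not, so the sentence remains a sequence of hedges.
```

Because the outer loop never exceeds `max_span`, the chart stays O(n·L·L) rather than O(n³), which is the source of the claimed efficiency gain.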
Detecting linguistic restricted interests in autism using distributional semantic models
Children with autism spectrum disorder often exhibit idiosyncratic patterns of behaviors and interests. In this paper, we focus on measuring the presence of idiosyncratic interests at the linguistic level in children with autism using distributional semantic models. We model the semantic space of children's narratives by calculating pairwise word overlap, and we compare the overlap found within and across diagnostic groups. We find that the words used by children with typical development tend to be used by other children with typical development, while the words used by children with autism overlap less with those used by children with typical development and even less with those used by other children with autism. These findings suggest that children with autism are veering not only away from the topic of the target narrative but also in idiosyncratic semantic directions potentially defined by their individual topics of interest.
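The within- versus across-group comparison can be illustrated with a simple pairwise word-overlap computation. This sketch uses raw Jaccard overlap of word types, a deliberate simplification of the distributional semantic models the study actually employs:

```python
def word_overlap(a, b):
    """Jaccard overlap between the word types of two narratives."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def mean_pairwise_overlap(group_a, group_b):
    """Average overlap between every narrative in group_a and every
    narrative in group_b, skipping self-comparisons within a group."""
    scores = [word_overlap(x, y)
              for i, x in enumerate(group_a)
              for j, y in enumerate(group_b)
              if group_a is not group_b or i != j]
    return sum(scores) / len(scores)

# Hypothetical toy narratives: comparing within-group overlap to
# across-group overlap, as in the study's design.
typical = ["the dog ran home", "the dog went home"]
atypical = ["trains have big wheels"]
within = mean_pairwise_overlap(typical, typical)
across = mean_pairwise_overlap(typical, atypical)
```

The study's finding corresponds to `within` being high for typically developing children but low, even within-group, for children with autism.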
Exploring different voice conversion approaches and their applications
Voice Conversion (VC) is the process of making the speech of a source speaker sound similar to that of a target speaker. These methods can also be applied to speech style conversion. We implemented four VC methods based on frequency warping, frame selection, Gaussian mixture models, and neural networks. To evaluate VC performance, we perceptually assessed conversion accuracy and speech quality across three experiments. In the first experiment, we compared the last three approaches, using 2 and 70 training sentences to explore the effect of training set size. With 70 training sentences, frame selection showed the best conversion accuracy and speech quality; with 2 training sentences, the pre-trained neural network performed best on both measures. In the second experiment, the Gaussian mixture model was compared to frequency warping; these represent two different families of VC approaches. The perceptual results showed that frequency warping yielded superior speech quality, while the Gaussian mixture model yielded better conversion accuracy. In the third experiment, we applied the Gaussian mixture model approach to speaking style conversion. The goal was to convert conversational-style vowels to clear-style vowels, which are reportedly more intelligible. To evaluate the method, an intelligibility test was performed. The perceptual results showed that in noisy conditions the conversion is successful: it increases the intelligibility of the spoken vowels.
Discriminative Joint Modeling of Acoustic and Lexical Variations for Spoken Language Processing
Speech recognition systems consist of three models, namely, the
acoustic model, the pronunciation model and the language model. The
acoustic and language models are typically learned separately and
furthermore optimized for different cost functions. This framework has been
a result of historical and practical considerations such as the availability
of limited amounts of training data and the computational cost. These
considerations are currently being overcome. Arguably, learning both
models jointly to directly minimize the word error rate will result in a better-performing system.
One of the contributions of this thesis is a detailed investigation of a
discriminative framework to jointly learn the parameters of the acoustic, language and duration models. The acoustic state transition
parameters, the n-gram language model parameters and the state
duration parameters are learned using a reranking framework, which
has been previously employed in discriminative language
models. We report experiments on the GALE Arabic transcription
task, a NIST benchmark, with about 200 hours of training data and two test
sets of about 2.5 hours. Our results demonstrate that our model improves the
performance by about 1.4-1.6% absolute word error rate over the baseline.
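At its core, the reranking framework scores each hypothesis in an n-best list with a linear combination of model features and picks the top scorer. The feature names and weights below are hypothetical, purely to illustrate the mechanics:

```python
def rerank(nbest, weights):
    """Pick the hypothesis whose weighted feature sum is highest,
    as in discriminative n-best reranking."""
    def score(feats):
        return sum(weights.get(name, 0.0) * value
                   for name, value in feats.items())
    return max(nbest, key=lambda hyp_feats: score(hyp_feats[1]))[0]

# Hypothetical feature values for two competing transcriptions:
# acoustic and language model log-scores plus a duration penalty.
nbest = [
    ("hyp one", {"acoustic": -10.0, "lm": -4.0, "duration": -1.0}),
    ("hyp two", {"acoustic": -11.0, "lm": -2.0, "duration": -0.5}),
]
weights = {"acoustic": 1.0, "lm": 2.0, "duration": 1.0}
best = rerank(nbest, weights)
```

Joint training then amounts to learning the weights (here fixed by hand) so that the reranked top hypothesis minimizes word error rate.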
Continuing with the joint modeling framework, we next apply it to learn pronunciation
variations particular to African American Vernacular English (AAVE) speech. Popular
speaker adaptation methods adapt the acoustic models quickly using small amounts
of data, for example, by estimating a few linear transforms. Such transformations are
incapable of appropriately capturing systematic phonetic transformations. We investigate
strategies for learning phonetic transformations jointly with the discriminative language
model. We compare our new models on NPR's StoryCorps corpus, which consists
of stories from self-identified AAVE and standard
American English (SAE) speakers (non-AAVE speakers). The joint
discriminative pronunciation and language model improves the performance of the AAVE recognizer by about 2.0% WER, of which about 0.5% can be attributed to the pronunciation models. Improvements on the SAE data are smaller and mainly attributable to the discriminative language model.
Finally, we examine how joint modeling of acoustic and lexical variations can
improve the performance of a downstream application, a narrative retelling
assessment tool. We develop a conditional random field (CRF) based model to incorporate both
variations, and demonstrate gains of 6.3% over a generative baseline
in the F-score of detecting story elements on a clinical task, the
Wechsler Logical Memory test.
Modeling Coarticulation in Continuous Speech
We present a data-driven formant model and a methodology for discovering its parameters, namely phoneme targets and coarticulation functions, from continuous, style-independent speech. We hypothesize that these targets are global yet speaker-dependent. The objective of this thesis is to show that targets of acoustic events, including classes of sounds where formants are not directly observable, can be derived from continuous speech. Our approach models speech feature trajectories as a series of overlapping triphone coarticulation models. A global error measure is used to estimate both targets and coarticulation parameters. We conclude with an overview of applications to be developed using this modeling approach.
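The target-plus-coarticulation idea can be sketched as a trajectory formed by time-varying weighted sums of phoneme targets, fit by a global squared-error measure. The weight functions below are hypothetical placeholders for the learned coarticulation functions:

```python
def trajectory(targets, weight_fns, times):
    """Model a feature trajectory as a weighted sum of phoneme targets,
    with coarticulation functions giving each target's weight over time."""
    return [sum(w(t) * target for target, w in zip(targets, weight_fns))
            for t in times]

def global_error(observed, predicted):
    """Global squared-error measure used to jointly estimate targets
    and coarticulation parameters against observed trajectories."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

# Two hypothetical targets with a linear crossfade between them.
targets = [1.0, 3.0]
weight_fns = [lambda t: 1 - t, lambda t: t]
traj = trajectory(targets, weight_fns, [0.0, 0.5, 1.0])
```

Estimation then searches over targets and coarticulation parameters to minimize `global_error` across all of the continuous speech data at once, which is what lets targets be recovered even where formants are not directly observable.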
Statistical analysis of clinical polysomnography
Sleep-disordered breathing (SDB) is believed to be a widespread, under-diagnosed condition associated with many detrimental health problems. The current gold standard for diagnosis of SDB is a sleep study, or polysomnography (PSG). This overnight procedure takes place in a sleep laboratory and typically records twelve or more biological processes requiring 22–40 wires to be attached to the patient. Scoring of study results is time-consuming and expensive, and requires a trained human expert to visually assess several continuous signals while interpreting and applying a multi-step scoring algorithm. In this talk, we analyze scored PSG studies using statistical measures to gain insight into the manual scoring process — as well as the underlying physiological phenomena — to guide future automatic scoring approaches.
Unlocking Autism: Massively Parallel Strategies and Shifting Genetic Paradigms
We are now at a critical juncture where emerging technologies have the potential to transform our understanding of human disease. In recent years, the genetic intractability of complex genetic disorders, such as autism, has been challenged: first by the application of genome-wide platforms to detect copy number variants, and more recently by sequencing of the entire protein-coding genome (the exome). Exome studies of simplex or "sporadic" autism have highlighted the importance of de novo mutations and led to the discovery of many novel candidate genes. Our data strongly support a major role for recurrently disrupted genes in sporadic autism, many of which span diagnostic boundaries, and provide a model for discovering and rigorously validating bona fide genetic risk factors for neurodevelopmental disorders.
On Twitter, many writers are not committed to orthographic conventions. This creates difficulty for NLP systems that need to recognize familiar words. In this presentation, I introduce methods for making such non-standard writing more amenable to subsequent processing.
Assembly and Characterization of Variations in the Macaque Genome: Application to Discovering Population Structure and Disease Associations
In this talk I present an overview of my research projects. The first is the hybrid approach taken for the assembly of the macaque genome using a combination of long and short read sequences.
The second project concerns the characterization of epigenetic differences in populations of macaques. Using low-resolution, region-wide techniques for measuring modification levels of the genetic signals, we try to infer the levels of modification at specific locations using a probabilistic approach.
The third project describes the discovery of genetic variants in two small sub-populations of rhesus macaques. Using a ranking of these variants, we demonstrate the utility of the ranked list in the discovery of population sub-structure, as well as in identifying variants associated with a concordant trait, SIV resistance.
Systems Biology of Onset of Puberty
The release of gonadotropin-releasing hormone (GnRH) initiates puberty in mammals. Although the genetic factors behind the secretion of this hormone are not yet fully understood, biological findings point to a genetic pathway that involves the regulation of many genes. We use microarray data to identify the genes that show the highest variability across several puberty-initiation stages of biological development. We use weighted gene co-expression network analysis (WGCNA) and similarity distance measures to investigate clusters of genes that behave similarly at different developmental stages. Using the Pearson correlation coefficient as the basis of our similarity matrix, we identified groups of genes with similar patterns that are related to other genetic functions, such as energy metabolism and response to organic substances, with very high statistical significance (FDR ~ 10^-4). We also used partial correlation coefficients as an alternative way to form our similarity matrix, isolating the effect of a single gene on the genetic network. We conditioned the network on ZBTB16, because ZBTB16 is biologically known to be transcriptionally active at the initiation of puberty. We found that genes previously known to affect puberty, such as Penk, Negr1, and Nfat5, change behavior when the ZBTB16 gene is removed.
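The Pearson-based similarity matrix underlying the clustering step can be sketched directly. The expression profiles below are hypothetical toy values over four developmental stages, not the study's data:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def similarity_matrix(profiles):
    """Pairwise Pearson similarity over gene expression profiles,
    the basis for grouping genes with similar developmental patterns."""
    return [[pearson(a, b) for b in profiles] for a in profiles]

# Toy profiles over four developmental stages (hypothetical values).
profiles = [
    [1.0, 2.0, 3.0, 4.0],  # gene A
    [2.0, 4.0, 6.0, 8.0],  # gene B: same pattern as A
    [4.0, 3.0, 2.0, 1.0],  # gene C: opposite pattern
]
sim = similarity_matrix(profiles)
```

Genes A and B land in the same cluster (similarity 1) despite different absolute expression levels, which is exactly why correlation, rather than raw distance, is used.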
Automating Language Sample Analysis
Analyzing samples of natural language, "language sample analysis", can be informative to clinicians, but doing so can also be prohibitively expensive. Techniques in natural language processing could be used to decrease the costs of language sample analysis, thereby increasing its adoption. This talk will address strategies for and challenges in automating the annotations in the Systematic Analysis of Language Transcripts (SALT), which is the de facto standard annotation system for language sample analysis.
In particular, I will discuss recent results from my work on automating maze annotations (NB: mazes are defined in the SALT manual as "filled pauses, false starts, repetitions, reformulations, and interjections"). The talk will conclude with an overview of the work I plan to do in automating the grammatical error annotations in SALT, as well as an analysis of the clinical utility of both manually and automatically applied SALT annotations.
Deep Learning strategies for Voice Conversion
Traditionally, two categories of features have been used in Voice Conversion (VC). The first category is based on Linear Predictive Coding (LPC) features: Line Spectral Frequencies (LSFs), computed from LPC coefficients, which model spectral peaks. The second category is Mel Cepstral (MCEP) features, which model the full spectrum rather than just its peaks; as a result, MCEPs suffer from over-smoothing. Autoencoders have been used to extract abstract features from text and images. In this study we will try to build a new category of features using autoencoders. We hope these features are better suited for modeling, especially when we have a limited amount of training data.
Automated analysis of clinical language samples
Spontaneous language samples can be used to assess language impairment (LI) and related conditions. Compared to the highly structured instruments traditionally used to diagnose LI, measures derived from spontaneous language samples are thought to have comparable sensitivity and specificity, to be less sensitive to cultural biases or dialect variation, and to have superior ecological validity. Conventional spontaneous language measures can be computed with the assistance of proprietary software like SALT (Systematic Analysis of Language Transcripts), but even this requires painstaking manual morphosyntactic analysis. In this talk I’ll describe how conventional natural language processing techniques can be used to automate morphosyntactic analyses needed to compute three widely-used spontaneous language measures: mean length of utterance in morphemes (MLUM), number of distinct word roots (NDRW), and the Index of Productive Syntax (IPSyn) [the latter using tools developed by Richard Sproat]. These measures, applied to spontaneous language samples collected at CSLU, are shown to be useful indicators of LI.
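Two of these measures are simple counts once utterances are morphologically analyzed. A minimal sketch, with the segmentation supplied by hand here as a stand-in for the automated morphosyntactic analysis:

```python
def mlum(utterances):
    """Mean length of utterance in morphemes. Each utterance is a list
    of words, each word a list of its morphemes (e.g. ["dog", "s"])."""
    counts = [sum(len(word) for word in utt) for utt in utterances]
    return sum(counts) / len(utterances)

def ndrw(utterances):
    """Number of distinct word roots, taking the first morpheme of each
    word as its root (a simplification of real morphological analysis)."""
    return len({word[0] for utt in utterances for word in utt})

# Two toy utterances: "the dog-s run" and "the dog-s".
utterances = [
    [["the"], ["dog", "s"], ["run"]],
    [["the"], ["dog", "s"]],
]
```

The hard part, and the point of applying NLP here, is producing the morpheme segmentation and syntactic analysis automatically; the measures themselves then fall out of the annotated transcript.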
Automatic detection of monkey vocalizations
Vocalization is an important cue in the recognition of monkeys' behaviors. Previous studies have shown that the number and length of vocalizations reveal significant information about social interaction in a group of monkeys. For this work, we use a corpus of monkey sounds collected at the Oregon National Primate Research Center. The corpus consists of several audio recording sessions, collected from a group of monkeys placed in a pen cage. Each monkey's recording was independently sampled at 8 kHz using a collar-mounted recorder. The constraints of the recording environment necessitated using low-power recorders. As in sensor networks, low-power recorders are unreliable and suffer sample loss. This poses a problem when aligning the recorded waveforms, since each microphone records a mixture of vocalizations from different monkeys, including the monkey wearing the recorder and its spatial neighbors. In this talk, we describe our experiments on automated approaches for detecting monkey vocalizations and discuss the trade-offs of different alignment methods.
Understanding Speech and Language Impairment in Deep Brain Stimulation
Deep brain stimulation (DBS) is often employed to reduce tremor in Parkinson's disease when drugs alone are no longer effective. The placement and parameters of the stimulation, such as the pulse width, frequency, and duty cycle, are tuned solely to reduce tremor. It is widely acknowledged that DBS worsens speech and language impairment for most patients; however, studies on this topic are limited. There are open questions about differences in the effects of stimulating the sub-thalamic nucleus (STN) versus the globus pallidus interna (GPi), and about the dependence of impairment on the stimulation parameters as well as the trajectory of the implant. There are also questions about effective strategies for characterizing speech and language impairment. In this ongoing study, funded and run via OCTRI, we are investigating the feasibility of automated administration of speech and language tasks and are collecting data for preliminary analyses to address these questions. The talk will not assume any prior knowledge of Parkinson's disease or speech analysis.
Accurate and Robust Models for Clinical Speech Processing
The purpose of this study is to achieve accurate and reliable estimation of voiced
segments, fundamental frequency, harmonics-to-noise ratio (HNR), jitter, and shimmer
for clinical speech analysis. Moreover, this study aims to investigate the utility of the
developed measures in the context of speech-based assessment of cognitive impairments,
including Parkinson's disease (PD), autism spectrum disorder (ASD), and clinical
depression. For this, we adopt a harmonic model (HM) of speech and address major
problems of this model. To overcome certain weaknesses of other currently available
algorithms, we extract the aforementioned speech features using an improved version of
the HM. We evaluate our improved HM on voicing detection and pitch estimation against
other state-of-the-art techniques on the Keele data set. Through extensive experiments
in several noisy conditions,
we demonstrate that the proposed improvements provide substantial gains over
other popular methods under different noise levels and environments. Further, we evaluate
the utility of our speech models in 1) predicting the clinical rating of severity of
PD, 2) detecting ASD and classifying it into 4 sub-type categories, and 3) detecting
clinical depression in adolescents. The last chapter of the thesis focuses on the
use of deep neural networks (DNNs) for detecting audio events, for example, a speaker's
current location (e.g., apartment, outdoors) and activity (e.g., listening to music,
eating) in adverse noisy environments.
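Jitter and shimmer have standard "local" definitions: cycle-to-cycle variation relative to the mean. The thesis estimates them via the harmonic model; the sketch below is only the textbook formulation, applied to hand-picked toy values:

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive cycle values,
    relative to the overall mean (the standard 'local' measure)."""
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))

def jitter(periods):
    """Local jitter: perturbation of consecutive pitch periods."""
    return local_perturbation(periods)

def shimmer(amplitudes):
    """Local shimmer: the same measure over consecutive peak amplitudes."""
    return local_perturbation(amplitudes)
```

Both measures are zero for perfectly periodic voicing and grow with irregularity, which is why robust period and amplitude estimation under noise matters so much for the clinical applications above.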
Detecting and Analyzing Genomic Structural Variation using Distributed Computing
Genomic structural variations (SVs) are an important class of genetic variants with a variety of functional impacts, and account for most of the bases of DNA that differ between individuals in the human population. The detection of SVs using high-throughput short-read sequencing data is a difficult problem, and published algorithms do not provide the sensitivity and specificity required in research and clinical settings. Meanwhile, high-throughput sequencing is rapidly generating ever-larger data sets, necessitating the development of tools that can provide results rapidly and scale to use cloud and cluster infrastructures. MapReduce and Hadoop are becoming a standard for managing the distributed processing of large data sets, but existing SV detection approaches are difficult to translate into the MapReduce framework. We have formulated a general framework for SV detection in MapReduce, and implemented a software package called Cloudbreak, which detects genomic deletions and insertions with high accuracy. Through the use of MapReduce and Hadoop, Cloudbreak can scale to harness large compute clusters and big data sets, leading to much faster runtimes than existing methods. In addition, we show that Cloudbreak's formulation of the SV detection problem in terms of local feature generation allows it to simultaneously integrate many informative signals from genomic data sets in statistical learning frameworks. We demonstrate an implementation of this using conditional random fields, which enable learning conditional probability distributions over labels on sequences of observations, and show that it improves Cloudbreak's results, in particular increasing breakpoint resolution.
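The "local feature generation" formulation maps naturally onto MapReduce: the map step emits per-bin observations from each read pair, and the reduce step computes a feature per genomic bin. The sketch below is an illustration of that shape only, with hypothetical bin size, expected insert size, and a single deviation feature, far simpler than Cloudbreak's actual feature set:

```python
from collections import defaultdict

def map_read_pair(pair, bin_size=100):
    """Map step: emit (genomic bin, observed insert size) for one read
    pair, given as a (leftmost_position, insert_size) tuple (simplified)."""
    pos, insert = pair
    yield (pos // bin_size, insert)

def reduce_bin(bin_id, inserts, expected=300):
    """Reduce step: a local feature per bin -- mean deviation of observed
    insert sizes from the library's expected insert size. Large positive
    deviations suggest a deletion; large negative ones, an insertion."""
    mean = sum(inserts) / len(inserts)
    return bin_id, mean - expected

def run(pairs):
    """In-process stand-in for the Hadoop shuffle: group map outputs by
    key, then reduce each group."""
    groups = defaultdict(list)
    for pair in pairs:
        for key, value in map_read_pair(pair):
            groups[key].append(value)
    return dict(reduce_bin(b, v) for b, v in groups.items())

features = run([(10, 500), (20, 520), (950, 300)])
```

Because each bin's feature depends only on the reads mapped to it, the computation shards cleanly across a cluster, and the per-bin features can feed a downstream learner such as the conditional random field mentioned above.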
Efficient Latent-Variable Grammars: Learning and Inference
Syntactic analysis is important for many natural language processing
(NLP) tasks, but constituency parsing is computationally
expensive, often prohibitively so. Consumers who would be best
served by constituency parsing are often forced by resource
constraints to settle for less effective approaches. In this work, we
examine the barriers to efficient context-free processing, and present
several approaches to improve throughput and latency.
In this talk, I present several methods of incorporating efficiency concerns
into the process of training latent-variable PCFGs, and experimental
trials demonstrating the effects of these approaches.
I explore the characteristics of a grammar that impact efficient
inference, and present a regression model predicting inference time from
those characteristics. I integrate that predictive model into
latent-variable grammar learning, combining predicted accuracy and
efficiency into a joint objective function, and optimizing the set of
retained state-splits according to that objective.
In aggregate, these methods achieve a speedup of approximately
20x for Viterbi search, and 3x for alternate decoding methods.
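The joint objective over state-splits can be sketched as accuracy gain minus a penalty from the inference-time regression. The grammar characteristics, regression weights, and trade-off constant below are hypothetical placeholders for the learned model:

```python
def predicted_time(features, time_weights):
    """Linear regression prediction of inference time from grammar
    characteristics (e.g. counts of active rules or states)."""
    return sum(time_weights.get(k, 0.0) * v for k, v in features.items())

def joint_objective(split, time_weights, lam=0.5):
    """Predicted accuracy gain of a state-split, penalized by its
    predicted contribution to inference time."""
    return split["accuracy_gain"] - lam * predicted_time(
        split["features"], time_weights)

def select_splits(candidates, time_weights, budget):
    """Retain the state-splits that score best under the joint objective."""
    return sorted(candidates,
                  key=lambda s: joint_objective(s, time_weights),
                  reverse=True)[:budget]

time_weights = {"rules": 1.0}
candidates = [
    {"name": "split-a", "accuracy_gain": 1.0, "features": {"rules": 2.0}},
    {"name": "split-b", "accuracy_gain": 0.9, "features": {"rules": 0.2}},
]
kept = select_splits(candidates, time_weights, budget=1)
```

Note the selection effect: split-a has the larger raw accuracy gain, but split-b wins once predicted inference cost is folded into the objective, which is exactly the trade-off the grammar learner optimizes.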
Emotions and Prosodic Atypicality in the Speech of Children with Autism
We report work on two aspects of the speech of children with Autism Spectrum Disorders (ASD). First, I report my work on automatic measurement of affective valence and arousal, which was part of a larger study aimed at finding manifestations of neurological underconnectivity in the emotional expression of children with ASD. Second, I describe ongoing work aimed at collecting perceptual ratings characterizing the atypicality of their speech prosody (i.e. intonation, rhythm, loudness).