Saul, Lawrence K

Publications (showing 10 of 17)
  • Publication
    Real-Time Pitch Determination of One or More Voices by Nonnegative Matrix Factorization
    (2004-12-13) Sha, Fei; Saul, Lawrence K
    An auditory "scene", composed of overlapping acoustic sources, can be viewed as a complex object whose constituent parts are the individual sources. Pitch is known to be an important cue for auditory scene analysis. In this paper, with the goal of building agents that operate in human environments, we describe a real-time system to identify the presence of one or more voices and compute their pitch. The signal processing in the front end is based on instantaneous frequency estimation, a method for tracking the partials of voiced speech, while the pattern-matching in the back end is based on nonnegative matrix factorization, an unsupervised algorithm for learning the parts of complex objects. While supporting a framework to analyze complicated auditory scenes, our system maintains real-time operability and state-of-the-art performance in clean speech.
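    The back end's pattern matcher is NMF; below is a minimal sketch of the standard Lee-Seung multiplicative updates for the least-squares objective (illustrative, not the paper's exact factorization of voiced partials):

        import numpy as np

        def nmf(V, r, n_iter=200, eps=1e-9):
            """Factor a nonnegative matrix V (frequency bins x frames) as W @ H."""
            m, n = V.shape
            rng = np.random.default_rng(0)
            W = rng.random((m, r))    # basis spectra, e.g. one per pitch candidate
            H = rng.random((r, n))    # per-frame activations
            for _ in range(n_iter):
                H *= (W.T @ V) / (W.T @ W @ H + eps)   # multiplicative update for H
                W *= (V @ H.T) / (W @ H @ H.T + eps)   # multiplicative update for W
            return W, H

    Reading off which basis columns dominate each column of H would then yield per-frame pitch hypotheses.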
  • Publication
    Exploratory analysis and visualization of speech and music by locally linear embedding
    (2004-05-17) Jain, Viren; Saul, Lawrence K
    Many problems in voice recognition and audio processing involve feature extraction from raw waveforms. The goal of feature extraction is to reduce the dimensionality of the audio signal while preserving the informative signatures that, for example, distinguish different phonemes in speech or identify particular instruments in music. If the acoustic variability of a data set is described by a small number of continuous features, then we can imagine the data as lying on a low dimensional manifold in the high dimensional space of all possible waveforms. Locally linear embedding (LLE) is an unsupervised learning algorithm for feature extraction in this setting. In this paper, we present results from the exploratory analysis and visualization of speech and music by LLE.
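    A hedged sketch of this kind of analysis, using scikit-learn's LLE on framed waveforms (the frame length, hop, and neighborhood size here are illustrative, not the paper's settings):

        import numpy as np
        from sklearn.manifold import LocallyLinearEmbedding

        def embed_frames(signal, frame_len=400, hop=160, n_neighbors=12):
            """Slice a waveform into overlapping frames and embed them with LLE."""
            frames = np.stack([signal[i:i + frame_len]
                               for i in range(0, len(signal) - frame_len, hop)])
            lle = LocallyLinearEmbedding(n_neighbors=n_neighbors, n_components=2)
            return lle.fit_transform(frames)   # one 2-D point per frame, for plotting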
  • Publication
    Visualization of Low Dimensional Structure in Tonal Pitch Space
    (2005-09-06) Burgoyne, J. Ashley; Saul, Lawrence K
    In his 2001 monograph Tonal Pitch Space, Fred Lerdahl defined a distance function over tonal and post-tonal harmonies distilled from years of research on music cognition. Although this work references the toroidal structure commonly associated with harmonic space, it stops short of presenting an explicit embedding of this torus. It is possible to use statistical techniques to recreate such an embedding from the distance function, yielding a more complex structure than the standard toroidal model has heretofore assumed. Nonlinear techniques can reduce the dimensionality of this structure and be tuned to emphasize global or local anatomy. The resulting manifolds highlight the relationships inherent in the tonal system and offer a basis for future work in machine-assisted analysis and music theory.
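    One standard way to recreate an embedding from such a distance function is classical multidimensional scaling; a minimal sketch, assuming the pairwise distance matrix D over Lerdahl's space is given (the paper's own pipeline may differ):

        import numpy as np

        def classical_mds(D, k=3):
            """Embed points in R^k from a matrix D of pairwise distances."""
            n = D.shape[0]
            J = np.eye(n) - np.ones((n, n)) / n     # centering matrix
            B = -0.5 * J @ (D ** 2) @ J             # double-centered Gram matrix
            w, V = np.linalg.eigh(B)
            idx = np.argsort(w)[::-1][:k]           # top-k eigenpairs
            return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

    Nonlinear methods such as Isomap or LLE can then be applied to trade global structure against local detail, as the abstract describes.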
  • Publication
    Multiplicative Updates for Classification by Mixture Models
    (2001-12-03) Saul, Lawrence K; Lee, Daniel D
    We investigate a learning algorithm for the classification of nonnegative data by mixture models. Multiplicative update rules are derived that directly optimize the performance of these models as classifiers. The update rules have a simple closed form and an intuitive appeal. Our algorithm retains the main virtues of the Expectation-Maximization (EM) algorithm—its guarantee of monotonic improvement, and its absence of tuning parameters—with the added advantage of optimizing a discriminative objective function. The algorithm reduces as a special case to the method of generalized iterative scaling for log-linear models. The learning rate of the algorithm is controlled by the sparseness of the training data. We use the method of nonnegative matrix factorization (NMF) to discover sparse distributed representations of the data. This form of feature selection greatly accelerates learning and makes the algorithm practical on large problems. Experiments show that discriminatively trained mixture models lead to much better classification than comparably sized models trained by EM.
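    The abstract notes that the algorithm reduces, as a special case, to generalized iterative scaling (GIS) for log-linear models; a minimal GIS sketch for nonnegative features (strictly, GIS adds a slack feature so every row sums exactly to C; that refinement is omitted here):

        import numpy as np

        def gis(F, y, n_classes, n_iter=100, eps=1e-12):
            """Generalized iterative scaling for a log-linear classifier.
            F: (n_examples, n_features) nonnegative features; y: integer labels."""
            n, d = F.shape
            C = F.sum(axis=1).max()            # GIS constant bounding feature sums
            lam = np.zeros((n_classes, d))     # one weight vector per class
            emp = np.zeros_like(lam)           # empirical feature counts per class
            np.add.at(emp, y, F)
            for _ in range(n_iter):
                logits = F @ lam.T
                p = np.exp(logits - logits.max(axis=1, keepdims=True))
                p /= p.sum(axis=1, keepdims=True)
                exp = p.T @ F                  # model-expected feature counts
                lam += np.log((emp + eps) / (exp + eps)) / C
            return lam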
  • Publication
    Statistical signal processing with nonnegativity constraints
    (2003-09-01) Saul, Lawrence K; Sha, Fei; Lee, Daniel D
    Nonnegativity constraints arise frequently in statistical learning and pattern recognition. Multiplicative updates provide natural solutions to optimizations involving these constraints. One well known set of multiplicative updates is given by the Expectation-Maximization algorithm for hidden Markov models, as used in automatic speech recognition. Recently, we have derived similar algorithms for nonnegative deconvolution and nonnegative quadratic programming. These algorithms have applications to low-level problems in voice processing, such as fundamental frequency estimation, as well as high-level problems, such as the training of large margin classifiers. In this paper, we describe these algorithms and the ideas that connect them.
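    For the nonnegative deconvolution case, a minimal sketch of a multiplicative update in the Lee-Seung style for nonnegative least squares (an assumed form; the paper's derivation may differ in detail):

        import numpy as np

        def nn_deconv(A, b, n_iter=500, eps=1e-9):
            """Minimize ||A x - b||^2 subject to x >= 0, with A and b nonnegative,
            e.g. deconvolving a signal into delayed copies of a template."""
            x = np.ones(A.shape[1])            # strictly positive initialization
            Atb, AtA = A.T @ b, A.T @ A
            for _ in range(n_iter):
                x *= Atb / (AtA @ x + eps)     # monotone multiplicative update
            return x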
  • Publication
    Multiplicative Updates for Large Margin Classifiers
    (2003-08-24) Saul, Lawrence K; Sha, Fei; Lee, Daniel D
    Various problems in nonnegative quadratic programming arise in the training of large margin classifiers. We derive multiplicative updates for these problems that converge monotonically to the desired solutions for hard and soft margin classifiers. The updates differ strikingly in form from other multiplicative updates used in machine learning. In this paper, we provide complete proofs of convergence for these updates and extend previous work to incorporate sum and box constraints in addition to nonnegativity.
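    A sketch of the core update as I read it from this line of work, for minimizing (1/2) v'Av + b'v over v >= 0, with A split into its positive and negative parts (the sum and box constraints the paper adds are omitted):

        import numpy as np

        def nqp_update(A, b, v, eps=1e-12):
            """One multiplicative update for min 0.5 v.A.v + b.v subject to v >= 0."""
            Ap = np.maximum(A, 0)              # positive part of A
            Am = np.maximum(-A, 0)             # negative part of A
            a, c = Ap @ v, Am @ v
            # closed-form multiplicative factor; monotonically decreases the objective
            return v * (-b + np.sqrt(b * b + 4 * a * c)) / (2 * a + eps)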
  • Publication
    Global Coordination of Local Linear Models
    (2001-12-03) Roweis, Sam; Saul, Lawrence K; Hinton, Geoffrey E
    High dimensional data that lies on or near a low dimensional manifold can be described by a collection of local linear models. Such a description, however, does not provide a global parameterization of the manifold—arguably an important goal of unsupervised learning. In this paper, we show how to learn a collection of local linear models that solves this more difficult problem. Our local linear models are represented by a mixture of factor analyzers, and the “global coordination” of these models is achieved by adding a regularizing term to the standard maximum likelihood objective function. The regularizer breaks a degeneracy in the mixture model’s parameter space, favoring models whose internal coordinate systems are aligned in a consistent way. As a result, the internal coordinates change smoothly and continuously as one traverses a connected path on the manifold—even when the path crosses the domains of many different local models. The regularizer takes the form of a Kullback-Leibler divergence and illustrates an unexpected application of variational methods: not to perform approximate inference in intractable probabilistic models, but to learn more useful internal representations in tractable ones.
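    Schematically, the coordinated objective the abstract describes might be written as follows (my notation, not the paper's; g denotes global manifold coordinates and s the local model index):

        \max_\theta \sum_n \Big[ \log p(x_n \mid \theta)
            - \mathrm{KL}\big( q(g, s \mid x_n) \,\|\, p(g, s \mid x_n, \theta) \big) \Big]

    with q constrained to a family under which the global coordinates g agree across local models, so the KL term penalizes misaligned internal coordinate systems.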
  • Publication
    Nonnegative deconvolution for time of arrival estimation
    (2004-05-17) Lin, Yuanqing; Lee, Daniel D; Saul, Lawrence K
    The interaural time difference (ITD) of arrival is a primary cue for acoustic sound source localization. Traditional estimation techniques for ITD based upon cross-correlation are related to maximum-likelihood estimation of a simple generative model. We generalize the time difference estimation into a deconvolution problem with nonnegativity constraints. The resulting nonnegative least squares optimization can be efficiently solved using a novel iterative algorithm with guaranteed global convergence properties. We illustrate the utility of this algorithm using simulations and experimental results from a robot platform.
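    A hedged sketch of the setup, with scipy's off-the-shelf NNLS solver standing in for the paper's iterative algorithm: columns of A hold delayed copies of the reference signal, and the recovered nonnegative lag spectrum peaks at the time of arrival.

        import numpy as np
        from scipy.optimize import nnls

        def estimate_delay(ref, mic, max_lag):
            """Estimate arrival delay (in samples) by nonnegative deconvolution.
            ref and mic are assumed to be equal-length 1-D arrays."""
            n = len(mic)
            A = np.zeros((n, max_lag + 1))
            for lag in range(max_lag + 1):
                A[lag:, lag] = ref[:n - lag]   # column = ref delayed by lag samples
            x, _ = nnls(A, mic)                # sparse nonnegative lag spectrum
            return int(np.argmax(x))           # dominant delay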
  • Publication
    Multiplicative updates for nonnegative quadratic programming in support vector machines
    (2002-12-10) Sha, Fei; Saul, Lawrence K; Lee, Daniel D
    We derive multiplicative updates for solving the nonnegative quadratic programming problem in support vector machines (SVMs). The updates have a simple closed form, and we prove that they converge monotonically to the solution of the maximum margin hyperplane. The updates optimize the traditionally proposed objective function for SVMs. They do not involve any heuristics such as choosing a learning rate or deciding which variables to update at each iteration. They can be used to adjust all the quadratic programming variables in parallel with a guarantee of improvement at each iteration. We analyze the asymptotic convergence of the updates and show that the coefficients of non-support vectors decay geometrically to zero at a rate that depends on their margins. In practice, the updates converge very rapidly to good classifiers.
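    Specialized to the SVM dual, the multiplicative update sketched under the large-margin paper above applies with A_ij = y_i y_j K(x_i, x_j) and b = -1; a self-contained usage sketch (the equality constraint arising from the bias term is omitted):

        import numpy as np

        def svm_dual_multiplicative(K, y, n_iter=1000, eps=1e-12):
            """Hard-margin SVM dual coefficients via multiplicative NQP updates.
            K: kernel Gram matrix; y: labels in {-1, +1}."""
            A = np.outer(y, y) * K             # quadratic term of the dual
            b = -np.ones(len(y))               # linear term (maximize sum of alphas)
            Ap, Am = np.maximum(A, 0), np.maximum(-A, 0)
            alpha = np.ones(len(y))            # strictly positive initialization
            for _ in range(n_iter):
                a, c = Ap @ alpha, Am @ alpha
                alpha *= (-b + np.sqrt(b * b + 4 * a * c)) / (2 * a + eps)
            return alpha                       # non-support coefficients decay to zero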
  • Publication
    Multiband statistical learning for f0 estimation in speech
    (2004-05-17) Sha, Fei; Burgoyne, J. Ashley; Saul, Lawrence K
    We investigate a simple algorithm that combines multiband processing and least squares fits to estimate f0 contours in speech. The algorithm is untraditional in several respects: it makes no use of FFTs or autocorrelation at the pitch period; it updates the pitch incrementally on a sample-by-sample basis; it avoids peak picking and does not require interpolation in time or frequency to obtain high resolution estimates; and it works reliably, in real time, without the need for postprocessing to produce smooth contours. We show that a baseline implementation of the algorithm, though already quite accurate, is significantly improved by incorporating a model of statistical learning into its final stages. Model parameters are estimated from training data to minimize the likelihood of gross errors in f0 as well as errors in classifying voiced versus unvoiced speech. Experimental results on several databases confirm the benefits of statistical learning.
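    One least-squares frequency estimator consistent with this description (my illustration, not necessarily the paper's exact fit): a pure sinusoid satisfies x[n+1] + x[n-1] = 2 cos(w) x[n], so cos(w) can be fit per band by least squares, and the underlying sums can be updated sample by sample.

        import numpy as np

        def lsq_freq(x, fs):
            """Least-squares frequency (Hz) of a bandpassed segment x.
            Fits x[n+1] + x[n-1] ~ 2*cos(w)*x[n]; no FFT, no peak picking."""
            num = np.dot(x[1:-1], x[2:] + x[:-2])
            den = 2.0 * np.dot(x[1:-1], x[1:-1])
            c = np.clip(num / den, -1.0, 1.0)   # guard against noise pushing |c| > 1
            return np.arccos(c) * fs / (2.0 * np.pi)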