Analysis and Resynthesis:

The Application of Heterodyne Filter Analysis and Linear Predictive Coding

using cSound's ADSYN, LPREAD, and LPRESON Opcodes


Earlier attempts at Computer-based Spectrum Analysis

Pitch-synchronous analysis

  • developed at Bell Telephone Labs by Max Mathews and Jean-Claude Risset (1961)
  • breaks input waveform into pseudoperiodic segments and then estimates the pitch of each pseudoperiodic segment
  • size of analysis segment is adjusted relative to the estimated pitch period
  • harmonic Fourier spectrum is then calculated on the analysis segment as though the sound were periodic (as if the pitch is quasi-constant throughout the analysis segment)
  • this program generated time-varying amplitude functions for each harmonic of a given fundamental
  • additional approaches to pitch-synchronous analysis were developed at MIT in 1963 (Luce)

Heterodyne Filter Analysis - the next step in computer-based spectrum analysis after pitch-synchronous approaches

About it, assumptions, advantages

  • heterodyne filter analysis multiplies an input waveform by a sine and cosine wave at each harmonic frequency and then sums the results over a short time period to obtain amplitude and phase data
  • Creates a break-point function for each harmonic
  • implies that fundamental frequency is estimated in a prior stage of analysis
  • good for resolving harmonics of a given fundamental frequency
  • Calculates relative energy at each harmonic
  • Similar to STFT, analyzes signal at multiple, evenly-spaced time points

How it Works

  • the input signal is multiplied by analysis sine and cosine waves, and the products are summed over a short time window
  • the multiplication is carried out at the given fundamental frequency and then again at each of its harmonics. For each frequency, the summed products yield the amplitude and frequency values for that harmonic
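The multiply-and-sum step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used by hetro: the function names, the one-period analysis window, and the test tone are all hypothetical, and the fundamental is assumed to be known in advance.

```python
import math

def heterodyne_partial(signal, sample_rate, freq, start, length):
    """Estimate amplitude and phase of one partial over a short window.

    The window signal[start:start+length] is multiplied by a cosine and a
    sine at `freq`, and the products are averaged.
    """
    cos_sum = 0.0
    sin_sum = 0.0
    for i in range(start, start + length):
        angle = 2.0 * math.pi * freq * i / sample_rate
        cos_sum += signal[i] * math.cos(angle)
        sin_sum += signal[i] * math.sin(angle)
    # Averaging the product of two equal-frequency sinusoids gives half
    # the amplitude, hence the factor of 2.
    a = 2.0 * cos_sum / length
    b = 2.0 * sin_sum / length
    amplitude = math.hypot(a, b)
    phase = math.atan2(-b, a)  # phase of amplitude*cos(2*pi*freq*t + phase)
    return amplitude, phase

def heterodyne_analysis(signal, sample_rate, fundamental, n_harmonics, n_points):
    """Build one amplitude break-point list per harmonic.

    Assumption: the window is one period of the fundamental, and the
    analysis points are spread evenly over the signal.
    """
    length = round(sample_rate / fundamental)
    last_start = len(signal) - length
    tracks = []
    for h in range(1, n_harmonics + 1):
        track = []
        for p in range(n_points):
            start = (last_start * p) // max(n_points - 1, 1)
            amp, _ = heterodyne_partial(signal, sample_rate,
                                        h * fundamental, start, length)
            track.append((start / sample_rate, amp))
        tracks.append(track)
    return tracks

# Hypothetical test tone: 100 Hz fundamental plus a weaker second harmonic.
SR = 8000
tone = [0.5 * math.cos(2 * math.pi * 100 * n / SR + 0.3)
        + 0.25 * math.cos(2 * math.pi * 200 * n / SR)
        for n in range(800)]
tracks = heterodyne_analysis(tone, SR, 100.0, 3, 5)
```

Because the window spans exactly one period of the fundamental, the energy of each harmonic separates cleanly. If the true pitch drifts away from the assumed fundamental, the window no longer holds whole periods and the estimates smear.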




Demise of Heterodyne Filter Analysis and the rise of phase vocoders and Additive Resynthesis

  • Problems with Heterodyne Filter Analysis
    1. confused by transients
    2. confused by pitch changes of more than a quarter tone (glissandi, vibrato, portamento)

  • Heterodyne Filter Analysis supplanted by:
    1. the phase vocoder
    2. additive resynthesis - a tracking version of heterodyne filter analysis that could follow changing frequency and amplitude trajectories (but required a lot of processing power)


Additive Resynthesis in cSound: The HETRO Utility and ADSYN opcode


csound -U hetro [flags] infilename outfilename

infilename
input audio file to be analyzed
outfilename
output file name (usually adsyn.nnn)

Flags:
-s rate
sampling rate of the audio input file.
-c channel
channel number sought. Default is 1.
-b begin
beginning time (in seconds). Default is 0.0
-d duration
duration (in seconds). Default is to the end of the file.
-f begfreq
estimated frequency of the fundamental. Default is 100Hz.
-h partials
number of harmonic partials sought in the audio file. Default is 10, maximum is a function of memory available.
-M maxamp
maximum amplitude summed across all concurrent tracks. The default is 32767.
-m minamp
amplitude threshold. (Harmonics with amplitudes below this won’t contribute.)
-n brkpts
initial number of analysis breakpoints in each amplitude and frequency track, prior to thresholding (-m) and linear breakpoint consolidation. The initial points are spread evenly over the duration. The default is 256.
-l cutfreq
substitute a 3rd order Butterworth low-pass filter with cutoff frequency cutfreq (in Hz), in place of the default comb filter.

ADSYN syntax

 ar adsyn kamod, kfmod, ksmod, ifilcod 

kamod
relative amplitude variable (1 = no change)
kfmod
pitch control variable (1 = no change)
ksmod
speed control variable (1 = no change)
ifilcod
adsyn.nnn or pathname in quotes
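To make the three control variables concrete, here is a rough Python sketch of a break-point-driven oscillator bank. It is an illustration under stated assumptions, not adsyn's implementation: the function names, the data layout (lists of (time, value) pairs in seconds), and linear interpolation between break-points are all hypothetical choices, and the amod, fmod, and smod arguments merely mirror the roles of kamod, kfmod, and ksmod.

```python
import math

def interpolate(points, t):
    """Linear interpolation of a (time, value) break-point list at time t."""
    if t <= points[0][0]:
        return points[0][1]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return points[-1][1]

def additive_resynth(partials, sample_rate, duration,
                     amod=1.0, fmod=1.0, smod=1.0):
    """Sum one sine oscillator per partial.

    partials: list of (amp_points, freq_points) break-point lists.
    amod scales amplitude, fmod scales frequency, and smod scales the
    speed at which the break-point data is traversed (1 = no change).
    """
    n_samples = int(duration * sample_rate)
    phases = [0.0] * len(partials)
    out = []
    for n in range(n_samples):
        t = smod * n / sample_rate  # position within the analysis data
        sample = 0.0
        for k, (amp_pts, freq_pts) in enumerate(partials):
            amp = amod * interpolate(amp_pts, t)
            freq = fmod * interpolate(freq_pts, t)
            sample += amp * math.sin(phases[k])
            phases[k] += 2.0 * math.pi * freq / sample_rate
        out.append(sample)
    return out

# Hypothetical single partial: constant amplitude 1.0 at 100 Hz.
steady = [([(0.0, 1.0), (1.0, 1.0)], [(0.0, 100.0), (1.0, 100.0)])]
normal = additive_resynth(steady, 8000, 0.01)
octave_up = additive_resynth(steady, 8000, 0.01, fmod=2.0)
half_amp = additive_resynth(steady, 8000, 0.01, amod=0.5)
```

Since pitch, loudness, and speed are applied at separate points in the loop, each one can be changed without disturbing the other two.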

Legal Output File Formats

An adsyn control file could have the following format:
  • -1 time1 value1 ... timeK valueK 32767 ; amplitude breakpoints for partial 1
  • -2 time1 value1 ... timeL valueL 32767 ; frequency breakpoints for partial 1
  • -1 time1 value1 ... timeM valueM 32767 ; amplitude breakpoints for partial 2
  • -2 time1 value1 ... timeN valueN 32767 ; frequency breakpoints for partial 2
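The track layout above can be read with a small parser. The following Python sketch assumes the file contents have already been decoded into a flat list of integers; the function name and the returned structure are hypothetical, and the sketch does not attempt to read an actual hetro output file from disk.

```python
def parse_adsyn_tracks(values):
    """Split a flat sequence of integers into break-point tracks.

    Each track starts with -1 (amplitude) or -2 (frequency), continues
    with time/value pairs, and ends with 32767.
    """
    END = 32767
    tracks = []
    i = 0
    while i < len(values):
        flag = values[i]
        if flag not in (-1, -2):
            raise ValueError(f"expected -1 or -2 at index {i}, got {flag}")
        kind = "amplitude" if flag == -1 else "frequency"
        i += 1
        points = []
        # The terminator is only looked for where a time is expected,
        # so a break-point value of 32767 is not mistaken for it.
        while i < len(values) and values[i] != END:
            if i + 1 >= len(values):
                raise ValueError("time without a matching value")
            points.append((values[i], values[i + 1]))
            i += 2
        if i >= len(values):
            raise ValueError("track not terminated by 32767")
        i += 1  # skip the terminator
        tracks.append((kind, points))
    return tracks

# Made-up data for two partials (not taken from a real analysis file).
data = [-1, 0, 0, 10, 1000, 32767,
        -2, 0, 440, 10, 442, 32767,
        -1, 0, 0, 10, 500, 32767,
        -2, 0, 880, 10, 884, 32767]
parsed = parse_adsyn_tracks(data)
```

Consecutive amplitude and frequency tracks belong to the same partial, so pairing `parsed[0]` with `parsed[1]`, `parsed[2]` with `parsed[3]`, and so on recovers the per-partial data.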



Linear Predictive Coding (LPC)

About it, uses, summary

  • subtractive analysis/resynthesis method that has been extensively used in speech and music applications (especially speech applications)
  • basic idea - given N samples of a waveform, to what extent can the next samples be predicted? LPC attempts to find the best possible prediction of the next sample
  • Advantages:
    1. one can edit analysis data and resynthesize variations on the original input signal
    2. LPC separates the excitation signal from the resonance, making it possible to manipulate rhythm, pitch, & timbre independently



LPC and relationship to subtractive synthesis

  • subtractive synthesis starts with complex waveform from which certain frequencies are removed
  • starting waveshape may be periodic (harmonic), such as pulse, sawtooth, square, or triangular, or it may be random
  • THEN this waveform is filtered
  • This two-part system can be thought of as a source (=excitation) waveform followed by a modification (=resonance)
  • This is similar to natural acoustic sounds (e.g., source = bowed string = excitation waveform AND modification = body of violin = resonant system/filter)
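The source-plus-modification idea can be illustrated with a pulse train (excitation) fed through a single two-pole resonator (resonance). This Python sketch is a deliberately minimal stand-in: the function names and parameter values are hypothetical, and a real LPC filter would use many poles rather than one resonance.

```python
import math

def pulse_train(n_samples, period):
    """Excitation: a unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def two_pole_resonator(excitation, sample_rate, center_freq, radius):
    """Resonance: y[n] = x[n] + 2r*cos(theta)*y[n-1] - r^2*y[n-2].

    `radius` (below 1 for stability) sets how sharply the filter rings
    at `center_freq`.
    """
    theta = 2.0 * math.pi * center_freq / sample_rate
    b1 = 2.0 * radius * math.cos(theta)
    b2 = -radius * radius
    y1 = y2 = 0.0
    out = []
    for x in excitation:
        y = x + b1 * y1 + b2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

# 100 Hz pulse train at 8 kHz, shaped by a resonance near 500 Hz.
excitation = pulse_train(400, 80)
output = two_pole_resonator(excitation, 8000, 500.0, 0.98)
```

Changing `period` alters the pitch while leaving the resonance alone, and changing `center_freq` alters the timbre while leaving the pitch alone. This is the separation that LPC exploits.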



How LPC Works


 X[n-2]   2 samples before present sample
 X[n-1]   1 sample before present sample
 X[n]     present sample
 X[n+1]   future sample to be predicted


  1. each of the N samples is multiplied by a parameter (coefficient), and the products are added together to give the predicted value
  2. difference between the prediction and the actual next sample value is called the prediction error
  3. output samples are "predicted" by a linear combination of filter parameters (coefficients) and previous samples. A prediction algorithm tries to find samples at positions outside a region where one already has samples
  4. since the predictor is looking at sums and differences of time-delayed samples, it can be viewed as a filter
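The prediction and prediction-error steps listed above can be written out directly. In this Python sketch the function names are hypothetical and the coefficients are supplied by hand rather than estimated. For a pure sinusoid, the identity cos(w(n+1)) = 2cos(w)cos(wn) - cos(w(n-1)) gives a two-coefficient predictor that is exact, which makes a convenient check.

```python
import math

def predict_next(history, coeffs):
    """Predict the sample that follows `history`.

    coeffs[0] weights the most recent sample, coeffs[1] the one before
    it, and so on.
    """
    recent_first = history[::-1][:len(coeffs)]
    return sum(a * x for a, x in zip(coeffs, recent_first))

def prediction_errors(signal, coeffs):
    """Difference between each actual sample and its prediction."""
    order = len(coeffs)
    errors = []
    for n in range(order, len(signal)):
        predicted = predict_next(signal[n - order:n], coeffs)
        errors.append(signal[n] - predicted)
    return errors

# A sinusoid is perfectly predictable from its two previous samples.
w = 2.0 * math.pi * 440.0 / 8000.0
sine = [math.cos(w * n) for n in range(100)]
exact = prediction_errors(sine, [2.0 * math.cos(w), -1.0])

# A naive predictor ("next sample equals the current one") leaves a
# clearly nonzero error.
naive = prediction_errors(sine, [1.0, 0.0])
```

An LPC analysis chooses the coefficients so that the error is as small as possible over each frame. What remains in the error signal corresponds to the excitation.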



Overview of LPC Analysis: The four stages (or directions) of LPC Analysis

Each stage (or direction) of analysis is carried out on a frame-by-frame basis, where a frame is like a snapshot of the signal

  1. spectrum analysis in terms of formants
  2. pitch analysis
  3. amplitude analysis
  4. decision as to whether the sound was voiced (pitched) or unvoiced (characteristic of noise)



Links to resources regarding LPC

The links progress from basic definitions of LPC to more complex explanations utilizing some math.

  1. Short Definition of LPC from EARS
  2. Short Definition with hyperlinks
  3. One-page explanation with few hyperlinks
  4. One-page explanation with more hyperlinks
  5. LPC using cSound examples (courtesy of MIT and SFU)
  6. Mathematical Explanation
  7. Another Mathematical Explanation
  8. Excerpt from textbook used at Cornell
  9. Links to LPC programming resources



Linear Predictive Coding in cSound: the LPREAD, LPRESON, and LPFRESON opcodes



LPREAD

lpread
Reads a control file of time-ordered information frames

Syntax

 krmsr, krmso, kerr, kcps lpread ktimpnt, ifilcod [, inpoles] [, ifrmrate] 


Initialization

ifilcod
integer or character-string denoting a control-file (reflection coefficients and four parameter values) derived from n-pole linear predictive spectral analysis of a source audio signal. An integer denotes the suffix of a file lp.m; a character-string (in double quotes) gives a filename, optionally a full pathname. If not fullpath, the file is sought first in the current directory, then in that of the environment variable SADIR (if defined). Memory usage depends on the size of the file, which is held entirely in memory during computation but shared by multiple calls (see also adsyn, pvoc).
inpoles (optional, default=0)
number of poles in the lpc analysis. It is required only when the control file does not have a header; it is ignored when a header is detected
ifrmrate (optional, default=0)
frame rate per second in the lpc analysis. It is required only when the control file does not have a header; it is ignored when a header is detected

Performance

lpread accesses a control file of time-ordered information frames, each containing n-pole filter coefficients derived from linear predictive analysis of a source signal at fixed time intervals (e.g. 1/100 of a second), plus four parameter values:
  1. krmsr -- root-mean-square (rms) of the residual of analysis
  2. krmso -- rms of the original signal
  3. kerr -- the normalized error signal
  4. kcps -- pitch in Hz


lpread gets its values from the control file according to the input value ktimpnt (in seconds). If ktimpnt proceeds at the analysis rate, time-normal synthesis will result; proceeding at a faster, slower, or variable rate will result in time-warped synthesis. At each k-period, lpread interpolates between adjacent frames to more accurately determine the parameter values (presented as output) and the filter coefficient settings (passed internally to a subsequent lpreson).

ktimpnt
The passage of time, in seconds, through the analysis file. ktimpnt must always be positive, but can move forwards or backwards in time, be stationary or discontinuous, as a pointer into the analysis file.
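The way a time pointer selects and blends frames can be sketched as follows. This Python illustration assumes simple linear interpolation between adjacent frames and a made-up frame layout of [rmsr, rmso, err, cps]; the function name and data are hypothetical, and the sketch does not read a real lp file or reproduce lpread's internal handling of filter coefficients.

```python
def read_frame(frames, frame_rate, timpnt):
    """Return frame values at time `timpnt` (seconds).

    The pointer is clamped to the available frames, then the two
    neighbouring frames are blended linearly.
    """
    last = len(frames) - 1
    position = min(max(timpnt * frame_rate, 0.0), float(last))
    lower = int(position)
    upper = min(lower + 1, last)
    frac = position - lower
    return [a + frac * (b - a)
            for a, b in zip(frames[lower], frames[upper])]

# Made-up analysis data at 100 frames per second: [rmsr, rmso, err, cps].
frames = [[0.1, 1.0, 0.2, 220.0],
          [0.3, 2.0, 0.4, 230.0],
          [0.5, 3.0, 0.6, 240.0]]
FRAME_RATE = 100

halfway = read_frame(frames, FRAME_RATE, 0.005)

# Advancing the pointer at half the analysis rate stretches time: five
# control periods are needed to cover the three frames.
half_speed_cps = [read_frame(frames, FRAME_RATE, 0.5 * k / FRAME_RATE)[3]
                  for k in range(5)]
```

Holding the pointer still freezes the sound on one frame, and running it backwards plays the analysis in reverse. This is why ktimpnt is described as a pointer rather than a clock.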



LPRESON

lpreson
Modifies the spectrum of an audio signal with time-varying filter coefficients from a control file

Syntax

 ar lpreson asig 


Performance

lpreson receives its filter coefficients internally from a preceding lpread and uses them to implement an n-pole filter applied to asig. The frame that supplies the coefficients is chosen by lpread's ktimpnt input: if ktimpnt proceeds at the analysis rate, time-normal synthesis will result; proceeding at a faster, slower, or variable rate will result in time-warped synthesis.

asig
an audio signal to be modified



LPFRESON

lpfreson
Modifies the spectrum of an audio signal with time-varying filter coefficients from a control file and a frequency ratio

Syntax

 ar lpfreson asig, kfrqratio 


Performance

lpfreson receives its filter coefficients internally from a preceding lpread, which selects and interpolates frames according to its ktimpnt input. lpfreson is a formant-shifted lpreson, in which kfrqratio is the ratio of shifted to original formant frequencies; with kfrqratio = 1 it is equivalent to lpreson.

asig
an audio signal to be modified
kfrqratio
frequency ratio. Must be greater than 0


This page updated May 16, 2005



contact info: Jeremy Baguyos ( jbkontra@hotmail.com )