## Chapter 2

Before designing a voice enhancement system, we have to understand the behavior of the various types of noise, the differences between noise sources, and the range of noise levels that may be encountered in real life. Generally, noise refers to anything that interferes with the signal we want. Extracting a desired information signal from a background of unwanted noise poses many technical challenges, for example in building a cell phone system, robust voice recognition, ultrasound machines and others.

## Noise Sources

In real life, noise surrounds us wherever we go. It appears in different shapes and forms: for example, cars passing by in the street, or people talking at a nearby table in a restaurant.

In general, noise can be divided into two types: stationary and non-stationary. Stationary noise has a power spectral density that does not change over time, for example the noise made by a computer fan or an air conditioner. Non-stationary noise has statistical properties that change over time, for example the noise caused by a door slam, a radio, voices and TV. Because its statistics keep changing, suppressing non-stationary noise is clearly more difficult than suppressing stationary noise.

Another characteristic of the various types of noise is the shape of the spectrum, that is, the distribution of noise energy in the frequency domain. Comparing the long-term average spectra of restaurant, car and train noise taken from the NOIZEUS corpus, we find that the car noise is relatively stationary but the restaurant and train noise are not. The noise sources are more distinct in the frequency domain than in the time domain.

Normally, two main sources of distortion degrade the voice signal: additive noise and channel distortion. Additive noise can be categorised as stationary or non-stationary, as explained above; examples include a fan running in the background, a door slam and a nearby conversation. If the signal is captured with the speaker close to the microphone, it contains little noise and reverberation. However, if the microphone is far from the speaker's mouth, it can pick up a lot of noise and reverberation.

Channel distortion can be caused by reverberation, the frequency response of a microphone, the response of the local loop of a telephone line, the presence of an electrical filter in an ADC circuit, and a speech codec. Reverberation is caused by the reflection of acoustic waves off the walls and other objects in the room, which alters the speech signal. For the direct path, the signal level at the microphone is inversely proportional to the distance from the speaker. For the reflected sound waves, the signal level is inversely proportional to the distance the sound travels. In addition, we have to take into account the energy absorption that takes place each time the sound wave hits a surface, where different surface materials absorb different amounts of energy.
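The distance and absorption relations above can be turned into a small numerical sketch. The function names, the 1 m reference distance and the absorption coefficient of 0.3 are illustrative assumptions, not values from the text; absorption is modelled as a fraction of energy lost per surface hit.

```python
import math

def relative_level_db(distance_m, reference_distance_m=1.0):
    """Level change in dB for a signal whose amplitude falls off as 1/distance."""
    return 20.0 * math.log10(reference_distance_m / distance_m)

def reflected_level_db(path_length_m, absorption_coeffs, reference_distance_m=1.0):
    """Level in dB of a reflected wave: 1/path-length spreading plus the energy
    lost at each surface hit (hypothetical absorption coefficients, 0 <= alpha < 1)."""
    level = relative_level_db(path_length_m, reference_distance_m)
    for alpha in absorption_coeffs:
        # A surface that absorbs a fraction alpha of the energy reflects (1 - alpha).
        level += 10.0 * math.log10(1.0 - alpha)
    return level

direct = relative_level_db(2.0)             # microphone 2 m from the speaker
reflected = reflected_level_db(6.0, [0.3])  # 6 m path with one wall reflection
print(f"direct: {direct:.1f} dB, reflected: {reflected:.1f} dB")
```

Under these assumptions, every doubling of distance costs about 6 dB, and each reflection lowers the level further by an amount that depends on the surface material.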

## 2.1.2 Noise and Speech Levels in Various Environments

To design speech enhancement algorithms, we need to know the range of speech and noise intensity levels in real-world scenarios in order to estimate the range of signal-to-noise ratio (SNR) levels encountered in realistic environments. The speech enhancement algorithms have to work effectively in suppressing noise, and hence in improving speech quality, within that range of SNR levels.

In 1977, Pearsons and co-workers (Pearsons, Bennett & Fidell, 1977) carried out a comprehensive analysis in which they measured speech and noise levels in real-world environments. They considered a variety of environments encountered in daily life, such as outside and inside the home, commuter trains, nursing stations and department stores. This analysis provided an important corpus of data on typical speech levels, noise levels and SNRs across a wide variety of everyday listening situations. The speech and noise levels were measured using sound level meters. The measurements were reported in dB sound pressure level (SPL), where dB SPL is the pressure of the sound relative to 0.0002 dynes/cm², which corresponds to the barely audible sound pressure.
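As a minimal sketch of the dB SPL definition, the conversion between sound pressure and level can be written as follows. The function names are illustrative; the reference pressure is the 0.0002 dynes/cm² quoted above (equal to 20 micropascals).

```python
import math

P_REF_DYN_PER_CM2 = 0.0002  # reference pressure from the text

def spl_db(pressure_dyn_per_cm2):
    """Sound pressure level in dB SPL for a pressure given in dynes/cm^2."""
    return 20.0 * math.log10(pressure_dyn_per_cm2 / P_REF_DYN_PER_CM2)

def pressure_from_spl(level_db):
    """Inverse conversion: pressure in dynes/cm^2 for a level in dB SPL."""
    return P_REF_DYN_PER_CM2 * 10.0 ** (level_db / 20.0)

print(spl_db(0.002))            # ten times the reference pressure
print(pressure_from_spl(60.0))  # pressure of a 60 dB SPL sound
```

A tenfold increase in pressure raises the level by 20 dB, so the reference pressure itself sits at 0 dB SPL.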

http://www.audiologyonline.com/management/uploads/articles/schum_table1.gif

Figure 2.1 Average speech and noise levels in a variety of environments, from Pearsons et al. (1977).

The figure above summarises the average speech and noise levels measured in various environments in Pearsons's analysis. It shows that speech levels increase when the background noise level is very high. In general, people tend to raise their voices when the noise level goes beyond 45 dB SPL; this phenomenon is known as the Lombard effect. The speech level tends to increase by 0.5 dB for every 1 dB increase in background noise. When the ambient noise level goes beyond 70 dB SPL, people stop raising their voices. In practical environments, speech enhancement algorithms are therefore required to operate at SNRs in the range of 0-15 dB.
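The Lombard relationship described above can be sketched as a piecewise-linear model. The 45 dB and 70 dB breakpoints and the 0.5 dB/dB slope come from the text; the quiet speech level of 55 dB SPL is a hypothetical baseline chosen only for illustration, not a measured value.

```python
def speech_level_db(noise_db, quiet_speech_db=55.0):
    """Speech level in dB SPL: flat below 45 dB noise, rising 0.5 dB per dB
    of noise between 45 and 70 dB, and flat again above 70 dB."""
    effective_noise = min(max(noise_db, 45.0), 70.0)
    return quiet_speech_db + 0.5 * (effective_noise - 45.0)

def snr_db(noise_db, quiet_speech_db=55.0):
    """Resulting signal-to-noise ratio in dB."""
    return speech_level_db(noise_db, quiet_speech_db) - noise_db

for noise in (40, 45, 55, 65, 70, 75):
    print(noise, speech_level_db(noise), snr_db(noise))
```

Because speech rises by only half of each noise increase, the SNR in this model keeps falling as the noise grows, which is why enhancement algorithms must cope with low SNRs.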

## Speech Perception and Modelling

## 2.2.1 Speech Perception

The processes by which humans are able to interpret and understand the sounds used in language are called speech perception. Speech perception is closely linked to the fields of phonology and phonetics. Many speech perception researchers have sought to understand how humans recognize speech sounds and use this information to understand spoken language. This research has been applied in building computer systems that can recognize speech and convey a meaningful signal, as well as in improving recognition for hearing-impaired listeners. The figure below shows the human speech production organs. Many biological and psychological factors can affect speech, including disorders of the lungs, the larynx and the vocal cords.

http://www.barcode.ro/tutorials/biometrics/img/speech-production.jpg

Figure 2.2 Human speech production organs

## 2.2.2 Engineering Model of Speech Production

Nowadays, many electronic devices can speak to us with a voice that is as similar as possible to a real human voice. The figure below shows one model of voice production.

http://www.gtsav.gatech.edu/vapl/images/speechmodel.gif

Figure 2.3 Source-tract model of speech production

We use this model because it has been used extensively in low-bit-rate speech coding applications. To use it, we first decide whether the sound we want to produce is voiced or unvoiced. Voiced sounds are produced when the vocal folds are in the voicing state, whereas unvoiced sounds are produced when the vocal folds are in the unvoicing state. For voiced sounds, we model a glottal pulse train resembling the one produced by our vocal cords. For unvoiced sounds, the excitation is a noise-like signal, as found in fricative sounds. The generated signal is then passed through the vocal tract.

In this model, vocal tract resonance is represented by a quasi-linear system that is excited by either a periodic or an aperiodic source, depending on the state of the vocal folds. The fact that the vocal folds can assume one of two states is modelled by a switch. The vocal tract is modelled by a time-invariant linear filter, which tries to mimic the effect of the shape formed by the pharyngeal cavity (pharynx), the oral cavity and the nasal cavity. Lastly, the output of the vocal tract filter is fed to the radiation model, which reproduces the effect of the radiation impedance that the air presents to the speech as it exits the mouth.
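The chain of excitation switch, vocal tract filter and radiation model can be sketched in a few lines. This is a deliberately simplified illustration, not the model in the figure: the single 500 Hz resonance, the 100 Hz pitch, the 8 kHz sampling rate and the first-difference radiation stage are all assumptions chosen for brevity.

```python
import numpy as np

def excitation(voiced, n_samples, fs=8000, f0=100.0, seed=0):
    """Switch between a pulse train (voiced) and white noise (unvoiced)."""
    if voiced:
        period = int(round(fs / f0))
        e = np.zeros(n_samples)
        e[::period] = 1.0  # one pulse per pitch period
        return e
    return np.random.default_rng(seed).standard_normal(n_samples)

def vocal_tract(e, fs=8000, formant_hz=500.0, bandwidth_hz=100.0):
    """Time-invariant all-pole filter with a single resonance (one formant)."""
    r = np.exp(-np.pi * bandwidth_hz / fs)  # pole radius, below 1 so the filter is stable
    a1 = 2.0 * r * np.cos(2.0 * np.pi * formant_hz / fs)
    a2 = -r * r
    y = np.zeros(len(e))
    for n in range(len(e)):
        y[n] = e[n]
        if n >= 1:
            y[n] += a1 * y[n - 1]
        if n >= 2:
            y[n] += a2 * y[n - 2]
    return y

def radiation(y):
    """First-difference approximation of the radiation at the lips."""
    return np.diff(y, prepend=0.0)

def synthesize(voiced, n_samples=800, fs=8000):
    """Excitation -> vocal tract filter -> radiation model."""
    return radiation(vocal_tract(excitation(voiced, n_samples, fs), fs))

voiced_sound = synthesize(True)
unvoiced_sound = synthesize(False)
```

A realistic vocal tract filter would have several resonances whose positions change from sound to sound; a single fixed resonance is used here only to keep the structure of the model visible.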

## 2.3 Algorithms for Voice Enhancement

Currently, voice processing algorithms can roughly be divided into three domains: spectral subtraction, sub-space analysis and filtering algorithms. Spectral subtraction algorithms operate in the spectral domain by removing from each spectral band the amount of energy that corresponds to the noise contribution. Spectral subtraction is one of the most popular algorithms in speech enhancement because it works effectively in estimating the spectral magnitude of the voice signal. Sub-space analysis operates in the autocorrelation domain. The voice and noise components are assumed to be orthogonal, so that their contributions can be readily separated, but finding the orthogonal components is computationally expensive. Moreover, the orthogonality assumption is difficult to justify. Therefore, this algorithm is not used in our project. Filtering algorithms operate in the time domain and include Wiener filtering and Kalman filtering. Wiener filtering attempts to remove the noise component, whereas Kalman filtering attempts to estimate both the noise and the voice components.

## 2.3.1 Spectral-Subtractive Algorithms

The spectral subtraction algorithm is historically one of the first algorithms proposed for noise reduction (Boll, 1979; Weiss et al., 1974). It is based on a simple principle: assuming additive noise, the noise spectrum can be estimated and updated during the periods when the signal is absent. We then obtain an estimate of the clean signal spectrum by subtracting the estimate of the noise spectrum from the noisy speech spectrum. The enhanced signal is obtained by computing the inverse discrete Fourier transform (IDFT) of the estimated signal spectrum using the phase of the noisy signal. The algorithm involves a single forward and a single inverse Fourier transform. To avoid speech distortion, the subtraction has to be done carefully. If too little is subtracted, much of the interfering noise remains in the speech signal. If too much is subtracted, some of the speech information might be removed.

## 2.3.1.1 Principle of Spectral Subtraction

The principle of spectral subtraction shown below was introduced by Boll in 1979.

Let $y(n)$ be the sampled noisy speech signal, $x(n)$ the clean signal and $d(n)$ the noise signal. We assume that the sampled noisy speech signal consists of the clean signal plus the noise signal, so we can write

$$y(n) = x(n) + d(n) \tag{1}$$

Taking the short-time Fourier transform of $y(n)$, we get

$$Y(\omega_k) = X(\omega_k) + D(\omega_k) \tag{2}$$

for $\omega_k = 2\pi k/N$ and $k = 0, 1, 2, \ldots, N-1$, where $N$ is the frame length in samples.

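We can express $Y(\omega_k)$ in polar form as

$$Y(\omega_k) = |Y(\omega_k)| \, e^{j\phi_y(\omega_k)} \tag{3}$$

where $|Y(\omega_k)|$ is the magnitude and $\phi_y(\omega_k)$ the phase of the noisy spectrum.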

Multiplying $Y(\omega_k)$ by its conjugate $Y^*(\omega_k)$ gives the short-term power spectrum of the noisy speech:

$$
\begin{aligned}
|Y(\omega_k)|^2 &= |X(\omega_k)|^2 + |D(\omega_k)|^2 + X(\omega_k) \cdot D^*(\omega_k) + X^*(\omega_k) \cdot D(\omega_k) \\
&= |X(\omega_k)|^2 + |D(\omega_k)|^2 + 2\,\mathrm{Re}\{X(\omega_k)\, D^*(\omega_k)\}
\end{aligned}
\tag{4}
$$
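The expansion of the noisy power spectrum into clean, noise and cross terms can be checked numerically. The sketch below uses random stand-in signals and a single frame of 256 samples (both assumptions) and verifies that the noisy power spectrum equals the sum of the clean power, the noise power and twice the real part of the cross term.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256                              # frame length in samples
x = rng.standard_normal(N)           # stand-in for the clean signal x(n)
d = 0.5 * rng.standard_normal(N)     # stand-in for the noise d(n)

X, D = np.fft.fft(x), np.fft.fft(d)
Y = np.fft.fft(x + d)                # spectrum of y(n) = x(n) + d(n)

lhs = np.abs(Y) ** 2
cross = 2.0 * np.real(X * np.conj(D))
rhs = np.abs(X) ** 2 + np.abs(D) ** 2 + cross

print(np.max(np.abs(lhs - rhs)))     # numerical round-off only
```

In a single frame the cross term is generally not zero in any given bin; it only vanishes in expectation when the noise is zero mean and uncorrelated with the clean signal.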

The terms $|D(\omega_k)|^2$, $X(\omega_k) \cdot D^*(\omega_k)$ and $X^*(\omega_k) \cdot D(\omega_k)$ cannot be obtained directly and are approximated as $E\{|D(\omega_k)|^2\}$, $E\{X(\omega_k) \cdot D^*(\omega_k)\}$ and $E\{X^*(\omega_k) \cdot D(\omega_k)\}$, where $E\{\cdot\}$ denotes the expectation operator. Typically, $E\{|D(\omega_k)|^2\}$ is estimated during non-speech activity and is denoted by $|\hat{D}(\omega_k)|^2$. If we assume that $d(n)$ is zero mean and uncorrelated with the clean signal $x(n)$, then the terms $E\{X(\omega_k) \cdot D^*(\omega_k)\}$ and $E\{X^*(\omega_k) \cdot D(\omega_k)\}$ reduce to zero. Under these assumptions, the estimate of the clean speech power spectrum, denoted by $|\hat{X}(\omega_k)|^2$, can be obtained as follows:

$$|\hat{X}(\omega_k)|^2 = |Y(\omega_k)|^2 - |\hat{D}(\omega_k)|^2 \tag{5}$$

The above equation describes the power spectrum subtraction algorithm. The estimated power spectrum $|\hat{X}(\omega_k)|^2$ is not guaranteed to be positive, but it can be half-wave rectified. The enhanced signal is finally obtained by computing the inverse Fourier transform of $|\hat{X}(\omega_k)|$ using the phase of the noisy speech signal. Equation (5) can also be written in the following form:

$$|\hat{X}(\omega_k)|^2 = H^2(\omega_k)\, |Y(\omega_k)|^2 \tag{6}$$

where

$$H(\omega_k) = \sqrt{1 - \frac{|\hat{D}(\omega_k)|^2}{|Y(\omega_k)|^2}} = \sqrt{\frac{\gamma(\omega_k) - 1}{\gamma(\omega_k)}}$$

is the gain (or suppression) function and $\gamma(\omega_k) \triangleq |Y(\omega_k)|^2 / |\hat{D}(\omega_k)|^2$.

Assuming that the cross terms in equation (4) are zero, $H(\omega_k)$ is always non-negative, taking values in the range $0 \le H(\omega_k) \le 1$. $H(\omega_k)$ is called the suppression function because it gives the amount of suppression, or attenuation, applied to the noisy power spectrum $|Y(\omega_k)|^2$ at a given frequency to obtain the enhanced power spectrum $|\hat{X}(\omega_k)|^2$.
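The whole procedure (noise estimate from a speech-free segment, subtraction, half-wave rectification, and resynthesis with the noisy phase) can be sketched as below. This is a simplified illustration under stated assumptions, not Boll's exact implementation: it uses non-overlapping rectangular frames, a 500 Hz tone as a stand-in for speech, and a separate noise-only recording; the function name and parameters are hypothetical.

```python
import numpy as np

def power_spectral_subtraction(y, noise_only, frame_len=256):
    """Frame-wise power spectral subtraction with half-wave rectification."""
    # Noise power estimate: average power spectrum over speech-free frames.
    n_noise = len(noise_only) // frame_len
    noise_frames = noise_only[: n_noise * frame_len].reshape(n_noise, frame_len)
    noise_power = np.mean(np.abs(np.fft.rfft(noise_frames, axis=1)) ** 2, axis=0)

    n_frames = len(y) // frame_len
    frames = y[: n_frames * frame_len].reshape(n_frames, frame_len)
    Y = np.fft.rfft(frames, axis=1)
    noisy_power = np.abs(Y) ** 2

    # Subtract the noise power, then clip negative values to zero.
    clean_power = np.maximum(noisy_power - noise_power, 0.0)
    # Gain H = sqrt(|X_hat|^2 / |Y|^2), guarded against division by zero.
    gain = np.sqrt(clean_power / np.maximum(noisy_power, 1e-12))
    # Scaling Y by a real, non-negative gain keeps the noisy phase.
    x_hat = np.fft.irfft(gain * Y, n=frame_len, axis=1)
    return x_hat.reshape(-1), gain

rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2.0 * np.pi * 500.0 * t)           # stand-in for clean speech
y = x + 0.5 * rng.standard_normal(len(x))     # noisy observation
noise_only = 0.5 * rng.standard_normal(4096)  # recording of a speech-free period

x_hat, gain = power_spectral_subtraction(y, noise_only)
```

Practical implementations normally use overlapping windowed frames with overlap-add resynthesis and update the noise estimate whenever speech is absent; those refinements are omitted here to keep the correspondence with the equations visible.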

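A general version of the spectral subtraction algorithm is given by

$$|\hat{X}(\omega_k)|^p = |Y(\omega_k)|^p - |\hat{D}(\omega_k)|^p \tag{7}$$

where $p$ is the power exponent, with $p = 1$ giving the original magnitude spectral subtraction and $p = 2$ giving the power spectral subtraction algorithm.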
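A small sketch of the generalised rule, assuming it takes the form $|\hat{X}|^p = |Y|^p - |\hat{D}|^p$ (the form that reduces to power subtraction for $p = 2$), with negative results clipped to zero. The function name and the toy magnitudes are illustrative only.

```python
import numpy as np

def generalized_subtraction(noisy_mag, noise_mag, p=2.0):
    """Return |X_hat| from |X_hat|^p = max(|Y|^p - |D_hat|^p, 0)."""
    diff = np.maximum(noisy_mag ** p - noise_mag ** p, 0.0)
    return diff ** (1.0 / p)

noisy_mag = np.array([5.0, 3.0, 1.0])  # |Y| in three frequency bins
noise_mag = np.array([3.0, 3.0, 2.0])  # |D_hat| in the same bins

magnitude_sub = generalized_subtraction(noisy_mag, noise_mag, p=1.0)
power_sub = generalized_subtraction(noisy_mag, noise_mag, p=2.0)
print(magnitude_sub, power_sub)
```

For the first bin, magnitude subtraction gives 5 - 3 = 2 while power subtraction gives sqrt(25 - 9) = 4, so the choice of exponent changes how aggressively the noise is removed.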

From equation (2), the noisy spectrum $Y(\omega_k)$ at frequency $\omega_k$ is obtained by summing two complex-valued spectra at that frequency. $Y(\omega_k)$ can therefore be represented geometrically in the complex plane as the sum of two complex numbers, $X(\omega_k)$ and $D(\omega_k)$. The figure below shows the representation of $Y(\omega_k)$ as the vector addition of $X(\omega_k)$ and $D(\omega_k)$ in the complex plane.

Figure 2.4 Representation of the noisy spectrum $Y(\omega_k)$ in the complex plane as the sum of the clean signal spectrum $X(\omega_k)$ and the noise spectrum $D(\omega_k)$.

## 2.3.2 Kalman Filtering

The Kalman filter operates through a prediction and correction mechanism. The error is statistically minimized by predicting a new state from the previous estimate and adding a correction term proportional to the prediction error. The Kalman filter is the main algorithm for estimating dynamic systems specified in state-space form. It consists of a set of mathematical equations that give an optimal recursive solution through the least squares method. The goal of this solution is to compute an unbiased, minimum-variance linear estimator of the state at time t, based on the information available at time t-1, and to update these estimates with the additional information available at time t (Clar et al., 1998). The study of the Kalman filter builds on the Wiener filter.

## 2.3.2.1 Wiener Filter

The objective of the Wiener filter is to remove the noise from a corrupted signal. This optimal filter was proposed by Norbert Wiener during the 1940s. It uses a statistical approach to reduce the amount of noise in the corrupted signal. In practice, every device introduces an error in its output when a signal is measured. Let $x_k$ be the original signal, $h_k$ the response of the device and $y_k$ the output. We can write

$$y_k = x_k * h_k$$

where $*$ denotes convolution.

Applying the Fourier transform,

$$Y_j = X_j \cdot H_j$$

The second source of signal corruption is the unknown background noise $n_k$ that is added during the process. The measured signal $\hat{y}_k$ is:

$$\hat{y}_k = y_k + n_k$$

If there is no noise and we know the transfer response, this equation can be solved directly and the solution is

$$X_j = \frac{Y_j}{H_j}$$

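If there is noise, we have to filter the measured output signal with a Wiener filter, written here as $\Phi_j$, before dividing by the transfer response:

$$X_j \approx \frac{\hat{Y}_j \cdot \Phi_j}{H_j}$$

where $\hat{Y}_j$ is the Fourier transform of the measured signal $\hat{y}_k$.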

Normally, filters are designed for a specific frequency response, but a Wiener filter requires knowledge of the spectral properties of the original signal and of the noise. We then look for the LTI filter whose output is as close as possible to the original signal. The Wiener filter assumes that the signal and the additive noise are stationary linear stochastic processes with known spectral characteristics, or known autocorrelation and cross-correlation. The filter is required to be physically realizable, and its performance criterion is the minimum mean-square error.
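A minimal frequency-domain sketch of this idea, under simplifying assumptions: the device response is taken as 1 (no channel distortion), the signal is a single tone, and the power spectra of both signal and noise are treated as known. The gain used is the standard minimum mean-square-error form, signal power divided by signal-plus-noise power; the names and the test signal are illustrative.

```python
import numpy as np

def wiener_gain(signal_psd, noise_psd):
    """Per-frequency Wiener gain S / (S + N), always between 0 and 1."""
    return signal_psd / (signal_psd + noise_psd)

rng = np.random.default_rng(1)
n = 1024
sigma = 0.5
t = np.arange(n)
clean = np.sin(2.0 * np.pi * 32.0 * t / n)   # tone placed exactly on DFT bin 32
measured = clean + sigma * rng.standard_normal(n)

# "Known spectral properties": the clean power spectrum and a flat
# white-noise power spectrum with expected value n * sigma^2 per bin.
signal_psd = np.abs(np.fft.rfft(clean)) ** 2
noise_psd = np.full(n // 2 + 1, n * sigma ** 2)

gain = wiener_gain(signal_psd, noise_psd)
estimate = np.fft.irfft(gain * np.fft.rfft(measured), n=n)

mse_before = np.mean((measured - clean) ** 2)
mse_after = np.mean((estimate - clean) ** 2)
print(mse_before, mse_after)
```

The gain is close to 1 where the signal dominates and close to 0 where only noise is present, which is how the filter trades noise suppression against signal distortion. In practice the spectra are not known exactly and must themselves be estimated.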

## 2.3.2.2 Discrete Kalman Filter - The Process to Be Estimated

In 1960, R.E. Kalman published his famous paper describing a recursive solution to the discrete-data linear filtering problem [Kalman60].

The Kalman filter addresses the general problem of estimating the state $X \in \mathbb{R}^m$ of a discrete-time controlled process that is governed by a linear stochastic difference equation of the following form:

$$X_n = A \cdot X_{n-1} + w_{n-1}$$

with a measurement $Y \in \mathbb{R}^N$, that is:

$$Y_n = C \cdot X_n + v_n$$

The random variables $w_n$ and $v_n$ represent the process noise and the measurement noise, respectively. They are assumed to be independent of each other, white, and normally distributed:

$$p(w) \sim N(0, R_w)$$

$$p(v) \sim N(0, R_v)$$

In practice, the covariance matrix of the process noise, $R_w$, and that of the measurement noise, $R_v$, could change over time, but we assume they are constant. The matrix $A$ has dimension $m \times m$ and relates the state at step $n-1$ to the state at step $n$. The matrix $C$ has dimension $N \times m$ and relates the state to the measurement $Y_n$. These matrices may also change over time, but we generally assume they are constant as well.
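To make the process and measurement equations concrete, the sketch below simulates such a system. The constant-velocity matrices, the noise covariances and the function name are hypothetical choices for illustration; only the structure (state driven by process noise, measurement corrupted by measurement noise) comes from the text.

```python
import numpy as np

def simulate(A, C, Rw, Rv, x0, n_steps, seed=0):
    """Simulate X_n = A X_{n-1} + w_{n-1} and Y_n = C X_n + v_n."""
    rng = np.random.default_rng(seed)
    m, n_meas = A.shape[0], C.shape[0]
    states = np.zeros((n_steps, m))
    measurements = np.zeros((n_steps, n_meas))
    x = np.asarray(x0, dtype=float)
    for n in range(n_steps):
        w = rng.multivariate_normal(np.zeros(m), Rw)       # process noise
        x = A @ x + w
        v = rng.multivariate_normal(np.zeros(n_meas), Rv)  # measurement noise
        states[n] = x
        measurements[n] = C @ x + v
    return states, measurements

# Hypothetical example: position and velocity, with only the position measured.
A = np.array([[1.0, 1.0], [0.0, 1.0]])  # m x m, here m = 2
C = np.array([[1.0, 0.0]])              # N x m, here N = 1
Rw = 0.01 * np.eye(2)
Rv = np.array([[1.0]])

states, measurements = simulate(A, C, Rw, Rv, x0=[0.0, 1.0], n_steps=50)
```

Data generated this way is convenient for testing an estimator, because the true states are available for comparison with the estimates.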

## 2.3.2.3 The Algorithm of Discrete Kalman Filter

The Kalman filter estimates the process described above using a form of feedback control: it estimates the process state at a given time and then obtains feedback from the observed data.

The equations used to derive the Kalman filter fall into two groups: time update equations and measurement update equations. The first group, the time update, projects the state forward to step n, taking the state at step n-1 as reference, together with the intermediate update of the state covariance matrix. The second group, the measurement update, takes care of the feedback: it incorporates the new information into the previous estimate to obtain an improved estimate of the state.

The time update equations can be seen as prediction equations, while the measurement update equations can be seen as correction equations. The final estimation algorithm is therefore a prediction-correction algorithm of the kind used to solve many problems. The Kalman filter works through a projection and correction mechanism: it predicts the new state and its uncertainty, and then corrects the projection with the new measurement. The figure below shows the cycle of the discrete Kalman algorithm.

Figure 2.5 The discrete Kalman filter cycle. The time update projects the current state estimate ahead in time. The measurement update adjusts the projected estimate by an actual measurement at that time.

The specific equations for the state prediction are detailed as follows:
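In the common textbook notation, the two time update equations can be written as shown below. The symbols $\hat{x}_n^-$ (predicted state), $\hat{x}_{n-1}$ (previous corrected estimate), $P_n^-$ and $P_{n-1}$ (error covariances before and after correction) are assumptions introduced here and may differ from the symbols in the original figure.

$$\hat{x}_n^- = A \, \hat{x}_{n-1}$$

$$P_n^- = A \, P_{n-1} \, A^{T} + R_w$$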

These equations project the state and covariance estimates forward from step n-1 to step n. Together they give an estimate of $x_n$ and of its covariance. The first Kalman equation predicts the next sample from the previous state. The second Kalman equation gives the covariance matrix used to predict the estimation error. The matrix $A$ relates the state at the previous step n-1 to the state at the current step n; this matrix could change from step to step. $R_w$ represents the covariance of the random process disturbance acting on the state being estimated.

The specific equations for the state correction are detailed as follows. They are called the measurement update equations.
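In the same assumed textbook notation, with $K_n$ standing for the Kalman gain, $\hat{x}_n^-$ and $P_n^-$ for the predicted state and covariance, and $I$ for the identity matrix, the three measurement update equations take the standard form:

$$K_n = P_n^- \, C^{T} \left( C \, P_n^- \, C^{T} + R_v \right)^{-1}$$

$$\hat{x}_n = \hat{x}_n^- + K_n \left( y_n - C \, \hat{x}_n^- \right)$$

$$P_n = \left( I - K_n \, C \right) P_n^-$$

The term $y_n - C \, \hat{x}_n^-$ is the difference between the actual measurement and the measurement predicted from the projected state; the gain decides how much of that difference is fed back into the estimate.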

First, during the correction of the state projection, we compute the Kalman gain, Re,n. This gain factor is chosen so that it minimizes the error covariance of the new state estimate. The next step is to measure the process to obtain $y_n$ and to generate a new state estimate that incorporates the new observation. Lastly, a new estimate of the error covariance is obtained through the last equation. After each pair of updates, time and measurement, the process is repeated, taking the new state estimate and error covariance as the starting point.
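The prediction-correction cycle can be sketched as a short loop. This is a minimal illustration in standard textbook form, not a reproduction of the original figure: the variable `K` plays the role of the Kalman gain, and the scalar test problem (estimating a constant value of 1.0 from noisy readings) with its covariances is a hypothetical example.

```python
import numpy as np

def kalman_filter(measurements, A, C, Rw, Rv, x0, P0):
    """Run the discrete Kalman filter over a sequence of measurements."""
    x = np.asarray(x0, dtype=float)
    P = np.asarray(P0, dtype=float)
    identity = np.eye(A.shape[0])
    estimates = []
    for y in measurements:
        # Time update (prediction): project state and covariance ahead.
        x_pred = A @ x
        P_pred = A @ P @ A.T + Rw
        # Measurement update (correction): gain, new estimate, new covariance.
        S = C @ P_pred @ C.T + Rv
        K = P_pred @ C.T @ np.linalg.inv(S)
        x = x_pred + K @ (y - C @ x_pred)
        P = (identity - K @ C) @ P_pred
        estimates.append(x)
    return np.array(estimates), P

# Hypothetical scalar example: a constant value observed in white noise.
rng = np.random.default_rng(0)
true_value = 1.0
measurements = true_value + 0.5 * rng.standard_normal((200, 1))

A = np.array([[1.0]])
C = np.array([[1.0]])
Rw = np.array([[1e-5]])
Rv = np.array([[0.25]])

estimates, P_final = kalman_filter(measurements, A, C, Rw, Rv, x0=[0.0], P0=[[1.0]])
```

Each pass through the loop performs one time update followed by one measurement update, and the corrected estimate and covariance become the starting point for the next pass.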

The figure below shows the complete operation of the filter, combining the prediction and correction steps and the five Kalman equations.

Figure 2.6 Main equations of the Kalman filter: the interaction of the prediction and correction steps