Details

Title

Estimation and tracking of fundamental, 2nd and 3d harmonic frequencies for spectrogram normalization in speech recognition

Journal title

Bulletin of the Polish Academy of Sciences Technical Sciences

Yearbook

2012

Volume

60

Issue

No 1

Authors

Divisions of PAS

Technical Sciences

Coverage

71-81

Date

2012

Identifier

DOI: 10.2478/v10175-012-0011-z ; ISSN 2300-1917

Source

Bulletin of the Polish Academy of Sciences: Technical Sciences; 2012; 60; No 1; 71-81

References

Benesty J. (2008), Springer Handbook of Speech Processing, doi.org/10.1007/978-3-540-49127-9
Demenko G. (2010), Implementation of Polish speech synthesis for the BOSS system, Bull. Pol. Ac.: Tech, 58, 3, 371.
Goodwin M. (2008), Springer Handbook of Speech Processing, 229, doi.org/10.1007/978-3-540-49127-9_12
Glavitsch U. (2003), Speaker normalization with respect to F0: a perceptual approach, TIK-Report No. 185, Eidgenössische Technische Hochschule Zürich, Zürich.
O'Shaughnessy D. (2008), Springer Handbook of Speech Processing, 213, doi.org/10.1007/978-3-540-49127-9_11
Schafer R. (2008), Springer Handbook of Speech Processing, 161, doi.org/10.1007/978-3-540-49127-9_9
Hess W. (1992), Advances in Speech Signal Processing, 3.
de Cheveigné A. (2001), Comparative evaluation of F0 estimation algorithms, 1, 2451.
Unoki M. (2008), Estimation of fundamental frequency of reverberant speech by utilizing complex cepstrum analysis, J. Signal Processing, 12, 1, 31.
Kawahara H. (1999), Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity, 2781.
de Cheveigné A. (2002), Yin, a fundamental frequency estimator for speech and music, J. Acoust. Soc. Am, 111, 4, 1917, doi.org/10.1121/1.1458024
Miwa T. (1998), The pitch estimation of different musical instruments sounds using comb filters for transcription, IEICE Trans, D-2, 9, 1965.
Nakatani T. (2004), Robust and accurate fundamental frequency estimation based on dominant harmonic components, J. Acoust. Soc. Am, 116, 6, 3690, doi.org/10.1121/1.1787522
Ishimoto Y. (2001), A fundamental frequency estimation method for noisy speech based on instantaneous amplitude and frequency, 2439.
Atake Y. (2000), Robust estimation of fundamental frequency using instantaneous frequencies of harmonic components, IEICE Proc, D-2, 11, 2077.
Dubois C. (2007), Joint detection and tracking of time-varying harmonic components: a flexible bayesian approach, IEEE Trans. on Audio Speech and Language Processing, 15, 4, 1283, doi.org/10.1109/TASL.2007.894522
Kim S. (2008), Multiharmonic tracking using sigmapoint Kalman filter, IEEE EMBC, 8.
Nishi K. (1988), Multiple pitch tracking and harmonic segregation algorithm for auditory scene analysis, The Society of Instrument and Control Engineers, 34, 6, 483, doi.org/10.9746/sicetr1965.34.483
Hainsworth S. (2003), Beat tracking with particle filtering algorithms, 1, 91.
Tomoike S. (2008), Estimation of local peaks based on particle filter in advance environments, J. Signal Processing, 12, 4, 303.
Lee L. (1998), A frequency warping approach to speaker normalization, IEEE Trans. on Speech and Audio Processing, 6, 1, 49, doi.org/10.1109/89.650310
Dognin P. (2003), A bandpass transform for speaker normalization, Ph. D. Dissertation, University of Pittsburgh, Pittsburgh.
Traunmüller H. (1987), Perceptual relativity in identification of two-formant vowels, Speech Communication, 6, 143, doi.org/10.1016/0167-6393(87)90037-9
Eide E. (1996), A parametric approach to vocal tract length normalization, Proc. ICASSP, 1, 346.
Laroche J. (1999), New phase-vocoder techniques for real-time pitch shifting, chorusing, harmonizing, and other exotic audio modifications, J. Audio Eng. Soc, 47, 11, 928.
Rabiner L. (1997), On the use of autocorrelation analysis for pitch, IEEE Trans. on Acoustics, Speech, and Signal Processing, ASSP-25, 1, 24.
Shimamura T. (2001), Weighted autocorrelation for pitch extraction of noisy speech, IEEE Trans. on Speech and Audio Processing, 9, 7, 727, doi.org/10.1109/89.952490
Ying G. (1994), A probabilistic approach to AMDF pitch detection, J. Acoust. Soc. Am, 95, 5, 2817, doi.org/10.1121/1.409712
Miyamoto T. (1983), A real time PARCOR analysis of speech by high-performance signal processors, IEICE, J66-A, 7, 625.
Sakai T. (1995), Improvement of pitch extraction method in noisy environment based on cepstrum, Electronics, Information, and Communication Engineers, 1, 299.
Haward D. (1989), Peak-picking fundamental period estimation for hearing prostheses, J. Acoust. Soc. Am, 86, 3, 902, doi.org/10.1121/1.398725
Ristic B. (2004), Beyond the Kalman Filter. Particle Filters for Tracking.
Medan Y. (1991), Super resolution pitch determination of speech, IEEE Trans. on Signal Processing, 39, 1, doi.org/10.1109/78.80763
Veprek P. (2002), Analysis, enhancement and evaluation of five pitch determination techniques, Speech Comm, 37, 249, doi.org/10.1016/S0167-6393(01)00017-6
Adamczyk B. (2000), Robot's vocabluary, IAiR Bulletin, 12.
Hu G.-N. (2004), Monaural speech segregation based on pitch tracking and amplitude modulation, IEEE Trans. on Neural Networks, 15, 5, 1135, doi.org/10.1109/TNN.2004.832812
Kasprzak W. (2010), Relaxing the WDO assumption in blind extraction of speakers from speech mixtures, J. Telecom. and Information Technology, 4, 50.
Okazaki F. (2005), A two-step approach to blind deconvolution of speech and sound sources in the time domain, Bull. Pol. Ac.: Tech, 53, 1, 49.