Title: Estimation and tracking of fundamental, 2nd and 3d harmonic frequencies for spectrogram normalization in speech recognition
Journal title: Bulletin of the Polish Academy of Sciences Technical Sciences
Yearbook: 2012
Volume: 60
Issue: No 1
Authors: Fujimoto, K.; Kasprzak, W.; Hamada, N.
Divisions of PAS: Nauki Techniczne (Technical Sciences)
Coverage: 71-81
Date: 2012
Identifier: DOI: 10.2478/v10175-012-0011-z; ISSN 2300-1917
Source: Bulletin of the Polish Academy of Sciences: Technical Sciences; 2012; 60; No 1; 71-81
References:
Benesty J. (2008), Springer Handbook of Speech Processing, doi.org/10.1007/978-3-540-49127-9
Demenko G. (2010), Implementation of Polish speech synthesis for the BOSS system, Bull. Pol. Ac.: Tech, 58, 3, 371.
Goodwin M. (2008), Springer Handbook of Speech Processing, 229, doi.org/10.1007/978-3-540-49127-9_12
U. Glavitsch: "Speaker normalization with respect to F0: a perceptual approach", in: TIK-Report No. 185, Eidgenössische Technische Hochschule Zürich, Zürich, 2003.
O'Shaughnessy D. (2008), Springer Handbook of Speech Processing, 213, doi.org/10.1007/978-3-540-49127-9_11
Schafer R. (2008), Springer Handbook of Speech Processing, 161, doi.org/10.1007/978-3-540-49127-9_9
Hess W. (1992), Advances in Speech Signal Processing, 3.
A. de Cheveigné (2001), Comparative evaluation of F0 estimation algorithms, 1, 2451.
Unoki M. (2008), Estimation of fundamental frequency of reverberant speech by utilizing complex cepstrum analysis, J. Signal Processing, 12, 1, 31.
Kawahara H. (1999), Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity, 2781.
A. de Cheveigné (2002), Yin, a fundamental frequency estimator for speech and music, J. Acoust. Soc. Am, 111, 4, 1917, doi.org/10.1121/1.1458024
Miwa T. (1998), The pitch estimation of different musical instruments sounds using comb filters for transcription, IEICE Trans, D-2, 9, 1965.
Nakatani T. (2004), Robust and accurate fundamental frequency estimation based on dominant harmonic components, J. Acoust. Soc. Am, 116, 6, 3690, doi.org/10.1121/1.1787522
Ishimoto Y. (2001), A fundamental frequency estimation method for noisy speech based on instantaneous amplitude and frequency, 2439.
Atake Y. (2000), Robust estimation of fundamental frequency using instantaneous frequencies of harmonic components, IEICE Proc, D-2, 11, 2077.
Dubois C. (2007), Joint detection and tracking of time-varying harmonic components: a flexible bayesian approach, IEEE Trans. on Audio Speech and Language Processing, 15, 4, 1283, doi.org/10.1109/TASL.2007.894522
Kim S. (2008), Multiharmonic tracking using sigmapoint Kalman filter, IEEE EMBC, 8.
Nishi K. (1988), Multiple pitch tracking and harmonic segregation algorithm for auditory scene analysis, The Society of Instrument and Control Engineers, 34, 6, 483, doi.org/10.9746/sicetr1965.34.483
Hainsworth S. (2003), Beat tracking with particle filtering algorithms, 1, 91.
Tomoike S. (2008), Estimation of local peaks based on particle filter in advance environments, J. Signal Processing, 12, 4, 303.
Lee L. (1998), A frequency warping approach to speaker normalization, IEEE Trans. on Speech and Audio Processing, 6, 1, 49, doi.org/10.1109/89.650310
P. Dognin, "A bandpass transform for speaker normalization", Ph.D. Dissertation, University of Pittsburgh, Pittsburgh, 2003.
Traunmüller H. (1987), Perceptual relativity in identification of two-formant vowels, Speech Communication, 6, 143, doi.org/10.1016/0167-6393(87)90037-9
Eide E. (1996), A parametric approach to vocal tract length normalization, Proc. ICASSP, 1, 346.
Laroche J. (1999), New phase-vocoder techniques for real-time pitch shifting, chorusing, harmonizing, and other exotic audio modifications, J. Audio Eng. Soc, 47, 11, 928.
Rabiner L. (1997), On the use of autocorrelation analysis for pitch, IEEE Trans. on Acoustics, Speech, and Signal Processing, ASSP-25, 1, 24.
Shimamura T. (2001), Weighted autocorrelation for pitch extraction of noisy speech, IEEE Trans. on Speech and Audio Processing, 9, 7, 727, doi.org/10.1109/89.952490
Ying G. (1994), A probabilistic approach to AMDF pitch detection, J. Acoust. Soc. Am, 95, 5, 2817, doi.org/10.1121/1.409712
Miyamoto T. (1983), A real time PARCOR analysis of speech by high-performance signal processors, IEICE, J66-A, 7, 625.
Sakai T. (1995), Improvement of pitch extraction method in noisy environment based on cepstrum, Electronics, Information, and Communication Engineers, 1, 299.
Haward D. (1989), Peak-picking fundamental period estimation for hearing prostheses, J. Acoust. Soc. Am, 86, 3, 902, doi.org/10.1121/1.398725
Ristic B. (2004), Beyond the Kalman Filter. Particle Filters for Tracking.
Medan Y. (1991), Super resolution pitch determination of speech, IEEE Trans. on Signal Processing, 39, 1, doi.org/10.1109/78.80763
Veprek P. (2002), Analysis, enhancement and evaluation of five pitch determination techniques, Speech Comm, 37, 249, doi.org/10.1016/S0167-6393(01)00017-6
Adamczyk B. (2000), Robot's vocabluary, IAiR Bulletin, 12.
Hu G.-N. (2004), Monaural speech segregation based on pitch tracking and amplitude modulation, IEEE Trans. on Neural Networks, 15, 5, 1135, doi.org/10.1109/TNN.2004.832812
Kasprzak W. (2010), Relaxing the WDO assumption in blind extraction of speakers from speech mixtures, J. Telecom. and Information Technology, 4, 50.
Okazaki F. (2005), A two-step approach to blind deconvolution of speech and sound sources in the time domain, Bull. Pol. Ac.: Tech, 53, 1, 49.