Utilising Tree-Based Ensemble Learning for Speaker Segmentation

Mohamed Abou-Zleikha; Zheng-Hua Tan; Mads Græsbøll Christensen; Søren Holdt Jensen

doi:10.1007/978-3-662-44654-6_5

Conference paper, 2014

Mohamed Abou-Zleikha (Author), Aalborg University [Denmark]
Zheng-Hua Tan (Author, PersonId : 992412), Aalborg University [Denmark]
Mads Græsbøll Christensen (Author, PersonId : 992413), Aalborg University [Denmark]
Søren Holdt Jensen (Author, PersonId : 858903), Aalborg University [Denmark]

Abstract

In audio and speech processing, accurate detection of the change points between multiple speakers in speech segments is an important stage in several applications, such as speaker identification and tracking. Approaches based on the Bayesian Information Criterion (BIC) are the most commonly used, as they have proved very effective for this task. The main criticism levelled against BIC-based approaches is the penalty parameter in the BIC function: this parameter must be fine-tuned for each variation in acoustic conditions. When tuned for a certain condition, the model becomes biased towards the data used for training, which limits its generalisation ability. In this paper, we propose a tuning-free, BIC-based approach to speaker segmentation through the use of ensemble-based learning. A forest of segmentation trees is constructed, in which each tree is trained on a sampled version of the speech segment. During tree construction, a set of randomly selected points in the input sequence is examined as potential segmentation points. The point that yields the highest ΔBIC is chosen, and the same process is repeated for the resulting left and right segments. Each node of the tree therefore stores the highest ΔBIC together with the index of the associated point. After the forest has been built, the ΔBIC values from all trees are accumulated for each point, and the positions of the local maxima are taken as speaker change points. The proposed approach is tested on artificially created conversations from the TIMIT database. It gives very accurate results, comparable to those of state-of-the-art methods, with a 9% (absolute) higher F1 score than the standard ΔBIC with an optimally tuned penalty parameter.
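The abstract describes the method only at a high level, so the following NumPy sketch illustrates one possible reading of it; it is not the authors' implementation. It uses the standard full-covariance Gaussian ΔBIC for splitting N frames of dimension d at frame i, ΔBIC(i) = (N/2) log|Σ| − (i/2) log|Σ1| − ((N−i)/2) log|Σ2| − λP with P = ½(d + d(d+1)/2) log N, where a positive value favours a change at i. Several details are assumptions not stated in the abstract: the penalty weight λ is simply fixed at 1, a "sampled version" is taken to mean a random, order-preserving 80% subset of the frames, a tree stops splitting when a segment is too short or no candidate gives a positive ΔBIC, and the accumulated curve is smoothed with a Hann window before local maxima are picked within a fixed radius. The function names (`delta_bic`, `build_tree`, `segment`) and all parameter values are hypothetical.

```python
import numpy as np


def delta_bic(x, i, lam=1.0):
    """Full-covariance Gaussian ΔBIC for splitting x (N x d frames) at frame i."""
    n, d = x.shape

    def logdet(seg):
        # log-determinant of the maximum-likelihood (biased) covariance
        cov = np.atleast_2d(np.cov(seg, rowvar=False, bias=True))
        return np.linalg.slogdet(cov)[1]

    # model-complexity penalty; lam = 1.0 is an assumption, not a tuned value
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * logdet(x) - 0.5 * i * logdet(x[:i])
            - 0.5 * (n - i) * logdet(x[i:]) - lam * penalty)


def build_tree(x, idx, rng, n_candidates=30, min_len=20, nodes=None):
    """Recursively split x; collect (original frame index, ΔBIC) per node.

    idx maps each row of x back to its frame index in the full sequence.
    """
    if nodes is None:
        nodes = []
    n = len(x)
    if n < 2 * min_len:
        return nodes
    # randomly selected candidate split points, leaving min_len frames per side
    cands = rng.integers(min_len, n - min_len + 1, size=n_candidates)
    scores = [delta_bic(x, int(i)) for i in cands]
    best = int(np.argmax(scores))
    i, score = int(cands[best]), scores[best]
    if score <= 0:  # assumption: stop when no candidate indicates a change
        return nodes
    nodes.append((idx[i], score))
    build_tree(x[:i], idx[:i], rng, n_candidates, min_len, nodes)
    build_tree(x[i:], idx[i:], rng, n_candidates, min_len, nodes)
    return nodes


def segment(x, n_trees=50, keep=0.8, radius=50, seed=0):
    """Return (change points, smoothed accumulated ΔBIC curve)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    acc = np.zeros(n)
    for _ in range(n_trees):
        # assumption: each tree sees a random, order-preserving subset of frames
        idx = np.sort(rng.choice(n, size=int(keep * n), replace=False))
        for pos, score in build_tree(x[idx], idx, rng):
            acc[pos] += score
    # assumption: Hann smoothing merges nearby votes from different trees
    acc = np.convolve(acc, np.hanning(21), mode="same")
    peaks = []
    for t in range(n):
        left = acc[max(0, t - radius):t]
        right = acc[t + 1:t + radius + 1]
        # keep t if it is the (leftmost) maximum within +/- radius frames
        if (acc[t] > 0 and (left.size == 0 or acc[t] > left.max())
                and (right.size == 0 or acc[t] >= right.max())):
            peaks.append(t)
    return peaks, acc
```

On synthetic data made of three 200-frame Gaussian "speakers" with different means, the two strongest peaks should fall near frames 200 and 400. Because candidate points are drawn at random, peak positions can shift by a few frames between seeds, which is exactly what accumulating ΔBIC across many trees is meant to smooth out.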

Domains

Computer Science [cs]

Main file

978-3-662-44654-6_5_Chapter.pdf (258.05 KB)

Origin : Files produced by the author(s)

https://inria.hal.science/hal-01391292

Submitted on : Thursday, November 3, 2016, 10:50:56 AM

Last modification on : Thursday, March 5, 2020, 5:40:56 PM

Long-term archiving on : Saturday, February 4, 2017, 12:59:53 PM

Dates and versions

hal-01391292 , version 1 (03-11-2016)

Licence

Attribution

Identifiers

HAL Id : hal-01391292 , version 1
DOI : 10.1007/978-3-662-44654-6_5

Cite

Mohamed Abou-Zleikha, Zheng-Hua Tan, Mads Græsbøll Christensen, Søren Holdt Jensen. Utilising Tree-Based Ensemble Learning for Speaker Segmentation. 10th IFIP International Conference on Artificial Intelligence Applications and Innovations (AIAI), Sep 2014, Rhodes, Greece. pp.50-59, ⟨10.1007/978-3-662-44654-6_5⟩. ⟨hal-01391292⟩

Collections

IFIP IFIP-AICT IFIP-TC IFIP-WG IFIP-TC12 IFIP-AIAI IFIP-WG12-5 IFIP-AICT-436