Related Text Discovery Through Consecutive Filtering and Supervised Learning - Intelligence Science II Third IFIP TC 12 International Conference, ICIS 2018
Conference paper, 2018

Daqing Wu, Jinwen Ma

In a related-text or topic-based text discovery task, there is often only a small number of related (positive) texts compared with a large number of unrelated (negative) texts. The related and unrelated classes can therefore be strongly imbalanced, which makes classification or detection very difficult because the recall of the positive class tends to be very low. To overcome this difficulty, we propose a consecutive filtering and supervised learning method, namely consecutive supervised bagging. In each learning stage, we first use the classifier trained in the previous stage to delete some of the negative texts that it identifies with a high degree of confidence, and we then train a new classifier on the retained texts. We repeat this procedure until the ratio of negative to positive texts becomes reasonable, finally obtaining a tree-like filtering and recognition system. Experimental results on the 20NewsGroups (English) and THUCNews (Chinese) datasets demonstrate that the proposed method performs much better than AdaBoost and Rocchio.
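The abstract describes the filter-then-retrain loop only in outline. The Python sketch below illustrates that loop under assumptions of our own: texts are already numeric feature vectors, the base classifier is a hypothetical nearest-centroid scorer (`CentroidScorer`), and the names `train_cascade`, `predict`, `drop_frac` and `target_ratio` are illustrative rather than taken from the paper. It does not reproduce the paper's bagging component, base classifier or exact deletion criterion.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _centroid(rows: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


@dataclass
class CentroidScorer:
    """Hypothetical base classifier (nearest centroid), not the paper's.

    score(x) > 0 means x lies closer to the positive centroid; the lower
    the score, the more confidently x is judged negative.
    """
    pos: List[float]
    neg: List[float]

    @classmethod
    def fit(cls, X: List[Vector], y: List[int]) -> "CentroidScorer":
        pos = [x for x, t in zip(X, y) if t == 1]
        neg = [x for x, t in zip(X, y) if t == 0]
        return cls(_centroid(pos), _centroid(neg))

    def score(self, x: Vector) -> float:
        return _sq_dist(x, self.neg) - _sq_dist(x, self.pos)


Stage = Tuple[CentroidScorer, float]


def train_cascade(X: List[Vector], y: List[int], drop_frac: float = 0.5,
                  target_ratio: float = 2.0,
                  max_stages: int = 10) -> Tuple[List[Stage], CentroidScorer]:
    """Filter confident negatives stage by stage, retraining each time.

    Assumes labels 1 (related) / 0 (unrelated), at least one text of each
    class, 0 < drop_frac < 1 and target_ratio >= 1.
    """
    X, y = list(X), list(y)
    stages: List[Stage] = []
    clf = CentroidScorer.fit(X, y)  # first classifier, trained on all texts
    for _ in range(max_stages):
        n_pos = sum(y)
        n_neg = len(y) - n_pos
        if n_neg <= target_ratio * n_pos:
            break  # the negative/positive ratio is now acceptable
        # Score every retained text with the previous stage's classifier.
        scores = [clf.score(x) for x in X]
        pos_min = min(s for s, t in zip(scores, y) if t == 1)
        # Assumed rule: only negatives scored below every positive are "confident".
        cands = sorted(s for s, t in zip(scores, y) if t == 0 and s < pos_min)
        if not cands:
            break  # no negative can be dropped without also losing a positive
        k = min(len(cands), max(1, int(drop_frac * n_neg)))
        threshold = cands[k - 1]
        keep = [i for i, s in enumerate(scores) if y[i] == 1 or s > threshold]
        if all(y[i] == 1 for i in keep):
            break  # never filter away every negative
        stages.append((clf, threshold))
        X = [X[i] for i in keep]
        y = [y[i] for i in keep]
        clf = CentroidScorer.fit(X, y)  # retrain on the retained texts
    return stages, clf


def predict(stages: List[Stage], final: CentroidScorer, x: Vector) -> int:
    """Pass x down the cascade; any stage may reject it as negative."""
    for clf, threshold in stages:
        if clf.score(x) <= threshold:
            return 0  # rejected early as a confident negative
    return 1 if final.score(x) > 0 else 0
```

Treating only negatives that score below every retained positive as "confident" keeps all training positives at each stage; this is one possible reading of the abstract's deletion step, chosen here for simplicity. At prediction time each stage either rejects a text or passes it on, and only texts that pass every stage reach the final classifier, so the stages form a chain with a reject leaf at every node, one simple reading of the tree-like system the abstract mentions.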
Main file: 474230_1_En_22_Chapter.pdf (454.55 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02118810, version 1 (03-05-2019)


Attribution - CC BY 4.0



Daqing Wu, Jinwen Ma. Related Text Discovery Through Consecutive Filtering and Supervised Learning. 2nd International Conference on Intelligence Science (ICIS), Nov 2018, Beijing, China. pp.211-220, ⟨10.1007/978-3-030-01313-4_22⟩. ⟨hal-02118810⟩