Short Text Feature Extension Based on Improved Frequent Term Sets

Huifang Ma; Lei Di; Xiantao Zeng; Li Yan; Yuyi Ma

doi:10.1007/978-3-319-48390-0_18

Conference Papers Year : 2016


Huifang Ma

Function : Author
PersonId : 990766

Northwest Normal University [Lanzhou]

Lei Di

Function : Author

Northwest Normal University [Lanzhou]

Xiantao Zeng

Function : Author

Northwest Normal University [Lanzhou]

Li Yan

Function : Author

Northwest Normal University [Lanzhou]

Yuyi Ma

Function : Author

Northwest Normal University [Lanzhou]

Abstract

A short text feature extension algorithm based on improved frequent term sets is proposed. By calculating support and confidence, frequent term sets whose terms share the same category tendency are extracted. Correlation-based frequent term sets are defined to further extend the term set. Meanwhile, information gain is introduced into traditional TF-IDF, so that the weights better express category distribution information and the weight of a word for each category is enhanced. All term pairs with external relations are extracted, and the frequent term set is expanded. Finally, a word similarity matrix is constructed from the frequent term sets, and symmetric non-negative matrix factorization is applied to extend the feature space. Experiments show that the constructed short text model can improve the performance of short text clustering.
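To make the support and confidence step concrete, here is a minimal sketch in Python. The abstract does not give the paper's exact definitions, so the sketch assumes the standard association-rule ones: the support of a term set is the fraction of documents containing all of its terms, and the confidence of A → B is support(A ∪ B) / support(A). The function names, the toy corpus, and the `min_support` threshold are all hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
from itertools import combinations


def support(term_set, docs):
    """Fraction of documents that contain every term in term_set."""
    term_set = set(term_set)
    return sum(1 for d in docs if term_set <= d) / len(docs)


def confidence(antecedent, consequent, docs):
    """Standard rule confidence: support(A | B) / support(A)."""
    base = support(antecedent, docs)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), docs) / base


def frequent_pairs(docs, min_support):
    """Return {term pair: support} for all pairs meeting min_support."""
    vocab = sorted(set().union(*docs))
    result = {}
    for pair in combinations(vocab, 2):
        s = support(pair, docs)
        if s >= min_support:
            result[pair] = s
    return result


# Hypothetical toy corpus: each short text is a set of terms.
docs = [
    {"stock", "market", "rise"},
    {"stock", "market", "fall"},
    {"football", "match", "goal"},
    {"stock", "price"},
]

pairs = frequent_pairs(docs, min_support=0.5)
conf_market_to_stock = confidence({"market"}, {"stock"}, docs)
conf_stock_to_market = confidence({"stock"}, {"market"}, docs)
```

In this toy corpus only the pair ("market", "stock") reaches the 0.5 support threshold. The two confidence values differ because "stock" also appears in a document without "market", which is why confidence is useful for judging the direction of a co-occurrence relation.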

Keywords

Term weighting; Information gain; Frequent term set; Correlation; Non-negative matrix factorization

Domains

Computer Science [cs]

Main file

433802_1_En_18_Chapter.pdf (501.85 KB)

Origin : Files produced by the author(s)


https://inria.hal.science/hal-01614992

Submitted on : Wednesday, October 11, 2017, 4:57:54 PM

Last modification on : Thursday, March 5, 2020, 5:43:16 PM

Long-term archiving on : Friday, January 12, 2018, 3:25:22 PM

Dates and versions

hal-01614992 , version 1 (11-10-2017)

Licence

Attribution

Identifiers

HAL Id : hal-01614992 , version 1
DOI : 10.1007/978-3-319-48390-0_18

Cite

Huifang Ma, Lei Di, Xiantao Zeng, Li Yan, Yuyi Ma. Short Text Feature Extension Based on Improved Frequent Term Sets. 9th International Conference on Intelligent Information Processing (IIP), Nov 2016, Melbourne, VIC, Australia. pp.169-178, ⟨10.1007/978-3-319-48390-0_18⟩. ⟨hal-01614992⟩


Collections

IFIP IFIP-AICT IFIP-TC IFIP-TC12 IFIP-AICT-486

