User intention-based document summarization on heterogeneous sentence networks

Hsiu Yi Wang, Jia Wei Chang, Jen Wei Huang

Research output: Conference contribution

Abstract

Automatic extraction-based document summarization is a difficult Natural Language Processing task. Previous approaches have usually generated the summary by extracting the top K salient sentences using graph-based ranking algorithms, but their sentence feature representations capture only the surface relationships between objects, so the results may not accurately reflect the user's intentions. To address this challenge, we propose a method that: (1) obtains deeper semantic concepts among candidate sentences using meaningful sentence vectors that combine word vectors and TF-IDF; (2) ranks the sentences considering both the relationships between sentences and the user's intention for each sentence to identify significant sentences, and applies these to a heterogeneous graph; (3) generates the result sentence by sentence to ensure the summary semantics are properly related to the original document. We verified the proposed approach experimentally using the English summarization benchmark datasets DUC2001 and DUC2002 and the large Chinese summarization dataset LCSTS. We also collected news data and had a group of expert bank auditors produce a reference summary, which we compared with the output of the proposed approach using ROUGE evaluation.
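
As an illustration of steps (1) and (2) only, the following is a minimal sketch and not the authors' implementation. It assumes pre-tokenized sentences and a small dictionary of word vectors, builds each sentence vector as a TF-IDF-weighted average of word vectors, and ranks sentences with a personalized PageRank over a cosine-similarity graph whose restart distribution is biased toward a user-intention vector. The personalized PageRank stands in for the paper's heterogeneous-graph ranking, which the abstract does not specify. All names (`tfidf_sentence_vectors`, `rank_sentences`, `summarize`, `word_vecs`, `intent_words`) and the damping value 0.85 are hypothetical choices made for this example.

```python
import math
from collections import Counter


def tfidf_sentence_vectors(sentences, word_vecs, dim):
    """Return one TF-IDF-weighted average word vector per tokenized sentence."""
    n = len(sentences)
    # Document frequency: number of sentences containing each word.
    df = Counter(w for s in sentences for w in set(s))
    vectors = []
    for s in sentences:
        vec = [0.0] * dim
        total = 0.0
        for w, count in Counter(s).items():
            if w not in word_vecs:
                continue  # skip words without a vector (assumption)
            # Smoothed IDF: ln((1 + n) / (1 + df)) + 1.
            idf = math.log((1 + n) / (1 + df[w])) + 1.0
            weight = (count / len(s)) * idf
            total += weight
            for i, x in enumerate(word_vecs[w]):
                vec[i] += weight * x
        vectors.append([x / total for x in vec] if total else vec)
    return vectors


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_sentences(vectors, intent_vec, damping=0.85, iters=50):
    """Personalized PageRank over a sentence-similarity graph."""
    n = len(vectors)
    # Edge weights: non-negative cosine similarity, no self-loops.
    sim = [[max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out = [sum(row) for row in sim]
    # Restart distribution: similarity of each sentence to the intention.
    bias = [max(cosine(v, intent_vec), 0.0) for v in vectors]
    bias_sum = sum(bias)
    bias = [b / bias_sum for b in bias] if bias_sum else [1.0 / n] * n
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) * bias[i]
            + damping * sum(scores[j] * sim[j][i] / out[j]
                            for j in range(n) if out[j])
            for i in range(n)
        ]
    return scores


def summarize(sentences, word_vecs, dim, intent_words, k=2):
    """Return indices of the top-k sentences, in document order."""
    vectors = tfidf_sentence_vectors(sentences, word_vecs, dim)
    # The intention is embedded like a one-sentence document (assumption).
    intent_vec = tfidf_sentence_vectors([intent_words], word_vecs, dim)[0]
    scores = rank_sentences(vectors, intent_vec)
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])
```

Returning indices in document order keeps the selected sentences in the order they appear in the source text. Step (3), sentence-by-sentence generation, is not covered by this sketch.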

Original language: English
Title of host publication: Database Systems for Advanced Applications - 24th International Conference, DASFAA 2019, Proceedings
Editors: Joao Gama, Yongxin Tong, Guoliang Li, Jun Yang, Juggapong Natwichai
Publisher: Springer Verlag
Pages: 572-587
Number of pages: 16
ISBN (Print): 9783030185787
Publication status: Published - 25 Apr 2019
Event: 24th International Conference on Database Systems for Advanced Applications, DASFAA 2019 - Chiang Mai, Thailand
Duration: 22 Apr 2019 to 25 Apr 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11447 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 24th International Conference on Database Systems for Advanced Applications, DASFAA 2019
Country/Territory: Thailand
City: Chiang Mai
Period: 22 Apr 2019 to 25 Apr 2019

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
