User intention-based document summarization on heterogeneous sentence networks

Hsiu Yi Wang, Jia Wei Chang, Jen Wei Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Automatic extraction-based document summarization is a difficult Natural Language Processing task. Previous approaches have usually generated the summary by extracting the top K salient sentences ranked by graph-based algorithms, but their sentence feature representations capture only surface relationships between sentences, so the results may not accurately reflect the user's intentions. To address this challenge, we propose a method that (1) obtains deeper semantic concepts among candidate sentences using meaningful sentence vectors that combine word vectors and TF-IDF; (2) ranks sentences on a heterogeneous graph that captures both the relationships between sentences and the user's intention for each sentence, in order to identify the significant ones; and (3) generates the summary sentence by sentence so that its semantics remain properly related to the original document. We verified the proposed approach experimentally on the English summarization benchmark datasets DUC2001 and DUC2002 and on the large Chinese summarization dataset LCSTS. We also collected news data, had a group of bank auditor experts produce a reference summary, and compared it with the output of the proposed approach using ROUGE evaluation.
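The abstract does not give the authors' formulas, so the following is only a rough, hypothetical sketch of the general idea, not the paper's method. It builds each sentence vector as a TF-IDF-weighted average of word vectors, connects sentences by cosine similarity, and ranks them with personalized PageRank, where the personalization vector stands in for the user's intention. Several assumptions are made loudly here: sentences are already tokenized; the word vectors are a toy, hypothetical dictionary; "intention" is approximated by overlap with a set of user query terms (a made-up proxy); and the ranking graph is a plain homogeneous sentence graph, a simplified substitute for the heterogeneous sentence network described above. All function names are illustrative.

```python
import math
from collections import Counter


def tfidf_weights(sentences):
    """For each tokenized sentence, map token -> TF-IDF weight.

    Uses a smoothed IDF, log((1 + n) / (1 + df)) + 1, so every weight is positive.
    """
    n = len(sentences)
    df = Counter(tok for sent in sentences for tok in set(sent))
    return [{t: (c / len(sent)) * (math.log((1 + n) / (1 + df[t])) + 1)
             for t, c in Counter(sent).items()} for sent in sentences]


def sentence_vector(weights, embeddings, dim):
    """TF-IDF-weighted average of the vectors of in-vocabulary tokens."""
    vec, total = [0.0] * dim, 0.0
    for tok, w in weights.items():
        if tok in embeddings:
            for k, x in enumerate(embeddings[tok]):
                vec[k] += w * x
            total += w
    return [x / total for x in vec] if total > 0 else vec


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def intention_prior(sentences, query_terms):
    """Hypothetical intention proxy: share of the query terms each sentence contains."""
    n = len(sentences)
    if not query_terms:
        return [1.0 / n] * n
    raw = [len(set(s) & query_terms) / len(query_terms) for s in sentences]
    total = sum(raw)
    return [r / total for r in raw] if total > 0 else [1.0 / n] * n


def rank_sentences(sentences, embeddings, dim, query_terms, d=0.85, iters=100, tol=1e-9):
    """Personalized PageRank over a cosine-similarity sentence graph."""
    n = len(sentences)
    vecs = [sentence_vector(w, embeddings, dim) for w in tfidf_weights(sentences)]
    sim = [[max(cosine(vecs[i], vecs[j]), 0.0) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out = [sum(row) for row in sim]
    p = intention_prior(sentences, query_terms)
    scores = p[:]
    for _ in range(iters):
        # Score held by nodes with no outgoing edges is redistributed by the prior p.
        dangling = sum(scores[j] for j in range(n) if out[j] == 0)
        new = [(1 - d) * p[i] + d * (dangling * p[i] + sum(
                   scores[j] * sim[j][i] / out[j] for j in range(n) if out[j] > 0))
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores


def summarize(sentences, embeddings, dim, query_terms, k=2):
    """Pick the k highest-ranked sentences and return them in document order."""
    if not sentences:
        return []
    scores = rank_sentences(sentences, embeddings, dim, query_terms)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    embeddings = {  # hypothetical 2-d word vectors, for illustration only
        "bank": [1.0, 0.1], "loan": [0.9, 0.2], "audit": [0.8, 0.3],
        "weather": [0.0, 1.0], "rain": [0.1, 0.9],
    }
    doc = [
        "the bank approved the loan".split(),
        "auditors reviewed the bank loan audit".split(),
        "heavy rain and bad weather today".split(),
    ]
    print(summarize(doc, embeddings, dim=2, query_terms={"audit", "loan"}, k=1))
```

Putting the intention signal into the PageRank personalization vector is one simple way to let a user preference bias an otherwise purely structural ranking; the teleport probability `1 - d` controls how strongly that prior dominates the graph structure.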

Original language: English
Title of host publication: Database Systems for Advanced Applications - 24th International Conference, DASFAA 2019, Proceedings
Editors: Joao Gama, Yongxin Tong, Guoliang Li, Jun Yang, Juggapong Natwichai
Publisher: Springer Verlag
Pages: 572-587
Number of pages: 16
ISBN (Print): 9783030185787
Publication status: Published - 2019 Apr 25
Event: 24th International Conference on Database Systems for Advanced Applications, DASFAA 2019 - Chiang Mai, Thailand
Duration: 2019 Apr 22 to 2019 Apr 25

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11447 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 24th International Conference on Database Systems for Advanced Applications, DASFAA 2019
Country/Territory: Thailand
City: Chiang Mai
Period: 19-04-22 to 19-04-25

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
