
Quantifying the informativeness for biomedical literature summarization: An itemset mining method.
Comput Methods Programs Biomed. 2017 Jul; 146:77-89.

Abstract

OBJECTIVE

Automatic text summarization tools can help users in the biomedical domain to access information efficiently from a large volume of scientific literature and other sources of text documents. In this paper, we propose a summarization method that combines itemset mining and domain knowledge to construct a concept-based model and to extract the main subtopics from an input document. Our summarizer quantifies the informativeness of each sentence using the support values of itemsets appearing in the sentence.
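The abstract states only that sentence informativeness is computed from the support values of the itemsets that appear in a sentence. The sketch below assumes the simplest reading of that idea: a sentence's score is the plain sum of the absolute supports of every frequent itemset fully contained in its concept set, and the k highest-scoring sentences are kept in document order. The function names, the summation rule, and the concept labels are illustrative assumptions, not the authors' published formula.

```python
from typing import Dict, FrozenSet, List, Set


def informativeness(concepts: Set[str], supports: Dict[FrozenSet[str], int]) -> int:
    """Assumed scoring rule: sum the support of every frequent itemset
    that is a subset of the sentence's concept set."""
    return sum(sup for itemset, sup in supports.items() if itemset <= concepts)


def select_sentences(sentence_concepts: List[Set[str]],
                     supports: Dict[FrozenSet[str], int], k: int) -> List[int]:
    """Return indices of the k highest-scoring sentences, in document order."""
    scores = [informativeness(c, supports) for c in sentence_concepts]
    # sorted() is stable, so ties keep their original document order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical frequent itemsets with absolute support counts.
supports = {
    frozenset({"GENE"}): 3,
    frozenset({"NEOPLASM"}): 2,
    frozenset({"GENE", "NEOPLASM"}): 2,
}
sentences = [{"GENE", "NEOPLASM", "CELL"}, {"CELL"}, {"GENE"}]
print([informativeness(c, supports) for c in sentences])  # [7, 0, 3]
print(select_sentences(sentences, supports, k=2))         # [0, 2]
```

Under this assumption, a sentence that covers several co-occurring frequent concepts accumulates support from each single concept and from their combinations, which is why the first sentence outranks one that mentions only a single frequent concept.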

METHODS

To address the concept-level analysis of text, our method initially maps the original document to biomedical concepts using the Unified Medical Language System (UMLS). Then, it discovers the essential subtopics of the text using a data mining technique, namely itemset mining, and constructs the summarization model. The employed itemset mining algorithm extracts a set of frequent itemsets containing correlated and recurrent concepts of the input document. The summarizer selects the most related and informative sentences and generates the final summary.
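The concept mapping and itemset mining steps can be sketched as follows, treating each sentence as a transaction of concepts. This is a minimal sketch under stated assumptions: the small CONCEPT_MAP dictionary is a hypothetical stand-in for UMLS concept mapping, and because the abstract does not name the mining algorithm, a textbook Apriori search over an absolute minimum support threshold (min_support) is used here in its place.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# Hypothetical term-to-concept dictionary; a stand-in for UMLS concept mapping.
CONCEPT_MAP = {
    "brca1": "GENE", "gene": "GENE",
    "tumor": "NEOPLASM", "cancer": "NEOPLASM",
    "mutation": "MUTATION", "breast": "BREAST",
}


def to_concepts(sentence: str) -> Set[str]:
    """Map a sentence to the set of concepts whose terms it contains."""
    words = sentence.lower().replace(".", "").split()
    return {CONCEPT_MAP[w] for w in words if w in CONCEPT_MAP}


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset contained in at least min_support transactions."""
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Count: keep candidates that meet the minimum support threshold.
        current = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                current[c] = support
        frequent.update(current)
        k += 1
    return frequent


document = [
    "BRCA1 mutation increases breast cancer risk.",
    "The tumor was removed surgically.",
    "Gene mutation in breast tissue was observed.",
    "Patients were followed for five years.",
]
transactions = [to_concepts(s) for s in document]
frequent = apriori(transactions, min_support=2)
print(frequent[frozenset({"GENE", "MUTATION", "BREAST"})])  # 2
```

In this toy example, raising min_support to 3 leaves no frequent itemsets at all, while min_support=1 admits every subset of every sentence's concept set, a small-scale view of how the threshold controls which concept combinations reach the summarization model.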

RESULTS

We evaluate the performance of our itemset-based summarizer in a set of experiments using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. We compare the proposed method with GraphSum, TexLexAn, SweSum, SUMMA, AutoSummarize, the term-based version of the itemset-based summarizer, and two baselines. The itemset-based summarizer outperforms all of the compared methods, achieving the best scores for every assessed ROUGE metric (R-1: 0.7583, R-2: 0.3381, R-W-1.2: 0.0934, and R-SU4: 0.3889). We also perform a set of preliminary experiments to determine the best value for the minimum support threshold used in the itemset mining algorithm. The results show that this threshold directly affects the accuracy of the summarization model: assigning extreme threshold values causes a significant decrease in summarization performance.
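The R-1 and R-2 scores reported above are n-gram recall measures against a reference summary. The sketch below shows the standard ROUGE-N recall computation for a single reference, assuming lowercase whitespace tokenization with no stemming or stopword removal; it is a simplified illustration, not the official ROUGE toolkit, and it does not cover the R-W-1.2 or R-SU4 variants.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, reference: str, n: int) -> float:
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    return overlap / total


reference = "the gene mutation increases cancer risk"
candidate = "the gene mutation raises risk"
print(round(rouge_n_recall(candidate, reference, 1), 4))  # 0.6667
print(rouge_n_recall(candidate, reference, 2))            # 0.4
```

Here four of the six reference unigrams and two of the five reference bigrams appear in the candidate, giving recall values of 4/6 and 2/5.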

CONCLUSION

Compared with statistical, similarity-based, and word-frequency methods, the proposed method shows that the summarization model obtained from concept extraction and itemset mining provides the summarizer with an effective metric for measuring the informative content of sentences. This can improve the performance of biomedical literature summarization.

Authors and Affiliations

Milad Moradi: Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran. Electronic address: milad.moradi@ec.iut.ac.ir.
Nasser Ghadiri: Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran. Electronic address: nghadiri@cc.iut.ac.ir.

Pub Type(s)

Journal Article

Language

eng

PubMed ID

28688492

Citation

Moradi, Milad, and Nasser Ghadiri. "Quantifying the Informativeness for Biomedical Literature Summarization: An Itemset Mining Method." Computer Methods and Programs in Biomedicine, vol. 146, 2017, pp. 77-89.
Moradi M, Ghadiri N. Quantifying the informativeness for biomedical literature summarization: An itemset mining method. Comput Methods Programs Biomed. 2017;146:77-89.
Moradi, M., & Ghadiri, N. (2017). Quantifying the informativeness for biomedical literature summarization: An itemset mining method. Computer Methods and Programs in Biomedicine, 146, 77-89. https://doi.org/10.1016/j.cmpb.2017.05.011
Moradi M, Ghadiri N. Quantifying the Informativeness for Biomedical Literature Summarization: an Itemset Mining Method. Comput Methods Programs Biomed. 2017;146:77-89. PubMed PMID: 28688492.
TY  - JOUR
T1  - Quantifying the informativeness for biomedical literature summarization: An itemset mining method.
AU  - Moradi,Milad,
AU  - Ghadiri,Nasser,
Y1  - 2017/05/27/
PY  - 2016/08/31/received
PY  - 2017/04/07/revised
PY  - 2017/05/26/accepted
PY  - 2017/7/10/entrez
PY  - 2017/7/10/pubmed
PY  - 2017/12/26/medline
KW  - Biomedical text mining
KW  - Concept-based text analysis
KW  - Data mining
KW  - Domain knowledge
KW  - Frequent itemset mining
KW  - Informativeness
SP  - 77
EP  - 89
JF  - Computer methods and programs in biomedicine
JO  - Comput Methods Programs Biomed
VL  - 146
SN  - 1872-7565
UR  - https://www.unboundmedicine.com/medline/citation/28688492/Quantifying_the_informativeness_for_biomedical_literature_summarization:_An_itemset_mining_method_
L2  - https://linkinghub.elsevier.com/retrieve/pii/S0169-2607(16)30925-7
DB  - PRIME
DP  - Unbound Medicine
ER  - 