Indonesian Automatic Text Summarization based on A New
Clustering Method in Sentence Level
Zefeng Cai
School of Information
Science and Technology
Guangdong University
of Foreign Studies
Guangzhou, China
591736923@qq.com
Nankai Lin
School of Information
Science and Technology
Guangdong University
of Foreign Studies
Guangzhou, China
Neakail@outlook.com
Chuyu Ma
School of Information
Science and Technology
Guangdong University
of Foreign Studies
Guangzhou, China
1436007093@qq.com
Shengyi Jiang
Eastern Language
Processing Center
Guangzhou, China
jiangshengyi@163.com
ABSTRACT
With the development of the Internet, the amount of
information grows exponentially, and automatic text
summarization technology becomes more and more important. At
present, most research on automatic summarization techniques
targets widely used languages such as Chinese and English, while
little work addresses low-resource languages. In this paper, we
construct an automatic summarization dataset for the Indonesian
language and conduct related research on Indonesian automatic
summarization. We propose a new and efficient extraction-based
automatic text summarization method based on sentence similarity
clustering. Following the idea of clustering, the method considers
the semantics of sentences and clusters them according to the
similarity between sentences. We then extract sentences according
to a set of rules to obtain the final summary. This method not only
ensures the integrity, criticality and importance of the summary,
but also reduces its information redundancy. In the evaluation, our
method achieved
good results and exceeded all the baselines in terms of the F1
scores of ROUGE-1, ROUGE-2 and ROUGE-3.
Keywords
Indonesian; Extractive Summarization; Sentence Similarity;
Sentence Clustering
1 Introduction
With the development of Internet technology, the relationship
between the Internet and life has become closer and closer.
Although the vast amount of information on the Internet has brought
convenience to people's lives, it has also caused the problem of
“information explosion”. It is difficult for people to get the
information which they want quickly and accurately. Therefore,
how to quickly refine the key information and obtain the essential
information has become an urgent problem for us to solve.
Automatic Text Summarization (ATS) is the process by which a
machine, implementing a particular algorithm or method, condenses
a text to obtain its important sentences. One method of producing
a text summary is extraction-based summarization. An
extraction-based method extracts important sentences from an
article and then combines them into one summary; therefore, the
sentences yielded by this approach are parts of the original text
without modification. Another method is generation-based
summarization, which generates a summary containing sentences
that do not exist in the original article and differ from its
sentences.
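To make the extraction-based idea concrete, the following is a minimal sketch and not the method proposed in this paper. It assumes a naive regular-expression sentence splitter and uses average word frequency as a toy importance score; the function name extractive_summary and the parameter k are hypothetical, introduced only for illustration. The point it shows is that every output sentence is copied verbatim from the input.

```python
import re
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"\w+", sentence.lower())


def extractive_summary(text: str, k: int = 2) -> list[str]:
    """Return k sentences copied verbatim from text, in original order.

    Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    Real Indonesian text would need a proper sentence tokenizer.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Document-wide word frequencies act as a toy importance signal.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        # Mean document frequency of the words in the sentence.
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Choose the k highest-scoring sentence indices, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    doc = "Jakarta is the capital. Jakarta is a big city. I like tea."
    for sentence in extractive_summary(doc, k=2):
        print(sentence)
```

A generation-based system, by contrast, would be free to output sentences that appear nowhere in the input, so no such verbatim-copy property would hold.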
For widely used languages such as English and Chinese, there are
many mature technologies for both methods. However, because of