CICo 2020
Conference Management System

iAMStemmer: A Comprehensive Approach of Bahasa Indonesia (Indonesian language) Stemming Algorithm
Ashari Imamuddin, Mufid Junaedi, Mohamad Anas Sobarnas, Iskandar

Computer Science Department, Sekolah Tinggi Teknologi Muhammadiyah Cileungsi
Jl. Anggrek No.25, Perum. PTSC, Cileungsi, Kec. Cileungsi, Bogor, Jawa Barat 16820, Indonesia
ashari[at]sttmcileungsi.ac.id, mufid[at]sttmcileungsi.ac.id, anas[at]sttmcileungsi.ac.id, iskandar[at]sttmcileungsi.ac.id


Abstract

Nazief and Adriani pioneered the confix-stripping stemming algorithm for Bahasa Indonesia (Indonesian language), which removes affixes (prefixes and suffixes) and then looks up the resulting word in a stem dictionary. However, their algorithm, SNA (stemmer of Nazief and Adriani), is ambiguous on words ending in the syllables "ku", "mu", and "nya", such as "berlaku" and "sebeku". It assumes that "ku" in both words is a possessive, so it strips it and produces the meaningless stems "berla" and "sebe". The algorithm also fails on the word "seolah-olah" because it treats the syllable "lah" as a particle and removes it, even though "lah" belongs to the root "olah". A more recent algorithm by Asian improved on SNA by enhancing confix stripping (CSS) for regular repetition words such as "berlari-lari", "bersama-sama", and "terbata-bata", and the most recent stemmer was developed by Suhartono. However, these stemmers still fail on irregular repetition words such as "menari-nari", "memutar-mutar", and "menyama-nyamakan". We developed iAMStemmer, a comprehensive approach that addresses these shortcomings of the existing algorithms. Our methods are: matching each new word produced by affix removal against the dictionary; adding more stems; repeated stemming for words with two stacked prefixes, such as "dikesampingkan" and "diketahui", which both carry the prefixes "di" and "ke"; removing a doubled prefix in a single step, as with "se" in "seseorang" and "sesekali"; improving the stemming rules for the confixes "meny-" and "memper-"; and enhancing the rules for repetition words. These changes modify the method and add rules to the existing algorithms. Our stemmer reduces the number of under-stemmed and over-stemmed words, handles repetition words better, and improves accuracy, raising the stemming success rate by about 3% over the 93% achieved by the existing algorithm.
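The dictionary-checked affix removal and the repetition-word handling described above can be illustrated with a minimal Python sketch. It is not the iAMStemmer rule set: the root dictionary `ROOTS`, the affix lists, and the helper names (`suffix_candidates`, `prefix_candidates`, `strip_prefixes`, `stem_single`, `stem_reduplicated`, `stem`) are hypothetical, and only a small subset of prefix and nasal-recoding rules is included. Suffixes are considered in the Nazief and Adriani order (particle, possessive, derivational suffix), but a candidate is accepted only when it lands on a dictionary root, so "berlaku" reaches "laku" rather than "berla". For irregular repetition words the right half ("nari") lacks the prefix, so the sketch re-attaches leading pieces of the left half ("menari") until both halves share a root.

```python
# Hypothetical toy root dictionary; a real stemmer needs a full Indonesian root list.
ROOTS = {"laku", "beku", "olah", "lari", "sama", "bata", "tari",
         "putar", "orang", "kali", "tahu", "samping"}

# Hypothetical affix subsets, not the complete confix-stripping tables.
PARTICLES = ("lah", "kah", "tah", "pun")   # inflectional particles
POSSESSIVES = ("ku", "mu", "nya")          # possessive pronouns
SUFFIXES = ("kan", "an", "i")              # derivational suffixes


def suffix_candidates(word):
    """List the word plus every form obtained by optionally stripping a particle,
    then a possessive, then a derivational suffix. The unstripped word comes
    first, so a word that is already a root is never cut."""
    forms = [word]
    for group in (PARTICLES, POSSESSIVES, SUFFIXES):
        forms += [f[: -len(s)] for f in forms for s in group
                  if f.endswith(s) and len(f) > len(s) + 2]
    return forms


def prefix_candidates(word):
    """Forms left after removing one prefix, with simple nasal recoding."""
    cands = []
    if word.startswith("me"):
        cands.append(word[2:])                    # memakan -> makan
        if word.startswith("memper"):
            cands.append(word[6:])                # memperbaiki -> baiki
        if word.startswith("meny"):
            cands.append("s" + word[4:])          # menyama -> sama
        elif word.startswith("meng"):
            cands += ["k" + word[4:], word[4:]]   # mengirim -> kirim
        elif word.startswith("mem"):
            cands += ["p" + word[3:], word[3:]]   # memutar -> putar
        elif word.startswith("men"):
            cands += ["t" + word[3:], word[3:]]   # menari -> tari
    for p in ("sese", "ber", "ter", "per", "di", "ke", "se"):
        if word.startswith(p) and len(word) > len(p):
            cands.append(word[len(p):])           # doubled "sese" goes in one step
    return cands


def strip_prefixes(word, depth=0):
    """Depth-first search for a root reachable by removing up to three prefixes,
    checking the dictionary after every removal."""
    if word in ROOTS:
        return word
    if depth == 3:
        return None
    for cand in prefix_candidates(word):
        root = strip_prefixes(cand, depth + 1)
        if root is not None:
            return root
    return None


def stem_single(word):
    """Try every suffix-stripped form; accept only a dictionary hit."""
    for form in suffix_candidates(word):
        root = strip_prefixes(form)
        if root is not None:
            return root
    return None


def stem_reduplicated(word):
    """Both halves must reduce to the same root. For irregular forms the right
    half lacks the prefix, so leading pieces of the left half are re-attached
    to it (nari -> menari)."""
    left, right = word.split("-", 1)
    root = stem_single(left)
    if root is None:
        return None
    for i in range(len(left)):
        if stem_single(left[:i] + right) == root:
            return root
    return None


def stem(word):
    """Return the root, or the word unchanged when no dictionary root is found."""
    word = word.lower()
    root = stem_reduplicated(word) if "-" in word else stem_single(word)
    return root if root is not None else word
```

With this toy dictionary, `stem("berlaku")` returns "laku", `stem("menari-nari")` returns "tari", and `stem("diketahui")` returns "tahu". Trying every candidate is brute force, which is acceptable here because a word carries only a handful of affixes.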
iAMStemmer also produces the root of confixed compound words, for example the compound stem "tanggung jawab" from "mempertanggungjawabkan".
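For confixed compound words, one possible approach is to strip a whole confix and look the remaining core up in a separate compound table. In Indonesian spelling a compound such as "tanggung jawab" is written solid once it takes a prefix and a suffix together, so the core "tanggungjawab" has no match in a root list that stores the two-word form. The table `COMPOUNDS`, the confix list `CONFIXES`, and the function `stem_compound` are hypothetical illustrations under these assumptions, not the authors' implementation.

```python
# Hypothetical compound-stem table: the solid spelling found inside affixed
# words maps to the two-word dictionary form.
COMPOUNDS = {"tanggungjawab": "tanggung jawab"}

# Hypothetical subset of (prefix, suffix) confixes that can wrap a compound.
CONFIXES = [("memper", "kan"), ("diper", "kan"), ("per", "an")]


def stem_compound(word):
    """Return the two-word compound stem of a confixed compound, or None."""
    for prefix, suffix in CONFIXES:
        if word.startswith(prefix) and word.endswith(suffix):
            # Strip prefix and suffix together, then look up the solid core.
            core = word[len(prefix): len(word) - len(suffix)]
            if core in COMPOUNDS:
                return COMPOUNDS[core]
    return None
```

Here `stem_compound("mempertanggungjawabkan")` and `stem_compound("pertanggungjawaban")` both return "tanggung jawab", while a word with no listed compound core returns `None` and can fall through to ordinary stemming.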

Keywords: algorithm, stemming, Indonesian, stemming bahasa Indonesia

Topic: Computer and Mathematics

Corresponding Author: Ashari Imamuddin
