Sentiment Analysis on Indonesian Text

November 20, 2025 · failed · nlp sentiment-analysis indonesian

Objective

Build a sentiment classifier for Indonesian social media text.

Approach

Used a pre-trained IndoBERT model
Fine-tuned it on an Indonesian Twitter dataset
3-class classification (positive/neutral/negative)

Results

Accuracy: 65% (below target of 80%)
Model was confused by slang and mixed-language (code-switched) text
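
A single accuracy figure hides which classes get mixed up. A per-class breakdown is one way to look at that. The sketch below is plain Python with made-up labels (not data from this experiment); the function names are illustrative.

```python
from collections import Counter

LABELS = ["negative", "neutral", "positive"]

def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Return {true_label: {predicted_label: count}}."""
    counts = Counter(zip(y_true, y_pred))
    return {t: {p: counts[(t, p)] for p in labels} for t in labels}

def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    recall = {}
    for label, row in matrix.items():
        total = sum(row.values())
        recall[label] = row[label] / total if total else 0.0
    return recall

# Toy labels for illustration only; NOT results from this experiment.
y_true = ["positive", "positive", "neutral", "negative", "negative", "neutral"]
y_pred = ["positive", "neutral", "neutral", "negative", "neutral", "neutral"]

matrix = confusion_matrix(y_true, y_pred)
accuracy = sum(matrix[label][label] for label in LABELS) / len(y_true)
recall = per_class_recall(matrix)
```

In this toy case every error lands in the neutral column. On real predictions, a pattern like that would suggest the model falls back to neutral when it cannot parse slang or English fragments.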

Why It Failed

Dataset too small (only 5k samples)
Lots of code-switching (Indonesian + English)
Slang words not in pre-trained vocabulary
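
To get a rough sense of the slang problem, one option is to measure how many tokens fall outside a known word list before and after slang normalization. This is a sketch under assumptions: the slang map and the vocabulary below are tiny hypothetical stand-ins, not the lexicon or tokenizer used in this experiment. Subword tokenizers split unseen words into pieces rather than dropping them, so a word-level rate like this is only a proxy.

```python
import re

# Hypothetical mini-lexicon for illustration; a usable one would be far larger.
SLANG_MAP = {
    "gk": "tidak",
    "ga": "tidak",
    "bgt": "banget",
    "yg": "yang",
    "udh": "sudah",
}

def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def normalize(tokens, slang_map=SLANG_MAP):
    """Replace known slang tokens with their standard form."""
    return [slang_map.get(tok, tok) for tok in tokens]

def oov_rate(tokens, vocab):
    """Fraction of tokens that are not in the given vocabulary."""
    if not tokens:
        return 0.0
    return sum(tok not in vocab for tok in tokens) / len(tokens)

# Stand-in word list (assumption); not the real pre-trained vocabulary.
vocab = {"filmnya", "tidak", "bagus", "banget", "yang", "sudah"}

tweet = "Filmnya gk bagus bgt"
raw_tokens = tokenize(tweet)
clean_tokens = normalize(raw_tokens)

raw_oov = oov_rate(raw_tokens, vocab)
clean_oov = oov_rate(clean_tokens, vocab)
```

The same `oov_rate` check against an English word list could flag code-switched tweets, which would help quantify how much of the dataset mixes languages.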

Lessons Learned

Need a larger, more diverse dataset that covers slang and code-switched text. Consider data augmentation or semi-supervised learning to get more out of the labeled data.
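
For the semi-supervised option, a common starting point is pseudo-labeling: run the current model on unlabeled tweets and keep only the predictions it is confident about as extra training data. The sketch below shows just the selection step. The 0.9 threshold, the function name, and the probabilities are assumptions for illustration, not values from this experiment.

```python
LABELS = ["negative", "neutral", "positive"]

def select_pseudo_labels(texts, probs, threshold=0.9, labels=LABELS):
    """Keep (text, label) pairs whose top class probability meets the threshold."""
    selected = []
    for text, p in zip(texts, probs):
        best = max(range(len(p)), key=p.__getitem__)  # index of the top class
        if p[best] >= threshold:
            selected.append((text, labels[best]))
    return selected

# Placeholder texts and made-up model probabilities, for illustration only.
unlabeled = ["tweet a", "tweet b", "tweet c"]
probs = [
    [0.02, 0.03, 0.95],  # confident positive: kept
    [0.40, 0.35, 0.25],  # uncertain: dropped
    [0.91, 0.06, 0.03],  # confident negative: kept
]

pseudo_labeled = select_pseudo_labels(unlabeled, probs)
```

With a model at 65% accuracy, confident predictions can still be wrong, so a high threshold and a manual spot check of the selected examples would be sensible before retraining on them.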