Improving Topic Modeling Performance through N-gram Removal
Almgerbi M.; Poggioni V.
2021
Abstract
In recent years, topic modeling has been increasingly adopted for finding conceptual patterns in large corpora of digital documents and organizing them accordingly. To enhance the performance of topic modeling algorithms such as Latent Dirichlet Allocation (LDA), multiple preprocessing steps have been proposed. In this paper, we introduce N-gram Removal, a novel preprocessing procedure based on the systematic elimination of a dynamic number of repeated words in text documents. We evaluated the effects of N-gram Removal through four different performance metrics and concluded that its application is effective at improving the performance of LDA and enhances the human interpretability of topic models.
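The abstract does not specify the dynamic selection criterion, so the following is only a minimal sketch of this style of preprocessing: word n-grams (here bigrams) are counted across the corpus and the most frequent ones are stripped from each document before LDA. The function names remove_top_ngrams and extract_ngrams, and the parameters n and top_k, are hypothetical; in particular, the fixed top_k cutoff stands in for the paper's dynamic threshold.

```python
# Hypothetical sketch of an "N-gram Removal"-style preprocessing step.
# Assumption: frequent word n-grams are counted corpus-wide and their
# occurrences removed from each document before topic modeling.
from collections import Counter

def extract_ngrams(tokens, n):
    """Return the word n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def remove_top_ngrams(docs, n=2, top_k=50):
    """Strip occurrences of the top_k most frequent n-grams from each document."""
    tokenized = [doc.lower().split() for doc in docs]
    counts = Counter()
    for tokens in tokenized:
        counts.update(extract_ngrams(tokens, n))
    frequent = {ng for ng, _ in counts.most_common(top_k)}

    cleaned = []
    for tokens in tokenized:
        out, i = [], 0
        while i < len(tokens):
            if tuple(tokens[i:i + n]) in frequent:
                i += n  # skip the entire frequent n-gram
            else:
                out.append(tokens[i])
                i += 1
        cleaned.append(" ".join(out))
    return cleaned
```

The cleaned documents would then be vectorized and passed to an LDA implementation (e.g., gensim or scikit-learn) as usual; only the removal step above is specific to this preprocessing idea.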