Topic models are useful tools for extracting the most salient themes within a collection of documents, grouping them to construct clusters representative of each specific topic. These clusters summarize and represent the semantic contents of the documents for better document interpretation. In this work, we present a light approach able to learn topic representations in a Self-Supervised fashion. More specifically, we propose a lightweight and scalable architecture using a seed-word driven approach to simultaneously co-learn a representation from a document and its corresponding word embeddings. The results obtained on a variety of datasets of different sizes and natures show that our model is capable of extracting meaningful topics. Furthermore, our experiments on five benchmark datasets illustrate that our model outperforms both traditional and neural topic modelling baseline models in terms of different coherence and clustering accuracy measures.

A self-supervised seed-driven approach to topic modelling and clustering

Raballo, Andrea;
2024

Abstract

Topic models are useful tools for extracting the most salient themes within a collection of documents, grouping them to construct clusters representative of each specific topic. These clusters summarize and represent the semantic contents of the documents for better document interpretation. In this work, we present a light approach able to learn topic representations in a Self-Supervised fashion. More specifically, we propose a lightweight and scalable architecture using a seed-word driven approach to simultaneously co-learn a representation from a document and its corresponding word embeddings. The results obtained on a variety of datasets of different sizes and natures show that our model is capable of extracting meaningful topics. Furthermore, our experiments on five benchmark datasets illustrate that our model outperforms both traditional and neural topic modelling baseline models in terms of different coherence and clustering accuracy measures.
2024
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11391/1587342
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
social impact