| Topic: | Large language models meet topic models: theory, algorithm and applications |
| Date: | 05/02/2026 |
| Time: | 2:00 pm - 3:00 pm |
| Venue: | ICS L1 · CUHK |
| Category: | Seminars |
| Speaker: | Professor Xin Bing |
| PDF: | PROF-Xin-Bing-_5-FEB-2026.pdf |
| Details: | Abstract: Topic models are a foundational tool for extracting structure from large text corpora, and recent advances in word embeddings from large language models create new opportunities to significantly improve their interpretability and performance. This talk consists of three parts. The first part introduces classical topic models and reviews prior work in this area, including my own contributions. The second part focuses on leveraging static word embeddings—developed in the context of large language models—to improve classical topic modeling. As will be discussed, this approach is closely related to finite discrete mixture models under a softmax parameterization. The third part explores how contextual word embeddings can be incorporated into topic models to extract more thematically meaningful structure from text data. We model contextual embeddings as arising from a finite mixture distribution on the embedding space, where each mixture component is itself a large Gaussian mixture model. This hierarchical formulation distinguishes our approach from existing methods in the literature. |
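The hierarchical formulation described in the abstract (a finite mixture over topics, where each topic is itself a Gaussian mixture over the embedding space) can be illustrated with a minimal generative sketch. All dimensions, component counts, and parameter values below are hypothetical placeholders, not the speaker's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

D = 4  # embedding dimension (illustrative)
K = 3  # number of topics: outer mixture components
M = 2  # Gaussian components inside each topic's inner mixture

# Hypothetical parameters: outer topic weights, per-topic inner
# component weights, and diagonal-Gaussian means/scales per component.
topic_weights = rng.dirichlet(np.ones(K))
comp_weights = rng.dirichlet(np.ones(M), size=K)  # shape (K, M)
means = rng.normal(size=(K, M, D))
stds = np.full((K, M, D), 0.5)

def sample_embedding():
    """Draw one contextual embedding from the hierarchical mixture:
    first pick a topic, then a Gaussian component within that topic,
    then sample the embedding from that Gaussian."""
    k = rng.choice(K, p=topic_weights)
    m = rng.choice(M, p=comp_weights[k])
    return k, rng.normal(means[k, m], stds[k, m])

samples = [sample_embedding() for _ in range(5)]
```

Each draw is tagged with its topic index `k`, mirroring how the model assigns a theme to each contextual embedding before generating it from that theme's Gaussian mixture.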