Clustering and Visualising Documents using Word Embeddings

This lesson uses word embeddings and clustering algorithms in Python to identify groups of similar documents in a corpus of approximately 9,000 academic abstracts. It will teach you the basics of dimensionality reduction for extracting structure from a large corpus and how to evaluate your results.

DOI: https://doi.org/10.46430/phen0111

This lesson is part of a special series in partnership with Jisc and The National Archives.

Contents