A simple Python 3 tool to detect similarities between files within a repository. The document similarity code is adapted from Jonathan Mugan's tutorial: https://www.oreilly.com/learning/how-do-i-compare-document-similarity-using-python
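To make the idea of file similarity concrete, here is a minimal standard-library sketch that scores texts against each other with tf-idf weights and cosine similarity. It is not the tool's code and not the tutorial's code; the function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`, `similarity_matrix`) and the smoothed idf formula are assumptions chosen for illustration, and the inputs are plain strings rather than files read from a repository.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (smoothed tf-idf)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(docs):
    """Pairwise cosine similarity between all documents."""
    vectors = tfidf_vectors(docs)
    return [[cosine_similarity(a, b) for b in vectors] for a in vectors]
```

Calling `similarity_matrix` on the contents of each file gives a square matrix in which entries close to 1 mark near-duplicate files and entries of 0 mark files with no tokens in common.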
In your example, if you use PCA to initialize your t-SNE you get widely spaced centroids; if you use random initialization you get tiny centroids and an uninteresting picture.
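A small sketch of how the two initializations can be compared, assuming scikit-learn's `TSNE` and synthetic data. The helper name `embed`, the three Gaussian blobs, and the perplexity value are illustrative assumptions, not the example the comment refers to.

```python
import numpy as np
from sklearn.manifold import TSNE


def embed(X, init):
    """Run 2-D t-SNE on X with the given initialization ('pca' or 'random')."""
    tsne = TSNE(n_components=2, init=init, perplexity=10, random_state=0)
    return tsne.fit_transform(X)


# Synthetic data: three well-separated Gaussian blobs in 20 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 20))
X = np.vstack([center + rng.normal(size=(30, 20)) for center in centers])

emb_pca = embed(X, "pca")        # starts from the first two principal components
emb_random = embed(X, "random")  # starts from small random coordinates
```

Plotting `emb_pca` and `emb_random` side by side is the simplest way to see how much the starting layout affects the final picture for a given dataset.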
- tokenizing and stemming each synopsis
- transforming the corpus into vector space using tf-idf
- calculating cosine distance between each document as a measure of similarity
- clustering the documents using the k-means algorithm
- using multidimensional scaling to reduce dimensionality within the corpus
- plotting the clustering output using matplotlib and mpld3
- conducting a hierarchical clustering on the corpus using Ward clustering
- plotting a Ward dendrogram
- topic modeling using Latent Dirichlet Allocation (LDA)
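The tf-idf, cosine distance, and k-means steps listed above can be sketched roughly as follows, assuming scikit-learn is installed. The function name `cluster_synopses` and the sample texts are hypothetical; stemming, multidimensional scaling, plotting, Ward clustering, and LDA are left out to keep the sketch short.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def cluster_synopses(synopses, n_clusters):
    """Vectorize texts with tf-idf, then return (cosine distances, k-means labels)."""
    # Built-in tokenizer with English stop words; no stemming in this sketch.
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(synopses)

    # Cosine distance = 1 - cosine similarity, one row and column per document.
    distance = 1 - cosine_similarity(tfidf)

    # k-means on the tf-idf vectors; random_state fixed for repeatable runs.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(tfidf)
    return distance, labels


# Made-up synopses, used only to exercise the function.
synopses = [
    "A detective hunts a killer through the rainy city.",
    "A police detective investigates a murder in the city.",
    "A killer stalks the city while a detective closes in.",
    "Two astronauts are stranded on a distant planet.",
    "A space crew explores a planet far from Earth.",
    "Astronauts fight to survive in deep space.",
]
distance, labels = cluster_synopses(synopses, n_clusters=2)
```

The `distance` matrix is the kind of input that multidimensional scaling and Ward clustering would take in the later steps, while `labels` gives the cluster assigned to each synopsis.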