klotz: python* + document* + tf-idf*


  1. A simple Python3 tool to detect similarities between files within a repository.
    Document similarity code adapted from Jonathan Mugan's tutorial:
    https://www.oreilly.com/learning/how-do-i-compare-document-similarity-using-python
    2020-03-11 by klotz
  2. tokenizing and stemming each synopsis
    transforming the corpus into vector space using tf-idf
    calculating cosine distance between each document as a measure of similarity
    clustering the documents using the k-means algorithm
    using multidimensional scaling to reduce dimensionality within the corpus
    plotting the clustering output using matplotlib and mpld3
    conducting a hierarchical clustering on the corpus using Ward clustering
    plotting a Ward dendrogram
    topic modeling using Latent Dirichlet Allocation (LDA)
    2018-08-16 by klotz
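The core idea behind the first bookmark (finding similar files in a repository by document similarity) can be sketched with the standard library alone. This is not the bookmarked tool's code or the code from Mugan's tutorial. It is a minimal sketch assuming plain tf-idf weighting (raw term count times log(N / df)) and cosine similarity. The helper names (`tokenize`, `tfidf_vectors`, `cosine`, `similar_pairs`, `load_repo`) and the default threshold of 0.5 are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations
from pathlib import Path


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Map each document name to a sparse {term: tf-idf weight} dict.

    tf is the raw term count and idf is log(N / df), so a term that
    appears in every document gets weight 0.
    """
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n = len(counts)
    df = Counter(term for c in counts.values() for term in c)
    return {
        name: {t: tf * math.log(n / df[t]) for t, tf in c.items()}
        for name, c in counts.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def similar_pairs(docs, threshold=0.5):
    """Return (name_a, name_b, score) for pairs at or above threshold, best first."""
    vecs = tfidf_vectors(docs)
    pairs = [(a, b, cosine(vecs[a], vecs[b]))
             for a, b in combinations(sorted(vecs), 2)]
    return sorted((p for p in pairs if p[2] >= threshold), key=lambda p: -p[2])


def load_repo(root, pattern="*.py"):
    """Hypothetical helper: read every file under root matching pattern."""
    return {str(p): p.read_text(errors="ignore")
            for p in Path(root).rglob(pattern) if p.is_file()}
```

For example, `similar_pairs(load_repo("path/to/repo"))` would list pairs of `.py` files whose tf-idf vectors have a cosine similarity of at least 0.5. Sparse dicts keep memory proportional to the distinct terms in each file; for large repositories a library vectorizer such as scikit-learn's `TfidfVectorizer` is the more practical choice.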
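The first steps of the second bookmark's pipeline (tokenizing and stemming, tf-idf, cosine distance, k-means) can be sketched without third-party libraries. This is an illustrative sketch, not the tutorial's code. The suffix-stripping `tokenize_and_stem` is a crude, hypothetical stand-in for a real stemmer such as NLTK's `SnowballStemmer`; the idf uses the smoothed form ln((1 + n) / (1 + df)) + 1; and k-means uses a deterministic farthest-point initialization instead of random seeds. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Crude suffix list: a hypothetical stand-in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize_and_stem(text):
    """Lowercase, keep alphabetic tokens, and strip at most one suffix."""
    stems = []
    for tok in re.findall(r"[a-z]+", text.lower()):
        for suf in SUFFIXES:
            # Only strip when at least three characters remain.
            if tok.endswith(suf) and len(tok) - len(suf) >= 3:
                tok = tok[: -len(suf)]
                break
        stems.append(tok)
    return stems


def tfidf_matrix(docs):
    """Return (vocab, rows): one L2-normalized tf-idf row per document."""
    counts = [Counter(tokenize_and_stem(d)) for d in docs]
    vocab = sorted({t for c in counts for t in c})
    n = len(docs)
    df = Counter(t for c in counts for t in c)
    # Smoothed idf keeps every weight positive, even for terms in all docs.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for c in counts:
        row = [c[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0  # avoid 0/0 for empty docs
        rows.append([x / norm for x in row])
    return vocab, rows


def cosine_distance_matrix(rows):
    """dist[i][j] = 1 - cos(doc_i, doc_j); rows are already unit length."""
    return [[1 - sum(a * b for a, b in zip(u, v)) for v in rows] for u in rows]


def _sq_dist(u, v):
    """Squared Euclidean distance between two dense vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(rows, k, iters=50):
    """Plain (Euclidean) k-means; returns one cluster label per row."""
    # Deterministic farthest-point initialization: start from the first row,
    # then repeatedly add the row farthest from every chosen centroid.
    centroids = [rows[0]]
    while len(centroids) < k:
        centroids.append(max(rows, key=lambda r: min(_sq_dist(r, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: _sq_dist(r, centroids[j])) for r in rows]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep an empty cluster's old centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Because the rows are unit length, their squared Euclidean distance equals 2 - 2 cos, so Euclidean k-means on them groups documents by cosine similarity; the centroids are not renormalized, so this is plain rather than spherical k-means. With the four documents "the cat chased the mice", "cats chasing mice", "stock markets rallied" and "the stock market rallies", `kmeans(tfidf_matrix(docs)[1], 2)` puts the two cat sentences in one cluster and the two market sentences in the other.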


SemanticScuttle - klotz.me: Tags: python + document + tf-idf

Propulsed by SemanticScuttle