train models for processing documents based on specific needs and requirements. It offers capabilities such as entity recognition, key information extraction, and data validation,
tokenizing and stemming each synopsis
transforming the corpus into vector space using tf-idf
calculating the cosine distance between each pair of documents as a measure of similarity
clustering the documents using the k-means algorithm
using multidimensional scaling to reduce dimensionality within the corpus
plotting the clustering output using matplotlib and mpld3
conducting a hierarchical clustering on the corpus using Ward clustering
plotting a Ward dendrogram
topic modeling using Latent Dirichlet Allocation (LDA)
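The vectorizing, distance, and clustering steps in the list above can be sketched roughly as follows. This is a minimal illustration, not the original author's code: it assumes scikit-learn and SciPy (the list itself names only matplotlib and mpld3), uses four made-up synopses, picks k=2 arbitrarily, and relies on the vectorizer's default tokenizer instead of a custom tokenize-and-stem function. The scaling, plotting, and LDA steps are left out.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in synopses; the real corpus is not shown here.
synopses = [
    "A detective hunts a killer through the rainy city streets.",
    "A police detective tracks a serial killer across the city.",
    "Two friends open a bakery and fall in love over fresh bread.",
    "A baker falls in love with a friend while running a small bakery.",
]

# Transform the corpus into tf-idf vector space (rows are L2-normalised).
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(synopses)

# Cosine distance = 1 - cosine similarity, one value per document pair.
dist = 1.0 - cosine_similarity(tfidf)

# Flat clustering with k-means; k=2 is an assumption for this toy corpus.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans_labels = km.fit_predict(tfidf)

# Hierarchical clustering with Ward linkage on the dense tf-idf rows,
# then cut the tree into two flat clusters for comparison with k-means.
linkage_matrix = linkage(tfidf.toarray(), method="ward")
ward_labels = fcluster(linkage_matrix, t=2, criterion="maxclust")

print(np.round(dist, 2))
print("k-means:", kmeans_labels, "ward:", ward_labels)
```

The same linkage_matrix can be passed to scipy.cluster.hierarchy.dendrogram to draw the Ward dendrogram. Ward linkage is applied here to the tf-idf rows themselves, since it assumes Euclidean geometry on raw observations.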
Image Similarity Search
Reverse Image Search
Object Similarity Search
Robust OCR Document Search
Semantic Search
Cross-modal Retrieval
Probing Perceptual Similarity
Comparing Model Representations
Concept Interpolation
Concept Space Traversal
Image Similarity Search
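Similarity search of this kind is commonly implemented as a nearest-neighbour lookup over embedding vectors. The sketch below shows only that lookup step, under stated assumptions: the embeddings are random placeholders standing in for the output of some image or text encoder, and the helper names normalize and top_k_similar are hypothetical, not part of any library referenced here.

```python
import numpy as np


def normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(vectors, axis=-1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def top_k_similar(query: np.ndarray, index: np.ndarray, k: int = 3):
    """Return (indices, scores) of the k rows of `index` closest to `query`."""
    scores = normalize(index) @ normalize(query)
    order = np.argsort(-scores)[:k]  # highest cosine similarity first
    return order, scores[order]


# Placeholder embeddings: 100 items, 64 dimensions (both sizes are arbitrary).
rng = np.random.default_rng(0)
index_embeddings = rng.normal(size=(100, 64))

# A query that is a slightly perturbed copy of item 42.
query_embedding = index_embeddings[42] + 0.05 * rng.normal(size=64)

neighbors, scores = top_k_similar(query_embedding, index_embeddings, k=3)
print(neighbors, np.round(scores, 3))
```

A brute-force matrix product like this is adequate for small collections; larger indexes usually switch to an approximate nearest-neighbour structure.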