we embed all of our documents and convert them to numerical representations. Second, we find out which documents are most similar to one another. We assume that documents that are highly similar will have the same keywords, so there would be no need to extract keywords for all documents. Third, we only extract keywords from 1 document in each cluster and assign the keywords to all documents in the same cluster.