- Embeddings transform words and sentences into vectors of numbers so that computers can process and compare language.
- This technology powers tools like Siri, Alexa, Google Translate, and generative AI systems like ChatGPT, Bard, and DALL-E.
- In the early days, embeddings were crafted by hand, a time-consuming approach that could not easily capture the nuances of language.
- The 3D hand-crafted embedding app provides an interactive way to explore this concept.
- The star visualization method offers an intuitive way to understand word embeddings.
- Machine learning models like Word2Vec and GloVe revolutionized word embeddings by learning them automatically from large text datasets (a training sketch follows this list).
- Universal Sentence Encoder (USE) extends the concept of word embeddings to entire sentences (see the sentence-embedding sketch below).
- TensorFlow Projector is an advanced tool for interactively exploring high-dimensional data such as word and sentence embeddings (see the export sketch below).
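
As a concrete illustration of learning embeddings from data, here is a minimal sketch of training Word2Vec with gensim. The tiny corpus is a hypothetical placeholder for a real dataset, and the hyperparameters are untuned starting values, not recommendations.

```python
from gensim.models import Word2Vec

# Hypothetical pre-tokenized corpus; a real dataset would be far larger.
corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["dogs", "and", "cats", "are", "pets"],
]

# Train a small model: 50-dimensional vectors, context window of 2 words.
model = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, epochs=50)

# Each word is now a dense vector; words in similar contexts get similar vectors.
print(model.wv["king"][:5])                    # first five dimensions
print(model.wv.similarity("king", "queen"))    # cosine similarity of two words
```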
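For sentence-level embeddings, a minimal sketch using the Universal Sentence Encoder from TensorFlow Hub might look like the following; it assumes `tensorflow_hub` is installed and uses the USE v4 model handle published on tfhub.dev.

```python
import tensorflow_hub as hub

# Load the published Universal Sentence Encoder model.
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

sentences = [
    "Embeddings turn text into numbers.",
    "Vectors let computers compare meaning.",
]

# Each sentence becomes a 512-dimensional vector.
vectors = embed(sentences)
print(vectors.shape)  # (2, 512)
```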
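TensorFlow Projector (projector.tensorflow.org) can load arbitrary embeddings from two tab-separated files: one of vectors and one of labels. A minimal export sketch, assuming the gensim model trained above, could look like this.

```python
# Write vectors.tsv (one embedding per line) and metadata.tsv (one label per
# line) for upload via the Projector's "Load" button. `model` is assumed to
# be the trained gensim Word2Vec model from the earlier sketch.
with open("vectors.tsv", "w") as vf, open("metadata.tsv", "w") as mf:
    for word in model.wv.index_to_key:
        vf.write("\t".join(str(x) for x in model.wv[word]) + "\n")
        mf.write(word + "\n")
```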
However, it is interesting that the LSTM can achieve good performance with word vectors trained on a small corpus, even though those vectors scored poorly in the semantic and syntactic analyses.
The LSTM does perform better than the other classifiers, but it requires more data. For NLP tasks in other domains that do not generate enough data for an LSTM to work properly, it would be advisable to train an SVM on averaged word vectors (AvgWV) instead; a sketch of that approach follows. The LSTM is more adaptable, but knowing how to optimise the network requires domain knowledge and experience with gradient-descent-trained classifiers.
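
A minimal sketch of the AvgWV-plus-SVM approach: each document is represented as the average of its word vectors, and an SVM is trained on those features. It assumes a trained gensim Word2Vec model named `model`; the documents and labels are hypothetical placeholders.

```python
import numpy as np
from sklearn.svm import SVC

def avg_word_vector(tokens, wv):
    """Average the vectors of in-vocabulary tokens; zeros if none are known."""
    vecs = [wv[t] for t in tokens if t in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

# Hypothetical toy documents and binary sentiment labels.
docs = [["good", "movie"], ["bad", "movie"], ["great", "film"], ["awful", "film"]]
labels = [1, 0, 1, 0]

# Build the AvgWV feature matrix and fit an SVM on it.
X = np.stack([avg_word_vector(d, model.wv) for d in docs])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:1]))
```

Because the feature vector has a fixed dimensionality regardless of document length, this setup needs far fewer training examples than a sequence model, which is what makes it attractive in low-data domains.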