Transformers are a neural network architecture used in natural language processing (NLP) tasks such as machine translation, language modelling, and text classification. Before a transformer can work with language at all, it must convert words into numerical values, because a neural network can only process numbers.
Click 'Play' on the video above for further insights into word embeddings and encoding for NLP.
There are three key concepts to consider when encoding words numerically:
1. semantics (meaning),
2. position (relative and absolute; a sketch of one common scheme follows this list), and
3. relationships and attention (grammar).
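To make the second concept concrete, here is a minimal sketch of the sinusoidal absolute positional encoding introduced in the original Transformer paper ("Attention Is All You Need"). The function name and the toy dimensions are illustrative, not taken from the videos above; the point is simply that each position in a sequence receives a unique, fixed numerical pattern.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Absolute positional encodings: each position gets a unique
    pattern of sines and cosines at different frequencies."""
    positions = np.arange(seq_len)[:, np.newaxis]           # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]          # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)  # (seq_len, d_model/2)

    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles)  # even dimensions use sine
    encoding[:, 1::2] = np.cos(angles)  # odd dimensions use cosine
    return encoding

# A 10-token sequence with 16-dimensional embeddings:
pe = sinusoidal_positional_encoding(seq_len=10, d_model=16)
print(pe.shape)  # (10, 16)
```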
Transformers excel at the third of these: capturing the way words in a sentence relate to, and pay attention to, one another. They do this using an attention mechanism, which allows the model to selectively focus on the most relevant parts of the input while processing it.
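As a preview of the next video, here is a minimal sketch of scaled dot-product attention, the core operation inside a transformer's attention mechanism. The toy dimensions and random inputs are illustrative only; in a real model the queries, keys, and values are learned projections of the token embeddings.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query scores every key; the softmaxed scores weight the values,
    so each output row is a mixture of the inputs that query 'attends to'."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # (n_queries, n_keys)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)            # softmax over keys
    return weights @ V, weights

# Toy example: 4 tokens, each represented by an 8-dimensional vector.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
output, attn = scaled_dot_product_attention(Q, K, V)
print(attn.round(2))  # row i shows how much token i attends to each token j
```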
In the next video, we will look at how the attention mechanism works in more detail.
We can encode word semantics by training a neural network to predict a target word from the words that surround it in a corpus of text. The network is trained using backpropagation, adjusting its weights and biases until the updates become negligible, at which point the network is said to be "trained". The weights connecting the input neurons to the hidden layer then contain an encoding of each word, with similar words having similar encodings.
These dense encodings, or embeddings, are far more compact than one-hot vectors and place words with related meanings close together, giving the language model a more efficient representation and a better grasp of meaning and context.
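A minimal sketch of this idea, in the style of a CBOW (continuous bag-of-words) model: context words predict a centre word, and after training, the input-to-hidden weight matrix holds one embedding row per vocabulary word. The tiny corpus, dimensions, and learning rate are all illustrative, not values from the videos.

```python
import numpy as np

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, H = len(vocab), 8          # vocabulary size, hidden (embedding) size

rng = np.random.default_rng(1)
W_in = rng.normal(scale=0.1, size=(V, H))   # input -> hidden: the word encodings
W_out = rng.normal(scale=0.1, size=(H, V))  # hidden -> output

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr, window = 0.05, 2
for epoch in range(500):
    for c in range(window, len(corpus) - window):
        context = [idx[corpus[c + o]] for o in range(-window, window + 1) if o != 0]
        target = idx[corpus[c]]
        h = W_in[context].mean(axis=0)           # average the context embeddings
        p = softmax(h @ W_out)                   # predicted target-word distribution
        grad = p.copy(); grad[target] -= 1.0     # cross-entropy gradient at the output
        W_out -= lr * np.outer(h, grad)          # backpropagate to output weights
        dh = W_out @ grad
        W_in[context] -= lr * dh / len(context)  # backpropagate to the embeddings

# The rows of W_in are the learned word encodings:
print(vocab)
print(W_in[idx["cat"]].round(2))
```

After enough passes over a large corpus, words that appear in similar contexts end up with similar rows in W_in, which is exactly the property described above.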

Video links:
On www.lucidate.co.uk:
- Neural Networks Primer - https://www.lucidate.co.uk/blog/categories/ai-education
- One-hot vector Encoding - https://www.lucidate.co.uk/forum/data-pipelines-for-ai/eda-2-dealing-with-categorical-data
On YouTube:
- Neural Networks Primer - https://www.youtube.com/playlist?list=PLaJCKi8Nk1hzqalT_PL35I9oUTotJGq7a
- One-hot vector Encoding - https://youtu.be/RtymA8mmULE