NLP Models
Natural Language Processing (NLP) underpins the text-based AI models discussed here. Understanding how NLP models work will make using this technology easier and give you better insight into what is and isn't possible.
NLP models are trained on datasets, and those datasets vary wildly in size. It is a fallacy to assume that a model with more parameters is automatically better. That can be the case, but it is important to read the whitepaper for each model to understand what data it was trained on and what is realistically possible. A multi-billion parameter model trained on junk is not going to miraculously produce anything other than junk, while a smaller model trained on quality text and information is more likely to produce quality outputs.
The size of an NLP model also has other practical consequences. The more parameters a model has, the more power it consumes, the costlier it is to run and the longer each request takes to complete. There is always a sweet spot between speed, quality and cost.
Predicting Sequential Tokens
Every NLP model works in a similar way: you give it some text as input, and it predicts the next tokens in the sequence. This is important to grasp because short-form creation with AI has a much stronger chance of success than long-form. Predicting a short sequence of tokens that produces a desirable output is far easier than generating a large number of tokens, where the chance of the text running wild or drifting into irrelevance is much greater.
You could input something like:
If you hit generate, and the NLP model has been trained on datasets that include some of Plato's works, you could expect it to return the following.
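To make the next-token loop concrete, here is a minimal Python sketch. It uses a toy, hand-written probability table rather than a real trained model; the tokens and probabilities are invented purely for demonstration, and the tiny vocabulary happens to follow the famous line attributed to Socrates in Plato's Apology, just as an illustration. A real NLP model learns these probabilities from its training data.

```python
import random

# Toy "model": for each context token, a made-up distribution over possible next tokens.
NEXT_TOKEN_PROBS = {
    "the": {"unexamined": 0.4, "quick": 0.3, "end": 0.3},
    "unexamined": {"life": 0.9, "idea": 0.1},
    "life": {"is": 0.8, "was": 0.2},
    "is": {"not": 0.6, "good": 0.4},
    "not": {"worth": 0.7, "easy": 0.3},
    "worth": {"living": 0.9, "it": 0.1},
}

def generate(prompt: str, max_new_tokens: int = 6) -> str:
    """Repeatedly predict the next token from the last token, appending it to the sequence."""
    tokens = prompt.lower().split()
    for _ in range(max_new_tokens):
        dist = NEXT_TOKEN_PROBS.get(tokens[-1])
        if dist is None:  # no known continuation: stop generating
            break
        # Sample the next token, weighted by its probability.
        next_token = random.choices(list(dist), weights=dist.values())[0]
        tokens.append(next_token)
    return " ".join(tokens)

print(generate("The"))  # e.g. "the unexamined life is not worth living"
```

The longer you let this loop run, the more chances it has to wander down a low-probability path, which is the same reason long-form generation drifts more easily than short-form.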
There are many more settings involved in the backend of an NLP model, such as Temperature, Top P, and Frequency and Presence Penalties, but we'll go over those in further writing.
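As a rough preview of what two of those settings do, here is a minimal sketch, assuming the model exposes raw scores (logits) over its vocabulary. The vocabulary and score values below are invented for illustration only.

```python
import math
import random

def sample_next_token(logits: dict[str, float], temperature: float = 1.0, top_p: float = 1.0) -> str:
    """Pick one token from raw scores using temperature scaling and top-p (nucleus) filtering."""
    # Temperature rescales the scores: values below 1.0 sharpen the distribution, above 1.0 flatten it.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    # Softmax: turn scores into probabilities that sum to 1.
    total = sum(math.exp(s) for s in scaled.values())
    probs = {tok: math.exp(s) / total for tok, s in scaled.items()}
    # Top P: keep only the most likely tokens whose cumulative probability reaches top_p.
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Sample from the remaining tokens, weighted by probability.
    return random.choices(list(kept), weights=kept.values())[0]

# Invented scores over a tiny vocabulary, purely for demonstration.
logits = {"living": 2.1, "dying": 0.3, "it": -0.5, "banana": -3.0}
print(sample_next_token(logits, temperature=0.7, top_p=0.9))
```

Lower temperature and lower Top P both push the model toward its safest, most likely continuation; higher values allow more variety at the cost of more unpredictable output.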
The largest NLP model created so far is Wudao from China, which boasts 1.75 trillion parameters!