OpenAI's new and improved embedding model
OpenAI has announced a new embedding model that is more capable, more cost-effective, and simpler to use than its previous models. The new model, text-embedding-ada-002, replaces five separate models for text search, text similarity, and code search, and it outperforms Davinci, OpenAI's previous most capable embedding model, at most tasks. The new model is also priced 99.8% lower than Davinci.
Since the inception of the OpenAI /embeddings endpoint, many applications have started to use numerical representations of concepts, known as embeddings, to personalize, recommend, and search for content. Embeddings make it easy for computers to understand relationships between different concepts, which has led to their widespread use in a variety of applications.
You can get embeddings from the new model with just a few lines of code using the OpenAI Python library, just as you could with previous models:
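import openai

# Request an embedding for a single input string.
# openai.Embedding is the interface in openai-python releases before 1.0.
response = openai.Embedding.create(
    input="porcine pals say",
    model="text-embedding-ada-002"
)
print(response)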
{
  "data": [
    {
      "embedding": [
        -0.0108,
        -0.0107,
        0.0323,
        ...
        -0.0114
      ],
      "index": 0,
      "object": "embedding"
    }
  ],
  "model": "text-embedding-ada-002",
  "object": "list"
}
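The vector itself sits a couple of levels down in that response. As a minimal sketch, the snippet below pulls the vectors out of a hand-built dictionary that mirrors the shape printed above. The sample values are placeholders and the helper name extract_embeddings is hypothetical, not part of the OpenAI library.

```python
# A stand-in with the same shape as the response printed above.
# The four numbers are placeholders, not real model output.
sample_response = {
    "data": [
        {
            "embedding": [-0.0108, -0.0107, 0.0323, -0.0114],
            "index": 0,
            "object": "embedding",
        }
    ],
    "model": "text-embedding-ada-002",
    "object": "list",
}


def extract_embeddings(response):
    """Return the embedding vectors, ordered by each item's "index" field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


vectors = extract_embeddings(sample_response)
print(len(vectors), len(vectors[0]))
```

Sorting by "index" is a defensive choice for requests that carry several inputs; with a single input it changes nothing.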
Model Improvements
1) Stronger performance
text-embedding-ada-002 outperforms all the old embedding models on text search, code search, and sentence similarity tasks, and it achieves comparable performance on text classification. For each task category, OpenAI evaluated the models on the same datasets that were used to evaluate the old embeddings.
2) Unification of capabilities
OpenAI has merged five separate models (text-similarity, text-search-query, text-search-doc, code-search-text, and code-search-code) into a single new model, making the /embeddings endpoint much simpler to use. This new model outperforms the previous embedding models on a variety of text search, sentence similarity, and code search benchmarks.
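In practice, a single model means the query and the documents can go through the same embedding function before being compared. The sketch below ranks documents by cosine similarity to a query. It is an illustration under assumptions: toy_embed and its hand-written 3-dimensional vectors are hypothetical stand-ins for a real call to the /embeddings endpoint, and rank_documents is not an OpenAI function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query, documents, embed):
    """Embed the query and every document with the same `embed` callable,
    then return (score, document) pairs from most to least similar."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical stand-in for the API: hand-written vectors keyed by text.
TOY_VECTORS = {
    "how do I sort a list?": [0.9, 0.1, 0.0],
    "def sort_items(xs): return sorted(xs)": [0.8, 0.2, 0.1],
    "A recipe for pancakes": [0.0, 0.1, 0.9],
}


def toy_embed(text):
    return TOY_VECTORS[text]


results = rank_documents(
    "how do I sort a list?",
    ["A recipe for pancakes", "def sort_items(xs): return sorted(xs)"],
    toy_embed,
)
for score, doc in results:
    print(f"{score:.3f}  {doc}")
```

To try this against the real model, swap toy_embed for a function that returns the embedding the API produces for a given string.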
3) Longer context
The new model has a context length of 8192 tokens, four times that of the previous models (up from 2048). This makes it more convenient to work with long documents.
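Documents longer than the limit still need to be split before embedding. A minimal sketch of that step, treating the 8192 figure as a token count and assuming the text has already been tokenized elsewhere. The token ids here are placeholders and split_into_windows is a hypothetical helper.

```python
MAX_TOKENS = 8192  # context length stated above


def split_into_windows(token_ids, max_tokens=MAX_TOKENS):
    """Split a token sequence into consecutive chunks of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [
        token_ids[start:start + max_tokens]
        for start in range(0, len(token_ids), max_tokens)
    ]


# Placeholder token ids standing in for a tokenized long document.
fake_tokens = list(range(20000))
windows = split_into_windows(fake_tokens)
print([len(window) for window in windows])
```

Each window can then be embedded separately. How to combine the resulting vectors (averaging, keeping them as separate entries, and so on) is an application decision this sketch leaves open.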
4) Smaller embedding size
The new embeddings have 1536 dimensions, which is one-eighth the size of davinci-001 embeddings. This makes the new embeddings more cost-effective when working with vector databases.
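A rough back-of-the-envelope sketch of what the smaller size can mean for storage. It assumes each dimension is stored as a 32-bit float and uses a hypothetical collection of one million vectors. Real vector databases add index overhead that is not modeled here.

```python
BYTES_PER_FLOAT32 = 4  # assumption: one 32-bit float per dimension


def storage_bytes(num_vectors, dimensions, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw bytes needed to store the vectors, ignoring any index overhead."""
    return num_vectors * dimensions * bytes_per_value


new_dims = 1536
davinci_dims = new_dims * 8  # implied by "one-eighth the size"
num_vectors = 1_000_000      # hypothetical collection size

new_gb = storage_bytes(num_vectors, new_dims) / 1e9
old_gb = storage_bytes(num_vectors, davinci_dims) / 1e9
print(f"new: {new_gb:.2f} GB, davinci-001: {old_gb:.2f} GB")
```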
5) Reduced price
OpenAI has cut the price of its new embedding model by 90% compared to old models of the same size. The new model performs as well as, if not better than, the old Davinci models at less than one percent of the price.
The new embedding model is a big step forward for natural language processing and code tasks. It’ll be exciting to see how customers use it to create even more powerful applications in their respective fields.
Limitations
The new text-embedding-ada-002 model does not outperform text-similarity-davinci-001 on the SentEval linear probing classification benchmark. If you need to train a lightweight linear layer on top of embedding vectors for classification, OpenAI suggests comparing the new model to text-similarity-davinci-001 and choosing whichever gives the best performance.
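One way to run that comparison is to train the same small linear layer on each model's embeddings and compare held-out accuracy. The sketch below is a plain-Python logistic regression under stated assumptions: the 2-dimensional vectors are toy data, "model_a" and "model_b" are hypothetical placeholders for the two models' embeddings of the same labelled texts, and this is not the SentEval protocol itself.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, bias, x):
    """Linear layer output for one embedding vector."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_linear_probe(features, labels, epochs=200, learning_rate=0.5):
    """Fit a binary logistic-regression layer with full-batch gradient descent."""
    dims, n = len(features[0]), len(features)
    weights, bias = [0.0] * dims, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dims, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(score(weights, bias, x)) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def probe_accuracy(weights, bias, features, labels):
    """Fraction of examples where the prediction (score > 0 means class 1) is right."""
    hits = sum(
        int((score(weights, bias, x) > 0) == (y == 1))
        for x, y in zip(features, labels)
    )
    return hits / len(labels)


# Toy 2-d "embeddings" from two hypothetical models for the same labelled texts.
train_labels, test_labels = [1, 1, 0, 0], [1, 0]
candidates = {
    "model_a": {"train": [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]],
                "test": [[0.8, 0.3], [0.3, 0.9]]},
    "model_b": {"train": [[0.6, 0.5], [0.4, 0.5], [0.5, 0.6], [0.5, 0.4]],
                "test": [[0.5, 0.5], [0.45, 0.55]]},
}

accuracies = {}
for name, data in candidates.items():
    weights, bias = train_linear_probe(data["train"], train_labels)
    accuracies[name] = probe_accuracy(weights, bias, data["test"], test_labels)

best_model = max(accuracies, key=accuracies.get)
print(accuracies, best_model)
```

With real data, the feature lists would hold each model's embeddings for the same training and test texts, and a library classifier would normally replace the hand-written training loop.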
Examples of the Embeddings API in use
Kalendar AI
It's a sales outreach product that uses embeddings to match the right sales pitch to the right customers. It compares customer profiles with sales pitches and picks the most suitable matches, which can cut unwanted targeting by 40-56%.
Notion
It’s an online workspace company that will use OpenAI’s new embeddings to improve Notion search beyond today’s keyword matching systems. This should help users find what they need in Notion more easily and quickly.
If you liked this post, let us know in the comment section what you would like us to cover next, and help it reach more people by sharing it through our social media accounts on Instagram, Facebook, Twitter, and LinkedIn.