Feb 08

OpenAI’s Assistants API Pros and Cons… and Why You May Want to Think Twice About Using It

CloseBot has been at the forefront of using OpenAI’s models for AI-based lead qualification and appointment booking. We started with the Completions API, then moved over to the Chat Completions API, and were planning on moving to the Assistants API… but there are some things to consider.

We are going to focus here on the pros and cons of the Assistants API, keeping in mind the current LLM market, and the technical factors.

The Current Players

OpenAI isn’t the only player in the AI game. There’s Google’s Gemini, Anthropic’s Claude 2, Meta’s Llama 2, and many others. You’ve probably heard the stories about OpenAI being the strongest of these options, but did you know that Google’s Bard (running Gemini Pro) currently ranks above GPT-4 on the Chatbot Arena leaderboard hosted on HuggingFace, which scores models through blind head-to-head comparisons? Just like in many other industries, it’s rare that the first mover ends up being the long-term victor, and it’s even less likely that they hold the top spot without ever being dethroned. Let’s keep this in mind going forward.

HuggingFace Leaderboard February 2024

Assistants API

The Assistants API is all the rage right now. There are even those who claim “if you aren’t using the Assistants API, you’re falling behind”. So what is this Assistants API?

The Assistants API is the newest way to interact with OpenAI’s models. This is a tempting option for many integration teams because it automatically handles a few things that were previously extremely difficult to handle. Here are some notes about how the Assistants API is different from the Chat Completions API and other models’ integrations.

RAG (Retrieval Augmented Generation)

Traditionally, AI companies that want their AI to leverage scraped websites or uploaded documents have had to implement their own way of storing and querying these resources. That can be time-consuming to develop, and the Assistants API handles it automatically. Although hosting this data within an Assistant is more expensive, the ease of use makes it very tempting.

Long Polling

Currently, making a request through the Assistants API requires the following steps:

  1. Create a message thread (conversation item)
  2. Add messages to the thread
  3. Create and start a run
  4. Repeatedly check the run’s status (long polling) until it completes
  5. Fetch the thread messages to see the final response

You can see this is quite a process, and if you’re a developer, it may be a reason to think twice about implementing the Assistants API. Systems that integrate with it handle these steps automatically, however, so end users are not affected aside from slight processing delays. Any time you want a response to a new inbound message, you add that message to the thread (step 2) and repeat the remaining steps.
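To make step 4 concrete, here is a minimal sketch of a generic long-polling helper. In a real integration, the `fetch_status` callable would wrap a run-status lookup such as `client.beta.threads.runs.retrieve(...)` from the official `openai` Python SDK; the stub used below is hypothetical and just stands in for that network call.

```python
import time

def wait_for_run(fetch_status, interval=1.0, timeout=60.0):
    """Poll fetch_status() until the run reaches a terminal state.

    fetch_status is any zero-argument callable returning the run's
    current status string (e.g. "queued", "in_progress", "completed").
    """
    terminal = {"completed", "failed", "cancelled", "expired"}
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in terminal:
            return status
        time.sleep(interval)
    raise TimeoutError("run did not finish within the timeout")

# Stand-in for the API: a run that finishes on the third status check.
statuses = iter(["queued", "in_progress", "completed"])
result = wait_for_run(lambda: next(statuses), interval=0.01)
print(result)  # completed
```

Notice that the client has nothing useful to do between checks except sleep and ask again; this idle polling loop is the delay end users experience.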

Assistant “Personas”

Assistants (like custom GPTs) can be loaded with their own persona and a set of tools they are allowed to use. You could hand off conversations between different assistants at different stages, or even append or modify assistant personas dynamically if you wanted.

Chat Completions APIs

This section applies to all other AI tools: currently, any tool that isn’t the Assistants API functions in roughly the same way.

If you’re using something other than the Assistants API, you have to plan out your own document handling. Here’s a look at how CloseBot handles document queries. Let’s break down the rough procedure that must be implemented to give the AI access to your documents:

  1. Set Up a Vector Database – The vector database stores chunks of text from your documents in a way that lets us query them by their relevance to the current conversation.
  2. Chunk Uploads – When someone saves a file, we have to decide how to strip out the text and break it into bite-sized pieces for the AI.
  3. Create Embeddings – Those chunks must be converted to embeddings and stored in the vector database with references to the original text.
  4. Query Vectors – When a message comes in, we convert the needed information to an embedding and query our stored embeddings to find relevant texts.
  5. Send Results With the Prompt – The results of that query are included in the final prompt, adding context the AI can reference.
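The five steps above can be sketched end to end. For illustration, this uses a toy bag-of-words “embedding” and a plain in-memory list in place of a real embedding model and vector database; in production you would call an embedding API (such as OpenAI’s `text-embedding-3-small`) and store the vectors in a proper vector database. All function names here are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words Counter. A real system would call
    an embedding model and get back a dense float vector instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(document, size=50):
    """Step 2: break a document into bite-sized word chunks."""
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# Steps 1 & 3: our "vector database" is just a list of (vector, text) pairs.
vector_db = []
for doc in ["Our office is open Monday through Friday, 9am to 5pm.",
            "Appointments can be booked online or by phone."]:
    for piece in chunk(doc):
        vector_db.append((embed(piece), piece))

# Steps 4 & 5: embed the incoming message, rank stored chunks by
# similarity, and include the best match in the final prompt.
query = embed("When are you open?")
ranked = sorted(vector_db, key=lambda pair: cosine(query, pair[0]), reverse=True)
context = ranked[0][1]
prompt = f"Answer using this context:\n{context}\n\nQuestion: When are you open?"
print(context)
```

Every one of these moving parts (chunk size, embedding model, similarity metric, how many results to include) is a design decision you own when you build RAG yourself, which is exactly the control the Assistants API trades away.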

Simple Request (No Long Polling)

The process of requesting a response from the other models is simple: you send your prompt to the model and it returns a response. These models often also allow “streaming”, the ability for the AI to return the response a few tokens at a time, so you can watch the response generate instead of waiting to receive it all at once.

Model Personas

Since these models operate on simple requests, without saving anything to memory, you have to resend your AI’s “persona” with every request. This means you’ll need a way to save the conversation history on your end, as well as the assistant’s identity.
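A minimal sketch of what that looks like in practice: a hypothetical `build_messages` helper reassembles the stored persona and history into the `messages` array that would be sent with every request (e.g. to `client.chat.completions.create` in the `openai` Python SDK).

```python
def build_messages(persona, history, new_message):
    """Rebuild the full message list for each request: the persona goes
    first as a system message, followed by the stored conversation
    history and the new inbound message."""
    return ([{"role": "system", "content": persona}]
            + history
            + [{"role": "user", "content": new_message}])

# The persona and history live in *our* storage, not the model's.
persona = "You are a friendly lead-qualification assistant."
history = [
    {"role": "user", "content": "Hi, do you have availability Tuesday?"},
    {"role": "assistant", "content": "We do! Morning or afternoon?"},
]
messages = build_messages(persona, history, "Afternoon works best.")
print(len(messages))  # 4
```

After each model response, you append it to `history` yourself; nothing persists server-side between requests.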

Assistants API Pros

Easy to Implement RAG – Because the Assistants API handles document storage and retrieval for you, you can get off the ground quickly. Integrating with documents is relatively simple (although you’re limited in document storage capabilities), and you know the methods used to query those documents follow the best practices of the OpenAI team.

Easy to Use Functions – Another tool available with the Assistants API is function calling. You can teach the AI to trigger functions that call on external tools (like weather data, property data, etc.). You need some dev knowledge to set this up, but it is possible!
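As a sketch of what that setup involves: “teaching” the model a function means describing it with a JSON Schema entry in the request’s `tools` list. The weather function below is a hypothetical example; the model never runs your code, it only returns the function name and JSON arguments for your backend to execute.

```python
# A hypothetical tool definition in the shape the OpenAI API expects.
# The model sees only this description and, when appropriate, responds
# with the function name plus JSON arguments for your code to run.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. Austin"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}
print(weather_tool["function"]["name"])  # get_current_weather
```

Your code executes the real lookup, sends the result back to the model, and the model folds it into its reply.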

Customized Personas – With the Assistants API you don’t have to save your bot personas in your local storage; your Assistant stores the persona for you.

Conversation History Storage – It will also store your conversation history, so you don’t have to worry about keeping it somewhere yourself.

Assistants API Cons

Locked Into OpenAI – If you use the Assistants API, you’re forced to adopt the architecture for its more complex string of actions just to initiate a chat, and to use OpenAI’s document storage. That makes it much more difficult to migrate the system to a different model later; it would require a complete rewrite and a large architecture change.

Slower – Assistants do not support streaming, and they require multiple inefficient back-and-forth requests to initiate new conversations, plus long polling, which is slower than the single-request responses of the other models.

No Web Browsing – While OpenAI’s other models have web browsing available natively, the Assistants API does not currently have this ability. An Assistant can use function calls that trigger browsing, but this doesn’t work quite the same as the native browsing available with the Chat Completions models and other tools.

More Complicated Implementation – As we saw above, many steps are required just to get a simple conversation started with the Assistants API. For developers, this complex setup forces you to shape your backend architecture in a particular way, which limits your ability to pivot to other options in the future.

Knowledge Retrieval Limitations – Although knowledge retrieval is easier here than with the Chat Completions APIs, you’re limited in the number of documents you can upload and have no control over how they are queried. You lose the ability to optimize retrieval methods for specific scenarios.

Chat Completions API Pros

Not Married to OpenAI – Because the Chat Completions API uses standard requests, you have full control over the system, and migrating to another model would be a comparatively simple task.

Faster – Because there are fewer steps involved with making a request and no long-polling, these models are often faster than the Assistants API.

Has Web Browsing – OpenAI’s Chat Completions API has the ability to browse the web, but you also retain full control to query the web and bring results into context.

Simple Implementation – Your architecture can follow practices used for any normal API request. It’s easy to get started with the APIs.

Knowledge Retrieval Control – Although knowledge retrieval is complex to set up, once that setup is complete you have full control over it. You can optimize the retrieval for different scenarios and optimize for cost. You have no limits on upload size and can easily tie into these documents with other models.

Chat Completions API Cons

Difficult to Implement RAG – The process of setting up knowledge storage is complicated.

Functions – Function calls are possible through Chat Completions but take more setup work.

Storage – You must implement a way to store system prompts (personas) and conversation history yourself.


Although it may be tempting to jump into the newest Assistants API, I would proceed with an educated sense of caution. Using the Assistants API to expedite development may lock you into a marriage with OpenAI that makes it more difficult to pivot to other emerging companies in the future, and could even make it impossible to build a system that leverages other companies’ models as a backup in case of downtime.

CloseBot has chosen the more robust option, implementing RAG outside of the Assistants API. We are building for the long term instead of making easier short-term decisions that would have a certain negative impact on our users. CloseBot is a lead qualification AI tool that plugs into CRMs like GoHighLevel, HubSpot, and Salesforce to automate lead qualification. We make it possible to build expert-level lead qualification AI quickly and easily! In addition to auto-responding to customers, it can also handle back-to-back messaging, conversationally update fields, and even book appointments based on calendar availability!

Click here to check out CloseBot’s website and learn more!