
MIT Researchers Developed a Way to Let AI Chatbots Converse 'Nonstop' Without Crashing


AI-powered chatbots have become increasingly essential across multiple industries. As technological innovation produces better AI tools, customer service processes, digital marketing strategies, and even core business operations are being transformed. However, one persistent challenge has kept AI chatbots from seamless functionality: they tend to crash during prolonged conversations.

 

A new study led by MIT graduate student Guangxuan Xiao proposes a solution that lets language models converse with their users 'nonstop.' The goal is to eliminate the crashes that occur once a conversation grows longer than what the chatbot could handle at the start of the exchange. The researchers developed a method that lets large-scale AI chatbots carry a conversation through an entire workday without crashing or needing a reboot, a feature essential for AI assistants that handle copywriting, editing, or code-generation tasks.

 


Image from Xenioo

 

AI chatbots employed in the workplace can complete complex tasks, but doing so requires preserving the conversation's key-value cache. The key-value cache, also known as the KV cache, is an in-memory store that holds data as key-value pairs; in a language model, it caches the attention keys and values computed for earlier tokens so they do not have to be recomputed at every step. It essentially functions as the conversation's memory, keeping the AI's replies efficient and relevant.
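As a rough illustration (our sketch, not the researchers' code), a toy KV cache might store one attention key vector and one value vector per token and reuse them for every new query:

```python
import numpy as np

class KVCache:
    """Toy KV cache: one attention key vector and one value vector per token."""

    def __init__(self):
        self.keys = []    # keys for all tokens processed so far
        self.values = []  # values for all tokens processed so far

    def append(self, key, value):
        """Store the key/value vectors computed for a newly processed token."""
        self.keys.append(key)
        self.values.append(value)

    def attend(self, query):
        """Blend cached values, weighted by how well each key matches the query."""
        scores = np.stack(self.keys) @ query        # one similarity score per token
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                    # softmax over cached tokens
        return weights @ np.stack(self.values)      # weighted average of values

cache = KVCache()
rng = np.random.default_rng(0)
for _ in range(5):                                  # cache five "tokens"
    cache.append(rng.normal(size=8), rng.normal(size=8))
print(cache.attend(rng.normal(size=8)))             # reuses the cache, no recompute
```

The point of the cache is the last line: each new token attends over stored keys and values instead of reprocessing the entire conversation from scratch.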

 

The issue with large language models that rely on a KV cache is that once the conversation grows beyond the cache's capacity, the chatbot discards the earliest entries to make room for new information and keep the conversation going. But when those initial entries are evicted, the remaining cache (called the 'sliding cache') struggles to produce coherent tokens, because the model depends on the initial tokens to keep its replies relevant. Xiao and his colleagues discovered that by tweaking the sliding cache to always retain the first few tokens from the initial KV cache, the chatbot can keep producing relevant words and maintain the conversation, even after the cache starts sliding.
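A minimal sketch of this eviction policy (our illustration, not the published StreamingLLM code, and with illustrative sizes rather than the paper's settings) would keep the first few 'sink' entries plus a recent window whenever the cache overflows:

```python
def evict(keys, values, n_sink=4, window=1020):
    """StreamingLLM-style eviction (sketch): when the cache overflows, keep
    the first n_sink entries (the conversation's opening "attention sink"
    tokens) plus the most recent `window` entries, and drop the middle.
    n_sink=4 and window=1020 are illustrative values, not the paper's settings."""
    if len(keys) <= n_sink + window:
        return keys, values                          # cache still fits as-is
    keep = list(range(n_sink)) + list(range(len(keys) - window, len(keys)))
    return [keys[i] for i in keep], [values[i] for i in keep]
```

A plain sliding window would evict those opening entries along with everything else; per the study, retaining them is what keeps the model's output fluent as the window slides.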

 

The groundbreaking AI chatbot solution, called StreamingLLM, allows conversational AI to remain efficient even when a conversation stretches to more than 4 million words, running more than 22 times faster than a competing approach that avoids crashes by constantly recomputing parts of the earlier conversation. According to Xiao, StreamingLLM lets AI chatbot software stay persistently deployed and keep up with conversations based on recent chats.

 

“Now, with this method, we can persistently deploy these large language models. By making a chatbot that we can always chat with, and that can always respond to us based on our recent conversations, we could use these chatbots in some new applications,” said Xiao in an article published on MIT’s website.

 

KV Cache Solution to Improve Future AI Conversations

Image from Mendix

 

The study done by researchers from MIT provides a groundbreaking solution to the persistent problem of crashing AI chatbots. The paper's senior author is Song Han, an associate professor in MIT's Department of Electrical Engineering and Computer Science, a member of the MIT-IBM Watson AI Lab, and a distinguished scientist at the chipmaker NVIDIA. Co-authors include Beidi Chen, an assistant professor at Carnegie Mellon University, as well as Yuandong Tian and Mike Lewis, both scientists at Meta Platforms Inc.'s Meta AI. New AI chat models that adopt the StreamingLLM method are expected to become capable of extended chats without crashing.