Alibaba has launched a new artificial intelligence model that it says can understand images and hold more complex conversations than its previous products, as the global race for technology leadership heats up.
The Chinese tech giant said its two new models, Qwen-VL and Qwen-VL-Chat, are open source, meaning researchers and companies around the world can use them to build their own AI applications without having to train their own systems, saving time and cost.
Qwen-VL can answer open-ended questions related to various images and generate captions, Alibaba said.
Qwen-VL-Chat, by contrast, offers “more complex interactions,” such as comparing multiple image inputs and answering several questions in a row, according to Alibaba. Tasks it can perform, the company said, include writing stories and creating images based on photos a user provides, as well as solving math equations shown in an image.
One example from Alibaba involves an image of a Chinese-language hospital sign. The AI can answer questions about the location of certain hospital departments by interpreting the image of the sign.
Until now, much of generative AI, in which the technology produces responses based on human input, has focused on responding to text. Like Qwen-VL-Chat, the latest version of OpenAI’s ChatGPT can also understand images and respond in text.
Alibaba’s two newest models are built on Tongyi Qianwen, a large language model (LLM) the company released earlier this year. An LLM is an artificial intelligence model trained on massive amounts of data, and it is the technology that underpins chatbots.
The Hangzhou-based company unveiled two other AI models this month. Although Alibaba doesn’t earn royalties from them, the open-source distribution will help the company gain more users for its AI models. This comes as the company’s cloud division aims to speed up growth while it prepares to go public.