Kuala Lumpur, Malaysia, August 2023 – Alibaba Cloud has launched two open-source large vision language models (LVLMs), Qwen-VL and Qwen-VL-Chat. The models can comprehend images, text, and bounding boxes in prompts and support multi-round question answering in both English and Chinese.

Qwen-VL is the multimodal version of Qwen-7B, the 7-billion-parameter version of Alibaba Cloud’s large language model Tongyi Qianwen (also available as open source on ModelScope). Capable of understanding both image inputs and text prompts in English and Chinese, Qwen-VL can perform tasks such as answering open-ended questions about images and generating image captions.

Qwen-VL-Chat supports more complex interactions, such as comparing multiple image inputs and engaging in multi-round question answering. Leveraging alignment techniques, this AI assistant exhibits a range of creative capabilities, including writing poetry and stories based on input images, summarizing the content of multiple pictures, and solving mathematical questions displayed in images.

Multi-round question answering by the Qwen-VL-Chat model

Open sourced

In a bid to democratize AI technologies, Alibaba Cloud has shared the models’ code, weights, and documentation with academics, researchers, and commercial institutions worldwide. This contribution to the open-source community is accessible via Alibaba’s AI model community ModelScope and the collaborative AI platform Hugging Face. For commercial use, companies with over 100 million monthly active users can request a license from Alibaba Cloud.

What does it all mean?

The introduction of these models, with their ability to extract meaning and information from images, holds the potential to revolutionize interaction with visual content. For instance, the models could provide information assistance to visually impaired individuals during online shopping in the future.

The Qwen-VL model was pre-trained on image and text datasets. Compared with other open-source large vision language models that process images at 224×224 resolution, Qwen-VL handles image input at 448×448, four times as many pixels, which improves image recognition and comprehension.

Based on various benchmarks, Qwen-VL recorded outstanding performance on several visual language tasks, including zero-shot captioning, general visual question answering, text-oriented visual question answering and object detection.

Qwen-VL-Chat has also achieved leading results in both Chinese and English for text-image dialogue and alignment with human responses, according to Alibaba Cloud’s own benchmark test. This test involved over 300 images, 800 questions, and 27 categories.

Earlier this month, Alibaba Cloud open-sourced its 7-billion-parameter LLMs, Qwen-7B and Qwen-7B-Chat, as part of its ongoing contribution to the open-source community. The two models have had over 400,000 downloads within a month of their launch.

The paper describing the models is also available: https://arxiv.org/abs/2308.12966

(This article was adapted from an Alibaba Cloud press release)
