Name: A Hybrid ViT–GRU Architecture for Myanmar-Script Video Captioning
Start: 2026-06-23T11:00:00+0800
End: 2026-06-23T13:00:00+0800

A Hybrid ViT–GRU Architecture for Myanmar-Script Video Captioning

Tuesday June 23, 2026 11:00am - 1:00pm PST (Philippine Standard Time, UTC+8)

Virtual Room B

Authors - Nway Nway Zaw Win, Aye Nyein Mon, Win Lelt Lelt Phyu
Abstract - Generating natural language descriptions for visual content is a key task bridging Computer Vision and Natural Language Processing. Conventional CNN-based approaches often struggle to capture global contextual information, which limits semantic consistency. This paper presents a multimodal video captioning framework for Myanmar-script generation based on a Vision Transformer (ViT) encoder and a Gated Recurrent Unit (GRU) decoder. Global visual representations are derived from transformer-based self-attention, while a class-prefixing mechanism is introduced to improve semantic grounding in a low-resource language setting. Experimental results, evaluated with the BLEU, chrF, and TER metrics, show that the proposed ViT–GRU model outperforms CNN–RNN baselines. PCA and t-SNE visualizations further support the effectiveness of transformer-based visual representations.
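
Of the metrics named in the abstract, chrF is character-based, so it can be computed on Myanmar-script captions without first segmenting them into words. The following is a minimal sketch of a simplified chrF, not the evaluation code used in the paper: the function names `char_ngrams` and `simple_chrf` are hypothetical, whitespace removal and the defaults `max_n=6` and `beta=2.0` are assumptions based on common chrF settings, and scores will not necessarily match those of an established toolkit.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams (over Unicode code points) after removing whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF: average n-gram precision and recall, then take F-beta."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # beta > 1 weights recall more heavily than precision
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

Calling `simple_chrf(generated_caption, reference_caption)` returns a score between 0 and 1, where identical strings score 1 and strings with no shared characters score 0. Because n-grams here are taken over code points rather than grapheme clusters, a stacked Myanmar syllable contributes several n-gram positions.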

Paper Presenter

Nway Nway Zaw Win

Myanmar

Virtual Room B Manila, Philippines

International Conference on Technological Intelligence and Business Strategies
