Publications

2026

GST-WorldVLA: Fine-Grained Gaussian Spatial Tokenization for Vision-Language-Action and World Model
Md Selim Sarowar, Md Tanvir Islam, *Sungho Kim and Sangtae Ahn.
CoRL'26
arXiv | PDF

GST-VLA Pro: Integrating 3D Depth-Aware Chain-of-Thought with Gaussian Spatial Tokenization for VLA Model
Md Selim Sarowar, Md Tanvir Islam, *Sungho Kim and Sangtae Ahn.
BMVC'26
arXiv | PDF

GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models
Md Selim Sarowar, Omer Tariq and *Sungho Kim.
CVPRW'26, ACCV'26
arXiv | PDF

C3G-VM6D: Data-Efficient C3G Vision Model Aided 6D Pose Estimation based on RGB-D Data
Md Selim Sarowar, Manar Alnaasan and *Sungho Kim
IEEE Access (SCIE-Q1, IF: 3.9)
IEEE Access | PDF

Vision-Language-Action and Vision Language Models for Robot Manipulation: A Comprehensive Review Towards Real-World Applications
Md Selim Sarowar and *Sungho Kim
PeerJ Computer Science (SCIE-Q1, IF: 3)
PeerJ Computer Science | PDF

Explainable Parkinsons Disease Gait Recognition Using Multimodal RGB-D Fusion and Large Language Models
Manar Alnaasan, Md Selim Sarowar and *Sungho Kim
Pattern Recognition, Elsevier (under review)
arXiv | PDF

2025

VFM-VLM: Vision Foundation Model and Vision Language Model based Visual Comparison for 3D Pose Estimation
Md Selim Sarowar and *Sungho Kim
arXiv (incomplete paper)
arXiv | PDF

VLM6D: VLM based 6Dof Pose Estimation based on RGB-D Images
Md Selim Sarowar and *Sungho Kim
The Institute of Electronics and Information Engineers (IEIE Fall 2025 Conference)
IEIE | PDF

Hand Gesture Recognition Systems: A Review of Methods, Datasets, and Emerging Trends
*Md Selim Sarowar, Nur E Jannatul Farjana, et al.
International Journal of Computer Applications
IJCA Journal | PDF

2022

Improvement of Denoising in Images Using Generic Image Denoising Network (GID Net)
Md Selim Sarowar, Kaustav Dutta, and *Rasmita Lenka
IEEE 2nd International Conference on Applied Electromagnetics, Signal Processing and Communication (AESPC), Nov. 2021
IEEE Xplore | PDF