Charlotte Vision Lab

The Charlotte Vision Lab is a computer vision research group at UNC Charlotte focused on building systems that understand and act in the visual world. Our work spans video understanding, multimodal learning, robotic perception, generative modeling, and trustworthy machine vision, with the goal of systems that can perceive, reason, and assist reliably in complex real-world environments.

Research

Our Research Directions

Video Understanding for Activities of Daily Living

Temporal action detection, dense activity recognition, and long-video reasoning over unedited, real-world sequences.

Vision-Language and Vision-Language-Action Models

Visual question answering, domain adaptation, interpretable decision-making, and embodied reasoning.

Reliable 3D and Generative Vision

3D scene understanding, controllable image generation, and uncertainty estimation in open-world settings.

Highlights

Selected Publications and Preprints

CVPR 2026

MS-Temba

Temporal modeling for long untrimmed video understanding.

CVPR 2026

Personalized Image Descriptions from Attention Sequences

A method for generating image descriptions in a subject's style using attention sequences.

CVPR 2025

LLAVIDAL

A multimodal large language vision model for daily activities of living.

Preprint

UniLACT

A depth-aware latent action framework for vision-language-action models in robot learning.
