Abstract: In recent years, multimodal social relation recognition has become a critical task in the fields of computer vision and natural language processing. However, existing research still faces ...
Abstract: Recently, video-based large language models (video-based LLMs) have achieved impressive performance across various video comprehension tasks. However, this rapid advancement raises ...