Abstract: In recent years, multimodal social relation recognition has become a critical task in the fields of computer vision and natural language processing. However, existing research still faces ...
Abstract: Recently, video-based large language models (video-based LLMs) have achieved impressive performance across various video comprehension tasks. However, this rapid advancement raises ...