Abstract: Referring Video Object Segmentation (RVOS) aims to segment objects in videos based on natural language descriptions, which requires accurate spatial localization and temporal consistency. In ...