Abstract: Knowledge-based Visual Question Answering (VQA) is a challenging task that requires models to access external knowledge for reasoning. Large Language Models (LLMs) have recently been ...
Abstract: Image/video coding has been a remarkable research area for both academia and industry for many years. Testing datasets, especially high-quality image/video datasets, are desirable for the ...