Abstract: Image-text matching, an important task in cross-modal information processing, consists of evaluating the similarity between images and text. However, the data of the two modalities ...