
Abstract

Noisy correspondence hampers image-text matching in real-world applications. Manually curating high-quality datasets is expensive and time-consuming, while image-text pairs gathered from the internet contain mismatched pairs that significantly degrade model performance. In this paper, we propose a novel model that transforms the noisy correspondence filtering problem into a similarity distribution modeling problem. Leveraging the image-text matching capability of CLIP and employing a Gaussian mixture model, our model filters out most of the noisy correspondences in image-text pairs. To further minimize the impact of noisy correspondence during fine-tuning, we propose a distribution-aware dynamic margin ranking loss that increases the distance between the clean and noisy distributions. Extensive experiments on three datasets, including the challenging Conceptual Captions, demonstrate the effectiveness and robustness of our model even under high noise rates. Our approach opens up new opportunities for improving image-text matching performance in real-world settings by breaking through the noise.
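The filtering idea in the abstract (model the image-text similarity scores as a mixture of a clean and a noisy distribution, then keep the pairs that most likely belong to the clean one) can be illustrated with a small sketch. This is not the released implementation: it assumes the similarity scores have already been computed (for example, cosine similarities from CLIP), it fits a plain one-dimensional two-component Gaussian mixture with EM, and the function names, the 0.5 threshold, and the initialization are all hypothetical choices made for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(scores, n_iter=100, min_var=1e-6):
    """Fit a 1-D two-component GMM with EM; return (weights, means, variances)."""
    lo, hi = min(scores), max(scores)
    means = [lo, hi]  # assumption: start the two components at the extremes
    spread = max((hi - lo) ** 2 / 4.0, min_var)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each score.
        resp = []
        for x in scores:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-300
            weights[k] = nk / len(scores)
            means[k] = sum(r[k] * x for r, x in zip(resp, scores)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, scores)) / nk
            variances[k] = max(var, min_var)  # keep the variance strictly positive
    return weights, means, variances


def clean_probability(score, weights, means, variances):
    """Posterior probability that a score came from the higher-mean component."""
    clean = 0 if means[0] > means[1] else 1  # assumption: clean pairs score higher
    p = [weights[k] * gaussian_pdf(score, means[k], variances[k]) for k in range(2)]
    total = sum(p) or 1e-300
    return p[clean] / total


def filter_pairs(scores, threshold=0.5):
    """Return indices of pairs whose clean posterior exceeds the threshold."""
    weights, means, variances = fit_two_component_gmm(scores)
    return [
        i for i, s in enumerate(scores)
        if clean_probability(s, weights, means, variances) > threshold
    ]
```

Calling `filter_pairs(scores)` on a list of per-pair similarity scores returns the indices of the pairs treated as clean. The posterior from `clean_probability` could also be kept as a soft weight instead of a hard decision; how the actual model uses it during fine-tuning is described in the paper, not here.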

Pipeline

model.png

Code & Data

# Code

NCR

ours

# Data

Flickr30K & MS-COCO & CC152K

Conceptual Captions

Noise file

Pretrained CLIP

Copyright (C) 2023 Shandong University

This program is licensed under the GNU General Public License 3.0 (https://www.gnu.org/licenses/gpl-3.0.html). Any derivative work obtained under this license must be licensed under the GNU General Public License as published by the Free Software Foundation, either Version 3 of the License, or (at your option) any later version, if this derivative work is distributed to a third party.

 

The copyright for the program is owned by Shandong University. For commercial projects that require the ability to distribute the code of this program as part of a program that cannot be distributed under the GNU General Public License, please contact <shihaitao1111@gmail.com> to purchase a commercial license.
