Evaluation of Language-augmented Visual Task-level Transfer.
20 image classification datasets / 35 object detection datasets.
Automatic hyper-parameter tuning; strong language-augmented efficient adaptation methods.
Each dataset concept is augmented with diverse knowledge sources, including WordNet, Wiktionary, and GPT-3.
To track the research advances in language-image models.
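The ELEVATER benchmark is a collection of resources for training, evaluating, and analyzing language-image models on image classification and object detection. ELEVATER consists of the four components listed above: datasets, a toolkit, external knowledge for each concept, and a way to track research progress.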
The ultimate goal of ELEVATER is to drive research in the development of language-image models to tackle core computer vision problems in the wild.
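As a rough illustration of what augmenting a dataset concept with external knowledge can look like, the sketch below appends a definition to a class-name prompt. It is a minimal example under stated assumptions: the function name `augment_concept`, the prompt template, and the sample definition are all hypothetical and are not taken from the ELEVATER toolkit or its data.

```python
# Illustrative sketch only: the template and the sample definition are
# assumptions, not ELEVATER's actual prompt format or knowledge data.

def augment_concept(name, knowledge, template="a photo of a {name}"):
    """Build a text prompt for a class name, appending any external knowledge."""
    prompt = template.format(name=name)
    # Keep only non-empty knowledge strings, in the order the sources were given.
    extras = [text.strip() for text in knowledge.values() if text and text.strip()]
    if extras:
        prompt += "; " + "; ".join(extras)
    return prompt


# Hypothetical knowledge entry for one concept, keyed by source name.
knowledge = {"wordnet": "a small domesticated carnivorous mammal"}  # made-up definition
example_prompt = augment_concept("cat", knowledge)
print(example_prompt)
```

A concept with no knowledge entries falls back to the plain template, so the same function covers both the augmented and the unaugmented case.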
@article{li2022elevater,
  title={ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models},
  author={Li, Chunyuan and Liu, Haotian and Li, Liunian Harold and Zhang, Pengchuan and Aneja, Jyoti and Yang, Jianwei and Jin, Ping and Hu, Houdong and Liu, Zicheng and Lee, Yong Jae and Gao, Jianfeng},
  journal={Neural Information Processing Systems},
  year={2022}
}