Workshop on Computer Vision in the Wild
@ ECCV 2022, October 23



Overview

State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept.

Recent work shows that learning from large-scale image-text data is a promising approach to building transferable visual models that adapt effortlessly to a wide range of downstream computer vision (CV) and multimodal (MM) tasks: for example, CLIP, ALIGN, and Florence for image classification, and ViLD, RegionCLIP, and GLIP for object detection. These vision models with a language interface are naturally open-vocabulary recognition models, showing superior zero-shot and few-shot adaptation performance in a variety of real-world scenarios.
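To make the open-vocabulary idea concrete, here is a minimal sketch of zero-shot image classification with the public CLIP package (https://github.com/openai/CLIP); the checkpoint, image path, and class names are illustrative placeholders, not part of any challenge protocol:

    # Minimal zero-shot classification sketch with OpenAI's CLIP package.
    # Assumes: pip install torch pillow git+https://github.com/openai/CLIP.git
    import torch
    import clip
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    # Any category names can be supplied at inference time; no retraining needed.
    class_names = ["cat", "dog", "bird"]  # placeholder categories
    prompts = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
    image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)  # placeholder image

    with torch.no_grad():
        image_feat = model.encode_image(image)
        text_feat = model.encode_text(prompts)
        # Cosine similarity between the image and each prompt acts as the classifier.
        image_feat /= image_feat.norm(dim=-1, keepdim=True)
        text_feat /= text_feat.norm(dim=-1, keepdim=True)
        probs = (100.0 * image_feat @ text_feat.T).softmax(dim=-1)

    print({c: round(p.item(), 3) for c, p in zip(class_names, probs[0])})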

We propose this "Computer Vision in the Wild" workshop to gather the academic and industry communities working on CV problems in real-world scenarios, focusing on the challenges of open-set/domain visual recognition and efficient task-level transfer. Since there are no established benchmarks to measure the progress of "CV in the Wild", we have developed new benchmarks for image classification and object detection that measure the task-level transfer ability of various models/methods over diverse real-world datasets, in terms of both prediction accuracy and adaptation efficiency. This workshop also hosts two challenges based on these benchmarks.

Call for Papers

    Topics of interest include but are not limited to:
  • Open-set visual recognition methods, including classification, object detection, and segmentation in images and videos
  • Zero/few-shot text-to-image generation/editing; open-domain visual QA & image captioning
  • Unified neural network architectures and training objectives across different CV & MM tasks
  • Large-scale pre-training with images/videos only, image/video-text pairs, and external knowledge
  • Efficient adaptation methods for large visual models, measured by #training samples (zero-shot and few-shot), #trainable parameters, throughput, and training cost (a short counting sketch follows this list)
  • New metrics / benchmarks / datasets to evaluate task-level transfer and open-set visual recognition
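As a concrete reading of the adaptation-efficiency measures above, the following generic PyTorch sketch reports #trainable parameters; the frozen backbone and linear head are hypothetical examples, not challenge tooling:

    # Generic sketch: count trainable vs. total parameters, e.g. to report
    # adaptation cost when only a small head is fine-tuned on a frozen backbone.
    import torch.nn as nn

    def parameter_counts(model: nn.Module) -> tuple[int, int]:
        trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
        total = sum(p.numel() for p in model.parameters())
        return trainable, total

    # Hypothetical example: frozen backbone + trainable linear probe.
    backbone = nn.Sequential(nn.Linear(512, 512), nn.ReLU())
    for p in backbone.parameters():
        p.requires_grad = False
    head = nn.Linear(512, 20)  # placeholder: 20 output classes
    print(parameter_counts(nn.Sequential(backbone, head)))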

  • We accept abstract submissions to our workshop. All submissions may have at most 8 pages (excluding references) and should follow the ECCV 2022 author guidelines. All submissions will be reviewed by the Program Committee on the basis of technical quality, relevance to the scope of the workshop, originality, significance, and clarity. The review process is double-blind, and accepted papers are non-archival, i.e., they will not appear in the ECCV proceedings.

    Workshop Paper Submission Portal: [CMT]
    Note: Authors of top workshop papers are highly encouraged to extend their work and submit it to the more rigorous peer-review process of the International Journal of Computer Vision (IJCV) special issue on "Promises and Dangers of Large Vision Models".

CV in the Wild Challenges

    There are two challenges associated with this workshop: "Image Classification in the Wild" (ICinW) and "Object Detection in the Wild" (ODinW). We summarize their evaluation datasets and metrics in the table below.

    Challenge | Eval Datasets                    | Eval Metrics               | Make a Challenge Submission
    ----------+----------------------------------+----------------------------+----------------------------
    ICinW     | 20 image classification datasets | Zero-, few-, and full-shot | (see the submission pages below)
    ODinW     | 35 object detection datasets     | Zero-, few-, and full-shot | (see the submission pages below)
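As an illustration of how such per-dataset results are typically rolled up into a single leaderboard number (our sketch; consult the official benchmark toolkit for the exact protocol), the summary metric is an average over the evaluation datasets, reported separately per shot setting:

    # Illustrative sketch: aggregate per-dataset scores (e.g. accuracy for
    # classification, mAP for detection) into one average per shot setting.
    # Dataset names and numbers below are placeholders, not real results.
    from statistics import mean

    scores = {
        "zero-shot": {"dataset_a": 0.61, "dataset_b": 0.48},
        "few-shot":  {"dataset_a": 0.70, "dataset_b": 0.55},
        "full-shot": {"dataset_a": 0.83, "dataset_b": 0.69},
    }

    for setting, per_dataset in scores.items():
        print(f"{setting}: average score = {mean(per_dataset.values()):.3f}")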
    To prevent a race purely in pre-training data and model size, we host two tracks.
  • For the academic track, pre-training data is limited to: (1) ICinW: ImageNet21K (excluding ImageNet1K), CC3M+CC12M, and YFCC15M; (2) ODinW: Objects365.
  • For the industry track, there is no limit on pre-training data or model size. Teams are required to disclose the meta information of their model and data if extra data is used. Some publicly available image-text datasets: (1) the FLAVA Public Multimodal Datasets (PMD) corpus with 70M pairs; (2) LAION with 400M or 5B pairs.

  • Please see the submission pages for the detailed requirements of each Challenge -> Track -> Phase. More information about the challenge benchmark has been released: [Benchmark] [Document]. Please reach out if you have any issues with your submission.

Dates

July 25, 2022: Competition starts, testing phase begins
September 30, 2022 (extended): Workshop paper submission deadline
October 7, 2022: Competition ends (challenge paper submission deadline)
October 9, 2022: Workshop paper acceptance decisions to authors
October 16, 2022: Camera-ready submission deadline


Invited Speakers (TBD)

Program (TBD)


Workshop Organizers



Chunyuan Li (Microsoft)
Jyoti Aneja (Microsoft)
Jianwei Yang (Microsoft)
Xin Wang (Microsoft)
Pengchuan Zhang (Meta AI)
Haotian Liu (UW Madison)
Haotian Zhang (University of Washington)
Liunian Li (UCLA)
Aishwarya Kamath (NYU)


Challenge Organizers



Yinfei Yang (Apple)
Yi-Ting Chen (Google)
Ye Xia (Google)
Yangguang Li (SenseTime)
Feng Liang (UT Austin)
Yufeng Cui (SenseTime)
Ping Jin (Microsoft)
Shohei Ono (Microsoft)
Houwen Peng (Microsoft)
Saining Xie (NYU/Meta)
Amanpreet Singh (Hugging Face)
Xiaojie Jin (ByteDance)
Jiashi Feng (ByteDance)
Junyang Lin (Alibaba)
An Yang (Alibaba)
Peng Wang (Alibaba)
Nguyen Bach (Microsoft)
Junnan Li (Salesforce)
Han Hu (Microsoft)


Advisory Committee



Trevor Darrell (UC Berkeley)
Lei Zhang (IDEA)
Yong Jae Lee (UW Madison)
Houdong Hu (Microsoft)
Zicheng Liu (Microsoft)
Ce Liu (Microsoft)
Xuedong Huang (Microsoft)
Kai-Wei Chang (UCLA)
Jingdong Wang (Baidu)
Zhuowen Tu (UCSD)
Jenq-Neng Hwang (University of Washington)
Jianfeng Gao (Microsoft)
Yann LeCun (NYU/Meta)


Workshop and Challenge Questions?
Reach out: https://github.com/Computer-Vision-in-the-Wild/eccv-2022
Workshop Organizing Team