The 2nd Workshop on Computer Vision in the Wild
@ CVPR 2023, June 19

Overview

State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concepts.

Recent works show that learning from large-scale image-text data is a promising approach to building transferable visual models that can effortlessly adapt to a wide range of downstream computer vision (CV) and multimodal (MM) tasks. Examples include CLIP, ALIGN and Florence for image classification; ViLD, RegionCLIP, GLIP and OWL-ViT for object detection; GroupViT, OpenSeg, MaskCLIP, X-Decoder, Segment Anything (SAM) and SEEM for segmentation; and LLaVA for language-and-image instruction-following chatbots built towards multimodal GPT-4 capabilities. These vision models with language or interactive interfaces are naturally open-vocabulary recognition models, showing superior zero-shot and few-shot adaptation performance in various real-world scenarios.
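As a concrete illustration, the sketch below runs zero-shot, open-vocabulary image classification with CLIP through the Hugging Face Transformers API. The checkpoint name, sample image URL, and label set are illustrative assumptions, not part of the workshop itself; the point is that any visual concept expressible in text can be scored without retraining.

    # Minimal zero-shot classification sketch with CLIP (Hugging Face Transformers).
    # The checkpoint, image URL, and label set are illustrative choices.
    import requests
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # sample COCO image
    image = Image.open(requests.get(url, stream=True).raw)

    # The "label set" is free-form text, so new concepts need no retraining --
    # this is what makes the model open-vocabulary.
    labels = ["a photo of a cat", "a photo of a dog", "a photo of a truck"]

    inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = outputs.logits_per_image.softmax(dim=-1)  # image-text similarity -> probabilities

    for label, p in zip(labels, probs[0].tolist()):
        print(f"{label}: {p:.3f}")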

We host this "Computer Vision in the Wild (CVinW)" workshop, aiming to gather the academic and industry communities to work on CV and MM problems in real-world scenarios, focusing on the challenge of open-set/domain visual recognition at different granularities and efficient task-level transfer. To measure the progress of CVinW, we develop new benchmarks for image classification, object detection and segmentation that measure the task-level transfer ability of various models/methods over diverse real-world datasets, in terms of both prediction accuracy and adaptation efficiency. This workshop is a continuation of our ECCV 2022 CVinW Workshop. For those who are new to this topic, please check out the CVinW Reading List.


Dates

Feb 2023: Competition starts, testing phase begins
June 2nd, 2023: Competition ends (challenge paper submission deadline)
April 28th, 2023: Workshop paper submission deadline
May 19th, 2023: Workshop paper acceptance decision to authors
June 2nd, 2023: Camera-ready submission deadline


Keynote Speaker



Andrew Ng
Founder of DeepLearning.AI and Landing AI, General Partner at AI Fund, Chairman and Co-Founder of Coursera, and Adjunct Professor at Stanford University.


Invited Speakers/Panelists



Kristen Grauman
University of Texas at Austin



Boqing Gong
Google



Justin Johnson
University of Michigan | FAIR



Yinfei Yang
Apple



Bryan A. Plummer
Boston University



Ziwei Liu
NTU



Jacob Solawetz
Roboflow



Anelia Angelova
Google Brain



Jiasen Lu
Allen Institute for AI



Katerina Fragkiadaki
CMU



Dhruv Batra
Georgia Tech | FAIR


Call for Papers

Topics of interest include but are not limited to:

  • Open-set visual recognition methods, including classification, object detection, and segmentation in images and videos
  • Zero/few-shot text-to-image generation and editing; open-domain visual QA, image captioning, and multimodal instruction-following chatbots
  • Unified neural network architectures and training objectives across different CV & MM tasks
  • Large-scale pre-training, with images/videos only, image/video-text pairs, and external knowledge
  • Efficient large visual model adaptation methods, measured by #training samples (zero-shot and few-shot), #trainable parameters, throughput, and training cost (see the linear-probe sketch after this list)
  • New metrics / benchmarks / datasets to evaluate task-level transfer and open-set visual recognition
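One common instantiation of the adaptation-efficiency axes above is linear probing: freeze a pre-trained backbone and train only a small task head, so the trainable-parameter count stays tiny regardless of backbone size. The sketch below uses a torchvision ResNet-50 purely as a stand-in; the backbone, class count, and optimizer are assumptions, not part of any official benchmark protocol.

    # Linear-probe sketch: count trainable vs. total parameters when only a
    # small head is trained on top of a frozen pre-trained backbone.
    # Backbone, class count, and optimizer are illustrative assumptions.
    import torch
    import torch.nn as nn
    from torchvision.models import resnet50, ResNet50_Weights

    backbone = resnet50(weights=ResNet50_Weights.DEFAULT)
    backbone.fc = nn.Identity()          # expose the 2048-d pooled features
    for p in backbone.parameters():
        p.requires_grad = False          # frozen: contributes no trainable params

    head = nn.Linear(2048, 20)           # e.g., a hypothetical 20-way downstream task

    trainable = sum(p.numel() for p in head.parameters())
    total = trainable + sum(p.numel() for p in backbone.parameters())
    print(f"trainable: {trainable:,} / {total:,} params ({100 * trainable / total:.2f}%)")

    # Few-shot adaptation then optimizes only the head:
    optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)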

We accept abstract submissions to our workshop. Submissions may be up to 8 pages (excluding references), following the CVPR 2023 author guidelines. All submissions will be reviewed by the Program Committee on the basis of technical quality, relevance to the scope of the workshop, originality, significance, and clarity. The review process is double-blind, and accepted papers are NOT archived in the CVPR proceedings.

    Workshop Paper Submission Portal: [CMT]


Computer Vision in the Wild Challenges

Two new challenges are introduced this year:

Challenge    Eval Datasets                     Eval Metrics
SGinW        25 Image Segmentation Datasets    Zero, few, full-shot
RF100        100 Object Detection Datasets     Zero, few, full-shot
Two existing challenges, "Image Classification in the Wild" (ICinW) and "Object Detection in the Wild" (ODinW), are also associated with this workshop. We summarize their evaluation datasets and metrics in the table below.

Challenge    Eval Datasets                       Eval Metrics
ICinW        20 Image Classification Datasets    Zero, few, full-shot
ODinW        35 Object Detection Datasets        Zero, few, full-shot

To prevent a race purely in pre-training data and model size, we will have two tracks:
  • For the academic track, the pre-training data is limited: (1) ICinW: ImageNet21K (excluding ImageNet1K), CC3M+CC12M, YFCC15M; (2) ODinW: Objects365; (3) SGinW: COCO, RefCOCO-g.
  • For the industry track, there is no limitation on pre-training data or model size. Teams are required to disclose the meta information of the model and data if extra data is used. Some publicly available image-text datasets: (1) the FLAVA Public Multimodal Datasets (PMD) corpus with 70M pairs; (2) LAION with 400M or 5B pairs.

  • Please see the submission pages for the detailed requirements of each Challenge -> Track -> Phase. More information about the challenge benchmark has been released: [Benchmark] [Document] [Data Download]. Please reach out if you have any issues with submissions.


Tentative Schedule

Time                     Session                           Speaker(s)
9:00 AM - 9:30 AM PT     Keynote                           Andrew Ng
9:30 AM - 10:00 AM PT    Invited Talk                      Anelia Angelova
10:00 AM - 10:30 AM PT   Spotlight Paper Presentations
10:30 AM - 11:00 AM PT   Challenge Summary
11:00 AM - 11:30 AM PT   Invited Talk                      Jiasen Lu
11:30 AM - 12:00 PM PT   Invited Talk                      Katerina Fragkiadaki
12:00 PM - 1:30 PM PT    Lunch Break
1:30 PM - 2:00 PM PT     Invited Talk                      Kristen Grauman
2:00 PM - 2:30 PM PT     Invited Talk                      Ziwei Liu
2:30 PM - 3:30 PM PT     Poster Session (Afternoon Break)
3:30 PM - 4:00 PM PT     Invited Talk                      Yinfei Yang
4:00 PM - 4:30 PM PT     Invited Talk                      Justin Johnson
4:30 PM - 5:30 PM PT     Panel Discussion                  Boqing Gong, Bryan A. Plummer, Katerina Fragkiadaki, Dhruv Batra, Jacob Solawetz

Workshop Organizers



Jianwei Yang
Microsoft



Haotian Zhang
Apple



Haotian Liu
UW Madison



Xiuye Gu
Google



Chunyuan Li
Microsoft



Neil Houlsby
Google



Jianfeng Gao
Microsoft


Challenge Organizers (TBD)



Xueyan Zou
UW Madison



Francesco Zuppichini
Roboflow


Workshop and Challenge Questions?
Reach out: https://github.com/Computer-Vision-in-the-Wild/cvpr-2023
Workshop Organizing Team