The 2nd Workshop on Computer Vision in the Wild
@ CVPR 2023, June 19
Overview
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concepts.
Recent works show that learning from large-scale image-text data is a promising approach to building transferable visual models that can effortlessly adapt to a wide range of downstream computer vision (CV) and multimodal (MM) tasks. For example:
- CLIP, ALIGN, and Florence for image classification;
- ViLD, RegionCLIP, GLIP, and OWL-ViT for object detection;
- GroupViT, OpenSeg, MaskCLIP, X-Decoder, Segment Anything (SAM), and SEEM for segmentation;
- LLaVA for language-and-image instruction-following chatbots built towards multimodal GPT-4 capabilities.
These vision models with language or interactive interfaces are naturally open-vocabulary recognition models, showing superior zero-shot and few-shot adaptation performance in various real-world scenarios.
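As a concrete illustration of this open-vocabulary property, the sketch below performs zero-shot classification with CLIP: class names are supplied as free-form text prompts at inference time, so new concepts require no additional labeled data or retraining. It assumes the Hugging Face transformers CLIP API; the checkpoint name, image path, and prompt set are illustrative choices, not prescribed by the workshop.

```python
# A minimal sketch of open-vocabulary ("zero-shot") classification with CLIP.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # any RGB image (illustrative path)
# The label set is free-form text, so new concepts need no retraining.
prompts = [f"a photo of a {c}" for c in ["dog", "cat", "traffic light"]]

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
# logits_per_image holds image-text similarity scores, one per prompt.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(prompts, probs[0].tolist())))
```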
We host this "Computer Vision in the Wild (CVinW)" workshop, aiming to gather the academic and industry communities to work on CV and MM problems in real-world scenarios, focusing on the challenges of open-set/domain visual recognition at different granularities and efficient task-level transfer.
To measure the progress of CVinW, we develop new benchmarks for image classification, object detection, and segmentation that measure the task-level transfer ability of various models/methods over diverse real-world datasets, in terms of both prediction accuracy and adaptation efficiency. This workshop is a continuation of our
ECCV 2022 CVinW Workshop. For those who are new to this topic, please check out the CVinW Reading List.
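For readers new to the evaluation protocol, below is a minimal sketch of how adaptation efficiency can be measured with a k-shot linear probe, in the spirit of the ELEVATER setup: only k labeled samples per class are used for adaptation, and accuracy is reported on the full test split. The precomputed feature arrays (e.g., from a frozen CLIP image encoder) and the logistic-regression probe are illustrative assumptions, not the benchmark's exact recipe.

```python
# A sketch of a k-shot evaluation: adapt on only k labeled examples per
# class, then test on the full test split. Feature extraction is abstracted
# away; the arrays below are hypothetical precomputed image embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression

def k_shot_accuracy(train_feats, train_labels, test_feats, test_labels, k, seed=0):
    rng = np.random.default_rng(seed)
    idx = []
    for c in np.unique(train_labels):
        pool = np.flatnonzero(train_labels == c)          # all samples of class c
        idx.extend(rng.choice(pool, size=min(k, len(pool)), replace=False))
    clf = LogisticRegression(max_iter=1000).fit(train_feats[idx], train_labels[idx])
    return clf.score(test_feats, test_labels)

# Sweep the shot budget to trace an accuracy-vs-samples curve per dataset:
# for k in (1, 5, 10, 20):
#     print(k, k_shot_accuracy(tr_x, tr_y, te_x, te_y, k))
```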
This year, our workshop hosts two new challenges:
- Segmentation in the Wild (SGinW): Open-set instance/semantic/panoptic segmentation on dozens of segmentation datasets in realistic scenarios.
- Roboflow 100 for Object Detection in the Wild: An augmented version of our ODinW that expands the benchmark to 100 datasets covering more diverse application domains.
The existing challenges based on the ELEVATER benchmark also welcome new submissions:
- Image Classification in the Wild (ICinW)
- Object Detection in the Wild (ODinW)
Dates
- Feb 2023: Competition starts; testing phase begins
- April 28th, 2023: Workshop paper submission deadline
- May 19th, 2023: Workshop paper acceptance decisions to authors
- June 2nd, 2023: Competition ends (challenge paper submission)
- June 2nd, 2023: Camera-ready submission deadline
Keynote Speaker

Andrew Ng
Founder of DeepLearning.AI and Landing AI, General Partner at AI Fund, Chairman and Co-Founder of Coursera, and Adjunct Professor at Stanford University.
Invited Speakers/Panelists
Call for Papers
Topics of interest include but are not limited to:
- Open-set visual recognition methods, including classification, object detection, segmentation in images and videos
- Zero/few-shot text-to-image generation/editing; open-domain visual QA, image captioning, and multimodal instruction-following chatbots
- Unified neural networks architectures and training objectives over different CV & MM tasks
- Large-scale pre-training, with images/videos only, image/video-text pairs, and external knowledge
- Efficient large visual model adaptation methods, measured by #training samples (zero-shot and few-shot), #trainable parameters, throughput, and training cost (see the parameter-counting sketch after this list)
- New metrics / benchmarks / datasets to evaluate task-level transfer and open-set visual recognition
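As a concrete example of one adaptation-cost metric named above, the following is a minimal sketch of counting trainable parameters under a parameter-efficient adaptation scheme (a linear probe on a frozen backbone). The torchvision ResNet-50 backbone and the 20-class head are illustrative assumptions, not a prescribed setup.

```python
# A sketch of one reportable adaptation cost: the trainable-parameter count.
import torch.nn as nn
from torchvision.models import resnet50

model = resnet50()
for p in model.parameters():      # freeze the pre-trained backbone
    p.requires_grad = False
# Replace the classifier with a fresh, trainable head (hypothetical 20 classes).
model.fc = nn.Linear(model.fc.in_features, 20)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / total: {total:,}")
```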
We accept abstract submissions to our workshop. All submissions shall have at most 8 pages (excluding references), following the CVPR 2023 author guidelines. All submissions will be reviewed by the Program Committee on the basis of technical quality, relevance to the scope of the workshop, originality, significance, and clarity. The review process is double-blind, and accepted papers are NOT archived in the CVPR proceedings.
Workshop Paper Submission Portal:
[CMT]
Computer Vision in the Wild Challenges
Two new challenges are introduced this year: Segmentation in the Wild (SGinW) and Roboflow 100 for Object Detection in the Wild, described above.
The two existing challenges associated with this workshop are "Image Classification in the Wild" (ICinW) and "Object Detection in the Wild" (ODinW). We summarize their evaluation datasets and metrics in the table below.
- For the academic track, pre-training data is limited to: (1) ICinW: ImageNet-21K (excluding ImageNet-1K), CC3M+CC12M, and YFCC15M; (2) ODinW: Objects365; (3) SGinW: COCO and RefCOCOg.
- For the industry track, there is no limitation on pre-training data or model size. Teams are required to disclose meta information about the model and data if extra data is used. Some publicly available image-text datasets: (1) the FLAVA Public Multimodal Datasets (PMD) corpus with 70M pairs; (2) LAION with 400M or 5B pairs.
[Table: evaluation datasets and metrics for the ICinW and ODinW challenges]
Please see the submission pages for detailed requirements for each Challenge -> Track -> Phase. More information about the challenge benchmark is available: [Benchmark] [Document] [Data Download]. Please reach out if you have any issues with submissions.
Tentative Schedule
- Keynote: Andrew Ng
- Invited Talk: Anelia Angelova
- Spotlight Paper Presentations
- Challenge Summary
- Invited Talk: Jiasen Lu
- Invited Talk: Katerina Fragkiadaki
- Lunch Break
- Invited Talk: Kristen Grauman
- Invited Talk: Ziwei Liu
- Poster Session (Afternoon Break)
- Invited Talk: Yinfei Yang
- Invited Talk: Justin Johnson
- Panel Discussion: Boqing Gong, Bryan A. Plummer, Katerina Fragkiadaki, Dhruv Batra, Jacob Solawetz
Workshop Organizers
Challenge Organizers (TBD)
Workshop and Challenge Questions?
Reach out: https://github.com/Computer-Vision-in-the-Wild/cvpr-2023
Workshop Organizing Team