Workshop on Computer Vision in the Wild
@ ECCV 2022, October 23
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept.
Recent works show that learning from large-scale image-text data is a promising approach to building transferable visual models that can effortlessly adapt to a wide range of downstream computer vision (CV) and multimodal (MM) tasks. For example, CLIP, ALIGN, and Florence target image classification, while ViLD, RegionCLIP, and GLIP target object detection. These vision models with a language interface are naturally open-vocabulary recognition models, showing superior zero-shot and few-shot adaptation performance in various real-world scenarios.
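The zero-shot recipe these models share can be sketched in a few lines: embed the image and one text prompt per candidate class in a joint space, then classify by scaled cosine similarity. The sketch below is a minimal illustration of that principle only — the random embeddings are hypothetical stand-ins for the outputs of real CLIP-style image and text encoders.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, temperature=0.01):
    """CLIP-style zero-shot inference: L2-normalize embeddings,
    score each class by cosine similarity, and softmax the logits."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = txt @ img / temperature          # scaled cosine similarities
    probs = np.exp(logits - logits.max())     # stable softmax
    return probs / probs.sum()

# Toy example: 3 candidate classes (e.g. prompts "a photo of a {cat,dog,car}").
# Embeddings are random stand-ins for encoder outputs.
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(3, 8))
image_emb = text_embs[1] + 0.1 * rng.normal(size=8)  # image resembles class 1
probs = zero_shot_classify(image_emb, text_embs)
print(probs.argmax())  # index of the most similar class prompt
```

Because the class set is defined purely by the text prompts, swapping in new prompts extends the model to unseen categories with no retraining — the property the benchmarks below are designed to measure.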
We propose this "Computer Vision in the Wild" workshop, aiming to gather the academic and industry communities to work on CV problems in real-world scenarios, focusing on the challenges of open-set/domain visual recognition and efficient task-level transfer. Since there are no established benchmarks to measure the progress of "CV in the Wild", we develop new benchmarks for image classification and object detection to measure the task-level transfer ability of various models/methods over diverse real-world datasets, in terms of both prediction accuracy and adaptation efficiency. This workshop will also host two challenges based on the benchmarks.
Call for Papers
Topics of interest include but are not limited to:
- Open-set visual recognition methods, including classification, object detection, segmentation in images and videos
- Zero/Few-shot text-to-image generation/editing; Open-domain visual QA & image captioning
- Unified neural networks architectures and training objectives over different CV & MM tasks
- Large-scale pre-training, with images/videos only, image/video-text pairs, and external knowledge
- Efficient large visual model adaptation methods, measured by #training samples (zero-shot and few-shot), #trainable parameters, throughput, and training cost
- New metrics / benchmarks / datasets to evaluate task-level transfer and open-domain visual recognition
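As a concrete illustration of the adaptation-efficiency axis above, a linear probe on a frozen backbone trains orders of magnitude fewer parameters than full fine-tuning. The numbers below are hypothetical, chosen roughly for a ViT-B-sized image encoder with a 512-dimensional embedding:

```python
# Hypothetical parameter counts illustrating the #trainable-parameters metric.
backbone_params = 88_000_000      # approximate size of a ViT-B-scale encoder (assumed)
embed_dim, num_classes = 512, 20  # assumed probe dimensions for a downstream task

full_finetune = backbone_params + embed_dim * num_classes
linear_probe = embed_dim * num_classes  # backbone frozen; only the head trains

print(f"full fine-tuning: {full_finetune:,} trainable parameters")
print(f"linear probe:     {linear_probe:,} trainable parameters")
```

Reporting adaptation cost along axes like these, alongside accuracy, is what allows methods with very different transfer strategies to be compared fairly.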
We accept abstract submissions to our workshop. All submissions must be at most 8 pages (excluding references), following the ECCV 2022 author guidelines. All submissions will be reviewed by the Program Committee on the basis of technical quality, relevance to the workshop's scope, originality, significance, and clarity.
Submission Portal: [CMT]
CV in the Wild Challenges
There are two challenges associated with this workshop: "Image Classification in the Wild" (ICinW) and "Object Detection in the Wild" (ODinW). We summarize their evaluation datasets and metrics in the table below.
- For the academic track, pre-training data is limited to ImageNet21k, Objects365, CC15M, and YFCC15M
- For the industry track, there is no limitation on pre-training data or model size. Teams are required to disclose meta information about the model and data if extra data is used.
More information about the challenges is available: [Benchmark] [Document]. Our evaluation server will be online soon.
Invited Speakers (TBD)
Challenge Organizers (TBD)
Workshop and Challenge Questions?
Reach out: https://github.com/Computer-Vision-in-the-Wild/eccv-2022
Workshop Organizing Team