eScholarship
Open Access Publications from the University of California

Benchmarking mid-level vision with texture-defined 3D objects

Abstract

We introduce a new benchmark dataset based on classic methods of studying 3D shape perception from texture and motion, inspired by earlier work on Gestalt principles of perceptual organization and the ecological (Gibsonian) approach to perception of structure in moving displays. The dataset consists of parametric 3D shapes (superquadrics) with procedurally generated textures, rotating and translating against a similarly textured backdrop. We expect these stimuli to be challenging for current computer vision models, as they depart from the statistics of real-world or realistically rendered stimuli. We test a variety of models' ability to segment textured stimuli across three training conditions: (i) pre-trained on naturalistic stimuli, (ii) pre-trained and fine-tuned on textured stimuli, and (iii) trained on textured stimuli. While no models generalize to segment textured stimuli without fine-tuning, performance improves with fine-tuning and with training on textured stimuli. We discuss how this benchmark can guide models of scene perception towards more human-like robustness and generality.
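The stimuli are built from superquadrics, a standard parametric family of 3D shapes controlled by two shape exponents. As an illustration (not the authors' actual generation code), the sketch below samples points on a superquadric surface using the standard parametric form; the function name, parameter names, and defaults are assumptions for this example.

```python
import numpy as np

def superquadric_surface(a=(1.0, 1.0, 1.0), eps=(1.0, 1.0), n=64):
    """Sample an (n, n, 3) grid of points on a superquadric surface.

    a   : semi-axis lengths (a1, a2, a3)
    eps : shape exponents (eps1, eps2); (1, 1) gives an ellipsoid,
          values near 0 approach a box, larger values pinch the shape
    n   : number of samples per angular parameter
    """
    def f(w, m):
        # signed power sign(w) * |w|**m, as in the standard superquadric form
        return np.sign(w) * np.abs(w) ** m

    eta = np.linspace(-np.pi / 2, np.pi / 2, n)   # latitude parameter
    omega = np.linspace(-np.pi, np.pi, n)         # longitude parameter
    eta, omega = np.meshgrid(eta, omega, indexing="ij")

    x = a[0] * f(np.cos(eta), eps[0]) * f(np.cos(omega), eps[1])
    y = a[1] * f(np.cos(eta), eps[0]) * f(np.sin(omega), eps[1])
    z = a[2] * f(np.sin(eta), eps[0])
    return np.stack([x, y, z], axis=-1)
```

Varying `eps` smoothly morphs the shape between box-like, ellipsoidal, and pinched forms, which is what makes the family convenient for parametrically controlled stimuli; a procedural texture would then be mapped over these surface points.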
