Context-Robust Object Recognition via Object Manipulations in a Synthetic 3D Environment
Access Status: Full text of the requested work is not available in DASH at this time ("dark deposit"). For more information on dark deposits, see our FAQ.
Karev, Dimitar Nikolaev
Citation: Karev, Dimitar Nikolaev. 2021. Context-Robust Object Recognition via Object Manipulations in a Synthetic 3D Environment. Bachelor's thesis, Harvard College.
Abstract: The remote control is a small object that does not fly in the air and is generally found on a table, not in the sink. Such contextual regularities are ingrained in our perception of the world, and previous research suggests that they can even influence the object recognition ability of both humans and computational models. However, the exact effects of contextual information on object recognition are still unknown for both humans and machine learning models. Here, we introduce a novel way of studying the effects of different contextual cues in a qualitative and systematic way. We present a diverse synthetic dataset created via a 3D simulation engine that allows for complex object modifications. Our dataset consists of more than 15,000 images across 36 object categories, and it is designed specifically for studying the effects of gravity, object co-occurrence statistics, and relative size regularities. We conduct a series of psychophysics experiments to assess human performance and establish a benchmark for computational models on the dataset. Additionally, we test state-of-the-art deep learning models on the same dataset and study how contextual information influences their object recognition accuracy. Finally, we propose a context-aware recognition transformer network that integrates contextual and object information via a multi-head attention mechanism. Our model captures useful contextual information that allows it to achieve human-level performance and significantly better robustness in out-of-context conditions than baseline models, both on our dataset and on another existing out-of-context natural image dataset. Moreover, our model performs in a way that is consistent with human object recognition and shows similar recognition artifacts.
Citable link to this page: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37368550
- FAS Theses and Dissertations