Frermann, Lea; Lapata, Mirella

Categorization in the Wild: Category and Feature Learning across Languages

2021

Abstract

Categories such as 'animal' or 'furniture' play a pivotal role in processing, organizing, and communicating world knowledge. Many theories and computational models of categorization exist, but evaluation has disproportionately focused on artificially simplified learning problems (e.g., by assuming a given set of relevant features or small data sets) and on native English speakers. This paper presents a large-scale computational study of category and feature learning. We approximate the learning environment with natural language text and scale previous work in three ways: we (1) model the full complexity of the learning process, acquiring categories and structured features jointly; (2) study the generalizability of categorization models to five diverse languages; and (3) learn categorizations comprising hundreds of concepts and thousands of features. Our experiments show that meaningful representations emerge across languages. We further demonstrate that a joint model of category and feature acquisition produces more relevant and coherent features than simpler models, suggesting its use as an exploratory tool to support cross-cultural categorization studies.

Proceedings of the Annual Meeting of the Cognitive Science Society
