This dataset is based on genealogical data from seven Indigenous groups of the Caquetá-Putumayo River Basins of Colombia. Six of the groups speak languages belonging to three language families: Witotoan (Murui-Muina, which encompasses two varieties, Murui and Mɨnɨka; Ocaina; Nonuya), Boran (Bora; Muinane), and Arawak (Resígaro). The seventh, Andoke, speaks a language isolate. The data were collected during fieldwork in Leticia, Colombia, in 2024 by linguist Dr. Katarzyna I. Wojtylak, under the SONATA 16 funding scheme of the National Science Centre (NCN).
The dataset is part of the cross-disciplinary project Social Limits of Languages: The Dynamics of Contact in Northwest Amazonia, which aims to develop a framework for analyzing diffusion patterns in language contact scenarios among the Indigenous languages of the Caquetá-Putumayo region. It provides valuable genealogical data documenting intermarriage, multilingualism, and language vitality, as well as insights into historical linguistic interactions.
Spanning both pre-Rubber Boom societal structures (19th century) and contemporary changes in language use, the dataset offers a comprehensive view of sociolinguistic dynamics in the region. It enables researchers to trace the evolution of language transmission, family structures, and the impact of sociopolitical changes, including colonialism and economic exploitation.
The dataset is structured into CSV and PNG files, containing individual genealogical information, familial relationships, marital ties, and linguistic repertoires. This resource facilitates research into the long-term effects of intermarriage, language contact, and cultural exchange, offering a tool for scholars in sociolinguistics, anthropology, and Indigenous studies.
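As a rough illustration of how the CSV files might be used, the sketch below counts in-group and cross-group marriages from a small inline sample. The column names (person_id, group, spouse_id) and the sample rows are hypothetical, invented for this example only; they are not the dataset's documented schema and should be replaced with the actual column names in the released files.

```python
import csv
import io

# Hypothetical sample standing in for one of the dataset's CSV files.
# The column names (person_id, group, spouse_id) and all rows are
# assumptions for illustration, not the dataset's real schema or data.
SAMPLE_CSV = """person_id,group,spouse_id
P1,Murui,P2
P2,Bora,P1
P3,Ocaina,P4
P4,Ocaina,P3
P5,Andoke,
"""


def count_marriages(csv_text):
    """Return (in_group, cross_group) counts of distinct marriages."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Map each individual to the group recorded for them.
    group_of = {row["person_id"]: row["group"] for row in rows}
    seen = set()
    in_group = 0
    cross_group = 0
    for row in rows:
        spouse = row["spouse_id"]
        # Skip unmarried individuals and spouses missing from the file.
        if not spouse or spouse not in group_of:
            continue
        # A marriage appears once per partner; count each pair only once.
        pair = frozenset((row["person_id"], spouse))
        if pair in seen:
            continue
        seen.add(pair)
        if group_of[spouse] == row["group"]:
            in_group += 1
        else:
            cross_group += 1
    return in_group, cross_group


in_group, cross_group = count_marriages(SAMPLE_CSV)
print(f"in-group: {in_group}, cross-group: {cross_group}")
```

To run this against a real file, the inline sample would be replaced with the contents of the relevant CSV, and the column names adjusted to match it.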
The dataset is available in two versions: open-access anonymized data (CC BY license) and restricted-access data that include personal identifiers, available upon request. Together they support the reconstruction of the social and linguistic history of the Caquetá-Putumayo River Basins in Northwest Amazonia.