
PERFECT PHYLOGENETIC NETWORKS: A NEW METHODOLOGY FOR RECONSTRUCTING THE EVOLUTIONARY HISTORY OF NATURAL LANGUAGES

LUAY NAKHLEH (Rice University), DON RINGE (University of Pennsylvania), TANDY WARNOW (University of Texas)

In this article we extend the model of language evolution exemplified in Ringe et al. 2002, which recovers phylogenetic trees optimized according to a criterion of weighted maximum compatibility, to include cases in which languages remain in contact and trade linguistic material as they evolve. We describe our analysis of an Indo-European (IE) dataset (originally assembled by Ringe and Taylor) based on this new model. Our study shows that this new model fits the IE family well and suggests that the early evolution of IE involved only limited contact between distinct lineages. Furthermore, the candidate histories we obtain appear to be consistent with archaeological findings, which suggests that this method may be of practical use. The case at hand provides no opportunity to explore the problem of conflict between network optimization criteria; that problem must be left to future research.*

1. INTRODUCTION. Languages differentiate and divide into new languages by a process roughly similar to biological speciation:[1] communities separate (typically geographically), the language changes differently in each of the new communities, and in time people from separate communities can no longer understand each other.[2] While this is not the only way in which languages change, it is this process that is referred to when we say, for example, ‘French and Italian are both descendants of Latin’. The evolution of families of related languages can be modeled mathematically as a rooted tree in which internal nodes represent ancestral languages at the points in time at which they began to diversify and the leaves represent attested languages. Reconstructing this process for various language families is a major endeavor within historical linguistics, but it is also of interest to archaeologists, human geneticists, and physical anthropologists, for example, because an accurate reconstruction of how particular families of languages have evolved can help answer questions about human migrations, the times at which new technologies were first developed, when ancient people began to use horses, and so on (see e.g. White & O’Connell 1982, Mallory 1989, Roberts et al. 1990).[3]

* This work was supported in part by the David and Lucile Packard Foundation (Warnow) and by the National Science Foundation with grants EIA 01-21680 (Warnow), BCS 03-12830 (Warnow), SBR-9512092 (Warnow and Ringe), and BCS 03-12911 (Ringe). Warnow would like to acknowledge and thank the Radcliffe Institute for Advanced Study, the Program in Evolutionary Dynamics at Harvard University, and the Institute for Cellular and Molecular Biology at the University of Texas at Austin for their support during the time this work was done. The authors would like to thank Ann Taylor for help in putting the dataset together and James Clackson, Joe Eska, and Craig Melchert for expert advice regarding data of particular languages. The software used to construct perfect phylogenetic networks was written by Luay Nakhleh but used earlier code (for perfect phylogeny reconstruction) developed by Alexander Michailov and optimized by Alex Garthwaite.

[1] We take this opportunity to point out that the similarity between biological and linguistic speciation has nothing whatsoever to do with nineteenth-century ideas about the ‘organic’ nature of language. The micro-level processes of biological descent and linguistic descent are actually quite different, but they give rise to similar large-scale patterns, and the similarities are topological—that is, mathematical (see Hoenigswald 1960:144–60, 1987, Ruvolo 1987).

[2] We are well aware that whether one is confronted with ‘the same language’ or ‘different languages’ is a complex matter. However, it seems difficult to dispute that two speakers who cannot understand one another at all are ‘speaking different languages’; we therefore adduce that situation as the paradigm case. What matters for cladistics is that, given enough divergence with too little effective contact, a single language will eventually become two or more different languages by any reasonable criterion.

Various researchers (e.g. Gleason 1959, Dobson 1969, 1974, Embleton 1986) have noted that if speech communities do not remain in effective contact as their languages diverge, a tree is a reasonable model for the evolutionary history of their language family, and that this tree (called a PHYLOGENY or EVOLUTIONARY TREE) can be inferred from shared unusual innovations in language structure (changes in inflection, regular sound changes, and the replacement of lexemes for basic meanings). Such techniques established the major subfamilies within Indo-European (IE) decades ago but have not been sufficient to resolve the family’s evolution fully; major questions, such as whether all of the non-Anatolian branches of the family constitute a clade (the ‘Indo-Hittite hypothesis’) or whether Greek and Armenian are sisters, continue to be debated. More recently, techniques for using multistate characters have been devised which suggest that the vast majority of linguistic characters,[4] provided that they are correctly chosen and coded, should be COMPATIBLE on the true tree (see Ringe et al. 2002:70–78 with references); in other words, each character should evolve without backmutation or parallel evolution.[5] This condition is also expressed by saying that the tree is a PERFECT PHYLOGENY, that is, a phylogenetic tree that is fully compatible with all of the data.





(See §2 for an extended discussion of those requirements.) A collaboration between linguist Don Ringe and computer scientist Tandy Warnow led to a computational technique to solve the ‘perfect phylogeny’ problem (determining whether a perfect phylogeny exists for a given dataset); that technique was subsequently used to analyze an IE dataset compiled by Don Ringe and Ann Taylor (see the references under all three authors in the bibliography). Their initial test of the methodology largely supported the claim that a perfect phylogeny should exist, but not entirely. The Germanic subfamily especially seemed to exhibit nontreelike behavior, evidently acquiring some of its characteristics from its neighbors rather than (only) from its direct ancestors.[6] Consequently, though their methodology seemed promising and offered potential answers to many of the controversial problems in the evolution of IE (cf. Jasanoff 1997, Winter 1998, Ringe 2000 with references), it is clearly necessary to extend their model to address the problem of how characters evolve when diverging language communities remain in significant contact. For such cases trees are not an appropriate model of evolution; NETWORKS are needed instead to model the evolutionary history of the family.

[3] Readers who have not been trained in historical linguistics also need to understand that recognition of language families is different from and independent of the reconstruction of phylogenetic trees, and that the recognition of cognates—words and affixes inherited by two or more related languages from any common ancestor—also does not depend on prior knowledge of the true tree. Cognates are recognized by the regular correspondences between their sounds that are the direct result of regular sound changes; see especially Hoenigswald 1960 for discussion of this fundamental point. Cognates cannot be reliably recognized by mere similarity. Language families are recognized by a density of putative cognates too great to be attributed to mere chance resemblance; see Ringe 1999 for some of the problems involved.

[4] A character is a linguistic parameter in which languages can agree or differ; languages are assigned the same state of the character if they agree, but different states if they differ. Characters of interest in linguistic phylogeny are highly specific, since general characters (such as word order) typically reveal much less about shared linguistic history. For instance, the basic meaning ‘hand’ can be chosen as a character; IE languages that exhibit cognates of English hand will all be assigned a single state of that character, languages that exhibit cognates of French main a second state, and so on. Among phonological developments, across-the-board merger of Proto-Indo-European (PIE) *m and *mbh can be chosen as a character; the two Tocharian languages (which share that merger) will then be assigned one state, while all the other IE languages in our database (which did not undergo the merger) will be assigned another state. On the coding of characters see further Ringe et al. 2002:71–76.

[5] Backmutation is the reappearance at some point in the phylogenetic tree of a state that has already appeared at some earlier point in the same line of descent but was subsequently lost; in other words, a sequence of states a → b (→ c ...) → a in a single line of descent is backmutation. Parallel development is the appearance of the same state independently in different lines of descent.

[6] We wish to emphasize that this appears to be an ineluctable conclusion of Ringe et al. 2002; we see no grounds for questioning it and do not revisit the problem here. Interested readers are referred to Ringe et al. 2002, especially pp. 85–92. Since the best tree found in that earlier work also figures largely in this

LANGUAGE, VOLUME 81, NUMBER 2 (2005)
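The definitions of backmutation and parallel development given above can be made concrete with a small sketch. It assumes a toy rooted tree in which every node, including the ancestral ones, already carries a state. The node names, the `PARENT` map, the two state assignments, and all helper functions are hypothetical illustrations and are not taken from the authors' software or data.

```python
def lineage(parent, node):
    """Path of nodes from the root down to `node` in a child -> parent map."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def has_backmutation(states_along_line):
    """True if a state reappears in one line of descent after being replaced."""
    # Collapse runs of the same state, then look for a state that occurs twice.
    collapsed = [s for i, s in enumerate(states_along_line)
                 if i == 0 or s != states_along_line[i - 1]]
    return len(collapsed) != len(set(collapsed))


def origins(parent, state):
    """For each state, the nodes where it arises (root, or differs from parent)."""
    result = {}
    for node, par in parent.items():
        if par is None or state[par] != state[node]:
            result.setdefault(state[node], []).append(node)
    return result


def has_parallel_development(parent, state):
    """True if a state arises at two nodes, neither descended from the other."""
    for nodes in origins(parent, state).values():
        for i, x in enumerate(nodes):
            for y in nodes[i + 1:]:
                if x not in lineage(parent, y) and y not in lineage(parent, x):
                    return True
    return False


# Toy tree (hypothetical): root has daughters u and v; A, B descend from u; C, D from v.
PARENT = {"root": None, "u": "root", "v": "root",
          "A": "u", "B": "u", "C": "v", "D": "v"}
# State 'a' is lost at u and reappears at A: backmutation in the line root-u-A.
BACK = {"root": "a", "u": "b", "v": "a", "A": "a", "B": "b", "C": "a", "D": "a"}
# State 'b' arises separately at A and at C: parallel development.
PARALLEL = {"root": "a", "u": "a", "v": "a", "A": "b", "B": "a", "C": "b", "D": "a"}
```

A character that shows neither pattern on a given tree is what the text calls compatible with that tree; in practice the ancestral states are not observed, so a real analysis has to ask whether some assignment of ancestral states avoids both patterns.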

In this article we show how to extend the perfect phylogeny approach to the case in which the language family requires a network model (that is, an underlying tree with additional ‘contact’ edges; see Fig. 3 for an example) instead of a tree model, and we test this approach on the same IE dataset analyzed by Ringe, Warnow, and Taylor. Our analysis finds several networks with a very small number of contact edges that are plausible with respect to what is known about the early linguistic geography of the IE family. The study thus leads us to conjecture that the IE family, though it did not evolve by means of clean speciation, exhibits a pattern of initial diversification that is close to treelike: the vast majority of characters evolve down the ‘genetic’ tree, and the evolution of the rest can be accounted for by positing limited borrowing between languages. It also suggests that this extended model of character evolution is plausible and that the tools we have developed may be helpful in reconstructing evolutionary histories for other datasets that are similarly close to treelike in their evolution.
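As a rough illustration of the idea of an underlying tree with additional contact edges, the sketch below tests whether a character is compatible on a toy genetic tree and, if not, whether it becomes compatible once one language is allowed to take its state across a single contact edge. This is a simplifying assumption made for illustration only: the formal model is the one in the authors' Appendix A, which is not reproduced here, and the tree, the contact edge, and the function names are all hypothetical.

```python
from itertools import combinations


def to_adjacency(parent):
    """Undirected adjacency sets for a tree given as a child -> parent map."""
    adj = {n: set() for n in parent}
    for child, par in parent.items():
        if par is not None:
            adj[child].add(par)
            adj[par].add(child)
    return adj


def compatible_on_tree(parent, character):
    """True if each state of `character` (node -> state) can occupy its own
    connected part of the tree, i.e. no backmutation or parallel development."""
    adj = to_adjacency(parent)
    subtrees = []
    for s in set(character.values()):
        keep = {n for n, t in character.items() if t == s}
        alive = set(adj)
        changed = True
        while changed:  # strip nodes that merely dangle off the part joining `keep`
            changed = False
            for n in list(alive):
                if n not in keep and len(adj[n] & alive) <= 1:
                    alive.discard(n)
                    changed = True
        subtrees.append(alive)
    return all(a.isdisjoint(b) for a, b in combinations(subtrees, 2))


def is_ancestor(parent, a, b):
    """True if `a` lies on the path from `b` up to the root (or a == b)."""
    while b is not None:
        if a == b:
            return True
        b = parent[b]
    return False


def fits_network(parent, contact_edges, character):
    """Assumption: a character fits if it is compatible on the genetic tree, or
    on a tree where one recipient inherits across a single contact edge."""
    if compatible_on_tree(parent, character):
        return True
    for x, y in contact_edges:
        for donor, recipient in ((x, y), (y, x)):
            if is_ancestor(parent, recipient, donor):
                continue  # re-parenting here would create a cycle
            borrowed = dict(parent)
            borrowed[recipient] = donor
            if compatible_on_tree(borrowed, character):
                return True
    return False


# Toy network (hypothetical): genetic tree plus one contact edge between B and C.
PARENT = {"root": None, "u": "root", "v": "root",
          "A": "u", "B": "u", "C": "v", "D": "v"}
CONTACT = [("B", "C")]
TREELIKE = {"A": 1, "B": 1, "C": 2, "D": 2}   # evolves down the genetic tree
BORROWED = {"A": 1, "B": 2, "C": 2, "D": 1}   # needs the B-C contact edge
UNEXPLAINED = {"A": 1, "B": 2, "C": 1, "D": 2}  # this one contact edge does not help
```

With this toy input, `TREELIKE` is compatible on the tree itself, `BORROWED` is compatible only when the contact edge is used, and `UNEXPLAINED` fits neither way, which mirrors the distinction the text draws between characters that evolve down the ‘genetic’ tree and those accounted for by limited borrowing.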

The rest of this article is organized as follows. We review the model of Ringe and Warnow, and then present our extension to the case of network evolution. We next describe the data we use to represent the IE family, and then turn to our computational analysis of the data, which results in the candidate networks we then consider. Comparing the candidate networks in the light of known IE history produces a set of five feasible solutions, leading to a detailed discussion of the best network that we find. We conclude with a discussion of the implications of this work for future research in IE and general historical linguistics. Notes on the formal mathematical model of language evolution on networks and the computational approach are given in Appendix A. The full set of our coded data, together with a list of characters omitted and the reasons for their omission, is made available in an online appendix at http://www.cs.rice.edu/~nakhleh/CPHL; a selection is given in Appendix B.

2. INFERRING EVOLUTIONARY TREES. An evolutionary tree, or phylogeny, for a language family S describes the evolution of the languages in S from their most recent common ancestor. Different types of data can be used as input to methods of tree reconstruction; QUALITATIVE CHARACTER data, which reflect specific observable discrete characteristics of the languages under study, are one such type of data. Qualitative characters for languages can encode phonological, morphological, and lexical evidence, as described immediately below. Current approaches for subgrouping used in historical linguistics explicitly select characters that appear to have evolved without backmutation or parallel development; because of this, our analysis is based on a subset of the characters (eliminating those with clear parallel development, in particular). We have also found it advisable to eliminate characters that are POLYMORPHIC (those for which at least one language exhibits more than one state) because models of linguistic evolution involving polymorphic characters that are (at least provisionally) accepted as linguistically realistic have not yet been established.


Experience shows that it is easy to construct a comparative dataset using only qualitative characters that evolve without backmutation—that is, characters that never change from a given state to a second state (and potentially to a third, etc.) and then back to the given state (see Ringe et al. 2002:70). The relative absence of backmutation in linguistic data is partly the result of known properties of linguistic systems and language change and partly the result of probabilistic factors. Backmutation in phonological characters is easy to avoid: since phonemic mergers are irreversible (Hoenigswald 1960:


