ETYMOLOGY AND ELECTRONICS: THE AFROASIATIC INDEX
By Gene Gragg, Professor of Near Eastern Languages
The Oriental Institute
The University of Chicago
(This article originally appeared in The Oriental Institute News and Notes, No. 149, Spring 1996, and is made available electronically with the permission of the editor.)
The word "etymology" comes wrapped in musty, bookish connotations. It brings up memories of the initial section of lexical entries, often in smaller print, encountered while browsing in older, more compendious dictionaries. In these somewhat detached sections one can pick up unexpected, sometimes delightful, but somehow not very practical bits of anecdotal information: for example, that the (native) English word "dough" and (the ultimately Latin loan-word) "fiction" are historically related through regular developments, which took place independently in Germanic and Italic, from the same ancestral Indo-European root reconstructed as *dheigh- "knead, fashion".
Approaching etymology from this angle, it is easy to lose sight of, or never even become aware of, the fact that (a) the establishment of a historical (linguists often use the biology-influenced term "genetic") relationship among a group of languages, that is, the demonstration that they descend from the language of a single earlier parent speech community, (b) the reconstruction of this parent ("proto-") language, and (c) the working out of the historical ("evolutionary") steps whereby the parent language became differentiated into the various daughter languages all depend crucially and centrally on the ability of the historical linguist to establish sets of etymologically related words ("cognate sets") within the language family, and to work out regular phonological and morphological correspondences within and between these sets. It is this process, and only this process, that entitles a linguist to assert that the languages in question are indeed genetically related, and that the resemblances are not simply the result of contact or convergence between independent speech communities. Thus the first step towards being able to draw that historically and socio-culturally important conclusion is the establishment of a sufficiently large set of related lexical items: in other words, an etymological dictionary or database.
To continue the illustration with "dough": we can (1) establish large numbers of equations such as English dough = German Teig, English deed = German Tat, English deep = German tief, heap = Haufe, and hip = Hüfte (adding, of course, cognate items in Dutch, Scandinavian, Gothic, and older periods of English and German); and (2) observe regular phoneme correspondences such as English d = German t (in the first three items) and English p = German f (in the last three). Taken together with many other observations, both linguistic and historical-archaeological, these facts enable historical linguists to state with a certain amount of confidence:
-that there was a (more or less) unitary proto-Germanic speech community somewhere in north-central Europe, probably sometime late in the first millennium BC
-that all attested Germanic languages are developments of this proto-speech community
-that a fair amount of information can be recovered about what this language was like (for example, that the five partial cognate sets cited above are reflexes of proto-Germanic lexical items which, according to one reconstruction, may have been something like: *daigo-z "dough," *daedi-z "deed," *deupo-z "deep," *haupo-z "heap," *hupi-z "hip").
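The reasoning behind the "dough" illustration, that a correspondence counts only when it recurs across many independent equations, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the segment pairs are aligned by hand, only the correspondences the text cites are recorded, and the names EQUATIONS and correspondence_sets are hypothetical. Real comparative work aligns whole words and conditions each correspondence on its position and phonetic environment.

```python
from collections import defaultdict

# The five English = German equations from the "dough" illustration.
# Segment pairs are aligned by hand (an assumption of this sketch), and
# only the correspondences cited in the text are recorded.
EQUATIONS = [
    ("dough", "Teig", [("d", "t")]),
    ("deed", "Tat", [("d", "t")]),
    ("deep", "tief", [("d", "t"), ("p", "f")]),
    ("heap", "Haufe", [("p", "f")]),
    ("hip", "Hüfte", [("p", "f")]),
]


def correspondence_sets(equations, min_support=2):
    """Group equations by the segment correspondence they attest.

    A pairing is kept as a candidate correspondence set only if at least
    `min_support` equations attest it; a single match may reflect chance
    or borrowing rather than a regular development.
    """
    support = defaultdict(list)
    for english, german, pairs in equations:
        for pair in pairs:
            support[pair].append(f"{english} = {german}")
    return {pair: words for pair, words in support.items()
            if len(words) >= min_support}


if __name__ == "__main__":
    for (eng, ger), words in correspondence_sets(EQUATIONS).items():
        print(f"English {eng} = German {ger}: {', '.join(words)}")
```

Run as a script, this prints one line per correspondence set, each listing the three equations that support it; note that "deep = tief" supports both sets, exactly as in the text.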
The reconstruction of proto-Germanic, and the relating of Germanic along with Celtic, Italic, Greek, Albanian, Armenian, Slavic, Iranian, and Indic to a superfamily called Indo-European, was one of the great intellectual achievements of the nineteenth century, and one that attracted some of its greatest minds. Work building on this magnificent foundation still goes on, with newly discovered languages being added (for example, Hittite) and new discoveries being made concerning the process of differentiation and diffusion of Indo-European, and the date and location of the ancestral speech community.
Around the same time that they were discovering Indo-European, scholars were becoming aware of the existence of other major families like Semitic (uniting, among others, Akkadian, Aramaic, Hebrew, Ugaritic, Arabic, South Arabian, and Ethiopic). Progress here, however, has been much less dramatic. In part because new languages are continually being discovered and added to the list (e.g., Eblaite), and because fundamental research tools in the individual branches (e.g., the Chicago Assyrian Dictionary) are still being compiled, a real etymological dictionary of Semitic still does not exist. To make matters worse, evidence has been accumulating that Semitic is not an isolated family, but is itself part of a superfamily, probably older than Indo-European, which stretched over large parts of Northern and Eastern Africa and Western Asia. This family, sometimes still called "Hamito-Semitic," but now more often "Afroasiatic" or "Afrasian," includes, besides Semitic, Egyptian, Berber, Cushitic (a heterogeneous group of dozens of languages, including Somali, centered around the Horn of Africa), Omotic (a large group of languages in Southwest Ethiopia), and Chadic (more than a hundred languages, including Hausa, spoken over a large sub-Saharan area centered around Lake Chad). Relationships are still being established within the last four groups, many individual languages are very poorly known, and new information is coming in on an almost daily basis. Clearly we are on the verge (or over the edge) of information overload. There are more pieces of information around, and more heterogeneous and even contradictory hypotheses about their relationships, than anyone can easily keep track of.
Thus it is becoming harder and harder to draw together the material for potential cognate sets and sound correspondences, as well as the relevant textual, historical, and archaeological detail, that will make possible, first, the firm establishment of Afroasiatic as a language family, and then the drawing of some reasonable hypotheses about its nature, its place and time of origin, and its differentiation and diffusion.
To help systematize what is becoming an increasingly amorphous heap of unassimilated information, the Oriental Institute is sponsoring a project that will draw on two closely related and developing, not to say exploding, technologies that are being harnessed in many different contexts to stay on top of a rising flood of information: electronic data processing and, courtesy of the Internet and the World-Wide Web, data communication. We are currently setting up the Afroasiatic Index, intended as a major source of historical linguistic information. It should permit access to the most reliable current information (including alternate and mutually incompatible hypotheses) about family-level and superfamily-level cognate sets, correspondence sets, sound changes, morphological correspondences, and relevant bibliography. Of its major subparts, the Semitic Index, the Egyptian Index, the Cushitic Index, and the Omotic Index can be handled within the Oriental Institute or through contacts with whom we already work. For the Berber Index and the Chadic Index, we are currently contacting extramural collaborators or looking into outsourcing the work.
A precursor of the Cushitic Index, and something of a pilot for the whole project, has been the Cushlex project, initiated in 1987 with the help of a National Science Foundation grant. The object of that project was to explore the possibility of using standard relational database file formats and off-the-shelf database management software to create and maintain an etymological database (cognate sets, correspondence sets, sound changes, bibliography) for Cushitic and Omotic. Inevitably, cognates were noted between these languages and the other major branches of Afroasiatic, so that the project early on acquired a certain Afroasiatic dimension. Indeed, as other investigators have noted, Cushitic, with its major subfamilies of Bedja, Agaw, East Cushitic, and South Cushitic, is such a heterogeneous group that the question seriously arises whether it is really a separate "family" at all, or just a collection of Afroasiatic language families which, through geographic proximity on and around the Horn of Africa, stayed linguistically closer to one another than more widely distributed sister families did (perhaps thereby making this area a good candidate for the "home" of Afroasiatic?). The database, implemented in one of the commercial DBMS (database management system) packages, has been available from the Oriental Institute in a preliminary form since 1994. It is designed to run on a single PC, and data and programs have been distributed to interested users either on diskette (sent by U.S. mail) or electronically by ftp (file transfer protocol) over the Internet.
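To make the idea of an etymological database in a standard relational format more concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite merely stands in for the commercial DBMS the project actually used, and the table and column names (language, cognate_set, reflex) are hypothetical, not Cushlex's real design. To avoid inventing Afroasiatic data, it is seeded with two of the Germanic sets from the "dough" illustration.

```python
import sqlite3

# Hypothetical minimal schema: languages, cognate sets (each with a
# proto-form and gloss), and the attested reflexes belonging to each set.
SCHEMA = """
CREATE TABLE language (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE cognate_set (
    id         INTEGER PRIMARY KEY,
    proto_form TEXT NOT NULL,
    gloss      TEXT NOT NULL
);
CREATE TABLE reflex (
    id          INTEGER PRIMARY KEY,
    set_id      INTEGER NOT NULL REFERENCES cognate_set(id),
    language_id INTEGER NOT NULL REFERENCES language(id),
    form        TEXT NOT NULL
);
"""


def build_db():
    """Create an in-memory database seeded with two Germanic cognate sets."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO language (id, name) VALUES (?, ?)",
                     [(1, "English"), (2, "German")])
    conn.executemany(
        "INSERT INTO cognate_set (id, proto_form, gloss) VALUES (?, ?, ?)",
        [(1, "*daigo-z", "dough"), (2, "*deupo-z", "deep")])
    conn.executemany(
        "INSERT INTO reflex (set_id, language_id, form) VALUES (?, ?, ?)",
        [(1, 1, "dough"), (1, 2, "Teig"), (2, 1, "deep"), (2, 2, "tief")])
    conn.commit()
    return conn


def reflexes_for(conn, gloss):
    """Return (proto_form, language, form) rows for the set with this gloss."""
    query = """
        SELECT c.proto_form, l.name, r.form
        FROM cognate_set AS c
        JOIN reflex AS r ON r.set_id = c.id
        JOIN language AS l ON l.id = r.language_id
        WHERE c.gloss = ?
        ORDER BY l.name
    """
    return conn.execute(query, (gloss,)).fetchall()


if __name__ == "__main__":
    conn = build_db()
    for row in reflexes_for(conn, "deep"):
        print(row)
```

Keeping reflexes in their own table, rather than as one column per language in the cognate-set table, lets a set hold any number of languages, which matters for a branch like Cushitic with dozens of members. Correspondence sets, sound changes, and bibliography could presumably be added as further tables joined in the same way.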
At present the Afroasiatic Index web page is under construction, but open, and accessible from the Oriental Institute Home Page (http://oi.uchicago.edu/OI/default.html). A prototype of the current interface with a complete set of data can be seen in the "Semitic Index" section, which now also integrates a module on morphological information. The Cushlex material will be transferred to Web format in the course of the Spring Quarter, even as work progresses on other fronts.
Pardon our dust, but please do drop in on us and look around; we would appreciate reactions, comments, and suggestions.
In addition to research and teaching in the peripheral languages of the Ancient Near East, Gene Gragg has long been occupied with the Semitic and Cushitic languages of Ethiopia. He did lexical research in Ethiopia and has published a dictionary of the Cushitic language Oromo. Computational (and Northwest Semitic) expertise for the Afroasiatic Index is being provided by Richard Goerwitz, Research Associate and Lecturer in the Department of Near Eastern Languages and Civilizations, and a recent Ph.D. graduate of that department.
Revised: July 30, 2007