
Peter K. Austin
Department of Linguistics, SOAS
7th September 2008

In a recent blog post, Jane Simpson reported on opinions expressed by a group at ANU meeting to discuss grammar writing:

"We all agree it's a good thing to publish glossed texts so that readers can check out the hypotheses proposed in the grammar, and expressed by the glossing."

I'd like to inject a note of caution here. It seems to me that published texts, with or without interlinear glossing, and especially those derived from transcriptions of spoken language, have often been fiddled with (or, to put it more politely, 'edited') on their way from recording to printed page. This is also often true of published texts based on written originals produced by literate native speakers. Rarely do we find the situation Wamut described when commenting on Jeffrey Heath's work on Ngandi at the end of Jane's blog post:

"What is especially great, is that when you go back to Heath's archived field recordings, the spoken texts are there in pristine form, that is, the spoken text and written text correlate perfectly" [emphasis added]

Heath adopted the same principle of "perfect correlation" in his published work on other languages, such as his 1980 Nunggubuyu Myths and Ethnographic Texts, whose introduction clearly states: "in the texts presented here I have not 'weeded out' false starts, intrusive English words, or grammatical errors by the narrators".

In many other cases of text publication, I know editing has taken place: I have done it myself, and some other researchers have admitted to it (though rarely indicating exactly what editorial changes were made; more on this below). The texts in my 1997 book Texts in the Mantharta Languages, Western Australia (Tokyo: ILCAA, Tokyo University of Foreign Studies) were heavily edited, though I didn't mention that in print at the time. It was only when it came to creating a multimedia Jiwarli website, where both published texts and original recordings were presented, that I had to confess: "[y]ou may also notice that the Jiwarli texts are not word for word identical to the sound files, as Jack Butler, after recording the stories, made his own corrections in the texts". There was no attempt to deceive here; rather, it was Jack's explicit wish that the stories be edited for publication.

As an example, consider published Text 50 (which appears on the website here) and the way it corresponds to the original recording (italics indicates material on the tape which was deleted in the editing process, bold indicates text added during editing, and { x == y} indicates substitution during editing):

Nhukuramartuthu ngurrunyjarri julyumartu ngunha nhanyaartu {porcupinemanha == jiriparrinha} puniyanha. {porcupine == Jiriparri} ngunha jakuparlarrirarru. Ngurntirarri jakuparlarru parnajipithu ngunha warrirru nhanyapuka. Ngurrunyjarrilu yarnararnilaartu ngurntapuka ngunhapa jakuparla. Wangkirarringu. Yarnararrima nhurra. Yarnararrima nhurra. Ngatha {nhurranha murrurrpa manara nhurranha}. Yarnararrima. Ngatha {nhurranha murrurrpa manara nhurranha}. Yarnararrima. Ngatha murrurpa manara nhurranha. Kunyarnurru ngunha kumpanhu. {Porcupinemanha == Jiriparri} ngunha kurlkanyunthurru yarnararrira. When he Yarnararrirathu parnarru thangkalpuka wurungku wirntupinyangurru pirrurru yanararri thikaru.

Editorial changes that Jack and I made are the following:

  • replacement of the loan word 'porcupine' with the indigenous word jiriparri, and deletion of the English expression 'when he'
  • omission of the enclitics -thu 'old information' and -pa 'specific referent' in order to decontextualise reference
  • omission of repetition: three repeats of 'Lie on your back. I'll get you cicatrices'
  • reordering of constituents: the possessor 'your' and 'cicatrices' are separated on the tape but were made adjacent in the editing for publication
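
The { x == y } substitution notation used in Text 50 above is regular enough to be processed mechanically. What follows is a minimal sketch in Python, assuming only that convention (recorded form on the left of "==", published form on the right). The function name split_versions is hypothetical, and deletions and additions are not handled, since they are marked by italics and bold rather than by plain-text delimiters.

```python
import re

# Matches "{ recorded == published }", allowing optional spaces inside the braces.
SUBSTITUTION = re.compile(r"\{\s*(.*?)\s*==\s*(.*?)\s*\}")


def split_versions(marked_up):
    """Return the (recorded, published) readings of a marked-up line."""
    # Keep the left-hand side to recover what is on the tape.
    recorded = SUBSTITUTION.sub(lambda m: m.group(1), marked_up)
    # Keep the right-hand side to recover what went to print.
    published = SUBSTITUTION.sub(lambda m: m.group(2), marked_up)
    return recorded, published


line = "{porcupine == Jiriparri} ngunha jakuparlarrirarru."
recorded, published = split_versions(line)
print(recorded)   # reading on the tape
print(published)  # reading in the published text
```

A line marked up once in this way yields both versions, so the published text and the transcript of the recording never have to be maintained separately.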


Wamut also mentions in his comment on Jane's post another possible way in which published texts can differ from recordings:

"I've heard other spoken texts vary from the published text because the field worker has interrupted the speaker for clarification etc."

There are also cases I know of where speakers "interrupt" themselves. My colleague David Nathan tells me that when he was working with Luise Hercus to produce a multimedia CD-ROM of Baagandji materials, he found Luise's audio recordings of stories also contained interpolations and explanations in English by the speaker which do not appear in the published texts.

I think descriptive linguists and language documenters could well take some guidance in this area from the work of epigraphers who have been developing a TEI/XML markup for epigraphy called EpiDoc. Some of the EpiDoc proposals are concerned with adaptation of the TEI guidelines to deal with a range of issues such as legibility of characters on stone, missing elements or partially represented signs, but in addition there are several issues that I think should equally be of concern to language documentation:

  • additions and deletions to the text
  • editorial supplements, observations, and hypotheses, including:
    • identification and expansion of abbreviations understood by the editor
    • identification of abbreviations not understood by the editor
    • editorial supplement in which the editor makes a "subaudible" word manifest
    • editorial supplement in which the editor explains a "breviatio" or note
    • editorial supplement for characters wholly lost
    • letters omitted because the stonecutter did not carry out the text to the end
  • editorial corrections
    • letters erroneously included in the text, which the editor suppresses
    • letters erroneously omitted from the text, which the editor adds
    • letters erroneously substituted in the text, which the editor corrects
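
To give a rough idea of what such markup could look like for a spoken text, here is a sketch in Python that builds TEI-style elements for a substitution, a suppressed stretch of speech, and an editorial addition. It is an illustration under stated assumptions, not the EpiDoc recommendation itself: the mapping of these categories onto the TEI elements choice, sic, corr, surplus and supplied is a judgment call (a puristic replacement of a loan word might be better served by the orig and reg pair), and the "#ed" responsibility pointer and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def encode_choice(recorded, published, resp, pair=("sic", "corr")):
    """Pair a recorded form with its published replacement inside <choice>."""
    choice = ET.Element("choice")
    original = ET.SubElement(choice, pair[0])
    original.text = recorded
    # resp points at whoever is responsible for the change.
    replacement = ET.SubElement(choice, pair[1], {"resp": resp})
    replacement.text = published
    return choice


def encode_removed(recorded, resp):
    """Mark recorded material that was suppressed in the published text."""
    surplus = ET.Element("surplus", {"resp": resp})
    surplus.text = recorded
    return surplus


def encode_added(published, resp, reason="omitted"):
    """Mark material added in editing that is absent from the recording."""
    supplied = ET.Element("supplied", {"reason": reason, "resp": resp})
    supplied.text = published
    return supplied


def to_xml(element):
    """Serialise an element to a plain string."""
    return ET.tostring(element, encoding="unicode")


print(to_xml(encode_choice("porcupine", "jiriparri", "#ed")))
print(to_xml(encode_removed("when he", "#ed")))
```

Passing pair=("orig", "reg") to encode_choice switches from an "error and correction" reading to an "original and regularised" one, which makes the editor's interpretation of the change explicit rather than implicit.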


The EpiDoc guidelines contain explicit recommendations on how to encode these as markup annotations to the text. For work on endangered languages I think there are some additional aspects that should be encoded, especially because we typically need to distinguish at least three participants in the process of published text creation, namely the original speaker, the transcriber, and the linguist-editor. We should pay attention to:

  • encoding code-switching, code-mixing and borrowing, ideally by coding for the language (or variety) of the items transcribed
  • puristic editorial amendments on the part of the transcriber
  • puristic editorial amendments on the part of the linguist
  • deletions by the transcriber
  • additions by the transcriber
  • reorderings by the transcriber
  • additions and clarifications (editorial comments) by the linguist-editor
  • when the transcriber is not the originally recorded speaker, we need to deal with (1) inter-speaker variation at the dialect or idiolect level and (2) inter-speaker variation arising from language loss, e.g. phonemic or grammatical reduction among semi-speakers of a later generation transcribing earlier recorded texts
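
One way to keep track of who changed what is to record each editorial change as a small structured entry. The following Python sketch assumes the three participants and the change types listed above; the class and field names are hypothetical, and the attribution of the two sample changes to the transcriber is purely illustrative.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Agent(Enum):
    SPEAKER = "speaker"
    TRANSCRIBER = "transcriber"
    LINGUIST_EDITOR = "linguist-editor"


class Change(Enum):
    PURISTIC_AMENDMENT = "puristic amendment"
    DELETION = "deletion"
    ADDITION = "addition"
    REORDERING = "reordering"
    COMMENT = "editorial comment"


@dataclass(frozen=True)
class EditRecord:
    agent: Agent                    # who made the change
    change: Change                  # what kind of change it was
    recorded: str                   # form on the recording ("" if none)
    published: str                  # form in print ("" if none)
    recorded_language: str = "und"  # language of the recorded form; "und" = undetermined


def tally(records):
    """Count changes per (agent, change type) pair."""
    return Counter((r.agent, r.change) for r in records)


# Illustrative entries only: agent attribution here is an assumption.
records = [
    EditRecord(Agent.TRANSCRIBER, Change.PURISTIC_AMENDMENT, "porcupine", "jiriparri", "en"),
    EditRecord(Agent.TRANSCRIBER, Change.DELETION, "when he", "", "en"),
]

for (agent, change), count in tally(records).items():
    print(agent.value, change.value, count)
```

Coding the language of the recorded form on every entry also covers the first point in the list, since code-switching and borrowing then show up as entries whose language differs from that of the surrounding text.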

To my mind, it will only be when linguists make available marked-up documents encoding these aspects, along with the published texts and the original media recordings (ideally publicly available through an archive or distributed on CD or DVD along with the published texts), that we can truly start talking about "falsifiability" of grammars and other analytical claims about languages. The "published texts" alone are often simply not enough.


Notes:
1. The ideas presented here have been fermenting since they were first publicly presented at an ELAP Workshop at SOAS in February 2005. At the Simposio Internacional: Contacto de Lenguas y Documentación (International Symposium on Language Contact and Documentation) held in Buenos Aires last month, Ulrike Mosel presented a paper entitled "Putting oral narratives into writing – experiences from a language documentation project in Papua New Guinea" in which she explored the issue of editing recorded Teop texts for publication. She independently identified many of the same issues I outline here.

2. I have been unable to find any discussion of the importance of explicitly encoding transcriptional and analytical editing decisions among the "best practices" promoted by, e.g., the E-MELD School of Best Practice, despite the fact that, to me at least, such encoding plays an important role in "practices which are intended to make digital language documentation optimally longlasting, accessible, and re-usable by other linguists and speakers".

Comments

There is some nice discussion about the material presented here over at Claire Bowern's blog, especially in the comments section.

Peter Austin makes a very important point. It was for this reason that I made a CD version of my PhD thesis and subsequent book on the Tai Languages of Assam. In the version of the grammar on the CD (which has exactly the same text as the printed book), linguistic examples are linked to sound files so the reader can hear the recording and judge for themselves. So far nobody has written back to me and suggested an alternative reading or analysis, but I look forward to that day.

Stephen - I haven't seen your CD version of the grammar, but it sounds similar to Nick Thieberger's Grammar of South Efate (and his previous PhD on the language), which was published with a CD of the sound files in the back.

I was also trying to make the point that there may be good reasons to "clean up" transcriptions or written texts for publication (community preferences for editing, removal of loan words, corrections of 'slips of the tongue', etc.), but anyone who does that should also make available annotations (e.g. in an XML version, as the epigraphers do) which explain why the published version differs from the original. That makes for good science and helps people in the future understand how published texts can differ from other potential sources of data.

Interesting post. Nick Enfield's recent Lao grammar (Mouton) has meticulously transcribed conversations (including of course all the errors, false starts, repairs, etc.) instead of the traditional texts appended. Many of the examples given in his grammar are drawn from these (and other) conversations.

His motivation is worth quoting: 'The texts supplied in this chapter illustrate the kind of discourse in which Lao grammar emerges. (...) The choice to concentrate exclusively on conversation here is a form of affirmative action. Conversation as a structured domain is under-studied in linguistics compared to research on structure in semantics and sentence-level syntax. Yet conversation is by far the dominant, unmarked genre in language usage, and in language acquisition. This chapter reverses the usual balance in the 'texts' section of grammars: elicited monologues, with a very occasional fragment of conversation. (...) [W]ith a large enough sample, conversation yields the full complement of a language's structural resources, including embedded narratives, procedural descriptions, and similar genres more familiar to descriptive linguistics.' (Enfield 2007:487)
