Peter K. Austin
Department of Linguistics, SOAS
8th August 2008

I took a couple of weeks off recently for my summer holidays during which I started reading an "airport book" (picked up at W.H. Smith's in the new Heathrow Terminal 5 under one of those ubiquitous "buy one get one half price" deals also offered by Waterstones, Blackwells and Borders throughout the UK -- even my local Tesco supermarket offers 50% discount on trade paperbacks). It is called The Black Swan by Nassim Nicholas Taleb (Penguin Books, 2007), and what attracted me to shell out my 6 pounds (sorry, readers in Australia) was the subtitle The Impact of the Highly Improbable and the blurb:

"This book is all about Black Swans: the random events that underlie our lives from bestsellers to world disasters. Their impact is huge: they're nearly impossible to predict; yet after they happen we always try to rationalise them."

Taleb is currently Dean's Professor in the Sciences of Uncertainty at the University of Massachusetts, Amherst, and has a background in probability theory, the study of empiricism and randomness, and Wall Street trading. A nice break from linguistics and endangered languages, I thought.

Taleb's main thesis is that there are certain discoveries ("Black Swans") which are entirely unexpected ("outliers") but which have a major impact on beliefs and theories of the world. They require post-hoc revisions to accumulated wisdom that attempt to make the discovery explainable and predictable. His writing style is rather egotistical, repetitive and dressed up in pop jargon for my taste (and, as I found out when I had finished the book, for that of other reviewers); however, he does make a number of interesting points. One of these is a contrast between two contexts, what he calls "Mediocristan" and "Extremistan", as set out in his Table 1 (page 36), which I partially reproduce here:

Mediocristan | Extremistan
Mild randomness | Wild randomness
Most typical member is mediocre | Most "typical" is either giant or dwarf, ie. there is no typical member
Corresponds (generally) to physical quantities | Corresponds to abstract elements, eg. numbers
Total is not determined by a single instance or observation | Total can be determined by a small number of extreme events
Short-term observation identifies trends | Long-term observation required
Routine, obvious, predictable | Accidental, unseen, unpredictable
History crawls | History makes jumps
Events are distributed according to the "bell curve" or its variants | Events are either Mandelbrotian (tractable scientifically) or totally intractable

The context of what Taleb calls "Mediocristan" includes such things as height, weight, calorie consumption, income for a dentist, or mortality rates. The context of "Extremistan" includes (ultimately socially constructed) values like wealth, number of book sales, number of references on Google, commodity prices, and populations of cities (page 35). To help understand the contrast, Taleb gives an example: if we take 1000 random people and calculate their weights, we can identify a range of values and a total -- addition of one further individual (even the heaviest person on the planet) will make little difference to the total or range. If we look at their wealth, on the other hand, addition of a single individual, eg. Bill Gates, can result in an unpredictable jump (as Taleb surmises, Bill Gates' wealth will represent 99.9% of the new total, with all the others representing "no more than a rounding error for his net worth, the variation of his personal portfolio over the past second" (page 33)). The same would be true for book sales, and the subsequent addition of J.K. Rowling to the group.
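For readers who like to see the arithmetic, here is a rough sketch of the weight versus wealth comparison. Every figure in it is an assumption chosen purely for illustration (1000 people at 70 kg each, a 600 kg outlier, 50,000 in savings each, a newcomer holding 50 billion); none of these numbers comes from Taleb's book or from real data, and the function name share_of_total is my own.

```python
# Illustrative figures only: none of these numbers come from Taleb or from real data.

def share_of_total(values, newcomer):
    """Return the newcomer's fraction of the group total once they are added."""
    return newcomer / (sum(values) + newcomer)

# "Mediocristan": 1000 people assumed to weigh 70 kg each,
# joined by one assumed 600 kg outlier.
weights = [70.0] * 1000
weight_share = share_of_total(weights, 600.0)

# "Extremistan": 1000 people assumed to hold 50,000 each (any currency),
# joined by one person assumed to hold 50 billion.
wealth = [50_000.0] * 1000
wealth_share = share_of_total(wealth, 50_000_000_000.0)

print(f"outlier's share of total weight: {weight_share:.2%}")
print(f"outlier's share of total wealth: {wealth_share:.2%}")
```

Under these assumed figures the heaviest newcomer accounts for less than 1% of the group's total weight, while the wealthiest newcomer accounts for about 99.9% of its total wealth, which is the kind of jump Taleb describes.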

What about language? Taleb mentions number of speakers and token frequencies of vocabulary items as being in the domain of "Extremistan" -- speaker numbers vary wildly with extreme outliers (Chinese with 1,200 million, other languages with 1 or 2), and, as is well known from corpus linguistics, a small number of word forms in any language are highly frequent while others can be vanishingly rare.

It seems to me that if Taleb's thesis is correct (and I have not been able to do justice to all the complexities of his arguments here), it has a further application in the realm of endangered and under-documented languages. It could form the basis for an (attractive) epistemological argument to respond to the question (which I have been frequently asked by members of the general public, at least) "Why study under-documented and endangered languages?" This argument can stand beside, or instead of, the intangible cultural heritage arguments promoted by UNESCO, among others (arguments that have been criticised for their fundamental neo-Whorfianism). Under-documented languages are potentially the domain of Black Swans, discoveries that are outliers in terms of currently constructed typologies (formalised or not) of human language (one thinks of the extremes of phoneme inventories seen in small languages like !Xóõ or Rotokas, for example). This argument would provide a potential philosophical underpinning for the famous quotation by Martin Joos that "[L]anguages can differ from each other without limit and in unpredictable ways." (Martin Joos (ed.), 1957, Readings in linguistics: the development of descriptive linguistics in America since 1925. Washington: American Council of Learned Societies)
