

Jane's last post and a post on the ever-excellent Language Log have got me thinking about permanence and accountability in the internet age. It's a theme that I encounter again and again, working for a digital archive.

First, Mark Liberman's post on Language Log was a fairly scathing breakdown, reference by reference, article by article, that showed that a point supposedly backed up by hard evidence, well, wasn't. A great effort really. And thanks to his extensive linking, and by simply placing the relevant articles online, we too can come to much the same conclusion. Well, admittedly, I just took his word for it... but that's all I've got time for over my morning coffee.

Exciting things are happening in academia with individuals and organisations starting to fully utilise the internet. As Jane mentions, Sydney University is having great success with several new digital initiatives. I think DSpace is the bee's knees! It's going to be wonderful. It's one step closer to directly linking to the actual section of the actual article which you're interested in. That's the kind of functionality that I'm after.

Great, you say, that'll save me 10 minutes of searching in the library. Well, yes, says I, but that's not what's interesting. Imagine linking directly and explicitly to the paragraphs or sentences that you are interested in. That's heaps better than a simple article reference. Suddenly the reader can discover exactly what you're talking about, quickly, by jumping directly from one article to the next, in a way that you're already accustomed to on the web.

But really that's just the beginning. Think of the AltaVista to Google leap. (This oversimplifies it a bit, but) AltaVista was a simple but vast index of content words and meta tags. Google came along with, amongst other things, the idea that links actually expressed something meaningful, and suddenly the internet became a whole lot more useful.

Well, imagine reading an interesting article and being able to see who quoted it. Imagine a density plot of the most popular quotes overlaid on the text of the article. What parts of a paper are people talking about the most? You could establish quotable articles and the articles that quoted the quotables and became quotable themselves. Imagine looking at quotability on a timeline. You would be able to see the "hot spots" in the development of ideas over time. These would be simple additions to a modern search engine, and in fact have already been added in a rough sense. All that needs to happen (maybe that should be scare quoted) is for researchers to adopt a new referencing scheme and to archive their articles in digital repositories.
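To make the density idea a bit more concrete, here is a minimal Python sketch. It assumes we already have the article text and a list of passages that other papers quoted from it verbatim; the function name `quote_density` and the sample data are made up for illustration, and a real system would need fuzzy matching for quotes that were trimmed or reworded.

```python
def quote_density(article, quotes):
    """Count, for every character of the article, how many quotes cover it.

    Assumption: each quote is an exact substring of the article. Quotes that
    cannot be found verbatim are simply skipped.
    """
    density = [0] * len(article)
    for quote in quotes:
        if not quote:
            continue  # ignore empty strings
        start = article.find(quote)  # first verbatim occurrence only
        if start == -1:
            continue  # not found; fuzzy matching is out of scope here
        for i in range(start, start + len(quote)):
            density[i] += 1
    return density


# Hypothetical sample data, purely for illustration.
article = "Links express meaning. Search got better."
quotes = [
    "Links express meaning",
    "express meaning. Search",
    "not in the article",
]
density = quote_density(article, quotes)

# The most-quoted stretch is wherever the count peaks.
peak = max(density)
hot_spot = "".join(ch for ch, n in zip(article, density) if n == peak)
print(hot_spot)
```

The resulting list of counts is exactly what you would feed to a plotting library to shade the text, and adding a publication date to each quote would give the timeline view.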

Leap sideways: imagine that instead of articles, we put up our raw research and/or field work data. Not only can you link directly to the source(s) for your argument, so can others in their critique. Let them crunch the numbers if they don't believe you. Or say you're talking about discourse pragmatics; then people may like to hear the utterance's intonation for themselves. Even better, if you were unable to explore an interesting avenue, this leaves it open for someone else to come along and explore. This is a critical ability to facilitate when you're talking about endangered languages.

To me this extensive referencing is a straightforward way of increasing the empirical weight that a piece of research holds. Sure, where people have already referenced articles, it's already technically there, but I'm talking about granularity of referencing here. And in terms of source data, I'm talking about qualifying your statements as explicitly as possible.

But, to come crashing down to reality again, this technology is not quite there yet. Well, actually it kinda is there; the main problem, as I see it, is adoption. So first of all, get uploading! A good solid base of data is where this revolution will begin!

These are hot topics in the Documentary Linguistics and Digital Repositories fields. There'll be a fair bit of discussion of this at our upcoming conference, which looks like it's going to have a great line-up of interesting papers. If this kind of thing interests you, then we hope to see you there!

Comments

On uploading to repositories, Sten Christensen just passed on this nice site, the Sherpa Romeo project, which lists many publishers' and journals' conditions on uploading pre-prints and post-prints to archives. (Yes, there's a Juliet project too.)

Tom,

You say

Sydney University is having great success with several new digital initiatives. I think DSpace is the bee's knees! It's going to be wonderful. It's one step closer to directly linking to the actual section of the actual article which you're interested in. That's the kind of functionality that I'm after.

Unfortunately DSpace, particularly as implemented by the University of Sydney library (though none do much better), is limited while papers are published in PDF form. PDF does not allow the passing on of bookmark information from the URL, so any hope of linking into the documents from a web page is immediately quashed.

Of course PDF is the best way to archive papers at the moment as it allows for easy translation from whatever word processing system you use while keeping such important things as footnotes and endnotes intact, not to mention formatting.

We really need an archiving system with built-in computer translation of PDF files to well-formed HTML. This is a partially solved problem. PDF to HTML translation can be seen, for example, by GMail users who are sent PDF enclosures, though the result is badly formed HTML: good for display but for no other purpose.

Once we can translate PDF into well formed HTML then achieving your goal of adding the ability to link to individual paragraphs is not difficult.
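As a rough illustration of why this step is not difficult, here is a minimal Python sketch that gives every paragraph in a well-formed document its own `id`, so that a URL ending in a fragment such as `#p3` jumps straight to it. The function name `add_paragraph_anchors`, the `p` prefix, and the sample markup are assumptions made up for this example; it also assumes the HTML parses as XML and uses no XML namespace.

```python
import xml.etree.ElementTree as ET


def add_paragraph_anchors(xhtml, prefix="p"):
    """Give each <p> element an id (p1, p2, ...) unless it already has one.

    Assumption: `xhtml` is well formed enough to parse as XML and its
    elements are not namespaced.
    """
    root = ET.fromstring(xhtml)
    for number, paragraph in enumerate(root.iter("p"), start=1):
        # Keep ids that the author has already assigned.
        paragraph.attrib.setdefault("id", f"{prefix}{number}")
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample document, purely for illustration.
sample = '<div><p>First.</p><p id="intro">Second.</p><p>Third.</p></div>'
anchored = add_paragraph_anchors(sample)
print(anchored)
```

Numbering by position is the simplest choice, but the ids shift if a paragraph is later inserted, so an archive that wants permanent links would probably want ids that never change once assigned.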

The largest problem with digital archives is the tension between getting information that is as useful as possible, both immediately and in the future, and making it as easy as possible for people to add documents, so that you get a reasonable body of work.

As Jane pointed out in her post, the flow of documents into even the University of Sydney archive, which has a reasonably large base of working academics to draw from, is slow. The most successful online information gathering places, such as Flickr, YouTube and del.icio.us, ask of people nothing that does not immediately benefit them. Perhaps archivists need to contemplate ways they can "add value" either before or after publication so that we get a benefit from using digital archive spaces. We could then imagine a good flow of documents into the archive.

Tony Williams

As Jane pointed out in her post, the flow of documents into even the University of Sydney archive, which has a reasonably large base of working academics to draw from, is slow. The most successful online information gathering places, such as Flickr, YouTube and del.icio.us, ask of people nothing that does not immediately benefit them. Perhaps archivists need to contemplate ways they can "add value" either before or after publication so that we get a benefit from using digital archive spaces. We could then imagine a good flow of documents into the archive.

Yes! Great point. I think the benefit is obvious for archival of field work data when we hear stories like this and this (so sorry to hear that!). In the case of raw data, researchers these days increasingly need somewhere to put it. It's an extra sweetener if they get the ability to link to it extensively and directly in their work.

Theses are a bit harder, and quality is an issue. I think the current mechanisms that determine whether a web page is of sufficiently good quality to appear first in your web search are good enough already... we just need the linking for that to work with theses. So I say take whatever you can get for now. As for actually getting people to add material, publishing conference proceedings automatically is one way, encouraging honours and PhD students to publish is another. Many students feel their hard work goes nowhere once they've finished... this gives it a bit more prominence. On the other hand some students finish their thesis and then get embarrassed about showing it to other people, let alone the world!

Maybe if the library offered to print up and bind their theses at some discount rate, that would encourage people to send them off... hmmm. I wonder if that could work? It'd be kinda neat.

Anyway, you're right. We definitely need to figure out a way of making it more attractive for people to upload.

Once we can translate PDF into well formed HTML then achieving your goal of adding the ability to link to individual paragraphs is not difficult.

Yes, I agree, PDF isn't really the bee's knees when it comes to linking... future HTML standards were in fact what I was thinking about when talking about linking to paragraphs and sentences. Good quality PDF -> HTML conversion is really only a matter of time, so I see it as a problem only in the short term. No (or not much) information is being lost by storing papers this way. And in much the same way that we have to convert from video format to video format to keep our data accessible, I imagine we'll have to do the same with rich text documents too.

About the Blog

The Transient Building, symbolising the impermanence of language, houses both the Linguistics Department at Sydney University and PARADISEC, a digital archive for endangered Pacific languages and music.