
This semester, I have been helping out Jane with her wonderful Field Methods class in technical matters such as recording, uploading files onto the server and allowing students to securely and quickly download both .wav and .mp3 files. I took this course myself some years ago, and it was a great experience for me and the whole class. Many members of that class have gone on to do field research of their own, and I'm sure the Field Methods class helped their research as much as it helped mine.

But this post is not about when I took the class. Instead, it's about how I almost buggered up this semester's class in what can best be described as a lesson in keeping backups of your recordings.

(Warning: Some computer nerd stuff follows after the fold.)

The course is being run in conjunction with Paradisec, which is where my helping hand comes in. We provided the equipment for the class to record their informants (two Karo Batak speakers), and provided space on our server for the recordings to sit. Eventually they will be archived in the larger Paradisec collection.

I always like to find ways of doing things quickly and simply using fairly simple programming. I'm not much of a programmer as such, but I know my way around bash, and have been using it to do most things automatically, such as moving things around and producing mp3 files of each recording.

This week, the field methods students began their individual sessions with their informant, meaning that there are suddenly many more than just one recording per week; in fact there are closer to five or six per day, for two days per week. With this in mind, Jane suggested we organise the recordings into directories based on what day they were recorded. A very sound suggestion which I was happy to implement.

Of course, it would have been too easy to do it manually, so I tried to do it in a couple of lines of code. The first step was to take the names of the recordings (which are named in line with our specifications at Paradisec), and create directories based on those filenames such that each directory will contain all recordings made on a given day. To take an example, we might have a list of recordings such as the following.

  • FM2-20100310-01.wav
  • FM2-20100317-01.wav
  • FM2-20100317-02.wav

The command I wrote would create two directories, called FM2-20100310 and FM2-20100317 (the command would also try to produce a directory for the last file, but it fails, since it already exists after being created for the second). Here was the code:

$ for x in *; do mkdir ${x%-*} ; done
Translation: for all files, make a directory of the same name, but strip off everything from the last dash (-).
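A slightly more defensive version of this step might look like the sketch below. It is not the command I ran: it assumes the recordings all end in .wav, it works in a throwaway directory created with mktemp, and the dummy file names simply mirror the examples above. Quoting the variables and using mkdir -p avoids both word-splitting surprises and the "already exists" error on the second file of a day.

```shell
# Scratch directory with dummy files named like the examples above
# (assumption: real recordings live elsewhere and are not touched here).
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch FM2-20100310-01.wav FM2-20100317-01.wav FM2-20100317-02.wav

for x in *.wav; do
    [ -e "$x" ] || continue    # skip the literal pattern if nothing matched
    mkdir -p "${x%-*}"         # -p: no error if the directory already exists
done

ls -d */                       # list the directories that were created
```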

This worked fine, and despite some redundancy I had a bunch of directories, one for each day. The next step then was to move each file, like those above, into the directories that correspond to that day (which is always predictable from the filename). The code for this should have been:

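$ for x in *.wav; do mv "$x" "${x%-*}/" ; done
Translation: For all .wav files, move them to the directory which has the same name, but with everything from the last dash (-) stripped off. (Restricting the loop to *.wav keeps it from also picking up the directories created in the previous step.)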
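For the record, here is a more cautious sketch of the move. The scratch directory, the dummy files and the variable name dest are assumptions for illustration only. It restricts the loop to .wav files (a bare * would also match the directories created in the previous step), and it refuses to move anything unless the destination really is a directory and no file of the same name is already there.

```shell
# Scratch setup (assumption: dummy files standing in for real recordings).
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch FM2-20100310-01.wav FM2-20100317-01.wav FM2-20100317-02.wav
mkdir FM2-20100310 FM2-20100317

for x in *.wav; do
    [ -e "$x" ] || continue          # nothing matched the pattern
    dest=${x%-*}                     # e.g. FM2-20100317
    if [ ! -d "$dest" ]; then
        echo "skipping $x: $dest is not a directory" >&2
    elif [ -e "$dest/$x" ]; then
        echo "skipping $x: $dest/$x already exists" >&2
    else
        mv -- "$x" "$dest/"          # trailing slash: destination must be a directory
    fi
done
```

With these checks, a mistake in the destination produces a "skipping" message instead of a silent rename.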

However, I missed the crucial forward-slash in the section ${x%-*}/, meaning I had sent the files not to the directories of the same name, but to files of that name.

Now, when you have many files of the same day, the output filename for this command is the same. So the way the command is run, it takes the first file, say FM2-20100420-01.wav, and moves it (which is synonymous with renaming it) to the file FM2-20100420. If there is a second file, let's say FM2-20100420-02.wav, then it similarly moves it to FM2-20100420, thus overwriting what was there before.
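The overwrite is easy to reproduce safely. The sketch below uses a scratch directory and two tiny dummy files (their contents, "first" and "second", are made up for the demonstration) and assumes there is no directory called FM2-20100420 for mv to move into, so each mv is a plain rename.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
printf 'first\n'  > FM2-20100420-01.wav
printf 'second\n' > FM2-20100420-02.wav

# No directory named FM2-20100420 exists here, so each mv is a rename
# to the same target name, and the second rename replaces the first.
for x in *.wav; do
    mv "$x" "${x%-*}"
done

ls                  # a single entry is left: FM2-20100420
cat FM2-20100420    # prints "second"; the contents of the first file are gone
```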

As I pointed out earlier, only this week did the class begin their individual sessions, so only this week was there more than a single recording in a given day. And therefore only recordings from this week were adversely affected (by which I mean deleted). The others were merely renamed.
Luckily, I realised what was going on because a mere move was taking far too long, and I managed to stop it after only a couple of files had been deleted. Even more luckily, especially as this saved my own skin, we have kept backups and the data is safe.

The problem can be boiled down - computationally speaking - to a mere missing slash. But the real culprit here was my trying to be too clever by half.

So let this be a lesson: Always, always keep backups. Especially if you are going to do any work on your recordings, even if you think it's as mundane as simply moving them from one location to another.
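As a minimal sketch of what "keep a backup first" can mean in practice, the following copies a directory and checks the copy byte for byte before anything else is done to the originals. The two mktemp directories are stand-ins (assumptions) for a real recordings directory and a real backup location, which should ideally be on a different disk or machine; the variable names are hypothetical.

```shell
src=$(mktemp -d)       # stand-in for the recordings directory
backup=$(mktemp -d)    # stand-in for the backup location
printf 'audio\n' > "$src/FM2-20100420-01.wav"

cp -Rp "$src/." "$backup/"    # copy the contents, preserving timestamps and modes

# Compare every recording with its copy before touching the originals.
ok=yes
for f in "$src"/*.wav; do
    [ -e "$f" ] || continue
    cmp -s "$f" "$backup/$(basename "$f")" || { echo "backup mismatch: $f" >&2; ok=no; }
done
echo "backup verified: $ok"
```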

Comments

I always tell my students there are three basic principles of documentary linguistics: backup, backup, backup. And not just any kind of backup - one that's not useless (for which have a look at this advice).

Interesting to see the word "informant" used a couple of times in this post.

And Backup Offsite

- a student has just had her laptop AND backups on memory sticks stolen. All that labour coding data.... Terrible.

Jane - that's one of the ways to create useless backups, as explained in the web page I linked to.

I recommend using Dropbox or similar facilities (like Files Anywhere or Jungle Disk) which provide a couple of gigabytes of storage for free, or more storage at fairly low cost. Alternatively, open a Gmail account and email files to yourself, or get on Google Docs, which provides 1 gigabyte of free storage (you can purchase more for US 25 cents per gigabyte, and you can set different access privileges for files stored there). Reportedly it will soon be possible to store any type of file, including audio and video, on Google Docs in its original format (though there is an upload limit of 250 Mbytes per file).

Aidan,

You might like to add "Always carefully test your regular expressions" to your lesson.

As an old Unix hand I usually test things like this by replacing the "mv" or "mkdir" with an "echo" and have a close look at the output before committing to something irrevocable.
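A sketch of such a dry run, using dummy files in a scratch directory (both are assumptions for illustration): the loop prints each mv command instead of running it, so nothing on disk changes.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch FM2-20100317-01.wav FM2-20100317-02.wav

# Dry run: echo the command that would be executed for each file.
for x in *.wav; do
    echo mv "$x" "${x%-*}/"
done
# Prints:
#   mv FM2-20100317-01.wav FM2-20100317/
#   mv FM2-20100317-02.wav FM2-20100317/
```

Once the printed commands look right, removing the echo runs them for real.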

Oh, and Peter, I recommend Dropbox to everyone. I swear by it for both backup and convenience - no more leaving files at home or work.

Tony

Yes, I was introduced to Dropbox by Claire Bowern and have used it extensively, both to back up files and to share files with colleagues on the other side of the world when doing a joint publication project. I have found it simple and easy to use, and, as you say, it means files are accessible from anywhere. We have also used Googledocs to write documents together (so much easier than emailing "track changes" documents back and forth), though there are limitations on what you can do with Googledocs, especially in terms of formatting.

This site has some great information about backup -- different types and methods are discussed in detail. It is intended for digital photographers and was an initiative funded by Library of Congress, but has advice with wide applicability. See also here.

Hi Aidan, I enjoyed reading this little anecdote, having tried myself to be too clever with little shell scripts. Apart from the obvious importance of backups which you thankfully had, I always like to try it out on a test file/directory first.

Nice to hear you are helping out with the Field Methods class - I have such fond memories of our semester with Muna. That course is gold.

Shelley.


About the Blog

The Transient Building, symbolising the impermanence of language, houses both the Linguistics Department at Sydney University and PARADISEC, a digital archive for endangered Pacific languages and music.