Quick CLAN morcode lookup

author: Sasha Wilmoth date: 2017-10-12

tags: - Tutorial - CLAN - CHAT - Gurindji Kriol - Python

categories: - Scripts

Introduction

Each utterance in the Gurindji Kriol corpus has a tier with morphological information for each token in the transcription tier. It looks like this (with the \P marking a pronominal subject):

*FSO:	kayirra _k tubala karrap . 
%mor:	adv:loc|kayirra&g=north case:all|_k&g=ALL
	pro|dubala&3DU&k=those_two\P v:tran|karrap&g=look_at .
%eng:	Those two are looking to the north.

Whereas the lexicon that all these codes are taken from looks like this:

...
_k {[scat case:all]} "_k&g" =ALL=
_k {[scat der:fact]} "_k&g" =FACT=
...
karrap {[scat v:tran]} "karrap&g" =look_at=
...
kayirra {[scat adv:loc]} "kayirra&g" =north=
...
tubala {[scat pro]} "dubala&3DU&k" =those_two=
...

As you can imagine, if you have to make any small corrections to the mor tier, it’s incredibly fiddly and time-consuming to look up each morph in the lexicon and type out the code. The only other option is to run the MOR command again, which is even more undesirable.

So, I wrote a little interactive script (printMorCodes.py) that looks them up for you.

Instructions

Requirements

This script requires Python 2.x. It works on Mac and has not been tested on Windows

The script can be found here.

Running the script

The command is:

morCodeLookup.py -l lexicon(s)

Gurindji Kriol uses two lexicon files, so the command I use is:

morCodeLookup.py -l /Users/swilmoth/Dropbox/appencoedlinternship/kri/lex/lex_gurindji.cut /Users/swilmoth/Dropbox/appencoedlinternship/kri/lex/lex_kriolgen.cut

The script has a little welcome message, and then you just type a sentence into the terminal and it looks up the codes for you.

If you type kayirra _k tubala karrap, it gives you:

adv:loc|kayirra&g=north case:all|_k&g=ALL^der:fact|_k&g=FACT pro|dubala&3DU&k=those_two v:tran|karrap&g=look_at.

If you type jei _m gon Lajamanu _ngkirri! ‘They went to Lajamanu!’, you get:

pro|dei&3PL/S&k=they suf|_m&TAM&k=PRS v:intran|gu&k=go^v:minor|gon&k=go n:prop|Lajamanu case:all|_jirri&g=ALL !

I’ve tried to replicate CLAN’s MOR command, so punctuation is handled in a similar way, homographs have all the options listed with a ^ and anything starting with a capital letter has n:prop. And if you type something that’s not in the lexicon, you get something like:

Not-in-lexicon:supercalifragilisticexpialidocious

When you’re done, simply type exit.

Copying to clipboard

To save myself the step of highlighting the mor-codes and pressing command+C, I added an option so that when you type in kayirra, it not only prints adv:loc|kayirra&g=north to your terminal, it also copies the code to your clipboard. So you can quickly jump back to your transcript and paste it in. When you’re entering the command for the script, just add -c.

Setting up an alias

If this is something you want to use all the time, you can add an alias to your bash profile so you don’t have to type the whole command and find the lexicon files each time. I open up my ~/.bashrc file, and add this line:

alias lookup = 'morCodeLookup.py -l /Users/swilmoth/Dropbox/appencoedlinternship/kri/lex/lex_gurindji.cut /Users/swilmoth/Dropbox/appencoedlinternship/kri/lex/lex_kriolgen.cut'

Next time, the only command I need is lookup, or lookup -c.