MTAAC and Teaching Computers to Read Sumerian
Post by us4-he2-gal2 on Nov 5, 2018 4:36:19 GMT -5
MTAAC
Hey everyone - I thought I would mention a few things about a fairly new project, MTAAC, which I have been working on for the last 5-6 months. It is the brainchild of a co-student of mine, Emilie Page-Perron. I first heard of Emilie when she posted on our 'So you want to be an Assyriologist' thread ten years ago as Sohnyrin. Subsequently, we both ended up in Toronto. She has an interest in both the Sumerian language and computer programming, and has worked for the CDLI while completing her Ph.D. studies. In fact, she has become one of the principal investigators at CDLI, coordinating its development with that team. Since MTAAC is her side project, she is now my boss, I suppose.
The goal of MTAAC (Machine Translation and Automated Analysis of Cuneiform Languages) is to develop computer software which is able to 'read' cuneiform languages. At the moment, the focus is on Ur III administrative texts in Sumerian, and on transliterations (transcriptions of the cuneiform into our Roman alphabet), so the computer is not expected to recognize the cuneiform signs themselves. Initially, my reaction to this project goal was 'so if the computers are going to be reading and translating Sumerian, what am I going to be doing in my prospective career?' While I am not unconcerned about this, Emilie points out that automated reading of texts would solve a major problem in the field - that is, even after one hundred years of scholarship, there are still hundreds of thousands of cuneiform texts which are not available in printed translation. And these texts are unlikely to be published in translation in our lifetime. After working on the Ur III administrative corpus for almost 6 months, I have witnessed the extent of that problem with this corpus of texts: while most are available in transliteration, only 1 in 10 (I would estimate) are available in translation. This is, in part, due to the general practice of Sumerologists, who tend to publish the Ur III corpus in transliteration but do not publish their translations for various reasons (either the texts seem too obvious, with their lists of sheep for X purpose, or there are lingering uncertainties about how the abbreviated grammar should be rendered in English, and so forth).
The work which Jinyan (a co-student) and I do for the project is really the 'grunt work': we annotate texts in minute detail so that the computer will have material with which to learn the language. This involves labelling each morphological element of a Sumerian word so as to explain its function and meaning, using a sort of code that the computer can 'understand'. The project website is here: cdli-gh.github.io/mtaac/ . You will see that Jinyan and I are not mentioned there currently, because the website has not been updated in the last 6 months. One contribution I have made is the development of an extensive theoretical treatment of the morphology of the Ur III administrative grammar. The relevant documents should be available on the website soon. Also, MTAAC follows the interpretation of Sumerian grammar laid out by Gábor Zólyomi. Prof. Zólyomi generously makes his new Sumerian grammar available for free download:
elte.academia.edu/G%C3%A1borZ%C3%B3lyomi
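To give a rough idea of what 'labelling each morphological element' of a word can look like, here is a minimal Python sketch. It is not the project's actual annotation format: the Morpheme class, the function names and the tag ERG (for the ergative case marker) are all assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Morpheme:
    form: str   # transliterated segment of the word
    label: str  # English gloss or grammatical tag (hypothetical tag set)


def annotate(segments):
    """Pair each transliterated segment with its label."""
    return [Morpheme(form, label) for form, label in segments]


def gloss_line(morphemes):
    """Render the segmented word and its labels, separated by a tab."""
    forms = "-".join(m.form for m in morphemes)
    labels = "-".join(m.label for m in morphemes)
    return f"{forms}\t{labels}"


# lugal-e: the noun lugal 'king' followed by the ergative case marker -e
word = annotate([("lugal", "king"), ("e", "ERG")])
print(gloss_line(word))
```

The point is only that every segment gets its own machine-readable label, which is the kind of material a program needs in order to learn from annotated texts.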