Taking stock of language industry terminology


Language industry, language professional, language profession, language work. Four simple terms. Everyone understands what they mean, but how do you say ‘language professional’ in French? Or ‘language industry’ in Dutch?

Liesbet Van Oudenhove set out to figure this out during her internship at De Taalsector. She collected language industry-related terminology and looked for somewhere to share the terms and discuss them with the ‘community of interest’. The Language Sector is alive and changing: in fact, suffice it to say that terminology is positively swinging! This is the report of her adventure.

Liesbet Van Oudenhove, Ghent – First of all, I visited the De Taalsector website for an overall idea of the most important terminology in The Language Sector. I immediately found an interesting term in the De Taalsector tagline: something along the lines of ‘language professional’ when translated literally into English. But to whom or what exactly does the term ‘language professional’ refer? And how do you translate this term into French, for instance?

‘Language professional’ refers to anyone engaged in professional language work: copywriters, language teachers, interpreters, language engineers, technical writers, editors, literacy teachers, language policy coordinators, speech therapists, translators, dialectologists, dictionary makers, language coaches, audio describers and a whole host of professionals earning a living through language-related work.

Systran, one of the machine translation (MT) engines used by The Language Sector (thelanguagesector.eu) for translations from Dutch into French, translates ‘language professional’ as ‘professionnel de la langue’. Google Translate translates it as ‘professionnel langue’. The two engines do not agree, which suggests that MT engines have difficulties with the term ‘language professional’.

But what about human translators? To determine this, I turned to various Internet forums and Facebook groups for professional translators. The message I posted asked for a French translation of ‘language professional’ based on a short text about The Language Industry Awards (LIAs). I posted this message in six different places and got four different answers. One translator opted for ‘un professionnel de la langue’, another said ‘professionnel linguistique’. Yet another proposed ‘professionnel des langues’ or ‘professionnel en langues’. This term is obviously problematic for human translators as well as machines.

If ‘language professional’ is a problematic term in French, then what about language work, language profession, language product, language service and other key terms in The Language Sector? And what about languages other than French? Indeed, translators have a plethora of different ideas. This isn’t a serious problem in itself and merely goes to show the variety of different concepts a term can cover. In any case, the whole issue appeared more wide-ranging than I had anticipated and so all the more worth investigating.

First of all, I wanted to collect the main language industry-specific terms in Dutch as quickly and efficiently as possible. This was done in a variety of ways: I conducted a manual search for terms on The Language Sector website, asked language industry expert Dries Debackere to create a mind map of The Language Sector, and then used term extraction tools. These are tools that automatically extract terms from a text corpus, useful in this case because The Language Sector text corpus includes more than a million words.

After a thorough search, I found three term extraction tools that might be useful.

The first was AntConc, not really a term extraction tool but a concordance program. With AntConc I searched for terms and collocations in which the string ‘language’ appeared. With this useful tool, I was quickly able to determine these terms and collocations and could see how many times they occurred in the whole corpus.
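The kind of search described here can be approximated in a few lines of Python. This is only a minimal sketch of the idea, not a description of how AntConc works internally: the function name, the sample text and the way words are split are all assumptions made for illustration. It counts single words that contain a given string, as well as two-word combinations in which either word contains it.

```python
import re
from collections import Counter

# Hypothetical word pattern: runs of letters, optionally joined by hyphens.
WORD = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")

def count_matches(text, needle):
    """Count words and two-word combinations containing `needle`."""
    needle = needle.lower()
    words = WORD.findall(text.lower())
    # Single words in which the search string appears.
    single = Counter(w for w in words if needle in w)
    # Adjacent word pairs in which either word contains the search string.
    # Note: pairs are also formed across sentence boundaries in this sketch.
    pairs = Counter(
        f"{a} {b}" for a, b in zip(words, words[1:])
        if needle in a or needle in b
    )
    return single, pairs

# Invented sample text, standing in for the real corpus.
sample = (
    "Language work is professional language work. "
    "A language professional does language work."
)
single, pairs = count_matches(sample, "language")
for phrase, n in pairs.most_common(3):
    print(phrase, n)
```

For a Dutch corpus, the search string would presumably be the Dutch word for ‘language’, in which case the single-word count would also pick up compounds containing it.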

The next tool I tried was TermTreffer, commissioned by the Nederlandse Taalunie to extract terms from Dutch text corpora. TermTreffer is an online tool requiring registration. Your login and password have to be requested by mail, which is a time-consuming business. Once you have logged in, the extraction takes place relatively quickly: you upload your corpus and get back your terms immediately. TermTreffer came back with 22,600 terms from our corpus of over a million words, none of which appeared to be verbs, which led us to conclude that it was not an effective term extraction tool.

Another tool I used was TExSIS, a tool created by Ghent University. It was impossible to upload a whole corpus of a million words, so I divided the corpus into ten smaller pieces. The results came through a couple of days later – a list of 50,000 terms.

The fact that TExSIS found so many language industry-related terms led me to wonder about the reliability of extraction tools in general. I decided to run a manual term extraction on a representative number of randomly selected articles from The Language Sector website. I compared my own glossary with the lists compiled by the term extraction tools mentioned above. The terms found by AntConc had a 16% match with my manual term extraction. TermTreffer only matched 7%. TExSIS had a better outcome and included 50% of my manually extracted terms, but then again it also produced the most non-terms.
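A comparison of this kind boils down to simple set arithmetic. The sketch below is an assumption about how such a match percentage could be calculated, not the procedure actually followed; the function name and both term lists are invented for illustration. It reports which share of the manually extracted terms a tool found, and which share of the tool's output does not appear in the manual glossary.

```python
def match_rate(manual_terms, tool_terms):
    """Compare a manual glossary with a tool's term list.

    Returns (share of manual terms found by the tool,
             share of tool terms absent from the manual glossary).
    """
    # Normalise both lists: trim whitespace, ignore case, drop empty entries.
    manual = {t.strip().lower() for t in manual_terms if t.strip()}
    tool = {t.strip().lower() for t in tool_terms if t.strip()}
    if not manual:
        raise ValueError("the manual glossary is empty")
    share_found = len(manual & tool) / len(manual)
    share_unmatched = len(tool - manual) / len(tool) if tool else 0.0
    return share_found, share_unmatched

# Invented example lists, for illustration only.
manual = ["taalprofessional", "taalwerk", "taalsector", "taalberoep"]
tool = ["Taalwerk", "taalsector", "website", "artikel", "jaar", "taaltechnologie"]

found, unmatched = match_rate(manual, tool)
print(f"found: {found:.0%}, not in manual glossary: {unmatched:.0%}")
```

Because the manual glossary is based on a sample of articles only, a term that is missing from it is not necessarily a non-term, so the second figure is at best a rough indication of noise.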

The Language Sector should not only be discussed in Dutch-speaking areas. The European Union alone has 24 official languages, and language industry terminology is just as useful across language borders. Translators and MT engines should have easy access to a multilingual terminology database containing language industry-related terms. The need for this was made clear not only by translators’ comments on forums and discussion groups, but also by the output of the various MT engines connected to the multilingual website of The Language Sector. This means that we should also think about a file type that is manageable for both the translator and the machine. Such a file type does in fact exist: TBX (TermBase eXchange). However, I do not know of anyone who uses it.
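To give an idea of what a machine-readable term entry might look like, here is a rough Python sketch that builds a small XML fragment in the style of TBX. The element names (termEntry, langSet, tig, term) follow the layout of the older TBX-Basic dialect as far as I am aware; treat them as an assumption and check them against the standard before relying on them. The entry identifier and the helper function are hypothetical, and the French terms are simply two of the suggestions collected above.

```python
import xml.etree.ElementTree as ET

# Qualified name that ElementTree serialises as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def build_entry(entry_id, terms_by_lang):
    """Build one concept entry with a language section per language.

    Element names are assumed to follow the TBX-Basic layout.
    """
    entry = ET.Element("termEntry", {"id": entry_id})
    for lang, terms in terms_by_lang.items():
        lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
        for term in terms:
            # One term group per candidate term.
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = term
    return entry

# Hypothetical entry: one English term with two French candidates.
entry = build_entry("c1", {
    "en": ["language professional"],
    "fr": ["professionnel de la langue", "professionnel des langues"],
})
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

A complete TBX file would also need a header and a surrounding document structure, which are left out here.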

It would be really helpful if we, as language professionals, could collect, manage and share all our language industry terms in a space where language professionals could also help the process along by starting a debate about each term. Managing, sharing and debating, all in the same place.

To find such a space, I examined some web-based platforms where terminology could be managed and shared. This is how I got to evoTerm, a paid terminology management system where your input remains one hundred percent your own, where you can give others access to the termbases you create and where you can decide whether a user is allowed to make changes. The interface can be shown in English, French, German or Dutch. This would seem to be a good system, but it offers no place to discuss the terms. This feature is really important because language industry terminology is still new and, as such, not yet stable.

Another term management system I looked closely at was TermBeheerder, also a system commissioned by the Nederlandse Taalunie and developed by CrossLang (Ghent). This system saves your terms (which can be changed later on) and allows you to add a lot of contextual information to them. Other users can also easily add comments and start discussions, which is exactly what I was looking for and, in my opinion, what The Language Sector needs. Too bad this web-based tool is still in a beta version! I have uploaded some terms, but no one is yet able to use or edit them. It is also unfortunate that the interface can only be shown in Dutch. If we could create a multilingual interface, TermBeheerder could be a great multilingual environment in which to collect and manage terms as well as to start conversations about them.

Conclusion: there is still a lot of work to be done. There is a need for a place where language professionals can hold conversations with each other about their own terminology in different languages. This space would preferably be in the same place where the terminology is managed and shared.

 

Would you like to comment on this article? Feel free to send an email to redactie@detaalsector.be.

 

http://www.termbeheerder.org/

http://www.termtreffer.nl/

http://www.evoterm.net/

http://www.thelanguagesector.eu/

 

 
