Could AI help us translate long-forgotten languages?
Artificial intelligence has been used for everything from solving complex mathematical problems to cracking military codes. Now, scientists are harnessing it to decipher languages that have been lost to history, a development that could also shape the future of the translation industry. But how exactly can AI help restore these long-lost languages?
Restoring lost languages and culture
According to Ethnologue, there are currently 7,151 languages spoken around the globe. Over the course of human history, the total number of languages spoken is believed to be at least 31,000. We still have traces – dictionaries, literature, recordings – of some of these languages, but the meaning of many others has been completely lost. Every time a language disappears, a certain facet of human culture dies with it.
To understand a forgotten language, we need to decode the script in which it is written. Many written scripts are essentially a way of recording the sounds of a language, so if we can identify those sounds, we can compare them with the sounds of other languages and look for cognates (words of common origin). Comparing the sounds of one language with those of any number of others would take a human linguist an unfeasibly long time. This is where AI comes in.
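As a toy illustration of what "comparing sounds" can mean in practice, the sketch below scores how alike two transliterated words are using edit distance (the number of single-character changes needed to turn one into the other). This is a deliberately simple stand-in, not the method used by any particular research system, and the function names and sample words are assumptions made up for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Scale edit distance to a score between 0.0 (unrelated) and 1.0 (identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def best_cognate(word: str, lexicon: list[str]) -> tuple[str, float]:
    """Return the lexicon entry most similar to `word`, with its score."""
    return max(((cand, similarity(word, cand)) for cand in lexicon),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical transliterations, chosen only to exercise the functions.
    known_lexicon = ["pitar", "domus", "aqua"]
    print(best_cognate("pater", known_lexicon))
```

Surface similarity alone produces many false matches, which is one reason real decipherment work relies on much richer models of how sounds change over time.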
The power of machine learning
Jiaming Luo and Regina Barzilay from MIT and Yuan Cao from Google’s AI lab in Mountain View, California, have developed a machine-learning system with the potential to decipher lost languages. The key to this system is the principle that all languages involve certain patterns of words. Identifying these patterns in an unknown language and mapping them onto a known one enables us to bring the meaning of the text to light.
The team tested their system on two scripts that have already been deciphered: Linear B, which dates from around 1400 BCE and is known to represent an early form of Greek, and Ugaritic, an ancient Semitic language closely related to Hebrew. The system managed to identify 67.3% of the cognates between Linear B and ancient Greek, a remarkable achievement for automated decipherment.
Every language at once
The main advantage AI could offer in this kind of work is that it does not need to know in advance the ‘progenitor language’ from which a lost language is derived. Human linguists were able to decipher both Linear B and Ugaritic because they worked out which languages the scripts represented. A machine-learning system could instead search for cognates in every known language in a fraction of the time it would take a person or even a team of people. This technique of trying every possible combination is known as a brute-force approach, and computers already use it to crack passwords.
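The brute-force idea can be sketched as a loop over every candidate language: score the unknown words against each language's vocabulary and rank the languages by how well they match. The example below is a minimal sketch under stated assumptions, not the researchers' system. It uses `SequenceMatcher` from Python's standard `difflib` module as a crude similarity measure, and the language names and word lists are hypothetical placeholders.

```python
from difflib import SequenceMatcher


def best_match_score(word: str, lexicon: list[str]) -> float:
    """Highest similarity ratio between `word` and any entry in `lexicon`."""
    return max(SequenceMatcher(None, word, cand).ratio() for cand in lexicon)


def rank_languages(unknown_words: list[str],
                   lexicons: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Try every candidate language and rank by average best-match score."""
    scores = {}
    for name, lexicon in lexicons.items():
        total = sum(best_match_score(w, lexicon) for w in unknown_words)
        scores[name] = total / len(unknown_words)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical data: two invented candidate languages and two unknown words.
    candidates = {
        "language_a": ["pitar", "matar"],
        "language_b": ["domus", "silva"],
    }
    for name, score in rank_languages(["pater", "mater"], candidates):
        print(f"{name}: {score:.2f}")
```

The cost of this exhaustive search grows with the number of languages and the size of each vocabulary, which is why it is practical for a computer but not for a human linguist.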
Because a machine-learning system does not ‘understand’ the text it translates, there will always be a need for a human translator to review the results and ensure that the translation actually makes sense and sounds natural. This is already true of machine translation today. The potential that this new system represents, however, is enormous. We may be a significant step closer to deciphering iconic artefacts such as the Linear A tablets, the Phaistos Disc and the Voynich manuscript, all written in scripts that have so far resisted every attempt at decipherment.
Keeping pace with the future
Computer technology has been part of the translator’s arsenal for decades, and we at SwissGlobal have always taken a keen interest in it. From CAT tools and machine translation to data security solutions, technology is a core element of what we offer. We are eager to see how machine translation will change and evolve as the technology behind it becomes ever more sophisticated, and we look forward to providing a better and more powerful machine translation service to our clients.