
A look at the evolution of machine translation systems

2017/09/01 Cortés Etxabe, Itziar - Head of translation technologies at Elhuyar. Source: Elhuyar magazine


Machine translation is a well-known technology. When we browse the Internet, we often use it to understand texts written in foreign languages, or to help us translate.

The services available on the Internet are very varied and mostly free. Although they may look alike, they can be based on different technologies, and many have been adapted over time without users noticing the change. There are, therefore, several ways to build a machine translation system: rule-based systems (RBMT) require linguistic resources and knowledge; statistical systems (SMT) need collections of already translated texts, from which they learn to translate using statistical techniques; and, finally, neural machine translation (NMT) systems, based on neural networks, which have become well known in recent years.

Neural network-based systems, loosely inspired by the functioning of the human brain, also learn from large collections of data, but they are more complex than statistical systems. The technology relies on a mathematical model (part of deep learning) that tries to imitate how brain neurons work; although the model has been known for several years, it has only recently gained strength. Building this type of system requires not only large data collections but also high-capacity computing hardware, and today's computers and graphics cards have made it possible to build such systems efficiently.
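To make the idea of an artificial neuron more concrete, here is a minimal Python sketch of a single unit: it computes a weighted sum of its inputs plus a bias and passes the result through an activation function (a sigmoid, chosen here only for illustration). The function names and all weights are hypothetical; real NMT systems combine a very large number of such units, learn their weights from translated texts, and use more elaborate architectures.

```python
import math


def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """One artificial neuron: a weighted sum of the inputs passed through a sigmoid."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Weighted sum of incoming signals (the weights are what training adjusts)
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Sigmoid activation squashes the result into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs: list[float], weight_rows: list[list[float]], biases: list[float]) -> list[float]:
    """A layer: several neurons reading the same inputs, each with its own weights."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical weights, chosen by hand only to show the computation
hidden = layer([1.0, 0.5], [[0.4, -0.2], [0.3, 0.8]], [0.0, -0.1])
output = neuron(hidden, [1.2, -0.7], 0.05)
print(hidden, output)
```

For example, `neuron([0.0, 0.0], [1.0, 1.0], 0.0)` returns 0.5, because the weighted sum is zero and the sigmoid of zero is one half. Training a network means adjusting the weights and biases so that its outputs match the examples in the data.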

Machine translation systems have evolved over time, and Google is an example: without users noticing, its services have been adapted to new models, and some of the translators we use are already based on neural networks. But this technology is not only in the hands of the Internet giants: we are also researching in this field, working with Basque. For now we are running research experiments, and the first results are encouraging. The aim is to launch shortly a neural network-based system that translates into Basque.

The consumer.eus website, now bilingual

Matxin (http://matxin.elhuyar.eus) is a pioneer in translating Spanish into Basque. This machine translation system emerged from a doctoral thesis in 2007 (Aingeru Mayor Martínez, UPV/EHU) and was the first machine translator into Basque; since then it has been progressively adapted to a digital age in which language technologies are so important. It is rule-based, so it relies on linguistic resources that encode the knowledge it needs to translate. Specifically, it has the knowledge needed to analyze texts in Spanish and transfer them into Basque, using dictionaries and syntactic rules, among other resources.
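The toy sketch below illustrates the general idea of rule-based translation: a bilingual dictionary handles lexical transfer, and a hand-written structural rule handles a difference between the languages (Spanish marks definiteness with a separate article, while Basque adds the suffix -a to the end of the noun phrase). It is not Matxin's code; the lexicon, the rule and the function name are hypothetical and drastically simplified (no morphological analysis, agreement, case marking, or handling of words that already end in -a).

```python
# Hypothetical, tiny Spanish -> Basque lexicon (lemma to lemma)
LEXICON = {
    "casa": "etxe",     # house
    "libro": "liburu",  # book
    "grande": "handi",  # big
    "nuevo": "berri",   # new
}

DEFINITE_ARTICLES = {"el", "la"}


def translate_np(phrase: str) -> str:
    """Translate a simple Spanish noun phrase: [article] noun [adjective]."""
    tokens = phrase.lower().split()
    definite = bool(tokens) and tokens[0] in DEFINITE_ARTICLES
    if definite:
        tokens = tokens[1:]  # Basque has no separate article word
    # Lexical transfer: dictionary lookup; unknown words are copied unchanged
    basque = [LEXICON.get(tok, tok) for tok in tokens]
    # Structural rule: definiteness becomes the suffix -a on the last word of the phrase
    if definite and basque:
        basque[-1] += "a"
    return " ".join(basque)


print(translate_np("la casa grande"))  # etxe handia
print(translate_np("el libro nuevo"))  # liburu berria
```

Every new phenomenon (plurals, case endings, verb agreement) needs more entries and more rules, which is why building such a system from scratch takes so much linguistic work.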

For the last three years, the Eroski Foundation, the Ixa research group of the UPV/EHU and Elhuyar have been working on the Consumer.eus project. Content that until now could only be read in Spanish is being machine-translated into Basque, and Matxin is used for this work: it is a robust translator, based on free software, whose linguistic resources are easy to adapt, and the improvement in results is noticeable as soon as the adaptations are applied. To translate the content of the consumer.eus website, we focused on the food domain and adapted the linguistic resources used by the machine translator accordingly. The effect of these adaptations was confirmed immediately: tailoring the resources to the domain improved the system's quality on food texts.

However, the quality of machine-generated translations is clearly not always what one would expect, or at least not always usable as is. For this reason, the project also offers readers the possibility to correct the translations, and anyone can take part. Consumer.eus allows readers to correct articles and recipes in Basque, and the corrections are stored. What for? To improve the translator with the collected data. Using this information and machine learning, a new machine translation system specialized in food texts will be launched.

Challenges of machine translation

Improving machine translation systems is a huge challenge, and the path to better results is full of experiments. When we started working with rule-based systems, the manual work of linguists was essential: the linguist had to know both the source and target languages of the translator and write rules to bridge the two (at the morphological and syntactic levels, for example). Building such a system from scratch is therefore an immense task. In contrast, creating systems based on statistics or machine learning does not necessarily require linguistic knowledge (at least for a simple model).
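As an illustration of how a statistical system can learn translations from already translated texts without hand-written rules, here is a compact sketch of IBM Model 1, a classic word-alignment model from early statistical machine translation. It uses the expectation-maximization (EM) algorithm to estimate word translation probabilities from sentence pairs alone. This is a simplified version (no NULL word, no phrases or reordering), the tiny Spanish-Basque corpus is invented for the example, and the function names are hypothetical; real SMT systems build much more on top of alignments like these.

```python
from collections import defaultdict

# Toy parallel corpus (Spanish tokens, Basque tokens), invented for illustration
CORPUS = [
    (["casa", "grande"], ["etxe", "handi"]),
    (["casa", "vieja"], ["etxe", "zahar"]),
    (["coche", "grande"], ["auto", "handi"]),
]


def train_ibm1(corpus, iterations=10):
    """Estimate t(target | source) with a simplified IBM Model 1 EM (no NULL word)."""
    target_vocab = {w for _, tgt in corpus for w in tgt}
    # Start from uniform probabilities: no linguistic knowledge at all
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # keys are (target_word, source_word)
    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence counts
        total = defaultdict(float)  # expected counts per source word
        for src, tgt in corpus:
            for e in tgt:
                # E-step: share this target word among the source words that could produce it
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: turn expected counts into new probabilities
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


def best_translation(t, source_word):
    """Return the target word with the highest t(target | source_word)."""
    candidates = {e: p for (e, f), p in t.items() if f == source_word}
    return max(candidates, key=candidates.get)


t = train_ibm1(CORPUS)
for word in ["casa", "grande", "vieja", "coche"]:
    print(word, "->", best_translation(t, word))
```

No dictionary is supplied: the model only exploits which words co-occur across sentence pairs, and on this toy corpus the probabilities move toward the intuitive pairs (casa/etxe, grande/handi, vieja/zahar, coche/auto). With real data, the same principle needs very large collections of translated sentences.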

In recent years we often hear about Big Data, a term used for the immense datasets now available, from which information is extracted and, where appropriate, analyzed. The techniques mentioned in this article are not used only for machine translation; Big Data is also exploited in other learning and analysis processes. It might seem, then, that building translation systems is easy whenever data is available, but here too there are limitations.

Preparing datasets for machine learning or deep learning is often not easy. Building the mathematical models requires a collection of already translated sentences: the translations must be of good quality and the dataset must be large. In theory, the more data used to build the system, the better the translator's output.
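Part of preparing a parallel corpus is discarding sentence pairs that are likely to be noisy. Below is a minimal sketch, assuming a few common heuristics: drop pairs with an empty side, very long segments, implausible length ratios, and exact duplicates. The function name and the thresholds (`max_ratio`, `max_len`) are illustrative assumptions, not values from this article; for an agglutinative language such as Basque, token counts differ systematically from Spanish, so a length-ratio threshold would need tuning on real data.

```python
def clean_parallel_corpus(pairs, max_ratio=2.0, max_len=100):
    """Filter (source, target) sentence pairs using simple, illustrative heuristics."""
    seen = set()
    kept = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # one side is missing: not a translation pair
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src > max_len or n_tgt > max_len:
            continue  # very long segments are often misaligned paragraphs
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # very different lengths suggest a bad alignment
        if (src, tgt) in seen:
            continue  # exact duplicates add no new information
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


sample = [
    ("la casa grande", "etxe handia"),
    ("la casa grande", "etxe handia"),  # duplicate
    ("el libro", ""),                   # missing translation
]
print(clean_parallel_corpus(sample))  # [('la casa grande', 'etxe handia')]
```

Filters like these only remove obvious noise; judging whether a translation is actually good still requires human review or more sophisticated quality estimation.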

However, statistical systems seem to have reached their ceiling: even when the training dataset grows, the translator's output does not improve in the same proportion. That is why we are researching neural network-based systems, to overcome this barrier and advance the field of machine translation.
