
Language Technologies and Language Industries

2009/06/01 Sagarna, Andoni - Engineer. Source: Elhuyar magazine

Computer science is the automatic processing of information. At first it was limited to numerical calculations, but little by little other kinds of information have been handled as well: music, images, and so on. Early in that evolution, in the 1950s, computers began to process human languages. Fifty years on, after a steep path, applications based on language processing are reaching maturity and the market, and the industries built around them are gaining strength.
(Photo: Bram Janssens/350RF)

It would be good if, for example, a Basque speaker who does not know Japanese and a Japanese speaker who does not know Basque could hold a telephone conversation in real time, each speaking in their own language and hearing what the other says in their own language. Today that is, of course, a dream, but we can say that we are taking small steps towards it.

Translating written texts from one language to another is much simpler, but it is not easy either. The automatic systems that get the best results without heavy human correction are those restricted to specific subject areas (equipment manuals, weather forecasts, etc.) and to specific language pairs.

However, machine translation has changed a great deal in recent years, partly because many existing translations are now available in digital form. Machine translation used to be based on grammatical rules, but statistical methods built on large databases can now be used instead. These databases contain original texts together with their translations, forming parallel corpora. From them the system learns the correspondences between the texts in the two languages and can translate a new text when it is similar or identical to material it has already seen. Because large parallel corpora are now available, this approach gets good results.

The current trend is to combine rule-based and statistical methods.
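To make "learning correspondences from a parallel corpus" concrete, here is a minimal sketch of one classic statistical technique, IBM Model 1, which estimates word translation probabilities t(e | f) with expectation-maximization. It is not necessarily what any system mentioned in this article uses. The three Basque-English sentence pairs are a hypothetical toy corpus, and the NULL word of the full model is omitted for brevity.

```python
from collections import defaultdict

# Hypothetical toy Basque-English parallel corpus (illustrative only).
CORPUS = [
    ("etxe handia", "big house"),
    ("etxe txikia", "small house"),
    ("liburu handia", "big book"),
]


def train_ibm1(corpus, iterations=10):
    """Estimate t(e | f) with IBM Model 1 EM (f = Basque word, e = English word)."""
    pairs = [(f.split(), e.split()) for f, e in corpus]
    target_vocab = {e for _, es in pairs for e in es}
    # Uniform initialisation over every co-occurring (e, f) pair.
    t = {}
    for fs, es in pairs:
        for f in fs:
            for e in es:
                t[(e, f)] = 1.0 / len(target_vocab)

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        # E-step: share each English word among the Basque words that could produce it.
        for fs, es in pairs:
            for e in es:
                z = sum(t[(e, f)] for f in fs)
                for f in fs:
                    share = t[(e, f)] / z
                    count[(e, f)] += share
                    total[f] += share
        # M-step: renormalise expected counts into probabilities per source word.
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return t


def best_translation(t, f):
    """Return the English word with the highest probability t(e | f)."""
    candidates = {e: p for (e, src), p in t.items() if src == f}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    table = train_ibm1(CORPUS)
    for word in ("etxe", "handia", "txikia", "liburu"):
        print(word, "->", best_translation(table, word))
```

No dictionary is supplied. Because "etxe" co-occurs with "house" in two pairs, EM gradually concentrates its probability there, and this in turn frees "txikia" to align with "small". Real systems train on millions of sentence pairs and use richer models.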

At a more modest level, so-called translation memories are a great help today. These are databases containing segments that have already been translated. While a human translator is working, if the system detects that the segment to be translated, or a very similar one, has already been translated, it offers the translator one or more equivalents for that segment, and the translator decides whether any of them is valid as it stands or can be used with some modification. These systems are very beneficial because they help achieve both speed and consistency.
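The lookup a translation memory performs can be sketched as a fuzzy match over stored segments. This minimal version assumes a plain dictionary as the memory and uses the character-level similarity ratio from Python's standard `difflib.SequenceMatcher`. Commercial tools use their own matching algorithms. The stored segments, their Spanish translations and the 0.7 threshold are all illustrative assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source segment -> previously approved translation.
MEMORY = {
    "Press the power button.": "Pulse el botón de encendido.",
    "Close the cover before printing.": "Cierre la tapa antes de imprimir.",
}


def fuzzy_matches(segment, memory, threshold=0.7):
    """Return (score, source, target) for stored segments similar to `segment`, best first.

    The score is SequenceMatcher's ratio (1.0 means identical); the threshold is an
    arbitrary illustrative cut-off.
    """
    matches = []
    for source, target in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold:
            matches.append((score, source, target))
    matches.sort(reverse=True)  # highest similarity first
    return matches


if __name__ == "__main__":
    for score, source, target in fuzzy_matches("Press the power button twice.", MEMORY):
        print(f"{score:.2f}  {source}  =>  {target}")
```

The system only proposes matches. As the paragraph describes, the human translator still decides whether to accept the stored translation or edit it (here, to add "twice").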

Various tools help in producing monolingual texts: spell checkers that catch spelling errors, grammar checkers that verify that sentences follow the rules of the grammar, search engines that help find information in documents, tools that automatically summarize documents, and tools that generate linguistic descriptions from non-linguistic data such as meteorological data.
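One textbook way a spell checker can propose corrections is to flag words missing from a lexicon and rank lexicon entries by Levenshtein edit distance. This minimal sketch shows that technique only. It is not how any particular product works, and checkers for highly inflected languages such as Basque need morphological analysis rather than a flat word list. The tiny lexicon and the `max_distance` cut-off are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when the letters match)
            ))
        prev = curr
    return prev[-1]


# Hypothetical miniature lexicon.
LEXICON = {"spelling", "spilling", "selling", "grammar", "checker"}


def suggest(word, lexicon, max_distance=2):
    """Return lexicon words within max_distance edits, closest first; [] if the word is known."""
    if word in lexicon:
        return []
    scored = sorted((edit_distance(word, w), w) for w in lexicon)
    return [w for d, w in scored if d <= max_distance]


if __name__ == "__main__":
    print(suggest("speling", LEXICON))
```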

Software that interprets printed text captured with a scanner (OCR) is well known to us. Interpreting handwritten text is another matter, and much more difficult.

When spoken language is involved, understanding speech and producing speech differ greatly in difficulty. Systems that "read" written text aloud are very common today (for example, those that let blind people listen to written texts), but the reverse, having a system automatically interpret what a person says and convert it into written text, for example, is much more difficult.

All these applications are gradually moving from the laboratory to the market. Not in every language, unfortunately: they require large investments in research, and languages that offer little prospect of recovering that investment are falling behind. English is undoubtedly still the dominant language in this field, since it is the one most closely tied to economic interests.

Language technologies are making considerable inroads into health services. In healthcare, clinical information has until now been stored in large masses of unstructured text. Language technology lets health professionals save a great deal of time and improves safety: clinical information that used to be written as free text is replaced, using specific coding systems, by standardized descriptions of diagnoses, treatments and drugs.

Language technology is also being introduced in other sectors, such as the automotive and aviation industries and international organizations. As a result of globalization, these sectors must produce written documentation in many languages and train multicultural, multilingual staff.

For this purpose, machine translation, terminology extraction and management software, spell checkers, multilingual documentation management and similar tools are essential to save time and ensure consistent results.

Sagarna Izagirre, Andoni