What it hears, it writes
2008/03/01 Kortabitarte Egiguren, Irati - Elhuyar Zientzia Source: Elhuyar magazine
Speech recognition systems are currently integrated mainly in telephone services, such as making appointments, ordering products or booking tickets for shows. But there are other applications, such as automatic dictation. Work on the latter is under way in, among other places, the Systems Engineering and Automation department of the UPV/EHU.
Speech processing requires a great deal of good training. That is, the system must first be trained, a process known as machine learning. Two things are needed for this: on the one hand, audio recordings from television and radio, and on the other, reference texts of what was said in those broadcasts. UPV researchers, for example, frequently use ETB's Gaur Egun and Teleberri programmes to train the system. The system does not need to know word for word what was said; it can work from a summary of what was said. In short, it tries to learn the relationship between sounds and words.
Once the learning process is complete, the system should be able to understand what is said in any edition of Gaur Egun or Teleberri. Although learning is a slow process, once the system has internalized the rules or information, that is, once it has the right reference material, it produces its result fairly quickly. In this case, the result is a written text of what was spoken. In short, the goal is to obtain text from audio.
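The two phases described above, slow learning from paired sounds and reference words followed by fast recognition, can be illustrated with a deliberately tiny sketch. It is not the researchers' method: real recognizers use statistical models over far richer acoustic data. Here each sound is assumed to be a made-up two-number feature vector, and the hypothetical functions `train` and `recognise` simply average the examples of each word and then pick the closest average.

```python
from collections import defaultdict
from math import dist


def train(examples):
    """Learn one average feature vector per word from (features, word) pairs."""
    sums = {}
    counts = defaultdict(int)
    for features, word in examples:
        if word not in sums:
            sums[word] = list(features)
        else:
            sums[word] = [s + f for s, f in zip(sums[word], features)]
        counts[word] += 1
    return {w: [s / counts[w] for s in sums[w]] for w in sums}


def recognise(model, sounds):
    """Map each feature vector to the word with the nearest learned average."""
    words = [min(model, key=lambda w: dist(model[w], f)) for f in sounds]
    return " ".join(words)


# Hypothetical training data: invented feature vectors paired with reference words.
TRAINING = [
    ([1.0, 0.1], "gaur"),
    ([0.9, 0.2], "gaur"),
    ([0.1, 1.0], "egun"),
    ([0.2, 0.8], "egun"),
]

model = train(TRAINING)                               # learning phase
text = recognise(model, [[0.95, 0.1], [0.15, 0.9]])   # recognition phase
print(text)
```

The more paired examples each word has, the more reliable its average becomes, which is one simple way to see why the amount of reference data matters so much.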
Small and large
It is true that most applications of this type on the market target "large" languages, especially English. However, researchers from the Universidad Politécnica de Donostia-San Sebastián, in collaboration with the IXA, GTTS and Computational Intelligence groups of the UPV/EHU, work with the Basque language. The obvious difference between "large" and "small" languages lies in the amount of reference data. English tools of this type have a lot of data available, while the reference material in Basque is much scarcer. Researchers are therefore looking for new techniques to make better and more accurate use of this limited data.
These systems depend entirely on the language, and each language has its own tool. UPV/EHU researchers, however, work not only with Basque but also with Spanish and French. Using the Teleberri programme or Infozazpi broadcasts, for example, they pursue two main objectives: on the one hand, to have the system understand Spanish and French as well as Basque, and on the other, to look for similarities between Basque and the other two languages in order to improve the training of the Basque tools.
To this end, a series of trials is currently being carried out to analyze the possibility of handling multiple languages in the same tool. This is the future challenge for UPV researchers: to develop a single system capable of understanding Basque, Spanish and French.