Sami - Estonian language technology cooperation: similar languages, same technologies

Project facts

Project promoter:
University of Tartu
Project Number:
EE06-0010
Target groups:
Researchers or scientists
Status:
Completed
Initial project cost:
€229,400
Final project cost:
€228,075
From Norway Grants:
€195,004
The project is carried out in:
Estonia

Description

In order to survive in the contemporary digital world, a language must be supported by linguistically aware software tailored to its specific features. The project involves three typologically similar languages: Sámi, a Fenno-Ugric minority language; Estonian, a Fenno-Ugric language; and Võro, a minority Fenno-Ugric language. By bringing together R&D on these languages, the project's synergy will result in a next generation of language technology. The objective of the project is to build robust models for Estonian and Võro that can be integrated into practical applications such as machine translation and interactive computer-assisted language learning, as has already been done for Sámi. The project develops solutions for small national languages such as Estonian and Finnish while at the same time developing open source technologies that can be ported to a wide range of minority languages. Giellatekno, the centre for Sámi language technology at the University of Tromsø, has been implementing Sámi language tools; including other languages acts as knowledge transfer and at the same time provides feedback for further enhancements.

Summary of project results

The main objective of the project was to combine the strengths of Estonian language technology resources and competence with those of The Arctic University of Norway (UiT in Tromsø), including the Giellatekno and Divvun groups. Platforms, practices and repositories developed at UiT were shown to apply to Estonian and Võru as well. In this way the same technologies for machine translation and interactive computer-assisted language learning (iCALL) were applied to Estonian and Võru. The project showed concretely that infrastructures for language technology can be shared across a wide array of languages. In particular, this may prove valuable for enabling the revitalisation of endangered languages and dialects and for providing new methods for second language learning. All results appear to be open source and readily available on the internet, which is an important factor in the further development of the applications and resources. The project has also shown that digital descriptions of languages can be produced separately from the programs for iCALL, machine translation and many other applications.

Summary of bilateral results

The Estonian language technology research community is well known for its competence in producing computerised corpora, lexicons, analysers and other so-called language resources, and so was the Norwegian community at UiT. The added value of the project was the active combination of these two. The UiT community had competence in handling an unusually wide array of languages, mostly Uralic ones. The approaches in the two countries were essentially based on similar theories. The role of UiT appears to have been essential in making the existing tools for iCALL and machine translation operational for Estonian and Võru. The staff exchange was probably essential for setting up the Apertium-based machine translation and OAHPA-based language learning tools. UiT has been committed to open source principles, which made this kind of cooperation easier. The technical skills and theoretical knowledge of the project partner should be seen as crucial in achieving the project's objectives.