1 Introduction

This documentation describes the Apertium platform, one of the open-source machine translation systems which originated within the project “Open-Source Machine Translation for the Languages of Spain” (“Traducción automática de código abierto para las lenguas del estado Español”). It is a shallow-transfer machine translation system, initially designed for translation between related language pairs, although some of its components have also been used in the deep-transfer architecture (Matxin) developed in the same project for the Spanish-Basque pair. At present, Apertium can translate between the pairs Spanish-Galician, Spanish-Catalan¹, Catalan-Occitan, and Catalan-French, and can be used to build translators between other related language pairs, such as Danish-Swedish or Czech-Slovak. A total of 50 pairs have been released and are considered stable; they are listed on the project wiki and showcased on apertium.org. Further translators, still in the beta stage of development, can be found on beta.apertium.org.

Machine translation systems currently available for the pairs es-ca and es-gl are mostly commercial or use proprietary technologies, which makes them very hard to adapt to new uses; furthermore, they use different technologies across language pairs, which makes it very difficult to integrate them into a single multilingual content management system.

One of the main novelties of the architecture described here is that it has been released under open-source licenses (in most cases, GNU GPL; some data still have a Creative Commons license) and is distributed free of charge. This means that anyone with the necessary computational and linguistic skills can adapt or enhance the platform or the language-pair data to create a new machine translation system, even for other pairs of related languages. The licenses chosen make these improvements immediately available to everyone. We therefore expect that the introduction of this open-source machine translation architecture will solve some of the problems mentioned above (different technologies for different pairs, closed-source architectures that are hard to adapt to new uses, etc.) and promote the exchange of existing linguistic data through the use of the XML-based formats defined in this documentation. We also think that it will help shift the current business model from a license-centered one to a services-centered one.

It is worth mentioning that “Open-Source Machine Translation for the Languages of Spain” was the first large open-source machine translation project funded by the central Spanish Government, although the adoption of open-source software by the Spanish governments is not new.

This documentation describes in detail the characteristics of the Apertium platform, and is organized as follows:

The files which this documentation refers to can be found at and downloaded from the project page on GitHub: https://github.com/apertium. From this page you can download the packages needed for installation, as well as view the individual files in the SVN (main) and CVS (residual) repositories of the project. The machine translation systems for the different language pairs can also be tested on the Internet at https://apertium.org/ (released versions) or https://beta.apertium.org (nightly versions). Besides the translation modes proper, the latter website also lets you test individual morphological analysers and generators.

The present work has benefited from the contribution of many people and institutions:

¹ With the name Catalan we also refer to the Valencian dialectal variant of this language.