Documentation of the Open-Source Shallow-Transfer Machine-Translation Platform Apertium

This is an early draft. Contribute to this documentation on GitHub.


Authors of version 2.0 of the documentation:

Mikel L. Forcada
Boyan Ivanov Bonev
Sergio Ortiz Rojas
Juan Antonio Pérez Ortiz
Gema Ramírez Sánchez
Felipe Sánchez Martínez
Carme Armentano-Oller
Marco A. Montava
Francis M. Tyers


Mireia Ginestí Rosell

Departament de Llenguatges i Sistemes Informàtics
Universitat d’Alacant

Author of this document:

Ilnar Salimzianov

Copyright © 2007 Grup Transducens, Universitat d’Alacant. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license can be found in http://www.gnu.org/copyleft/fdl.html.

Version 2.0 of the official Apertium documentation is available online. Its LaTeX source file is archived on SourceForge.

Further information is available on the project wiki.

The goal of this document is two-fold:

    1 Introduction

    2 The shallow-transfer machine translation engine

    3 Format specification of the data stream between modules

      3.1 Introduction

      3.2 Data stream without format

        3.2.1 Stream format

      3.3 Segmented data stream

    4 Modules specification

      4.1 Lexical processing modules

        4.1.1 Module description

        4.1.2 Format processing

          Format encapsulation method

    5 Installing and running the system

    6 Maintaining linguistic data

    7 Data insertion web forms

    8 Best practices when developing an Apertium translator

    9 Appendix A. XML DTDs

    10 Appendix B. Grammatical symbols used in the modules

    11 Appendix C. A list of all Apertium repositories

      11.1 Core

      11.2 Tools

      11.3 Monolingual packages

      11.4 Bilingual packages

      11.5 Collections

      11.6 Documentation

    12 Appendix D. Abbreviations used in the text

    13 Appendix E. Linguistic data repositories by language families

      13.1 Turkic