
OpenMaTrEx, a free/open-source marker-driven example-based machine translation system

What is OpenMaTrEx?

OpenMaTrEx is a free/open-source (FOS) example-based machine translation (EBMT) system based on the marker hypothesis. It comprises a marker-driven chunker, a collection of chunk aligners, and two engines: one based on the simple proof-of-concept monotone recombinator (previously released as Marclator, http://www.openmatrex.org/marclator/) and the other on a Moses-based decoder (http://www.statmt.org/moses/). OpenMaTrEx is a FOS version of the basic components of MaTrEx, the data-driven machine translation system designed by the Machine Translation group at the School of Computing of Dublin City University (Stroppa and Way 2006, Stroppa et al. 2006). Much of the code in OpenMaTrEx is written in Java, although many important tasks are performed by scripts written in a variety of scripting languages.

OpenMaTrEx has been released under the GNU General Public Licence (GPL), version 3.

OpenMaTrEx is (c) 2007-2011 Dublin City University. The original MaTrEx code was developed by, among others, Steve Armstrong, Yvette Graham, Nano Gough, Declan Groves, Yanjun Ma, Nicolas Stroppa, John Tinsley, Andy Way, and Bart Mellebeek. The free/open-source package OpenMaTrEx has been put together by Sandipan Dandapat, Mikel L. Forcada, Declan Groves, Yanjun Ma, Sergio Penkale, John Tinsley, and Andy Way. Pavel Pecina helped with the Czech marker files. Jimmy O'Regan helped with the Irish marker files.

A more general description of OpenMaTrEx may be found in the ABOUT file of the package.
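
To give a rough idea of what a marker-driven chunker does, here is a minimal sketch in Python. It is not the OpenMaTrEx implementation (which is mostly Java and reads per-language marker files). The tiny English marker list, the function name marker_chunks and the exact segmentation rule are assumptions made for illustration: a new chunk is opened at each marker word, provided the current chunk already contains at least one non-marker word.

```python
# Hypothetical, tiny English marker list for illustration only; OpenMaTrEx
# ships its own per-language marker files, which are not reproduced here.
MARKERS = {
    "the": "DET", "a": "DET", "an": "DET",
    "in": "PREP", "on": "PREP", "of": "PREP", "to": "PREP", "with": "PREP",
    "and": "CONJ", "or": "CONJ", "but": "CONJ",
    "he": "PRON", "she": "PRON", "it": "PRON", "they": "PRON",
}


def marker_chunks(sentence):
    """Split a sentence into chunks that each start at a marker word.

    A marker word opens a new chunk only if the current chunk already holds
    at least one non-marker (content) word, so no chunk consists of marker
    words alone. A trailing marker-only run is attached to the last chunk.
    """
    chunks = []
    current = []
    has_content = False
    for token in sentence.lower().split():
        is_marker = token in MARKERS
        if is_marker and has_content:
            # Close the current chunk and start a new one at this marker.
            chunks.append(current)
            current = []
            has_content = False
        current.append(token)
        if not is_marker:
            has_content = True
    if current:
        if chunks and not has_content:
            # Leftover markers with no content word: merge into previous chunk.
            chunks[-1].extend(current)
        else:
            chunks.append(current)
    return [" ".join(chunk) for chunk in chunks]


if __name__ == "__main__":
    for chunk in marker_chunks("The cat sat on the mat with a dog"):
        print(chunk)
```

Applying the same procedure to both sides of a sentence pair yields source and target chunks, which the chunk aligners then try to pair up.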

Downloading OpenMaTrEx

Until a proper project site is set up, OpenMaTrEx can be downloaded from here:

A tarball (.tar.gz) file containing a snapshot of current development can be downloaded by saving this link as, for instance, OpenMaTrEx.tar.gz.

Experimental installers

OpenMaTrEx comes with an INSTALL file that explains step by step how to install it manually. Alternatively, you may want to download our experimental installer (still under construction), which can easily be adapted to your local installation and does all the downloading, checking out, patching and installing for you. Get it from here:

See the note below.

A note on versions: help sought

The installers above use rather old versions of Moses and IRSTLM, and do not work with recent versions of the gcc compiler (such as 4.6.1). If you are a developer and want to help build a version of OpenMaTrEx that works with up-to-date versions of Moses, IRSTLM and the gcc compiler, please contact mfor...@computing.dcu.ie .

A quick test

For a quick test, make a directory orig inside the OpenMaTrEx-0.97.1, OpenMaTrEx-0.98 or trunk directory created by the installer, and copy:

and then launch sample_run.sh. It runs three training jobs: a baseline Moses system; a second system in which marker-based chunk pairs are obtained and added directly to the statistical translation table; and a third system in which marker-based chunk pairs are also added, but an extra feature distinguishes statistical pairs from marker-based pairs. The three resulting systems are then evaluated on the test set. The whole job may take a couple of hours to complete.
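
The difference between the three configurations can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not what sample_run.sh or Moses actually do. The function build_table, the relative-frequency scoring over pooled pairs, and the 1.0/0.0 origin flag are all hypothetical, and the real translation table format is not reproduced here.

```python
from collections import Counter, defaultdict


def build_table(statistical_pairs, marker_pairs, use_origin_feature):
    """Merge two lists of (source, target) pairs into one toy table.

    Returns {source: {target: (probability, origin_flag)}}. The probability
    is the relative frequency of the target given the source over the pooled
    pairs (an assumed scoring scheme). origin_flag is 1.0 if the pair occurs
    among the marker-based pairs and 0.0 otherwise, or None when
    use_origin_feature is False.
    """
    counts = Counter(statistical_pairs) + Counter(marker_pairs)
    marker_set = set(marker_pairs)
    # Total count per source phrase, used as the normaliser.
    totals = Counter()
    for (src, _tgt), n in counts.items():
        totals[src] += n
    table = defaultdict(dict)
    for (src, tgt), n in counts.items():
        prob = n / totals[src]
        flag = None
        if use_origin_feature:
            flag = 1.0 if (src, tgt) in marker_set else 0.0
        table[src][tgt] = (prob, flag)
    return dict(table)


if __name__ == "__main__":
    # Made-up toy data, for illustration only.
    stat = [("la casa", "the house"), ("la casa", "the house"),
            ("la casa", "the home")]
    marker = [("la casa", "the house"), ("en la casa", "in the house")]

    baseline = build_table(stat, [], False)         # job 1: statistical pairs only
    merged = build_table(stat, marker, False)       # job 2: chunk pairs added directly
    merged_flagged = build_table(stat, marker, True)  # job 3: plus origin feature
    print(baseline)
    print(merged)
    print(merged_flagged)
```

In the third configuration the decoder can, in principle, learn a weight for the origin flag during tuning and so trust marker-based pairs more or less than purely statistical ones.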

Suggestions welcome!

Subversion repository

If you are an OpenMaTrEx developer, you can access the repository with the command

svn co http://www.openmatrex.org/svn/OpenMaTrEx/

(contact mfor...@computing.dcu.ie if you want to become a developer).

Anyone can browse the Subversion repository.

Contact

Please send bug reports, comments, etc. to Mikel L. Forcada, mfor...@computing.dcu.ie .

Meet us on our IRC channel (#openmatrex at irc.freenode.net). Use your favourite IRC client or log in here:

Published information

PDF flyer (24/05/2010)

Sandipan Dandapat, Mikel L. Forcada, Declan Groves, Sergio Penkale, John Tinsley, Andy Way: OpenMaTrEx: A Free/Open-Source Marker-Driven Example-Based Machine Translation System, in Loftsson, H., et al., eds., Advances in Natural Language Processing: 7th International Conference on NLP, IceTAL 2010 (Reykjavík, 16-18 Aug. 2010), Col. Lecture Notes in Artificial Intelligence, vol. 6233, pp. 121-126 (Berlin, Heidelberg: Springer).

A technical report (15.06.2011) which extends and updates the previous paper is also available and is a good starting point.

Papers citing OpenMaTrEx

For a list of papers citing OpenMaTrEx, launch these searches: MT-Archive.info, Google Scholar.

References

A more complete set of references can be found in the ABOUT file of the package or here.