Contents of /OpenMaTrEx/trunk/INSTALL

Revision 271
Tue May 24 11:00:46 2011 UTC by mikel
File size: 10144 byte(s)
changed references to 0.97.1 to 0.98
OpenMaTrEx INSTALL

0 BEFORE YOU START

This document describes a manual installation. We are also working on a
shell script that installs all the mandatory software (it skips optional
software). You may want to download it and experiment with it. Get it from

http://openmatrex.org/OpenMaTrEx-installer-0.98

Make it executable, and run it with the installation directory as first
argument.

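The two steps in the last sentence can be sketched as a small shell
function. This is only an illustration: the name run_installer is
hypothetical, and the sketch assumes that the installer has already been
downloaded.

```shell
# Hypothetical helper (not part of OpenMaTrEx): make a downloaded installer
# executable and run it with the installation directory as first argument.
run_installer() {
    installer=$1
    target_dir=$2
    if [ ! -f "$installer" ]; then
        echo "installer not found: $installer" >&2
        return 1
    fi
    # A bare file name would be looked up in PATH, so anchor it to ./
    case $installer in
        */*) ;;
        *) installer=./$installer ;;
    esac
    chmod +x "$installer" && "$installer" "$target_dir"
}

# Example call (commented out because it needs the downloaded file):
# run_installer OpenMaTrEx-installer-0.98 /home/mikel/OpenMaTrEx-test/
```
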
1 INSTALLING REQUIRED SOFTWARE

The following assumes you are installing everything into a local directory.
In the examples, it is called the base directory and is located in
/home/mikel/OpenMaTrEx-test/ . Change it to suit your installation.

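If you prefer not to retype the path, one option is to keep it in a shell
variable. The sketch below is an assumption, not something the steps in
this document rely on: use_base_dir is a hypothetical helper, and BASE_DIR
is chosen only because section 2.3 uses the same name.

```shell
# Hypothetical helper: create the base directory if needed, move into it,
# and export BASE_DIR so that later commands can refer to it.
use_base_dir() {
    BASE_DIR=$1
    mkdir -p "$BASE_DIR" && cd "$BASE_DIR" || return 1
    export BASE_DIR
}

# Example call (adjust the path to your installation):
# use_base_dir /home/mikel/OpenMaTrEx-test
```
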
1.1 GIZA++

Download the latest version of GIZA++ and install it as follows:

wget http://giza-pp.googlecode.com/files/giza-pp-v1.0.5.tar.gz
tar -xvzf giza-pp-v1.0.5.tar.gz


Before compiling, we need to patch a file in GIZA++ (otherwise, it will not
compile under g++-4.4; see
http://code.google.com/p/giza-pp/issues/detail?id=11#c4):

wget http://www.openmatrex.org/giza-pp.patch
patch giza-pp/GIZA++-v2/file_spec.h <giza-pp.patch

Now we are ready to compile:

cd giza-pp/
make

Create a bin/ folder:

cd ..
mkdir bin

Copy the compiled executables to the bin/ folder:

cp giza-pp/GIZA++-v2/GIZA++ bin/
cp giza-pp/GIZA++-v2/snt2cooc.out bin/
cp giza-pp/mkcls-v2/mkcls bin/

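Before moving on, it may be worth checking that the three executables
really ended up in bin/, since a failed compilation only shows up later.
The following is a minimal sketch; missing_giza_tools is a hypothetical
name, not a tool shipped with GIZA++ or OpenMaTrEx.

```shell
# Hypothetical helper: print the GIZA++ executables that are missing from a
# bin/ directory. Prints nothing and returns 0 when all three are present.
missing_giza_tools() {
    bin_dir=$1
    status=0
    for tool in GIZA++ snt2cooc.out mkcls; do
        if [ ! -x "$bin_dir/$tool" ]; then
            echo "$tool"
            status=1
        fi
    done
    return $status
}

# Example call from the base directory:
# missing_giza_tools bin
```
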


1.2 IRSTLM

We need to download and compile IRSTLM, which we will use for the target
language model in Moses because, unlike SRILM, it is free/open-source
software.

Download the latest stable version from
http://sourceforge.net/projects/irstlm/ . You may also use a direct link.
OpenMaTrEx has been tested with version 5.50.02. To download it, type:

wget http://hlt.fbk.eu/sites/hlt.fbk.eu/files/irstlm/irstlm-5.50.02.tgz

Then unpack the .tgz file, for instance:

tar xzvf irstlm-5.50.02.tgz

And then go to the directory created:

cd irstlm-5.50.02

Then install it in your local directory (in the example,
/home/mikel/OpenMaTrEx-test/):

./configure --prefix=/home/mikel/OpenMaTrEx-test/
make
make install

1.3 Moses

Moses is available via Subversion from SourceForge. We will get the latest
revision from SVN. This may be risky sometimes! Alternatively, you can get
the latest release from http://sourceforge.net/projects/mosesdecoder .

cd ..
mkdir moses
svn co https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk moses


We have tested revision 3739: it works with OpenMaTrEx 0.97.1 and later, so
you may change the last line to

svn co https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk moses -r 3739

This will copy all of the Moses source code to your local machine. Now we
will compile it. Change /home/mikel/OpenMaTrEx-test/ to whatever you used
previously.

cd moses
./regenerate-makefiles.sh

You may get errors here such as "./regenerate-makefiles.sh: line 40: -I:
command not found" or "library used but LIBTOOL not defined in
configure.in". This happens when automake (version 1.9 or higher) and
libtool are not installed. Install them and proceed. In a Debian-based
GNU/Linux system, type "sudo apt-get install automake1.9 libtool".

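A quick way to find out beforehand whether these tools are available is to
look them up on the PATH. The sketch below is only an illustration:
missing_commands is a hypothetical helper, and the command names in the
example call (automake and libtoolize, which the libtool package normally
provides) are an assumption based on the error messages above. It does not
check version numbers.

```shell
# Hypothetical helper: print every command from the argument list that is
# not found on PATH; returns non-zero if at least one is missing.
missing_commands() {
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            status=1
        fi
    done
    return $status
}

# Example call (command names are an assumption, see above):
# missing_commands automake libtoolize
```
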
Then configure Moses with IRSTLM support and compile it:

./configure --with-irstlm=/home/mikel/OpenMaTrEx-test/
make


1.4 Moses scripts

Next, install the Moses scripts. Moses uses a set of scripts to support
training, tuning, and other tasks. The support scripts used by Moses are
"released" by a Makefile which edits their paths to match your local
environment.

OpenMaTrEx uses these scripts, among other things, to estimate a
maximum-likelihood bidirectional lexical translation table. This estimates
both target-to-source and source-to-target word translation probabilities.
To install these scripts, go to the scripts directory

cd scripts

and edit the Moses scripts Makefile to set TARGETDIR and BINDIR. In our
current example:

< TARGETDIR?=/exports/home/s0565741/inf_iccs_smt/hieu/bin
< BINDIR?=/exports/home/s0565741/inf_iccs_smt/hieu/bin

> TARGETDIR=/home/mikel/OpenMaTrEx-test/bin
> BINDIR=/home/mikel/OpenMaTrEx-test/bin

Make sure you have "=", not "?=".

(Note: This bin/ directory is where the GIZA++ executables were copied when
we installed it.)

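If you would rather not edit the Makefile by hand, the same change can be
made with sed. This is a sketch under stated assumptions: set_scripts_dirs
is a hypothetical helper, and it assumes that the two assignments start at
the beginning of a line, as in the excerpt above.

```shell
# Hypothetical helper: rewrite the TARGETDIR and BINDIR assignments in a
# Makefile so that both use "=" (not "?=") and point to the given directory.
set_scripts_dirs() {
    makefile=$1
    bin_dir=$2
    # "#" is the sed delimiter, so bin_dir must not contain "#" or "&".
    sed -E "s#^(TARGETDIR|BINDIR)[?]?=.*#\1=${bin_dir}#" "$makefile" \
        > "$makefile.tmp" && mv "$makefile.tmp" "$makefile"
}

# Example call from the Moses scripts directory:
# set_scripts_dirs Makefile /home/mikel/OpenMaTrEx-test/bin
```

The result is written to a temporary file and then moved into place, which
avoids relying on the in-place option of sed.
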
There seems to be a bug in one of the Moses scripts for MERT
(training/mert-moses.pl) that does not allow for an adequate initialization
of feature weights when there are more than 5 translation model (tm)
features. As a workaround until this is solved, we need to patch it as
follows:

wget http://openmatrex.org/mert-moses.pl.patch
patch training/mert-moses.pl <mert-moses.pl.patch

Then

make release

This will create a time-stamped folder named (for instance)
/home/mikel/OpenMaTrEx-test/bin/scripts-YYYYMMDD-HHMM, with released
versions of all the scripts. You will call these versions when training and
tuning Moses and also when running the alignment routines of OpenMaTrEx.
Some Moses training scripts also require a SCRIPTS_ROOTDIR environment
variable to be set. The output of make release should indicate this. Most
scripts allow you to override this by setting a -scripts-root-dir flag or
something similar.


1.5 Additional (preprocessing) scripts

These are a few scripts originally not included with Moses which are useful
for preparing data before training. They are used in many Machine
Translation contests and may be installed by

cd ../..
wget http://homepages.inf.ed.ac.uk/jschroe1/how-to/scripts.tgz
tar -xzvf scripts.tgz

[These scripts have no license whatsoever, although they are used as if they
were free software (though according to the Berne convention, they are
not!)]

Currently Moses includes versions of or alternatives to some of them and
installs them in the timestamped (scripts-YYYYMMDD-HHMM) directory. The most
important ones are used to tokenize an input and then de-tokenize it:

* tokenizer/detokenizer.perl (also recaser/detokenizer.perl, slightly
different)
* tokenizer/tokenizer.perl

There is no lowercase.perl script in Moses (there is however support for
de-true-casing and re-casing using a trained statistical model).


1.6 args4j

Older versions of OpenMaTrEx included source code for an old version of
Kohsuke Kawaguchi’s args4j library (https://args4j.dev.java.net/). args4j is
used to parse arguments when OpenMaTrEx components are invoked through the
command line (not operational yet, but required for compilation). In this
version, we have removed the code, and just require it to be installed
before compilation (see "2 INSTALLING OPENMATREX ITSELF" below).

1.7 Meteor evaluation metric

If scores from the Meteor evaluation metric are desired in addition to BLEU
and NIST, take the following steps. If OpenMaTrEx detects that Meteor is
installed, it will use it.

Make sure you are in the base OpenMaTrEx directory (in the example,
/home/mikel/OpenMaTrEx-test/ ).

Download the Meteor tarball:

wget http://www.cs.cmu.edu/~alavie/METEOR/old/meteor-1.0-jar.tgz

Untar the tarball:

tar -zxvf meteor-1.0-jar.tgz

Now OpenMaTrEx can produce Meteor scores when evaluating MT outputs. Note
that the TERp paraphrases are not used by the Meteor metric in this
configuration. If you want to use the TERp paraphrases, please consult the
README file in Meteor's directory.

[Be advised that Meteor has a rather strange license (the CMU license) which
is not compatible with the GPL. If you plan to distribute OpenMaTrEx
derivatives, be sure not to include this code.]


2 INSTALLING OPENMATREX ITSELF

2.1 Downloading OpenMaTrEx and args4j

Let's go back to the directory where we are installing everything, the base
directory. In the examples,

cd /home/mikel/OpenMaTrEx-test/

Download the latest version of OpenMaTrEx (see www.openmatrex.org), for
instance:

wget http://www.openmatrex.org/OpenMaTrEx-0.98.tgz

and unpack it:

tar xzvf OpenMaTrEx-0.98.tgz

This will create a directory (for instance, OpenMaTrEx-0.98/) where all of
the OpenMaTrEx code will reside. We'll go there:

cd OpenMaTrEx-0.98/

In order to compile OpenMaTrEx properly, please download the latest version
of args4j.jar from the following location:

http://download.java.net/maven/1/args4j/jars/

NOTE: the current version of OpenMaTrEx has been tested with
args4j-2.0.9.jar, so you can just download it into the lib directory:

cd lib
wget http://download.java.net/maven/1/args4j/jars/args4j-2.0.9.jar
cd ..

NOTE: THIS STEP IS IMPORTANT, AS WITHOUT IT OpenMaTrEx.jar WILL NOT COMPILE.

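Since a missing jar only shows up as a compilation failure, a small check
before building can save time. This is a sketch, not part of the
OpenMaTrEx build: have_args4j is a hypothetical helper, and it only tests
that some args4j jar file exists in lib/, not that it is a working version.

```shell
# Hypothetical helper: succeed only if an args4j jar is present in the
# given directory (lib/ by default).
have_args4j() {
    lib_dir=${1:-lib}
    for jar in "$lib_dir"/args4j-*.jar; do
        # If nothing matches, the pattern itself is tested and -f fails.
        [ -f "$jar" ] && return 0
    done
    echo "no args4j jar found in $lib_dir" >&2
    return 1
}

# Example call from the OpenMaTrEx-0.98/ directory:
# have_args4j lib && echo "args4j found, ready to compile"
```
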
2.2 Compiling the Java classes

Inside the OpenMaTrEx.tgz package, a build.xml file is provided in the
top-level directory of OpenMaTrEx so that it can be built simply by
invoking ant. The resulting OpenMaTrEx.jar contains all the relevant
classes, some of which will be invoked using a shell script, OpenMaTrEx (see
below). Just use the following command:

ant dist

2.3 Setting the PATH in the 'OpenMaTrEx' shell script

Once the above are installed, set BASE_DIR to the base directory (the home
directory of the OpenMaTrEx installation). In the example,

export BASE_DIR=/home/mikel/OpenMaTrEx-test

and make sure that your OPENMATREX_DIR is correct, for instance:

export OPENMATREX_DIR=${BASE_DIR}/OpenMaTrEx-0.98/

Then set

export MOSES_SCRIPTS_DIR=${BASE_DIR}/bin/scripts-YYYYMMDD-HHMM/

where YYYYMMDD-HHMM is the timestamp of a release of the Moses scripts. Be
sure to change it to the actual value in the folder name.

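If "make release" has been run more than once, several time-stamped
folders will exist. Because the YYYYMMDD-HHMM format sorts chronologically
when sorted as text, the newest one can be picked automatically. The
sketch below is an illustration only: latest_scripts_dir is a hypothetical
helper, and it assumes that the folders sit directly under the bin/
directory, as in the example above.

```shell
# Hypothetical helper: print the most recent scripts-YYYYMMDD-HHMM/ folder
# found in a bin/ directory, or return non-zero if there is none.
latest_scripts_dir() {
    latest=
    # The shell expands the pattern in sorted order, so the last match
    # carries the latest timestamp.
    for dir in "$1"/scripts-*/; do
        [ -d "$dir" ] && latest=$dir
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Example call (BASE_DIR as set above):
# export MOSES_SCRIPTS_DIR=$(latest_scripts_dir "${BASE_DIR}/bin")
```
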
The Moses decoder is located in "moses-cmd/src/moses", in the directory
where Moses was compiled.

One can change or disable the default maximum heap size by editing or
commenting out the following line in the 'OpenMaTrEx' shell script:

export JAVA_OPTIONS=-Xmx4096m

Adding the option -XX:-UseGCOverheadLimit may also help if one wants
to allow Java extra time for garbage collection when the heap is small.

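With both options, the line might look like the sketch below. It assumes
that the 'OpenMaTrEx' shell script passes the value of JAVA_OPTIONS to java
as separate words; check the script before relying on this.

```shell
# Maximum heap of 4096 MB, plus the option that disables the GC overhead
# limit. The quotes are needed because the value contains a space.
export JAVA_OPTIONS="-Xmx4096m -XX:-UseGCOverheadLimit"
```
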
2.4 Building the Local Copy of Javadocs for OpenMaTrEx (Optional)

The build.xml used to compile the OpenMaTrEx package also has a built-in
target for preparing javadocs. Hence you can build the javadocs just by
running the following command in the OpenMaTrEx root directory where the
build.xml file resides:

ant javadoc

This should create a doc directory within the OpenMaTrEx root directory
which will contain the javadocs.

Mikel L. Forcada