D5.6: Transfer Selection Support 

 
PANACEA is an EC-funded project under Grant Agreement 248064

 
SEVENTH FRAMEWORK PROGRAMME 

THEME 3 

Information and Communication Technologies 

 
PANACEA Project 
Grant Agreement no.: 248064 

 
Platform for Automatic, Normalized Annotation and 

Cost-Effective Acquisition 

of Language Resources for Human Language Technologies 

 
D5.6 

Transfer Selection Support 

 
Dissemination Level:  Public 

Delivery Date: June 15th, 2012

Status – Version: Final v1.0 

Author(s) and Affiliation: Gregor Thurmair, Vera Aleksić (Linguatec) 

 
 

This document is part of technical documentation generated in the PANACEA Project, Platform 
for Automatic, Normalized Annotation and Cost-Effective Acquisition (Grant Agreement no. 
248064). 
 

This document is licensed under a Creative Commons Attribution 3.0 Spain License. To view
a copy of this license, visit http://creativecommons.org/licenses/by/3.0/es/.

 
Please send feedback and questions on this document to: iulatrl@upf.edu 

TRL Group (Tecnologies dels Recursos Lingüístics), Institut Universitari de Lingüística 
Aplicada, Universitat Pompeu Fabra (IULA-UPF) 

 

 
Relevant Documents 

 
Panacea Deliverable D5.4: English-French and English-Greek bilingual dictionaries for the 

Environment and Labour Legislation domains 

 
Panacea Deliverable D5.5: English-French and English-Greek bilingual dictionaries for the 

Environment and Labour Legislation domains 

 
Panacea Deliverable D5.7: Sample of Transfer Entries produced 

 
Panacea Deliverable 7.4: Evaluation Report (Third cycle) 



 
Table of Contents 

1. Introduction
   1.1 Types of Transfer
   1.3 Related Work
   1.4 Approach
2. The Lexicon
   2.1 The LinguaDict Lexicon
   2.2 Lexicon Preparation
3. The Corpus
   3.1 Corpus Collection
   3.2 Corpus Processing
   3.3 Subcorpus creation
4. Creation of the Lt-Xfr Lexicons
   4.1 Conceptual Lexicon
   4.2 Probability Lexicon
5. Test and Evaluation
   5.1 Test data
   5.2 Test systems
   5.3 Test results
6. PANACEA Integration
   6.1 Formats
7. Assessment
   7.1 Relevance
   7.2 Quality
   7.3 Extensions
8 Citations

 

 
1. Introduction 

The objective of Task 5.3, and of the tool LT-Xfr created for it, is to find automatic means
for transfer selection: if a source word has several translations, the right one for a given
context must be found. This problem becomes more important the larger the dictionary is, and
it occurs much more frequently than the case where there is no translation at all.

Of course, the problem is only relevant for 1:n transfers; if a source term has exactly one
translation, there is no selection problem.

The basis of the investigation is a transfer dictionary; this is a bilingual and directed resource. 

The terminology used in the following sections is:

• an entry (or transfer) is a combination of a source and a target term (defined by:
  <source-lemma, source-part-of-speech, target-lemma, target-part-of-speech>)
• a package is a set of entries sharing a common source side (defined by: <source-lemma,
  source-part-of-speech>). A package consists of at least one entry; in the present case, only
  packages containing more than one entry are of interest. Packages differ depending on the
  language direction; therefore bilingual lexicons are directed.
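The two terms can be illustrated with a minimal Python sketch. The names `Entry`, `build_packages` and `ambiguous_packages` are hypothetical and not part of LT-Xfr; the sample entries reuse examples from this document.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    """One transfer: <source-lemma, source-POS, target-lemma, target-POS>."""
    src_lemma: str
    src_pos: str
    tgt_lemma: str
    tgt_pos: str


def build_packages(entries):
    """Group entries by their source side <source-lemma, source-POS>."""
    packages = defaultdict(list)
    for entry in entries:
        packages[(entry.src_lemma, entry.src_pos)].append(entry)
    return dict(packages)


def ambiguous_packages(packages):
    """Keep only 1:n packages, i.e. those with more than one entry."""
    return {key: ents for key, ents in packages.items() if len(ents) > 1}


# Illustrative de->en entries based on examples used in this document.
entries = [
    Entry("Schuld", "noun", "guilt", "noun"),
    Entry("Schuld", "noun", "debt", "noun"),
    Entry("Zelle", "noun", "cell", "noun"),
]
packages = build_packages(entries)
selected = ambiguous_packages(packages)
```

Here only the package for ‘Schuld’ is selected, because ‘Zelle’ has a single transfer and poses no selection problem.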

The context of the investigation is knowledge-driven (rule-based) MT. In SMT, by contrast,
transfer selection depends fully on the presence of the translations in the training corpus;
this makes transfer selection in SMT highly dependent on the domain (or even on the training
text). It will be shown that even in large training sets, many entries do not occur at all.

Another difference is that in the present approach the means to disambiguate transfers lie on
the source side, whereas in SMT they lie on the target side (the target LM selects the best
from a set of transfer options coming from the phrase table). SMT systems can therefore react
better to local contexts, provided the transfers are in the training set.

The original title of the work package was ‘transfer rule creation’, following the paradigm of
rule-based MT. However, it turned out that the only ‘rule’ applied in this work is to look up
the conceptual context, and that the task is to provide significant context clues; therefore a
broader title was chosen for the deliverable than one focusing only on the rule aspect.

1.1 Types of Transfer 

A further relevant distinction is the type of transfer to be considered. We distinguish
between the following transfer types:

• Structural transfer is a change in the target which is independent of the lexical material
  involved. Example: complex prenominal adjectives in German must be represented in English as
  relative clauses.
• Lexical transfer depends on the lexicon entries involved. Several cases exist here:
  o Simple lexical transfer is just a replacement of a word by its translation: (en)
    ‘incineration’ -> (de) ‘Einäscherung’.
  o Complex lexical transfer takes additional information to disambiguate, and performs tests
    to find the correct transfer.
    ▪ Local transfer considers features on the node which must be transferred (e.g. number:
      (de) ‘Schuld’ -> (en) ‘guilt’ if singular, but -> (en) ‘debt’ if plural).
    ▪ Contextual transfer inspects the context of the node to be transferred. This can be
      done on several levels: lexical context [Frye 2012], syntactic context (like
      transitivity) [Thurmair 1990], semantic context (e.g. (en) ‘eat’ -> (de) ‘essen’ for
      humans, -> (de) ‘fressen’ for animals), pragmatic context (domain features, locale
      etc.) and others.

The present work deals with conceptual context, which is a form of lexical transfer based on
lexical context but not tied to specific syntactic structures. It looks at the concepts
surrounding a translation candidate, and determines its transfer depending on these concepts.
E.g. (en) ‘interest’ -> (de) ‘Zins’ in the context of ‘money’, ‘pay’, ‘loan’ etc., but (en)
‘interest’ -> (de) ‘Interesse’ in other contexts like ‘sports’, ‘activity’, ‘research’ etc.
The challenge is to find such contexts automatically, using parallel corpus data.

1.3 Related Work 

1. There are approaches to word sense disambiguation which use bilingual material [e.g.
Agirre/Edmonds, ed., 2006]. However, word senses and translations do not run in parallel;
polysemous words like (de) ‘Zelle’ transfer all their meanings into the target (en) ‘cell’.
The goal of the current approach is not to disambiguate word senses but to find the best
transfers¹.

2. There is significant work on the automatic creation of transfer rules, recently cf. [Tyers
et al. 2012]. However, this work looks at close contexts of the transfer candidates (windows
of trigrams to pentagrams); such windows are rather small and do not always contain the
information relevant for disambiguation, and there will be significant overhead in the rules
once the lexicon gets bigger.

3. A similar approach to the disambiguation of source language contexts was presented in
[Thurmair 2005], where it was called ‘neural transfer’. Only monolingual corpora were used
there, the contexts of translation candidates were disambiguated by manual annotation of
training data, and the lookup context was extended from sentences to paragraphs; very high
accuracy was reported. The current approach does automatic context disambiguation from
parallel corpora, and uses only sentential contexts.

4. There are approaches which do disambiguation on the target side, not on the source side.
This is the current paradigm in SMT [Koehn 2010], and it was also tried in METIS-II [Carl et
al., 2008]. This approach must carry all possible transfers of all source words into the
target, and then try to disambiguate there. This creates a massive overhead, which could be
reduced by using some source-language information.

1.4 Approach 

The approach taken here tries to model human intuition which, looking at the context words of
a term, is able to determine how it should be translated. As this intuition works quite
successfully for humans, the approach attempts to identify such conceptual contexts, based on
parallel corpora.

The task takes two resources:

• a lexicon containing possible transfers of a given word; such a lexicon can e.g. result from
  a bilingual term extraction component as described in [Thurmair/Aleksić 2012], from legacy
  systems, or from other available data
• a parallel corpus which makes it possible to identify contexts for certain translations

It produces a resource (a corpus-based add-on to a transfer lexicon) which can be queried at
runtime as an additional source of information. This resource is static and logically
independent of the MT system, and can be used for both ‘deep’ and ‘shallow’ MT.

¹ A similar approach towards transfer can be found in [Brown et al. 1991], but they use just
one contextual ‘informant’.

The task is executed in the following way:

• Take a bilingual dictionary, and identify the packages it contains; these packages are the
  target objects of the disambiguation effort.
• For all source and target lemmata in the packages, index the bilingual corpus for the
  sentences in which they both occur (on source and target side).
• For all translations of each source entry of each package, create subcorpora consisting of
  the sentence pairs containing the source lemma and the target lemma of this entry. This step
  subdivides the monolingual source and target corpora into subsets of parallel sentences in
  which the source term has the same translation.
• Try to identify significant co-occurrences in the source language subcorpus which are
  specific to this translation. The goal is to determine a cluster of source language words
  which indicates a certain transfer selection.

The result of the task will be a resource which, for each translation in a package, gives a 

vector of contexts which trigger the translation in question. At runtime, this resource will be 

queried, by matching the context of the source language candidates with all possible 

translations, and selecting the best matching cluster and its related translation. 

The test consists of creating a test set containing sentence contexts with ‘right’ (reference)
translations for the test terms, analysing these sentences and their contexts, and comparing
the transfer proposals with the transfers used by the reference.

The tool is called ‘LT-Xfr’, and has been developed here for the German-to-English language
direction.



 
2. The Lexicon 

For the investigations, a dictionary designed for human lookup was taken and modified for
machine processing. The LinguaDict lexicon [http://www.linguatecapps.com/linguadict] was
selected in order to extend the coverage of transfers beyond normal MT lexicons, and to have a
bilingual lexicon of realistic size. Compared to MT lexicons, lexicons for human lookup
contain many more transfers, and give clues on how to select the best transfer in a given
situation. The challenge is to find such transfer disambiguation clues by corpus analysis.

 
direction   no. entries   no. SL terms   no. transfers/term
de->en          213,200        144,900                 1.47
en->de          213,200        136,100                 1.57

Tab. 2-1: Size of LinguaDict

 
As the transfer selection is directed, i.e. specific for a given language direction, the tests were 

made for German -> English. 

2.1 The LinguaDict Lexicon 

The LinguaDict lexicon in its German-English version is a state-of-the-art lookup resource; it 

is available both for online and offline (on mobile devices) lookup. It is a bilingual and 

directed lexicon; the language directions are built on the fly from a common data base. 

It consists of single and multiword entries, and offers part-of-speech, gender and inflection 

information, and a pronunciation for most entries. Examples are given in Fig. 2-2. 

 
Fig. 2-2: Example of LinguaDict entries: GUI (left), entry examples (right); a typical example for the transfer selection 

problem here is the entry for ‘consult’ 

 
Overall, the lexicon contains 212,000 entries, plus about 1,000 entries without transfer
(links for strong verbs etc.).



 
2.2 Lexicon Preparation 

2.2.1 Preparatory Steps 

The lexicon was prepared in the following way:

• All packages with only a single transfer were removed. For these packages, the problem of
  transfer selection does not exist. After this, 104,200 entries remained.
• All function word entries were removed, as they need a different type of transfer selection,
  and are much more interwoven with the MT system internals.
• The treatment of all entries containing multiwords on either the source or the target side
  was postponed, to reduce the initial complexity of the task. They need to be integrated
  later.
• All entries containing part-of-speech changes were removed, as this is syntactic
  information: (de-adj) ‘sicher’ -> (en-adj) ‘secure’ and (en-adv) ‘securely’. Entries of this
  kind are in lexicons if the adverb formation is not completely regular. However, to
  determine the right part-of-speech selection, syntactic analysis is needed, which goes
  beyond the scope of the current Lt-Xfr work, and cannot easily be modelled by an approach
  for conceptual transfer.
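The preparation steps can be sketched as a small filter, under several assumptions that are not taken from the source: entries are plain 4-tuples, content words carry the hypothetical tags `noun`, `verb` and `adj`, and a multiword is recognised by a blank in the lemma. Unlike the order listed above, the sketch applies the package-size filter last, so that no package is left with a single transfer after the other filters.

```python
from collections import defaultdict

# Hypothetical content-word tags; the actual tag set is not given in the source.
CONTENT_POS = {"noun", "verb", "adj"}


def keep_entry(src_lemma, src_pos, tgt_lemma, tgt_pos):
    """Entry-level filters corresponding to the preparation steps."""
    if src_pos not in CONTENT_POS:            # function words are out of scope
        return False
    if " " in src_lemma or " " in tgt_lemma:  # multiwords are postponed (assumed test)
        return False
    if src_pos != tgt_pos:                    # part-of-speech changes need syntax
        return False
    return True


def prepare_lexicon(entries):
    """Return the packages that still offer a choice between several transfers."""
    packages = defaultdict(list)
    for entry in entries:
        if keep_entry(*entry):
            packages[(entry[0], entry[1])].append(entry)
    return {src: ents for src, ents in packages.items() if len(ents) > 1}


entries = [
    ("Schuld", "noun", "guilt", "noun"),
    ("Schuld", "noun", "debt", "noun"),
    ("sicher", "adj", "secure", "adj"),
    ("sicher", "adj", "securely", "adv"),   # part-of-speech change: removed
    ("und", "conj", "and", "conj"),         # function word: removed
]
prepared = prepare_lexicon(entries)
```

In this toy input only the ‘Schuld’ package survives: ‘sicher’ is reduced to a single transfer once the adverb entry is dropped.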

After these operations, 27,000 packages with 71,400 entries remained for the investigation.

Table 2-3 gives the details on the lexicon used for the following analysis. 

 
part of speech   no. packages   no. entries   no. transfers / entry
adjectives              6,900        18,200                    2.83
nouns                  15,600        35,400                    2.27
verbs                   4,500        17,800                    3.26
total                  27,000        71,400                    2.63

Tab. 2-3: Packages in the lexicon

 
2.2.2 Lexicon Inspection 

A short inspection of the lexicon entries reveals that conceptual transfer will never have
full coverage, and that a multitude of transfer selection strategies is required for proper
transfer, as many transfers cannot be disambiguated on a purely conceptual level:

• locale: (de) ‘Geschmack’ -> (en-us) ‘flavor’ / (en-uk) ‘flavour’
• spelling: (en) ‘adaptable’ -> (de-old) ‘anpaßbar’ and (de-new) ‘anpassbar’
• register: (en) ‘anglophobe’ -> (de-lit) ‘anglophob’ and (de-coll) ‘englandfeindlich’;
  (en) ‘adiposity’ -> (de-lit) ‘Adipositas’ and (de-coll) ‘Verfettung’
• topic: (en) ‘case’ -> (de-legal) ‘Fall’ and (de-mechan) ‘Gehäuse’

The lexicon just provides the different alternatives in such cases; it is the task of the
global system to identify which one to select. This is often done through user settings
(locale, topic etc.), or by running automatic tools for topic identification, register and
spelling selection etc.

As many of the cases just presented could be considered synonyms on a semantic level, these
aspects are not considered in the following analysis; the focus here is on transfer selection
based on conceptual contexts².

² It could have been an option to normalise such variants before cluster building; this could
have resulted in better clusters.



 
3. The Corpus 

For the remaining packages of the lexicon, automatic contextual disambiguation is attempted.
To do this, a parallel corpus is used. The goal is to find conceptual contexts in the corpus
which allow the disambiguation of translation alternatives.

3.1 Corpus Collection 

The corpus used for the LT-Xfr experiments consists of parallel sentences collected from 

different domains; details are given in Tab. 3-1: 

 
Domain           no. sentences
automotive              47,485
dgt                    530,760
europarl             1,739,154
health&safety           57,155
jrc-acquis           1,239,731
e-books                 82,635
statmt_dev              15,134
statmt_news            136,227
total                3,848,281

Tab. 3-1: Parallel corpus used (sentences)

 
Overall, 3.8 million German-English parallel sentences were used for the experiment.

3.2 Corpus Processing 

The corpus data were processed in the following way: 

 
Step 1: Format conversion 

All corpus sentences were converted into the PANACEA TO format: the text was converted to
UTF-8, and <s> tags were inserted with unique sentence ids and a language attribute. Errors in
the original sentence segmentation were not corrected, as the sentences were already
parallelised, and the sentence alignment could have been lost.

 
Step 2: Lemmatisation and tagging 

All sentences of the corpus underwent lexical analysis, i.e. they were tokenised and 

lemmatised as described in [Thurmair et al. 2012]. Cases of homography, as far as related to 

content words, were disambiguated using a simple tagger. This step produced the <textform, 

lemma, POS> triples to work with later on. 

 
Step 3: Monolingual Indexing 

Each <lemma, POS> pair of the corpus which also occurred in the lexicon test set was indexed
(lemma -> sentence ids), and its frequency was computed³. The result was two index files (one
for German, one for English), containing lemmata pointing to sentence ids.

 
Step 4: Bilingual indexing 

The two index files were merged: for each transfer of each package, all sentence ids common to
the source and the target lemma were collected into a bilingual index file; an example is
given in fig. 3-2.

Fig. 3-2: Example of bilingual index file (automotive corpus). Many entries have no sentence
in common, i.e. there is no evidence for this transfer in the corpus.

 
For many entries, no bilingual sentences could be found, mainly because there were no 

correspondences in the corpus. These entries had to be eliminated. 
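Steps 3 and 4 can be sketched as follows. This is a minimal illustration and not the actual LT-Xfr code: the function names, the in-memory dictionaries and the toy sentences are assumptions, and the frequency computation is omitted.

```python
from collections import defaultdict


def build_index(sentences):
    """Map each (lemma, POS) pair to the set of sentence ids containing it.

    `sentences` maps a sentence id to a list of (lemma, POS) pairs.
    """
    index = defaultdict(set)
    for sent_id, tokens in sentences.items():
        for lemma_pos in tokens:
            index[lemma_pos].add(sent_id)
    return index


def bilingual_index(packages, src_index, tgt_index):
    """For each transfer of each package, collect the shared sentence ids."""
    result = {}
    for src, targets in packages.items():
        result[src] = {
            tgt: sorted(src_index.get(src, set()) & tgt_index.get(tgt, set()))
            for tgt in targets
        }
    return result


# Toy parallel corpus: a sentence pair shares one id on both sides.
de = {
    1: [("Schuld", "noun"), ("Bank", "noun")],
    2: [("Schuld", "noun"), ("Gericht", "noun")],
}
en = {
    1: [("debt", "noun"), ("bank", "noun")],
    2: [("guilt", "noun"), ("court", "noun")],
}
packages = {("Schuld", "noun"): [("debt", "noun"), ("guilt", "noun"), ("blame", "noun")]}
bi = bilingual_index(packages, build_index(de), build_index(en))
```

The transfer to ‘blame’ ends up with an empty id list, which corresponds to the entries without corpus evidence mentioned above.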

3.3 Subcorpus creation 

From the bilingual index file, subcorpora were created in the following way: 

 
Step 1: Corpus collection 

For each transfer of a package, the common subset of sentence ids was identified, and the
respective sentences were collected. As output, one file per package was built, containing:

• for each package: the number of transfers for this package, and the sum of all sentences
  used therein
• for each transfer: the sentences in which it occurred (i.e. for the sentence ids in fig.
  3-2, the real sentences were fetched)

This operation left 2.96 million sentences which contained relevant lemmata.

³ Indexing was not done on textforms but on lemmata.

 
Step 2: Word alignment 

In order to avoid accidental co-occurrence of an SL-TL pair, the subcorpora were filtered
using the criterion of word alignment: only SL-TL word pairs which could be word-aligned were
kept in the data⁴. For word alignment, GIZA++ was used. All sentence pair candidates which
could not be word-aligned were removed from the subcorpora.
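The filter criterion can be sketched as below. The sketch assumes that the aligner output has already been parsed into a set of (source position, target position) links per sentence pair; the parsing of the GIZA++ output itself is not shown, and all names and sample data are hypothetical.

```python
def is_aligned(src_tokens, tgt_tokens, links, src_lemma, tgt_lemma):
    """True if some occurrence of src_lemma is linked to an occurrence of tgt_lemma.

    `links` is a set of (source position, target position) pairs, assumed to
    have been read from the word aligner's output beforehand.
    """
    return any(
        src_tokens[i] == src_lemma and tgt_tokens[j] == tgt_lemma
        for i, j in links
    )


def filter_subcorpus(pairs, src_lemma, tgt_lemma):
    """Keep only the sentence pairs in which the transfer is word-aligned."""
    return [
        (src, tgt, links)
        for src, tgt, links in pairs
        if is_aligned(src, tgt, links, src_lemma, tgt_lemma)
    ]


pairs = [
    # 'Schuld' (position 1) is linked to 'debt' (position 1): kept.
    (["die", "Schuld", "zahlen"], ["the", "debt", "pay"], {(0, 0), (1, 1), (2, 2)}),
    # Both lemmata occur, but 'Schuld' is linked to 'guilt': accidental co-occurrence.
    (["Schuld", "und", "Kredit"], ["guilt", "and", "debt"], {(0, 0), (1, 1), (2, 2)}),
]
kept = filter_subcorpus(pairs, "Schuld", "debt")
```

Only the first pair is kept; the second pair contains both lemmata but would wrongly support the transfer ‘Schuld’ -> ‘debt’.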

This operation removed another 280,000 sentences from the text base, leaving 2.68 million
sentences for the following steps. It would be worth looking at the removed sentences; they
could result either from real accidental co-occurrences, or from word alignment errors.

More importantly, this step also removed entries, and whole packages, for which no word 

alignment could be found, either because they did not co-occur in any sentence pair, or 

because they could not be word-aligned. 

Table 3-3 shows the remaining data sets. 

 
part of speech   original packages   after bilingual indexing   after word alignment
adjectives                   6,900                      4,670                  1,240
nouns                       15,600                     11,360                  3,690
verbs                        4,500                      3,930                  1,680
total                       27,000                     19,960                  6,610

Tab. 3-3: Data sets (packages) available at the beginning, after bilingual indexing, and after
word alignment.

 
It can be seen that only 6,600 packages out of 27,000 could be used for the experiment. So,
even with a large parallel corpus, parallel data for contextual transfer selection can be
provided for only about 25% of the packages. As a consequence, a working system must provide
additional means of transfer selection, beyond parallel-corpus-driven automatic extraction.

 
Step 3: Classification 

The resulting subcorpora were classified according to the data which were available for
disambiguation:

• class 1: all transfers of a package have at least five sentences where this transfer is used
• class 2: each transfer of a package has at least one sentence in which it occurs
• class 3: a package contains one or several transfers with no occurrence in any sentence
  pair.
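The classification can be written as a short function. The sketch treats the three classes as mutually exclusive (a package that qualifies for class 1 is not also counted in class 2), which matches the percentages in Tab. 3-4 summing to 100%; the function name and the sample counts are illustrative.

```python
def classify_package(sentence_counts):
    """Assign a package to class 1, 2 or 3 from its per-transfer sentence counts."""
    counts = list(sentence_counts.values())
    if any(n == 0 for n in counts):
        return 3   # at least one transfer never occurs in the subcorpus
    if all(n >= 5 for n in counts):
        return 1   # every transfer has at least five sentences
    return 2       # every transfer occurs, but at least one fewer than five times


# Invented sentence counts per transfer, for illustration only.
well_covered = classify_package({"debt": 12, "guilt": 7})
sparse = classify_package({"debt": 12, "guilt": 1})
incomplete = classify_package({"debt": 12, "blame": 0})
```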

Of the remaining subcorpora (one per package), only about one third have at least five
sentences per transfer. Nouns are slightly better represented than verbs and adjectives, cf.
Tab. 3-4.

            adj            noun           vrb⁵           total
packages    1244           3693           1677           6614
class 1      301  24.2%    1370  37.1%     447  26.6%    2118  32.0%
class 2      667  53.6%    1789  48.4%     859  51.2%    3315  50.1%
class 3      276  22.2%     534  14.5%     371  22.2%    1181  17.9%

Tab. 3-4: Distribution of corpus data for the different packages: Assignment to classes

⁴ This was possible as the Xfr lexicon contains only single-word transfers.

⁵ Note that verb statistics are not fully accurate as verbs with separated prefixes (‘kommt ...
an’) are not correctly lemmatised.

 
From these subcorpora, about 1,000 sentences of class 1 were set aside to be used as a test
set; the rest was used for training.



 
4. Creation of the Lt-Xfr Lexicons

The analysis of the package coverage showed that sufficiently many contexts would only be 

available for one third of the translation entries resulting from subcorpus collection. To 

provide disambiguation means for the other entries, additional information had to be 

provided. 

Therefore a strategy was adopted which is based on two kinds of information:

• conceptual context clusters, as the original approach suggested. These data are collected in
  a conceptual lexicon (ConcLex);
• a translation based on frequency information as a fallback: in case no cluster is available,
  different probability measures are used for transfer selection. The probabilities are
  collected in a probability lexicon (ProbLex).

Both lexicons are consulted at runtime, in sequential order. 

4.1 Conceptual Lexicon 

The conceptual lexicon is created by analysing the subcorpus attached to each entry for
co-occurrences: all lemmata of all sentences of the subcorpus are compared, and the ones with
the best co-occurrence score are taken. Experiments to restrict the resulting clusters to a
certain size, or to use a threshold, showed that data sparsity makes it necessary to leave
essentially all candidates in the clusters.

Also, lemmata co-occurring with several transfers of a package were not eliminated but left in
the clusters, as they could still help to disambiguate from other translation candidates, and
as eliminating them would leave many transfers with very few contexts.

In addition, experiments to include the distance between the co-occurring word and the lemma
as an additional weight, to improve precision, have been postponed; their influence would only
be marginal⁶.

The output of the component which builds the conceptual lexicon is the lexicon itself 

(ConcLex). It gives, for each source language package defined by <lemma, part-of-speech>, a 

list of translations, consisting of: the translation, its part-of-speech, and an optional cluster of 

variable size, consisting of pairs of <sourcelanguage-lemma, weight>, the weight giving the 

strength of the co-occurrence.  

Such a cluster can be matched to the context lemmata of an input sentence at runtime, and 

their similarity can be computed. 

An example is given in fig. 4-1. 

In case a translation has no example sentences, the conceptual cluster for this translation must 

be left empty. To be able to still include them in the transfer selection, a fallback strategy was 

implemented using probabilities. 
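How a cluster might be matched against a sentence context can be sketched as follows. The source does not specify the similarity measure, so the sketch assumes the simplest one: the sum of the weights of all cluster lemmata found in the context. The package content, the weights and the function names are invented for illustration.

```python
def cluster_score(cluster, context_lemmata):
    """Sum the weights of all cluster lemmata that occur in the context (assumed measure)."""
    context = set(context_lemmata)
    return sum(weight for lemma, weight in cluster.items() if lemma in context)


def select_by_context(package, context_lemmata):
    """Return the best-scoring translation, or None if no cluster matches at all."""
    best, best_score = None, 0.0
    for translation, cluster in package.items():
        score = cluster_score(cluster, context_lemmata)
        if score > best_score:
            best, best_score = translation, score
    return best


# Hypothetical ConcLex package for (de) 'Schuld'; lemmata and weights are invented.
package = {
    "debt": {"Bank": 0.8, "zahlen": 0.6, "Kredit": 0.7},
    "guilt": {"Gericht": 0.9, "Angeklagter": 0.7},
    "blame": {},   # no example sentences, hence an empty cluster
}
context = ["der", "Angeklagter", "bestreiten", "sein", "Schuld", "vor", "Gericht"]
choice = select_by_context(package, context)
no_match = select_by_context(package, ["Schuld"])
```

A result of `None` marks the case in which no cluster gives evidence, which is where the probability-based fallback takes over.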

 
⁶ An exception may be certain prepositions indicating strict subcategorisation; to be
investigated.

 
Fig. 4-1: Example of conceptual lexicon 

 
4.2 Probability Lexicon 

In case the conceptual transfer does not lead to a result (and this case is rather likely given the 

amount of transfers without any context because there were no sentences), a fallback strategy 

is created, which consists in computing a translation probability score.  

Previous experiments in the creation of the LinguaDict lexicon have shown that simply using 

the (target monolingual) corpus frequency of a translation is not the best option: We want to 

know how often the target lemma occurs as translation of a given source lemma. Otherwise 

target lemmata which are very frequent (like ‘be’ or ‘have’) disturb the transfer selection.  

Also, a relevant factor is for how many words a given target lemma is a translation: If a target 

lemma has high frequency as translation of only one source word, then this is much more 

important than if the frequency results from the fact that it is the translation of e.g. five source 

words. 

Therefore a more complex approach than simple target corpus frequency was taken: the
translation probability consists of three scores, differing in the reference used to compute
them. These scores are:

• Package probability: probability of a given translation relative to the other translations
  of this package: number of sentences for the given translation divided by the total number
  of sentences in this package (0 for all transfers without a sentence in the subcorpus).
• Target probability: probability of a given translation relative to other source terms (i.e.
  for how many SL lemmata is this a possible transfer?) (0 for all terms which are in no
  package).
• Corpus probability: probability that this translation is used at all in the target language:
  number of occurrences of the TL lemma divided by the number of lemmata in the total TL
  corpus.

Querying of the probability lexicon is done sequentially, i.e. if a score is zero then the
next ‘weaker’ score is taken. Finally, a corpus probability is nearly always available⁷.
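The package probability and the sequential lookup can be illustrated with a small sketch. Two points are assumptions rather than statements of the source: the target probability is treated as a given value, because its exact formula is not stated here, and the sequential lookup is read as "use the strongest score type for which at least one candidate is non-zero, and take the candidate with the highest value". All names and numbers are invented.

```python
def package_probability(sentences_for_transfer, sentences_in_package):
    """Sentences for this translation divided by all sentences of the package."""
    if sentences_in_package == 0:
        return 0.0
    return sentences_for_transfer / sentences_in_package


def select_by_probability(candidates):
    """Pick a translation from (package, target, corpus) probability triples.

    Assumed reading of the sequential lookup: use the strongest score type for
    which at least one candidate is non-zero, and take its maximum.
    """
    for level in range(3):
        scored = {t: p[level] for t, p in candidates.items() if p[level] > 0}
        if scored:
            return max(scored, key=scored.get)
    return None


# Invented sentence counts for one package: 30 + 10 + 0 sentences.
counts = {"debt": 30, "guilt": 10, "blame": 0}
total = sum(counts.values())
# The target and corpus probabilities below are invented placeholder values.
candidates = {
    "debt": (package_probability(counts["debt"], total), 0.5, 0.0004),
    "guilt": (package_probability(counts["guilt"], total), 1.0, 0.0002),
    "blame": (package_probability(counts["blame"], total), 0.25, 0.0003),
}
choice = select_by_probability(candidates)

# A package without any subcorpus sentences falls back to the target probability.
unseen = {"a": (0.0, 0.2, 0.001), "b": (0.0, 0.5, 0.0005)}
fallback_choice = select_by_probability(unseen)
```

With 30 of 40 package sentences, ‘debt’ has a package probability of 0.75 and is selected; in the second package all package probabilities are zero, so the target probability decides.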

                                                 
⁷ Only in this particular setup. Note that many entries of LinguaDict do not have any
reference in the corpus.

 
The format of the probability lexicon is a tuple of <source-lemma, source-pos, tl-lemma,
tl-pos, package-prob, target-prob, corpus-prob>. An example is given in fig. 4-2.

Fig. 4-2: Example of the probabilistic lexicon⁸

 
These two lexicons (ConcLex and ProbLex) are used for the test and evaluation. The 

challenge is to determine the transfer of a source-lemma based on the context in which it 

occurs. 

 
⁸ Words without even a corpus probability could be due to lemmatisation / tagging errors.

 
5. Test and Evaluation 

The transfer selection component is tested by determining the transfer of a test lemma in a
given sentence context, and comparing it with the transfer used in a reference translation. In
the best case, all translations proposed by the Lt-Xfr component are identical with the
transfers selected in the reference translations.

As the LinguaDict lexicon contains many near-synonymous translations, which can hardly be
distinguished on the basis of conceptual transfer, a special evaluation procedure was adopted,
consisting of three ranks instead of a binary decision:

• Rank 1: the translation proposed by the system is identical to the one in the test reference
  sentence.
• Rank 2: the proposed translation is close or synonymous to the one in the test reference
  sentence. This was decided to be the case if
  o the proposed translation belongs to the same WordNet synset as the reference, or
  o the proposed translation is orthographically similar to the reference (like ‘electric’ vs.
    ‘electrical’, ‘agglutinating’ vs. ‘agglutinative’, ‘dialogue’ (UK) vs. ‘dialog’ (US) etc.).
• Rank 3: the two translations are (still) different.

Evaluation would allow rank1 and rank2, and reject rank3 results. 

Based on the three ranks, a simple scoring system is used (rank1 = 1, rank2 = 2, rank3 = 3) to 

compute an overall score: The lower the score the closer the translation is to the reference. 
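The ranking scheme can be sketched as below. The two lookup tables are small hypothetical stand-ins for the WordNet index and the orthographic similarity resource, and the overall score is computed as the mean rank, which is an assumption: the text only states that lower scores are closer to the reference.

```python
def rank(proposed, reference, synonyms, spelling_variants):
    """Rank a proposed translation against the reference (1 = best, 3 = worst)."""
    if proposed == reference:
        return 1
    if reference in synonyms.get(proposed, set()):
        return 2   # same synset (stand-in lookup)
    if (proposed, reference) in spelling_variants or (reference, proposed) in spelling_variants:
        return 2   # orthographically similar
    return 3


def overall_score(ranks):
    """Mean rank over all test items (assumed aggregation); lower is better."""
    return sum(ranks) / len(ranks)


# Toy resources for illustration only.
synonyms = {"guilt": {"guiltiness"}}
spelling_variants = {("dialogue", "dialog"), ("electric", "electrical")}

ranks = [
    rank("debt", "debt", synonyms, spelling_variants),
    rank("dialog", "dialogue", synonyms, spelling_variants),
    rank("guilt", "debt", synonyms, spelling_variants),
]
score = overall_score(ranks)
```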

5.1 Test data 

5.1.1 Test corpus 

The test corpus was taken from the subcorpora used for the research (cf. section 3.3 above). 

1044 sentences were extracted, containing transfers for nouns, verbs, and adjectives.  

5.1.2 Resources for ranking 

For ranking (esp. rank 2: similarity), two additional resources were produced:

• An indexed version of WordNet V3, from which a list of possible synonyms (i.e. the synset lemmata [9]) is retrieved for a given input lemma.
• A resource for orthographic similarity. For all parts of speech, a resource was used which unifies US and UK spelling (this list contains about 4,700 entries). For adjectives, additional patterns were considered, like ‘adj + -ed’ (‘abstract’ vs. ‘abstracted’), ‘adj-ic + al’ (‘acoustic’ vs. ‘acoustical’) etc.

The test frame applies pattern matching for the string patterns, and simple lookup for the differences in locale.
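The following sketch shows one way such an orthographic similarity check could look. It is not the project's implementation: the two-entry `UK_TO_US` table is a stand-in for the list of about 4,700 entries, only the two adjective patterns named above are covered, and all function names are hypothetical. The part-of-speech code `Ad` follows the notation of the lookup service.

```python
# Tiny illustrative excerpt; the real US/UK resource is much larger.
UK_TO_US = {"dialogue": "dialog", "colour": "color"}

def normalise_locale(word):
    """Map a UK spelling to its US counterpart by simple lookup."""
    return UK_TO_US.get(word, word)

def adjective_variants(word):
    """Generate variants for the two adjective patterns named in the text."""
    variants = {word, word + "ed"}       # 'abstract' vs. 'abstracted'
    if word.endswith("ic"):
        variants.add(word + "al")        # 'acoustic' vs. 'acoustical'
    return variants

def ortho_similar(a, b, pos):
    """True if a and b differ only by locale or by an adjective pattern."""
    a = normalise_locale(a.lower())
    b = normalise_locale(b.lower())
    if a == b:
        return True
    if pos == "Ad":                      # patterns apply to adjectives only
        return b in adjective_variants(a) or a in adjective_variants(b)
    return False

print(ortho_similar("electric", "electrical", "Ad"))
print(ortho_similar("dialogue", "dialog", "No"))
print(ortho_similar("leaf", "sheet", "No"))
```

Pairs like ‘agglutinating’ vs. ‘agglutinative’ would need further patterns, which are not specified in the text and are therefore left out.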

[9] As the test lexicon contains only single words, only the single-word lemmata of the synsets were taken.

5.1.3 Test frame

It was not possible with the available resources to integrate the Lt-Xfr component into a complete MT system. Therefore a special test system was written which takes a translation candidate (source lemma) and a sentence context as input, and returns the ‘best matching’ transfer (target lemma). The returned lemma can be compared to the reference translation, and ranked: if the two are not identical, it is checked whether they are both in the same WordNet synset, or are orthographically similar (rank 2).
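The ranking step of the test frame can be sketched as follows. This is an illustration under assumptions: `TOY_SYNSETS` and `toy_ortho_similar` are hypothetical stand-ins for the indexed WordNet and the orthographic similarity resource, and the function name `rank_transfer` is not taken from the project.

```python
def rank_transfer(proposed, reference, synsets, ortho_similar):
    """Return 1 (identical), 2 (same synset or similar spelling) or 3."""
    if proposed == reference:
        return 1
    same_synset = any(proposed in s and reference in s for s in synsets)
    if same_synset or ortho_similar(proposed, reference):
        return 2
    return 3

# Hypothetical toy data, not extracted from WordNet.
TOY_SYNSETS = [{"leaf", "foliage"}, {"sheet", "paper"}]

def toy_ortho_similar(a, b):
    # Placeholder check: one form is the other plus the suffix 'al'.
    return a + "al" == b or b + "al" == a

for proposed in ("leaf", "foliage", "sheet"):
    print(proposed, rank_transfer(proposed, "leaf", TOY_SYNSETS, toy_ortho_similar))
```

Passing the two resources in as arguments keeps the ranking logic independent of how the synonym and spelling resources are stored.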

5.2 Test systems 

Two test systems were built:

• one with the full component (called Lt-Xfr below), with all options produced, and both the conceptual and the probability lexicon;
• one with only the fallback (called Lt-Xfr-frq below), using the probability lexicon but not the conceptual lexicon; this is relevant in cases where no conceptual context information is available.

For comparison, the test sentences were also given as input to several available MT systems, with both statistical and rule-based architectures. Their translations of the test lemmata were extracted and ranked according to the same three ranks (again using the synset and the orthographic similarity).

5.3 Test results 

First, the output of the two Lt-Xfr systems was evaluated against the reference translation 

(absolute evaluation), and then it was compared to the output of the other MT systems 

(comparative evaluation). Results are shown in Table 5-1. 

5.3.1 Absolute Evaluation

For this evaluation, the test sentences were analysed with the Lt-Xfr test frame, and the resulting transfer was compared to the reference translation. As explained, this procedure was carried out for two system variants:

• one which uses both the conceptual and the probability lexicon (Lt-Xfr);
• one which selects transfers based on probability information only (Lt-Xfr-frq).

It can be seen that 60% of the test terms are translated correctly (rank 1), and if WordNet and string-similarity synonyms are taken into account (rank 2), then 75% of the test sentences return a correct transfer. The values are roughly similar for all parts of speech, with verbs doing slightly better than nouns and adjectives.

If a random selection of transfers is assumed as a baseline (with about 41% correctness), then Lt-Xfr improves over the baseline by 34 percentage points absolute, or 83% relative; the improvement is most pronounced for verbs (more than 100% relative). For the fallback system (frequency-based only), the improvement is still 25.6 percentage points absolute, or 61.6% relative.
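The relation between the absolute and relative figures can be checked with a short calculation. The sketch below uses the rounded values quoted in the text (41% baseline, 75% for Lt-Xfr); the function name `improvement` is hypothetical.

```python
def improvement(system_pct, baseline_pct):
    """Return (absolute gain in percentage points, relative gain in %)."""
    absolute = system_pct - baseline_pct
    relative = 100.0 * absolute / baseline_pct
    return absolute, relative

absolute, relative = improvement(75.0, 41.0)
print(f"{absolute:.0f} points absolute, {relative:.0f}% relative")

# Consistency check for the fallback system: an absolute gain of 25.6 points
# that corresponds to 61.6% relative implies this baseline value.
implied_baseline = 100.0 * 25.6 / 61.6
print(f"implied baseline: {implied_baseline:.1f}%")
```

The implied baseline of roughly 41.6% agrees with the "about 41%" stated for random selection.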

5.3.2 Comparative Evaluation

To see how this result compares to the state of the art, the test sentences were also translated with several available MT systems: one SMT and four RMT systems. The translations of the test words were identified in the output and compared to the reference translation. As in the absolute evaluation, total (rank 1) and partial (rank 2) identity were computed, as well as the overall scores. Tab. 5-1 shows the evaluation result.

It can be seen that the Lt-Xfr system clearly shows the best performance of all systems in all categories. It has much better scores than all RMT systems, and also better scores than Google. In absolute terms it is 20% better than the least-performing MT system, and still 7% better than the best-performing one. Even the frequency-based fallback version (Lt-Xfr-frq) outperforms all RMT systems, and is better than Google in three of six categories (Verbs/1, Verbs/1+2, Adj/1+2).

 
Tab. 5-1: Evaluation results, compared to the reference: number of sentences, ranks (sentences, percentage) per part of speech, total, and score, for all systems.

 
Overall, the result shows that a significant improvement in transfer selection can be achieved with the techniques used by Lt-Xfr, compared to state-of-the-art MT systems.

More detailed information on the evaluation is given in PANACEA deliverable D7.4.


 
6. PANACEA Integration 

Due to the lack of resources, it was not possible to provide a full workflow for building the two lexicons for additional language directions. Such a workflow would require the following:

• input data, consisting of large bilingual transfer lexicons and of large parallel corpora, must be provided;
• the tools in the processing chain must be streamlined and brought into a better sequence; exploratory steps can be skipped.

However, to demonstrate the scope of the tool, the test frame was made available as a web service in the PANACEA registry.

 
6.1 Formats 

The service is registered in the PANACEA registry under the following address:

http://80.190.143.163/panaceaV2/services/LTXfr?wsdl

Parameters are:

• source language (only ‘de’ is supported)
• target language (only ‘en’ is supported)
• text (a string containing a URL pointing to the input file)

The text must be a UTF-8 file with a 3-column layout:

• source language lemma (in normalised form, incl. lowercasing); only single-word lemmata are supported so far
• source lemma part of speech (No for noun, Vb for verb, Ad for adjective)
• context sentence (containing the lemma to be translated)

 
The format of the resulting file is the same, with an additional column inserted which contains 

the translation which the system proposed for this particular sentence context. 

An example is given in fig. 6-1. 

 
 
Fig. 6-1: Example of input and output of the LT-Xfr lookup service 
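As an illustration of the input format, the following sketch builds lines for a 3-column input file. It rests on assumptions not confirmed by the text: the columns are taken to be tab-separated, normalisation is reduced to lowercasing, and the helper name `to_line` and the file name are hypothetical.

```python
VALID_POS = {"No", "Vb", "Ad"}   # noun, verb, adjective, as listed above

def to_line(lemma, pos, sentence):
    """Build one input line: lemma, part of speech, context sentence.

    Assumption: columns are separated by tabs.
    """
    if pos not in VALID_POS:
        raise ValueError(f"unsupported part of speech: {pos!r}")
    if len(lemma.split()) != 1:
        raise ValueError("only single-word lemmata are supported")
    # Collapse whitespace so that tabs inside the sentence cannot add columns.
    clean_sentence = " ".join(sentence.split())
    return "\t".join((lemma.lower(), pos, clean_sentence))

lines = [
    to_line("Blatt", "No", "Das Blatt fällt vom Baum."),
    to_line("fallen", "Vb", "Das Blatt fällt vom Baum."),
]

# The service expects a UTF-8 file reachable under a URL.
with open("ltxfr_input.txt", "w", encoding="utf-8") as handle:
    handle.write("\n".join(lines) + "\n")
```

The resulting file would then have to be published under a URL and passed to the service in the text parameter.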


 
7. Assessment 

7.1 Relevance 

The transfer strategy presented here is just one of several possible transfer strategies; others are transfer selection based on external information (topic, locale etc., which are passed to the transfer selection component as external features), or based on morphosyntactic content.

The approach presented here has the following features:

• It fits the architecture of rule-based systems inasmuch as it provides transfer selection on the source side, not on the target side, and controls the transfer selection strategies for such systems.
• It can be used as an additional information source, as it provides a static resource which can easily be linked to a system. Most MT systems have an internal structure for their transfer packages, like a sequence of tests, and there is always a ‘default’ translation for cases where all tests fail. This could be the place where the current resource could successfully be used, i.e. for cases where no system-relevant information is available.
• As it relates to conceptual contexts, and is not linked to a particular syntactic structure or configuration, it is more robust than current selection strategies, which usually fail in cases where the required syntactic structure is not built (e.g. due to a parse failure). So it could be used as a fallback in cases where the analysis component returns improper results.
• For the same reason, the approach is independent of the specific system structure, the type of analysis results, syntactic structures etc.; it can support shallow MT systems just as well as all kinds of deep RMT.

It simply brings additional information into the transfer selection process which has not been used up to now.

7.2 Quality 

The quality of the component crucially depends on the quality of the match between the text 

context and the clusters of the conceptual lexicon.  

1. One option to improve the matching is to extend the context from sentences to paragraphs; 

this step has been taken in [Thurmair 2005] and improves transfer quality to a level of 96% 

accuracy. However, most of the parallel data available today are aligned on sentence level, not 

on paragraph level, so such an approach would be difficult to train.  

2. Another option is to review the clusters. A look at the current clusters shows that there are 

lemmata which would be considered to be irrelevant for transfer selection, and that other 

lemmata are missing which would be expected here.  

 
Fig. 7-1: Cluster for blatt -> leaf 

 
In fig. 7-1, the translation of (de) ‘blatt’ -> (en) ‘leaf’ (as opposed to ‘sheet’ or ‘newspaper’) would be corroborated by ‘zweig’ and ‘frucht’, and also by missing terms like ‘ast’ or ‘blüte’, whereas ‘patentieren’, ‘zypern’ etc. would not really contribute to the disambiguation of this reading.

Additional missing concepts could be collected by monolingual correlation analysis, adding lemmata which are highly correlated with the terms of a given seed cluster. Such a strategy could provide good additional terms.

3. Clusters suffer from data sparsity, and the more so the less frequent the translations are: 

Many transfers in the conceptual lexicon simply have no conceptual context information at 

all. If there is no seed cluster there is no monolingual extension either. 

4. Therefore, to improve quality, provision must be made for the conceptual lexicon to be edited by human coders: they should be able to add or remove terms to improve the cluster accuracy, and to adapt the transfer to specific types of texts, contexts, or other needs. Human editing would require a review of the current scoring mechanism, which could be changed e.g. into a simple three-level score (very relevant – relevant – somewhat relevant), and the lookup would have to be adapted accordingly.

5. Transfer selection on the target side, as done in SMT, has only one advantage: idiomatic expressions, created by an idiosyncratic combination of two target words, can be resolved more easily on the target side. However, such expressions can easily be added to the transfer lexicon (as they have to be added to the training data on the other side); they would not even create ambiguities in transfer selection.
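The monolingual extension of seed clusters suggested in item 2 above could be sketched as follows. The text does not name a correlation measure; the Dice coefficient over sentence-level co-occurrence is an assumption chosen here for simplicity, and the toy corpus, the threshold and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_stats(sentences):
    """Count in how many sentences each lemma and each lemma pair occurs."""
    single, pair = Counter(), Counter()
    for lemmas in sentences:
        unique = sorted(set(lemmas))
        single.update(unique)
        pair.update(combinations(unique, 2))   # pairs are stored sorted
    return single, pair

def dice(a, b, single, pair):
    """Dice coefficient of two lemmata (assumed correlation measure)."""
    key = (a, b) if a < b else (b, a)
    denom = single[a] + single[b]
    return 2 * pair[key] / denom if denom else 0.0

def extend_cluster(seed, sentences, threshold=0.5):
    """Return lemmata whose best correlation with a seed term is high enough."""
    single, pair = cooccurrence_stats(sentences)
    candidates = set(single) - set(seed)
    return sorted(
        c for c in candidates
        if max(dice(c, s, single, pair) for s in seed) >= threshold
    )

# Toy lemmatised German sentences (hypothetical data).
corpus = [
    ["zweig", "ast"],
    ["zweig", "ast", "baum"],
    ["frucht", "blüte"],
    ["zeitung", "artikel"],
    ["patentieren", "erfindung"],
]
print(extend_cluster({"zweig", "frucht"}, corpus))
```

With this toy data the seed terms ‘zweig’ and ‘frucht’ attract ‘ast’, ‘baum’ and ‘blüte’, while the unrelated lemmata stay out of the cluster. As noted in item 3, the procedure has nothing to work with when no seed cluster exists.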

7.3 Extensions 

To stabilise the results of the current investigation, the following items must be considered: 

1. The analysis used only a subset of the lexicon; multiword entries and entries with changes in the part of speech were not considered.

• Adding multiwords requires more sophistication in the word alignment step; GIZA++ would no longer be the appropriate tool here, and full MOSES phrase alignment may be necessary.
• As for part-of-speech changes, the most frequent case in German->English is that adjectives used adverbially must be translated as adverbs. Care has been taken that the part of speech is given to the lookup as one of the input parameters; this can help in transfer selection.

2. The processing chain needs to be stabilised to be able to run the component with other 

lexicon and corpus data. 

3. The coverage must be extended to other language directions. This would require large sentence-aligned bilingual corpora, a bilingual lexicon created e.g. with the Lt-P2G tool [10] from such a corpus, and language resources for analysis, esp. lemmatiser and tagger information.

Optionally, for ranking and evaluation, a WordNet and an orthographic-similarity resource could be used for the target language.

4. The cluster building itself could be improved by collapsing transfers which are clearly 

synonyms, or variants of each other (e.g. locale), before the analysis rather than afterwards in 

a step of ranking. This would provide more data for such readings in clustering. 

                                                 
[10] See task 5.2 of PANACEA, or [Thurmair/Aleksić 2012]


 
8. Citations

• Agirre, E., Edmonds, Ph., eds., 2006: Word Sense Disambiguation. Springer
• Brown, P., Della Pietra, St., Della Pietra, V., Mercer, R., 1991: Word-sense disambiguation using statistical methods. Proc. 29th ACL
• Carl, M., Melero, M., Badia, T., Vandeghinste, V., Dirix, P., Schuurman, I., Markantonatou, St., Sofianopoulos, S., Vassiliou, M., Yannoutsou, O., 2008: METIS-II: low resource machine translation. In: Machine Translation 22, 1-2, p. 67-99
• Koehn, Ph., 2010: Statistical Machine Translation. Cambridge Univ. Press
• Santos, D., 2000: The translation network, A model for a fine-grained description of translations. In: Véronis, ed.: Parallel Text Processing. Kluwer
• Thurmair, Gr., 1990: Complex lexical transfer in METAL. Proc. 3rd TMI Conf., Austin, TX
• Thurmair, Gr., 2005: Improving Machine Translation Quality. Proc. MT Summit X, Phuket
• Thurmair, Gr., Aleksić, V., 2012: Creating term and lexicon entries from phrase tables. Proc. EAMT, Trento
• Thurmair, Gr., Aleksić, V., Schwarz, Chr., 2012: Large scale lexical analysis. Proc. LREC, Istanbul
• Tyers, F.M., Sánchez-Martínez, F., Forcada, M.L., 2012: Flexible finite-state lexical selection for rule-based machine translation. Proc. EAMT, Trento