D5.6: Transfer Selection Support
PANACEA is an EC-funded project under Grant Agreement 248064

SEVENTH FRAMEWORK PROGRAMME
THEME 3 Information and Communication Technologies

PANACEA Project
Grant Agreement no.: 248064
Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies

D5.6 Transfer Selection Support

Dissemination Level: Public
Delivery Date: June 15th, 2012
Status – Version: Final v1.0
Author(s) and Affiliation: Gregor Thurmair, Vera Aleksić (Linguatec)

This document is part of the technical documentation generated in the PANACEA Project, Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition (Grant Agreement no. 248064). This document is licensed under a Creative Commons Attribution 3.0 Spain License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/es/. Please send feedback and questions on this document to: iulatrl@upf.edu

TRL Group (Tecnologies dels Recursos Lingüístics), Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra (IULA-UPF)

Relevant Documents

Panacea Deliverable D5.4: English-French and English-Greek bilingual dictionaries for the Environment and Labour Legislation domains
Panacea Deliverable D5.5: English-French and English-Greek bilingual dictionaries for the Environment and Labour Legislation domains
Panacea Deliverable D5.7: Sample of Transfer Entries produced
Panacea Deliverable 7.4: Evaluation Report (Third cycle)

Table of Contents

1. Introduction
   1.1 Types of Transfer
   1.3 Related Work
   1.4 Approach
2. The Lexicon
   2.1 The LinguaDict Lexicon
   2.2 Lexicon Preparation
3. The Corpus
   3.1 Corpus Collection
   3.2 Corpus Processing
   3.3 Subcorpus creation
4. Creation of the Lt-Xfr Lexicons
   4.1 Conceptual Lexicon
   4.2 Probability Lexicon
5. Test and Evaluation
   5.1 Test data
   5.2 Test systems
   5.3 Test results
6. PANACEA Integration
   6.1 Formats
7. Assessment
   7.1 Relevance
   7.2 Quality
   7.3 Extensions
8 Citations

1. Introduction

The objective of task 5.3, and of the tool LT-Xfr created to meet this challenge, is to find automatic means for transfer selection: if a source word has several translations, the right one for a given context must be found. This problem becomes more important the larger the dictionary is, and it occurs much more frequently than the case where there is no translation at all.
Of course, the problem is only relevant for 1:n transfers; if a source term has exactly one translation, there is no selection problem.

The basis of the investigation is a transfer dictionary; this is a bilingual and directed resource. The terminology used in the following sections is:

• an entry (or transfer) is a combination of a source and a target term (defined by: )
• a package is a set of entries with a common source side (defined by: ). A package consists of at least one entry; in the present case, only packages containing more than one entry are of interest. Packages differ depending on language direction, therefore bilingual lexicons are directed.

The context of the investigation is knowledge-driven (rule-based) MT. As for SMT, it should be noted that transfer selection fully depends on the presence of the translations in the training corpus; this makes transfer selection in SMT very much domain-dependent (or even training-text-dependent). It will be shown that even in large training sets, many entries do not occur at all. Another difference is that here the means to disambiguate transfers are on the source side, whereas in SMT they are on the target side (the target LM selects the best from a set of transfer options coming from the phrase table). SMT systems can therefore react better to local contexts, in the cases where the transfers are in the training set.

The original title of the package was ‘transfer rule creation’, following the paradigm of rule-based MT. However, it turned out that the only ‘rule’ applied in this work is to look up the conceptual context, and the task is to provide significant context clues; therefore the title of the deliverable was chosen to be broader than the rule aspect alone.

1.1 Types of Transfer

Another distinction which is relevant here is the type of transfer to be considered. We distinguish between the following transfer types:

• structural transfer is a change in the target which is independent of the lexical material involved.
  Example: complex prenominal adjectives in German must be represented in English as relative clauses.
• lexical transfer is dependent on the lexicon entries involved. Several cases exist here:
  o Simple lexical transfer is just a replacement of a word by its translation: (en) ‘incineration’ -> (de) ‘Einäscherung’
  o Complex lexical transfer takes additional information to disambiguate, and performs tests to find the correct transfer.
    ▪ Local transfer considers features on the node which must be transferred (e.g. number: (de) ‘Schuld’ -> (en) ‘guilt’ if singular, but -> (en) ‘debt’ if plural).
    ▪ Contextual transfer inspects the context of the node to be transferred. This can be done on several levels: lexical context [Frye 2012], syntactic context (like transitivity) [Thurmair 1990], semantic context (e.g. (en) ‘eat’ -> (de) ‘essen’ for humans, -> (de) ‘fressen’ for animals), pragmatic contexts (domain features, locale etc.) and others.

The present work deals with conceptual context, which is a form of lexical transfer based on lexical context, but not tied to specific syntactic structures. It looks at the concepts surrounding a translation candidate, and determines its transfer depending on such concepts. E.g. (en) ‘interest’ -> (de) ‘Zins’ in the context of ‘money’, ‘pay’, ‘loan’ etc., but (en) ‘interest’ -> (de) ‘Interesse’ in other contexts like ‘sports’, ‘activity’, ‘research’ etc. The challenge is to find such contexts in an automatic way, using parallel corpus data.

1.3 Related Work

1. There are approaches to word sense disambiguation which use bilingual material [e.g. Agirre/Edmonds, ed., 2006]. However, word senses and translations do not run in parallel; polysemous words like (de) ‘Zelle’ transfer all their meanings into the target (en) ‘cell’. The goal of the current approach is not to disambiguate word senses but to find the best transfers¹.
2. There is significant work on automatic creation of transfer rules, recently cf.
   [Tyers et al. 2012]. However, they look at close contexts of the transfer candidates (windows of trigrams to pentagrams); such windows are rather small and do not always contain the relevant information for disambiguation, and there will be significant overhead in the rules once the lexicon gets bigger.
3. A similar approach to the disambiguation of source language contexts was presented in [Thurmair 2005], called ‘neural transfer’ there. Only monolingual corpora were used, disambiguation of contexts for translation candidates was done by manual annotation of training data, and the lookup context was extended from sentences to paragraphs; very high accuracy could be reported. The current approach does automatic context disambiguation from parallel corpora, and uses only sentential contexts.
4. There are approaches which do disambiguation on the target side, not on the source side. This is the current paradigm in SMT [Koehn 2010], and was also tried in METIS-II [Carl et al., 2008]. This approach must carry all possible transfers of all source words into the target, and then try to disambiguate there. This creates a massive overhead, which could be reduced by using some source-language information.

1.4 Approach

The approach taken here tries to model human intuition which, looking at the context words of a term, is able to determine how it should be translated. As this intuition works quite successfully for humans, the approach tries to identify such conceptual contexts on the basis of parallel corpora.

The task takes two resources:

• a lexicon containing possible transfers of a given word; such a lexicon can e.g. result from a bilingual term extraction component as described in [Thurmair/Aleksić 2012], from legacy systems, or from other available data.
• a parallel corpus which makes it possible to identify contexts for certain translations

It produces a resource (a corpus-based add-on to a transfer lexicon) which can be queried at runtime as an additional source of information. This resource is static and logically independent of the MT system, and can be used for both ‘deep’ and ‘shallow’ MT.

¹ A similar approach towards transfer can be found in [Brown et al. 1991], but they use just one contextual ‘informant’.

The task is executed in the following way:

• Take a bilingual dictionary, and identify the packages it contains; these packages are the target objects of the disambiguation effort.
• For all source and target lemmata in the packages, index the bilingual corpus for the sentences in which they both occur (on source and target side).
• For all translations of each source entry of each package, create subcorpora consisting of the sentence pairs containing the source lemma and the target lemma of this entry. This step subdivides the monolingual source and target corpora into subsets of parallel sentences in which the source term has the same translation.
• Try to identify significant co-occurrences in the source language subcorpus which are specific to this translation. The goal is to determine a cluster of source language words which indicates a certain transfer selection.

The result of the task will be a resource which, for each translation in a package, gives a vector of contexts which trigger the translation in question. At runtime, this resource will be queried by matching the context of the source language candidate against the clusters of all possible translations, and selecting the best matching cluster and its related translation.
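The first three steps of this procedure can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the LT-Xfr implementation: entries are plain (source lemma, target lemma) pairs without part of speech, the corpus is a list of already lemmatised sentence pairs, and the function names `build_packages` and `build_subcorpora` are hypothetical.

```python
from collections import defaultdict


def build_packages(entries):
    """Group (source, target) entries by source lemma.

    Only 1:n packages are returned, since packages with a single
    transfer pose no selection problem.
    """
    packages = defaultdict(list)
    for source, target in entries:
        if target not in packages[source]:
            packages[source].append(target)
    return {s: ts for s, ts in packages.items() if len(ts) > 1}


def build_subcorpora(packages, sentence_pairs):
    """Collect, per entry, the source sentences whose sentence pair
    contains both the source lemma and the target lemma.

    sentence_pairs: list of (source_lemmas, target_lemmas) tuples.
    """
    subcorpora = defaultdict(list)
    for src_lemmas, tgt_lemmas in sentence_pairs:
        tgt_set = set(tgt_lemmas)
        for source in set(src_lemmas).intersection(packages):
            for target in packages[source]:
                if target in tgt_set:
                    subcorpora[(source, target)].append(src_lemmas)
    return dict(subcorpora)


# Toy data (invented for illustration)
entries = [("Schuld", "guilt"), ("Schuld", "debt"), ("Zelle", "cell")]
sentence_pairs = [
    (["Bank", "erlassen", "Schuld"], ["bank", "cancel", "debt"]),
    (["Gericht", "feststellen", "Schuld"], ["court", "establish", "guilt"]),
]
packages = build_packages(entries)
subcorpora = build_subcorpora(packages, sentence_pairs)
```

In this toy run, ‘Zelle’ drops out because it has only one transfer, and each of the two ‘Schuld’ entries receives its own one-sentence subcorpus.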
The test consists in creating a test set containing sentence contexts with ‘right’ (reference) translations for the test terms, analysing these sentences and their contexts, and comparing the transfer proposals with the transfers used by the reference.

The tool is called ‘LT-Xfr’, and has been developed here for the German-to-English language direction.

2. The Lexicon

For the investigations, a dictionary was taken as it is used for human lookup, but modified for machine processing. The LinguaDict lexicon [http://www.linguatecapps.com/linguadict] was selected in order to extend the coverage of transfers beyond normal MT lexicons, and to have a bilingual lexicon of realistic size. Compared to MT lexicons, lexicons for human lookup contain many more transfers, and give clues on how to select the best transfer in a given situation. The challenge is to find such transfer disambiguation clues by corpus analysis.

          no. entries   no. SL terms   no. transfers/term
de->en    213,200       144,900        1.47
en->de    213,200       136,100        1.57

Tab. 2-1: Size of LinguaDict

As transfer selection is directed, i.e. specific to a given language direction, the tests were made for German -> English.

2.1 The LinguaDict Lexicon

The LinguaDict lexicon in its German-English version is a state-of-the-art lookup resource; it is available both for online and for offline (on mobile devices) lookup. It is a bilingual and directed lexicon; the language directions are built on the fly from a common database. It consists of single and multiword entries, and offers part-of-speech, gender and inflection information, and a pronunciation for most entries. Examples are given in Fig. 2-2.

Fig. 2-2: Example of LinguaDict entries: GUI (left), entry examples (right); a typical example of the transfer selection problem here is the entry for ‘consult’

Overall, the lexicon contains 212,000 entries (plus about 1,000 entries without transfer, such as links for strong verbs).
2.2 Lexicon Preparation

2.2.1 Preparatory Steps

The lexicon was prepared in the following way:

• All packages with only a single transfer were removed. For these packages, the problem of transfer selection does not exist. After this, 104,200 entries remained.
• All function word entries were removed, as they need a different type of transfer selection, and are much more interwoven with the MT system internals.
• The treatment of all entries containing multiwords on either the source or the target side was postponed, to reduce the initial complexity of the task. They need to be integrated later.
• All entries containing part-of-speech changes were removed, as this is syntactic information: (de-adj) ‘sicher’ -> (en-adj) ‘secure’ and (en-adv) ‘securely’. Entries of this kind are in lexicons if the adverb formation is not completely regular. However, determining the right part-of-speech selection requires syntactic analysis, which goes beyond the scope of the current LT-Xfr work and cannot easily be modelled by an approach for conceptual transfer.

After these operations, 27,000 packages with 71,400 entries remained for the investigation. Table 2-3 gives the details on the lexicon used for the following analysis.

part of speech   no. packages   no. entries   no. transfers / entry
adjectives        6,900         18,200        2.83
nouns            15,600         35,400        2.27
verbs             4,500         17,800        3.26
total            27,000         71,400        2.63

Tab. 2-3: Packages in the lexicon

2.2.2 Lexicon Inspection

A short investigation of the lexicon entries reveals that conceptual transfer will never have full coverage, and that a multitude of transfer selection strategies is required to do proper transfer, as many transfers cannot be disambiguated on a purely conceptual level:

• locale: (de) ‘Geschmack’ -> ‘flavor’ (en-us) / ‘flavour’ (en-uk)
• spelling: (en) ‘adaptable’ -> (de-old) ‘anpaßbar’ and (de-new) ‘anpassbar’
• register: (en) ‘anglophobe’ -> (de-lit) ‘anglophob’ and (de-coll) ‘englandfeindlich’; (en) ‘adiposity’ -> (de-lit) ‘Adipositas’ and (de-coll) ‘Verfettung’
• topic: (en) ‘case’ -> (de-legal) ‘Fall’ and (de-mechan) ‘Gehäuse’

The lexicon just provides the different alternatives in such cases; it is the task of the global system to identify which one to select. This is often done by user settings (locale, topic etc.); otherwise automatic tools like topic identification, register and spelling selection etc. must be run.

As many of the cases just presented could be considered to be synonyms on a semantic level, these aspects are not considered in the following analysis; the focus here is on transfer selection based on conceptual contexts².

² It could have been an option to normalise such variants before cluster building; this could have resulted in better clusters.

3. The Corpus

For the remaining packages of the lexicon, an automatic contextual disambiguation is attempted. To do this, a parallel corpus is used. The goal is to find conceptual contexts in the corpus which allow the disambiguation of translation alternatives.

3.1 Corpus Collection

The corpus used for the LT-Xfr experiments consists of parallel sentences collected from different domains; details are given in Tab.
3-1:

Domain           no. sentences
automotive            47,485
dgt                  530,760
europarl           1,739,154
health&safety         57,155
jrc-acquis         1,239,731
e-books               82,635
statmt_dev            15,134
statmt_news          136,227
total              3,848,281

Tab. 3-1: Parallel corpus used (sentences)

Overall, 3.8 million German-English parallel sentences were used for the experiment.

3.2 Corpus Processing

The corpus data were processed in the following way:

Step 1: Format conversion
All corpus sentences were converted into the PANACEA TO format: the text was converted into UTF-8, and tags with unique sentence ids and a language attribute were inserted. Errors in the original sentence segmentation were not corrected, as the sentences were already parallelised, and the sentence alignment could have been lost.

Step 2: Lemmatisation and tagging
All sentences of the corpus underwent lexical analysis, i.e. they were tokenised and lemmatised as described in [Thurmair et al. 2012]. Cases of homography, as far as they relate to content words, were disambiguated using a simple tagger. This step produced the triples to work with later on.

Step 3: Monolingual indexing
Each pair of the corpus which also occurred in the lexicon test set was indexed (lemma -> sentence ids), and its frequency was computed³. The result was two index files (one for German, one for English), containing lemmata pointing to sentence ids.

Step 4: Bilingual indexing
The two index files were merged such that, for each transfer of each package, all common sentence ids were collected into a bilingual index file; an example is given in fig. 3-2.

Fig. 3-2: Example of bilingual index file (automotive corpus). Many entries have no sentence in common, i.e. there is no evidence for this transfer in the corpus.

For many entries, no bilingual sentences could be found, mainly because there were no correspondences in the corpus. These entries had to be eliminated.
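Steps 3 and 4 amount to building two inverted indexes and intersecting their posting sets. The following Python sketch illustrates the idea; it is an assumption-laden simplification rather than the actual tool: it indexes bare lemmata (no part of speech), keeps everything in memory instead of index files, and the names `monolingual_index` and `bilingual_index` are hypothetical.

```python
from collections import defaultdict


def monolingual_index(sentences, lexicon_lemmas):
    """Map each lexicon lemma to the ids of the sentences containing it.

    sentences: dict of sentence id -> list of lemmata.
    """
    index = defaultdict(set)
    for sid, lemmas in sentences.items():
        for lemma in lemmas:
            if lemma in lexicon_lemmas:
                index[lemma].add(sid)
    return index


def bilingual_index(packages, src_index, tgt_index):
    """For each transfer of each package, keep the sentence ids in which
    the source lemma and the target lemma both occur."""
    result = {}
    for source, targets in packages.items():
        for target in targets:
            common = src_index.get(source, set()) & tgt_index.get(target, set())
            result[(source, target)] = sorted(common)
    return result


# Toy data (invented for illustration)
de = {1: ["Bank", "Schuld"], 2: ["Gericht", "Schuld"], 3: ["Zelle"]}
en = {1: ["bank", "debt"], 2: ["court", "guilt"], 3: ["cell"]}
packages = {"Schuld": ["guilt", "debt", "fault"]}

src_index = monolingual_index(de, {"Schuld"})
tgt_index = monolingual_index(en, {"guilt", "debt", "fault"})
bi_index = bilingual_index(packages, src_index, tgt_index)
```

The entry (‘Schuld’, ‘fault’) ends up with an empty id list: it is a transfer without corpus evidence, the kind of entry that had to be eliminated.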
3.3 Subcorpus creation

From the bilingual index file, subcorpora were created in the following way:

Step 1: Corpus collection
For each transfer of a package, the common subset of sentence ids was identified, and the respective sentences were collected. As an output, one file per package was built, containing:

• For each package: the number of transfers for this package, and the sum of all sentences used therein
• For each transfer: the sentences in which it occurred (i.e. for the sentence ids in fig. 3-2, the real sentences were fetched).

³ Indexing was not done on textforms but on lemmata.

This operation left 2.96 million sentences which contained relevant lemmata.

Step 2: Word alignment
In order to avoid accidental co-occurrence of an SL-TL pair, the subcorpora were filtered using the criterion of word alignment: only SL-TL word pairs which could be word-aligned were kept in the data⁴. For word alignment, GIZA++ was used. All sentence pair candidates which could not be word-aligned were removed from the subcorpora.

This operation removed another 280K sentences from the text base, leaving 2.68 million sentences for the following steps. It would be worth looking at the difference; it could result either from real accidental co-occurrences, or from word alignment errors.

More importantly, this step also removed entries, and whole packages, for which no word alignment could be found, either because they did not co-occur in any sentence pair, or because they could not be word-aligned. Table 3-3 shows the remaining data sets.

part of speech   original packages   after bilingual indexing   after word alignment
adjectives        6,900               4,670                      1,240
nouns            15,600              11,360                      3,690
verbs             4,500               3,930                      1,680
total            27,000              19,960                      6,610

Tab. 3-3: Data sets (packages) available at the beginning, after bilingual indexing, and after word alignment.

It can be seen that only 6,600 packages out of 27,000 could be used for the experiment.
So, even in a large parallel corpus, parallel data for contextual transfer selection can be provided for only about 25% of the entries. As a consequence, a working system must provide additional means of transfer selection, beyond parallel-corpus-driven automatic extraction.

Step 3: Classification
The resulting subcorpora were classified according to the data which were available for disambiguation:

• class 1: all transfers of a package have at least five sentences where this transfer is used
• class 2: each transfer of a package has at least one sentence in which it occurs
• class 3: a package contains one or several transfers with no occurrence in any sentence pair.

Of the remaining subcorpora (one per package), only one third shows at least five sentences per transfer. Nouns are slightly better represented than verbs and adjectives, cf. Tab. 3-4.

            adj             noun            vrb⁵            total
packages    1244            3693            1677            6614
class 1      301  24.2%     1370  37.1%      447  26.6%     2118  32.0%
class 2      667  53.6%     1789  48.4%      859  51.2%     3315  50.1%
class 3      276  22.2%      534  14.5%      371  22.2%     1181  17.9%

Tab. 3-4: Distribution of corpus data for the different packages: assignment to classes

⁴ This was possible as the XFR lexicon contains only single word transfers.
⁵ Note that verb statistics are not fully accurate, as verbs with separated prefixes (‘kommt ... an’) are not correctly lemmatised.

From these subcorpora, about 1000 sentences of class 1 were set aside to be used as a test set; the rest was used for training.

4. Creation of the Lt-Xfr Lexicons

The analysis of the package coverage showed that sufficiently many contexts would only be available for one third of the translation entries resulting from subcorpus collection. To provide disambiguation means for the other entries, additional information had to be provided.
Therefore a strategy was adopted which is based on two kinds of information:

• conceptual context clusters, as the original approach suggested. These data are collected in a conceptual lexicon (ConcLex);
• a translation based on frequency information as a fallback: in case no cluster is available, different probability measures are used for transfer selection. The probabilities are collected in a probability lexicon (ProbLex).

Both lexicons are consulted at runtime, in sequential order.

4.1 Conceptual Lexicon

The conceptual lexicon is created by analysing the subcorpus attached to each entry for co-occurrences: all lemmata of all sentences of the subcorpus are compared, and the ones with the best co-occurrence score are taken.

Experiments to restrict the resulting clusters to a certain size, or to use a threshold, showed that data sparsity makes it necessary to leave essentially all candidates in the clusters. Also, lemmata co-occurring with several transfers of a package were not eliminated but left in the clusters, as they could still help to disambiguate from other translation candidates, and eliminating them would leave many transfers with very few contexts. In addition, experiments to include the distance between the co-occurring word and the lemma in the weight, as an additional precision measure, have been postponed; their influence would only be marginal⁶.

The output of the component which builds the conceptual lexicon is the lexicon itself (ConcLex). It gives, for each source language package defined by , a list of translations, each consisting of: the translation, its part-of-speech, and an optional cluster of variable size, consisting of pairs of , the weight giving the strength of the co-occurrence. Such a cluster can be matched against the context lemmata of an input sentence at runtime, and their similarity can be computed. An example is given in fig. 4-1.

In case a translation has no example sentences, the conceptual cluster for this translation must be left empty.
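To make the cluster idea concrete, here is a minimal Python sketch of how such a lexicon could be derived from the subcorpora. It rests on assumptions that the text does not confirm: the co-occurrence score is taken to be the share of subcorpus sentences in which a context lemma appears, each cluster is stored as a mapping from context lemma to weight, part of speech is ignored, and the names `build_cluster` and `build_conclex` are hypothetical.

```python
from collections import Counter


def build_cluster(source_lemma, subcorpus):
    """Weight every context lemma by the fraction of subcorpus sentences
    in which it co-occurs with the source lemma (assumed score).

    subcorpus: list of lemmatised source sentences for one translation.
    An empty subcorpus yields an empty cluster.
    """
    if not subcorpus:
        return {}
    counts = Counter()
    for sentence in subcorpus:
        # count each lemma once per sentence, excluding the source lemma
        counts.update(set(sentence) - {source_lemma})
    return {lemma: n / len(subcorpus) for lemma, n in counts.items()}


def build_conclex(packages, subcorpora):
    """Build source lemma -> {translation -> cluster} for all packages."""
    return {
        source: {
            target: build_cluster(source, subcorpora.get((source, target), []))
            for target in targets
        }
        for source, targets in packages.items()
    }


# Toy data (invented for illustration)
packages = {"Schuld": ["debt", "guilt", "fault"]}
subcorpora = {
    ("Schuld", "debt"): [["Bank", "Schuld", "zahlen"], ["Schuld", "Bank"]],
    ("Schuld", "guilt"): [["Gericht", "Schuld"]],
}
conclex = build_conclex(packages, subcorpora)
```

No size limit or threshold is applied, in line with the observation above that sparse data make it necessary to keep essentially all candidates; lemmata shared between several translations are kept as well.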
To still include translations without a cluster in the transfer selection, a fallback strategy based on probabilities was implemented.

⁶ An exception may be certain prepositions indicating strict subcategorisation; to be investigated.

Fig. 4-1: Example of conceptual lexicon

4.2 Probability Lexicon

In case the conceptual transfer does not lead to a result (and this case is rather likely, given the number of transfers which have no context because there were no sentences), a fallback strategy is used, which consists in computing a translation probability score.

Previous experiments in the creation of the LinguaDict lexicon have shown that simply using the (target monolingual) corpus frequency of a translation is not the best option: we want to know how often the target lemma occurs as a translation of a given source lemma. Otherwise target lemmata which are very frequent (like ‘be’ or ‘have’) disturb the transfer selection. Another relevant factor is the number of words for which a given target lemma is a translation: if a target lemma has high frequency as the translation of only one source word, this is much more important than if the frequency results from the fact that it is the translation of e.g. five source words.

Therefore a more complex approach than simple target corpus frequency was taken: the translation probability consists of three scores, differing in the reference used to compute them. These scores are:

• Package probability: probability of a given translation in relation to the other translations of this package. Number of sentences for the given translation divided by the total number of sentences in this package (0 for all transfers without a sentence in the subcorpus);
• Target probability: probability of a given translation in relation to other source terms (i.e. for how many SL lemmas is this a possible transfer?) (0 for all terms which are in no package);
• Corpus probability: probability that this translation is used at all in the target language.
  Number of occurrences of the TL lemma divided by the number of lemmata in the total TL corpus.

Querying of the probability lexicon is done sequentially, i.e. if a score is zero then the next ‘weaker’ score is taken. In the end, a corpus probability is nearly always available⁷.

⁷ Only in this particular setup. Note that many entries of LinguaDict do not have any reference in the corpus.

The format of the probability lexicon is a tuple of . An example is given in fig. 4-2.

Fig. 4-2: Example of the probabilistic lexicon⁸

⁸ Words without even a corpus probability could be due to lemmatisation / tagging errors.

These two lexicons (ConcLex and ProbLex) are used for the test and evaluation. The challenge is to determine the transfer of a source lemma based on the context in which it occurs.

5. Test and Evaluation

The transfer selection component is tested by determining the transfer of a test lemma in a given sentence context, and comparing it with the one in a reference translation. In the best case, all translations proposed by the LT-Xfr component are identical to the transfers selected in the reference translations.

As the LinguaDict lexicon contains many near translations, which can hardly be distinguished on the basis of conceptual transfer, a special evaluation procedure was adopted, consisting of three ranks instead of a binary decision:

• Rank 1: the translation proposed by the system is identical to the one in the test reference sentence
• Rank 2: the proposed translation is close or synonymous to the one in the test reference sentence. This was decided to be the case if
  o the proposed translation belongs to the same WordNet synset as the reference
  o the proposed translation is orthographically similar to the reference (like ‘electric’ vs. ‘electrical’, ‘agglutinating’ vs. ‘agglutinative’, ‘dialogue’ (UK) vs. ‘dialog’ (US) etc.)
• Rank 3: the two translations are (still) different.
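The three-rank decision can be expressed as a small function. The Python sketch below is illustrative only and makes several assumptions: synsets are given as plain sets of lemmata, locale differences are handled by a lookup table, and orthographic similarity is reduced to a single toy suffix rule (‘-ical’ is treated like ‘-ic’). The names `normalise_spelling` and `rank_translation` are hypothetical.

```python
def normalise_spelling(word, locale_variants):
    """Map a word to a canonical spelling.

    locale_variants: dict mapping one locale spelling to the other,
    e.g. UK -> US. The suffix rule is a toy stand-in for pattern matching.
    """
    word = locale_variants.get(word, word)
    if word.endswith("ical"):
        word = word[:-2]  # 'electrical' -> 'electric'
    return word


def rank_translation(proposed, reference, synsets, locale_variants):
    """Return 1 (identical), 2 (synonym or spelling variant) or 3 (different)."""
    if proposed == reference:
        return 1
    if any(proposed in s and reference in s for s in synsets):
        return 2
    if normalise_spelling(proposed, locale_variants) == normalise_spelling(
        reference, locale_variants
    ):
        return 2
    return 3


# Toy resources (invented for illustration)
synsets = [{"car", "automobile"}]
locale_variants = {"dialogue": "dialog"}

ranks = [
    rank_translation("debt", "debt", synsets, locale_variants),
    rank_translation("automobile", "car", synsets, locale_variants),
    rank_translation("electrical", "electric", synsets, locale_variants),
    rank_translation("dialogue", "dialog", synsets, locale_variants),
    rank_translation("guilt", "debt", synsets, locale_variants),
]
```

A pair such as ‘agglutinating’ vs. ‘agglutinative’ would need a further pattern and is ranked 3 by this toy version.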
Evaluation would accept rank 1 and rank 2 results, and reject rank 3 results. Based on the three ranks, a simple scoring system is used (rank 1 = 1, rank 2 = 2, rank 3 = 3) to compute an overall score: the lower the score, the closer the translation is to the reference.

5.1 Test data

5.1.1 Test corpus

The test corpus was taken from the subcorpora used for the research (cf. section 3.3 above). 1044 sentences were extracted, containing transfers for nouns, verbs, and adjectives.

5.1.2 Resources for ranking

For ranking (esp. rank 2: similarity), two additional resources were produced:

• an indexed version of WordNet V3, from which a list of possible synonyms can be retrieved for a given input lemma (i.e. the synset lemmata⁹).
• a resource for orthographic similarity. For all parts of speech, a resource was used which unifies US and UK spelling (this list contains about 4,700 entries). For adjectives, additional patterns were considered, like ‘adj + -ed’ (‘abstract’ vs. ‘abstracted’), ‘adj-ic + al’ (‘acoustic’ vs. ‘acoustical’) etc. The test frame applies pattern matching for the strings, and simple lookup for the differences in locale.

⁹ As the test lexicon contains only single words, only the single words of the synsets were taken.

5.1.3 Test frame

It was not possible with the available resources to integrate the LT-Xfr component into a complete MT system. Therefore a special test system was written which takes a translation candidate (source lemma) and a sentence context as input, and returns the ‘best matching’ transfer (target lemma). The returned lemma can be compared to the reference translation, and ranked: in case they are not identical, it can be checked whether they are both in the same WordNet synset, or are orthographically similar (rank 2).
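A lookup of this kind can be sketched as follows in Python. The sketch is a guess at the general shape, not the actual test frame: the similarity between a cluster and the context is assumed to be the sum of the weights of the cluster lemmata found in the sentence, the sequential fallback is read as "use the first probability score that is non-zero for at least one candidate", and the data layout and the name `select_transfer` are hypothetical.

```python
def select_transfer(source_lemma, context_lemmas, conclex, problex):
    """Return the best matching target lemma, or None if nothing is known.

    conclex: source -> {target -> {context lemma -> weight}}
    problex: source -> {target -> {"package": p, "target": p, "corpus": p}}
    """
    context = set(context_lemmas) - {source_lemma}

    # 1. Conceptual lexicon: pick the cluster with the largest overlap.
    best, best_score = None, 0.0
    for target, cluster in conclex.get(source_lemma, {}).items():
        score = sum(w for lemma, w in cluster.items() if lemma in context)
        if score > best_score:
            best, best_score = target, score
    if best is not None:
        return best

    # 2. Probability lexicon: fall back from stronger to weaker scores.
    candidates = problex.get(source_lemma, {})
    for key in ("package", "target", "corpus"):
        scored = {t: p[key] for t, p in candidates.items() if p[key] > 0}
        if scored:
            return max(scored, key=scored.get)
    return None


# Toy lexicons (invented for illustration)
conclex = {
    "Schuld": {
        "debt": {"Bank": 1.0, "zahlen": 0.5},
        "guilt": {"Gericht": 1.0},
        "fault": {},
    }
}
problex = {
    "Schuld": {
        "debt": {"package": 0.6, "target": 0.5, "corpus": 0.001},
        "guilt": {"package": 0.4, "target": 0.5, "corpus": 0.002},
        "fault": {"package": 0.0, "target": 0.2, "corpus": 0.0005},
    }
}

by_context = select_transfer(
    "Schuld", ["Gericht", "feststellen", "Schuld"], conclex, problex
)
by_fallback = select_transfer(
    "Schuld", ["Lehrer", "sehen", "Schuld"], conclex, problex
)
```

The first call is decided by the conceptual cluster (‘Gericht’ points to ‘guilt’); the second finds no cluster overlap and falls back to the package probability, which favours ‘debt’.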
5.2 Test systems

Two test systems were built:

• one with the full component (called Lt-Xfr below), with all options produced, and both the conceptual and the probability lexicon
• one with only the fallback (called Lt-Xfr-frq below), using the probability lexicon but not the conceptual lexicon; this is relevant in cases where no conceptual context information would be available.

For comparison, the test sentences were also given as input to several available MT systems, with both statistical and rule-based architectures. Their translations of the test lemmata were extracted, and also ranked according to the three ranks chosen (again using the synset and the orthographic similarity).

5.3 Test results

First, the output of the two Lt-Xfr systems was evaluated against the reference translation (absolute evaluation), and then it was compared to the output of the other MT systems (comparative evaluation). Results are shown in Table 5-1.

5.3.1 Absolute Evaluation

For this evaluation, the test sentences were analysed with the LT-Xfr test frame, and the resulting transfer was compared to the reference translation. As explained, this procedure was done for two system variants:

• One which takes both the conceptual and the probability lexicon (Lt-Xfr)
• One which searches transfers based only on probability information (Lt-Xfr-frq)

It can be seen that 60% of the test terms are correctly translated (rank 1), and if WordNet and string-similarity synonyms are taken into account, then 75% of the test sentences return a correct transfer. The values are similar for all parts of speech, with verbs doing a bit better than the others.

As a result, if a random selection of transfers is assumed as a baseline (with about 41% correctness), then Lt-Xfr improves over the baseline by 34% absolute and 83% relative; the improvement is most significant for verbs (with more than 100% relative).
For the fallback system (frequency-based only), the improvement is still 25.6% absolute and 61.6% relative.

5.2.2 Comparative Evaluation

To gauge how the result compares to the state of the art, the test sentences were also translated with several available MT systems: one SMT and four RMT systems. The translations of the test words were identified in the output and compared to the reference translation. As in the absolute evaluation, total (rank1) and partial (rank2) identity were computed, as well as the overall scores. Tab. 5-1 shows the evaluation result.

It can be seen that the Lt-Xfr system shows the best performance of all systems in all categories. It has much better scores than all RMT systems, and also better scores than Google. It is 20% absolute better than the weakest MT system, and still 7% better than the best-performing one. Even the frequency-based fallback version (Lt-Xfr-frq) outperforms all RMT systems, and is better than Google in three of six categories (Verbs/1, Verbs/1+2, Adj/1+2).

Tab. 5-1: Evaluation results, compared to the reference. Number of sentences, ranks (sentences, percentage), per part of speech, total, and score, for all systems.

The result shows that a significant improvement in transfer selection can be achieved with the techniques used by Lt-Xfr, compared to state-of-the-art MT systems. More detailed information on the evaluation is given in PANACEA deliverable D7.4.

6. PANACEA Integration

Due to the lack of resources, it was not possible to provide a full workflow for building the two lexicons for additional language directions. Such a workflow would require the following:

• input data, consisting of large bilingual transfer lexicons and large parallel corpora, must be provided
• the tools in the processing chain must be streamlined and brought into a better sequence; exploratory steps can be skipped.

However, to demonstrate the scope of the tool, the test frame was made available as a web service in the PANACEA registry.

6.1 Formats

The web service is registered in the PANACEA registry under the following address: http://80.190.143.163/panaceaV2/services/LTXfr?wsdl

Parameters are:

• source language (only ‘de’ is supported)
• target language (only ‘en’ is supported)
• text (a string containing a URL pointing to the input file)

The input must be a UTF-8 file with a 3-column layout:

• source language lemma (in normalised form, incl. lowercasing); only single-word lemmata are supported so far
• part of speech of the source lemma (No for noun, Vb for verb, Ad for adjective)
• context sentence (containing the lemma to be translated)

The resulting file has the same format, with an additional column inserted which contains the translation the system proposes for this particular sentence context. An example is given in fig. 6-1.

Fig. 6-1: Example of input and output of the Lt-Xfr lookup service

7. Assessment

7.1 Relevance

The transfer strategy presented here is just one of several possible transfer strategies; others are transfer selection based on external information (topic, locale etc., which is passed to the transfer selection component by external features), or based on morphosyntactic content.
The approach presented here shows the following features:

• It fits the architecture of rule-based systems inasmuch as it provides transfer selection on the source side, not on the target side, and controls the transfer selection strategies for such systems.
• It can be used as an additional information source, as it provides a static resource which can easily be linked to a system: most MT systems have an internal structure for their transfer packages, like a sequence of tests, and there is always a ‘default’ translation for cases where all tests fail. This is the place where the current resource could be used successfully, i.e. for cases where no system-relevant information is available.
• As it relates to conceptual contexts, and is not linked to a particular syntactic structure or configuration, it is more robust than current selection strategies, which usually fail when the required syntactic structure is not built (e.g. due to a parse failure). So it could be used as a fallback in cases where the analysis component returns improper results.
• For the same reason, the approach is independent of the specific system structure, the type of analysis results, syntactic structures etc.; it can support shallow MT systems just as well as all kinds of deep RMT. It simply feeds additional information into the transfer selection process which has not been used up to now.

7.2 Quality

The quality of the component crucially depends on the quality of the match between the text context and the clusters of the conceptual lexicon.

1. One option to improve the matching is to extend the context from sentences to paragraphs; this step was taken in [Thurmair 2005] and improves transfer quality to a level of 96% accuracy. However, most of the parallel data available today are aligned on sentence level, not on paragraph level, so such an approach would be difficult to train.

2. Another option is to review the clusters.
A look at the current clusters shows that they contain lemmata which would be considered irrelevant for transfer selection, and that other lemmata which would be expected are missing.

Fig. 7-1: Cluster for blatt -> leaf

In fig. 7-1, the translation of (de) ‘blatt’ -> (en) ‘leaf’ (as opposed to ‘sheet’ or ‘newspaper’) would be corroborated by ‘zweig’, ‘frucht’, and also by missing terms like ‘ast’ or ‘blüte’, whereas ‘patentieren’, ‘zypern’ etc. would not really contribute to the disambiguation of this reading. Additional missing concepts could be collected by monolingual correlation analysis, adding lemmata which are highly correlated with the terms of a given seed cluster. Such a strategy could provide good additional terms.

3. Clusters suffer from data sparsity, and the more so the less frequent the translations are: many transfers in the conceptual lexicon simply have no conceptual context information at all. If there is no seed cluster, there is no monolingual extension either.

4. Therefore, to improve quality, an option must be foreseen to have the conceptual lexicon edited by human coders: they should be able to add and remove terms to improve the cluster accuracy, and to adapt the transfer to specific types of texts, contexts, or other needs. Human editing would require a review of the current scoring mechanism, to be changed e.g. into a simple three-level score (very relevant, relevant, somewhat relevant), and the lookup would have to be adapted accordingly.

5. Transfer selection on the target side, as done in SMT, has only one advantage: idiomatic expressions, created by an idiosyncratic combination of two target words, are easier to resolve on the target side. However, such expressions can easily be added to the transfer lexicon (just as they have to be added to the training data on the other side); they would not even create ambiguities in transfer selection.
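Item 4 above suggests a three-level relevance score for hand-edited clusters. What an adapted lookup might look like is sketched below, purely as an illustration: the numeric weights, the function name, the levels assigned to each lemma, and the ‘sheet’ cluster entries are all invented here and do not describe the existing Lt-Xfr scoring.

```python
# Hypothetical weights for the three relevance levels named in item 4.
LEVEL_WEIGHTS = {"very relevant": 3, "relevant": 2, "somewhat relevant": 1}

# Toy, hand-edited clusters for (de) 'blatt'. The levels are invented for
# illustration, and the 'sheet' entries do not come from the report.
CLUSTERS = {
    "leaf": {"zweig": "very relevant", "ast": "very relevant",
             "blüte": "relevant", "frucht": "relevant"},
    "sheet": {"papier": "very relevant", "drucken": "somewhat relevant"},
}

def select_transfer(context_lemmata, clusters, fallback):
    """Pick the transfer whose cluster best matches the context lemmata."""
    context = set(context_lemmata)
    scores = {
        target: sum(LEVEL_WEIGHTS[level]
                    for lemma, level in cluster.items() if lemma in context)
        for target, cluster in clusters.items()
    }
    best = max(scores, key=scores.get)
    # No cluster term found in the context: use the fallback transfer,
    # e.g. the most probable one from the probability lexicon.
    return best if scores[best] > 0 else fallback

print(select_transfer(["ast", "tragen", "blatt", "frucht"], CLUSTERS, "sheet"))
print(select_transfer(["haus", "kaufen"], CLUSTERS, "sheet"))
```

With discrete levels, a human coder only has to decide which of three bins a lemma belongs to, rather than tune a numeric score.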
7.3 Extensions

To stabilise the results of the current investigation, the following items must be considered:

1. The analysis used only a subset of the lexicon; multiword entries and entries with changes in the part of speech were not considered.
• Adding multiwords requires more sophistication in the word alignment step; GIZA++ would no longer be the appropriate tool here, and full MOSES phrase alignment may be necessary.
• As for part-of-speech changes, the most frequent case in German->English is that adjectives used adverbially must be translated as adverbs. Care has been taken that the part of speech is given to the lookup as one of the input parameters; this can help in transfer selection.

2. The processing chain needs to be stabilised so that the component can be run with other lexicon and corpus data.

3. The coverage must be extended to other language directions. This would require large sentence-aligned bilingual corpora, a bilingual lexicon created from such a corpus e.g. with the Lt-P2G tool (see task 5.2 of PANACEA, or [Thurmair/Aleksić 2012]), and language resources for analysis, esp. lemmatiser and tagger information. Optionally, for ranking and evaluation, a WordNet and an orthographic similarity resource could be used for the target language.

4. The cluster building itself could be improved by collapsing transfers which are clearly synonyms or variants of each other (e.g. locale) before the analysis, rather than afterwards in the ranking step. This would provide more data for such readings in clustering.

8. Citations

• Agirre, E., Edmonds, Ph., eds., 2006: Word Sense Disambiguation. Springer
• Brown, P., Della Pietra, St., Della Pietra, V., Mercer, R., 1991: Word-sense disambiguation using statistical methods. Proc.
29th ACL
• Carl, M., Melero, M., Badia, T., Vandeghinste, V., Dirix, P., Schuurman, I., Markantonatou, St., Sofianopoulos, S., Vassiliou, M., Yannoutsou, O., 2008: METIS-II: low resource machine translation. In: Machine Translation 22, 1-2, p. 67-99
• Koehn, Ph., 2010: Statistical Machine Translation. Cambridge Univ. Press
• Santos, D., 2000: The translation network, A model for a fine-grained description of translations. In: Véronis, ed.: Parallel Text Processing. Kluwer
• Thurmair, Gr., 1990: Complex lexical transfer in METAL. Proc. 3rd TMI Conf., Austin, Tx
• Thurmair, Gr., 2005: Improving Machine Translation Quality. Proc. MT Summit X, Phuket
• Thurmair, Gr., Aleksić, V., 2012: Creating term and lexicon entries from phrase tables. Proc. EAMT, Trento
• Thurmair, Gr., Aleksić, V., Schwarz, Chr., 2012: Large scale lexical analysis. Proc. LREC, Istanbul
• Tyers, F.M., Sánchez-Martínez, F., Forcada, M.L., 2012: Flexible finite-state lexical selection for rule-based machine translation. Proc. EAMT, Trento