Machine Translation (MT) has been championed as an effective technology for knowledge transfer from English to languages with less digital content. An example of such efforts is the automatic translation of English Wikipedia into languages with smaller collections, such as Arabic. However, MT quality is still far from ideal for many languages and text genres. When a document is translated, many sentences are rendered poorly, which can yield incorrect text and confuse the reader. Moreover, some of these sentences are not very informative and could be dropped through summarization to make a more cohesive document. Thus, for tasks in which complete translation is not mandatory, MT can be effective if the system provides an informative subset of the content with higher translation quality. In this scenario, text summarization can support MT by keeping only the most important and informative parts of a given document for translation. In this work, we demonstrate a framework combining MT and text summarization that replaces the baseline translation with a summary whose translation quality is higher than that of the full translation. To this end, we combine a state-of-the-art English summarization system with a novel framework for predicting MT quality without references. Our framework is composed of the following major components: (a) a standard machine translation system, (b) a reference-free MT quality estimation system, (c) an MT-aware summarization system, and (d) an English-Arabic sentence matcher. More specifically, our English-Arabic system reads in an English document along with its baseline Arabic translation and outputs, as a summary, a subset of the Arabic sentences selected for their informativeness and their translation quality.
We demonstrate the utility of our system by evaluating its translation and summarization quality, and show that we can balance improving MT quality against maintaining decent summarization quality. For summarization, we conduct both reference-based and reference-free evaluations and observe performance in the range of the state-of-the-art system. Moreover, the translation quality of the summaries shows a notable improvement over the baseline translation of the entire documents. This MT-aware summarization approach can be applied to the translation of texts such as Wikipedia articles. For such domain-rich articles, translation quality varies widely across sections, so an intelligent reduction of the translation task results in an improved final outcome. Finally, the framework is mostly language independent and can be easily customized for different target languages and domains.
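The selection step described above, which picks a subset of Arabic sentences by weighing informativeness against translation quality, can be illustrated with a minimal sketch. This is not the system's actual method: the `SentencePair` type, the `select_summary` function, the linear weighting via `quality_weight`, and the assumption that both scores lie in [0, 1] are all hypothetical choices made for illustration. The scores themselves would come from the summarization and quality estimation components, which are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class SentencePair:
    """One English sentence matched to its baseline Arabic translation."""
    english: str
    arabic: str
    informativeness: float  # hypothetical summarizer score, assumed in [0, 1]
    mt_quality: float       # hypothetical reference-free quality estimate, assumed in [0, 1]


def select_summary(pairs, budget, quality_weight=0.5):
    """Return up to `budget` Arabic sentences, kept in document order.

    Sentences are ranked by a linear mix of informativeness and estimated
    translation quality. The linear mix is an illustrative assumption.
    """
    if not 0.0 <= quality_weight <= 1.0:
        raise ValueError("quality_weight must be in [0, 1]")

    def combined(pair):
        return ((1.0 - quality_weight) * pair.informativeness
                + quality_weight * pair.mt_quality)

    # Rank sentence indices by combined score, best first.
    ranked = sorted(range(len(pairs)),
                    key=lambda i: combined(pairs[i]),
                    reverse=True)
    # Keep the top `budget` indices, then restore document order.
    chosen = sorted(ranked[:budget])
    return [pairs[i].arabic for i in chosen]


# Placeholder strings stand in for real sentences.
pairs = [
    SentencePair("en-1", "ar-1", informativeness=0.9, mt_quality=0.2),
    SentencePair("en-2", "ar-2", informativeness=0.6, mt_quality=0.8),
    SentencePair("en-3", "ar-3", informativeness=0.3, mt_quality=0.9),
    SentencePair("en-4", "ar-4", informativeness=0.2, mt_quality=0.1),
]
summary = select_summary(pairs, budget=2)
```

Setting `quality_weight` to 0 reduces this sketch to a plain extractive summarizer, while raising it favors well-translated sentences at the cost of informativeness, which is the trade-off the evaluation above refers to.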

