Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

Overall

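Abstract

Neural machine translation is a recently proposed approach to machine translation. Unlike traditional statistical machine translation, neural machine translation aims to build a single neural network that can be jointly trained (tuned) to maximize translation performance. The recently proposed models for neural machine translation belong to the encoder-decoder family, in which an encoder encodes a source sentence into a fixed-length vector and a decoder generates a translation from that vector. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture. To address this, we propose to let the model automatically (soft-)search for the parts of a source sentence that are relevant to predicting a target word, without having to form these parts explicitly as hard segments. With this new approach, we achieve translation performance comparable to the existing state-of-the-art phrase-based system on English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.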
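The soft search described in the abstract can be illustrated with one decoding step of an additive alignment model. This is a minimal plain-Python sketch, assuming the alignment score form $e_j = v_a^\top \tanh(W_a s_{i-1} + U_a h_j)$ from the paper's appendix, toy 2-dimensional vectors, and hand-picked rather than learned weights. The function and variable names (`additive_attention`, `W_a`, `U_a`, `v_a`) are illustrative choices, and the annotations `h_j` here are made-up numbers rather than real encoder outputs.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, v):
    # Plain-Python matrix-vector product; W is a list of rows.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def additive_attention(s_prev, annotations, W_a, U_a, v_a):
    """Soft search over source annotations h_1..h_T for one target step.

    e_j     = v_a^T tanh(W_a s_prev + U_a h_j)   (alignment score)
    alpha_j = softmax(e)_j                        (soft alignment weight)
    c       = sum_j alpha_j h_j                   (context vector)
    """
    Ws = matvec(W_a, s_prev)
    scores = []
    for h in annotations:
        Uh = matvec(U_a, h)
        hidden = [math.tanh(a + b) for a, b in zip(Ws, Uh)]
        scores.append(sum(v * x for v, x in zip(v_a, hidden)))
    alphas = softmax(scores)
    dim = len(annotations[0])
    # Weighted sum of all annotations, one coordinate at a time.
    context = [sum(a * h[k] for a, h in zip(alphas, annotations)) for k in range(dim)]
    return alphas, context

# Toy example: previous decoder state and three source annotations.
s_prev = [1.0, 0.0]
annotations = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
identity = [[1.0, 0.0], [0.0, 1.0]]
alphas, context = additive_attention(s_prev, annotations, identity, identity, [1.0, 1.0])
print([round(a, 3) for a in alphas], [round(c, 3) for c in context])
```

The weights `alphas` act as the soft alignment: every source position contributes to the context vector in proportion to its score, instead of the decoder relying on a single fixed-length summary of the whole sentence.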

Introduction

Machine Translation

Machine translation (MT) is a sub-field of natural language processing (NLP) and computational linguistics that uses software to translate text, documents, or speech from one language to another[2].

Previous Approaches

Statistical Machine Translation

Statistical machine translation (SMT) is a machine translation paradigm where translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora[3].

The main idea of the statistical approach comes from information theory (the noisy-channel model). Given an input sentence $\mathbf{x}$, the probability of a candidate output sentence $\mathbf{y}$ is expressed as the conditional probability:

$$ p(\mathbf{y} | \mathbf{x}) $$

The task of the model is then to approximate the distribution $p(\mathbf{y}|\mathbf{x})$. One of the best-known statistical approaches applies Bayes' theorem, $p(\mathbf{y}|\mathbf{x}) \propto p(\mathbf{x}|\mathbf{y}) p(\mathbf{y})$, where the translation model $p(\mathbf{x}|\mathbf{y})$ is the probability that the source string $\mathbf{x}$ is the translation of the target string $\mathbf{y}$, and the language model $p(\mathbf{y})$ is the probability of seeing the target-language string $\mathbf{y}$ [3]. The proportionality holds because the denominator $p(\mathbf{x})$ is fixed for a given input, so it does not affect which $\mathbf{y}$ is best. Therefore, we can write the translation problem as follows:

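$$ \tilde{\mathbf{y}} = \operatorname*{argmax}_{\mathbf{y} \in \mathbf{y}^*} p(\mathbf{y}|\mathbf{x}) = \operatorname*{argmax}_{\mathbf{y} \in \mathbf{y}^*} p(\mathbf{x}|\mathbf{y})\,p(\mathbf{y}) $$

where $\mathbf{y}^*$ denotes the set of all candidate target-language strings.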
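The decision rule above can be sketched as a toy noisy-channel decoder. The probability tables are hypothetical numbers chosen for illustration, and the candidates are enumerated explicitly; a real SMT system learns these models from bilingual corpora and searches a vastly larger space of candidates. The function name `noisy_channel_decode` is likewise illustrative.

```python
import math

# Hypothetical toy models (not from the source). Here x is French and y is English.
# Translation model p(x | y): how well English candidate y explains the French input x.
TRANSLATION_MODEL = {
    ("la maison bleue", "the blue house"): 0.30,
    ("la maison bleue", "the house blue"): 0.35,
    ("la maison bleue", "a blue home"): 0.10,
}
# Language model p(y): how fluent y is as English.
LANGUAGE_MODEL = {
    "the blue house": 0.020,
    "the house blue": 0.001,
    "a blue home": 0.005,
}

def noisy_channel_decode(x, candidates):
    """Return argmax over a finite list of candidates y of p(x|y) * p(y)."""
    def score(y):
        # Sum of logs instead of a product, so small probabilities do not underflow.
        return math.log(TRANSLATION_MODEL[(x, y)]) + math.log(LANGUAGE_MODEL[y])
    return max(candidates, key=score)

best = noisy_channel_decode("la maison bleue", list(LANGUAGE_MODEL))
print(best)  # prints "the blue house"
```

The translation model alone would prefer the literal word order "the house blue" (0.35 vs 0.30), but the language model penalizes it, so the product selects "the blue house" (0.006 vs 0.00035).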

In addition to Bayes' theorem, statistical machine translation uses various other statistical techniques to approximate this probability distribution.

Rule-based Machine Translation

Rule-based machine translation (RBMT; the "classical approach" to MT) refers to machine translation systems based on linguistic information about the source and target languages, retrieved mainly from (unilingual, bilingual, or multilingual) dictionaries and grammars that cover the main semantic, morphological, and syntactic regularities of each language[4].
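To make the dictionary-plus-grammar idea concrete, here is a toy English-to-French sketch with one bilingual lexicon and one syntactic transfer rule (adjective-noun reordering). The lexicon, the rule, and the name `translate_rule_based` are hypothetical. The lexicon simply hard-codes the feminine forms needed for this one sentence; a real RBMT system would derive gender agreement from morphological rules.

```python
# Hypothetical bilingual lexicon: English word -> (French word, part of speech).
# Feminine forms are hard-coded because "maison" is feminine; there are no agreement rules.
LEXICON = {
    "the": ("la", "DET"),
    "blue": ("bleue", "ADJ"),
    "house": ("maison", "NOUN"),
    "is": ("est", "VERB"),
    "big": ("grande", "ADJ"),
}

def translate_rule_based(sentence):
    """Dictionary lookup followed by one syntactic transfer rule."""
    # Analysis: look up each word's translation and part of speech
    # (raises KeyError for words missing from the toy lexicon).
    tagged = [LEXICON[w] for w in sentence.lower().split()]
    out = []
    i = 0
    while i < len(tagged):
        word, pos = tagged[i]
        # Transfer rule: English ADJ NOUN becomes French NOUN ADJ.
        if pos == "ADJ" and i + 1 < len(tagged) and tagged[i + 1][1] == "NOUN":
            out.extend([tagged[i + 1][0], word])
            i += 2
        else:
            out.append(word)
            i += 1
    return " ".join(out)

print(translate_rule_based("The blue house is big"))  # prints "la maison bleue est grande"
```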

Example-based Machine Translation

Example-based machine translation (EBMT) is a method of machine translation often characterized by its use of a bilingual corpus with parallel texts as its main knowledge base at run-time. It is essentially translation by analogy and can be viewed as an implementation of a case-based reasoning approach to machine learning[5].
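Translation by analogy can be sketched as follows: find the stored example whose source side differs least from the input, then patch the corresponding words on its target side. The corpus, glossary, position-by-position matching, and the name `translate_by_analogy` are hypothetical simplifications; practical EBMT systems use richer matching and recombine sub-sentential fragments. This sketch also ignores agreement (for instance, it would not change "un" to "une" if the substituted noun were feminine).

```python
# Hypothetical parallel corpus (English -> French) used as the example base.
EXAMPLES = [
    ("i have a red car", "j'ai une voiture rouge"),
    ("she reads a book", "elle lit un livre"),
]
# Hypothetical word-level glossary used to patch mismatched words.
GLOSSARY = {
    "red": "rouge",
    "blue": "bleue",
    "book": "livre",
    "newspaper": "journal",
}

def translate_by_analogy(sentence):
    """Reuse the closest stored example and substitute the differing words."""
    words = sentence.lower().split()
    best_tgt, best_diff = None, None
    for src, tgt in EXAMPLES:
        src_words = src.split()
        if len(src_words) != len(words):
            continue  # this toy matcher only compares equal-length sentences
        # Word pairs (example word, input word) at positions where they differ.
        diff = [(a, b) for a, b in zip(src_words, words) if a != b]
        if best_diff is None or len(diff) < len(best_diff):
            best_tgt, best_diff = tgt, diff
    if best_tgt is None:
        return None  # no usable example
    tgt_words = best_tgt.split()
    for old, new in best_diff:
        # Replace the translation of the example's word with that of the input's word.
        tgt_words = [GLOSSARY[new] if w == GLOSSARY[old] else w for w in tgt_words]
    return " ".join(tgt_words)

print(translate_by_analogy("I have a blue car"))  # prints "j'ai une voiture bleue"
```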