
Decoding with Value Networks for Neural Machine Translation

This work introduces NMT-VNN, which incorporates a value network into machine translation decoding. The value network \(v(X, Y)\) is trained to tell which of two partial decodings, \(Y^{p,1}\) and \(Y^{p,2}\), is better. Both are generated by beam search with a given translation model \(\pi(y|X, Y)\). The comparison is based on the average BLEU score of the complete target sentences obtained by finishing \(Y^{p,1}\) and \(Y^{p,2}\). At each decoding step \(T\), they maximize

\[\begin{align} \label{eq:he2017decoding:1} \alpha\frac{\sum_{i=1}^{T} \log P(y_i|X, Y_{< i})}{T} + (1-\alpha) \log v(X, Y_{\le T}) \end{align}\]

where $\alpha$ is the balancing factor.
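The scoring rule above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the per-token log-probabilities are already available from the translation model and that the value network outputs a number in \((0, 1]\) so that \(\log v\) is defined. The names `nmt_vnn_score` and `pick_best` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def nmt_vnn_score(token_log_probs: Sequence[float], value: float, alpha: float) -> float:
    """Score a partial decoding Y_{<=T} as in the equation above.

    token_log_probs: log P(y_i | X, Y_{<i}) for i = 1..T (assumed precomputed).
    value: v(X, Y_{<=T}); assumed to lie in (0, 1] so that log v is finite.
    alpha: balancing factor between the model term and the value term.
    """
    T = len(token_log_probs)
    if T == 0:
        raise ValueError("need at least one decoded token")
    if not 0.0 < value <= 1.0:
        raise ValueError("value is assumed to lie in (0, 1]")
    avg_log_prob = sum(token_log_probs) / T  # length-normalized model score
    return alpha * avg_log_prob + (1.0 - alpha) * math.log(value)


def pick_best(candidates: List[Tuple[Sequence[float], float]], alpha: float) -> int:
    """Return the index of the highest-scoring (token_log_probs, value) candidate."""
    scores = [nmt_vnn_score(lp, v, alpha) for lp, v in candidates]
    return max(range(len(scores)), key=scores.__getitem__)
```

With the candidates `[([-0.1, -0.2], 0.2), ([-0.5, -0.6], 0.9)]`, `pick_best` returns 0 for \(\alpha = 0.8\) (the model-preferred prefix wins) and 1 for \(\alpha = 0.5\) (the value-preferred prefix wins), which shows how strongly \(\alpha\) controls the influence of \(v\).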

The idea of this work is promising: the value network lets the decoder anticipate the eventual outcome of each choice it makes during translation.

We start our analysis by interpreting what $v(X,Y)$ stands for here. Following the original paper, it is trained by minimizing

\[\begin{align} \exp\Big( \big(v(X, Y^{p,1}) - v(X, Y^{p,2})\big)\, \mathrm{sgn}\big(b(Y^*, Y^{p,2}) - b(Y^*, Y^{p,1})\big) \Big) \label{eq:he2017decoding:2} \end{align}\]

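where \(Y^*\) is the reference translation and \(b(Y^*, Y^{p})\) is the average BLEU score, against \(Y^*\), of the complete translations obtained once decoding from \(Y^{p}\) is finished.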
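A minimal sketch of this pairwise objective, assuming the values and BLEU scores are plain floats. Summing the term over all pairs drawn from one set of beam candidates is an assumption made here for illustration; `pairwise_value_loss` and `total_loss` are hypothetical names, not from the paper.

```python
import math
from itertools import combinations
from typing import Sequence


def sgn(x: float) -> int:
    """Sign function: 1, 0 or -1."""
    return (x > 0) - (x < 0)


def pairwise_value_loss(v1: float, v2: float, b1: float, b2: float) -> float:
    """exp((v1 - v2) * sgn(b2 - b1)) for one pair of partial decodings.

    v1, v2: v(X, Y^{p,1}) and v(X, Y^{p,2}).
    b1, b2: b(Y^*, Y^{p,1}) and b(Y^*, Y^{p,2}).
    The exponent is negative (loss < 1) when v orders the pair like BLEU does.
    """
    return math.exp((v1 - v2) * sgn(b2 - b1))


def total_loss(values: Sequence[float], bleus: Sequence[float]) -> float:
    """Sum the pairwise term over all candidate pairs (assumed batching)."""
    return sum(
        pairwise_value_loss(values[i], values[j], bleus[i], bleus[j])
        for i, j in combinations(range(len(values)), 2)
    )
```

Pairs with equal BLEU contribute \(\exp(0) = 1\) regardless of \(v\), so only pairs whose finished translations differ in quality push the value network in a particular direction.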

If \(v(X, Y)\) orders arbitrary partial decodings correctly, the objective \eqref{eq:he2017decoding:2} is driven toward its global minimum, which makes it a reasonable training target. We can further infer that comparing the current values \(v(X, Y_{< T, i})\) and \(v(X, Y_{< T, j})\) suffices to compare the future performance of \(Y_{< T, i}\) and \(Y_{< T, j}\).

However, in their experiments, we see that the best values of \(\alpha\)

  1. are large (\(\ge 0.8\)).

  2. are different across tasks.
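One way to read these numbers is a simple rescaling, which is my own observation rather than something stated in the paper. Since \(\alpha > 0\), dividing the decoding objective by \(\alpha\) does not change which candidate wins at step \(T\):

\[\begin{aligned} \arg\max_{Y_{\le T}} \Big[ \alpha\frac{\sum_{i=1}^{T} \log P(y_i|X, Y_{< i})}{T} + (1-\alpha) \log v(X, Y_{\le T}) \Big] &= \arg\max_{Y_{\le T}} \Big[ \frac{\sum_{i=1}^{T} \log P(y_i|X, Y_{< i})}{T} + \frac{1-\alpha}{\alpha} \log v(X, Y_{\le T}) \Big], \\ \alpha \ge 0.8 &\implies \frac{1-\alpha}{\alpha} \le 0.25. \end{aligned}\]

So at \(\alpha = 0.8\) the log-value is weighted at a quarter of the average log-probability, and at \(\alpha = 0.9\) at about 0.11. The weight alone does not settle the question, since the two log terms may live on different scales.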

This may indicate that, in practice, the value network \(v(X, Y)\) plays a smaller role than expected. The reasons are currently unknown, but I hypothesize that the main cause is insufficient training of \(v(X,Y)\).

@inproceedings{he2017decoding,
  author    = {He, Di and Lu, Hanqing and Xia, Yingce and Qin, Tao and Wang, Liwei and Liu, Tie-Yan},
  booktitle = {NIPS},
  title     = {Decoding with Value Networks for Neural Machine Translation},
  url       = {https://proceedings.neurips.cc/paper/2017/file/2b24d495052a8ce66358eb576b8912c8-Paper.pdf},
  year      = {2017}
}

Page created: 2021-05-22, modified: 2022-12-31.