TensorFlow - Training a Chatbot Based on an RNN - Cloud Server Online Lab

Posted on 2020-12-28 11:37:22

Lab Overview

TensorFlow is an open-source software library for numerical computation developed by Google. In this lab we use TensorFlow to build a three-layer RNN and implement an AI demo that can hold a conversation.

Lab resource: a cloud server.


We build a Seq2Seq model on TensorFlow and add an attention mechanism; the encoder and decoder are each 3-layer RNNs. This tutorial mainly follows the translate demo on the TensorFlow website.
Steps
This tutorial is divided into four parts:
  • generate_chat.py - clean the data, extract the ask and answer sentences, build the vocabularies, assign every character a unique integer ID, and represent the ask and answer data as ID sequences;
  • seq2seq.py, seq2seq_model.py - the translate demo code from TensorFlow; because the original raises a deepcopy error, the seq2seq code is slightly modified here;
  • train_chat.py - train the Seq2Seq model;
  • predict_chat.py - chat with the trained model.
Working with the Data
Get the training data
We have prepared the training data on Tencent Cloud COS; fetch it with wget:
wget http://devlab-1251520893.cos.ap-guangzhou.myqcloud.com/chat.conv
Preprocessing approach:
  • In the raw data, each utterance line starts with M, each dialogue is preceded by a line E, and every dialogue is a single ask-answer pair (a sample of the raw format is shown below). Split the raw data into an ask set and an answer set;
  • Build two vocabularies, "character => ID" and "ID => character", and use the first one to convert the ask and answer data into ID sequences;
  • Append EOS to every answer sentence as the end-of-sequence marker.
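For illustration, the raw chat.conv data looks roughly like this (the dialogue text here is hypothetical; only the E/M layout matters):

E
M 你好
M 你好，很高兴认识你

Each E line opens a dialogue, the first M line is the ask sentence, and the second M line is the answer.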
You can now create the source file generate_chat.py under the /home/ubuntu directory. Its content can be as follows:
Sample code: /home/ubuntu/generate_chat.py
#-*- coding:utf-8 -*-
from io import open
import random
import sys
import tensorflow as tf

PAD = "PAD"
GO = "GO"
EOS = "EOS"
UNK = "UNK"
START_VOCAB = [PAD, GO, EOS, UNK]

PAD_ID = 0  # padding
GO_ID = 1   # start-of-sequence marker
EOS_ID = 2  # end-of-sequence marker
UNK_ID = 3  # unknown character
# (encoder length, decoder length) bucket sizes used by read_data below.
_buckets = [(10, 15), (20, 25), (40, 50), (80, 100)]
units_num = 256
num_layers = 3
max_gradient_norm = 5.0
batch_size = 50
learning_rate = 0.5
learning_rate_decay_factor = 0.97

train_encode_file = "train_encode"
train_decode_file = "train_decode"
test_encode_file = "test_encode"
test_decode_file = "test_decode"
vocab_encode_file = "vocab_encode"
vocab_decode_file = "vocab_decode"
train_encode_vec_file = "train_encode_vec"
train_decode_vec_file = "train_decode_vec"
test_encode_vec_file = "test_encode_vec"
test_decode_vec_file = "test_decode_vec"

def is_chinese(sentence):
    # True if the sentence has at least 2 characters and consists only of
    # Chinese characters plus a few common punctuation marks.
    flag = True
    if len(sentence) < 2:
        flag = False
        return flag
    for uchar in sentence:
        if(uchar == ',' or uchar == '。' or
            uchar == '~' or uchar == '?' or
            uchar == '!'):
            flag = True
        elif '一' <= uchar <= '鿿':  # CJK Unified Ideographs range
            flag = True
        else:
            flag = False
            break
    return flag

def get_chatbot():
    f = open("chat.conv", "r", encoding="utf-8")
    train_encode = open(train_encode_file, "w", encoding="utf-8")
    train_decode = open(train_decode_file, "w", encoding="utf-8")
    test_encode = open(test_encode_file, "w", encoding="utf-8")
    test_decode = open(test_decode_file, "w", encoding="utf-8")
    vocab_encode = open(vocab_encode_file, "w", encoding="utf-8")
    vocab_decode = open(vocab_decode_file, "w", encoding="utf-8")
    encode = list()
    decode = list()

    chat = list()
    print("start load source data...")
    step = 0
    for line in f.readlines():
        line = line.strip('\n').strip()
        if not line:
            continue
        if line[0] == "E":
            if step % 1000 == 0:
                print("step:%d" % step)
            step += 1
            if(len(chat) == 2 and is_chinese(chat[0]) and is_chinese(chat[1]) and
                not chat[0] in encode and not chat[1] in decode):
                encode.append(chat[0])
                decode.append(chat[1])
            chat = list()
        elif line[0] == "M":
            L = line.split(' ')
            if len(L) > 1:
                chat.append(L[1])
    encode_size = len(encode)
    if encode_size != len(decode):
        raise ValueError("encode size not equal to decode size")
    # Hold out 20% of the pairs for validation.
    test_index = random.sample([i for i in range(encode_size)], int(encode_size * 0.2))
    print("divide source into two...")
    step = 0
    for i in range(encode_size):
        if step % 1000 == 0:
            print("%d" % step)
        step += 1
        if i in test_index:
            test_encode.write(encode[i] + "\n")
            test_decode.write(decode[i] + "\n")
        else:
            train_encode.write(encode[i] + "\n")
            train_decode.write(decode[i] + "\n")

    vocab_encode_set = set(''.join(encode))
    vocab_decode_set = set(''.join(decode))
    print("get vocab_encode...")
    step = 0
    for word in vocab_encode_set:
        if step % 1000 == 0:
            print("%d" % step)
        step += 1
        vocab_encode.write(word + "\n")
    print("get vocab_decode...")
    step = 0
    for word in vocab_decode_set:
        if step % 1000 == 0:
            print("%d" % step)
        step += 1
        vocab_decode.write(word + "\n")
    # Close the output files so everything is flushed to disk before
    # get_vectors() reads the generated files back in.
    for handle in (f, train_encode, train_decode, test_encode, test_decode,
                   vocab_encode, vocab_decode):
        handle.close()

def gen_chatbot_vectors(input_file, vocab_file, output_file):
    vocab_f = open(vocab_file, "r", encoding="utf-8")
    output_f = open(output_file, "w")
    input_f = open(input_file, "r", encoding="utf-8")
    words = list()
    for word in vocab_f.readlines():
        word = word.strip('\n').strip()
        words.append(word)
    word_to_id = {word: i for i, word in enumerate(words)}
    # Characters missing from the vocabulary map to UNK_ID.
    to_id = lambda word: word_to_id.get(word, UNK_ID)
    print("get %s vectors" % input_file)
    step = 0
    for line in input_f.readlines():
        if step % 1000 == 0:
            print("step:%d" % step)
        step += 1
        line = line.strip('\n').strip()
        vec = map(to_id, line)
        output_f.write(' '.join([str(n) for n in vec]) + "\n")
    vocab_f.close()
    output_f.close()
    input_f.close()

def get_vectors():
    gen_chatbot_vectors(train_encode_file, vocab_encode_file, train_encode_vec_file)
    gen_chatbot_vectors(train_decode_file, vocab_decode_file, train_decode_vec_file)
    gen_chatbot_vectors(test_encode_file, vocab_encode_file, test_encode_vec_file)
    gen_chatbot_vectors(test_decode_file, vocab_decode_file, test_decode_vec_file)

def get_vocabs(vocab_file):
    words = list()
    with open(vocab_file, "r", encoding="utf-8") as vocab_f:
        for word in vocab_f:
            words.append(word.strip('\n').strip())
    id_to_word = {i: word for i, word in enumerate(words)}
    word_to_id = {v: k for k, v in id_to_word.items()}
    vocab_size = len(id_to_word)
    return id_to_word, word_to_id, vocab_size

def read_data(source_path, target_path, max_size=None):
    # Group (source, target) ID sequences into the first bucket they fit.
    data_set = [[] for _ in _buckets]
    with tf.gfile.GFile(source_path, mode="r") as source_file:
        with tf.gfile.GFile(target_path, mode="r") as target_file:
            source, target = source_file.readline(), target_file.readline()
            counter = 0
            while source and target and (not max_size or counter < max_size):
                counter += 1
                source_ids = [int(x) for x in source.split()]
                target_ids = [int(x) for x in target.split()]
                target_ids.append(EOS_ID)
                for bucket_id, (source_size, target_size) in enumerate(_buckets):
                    if len(source_ids) < source_size and len(target_ids) < target_size:
                        data_set[bucket_id].append([source_ids, target_ids])
                        break
                source, target = source_file.readline(), target_file.readline()
    return data_set
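A note on _buckets and read_data: a (source, target) pair goes into the first bucket whose (source_size, target_size) are both strictly larger than the pair's lengths (with EOS_ID appended to the target first); pairs too long for every bucket are dropped. A minimal pure-Python illustration of that rule, with hypothetical lengths:

_buckets = [(10, 15), (20, 25), (40, 50), (80, 100)]

def bucket_for(source_len, target_len):
    # target_len + 1 accounts for the appended EOS_ID, as in read_data.
    for bucket_id, (source_size, target_size) in enumerate(_buckets):
        if source_len < source_size and target_len + 1 < target_size:
            return bucket_id
    return None  # too long for every bucket; read_data skips such pairs

print(bucket_for(8, 12))   # -> 0
print(bucket_for(35, 60))  # -> 3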
Generate the data:
You can run the following commands step by step in a terminal.
Start python:
cd /home/ubuntu/
python
from generate_chat import *
Get the ask and answer data and build the vocabularies:
get_chatbot()
  • train_encode - ask data for training;
  • train_decode - answer data for training;
  • test_encode - ask data for validation;
  • test_decode - answer data for validation;
  • vocab_encode - vocabulary of the ask data;
  • vocab_decode - vocabulary of the answer data.
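Optionally, you can sanity-check the generated vocabulary from the same Python session (a quick check only; it assumes get_chatbot() has finished and the files sit in the current directory):

id_to_word, word_to_id, vocab_size = get_vocabs(vocab_encode_file)
print(vocab_size)      # number of distinct characters in the ask data
print(id_to_word[0])   # the character that was assigned ID 0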

Convert the training data to its ID representation:
get_vectors()
  • train_encode_vec - ID representation of the training ask data;
  • train_decode_vec - ID representation of the training answer data;
  • test_encode_vec - ID representation of the validation ask data;
  • test_decode_vec - ID representation of the validation answer data.
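To see what the conversion produced, you can print the first training pair in both text and ID form (a minimal check, assuming the files above were generated in the current directory):

with open("train_encode", encoding="utf-8") as f:
    print(f.readline().strip())    # first ask sentence, as text
with open("train_encode_vec") as f:
    print(f.readline().strip())    # the same sentence as space-separated IDs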

Working with the Model
Seq2Seq model
We use the model from the translate demo. During the experiment, copy.deepcopy of the RNN cell raised a NotImplementedType error, so the seq2seq code from translate is slightly modified here: embedding_attention_seq2seq below takes the encoder cell as an explicit encoder_cell argument instead of deep-copying the decoder cell internally.
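A hypothetical sketch of how the modified entry point gets called (the real wiring lives in seq2seq_model.py, which is not shown in this section; make_cell, num_symbols, and the placeholder shapes here are illustrative assumptions, not the lab's actual code):

import tensorflow as tf
import seq2seq  # the modified file you are about to create below

num_symbols = 1000   # hypothetical vocabulary size
embedding_size = 64  # hypothetical embedding width

def make_cell():
    # A fresh 3-layer GRU stack, matching units_num/num_layers above.
    cells = [tf.contrib.rnn.GRUCell(256) for _ in range(3)]
    return tf.contrib.rnn.MultiRNNCell(cells)

encoder_inputs = [tf.placeholder(tf.int32, [None]) for _ in range(10)]
decoder_inputs = [tf.placeholder(tf.int32, [None]) for _ in range(15)]

outputs, state = seq2seq.embedding_attention_seq2seq(
    encoder_inputs, decoder_inputs,
    encoder_cell=make_cell(),  # built separately instead of deep-copied
    cell=make_cell(),
    num_encoder_symbols=num_symbols,
    num_decoder_symbols=num_symbols,
    embedding_size=embedding_size)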
seq2seq sample code:
You can now create the source file seq2seq.py under the /home/ubuntu directory. Its content can be as follows:
Sample code: /home/ubuntu/seq2seq.py
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Library for creating sequence-to-sequence models in TensorFlow.

Sequence-to-sequence recurrent neural networks can learn complex functions
that map input sequences to output sequences. These models yield very good
results on a number of tasks, such as speech recognition, parsing, machine
translation, or even constructing automated replies to emails.

Before using this module, it is recommended to read the TensorFlow tutorial
on sequence-to-sequence models. It explains the basic concepts of this module
and shows an end-to-end example of how to build a translation model.
  https://www.tensorflow.org/versions/master/tutorials/seq2seq/index.html

Here is an overview of functions available in this module. They all use
a very similar interface, so after reading the above tutorial and using
one of them, others should be easy to substitute.

* Full sequence-to-sequence models.
  - basic_rnn_seq2seq: The most basic RNN-RNN model.
  - tied_rnn_seq2seq: The basic model with tied encoder and decoder weights.
  - embedding_rnn_seq2seq: The basic model with input embedding.
  - embedding_tied_rnn_seq2seq: The tied model with input embedding.
  - embedding_attention_seq2seq: Advanced model with input embedding and
      the neural attention mechanism; recommended for complex tasks.

* Multi-task sequence-to-sequence models.
  - one2many_rnn_seq2seq: The embedding model with multiple decoders.

* Decoders (when you write your own encoder, you can use these to decode;
    e.g., if you want to write a model that generates captions for images).
  - rnn_decoder: The basic decoder based on a pure RNN.
  - attention_decoder: A decoder that uses the attention mechanism.

* Losses.
  - sequence_loss: Loss for a sequence model returning average log-perplexity.
  - sequence_loss_by_example: As above, but not averaging over all examples.

* model_with_buckets: A convenience function to create models with bucketing
    (see the tutorial above for an explanation of why and how to use it).
"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import copy

# We disable pylint because we need python3 compatibility.
from six.moves import xrange  # pylint: disable=redefined-builtin
from six.moves import zip  # pylint: disable=redefined-builtin

from tensorflow.contrib.rnn.python.ops import core_rnn_cell
from tensorflow.python.framework import dtypes
from tensorflow.python.framework import ops
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import control_flow_ops
from tensorflow.python.ops import embedding_ops
from tensorflow.python.ops import math_ops
from tensorflow.python.ops import nn_ops
from tensorflow.python.ops import rnn
from tensorflow.python.ops import rnn_cell_impl
from tensorflow.python.ops import variable_scope
from tensorflow.python.util import nest

# TODO(ebrevdo): Remove once _linear is fully deprecated.
linear = rnn_cell_impl._linear  # pylint: disable=protected-access


def _extract_argmax_and_embed(embedding,
                              output_projection=None,
                              update_embedding=True):
  """Get a loop_function that extracts the previous symbol and embeds it.

  Args:
    embedding: embedding tensor for symbols.
    output_projection: None or a pair (W, B). If provided, each fed previous
      output will first be multiplied by W and added B.
    update_embedding: Boolean; if False, the gradients will not propagate
      through the embeddings.

  Returns:
    A loop function.
  """

  def loop_function(prev, _):
    if output_projection is not None:
      prev = nn_ops.xw_plus_b(prev, output_projection[0], output_projection[1])
    prev_symbol = math_ops.argmax(prev, 1)
    # Note that gradients will not propagate through the second parameter of
    # embedding_lookup.
    emb_prev = embedding_ops.embedding_lookup(embedding, prev_symbol)
    if not update_embedding:
      emb_prev = array_ops.stop_gradient(emb_prev)
    return emb_prev

  return loop_function


def rnn_decoder(decoder_inputs,
                initial_state,
                cell,
                loop_function=None,
                scope=None):
  """RNN decoder for the sequence-to-sequence model.

  Args:
    decoder_inputs: A list of 2D Tensors [batch_size x input_size].
    initial_state: 2D Tensor with shape [batch_size x cell.state_size].
    cell: rnn_cell.RNNCell defining the cell function and size.
    loop_function: If not None, this function will be applied to the i-th output
      in order to generate the i+1-st input, and decoder_inputs will be ignored,
      except for the first element ("GO" symbol). This can be used for decoding,
      but also for training to emulate http://arxiv.org/abs/1506.03099.
      Signature -- loop_function(prev, i) = next
        * prev is a 2D Tensor of shape [batch_size x output_size],
        * i is an integer, the step number (when advanced control is needed),
        * next is a 2D Tensor of shape [batch_size x input_size].
    scope: VariableScope for the created subgraph; defaults to "rnn_decoder".

  Returns:
    A tuple of the form (outputs, state), where:
      outputs: A list of the same length as decoder_inputs of 2D Tensors with
        shape [batch_size x output_size] containing generated outputs.
      state: The state of each cell at the final time-step.
        It is a 2D Tensor of shape [batch_size x cell.state_size].
        (Note that in some cases, like basic RNN cell or GRU cell, outputs and
         states can be the same. They are different for LSTM cells though.)
  """
  with variable_scope.variable_scope(scope or "rnn_decoder"):
    state = initial_state
    outputs = []
    prev = None
    for i, inp in enumerate(decoder_inputs):
      if loop_function is not None and prev is not None:
        with variable_scope.variable_scope("loop_function", reuse=True):
          inp = loop_function(prev, i)
      if i > 0:
        variable_scope.get_variable_scope().reuse_variables()
      output, state = cell(inp, state)
      outputs.append(output)
      if loop_function is not None:
        prev = output
  return outputs, state


def basic_rnn_seq2seq(encoder_inputs,
                      decoder_inputs,
                      cell,
                      dtype=dtypes.float32,
                      scope=None):
  """Basic RNN sequence-to-sequence model.

  This model first runs an RNN to encode encoder_inputs into a state vector,
  then runs decoder, initialized with the last encoder state, on decoder_inputs.
  Encoder and decoder use the same RNN cell type, but don't share parameters.

  Args:
    encoder_inputs: A list of 2D Tensors [batch_size x input_size].
    decoder_inputs: A list of 2D Tensors [batch_size x input_size].
    cell: tf.nn.rnn_cell.RNNCell defining the cell function and size.
    dtype: The dtype of the initial state of the RNN cell (default: tf.float32).
    scope: VariableScope for the created subgraph; default: "basic_rnn_seq2seq".

  Returns:
    A tuple of the form (outputs, state), where:
      outputs: A list of the same length as decoder_inputs of 2D Tensors with
        shape [batch_size x output_size] containing the generated outputs.
      state: The state of each decoder cell in the final time-step.
        It is a 2D Tensor of shape [batch_size x cell.state_size].
  """
  with variable_scope.variable_scope(scope or "basic_rnn_seq2seq"):
    enc_cell = copy.deepcopy(cell)
    _, enc_state = rnn.static_rnn(enc_cell, encoder_inputs, dtype=dtype)
    return rnn_decoder(decoder_inputs, enc_state, cell)


def tied_rnn_seq2seq(encoder_inputs,
                     decoder_inputs,
                     cell,
                     loop_function=None,
                     dtype=dtypes.float32,
                     scope=None):
  """RNN sequence-to-sequence model with tied encoder and decoder parameters.

  This model first runs an RNN to encode encoder_inputs into a state vector, and
  then runs decoder, initialized with the last encoder state, on decoder_inputs.
  Encoder and decoder use the same RNN cell and share parameters.

  Args:
    encoder_inputs: A list of 2D Tensors [batch_size x input_size].
    decoder_inputs: A list of 2D Tensors [batch_size x input_size].
    cell: tf.nn.rnn_cell.RNNCell defining the cell function and size.
    loop_function: If not None, this function will be applied to i-th output
      in order to generate i+1-th input, and decoder_inputs will be ignored,
      except for the first element ("GO" symbol), see rnn_decoder for details.
    dtype: The dtype of the initial state of the rnn cell (default: tf.float32).
    scope: VariableScope for the created subgraph; default: "tied_rnn_seq2seq".

  Returns:
    A tuple of the form (outputs, state), where:
      outputs: A list of the same length as decoder_inputs of 2D Tensors with
        shape [batch_size x output_size] containing the generated outputs.
      state: The state of each decoder cell in each time-step. This is a list
        with length len(decoder_inputs) -- one item for each time-step.
        It is a 2D Tensor of shape [batch_size x cell.state_size].
  """
  with variable_scope.variable_scope("combined_tied_rnn_seq2seq"):
    scope = scope or "tied_rnn_seq2seq"
    _, enc_state = rnn.static_rnn(
        cell, encoder_inputs, dtype=dtype, scope=scope)
    variable_scope.get_variable_scope().reuse_variables()
    return rnn_decoder(
        decoder_inputs,
        enc_state,
        cell,
        loop_function=loop_function,
        scope=scope)


def embedding_rnn_decoder(decoder_inputs,
                          initial_state,
                          cell,
                          num_symbols,
                          embedding_size,
                          output_projection=None,
                          feed_previous=False,
                          update_embedding_for_previous=True,
                          scope=None):
  """RNN decoder with embedding and a pure-decoding option.

  Args:
    decoder_inputs: A list of 1D batch-sized int32 Tensors (decoder inputs).
    initial_state: 2D Tensor [batch_size x cell.state_size].
    cell: tf.nn.rnn_cell.RNNCell defining the cell function.
    num_symbols: Integer, how many symbols come into the embedding.
    embedding_size: Integer, the length of the embedding vector for each symbol.
    output_projection: None or a pair (W, B) of output projection weights and
      biases; W has shape [output_size x num_symbols] and B has
      shape [num_symbols]; if provided and feed_previous=True, each fed
      previous output will first be multiplied by W and added B.
    feed_previous: Boolean; if True, only the first of decoder_inputs will be
      used (the "GO" symbol), and all other decoder inputs will be generated by:
        next = embedding_lookup(embedding, argmax(previous_output)),
      In effect, this implements a greedy decoder. It can also be used
      during training to emulate http://arxiv.org/abs/1506.03099.
      If False, decoder_inputs are used as given (the standard decoder case).
    update_embedding_for_previous: Boolean; if False and feed_previous=True,
      only the embedding for the first symbol of decoder_inputs (the "GO"
      symbol) will be updated by back propagation. Embeddings for the symbols
      generated from the decoder itself remain unchanged. This parameter has
      no effect if feed_previous=False.
    scope: VariableScope for the created subgraph; defaults to
      "embedding_rnn_decoder".

  Returns:
    A tuple of the form (outputs, state), where:
      outputs: A list of the same length as decoder_inputs of 2D Tensors. The
        output is of shape [batch_size x cell.output_size] when
        output_projection is not None (and represents the dense representation
        of predicted tokens). It is of shape [batch_size x num_decoder_symbols]
        when output_projection is None.
      state: The state of each decoder cell in each time-step. This is a list
        with length len(decoder_inputs) -- one item for each time-step.
        It is a 2D Tensor of shape [batch_size x cell.state_size].

  Raises:
    ValueError: When output_projection has the wrong shape.
  """
  with variable_scope.variable_scope(scope or "embedding_rnn_decoder") as scope:
    if output_projection is not None:
      dtype = scope.dtype
      proj_weights = ops.convert_to_tensor(output_projection[0], dtype=dtype)
      proj_weights.get_shape().assert_is_compatible_with([None, num_symbols])
      proj_biases = ops.convert_to_tensor(output_projection[1], dtype=dtype)
      proj_biases.get_shape().assert_is_compatible_with([num_symbols])

    embedding = variable_scope.get_variable("embedding",
                                            [num_symbols, embedding_size])
    loop_function = _extract_argmax_and_embed(
        embedding, output_projection,
        update_embedding_for_previous) if feed_previous else None
    emb_inp = (embedding_ops.embedding_lookup(embedding, i)
               for i in decoder_inputs)
    return rnn_decoder(
        emb_inp, initial_state, cell, loop_function=loop_function)


def embedding_rnn_seq2seq(encoder_inputs,
                          decoder_inputs,
                          cell,
                          num_encoder_symbols,
                          num_decoder_symbols,
                          embedding_size,
                          output_projection=None,
                          feed_previous=False,
                          dtype=None,
                          scope=None):
  """Embedding RNN sequence-to-sequence model.

  This model first embeds encoder_inputs by a newly created embedding (of shape
  [num_encoder_symbols x input_size]). Then it runs an RNN to encode
  embedded encoder_inputs into a state vector. Next, it embeds decoder_inputs
  by another newly created embedding (of shape [num_decoder_symbols x
  input_size]). Then it runs RNN decoder, initialized with the last
  encoder state, on embedded decoder_inputs.

  Args:
    encoder_inputs: A list of 1D int32 Tensors of shape [batch_size].
    decoder_inputs: A list of 1D int32 Tensors of shape [batch_size].
    cell: tf.nn.rnn_cell.RNNCell defining the cell function and size.
    num_encoder_symbols: Integer; number of symbols on the encoder side.
    num_decoder_symbols: Integer; number of symbols on the decoder side.
    embedding_size: Integer, the length of the embedding vector for each symbol.
    output_projection: None or a pair (W, B) of output projection weights and
      biases; W has shape [output_size x num_decoder_symbols] and B has
      shape [num_decoder_symbols]; if provided and feed_previous=True, each
      fed previous output will first be multiplied by W and added B.
    feed_previous: Boolean or scalar Boolean Tensor; if True, only the first
      of decoder_inputs will be used (the "GO" symbol), and all other decoder
      inputs will be taken from previous outputs (as in embedding_rnn_decoder).
      If False, decoder_inputs are used as given (the standard decoder case).
    dtype: The dtype of the initial state for both the encoder and decoder
      rnn cells (default: tf.float32).
    scope: VariableScope for the created subgraph; defaults to
      "embedding_rnn_seq2seq"

  Returns:
    A tuple of the form (outputs, state), where:
      outputs: A list of the same length as decoder_inputs of 2D Tensors. The
        output is of shape [batch_size x cell.output_size] when
        output_projection is not None (and represents the dense representation
        of predicted tokens). It is of shape [batch_size x num_decoder_symbols]
        when output_projection is None.
      state: The state of each decoder cell in each time-step. This is a list
        with length len(decoder_inputs) -- one item for each time-step.
        It is a 2D Tensor of shape [batch_size x cell.state_size].
  """
  with variable_scope.variable_scope(scope or "embedding_rnn_seq2seq") as scope:
    if dtype is not None:
      scope.set_dtype(dtype)
    else:
      dtype = scope.dtype

    # Encoder.
    encoder_cell = copy.deepcopy(cell)
    encoder_cell = core_rnn_cell.EmbeddingWrapper(
        encoder_cell,
        embedding_classes=num_encoder_symbols,
        embedding_size=embedding_size)
    _, encoder_state = rnn.static_rnn(encoder_cell, encoder_inputs, dtype=dtype)

    # Decoder.
    if output_projection is None:
      cell = core_rnn_cell.OutputProjectionWrapper(cell, num_decoder_symbols)

    if isinstance(feed_previous, bool):
      return embedding_rnn_decoder(
          decoder_inputs,
          encoder_state,
          cell,
          num_decoder_symbols,
          embedding_size,
          output_projection=output_projection,
          feed_previous=feed_previous)

    # If feed_previous is a Tensor, we construct 2 graphs and use cond.
    def decoder(feed_previous_bool):
      reuse = None if feed_previous_bool else True
      with variable_scope.variable_scope(
          variable_scope.get_variable_scope(), reuse=reuse):
        outputs, state = embedding_rnn_decoder(
            decoder_inputs,
            encoder_state,
            cell,
            num_decoder_symbols,
            embedding_size,
            output_projection=output_projection,
            feed_previous=feed_previous_bool,
            update_embedding_for_previous=False)
        state_list = [state]
        if nest.is_sequence(state):
          state_list = nest.flatten(state)
        return outputs + state_list

    outputs_and_state = control_flow_ops.cond(feed_previous,
                                              lambda: decoder(True),
                                              lambda: decoder(False))
    outputs_len = len(decoder_inputs)  # Outputs length same as decoder inputs.
    state_list = outputs_and_state[outputs_len:]
    state = state_list[0]
    if nest.is_sequence(encoder_state):
      state = nest.pack_sequence_as(
          structure=encoder_state, flat_sequence=state_list)
    return outputs_and_state[:outputs_len], state


def embedding_tied_rnn_seq2seq(encoder_inputs,
                               decoder_inputs,
                               cell,
                               num_symbols,
                               embedding_size,
                               num_decoder_symbols=None,
                               output_projection=None,
                               feed_previous=False,
                               dtype=None,
                               scope=None):
  """Embedding RNN sequence-to-sequence model with tied (shared) parameters.

  This model first embeds encoder_inputs by a newly created embedding (of shape
  [num_symbols x input_size]). Then it runs an RNN to encode embedded
  encoder_inputs into a state vector. Next, it embeds decoder_inputs using
  the same embedding. Then it runs RNN decoder, initialized with the last
  encoder state, on embedded decoder_inputs. The decoder output is over symbols
  from 0 to num_decoder_symbols - 1 if num_decoder_symbols is None; otherwise it
  is over 0 to num_symbols - 1.

  Args:
    encoder_inputs: A list of 1D int32 Tensors of shape [batch_size].
    decoder_inputs: A list of 1D int32 Tensors of shape [batch_size].
    cell: tf.nn.rnn_cell.RNNCell defining the cell function and size.
    num_symbols: Integer; number of symbols for both encoder and decoder.
    embedding_size: Integer, the length of the embedding vector for each symbol.
    num_decoder_symbols: Integer; number of output symbols for decoder. If
      provided, the decoder output is over symbols 0 to num_decoder_symbols - 1.
      Otherwise, decoder output is over symbols 0 to num_symbols - 1. Note that
      this assumes that the vocabulary is set up such that the first
      num_decoder_symbols of num_symbols are part of decoding.
    output_projection: None or a pair (W, B) of output projection weights and
      biases; W has shape [output_size x num_symbols] and B has
      shape [num_symbols]; if provided and feed_previous=True, each
      fed previous output will first be multiplied by W and added B.
    feed_previous: Boolean or scalar Boolean Tensor; if True, only the first
      of decoder_inputs will be used (the "GO" symbol), and all other decoder
      inputs will be taken from previous outputs (as in embedding_rnn_decoder).
      If False, decoder_inputs are used as given (the standard decoder case).
    dtype: The dtype to use for the initial RNN states (default: tf.float32).
    scope: VariableScope for the created subgraph; defaults to
      "embedding_tied_rnn_seq2seq".

  Returns:
    A tuple of the form (outputs, state), where:
      outputs: A list of the same length as decoder_inputs of 2D Tensors with
        shape [batch_size x output_symbols] containing the generated
        outputs where output_symbols = num_decoder_symbols if
        num_decoder_symbols is not None otherwise output_symbols = num_symbols.
      state: The state of each decoder cell at the final time-step.
        It is a 2D Tensor of shape [batch_size x cell.state_size].

  Raises:
    ValueError: When output_projection has the wrong shape.
  """
  with variable_scope.variable_scope(
      scope or "embedding_tied_rnn_seq2seq", dtype=dtype) as scope:
    dtype = scope.dtype

    if output_projection is not None:
      proj_weights = ops.convert_to_tensor(output_projection[0], dtype=dtype)
      proj_weights.get_shape().assert_is_compatible_with([None, num_symbols])
      proj_biases = ops.convert_to_tensor(output_projection[1], dtype=dtype)
      proj_biases.get_shape().assert_is_compatible_with([num_symbols])

    embedding = variable_scope.get_variable(
        "embedding", [num_symbols, embedding_size], dtype=dtype)

    emb_encoder_inputs = [
        embedding_ops.embedding_lookup(embedding, x) for x in encoder_inputs
    ]
    emb_decoder_inputs = [
        embedding_ops.embedding_lookup(embedding, x) for x in decoder_inputs
    ]

    output_symbols = num_symbols
    if num_decoder_symbols is not None:
      output_symbols = num_decoder_symbols
    if output_projection is None:
      cell = core_rnn_cell.OutputProjectionWrapper(cell, output_symbols)

    if isinstance(feed_previous, bool):
      loop_function = _extract_argmax_and_embed(embedding, output_projection,
                                                True) if feed_previous else None
      return tied_rnn_seq2seq(
          emb_encoder_inputs,
          emb_decoder_inputs,
          cell,
          loop_function=loop_function,
          dtype=dtype)

    # If feed_previous is a Tensor, we construct 2 graphs and use cond.
    def decoder(feed_previous_bool):
      loop_function = _extract_argmax_and_embed(
          embedding, output_projection, False) if feed_previous_bool else None
      reuse = None if feed_previous_bool else True
      with variable_scope.variable_scope(
          variable_scope.get_variable_scope(), reuse=reuse):
        outputs, state = tied_rnn_seq2seq(
            emb_encoder_inputs,
            emb_decoder_inputs,
            cell,
            loop_function=loop_function,
            dtype=dtype)
        state_list = [state]
        if nest.is_sequence(state):
          state_list = nest.flatten(state)
        return outputs + state_list

    outputs_and_state = control_flow_ops.cond(feed_previous,
                                              lambda: decoder(True),
                                              lambda: decoder(False))
    outputs_len = len(decoder_inputs)  # Outputs length same as decoder inputs.
    state_list = outputs_and_state[outputs_len:]
    state = state_list[0]
    # Calculate zero-state to know its structure.
    static_batch_size = encoder_inputs[0].get_shape()[0]
    for inp in encoder_inputs[1:]:
      static_batch_size.merge_with(inp.get_shape()[0])
    batch_size = static_batch_size.value
    if batch_size is None:
      batch_size = array_ops.shape(encoder_inputs[0])[0]
    zero_state = cell.zero_state(batch_size, dtype)
    if nest.is_sequence(zero_state):
      state = nest.pack_sequence_as(
          structure=zero_state, flat_sequence=state_list)
    return outputs_and_state[:outputs_len], state


def attention_decoder(decoder_inputs,
                      initial_state,
                      attention_states,
                      cell,
                      output_size=None,
                      num_heads=1,
                      loop_function=None,
                      dtype=None,
                      scope=None,
                      initial_state_attention=False):
  """RNN decoder with attention for the sequence-to-sequence model.

  In this context "attention" means that, during decoding, the RNN can look up
  information in the additional tensor attention_states, and it does this by
  focusing on a few entries from the tensor. This model has proven to yield
  especially good results in a number of sequence-to-sequence tasks. This
  implementation is based on http://arxiv.org/abs/1412.7449 (see below for
  details). It is recommended for complex sequence-to-sequence tasks.

  Args:
    decoder_inputs: A list of 2D Tensors [batch_size x input_size].
    initial_state: 2D Tensor [batch_size x cell.state_size].
    attention_states: 3D Tensor [batch_size x attn_length x attn_size].
    cell: tf.nn.rnn_cell.RNNCell defining the cell function and size.
    output_size: Size of the output vectors; if None, we use cell.output_size.
    num_heads: Number of attention heads that read from attention_states.
    loop_function: If not None, this function will be applied to i-th output
      in order to generate i+1-th input, and decoder_inputs will be ignored,
      except for the first element ("GO" symbol). This can be used for decoding,
      but also for training to emulate http://arxiv.org/abs/1506.03099.
      Signature -- loop_function(prev, i) = next
        * prev is a 2D Tensor of shape [batch_size x output_size],
        * i is an integer, the step number (when advanced control is needed),
        * next is a 2D Tensor of shape [batch_size x input_size].
    dtype: The dtype to use for the RNN initial state (default: tf.float32).
    scope: VariableScope for the created subgraph; default: "attention_decoder".
    initial_state_attention: If False (default), initial attentions are zero.
      If True, initialize the attentions from the initial state and attention
      states -- useful when we wish to resume decoding from a previously
      stored decoder state and attention states.

  Returns:
    A tuple of the form (outputs, state), where:
      outputs: A list of the same length as decoder_inputs of 2D Tensors of
        shape [batch_size x output_size]. These represent the generated outputs.
        Output i is computed from input i (which is either the i-th element
        of decoder_inputs or loop_function(output {i-1}, i)) as follows.
        First, we run the cell on a combination of the input and previous
        attention masks:
          cell_output, new_state = cell(linear(input, prev_attn), prev_state).
        Then, we calculate new attention masks:
          new_attn = softmax(V^T * tanh(W * attention_states + U * new_state))
        and then we calculate the output:
          output = linear(cell_output, new_attn).
      state: The state of each decoder cell at the final time-step.
        It is a 2D Tensor of shape [batch_size x cell.state_size].

  Raises:
    ValueError: when num_heads is not positive, there are no inputs, shapes
      of attention_states are not set, or input size cannot be inferred
      from the input.
  """
  if not decoder_inputs:
    raise ValueError("Must provide at least 1 input to attention decoder.")
  if num_heads < 1:
    raise ValueError("With less than 1 heads, use a non-attention decoder.")
  if attention_states.get_shape()[2].value is None:
    raise ValueError("Shape[2] of attention_states must be known: %s" %
                     attention_states.get_shape())
  if output_size is None:
    output_size = cell.output_size

  with variable_scope.variable_scope(
      scope or "attention_decoder", dtype=dtype) as scope:
    dtype = scope.dtype

    batch_size = array_ops.shape(decoder_inputs[0])[0]  # Needed for reshaping.
    attn_length = attention_states.get_shape()[1].value
    if attn_length is None:
      attn_length = array_ops.shape(attention_states)[1]
    attn_size = attention_states.get_shape()[2].value

    # To calculate W1 * h_t we use a 1-by-1 convolution, need to reshape before.
    hidden = array_ops.reshape(attention_states,
                               [-1, attn_length, 1, attn_size])
    hidden_features = []
    v = []
    attention_vec_size = attn_size  # Size of query vectors for attention.
    for a in xrange(num_heads):
      k = variable_scope.get_variable("AttnW_%d" % a,
                                      [1, 1, attn_size, attention_vec_size])
      hidden_features.append(nn_ops.conv2d(hidden, k, [1, 1, 1, 1], "SAME"))
      v.append(
          variable_scope.get_variable("AttnV_%d" % a, [attention_vec_size]))

    state = initial_state

    def attention(query):
      """Put attention masks on hidden using hidden_features and query."""
      ds = []  # Results of attention reads will be stored here.
      if nest.is_sequence(query):  # If the query is a tuple, flatten it.
        query_list = nest.flatten(query)
        for q in query_list:  # Check that ndims == 2 if specified.
          ndims = q.get_shape().ndims
          if ndims:
            assert ndims == 2
        query = array_ops.concat(query_list, 1)
      for a in xrange(num_heads):
        with variable_scope.variable_scope("Attention_%d" % a):
          y = linear(query, attention_vec_size, True)
          y = array_ops.reshape(y, [-1, 1, 1, attention_vec_size])
          # Attention mask is a softmax of v^T * tanh(...).
          s = math_ops.reduce_sum(v[a] * math_ops.tanh(hidden_features[a] + y),
                                  [2, 3])
          a = nn_ops.softmax(s)
          # Now calculate the attention-weighted vector d.
          d = math_ops.reduce_sum(
              array_ops.reshape(a, [-1, attn_length, 1, 1]) * hidden, [1, 2])
          ds.append(array_ops.reshape(d, [-1, attn_size]))
      return ds

    outputs = []
    prev = None
    batch_attn_size = array_ops.stack([batch_size, attn_size])
    attns = [
        array_ops.zeros(
            batch_attn_size, dtype=dtype) for _ in xrange(num_heads)
    ]
    for a in attns:  # Ensure the second shape of attention vectors is set.
      a.set_shape([None, attn_size])
    if initial_state_attention:
      attns = attention(initial_state)
    for i, inp in enumerate(decoder_inputs):
      if i > 0:
        variable_scope.get_variable_scope().reuse_variables()
      # If loop_function is set, we use it instead of decoder_inputs.
      if loop_function is not None and prev is not None:
        with variable_scope.variable_scope("loop_function", reuse=True):
          inp = loop_function(prev, i)
      # Merge input and previous attentions into one vector of the right size.
      input_size = inp.get_shape().with_rank(2)[1]
      if input_size.value is None:
        raise ValueError("Could not infer input size from input: %s" % inp.name)
      x = linear([inp] + attns, input_size, True)
      # Run the RNN.
      cell_output, state = cell(x, state)
      # Run the attention mechanism.
      if i == 0 and initial_state_attention:
        with variable_scope.variable_scope(
            variable_scope.get_variable_scope(), reuse=True):
          attns = attention(state)
      else:
        attns = attention(state)

      with variable_scope.variable_scope("AttnOutputProjection"):
        output = linear([cell_output] + attns, output_size, True)
      if loop_function is not None:
        prev = output
      outputs.append(output)

  return outputs, state


def embedding_attention_decoder(decoder_inputs,
                                initial_state,
                                attention_states,
                                cell,
                                num_symbols,
                                embedding_size,
                                num_heads=1,
                                output_size=None,
                                output_projection=None,
                                feed_previous=False,
                                update_embedding_for_previous=True,
                                dtype=None,
                                scope=None,
                                initial_state_attention=False):
  """RNN decoder with embedding and attention and a pure-decoding option.

  Args:
    decoder_inputs: A list of 1D batch-sized int32 Tensors (decoder inputs).
    initial_state: 2D Tensor [batch_size x cell.state_size].
    attention_states: 3D Tensor [batch_size x attn_length x attn_size].
    cell: tf.nn.rnn_cell.RNNCell defining the cell function.
    num_symbols: Integer, how many symbols come into the embedding.
    embedding_size: Integer, the length of the embedding vector for each symbol.
    num_heads: Number of attention heads that read from attention_states.
    output_size: Size of the output vectors; if None, use cell.output_size.
    output_projection: None or a pair (W, B) of output projection weights and
      biases; W has shape [output_size x num_symbols] and B has shape
      [num_symbols]; if provided and feed_previous=True, each fed previous
      output will first be multiplied by W and added B.
    feed_previous: Boolean; if True, only the first of decoder_inputs will be
      used (the "GO" symbol), and all other decoder inputs will be generated by:
        next = embedding_lookup(embedding, argmax(previous_output)),
      In effect, this implements a greedy decoder. It can also be used
      during training to emulate http://arxiv.org/abs/1506.03099.
      If False, decoder_inputs are used as given (the standard decoder case).
    update_embedding_for_previous: Boolean; if False and feed_previous=True,
      only the embedding for the first symbol of decoder_inputs (the "GO"
      symbol) will be updated by back propagation. Embeddings for the symbols
      generated from the decoder itself remain unchanged. This parameter has
      no effect if feed_previous=False.
    dtype: The dtype to use for the RNN initial states (default: tf.float32).
    scope: VariableScope for the created subgraph; defaults to
      "embedding_attention_decoder".
    initial_state_attention: If False (default), initial attentions are zero.
      If True, initialize the attentions from the initial state and attention
      states -- useful when we wish to resume decoding from a previously
      stored decoder state and attention states.

  Returns:
    A tuple of the form (outputs, state), where:
      outputs: A list of the same length as decoder_inputs of 2D Tensors with
        shape [batch_size x output_size] containing the generated outputs.
      state: The state of each decoder cell at the final time-step.
        It is a 2D Tensor of shape [batch_size x cell.state_size].

  Raises:
    ValueError: When output_projection has the wrong shape.
  """
  if output_size is None:
    output_size = cell.output_size
  if output_projection is not None:
    proj_biases = ops.convert_to_tensor(output_projection[1], dtype=dtype)
    proj_biases.get_shape().assert_is_compatible_with([num_symbols])

  with variable_scope.variable_scope(
      scope or "embedding_attention_decoder", dtype=dtype) as scope:

    embedding = variable_scope.get_variable("embedding",
                                            [num_symbols, embedding_size])
    loop_function = _extract_argmax_and_embed(
        embedding, output_projection,
        update_embedding_for_previous) if feed_previous else None
    emb_inp = [
        embedding_ops.embedding_lookup(embedding, i) for i in decoder_inputs
    ]
    return attention_decoder(
        emb_inp,
        initial_state,
        attention_states,
        cell,
        output_size=output_size,
        num_heads=num_heads,
        loop_function=loop_function,
        initial_state_attention=initial_state_attention)


def embedding_attention_seq2seq(encoder_inputs,
                                decoder_inputs,
                                encoder_cell,
                                cell,
                                num_encoder_symbols,
                                num_decoder_symbols,
                                embedding_size,
                                num_heads=1,
                                output_projection=None,
                                feed_previous=False,
                                dtype=None,
                                scope=None,
                                initial_state_attention=False):
  """Embedding sequence-to-sequence model with attention.

  This model first embeds encoder_inputs by a newly created embedding (of shape
  [num_encoder_symbols x input_size]). Then it runs an RNN to encode
  embedded encoder_inputs into a state vector. It keeps the outputs of this
  RNN at every step to use for attention later. Next, it embeds decoder_inputs
  by another newly created embedding (of shape [num_decoder_symbols x
  input_size]). Then it runs attention decoder, initialized with the last
  encoder state, on embedded decoder_inputs and attending to encoder outputs.

  Warning: when output_projection is None, the size of the attention vectors
  and variables will be made proportional to num_decoder_symbols, can be large.

  Args:
    encoder_inputs: A list of 1D int32 Tensors of shape [batch_size].
    decoder_inputs: A list of 1D int32 Tensors of shape [batch_size].
    encoder_cell: tf.nn.rnn_cell.RNNCell for the encoder; passed in explicitly
      instead of deep-copying cell (local modification to the translate demo).
    cell: tf.nn.rnn_cell.RNNCell defining the cell function and size.
    num_encoder_symbols: Integer; number of symbols on the encoder side.
    num_decoder_symbols: Integer; number of symbols on the decoder side.
    embedding_size: Integer, the length of the embedding vector for each symbol.
    num_heads: Number of attention heads that read from attention_states.
    output_projection: None or a pair (W, B) of output projection weights and
      biases; W has shape [output_size x num_decoder_symbols] and B has
      shape [num_decoder_symbols]; if provided and feed_previous=True, each
      fed previous output will first be multiplied by W and added B.
    feed_previous: Boolean or scalar Boolean Tensor; if True, only the first
      of decoder_inputs will be used (the "GO" symbol), and all other decoder
      inputs will be taken from previous outputs (as in embedding_rnn_decoder).
      If False, decoder_inputs are used as given (the standard decoder case).
    dtype: The dtype of the initial RNN state (default: tf.float32).
    scope: VariableScope for the created subgraph; defaults to
      "embedding_attention_seq2seq".
    initial_state_attention: If False (default), initial attentions are zero.
      If True, initialize the attentions from the initial state and attention
      states.

  Returns:
    A tuple of the form (outputs, state), where:
      outputs: A list of the same length as decoder_inputs of 2D Tensors with
        shape [batch_size x num_decoder_symbols] containing the generated
        outputs.
      state: The state of each decoder cell at the final time-step.
        It is a 2D Tensor of shape [batch_size x cell.state_size].
  """
  with variable_scope.variable_scope(
      scope or "embedding_attention_seq2seq", dtype=dtype) as scope:
    dtype = scope.dtype
    # Encoder.
    # encoder_cell = copy.deepcopy(cell)  # original translate demo line;
    # deepcopy raised an error here, so encoder_cell is an explicit argument.
    encoder_cell = core_rnn_cell.EmbeddingWrapper(
        encoder_cell,
        embedding_classes=num_encoder_symbols,
        embedding_size=embedding_size)
    encoder_outputs, encoder_state = rnn.static_rnn(
        encoder_cell, encoder_inputs, dtype=dtype)

    # First calculate a concatenation of encoder outputs to put attention on.
    top_states = [
        array_ops.reshape(e, [-1, 1, cell.output_size]) for e in encoder_outputs
    ]
    attention_states = array_ops.concat(top_states, 1)

    # Decoder.
    output_size = None
    if output_projection is None:
      cell = core_rnn_cell.OutputProjectionWrapper(cell, num_decoder_symbols)
      output_size = num_decoder_symbols

    if isinstance(feed_previous, bool):
      return embedding_attention_decoder(
          decoder_inputs,
          encoder_state,
          attention_states,
          cell,
          num_decoder_symbols,
          embedding_size,
          num_heads=num_heads,
          output_size=output_size,
          output_projection=output_projection,
          feed_previous=feed_previous,
          initial_state_attention=initial_state_attention)

    # If feed_previous is a Tensor, we construct 2 graphs and use cond.
    def decoder(feed_previous_bool):
      reuse = None if feed_previous_bool else True
      with variable_scope.variable_scope(
          variable_scope.get_variable_scope(), reuse=reuse):
        outputs, state = embedding_attention_decoder(
            decoder_inputs,
            encoder_state,
            attention_states,
            cell,
            num_decoder_symbols,
            embedding_size,
            num_heads=num_heads,
            output_size=output_size,
            output_projection=output_projection,
            feed_previous=feed_previous_bool,
            update_embedding_for_previous=False,
            initial_state_attention=initial_state_attention)
        state_list = [state]
        if nest.is_sequence(state):
          state_list = nest.flatten(state)
        return outputs + state_list

    outputs_and_state = control_flow_ops.cond(feed_previous,
                                              lambda: decoder(True),
                                              lambda: decoder(False))
    outputs_len = len(decoder_inputs)  # Outputs length same as decoder inputs.
    state_list = outputs_and_state[outputs_len:]
    state = state_list[0]
    if nest.is_sequence(encoder_state):
      state = nest.pack_sequence_as(
          structure=encoder_state, flat_sequence=state_list)
    return outputs_and_state[:outputs_len], state


def one2many_rnn_seq2seq(encoder_inputs,
                         decoder_inputs_dict,
                         enc_cell,
                         dec_cells_dict,
                         num_encoder_symbols,
                         num_decoder_symbols_dict,
                         embedding_size,
                         feed_previous=False,
                         dtype=None,
                         scope=None):
  """One-to-many RNN sequence-to-sequence model (multi-task).

  This is a multi-task sequence-to-sequence model with one encoder and multiple
  decoders. A reference for multi-task sequence-to-sequence learning can be
  found here: http://arxiv.org/abs/1511.06114

  Args:
    encoder_inputs: A list of 1D int32 Tensors of shape [batch_size].
    decoder_inputs_dict: A dictionary mapping decoder name (string) to
      the corresponding decoder_inputs; each decoder_inputs is a list of 1D
      Tensors of shape [batch_size]; num_decoders is defined as
      len(decoder_inputs_dict).
    enc_cell: tf.nn.rnn_cell.RNNCell defining the encoder cell function and
      size.
    dec_cells_dict: A dictionary mapping decoder name (string) to an
      instance of tf.nn.rnn_cell.RNNCell.
    num_encoder_symbols: Integer; number of symbols on the encoder side.
    num_decoder_symbols_dict: A dictionary mapping decoder name (string) to an
      integer specifying number of symbols for the corresponding decoder;
      len(num_decoder_symbols_dict) must be equal to num_decoders.
    embedding_size: Integer, the length of the embedding vector for each symbol.
    feed_previous: Boolean or scalar Boolean Tensor; if True, only the first of
      decoder_inputs will be used (the "GO" symbol), and all other decoder
      inputs will be taken from previous outputs (as in embedding_rnn_decoder).
      If False, decoder_inputs are used as given (the standard decoder case).
    dtype: The dtype of the initial state for both the encoder and decoder
      rnn cells (default: tf.float32).
    scope: VariableScope for the created subgraph; defaults to
      "one2many_rnn_seq2seq".

  Returns:
    A tuple of the form (outputs_dict, state_dict), where:
      outputs_dict: A mapping from decoder name (string) to a list of the same
        length as decoder_inputs_dict[name]; each element in the list is a 2D
        Tensor with shape [batch_size x num_decoder_symbols_dict[name]]
        containing the generated outputs.
      state_dict: A mapping from decoder name (string) to the final state of the
        corresponding decoder RNN; it is a 2D Tensor of shape
        [batch_size x cell.state_size].

  Raises:
    TypeError: if enc_cell or any of the dec_cells are not instances of RNNCell.
    ValueError: if len(dec_cells_dict) != len(decoder_inputs_dict).
  """
  outputs_dict = {}
  state_dict = {}

  if not isinstance(enc_cell, rnn_cell_impl.RNNCell):
    raise TypeError("enc_cell is not an RNNCell: %s" % type(enc_cell))
  if set(dec_cells_dict) != set(decoder_inputs_dict):
    raise ValueError("keys of dec_cells_dict != keys of decoder_inputs_dict")
  for dec_cell in dec_cells_dict.values():
    if not isinstance(dec_cell, rnn_cell_impl.RNNCell):
      raise TypeError("dec_cell is not an RNNCell: %s" % type(dec_cell))

  with variable_scope.variable_scope(
      scope or "one2many_rnn_seq2seq", dtype=dtype) as scope:
    dtype = scope.dtype

    # Encoder.
    enc_cell = core_rnn_cell.EmbeddingWrapper(
        enc_cell,
        embedding_classes=num_encoder_symbols,
        embedding_size=embedding_size)
    _, encoder_state = rnn.static_rnn(enc_cell, encoder_inputs, dtype=dtype)

    # Decoder.
    for name, decoder_inputs in decoder_inputs_dict.items():
      num_decoder_symbols = num_decoder_symbols_dict[name]
      dec_cell = dec_cells_dict[name]

      with variable_scope.variable_scope("one2many_decoder_" + str(
          name)) as scope:
        dec_cell = core_rnn_cell.OutputProjectionWrapper(
            dec_cell, num_decoder_symbols)
        if isinstance(feed_previous, bool):
          outputs, state = embedding_rnn_decoder(
              decoder_inputs,
              encoder_state,
              dec_cell,
              num_decoder_symbols,
              embedding_size,
              feed_previous=feed_previous)
        else:
          # If feed_previous is a Tensor, we construct 2 graphs and use cond.
          def filled_embedding_rnn_decoder(feed_previous):
            """The current decoder with a fixed feed_previous parameter."""
            # pylint: disable=cell-var-from-loop
            reuse = None if feed_previous else True
            vs = variable_scope.get_variable_scope()
            with variable_scope.variable_scope(vs, reuse=reuse):
              outputs, state = embedding_rnn_decoder(
                  decoder_inputs,
                  encoder_state,
                  dec_cell,
                  num_decoder_symbols,
                  embedding_size,
                  feed_previous=feed_previous)
            # pylint: enable=cell-var-from-loop
            state_list = [state]
            if nest.is_sequence(state):
              state_list = nest.flatten(state)
            return outputs + state_list

          outputs_and_state = control_flow_ops.cond(
              feed_previous, lambda: filled_embedding_rnn_decoder(True),
              lambda: filled_embedding_rnn_decoder(False))
          # Outputs length is the same as for decoder inputs.
          outputs_len = len(decoder_inputs)
          outputs = outputs_and_state[:outputs_len]
          state_list = outputs_and_state[outputs_len:]
          state = state_list[0]
          if nest.is_sequence(encoder_state):
            state = nest.pack_sequence_as(
                structure=encoder_state, flat_sequence=state_list)
      outputs_dict[name] = outputs
      state_dict[name] = state

  return outputs_dict, state_dict


def sequence_loss_by_example(logits,
                             targets,
                             weights,
                             average_across_timesteps=True,
                             softmax_loss_function=None,
                             name=None):
  """Weighted cross-entropy loss for a sequence of logits (per example).

  Args:
    logits: List of 2D Tensors of shape [batch_size x num_decoder_symbols].
    targets: List of 1D batch-sized int32 Tensors of the same length as logits.
    weights: List of 1D batch-sized float-Tensors of the same length as logits.
    average_across_timesteps: If set, divide the returned cost by the total
      label weight.
    softmax_loss_function: Function (labels, logits) -> loss-batch
      to be used instead of the standard softmax (the default if this is None).
      **Note that to avoid confusion, it is required for the function to accept
      named arguments.**
    name: Optional name for this operation, default: "sequence_loss_by_example".

  Returns:
    1D batch-sized float Tensor: The log-perplexity for each sequence.

  Raises:
    ValueError: If len(logits) is different from len(targets) or len(weights).
  """
  if len(targets) != len(logits) or len(weights) != len(logits):
    raise ValueError("Lengths of logits, weights, and targets must be the same "
                     "%d, %d, %d." % (len(logits), len(weights), len(targets)))
  with ops.name_scope(name, "sequence_loss_by_example",
                      logits + targets + weights):
    log_perp_list = []
    for logit, target, weight in zip(logits, targets, weights):
      if softmax_loss_function is None:
        # TODO(irving,ebrevdo): This reshape is needed because
        # sequence_loss_by_example is called with scalars sometimes, which
        # violates our general scalar strictness policy.
        target = array_ops.reshape(target, [-1])
        crossent = nn_ops.sparse_softmax_cross_entropy_with_logits(
            labels=target, logits=logit)
      else:
        crossent = softmax_loss_function(labels=target, logits=logit)
      log_perp_list.append(crossent * weight)
    log_perps = math_ops.add_n(log_perp_list)
    if average_across_timesteps:
      total_size = math_ops.add_n(weights)
      total_size += 1e-12  # Just to avoid division by 0 for all-0 weights.
      log_perps /= total_size
  return log_perps


def sequence_loss(logits,
                  targets,
                  weights,
                  average_across_timesteps=True,
                  average_across_batch=True,
                  softmax_loss_function=None,
                  name=None):
  """Weighted cross-entropy loss for a sequence of logits, batch-collapsed.

  Args:
    logits: List of 2D Tensors of shape [batch_size x num_decoder_symbols].
    targets: List of 1D batch-sized int32 Tensors of the same length as logits.
    weights: List of 1D batch-sized float-Tensors of the same length as logits.
    average_across_timesteps: If set, divide the returned cost by the total
      label weight.
    average_across_batch: If set, divide the returned cost by the batch size.
    softmax_loss_function: Function (labels, logits) -> loss-batch
      to be used instead of the standard softmax (the default if this is None).
      **Note that to avoid confusion, it is required for the function to accept
      named arguments.**
    name: Optional name for this operation, defaults to "sequence_loss".

  Returns:
    A scalar float Tensor: The average log-perplexity per symbol (weighted).

  Raises:
    ValueError: If len(logits) is different from len(targets) or len(weights).
  """
  with ops.name_scope(name, "sequence_loss", logits + targets + weights):
    cost = math_ops.reduce_sum(
        sequence_loss_by_example(
            logits,
            targets,
            weights,
            average_across_timesteps=average_across_timesteps,
            softmax_loss_function=softmax_loss_function))
    if average_across_batch:
      batch_size = array_ops.shape(targets[0])[0]
      return cost / math_ops.cast(batch_size, cost.dtype)
    else:
      return cost


def model_with_buckets(encoder_inputs,
                       decoder_inputs,
                       targets,
                       weights,
                       buckets,
                       seq2seq,
                       softmax_loss_function=None,
                       per_example_loss=False,
                       name=None):
  """Create a sequence-to-sequence model with support for bucketing.

  The seq2seq argument is a function that defines a sequence-to-sequence model,
  e.g., seq2seq = lambda x, y: basic_rnn_seq2seq(
      x, y, rnn_cell.GRUCell(24))

  Args:
    encoder_inputs: A list of Tensors to feed the encoder; first seq2seq input.
    decoder_inputs: A list of Tensors to feed the decoder; second seq2seq input.
    targets: A list of 1D batch-sized int32 Tensors (desired output sequence).
    weights: List of 1D batch-sized float-Tensors to weight the targets.
    buckets: A list of pairs of (input size, output size) for each bucket.
    seq2seq: A sequence-to-sequence model function; it takes 2 inputs that
      agree with encoder_inputs and decoder_inputs, and returns a pair
      consisting of outputs and states (as, e.g., basic_rnn_seq2seq).
    softmax_loss_function: Function (labels, logits) -> loss-batch
      to be used instead of the standard softmax (the default if this is None).
      **Note that to avoid confusion, it is required for the function to accept
      named arguments.**
    per_example_loss: Boolean. If set, the returned loss will be a batch-sized
      tensor of losses for each sequence in the batch. If unset, it will be
      a scalar with the averaged loss from all examples.
    name: Optional name for this operation, defaults to "model_with_buckets".

  Returns:
    A tuple of the form (outputs, losses), where:
      outputs: The outputs for each bucket. Its j'th element consists of a list
        of 2D Tensors. The shape of output tensors can be either
        [batch_size x output_size] or [batch_size x num_decoder_symbols]
        depending on the seq2seq model used.
      losses: List of scalar Tensors, representing losses for each bucket, or,
        if per_example_loss is set, a list of 1D batch-sized float Tensors.

  Raises:
    ValueError: If length of encoder_inputs, targets, or weights is smaller
      than the largest (last) bucket.
  """
  if len(encoder_inputs) < buckets[-1][0]:
    raise ValueError("Length of encoder_inputs (%d) must be at least that of la"
                     "st bucket (%d)." % (len(encoder_inputs), buckets[-1][0]))
  if len(targets) < buckets[-1][1]:
    raise ValueError("Length of targets (%d) must be at least that of last "
                     "bucket (%d)." % (len(targets), buckets[-1][1]))
  if len(weights) < buckets[-1][1]:
    raise ValueError("Length of weights (%d) must be at least that of last "
                     "bucket (%d)." % (len(weights), buckets[-1][1]))

  all_inputs = encoder_inputs + decoder_inputs + targets + weights
  losses = []
  outputs = []
  with ops.name_scope(name, "model_with_buckets", all_inputs):
    for j, bucket in enumerate(buckets):
      with variable_scope.variable_scope(
          variable_scope.get_variable_scope(), reuse=True if j > 0 else None):
        bucket_outputs, _ = seq2seq(encoder_inputs[:bucket[0]],
                                    decoder_inputs[:bucket[1]])
        outputs.append(bucket_outputs)
        if per_example_loss:
          losses.append(
              sequence_loss_by_example(
                  outputs[-1],
                  targets[:bucket[1]],
                  weights[:bucket[1]],
                  softmax_loss_function=softmax_loss_function))
        else:
          losses.append(
              sequence_loss(
                  outputs[-1],
                  targets[:bucket[1]],
                  weights[:bucket[1]],
                  softmax_loss_function=softmax_loss_function))

  return outputs, losses
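To make the loss functions above concrete, here is a minimal, self-contained sketch (my own illustration, not part of the lab files; it assumes TensorFlow 1.x and the listing above saved as seq2seq.py). It runs three decoder steps of toy logits through sequence_loss_by_example and sequence_loss, with a zero weight masking the final step in the same way PAD positions are masked during training:

import numpy as np
import tensorflow as tf
import seq2seq  # the file created above

# Three decoder time steps, batch of 2, vocabulary of 4 symbols.
logits = [tf.constant(np.random.randn(2, 4), dtype=tf.float32) for _ in range(3)]
targets = [tf.constant([1, 2], dtype=tf.int32) for _ in range(3)]
# A zero weight on the last step removes it from the loss, which is how
# PAD targets are ignored (see get_batch in seq2seq_model.py below).
weights = [tf.constant([1.0, 1.0]), tf.constant([1.0, 1.0]), tf.constant([0.0, 0.0])]

per_example = seq2seq.sequence_loss_by_example(logits, targets, weights)
loss = seq2seq.sequence_loss(logits, targets, weights)

with tf.Session() as sess:
    print(sess.run(per_example))  # shape (2,): log-perplexity per sequence
    print(sess.run(loss))         # scalar: weighted average over the batch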
seq2seq_model sample code:
You can now create the source file seq2seq_model.py in the /home/ubuntu directory; its content can be as follows:
Sample code: /home/ubuntu/seq2seq_model.py
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import random

import numpy as np
from six.moves import xrange
import tensorflow as tf
import seq2seq
import generate_chat


class Seq2SeqModel(object):

  def __init__(self,
               source_vocab_size,
               target_vocab_size,
               buckets,
               size,
               num_layers,
               max_gradient_norm,
               batch_size,
               learning_rate,
               learning_rate_decay_factor,
               use_lstm=False,
               num_samples=512,
               forward_only=False,
               dtype=tf.float32):
    self.source_vocab_size = source_vocab_size
    self.target_vocab_size = target_vocab_size
    self.buckets = buckets
    self.batch_size = batch_size
    self.learning_rate = tf.Variable(
        float(learning_rate), trainable=False, dtype=dtype)
    self.learning_rate_decay_op = self.learning_rate.assign(
        self.learning_rate * learning_rate_decay_factor)
    self.global_step = tf.Variable(0, trainable=False)

    # Use sampled softmax when the sample count is smaller than the target
    # vocabulary, to keep the output layer affordable.
    output_projection = None
    softmax_loss_function = None
    if num_samples > 0 and num_samples < self.target_vocab_size:
      w_t = tf.get_variable("proj_w", [self.target_vocab_size, size], dtype=dtype)
      w = tf.transpose(w_t)
      b = tf.get_variable("proj_b", [self.target_vocab_size], dtype=dtype)
      output_projection = (w, b)

      def sampled_loss(labels, logits):
        labels = tf.reshape(labels, [-1, 1])
        # Compute sampled_softmax_loss in float32 for numerical stability.
        local_w_t = tf.cast(w_t, tf.float32)
        local_b = tf.cast(b, tf.float32)
        local_inputs = tf.cast(logits, tf.float32)
        return tf.cast(
            tf.nn.sampled_softmax_loss(
                weights=local_w_t,
                biases=local_b,
                labels=labels,
                inputs=local_inputs,
                num_sampled=num_samples,
                num_classes=self.target_vocab_size),
            dtype)
      softmax_loss_function = sampled_loss

    # Build separate encoder and decoder cells (instead of deep-copying one
    # cell as the stock translate demo does, which fails here).
    def single_cell():
      return tf.contrib.rnn.GRUCell(size)
    if use_lstm:
      def single_cell():
        return tf.contrib.rnn.BasicLSTMCell(size)
    cell = single_cell()
    encoder_cell = single_cell()
    if num_layers > 1:
      cell = tf.contrib.rnn.MultiRNNCell([single_cell() for _ in range(num_layers)])
      encoder_cell = tf.contrib.rnn.MultiRNNCell([single_cell() for _ in range(num_layers)])

    def seq2seq_f(encoder_inputs, decoder_inputs, do_decode):
      return seq2seq.embedding_attention_seq2seq(
          encoder_inputs,
          decoder_inputs,
          encoder_cell,
          cell,
          num_encoder_symbols=source_vocab_size,
          num_decoder_symbols=target_vocab_size,
          embedding_size=size,
          output_projection=output_projection,
          feed_previous=do_decode,
          dtype=dtype)

    # Feeds for inputs: one placeholder per time step, sized for the
    # largest bucket.
    self.encoder_inputs = []
    self.decoder_inputs = []
    self.target_weights = []
    for i in xrange(buckets[-1][0]):  # Last bucket is the biggest one.
      self.encoder_inputs.append(tf.placeholder(tf.int32, shape=[None],
                                                name="encoder{0}".format(i)))
    for i in xrange(buckets[-1][1] + 1):
      self.decoder_inputs.append(tf.placeholder(tf.int32, shape=[None],
                                                name="decoder{0}".format(i)))
      self.target_weights.append(tf.placeholder(dtype, shape=[None],
                                                name="weight{0}".format(i)))

    # Targets are decoder inputs shifted left by one (the GO symbol drops out).
    targets = [self.decoder_inputs[i + 1]
               for i in xrange(len(self.decoder_inputs) - 1)]

    if forward_only:
      self.outputs, self.losses = seq2seq.model_with_buckets(
          self.encoder_inputs, self.decoder_inputs, targets,
          self.target_weights, buckets, lambda x, y: seq2seq_f(x, y, True),
          softmax_loss_function=softmax_loss_function)
      # If we use output projection, we need to project outputs for decoding.
      if output_projection is not None:
        for b in xrange(len(buckets)):
          self.outputs[b] = [
              tf.matmul(output, output_projection[0]) + output_projection[1]
              for output in self.outputs[b]
          ]
    else:
      self.outputs, self.losses = tf.contrib.legacy_seq2seq.model_with_buckets(
          self.encoder_inputs, self.decoder_inputs, targets,
          self.target_weights, buckets,
          lambda x, y: seq2seq_f(x, y, False),
          softmax_loss_function=softmax_loss_function)

    # Gradients and SGD update operation for training the model.
    params = tf.trainable_variables()
    if not forward_only:
      self.gradient_norms = []
      self.updates = []
      opt = tf.train.GradientDescentOptimizer(self.learning_rate)
      for b in xrange(len(buckets)):
        gradients = tf.gradients(self.losses[b], params)
        clipped_gradients, norm = tf.clip_by_global_norm(gradients,
                                                         max_gradient_norm)
        self.gradient_norms.append(norm)
        self.updates.append(opt.apply_gradients(
            zip(clipped_gradients, params), global_step=self.global_step))

    self.saver = tf.train.Saver(tf.global_variables())

  def step(self, session, encoder_inputs, decoder_inputs, target_weights,
           bucket_id, forward_only):
    encoder_size, decoder_size = self.buckets[bucket_id]
    if len(encoder_inputs) != encoder_size:
      raise ValueError("Encoder length must be equal to the one in bucket,"
                       " %d != %d." % (len(encoder_inputs), encoder_size))
    if len(decoder_inputs) != decoder_size:
      raise ValueError("Decoder length must be equal to the one in bucket,"
                       " %d != %d." % (len(decoder_inputs), decoder_size))
    if len(target_weights) != decoder_size:
      raise ValueError("Weights length must be equal to the one in bucket,"
                       " %d != %d." % (len(target_weights), decoder_size))

    # Input feed: encoder inputs, decoder inputs, target_weights, as provided.
    input_feed = {}
    for l in xrange(encoder_size):
      input_feed[self.encoder_inputs[l].name] = encoder_inputs[l]
    for l in xrange(decoder_size):
      input_feed[self.decoder_inputs[l].name] = decoder_inputs[l]
      input_feed[self.target_weights[l].name] = target_weights[l]

    # Since our targets are decoder inputs shifted by one, we need one more.
    last_target = self.decoder_inputs[decoder_size].name
    input_feed[last_target] = np.zeros([self.batch_size], dtype=np.int32)

    # Output feed: depends on whether we do a backward step or not.
    if not forward_only:
      output_feed = [self.updates[bucket_id],  # Update Op that does SGD.
                     self.gradient_norms[bucket_id],  # Gradient norm.
                     self.losses[bucket_id]]  # Loss for this batch.
    else:
      output_feed = [self.losses[bucket_id]]  # Loss for this batch.
      for l in xrange(decoder_size):  # Output logits.
        output_feed.append(self.outputs[bucket_id][l])

    outputs = session.run(output_feed, input_feed)
    if not forward_only:
      return outputs[1], outputs[2], None  # Gradient norm, loss, no outputs.
    else:
      return None, outputs[0], outputs[1:]  # No gradient norm, loss, outputs.

  def get_batch(self, data, bucket_id):
    encoder_size, decoder_size = self.buckets[bucket_id]
    encoder_inputs, decoder_inputs = [], []

    # Get a random batch of encoder and decoder inputs from data,
    # pad them if needed, reverse encoder inputs and add GO to decoder.
    for _ in xrange(self.batch_size):
      encoder_input, decoder_input = random.choice(data[bucket_id])

      # Encoder inputs are padded and then reversed.
      encoder_pad = [generate_chat.PAD_ID] * (encoder_size - len(encoder_input))
      encoder_inputs.append(list(reversed(encoder_input + encoder_pad)))

      # Decoder inputs get an extra "GO" symbol, and are padded then.
      decoder_pad_size = decoder_size - len(decoder_input) - 1
      decoder_inputs.append([generate_chat.GO_ID] + decoder_input +
                            [generate_chat.PAD_ID] * decoder_pad_size)

    # Now we create batch-major vectors from the data selected above.
    batch_encoder_inputs, batch_decoder_inputs, batch_weights = [], [], []

    # Batch encoder inputs are just re-indexed encoder_inputs.
    for length_idx in xrange(encoder_size):
      batch_encoder_inputs.append(
          np.array([encoder_inputs[batch_idx][length_idx]
                    for batch_idx in xrange(self.batch_size)], dtype=np.int32))

    # Batch decoder inputs are re-indexed decoder_inputs, we create weights.
    for length_idx in xrange(decoder_size):
      batch_decoder_inputs.append(
          np.array([decoder_inputs[batch_idx][length_idx]
                    for batch_idx in xrange(self.batch_size)], dtype=np.int32))

      # Create target_weights to be 0 for targets that are padding.
      batch_weight = np.ones(self.batch_size, dtype=np.float32)
      for batch_idx in xrange(self.batch_size):
        # We set weight to 0 if the corresponding target is a PAD symbol.
        # The corresponding target is decoder_input shifted by 1 forward.
        if length_idx < decoder_size - 1:
          target = decoder_inputs[batch_idx][length_idx + 1]
        if length_idx == decoder_size - 1 or target == generate_chat.PAD_ID:
          batch_weight[batch_idx] = 0.0
      batch_weights.append(batch_weight)
    return batch_encoder_inputs, batch_decoder_inputs, batch_weights
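Before training, it helps to see exactly what get_batch produces. The following standalone sketch (plain Python, my own illustration; the ID values are hypothetical) replays its padding scheme for one question/answer pair in the (10, 15) bucket:

PAD_ID, GO_ID = 0, 1                 # same IDs as in generate_chat.py
encoder_size, decoder_size = 10, 15  # the (10, 15) bucket
encoder_input = [45, 8, 122]         # hypothetical character IDs of a question
decoder_input = [77, 301, 5, 2]      # answer IDs, already ending in EOS_ID (2)

# Encoder inputs are padded to the bucket length, then reversed.
encoder_pad = [PAD_ID] * (encoder_size - len(encoder_input))
print(list(reversed(encoder_input + encoder_pad)))
# -> [0, 0, 0, 0, 0, 0, 0, 122, 8, 45]

# Decoder inputs get a leading GO symbol, then padding.
decoder_pad_size = decoder_size - len(decoder_input) - 1
print([GO_ID] + decoder_input + [PAD_ID] * decoder_pad_size)
# -> [1, 77, 301, 5, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]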
Training the Seq2Seq Model
After roughly 300,000 training steps the loss stops improving. Training takes about 17 hours on a single GPU, or around 3 days on a CPU. You can lower the number of loop iterations just to get a feel for the training process, or download our pre-trained model directly (see below).
Sample code:
You can now create the source file train_chat.py in the /home/ubuntu directory; its content can be as follows:
Sample code: /home/ubuntu/train_chat.py
#-*- coding:utf-8 -*-
import generate_chat
import seq2seq_model
import tensorflow as tf
import numpy as np
import logging
import logging.handlers

if __name__ == '__main__':

    _,_,source_vocab_size = generate_chat.get_vocabs(generate_chat.vocab_encode_file)
    _,_,target_vocab_size = generate_chat.get_vocabs(generate_chat.vocab_decode_file)
    train_set = generate_chat.read_data(generate_chat.train_encode_vec_file,generate_chat.train_decode_vec_file)
    test_set = generate_chat.read_data(generate_chat.test_encode_vec_file,generate_chat.test_decode_vec_file)
    train_bucket_sizes = [len(train_set[i]) for i in range(len(generate_chat._buckets))]
    train_total_size = float(sum(train_bucket_sizes))
    # Cumulative size fractions: used below to pick each bucket with
    # probability proportional to how many training pairs it holds.
    train_buckets_scale = [sum(train_bucket_sizes[:i + 1]) / train_total_size for i in range(len(train_bucket_sizes))]
    with tf.Session() as sess:
        model = seq2seq_model.Seq2SeqModel(source_vocab_size,
            target_vocab_size,
            generate_chat._buckets,
            generate_chat.units_num,
            generate_chat.num_layers,
            generate_chat.max_gradient_norm,
            generate_chat.batch_size,
            generate_chat.learning_rate,
            generate_chat.learning_rate_decay_factor,
            use_lstm = True)
        ckpt = tf.train.get_checkpoint_state('.')
        if ckpt and tf.train.checkpoint_exists(ckpt.model_checkpoint_path):
            print("Reading model parameters from %s" % ckpt.model_checkpoint_path)
            model.saver.restore(sess, ckpt.model_checkpoint_path)
        else:
            print("Created model with fresh parameters.")
            sess.run(tf.global_variables_initializer())
        loss = 0.0
        step = 0
        previous_losses = []
        while True:
            random_number_01 = np.random.random_sample()
            bucket_id = min([i for i in range(len(train_buckets_scale)) if train_buckets_scale[i] > random_number_01])
            encoder_inputs, decoder_inputs, target_weights = model.get_batch(train_set, bucket_id)
            _, step_loss, _ = model.step(sess, encoder_inputs, decoder_inputs,target_weights, bucket_id, False)
            print("step:%d,loss:%f" % (step,step_loss))
            loss += step_loss / 1000  # average over the 1000 steps between checkpoints
            step += 1
            if step % 1000 == 0:
                print("step:%d,per_loss:%f" % (step,loss))
                # Decay the learning rate if the loss has stopped improving.
                if len(previous_losses) > 2 and loss > max(previous_losses[-3:]):
                    sess.run(model.learning_rate_decay_op)
                previous_losses.append(loss)
                model.saver.save(sess, "./chatbot.ckpt", global_step=model.global_step)
                loss = 0.0
            if step % 5000 == 0:
                # Evaluate on the held-out set for every non-empty bucket.
                for bucket_id in range(len(generate_chat._buckets)):
                    if len(test_set[bucket_id]) == 0:
                        continue
                    encoder_inputs, decoder_inputs, target_weights = model.get_batch(test_set, bucket_id)
                    _, eval_loss, _ = model.step(sess, encoder_inputs, decoder_inputs, target_weights, bucket_id, True)
                    print("bucket_id:%d,eval_loss:%f" % (bucket_id,eval_loss))
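A note on the bucket-selection lines in the loop above: train_buckets_scale holds the cumulative fraction of training pairs per bucket, so a uniform random number picks each bucket with probability proportional to its size. A standalone numpy sketch of the same idea, with hypothetical bucket sizes:

import numpy as np

bucket_sizes = [4000, 8000, 6000, 2000]  # hypothetical pair counts per bucket
total = float(sum(bucket_sizes))
scale = [sum(bucket_sizes[:i + 1]) / total for i in range(len(bucket_sizes))]
# scale == [0.2, 0.6, 0.9, 1.0]

r = np.random.random_sample()
bucket_id = min(i for i in range(len(scale)) if scale[i] > r)
print(bucket_id)  # bucket 1 is sampled most often, matching its 40% share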
Then run:
cd /home/ubuntu
python train_chat.py
Result:
step:311991,loss:0.000332
step:311992,loss:0.000199
step:311993,loss:0.000600
step:311994,loss:0.001900
step:311995,loss:0.018695
step:311996,loss:0.000945
step:311997,loss:0.000517
step:311998,loss:0.000530
step:311999,loss:0.001020
step:312000,per_loss:0.000672
step:312000,loss:0.000276
step:312001,loss:0.000332
step:312002,loss:0.003255
step:312003,loss:0.000452
step:312004,loss:0.000553
Download the pre-trained model:
wget http://tensorflow-1253675457.cosgz.myqcloud.com/chat/chat_model.zip
unzip -o chat_model.zip
Start Chatting
With the trained model we can now chat. The training data is limited, so only simple exchanges work well; for best results, phrase your questions close to the training data.
Sample code:
You can now create the source file predict_chat.py in the /home/ubuntu directory; its content can be as follows:
Sample code: /home/ubuntu/predict_chat.py
#-*- coding:utf-8 -*-
import generate_chat
import seq2seq_model
import tensorflow as tf
import numpy as np
import sys

if __name__ == '__main__':
    source_id_to_word,source_word_to_id,source_vocab_size = generate_chat.get_vocabs(generate_chat.vocab_encode_file)
    target_id_to_word,target_word_to_id,target_vocab_size = generate_chat.get_vocabs(generate_chat.vocab_decode_file)
    to_id = lambda word: source_word_to_id.get(word,generate_chat.UNK_ID)
    with tf.Session() as sess:
        # batch_size is 1: we decode a single question at a time.
        model = seq2seq_model.Seq2SeqModel(source_vocab_size,
                                           target_vocab_size,
                                           generate_chat._buckets,
                                           generate_chat.units_num,
                                           generate_chat.num_layers,
                                           generate_chat.max_gradient_norm,
                                           1,
                                           generate_chat.learning_rate,
                                           generate_chat.learning_rate_decay_factor,
                                           forward_only = True,
                                           use_lstm = True)
        model.saver.restore(sess,"chatbot.ckpt-317000")
        while True:
            sys.stdout.write("ask > ")
            sys.stdout.flush()
            sentence = sys.stdin.readline().strip('\n')
            flag = generate_chat.is_chinese(sentence)
            if not sentence or not flag:
                print("请输入纯中文")  # ask for pure Chinese input
                continue
            sentence_vec = list(map(to_id,sentence))
            # Pick the smallest bucket that fits the question.
            bucket_id = len(generate_chat._buckets) - 1
            if len(sentence_vec) > generate_chat._buckets[bucket_id][0]:
                print("sentence too long max:%d" % generate_chat._buckets[bucket_id][0])
                exit(0)
            for i,bucket in enumerate(generate_chat._buckets):
                if bucket[0] >= len(sentence_vec):
                    bucket_id = i
                    break
            encoder_inputs, decoder_inputs, target_weights = model.get_batch({bucket_id: [(sentence_vec, [])]}, bucket_id)
            _, _, output_logits = model.step(sess, encoder_inputs, decoder_inputs,target_weights, bucket_id, True)
            # Greedy decoding: take the argmax symbol at each step, stop at EOS.
            outputs = [int(np.argmax(logit, axis=1)) for logit in output_logits]
            if generate_chat.EOS_ID in outputs:
                outputs = outputs[:outputs.index(generate_chat.EOS_ID)]
            answer = "".join([tf.compat.as_str(target_id_to_word[output]) for output in outputs])
            print("answer > " + answer)
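If your checkpoint carries a different step suffix than chatbot.ckpt-317000, one option is to restore whatever checkpoint is newest in the working directory, mirroring the restore logic train_chat.py already uses. This is a drop-in sketch for the model.saver.restore line above (sess and model are the names defined in that script):

ckpt = tf.train.get_checkpoint_state('.')  # newest checkpoint in the current directory
if ckpt and tf.train.checkpoint_exists(ckpt.model_checkpoint_path):
    model.saver.restore(sess, ckpt.model_checkpoint_path)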
Then run (be patient; loading takes a few minutes):
cd /home/ubuntu
python predict_chat.py
Result:
ask > 你大爷
answer > 你大爷
ask > 你好
answer > 你好呀
ask > 我是谁
answer > 哈哈,大屌丝,地地眼
Complete the Experiment
Tencent Cloud