Guys, if you struggle with
neg_log_prob = tf.nn.softmax_cross_entropy_with_logits_v2(logits=fc3, labels=actions)
in the Cartpole REINFORCE Monte Carlo Policy Gradients notebook: I spent some time figuring out what is happening there. You can replace that line with the expansion below:
# softmax turns the logits into action probabilities
y_hat_softmax = tf.nn.softmax(fc3)
# actions is one-hot, so this keeps only the log-prob of the action actually taken
y_cross = actions * tf.log(y_hat_softmax)
# sum over the action dimension -> negative log-probability of each step
neg_log_prob = -tf.reduce_sum(y_cross, axis=1)
# weight each step's -log pi(a|s) by its discounted return, then average
loss = tf.reduce_mean(neg_log_prob * discounted_episode_rewards_)
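
If you want to convince yourself the expansion really matches the built-in op, here is a minimal standalone check (TensorFlow 1.x; the logits and one-hot labels are made-up values, not from the notebook):

import numpy as np
import tensorflow as tf

fc3 = tf.constant([[2.0, 1.0], [0.5, 3.0]])      # fake logits for a batch of 2
actions = tf.constant([[1.0, 0.0], [0.0, 1.0]])  # one-hot "labels"

builtin = tf.nn.softmax_cross_entropy_with_logits_v2(logits=fc3, labels=actions)
manual = -tf.reduce_sum(actions * tf.log(tf.nn.softmax(fc3)), axis=1)

with tf.Session() as sess:
    b, m = sess.run([builtin, manual])
    print(b, m)                 # same per-example cross-entropy values
    assert np.allclose(b, m)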
Also change the actions placeholder to float32, since the one-hot labels now get multiplied element-wise with the output of tf.log:
actions = tf.placeholder(tf.float32, [None, action_size], name="actions")
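
For context, here is a small sketch of how one-hot actions feed into that float32 placeholder (the episode bookkeeping names below are my own illustration, not from the notebook):

import numpy as np
import tensorflow as tf

action_size = 2  # Cartpole: push left / push right
actions = tf.placeholder(tf.float32, [None, action_size], name="actions")

episode_actions = [0, 1, 1]  # actions sampled by the policy (illustrative)
# one-hot encode as float32 so actions * tf.log(y_hat_softmax) is a float multiply
one_hot_actions = np.eye(action_size, dtype=np.float32)[episode_actions]
# one_hot_actions == [[1. 0.], [0. 1.], [0. 1.]]
# later: sess.run(loss, feed_dict={actions: one_hot_actions, ...})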