Guys, if you struggle with
neg_log_prob = tf.nn.softmax_cross_entropy_with_logits_v2(logits=fc3, labels=actions)
in the Cartpole REINFORCE Monte Carlo Policy Gradients notebook: I spent some time figuring out what is happening there. You can replace that line with the expansion below:
# softmax turns the logits into action probabilities
y_hat_softmax = tf.nn.softmax(fc3)
# actions is one-hot, so this keeps only the log-prob of the action actually taken
y_cross = actions * tf.log(y_hat_softmax)
# sum over the action dimension -> negative log-probability of each step
neg_log_prob = -tf.reduce_sum(y_cross, axis=1)
# weight each step's -log pi(a|s) by its discounted return, then average
loss = tf.reduce_mean(neg_log_prob * discounted_episode_rewards_)
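
If you want to convince yourself the expansion really matches the built-in op, here is a minimal standalone check (TensorFlow 1.x; the logits and one-hot labels are made-up values, not from the notebook):

import numpy as np
import tensorflow as tf

fc3 = tf.constant([[2.0, 1.0], [0.5, 3.0]])      # fake logits for a batch of 2
actions = tf.constant([[1.0, 0.0], [0.0, 1.0]])  # one-hot "labels"

builtin = tf.nn.softmax_cross_entropy_with_logits_v2(logits=fc3, labels=actions)
manual = -tf.reduce_sum(actions * tf.log(tf.nn.softmax(fc3)), axis=1)

with tf.Session() as sess:
    b, m = sess.run([builtin, manual])
    print(b, m)                 # same per-example cross-entropy values
    assert np.allclose(b, m)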
Also change the actions placeholder to float32, since the one-hot labels now get multiplied element-wise with the output of tf.log:
actions = tf.placeholder(tf.float32, [None, action_size], name="actions")
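
For context, here is a small sketch of how one-hot actions feed into that float32 placeholder (the episode bookkeeping names below are my own illustration, not from the notebook):

import numpy as np
import tensorflow as tf

action_size = 2  # Cartpole: push left / push right
actions = tf.placeholder(tf.float32, [None, action_size], name="actions")

episode_actions = [0, 1, 1]  # actions sampled by the policy (illustrative)
# one-hot encode as float32 so actions * tf.log(y_hat_softmax) is a float multiply
one_hot_actions = np.eye(action_size, dtype=np.float32)[episode_actions]
# one_hot_actions == [[1. 0.], [0. 1.], [0. 1.]]
# later: sess.run(loss, feed_dict={actions: one_hot_actions, ...})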