Loss function vs gradient updates in policy gradient methods

by Julia Maddalena

I am having trouble wrapping my head around the differences in notation used to describe the algorithms of Deep Q-Networks (DQN), policy gradient methods, and actor-critic methods.

For DQN, we talk in terms of a loss function, and it happens to look like a mean squared error:

$$L(\theta) = \mathbb{E}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta)\right)^2\right]$$

For policy gradient methods, I see notation for loss functions that looks like cross-entropy, and I also see the update written directly in policy gradient form:

$$\theta \leftarrow \theta + \alpha \, \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t$$
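For concreteness, here is roughly how I understand those two notations translating into code. This is a minimal PyTorch sketch I put together; the batch size, number of actions, and random tensors are made-up placeholders:

```python
import torch
import torch.nn.functional as F

# Made-up batch of 32 transitions with 4 discrete actions (placeholders only).
q_values = torch.randn(32, 4, requires_grad=True)   # Q(s, a; theta)
target_q_values = torch.randn(32, 4)                 # Q(s', a'; theta^-), held fixed
actions = torch.randint(0, 4, (32,))
rewards = torch.randn(32)
returns = torch.randn(32)                            # G_t for the policy gradient case
logits = torch.randn(32, 4, requires_grad=True)      # pre-softmax pi_theta(a | s)
gamma = 0.99

# DQN: an explicit MSE loss between Q(s, a) and the bootstrapped TD target.
q_sa = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
td_target = rewards + gamma * target_q_values.max(dim=1).values
dqn_loss = F.mse_loss(q_sa, td_target)

# Policy gradient: the "loss" is a surrogate, a return-weighted cross-entropy,
# whose gradient is exactly the policy gradient update above.
log_probs = torch.log_softmax(logits, dim=1).gather(1, actions.unsqueeze(1)).squeeze(1)
pg_loss = -(log_probs * returns).mean()

# Differentiating either scalar reproduces the corresponding update rule.
dqn_loss.backward()
pg_loss.backward()
```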

For actor-critic methods, I have not been able to find a loss function. Everything is written in terms of update rules like the one above for policy gradients. Why is this?
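To illustrate the kind of update rule I mean, here is a minimal one-step actor-critic sketch (the network sizes, learning rates, and single-transition setup are assumptions for illustration only), where the update can be read either as a pair of rules or as a pair of losses:

```python
import torch
from torch import nn, optim

# Assumed setup: 8-dimensional states, 4 discrete actions, one transition.
policy = nn.Linear(8, 4)       # actor: logits for pi_theta(a | s)
value_fn = nn.Linear(8, 1)     # critic: V_w(s)
actor_opt = optim.SGD(policy.parameters(), lr=1e-2)
critic_opt = optim.SGD(value_fn.parameters(), lr=1e-2)
gamma = 0.99

state = torch.randn(1, 8)
next_state = torch.randn(1, 8)
reward = torch.tensor([1.0])
action = torch.tensor([2])

# TD error delta = r + gamma * V(s') - V(s) plays the role of the return G_t.
v_s = value_fn(state).squeeze(1)
with torch.no_grad():
    v_next = value_fn(next_state).squeeze(1)
td_error = reward + gamma * v_next - v_s

# The same update written as losses: squared TD error for the critic,
# -log pi(a|s) * delta for the actor (delta treated as a constant there).
critic_loss = td_error.pow(2).mean()
log_prob = torch.log_softmax(policy(state), dim=1)[0, action]
actor_loss = -(log_prob * td_error.detach()).mean()

actor_opt.zero_grad()
critic_opt.zero_grad()
(actor_loss + critic_loss).backward()
actor_opt.step()
critic_opt.step()
```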


