Loss function vs gradient updates in policy gradient methods

September 11, 2019

I am having trouble wrapping my head around the different notations used to describe Deep Q-Networks (DQN), policy gradient methods, and actor-critic methods.

For DQN, we talk in terms of a loss function, and it happens to look like a mean squared error (MSE). For policy gradient methods, I see loss functions that look like cross-entropy, and I also see updates written in the policy gradient update form.

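To see how the "loss" and "update rule" notations relate, here is a minimal, dependency-free Python sketch. It assumes a standard DQN setup with a frozen target network and a one-step softmax policy over hard-coded logits; the Q-tables and logits stand in for network outputs, and all function names (`dqn_mse_loss`, `pg_surrogate_loss`, `pg_update_direction`, `numerical_grad`) are hypothetical. It computes the DQN mean squared TD error, then checks numerically that the negative gradient of the cross-entropy-like surrogate `-G * log pi(a)` matches the REINFORCE-style update direction `G * grad log pi(a)`.

```python
import math

# Hypothetical helpers; Q-values and logits are hard-coded stand-ins for
# network outputs so the example has no dependencies.


def dqn_mse_loss(batch, q_online, q_target, gamma=0.99):
    """Mean squared TD error over (s, a, r, s_next, done) tuples.

    The target network is treated as a constant, which is what turns the
    DQN objective into a plain regression (MSE) problem.
    """
    total = 0.0
    for s, a, r, s_next, done in batch:
        bootstrap = 0.0 if done else gamma * max(q_target[s_next])
        y = r + bootstrap                   # regression target
        total += (y - q_online[s][a]) ** 2  # squared TD error
    return total / len(batch)


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pg_surrogate_loss(logits, action, ret):
    """Return-weighted negative log-likelihood, i.e. a cross-entropy with weight G."""
    return -ret * math.log(softmax(logits)[action])


def pg_update_direction(logits, action, ret):
    """G * grad log pi(a) for a softmax policy.

    For softmax, d log pi(a) / d logit_k = 1[k == a] - pi(k).
    """
    probs = softmax(logits)
    return [ret * ((1.0 if k == action else 0.0) - p) for k, p in enumerate(probs)]


def numerical_grad(f, x, eps=1e-6):
    """Central finite differences, used only to check the claim above."""
    grads = []
    for k in range(len(x)):
        hi = list(x)
        hi[k] += eps
        lo = list(x)
        lo[k] -= eps
        grads.append((f(hi) - f(lo)) / (2 * eps))
    return grads


if __name__ == "__main__":
    q = {0: [1.0, 2.0], 1: [0.5, 0.0]}
    batch = [(0, 1, 1.0, 1, False), (1, 0, 0.0, 0, True)]
    print("DQN loss:", dqn_mse_loss(batch, q, q, gamma=0.9))

    logits, action, ret = [0.2, -0.1, 0.5], 2, 3.0
    loss_grad = numerical_grad(lambda th: pg_surrogate_loss(th, action, ret), logits)
    # Gradient descent on the surrogate moves theta along -grad(loss),
    # which should equal the policy gradient update direction.
    print([round(-g, 6) for g in loss_grad])
    print([round(u, 6) for u in pg_update_direction(logits, action, ret)])
```

In this sketch the policy gradient "loss" is just a scalar whose gradient is the update rule. Unlike the DQN MSE, it is not a quantity you expect to drive toward zero.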

For actor-critic methods, I have not been able to find a loss function. Everything is written in terms of update rules, like the policy gradient update above. Why is this?
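Actor-critic updates can also be written as gradient descent on losses. Below is a minimal sketch, assuming a one-step (TD(0)) actor-critic with tabular softmax logits `theta[s]` and a tabular critic `v[s]` standing in for networks (all names are hypothetical). The critic minimizes `0.5 * delta**2` with the TD target held fixed, and the actor minimizes `-delta * log pi(a|s)` with the TD error `delta` treated as a constant advantage. Taking one gradient-descent step on each loss by hand reproduces the familiar update rules.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_losses(theta, v, s, a, r, s_next, done, gamma=0.99):
    """Return (actor_loss, critic_loss, delta) for one transition.

    theta[s] holds action logits and v[s] the state value (hypothetical
    tables standing in for networks). The TD error delta acts as the
    advantage estimate and is treated as a constant in the actor loss.
    """
    target = r + (0.0 if done else gamma * v[s_next])
    delta = target - v[s]
    actor_loss = -delta * math.log(softmax(theta[s])[a])  # policy gradient surrogate
    critic_loss = 0.5 * delta ** 2                         # value regression (MSE)
    return actor_loss, critic_loss, delta


def actor_critic_step(theta, v, s, a, r, s_next, done,
                      gamma=0.99, lr_actor=0.1, lr_critic=0.1):
    """One hand-derived gradient-descent step on both losses.

    Critic: d(0.5 * delta**2) / d v[s] = -delta (target held fixed),
            so descending gives v[s] += lr_critic * delta.
    Actor:  d(-delta * log pi(a|s)) / d theta[s][k] = -delta * (1[k == a] - pi(k|s)),
            so descending gives theta[s][k] += lr_actor * delta * (1[k == a] - pi(k|s)).
    """
    _, _, delta = actor_critic_losses(theta, v, s, a, r, s_next, done, gamma)
    probs = softmax(theta[s])  # computed before any parameter changes
    for k, p in enumerate(probs):
        theta[s][k] += lr_actor * delta * ((1.0 if k == a else 0.0) - p)
    v[s] += lr_critic * delta
    return delta


if __name__ == "__main__":
    theta = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    v = {0: 0.0, 1: 1.0}
    delta = actor_critic_step(theta, v, 0, 1, 1.0, 1, False, gamma=0.9)
    print("delta:", delta, "theta[0]:", theta[0], "v[0]:", v[0])
```

With automatic differentiation, a common pattern is to sum the two losses and stop gradients through `delta` in the actor term; the update-rule notation found in papers corresponds to the hand-derived gradients of these losses.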

