I am having trouble wrapping my head around the differences in notation used to describe the algorithms of Deep Q Networks (DQN), policy gradient methods, and actor-critic methods.
For DQN, we talk in terms of a loss function, and it happens to look like MSE:
$$L(\theta) = \mathbb{E}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta)\right)^2\right]$$
For policy gradient methods, I see notation for loss functions that looks like cross-entropy,
$$L(\theta) = -\mathbb{E}\left[G_t \log \pi(a_t \mid s_t; \theta)\right],$$
and I also see the policy gradient update form:
$$\theta \leftarrow \theta + \alpha \, G_t \nabla_\theta \log \pi(a_t \mid s_t; \theta)$$
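To make it concrete, here is a minimal NumPy sketch of the two loss shapes I mean on a toy batch (all values and variable names are made up for illustration, not from any particular implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch: 4 transitions, 3 discrete actions (purely illustrative numbers).
n, n_actions = 4, 3
q_values = rng.normal(size=(n, n_actions))   # Q(s, a; theta) from online net
q_next = rng.normal(size=(n, n_actions))     # Q(s', a'; theta^-) from target net
actions = np.array([0, 2, 1, 0])
rewards = np.array([1.0, 0.0, -1.0, 0.5])
gamma = 0.99

# DQN: mean squared TD error -- an MSE-shaped loss.
td_target = rewards + gamma * q_next.max(axis=1)
q_taken = q_values[np.arange(n), actions]
dqn_loss = np.mean((td_target - q_taken) ** 2)

# Policy gradient: -log pi(a|s) weighted by the return G_t --
# the same shape as a cross-entropy loss with return-weighted targets.
logits = rng.normal(size=(n, n_actions))
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
returns = np.array([2.0, 1.0, -0.5, 0.3])
pg_loss = -np.mean(returns * log_probs[np.arange(n), actions])

print(dqn_loss, pg_loss)
```

Differentiating `pg_loss` with respect to the policy parameters and taking a gradient step recovers exactly the update rule written above, which is why the two notations describe the same algorithm.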
For actor-critic methods, I have not been able to find a loss function. Everything is written in terms of update rules like the policy gradient update above. Why is this?