
Proximal Policy Optimization — Spinning Up documentation
Quick Facts: PPO is an on-policy algorithm. PPO can be used for environments with either discrete or continuous action spaces. The Spinning Up implementation of PPO supports parallelization with MPI.
Proximal Policy Optimization - OpenAI
Jul 20, 2017 · We’re releasing a new class of reinforcement learning algorithms, Proximal Policy Optimization (PPO), which perform comparably to or better than state-of-the-art approaches while …
Part 3: Intro to Policy Optimization — Spinning Up documentation
In this section, we’ll discuss the mathematical foundations of policy optimization algorithms, and connect the material to sample code. We will cover three key results in the theory of policy gradients:
Algorithms — Spinning Up documentation - OpenAI
We chose the core deep RL algorithms in this package to reflect useful progressions of ideas from the recent history of the field, culminating in two algorithms in particular—PPO and SAC—which are …
Soft Actor-Critic — Spinning Up documentation - OpenAI
Soft Actor-Critic (SAC) is an algorithm that optimizes a stochastic policy in an off-policy way, forming a bridge between stochastic policy optimization and DDPG-style approaches.
Part 2: Kinds of RL Algorithms — Spinning Up documentation
Use a model-free RL algorithm to train a policy or Q-function, but either 1) augment real experiences with fictitious ones in updating the agent, or 2) use only fictitious experience for updating the agent.
Trust Region Policy Optimization — Spinning Up documentation
Quick Facts: TRPO is an on-policy algorithm. TRPO can be used for environments with either discrete or continuous action spaces. The Spinning Up implementation of TRPO supports parallelization with …
Vanilla Policy Gradient — Spinning Up documentation - OpenAI
Quick Facts: VPG is an on-policy algorithm. VPG can be used for environments with either discrete or continuous action spaces. The Spinning Up implementation of VPG supports parallelization with MPI.
Spinning Up in Deep RL - OpenAI
Nov 8, 2018 · A well-documented code repo of short, standalone implementations of: Vanilla Policy Gradient (VPG), Trust Region Policy Optimization (TRPO), …
OpenAI Baselines: ACKTR & A2C
Aug 18, 2017 · In the following plot we show the performance of ACKTR on 49 Atari games compared to other algorithms: A2C, PPO, and ACER. The hyperparameters of ACKTR were tuned by the author of …