PPO Algorithm - Search

openai.com
https://spinningup.openai.com › en › latest › algorithms › ppo.html
Proximal Policy Optimization — Spinning Up documentation
Quick Facts ¶ PPO is an on-policy algorithm. PPO can be used for environments with either discrete or continuous action spaces. The Spinning Up implementation of PPO supports parallelization with MPI.
openai.com
https://openai.com › index › openai-baselines-ppo
Proximal Policy Optimization - OpenAI
Jul 20, 2017 · We’re releasing a new class of reinforcement learning algorithms, Proximal Policy Optimization (PPO), which perform comparably or better than state-of-the-art approaches while …
openai.com
https://spinningup.openai.com › en › latest › spinningup
Part 3: Intro to Policy Optimization — Spinning Up documentation
In this section, we’ll discuss the mathematical foundations of policy optimization algorithms, and connect the material to sample code. We will cover three key results in the theory of policy gradients:
openai.com
https://spinningup.openai.com › en › latest › user › algorithms.html
Algorithms — Spinning Up documentation - OpenAI
We chose the core deep RL algorithms in this package to reflect useful progressions of ideas from the recent history of the field, culminating in two algorithms in particular—PPO and SAC—which are …
openai.com
https://spinningup.openai.com › en › latest › algorithms › sac.html
Soft Actor-Critic — Spinning Up documentation - OpenAI
Soft Actor Critic (SAC) is an algorithm that optimizes a stochastic policy in an off-policy way, forming a bridge between stochastic policy optimization and DDPG-style approaches.
openai.com
https://spinningup.openai.com › en › latest › spinningup
Part 2: Kinds of RL Algorithms — Spinning Up documentation
Use a model-free RL algorithm to train a policy or Q-function, but either 1) augment real experiences with fictitious ones in updating the agent, or 2) use only fictitious experience for updating the agent.
openai.com
https://spinningup.openai.com › en › latest › algorithms › trpo.html
Trust Region Policy Optimization — Spinning Up documentation
Quick Facts ¶ TRPO is an on-policy algorithm. TRPO can be used for environments with either discrete or continuous action spaces. The Spinning Up implementation of TRPO supports parallelization with …
openai.com
https://spinningup.openai.com › en › latest › algorithms › vpg.html
Vanilla Policy Gradient — Spinning Up documentation - OpenAI
Quick Facts ¶ VPG is an on-policy algorithm. VPG can be used for environments with either discrete or continuous action spaces. The Spinning Up implementation of VPG supports parallelization with MPI.
openai.com
https://openai.com › index › spinning-up-in-deep-rl
Spinning Up in Deep RL - OpenAI
Nov 8, 2018 · A well-documented code repo of short, standalone implementations of: Vanilla Policy Gradient (VPG), Trust Region Policy Optimization (TRPO), …
openai.com
https://openai.com › index
OpenAI Baselines: ACKTR & A2C
Aug 18, 2017 · In the following plot we show performance of ACKTR on 49 Atari games compared to other algorithms: A2C, PPO, ACER. The hyperparameters of ACKTR were tuned by the author of …
