  1. Proximal Policy Optimization — Spinning Up documentation

    Quick Facts ¶ PPO is an on-policy algorithm. PPO can be used for environments with either discrete or continuous action spaces. The Spinning Up implementation of PPO supports parallelization with MPI.
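    The result above summarizes PPO's quick facts; the core of PPO-Clip is its clipped surrogate objective, L = E[min(r·A, clip(r, 1−ε, 1+ε)·A)], where r is the new-to-old policy probability ratio and A is the advantage. A minimal NumPy sketch of that objective (an illustration, not the Spinning Up implementation; the function name and ε default are choices made here):

    ```python
    import numpy as np

    def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
        """PPO-Clip surrogate: mean of min(r*A, clip(r, 1-eps, 1+eps)*A).

        ratio:     pi_new(a|s) / pi_old(a|s), per sample
        advantage: advantage estimate, per sample
        """
        clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
        return np.mean(np.minimum(ratio * advantage, clipped * advantage))

    # With a positive advantage, a ratio above 1 + eps is clipped,
    # removing the incentive to move the policy further in one update.
    print(ppo_clip_objective(np.array([1.5]), np.array([1.0])))  # → 1.2
    ```

    The min with the clipped term is what bounds the policy update: large ratios stop improving the objective, so gradient ascent has no incentive to push the new policy far from the old one.
    
    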

  2. Proximal Policy Optimization - OpenAI

    Jul 20, 2017 · We’re releasing a new class of reinforcement learning algorithms, Proximal Policy Optimization (PPO), which perform comparably to or better than state-of-the-art approaches while …

  3. Part 3: Intro to Policy Optimization — Spinning Up documentation

    In this section, we’ll discuss the mathematical foundations of policy optimization algorithms, and connect the material to sample code. We will cover three key results in the theory of policy gradients:

  4. Algorithms — Spinning Up documentation - OpenAI

    We chose the core deep RL algorithms in this package to reflect useful progressions of ideas from the recent history of the field, culminating in two algorithms in particular—PPO and SAC—which are …

  5. Soft Actor-Critic — Spinning Up documentation - OpenAI

    Soft Actor Critic (SAC) is an algorithm that optimizes a stochastic policy in an off-policy way, forming a bridge between stochastic policy optimization and DDPG-style approaches.

  6. Part 2: Kinds of RL Algorithms — Spinning Up documentation

    Use a model-free RL algorithm to train a policy or Q-function, but either 1) augment real experiences with fictitious ones in updating the agent, or 2) use only fictitious experience for updating the agent.

  7. Trust Region Policy Optimization — Spinning Up documentation

    Quick Facts ¶ TRPO is an on-policy algorithm. TRPO can be used for environments with either discrete or continuous action spaces. The Spinning Up implementation of TRPO supports parallelization with …

  8. Vanilla Policy Gradient — Spinning Up documentation - OpenAI

    Quick Facts ¶ VPG is an on-policy algorithm. VPG can be used for environments with either discrete or continuous action spaces. The Spinning Up implementation of VPG supports parallelization with MPI.

  9. Spinning Up in Deep RL - OpenAI

    Nov 8, 2018 · A well-documented code repo ⁠ (opens in a new window) of short, standalone implementations of: Vanilla Policy Gradient (VPG), Trust Region Policy Optimization (TRPO), …

  10. OpenAI Baselines: ACKTR & A2C

    Aug 18, 2017 · In the following plot we show performance of ACKTR on 49 Atari games compared to other algorithms: A2C, PPO, ACER. The hyperparameters of ACKTR were tuned by the author of …