Fine-tuning FLAN-T5 with PPO and PEFT to generate less toxic text summaries. This notebook uses Meta AI's hate speech model as the reward model and applies RLHF techniques to improve safety.
This repository hosts Jupyter notebooks that train agents to play Atari games using a variety of deep reinforcement learning (RL) algorithms, including Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Deep Q-Networks (DQN), and Advantage Actor-Critic (A2C).