This repository contains implementations of deep reinforcement learning algorithms, trained and evaluated on multiple environments:
- A3C (Asynchronous Advantage Actor-Critic) — Kuka Pick & Place manipulation task
- A2C & REINFORCE — LunarLander-v2 continuous control task
```
MLR/RL/
├── code/
│   ├── A2C/                      # Actor-Critic implementations for LunarLander
│   │   ├── actor.py              # Actor network
│   │   ├── critic.py             # Critic network
│   │   ├── train.py              # Training script
│   │   ├── eval.py               # Evaluation script
│   │   ├── compute_objectives.py # Loss computation
│   │   ├── utils.py              # Utility functions
│   │   ├── config.json           # Configuration
│   │   ├── checkpoints/          # Trained models
│   │   ├── plots/                # Training curves
│   │   └── videos/               # Evaluation videos
│   └── A3C/                      # A3C implementation for Kuka
│       ├── main.py               # Entry point
│       ├── eval.py               # Evaluation script
│       ├── plot_training.py      # Visualization
│       ├── config/               # Environment and model configs
│       ├── lib/                  # A3C algorithm implementation
│       ├── helpers/              # Helper utilities
│       ├── models/               # Trained checkpoints
│       ├── logs/                 # Training logs
│       ├── plots/                # Training curves
│       └── requirements.txt      # Dependencies
└── Project3.pdf                  # Assignment specification
```
Asynchronous Advantage Actor-Critic with 4 parallel workers, trained on a robotic manipulation task.
Environment: KukaDiverseObjectEnv (PyBullet)
- Observation: RGB image (40×40, downsampled from 128×128)
- Action space: 3D continuous (end-effector control)
- Task: Pick and place diverse objects
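The 128×128 camera frames are reduced to 40×40 before reaching the network. A minimal sketch of that preprocessing step, assuming NumPy and a nearest-neighbour scheme (the `preprocess` helper is illustrative, not the repo's actual code):

```python
import numpy as np

def preprocess(frame, size=40):
    """Nearest-neighbour downsample of an HxWx3 RGB frame to size x size,
    scaled to [0, 1]. Hypothetical helper mirroring the 128x128 -> 40x40 step."""
    h, w, _ = frame.shape
    rows = np.arange(size) * h // size   # source row index for each output row
    cols = np.arange(size) * w // size   # source column index for each output column
    small = frame[rows][:, cols]
    return small.astype(np.float32) / 255.0

frame = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
obs = preprocess(frame)
print(obs.shape)  # (40, 40, 3)
```

The actual resizing and normalisation used for training live in the A3C `helpers/` and `lib/` code.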
Training Parameters:
- Episodes: 10,000
- Workers: 4 (asynchronous)
- Training time: ~4.5 hours (CPU)
- Device: CPU (CUDA unavailable)
Results:
- Final average reward: ~0.35–0.38
- Evaluation (100 episodes): 31% success rate
- Average reward: 0.310
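As a sketch of how such evaluation numbers are aggregated (the reward threshold used as the success criterion here is an assumption; the actual criterion lives in `eval.py`):

```python
def evaluate(episode_rewards, success_threshold=0.5):
    """Summarise an evaluation run: mean episode reward, plus the fraction of
    episodes whose reward clears an (assumed) success threshold."""
    n = len(episode_rewards)
    avg = sum(episode_rewards) / n
    rate = sum(r >= success_threshold for r in episode_rewards) / n
    return avg, rate

avg, rate = evaluate([1.0, 0.0, 0.0, 1.0])
print(avg, rate)  # 0.5 0.5
```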
REINFORCE, a Monte-Carlo policy gradient method, applied to continuous control.
Environment: LunarLander-v2
- Observation: 8D state vector (position, velocity, angles, contact)
- Action space: 2D continuous (thrust, rotation)
- Task: Land the lunar module safely
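With a 2D continuous action space, the policy is typically a diagonal Gaussian whose samples are squashed into the actuator range. A hedged sketch of action sampling (the function name and tanh squashing are assumptions about the implementation):

```python
import math
import random

def sample_action(mean, log_std):
    """Sample a continuous action from a diagonal Gaussian policy and squash
    each dimension into [-1, 1] with tanh. Illustrative helper only."""
    action = []
    for m, ls in zip(mean, log_std):
        a = random.gauss(m, math.exp(ls))  # one independent Gaussian per dimension
        action.append(math.tanh(a))        # keep thrust / rotation bounded
    return action

random.seed(0)
act = sample_action(mean=[0.0, 0.0], log_std=[-1.0, -1.0])
print(len(act), all(-1.0 <= a <= 1.0 for a in act))  # 2 True
```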
Training metrics:
- Episodes trained: 5,000
- Learning rate: Adaptive scheduling
- Convergence: ~500-1000 episodes
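REINFORCE weights each log-probability by the full Monte-Carlo return from that timestep onward. The return computation can be sketched as:

```python
def discounted_returns(rewards, gamma=0.99):
    """Monte-Carlo returns used by REINFORCE: G_t = r_t + gamma * G_{t+1},
    computed in a single backward pass over an episode's rewards."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]

print(discounted_returns([1.0, 0.0, 2.0], gamma=0.5))  # [1.5, 1.0, 2.0]
```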
Actor-critic method combining policy learning with value-function learning.
Environment: LunarLander-v2 (same as REINFORCE)
Architecture:
- Shared hidden layers: [128, 64]
- Actor head: outputs action mean and log_std
- Critic head: outputs state value estimate
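The critic's value estimate lets A2C replace full Monte-Carlo returns with a lower-variance one-step advantage. A scalar sketch of the resulting objectives (the function name and exact loss weighting are assumptions; the repo's version is in `compute_objectives.py`):

```python
def a2c_losses(log_prob, value, reward, next_value, gamma=0.99):
    """One-step advantage actor-critic objectives (scalar sketch):
    advantage A = r + gamma * V(s') - V(s);
    actor loss  = -log pi(a|s) * A;
    critic loss = A^2 (squared TD error)."""
    advantage = reward + gamma * next_value - value
    actor_loss = -log_prob * advantage
    critic_loss = advantage ** 2
    return actor_loss, critic_loss

al, cl = a2c_losses(log_prob=-1.2, value=0.5, reward=1.0, next_value=0.0, gamma=0.9)
print(round(al, 2), round(cl, 2))  # 0.6 0.25
```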
Training metrics:
- Episodes trained: 5,000
- Workers/parallel processes: 1
- Convergence: ~1000-2000 episodes
```bash
cd code/A3C
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Run training:

```bash
python main.py
```

Run evaluation (with GUI):

```bash
python eval.py --checkpoint models/a3c_kuka_model_final.pth --episodes 10 --render
```

Plot training curves:

```bash
python plot_training.py
```

```bash
cd code/A2C
pip install -r requirements.txt
```

Run training:

```bash
python train.py
```

Run evaluation:

```bash
python eval.py --checkpoint checkpoints/lunar_lander_actor.pt --episodes 5
```

| Algorithm | Environment | Episodes | Success Rate | Avg Reward |
|---|---|---|---|---|
| A3C | Kuka Pick & Place | 10,000 | 31% | 0.310 |
| REINFORCE | LunarLander-v2 | 5,000 | ~80%* | -50 to 0 |
| A2C | LunarLander-v2 | 5,000 | ~85%* | -20 to 0 |
*Approximate success rates estimated from episode-reward thresholds; higher average reward indicates better performance.



