Deep Reinforcement Learning Implementations

This repository contains implementations of various deep reinforcement learning algorithms with experiments on different environments.

Project Overview

Implementations of reinforcement learning algorithms trained on multiple environments:

  • A3C (Asynchronous Advantage Actor-Critic) — Kuka Pick & Place manipulation task
  • A2C & REINFORCE — LunarLander-v2 continuous control task

Repository Structure

MLR/RL/
├── code/
│   ├── A2C/                    # Actor-Critic implementations for LunarLander
│   │   ├── actor.py           # Actor network
│   │   ├── critic.py          # Critic network
│   │   ├── train.py           # Training script
│   │   ├── eval.py            # Evaluation script
│   │   ├── compute_objectives.py  # Loss computation
│   │   ├── utils.py           # Utility functions
│   │   ├── config.json        # Configuration
│   │   ├── checkpoints/       # Trained models
│   │   ├── plots/             # Training curves
│   │   └── videos/            # Evaluation videos
│   └── A3C/                    # A3C implementation for Kuka
│       ├── main.py            # Entry point
│       ├── eval.py            # Evaluation script
│       ├── plot_training.py   # Visualization
│       ├── config/            # Environment and model configs
│       ├── lib/               # A3C algorithm implementation
│       ├── helpers/           # Helper utilities
│       ├── models/            # Trained checkpoints
│       ├── logs/              # Training logs
│       ├── plots/             # Training curves
│       └── requirements.txt   # Dependencies
└── Project3.pdf               # Assignment specification

Implementations

1. A3C on Kuka Pick & Place

Asynchronous Advantage Actor-Critic with 4 parallel workers, trained on a robotic manipulation task.
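The advantage estimates that drive both the actor and critic updates can be sketched as a backward pass over each worker's rollout. This is a generic n-step return computation, not necessarily the exact code in lib/; the names (`bootstrap_value`, `gamma`) are illustrative:

```python
def returns_and_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Compute discounted n-step returns and advantages for one rollout.

    rewards         -- list of rewards collected by a worker
    values          -- critic's value estimate for each visited state
    bootstrap_value -- critic's estimate for the state after the rollout
                       (0 if the episode terminated)
    """
    returns = []
    R = bootstrap_value
    for r in reversed(rewards):
        R = r + gamma * R          # discounted return, working backwards
        returns.append(R)
    returns.reverse()
    # Advantage = empirical return minus the critic's baseline
    advantages = [ret - v for ret, v in zip(returns, values)]
    return returns, advantages
```

Each worker computes these locally, then pushes gradients to the shared model asynchronously.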

Environment: KukaDiverseObjectEnv (PyBullet)

  • Observation: RGB image (40×40, from 128×128)
  • Action space: 3D continuous (end-effector control)
  • Task: Pick and place diverse objects
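The 128×128 camera render is reduced to a 40×40 observation before entering the network. One way to sketch that step is a nearest-neighbour downsample; the repository's actual resize may use a different method (e.g. interpolation), so treat this as an assumption:

```python
import numpy as np

def preprocess(rgb, out_size=40):
    """Nearest-neighbour downsample of an HxWx3 uint8 image to
    out_size x out_size, rescaled to [0, 1] floats."""
    h, w, _ = rgb.shape
    rows = (np.arange(out_size) * h) // out_size   # source row per output row
    cols = (np.arange(out_size) * w) // out_size   # source column per output column
    small = rgb[rows][:, cols]                     # fancy-index both spatial axes
    return small.astype(np.float32) / 255.0

frame = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
obs = preprocess(frame)      # shape (40, 40, 3), values in [0, 1]
```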

Training Parameters:

  • Episodes: 10,000
  • Workers: 4 (asynchronous)
  • Training time: ~4.5 hours (CPU)
  • Device: CPU (CUDA unavailable)

Results:

  • Final average reward: ~0.35–0.38
  • Evaluation (100 episodes): 31% success rate
  • Average reward: 0.310

[Figure: A3C training curve]

2. REINFORCE on LunarLander-v2

Policy gradient method applied to continuous control.
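REINFORCE weights each step's action log-probability by the discounted return from that step onward. A minimal sketch of that objective (names are illustrative, not taken from train.py; in the actual code these would be tensors with gradients):

```python
def reinforce_loss(log_probs, rewards, gamma=0.99):
    """REINFORCE objective: maximise E[sum_t log pi(a_t|s_t) * G_t].

    log_probs -- log pi(a_t|s_t) for each step of one episode
    rewards   -- per-step rewards for the same episode
    """
    # Returns-to-go, computed backwards through the episode
    G, returns = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    # Centring the returns (zero mean) is a common variance-reduction trick
    mean = sum(returns) / len(returns)
    centred = [g - mean for g in returns]
    # Negative sign because optimisers minimise
    return -sum(lp * g for lp, g in zip(log_probs, centred))
```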

Environment: LunarLander-v2

  • Observation: 8D state vector (position, velocity, angles, contact)
  • Action space: 2D continuous (thrust, rotation)
  • Task: Land the lunar module safely

Training metrics:

  • Episodes trained: 5,000
  • Learning rate: Adaptive scheduling
  • Convergence: ~500-1000 episodes

[Figure: REINFORCE learning curve]

3. A2C on LunarLander-v2

Actor-Critic method combining policy and value function learning.
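The combined objective presumably resembles the standard A2C losses (compute_objectives.py is not shown here, so the coefficients and the entropy bonus are assumptions):

```python
def a2c_losses(log_prob, value, target_return, entropy,
               value_coef=0.5, entropy_coef=0.01):
    """Standard A2C objectives for a single transition (scalars for clarity).

    Actor:  -log pi(a|s) * advantage  (advantage detached from critic
            gradients in practice)
    Critic: squared error between the value estimate and the
            bootstrapped return
    """
    advantage = target_return - value
    actor_loss = -log_prob * advantage
    critic_loss = advantage ** 2
    # Entropy bonus encourages exploration; subtracted because we minimise
    total = actor_loss + value_coef * critic_loss - entropy_coef * entropy
    return total, actor_loss, critic_loss
```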

Environment: LunarLander-v2 (same as REINFORCE)

Architecture:

  • Shared hidden layers: [128, 64]
  • Actor head: outputs action mean and log_std
  • Critic head: outputs state value estimate
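The architecture above can be sketched as a plain forward pass. This numpy mock-up uses random weights in place of trained parameters and assumes tanh activations (the real actor.py/critic.py may differ), but it shows how the [128, 64] trunk feeds the three output heads:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(n_in, n_out):
    """Random weights standing in for trained parameters."""
    return rng.normal(0, 0.1, (n_in, n_out)), np.zeros(n_out)

# Shared trunk [128, 64] on the 8-D LunarLander state
W1, b1 = layer(8, 128)
W2, b2 = layer(128, 64)
# Heads: actor outputs mean and log_std for the 2-D action; critic outputs V(s)
Wmu, bmu = layer(64, 2)
Wls, bls = layer(64, 2)
Wv, bv = layer(64, 1)

def forward(state):
    h = np.tanh(state @ W1 + b1)     # shared layer 1
    h = np.tanh(h @ W2 + b2)         # shared layer 2
    mean = h @ Wmu + bmu             # action mean
    log_std = h @ Wls + bls          # action log-std
    value = (h @ Wv + bv)[0]         # scalar state-value estimate
    return mean, log_std, value

mean, log_std, value = forward(np.zeros(8))
```

Actions are then sampled from a Gaussian parameterised by `mean` and `exp(log_std)`.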

Training metrics:

  • Episodes trained: 5,000
  • Workers/parallel processes: 1
  • Convergence: ~1000-2000 episodes

[Figure: A2C learning curve]

Setup & Installation

A3C (Kuka)

cd code/A3C
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Run training:

python main.py

Run evaluation (with GUI):

python eval.py --checkpoint models/a3c_kuka_model_final.pth --episodes 10 --render

Plot training curves:

python plot_training.py

A2C/REINFORCE (LunarLander)

cd code/A2C
pip install -r requirements.txt

Run training:

python train.py

Run evaluation:

python eval.py --checkpoint checkpoints/lunar_lander_actor.pt --episodes 5

A3C Experiment

[Figure: A3C experiment diagram]

Results Summary

| Algorithm | Environment       | Episodes | Success Rate | Avg Reward |
|-----------|-------------------|----------|--------------|------------|
| A3C       | Kuka Pick & Place | 10,000   | 31%          | 0.310      |
| REINFORCE | LunarLander-v2    | 5,000    | ~80%*        | -50 to 0   |
| A2C       | LunarLander-v2    | 5,000    | ~85%*        | -20 to 0   |

*Approximate success rates, estimated from a reward threshold; higher scores indicate better performance.
