Welcome to the official repository for ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter.
So pleased to see ThinkGrasp being used as a baseline in the following papers:
Looking forward to more research in this area! 🚀
- To-Do List
- Setup
- Assets
- Running the Simulation
- Running the Realworld Code
- Potential Issues of Installation
- Citation
- Simulation Code Cleanup (without VLP)
- Real-World Code Cleanup (without VLP)
- Write a Complete README
- Add Additional Documentation
- Operating System: Ubuntu 23.04
- Dependencies:
- PyTorch: 1.13.1
- Torchvision: 0.14.1
- CUDA: 11.8
- Pybullet (simulation environment)
- Hardware: NVIDIA RTX 3090 × 2 (for the complete version)
- Minimum Requirements:
  - Simulation: NVIDIA RTX 3090 (single GPU) with ~13GB GPU memory.
  - Real-World Execution: NVIDIA RTX 3090 with ~9.38GB GPU memory (LangSAM).
- Recommended Setup:
  - Two NVIDIA RTX 3090 GPUs for best performance when running VLPart.
- Create and Activate the Conda Environment:

  ```bash
  conda create -n thinkgrasp python=3.8
  conda activate thinkgrasp
  ```
- Install PyTorch and Torchvision:

  ```bash
  pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
  ```
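  A quick sanity check that the CUDA-enabled wheels installed correctly (standard PyTorch/Torchvision calls):

  ```bash
  python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__, torch.cuda.is_available())"
  ```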
- Allow Deprecated Scikit-learn:

  ```bash
  export SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True
  ```

- Install Additional Requirements:

  ```bash
  pip install -r requirements.txt
  pip install -r langsam.txt
  ```
- Development Mode Installation:

  ```bash
  python setup.py develop
  ```
- Install PointNet2:

  ```bash
  cd models/graspnet/pointnet2
  python setup.py install
  cd ../knn
  python setup.py install
  cd ../../..
  ```
- Install CUDA 11.8:

  Download the CUDA installer and run:

  ```bash
  sudo bash cuda_11.8.0_520.61.05_linux.run
  ```

  Add the following lines to your ~/.bashrc file:

  ```bash
  export CUDA_HOME=/usr/local/cuda-11.8
  export PATH=$CUDA_HOME/bin:$PATH
  export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
  ```

  Refresh the shell:

  ```bash
  source ~/.bashrc
  ```
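  To confirm the toolkit is on your PATH after reloading the shell (assuming the default install location above):

  ```bash
  nvcc --version   # should report "release 11.8"
  ```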
Download the processed object models from:
Place the downloaded files in the assets folder. Ensure the structure is as follows:
```
ThinkGrasp
└── assets
    ├── simplified_objects
    ├── unseen_objects_40
    └── unseen_objects
```
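A quick way to confirm the layout matches the tree above:

```bash
ls assets/simplified_objects assets/unseen_objects_40 assets/unseen_objects
```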
- Log in to WandB:

  ```bash
  wandb login
  ```
- Set Your OpenAI API Key:

  ```bash
  export OPENAI_API_KEY="sk-xxxxx"
  ```
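  Optionally, persist the key across shell sessions (assumes bash; substitute your actual key):

  ```bash
  echo 'export OPENAI_API_KEY="sk-xxxxx"' >> ~/.bashrc
  source ~/.bashrc
  ```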
- Start the Simulation:

  ```bash
  pip install protobuf==3.20.1
  python simulation_main.py
  ```
- Change Testing Data:

  Update the dataset directory in `simulation_main.py` by modifying line 238:

  ```python
  parser.add_argument('--testing_case_dir', action='store', type=str, default='heavy_unseen/')
  ```
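  Since the directory is exposed as an argparse flag, you can also override it at launch without editing the file:

  ```bash
  python simulation_main.py --testing_case_dir heavy_unseen/
  ```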
Install Flask and launch the real-world service:

```bash
pip install flask
python realarm.py
```
- Flask Configuration: The Flask application is configured to run on:

  ```python
  app.run(host='0.0.0.0', port=5000)
  ```

  This allows the app to be accessed from any network interface on port 5000.
- API Endpoint: The Flask application provides the following endpoint:

  ```
  POST http://localhost:5000/grasp_pose
  ```

  Payload Format:

  ```json
  {
    "image_path": "/path/to/rgb/image.png",
    "depth_path": "/path/to/depth/image.png",
    "text_path": "/path/to/goal_text.txt"
  }
  ```

  - `image_path`: The path to the RGB image captured by the real-world camera connected to your robotic setup.
  - `depth_path`: The path to the depth image from the same real-world camera.
  - `text_path`: A text file containing the goal or task description.
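For orientation, here is a minimal sketch of what the server side of this endpoint looks like; the actual handler in realarm.py runs the grasp pipeline, whereas this skeleton only parses and echoes the payload:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/grasp_pose", methods=["POST"])
def grasp_pose():
    data = request.get_json()
    # realarm.py would load data["image_path"], data["depth_path"],
    # and data["text_path"] here and run ThinkGrasp to produce a grasp pose.
    return jsonify({"received": data})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```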
You can test the API using various tools:

- Open Postman and create a new POST request.
- Set the URL to `http://localhost:5000/grasp_pose`.
- In the "Body" tab, select "raw" and set the type to JSON.
- Provide the JSON payload, ensuring the paths point to the images captured by your real-world camera:

  ```json
  {
    "image_path": "/home/freax/camera_outputs/rgb_image.png",
    "depth_path": "/home/freax/camera_outputs/depth_image.png",
    "text_path": "/home/freax/goal_texts/task_goal.txt"
  }
  ```

- Click "Send" to test the endpoint.
Alternatively, use curl in the terminal:

```bash
curl -X POST http://localhost:5000/grasp_pose \
  -H "Content-Type: application/json" \
  -d '{
    "image_path": "/home/freax/camera_outputs/rgb_image.png",
    "depth_path": "/home/freax/camera_outputs/depth_image.png",
    "text_path": "/home/freax/goal_texts/task_goal.txt"
  }'
```

Or use Python's requests library:

```python
import requests

url = "http://localhost:5000/grasp_pose"
payload = {
    "image_path": "/home/freax/camera_outputs/rgb_image.png",
    "depth_path": "/home/freax/camera_outputs/depth_image.png",
    "text_path": "/home/freax/goal_texts/task_goal.txt",
}
response = requests.post(url, json=payload)
print(response.json())
```

Notes:

- Ensure that the real-world camera is correctly configured and outputs the RGB and depth images to the specified paths (`/home/freax/camera_outputs/` in the example).
- If testing on a remote server, replace `localhost` with the server's IP address in your requests.
- Verify that all files are accessible and correctly formatted for processing by the application.
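The goal text file is just a plain-text task description; one way to create it before sending a request (the directory and goal string are examples, not fixed by the code):

```bash
mkdir -p /home/freax/goal_texts
echo "pick up the red mug by its handle" > /home/freax/goal_texts/task_goal.txt
```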
- Cause: Deprecated usage of `numpy.float`.
- Solution: Update the problematic lines in the file (e.g., `transforms3d/quaternions.py`):

  ```python
  _MAX_FLOAT = np.maximum_sctype(np.float64)
  _FLOAT_EPS = np.finfo(np.float64).eps
  ```
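  For context, `np.float` was a deprecated alias for the builtin `float` and was removed in NumPy 1.24, so any remaining usage needs an explicit replacement (a generic illustration, not code from this repo):

  ```python
  import numpy as np

  # x = np.float(1.0)   # AttributeError on NumPy >= 1.24
  x = float(1.0)        # builtin float works everywhere
  y = np.float64(1.0)   # or an explicit NumPy dtype
  ```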
Error:

```
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [18 lines of output]
    The 'sklearn' PyPI package is deprecated, use 'scikit-learn'
    rather than 'sklearn' for pip commands.
```

Solution:

Allow deprecated scikit-learn compatibility by exporting the following environment variable:

```bash
export SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True
```
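Alternatively, if the deprecated `sklearn` pin appears in a requirements file you control, renaming it avoids the shim entirely (assumes GNU sed and that `sklearn` is listed in requirements.txt):

```bash
sed -i 's/^sklearn\b/scikit-learn/' requirements.txt
```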
Error:

```
RuntimeError: CUDA error: no kernel image is available for execution on the device.
```

Solution:

Ensure the installed PyTorch version matches your CUDA version. For CUDA 11.8, use:

```bash
pip3 install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
```

If you still encounter errors, install the following dependencies:
- Install Python development tools:

  ```bash
  sudo apt-get install python3-dev
  ```
- Install GCC and G++ compilers via Conda:

  ```bash
  conda install gxx_linux-64
  conda install gcc_linux-64
  ```
- Install Ray and GroundingDINO:

  ```bash
  pip install ray
  pip install https://github.com/IDEA-Research/GroundingDINO/archive/refs/tags/v0.1.0-alpha2.tar.gz
  ```
- Clone and install GroundingDINO:

  ```bash
  cd langsam
  git clone https://github.com/IDEA-Research/GroundingDINO.git
  cd GroundingDINO
  pip install -e .
  ```
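  To confirm the editable install is importable (assuming the package exposes the `groundingdino` module, as in the upstream repo):

  ```bash
  python -c "import groundingdino; print(groundingdino.__file__)"
  ```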
Install CUDA 11.8 using the downloaded installer:

```bash
sudo bash cuda_11.8.0_520.61.05_linux.run
```

Add the following lines to your ~/.bashrc file:

```bash
export CUDA_HOME=/usr/local/cuda-11.8
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
```

Refresh the shell:

```bash
source ~/.bashrc
```

If you plan to use Vision-Language Processing (VLP):
- Install additional requirements:

  ```bash
  pip install -r vlp_requirements.txt
  ```

- Download the required .pth files:

  ```bash
  cd VLP
  wget https://github.com/Cheems-Seminar/grounded-segment-any-parts/releases/download/v1.0/swinbase_part_0a0000.pth
  wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
  ```

- Place the downloaded files in the appropriate directory (`som/downloaddata`).
If you want to compare with VLG, download the repository from the VLG GitHub page and replace the test data and assets.
If you find this work useful, please consider citing:
```bibtex
@misc{qian2024thinkgrasp,
  title={ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter},
  author={Yaoyao Qian and Xupeng Zhu and Ondrej Biza and Shuo Jiang and Linfeng Zhao and Haojie Huang and Yu Qi and Robert Platt},
  year={2024},
  eprint={2407.11298},
  archivePrefix={arXiv},
  primaryClass={cs.RO}
}
```