Welcome to the official repository for ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter.
So pleased to see ThinkGrasp being used as a baseline in the following papers:
Looking forward to more research in this area! 🚀
- To-Do List
- Setup
- Assets
- Running the Simulation
- Running the Realworld Code
- Potential Issues of Installation
- Citation
- Simulation Code Cleanup (without VLP)
- Real-World Code Cleanup (without VLP)
- Write a Complete README
- Add Additional Documentation
- Operating System: Ubuntu 23.04
- Dependencies:
- PyTorch: 1.13.1
- Torchvision: 0.14.1
- CUDA: 11.8
- Pybullet (simulation environment)
- Hardware: NVIDIA RTX 3090 × 2 (for the complete version)
- Minimum Requirements:
  - Simulation: NVIDIA RTX 3090 (single GPU) with ~13GB GPU memory.
  - Real-World Execution: NVIDIA RTX 3090 with ~9.38GB GPU memory (LangSAM).
- Recommended Setup:
  - Two NVIDIA RTX 3090 GPUs for best performance when running VLPart.
- Create and Activate the Conda Environment:

  ```bash
  conda create -n thinkgrasp python=3.8
  conda activate thinkgrasp
  ```
- Install PyTorch and Torchvision:

  ```bash
  pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
  ```
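  A quick sanity check that the CUDA-enabled wheels installed correctly (standard PyTorch/Torchvision calls):

  ```bash
  python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__, torch.cuda.is_available())"
  ```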
- Allow Deprecated Scikit-learn:

  ```bash
  export SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True
  ```

- Install Additional Requirements:

  ```bash
  pip install -r requirements.txt
  pip install -r langsam.txt
  ```
- Development Mode Installation:

  ```bash
  python setup.py develop
  ```
- Install PointNet2:

  ```bash
  cd models/graspnet/pointnet2
  python setup.py install
  cd ../knn
  python setup.py install
  cd ../../..
  ```
- Install CUDA 11.8:

  Download the CUDA installer and run:

  ```bash
  sudo bash cuda_11.8.0_520.61.05_linux.run
  ```

  Add the following lines to your ~/.bashrc file:

  ```bash
  export CUDA_HOME=/usr/local/cuda-11.8
  export PATH=$CUDA_HOME/bin:$PATH
  export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
  ```

  Refresh the shell:

  ```bash
  source ~/.bashrc
  ```
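  To confirm the toolkit is on your PATH after reloading the shell (assuming the default install location above):

  ```bash
  nvcc --version   # should report "release 11.8"
  ```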
Download the processed object models from:
Place the downloaded files in the assets folder. Ensure the structure is as follows:
```
ThinkGrasp
└── assets
    ├── simplified_objects
    ├── unseen_objects_40
    └── unseen_objects
```
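A quick way to confirm the layout matches the tree above:

```bash
ls assets/simplified_objects assets/unseen_objects_40 assets/unseen_objects
```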
- Log in to WandB:

  ```bash
  wandb login
  ```
- Set Your OpenAI API Key:

  ```bash
  export OPENAI_API_KEY="sk-xxxxx"
  ```
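  Optionally, persist the key across shell sessions (assumes bash; substitute your actual key):

  ```bash
  echo 'export OPENAI_API_KEY="sk-xxxxx"' >> ~/.bashrc
  source ~/.bashrc
  ```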
- Start the Simulation:

  ```bash
  pip install protobuf==3.20.1
  python simulation_main.py
  ```
- Change Testing Data:

  Update the dataset directory in `simulation_main.py` by modifying line 238:

  ```python
  parser.add_argument('--testing_case_dir', action='store', type=str, default='heavy_unseen/')
  ```
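  Since the directory is exposed as an argparse flag, you can also override it at launch without editing the file:

  ```bash
  python simulation_main.py --testing_case_dir heavy_unseen/
  ```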
Install Flask and launch the real-world service:

```bash
pip install flask
python realarm.py
```
- Flask Configuration: The Flask application is configured to run on:

  ```python
  app.run(host='0.0.0.0', port=5000)
  ```

  This allows the app to be accessed from any network interface on port 5000.
- API Endpoint: The Flask application provides the following endpoint:

  ```
  POST http://localhost:5000/grasp_pose
  ```

  Payload Format:

  ```json
  {
    "image_path": "/path/to/rgb/image.png",
    "depth_path": "/path/to/depth/image.png",
    "text_path": "/path/to/goal_text.txt"
  }
  ```

  - `image_path`: The path to the RGB image captured by the real-world camera connected to your robotic setup.
  - `depth_path`: The path to the depth image from the same real-world camera.
  - `text_path`: A text file containing the goal or task description.
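For orientation, here is a minimal sketch of what the server side of this endpoint looks like; the actual handler in realarm.py runs the grasp pipeline, whereas this skeleton only parses and echoes the payload:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/grasp_pose", methods=["POST"])
def grasp_pose():
    data = request.get_json()
    # realarm.py would load data["image_path"], data["depth_path"],
    # and data["text_path"] here and run ThinkGrasp to produce a grasp pose.
    return jsonify({"received": data})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```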
You can test the API using various tools:

- Open Postman and create a new POST request.
- Set the URL to `http://localhost:5000/grasp_pose`.
- In the "Body" tab, select "raw" and set the type to JSON.
- Provide the JSON payload, ensuring the paths point to the images captured by your real-world camera:

  ```json
  {
    "image_path": "/home/freax/camera_outputs/rgb_image.png",
    "depth_path": "/home/freax/camera_outputs/depth_image.png",
    "text_path": "/home/freax/goal_texts/task_goal.txt"
  }
  ```

- Click "Send" to test the endpoint.
Alternatively, use curl in the terminal:

```bash
curl -X POST http://localhost:5000/grasp_pose \
  -H "Content-Type: application/json" \
  -d '{
    "image_path": "/home/freax/camera_outputs/rgb_image.png",
    "depth_path": "/home/freax/camera_outputs/depth_image.png",
    "text_path": "/home/freax/goal_texts/task_goal.txt"
  }'
```

Or use Python's requests library:

```python
import requests

url = "http://localhost:5000/grasp_pose"
payload = {
    "image_path": "/home/freax/camera_outputs/rgb_image.png",
    "depth_path": "/home/freax/camera_outputs/depth_image.png",
    "text_path": "/home/freax/goal_texts/task_goal.txt",
}
response = requests.post(url, json=payload)
print(response.json())
```

Notes:

- Ensure that the real-world camera is correctly configured and outputs the RGB and depth images to the specified paths (`/home/freax/camera_outputs/` in the example).
- If testing on a remote server, replace `localhost` with the server's IP address in your requests.
- Verify that all files are accessible and correctly formatted for processing by the application.
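The goal text file is just a plain-text task description; one way to create it before sending a request (the directory and goal string are examples, not fixed by the code):

```bash
mkdir -p /home/freax/goal_texts
echo "pick up the red mug by its handle" > /home/freax/goal_texts/task_goal.txt
```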
- Cause: Deprecated usage of `numpy.float`.
- Solution: Update the problematic lines in the file (e.g., `transforms3d/quaternions.py`):

  ```python
  _MAX_FLOAT = np.maximum_sctype(np.float64)
  _FLOAT_EPS = np.finfo(np.float64).eps
  ```
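  For context, `np.float` was a deprecated alias for the builtin `float` and was removed in NumPy 1.24, so any remaining usage needs an explicit replacement (a generic illustration, not code from this repo):

  ```python
  import numpy as np

  # x = np.float(1.0)   # AttributeError on NumPy >= 1.24
  x = float(1.0)        # builtin float works everywhere
  y = np.float64(1.0)   # or an explicit NumPy dtype
  ```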
Error:

```
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [18 lines of output]
    The 'sklearn' PyPI package is deprecated, use 'scikit-learn'
    rather than 'sklearn' for pip commands.
```

Solution:

Allow deprecated scikit-learn compatibility by exporting the following environment variable:

```bash
export SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True
```
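Alternatively, if the deprecated `sklearn` pin appears in a requirements file you control, renaming it avoids the shim entirely (assumes GNU sed and that `sklearn` is listed in requirements.txt):

```bash
sed -i 's/^sklearn\b/scikit-learn/' requirements.txt
```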
Error:

```
RuntimeError: CUDA error: no kernel image is available for execution on the device.
```

Solution:

Ensure the installed PyTorch version matches your CUDA version. For CUDA 11.8, use:

```bash
pip3 install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
```

If you still encounter errors, install the following dependencies:
- Install Python development tools:

  ```bash
  sudo apt-get install python3-dev
  ```
- Install GCC and G++ compilers via Conda:

  ```bash
  conda install gxx_linux-64
  conda install gcc_linux-64
  ```
- Install Ray and GroundingDINO:

  ```bash
  pip install ray
  pip install https://github.com/IDEA-Research/GroundingDINO/archive/refs/tags/v0.1.0-alpha2.tar.gz
  ```
- Clone and install GroundingDINO:

  ```bash
  cd langsam
  git clone https://github.com/IDEA-Research/GroundingDINO.git
  cd GroundingDINO
  pip install -e .
  ```
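  To confirm the editable install is importable (assuming the package exposes the `groundingdino` module, as in the upstream repo):

  ```bash
  python -c "import groundingdino; print(groundingdino.__file__)"
  ```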
Install CUDA 11.8 using the downloaded installer:

```bash
sudo bash cuda_11.8.0_520.61.05_linux.run
```

Add the following lines to your ~/.bashrc file:

```bash
export CUDA_HOME=/usr/local/cuda-11.8
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
```

Refresh the shell:

```bash
source ~/.bashrc
```

If you plan to use Vision-Language Processing (VLP):
- Install additional requirements:

  ```bash
  pip install -r vlp_requirements.txt
  ```

- Download the required .pth files:

  ```bash
  cd VLP
  wget https://github.com/Cheems-Seminar/grounded-segment-any-parts/releases/download/v1.0/swinbase_part_0a0000.pth
  wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
  ```

- Place the downloaded files in the appropriate directory (`som/downloaddata`).
If you want to compare with VLG, download the repository from the VLG GitHub page and replace the test data and assets.
If you find this work useful, please consider citing:
```bibtex
@misc{qian2024thinkgrasp,
  title={ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter},
  author={Yaoyao Qian and Xupeng Zhu and Ondrej Biza and Shuo Jiang and Linfeng Zhao and Haojie Huang and Yu Qi and Robert Platt},
  year={2024},
  eprint={2407.11298},
  archivePrefix={arXiv},
  primaryClass={cs.RO}
}
```