Decoupled Alignment for Robust Plug-and-Play Adaptation

This project implements a decoupled alignment approach for robust plug-and-play adaptation of language models. It provides tools for analyzing and modifying model behavior while maintaining alignment with desired objectives.

Project Structure

.
├── plugin_aligner/          # Core alignment implementation
│   ├── replace.py           # Main replacement logic
│   ├── analyze.py           # Analysis tools
│   ├── evulate.py           # Evaluation utilities
│   └── utils/               # Helper utilities
├── Dataset/                 # Dataset directory
├── template_checker/        # Template verification tools
└── scripts
    ├── run_test.sh          # Testing script
    ├── run_jailbreak.sh     # Jailbreak testing
    └── run_ppl.sh           # Perplexity evaluation

Installation

Create and activate a conda environment:

conda create --name jailbreak python=3.9.18
conda activate jailbreak

Install required packages:

# PyTorch and related packages
pip3 install torch torchvision torchaudio

# Transformers and language model tools
pip install -U transformers
pip install openai pandas einops accelerate
pip install sentencepiece protobuf
pip install transformers_stream_generator tiktoken
pip install ai2-olmo autoawq auto-gptq
pip install sympy importlib-metadata

Set up CUDA environment (if using GPU):

export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=/usr/local/cuda/lib64
export PATH=/usr/local/cuda/bin:$PATH
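
Before launching a GPU run, it can help to confirm that the three variables above are consistent with each other. The following is a minimal sketch, not part of this repository: the helper name check_cuda_env is hypothetical, and it only inspects environment variables and directories, it does not verify that a GPU or driver is actually usable.

```python
import os


def check_cuda_env(env=None):
    """Return a list of problems found in the CUDA-related environment variables.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    problems = []

    cuda_home = env.get("CUDA_HOME")
    if not cuda_home:
        problems.append("CUDA_HOME is not set")
        return problems
    if not os.path.isdir(cuda_home):
        problems.append(f"CUDA_HOME points to a missing directory: {cuda_home}")
        return problems

    # The library and binary directories should appear on the search paths.
    lib_dir = os.path.join(cuda_home, "lib64")
    if lib_dir not in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        problems.append(f"{lib_dir} is not on LD_LIBRARY_PATH")

    bin_dir = os.path.join(cuda_home, "bin")
    if bin_dir not in env.get("PATH", "").split(os.pathsep):
        problems.append(f"{bin_dir} is not on PATH")

    return problems


# Report any problems for the current shell environment.
for problem in check_cuda_env():
    print("CUDA environment issue:", problem)
```

An empty result only means the variables agree with each other; whether PyTorch can see the GPU still has to be checked separately.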

Usage

Running Tests

To run tests with a specific model:

python plugin_aligner/replace.py --target_model meta-llama/Llama-2-7b-hf
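
To run the same command for several models, the invocation can be assembled programmatically. This is a sketch under stated assumptions: it uses only the --target_model flag shown above, the helper name build_command is hypothetical, and the model list contains just the one model named in this README (extend it with models you have access to).

```python
import shlex
import sys

# Path of the entry point as given in this README.
SCRIPT = "plugin_aligner/replace.py"


def build_command(target_model, python=sys.executable):
    """Build the argument list for one run of replace.py (hypothetical helper)."""
    return [python, SCRIPT, "--target_model", target_model]


# Only the model named in this README; add further model IDs as needed.
models = ["meta-llama/Llama-2-7b-hf"]

for model in models:
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_command(model)))
```

To execute a command rather than print it, pass the list to subprocess.run(cmd, check=True) from the repository root.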

Available Scripts

run_test.sh: Run comprehensive tests
run_jailbreak.sh: Evaluate model robustness
run_ppl.sh: Calculate perplexity metrics
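
For readers unfamiliar with the metric that run_ppl.sh reports, perplexity is conventionally the exponential of the average negative log-likelihood per token. The sketch below shows that standard definition only; it is an illustration, not the code used by run_ppl.sh, and the function name perplexity is an assumption.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities (standard definition)."""
    if not token_log_probs:
        raise ValueError("at least one token log-probability is required")
    # Average negative log-likelihood per token, then exponentiate.
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# A model that assigns probability 0.25 to every token is as uncertain as a
# uniform choice among four options.
example = perplexity([math.log(0.25)] * 4)
```

Lower values mean the model assigns higher probability to the evaluated text.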

Configuration

Set the HF_TOKEN environment variable for Hugging Face model access
Adjust GPU settings in the scripts as needed
Modify dataset paths in the configuration files
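
A missing token typically surfaces only when a gated model download fails, so a quick check up front can save time. The following is a minimal sketch, assuming the token is read from the HF_TOKEN variable named above; the helper name require_hf_token is hypothetical and not part of this repository.

```python
import os


def require_hf_token(env=None):
    """Return the Hugging Face token from HF_TOKEN, or raise if it is missing.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export it before running the scripts"
        )
    return token


# Typical use at the top of a script (avoid printing the token itself):
#     token = require_hf_token()
```

Set the variable in the shell with export HF_TOKEN=<your token> before calling the scripts.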

License

See LICENSE file for details.

Citation

If you have any questions about our paper or code, please feel free to open an issue.

If you use DAPA in your work, please kindly cite our paper:

DAPA

@misc{luo2024decoupledalignmentrobustplugandplay,
      title={Decoupled Alignment for Robust Plug-and-Play Adaptation},
      author={Haozheng Luo and Jiahao Yu and Wenxin Zhang and Jialong Li and Jerry Yao-Chieh Hu and Xinyu Xing and Han Liu},
      year={2024},
      eprint={2406.01514},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.01514},
}

Acknowledgement

We are grateful to the following GitHub repositories for their valuable code and efforts.

GPTFuzz (https://github.com/sherdencooper/GPTFuzz)
ROME (https://github.com/kmeng01/rome)
JailbreakBench (https://github.com/JailbreakBench/jailbreakbench)
Chain-of-Actions (https://github.com/MAGICS-LAB/Chain-of-Actions)
