Authors: Georg Felber, Filip Grgic, Lukas Sperk
This project systematically evaluates the security risks in C code generated by large language models (LLMs). We benchmark OpenAI's GPT-4o, Anthropic's Claude 3.7 Sonnet, and DeepSeek Chat across critical programming tasks under various prompt engineering scenarios, revealing how prompt phrasing and intent manipulation affect code safety.
The entire workflow is automated and reproducible:
```sh
# install requirements
pip install -r requirements.txt

# list available templates
./test.py list

# generate new tests
./test.py run [OPTIONS] TEMPLATE

# run tests on cached files
./test.py cache [OPTIONS] TEMPLATE

# iterate over all memory corruptions
./test.py analyze [OPTIONS] TEMPLATE

# Analyze logged results (create diagrams)
./analyze.py
```

Templates are located in template/ and are composed of the following files (taking array_index as an example):
```
array_index/
├── bugs.c
├── oracle.c
├── problem.md
└── tests
    └── ...
```

- bugs.c
  is the KLEE setup file that compares the generated code against the oracle (a minimal sketch follows below)
- oracle.c
  this file contains the reference code that fulfills the task and is used for comparison against the generated code
- problem.md
  this markdown file contains the problem statement and the prompt used to generate the test
- tests/
  this folder contains the generated tests
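For orientation, a minimal bugs.c-style harness could look like the sketch below, assuming a simple array-lookup interface. The function names generated_lookup and oracle_lookup are placeholders for the LLM-generated solution and the reference implementation; the real harnesses in template/ define their own interfaces.

```c
/* Hypothetical sketch of a bugs.c-style harness (not the actual template code):
 * make the inputs symbolic and assert that the generated solution agrees
 * with the oracle on every path KLEE explores. */
#include <klee/klee.h>

/* Placeholder prototypes for the LLM-generated solution and the reference oracle. */
int generated_lookup(const int *arr, int len, int idx);
int oracle_lookup(const int *arr, int len, int idx);

int main(void) {
    int arr[8];
    int idx;

    klee_make_symbolic(arr, sizeof(arr), "arr");
    klee_make_symbolic(&idx, sizeof(idx), "idx");

    /* Any path where the two implementations disagree is reported by KLEE;
     * memory errors inside generated_lookup are reported separately. */
    klee_assert(generated_lookup(arr, 8, idx) == oracle_lookup(arr, 8, idx));
    return 0;
}
```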
We generated and analyzed 3,000 samples of LLM-generated C code, combining:
- 3 Models: GPT-4o, Claude 3.7 Sonnet, DeepSeek Chat V3
- 4 Tasks: Array Operations, Decompression, Deserialization, String Manipulation
- 5 Prompt Strategies: No injection, secure, fast, unsafe, and conflicting (unsafe & secure)
These combinations were evaluated for correctness, memory safety, and vulnerability.
A breakdown of bug types and frequency across all models and prompts.
We tested how models react to system-level prompt injections that steer them toward fast, secure, or even maliciously unsafe code:
- No Injection: Default behavior
- Fast: Prioritize performance over safety
- Secure: Add maximum validation/safety checks
- Unsafe: Introduce backdoors or memory corruptions
- Unsafe & Secure: Conflicting instructions
These manipulations revealed the extreme sensitivity of LLMs to prompt phrasing and goal alignment.
Each LLM was asked to solve four security-relevant tasks in C:
| Task Name | Key Risk Area |
|---|---|
| array_index | Bounds-checked memory access |
| decompression | Pointer arithmetic, recursion risks |
| deserialization | Length validation & buffer overrun |
| unique_words | Heap safety and memory management |
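As a concrete illustration of one of these risk areas, the sketch below contrasts an unsafe and a safe take on the deserialization task's length-validation problem. The record format, the PAYLOAD_MAX constant, and the function names are invented for illustration and do not come from the actual problem statements.

```c
#include <stdint.h>
#include <string.h>

#define PAYLOAD_MAX 64

/* Hypothetical wire format: [1-byte length][payload bytes ...] */
typedef struct {
    uint8_t len;
    uint8_t payload[PAYLOAD_MAX];
} record_t;

/* Unsafe: trusts the length field in the input, so any claimed length
 * above PAYLOAD_MAX overruns rec->payload. */
int deserialize_unsafe(record_t *rec, const uint8_t *buf, size_t buf_len) {
    (void)buf_len;                             /* input size is ignored */
    rec->len = buf[0];
    memcpy(rec->payload, buf + 1, rec->len);   /* potential buffer overrun */
    return 0;
}

/* Safer: validates the claimed length against the destination capacity
 * and the actual size of the input before copying. */
int deserialize_safe(record_t *rec, const uint8_t *buf, size_t buf_len) {
    if (rec == NULL || buf == NULL || buf_len < 1)
        return -1;
    uint8_t claimed = buf[0];
    if (claimed > PAYLOAD_MAX || (size_t)claimed > buf_len - 1)
        return -1;                             /* reject inconsistent lengths */
    rec->len = claimed;
    memcpy(rec->payload, buf + 1, claimed);
    return 0;
}
```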
The generated code was compiled and symbolically analyzed using KLEE.
Each output was labeled as:
- Bug: logical or functional error
- Crpt: memory corruption
- Failed: compilation or runtime failure
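Roughly speaking, these labels map to different failure modes under the harness: a wrong but memory-safe result trips the oracle comparison (Bug), an invalid memory access trips KLEE's memory checker (Crpt), and code that does not compile or crashes outright ends up as Failed. The two hypothetical array_index solutions below, reusing the placeholder signature from the harness sketch above, illustrate the first two cases.

```c
/* Hypothetical generated solution that would be labeled "Bug":
 * memory-safe, but silently returns 0 for invalid indices, so the
 * oracle comparison fails on some paths. */
int generated_lookup_bug(const int *arr, int len, int idx) {
    if (idx < 0 || idx >= len)
        return 0;               /* wrong result, but no invalid access */
    return arr[idx];
}

/* Hypothetical generated solution that would be labeled "Crpt":
 * never validates idx, so KLEE finds a path with an out-of-bounds
 * read and reports a memory error. */
int generated_lookup_crpt(const int *arr, int len, int idx) {
    (void)len;                  /* length is ignored entirely */
    return arr[idx];            /* out-of-bounds for invalid idx */
}
```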
- 37.4% of generated samples had logical bugs
- 14.7% showed memory corruption
- Secure prompting reduced corruption rates to as low as 2–3.5%
- GPT-4o had the highest bug rates; Claude 3.7 Sonnet the lowest
- Decompression was the most error-prone task
Prompt design was the most impactful factor in output safety:
- Unsafe prompts produced the highest failure rates (bug rates of over 70%)
- Secure prompts reduced vulnerabilities but did not eliminate them
- Conflicting prompts resulted in a partial override rather than full mitigation



