A neural network model that classifies actions proposed by autonomous AI agents as harmful or safe. The model is trained on a small dataset of labeled examples. The goal is to enhance the safety and reliability of AI agents by preventing them from executing actions that are harmful, unethical, or violate predefined guidelines.
- Create a virtual environment and install dependencies:

  ```bash
  python3 -m venv .venv
  source .venv/bin/activate
  pip install -r requirements.txt
  ```

- For development (optional; includes linting, formatting, and testing tools):

  ```bash
  pip install -r requirements-dev.txt
  ```

- Train the model (optional):

  ```bash
  python3 action_classifier/train_nn.py
  ```

- Use the trained model in LLM calls by running the example:

  ```bash
  python3 action_classifier/run_sample_query.py
  ```

- Detailed usage and API examples: `docs/USAGE.md`
- Runnable example scripts: `examples/example_query.py` (see `examples/README.md`)
- `action_classifier/sample_actions.json` — dataset of action prompts and labels/resources in MCP-like format.
- `action_classifier/train_nn.py` — small script that trains a neural network model and saves the trained model.
- `action_classifier/action_classifier.py` — module that loads the trained model and provides a function to classify actions.
- `action_classifier/run_sample_query.py` — script to classify new actions using the trained model (example wrapper).
- `requirements.txt` — minimal dependencies.
- `requirements-dev.txt` — development dependencies (linting, formatting, testing tools).
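The intended integration pattern is to gate an agent's proposed action on the classifier's verdict before execution. The sketch below illustrates that pattern; `classify_action` here is a keyword-based stand-in for the function exported by `action_classifier/action_classifier.py`, whose real name, signature, and model-backed logic may differ:

```python
# Hypothetical sketch: gate an agent's proposed action on a safety classifier.
# `classify_action` is a stand-in for the trained-model function in
# action_classifier/action_classifier.py; the real API may differ.

def classify_action(action: str) -> str:
    """Toy stand-in: flag actions containing obviously destructive phrases."""
    harmful_keywords = ("delete", "rm -rf", "drop table", "shutdown")
    text = action.lower()
    return "harmful" if any(k in text for k in harmful_keywords) else "safe"

def execute_if_safe(action: str) -> bool:
    """Run the action only when the classifier labels it safe."""
    if classify_action(action) == "harmful":
        print(f"Blocked: {action!r}")
        return False
    print(f"Executing: {action!r}")
    return True

execute_if_safe("Summarize the latest sales report")  # allowed
execute_if_safe("rm -rf / on the production server")  # blocked
```

In a real deployment the stand-in would be replaced by the trained model's prediction, and blocked actions could be logged or escalated to a human reviewer instead of silently dropped.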
If you find this repository useful in your research, please consider citing:
```bibtex
@misc{vadlapati2025agentactionclassifier,
  author = {Vadlapati, Praneeth},
  title = {Agent Action Classifier: Classifying AI agent actions to ensure safety and reliability},
  year = {2025},
  howpublished = {\url{https://github.com/Pro-GenAI/Agent-Action-Classifier}},
  note = {GitHub repository},
}
```

Related: Agent-Supervisor: Supervising Actions of Autonomous AI Agents for Ethical Compliance (GitHub)


