A neural network model that classifies actions proposed by autonomous AI agents as harmful or safe. The model is trained on a small dataset of labeled examples. The goal is to enhance the safety and reliability of AI agents by preventing them from executing actions that are harmful, unethical, or violate predefined guidelines.
- Create a virtual environment and install dependencies:

  ```bash
  python3 -m venv .venv
  source .venv/bin/activate
  pip install -r requirements.txt
  ```

- For development (optional; includes linting, formatting, and testing tools):

  ```bash
  pip install -r requirements-dev.txt
  ```

- Train the model (optional):

  ```bash
  python3 action_classifier/train_nn.py
  ```

- Use the trained model in LLM calls by running the example:

  ```bash
  python3 action_classifier/run_sample_query.py
  ```

- Detailed usage and API examples: `docs/USAGE.md`
- Runnable example scripts: `examples/example_query.py` (see `examples/README.md`)
- `action_classifier/sample_actions.json` — dataset of action prompts and labels/resources in MCP-like format.
- `action_classifier/train_nn.py` — small script that trains a neural network model and saves the trained model.
- `action_classifier/action_classifier.py` — module that loads the trained model and provides a function to classify actions.
- `action_classifier/run_sample_query.py` — script to classify new actions using the trained model (example wrapper).
- `requirements.txt` — minimal dependencies.
- `requirements-dev.txt` — development dependencies (linting, formatting, testing tools).
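The intended integration pattern is to gate an agent's proposed action on the classifier's verdict before execution. The sketch below illustrates that pattern; `classify_action` here is a keyword-based stand-in for the function exported by `action_classifier/action_classifier.py`, whose real name, signature, and model-backed logic may differ:

```python
# Hypothetical sketch: gate an agent's proposed action on a safety classifier.
# `classify_action` is a stand-in for the trained-model function in
# action_classifier/action_classifier.py; the real API may differ.

def classify_action(action: str) -> str:
    """Toy stand-in: flag actions containing obviously destructive phrases."""
    harmful_keywords = ("delete", "rm -rf", "drop table", "shutdown")
    text = action.lower()
    return "harmful" if any(k in text for k in harmful_keywords) else "safe"

def execute_if_safe(action: str) -> bool:
    """Run the action only when the classifier labels it safe."""
    if classify_action(action) == "harmful":
        print(f"Blocked: {action!r}")
        return False
    print(f"Executing: {action!r}")
    return True

execute_if_safe("Summarize the latest sales report")  # allowed
execute_if_safe("rm -rf / on the production server")  # blocked
```

In a real deployment the stand-in would be replaced by the trained model's prediction, and blocked actions could be logged or escalated to a human reviewer instead of silently dropped.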
If you find this repository useful in your research, please consider citing:
```bibtex
@misc{vadlapati2025agentactionclassifier,
  author = {Vadlapati, Praneeth},
  title = {Agent Action Classifier: Classifying AI agent actions to ensure safety and reliability},
  year = {2025},
  howpublished = {\url{https://github.com/Pro-GenAI/Agent-Action-Classifier}},
  note = {GitHub repository},
}
```

Related: Agent-Supervisor: Supervising Actions of Autonomous AI Agents for Ethical Compliance (GitHub)


