Version
0.18.0
Description
When an ONNX model's input shape contains string-valued axes (indicating that those axes are dynamic), making a prediction gives an error of this kind:

```
cortex.lib.exceptions.UserException: error: key 'input_ids' for model '_cortex_default': failed to convert to NumPy array for model '_cortex_default': cannot reshape array of size 6 into shape (1,1)
```
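The size mismatch can be reproduced in isolation: if each string-valued axis were substituted with 1, a six-token input could not be reshaped to the resulting `(1, 1)` target. A minimal sketch (the six-element array is a placeholder standing in for a tokenized sentence, not real token ids):

```python
import numpy as np

# Placeholder for the token ids of one six-token sentence
tokens = np.arange(6, dtype=np.int64)

# If each dynamic ("batch", "sequence") axis is treated as 1,
# this reshape is effectively attempted, and it fails:
try:
    tokens.reshape((1, 1))
except ValueError as err:
    print(err)  # cannot reshape array of size 6 into shape (1,1)
```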
Here's an example of a model's input shapes:
| model input | type | shape |
| --- | --- | --- |
| attention_mask | int64 | (batch, sequence) |
| input_ids | int64 | (batch, sequence) |
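ONNX reports a dynamic axis as a string (the axis name) rather than an int, so any shape handling has to distinguish the two cases. A hedged sketch of that check (`is_dynamic` is a hypothetical helper for illustration, not part of Cortex or onnxruntime; the `shape` list mirrors the signature above):

```python
def is_dynamic(dim):
    # Fixed axes are ints; dynamic axes come back as names like "batch"
    # or "sequence" (or None, depending on how the model was exported).
    return not isinstance(dim, int)

shape = ["batch", "sequence"]  # shape reported for input_ids above
print([is_dynamic(d) for d in shape])  # [True, True]
```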
Steps to reproduce
Within a given directory, run all the following steps.
Creating environment/model
Create a virtual environment for Python 3.6.9 and install the following pip dependencies:
```
onnxruntime==1.3.0
torch==1.5.0
transformers==3.0.0
scipy==1.4.1
```
Within that environment, run the following instructions to export the XLM-Roberta model in ONNX format:
```python
from transformers.convert_graph_to_onnx import convert

convert(
    framework="pt",
    model="xlm-roberta-base",
    output="./output/xlm-roberta-base.onnx",
    opset=11,
)
```
Now, let's run the following:
```shell
python -m onnxruntime_tools.optimizer_cli --input ./output/xlm-roberta-base.onnx --output ./output/xlm-roberta-base.onnx --model_type bert --float16
```
Creating the Cortex deployment
Create a `cortex.yaml` config file with the following content:

```yaml
# cortex.yaml
- name: api
  predictor:
    type: onnx
    model_path: ./output/xlm-roberta-base.onnx
    path: predictor.py
    image: cortexlabs/onnx-predictor-cpu:0.18.0
```
Create a `predictor.py` script with the following content:

```python
# predictor.py
import time

from scipy.special import softmax
from transformers import XLMRobertaTokenizer


class ONNXPredictor:
    def __init__(self, onnx_client, config):
        self.client = onnx_client
        self.tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")

    def predict(self, payload):
        start = time.time()
        model_inputs = self.tokenizer.encode_plus(
            payload["text"], max_length=512, return_tensors="pt", truncation=True
        )
        inputs_onnx = {k: v.cpu().detach().numpy() for k, v in model_inputs.items()}
        print(self.client._signatures)
        output = self.client.predict(inputs_onnx)
        output = softmax(output[0], axis=1)[0].tolist()
        end = time.time()
        return {"output": output, "time": end - start}
```
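For reference, the `inputs_onnx` dict that `predict` hands to the client holds `(1, sequence_length)` arrays. A sketch of its shape without downloading the tokenizer (the token ids here are placeholder values, not real XLM-Roberta ids):

```python
import numpy as np

# Placeholder arrays standing in for the tokenizer output of one sentence
inputs_onnx = {
    "input_ids": np.array([[101, 102, 103, 104, 105, 106]], dtype=np.int64),
    "attention_mask": np.ones((1, 6), dtype=np.int64),
}

for name, arr in inputs_onnx.items():
    # Each input matches the (batch, sequence) signature: here (1, 6)
    print(name, arr.shape)
```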
Copy the pip dependencies listed above into a `requirements.txt` file, and from the same directory as the `cortex.yaml` config file, run:

```shell
cortex deploy -e local
```

Wait for the API to be live and then run:
```shell
curl http://localhost:8888 -X POST -H "Content-Type: application/json" -d '{"text": "That is a nice"}'
```
Error
The above command will return a non-200 response code. Inspect the logs with `cortex get api`. The expected error is:

```
cortex.lib.exceptions.UserException: error: key 'input_ids' for model '_cortex_default': failed to convert to numpy array for model '_cortex_default': cannot reshape array of size 6 into shape (1,1)
```