I am trying to convert a Livebook smart cell into a single-file Elixir script, mostly for educational reasons. I successfully Mix.install the
dependencies, load the model ("Salesforce/blip-image-captioning-base"), create a featurizer and a tokenizer, run the configure step, and create a serving:
serving = Bumblebee.Vision.image_to_text(model_info, featurizer, tokenizer, generation_config, compile: [batch_size: 1], defn_options: [compiler: EXLA])
And then I load an image like so:
image = "/tmp/22911352.jpg" |> File.read!() |> Nx.from_binary(:u8) |> ....
I cannot figure out what the last step in the image processing pipeline should be. The original smart cell does the following:
image = image.file_ref |> Kino.Input.file_path() |> File.read!() |> Nx.from_binary(:u8) |> Nx.reshape({image.height, image.width, 3})
The height and width are 3468 and 4624, but when I hard-code those values and do
.. |> Nx.reshape({3468, 4624, 3})
I get the following error:
cannot reshape, current shape {4190582} is not compatible with new shape {3468, 4624, 3}
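For reference, 3468 * 4624 * 3 = 48,108,096 expected elements, but the binary only has 4,190,582 bytes, which matches the size of the compressed JPEG file on disk. So I suspect the raw file bytes need to be decoded into pixels first. Here is a sketch of what I am experimenting with, assuming the stb_image package is added to the Mix.install list (this is my guess, not necessarily what the smart cell does):

```elixir
# Guess: decode the JPEG into raw pixels before building the tensor.
# Assumes {:stb_image, "~> 0.6"} was added to the Mix.install/2 deps.
{:ok, img} = StbImage.read_file("/tmp/22911352.jpg")

# StbImage.to_nx/1 returns a {height, width, channels} tensor of type :u8,
# so no manual Nx.reshape/2 with hard-coded dimensions should be needed.
image = StbImage.to_nx(img)
```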
The very last step needs to be
Nx.Serving.run(serving, image)
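and, if I am reading the Bumblebee docs correctly, the result can be pattern-matched like this (a sketch; the shape of the result map is my assumption from the documented image-to-text output):

```elixir
# Run the serving on the decoded image tensor and extract the caption.
%{results: [%{text: caption}]} = Nx.Serving.run(serving, image)
IO.puts(caption)
```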
This is my first look at Livebook, and I don't yet understand where image
in the smart cell comes from.