After a lot of struggle doing this, I finally found a simple way.
IMPORTANT:
I've discovered that if you want to save a model/pipeline and later load it without running into a ModuleNotFoundError, you need to make sure the model is built in the same place it gets saved. For a neural network, this means compiling, fitting, and saving in the same module. This was a big headache for me, so I hope you can avoid it.
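As a rough illustration of why this happens (the module and class names below are hypothetical): joblib pickles custom classes by reference to the module that defines them, so whatever loads the file must be able to import that module under the same name.

```python
# custom_steps.py  (hypothetical module)
from sklearn.base import BaseEstimator, TransformerMixin

class ClipOutliers(BaseEstimator, TransformerMixin):
    """Toy custom step; joblib/pickle stores only a reference to
    'custom_steps.ClipOutliers', not the class definition itself."""
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X.clip(-3, 3)

# Elsewhere, after dumping a pipeline that contains ClipOutliers:
#   joblib.load("pipeline.joblib")
# raises ModuleNotFoundError: No module named 'custom_steps'
# if 'custom_steps' is not importable in the loading environment.
```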
We can write and read TensorFlow and sklearn models/pipelines using joblib.
Local Write / Read
```python
import joblib
from pathlib import Path

path = Path("<local path>")

# WRITE
with path.open("wb") as f:
    joblib.dump(model, f)

# READ
with path.open("rb") as f:
    f.seek(0)
    model = joblib.load(f)
```
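For concreteness, here is a minimal end-to-end sketch with a scikit-learn pipeline on toy data (assuming scikit-learn is installed; the file name is arbitrary):

```python
import joblib
from pathlib import Path
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Build and fit a small pipeline on toy data
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
pipeline = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
pipeline.fit(X, y)

# Dump to disk and load it back
path = Path("pipeline.joblib")
with path.open("wb") as f:
    joblib.dump(pipeline, f)

with path.open("rb") as f:
    restored = joblib.load(f)

print(restored.predict(X[:5]))
```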
We can do the same thing on AWS S3 using a boto3
client:
AWS S3 Write / Read
```python
import tempfile

import boto3
import joblib

s3_client = boto3.client('s3')
bucket_name = "my-bucket"
key = "model.pkl"

# WRITE
with tempfile.TemporaryFile() as fp:
    joblib.dump(model, fp)
    fp.seek(0)
    s3_client.put_object(Body=fp.read(), Bucket=bucket_name, Key=key)

# READ
with tempfile.TemporaryFile() as fp:
    s3_client.download_fileobj(Fileobj=fp, Bucket=bucket_name, Key=key)
    fp.seek(0)
    model = joblib.load(fp)

# DELETE
s3_client.delete_object(Bucket=bucket_name, Key=key)
```
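If you do this in more than one place, the same S3 logic is easy to wrap into small helper functions (the function names here are just illustrative):

```python
import tempfile

import boto3
import joblib

s3_client = boto3.client("s3")

def save_model_to_s3(model, bucket: str, key: str) -> None:
    """Serialize `model` with joblib and upload it to s3://bucket/key."""
    with tempfile.TemporaryFile() as fp:
        joblib.dump(model, fp)
        fp.seek(0)
        s3_client.put_object(Body=fp.read(), Bucket=bucket, Key=key)

def load_model_from_s3(bucket: str, key: str):
    """Download s3://bucket/key into a temporary file and deserialize it."""
    with tempfile.TemporaryFile() as fp:
        s3_client.download_fileobj(Fileobj=fp, Bucket=bucket, Key=key)
        fp.seek(0)
        return joblib.load(fp)
```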