Upload model to S3

I'm using AWS SageMaker to build my model, and I want to store the model in S3 for later use. How do you save a model to S3 with Amazon SageMaker? I know this seems trivial, but I didn't understand the sources/documentation I've read.

Topic: sagemaker, scikit-learn, aws, python

Category: Data Science


To expand on the other answer: this is a problem that I've run into several times myself, so I've built an open-source modelstore library that automates this step, as well as doing other things like versioning the model and storing it in S3 with structured paths.

The code to use it looks like this (the library's documentation has a full example):

from modelstore import ModelStore
from sklearn.linear_model import LinearRegression

# Train your model, as usual (X and y are your training data)
model = LinearRegression()
model.fit(X, y)

# Create a model store that points to your S3 bucket
bucket_name = "your-bucket-name"
modelstore = ModelStore.from_aws_s3(bucket_name)

# Upload your model; this returns a dictionary of metadata
meta_data = modelstore.sklearn.upload(model_domain, model=model)

This will dump your model to a file, create a tar archive from it, and then upload that to S3 for you. The function returns some metadata as a dictionary, which includes the version ID for your model.
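If you later want to pull the model back out of S3, newer versions of modelstore can do that too. Here is a minimal sketch, assuming a version of the library that exposes a load() function and that the returned metadata stores the id under these keys; both have changed between releases, so check the project's README for your installed version:

# A sketch of retrieving the model later; it assumes the metadata dictionary
# returned by upload() stores the id under these keys, and that your version
# of modelstore has load() (both may differ between releases)
model_id = meta_data["model"]["model_id"]

# Load the model straight back out of the S3 bucket
model_copy = modelstore.load(model_domain, model_id)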


You can use pickle (or any other serialization format) together with the boto3 library to save your model to S3.

To save your model as a pickle file, you can use:

import pickle
import numpy as np

from sklearn.linear_model import LinearRegression

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3

model = LinearRegression().fit(X, y)

# save the model to disk
pkl_filename = 'pickle_model.pkl'
with open(pkl_filename, 'wb') as file:
    pickle.dump(model, file)

and to save your model as a pickle file to S3, rather than to the SageMaker instance's local storage:

# to save the model to s3
import boto3

# For AWS credentials, if ~/.aws/credentials is missing:
# access_key_id = '...'
# secret_access_key = '...'

# session = boto3.Session(
#     aws_access_key_id=access_key_id,
#     aws_secret_access_key=secret_access_key,
# )

# s3_resource = session.resource('s3')

s3_resource = boto3.resource('s3')

bucket = 'your_bucket'
key = 'pickle_model.pkl'

# Serialize the model in memory and upload the bytes directly,
# without writing a local file first
pickle_byte_obj = pickle.dumps(model)

s3_resource.Object(bucket, key).put(Body=pickle_byte_obj)
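And to load the pickled model back from S3 later, you can read the object's bytes and unpickle them. This is a minimal sketch that assumes the same bucket and key names used above:

import pickle

import boto3

s3_resource = boto3.resource('s3')

bucket = 'your_bucket'
key = 'pickle_model.pkl'

# Read the object's raw bytes from S3 and unpickle them into a model
pickle_bytes = s3_resource.Object(bucket, key).get()['Body'].read()
model = pickle.loads(pickle_bytes)

Note that pickle.loads will execute whatever is in the byte stream, so only unpickle objects from buckets you control.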
