PyData NYC 2022
Isabel Zimmerman, Posit PBC
MLOps is a set of practices to deploy and maintain machine learning models in production reliably and efficiently
and these practices can be HARD.
| | like_count | funny | show_product_quickly | celebrity | danger | animals |
|---|---|---|---|---|---|---|
| 0 | 1233.0 | False | False | False | False | False |
| 1 | 485.0 | True | True | True | True | False |
| 2 | 129.0 | True | False | False | True | True |
| 3 | 2.0 | False | True | False | False | False |
| 4 | 20.0 | True | True | False | True | True |
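import pandas as pd
import numpy as np
# Seed NumPy's global RNG so the split below is reproducible
# (np.random.RandomState(500) on its own creates an unused object and seeds nothing).
np.random.seed(500)
raw = pd.read_csv('https://bit.ly/3sWty5A')
# Keep the target and five boolean ad features; drop rows with missing values.
df = raw[["like_count", "funny", "show_product_quickly",
          "celebrity", "danger", "animals"]].dropna()
from sklearn import model_selection, preprocessing, ensemble
# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    df.drop(columns=['like_count']),
    df['like_count'],
    test_size=0.2
)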
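from sklearn import pipeline  # not included in the sklearn import above
# Fit the encoder and the forest separately, then bundle the fitted steps into one Pipeline.
oe = preprocessing.OrdinalEncoder().fit(X_train)
rf = ensemble.RandomForestRegressor().fit(oe.transform(X_train), y_train)
rf_pipe = pipeline.Pipeline([('ordinal_encoder', oe), ('random_forest', rf)])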
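A minimal sketch of why the fitted steps get bundled: the Pipeline takes raw features and runs the encoder and the forest in one `predict` call, so only one object has to be deployed. The feature rows and targets here are toy values made up for illustration, not the ads dataset, and `X_new`, `n_estimators=10`, and `random_state=500` are assumptions added so the sketch runs quickly and repeatably.

```python
import numpy as np
from sklearn import ensemble, pipeline, preprocessing

# Toy stand-in for boolean ad features (made-up rows, not the real data).
X_train = [
    [False, False, False],
    [True, True, False],
    [True, False, True],
    [False, True, False],
    [True, True, True],
    [False, False, True],
]
y_train = [1233.0, 485.0, 129.0, 2.0, 20.0, 310.0]
X_new = [[True, False, False], [False, True, True]]

# Fit each step on its own, as in the slide code.
oe = preprocessing.OrdinalEncoder().fit(X_train)
rf = ensemble.RandomForestRegressor(n_estimators=10, random_state=500)
rf.fit(oe.transform(X_train), y_train)

# Wrapping the already-fitted steps does not refit them.
rf_pipe = pipeline.Pipeline([("ordinal_encoder", oe), ("random_forest", rf)])

# Encoding by hand and calling the pipeline give the same predictions.
manual = rf.predict(oe.transform(X_new))
piped = rf_pipe.predict(X_new)
print(np.allclose(manual, piped))
```

Whoever serves the model then never needs to know that an encoding step exists.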
if you develop models…
you probably should operationalize them
model
model_final
model_final_final
model_final_final_actually
model_final_final_actually (1)
Meta(title='ads: a pinned Pipeline object',
description="Scikit-learn <class 'sklearn.pipeline.Pipeline'> model",
created='20221102T094151Z',
pin_hash='4db397b49e7bff0b',
file='ads.joblib',
file_size=1087,
type='joblib',
api_version=1,
version=VersionRaw(version='65155'),
name='ads',
user={'required_pkgs': ['vetiver', 'scikit-learn']})
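The metadata above records a creation timestamp, a content hash, and a version for the stored model. A rough standard-library sketch of that idea follows; it is not the pins or vetiver API, and `save_version`, `list_versions`, `load_version`, and the directory layout are hypothetical names chosen for illustration.

```python
import hashlib
import pickle
from datetime import datetime, timezone
from pathlib import Path


def save_version(obj, name, root):
    """Pickle obj into root/name/<timestamp>-<hash>/ and return the version id."""
    payload = pickle.dumps(obj)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    digest = hashlib.sha256(payload).hexdigest()[:5]
    version = f"{stamp}-{digest}"
    target = Path(root) / name / version
    target.mkdir(parents=True, exist_ok=True)
    (target / f"{name}.pkl").write_bytes(payload)
    return version


def list_versions(name, root):
    """Return all stored version ids for name, sorted by timestamp prefix."""
    return sorted(p.name for p in (Path(root) / name).iterdir() if p.is_dir())


def load_version(name, root, version=None):
    """Load a specific version, or the last one in sorted order if none is given."""
    # Caveat: versions saved within the same second sort by hash, not by save order.
    version = version or list_versions(name, root)[-1]
    path = Path(root) / name / version / f"{name}.pkl"
    return pickle.loads(path.read_bytes())
```

Every save lands in its own directory, so there is never a `model_final_final` to rename, and any earlier version can be reloaded by its id.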
not only good models, but good models