PyData Global 2022
if you develop models…
you can operationalize them
a set of practices to deploy and maintain machine learning models in production reliably and efficiently
and these practices can be HARD.
import pandas as pd
import numpy as np
from sklearn import model_selection, preprocessing, ensemble
# Seed NumPy's global RNG so the train/test split below is reproducible
np.random.seed(500)
# Load the data and keep the target plus five features, dropping incomplete rows
raw = pd.read_csv('https://bit.ly/3sWty5A')
df = raw[["like_count", "funny", "show_product_quickly",
          "celebrity", "danger", "animals"]].dropna()
# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    df.drop(columns=['like_count']),
    df['like_count'],
    test_size=0.2
)
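from sklearn import pipeline
# Encode the categorical features, then fit a random forest on the encoded data
oe = preprocessing.OrdinalEncoder().fit(X_train)
rf = ensemble.RandomForestRegressor().fit(oe.transform(X_train), y_train)
# Bundle the fitted steps so the preprocessing travels with the model
rf_pipe = pipeline.Pipeline([('ordinal_encoder', oe), ('random_forest', rf)])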
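A minimal sketch of why the pipeline is worth building: once the encoder and the model are bundled, raw feature rows go in and predictions come out, with no separate transform step to remember. The data here is a synthetic stand-in (made-up values with the same column names), and the pipeline is fit in one call rather than from pre-fitted steps, which is one alternative and not necessarily the approach used in the talk.

```python
import numpy as np
import pandas as pd
from sklearn import ensemble, model_selection, pipeline, preprocessing

# Synthetic stand-in for the data (hypothetical values, not the real CSV)
rng = np.random.default_rng(500)
features = ["funny", "show_product_quickly", "celebrity", "danger", "animals"]
df = pd.DataFrame(
    rng.choice([True, False], size=(200, len(features))), columns=features
)
df["like_count"] = rng.integers(0, 1000, size=200)

X_train, X_test, y_train, y_test = model_selection.train_test_split(
    df.drop(columns=["like_count"]),
    df["like_count"],
    test_size=0.2,
    random_state=500,
)

# Fit the encoder and the model together as one object
rf_pipe = pipeline.Pipeline([
    ("ordinal_encoder", preprocessing.OrdinalEncoder()),
    ("random_forest", ensemble.RandomForestRegressor(n_estimators=20, random_state=500)),
])
rf_pipe.fit(X_train, y_train)

# Raw, unencoded rows go straight in; the pipeline applies the encoder itself
preds = rf_pipe.predict(X_test)
print(preds[:5])
```

Either way of building it, the object that gets saved and deployed is the whole pipeline, so the encoder cannot drift apart from the model.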
model
model_final
model_final_final
model_final_final_actually
model_final_final_actually (1)
Meta(title='ads: a pinned Pipeline object',
description="Scikit-learn <class 'sklearn.pipeline.Pipeline'> model",
created='20221102T094151Z',
pin_hash='4db397b49e7bff0b',
file='ads.joblib',
file_size=1087,
type='joblib',
api_version=1,
version=VersionRaw(version='65155'),
name='ads',
user={'required_pkgs': ['vetiver', 'scikit-learn']})
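To make the fields above concrete, here is a rough standard-library sketch of the idea behind versioned metadata: every saved artifact gets a timestamp, a content hash, and a size, so two versions can share one name and still be told apart. The helper `describe_artifact` and its field layout are hypothetical and for illustration only; this is not how pins computes `pin_hash` or stores versions.

```python
import hashlib
import pickle
from datetime import datetime, timezone


def describe_artifact(name, obj):
    """Serialize obj and return a small metadata record (hypothetical helper)."""
    payload = pickle.dumps(obj)
    return {
        "name": name,
        # UTC timestamp in the same compact style as the `created` field above
        "created": datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ"),
        # Short content hash: changes whenever the serialized bytes change
        "hash": hashlib.sha256(payload).hexdigest()[:16],
        "file": f"{name}.pkl",
        "file_size": len(payload),
    }


# Two "models" saved under the same name; plain dicts stand in for real models
meta_v1 = describe_artifact("ads", {"n_estimators": 100})
meta_v2 = describe_artifact("ads", {"n_estimators": 200})

print(meta_v1)
print(meta_v2)
```

The name stays `ads` for both; the hash and timestamp carry the version, which is the job that `model_final_final_actually (1)` was doing by hand.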
not only good models, but good models
Composability
VetiverAPI
and VetiverModel
Ergonomics