R-Ladies Rome
vetiver, pins, tidymodels, and ranger packages
import pandas as pd
import numpy as np
np.random.seed(500)  # seed NumPy's global RNG so the split and the forest are reproducible
raw = pd.read_csv('https://bit.ly/3sWty5A')
df = raw[["like_count", "funny", "show_product_quickly",
          "celebrity", "danger", "animals"]].dropna()
from sklearn import model_selection, preprocessing, ensemble, pipeline
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    df.drop(columns=['like_count']),
    df['like_count'],
    test_size=0.2
)
# Fit the encoder and the forest separately, then wrap the fitted steps in one Pipeline
oe = preprocessing.OrdinalEncoder().fit(X_train)
rf = ensemble.RandomForestRegressor().fit(oe.transform(X_train), y_train)
rf_pipe = pipeline.Pipeline([('ordinal_encoder', oe), ('random_forest', rf)])
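Before deploying, it can help to check that the wrapped pipeline predicts on held-out rows. The following is a minimal sketch, not part of the original slides: it swaps the ads CSV for synthetic stand-in data (same column names, made-up values) so it runs offline, and the seed and sample size are arbitrary assumptions.

```python
import numpy as np
import pandas as pd
from sklearn import ensemble, model_selection, pipeline, preprocessing

# Synthetic stand-in for the ads data (assumption): five yes/no features, one count target.
rng = np.random.default_rng(500)
features = ["funny", "show_product_quickly", "celebrity", "danger", "animals"]
X = pd.DataFrame(rng.integers(0, 2, size=(200, 5)).astype(bool), columns=features)
y = pd.Series(rng.poisson(100, size=200) + 50 * X["funny"], name="like_count")

X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.2, random_state=500
)

# Same pattern as the slides: fit each step, then wrap the fitted steps.
oe = preprocessing.OrdinalEncoder().fit(X_train)
rf = ensemble.RandomForestRegressor(random_state=500).fit(oe.transform(X_train), y_train)
rf_pipe = pipeline.Pipeline([("ordinal_encoder", oe), ("random_forest", rf)])

# The pipeline now takes raw feature rows and handles the encoding itself.
preds = rf_pipe.predict(X_test)
r2 = rf_pipe.score(X_test, y_test)
print(preds[:3], round(r2, 3))
```

Because the encoder lives inside the pipeline, whatever serves the model later only needs raw feature rows.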
if you develop models…
you can operationalize them
MLOps is a set of practices to deploy and maintain machine learning models in production reliably and efficiently
and these practices can be HARD.
model
model_final
model_final_final
model_final_final_actually
model_final_final_actually (1)
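The naming spiral above is what versioned storage is meant to replace. As a rough illustration of the idea only (this is not how pins is implemented, and `write_version`, `list_versions`, and `read_version` are hypothetical helpers), each write can go into its own numbered, content-hashed folder under one stable name:

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def list_versions(root: Path, name: str) -> list:
    """Return the version labels stored under `name`, oldest first."""
    pin_dir = root / name
    if not pin_dir.exists():
        return []
    return sorted(p.name for p in pin_dir.iterdir() if p.is_dir())


def write_version(root: Path, name: str, obj) -> str:
    """Store `obj` as a new version of `name` and return its label."""
    payload = pickle.dumps(obj)
    digest = hashlib.sha256(payload).hexdigest()[:8]
    seq = len(list_versions(root, name)) + 1
    version = f"{seq:04d}-{digest}"  # sequence number plus content hash
    target = root / name / version
    target.mkdir(parents=True)
    (target / "data.pkl").write_bytes(payload)
    return version


def read_version(root: Path, name: str, version=None):
    """Load a specific version, or the most recent one when none is given."""
    chosen = version or list_versions(root, name)[-1]
    return pickle.loads((root / name / chosen / "data.pkl").read_bytes())


root = Path(tempfile.mkdtemp())
v1 = write_version(root, "model", {"n_trees": 100})
v2 = write_version(root, "model", {"n_trees": 500})
print(list_versions(root, "model"))
```

The name stays `model` throughout; older versions remain readable by label, and the latest is the default.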
The pins package publishes data, models, and other R objects, making it easy to share them across projects and with your colleagues
.Rds objects
Meta(title='ads: a pinned Pipeline object',
description="Scikit-learn <class 'sklearn.pipeline.Pipeline'> model",
created='20221102T094151Z',
pin_hash='4db397b49e7bff0b',
file='ads.joblib',
file_size=1087,
type='joblib',
api_version=1,
version=VersionRaw(version='65155'),
name='ads',
user={'required_pkgs': ['vetiver', 'scikit-learn']})
not only good models, but good models
Composability
VetiverAPI and VetiverModel
Ergonomics