Holistic MLOps for better science

PyData NYC 2022

Isabel Zimmerman, Posit PBC

MLOps is…

a set of practices to deploy and maintain machine learning models in production reliably and efficiently

and these practices can be HARD.

import pandas as pd
import numpy as np

np.random.seed(500)  # seed the global RNG (a bare RandomState(500) is created and discarded)
raw = pd.read_csv('https://bit.ly/3sWty5A')
df = raw[["like_count", "funny", "show_product_quickly", \
    "celebrity", "danger", "animals"]].dropna()

df.head()

like_count funny show_product_quickly celebrity danger animals
0 1233.0 False False False False False
1 485.0 True True True True False
2 129.0 True False False True True
3 2.0 False True False False False
4 20.0 True True False True True


from sklearn import model_selection, preprocessing, ensemble

X_train, X_test, y_train, y_test = model_selection.train_test_split(
    df.drop(columns = ['like_count']),
    df['like_count'],
    test_size=0.2
)

from sklearn import pipeline

oe = preprocessing.OrdinalEncoder().fit(X_train)
rf = ensemble.RandomForestRegressor().fit(oe.transform(X_train), y_train)
rf_pipe = pipeline.Pipeline([('ordinal_encoder', oe), ('random_forest', rf)])
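Once the pieces are fitted, the pipeline scores new data in one call. A minimal sketch of that idea, with synthetic rows standing in for the ad data above (column names match, values invented):

```python
import pandas as pd
from sklearn import ensemble, pipeline, preprocessing

# synthetic stand-in for the ads data loaded earlier
X = pd.DataFrame({
    "funny": [True, False, True, False],
    "show_product_quickly": [True, True, False, False],
    "celebrity": [False, True, False, True],
    "danger": [True, False, False, True],
    "animals": [True, True, False, False],
})
y = [485.0, 2.0, 129.0, 20.0]  # like_count

oe = preprocessing.OrdinalEncoder().fit(X)
rf = ensemble.RandomForestRegressor(
    n_estimators=10, random_state=0
).fit(oe.transform(X), y)
rf_pipe = pipeline.Pipeline([("ordinal_encoder", oe), ("random_forest", rf)])

# the fitted pipeline encodes, then predicts, in a single call
preds = rf_pipe.predict(X.head(2))
print(preds.shape)
```

Bundling the encoder with the model this way means downstream consumers never have to remember the preprocessing step.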

if you develop models…

you probably should operationalize them

versioning


  • model
  • model_final
  • model_final_final
  • model_final_final_actually
  • model_final_final_actually (1)


  • lives in a central location
  • discoverable by the team
  • loads right into memory

import pins
from vetiver import VetiverModel, vetiver_pin_write

model_board = pins.board_temp( # place for models to be stored; can also be s3, azure, gcs, connect
    allow_pickle_read = True)

v = VetiverModel(rf_pipe, "ads") # create deployable model object
vetiver_pin_write(model_board, v)

Meta(title='ads: a pinned Pipeline object',
    description="Scikit-learn <class 'sklearn.pipeline.Pipeline'> model", 
    created='20221102T094151Z', 
    pin_hash='4db397b49e7bff0b', 
    file='ads.joblib', 
    file_size=1087, 
    type='joblib', 
    api_version=1, 
    version=VersionRaw(version='65155'), 
    name='ads', 
    user={'required_pkgs': ['vetiver', 'scikit-learn']})

# Python
import pins
from vetiver import VetiverModel, vetiver_pin_write

model_board = pins.board_temp(
    allow_pickle_read = True)

v = VetiverModel(rf_pipe, "ads")
vetiver_pin_write(model_board, v)

# R
library(vetiver)
library(pins)

model_board <- board_temp()

v <- vetiver_model(rf, "ads")
model_board %>%
  vetiver_pin_write(v)

know what your input data should look like

  • save a piece of your data to better debug when things go wrong

import pins
from vetiver import VetiverModel, vetiver_pin_write

model_board = pins.board_temp(
    allow_pickle_read = True)

v = VetiverModel(rf_pipe, "ads", ptype_data = X_train) # ptype_data saves an input prototype
vetiver_pin_write(model_board, v)

utilizing model cards

not only good models, but models that do good

  • summary
  • documentation
  • fairness


Model Cards provide a framework for transparent, responsible reporting.
Use the vetiver `.qmd` Quarto template as a place to start, with `vetiver.model_card()`.


deploy your model

from vetiver import VetiverAPI

my_api = VetiverAPI(v)
my_api.run()
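With the API running (VetiverAPI serves a FastAPI app; the host, port, and exact payload shape below are assumptions, so check your deployment's rendered API docs), predictions come from POSTing rows to its /predict endpoint. A sketch of building that payload:

```python
import pandas as pd

# one new ad to score; columns must match the training data (the ptype)
new_ad = pd.DataFrame({
    "funny": [True],
    "show_product_quickly": [False],
    "celebrity": [False],
    "danger": [True],
    "animals": [True],
})

# records-oriented JSON: one dict per row
payload = new_ad.to_dict(orient="records")
print(payload)

# with the API running, POST the payload, e.g. with the requests library:
#   requests.post("http://127.0.0.1:8080/predict", json=payload)
```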

deploy your model (to RStudio Connect)

vetiver.deploy_rsconnect(
    connect_server = connect_server, 
    board = model_board, 
    pin_name = "ads", 
    version = "59869")

deploy your model (with Docker)

vetiver.write_app(board=model_board, pin_name="ads", version = "59869")
vetiver.write_docker(app_file="app.py")
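write_app() creates app.py and write_docker() writes a Dockerfile next to it; from there, building and serving is plain Docker. A sketch (the image name is illustrative, and the port assumes vetiver's default of 8080, so adjust to your Dockerfile):

```shell
# build an image from the generated Dockerfile
docker build -t ads-api .

# serve the model API, mapping the container port to the host
docker run -p 8080:8080 ads-api
```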

model doesn’t live in deployment


monitoring


import vetiver
from datetime import timedelta
from sklearn.metrics import mean_absolute_error, r2_score

metrics = vetiver.compute_metrics(
    new_data, 
    "date", 
    timedelta(weeks = 1), 
    [mean_absolute_error, r2_score], 
    "like_count", 
    "y_pred"
    )

vetiver.pin_metrics(
    model_board, 
    metrics, 
    "metrics_pin_name", 
    overwrite = True
    )

vetiver.plot_metrics(metrics)
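For intuition, the weekly rollup that compute_metrics performs can be approximated with plain pandas; this is an illustrative equivalent, not vetiver's implementation, and the numbers are toy data:

```python
import pandas as pd

# toy scored data: truth (like_count) and predictions (y_pred) over two weeks
new_data = pd.DataFrame({
    "date": pd.to_datetime(["2022-11-01", "2022-11-02",
                            "2022-11-08", "2022-11-09"]),
    "like_count": [100.0, 200.0, 150.0, 50.0],
    "y_pred":     [110.0, 190.0, 120.0, 80.0],
})

# mean absolute error per one-week window
new_data["abs_err"] = (new_data["like_count"] - new_data["y_pred"]).abs()
weekly_mae = new_data.groupby(pd.Grouper(key="date", freq="W"))["abs_err"].mean()
print(weekly_mae)
```

Pinning these rolled-up metrics (as above) rather than raw scores keeps the monitoring history small and shareable.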


what’s the difference between MLOps tools?

composability

  • VetiverModel
  • VetiverAPI

ergonomics

  • feels good to use
  • works with the tools you like

vetiver.rstudio.com or visit us at the Posit booth!