Practical MLOps for better models

PyData Global 2022

Isabel Zimmerman, Posit PBC

if you develop models…

you can operationalize them

MLOps is…

a set of practices to deploy and maintain machine learning models in production reliably and efficiently

and these practices can be HARD.

import pandas as pd
import numpy as np
from sklearn import model_selection, preprocessing, pipeline, ensemble

np.random.seed(500)  # seed NumPy's global RNG so the split and forest are reproducible
raw = pd.read_csv('https://bit.ly/3sWty5A')
df = raw[["like_count", "funny", "show_product_quickly",
    "celebrity", "danger", "animals"]].dropna()

# hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    df.drop(columns = ['like_count']),
    df['like_count'],
    test_size=0.2
)

# fit the encoder and the forest, then bundle both into one pipeline
oe = preprocessing.OrdinalEncoder().fit(X_train)
rf = ensemble.RandomForestRegressor().fit(oe.transform(X_train), y_train)
rf_pipe = pipeline.Pipeline([('ordinal_encoder', oe), ('random_forest', rf)])

what are some pieces used in MLOps?

  • orchestration
  • experiment tracking
  • model versioning
  • model serving
  • model monitoring

versioning

model

model_final

model_final_final

model_final_final_actually

model_final_final_actually (1)

import pins
from vetiver import VetiverModel, vetiver_pin_write

# create place for models to be stored
# (a board can also be s3, azure, gcs, connect)
model_board = pins.board_temp(
    allow_pickle_read = True)

v = VetiverModel(rf_pipe, "ads") # create deployable model object
vetiver_pin_write(model_board, v)

Meta(title='ads: a pinned Pipeline object',
    description="Scikit-learn <class 'sklearn.pipeline.Pipeline'> model", 
    created='20221102T094151Z', 
    pin_hash='4db397b49e7bff0b', 
    file='ads.joblib', 
    file_size=1087, 
    type='joblib', 
    api_version=1, 
    version=VersionRaw(version='65155'), 
    name='ads', 
    user={'required_pkgs': ['vetiver', 'scikit-learn']})
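
The metadata above carries a version and a hash for the pinned model. To show why that beats hand-named files like model_final_final, here is a minimal standard-library sketch of content-addressed versioning. It is not how pins is implemented; `write_version`, `read_version`, `list_versions`, and the directory layout are hypothetical names made up for this example.

```python
import hashlib
import pickle
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_version(board_dir: Path, name: str, obj) -> str:
    """Store obj under board_dir/name/<timestamp>-<hash>/ and return the version id."""
    payload = pickle.dumps(obj)
    digest = hashlib.sha256(payload).hexdigest()[:8]  # short content hash
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    version = f"{stamp}-{digest}"
    target = board_dir / name / version
    target.mkdir(parents=True)
    (target / f"{name}.pkl").write_bytes(payload)
    return version


def read_version(board_dir: Path, name: str, version: str):
    """Load one specific stored version back into memory."""
    return pickle.loads((board_dir / name / version / f"{name}.pkl").read_bytes())


def list_versions(board_dir: Path, name: str) -> list[str]:
    """Return every stored version id for a name, sorted by directory name."""
    return sorted(p.name for p in (board_dir / name).iterdir())


# Assumed stand-ins for two fitted models (plain dicts keep the sketch small).
board = Path(tempfile.mkdtemp())
v1 = write_version(board, "ads", {"n_estimators": 100})
v2 = write_version(board, "ads", {"n_estimators": 200})
versions = list_versions(board, "ads")
```

Every write gets its own id, so an older model can always be read back by version instead of being overwritten.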

know what your input data should look like

  • save a piece of your data to better debug when things go wrong

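import pins
from vetiver import VetiverModel, vetiver_pin_write

model_board = pins.board_temp(
    allow_pickle_read = True)

# ptype_data stores a prototype of the training data alongside the model
v = VetiverModel(rf_pipe, "ads", ptype_data = X_train)
vetiver_pin_write(model_board, v) # pin the VetiverModel, not the bare estimator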
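
One way to read "know what your input data should look like": keep the column names and types of the training data next to the model, and compare incoming records against them. The sketch below uses only the standard library. `make_prototype` and `check_against_prototype` are hypothetical helpers for illustration, not vetiver functions, and the sample values are made up.

```python
def make_prototype(record: dict) -> dict:
    """Record the expected column names and Python types from one sample row."""
    return {column: type(value) for column, value in record.items()}


def check_against_prototype(record: dict, prototype: dict) -> list[str]:
    """Return human-readable problems; an empty list means the record fits."""
    problems = []
    for column, expected in prototype.items():
        if column not in record:
            problems.append(f"missing column: {column}")
        elif not isinstance(record[column], expected):
            problems.append(
                f"{column}: expected {expected.__name__}, "
                f"got {type(record[column]).__name__}"
            )
    for column in record:
        if column not in prototype:
            problems.append(f"unexpected column: {column}")
    return problems


# Made-up sample row standing in for a slice of the training data.
sample_row = {"funny": False, "show_product_quickly": True,
              "celebrity": False, "danger": False, "animals": True}
prototype = make_prototype(sample_row)

good = {"funny": True, "show_product_quickly": False,
        "celebrity": False, "danger": True, "animals": False}
bad = {"funny": "yes", "celebrity": False, "danger": True, "animals": False}

good_problems = check_against_prototype(good, prototype)
bad_problems = check_against_prototype(bad, prototype)
```

When a request fails in production, a saved prototype like this gives you something concrete to compare the bad input against.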

utilizing model cards

not only good (well-performing) models, but good (responsible) models

  • summary
  • documentation
  • fairness

Model Cards provide a framework for transparent, responsible reporting.
Use the vetiver `.qmd` Quarto template as a place to start,
with vetiver.model_card()

import vetiver

vetiver.vetiver_pin_write(model_board, v)
vetiver.model_card() # get the Quarto model card template to fill in

deploy your model

from vetiver import VetiverAPI

my_api = VetiverAPI(v) # wrap the model in a prediction API
my_api.run() # serve it locally
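
Conceptually, a model API accepts JSON records, runs them through the model, and returns JSON predictions. Below is a dependency-free sketch of that request/response cycle. It is not the code VetiverAPI runs; `handle_predict` and `MeanModel` are hypothetical stand-ins, and the payload and response shapes are assumed for illustration.

```python
import json


class MeanModel:
    """Toy stand-in for a fitted model: predicts the mean of each row's features."""

    def predict(self, rows):
        return [sum(row) / len(row) for row in rows]


# Column order the (hypothetical) model was trained with.
FEATURES = ["funny", "show_product_quickly", "celebrity", "danger", "animals"]


def handle_predict(body: str, model) -> str:
    """Turn a JSON request body (a list of records) into a JSON response body."""
    records = json.loads(body)
    # Put features into a fixed column order, as a fitted model would expect.
    rows = [[float(record[name]) for name in FEATURES] for record in records]
    return json.dumps({"predict": model.predict(rows)})


request_body = json.dumps([
    {"funny": True, "show_product_quickly": False,
     "celebrity": False, "danger": True, "animals": False},
])
response_body = handle_predict(request_body, MeanModel())
```

A real deployment adds routing, input validation, and documentation on top of this loop, which is the part a serving tool handles for you.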

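import vetiver

# connect_server: an rsconnect.api.RSConnectServer for your Connect instance
vetiver.deploy_rsconnect(
    connect_server = connect_server,
    board = model_board,
    pin_name = "ads",
    version = "59869")

# or write the files needed to build a Docker image
vetiver.write_app(board=model_board, pin_name="ads")
vetiver.write_docker(app_file="app.py")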

monitoring

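from datetime import timedelta

import vetiver
from sklearn.metrics import mean_absolute_error, r2_score

# new_data: scored data with "date", "like_count", and "y_pred" columns
metrics = vetiver.compute_metrics(
    new_data,
    "date",                             # date column
    timedelta(weeks = 1),               # aggregation period
    [mean_absolute_error, r2_score],    # metrics to compute
    "like_count",                       # truth column
    "y_pred"                            # prediction column
    )

vetiver.pin_metrics(
    model_board,
    metrics,
    "metrics_pin_name",
    overwrite = True
    )

vetiver.plot_metrics(metrics)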
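
The idea behind the calls above is to split scored data into time periods and compute each metric per period, so a drop in performance shows up as a trend. Here is a minimal standard-library sketch of that grouping with made-up numbers. `weekly_mae` is a hypothetical helper, not part of vetiver.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_mae(rows, start: date, period: timedelta = timedelta(weeks=1)):
    """Group (day, truth, estimate) rows into fixed periods; return MAE per period start."""
    errors = defaultdict(list)
    for day, truth, estimate in rows:
        index = (day - start) // period  # which period the row falls in
        errors[start + index * period].append(abs(truth - estimate))
    return {begin: sum(errs) / len(errs) for begin, errs in sorted(errors.items())}


# Made-up example data: (date, observed like_count, predicted like_count)
scored = [
    (date(2022, 11, 1), 100, 90),
    (date(2022, 11, 3), 200, 220),
    (date(2022, 11, 9), 150, 100),
]
mae_by_week = weekly_mae(scored, start=date(2022, 11, 1))
```

In this toy data the error jumps in the second week, which is the kind of change monitoring is meant to surface.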

Why should I be excited about vetiver?

Composability

  • Internally, with VetiverAPI and VetiverModel
  • Externally, leveraging the tools vetiver is built on

Ergonomics

  • feels good to use
  • works with the tools you like

vetiver.rstudio.com