Multiple models on one API

As someone who identifies as a Chicago Bears fan for life, I have some strong superstitious rituals on game day. In a completely hypothetical example, I might have one model that I feel must be used when the Bears have a home game, and a different model for any other day. (They always win when I use this model to write blog posts on game days, and I fear a replay of the 2016 season, when our win-loss record was 3-13, if I change my ways.)

Oftentimes you need to deploy more than one model. Even if the fate of your favorite NFL team doesn't depend on your models' locations, it's important to understand where these models should live. tl;dr:

- Input data is the same -> use one API
- Input data is not the same -> use multiple APIs

(These are not definitive rules. A lot of this depends on your architecture, how your deployment is set up, what works best for your organization, and so on. Also, vetiver allows you to break both of these rules!)

If your models are unrelated and the input data is different, you probably want to put them on different APIs. However, if the input data is the same for multiple different models, it might make sense to deploy them on the same API, but at different endpoints, as sketched below.
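As a rough sketch of what those two setups can look like with vetiver and plumber (model_a and model_b here are hypothetical placeholder vetiver_model objects, and each pr_run() call would live in its own R session):

library(plumber)
library(vetiver)

# multiple APIs: unrelated models with different input data,
# each on its own router and port
pr() %>% vetiver_pr_post(model_a) %>% pr_run(port = 8080)
pr() %>% vetiver_pr_post(model_b) %>% pr_run(port = 8081)

# one API: models that share the same input data,
# served from one router at different paths
pr() %>%
  vetiver_pr_post(model_a, path = "/model_a") %>%
  vetiver_pr_post(model_b, path = "/model_b") %>%
  pr_run(port = 8080)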
For our Chicago train ridership data, we ALWAYS want to predict ridership from the same parameters every time. However, if the data indicates it was a home game for the Chicago Bears, we want to use our lucky model.

Let's start by loading some data and using tidymodels to put all our preprocessing in one recipe.

library(tidymodels)

# loads the Chicago ridership data, along with `stations`,
# a vector of the station column names
data(Chicago)

chicago_small <- Chicago %>% slice(1:730) # two years of data

# date features, holiday indicators, dummy variables, normalization,
# and PCA on the station columns
chicago_rec <-
  recipe(ridership ~ ., data = Chicago) %>%
  step_date(date) %>%
  step_holiday(date, keep_original_cols = FALSE) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_predictors()) %>%
  step_pca(all_of(stations), num_comp = 4)
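If you want a quick look at what this preprocessing produces (just a sanity check; it is not needed for deployment), one option is to prep and bake the recipe:

chicago_rec %>%
  prep(training = chicago_small) %>%
  bake(new_data = NULL) %>%
  glimpse()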
We can then use tidymodels to make multiple different models quickly. Notice that I am putting my preprocessing recipe in the workflow with my model. This is intentional, and it is best practice for modeling.

Feature engineering is part of your model workflow. You should package it up with your model and deploy it as part of that workflow. In Python, this means using something like scikit-learn Pipelines; in R, it means using something like tidymodels workflows.
Let's build a few models. First, one for home games (support vectors for extra support at home):

tree_model <-
  svm_linear() %>%
  set_engine("LiblineaR") %>%
  set_mode("regression")

home_game_model <-
  workflow(chicago_rec, tree_model) %>%
  fit(chicago_small)
And one for days without a home game:

linear_model <-
  linear_reg() %>%
  set_engine("lm")

not_home_game_model <-
  workflow(chicago_rec, linear_model) %>%
  fit(chicago_small)
Next, I will make some deployable model objects, or vetiver_model objects, with the vetiver package.

library(vetiver)

home <- vetiver_model(home_game_model, "home")
not_home <- vetiver_model(not_home_game_model, "not_home")
Normally we would be iterating through models and possibly versioning them as well, but let’s skip that step and go right to deployment.
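For reference, versioning a vetiver_model typically means pinning it to a board with the pins package. A minimal sketch (using a temporary board purely for illustration; in practice this might be a folder, S3 bucket, or Posit Connect) could look like:

library(pins)

model_board <- board_temp(versioned = TRUE)
vetiver_pin_write(model_board, home)
vetiver_pin_write(model_board, not_home)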
I'll deploy these two models on ONE plumber API by using a combination of vetiver_pr_post() and vetiver_pr_docs() along with the path argument.
library(plumber)

pr() %>%
  vetiver_pr_post(home, path = "/home_game") %>%
  vetiver_pr_docs(home) %>%
  vetiver_pr_post(not_home, path = "/not_home_game") %>%
  vetiver_pr_docs(not_home) %>%
  pr_run(port = 5331) # match the port used in the prediction code below
Now, to make predictions, I will route data to each endpoint respectively. To do this on your own computer, you will have to run the above commands as a background job (or deploy them with Docker).
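One way to run the API in the background (a sketch, assuming everything needed to build home and not_home plus the pr() pipeline above is saved in a hypothetical file called run_api.R):

# in RStudio, launch the script as a background job
rstudioapi::jobRunScript("run_api.R")

# or start it in a separate R process with callr
api_job <- callr::r_bg(function() source("run_api.R"), supervise = TRUE)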
We can make predictions at each endpoint by sending it data. Of course, this can be as complex or simple as you desire, but here's the meat and potatoes of it.
home_endpoint <- vetiver_endpoint("http://127.0.0.1:5331/home_game")

home_data <- Chicago %>%
  filter(Bears_Home == 1) %>%
  tail(5)

predict(home_endpoint, home_data)

not_home_endpoint <- vetiver_endpoint("http://127.0.0.1:5331/not_home_game")

not_home_data <- Chicago %>%
  filter(Bears_Home == 0) %>%
  tail(5)

predict(not_home_endpoint, not_home_data)
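Under the hood, predict() on a vetiver_endpoint is just an HTTP POST of JSON to the API. An equivalent request made by hand (a sketch, assuming the same local port and the default row-wise JSON encoding) might look like:

library(httr)
library(jsonlite)

# POST the new data as a JSON array of rows to the /home_game endpoint
resp <- POST(
  "http://127.0.0.1:5331/home_game",
  body = toJSON(home_data),
  content_type_json()
)

# parse the returned predictions
fromJSON(content(resp, as = "text", encoding = "UTF-8"))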
We have now created an API with multiple models at various endpoints, and successfully interacted with them! This is a great start to making more complex MLOps workflows with vetiver.