import geopandas
import geodatasets
import pandas as pd
import numpy as np
from plotnine import *
cheese = pd.read_csv(
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-06-04/cheeses.csv"
)
country_lat_long = pd.read_csv(
    "./world_country_and_usa_states_latitude_and_longitude_values.csv"
)[["latitude", "longitude", "country"]]
Plotnine plot contest
Plotnine is running a plot contest! And it is not too late to enter: submissions close July 12!
For my submission¹, I’ll start by reading in cheese data from TidyTuesday, along with a CSV of latitudes and longitudes for different countries.
Next, some data manipulation to set us up with tidy data to plot.
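# Split comma-separated countries and milks into lists, then give each value its own row
cheese['country'] = cheese['country'].str.split(pat=', ')
cheese['milk'] = cheese['milk'].str.split(pat=', ')
cheese = cheese.explode('country').explode('milk')
# Most common milk per country (ties come back as several values, hence the explode below)
mode_milk = cheese.groupby('country')['milk'].agg(pd.Series.mode).reset_index()
# Number of rows per country, counted on the exploded frame
country_value = cheese["country"].str.split(pat=', ').explode().str.strip().value_counts().reset_index()
# Attach coordinates and the most common milk to each country
cheese_plot = country_value.merge(country_lat_long, how="left", on="country")
cheese_plot = cheese_plot.merge(mode_milk, how="left", on="country").explode('milk')
# The five countries with the highest counts get text labels on the map
top_countries = cheese_plot.sort_values("count", ascending=False).replace(["United Kingdom"], ["U.K."]).head(5)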
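To see what the split, explode, and mode steps above do, here is a small sketch on made-up rows. The toy frame, its cheese names, and the variable names are hypothetical and not taken from the TidyTuesday data.

```python
import pandas as pd

# Hypothetical toy rows shaped like the cheese data (not the real dataset).
toy = pd.DataFrame(
    {
        "cheese": ["A", "B", "C"],
        "country": ["France, Italy", "France", "Italy"],
        "milk": ["cow, goat", "cow", "goat"],
    }
)

# Turn the comma-separated strings into lists, then one row per list element.
toy["country"] = toy["country"].str.split(pat=", ")
toy["milk"] = toy["milk"].str.split(pat=", ")
tidy = toy.explode("country").explode("milk")

# Most common milk per country.
toy_mode = tidy.groupby("country")["milk"].agg(pd.Series.mode).reset_index()

# Rows per country after exploding, next to distinct cheeses per country.
rows_per_country = tidy["country"].value_counts()
cheeses_per_country = tidy.groupby("country")["cheese"].nunique()

print(toy_mode)
print(rows_per_country)
print(cheeses_per_country)
```

In this toy data France ends up with three rows but only two distinct cheeses, because cheese A lists two milks. Counting rows after exploding is therefore not quite the same as counting cheeses, which is worth keeping in mind when reading the circle sizes.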
We will also use geopandas to generate the map itself.
geodatasets.fetch("naturalearth land")
world_lowres = geopandas.read_file(
    "https://github.com/geopandas/geopandas/raw/v0.9.0/geopandas/datasets/naturalearth_lowres/naturalearth_lowres.shp"
)
world = geopandas.read_file(geodatasets.get_path("naturalearth land"))
Finally, let’s use plotnine to put it all together! The main pieces in play are four geoms, plus some extra layers to make the plot more readable. The geoms are a geom_map to generate the map, a geom_point to place each circle depicting the number and type of cheeses, and two geom_text elements for the country name and number of cheeses. The scale_size sets the size range of the points in geom_point, and scale_colour_brewer sets their colors. Finally, the theme is a combination of theme_void and custom theme and guides elements.
(
    ggplot()
    + geom_map(world_lowres, color="#474c53", fill="#d0d0d0", stroke="1")
    + geom_point(
        data=cheese_plot,
        mapping=aes(x="longitude", y="latitude", size="count", color="milk"),
        alpha=0.8,
    )
    + geom_text(
        top_countries,
        aes(x="longitude", y="latitude", label="country"),
        fontweight="bold",
        size=16,
    )
    + geom_text(
        top_countries,
        aes(x="longitude", y="latitude", label="count"),
        nudge_y=-4,
        size=16,
    )
    + labs(
        title="Say Cheese: A World Tour",
        subtitle="This map shows the number of cheese types per country and the most common milk used. This is nacho average cheese map!",
        caption="Data from cheese.com via the TidyTuesday project",
        color="Most common milk type used",
    )
    + theme_void()
    + scale_size(range=(3, 60), guide=None)
    + scale_colour_brewer(type="qual", palette="Accent")
    + theme(
        figure_size=(20, 12),
        legend_text_legend=element_text(size=14),
        legend_direction="vertical",
        legend_title=element_text(size=16),
        legend_position=(0.15, 0.35),
        plot_title=element_text(size=36, family="fantasy"),
        plot_subtitle=element_text(size=18),
        plot_caption=element_text(size=16),
    )
    + guides(colour=guide_legend(override_aes={"size": 8, "alpha": 1}))
)
/Users/isabelzimmerman/.pyenv/versions/3.11.4/envs/docker/lib/python3.11/site-packages/plotnine/layer.py:364: PlotnineWarning: geom_point : Removed 11 rows containing missing values.
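One plausible source of the warning above is the left merge: any country whose name has no match in the latitude/longitude CSV keeps its row but gets missing coordinates, and geom_point then drops it. A minimal sketch of how one might list such countries, using made-up stand-ins for country_value and country_lat_long (the frames, values, and the "Narnia" row are all hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for country_value: one row per country with a count.
counts = pd.DataFrame(
    {"country": ["France", "Italy", "Narnia"], "count": [3, 2, 1]}
)

# Hypothetical stand-in for country_lat_long: no entry for "Narnia".
lookup = pd.DataFrame(
    {
        "country": ["France", "Italy"],
        "latitude": [46.2, 41.9],
        "longitude": [2.2, 12.6],
    }
)

# indicator=True adds a "_merge" column recording where each row came from.
merged = counts.merge(lookup, how="left", on="country", indicator=True)

# Rows marked "left_only" had no match in the lookup, so their coordinates are NaN.
unmatched = merged.loc[merged["_merge"] == "left_only", "country"].tolist()
print(unmatched)
```

Running the same check on the real frames would show whether the removed rows come from spelling differences between the two sources, which could then be patched with a replace before merging.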
If you’re interested in entering the plot contest, but aren’t sure what data to use, here are a few options that look like a lot of fun:
Hope to see some of your plots!
cheers, isabel
Footnotes
Truthfully, I made this plot because I’m a data nerd with a weird idea of fun. Since I’m an employee of Posit, the company sponsoring this contest, I cannot actually enter. But you can! Prizes include fun swag, subscriptions to services to host your portfolio, and the priceless bragging rights of being an open source plotting champion 🏆