Recreating YouGov Plots in plotnine

Net favourability gap between Conservatives and Labour smallest in nearly four years: Political favourability ratings, August 2025

Published: September 27, 2025

Introduction

This document is a submission for Posit’s 2025 Plotnine Contest.

Purpose

This page includes attempted one-to-one recreations of the visualisations in “Net favourability gap between Conservatives and Labour smallest in nearly four years” in Python and plotnine, originally published by Dylan Difford, a Junior Data Journalist at YouGov, on August 19, 2025. It shows how many of YouGov’s effective data visualisations can be recreated using the plotnine Python library.

YouGov describes itself as:

…an international online research data and analytics technology group. Our mission is to offer unparalleled insight into what the world thinks.

At the core of our platform is an ever-growing source of connected consumer data that has developed daily over our 20 years of operation. We call it living data. All of our products and services draw upon this detailed understanding of our 29+ million registered panel members to deliver accurate, actionable consumer insights.

While YouGov collects and vends data relating to a myriad of topics, there are many core data visualisations which crop up time and time again in their online articles. The article chosen for recreation shows some nice examples of typical YouGov visualisations, including line and bar charts. If you would like to recreate YouGov articles yourself, the site makes it particularly easy; all data can be obtained using the “Get the data” buttons at the bottom of each plot in a YouGov web article.

On this page, some of the original YouGov plots are found in collapsible callout boxes for easy reference, though I would recommend reading the original YouGov article to get a feel for what we’re trying to recreate.

About Me

I am a senior consultant and data analyst working for an Environmental Consultancy in South Oxfordshire in the United Kingdom.

My work typically involves writing dynamic reports, creating effective data visualisations, authoring Shiny web apps, writing and maintaining R packages, facilitating training workshops, and otherwise writing code to do interesting things with data.

I’m a collaborator on the {openair} project and the lead developer on the {openairmaps} R package.

My main programming language is R and I’ve rarely used Python in anger, so consider this a warning that there will be some references to the R programming language throughout. If I’ve made some terrible Python faux pas anywhere in this document, please let me know!

Data Visualisation

Set-Up

We’re going to start by loading some packages:

  • We’ll do all of our data manipulation in polars - it’s fast and has an appealing syntax for an R user.

  • To help a bit, we’ll also import polars.selectors - it’s our tidyselect equivalent.

  • Naturally, we’ll import plotnine for our plots.

  • Finally, we’ll import datetime as we’ll be working with dates and will need to construct and manipulate them.

import polars as pl
import polars.selectors as cs
from plotnine import *
from datetime import datetime

We’re also going to define a function from the off. It takes a load of dates. The first time it sees a new year, it’ll create a month-year label. The next time it sees the same year, it’ll just create a month label. This works near enough the same as scales::label_date_short() in R, which I don’t believe is implemented in plotnine!

def month_labels(dates):
    labels = []
    seen_years = set()
    for d in dates:
        if d.year not in seen_years:
            # first label of a year: month, then the year on the line below
            labels.append(d.strftime("\n%b\n %Y"))
            seen_years.add(d.year)
        else:
            # later labels in the same year: month only, e.g. "Mar"
            labels.append(d.strftime("\n%b"))
    return labels
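
As a quick check of the behaviour described above, here is a small sketch that runs on its own. It repeats the function so nothing else needs to be loaded, and the dates are made up for illustration. In a plot, the function would presumably be handed to a date scale through its labels argument; treat that wiring as an assumption rather than something this snippet demonstrates.

```python
from datetime import datetime


def month_labels(dates):
    # Same logic as above, repeated so this snippet runs on its own
    labels = []
    seen_years = set()
    for d in dates:
        if d.year not in seen_years:
            labels.append(d.strftime("\n%b\n %Y"))
            seen_years.add(d.year)
        else:
            labels.append(d.strftime("\n%b"))
    return labels


# Made-up monthly breaks that span a change of year
breaks = [
    datetime(2024, 11, 1),
    datetime(2024, 12, 1),
    datetime(2025, 1, 1),
    datetime(2025, 2, 1),
]
labels = month_labels(breaks)

# Only the first break in each year carries the year
# (English month abbreviations assume the default locale)
print(labels)  # ['\nNov\n 2024', '\nDec', '\nJan\n 2025', '\nFeb']
```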

As previously noted, all data were obtained using the “Get the data” buttons at the bottom of each plot in the YouGov web article. Their default names are just random numbers and letters, so I’ve taken the liberty to rename them to be something more evocative.

Leader Favourability By Party (Bar Charts)

The next collection of plots comprises some quite appealing bar charts showing overall favourability for different national and international political figures by UK political affiliation. We might call these kinds of plots ‘small multiples’. On the face of it they look quite simple, but they have a lot of features that make recreation with plotnine (or even ggplot2) tricky:

  • You could create a facet_grid(), but the “row” labels are on top of each row, not to the left or right of it.

  • There are direct value labels, but whether they are to the right of the bar or left aligned to the whole plotting area depends on the value being presented.

  • When the direct value labels are on top of the bar, they need to be a colour with sufficient contrast to be read.

  • The plots have overall titles and, in one instance, a caption.

So, the strategy here is to actually create multiple individual plots and then assemble them together. plotnine has a basic implementation of some of the functionality of R’s patchwork, which lets you assemble plots with simple mathematical operators like +, | and /.

Once again we’ve defined a function. Some notes:

  • The is_green_only argument exists because there’s a specific plot which only contains two categories (2024 Green voters vs All Britons), which affects some of the internal plotting parameters like how much to nudge labels by and the colours to use.

  • We use multiple geom_text() calls - one for labels at or below a certain threshold that go to the right of the bar, and one for labels above that threshold that are left-aligned on the whole plot. In the former, the colour is a dark grey. For the latter, the colour needs to vary based on the political party, so it is mapped to a variable. We create a colour scale that’s purely for the text colour. This is done manually, but one could imagine writing a function to pick the “best contrast” for the party colours (in R I’d use prismatic::best_contrast() for this; I’m sure there’ll be a Python equivalent).

  • I’ve used coord_flip(). In ggplot2 the plot automatically detects the orientation of bars/boxplots/etc.; here I found I needed to have the continuous variable be on the “y” for geom_col() and then flip the coordinate after-the-fact.

  • The politician’s name is used as a subtitle, which will make sense when multiples of these plots are put together!
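
The “best contrast” idea mentioned above can be sketched in plain Python. This is a minimal sketch rather than a port of prismatic::best_contrast(): it assumes the WCAG relative-luminance and contrast-ratio formulas, and the helper names (relative_luminance, contrast_ratio, best_contrast) and the two candidate text colours are hypothetical choices for illustration.

```python
def _linear(channel):
    # Convert an 8-bit sRGB channel to linear light (WCAG definition)
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour):
    # Only the first six hex digits (RGB) are read; any alpha suffix is ignored
    h = hex_colour.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(colour_a, colour_b):
    lum_a, lum_b = relative_luminance(colour_a), relative_luminance(colour_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def best_contrast(background, candidates=("#ffffff", "#333333")):
    # Hypothetical helper: pick the candidate text colour that contrasts most
    return max(candidates, key=lambda c: contrast_ratio(background, c))


# A few of the fill colours used in the plotting function
fills = {
    "All Britons": "#9f29ff",
    "2024\nGreen": "#31caa8",
    "2024\nLabour": "#c20800",
}
text_colours = {group: best_contrast(fill) for group, fill in fills.items()}
print(text_colours)
```

A dictionary built this way could stand in for a hand-written text colour mapping passed to scale_color_manual(). It won’t necessarily agree with every hand-picked colour, since the original choices follow YouGov’s styling rather than a contrast formula.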

def plot_leader_favourability(data, who, is_green_only=False):
    plot_data = data.filter(pl.col("who") == who)

    # Set thresholds and colors based on plot type
    if is_green_only:
        text_threshold = 5
        text_y_pos = 1
        nudge_y = 1
        fill_colors = {"All Britons": "#9f29ff", "2024 Green voters": "#31caa8"}
        text_colors = {"All Britons": "white", "2024 Green voters": "#333333"}
    else:
        text_threshold = 30
        text_y_pos = 5
        nudge_y = 5
        fill_colors = {
            "All Britons": "#9f29ff",
            "2024\nGreen": "#31caa8",
            "2024\nLabour": "#c20800",
            "2024\nLib Dem": "#ffba22",
            "2024\nConservative": "#003cab",
            "2024\nReform UK": "#06a6ee",
        }
        text_colors = {
            "All Britons": "white",
            "2024\nGreen": "#333333",
            "2024\nLabour": "white",
            "2024\nLib Dem": "#333333",
            "2024\nConservative": "white",
            "2024\nReform UK": "white",
        }

    plot = (
        ggplot(plot_data, aes(y="value", x="what"))
        + geom_col(aes(y=100), fill="#3333330d")
        + geom_col(aes(fill="variable"))
        + geom_text(
            data=plot_data.filter(pl.col("value") > text_threshold),
            mapping=aes(label="value", y=text_y_pos, color="variable"),
            ha="left",
            size=8,
        )
        + geom_text(
            data=plot_data.filter(pl.col("value") <= text_threshold),
            mapping=aes(label="value", y="value"),
            ha="left",
            size=8,
            color="#333333",
            nudge_y=nudge_y,
        )
        + coord_flip()
        + facet_grid(cols="variable")
        + scale_y_continuous(limits=[0, 100])
        + scale_fill_manual(values=fill_colors)
        + scale_color_manual(values=text_colors)
        + theme_minimal()
        + theme(
            axis_text_x=element_blank(),
            panel_grid=element_blank(),
            legend_position="none",
            axis_text_y=element_text(ha="left"),
            strip_text=element_text(ha="left"),
            plot_subtitle=element_text(face="bold", size=12 / 1.5),
            plot_title_position="plot",
        )
        + labs(x="", y="", subtitle=who)
    )

    return plot

This plot compares the favourability of Keir Starmer with that of Jeremy Corbyn and Zarah Sultana, who run the fledgling left-wing “Your Party” and have both historically been on the Labour hard-left.

To construct the overall assembly title I created a dummy plot that is effectively only a title and subtitle, and added this to the plot assembly. Patchwork has the plot_annotation() function that doesn’t seem to be mirrored in plotnine yet, so this was the workaround I found!

your_party_leaders = (
    pl.read_csv("assets/data/yougov_leftwing.csv")  # forward slashes work on every OS
    .rename({"X.1": "who", "X.2": "what"})
    .unpivot(index=["who", "what"], on=["All Britons", cs.starts_with("2024")])
    .with_columns(pl.col("variable").str.replace("<br>", "\n"))
    .with_columns(
        pl.col("variable").cast(pl.Categorical),
        pl.col("what").cast(pl.Enum(["Unfavourable", "Favourable"])),
    )
)

(
    ggplot()
    + theme_void()
    + labs(
        title="How do Britons' attitudes towards Corbyn and Sultana\ncompare to their views on Starmer?",
        subtitle="Do you have a favourable or unfavourable opinion of the following? %",
    )
    + theme(
        plot_title=element_text(face="bold", size=22 / 1.5),
        plot_subtitle=element_text(face="light", size=12 / 1.5),
        plot_title_position="plot",
        aspect_ratio=0.001,
        plot_margin=0,
    )
) / plot_leader_favourability(
    your_party_leaders, "Jeremy Corbyn"
) / plot_leader_favourability(
    your_party_leaders, "Zarah Sultana"
) / plot_leader_favourability(
    your_party_leaders, "Keir Starmer"
)

This plot is much the same as the one for the domestic leaders, but shows various international political figures - President of Ukraine Volodymyr Zelenskyy, US President Donald Trump, US Vice President JD Vance, and President of Russia Vladimir Putin.

international_leaders = (
    pl.read_csv("assets/data/yougov_international.csv")  # forward slashes work on every OS
    .rename({"X.1": "who", "X.2": "what"})
    .unpivot(index=["who", "what"], on=["All Britons", cs.starts_with("2024")])
    .with_columns(pl.col("variable").str.replace("<br>", "\n"))
    .with_columns(
        pl.col("variable").cast(pl.Categorical),
        pl.col("what").cast(pl.Enum(["Unfavourable", "Favourable"])),
    )
)

(
    ggplot()
    + theme_void()
    + labs(
        title="YouGov international favourability ratings, August 2025",
        subtitle="Do you have a favourable or unfavourable opinion of the following? %",
    )
    + theme(
        plot_title=element_text(face="bold", size=22 / 1.5),
        plot_subtitle=element_text(face="light", size=12 / 1.5),
        plot_title_position="plot",
        aspect_ratio=0.001,
        plot_margin=0,
    )
) / plot_leader_favourability(
    international_leaders, "Volodymyr Zelenskyy (net +49)"
) / plot_leader_favourability(
    international_leaders, "Donald Trump (net -61)"
) / plot_leader_favourability(
    international_leaders, "JD Vance (net -55)"
) / plot_leader_favourability(
    international_leaders, "Vladimir Putin (net -86)"
)

At the time the YouGov article was written, a leadership election was taking place in the UK Green Party, our green, left-wing party, which has historically had little representation in the House of Commons. Britons were surveyed on their opinions of various Green leadership figures (and a fictitious candidate to gauge reflexive political opinions).

This plot is cut from the same cloth as the others but the overall layout is somewhat different, hence the need for the different parameters to control label placement and so on. Here we add an additional mock caption to explain the non-existent Andrew Farmer MP.

green_party_leaders = (
    pl.read_csv("assets/data/yougov_green.csv")  # forward slashes work on every OS
    .rename({"X.1": "who", "X.2": "what"})
    .unpivot(index=["who", "what"], on=["All Britons", cs.starts_with("2024")])
    .with_columns(
        pl.col("variable").cast(pl.Categorical),
        pl.col("what").cast(pl.Enum(["Don't know", "Unfavourable", "Favourable"])),
    )
)

(
    ggplot()
    + theme_void()
    + labs(
        title="Green leadership figures largely unknown, even by\nthose who have voted for the party",
        subtitle="Do you have a favourable or unfavourable opinion of the following? %",
    )
    + theme(
        plot_title=element_text(face="bold", size=22 / 1.5),
        plot_subtitle=element_text(face="light", size=12 / 1.5),
        plot_title_position="plot",
        aspect_ratio=0.001,
        plot_margin=0,
    )
) / plot_leader_favourability(
    green_party_leaders, "Carla Denyer", is_green_only=True
) / plot_leader_favourability(
    green_party_leaders, "Adrian Ramsay", is_green_only=True
) / plot_leader_favourability(
    green_party_leaders, "Zack Polanski", is_green_only=True
) / plot_leader_favourability(
    green_party_leaders, "Ellie Chowns", is_green_only=True
) / plot_leader_favourability(
    green_party_leaders, "Andrew Farmer*", is_green_only=True
) / (
    ggplot()
    + theme_void()
    + labs(
        caption="* Andrew Farmer is a fake politician, used to test how many respondents reflexively say they have an opinion of a\nnon-existent figure"
    )
    + theme(
        plot_caption=element_text(ha="left", face="light", size=12 / 1.5),
        plot_caption_position="plot",
        aspect_ratio=0.001,
        plot_margin=0,
    )
) + theme(figure_size=(7, 7))

Overall Senior Political Figure Favourability (Stacked Bars)

Frankly, this is where the wheels start falling off!

This is a big summary stacked bar chart which shows a Favourable, Unfavourable and “Don’t Know” score for a load of different political figures, both domestic and international. At first glance, you might assume we could do what we did for the previous bar charts. However, the height of each “subplot” would be different as the numbers of figures in each category are different. Adjusting the heights of the different subplots would be difficult, so we’re just going to have to use a facet function to get something approximating the original plot.

In recent versions of ggplot2 we could use facet_wrap() with one column of facets and a combination of the space and scales arguments to create something very similar to the YouGov plot. In older versions of ggplot2, we could use ggforce::facet_col() to achieve much the same thing. No similar functionality seems to exist in plotnine, so we’re going to stick with facet_grid() and live with having the subplot labels be on the right-hand side of the plot.

A couple of curious bugs or limitations of plotnine emerged when trying this:

  • I could not seem to left-align the facet labels if there were new lines (\n) within them.

  • Left-aligning the y-axis labels placed the start point in different places for the different facets. This isn’t ideal and makes the plot look less cohesive, and obviously doesn’t match YouGov’s formatting.

Anyway, the Python code below shares many of the same themes as the other plots, with some new strategies needed for this plot:

  • There’s a lot of pl.Enum() to ensure categories appear in the correct order; this is similar to using factor() in R. To order the leaders by favourability was quite round-the-houses; in R I’d have used forcats::fct_reorder() to do it in one line.

  • To achieve the correct value labels I used position_stack() with the vjust argument. Labels for very small values aren’t shown in the YouGov plot, so I blanked them out in the data passed to geom_text(). It’s not possible to use a nudge argument with position_stack(), and adding a value to the data directly will throw off position_stack()’s calculations, so I’ve cheated and just added a space (" ") to the beginning of the label.

  • The value axis extends to 101 (set through scale_y_continuous(), since the coordinates are flipped) - this isn’t a funky Python counting thing this time, though - the YouGov plot has this ragged edge where, I imagine, values have been rounded and now don’t always add up to 100.
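
To see why adding a nudge to the data upsets the stacked labels, the stacking arithmetic can be mimicked in plain Python. This is only a sketch of the idea (each segment starts at the running total of the ones before it, and vjust moves the label within its segment), not plotnine’s actual implementation; the function name and the numbers are made up for illustration.

```python
from itertools import accumulate


def stacked_label_positions(values, vjust=0.0):
    """Label position for each stacked segment (illustrative only)."""
    # Each segment starts where the previous ones end
    starts = [0, *accumulate(values)][:-1]
    # vjust=0 puts the label at the start of its segment, vjust=1 at the end
    return [start + vjust * value for start, value in zip(starts, values)]


# Hypothetical rounded shares for one figure; note they sum to 101, not 100
values = [30, 10, 61]

plain = stacked_label_positions(values)
nudged = stacked_label_positions([v + 2 for v in values])

print(sum(values))  # 101
print(plain)        # [0.0, 30.0, 40.0]
print(nudged)       # [0.0, 32.0, 44.0]
```

Because every nudge is carried into the running total, later labels drift further from the start of their own segment, which is why padding the label text with a space is the safer trick. The same toy numbers show how rounded shares can overshoot 100.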

# read the leaders dataset
leaders = (
    pl.read_csv("assets/data/yougov_politicians.csv")  # forward slashes work on every OS
    .rename({"X.1": "category", "X.2": "who"})
    .unpivot(["Favourable", "Don't know", "Unfavourable"], index=["category", "who"])
    .with_columns(
        # need to order the variable
        pl.col("variable").cast(pl.Enum(["Unfavourable", "Don't know", "Favourable"])),
        # Stack categories by replacing spaces with new lines, then order to match yougov plot
        pl.col("category")
        .str.replace_all(" ", "\n")
        .cast(
            pl.Enum(
                ["Party\nleaders", "Other\nsenior\npoliticians", "Political\nparties"]
            )
        ),
    )
)

# get the order of the leaders based on favourability
ordered_leaders = (
    leaders.filter(pl.col("variable") == "Favourable")
    .sort("value")
    .get_column("who")
    .to_list()
)

# order the leaders column by favourability
leaders = leaders.with_columns(pl.col("who").cast(pl.Enum(ordered_leaders)))

# construct plot
(
    ggplot(leaders, aes(y="value", x="who"))
    # geometries
    + geom_col(aes(fill="variable"))
    + geom_text(
        data=leaders.with_columns(
            pl.when(pl.col("value") <= 3)
            .then(pl.lit(""))
            .otherwise(" " + pl.col("value").cast(pl.String))
            .alias("value_str")
        ),
        mapping=aes(y="value", label="value_str", group="variable", color="variable"),
        position=position_stack(vjust=0),
        size=8,
        ha="left",
        show_legend=False,
    )
    # facet
    + facet_grid(rows="category", scales="free_y", space="free")
    # scales
    + coord_flip()
    + scale_fill_manual(
        values={
            "Favourable": "#9f29ff",
            "Don't know": "#ccd1db",
            "Unfavourable": "#ff412c",
        },
        breaks=["Favourable", "Don't know", "Unfavourable"],
    )
    + scale_color_manual(
        values={
            "Favourable": "white",
            "Don't know": "#333333",
            "Unfavourable": "white",
        }
    )
    + scale_y_continuous(limits=[0, 101])
    # themes
    + theme_minimal()
    + theme(
        axis_text_x=element_blank(),
        panel_grid=element_blank(),
        legend_position="top",
        axis_text_y=element_text(ha="left"),
        strip_text=element_text(ha="left", face="bold", rotation=0),
        plot_title=element_text(face="bold", size=22 / 1.5),
        plot_subtitle=element_text(face="light", size=12 / 1.5),
        plot_title_position="plot",
        plot_caption=element_text(ha="left", face="light", size=12 / 1.5),
        plot_caption_position="plot",
        panel_spacing=0.05,
        figure_size=(7, 7),
    )
    # labels
    + labs(
        x="",
        y="",
        fill="",
        title="YouGov political favourability ratings, August 2025",
        subtitle="Do you have a favourable or unfavourable opinion of the following? %",
        caption="* Andrew Farmer is a fake politician, used to test how many respondents reflexively say they have an opinion of a\nnon-existent figure",
    )
)

Now these limitations and shortcomings are somewhat frustrating and, just to get this as a “win”, I have also produced this final plot from the article in R’s ggplot2 below. Two points to note here, though:

  • plotnine is relatively new, so it’s no surprise it can’t do everything ggplot2 can. ggplot2 is old enough to drink in my country, whereas plotnine is only half-way through primary school, so it’s quite impressive that plotnine can get as close as it does on its own.

  • The ggplot2 implementation still doesn’t quite look right - the category labels aren’t aligned with the plot. You can get part of the way there with the arguably off-label move of passing a negative value to hjust in theme(strip.text), or could split the plots up and patchwork them back together (although you’d need to do the maths on the bar heights so they’re all consistent, and again that’s not convenient).

library(ggplot2)

readr::read_csv("assets/data/yougov_politicians.csv") |>
  dplyr::rename(category = X.1, who = X.2) |>
  tidyr::pivot_longer(-(1:2)) |>
  dplyr::mutate(
    favourable = ifelse(name == "Favourable", value, NA),
    who = forcats::fct_reorder(who, favourable),
    name = factor(name, c("Unfavourable", "Don't know", "Favourable")),
    category = factor(category, c("Party leaders", "Other senior politicians", "Political parties"))
  ) |>
  ggplot(aes(x = value, y = who)) +
  geom_col(aes(fill = name)) +
  geom_text(
    aes(label = ifelse(value < 5, "", paste(" ", value)), group = name, color = name),
    hjust = 0,
    position = position_stack(vjust = 0),
    show.legend = FALSE
  ) +
  theme_minimal() +
  theme(
    axis.text.y = element_text(hjust = 0),
    axis.text.x = element_blank(),
    strip.text = element_text(hjust = 0, face = "bold"),
    strip.clip = "off",
    panel.grid = element_blank(),
    legend.position = "top",
    legend.justification = "left",
    plot.title = element_text(size = 22, face = "bold"),
    plot.subtitle = element_text(size = 12), 
    plot.title.position = "plot",
    plot.caption = element_text(size = 10, hjust = 0, face = "italic"),
    plot.caption.position = "plot"
  ) +
  facet_wrap(vars(category), scales = "free_y", space = "free_y") +
  coord_cartesian(clip = "off") +
  scale_x_continuous(expand = expansion()) +
  labs(
    y = NULL,
    x = NULL,
    fill = NULL,
    title = "YouGov political favourability ratings, August 2025",
    subtitle = "Do you have a favourable or unfavourable opinion of the following? %",
    caption = "* Andrew Farmer is a fake politician, used to test how many respondents reflexively say they have an opinion of a non-\nexistent figure"
  ) +
  scale_fill_manual(
    values = c(
      "Favourable" = "#9f29ff",
      "Don't know" = "#ccd1db",
      "Unfavourable" = "#ff412c"
    ),
    breaks = c("Favourable", "Don't know", "Unfavourable")
  ) +
  scale_color_manual(
    values = c(
      "Favourable" = "white",
      "Don't know" = "#333333",
      "Unfavourable" = "white"
    ),
    breaks = c("Favourable", "Don't know", "Unfavourable")
  )

ggsave("assets/media/R_plot.png", width = 8, height = 10, dpi = 300, device = "png")

Wrap-up

Good news - that’s all of the plots! To close, I just want to give a few thoughts to summarise as an R user coming to plotnine and Python more generally:

  • The syntax of plotnine is truly almost identical to ggplot2, so it really is easy to move from one to the other. Yes, the odd argument is named slightly differently (e.g., ha and va to align text), but these instances are few and far between. Despite never really writing any proper Python in the past, between plotnine and polars it really wasn’t a struggle. Speaking of…

  • polars feels nice to use in a way pandas never quite did. It really does feel close to dplyr - it just, by necessity, lacks the convenience brought by non-standard evaluation in R (e.g., needing to say df.with_columns(pl.col("x")) rather than just mutate(x)).

  • One of the best things about ggplot2 is its extensibility: the ecosystem of custom stats, geoms, facets, and scales gives you an incredible range of options for building exactly the plots you need. I’m not yet sure how extensible plotnine is, but I imagine many of these capabilities will arrive in time.

  • Speaking as someone who works with a lot of weather data, I’d love coord_polar() to come to plotnine so I can make a wind rose, please! 🙂

  • When learning a new tool, the challenge isn’t just understanding how it works - it’s also figuring out what to do with it. Recreating existing visualisations, especially those based on freely available data (like YouGov’s), is a great way to cut through that problem. You have a clear objective, real data, and a concrete end result to aim for.