Getting Started with Python

Published

May 27, 2026


Pre-workshop prep

Please work through the following installations and configurations before attending the workshop:

1. Install Positron

Positron is the newest data science IDE (Integrated Development Environment) from Posit (the makers of RStudio) that combines the best features of RStudio (e.g. the data explorer, plots pane, etc.) with the flexibility and multi-language support of VS Code (a general-purpose IDE and a popular environment for writing Python code). Download and install Positron following these instructions.

You can expand the features and capabilities of Positron by installing extensions. It already comes bundled with those needed for this workshop (Quarto, Jupyter notebooks), but know that you can install additional extensions, as needed, from the Extensions menu in the Activity Bar.

2. Install Miniforge3

conda is both a package installer and environment manager, built for data science work. We’ll learn more about it (and use it!) during the workshop. conda can be installed via several distributions (i.e. bundled collections of software that are packaged and distributed together for easy installation). The most well-known, Anaconda, bundles thousands of pre-installed packages you may never use, and requires accepting a commercial license for organizational use. Miniforge3 is a free, minimal alternative that gives you conda without the bloat, so you install only what you need. Installing Miniforge3 means you’ll receive a base environment containing Python, the conda package manager, and their essential dependencies. Follow the appropriate instructions, below, to install Miniforge3:

macOS or Linux:

  1. Select the appropriate Miniforge3 release from https://conda-forge.org/download/ to download a shell script (.sh file) for installation.

  2. Run your shell script in your Positron Terminal to install Miniforge3. To do this:
    • Make sure you’re in your home directory. Check where you are on your computer by typing pwd (“print working directory”) in your Terminal. If you’re already in your home directory, it should return something like /Users/your-user-name. If you’re not in your home directory, you can jump there by typing cd ~ (cd stands for “change directory”, and ~ is a shortcut for your home directory).
    • Move into your Downloads folder, where your shell script currently lives, by typing cd Downloads in your Terminal.
    • Run your shell script by typing bash downloaded-file-name.sh (mine was named Miniforge3-Darwin-arm64.sh) in your Terminal.
  3. Follow the prompts in your Terminal to complete the installation:
    • continue pressing ENTER until you reach the end of the license
    • type yes to accept the license terms
    • make sure to install it to your home directory (e.g. /Users/samanthacsik/miniforge3) when prompted
    • if asked whether you want to update your shell profile to automatically initialize conda, type yes
Windows:

  1. Select the appropriate Miniforge3 release from https://conda-forge.org/download/ to download an executable (.exe file) for installation.

  2. Double click on the downloaded .exe to begin the installation process. Follow the prompts:
    • Click next on the setup window.
    • Install for All Users, then click next.
    • Check that the destination folder is listed as C:\ProgramData\miniforge3, then click next.
    • Make sure all three Advanced Installation Options are selected, then click Install.
    • Open up Miniforge Prompt.
    • Run conda init.
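
Once the installer has finished and you have opened a fresh terminal, one way to confirm that conda is discoverable is to look it up on your PATH. The snippet below is a minimal sketch using only the Python standard library; the helper name find_conda is made up for illustration and is not part of conda.

```python
import shutil
import sys


def find_conda():
    """Return the path to the conda executable on PATH, or None if not found."""
    return shutil.which("conda")


conda_path = find_conda()
if conda_path is None:
    print("conda was not found on PATH; try a new terminal or re-run `conda init`.")
else:
    print(f"conda found at: {conda_path}")

# The interpreter running this snippet (inside a conda environment,
# this path points into that environment's folder)
print(f"Python executable: {sys.executable}")
```

If conda is reported as missing right after installation, restarting Positron (so the Terminal picks up the updated shell profile) is usually the first thing to try.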

Intro slides

We’ll start by familiarizing ourselves with some important terminology and concepts. You can access the Google Slides below:

Let’s write & run some Python code

Exercise 0: Create a new local project

Let’s start by creating a local repository (project) to store our work.

  1. Open Positron and click on File > New Folder from Template… > Empty Project > give it a name (e.g. intro-python) and select where on your computer you’d like to save it to > click Create

  2. Create two folders inside your project named exercise1 and exercise2. Click on the Explorer pane (it should already be open) > hover your mouse over your repository name until you see the New Folder… button > add your first folder, then repeat for the second

  3. Download a .gitignore template file from GitHub. A .gitignore file is a plain text file used to tell Git which files or folders it should intentionally ignore and not track. While we won’t be using Git / GitHub to version control our files today, it’s helpful to be aware that the .gitignore template for Python projects looks different from the one for R projects. Run the following in your Positron Terminal:

Terminal
curl -O https://raw.githubusercontent.com/UCSB-MEDS/intro-to-python/refs/heads/main/.gitignore
Tip: cURL

cURL (i.e. Client URL) is an open-source command-line tool used by developers to transfer data to or from a server. It makes it easy to download a file from a URL over HTTP. We’ll use this again in the next exercise!
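
For comparison, roughly the same download can be done from Python itself. The following is a sketch using only the standard library; the download helper is a hypothetical name introduced here, and it mimics curl -O by reusing the file name from the end of the URL.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def download(url, dest_dir="."):
    """Download `url` into `dest_dir`, keeping the remote file name (like `curl -O`)."""
    # The last component of the URL path becomes the local file name
    filename = Path(urlparse(url).path).name
    destination = Path(dest_dir) / filename
    urlretrieve(url, destination)
    return destination
```

Calling download("https://raw.githubusercontent.com/UCSB-MEDS/intro-to-python/refs/heads/main/.gitignore") from the project root should save the same .gitignore file that the curl command fetches.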

Exercise 1: Run existing Python code

Sometimes you’ll receive Python code and an environment file (environment.yml or requirements.txt) from a collaborator (like the environment.yml shown below). In this exercise, you will reproduce an environment and run the accompanying Jupyter Notebook.

environment.yml is a text file that specifies a conda environment’s name, channels to download from, and package dependencies. It’s similar to the requirements.txt file that’s used with venv (a virtual environment manager used in conjunction with pip).

environment.yml
name: cmip6-env
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - intake
  - intake-esm
  - s3fs
  - xarray
  - zarr
  - matplotlib
  - numpy
  - pandas
  - jupyter
  - ipykernel
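
To make the structure of this file concrete, here is a small sketch that pulls the name, channels, and dependencies out of a simple environment.yml. It is deliberately not a general YAML parser (in practice a YAML library such as PyYAML's yaml.safe_load would be the usual choice); the function name read_environment and the shortened example file are assumptions made for illustration.

```python
def read_environment(text):
    """Extract name, channels, and dependencies from a simple environment.yml.

    Only understands top-level `key: value` lines and `- item` list entries.
    """
    env = {"name": None, "channels": [], "dependencies": []}
    current_list = None  # which list the following `- item` lines belong to
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("- "):
            if current_list is not None:
                env[current_list].append(line[2:].strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            if key == "name":
                env["name"] = value
                current_list = None
            elif key in ("channels", "dependencies"):
                current_list = key
            else:
                current_list = None
    return env


# Shortened version of the file above
ENVIRONMENT_YML = """\
name: cmip6-env
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - intake
  - xarray
"""

env = read_environment(ENVIRONMENT_YML)
print(env["name"])
print(env["channels"])
print(env["dependencies"])
```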
  1. Download the environment.yml and intro-to-python.ipynb from GitHub and save them to your project directory. Do this in the Terminal by running:
Terminal
# Download the environment.yml
curl -O https://raw.githubusercontent.com/UCSB-MEDS/intro-to-python/main/exercises/environment.yml

# Download the intro-to-python.ipynb notebook
curl -O https://raw.githubusercontent.com/UCSB-MEDS/intro-to-python/main/exercises/intro-to-python.ipynb
  • Once downloaded, you can drag and drop these files into your exercise1/ folder
  1. Create the cmip6-env conda environment from our downloaded environment.yml file:
Terminal
conda env create -f exercise1/environment.yml
  1. Activate the environment. You’ll want to make sure you’re in your correct environment (you should see (cmip6-env) at the start of your command line when cmip6-env is correctly activated) before installing packages:
Terminal
conda activate cmip6-env
  1. Register cmip6-env as a Jupyter kernel so it’s available as a notebook interpreter:
Terminal
python -m ipykernel install --user --name cmip6-env --display-name "Python (cmip6-env)"
  1. Open intro-to-python.ipynb (in Positron) and select cmip6-env as your interpreter. Click the interpreter selector in the top right of the Jupyter Notebook and choose Python (cmip6-env). You may need to restart Positron to find your kernel. Shut down Positron entirely > reopen Positron > reopen your project by clicking the folder button (top right corner of Positron) and selecting your project > open your interpreter.

  2. Run through all the cells in the Jupyter Notebook (they should run!).
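
If a cell fails with an import error, it can help to check which interpreter the notebook is actually using and whether the expected packages are visible to it. The following is a minimal sketch to run in a notebook cell or console; missing_modules is a hypothetical helper, and the list of import names is an assumption based on the environment.yml above (import names can differ from conda package names).

```python
import importlib.util
import os
import sys


def missing_modules(module_names):
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# e.g. the intake-esm package is imported as intake_esm
expected = ["intake", "intake_esm", "s3fs", "xarray", "zarr", "matplotlib", "numpy", "pandas"]

print(f"Interpreter: {sys.executable}")
# CONDA_DEFAULT_ENV is set by `conda activate`; it may be absent in some sessions
print(f"Active conda environment: {os.environ.get('CONDA_DEFAULT_ENV', 'none detected')}")
print(f"Missing modules: {missing_modules(expected) or 'none'}")
```

If the interpreter path does not point into the cmip6-env folder, the notebook is most likely still attached to a different kernel.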

Exercise 2: Translating R to Python

Let’s say you already have some R code for reproducing the US Drought Monitor’s “Drought area in California” visualization, and you’d like to translate it to Python:

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                                    setup                                 ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#..........................load packages.........................
library(tidyverse)

#..........................import data...........................
drought <- read_csv(here::here("data", "drought.csv"))

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                            wrangle drought data                          ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

drought_clean <- drought |>
  
  # pivot table to be in tidy form ----
  pivot_longer(cols = none:d4, names_to = "drought_lvl", values_to = "area_pct") |>
  
  # select cols of interest & update names for clarity (as needed) ----
  select(start_date, state_abb, drought_lvl, area_pct) |> 
  
  # coerce start_date to date ----
  mutate(start_date = mdy(start_date)) |> 

  # add drought level conditions names ---- 
  mutate(drought_lvl_long = factor(drought_lvl,
                                   levels = c("d4", "d3", "d2", "d1", "d0", "none"),
                                   labels = c("D4 (Exceptional)", "D3 (Extreme)",
                                       "D2 (Severe)", "D1 (Moderate)", 
                                       "D0 (Abnormally Dry)", 
                                       "No Drought"))) |>
  
  # reorder cols ----
  relocate(start_date, state_abb, drought_lvl, drought_lvl_long, area_pct) |>

  # remove drought_lvl "none" & filter for just CA ----
  filter(drought_lvl != "none",
         state_abb == "CA")

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##       create stacked area plot of CA drought conditions through time     ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
# initialize ggplot ----
ggplot(drought_clean, mapping = aes(x = start_date, y = area_pct, fill = drought_lvl_long)) +
  
  # reverse order of groups so level D4 is closest to x-axis ----
  geom_area(position = position_stack(reverse = TRUE)) +
  
  # update colors to match US Drought Monitor ----
  # (colors identified using ColorPick Eyedropper extension on the original USDM data viz) 
  scale_fill_manual(values = c("#853904", "#FF0000", "#FFC100", "#FFD965", "#FFFF00")) +
  
  # set x-axis breaks & remove padding between data and x-axis ----
  scale_x_date(breaks = scales::breaks_pretty(n = 13),
               limits = as.Date(c("2000-01-01", "2026-12-31")),
               expand = c(0,0)) +

  # set y-axis breaks & remove padding between data and y-axis & convert values to percentages ----
  scale_y_continuous(breaks = seq(0, 100, by = 10),
                     expand = c(0, 0),
                     labels = scales::label_percent(scale = 1)) +
  
  # add title ----
  labs(title = "Drought area in California") +

  # set theme minimal (includes major/minor grid lines, no axes) ----
  theme_minimal() +
  
  # fine-tune adjustments to plot theme ----
  theme(
    
    # update axis lines & ticks color ----
    axis.line = element_line(color = "#5A9CD6"),
    axis.ticks = element_line(color = "#5A9CD6"),
    
    # adjust length of axis ticks ----
    axis.ticks.length = unit(.2, "cm"),
    
    # center plot title ----
    plot.title = element_text(hjust = 0.5, color = "#686868", size = 20,
                              margin = margin(t = 10, r = 0, b = 15, l = 0)),
    
    # remove axis & legend titles ----
    axis.title = element_blank(),
    legend.title = element_blank(),
    
    # axis text color & size ----
    axis.text = element_text(color = "#686868", size = 10),
    legend.text = element_text(color = "#686868", size = 10),
    
    # move legend below plot ----
    legend.position = "bottom",
    legend.direction = "horizontal",
    legend.key.width = unit(0.4, "cm"),
    legend.key.height = unit(0.25, "cm"),
    
    # update plot background color & plot margins ----
    plot.background = element_rect(color = "#686868"),
    plot.margin = margin(t = 10, r = 40, b = 10, l = 40)
  )

Stacked area chart of drought area in CA by severity from 2000-2026.

  1. Download the necessary data and save it to exercise2/data/ (you’ll need to create a data/ folder inside exercise2/, then drag and drop drought.csv into it):
Terminal
curl -O https://raw.githubusercontent.com/UCSB-MEDS/intro-to-python/refs/heads/main/data/drought.csv
  1. Let’s try to use claude.ai (or your preferred GenAI chatbot) to translate our R code into Python code. Try sharing the R code, above, along with the prompt,

Translate this R code into Python code using pandas for the data wrangling and plotnine for the data viz

Tip: Tell your AI tool which Python packages you want to use

Check out this slide for Python equivalents to your favorite R packages.

  1. Add a new Python script named drought.py to your exercise2/ folder and copy the AI-generated code into it. Yours may look a bit different from ours (and everyone else’s around you).
import pandas as pd
from plotnine import *

# ── Load data ──────────────────────────────────────────────────────────────────
drought = pd.read_csv("data/drought.csv")

# ── Wrangle ────────────────────────────────────────────────────────────────────
drought_clean = (
    drought
    # pivot to tidy form
    .melt(
        id_vars=[c for c in drought.columns if c not in ["none", "d0", "d1", "d2", "d3", "d4"]],
        value_vars=["none", "d0", "d1", "d2", "d3", "d4"],
        var_name="drought_lvl",
        value_name="area_pct",
    )
    # select cols of interest
    [["start_date", "state_abb", "drought_lvl", "area_pct"]]
    # coerce start_date to date
    .assign(
        start_date=lambda df: pd.to_datetime(df["start_date"]),
        drought_lvl_long=lambda df: pd.Categorical(
            df["drought_lvl"],
            categories=["d4", "d3", "d2", "d1", "d0"],
            ordered=True,
        ).rename_categories({
            "d4": "D4 (Exceptional)",
            "d3": "D3 (Extreme)",
            "d2": "D2 (Severe)",
            "d1": "D1 (Moderate)",
            "d0": "D0 (Abnormally Dry)",
        }),
    )
    # filter out "none" drought level and keep only CA
    .query("drought_lvl != 'none' and state_abb == 'CA'")
    .dropna(subset=["drought_lvl_long"])
)

# ── Plot ───────────────────────────────────────────────────────────────────────
(
    ggplot(drought_clean, aes(x="start_date", y="area_pct", fill="drought_lvl_long"))
    + geom_area(position=position_stack(reverse=True))
    + scale_fill_manual(
        values=["#853904", "#FF0000", "#FFC100", "#FFD965", "#FFFF00"]
    )
    + scale_x_datetime(
        breaks=date_breaks("2 years"),
        date_labels="%Y",
        limits=(pd.Timestamp("2000-01-01"), pd.Timestamp("2026-12-31")),
        expand=(0, 0),
    )
    + scale_y_continuous(
        breaks=range(0, 101, 10),
        expand=(0, 0),
        labels=lambda lst: [f"{v}%" for v in lst],
    )
    + labs(title="Drought area in California")
    + theme_minimal()
    + theme(
        axis_line=element_line(color="#5A9CD6"),
        axis_ticks=element_line(color="#5A9CD6"),
        axis_ticks_length=0.2,
        plot_title=element_text(ha="center", color="#686868", size=20,
                                margin={"t": 10, "b": 15}),
        axis_title=element_blank(),
        legend_title=element_blank(),
        axis_text=element_text(color="#686868", size=10),
        legend_text=element_text(color="#686868", size=10),
        legend_position="bottom",
        legend_direction="horizontal",
        legend_key_width=12,
        legend_key_height=8,
        plot_background=element_rect(color="#686868"),
        plot_margin=0.1,
    )
)
  1. Create a new conda environment called drought-env with Python 3.11. Newer Python releases are available, but some packages commonly used for environmental data science work can lag behind in supporting the latest release, so we recommend using 3.11 to avoid compatibility issues:
Terminal
conda create -n drought-env python=3.11
  1. Activate your drought-env environment:
Terminal
conda activate drought-env
  1. Install the necessary packages to your drought-env from the conda-forge channel (reminder: conda-forge is a large, community-maintained channel of conda packages, including many Python libraries):
Terminal
conda install -c conda-forge pandas plotnine
  1. Optionally, export your environment to an environment.yml file so others can reproduce it. Use the --from-history flag to export only the packages you explicitly installed, rather than all of their dependencies. This makes the file more portable across operating systems; instead of locking in your machine’s specific dependencies, it lets conda resolve the right ones for each user’s OS.
Terminal
conda env export --from-history > exercise2/environment.yml
  1. Restart Positron, then select your interpreter by clicking the Interpreter Picker (top right corner of Positron) > click New Console Session… > choose the Python 3.11.14 (Conda: drought-env) interpreter.

  2. Try running your drought.py script. Your code will likely not run without some manual updates. Things to try / look out for:

  • it’s helpful to run your code line-by-line
  • make sure the file path for reading in your data is correct
  • check out your df in Positron’s data viewer by running %view drought_clean in your console
  • try commenting out sections of code to get things running (even if it’s not perfect)
  • ask AI to help interpret error messages
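
On the file path point: a relative path such as "data/drought.csv" is resolved against the current working directory, not against the folder where the script lives. The sketch below shows one way to inspect this with pathlib; the data_file helper and the exercise2/drought.py location are assumptions for illustration.

```python
from pathlib import Path


def data_file(script_path, filename="drought.csv"):
    """Build the path to data/<filename> in the same folder as the given script."""
    return Path(script_path).parent / "data" / filename


# Relative paths are resolved against the current working directory
print(f"Working directory: {Path.cwd()}")

# Assumed location of the script, relative to the project root
csv_path = data_file("exercise2/drought.py")
print(f"Looking for: {csv_path}")
print(f"Exists from here? {csv_path.exists()}")
```

When a script is run as a file, passing __file__ to a helper like this anchors the path to the script's own folder; __file__ is not defined when running code line-by-line in a console, so there the working directory is what matters.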
import pandas as pd
from plotnine import *

# ── Load data ──────────────────────────────────────────────────────────────────
drought = pd.read_csv("exercise2/data/drought.csv")  # (1)

# ── Wrangle ────────────────────────────────────────────────────────────────────
drought_clean = (
    drought
    # pivot to tidy form
    .melt(
        id_vars=[c for c in drought.columns if c not in ["none", "d0", "d1", "d2", "d3", "d4"]],
        value_vars=["none", "d0", "d1", "d2", "d3", "d4"],
        var_name="drought_lvl",
        value_name="area_pct",
    )
    # select cols of interest
    [["start_date", "state_abb", "drought_lvl", "area_pct"]]
    # coerce start_date to date
    .assign(
        start_date=lambda df: pd.to_datetime(df["start_date"]),
        drought_lvl_long=lambda df: pd.Categorical(
            df["drought_lvl"],
            categories=["d4", "d3", "d2", "d1", "d0"],
            ordered=True,
        ).rename_categories({
            "d4": "D4 (Exceptional)",
            "d3": "D3 (Extreme)",
            "d2": "D2 (Severe)",
            "d1": "D1 (Moderate)",
            "d0": "D0 (Abnormally Dry)",
        }),
    )
    # filter out "none" drought level and keep only CA
    .query("drought_lvl != 'none' and state_abb == 'CA'")
    .dropna(subset=["drought_lvl_long"])
)

# ── Plot ───────────────────────────────────────────────────────────────────────
(
    ggplot(drought_clean, aes(x="start_date", y="area_pct", fill="drought_lvl_long"))
    + geom_area(position=position_stack(reverse=True))
    + scale_fill_manual(
        values=["#853904", "#FF0000", "#FFC100", "#FFD965", "#FFFF00"]
    )
    # + scale_x_datetime(  # (2)
    #     breaks=date_breaks("2 years"),
    #     date_labels="%Y",
    #     limits=(pd.Timestamp("2000-01-01"), pd.Timestamp("2026-12-31")),
    #     expand=(0, 0),
    # )
    + scale_y_continuous(
        breaks=range(0, 101, 10),
        expand=(0, 0),
        labels=lambda lst: [f"{v}%" for v in lst],
    )
    + labs(title="Drought area in California")
    + theme_minimal()
    + theme(
        axis_line=element_line(color="#5A9CD6"),
        axis_ticks=element_line(color="#5A9CD6"),
        axis_ticks_length=0.2,
        plot_title=element_text(ha="center", color="#686868", size=20,
                                margin={"t": 10, "b": 15}),
        axis_title=element_blank(),
        legend_title=element_blank(),
        axis_text=element_text(color="#686868", size=10),
        legend_text=element_text(color="#686868", size=10),
        legend_position="bottom",
        legend_direction="horizontal",
        legend_key_width=12,
        legend_key_height=8,
        plot_background=element_rect(color="#686868"),
        plot_margin=0.1,
    )
)
(1) Updated the file path (our working directory is the project root, rather than where drought.py lives).
(2) Running the code line-by-line reveals a NameError (name 'date_breaks' is not defined) when we add in these lines of code. Asking claude.ai to explain the error helps us understand that date_breaks isn’t exported by from plotnine import *, but rather lives in mizani, the underlying scales/formatting dependency. mizani is normally installed alongside plotnine; if it is missing, it can be installed via conda install -c conda-forge mizani. For now, we chose to comment out these lines of code to first get a working plot.
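
The underlying behavior is general to Python: a star import only binds the names that the imported package itself exports, not names defined in its dependencies. Here is a standard-library-only sketch of that behavior, using json and collections purely as stand-ins for plotnine and mizani.

```python
# Star-import json into a fresh namespace, standing in for `from plotnine import *`
namespace = {}
exec("from json import *", namespace)

# json exports dumps, so the star import binds it
has_dumps = "dumps" in namespace
print(has_dumps)

# OrderedDict lives in collections, not json, so the star import does not bind it
has_ordered_dict_before = "OrderedDict" in namespace
print(has_ordered_dict_before)

# An explicit import from the module that defines the name makes it available
exec("from collections import OrderedDict", namespace)
has_ordered_dict_after = "OrderedDict" in namespace
print(has_ordered_dict_after)
```

By analogy, the fix for the plot would be an explicit import of date_breaks from mizani (something like from mizani.breaks import date_breaks, though the exact name is worth checking against the documentation for your installed mizani version).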

Stacked area chart of drought area in CA by severity from 2000-2026.

There are certainly a few more updates we should make to polish up this data viz, including:

  • setting the x-axis breaks to something a bit cleaner (ideally, years)
  • making sure the legend doesn’t spill outside of our plot

We don’t have time to do that today, but we encourage you to give it a try yourself!

Contribute

Have suggestions for improving these materials? File an issue on GitHub.