Notes From The Dork Web

Modeling ERA5 Climate Data In Python

Getting up and running

I know conda is the new hotness but I prefer venv and pip, so make sure you have those installed first:

python -m venv env
. venv/bin/activate
pip3 install xarray dask netCDF4 bottleneck cartopy proplot numpy scipy notebook cdsapi
jupyter notebook

Jupyter notebook should now show up in your browser. Before we can use the CDS API we need to get an account at the Copernicus Climate Data Store. Once signed up, get the API key and secret from the user profile page and create ~/.cdsapirc in the following format:

url: https://cds.climate.copernicus.eu/api/v2
key: UID:APIKEY
verify: 0

Replace UID with the API UID and APIKEY with your API key value.

In the notebook create cells to store the imports and config, a get_era5 method (taken from here), and some code to generate chart data.

First, the imports:

import cdsapi
import xarray as xr
from urllib.request import urlopen
import urllib3
import proplot as plot 
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import cartopy.crs as ccrs
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

Then get_era5():

def get_era5(dataset_name='reanalysis-era5-single-levels', 
             var=None, 
             dates=None,
             pressure_level=None,
             grid=[1.0, 1,0],
             area=[90, -180, -90, 180],
             download_flag = False,
             download_file='./output.nc'
            ):
    ''' Get ERA5 reanalysis output from the web
    this script grabs ERA5 variables from the web and stores them 
    in an xarray dataset. 

    the ERA5 CDS API must be installed on the local machine.
    See section 4 here: https://cds.climate.copernicus.eu/api-how-to

    Parameters
    ----------                  
    dataset_name: str, default 'reanalysis-era5-single-levels'
        name of dataset to use. Options include:
        * 'reanalysis-era5-single-levels'
        * 'reanalysis-era5-single-levels-monthly-means'
        * 'reanalysis-era5-pressure-levels'
        * 'reanalysis-era5-pressure-levels-monthly-means'
        * 'reanalysis-era5-land'
        * 'reanalysis-era5-land-monthly-means'

    dates: list of strings or datetime64, default None
        example ['1980-01-01', '2020-12-31']
    var: str, default None
        name of variable to download
        example '2m_temperature'
    pressure_level: str, default None
        pressure level to grab data on
    grid: list, deafult [1.0, 1.0]
        spatial lat, lon grid resolution in deg
    area: list, default [90,-180,-90, 180]
        area extent download [N, W, S, E]
    download_flag = True or False, default False
        flag to download data or not
    download_file= str, default './output.nc'
        path to where data should be downloaed to.
        data only downloaded if download_flag is True
    Returns
    -------
    ds: xarrayDataSet
        all the data will be in an xarray dataset

    Example
    -------
    ds = get_era5(dataset_name='reanalysis-era5-single-levels-monthly-means', 
                 var='2m_temperature', 
                 dates=['2021-02-01'],
                 grid=[0.25, 0.25])

    Notes
    -------    
    # cdsapi code is here
    https://github.com/ecmwf/cdsapi/tree/master/cdsapi
    # information on api is here
    https://confluence.ecmwf.int/display/CKB/Climate+Data+Store+%28CDS%29+API+Keywords
    # era5 dataset information is here
    https://confluence.ecmwf.int/display/CKB/The+family+of+ERA5+datasets
    '''
    import cdsapi
    import xarray as xr
    import pandas as pd
    from urllib.request import urlopen

    # test if acceptable pressure level
    acceptable_pressures = [1, 2, 3, 5, 7, 10, 20, 30, 50, 70, range(100, 1000, 25)]
    if pressure_level not in [str(lev) for lev in acceptable_pressures]:
        print(f"!! Pressure level must be in this list: {acceptable_pressures}")

    # start the cdsapi client
    c = cdsapi.Client()

    # parameters
    params = dict(
        format = "netcdf",
        product_type = "reanalysis",
        variable = var,
        grid = grid,
        area = area,
        date = list(dates.strftime('%Y-%m-%d %H:%M')) \
               if isinstance(dates, pd.core.indexes.datetimes.DatetimeIndex)\
               else dates,
        )

    # what to do if asking for monthly means
    if dataset_name in ["reanalysis-era5-single-levels-monthly-means", 
                        "reanalysis-era5-pressure-levels-monthly-means",
                        "reanalysis-era5-land-monthly-means"]:
        params["product_type"] = "monthly_averaged_reanalysis"
        _ = params.pop("date")
        params["time"] = "00:00"

        # if time is in list of pandas format
        if isinstance(dates, list):
            dates_pd = pd.to_datetime(dates)
            params["year"] = sorted(list(set(dates_pd.strftime("%Y"))))
            params["month"] = sorted(list(set(dates_pd.strftime("%m"))))
        else:
            params["year"] = sorted(list(set(dates.strftime("%Y"))))
            params["month"] = sorted(list(set(dates.strftime("%m"))))


    # if pressure surface
    if dataset_name in ["reanalysis-era5-pressure-levels-monthly-means",
                        "reanalysis-era5-pressure-levels"]:
        params["pressure_level"] = pressure_level

    # product_type not needed for era5_land
    if dataset_name in ["reanalysis-era5-land"]:
        _ = params.pop("product_type")

    # file object
    fl=c.retrieve(dataset_name, params) 

    # download the file 
    if download_flag:
        fl.download(f"{download_file}")

    # load into memory and return xarray dataset
    with urlopen(fl.location) as f:
        return xr.open_dataset(f.read())

Lets build a test plot using get_era5():

ds_out = get_era5(dataset_name='reanalysis-era5-single-levels-monthly-means', 
                 var='2m_temperature', 
                 dates=['2022-05-01'],
                 grid=[0.25, 0.25])

fig = plt.figure(figsize=[12,5])

# 111 means 1 row, 1 col and index 1
ax = fig.add_subplot(111, projection=ccrs.PlateCarree(central_longitude=0))

ds_out['t2m'].plot(ax=ax, cmap='jet',
                   transform=ccrs.PlateCarree())
ax.coastlines()

plt.show()

You should get a plate carree projection map plot in Jupyter. Now we can iterate through all of the years to generate data:

stub = "-05-01"
date_range = [f"{x}{stub}" for x in range(1940,2023)]

for d in date_range:
    ds_out = get_era5(dataset_name='reanalysis-era5-single-levels-monthly-means', 
                 var='2m_temperature', 
                 dates=[d],
                 grid=[0.25, 0.25])

    fig = plt.figure(figsize=[12,5])
    ax = fig.add_subplot(111, projection=ccrs.PlateCarree(central_longitude=0))
    ds_out['t2m'].plot(ax=ax, cmap='jet',
                   transform=ccrs.PlateCarree())
    ax.coastlines()
    plt.savefig(f"./{d}-2m-monthly-mean.png")

You might want to modify that to include 2023 (as it only goes up to 2022 on my machine). I generated 2023 by hand, separately.

Doing Something With The Plots

The plots are quite big, so copy them into a sub-folder (I used sm) and cd into it. You can scale the images to something more reasonable with mogrify. It is possible to set size and dpi with matplotlib but I prefer to have large images I scale down than small images I need to scale up.

mogrify -path . -resize 10% *.png

Now you can view the images with feh, or your favourite viewer and use left and right to cycle through.

feh -F *.png

Making An Animation

It's great we have the images, but lets make an animation. I don't like the framerate to be too high to spot changes in individual years. I used ffmpeg to tie everything together.

ffmpeg -f image2 -r 5 -pattern_type glob  -i "*.png" -vcodec libx264 -crf 22 vid.mp4

The resulting vid.mp4 file can be played with ffplay, vlc, or your favourite media player. Areas of note:

Each of these may have climate inputs or outputs represented in the ERA5 data. Explore these to learn more.