Climatic data - Scots pine

Author

Juliette Archambeau

Published

October 13, 2023

For our study, we need climatic data at the location of the studied populations:

for a reference period (representing the climates under which the populations have evolved, or the climates under which the sampled trees were grown). It has to be before temperatures have started to increase with climate change.
for a future time period for predictions.

In this document, I try to understand the differences among climatic datasets from the CEDA archive.

1 HadUK-Grid gridded and regional average climate observations for the UK

HadUK-Grid Gridded Climate Observations on a 1km grid over the UK, v1.2.0.ceda (1836-2022)

“HadUK-Grid is a collection of gridded climate variables derived from the network of UK land surface observations. The data have been interpolated from meteorological station data onto a uniform grid to provide complete and consistent coverage across the UK. The datasets cover the UK at 1 km x 1 km resolution. These 1 km x 1 km data have been used to provide a range of other resolutions and across countries, administrative regions and river basins to allow for comparison to data from UKCP18 climate projections. The dataset spans the period from 1836 to 2022, but the start time is dependent on climate variable and temporal resolution. The gridded data are produced for daily, monthly, seasonal and annual timescales, as well as long term averages for a set of climatological reference periods. Variables include air temperature (maximum, minimum and mean), precipitation, sunshine, mean sea level pressure, wind speed, relative humidity, vapour pressure, days of snow lying, and days of ground frost.”

This dataset is a 1 × 1 km grid on the British National Grid projection EPSG:27700 (Hollis et al. 2019).

Downloaded here: https://data.ceda.ac.uk/badc/ukmo-hadobs/data/insitu/MOHC/HadOBS/HadUK-Grid/v1.2.0.ceda/1km

More information about each variable can be found here: https://www.metoffice.gov.uk/research/climate/maps-and-data/data/haduk-grid/datasets.

We downloaded the 30-year long-term averages (1981-2010) of the seasonal and annual data.

We first extract the climatic values from the NetCDF files of the HadUK grid at the location of the populations.

Code

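# Packages used in the code of this document (assumed to be loaded beforehand)
library(tidyverse) # dplyr, tidyr, stringr, readr, ggplot2
library(magrittr)  # set_colnames
library(sp)        # SpatialPoints, CRS, proj4string
library(raster)    # raster, stack, extract
library(terra)     # vect, project, crds
library(here)      # project-relative paths

# We extract the variable names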
var_names <- list.files(path=here::here("data/ScotsPine/ClimateData/HadUKGrid_v120ceda_1km/")) %>% 
  str_subset("ann") %>% 
  str_sub(1,-43)

# Population coordinates in WGS84
pop_coord <- read_csv(file=here::here("data/ScotsPine/population_coordinates.csv"))

# Reproject the population coordinates in EPSG:27700
pop_coord_epsg27700 <- pop_coord %>% 
  dplyr::select(Longitude,Latitude) %>% 
  sp::SpatialPoints(proj4string = CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")) %>% 
  terra::vect() %>% 
  terra::project("EPSG:27700") %>% 
  terra::crds() %>% 
  as_tibble()

# Function to extract the climatic values from the nc files of the HadUK grid
extract_clim_values_from_nc_files <- function(x){
  # We extract the annual values
  rast <- raster(here::here(paste0("data/ScotsPine/ClimateData/HadUKGrid_v120ceda_1km/",x,"_hadukgrid_uk_1km_ann-30y_198101-201012.nc")), 
                 varname=x)
  proj4string(rast) <- CRS("+init=EPSG:27700")
  ann_values <- raster::extract(rast,pop_coord_epsg27700)

  # We extract the seasonal values
  rast <- stack(here::here(paste0("data/ScotsPine/ClimateData/HadUKGrid_v120ceda_1km/",x,"_hadukgrid_uk_1km_seas-30y_198101-201012.nc")), 
                varname=x)
  proj4string(rast) <- CRS("+init=EPSG:27700")
  seas_values <- raster::extract(rast,pop_coord_epsg27700) %>% 
    set_colnames(c("winter","spring","summer","autumn")) %>% 
    as_tibble()

  # We combine the seasonal and annual values and prefix the column names with the variable name
  seas_values %>% 
    mutate(annual=ann_values) %>% 
    set_colnames(str_c(x,"_",colnames(.)))
}

# We merge the climatic data with the population codes and coordinates
dfclim <- lapply(var_names, function(x) extract_clim_values_from_nc_files(x)) %>% 
  bind_cols() %>% 
  bind_cols(pop_coord,.)

We can map the spatial variation for each climatic variable, e.g. sunshine duration during summer (hours).

Code

dfclim %>% make_spatialpoints_map(var="sun_summer",ggtitle="Sunshine duration during summer (hours)")

We can also look at the distribution of the climatic variables.

Code

p <- dfclim %>% 
  dplyr::select(-PopulationCode,-contains("itude")) %>% 
  pivot_longer(everything(),names_to="variable") %>% 
  ggplot(aes(x=value)) +  
  geom_histogram(aes(y=after_stat(density)), colour="blue",fill="white",bins = 34) +
  geom_density(alpha=.2,fill="pink") +
  facet_wrap(~variable,scales="free") + 
  theme_bw() 

p %>% ggsave(filename=here::here("figs/ExploratoryAnalyses/DistributionClimaticVariables.pdf"),
               width=21,height=15)

p

Code

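# Generate a PCA on the climatic variables only
# (the population codes and the coordinates are removed, as for the histograms)
pca <- dfclim %>% 
  dplyr::select(-PopulationCode,-contains("itude")) %>% 
  prcomp(center = TRUE,scale. = TRUE)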

p <- ggbiplot(pca,varname.size =4, labels=dfclim$PopulationCode) +  
  ylim(-4, 2.5) +    
  xlim(-3, 3) + 
  theme_minimal(base_size = 12)

ggsave(p,filename=here::here("figs/ExploratoryAnalyses/PCAClimVariables.pdf"),
       width=12,height=12)
p

2 UK Climate Projections (2018) - UKCP18

2.1 Information

2.1.1 UKCP18 products

The UKCP18 land projections contain:

probabilistic projections
global (60km) projections
regional (12km) and local (2.2km) projections
derived projections

The probabilistic projections (25 km spatial resolution) provide the primary tool for assessments of the ranges of uncertainties in UKCP18. For a given emissions scenario, they provide information on known uncertainties in future climate changes. In particular, the aim is to represent uncertainties consistent with the knowledge incorporated in existing ensembles of climate model projections, plus the effects of internal climate variability. They combine climate model data, observations and advanced statistical methods to simulate a wide range of climate outcomes: they assess the broadest range of future outcomes from the UKCP projections over land. They include projections for five future emissions scenarios (RCP2.6, RCP4.5, RCP6.0, RCP8.5 and SRES A1B), available as monthly, seasonal or annual temporal averages.

The global, regional and local projections provide flexible datasets derived directly from climate model output. These have full spatial and temporal coherence and offer information on a wider set of variables (that are physically consistent), metrics and time scales than is available from the probabilistic projections. These projections provide storylines of climate futures that can be used to develop case studies and decision options. Due to computational expense, only RCP8.5 is available for the global and regional projections using Met Office models.

The global (60km) projections are a set of 28 climate futures at 60km grid resolution, showing how the 21st Century climate could evolve under the highest emission scenario, RCP8.5. They assess the uncertainty across different models from different modelling centres as well as the parameter uncertainty. They incorporate 15 members of the Met Office Hadley Centre model, HadGEM3-GC3.05 (PPE-15), and 13 other climate models selected from the climate models that informed the Intergovernmental Panel on Climate Change’s 5th Assessment Report (CMIP5-13).

The regional (12km) projections are a set of 12 high resolution projections at 12km (RCM-PPE), downscaled from the PPE-15 over the UK and Europe. They assess the uncertainty in the regional model parameters, as well as uncertainty in the large-scale conditions from the driving global model.

The local (2.2km) projections are a set of 12 high resolution projections at 2.2km, downscaled from the regional (12km) projections over the UK. As the local model is nested within the regional model, which is itself nested within the global model, they inherit the uncertainty in the large-scale conditions from the driving models.

The derived projections are a set of climate futures for the UK at 60km grid resolution for a low emissions scenario (RCP2.6) and for global warming levels of 2°C and 4°C. These have been derived from the global projections using statistical techniques.

Comment: a PPE is a Perturbed Parameter Ensemble, i.e. a group of simulations (i.e. an ensemble) created from a single model with each simulation slightly different to the other due to changing the model parameters (i.e. the model setup).

2.1.2 Connections among UKCP projections

How are the UKCP regional and local products related to the global products?

The results of UKCP Global are used as inputs into UKCP Regional models and in turn the results of UKCP Regional are used as input to UKCP Local models.

The UKCP products are generated using climate models at three spatial resolutions: UKCP Global at 60 km, UKCP Regional at 12 km and UKCP Local at 2.2 km. The reason for having multiple resolutions is that it would be computationally infeasible to produce an ensemble of climate model simulations for the whole globe at 2.2 km. Instead, the higher resolution climate models are run over a smaller area. This is referred to as a Limited Area Model (LAM). For UKCP Regional, this area covers the North Atlantic-Europe domain, while for UKCP Local this covers the British Isles.

The LAMs are “nested” within each other, meaning that they take information from the larger domain model as a starting point before running their simulation. This is referred to as “dynamical downscaling”. In UKCP, the regional LAM is nested within the global model and the Local LAM is nested within the regional LAM.

2.1.3 How can the UKCP18 projections be compared to the HadUK-Grid dataset?

From Hollis et al. (2019). To facilitate comparisons between the HadUK-Grid dataset (observational dataset) and the UKCP18 projections, the HadUK-Grid dataset contains:

gridded datasets at 5, 12, 25 and 60 km resolution, which use the same grid projection as UKCP18. The re-gridding is conducted through averaging of all 1 km grid points that fall within each of the coarser resolution grid cells.
1 km grids of the 1981–2000 climate average (the baseline used in the UKCP18 projections). These 20-year averages were obtained by averaging or summing the 20 monthly or annual gridded datasets for each variable (in contrast to the 30-year LTAs which were produced by interpolating 30-year station averages).
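The two operations described above (block averaging of 1 km cells into coarser cells, and averaging of annual grids into a multi-year climate average) can be sketched in base R. This is only a toy illustration on made-up matrices: the function `aggregate_grid`, the 6 x 6 grid and the aggregation factor of 3 are hypothetical, and the real HadUK-Grid re-gridding uses the UKCP18 grid definitions rather than this simplified alignment.

```r
# Toy 1 km grid (6 x 6 cells) holding an arbitrary climatic value
grid_1km <- matrix(seq_len(36), nrow = 6, ncol = 6)

# Average all the fine cells that fall within each coarse cell.
# `fact` is the number of fine cells per coarse cell along each axis.
aggregate_grid <- function(m, fact) {
  stopifnot(nrow(m) %% fact == 0, ncol(m) %% fact == 0)
  row_block <- (row(m) - 1) %/% fact  # coarse row index of each fine cell
  col_block <- (col(m) - 1) %/% fact  # coarse column index of each fine cell
  tapply(m, list(row_block, col_block), mean, na.rm = TRUE)
}

# 6 x 6 grid of 1 km cells -> 2 x 2 grid of 3 km cells
grid_3km <- aggregate_grid(grid_1km, fact = 3)
grid_3km

# Toy stack of 20 annual grids (rows x columns x years)
set.seed(1)
annual_grids <- array(rnorm(6 * 6 * 20, mean = 8), dim = c(6, 6, 20))

# 20-year climate average: mean over the years for each grid cell
avg_20y <- apply(annual_grids, c(1, 2), mean)
dim(avg_20y)
```

For variables expressed as totals (e.g. rainfall), the same structure applies, with the choice between `mean` and `sum` depending on how the variable is defined.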

2.1.4 How to choose the appropriate UKCP18 projection?

Probabilistic projections should be used for:

Accessing estimates of uncertainty in climate variables. The probabilistic projections provide an estimate of the probability associated with a given level of climate change. However, note that they are NOT estimates of the likelihood of real world outcomes.
Exploring a broad set of future outcomes within the 10th-90th percentile range that covers a given level of risk aversion.
Exploring the four emissions scenarios RCP2.6, RCP4.5, RCP6.0 and RCP8.5.
Exploring future changes at one geographical location, i.e. not multiple geographical locations simultaneously. Indeed, the probabilistic projections are location specific and lack the full spatial coherence available from raw climate model output.
Placing the global and regional projections in context of the broader set of possible outcomes available from the probabilistic projections.

Global projections should be used for:

Analyzing climate at multiple geographical locations at the same time (i.e. a physical connection between the climate characteristics at these locations is required), e.g. assessing climate change impacts on the rail network across the whole of the UK (note that some may prefer the enhanced spatial detail offered by the regional projections). Note that the global projections may not sample as broad a range of outcomes as the probabilistic projections and do not enable estimates of relative likelihood.
Using daily data and being able to calculate a larger set of metrics than that available in the probabilistic projections.
Analyzing drivers and impacts of year-to-year variability. The probabilistic, global, regional and local projections all include information on interannual variability. In addition, the global projections support analysis of physical and dynamical processes (e.g. conditions in the North Atlantic and the state of the Arctic) that give rise to the climate variability experienced over the UK. The global and regional projections also support investigation of the resulting high-impact events.
Exploring future outcomes across MOHC’s (PPE-15) and other models from international climate modelling centres (CMIP5-13).
Combining with the probabilistic projections to look at projections that explore the extremes of the distribution, e.g. you could use the PPE-15 to explore unlikely but plausible heat-related impacts in summer climate or the CMIP5-13 for unusually cool future seasons.
Geographical regions outside of the UK and Europe. Note that the regional projections also include Europe.
Providing large scale or historical context for regional changes (note that the regional projections cover the North Atlantic and Europe only, the local projections cover the UK only and both start in 1981).
Exploring the impacts of the RCP8.5 emissions scenario only.

Some information to consider before using the global projections:

The combined set of 28 projections is not designed to support estimates of the relative likelihood of alternative future climate pathways. They cover a broad range of outcomes but do not provide as broad an assessment of uncertainties as the probabilistic projections.
They are based on two sets of model projections: 15 variants of the MOHC’s HadGEM3-GC3.05 model (PPE-15) and a selection of the CMIP5 models (CMIP5-13). The global projections explore process uncertainties through parameter perturbations for PPE-15 and through variations in structural choices for model components for CMIP5-13.
To analyse future outcomes that are indicated by the probabilistic projections but not available in the global projections, you may consider a number of approaches. These include sourcing additional climate model simulations from the wider CMIP5 dataset (subject to further evaluation), or use of statistical techniques or impacts models driven by climate changes sampled from the probability distributions.
If you choose to sub-select from the 28-member set for your analysis, do this with caution, ensuring that you are able to justify your selection.
There are systematic differences between model results and observations (biases) common to all climate models. You may wish to adjust the data for the differences between climate model results and observations (i.e. carry out bias correction). More information here: guidance on bias-correction.

Regional and local projections should be used for:

Applications where local scales are essential, as the regional and/or local projections better represent local effects due to land elevation, coastlines and surface characteristics, as well as providing improved resolution of dynamical features such as mesoscale circulations and frontal systems.
Improved simulation of extremes with higher temporal variability (e.g. daily, subdaily).
Extremes that explore future outcomes outside of the 90th percentile of the probabilistic projections (e.g. changes in long-term average precipitation showing large increases in winter in some western coastal regions).

Some information to consider before using the regional/local projections:

The regional and/or local projections sample a narrower range of potential future outcomes compared to the full set of global projections. In particular, they only downscale 12 members of the PPE-15 and none of the CMIP5-13. To explore other potential futures, consider using the EURO-CORDEX (https://euro-cordex.net/) multi-model Regional Climate Model simulations.
You need to weigh the benefits of fuller range of sampling available from the probabilistic projections, or the global projections, against the benefits of finer resolution available from the regional climate model.
While the finer resolution adds spatial detail, the benefit comes from being able to simulate smaller scale atmospheric processes as well as the effects on the climate of geographical features such as coastlines and orography.
For information on summer convective storms (intense storm events that we typically experience in the UK summer) and the associated short-term precipitation events, you should use the local projections, which provide a better representation of convective processes. As for the global projections (see above), there will be biases in the climate models even when moving to kilometer-scale resolutions (see Kendon et al. 2017).
It is important to evaluate the level of downscaling skill for variables and metrics of interest, particularly if the core evaluation work does not cover them (see UKCP18 Land Science Report or Factsheets). This informs the level of credibility for the projected changes at local scales. (this part is unclear to me..)
A set of variants of the Met Office Hadley Centre Model, HadGEM3-GC3.05 is used to drive the regional climate models. The levels of warming in these (PPE-15) simulations suggest that most members possess values of climate sensitivity (the equilibrium response to a doubling of CO2) above 4.5ºC, lying outside the IPCC likely range of 1.5-4.5ºC, but below the IPCC unlikely level of 6ºC.
While high-resolution downscaling adds value to climate projections provided by their driving models, the regional models do not, in general, correct large-scale biases inherited from global simulations.

2.1.5 Resources and case studies

UKCP Case studies:

Climate matching tool. Identify appropriate seed origins to enable the planting of forest tree species that are likely to be tolerant of future climate conditions.
Forests for the future. Creating future tree growth and species suitability maps for foresters. Forest Research.
Other case studies.

UKCP resources:

2.1.6 Sum-up

Importance of the spatial and temporal coherence of the climatic data. I think we have to use spatially and temporally coherent climate projections to estimate the genomic offset of the studied populations. Indeed, spatially coherent data are necessary to estimate the spatial relationships between allele frequencies and climatic variables (i.e. the gene-climate relationships), and temporally coherent data are necessary to estimate the disruption of the gene-climate associations under future climates. The global, regional, local and derived projections are spatially and temporally coherent, but the probabilistic projections are not.

Bias-correction. I do not think we have to correct for bias because we can assume that the causes of the biases will not change in the future. The UKCP guide on bias-correction indicates that this assumption is common in bias-correction methods, even though it has also been severely criticized. In our study, we want to compare past and future climates at the location of the populations, so we are more interested in the relative differences in climatic values than in the absolute climatic values. That’s why I think that small biases in climatic values at the location of the populations would not impact our results, under the assumption that the same biases apply to past and future data.

HadUK-grid vs UKCP18 regional/local projections. I do not think it is ok to estimate the gene-climate relationships using the HadUK-grid and then use the future climates of the UKCP18 local projections to calculate the genomic offset. Indeed, the HadUK-grid is an observational dataset, which is probably less prone to biases, while the UKCP18 data are projections based on different models. Climatic differences between past climatic values (from the HadUK-grid) and future climatic values (from the UKCP18) may therefore partly stem from the different methods used to estimate the climatic data at the location of the populations. I would thus suggest using the UKCP18 regional/local projections for both past and future climates, even though the UKCP18 projections of past climates are probably less accurate than the HadUK-grid dataset (I have not checked this).
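The reasoning of the two paragraphs above can be checked with a small numerical sketch. All values are made up, and the key assumption (labelled in the code) is that the model bias is additive and identical for the past and future periods. Under that assumption, the bias cancels out when both periods come from the same model, but not when an observational past is combined with a modelled future.

```r
# Hypothetical 'true' mean annual temperature (°C) at three populations
true_past   <- c(pop1 = 6.5, pop2 = 7.8, pop3 = 8.4)
true_future <- c(pop1 = 8.9, pop2 = 9.9, pop3 = 11.0)

# ASSUMPTION: additive model bias, different among populations
# but constant through time
bias <- c(pop1 = -0.8, pop2 = 0.3, pop3 = 1.1)

model_past   <- true_past + bias
model_future <- true_future + bias

# Same model for both periods: the climate change signal is unaffected by the bias
change_true  <- true_future - true_past
change_model <- model_future - model_past
all.equal(change_true, change_model)

# Observations for the past, model for the future: the bias leaks into the signal
change_mixed <- model_future - true_past
change_mixed - change_true
```

The last line returns the bias itself. Note that this cancellation only holds for additive biases; a multiplicative bias (which may be more realistic for precipitation) would not cancel out in a simple difference.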

Local projections at 5km vs 2.2km resolution. In the CEDA archive, 20-year annual and seasonal averages (1980-2000, 2020-2040 and 2060-2080) are available for the local projections at 5km resolution but not at 2.2km resolution. If we consider that the 5km resolution is enough, we could use the 1980-2000 and 2060-2080 20-year averages to calculate the genomic offset. If we consider that the 2.2km resolution is better or that we want other time periods, I will have to calculate the annual and seasonal averages from the monthly data.
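If the averages have to be calculated from monthly data, the calculation could look like the following sketch for a single grid cell. The monthly series is simulated, the season definition (winter = December to February, and so on) is an assumption, and the months are given equal weights; I have not checked whether the 20-year averages distributed on CEDA were computed in exactly this way.

```r
# Simulated monthly series for one grid cell, December 1980 to November 2000
dates <- seq(as.Date("1980-12-01"), as.Date("2000-11-01"), by = "month")
month_num <- as.integer(format(dates, "%m"))

set.seed(42)
# Toy temperatures (°C) with a seasonal cycle peaking in July
tas <- 8 + 6 * cos(2 * pi * (month_num - 7) / 12) + rnorm(length(dates), sd = 1)

# ASSUMPTION: meteorological seasons (winter = Dec-Feb, spring = Mar-May, ...)
season_lookup <- c("winter", "winter", "spring", "spring", "spring", "summer",
                   "summer", "summer", "autumn", "autumn", "autumn", "winter")
season <- factor(season_lookup[month_num],
                 levels = c("winter", "spring", "summer", "autumn"))

# 20-year seasonal and annual averages (unweighted means of the monthly values)
seasonal_means <- tapply(tas, season, mean)
annual_mean <- mean(tas)

seasonal_means
annual_mean
```

Starting the series in December keeps each winter (December to February) within a single 12-month block. For a full grid, the same grouping would be applied to the layers of a raster stack rather than to a vector.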

How to deal with the 12 regional climate models? Regional projections come from 12 different model variants called ‘members’, see Appendix D of the UKCP18 Guidance: Data availability, access and formats. Should we calculate the genomic offset for each of the members and then average their predictions?
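One possible way of handling the members is sketched below: keep one genomic offset prediction per member and per population, and then summarise across members (mean and spread) instead of keeping the mean only. The offset values are random numbers used as placeholders, and the population and member labels are hypothetical (the real member identifiers are listed in Appendix D of the guidance).

```r
pops <- paste0("pop", 1:5)
members <- paste0("member", 1:12)  # hypothetical labels

# PLACEHOLDER values standing in for genomic offset predictions:
# one row per population, one column per ensemble member
set.seed(123)
offset <- matrix(runif(length(pops) * length(members)),
                 nrow = length(pops),
                 dimnames = list(pops, members))

# Summary across members for each population
summary_offset <- data.frame(
  population  = pops,
  mean_offset = rowMeans(offset),
  sd_offset   = apply(offset, 1, sd),
  min_offset  = apply(offset, 1, min),
  max_offset  = apply(offset, 1, max)
)
summary_offset

# Rank of the populations within each member, to check whether
# the members agree on which populations have the highest offset
rank_by_member <- apply(offset, 2, rank)
rank_by_member
```

Keeping the spread across members gives an idea of how sensitive the predictions are to the climate model variant. An alternative would be to average the climatic values across members first and calculate a single genomic offset, but the two approaches generally give different results when the offset is a non-linear function of climate.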

Probability of extreme events. It may be interesting to use the global projections or the probabilistic projections to extract the probability of extreme events at the location of the populations. Even though this information cannot be used in the genomic offset approach, it can still be very informative and valuable to assess population exposure to climate change.

2.2 Data extraction

UKCP Local Projections at 2.2 km Resolution for 1980-2080

For the folder structure and filename conventions, see Tables C.1, C.2 and C.3 in UKCP18 Guidance: Data availability, access and formats.
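As an illustration of these conventions, the helper below rebuilds the URL of a file from its components. The pattern is inferred only from the single example URL used in the download code of this section (5km local projections, member 01, variable clt), so the function `build_ukcp18_url`, its defaults and especially the version folder are assumptions to be checked against Tables C.1, C.2 and C.3 before use.

```r
# Hypothetical helper: build the CEDA URL of a UKCP18 local projection file.
# ASSUMPTION: all files follow the pattern of the example URL, including
# the same version folder, which may not hold for every variable.
build_ukcp18_url <- function(var, member = "01", temp_res = "ann-20y",
                             period = "198012-200011", spat_res = "5km",
                             version = "v20210615") {
  base <- "https://data.ceda.ac.uk/badc/ukcp18/data/land-cpm/uk"
  file <- paste0(var, "_rcp85_land-cpm_uk_", spat_res, "_", member, "_",
                 temp_res, "_", period, ".nc")
  paste(base, spat_res, "rcp85", member, var, temp_res, version, file, sep = "/")
}

# Reproduces the example URL
build_ukcp18_url("clt")

# The function is vectorised, e.g. over time periods
# (ASSUMPTION: the two future period labels follow the same December-November convention)
periods <- c("198012-200011", "202012-204011", "206012-208011")
urls <- build_ukcp18_url("clt", period = periods)
urls
```

Building the URLs programmatically would make it easier to loop over variables, members and periods once the automatic download works.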

2.2.1 Automate the download

I tried to download the data automatically following: https://2infectious.wordpress.com/2018/03/09/using-r-to-download-ceda-datasets/. But my Windows computer does not have the wget utility, so I cannot get a security certificate (https://help.ceda.ac.uk/article/4442-ceda-opendap-scripted-interactions#start). I’ve asked the IT department to install it.

Code

library(RCurl)

# set the path to your security certificate
tempCert <- "temp_cert.pem"

# set the url of the data file you want to work on, e.g.:
url <- "https://data.ceda.ac.uk/badc/ukcp18/data/land-cpm/uk/5km/rcp85/01/clt/ann-20y/v20210615/clt_rcp85_land-cpm_uk_5km_01_ann-20y_198012-200011.nc"

# set cURL options
curl <- getCurlHandle()
curlSetOpt(cookiejar = "", followlocation = TRUE, curl = curl, sslcert = tempCert)

# the file is a NetCDF (binary) file, not a delimited text file:
# download it as raw bytes and write it to disk
# (untested without a valid security certificate)
nc_raw <- getBinaryURL(url = url, curl = curl)
nc_file <- file.path(tempdir(), basename(url))
writeBin(nc_raw, nc_file)

# open the downloaded file as a raster
rast <- raster::raster(nc_file, varname = "clt")

References

Hollis, Dan, Mark McCarthy, Michael Kendon, Tim Legg, and Ian Simpson. 2019. “HadUK-Grid—a New UK Dataset of Gridded Climate Observations.” Geoscience Data Journal 6 (2): 151–59.