Evolvability, narrow-sense heritability and within-family variance of height

Height in 2014 and 2020

Author

Juliette Archambeau

Published

June 21, 2024

Introduction

Goal: Comparing the adaptive potential of the populations.

For that, we estimate the narrow-sense heritability, evolvability and within-population (i.e. family) variance of the populations for height (measured in 2014 and 2020).

We estimate these genetic parameters:

  • at the species-level (i.e. one estimate for all populations).

  • at the population-level with population-specific models (i.e. one model per population).

  • at the population-level with one model for all populations (hereafter referred as ‘full model’), and based on Archambeau et al. (2023).

As heritability and evolvability are specific to the environment, the models are fitted in each common garden separately. We fit models that do not account for the nursery effect for the two common gardens in which trees come from the same nursery: the sites of Yair (FS) in the Scottish borders (in which all trees come from NG) and Glensaugh (FE) in which all trees come from NE (except four trees that come from NG and that we remove for the analyses).

We fit models that do account for the nursery effect in the common garden in which trees come from different nurseries: Inverewe (FW), which contains cohorts of trees raised in each of the three nurseries as follows: 290 grown locally in the NW; 132 grown in the NG; and 82 grown in the NE.

Heritability calculation

In family-based progeny trials, we assume that the seedlings from each maternal tree are half-siblings (i.e. same mother but each with a different father).

The coefficient of relationship of half-siblings is \(r = 1/4\), and therefore \(\sigma^2_A = 4\sigma^2_{family}\).

The narrow-sense heritability is equal to \(h^2 = \frac{\sigma^2_{A}}{\sigma^2_{P}} = \frac{4\sigma^2_{family}}{\sigma^2_{P}}\) where \(\sigma^2_{A}\) is the additive genetic variance, \(\sigma^2_{P}\) is the phenotypic variance and \(\sigma^2_{f}\) is the family variance.

What is annoying is that when calculating \(h^2\) based on family-based progeny trials (without genomic information), people do not include the same terms in the determinant. See the Short Note: On Estimating Heritability according to practical applications by Coterill in 1986

In our case, we will estimate \(h^2\) as follows:

\[h^2 = \frac{4\sigma^2_{family}}{\sigma^2_{family} + \sigma^2_r}\]

And evolvability as follows:

\[I = \frac{4\sigma^2_{family}}{\text{mean(y)}^2}\] When looking at the genomic data (FormattingScotsPineGenomicData.qmd report), we found that some pairs of individuals expected to be half-sibs showed a coefficient of relationships higher than 0.7 (i.e. clones) or between 0.354 and 0.707 (i.e. full-sibs). To account for that, we may want to use the “true” relationship coefficients between individuals, that we can get from the genomic data, i.e. use a genomic relationship matrix to estimate \(h^2\).

Height data

For details on the data, see the report ExploringFormattingCGdata.qmd.

Code
# we load the phenotypic data
data <- readRDS(file=here("data/ScotsPine/formatted_CG_data.rds"))

sites <- c("FE" = "Glensaugh (FE)", "FW" = "Inverewe (FW)", "FS" = "Yair (FS)")

How many NAs for height measurements?

Code
data %>% dplyr::select(contains("HA")) %>% sapply(function(x) sum(is.na(x)))
HA13 HA14 HA15 HA16 HA17 HA18 HA19 HA20 
 748   57   68   79   70   76   74   83 

We analyse height in 2014 and 2020.

Code
ht <- list(codes = c("HA14","HA20"),
           names = c("Height in 2014","Height in 2020"),
           colors= c("chartreuse","chartreuse4"))

names(ht$names) <- ht$codes

We keep only the colums of interest and we remove the four trees in FE that belong to the NG nursery:

Code
subdata <- data %>% 
  dplyr::select(FieldSite,Block,PopulationCode,Family,Nursery,all_of(ht$codes)) %>% 
  dplyr::rename(site=FieldSite,
                bloc=Block,
                pop=PopulationCode,
                fam=Family,
                nurs=Nursery) %>% 
  filter(!(site=="FE" & nurs =="NG")) %>% # we remove the four trees that come from the nursery NG in the field site FE
  pivot_longer(contains("HA"), names_to = "y_names", values_to = "y_values")
Code
subdata %>%
  drop_na(y_values) %>% 
  ggplot(aes(x=y_values)) +  
    geom_histogram(aes(y=after_stat(density), colour=y_names,fill=y_names),bins = 50, alpha = 0.4) +
    geom_density(aes(colour=y_names),alpha=.2,fill="white") +
    facet_wrap(~site,scales="free", labeller = labeller(site = sites)) + 
    scale_fill_manual(name = "Trait", values = ht$colors, labels = ht$names) +
    scale_color_manual(name = "Trait", values = ht$colors, labels = ht$names) +
    ylab("Density") + xlab("Height values") +
    theme_bw() +
    theme(legend.position = c(0.9,0.9))

Stan model options

Options for the Stan models across the document:

Code
# Sampling in Bayesian models
n_chains <- 4 # number of chains (MCMC)
n_iter <- 2500 # number of iterations
n_warm <- 1250 # number of iterations in the warm-up phase
n_thin <- 1 # thinning interval
save_warmup <- FALSE 

# Credible intervals
conf_level <- 0.95

Species-level

In this section, we estimate evolvability (\(I\)) and narrow-sense heritability (\(h^2\)) at the species level, i.e. one value for all populations.

Model equation

Here is the mathematical model for a common garden with different nurseries. The mathematical model for a common garden with trees coming from one nursery is the same, without the nursery intercepts.

We model each trait \(y\) such as:

\[\begin{equation} \begin{aligned} y_{bpfnr} & \sim \mathcal{N}(\mu_{bpfn},\sigma^{2}_{r})\\[3pt] \mu_{bpc} & = \beta_{0} + B_{b} + P_{p} + F_{f} + N_{n} \\[3pt] \beta_{0} & \sim \mathcal{N}(\mu_{y},2)\\[3pt] \begin{bmatrix} B_{b} \\ P_{p} \\ F_{f} \\ N_{n} \end{bmatrix} & \sim \mathcal{N}\left(0, \begin{bmatrix} \sigma^{2}_{B}\\[3pt] \sigma^{2}_{P}\\[3pt] \sigma^{2}_{F}\\[3pt] \sigma^{2}_{N}\\[3pt] \end{bmatrix} \right)\\ \end{aligned} \end{equation}\]

\(\beta_{0}\) is the global intercept. \(B_{b}\) are the block intercepts with a variance \(\sigma^{2}_{B}\). \(P_{p}\) are the population intercepts with a variance \(\sigma^{2}_{P}\). \(F_{f}\) are the family intercepts with a variance \(\sigma^{2}_{F}\). \(N_{n}\) are the nursery intercepts with a variance \(\sigma^{2}_{N}\). \(\sigma^{2}_{r}\) is the residual variance.

Partitioning of the total variance \(\sigma_{tot}^{2}\):

\[\begin{equation} \begin{aligned} \sigma_{tot}^{2} & = \sigma_{r}^{2} + \sigma_{B}^{2} + \sigma_{P}^{2} + \sigma_{F}^{2} + \sigma_{N}^{2} \\[3pt] \sigma_{r} & = \sigma_{tot} \times \sqrt(\pi_{r})\\[3pt] \sigma_{B} & = \sigma_{tot} \times \sqrt(\pi_{B})\\[3pt] \sigma_{P} & = \sigma_{tot} \times \sqrt(\pi_{P})\\[3pt] \sigma_{F} & = \sigma_{tot} \times \sqrt(\pi_{F})\\[3pt] \sigma_{N} & = \sigma_{tot} \times \sqrt(\pi_{N})\\[3pt] \sigma_{tot} & \sim \mathcal{S}^{*}(0,1,3) \end{aligned} \end{equation}\]

with \(\sum_{l}^{5}\pi_{l}=1\).

Estimation of \(h^2\) (narrow-sense heritability) and \(I\) (evolvability).

We assume that the seedlings from each maternal tree are half-siblings (i.e. same mother but each with a different father), so \(\sigma^2_A = 4\sigma^2_{family}\), and so \(h^2 = \frac{4\sigma^2_{family}}{\sigma^2_{family} + \sigma^2_r}\) and \(I = \frac{4\sigma^2_{family}}{\text{mean(y)}^2}\).

Comment: in some papers, I’ve seen \(h^2 = \frac{4\sigma^2_{family}}{\sigma^2_{family} + \sigma^2_{population} + \sigma^2_r}\). Which \(h^2\) estimation should we use?

Stan code

Code
model_different_nurseries <- stan_model(here("scripts/stan_models/Evolvability_M1_DifferentNurseries.stan"))
recompiling to avoid crashing R session
Code
model_one_nursery <- stan_model(here("scripts/stan_models/Evolvability_M1_OneNursery.stan"))
recompiling to avoid crashing R session
Code
model_different_nurseries
S4 class stanmodel 'anon_model' coded as follows:
// Variance partitioning in the common gardens with trees from different nurseries
data {
  int<lower=1> n;                                                               // Number of observations
  vector[n] y;                                                                  // Response variable
  int<lower=0> n_bloc;                                                          // Number of blocks
  int<lower=0> n_pop;                                                           // Number of populations
  int<lower=0> n_fam;                                                           // Number of families
  int<lower=0> n_nurs;                                                          // Number of nurseries
  int<lower=0, upper=n_bloc> bloc[n];                                           // Blocks
  int<lower=0, upper=n_pop> pop[n];                                             // Populations
  int<lower=0, upper=n_fam> fam[n];                                             // Families
  int<lower=0, upper=n_nurs> nurs[n];                                           // Nurseries
}
parameters {
  real beta0; // global intercept
  simplex[5] pi;
  real<lower = 0> sigma_tot;

  vector[n_bloc] z_block;
  vector[n_pop] z_pop;
  vector[n_fam] z_fam;
  vector[n_nurs] z_nurs;

}
transformed parameters {
  real R_squared;

  real<lower = 0>  sigma_r;
  real<lower = 0>  sigma_block;
  real<lower = 0>  sigma_pop;
  real<lower = 0>  sigma_fam;
  real<lower = 0>  sigma_nurs;

  vector[n_bloc] alpha_block;
  vector[n_pop] alpha_pop;
  vector[n_fam] alpha_fam;
  vector[n_nurs] alpha_nurs;

  vector[n] mu; // linear predictor

  // variance partitioning with the simplex pi
  sigma_r = sqrt(pi[1]) * sigma_tot;
  sigma_block = sqrt(pi[2]) * sigma_tot;
  sigma_pop = sqrt(pi[3]) * sigma_tot;
  sigma_fam= sqrt(pi[4]) * sigma_tot;
  sigma_nurs= sqrt(pi[5]) * sigma_tot;

  alpha_block = z_block*sigma_block;
  alpha_pop = z_pop*sigma_pop;
  alpha_fam = z_fam*sigma_fam;
  alpha_nurs = z_nurs*sigma_nurs;

  mu = rep_vector(beta0, n) + alpha_block[bloc] + alpha_pop[pop] + alpha_fam[fam] + alpha_nurs[nurs];
  R_squared = 1 - variance(y - mu) / variance(y);
}
model{
  //Priors
  beta0 ~ normal(mean(y),20);
  sigma_tot ~ student_t(3, 0.0, 1.0);

  z_block ~ std_normal();
  z_pop ~ std_normal();
  z_fam ~ std_normal();
  z_nurs ~ std_normal();

  // Likelihood
  y ~ normal(mu, sigma_r);
}
generated quantities {
  //Variances, narrow-sense heritability and evolvability
  real<lower=0> sigma2_tot;
  real<lower=0> sigma2_r;
  real<lower=0> sigma2_block;
  real<lower=0> sigma2_pop;
  real<lower=0> sigma2_fam;
  real<lower=0> sigma2_nurs;
  real<lower=0> h2;
  real<lower=0> I;

  sigma2_tot = square(sigma_tot);
  sigma2_r = square(sigma_r);
  sigma2_block = square(sigma_block);
  sigma2_pop = square(sigma_pop);
  sigma2_fam = square(sigma_fam);
  sigma2_nurs = square(sigma_nurs);
  h2 = (4*sigma2_fam)/(sigma2_r+sigma2_fam);
  I = (4*sigma2_fam)/(mean(y)^2);

} 

Running the models

Code
lapply(ht$codes, function(ht_i){
  lapply(names(sites), function(site_i){
  
  subsubdata <- subdata %>% 
    filter(site==site_i & y_names == ht_i) %>% 
    drop_na(y_values) %>% 
    dplyr::select(-y_names) %>% 
    dplyr::rename(y=y_values)
  
  if(site_i %in% c("FE","FS")){ 
    model <- model_one_nursery
    pars <- c("beta0", "pi", "R_squared", "h2","I",
              "alpha_block","alpha_pop","alpha_fam",
              "sigma2_r","sigma2_block","sigma2_pop","sigma2_fam","sigma2_tot")
  } else if(site_i == "FW"){
    model <- model_different_nurseries
    pars <- c("beta0", "pi", "R_squared", "h2","I",
              "alpha_block","alpha_pop","alpha_fam","alpha_nurs",
              "sigma2_r","sigma2_block","sigma2_pop","sigma2_fam","sigma2_nurs","sigma2_tot")
      
    }
  
  stanfit <- sampling(model,
                    data = compose_data(subsubdata),
                    pars=pars,
                    iter = n_iter, 
                    chains = n_chains, 
                    cores = n_chains,
                    save_warmup = save_warmup,
                    thin=n_thin)
  }) %>% 
  setNames(names(sites)) %>% 
  saveRDS(file=here(paste0("outputs/ScotsPine/Evolvability/StanModels/",ht_i,"_SpeciesLevelModels.rds"))) 
})

Interval plots

Variance partitioning

Code
# Function to plot the intervals
plot_intervals <- function(ht_i,plot_title){
  
readRDS(file=here(paste0("outputs/ScotsPine/Evolvability/StanModels/",ht_i,"_SpeciesLevelModels.rds"))) %>% 
  lapply(function(x){
    
  broom.mixed::tidyMCMC(x,pars=("pi"),
                droppars = NULL, 
                ess = F, rhat = F, 
                conf.int = T, conf.level = conf_level)  
  }) %>% 
  bind_rows(.id = "site") %>% 
  mutate(prop_var = case_when(term == "pi[1]" ~ "Residuals",
                              term == "pi[2]" ~ "Blocks",
                              term == "pi[3]" ~ "Populations",
                              term == "pi[4]" ~ "Families",
                              term == "pi[5]" ~ "Nurseries")) %>% 
  ggplot(aes(y = prop_var, x = estimate, xmin = conf.low, xmax = conf.high, color=site)) +
  geom_pointinterval(position = position_dodge(width = .4),point_size=3,alpha=0.6,size=5) +
  ylab("") +
  xlab("Proportion of variance explained") +
  ggtitle(plot_title) +
  labs(color = "Field sites") +
  theme(axis.text = element_text(size=16),
        title = element_text(size=14),
        panel.grid.minor.y=element_blank(),
        panel.grid.major.y=element_blank())  +
    guides(color=guide_legend(ncol=1))
}

plot_intervals(ht_i = "HA14", plot_title = "Height in 2014")

Code
plot_intervals(ht_i = "HA20", plot_title = "Height in 2020")

Narrow-sense heritability (\(h^2\))

Code
plot_intervals_splevel <- function(param, x_axis){
  
  lapply(ht$codes, function(ht_i){
    readRDS(file=here(paste0("outputs/ScotsPine/Evolvability/StanModels/",ht_i,"_SpeciesLevelModels.rds"))) %>%
      lapply(function(x){
        
        broom.mixed::tidyMCMC(x,pars=(param),
                              droppars = NULL, 
                              ess = F, rhat = F, 
                              conf.int = T, conf.level = conf_level)  
        }) %>% 
      bind_rows(.id = "site")   }) %>% 
    setNames(ht$codes) %>% 
    bind_rows(.id="ht") %>% 
    
    ggplot(aes(y = site, x = estimate, xmin = conf.low, xmax = conf.high, color=ht)) +
    geom_pointinterval(position = position_dodge(width = .4),point_size=3,alpha=0.6,size=5) +
    ylab("") +
    xlab(x_axis) +
    scale_color_manual(name = "", values = ht$colors, labels = ht$names) +
    theme(axis.text = element_text(size=16),
          legend.text = element_text(size=14),
          title = element_text(size=14),
          panel.grid.minor.y=element_blank(),
          panel.grid.major.y=element_blank())  +
    guides(color=guide_legend(ncol=1))
}
Code
plot_intervals_splevel(param="h2", x_axis = "Narrow-sense heritability")

Evolvability (\(I\))

Code
plot_intervals_splevel(param="I", x_axis = "Evolvability")

Population-level

In this section, we estimate the evolvability (\(I\)) and narrow-sense heritability (\(h^2\)) of each population.

Code
# Parameters of interest
params <-c("I","h2","sigma2_fam")
param_names <- c("Evolvability", "Narrow-sense heritability", "Within-family variance")
names(param_names) <- params

Using the full dataset

We first fit the models on the entire dataset.

Population-specific models

Model equation

We fit one model per population.

Here is the model equation for a common garden with different nurseries. The model equation for a common garden with trees coming from one nursery is the same, without the nursery intercepts.

We model each trait \(y\) such as:

\[\begin{equation} \begin{aligned} y_{bfnr} & \sim \mathcal{N}(\mu_{bfn},\sigma^{2}_{r})\\[3pt] \mu_{bpc} & = \beta_{0} + B_{b} + F_{f} + N_{n} \\[3pt] \beta_{0} & \sim \mathcal{N}(\mu_{y},2)\\[3pt] \begin{bmatrix} B_{b} \\ F_{f} \\ N_{n} \end{bmatrix} & \sim \mathcal{N}\left(0, \begin{bmatrix} \sigma^{2}_{B}\\[3pt] \sigma^{2}_{F}\\[3pt] \sigma^{2}_{N}\\[3pt] \end{bmatrix} \right)\\ \end{aligned} \end{equation}\]

\(\beta_{0}\) is the global intercept. \(B_{b}\) are the block intercepts with a variance \(\sigma^{2}_{B}\). \(F_{f}\) are the family intercepts with a variance \(\sigma^{2}_{F}\). \(N_{n}\) are the nursery intercepts with a variance \(\sigma^{2}_{N}\). \(\sigma^{2}_{r}\) is the residual variance.

Partitioning of the total variance \(\sigma_{tot}^{2}\):

\[\begin{equation} \begin{aligned} \sigma_{tot}^{2} & = \sigma_{r}^{2} + \sigma_{B}^{2} + \sigma_{F}^{2} + \sigma_{N}^{2} \\[3pt] \sigma_{r} & = \sigma_{tot} \times \sqrt(\pi_{r})\\[3pt] \sigma_{B} & = \sigma_{tot} \times \sqrt(\pi_{B})\\[3pt] \sigma_{F} & = \sigma_{tot} \times \sqrt(\pi_{F})\\[3pt] \sigma_{N} & = \sigma_{tot} \times \sqrt(\pi_{N})\\[3pt] \sigma_{tot} & \sim \mathcal{S}^{*}(0,1,3) \end{aligned} \end{equation}\]

with \(\sum_{l}^{4}\pi_{l}=1\).

Stan code

Code
model_pop_specific_different_nurseries <- stan_model(here("scripts/stan_models/Evolvability_M2_DifferentNurseries.stan"))
model_pop_specific_one_nursery <- stan_model(here("scripts/stan_models/Evolvability_M2_OneNursery.stan"))
model_pop_specific_different_nurseries
S4 class stanmodel 'anon_model' coded as follows:
// Variance partitioning in one population in the common gardens with trees from different nurseries
data {
  int<lower=1> n;                                                               // Number of observations
  vector[n] y;                                                                  // Response variable
  int<lower=0> n_bloc;                                                          // Number of blocks
  int<lower=0> n_fam;                                                           // Number of families
  int<lower=0> n_nurs;                                                          // Number of nurseries
  int<lower=0, upper=n_bloc> bloc[n];                                           // Blocks
  int<lower=0, upper=n_fam> fam[n];                                             // Families
  int<lower=0, upper=n_nurs> nurs[n];                                           // Nurseries
}
parameters {
  real beta0; // global intercept
  simplex[4] pi;
  real<lower = 0> sigma_tot;

  vector[n_bloc] z_block;
  vector[n_fam] z_fam;
  vector[n_nurs] z_nurs;

}
transformed parameters {
  real R_squared;

  real<lower = 0>  sigma_r;
  real<lower = 0>  sigma_block;
  real<lower = 0>  sigma_fam;
  real<lower = 0>  sigma_nurs;

  vector[n_bloc] alpha_block;
  vector[n_fam] alpha_fam;
  vector[n_nurs] alpha_nurs;

  vector[n] mu; // linear predictor

  // variance partitioning with the simplex pi
  sigma_r = sqrt(pi[1]) * sigma_tot;
  sigma_block = sqrt(pi[2]) * sigma_tot;
  sigma_fam= sqrt(pi[3]) * sigma_tot;
  sigma_nurs= sqrt(pi[4]) * sigma_tot;

  alpha_block = z_block*sigma_block;
  alpha_fam = z_fam*sigma_fam;
  alpha_nurs = z_nurs*sigma_nurs;

  mu = rep_vector(beta0, n) + alpha_block[bloc] + alpha_fam[fam] + alpha_nurs[nurs];
  R_squared = 1 - variance(y - mu) / variance(y);
}
model{
  //Priors
  beta0 ~ normal(mean(y),20);
  sigma_tot ~ student_t(3, 0.0, 1.0);

  z_block ~ std_normal();
  z_fam ~ std_normal();
  z_nurs ~ std_normal();

  // Likelihood
  y ~ normal(mu, sigma_r);
}
generated quantities {
  //Variances, narrow-sense heritability and evolvability
  real<lower=0> sigma2_tot;
  real<lower=0> sigma2_r;
  real<lower=0> sigma2_block;
  real<lower=0> sigma2_fam;
  real<lower=0> sigma2_nurs;
  real<lower=0> h2;
  real<lower=0> I;

  sigma2_tot = square(sigma_tot);
  sigma2_r = square(sigma_r);
  sigma2_block = square(sigma_block);
  sigma2_fam = square(sigma_fam);
  sigma2_nurs = square(sigma_nurs);
  h2 = (4*sigma2_fam)/(sigma2_r+sigma2_fam);
  I = (4*sigma2_fam)/(mean(y)^2);

} 

Running the models

Code
lapply(ht$codes, function(ht_i){
  lapply(names(sites), function(site_i){
    
    subsubdata <- subdata %>% 
      filter(site==site_i & y_names == ht_i) %>% 
      drop_na(y_values) %>% 
      dplyr::select(-y_names) %>% 
      dplyr::rename(y=y_values) %>% 
      arrange(pop)
    
 lapply(unique(subsubdata$pop), function(pop_i){
   
   subpop <- subsubdata %>% 
    filter(pop==pop_i) 
   
   if(site_i %in% c("FE","FS")){ 
    model <- model_pop_specific_one_nursery
    pars <- c("pi", "h2", "I", #"beta0", "R_squared", 
              "alpha_block", "alpha_fam",
              "sigma2_r", "sigma2_block", "sigma2_fam", "sigma2_tot")
  } else if(site_i == "FW"){
    model <- model_pop_specific_different_nurseries
    pars <- c("pi", "h2", "I", #"beta0", "R_squared",
              "alpha_block", "alpha_fam", "alpha_nurs",
              "sigma2_r", "sigma2_block", "sigma2_fam", "sigma2_nurs", "sigma2_tot")
       }

 
  stanfit <- sampling(model,
                      data = compose_data(subpop),
                      pars=pars,
                      iter = n_iter, 
                      chains = n_chains, 
                      cores = n_chains,
                      save_warmup = save_warmup,
                      thin=n_thin)   
 }) %>% 
   setNames(unique(subsubdata$pop)) %>% 
   saveRDS(file=here(paste0("outputs/ScotsPine/Evolvability/StanModels/",ht_i,"_",site_i,"_PopSpecificModels.rds")))
})
})

Extract params

Code
lapply(ht$codes, function(ht_i){
  lapply(names(sites), function(site_i){
    
    readRDS(file=here(paste0("outputs/ScotsPine/Evolvability/StanModels/",ht_i,"_",site_i,"_PopSpecificModels.rds"))) %>% 
        lapply(function(pop_i){
          
          broom.mixed::tidyMCMC(pop_i, 
                                pars=params,
                                robust = TRUE, # median and mean absolute deviation
                                ess = F, rhat = F, 
                                conf.int = T, conf.level = conf_level) 
          
          }) %>% 
        bind_rows(.id = "pop") 
      }) %>% 
    setNames(names(sites)) %>% 
    bind_rows(.id = "site") 
  }) %>% 
    setNames(ht$codes) %>% 
    bind_rows(.id = "ht") %>% 
  saveRDS(here("outputs/ScotsPine/Evolvability/mcmc_params_pop_specific_models.rds"))
Code
mcmc_params_pop_specific_models <- readRDS(here("outputs/ScotsPine/Evolvability/mcmc_params_pop_specific_models.rds"))

Interval plots

Narrow-sense heritability
Code
plot_intervals_poplevel <- function(ht_i, mcmc_draws, param, x_axis, plot_title){
  
  mcmc_draws %>% 
    filter(term == param & ht == ht_i) %>%
    
    ggplot(aes(y = pop, x = estimate, xmin = conf.low, xmax = conf.high,color=site)) + # invert x and y to get the populations on the x-axis
    geom_pointinterval(position = position_dodge(width = .4),point_size=3,alpha=0.6,size=5) +
    ylab("") +
    xlab(x_axis) +
    ggtitle(plot_title) +
    theme(axis.text = element_text(size=16),
          legend.text = element_text(size=16),
          panel.grid.minor.y=element_blank(),
          panel.grid.major.y=element_blank())  +
    guides(color=guide_legend(ncol=1))
}

plot_intervals_poplevel(ht_i = "HA14", 
                        mcmc_draws = mcmc_params_pop_specific_models,
                        param = "h2",
                        x_axis = "Narrow-sense heritability", 
                        plot_title = "Height in 2014")

Code
plot_intervals_poplevel(ht_i = "HA20", 
                        mcmc_draws = mcmc_params_pop_specific_models,
                        param = "h2",
                        x_axis = "Narrow-sense heritability", 
                        plot_title = "Height in 2020")

Evolvability
Code
plot_intervals_poplevel(ht_i = "HA14", 
                        mcmc_draws = mcmc_params_pop_specific_models,
                        param = "I",
                        x_axis = "Evolvability", 
                        plot_title = "Height in 2014")

Code
plot_intervals_poplevel(ht_i = "HA20", 
                        mcmc_draws = mcmc_params_pop_specific_models,
                        param = "I",
                        x_axis = "Evolvability", 
                        plot_title = "Height in 2020")

Within-family variance
Code
plot_intervals_poplevel(ht_i = "HA14", 
                        mcmc_draws = mcmc_params_pop_specific_models,
                        param = "sigma2_fam",
                        x_axis = "Within-family variance", 
                        plot_title = "Height in 2014")

Code
plot_intervals_poplevel(ht_i = "HA20", 
                        mcmc_draws = mcmc_params_pop_specific_models,
                        param = "sigma2_fam",
                        x_axis = "Within-family variance", 
                        plot_title = "Height in 2020")

One model for all populations

Model equation

Here is the model equation for a common garden with different nurseries.

We modeled each trait \(y_{bpfr}\) such as:

\[\begin{equation} \begin{aligned} y_{bpfnr} & \sim \mathcal{N}(\mu_{bpfn},\sigma^{2}_{r})\\[3pt] \mu_{bpfn} & = \beta_{0} + B_{b} + P_{p} + F_{f(p)} + N_{n}\\[3pt] \end{aligned} \end{equation}\]

\(\beta_{0}\) is the global intercept. \(B_{b}\) are the block intercepts with a variance \(\sigma^{2}_{B}\). \(P_{p}\) are the population intercepts with a variance \(\sigma^{2}_{P}\). \(F_{f}\) are the family intercepts with a variance \(\sigma^{2}_{F}\). \(N_{n}\) are the nursery intercepts with a variance \(\sigma^{2}_{N}\). \(\sigma^{2}_{r}\) is the residual variance.

The prior of \(\beta_{0}\) was weakly informative and centered around the mean of the observed values for the trait under considered, as follows:

\[\beta_{0} \sim \mathcal{N}(\mu_{y},20)\]

The population, nursery and block intercepts, \(P_{p}\), \(N_{n}\) and \(B_{b}\) were considered normally-distributed with variances \(\sigma^{2}_{P}\), \(\sigma^{2}_{N}\) and \(\sigma^{2}_{B}\), such as:

\[\begin{bmatrix} B_{b} \\ P_{p} \\ N_{n} \end{bmatrix} \sim \mathcal{N}\left(0, \begin{bmatrix}\sigma^{2}_{B}\\[3pt] \sigma^{2}_{P}\\ \sigma^{2}_{N}\\ \end{bmatrix} \right)\\[3pt]\]

The family intercepts \(F_{f(p)}\) were considered to follow some population-specific normal distributions, such as: \[F_{f(p)} \sim \mathcal{N}(0,\sigma^{2}_{F_{p}})\]

where \(\sigma^{2}_{F_{p}}\) are the population-specific variances among families.

To partition the total variance, we parameterize our model so that only the total variance, \(\sigma_{tot}^{2}\) has a prior, such that:

\[\begin{equation} \begin{aligned} \sigma_{tot}^{2} & = \sigma_{r}^{2} + \sigma_{B}^{2} + \overline{\sigma_{F_{p}}}^{2} + \sigma_{P}^{2} + \sigma_{N}^{2}\\[3pt] \sigma_{r} & = \sigma_{tot} \times \sqrt(\pi_{r})\\[3pt] \sigma_{B} & = \sigma_{tot} \times \sqrt(\pi_{B})\\[3pt] \sigma_{P} & = \sigma_{tot} \times \sqrt(\pi_{P})\\[3pt] \sigma_{N} & = \sigma_{tot} \times \sqrt(\pi_{N})\\[3pt] \overline{\sigma_{F_{p}}} & = \sigma_{tot} \times \sqrt(\pi_{F})\\[3pt] \sigma_{tot} & \sim \mathcal{S}^{*}(0,1,3) \end{aligned} \end{equation}\]

where \(\overline{\sigma_{F_{p}}}\) and \(\overline{\sigma_{F_{p}}}^{2}\) are the mean of the population-specific among-family standard deviations (\(\sigma_{F_{p}}\)) and variances (\(\sigma^{2}_{F_{p}}\)), respectively, and \(\sum_{l}^{5}\pi_{l}=1\) (using the simplex function in Stan).

The population-specific among-families standard deviations \(\sigma_{F_{p}}\) follow a log-normal distribution with mean \(\overline{\sigma_{F_{p}}}\) and variance \(\sigma^{2}_{K}\), such as:

\[\begin{equation*} \begin{aligned} \sigma_{F_{p}} & \sim \mathcal{LN}\left(\ln(\overline{\sigma_{F_{p}}})-\frac{\sigma^{2}_{K}}{2},\sigma^{2}_{K}\right)\\[3pt] \sigma_{K} & \sim \exp(1)\\[3pt] \end{aligned} \end{equation*}\]

Stan code

Code
model_all_pops_different_nurseries <- stan_model(here("scripts/stan_models/Evolvability_M3_DifferentNurseries.stan"))
recompiling to avoid crashing R session
Code
model_all_pops_one_nursery <- stan_model(here("scripts/stan_models/Evolvability_M3_OneNursery.stan"))
recompiling to avoid crashing R session
Code
model_all_pops_different_nurseries
S4 class stanmodel 'anon_model' coded as follows:
// Estimating population-specific heritabilities and evolvabilities in the common gardens with trees from different nurseries
data {
  int<lower=1> n;                                                              // Number of observations
  vector[n] y;                                                                 // Response variable
  int<lower=0> n_bloc;                                                         // Number of blocks
  int<lower=0> n_pop;                                                          // Number of populations
  int<lower=0> n_fam;                                                          // Number of families
  int<lower=0> n_nurs;                                                         // Number of nurseries
  int<lower=0, upper=n_bloc> bloc[n];                                          // Blocks
  int<lower=0, upper=n_pop> pop[n];                                            // Populations
  int<lower=0, upper=n_pop> which_pop[n_fam];                                  // Populations
  vector[n_pop] pop_mean;                                                      // Mean of the populations
  int<lower=0, upper=n_fam> fam[n];                                            // Families
  int<lower=0, upper=n_nurs> nurs[n];                                          // Nurseries
}
parameters {
  real beta0; // global intercept
  simplex[5] pi;
  real<lower = 0> sigma_tot;
  real<lower = 0> sigma_K;
  vector[n_pop] z_log_sigma_fam;

  vector[n_bloc] z_bloc;
  vector[n_pop] z_pop;
  vector[n_fam] z_fam;
  vector[n_nurs] z_nurs;

}
transformed parameters {
  real R_squared;

  real<lower = 0>  sigma_r;
  real<lower = 0>  sigma_bloc;
  real<lower = 0>  sigma_pop;
  real<lower = 0>  sigma_nurs;

  real mean_sigma_fam;
  vector[n_pop] sigma_fam;

  vector[n_pop] alpha_pop;
  vector[n_fam] alpha_fam;
  vector[n_bloc] alpha_bloc;
  vector[n_nurs] alpha_nurs;

  vector[n] mu; // linear predictor

  // variance partitioning with the simplex pi
  sigma_r = sqrt(pi[1]) * sigma_tot;
  sigma_bloc = sqrt(pi[2]) * sigma_tot;
  sigma_pop = sqrt(pi[3]) * sigma_tot;

  mean_sigma_fam= sqrt(pi[4]) * sigma_tot;
  sigma_fam = exp(log(mean_sigma_fam) - (square(sigma_K)/2)  + z_log_sigma_fam*sigma_K);

  sigma_nurs= sqrt(pi[5]) * sigma_tot;

  alpha_pop = z_pop*sigma_pop;

  for(f in 1:n_fam){
    alpha_fam[f] =  z_fam[f]*sigma_fam[which_pop[f]];
  }

  alpha_bloc = z_bloc*sigma_bloc;
  alpha_nurs = z_nurs*sigma_nurs;

  mu = rep_vector(beta0, n) + alpha_fam[fam] + alpha_bloc[bloc] + alpha_pop[pop] + alpha_nurs[nurs];
  R_squared = 1 - variance(y - mu) / variance(y);
}
model{
  //Priors
  beta0 ~ normal(mean(y),2);
  sigma_tot ~ student_t(3, 0.0, 1.0);

  z_pop ~ std_normal();
  z_bloc ~ std_normal();
  z_fam ~ std_normal();
  z_nurs ~ std_normal();

  z_log_sigma_fam ~ std_normal();
  sigma_K ~ exponential(1);

  // Likelihood
  y ~ normal(mu, sigma_r);
}
generated quantities {
  //Variances
  real<lower=0> sigma2_r;
  real<lower=0> sigma2_pop;
  real<lower=0> sigma2_bloc;
  vector[n_pop] sigma2_fam;
  real<lower=0> sigma2_nurs;

  // Heritability
  vector[n_pop] h2_pop;

  // Evolvability
  vector[n_pop] I_pop;

  // Posterior predictive check
  //vector[n] y_rep;

  // Log-Likelihood (for WAIC/loo computations)
  //vector[n] log_lik;

  // Bayesian R2
  real bayes_R2_res; // residual based R2
  real bayes_R2;     // model based R2, i.e.  using modelled (approximate) residual variance

  sigma2_r = square(sigma_r);
  sigma2_pop = square(sigma_pop);
  sigma2_fam = square(sigma_fam);
  sigma2_nurs = square(sigma_nurs);
  for(p in 1:n_pop)  h2_pop[p] = (4*sigma2_fam[p])/(sigma2_r+sigma2_fam[p]);
  for(p in 1:n_pop)  I_pop[p] = (4*sigma2_fam[p])/(pop_mean[p]^2);// should we divide by the mean(y) of each population?
    sigma2_bloc = square(sigma_bloc);
//  for(i in 1:n)  {
//    y_rep[i] = normal_rng(mu[i], sigma_r);
//    //log_lik[i] = normal_lpdf(y[i]| mu[i], sigma_r); // log probability density function
//  }

  bayes_R2_res = variance(mu) / (variance(mu) + variance(y-mu));
  bayes_R2 = variance(mu) / (variance(mu) + sigma2_r);

} 

Running the models

Code
lapply(ht$codes, function(ht_i){
  lapply(names(sites), function(site_i){
    
    subsubdata <- subdata %>% 
      filter(site==site_i & y_names == ht_i) %>% 
      drop_na(y_values) %>% 
      dplyr::select(-y_names) %>% 
      dplyr::rename(y=y_values) %>% 
      arrange(pop)
    
    if(site_i %in% c("FE","FS")){ 
      model <- model_all_pops_one_nursery
      pars <- c("beta0", "pi", 
                "bayes_R2_res", "bayes_R2",
                "h2_pop","I_pop",
                "alpha_bloc","alpha_pop","alpha_fam",
                "sigma2_r","sigma2_bloc","sigma2_pop","sigma2_fam")
      } else if(site_i == "FW"){
        model <- model_all_pops_different_nurseries
        pars <- c("beta0", "pi",
                  "bayes_R2_res", "bayes_R2",
                  "h2_pop","I_pop",
                  "alpha_bloc","alpha_pop","alpha_fam","alpha_nurs",
                  "sigma2_r","sigma2_bloc","sigma2_pop","sigma2_fam","sigma2_nurs")
        }
    
    stanlist <- compose_data(subsubdata)
    stanlist$which_pop <- as.numeric(as.factor(pull(unique(subsubdata[c("pop","fam")])[,"pop"])))
    stanlist$pop_mean <- subsubdata %>% group_by(pop) %>% summarise(pop_mean=mean(y)) %>% .[["pop_mean"]]
    
    stanfit <- sampling(model,
                        data = stanlist,
                        pars = pars,
                        iter = n_iter, 
                        chains = n_chains, 
                        cores = n_chains,
                        save_warmup = save_warmup,
                        thin = n_thin)
    }) %>% 
    setNames(names(sites)) %>%
    saveRDS(file=here(paste0("outputs/ScotsPine/Evolvability/StanModels/",ht_i,"_FullModel.rds")))})

Model diagnostics

The models have divergent transitions, especially in Inverewe (FW). We can visualize the divergent transitions with pair plots. The divergent transitions correspond to the red crosses on the plots.

Code
# Function to draw pairs plots from MCMC draws
plot_mcmc_pairs <- function(ht_i, site_i, params){
  p <- readRDS(here(paste0("outputs/ScotsPine/Evolvability/StanModels/",ht_i,"_FullModel.rds")))[[site_i]]
  
  mcmc_pairs(as.array(p), 
             np = nuts_params(p), 
             pars = params,
             off_diag_args = list(size = 0.75),
             grid_args = list(top = paste0(ht_i, " in ", sites[[site_i]])))
  }

# Plotting
lapply(ht$codes, function(ht_i){
  lapply(names(sites), function(site_i){
    lapply(list(c("pi[1]","pi[2]","pi[3]","pi[4]"),
                c("h2_pop[1]","I_pop[1]","sigma2_fam[1]")), function(params){
                  plot_mcmc_pairs(ht_i, site_i, params)
                  })
    })
  })
[[1]]
[[1]][[1]]
[[1]][[1]][[1]]


[[1]][[1]][[2]]



[[1]][[2]]
[[1]][[2]][[1]]


[[1]][[2]][[2]]



[[1]][[3]]
[[1]][[3]][[1]]


[[1]][[3]][[2]]




[[2]]
[[2]][[1]]
[[2]][[1]][[1]]


[[2]][[1]][[2]]



[[2]][[2]]
[[2]][[2]][[1]]


[[2]][[2]][[2]]



[[2]][[3]]
[[2]][[3]][[1]]


[[2]][[3]][[2]]

Extract params

Code
lapply(ht$codes, function(ht_i){
  
  readRDS(file=here(paste0("outputs/ScotsPine/Evolvability/StanModels/",ht_i,"_FullModel.rds"))) %>% 
      lapply(function(x){
        
        broom.mixed::tidyMCMC(x,
                              pars=c("I_pop","h2_pop","sigma2_fam"),
                              robust = TRUE, # median and mean absolute deviation
                              ess = F, rhat = F, 
                              conf.int = T, conf.level = conf_level) %>% 
          mutate(pop = rep(sort(unique(subdata$pop)),3)) %>% 
          mutate(term = case_when(grepl("I_pop", term) ~ "I",
                                  grepl("sigma2", term) ~ "sigma2_fam",
                                  grepl("h2", term) ~ "h2"))
      }) %>% 
      setNames(names(sites)) %>% 
      bind_rows(.id = "site") 

    }) %>% 
    setNames(ht$codes) %>% 
    bind_rows(.id = "ht") %>% 
  saveRDS(here("outputs/ScotsPine/Evolvability/mcmc_params_full_model.rds"))
Code
mcmc_params_full_model <- readRDS(here("outputs/ScotsPine/Evolvability/mcmc_params_full_model.rds"))

Comparing estimates

We compare the credible intervals and check the correlations among the \(h^2\), \(I\) and \(\sigma^2_{fam}\) estimates from the population-specific models and the model with all populations.

Code
# Joining the MCMC draws from pop specific models and the full model (with all pops)
list_mcmc_params <- list(mcmc_params_pop_specific_models,
                         mcmc_params_full_model) %>% 
  setNames(c("Population-specific models",
             "One model with all populations")) %>%
  bind_rows(.id = "method") 

# Function to plot the intervals
plot_intervals_bothmodeltypes <- function(ht_i, param, list_mcmc_params){
  list_mcmc_params %>% 
    filter(term == param & ht == ht_i) %>% 
    ggplot(aes(x = estimate, y = pop, xmin = conf.low, xmax = conf.high, color=site, shape=method)) +
    geom_pointinterval(position = position_dodge(width = .6),point_size=3,alpha=0.6,size=5) +
    xlab(param_names[[param]]) +
    ylab("") +
    ggtitle(ht$names[[ht_i]]) +
    theme(axis.text = element_text(size=18),
          title = element_text(size=16),
          panel.grid.minor.x=element_blank(),
          panel.grid.major.x=element_blank())  +
    guides(color=guide_legend(ncol=1)) 

}

# Plot the intervals
lapply(params, function(param){
       lapply(ht$codes, function(ht_i){
         
         plot_intervals_bothmodeltypes(ht_i = ht_i, 
                              param = param,
                              list_mcmc_params = list_mcmc_params)
         })
  })
[[1]]
[[1]][[1]]


[[1]][[2]]



[[2]]
[[2]][[1]]


[[2]][[2]]



[[3]]
[[3]][[1]]


[[3]][[2]]

Code
# Function to plot the correlations
plot_correlations <- function(ht_i, param, list_mcmc_params){
  list_mcmc_params %>% 
    filter(term == param & ht == ht_i) %>% 
    mutate(method_site = paste0(method, " in ", site)) %>% 
    dplyr::select(estimate,method_site,pop) %>% 
    pivot_wider(names_from=method_site,values_from=estimate) %>% 
    dplyr::select(-pop) %>% 
    cor() %>% 
    corrplot(method = 'number',
             type = 'lower', diag = FALSE,
             mar=c(0,0,2,0),
             title=paste0(ht$names[[ht_i]]," - ",param_names[[param]]),
             number.cex=0.9,tl.cex=0.9)
}

# plot the correlations
lapply(params, function(param){
       lapply(ht$codes, function(ht_i){
         
         plot_correlations(ht_i = ht_i,
                           param = param,
                           list_mcmc_params = list_mcmc_params)
         })
  })

The estimation of the narrow-sense heritability incorporates uncertainty in the estimation of the family variance and the residual variance, so it has a higher uncertainty than the evolvability and the family variance. We may consider using the evolvavility or the family variance to compare the populations.

Next step: estimating the genetic parameters with a genomic relationship matrix.

References

Archambeau, Juliette, Marta Benito Garzón, Marina de Miguel, Benjamin Brachi, Frédéric Barraquand, and Santiago C. González-Martínez. 2023. “Reduced Within-Population Quantitative Genetic Variation Is Associated with Climate Harshness in Maritime Pine.” Heredity 131 (1): 68–78. https://doi.org/10.1038/s41437-023-00622-9.