Code
# we load the phenotypic data
<- readRDS(file=here("data/ScotsPine/formatted_CG_data.rds"))
data
<- c("FE" = "Glensaugh (FE)", "FW" = "Inverewe (FW)", "FS" = "Yair (FS)") sites
Height in 2014 and 2020
Goal: Comparing the adaptive potential of the populations.
For that, we estimate the narrow-sense heritability, evolvability and within-population (i.e. family) variance of the populations for height (measured in 2014 and 2020).
We estimate these genetic parameters:
at the species-level (i.e. one estimate for all populations).
at the population-level with population-specific models (i.e. one model per population).
at the population-level with one model for all populations (hereafter referred as ‘full model’), and based on Archambeau et al. (2023).
As heritability and evolvability are specific to the environment, the models are fitted in each common garden separately. We fit models that do not account for the nursery effect for the two common gardens in which trees come from the same nursery: the sites of Yair (FS) in the Scottish borders (in which all trees come from NG) and Glensaugh (FE) in which all trees come from NE (except four trees that come from NG and that we remove for the analyses).
We fit models that do account for the nursery effect in the common garden in which trees come from different nurseries: Inverewe (FW), which contains cohorts of trees raised in each of the three nurseries as follows: 290 grown locally in the NW; 132 grown in the NG; and 82 grown in the NE.
In family-based progeny trials, we assume that the seedlings from each maternal tree are half-siblings (i.e. same mother but each with a different father).
The coefficient of relationship of half-siblings is \(r = 1/4\), and therefore \(\sigma^2_A = 4\sigma^2_{family}\).
The narrow-sense heritability is equal to \(h^2 = \frac{\sigma^2_{A}}{\sigma^2_{P}} = \frac{4\sigma^2_{family}}{\sigma^2_{P}}\) where \(\sigma^2_{A}\) is the additive genetic variance, \(\sigma^2_{P}\) is the phenotypic variance and \(\sigma^2_{f}\) is the family variance.
What is annoying is that when calculating \(h^2\) based on family-based progeny trials (without genomic information), people do not include the same terms in the determinant. See the Short Note: On Estimating Heritability according to practical applications by Coterill in 1986
In our case, we will estimate \(h^2\) as follows:
\[h^2 = \frac{4\sigma^2_{family}}{\sigma^2_{family} + \sigma^2_r}\]
And evolvability as follows:
\[I = \frac{4\sigma^2_{family}}{\text{mean(y)}^2}\] When looking at the genomic data (FormattingScotsPineGenomicData.qmd
report), we found that some pairs of individuals expected to be half-sibs showed a coefficient of relationships higher than 0.7 (i.e. clones) or between 0.354 and 0.707 (i.e. full-sibs). To account for that, we may want to use the “true” relationship coefficients between individuals, that we can get from the genomic data, i.e. use a genomic relationship matrix to estimate \(h^2\).
For details on the data, see the report ExploringFormattingCGdata.qmd
.
# we load the phenotypic data
<- readRDS(file=here("data/ScotsPine/formatted_CG_data.rds"))
data
<- c("FE" = "Glensaugh (FE)", "FW" = "Inverewe (FW)", "FS" = "Yair (FS)") sites
How many NAs for height measurements?
%>% dplyr::select(contains("HA")) %>% sapply(function(x) sum(is.na(x))) data
HA13 HA14 HA15 HA16 HA17 HA18 HA19 HA20
748 57 68 79 70 76 74 83
We analyse height in 2014 and 2020.
<- list(codes = c("HA14","HA20"),
ht names = c("Height in 2014","Height in 2020"),
colors= c("chartreuse","chartreuse4"))
names(ht$names) <- ht$codes
We keep only the colums of interest and we remove the four trees in FE that belong to the NG nursery:
<- data %>%
subdata ::select(FieldSite,Block,PopulationCode,Family,Nursery,all_of(ht$codes)) %>%
dplyr::rename(site=FieldSite,
dplyrbloc=Block,
pop=PopulationCode,
fam=Family,
nurs=Nursery) %>%
filter(!(site=="FE" & nurs =="NG")) %>% # we remove the four trees that come from the nursery NG in the field site FE
pivot_longer(contains("HA"), names_to = "y_names", values_to = "y_values")
%>%
subdata drop_na(y_values) %>%
ggplot(aes(x=y_values)) +
geom_histogram(aes(y=after_stat(density), colour=y_names,fill=y_names),bins = 50, alpha = 0.4) +
geom_density(aes(colour=y_names),alpha=.2,fill="white") +
facet_wrap(~site,scales="free", labeller = labeller(site = sites)) +
scale_fill_manual(name = "Trait", values = ht$colors, labels = ht$names) +
scale_color_manual(name = "Trait", values = ht$colors, labels = ht$names) +
ylab("Density") + xlab("Height values") +
theme_bw() +
theme(legend.position = c(0.9,0.9))
Options for the Stan models across the document:
# Sampling in Bayesian models
<- 4 # number of chains (MCMC)
n_chains <- 2500 # number of iterations
n_iter <- 1250 # number of iterations in the warm-up phase
n_warm <- 1 # thinning interval
n_thin <- FALSE
save_warmup
# Credible intervals
<- 0.95 conf_level
In this section, we estimate evolvability (\(I\)) and narrow-sense heritability (\(h^2\)) at the species level, i.e. one value for all populations.
Here is the mathematical model for a common garden with different nurseries. The mathematical model for a common garden with trees coming from one nursery is the same, without the nursery intercepts.
We model each trait \(y\) such as:
\[\begin{equation} \begin{aligned} y_{bpfnr} & \sim \mathcal{N}(\mu_{bpfn},\sigma^{2}_{r})\\[3pt] \mu_{bpc} & = \beta_{0} + B_{b} + P_{p} + F_{f} + N_{n} \\[3pt] \beta_{0} & \sim \mathcal{N}(\mu_{y},2)\\[3pt] \begin{bmatrix} B_{b} \\ P_{p} \\ F_{f} \\ N_{n} \end{bmatrix} & \sim \mathcal{N}\left(0, \begin{bmatrix} \sigma^{2}_{B}\\[3pt] \sigma^{2}_{P}\\[3pt] \sigma^{2}_{F}\\[3pt] \sigma^{2}_{N}\\[3pt] \end{bmatrix} \right)\\ \end{aligned} \end{equation}\]
\(\beta_{0}\) is the global intercept. \(B_{b}\) are the block intercepts with a variance \(\sigma^{2}_{B}\). \(P_{p}\) are the population intercepts with a variance \(\sigma^{2}_{P}\). \(F_{f}\) are the family intercepts with a variance \(\sigma^{2}_{F}\). \(N_{n}\) are the nursery intercepts with a variance \(\sigma^{2}_{N}\). \(\sigma^{2}_{r}\) is the residual variance.
Partitioning of the total variance \(\sigma_{tot}^{2}\):
\[\begin{equation} \begin{aligned} \sigma_{tot}^{2} & = \sigma_{r}^{2} + \sigma_{B}^{2} + \sigma_{P}^{2} + \sigma_{F}^{2} + \sigma_{N}^{2} \\[3pt] \sigma_{r} & = \sigma_{tot} \times \sqrt(\pi_{r})\\[3pt] \sigma_{B} & = \sigma_{tot} \times \sqrt(\pi_{B})\\[3pt] \sigma_{P} & = \sigma_{tot} \times \sqrt(\pi_{P})\\[3pt] \sigma_{F} & = \sigma_{tot} \times \sqrt(\pi_{F})\\[3pt] \sigma_{N} & = \sigma_{tot} \times \sqrt(\pi_{N})\\[3pt] \sigma_{tot} & \sim \mathcal{S}^{*}(0,1,3) \end{aligned} \end{equation}\]
with \(\sum_{l}^{5}\pi_{l}=1\).
Estimation of \(h^2\) (narrow-sense heritability) and \(I\) (evolvability).
We assume that the seedlings from each maternal tree are half-siblings (i.e. same mother but each with a different father), so \(\sigma^2_A = 4\sigma^2_{family}\), and so \(h^2 = \frac{4\sigma^2_{family}}{\sigma^2_{family} + \sigma^2_r}\) and \(I = \frac{4\sigma^2_{family}}{\text{mean(y)}^2}\).
Comment: in some papers, I’ve seen \(h^2 = \frac{4\sigma^2_{family}}{\sigma^2_{family} + \sigma^2_{population} + \sigma^2_r}\). Which \(h^2\) estimation should we use?
<- stan_model(here("scripts/stan_models/Evolvability_M1_DifferentNurseries.stan")) model_different_nurseries
recompiling to avoid crashing R session
<- stan_model(here("scripts/stan_models/Evolvability_M1_OneNursery.stan")) model_one_nursery
recompiling to avoid crashing R session
model_different_nurseries
S4 class stanmodel 'anon_model' coded as follows:
// Variance partitioning in the common gardens with trees from different nurseries
data {
int<lower=1> n; // Number of observations
vector[n] y; // Response variable
int<lower=0> n_bloc; // Number of blocks
int<lower=0> n_pop; // Number of populations
int<lower=0> n_fam; // Number of families
int<lower=0> n_nurs; // Number of nurseries
int<lower=0, upper=n_bloc> bloc[n]; // Blocks
int<lower=0, upper=n_pop> pop[n]; // Populations
int<lower=0, upper=n_fam> fam[n]; // Families
int<lower=0, upper=n_nurs> nurs[n]; // Nurseries
}
parameters {
real beta0; // global intercept
simplex[5] pi;
real<lower = 0> sigma_tot;
vector[n_bloc] z_block;
vector[n_pop] z_pop;
vector[n_fam] z_fam;
vector[n_nurs] z_nurs;
}
transformed parameters {
real R_squared;
real<lower = 0> sigma_r;
real<lower = 0> sigma_block;
real<lower = 0> sigma_pop;
real<lower = 0> sigma_fam;
real<lower = 0> sigma_nurs;
vector[n_bloc] alpha_block;
vector[n_pop] alpha_pop;
vector[n_fam] alpha_fam;
vector[n_nurs] alpha_nurs;
vector[n] mu; // linear predictor
// variance partitioning with the simplex pi
sigma_r = sqrt(pi[1]) * sigma_tot;
sigma_block = sqrt(pi[2]) * sigma_tot;
sigma_pop = sqrt(pi[3]) * sigma_tot;
sigma_fam= sqrt(pi[4]) * sigma_tot;
sigma_nurs= sqrt(pi[5]) * sigma_tot;
alpha_block = z_block*sigma_block;
alpha_pop = z_pop*sigma_pop;
alpha_fam = z_fam*sigma_fam;
alpha_nurs = z_nurs*sigma_nurs;
mu = rep_vector(beta0, n) + alpha_block[bloc] + alpha_pop[pop] + alpha_fam[fam] + alpha_nurs[nurs];
R_squared = 1 - variance(y - mu) / variance(y);
}
model{
//Priors
beta0 ~ normal(mean(y),20);
sigma_tot ~ student_t(3, 0.0, 1.0);
z_block ~ std_normal();
z_pop ~ std_normal();
z_fam ~ std_normal();
z_nurs ~ std_normal();
// Likelihood
y ~ normal(mu, sigma_r);
}
generated quantities {
//Variances, narrow-sense heritability and evolvability
real<lower=0> sigma2_tot;
real<lower=0> sigma2_r;
real<lower=0> sigma2_block;
real<lower=0> sigma2_pop;
real<lower=0> sigma2_fam;
real<lower=0> sigma2_nurs;
real<lower=0> h2;
real<lower=0> I;
sigma2_tot = square(sigma_tot);
sigma2_r = square(sigma_r);
sigma2_block = square(sigma_block);
sigma2_pop = square(sigma_pop);
sigma2_fam = square(sigma_fam);
sigma2_nurs = square(sigma_nurs);
h2 = (4*sigma2_fam)/(sigma2_r+sigma2_fam);
I = (4*sigma2_fam)/(mean(y)^2);
}
lapply(ht$codes, function(ht_i){
lapply(names(sites), function(site_i){
<- subdata %>%
subsubdata filter(site==site_i & y_names == ht_i) %>%
drop_na(y_values) %>%
::select(-y_names) %>%
dplyr::rename(y=y_values)
dplyr
if(site_i %in% c("FE","FS")){
<- model_one_nursery
model <- c("beta0", "pi", "R_squared", "h2","I",
pars "alpha_block","alpha_pop","alpha_fam",
"sigma2_r","sigma2_block","sigma2_pop","sigma2_fam","sigma2_tot")
else if(site_i == "FW"){
} <- model_different_nurseries
model <- c("beta0", "pi", "R_squared", "h2","I",
pars "alpha_block","alpha_pop","alpha_fam","alpha_nurs",
"sigma2_r","sigma2_block","sigma2_pop","sigma2_fam","sigma2_nurs","sigma2_tot")
}
<- sampling(model,
stanfit data = compose_data(subsubdata),
pars=pars,
iter = n_iter,
chains = n_chains,
cores = n_chains,
save_warmup = save_warmup,
thin=n_thin)
%>%
}) setNames(names(sites)) %>%
saveRDS(file=here(paste0("outputs/ScotsPine/Evolvability/StanModels/",ht_i,"_SpeciesLevelModels.rds")))
})
# Function to plot the intervals
<- function(ht_i,plot_title){
plot_intervals
readRDS(file=here(paste0("outputs/ScotsPine/Evolvability/StanModels/",ht_i,"_SpeciesLevelModels.rds"))) %>%
lapply(function(x){
::tidyMCMC(x,pars=("pi"),
broom.mixeddroppars = NULL,
ess = F, rhat = F,
conf.int = T, conf.level = conf_level)
%>%
}) bind_rows(.id = "site") %>%
mutate(prop_var = case_when(term == "pi[1]" ~ "Residuals",
== "pi[2]" ~ "Blocks",
term == "pi[3]" ~ "Populations",
term == "pi[4]" ~ "Families",
term == "pi[5]" ~ "Nurseries")) %>%
term ggplot(aes(y = prop_var, x = estimate, xmin = conf.low, xmax = conf.high, color=site)) +
geom_pointinterval(position = position_dodge(width = .4),point_size=3,alpha=0.6,size=5) +
ylab("") +
xlab("Proportion of variance explained") +
ggtitle(plot_title) +
labs(color = "Field sites") +
theme(axis.text = element_text(size=16),
title = element_text(size=14),
panel.grid.minor.y=element_blank(),
panel.grid.major.y=element_blank()) +
guides(color=guide_legend(ncol=1))
}
plot_intervals(ht_i = "HA14", plot_title = "Height in 2014")
plot_intervals(ht_i = "HA20", plot_title = "Height in 2020")
<- function(param, x_axis){
plot_intervals_splevel
lapply(ht$codes, function(ht_i){
readRDS(file=here(paste0("outputs/ScotsPine/Evolvability/StanModels/",ht_i,"_SpeciesLevelModels.rds"))) %>%
lapply(function(x){
::tidyMCMC(x,pars=(param),
broom.mixeddroppars = NULL,
ess = F, rhat = F,
conf.int = T, conf.level = conf_level)
%>%
}) bind_rows(.id = "site") }) %>%
setNames(ht$codes) %>%
bind_rows(.id="ht") %>%
ggplot(aes(y = site, x = estimate, xmin = conf.low, xmax = conf.high, color=ht)) +
geom_pointinterval(position = position_dodge(width = .4),point_size=3,alpha=0.6,size=5) +
ylab("") +
xlab(x_axis) +
scale_color_manual(name = "", values = ht$colors, labels = ht$names) +
theme(axis.text = element_text(size=16),
legend.text = element_text(size=14),
title = element_text(size=14),
panel.grid.minor.y=element_blank(),
panel.grid.major.y=element_blank()) +
guides(color=guide_legend(ncol=1))
}
plot_intervals_splevel(param="h2", x_axis = "Narrow-sense heritability")
plot_intervals_splevel(param="I", x_axis = "Evolvability")
In this section, we estimate the evolvability (\(I\)) and narrow-sense heritability (\(h^2\)) of each population.
# Parameters of interest
<-c("I","h2","sigma2_fam")
params <- c("Evolvability", "Narrow-sense heritability", "Within-family variance")
param_names names(param_names) <- params
We first fit the models on the entire dataset.
We fit one model per population.
Here is the model equation for a common garden with different nurseries. The model equation for a common garden with trees coming from one nursery is the same, without the nursery intercepts.
We model each trait \(y\) such as:
\[\begin{equation} \begin{aligned} y_{bfnr} & \sim \mathcal{N}(\mu_{bfn},\sigma^{2}_{r})\\[3pt] \mu_{bpc} & = \beta_{0} + B_{b} + F_{f} + N_{n} \\[3pt] \beta_{0} & \sim \mathcal{N}(\mu_{y},2)\\[3pt] \begin{bmatrix} B_{b} \\ F_{f} \\ N_{n} \end{bmatrix} & \sim \mathcal{N}\left(0, \begin{bmatrix} \sigma^{2}_{B}\\[3pt] \sigma^{2}_{F}\\[3pt] \sigma^{2}_{N}\\[3pt] \end{bmatrix} \right)\\ \end{aligned} \end{equation}\]
\(\beta_{0}\) is the global intercept. \(B_{b}\) are the block intercepts with a variance \(\sigma^{2}_{B}\). \(F_{f}\) are the family intercepts with a variance \(\sigma^{2}_{F}\). \(N_{n}\) are the nursery intercepts with a variance \(\sigma^{2}_{N}\). \(\sigma^{2}_{r}\) is the residual variance.
Partitioning of the total variance \(\sigma_{tot}^{2}\):
\[\begin{equation} \begin{aligned} \sigma_{tot}^{2} & = \sigma_{r}^{2} + \sigma_{B}^{2} + \sigma_{F}^{2} + \sigma_{N}^{2} \\[3pt] \sigma_{r} & = \sigma_{tot} \times \sqrt(\pi_{r})\\[3pt] \sigma_{B} & = \sigma_{tot} \times \sqrt(\pi_{B})\\[3pt] \sigma_{F} & = \sigma_{tot} \times \sqrt(\pi_{F})\\[3pt] \sigma_{N} & = \sigma_{tot} \times \sqrt(\pi_{N})\\[3pt] \sigma_{tot} & \sim \mathcal{S}^{*}(0,1,3) \end{aligned} \end{equation}\]
with \(\sum_{l}^{4}\pi_{l}=1\).
<- stan_model(here("scripts/stan_models/Evolvability_M2_DifferentNurseries.stan"))
model_pop_specific_different_nurseries <- stan_model(here("scripts/stan_models/Evolvability_M2_OneNursery.stan"))
model_pop_specific_one_nursery model_pop_specific_different_nurseries
S4 class stanmodel 'anon_model' coded as follows:
// Variance partitioning in one population in the common gardens with trees from different nurseries
data {
int<lower=1> n; // Number of observations
vector[n] y; // Response variable
int<lower=0> n_bloc; // Number of blocks
int<lower=0> n_fam; // Number of families
int<lower=0> n_nurs; // Number of nurseries
int<lower=0, upper=n_bloc> bloc[n]; // Blocks
int<lower=0, upper=n_fam> fam[n]; // Families
int<lower=0, upper=n_nurs> nurs[n]; // Nurseries
}
parameters {
real beta0; // global intercept
simplex[4] pi;
real<lower = 0> sigma_tot;
vector[n_bloc] z_block;
vector[n_fam] z_fam;
vector[n_nurs] z_nurs;
}
transformed parameters {
real R_squared;
real<lower = 0> sigma_r;
real<lower = 0> sigma_block;
real<lower = 0> sigma_fam;
real<lower = 0> sigma_nurs;
vector[n_bloc] alpha_block;
vector[n_fam] alpha_fam;
vector[n_nurs] alpha_nurs;
vector[n] mu; // linear predictor
// variance partitioning with the simplex pi
sigma_r = sqrt(pi[1]) * sigma_tot;
sigma_block = sqrt(pi[2]) * sigma_tot;
sigma_fam= sqrt(pi[3]) * sigma_tot;
sigma_nurs= sqrt(pi[4]) * sigma_tot;
alpha_block = z_block*sigma_block;
alpha_fam = z_fam*sigma_fam;
alpha_nurs = z_nurs*sigma_nurs;
mu = rep_vector(beta0, n) + alpha_block[bloc] + alpha_fam[fam] + alpha_nurs[nurs];
R_squared = 1 - variance(y - mu) / variance(y);
}
model{
//Priors
beta0 ~ normal(mean(y),20);
sigma_tot ~ student_t(3, 0.0, 1.0);
z_block ~ std_normal();
z_fam ~ std_normal();
z_nurs ~ std_normal();
// Likelihood
y ~ normal(mu, sigma_r);
}
generated quantities {
//Variances, narrow-sense heritability and evolvability
real<lower=0> sigma2_tot;
real<lower=0> sigma2_r;
real<lower=0> sigma2_block;
real<lower=0> sigma2_fam;
real<lower=0> sigma2_nurs;
real<lower=0> h2;
real<lower=0> I;
sigma2_tot = square(sigma_tot);
sigma2_r = square(sigma_r);
sigma2_block = square(sigma_block);
sigma2_fam = square(sigma_fam);
sigma2_nurs = square(sigma_nurs);
h2 = (4*sigma2_fam)/(sigma2_r+sigma2_fam);
I = (4*sigma2_fam)/(mean(y)^2);
}
lapply(ht$codes, function(ht_i){
lapply(names(sites), function(site_i){
<- subdata %>%
subsubdata filter(site==site_i & y_names == ht_i) %>%
drop_na(y_values) %>%
::select(-y_names) %>%
dplyr::rename(y=y_values) %>%
dplyrarrange(pop)
lapply(unique(subsubdata$pop), function(pop_i){
<- subsubdata %>%
subpop filter(pop==pop_i)
if(site_i %in% c("FE","FS")){
<- model_pop_specific_one_nursery
model <- c("pi", "h2", "I", #"beta0", "R_squared",
pars "alpha_block", "alpha_fam",
"sigma2_r", "sigma2_block", "sigma2_fam", "sigma2_tot")
else if(site_i == "FW"){
} <- model_pop_specific_different_nurseries
model <- c("pi", "h2", "I", #"beta0", "R_squared",
pars "alpha_block", "alpha_fam", "alpha_nurs",
"sigma2_r", "sigma2_block", "sigma2_fam", "sigma2_nurs", "sigma2_tot")
}
<- sampling(model,
stanfit data = compose_data(subpop),
pars=pars,
iter = n_iter,
chains = n_chains,
cores = n_chains,
save_warmup = save_warmup,
thin=n_thin)
%>%
}) setNames(unique(subsubdata$pop)) %>%
saveRDS(file=here(paste0("outputs/ScotsPine/Evolvability/StanModels/",ht_i,"_",site_i,"_PopSpecificModels.rds")))
}) })
lapply(ht$codes, function(ht_i){
lapply(names(sites), function(site_i){
readRDS(file=here(paste0("outputs/ScotsPine/Evolvability/StanModels/",ht_i,"_",site_i,"_PopSpecificModels.rds"))) %>%
lapply(function(pop_i){
::tidyMCMC(pop_i,
broom.mixedpars=params,
robust = TRUE, # median and mean absolute deviation
ess = F, rhat = F,
conf.int = T, conf.level = conf_level)
%>%
}) bind_rows(.id = "pop")
%>%
}) setNames(names(sites)) %>%
bind_rows(.id = "site")
%>%
}) setNames(ht$codes) %>%
bind_rows(.id = "ht") %>%
saveRDS(here("outputs/ScotsPine/Evolvability/mcmc_params_pop_specific_models.rds"))
<- readRDS(here("outputs/ScotsPine/Evolvability/mcmc_params_pop_specific_models.rds")) mcmc_params_pop_specific_models
<- function(ht_i, mcmc_draws, param, x_axis, plot_title){
plot_intervals_poplevel
%>%
mcmc_draws filter(term == param & ht == ht_i) %>%
ggplot(aes(y = pop, x = estimate, xmin = conf.low, xmax = conf.high,color=site)) + # invert x and y to get the populations on the x-axis
geom_pointinterval(position = position_dodge(width = .4),point_size=3,alpha=0.6,size=5) +
ylab("") +
xlab(x_axis) +
ggtitle(plot_title) +
theme(axis.text = element_text(size=16),
legend.text = element_text(size=16),
panel.grid.minor.y=element_blank(),
panel.grid.major.y=element_blank()) +
guides(color=guide_legend(ncol=1))
}
plot_intervals_poplevel(ht_i = "HA14",
mcmc_draws = mcmc_params_pop_specific_models,
param = "h2",
x_axis = "Narrow-sense heritability",
plot_title = "Height in 2014")
plot_intervals_poplevel(ht_i = "HA20",
mcmc_draws = mcmc_params_pop_specific_models,
param = "h2",
x_axis = "Narrow-sense heritability",
plot_title = "Height in 2020")
plot_intervals_poplevel(ht_i = "HA14",
mcmc_draws = mcmc_params_pop_specific_models,
param = "I",
x_axis = "Evolvability",
plot_title = "Height in 2014")
plot_intervals_poplevel(ht_i = "HA20",
mcmc_draws = mcmc_params_pop_specific_models,
param = "I",
x_axis = "Evolvability",
plot_title = "Height in 2020")
plot_intervals_poplevel(ht_i = "HA14",
mcmc_draws = mcmc_params_pop_specific_models,
param = "sigma2_fam",
x_axis = "Within-family variance",
plot_title = "Height in 2014")
plot_intervals_poplevel(ht_i = "HA20",
mcmc_draws = mcmc_params_pop_specific_models,
param = "sigma2_fam",
x_axis = "Within-family variance",
plot_title = "Height in 2020")
Here is the model equation for a common garden with different nurseries.
We modeled each trait \(y_{bpfr}\) such as:
\[\begin{equation} \begin{aligned} y_{bpfnr} & \sim \mathcal{N}(\mu_{bpfn},\sigma^{2}_{r})\\[3pt] \mu_{bpfn} & = \beta_{0} + B_{b} + P_{p} + F_{f(p)} + N_{n}\\[3pt] \end{aligned} \end{equation}\]
\(\beta_{0}\) is the global intercept. \(B_{b}\) are the block intercepts with a variance \(\sigma^{2}_{B}\). \(P_{p}\) are the population intercepts with a variance \(\sigma^{2}_{P}\). \(F_{f}\) are the family intercepts with a variance \(\sigma^{2}_{F}\). \(N_{n}\) are the nursery intercepts with a variance \(\sigma^{2}_{N}\). \(\sigma^{2}_{r}\) is the residual variance.
The prior of \(\beta_{0}\) was weakly informative and centered around the mean of the observed values for the trait under considered, as follows:
\[\beta_{0} \sim \mathcal{N}(\mu_{y},20)\]
The population, nursery and block intercepts, \(P_{p}\), \(N_{n}\) and \(B_{b}\) were considered normally-distributed with variances \(\sigma^{2}_{P}\), \(\sigma^{2}_{N}\) and \(\sigma^{2}_{B}\), such as:
\[\begin{bmatrix} B_{b} \\ P_{p} \\ N_{n} \end{bmatrix} \sim \mathcal{N}\left(0, \begin{bmatrix}\sigma^{2}_{B}\\[3pt] \sigma^{2}_{P}\\ \sigma^{2}_{N}\\ \end{bmatrix} \right)\\[3pt]\]
The family intercepts \(F_{f(p)}\) were considered to follow some population-specific normal distributions, such as: \[F_{f(p)} \sim \mathcal{N}(0,\sigma^{2}_{F_{p}})\]
where \(\sigma^{2}_{F_{p}}\) are the population-specific variances among families.
To partition the total variance, we parameterize our model so that only the total variance, \(\sigma_{tot}^{2}\) has a prior, such that:
\[\begin{equation} \begin{aligned} \sigma_{tot}^{2} & = \sigma_{r}^{2} + \sigma_{B}^{2} + \overline{\sigma_{F_{p}}}^{2} + \sigma_{P}^{2} + \sigma_{N}^{2}\\[3pt] \sigma_{r} & = \sigma_{tot} \times \sqrt(\pi_{r})\\[3pt] \sigma_{B} & = \sigma_{tot} \times \sqrt(\pi_{B})\\[3pt] \sigma_{P} & = \sigma_{tot} \times \sqrt(\pi_{P})\\[3pt] \sigma_{N} & = \sigma_{tot} \times \sqrt(\pi_{N})\\[3pt] \overline{\sigma_{F_{p}}} & = \sigma_{tot} \times \sqrt(\pi_{F})\\[3pt] \sigma_{tot} & \sim \mathcal{S}^{*}(0,1,3) \end{aligned} \end{equation}\]
where \(\overline{\sigma_{F_{p}}}\) and \(\overline{\sigma_{F_{p}}}^{2}\) are the mean of the population-specific among-family standard deviations (\(\sigma_{F_{p}}\)) and variances (\(\sigma^{2}_{F_{p}}\)), respectively, and \(\sum_{l}^{5}\pi_{l}=1\) (using the simplex
function in Stan
).
The population-specific among-families standard deviations \(\sigma_{F_{p}}\) follow a log-normal distribution with mean \(\overline{\sigma_{F_{p}}}\) and variance \(\sigma^{2}_{K}\), such as:
\[\begin{equation*} \begin{aligned} \sigma_{F_{p}} & \sim \mathcal{LN}\left(\ln(\overline{\sigma_{F_{p}}})-\frac{\sigma^{2}_{K}}{2},\sigma^{2}_{K}\right)\\[3pt] \sigma_{K} & \sim \exp(1)\\[3pt] \end{aligned} \end{equation*}\]
<- stan_model(here("scripts/stan_models/Evolvability_M3_DifferentNurseries.stan")) model_all_pops_different_nurseries
recompiling to avoid crashing R session
<- stan_model(here("scripts/stan_models/Evolvability_M3_OneNursery.stan")) model_all_pops_one_nursery
recompiling to avoid crashing R session
model_all_pops_different_nurseries
S4 class stanmodel 'anon_model' coded as follows:
// Estimating population-specific heritabilities and evolvabilities in the common gardens with trees from different nurseries
data {
int<lower=1> n; // Number of observations
vector[n] y; // Response variable
int<lower=0> n_bloc; // Number of blocks
int<lower=0> n_pop; // Number of populations
int<lower=0> n_fam; // Number of families
int<lower=0> n_nurs; // Number of nurseries
int<lower=0, upper=n_bloc> bloc[n]; // Blocks
int<lower=0, upper=n_pop> pop[n]; // Populations
int<lower=0, upper=n_pop> which_pop[n_fam]; // Populations
vector[n_pop] pop_mean; // Mean of the populations
int<lower=0, upper=n_fam> fam[n]; // Families
int<lower=0, upper=n_nurs> nurs[n]; // Nurseries
}
parameters {
real beta0; // global intercept
simplex[5] pi;
real<lower = 0> sigma_tot;
real<lower = 0> sigma_K;
vector[n_pop] z_log_sigma_fam;
vector[n_bloc] z_bloc;
vector[n_pop] z_pop;
vector[n_fam] z_fam;
vector[n_nurs] z_nurs;
}
transformed parameters {
real R_squared;
real<lower = 0> sigma_r;
real<lower = 0> sigma_bloc;
real<lower = 0> sigma_pop;
real<lower = 0> sigma_nurs;
real mean_sigma_fam;
vector[n_pop] sigma_fam;
vector[n_pop] alpha_pop;
vector[n_fam] alpha_fam;
vector[n_bloc] alpha_bloc;
vector[n_nurs] alpha_nurs;
vector[n] mu; // linear predictor
// variance partitioning with the simplex pi
sigma_r = sqrt(pi[1]) * sigma_tot;
sigma_bloc = sqrt(pi[2]) * sigma_tot;
sigma_pop = sqrt(pi[3]) * sigma_tot;
mean_sigma_fam= sqrt(pi[4]) * sigma_tot;
sigma_fam = exp(log(mean_sigma_fam) - (square(sigma_K)/2) + z_log_sigma_fam*sigma_K);
sigma_nurs= sqrt(pi[5]) * sigma_tot;
alpha_pop = z_pop*sigma_pop;
for(f in 1:n_fam){
alpha_fam[f] = z_fam[f]*sigma_fam[which_pop[f]];
}
alpha_bloc = z_bloc*sigma_bloc;
alpha_nurs = z_nurs*sigma_nurs;
mu = rep_vector(beta0, n) + alpha_fam[fam] + alpha_bloc[bloc] + alpha_pop[pop] + alpha_nurs[nurs];
R_squared = 1 - variance(y - mu) / variance(y);
}
model{
//Priors
beta0 ~ normal(mean(y),2);
sigma_tot ~ student_t(3, 0.0, 1.0);
z_pop ~ std_normal();
z_bloc ~ std_normal();
z_fam ~ std_normal();
z_nurs ~ std_normal();
z_log_sigma_fam ~ std_normal();
sigma_K ~ exponential(1);
// Likelihood
y ~ normal(mu, sigma_r);
}
generated quantities {
//Variances
real<lower=0> sigma2_r;
real<lower=0> sigma2_pop;
real<lower=0> sigma2_bloc;
vector[n_pop] sigma2_fam;
real<lower=0> sigma2_nurs;
// Heritability
vector[n_pop] h2_pop;
// Evolvability
vector[n_pop] I_pop;
// Posterior predictive check
//vector[n] y_rep;
// Log-Likelihood (for WAIC/loo computations)
//vector[n] log_lik;
// Bayesian R2
real bayes_R2_res; // residual based R2
real bayes_R2; // model based R2, i.e. using modelled (approximate) residual variance
sigma2_r = square(sigma_r);
sigma2_pop = square(sigma_pop);
sigma2_fam = square(sigma_fam);
sigma2_nurs = square(sigma_nurs);
for(p in 1:n_pop) h2_pop[p] = (4*sigma2_fam[p])/(sigma2_r+sigma2_fam[p]);
for(p in 1:n_pop) I_pop[p] = (4*sigma2_fam[p])/(pop_mean[p]^2);// should we divide by the mean(y) of each population?
sigma2_bloc = square(sigma_bloc);
// for(i in 1:n) {
// y_rep[i] = normal_rng(mu[i], sigma_r);
// //log_lik[i] = normal_lpdf(y[i]| mu[i], sigma_r); // log probability density function
// }
bayes_R2_res = variance(mu) / (variance(mu) + variance(y-mu));
bayes_R2 = variance(mu) / (variance(mu) + sigma2_r);
}
lapply(ht$codes, function(ht_i){
lapply(names(sites), function(site_i){
<- subdata %>%
subsubdata filter(site==site_i & y_names == ht_i) %>%
drop_na(y_values) %>%
::select(-y_names) %>%
dplyr::rename(y=y_values) %>%
dplyrarrange(pop)
if(site_i %in% c("FE","FS")){
<- model_all_pops_one_nursery
model <- c("beta0", "pi",
pars "bayes_R2_res", "bayes_R2",
"h2_pop","I_pop",
"alpha_bloc","alpha_pop","alpha_fam",
"sigma2_r","sigma2_bloc","sigma2_pop","sigma2_fam")
else if(site_i == "FW"){
} <- model_all_pops_different_nurseries
model <- c("beta0", "pi",
pars "bayes_R2_res", "bayes_R2",
"h2_pop","I_pop",
"alpha_bloc","alpha_pop","alpha_fam","alpha_nurs",
"sigma2_r","sigma2_bloc","sigma2_pop","sigma2_fam","sigma2_nurs")
}
<- compose_data(subsubdata)
stanlist $which_pop <- as.numeric(as.factor(pull(unique(subsubdata[c("pop","fam")])[,"pop"])))
stanlist$pop_mean <- subsubdata %>% group_by(pop) %>% summarise(pop_mean=mean(y)) %>% .[["pop_mean"]]
stanlist
<- sampling(model,
stanfit data = stanlist,
pars = pars,
iter = n_iter,
chains = n_chains,
cores = n_chains,
save_warmup = save_warmup,
thin = n_thin)
%>%
}) setNames(names(sites)) %>%
saveRDS(file=here(paste0("outputs/ScotsPine/Evolvability/StanModels/",ht_i,"_FullModel.rds")))})
The models have divergent transitions, especially in Inverewe (FW). We can visualize the divergent transitions with pair plots. The divergent transitions correspond to the red crosses on the plots.
# Function to draw pairs plots from MCMC draws
<- function(ht_i, site_i, params){
plot_mcmc_pairs <- readRDS(here(paste0("outputs/ScotsPine/Evolvability/StanModels/",ht_i,"_FullModel.rds")))[[site_i]]
p
mcmc_pairs(as.array(p),
np = nuts_params(p),
pars = params,
off_diag_args = list(size = 0.75),
grid_args = list(top = paste0(ht_i, " in ", sites[[site_i]])))
}
# Plotting
lapply(ht$codes, function(ht_i){
lapply(names(sites), function(site_i){
lapply(list(c("pi[1]","pi[2]","pi[3]","pi[4]"),
c("h2_pop[1]","I_pop[1]","sigma2_fam[1]")), function(params){
plot_mcmc_pairs(ht_i, site_i, params)
})
}) })
[[1]]
[[1]][[1]]
[[1]][[1]][[1]]
[[1]][[1]][[2]]
[[1]][[2]]
[[1]][[2]][[1]]
[[1]][[2]][[2]]
[[1]][[3]]
[[1]][[3]][[1]]
[[1]][[3]][[2]]
[[2]]
[[2]][[1]]
[[2]][[1]][[1]]
[[2]][[1]][[2]]
[[2]][[2]]
[[2]][[2]][[1]]
[[2]][[2]][[2]]
[[2]][[3]]
[[2]][[3]][[1]]
[[2]][[3]][[2]]
lapply(ht$codes, function(ht_i){
readRDS(file=here(paste0("outputs/ScotsPine/Evolvability/StanModels/",ht_i,"_FullModel.rds"))) %>%
lapply(function(x){
::tidyMCMC(x,
broom.mixedpars=c("I_pop","h2_pop","sigma2_fam"),
robust = TRUE, # median and mean absolute deviation
ess = F, rhat = F,
conf.int = T, conf.level = conf_level) %>%
mutate(pop = rep(sort(unique(subdata$pop)),3)) %>%
mutate(term = case_when(grepl("I_pop", term) ~ "I",
grepl("sigma2", term) ~ "sigma2_fam",
grepl("h2", term) ~ "h2"))
%>%
}) setNames(names(sites)) %>%
bind_rows(.id = "site")
%>%
}) setNames(ht$codes) %>%
bind_rows(.id = "ht") %>%
saveRDS(here("outputs/ScotsPine/Evolvability/mcmc_params_full_model.rds"))
<- readRDS(here("outputs/ScotsPine/Evolvability/mcmc_params_full_model.rds")) mcmc_params_full_model
We compare the credible intervals and check the correlations among the \(h^2\), \(I\) and \(\sigma^2_{fam}\) estimates from the population-specific models and the model with all populations.
# Joining the MCMC draws from pop specific models and the full model (with all pops)
<- list(mcmc_params_pop_specific_models,
list_mcmc_params %>%
mcmc_params_full_model) setNames(c("Population-specific models",
"One model with all populations")) %>%
bind_rows(.id = "method")
# Function to plot the intervals
<- function(ht_i, param, list_mcmc_params){
plot_intervals_bothmodeltypes %>%
list_mcmc_params filter(term == param & ht == ht_i) %>%
ggplot(aes(x = estimate, y = pop, xmin = conf.low, xmax = conf.high, color=site, shape=method)) +
geom_pointinterval(position = position_dodge(width = .6),point_size=3,alpha=0.6,size=5) +
xlab(param_names[[param]]) +
ylab("") +
ggtitle(ht$names[[ht_i]]) +
theme(axis.text = element_text(size=18),
title = element_text(size=16),
panel.grid.minor.x=element_blank(),
panel.grid.major.x=element_blank()) +
guides(color=guide_legend(ncol=1))
}
# Plot the intervals
lapply(params, function(param){
lapply(ht$codes, function(ht_i){
plot_intervals_bothmodeltypes(ht_i = ht_i,
param = param,
list_mcmc_params = list_mcmc_params)
}) })
[[1]]
[[1]][[1]]
[[1]][[2]]
[[2]]
[[2]][[1]]
[[2]][[2]]
[[3]]
[[3]][[1]]
[[3]][[2]]
# Function to plot the correlations
<- function(ht_i, param, list_mcmc_params){
plot_correlations %>%
list_mcmc_params filter(term == param & ht == ht_i) %>%
mutate(method_site = paste0(method, " in ", site)) %>%
::select(estimate,method_site,pop) %>%
dplyrpivot_wider(names_from=method_site,values_from=estimate) %>%
::select(-pop) %>%
dplyrcor() %>%
corrplot(method = 'number',
type = 'lower', diag = FALSE,
mar=c(0,0,2,0),
title=paste0(ht$names[[ht_i]]," - ",param_names[[param]]),
number.cex=0.9,tl.cex=0.9)
}
# plot the correlations
lapply(params, function(param){
lapply(ht$codes, function(ht_i){
plot_correlations(ht_i = ht_i,
param = param,
list_mcmc_params = list_mcmc_params)
}) })
The estimation of the narrow-sense heritability incorporates uncertainty in the estimation of the family variance and the residual variance, so it has a higher uncertainty than the evolvability and the family variance. We may consider using the evolvavility or the family variance to compare the populations.
Next step: estimating the genetic parameters with a genomic relationship matrix.