Here we construct a Binomial model first. We have 25 groups of size 50 or less, with a slightly different probability of success in each group. Let’s say we have 25 groups of \(z_i\) turtles each; females are born with probability \(p_i\) in group \(i\).
sample_size_per_group = round(30*runif(25))+20 # group sizes between 20 and 50
n_groups = length(sample_size_per_group)
temperature = (1:n_groups)*0.1 + rnorm(n_groups,0,0.5) # increasing trend plus noise
plot(1:n_groups,temperature,type="o",xlab="Group ID",ylab="Temperature")
Y = p = temperature # preallocate vectors of the right length
for (i in 1:n_groups){
  p[i] = 1/(1+exp(-3*(temperature[i]-mean(temperature)))) # inverse-logit link, slope 3
  Y[i] = rbinom(1,sample_size_per_group[i],p[i]) # binomial number of females
}
plot(temperature,p,type="p",xlab="Temperature",ylab="Pr(female)")
plot(1:n_groups,Y,type="p",xlab="Group ID",ylab="N_females")
Data in a list:
m11.data <- list(N = n_groups, y = Y, temp = temperature, z = sample_size_per_group)
Now we fit that model, which can be written mathematically as
\[ y_i \sim \text{Binomial}\left(z_i,\, p(\text{temp}_i)\right), \qquad p(\text{temp}_i) = \text{logit}^{-1}\left(\gamma\,(\text{temp}_i - \mu_{\text{temp}})\right) \]
data { // observed variables
  int<lower=1> N; // number of observations
  vector[N] temp; // temperature
  int<lower=0,upper=50> y[N]; // response variable
  int<lower=0,upper=50> z[N]; // sample size per group
}
parameters { // unobserved variables
  real mu_temp;
  real gamma;
}
model {
  for (k in 1:N){
    y[k] ~ binomial(z[k], inv_logit(gamma*(temp[k]-mu_temp))); // likelihood
    // y[k] ~ bernoulli_logit(alpha+beta*temp[k]); // does not work: these are binomial counts, not Bernoulli trials!
  }
  mu_temp ~ normal(2, 10); // prior of the mean temp
  gamma ~ normal(1, 10); // prior of the slope
}
m1.1 = stan_model("m1.1.stan") # compile the Stan program (file name assumed, following the other models)
fit.m1.1 <- sampling(m1.1, data = m11.data, iter = 1000, chains = 2, cores = 2)
print(fit.m1.1, probs = c(0.10, 0.5, 0.9))
## Inference for Stan model: 3f1e7d0b1f2848684011f04c4f6f7f52.
## 2 chains, each with iter=1000; warmup=500; thin=1;
## post-warmup draws per chain=500, total post-warmup draws=1000.
##
## mean se_mean sd 10% 50% 90% n_eff Rhat
## mu_temp 1.35 0.00 0.03 1.31 1.35 1.38 622 1
## gamma 3.87 0.01 0.26 3.52 3.86 4.19 683 1
## lp__ -250.01 0.04 0.98 -251.27 -249.72 -249.11 482 1
##
## Samples were drawn using NUTS(diag_e) at Mon Mar 30 08:40:46 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
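As a quick sanity check, we can overlay the curve implied by the posterior means on the simulated probabilities. This is a minimal sketch, assuming rstan is attached and the simulation objects above are still in the workspace; the data were simulated with a slope of 3 centred on the mean temperature, so gamma and mu_temp should land close to those values.
post <- rstan::extract(fit.m1.1) # posterior draws as a list of vectors
xs <- seq(min(temperature), max(temperature), by = 0.01)
plot(temperature, p, xlab = "Temperature", ylab = "Pr(female)")
lines(xs, plogis(mean(post$gamma)*(xs - mean(post$mu_temp))), col = "blue") # plogis() is the inverse logit in base R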
Now we consider a simpler model with a different set of priors. We consider only one group of 50 young turtles, in which each turtle is female with probability 0.3, and this time we have individual-level data. (As we don’t have individual-level covariates, a Bernoulli model and a Binomial model are essentially equivalent here, but I am changing things just for fun and exercise, and to see whether one of the codes is much faster.)
Y=rbinom(50,1,0.3)
m12.data <- list(N = 50, y = Y)
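The contents of m1.2.stan are not shown in this document; a minimal sketch consistent with the output below, assuming the Beta(2,2) prior discussed in the conjugacy paragraph further down, could be:
data {
  int<lower=1> N; // number of individual turtles
  int<lower=0,upper=1> y[N]; // 1 = female, 0 = male
}
parameters {
  real<lower=0,upper=1> p; // Pr(female)
}
model {
  y ~ bernoulli(p); // vectorized Bernoulli likelihood
  p ~ beta(2, 2); // assumed prior, see the discussion below
}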
And now we fit the model
m1.2 = stan_model("m1.2.stan")
## Warning in readLines(file, warn = TRUE): incomplete final line found on '/home/sylvain/Documents/BIOGECO/BiogecoBayes/Workshop3_BinomialModels/m1.2.stan'
fit.m1.2 <- sampling(m1.2, data = m12.data, iter = 1000, chains = 2, cores = 2)
print(fit.m1.2, probs = c(0.10, 0.5, 0.9))
## Inference for Stan model: m1.
## 2 chains, each with iter=1000; warmup=500; thin=1;
## post-warmup draws per chain=500, total post-warmup draws=1000.
##
## mean se_mean sd 10% 50% 90% n_eff Rhat
## p 0.30 0.00 0.06 0.22 0.30 0.38 355 1
## lp__ -31.71 0.03 0.62 -32.51 -31.46 -31.25 469 1
##
## Samples were drawn using NUTS(diag_e) at Mon Mar 30 08:41:26 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
Why did I choose the Beta distribution for the prior of \(p\)? In short, it is the conjugate prior, some sort of canonical choice. In the past, choosing conjugate priors used to speed up computations enormously; nowadays this is less true, but the property remains useful. A conjugate prior is a prior whose distribution family is preserved in the posterior: here, if we have a Beta prior we get a Beta posterior, and if we iterate the data assimilation process 10 000 times it will still be Beta.
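To make conjugacy concrete, here is a minimal sketch (assuming, as in the sketch of m1.2.stan above, a Beta(2,2) prior): with \(k\) females out of \(n\) turtles, the posterior is available in closed form as Beta(2 + k, 2 + n - k), no MCMC needed.
k <- sum(Y); n <- length(Y) # Y is the 0/1 vector simulated above
curve(dbeta(x, 2 + k, 2 + n - k), xlab = "p", ylab = "Density") # analytic posterior
abline(v = (2 + k)/(4 + n), lty = 2) # posterior mean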
curve(dbeta(x,2,2))
curve(dbeta(x,0.5,0.5),add=T,col="red")
Obviously, the red prior would make little sense here. What would it imply?
m1.3 = stan_model("m1.3.stan")
## Warning in readLines(file, warn = TRUE): incomplete final line found on '/home/sylvain/Documents/BIOGECO/BiogecoBayes/Workshop3_BinomialModels/m1.3.stan'
fit.m1.3 <- sampling(m1.3, data = m12.data, iter = 1000, chains = 2, cores = 2)
print(fit.m1.3, probs = c(0.10, 0.5, 0.9))
## Inference for Stan model: m1.
## 2 chains, each with iter=1000; warmup=500; thin=1;
## post-warmup draws per chain=500, total post-warmup draws=1000.
##
## mean se_mean sd 10% 50% 90% n_eff Rhat
## p 0.28 0.00 0.06 0.20 0.29 0.36 344 1.01
## lp__ -29.38 0.03 0.72 -30.17 -29.12 -28.88 485 1.00
##
## Samples were drawn using NUTS(diag_e) at Mon Mar 30 08:42:06 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
It still works well, but this prior brings little information, or even poor information. What happens if, instead, we give the first (turtle) model very vague priors?
data { // observed variables
  int<lower=1> N; // number of observations
  vector[N] temp; // temperature
  int<lower=0,upper=50> y[N]; // response variable
  int<lower=0,upper=50> z[N]; // sample size per group
}
parameters { // unobserved variables
  real mu_temp;
  real gamma;
}
model {
  for (k in 1:N){
    y[k] ~ binomial(z[k], inv_logit(gamma*(temp[k]-mu_temp))); // likelihood
  }
  mu_temp ~ normal(2, 100); // very vague prior of the mean temp
  gamma ~ normal(1, 100); // very vague prior of the slope
}
Let us propagate the uncertainty by visualizing the transformed parameter: we simulate from these priors and look at the curves they imply for Pr(female).
mu_temp = rnorm(100, 2, 100) # prior of the mean temp
gamma = rnorm(100, 1, 100) # prior of the slope
x = seq(min(temperature), max(temperature), by = 0.01)
par(mfrow = c(1, 2))
# left panel: vary mu_temp, with the slope fixed at 0.3
plot(0, bty = 'n', pch = '', ylab = "Pr(female)", xlab = "Temperature", ylim = c(0, 1))
for (kprior in 1:100) {
  prob = 1/(1 + exp(-0.3*(x - mu_temp[kprior])))
  lines(x, prob, type = "l", col = "blue")
}
# right panel: vary gamma, with the centre fixed at the mean temperature
plot(0, bty = 'n', pch = '', ylab = "Pr(female)", xlab = "Temperature", ylim = c(0, 1))
for (kprior in 1:100) {
  prob = 1/(1 + exp(-gamma[kprior]*(x - mean(temperature))))
  lines(x, prob, type = "l", col = "blue")
}
### With such vague priors, we get Pr(female) of essentially either 0 or 1.
### Better priors
mu_temp = rnorm(100, 2, 1) # tighter prior of the mean temp
gamma = rnorm(100, 1, 1) # tighter prior of the slope
par(mfrow = c(1, 2))
plot(0, bty = 'n', pch = '', ylab = "Pr(female)", xlab = "Temperature", ylim = c(0, 1))
for (kprior in 1:100) {
  prob = 1/(1 + exp(-0.3*(x - mu_temp[kprior])))
  lines(x, prob, type = "l", col = "blue")
}
plot(0, bty = 'n', pch = '', ylab = "Pr(female)", xlab = "Temperature", ylim = c(0, 1))
for (kprior in 1:100) {
  prob = 1/(1 + exp(-abs(gamma[kprior])*(x - mean(temperature)))) # abs() forces a positive slope
  lines(x, prob, type = "l", col = "blue")
}
As suggested by McElreath (p. 330 of his book), we now turn to records of 160 (!) salmon-pirating attempts by one Bald Eagle on another Bald Eagle (not always the same birds!). NB: I haven’t implemented the quadratic approximation suggested by McElreath, though that sounds like an excellent exercise.
library(MASS) # the eagles data ship with MASS
?eagles
data(eagles)
### Explanatory variables
# P: size of pirating eagle (L = large, S = small)
# A: age of pirating eagle (I = immature, A = adult)
# V: size of victim eagle (L = large, S = small)
eagles.glm <- glm(cbind(y, n - y) ~ P*A + V, data = eagles,
                  family = binomial) ## classic frequentist modelling
summary(eagles.glm)
# Small and immature pirates capture less often; small victims are more likely
# to have their salmon stolen. So far so good.
m2.data = list(N=nrow(eagles), y=eagles$y, z=eagles$n,
               P=as.numeric(eagles$P)-1, A=as.numeric(eagles$A)-1, V=as.numeric(eagles$V)-1)
# m2.data = list(N=nrow(eagles),y=eagles$y,z=eagles$n,P=eagles$P,A=eagles$A,V=eagles$V)
We are in a similar situation to the turtle sex example, but with categorical explanatory variables.
data { // observed variables
  int<lower=1> N; // number of observations
  int<lower=0,upper=50> y[N]; // response variable
  int<lower=0,upper=50> z[N]; // sample size per group
  int<lower=0,upper=1> A[N]; // age of pirating eagle
  int<lower=0,upper=1> P[N]; // size of pirating eagle
  int<lower=0,upper=1> V[N]; // size of victim eagle
}
parameters { // unobserved variables
  real alpha;
  real beta_P;
  real beta_V;
  real beta_A;
}
model {
  for (k in 1:N){
    y[k] ~ binomial(z[k], inv_logit(alpha + beta_P*P[k] + beta_V*V[k] + beta_A*A[k])); // likelihood
  }
  alpha ~ normal(0, 10); // prior of the intercept
  beta_P ~ normal(0, 5); // prior of the effect of pirate size
  beta_V ~ normal(0, 5); // prior of the effect of victim size
  beta_A ~ normal(0, 5); // prior of the effect of pirate age
}
Fitting the model
m2.1 = stan_model("m2.1.stan",verbose = TRUE)
##
## TRANSLATING MODEL 'm2' FROM Stan CODE TO C++ CODE NOW.
## successful in parsing the Stan model 'm2'.
fit.m2.1 <- sampling(m2.1, data = m2.data, iter = 1000, chains = 2, cores = 2)
print(fit.m2.1, probs = c(0.10, 0.5, 0.9))
## Inference for Stan model: e47cfaa9a07140b50edb541286b764dc.
## 2 chains, each with iter=1000; warmup=500; thin=1;
## post-warmup draws per chain=500, total post-warmup draws=1000.
##
## mean se_mean sd 10% 50% 90% n_eff Rhat
## alpha 1.39 0.02 0.43 0.85 1.38 1.93 536 1
## beta_P -4.60 0.04 0.96 -5.95 -4.54 -3.42 512 1
## beta_V 4.98 0.05 1.02 3.73 4.91 6.35 511 1
## beta_A -1.15 0.02 0.52 -1.80 -1.14 -0.49 575 1
## lp__ -47.98 0.06 1.31 -49.74 -47.67 -46.60 456 1
##
## Samples were drawn using NUTS(diag_e) at Mon Mar 30 08:43:28 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
What do we think of these priors?
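A quick prior-predictive check, in the spirit of the turtle example, helps answer that (a sketch; the object names are mine). Because the four independent normal priors add up on the logit scale, their combined standard deviation is large, and most of the prior mass for Pr(success) ends up near 0 or 1.
n_draws <- 1000
alpha <- rnorm(n_draws, 0, 10)
beta_P <- rnorm(n_draws, 0, 5)
beta_V <- rnorm(n_draws, 0, 5)
beta_A <- rnorm(n_draws, 0, 5)
p_prior <- plogis(alpha + beta_P + beta_V + beta_A) # implied Pr(success) when P = V = A = 1
hist(p_prior, breaks = 30, xlab = "Prior Pr(success)", main = "")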
The Stan book promotes this model:
data {
  int<lower=1> D;
  int<lower=0> N;
  int<lower=1> L;
  int<lower=0,upper=1> y[N]; // raw data
  int<lower=1,upper=L> ll[N]; // set of levels related to each data point
  row_vector[D] x[N];
}
parameters {
  real mu[D];
  real<lower=0> sigma[D];
  vector[D] beta[L];
}
model {
  for (d in 1:D) {
    mu[d] ~ normal(0, 100);
    for (l in 1:L)
      beta[l,d] ~ normal(mu[d], sigma[d]);
  }
  for (n in 1:N)
    y[n] ~ bernoulli(inv_logit(x[n] * beta[ll[n]]));
}
This gives us the opportunity to transform the previous eagles model into something less complex but with a similar philosophy (NB: I am wondering what prior they put on sigma[d] in their example). Currently we have estimated each effect (size of pirating eagle, age of pirating eagle, size of victim eagle) independently. On relatively small datasets this is prone to inflating effects, which is why in practice many statisticians implement some form of shrinkage in such models. It is done easily with an additional prior linking the effects together (see also McElreath’s book on this).
data { // observed variables
  int<lower=1> N; // number of observations
  int<lower=0,upper=50> y[N]; // response variable
  int<lower=0,upper=50> z[N]; // sample size per group
  int<lower=0,upper=1> A[N];
  int<lower=0,upper=1> P[N];
  int<lower=0,upper=1> V[N];
}
parameters { // unobserved variables
  real alpha;
  real beta_P;
  real beta_V;
  real beta_A;
  real<lower=0> gamma_b; // shared scale of the effects; constrained positive since it is a standard deviation
}
model {
  for (k in 1:N){
    y[k] ~ binomial(z[k], inv_logit(alpha + beta_P*P[k] + beta_V*V[k] + beta_A*A[k])); // likelihood
  }
  alpha ~ normal(0, 1); // prior of the intercept
  beta_P ~ normal(0, gamma_b); // priors of the effects, tied together...
  beta_V ~ normal(0, gamma_b);
  beta_A ~ normal(0, gamma_b);
  gamma_b ~ gamma(0.1,0.1); // ...by a common hyperprior
}
Let us now fit that model
m3.2 = stan_model("m3.2.stan")
fit.m3.2 <- sampling(m3.2, data = m2.data, iter = 1000, chains = 2, cores = 2)
## Warning: There were 7 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: Examine the pairs() plot to diagnose sampling problems
print(fit.m3.2, probs = c(0.10, 0.5, 0.9))
## Inference for Stan model: m3.
## 2 chains, each with iter=1000; warmup=500; thin=1;
## post-warmup draws per chain=500, total post-warmup draws=1000.
##
## mean se_mean sd 10% 50% 90% n_eff Rhat
## alpha 1.09 0.02 0.41 0.61 1.07 1.65 332 1.01
## beta_P -4.37 0.07 1.10 -5.82 -4.22 -3.15 241 1.00
## beta_V 4.95 0.07 1.13 3.60 4.82 6.43 239 1.00
## beta_A -0.87 0.03 0.51 -1.55 -0.85 -0.21 359 1.00
## gamma_b 4.66 0.16 2.42 2.38 4.13 7.46 227 1.00
## lp__ -55.74 0.12 1.91 -58.33 -55.36 -53.65 254 1.00
##
## Samples were drawn using NUTS(diag_e) at Mon Mar 30 08:45:36 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
We see a very moderate amount of shrinkage. There are some divergent transitions as well; perhaps there are not enough effects for shrinkage to pay off, or the Gamma hyperprior is not constraining enough.
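As a first remedy (a sketch, not run here), we can follow the warning’s own advice and raise adapt_delta; replacing the Gamma hyperprior by a half-normal, e.g. gamma_b ~ normal(0, 1) together with the <lower=0> constraint, is another common choice for scale parameters.
fit.m3.2b <- sampling(m3.2, data = m2.data, iter = 1000, chains = 2, cores = 2,
                      control = list(adapt_delta = 0.95)) # smaller sampler steps, fewer divergences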