
最佳线性无偏预测(Best Linear Unbiased Prediction,简称BLUP)可以对多环境数据进行整合,去除环境效应,得到个体稳定遗传的表型。BLUP是表型处理的常用做法。R包lme4中lmer函数是BLUP分析常用的方法,在很多NG文章都引用了该方法。 下面将用实际数据演示多环境无重复数据和多环境有重复数据的过程。首先安装lme4包。
install.packages("lme4")
多环境无重复BLUP
数据格式如下,数据是每个环境叠加的。 有人喜欢用数字表示系名或环境,这样应该把lines和env转换为因子。缺失值用NA表示。
## lines env y
## 1 L1 env1 66.72533
## 2 L2 env1 53.82899
## 3 L3 env1 58.04559
## 4 L4 env1 63.09452
## 5 L5 env1 57.59054
## 6 L6 env1 61.37506
接下我们用lmer进行BLUP分析,在lmer中 1|env 表示把env当作随机效应,我们把env和lines当作随机效应。
blp=lmer(y~(1|env)+(1|lines),data=data)H=5.509/(5.509+3.481/3)summary(blp)
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ (1 | env) + (1 | lines)
## Data: data
##
## REML criterion at convergence: 2700.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.69680 -0.52821 -0.00762 0.58518 2.53832
##
## Random effects:
## Groups Name Variance Std.Dev.
## lines (Intercept) 5.50859 2.3470
## env (Intercept) 0.09091 0.3015
## Residual 3.48151 1.8659
## Number of obs: 575, groups: lines, 209; env, 3
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 60.8465 0.2511 242.3
blups= ranef(blp)names(blups)
## [1] "lines" "env"
lines=blups$lines+blp@betares=data.frame(id=rownames(lines),blup=lines)write.table(res,file="data_blup_result.txt",row.names = F,quote = F,sep="\t")hist(lines[,1],col="#0AB3CA",border="white",xlab="BLUP of lines",main="")
多环境有重复BLUP
数据格式也是叠加的,多了一列rep表示重复。
library(lme4)data=read.table("data_rep_blup.txt",header = T)head(data)
## env lines rep y
## 1 env1 L1 rep1 373.6640
## 2 env1 L2 rep1 526.4561
## 3 env1 L3 rep1 544.7073
## 4 env1 L4 rep1 602.2171
## 5 env1 L5 rep1 573.5111
## 6 env1 L6 rep1 415.2294
data$lines=factor(data$lines)data$env=factor(data$env)data$rep=factor(data$rep)
blp=lmer(y~(1|rep%in%env)+(1|env)+(1|lines)+(1|env:lines),data=data,
control=lmerControl(check.nobs.vs.nlev = "ignore", check.nobs.vs.rankZ = "ignore", check.nlev.gtr.1 = "ignore", check.nobs.vs.nRE="ignore"))H=8154/(8154+9409/3+6121/3/3)lines=blups$lines+blp@betablp.out=data.frame(id=rownames(lines),blp=lines)write.table(res,file="data_blup_rep_result.txt",row.names = F,quote = F,sep="\t")summary(blp)
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## y ~ (1 | rep %in% env) + (1 | env) + (1 | lines) + (1 | env:lines)
## Data: data
## Control:
## lmerControl(check.nobs.vs.nlev = "ignore", check.nobs.vs.rankZ = "ignore",
## check.nlev.gtr.1 = "ignore", check.nobs.vs.nRE = "ignore")
##
## REML criterion at convergence: 3754.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.6599 -0.3999 0.0071 0.3850 2.8693
##
## Random effects:
## Groups Name Variance Std.Dev.
## env:lines (Intercept) 9409 97.00
## lines (Intercept) 8154 90.30
## env (Intercept) 129802 360.28
## rep %in% env (Intercept) 30449 174.50
## Residual 6121 78.24
## Number of obs: 306, groups:
## env:lines, 102; lines, 34; env, 3; rep %in% env, 1
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 800.0 272.2 2.939
hist(lines[,1],col="#0AB3CA",border="white",xlab="BLUP of lines",main="")