# [转载]杂合率检验

1. 计算杂合度

plink --bfile HapMap_3_r3_9 --het --out R_check

1

2

.het (method-of-moments F coefficient estimates)

Produced by --het.

A text file with a header line, and one line per sample with the following six fields:

FID Family ID  # 家系ID

IID Within-family ID # 个体ID

O(HOM) Observed number of homozygotes # 实际纯合个数

E(HOM) Expected number of homozygotes # 期望纯合个数

N(NM) Number of non-missing autosomal genotypes # 总个数

F Method-of-moments F coefficient estimate #

1

2

3

4

5

6

7

8

9

10

11

F = O − E N − E F = \frac{O-E}{N-E}

F=

N−E

O−E

O: O(HOM)

E: E(HOM)

N: N(NM)

2. 杂合度可视化

R代码：

pdf("heterozygosity.pdf")

het$HET_RATE = (het$"N.NM." - het$"O.HOM.")/het$"N.NM."

hist(het$HET_RATE, xlab="Heterozygosity Rate", ylab="Frequency", main= "Heterozygosity Rate") dev.off() 1 2 3 4 5 6 可视化： 3. 计算杂合度三倍标准差以外的个体 首先，查看哪些个体在3倍标准差以外： het <- read.table("R_check.het", head=TRUE) het$HET_RATE = (het$"N.NM." - het$"O.HOM.")/het$"N.NM." het_fail = subset(het, (het$HET_RATE < mean(het$HET_RATE)-3*sd(het$HET_RATE)) | (het$HET_RATE > mean(het$HET_RATE)+3*sd(het$HET_RATE))); het_fail$HET_DST = (het_fail$HET_RATE-mean(het$HET_RATE))/sd(het$HET_RATE); write.table(het_fail, "fail-het-qc.txt", row.names=FALSE) 1 2 3 4 5 6 结果可以看出，这两个个体杂合度在3倍标准差以外： 4. 去掉这些个体 cat fail-het-qc.txt "FID" "IID" "O.HOM." "E.HOM." "N.NM." "F" "HET_RATE" "HET_DST" 1330 "NA12342" 703770 691000 1066256 0.03402 0.33996151018142 -4.75272486494316 1459 "NA12874" 709454 695300 1072858 0.03758 0.338725162137021 -5.18288854902555 1 2 3 4 5 先对数据进行清洗，去掉引号，然后提取家系和个体ID sed 's/"//g' fail-het-qc.txt |awk '{print$2}' > het_fail_ind.txt

1

2

sed 's/"//g' fail-het-qc.txt |awk '{print $1,$2}' > het_fail_ind.txt

1

2

plink --bfile HapMap_3_r3_9 --remove het_fail_ind.txt --make-bed --out HapMap_3_r3_10

1

2

5. 结果文件

HapMap_3_r3_10.bed  HapMap_3_r3_10.bim  HapMap_3_r3_10.fam  HapMap_3_r3_10.log

————————————————

https://wap.sciencenet.cn/blog-1469385-1280606.html

## 全部精选博文导读

GMT+8, 2022-5-25 05:57