A Linux shell script for converting xx.hmp.txt data to PLINK format


 

001. Test data

root@PC1:/home/test# ls
mdp_genotype_test.hmp.txt  record.sh
root@PC1:/home/test# head -n 5 mdp_genotype_test.hmp.txt | cut -f 1-13   ## test data
rs      alleles chrom   pos     strand  assembly        center  protLSID        assayLSID       panel   QCcode  33-16   38-11
PZB00859.1      A/C     1       157104  +       AGPv1   Panzea  NA      NA      maize282        NA      CC      CC
PZA01271.1      C/G     1       1947984 +       AGPv1   Panzea  NA      NA      maize282        NA      CC      GG
PZA03613.2      G/T     1       2914066 +       AGPv1   Panzea  NA      NA      maize282        NA      GG      GG
PZA03613.1      A/T     1       2914171 +       AGPv1   Panzea  NA      NA      maize282        NA      TT      TT
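root@PC1:/home/test# cat record.sh     ## conversion script
#!/bin/bash

## count the tab-separated columns in the header; samples start at column 12
columns=$(head -n 1 "$1" | awk -F '\t' '{print NF}')

## one .ped row per sample: join the column into one line, strip \r, recode NN as 00,
## split each genotype into two alleles, then prepend FID, IID, 0, 0, sex 1, phenotype -9
for i in $(seq 12 "$columns"); do cut -f "$i" "$1" | paste -d "\t" -s | sed 's/\r//g; s/NN/00/g; s/\t./&\t/g; s/^\S\+\t/&&0\t0\t1\t-9\t/'; done > result.ped

## .map columns: chromosome, variant ID, genetic distance (0), position
sed 1d "$1" | awk 'BEGIN {FS = OFS = "\t"} {print $3, $1, 0, $4}' > result.map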
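
To see what the sed step in the loop does to a single sample, the sketch below applies that step's four substitutions, written with backslash escapes and a single tab between fields, to one made-up row. It assumes GNU sed (for the `\t`, `\r`, `\S` and `\+` escapes). The function name `to_ped_row`, the sample ID `S1` and its genotypes are illustrative only and do not come from the test file.

```shell
#!/bin/bash

# Hypothetical helper: turn "ID<TAB>genotype<TAB>genotype..." into one .ped row.
to_ped_row() {
    # 1) drop Windows carriage returns
    # 2) recode missing calls NN as 00
    # 3) split every two-letter genotype into two tab-separated alleles
    # 4) duplicate the ID as FID and IID, then add father 0, mother 0, sex 1, phenotype -9
    sed 's/\r//g; s/NN/00/g; s/\t./&\t/g; s/^\S\+\t/&&0\t0\t1\t-9\t/'
}

# Made-up sample with three markers, one of them missing; tabs shown as spaces.
printf 'S1\tCC\tNN\tGT\r\n' | to_ped_row | tr '\t' ' '
# prints: S1 S1 0 0 1 -9 C C 0 0 G T
```

Because `s/NN/00/g` is applied to the whole row, a sample ID that itself contains `NN` would be altered as well, so it is worth checking the sample names before relying on this step.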

 

root@PC1:/home/test# ls
mdp_genotype_test.hmp.txt  record.sh
root@PC1:/home/test# bash record.sh mdp_genotype_test.hmp.txt    ## run the format conversion
root@PC1:/home/test# ls   
mdp_genotype_test.hmp.txt  record.sh  result.map  result.ped
root@PC1:/home/test# plink --file result --recode      ## test the conversion result
PLINK v1.90b6.24 64-bit (6 Jun 2021)           www.cog-genomics.org/plink/1.9/
(C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink.log.
Options in effect:
  --file result
  --recode

15969 MB RAM detected; reserving 7984 MB for main workspace.
.ped scan complete (for binary autoconversion).
Warning: Variant 424 quadallelic; setting rarest alleles missing.
Performing single-pass .bed write (3093 variants, 281 people).
--file: plink-temporary.bed + plink-temporary.bim + plink-temporary.fam
written.
3093 variants loaded from .bim file.
281 people (281 males, 0 females) loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 281 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.963828.
3093 variants and 281 people pass filters and QC.
Note: No phenotypes present.
--recode ped to plink.ped + plink.map ... done.
root@PC1:/home/test# ls
mdp_genotype_test.hmp.txt  plink.log  plink.map  plink.ped  record.sh  result.map  result.ped
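
Besides loading the files in PLINK, the dimensions can be checked directly: every .ped row should hold 6 leading fields plus two alleles per variant listed in the .map file. The following is a minimal sketch of such a check. The function name `check_ped_map` is hypothetical and not part of record.sh.

```shell
#!/bin/bash

# Hypothetical helper: check_ped_map PED MAP
# Succeeds when every .ped row has 6 + 2 * (number of .map rows) fields,
# and reports each row that does not.
check_ped_map() {
    local ped=$1 map=$2 variants
    variants=$(wc -l < "$map")
    awk -v n="$variants" '
        NF != 6 + 2 * n {
            printf "row %d: %d fields, expected %d\n", NR, NF, 6 + 2 * n
            bad = 1
        }
        END { exit bad + 0 }
    ' "$ped"
}

# Usage with the files written by record.sh:
# check_ped_map result.ped result.map && echo "dimensions agree"
```

With the test data above, 3093 variants would mean 6192 fields in each of the 281 rows.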

 

Original article by 奋斗. If you repost it, please cite the source: https://blog.ytso.com/277647.html
