Batch-downloading PDB structures from Uniprot
The GetPDB script comes from GitHub: https://github.com/Wang-Lin-boop/GetPDB
Install the script and the Julia runtime
git clone https://github.com/Wang-Lin-boop/GetPDB
cd GetPDB
echo "alias GetPDB=${PWD}/GetPDB" >> ~/.bashrc
chmod +x GetPDB
cd ..
wget https://julialang-s3.julialang.org/bin/linux/x64/1.5/julia-1.5.3-linux-x86_64.tar.gz
tar zxvf julia-1.5.3-linux-x86_64.tar.gz
cd julia-1.5.3/bin
echo "export PATH=${PWD}:\$PATH" >> ~/.bashrc
source ~/.bashrc
julia
]add BioStructures # in Julia REPL
exit()
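After installation it can help to confirm that everything is reachable before running a long job. The following is a minimal sketch, not part of GetPDB itself: `check_tool` and the `GETPDB_DIR` variable are hypothetical names introduced here, and the script assumes the repository was cloned into the current directory. Because the install step above registers GetPDB as an alias, which non-interactive scripts do not load, the script is checked by file path rather than by name.

```shell
# Report whether a command is reachable on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# GETPDB_DIR is a hypothetical variable: point it at the cloned repository.
GETPDB_DIR="${GETPDB_DIR:-$PWD/GetPDB}"

if [ -x "$GETPDB_DIR/GetPDB" ]; then
    echo "ok: GetPDB is executable"
else
    echo "missing: $GETPDB_DIR/GetPDB"
fi

if check_tool julia; then
    # Loading the package fails if BioStructures was not installed.
    if ! julia -e 'using BioStructures; println("ok: BioStructures")'; then
        echo "missing: BioStructures"
    fi
fi
```

If any line reports "missing", repeat the corresponding installation step before continuing.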
GetPDB usage
Usage: GetPDB [OPTION] <parameter>
Example: GetPDB -i Uniprot_list -w -o Uniprot-PDB -n 10 -p -r
Input parameter:
-i Your Uniprotlist file.
-b Your PDBlib, optional.
-n The Max number of CPU threads available for this job, default is 4.
-l An index for Uniprot, such as "pdb_chain_uniprot.csv".
This file can be downloaded from https://www.ebi.ac.uk/pdbe/docs/sifts/quick.html
or you can use -w to download its latest version automatically.
-w Use -w instead of -l unless you know what you're doing.
Output parameter:
-o Processed PDB files will be stored in this path, default is Uniprot-PDB.
-d A directory to store lists of Uniprot-PDBID-Chainid info, default is Uniprot-info-list.
-p Output one representative chain per PDB entry of each Uniprot ID. For example, for PXXXXX:XXXX_A/B, only XXXX_A will be output. Default is false.
-r Keep only one representative structure for each sequence interval. Default is false.
For example, of P00000:XXXX_A:27-213 and P00000:ZZZZ_A:27-213, only one will be saved.
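Before fetching protein PDB structures, convert the gene names to proteins
Batch-search the gene set on the Uniprot search page, download the results as a table, and extract the protein IDs from it.
Save the protein IDs corresponding to the genes in a file named Uniprot_List_entry. Neither the file name nor the protein IDs need any prefix or suffix.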
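The extraction step can be scripted. The sketch below assumes the downloaded table is tab-separated, has a header row, and contains a column named "Entry" holding the protein IDs. The function name `extract_entries` and the file name `uniprot_download.tsv` are hypothetical and chosen only for illustration.

```shell
# Print the unique values of the "Entry" column from a tab-separated table.
# Carriage returns are stripped first in case the file has Windows line endings.
extract_entries() {
    tr -d '\r' < "$1" | awk -F'\t' '
        NR == 1 {
            # Locate the "Entry" column in the header row.
            for (i = 1; i <= NF; i++) {
                if ($i == "Entry") col = i
            }
            next
        }
        col && $col != "" { print $col }
    ' | sort -u
}
```

A typical call would be `extract_entries uniprot_download.tsv > Uniprot_List_entry`. Sorting with `-u` removes duplicates, so a protein that matches several genes is downloaded only once.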
Batch-download the predicted PDB structures that the AlphaFold database stores for the Uniprot entries
for i in $(cat Uniprot_List_entry); do mkdir -p "${i}"; wget -q -O "./${i}/${i}.pdb" "https://alphafold.ebi.ac.uk/files/AF-${i}-F1-model_v1.pdb"; done
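For larger lists, the one-line loop above can be expanded into a version that records failures instead of silently leaving empty files behind. This is only a sketch under stated assumptions: `af_url`, `fetch_alphafold`, the `AF_VERSION` variable, and the log file `failed_entries.txt` are hypothetical names. The URL pattern is the one from the command above, and the default version stays at `v1` to match it. AlphaFold DB has published newer model versions since, so the suffix may need adjusting.

```shell
# AF_VERSION is an assumption: the original command uses model_v1.
AF_VERSION="${AF_VERSION:-v1}"

# Build the AlphaFold DB download URL for one Uniprot ID.
af_url() {
    echo "https://alphafold.ebi.ac.uk/files/AF-$1-F1-model_${AF_VERSION}.pdb"
}

# Download one model per ID listed in the given file (one ID per line).
# IDs whose download fails are appended to failed_entries.txt.
fetch_alphafold() {
    while IFS= read -r acc; do
        [ -n "$acc" ] || continue        # skip blank lines
        mkdir -p "$acc"
        if ! wget -q -O "$acc/$acc.pdb" "$(af_url "$acc")"; then
            rm -f "$acc/$acc.pdb"        # wget -O leaves an empty file on failure
            echo "$acc" >> failed_entries.txt
        fi
    done < "$1"
}
```

Calling `fetch_alphafold Uniprot_List_entry` then creates one directory per ID, as in the original loop, and `failed_entries.txt` lists the IDs for which no model could be fetched.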
Batch-download the PDB structures or crystal CIF structures that the PDB database holds for the Uniprot entries
GetPDB -i Uniprot_List_entry -w -o PDBbyUS -n 10