There are several ways to choose the template search method. You can also upload your own structure to serve as a template.
Gap Repairer will run PSI-BLAST with 5 iterations and the given e-value cutoff (default value: 0.001; it can be changed under Advanced options below). BLAST is run on the MODELLER database, i.e. all sequences from the PDB, with gapped regions omitted. Currently there are 301852 records in our copy of the database.
Results are sorted in descending order of the score
sequence_identity_within_gap [%] * sequence_coverage_within_gap [%] (calculated from the pairwise alignment with the target and averaged across gaps). The highest-scoring template is designated as the reference and is assumed to be the most correct.
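The ranking score can be illustrated with a minimal Python sketch. It is not the server's code: the function name gap_score, the representation of gaps as column ranges, and the exact definitions (identity counted over aligned positions, coverage over the gap length, the product averaged over gaps) are assumptions made for illustration.

```python
def gap_score(target_aln, template_aln, gaps):
    """Score one template from its pairwise alignment with the target.

    target_aln, template_aln: aligned sequences of equal length ('-' = no residue).
    gaps: list of (start, end) alignment-column ranges, end exclusive (assumed layout).
    """
    products = []
    for start, end in gaps:
        length = end - start
        covered = 0      # gap columns where the template has a residue
        identical = 0    # covered columns where the residue matches the target
        for t, s in zip(target_aln[start:end], template_aln[start:end]):
            if s != "-":
                covered += 1
                if s == t:
                    identical += 1
        coverage = 100.0 * covered / length
        # Assumption: identity is measured over the aligned (covered) positions.
        identity = 100.0 * identical / covered if covered else 0.0
        products.append(identity * coverage)
    # Assumption: the product is averaged across all gaps.
    return sum(products) / len(products)


def rank_templates(target_aln, template_alns, gaps):
    """Return template ids sorted from the highest to the lowest score."""
    return sorted(
        template_alns,
        key=lambda tid: gap_score(target_aln, template_alns[tid], gaps),
        reverse=True,
    )
```

With such a ranking, the first element of the returned list would play the role of the reference template.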
Topologies of all potential templates are checked, and only those with the same type of nontriviality (or triviality) as the reference structure are kept.
Based on the pairwise alignments, gap regions are taken from each potential template, and a Multiple Sequence Alignment (MSA) with the target sequence is created for each gap. This MSA can be constructed in two ways: as a progressive or a consensus alignment.
The final template set consists of as few sequences as possible while maintaining coverage of all amino acids missing from the target. From these structures a consensus structure is made and then refined using the MODELLER software.
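Choosing as few templates as possible while still covering every missing residue is a set-cover problem. The sketch below uses the common greedy approximation, which is an assumption; the selection procedure actually used by the server is not described here. The function name pick_templates and the input layout are hypothetical.

```python
def pick_templates(coverage, missing):
    """Greedy set cover over missing residues.

    coverage: dict mapping template id -> set of missing residue numbers it covers
              (hypothetical input layout).
    missing:  iterable of residue numbers missing from the target.
    Returns (chosen template ids, residues that no template covers).
    """
    remaining = set(missing)
    chosen = []
    while remaining:
        # Pick the template that covers the most still-uncovered residues.
        best = max(coverage, key=lambda tid: len(coverage[tid] & remaining))
        gained = coverage[best] & remaining
        if not gained:
            break  # the remaining residues are covered by no template
        chosen.append(best)
        remaining -= gained
    return chosen, remaining
```

The greedy choice does not always give the smallest possible set, but it is simple and usually close to it.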
Similar to the Consensus option.
You can manually choose which structures will serve as templates. You have to provide a list of PDB identifiers (each complete with the chain symbol, separated from the identifier by a colon). Only these structures (and always all of them) will be used to create a consensus structure for refinement. As with the first option, the templates can be aligned in the missing regions following one of two protocols: as a progressive or a consensus alignment.
For example, a valid list is:
1ual:A 1uak:A 4mcb:A
Valid separators are:
, ; <spacebar> <tabulator> <newline> but not a pair of them.
The chain symbol must be separated from the PDB identifier by a single colon. PDB codes are case-insensitive, but chain symbols are case-sensitive (chain symbols are usually upper case, but not always).
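The format rules above can be checked with a short Python sketch. It is only an illustration, not the server's validator: the function name parse_templates is hypothetical, and the pattern assumes classic four-character PDB codes starting with a digit and single-character chain symbols.

```python
import re

# Assumption: four-character PDB code (first character a digit), one colon,
# and a single-character chain symbol.
ENTRY = re.compile(r"([0-9][A-Za-z0-9]{3}):([A-Za-z0-9])")


def parse_templates(text):
    """Split a template list on exactly one separator and validate each entry."""
    # Splitting on single characters means that two separators in a row
    # produce an empty entry, which is then rejected below.
    entries = re.split(r"[,; \t\n]", text.strip())
    parsed = []
    for entry in entries:
        match = ENTRY.fullmatch(entry)
        if match is None:
            raise ValueError(f"invalid template entry: {entry!r}")
        # PDB codes are case-insensitive, chain symbols are case-sensitive.
        parsed.append((match.group(1).lower(), match.group(2)))
    return parsed
```

For the example list above, the call parse_templates("1ual:A 1uak:A 4mcb:A") accepts all three entries, while a list such as "1ual:A, 1uak:A" (comma followed by a space) is rejected.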
Read more about Protein Data Bank.
This option is similar to the first one; the only difference is that potential templates are identified by a structural similarity search against the DALI database rather than by PSI-BLAST using the target sequence. The rest of the workflow, including sorting and restriction to one topology, remains the same.
You can provide your own .pdb file to serve as a template. Note that the alignment to the target will still be based on the sequences of both structures.
The uploaded file MUST contain only ATOM/HETATM records. The chain used as the template is determined from the 21st position in the first line (and is thus the first chain present in the file).
Default options have been selected based on multiple tests. Modifying some of them may result in unexpected behaviour.
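To check which chain would be picked from a file before uploading it, a sketch like the following may help. It assumes the standard PDB fixed-column layout, in which the chain identifier of an ATOM/HETATM record occupies column 22, i.e. index 21 in Python's zero-based indexing; whether the server counts positions the same way is an assumption. The function name template_chain is hypothetical.

```python
def template_chain(pdb_text):
    """Return the chain identifier of the first ATOM/HETATM record."""
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            if len(line) <= 21:
                raise ValueError("record is too short to hold a chain identifier")
            # Column 22 of the PDB format is index 21 when counting from zero.
            return line[21]
    raise ValueError("no ATOM/HETATM records found")


def only_coordinate_records(pdb_text):
    """Check that every non-empty line is an ATOM or HETATM record."""
    return all(
        line.startswith(("ATOM", "HETATM"))
        for line in pdb_text.splitlines()
        if line.strip()
    )
```

Running both checks locally gives a quick indication of whether a file is likely to be accepted and which chain would serve as the template.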
If for some reason you want certain structures removed from the potential template pool, list them here. These structures will not be selected as templates, regardless of which method you choose.
You can either exclude all chains of a given structure:
1ual 1uak 4mcb
or specify individual chains (after the colon). To specify multiple chains, please write them down as separate structures:
1ual:A 1uak:A 4mcb:A 4mcb:B
Valid separators between entries are: <spacebar> <tabulator> <newline> , <comma> ; <semicolon> but not a pair of them.
The default e-value cutoff is 0.001. With higher values you can potentially find more template sequences (with less significant similarity), but you also increase the risk of using random, and possibly unrelated, models.
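The difference between excluding a whole structure and excluding a single chain can be sketched as a simple filter. This is an illustration under assumptions, not the server's implementation; the function name is_excluded and the (pdb_id, chain) tuple layout are hypothetical.

```python
def is_excluded(candidate, exclusions):
    """Decide whether a candidate template is on the exclusion list.

    candidate:  (pdb_id, chain) tuple, e.g. ("4mcb", "B").
    exclusions: list of entries such as "1ual" (all chains) or "4mcb:B" (one chain).
    """
    pdb_id, chain = candidate
    for entry in exclusions:
        ex_id, _, ex_chain = entry.partition(":")
        if ex_id.lower() != pdb_id.lower():
            continue  # PDB codes are compared case-insensitively
        # An entry without a chain excludes every chain of the structure;
        # otherwise the chain symbol must match exactly (case-sensitive).
        if ex_chain == "" or ex_chain == chain:
            return True
    return False
```

For instance, with the exclusion list ["1ual", "4mcb:B"], every chain of 1ual is dropped, chain B of 4mcb is dropped, and chain A of 4mcb stays in the pool.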
Acceptable notations are:
The positions of missing residues are initialised as the average position of their counterparts in the templates, assigned through an alignment (or a Multiple Sequence Alignment in the case of multiple templates).
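The averaging step can be sketched in a few lines of Python. The sketch assumes that the template structures have already been superimposed onto a common frame and that one representative coordinate (for example the C-alpha atom) is used per residue; both are assumptions, and the function name initial_position is hypothetical.

```python
def initial_position(template_coords):
    """Average the coordinates of a residue's counterparts in the templates.

    template_coords: list of (x, y, z) tuples, one per template that has a
                     residue aligned to the missing target residue.
    """
    count = len(template_coords)
    if count == 0:
        raise ValueError("no template covers this residue")
    # Average each of the three coordinates separately.
    return tuple(sum(coord[axis] for coord in template_coords) / count
                 for axis in range(3))
```

With a single template the starting position is simply that template's coordinate; with several templates it is their centroid.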
Two ways of creating the MSA within gaps have been implemented, both based on the BLOSUM62 similarity matrix:
Progressive alignment, where the target is first aligned to the most similar template, then the remaining sorted templates are added one by one (each aligned with the growing alignment).
Consensus alignment, where sequences are aligned in an order based on a UPGMA guide tree: in each step the closest (in terms of alignment score) sequences/subalignments are aligned, until only one alignment remains. In particular, the target sequence may be the last one added.
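The order in which a UPGMA guide tree joins sequences can be sketched as follows. The sketch works on a precomputed table of pairwise distances (how these are derived from BLOSUM62 alignment scores is not specified here and is left as an assumption) and only returns the merge order; it does not perform the alignments themselves. The function name upgma_merge_order is hypothetical.

```python
from itertools import combinations


def upgma_merge_order(names, dist):
    """Return the sequence of cluster merges of a UPGMA guide tree.

    names: list of unique sequence labels.
    dist:  dict mapping frozenset({a, b}) -> distance between sequences a and b
           (assumed to be derived from pairwise alignment scores).
    """
    def cluster_distance(a, b):
        # UPGMA: unweighted average of all pairwise distances between members.
        total = sum(dist[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Join the two closest clusters (sequences or subalignments).
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b))
    return merges
```

If the templates are much closer to one another than to the target, they are merged first and the target joins only in the final step, which is the situation described above.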