GapRepairer

Template selection methods

There are several ways to choose how templates are searched for. You can also upload your own structure to serve as a template.

1. Consensus

GapRepairer runs PSI-BLAST with 5 iterations and the given e-value cutoff (default value: 0.001; it can be changed under Advanced options below). The search is run against the MODELLER database, i.e. all sequences from the PDB with gapped regions omitted. Currently there are 301852 records in our copy of the database.

Results are sorted in descending order of the score

sequence_identity_within_gap [%] * sequence_coverage_within_gap [%]

(calculated from the pairwise alignment with the target and averaged across gaps). The highest-scoring template is designated as the reference and is assumed to be the most correct.
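
The ranking described above can be sketched in Python as follows. This is only an illustration, not the server's code: the function name gap_score, the input layout (one identity/coverage pair in percent per gap) and the example values are all hypothetical, and the sketch assumes that the product is computed per gap and then averaged.

```python
def gap_score(per_gap):
    """Average of identity[%] * coverage[%] over the gaps of one template.

    per_gap: list of (identity_percent, coverage_percent) tuples,
             one per gap (hypothetical input layout).
    """
    if not per_gap:
        return 0.0
    return sum(ident * cov for ident, cov in per_gap) / len(per_gap)


# Hypothetical candidates: template -> per-gap (identity, coverage) values.
candidates = {
    "1ual:A": [(80.0, 100.0), (60.0, 50.0)],
    "1uak:A": [(90.0, 100.0), (70.0, 100.0)],
    "4mcb:A": [(40.0, 100.0)],
}

# Sort by decreasing score; the first entry plays the role of the reference.
ranked = sorted(candidates, key=lambda t: gap_score(candidates[t]), reverse=True)
reference = ranked[0]
```

With these made-up numbers, 1uak:A ranks first and would be treated as the reference template.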

Topologies of all potential templates are checked, and only those with the same type of nontriviality (or triviality) as the reference structure are kept.

Based on the pairwise alignments, gap regions are taken from each potential template, and a Multiple Sequence Alignment (MSA) with the target sequence is created for each gap. This MSA can be constructed in two ways: as a progressive or as a consensus alignment.

The final template set consists of as few sequences as possible while maintaining coverage of all amino acids missing from the target. From these structures a consensus structure is built and then refined using the MODELLER software.
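
Choosing a small set of templates that together cover every missing residue is a set-cover problem. How GapRepairer solves it is not stated here; the sketch below uses a simple greedy heuristic as an assumption, and the function name pick_templates and its inputs are hypothetical. A greedy choice is usually small but is not guaranteed to be the smallest possible set.

```python
def pick_templates(missing, covered_by):
    """Greedy cover: repeatedly take the template covering most uncovered residues.

    missing:    set of residue numbers missing from the target
    covered_by: dict template -> set of missing residues it can supply
    Returns (chosen templates, residues that no template covers).
    """
    uncovered = set(missing)
    chosen = []
    while uncovered:
        # max() returns the first maximum, so ties follow dict insertion order.
        best = max(covered_by, key=lambda t: len(covered_by[t] & uncovered))
        gained = covered_by[best] & uncovered
        if not gained:
            break  # the remaining residues are not covered by any template
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


# Hypothetical example: residues 10-15 are missing from the target.
chosen, left = pick_templates(
    {10, 11, 12, 13, 14, 15},
    {"1ual:A": {10, 11, 12}, "1uak:A": {12, 13, 14, 15}, "4mcb:A": {10, 11}},
)
```

If the templates are passed in score order, ties are resolved in favour of the higher-ranked template.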

2. The best one

Similar to the Consensus option.

3. My PDB selection

You can manually choose which structures will serve as templates. You have to provide a list of PDB identifiers, each followed by a colon and the chain symbol. Only these structures (and always all of them) will be used to create a consensus structure for refinement. As with option 1, the templates can be aligned in the missing regions following one of two protocols: as a progressive or as a consensus alignment.

For example, a valid list is:

1ual:A 1uak:A 4mcb:A

Valid separators are:

, ; <spacebar> <tabulator> <newline>

Use exactly one separator between entries; a pair of them (for example, a comma followed by a space) is not accepted.

The chain symbol must be separated from the PDB identifier by a single colon. PDB codes are case insensitive, but chain symbols are case sensitive (chain symbols are usually, but not always, upper case).
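
The list format described above can be checked before submitting with a short Python sketch. It is not the server's parser: the function name parse_templates is hypothetical, and the sketch assumes a four-character PDB code starting with a digit and a single-character chain symbol.

```python
import re

# Four-character PDB code, a single colon, one non-blank chain character.
ENTRY = re.compile(r"([0-9][0-9A-Za-z]{3}):(\S)")


def parse_templates(text):
    """Split on single separators and return (pdb_code, chain) pairs."""
    entries = []
    # Each comma, semicolon, space, tab or newline is one separator.
    for token in re.split(r"[,;\s]", text.strip()):
        match = ENTRY.fullmatch(token)
        if match is None:
            # An empty token means two separators in a row.
            raise ValueError(f"invalid entry: {token!r}")
        code, chain = match.groups()
        # Codes are case insensitive; the chain symbol is kept as typed.
        entries.append((code.lower(), chain))
    return entries
```

For instance, parse_templates("1ual:A 1uak:A 4mcb:A") yields three pairs, while "1ual:A, 1uak:A" is rejected because a comma is followed by a space.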

4. Dali structural search

This option is similar to option 1, the one difference being that potential templates are identified by a structural similarity search against the DALI database rather than by PSI-BLAST with the target sequence. The rest of the workflow, including sorting and restriction to one topology, remains the same.

5. My own structure

You can provide your own .pdb file to serve as a template. Note that the alignment to the target will still be based on the sequences of both structures.

The uploaded file MUST contain only ATOM/HETATM records - the chain used as a template is determined from the 21st position in the first line (thus the first chain present in the file).
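
To see which chain would be picked from a file, the check can be approximated as below. This is a sketch under assumptions: the function name template_chain is hypothetical, and the position mentioned above is taken to mean zero-based index 21, which is column 22 (the chain identifier) in the standard PDB ATOM/HETATM record layout. Unlike the requirement above, the sketch skips any other record types instead of rejecting the file.

```python
def template_chain(pdb_text):
    """Return the chain ID of the first ATOM/HETATM record, or None."""
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # Chain identifier: column 22 of the record, zero-based index 21.
            return line[21] if len(line) > 21 else None
    return None
```

A blank character returned here means the first record carries no chain identifier.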

Advanced options

Default options have been selected based on multiple tests. Modifying some of them may result in unexpected behaviour.

Eliminate from alignment

If for some reason you want certain structures removed from the potential template pool, list them here. These structures will not be selected as templates, regardless of which method you choose.

You can either exclude all chains from the given structure

1ual 1uak 4mcb

or specify individual chains (after the colon). To specify multiple chains, please write them down as separate structures:

1ual:A 1uak:A 4mcb:A 4mcb:B

Valid separators between entries are: <spacebar>, <tabulator>, <newline>, comma (,) and semicolon (;), but not a pair of them.
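
The effect of the two exclusion forms (whole structure or single chain) can be illustrated with the sketch below. The function name is_excluded is hypothetical, and the sketch assumes the same case rules as for template lists: PDB codes compared case insensitively, chain symbols case sensitively.

```python
def is_excluded(template, exclusions):
    """Check whether 'code:chain' matches any entry of the exclusion list.

    An entry without a chain ("1ual") excludes every chain of that structure;
    an entry with a chain ("4mcb:B") excludes only that chain.
    """
    code, chain = template.split(":")
    for item in exclusions:
        ex_code, _, ex_chain = item.partition(":")
        if ex_code.lower() != code.lower():
            continue
        if ex_chain == "" or ex_chain == chain:
            return True
    return False
```

With the exclusion list 1ual 4mcb:B, every chain of 1ual is dropped, 4mcb:B is dropped, and 4mcb:A stays in the pool.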

E-value cutoff

The default e-value cutoff is 0.001. Higher values can find more template sequences (with less significant similarity) but also increase the risk of using random, and possibly unrelated, models.

Acceptable notations are:

0.00001 1E-5
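
Both notations denote the same number. As a quick check, Python's built-in float() reads either form; the wrapper parse_evalue below is a hypothetical helper, and rejecting non-positive or non-finite values is an assumption rather than documented server behaviour.

```python
import math


def parse_evalue(text):
    """Parse an e-value cutoff given in decimal or scientific notation."""
    value = float(text.strip())  # float() reads both "0.00001" and "1E-5"
    if not math.isfinite(value) or value <= 0:
        raise ValueError(f"e-value cutoff must be a positive number: {text!r}")
    return value
```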

Sequence alignment within gaps

The positions of missing residues are initialized as the averaged positions of their counterparts in the templates, which are assigned through an alignment (or a Multiple Sequence Alignment in the case of multiple templates).
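
The averaging step can be pictured with a minimal sketch. It assumes one (x, y, z) coordinate per aligned template residue (for example the C-alpha atom); which atoms are actually averaged is not stated here, and the function name initial_position is hypothetical.

```python
def initial_position(counterparts):
    """Mean (x, y, z) of the aligned template residues' coordinates.

    counterparts: list of (x, y, z) tuples, one per template that has
                  a residue aligned to the missing target residue.
    """
    n = len(counterparts)
    if n == 0:
        raise ValueError("no template residue is aligned to this position")
    return tuple(sum(c[i] for c in counterparts) / n for i in range(3))
```

With a single template the starting position is simply that template's coordinate.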

Two ways of creating the MSA within gaps have been implemented, both based on the BLOSUM62 similarity matrix:

progressive alignment (default),

where the target is first aligned to the most similar template, then the rest of the sorted templates are added (aligned with the growing alignment) one by one.

consensus alignment,

where sequences are aligned in an order based on a UPGMA guide tree - in each step the closest (in terms of alignment score) sequences/subalignments are aligned, until only one alignment remains. In particular, the target sequence may be the last one added.
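
The order in which a UPGMA guide tree joins sequences can be sketched as follows. This only illustrates the merge order, not the alignment itself or the server's code: the function name upgma_merges is hypothetical, and the pairwise distances are made-up values standing in for distances derived from alignment scores.

```python
from itertools import combinations


def upgma_merges(names, dist):
    """Return the list of cluster pairs in the order UPGMA joins them.

    names: list of unique sequence labels
    dist:  dict frozenset({a, b}) -> distance between labels a and b
    """
    clusters = [(name,) for name in names]

    def avg(c1, c2):
        # UPGMA distance: mean over all member pairs of the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg(*p))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2))
    return merges


# Hypothetical distances: the templates are closer to each other than to the target.
d = {
    frozenset(("T1", "T2")): 0.1,
    frozenset(("T1", "T3")): 0.3,
    frozenset(("T2", "T3")): 0.3,
    frozenset(("target", "T1")): 0.6,
    frozenset(("target", "T2")): 0.6,
    frozenset(("target", "T3")): 0.5,
}
merges = upgma_merges(["target", "T1", "T2", "T3"], d)
```

In this example the three templates are joined first and the target is added in the final step, which is the situation mentioned above. In the progressive protocol the target would instead be the starting point.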