Possible uses of GapRepairer

This section of the manual is divided into two categories:

  1. Possible applications - situations (with examples) when GapRepairer is the best way forward. This includes:
    1. Protein published with incorrect topology
    2. Proteins with high sequence similarity but differing in topology - the only example to date is the OTC/ATC protein family;
    3. Repair of multiple incomplete chains in one structure
    4. Reconstruction of protein with many (or long) undetermined protein backbone fragments
  2. How was our server used to date - list of publications for which GapRepairer was used to prepare the files.

Vid. 1 Example video of using GapRepairer to rebuild the correct (knotted) topology of methyltransferase with PDB ID 1oy5. To enhance the experience turn on your sound.

Vid. 2 Example video of using GapRepairer to rebuild the correct (trivial) topology of a linked computational model of HIV envelope glycoprotein (PDB ID 3j70). To enhance the experience turn on your sound.

  1. Possible applications
    1. Artificially disentangled proteins
    2. At times, a protein structure may be determined with the wrong fold, e.g. due to poor resolution (X-ray data) or a low-quality density map (ECM data). In such a case GapRepairer can be used to create a structure with the correct nontrivial topology. The protocol for such a protein is described below, using a bacterial tRNA methyltransferase (PDB ID: 1oy5) as an example; after proper reconstruction its backbone is knotted.


      1. First, the offending segments should be removed (either by hand in a text editor, or with a graphical program such as PyMOL); the amino acids to be removed are shown in red in Fig. 1 below.

      2. Fig. 1 (Left) Superposed structures with PDB IDs 1oy5 (dark gray) and 4mcb (light gray); the crossing that creates the trefoil knot in the latter is shown in blue and red. The amino acids that should be removed from 1oy5 to allow a repair with the correct fold are shown in red. (Right) Protein with PDB ID 1oy5 after reconstruction. A new crossing is introduced through a different threading of the red-coloured parts: the bottom-right to top-left connection was previously below (closer to the core than) the top-right to bottom-left chain, and is now above it (further from the core).

      3. Then the modified structure should be uploaded to GapRepairer using the Process my own structure option, along with the unmodified .fasta file. To ensure that the original structure (as its own best homologue) is not used as a template, it should be specified in the Structures to exclude option (under Advanced Options in the repair form). The protein backbone after reconstruction is shown in Fig. 1 (right).
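
      The first step of this protocol (removing the offending segment) can also be scripted. The following is a minimal sketch, not part of GapRepairer itself: the function names remove_residues and strip_file are hypothetical, and the code assumes a standard fixed-column PDB file in which the chain ID sits in column 22 and the residue number in columns 23-26. Insertion codes and alternate locations are ignored.

```python
def remove_residues(pdb_lines, chain_id, first_res, last_res):
    """Drop ATOM/HETATM records of one chain within a residue-number range.

    Relies on the fixed-column PDB layout: chain ID in column 22,
    residue number in columns 23-26 (insertion codes are ignored).
    """
    kept = []
    for line in pdb_lines:
        is_atom = line.startswith(("ATOM", "HETATM"))
        if is_atom and line[21:22] == chain_id:
            if first_res <= int(line[22:26]) <= last_res:
                continue  # residue lies in the segment to be rebuilt
        kept.append(line)
    return kept


def strip_file(src_path, dst_path, chain_id, first_res, last_res):
    """Read a PDB file, remove the chosen segment, and write the result."""
    with open(src_path) as src:
        lines = src.readlines()
    with open(dst_path, "w") as dst:
        dst.writelines(remove_residues(lines, chain_id, first_res, last_res))
```

      The residue range to remove has to be read off the structure (for 1oy5, the red fragments in Fig. 1); the file written by strip_file is then the one to upload, together with the unmodified .fasta file.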

    3. Artificially entangled proteins
    4. As another possible application, GapRepairer can be used to untie a protein backbone. A careless reconstruction of a gap (e.g. with a straight line) can lead to an artificial entanglement. Such artificial crossings are more common for knots than for slipknots (77 and 55 cases respectively, as found by KnotProt). This difference may be due to the number of additional crossings that must be introduced to create an artificial entanglement: a single such error is enough to create a knotted protein. For such a protein GapRepairer can be used to recreate the missing fragment and return to the trivial topology (if that is the prevalent one amongst the homologues of the protein in question), using the workflow described for Artificially disentangled proteins.

      Examples of untied proteins (reduced from a 3_1 knot/slipknot to an unknot):

        Fig. 2 Reconstructed protein with PDB ID 3sij chain A. An entanglement search based on a "straight line repair" (magenta tube) would suggest a 3_1 knot, yet a repair based on homologous proteins (in blue) shows that the loop should go on the outside, so no knot is created. A schematic visualization of the crossing in both cases is shown in the upper left corner.

      Untying more complicated proteins:

      While artificial trefoils (that is, 3_1 knots/slipknots) are by far the most common errors of this kind, more complicated topologies can also be created - and resolved using GapRepairer. Such an entanglement may arise when a protein contains one (or more) twisted loops and the unfortunate connection that fills the gap happens to go right through such a loop.

      Fig. 3 Based on homologous proteins, the reconstructed protein with PDB ID 2d7d is unknotted (newly added loop in blue) - contrary to the straight line repair (magenta tube), which would lead to a 4_1 slipknot. A schematic visualization of the crossing in both cases is shown in the upper left corner.

      Case study: Correcting a structure based on an incorrect lasso topology

      The structure with PDB ID 3j70 is a computational model of the HIV envelope glycoprotein. Two of its loops cross each other (marked in Fig. 4, left panel below); since each belongs to a separate lasso, together they form a Hopf link. While these are highly mobile loops, no such arrangement is present in any of the experimental structures of this protein available in the PDB. We therefore judged the modelling of this particular region to be incorrect and rebuilt it with GapRepairer using the following workflow:

      • part of the larger loop was cut out to remove the crossing;
      • the structure with PDB ID 2b4c (Fig. 4, right panel below) was selected as a template, since it contains the most similarly shaped loops (according to the authors of the structure);
      • the final model is identical to the original structure (PDB ID 3j70), except for the newly uncrossed loops (Fig. 4, middle panel below).

      Fig. 4 Original structure of the protein with PDB ID 3j70 chain D, with the residues to be removed coloured red (left). Structure with the loop rebuilt to remove the crossing (middle), based on the structure with the most similarly shaped loops (PDB ID 2b4c chain G, right). In all panels disulfide bridges are marked in yellow, with lassos (and the modelled crossing) in green and blue.
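
      Whether two closed loops (such as two lassos, each closed by its disulfide bridge) are linked can be checked numerically with the Gauss linking number, which is ±1 for a Hopf link and 0 for unlinked loops. The sketch below is only an illustration and is not the method used by GapRepairer or LassoProt: the function names are hypothetical, and the double sum over segment midpoints is an approximation that is reliable only when the two loops stay well apart compared with the segment length. It is demonstrated on two idealised circles rather than on protein coordinates.

```python
import math


def segments(points):
    """Return (midpoint, direction) for every segment of a closed polyline."""
    result = []
    n = len(points)
    for i in range(n):
        p, q = points[i], points[(i + 1) % n]  # wrap around to close the loop
        mid = tuple((a + b) / 2 for a, b in zip(p, q))
        step = tuple(b - a for a, b in zip(p, q))
        result.append((mid, step))
    return result


def cross(u, v):
    """Cross product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def linking_number(curve_a, curve_b):
    """Midpoint approximation of the Gauss linking integral (a float)."""
    segs_b = segments(curve_b)
    total = 0.0
    for mid_a, step_a in segments(curve_a):
        for mid_b, step_b in segs_b:
            r = tuple(a - b for a, b in zip(mid_a, mid_b))
            dist = math.sqrt(sum(c * c for c in r))
            normal = cross(step_a, step_b)
            total += sum(x * y for x, y in zip(r, normal)) / dist ** 3
    return total / (4 * math.pi)


# Two idealised test loops: a unit circle in the xy-plane and a unit circle
# in the xz-plane, shifted along x so that it threads the first one.
N = 100
angles = [2 * math.pi * k / N for k in range(N)]
ring_a = [(math.cos(t), math.sin(t), 0.0) for t in angles]
ring_linked = [(1.0 + math.cos(t), 0.0, math.sin(t)) for t in angles]
ring_apart = [(3.0 + math.cos(t), 0.0, math.sin(t)) for t in angles]

print(abs(round(linking_number(ring_a, ring_linked))))  # linked pair
print(abs(round(linking_number(ring_a, ring_apart))))   # unlinked pair
```

      For real loops the input would be the C-alpha coordinates of each lasso loop, and the result should be rounded to the nearest integer; values far from an integer indicate that the approximation is too coarse for that geometry.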

    5. Change in the chirality of the entanglement
    6. Thanks to the possibility of reconstructing multiple missing fragments at the same time, GapRepairer can also be used to change the chirality of a protein's entanglement. By correcting just two wrongly interpreted crossings, a -3_1 slipknot can be changed into a +3_1 one (as shown in Fig. 5). The protein with PDB ID 4zg6 is annotated as a -3_1 slipknot in KnotProt (due to the straight-line gap filling), while all of its homologues have the opposite chirality. As can be seen in Fig. 5 below, the repaired protein has a +3_1 topology.

      Fig. 5 Original structure of the protein with PDB ID 4zg6 (upper left). Repaired structure with rebuilt fragments coloured magenta (upper middle). Homologue used as a template (PDB ID 4zg7 chain A, upper right). Respective topology matrices in the lower row.

    7. Proteins with high sequence similarity but differing in topology
    8. Topology differentiation is especially important for proteins where high sequence similarity does not preclude differing topology. The most notable example is the ATCase/OTCase protein family (Pfam family PF00185), which contains two closely related enzymes: a 3_1-knotted aspartate carbamoyltransferase and an unknotted ornithine carbamoyltransferase. These two enzymes cannot be easily told apart based on their sequence, yet they have different folds (as can be seen in the JSmol applet below). A consensus structure that averages over different topologies can be quite unexpected (it is impossible to say for sure which topology it would display), so restricting the templates to those whose topology agrees with the closest homologue drastically reduces the uncertainty of the final result.

      Fig. 6 Proteins that belong to the ATCase/OTCase family: knotted aspartate carbamoyltransferase (PDB ID 3kzm, in green and blue) and unknotted ornithine carbamoyltransferase (PDB ID 2otc, chain A, in magenta and red), both displayed as backbone only. The knot, and the knot-equivalent region in the unknotted protein, are distinguished by the colour and radius of the line.

    9. Repair of multiple incomplete chains in one structure
    10. When uploading a .pdb file to GapRepairer, the user can choose to take the other chains present into consideration. As currently only one chain can be modelled at a time, this option allows the user to repair a multichain structure without resorting to editing .pdb files by hand. To repair a two-chain structure with both chains incomplete, the proper workflow would be to:

      • First, upload the structure with the first chain selected to be modelled (e.g. A) and the multichain option selected.
      • Then download a chosen finished model from amongst the GapRepairer results, and upload this file with the other chain chosen (e.g. B), again with the multichain option selected.
      • Once GapRepairer finishes, all the final models will contain both chains, repaired.
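
      Before starting such a multi-pass repair it can help to see which chains are actually incomplete. The sketch below is an assumption-laden illustration, not a GapRepairer feature: gaps_by_chain is a hypothetical helper that inspects only the C-alpha ATOM records of a fixed-column PDB file and treats every jump in residue numbering as a gap. It cannot detect residues missing at the termini (that requires comparison with the .fasta sequence), and irregular numbering or insertion codes may produce false gaps.

```python
def gaps_by_chain(pdb_lines):
    """Map each chain ID to a list of (last_present, next_present) gap flanks.

    Only C-alpha ATOM records are inspected, and a gap is assumed wherever
    consecutive residue numbers differ by more than one.
    """
    residues = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21:22]
            residues.setdefault(chain, set()).add(int(line[22:26]))

    gaps = {}
    for chain, numbers in residues.items():
        ordered = sorted(numbers)
        gaps[chain] = [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > 1]
    return gaps
```

      Chains that map to a non-empty list are the ones to select for modelling, one per pass, in the workflow above.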

    11. Reconstruction of protein with many (or long) undetermined protein backbone fragments
    12. There is currently no upper limit on the length or number of missing fragments, as long as at least the first and last residues are present and, to ensure a sensible reconstruction, close enough homologues exist.

      Fig. 7 Original structure of the protein with PDB ID 4zg9 with 4 gaps (left). Repaired structure with rebuilt loops in magenta (middle). Homologue used as a template (PDB ID 3nkm - right).

      This functionality can also be used for structures with significant structural errors. One such case that can be resolved using GapRepairer is the structure with PDB ID 2xkl. Based on a comparison with its closest structural and sequential homologues (e.g. structures with PDB IDs 2wew and 2xkl), one of its beta strands has an incorrectly assigned position - it was assumed to appear at the N-terminus, while it should fall in the middle of the sequence (deep blue beta strand in the middle of Fig. 8, left panel). Thanks to the close structural relation with its homologues, it is possible to fully reconstruct the assumed correct form. The suggested workflow is as follows:

      • trim the structure so that only a few residues remain at both the C- and N-terminus (here about 10 amino acids at each end). It is best to leave at least 8-10 amino acids, to ensure a proper sequence-structure alignment;
      • upload it to GapRepairer (together with the correct .fasta sequence), and choose to exclude the original structure (Structures to exclude option).
      Any of the homologue selection methods should be suitable; for the following images the "Consensus" option was selected.

      Fig. 8 Original structure of the protein with PDB ID 2xkl (left) and the structure reconstructed from its N- and C-terminal ends (middle), based on its closest homologue (PDB ID 2wew, right).
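
      The trimming step of this workflow can be sketched in a few lines. This is a hypothetical helper (keep_termini is not part of GapRepairer), assuming a fixed-column PDB file; it keeps the first and last n_keep residues of the chosen chain and drops every other ATOM record of that chain. HETATM records and other chains are left untouched and may need to be removed separately.

```python
def keep_termini(pdb_lines, chain_id, n_keep=10):
    """Keep only the first and last `n_keep` residues of one chain.

    Residues are identified by the residue number in columns 23-26 of
    ATOM records; insertion codes are ignored.
    """
    if n_keep < 1:
        raise ValueError("n_keep must be at least 1")

    def in_chain(line):
        return line.startswith("ATOM") and line[21:22] == chain_id

    numbers = sorted({int(line[22:26]) for line in pdb_lines if in_chain(line)})
    keep = set(numbers[:n_keep]) | set(numbers[-n_keep:])

    # Lines outside the chosen chain pass through unchanged.
    return [line for line in pdb_lines
            if not in_chain(line) or int(line[22:26]) in keep]
```

      With the default n_keep=10 this reproduces the "about 10 amino acids at each end" used for 2xkl; the trimmed file is then uploaded with the full .fasta sequence so that everything in between is rebuilt from the homologues.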

  2. How was our server used to date
    1. KnotProt

    2. LassoProt
    3. GapRepairer was used to check whether reconstructing the missing amino acids in gapped proteins would make it possible to confirm the position of a lasso (that is, of the disulfide bridge). PDB codes of the repaired proteins whose chains and complex lasso type were successfully modelled are listed below, and will shortly be available in our database.

      List of the repaired lasso proteins:

      1a7s (chain A)
      1agq (chain A)
      1ax8 (chain A)
      1b8k (chain A)
      1d2t (chain A)
      1dof (chain A)
      1egi (chain A)
      1f2q (chain A)
      1f97 (chain A)
      1fcq (chain A)
      1fo8 (chain A)
      1g5g (chain A)
      1gcy (chain A)
      1gv9 (chain A)
      1hc1 (chain A)
      1huw (chain A)
      1jdp (chain A)
      1jnd (chain A)
      1js8 (chain A)
      1jy5 (chain A)
      1kxo (chain A)
      1l1l (chain A)
      1lf7 (chain A)
      1lml (chain A)
      1m48 (chain A)
      1n1f (chain A)
      1neu (chain A)
      1nko (chain A)
      1o3u (chain A)
      1olz (chain A)
      1p53 (chain A)
      1pb7 (chain A)
      1peq (chain A)
      1pew (chain A)
      1pgu (chain A)
      1pko (chain A)
      1q35 (chain A)
      1q8d (chain A)
      1qfo (chain A)
      1qfx (chain A)
      1qg8 (chain A)
      1qgv (chain A)
      1r3e (chain A)
      1rxd (chain A)
      1s4n (chain A)
      1scf (chain A)
      1so7 (chain A)
      1t6e (chain X)
      1uct (chain A)
      1ux6 (chain A)
      1v0w (chain A)
      1v9m (chain A)
      1w07 (chain A)
      1w8a (chain A)
      1w8k (chain A)
      1yi9 (chain A)
      1z4v (chain A)
      1zk5 (chain A)
      1zro (chain A)
      2b7u (chain A)
      2b9l (chain A)
      2bgh (chain A)
      2bog (chain X)
      2bsy (chain A)
      2d1g (chain A)
      2d1h (chain A)
      2ddf (chain A)
      2ddu (chain A)
      2de0 (chain X)
      2dre (chain A)
      2dvk (chain A)
      2e1v (chain A)
      2ecf (chain A)
      2eng (chain A)
      2y (chain A)
      2fj0 (chain A)
      2fna (chain A)
      2fy7 (chain A)
      2gak (chain A)
      2h2t (chain B)
      2hlr (chain A)
      2hq4 (chain A)
      2i10 (chain A)
      2id5 (chain A)
      2im9 (chain A)
      2iy9 (chain A)
      2jd4 (chain A)
      2jju (chain A)
      2jks (chain A)
      2nsm (chain A)
      2nw2 (chain A)
      2nxf (chain A)
      2nyk (chain A)
      2oay (chain A)
      2odp (chain A)
      2pf5 (chain A)
      2pmv (chain A)
      2qki (chain A)
      2qn4 (chain A)
      2raa (chain A)
      2rag (chain A)
      2rl8 (chain A)
      2uur (chain A)
      2uy2 (chain A)
      2vl7 (chain A)
      2vsm (chain A)
      2w2g (chain A)
      2w59 (chain A)
      2w61 (chain A)
      2w9x (chain A)
      2wjs (chain A)
      2wy3 (chain A)
      2x1q (chain A)
      2x2u (chain A)
      2xlg (chain A)
      2xot (chain A)
      2y38 (chain A)
      2y8d (chain A)
      2y8t (chain A)
      2yd6 (chain A)
      2ydv (chain A)
      2yg2 (chain A)
      2ykt (chain A)
      2ymo (chain A)
      2z3q (chain A)
      2z4i (chain A)
      2zou (chain A)
      2zws (chain A)
      3ahq (chain A)
      3aja (chain A)
      3ajd (chain A)
      3ap1 (chain A)
      3bix (chain A)
      3bqk (chain A)
      3bwu (chain F)
      3ci0 (chain K)
      3cj1 (chain A)
      3cqn (chain A)
      3db5 (chain A)
      3dxl (chain A)
      3e0g (chain A)
      3ebw (chain A)
      3eeq (chain A)
      3erb (chain A)
      3f6k (chain A)
      3f95 (chain A)
      3g7n (chain A)
      3ghm (chain A)
      3grf (chain A)
      3h6g (chain A)
      3hhs (chain A)
      3hsy (chain A)
      3i26 (chain A)
      3i84 (chain A)
      3icv (chain A)
      3ix0 (chain A)
      3j0a (chain A)
      3jpw (chain A)
      3jxg (chain A)
      3k1l (chain A)
      3k1w (chain A)
      3k7b (chain A)
      3kbr (chain A)
      3lo8 (chain A)
      3m19 (chain A)
      3n7s (chain A)
      3nhi (chain A)
      3nkq (chain A)
      3nsj (chain A)
      3nvx (chain A)
      3o22 (chain A)
      3o6n (chain A)
      3oe3 (chain A)
      3og6 (chain A)
      3ojo (chain A)
      3okw (chain A)
      3om0 (chain A)
      3omz (chain A)
      3p09 (chain A)
      3pim (chain A)
      3pow (chain A)
      3pv7 (chain A)
      3pvk (chain A)
      3qcp (chain A)
      3qdh (chain A)
      3rnq (chain B)
      3s26 (chain A)
      3s9d (chain A)
      3sao (chain A)
      3sqr (chain A)
      3t4l (chain A)
      3tc2 (chain A)
      3u3l (chain C)
      3vpp (chain A)
      3vrh (chain A)
      3whx (chain A)
      3zh5 (chain A)
      3zib (chain A)
      3zy2 (chain A)
      3zyo (chain A)
      4adi (chain A)
      4ae2 (chain A)
      4aee (chain A)
      4aru (chain A)
      4b4h (chain A)
      4bsj (chain A)
      4bvn (chain A)
      4c08 (chain A)
      4ccd (chain A)
      4cn9 (chain A)
      4cxp (chain A)
      4d8m (chain A)
      4dlo (chain A)
      4dlq (chain A)
      4dzr (chain A)
      4ekx (chain C)
      4el6 (chain A)
      4enz (chain A)
      4g2u (chain A)
      4gf2 (chain A)
      4h18 (chain A)
      4hln (chain A)
      4i0w (chain B)
      4i71 (chain A)
      4ijy (chain A)
      4io2 (chain A)
      4irm (chain A)
      4j3r (chain A)
      4jd9 (chain A)
      4jjh (chain A)
      4jjj (chain A)
      4job (chain A)
      4jvu (chain A)
      4jzz (chain A)
      4k3l (chain A)
      4k60 (chain A)
      4kg7 (chain A)
      4kgh (chain A)
      4kqa (chain A)
      4kt3 (chain A)
      4kx7 (chain A)
      4l7g (chain A)
      4lxr (chain A)
      4mh1 (chain A)
      4ms4 (chain A)
      4myk (chain A)
      4mz2 (chain A)
      4nmx (chain B)
      4nob (chain A)
      4nqw (chain A)
      4oe8 (chain A)
      4p1e (chain A)
      4p49 (chain A)
      4per (chain B)
      4plm (chain A)
      4tr2 (chain A)
      4v2d (chain A)

      Research where our server has been used to prepare structures for the study:
      1. "Methyl Transfer by Substrate Signaling from a Knotted Protein Fold" Thomas Christian, Reiko Sakaguchi, Agata P. Perlinska, Georges Lahoud, Takuhiro Ito, Erika A. Taylor, Shigeyuki Yokoyama, Joanna I. Sulkowska, Ya-Ming Hou. Nat Struct Mol Biol. 2016 Oct;23(10):941-948. doi: 10.1038/nsmb.3282

        For the in silico part of the analysis a missing link between two domains of the 1ual protein was repaired using GapRepairer.

      2. "The exclusive effects of chaperonin on the behavior of the 5_2 knotted proteins" Yani Zhao, Szymon Niewieczerzal, Pawel Dabrowski-Tumanski and J. I. Sulkowska (Submitted)

        The proteins 1cmx and 4i6n were reconstructed using GapRepairer.

    If you use our server and would like to be mentioned here please send us an email to a.jarmolinska [at] cent.uw.edu.pl.

GapRepairer | Interdisciplinary Laboratory of Biological Systems Modelling