Dataset: Cullin4-Ubiquitin ligase + SAMHD1 + Vpr (citation). The crosslinker is sulfo-SDA, and the search considers links from K, S, T, Y and the protein N-terminus to any amino acid. The raw data is available here.
Open xiSEARCH.
In the files tab, load peak files (recal*.mgf files) and sequence file (complex.fasta) and select an output name.
In the parameters tab, tick "multiple crosslinkers", select SDA and non-covalent, and deselect BS3. Why is that? citation
Select the number of threads and the memory, and set missed cleavages to 4.
In variable modifications, deselect BS3 modifications and select SDA-loop and SDA-OH.
Tick "Do FDR" and keep the threshold at 2% at the residue pair level, boosting heteromeric crosslinks.
This is a very simplified "how to" guide, and a real search involves more decisions. In particular, the sequence database may need to be extended to include contaminants, and the FDR settings may need to be adjusted so that there are enough targets and decoys to model the noise and control the error rate statistically.
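The target-decoy idea behind the FDR step can be sketched in a few lines. This is a minimal illustration, not the xiFDR implementation: it assumes the commonly used crosslink estimate FDR ≈ (TD − DD) / TT, where TT, TD and DD are the numbers of target-target, target-decoy and decoy-decoy matches above a score cutoff. The function names and the match representation are hypothetical.

```python
import math
from typing import List, Optional, Tuple

# Each match is (score, kind), where kind is "TT", "TD" or "DD".
Match = Tuple[float, str]


def estimate_fdr(matches: List[Match], cutoff: float) -> float:
    """Estimate the FDR among target-target matches scoring >= cutoff."""
    counts = {"TT": 0, "TD": 0, "DD": 0}
    for score, kind in matches:
        if score >= cutoff:
            counts[kind] += 1
    if counts["TT"] == 0:
        # No targets pass, so the estimate is undefined; treat it as failing.
        return math.inf
    # Assumed estimator: (TD - DD) / TT, floored at zero.
    return max(counts["TD"] - counts["DD"], 0) / counts["TT"]


def lowest_cutoff(matches: List[Match], max_fdr: float) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is <= max_fdr."""
    for cutoff in sorted({score for score, _ in matches}):
        if estimate_fdr(matches, cutoff) <= max_fdr:
            return cutoff
    return None
```

With only a handful of decoys, the estimate jumps in large steps as the cutoff moves, which is one way to see why enough targets and decoys are needed for the threshold to be meaningful.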
documentation for the search engine here
documentation for FDR estimation engine here
Create your own account in xiview.org and upload the results, or else proceed with the pregenerated dataset as below.
Open the dataset in the xiview.org interactive viewer here.
You can expand each protein with a right click of the mouse to see where the crosslinks are localised on the sequence. Right click again and resize the protein to 0.2.
Crosslinks may also be visualised in the circle plot, accessible from "views"... "circular".
At the bottom of the viewer, you have several toggles:
- self/heteromeric: toggle crosslinks within protein sequences or between protein sequences on or off. As the experiment is peptide based, a "self" link can also be between multiple copies of the same protein.
- overlap/non-overlap: toggle self crosslinks whose two peptides overlap in sequence and must therefore come from multiple copies of the same protein (not relevant here)
- Score: crosslink score. This is an arbitrary number specific to each search engine. The higher the score, the better the quality of the peptide-spectrum match. This slider should be used for visualization purposes only. A key property of FDR control is that the results cannot be subset any further afterwards, or else the FDR becomes unknowable. For a more stringent dataset, repeat the FDR procedure with a tighter threshold.
- Distance: Filter by distance once a PDB is uploaded to the session.
- Residue pairs per PPI: filter the network so that only protein pairs with more than X crosslinks are displayed
- filter boxes: filter by peptide sequence, protein name, description. On the right there are also boxes for run name and scan number.
The top panel includes tabs for uploading various files, including PDB files, and for analysing crosslink data.
All run files for the experiments with low SDA concentrations are called "Ratio24". The ones with the high SDA concentrations are called "Ratio56". Using the filters, take a look at how many crosslinks and how many heteromeric crosslinks correspond to either condition. What do you see?
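The same comparison can be made outside the viewer if the residue pairs are exported as a table. The sketch below is only an illustration: the column names `run`, `protein1`, `pos1`, `protein2` and `pos2` are assumptions, and a real export may label them differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set, Tuple


def count_links_by_condition(
    rows: Iterable[Mapping[str, str]],
    conditions: Tuple[str, ...] = ("Ratio24", "Ratio56"),
) -> Dict[str, Dict[str, int]]:
    """Count unique residue pairs (all and heteromeric) per condition."""
    all_pairs: Dict[str, Set[tuple]] = defaultdict(set)
    hetero_pairs: Dict[str, Set[tuple]] = defaultdict(set)
    for row in rows:
        # The condition is read from the run name, as described in the text.
        condition = next((c for c in conditions if c in row["run"]), None)
        if condition is None:
            continue
        # Sort the two ends so that A-B and B-A count as the same pair.
        end1 = (row["protein1"], int(row["pos1"]))
        end2 = (row["protein2"], int(row["pos2"]))
        pair = tuple(sorted((end1, end2)))
        all_pairs[condition].add(pair)
        if row["protein1"] != row["protein2"]:
            hetero_pairs[condition].add(pair)
    return {
        c: {"all": len(all_pairs[c]), "heteromeric": len(hetero_pairs[c])}
        for c in conditions
    }
```

The rows can come from `csv.DictReader`, since the function only needs mappings keyed by column name.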
Go to views.. circular, and in the Name/acc selection box at the bottom of the page type Vpr. Which region of SAMHD1 interacts with Vpr? Next, type SAMHD1. Does the rest of its sequence have a specific interface with the rest of the complex?
Let's get an idea for what crosslinked peptide spectra look like. In the scan box at the bottom right of the page, select scan 7144 and select the resulting crosslink. From the dropdown menu at the top, select views-> spectrum.
Take a moment to familiarize yourself with the viewer. At the top of the spectral window you see the error in matching the precursor (whole peptide) and its mass and charge state.
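The precursor error shown at the top of the spectrum is usually expressed in parts per million. As a worked example of that arithmetic (the function names are hypothetical, and this is not the viewer's own code):

```python
PROTON_MASS = 1.007276466  # Da, mass of a proton


def neutral_mass(mz: float, charge: int) -> float:
    """Convert an observed m/z and charge state to a neutral mass in Da."""
    return (mz - PROTON_MASS) * charge


def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Relative mass error of the match in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6
```

For a crosslinked match, the theoretical mass is the sum of both peptide masses plus the crosslinker mass, so an error of a few ppm on a precursor of several thousand Da corresponds to only a few hundredths of a Da.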
The vertical hockey pucks along the sequence are the fragment ions covering the sequence. You can hover over them with the mouse to reveal which part of the sequence they cover and to which peak they correspond. Each fragment ion may have more than one peak supporting it, as there may be versions that are unmodified and versions that have sustained the loss of water or ammonia groups.
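To make the fragment annotation concrete, here is a minimal sketch that lists singly charged b and y ions of a linear, unmodified peptide together with their water and ammonia loss peaks. It is a simplification: for a crosslinked peptide, every fragment that contains the linked residue would also carry the mass of the crosslinker and the second peptide, which is not modelled here. The function name is hypothetical.

```python
from typing import Dict

# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
AMMONIA = 17.026549
PROTON = 1.007276


def fragment_ions(peptide: str) -> Dict[str, Dict[str, float]]:
    """Singly charged b/y ions with their -H2O and -NH3 loss peaks."""
    ions: Dict[str, Dict[str, float]] = {}
    for i in range(1, len(peptide)):
        # b ions cover the first i residues, y ions the remaining ones.
        b_mz = sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON
        y_mz = sum(RESIDUE_MASS[aa] for aa in peptide[i:]) + WATER + PROTON
        for name, mz in ((f"b{i}", b_mz), (f"y{len(peptide) - i}", y_mz)):
            ions[name] = {"mz": mz, "-H2O": mz - WATER, "-NH3": mz - AMMONIA}
    return ions
```

Each fragment therefore has up to three candidate peaks, which is why a single fragment in the viewer may be supported by more than one peak.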
Sometimes for this crosslinker, the exact crosslink site is not known as backbone fragmentation is incomplete. In this case, the program assigns the site to the last compatible amino acid in a stretch of equivalent linkage positions.
You can check how the spectrum would look with a different assignment by moving the crosslink site with the mouse. For more radical reannotations, you can click on the wheel and change the sequence or enter custom annotation commands.
The group of Andrea Sinz has observed that diazirine crosslinkers such as sulfo-SDA may cleave in the gas phase. We can check if this has happened here by introducing a custom annotation for it.
Inside the spectral viewer, click the wheel and then the "custom" tab, copy the following line
crosslinker:AsymetricSingleAminoAcidRestrictedCrossLinker:Name:SDA;MASS:82.04186484;FIRSTLINKEDAMINOACIDS:*;SECONDLINKEDAMINOACIDS:K,S,T,Y,nterm;STUBS:A,82.041864,S,0
that accounts for a cleavable crosslinker, and click "apply".
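The custom annotation line above is a prefix, a class name, and a list of `KEY:value` fields separated by semicolons. The sketch below only splits the line into those parts so the fields are easier to read; it is a purely syntactic view written for this tutorial, it does not interpret what each field means to the search engine, and the function name is hypothetical.

```python
from typing import Dict, Tuple


def parse_crosslinker_line(line: str) -> Tuple[str, Dict[str, str]]:
    """Split a 'crosslinker:<Class>:<KEY:value;...>' line into its parts."""
    prefix, class_name, body = line.strip().split(":", 2)
    if prefix != "crosslinker":
        raise ValueError("not a crosslinker definition")
    fields: Dict[str, str] = {}
    for item in body.split(";"):
        # Split on the first colon only, so values may contain commas freely.
        key, value = item.split(":", 1)
        fields[key.upper()] = value
    return class_name, fields
```

Applied to the line above, the `MASS` field holds 82.04186484 and the `SECONDLINKEDAMINOACIDS` field holds the K,S,T,Y,nterm list used in the search.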
Is this annotation better than the previous one? Click "butterfly" at the top of the viewer to check.
Finally, let's look at crosslinks supported by scans that barely pass the FDR. Filter to a low spectral matching score (<10) and select a crosslink. Open the spectrum viewer. Check the difference with the high ranking spectra.
Always check the quality of the spectra before staking a biological interpretation on sparse crosslinking data!
In the "Histogram" and "scatterplot" sections of the view, several properties of the dataset may be investigated.
Using the box to select "frac6" and "frac9", check whether the number of crosslinks, the charge state and the precursor mass differ between earlier and later chromatographic fractions.
What consequences does this have for experimental design?
In the upload tab, go to "sequence annotations" and upload the file XXX.csv. In the "annotation" tab at the top, toggle annotations on. You should now see domains highlighted on the sequence.
In this case, we used a custom sequence annotation file (sequence_annotations.csv), as our proteins are recombinant and we searched with an in-house sequence file containing the tags. For proteins with uniprot IDs in the sequence file headers, xiview will automatically download this information from uniprot.
Upload the file "state-3_fit_chains.pdb" from the course package. In "view", select the 3d viewer. In the "annotation" tab at the top, toggle off the domains and select "PDB aligned region". What do you notice? Which protein is missing from the experimental structure?
Let's look at the 3d structure now. To check if the crosslinks are satisfied by this model, open "view", "legends and colors" and then color the crosslinks by distance. Let's set satisfied up to 25 angstrom, borderline 25-30 and violated over 30 angstrom. Go back to the 3d viewer. What do you see? You can also toggle between all crosslinks and heteromeric only. Monitor the distances on the circle plot.
For a more quantitative overview, check the histogram tab and plot by distance, or the scatterplot tab and plot crosslink score vs distance.
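The distance check itself is simple to reproduce. The sketch below assumes Cα-Cα distances read from fixed-column PDB ATOM records and uses the cutoffs chosen above (25 and 30 angstrom). The function names are hypothetical, and the viewer may measure distances differently.

```python
import math
from typing import Dict, Iterable, Tuple

Coord = Tuple[float, float, float]
Residue = Tuple[str, int]  # (chain ID, residue number)


def ca_coordinates(pdb_lines: Iterable[str]) -> Dict[Residue, Coord]:
    """Read C-alpha coordinates from fixed-column PDB ATOM records."""
    coords: Dict[Residue, Coord] = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))
            coords[key] = (
                float(line[30:38]), float(line[38:46]), float(line[46:54])
            )
    return coords


def classify_link(
    a: Coord, b: Coord, satisfied: float = 25.0, violated: float = 30.0
) -> str:
    """Label a crosslink by the distance between its two residues."""
    distance = math.dist(a, b)
    if distance <= satisfied:
        return "satisfied"
    if distance <= violated:
        return "borderline"
    return "violated"
```

Residues that are absent from the structure have no coordinates, so crosslinks to them cannot be classified at all. Keep this in mind when counting violations.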
Select SAMHD1 crosslinks in the protein selection box, and toggle off self links.
Go back to the 3d viewer and toggle the display to "residues with half links" from the dropdown menu.
Within the 3d viewer, you can upload the 3 different PDBs and check the distance of critical crosslinks.
To work with crosslinking data in chimerax, one can export the links from the 3d viewer by clicking on 3d export.. chimerax pseudobond file.
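For orientation, a pseudobond file is plain text with one pair of atom specifiers per line. The sketch below writes such lines for Cα atoms. It assumes the ChimeraX convention of `; key = value` header lines for settings such as dashes and color; the function name, the model number and the choice of Cα atoms are assumptions made here, and the file exported by the viewer may be laid out differently.

```python
from typing import Iterable, List, Tuple

Link = Tuple[str, int, str, int]  # (chain1, residue1, chain2, residue2)


def pseudobond_lines(
    links: Iterable[Link], model: int = 1, color: str = "blue", dashes: int = 0
) -> List[str]:
    """Build the text lines of a pseudobond file linking C-alpha atoms."""
    # Header settings (assumed syntax: '; key = value').
    lines = [f"; dashes = {dashes}", f"; color = {color}"]
    for chain1, res1, chain2, res2 in links:
        lines.append(
            f"#{model}/{chain1}:{res1}@CA #{model}/{chain2}:{res2}@CA"
        )
    return lines
```

Writing `"\n".join(pseudobond_lines(links))` to a file with the `.pb` extension should give something ChimeraX can open alongside the structure, provided the chain IDs and residue numbers match the loaded model.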
To work directly in ChimeraX, you can instead use X-MAS, a plugin by the Scheltema group (citation).
Open chimerax and load the session.cxs from the course package.
Go to tools... more tools and install the X-MAS plugin.
Go to tools.. structure analysis... XMAS. In the tool, click on "import files". Navigate to results files and load the "peptide pairs" file. Select State 2 pdb and map crosslinks.
Click on "visualize". You can color by distance here, or export to other programs like DisVis for mapping patches and similar.
To remove dashes, set dashes to 0 inside X-MAS or use the command
style pbonds dashes 0
The advantage of working in ChimeraX is that you may load densities at the same time, as well as take advantage of much more powerful visualization options.
DisVis. Check out DisVis here or, later, download the package from here and run it locally. DisVis calculates the volume that the center of mass of protein B can occupy while remaining consistent with N restraints stemming from protein A. In our case, we can use it to position the SAMHD1 CTD against Vpr.
We can look at the results of a DisVis analysis in this case. I ran the program to find the position of the center of mass of the C-terminus of SAMHD1 against the rest of the complex according to crosslinking MS data.
Open a second instance of ChimeraX. Load "state3_loops_reordered_renum_rhesus_A.pdb" from the DisVis folder in the course package, and then open accessible_interaction_space.mrc.
The threshold of this density is the number of crosslinking MS restraints consistent with a given volume.
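The meaning of that threshold can be illustrated with a toy version of the calculation. This is not the DisVis algorithm: the real program searches rotations and translations of the whole scanning protein and accounts for clashes, whereas the sketch below treats protein B as a single point and only counts, for each candidate position, how many restraints from protein A are within reach. All names are hypothetical.

```python
import itertools
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def cubic_grid(half_width: float, step: float) -> List[Point]:
    """Candidate positions on a cubic grid centred on the origin."""
    n = int(half_width / step)
    ticks = [i * step for i in range(-n, n + 1)]
    return list(itertools.product(ticks, repeat=3))


def restraint_counts(
    fixed_atoms: List[Point], max_distance: float, grid: List[Point]
) -> Dict[Point, int]:
    """For each grid point, count the restraints it is consistent with."""
    return {
        point: sum(
            1 for atom in fixed_atoms if math.dist(atom, point) <= max_distance
        )
        for point in grid
    }
```

Raising the threshold from N to N+1 keeps only the grid points consistent with at least one more restraint, so the displayed volume shrinks as the threshold goes up.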
AlphaLink. A modified version of AlphaFold 2 that takes experimental crosslinks into account during prediction. Run via colab here. Citation here.
Other programs include: IMP (integrative modeling platform), Assembline, HADDOCK .
Finally, AlphaFold3 came out recently. Let's see how crosslinking MS looks with this. Load the alphafold model in chimerax or in xiview.org.
In chimerax, note the model number, and color it by local confidence
color bfactor palette alphafold
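The coloring works because AlphaFold models store the per-residue confidence (pLDDT) in the B-factor field. As a rough sketch of what the palette encodes, the code below reads that field from PDB-format ATOM records and bins it into the usual four confidence classes. It assumes a PDB-format file (a model written as mmCIF would need a different parser), and the function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Residue = Tuple[str, int]  # (chain ID, residue number)


def residue_plddt(pdb_lines: Iterable[str]) -> Dict[Residue, float]:
    """Read pLDDT per residue from the B-factor column of C-alpha atoms."""
    scores: Dict[Residue, float] = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[(line[21], int(line[22:26]))] = float(line[60:66])
    return scores


def confidence_class(plddt: float) -> str:
    """Bin a pLDDT value into the commonly used confidence classes."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"
```

Crosslinks that land in "low" or "very low" regions are the ones to inspect most carefully, since a violated distance there may reflect an uncertain prediction rather than a wrong crosslink.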
Map the crosslinks with X-MAS, adjusting coloring and dashes. What do you see?
Is the supposed interaction area of Vpr-SAMHD1 in agreement with AlphaFold? Why was the protein not observed in cryo-EM? What about Vpr on the main complex? You can select individual interfaces in the xiview viewer by typing in the protein selection box
SAMHD1-Cul
and so on for the other proteins Vpr, DDB1, DCAF and Roc1.
