Input data for Harpy

In spatial proteomics, dozens of images are generated (one for each protein), so you need to do a lot of image analysis.

Where do all these images come from?

They can be large full-slide images generated by cyclic immunofluorescence staining on platforms like MACSima (Miltenyi Biotec) and PhenoCycler Fusion (Akoya). These platforms stain a protein on the slide, take a picture, and then remove the stain before the next cycle stains a different protein, generating one image per protein.

What Harpy adds to the SPArrOW pipeline are bioimage processing techniques, which are needed because these images have many issues.

Issues with spatial proteomics images

Spatial proteomics images need a lot of processing:

  • Stitching artefacts: the machines do not capture the whole slide in one image. They image tiles (smaller, rectangular sections of the slide), and these tiles need to be stitched together to create an image of the whole slide. The stitching process will introduce errors.
  • Background noise because of autofluorescence: this is the light that biological structures emit naturally, without any dyes. It can overlap with the light emitted by the dye or the fluorescent labels, making it hard to distinguish real (fluorescent) signal from the background.
  • Bleaching: the machines use bleaching to remove autofluorescence, but this also introduces artefacts and variable results for different channels.
  • Tiling artefacts: brightness can vary between tiles.
  • Staining artefacts or outliers: bright spots that look like cells but are actually debris.
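
To make the tiling artefact concrete, here is a toy sketch in NumPy. It is not Harpy's method and does not use the Harpy API: the function name `equalize_tile_brightness`, the fixed square tile size and the synthetic image are all assumptions made for illustration. It simply rescales every tile so that its median brightness matches the median of the whole image.

```python
import numpy as np


def equalize_tile_brightness(image, tile_size):
    """Rescale each square tile so its median matches the whole-image median.

    Hypothetical helper for illustration only; real pipelines use more
    careful illumination correction.
    """
    corrected = image.astype(float).copy()
    target = np.median(corrected)  # computed before any tile is modified
    height, width = corrected.shape
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            tile = corrected[y:y + tile_size, x:x + tile_size]  # view, not a copy
            tile_median = np.median(tile)
            if tile_median > 0:
                tile *= target / tile_median  # in-place update of `corrected`
    return corrected


# Synthetic 8x8 image made of four 4x4 tiles with different brightness.
rng = np.random.default_rng(0)
image = rng.uniform(90, 110, size=(8, 8))
image[:4, 4:] *= 1.5  # top-right tile is too bright
image[4:, :4] *= 0.5  # bottom-left tile is too dark

corrected = equalize_tile_brightness(image, tile_size=4)
```

After the correction all four tiles have the same median. Matching medians is a crude choice: it assumes that every tile contains roughly the same amount of tissue, which is often not true on a real slide.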

Issues because of biology

While transcripts are neatly located inside the cell and their diffusion should (in theory) be very limited, proteins can be membrane-bound, which makes it difficult to assign the protein signal to the correct cell.

You also don’t count proteins as you do with transcripts: the data are signal intensities. This means that not all assumptions that we make for transcriptomics data will be valid for proteomics data.
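
As a small illustration of what "signal intensities" means in practice, the sketch below computes the mean intensity per cell from one protein image and a segmentation mask. This is a minimal NumPy example under stated assumptions, not Harpy's implementation: the function name `mean_intensity_per_cell` is hypothetical, and the mask is assumed to use 0 for background and 1..N for cells.

```python
import numpy as np


def mean_intensity_per_cell(intensity, labels):
    """Return the mean intensity of each cell label 1..N (0 is background)."""
    flat_labels = labels.ravel()
    # Sum of pixel intensities per label, and number of pixels per label.
    sums = np.bincount(flat_labels, weights=intensity.ravel())
    areas = np.bincount(flat_labels)
    means = np.zeros_like(sums)
    np.divide(sums, areas, out=means, where=areas > 0)
    return means[1:]  # drop the background entry


# Tiny example: two cells on a 3x3 image.
labels = np.array([[0, 1, 1],
                   [2, 2, 0],
                   [2, 2, 0]])
intensity = np.array([[5.0, 10.0, 20.0],
                      [1.0, 2.0, 0.0],
                      [3.0, 2.0, 0.0]])

per_cell = mean_intensity_per_cell(intensity, labels)
# cell 1 -> (10 + 20) / 2 = 15.0, cell 2 -> (1 + 2 + 3 + 2) / 4 = 2.0
```

The result is a continuous value per cell and per protein instead of an integer count, which is why count-based assumptions from transcriptomics do not carry over directly.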

https://youtu.be/vELF_7YSp0g

Batch effects

When working with multiple samples, the intensity of the images will not be the same for the different samples, so you need to do batch correction.

https://youtu.be/AQL9NTAFsL4
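
To give an idea of why intensities from different samples are not directly comparable, here is a deliberately naive sketch that rescales each sample by its own 99th percentile. This is one simple form of intensity normalization, not necessarily what Harpy does, and real batch correction is usually more involved. The function name `rescale_to_percentile`, the choice of the 99th percentile and the synthetic samples are assumptions for illustration.

```python
import numpy as np


def rescale_to_percentile(image, percentile=99.0):
    """Divide an image by its own upper percentile (hypothetical helper)."""
    reference = np.percentile(image, percentile)
    if reference == 0:
        return image.astype(float)  # nothing to scale by
    return image / reference


# Two synthetic samples with the same underlying distribution,
# but sample B was imaged three times brighter overall.
rng = np.random.default_rng(1)
sample_a = rng.gamma(shape=2.0, scale=100.0, size=(64, 64))
sample_b = 3.0 * rng.gamma(shape=2.0, scale=100.0, size=(64, 64))

norm_a = rescale_to_percentile(sample_a)
norm_b = rescale_to_percentile(sample_b)
```

After rescaling, the 99th percentile of both samples is 1.0, so the global brightness difference is gone. This only removes a single multiplicative factor per sample and assumes that the bright end of the distribution is comparable between samples.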