Multimapping 21-mers in rde1Δ

In the tutorial ‘Cryptococcus siRNAs’ it was observed that in the strain lacking Rde1 most multimappers had a length of 21 nt, rather than 22 nt as normally found. Appearance of 21-mer multimappers can be checked a bit further:

Sample

cDNA counts (collapsed)

Multimapper read counts (uncollapsed)

Name

SRA

20-mer

21-mer

22-mer

23-mer

20-mer

21-mer

22-mer

23-mer

rde1Δ

SRR8697586

9973

12924

15972

12859

375990

723258

550248

446487

SRR8697587

9627

13294

16094

13380

309394

594001

451775

403168

wild type

SRR646636

32990

45271

53001

42888

322177

741118

1557249

1177401

To find counts for reads with matches from 20 to 23 nucleotides in the alignments, as shown in this table, the following approach can be taken. As a control, cDNAs and wild type multimappers are counted.

First, change directory, for the cDNAs:

  • cd STAR-analysis0-h99_collapsed/SRR8697586_collapsed_0mismatch-h99/

Then, get multimapper counts (for above table) with samtools and grep.

(-v for take the inverse, -c for count and -w for search whole word):
  • samtools view samtoolsAligned.sortedByCoord.out.bam | grep -v "NH:i:1" | grep -cw "20M"

  • samtools view samtoolsAligned.sortedByCoord.out.bam | grep -v "NH:i:1" | grep -cw "21M"

  • samtools view samtoolsAligned.sortedByCoord.out.bam | grep -v "NH:i:1" | grep -cw "22M"

  • samtools view samtoolsAligned.sortedByCoord.out.bam | grep -v "NH:i:1" | grep -cw "23M"

Repeat these commands for the replicate:

  • cd ../../STAR-analysis0-h99_collapsed/SRR8697587_collapsed_0mismatch-h99/

And for the multimapping reads:

  • cd ../../STAR-analysis0-h99_uncollapsed/SRR8697586_uncollapsed_0mismatch-h99/

  • cd ../../STAR-analysis0-h99_uncollapsed/SRR8697587_uncollapsed_0mismatch-h99/

and for a wild-type control too:

  • cd ../../STAR-analysis0-h99_collapsed/SRR646636_collapsed_0mismatch-h99/

  • cd ../../STAR-analysis0-h99_uncollapsed/SRR646636_uncollapsed_0mismatch-h99/

To get the numbers for reads of interest - the 21-mers - for each chromosome do (before changing directory):

  • for i in {1..14}; do echo "$i: $(samtools view samtoolsAligned.sortedByCoord.out.bam | grep -v "NH:i:1" | awk "\$3==$i" | grep -cw "21M" )"; done

(awk '$3=1' will select lines when the third field - the chromosome name - is 1; in the full command the chromosome names are passed in as $i, a bash-variable; because bash has to process the awk command as text, the awk-qualifier $3 needs escaping and the whole is in between double quotes to allow interpretation of $i.)

This table collects the results:

Sample

rde1Δ

wildtype

SRA

SRR8697586

SRR8697587

SRR646636

Chromosome

Multimapper read counts 21-mer

1

11036

11510

99094

2

687782

558693

130603

3

2594

2666

59705

4

1271

1312

28507

5

1087

1072

69644

6

896

850

63315

7

3873

4045

26985

8

777

869

28801

9

7885

7442

67859

10

527

566

16059

11

1460

1415

72215

12

987

828

14030

13

1261

1299

50498

14

521

442

11100

With the exception of chromosome 2, to all chromosomes of the wild type 8 to 30-fold more reads are mapped than in the rde1Δ strain. In the mutant, the vast majority of reads map to chromosome 2. Because the counted reads represent a mix of specific and unspecific reads of 21 nt, hits can come from unspecific regions. Given that rRNA reads are multimappers on chromosome 2, these numbers point to this chromosome.

With the following commands we can narrow this down; run in each folder SRR..._uncollapsed_0mismatch-h99/ a command to find the number of reads that map to chromosme 2:

  • samtools view samtoolsAligned.sortedByCoord.out.bam | grep -v "NH:i:1" | awk "\$3==2" | grep -cw "21M"

and then one that retrieves the counts of reads that align within the rDNA locus between 2:270000-284000:

  • samtools view samtoolsAligned.sortedByCoord.out.bam | grep -v "NH:i:1" | awk "\$3==2" | awk '$4 > 270000 && $4 < 284000' | grep -cw "21M"

This gives :

Sample

rde1Δ

wildtype

SRA

SRR8697586

SRR8697587

SRR646636

Chromosome

Multimapper read counts 21-mer

Total

723258

594001

741118

2

687782

558693

130603

270000-284000

684218

555291

32328

This approach points out that most 21-mers in rde1Δ are linked to the rDNA. Whether these represent siRNAs or rRFs is not clear. An analysis with coalispr region, however, revealed that siRNAs antisense to rRNA can be generated which were highly abundant in rde1Δ (see Figure 13 in the tutorial).

After these tests return to Burke-2019/:

  • cd ../../