SARS-CoV-2 Consensus Genome QC: Frameshift Correction

This page explains how to remove frameshifts (FS) from a SARS-CoV-2 consensus genome FASTA file before submitting it to GISAID.

Khan W, Kanwar S

AKU CITRIC Center for Bioinformatics and Computational Biology, Department of Pediatrics and Child Health, Faculty of Health Sciences, Medical College, The Aga Khan University, Karachi-74800, Pakistan.

In NextClade QC, a frameshift (FS) is indicated by the letter "F" and shown visually as a red line. The color code reflects the total number of FS in the sequence:

  • Red indicates more than 2 FS.

  • Yellow indicates exactly 2 FS.

  • Green indicates at most 1 FS (acceptable).
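The color rules above can be expressed as a small lookup, reading the thresholds as non-overlapping ranges (at most 1 FS is green, exactly 2 is yellow, more than 2 is red). This is a hypothetical helper that mirrors the thresholds stated on this page; it is not NextClade's own QC code, which may score frameshifts differently.

```python
def fs_color(fs_count: int) -> str:
    """Map a frameshift count to the color categories described on this page.

    Hypothetical helper: thresholds follow this guide, not NextClade's source.
    """
    if fs_count < 0:
        raise ValueError("frameshift count cannot be negative")
    if fs_count <= 1:
        return "green"  # acceptable
    if fs_count == 2:
        return "yellow"
    return "red"  # more than 2 frameshifts


for n in range(5):
    print(n, fs_color(n))
```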

Follow the steps below to remove FS from your consensus genome:

1) Identify the nucleotide position where the FS starts. Hovering the mouse over the red line displays this position.

2) Open the alignment file of the sample containing the FS (for example, muscle.out.fasta, an intermediate file generated by the CZ ID pipeline) in MEGA X and search for that nucleotide position.
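Because an alignment can contain gaps in the reference row, the Nth reference nucleotide is not necessarily the Nth column of the alignment file. The sketch below is a hypothetical helper, not part of CZ ID or MEGA X: it reads an aligned FASTA such as muscle.out.fasta, assumes one record is the reference and one is the sample (you supply their record IDs), and prints both rows around a 1-based reference position so you can see the same window you would inspect in MEGA X.

```python
from typing import Dict


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}, joining wrapped lines."""
    records: Dict[str, str] = {}
    current = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                records[current] = "".join(chunks)
            current = line[1:].split()[0]  # record ID = first word of the header
            chunks = []
        else:
            chunks.append(line.upper())
    if current is not None:
        records[current] = "".join(chunks)
    return records


def ref_pos_to_column(aligned_ref: str, ref_pos: int) -> int:
    """Return the 0-based alignment column holding 1-based reference position ref_pos."""
    count = 0
    for col, base in enumerate(aligned_ref):
        if base != "-":
            count += 1
            if count == ref_pos:
                return col
    raise ValueError(f"reference has fewer than {ref_pos} bases")


def show_window(aligned_ref: str, aligned_sample: str, ref_pos: int, flank: int = 10) -> None:
    """Print the reference and sample rows around a reference position."""
    col = ref_pos_to_column(aligned_ref, ref_pos)
    start, end = max(0, col - flank), col + flank + 1
    print("ref   ", aligned_ref[start:end])
    print("sample", aligned_sample[start:end])


# Hypothetical toy alignment: the sample carries an extra N where the reference has "-"
example = """>ref
ATGGCT-AAACCC
>sample
ATGGCTNAAACCC
"""
seqs = read_fasta(example)
show_window(seqs["ref"], seqs["sample"], ref_pos=7, flank=3)
```

With a real alignment, pass the file contents, e.g. `read_fasta(open("muscle.out.fasta").read())`, and use the record IDs that appear in its headers.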

3) As the figure above shows, the "N" at this position should be a gap "-"; this extra "N" is what caused the FS. In this case, removing the "N" from this position can remove the FS. Search for the region around the "N" in the consensus genome FASTA file.
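The reasoning in this step can be checked programmatically: an indel shifts the reading frame when its net length is not a multiple of 3, and a lone "N" opposite a reference gap is a 1-base insertion. The hypothetical function below compares the two aligned rows (as strings), lists columns where the reference has a gap but the sample has "N", and reports whether the net indel length in that region breaks the frame. It is a sketch for inspecting one region, not a replacement for NextClade's frameshift detection.

```python
from typing import List, Tuple


def inspect_region(aligned_ref: str, aligned_sample: str,
                   start: int, end: int) -> Tuple[List[int], int, bool]:
    """Inspect alignment columns [start, end) of two equal-length aligned rows.

    Returns (columns where ref is '-' and sample is 'N',
             net indel length: sample insertions minus deletions vs. the reference,
             True if that net length shifts the reading frame).
    """
    if len(aligned_ref) != len(aligned_sample):
        raise ValueError("aligned rows must have the same length")
    n_in_ref_gap = []
    inserted = deleted = 0
    for col in range(start, min(end, len(aligned_ref))):
        r, s = aligned_ref[col], aligned_sample[col]
        if r == "-" and s != "-":
            inserted += 1  # extra base in the sample
            if s == "N":
                n_in_ref_gap.append(col)
        elif s == "-" and r != "-":
            deleted += 1  # base missing from the sample
    net = inserted - deleted
    return n_in_ref_gap, net, net % 3 != 0


cols, net, shifted = inspect_region("ATGGCT-AAACCC", "ATGGCTNAAACCC", 0, 13)
print(cols, net, shifted)  # [6] 1 True
```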

Hint: To find the exact FS position, copy the 3 flanking nucleotides on each side of that specific "N" and search for them in the FASTA file.
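FASTA files usually wrap sequences over several lines, so a text-editor search for the flanking nucleotides can miss a match that spans a line break. A minimal sketch, assuming the consensus file holds a single record: join the sequence lines, then look for the left flank, one "N" and the right flank, reporting every match. The function names are hypothetical. If the pattern occurs more than once, widen the flanks until it is unique.

```python
from typing import List


def read_single_fasta(text: str) -> str:
    """Return the sequence of a single-record FASTA, with line wraps removed."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return "".join(ln for ln in lines if not ln.startswith(">")).upper()


def find_flanked_n(sequence: str, left: str, right: str) -> List[int]:
    """Return 1-based positions of every 'N' that sits between the given flanks."""
    pattern = left.upper() + "N" + right.upper()
    hits = []
    start = sequence.find(pattern)
    while start != -1:
        hits.append(start + len(left) + 1)  # 1-based position of the N
        start = sequence.find(pattern, start + 1)
    return hits


# The N in this toy record falls right at a line break
consensus = read_single_fasta(">sample\nATGGCTN\nAAACCC\n")
print(find_flanked_n(consensus, left="GCT", right="AAA"))  # [7]
```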

4) Remove the "N" and run NextClade QC again.
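Deleting the base can also be scripted, which avoids accidental edits to neighbouring nucleotides in a text editor. The sketch below uses hypothetical helper names and assumes a single-record FASTA: it deletes the base at a 1-based position only if that base is an "N", then re-wraps the sequence for writing back to a file. The default line width of 60 is an arbitrary choice.

```python
def remove_n_at(sequence: str, pos: int) -> str:
    """Delete the 'N' at 1-based position pos; refuse to delete any other base."""
    idx = pos - 1
    if not 0 <= idx < len(sequence):
        raise IndexError(f"position {pos} is outside the sequence")
    if sequence[idx] != "N":
        raise ValueError(f"expected 'N' at position {pos}, found {sequence[idx]!r}")
    return sequence[:idx] + sequence[idx + 1:]


def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format one record as FASTA, wrapping the sequence at `width` characters."""
    lines = [">" + header]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


fixed = remove_n_at("ATGGCTNAAACCC", 7)
print(to_fasta("sample", fixed, width=6), end="")
```

Write the result to a new file rather than overwriting the original, so the unedited consensus remains available if NextClade QC still reports an FS.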

Note: The alignment file helps you decide whether an "N" needs to be inserted or deleted at a specific position in the consensus file.
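The insert-or-delete decision can be read column by column from the alignment: where the reference has a gap and the sample has "N", the N is surplus and is a candidate for deletion; where the reference has a base and the sample has a gap, an "N" may be missing. The hypothetical function below lists those suggestions in consensus coordinates, assuming the ungapped sample row is identical to the consensus sequence. Review every suggestion against the alignment before editing, because gaps whose length is a multiple of 3 do not shift the frame and may be genuine deletions.

```python
from typing import List, Tuple


def suggest_edits(aligned_ref: str, aligned_sample: str) -> List[Tuple[str, int]]:
    """Suggest consensus edits from a reference-vs-sample alignment.

    Returns ('delete', p) to remove the N at 1-based consensus position p, or
    ('insert', p) to insert an N after 1-based consensus position p.
    Assumes the ungapped sample row is identical to the consensus sequence.
    """
    if len(aligned_ref) != len(aligned_sample):
        raise ValueError("aligned rows must have the same length")
    edits = []
    sample_pos = 0  # number of sample bases seen so far
    for r, s in zip(aligned_ref, aligned_sample):
        if s != "-":
            sample_pos += 1
        if r == "-" and s == "N":
            edits.append(("delete", sample_pos))  # surplus N in the consensus
        elif r != "-" and s == "-":
            edits.append(("insert", sample_pos))  # candidate: insert N after this position
    return edits


print(suggest_edits("ATGGCT-AAACCC", "ATGGCTNAAACCC"))  # [('delete', 7)]
print(suggest_edits("ATGGCTAAACCC", "ATG-CTAAACCC"))  # [('insert', 3)]
```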

We hope this helps you improve the QC of your consensus genomes.
