SARS-CoV-2 Consensus Genome QC - Frame Shift Correction
This page explains how to remove frame shifts (FS) from your SARS-CoV-2 consensus genome FASTA file before submitting it to GISAID.
Khan W, Kanwar S
AKU CITRIC Center for Bioinformatics and Computational Biology, Department of Pediatrics and Child Health, Faculty of Health Sciences, Medical College, The Aga Khan University, Karachi-74800, Pakistan.
In NextClade QC, a frame shift (FS) is flagged with the letter "F" and shown visually as a red line. The color coding reflects the total number of FS in the sequence.
Red indicates the number of FS >2.
Yellow indicates the number of FS <=2.
Green indicates the number of FS <=1 (acceptable).
Follow the steps below to remove an FS from your consensus genome:
1) Identify the nucleotide position where the FS starts. Hovering the mouse over the red line shows the nucleotide position.
2) Open the alignment file of the sample containing the FS, for example muscle.out.fasta (generated as an intermediate file in the CZ ID pipeline), in the MegaX software and search for that nucleotide position.
3) As the figure above shows, the "N" should be a gap "-". This is what caused the FS mutation. In this case, removing the "N" from this position can remove the FS. Search for the region around the "N" in the consensus genome FASTA file.
Hint: To find the exact FS position, copy the flanking nucleotides (+/-3) around that specific "N" and search for them in the FASTA file.
4) Remove the N and run NextClade QC again.
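The flanking-nucleotide search and the removal of the "N" can also be scripted. The following is a minimal sketch in Python, not part of the CZ ID or NextClade tooling. The function names, the toy sequence, and the flanks "GCT" and "AAG" are hypothetical and chosen only for illustration. The sketch refuses to edit unless the flank + N + flank motif occurs exactly once, so an ambiguous match never silently changes the wrong position.

```python
def find_all(seq, motif):
    """Return every start index of motif in seq, including overlapping hits."""
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def remove_n_between(seq, left_flank, right_flank):
    """Delete the single 'N' located between the two flanking strings.

    Raises ValueError unless left_flank + 'N' + right_flank occurs exactly once.
    """
    seq = seq.upper()
    motif = (left_flank + "N" + right_flank).upper()
    hits = find_all(seq, motif)
    if len(hits) != 1:
        raise ValueError(f"expected exactly one match for {motif}, found {len(hits)}")
    cut = hits[0] + len(left_flank)  # index of the 'N' to drop
    return seq[:cut] + seq[cut + 1:]


def wrap_fasta(header, seq, width=60):
    """Format a header and sequence as FASTA text with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + lines) + "\n"


# Toy example (hypothetical sequence, not real SARS-CoV-2 data).
toy = "ATGGCTNAAGTTTGA"
fixed = remove_n_between(toy, "GCT", "AAG")
print(wrap_fasta(">sample_fixed", fixed))
```

If the motif is reported as ambiguous, extending the flanks by a few more nucleotides on each side usually makes the match unique. The corrected FASTA still needs to go through NextClade QC again to confirm the result.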
Note: The alignment file helps you decide whether to insert or delete an "N" at a specific position in the consensus file.
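To look at the alignment outside MegaX, the reference position reported by NextClade first has to be converted to an alignment column, because gaps in the reference row shift the numbering. The sketch below assumes the alignment FASTA holds the reference as its first record and the sample consensus as its second; that ordering is an assumption and should be checked against your own muscle.out.fasta. The function names and toy rows are hypothetical.

```python
def parse_alignment(text):
    """Parse aligned FASTA text into a list of (name, row) tuples."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def column_for_ref_position(ref_row, position):
    """Map a 1-based reference position to a 0-based alignment column."""
    seen = 0
    for col, base in enumerate(ref_row):
        if base != "-":  # gaps do not consume a reference position
            seen += 1
            if seen == position:
                return col
    raise ValueError(f"position {position} is beyond the reference length {seen}")


def window(records, col, flank=3):
    """Return each row sliced to +/- flank columns around col."""
    lo = max(0, col - flank)
    return [(name, row[lo:col + flank + 1]) for name, row in records]


# Toy alignment (hypothetical rows): reference first, sample second.
toy_alignment = ">reference\nATGGCT-AAG\n>sample\nATGGCTNAAG\n"
records = parse_alignment(toy_alignment)
col = column_for_ref_position(records[0][1], 7)
for name, segment in window(records, col):
    print(f"{name:>10}  {segment}")
```

In this toy case the reference row shows "-" where the sample row shows "N", which is the pattern described in step 3: the "N" sits opposite a gap and is the candidate for removal.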
We hope this helps you improve the QC of your consensus genome.
And voila! The frame shift is corrected.