Implementation of Bioinformatics Pipeline at AKU

SARS CoV-2 Consensus Genome QC- Frame shift Correction

This page explains how to remove frame shifts (FS) from your SARS-CoV-2 consensus genome FASTA file before submitting it to GISAID.


Last updated 2 years ago


Khan W, Kanwar S

AKU CITRIC Center for Bioinformatics and Computational Biology, Department of Pediatrics and Child Health, Faculty of Health Sciences, Medical College, The Aga Khan University, Karachi-74800, Pakistan.

In NextClade QC, a frame shift (FS) is represented by "F" and shown visually as a red line. The color coding reflects the total number of FS in the sequence.

  • Red indicates the number of FS >2.

  • Yellow indicates the number of FS <=2.

  • Green indicates the number of FS <=1 (acceptable).
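The color rule above can be sketched as a small function. This is only an illustration of the three bullets on this page, not NextClade's own scoring code. Because the yellow and green ranges overlap at one FS, the sketch assumes the narrower (green) range is tested first; the function name `fs_color` is hypothetical.

```python
def fs_color(frameshift_count: int) -> str:
    """Map a frame shift count to the color listed in the bullets above.

    Assumption: green is tested before yellow, so a count of 1 is green
    and only a count of exactly 2 is yellow.
    """
    if frameshift_count < 0:
        raise ValueError("frame shift count cannot be negative")
    if frameshift_count <= 1:
        return "green"
    if frameshift_count <= 2:
        return "yellow"
    return "red"


if __name__ == "__main__":
    for count in range(4):
        print(count, fs_color(count))
```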

Follow the steps below to remove FS from your consensus genome:

1) Identify the nucleotide position where the FS starts. Hovering the mouse over the red line shows the nucleotide position.

2) Open the alignment file of the sample that contains the FS, for example muscle.out.fasta (generated as an intermediate file by the CZ ID pipeline), in the MegaX software and search for that nucleotide position.

3) As the figure above shows, the "N" should be a gap "-". This is what caused the FS. In this case, removing the "N" at this position removes the FS. Search for the region around the "N" in the consensus genome FASTA file.

Hint: To find the exact FS position, copy the flanking nucleotides (3 on either side) around that specific "N" and search for them in the FASTA file.

4) Remove the "N" and run NextClade QC again.
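Steps 3 and 4 can also be done with a short script instead of a text editor. The sketch below is a minimal illustration, not part of the CZ ID pipeline. It assumes the consensus FASTA holds a single record, and all function names are hypothetical. It joins the wrapped sequence lines, finds each "N" sitting between the flanking nucleotides copied from the alignment, and removes the "N" only after checking that the base at that index really is an "N".

```python
def read_single_fasta(path):
    """Read a single-record FASTA file and return (header, sequence)."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    raise ValueError("expected exactly one FASTA record")
                header = line[1:]
            elif line:
                chunks.append(line)
    if header is None:
        raise ValueError("no FASTA header found")
    return header, "".join(chunks)


def find_flanked_n(sequence, left_flank, right_flank):
    """Return the 0-based index of each N sitting between the two flanks."""
    motif = (left_flank + "N" + right_flank).upper()
    haystack = sequence.upper()
    hits = []
    start = haystack.find(motif)
    while start != -1:
        hits.append(start + len(left_flank))
        start = haystack.find(motif, start + 1)
    return hits


def remove_base(sequence, index, expected="N"):
    """Return the sequence without the base at index, after checking it."""
    if not 0 <= index < len(sequence):
        raise IndexError("index outside the sequence")
    if sequence[index].upper() != expected:
        raise ValueError(
            f"expected {expected!r} at index {index}, found {sequence[index]!r}"
        )
    return sequence[:index] + sequence[index + 1:]


def format_fasta(header, sequence, width=60):
    """Format one record as FASTA text wrapped at width characters."""
    lines = [">" + header]
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy sequence for illustration only, not real SARS-CoV-2 data.
    toy = "ACGTTGCANGGATCC"
    hits = find_flanked_n(toy, "GCA", "GGA")
    fixed = remove_base(toy, hits[0])
    print(format_fasta("toy_sample", fixed), end="")
```

Joining the sequence lines before searching matters because FASTA files are usually wrapped, so a plain text search can miss a motif that spans a line break. With only 3 flanking nucleotides on each side the motif may match more than once. If `find_flanked_n` returns several hits, use longer flanks before removing anything.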

Note: The alignment file helps you decide whether an "N" needs to be inserted or deleted at a specific position in the consensus file.
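The same check can be made outside MegaX by converting the position reported by NextClade into an alignment column. The sketch below makes two assumptions that this page does not confirm: the alignment contains a reference row next to the consensus row, and the reported position is a 1-based coordinate on the ungapped reference. The function names are hypothetical.

```python
def alignment_column(aligned_reference, ref_position):
    """Return the 0-based alignment column of a 1-based reference position.

    Gap characters ("-") in the reference row are skipped while counting.
    """
    if ref_position < 1:
        raise ValueError("reference positions are 1-based")
    seen = 0
    for column, base in enumerate(aligned_reference):
        if base != "-":
            seen += 1
            if seen == ref_position:
                return column
    raise ValueError("position lies beyond the end of the reference")


def window(aligned_reference, aligned_consensus, ref_position, flank=5):
    """Return matching slices of both rows around a reference position."""
    column = alignment_column(aligned_reference, ref_position)
    start = max(0, column - flank)
    end = column + flank + 1
    return aligned_reference[start:end], aligned_consensus[start:end]


if __name__ == "__main__":
    # Toy two-row alignment for illustration only: the consensus carries
    # an extra N where the reference row has a gap.
    reference = "ACGT-ACGT"
    consensus = "ACGTNACGT"
    ref_slice, con_slice = window(reference, consensus, ref_position=5, flank=2)
    print(ref_slice)
    print(con_slice)
```

Under these assumptions, an "N" in the consensus row opposite a gap in the reference row is an extra base, so it is a candidate for deletion, as in the example on this page. A gap in the consensus row opposite a reference base would instead point to a position where an "N" may need to be inserted.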

We hope this helps you improve the QC of your consensus genome.

And voila! The frame shift is corrected.

Figure captions (images not reproduced here):

  • In this example, the FS runs from nucleotide position 4148 to 13,468.
  • Visualization of the nucleotide position in MegaX.
  • Identified position of the "N" in consensus.fa.
  • FS removed from the consensus genome.