Exercise 1: Query a public database
What effect does this variant have?
It introduces a premature stop codon that terminates translation (Glu1157X).
What are the characteristics of the FASTA format?
The first line starts with > followed by the identifier of the sequence (a GI number in older GenBank records, an accession code in other databases) and, optionally, a description of the sequence. The sequence itself follows on the subsequent lines. https://en.wikipedia.org/wiki/FASTA_format
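The layout described above can be illustrated with a minimal parser sketch. The function name parse_fasta and the two toy records are made up for this illustration; they are not part of the exercise data.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header line closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are joined; line wrapping carries no meaning.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-record example (identifiers and sequences are invented).
example = """>seq1 first toy sequence
ACGTACGT
ACGT
>seq2 second toy sequence
TTGGCC
"""

records = parse_fasta(example)
for header, sequence in records:
    identifier = header.split()[0]  # text up to the first space
    print(identifier, len(sequence))
```

The number of records equals the number of lines that start with >, so len(records) also gives the number of sequences in a file.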
Exercise 2: Data conversion/translation
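How many lines in a FASTQ file describe one sequence?
4 lines. The first starts with @ and contains the sequence identifier, the second contains the sequence, the third starts with + and may optionally repeat the identifier or description, and the fourth contains the base quality scores (one character per base).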
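The four-line structure can be sketched as follows. This assumes unwrapped records (exactly four lines each) and Phred+33 quality encoding, which is common but not guaranteed for every file; the function name parse_fastq and the two reads are hypothetical.

```python
def parse_fastq(text):
    """Return a list of (identifier, sequence, quality_scores) tuples."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ line count is not a multiple of 4")
    records = []
    for i in range(0, len(lines), 4):
        title, sequence, separator, quality = lines[i:i + 4]
        if not title.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        # Assumption: Phred+33 encoding (ASCII code minus 33).
        scores = [ord(char) - 33 for char in quality]
        records.append((title[1:], sequence, scores))
    return records


# Two invented reads; the second repeats its identifier on the + line.
example = "@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+read2\n!!5I\n"

records = parse_fastq(example)
print(len(records), "sequences")
for identifier, sequence, scores in records:
    print(identifier, sequence, scores)
```

Dividing the number of non-empty lines by four gives the number of reads, as long as the file follows this unwrapped layout.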
How many sequences does the FASTA file contain?
Which organism(s) are in the dataset?
Candidatus Acetothermus autotrophicum DNA, large contig sequence, contig 4
Exercise 3: Compare two sequences to identify mutations
Which mutation causes Sickle cell disease?
The sixth amino acid of the mature hemoglobin beta chain (HBB) is mutated from glutamic acid to valine (Glu6Val; Glu7Val when the initiator methionine is counted).
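As a sketch of how such a substitution can be located, the snippet below translates the first nine codons of a normal and a sickle coding sequence and reports where the proteins differ. The two sequences are assumed to match the start of the HBB coding sequence and its GAG to GTG variant; verify them against the records downloaded in the exercise. The helper name translate is made up for this illustration.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence codon by codon (no stop handling)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


# Assumed first nine codons of the HBB coding sequence (check your own record).
normal_cds = "ATGGTGCACCTGACTCCTGAGGAGAAG"
sickle_cds = "ATGGTGCACCTGACTCCTGTGGAGAAG"

normal_protein = translate(normal_cds)
sickle_protein = translate(sickle_cds)

# Collect positions (1-based, counting the initiator Met) where residues differ.
differences = [
    (index + 1, ref, alt)
    for index, (ref, alt) in enumerate(zip(normal_protein, sickle_protein))
    if ref != alt
]
print(normal_protein, sickle_protein, differences)
```

The reported position is 7 because the initiator methionine is counted; in the traditional numbering of the mature protein this is residue 6.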
Identify the mutation that causes the change in amino acid.
The amino acid change results from an A>T substitution in the second position of the codon (GAG to GTG). The amino acid “X” in the translation is not a variant. It appears because some positions in the HBS sequence are not A, C, G or T but IUPAC ambiguity codes (see IUPAC list). “N” means that the base could not be determined. “Y” represents a “T” or a “C”, so at such a position the base might or might not be the same as in the normal HBB sequence.
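The distinction between a real difference and an ambiguity code can be sketched with a position-by-position comparison. The table below lists the standard IUPAC nucleotide codes; the function names and the short query sequence are hypothetical and are not the actual HBS record.

```python
# IUPAC nucleotide codes and the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def classify_position(ref_base, query_base):
    """Return 'match', 'mismatch' or 'ambiguous' for one aligned position."""
    ref_set = set(IUPAC[ref_base.upper()])
    query_set = set(IUPAC[query_base.upper()])
    if len(ref_set) > 1 or len(query_set) > 1:
        # An ambiguity code is only a possible match if the sets overlap.
        return "ambiguous" if ref_set & query_set else "mismatch"
    return "match" if ref_set == query_set else "mismatch"


def compare(reference, query):
    """List (1-based position, ref base, query base, status) for non-matches."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    report = []
    for position, (ref, qry) in enumerate(zip(reference, query), start=1):
        status = classify_position(ref, qry)
        if status != "match":
            report.append((position, ref, qry, status))
    return report


# Invented example: one definite A>T change plus two ambiguity codes.
report = compare("CCTGAGGAG", "CCYGTGNAG")
for entry in report:
    print(entry)
```

Only the position reported as a mismatch is a candidate mutation; the ambiguous positions cannot be called either way from this sequence alone.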
Exercise 4: Pick primers to screen patients for the HBS mutation
Does the resulting primer set include the region with the mutation?
Yes! An easy way to check is to search for the start codon (ATG) in your browser and verify that the codon for Glu6 (the seventh codon when the ATG is counted as codon 1) lies within the region amplified by the primer set.
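The same check can be sketched in code. Everything here is a toy setup: the template, both primers and the helper names are invented, and the snippet assumes that the first ATG in the template is the start codon and that no intron lies between it and the codon of interest.

```python
def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(sequence))


def codon_between_primers(template, forward, reverse, codon_number):
    """Check whether a codon lies between the two primer binding sites."""
    forward_start = template.find(forward)
    reverse_start = template.find(reverse_complement(reverse))
    start_codon = template.find("ATG")  # assumption: first ATG is the start
    if min(forward_start, reverse_start, start_codon) < 0:
        raise ValueError("primer or start codon not found in template")
    codon_start = start_codon + 3 * (codon_number - 1)
    codon_end = codon_start + 3
    # The codon must sit after the forward primer and before the reverse site.
    inner_start = forward_start + len(forward)
    return inner_start <= codon_start and codon_end <= reverse_start


# Toy template: invented flanks around a short coding start.
template = "CCAGTTACGGATCA" + "ATGGTGCACCTGACTCCTGAGGAGAAG" + "TCTGCCGTTACTGCCCTG"
forward_primer = "CAGTTACGGA"   # hypothetical, binds near the 5' end
reverse_primer = "CAGGGCAGTA"   # hypothetical, binds the 3' end (reverse strand)

# Codon 7 counting from the ATG corresponds to Glu6 of the mature protein.
covered = codon_between_primers(template, forward_primer, reverse_primer, 7)
print(covered)
```

Requiring the codon to fall between the primers, rather than inside a primer binding site, reflects the usual wish to read the variant position from the amplified product itself.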
What is the rs-id of the variant corresponding with the HBS phenotype?
rs334. In the dbSNP database you will find the frequency of the variant in different populations.
Exercise 5: Gene finding
What is wrong with this transcript?
The mutation at position 3713 (G>T) causes a premature translational stop. See the first exercise.
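A G>T change can only turn a glutamic acid codon into a stop codon at the first codon position (GAG to TAG, or GAA to TAA). The sketch below shows the effect on a short invented reading frame; the sequence and the helper name are hypothetical and unrelated to the actual transcript.

```python
from itertools import product

# Standard genetic code, built in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_until_stop(cds):
    """Translate codon by codon and stop at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        protein.append(amino_acid)
        if amino_acid == "*":
            break
    return "".join(protein)


# Invented reading frame: ATG GCT GAG CTG AAA TAA.
normal = "ATGGCTGAGCTGAAATAA"
# G>T at the first base of the GAG codon (0-based index 6) gives TAG.
variant = normal[:6] + "T" + normal[7:]

normal_protein = translate_until_stop(normal)
variant_protein = translate_until_stop(variant)
print(normal_protein, variant_protein)
```

The variant protein ends at the new stop codon, so every residue downstream of the mutation is lost.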