THE SpTRANSFORMER GENE FAMILY HAS COMPLEX PATTERNS OF REPEATS, DUPLICATIONS, AND SHARED SEQUENCES CONSISTENT WITH GENOMIC INSTABILITY
Megan Barela-Hudgell, George Washington University
Megan A. Barela Hudgell, Matan Oren & L. Courtney Smith
Department of Biological Sciences, George Washington University, Suite 6000, Science and Engineering Hall, 800 22nd St NW, Washington DC, 20052, USA
The SpTransformer gene family encodes proteins with innate immune functions in the purple sea urchin, Strongylocentrotus purpuratus. These genes have two exons and a single diverse intron, and are bracketed on both sides by GA and/or GAT short tandem repeats (STRs). Three clusters containing a total of 15 genes are present in the genome sequence of S. purpuratus. In this work, an in-depth analysis was conducted to understand the sequence complexities of this gene family and its genomic structure, and to derive a hypothesis for the formation of the gene clusters. The results allowed accurate naming of each gene and identification of the corresponding intron category, the positions of stop codons, and relationships among the genes that were used to infer their evolutionary relatedness. All genes share sequence similarity, including the flanking regions, from the 5’ STRs to the 3’ STRs. The 5’ ends of 11 of the 15 genes have two to three short conserved regions of similarity located 5’ to the GA STRs. These regions may indicate short regulatory sites at the 5’ end of each gene. Two of the clusters are thought to be allelic, and they differ in gene copy number and in a region of ~11,000 bp that shows complete sequence dissimilarity. The complexity of this gene family suggests that regions with large numbers of repeats, duplications, shared sequences, and tight clustering could result from, or be the basis of, genomic instability. This instability may underpin the fast diversification rate that is commonly associated with immune genes.