Poster Presentation International Congress on Neuronal Ceroid Lipofuscinoses 2025

A user-friendly web interface to investigate transcription complexity of any gene of interest: A case study of all NCL genes (#32)

Haoyu Zhang 1, Christopher Minnis 1, Emil Gustavsson 1, Mina Ryten 1, Sara Mole 1
  1. University College London, London, United Kingdom

Most existing transcript annotations are based on short-read RNA sequencing data, so our understanding of transcript diversity is incomplete, both in the full variety of transcripts and in the accuracy of their structures. Cutting-edge long-read RNA sequencing technology sequences transcripts in full length, giving higher accuracy than traditional short-read technology. However, not every laboratory researcher has the expertise to process long-read RNA sequencing data. We therefore aim to build a user-friendly web interface that allows users to explore the transcript complexity of their genes of interest based on ENCODE PacBio long-read RNA sequencing data. Here, we apply this to the NCL genes. Many novel transcripts are detected across all NCL genes, and some genes (GRN, ATP13A2 and CLN3) have more than 100 novel transcripts. Transcript usage is the fraction of a gene’s total reads that map to a specific transcript. Most NCL genes, with the exception of PPT1 and CTSD, do not have a dominant transcript, that is, one whose median usage exceeds 85%. Alternatively spliced transcripts are detected for all NCL genes, including alternative 5’/3’ splice sites, exon skipping and retained introns. For open reading frames (ORFs) predicted by ORFik, ORF usage of NCL genes varies across ENCODE-defined organs. For example, an ORF encoding a 168-amino acid (aa) CLN6 protein isoform has usage of ~55% in brain and ~5% in blood. Similarly, an ORF encoding a 147 aa KCTD7 protein isoform has usage of ~30% in brain and ~5% in blood. Together, these results offer new insights for NCL research, providing evidence that a single gene may generate multiple protein products. This may aid understanding of disease pathogenesis and highlights the importance of comprehensive gene annotation for the future development of personalised therapeutics.
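
The transcript usage measure and the 85% dominance criterion described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that usage is computed per sample and that the median is taken across samples, which the abstract does not specify. The function names, sample names, transcript names and read counts are all hypothetical and chosen only for illustration.

```python
from statistics import median


def transcript_usage(read_counts):
    """Per-sample usage: a transcript's reads divided by the gene's total reads.

    read_counts maps sample -> {transcript: read count} for a single gene.
    Samples with zero reads for the gene are skipped.
    """
    usage = {}
    for sample, counts in read_counts.items():
        total = sum(counts.values())
        if total == 0:
            continue
        usage[sample] = {tx: n / total for tx, n in counts.items()}
    return usage


def median_usage(usage):
    """Median usage of each transcript across samples (assumed aggregation).

    A transcript that is absent from a sample counts as usage 0.0 there.
    """
    transcripts = sorted({tx for per_sample in usage.values() for tx in per_sample})
    return {
        tx: median(per_sample.get(tx, 0.0) for per_sample in usage.values())
        for tx in transcripts
    }


def dominant_transcript(medians, threshold=0.85):
    """Return the transcript whose median usage exceeds the threshold, else None."""
    if not medians:
        return None
    best = max(medians, key=medians.get)
    return best if medians[best] > threshold else None


# Hypothetical read counts for one gene in three samples (illustrative only).
example_counts = {
    "sample_1": {"tx_A": 90, "tx_B": 10},
    "sample_2": {"tx_A": 88, "tx_B": 12},
    "sample_3": {"tx_A": 95, "tx_B": 5},
}

example_medians = median_usage(transcript_usage(example_counts))
example_dominant = dominant_transcript(example_medians)
print(example_medians, example_dominant)
```

Under these assumptions, a gene with a dominant transcript returns that transcript's name, while a gene whose reads are spread across several transcripts returns `None`. The same calculation would apply to ORF usage if reads were grouped by predicted ORF instead of by transcript.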