publications
Publications by category in reverse chronological order. Generated by jekyll-scholar.
2025
- Inverted Alu repeats in loop-out exon skipping across hominoid evolution. Danielle Denisko, Jeonghyeon Kim, Jayoung Ku, and 2 more authors. bioRxiv [Preprint], 2025
Background: Changes in RNA splicing over the course of evolution have profoundly diversified the functional landscape of the human genome. While DNA sequences proximal to intron-exon junctions are known to be critical for RNA splicing, the impact of distal intronic sequences remains underexplored. Emerging evidence suggests that inverted pairs of intronic Alu elements can promote exon skipping by forming RNA stem-loop structures. However, their prevalence and influence throughout evolution remain unknown. Results: Here, we present a systematic analysis of inverted Alu pairs across the human genome to assess their impact on exon skipping through predicted RNA stem-loop formation and their relevance to hominoid evolution. We found that inverted Alu pairs, particularly pairs of AluY-AluSx1 and AluSz-AluSx, are enriched in the flanking regions of skippable exons genome-wide and are predicted to form stable stem-loop structures. Exons defined by weak 3’ acceptor and strong 5’ donor splice sites appear especially prone to this skipping mechanism. Through comparative genome analysis across nine primate species, we identified 67,126 hominoid-specific Alu insertions, primarily from AluY and AluS subfamilies, which form inverted pairs enriched across skippable exons in genes of ubiquitination-related pathways. Experimental validation of exon skipping among several hominoid-specific inverted Alu pairs further reinforced their potential evolutionary significance. Conclusion: This work extends our current knowledge of the roles of RNA secondary structure formed by inverted Alu pairs and details a newly emerging mechanism through which transposable elements have contributed to genomic innovation across hominoid evolution at the transcriptomic level.
@article{IRAlu, author = {Denisko, Danielle and Kim, Jeonghyeon and Ku, Jayoung and Zhao, Boxun and Lee, Eunjung Alice}, title = {{Inverted Alu repeats in loop-out exon skipping across hominoid evolution}}, journal = {bioRxiv [Preprint]}, year = {2025}, doi = {10.1101/2025.03.07.642063}, url = {https://doi.org/10.1101/2025.03.07.642063}, eprint = {https://www.biorxiv.org/content/10.1101/2025.03.07.642063v1.full.pdf}, }
- Genomic data processing with GenomeFlow. Junseok Park, Eduardo A Maury, Changhoon Oh, and 3 more authors. BMC Bioinformatics, 2025
Advances in genome sequencing technologies generate massive amounts of sequence data that are increasingly analyzed and shared through public repositories. On-demand infrastructure services on cloud computing platforms enable the processing of such large-scale genomic sequence data in distributed processing environments with a significant reduction in analysis time. However, parallel processing on cloud computing platforms presents many challenges to researchers, even skillful bioinformaticians. In particular, it is difficult to design a computing architecture optimized to reduce the cost of computing and disk storage as genomic data analysis pipelines often employ many heterogeneous tools with different resource requirements. To address these issues, we developed GenomeFlow, a tool for automated development of computing architecture and resource optimization on Google Cloud Platform, which allows users to process a large number of samples at minimal cost. We outline multiple use cases of GenomeFlow demonstrating its utility to significantly reduce computing time and cost associated with analyzing genomic and transcriptomic data from hundreds to tens of thousands of samples from several consortia. Here, we describe a step-by-step protocol on how to use GenomeFlow for a common genomic data processing task. We introduce this example protocol geared toward a bioinformatician with little experience in cloud computing and large data processing and estimate that it will take <1 hour to execute.
@article{Park2025, author = {Park, Junseok and Maury, Eduardo A and Oh, Changhoon and Donghoon, Shin and Denisko, Danielle and Lee, Eunjung Alice}, title = {Genomic data processing with GenomeFlow}, journal = {BMC Bioinformatics}, year = {2025}, doi = {10.48550/arXiv.2503.15377}, url = {https://doi.org/10.48550/arXiv.2503.15377}, }
2024
- Chromosomal rearrangements and instability caused by the LINE-1 retrotransposon. Carlos Mendez-Dorantes, Xi Zeng, Jennifer A. Karlow, and 7 more authors. bioRxiv [Preprint], 2024
LINE-1 (L1) retrotransposition is widespread in many cancers, especially those with a high burden of chromosomal rearrangements. However, whether and to what degree L1 activity directly impacts genome integrity is unclear. Here, we apply whole-genome sequencing to experimental models of L1 expression to comprehensively define the spectrum of genomic changes caused by L1. We provide definitive evidence that L1 expression frequently and directly causes both local and long-range chromosomal rearrangements, small and large segmental copy-number alterations, and subclonal copy-number heterogeneity due to ongoing chromosomal instability. Mechanistically, all these alterations arise from DNA double-strand breaks (DSBs) generated by L1-encoded ORF2p. The processing of ORF2p-generated DSB ends prior to their ligation can produce diverse rearrangements of the target sequences. Ligation between DSB ends generated at distal loci can generate either stable chromosomes or unstable dicentric, acentric, or ring chromosomes that undergo subsequent evolution through breakage-fusion bridge cycles or DNA fragmentation. Together, these findings suggest L1 is a potent mutagenic force capable of driving genome evolution beyond simple insertions.
@article{GenIns, author = {Mendez-Dorantes, Carlos and Zeng, Xi and Karlow, Jennifer A. and Schofield, Philip and Turner, Serfina and Kalinowski, Jupiter and Denisko, Danielle and Lee, Eunjung Alice and Burns, Kathleen H. and Zhang, Cheng-Zhong}, title = {{Chromosomal rearrangements and instability caused by the LINE-1 retrotransposon}}, journal = {bioRxiv [Preprint]}, year = {2024}, doi = {10.1101/2024.12.14.628481}, url = {https://doi.org/10.1101/2024.12.14.628481}, }
- Human cytomegalovirus harnesses host L1 retrotransposon for efficient replication. Sung-Yeon Hwang*, Hyewon Kim*, Danielle Denisko*, and 9 more authors. Nature Communications, 2024
Genetic parasites, including viruses and transposons, exploit components from the host for their own replication. However, little is known about virus-transposon interactions within host cells. Here, we discover a strategy where human cytomegalovirus (HCMV) hijacks L1 retrotransposon encoded protein during its replication cycle. HCMV infection upregulates L1 expression by enhancing both the expression of L1-activating transcription factors, YY1 and RUNX3, and the chromatin accessibility of L1 promoter regions. Increased L1 expression in turn promotes HCMV replicative fitness. Affinity proteomics reveals UL44, HCMV DNA polymerase subunit, as the most abundant viral binding protein of the L1 ribonucleoprotein (RNP) complex. UL44 directly interacts with L1 ORF2p, inducing DNA damage responses in replicating HCMV compartments. While increased L1-induced mutagenesis is not observed in HCMV for genetic adaptation, the interplay between UL44 and ORF2p accelerates viral DNA replication by resolving stalled replication forks. Our findings shed light on how HCMV exploits host retrotransposons for enhanced viral fitness.
@article{HCMV, author = {Hwang*, Sung-Yeon and Kim*, Hyewon and Denisko*, Danielle and Zhao*, Boxun and Lee, Dohoon and Jeong, Jiseok and Kim, Jinuk and Park, Kiwon and Choi, Hee-Jung and Kim, Sun and Lee, Eunjung Alice and Ahn, Kwangseog}, title = {Human cytomegalovirus harnesses host L1 retrotransposon for efficient replication}, year = {2024}, journal = {Nature Communications}, doi = {10.1038/s41467-024-51961-y}, url = {https://doi.org/10.1038/s41467-024-51961-y}, }
2023
- Motif elucidation in ChIP-seq datasets with a knockout control. Danielle Denisko, Coby Viner, and Michael M Hoffman. Bioinformatics Advances, 2023
Chromatin immunoprecipitation-sequencing is widely used to find transcription factor binding sites, but suffers from various sources of noise. Knocking out the target factor mitigates noise by acting as a negative control. Paired wild-type and knockout (KO) experiments can generate improved motifs but require optimal differential analysis. We introduce peaKO, a computational method to automatically optimize motif analyses with KO controls, which we compare to two other methods. PeaKO often improves elucidation of the target factor and highlights the benefits of KO controls, which far outperform input controls. PeaKO is freely available at https://peako.hoffmanlab.org. Contact: michael.hoffman@utoronto.ca
@article{peaKO, author = {Denisko, Danielle and Viner, Coby and Hoffman, Michael M}, title = {{Motif elucidation in ChIP-seq datasets with a knockout control}}, journal = {Bioinformatics Advances}, volume = {3}, number = {1}, pages = {vbad031}, year = {2023}, issn = {2635-0041}, doi = {10.1093/bioadv/vbad031}, url = {https://doi.org/10.1093/bioadv/vbad031}, eprint = {https://academic.oup.com/bioinformaticsadvances/article-pdf/3/1/vbad031/49761827/vbad031.pdf}, }
2022
- Assessing and assuring interoperability of a genomics file format. Yi Nian Niu, Eric G Roberts, Danielle Denisko, and 1 more author. Bioinformatics, 2022
Bioinformatics software tools operate largely through the use of specialized genomics file formats. Often these formats lack formal specification, making it difficult or impossible for the creators of these tools to robustly test them for correct handling of input and output. This causes problems in interoperability between different tools that, at best, wastes time and frustrates users. At worst, interoperability issues could lead to undetected errors in scientific results. We developed a new verification system, Acidbio, which tests for correct behavior in bioinformatics software packages. We crafted tests to unify correct behavior when tools encounter various edge cases (potentially unexpected inputs that exemplify the limits of the format). To analyze the performance of existing software, we tested the input validation of 80 Bioconda packages that parsed the Browser Extensible Data (BED) format. We also used a fuzzing approach to automatically perform additional testing. Of 80 software packages examined, 75 achieved less than 70% correctness on our test suite. We categorized multiple root causes for the poor performance of different types of software. Fuzzing detected other errors that the manually designed test suite could not. We also created a badge system that developers can use to indicate more precisely which BED variants their software accepts and to advertise the software’s performance on the test suite. Acidbio is available at https://github.com/hoffmangroup/acidbio. Supplementary data are available at Bioinformatics online.
@article{BED, author = {Niu, Yi Nian and Roberts, Eric G and Denisko, Danielle and Hoffman, Michael M}, title = {{Assessing and assuring interoperability of a genomics file format}}, journal = {Bioinformatics}, volume = {38}, number = {13}, pages = {3327-3336}, year = {2022}, issn = {1367-4803}, doi = {10.1093/bioinformatics/btac327}, url = {https://doi.org/10.1093/bioinformatics/btac327}, eprint = {https://academic.oup.com/bioinformatics/article-pdf/38/13/3327/49884326/btac327.pdf}, }
2021
- GA4GH: International policies and standards for data sharing across genomic research and healthcare. Heidi L. Rehm, Angela J.H. Page, Lindsay Smith, and 199 more authors. Cell Genomics, 2021
The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution. We describe the GA4GH organization, which is fueled by the development efforts of eight Work Streams and informed by the needs of 24 Driver Projects and other key stakeholders. We present the GA4GH suite of secure, interoperable technical standards and policy frameworks and review the current status of standards, their relevance to key domains of research and clinical care, and future plans of GA4GH. Broad international participation in building, adopting, and deploying GA4GH standards and frameworks will catalyze an unprecedented effort in data sharing that will be critical to advancing genomic medicine and ensuring that all populations can access its benefits.
@article{GA4GH, title = {GA4GH: International policies and standards for data sharing across genomic research and healthcare}, author = {Rehm, {Heidi L.} and Page, {Angela J.H.} and Smith, Lindsay and Adams, {Jeremy B.} and Alterovitz, Gil and Babb, {Lawrence J.} and Barkley, {Maxmillian P.} and Baudis, Michael and Beauvais, {Michael J.S.} and Beck, Tim and Beckmann, {Jacques S.} and Beltran, Sergi and Bernick, David and Bernier, Alexander and Bonfield, {James K.} and Boughtwood, {Tiffany F.} and Bourque, Guillaume and Bowers, {Sarion R.} and Brookes, {Anthony J.} and Brudno, Michael and Brush, {Matthew H.} and Bujold, David and Burdett, Tony and Buske, {Orion J.} and Cabili, {Moran N.} and Cameron, {Daniel L.} and Carroll, {Robert J.} and Casas-Silva, Esmeralda and Chakravarty, Debyani and Chaudhari, {Bimal P.} and Chen, {Shu Hui} and Cherry, {J. Michael} and Chung, Justina and Cline, Melissa and Clissold, {Hayley L.} and Cook-Deegan, {Robert M.} and Courtot, M{\'e}lanie and Cunningham, Fiona and Cupak, Miro and Davies, {Robert M.} and Denisko, Danielle and Doerr, {Megan J.} and Dolman, {Lena I.} and Dove, {Edward S.} and Dursi, {L. 
Jonathan} and Dyke, {Stephanie O.M.} and Eddy, {James A.} and Eilbeck, Karen and Ellrott, {Kyle P.} and Fairley, Susan and Fakhro, {Khalid A.} and Firth, {Helen V.} and Fitzsimons, {Michael S.} and Fiume, Marc and Flicek, Paul and Fore, {Ian M.} and Freeberg, {Mallory A.} and Freimuth, {Robert R.} and Fromont, {Lauren A.} and Fuerth, Jonathan and Gaff, {Clara L.} and Gan, Weiniu and Ghanaim, {Elena M.} and Glazer, David and Green, {Robert C.} and Griffith, Malachi and Griffith, {Obi L.} and Grossman, {Robert L.} and Groza, Tudor and {Guidry Auvil}, {Jaime M.} and Guig{\'o}, Roderic and Gupta, Dipayan and Haendel, {Melissa A.} and Hamosh, Ada and Hansen, {David P.} and Hart, {Reece K.} and Hartley, {Dean Mitchell} and Haussler, David and Hendricks-Sturrup, {Rachele M.} and Ho, {Calvin W.L.} and Hobb, {Ashley E.} and Hoffman, {Michael M.} and Hofmann, {Oliver M.} and Holub, Petr and Hsu, {Jacob Shujui} and Hubaux, Jean-Pierre and Hunt, {Sarah E.} and Husami, Ammar and Jacobsen, {Julius O.} and Jamuar, {Saumya S.} and Janes, {Elizabeth L.} and Jeanson, Francis and Jen{\'e}, Aina and Johns, {Amber L.} and Joly, Yann and Jones, {Steven J.M.} and Kanitz, Alexander and Kato, Kazuto and Keane, {Thomas M.} and Kekesi-Lafrance, Kristina and Kelleher, Jerome and Kerry, Giselle and Khor, Seik-Soon and Knoppers, {Bartha M.} and Konopko, {Melissa A.} and Kosaki, Kenjiro and Kuba, Martin and Lawson, Jonathan and Leinonen, Rasko and Li, Stephanie and Lin, {Michael F.} and Linden, Mikael and Liu, Xianglin and Liyanage, {Isuru Udara} and Lopez, Javier and Lucassen, {Anneke M.} and Lukowski, Michael and Mann, {Alice L.} and Marshall, John and Mattioni, Michele and Metke-Jimenez, Alejandro and Middleton, Anna and Milne, {Richard J.} and Moln{\'a}r-G{\'a}bor, Fruzsina and Mulder, Nicola and Munoz-Torres, {Monica C.} and Nag, Rishi and Nakagawa, Hidewaki and Nasir, Jamal and Navarro, Arcadi and Nelson, {Tristan H.} and Niewielska, Ania and Nisselle, Amy and Niu, Jeffrey and Nyr{\"o}nen, 
{Tommi H.} and O{\textquoteright}Connor, {Brian D.} and Oesterle, Sabine and Ogishima, Soichi and {Ota Wang}, Vivian and Paglione, {Laura A.D.} and Palumbo, Emilio and Parkinson, {Helen E.} and Philippakis, {Anthony A.} and Pizarro, {Angel D.} and Prlic, Andreas and Rambla, Jordi and Rendon, Augusto and Rider, {Renee A.} and Robinson, {Peter N.} and Rodarmer, {Kurt W.} and Rodriguez, {Laura Lyman} and Rubin, {Alan F.} and Rueda, Manuel and Rushton, {Gregory A.} and Ryan, {Rosalyn S.} and Saunders, {Gary I.} and Schuilenburg, Helen and Schwede, Torsten and Scollen, Serena and Senf, Alexander and Sheffield, {Nathan C.} and Skantharajah, Neerjah and Smith, {Albert V.} and Sofia, {Heidi J.} and Spalding, Dylan and Spurdle, {Amanda B.} and Stark, Zornitza and Stein, {Lincoln D.} and Suematsu, Makoto and Tan, Patrick and Tedds, {Jonathan A.} and Thomson, {Alastair A.} and Thorogood, Adrian and Tickle, {Timothy L.} and Tokunaga, Katsushi and T{\"o}rnroos, Juha and Torrents, David and Upchurch, Sean and Valencia, Alfonso and Guimera, {Roman Valls} and Vamathevan, Jessica and Varma, Susheel and Vears, {Danya F.} and Viner, Coby and Voisin, Craig and Wagner, {Alex H.} and Wallace, {Susan E.} and Walsh, {Brian P.} and Williams, {Marc S.} and Winkler, {Eva C.} and Wold, {Barbara J.} and Wood, {Grant M.} and Woolley, {J. Patrick} and Yamasaki, Chisato and Yates, {Andrew D.} and Yung, {Christina K.} and Zass, {Lyndon J.} and Zaytseva, Ksenia and Zhang, Junjun and Goodhand, Peter and North, Kathryn and Birney, Ewan}, year = {2021}, doi = {10.1016/j.xgen.2021.100029}, volume = {1}, journal = {Cell Genomics}, issn = {2666-979X}, publisher = {Cell Press}, }
2018
- Classification and interaction in random forests. Danielle Denisko and Michael M Hoffman. Proceedings of the National Academy of Sciences, 2018
@article{Denisko2018, author = {Denisko, Danielle and Hoffman, Michael M}, doi = {10.1073/pnas.1800256115}, issn = {0027-8424}, journal = {Proceedings of the National Academy of Sciences}, number = {8}, pages = {1690--1692}, pmid = {29440440}, title = {{Classification and interaction in random forests}}, url = {http://www.pnas.org/content/115/8/1690.abstract}, volume = {115}, year = {2018}, }