publications
Publications by categories in reversed chronological order. Generated by jekyll-scholar.
2024
- Genomic data processing with GenomeFlow. Junseok Park, Eduardo A Maury, Changhoon Oh, and 3 more authors. BMC Bioinformatics, 2024
Advances in genome sequencing technologies generate massive amounts of sequence data that are increasingly analyzed and shared through public repositories. On-demand infrastructure services on cloud computing platforms enable the processing of such large-scale genomic sequence data in distributed processing environments with a significant reduction in analysis time. However, parallel processing on cloud computing platforms presents many challenges to researchers, even skillful bioinformaticians. In particular, it is difficult to design a computing architecture optimized to reduce the cost of computing and disk storage as genomic data analysis pipelines often employ many heterogeneous tools with different resource requirements. To address these issues, we developed GenomeFlow, a tool for automated development of computing architecture and resource optimization on Google Cloud Platform, which allows users to process a large number of samples at minimal cost. We outline multiple use cases of GenomeFlow demonstrating its utility to significantly reduce computing time and cost associated with analyzing genomic and transcriptomic data from hundreds to tens of thousands of samples from several consortia. Here, we describe a step-by-step protocol on how to use GenomeFlow for a common genomic data processing task. We introduce this example protocol geared toward a bioinformatician with little experience in cloud computing and large data processing and estimate that it will take <1 hour to execute.
@article{Park2024, author = {Park, Junseok and Maury, Eduardo A and Oh, Changhoon and Shin, Donghoon and Denisko, Danielle and Lee, Eunjung Alice}, title = {Genomic data processing with GenomeFlow}, journal = {BMC Bioinformatics}, year = {2024}, }
- Human cytomegalovirus harnesses host L1 retrotransposon for efficient replication. Sung-Yeon Hwang*, Hyewon Kim*, Danielle Denisko*, and 9 more authors. 2024
Genetic parasites, including viruses and transposons, exploit components from the host for their own replication. However, little is known about virus-transposon interactions within host cells. Here, we discover a strategy where human cytomegalovirus (HCMV) hijacks L1 retrotransposon-encoded protein during its replication cycle. HCMV infection upregulates L1 expression by enhancing both the expression of L1-activating transcription factors, YY1 and RUNX3, and the chromatin accessibility of L1 promoter regions. Increased L1 expression in turn promotes HCMV replicative fitness. Affinity proteomics reveals UL44, HCMV DNA polymerase subunit, as the most abundant viral binding protein of the L1 ribonucleoprotein (RNP) complex. UL44 directly interacts with L1 ORF2p, inducing DNA damage responses in replicating HCMV compartments. While increased L1-induced mutagenesis is not observed in HCMV for genetic adaptation, the interplay between UL44 and ORF2p accelerates viral DNA replication by resolving stalled replication forks. Our findings shed light on how HCMV exploits host retrotransposons for enhanced viral fitness.
@article{HCMV, author = {Hwang*, Sung-Yeon and Kim*, Hyewon and Denisko*, Danielle and Zhao*, Boxun and Lee, Dohoon and Jeong, Jiseok and Kim, Jinuk and Park, Kiwon and Choi, Hee-Jung and Kim, Sun and Lee, Eunjung Alice and Ahn, Kwangseog}, title = {Human cytomegalovirus harnesses host L1 retrotransposon for efficient replication}, year = {2024}, }
2023
- Motif elucidation in ChIP-seq datasets with a knockout control. Danielle Denisko, Coby Viner, and Michael M Hoffman. Bioinformatics Advances, 2023
Chromatin immunoprecipitation-sequencing is widely used to find transcription factor binding sites, but suffers from various sources of noise. Knocking out the target factor mitigates noise by acting as a negative control. Paired wild-type and knockout (KO) experiments can generate improved motifs but require optimal differential analysis. We introduce peaKO, a computational method to automatically optimize motif analyses with KO controls, which we compare to two other methods. PeaKO often improves elucidation of the target factor and highlights the benefits of KO controls, which far outperform input controls. PeaKO is freely available at https://peako.hoffmanlab.org. Contact: michael.hoffman@utoronto.ca
@article{peaKO, author = {Denisko, Danielle and Viner, Coby and Hoffman, Michael M}, title = {{Motif elucidation in ChIP-seq datasets with a knockout control}}, journal = {Bioinformatics Advances}, volume = {3}, number = {1}, pages = {vbad031}, year = {2023}, issn = {2635-0041}, doi = {10.1093/bioadv/vbad031}, url = {https://doi.org/10.1093/bioadv/vbad031}, eprint = {https://academic.oup.com/bioinformaticsadvances/article-pdf/3/1/vbad031/49761827/vbad031.pdf}, }
2022
- Assessing and assuring interoperability of a genomics file format. Yi Nian Niu, Eric G Roberts, Danielle Denisko, and 1 more author. Bioinformatics, 2022
Bioinformatics software tools operate largely through the use of specialized genomics file formats. Often these formats lack formal specification, making it difficult or impossible for the creators of these tools to robustly test them for correct handling of input and output. This causes problems in interoperability between different tools that, at best, wastes time and frustrates users. At worst, interoperability issues could lead to undetected errors in scientific results. We developed a new verification system, Acidbio, which tests for correct behavior in bioinformatics software packages. We crafted tests to unify correct behavior when tools encounter various edge cases: potentially unexpected inputs that exemplify the limits of the format. To analyze the performance of existing software, we tested the input validation of 80 Bioconda packages that parsed the Browser Extensible Data (BED) format. We also used a fuzzing approach to automatically perform additional testing. Of 80 software packages examined, 75 achieved less than 70% correctness on our test suite. We categorized multiple root causes for the poor performance of different types of software. Fuzzing detected other errors that the manually designed test suite could not. We also created a badge system that developers can use to indicate more precisely which BED variants their software accepts and to advertise the software’s performance on the test suite. Acidbio is available at https://github.com/hoffmangroup/acidbio. Supplementary data are available at Bioinformatics online.
@article{BED, author = {Niu, Yi Nian and Roberts, Eric G and Denisko, Danielle and Hoffman, Michael M}, title = {{Assessing and assuring interoperability of a genomics file format}}, journal = {Bioinformatics}, volume = {38}, number = {13}, pages = {3327-3336}, year = {2022}, issn = {1367-4803}, doi = {10.1093/bioinformatics/btac327}, url = {https://doi.org/10.1093/bioinformatics/btac327}, eprint = {https://academic.oup.com/bioinformatics/article-pdf/38/13/3327/49884326/btac327.pdf}, }
2021
- GA4GH: International policies and standards for data sharing across genomic research and healthcare. Heidi L. Rehm, Angela J.H. Page, Lindsay Smith, and 199 more authors. Cell Genomics, 2021
The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution. We describe the GA4GH organization, which is fueled by the development efforts of eight Work Streams and informed by the needs of 24 Driver Projects and other key stakeholders. We present the GA4GH suite of secure, interoperable technical standards and policy frameworks and review the current status of standards, their relevance to key domains of research and clinical care, and future plans of GA4GH. Broad international participation in building, adopting, and deploying GA4GH standards and frameworks will catalyze an unprecedented effort in data sharing that will be critical to advancing genomic medicine and ensuring that all populations can access its benefits.
@article{GA4GH, title = {GA4GH: International policies and standards for data sharing across genomic research and healthcare}, author = {Rehm, {Heidi L.} and Page, {Angela J.H.} and Smith, Lindsay and Adams, {Jeremy B.} and Alterovitz, Gil and Babb, {Lawrence J.} and Barkley, {Maxmillian P.} and Baudis, Michael and Beauvais, {Michael J.S.} and Beck, Tim and Beckmann, {Jacques S.} and Beltran, Sergi and Bernick, David and Bernier, Alexander and Bonfield, {James K.} and Boughtwood, {Tiffany F.} and Bourque, Guillaume and Bowers, {Sarion R.} and Brookes, {Anthony J.} and Brudno, Michael and Brush, {Matthew H.} and Bujold, David and Burdett, Tony and Buske, {Orion J.} and Cabili, {Moran N.} and Cameron, {Daniel L.} and Carroll, {Robert J.} and Casas-Silva, Esmeralda and Chakravarty, Debyani and Chaudhari, {Bimal P.} and Chen, {Shu Hui} and Cherry, {J. Michael} and Chung, Justina and Cline, Melissa and Clissold, {Hayley L.} and Cook-Deegan, {Robert M.} and Courtot, M{\'e}lanie and Cunningham, Fiona and Cupak, Miro and Davies, {Robert M.} and Denisko, Danielle and Doerr, {Megan J.} and Dolman, {Lena I.} and Dove, {Edward S.} and Dursi, {L. 
Jonathan} and Dyke, {Stephanie O.M.} and Eddy, {James A.} and Eilbeck, Karen and Ellrott, {Kyle P.} and Fairley, Susan and Fakhro, {Khalid A.} and Firth, {Helen V.} and Fitzsimons, {Michael S.} and Fiume, Marc and Flicek, Paul and Fore, {Ian M.} and Freeberg, {Mallory A.} and Freimuth, {Robert R.} and Fromont, {Lauren A.} and Fuerth, Jonathan and Gaff, {Clara L.} and Gan, Weiniu and Ghanaim, {Elena M.} and Glazer, David and Green, {Robert C.} and Griffith, Malachi and Griffith, {Obi L.} and Grossman, {Robert L.} and Groza, Tudor and {Guidry Auvil}, {Jaime M.} and Guig{\'o}, Roderic and Gupta, Dipayan and Haendel, {Melissa A.} and Hamosh, Ada and Hansen, {David P.} and Hart, {Reece K.} and Hartley, {Dean Mitchell} and Haussler, David and Hendricks-Sturrup, {Rachele M.} and Ho, {Calvin W.L.} and Hobb, {Ashley E.} and Hoffman, {Michael M.} and Hofmann, {Oliver M.} and Holub, Petr and Hsu, {Jacob Shujui} and Hubaux, Jean-Pierre and Hunt, {Sarah E.} and Husami, Ammar and Jacobsen, {Julius O.} and Jamuar, {Saumya S.} and Janes, {Elizabeth L.} and Jeanson, Francis and Jen{\'e}, Aina and Johns, {Amber L.} and Joly, Yann and Jones, {Steven J.M.} and Kanitz, Alexander and Kato, Kazuto and Keane, {Thomas M.} and Kekesi-Lafrance, Kristina and Kelleher, Jerome and Kerry, Giselle and Khor, Seik-Soon and Knoppers, {Bartha M.} and Konopko, {Melissa A.} and Kosaki, Kenjiro and Kuba, Martin and Lawson, Jonathan and Leinonen, Rasko and Li, Stephanie and Lin, {Michael F.} and Linden, Mikael and Liu, Xianglin and Liyanage, {Isuru Udara} and Lopez, Javier and Lucassen, {Anneke M.} and Lukowski, Michael and Mann, {Alice L.} and Marshall, John and Mattioni, Michele and Metke-Jimenez, Alejandro and Middleton, Anna and Milne, {Richard J.} and Moln{\'a}r-G{\'a}bor, Fruzsina and Mulder, Nicola and Munoz-Torres, {Monica C.} and Nag, Rishi and Nakagawa, Hidewaki and Nasir, Jamal and Navarro, Arcadi and Nelson, {Tristan H.} and Niewielska, Ania and Nisselle, Amy and Niu, Jeffrey and Nyr{\"o}nen, 
{Tommi H.} and O{\textquoteright}Connor, {Brian D.} and Oesterle, Sabine and Ogishima, Soichi and {Ota Wang}, Vivian and Paglione, {Laura A.D.} and Palumbo, Emilio and Parkinson, {Helen E.} and Philippakis, {Anthony A.} and Pizarro, {Angel D.} and Prlic, Andreas and Rambla, Jordi and Rendon, Augusto and Rider, {Renee A.} and Robinson, {Peter N.} and Rodarmer, {Kurt W.} and Rodriguez, {Laura Lyman} and Rubin, {Alan F.} and Rueda, Manuel and Rushton, {Gregory A.} and Ryan, {Rosalyn S.} and Saunders, {Gary I.} and Schuilenburg, Helen and Schwede, Torsten and Scollen, Serena and Senf, Alexander and Sheffield, {Nathan C.} and Skantharajah, Neerjah and Smith, {Albert V.} and Sofia, {Heidi J.} and Spalding, Dylan and Spurdle, {Amanda B.} and Stark, Zornitza and Stein, {Lincoln D.} and Suematsu, Makoto and Tan, Patrick and Tedds, {Jonathan A.} and Thomson, {Alastair A.} and Thorogood, Adrian and Tickle, {Timothy L.} and Tokunaga, Katsushi and T{\"o}rnroos, Juha and Torrents, David and Upchurch, Sean and Valencia, Alfonso and Guimera, {Roman Valls} and Vamathevan, Jessica and Varma, Susheel and Vears, {Danya F.} and Viner, Coby and Voisin, Craig and Wagner, {Alex H.} and Wallace, {Susan E.} and Walsh, {Brian P.} and Williams, {Marc S.} and Winkler, {Eva C.} and Wold, {Barbara J.} and Wood, {Grant M.} and Woolley, {J. Patrick} and Yamasaki, Chisato and Yates, {Andrew D.} and Yung, {Christina K.} and Zass, {Lyndon J.} and Zaytseva, Ksenia and Zhang, Junjun and Goodhand, Peter and North, Kathryn and Birney, Ewan}, year = {2021}, doi = {10.1016/j.xgen.2021.100029}, volume = {1}, journal = {Cell Genomics}, issn = {2666-979X}, publisher = {Cell Press}, }
2018
- Classification and interaction in random forests. Danielle Denisko and Michael M Hoffman. Proceedings of the National Academy of Sciences, 2018
@article{Denisko2018, author = {Denisko, Danielle and Hoffman, Michael M}, doi = {10.1073/pnas.1800256115}, issn = {0027-8424}, journal = {Proceedings of the National Academy of Sciences}, number = {8}, pages = {1690--1692}, pmid = {29440440}, title = {{Classification and interaction in random forests}}, url = {http://www.pnas.org/content/115/8/1690.abstract}, volume = {115}, year = {2018}, }