NGSPERL: A semi-automated framework for large scale next generation sequencing data analysis

Quanhu Sheng, Shilin Zhao, Mingsheng Guo, Yu Shyr

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)


High-throughput sequencing technologies have been widely used in medical and biological research, especially in cancer biology. With the huge amounts of sequencing data being generated, data analysis has become the bottleneck of the research procedure. We have designed and implemented NGSPERL, a semi-automated, module-based framework for high-throughput sequencing data analysis. Three major analysis pipelines with multiple tasks have been developed for RNA sequencing, exome sequencing, and small RNA sequencing data. Each task was developed as a module. The module uses the output from the previous task as the input parameter to generate the corresponding portable batch system (PBS) script. The PBS scripts can be either submitted to a cluster or run directly, depending on the user's choice. Multiple tasks can also be combined into a single task to simplify the data analysis. Such a flexible framework will significantly automate and simplify the process of large-scale sequencing data analysis.
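
The module chaining described in the abstract can be illustrated with a minimal sketch. This is not NGSPERL's actual API (NGSPERL itself is written in Perl); the `Task` class, the helper functions, the resource directives, and the `trim_reads` / `align_reads` commands are all hypothetical placeholders chosen for illustration. The sketch only shows the general idea: each task takes the previous task's output as its input, a PBS script is generated per task, and several tasks can be merged into one script.

```python
from dataclasses import dataclass

# Assumed resource requests; a real pipeline would make these configurable.
PBS_HEADER = [
    "#!/bin/bash",
    "#PBS -l nodes=1:ppn=8",
    "#PBS -l walltime=24:00:00",
]


@dataclass
class Task:
    """One pipeline step (hypothetical; not an NGSPERL class)."""
    name: str
    command: str        # template with {input} and {output} placeholders
    output_suffix: str  # appended to the sample name to form the output file

    def output_for(self, sample):
        return f"{self.name}/{sample}{self.output_suffix}"

    def body_lines(self, sample, input_path):
        cmd = self.command.format(input=input_path, output=self.output_for(sample))
        return [f"mkdir -p {self.name}", cmd]


def render_script(job_name, body_lines):
    """Wrap shell commands in a PBS script: directives first, then commands."""
    lines = PBS_HEADER + [f"#PBS -N {job_name}", "cd $PBS_O_WORKDIR"] + body_lines
    return "\n".join(lines) + "\n"


def chain(tasks, sample, raw_input):
    """Yield (task, body_lines); each task reads the previous task's output."""
    current = raw_input
    for task in tasks:
        yield task, task.body_lines(sample, current)
        current = task.output_for(sample)


def build_scripts(tasks, sample, raw_input):
    """One PBS script per task, keyed by task name."""
    return {
        task.name: render_script(f"{task.name}_{sample}", body)
        for task, body in chain(tasks, sample, raw_input)
    }


def build_combined(tasks, sample, raw_input):
    """All tasks merged into a single PBS script, run in order."""
    body = [line for _, lines in chain(tasks, sample, raw_input) for line in lines]
    return render_script(f"combined_{sample}", body)


if __name__ == "__main__":
    # Placeholder commands, not real tools.
    pipeline = [
        Task("trim", "trim_reads --in {input} --out {output}", ".trimmed.fastq"),
        Task("align", "align_reads --in {input} --out {output}", ".bam"),
    ]
    for name, script in build_scripts(pipeline, "S1", "raw/S1.fastq").items():
        print(f"--- {name}.pbs ---\n{script}")
```

Under these assumptions, a generated script could be written to a file and then either submitted to a cluster scheduler with `qsub` or executed directly with `bash`, which mirrors the two run modes mentioned in the abstract.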

Original language: English
Pages (from-to): 203-211
Number of pages: 9
Journal: International Journal of Computational Biology and Drug Design
Issue number: 3
Publication status: Published - 2015

All Science Journal Classification (ASJC) codes

  • Drug Discovery
  • Computer Science Applications

