Open Access Repository

Differentially expressed genes from RNASeq and functional enrichment results are affected by the choice of single-end versus paired-end reads and stranded versus non-stranded protocols

Corley, SM, MacKenzie, KL, Beverdam, A, Roddam, LF ORCID: 0000-0002-4152-0681 and Wilkins, MR 2017, 'Differentially expressed genes from RNASeq and functional enrichment results are affected by the choice of single-end versus paired-end reads and stranded versus non-stranded protocols', BMC Genomics, vol. 18, pp. 1-13, doi:

Background: RNA-Seq is now widely used as a research tool. Choices must be made whether to use paired-end (PE) or single-end (SE) sequencing, and whether to use strand-specific or non-specific (NS) library preparation kits. To date there has been no analysis of the effect of these choices on identifying differentially expressed genes (DEGs) between controls and treated samples and on downstream functional analysis.

Results: We undertook four mammalian transcriptomics experiments to compare the effect of SE and PE protocols on read mapping, feature counting, identification of DEGs and functional analysis. For three of these experiments we also compared a non-stranded (NS) and a strand-specific approach to mapping the paired-end data. SE mapping resulted in a reduced number of reads mapped to features in all four experiments, and a lower read count per gene. Up to 4.3% of genes in the SE data and up to 12.3% of genes in the NS data had read counts which were significantly different compared to the PE data. Comparison of DEGs showed the presence of false positives (average 5%, using voom) and false negatives (average 5%, using voom) using the SE reads. These increased further, by one or two percentage points, with the NS data. Gene ontology (GO) functional enrichment of the DEGs arising from SE or NS approaches revealed striking differences in the top 20 GO terms, with as little as 40% concordance with PE results. Caution is therefore advised in the interpretation of such results. By comparison, there was overall consistency in gene set enrichment analysis results.

Conclusions: A strand-specific protocol should be used in library preparation to generate the most reliable and accurate profile of expression. Ideally PE reads are also recommended, particularly for transcriptome assembly. Whilst SE reads produce a DEG list with around 5% of false positives and false negatives, this method can substantially reduce sequencing cost, and this saving could be used to increase the number of biological replicates, thereby increasing the power of the experiment. As SE reads, when used in association with gene set enrichment, can generate accurate biological results, this may be a desirable trade-off.
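
The comparisons summarised in the abstract (false positives and false negatives in a DEG list, and concordance of the top 20 GO terms) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes that the PE result is treated as the reference, that a false positive is a gene called by the SE or NS analysis but not by PE, and that a false negative is a gene called by PE but missed by the other analysis. The function names, the choice of denominators, and the toy gene identifiers are all hypothetical.

```python
def compare_deg_lists(reference_degs, candidate_degs):
    """Compare a candidate DEG list (e.g. SE) against a reference list (e.g. PE).

    Assumed definitions (not taken from the paper):
      false positive = in candidate but not in reference, as % of candidate
      false negative = in reference but not in candidate, as % of reference
    """
    reference = set(reference_degs)
    candidate = set(candidate_degs)
    false_pos = candidate - reference
    false_neg = reference - candidate
    fp_pct = 100.0 * len(false_pos) / len(candidate) if candidate else 0.0
    fn_pct = 100.0 * len(false_neg) / len(reference) if reference else 0.0
    return {"false_positive_pct": fp_pct, "false_negative_pct": fn_pct}


def top_n_concordance(reference_ranked, candidate_ranked, n=20):
    """Percentage of the reference top-n terms that also appear in the candidate top-n."""
    ref_top = set(reference_ranked[:n])
    cand_top = set(candidate_ranked[:n])
    if not ref_top:
        return 0.0
    return 100.0 * len(ref_top & cand_top) / len(ref_top)


if __name__ == "__main__":
    # Toy identifiers for illustration only; not data from the study.
    pe_degs = ["geneA", "geneB", "geneC", "geneD"]
    se_degs = ["geneA", "geneB", "geneC", "geneE"]
    print(compare_deg_lists(pe_degs, se_degs))

    pe_terms = ["term1", "term2", "term3", "term4", "term5"]
    se_terms = ["term1", "term2", "term8", "term9", "term7"]
    print(top_n_concordance(pe_terms, se_terms, n=5))
```

Set-based comparison ignores the ordering within the top-n list; a rank-aware measure would be needed to capture how far individual terms move.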

Item Type: Article
Authors/Creators: Corley, SM and MacKenzie, KL and Beverdam, A and Roddam, LF and Wilkins, MR
Keywords: RNA-Seq, Transcriptomics, Paired-end reads, Single-end reads, Differential expression, Strand-specific, Non-strand-specific
Journal or Publication Title: BMC Genomics
Publisher: BioMed Central Ltd
ISSN: 1471-2164
DOI / ID Number:
Copyright Information:

Copyright 2017 The Author(s). Licensed under Creative Commons Attribution 4.0 International License (CC BY 4.0)
