A Novel Method to Detect Bias in Short Read NGS Data. / Alnasir, Jamie; Shanahan, Hugh.
In: Journal of Integrative Bioinformatics, Vol. 14, No. 3, 2017, p. 1-9.
Research output: Contribution to journal › Article › peer-review
TY - JOUR
T1 - A Novel Method to Detect Bias in Short Read NGS Data
AU - Alnasir, Jamie
AU - Shanahan, Hugh
PY - 2017
Y1 - 2017
N2 - Detecting sources of bias in transcriptomic data is essential to determine signals of biological significance. We outline a novel method to detect sequence-specific bias in short-read Next Generation Sequencing data. This is based on determining intra-exon correlations between specific motifs. This requires a mild assumption that short reads sampled from specific regions of the same exon will be correlated with each other. This has been implemented on Apache Spark and used to analyse two D. melanogaster eye-antennal disc data sets generated at the same laboratory. The wild type data set indicates a variation due to motif GC content that is more significant than that found due to exon GC content. There is a clear variation in the spread of correlations between the two data sets, suggesting more variability in these data sets than one would expect.
AB - Detecting sources of bias in transcriptomic data is essential to determine signals of biological significance. We outline a novel method to detect sequence-specific bias in short-read Next Generation Sequencing data. This is based on determining intra-exon correlations between specific motifs. This requires a mild assumption that short reads sampled from specific regions of the same exon will be correlated with each other. This has been implemented on Apache Spark and used to analyse two D. melanogaster eye-antennal disc data sets generated at the same laboratory. The wild type data set indicates a variation due to motif GC content that is more significant than that found due to exon GC content. There is a clear variation in the spread of correlations between the two data sets, suggesting more variability in these data sets than one would expect.
KW - NGS
KW - BIAS
KW - Spark
KW - Hadoop
U2 - 10.1515/jib-2017-0025
DO - 10.1515/jib-2017-0025
M3 - Article
VL - 14
SP - 1
EP - 9
JO - Journal of Integrative Bioinformatics
JF - Journal of Integrative Bioinformatics
SN - 1613-4516
IS - 3
ER -