feat: FastqToBam can extract UMI(s) from the comment in the read name #989

nh13 · 2024-05-20T16:53:36Z

No description provided.

src/main/scala/com/fulcrumgenomics/fastq/FastqToBam.scala

src/test/scala/com/fulcrumgenomics/umi/UmisTest.scala

codecov · 2024-05-20T16:59:41Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.63%. Comparing base (ba0788e) to head (9856fe1).
Report is 12 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #989   +/-   ##
=======================================
  Coverage   95.62%   95.63%           
=======================================
  Files         126      126           
  Lines        7364     7380   +16     
  Branches      500      498    -2     
=======================================
+ Hits         7042     7058   +16     
  Misses        322      322

| Flag | Coverage Δ |
|---|---|
| unittests | `95.63% <100.00%> (+<0.01%)` ⬆️ |


msto · 2024-05-20T17:04:15Z

src/test/scala/com/fulcrumgenomics/fastq/FastqToBamTest.scala

+    recs(0).apply[String]("RX") shouldBe "ACGT-CGTA-GG-CC"
+    recs(1).apply[String]("RX") shouldBe "TTGA-TAAT-TA-AA"


Why are these suffixed with -GG-CC and -TA-AA?

As per the method docs, UMIs may be extracted from the read names, the read sequences, or both. In this test the read structures include UMI bases in the read sequences themselves, and the comment in each read name header also carries UMI bases, so we get four (!) UMI segments: two from the read sequences and two from the comments in the read headers.
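To make the four-segment result concrete, here is a minimal sketch of how the expected `RX` value could be assembled. It is not fgbio's implementation: `UmiJoinSketch` and `joinUmis` are hypothetical names. It assumes the sequence-derived segments come first and the comment-derived segments follow, which matches the order in the expected values above.

```scala
// Hypothetical sketch, not fgbio's actual code: shows how UMI segments taken
// from the read sequences and from the read-name comments could be combined
// into a single hyphen-delimited RX value.
object UmiJoinSketch {

  /** Joins non-empty UMI segments with "-".
    *
    * Assumption: segments from the read sequences come first, followed by
    * segments from the comments in the read headers.
    */
  def joinUmis(fromSequences: Seq[String], fromComments: Seq[String]): String =
    (fromSequences ++ fromComments).filter(_.nonEmpty).mkString("-")

  def main(args: Array[String]): Unit = {
    // First template in the test: two UMIs from the reads, two from the comments.
    println(joinUmis(Seq("ACGT", "CGTA"), Seq("GG", "CC"))) // ACGT-CGTA-GG-CC
    // Second template in the test.
    println(joinUmis(Seq("TTGA", "TAAT"), Seq("TA", "AA"))) // TTGA-TAAT-TA-AA
  }
}
```

With UMIs only in the read sequences (an empty comment list), the same helper would yield just `ACGT-CGTA`, which is why the extra `-GG-CC` suffix appears only when the header comments also carry UMIs.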

feat: FastqToBam can extract UMI(s) from the comment in the read name

a316211

nh13 requested review from tfenne and msto May 20, 2024 16:53