From d4900825a804d897cb292882fd7c61bcf1c26e11 Mon Sep 17 00:00:00 2001
From: btmonier
+@@ -172,7 +182,7 @@srvCon <- "phg.maizegdb.org" |> PHGServerCon() srvCon
Server connections
+"www.my-unsecure-phg.org" |> PHGServerCon( port = 5300, @@ -202,18 +212,18 @@
Creating JVM objectshere before you continue!
To initialize, run the following command:
-+initPhg("phg/path/to/lib")
…where
-phg/path/to/lib
is thelib
directory found within the decompressed release of PHGv2. Now that the JVM has been initialized, we can build the JVM graph usingbuildHaplotypeGraph()
using the local connection object as input:+-graph <- localCon |> buildHaplotypeGraph() graph
## A HaplotypeGraph object @ 51a9ad5e +
@@ -221,9 +231,9 @@## A HaplotypeGraph object @ 16c069df ## ❯ # of ref ranges....: 38 ## ❯ # of samples.......: 2 ## ❯ # of chromosomes...: 2
Creating JVM objects
+-graph |> javaRefObj()
+## [1] "Java-Object{net.maizegenetics.phgv2.api.HaplotypeGraph@51a9ad5e}"
## [1] "Java-Object{net.maizegenetics.phgv2.api.HaplotypeGraph@16c069df}"
Returning version and memory values
@@ -232,20 +242,20 @@Returning version and memory values issues and memory allocated to the JVM. For debugging and monitoring purposes, we can use the
jvmStats()
function which creates a instance of aJvmStats
object. -+javaStats <- jvmStats() javaStats
+## ❯ Free...........: 0.227 +## ❯ Allocated......: 0.019## General Stats: ## ❯ # of JARs......: 227 ## ❯ Java version...: 17.0.12 -## ❯ PHG version....: 2.4.12.167 +## ❯ PHG version....: 2.4.16.171 ## ## Memory Stats (GB): ## ❯ Max............: 0.5 ## ❯ Total..........: 0.246 -## ❯ Free...........: 0.22 -## ❯ Allocated......: 0.026
This object contains several values:
- Total number of PHGv2 JAR files added to the class path
@@ -267,7 +277,7 @@Sample IDsreadSamples() function: -
@@ -279,7 +289,7 @@+graph |> readSamples()
## [1] "LineA" "LineB"
Reference rangesGRanges object which is a common data class in the
GenomicRanges
package: -+graph |> readRefRanges()
## GRanges object with 38 ranges and 1 metadata column: ## seqnames ranges strand | rr_id @@ -305,7 +315,7 @@
Haplotype IDs
reference range” × \times matrix
object, we can use thereadHapIds()
function: -+m <- graph |> readHapIds() # Show only first 3 columns @@ -322,7 +332,7 @@
Haplotype ID metadata
To return metadata for each haplotype ID as a
-tibble
object, we can use thereadHapIdMetaData()
function:+graph |> readHapIdMetaData()
## # A tibble: 76 × 6 ## hap_id sample_name description source checksum ref_range_hash @@ -345,7 +355,7 @@
Haplotype ID metadata (positions)To return positional information for each haplotype ID as another
tibble
object, we can use thereadHapIdPosMetaData()
function: -+graph |> readHapIdPosMetaData()
## # A tibble: 76 × 5 ## hap_id contig_start contig_end start end @@ -369,7 +379,7 @@
All hVCF data
+phgDs <- graph |> readPhgDataSet() phgDs
## A PHGDataSet object @@ -406,7 +416,7 @@
Filter data
+phgDs
## A PHGDataSet object ## ❯ # of ref ranges....: 38 @@ -430,7 +440,7 @@
Filter by sample
Ky21
- -
Mo17
+phgDs |> filterSamples(c("B73", "Ky21", "Mo17"))
Note: If samples are added to the filter collection, @@ -468,7 +478,7 @@
Filter by reference range
I can create the following
-GRanges
object and pass that to the filter method:+gr <- GRanges( seqnames = c("1", "2"), ranges = IRanges( @@ -486,7 +496,7 @@
Chaining methods
Since we have two dimensions, we can filter simultaneously by “piping” or combining filter methods in one pass:
-+gr <- GRanges( seqnames = c("1", "2"), ranges = IRanges( @@ -513,7 +523,7 @@
Global values
+phgDs
-## A PHGDataSet object ## ❯ # of ref ranges....: 38 @@ -521,19 +531,19 @@
Global values## --- ## ❯ # of hap IDs.......: 76 ## ❯ # of asm regions...: 76
+# Get number of samples/taxa phgDs |> numberOfSamples()
-## [1] 2
+# Get number of chromosomes phgDs |> numberOfChromosomes()
-## [1] 2
+# Get number of reference ranges phgDs |> numberOfRefRanges()
-## [1] 38
+# Get number of haplotype IDs phgDs |> numberOfHaplotypes()
@@ -545,7 +555,7 @@## [1] 76
Get counts of unique haplotype IDsPHGDataSet object, we can use
numberOfHaplotypes()
again, but set the internal parameterbyRefRange
toTRUE
: -+phgDs |> numberOfHaplotypes(byRefRange = TRUE)
## # A tibble: 38 × 7 ## rr_id n_haplo seqnames start end width strand @@ -570,13 +580,13 @@
Visualize counts of unique hap By default, this will produce a scatter plot of data found across all reference ranges in the reference genome (similar to a Manhattan plot): -
+- +phgDs |> plotHaploCounts()
To get more “granular” views of specific regions within our data, we can pass a
-GRanges
object from the packageGenomicRanges
to thegr
parameter:+- +# library(GenomicRanges) query <- GRanges( @@ -585,10 +595,10 @@
Visualize counts of unique hap ) phgDs |> plotHaploCounts(gr = query)
We can also query mutltiple regions of interest simultaneously by adding more observations to the
-query
object:+- +# library(GenomicRanges) query <- GRanges( @@ -597,7 +607,7 @@
Visualize counts of unique hap ) phgDs |> plotHaploCounts(gr = query)
Finally, we provide 3 separate “geometry” options for plotting using the
geom
parameter: *"l"
- line plotting (e.g.,geom_line()
) (default) *"b"
- @@ -605,9 +615,9 @@Visualize counts of unique hap plotting (e.g.,
geom_point()
)For example, if we want to switch this from a line plot to a bar plot, we can specify
-geom = "b"
:+- +phgDs |> plotHaploCounts(gr = query, geom = "b")
Visualize distributions of unique haplotype IDs @@ -615,9 +625,9 @@
Visualize distributions
To get a more “global” estimate of unique haplotype ID distribution across the genome, we can us the function
-plotHaploDist()
:+- +phgDs |> plotHaploDist()