Home
Welcome to the GlyCompare wiki!
GlyCompare is a novel method wherein glycans from glycomic data are decomposed to a minimal set of intermediate substructures, thus incorporating shared intermediate glycan substructures into all comparisons of glycans.
Currently, GlyCompare is available as a Python package, GlyCompare, and a command-line tool, GlyCompareCT. The parameters shown in gray boxes refer to the GlyCompare command-line tool.
- This is the name of your project. If your dataset is published, the usual convention is <last name of the first author>_<year published> (e.g. Sibille_2016).
- This is a CSV file named <Dataset Name>_abundance_table.csv, where <Dataset Name> is exactly the name you entered in the Dataset Name section.
- Rows are samples and columns are glycans. The table is expected to use glycan names as column names and sample names as row names.
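The layout described above can be sketched with Python's standard csv module. This is only an illustration: the helper names, the glycan and sample names, the abundance values, and the empty top-left header cell are all assumptions, not something the GlyCompare documentation prescribes.

```python
import csv
import io


def abundance_table_filename(dataset_name):
    """Return the file name expected for a dataset's abundance table."""
    return f"{dataset_name}_abundance_table.csv"


def write_abundance_table(handle, glycans, samples):
    """Write one row per sample and one column per glycan.

    `samples` maps a sample name to abundances ordered like `glycans`.
    """
    writer = csv.writer(handle, lineterminator="\n")
    # Header row: an empty corner cell (assumption) followed by glycan names.
    writer.writerow([""] + list(glycans))
    for sample_name, abundances in samples.items():
        if len(abundances) != len(glycans):
            raise ValueError(f"{sample_name}: expected {len(glycans)} values")
        writer.writerow([sample_name] + list(abundances))


# Hypothetical glycan names, sample names, and abundances.
glycans = ["G1", "G2", "G3"]
samples = {"sample_1": [0.5, 0.3, 0.2], "sample_2": [0.1, 0.6, 0.3]}

buffer = io.StringIO()
write_abundance_table(buffer, glycans, samples)
table_text = buffer.getvalue()
print(abundance_table_filename("Sibille_2016"))
print(table_text)
```

Writing to an in-memory buffer keeps the sketch side-effect free; in practice you would open the file returned by abundance_table_filename instead.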
- This is a CSV file named <Dataset Name>_variable_annotation.csv, where <Dataset Name> is exactly the name you entered in the Dataset Name section.
- The annotation file should have a column called "Name" that contains glycan names. Depending on whether your dataset is compositional or structural, the other required column is either "Glycan Structure" for structural data or "Composition" for compositional data.
- Structural data can be given as IUPAC-extended, glycoCT, WURCS, glytoucan_id, or linear_code.
- Compositional data is of the form HexNAc(2)Hex(5), where HexNAc and Hex are monosaccharide types and (2) and (5) are the number of times each occurs.
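To make the compositional form concrete, here is a minimal sketch of how a string such as HexNAc(2)Hex(5) could be read into counts. The function name and the exact token pattern are assumptions for illustration; GlyCompare's own parser may accept a different grammar.

```python
import re

# One token is a residue name followed by a count in parentheses,
# e.g. "HexNAc(2)". This pattern is an assumption about the format.
_TOKEN = re.compile(r"([A-Za-z][A-Za-z0-9]*)\((\d+)\)")


def parse_composition(text):
    """Parse a composition string into a {residue: count} dict."""
    counts = {}
    position = 0
    for match in _TOKEN.finditer(text):
        if match.start() != position:
            raise ValueError(f"unexpected text at position {position}")
        name, number = match.group(1), int(match.group(2))
        counts[name] = counts.get(name, 0) + number
        position = match.end()
    if position != len(text) or not counts:
        raise ValueError(f"not a valid composition: {text!r}")
    return counts


print(parse_composition("HexNAc(2)Hex(5)"))
```

Rejecting leftover characters rather than skipping them makes malformed entries in the "Composition" column fail early instead of being silently miscounted.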
- Structure: Structural dataset. The syntax could be IUPAC-extended, glycoCT, WURCS, glytoucan_id, or linear_code.
- Composition: Compositional dataset. The data is of compositional form such as HexNAc(2)Hex(5)
- Glycan abundance table: -a TABLE_PATH
- Variable annotation table: -v TABLE_PATH
- linkage + structure
- structure (-s): Your structural data doesn't contain linkage information.
Please specify the syntax your data uses after -p. It can be glycoCT, iupac_extended, linear_code, wurcs, or glytoucan_id.
Whether to count a substructure's occurrence in a single glycan once, or as many times as it actually appears.
- Binary (-m binary): When constructing the substructure abundance table, each substructure is counted at most once in each glycan.
- Integer (-m integer): When constructing the substructure abundance table, each substructure is counted as many times as it actually appears in each glycan.
For example, A is counted once in Ab3ANb4(NNa3)Ab4Gb if set to Binary and twice if set to Integer.
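The difference between the two modes can be sketched as follows. This assumes that a substructure's abundance is the sum of the abundances of the glycans containing it, weighted by the per-glycan count; that weighting rule, the function names, and the abundance value 10.0 are assumptions for illustration only.

```python
def substructure_weight(occurrences, mode):
    """Return how often a substructure is counted in one glycan."""
    if mode == "binary":
        return 1 if occurrences > 0 else 0
    if mode == "integer":
        return occurrences
    raise ValueError(f"unknown mode: {mode!r}")


def substructure_abundance(glycan_abundance, occurrences, mode):
    """Sum glycan abundances weighted by the substructure count (assumed rule)."""
    return sum(
        abundance * substructure_weight(occurrences.get(glycan, 0), mode)
        for glycan, abundance in glycan_abundance.items()
    )


# A appears twice in this glycan, as in the example above.
occurrences_of_A = {"Ab3ANb4(NNa3)Ab4Gb": 2}
# Hypothetical abundance for that glycan in one sample.
glycan_abundance = {"Ab3ANb4(NNa3)Ab4Gb": 10.0}

print(substructure_abundance(glycan_abundance, occurrences_of_A, "binary"))
print(substructure_abundance(glycan_abundance, occurrences_of_A, "integer"))
```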
Whether normalization is conducted after glycans are decomposed into substructures.
- Absolute (-b): No normalization is conducted.
- Relative
If you want to restrict the analysis to a specified root:
- N-glycan (-r N): Set the root to GlcNAc.
- O-glycan (-r O): Set the root to GalNAc.
- HMO/glycolipids (-r lactose): Set the root to Gal(b1-4)Glc (lactose).
- Custom root (-r custom -u ROOT_GLYCAN_GLYCOCT): Set the root to one or more custom monosaccharides. You need to enter the glycoCT format of your custom root in the Custom core textbox below.
If you don't want to specify a single root, the analysis will run with every possible monosaccharide as a root.
Add -d if you want to draw cluster maps, including pseudo_profile_clustering, motif_cluster, and profile_clustering.
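As a way of seeing how the structural options combine, the sketch below assembles the flags listed above into an argument list. It deliberately builds only the options, since the executable name and any subcommand are not stated here. The function name, its parameters, and the validation rules are assumptions; only the flags themselves come from this page.

```python
SYNTAXES = ("glycoCT", "iupac_extended", "linear_code", "wurcs", "glytoucan_id")
ROOTS = ("N", "O", "lactose", "custom")


def build_structure_options(abundance_path, annotation_path, syntax, *,
                            no_linkage=False, mode=None, absolute=False,
                            root=None, custom_root=None, draw=False):
    """Return the option list for a structural run (hypothetical helper)."""
    if syntax not in SYNTAXES:
        raise ValueError(f"unknown syntax: {syntax!r}")
    options = ["-a", abundance_path, "-v", annotation_path, "-p", syntax]
    if no_linkage:
        options.append("-s")  # structure only, no linkage information
    if mode is not None:
        if mode not in ("binary", "integer"):
            raise ValueError(f"unknown counting mode: {mode!r}")
        options += ["-m", mode]
    if absolute:
        options.append("-b")  # absolute: no normalization
    if root is not None:
        if root not in ROOTS:
            raise ValueError(f"unknown root: {root!r}")
        options += ["-r", root]
        if root == "custom":
            if not custom_root:
                raise ValueError("a custom root needs its glycoCT string")
            options += ["-u", custom_root]
    if draw:
        options.append("-d")  # draw cluster maps
    return options


# Hypothetical file names following the naming convention above.
print(build_structure_options(
    "Sibille_2016_abundance_table.csv",
    "Sibille_2016_variable_annotation.csv",
    "glycoCT", mode="integer", root="N", draw=True,
))
```

Leaving root unset emits no -r flag, which corresponds to the case where every possible monosaccharide is treated as a root.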
- Glycan abundance table: -a TABLE_PATH
- Variable annotation table: -v TABLE_PATH
- no normalization (-n none): Use the raw abundance data.
- min-max normalization (-n min-max): Each element x is set to (x - min) / (max - min).
- probabilistic quotient normalization (-n prob_quot): A commonly used normalization method for biological data, described in Dieterle et al. 2006.
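The two non-trivial normalization options can be sketched in plain Python. This is an illustration under stated assumptions, not GlyCompare's implementation: min-max is applied to a single vector (the page does not say whether min and max are taken per sample or per glycan), and the probabilistic quotient step uses the per-feature median across samples as the reference and omits the preliminary total-intensity normalization that the original method applies first.

```python
from statistics import median


def min_max(values):
    """Rescale values to [0, 1] using (x - min) / (max - min)."""
    low, high = min(values), max(values)
    if high == low:
        # All values identical: avoid division by zero.
        return [0.0 for _ in values]
    return [(x - low) / (high - low) for x in values]


def prob_quot(samples):
    """Probabilistic quotient normalization of a list of samples.

    Each sample is a list of abundances in the same feature order.
    """
    # Reference profile: median of each feature across samples (assumption).
    reference = [median(column) for column in zip(*samples)]
    normalized = []
    for sample in samples:
        quotients = [x / ref for x, ref in zip(sample, reference) if ref != 0]
        factor = median(quotients) if quotients else 0
        if factor == 0:
            # Nothing to scale by: leave the sample unchanged.
            normalized.append(list(sample))
        else:
            normalized.append([x / factor for x in sample])
    return normalized


print(min_max([2, 4, 6]))  # [0.0, 0.5, 1.0]
print(prob_quot([[1.0, 2.0, 4.0], [2.0, 4.0, 8.0]]))
```

In the second call the two samples differ only by a dilution factor of two, so after normalization both rows land on the same profile.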
The reference motif vector is synthesized from past datasets and is used during glyCompare analysis to generate more complete motifs. It is automatically expanded with new datasets.
- git clone the glyCompare repository.
- Enter the repository and switch to the dev branch through git checkout dev.
- You will find an environment.yml file within the folder. To set up the dependencies for glyCompare, create a conda environment through conda env create -f environment.yml.
- Now you can use Jupyter notebook to open glyCompare_working_bench.ipynb and follow the instructions inside to proceed with glyCompare functions.
# get the repo
git clone https://github.com/yuz682/GlyCompareCT.git
# enter the repo
cd GlyCompareCT
# Create the environment with all required dependencies installed.
conda env create -f environment.yml
# Activate conda environment
conda activate glyCompareCT_env
For detailed GlyCompareCT instructions, please refer to https://github.com/yuz682/GlyCompareCT