Yujie Zhang edited this page Nov 3, 2021 · 16 revisions

Welcome to the GlyCompare wiki!

What is GlyCompare?

GlyCompare is a novel method wherein glycans from glycomic data are decomposed to a minimal set of intermediate substructures, thus incorporating shared intermediate glycan substructures into all comparisons of glycans.

Currently, GlyCompare is available as a Python package, GlyCompare, and as a command line tool, GlyCompareCT. The parameters shown in gray boxes refer to the GlyCompare command line tool.

Input Files

Mandatory Inputs

a. Dataset Name

  • This is the name of your project. For a published dataset, the usual convention is <last name of the first author>_<year published> (e.g., Sibille_2016).

b. Abundance Table

  • This is a CSV file named <Dataset Name>_abundance_table.csv, where <Dataset Name> is exactly the name you entered in the Dataset Name section.

  • Rows are samples and columns are glycans. The table is expected to use glycan names as column names and sample names as row names.
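
As a rough illustration of this layout, the sketch below reads a tiny abundance table with Python's standard csv module. The sample names, glycan names, and abundance values are invented for illustration, and the assumption that sample names sit in the first column under an empty header cell is a guess rather than something this page specifies.

```python
import csv
import io

# Dataset name taken from the naming example above; file name follows the
# <Dataset Name>_abundance_table.csv convention.
dataset_name = "Sibille_2016"
file_name = f"{dataset_name}_abundance_table.csv"

# Hypothetical content: rows are samples, columns are glycans.
csv_text = """,G1,G2,G3
sample_1,0.20,0.50,0.30
sample_2,0.10,0.60,0.30
"""


def read_abundance_table(handle):
    """Return (glycan names, {sample: {glycan: abundance}})."""
    reader = csv.reader(handle)
    header = next(reader)
    glycans = header[1:]  # first header cell is assumed to be empty
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        sample, values = row[0], row[1:]
        if len(values) != len(glycans):
            raise ValueError(f"row for {sample!r} has the wrong number of columns")
        table[sample] = dict(zip(glycans, (float(v) for v in values)))
    return glycans, table


glycans, table = read_abundance_table(io.StringIO(csv_text))
print(file_name)                 # Sibille_2016_abundance_table.csv
print(glycans)                   # ['G1', 'G2', 'G3']
print(table["sample_2"]["G2"])   # 0.6
```

To read a real file, replace io.StringIO(csv_text) with open(file_name, newline="").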


c. Variable Annotation

  • This is a CSV file named <Dataset Name>_variable_annotation.csv, where <Dataset Name> is exactly the name you entered in the Dataset Name section.

  • The annotation file should have a column called "Name" that contains glycan names. Depending on whether your dataset is compositional or structural, the other required column is either "Glycan Structure" for structural data or "Composition" for compositional data.

  • Structural data can be given as IUPAC-extended, glycoCT, WURCS, glytoucan_id, or linear_code.

  • Compositional data is of the form HexNAc(2)Hex(5), where HexNAc and Hex are monosaccharide types and (2) and (5) are the number of times each occurs.
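
The compositional form can be read as a sequence of name(count) pairs. The following is a minimal sketch of a parser for that form; the function name parse_composition and the rule that a name is a letter followed by letters or digits are assumptions made for illustration, not part of GlyCompare.

```python
import re

# One "Name(count)" unit, e.g. HexNAc(2). The naming rule is an assumption.
_UNIT = re.compile(r"([A-Za-z][A-Za-z0-9]*)\((\d+)\)")
_WHOLE = re.compile(r"(?:[A-Za-z][A-Za-z0-9]*\(\d+\))+")


def parse_composition(text):
    """Turn 'HexNAc(2)Hex(5)' into {'HexNAc': 2, 'Hex': 5}."""
    if not _WHOLE.fullmatch(text):
        raise ValueError(f"not a valid composition string: {text!r}")
    counts = {}
    for name, count in _UNIT.findall(text):
        # Sum repeated names instead of silently overwriting them.
        counts[name] = counts.get(name, 0) + int(count)
    return counts


print(parse_composition("HexNAc(2)Hex(5)"))  # {'HexNAc': 2, 'Hex': 5}
```

A check like this can be run over the "Composition" column of the variable annotation file before starting an analysis, so that malformed entries are caught early.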


Input Parameters

1. Mode

  • Structure: Structural dataset. The syntax could be IUPAC-extended, glycoCT, WURCS, glytoucan_id, or linear_code.
  • Composition: Compositional dataset. The data is of compositional form such as HexNAc(2)Hex(5)

2. For Structure mode

a. Input files

  • Glycan abundance table -a TABLE_PATH
  • Variable annotation table -v TABLE_PATH

b. Linkage information

  • linkage + structure (no flag): Your structural data contains linkage information.
  • structure -s: Your structural data doesn't contain linkage information.

c. Input data structure syntax

Please specify the syntax used in your data after -p. It can be glycoCT, iupac_extended, linear_code, wurcs, or glytoucan_id.

d. Substructure Abundance Multiplier

This option controls whether a substructure that occurs several times in a single glycan is counted once or as many times as it actually occurs.

  • Binary -m binary: When constructing the substructure abundance table, each substructure is counted at most once in each glycan.
  • Integer -m integer: When constructing the substructure abundance table, each substructure is counted as many times as it appears in each glycan.

For example, A is counted once in Ab3ANb4(NNa3)Ab4Gb if set to Binary and twice if set to Integer.
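
The difference between the two settings can be sketched in a few lines of Python. This is not GlyCompare's implementation: it assumes the occurrence counts per glycan are already known (the decomposition step is not shown) and that a substructure's abundance is the sum of the abundances of the glycans containing it, weighted by the multiplier. The function name and the abundance value are made up for illustration.

```python
def substructure_abundance(glycan_abundance, occurrences, multiplier):
    """Sum glycan abundances into substructure abundances for one sample.

    glycan_abundance: {glycan: abundance}
    occurrences:      {glycan: {substructure: times it occurs in the glycan}}
    multiplier:       "binary" or "integer"
    """
    if multiplier not in ("binary", "integer"):
        raise ValueError("multiplier must be 'binary' or 'integer'")
    totals = {}
    for glycan, abundance in glycan_abundance.items():
        for sub, count in occurrences.get(glycan, {}).items():
            if count <= 0:
                continue  # substructure absent from this glycan
            weight = 1 if multiplier == "binary" else count
            totals[sub] = totals.get(sub, 0.0) + abundance * weight
    return totals


# A occurs twice in the example glycan above; 10.0 is an invented abundance.
occurrences = {"Ab3ANb4(NNa3)Ab4Gb": {"A": 2}}
sample = {"Ab3ANb4(NNa3)Ab4Gb": 10.0}

print(substructure_abundance(sample, occurrences, "binary"))   # {'A': 10.0}
print(substructure_abundance(sample, occurrences, "integer"))  # {'A': 20.0}
```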

e. Substructure Abundance Normalization

This option controls whether the substructure abundances are normalized after glycans are decomposed into substructures.

  • Absolute -b: No normalization is conducted.
  • Relative (no flag): Each element x is set to x / (sum of all elements).
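
The Relative option amounts to dividing each value by a total. Below is a minimal sketch of that arithmetic; it assumes the sum is taken over the substructure abundances of a single sample, which this page does not state explicitly, and the substructure names and values are invented.

```python
def relative_normalize(values):
    """Divide every abundance by the sum of all abundances."""
    total = sum(values.values())
    if total == 0:
        raise ValueError("cannot normalize: abundances sum to zero")
    return {name: x / total for name, x in values.items()}


# Hypothetical substructure abundances for one sample.
absolute = {"S1": 1.0, "S2": 1.0, "S3": 2.0}
relative = relative_normalize(absolute)
print(relative)  # {'S1': 0.25, 'S2': 0.25, 'S3': 0.5}
```

After this step the values of a sample sum to 1, which makes samples with different total signal comparable.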

f. Select root

(1). Biosynth

Use this option to restrict the analysis to a specified root.

  • N-glycan -r N: Set the root to GlcNAc.
  • O-glycan -r O: Set the root to GalNAc.
  • HMO/glycolipids -r lactose: Set the root to Gal(b1-4)Glc (lactose).
  • Custom root -r custom -u ROOT_GLYCAN_GLYCOCT: Set the root to one or more custom monosaccharides. Supply the custom root in glycoCT format as ROOT_GLYCAN_GLYCOCT.

(2). Epitope

Use this option if you don't want to specify a single root. The analysis then runs with every possible monosaccharide treated as a root.

g. Draw cluster map for structural data

Add -d to draw cluster maps, including pseudo_profile_clustering, motif_cluster, and profile_clustering.
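
The Structure mode options above can be collected into a list of command line flags. The helper below is a hypothetical sketch, not part of GlyCompareCT: it only assembles the flags documented on this page (-a, -v, -p, -m, -s, -b, -r, -u, -d) and deliberately leaves out the executable name and the way the mode is selected, since neither is given here. Treating "no -r flag" as the Epitope setting is an assumption.

```python
VALID_SYNTAX = {"glycoCT", "iupac_extended", "linear_code", "wurcs", "glytoucan_id"}
VALID_ROOTS = {"N", "O", "lactose", "custom"}


def build_structure_flags(abundance_path, annotation_path, syntax, multiplier,
                          no_linkage=False, absolute=False,
                          root=None, custom_root=None, draw=False):
    """Assemble Structure mode flags (hypothetical helper, flags from this page)."""
    if syntax not in VALID_SYNTAX:
        raise ValueError(f"unknown syntax: {syntax!r}")
    if multiplier not in ("binary", "integer"):
        raise ValueError(f"unknown multiplier: {multiplier!r}")
    flags = ["-a", abundance_path, "-v", annotation_path,
             "-p", syntax, "-m", multiplier]
    if no_linkage:
        flags.append("-s")   # structural data without linkage information
    if absolute:
        flags.append("-b")   # skip substructure abundance normalization
    if root is not None:     # root=None is assumed to mean Epitope
        if root not in VALID_ROOTS:
            raise ValueError(f"unknown root: {root!r}")
        flags += ["-r", root]
        if root == "custom":
            if not custom_root:
                raise ValueError("a custom root needs its glycoCT string")
            flags += ["-u", custom_root]
    if draw:
        flags.append("-d")   # draw cluster maps
    return flags


flags = build_structure_flags(
    "Sibille_2016_abundance_table.csv",
    "Sibille_2016_variable_annotation.csv",
    "glycoCT", "binary", root="N", draw=True,
)
print(" ".join(flags))
```

The resulting list can be appended to the actual GlyCompareCT invocation described in the GlyCompareCT repository.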


4. For Composition mode

a. Input files

  • Glycan abundance table -a TABLE_PATH
  • Variable annotation table -v TABLE_PATH

5. Please select input data normalization

  • no normalization -n none: Use the raw abundance data.
  • min-max normalization -n min-max: Each element x is set to (x - min) / (max - min).
  • probabilistic quotient normalization -n prob_quot: A normalization method commonly used for biological data, described in Dieterle et al. 2006.
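
The two non-trivial options can be sketched as follows. This is an illustration under stated assumptions rather than GlyCompare's code: min and max are assumed to be taken over the list of values passed in, and the probabilistic quotient sketch uses the per-glycan median across samples as the reference and omits the preliminary integral normalization step described by Dieterle et al. The function names and numbers are invented.

```python
from statistics import median


def min_max(values):
    """Set each x to (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max is undefined when all values are equal")
    return [(x - lo) / (hi - lo) for x in values]


def prob_quot(samples):
    """Probabilistic quotient normalization (simplified sketch).

    samples: list of equal-length lists, rows = samples, columns = glycans.
    """
    # Reference profile: median of each glycan across all samples.
    reference = [median(column) for column in zip(*samples)]
    normalized = []
    for row in samples:
        # Quotient of each value to the reference, skipping zero references.
        quotients = [x / r for x, r in zip(row, reference) if r != 0]
        factor = median(quotients)
        if factor == 0:
            raise ValueError("median quotient is zero")
        normalized.append([x / factor for x in row])
    return normalized


print(min_max([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]

# Three samples that differ only by a dilution factor.
samples = [[1.0, 2.0, 4.0], [2.0, 4.0, 8.0], [4.0, 8.0, 16.0]]
print(prob_quot(samples))  # every row becomes [2.0, 4.0, 8.0]
```

The second example shows the intent of the method: samples that differ only by an overall scale are mapped onto the same profile.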

6. Download reference motif vector

The reference motif vector is built from past datasets and is used during GlyCompare analysis to generate more complete motifs. It is automatically expanded with new datasets.

Install GlyCompare Python library

  • git clone the GlyCompare repository.

  • Enter the repository and switch to the dev branch with git checkout dev

  • You will find an environment.yml file in the folder. To install the dependencies for GlyCompare, create a conda environment with

    conda env create -f environment.yml

  • Now you can open glyCompare_working_bench.ipynb in Jupyter Notebook and follow the instructions inside to use the GlyCompare functions.

Install GlyCompareCT

# get the repo
git clone https://github.com/yuz682/GlyCompareCT.git
# enter the repo
cd GlyCompareCT

# Create the environment with all required dependencies installed.
conda env create -f environment.yml

# Activate conda environment
conda activate glyCompareCT_env

For detailed usage instructions, please refer to https://github.com/yuz682/GlyCompareCT
