Tip: Use the `ml` script of the root package for cleaner terminal output, e.g., `pnpm run ml train:gnn`.
All `.sh` scripts must be run from this directory.

```sh
pnpm run conda:load
```
Note: It is not possible to activate a Conda environment via a `package.json` script. Instead, use the following command to activate the environment. Running it is not necessary if you are using the `turbo` commands.

```sh
source scripts/conda-activate.sh
```
Note: Make sure that the `cm2ml` environment is activated.

```sh
pnpm run conda:save
```
Use the `encode:*` and `train:*` `turbo` tasks to encode and train the model.
- Create a script for the encoding in the `scripts` directory, e.g., `encode-{ENCODING}.sh`. For reduced execution time, consider using Bun. Example for encoding UML raw graphs:

  ```sh
  bun node_modules/@cm2ml/cli/bin/cm2ml.mjs batch-uml-raw-graph ../models/uml/dataset
  ```
- Create the `package.json` script.

  ```json
  {
    "scripts": {
      "encode:{ENCODING}": "source scripts/encode-{ENCODING}.sh"
    }
  }
  ```
- Create the Turbo task in `turbo.json`. It must depend on `^build` to ensure that it uses the latest version of the framework. It must also use the corresponding script as input and the generated dataset files as outputs.

  ```json
  {
    "pipeline": {
      "encode:{ENCODING}": {
        "inputs": ["scripts/encode-{ENCODING}.sh"],
        "outputs": [".input/{ENCODING}_train.json", ".input/{ENCODING}_validation.json", ".input/{ENCODING}_test.json"],
        "dependsOn": ["^build"]
      }
    }
  }
  ```
- Implement your evaluation in `./{EVALUATION}/src/{EVALUATION}.py`.
- Create a script for the evaluation in the `scripts` directory, e.g., `train-{EVALUATION}.sh`. Example:

  ```sh
  source scripts/conda-activate.sh
  python {EVALUATION}/src/{EVALUATION}.py {ENCODING}_train.json {ENCODING}_validation.json {ENCODING}_test.json
  ```
- Create the `package.json` script.

  ```json
  {
    "scripts": {
      "train:{EVALUATION}": "source scripts/train-{EVALUATION}.sh"
    }
  }
  ```
- Create the Turbo task in `turbo.json`. It must depend on the encoding task of the dataset it uses, and it must use the corresponding script, the output of the encoding task, and the Python sources as inputs.

  ```json
  {
    "pipeline": {
      "train:{EVALUATION}": {
        "inputs": ["scripts/train-{EVALUATION}.sh", ".input/{ENCODING}_train.json", ".input/{ENCODING}_validation.json", ".input/{ENCODING}_test.json", "{EVALUATION}/src/**"],
        "dependsOn": ["encode:{ENCODING}"]
      }
    }
  }
  ```
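As shown in the `train-{EVALUATION}.sh` example above, the evaluation script receives the three encoded dataset files as positional command-line arguments. A minimal sketch of how `{EVALUATION}/src/{EVALUATION}.py` might consume them (hypothetical; the actual evaluation logic and any cm2ml-specific helpers are not shown here):

```python
import json
import sys


def load_datasets(train_path, validation_path, test_path):
    """Load the encoded train/validation/test splits produced by the encode:{ENCODING} task."""
    datasets = {}
    for split, path in (
        ("train", train_path),
        ("validation", validation_path),
        ("test", test_path),
    ):
        with open(path) as file:
            datasets[split] = json.load(file)
    return datasets


if __name__ == "__main__" and len(sys.argv) == 4:
    # The turbo task invokes this script as:
    #   python {EVALUATION}/src/{EVALUATION}.py {ENCODING}_train.json {ENCODING}_validation.json {ENCODING}_test.json
    train_path, validation_path, test_path = sys.argv[1:4]
    datasets = load_datasets(train_path, validation_path, test_path)
    # ... train on datasets["train"], tune on datasets["validation"], report on datasets["test"] ...
```

Because the Turbo task lists the encoded files and `{EVALUATION}/src/**` as inputs, the training step is re-run only when the encoding or the evaluation code changes.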