doc: distributed training #16

wey-gu · 2023-12-05T13:15:30Z

How to do distributed training:

Load the data and prepare the graph partitions

import dgl

g = ...  # load the DGLGraph object with nebula-dgl
# Split g into 2 partitions named 'mygraph' and write them under 'data_root_dir'.
dgl.distributed.partition_graph(g, 'mygraph', 2, 'data_root_dir')

This writes the partitioned graph to disk with the following layout:

data_root_dir/
  |-- mygraph.json          # metadata JSON. File name is the given graph name.
  |-- part0/                # data for partition 0
  |  |-- node_feats.dgl     # node features stored in binary format
  |  |-- edge_feats.dgl     # edge features stored in binary format
  |  |-- graph.dgl          # graph structure of this partition stored in binary format
  |
  |-- part1/                # data for partition 1
     |-- node_feats.dgl
     |-- edge_feats.dgl
     |-- graph.dgl
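
Before copying the partitions to other machines, it can help to confirm that every file in the layout above was actually written. The following is a minimal sketch using only the standard library; the helper name `missing_partition_files` is hypothetical (not part of DGL or nebula-dgl), and the file names are simply the ones listed in the tree above.

```python
from pathlib import Path

# File names taken from the partition layout listed above.
EXPECTED_FILES = ("node_feats.dgl", "edge_feats.dgl", "graph.dgl")


def missing_partition_files(root, graph_name, num_parts):
    """Return the expected partition paths that do not exist under root.

    Hypothetical helper: it only checks that files are present and does
    not open or validate their contents.
    """
    root = Path(root)
    # The metadata JSON is named after the graph.
    expected = [root / f"{graph_name}.json"]
    # Each partition lives in its own part<N>/ directory.
    for part_id in range(num_parts):
        for name in EXPECTED_FILES:
            expected.append(root / f"part{part_id}" / name)
    return [path for path in expected if not path.exists()]
```

For the example above, `missing_partition_files("data_root_dir", "mygraph", 2)` should return an empty list once partitioning has finished.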

For more details, see the reference docs:

ref:

Prepare the distributed training environment

- Create a cluster of machines.
- Upload the training script and the partitioned data to every machine in the cluster.
  - Consider a shared file system such as NFS or JuiceFS so that all servers can read the data easily.
- Set up SSH access: prepare an SSH public key to enable password-less SSH authentication between the machines.
- Launch the training job.

ref:

https://docs.dgl.ai/en/1.1.x/tutorials/dist/1_node_classification.html#set-up-distributed-training-environment
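
The launch step can be sketched roughly as follows, assuming the job is started with the `launch.py` script from DGL's `tools` directory as described in the tutorial linked above. The helper names, IP addresses, paths, and the training script name `train_dist.py` are placeholders for illustration only; check the flags against the DGL version you have installed. The sketch only builds the IP config file and the command string, it does not run anything.

```python
import shlex
from pathlib import Path


def write_ip_config(path, ips):
    """Write one machine IP per line (assumed format for the launcher)."""
    Path(path).write_text("".join(f"{ip}\n" for ip in ips))


def build_launch_command(workspace, part_config, ip_config, train_cmd,
                         launcher="dgl/tools/launch.py",
                         num_trainers=1, num_samplers=0, num_servers=1):
    """Compose a shell command for DGL's launch script.

    Hypothetical helper; the launcher path and default counts are assumptions.
    """
    argv = [
        "python3", launcher,
        "--workspace", workspace,          # directory present on every machine
        "--num_trainers", str(num_trainers),
        "--num_samplers", str(num_samplers),
        "--num_servers", str(num_servers),
        "--part_config", part_config,      # metadata JSON from partition_graph
        "--ip_config", ip_config,          # file written by write_ip_config
        train_cmd,                         # command run on each machine
    ]
    return shlex.join(argv)


if __name__ == "__main__":
    # Placeholder IPs for a two-machine cluster, one per partition.
    write_ip_config("ip_config.txt", ["10.0.0.1", "10.0.0.2"])
    print(build_launch_command(
        workspace="~/workspace",
        part_config="data_root_dir/mygraph.json",
        ip_config="ip_config.txt",
        train_cmd="python3 train_dist.py",
    ))
```

The printed command is meant to be run from one machine that can reach all the others over password-less SSH.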
