This repository has been archived by the owner on Feb 1, 2022. It is now read-only.

extract_xgbooost_cluster_env() and xgb.rabit.get_rank() return different rank numbers #106

Open
wulikai1993 opened this issue Nov 18, 2020 · 2 comments

Comments

@wulikai1993

wulikai1993 commented Nov 18, 2020

I ran distributed training on k8s.

The rank number was obtained by extract_xgbooost_cluster_env(), as in https://github.com/kubeflow/xgboost-operator/blob/master/config/samples/xgboost-dist/train.py#L29

However, xgb.rabit.get_rank() returned a different rank number, as in https://github.com/kubeflow/xgboost-operator/blob/master/config/samples/xgboost-dist/train.py#L57.

Two things confuse me:

  1. Since extract_xgbooost_cluster_env() has already obtained the rank number, why use xgb.rabit.get_rank() to get the rank number again?
  2. Why are the two rank numbers different?
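To narrow down both questions, it may help to log the two values side by side on every pod. The sketch below assumes the operator injects the pod rank through a `RANK` environment variable (an assumption about what extract_xgbooost_cluster_env() reads, not confirmed in this thread). It also assumes the Rabit tracker hands out its own ranks as workers connect, which would explain why the two numbers need not match. `operator_rank` and `describe_ranks` are hypothetical helper names, not part of the sample code.

```python
import os
from typing import Mapping, Optional


def operator_rank(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return the rank injected into the pod, or None if it is not set.

    The variable name RANK is an assumption for illustration.
    """
    value = env.get("RANK")
    return int(value) if value is not None else None


def describe_ranks(env_rank: Optional[int], rabit_rank: int) -> str:
    """Build a log line that makes a rank mismatch easy to spot."""
    if env_rank is None:
        return f"rabit rank {rabit_rank} (no operator rank found)"
    if env_rank == rabit_rank:
        return f"ranks agree: {rabit_rank}"
    return f"ranks differ: operator={env_rank}, rabit={rabit_rank}"


if __name__ == "__main__":
    # In a real worker the second argument would be xgb.rabit.get_rank(),
    # called after Rabit has been initialised; 1 is a stand-in value here.
    print(describe_ranks(operator_rank({"RANK": "0"}), 1))
```

If the two values regularly differ, anything that must agree with Rabit's own view of the cluster (data sharding, deciding which worker saves the model) would presumably have to use the Rabit rank.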
@asahalyft

Exactly. I also see the same problem today.

(base) asaha-mbp151:maven asaha$ kubectl logs -f xgboost-asaha-rfw4as3le3u-master-0 -n asaha
starting the train job
starting to extract system env
extract the Rabit env from cluster : xgboost-asaha-rfw4as3le3u-master-0, port: 9991, rank: 0, world_size: 3 
start the master node
start listen on 0.0.0.0:9991
###### RabitTracker Setup Finished ######
##### Rabit rank setup with below envs #####
DMLC_NUM_WORKER=3
DMLC_TRACKER_URI=xgboost-asaha-rfw4as3le3u-master-0
DMLC_TRACKER_PORT=9991
DMLC_TASK_ID=0
worker(ip_address=10.46.85.245) connected!
worker(ip_address=10.46.95.126) connected!
##### Rabit rank = 1
@tracker All of 3 nodes getting started
worker(ip_address=10.44.239.26) connected!
Read data from IRIS data source with range from 50 to 100
starting to train xgboost at node with rank 1

@terrytangyuan if you could please comment.
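For reference, the log above is consistent with the data shard being chosen by the Rabit rank rather than by the operator rank: the master pod reports rank 0 from the environment, yet it reads rows 50 to 100, which is what an even three-way split of the 150-row Iris data gives for rank 1. Below is a minimal sketch of such a split. `shard_range` is a hypothetical helper, and the actual train.py may compute the range differently.

```python
from typing import Tuple


def shard_range(rank: int, world_size: int, n_rows: int) -> Tuple[int, int]:
    """Return the [start, end) row range for one worker under an even split.

    The last worker also takes any leftover rows.
    """
    per_worker = n_rows // world_size
    start = rank * per_worker
    end = n_rows if rank == world_size - 1 else start + per_worker
    return start, end


if __name__ == "__main__":
    # Rank 1 of 3 workers over 150 rows, matching the values in the log.
    print(shard_range(1, 3, 150))
```

With this kind of split, two workers using the same rank would train on the same rows, so the rank that picks the shard needs to be unique across the job.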

@merlintang
Contributor

merlintang commented Mar 17, 2021 via email
