This repository has been archived by the owner on Jul 1, 2023. It is now read-only.

Expose all devices. #1059

Open
texasmichelle opened this issue Aug 10, 2020 · 1 comment
Labels
enhancement New feature or request

Comments

@texasmichelle
Member

On a machine with a GPU or TPU, I get a segfault if I try to use a Device of kind .CPU on the XLA backend, e.g.:

let device = Device(kind: .CPU, ordinal: 0, backend: .XLA)
let t1 = Tensor([1, 1, 0], on: device)
let t2 = Tensor([1, 1, 0], on: device)
t1 + t2
2020-08-10 15:43:18.077050: E tensorflow/compiler/xla/xla_client/tf_logging.cc:23] Check failed: it != device_contexts_.end() 
*** Begin stack trace ***
	
	
	
	
	copyTensor
	
	
	
	
	$sSa23withUnsafeBufferPointeryqd__qd__SRyxGKXEKlF
	$s10TensorFlow9XLATensorV4make__2onACSRyxG_SaySiGAA6DeviceVtAA13XLAScalarTypeRzlFZ
	$s10TensorFlow0A0V5shape7scalars2onACyxGAA0A5ShapeV_SRyxGAA6DeviceVtcfC
	
*** End stack trace ***
No such device: CPU:0
2020-08-10 15:43:18.077121: F tensorflow/compiler/xla/xla_client/tf_logging.cc:26] tensorflow/compiler/tf2xla/xla_tensor/tensor.cpp:419 : Check failed: it != device_contexts_.end() 
*** Begin stack trace ***
	
	
	
	
	copyTensor
	
	
	
	
	$sSa23withUnsafeBufferPointeryqd__qd__SRyxGKXEKlF
	$s10TensorFlow9XLATensorV4make__2onACSRyxG_SaySiGAA6DeviceVtAA13XLAScalarTypeRzlFZ
	$s10TensorFlow0A0V5shape7scalars2onACyxGAA0A5ShapeV_SRyxGAA6DeviceVtcfC
	
*** End stack trace ***
No such device: CPU:0
Current stack trace:
	frame #21: 0x00007fb3999eb113 $__lldb_expr218`main at <Cell 28>:2

A workaround is to set the XRT_DEVICE_MAP environment variable, but all device and backend combinations should be accessible without this.

See swift-models/#654.

@BradLarson
Contributor

BradLarson commented Aug 19, 2020

As examples of how these mappings are defined at the command line, here's how you would expose both the CPU and GPU as selectable devices (assuming a single CPU and GPU):

export XRT_DEVICE_MAP='CPU:0;/job:localservice/replica:0/task:0/device:XLA_CPU:0|GPU:0;/job:localservice/replica:0/task:0/device:XLA_GPU:0'

and here's how you would expose two GPUs (not exposing the CPU):

export XRT_DEVICE_MAP='GPU:0;/job:localservice/replica:0/task:0/device:XLA_GPU:0|GPU:1;/job:localservice/replica:0/task:0/device:XLA_GPU:1'

Currently, only one default device is found and exposed. If you want anything other than that default, you need to manually specify the XLA -> S4TF mapping for every device you want to use. The devices are parsed from the XRT_DEVICE_MAP environment variable in ParseEnvDevices. That may be the place to add CPU support on GPU-default systems, because we can safely assume a CPU is present there.
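
For machines with more devices, writing these strings by hand gets error-prone. Here is a minimal sketch of a shell helper that assembles the value, based only on the pattern visible in the two examples above: entries are joined with `|`, and each entry pairs an S4TF device name with an XLA device path, separated by `;`. The function name `build_xrt_device_map` and its arguments are hypothetical, and it assumes every device sits under `/job:localservice/replica:0/task:0`, which may not hold for other setups.

```shell
# Hypothetical helper (not part of S4TF): build an XRT_DEVICE_MAP value
# for an optional CPU plus N GPUs.
# Assumption: all devices live under /job:localservice/replica:0/task:0,
# as in the examples above.
build_xrt_device_map() {
  include_cpu="$1"   # "yes" to expose CPU:0, anything else to skip it
  gpu_count="$2"     # number of GPUs to expose, numbered from 0
  prefix='/job:localservice/replica:0/task:0/device'
  map=''
  if [ "$include_cpu" = "yes" ]; then
    map="CPU:0;${prefix}:XLA_CPU:0"
  fi
  i=0
  while [ "$i" -lt "$gpu_count" ]; do
    entry="GPU:${i};${prefix}:XLA_GPU:${i}"
    # Entries are separated by '|'; avoid a leading separator.
    if [ -n "$map" ]; then
      map="${map}|${entry}"
    else
      map="$entry"
    fi
    i=$((i + 1))
  done
  printf '%s\n' "$map"
}

# Expose the CPU and one GPU, which reproduces the first example above.
export XRT_DEVICE_MAP="$(build_xrt_device_map yes 1)"
```

Calling `build_xrt_device_map no 2` should yield the two-GPU string from the second example. The variable has to be set before the process that creates the XLA devices starts.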

3 participants