[DISCUSSION] fix float8 all-gather in FSDP2 + TP: DTensor(WeightWithDynamicFloat8CastTensor) #326
base: main
Conversation
```python
self.assertTrue(
    isinstance(colwise_param, DTensor)
    and isinstance(
        colwise_param._local_tensor, WeightWithDynamicFloat8CastTensor
    )
)
```
edited: without this PR, `torch.chunk` returns a bf16 tensor. FSDP2 happens after TP, thus it only sees `Float8Linear(weight=DTensor(_local_tensor=Tensor))`. With this PR, `torch.chunk` returns `WeightWithDynamicFloat8CastTensor`.
Can you explain where the bf16 came from?
Correcting my wording to be accurate: without this PR, `torch.chunk` returns a plain `Tensor` (can be fp32 or bf16) instead of `WeightWithDynamicFloat8CastTensor`.
```diff
@@ -81,6 +81,8 @@ def precompute_float8_dynamic_scale_for_fsdp(module: nn.Module) -> None:
     torch.ops.aten.as_strided.default,
     torch.ops.aten._to_copy.default,
     torch.ops.aten._pin_memory.default,
+    torch.ops.aten.split.Tensor,
```
`aten.split` comes from `torch.chunk` when it is called from `distribute_tensor` during TP init.

edited: @awgu curious if you still remember the reason for returning a plain `Tensor` from `torch.chunk` instead of `WeightWithDynamicFloat8CastTensor`. Is it for padding? Any concerns if I change `torch.chunk` to return `WeightWithDynamicFloat8CastTensor`?
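For readers following along, here is a minimal, self-contained sketch of the dispatch behavior under discussion. `MyFloat8CastWeight` and `_PRESERVE_OPS` are made-up stand-ins for `WeightWithDynamicFloat8CastTensor` and its op set, not code from this repo; the point is that `torch.chunk` lowers to `aten.split.Tensor`, so the subclass survives chunking only if that op gets re-wrapped in `__torch_dispatch__`:

```python
import torch
import torch.utils._pytree as pytree

# made-up stand-in for the op set touched by this diff; per the discussion,
# torch.chunk lowers to aten.split.Tensor, so listing it keeps the subclass alive
_PRESERVE_OPS = {
    torch.ops.aten.detach.default,
    torch.ops.aten.split.Tensor,
}

class MyFloat8CastWeight(torch.Tensor):
    """Toy wrapper subclass for illustration (not WeightWithDynamicFloat8CastTensor)."""

    __torch_function__ = torch._C._disabled_torch_function_impl

    @staticmethod
    def __new__(cls, tensor: torch.Tensor):
        return torch.Tensor._make_wrapper_subclass(
            cls,
            tensor.shape,
            dtype=tensor.dtype,
            device=tensor.device,
            requires_grad=tensor.requires_grad,
        )

    def __init__(self, tensor: torch.Tensor):
        self._tensor = tensor

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # run the op on the inner plain tensor
        unwrapped_args = pytree.tree_map_only(cls, lambda t: t._tensor, args)
        unwrapped_kwargs = pytree.tree_map_only(cls, lambda t: t._tensor, kwargs)
        out = func(*unwrapped_args, **unwrapped_kwargs)
        if func in _PRESERVE_OPS:
            # re-wrap the outputs so callers (e.g. distribute_tensor during TP init)
            # still see the subclass after chunking
            return pytree.tree_map_only(torch.Tensor, cls, out)
        # otherwise the subclass is silently dropped and callers get plain Tensors
        return out

w = MyFloat8CastWeight(torch.randn(8, 4))
shards = torch.chunk(w, 2, dim=0)  # hits aten.split.Tensor under the hood
print(type(shards[0]).__name__)    # "MyFloat8CastWeight" only because split is listed
```

`distribute_tensor` performs exactly this chunking when it shards the weight for TP, which is why adding `aten.split.Tensor` to the op list changes what FSDP2 later sees.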
> @awgu curious if you still remember the reason to return bf16 from torch.chunk.

I thought that the dtype and whether it is a `WeightWithDynamicFloat8CastTensor` are orthogonal. Do you mean the latter (whether it is a `WeightWithDynamicFloat8CastTensor` or not)?
I think originally I only added the ops that I saw I needed. Adding `aten.split` and `aten.clone` seems okay to me.
> whether it is a WeightWithDynamicFloat8CastTensor or not

Exactly, `WeightWithDynamicFloat8CastTensor` or not is the key. I edited my previous comments to say that right now `torch.chunk` returns a plain `Tensor`.

> I think originally I only added the ops that I saw I needed

Changing `torch.chunk` affects both TP and FSDP2. Will double-check FSDP2 after the change.
```python
elif isinstance(out, DTensor) and isinstance(
    out._local_tensor, Float8Tensor
):
    out._local_tensor._scale = scale
```
Not sure about this change yet, just want to have something sketchy to discuss first.
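To make the shape of this proposal a bit more concrete, a hedged sketch follows. The helper name `_write_back_scale`, the first `if` branch, and the import paths are assumptions for illustration; only the `elif` mirrors the diff above:

```python
from torch.distributed._tensor import DTensor
from float8_experimental.float8_tensor import Float8Tensor  # import path assumed

def _write_back_scale(out, scale):
    # assumed plain-Float8Tensor branch (not shown in the diff excerpt)
    if isinstance(out, Float8Tensor):
        out._scale = scale
    # the new branch: under TP the Float8Tensor is the DTensor's local shard,
    # so the freshly computed scale has to be written onto the local tensor
    elif isinstance(out, DTensor) and isinstance(out._local_tensor, Float8Tensor):
        out._local_tensor._scale = scale
```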
Drafting this PR for discussion, before having something landable.

We see 2 problems in float8 all-gather FSDP2 + TP:
- the weight is all-gathered in bf16 instead of float8, because FSDP2 does not see `WeightWithDynamicFloat8CastTensor`
- all-reduce for `weight`, but we expect all-reduce only for `input`

The crux is how we dispatch `torch.chunk`, which is called from `distribute_tensor` for TP init:
- without this PR, `torch.chunk` returns `Tensor`. FSDP2 happens after TP, thus it only sees `Float8Linear(weight=DTensor(_local_tensor=Tensor))` (see the check sketched after this list)
- with this PR, `torch.chunk` returns `WeightWithDynamicFloat8CastTensor`
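A quick way to see which case a run is in (a hedged diagnostic sketch; `model` is an assumed TP-parallelized `Float8Linear` model running under `torchrun`, not code from this PR):

```python
from torch.distributed._tensor import DTensor

# print the local-tensor type FSDP2 will see for each TP-sharded parameter:
# a plain Tensor without this PR, WeightWithDynamicFloat8CastTensor with it
for name, param in model.named_parameters():
    if isinstance(param, DTensor):
        print(name, type(param._local_tensor).__name__)
```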
Profiler trace without this PR: AR (all-reduce) for `input` -> AG (all-gather) -> 4 ARs for wq,k,v,o -> 1 AR for `input`. The 4 ARs for wq,k,v,o should not happen if we precompute amax/scales for `model.parameters()` after `opt.step()`.
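A minimal loop sketch of that intended usage (`model`, `opt`, and `dataloader` are assumptions, and the import path may differ from the repo's actual layout):

```python
# import path assumed for this sketch
from float8_experimental.fsdp_utils import precompute_float8_dynamic_scale_for_fsdp

for batch in dataloader:
    loss = model(batch).sum()
    loss.backward()
    opt.step()
    opt.zero_grad()
    # one batched amax/scale precompute over model.parameters(), so the per-weight
    # ARs (the 4 ARs for wq,k,v,o in the trace) should no longer be needed
    precompute_float8_dynamic_scale_for_fsdp(model)
```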