Add option to use tensor hooks for Dynamic Linear #198
Conversation
@@ -14,3 +14,8 @@
# this doesn't work with autocast + torch.compile + FSDP. Enabling this
# option is useful for safety, but not strictly necessary.
enable_pre_and_post_forward = True

# If True, dynamic linear uses hooks for activation casting
need to figure out if we want this
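For context, here is a minimal sketch of how a flag like this could gate hook registration versus in-forward casting. The flag name, the ToyDynamicLinear class, the hook name, and the float8 round-trip are illustrative assumptions, not the repo's actual API:

import torch
import torch.nn as nn

# If True, dynamic linear uses hooks for activation casting (illustrative flag)
use_activation_hooks = True


class ToyDynamicLinear(nn.Linear):
    def cast_to_float8_e4m3fn(self, x: torch.Tensor) -> torch.Tensor:
        # Stand-in for the real float8 cast; the real code would return a Float8Tensor.
        return x.to(torch.float8_e4m3fn).to(x.dtype)


def cast_x_to_float8_e4m3fn_pre_hook(module, args):
    # Forward pre-hook: the returned value replaces the module's positional args.
    return module.cast_to_float8_e4m3fn(args[0])


def make_linear(in_features: int, out_features: int) -> nn.Module:
    m = ToyDynamicLinear(in_features, out_features)
    if use_activation_hooks:
        m.register_forward_pre_hook(cast_x_to_float8_e4m3fn_pre_hook)
    # Otherwise the cast would live inside a modified forward() instead.
    return m


lin = make_linear(16, 16)
lin(torch.randn(4, 16)).sum().backward()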
@pytest.mark.parametrize(
    "linear_dtype", [torch.float16, torch.bfloat16, torch.float32]
)
@pytest.mark.parametrize("use_activation_hooks", [True, False])
There was some testing that was globbed together before; this splits the test into two.
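Roughly, the split parametrization could be exercised like this; only the decorators come from the diff above, and the test body is a hypothetical placeholder:

import pytest
import torch


@pytest.mark.parametrize(
    "linear_dtype", [torch.float16, torch.bfloat16, torch.float32]
)
@pytest.mark.parametrize("use_activation_hooks", [True, False])
def test_dynamic_linear(linear_dtype, use_activation_hooks):
    # Hypothetical body: build the dynamic linear in the requested dtype,
    # with or without activation hooks, and compare against a reference.
    x = torch.randn(4, 16, dtype=linear_dtype)
    assert x.dtype == linear_dtype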
Looks great!! I have only one comment: I think we can avoid the backward pre-hook by just using a forward hook plus the existing autograd function.
    return module.cast_to_float8_e4m3fn(args[0])


def cast_dldy_to_float8_e5m2_backward_pre_hook(module, grad_output):
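For readers unfamiliar with the signature, here is a hedged sketch of what a full backward pre-hook along these lines could look like; the float8 round-trip is a stand-in for the real Float8Tensor cast:

import torch
import torch.nn as nn


def cast_dldy_to_float8_e5m2_backward_pre_hook(module, grad_output):
    # Full backward pre-hook: runs before the module's backward is computed;
    # the returned tuple replaces grad_output. Round-tripping through
    # float8_e5m2 stands in for the real cast, which keeps the tensor in float8.
    return tuple(g.to(torch.float8_e5m2).to(g.dtype) for g in grad_output)


lin = nn.Linear(8, 8)
lin.register_full_backward_pre_hook(cast_dldy_to_float8_e5m2_backward_pre_hook)
lin(torch.randn(2, 8)).sum().backward()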
Testing my understanding: it looks to me like the torch.compile issue comes from the interaction between the backward pre-hook and the tensor subclass. Could this be solved by using module.register_forward_hook and then, inside the forward hook, calling y = self.cast_to_float8_e5m2_bw(y), just like the current casting? I believe this would solve the backward pre-hook issue, but I'm not sure whether it would hit subclass issues (hopefully not).
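If I'm reading the suggestion right, it would look roughly like the sketch below. ToyCastDLDYToE5M2 and the hook name are toy stand-ins for the existing autograd function (identity forward, gradient cast in backward), so the names and exact behavior are assumptions:

import torch


class ToyCastDLDYToE5M2(torch.autograd.Function):
    # Identity in forward; casts the incoming gradient to float8_e5m2 in
    # backward (round-tripped back so this toy example stays in high precision).
    @staticmethod
    def forward(ctx, y):
        return y

    @staticmethod
    def backward(ctx, dldy):
        return dldy.to(torch.float8_e5m2).to(dldy.dtype)


def cast_y_to_float8_e5m2_forward_hook(module, args, output):
    # Forward hook: the returned value replaces the module's output, so the
    # autograd function lands on the graph without any backward pre-hook.
    return ToyCastDLDYToE5M2.apply(output)


lin = torch.nn.Linear(8, 8)
lin.register_forward_hook(cast_y_to_float8_e5m2_forward_hook)
lin(torch.randn(2, 8)).sum().backward()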
Ahhh, great idea; unfortunately it is still erroring for compile.
I think that is a little cleaner; is that enough for a good interaction with DTensor?
I have this locally and can push it up, but I don't know which one ultimately gets us closer: forward_hook or full_backward_pre_hook.
Yeah, I think a forward_hook might be a little cleaner and easier for DTensor to interact with. We would need to actually try composing those hooks to see if there's any gap. Let's try to land the forward_hook approach in this PR; if we find we need full_backward_pre_hook instead of forward_hook later, we can always change it back if needed.
sounds good!
stamp to unblock, thanks for getting this to work!
@drisspg has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary
This is a duplicate of: #170
With more testing, ideally I think we wouldn't have the choice between hooks and modified forwards and would just use hooks. However, compile does not appear to support this yet.