Autocast #1235

haytham2597 · 2024-02-11T19:39:44Z

I will soon try to implement AMP (Automatic Mixed Precision) with GradScaler.

haytham2597 · 2024-02-11T19:42:50Z

@dotnet-policy-service agree

NiklasGustafsson · 2024-02-12T15:44:40Z

src/Native/LibTorchSharp/THSTorch.cpp

+int8_t THSTorch_get_autocast_gpu_dtype()
+{
+    //TODO: Implement AUTOCAST AMP AND GRADSCALER


Is this a work-in-progress PR, or something you're submitting for approval and merging? If the latter, then please create an issue to track "to do" items and add some unit tests.

haytham2597 (reply):

You can merge this if you want; it doesn't break anything (as far as I know), and it may be useful for someone who wants to call the autocast functions manually. My plan is to build AMP and GradScaler on top of the functions I added.
Thanks, I will take the "to do" issue and the unit tests into account. Sorry.

NiklasGustafsson · 2024-02-12T15:45:20Z

@haytham2597 -- thank you for your first PR! Much appreciated. Please see the comment I made in the review.

haytham2597 · 2024-02-18T18:43:38Z

Do not merge yet; I still have some issues to fix.

NiklasGustafsson · 2024-02-20T17:54:17Z

Lots of errors in the build on everything except the .NET FX builds (which don't have System.Range):

https://dev.azure.com/dotnet/TorchSharp/_build/results?buildId=103093&view=logs&j=80b813b5-9a08-5859-11a8-dc0e5b556e52&t=d3977768-5d05-5555-eccf-169680cb7093

lintao185 · 2024-04-08T07:45:29Z

I am very happy to see this proposal.

NiklasGustafsson · 2024-04-18T20:07:17Z

@haytham2597 -- just a gentle ping! I think this PR would be very valuable, but it's still a draft, and thus I will not merge it. I also had some comments in my review.

haytham2597 · 2024-04-19T02:40:54Z

> @haytham2597 -- just a gentle ping! I think this PR would be very valuable, but it's still a draft, and thus I will not merge it. I also had some comments in my review.

Yeah, sorry, I am very busy with studies and work. I need to manage my time very carefully to make progress on this pull request; it is very useful for me too.
But I can share some ideas about it if you want to continue.

While execution is inside the autocast scope, tensors are automatically converted to the autocast dtype.
For example:

torch.Tensor a;  // an existing float32 tensor
using (var ac = torch.NewAutocast()) {  // proposed API, modeled on torch.NewDisposeScope()
    torch.Tensor b = a;
    torch.Tensor c = torch.arange(...);
}

b and c should be automatically converted to float16 (if that is the mixed-precision dtype, coming from float32), including all weights and biases of any modules found inside the scope. For example, a ResNet used inside the scope should be switched to mixed precision.

The idea is very similar to what you already do with:

using (var d = torch.NewDisposeScope())

Outside the scope, tensors need to go back to their original dtype, because (as I understand it) the network should run backward with the original dtype.
With my external THS_Autocast you can also determine which dtype should be used and whether autocast is enabled or disabled.
I don't know if I explained myself correctly, but feel free to ask.
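
The scope idea described above can be sketched as a context manager that switches an autocast flag and target dtype on entry and restores the previous values on exit. This is only a toy model under stated assumptions: `AutocastState`, `autocast_scope` and `result_dtype` are hypothetical names, dtypes are plain strings, and nothing here is TorchSharp's or PyTorch's actual API.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class AutocastState:
    """Hypothetical global autocast state (not a real library type)."""
    enabled: bool = False
    dtype: str = "float32"


_state = AutocastState()


@contextmanager
def autocast_scope(dtype="float16", enabled=True):
    # Remember the outer state so nested scopes unwind correctly.
    previous = AutocastState(_state.enabled, _state.dtype)
    _state.enabled, _state.dtype = enabled, dtype
    try:
        yield _state
    finally:
        # Always restore, even if the body raised.
        _state.enabled, _state.dtype = previous.enabled, previous.dtype


def result_dtype(input_dtype):
    """Dtype a new result would get under the current state (simplified)."""
    return _state.dtype if _state.enabled else input_dtype


with autocast_scope("float16"):
    inside = result_dtype("float32")
outside = result_dtype("float32")
print(inside, outside)
```

Restoring in a `finally` block plays the same role as `Dispose()` at the end of a C# `using` block: the outer state comes back regardless of how the scope is left.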

NiklasGustafsson · 2024-04-19T16:22:31Z

Yeah, no pressure!

We all have other things to do, so I understand completely. Just wanted to let you know we haven't forgotten about your work, and that it will be appreciated, if and when you find time.

GilesBathgate · 2024-06-10T14:46:26Z

I would also like to see this completed. It should help with #1136 as well.

ingted · 2024-06-19T16:43:17Z

Really need this!! Thank you!!

haytham2597 · 2024-07-02T21:25:25Z

About AMP or Autocast: @NiklasGustafsson, do you know whether there is a single (or more abstract) method through which every tensor is obtained? Inside an Autocast scope, all tensors should be converted to Float16. The problem is that Tensor has a great many operations (sum, prod, some linalg, div, etc.), and I would have to cast the tensor to the specific ScalarType in every one of those methods. I would rather find one place to do that. I am thinking about using the Tensor's IntPtr and casting to that ScalarType on each access to it, because methods like prod and sum all go through that IntPtr. Is working with the tensor's IntPtr the best idea?

P.S.: I don't know why, but I can compile and yet cannot run the tests. Strange.

NiklasGustafsson · 2024-07-25T13:37:36Z

It looks like you're expecting the element type of 'b' to change after you exit the dynamic scope, is that right?

That would mean that you have to do the type conversion in place, at least from the perspective of the managed instance that 'b' refers to -- i.e. replace the handle to the native tensor rather than create a new managed instance. Is that what your code is doing?
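
The difference between swapping the native handle and creating a new managed instance can be illustrated with a small toy model. Everything here is hypothetical: `NativeTensor`, `ManagedTensor`, `to` and `to_` are stand-ins written for this sketch, not TorchSharp types or methods.

```python
class NativeTensor:
    """Toy stand-in for a native tensor; `dtype` is just a string here."""

    def __init__(self, data, dtype):
        self.data = list(data)
        self.dtype = dtype


class ManagedTensor:
    """Toy stand-in for a managed wrapper that owns a native handle."""

    def __init__(self, native):
        self.handle = native

    @property
    def dtype(self):
        return self.handle.dtype

    def to(self, dtype):
        # Out-of-place: a new native tensor AND a new managed wrapper.
        return ManagedTensor(NativeTensor(self.handle.data, dtype))

    def to_(self, dtype):
        # "In place" from the managed side: swap the handle, keep the wrapper.
        self.handle = NativeTensor(self.handle.data, dtype)
        return self


a = ManagedTensor(NativeTensor([1.0, 2.0], "float32"))
b = a                        # second variable, same managed instance

converted = a.to("float16")
print(b.dtype)               # float32: b still sees the old handle

a.to_("float16")
print(b.dtype)               # float16: the swap is visible through b
```

Only the handle swap changes what every existing variable observes, which is what a dtype change that outlives the scope would require.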

haytham2597 · 2024-07-25T14:47:51Z

@NiklasGustafsson

Yes, I am trying to change the dtype of b. But I think my code is not wrong, because of how the CUDA ops behave and the example in [§4].

A few hours ago I noticed that in my code I keep certain IntPtr values that can change. I keep the IntPtr of every instance, but the values outside and inside the scope always differ.

[§1]

//From https://github.com/dotnet/TorchSharp/blob/b032342a78435ba6eb197e4e7db53469ac176aa8/src/TorchSharp/Tensor/Tensor.Math.cs#L1289
public Tensor mul(Scalar target)
{
    //For example Handle = 0x168
    var res = THSTensor_mul_scalar(Handle, target.Handle); //Now res is 0x196
    if (res == IntPtr.Zero) { CheckForErrors(); }
    return new Tensor(res);
}

[§2]

//From my src/TorchSharp/Amp/AMPManager.cs
private void Revert()
{
    // Cast each recorded tensor handle to its recorded dtype.
    for (int i = 0; i < TensorsCasts.Count; i++) {
        var tc = TensorsCasts[i];
        tc.Handle = To(tc.Handle, tc.Dtype);
    }
}

As in my last comment, after b = b.mul(1); b has a completely new IntPtr. In other words, I am saving the tensor "references" the wrong way: they always change, so I can never revert this way. I think.

To hold the IntPtr in my code, I do this:
[§3]

//From src/TorchSharp/Tensor/Tensor.cs in internal Tensor(IntPtr handle)
if (AMPManager.GetInstance().IsEnabled) {
    this.handle = AMPManager.GetInstance().Work(handle, this.handle); //Can ignore second argument because i was testing other things
} else {
    this.handle = handle;
}

I'm getting dizzy, but I think that in these code examples (§1, for instance) 0x168 is no longer available; only 0x196 is.

Update:
[§4]

import torch

a_float32 = torch.rand((8, 8), device="cuda")
b_float32 = torch.rand((8, 8), device="cuda")
a_float32_mul = torch.rand((8, 8), device="cuda")
print(f"Dtype of a_float32 Before autocast: {a_float32.dtype}")
print(f"Dtype of a_float32_mul Before autocast: {a_float32_mul.dtype}")
with torch.autocast(device_type="cuda"):
    e_float16 = torch.mm(a_float32, b_float32)
    a_float32 = torch.mm(a_float32, b_float32)  # rebinds the name to the mm result
    a_float32_mul = a_float32_mul.mul(2)
    print(f"Dtype of e_float16: {e_float16.dtype}")
    print(f"Dtype of a_float32: {a_float32.dtype}")
    print(f"Dtype of a_float32_mul: {a_float32_mul.dtype}")

print(f"Dtype of a_float32 OUTSCOPE: {a_float32.dtype}")
print(f"Dtype of a_float32_mul OUTSCOPE: {a_float32_mul.dtype}")

Dtype of a_float32 Before autocast: torch.float32
Dtype of a_float32_mul Before autocast: torch.float32
Dtype of e_float16: torch.float16
Dtype of a_float32: torch.float16
Dtype of a_float32_mul: torch.float32
Dtype of a_float32 OUTSCOPE: torch.float16
Dtype of a_float32_mul OUTSCOPE: torch.float32

Only certain operators (like torch.mm) are cast to float16 under autocast; others, such as mul, keep their input dtype. Hmm, that means my code is not wrong. I should change the dtype only for certain operators, for example torch.mm.
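
The behavior observed in §4 suggests a per-operator policy rather than a blanket cast. A minimal sketch of such a policy follows, assuming a hypothetical `output_dtype` helper and a policy set that contains only the one operator confirmed by the §4 output (PyTorch's real list is much longer). Dtypes are plain strings, and only float32 inputs are treated as eligible, which is a simplification.

```python
# Ops that run in the autocast dtype. Only "mm" is confirmed by the §4
# output; this set is illustrative, not PyTorch's actual policy list.
LOWER_PRECISION_OPS = {"mm"}


def output_dtype(op, input_dtype, autocast_enabled, autocast_dtype="float16"):
    """Return the dtype an op's result would have under this toy policy."""
    eligible = input_dtype == "float32"  # simplification for this sketch
    if autocast_enabled and eligible and op in LOWER_PRECISION_OPS:
        return autocast_dtype
    return input_dtype


for op in ("mm", "mul"):
    print(op, output_dtype(op, "float32", autocast_enabled=True))
# mm float16
# mul float32
```

With a table like this, the cast decision can live at the operator call site instead of inside the Tensor constructor.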

Glad to be closer to AMP and GradScaler.

Conclusion: I need to read the documentation carefully and test thoroughly in Python.

NiklasGustafsson · 2024-07-25T14:53:13Z

b = b.mul(1);

What this statement does is overwrite the variable b with a completely new instance, both native and managed.

On the other hand:

b = b.mul_(1);

would do the multiplication in place, i.e. modify the existing instance:

public Tensor mul_(Tensor target)
{
        THSTensor_mul_(Handle, target.Handle);
        CheckForErrors();
        return this;
}
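
Why this matters for a manager that records handles can be shown with a toy model. `FakeTensor` and its fake handle numbers are hypothetical stand-ins written for this sketch; they only mimic the rebinding and in-place behavior described above.

```python
import itertools

_next_handle = itertools.count(0x168)  # arbitrary fake "addresses"


class FakeTensor:
    """Toy stand-in for a managed tensor wrapping a native handle."""

    def __init__(self, value):
        self.handle = next(_next_handle)
        self.value = value

    def mul(self, k):
        # Out-of-place: new native object, new managed wrapper, new handle.
        return FakeTensor(self.value * k)

    def mul_(self, k):
        # In-place: mutate the existing object; the handle is unchanged.
        self.value *= k
        return self


b = FakeTensor(3.0)
tracked = b                  # what a manager recorded earlier
original_handle = b.handle

b = b.mul_(2)
print(b is tracked, b.handle == original_handle)   # True True

b = b.mul(2)
print(b is tracked, b.handle == original_handle)   # False False
```

After the out-of-place call, the recorded handle still refers to the old tensor, so anything the manager later does to it no longer reaches the object that `b` names.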

NiklasGustafsson · 2024-10-25T17:04:30Z

@haytham2597:

This PR is still labeled 'Draft' -- how close do you think you're getting to having it ready to review and merge?

haytham2597 · 2024-10-25T19:27:44Z

> This PR is still labeled 'Draft' -- how close do you think you're getting to having it ready to review and merge?

I am close, but not there yet. I still need to write and test the GradScaler and figure out how to autocast a Module. I am also trying to use the BFloat16 type from LibTorch's c10, because some CPU operators (and GPU ones too) can run in BFloat16. And since, as we know, netstandard has no Half struct (only .NET 5 or newer does), I added a Half struct for targets older than .NET 5.

TODO:

C10::BFloat16 and Test
Finish and Test GradScaler
Test Half Struct for older Net
Autocast Cuda Ops
Autocast CPU Ops Bfloat16
Autocast Model, Sequential Module
Implement Test of TestGradScalingMultiple

Autocast

51d1d95

NiklasGustafsson requested changes Feb 12, 2024

haytham2597 added 3 commits February 17, 2024 19:17

Added some features

29b4900

Fix mistake gitignore

defd582

AMP

d532402

haytham2597 marked this pull request as draft February 18, 2024 18:47

haytham2597 added 2 commits February 18, 2024 21:21

Add Print Modules Still in progress

0b839db

Add some printing module

98cabfa

Fix some dotnet build. Need fix tests

669b4fa

haytham2597 added 4 commits June 30, 2024 19:39

Fast tensor accessor for ToArray()

3940414

Update local

3469d7a

fix local build dotnet

5062339

Fast ToArray() TensorAccessor

3a467af

haytham2597 added 6 commits July 2, 2024 18:28

Fast tensor accesor

18c7528

fix accesor for every types

728c9fb

GradScaler

a9a611a

Trying fix build for azure

4a406ec

Range sequential

280c8d5

AMPManager

3c42a87

haytham2597 added 9 commits September 3, 2024 02:57

gradscale, device cuda properties, etc.

d6a0c28

some gradscaler. Need grad_scale and found_inf attr in optimizer

21ce055

Merge branch 'main' of https://github.com/dotnet/TorchSharp

e9f34c8

update v2.4.0

c70b523

some advance

36b79b9

Improve autocastmode

376f4fb

Some Autocast f16, f32

9f4a48b

fix test jit, it is literally close

f84392b

Test and some improve on autocast

197c1e4

haytham2597 mentioned this pull request Oct 21, 2024

No way to copy a tensor from gpu to cpu to pre allocated array. #1388

Open

haytham2597 added 8 commits October 21, 2024 10:18

cross between tensors, improve grad scaler and add normalize dotnet#1382

061ec44

GELU approximate dotnet#1368

851a09e

Device Properties dotnet#462

16aba79

tensor backward function signature dotnet#1376

441bbdd

Half, Bfloat16

194a1f0

some fix THSCuda

63da9c2

fast copy tensor accessor

ce679e2

rollback sln

958a187

NiklasGustafsson and others added 3 commits October 25, 2024 10:15

Merge branch 'main' into fast_tensor_accesor

abe9990

Numel

0b20f13

Merge branch 'fast_tensor_accesor' of https://github.com/haytham2597/…

7df8e46

…TorchSharp into fast_tensor_accesor

haytham2597 added 4 commits October 26, 2024 12:38

original sln and fix some issue

1aa1f25

some

572bc3e

Test and fix some error

2c33985

trying fix comp THSCuda

5a6240c
