
Add a CUDA viewcopy benchmark #721

Merged
bernhardmgruber merged 1 commit into alpaka-group:develop on Aug 15, 2023


bernhardmgruber (Member) commented Feb 28, 2023

This example only shows variations of naive, layout-oblivious copy kernels between different data layouts. No specialized, layout-aware copy routine is implemented.
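A minimal sketch of what a naive, layout-oblivious copy means, written as host-side C++ so it runs anywhere. The record type (three float fields) and the names `AoS`, `SoAMB`, `AoSoA<Lanes>` and `naiveCopy` are hypothetical stand-ins for the layouts in the tables, not the benchmark's actual code. Each layout only answers "where does field f of record i live?", and the copy goes through that query for every (record, field) pair. In a CUDA kernel, the outer loop over `i` would typically become one thread per record.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record with three float fields; the benchmark's real record type is not shown here.
constexpr int kFields = 3;

// Array of Structs: all fields of record i are adjacent in one blob.
struct AoS {
    std::vector<float> blob;
    explicit AoS(std::size_t n) : blob(n * kFields) {}
    float& at(std::size_t i, int f) { return blob[i * kFields + f]; }
};

// Struct of Arrays, multi-blob: one separate array per field.
struct SoAMB {
    std::array<std::vector<float>, kFields> blobs;
    explicit SoAMB(std::size_t n) { for (auto& b : blobs) b.resize(n); }
    float& at(std::size_t i, int f) { return blobs[f][i]; }
};

// Array of Structs of Arrays: records are grouped into blocks of Lanes;
// inside a block, the Lanes values of each field are contiguous.
template <std::size_t Lanes>
struct AoSoA {
    std::vector<float> blob;
    explicit AoSoA(std::size_t n) : blob(((n + Lanes - 1) / Lanes) * Lanes * kFields) {}
    float& at(std::size_t i, int f) {
        assert(f >= 0 && f < kFields);
        const std::size_t block = i / Lanes, lane = i % Lanes;
        return blob[block * Lanes * kFields + f * Lanes + lane];
    }
};

// Naive, layout-oblivious copy: every (record, field) pair goes through both
// mappings, regardless of how the source and destination are laid out.
template <typename Src, typename Dst>
void naiveCopy(Src& src, Dst& dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (int f = 0; f < kFields; ++f)
            dst.at(i, f) = src.at(i, f);
}
```

Because the copy never looks at the layouts' internals, an AoSoA8 -> AoSoA32 copy does the same per-field work as AoS -> SoA, even though a layout-aware routine could move whole contiguous runs of 8 values at once.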

bernhardmgruber (Member, Author) commented Feb 28, 2023

Dataset size:    1.5GB (x2)
GMemory size:    6.2GB
Max bandwidth: 313.0GiB/s
SMs: 30
src        -> dst        alg                ms      GiB/s hash
byte[]     -> byte[]     cudaMemcpy      9.930    275.374   OK
A AoS      -> SoA MB     naive          10.362    263.874   OK
A AoS      -> SoA MB     naive GS1D     13.594    201.142   OK
SoA MB     -> A AoS      naive          14.575    187.610   OK
SoA MB     -> A AoS      naive GS1D     17.755    154.007   OK
SoA MB     -> AoSoA32    naive          10.401    262.894   OK
SoA MB     -> AoSoA32    naive GS1D     13.684    199.822   OK
AoSoA32    -> SoA MB     naive          10.240    267.021   OK
AoSoA32    -> SoA MB     naive GS1D     14.054    194.559   OK
AoSoA8     -> AoSoA32    naive          10.508    260.210   OK
AoSoA8     -> AoSoA32    naive GS1D     13.841    197.561   OK
AoSoA32    -> AoSoA8     naive          10.261    266.482   OK
AoSoA32    -> AoSoA8     naive GS1D     13.597    201.105   OK
AoSoA8     -> AoSoA64    naive          10.288    265.782   OK
AoSoA8     -> AoSoA64    naive GS1D     13.678    199.904   OK
AoSoA64    -> AoSoA8     naive          10.252    266.726   OK
AoSoA64    -> AoSoA8     naive GS1D     13.763    198.683   OK
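The GiB/s column appears consistent with counting both the read of the source and the write of the destination (the "(x2)" in the header), with the dataset size printed in decimal GB and the bandwidth in binary GiB/s. For example, 275.374 GiB/s × 9.930 ms ≈ 2.734 GiB ≈ 2 × 1.468 GB, which rounds to the printed 1.5GB. This is inferred from the numbers, not taken from the benchmark source; the sketch below just encodes that formula, and `copyBandwidthGiBs` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Effective copy bandwidth in GiB/s, assuming `datasetBytes` are read from the
// source and another `datasetBytes` are written to the destination in `ms` milliseconds.
double copyBandwidthGiBs(std::uint64_t datasetBytes, double ms) {
    assert(ms > 0.0);
    const double bytesMoved = 2.0 * static_cast<double>(datasetBytes);  // read + write
    return bytesMoved / (ms / 1000.0) / (1024.0 * 1024.0 * 1024.0);
}
```

With a dataset of roughly 1.468e9 bytes and 9.930 ms, this yields about 275.4 GiB/s, matching the cudaMemcpy row.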

codecov bot commented Feb 28, 2023

Codecov Report

Merging #721 (b658a1b) into develop (829b57d) will decrease coverage by 0.01%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           develop     #721      +/-   ##
===========================================
- Coverage    98.79%   98.79%   -0.01%     
===========================================
  Files           75       75              
  Lines         7283     7282       -1     
===========================================
- Hits          7195     7194       -1     
  Misses          88       88              

bernhardmgruber (Member, Author) commented Aug 14, 2023

V100:

Dataset size:    7.3GB (x2)
GMemory size:   34.1GB
Max bandwidth: 836.4GiB/s
SMs: 80
Max threads per SM: 2048
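The "Max bandwidth" line matches the usual theoretical peak for double-data-rate memory: 2 × memory clock × bus width in bytes. Assuming the benchmark derives it that way, the V100's commonly reported 877 MHz memory clock and 4096-bit bus give about 898 GB/s ≈ 836.4 GiB/s. In CUDA, the two inputs would come from the `cudaDeviceProp` fields `memoryClockRate` (kHz) and `memoryBusWidth` (bits), filled by `cudaGetDeviceProperties`; the helper name `peakBandwidthGiBs` is hypothetical.

```cpp
#include <cassert>

// Theoretical peak DRAM bandwidth in GiB/s, assuming double data rate:
// 2 transfers per clock * clock [Hz] * bus width [bytes].
// The parameters mirror cudaDeviceProp::memoryClockRate (kHz) and
// cudaDeviceProp::memoryBusWidth (bits).
double peakBandwidthGiBs(int memoryClockRateKHz, int memoryBusWidthBits) {
    assert(memoryClockRateKHz > 0 && memoryBusWidthBits > 0);
    const double bytesPerSecond = 2.0 * memoryClockRateKHz * 1000.0 * (memoryBusWidthBits / 8.0);
    return bytesPerSecond / (1024.0 * 1024.0 * 1024.0);
}
```

On the device, this would be called as `peakBandwidthGiBs(prop.memoryClockRate, prop.memoryBusWidth)` after `cudaGetDeviceProperties(&prop, 0)`. Against that peak, cudaMemcpy at 727.044 GiB/s reaches about 87%.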

src        -> dst        alg                 ms      GiB/s hash
byte[]     -> byte[]     cudaMemcpy      18.805    727.044   OK
A AoS      -> SoA MB     naive 1D        25.266    541.110   OK
A AoS      -> SoA MB     naive 3D        27.566    495.960   OK
A AoS      -> SoA MB     naive GS 1D     29.514    463.232   OK
A AoS      -> SoA MB     naive GS 3D     93.297    146.541   OK
SoA MB     -> A AoS      naive 1D        47.524    287.683   OK
SoA MB     -> A AoS      naive 3D        52.509    260.371   OK
SoA MB     -> A AoS      naive GS 1D     76.178    179.472   OK
SoA MB     -> A AoS      naive GS 3D     83.028    164.665   OK
SoA MB     -> AoSoA32    naive 1D        19.911    686.659   OK
SoA MB     -> AoSoA32    naive 3D        22.057    619.854   OK
SoA MB     -> AoSoA32    naive GS 1D     28.440    480.728   OK
SoA MB     -> AoSoA32    naive GS 3D     73.554    185.875   OK
AoSoA32    -> SoA MB     naive 1D        20.664    661.638   OK
AoSoA32    -> SoA MB     naive 3D        22.065    619.607   OK
AoSoA32    -> SoA MB     naive GS 1D     30.250    451.959   OK
AoSoA32    -> SoA MB     naive GS 3D    105.816    129.205   OK
AoSoA8     -> AoSoA32    naive 1D        20.525    666.101   OK
AoSoA8     -> AoSoA32    naive 3D        22.680    602.809   OK
AoSoA8     -> AoSoA32    naive GS 1D     33.616    406.702   OK
AoSoA8     -> AoSoA32    naive GS 3D     94.332    144.933   OK
AoSoA32    -> AoSoA8     naive 1D        20.290    673.835   OK
AoSoA32    -> AoSoA8     naive 3D        25.422    537.788   OK
AoSoA32    -> AoSoA8     naive GS 1D     45.214    302.379   OK
AoSoA32    -> AoSoA8     naive GS 3D     55.649    245.681   OK
AoSoA8     -> AoSoA64    naive 1D        20.603    663.594   OK
AoSoA8     -> AoSoA64    naive 3D        22.759    600.733   OK
AoSoA8     -> AoSoA64    naive GS 1D     33.637    406.453   OK
AoSoA8     -> AoSoA64    naive GS 3D     97.774    139.832   OK
AoSoA64    -> AoSoA8     naive 1D        20.498    666.981   OK
AoSoA64    -> AoSoA8     naive 3D        24.875    549.616   OK
AoSoA64    -> AoSoA8     naive GS 1D     44.755    305.483   OK
AoSoA64    -> AoSoA8     naive GS 3D     54.623    250.296   OK
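The "GS" rows are presumably grid-stride variants; the thread does not spell this out, so treat that reading as an assumption. In a grid-stride loop, a fixed number of threads (here, as a hypothetical sizing choice, SMs × max threads per SM, i.e. 80 × 2048 = 163,840 threads on this V100) walks over all records with a stride equal to the total thread count, instead of launching one thread per record. The sketch simulates that index pattern on the host and reports how often each record is visited; `gridStrideVisits` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side simulation of a 1D grid-stride loop. In a CUDA kernel, `t` would be
// blockIdx.x * blockDim.x + threadIdx.x and `stride` would be gridDim.x * blockDim.x.
// Returns how often each of the n records is visited.
std::vector<int> gridStrideVisits(std::size_t n, std::size_t sms, std::size_t maxThreadsPerSM) {
    assert(sms > 0 && maxThreadsPerSM > 0);
    // Hypothetical sizing: exactly enough threads to fill every SM once.
    const std::size_t stride = sms * maxThreadsPerSM;
    std::vector<int> visits(n, 0);
    for (std::size_t t = 0; t < stride; ++t)         // one iteration per simulated thread
        for (std::size_t i = t; i < n; i += stride)  // the grid-stride loop itself
            ++visits[i];
    return visits;
}
```

Every record is visited exactly once regardless of how n compares to the thread count; with 1,000,003 records and 163,840 threads, each thread handles 6 or 7 records.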

bernhardmgruber (Member, Author) commented:

Nice, nvcc 11.3 gets an ICE (internal compiler error)...

Dropped: #766

bernhardmgruber merged commit bc2ecc3 into alpaka-group:develop on Aug 15, 2023
30 of 34 checks passed
bernhardmgruber deleted the copycuda branch on August 15, 2023, 12:44
bernhardmgruber changed the title from "Add a CUDA view copy benchmark" to "Add a CUDA viewcopy benchmark" on Nov 16, 2023