[WIP] [spirv] Add Vulkan runtime. #118
Conversation
Awesome, thanks @denis0x0D!
I have quite a few inline comments. But generally, I think we should add more documentation and improve how errors are reported. :)
include/mlir/Support/VulkanRuntime.h (Outdated)
//===- VulkanRuntime.h - MLIR Vulkan runtime ------------------------------===//
Nit: -*- C++ -*-===//
at the end.
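For reference, the conventional LLVM header line (padded to 80 columns) would end like this:

//===- VulkanRuntime.h - MLIR Vulkan runtime -------------------*- C++ -*-===//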
include/mlir/Support/VulkanRuntime.h (Outdated)
// limitations under the License.
// =============================================================================
//
// This file specifies VulkanDeviceMemoryBuffer,
"This file provides a library for running a module on a Vulkan device." ?
include/mlir/Support/VulkanRuntime.h (Outdated)

#include <vulkan/vulkan.h>

using Descriptor = uint32_t;
Can you add more comments to this? Normally a descriptor includes two numbers. We only have one here. Would be nice to be clear on what it means.
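A sketch of the kind of clarification being asked for; whether the single number is the set index or the binding index is an assumption here, not taken from the patch:

#include <cstdint>

/// In Vulkan a resource variable is addressed by a (descriptor set, binding)
/// pair. This alias carries only one of the two numbers -- the binding index;
/// the owning descriptor set is tracked separately by the runtime.
using Descriptor = uint32_t;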
include/mlir/Support/VulkanRuntime.h (Outdated)

using Descriptor = uint32_t;

/// Represents device memory buffer.
" Struct containing information regarding a ..." ?
include/mlir/Support/VulkanRuntime.h (Outdated)
VkDeviceMemory deviceMemory;
};

/// Represents host memory buffer.
"Struct containing information regarding a ..."?
}

LogicalResult VulkanRuntime::run() {
  if (failed(vulkanCreateInstance())) {
if (... || ... || ...)
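A sketch of the chaining the reviewer is suggesting; only vulkanCreateInstance appears in the quoted diff, the other helper names here are assumptions:

LogicalResult VulkanRuntime::run() {
  // Bail out once if any setup step fails, instead of one `if` per call.
  if (failed(vulkanCreateInstance()) ||
      failed(vulkanCreatePhysicalDevice()) ||
      failed(vulkanCreateLogicalDeviceAndQueue()))
    return failure();
  // ... remaining pipeline creation and submission ...
  return success();
}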
VkApplicationInfo applicationInfo;
applicationInfo.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
applicationInfo.pNext = nullptr;
applicationInfo.pApplicationName = "Vulkan MLIR runtime";
MLIR Vulkan runtime
vkEnumeratePhysicalDevices(instance, &physicalDeviceCount, 0),
    "vkEnumeratePhysicalDevices");

llvm::SmallVector<VkPhysicalDevice, 0> physicalDevices(physicalDeviceCount);
1 ? Most hosts should just have one physical device I think.
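What the reviewer is suggesting, i.e. an inline capacity of one:

// Most hosts expose a single physical device, so keep inline storage for one;
// SmallVector only heap-allocates in the multi-GPU case.
llvm::SmallVector<VkPhysicalDevice, 1> physicalDevices(physicalDeviceCount);
vkEnumeratePhysicalDevices(instance, &physicalDeviceCount, physicalDevices.data());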
vkCreateDevice(physicalDevice, &deviceCreateInfo, 0, &device),
    "vkCreateDevice");

VkPhysicalDeviceMemoryProperties properties;
Let's initialize these local variables with = {}; otherwise MSAN might be unhappy.
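A minimal example of the zero-initialization being asked for:

// Value-initialize the struct so MSAN does not report reads of uninitialized
// bytes before the driver fills it in.
VkPhysicalDeviceMemoryProperties properties = {};
vkGetPhysicalDeviceMemoryProperties(physicalDevice, &properties);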
VkPhysicalDeviceMemoryProperties properties;
vkGetPhysicalDeviceMemoryProperties(physicalDevice, &properties);

for (uint32_t i = 0, e = properties.memoryTypeCount; i < e; ++i) {
Can you put comments here on what we are doing here so it's easier for others to follow?
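A sketch of the commented version being requested; the specific property flags are an assumption about what this loop is selecting:

// Pick a memory type that the host can map and that stays coherent without
// explicit flushes, so result buffers can be written and read back on the CPU.
const VkMemoryPropertyFlags required =
    VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT;
for (uint32_t i = 0, e = properties.memoryTypeCount; i < e; ++i) {
  if ((properties.memoryTypes[i].propertyFlags & required) == required) {
    memoryTypeIndex = i;
    break;
  }
}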
@antiagainst thanks a lot for the review! I will update the patch according to the comments.
include/mlir/Support/VulkanRuntime.h (Outdated)
//===- VulkanRuntime.h - MLIR Vulkan runtime ------------------------------===//
This file should not be in /Support.
@River707 thanks, I'll delete it; since it's just for test purposes, I think I can declare the function inside the test file.
Implements an initial version of the Vulkan runtime to test spirv::ModuleOp. Creates and runs a Vulkan computation pipeline.
Requirements:
- Vulkan capable device and drivers installed.
- Vulkan SDK installed and the VULKAN_SDK environment variable set.
How to build:
- Provide -DMLIR_VULKAN_RUNNER_ENABLED=ON as a cmake variable.
Force-pushed from e1220ee to 6ab4a57.
@antiagainst I've updated the patch according to your comments, and I have some thoughts about the error reporting machinery in particular and about the vulkan-runner in general.
Compile-time passes:
a. A pass which serializes the spir-v module into binary form and inserts it as an attribute, similar to GpuToCubinPass.
b. A pass which creates a global constant containing the spir-v binary, similar to GpuGenerateCubinAccessorsPass.
c. A final pass which converts the launch call into calls to Vulkan runtime wrappers, instruments calls for buffer registration into the Vulkan runtime, and so on.
The runtime part must consist of two pieces: Vulkan runtime wrappers over the actual runtime, and the actual Vulkan runtime which manages memory, descriptors, layouts, the pipeline, and so on. The Vulkan runtime wrappers and VulkanRuntimeManager could look like the sketch below.
In this case we can first register all the information the Vulkan runtime needs, such as resources (set, binding), the entry point, and the number of work groups; then create device memory buffers, descriptor set layouts, and the pipeline layout and bind it to the computation pipeline; create a command buffer; and finally submit the command buffer to the work queue. What do you think?
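The code snippets originally attached to "could look like this" did not survive; the following is only an illustrative sketch of such wrappers and a VulkanRuntimeManager, with every name and signature assumed rather than taken from the patch:

#include <cstdint>

// Hypothetical manager owning all Vulkan state, driven by C-callable wrappers
// that compiler-generated host code could call into.
class VulkanRuntimeManager {
public:
  void setResourceData(uint32_t setIndex, uint32_t bindingIndex, float *data,
                       uint32_t elementCount);
  void setEntryPoint(const char *entryPointName);
  void setNumWorkGroups(uint32_t x, uint32_t y, uint32_t z);
  void setShaderModule(const uint8_t *spirvBlob, uint32_t sizeInBytes);
  // Creates device buffers, descriptor set layouts, the pipeline layout and
  // the compute pipeline, records a command buffer, and submits it.
  void runOnVulkan();
};

static VulkanRuntimeManager &getVulkanRuntimeManager() {
  static VulkanRuntimeManager manager;
  return manager;
}

// C wrappers, suitable as lowering targets for launch calls.
extern "C" void mlirVulkanSetResourceData(uint32_t set, uint32_t binding,
                                          float *data, uint32_t count) {
  getVulkanRuntimeManager().setResourceData(set, binding, data, count);
}
extern "C" void mlirVulkanRun() { getVulkanRuntimeManager().runOnVulkan(); }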
namespace {

/// Vulkan runtime.
/// The purpose of this class is to run spir-v computaiton shader on Vulkan
nit: computaiton
-> computation
/// spir-v shader, number of work groups and entry point. After the creation of
/// VulkanRuntime, special methods must be called in the following
/// sequence: initRuntime(), run(), updateHostMemoryBuffers(), destroy();
/// each method in the sequence returns sussses or failure depends on the Vulkan
nit: sussses
-> success
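A usage sketch of the call sequence described in the quoted comment; the class and method names come from the diff, the error handling around them is assumed:

VulkanRuntime vulkanRuntime;
// Each step returns success or failure depending on the underlying Vulkan calls.
if (failed(vulkanRuntime.initRuntime()) || failed(vulkanRuntime.run()) ||
    failed(vulkanRuntime.updateHostMemoryBuffers()) ||
    failed(vulkanRuntime.destroy()))
  return failure();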
VkDevice device;
VkQueue queue;

/// Specifies VulkanDeviceMemoryBuffers devided into sets.
nit: devided
-> divided
return failure();
}

// Descriptor bindings devided into sets. Each descriptor binding
nit: devided
-> divided
TEST_F(RuntimeTest, SimpleTest) {
  // SPIRV module embedded into the string.
  // This module contains 4 resource variables devided into 2 sets.
nit: devided
-> divided
# https://cmake.org/cmake/help/v3.7/module/FindVulkan.html
if (NOT CMAKE_VERSION VERSION_LESS 3.7.0)
  find_package(Vulkan)
endif()
I'd add an else and a proper error message here.
I think the error message is at L29. :) The probe is not done yet here.
I see that there is a fallback, so throwing an error here is not possible indeed, but I still feel that "you're not using the right CMake version" should be surfaced at some point as a root cause of failure. Something like adding here an else branch:
if ("$ENV{VULKAN_SDK}" STREQUAL "")
  message(FATAL_ERROR "Please use at least CMake 3.7.0 or provide the VULKAN_SDK path as an environment variable")
endif()
uint32_t z{1};
};

// This is a temporary function and will be removed in the future.
Can you clarify the temporary aspect: there is a layering violation right now where you're poking at the implementation of a tool.
destroyResourceVarFloat(vars[1][1]);
destroyResourceVarFloat(fmulResult);
destroyResourceVarFloat(expected);
}
Can we avoid C++ unit tests and use regular lit+FileCheck testing please? I think that is the whole point of using a runner.
See the cuda testing for reference.
I think this was just a temporary step to have the functionality implemented incrementally. We don't have all the shims and conversions the CUDA side has yet. We already have >1k LOC here; with all the above, we are expecting another >1k LOC. So I'm fine with landing this as-is. But if you are very uncomfortable about this, I see two ways going forward: 1) remove the C++ tests and land the functionality; 2) leave this PR open and have follow-up commits implement the rest of the functionality so finally we can squash the intermediate state away.
I'm still of the same opinion as in September: build the right tool step by step, with the right testing at every step along the way.
Sorry for the late reply! On my side I just have a few more nits.
struct VulkanDeviceMemoryBuffer {
  BindingIndex bindingIndex{0};
  VkDescriptorType descriptorType{VK_DESCRIPTOR_TYPE_MAX_ENUM};
  VkDescriptorBufferInfo bufferInfo;
Zero-initialize the other fields too? (Using VK_NULL_HANDLE and other reasonable defaults.)
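A sketch of the zero-initialized struct the reviewer is describing; the buffer and deviceMemory fields are assumed from the rest of the diff:

#include <vulkan/vulkan.h>
#include <cstdint>

using BindingIndex = uint32_t;

struct VulkanDeviceMemoryBuffer {
  BindingIndex bindingIndex{0};
  VkDescriptorType descriptorType{VK_DESCRIPTOR_TYPE_MAX_ENUM};
  VkDescriptorBufferInfo bufferInfo{};         // aggregate, value-initialized
  VkBuffer buffer{VK_NULL_HANDLE};             // handles default to null
  VkDeviceMemory deviceMemory{VK_NULL_HANDLE};
};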
uint32_t z{1};
};

/// Struct containing information regarding to a descriptor set.
s/to//
namespace {

/// Vulkan runtime.
/// The purpose of this class is to run spir-v computaiton shader on Vulkan
s/spir-v/SPIR-V/
globally
void createResourceVarFloat(uint32_t id, uint32_t elementCount) {
  std::srand(unsigned(std::time(0)));
  float *ptr = new float[elementCount];
No. Actually it's fine for VulkanHostMemoryBuffer to hold a raw pointer. I was just saying that we should wrap these raw pointers in unique_ptr to avoid potential leaks. For example, you can let this function return the unique_ptr and assign it to a local variable in the test so we don't need to explicitly call delete later.
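A sketch of this suggestion; the VulkanHostMemoryBuffer layout shown here is an assumption:

#include <cstdint>
#include <cstdlib>
#include <memory>

struct VulkanHostMemoryBuffer { // assumed layout
  void *ptr = nullptr;
  uint32_t size = 0;
};

// Returns owning storage so the test keeps a unique_ptr and never calls
// delete[] manually; the non-owning view is written into `buffer`.
std::unique_ptr<float[]> createResourceVarFloat(uint32_t elementCount,
                                                VulkanHostMemoryBuffer &buffer) {
  auto data = std::make_unique<float[]>(elementCount);
  for (uint32_t i = 0; i < elementCount; ++i)
    data[i] = static_cast<float>(std::rand()) / RAND_MAX;
  buffer.ptr = data.get();
  buffer.size = elementCount * sizeof(float);
  return data;
}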
Re: #118 (comment)
Again, sorry for the delay in reviewing! Quite a few interesting things have happened in the meantime that relate to this. Our prototyping on the Vulkan runtime side, IREE, was open sourced. (I see you've already noticed that, because you posted questions there. ;-P) It is a reflection of how we actually want to approach this in a more Vulkan-native way. Compared to CUDA, which provides more developer-friendly and middle-level host API abstractions, Vulkan's host API is lower-level and much more verbose. We think that by modelling it in IR form (not literally, but selectively on core features), we can gain the benefits of running compiler optimizations on it. It is not fully proven yet but looks quite promising thus far. We are also thinking about how to structure MLIR core and IREE regarding the components. The SPIR-V dialect is in MLIR core right now, but the lowering from the high-level dialect used by IREE is not; it's inside IREE. That is partially because we don't have a proper modelling of the Vulkan host side in MLIR core. Here, different from CUDA again, a simple ...
Actually I just noticed that the PR wasn't updated by Denis recently; I got misled because it popped up in my email box after recent comments from kiszk@.
@kiszk @joker-eph @antiagainst @MaheshRavishankar thanks for the review! Actually I was thinking that IREE covers this PR as well and that this PR is not needed anymore, but if it is still relevant for testing the lowering, that sounds great to me!
IREE is an independent project, and I am not aware of a plan to integrate IREE within MLIR, so having an end-to-end codegen story and testing capability still seems important to me.
Sounds great to me! If it's OK, on the next iteration I'll rebase the current patch on trunk, update it according to the comments, delete the C++ unit tests, and leave this PR as WIP.
It seems to me that implementing a fully working mlir-vulkan-runner also requires one more PR to cover functionality similar to https://github.com/tensorflow/mlir/blob/master/lib/Conversion/GPUToCUDA/ConvertLaunchFuncToCudaCalls.cpp and to update the runtime part to be consistent with the new passes and mlir-vulkan-runner. I can cover that in future PRs.
+1 to what @joker-eph said in #118 (comment). @denis0x0D: what you said SGTM! Just wanted to let you know, we are working on enabling the lowering flow from the Linalg dialect to SPIR-V right now. I think that has a higher priority. We'll create issues for the tasks soon and would really appreciate it if you could help there first. :) Sorry for keeping this thread open for an even longer time, but since this is entirely new code it won't be much of a problem to rebase against the master branch later. Thanks!
Sounds great, I would like to help with it if possible!
Just catching up on all the discussion here now. Apologies for not paying more attention here. As everyone agrees here, having a mlir-vulkan-runner in MLIR core is extremely useful for testing, so thanks again @denis0x0D for taking this up. Just to provide more context, as @antiagainst mentioned, we have been trying to get all the pieces in place to convert from Linalg ops to the GPU dialect to the SPIR-V dialect. Some of the changes w.r.t. these will land soon (in a couple of days). I have also been making some fairly significant changes to the SPIR-V lowering infrastructure to make it easy to build upon (these should also land in a couple of days). While these changes don't directly affect the development of a vulkan-runner (the runner should only be concerned with the SPIR-V binary generated by the dialect), they do motivate some requirements on the vulkan-runner summarized below.
As structured right now, going to the GPU dialect will result in two sub-modules within the main module (see the module layout in the quoted reply below). The functions in the gpu.kernel_module with the gpu.kernel attribute are targeted for SPIR-V lowering. The lowering will create a new spv.module within the gpu.kernel_module.
So it would be useful if the vulkan runner can handle such a module. Given this I have some follow-up below on a previous comment:
I am not sure we can translate a module into spv.module directly, since the module contains both host and device side components. Do you mean extracting the spv.module from within a module and running it?
On Sat, Nov 23, 2019 at 11:45 PM MaheshRavishankar ***@***.***> wrote:

> Just catching up on all the discussion here now. Apologies for not paying more attention here. As everyone agrees here, having a mlir-vulkan-runner in MLIR core is extremely useful for testing. So thanks again @denis0x0D <https://github.com/denis0x0D> for taking this up. Just to provide more context, as @antiagainst <https://github.com/antiagainst> mentioned we have been trying to get all the pieces in place to convert from Linalg ops to GPU dialect to SPIR-V dialect. Some of the changes w.r.t. these will land soon (in a couple of days). I have also been making some fairly significant changes to the SPIR-V lowering infrastructure to make it easy to build upon (these should also land in a couple of days). While these changes don't directly affect the development of a vulkan-runner (the runner should only be concerned with the spir-v binary generated by the dialect), they do motivate some requirements on the vulkan-runner summarized below.
>
> 1. As structured right now, going to GPU dialect will result in two sub-modules within the main module:
>
>   module {
>     module attributes {gpu.container_module} {
>       // Host side code which will contain gpu.launch_func op
>     }
>     module @fmul_kernel attributes {gpu.kernel_module} {
>       // Kernel functions, i.e. FuncOps with the gpu.kernel attribute
>     }
>   }

As far as I can tell, at the moment the host code is in the module enclosing the kernel module, they aren't siblings:
https://github.com/tensorflow/mlir/blob/master/g3doc/Dialects/GPU.md#gpulaunch_func
Did I miss anything?

> The functions in the gpu.kernel_module with the gpu.kernel attribute are targeted for SPIR-V lowering. The lowering will create a new spv.module within the gpu.kernel_module:
>
>   module {
>     module attributes {gpu.container_module} {
>       // Host side code which will contain gpu.launch_func op
>     }
>     module @fmul_kernel attributes {gpu.kernel_module} {
>       spv.module {
>         ...
>       }
>       // Kernel functions, i.e. FuncOps with the gpu.kernel attribute
>     }
>   }

Why is the spv.module nested inside the kernel_module instead of lowering the kernel_module into a spv.module?
What do the FuncOps with the gpu.kernel attribute become in your representation above?
(Is there an example I could follow in the repo maybe?)

Thanks,
-- Mehdi

> So it would be useful if the vulkan runner can handle such a module. Given this I have some follow-up below on a previous comment:
>
> @joker-eph <https://github.com/joker-eph> @antiagainst <https://github.com/antiagainst>
> having end-to-end codegen story and testing capability seems still important to me.
> sounds great to me!
> If it's ok, on the next iteration I'll rebase current patch on trunk, update according to the comments, delete c++ unit tests and leave this PR under WIP.
> After that I can start to work on a separate PR, the PR will implement a pass which:
> 1. Translates module into spv.module.
>
> I am not sure we can translate a module into spv.module directly since the module contains both host and device side components. Do you mean extracting spv.module from within a module and running them?
>
> 2. Serializes spv.module into binary form.
> 3. Sets serialized module as an attribute.
> This is similar to what GpuKernelToCubinPass does: https://github.com/tensorflow/mlir/blob/master/lib/Conversion/GPUToCUDA/ConvertKernelFuncToCubin.cpp
> Seems to me, to implement fully working mlir-vulkan-runner also requires one more PR to cover similar functionality with https://github.com/tensorflow/mlir/blob/master/lib/Conversion/GPUToCUDA/ConvertLaunchFuncToCudaCalls.cpp and to update runtime part to be consistent with new passes and mlir-vulkan-runner. I can cover it on the future PRs.
> What do you think? If you see the better plan, please fix me in this case. Thanks!
@MaheshRavishankar thanks for the feedback. I was wrong about translating to spv.module inside this pass, sorry about it, I lost the context of this task :) So to enable ...
The host part is lowered to LLVM IR (the llvm dialect), the serialized module becomes a ..., so as a result after the second pass we get a ...
Thanks @denis0x0D. Overall what you are suggesting makes sense to me, but I am not fluent enough in Vulkan speak to provide more feedback on specifics. But the interaction with the kernel-to-SPIR-V-binary compilation makes sense. @joker-eph: Thanks for catching my mistake. I edited my post to reflect the actual layout. The main point I was raising, though, was that you cannot serialize the entire module to a spv.module.
The conversion of a FuncOp with the gpu.kernel attribute to spv.module predates the existence of kernel_module. It is pretty straightforward (and would actually clean up the conversion a little bit) to lower a kernel_module to a spv.module. It is on my list of things to do; I will probably get to that in a day or so. @denis0x0D: I will try to send the change to lower kernel_module to spv.module as a pull request to the mlir github, so that might be something to look out for w.r.t. the vulkan runner. Thanks!
Sounds great to me! By the way, those conversions look great: https://github.com/tensorflow/mlir/tree/master/test/Conversion/GPUToSPIRV
Thanks @MaheshRavishankar for the good catch! I was actually implicitly thinking that way and didn't notice that it is not obvious to everyone. :) I think we are already creating the nested spv.module there (see mlir/lib/Conversion/GPUToSPIRV/ConvertGPUToSPIRVPass.cpp, lines 48 to 57 at d342ff9).
But yes, going directly from gpu.kernel_module instead of gpu.kernel is cleaner, and we can have multiple entry points in the same spv.module! (We need to do more verification along the way too.)
@antiagainst: At top of tree now it is added just before the FuncOp ... But the correct way is to convert the entire module into a spv.Module.
@denis0x0D: The conversion from GPU to SPIRV is changed now with 8c152c5.
@MaheshRavishankar thanks for mentioning this!
Late to the game, and I just wanted to comment that it would be really awesome if we had a lowering for the host side of the gpu dialect in the Vulkan/SPIR-V context as well. I think using a C++ library to implement a higher-level API should make the host-side code-generation part relatively straightforward. However, as far as I have been told, it still requires quite some code to write such a C++ library. I would not spend too much energy on the design of the C++ library's API and would just build what makes sense for easy code generation. The host side of the dialect will likely evolve a fair bit, so these abstractions won't be forever, and the mlir-vulkan-runner is not meant to replace a real runtime, so simplicity is more important.
That's the approach that has been taken in the current pull request. You'll need thousands of lines of code to bootstrap a basic Vulkan compute shader. Right now that's all hidden in a single entry point.
Force-pushed from 48dcae0 to 3722f03.
This patch is addressing #60.
Implements an initial version of the Vulkan runtime to test spirv::ModuleOp.
Creates and runs a Vulkan computation pipeline.
Provides one unit test that multiplies two spv.arrays of float types.
Requirements:
- Vulkan capable device and drivers installed.
- Vulkan SDK installed and the VULKAN_SDK environment variable set.
How to build:
- Provide -DMLIR_VULKAN_RUNNER_ENABLED=ON as a cmake variable.
Note: ...
@antiagainst @MaheshRavishankar can you please take a look?
Thanks!