Moar shader decompiler #559

Merged
merged 76 commits into from
Oct 19, 2024
Changes from 10 commits
90f4118
Renderer: Add prepareForDraw callback
wheremyfoodat Jul 24, 2024
a2b8a7b
Add fmt submodule and port shader decompiler instructions to it
wheremyfoodat Jul 24, 2024
251ff5e
Add shader acceleration setting
wheremyfoodat Jul 24, 2024
2f4c169
Hook up vertex shaders to shader cache
wheremyfoodat Jul 25, 2024
69accde
Merge branch 'master' into shader-decomp
wheremyfoodat Jul 25, 2024
efcb42a
Shader decompiler: Fix redundant compilations
wheremyfoodat Jul 25, 2024
d9f4f37
Shader Decompiler: Fix vertex attribute upload
wheremyfoodat Jul 25, 2024
2fc0922
Shader compiler: Simplify generated code for reading and faster compi…
wheremyfoodat Jul 25, 2024
2131838
Further simplify shader decompiler output
wheremyfoodat Jul 25, 2024
e8b4992
Shader decompiler: More smallen-ing
wheremyfoodat Jul 25, 2024
fd90cf7
Merge branch 'master' into shader-decomp
wheremyfoodat Jul 26, 2024
67ff1cc
Shader decompiler: Get PICA uniforms uploaded to the GPU
wheremyfoodat Jul 26, 2024
db64b0a
Shader decompiler: Readd clipping
wheremyfoodat Jul 26, 2024
9274a95
Merge branch 'master' into shader-decomp
wheremyfoodat Jul 26, 2024
67daf03
Shader decompiler: Actually `break` on control flow instructions
wheremyfoodat Jul 26, 2024
ff3afd4
Merge branch 'master' into shader-decomp
wheremyfoodat Jul 26, 2024
5eb15de
Shader decompiler: More control flow handling
wheremyfoodat Jul 26, 2024
a20982f
Shader decompiler: Fix destination mask
wheremyfoodat Jul 26, 2024
4470550
Shader Decomp: Remove pair member capture in lambda (unsupported on NDK)
wheremyfoodat Jul 27, 2024
37d7bad
Disgusting changes to handle the fact that hw shader shaders are 2x a…
wheremyfoodat Jul 28, 2024
9ee1c39
Shader decompiler: Implement proper output semantic mapping
wheremyfoodat Jul 28, 2024
6c738e8
Moar instructions
wheremyfoodat Jul 28, 2024
d125180
Shader decompiler: Add FLR/SLT/SLTI/SGE/SGEI
wheremyfoodat Jul 28, 2024
b3f35d8
Merge branch 'master' into shader-decomp
wheremyfoodat Jul 28, 2024
4040d88
Shader decompiler: Add register indexing
wheremyfoodat Jul 28, 2024
94bd060
Shader decompiler: Optimize mova with both x and y masked
wheremyfoodat Jul 28, 2024
59f4f23
Shader decompiler: Add DPH/DPHI
wheremyfoodat Jul 28, 2024
7209740
Fix shader caching being broken
wheremyfoodat Jul 28, 2024
0d6bef2
PICA decompiler: Cache VS uniforms
wheremyfoodat Jul 28, 2024
1c9df7c
Simply vertex cache code
wheremyfoodat Jul 28, 2024
53ee3f3
Simplify vertex cache code
wheremyfoodat Jul 28, 2024
b53df87
Merge branch 'shader-decomp' of https://github.com/wheremyfoodat/Pand…
wheremyfoodat Jul 28, 2024
ffcf352
Merge branch 'master' into shader-decomp
wheremyfoodat Jul 31, 2024
b46f7ad
Shader decompiler: Add loops
wheremyfoodat Jul 31, 2024
370aa8e
Merge branch 'master' into shader-decomp
wheremyfoodat Aug 7, 2024
c7371e3
Shader decompiler: Implement safe multiplication
wheremyfoodat Aug 7, 2024
1366e7a
Merge branch 'master' into shader-decomp
wheremyfoodat Aug 19, 2024
7e04ab7
Shader decompiler: Implement LG2/EX2
wheremyfoodat Aug 19, 2024
e481ce8
Shader decompiler: More control flow
wheremyfoodat Aug 19, 2024
943cf9b
Shader decompiler: Fix JMPU condition
wheremyfoodat Aug 19, 2024
30a6514
Merge branch 'master' into shader-decomp
wheremyfoodat Aug 20, 2024
73a5d44
Merge branch 'master' into shader-decomp
wheremyfoodat Aug 20, 2024
652b600
Shader decompiler: Convert main function to void
wheremyfoodat Aug 20, 2024
e13ef42
PICA: Start implementing GPU vertex fetch
wheremyfoodat Aug 20, 2024
cf31f7b
Merge branch 'master' into shader-decomp
wheremyfoodat Aug 23, 2024
74a341b
More hw VAO work
wheremyfoodat Aug 23, 2024
5d6f591
More hw VAO work
wheremyfoodat Aug 23, 2024
349de65
Merge branch 'shader-decomp' of https://github.com/wheremyfoodat/Pand…
wheremyfoodat Aug 24, 2024
a8b30ee
More GPU vertex fetch code
wheremyfoodat Aug 24, 2024
e34bdb6
Add GL Stream Buffer from Duckstation
wheremyfoodat Aug 24, 2024
f96b609
GL: Actually upload data to stream buffers
wheremyfoodat Aug 25, 2024
33e63f7
GPU: Cleanup immediate mode handling
wheremyfoodat Aug 25, 2024
5432a5a
Get first renders working with accelerated draws
wheremyfoodat Aug 25, 2024
e925a91
Shader decompiler: Fix control flow analysis bugs
wheremyfoodat Aug 25, 2024
37a43e2
HW shaders: Accelerate indexed draws
wheremyfoodat Aug 25, 2024
ca2d7e4
Shader decompiler: Add support for compilation errors
wheremyfoodat Aug 25, 2024
0c2ae1b
GLSL decompiler: Fall back for LITP
wheremyfoodat Aug 25, 2024
0e7697d
Add Renderdoc scope classes
wheremyfoodat Aug 25, 2024
e332ab2
Fix control flow analysis bug
wheremyfoodat Sep 2, 2024
15b6a9e
HW shaders: Fix attribute fetch
wheremyfoodat Sep 2, 2024
4a39b06
Rewriting hw vertex fetch
wheremyfoodat Sep 4, 2024
1642537
Stream buffer: Fix copy-paste mistake
wheremyfoodat Oct 4, 2024
09b0470
HW shaders: Fix indexed rendering
wheremyfoodat Oct 4, 2024
0a2bc7c
HW shaders: Add padding attributes
wheremyfoodat Oct 4, 2024
e3252ec
HW shaders: Avoid redundant glVertexAttrib4f calls
wheremyfoodat Oct 5, 2024
872a6ba
HW shaders: Fix loops
wheremyfoodat Oct 6, 2024
2b82f8b
Merge branch 'master' into shader-decomp
wheremyfoodat Oct 6, 2024
bb7b1b3
HW shaders: Make generated shaders slightly smaller
wheremyfoodat Oct 6, 2024
53097cc
Fix libretro build
wheremyfoodat Oct 6, 2024
2e1f31e
HW shaders: Fix android
wheremyfoodat Oct 6, 2024
0b28b4a
Remove redundant ubershader checks
wheremyfoodat Oct 6, 2024
ad788ea
Set accelerate shader default to true
wheremyfoodat Oct 19, 2024
9d98a3a
Shader decompiler: Don't declare VS input attributes as an array
wheremyfoodat Oct 19, 2024
cc2825e
Merge branch 'master' into shader-decomp
wheremyfoodat Oct 19, 2024
a681977
Change ubuntu-latest to Ubuntu 24.04 because Microsoft screwed up the…
wheremyfoodat Oct 19, 2024
bdf5429
fix merge conflict bug
wheremyfoodat Oct 19, 2024
3 changes: 3 additions & 0 deletions .gitmodules
@@ -76,3 +76,6 @@
[submodule "third_party/metal-cpp"]
path = third_party/metal-cpp
url = https://github.com/Panda3DS-emu/metal-cpp
[submodule "third_party/fmt"]
path = third_party/fmt
url = https://github.com/fmtlib/fmt
4 changes: 3 additions & 1 deletion CMakeLists.txt
@@ -93,6 +93,7 @@ if (NOT ANDROID)
target_link_libraries(AlberCore PUBLIC SDL2-static)
endif()

add_subdirectory(third_party/fmt)
add_subdirectory(third_party/toml11)
include_directories(${SDL2_INCLUDE_DIR})
include_directories(third_party/toml11)
@@ -255,6 +256,7 @@ set(HEADER_FILES include/emulator.hpp include/helpers.hpp include/termcolor.hpp
include/audio/miniaudio_device.hpp include/ring_buffer.hpp include/bitfield.hpp include/audio/dsp_shared_mem.hpp
include/audio/hle_core.hpp include/capstone.hpp include/audio/aac.hpp include/PICA/pica_frag_config.hpp
include/PICA/pica_frag_uniforms.hpp include/PICA/shader_gen_types.hpp include/PICA/shader_decompiler.hpp
include/PICA/pica_vert_config.hpp
)

cmrc_add_resource_library(
@@ -419,7 +421,7 @@ set(ALL_SOURCES ${SOURCE_FILES} ${FS_SOURCE_FILES} ${CRYPTO_SOURCE_FILES} ${KERN
target_sources(AlberCore PRIVATE ${ALL_SOURCES})

target_link_libraries(AlberCore PRIVATE dynarmic cryptopp glad resources_console_fonts teakra)
target_link_libraries(AlberCore PUBLIC glad capstone)
target_link_libraries(AlberCore PUBLIC glad capstone fmt::fmt)

if(ENABLE_DISCORD_RPC AND NOT ANDROID)
target_compile_definitions(AlberCore PUBLIC "PANDA3DS_ENABLE_DISCORD_RPC=1")
8 changes: 7 additions & 1 deletion include/PICA/gpu.hpp
@@ -13,6 +13,12 @@
#include "memory.hpp"
#include "renderer.hpp"

enum class ShaderExecMode {
Interpreter, // Interpret shaders on the CPU
JIT, // Recompile shaders to CPU machine code
Hardware, // Recompile shaders to host shaders and run them on the GPU
};

class GPU {
static constexpr u32 regNum = 0x300;
static constexpr u32 extRegNum = 0x1000;
@@ -45,7 +51,7 @@ class GPU {
uint immediateModeVertIndex;
uint immediateModeAttrIndex; // Index of the immediate mode attribute we're uploading

template <bool indexed, bool useShaderJIT>
template <bool indexed, ShaderExecMode mode>
void drawArrays();

// Silly method of avoiding linking problems. TODO: Change to something less silly
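Review note: the hunk above replaces the `bool useShaderJIT` template parameter with the three-valued `ShaderExecMode`. A minimal sketch of how a runtime mode could be routed to such a template follows. `drawArraysSketch`, `pickMode` and `dispatchDraw` are hypothetical names for illustration and are not taken from this PR; the real `GPU::drawArrays` body is not shown in this diff.

```cpp
// Mirrors the enum added in include/PICA/gpu.hpp
enum class ShaderExecMode {
	Interpreter,  // Interpret shaders on the CPU
	JIT,          // Recompile shaders to CPU machine code
	Hardware,     // Recompile shaders to host shaders and run them on the GPU
};

// Hypothetical stand-in for GPU::drawArrays. It returns a tag so the selected
// instantiation is observable: mode index * 2, plus 1 for indexed draws.
template <bool indexed, ShaderExecMode mode>
constexpr int drawArraysSketch() {
	return static_cast<int>(mode) * 2 + (indexed ? 1 : 0);
}

// Hypothetical policy: prefer hardware shaders when the renderer accepted the
// draw, otherwise fall back to the JIT or the interpreter.
constexpr ShaderExecMode pickMode(bool hwShaderOk, bool shaderJitEnabled) {
	if (hwShaderOk) return ShaderExecMode::Hardware;
	return shaderJitEnabled ? ShaderExecMode::JIT : ShaderExecMode::Interpreter;
}

// Turn the runtime mode into a compile-time template argument
template <bool indexed>
constexpr int dispatchMode(ShaderExecMode mode) {
	switch (mode) {
		case ShaderExecMode::Hardware: return drawArraysSketch<indexed, ShaderExecMode::Hardware>();
		case ShaderExecMode::JIT: return drawArraysSketch<indexed, ShaderExecMode::JIT>();
		default: return drawArraysSketch<indexed, ShaderExecMode::Interpreter>();
	}
}

// Same for the runtime "indexed" flag
constexpr int dispatchDraw(bool indexed, ShaderExecMode mode) {
	return indexed ? dispatchMode<true>(mode) : dispatchMode<false>(mode);
}
```

Using an enum instead of a second `bool` keeps the number of instantiations explicit (2 x 3 here) and lets each path be selected with `if constexpr` inside the draw function.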
31 changes: 31 additions & 0 deletions include/PICA/pica_vert_config.hpp
@@ -0,0 +1,31 @@
#pragma once
#include <array>
#include <cstring>
#include <type_traits>
#include <unordered_map>

#include "PICA/pica_hash.hpp"
#include "PICA/regs.hpp"
#include "bitfield.hpp"
#include "helpers.hpp"

namespace PICA {
// Configuration struct used as the key for the vertex shader cache
struct VertConfig {
PICAHash::HashType shaderHash;
PICAHash::HashType opdescHash;
u32 entrypoint;
bool usingUbershader;

bool operator==(const VertConfig& config) const {
// Hash function and equality operator required by std::unordered_map
return std::memcmp(this, &config, sizeof(VertConfig)) == 0;
}
};
} // namespace PICA

// Override std::hash for our vertex config class
template <>
struct std::hash<PICA::VertConfig> {
std::size_t operator()(const PICA::VertConfig& config) const noexcept { return PICAHash::computeHash((const char*)&config, sizeof(config)); }
};
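Review note: `VertConfig` is compared with `memcmp` and hashed over its raw bytes, so any padding bytes take part in both operations. The sketch below shows the same key pattern under one stated assumption: the flag is stored as a 32-bit integer so the struct has no padding, which `std::has_unique_object_representations_v` checks at compile time. `VertKeySketch` and the FNV-1a hash are illustrative stand-ins; the PR itself uses a `bool` field and `PICAHash::computeHash`, whose implementation is not part of this diff.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <unordered_map>

// Hypothetical padding-free variant of PICA::VertConfig
struct VertKeySketch {
	std::uint64_t shaderHash;
	std::uint64_t opdescHash;
	std::uint32_t entrypoint;
	std::uint32_t usingUbershader;  // 0 or 1; widened from bool to avoid tail padding

	// Byte-wise equality is well defined here because every byte belongs to a field
	bool operator==(const VertKeySketch& other) const { return std::memcmp(this, &other, sizeof(VertKeySketch)) == 0; }
};

static_assert(std::is_trivially_copyable_v<VertKeySketch>);
static_assert(std::has_unique_object_representations_v<VertKeySketch>, "key must not contain padding");

// Hash the raw bytes of the key. FNV-1a is used here only as a simple stand-in hash.
template <>
struct std::hash<VertKeySketch> {
	std::size_t operator()(const VertKeySketch& key) const noexcept {
		const auto* bytes = reinterpret_cast<const unsigned char*>(&key);
		std::uint64_t hash = 14695981039346656037ull;  // FNV offset basis
		for (std::size_t i = 0; i < sizeof(key); i++) {
			hash ^= bytes[i];
			hash *= 1099511628211ull;  // FNV prime
		}
		return static_cast<std::size_t>(hash);
	}
};
```

If the `bool` field is kept, zero-filling the whole struct (for example with `std::memset`) before assigning its fields is the usual way to keep the padding bytes in a known state in practice.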
10 changes: 5 additions & 5 deletions include/PICA/shader.hpp
@@ -107,6 +107,11 @@ class PICAShader {
alignas(16) std::array<vec4f, 16> inputs; // Attributes passed to the shader
alignas(16) std::array<vec4f, 16> outputs;
alignas(16) vec4f dummy = vec4f({f24::zero(), f24::zero(), f24::zero(), f24::zero()}); // Dummy register used by the JIT

// We use a hashmap for matching 3DS shaders to their equivalent compiled code in our shader cache in the shader JIT
// We choose our hash type to be a 64-bit integer by default, as the collision chance is very tiny and generating it is decently optimal
// Ideally we want to be able to support multiple different types of hash depending on compilation settings, but let's get this working first
using Hash = PICAHash::HashType;

protected:
std::array<u32, 128> operandDescriptors;
@@ -125,11 +130,6 @@ class PICAShader {
std::array<CallInfo, 4> callInfo;
ShaderType type;

// We use a hashmap for matching 3DS shaders to their equivalent compiled code in our shader cache in the shader JIT
// We choose our hash type to be a 64-bit integer by default, as the collision chance is very tiny and generating it is decently optimal
// Ideally we want to be able to support multiple different types of hash depending on compilation settings, but let's get this working first
using Hash = PICAHash::HashType;

Hash lastCodeHash = 0; // Last hash computed for the shader code (Used for the JIT caching mechanism)
Hash lastOpdescHash = 0; // Last hash computed for the operand descriptors (Also used for the JIT)

2 changes: 2 additions & 0 deletions include/PICA/shader_gen.hpp
@@ -30,6 +30,8 @@ namespace PICA::ShaderGen {
FragmentGenerator(API api, Language language) : api(api), language(language) {}
std::string generate(const PICA::FragmentConfig& config);
std::string getDefaultVertexShader();
// For when PICA shader acceleration is enabled. Turn the PICA shader source into a proper vertex shader
std::string getVertexShaderAccelerated(const std::string& picaSource, bool usingUbershader);

void setTarget(API api, Language language) {
this->api = api;
7 changes: 3 additions & 4 deletions include/PICA/shader_unit.hpp
@@ -2,10 +2,9 @@
#include "PICA/shader.hpp"

class ShaderUnit {

public:
PICAShader vs; // Vertex shader
PICAShader gs; // Geometry shader
public:
PICAShader vs; // Vertex shader
PICAShader gs; // Geometry shader

ShaderUnit() : vs(ShaderType::Vertex), gs(ShaderType::Geometry) {}
void reset();
6 changes: 4 additions & 2 deletions include/config.hpp
@@ -20,11 +20,13 @@ struct EmulatorConfig {
#else
static constexpr bool ubershaderDefault = true;
#endif

static constexpr bool accelerateShadersDefault = false;

bool shaderJitEnabled = shaderJitDefault;
bool discordRpcEnabled = false;
bool useUbershaders = ubershaderDefault;
bool accelerateShaders = accelerateShadersDefault;
bool accurateShaderMul = false;
bool discordRpcEnabled = false;

// Toggles whether to force shadergen when there's more than N lights active and we're using the ubershader, for better performance
bool forceShadergenForLights = true;
10 changes: 8 additions & 2 deletions include/renderer.hpp
@@ -21,9 +21,11 @@ enum class RendererType : s8 {
};

struct EmulatorConfig;
class GPU;
struct SDL_Window;

class GPU;
class ShaderUnit;

class Renderer {
protected:
GPU& gpu;
@@ -77,7 +79,11 @@ class Renderer {
virtual std::string getUbershader() { return ""; }
virtual void setUbershader(const std::string& shader) {}

virtual void setUbershaderSetting(bool value) {}
// This function is called on every draw call before parsing vertex data.
// It is responsible for things like looking up which vertex/fragment shaders to use, recompiling them if they don't exist, choosing between
// ubershaders and shadergen, and so on.
// Returns whether this draw is eligible for using hardware-accelerated shaders or if shaders should run on the CPU
virtual bool prepareForDraw(ShaderUnit& shaderUnit, bool isImmediateMode) { return false; }

// Functions for initializing the graphics context for the Qt frontend, where we don't have the convenience of SDL_Window
#ifdef PANDA3DS_FRONTEND_QT
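Review note: the base `prepareForDraw` returns `false`, so a backend that does not override it always keeps shaders on the CPU. A minimal sketch of that contract follows. The `*Sketch` types are hypothetical, and the fallback conditions in the override (immediate mode draws and shaders that could not be decompiled) are assumptions for illustration, not the logic of `RendererGL::prepareForDraw`, which is not shown in this diff.

```cpp
// Hypothetical stand-in for ShaderUnit: only records whether decompilation succeeded
struct ShaderUnitSketch {
	bool decompilable;
};

class RendererSketch {
  public:
	virtual ~RendererSketch() = default;

	// Default: this draw is not eligible for hardware-accelerated shaders
	virtual bool prepareForDraw(ShaderUnitSketch& shaderUnit, bool isImmediateMode) {
		(void)shaderUnit;
		(void)isImmediateMode;
		return false;
	}
};

class HwRendererSketch final : public RendererSketch {
  public:
	bool accelerateShaders = true;  // Mirrors the AccelerateShaders setting

	bool prepareForDraw(ShaderUnitSketch& shaderUnit, bool isImmediateMode) override {
		// Assumed policy: only accelerate regular draws whose shader decompiled successfully
		return accelerateShaders && !isImmediateMode && shaderUnit.decompilable;
	}
};
```

The caller can treat the return value as a per-draw decision, so a single failing shader only sends that draw back to the CPU path.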
50 changes: 44 additions & 6 deletions include/renderer_gl/renderer_gl.hpp
@@ -3,11 +3,14 @@
#include <array>
#include <cstring>
#include <functional>
#include <optional>
#include <span>
#include <unordered_map>
#include <utility>

#include "PICA/float_types.hpp"
#include "PICA/pica_frag_config.hpp"
#include "PICA/pica_vert_config.hpp"
#include "PICA/pica_hash.hpp"
#include "PICA/pica_vertex.hpp"
#include "PICA/regs.hpp"
@@ -28,9 +31,11 @@ class RendererGL final : public Renderer {
OpenGL::Program triangleProgram;
OpenGL::Program displayProgram;

OpenGL::VertexArray vao;
// VAO for when not using accelerated vertex shaders. Contains attribute declarations matching to the PICA fixed function fragment attributes
OpenGL::VertexArray defaultVAO;
// VAO for when using accelerated vertex shaders. The PICA vertex shader inputs are passed as attributes without CPU processing.
OpenGL::VertexArray hwShaderVAO;
OpenGL::VertexBuffer vbo;
bool enableUbershader = true;

// Data
struct {
@@ -53,6 +58,11 @@ class RendererGL final : public Renderer {
float oldDepthScale = -1.0;
float oldDepthOffset = 0.0;
bool oldDepthmapEnable = false;
// Set by prepareForDraw, tells us whether the current draw is using a hw-accelerated shader
bool usingAcceleratedShader = false;

// Cached pointer to the current vertex shader when using HW accelerated shaders
OpenGL::Shader* generatedVertexShader = nullptr;

SurfaceCache<DepthBuffer, 16, true> depthBufferCache;
SurfaceCache<ColourBuffer, 16, true> colourBufferCache;
@@ -75,7 +85,37 @@ class RendererGL final : public Renderer {
struct CachedProgram {
OpenGL::Program program;
};
std::unordered_map<PICA::FragmentConfig, CachedProgram> shaderCache;

struct ShaderCache {
std::unordered_map<PICA::VertConfig, std::optional<OpenGL::Shader>> vertexShaderCache;
std::unordered_map<PICA::FragmentConfig, OpenGL::Shader> fragmentShaderCache;

// Program cache indexed by GLuints for the vertex and fragment shader to use
// Top 32 bits are the vertex shader GLuint, bottom 32 bits are the fs GLuint
std::unordered_map<u64, CachedProgram> programCache;

void clear() {
for (auto& it : programCache) {
CachedProgram& cachedProgram = it.second;
cachedProgram.program.free();
}

for (auto& it : vertexShaderCache) {
if (it.second.has_value()) {
it.second->free();
}
}

for (auto& it : fragmentShaderCache) {
it.second.free();
}

programCache.clear();
vertexShaderCache.clear();
fragmentShaderCache.clear();
}
};
ShaderCache shaderCache;

OpenGL::Framebuffer getColourFBO();
OpenGL::Texture getTexture(Texture& tex);
@@ -110,15 +150,13 @@ class RendererGL final : public Renderer {
virtual bool supportsShaderReload() override { return true; }
virtual std::string getUbershader() override;
virtual void setUbershader(const std::string& shader) override;

virtual void setUbershaderSetting(bool value) override { enableUbershader = value; }
virtual bool prepareForDraw(ShaderUnit& shaderUnit, bool isImmediateMode) override;

std::optional<ColourBuffer> getColourBuffer(u32 addr, PICA::ColorFmt format, u32 width, u32 height, bool createIfnotFound = true);

// Note: The caller is responsible for deleting the currently bound FBO before calling this
void setFBO(uint handle) { screenFramebuffer.m_handle = handle; }
void resetStateManager() { gl.reset(); }
void clearShaderCache();
void initUbershader(OpenGL::Program& program);

#ifdef PANDA3DS_FRONTEND_QT
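Review note: the `programCache` comment in this file describes a 64-bit key with the vertex shader handle in the top 32 bits and the fragment shader handle in the bottom 32 bits. A minimal sketch of that packing follows. The function names are hypothetical, and `std::uint32_t` stands in for `GLuint` so the example builds without OpenGL headers.

```cpp
#include <cstdint>

// Stand-in for GLuint, which is a 32-bit unsigned integer
using ShaderHandle = std::uint32_t;

// Top 32 bits: vertex shader handle. Bottom 32 bits: fragment shader handle.
constexpr std::uint64_t makeProgramKey(ShaderHandle vertexShader, ShaderHandle fragmentShader) {
	return (static_cast<std::uint64_t>(vertexShader) << 32) | static_cast<std::uint64_t>(fragmentShader);
}

// Recover the two handles from a packed key
constexpr ShaderHandle vertexShaderOf(std::uint64_t key) { return static_cast<ShaderHandle>(key >> 32); }
constexpr ShaderHandle fragmentShaderOf(std::uint64_t key) { return static_cast<ShaderHandle>(key & 0xFFFFFFFFull); }
```

Widening the vertex handle to 64 bits before shifting matters: shifting a 32-bit value left by 32 is undefined behaviour in C++.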
2 changes: 2 additions & 0 deletions src/config.cpp
@@ -64,6 +64,7 @@ void EmulatorConfig::load() {
vsyncEnabled = toml::find_or<toml::boolean>(gpu, "EnableVSync", true);
useUbershaders = toml::find_or<toml::boolean>(gpu, "UseUbershaders", ubershaderDefault);
accurateShaderMul = toml::find_or<toml::boolean>(gpu, "AccurateShaderMultiplication", false);
accelerateShaders = toml::find_or<toml::boolean>(gpu, "AccelerateShaders", accelerateShadersDefault);

forceShadergenForLights = toml::find_or<toml::boolean>(gpu, "ForceShadergenForLighting", true);
lightShadergenThreshold = toml::find_or<toml::integer>(gpu, "ShadergenLightThreshold", 1);
@@ -135,6 +136,7 @@ void EmulatorConfig::save() {
data["GPU"]["UseUbershaders"] = useUbershaders;
data["GPU"]["ForceShadergenForLighting"] = forceShadergenForLights;
data["GPU"]["ShadergenLightThreshold"] = lightShadergenThreshold;
data["GPU"]["AccelerateShaders"] = accelerateShaders;

data["Audio"]["DSPEmulation"] = std::string(Audio::DSPCore::typeToString(dspType));
data["Audio"]["EnableAudio"] = audioEnabled;