Signal, Action and Policy System #5

Open
7 of 19 tasks
mostafa opened this issue Dec 29, 2023 · 0 comments
Assignees
Labels
core Anything related to the gatewayd core project enhancement New feature or request


Problem

When proxying requests and responses between the client(s) and the database in the proxy object, the traffic hooks are called. These hooks allow the plugins to inspect and possibly modify the traffic, provided that they register to those hooks. The return value of a hook function contains two essential fields (key-value) in the req.Fields or resp.Fields:

  1. request (bytes): the binary request from the client(s).
  2. response (bytes): the binary response from the database or injected by the plugin(s).

The plugins can inspect and return a modified request and/or response. Then, GatewayD decides what to do with the modified request and/or response.

Plugins can also influence the traffic flow. To do so, they should return an extra field that signals GatewayD to terminate the request. The terminate field (boolean) is a "signal" that dictates an "action" to be taken by GatewayD.

GatewayD detects the terminate signal and decides whether to act on the signal or not based on a policy, called the termination policy. The termination policy takes either of the following values:

  1. stop (default): terminates the request and returns the response injected into the request.Fields by the plugin to the client.
  2. continue: disregards the signal and continues the traffic flow, thus forwarding the packet to the database.

The terminate signal should be injected into the request.Fields and returned by the OnTrafficFromClient hook function. Only the OnTrafficFromClient hook can return this signal (plus a normal or error response) to influence the traffic flow.

| Signal | Policy | Action |
| --- | --- | --- |
| terminate | Termination policy (stop/continue) | Terminate the request by returning a response |
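
The current behavior described above can be summarized in a minimal Go sketch. The type and function names (`TerminationPolicy`, `ShouldTerminate`) and the plain `map[string]any` standing in for request.Fields are assumptions made for illustration, not GatewayD's actual API.

```go
package main

import "fmt"

// TerminationPolicy mirrors the two policy values described above.
type TerminationPolicy string

const (
	Stop     TerminationPolicy = "stop"
	Continue TerminationPolicy = "continue"
)

// ShouldTerminate reports whether the request should be terminated.
// fields is a stand-in for the key-value pairs returned by the
// OnTrafficFromClient hook (an assumption, not the real type).
func ShouldTerminate(fields map[string]any, policy TerminationPolicy) bool {
	terminate, ok := fields["terminate"].(bool)
	if !ok || !terminate {
		// No signal (or a malformed one): keep forwarding to the database.
		return false
	}
	// The signal is present; the policy decides whether to act on it.
	return policy == Stop
}

func main() {
	fields := map[string]any{
		"terminate": true,
		"response":  []byte("error response injected by the plugin"),
	}
	fmt.Println(ShouldTerminate(fields, Stop))     // true
	fmt.Println(ShouldTerminate(fields, Continue)) // false
}
```

With the stop policy the injected response would go back to the client; with continue the signal is disregarded and the packet is forwarded.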

Solution

An action system should be developed to account for actions other than just termination of the request:

  • Actions, signals and policies: signals trigger actions and policies control actions. For example, the terminate signal is returned by the plugin's OnTrafficFromClient. The termination policy controls whether to run the action or not. Act system gatewayd#451.
  • Composable actions (workflows): the actions should be chainable and composable. For example, one might want to 1) drop a request, 2) log what happened and 3) send an alert to a system in a single call. Act system gatewayd#451.
  • Order of precedence: signals returned by the plugins should not be contradictory (forward and terminate) or they should follow a weight/order of precedence to determine which one overrules. Terminal (sync) actions always overrule non-terminal (sync) actions. Act system gatewayd#451.
  • Policies and actions: since policies control actions, there should be a way to write those policies. Policies are currently written in Expr language as implemented in Act system gatewayd#451.
  • Policy types: policies are either static or dynamic. Static policies are fixed and final, while dynamic policies can be changed. All policies can be overwritten by the users as implemented in Act system gatewayd#451.
  • Default policy: the default policy dictates the final decision if any of the following conditions is met (as implemented in Act system gatewayd#451):
    • There is no signal passed by the plugin.
    • None of the policies match the given signal and action.
    • The given signal doesn't have a matching action (the action doesn't exist).
  • New actions: new actions should be developed to help with multiplexing, routing, switching, load-balancing, and other possible cases. These actions can be extended by the plugins. That is, plugins should be able to implement both the hooks and the actions (or either of them). Make actions extensible via plugins and/or a scripting language gatewayd#467.
  • Types of actions: internal actions, like terminate, affect the traffic flow (and other hooks) and should always be synchronous, while external actions queue jobs and run them asynchronously via runners (plugins). External actions can be either sync (?) or async. Make actions extensible via plugins and/or a scripting language gatewayd#467.
  • Action registry: there should be a way to control which plugins get to return which signals and actions (like hook registry), so as to enable or disable the signal based on certain policies. The registry should also keep track of which plugins implement which actions, so that it can run those actions by calling those plugins. Partially implemented in Act system gatewayd#451. Make actions extensible via plugins and/or a scripting language gatewayd#467.
  • Actions in all methods: other hooks, and not just traffic hooks, should be able to return signals that trigger actions. Enable signals, policies and actions for all hook functions gatewayd#468.
  • Plugins extend actions: the plugins should be able to extend sync and async actions as a runner in addition to hooks, so that all the other plugins and themselves can return those actions from any hook, thus letting the runners run the action(s). Plugins should not be able to override internal actions, but they can extend them (can they?). Make actions extensible via plugins and/or a scripting language gatewayd#467.
  • Actions runner (queueing system): a queue should be used to let the action system run long-running actions. For example, the terminate action has a quick side-effect of terminating the request, yet calling an API might take more time (and possibly even fail), so it isn't ideal to run the action on the main goroutine. Note that the sync actions exposed by plugins should be called immediately and not queued. Add queueing for async actions gatewayd#464.
  • Distributed runner: the action runner can be on other systems and receive events from the queue. After processing them, it can publish the results to the same queue or just discard them. Multiple runners can listen for events on the other end of the queue to handle load. The (distributed) runner/worker should be a subcommand in gatewayd and it should support multiple queueing systems. Add distributed runner for running actions gatewayd#472.
  • Triggers: certain actions can trigger other actions. For example, when a request is terminated, we might also want to log it, update a metric and possibly trigger an alert and/or call a webhook to notify developers or security people of such an event. Implement triggers (chain of actions) gatewayd#469.
  • Pluggable policy engines: GatewayD will use Expr language internally, while being able to integrate with OPA and other policy engines. Integrate with policy engines gatewayd#470.
  • Other signal sources: the possibility of signals being received from other sources, and not only plugins, should be investigated. Investigate other signal sources gatewayd#471.
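
To make the interplay of signals, policies, the default policy and the order of precedence more concrete, here is a rough Go sketch. Everything in it (`Signal`, `Action`, `Policy`, `Registry`, `Decide`, `NewRegistry`, `Names`) is hypothetical and only illustrates the rules listed above; policies are plain Go functions here rather than Expr expressions, and this is not the implementation from gatewayd#451.

```go
package main

import "fmt"

// Signal is what a plugin returns from a hook (hypothetical shape).
type Signal struct {
	Name string
}

// Action describes what GatewayD would run for a signal.
type Action struct {
	Name     string
	Sync     bool // sync actions affect the traffic flow directly
	Terminal bool // terminal actions end the traffic flow
}

// Policy decides whether the action for a signal may run.
type Policy func(Signal) bool

// Registry maps signal names to actions and policies.
type Registry struct {
	Actions  map[string]Action
	Policies map[string]Policy
	Default  Action // used when nothing else applies
}

// Decide applies the policies to the signals and returns the actions to run.
func (r Registry) Decide(signals []Signal) []Action {
	var approved []Action
	hasTerminal := false
	for _, s := range signals {
		action, ok := r.Actions[s.Name]
		if !ok {
			continue // the signal has no matching action
		}
		policy, ok := r.Policies[s.Name]
		if !ok || !policy(s) {
			continue // no policy matches, or the policy rejects the signal
		}
		approved = append(approved, action)
		hasTerminal = hasTerminal || action.Terminal
	}
	if len(approved) == 0 {
		return []Action{r.Default} // the default policy makes the final decision
	}
	if !hasTerminal {
		return approved
	}
	// Terminal sync actions overrule non-terminal sync actions;
	// async actions (e.g. log) are kept.
	var out []Action
	for _, a := range approved {
		if a.Terminal || !a.Sync {
			out = append(out, a)
		}
	}
	return out
}

// NewRegistry builds a small example registry that allows every signal.
func NewRegistry() Registry {
	allow := func(Signal) bool { return true }
	passthrough := Action{Name: "passthrough", Sync: true}
	return Registry{
		Actions: map[string]Action{
			"passthrough": passthrough,
			"terminate":   {Name: "terminate", Sync: true, Terminal: true},
			"log":         {Name: "log"},
		},
		Policies: map[string]Policy{
			"passthrough": allow, "terminate": allow, "log": allow,
		},
		Default: passthrough,
	}
}

// Names returns the action names, for printing.
func Names(actions []Action) []string {
	names := make([]string, 0, len(actions))
	for _, a := range actions {
		names = append(names, a.Name)
	}
	return names
}

func main() {
	r := NewRegistry()
	contradicting := []Signal{{Name: "passthrough"}, {Name: "terminate"}, {Name: "log"}}
	fmt.Println(Names(r.Decide(contradicting)))               // [terminate log]
	fmt.Println(Names(r.Decide([]Signal{{Name: "unknown"}}))) // [passthrough]
	fmt.Println(Names(r.Decide(nil)))                         // [passthrough]
}
```

Keeping the async actions when a terminal action wins is one possible reading of the precedence rule; a stricter variant could drop everything except the terminal action.
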
Flowchart
```mermaid
flowchart LR
    subgraph GatewayD
        direction LR
        subgraph Proxy
            PassThroughToServer
        end
        PassThroughToServer -- calls --> h
        subgraph PX["Plugin X"]
            h["OnTrafficFromClient"] --returns--> v1S
        end
        subgraph v1S["*v1.Struct"]
            rFBt["req.Fields['signals'] = []Signal{
                Terminate({'value': true}),
                Log({'level':'debug','msg':'...'}),
                Call({'method':'get','url':'...'})}"]
        end
        rFBt --are--> Signals
        subgraph PE["Policy Engine"]
            subgraph TP["Terminate policy"]
                IT{"Signal.terminate == true"} -->|true| ST
                IT -->|false| sendToS["Send request to server"]
                ST{"Policy.terminate == 'stop'"} -->|true| sendToC["Send error/response to client"]
                ST -->|false| sendToS
            end
            subgraph LP["Log policy"]
                IL{"'log' in Signal"} -->|true| PL
                IL -->|false| X["Discard log signal"]
                PL{"Policy.log == true"} -->|true| Log
                PL -->|false| X
            end
            subgraph CP["Call policy"]
                IC{"'call' in Signal"} -->|true| PC
                IC -->|false| W["Discard call signal"]
                PC{"Policy.call == true"} -->|true| Y["Call an API"]
                PC -->|false| W
            end
        end
        Signals -. Signal .-> Actions
        Log -.- Async
        X -.- Async
        sendToC -.- Sync
        sendToS -.- Sync
        W -.- Async
        W -.- Sync
        Y -.- Async
        Signals -- passes through --> IT
        Signals -- passes through --> IL
        Signals -- passes through --> IC
        Actions --> Sync
        Actions --> Async
        Async --> Queue
        subgraph JQ["Job Queue"]
            Queue --> Consumer
            Consumer --> Worker
            Worker --> Result
        end
        Sync -->|sync| Worker
        Result -->|response| h
        h -->|response| PassThroughToServer
    end
    style Actions fill:#fff,color:black
    style Log fill:#fff,color:black
    style X fill:#fff,color:black
    style sendToC fill:#fff,color:black
    style sendToS fill:#fff,color:black
    style W fill:#fff,color:black
    style Y fill:#fff,color:black
    style GatewayD fill:#fff,color:black
    style PX fill:#fff,color:black
    style v1S fill:#fff,color:black
    style JQ fill:#fff,color:black
```
Sequence diagram
```mermaid
sequenceDiagram
    Client ->> GatewayD: sends a query
    GatewayD ->> Plugins: calls a hook (OnTrafficFromClient)
    Plugins ->> GatewayD: return single or multiple non-contradicting signal(s)
    GatewayD ->> GatewayD: signal is mapped to an action
    GatewayD ->> GatewayD: policies control whether to run or discard the action
    par runs sync action(s) (e.g. terminate)
        GatewayD ->> GatewayD: decides what to do with the request or response
        GatewayD -->> Client: terminates traffic (decision is final)
    and queues async action(s) (e.g. call a webhook)
        GatewayD ->> Queue: queue job for calling a webhook
        Queue ->> GatewayD: queued or failed and logged
    end
```
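
The sync/async split in the sequence diagram could look roughly like the following Go sketch, which runs the sync action inline and hands the async action to a worker through a buffered channel. The `Job` and `Queue` types and their methods are assumptions for illustration only; a real queueing system (gatewayd#464) would likely add persistence and retries.

```go
package main

import (
	"fmt"
	"sync"
)

// Job is an async action waiting to be run (hypothetical shape).
type Job struct {
	Action string
	Run    func() error
}

// Queue is an in-memory job queue with a single worker goroutine.
type Queue struct {
	jobs chan Job
	wg   sync.WaitGroup
}

// NewQueue starts the worker and returns the queue.
func NewQueue(size int) *Queue {
	q := &Queue{jobs: make(chan Job, size)}
	q.wg.Add(1)
	go func() {
		defer q.wg.Done()
		for job := range q.jobs {
			// Failures are logged and never reach the traffic path.
			if err := job.Run(); err != nil {
				fmt.Printf("async action %q failed: %v\n", job.Action, err)
			}
		}
	}()
	return q
}

// Enqueue returns false instead of blocking when the queue is full,
// so the main goroutine is never held up by slow actions.
func (q *Queue) Enqueue(job Job) bool {
	select {
	case q.jobs <- job:
		return true
	default:
		return false
	}
}

// Close stops accepting jobs and waits for the worker to drain the queue.
func (q *Queue) Close() {
	close(q.jobs)
	q.wg.Wait()
}

func main() {
	q := NewQueue(8)

	// Sync action: runs inline and its decision is final.
	fmt.Println("terminate: response sent to client")

	// Async action: queued and run by the worker.
	called := false
	queued := q.Enqueue(Job{Action: "call", Run: func() error {
		called = true // stands in for the actual webhook call
		return nil
	}})
	fmt.Println("webhook queued:", queued)

	q.Close()
	fmt.Println("webhook called:", called)
}
```
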

Actions

The following is the list of built-in and custom actions.

GatewayD sync actions

| Signal | Policy | Action |
| --- | --- | --- |
| terminate (bool) | signal.terminate == true && policy.terminate == "stop" | Terminate request (current) (change to drop?) |
| disconnect | | Terminate connection |
| reject | | Drop the request and reset the TCP connection |
| transmit/tx | | Bypass plugins and just relay the request/response |
| allow | | Allow request/response |
| deny | | Deny request/response |
| block | | Block client (time-based or permanent) |
| discard | | Discard a request/response |
| reset | | Reset the connection either way (disconnect?) |
| fallthrough | | No action (is it needed?) |
| route | | Route traffic to a specific server |
| forward | | Forward traffic to a specific server |
| upgrade | | Upgrade connection to TLS |
| police | | Apply rate limits to the traffic |
| set | | Set/update session parameter |
| noset | | Prevent session parameter update |
| create | | Create a new connection to the database (connect?) |
| limit | | Limit client or request/response (should it be a custom action?) |
| error | | Return error response (should it be a response?) |
| request | | Return a (modified) request |
| response | | Return a (modified) response |

GatewayD async actions

| Signal | Policy | Action |
| --- | --- | --- |
| log | | Log audit trail (or request/response) |
| metric | | Record a metric |
| queue | | Queue request/response |
| conntrack | | Connection tracking |
| mirror | | Mirror traffic to another server |
| inspect | | Store traffic for inspection (submit for inspection?) |
| record | | Store all traffic for inspection |
| quarantine | | Like record, but for malicious packets (?) |
| call | | Call an API or webhook (HTTP request?) |

Plugin or custom actions

| Signal | Policy | Action |
| --- | --- | --- |
| cache | | Cache request/response |
| alert | | Log and/or trigger an alert |
| notify | | Notify a plugin of something (or call a service?) |
| run | | Run an action (should it be a custom action?) |
| webhook | | Call a webhook |
| rotate | | Rotate keys and secrets |
| reauth | | Reauthenticate the client either way |
| ebpf | | Run eBPF program to block a client in the kernel |
| wasm | | Run WebAssembly filter |
| publish/produce | | Publish a (batch of) message(s) (to Kafka or any other streaming/messaging system) |
| subscribe/consume | | Consume a (batch of) message(s) (from Kafka or any other streaming/messaging system) |

Custom debugger actions

| Signal | Policy | Action |
| --- | --- | --- |
| debug | | Start debugging |
| breakpoint | | Set a breakpoint on an action/step |
| pause | | Pause for user input |
| step | | Run a single step after user input (maybe next?) |
| continue | | Continue processing |
| abort | | Abort processing |

Action storage/channel

Currently, everything is passed around at the root level of the request.Fields. The action could instead be passed through the context (metadata) to avoid clashing with the keys injected by HandleClientMessage (specifically the Terminate message) from the PostgreSQL wire protocol parser. This needs to be abstracted away in something like gRPC metadata. The metadata is passed around in the context object, which is unidirectional: from the client (GatewayD) to the server (plugins). The plugins, however, can only pass data back to GatewayD using the returned request.Fields (and the error, which is not designed for this task). This means that a custom field should be created to carry the action(s).
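
A minimal sketch of such a custom field, in Go. The reserved key `__signals__` and the helpers `WithSignals` and `ExtractSignals` are hypothetical names chosen for this example, and a plain `map[string]any` again stands in for request.Fields.

```go
package main

import "fmt"

// signalsKey is a hypothetical reserved key that keeps signals apart from
// the fields injected by the wire protocol parser.
const signalsKey = "__signals__"

// WithSignals stores the signal names under the reserved key (plugin side).
func WithSignals(fields map[string]any, names ...string) map[string]any {
	fields[signalsKey] = names
	return fields
}

// ExtractSignals removes the reserved key and returns the signal names
// (GatewayD side). The remaining fields are left untouched.
func ExtractSignals(fields map[string]any) []string {
	names, _ := fields[signalsKey].([]string)
	delete(fields, signalsKey)
	return names
}

func main() {
	fields := map[string]any{
		"request": []byte("X"),
		// Stand-in for a key injected by the PostgreSQL parser.
		"terminate": "decoded Terminate message",
	}
	fields = WithSignals(fields, "terminate", "log")

	fmt.Println(ExtractSignals(fields)) // [terminate log]
	_, parserKeyKept := fields["terminate"]
	fmt.Println(parserKeyKept) // true
}
```

Because the signals live under their own key, the parser's terminate key and the terminate signal no longer compete for the same slot.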

Policies and actions

Note to self: Policies dictate actions. For example, a simple policy would be equality to a value (query.where.id == 1). Policies can be stored anywhere, including a policy engine like OPA.
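
As an illustration of the equality policy mentioned above, the following Go sketch evaluates `query.where.id == 1` against a nested map. The `lookup` and `Equals` helpers are hypothetical and are a plain-Go stand-in for what an expression language or a policy engine would provide.

```go
package main

import (
	"fmt"
	"strings"
)

// lookup walks a dotted path such as "query.where.id" through nested maps.
func lookup(input map[string]any, path string) (any, bool) {
	var current any = input
	for _, key := range strings.Split(path, ".") {
		m, ok := current.(map[string]any)
		if !ok {
			return nil, false
		}
		current, ok = m[key]
		if !ok {
			return nil, false
		}
	}
	return current, true
}

// Equals builds a policy that matches when the value at path equals want.
// want is expected to be a comparable scalar (number, string, bool).
func Equals(path string, want any) func(map[string]any) bool {
	return func(input map[string]any) bool {
		got, ok := lookup(input, path)
		return ok && got == want
	}
}

func main() {
	policy := Equals("query.where.id", 1)

	match := map[string]any{
		"query": map[string]any{"where": map[string]any{"id": 1}},
	}
	other := map[string]any{
		"query": map[string]any{"where": map[string]any{"id": 2}},
	}
	fmt.Println(policy(match)) // true
	fmt.Println(policy(other)) // false
}
```
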

TODO

  • Consider simplifying actions (too many seemingly duplicate names: terminate, reject, block, deny, etc.)
  • Create a new ticket for custom debugger actions
  • Create a new ticket for adding more capabilities to the plugin system (and plugins) like extending actions (in addition to hooks)

Related

Resources

@mostafa mostafa added the enhancement New feature or request label Dec 29, 2023
@mostafa mostafa self-assigned this Jan 21, 2024
@mostafa mostafa changed the title Refactor and extend the action system The signal, action and policy system Jan 28, 2024
@mostafa mostafa changed the title The signal, action and policy system The Signal, Action and Policy System Jan 28, 2024
@mostafa mostafa transferred this issue from gatewayd-io/gatewayd Jan 28, 2024
@mostafa mostafa changed the title The Signal, Action and Policy System Signal, Action and Policy System Jan 28, 2024
@mostafa mostafa moved this from 🚧 In progress to 📋 Backlog in GatewayD Core Public Roadmap Feb 18, 2024
mostafa added a commit to gatewayd-io/gatewayd that referenced this issue Mar 1, 2024
This giant PR adds the very first version of the Act system that was proposed in [this proposal](gatewayd-io/proposals#5). The old way of signaling was static and only supported a single signal: `terminate`. The new system supports more signals, adds proper policies that can be easily controlled by the users, and executes actions in sync and async mode.

The Act system consists of these components:
1. **Act Registry**: takes care of registering signals, policies and actions. It also applies policies to signals to produce outputs for actions and runs actions using those outputs.
2. **Signals**: plugins' hooks can return signal(s) as part of their request/response. These signals tell GatewayD what to do.
3. **Policies**: signals pass through predefined policies that will decide whether GatewayD should react to the signal or not.
4. **Actions**: actions run in sync or async mode and perform a function. Sync actions are used to control traffic (passthrough, terminate, etc.) and other parts of the system, and async actions can do other things (log, publish a message to Kafka, etc.).
5. **Plugin Registry**: after running a hook on each plugin, the signals are extracted and the policies are applied to those signals. The output of those policy evaluations is returned to the caller, which knows how to run the action and use its results.

And the code spans over two projects:
1. **GatewayD**: all the above components of the Act system are in GatewayD.
2. **SDK**: types and helper functions for creating and exporting signals are in the SDK.

### Breaking changes 
The old way of terminating requests doesn't work anymore, as it was refactored in #442 and all the plugins are updated to pick up the changes.
@mostafa mostafa added the core Anything related to the gatewayd core project label Mar 2, 2024
smnmna99 pushed a commit to gatewayd-io/gatewayd that referenced this issue Mar 13, 2024