Add lua filter #110
Codecov Report
| | main | #110 | +/- |
|---|---|---|---|
| Coverage | 49.43% | 50.22% | +0.79% |
| Files | 56 | 57 | +1 |
| Lines | 3880 | 3976 | +96 |
| Hits | 1918 | 1997 | +79 |
| Misses | 1752 | 1764 | +12 |
| Partials | 210 | 215 | +5 |
This filter is based on [GopherLua](https://github.com/yuin/gopher-lua), a Lua 5.1 virtual machine.
To use this filter you need to declare a function in a Lua file. This function serves the same purpose
Maybe it's worth telling the user that the function name must match the `FilterName` config field.
I'm not sure I see how the user could be confused here, since the config has 2 fields: a filename and a function name.
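For illustration, a minimal Go sketch of a configuration with those two fields, plus a validation step that rejects either one missing. The struct, the `Script` field name, the TOML tags and the error messages are hypothetical; only `FilterName` is mentioned in this thread, and the real filter config may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// LUAConfig is a hypothetical sketch of the two-field configuration:
// the Lua file to load and the Lua function to call for each record.
type LUAConfig struct {
	Script     string `toml:"Script"`     // path to the .lua file (hypothetical name)
	FilterName string `toml:"FilterName"` // name of the Lua function to call
}

// validate reports an error if either field is missing.
func (c LUAConfig) validate() error {
	if c.Script == "" {
		return errors.New("missing Script: path to the Lua file")
	}
	if c.FilterName == "" {
		return errors.New("missing FilterName: name of the Lua function")
	}
	return nil
}

func main() {
	ok := LUAConfig{Script: "filters/count.lua", FilterName: "count"}
	bad := LUAConfig{Script: "filters/count.lua"}
	fmt.Println(ok.validate())  // <nil>
	fmt.Println(bad.validate()) // missing FilterName: name of the Lua function
}
```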
`filter/lua.go`

```go
func (t *LUA) Process(rec baker.Record, next func(baker.Record)) {
	atomic.AddInt64(&t.nprocessed, 1)

	t.mu.Lock()
```
Are you sure you want to lock the entire `Process` function? This means that even if the Lua function is pure, the filter will not exploit any concurrent execution on different, disjoint records. Moreover, adding a single LUA filter to a possibly long and complex filter chain will inhibit any parallel execution, serializing everything on a single lock.
IMHO we need to create a new state each time to avoid the lock.
Unlocking before `next` would solve the filter-chain parallelization issue, no?
I think we should refocus the review on the correctness and semantics of the user API.
Creating a new state each time is probably quite expensive. It might be beneficial in some cases and not in others, and I'm not sure we can be certain it's the way to obtain the best overall performance in most cases.
Happy to discuss optimizations later once we get the API right, though.
> unlocking before `next` would solve the filterchain parallelization issue, no?

The user may not call `next`, so I'm not sure I understand.
Since the Lua function lives in a file, the user could have written some statements before the function, and those statements may be needed for the filter to work. The current implementation allows that by creating the state once and locking it each time a goroutine enters the Lua filter. Recreating the state each time, apart from the potential cost, removes this possibility, since the Lua file would need to be reinterpreted each time, that is, for each record (at least if I correctly understood what you meant).
Another possible API I considered was to ask the user to write the Lua function directly in the TOML, but then you lose syntax highlighting, you need to escape your code, etc.
I think it would be better to ask the user to write pure functions; actually, the other way around seems strange to me. Moreover, we could use the approach from sharing-lua-byte-code-between-lstates to compile the Lua file once and use it with different states. That said, I agree that calling `NewState` each time seems a bit too costly.
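To make the tradeoff concrete, a stdlib-only Go sketch of "compile once, one state per goroutine". `compiledScript` and `state` are hypothetical stand-ins, not GopherLua types, and the sketch does not show how GopherLua actually shares bytecode. What it does show: with one state per worker no lock is needed, but a script-level counter is split across states, so no single state sees every record.

```go
package main

import (
	"fmt"
	"sync"
)

// compiledScript is a hypothetical stand-in for bytecode compiled once
// from the user's Lua file.
type compiledScript struct {
	filterName string
}

// state is a hypothetical stand-in for one interpreter state. Globals live
// per state, so a "count records" global is not shared between states.
type state struct {
	script  *compiledScript
	globals map[string]int
}

// newState "loads" the shared compiled script into a fresh state.
func newState(cs *compiledScript) *state {
	return &state{script: cs, globals: map[string]int{}}
}

// runWorkers processes records with one state per worker goroutine, so no
// lock is needed, and returns each worker's local count.
func runWorkers(cs *compiledScript, records []string, workers int) []int {
	in := make(chan string)
	counts := make([]int, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			st := newState(cs) // one state per goroutine
			for range in {
				st.globals["count"]++ // what a script-level counter would do
			}
			counts[w] = st.globals["count"] // each worker writes its own slot
		}(w)
	}
	for _, r := range records {
		in <- r
	}
	close(in)
	wg.Wait()
	return counts
}

func main() {
	cs := &compiledScript{filterName: "count"}
	counts := runWorkers(cs, make([]string, 1000), 4)
	total := 0
	for _, c := range counts {
		total += c
	}
	fmt.Println(counts, total)
}
```

The four per-worker counts vary from run to run; only their sum is always 1000, which is exactly the "counting records" use case a shared, locked state keeps simple.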
Trying to refocus the discussion a bit.
The idea behind the Lua filter is to lower the bar to using baker and writing filters quickly. If I have to write pure Lua functions, I won't be able to do simple things one would expect to do in a filter, such as counting records.
Apart from being faster (which, as always with performance, would need to be proven with numbers), I don't see any other benefit in requiring the user to write pure functions.
I think we have to clarify, before going into implementation details, how we want to execute the user's Lua code. I see two possible alternatives:
- execute the Lua code sequentially on the records
- execute the Lua code in parallel on multiple records
Both alternatives have pros and cons, and either could drastically change both the performance and what the user can express. We could also make the filter's behavior configurable, but in either case the user must be aware of what they can and cannot do.
I agree. In any case the user must know the tradeoffs that have been made so they can choose whether they match their use case, so those have to be documented.
Coming back to @tommyblue's comment about the position of the unlock, it seems right to release the lock before calling `next` to allow concurrent execution of the rest of the filter chain. I'll fix this and run the benchmarks/tests with it 👍
Any way to offload the locking to the user inside the Lua code? That way they could indicate whether they need synchronization or not.
Suppress one allocation, reducing their number from 5 to 4, by pre-allocating the userdata we use to wrap the record passed to the Lua filter.

| name | metric | old | new | delta |
|---|---|---|---|---|
| LUAProcess-8 | time/op | 638ns ± 4% | 559ns ± 6% | -12.31% (p=0.000 n=10+10) |
| LUAProcess-8 | alloc/op | 152B ± 0% | 104B ± 0% | -31.58% (p=0.000 n=10+10) |
| LUAProcess-8 | allocs/op | 5.00 ± 0% | 4.00 ± 0% | -20.00% (p=0.000 n=10+10) |
Reduce allocations from 4 to 2 by pre-allocating the Lua function that wraps the Go `next` function we receive.

| name | metric | old | new | delta |
|---|---|---|---|---|
| LUAProcess-8 | time/op | 559ns ± 6% | 459ns ± 7% | -17.97% (p=0.000 n=10+10) |
| LUAProcess-8 | alloc/op | 104B ± 0% | 24B ± 0% | -76.92% (p=0.000 n=10+10) |
| LUAProcess-8 | allocs/op | 4.00 ± 0% | 2.00 ± 0% | -50.00% (p=0.000 n=10+10) |
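The idea behind these two commits can be sketched in plain Go: build the wrapper once, when the filter is created, and have it read the current record and `next` from fields that `process` sets, instead of allocating a new wrapper per record. `luaFunc`, `filter` and their fields are hypothetical stand-ins for the GopherLua function/userdata objects; as in the real filter, this is only safe while the shared state is protected by the lock.

```go
package main

import "fmt"

// Record is a hypothetical stand-in for baker.Record.
type Record struct{ id int }

// luaFunc is a hypothetical stand-in for a Go function exposed to Lua.
// Creating one per record is what costs an allocation.
type luaFunc func(args []any)

type filter struct {
	cur     *Record       // record being processed (valid only during process)
	curNext func(*Record) // next for the current call
	nextFn  luaFunc       // allocated once in newFilter, reused for every record
}

func newFilter() *filter {
	f := &filter{}
	// The wrapper reads cur/curNext from the filter instead of capturing
	// them in a fresh closure on every call.
	f.nextFn = func(args []any) { f.curNext(f.cur) }
	return f
}

// process must be called with the filter's lock held (not shown), because
// cur and curNext are shared between calls.
func (f *filter) process(rec *Record, next func(*Record), script func(*Record, luaFunc)) {
	f.cur, f.curNext = rec, next
	script(rec, f.nextFn)
	f.cur, f.curNext = nil, nil // avoid retaining the record between calls
}

func main() {
	f := newFilter()
	var ids []int
	next := func(r *Record) { ids = append(ids, r.id) }
	// A "script" that forwards only records with an odd id.
	script := func(r *Record, callNext luaFunc) {
		if r.id%2 == 1 {
			callNext(nil)
		}
	}
	for i := 1; i <= 4; i++ {
		f.process(&Record{id: i}, next, script)
	}
	fmt.Println(ids) // [1 3]
}
```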
The Lua function now just receives a Lua record and must return 2 values:
- the first is a boolean indicating whether the record is to be kept or discarded;
- the second is either nil or a string describing the error for this record.
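On the Go side, this contract could be applied roughly as below. `luaResult` and `dispatch` are hypothetical names, and what happens to a record whose function returned an error string is an assumption of this sketch (the record is dropped and the error is returned to the caller), since the commit message does not specify it.

```go
package main

import (
	"errors"
	"fmt"
)

// Record is a hypothetical stand-in for baker.Record.
type Record struct{ name string }

// luaResult models the two values returned by the Lua function:
// keep, and either nil (errMsg == nil) or an error string.
type luaResult struct {
	keep   bool
	errMsg *string
}

// dispatch applies the contract. Dropping errored records and returning
// the error to the caller is an assumption of this sketch.
func dispatch(rec *Record, res luaResult, next func(*Record)) error {
	if res.errMsg != nil {
		return errors.New(*res.errMsg)
	}
	if res.keep {
		next(rec)
	}
	return nil
}

func main() {
	var forwarded []string
	next := func(r *Record) { forwarded = append(forwarded, r.name) }
	msg := "bad timestamp"

	fmt.Println(dispatch(&Record{name: "a"}, luaResult{keep: true}, next))               // <nil>
	fmt.Println(dispatch(&Record{name: "b"}, luaResult{keep: false}, next))              // <nil>
	fmt.Println(dispatch(&Record{name: "c"}, luaResult{keep: true, errMsg: &msg}, next)) // bad timestamp
	fmt.Println(forwarded)                                                               // [a]
}
```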
Codecov Report
| | main | #110 | +/- |
|---|---|---|---|
| Coverage | 53.95% | 50.20% | -3.76% |
| Files | 60 | 57 | -3 |
| Lines | 4003 | 3976 | -27 |
| Hits | 2160 | 1996 | -164 |
| Misses | 1646 | 1764 | +118 |
| Partials | 197 | 216 | +19 |
❓ What
Add a LUA filter that makes it possible to write filter logic (i.e. implement the `Process` function) in Lua.
It needs further testing, benchmarking and polishing, but first results show relatively good performance compared to the same filter written in Go (5% slower for a filter with a single Lua operation).
Every `Lua` baker filter compiles a given script and runs a user-specified function, which must follow a fixed prototype. The Lua script knows the `record` type, which is the Lua equivalent of the `Record` Go interface. `record` has the following methods:
- `get(int) string`
- `set(int, string)`
- `clear()`
- `copy() record`
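As a reference for those semantics, here is a hypothetical in-memory Go model of the four `record` methods, using int indices and string values as listed above. It is not baker's `Record` implementation; in particular, returning `""` for an unset field is an assumption of the sketch.

```go
package main

import "fmt"

// record is a hypothetical in-memory model of the Lua `record` methods.
type record struct {
	fields map[int]string
}

func newRecord() *record { return &record{fields: map[int]string{}} }

// get returns the value of field i ("" if unset, in this sketch).
func (r *record) get(i int) string { return r.fields[i] }

// set stores v in field i.
func (r *record) set(i int, v string) { r.fields[i] = v }

// clear removes all field values in place.
func (r *record) clear() { r.fields = map[int]string{} }

// copy returns an independent record; later changes to either one do not
// affect the other.
func (r *record) copy() *record {
	c := newRecord()
	for k, v := range r.fields {
		c.fields[k] = v
	}
	return c
}

func main() {
	r := newRecord()
	r.set(0, "a")
	c := r.copy()
	r.set(0, "b")
	fmt.Println(r.get(0), c.get(0)) // b a
	r.clear()
	fmt.Printf("%q %q\n", r.get(0), c.get(0)) // "" "a"
}
```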
In the Lua context (called the Lua state), some utility functions are already defined and can be used:
- `validateRecord(record) bool, int`
- `createRecord() record`
- `fieldByName(string) int|nil` (different from Go, which returns `(baker.FieldIndex, bool)`)
- `fieldNames`, a Lua table, same as the `fieldNames` Go slice

🔨 How to test
✅ Checklists
This section contains a list of checklists for common uses, please delete the checklists that are useless for your current use case (or add another checklist if your use case isn't covered yet).
- … `all.go` files?
- Has `make gofmt-write` been run on the code?
- Has `make govet` been run on the code? Has the code been fixed according to its output?