-
Notifications
You must be signed in to change notification settings - Fork 837
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[arrow-string] Implement string view support for regexp_match
#6849
Conversation
Signed-off-by: Tai Le Manh <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really liked the use of macros to make the test cases very clean. Although I'm unclear of the reason to use macros elsewhere.
let builder = StringViewBuilder::with_capacity(0); | ||
let mut list_builder = ListBuilder::new(builder); | ||
|
||
process_regexp_array_match!(array, regex_array, flags_array, list_builder); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed this expands to the original logic, for both StringView (here) and StringArrays (above), to match on an array of regex. Why the use of macros?
fn regexp_scalar_match_utf8view( | ||
array: &StringViewArray, | ||
regex: &Regex, | ||
) -> Result<ArrayRef, ArrowError> { | ||
let builder = StringViewBuilder::with_capacity(0); | ||
let mut list_builder = ListBuilder::new(builder); | ||
|
||
process_regexp_match!(array, regex, list_builder); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same pattern, and same question as before.
It's the same logic (for a single regex) into a macro to use for both StringView and StringArray. Why the macro?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wiedld Thanks so much for reviewing ❤️
Confirmed this expands to the original logic, for both StringView (here) and StringArrays (above), to match on an array of regex. Why the use of macros?
It's the same logic (for a single regex) into a macro to use for both StringView and StringArray. Why the macro?
I just want to reduce duplicate code here. I remember that I have struggled when I implement generic functions instead of macro. But if it makes code hard to read/debug/maintain I can try to remove the macro and using another solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I played around a bit, and It looks like some of the common array builder methods are not traits on ArrayBuilder. Specifically, I got to here and was blocked by the builder.append_value():
fn process_regexp_match<'a, T: ?Sized, A: ArrayAccessor<Item = &'a str>, O: OffsetSizeTrait, B: ArrayBuilder>(
array: ArrayIter<A>,
regex: &Regex,
list_builder: &mut GenericListBuilder<O, B>,
) -> Result<(), ArrowError> {
...
}
It looks like this method is only implemented on 9 of the 15 current ArrayBuilder implementations. Not sure on whether the method should be pulled into the interface or whether we want additional builder traits/interface definitions (maybe in a separate PR). @alamb ?
Otherwise, yes the macro is the alternative for DRY. But it feels like we want abstraction (a.k.a. a trait).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A trait sounds like a good idea to me, but unless we have a specific idea/proposal of one, I think macros are the best way to proceed (we can always clean the code up later)
Or perhaps @wiedld if you have time you can try and work out what a trait based solution would look like so we can compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do that in a follow up PR. Thank you for the input @alamb .
@tlm365 -- if the additional test gets added, then I can approve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb @tlm365 -- Here is a proposed trait ValuesBuilder: ArrayBuilder for common interfaces in building a list of values.
Note that there were several other common sets of builder functionality (such as building lists of lists), but I didn't address that due to (a) keep the PR small, and (b) I knew only of this immediate use case (altho there are likely others).
expected_builder.values().append_value("005"); | ||
expected_builder.append(true); | ||
expected_builder.values().append_value("7"); | ||
expected_builder.append(true); | ||
expected_builder.append(false); | ||
expected_builder.append(false); | ||
expected_builder.append(false); | ||
expected_builder.values().append_value(""); | ||
expected_builder.append(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much nicer not having this anymore.
arrow-string/src/regexp.rs
Outdated
test_match_scalar_pattern!( | ||
match_scalar_pattern_string_no_flags, | ||
vec![Some("abc-005-def"), Some("x-7-5"), Some("X545"), None], | ||
r"x.*-(\d*)-.*", | ||
None::<&str>, | ||
StringArray, | ||
GenericStringBuilder<i32>, | ||
[None, Some("7"), None, None] | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no difference in the match results with, or without, flags. Might be nice to have test cases demonstrating this difference. 🙏🏼
Signed-off-by: Tai Le Manh <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the added test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 🚀 |
Which issue does this PR close?
Closes #6717.
Rationale for this change
What changes are included in this PR?
Add support string view (
Utf8View
) forregexp_match
, unit tests.Are there any user-facing changes?
More type support for
regexp_match
.