fulltext bug fixes, performance improvement and support json_value parser #20230

Open: cpegeric wants to merge 47 commits into main

@cpegeric cpegeric commented Nov 20, 2024

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #20217 #20213 #20175 #20149 #20311

What this PR does / why we need it:

Bug fixes for #20217, #20213, #20175 and #20149, plus a new json_value parser.

  1. Limit the batch size to 8192 in both the fulltext_index_scan() and fulltext_tokenize() functions.
  2. In fulltext_index_scan(), start a new thread that evaluates scores in batches of 8192 documents instead of waiting for all results from SQL. This speeds up the function and avoids OOM. However, the score is calculated per mini-batch rather than over the complete result set. I think this doesn't matter as long as we have the correct answer.
  3. Support the json_value parser.
  4. Pre-allocate memory in fulltext_tokenize() to avoid malloc.
  5. Bug fix for #20149 ("[Bug]: table with fulltext index, delete from table where xxx failed"): when deleting from the table, pkPos and pkType are needed, but (doc_id, INT) was given.
  6. Add the monpl tokenizer repo to matrixone.
  7. Bug fix: the JSON tokenizer now truncates values, and the limit is increased to 127 bytes.
  8. Push down limit.
  9. Bug fix for #20311 ("[Bug]: data race occurred during bvt test").
  10. Support alter table drop column on a table with a fulltext index.
  11. Add a streaming mode to the SQL executor.
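
To illustrate the trade-off in item 2, here is a minimal sketch of per-batch scoring in a separate goroutine. It is not the code in this PR: `Doc`, `Scored`, `scoreBatch` and `ScoreStream` are hypothetical names, and the scoring formula is a placeholder chosen only to show that a score computed per mini-batch depends on which batch a document lands in. Only the 8192 batch size comes from the description above.

```go
package main

// batchSize mirrors the 8192-document limit named in the PR description.
const batchSize = 8192

// Doc is a hypothetical row from the index scan: a document id and the
// number of times the search term occurs in that document.
type Doc struct {
	ID    int64
	Count int
}

// Scored pairs a document id with its score.
type Scored struct {
	ID    int64
	Score float64
}

// scoreBatch is a placeholder scoring function (an assumption, not the real
// formula): each count is normalized by the total count of its own batch.
func scoreBatch(batch []Doc) []Scored {
	total := 0
	for _, d := range batch {
		total += d.Count
	}
	out := make([]Scored, 0, len(batch))
	for _, d := range batch {
		s := 0.0
		if total > 0 {
			s = float64(d.Count) / float64(total)
		}
		out = append(out, Scored{ID: d.ID, Score: s})
	}
	return out
}

// ScoreStream reads rows as they arrive and emits one scored slice per batch,
// so at most batchSize unscored rows are held in memory at a time.
func ScoreStream(rows <-chan Doc) <-chan []Scored {
	out := make(chan []Scored)
	go func() {
		defer close(out)
		// The input buffer is allocated once and reused for every batch.
		batch := make([]Doc, 0, batchSize)
		flush := func() {
			if len(batch) == 0 {
				return
			}
			out <- scoreBatch(batch)
			batch = batch[:0]
		}
		for d := range rows {
			batch = append(batch, d)
			if len(batch) == batchSize {
				flush()
			}
		}
		flush() // score the final, possibly short, batch
	}()
	return out
}
```

A caller would range over the channel returned by `ScoreStream` while another goroutine feeds rows from the SQL result. With this placeholder formula, a document in a short final batch gets a higher score than an identical document in a full batch, which is the kind of per-batch difference item 2 accepts.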

@mergify mergify bot added the kind/bug Something isn't working label Nov 20, 2024
Labels: kind/bug (Something isn't working), size/XXL (Denotes a PR that changes 2000+ lines)
10 participants