Sometimes File search cannot be canceled #2571

Open
jukzi opened this issue Dec 3, 2024 · 15 comments
Labels
bug Something isn't working performance

Comments

@jukzi
Contributor

jukzi commented Dec 3, 2024

For example, searching for * 1_000_000; in the Platform workspace hangs in C:\Users\jkubitz\platform-2024-09-23\git\p2\bundles\org.eclipse.equinox.p2.tests\testData\sat4j\Bug247638.opb


"Worker-35: File Search Worker" #99 [12772] prio=5 os_prio=0 cpu=389687.50ms elapsed=451.11s tid=0x000002617d846680 nid=12772 runnable  [0x000000e371dfe000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.Character.codePointAt(java.base/Character.java:9320)
        at java.util.regex.Pattern$CharPropertyGreedy.match(java.base/Pattern.java:4453)
        at java.util.regex.Pattern$Start.match(java.base/Pattern.java:3787)
        at java.util.regex.Matcher.search(java.base/Matcher.java:1767)
        at java.util.regex.Matcher.find(java.base/Matcher.java:787)
        at org.eclipse.search.internal.core.text.TextSearchVisitor.locateMatches(TextSearchVisitor.java:485)
        at org.eclipse.search.internal.core.text.TextSearchVisitor$TextSearchJob.processFile(TextSearchVisitor.java:204)
        at org.eclipse.search.internal.core.text.TextSearchVisitor$TextSearchJob.run(TextSearchVisitor.java:172)
        at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)
@jukzi jukzi added bug Something isn't working performance labels Dec 3, 2024
@jukzi
Contributor Author

jukzi commented Dec 3, 2024

Follow-up exceptions on shutdown:

Job found still running after platform shutdown.  Jobs should be canceled by the plugin that scheduled them during shutdown: org.eclipse.search2.internal.ui.InternalSearchUI$InternalSearchJob RUNNING
	 at java.base/java.lang.Object.wait0(Native Method)
	 at java.base/java.lang.Object.wait(Object.java:366)
	 at org.eclipse.core.internal.jobs.InternalJobGroup.doJoin(InternalJobGroup.java:363)
	 at org.eclipse.core.internal.jobs.JobManager.join(JobManager.java:1165)
	 at org.eclipse.core.internal.jobs.InternalJobGroup.join(InternalJobGroup.java:106)
	 at org.eclipse.core.runtime.jobs.JobGroup.join(JobGroup.java:254)
	 at org.eclipse.search.internal.core.text.TextSearchVisitor.search(TextSearchVisitor.java:406)
	 at org.eclipse.search.internal.core.text.TextSearchVisitor.search(TextSearchVisitor.java:439)
	 at org.eclipse.search.core.text.TextSearchEngine$1.search(TextSearchEngine.java:62)
	 at org.eclipse.search.internal.ui.text.FileSearchQuery.run(FileSearchQuery.java:237)
	 at org.eclipse.search2.internal.ui.InternalSearchUI$InternalSearchJob.run(InternalSearchUI.java:94)
	 at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)
Job found still running after platform shutdown.  Jobs should be canceled by the plugin that scheduled them during shutdown: org.eclipse.search.internal.core.text.TextSearchVisitor$TextSearchJob RUNNING
	 at java.base/java.lang.Character.codePointAt(Character.java:9320)
	 at java.base/java.util.regex.Pattern$CharPropertyGreedy.match(Pattern.java:4453)
	 at java.base/java.util.regex.Pattern$Start.match(Pattern.java:3787)
	 at java.base/java.util.regex.Matcher.search(Matcher.java:1767)
	 at java.base/java.util.regex.Matcher.find(Matcher.java:787)
	 at org.eclipse.search.internal.core.text.TextSearchVisitor.locateMatches(TextSearchVisitor.java:485)
	 at org.eclipse.search.internal.core.text.TextSearchVisitor$TextSearchJob.processFile(TextSearchVisitor.java:204)
	 at org.eclipse.search.internal.core.text.TextSearchVisitor$TextSearchJob.run(TextSearchVisitor.java:172)
	 at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)

@jukzi
Contributor Author

jukzi commented Dec 4, 2024

Turns out the Java pattern matcher is just damn slow at matching the pattern .*\Qanything\E against a single long line. Even though the line does not contain "anything", the matcher does not check that simple condition first.

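import java.util.regex.Pattern;

public class MatchPerformance {
  public static void main(String[] args) {
    // The input never contains "anything", yet find() retries the greedy .* from every start offset.
    Pattern.compile(".*\\Qanything\\E").matcher("A".repeat(100000)).find(); // ~20 sec
  }
}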
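
Since the stack traces show the worker inside a single Matcher.find() call, a cancel request presumably cannot take effect until that call returns. One commonly used workaround, which nobody in this thread proposed and which is only a sketch here, is to hand the matcher a CharSequence wrapper whose charAt() checks a cancel flag and throws. The names CancelableCharSequence and SearchCanceledException are hypothetical, and whether TextSearchVisitor could wrap its input this way is an assumption.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.regex.Pattern;

// Hypothetical wrapper: delegates to the real text but aborts once the flag is set.
final class CancelableCharSequence implements CharSequence {

    // Hypothetical unchecked exception used to unwind out of Matcher.find().
    static final class SearchCanceledException extends RuntimeException {
        private static final long serialVersionUID = 1L;
    }

    private final CharSequence delegate;
    private final AtomicBoolean canceled;

    CancelableCharSequence(CharSequence delegate, AtomicBoolean canceled) {
        this.delegate = delegate;
        this.canceled = canceled;
    }

    @Override
    public char charAt(int index) {
        // The regex engine reads characters through this method, so this is
        // the place where a running match can notice a cancel request.
        if (canceled.get()) {
            throw new SearchCanceledException();
        }
        return delegate.charAt(index);
    }

    @Override
    public int length() {
        return delegate.length();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return new CancelableCharSequence(delegate.subSequence(start, end), canceled);
    }

    @Override
    public String toString() {
        return delegate.toString();
    }

    public static void main(String[] args) {
        AtomicBoolean canceled = new AtomicBoolean(false);

        // Simulates the user pressing "cancel" 200 ms after the search started.
        Thread canceler = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            canceled.set(true);
        });
        canceler.setDaemon(true);
        canceler.start();

        CharSequence input = new CancelableCharSequence("A".repeat(100_000), canceled);
        try {
            boolean found = Pattern.compile(".*\\Qanything\\E").matcher(input).find();
            System.out.println("finished, found=" + found);
        } catch (SearchCanceledException e) {
            System.out.println("search canceled");
        }
    }
}
```

The flag is read on every character access, which adds some overhead to each match; checking it only every few thousand calls would be one way to reduce that cost.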

@laeubi
Contributor

laeubi commented Dec 4, 2024

@jukzi see for example here:

https://stackoverflow.com/questions/2667015/is-regex-too-slow-real-life-examples-where-simple-non-regex-alternative-is-bett

quoted from there:

The reason the regex is so slow is that the * quantifier is greedy by default, and so the first .* tries to match the whole string, and after that begins to backtrack character by character. The runtime is exponential in the count of numbers on a line.

So I don't think Java is to blame...

@jukzi
Contributor Author

jukzi commented Dec 4, 2024

Let's find a solution: in this case it is much faster to match ^.*\Qanything\E (^ asserts the position at the start of a line), with the same result, but how can it be solved in general?
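
A rough way to compare the two patterns is sketched below. The reasoning behind it: find() tries each start offset in turn, and at every offset the greedy .* runs to the end of the line and then backtracks one character at a time looking for the literal, so a line of n characters costs on the order of n²/2 steps. With a leading ^, every start offset that is not a line start fails immediately. The class and method names are made up for this sketch, and the input is shortened to 20,000 characters so that the slow case finishes quickly.

```java
import java.util.regex.Pattern;

final class AnchoredVsUnanchored {

    // Runs a single find() and returns the elapsed wall-clock time in milliseconds.
    static long timeFindMillis(Pattern pattern, CharSequence input) {
        long start = System.nanoTime();
        pattern.matcher(input).find();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // One long line that does not contain the literal "anything".
        String line = "A".repeat(20_000);

        Pattern unanchored = Pattern.compile(".*\\Qanything\\E");
        Pattern anchored = Pattern.compile("^.*\\Qanything\\E");

        System.out.println("unanchored: " + timeFindMillis(unanchored, line) + " ms");
        System.out.println("anchored:   " + timeFindMillis(anchored, line) + " ms");
    }
}
```

This is a single unwarmed measurement rather than a proper benchmark, so the numbers only indicate the order of magnitude.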

@laeubi
Contributor

laeubi commented Dec 4, 2024

Maybe search can be smarter to use contains() in some cases (e.g. no wildcards specified and regex disabled).

In your case you should have searched for \* 1_000_000; instead, which would already make it faster.
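
The contains() idea could look roughly like the sketch below: when the search text is a plain literal (no wildcards, regex disabled), occurrences are located with String.indexOf instead of a compiled pattern. LiteralSearch and findAll are hypothetical names, case-insensitive and whole-word matching are left out, and whether the Eclipse search code has a suitable place for such a fast path is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class LiteralSearch {

    // Returns the start offsets of all non-overlapping occurrences of needle in text.
    static List<Integer> findAll(String text, String needle) {
        List<Integer> offsets = new ArrayList<>();
        if (needle.isEmpty()) {
            return offsets; // nothing sensible to report for an empty search string
        }
        int from = 0;
        while (true) {
            int hit = text.indexOf(needle, from);
            if (hit < 0) {
                break;
            }
            offsets.add(hit);
            from = hit + needle.length(); // continue after the current hit
        }
        return offsets;
    }

    public static void main(String[] args) {
        // The long line from the experiment above: no regex engine involved.
        System.out.println(findAll("A".repeat(100_000), "anything"));
        System.out.println(findAll("x * 1_000_000; y", " 1_000_000;"));
    }
}
```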

@laeubi
Contributor

laeubi commented Dec 4, 2024

By the way, for this particular case one might optimize the dialog to strip off any leading or trailing * if "match whole word" is not selected.

@jukzi
Contributor Author

jukzi commented Dec 4, 2024

That is not valid, as the result either also contains the characters before (*) or skips them.

@jukzi
Contributor Author

jukzi commented Dec 4, 2024

In your case you should have searched for \* 1_000_000; instead, which would already make it faster.

Yes, but I am a dumb, lazy user who just copy-pasted some text into a search field and had to restart Eclipse.

@laeubi
Contributor

laeubi commented Dec 4, 2024

That is not valid, as the result either also contains the characters before (*) or skips them.

I don't understand... whether I search for "* something" or for " something", the result is the same: in both cases it searches for a space followed by "something" ...

Yes, but I am a dumb, lazy user who just copy-pasted some text into a search field and had to restart Eclipse.

https://www.azquotes.com/quote/1417975

@stribizhev

Not a bug at all. You need to define your task first: understand what you want to achieve, what inputs you have, and what checks you need to run against the input. Once you have that, you will know whether you need to make the regex more precise (= define it manually) or do something else (implement multiprocessing, etc.).

@szarnekow
Contributor

@jukzi Can you conduct the same experiment with https://github.com/google/re2j ?

@jukzi
Contributor Author

jukzi commented Dec 4, 2024

@szarnekow great idea,
I tried https://re2js.leopard.in.ua/ -> 5 ms
That looks very promising!

@jukzi
Contributor Author

jukzi commented Dec 4, 2024

I don't understand... whether I search for "* something" or for " something", the result is the same: in both cases it searches for a space followed by "something" ...

@laeubi try to search "*main" vs "main"
[screenshot: results for "*main"]
vs
[screenshot: results for "main"]
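
The difference between the two searches can be reproduced outside Eclipse with a small sketch. It assumes that "*main" is translated to the regex .*\Qmain\E, in line with the pattern quoted earlier in this thread, and that "main" becomes \Qmain\E. The class name, the helper and the sample line are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchSpan {

    // Returns {start, end} of the first match of regex in line, or null if there is none.
    static int[] firstMatch(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? new int[] { m.start(), m.end() } : null;
    }

    public static void main(String[] args) {
        String line = "public static void main(String[] args) {";

        int[] withWildcard = firstMatch(".*\\Qmain\\E", line);
        int[] literalOnly = firstMatch("\\Qmain\\E", line);

        // With the wildcard the match covers everything from the line start up to "main";
        // without it, only the word itself is matched.
        System.out.println("*main -> [" + withWildcard[0] + ", " + withWildcard[1] + ")");
        System.out.println("main  -> [" + literalOnly[0] + ", " + literalOnly[1] + ")");
    }
}
```

Both searches hit the same line, but the reported match range differs, which is why silently stripping a leading * would change what gets highlighted.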

@jukzi
Contributor Author

jukzi commented Dec 4, 2024

google/re2j#168 sounds dangerous, not production ready :-(


4 participants