-
Notifications
You must be signed in to change notification settings - Fork 247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
5 DeveloperFixed for esper #1221
base: main
Are you sure you want to change the base?
5 DeveloperFixed for esper #1221
Conversation
For For |
What do you mean by "where are the logs showing that the tests pass in ac5da5eb84d8861252e4bff5ca6bbcb4898192b6 (even for large -DnondexRuns=) but pass in ac5da5eb84d8861252e4bff5ca6bbcb4898192b6^?" For |
In your VM, what's the absolute path to the logs that show tests failing or passing, as appropriate
If the test was fixed before being renamed, then change |
I believe that you had a typo " the tests pass in ... but pass in". So you mean I should provide both logs, the one before fix from developer showing that the test failed, and the one after fix from developer showing the test passes? |
Sorry for the typo. Your logs for |
Got it. Do I need to include those logs in pr-data.csv or just have it in the path with all other logs as Zijie instructed? |
Logs should note be in |
I've just checked that all logs you want are here:/home/zicheng5/cs527/new-tests/esper(on my VM). I will generate the final commit to change |
Final commit, changing Renamed to DeveloperFixed
eaf189d
to
e82c1d2
Compare
@zzjas can you please check the logs? |
@Doom-Prophet could you please save two log files with the commit SHA in their file name? The current 200+ log files only have timestamps but no commit information. |
Sure, I will work on it this afternoon and notify you once completed. |
@zzjas Log files have been renamed in the format "zicheng5-<test_name>-before/after-<commit_SHA>.nondex_log", you can check them now. |
The after logs didn't show any tests were run. Were the tests renamed? |
Due to the specificity of esper, I have to run the whole test file(rename other tests and keep the target one unchanged) to run a certain test, so take an example, instead of "Running com.espertech.esper.regressionrun.suite.epl.TestSuiteEPLSpatial.testEPLSpatialMXCIFQuadTreeEventIndex", only "Running com.espertech.esper.regressionrun.suite.epl.TestSuiteEPLSpatial" will be shown. However, inside the log, sometimes the file run by that target test will be shown, sometimes not. Take
as example, in the after log for If you want to confirm, just check the after logs one more time but this time only search for the class names. |
Does Did you check #1003? It may use JUnit 4. |
It appears that maven surefire plugin version 2.4.3 doesn't support running single test cases (i.e. Based the documentation, it looks like the ability to run a single test case was added in 2.7.3. |
Update Surefire version in |
With or without the maven-surefire update to 2.8 (the min version available on maven central repo closest to 2.7.3), the # Sanity Check: Master (Individually run)
git clone https://github.com/espertechinc/esper
mvn clean install -DskipTests -Dgpg.skip=true
# Sanity Check: com.espertech.esper.regressionrun.suite.epl.TestSuiteEPLSpatial.testEPLSpatialMXCIFQuadTreeEventIndex
mvn -pl regression-run edu.illinois:nondex-maven-plugin:2.1.1:nondex -DnondexRuns=3 -Dtest=com.espertech.esper.regressionrun.suite.epl.TestSuiteEPLSpatial#testEPLSpatialPointRegionQuadTreeEventIndex -DfailIfNoTests=false -Dcheckstyle.skip -Dgpg.skip=true
# Sanity Check: com.espertech.esper.regressionrun.suite.epl.TestSuiteEPLSpatial.testEPLSpatialMXCIFQuadTreeFilterIndex
mvn -pl regression-run edu.illinois:nondex-maven-plugin:2.1.1:nondex -DnondexRuns=3 -Dtest=com.espertech.esper.regressionrun.suite.epl.TestSuiteEPLSpatial#testEPLSpatialPointRegionQuadTreeEventIndex -DfailIfNoTests=false -Dcheckstyle.skip -Dgpg.skip=true
# Sanity Check: Run Entire Test Class
mvn -pl regression-run edu.illinois:nondex-maven-plugin:2.1.1:nondex -DnondexRuns=10 -Dtest=com.espertech.esper.regressionrun.suite.epl.TestSuiteEPLSpatial -DfailIfNoTests=false -Dcheckstyle.skip -Dgpg.skip=true I also checked the commit shas proposed in the PR. Both the commit with the "fix" ac5da5eb84d8861252e4bff5ca6bbcb4898192b6 and the immediate preceding commit 11423a1f7235b26de308490bb2f3483ec3f6ac71, still fail NonDex. The PR has no description, so I do not know why the proposed commits where chosen, nor do I have access to the original logs to investigate why. I can provide logs (CampusWire, SCP to VMs, etc) I am currently reviewing tests for the other changes |
Can you find (1) if and (2) where the tests were fixed? For (1) check if they're passing in the latest SHA. If so, look into (2). If they're failing in the latest SHA, then they should be debugged there. |
The This PR attempts to change too many unrelated things with different sources of flakiness with little to no explanation. I think this should be split into multiple smaller PRs. For example, I believe the reasoning for changing Here is the timeline for this specific test ( (Oct 22, 2018) The test, (Nov 21, 2018) Idoft references the following commit for the flaky test (May 29, 2019) The test, (Dec 18, 2023) The test, I do not understand how The linked commit 2572a764b97241ec8b75e0697c15bc74d97a4177 has no diffs related to the renaming of Moreover, the renaming does not fix the test, it still fails on the master branch. # Sanity Check From Idoft
git checkout 590fa9c9eb854f1420b9d337b802aca19f963cc0 -f
mvn clean install -DskipTests -Dgpg.skip=true -Dcheckstyle.skip=true
mvn edu.illinois:nondex-maven-plugin:2.1.1:clean
mvn -pl regression-run edu.illinois:nondex-maven-plugin:2.1.1:nondex -DnondexRuns=10 -Dtest=com.espertech.esper.regressionrun.suite.infra.TestSuiteInfraNWTable#testInfraNWTableInfraFAF -DfailIfNoTests=false -Dcheckstyle.skip -Dgpg.skip=true > flaky_test_from_idoft_fail_nondex--surefire_2.8--commit--590fa9c9eb854f1420b9d337b802aca19f963cc0--com.espertech.esper.regressionrun.suite.infra.TestSuiteInfraNWTable.testInfraNWTableInfraFAF
# Sanity Check from Latest
git checkout e73f3b763ee3a5e4434b80d5a2e937cb2717755e -f
mvn clean install -DskipTests -Dgpg.skip=true -Dcheckstyle.skip=true
mvn edu.illinois:nondex-maven-plugin:2.1.1:clean
mvn -pl regression-run edu.illinois:nondex-maven-plugin:2.1.1:nondex -DnondexRuns=10 -Dtest=com.espertech.esper.regressionrun.suite.infra.TestSuiteInfraNWTable#testInfraNWTableFAF -DfailIfNoTests=false -Dcheckstyle.skip -Dgpg.skip=true > flaky_test_from_lastest_fails_nondex--surefire_2.8--commit--e73f3b763ee3a5e4434b80d5a2e937cb2717755e--com.espertech.esper.regressionrun.suite.infra.TestSuiteInfraNWTable.testInfraNWTableFAF But my point is, I think this PR should be split into a few smaller PRs with much clearer explanations for each change (or group of changes that have a related explanation) as the description for this PR is getting very big and messy. |
I think there's more issues with this PR: I do not think the following 2 tests should be marked as "Developer Fixed".
I believe this is a false positive. After over 100+ hundred NonDex for the entire TestSuiteExprDateTime test class (without touching maven-surefire) and upgrading maven-surefire to 2.8 to allow individual test runs, I could not get a flaky failures on neither the latest commit nor the replicate the flakiness from Idoft's commit sha. Is the flakiness platform dependent? My setup is on the WSL2 Ubuntu 18.04.6LTS: root@LAPTOP-MR10D0GP:~/esper# java -version
java version "1.8.0_202"
Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)
root@LAPTOP-MR10D0GP:~/esper# mvn -version
Apache Maven 3.9.2 (c9616018c7a021c1c39be70fb2843d6f5f9b8a1c)
Maven home: /root/apache-maven-3.9.2
Java version: 1.8.0_202, vendor: Oracle Corporation, runtime: /root/jdk1.8.0_202/jre
Default locale: en, platform encoding: UTF-8
OS name: "linux", version: "5.10.16.3-microsoft-standard-wsl2", arch: "amd64", family: "unix"
root@LAPTOP-MR10D0GP:~/esper# This makes me wonder, have false positives ever been added to Idoft before? This link that "fixes" these tests doesn't make sense either... Steps to reproduce: # Sanity Check From Idoft
git checkout 590fa9c9eb854f1420b9d337b802aca19f963cc0 -f
mvn clean install -DskipTests -Dgpg.skip=true -Dcheckstyle.skip=true
mvn edu.illinois:nondex-maven-plugin:2.1.1:clean
mvn -pl regression-run edu.illinois:nondex-maven-plugin:2.1.1:nondex -DnondexRuns=10 -Dtest=com.espertech.esper.regressionrun.suite.expr.TestSuiteExprDateTime#testExprDTPerfIntervalOps -DfailIfNoTests=false -Dcheckstyle.skip -Dgpg.skip=true > flaky_test_from_idoft--nondex--surefire_2.8--commit--590fa9c9eb854f1420b9d337b802aca19f963cc0--com.espertech.esper.regressionrun.suite.expr.TestSuiteExprDateTime.testExprDTPerfIntervalOps.log
mvn edu.illinois:nondex-maven-plugin:2.1.1:clean
mvn -pl regression-run edu.illinois:nondex-maven-plugin:2.1.1:nondex -DnondexRuns=10 -Dtest=com.espertech.esper.regressionrun.suite.expr.TestSuiteExprEnum#testExprEnumNamedWindowPerformance -DfailIfNoTests=false -Dcheckstyle.skip -Dgpg.skip=true > flaky_test_from_idoft--nondex--surefire_2.8--commit--590fa9c9eb854f1420b9d337b802aca19f963cc0--com.espertech.esper.regressionrun.suite.expr.TestSuiteExprEnum.testExprEnumNamedWindowPerformance.log
# Run entire test suite without changing pom
mvn clean install -DskipTests -Dgpg.skip=true -Dcheckstyle.skip=true
mvn edu.illinois:nondex-maven-plugin:2.1.1:clean
mvn -pl regression-run edu.illinois:nondex-maven-plugin:2.1.1:nondex -DnondexRuns=10 -Dtest=com.espertech.esper.regressionrun.suite.expr.TestSuiteExprDateTime -DfailIfNoTests=false -Dcheckstyle.skip -Dgpg.skip=true > flaky_test_from_idoft--nondex--surefire_2.8--commit--590fa9c9eb854f1420b9d337b802aca19f963cc0--com.espertech.esper.regressionrun.suite.expr.TestSuiteExprDateTime.log
# Sanity Check from Latest
git checkout master -f
mvn clean install -DskipTests -Dgpg.skip=true -Dcheckstyle.skip=true
mvn edu.illinois:nondex-maven-plugin:2.1.1:clean
mvn -pl regression-run edu.illinois:nondex-maven-plugin:2.1.1:nondex -DnondexRuns=10 -Dtest=com.espertech.esper.regressionrun.suite.expr.TestSuiteExprDateTime#testExprDTPerfIntervalOps -DfailIfNoTests=false -Dcheckstyle.skip -Dgpg.skip=true > flaky_test_from_lastest--nondex--surefire_2.8--commit--e73f3b763ee3a5e4434b80d5a2e937cb2717755e--com.espertech.esper.regressionrun.suite.expr.TestSuiteExprDateTime.testExprDTPerfIntervalOps.log
mvn edu.illinois:nondex-maven-plugin:2.1.1:clean
mvn -pl regression-run edu.illinois:nondex-maven-plugin:2.1.1:nondex -DnondexRuns=10 -Dtest=com.espertech.esper.regressionrun.suite.expr.TestSuiteExprEnum#testExprEnumNamedWindowPerformance -DfailIfNoTests=false -Dcheckstyle.skip -Dgpg.skip=true > flaky_test_from_lastest--nondex--surefire_2.8--commit--e73f3b763ee3a5e4434b80d5a2e937cb2717755e--com.espertech.esper.regressionrun.suite.expr.TestSuiteExprEnum.testExprEnumNamedWindowPerformance.log |
For the similar reasoning the com.espertech.esper.regressionrun.suite.epl.TestSuiteEPLSubselect.testEPLSubselectNamedWindowPerformance should not be marked "DeveloperFixed". I could not get this test to fail NonDex using the commit sha from Idoft or from the latest. Based on my evaluation, these status changes for all 6 seem incorrect. Unless @Doom-Prophet can explain the status changes, I think this PR should be closed with 1 new PR opened for the renaming of the testInfraNWTableInfraFAF to testInfraNWTableFAF? |
My setup is on the WSL2 Ubuntu 22.04.6LTS, for all tests in esper I have to go back to jdk8 to make them run. Also, can I see your logs? I want to see why your nondex fails the tests and what message it got. In my situation, I cannot run nondex directly to check the flakiness as instructed in mp1, I have to rename all other tests in the same class and directly run on the class in the form For the renaming of the testInfraNWTableInfraFAF to testInfraNWTableFAF, when I firstly got this assigned test, I found that "TestSuiteInfraNWTable.java" and Ctrl+F on "testInfraNWTableInfraFAF" directly, and all I found is "testInfraNWTableInfraFAFIndex", how can I know which exactly is this test renamed to? How can you directly suppose that "testInfraNWTableFAF" is the one after renamed? For those "Developer Fixed", when I run them on the latest SHA, nondex passed, then I go to check and see on which past commit SHA they have been revised, git checkout to that, and run nondex again. The test failed at those SHA, so I recorded them as "Developer Fixed", I have logs for that. |
I was told another way to avoid the issue, will this result in different outcomes from nondex? |
I tried both ways and there didn't seem to be a difference, but I was also concerned about this. I used Are you saying It seemed to work without that flag. I will check CampusWire, thank you for letting me know about that as I never would've checked! After reading the CampusWire thread, I re-checked the logs and it turns out there a lot more going on when running these test cases. Though I am not sure just renaming tests is the best solution |
After adding print statements, I can confirm that the runs against NonDex are correctly invoking the targeted test method when upgrading maven surefire to 2.8. However, I still cannot replicate any of the flakiness for those 3 tests mentioned above. If you could give me some sample NonDex seeds with failures, I might be able to replicate the failure. For now the only change I can confirm is the renaming of the test mentioned here: #1221 (comment). I will have to revisit this later, but I do have one question in the mean time: Does maven-surefire single test == renaming tests and running the entire test class (from CampusWire post)? |
Likely but not guaranteed. What does "renaming tests" even mean? Are those tests in JUnit 3 (method name starts with
Yes (but hopefully not too many :). The dataset does have some amount of noise. If we insisted on more checks, more runs, and more logs, students would not get so many "points". Moreover, the key problem is nondeterminism. Many of these cases are hard to reproduce. Even what some students claim now to never fail may have genuinely failed for someone in the past (or may have been a complete false positive). Flaky tests are a real problem due to that nondeterminism...
Feel free to start opening new PRs for easier to accept cases. |
The project supports Junit4, but these are legacy Junit3 tests. I tried refactoring them to Junit4, but still could not replicate flakiness. This attempt was in addition to any other permutations of attempts I tried earlier in the conversation. The renaming trick I mentioned is from the CampusWire post @Doom-Prophet mentioned. I didn’t try it, because I don’t think it is applicable to these tests. I tried the maven useFile flag, which I don’t think is required to see the flakiness, and could not replicate the results in the PR. Overall I cannot replicate @Doom-Prophet results from the PR. The only thing I can confirm is the the rename, which is from different commit (see earlier in the conversation) Perhaps @Doom-Prophet can provide steps to replicate his results? Otherwise I am not sure what the next steps to resolving this PR should be as running these tests are time consuming and I feel as though I am going in circles with this one |
Open a new PR for For tests you can't reproduce, can you give me the exact commands you tried? |
See this comment's "Steps to Reproduce" section: #1221 (comment). I was not using the branch from idoft, but forking the original repo and checking out the commit shas accordingly Note: Before running each NonDex run:
Optionally, if you add print statements for each test, you will see that the test specified in maven surefire is the only test that is ran. The logs don't make it that obvious. I lowered the number of NonDex when posting in the comment to avoid someone accidentally running 100 runs. |
In my maven surefile version, using the flag
And the function call trace will also be shown following the message above, guiding me to see which function is exactly called by that flaky test. All above won't show up if I didn't use that flag. For com.espertech.esper.regressionrun.suite.expr.TestSuiteExprDateTime I used: For com.espertech.esper.regressionrun.suite.epl.TestSuiteEPLSubselect.testEPLSubselectNamedWindowPerformance I used: For com.espertech.esper.regressionrun.suite.infra.TestSuiteInfraNWTable.testInfraNWTableInfraFAF I used: Note that you need to rename all tests except the target one in that class before running the commands. |
I've run Are you focused to find what happened to specific tests that were reported before, or are you trying to add some new tests as well? |
No description provided.