Saw less data than expected during bulk import test. #1278
Comments
This was seen while analyzing the test run mentioned in #1277.
I opened apache/accumulo-testing#98 with some info about checking the counts after a test.
Doh! I ran the test again and saw lower counts than expected, but I forgot to copy the bulk import files before the test. That is why I opened apache/accumulo-testing#104.
I ran another test in which I copied the files before bulk importing, and of course the issue did not happen.
Closing this out since there has been no activity in over 3 years. It can be reopened if this is still a problem.
Using the new bulk import test, I saw less data than expected during a recent test run. The failure was very odd, and at this point I am not sure whether it was a problem with Accumulo or with the test.
When running the continuous ingest test using bulk import, the expected number of key-values in the table can be easily computed. After running the test I found that the actual count was less than the expected count. After a lot of analysis (which included writing apache/accumulo-testing#97) I found that the output of two map reduce jobs had less data than expected. The data from the other 993 jobs had the expected number of entries.
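As a rough illustration of how the expected count can be computed, here is a minimal sketch using the figures given in this report (12 mappers per job, 833,333 nodes per mapper, and 2 + 993 = 995 jobs). The function and constant names are hypothetical, and it assumes each linked-list node becomes exactly one key-value in the table.

```python
def expected_entries(jobs: int, mappers_per_job: int, nodes_per_mapper: int) -> int:
    """Assumes every linked-list node is written as exactly one key-value."""
    return jobs * mappers_per_job * nodes_per_mapper


# Figures taken from this report; the names are illustrative only.
MAPPERS_PER_JOB = 12
NODES_PER_MAPPER = 833_333
TOTAL_JOBS = 2 + 993  # the two short jobs plus the 993 that looked fine

per_job = expected_entries(1, MAPPERS_PER_JOB, NODES_PER_MAPPER)
total = expected_entries(TOTAL_JOBS, MAPPERS_PER_JOB, NODES_PER_MAPPER)

print(per_job)  # 9999996
print(total)    # 9949996020
```

Comparing the per-job figure against each job's output is what narrows a shortfall down to individual jobs.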
CI bulk MR job generates data in the following way.
In my case I ran 12 mappers, each generating a linked list of 833,333 nodes. So each map reduce job should have created 12*833,333 keys. I confirmed that all reducers generated this many entries by inspecting the MR output. However, once the data was in Accumulo, one of the linked lists was much shorter than 833,333. What is odd is that the entire rest of the linked list was gone. Therefore no holes in the linked list were detected, even though data was missing. One possibility is that two mappers started to generate the same random data.
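To show why duplicate random data would lose entries without producing holes, here is a toy model. It is not the accumulo-testing generator: the table is modelled as a plain dict, each node stores only a pointer to the previous node, and the seeds, lengths, and function names are all hypothetical. Two "mappers" that share a seed emit identical rows, which overwrite each other on ingest.

```python
import random
from typing import Dict, List, Optional, Tuple

Node = Tuple[int, Optional[int]]  # (row, row of the previous node)


def make_list(seed: int, length: int) -> List[Node]:
    """Build one linked list; each node points at the node written before it."""
    rng = random.Random(seed)
    nodes: List[Node] = []
    prev: Optional[int] = None
    for _ in range(length):
        row = rng.getrandbits(63)
        nodes.append((row, prev))
        prev = row
    return nodes


def ingest(lists: List[List[Node]]) -> Dict[int, Optional[int]]:
    """Model the table as a dict: a repeated row replaces the earlier entry."""
    table: Dict[int, Optional[int]] = {}
    for nodes in lists:
        for row, prev in nodes:
            table[row] = prev
    return table


def count_holes(table: Dict[int, Optional[int]]) -> int:
    """A hole is a node whose previous-node pointer is missing from the table."""
    return sum(1 for prev in table.values()
               if prev is not None and prev not in table)


LENGTH = 1000
distinct = ingest([make_list(seed, LENGTH) for seed in (1, 2, 3)])
clashing = ingest([make_list(seed, LENGTH) for seed in (1, 2, 2)])  # shared seed

print(len(distinct), count_holes(distinct))
print(len(clashing), count_holes(clashing))
```

In this model the run with a shared seed ends up with fewer entries, yet a hole check still passes because every surviving pointer refers to a row that exists. That matches the symptom described above only in outline; whether the real generator can repeat data this way is an open question.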