Saw less data than expected during bulk import test. #1278

Closed
keith-turner opened this issue Jul 18, 2019 · 5 comments

Comments

@keith-turner
Contributor

keith-turner commented Jul 18, 2019

Using the new bulk import test, I saw less data than expected during a recent test run. The failure was very odd, and at this point I am not sure whether it was a problem with Accumulo or with the test.

When running the continuous ingest test using bulk import, the expected number of key values in the table can be easily computed. After running the test I found that the actual count was less than expected. After a lot of analysis (which included writing apache/accumulo-testing#97) I found that the output of two map reduce jobs had less data than expected. The data from the other 993 jobs had the expected number of entries.

The CI bulk MR job generates data in the following way:

  • Each mapper generates a random linked list
  • The reducer bins and sorts data from all mappers by tablet.
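The two steps above can be pictured with a toy model. This is only a rough sketch, not the actual accumulo-testing code: the names `mapper` and `reducer`, the 63-bit random keys, and the integer split points are all assumptions made for illustration. Each entry is treated as ending up in the tablet whose end row is the first split point greater than or equal to its key.

```python
import bisect
import random


def mapper(seed, length):
    """Yield (key, prev_key) pairs forming one random linked list (hypothetical model)."""
    rng = random.Random(seed)
    prev = None
    for _ in range(length):
        key = rng.getrandbits(63)  # assumed key space, for illustration only
        yield key, prev            # each node points back at the previous node
        prev = key


def reducer(entries, split_points):
    """Bin entries by tablet, then sort each bin by key (hypothetical model)."""
    bins = [[] for _ in range(len(split_points) + 1)]
    for key, prev in entries:
        # bisect_left sends a key equal to a split point to the earlier bin,
        # treating the split point as that tablet's inclusive end row.
        bins[bisect.bisect_left(split_points, key)].append((key, prev))
    return [sorted(b, key=lambda e: e[0]) for b in bins]
```

In this model the total number of entries across all bins is simply the number of mappers times the list length, as long as no two mappers emit the same key.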

In my case I ran 12 mappers, each generating a linked list of 833,333 nodes, so each map reduce job should have created 12*833,333 keys. I confirmed that all reducers generated this many entries by inspecting the MR output. However, one of the linked lists that eventually ended up in Accumulo was much shorter than 833,333. What is odd is that the entire rest of that linked list was gone, so no holes in the linked list were detected even though data was missing. One possibility is that two mappers started to generate the same random data.
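To make the arithmetic and the last hypothesis concrete, here is a small simulation. It is a sketch under stated assumptions, not a diagnosis: it takes 833,333 nodes per mapper as the intended figure, models keys as 63-bit random integers, and uses hypothetical helper names (`expected_entries`, `generate_list`, `simulate`). It shows that if two mappers draw from the same random stream, their entries collapse onto the same keys, so the table ends up short by a whole list while every back pointer still resolves.

```python
import random

MAPPERS = 12
NODES_PER_MAPPER = 833_333


def expected_entries(mappers=MAPPERS, nodes=NODES_PER_MAPPER):
    """Expected number of keys produced by one map reduce job."""
    return mappers * nodes


def generate_list(seed, length):
    """Return {key: prev_key} for one random linked list (hypothetical model)."""
    rng = random.Random(seed)
    entries = {}
    prev = None
    for _ in range(length):
        key = rng.getrandbits(63)  # assumed key space, for illustration only
        entries[key] = prev
        prev = key
    return entries


def simulate(seeds, length):
    """Merge all lists into one table; return (unique key count, hole count)."""
    table = {}
    for seed in seeds:
        table.update(generate_list(seed, length))  # identical keys overwrite
    # A hole is a node whose predecessor key is missing from the table.
    holes = sum(1 for prev in table.values()
                if prev is not None and prev not in table)
    return len(table), holes


if __name__ == "__main__":
    print(expected_entries())          # 12 * 833,333 = 9,999,996
    seeds = list(range(MAPPERS))
    seeds[-1] = seeds[0]               # two mappers share the same random stream
    print(simulate(seeds, 1000))       # fewer keys than 12 * 1000, zero holes
```

A check that only looks for holes would pass in this scenario, which is why comparing the final count against the expected count is useful as a separate check.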

@keith-turner
Contributor Author

This was seen while analyzing the test run mentioned in #1277

@keith-turner
Contributor Author

I opened apache/accumulo-testing#98 with some info about checking the counts after a test.

@keith-turner
Contributor Author

Doh! I ran the test again and saw lower counts than expected, but I forgot to copy the bulk import files before the test. That is why I opened apache/accumulo-testing#104

@keith-turner
Contributor Author

I ran another test in which I copied the files before bulk importing and of course the issue did not happen.

@cshannon
Contributor

cshannon commented Dec 3, 2022

Closing this out since there has been no activity in over 3 years. It can be reopened if this is still a problem.

@cshannon closed this as not planned on Dec 3, 2022