Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicitly close some file handles #201

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

atoomic
Copy link
Collaborator

@atoomic atoomic commented Nov 6, 2020

After updating to v1.000038 from v1.000000 we can start noticing some Too many open files issues.
I did a quick review of the change from v1.000000..v1.000038 and check all new introduced ~open
and add an extra explicit close for them

@exodist
Copy link
Member

exodist commented Nov 6, 2020

Can you explain to me why this is necessary, with exception of $wfh in one of the files, most of these look like the scope should end the filehandle anyway, so they seem unnecessary. Am I wrong about when filehandles are cleaed up, is it at sub exit instead of scope exit?

@atoomic
Copy link
Collaborator Author

atoomic commented Nov 6, 2020

@exodist IMO the fh should also close when exiting the scope. So yes most of them are not necessary.
As updated in the description, after updating to v1.000038 from v1.000000 we can start noticing some Too many open files issues. I did a quick pass to check all added open during these iterations and ensure they are close as soon as possible.

I agree some close are not necessary, but I also realized that the existing code was doing a pretty good job of using explicit close when needed, so I decided to add them too.

@exodist
Copy link
Member

exodist commented Nov 6, 2020

ah, that makes sense. Did this patch solve your "too many handles" issue?

@atoomic
Copy link
Collaborator Author

atoomic commented Nov 6, 2020

I'm currently working on this right now, the answer is not obvious as this is a flapping issue... that we start noticing since the update... so might need a few extra days to confirm if it fixes it or not

@exodist
Copy link
Member

exodist commented Nov 6, 2020

ok, either way I have no issues with this patch, I think it should be a no-op, explicit close is fine though too. But if you are still working on the problem I will keep it open for any additional patches you may want to attach.

@atoomic
Copy link
Collaborator Author

atoomic commented Nov 7, 2020

this is not good enough, I can still see some random issues while running our test suite
need to keep investigating

@toddr
Copy link
Collaborator

toddr commented Nov 7, 2020

I wonder if there's a refcount issue introduced recently that might be the source of the problem. is something keeping track of dead handles in a way that's maybe not getting freed?

FTR we have about 4000 test files and are running at -j12

@toddr
Copy link
Collaborator

toddr commented Nov 7, 2020

just got:

19:31:42  Could not open events dir: Too many open files at Test2/Harness/Collector/JobDir.pm line 436, <$fh> line 1.

@toddr toddr changed the title Explicitely close some file handles Explicitly close some file handles Nov 7, 2020
@toddr
Copy link
Collaborator

toddr commented Nov 7, 2020

Actually the problem happened on 1.000000. So this may go back to a problem between 0.999009 and 1.000000

@exodist
Copy link
Member

exodist commented Nov 7, 2020

Are open files limited by user, process, or system?

@toddr
Copy link
Collaborator

toddr commented Nov 7, 2020

On most Linux systems, the default for compiled binaries is a max of 1024 file handles per process.

@toddr
Copy link
Collaborator

toddr commented Nov 7, 2020

Don't worry about this for now. I have a way to find the root cause on monday. it should be easy to detect. I just need to check /proc/$$/fd and find out which parent yath process is hoarding file descriptors since they're being passed to the children on fork.

@toddr toddr marked this pull request as draft November 7, 2020 01:46
@toddr
Copy link
Collaborator

toddr commented Nov 7, 2020

I assume if I can tell you what files aren't closing, then it'll be easy to determine the root cause?

@exodist
Copy link
Member

exodist commented Nov 7, 2020

Potentially. I would focus on the collector and the runner, they are the ones that need the most filehandles. The runner creates the handles for the children when it spawns them, but should let them go right after. The collector opens all the files produced by tests, but should let them go when a test completes.

My hunch is that either the collector is not letting go of file handles when it is done with them (unexpected references kept?), Or with 12 processes running there is a combination of 12 of your tests that combined with what yath uses result in too many files. Maybe even 1 or 2 filehandle heavy tests?

@atoomic
Copy link
Collaborator Author

atoomic commented Nov 9, 2020

Thanks @exodist for these details.
This is an issue from the collector when we run too many tests / too fast...
The collector, just try to open every files without any limits...

We could limit the number of jobs opened to avoid(/fix) the issue
Note that when using retry the collector is also keeping open FH...
this is probably something we can ignore for now...

Making sure sub jobs stop polling from the queue when having for example 200 jobs fixes the issue.
IMO this is fine to have that limit hard coded to 200 in the collector.
This is not making the test run slower, just limiting the number of tests we can read at the same time...
and as it s for human eyes... I m not sure I can catch up with 200 jobs lol

@atoomic
Copy link
Collaborator Author

atoomic commented Nov 9, 2020

The real fix is #203, this is just good to have

Copy link
Collaborator Author

@atoomic atoomic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pointed the useless close|closedir which should be removed from that PR
leaving a few valuable earlier closer
need to adjust and resubmit

@@ -33,6 +33,7 @@ BEGIN {
if ($no_stat{lc($^O)}) {
opendir(my $dh, $dir) or die "$!";
my $key = join ':' => sort readdir($dh);
closedir($dh);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

useless

@@ -318,6 +318,7 @@ sub find_libraries {
next unless -f $path && $path =~ m/\.pm$/;
push @found => [$path, $prefix];
}
closedir($dh);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

useless

@@ -129,6 +129,7 @@ sub yath {
push @lines => @new;
print map { chomp($_); "DEBUG: > $_\n" } @new if $debug > 1;
}
close($rh);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

useless

@@ -374,6 +374,9 @@ sub update_io {
my $msg = "$_[0] at $caller[1] line $caller[2] ($caller2[1] line $caller2[2]).\n";
print $stderr $msg;
print STDERR $msg;
close($out_fh);
close($err_fh);
close($in_fh) if $in_fh;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

useless

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants