write lock stuck -- tried fsync() #25

Open
jmlzone opened this issue Nov 26, 2020 · 9 comments

Comments

@jmlzone

jmlzone commented Nov 26, 2020

Hi,
I know this is old code without much maintenance, but I thought I would share my issue and a potential solution.
I am using a Raspberry Pi 4 (4 GB) running from a fast USB SSD (not the SD card), with a PL2303 USB-to-serial adapter attached through a USB 3 hub to a USB 3 port on the Pi. It replaces a long line of x86 Linux machines I have run heyu on in the past, so I am an experienced long-term user, and this issue appeared only with the new server hardware.

My issue is that heyu intermittently (and fairly often) gets stuck with the lock file LCK..heyu.write.ttyUSB1 left in place, blocking further access.

I tried to debug this by reading through the locking and unlocking code, and even added a few more debug statements. Of course, with the debug statements in place the failure did not recur, which made me suspect thread timing: this is a fast machine with multiple cores, so the I/O and the threads could be getting out of sync. I have added fsync() calls after the writes to the lock file and after the writes to the serial port,
e.g. in tty.c:
tty.c- if ( (f = fopen(ttydev, "w")) != NULL ) {
tty.c- fprintf(f, " %d\n", (int)getpid() );
tty.c: fsync(f);

and in many places in x10aux.c, xread.c and xwrite.c, with code like that below. So far, no hangs.

xread.c- if ( (!i_am_relay || i_am_aux) && write(sptty, (char *)lbuf , 4) != 4 )
xread.c- {
xread.c: fsync(sptty);

xwrite.c- ignoret = write(sptty, (char *)outbuf, size + 4);
xwrite.c: fsync(sptty);
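
Editorial note on the serial-port calls: on Linux, fsync() on a terminal device may simply fail with EINVAL, so any effect seen here could come from the added delay rather than from real synchronization. The POSIX call that waits until queued output has been transmitted is tcdrain(). The following is a minimal sketch under that assumption, not code from heyu; write_and_drain is a hypothetical helper name.

```c
#include <errno.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper (not part of heyu): write all `len` bytes to fd,
 * then, if fd is a terminal, wait until the driver has sent them.
 * Returns 0 on success, -1 on error (errno is set by the failing call). */
int write_and_drain(int fd, const unsigned char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        done += (size_t)n;          /* handle short writes */
    }

    if (isatty(fd)) {
        /* tcdrain() blocks until all queued output has been transmitted. */
        while (tcdrain(fd) != 0) {
            if (errno != EINTR)
                return -1;
        }
    }
    return 0;
}
```

A call such as write_and_drain(sptty, outbuf, size + 4) would stand in for the write()/fsync() pair shown above, with the return value checked instead of being discarded.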

Since I don't know about the portability and robustness of this fix, I am not proposing to pull my changes at this time, but I wanted to share my findings.

Thanks!
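
Editorial note on the tty.c snippet: fsync() takes an integer file descriptor, while f there is a FILE *, so fsync(f) does not sync the lock file as intended. With stdio, the buffered data first has to be flushed with fflush(), and the descriptor obtained with fileno(). A minimal sketch of that sequence follows; write_pid_durably is a hypothetical name, and only the " %d\n" format is taken from the snippet above.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not part of heyu): write the caller's PID to
 * `path` and push it through to storage.
 * Returns 0 on success, -1 on failure. */
int write_pid_durably(const char *path)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fprintf(f, " %d\n", (int)getpid()) > 0;
    ok = ok && fflush(f) == 0;          /* stdio buffer -> kernel   */
    ok = ok && fsync(fileno(f)) == 0;   /* kernel cache -> storage  */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Whether durability on disk matters for a lock file that other processes read through the page cache is a separate question; the sketch only shows the correct call sequence.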

@gharris999

I'm willing to kick the tires on your proposed changes on x86-64 hardware. Can you post DIFFs?

@jmlzone
Author

jmlzone commented Dec 1, 2020

diffs.txt

The diffs I posted have a lot of debug code and also some forced overrides on some parameters. Mostly just try the lines with the fsync().

Thanks!

@jkrzyszt
Contributor

Please provide more information:

  • steps to reproduce,
  • error messages displayed when further access is blocked (please run Heyu in verbose mode -- command line option -v),
  • whether further access is blocked temporarily or permanently (requires manual recovery).

Thanks,
Janusz

@jmlzone
Author

jmlzone commented Mar 29, 2021 via email

@jkrzyszt
Contributor

jkrzyszt commented Apr 3, 2021

Hi James and all,

I've just pushed a new branch with locking fixes (c67966a) for you to try. Since I have no access to a test environment at the moment, the patch has only been compile tested on Linux, but I hope it can resolve the long-standing issues with stale lock files.

If you'd like to comment on the patch itself, not only on the results of your testing, you can add your comments to pull request #33, which I also created.

Thanks,
Janusz
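
Editorial note: for readers unfamiliar with the stale lock problem, the usual PID lock-file convention can be sketched as below. This is not the code in c67966a or pull request #33; acquire_lock and lock_is_stale are hypothetical names, and the sketch assumes the lock file holds the owner's PID in the " %d\n" format shown earlier in the thread.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: returns 1 only if the file names a PID that no longer exists. */
static int lock_is_stale(const char *path)
{
    FILE *f = fopen(path, "r");
    int pid = 0;

    if (f == NULL)
        return 0;                   /* cannot read it: do not assume stale */
    int n = fscanf(f, "%d", &pid);
    fclose(f);
    if (n != 1 || pid <= 0)
        return 0;                   /* unparsable content: leave it alone */

    /* kill(pid, 0) sends no signal; it only tests whether pid exists. */
    return kill((pid_t)pid, 0) == -1 && errno == ESRCH;
}

/* Hypothetical: returns 0 if the lock was acquired, -1 otherwise. */
int acquire_lock(const char *path)
{
    for (int attempt = 0; attempt < 2; attempt++) {
        /* O_EXCL makes creation atomic: exactly one process succeeds. */
        int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd >= 0) {
            char buf[32];
            int len = snprintf(buf, sizeof buf, " %d\n", (int)getpid());
            int ok = write(fd, buf, (size_t)len) == len;
            if (close(fd) != 0)
                ok = 0;
            if (!ok) {
                unlink(path);       /* do not leave a half-written lock */
                return -1;
            }
            return 0;
        }
        if (errno != EEXIST || !lock_is_stale(path))
            return -1;              /* held by a live process, or an error */
        unlink(path);               /* owner is gone: remove and retry once */
    }
    return -1;
}
```

Removing a stale file and retrying is itself racy when two processes detect the same stale lock at once, so a production version would need an extra safeguard; the sketch only illustrates the basic idea.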

@jmlzone
Author

jmlzone commented Apr 3, 2021 via email

@blackketter

While trying to solve #46, I tried this new branch, but it wouldn't start on my Raspberry Pi 4 running Raspbian. The output looks like this:

Version:2.11-rc3
Searching for '/home/dean/.heyu/x10config'
Searching for '/usr/local/etc/heyu/x10.conf'
Found configuration file '/usr/local/etc/heyu/x10.conf'
Heyu directory /usr/local/etc/heyu/ is writable.
Reading Heyu configuration file '/usr/local/etc/heyu/x10.conf'
Trying to lock (/usr/local/var/lock/LCK..heyu.write.ttyUSB0)
lockpid: Checking for file '/usr/local/var/lock/LCK..LCK..'
lockpid: Checking for file '/usr/local/var/lock/LCK..LCK..'
lockpid: Checking for file '/usr/local/var/lock/LCK..LCK..'
lockpid: Checking for file '/usr/local/var/lock/LCK..LCK..'
lockpid: Checking for file '/usr/local/var/lock/LCK..LCK..'
lockpid: Checking for file '/usr/local/var/lock/LCK..LCK..'

ad infinitum

Any thoughts on how to move past this?

@jmlzone
Author

jmlzone commented Nov 4, 2022 via email

@jkrzyszt
Contributor

jkrzyszt commented Nov 5, 2022

I've just updated my pull request #33 with a fix pushed to the topic/locking branch. Since I have no access to hardware, it is only compile tested. Please download it from https://github.com/HeyuX10Automation/heyu/archive/refs/heads/topic/locking.zip, build, test and report.
