Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use flock() instead of UUCP-style locking for serial devices #1770

Open
wants to merge 44 commits into
base: master
Choose a base branch
from

Conversation

taw10
Copy link

@taw10 taw10 commented Feb 7, 2022

This replaces AcquireUUCPLockAndOpen() with AcquireLockAndOpen(). Depending on a compile-time option (--enable-uucp-locking), either flock() or UUCP-style lockfiles will be used as the underlying lock mechanism. The default is to use flock(). The configure script will fall back on UUCP locking if flock() is not found.

It would also be possible to preserve the API by naming the top-level open/lock routine AcquireUUCPLockAndOpen, but then the routine name would not be an accurate description. I've updated the plugins using these routines (only two of them, StageProfi and UsbPro).

ReleaseUUCPLock() is still present, becoming a no-op if flock() is being used. With flock(), the lock is automatically released when the file descriptor is closed.

In practice, the TIOCEXCL lock (when available) is much more helpful, because it prevents any other process from opening the device and doesn't rely on all processes using the same locking scheme. This lock is still used when flock() is available.

Fixes #1189, #1674 and many related problems on various platforms (e.g. running in a Toolbox container on Fedora Silverblue).

This replaces AcquireUUCPLockAndOpen() with AcquireLockAndOpen().
Depending on a compile-time option (--enable-uucp-locking), either
flock() or UUCP-style lockfiles will be used as the underlying lock
mechanism.  The default is flock().

The plugins using these routines have been updated.
Copy link
Member

@peternewman peternewman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @taw10 ,

Thanks for this!

There are some lint issues if you could fix them please:
https://app.travis-ci.com/github/OpenLightingProject/ola/jobs/558598505#L1483-L1487

If possible a test for this would be awesome! I.e. lock a file and try and open it again and confirm it fails etc. Which will then let people know during make test if their chosen locking hasn't worked.

common/io/Serial.cpp Outdated Show resolved Hide resolved
@@ -27,6 +27,7 @@
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/file.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has sys/file.h existed forever, so we're not breaking UUCP builds which don't have it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, I'll investigate. However the header is checked by the configure script, and used without any #ifdefs in one other place (KarateLight).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay, obviously that plugin breaking wouldn't have quite so much impact, but the plugin has been around for a while and no-one has complained so we're probably good...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

he flock call itself conforms to 4.4BSD, i.e. it's been around since 1997. So I think we're pretty safe here.

I could add guards around the #include (ifdef HAVE_SYS_FILE_H), but if it's worked so far then I don't see any reason to add the extra complexity. On inspection, I'm not actually sure why KarateLight includes this header at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah very good point about the fact it's worked!

I'm not sure about KarateLight either; @cpresser ?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I am not sure what to answer here. My C++ is quite rusty, I have not used it in years and sure have no idea which include contains which functionality.

I recall that I did include file locking of the serial-port-device in the karatelight code. A quick search turned up this piece of code here:
https://github.com/OpenLightingProject/ola/blob/master/plugins/karate/KarateLight.cpp#L121

Does that help?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The #include <sys/file.h> has been there since the first version of KarateDevice.cpp (f69a26b), but the flock call is in a different file (KarateLight.cpp). I suspect the flock calls were originally in KarateDevice.cpp and got moved.

It does mean, however, that there's not only the #include but also a real use of flock() in OLA already without any tests! Therefore we're probably pretty safe using it for all serial devices. Nevertheless, we'll keep the tests and the conditionals :)

common/io/Serial.cpp Outdated Show resolved Hide resolved
common/io/Serial.cpp Outdated Show resolved Hide resolved
common/io/Serial.cpp Outdated Show resolved Hide resolved
common/io/Serial.cpp Outdated Show resolved Hide resolved
include/ola/io/Serial.h Outdated Show resolved Hide resolved
common/io/Serial.cpp Outdated Show resolved Hide resolved
@taw10
Copy link
Author

taw10 commented Feb 13, 2022

Two more things still to do (add a test and check the situation with sys/file.h), but this should cover most of the comments. The lint errors should be fixed (I may have introduced new errors, let's see). I've also renamed the top-level lock/unlock routines, because "ReleaseLock" seemed too vague. Now it's ReleaseSerialPortLock.

Note that in the flock() case, the unit test will really only test the TIOCEXCL locking, since that will prevent the open() call from succeeding before we try to flock(). Working round that would introduce some nasty complexity. But, as long as one of the locking mechanisms works, it's OK. With UUCP lockfiles, the check is done before we try to open the port itself.

common/io/Serial.cpp Outdated Show resolved Hide resolved
common/io/Serial.cpp Outdated Show resolved Hide resolved
include/ola/io/Serial.h Outdated Show resolved Hide resolved
include/ola/io/Serial.h Outdated Show resolved Hide resolved
include/ola/io/Serial.h Outdated Show resolved Hide resolved
common/io/Serial.cpp Outdated Show resolved Hide resolved
Copy link
Member

@peternewman peternewman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more tweaks

plugins/usbpro/BaseUsbProWidget.cpp Outdated Show resolved Hide resolved
@peternewman
Copy link
Member

Two more things still to do (add a test and check the situation with sys/file.h), but this should cover most of the comments.

Great thanks.

The lint errors should be fixed (I may have introduced new errors, let's see).

Only the one! 😆

I've also renamed the top-level lock/unlock routines, because "ReleaseLock" seemed too vague. Now it's ReleaseSerialPortLock.

As mentioned in the comment, won't this lock work on any file descriptor? Maybe the UUCP stuff is only implemented for devices?

Note that in the flock() case, the unit test will really only test the TIOCEXCL locking, since that will prevent the open() call from succeeding before we try to flock(). Working round that would introduce some nasty complexity. But, as long as one of the locking mechanisms works, it's OK.

Exactly, I'm more interested in catching an issue during CI/build and around our logic than I am bothered about thoroughly testing this particular system feature. If we prove we can't open a file twice due to the lock then that's a good test as far as I'm concerned.

With UUCP lockfiles, the check is done before we try to open the port itself.

Yep, understood.

@taw10
Copy link
Author

taw10 commented Feb 19, 2022

Unit test is still to come (I have to get myself acquainted with the test system first). One more question: would you prefer to have these commits squashed down to at least three (Codespell fix, unit test, everything else)? The overall diff is very small and understandable. Or do you prefer to have the complete history?

@taw10
Copy link
Author

taw10 commented Feb 19, 2022

I've added a test, but it's not very satisfactory for a couple of reasons. Let me know what you think!

The problem is that we don't have a real serial port to test with, and the TIOCEXCL lock won't work with a normal file. To avoid creating a new file, I used the ChangeLog file for the test since it's always present at the top level of the project. This feels bad to me, but the alternatives all seem less portable. Maybe you or someone else has a suggestion.

To prevent the normal file from failing the test, I changed the two locking routines to not fail if ioctl(fd, TIOCEXCL) fails. Perhaps this is an improvement, because we only need one of the three locking mechanisms to work (flock, UUCP or TIOCEXCL), but it also feels bad.

taw10 and others added 3 commits March 5, 2022 10:30
Co-authored-by: Peter Newman <peternewman@users.noreply.github.com>
Co-authored-by: Peter Newman <peternewman@users.noreply.github.com>
Co-authored-by: Peter Newman <peternewman@users.noreply.github.com>
@taw10
Copy link
Author

taw10 commented Mar 5, 2022

The problem is that we don't have a real serial port to test with, and the TIOCEXCL lock won't work with a normal file.

Is there any reason not to try locking a real serial port too if one exists? See the messaging ability at the bottom of this comment it could just fail saying maybe the port is in use. Especially if it tested ALL ports on a machine there's a good chance one would be available...

The idea of poking real hardware as part of make check worries me - we don't know what's on the other end of the serial port. Plus I'm not sure I want to write a load of serial port enumeration stuff (probably Linux-specific as well) just for this purpose.

Would it make sense to split each of the individual lock mechanism functions out, then you could have it like so (or whatever order it's currently in):

CheckFileExists
#ifdef flock
LockFlock
#else
LockUUCP
#endif
LockTIOCEXCL

So you could then test the constituent parts individually which would work around this issue.

I'll implement this approach. This way, we have a new locking mechanism (flock) and a test that it works, the UUCP stuff is still there as a fallback, and the TIOCEXCL is on top of either without a test. This is already a big improvement over the old situation (legacy UUCP locking without a test). The KarateLight experience (see above) suggests that the UUCP locking will never be used, so the idea of testing UUCP might be purely theoretical anyway.

Presumably the test failing/bombing out once before deletion would cause all subsequent tests to fail afterwards?

Unfortunately yes. I'll add some cleanup so that the file gets removed in any case, and a message to make it clear that the file already exists. However I want to avoid deleting a random user file in the event that my choice of filename wasn't "obscure" enough.

Update pending...

UUCP locking: performed if requested at configure time
flock(): performed if the flock call is available (always?)
ioctl(fd, TIOCEXCL): performed if sys/ioctl.h is available

It's possible that all three will be done, but only one of them really
has to work.  If any of the selected methods fail, however, the port
opening will be aborted.
@taw10
Copy link
Author

taw10 commented Mar 6, 2022

Ok, AcquireLockAndOpenSerialPort() now does this:

check port exists
if UUCP: acquire UUCP lock
open port
if FLOCK: apply flock()
if HAVE_IOCTL: apply TIOCEXCL

If any stage fails, it will bail out, closing the port and releasing UUCP lock if applicable. So, all of the available locking mechanisms (potentially three of them, see below) have to succeed. The bailout logic is a little more complicated than I'd like, because flock() happens after opening, whereas UUCP happens before.

AcquireUUCPLockAndOpen() is still available, although no-one should be using it. It does this:

check port exists
acquire UUCP lock
open port
if HAVE_IOCTL: apply TIOCEXCL

I've split off acquire UUCP lock and apply TIOCEXCL as separate routines.

Each step prints at least an OLA_INFO on success, so you can see which locks are used. If configured with --enable-uucp-locking and flock() is available, it will actually do both - there's no reason not to use flock() if the routine is available. The information about KarateLight suggests that flock() is always available, which would lead me to suggest completely removing the UUCP locking. But perhaps there are systems still relying on it.

The test program (SerialLockTester) tests that the flock() call succeeds, but not that it prevents a second lock. That would be a lot of trouble to test, because the second flock would have to be from a different process. I consider it more a simple test that the routine really exists - it doesn't seem unreasonable to rely on it doing what the specification says it should! It now uses a PID-based filename, to prevent one failure from causing subsequent ones. UUCP locking and TIOCEXCL are not tested at all because of the difficulties noted above, but it's still overall an improvement on the previous situation.

Copy link
Member

@peternewman peternewman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor tweaking

common/io/Serial.cpp Outdated Show resolved Hide resolved
common/io/Serial.cpp Show resolved Hide resolved
common/io/Serial.cpp Outdated Show resolved Hide resolved
common/io/SerialLockTester.cpp Outdated Show resolved Hide resolved
@peternewman
Copy link
Member

The idea of poking real hardware as part of make check worries me - we don't know what's on the other end of the serial port.

I think we do already as part of our autostart tests TBH (but I can see your point)...

Plus I'm not sure I want to write a load of serial port enumeration stuff (probably Linux-specific as well) just for this purpose.

I think that code should already exist as we already do essentially that in some of the plugins that will use this new locking!

I'll implement this approach. This way, we have a new locking mechanism (flock) and a test that it works, the UUCP stuff is still there as a fallback, and the TIOCEXCL is on top of either without a test. This is already a big improvement over the old situation (legacy UUCP locking without a test). The KarateLight experience (see above) suggests that the UUCP locking will never be used, so the idea of testing UUCP might be purely theoretical anyway.

Yeah fair point.

Presumably the test failing/bombing out once before deletion would cause all subsequent tests to fail afterwards?

Unfortunately yes. I'll add some cleanup so that the file gets removed in any case, and a message to make it clear that the file already exists. However I want to avoid deleting a random user file in the event that my choice of filename wasn't "obscure" enough.

Lovely thanks. That should be good enough, as long as the test fails with a non-cryptic message it shouldn't generate too many issues (judging by the auto-start stuff when it's already running as a service which obviously breaks more often).

Comment on lines 94 to 95
* Depending on the compile-time configuration, this will use either flock()
* (the default) or UUCP locking. See: ./configure --enable-uucp-locking.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this now doing both/all?

* If UUCP locking was used (see AcquireLockAndOpenSerialPort()), the lockfile
* will be removed (but only if the PID matches).
*
* Does nothing if flock() was used (see ./configure --enable-uucp-locking).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again can't this do all/more now?

@peternewman
Copy link
Member

Ok, AcquireLockAndOpenSerialPort() now does this:
If any stage fails, it will bail out, closing the port and releasing UUCP lock if applicable. So, all of the available locking mechanisms (potentially three of them, see below) have to succeed. The bailout logic is a little more complicated than I'd like, because flock() happens after opening, whereas UUCP happens before.

Amazing!

AcquireUUCPLockAndOpen() is still available, although no-one should be using it. It does this:

Cool.

I've split off acquire UUCP lock and apply TIOCEXCL as separate routines.

Each step prints at least an OLA_INFO on success, so you can see which locks are used. If configured with --enable-uucp-locking and flock() is available, it will actually do both - there's no reason not to use flock() if the routine is available.

👍

The information about KarateLight suggests that flock() is always available, which would lead me to suggest completely removing the UUCP locking. But perhaps there are systems still relying on it.

Yeah given some of the random architectures we've compiled on, they might just not have the KarateLight plugin included/compiled...

The test program (SerialLockTester) tests that the flock() call succeeds, but not that it prevents a second lock. That would be a lot of trouble to test, because the second flock would have to be from a different process.

Ah yeah that would be quite a hassle! Although we could have a shell script wrapper which runs two copies if that's easy enough (like we do for some Python stuff), but as you say may not be worth the hassle.

I consider it more a simple test that the routine really exists - it doesn't seem unreasonable to rely on it doing what the specification says it should! It now uses a PID-based filename, to prevent one failure from causing subsequent ones. UUCP locking and TIOCEXCL are not tested at all because of the difficulties noted above, but it's still overall an improvement on the previous situation.

Yeah sounds great!

taw10 and others added 8 commits March 20, 2022 10:11
Co-authored-by: Peter Newman <peternewman@users.noreply.github.com>
Co-authored-by: Peter Newman <peternewman@users.noreply.github.com>
Co-authored-by: Peter Newman <peternewman@users.noreply.github.com>
Co-authored-by: Peter Newman <peternewman@users.noreply.github.com>
@taw10
Copy link
Author

taw10 commented Mar 20, 2022

Ok, some updates in addition to your suggested changes (which were all applied):

  • Made sure that all the #endifs have comments
  • Fixed the comments about the behaviour of AcquireLockAndOpenSerialPort and ReleaseSerialPortLock
  • Made sure that RemoveUUCPLockFile() happens after close() (as it was before)
  • Added missing ReleaseUUCPLock() calls on the error paths of AcquireLockAndOpenSerialPort

taw10 and others added 2 commits November 11, 2022 16:45
Co-authored-by: Peter Newman <peternewman@users.noreply.github.com>
@taw10
Copy link
Author

taw10 commented Nov 13, 2022

The CI is failing, but the error doesn't make any sense to me (ERROR: No matching distribution found for setuptools>=64). Any suggestions?

@DaAwesomeP
Copy link
Member

@peternewman should this go into 0.10 or master?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UUCP locking step fail in archlinuxARM
4 participants