Fix sf_error(NULL) race condition under concurrent open failures#480
Fix sf_error(NULL) race condition under concurrent open failures#480bastibe merged 2 commits intobastibe:masterfrom
Conversation
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
cc @bastibe Lmk if this looks good :) |
|
cc @bastibe Does this PR look good to you? Would really appreciate your review :) |
bastibe
left a comment
There was a problem hiding this comment.
Thank you for this bugfix. Did you observe this bug in the wild, or is this theoretical?
I don't like the way the lock and the error handling is repeated three times. How about we wrap the opening function in an openfunction, much like the isinstance(file, bytes) case (currying with additional virtio-wrapper and args as needed), and call them in a common place in the end?
|
@bastibe Yep, we observed it in vLLM. The related PR on vLLM: vllm-project/vllm#37616 |
|
OK, thank you for the confirmation. As I said, please factor out the open function to avoid the triplication, as in: # ...
_sf_error_lock = _threading.Lock()
def _open(self, file, mode_int, closefd):
"""Call the appropriate sf_open*() function from libsndfile."""
if isinstance(file, (_unicode, bytes)):
if _os.path.isfile(file):
if 'x' in self.mode:
raise OSError("File exists: {0!r}".format(self.name))
elif set(self.mode).issuperset('w+'):
# truncate the file, because SFM_RDWR doesn't:
_os.close(_os.open(file, _os.O_WRONLY | _os.O_TRUNC))
openfunction = _snd.sf_open
if isinstance(file, _unicode):
if _sys.platform == 'win32':
openfunction = _snd.sf_wchar_open
else:
file = file.encode(_sys.getfilesystemencoding())
elif isinstance(file, int):
openfunction = lambda file, mode_int, info: _snd.sf_open_fd(file, mode_int, info, closefd)
elif _has_virtual_io_attrs(file, mode_int):
openfunction = lambda file, mode_int, info: _snd.sf_open_virtual(self._init_virtual_io(file),
mode_int, info, _ffi.NULL)
else:
raise TypeError("Invalid file: {0!r}".format(self.name))
with self._sf_error_lock:
file_ptr = openfunction(file, mode_int, self._info)
if file_ptr == _ffi.NULL:
err = _snd.sf_error(file_ptr)
raise LibsndfileError(err, prefix=f"Error opening {0!r}: ".format(self.name))(this is just a sketch, I didn't test it) |
|
@bastibe Done :) |
|
The Windows errors are not your fault, and are addressed in #477. Is this ready to merge from your end? |
|
@bastibe Oh yes :) I was already trying to figure out the failure 😅 |
|
Thank you very much! |
|
My pleasure :) |
Serialise each
sf_open*()+sf_error(NULL)pair under a class-level lock so the global error state cannot be clobbered between the two calls when multiple threads fail concurrently.Problem
sf_error(NULL)reads a global (non-thread-safe) error variable in libsndfile. When multiple threads concurrently open unsupported formats (e.g. MP4 bytes viasf_open_virtual), one thread can clear the global error before another reads it. This producesLibsndfileErrorwithcode=0and the message"(Garbled error message from libsndfile)"instead of the correct error code.See #479 for a full reproduction script.
Verification
Before fix:
400 concurrent attempts on MP4 bytes:
All codes seen: {0, 1}
Code 0 (race condition) count: 23
After fix:
same test:
All codes seen: {1}
Code 0 (race condition) count: 0
Normal read/write and concurrent reads of valid audio files are unaffected.