Skip to content

Fix sf_error(NULL) race condition under concurrent open failures#480

Merged
bastibe merged 2 commits intobastibe:masterfrom
AndreasKaratzas:akaratza_fix_race_cond
Apr 20, 2026
Merged

Fix sf_error(NULL) race condition under concurrent open failures#480
bastibe merged 2 commits intobastibe:masterfrom
AndreasKaratzas:akaratza_fix_race_cond

Conversation

@AndreasKaratzas
Copy link
Copy Markdown
Contributor

Serialise each sf_open*() + sf_error(NULL) pair under a class-level lock so the global error state cannot be clobbered between the two calls when multiple threads fail concurrently.

Problem

sf_error(NULL) reads a global (non-thread-safe) error variable in libsndfile. When multiple threads concurrently open unsupported formats (e.g. MP4 bytes via sf_open_virtual), one thread can clear the global error before another reads it. This produces LibsndfileError with code=0 and the message "(Garbled error message from libsndfile)" instead of the correct error code.

See #479 for a full reproduction script.

Verification

Before fix:
400 concurrent attempts on MP4 bytes:
All codes seen: {0, 1}
Code 0 (race condition) count: 23

After fix:
same test:
All codes seen: {1}
Code 0 (race condition) count: 0

Normal read/write and concurrent reads of valid audio files are unaffected.

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@AndreasKaratzas
Copy link
Copy Markdown
Contributor Author

cc @bastibe Lmk if this looks good :)

@AndreasKaratzas
Copy link
Copy Markdown
Contributor Author

cc @bastibe

Does this PR look good to you? Would really appreciate your review :)

Copy link
Copy Markdown
Owner

@bastibe bastibe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this bugfix. Did you observe this bug in the wild, or is this theoretical?

I don't like the way the lock and the error handling is repeated three times. How about we wrap the opening function in an openfunction, much like the isinstance(file, bytes) case (currying with additional virtio-wrapper and args as needed), and call them in a common place in the end?

@AndreasKaratzas
Copy link
Copy Markdown
Contributor Author

@bastibe Yep, we observed it in vLLM. The related PR on vLLM: vllm-project/vllm#37616

@bastibe
Copy link
Copy Markdown
Owner

bastibe commented Apr 17, 2026

OK, thank you for the confirmation.

As I said, please factor out the open function to avoid the triplication, as in:

    # ...
    _sf_error_lock = _threading.Lock()

    def _open(self, file, mode_int, closefd):
        """Call the appropriate sf_open*() function from libsndfile."""
        if isinstance(file, (_unicode, bytes)):
            if _os.path.isfile(file):
                if 'x' in self.mode:
                    raise OSError("File exists: {0!r}".format(self.name))
                elif set(self.mode).issuperset('w+'):
                    # truncate the file, because SFM_RDWR doesn't:
                    _os.close(_os.open(file, _os.O_WRONLY | _os.O_TRUNC))
            openfunction = _snd.sf_open
            if isinstance(file, _unicode):
                if _sys.platform == 'win32':
                    openfunction = _snd.sf_wchar_open
                else:
                    file = file.encode(_sys.getfilesystemencoding())
        elif isinstance(file, int):
            openfunction = lambda file, mode_int, info: _snd.sf_open_fd(file, mode_int, info, closefd)
        elif _has_virtual_io_attrs(file, mode_int):
            openfunction = lambda file, mode_int, info: _snd.sf_open_virtual(self._init_virtual_io(file),
                                            mode_int, info, _ffi.NULL)
        else:
            raise TypeError("Invalid file: {0!r}".format(self.name))

        with self._sf_error_lock:
            file_ptr = openfunction(file, mode_int, self._info)
            if file_ptr == _ffi.NULL:
                err = _snd.sf_error(file_ptr)
                raise LibsndfileError(err, prefix=f"Error opening {0!r}: ".format(self.name))

(this is just a sketch, I didn't test it)

@AndreasKaratzas
Copy link
Copy Markdown
Contributor Author

@bastibe Done :)

@bastibe
Copy link
Copy Markdown
Owner

bastibe commented Apr 20, 2026

The Windows errors are not your fault, and are addressed in #477.

Is this ready to merge from your end?

@AndreasKaratzas
Copy link
Copy Markdown
Contributor Author

@bastibe Oh yes :) I was already trying to figure out the failure 😅

@bastibe bastibe merged commit 13f8e20 into bastibe:master Apr 20, 2026
16 of 30 checks passed
@bastibe
Copy link
Copy Markdown
Owner

bastibe commented Apr 20, 2026

Thank you very much!

@AndreasKaratzas AndreasKaratzas deleted the akaratza_fix_race_cond branch April 20, 2026 17:02
@AndreasKaratzas
Copy link
Copy Markdown
Contributor Author

My pleasure :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants