[16.0][FIX] fs_attachment: bound GC cursor with per-transaction timeouts#596
Open
TecnologiaIG wants to merge 1 commit into OCA:16.0
Conversation
The secondary cursor opened by ``FsFileGC._in_new_cursor`` had no
statement or idle-in-transaction timeout, so a slow external storage
backend (observed on Azure Blob, same class of issue on any fsspec
backend with network latency) could leave it parked while waiting on
I/O. The cursor then kept the row lock on ``fs_file_gc`` taken by the
``ON CONFLICT DO NOTHING`` INSERT in ``_mark_for_gc``, serialising
every concurrent attachment write behind the ``store_fname`` unique
constraint.
In production we observed six ``idle in transaction`` cursors holding
this lock for 4-10 minutes each; every queued
``POST /mail/attachment/upload`` and ``/web/binary/upload_attachment``
hit the Odoo session ``statement_timeout`` (900s on Odoo.sh) and
returned a 500 HTML page, which the frontend ``FileInput.uploadFiles``
passes straight to ``JSON.parse``, surfacing as::
Unexpected token '<', "<!DOCTYPE "... is not valid JSON
A ``SET LOCAL statement_timeout = '30s'`` and
``SET LOCAL idle_in_transaction_session_timeout = '60s'`` on the new
cursor cap the damage and let the main transaction fail fast.
Both settings are local to the transaction so they do not leak into
the pooled connection after commit.
Signed-off-by: TecnologiaIG <tecnologia@intensegroupgt.com>
Contributor
Hi @lmignon,
lmignon
reviewed
Apr 22, 2026
Contributor
lmignon
left a comment
Thank you for suggesting this change. However, I am surprised by the error encountered. This issue should only occur if you try to create two files with the same name in the same location in the storage. The likelihood of this happening is virtually zero if the storage is configured to obfuscate filenames and optimize file paths for attachments. But perhaps I am missing something.
| "name": "Base Attachment Object Store", | ||
| "summary": "Store attachments on external object store", | ||
| "version": "16.0.2.0.1", | ||
| "version": "16.0.2.0.2", |
Contributor
The version should not be bumped manually. The merge bot will do it ;-)
Problem

`FsFileGC._in_new_cursor` opens a secondary cursor to record files for garbage collection, but that cursor has no `statement_timeout` nor `idle_in_transaction_session_timeout`. If the external storage backend (fsspec; verified with Azure Blob, and the same failure mode applies to S3 or any high-latency backend) stalls the transaction that opened it, the cursor sits `idle in transaction` while keeping the row lock taken by the `ON CONFLICT DO NOTHING` insert in `_mark_for_gc`.

Because `fs_file_gc.store_fname` has a unique constraint, every concurrent attachment write that tries to mark the same (or sometimes any) row serialises behind that lock until the main session's `statement_timeout` fires (on Odoo.sh the default is 900 s).

When that finally happens, the blocked `POST /mail/attachment/upload` (and `/web/binary/upload_attachment`) returns a 500 with the standard Odoo error HTML. The web client's `FileInput.uploadFiles` pipes that response straight into `JSON.parse`, which surfaces as:

    Unexpected token '<', "<!DOCTYPE "... is not valid JSON

We observed this in production on 2026-04-21: 6
`idle in transaction` cursors (holding the `fs_storage`/`ir_cron`/`res_users` reads that the ORM triggers when opening `env()` on a fresh cursor) locked the table for ~40 min until ops killed them with `pg_terminate_backend`. During that window every helpdesk compose upload hung.

Fix
Add two `SET LOCAL` statements at the top of the cursor context manager:

    SET LOCAL statement_timeout = '30s'
    SET LOCAL idle_in_transaction_session_timeout = '60s'

`SET LOCAL` scopes both values to the current transaction, so they don't leak into the pooled connection on commit. The numbers are generous (an Azure Blob write should complete in under a second) but tight enough that PostgreSQL kills the offending cursor long before the main session's `statement_timeout` can cascade into user-visible failures.

Test plan
- Production environment (`autovacuum_gc` enabled on Azure Blob storage). After deployment: 0 `idle in transaction` zombies holding `fs_file_gc` locks under sustained upload load.
- `SELECT count(*) FROM fs_file_gc` decreases on `@api.autovacuum` runs.
- `_mark_for_gc` raises inside 30 s instead of hanging for 900 s.

Other versions
Happy to forward-port once `16.0` is reviewed; the code is identical on `17.0`/`18.0`/`19.0` as of today's tip. Let me know the preferred workflow.