Skip to content

[16.0][FIX] fs_attachment: bound GC cursor with per-transaction timeouts#596

Open
TecnologiaIG wants to merge 1 commit intoOCA:16.0from
TecnologiaIG:16.0-fs_attachment-fix-gc-cursor-timeout
Open

[16.0][FIX] fs_attachment: bound GC cursor with per-transaction timeouts#596
TecnologiaIG wants to merge 1 commit intoOCA:16.0from
TecnologiaIG:16.0-fs_attachment-fix-gc-cursor-timeout

Conversation

@TecnologiaIG
Copy link
Copy Markdown

Problem

FsFileGC._in_new_cursor opens a secondary cursor to record files for garbage collection, but that cursor has no statement_timeout nor idle_in_transaction_session_timeout. If the external storage backend (fsspec — verified with Azure Blob, same failure mode applies to S3 or any high-latency backend) stalls the transaction that opened it, the cursor sits idle in transaction while keeping the row lock taken by:

INSERT INTO fs_file_gc (store_fname, ...) VALUES (...) ON CONFLICT DO NOTHING

Because fs_file_gc.store_fname has a unique constraint, every concurrent attachment write that tries to mark the same (or sometimes any) row serialises behind that lock until the main session's statement_timeout fires — on Odoo.sh the default is 900 s.

When that finally happens, the blocked POST /mail/attachment/upload (and /web/binary/upload_attachment) returns a 500 with the standard Odoo error HTML. The web client's FileInput.uploadFiles pipes that response straight into JSON.parse, which surfaces as:

UncaughtPromiseError > SyntaxError
Unexpected token '<', "<!DOCTYPE "... is not valid JSON
    at JSON.parse (<anonymous>)
    at FileInput.uploadFiles

We observed this in production 2026-04-21: 6 idle in transaction cursors (holding the fs_storage/ir_cron/res_users reads that the ORM triggers when opening env() on a fresh cursor) locked the table for ~40 min until ops killed them with pg_terminate_backend. During that window every helpdesk compose upload hung.

Fix

Add two SET LOCAL statements at the top of the cursor context manager:

cr.execute("SET LOCAL statement_timeout = '30s'")
cr.execute("SET LOCAL idle_in_transaction_session_timeout = '60s'")

SET LOCAL scopes both values to the current transaction, so they don't leak into the pooled connection on commit. The numbers are generous (an Azure Blob write should complete in under a second) but tight enough that PostgreSQL kills the offending cursor long before the main session's statement_timeout can cascade into user-visible failures.

Test plan

  • Validated in production (Odoo.sh, DB with autovacuum_gc enabled on Azure Blob storage). After deployment: 0 idle in transaction zombies holding fs_file_gc locks under sustained upload load.
  • Happy-path GC still works: SELECT count(*) FROM fs_file_gc decreases on @api.autovacuum runs.
  • Simulated backend stall: block egress to storage briefly, verify _mark_for_gc raises inside 30 s instead of hanging 900 s.

Other versions

Happy to forward-port once 16.0 is reviewed — the code is identical on 17.0/18.0/19.0 as of today's tip. Let me know the preferred workflow.

The secondary cursor opened by ``FsFileGC._in_new_cursor`` had no
statement or idle-in-transaction timeout, so a slow external storage
backend (observed on Azure Blob, same class of issue on any fsspec
backend with network latency) could leave it parked while waiting on
I/O. The cursor then kept the row lock on ``fs_file_gc`` taken by the
``ON CONFLICT DO NOTHING`` INSERT in ``_mark_for_gc``, serialising
every concurrent attachment write behind the ``store_fname`` unique
constraint.

In production we observed six ``idle in transaction`` cursors holding
this lock for 4-10 minutes each; every queued
``POST /mail/attachment/upload`` and ``/web/binary/upload_attachment``
hit the Odoo session ``statement_timeout`` (900s on Odoo.sh) and
returned a 500 HTML page, which the frontend ``FileInput.uploadFiles``
passes straight to ``JSON.parse``, surfacing as::

    Unexpected token '<', "<!DOCTYPE "... is not valid JSON

A ``SET LOCAL statement_timeout = '30s'`` and
``SET LOCAL idle_in_transaction_session_timeout = '60s'`` on the new
cursor cap the damage and let the main transaction fail fast.
Both settings are local to the transaction so they do not leak into
the pooled connection after commit.

Signed-off-by: TecnologiaIG <tecnologia@intensegroupgt.com>
@OCA-git-bot
Copy link
Copy Markdown
Contributor

Hi @lmignon,
some modules you are maintaining are being modified, check this out!

Copy link
Copy Markdown
Contributor

@lmignon lmignon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for suggesting this change. However, I am surprised by the error encountered. This issue should only occur if you try to create two files with the same name in the same location in the storage. The likelihood of this happening is virtually zero if the storage is configured to obfuscate filenames and optimize file paths for attachments. But perhaps I am missing something.

"name": "Base Attachment Object Store",
"summary": "Store attachments on external object store",
"version": "16.0.2.0.1",
"version": "16.0.2.0.2",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version should not be bump manually. The merge bot will do it ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants