JIT Token Expiration with Long-Running Sequential Workflows
Problem Summary
When running GitHub Actions workflows with max-parallel: 1 and long-running sequential jobs (total runtime > 60 minutes), JIT (Just-In-Time) runner tokens expire after ~60 minutes, causing jobs to fail with a "The operation was canceled" error.
This is a fundamental limitation when:
Total workflow runtime exceeds JIT token lifetime (~60 minutes)
Jobs must run sequentially (max-parallel: 1)
Using ephemeral JIT-configured self-hosted runners
When the token expires:
Expiration: After 60 minutes, GitHub invalidates the runner registration
Job Cancellation: Any job using that runner gets "The operation was canceled"
The Math Problem
```
N jobs × M minutes each = Total runtime
JIT token lifetime = 60 minutes

If Total runtime > 60 minutes:
  Jobs 1 to floor(60/M):   complete successfully ✅
  Jobs floor(60/M)+1 to N: fail with expired token ❌
```
Example with 6-minute jobs:
37 jobs × 6 minutes = 222 minutes total
Jobs 1-10: Complete within 60 min window ✅
Jobs 11-37: Start after token expiration ❌
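The cutoff above can be sketched in a few lines of Python (`job_outcomes` is an illustrative helper, not part of any runner API):

```python
import math

def job_outcomes(n_jobs, minutes_per_job, token_lifetime=60):
    """Sequential jobs: job k starts at (k-1) * minutes_per_job.

    Jobs whose index exceeds floor(token_lifetime / minutes_per_job)
    start at or after token expiry and are canceled.
    """
    cutoff = min(n_jobs, math.floor(token_lifetime / minutes_per_job))
    succeed = list(range(1, cutoff + 1))
    fail = list(range(cutoff + 1, n_jobs + 1))
    return succeed, fail

succeed, fail = job_outcomes(37, 6)
print(f"succeed: 1-{succeed[-1]}, fail: {fail[0]}-{fail[-1]}")  # succeed: 1-10, fail: 11-37
```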
Why Current Architecture Fails
The serverless runner typically:
Receives N webhooks simultaneously when workflow triggers
Fetches N JIT configs immediately (all tokens created at T=0)
Spawns N sandboxes/containers (each with pre-fetched JIT)
Jobs run sequentially, but JIT tokens expire at T=60 regardless
Key Issue: JIT tokens are generated at webhook receipt time, not at job execution time.
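The timing mismatch can be made concrete with a small self-contained simulation (the 60-minute lifetime and per-job timings mirror the example above; the real runner does not expose these numbers this way):

```python
TOKEN_LIFETIME_MIN = 60

def simulate(n_jobs, minutes_per_job):
    """All JIT tokens are minted at T=0 (webhook receipt); jobs run sequentially."""
    results = {}
    for job in range(1, n_jobs + 1):
        start = (job - 1) * minutes_per_job  # sequential: job N waits for jobs 1..N-1
        token_age = start                    # token was minted at webhook time T=0
        results[job] = "ok" if token_age < TOKEN_LIFETIME_MIN else "canceled"
    return results

r = simulate(37, 6)
print(r[10], r[11])  # ok canceled
```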
Attempted Solutions
1. Queue-Based Worker with Deferred JIT Fetch
Approach: Move JIT fetching from webhook handler to worker function that processes jobs sequentially.
```
# Webhook: Queue metadata only
# Worker: Fetch JIT when job actually runs, then spawn
```
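The attempted queue-based design looks roughly like this (`fetch_jit_config` and `spawn_runner` are hypothetical stand-ins for the real generate-jitconfig API call and sandbox launch):

```python
import queue

job_queue = queue.Queue()

def on_webhook(payload):
    """Webhook handler: queue only the metadata, do NOT fetch the JIT config yet."""
    job_queue.put({"job_id": payload["workflow_job"]["id"],
                   "labels": payload["workflow_job"]["labels"]})

def worker(fetch_jit_config, spawn_runner):
    """Worker: fetch the JIT config only when the job is actually processed."""
    handled = []
    while not job_queue.empty():
        meta = job_queue.get()
        jit = fetch_jit_config(meta["labels"])  # token minted at execution time...
        spawn_runner(jit)                       # ...but GitHub may already have
        handled.append(meta["job_id"])          # canceled the job while it queued
    return handled

# Demo with stubbed dependencies:
on_webhook({"workflow_job": {"id": 1, "labels": ["self-hosted"]}})
print(worker(lambda labels: "jit-config", lambda jit: None))  # [1]
```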
Why It Fails:
GitHub expects runner to connect within 2-5 minutes of JIT generation
Delaying JIT fetch creates race condition where GitHub cancels job
GitHub's job assignment model expects immediate runner registration
Doesn't solve fundamental issue: sequential execution still exceeds token lifetime
Proposed Solutions
Option 1: Batch Processing (Recommended Workaround)
Split long-running workflows into multiple workflow runs that each complete within 60 minutes:
```
# Instead of one workflow with 37 jobs,
# create multiple workflows or use a dynamic matrix:
# Workflow Run 1: Jobs 1-9   (54 min)
# Workflow Run 2: Jobs 10-18 (54 min)
# Workflow Run 3: Jobs 19-27 (54 min)
# Workflow Run 4: Jobs 28-N  (remaining)
```
Implementation:
```yaml
strategy:
  fail-fast: false
  max-parallel: 1
  matrix:
    # Use only a subset per workflow run
    # (the env context is not available in strategy; pass the batch via workflow_dispatch inputs)
    job_id: ${{ fromJson(inputs.job_batch) }}
```
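One way to derive the batches consumed by the matrix above (a sketch; the 54-minute budget leaves headroom under the 60-minute lifetime, and `make_batches` is an illustrative helper):

```python
import json
import math

def make_batches(n_jobs, minutes_per_job, budget_min=54):
    """Split job IDs into batches whose sequential runtime stays within budget_min."""
    per_batch = max(1, math.floor(budget_min / minutes_per_job))
    ids = list(range(1, n_jobs + 1))
    return [ids[i:i + per_batch] for i in range(0, n_jobs, per_batch)]

batches = make_batches(37, 6)
print(json.dumps(batches[0]))  # first run's batch: jobs 1 through 9
```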
Option 2: Persistent Runner Token (Security Trade-off)
Use traditional runner registration instead of JIT:
```
# Register runner once (manual or automated)
./config.sh --url https://github.com/OWNER/REPO --token $REGISTRATION_TOKEN

# Run with persistent token
./run.sh --token $RUNNER_TOKEN
```
Pros:
Token doesn't expire during job execution
Simple implementation
Cons:
Security risk (long-lived token)
Requires token rotation policy
Loses benefits of ephemeral runners
Option 3: Hybrid Approach - Batch with Persistent Runner
Use persistent runner for long sequential workflows, JIT for short ones:
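The routing decision can be sketched as follows (`pick_runner`, the runtime estimate, and the 5-minute safety margin are illustrative; in practice the estimate would come from historical job timings):

```python
JIT_LIFETIME_MIN = 60

def pick_runner(estimated_total_min, safety_margin_min=5):
    """Route short workflows to ephemeral JIT runners, long ones to a persistent runner."""
    if estimated_total_min + safety_margin_min < JIT_LIFETIME_MIN:
        return "jit-ephemeral"
    return "persistent"

print(pick_runner(30), pick_runner(222))  # jit-ephemeral persistent
```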
Additional Context
Serverless Runner Architecture
Typical serverless GitHub Actions runner flow:

```
GitHub Workflow Trigger
  ↓
GitHub sends workflow_job webhook (action: queued)
  ↓
Serverless function receives webhook
  ↓
Function calls GitHub API: POST /actions/runners/generate-jitconfig
  ↓
GitHub returns JIT config (valid for ~60 minutes)
  ↓
Function spawns container/sandbox with JIT config
  ↓
Container runs: ./run.sh --jitconfig $JIT_CONFIG
  ↓
Runner connects to GitHub and picks up job
  ↓
Job executes
  ↓
Job completes, runner exits
```
The problem occurs when:
Step 3 (JIT generation) happens at T=0 for all jobs
Step 7 (job execution) for job N happens at T > 60 minutes
Workaround Checklist
If you're experiencing this issue, check:
Can you split jobs into multiple workflow runs (< 60 min each)?
Can you increase max-parallel to reduce total runtime?
Can you use persistent runner tokens instead of JIT?
Can you optimize job duration to be < 6 minutes each?
Can you reduce number of jobs in matrix?
Labels
Suggested labels for this issue:
enhancement
self-hosted-runners
jit-tokens
long-running-workflows
sequential-jobs
documentation
Summary
This issue documents a fundamental architectural limitation: JIT tokens are designed for short-lived ephemeral runners (~60 minutes), but GitHub Actions workflows can legitimately require longer sequential execution.
The core conflict:
JIT Security Model: Short-lived tokens (60 min) for ephemeral runners
Sequential Workflows: May require >60 min total runtime
Viable workarounds:
Reduce total runtime (optimize jobs or increase parallelism)
Split long workflows into batches that each complete within 60 minutes
Long-term solution: Requires GitHub to either:
Extend JIT token lifetime for long workflows
Provide token refresh mechanism
Support job-level (not runner-level) JIT tokens
This issue was compiled from multiple real-world production scenarios and extensive research. It aims to document the limitation clearly and provide actionable workarounds while advocating for a supported long-term solution.
Environment
max-parallel: 1 (sequential execution)

Steps to Reproduce
Configure a workflow with max-parallel: 1
Trigger the workflow with enough jobs that total runtime exceeds 60 minutes
Observe that jobs starting after the ~60-minute mark are canceled
Expected Behavior
All jobs should complete successfully, with each job getting a fresh JIT token when it starts (not when the webhook is received).
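The expected behavior corresponds to minting the token at job start instead of webhook receipt; a sketch of that desired timeline (same illustrative numbers as above):

```python
TOKEN_LIFETIME_MIN = 60

def simulate_fresh_tokens(n_jobs, minutes_per_job):
    """Desired behavior: each job's JIT token is minted when the job starts."""
    results = {}
    for job in range(1, n_jobs + 1):
        start = (job - 1) * minutes_per_job
        mint_time = start                       # fresh token at job start
        token_age_at_start = start - mint_time  # always 0
        ok = (token_age_at_start < TOKEN_LIFETIME_MIN
              and minutes_per_job < TOKEN_LIFETIME_MIN)  # job itself must fit too
        results[job] = "ok" if ok else "canceled"
    return results

print(all(v == "ok" for v in simulate_fresh_tokens(37, 6).values()))  # True
```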
Actual Behavior
Error observed: "The operation was canceled"
Failed job timing pattern: jobs that start within the first ~60 minutes complete; jobs that start later are canceled.
Root Cause Analysis
JIT Token Lifecycle
From GitHub documentation and runner source code:
When the generate-jitconfig API is called, GitHub creates a runner registration with a time-limited token (valid for ~60 minutes)
Reference: actions/runner auth documentation
2. Retry/Refresh JIT Config
Attempt: Detect expired token and re-fetch JIT config.
Why It Fails:
generate-jitconfig creates a NEW runner registration; it does not refresh the existing one

3. Increase max-parallel
Attempt: Run jobs in parallel to reduce total runtime below 60 minutes.
Why Not Always Possible:
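Whether raising max-parallel can bring the run under the token lifetime is simple arithmetic (a sketch; `min_parallelism` is an illustrative helper, and it ignores per-job queueing overhead):

```python
import math

def total_runtime_min(n_jobs, minutes_per_job, max_parallel):
    """Jobs run in waves: ceil(n_jobs / max_parallel) waves of minutes_per_job each."""
    return math.ceil(n_jobs / max_parallel) * minutes_per_job

def min_parallelism(n_jobs, minutes_per_job, lifetime=60):
    """Smallest max-parallel that keeps total runtime under the token lifetime."""
    for p in range(1, n_jobs + 1):
        if total_runtime_min(n_jobs, minutes_per_job, p) < lifetime:
            return p
    return None  # even full parallelism exceeds the lifetime

print(total_runtime_min(37, 6, 1), min_parallelism(37, 6))  # 222 5
```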
4. Persistent Runner Token
Attempt: Use --token instead of --jitconfig with a long-lived token.

Trade-offs:
Research & References
GitHub Documentation
GitHub Actions Limits: Usage limits for self-hosted runners
Automatic Token Authentication: GITHUB_TOKEN documentation
Self-Hosted Runners: About self-hosted runners
GitHub Community Discussions
Discussion #25699: GitHub token lifetime
Discussion #50472: Long-running workflow GITHUB_TOKEN timeout
Discussion #60513: How to configure idle_timeout with JIT
GitHub Issues
actions/runner #1799: How long is the runner registration token valid?
actions/runner #2920: Unable to use ./config remove --token ... on a just-in-time runner
actions-runner-controller #4183: Runners not terminating after token expiry
actions-runner-controller #2466: Jobs expire while on queue
actions/runner #845: Support for autoscaling self-hosted runners
External Resources
AWS CodeBuild Issue: Failure to get JIT token
Orchestra Guide: JIT Runner Configuration
Constraints & Considerations
Why This Is Hard to Solve
Common Misconceptions
❌ "We can just fetch JIT when the job runs"
✅ GitHub expects runner registration within minutes of job assignment
❌ "We can retry failed jobs with fresh JIT"
✅ JIT is tied to specific runner registration; can't re-fetch for same job
❌ "Queue the jobs and process later"
✅ GitHub's job timeout (24h) ≠ JIT token lifetime (60min)
Option 4: Workflow-Level Retry with Fresh Webhooks
Instead of job-level retry, trigger new workflow runs:
Pros:
Cons:
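A sketch of how Option 4 could chain runs: the final job of run N dispatches run N+1 via the workflow_dispatch REST endpoint. `OWNER`, `REPO`, and `ci.yml` are placeholders, and a real caller would send this request with an authenticated HTTP client; note that workflow_dispatch input values must be strings.

```python
def next_run_request(owner, repo, workflow_file, batch_index, ref="main"):
    """Build the request for POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches."""
    return {
        "url": (f"https://api.github.com/repos/{owner}/{repo}"
                f"/actions/workflows/{workflow_file}/dispatches"),
        "body": {"ref": ref, "inputs": {"batch_index": str(batch_index)}},
    }

# The last job of run 2 would dispatch run 3's batch:
req = next_run_request("OWNER", "REPO", "ci.yml", batch_index=2)
print(req["url"])
```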
Option 5: GitHub-Supported Solution (Requested)
Request GitHub to support one of:
JIT Token Refresh API:
Extended JIT Lifetime:
Job-Level JIT:
Questions for GitHub
Is there an official way to refresh or extend JIT token lifetime for long-running workflows?
Can GitHub support increase JIT token lifetime for specific repositories/use cases?
Is there a documented pattern for handling workflows that exceed 60 minutes with self-hosted runners?
Should the generate-jitconfig API support token refresh or longer lifetimes for sequential job processing?
Could GitHub provide a "job-level" JIT token that's valid for the duration of a specific job rather than runner registration?