Cannot find run when more than 5 exist for that branch (pagination issue) #164

@OscarVanL

I want to use dispatch-workflow to manually trigger CI deploys for various microservices. For this, I may want to deploy dozens of services simultaneously, which results in dozens of concurrent workflow dispatches.

For example, I may tick all the checkboxes and hit "Approve and deploy" at the same time. Each of these will trigger a workflow dispatch.

I have started encountering a behaviour where a given service's workflow dispatch isn't discovered, even when changing the exponential backoff parameters so aggressively that it keeps retrying for 1m 46s!

🔄 Exponential backoff parameters:
    starting-delay: 200
    max-attempts: 10
    time-multiple: 2
⌛ Fetching workflow id for deploy.yml
✅ Fetched workflow id: REDACTED
✅ Successfully dispatched workflow using workflow_dispatch method:
    repository: REDACTED
    branch: main
    workflow-id: REDACTED
    distinct-id: 7b4db870-2a2c-474e-973d-ae9c3ce5502b
    workflow-inputs: {"env":"dev","service":"redacted"}
⌛ Fetching run-ids for workflow with distinct-id=7b4db870-2a2c-474e-973d-ae9c3ce5502b
Warning: 🟠 Does the token have the correct permissions?
Error: 🔴 Failed to complete: 
getDispatchedWorkflowRun: Failed to find dispatched workflow
Distinct ID: 7b4db870-2a2c-474e-973d-ae9c3ce5502b
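
As a rough check on how long the parameters in the log above keep the action waiting, here is a minimal sketch of the arithmetic. It assumes a delay is applied only between attempts (so 10 attempts means 9 delays) and that each delay is the previous one multiplied by time-multiple; the function name is hypothetical and this is not taken from the action's source.

```typescript
// Hypothetical helper: total time spent sleeping between attempts,
// assuming no delay before the first attempt.
function totalBackoffMs(
  startingDelayMs: number,
  maxAttempts: number,
  timeMultiple: number
): number {
  let total = 0;
  let delay = startingDelayMs;
  // maxAttempts attempts are separated by (maxAttempts - 1) delays.
  for (let attempt = 1; attempt < maxAttempts; attempt++) {
    total += delay;
    delay *= timeMultiple;
  }
  return total;
}

// starting-delay: 200, max-attempts: 10, time-multiple: 2
console.log(totalBackoffMs(200, 10, 2)); // 102200 (about 1m 42s of waiting)
```

Under these assumptions the sleeps alone add up to about 1m 42s, so the observed 1m 46s would be consistent with a few extra seconds spent on the API requests themselves.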

I turned on debug logging, and can see logs that look suspiciously like limited / paginated results that haven't been followed.

For example, take this deploy that did find the workflow dispatch, but took 29 seconds to find it!

##[debug]
##[debug]Fetched Workflow Runs
##[debug]Repository: REDACTED
##[debug]Branch: main
##[debug]Runs Fetched: [14312561579,14312482181,14312482192,14312482436,14312481975]
##[debug]
##[debug]Fetched Workflow Runs
##[debug]Repository: REDACTED
##[debug]Branch: main
##[debug]Runs Fetched: [14312561579,14312482181,14312482192,14312482436,14312481975]
##[debug]
##[debug]Fetched Workflow Runs
##[debug]Repository: REDACTED
##[debug]Branch: main
##[debug]Runs Fetched: [14312561579,14312482181,14312482192,14312482436,14312481975]
##[debug]
##[debug]Fetched Workflow Runs
##[debug]Repository: REDACTED
##[debug]Branch: main
##[debug]Runs Fetched: [14312600660,14312600650,14312600645,14312600640,14312561579]
##[debug]
##[debug]Fetched Workflow Runs
##[debug]Repository: REDACTED
##[debug]Branch: main
##[debug]Runs Fetched: [14312601005,14312600650,14312600645,14312600640,14312600660]
##[debug]
##[debug]Fetched Workflow Runs
##[debug]Repository: REDACTED
##[debug]Branch: main
##[debug]Runs Fetched: [14312601005,14312600650,14312600645,14312600640,14312600660]
##[debug]
##[debug]Fetched Workflow Runs
##[debug]Repository: REDACTED
##[debug]Branch: main
##[debug]Runs Fetched: [14312601005,14312600640,14312600660,14312600645,14312600650]
##[debug]
##[debug]Fetched Workflow Runs
##[debug]Repository: REDACTED
##[debug]Branch: main
##[debug]Runs Fetched: [14312601005,14312600640,14312600660,14312600854,14312600645]
✅ Successfully identified remote run:
    run-id: 14312600854
    run-url: https://github.com/REDACTED/actions/runs/14312600854

So basically every request returned only five runs, and it was not until the final API request that this limited response finally happened to include our desired run ID.

Looking at the action's code, the value '5' appears to be a hard-coded page size: per_page is 5 when a branch name is passed, and 10 otherwise:

    response = await octokit.rest.actions.listWorkflowRuns({
      owner: config.owner,
      repo: config.repo,
      workflow_id: config.workflow,
      ...(branchName
        ? {
            branch: branchName,
            per_page: 5
          }
        : {
            per_page: 10
          })
    })

I guess the response was limited to keep each API call cheap so that polling stays efficient. However, in my use case it breaks the action.

I guess the fix requires the action to follow the paginated responses as far as they go, so that the desired workflow dispatch is guaranteed to be included.
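
To illustrate what following the pages could look like, here is a minimal sketch. The page fetcher is injected so the example stays self-contained; `WorkflowRun`, `FetchPage`, `findRunAcrossPages`, and the `maxPages` cap are all hypothetical names, not part of the action. In the action itself, the fetcher would presumably wrap the existing listWorkflowRuns call with `page` and `per_page` set.

```typescript
// Hypothetical shape: only the field this sketch needs.
interface WorkflowRun {
  id: number;
}

// Hypothetical page fetcher (1-indexed pages, as in the GitHub REST API).
type FetchPage = (page: number, perPage: number) => Promise<WorkflowRun[]>;

// Walk pages until a run satisfies `isTarget`, a short page signals the
// end of the results, or the safety cap `maxPages` is reached.
async function findRunAcrossPages(
  fetchPage: FetchPage,
  isTarget: (run: WorkflowRun) => boolean | Promise<boolean>,
  perPage = 100,
  maxPages = 10
): Promise<WorkflowRun | undefined> {
  for (let page = 1; page <= maxPages; page++) {
    const runs = await fetchPage(page, perPage);
    for (const run of runs) {
      if (await isTarget(run)) return run;
    }
    // Fewer results than requested means there is no next page.
    if (runs.length < perPage) break;
  }
  return undefined;
}
```

The `maxPages` cap is a design choice to bound the cost of a single poll; the retry loop with backoff would still call this again if the run has not appeared yet.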

Scaling issue?

If we follow the paginated response all the way to the end, this will introduce a scaling issue if there were a large number of workflow dispatches on a branch.

For example, in my use-case I will be making hundreds/thousands of workflow dispatches over time on the same branch (the scenario is, I merge my changes to main, then deploy to production via workflow dispatch, multiplied by hundreds of deploys). Over time the action will take longer and longer to reach the end of the paginated responses...

I think there is a solution. I looked at the List workflow runs for a repository API, and I see there is a created query parameter:

created string
Returns workflow runs created within the given date-time range. For more information on the syntax, see "Understanding the search syntax."
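
As a sketch of how a value for created might be built, the helper below formats a `>=` date-time qualifier from the moment the action started, minus a small allowance for clock drift. The function name and the 5-second default are assumptions for illustration. The `>=` prefix with an ISO 8601 timestamp follows the GitHub search syntax the API docs refer to, and milliseconds are stripped as a precaution.

```typescript
// Hypothetical helper: build a `created` filter that matches runs created
// at or after `startedAt`, shifted back by `skewSeconds` for clock drift.
function createdFilter(startedAt: Date, skewSeconds = 5): string {
  const cutoff = new Date(startedAt.getTime() - skewSeconds * 1000);
  // toISOString() yields e.g. 2025-04-07T12:00:05.000Z; drop the millis.
  const iso = cutoff.toISOString().replace(/\.\d{3}Z$/, "Z");
  return `>=${iso}`;
}

// Example: an action that started at 12:00:10 UTC.
console.log(createdFilter(new Date("2025-04-07T12:00:10Z")));
// >=2025-04-07T12:00:05Z
```

The resulting string would be passed as the created option alongside the existing owner, repo, and workflow_id options.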

Suggested Changes

I think it would make sense to make the following changes:

  • The created field should be assigned so that the listWorkflowRuns call only shows workflow dispatches that were created after the action was triggered. This would involve recording a timestamp when lasith-kg/dispatch-workflow is first invoked, then calling the listWorkflowRuns API with this value (perhaps subtract a few seconds to account for any clock drift between the runner and GitHub APIs).
  • The per_page option could possibly be increased from 5. Maybe this could be an optional input if you are worried about performance regressions.
  • The paginated results should be followed. It looks like your usage of octokit does not currently do this. The octokit docs describe how to implement pagination.
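
For the optional per_page input, a minimal sketch of how the raw input could be validated is shown below. The function name and the idea of falling back to the current value of 5 are assumptions; the upper bound of 100 is the maximum page size the GitHub REST API accepts.

```typescript
// Hypothetical helper: turn an optional string input into a usable page
// size, falling back to a default and clamping to the API maximum of 100.
function resolvePerPage(input: string | undefined, fallback = 5): number {
  const parsed = Number.parseInt(input ?? "", 10);
  if (Number.isNaN(parsed) || parsed < 1) return fallback;
  return Math.min(parsed, 100);
}

console.log(resolvePerPage(undefined)); // 5
console.log(resolvePerPage("50")); // 50
console.log(resolvePerPage("500")); // 100
```

Keeping the existing value as the default would leave current users unaffected, while heavy users could opt in to larger pages.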
