Workflow Resilience in GitHub Actions

1. Introduction

Workflow resilience in GitHub Actions is the ability of a CI/CD pipeline to withstand failures and recover from them. In practice, this means designing workflows that handle errors, retry transient failures, limit how long work can run, and fall back gracefully when something goes wrong.

2. Key Concepts

  • **Error Handling:** Mechanisms to catch and manage errors in workflows.
  • **Retries:** Re-running a step or command that failed for a transient reason, such as a network timeout.
  • **Timeouts:** Setting time limits on jobs or steps so they cannot hang indefinitely.
  • **Job Dependencies:** Making a job wait until the jobs it depends on have completed successfully.

3. Implementing Resilience

3.1 Error Handling

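Use the `if: failure()` condition to run a step only when an earlier step in the job has failed. Don't put `continue-on-error: true` on the step you expect to fail: it lets the job carry on as if that step had succeeded, so `failure()` evaluates to false and the handler never runs. With the setup below, the handler step runs, but the job still ends in a failed state.

```yaml
jobs:
  example_job:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      # If this script exits non-zero, the job is marked as failing
      # and later steps are skipped unless their `if` says otherwise.
      - name: Run script
        run: ./script.sh

      # failure() is true because a previous step in this job failed.
      - name: Handle failure
        if: failure()
        run: echo "Script failed, taking alternative action"
```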
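If the job should keep going after the script fails and branch on the result instead, one approach is to keep `continue-on-error: true`, give the step an `id`, and check `steps.<id>.outcome`, which records the step's result before `continue-on-error` is applied. This is a sketch: `run_script` is an illustrative step id and `./fallback.sh` is a hypothetical script standing in for your alternative action.

```yaml
jobs:
  example_job:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      # continue-on-error lets the job keep going if the script fails.
      # The step's `outcome` still records 'failure', while its
      # `conclusion` is reported as 'success'.
      - name: Run script
        id: run_script
        run: ./script.sh
        continue-on-error: true

      # Runs only when the script itself failed (fallback.sh is hypothetical).
      - name: Fallback
        if: steps.run_script.outcome == 'failure'
        run: ./fallback.sh
```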

3.2 Retries

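GitHub Actions has no built-in `retry` keyword for steps or jobs. One dependency-free approach is to wrap the flaky command in a bounded shell loop:

```yaml
jobs:
  retry_job:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run a command (up to 3 attempts)
        run: |
          # Stop on the first success; wait 10s, then 20s, between attempts.
          for attempt in 1 2 3; do
            ./unstable_script.sh && exit 0
            echo "Attempt $attempt failed"
            [ "$attempt" -lt 3 ] && sleep $((attempt * 10))
          done
          exit 1
```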
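Community actions can wrap the same pattern. One example is `nick-fields/retry`, a third-party action not maintained by GitHub, which takes the command and the attempt limit as inputs. The input names below follow that action's README as we understand it; treat them as assumptions and check the README for the version you pin.

```yaml
jobs:
  retry_job:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Third-party action: verify these inputs against its README for the pinned version.
      - name: Run a command with retries
        uses: nick-fields/retry@v3
        with:
          timeout_minutes: 10   # time limit for each attempt
          max_attempts: 3
          command: ./unstable_script.sh
```

The shell loop avoids an external dependency; the action trades that for less boilerplate and a per-attempt timeout.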

3.3 Timeouts

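Set `timeout-minutes` on a job to cancel it if it runs longer than expected. Without it, the job falls back to the default limit of 360 minutes.

```yaml
jobs:
  timeout_job:
    runs-on: ubuntu-latest
    # Cancel the whole job if it is still running after 10 minutes.
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
      - name: Long running process
        run: ./long_process.sh
```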
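`timeout-minutes` can also be set on individual steps, which helps when one step, such as a network download, is the usual cause of hangs. A minimal sketch, where `./fetch_deps.sh` is a hypothetical script and the limits are illustrative:

```yaml
jobs:
  timeout_job:
    runs-on: ubuntu-latest
    timeout-minutes: 30          # upper bound for the whole job
    steps:
      - uses: actions/checkout@v4
      - name: Fetch dependencies
        timeout-minutes: 5       # this step is killed after 5 minutes
        run: ./fetch_deps.sh     # hypothetical script
      - name: Long running process
        timeout-minutes: 20
        run: ./long_process.sh
```

A step that hits its timeout fails, so it interacts with `failure()` handlers the same way as any other failed step.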

3.4 Job Dependencies

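Use `needs` to make one job wait for others. Here `test` starts only after `build` succeeds; if `build` fails, `test` is skipped. Each job runs on a fresh runner, so each one checks out the code itself.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build the project
        run: ./build.sh

  test:
    runs-on: ubuntu-latest
    needs: build   # wait for build to finish successfully
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: ./test.sh
```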
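A job that should run no matter what, such as a report or cleanup job, can combine `needs` with `if: always()` and inspect each dependency's result (`success`, `failure`, `cancelled`, or `skipped`). In this sketch, `report` is a hypothetical job added alongside the `build` and `test` jobs above.

```yaml
jobs:
  # build and test are defined as in the previous example.
  report:
    runs-on: ubuntu-latest
    needs: [build, test]
    # always() lets this job run even if build or test failed or was skipped.
    if: always()
    steps:
      - name: Summarize upstream results
        run: |
          echo "build: ${{ needs.build.result }}"
          echo "test:  ${{ needs.test.result }}"
      # Keep the overall run red if anything upstream failed.
      - name: Propagate failure
        if: contains(needs.*.result, 'failure')
        run: exit 1
```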

4. Best Practices

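  • Add failure handling where it matters: fallback or cleanup steps guarded by `failure()` or `always()`, and checks on `steps.<id>.outcome` when a step uses `continue-on-error`.
  • Bound retries with a fixed attempt count and a pause between attempts, and reserve them for transient failures so they don't hide real bugs.
  • Set timeouts somewhat above the expected run time instead of relying on the 360-minute default.
  • Define clear job dependencies to ensure a logical execution order.
  • Review run history periodically for flaky steps and slow jobs, and adjust retries and timeouts accordingly.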

5. FAQ

What is the maximum number of retries I can set?

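GitHub Actions has no built-in retry setting, so there is no platform-defined maximum. The number of attempts is whatever your shell loop or retry action is configured to allow. You can also re-run failed jobs manually from the workflow run page.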

Can I set different timeouts for different jobs?

Yes. Each job can have its own `timeout-minutes`, and individual steps can set one as well.

What happens if a job fails after all retries?

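The job is marked as failed, and the workflow run fails with it. Jobs that list it in `needs` are skipped unless their `if` condition uses a status check function such as `always()` or `failure()`.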