Automated Updates and Backups as Polycrate Workflows
Fabian Peter · 13 min read

Automated updates and backups as reproducible Polycrate workflows

TL;DR

  • You build a reusable Polycrate workflow that automatically executes backup → update → verify on your Linux servers – including rollback via Ansible block/rescue.
  • Backups run as pg_dump + tar + upload to S3 with rclone (S3-compatible object storage), updates via apt upgrade plus service restart and health checks, verification through Systemd status and HTTP response.
  • Polycrate workflows give your Ansible actions a name, an order, and documentation – instead of loosely distributed playbooks and README files.
  • Everything runs in the container: No local Python/Ansible chaos, consistent toolchain on every admin workstation, workspaces are fully encrypted if needed.
  • ayedo supports teams in establishing such operational processes as code – from the first block structure to company-wide shared workflows and operations automation demos.

Polycrate Workflows: Operational Processes Instead of CI/CD Pipelines

When you hear “workflow,” you might think of GitHub Actions or GitLab CI. Polycrate workflows are something different:

  • They run where you use Ansible – on your admin workstation, a bastion host, or in an automation container.
  • They orchestrate Polycrate actions (e.g., backup, update, verify) into a named process.
  • They are part of your Polycrate workspace and thus versionable, encryptable, and shareable.

A workflow in Polycrate is a piece of declarative orchestration: “First execute this action in this block, then that action in another block, …”. No CI/CD server needed, no special YAML logic – just your workspace and the Polycrate CLI.

Documentation happens in the code: The workflow is the documentation of the operational process. No additional Word document that doesn’t match reality.

More details on the concept can be found in the documentation on workflows and workspaces.


Scenario: Nightly Linux Maintenance with Backup and Rollback

Starting situation – very typical for Linux/system admins:

  • PostgreSQL database on Ubuntu 24.04
  • Web application as a Systemd service (acme-app.service)
  • Backups should go to S3 (S3-compatible object storage)
  • Updates should run automatically, but with a safety net
  • In the end, you need the assurance: Service is running, HTTP endpoint responds

The process we map as a Polycrate workflow:

  1. backup
    Database dump with pg_dump, create tarball, upload to S3 with rclone, and maintain a latest.tar.gz symlink.

  2. update
    apt update && apt upgrade -y, restart service, health check – all in an Ansible play with block / rescue:

    • Updates and checks run in the block.
    • In rescue, a failure automatically triggers a restore from latest.tar.gz.
  3. verify
    Explicit verification that the Systemd service is running and an HTTP endpoint responds with status 200.

We orchestrate the whole thing as a workflow nightly-maintenance, which you can, for example, execute every night via a systemd timer.


Workspace Foundation: Blocks, Workflows, Inventory

We start with a Polycrate workspace acme-corp-automation. The structure:

  • workspace.poly – central definition of blocks and workflows
  • inventory.yml – Ansible inventory (YAML, not INI!)
  • blocks/registry.acme-corp.com/acme/infra/linux-maintenance/ – the block with actions and playbooks, once it is available locally (Polycrate can install a missing block automatically on the first polycrate run … or workflow run; polycrate blocks pull remains optional)
  • artifacts/secrets/ – e.g. snippets for rclone (object storage credentials), encrypted if needed

workspace.poly with Workflow Definition

name: acme-corp-automation
organization: acme

blocks:
  - name: linux-maintenance
    from: registry.acme-corp.com/acme/infra/linux-maintenance:0.1.0
    config:
      db_name: "acme_app"
      db_user: "acme"
      db_host: "localhost"
      backup_dir: "/var/backups/acme-db"
      rclone_destination: "acme-s3:acme-backups-db/postgres"
      hosts: "db_servers"
      web_hosts: "web_servers"
      service_name: "acme-app.service"
      health_url: "http://localhost:8080/health"

workflows:
  - name: nightly-maintenance
    description: "Nightly backup → update → verify for Linux application servers"
    steps:
      - name: backup
        block: linux-maintenance
        action: backup
      - name: update
        block: linux-maintenance
        action: update
      - name: verify
        block: linux-maintenance
        action: verify

Important:

  • The block is published on the fictional corporate registry registry.acme-corp.com (not ayedo’s production registry); in from: use the full reference including the tag (0.1.0 matches the block version). You do not need to run polycrate blocks pull first: if the block is missing locally, Polycrate detects that when you run polycrate run … or a workflow and asks whether to install the block automatically.
  • All specific parameters are in config. They are available in the actions via block.config.*.
  • The workflow nightly-maintenance only references block and action names – no paths, no shell scripts.

More on best practices for blocks and workflows: Polycrate Best Practices.

Inventory as YAML

Polycrate automatically sets ANSIBLE_INVENTORY to inventory.yml in the workspace root. All hosts live under all.hosts; under children, groups reference them by hostname only (canonical format as in Multi-server inventories with Ansible):

all:
  vars:
    ansible_user: "ubuntu"
    ansible_ssh_port: 22
    ansible_ssh_common_args: "-o StrictHostKeyChecking=no"
    ansible_python_interpreter: /usr/bin/python3

  hosts:
    db01.acme-corp.com: {}
    app01.acme-corp.com: {}

  children:
    db_servers:
      hosts:
        db01.acme-corp.com:
    web_servers:
      hosts:
        app01.acme-corp.com:

Which group or host runs each action is controlled via block config (hosts / web_hosts) and hosts: "{{ block.config.… }}" in the playbook—not by hard-coded group names alone. No localhost for SSH-based administration: Polycrate runs Ansible in the container; the targets are the inventory hosts. Details: Ansible Integration.
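
As a minimal sketch, this is what a play header looks like when it resolves its targets from block config against the inventory above (the play name and debug task are purely illustrative):

```yaml
# Sketch: the play resolves its target group from block config at runtime.
# With the workspace.poly above, block.config.hosts is "db_servers",
# so this play runs against db01.acme-corp.com.
- name: Example play targeted via block config
  hosts: "{{ block.config.hosts }}"
  gather_facts: false
  tasks:
    - name: Show resolved target
      ansible.builtin.debug:
        msg: "Running on {{ inventory_hostname }}"
```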


The Maintenance Block: Structure and Actions

The listings below show the block as developed under the full registry path blocks/registry.acme-corp.com/acme/infra/linux-maintenance/ – that is, how you version and publish an OCI block with a real from: reference, not a short “demo folder” name. In other workspaces you reference it via from: registry.acme-corp.com/acme/infra/linux-maintenance:0.1.0. If the block is not yet present locally, Polycrate offers to install it automatically on the first polycrate run … or workflow run; polycrate blocks pull remains optional. The block.poly:

name: linux-maintenance
version: 0.1.0
kind: generic

config:
  db_name: "acme_app"
  db_user: "acme"
  db_host: "localhost"
  backup_dir: "/var/backups/acme-db"
  rclone_destination: "acme-s3:acme-backups-db/postgres"
  hosts: "db_servers"
  web_hosts: "web_servers"
  service_name: "acme-app.service"
  health_url: "http://localhost:8080/health"

actions:
  - name: backup
    type: ansible
    playbook: backup.yml

  - name: update
    type: ansible
    playbook: update.yml

  - name: verify
    type: ansible
    playbook: verify.yml

Note: block.poly is plain YAML—no Jinja templating. Defaults live here; overrides go in workspace.poly under config (merge). Expressions like {{ block.config.* }} and Ansible logic belong only in playbooks, tasks, and templates.

Here you see typical Polycrate strengths:

  • Guardrails through the block model: A clear structure instead of playbook sprawl.
  • Good DX: Actions are simply named backup, update, verify – the CLI commands are readable and copy-paste-able.

Run commands for individual actions:

polycrate run linux-maintenance backup
polycrate run linux-maintenance update
polycrate run linux-maintenance verify

Workflow:

polycrate workflows run nightly-maintenance

Backup action: pg_dump, tar, rclone to S3

The file blocks/registry.acme-corp.com/acme/infra/linux-maintenance/backup.yml (simplified example—targets via block.config.hosts as above):

- name: Backup PostgreSQL database
  hosts: "{{ block.config.hosts }}"
  become: true
  gather_facts: false

  vars:
    backup_dir: "{{ block.config.backup_dir }}"
    db_name: "{{ block.config.db_name }}"
    db_user: "{{ block.config.db_user }}"
    db_host: "{{ block.config.db_host }}"
    rclone_destination: "{{ block.config.rclone_destination }}"
    timestamp: "{{ lookup('ansible.builtin.pipe', 'date +%Y%m%d-%H%M%S') }}"
    backup_file: "{{ backup_dir }}/{{ db_name }}-{{ timestamp }}.sql"
    backup_tar: "{{ backup_dir }}/{{ db_name }}-{{ timestamp }}.tar.gz"
    latest_symlink: "{{ backup_dir }}/latest.tar.gz"

  tasks:
    - name: Ensure backup directory exists
      ansible.builtin.file:
        path: "{{ backup_dir }}"
        state: directory
        owner: postgres
        group: postgres
        mode: "0750"

    - name: Dump PostgreSQL database
      ansible.builtin.command:
        cmd: >
          pg_dump -h {{ db_host }} -U {{ db_user }} -F p -f {{ backup_file }} {{ db_name }}
      environment:
        PGPASSWORD: "{{ hostvars[inventory_hostname].pg_password | default('') }}"
      become_user: postgres

    - name: Create tar archive from SQL dump
      ansible.builtin.archive:
        path: "{{ backup_file }}"
        dest: "{{ backup_tar }}"
        format: gz

    - name: Update latest backup symlink
      ansible.builtin.file:
        src: "{{ backup_tar }}"
        dest: "{{ latest_symlink }}"
        state: link
        force: true

    - name: Upload tarball to S3 with rclone
      ansible.builtin.command:
        argv:
          - rclone
          - copy
          - "{{ backup_tar }}"
          - "{{ rclone_destination }}/"

Notes:

  • Upload with rclone (rclone copy … remote:bucket/prefix/) to S3 or S3-compatible object storage; the destination comes from rclone_destination (remote name from rclone.conf on the target host, e.g. acme-s3, plus bucket and prefix). Authentication as usual for rclone (configuration, environment variables; see rclone S3).
  • The latest.tar.gz symlink is our rollback anchor.
  • DB password can come from host variables or – better – from an encrypted workspace secret.
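
The upload can be paired with a simple retention policy. A hedged sketch of an additional task for the backup play (the 14-day window is an assumption, not part of the block config):

```yaml
# Hypothetical retention task: prune remote backups older than 14 days.
# rclone delete with --min-age only removes files older than the given age.
- name: Prune old backups in object storage (older than 14 days)
  ansible.builtin.command:
    argv:
      - rclone
      - delete
      - "--min-age"
      - "14d"
      - "{{ rclone_destination }}/"
```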

Polycrate comes with built-in workspace encryption using age. There is no polycrate secrets add command. You place sensitive files directly under artifacts/secrets/ (workspace-wide) or artifacts/secrets/BLOCKNAME/ (block-specific); polycrate workspace encrypt encrypts them to *.age and updates .gitignore among other things. Without the Polycrate API you typically set WORKSPACE_ENCRYPTION_KEY before the first encryption – see Workspace encryption (sections on key sources, key generation, and “Encrypting”).

mkdir -p artifacts/secrets
# e.g. rclone configuration (see https://rclone.org/s3/) – plaintext stays local, do not commit
# cp /path/to/rclone.conf artifacts/secrets/rclone.conf

# Without Polycrate API: age private key (see workspace encryption docs, key generation)
# export WORKSPACE_ENCRYPTION_KEY="AGE-SECRET-KEY-..."

polycrate workspace encrypt

No external vault needed – compliance becomes significantly easier.
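
The decrypted rclone configuration still has to reach the target hosts so the acme-s3 remote exists there. One hedged way to do that from a playbook, assuming the plaintext file lives at artifacts/secrets/rclone.conf and that a path relative to the workspace root resolves correctly (both src resolution and dest location are assumptions):

```yaml
# Hypothetical task: deploy the rclone configuration from the workspace
# to the target host. src resolution relative to the workspace root is
# an assumption about your setup.
- name: Deploy rclone configuration to backup hosts
  ansible.builtin.copy:
    src: "artifacts/secrets/rclone.conf"
    dest: /root/.config/rclone/rclone.conf
    owner: root
    group: root
    mode: "0600"
```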


Update Action: apt upgrade with Ansible block/rescue and Automatic Restore

The update action handles package updates, service restart, and initial health checks. The rollback mechanism is embedded in an Ansible block with rescue. Note that the example assumes latest.tar.gz is available on the application hosts, for instance because database and application run on the same machine or the backup directory is shared between them.

blocks/registry.acme-corp.com/acme/infra/linux-maintenance/update.yml:

- name: Update application servers with rollback
  hosts: "{{ block.config.web_hosts }}"
  become: true
  gather_facts: false

  vars:
    service_name: "{{ block.config.service_name }}"
    health_url: "{{ block.config.health_url }}"
    backup_dir: "{{ block.config.backup_dir }}"
    latest_backup: "{{ backup_dir }}/latest.tar.gz"

  tasks:
    - name: Ensure latest backup exists before updating
      ansible.builtin.stat:
        path: "{{ latest_backup }}"
      register: backup_stat

    - name: Fail if no latest backup is available
      ansible.builtin.fail:
        msg: "No latest backup found at {{ latest_backup }}. Aborting update."
      when: not backup_stat.stat.exists

    - name: Perform apt upgrade with rollback safety
      block:
        - name: Update apt cache
          ansible.builtin.apt:
            update_cache: true

        - name: Upgrade all packages
          ansible.builtin.apt:
            upgrade: dist
            autoremove: true

        - name: Restart application service
          ansible.builtin.systemd:
            name: "{{ service_name }}"
            state: restarted
            enabled: true

        - name: Wait for application port 8080
          ansible.builtin.wait_for:
            port: 8080
            state: started
            delay: 5
            timeout: 60

        - name: HTTP health-check on application
          ansible.builtin.uri:
            url: "{{ health_url }}"
            status_code: 200
            validate_certs: false

      rescue:
        - name: Log update failure and start rollback
          ansible.builtin.debug:
            msg: "Update failed on {{ inventory_hostname }}, starting rollback from {{ latest_backup }}"

        - name: Stop application service before restore
          ansible.builtin.systemd:
            name: "{{ service_name }}"
            state: stopped

        - name: Restore application data from latest backup
          ansible.builtin.unarchive:
            src: "{{ latest_backup }}"
            dest: "/"
            remote_src: true

        - name: Start application service after rollback
          ansible.builtin.systemd:
            name: "{{ service_name }}"
            state: started
            enabled: true

        - name: HTTP health-check after rollback
          ansible.builtin.uri:
            url: "{{ health_url }}"
            status_code: 200
            validate_certs: false

        - name: Fail play to signal rollback occurred
          ansible.builtin.fail:
            msg: "Update failed and rollback from {{ latest_backup }} was executed."
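
Note that unarchive only puts the dump file back on disk; re-importing it into PostgreSQL is a separate step the rescue could add. A hedged sketch, assuming the plain-format dump lands back under backup_dir and psql can reach the database from this host:

```yaml
# Hypothetical follow-up tasks for the rescue section: locate the
# unpacked SQL dump and re-import the newest one into PostgreSQL.
- name: Locate restored SQL dump
  ansible.builtin.find:
    paths: "{{ backup_dir }}"
    patterns: "*.sql"
  register: restored_dumps

- name: Re-import newest SQL dump into PostgreSQL
  ansible.builtin.command:
    argv:
      - psql
      - "-d"
      - "{{ block.config.db_name }}"
      - "-f"
      - "{{ (restored_dumps.files | sort(attribute='mtime') | last).path }}"
  become_user: postgres
  when: restored_dumps.files | length > 0
```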

How the safety net works:

  • If anything in the block fails (e.g., service does not start, HTTP check returns 500), Ansible jumps to the rescue block.
  • The rescue block restores from backup and starts the service again.
  • The final fail task makes it explicit to the workflow (and to you) that something went wrong but state was rolled back.

That is far more robust than a README saying “please take a backup before updates”.


Verify action: status and HTTP checks

The verify action is deliberately simple: it turns checks that are often only done by gut feeling into explicit code.

blocks/registry.acme-corp.com/acme/infra/linux-maintenance/verify.yml:

- name: Verify application servers
  hosts: "{{ block.config.web_hosts }}"
  become: true
  gather_facts: false

  vars:
    service_name: "{{ block.config.service_name }}"
    health_url: "{{ block.config.health_url }}"

  tasks:
    - name: Check service status
      ansible.builtin.systemd:
        name: "{{ service_name }}"
      register: service_state

    - name: Fail if service is not running
      ansible.builtin.fail:
        msg: "Service {{ service_name }} is not running."
      when: service_state.status.ActiveState != "active"

    - name: HTTP health-check
      ansible.builtin.uri:
        url: "{{ health_url }}"
        status_code: 200
        validate_certs: false
      register: health_response

    - name: Show response
      ansible.builtin.debug:
        msg: "Health-check OK: {{ health_response.status }} {{ health_response.msg }}"

This step gives the whole process a clear finish: “Is the application in a good state after backup and update?”
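
If the health endpoint returns a body, the check can be tightened beyond the status code. A hedged variant (the expected "ok" substring is an assumption about what your endpoint returns):

```yaml
# Hypothetical stricter check: require both status 200 and an "ok"
# marker in the response body.
- name: HTTP health-check with content assertion
  ansible.builtin.uri:
    url: "{{ health_url }}"
    status_code: 200
    return_content: true
    validate_certs: false
  register: health_response
  failed_when: "'ok' not in (health_response.content | default(''))"
```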


Running the workflow: from a one-off run to a systemd timer

You can run the workflow manually right away:

# From the workspace root
polycrate workflows run nightly-maintenance

Unlike plain Ansible, you do not need your own wrapper:

  • No bash script chaining multiple ansible-playbook invocations.
  • No manual hand-off of variables between steps.
  • No custom logging story—Polycrate handles consistent execution in the container.

Containerization addresses the classic dependency problem:

  • Ansible, Python, rclone, and other tools are defined by the container image.
  • Your team shares the same toolchain regardless of local workstation.
  • You reduce supply-chain risk by controlling the image yourself (Dockerfile.poly / bash build).
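
A Dockerfile.poly that extends the stock image with extra tooling could look like this sketch (the FROM reference is an assumption; check the CLI reference for the actual image name):

```dockerfile
# Hypothetical Dockerfile.poly: extend the Polycrate image with tools
# the automation relies on. The base image reference is an assumption.
FROM ghcr.io/ayedo/polycrate:latest
RUN apt-get update \
    && apt-get install -y --no-install-recommends rclone postgresql-client \
    && rm -rf /var/lib/apt/lists/*
```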

More in the CLI reference.

Scheduled execution with a systemd timer

Suppose you run a central automation host (e.g. automation01.acme-corp.com) with Polycrate installed. You can schedule the workflow with a systemd timer.

Service unit /etc/systemd/system/polycrate-nightly-maintenance.service:

[Unit]
Description=Polycrate nightly maintenance workflow

[Service]
Type=oneshot
WorkingDirectory=/opt/polycrate-workspaces/acme-corp-automation
ExecStart=/usr/local/bin/polycrate workflows run nightly-maintenance
User=automation
Group=automation

Timer unit /etc/systemd/system/polycrate-nightly-maintenance.timer:

[Unit]
Description=Run Polycrate nightly maintenance workflow at 2:00

[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true

[Install]
WantedBy=timers.target

Enable:

sudo systemctl daemon-reload
sudo systemctl enable --now polycrate-nightly-maintenance.timer

From then on, the full process runs every night at 2:00—clearly defined, versioned, and repeatable.


Polycrate vs. plain Ansible: what actually changes?

With plain Ansible you would typically:

  • Write several playbooks (backup.yml, update.yml, verify.yml).
  • Build a shell script that calls them in order.
  • Put cron jobs or systemd timers directly on ansible-playbook.
  • Hand your teammates a README explaining what to run in which order.

That works—but you fight:

  • Python/Ansible versions on different admin machines.
  • Individual tool setups (rclone, pg_dump versions, remote configuration).
  • Playbook sprawl without a clear frame.
  • Documentation that goes stale quickly.

With Polycrate:

  • Ansible always runs in the container—no local dependency drift, no version debates.
  • Playbooks get a stable home as blocks with clear actions and configuration.
  • You define full operational processes as workflows and run them by name.
  • You can encrypt the entire workspace in one step—including credentials and secrets.
  • Your team shares blocks via OCI registries or PolyHub—“sharable automation” instead of copy-paste across repos.

The code is not only implementation but documentation: whoever reads the workflow understands the process.


Frequently asked questions

Is a Polycrate workflow the same as a CI/CD workflow?

No. CI/CD workflows (GitHub Actions, GitLab CI, Jenkins pipelines) usually run on code changes in a repository context. Polycrate workflows run where you manage infrastructure and systems:

  • They orchestrate actions in your blocks (e.g. backup, update, verify).
  • They are tightly coupled to your Ansible inventory and operational reality.
  • They can be triggered via CLI, systemd timer, or external orchestrators.

You can combine both: for example, a CI/CD workflow can run polycrate workflows run nightly-maintenance on an automation host when conditions are met.
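
Sketched as a GitHub Actions job (runner choice, SSH key handling, and the trigger are assumptions; host and paths come from the example above):

```yaml
# Hypothetical CI job: trigger the Polycrate workflow on the automation
# host over SSH. Secret and SSH key setup is omitted for brevity.
name: trigger-nightly-maintenance
on:
  workflow_dispatch: {}
jobs:
  run-workflow:
    runs-on: ubuntu-latest
    steps:
      - name: Run Polycrate workflow on automation host
        run: |
          ssh automation@automation01.acme-corp.com \
            'cd /opt/polycrate-workspaces/acme-corp-automation && polycrate workflows run nightly-maintenance'
```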

How do I integrate Polycrate workflows with cron instead of systemd?

If you prefer cron, define a job that cds into the workspace and runs the workflow:

0 2 * * * cd /opt/polycrate-workspaces/acme-corp-automation && /usr/local/bin/polycrate workflows run nightly-maintenance >> /var/log/polycrate-nightly.log 2>&1
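
If a run can overflow into the next window, wrapping the command in flock prevents overlapping executions (the lock file path is an assumption):

```
0 2 * * * flock -n /var/lock/polycrate-nightly.lock sh -c 'cd /opt/polycrate-workspaces/acme-corp-automation && /usr/local/bin/polycrate workflows run nightly-maintenance' >> /var/log/polycrate-nightly.log 2>&1
```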

The Polycrate advantage stays the same:

  • One stable command per process.
  • Execution still runs containerized with a consistent toolchain.
  • Your Ansible playbooks stay structured in the block model.

How does ayedo help teams adopt such maintenance workflows?

We see these workflows as core to professional operations automation:

  • In workshops and with our consulting packages we help teams migrate existing playbooks into Polycrate blocks and workflows.
  • We support block design, workspace encryption, and integration into existing operations (e.g. systemd, cron, or platform automation).
  • Via PolyHub we publish official blocks you can use as building blocks for your own workflows (monitoring, Kubernetes, security, storage, and more).

More questions? See our FAQ


From routine to reproducibility

In this article you saw a full example of a real operational process as a Polycrate workflow:

  • A backup step that creates database dumps, tarballs them, and uploads reliably to S3.
  • An update step that uses Ansible block/rescue for package upgrades and automatic rollback from the latest backup when something fails.
  • A verify step that explicitly checks service and HTTP endpoint after the run.
  • A workflow that wraps these three steps under a stable name and runs via CLI, systemd, or cron.

Instead of fragmented scripts, implicit dependencies, and stale READMEs you now have:

  • A versioned workspace that describes the reality of your systems.
  • Containerized execution that removes dependency and team inconsistency issues.
  • The ability to share such blocks inside the company or via registries so other teams can use the same standard.

At ayedo we work daily with teams taking this step: from ad-hoc automation to reusable, traceable operational processes. Whether you are structuring your first Ansible playbooks or turning existing runbooks into code—we can help without throwing away your current environment.

If you want to see how such a workflow could look in your environment—with your hosts, compliance needs, and toolchains—find suitable formats and the next step on the workshops overview (including operations automation demos).
