The unarchive module in Ansible is super handy for passing around compressed files and then extracting them on the remote system. It supports a number of archive formats, and one other feature that I like…creates.

There is, however, one thing that bugs me about creates, specifically with respect to the unarchive module: you can make it misbehave.

When unarchive runs (and the creates argument is specified), it will check for the existence of the file before it bothers extracting the archive.

Let’s consider what happens, though, when you have a very large archive and you send the famed ctrl+c to the ansible-playbook command.

Here’s my playbook

# unarchive-qcow.yaml
- name: Example of tricking creates
  hosts: localhost
  gather_facts: true
  connection: local

  tasks:

      - name: Unarchive qcow
        unarchive:
            src: "/tmp/some-big-archive.zip"
            dest: "/root/"
            copy: "no"
            creates: "/root/image.qcow2"

And here’s our invocation

ansible-playbook -i notahost, unarchive-qcow.yaml

If you run this playbook normally and let it finish, a sha1sum of the extracted file gives the expected output. Alternatively, we could use the stat module to get the same information.
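A stat-based check might look something like this (the register variable name is arbitrary; stat computes a sha1 checksum by default):

```yaml
- name: Stat the qcow
  stat:
      path: "/root/image.qcow2"
  register: st

- name: Show its checksum
  debug:
      var: st.stat.checksum
```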

ddadbaf616e27d9bb8095b46fc4198e04dfe5ce2

That is the correct checksum for this file. What happens, though, if you kill the playbook partway through the extraction of the archive?

f8d1619a1baf91debd45513cb9f9f55c4d81d750

Oh hai there incorrect checksum.
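You can reproduce the essence of this failure without a multi-gigabyte archive by hand-truncating a file; the paths and contents here are purely illustrative:

```shell
# Fake a fully extracted image, then the truncated leftover of an interrupt
printf 'pretend qcow2 contents' > /tmp/image.qcow2
head -c 10 /tmp/image.qcow2 > /tmp/image.partial.qcow2

full=$(sha1sum /tmp/image.qcow2 | awk '{print $1}')
partial=$(sha1sum /tmp/image.partial.qcow2 | awk '{print $1}')

# Different contents, different checksums...
[ "$full" != "$partial" ] && echo "checksums differ"

# ...but creates only tests existence, and the truncated file exists too
[ -e /tmp/image.partial.qcow2 ] && echo "creates would happily skip the task"
```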

Taking it a step further though, you get yourself into a “situation” the next time you pass the playbook back through Ansible. Namely, this message:

ok: [localhost] => {
  "changed": false,
  "msg": "skipped, since /root/image.qcow2 exists"
}

Uh oh. That’s suggesting that we don’t need to re-unarchive the file. I mean…it exists. However, it exists incorrectly.

A checksum check would be more appropriate in this case.

Maybe we could revamp our playbook to address those concerns.

# unarchive-qcow.yaml
- name: Example of tricking creates
  hosts: localhost
  gather_facts: true
  connection: local

  vars:
      checksum: ddadbaf616e27d9bb8095b46fc4198e04dfe5ce2

  tasks:
      - name: Stat the qcow
        stat:
            path: "/root/image.qcow2"
        register: st

      - name: Unarchive qcow
        unarchive:
            src: "/tmp/some-big-archive.zip"
            dest: "/root/"
            copy: "no"
        when: st.stat.checksum|default('None') != checksum

Take a moment to digest that.

The changes we made were to eliminate the creates argument to unarchive entirely and to replace it with a step that gets the checksum of the file before we attempt to extract it.

We also provide the checksum that should be expected of the file.

Our when condition also includes a default() filter, because the file itself may not initially exist, and that would cause Ansible to stop execution with an “error while evaluating conditional” message if the filter were not there.

I specifically do not check for defined-ness of st, because st will always be defined by the stat module. I could check for defined-ness of st.stat.checksum, but since I want to re-unarchive the file if the checksum is incorrect, it’s simpler to just default to something that could never possibly be a checksum: the value “None”.
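In shell terms, the condition behaves like a parameter expansion with a fallback; the path below is illustrative and deliberately absent, to exercise the “file does not exist” case:

```shell
expected="ddadbaf616e27d9bb8095b46fc4198e04dfe5ce2"
qcow="/tmp/demo-image.qcow2"   # illustrative path
rm -f "$qcow"                  # make sure the file is absent

# sha1 of the file when present, empty when absent
actual="$(sha1sum "$qcow" 2>/dev/null | awk '{print $1}')"

# ${actual:-None} plays the role of st.stat.checksum | default('None'):
# an absent file falls back to "None", which never equals a real sha1
[ "${actual:-None}" != "$expected" ] && echo "would extract the archive"
```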

With those changes in place, our unarchive steps behave as we initially intended, and we will not be faced with awkward VM or CT boot errors caused by corrupt images that, on the surface, look perfectly fine because “well the image is there bro”.

It is, except it’s not all there.