Disaster recovery using the GitHub Events API » serra.me

Disaster recovery using the GitHub Events API

Last Sunday, an unknown individual compromised the libretro project. First, the attacker hijacked and wiped the project’s buildbot server. After gaining access to the server, they took over the GitHub account of a high-ranking member of the libretro team. Using this account, the attacker destroyed multiple repositories managed by the libretro organization by force-pushing a blank initial commit into each affected repository.

Such attacks are not exactly uncommon and have happened multiple times before. At first glance, force-pushing an empty commit into a repository means that any data stored in this repository is lost.

Usually, Git doesn’t delete files

The only thing force-pushing an ‘empty’ commit into an existing repository really does is reset its history: the branch now points to the new commit, and nothing references the previous commits any more. As long as you (or GitHub in this case) don’t invoke Git’s garbage collector, your data is still there.
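To make this concrete, here is a minimal local sketch of the same effect. It does not talk to GitHub; it only simulates the wipe in a throwaway repository (the directory, file and branch names are made up for illustration) and shows that the old commit can still be read by its hash afterwards.

```shell
#!/bin/sh
# Local demonstration only: a commit that no branch points to any more
# stays in the object database until garbage collection prunes it.
set -eu

demo_dir=$(mktemp -d)                      # throwaway directory
cd "$demo_dir"
git init -q .
git symbolic-ref HEAD refs/heads/master    # use 'master' regardless of local defaults

# A legitimate commit containing some work.
echo "important code" > file.txt
git add file.txt
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "legitimate work"
good=$(git rev-parse HEAD)

# Simulate the attack: replace the branch with a fresh, parentless commit.
git checkout -q --orphan wiped
git rm -q -rf .
: > README.md
git add README.md
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "initial"
git branch -M master                       # overwrite master with the wiped history

git log --oneline                          # only the 'initial' commit is listed
git cat-file -t "$good"                    # prints "commit": the object still exists
git show "$good:file.txt"                  # prints "important code"
```

In a local repository the reflog also keeps the old commit alive for a while, so this is only an analogy for what happens on the server side.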

To restore the repository, all we need is the hash of the last legitimate commit, even if we have to rely on a fresh (and seemingly empty) clone of the repository.

Luckily, it is possible to retrieve this commit hash through GitHub’s Events API. For this example, we take a look at the libretro/libretro-samples repository, cloned a couple of hours after the incident. At this point, the repo’s history looks like this:

# git log
commit daf240eecb37aaa15e2ad4c499bb9fb39132ddc9 (HEAD -> master, origin/master, origin/HEAD)
Author: Your Name <you@example.com>
Date:   Sun Aug 16 11:58:28 2020 +0800

    initial

The only thing left is an empty file called README.md.

Operations in a GitHub repository, such as pushes, trigger events that we can access through the API. In this case, we are looking for events of the type PushEvent. Using curl and jq, we can fetch the data for the offending PushEvent using the following command:

curl https://api.github.com/repos/libretro/libretro-samples/events | jq '.[] | select(.payload.commits[]?.sha=="daf240eecb37aaa15e2ad4c499bb9fb39132ddc9") | .payload'

This results in the following output:

{
  "push_id": 5535666635,
  "size": 1,
  "distinct_size": 1,
  "ref": "refs/heads/master",
  "head": "daf240eecb37aaa15e2ad4c499bb9fb39132ddc9",
  "before": "094d8d807da9dff5a0ec5ab4958cb68eb8c275ce",
  "commits": [
    {
      "sha": "daf240eecb37aaa15e2ad4c499bb9fb39132ddc9",
      "author": {
        "email": "you@example.com",
        "name": "Your Name"
      },
      "message": "initial",
      "distinct": true,
      "url": "https://api.github.com/repos/libretro/libretro-samples/commits/daf240eecb37aaa15e2ad4c499bb9fb39132ddc9"
    }
  ]
}

The hash of the commit the branch pointed to right before the incident happened is stored in the before key, which means that in this case 094d8d807da9dff5a0ec5ab4958cb68eb8c275ce is the last known good commit.
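If you only want the hash itself, the lookup can be narrowed down further. The following is a small sketch, not a polished tool: `before_of_push` is a hypothetical helper name, and it assumes that the offending event is on the first page of API results and that the payload carries the `head` and `before` keys shown above.

```shell
# before_of_push HEAD_SHA
# Reads the JSON array returned by the Events API on stdin and prints the
# "before" hash of every PushEvent whose head is HEAD_SHA.
before_of_push() {
  jq -r --arg head "$1" \
    '.[] | select(.type == "PushEvent" and .payload.head == $head) | .payload.before'
}

# Usage (requires network access, curl and jq):
# curl -s https://api.github.com/repos/libretro/libretro-samples/events \
#   | before_of_push daf240eecb37aaa15e2ad4c499bb9fb39132ddc9
```

Matching on `head` instead of the commits list keeps the filter short and returns one line per matching push.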

To return to the ‘good’ commit and recover the code, we need to fetch it and hard-reset the repository to this commit:

git fetch origin 094d8d807da9dff5a0ec5ab4958cb68eb8c275ce
git reset --hard 094d8d807da9dff5a0ec5ab4958cb68eb8c275ce

If we now check the history of the repository again, all commits are back and so is the code. All that is left to do is to (forcefully) push it back into the repository on GitHub.
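The final push could look roughly like the sketch below. `restore_branch` is a hypothetical helper, not part of Git; it assumes the good commit has already been fetched as described above and pushes the hash directly to the branch instead of relying on whatever is currently checked out.

```shell
# restore_branch REMOTE BRANCH GOOD_SHA
# Force-pushes GOOD_SHA to BRANCH on REMOTE after checking that the commit
# object is actually available in the local repository.
restore_branch() {
  remote=$1
  branch=$2
  good=$3
  # Stop here if the commit is not present locally (e.g. the fetch failed).
  git cat-file -e "${good}^{commit}" || return 1
  # Overwrite the remote branch with the recovered commit.
  git push --force "$remote" "${good}:refs/heads/${branch}"
}

# Usage for the example in this post:
# restore_branch origin master 094d8d807da9dff5a0ec5ab4958cb68eb8c275ce
```

Checking for the object first avoids force-pushing a hash that was mistyped or never fetched.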

Conclusion

As long as you are able to quickly mitigate the attack itself by revoking any illegitimate access, chances are high that you will be able to restore your repositories in this scenario. However, this process relies on the absence of Git’s garbage collection and the knowledge of the last known good commit hash.

This garbage collection performs some maintenance tasks like (really) deleting obsolete data and orphaned commits. We simply don’t know how often GitHub runs garbage collection on its repositories. They claim that their API retains the data for 90 days, so I assume that at least during this 90-day window, restoring the repository is still possible.

If you run your own Git server, you won’t have access to an API like the one we used here. In this case you need to keep track of the commit hashes on your own. Losing access to the commit hash means losing access to the data.
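One simple way to keep track of the hashes yourself is to record the branch heads at regular intervals. The sketch below is only an illustration under a few assumptions: `record_heads` is a made-up helper name, the log file should live on a machine the attacker cannot reach, and you would call it from cron or a similar scheduler.

```shell
# record_heads REMOTE_URL LOGFILE
# Appends one line per branch to LOGFILE: UTC timestamp, commit hash, ref name.
record_heads() {
  stamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  git ls-remote --heads "$1" | while read -r sha ref; do
    printf '%s %s %s\n' "$stamp" "$sha" "$ref"
  done >> "$2"
}

# Usage (URL and path are placeholders):
# record_heads git@git.example.com:project.git /var/backups/project-heads.log
```

With such a log, the last hash recorded before an incident plays the role of the before key from the Events API. On a server you control, it could then be restored directly in the bare repository, for example with `git update-ref`, provided garbage collection has not removed the objects yet.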
