Bare repositories in the working tree are a security risk

From: Glen Choo <chooglen@google.com>
To: git@vger.kernel.org
Cc: Emily Shaffer <emilyshaffer@google.com>,
	justin@justinsteven.com, Taylor Blau <me@ttaylorr.com>
Subject: Bare repositories in the working tree are a security risk
Date: Wed, 06 Apr 2022 15:43:08 -0700	[thread overview]
Message-ID: <kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com> (raw)

Hi all,

My Google colleagues and I would like to address what we think is a security
risk - namely that Git repositories can contain valid bare repositories in their
working trees. This is based on an excellent article by Justin Steven [1] (CC-ed
here).

Below is a discussion of:

* The risky behavior - what Git does and why it is risky
* Possible responses to this risk
* A proposed approach

The proposed changes are nontrivial, so I’d really appreciate any feedback here.
Unfortunately, I will be out of the office and won’t respond to emails for the
next 7 days or so, but there will still be at least one Google Git team member
keeping tabs on the discussion :)

= TL;DR

Git repositories should not be allowed to contain bare repositories in their
working trees because:

* Such bare repositories can have maliciously crafted config files that cause
  `git` to execute arbitrary code.
* Git itself can distribute the malicious repo via `git clone`; no need to
  distribute repos out of band e.g. via tarballs [2].
* Many `git` commands can be affected by malicious config files, and many users
  have tools that will run `git` in the current directory or the subdirectories
  of a repo. Once the malicious repo has been cloned, very little social
  engineering is needed; the user might only need to open the repo in an editor
  or `cd` into the correct subdirectory.

= Background

(This section is primarily a summary of [1]. I highly, highly recommend reading
that as it describes the issue in much more detail and is extremely readable
regardless of Git experience.)

Certain Git configuration options are particularly enticing targets for
attackers, e.g. core.fsmonitor can execute arbitrary commands and is invoked
on many innocuous-looking `git` commands (like `git status`). This is even more
risky when one considers that many tools (like shell prompts and IDEs) will run
`git` opportunistically inside a repository - so many users won't even need to
explicitly run `git` to be affected [3].

Since config files are such an enticing target for attackers, Git intentionally
avoids distributing config files with repos - a user shouldn't be able to `git
clone` a repo with a config file (or really, any files inside .git). However,
one can 'trick' Git into doing this by embedding a bare repository inside of
another, containing repository: a repository can contain subdirectories that are
valid bare repositories, and any `git` operations run in such a subdirectory
will then use the bare repository’s config file instead of the "main"
repository’s.

An attack might look like this:

* Attacker creates a repository where subdirectory "Documentation/" is a bare
  repository i.e. it contains "HEAD", "refs/" and "objects/" [4]. Attacker
  also adds "config" with a malicious setting for core.fsmonitor.
* Attacker convinces User to read their project's documentation by `git
  clone`-ing their repository and inspecting the "Documentation/" directory.
* User cd-s into "Documentation/" and their shell prompt runs `git status`,
  executing the core.fsmonitor command defined by Attacker.

= What can we do about it?

Each subsection is an alternative and an analysis (+/- are pros/cons).

== 1. Prevent users from checking out bare repos

This is similar to an existing mitigation where we prevent repository entries
that can be confused for ".git" (like ".GIT"). but it requires checking multiple
entries instead of a single entry. I suspect that we could accomplish this in
one of two ways:

a. Prevent bare repos from entering the index.
b. Prevent writing bare repos to the working tree.

+ Relatively robust protection - because the malicious repo never enters the
  working tree, we even protect other tools (e.g. JGit) from doing dangerous
  things in the embedded repo (provided the checkout is done with `git`, of
  course).
- This breaks some 'valid' workflows (e.g. someone embedding a bare repo as a
  more convenient alternative to submodules), but it seems reasonable to let
  users opt out of this behavior.
- (1a) is difficult to do in practice because many code paths add entries to
  the index, and checking a combination of new entry and existing entries is
  much trickier than checking just the new entry.
- (1b) might also be difficult, though not as difficult as 1a because we
  already have a complete list of entries we will write. I don’t think there
  are existing facilities that do this sort of checking of multiple entries.

== 2. Detect and reject bare repos using `git fsck` and `transfer.fsckObjects`.

This entails checking for the markers of a bare repository (HEAD, refs/,
objects/) in tree objects. This shares a precedent with (1), since `git fsck`
will also detect ".GIT".

+ Most reputable hosting sites set `transfer.fsckObjects`, which allows them to
  detect and prevent this kind of transmission.
+ Confers some protection to users even without them doing anything.
+ Easy to visualize and to write.
- This won’t directly protect most users because they don’t typically set
  `transfer.fsckObjects` (in fact, `transfer.fsckObjects` will render many
  projects, like linux.git, uncloneable without additional config)
- Won’t guard against malicious/poorly configured hosts.

== 3. Detect that we are in an embedded bare repo and ignore the embedded bare
 repository in favor of the containing repo.

For example, if setup.c detects that we are in a bare repo, it could then walk
up the directory hierarchy to see if there is a containing repo that tracks the
bare one. If so, setup.c chooses to use the containing repo instead.

+ Extremely robust; this even protects against a checkout by an earlier Git
  version.
+ Users who don’t use bare repos won’t even notice the difference.
- The change in rules breaks some legitimate workflows e.g. a user with a repo
  at HOME and bare repos underneath.
- Potentially very expensive for bare repo users because setup.c will likely
  walk up to the root directory; we’d be essentially *forcing* those users to
  set GIT_CEILING_DIRECTORIES.
- Doesn’t protect users who use other tools e.g. JGit.

== 4. Educate users about this risk without making code changes.

Some risks fall into this category e.g. "Avoid unarchiving repositories because
.git/config might be poisoned." [2].

+ We don’t break any users.
- Breaks the trust that users have in `git clone`.
- IMO "Inspect every repo in its entirety before cloning it" is too much of a
  burden to put on users.

= Next steps

I propose that we prevent repositories from containing bare repositories by
doing the following (in order):

* Implement (2) by adding a new fsck message "embeddedBareRepo".
  * When this is done, hosting sites can hopefully use this capability to
    prevent transmission, and help us understand the prevalence of such attacks.
* Implement (1b) by teaching unpack_trees.c to check whether the tree contains
  an entire bare repo, and die() if so. This will be guarded by a
  defaults-to-true config value.
  * This would only block a bare repo from being written in a single operation.
    It wouldn’t stop a user from writing a bare repo entry-by-entry using "git
    checkout <path>", but the amount of social engineering required probably
    renders this attack infeasible.
  * As I noted earlier, I foresee some difficulty actually implementing this
    because I don’t think we have facilities for checking multiple tree entries
    at once.

I am particularly interested in hearing feedback about (1b), namely:

* How to actually teach unpack_trees.c to detect bare repos.
* Whether preventing bare repos in unpack_trees.c is a good enough mitigation
  (e.g. Are there other code paths that should block bare repos? Should we be
  checking the index instead of the tree?).

I have a patch that does (2); I will send that out and we can leave feedback on
that separately.

= Demonstration

This is based on a script by Taylor Blau (thanks!). [1] also contains a
demonstration that runs in Docker.

  #!/bin/sh

  rm -fr malicious cloned &&git init malicious &&

  (
  cd malicious &&

  mkdir -p bare &&
  cd bare &&

  echo 'ref: refs/heads/main' >HEAD &&
  cat >config <<-EOF
  [core]
  repositoryformatversion = 0
  filemode = true
  bare = false
  worktree = "worktree"
  fsmonitor = "echo pwned >&2; false"
  EOF

  mkdir objects refs worktree &&
  touch worktree/.gitkeep &&

  git add . &&
  git commit -m ".gitkeep" &&

  cd .. &&
  git add bare &&
  git commit -m 'initial commit'
  ) &&

  git clone --no-local malicious cloned &&
  cd cloned/bare &&
  git status # pwned

= Footnotes

[1] https://github.com/justinsteven/advisories/blob/main/2022_git_buried_bare_repos_and_fsmonitor_various_abuses.md. 
[2] Archived repositories with malicious .git directories are a known risk. See
 https://lore.kernel.org/git/20171003123239.lisk43a2goxtxkro@sigill.intra.peff.net/
 for an on-list discussion. This is also described in
 https://blog.sonarsource.com/securing-developer-tools-git-integrations
 (referenced in [1]).
[3] We even ship such a tool - contrib/completion/git-prompt.sh. A user can pwn
 themselves with tab completion even if they don’t have a prompt configured
[4] Any directory containing these entries will be recognized as a bare repo, so
 an attacker could add arbitrary entries to the directory to obfuscate the fact
 that it is a bare repo.
[*] https://offensi.com/2019/12/16/4-google-cloud-shell-bugs-explained-bug-3/ is
 similar to [1], but uses hooks instead of core.fsmonitor.