Re: Bare repositories in the working tree are a security risk

From: John Cai <johncai86@gmail.com>
To: Glen Choo <chooglen@google.com>
Cc: git@vger.kernel.org, Emily Shaffer <emilyshaffer@google.com>,
	justin@justinsteven.com, Taylor Blau <me@ttaylorr.com>
Subject: Re: Bare repositories in the working tree are a security risk
Date: Thu, 07 Apr 2022 14:38:41 -0400	[thread overview]
Message-ID: <FF4B21A6-A8B6-4142-B325-B9C9193BD885@gmail.com> (raw)
In-Reply-To: <kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com>

Hi Glen,

This is an interesting find. Thanks for writing it up.

On 6 Apr 2022, at 18:43, Glen Choo wrote:

> Hi all,
>
> My Google colleagues and I would like to address what we think is a security
> risk - namely that Git repositories can contain valid bare repositories in their
> working trees. This is based on an excellent article by Justin Steven [1] (CC-ed
> here).
>
> Below is a discussion of:
>
> * The risky behavior - what Git does and why it is risky
> * Possible responses to this risk
> * A proposed approach
>
> The proposed changes are nontrivial, so I’d really appreciate any feedback here.
> Unfortunately, I will be out of the office and won’t respond to emails for the
> next 7 days or so, but there will still be at least one Google Git team member
> keeping tabs on the discussion :)
>
> = TL;DR
>
> Git repositories should not be allowed to contain bare repositories in their
> working trees because:
>
> * Such bare repositories can have maliciously crafted config files that cause
>   `git` to execute arbitrary code.
> * Git itself can distribute the malicious repo via `git clone`; no need to
>   distribute repos out of band e.g. via tarballs [2].
> * Many `git` commands can be affected by malicious config files, and many users
>   have tools that will run `git` in the current directory or the subdirectories
>   of a repo. Once the malicious repo has been cloned, very little social
>   engineering is needed; the user might only need to open the repo in an editor
>   or `cd` into the correct subdirectory.
>
> = Background
>
> (This section is primarily a summary of [1]. I highly, highly recommend reading
> that as it describes the issue in much more detail and is extremely readable
> regardless of Git experience.)
>
> Certain Git configuration options are particularly enticing targets for
> attackers, e.g. core.fsmonitor can execute arbitrary commands and is invoked
> on many innocuous-looking `git` commands (like `git status`). This is even more
> risky when one considers that many tools (like shell prompts and IDEs) will run
> `git` opportunistically inside a repository - so many users won't even need to
> explicitly run `git` to be affected [3].
>
> Since config files are such an enticing target for attackers, Git intentionally
> avoids distributing config files with repos - a user shouldn't be able to `git
> clone` a repo with a config file (or really, any files inside .git). However,
> one can 'trick' Git into doing this by embedding a bare repository inside of
> another, containing repository: a repository can contain subdirectories that are
> valid bare repositories, and any `git` operations run in such a subdirectory
> will then use the bare repository’s config file instead of the "main"
> repository’s.
>
> An attack might look like this:
>
> * Attacker creates a repository where subdirectory "Documentation/" is a bare
>   repository i.e. it contains "HEAD", "refs/" and "objects/" [4]. Attacker
>   also adds "config" with a malicious setting for core.fsmonitor.
> * Attacker convinces User to read their project's documentation by `git
>   clone`-ing their repository and inspecting the "Documentation/" directory.
> * User cd-s into "Documentation/" and their shell prompt runs `git status`,

This might be obvious to most reading this, but for those trying this out at home,
you'll need to create a worktree that points to the bare repository and then run
`git status` in there.

>   executing the core.fsmonitor command defined by Attacker.

This is easily reproduceable.

>
> = What can we do about it?
>
> Each subsection is an alternative and an analysis (+/- are pros/cons).
>
> == 1. Prevent users from checking out bare repos
>
> This is similar to an existing mitigation where we prevent repository entries
> that can be confused for ".git" (like ".GIT"). but it requires checking multiple
> entries instead of a single entry. I suspect that we could accomplish this in
> one of two ways:
>
> a. Prevent bare repos from entering the index.
> b. Prevent writing bare repos to the working tree.
>
> + Relatively robust protection - because the malicious repo never enters the
>   working tree, we even protect other tools (e.g. JGit) from doing dangerous
>   things in the embedded repo (provided the checkout is done with `git`, of
>   course).
> - This breaks some 'valid' workflows (e.g. someone embedding a bare repo as a
>   more convenient alternative to submodules), but it seems reasonable to let
>   users opt out of this behavior.
> - (1a) is difficult to do in practice because many code paths add entries to
>   the index, and checking a combination of new entry and existing entries is
>   much trickier than checking just the new entry.
> - (1b) might also be difficult, though not as difficult as 1a because we
>   already have a complete list of entries we will write. I don’t think there
>   are existing facilities that do this sort of checking of multiple entries.
>
> == 2. Detect and reject bare repos using `git fsck` and `transfer.fsckObjects`.
>
> This entails checking for the markers of a bare repository (HEAD, refs/,
> objects/) in tree objects. This shares a precedent with (1), since `git fsck`
> will also detect ".GIT".
>
> + Most reputable hosting sites set `transfer.fsckObjects`, which allows them to
>   detect and prevent this kind of transmission.
> + Confers some protection to users even without them doing anything.
> + Easy to visualize and to write.
> - This won’t directly protect most users because they don’t typically set
>   `transfer.fsckObjects` (in fact, `transfer.fsckObjects` will render many
>   projects, like linux.git, uncloneable without additional config)
> - Won’t guard against malicious/poorly configured hosts.
>
> == 3. Detect that we are in an embedded bare repo and ignore the embedded bare
>  repository in favor of the containing repo.
>
> For example, if setup.c detects that we are in a bare repo, it could then walk
> up the directory hierarchy to see if there is a containing repo that tracks the
> bare one. If so, setup.c chooses to use the containing repo instead.
>
> + Extremely robust; this even protects against a checkout by an earlier Git
>   version.
> + Users who don’t use bare repos won’t even notice the difference.
> - The change in rules breaks some legitimate workflows e.g. a user with a repo
>   at HOME and bare repos underneath.
> - Potentially very expensive for bare repo users because setup.c will likely
>   walk up to the root directory; we’d be essentially *forcing* those users to
>   set GIT_CEILING_DIRECTORIES.
> - Doesn’t protect users who use other tools e.g. JGit.
>
> == 4. Educate users about this risk without making code changes.
>
> Some risks fall into this category e.g. "Avoid unarchiving repositories because
> .git/config might be poisoned." [2].
>
> + We don’t break any users.
> - Breaks the trust that users have in `git clone`.
> - IMO "Inspect every repo in its entirety before cloning it" is too much of a
>   burden to put on users.
>
> = Next steps
>
> I propose that we prevent repositories from containing bare repositories by
> doing the following (in order):
>
> * Implement (2) by adding a new fsck message "embeddedBareRepo".
>   * When this is done, hosting sites can hopefully use this capability to
>     prevent transmission, and help us understand the prevalence of such attacks.
> * Implement (1b) by teaching unpack_trees.c to check whether the tree contains
>   an entire bare repo, and die() if so. This will be guarded by a
>   defaults-to-true config value.
>   * This would only block a bare repo from being written in a single operation.
>     It wouldn’t stop a user from writing a bare repo entry-by-entry using "git
>     checkout <path>", but the amount of social engineering required probably
>     renders this attack infeasible.
>   * As I noted earlier, I foresee some difficulty actually implementing this
>     because I don’t think we have facilities for checking multiple tree entries
>     at once.
>
> I am particularly interested in hearing feedback about (1b), namely:
>
> * How to actually teach unpack_trees.c to detect bare repos.
> * Whether preventing bare repos in unpack_trees.c is a good enough mitigation
>   (e.g. Are there other code paths that should block bare repos? Should we be
>   checking the index instead of the tree?).
>
> I have a patch that does (2); I will send that out and we can leave feedback on
> that separately.
>
> = Demonstration
>
> This is based on a script by Taylor Blau (thanks!). [1] also contains a
> demonstration that runs in Docker.
>
>   #!/bin/sh
>
>   rm -fr malicious cloned &&git init malicious &&
>
>   (
>   cd malicious &&
>
>   mkdir -p bare &&
>   cd bare &&
>
>   echo 'ref: refs/heads/main' >HEAD &&
>   cat >config <<-EOF
>   [core]
>   repositoryformatversion = 0
>   filemode = true
>   bare = false
>   worktree = "worktree"
>   fsmonitor = "echo pwned >&2; false"
>   EOF
>
>   mkdir objects refs worktree &&
>   touch worktree/.gitkeep &&
>
>   git add . &&
>   git commit -m ".gitkeep" &&
>
>   cd .. &&
>   git add bare &&
>   git commit -m 'initial commit'
>   ) &&
>
>   git clone --no-local malicious cloned &&
>   cd cloned/bare &&
>   git status # pwned
>
> = Footnotes
>
> [1] https://github.com/justinsteven/advisories/blob/main/2022_git_buried_bare_repos_and_fsmonitor_various_abuses.md.
> [2] Archived repositories with malicious .git directories are a known risk. See
>  https://lore.kernel.org/git/20171003123239.lisk43a2goxtxkro@sigill.intra.peff.net/
>  for an on-list discussion. This is also described in
>  https://blog.sonarsource.com/securing-developer-tools-git-integrations
>  (referenced in [1]).
> [3] We even ship such a tool - contrib/completion/git-prompt.sh. A user can pwn
>  themselves with tab completion even if they don’t have a prompt configured
> [4] Any directory containing these entries will be recognized as a bare repo, so
>  an attacker could add arbitrary entries to the directory to obfuscate the fact
>  that it is a bare repo.
> [*] https://offensi.com/2019/12/16/4-google-cloud-shell-bugs-explained-bug-3/ is
>  similar to [1], but uses hooks instead of core.fsmonitor.