git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Cai <johncai86@gmail.com>
To: Glen Choo <chooglen@google.com>
Cc: git@vger.kernel.org, Emily Shaffer <emilyshaffer@google.com>,
	justin@justinsteven.com, Taylor Blau <me@ttaylorr.com>
Subject: Re: Bare repositories in the working tree are a security risk
Date: Thu, 07 Apr 2022 14:38:41 -0400	[thread overview]
Message-ID: <FF4B21A6-A8B6-4142-B325-B9C9193BD885@gmail.com> (raw)
In-Reply-To: <kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com>

Hi Glen,

This is an interesting find. Thanks for writing it up.

On 6 Apr 2022, at 18:43, Glen Choo wrote:

> Hi all,
>
> My Google colleagues and I would like to address what we think is a security
> risk - namely that Git repositories can contain valid bare repositories in their
> working trees. This is based on an excellent article by Justin Steven [1] (CC-ed
> here).
>
> Below is a discussion of:
>
> * The risky behavior - what Git does and why it is risky
> * Possible responses to this risk
> * A proposed approach
>
> The proposed changes are nontrivial, so I’d really appreciate any feedback here.
> Unfortunately, I will be out of the office and won’t respond to emails for the
> next 7 days or so, but there will still be at least one Google Git team member
> keeping tabs on the discussion :)
>
> = TL;DR
>
> Git repositories should not be allowed to contain bare repositories in their
> working trees because:
>
> * Such bare repositories can have maliciously crafted config files that cause
>   `git` to execute arbitrary code.
> * Git itself can distribute the malicious repo via `git clone`; no need to
>   distribute repos out of band e.g. via tarballs [2].
> * Many `git` commands can be affected by malicious config files, and many users
>   have tools that will run `git` in the current directory or the subdirectories
>   of a repo. Once the malicious repo has been cloned, very little social
>   engineering is needed; the user might only need to open the repo in an editor
>   or `cd` into the correct subdirectory.
>
> = Background
>
> (This section is primarily a summary of [1]. I highly, highly recommend reading
> that as it describes the issue in much more detail and is extremely readable
> regardless of Git experience.)
>
> Certain Git configuration options are particularly enticing targets for
> attackers, e.g. core.fsmonitor can execute arbitrary commands and is invoked
> on many innocuous-looking `git` commands (like `git status`). This is even more
> risky when one considers that many tools (like shell prompts and IDEs) will run
> `git` opportunistically inside a repository - so many users won't even need to
> explicitly run `git` to be affected [3].
>
> Since config files are such an enticing target for attackers, Git intentionally
> avoids distributing config files with repos - a user shouldn't be able to `git
> clone` a repo with a config file (or really, any files inside .git). However,
> one can 'trick' Git into doing this by embedding a bare repository inside of
> another, containing repository: a repository can contain subdirectories that are
> valid bare repositories, and any `git` operations run in such a subdirectory
> will then use the bare repository’s config file instead of the "main"
> repository’s.
>
> An attack might look like this:
>
> * Attacker creates a repository where subdirectory "Documentation/" is a bare
>   repository i.e. it contains "HEAD", "refs/" and "objects/" [4]. Attacker
>   also adds "config" with a malicious setting for core.fsmonitor.
> * Attacker convinces User to read their project's documentation by `git
>   clone`-ing their repository and inspecting the "Documentation/" directory.
> * User cd-s into "Documentation/" and their shell prompt runs `git status`,

This might be obvious to most reading this, but for those trying this out at home,
you'll need to create a worktree that points to the bare repository and then run
`git status` in there.

>   executing the core.fsmonitor command defined by Attacker.

This is easily reproduceable.

>
> = What can we do about it?
>
> Each subsection is an alternative and an analysis (+/- are pros/cons).
>
> == 1. Prevent users from checking out bare repos
>
> This is similar to an existing mitigation where we prevent repository entries
> that can be confused for ".git" (like ".GIT"). but it requires checking multiple
> entries instead of a single entry. I suspect that we could accomplish this in
> one of two ways:
>
> a. Prevent bare repos from entering the index.
> b. Prevent writing bare repos to the working tree.
>
> + Relatively robust protection - because the malicious repo never enters the
>   working tree, we even protect other tools (e.g. JGit) from doing dangerous
>   things in the embedded repo (provided the checkout is done with `git`, of
>   course).
> - This breaks some 'valid' workflows (e.g. someone embedding a bare repo as a
>   more convenient alternative to submodules), but it seems reasonable to let
>   users opt out of this behavior.
> - (1a) is difficult to do in practice because many code paths add entries to
>   the index, and checking a combination of new entry and existing entries is
>   much trickier than checking just the new entry.
> - (1b) might also be difficult, though not as difficult as 1a because we
>   already have a complete list of entries we will write. I don’t think there
>   are existing facilities that do this sort of checking of multiple entries.
>
> == 2. Detect and reject bare repos using `git fsck` and `transfer.fsckObjects`.
>
> This entails checking for the markers of a bare repository (HEAD, refs/,
> objects/) in tree objects. This shares a precedent with (1), since `git fsck`
> will also detect ".GIT".
>
> + Most reputable hosting sites set `transfer.fsckObjects`, which allows them to
>   detect and prevent this kind of transmission.
> + Confers some protection to users even without them doing anything.
> + Easy to visualize and to write.
> - This won’t directly protect most users because they don’t typically set
>   `transfer.fsckObjects` (in fact, `transfer.fsckObjects` will render many
>   projects, like linux.git, uncloneable without additional config)
> - Won’t guard against malicious/poorly configured hosts.
>
> == 3. Detect that we are in an embedded bare repo and ignore the embedded bare
>  repository in favor of the containing repo.
>
> For example, if setup.c detects that we are in a bare repo, it could then walk
> up the directory hierarchy to see if there is a containing repo that tracks the
> bare one. If so, setup.c chooses to use the containing repo instead.
>
> + Extremely robust; this even protects against a checkout by an earlier Git
>   version.
> + Users who don’t use bare repos won’t even notice the difference.
> - The change in rules breaks some legitimate workflows e.g. a user with a repo
>   at HOME and bare repos underneath.
> - Potentially very expensive for bare repo users because setup.c will likely
>   walk up to the root directory; we’d be essentially *forcing* those users to
>   set GIT_CEILING_DIRECTORIES.
> - Doesn’t protect users who use other tools e.g. JGit.
>
> == 4. Educate users about this risk without making code changes.
>
> Some risks fall into this category e.g. "Avoid unarchiving repositories because
> .git/config might be poisoned." [2].
>
> + We don’t break any users.
> - Breaks the trust that users have in `git clone`.
> - IMO "Inspect every repo in its entirety before cloning it" is too much of a
>   burden to put on users.
>
> = Next steps
>
> I propose that we prevent repositories from containing bare repositories by
> doing the following (in order):
>
> * Implement (2) by adding a new fsck message "embeddedBareRepo".
>   * When this is done, hosting sites can hopefully use this capability to
>     prevent transmission, and help us understand the prevalence of such attacks.
> * Implement (1b) by teaching unpack_trees.c to check whether the tree contains
>   an entire bare repo, and die() if so. This will be guarded by a
>   defaults-to-true config value.
>   * This would only block a bare repo from being written in a single operation.
>     It wouldn’t stop a user from writing a bare repo entry-by-entry using "git
>     checkout <path>", but the amount of social engineering required probably
>     renders this attack infeasible.
>   * As I noted earlier, I foresee some difficulty actually implementing this
>     because I don’t think we have facilities for checking multiple tree entries
>     at once.
>
> I am particularly interested in hearing feedback about (1b), namely:
>
> * How to actually teach unpack_trees.c to detect bare repos.
> * Whether preventing bare repos in unpack_trees.c is a good enough mitigation
>   (e.g. Are there other code paths that should block bare repos? Should we be
>   checking the index instead of the tree?).
>
> I have a patch that does (2); I will send that out and we can leave feedback on
> that separately.
>
> = Demonstration
>
> This is based on a script by Taylor Blau (thanks!). [1] also contains a
> demonstration that runs in Docker.
>
>   #!/bin/sh
>
>   rm -fr malicious cloned &&git init malicious &&
>
>   (
>   cd malicious &&
>
>   mkdir -p bare &&
>   cd bare &&
>
>   echo 'ref: refs/heads/main' >HEAD &&
>   cat >config <<-EOF
>   [core]
>   repositoryformatversion = 0
>   filemode = true
>   bare = false
>   worktree = "worktree"
>   fsmonitor = "echo pwned >&2; false"
>   EOF
>
>   mkdir objects refs worktree &&
>   touch worktree/.gitkeep &&
>
>   git add . &&
>   git commit -m ".gitkeep" &&
>
>   cd .. &&
>   git add bare &&
>   git commit -m 'initial commit'
>   ) &&
>
>   git clone --no-local malicious cloned &&
>   cd cloned/bare &&
>   git status # pwned
>
> = Footnotes
>
> [1] https://github.com/justinsteven/advisories/blob/main/2022_git_buried_bare_repos_and_fsmonitor_various_abuses.md.
> [2] Archived repositories with malicious .git directories are a known risk. See
>  https://lore.kernel.org/git/20171003123239.lisk43a2goxtxkro@sigill.intra.peff.net/
>  for an on-list discussion. This is also described in
>  https://blog.sonarsource.com/securing-developer-tools-git-integrations
>  (referenced in [1]).
> [3] We even ship such a tool - contrib/completion/git-prompt.sh. A user can pwn
>  themselves with tab completion even if they don’t have a prompt configured
> [4] Any directory containing these entries will be recognized as a bare repo, so
>  an attacker could add arbitrary entries to the directory to obfuscate the fact
>  that it is a bare repo.
> [*] https://offensi.com/2019/12/16/4-google-cloud-shell-bugs-explained-bug-3/ is
>  similar to [1], but uses hooks instead of core.fsmonitor.

  parent reply	other threads:[~2022-04-07 18:38 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-06 22:43 Bare repositories in the working tree are a security risk Glen Choo
2022-04-06 23:22 ` [PATCH] fsck: detect bare repos in trees and warn Glen Choo
2022-04-07 12:42   ` Johannes Schindelin
2022-04-07 13:21     ` Derrick Stolee
2022-04-07 14:14       ` Ævar Arnfjörð Bjarmason
2022-04-14 20:02         ` Glen Choo
2022-04-15 12:46           ` Ævar Arnfjörð Bjarmason
2022-04-07 15:11       ` Junio C Hamano
2022-04-13 22:24       ` Glen Choo
2022-04-07 13:12   ` Ævar Arnfjörð Bjarmason
2022-04-07 15:20   ` Junio C Hamano
2022-04-07 18:38 ` John Cai [this message]
2022-04-07 21:24 ` Bare repositories in the working tree are a security risk brian m. carlson
2022-04-07 21:53   ` Justin Steven
2022-04-07 22:10     ` brian m. carlson
2022-04-07 22:40       ` rsbecker
2022-04-08  5:54       ` Junio C Hamano
2022-04-14  0:03         ` Junio C Hamano
2022-04-14  0:04         ` Glen Choo
2022-04-13 23:44       ` Glen Choo
2022-04-13 20:37 ` Glen Choo
2022-04-13 23:36   ` Junio C Hamano
2022-04-14 16:41     ` Glen Choo
2022-04-14 17:35       ` Junio C Hamano
2022-04-14 18:19         ` Junio C Hamano
2022-04-15 21:33         ` Glen Choo
2022-04-15 22:17           ` Junio C Hamano
2022-04-16  0:52             ` Taylor Blau
2022-04-15 22:43           ` Glen Choo
2022-04-15 20:13       ` Junio C Hamano
2022-04-15 23:45         ` Glen Choo
2022-04-15 23:59           ` Glen Choo
2022-04-16  1:00           ` Taylor Blau
2022-04-16  1:18             ` Junio C Hamano
2022-04-16  1:30               ` Taylor Blau
2022-04-16  0:34 ` Glen Choo
2022-04-16  0:41 ` Glen Choo
2022-04-16  1:28   ` Taylor Blau
2022-04-21 18:25     ` Emily Shaffer
2022-04-21 18:29       ` Emily Shaffer
2022-04-21 18:47         ` Junio C Hamano
2022-04-21 18:54           ` Taylor Blau
2022-04-21 19:09       ` Taylor Blau
2022-04-21 21:01         ` Emily Shaffer
2022-04-21 21:22           ` Taylor Blau
2022-04-29 23:57     ` Glen Choo
2022-04-30  1:14       ` Taylor Blau
2022-05-02 19:39         ` Glen Choo
2022-05-02 14:05       ` Philip Oakley
2022-05-02 18:50         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=FF4B21A6-A8B6-4142-B325-B9C9193BD885@gmail.com \
    --to=johncai86@gmail.com \
    --cc=chooglen@google.com \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=justin@justinsteven.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).