All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Cai <johncai86@gmail.com>
To: Glen Choo <chooglen@google.com>
Cc: git@vger.kernel.org, Emily Shaffer <emilyshaffer@google.com>,
	justin@justinsteven.com, Taylor Blau <me@ttaylorr.com>
Subject: Re: Bare repositories in the working tree are a security risk
Date: Thu, 07 Apr 2022 14:38:41 -0400	[thread overview]
Message-ID: <FF4B21A6-A8B6-4142-B325-B9C9193BD885@gmail.com> (raw)
In-Reply-To: <kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com>

Hi Glen,

This is an interesting find. Thanks for writing it up.

On 6 Apr 2022, at 18:43, Glen Choo wrote:

> Hi all,
>
> My Google colleagues and I would like to address what we think is a security
> risk - namely that Git repositories can contain valid bare repositories in their
> working trees. This is based on an excellent article by Justin Steven [1] (CC-ed
> here).
>
> Below is a discussion of:
>
> * The risky behavior - what Git does and why it is risky
> * Possible responses to this risk
> * A proposed approach
>
> The proposed changes are nontrivial, so I’d really appreciate any feedback here.
> Unfortunately, I will be out of the office and won’t respond to emails for the
> next 7 days or so, but there will still be at least one Google Git team member
> keeping tabs on the discussion :)
>
> = TL;DR
>
> Git repositories should not be allowed to contain bare repositories in their
> working trees because:
>
> * Such bare repositories can have maliciously crafted config files that cause
>   `git` to execute arbitrary code.
> * Git itself can distribute the malicious repo via `git clone`; no need to
>   distribute repos out of band e.g. via tarballs [2].
> * Many `git` commands can be affected by malicious config files, and many users
>   have tools that will run `git` in the current directory or the subdirectories
>   of a repo. Once the malicious repo has been cloned, very little social
>   engineering is needed; the user might only need to open the repo in an editor
>   or `cd` into the correct subdirectory.
>
> = Background
>
> (This section is primarily a summary of [1]. I highly, highly recommend reading
> that as it describes the issue in much more detail and is extremely readable
> regardless of Git experience.)
>
> Certain Git configuration options are particularly enticing targets for
> attackers, e.g. core.fsmonitor can execute arbitrary commands and is invoked
> on many innocuous-looking `git` commands (like `git status`). This is even more
> risky when one considers that many tools (like shell prompts and IDEs) will run
> `git` opportunistically inside a repository - so many users won't even need to
> explicitly run `git` to be affected [3].
>
> Since config files are such an enticing target for attackers, Git intentionally
> avoids distributing config files with repos - a user shouldn't be able to `git
> clone` a repo with a config file (or really, any files inside .git). However,
> one can 'trick' Git into doing this by embedding a bare repository inside of
> another, containing repository: a repository can contain subdirectories that are
> valid bare repositories, and any `git` operations run in such a subdirectory
> will then use the bare repository’s config file instead of the "main"
> repository’s.
>
> An attack might look like this:
>
> * Attacker creates a repository where subdirectory "Documentation/" is a bare
>   repository i.e. it contains "HEAD", "refs/" and "objects/" [4]. Attacker
>   also adds "config" with a malicious setting for core.fsmonitor.
> * Attacker convinces User to read their project's documentation by `git
>   clone`-ing their repository and inspecting the "Documentation/" directory.
> * User cd-s into "Documentation/" and their shell prompt runs `git status`,

This might be obvious to most reading this, but for those trying this out at home,
you'll need to create a worktree that points to the bare repository and then run
`git status` in there.

>   executing the core.fsmonitor command defined by Attacker.

This is easily reproduceable.

>
> = What can we do about it?
>
> Each subsection is an alternative and an analysis (+/- are pros/cons).
>
> == 1. Prevent users from checking out bare repos
>
> This is similar to an existing mitigation where we prevent repository entries
> that can be confused for ".git" (like ".GIT"). but it requires checking multiple
> entries instead of a single entry. I suspect that we could accomplish this in
> one of two ways:
>
> a. Prevent bare repos from entering the index.
> b. Prevent writing bare repos to the working tree.
>
> + Relatively robust protection - because the malicious repo never enters the
>   working tree, we even protect other tools (e.g. JGit) from doing dangerous
>   things in the embedded repo (provided the checkout is done with `git`, of
>   course).
> - This breaks some 'valid' workflows (e.g. someone embedding a bare repo as a
>   more convenient alternative to submodules), but it seems reasonable to let
>   users opt out of this behavior.
> - (1a) is difficult to do in practice because many code paths add entries to
>   the index, and checking a combination of new entry and existing entries is
>   much trickier than checking just the new entry.
> - (1b) might also be difficult, though not as difficult as 1a because we
>   already have a complete list of entries we will write. I don’t think there
>   are existing facilities that do this sort of checking of multiple entries.
>
> == 2. Detect and reject bare repos using `git fsck` and `transfer.fsckObjects`.
>
> This entails checking for the markers of a bare repository (HEAD, refs/,
> objects/) in tree objects. This shares a precedent with (1), since `git fsck`
> will also detect ".GIT".
>
> + Most reputable hosting sites set `transfer.fsckObjects`, which allows them to
>   detect and prevent this kind of transmission.
> + Confers some protection to users even without them doing anything.
> + Easy to visualize and to write.
> - This won’t directly protect most users because they don’t typically set
>   `transfer.fsckObjects` (in fact, `transfer.fsckObjects` will render many
>   projects, like linux.git, uncloneable without additional config)
> - Won’t guard against malicious/poorly configured hosts.
>
> == 3. Detect that we are in an embedded bare repo and ignore the embedded bare
>  repository in favor of the containing repo.
>
> For example, if setup.c detects that we are in a bare repo, it could then walk
> up the directory hierarchy to see if there is a containing repo that tracks the
> bare one. If so, setup.c chooses to use the containing repo instead.
>
> + Extremely robust; this even protects against a checkout by an earlier Git
>   version.
> + Users who don’t use bare repos won’t even notice the difference.
> - The change in rules breaks some legitimate workflows e.g. a user with a repo
>   at HOME and bare repos underneath.
> - Potentially very expensive for bare repo users because setup.c will likely
>   walk up to the root directory; we’d be essentially *forcing* those users to
>   set GIT_CEILING_DIRECTORIES.
> - Doesn’t protect users who use other tools e.g. JGit.
>
> == 4. Educate users about this risk without making code changes.
>
> Some risks fall into this category e.g. "Avoid unarchiving repositories because
> .git/config might be poisoned." [2].
>
> + We don’t break any users.
> - Breaks the trust that users have in `git clone`.
> - IMO "Inspect every repo in its entirety before cloning it" is too much of a
>   burden to put on users.
>
> = Next steps
>
> I propose that we prevent repositories from containing bare repositories by
> doing the following (in order):
>
> * Implement (2) by adding a new fsck message "embeddedBareRepo".
>   * When this is done, hosting sites can hopefully use this capability to
>     prevent transmission, and help us understand the prevalence of such attacks.
> * Implement (1b) by teaching unpack_trees.c to check whether the tree contains
>   an entire bare repo, and die() if so. This will be guarded by a
>   defaults-to-true config value.
>   * This would only block a bare repo from being written in a single operation.
>     It wouldn’t stop a user from writing a bare repo entry-by-entry using "git
>     checkout <path>", but the amount of social engineering required probably
>     renders this attack infeasible.
>   * As I noted earlier, I foresee some difficulty actually implementing this
>     because I don’t think we have facilities for checking multiple tree entries
>     at once.
>
> I am particularly interested in hearing feedback about (1b), namely:
>
> * How to actually teach unpack_trees.c to detect bare repos.
> * Whether preventing bare repos in unpack_trees.c is a good enough mitigation
>   (e.g. Are there other code paths that should block bare repos? Should we be
>   checking the index instead of the tree?).
>
> I have a patch that does (2); I will send that out and we can leave feedback on
> that separately.
>
> = Demonstration
>
> This is based on a script by Taylor Blau (thanks!). [1] also contains a
> demonstration that runs in Docker.
>
>   #!/bin/sh
>
>   rm -fr malicious cloned &&git init malicious &&
>
>   (
>   cd malicious &&
>
>   mkdir -p bare &&
>   cd bare &&
>
>   echo 'ref: refs/heads/main' >HEAD &&
>   cat >config <<-EOF
>   [core]
>   repositoryformatversion = 0
>   filemode = true
>   bare = false
>   worktree = "worktree"
>   fsmonitor = "echo pwned >&2; false"
>   EOF
>
>   mkdir objects refs worktree &&
>   touch worktree/.gitkeep &&
>
>   git add . &&
>   git commit -m ".gitkeep" &&
>
>   cd .. &&
>   git add bare &&
>   git commit -m 'initial commit'
>   ) &&
>
>   git clone --no-local malicious cloned &&
>   cd cloned/bare &&
>   git status # pwned
>
> = Footnotes
>
> [1] https://github.com/justinsteven/advisories/blob/main/2022_git_buried_bare_repos_and_fsmonitor_various_abuses.md.
> [2] Archived repositories with malicious .git directories are a known risk. See
>  https://lore.kernel.org/git/20171003123239.lisk43a2goxtxkro@sigill.intra.peff.net/
>  for an on-list discussion. This is also described in
>  https://blog.sonarsource.com/securing-developer-tools-git-integrations
>  (referenced in [1]).
> [3] We even ship such a tool - contrib/completion/git-prompt.sh. A user can pwn
>  themselves with tab completion even if they don’t have a prompt configured
> [4] Any directory containing these entries will be recognized as a bare repo, so
>  an attacker could add arbitrary entries to the directory to obfuscate the fact
>  that it is a bare repo.
> [*] https://offensi.com/2019/12/16/4-google-cloud-shell-bugs-explained-bug-3/ is
>  similar to [1], but uses hooks instead of core.fsmonitor.

  parent reply	other threads:[~2022-04-07 18:38 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-06 22:43 Bare repositories in the working tree are a security risk Glen Choo
2022-04-06 23:22 ` [PATCH] fsck: detect bare repos in trees and warn Glen Choo
2022-04-07 12:42   ` Johannes Schindelin
2022-04-07 13:21     ` Derrick Stolee
2022-04-07 14:14       ` Ævar Arnfjörð Bjarmason
2022-04-14 20:02         ` Glen Choo
2022-04-15 12:46           ` Ævar Arnfjörð Bjarmason
2022-04-07 15:11       ` Junio C Hamano
2022-04-13 22:24       ` Glen Choo
2022-04-07 13:12   ` Ævar Arnfjörð Bjarmason
2022-04-07 15:20   ` Junio C Hamano
2022-04-07 18:38 ` John Cai [this message]
2022-04-07 21:24 ` Bare repositories in the working tree are a security risk brian m. carlson
2022-04-07 21:53   ` Justin Steven
2022-04-07 22:10     ` brian m. carlson
2022-04-07 22:40       ` rsbecker
2022-04-08  5:54       ` Junio C Hamano
2022-04-14  0:03         ` Junio C Hamano
2022-04-14  0:04         ` Glen Choo
2022-04-13 23:44       ` Glen Choo
2022-04-13 20:37 ` Glen Choo
2022-04-13 23:36   ` Junio C Hamano
2022-04-14 16:41     ` Glen Choo
2022-04-14 17:35       ` Junio C Hamano
2022-04-14 18:19         ` Junio C Hamano
2022-04-15 21:33         ` Glen Choo
2022-04-15 22:17           ` Junio C Hamano
2022-04-16  0:52             ` Taylor Blau
2022-04-15 22:43           ` Glen Choo
2022-04-15 20:13       ` Junio C Hamano
2022-04-15 23:45         ` Glen Choo
2022-04-15 23:59           ` Glen Choo
2022-04-16  1:00           ` Taylor Blau
2022-04-16  1:18             ` Junio C Hamano
2022-04-16  1:30               ` Taylor Blau
2022-04-16  0:34 ` Glen Choo
2022-04-16  0:41 ` Glen Choo
2022-04-16  1:28   ` Taylor Blau
2022-04-21 18:25     ` Emily Shaffer
2022-04-21 18:29       ` Emily Shaffer
2022-04-21 18:47         ` Junio C Hamano
2022-04-21 18:54           ` Taylor Blau
2022-04-21 19:09       ` Taylor Blau
2022-04-21 21:01         ` Emily Shaffer
2022-04-21 21:22           ` Taylor Blau
2022-04-29 23:57     ` Glen Choo
2022-04-30  1:14       ` Taylor Blau
2022-05-02 19:39         ` Glen Choo
2022-05-02 14:05       ` Philip Oakley
2022-05-02 18:50         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=FF4B21A6-A8B6-4142-B325-B9C9193BD885@gmail.com \
    --to=johncai86@gmail.com \
    --cc=chooglen@google.com \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=justin@justinsteven.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.