git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Bare repositories in the working tree are a security risk
@ 2022-04-06 22:43 Glen Choo
  2022-04-06 23:22 ` [PATCH] fsck: detect bare repos in trees and warn Glen Choo
                   ` (5 more replies)
  0 siblings, 6 replies; 50+ messages in thread
From: Glen Choo @ 2022-04-06 22:43 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer, justin, Taylor Blau


Hi all,

My Google colleagues and I would like to address what we think is a security
risk - namely that Git repositories can contain valid bare repositories in their
working trees. This is based on an excellent article by Justin Steven [1] (CC-ed
here).

Below is a discussion of:

* The risky behavior - what Git does and why it is risky
* Possible responses to this risk
* A proposed approach

The proposed changes are nontrivial, so I’d really appreciate any feedback here.
Unfortunately, I will be out of the office and won’t respond to emails for the
next 7 days or so, but there will still be at least one Google Git team member
keeping tabs on the discussion :)

= TL;DR

Git repositories should not be allowed to contain bare repositories in their
working trees because:

* Such bare repositories can have maliciously crafted config files that cause
  `git` to execute arbitrary code.
* Git itself can distribute the malicious repo via `git clone`; no need to
  distribute repos out of band e.g. via tarballs [2].
* Many `git` commands can be affected by malicious config files, and many users
  have tools that will run `git` in the current directory or the subdirectories
  of a repo. Once the malicious repo has been cloned, very little social
  engineering is needed; the user might only need to open the repo in an editor
  or `cd` into the correct subdirectory.

= Background

(This section is primarily a summary of [1]. I highly, highly recommend reading
that as it describes the issue in much more detail and is extremely readable
regardless of Git experience.)

Certain Git configuration options are particularly enticing targets for
attackers, e.g. core.fsmonitor can execute arbitrary commands and is invoked
on many innocuous-looking `git` commands (like `git status`). This is even more
risky when one considers that many tools (like shell prompts and IDEs) will run
`git` opportunistically inside a repository - so many users won't even need to
explicitly run `git` to be affected [3].

Since config files are such an enticing target for attackers, Git intentionally
avoids distributing config files with repos - a user shouldn't be able to `git
clone` a repo with a config file (or really, any files inside .git). However,
one can 'trick' Git into doing this by embedding a bare repository inside of
another, containing repository: a repository can contain subdirectories that are
valid bare repositories, and any `git` operations run in such a subdirectory
will then use the bare repository’s config file instead of the "main"
repository’s.

An attack might look like this:

* Attacker creates a repository where subdirectory "Documentation/" is a bare
  repository i.e. it contains "HEAD", "refs/" and "objects/" [4]. Attacker
  also adds "config" with a malicious setting for core.fsmonitor.
* Attacker convinces User to read their project's documentation by `git
  clone`-ing their repository and inspecting the "Documentation/" directory.
* User cd-s into "Documentation/" and their shell prompt runs `git status`,
  executing the core.fsmonitor command defined by Attacker.

= What can we do about it?

Each subsection is an alternative and an analysis (+/- are pros/cons).

== 1. Prevent users from checking out bare repos

This is similar to an existing mitigation where we prevent repository entries
that can be confused for ".git" (like ".GIT"). but it requires checking multiple
entries instead of a single entry. I suspect that we could accomplish this in
one of two ways:

a. Prevent bare repos from entering the index.
b. Prevent writing bare repos to the working tree.

+ Relatively robust protection - because the malicious repo never enters the
  working tree, we even protect other tools (e.g. JGit) from doing dangerous
  things in the embedded repo (provided the checkout is done with `git`, of
  course).
- This breaks some 'valid' workflows (e.g. someone embedding a bare repo as a
  more convenient alternative to submodules), but it seems reasonable to let
  users opt out of this behavior.
- (1a) is difficult to do in practice because many code paths add entries to
  the index, and checking a combination of new entry and existing entries is
  much trickier than checking just the new entry.
- (1b) might also be difficult, though not as difficult as 1a because we
  already have a complete list of entries we will write. I don’t think there
  are existing facilities that do this sort of checking of multiple entries.
    
== 2. Detect and reject bare repos using `git fsck` and `transfer.fsckObjects`.

This entails checking for the markers of a bare repository (HEAD, refs/,
objects/) in tree objects. This shares a precedent with (1), since `git fsck`
will also detect ".GIT".

+ Most reputable hosting sites set `transfer.fsckObjects`, which allows them to
  detect and prevent this kind of transmission.
+ Confers some protection to users even without them doing anything.
+ Easy to visualize and to write.
- This won’t directly protect most users because they don’t typically set
  `transfer.fsckObjects` (in fact, `transfer.fsckObjects` will render many
  projects, like linux.git, uncloneable without additional config)
- Won’t guard against malicious/poorly configured hosts.

== 3. Detect that we are in an embedded bare repo and ignore the embedded bare
 repository in favor of the containing repo.

For example, if setup.c detects that we are in a bare repo, it could then walk
up the directory hierarchy to see if there is a containing repo that tracks the
bare one. If so, setup.c chooses to use the containing repo instead.

+ Extremely robust; this even protects against a checkout by an earlier Git
  version.
+ Users who don’t use bare repos won’t even notice the difference.
- The change in rules breaks some legitimate workflows e.g. a user with a repo
  at HOME and bare repos underneath.
- Potentially very expensive for bare repo users because setup.c will likely
  walk up to the root directory; we’d be essentially *forcing* those users to
  set GIT_CEILING_DIRECTORIES.
- Doesn’t protect users who use other tools e.g. JGit.
    
== 4. Educate users about this risk without making code changes.

Some risks fall into this category e.g. "Avoid unarchiving repositories because
.git/config might be poisoned." [2].

+ We don’t break any users.
- Breaks the trust that users have in `git clone`.
- IMO "Inspect every repo in its entirety before cloning it" is too much of a
  burden to put on users.

= Next steps

I propose that we prevent repositories from containing bare repositories by
doing the following (in order):

* Implement (2) by adding a new fsck message "embeddedBareRepo".
  * When this is done, hosting sites can hopefully use this capability to
    prevent transmission, and help us understand the prevalence of such attacks.
* Implement (1b) by teaching unpack_trees.c to check whether the tree contains
  an entire bare repo, and die() if so. This will be guarded by a
  defaults-to-true config value.
  * This would only block a bare repo from being written in a single operation.
    It wouldn’t stop a user from writing a bare repo entry-by-entry using "git
    checkout <path>", but the amount of social engineering required probably
    renders this attack infeasible.
  * As I noted earlier, I foresee some difficulty actually implementing this
    because I don’t think we have facilities for checking multiple tree entries
    at once.

I am particularly interested in hearing feedback about (1b), namely:

* How to actually teach unpack_trees.c to detect bare repos.
* Whether preventing bare repos in unpack_trees.c is a good enough mitigation
  (e.g. Are there other code paths that should block bare repos? Should we be
  checking the index instead of the tree?).

I have a patch that does (2); I will send that out and we can leave feedback on
that separately.

= Demonstration

This is based on a script by Taylor Blau (thanks!). [1] also contains a
demonstration that runs in Docker.

  #!/bin/sh

  rm -fr malicious cloned &&git init malicious &&

  (
  cd malicious &&

  mkdir -p bare &&
  cd bare &&

  echo 'ref: refs/heads/main' >HEAD &&
  cat >config <<-EOF
  [core]
  repositoryformatversion = 0
  filemode = true
  bare = false
  worktree = "worktree"
  fsmonitor = "echo pwned >&2; false"
  EOF

  mkdir objects refs worktree &&
  touch worktree/.gitkeep &&

  git add . &&
  git commit -m ".gitkeep" &&

  cd .. &&
  git add bare &&
  git commit -m 'initial commit'
  ) &&

  git clone --no-local malicious cloned &&
  cd cloned/bare &&
  git status # pwned

= Footnotes

[1] https://github.com/justinsteven/advisories/blob/main/2022_git_buried_bare_repos_and_fsmonitor_various_abuses.md. 
[2] Archived repositories with malicious .git directories are a known risk. See
 https://lore.kernel.org/git/20171003123239.lisk43a2goxtxkro@sigill.intra.peff.net/
 for an on-list discussion. This is also described in
 https://blog.sonarsource.com/securing-developer-tools-git-integrations
 (referenced in [1]).
[3] We even ship such a tool - contrib/completion/git-prompt.sh. A user can pwn
 themselves with tab completion even if they don’t have a prompt configured
[4] Any directory containing these entries will be recognized as a bare repo, so
 an attacker could add arbitrary entries to the directory to obfuscate the fact
 that it is a bare repo.
[*] https://offensi.com/2019/12/16/4-google-cloud-shell-bugs-explained-bug-3/ is
 similar to [1], but uses hooks instead of core.fsmonitor.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH] fsck: detect bare repos in trees and warn
  2022-04-06 22:43 Bare repositories in the working tree are a security risk Glen Choo
@ 2022-04-06 23:22 ` Glen Choo
  2022-04-07 12:42   ` Johannes Schindelin
                     ` (2 more replies)
  2022-04-07 18:38 ` Bare repositories in the working tree are a security risk John Cai
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 50+ messages in thread
From: Glen Choo @ 2022-04-06 23:22 UTC (permalink / raw)
  To: git; +Cc: Glen Choo

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="kjj whjj", Size: 4768 bytes --]

Git tries not to distribute configs in-repo because they are a security
risk. However, an attacker can do exactly this if they embed a bare
repo inside of another repo.

Teach fsck to detect whether a tree object contains a bare repo (as
determined by setup.c) and warn. This will help hosting sites detect and
prevent transmission of such malicious repos.

See [1] for a more in-depth discussion, including future steps and
alternatives.

[1] https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com/

Signed-off-by: Glen Choo <chooglen@google.com>
---
 fsck.c          | 19 +++++++++++++++++++
 fsck.h          |  1 +
 setup.c         |  4 ++++
 t/t1450-fsck.sh | 36 ++++++++++++++++++++++++++++++++++++
 4 files changed, 60 insertions(+)

diff --git a/fsck.c b/fsck.c
index 3ec500d707..11c11c348a 100644
--- a/fsck.c
+++ b/fsck.c
@@ -573,6 +573,9 @@ static int fsck_tree(const struct object_id *tree_oid,
 	int has_bad_modes = 0;
 	int has_dup_entries = 0;
 	int not_properly_sorted = 0;
+	int has_head = 0;
+	int has_refs_entry = 0;
+	int has_objects_entry = 0;
 	struct tree_desc desc;
 	unsigned o_mode;
 	const char *o_name;
@@ -602,6 +605,12 @@ static int fsck_tree(const struct object_id *tree_oid,
 		has_dotdot |= !strcmp(name, "..");
 		has_dotgit |= is_hfs_dotgit(name) || is_ntfs_dotgit(name);
 		has_zero_pad |= *(char *)desc.buffer == '0';
+		has_head |= !strcasecmp(name, "HEAD")
+			&& (S_ISLNK(mode) || S_ISREG(mode));
+		has_refs_entry |= !strcasecmp(name, "refs")
+			&& (S_ISLNK(mode) || S_ISDIR(mode));
+		has_objects_entry |= !strcasecmp(name, "objects")
+			&& (S_ISLNK(mode) || S_ISDIR(mode));
 
 		if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) {
 			if (!S_ISLNK(mode))
@@ -739,6 +748,16 @@ static int fsck_tree(const struct object_id *tree_oid,
 		retval += report(options, tree_oid, OBJ_TREE,
 				 FSCK_MSG_TREE_NOT_SORTED,
 				 "not properly sorted");
+	/*
+	 * Determine if this tree looks like a bare repository according
+	 * to the rules of setup.c. If those are changed, this should be
+	 * changed too.
+	 */
+	if (has_head && has_refs_entry && has_objects_entry)
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_EMBEDDED_BARE_REPO,
+				 "contains bare repository");
+
 	return retval;
 }
 
diff --git a/fsck.h b/fsck.h
index d07f7a2459..3f0f73b0f3 100644
--- a/fsck.h
+++ b/fsck.h
@@ -65,6 +65,7 @@ enum fsck_msg_type {
 	FUNC(NULL_SHA1, WARN) \
 	FUNC(ZERO_PADDED_FILEMODE, WARN) \
 	FUNC(NUL_IN_COMMIT, WARN) \
+	FUNC(EMBEDDED_BARE_REPO, WARN) \
 	/* infos (reported as warnings, but ignored by default) */ \
 	FUNC(GITMODULES_PARSE, INFO) \
 	FUNC(GITIGNORE_SYMLINK, INFO) \
diff --git a/setup.c b/setup.c
index 04ce33cdcd..2600548776 100644
--- a/setup.c
+++ b/setup.c
@@ -336,6 +336,10 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
  *  - either a HEAD symlink or a HEAD file that is formatted as
  *    a proper "ref:", or a regular file HEAD that has a properly
  *    formatted sha1 object name.
+ *
+ * fsck.c checks for bare repositories in trees using similar rules, but a
+ * duplicated implementation. If these are changed, the correspnding code in
+ * fsck.c should change too.
  */
 int is_git_directory(const char *suspect)
 {
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index de50c0ea01..a65827bc03 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -563,6 +563,42 @@ dot-backslash-case .\\\\.GIT\\\\foobar
 dotgit-case-backslash .git\\\\foobar
 EOF
 
+test_expect_success "fsck notices bare repo" '
+(
+	mkdir -p embedded-bare-repo/bare &&
+	git init embedded-bare-repo &&
+	(
+		cd embedded-bare-repo/bare &&
+		echo content >HEAD &&
+		mkdir refs/ objects/ &&
+		echo content >refs/foo &&
+		echo content >objects/foo &&
+		git add . &&
+		git commit -m base &&
+		bad_tree=$(git rev-parse HEAD:bare) &&
+		git fsck 2>out &&
+		test_i18ngrep "warning.*tree $bad_tree: embeddedBareRepo: contains bare repository" out
+	)
+)'
+
+test_expect_success "fsck notices bare repo with odd casing" '
+(
+	mkdir -p embedded-bare-repo-case/bare &&
+	git init embedded-bare-repo-case &&
+	(
+		cd embedded-bare-repo-case/bare &&
+		echo content >heAD &&
+		mkdir Refs/ objectS/ &&
+		echo content >Refs/foo &&
+		echo content >objectS/foo &&
+		git add . &&
+		git commit -m base &&
+		bad_tree=$(git rev-parse HEAD:bare) &&
+		git fsck 2>out &&
+		test_i18ngrep "warning.*tree $bad_tree: embeddedBareRepo: contains bare repository" out
+	)
+)'
+
 test_expect_success 'fsck allows .Ňit' '
 	(
 		git init not-dotgit &&

base-commit: 805e0a68082a217f0112db9ee86a022227a9c81b
-- 
2.33.GIT


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH] fsck: detect bare repos in trees and warn
  2022-04-06 23:22 ` [PATCH] fsck: detect bare repos in trees and warn Glen Choo
@ 2022-04-07 12:42   ` Johannes Schindelin
  2022-04-07 13:21     ` Derrick Stolee
  2022-04-07 13:12   ` Ævar Arnfjörð Bjarmason
  2022-04-07 15:20   ` Junio C Hamano
  2 siblings, 1 reply; 50+ messages in thread
From: Johannes Schindelin @ 2022-04-07 12:42 UTC (permalink / raw)
  To: Glen Choo; +Cc: git

Hi Glen,

On Wed, 6 Apr 2022, Glen Choo wrote:

> Git tries not to distribute configs in-repo because they are a security
> risk. However, an attacker can do exactly this if they embed a bare
> repo inside of another repo.
>
> Teach fsck to detect whether a tree object contains a bare repo (as
> determined by setup.c) and warn. This will help hosting sites detect and
> prevent transmission of such malicious repos.
>
> See [1] for a more in-depth discussion, including future steps and
> alternatives.
>
> [1] https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com/

Out of curiosity: does this new check trigger with
https://github.com/libgit2/libgit2? AFAIR it has embedded repositories
that are used in its test suite. In other words, libgit2 has a legitimate
use case for embedded bare repositories, I believe.

Thank you,
Johannes

>
> Signed-off-by: Glen Choo <chooglen@google.com>
> ---
>  fsck.c          | 19 +++++++++++++++++++
>  fsck.h          |  1 +
>  setup.c         |  4 ++++
>  t/t1450-fsck.sh | 36 ++++++++++++++++++++++++++++++++++++
>  4 files changed, 60 insertions(+)
>
> diff --git a/fsck.c b/fsck.c
> index 3ec500d707..11c11c348a 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -573,6 +573,9 @@ static int fsck_tree(const struct object_id *tree_oid,
>  	int has_bad_modes = 0;
>  	int has_dup_entries = 0;
>  	int not_properly_sorted = 0;
> +	int has_head = 0;
> +	int has_refs_entry = 0;
> +	int has_objects_entry = 0;
>  	struct tree_desc desc;
>  	unsigned o_mode;
>  	const char *o_name;
> @@ -602,6 +605,12 @@ static int fsck_tree(const struct object_id *tree_oid,
>  		has_dotdot |= !strcmp(name, "..");
>  		has_dotgit |= is_hfs_dotgit(name) || is_ntfs_dotgit(name);
>  		has_zero_pad |= *(char *)desc.buffer == '0';
> +		has_head |= !strcasecmp(name, "HEAD")
> +			&& (S_ISLNK(mode) || S_ISREG(mode));
> +		has_refs_entry |= !strcasecmp(name, "refs")
> +			&& (S_ISLNK(mode) || S_ISDIR(mode));
> +		has_objects_entry |= !strcasecmp(name, "objects")
> +			&& (S_ISLNK(mode) || S_ISDIR(mode));
>
>  		if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) {
>  			if (!S_ISLNK(mode))
> @@ -739,6 +748,16 @@ static int fsck_tree(const struct object_id *tree_oid,
>  		retval += report(options, tree_oid, OBJ_TREE,
>  				 FSCK_MSG_TREE_NOT_SORTED,
>  				 "not properly sorted");
> +	/*
> +	 * Determine if this tree looks like a bare repository according
> +	 * to the rules of setup.c. If those are changed, this should be
> +	 * changed too.
> +	 */
> +	if (has_head && has_refs_entry && has_objects_entry)
> +		retval += report(options, tree_oid, OBJ_TREE,
> +				 FSCK_MSG_EMBEDDED_BARE_REPO,
> +				 "contains bare repository");
> +
>  	return retval;
>  }
>
> diff --git a/fsck.h b/fsck.h
> index d07f7a2459..3f0f73b0f3 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -65,6 +65,7 @@ enum fsck_msg_type {
>  	FUNC(NULL_SHA1, WARN) \
>  	FUNC(ZERO_PADDED_FILEMODE, WARN) \
>  	FUNC(NUL_IN_COMMIT, WARN) \
> +	FUNC(EMBEDDED_BARE_REPO, WARN) \
>  	/* infos (reported as warnings, but ignored by default) */ \
>  	FUNC(GITMODULES_PARSE, INFO) \
>  	FUNC(GITIGNORE_SYMLINK, INFO) \
> diff --git a/setup.c b/setup.c
> index 04ce33cdcd..2600548776 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -336,6 +336,10 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
>   *  - either a HEAD symlink or a HEAD file that is formatted as
>   *    a proper "ref:", or a regular file HEAD that has a properly
>   *    formatted sha1 object name.
> + *
> + * fsck.c checks for bare repositories in trees using similar rules, but a
> + * duplicated implementation. If these are changed, the correspnding code in
> + * fsck.c should change too.
>   */
>  int is_git_directory(const char *suspect)
>  {
> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
> index de50c0ea01..a65827bc03 100755
> --- a/t/t1450-fsck.sh
> +++ b/t/t1450-fsck.sh
> @@ -563,6 +563,42 @@ dot-backslash-case .\\\\.GIT\\\\foobar
>  dotgit-case-backslash .git\\\\foobar
>  EOF
>
> +test_expect_success "fsck notices bare repo" '
> +(
> +	mkdir -p embedded-bare-repo/bare &&
> +	git init embedded-bare-repo &&
> +	(
> +		cd embedded-bare-repo/bare &&
> +		echo content >HEAD &&
> +		mkdir refs/ objects/ &&
> +		echo content >refs/foo &&
> +		echo content >objects/foo &&
> +		git add . &&
> +		git commit -m base &&
> +		bad_tree=$(git rev-parse HEAD:bare) &&
> +		git fsck 2>out &&
> +		test_i18ngrep "warning.*tree $bad_tree: embeddedBareRepo: contains bare repository" out
> +	)
> +)'
> +
> +test_expect_success "fsck notices bare repo with odd casing" '
> +(
> +	mkdir -p embedded-bare-repo-case/bare &&
> +	git init embedded-bare-repo-case &&
> +	(
> +		cd embedded-bare-repo-case/bare &&
> +		echo content >heAD &&
> +		mkdir Refs/ objectS/ &&
> +		echo content >Refs/foo &&
> +		echo content >objectS/foo &&
> +		git add . &&
> +		git commit -m base &&
> +		bad_tree=$(git rev-parse HEAD:bare) &&
> +		git fsck 2>out &&
> +		test_i18ngrep "warning.*tree $bad_tree: embeddedBareRepo: contains bare repository" out
> +	)
> +)'
> +
>  test_expect_success 'fsck allows .??it' '
>  	(
>  		git init not-dotgit &&
>
> base-commit: 805e0a68082a217f0112db9ee86a022227a9c81b
> --
> 2.33.GIT
>
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] fsck: detect bare repos in trees and warn
  2022-04-06 23:22 ` [PATCH] fsck: detect bare repos in trees and warn Glen Choo
  2022-04-07 12:42   ` Johannes Schindelin
@ 2022-04-07 13:12   ` Ævar Arnfjörð Bjarmason
  2022-04-07 15:20   ` Junio C Hamano
  2 siblings, 0 replies; 50+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-04-07 13:12 UTC (permalink / raw)
  To: Glen Choo; +Cc: git


On Wed, Apr 06 2022, Glen Choo wrote:

> @@ -602,6 +605,12 @@ static int fsck_tree(const struct object_id *tree_oid,
>  		has_dotdot |= !strcmp(name, "..");
>  		has_dotgit |= is_hfs_dotgit(name) || is_ntfs_dotgit(name);
>  		has_zero_pad |= *(char *)desc.buffer == '0';
> +		has_head |= !strcasecmp(name, "HEAD")
> +			&& (S_ISLNK(mode) || S_ISREG(mode));
> +		has_refs_entry |= !strcasecmp(name, "refs")
> +			&& (S_ISLNK(mode) || S_ISDIR(mode));
> +		has_objects_entry |= !strcasecmp(name, "objects")
> +			&& (S_ISLNK(mode) || S_ISDIR(mode));

Doesn't this code need to use is_hfs_dot_str() instead of strcasecmp()
like the other similar checks?

> @@ -336,6 +336,10 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
>   *  - either a HEAD symlink or a HEAD file that is formatted as
>   *    a proper "ref:", or a regular file HEAD that has a properly
>   *    formatted sha1 object name.
> + *
> + * fsck.c checks for bare repositories in trees using similar rules, but a
> + * duplicated implementation. If these are changed, the correspnding code in
> + * fsck.c should change too.
>   */

Probably took much hassle to factor these so it can be re-used. Typo:
correspnding.

> +		test_i18ngrep "warning.*tree $bad_tree: embeddedBareRepo: contains bare repository" out

s/test_i18ngrep/grep/

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] fsck: detect bare repos in trees and warn
  2022-04-07 12:42   ` Johannes Schindelin
@ 2022-04-07 13:21     ` Derrick Stolee
  2022-04-07 14:14       ` Ævar Arnfjörð Bjarmason
                         ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: Derrick Stolee @ 2022-04-07 13:21 UTC (permalink / raw)
  To: Johannes Schindelin, Glen Choo; +Cc: git

On 4/7/2022 8:42 AM, Johannes Schindelin wrote:
> Hi Glen,
> 
> On Wed, 6 Apr 2022, Glen Choo wrote:
> 
>> Git tries not to distribute configs in-repo because they are a security
>> risk. However, an attacker can do exactly this if they embed a bare
>> repo inside of another repo.
>>
>> Teach fsck to detect whether a tree object contains a bare repo (as
>> determined by setup.c) and warn. This will help hosting sites detect and
>> prevent transmission of such malicious repos.
>>
>> See [1] for a more in-depth discussion, including future steps and
>> alternatives.
>>
>> [1] https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com/
> 
> Out of curiosity: does this new check trigger with
> https://github.com/libgit2/libgit2? AFAIR it has embedded repositories
> that are used in its test suite. In other words, libgit2 has a legitimate
> use case for embedded bare repositories, I believe.

It is definitely good to keep in mind that other repositories have
included bare repositories for convenience. I'm not sure that the behavior
of some good actors should outweigh the benefits of protecting against
this attack vector.

The trouble here is: how could the libgit2 repo change their project to
not trigger this warning? These bare repos are in their history forever if
they don't do go through significant work and pain to remove them from
their history. We would want to have a way to make the warnings less
severe for special cases like this.

Simultaneously, we wouldn't want to bless all _forks_ of libgit2.

Finally, the real thing we want to avoid is having the Git client write
these trees to disk, for example during a 'git checkout', unless the user
gives an override. (We would want 'git bisect' to still work on the
libgit2 repo, for example.)

A more complete protection here would be:

 1. Warn when finding a bare repo as a tree (this patch).

 2. Suppress warnings on trusted repos, scoped to a specific set of known
    trees _or_ based on some set of known commits (in case the known trees
    are too large).

 3. Prevent writing a bare repo to the worktree, unless the user provided
    an opt-in to that behavior.

Since your patch is moving in the right direction here, I don't think
steps (2) and (3) are required to move forward with your patch. However,
it is a good opportunity to discuss the full repercussions of this issue.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] fsck: detect bare repos in trees and warn
  2022-04-07 13:21     ` Derrick Stolee
@ 2022-04-07 14:14       ` Ævar Arnfjörð Bjarmason
  2022-04-14 20:02         ` Glen Choo
  2022-04-07 15:11       ` Junio C Hamano
  2022-04-13 22:24       ` Glen Choo
  2 siblings, 1 reply; 50+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-04-07 14:14 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Johannes Schindelin, Glen Choo, git


On Thu, Apr 07 2022, Derrick Stolee wrote:

> A more complete protection here would be:
>
>  1. Warn when finding a bare repo as a tree (this patch).
>
>  2. Suppress warnings on trusted repos, scoped to a specific set of known
>     trees _or_ based on some set of known commits (in case the known trees
>     are too large).
>
>  3. Prevent writing a bare repo to the worktree, unless the user provided
>     an opt-in to that behavior.
>
> Since your patch is moving in the right direction here, I don't think
> steps (2) and (3) are required to move forward with your patch. However,
> it is a good opportunity to discuss the full repercussions of this issue.

Isn't a gentler solution here to:

 1. In setup.c, we detect a repo
 2. Walk up a directory
 3. Do we find a repo?
 4. Does that repo "contain" the first one?
    If yes: die on setup
    If no: it's OK

It also seems to me that there's pretty much perfect overlap between
this and the long-discussed topic of marking a submodule with config
v.s. detecting it on the fly.

Since we're already discussing breaking long-standing repos in the wild
here I think it's good to take a step back and think of a potential more
narrow solution.

There's nothing wrong with having tracked content per-se, it's just that
we ourselves have other code that'll shoot itself in the foot in certain
scenarios, like this one.

But for a newer git that needs to run this fsck check it'll already need
to detect that something "looks like a repo", but if it can do that and
setup.c can walk up from the parent directory & find a repository we can
combine the two.

Of course we'll probably still want an opt-in fsck check for hosting
providers who'll want to protect older clients, but as long as the two
are separated that'll only need to last as long as such clients are
potentially there in the wild.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] fsck: detect bare repos in trees and warn
  2022-04-07 13:21     ` Derrick Stolee
  2022-04-07 14:14       ` Ævar Arnfjörð Bjarmason
@ 2022-04-07 15:11       ` Junio C Hamano
  2022-04-13 22:24       ` Glen Choo
  2 siblings, 0 replies; 50+ messages in thread
From: Junio C Hamano @ 2022-04-07 15:11 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Johannes Schindelin, Glen Choo, git

Derrick Stolee <derrickstolee@github.com> writes:

> It is definitely good to keep in mind that other repositories have
> included bare repositories for convenience. I'm not sure that the behavior
> of some good actors should outweigh the benefits of protecting against
> this attack vector.

Good line of thinking.

>  2. Suppress warnings on trusted repos, scoped to a specific set of known
>     trees _or_ based on some set of known commits (in case the known trees
>     are too large).

Is "It is OK to check out an embedded repository from this commit or
any of its ancestors" the kind of suppression you meant by "known
commits"?

>  3. Prevent writing a bare repo to the worktree, unless the user provided
>     an opt-in to that behavior.
>
> Since your patch is moving in the right direction here, I don't think
> steps (2) and (3) are required to move forward with your patch. However,
> it is a good opportunity to discuss the full repercussions of this issue.

We can definitely start without (3).  Unleashing (1) before (2) is
ready would mean folks cannot clone projects like libgit2 until
later, which takes us back to the first point you made above.

Those who use embedded bare repositories as test vector can easily
switch to storing an equivalent of "a tarball that is expanded while
building and/or testing", and as long as the user trusts the project
enough to run its build procedure or tests, we are not adding much
to the attack surface, I would guess.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] fsck: detect bare repos in trees and warn
  2022-04-06 23:22 ` [PATCH] fsck: detect bare repos in trees and warn Glen Choo
  2022-04-07 12:42   ` Johannes Schindelin
  2022-04-07 13:12   ` Ævar Arnfjörð Bjarmason
@ 2022-04-07 15:20   ` Junio C Hamano
  2 siblings, 0 replies; 50+ messages in thread
From: Junio C Hamano @ 2022-04-07 15:20 UTC (permalink / raw)
  To: Glen Choo; +Cc: git

Glen Choo <chooglen@google.com> writes:

> @@ -602,6 +605,12 @@ static int fsck_tree(const struct object_id *tree_oid,
>  		has_dotdot |= !strcmp(name, "..");
>  		has_dotgit |= is_hfs_dotgit(name) || is_ntfs_dotgit(name);
>  		has_zero_pad |= *(char *)desc.buffer == '0';
> +		has_head |= !strcasecmp(name, "HEAD")
> +			&& (S_ISLNK(mode) || S_ISREG(mode));
> +		has_refs_entry |= !strcasecmp(name, "refs")
> +			&& (S_ISLNK(mode) || S_ISDIR(mode));

Sorry, I know I am to blame, but I think it is wrong to insist
"refs" (and others) to be a directory.  setup.c::is_git_directory()
only checks with access(X_OK).  Also, if I am not mistaken, we plan
to take advantage of it and may make refs a regular file, for
example, to signal that the ref-files backend is unwelcome in the
repository.  HEAD can be a symbolic link, not necessarily a regular
file.  We do not create "refs" and "objects" as symbolic links
ourselves, but it is good to see that the code protects us against
such a directory with these entries being one.

> +		has_objects_entry |= !strcasecmp(name, "objects")
> +			&& (S_ISLNK(mode) || S_ISDIR(mode));
>  
>  		if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) {
>  			if (!S_ISLNK(mode))
> @@ -739,6 +748,16 @@ static int fsck_tree(const struct object_id *tree_oid,
>  		retval += report(options, tree_oid, OBJ_TREE,
>  				 FSCK_MSG_TREE_NOT_SORTED,
>  				 "not properly sorted");
> +	/*
> +	 * Determine if this tree looks like a bare repository according
> +	 * to the rules of setup.c. If those are changed, this should be
> +	 * changed too.
> +	 */
> +	if (has_head && has_refs_entry && has_objects_entry)
> +		retval += report(options, tree_oid, OBJ_TREE,
> +				 FSCK_MSG_EMBEDDED_BARE_REPO,
> +				 "contains bare repository");
> +
>  	return retval;
>  }

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-06 22:43 Bare repositories in the working tree are a security risk Glen Choo
  2022-04-06 23:22 ` [PATCH] fsck: detect bare repos in trees and warn Glen Choo
@ 2022-04-07 18:38 ` John Cai
  2022-04-07 21:24 ` brian m. carlson
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 50+ messages in thread
From: John Cai @ 2022-04-07 18:38 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Emily Shaffer, justin, Taylor Blau

Hi Glen,

This is an interesting find. Thanks for writing it up.

On 6 Apr 2022, at 18:43, Glen Choo wrote:

> Hi all,
>
> My Google colleagues and I would like to address what we think is a security
> risk - namely that Git repositories can contain valid bare repositories in their
> working trees. This is based on an excellent article by Justin Steven [1] (CC-ed
> here).
>
> Below is a discussion of:
>
> * The risky behavior - what Git does and why it is risky
> * Possible responses to this risk
> * A proposed approach
>
> The proposed changes are nontrivial, so I’d really appreciate any feedback here.
> Unfortunately, I will be out of the office and won’t respond to emails for the
> next 7 days or so, but there will still be at least one Google Git team member
> keeping tabs on the discussion :)
>
> = TL;DR
>
> Git repositories should not be allowed to contain bare repositories in their
> working trees because:
>
> * Such bare repositories can have maliciously crafted config files that cause
>   `git` to execute arbitrary code.
> * Git itself can distribute the malicious repo via `git clone`; no need to
>   distribute repos out of band e.g. via tarballs [2].
> * Many `git` commands can be affected by malicious config files, and many users
>   have tools that will run `git` in the current directory or the subdirectories
>   of a repo. Once the malicious repo has been cloned, very little social
>   engineering is needed; the user might only need to open the repo in an editor
>   or `cd` into the correct subdirectory.
>
> = Background
>
> (This section is primarily a summary of [1]. I highly, highly recommend reading
> that as it describes the issue in much more detail and is extremely readable
> regardless of Git experience.)
>
> Certain Git configuration options are particularly enticing targets for
> attackers, e.g. core.fsmonitor can execute arbitrary commands and is invoked
> on many innocuous-looking `git` commands (like `git status`). This is even more
> risky when one considers that many tools (like shell prompts and IDEs) will run
> `git` opportunistically inside a repository - so many users won't even need to
> explicitly run `git` to be affected [3].
>
> Since config files are such an enticing target for attackers, Git intentionally
> avoids distributing config files with repos - a user shouldn't be able to `git
> clone` a repo with a config file (or really, any files inside .git). However,
> one can 'trick' Git into doing this by embedding a bare repository inside of
> another, containing repository: a repository can contain subdirectories that are
> valid bare repositories, and any `git` operations run in such a subdirectory
> will then use the bare repository’s config file instead of the "main"
> repository’s.
>
> An attack might look like this:
>
> * Attacker creates a repository where subdirectory "Documentation/" is a bare
>   repository i.e. it contains "HEAD", "refs/" and "objects/" [4]. Attacker
>   also adds "config" with a malicious setting for core.fsmonitor.
> * Attacker convinces User to read their project's documentation by `git
>   clone`-ing their repository and inspecting the "Documentation/" directory.
> * User cd-s into "Documentation/" and their shell prompt runs `git status`,

This might be obvious to most reading this, but for those trying this out at home,
you'll need to create a worktree that points to the bare repository and then run
`git status` in there.

>   executing the core.fsmonitor command defined by Attacker.

This is easily reproduceable.

>
> = What can we do about it?
>
> Each subsection is an alternative and an analysis (+/- are pros/cons).
>
> == 1. Prevent users from checking out bare repos
>
> This is similar to an existing mitigation where we prevent repository entries
> that can be confused for ".git" (like ".GIT"). but it requires checking multiple
> entries instead of a single entry. I suspect that we could accomplish this in
> one of two ways:
>
> a. Prevent bare repos from entering the index.
> b. Prevent writing bare repos to the working tree.
>
> + Relatively robust protection - because the malicious repo never enters the
>   working tree, we even protect other tools (e.g. JGit) from doing dangerous
>   things in the embedded repo (provided the checkout is done with `git`, of
>   course).
> - This breaks some 'valid' workflows (e.g. someone embedding a bare repo as a
>   more convenient alternative to submodules), but it seems reasonable to let
>   users opt out of this behavior.
> - (1a) is difficult to do in practice because many code paths add entries to
>   the index, and checking a combination of new entry and existing entries is
>   much trickier than checking just the new entry.
> - (1b) might also be difficult, though not as difficult as 1a because we
>   already have a complete list of entries we will write. I don’t think there
>   are existing facilities that do this sort of checking of multiple entries.
>
> == 2. Detect and reject bare repos using `git fsck` and `transfer.fsckObjects`.
>
> This entails checking for the markers of a bare repository (HEAD, refs/,
> objects/) in tree objects. This shares a precedent with (1), since `git fsck`
> will also detect ".GIT".
>
> + Most reputable hosting sites set `transfer.fsckObjects`, which allows them to
>   detect and prevent this kind of transmission.
> + Confers some protection to users even without them doing anything.
> + Easy to visualize and to write.
> - This won’t directly protect most users because they don’t typically set
>   `transfer.fsckObjects` (in fact, `transfer.fsckObjects` will render many
>   projects, like linux.git, uncloneable without additional config)
> - Won’t guard against malicious/poorly configured hosts.
>
> == 3. Detect that we are in an embedded bare repo and ignore the embedded bare
>  repository in favor of the containing repo.
>
> For example, if setup.c detects that we are in a bare repo, it could then walk
> up the directory hierarchy to see if there is a containing repo that tracks the
> bare one. If so, setup.c chooses to use the containing repo instead.
>
> + Extremely robust; this even protects against a checkout by an earlier Git
>   version.
> + Users who don’t use bare repos won’t even notice the difference.
> - The change in rules breaks some legitimate workflows e.g. a user with a repo
>   at HOME and bare repos underneath.
> - Potentially very expensive for bare repo users because setup.c will likely
>   walk up to the root directory; we’d be essentially *forcing* those users to
>   set GIT_CEILING_DIRECTORIES.
> - Doesn’t protect users who use other tools e.g. JGit.
>
> == 4. Educate users about this risk without making code changes.
>
> Some risks fall into this category e.g. "Avoid unarchiving repositories because
> .git/config might be poisoned." [2].
>
> + We don’t break any users.
> - Breaks the trust that users have in `git clone`.
> - IMO "Inspect every repo in its entirety before cloning it" is too much of a
>   burden to put on users.
>
> = Next steps
>
> I propose that we prevent repositories from containing bare repositories by
> doing the following (in order):
>
> * Implement (2) by adding a new fsck message "embeddedBareRepo".
>   * When this is done, hosting sites can hopefully use this capability to
>     prevent transmission, and help us understand the prevalence of such attacks.
> * Implement (1b) by teaching unpack_trees.c to check whether the tree contains
>   an entire bare repo, and die() if so. This will be guarded by a
>   defaults-to-true config value.
>   * This would only block a bare repo from being written in a single operation.
>     It wouldn’t stop a user from writing a bare repo entry-by-entry using "git
>     checkout <path>", but the amount of social engineering required probably
>     renders this attack infeasible.
>   * As I noted earlier, I foresee some difficulty actually implementing this
>     because I don’t think we have facilities for checking multiple tree entries
>     at once.
>
> I am particularly interested in hearing feedback about (1b), namely:
>
> * How to actually teach unpack_trees.c to detect bare repos.
> * Whether preventing bare repos in unpack_trees.c is a good enough mitigation
>   (e.g. Are there other code paths that should block bare repos? Should we be
>   checking the index instead of the tree?).
>
> I have a patch that does (2); I will send that out and we can leave feedback on
> that separately.
>
> = Demonstration
>
> This is based on a script by Taylor Blau (thanks!). [1] also contains a
> demonstration that runs in Docker.
>
>   #!/bin/sh
>
>   rm -fr malicious cloned &&git init malicious &&
>
>   (
>   cd malicious &&
>
>   mkdir -p bare &&
>   cd bare &&
>
>   echo 'ref: refs/heads/main' >HEAD &&
>   cat >config <<-EOF
>   [core]
>   repositoryformatversion = 0
>   filemode = true
>   bare = false
>   worktree = "worktree"
>   fsmonitor = "echo pwned >&2; false"
>   EOF
>
>   mkdir objects refs worktree &&
>   touch worktree/.gitkeep &&
>
>   git add . &&
>   git commit -m ".gitkeep" &&
>
>   cd .. &&
>   git add bare &&
>   git commit -m 'initial commit'
>   ) &&
>
>   git clone --no-local malicious cloned &&
>   cd cloned/bare &&
>   git status # pwned
>
> = Footnotes
>
> [1] https://github.com/justinsteven/advisories/blob/main/2022_git_buried_bare_repos_and_fsmonitor_various_abuses.md.
> [2] Archived repositories with malicious .git directories are a known risk. See
>  https://lore.kernel.org/git/20171003123239.lisk43a2goxtxkro@sigill.intra.peff.net/
>  for an on-list discussion. This is also described in
>  https://blog.sonarsource.com/securing-developer-tools-git-integrations
>  (referenced in [1]).
> [3] We even ship such a tool - contrib/completion/git-prompt.sh. A user can pwn
>  themselves with tab completion even if they don’t have a prompt configured
> [4] Any directory containing these entries will be recognized as a bare repo, so
>  an attacker could add arbitrary entries to the directory to obfuscate the fact
>  that it is a bare repo.
> [*] https://offensi.com/2019/12/16/4-google-cloud-shell-bugs-explained-bug-3/ is
>  similar to [1], but uses hooks instead of core.fsmonitor.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-06 22:43 Bare repositories in the working tree are a security risk Glen Choo
  2022-04-06 23:22 ` [PATCH] fsck: detect bare repos in trees and warn Glen Choo
  2022-04-07 18:38 ` Bare repositories in the working tree are a security risk John Cai
@ 2022-04-07 21:24 ` brian m. carlson
  2022-04-07 21:53   ` Justin Steven
  2022-04-13 20:37 ` Glen Choo
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 50+ messages in thread
From: brian m. carlson @ 2022-04-07 21:24 UTC (permalink / raw)
  To: Glen Choo; +Cc: git, Emily Shaffer, justin, Taylor Blau

[-- Attachment #1: Type: text/plain, Size: 2415 bytes --]

On 2022-04-06 at 22:43:08, Glen Choo wrote:
> An attack might look like this:
> 
> * Attacker creates a repository where subdirectory "Documentation/" is a bare
>   repository i.e. it contains "HEAD", "refs/" and "objects/" [4]. Attacker
>   also adds "config" with a malicious setting for core.fsmonitor.
> * Attacker convinces User to read their project's documentation by `git
>   clone`-ing their repository and inspecting the "Documentation/" directory.
> * User cd-s into "Documentation/" and their shell prompt runs `git status`,
>   executing the core.fsmonitor command defined by Attacker.

As mentioned elsewhere, git status doesn't work without a working tree.

> = Next steps
> 
> I propose that we prevent repositories from containing bare repositories by
> doing the following (in order):
> 
> * Implement (2) by adding a new fsck message "embeddedBareRepo".
>   * When this is done, hosting sites can hopefully use this capability to
>     prevent transmission, and help us understand the prevalence of such attacks.
> * Implement (1b) by teaching unpack_trees.c to check whether the tree contains
>   an entire bare repo, and die() if so. This will be guarded by a
>   defaults-to-true config value.
>   * This would only block a bare repo from being written in a single operation.
>     It wouldn’t stop a user from writing a bare repo entry-by-entry using "git
>     checkout <path>", but the amount of social engineering required probably
>     renders this attack infeasible.
>   * As I noted earlier, I foresee some difficulty actually implementing this
>     because I don’t think we have facilities for checking multiple tree entries
>     at once.

I'm aware of repositories that happen to break in this case.  It's not
uncommon to embed bare repositories when working with tools that involve
Git, and this will definitely break them.  git fast-import isn't always
a valid option because the test data may involve specific structures or
tooling that can't be replicated that way, or it involves things like
commit signatures which aren't round-tripped.

Instead, I'd rather see us avoid executing any program from the config
or any hooks in a bare repository without a working tree (except for
pushes).  I think that would avoid breaking things while still improving
security.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-07 21:24 ` brian m. carlson
@ 2022-04-07 21:53   ` Justin Steven
  2022-04-07 22:10     ` brian m. carlson
  0 siblings, 1 reply; 50+ messages in thread
From: Justin Steven @ 2022-04-07 21:53 UTC (permalink / raw)
  To: brian m. carlson, Glen Choo, git, Emily Shaffer, justin, Taylor Blau

Hi all,

I'm the author of one of the articles linked in Glen's mail. Thank you
Glen for summarising the problem beautifully and pushing this forward.

Brian said:
> As mentioned elsewhere, git status doesn't work without a working tree.

This is correct. However, it is possible to embed a bare repo that has
its own core.worktree which points to a directory within the
containing repo, satisfying the requirement of having a working tree.
This is covered in the article [1] and looks to be accounted for in
Taylor's reproducer script which admittedly I haven't run.

> Instead, I'd rather see us avoid executing any program from the config
> or any hooks in a bare repository without a working tree (except for
> pushes).  I think that would avoid breaking things while still improving
> security.

Due to the fact that the embedded bare repo can be made to have a
working tree, this won't be an effective fix.

I'm not dismissing your examples of uses of Git which would break
under Glen's suggestions. Thank you for describing these.

[1] https://github.com/justinsteven/advisories/blob/main/2022_git_buried_bare_repos_and_fsmonitor_various_abuses.md#poc---regular-vs-bare-repos-and-adding-a-corefsmonitor-payload-to-a-bare-repo

--
Justin

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-07 21:53   ` Justin Steven
@ 2022-04-07 22:10     ` brian m. carlson
  2022-04-07 22:40       ` rsbecker
                         ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: brian m. carlson @ 2022-04-07 22:10 UTC (permalink / raw)
  To: Justin Steven; +Cc: Glen Choo, git, Emily Shaffer, Taylor Blau

[-- Attachment #1: Type: text/plain, Size: 1902 bytes --]

On 2022-04-07 at 21:53:26, Justin Steven wrote:
> Hi all,
> 
> I'm the author of one of the articles linked in Glen's mail. Thank you
> Glen for summarising the problem beautifully and pushing this forward.
> 
> Brian said:
> > As mentioned elsewhere, git status doesn't work without a working tree.
> 
> This is correct. However, it is possible to embed a bare repo that has
> its own core.worktree which points to a directory within the
> containing repo, satisfying the requirement of having a working tree.
> This is covered in the article [1] and looks to be accounted for in
> Taylor's reproducer script which admittedly I haven't run.
> 
> > Instead, I'd rather see us avoid executing any program from the config
> > or any hooks in a bare repository without a working tree (except for
> > pushes).  I think that would avoid breaking things while still improving
> > security.
> 
> Due to the fact that the embedded bare repo can be made to have a
> working tree, this won't be an effective fix.

Then we'd probably be better off just walking up the entire hierarchy
and excluding worktrees from embedded bare repositories, or otherwise
restricting the config we read.  That will probably mean we'll need to
walk the entire directory hierarchy to see if it's embedded (or at least
to the root of the device) in such a case, but that should be relatively
uncommon.

I'd definitely like to see us make a security improvement here, but I
also would like to avoid us breaking a lot of repositories, especially
since we lack alternatives.

If git fast-import could 100% correctly round-trip all commits and
repositories, I would be much more open to blocking this in fsck after a
deprecation period, but as it stands that's not possible.  Perhaps
improving that would be a suitable way forward.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: Bare repositories in the working tree are a security risk
  2022-04-07 22:10     ` brian m. carlson
@ 2022-04-07 22:40       ` rsbecker
  2022-04-08  5:54       ` Junio C Hamano
  2022-04-13 23:44       ` Glen Choo
  2 siblings, 0 replies; 50+ messages in thread
From: rsbecker @ 2022-04-07 22:40 UTC (permalink / raw)
  To: 'brian m. carlson', 'Justin Steven'
  Cc: 'Glen Choo', git, 'Emily Shaffer', 'Taylor Blau'

On April 7, 2022 6:10 PM, brian m. carlson wrote:
>On 2022-04-07 at 21:53:26, Justin Steven wrote:
>> Hi all,
>>
>> I'm the author of one of the articles linked in Glen's mail. Thank you
>> Glen for summarising the problem beautifully and pushing this forward.
>>
>> Brian said:
>> > As mentioned elsewhere, git status doesn't work without a working tree.
>>
>> This is correct. However, it is possible to embed a bare repo that has
>> its own core.worktree which points to a directory within the
>> containing repo, satisfying the requirement of having a working tree.
>> This is covered in the article [1] and looks to be accounted for in
>> Taylor's reproducer script which admittedly I haven't run.
>>
>> > Instead, I'd rather see us avoid executing any program from the
>> > config or any hooks in a bare repository without a working tree
>> > (except for pushes).  I think that would avoid breaking things while
>> > still improving security.
>>
>> Due to the fact that the embedded bare repo can be made to have a
>> working tree, this won't be an effective fix.
>
>Then we'd probably be better off just walking up the entire hierarchy and
>excluding worktrees from embedded bare repositories, or otherwise restricting
>the config we read.  That will probably mean we'll need to walk the entire
>directory hierarchy to see if it's embedded (or at least to the root of the device) in
>such a case, but that should be relatively uncommon.
>
>I'd definitely like to see us make a security improvement here, but I also would like
>to avoid us breaking a lot of repositories, especially since we lack alternatives.
>
>If git fast-import could 100% correctly round-trip all commits and repositories, I
>would be much more open to blocking this in fsck after a deprecation period, but
>as it stands that's not possible.  Perhaps improving that would be a suitable way
>forward.

One option relating to enable/disable this is to set up a config option that, by default is false, to allow embedded bare repositories. At least with enough warning that this option is required, it might be acceptable. I would prefer never to receive a bare repo through any means (including through a more worrying submodule). From an attack vector standpoint, I would be more concerned about this in an automation setting, say GitHub workflows or Jenkins GitSCM. At least with GitHub workflows, this is "somewhat" contained within VMs under GitHub's control - although not entirely. Jenkins is probably more vulnerable as the clones done through that path do not get the same scrutiny, although in my world, I use a dedicated non-root UID and sandbox the whole thing. This topic makes me nervous and wonder whether we should be self-reporting a CVE.

Shuddering,
Randall


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-07 22:10     ` brian m. carlson
  2022-04-07 22:40       ` rsbecker
@ 2022-04-08  5:54       ` Junio C Hamano
  2022-04-14  0:03         ` Junio C Hamano
  2022-04-14  0:04         ` Glen Choo
  2022-04-13 23:44       ` Glen Choo
  2 siblings, 2 replies; 50+ messages in thread
From: Junio C Hamano @ 2022-04-08  5:54 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Justin Steven, Glen Choo, git, Emily Shaffer, Taylor Blau

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> Then we'd probably be better off just walking up the entire hierarchy
> and excluding worktrees from embedded bare repositories, or otherwise
> restricting the config we read.  That will probably mean we'll need to
> walk the entire directory hierarchy to see if it's embedded (or at least
> to the root of the device) in such a case, but that should be relatively
> uncommon.

I find this direction to notice iffy "user data" and disable it
quite reasonable.  A configuration file can define alias, and it
would be yet another attack vector to overload common ones users
likely use ("git co", "git st", ...).  There may also be a hooks/
directory.

I wonder if it is an acceptable defence to deliberately "corrupt"
such user data when we notice that they smell fishy, perhaps by
renaming "config" and "hooks", when they are found next to "HEAD"
and "objects" and "refs", to "config.disabled" and "hooks.disabled"?
I am just thinking aloud without assessing if it is sensible or
feasible at ths point.

I am not sure if "walking the hierarchy up" is an effective enough
defence offhand.  Do we consider it too much social engineering to
make the user follow cloning instruction of the malicious project to
prepare a repository, with core.worktree set to elsewhere, and pull
into it?  Since walking up from any subdirectory of the directory
the core.worktree points at will never see a directory, with ".git/"
subdirectory that is the malicious project, "git status" run in the
"embedded" place in such a scenario will not notice that it is a
repository lookalike that came from outside.  But we can write it
off as an approach needing too much  social engineering, that's OK.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-06 22:43 Bare repositories in the working tree are a security risk Glen Choo
                   ` (2 preceding siblings ...)
  2022-04-07 21:24 ` brian m. carlson
@ 2022-04-13 20:37 ` Glen Choo
  2022-04-13 23:36   ` Junio C Hamano
  2022-04-16  0:34 ` Glen Choo
  2022-04-16  0:41 ` Glen Choo
  5 siblings, 1 reply; 50+ messages in thread
From: Glen Choo @ 2022-04-13 20:37 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer, justin, Taylor Blau, martinvonz, brian m. carlson

Thanks for the thoughtful replies, everyone. I'm still catching up with
the discussion but I thought I'd share a suggestion/observation from
a colleague (cc-ed, thanks!) who was following the discussion
separately.

Glen Choo <chooglen@google.com> writes:

> = What can we do about it?
>
> Each subsection is an alternative and an analysis (+/- are pros/cons).
>
> == 1. Prevent users from checking out bare repos
> ...
> == 2. Detect and reject bare repos using `git fsck` and `transfer.fsckObjects`.
> ...
> == 3. Detect that we are in an embedded bare repo and ignore the embedded bare
>  repository in favor of the containing repo.
> ...     
> == 4. Educate users about this risk without making code changes.

Martin observed that, viability aside, there's another approach that I
haven't discussed:

  == 5 Disable bare repo discovery

  We could introduce a config value that disables bare repo discovery
  altogether. This would only disable _discovery_; a user can still use
  the bare repo by specifying the gitdir (e.g. via `--git-dir=.` or
  GIT_DIR).

  + Extremely robust (for the majority of users who don't need bare
    repos at least).
  + Easy to understand.
  - We need to read config _before_ discovering the gitdir; although Git
    _can_ do this (e.g. via read_early_config()), I don't think there is
    precedent for doing this during repository setup.
  - Making this behavior opt-out would break every bare repo user
    (they'd need to either start passing --git-dir or change their
    config).
  - Making this behavior opt-in won't help the users who need it the
    most.

I'm not sure if specifically "disable bare repo discovery for all users
by default" is viable, but the line of thinking seems like it might
yield some good ideas.

Thoughts?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] fsck: detect bare repos in trees and warn
  2022-04-07 13:21     ` Derrick Stolee
  2022-04-07 14:14       ` Ævar Arnfjörð Bjarmason
  2022-04-07 15:11       ` Junio C Hamano
@ 2022-04-13 22:24       ` Glen Choo
  2 siblings, 0 replies; 50+ messages in thread
From: Glen Choo @ 2022-04-13 22:24 UTC (permalink / raw)
  To: Derrick Stolee, Johannes Schindelin, Junio C Hamano; +Cc: git


Derrick Stolee <derrickstolee@github.com> writes:

> On 4/7/2022 8:42 AM, Johannes Schindelin wrote:
>> Hi Glen,
>> 
>> On Wed, 6 Apr 2022, Glen Choo wrote:
>> 
>>> Git tries not to distribute configs in-repo because they are a security
>>> risk. However, an attacker can do exactly this if they embed a bare
>>> repo inside of another repo.
>>>
>>> Teach fsck to detect whether a tree object contains a bare repo (as
>>> determined by setup.c) and warn. This will help hosting sites detect and
>>> prevent transmission of such malicious repos.
>>>
>>> See [1] for a more in-depth discussion, including future steps and
>>> alternatives.
>>>
>>> [1] https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com/
>> 
>> Out of curiosity: does this new check trigger with
>> https://github.com/libgit2/libgit2? AFAIR it has embedded repositories
>> that are used in its test suite. In other words, libgit2 has a legitimate
>> use case for embedded bare repositories, I believe.
>
> It is definitely good to keep in mind that other repositories have
> included bare repositories for convenience. I'm not sure that the behavior
> of some good actors should outweigh the benefits of protecting against
> this attack vector.
>
> The trouble here is: how could the libgit2 repo change their project to
> not trigger this warning? These bare repos are in their history forever if
> they don't do go through significant work and pain to remove them from
> their history. We would want to have a way to make the warnings less
> severe for special cases like this.
>
> Simultaneously, we wouldn't want to bless all _forks_ of libgit2.

Yes, that makes sense. Thanks for the thoughtful reply.

>  2. Suppress warnings on trusted repos, scoped to a specific set of known
>     trees _or_ based on some set of known commits (in case the known trees
>     are too large).

Since Junio mentioned downthread that we'd need (2), I'll focus on this.
I'm not sure I follow, though, so let me try to verbalize my thought
process to see what I'm not understanding...

By "Suppress warnings on trusted repos", I assume this is done on the
hosting side? (Since I can't imagine a built-in Git feature that could
selectively trust repos.)

"scoped to a specific set of known trees" sounds like fsck.skipList
i.e. as a host, I can configure a list of "good" libgit2 trees that I
will trust and those will be skipped by fsck.

So from my _very_ naive reading of (2), it seems like we already have
all of the pieces in place for hosts to do (2) on their own, _unless_
we think that fsck.skipList is inadequate for this use case. e.g. I
personally can't imagine any way to list every "good" tree and still
have a cloneable fork of libgit2, so we might to teach fsck to do
something smarter like "skip any objects reachable by these commits".


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-13 20:37 ` Glen Choo
@ 2022-04-13 23:36   ` Junio C Hamano
  2022-04-14 16:41     ` Glen Choo
  0 siblings, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2022-04-13 23:36 UTC (permalink / raw)
  To: Glen Choo
  Cc: git, Emily Shaffer, justin, Taylor Blau, martinvonz, brian m. carlson

Glen Choo <chooglen@google.com> writes:

> Martin observed that, viability aside, there's another approach that I
> haven't discussed:
>
>   == 5 Disable bare repo discovery
>
>   We could introduce a config value that disables bare repo discovery
>   altogether. This would only disable _discovery_; a user can still use
>   the bare repo by specifying the gitdir (e.g. via `--git-dir=.` or
>   GIT_DIR).

Does it or does it not count as "allowing discovery to do its job"
if you go to the directory, knowing that the directory is a bare
one, and expect Git to work in it?

I am guessing that your definition of "discovery" is not even
consider if the current directory is a repository and always force
the user to tell us with --git-dir or GIT_DIR.  I am not sure that
is realistically feasible (I am thinking of cases like "git fetch"
going to the remote repository on the local disk that is bare to run
"git upload-pack"), but if the fallout is not too bad, it may be a
good heuristics.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-07 22:10     ` brian m. carlson
  2022-04-07 22:40       ` rsbecker
  2022-04-08  5:54       ` Junio C Hamano
@ 2022-04-13 23:44       ` Glen Choo
  2 siblings, 0 replies; 50+ messages in thread
From: Glen Choo @ 2022-04-13 23:44 UTC (permalink / raw)
  To: brian m. carlson, Justin Steven; +Cc: git, Emily Shaffer, Taylor Blau


Thanks a lot for chiming in, especially for advocating on behalf of
users of embedded bare repos :)

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2022-04-07 at 21:53:26, Justin Steven wrote:
>> Hi all,
>> 
>> I'm the author of one of the articles linked in Glen's mail. Thank you
>> Glen for summarising the problem beautifully and pushing this forward.
>> 
>> Brian said:
>> > As mentioned elsewhere, git status doesn't work without a working tree.
>> 
>> This is correct. However, it is possible to embed a bare repo that has
>> its own core.worktree which points to a directory within the
>> containing repo, satisfying the requirement of having a working tree.
>> This is covered in the article [1] and looks to be accounted for in
>> Taylor's reproducer script which admittedly I haven't run.
>> 
>> > Instead, I'd rather see us avoid executing any program from the config
>> > or any hooks in a bare repository without a working tree (except for
>> > pushes).  I think that would avoid breaking things while still improving
>> > security.
>> 
>> Due to the fact that the embedded bare repo can be made to have a
>> working tree, this won't be an effective fix.
>
> Then we'd probably be better off just walking up the entire hierarchy
> and excluding worktrees from embedded bare repositories, or otherwise
> restricting the config we read.  That will probably mean we'll need to
> walk the entire directory hierarchy to see if it's embedded (or at least
> to the root of the device) in such a case, but that should be relatively
> uncommon.

IIUC, the difference between your suggestion and (3) in the original
email:

  == 3. Detect that we are in an embedded bare repo and ignore the embedded bare
  repository in favor of the containing repo.

is that we only walk the hierarchy in the event that the bare repo has a
worktree? If so, then yes that seems relatively uncommon and wouldn't
break most bare repo users.

What I'm unsure about is whether or not this reduces the attack surface
enough; it protects the highest value target, `git status`, but I'm not
sure about the following:

* How well does this protect other, existing commands (e.g. `log`,
  `branch` and `remote` fall into the same category of commands that
  appear informative and harmless)?
* How well does this protect other, abusable configuration directives
  (`core.pager` perhaps)?
* How future-proof is this? (if someone removes a dependency on
  core.worktree, is the command vulnerable again?)

There's a similar argument to be made about restricting the config we
read - we'd be running the risk of either missing some abusable config
that already exists or will exist.

And if my understanding is correct (and it very well might be wrong),
then this seems too piecemeal to be worth the effort.

> I'd definitely like to see us make a security improvement here, but I
> also would like to avoid us breaking a lot of repositories, especially
> since we lack alternatives.

Agreed, leaving existing users high and dry is an awful outcome.

>
> If git fast-import could 100% correctly round-trip all commits and
> repositories, I would be much more open to blocking this in fsck after a
> deprecation period, but as it stands that's not possible.  Perhaps
> improving that would be a suitable way forward.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-08  5:54       ` Junio C Hamano
@ 2022-04-14  0:03         ` Junio C Hamano
  2022-04-14  0:04         ` Glen Choo
  1 sibling, 0 replies; 50+ messages in thread
From: Junio C Hamano @ 2022-04-14  0:03 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Justin Steven, Glen Choo, git, Emily Shaffer, Taylor Blau

Junio C Hamano <gitster@pobox.com> writes:

> I am not sure if "walking the hierarchy up" is an effective enough
> defence offhand.  Do we consider it too much social engineering to
> make the user follow cloning instruction of the malicious project to
> prepare a repository, with core.worktree set to elsewhere, and pull
> into it?  Since walking up from any subdirectory of the directory
> the core.worktree points at will never see a directory, with ".git/"
> subdirectory that is the malicious project, "git status" run in the
> "embedded" place in such a scenario will not notice that it is a
> repository lookalike that came from outside.  But we can write it
> off as an approach needing too much  social engineering, that's OK.

In other words, an attacker could lead the victim to clone their
malicious repository at path $R with core.worktree in its
configuration file $R/config pointing at some directory $D that is
entirely unrelated to $R.  In a subdirectory of $D (i.e. the working
tree of that malicious repository) there may be a directory $G with
HEAD, refs and config in it.

When the victim visits such a directory $G, going up from there to
the root of the filesystem would not find any directory with ".git/"
subdirectory in it, so your protection may not even notice that $G
came from a checkout of $R.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-08  5:54       ` Junio C Hamano
  2022-04-14  0:03         ` Junio C Hamano
@ 2022-04-14  0:04         ` Glen Choo
  1 sibling, 0 replies; 50+ messages in thread
From: Glen Choo @ 2022-04-14  0:04 UTC (permalink / raw)
  To: Junio C Hamano, brian m. carlson
  Cc: Justin Steven, git, Emily Shaffer, Taylor Blau

Junio C Hamano <gitster@pobox.com> writes:

> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
>> Then we'd probably be better off just walking up the entire hierarchy
>> and excluding worktrees from embedded bare repositories, or otherwise
>> restricting the config we read.  That will probably mean we'll need to
>> walk the entire directory hierarchy to see if it's embedded (or at least
>> to the root of the device) in such a case, but that should be relatively
>> uncommon.
>
> I find this direction to notice iffy "user data" and disable it
> quite reasonable.  A configuration file can define alias, and it
> would be yet another attack vector to overload common ones users
> likely use ("git co", "git st", ...).  There may also be a hooks/
> directory.
>
> I wonder if it is an acceptable defence to deliberately "corrupt"
> such user data when we notice that they smell fishy, perhaps by
> renaming "config" and "hooks", when they are found next to "HEAD"
> and "objects" and "refs", to "config.disabled" and "hooks.disabled"?
> I am just thinking aloud without assessing if it is sensible or
> feasible at ths point.

Interesting idea - so the final result for the user is that they can
check out a valid bare repository, but it will not have anything
"sensitive".

I suspect that it will be difficult to define "will not have anything
sensitive" in practice. e.g. Justin's original article [1] shows how
.git/index can be used in surprising, malicious ways. And if we're going
to corrupt the repo anyway, it just be easier block the entire bare repo
from entering the worktree by default (but with an escape hatch for
users who know what they are doing).

> I am not sure if "walking the hierarchy up" is an effective enough
> defence offhand.  Do we consider it too much social engineering to
> make the user follow cloning instruction of the malicious project to
> prepare a repository, with core.worktree set to elsewhere, and pull
> into it?  Since walking up from any subdirectory of the directory
> the core.worktree points at will never see a directory, with ".git/"
> subdirectory that is the malicious project, "git status" run in the
> "embedded" place in such a scenario will not notice that it is a
> repository lookalike that came from outside.  But we can write it
> off as an approach needing too much  social engineering, that's OK.

IMO that sounds like too much social engineering :)

[1] https://github.com/justinsteven/advisories/blob/main/2022_git_buried_bare_repos_and_fsmonitor_various_abuses.md

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-13 23:36   ` Junio C Hamano
@ 2022-04-14 16:41     ` Glen Choo
  2022-04-14 17:35       ` Junio C Hamano
  2022-04-15 20:13       ` Junio C Hamano
  0 siblings, 2 replies; 50+ messages in thread
From: Glen Choo @ 2022-04-14 16:41 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Emily Shaffer, justin, Taylor Blau, martinvonz, brian m. carlson

Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> Martin observed that, viability aside, there's another approach that I
>> haven't discussed:
>>
>>   == 5 Disable bare repo discovery
>>
>>   We could introduce a config value that disables bare repo discovery
>>   altogether. This would only disable _discovery_; a user can still use
>>   the bare repo by specifying the gitdir (e.g. via `--git-dir=.` or
>>   GIT_DIR).
>
> Does it or does it not count as "allowing discovery to do its job"
> if you go to the directory, knowing that the directory is a bare
> one, and expect Git to work in it?
>
> I am guessing that your definition of "discovery" is not even
> consider if the current directory is a repository and always force
> the user to tell us with --git-dir or GIT_DIR.

Yes, I mean that even the current directory will be ignored when
discovery is disabled.

>                                                I am not sure that
> is realistically feasible (I am thinking of cases like "git fetch"
> going to the remote repository on the local disk that is bare to run
> "git upload-pack"), but if the fallout is not too bad, it may be a
> good heuristics.

Good detail - I hadn't considered the impact on our own child processes.
I suspect this might be a huge undertaking. Unless there is significant
interest in this option, I probably won't pursue it further.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-14 16:41     ` Glen Choo
@ 2022-04-14 17:35       ` Junio C Hamano
  2022-04-14 18:19         ` Junio C Hamano
  2022-04-15 21:33         ` Glen Choo
  2022-04-15 20:13       ` Junio C Hamano
  1 sibling, 2 replies; 50+ messages in thread
From: Junio C Hamano @ 2022-04-14 17:35 UTC (permalink / raw)
  To: Glen Choo
  Cc: git, Emily Shaffer, justin, Taylor Blau, martinvonz, brian m. carlson

Glen Choo <chooglen@google.com> writes:

> Yes, I mean that even the current directory will be ignored when
> discovery is disabled.

OK.

>>                                                I am not sure that
>> is realistically feasible (I am thinking of cases like "git fetch"
>> going to the remote repository on the local disk that is bare to run
>> "git upload-pack"), but if the fallout is not too bad, it may be a
>> good heuristics.
>
> Good detail - I hadn't considered the impact on our own child processes.
> I suspect this might be a huge undertaking. Unless there is significant
> interest in this option, I probably won't pursue it further.

I do not necessarily think so.  The entry points to transport on the
server side are quite limited (and the client side is dealing with
your own repositories anyway), and they already know which directory
in the server filesystem to hand to the upload-pack and friends, so
it would be a matter of passing GIT_DIR=$there when they call into the
run_command() API, if they are not already doing so.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-14 17:35       ` Junio C Hamano
@ 2022-04-14 18:19         ` Junio C Hamano
  2022-04-15 21:33         ` Glen Choo
  1 sibling, 0 replies; 50+ messages in thread
From: Junio C Hamano @ 2022-04-14 18:19 UTC (permalink / raw)
  To: Glen Choo
  Cc: git, Emily Shaffer, justin, Taylor Blau, martinvonz, brian m. carlson

Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> Yes, I mean that even the current directory will be ignored when
>> discovery is disabled.
>
> OK.
>
>>>                                                I am not sure that
>>> is realistically feasible (I am thinking of cases like "git fetch"
>>> going to the remote repository on the local disk that is bare to run
>>> "git upload-pack"), but if the fallout is not too bad, it may be a
>>> good heuristics.
>>
>> Good detail - I hadn't considered the impact on our own child processes.
>> I suspect this might be a huge undertaking. Unless there is significant
>> interest in this option, I probably won't pursue it further.

> I do not necessarily think so.  The entry points to transport on the

By "not" I meant "this might be huge? It may not be".  Sorry for
being unclear.

> server side are quite limited (and the client side is dealing with
> your own repositories anyway), and they already know which directory
> in the server filesystem to hand to the upload-pack and friends, so
> it would be a matter of passing GIT_DIR=$there when they call into the
> run_command() API, if they are not already doing so.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] fsck: detect bare repos in trees and warn
  2022-04-07 14:14       ` Ævar Arnfjörð Bjarmason
@ 2022-04-14 20:02         ` Glen Choo
  2022-04-15 12:46           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 50+ messages in thread
From: Glen Choo @ 2022-04-14 20:02 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Derrick Stolee
  Cc: Johannes Schindelin, git

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Thu, Apr 07 2022, Derrick Stolee wrote:
>
>> A more complete protection here would be:
>>
>>  1. Warn when finding a bare repo as a tree (this patch).
>>
>>  2. Suppress warnings on trusted repos, scoped to a specific set of known
>>     trees _or_ based on some set of known commits (in case the known trees
>>     are too large).
>>
>>  3. Prevent writing a bare repo to the worktree, unless the user provided
>>     an opt-in to that behavior.
>>
>> Since your patch is moving in the right direction here, I don't think
>> steps (2) and (3) are required to move forward with your patch. However,
>> it is a good opportunity to discuss the full repercussions of this issue.
>
> Isn't a gentler solution here to:
>
>  1. In setup.c, we detect a repo
>  2. Walk up a directory
>  3. Do we find a repo?
>  4. Does that repo "contain" the first one?
>     If yes: die on setup
>     If no: it's OK
>
> It also seems to me that there's pretty much perfect overlap between
> this and the long-discussed topic of marking a submodule with config
> v.s. detecting it on the fly.

Your suggestion seems similar to:

  == 3. Detect that we are in an embedded bare repo and ignore the embedded bare
  repository in favor of the containing repo.

which I also think is a simple, robust mitigation if we put aside the
problem of walking up to the root in too many situations. I seem to
recall that this problem has come up before in [1] (and possibly other
topics? I wasn't really able to locate them through a cursory search..),
so I assume that's what you're referring to by "long-discussed topic".

(Forgive me if I'm asking you to repeat yourself yet another time) I
seem to recall that we weren't able to reach consensus on whether it's
okay for Git to opportunistically walk up the directory hierarchy during
setup, especially since There are some situations where this is
extremely expensive (VFS, network mount).

I actually like this option quite a lot, but I don't see how we could
implement this without imposing a big penalty to all bare repo users -
they'd either be forced to set GIT_DIR or GIT_CEILING_DIRECTORIES, or
take a (potentially big) performance hit. Hopefully I'm just framing
this too narrowly and you're approaching this differently.

PS: As an aside, wouldn't this also break libgit2? We could make this
opt-out behavior, though that requires us to read system config _before_
discovering the gitdir (as I discussed in [2]).

[1] https://lore.kernel.org/git/211109.86v912dtfw.gmgdl@evledraar.gmail.com/
[2] https://lore.kernel.org/git/kl6lv8vc90ts.fsf@chooglen-macbookpro.roam.corp.google.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] fsck: detect bare repos in trees and warn
  2022-04-14 20:02         ` Glen Choo
@ 2022-04-15 12:46           ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 50+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-04-15 12:46 UTC (permalink / raw)
  To: Glen Choo; +Cc: Derrick Stolee, Johannes Schindelin, git


On Thu, Apr 14 2022, Glen Choo wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> On Thu, Apr 07 2022, Derrick Stolee wrote:
>>
>>> A more complete protection here would be:
>>>
>>>  1. Warn when finding a bare repo as a tree (this patch).
>>>
>>>  2. Suppress warnings on trusted repos, scoped to a specific set of known
>>>     trees _or_ based on some set of known commits (in case the known trees
>>>     are too large).
>>>
>>>  3. Prevent writing a bare repo to the worktree, unless the user provided
>>>     an opt-in to that behavior.
>>>
>>> Since your patch is moving in the right direction here, I don't think
>>> steps (2) and (3) are required to move forward with your patch. However,
>>> it is a good opportunity to discuss the full repercussions of this issue.
>>
>> Isn't a gentler solution here to:
>>
>>  1. In setup.c, we detect a repo
>>  2. Walk up a directory
>>  3. Do we find a repo?
>>  4. Does that repo "contain" the first one?
>>     If yes: die on setup
>>     If no: it's OK
>>
>> It also seems to me that there's pretty much perfect overlap between
>> this and the long-discussed topic of marking a submodule with config
>> v.s. detecting it on the fly.
>
> Your suggestion seems similar to:
>
>   == 3. Detect that we are in an embedded bare repo and ignore the embedded bare
>   repository in favor of the containing repo.
>
> which I also think is a simple, robust mitigation if we put aside the
> problem of walking up to the root in too many situations. I seem to
> recall that this problem has come up before in [1] (and possibly other
> topics? I wasn't really able to locate them through a cursory search..),
> so I assume that's what you're referring to by "long-discussed topic".

Yes, I mean the submodule.superprojectGitDir topic.

> (Forgive me if I'm asking you to repeat yourself yet another time) I
> seem to recall that we weren't able to reach consensus on whether it's
> okay for Git to opportunistically walk up the directory hierarchy during
> setup, especially since There are some situations where this is
> extremely expensive (VFS, network mount).

I'm not sure, but I think per the later
https://lore.kernel.org/git/220204.86pmo34d2m.gmgdl@evledraar.gmail.com/
and
https://lore.kernel.org/git/220311.8635joj0lf.gmgdl@evledraar.gmail.com/
that any optimization concerns were likely just "this is slow in
shellscript" and not at the FS level.

There were also passing references to some internal Google-specific
NFS-ish implementation that I know nothing about (but you might),
i.e. what I asked about in:
https://lore.kernel.org/git/220212.864k53yfws.gmgdl@evledraar.gmail.com/

But given the v9 superprojectGitDir becoming a boolean instead of a path
in v9 I'm not sure/have no idea.

The only thing I'm sure of is if past iterations of the series were
addressing such a problem as an optimization that doesn't seem to be a
current goal.

As noted in those past exchanges I have tested this method on e.g. AIX
whose FS is unbelievably slow, and I couldn't even tell the differenc.

That's because if you look at the total FS syscalls even for an
uninitialized repo just traversing .git, getting config etc. is going to
dwarf "walking up" in terms of number of calls.

Of course not all calls are going to be equal, and there's that
potential "I'm not NFS-y, but a parent is" case etc.

In any case, I think even *if* we had such a case somewhere that this
plan would still make sense. Such users could simply set
GIT_CEILING_DIRECTORIES or something similar if they cared about the
performance.

But for everyone else we'd do the right thing, and not prematurely
optimize. I.e. we actually *are* concerned not with "does it look like a
bare repo?" but "is this thing that looks like a bare repo within our
current actual repo or not?".

> I actually like this option quite a lot, but I don't see how we could
> implement this without imposing a big penalty to all bare repo users -
> they'd either be forced to set GIT_DIR or GIT_CEILING_DIRECTORIES, or
> take a (potentially big) performance hit. Hopefully I'm just framing
> this too narrowly and you're approaching this differently.

As noted in the [1] you quoted (link below) I tried to quantify that
potential penalty, and it seems to be a complete non-issue.

Of course there may be other scenarios where it matters, but I haven't
seen any concrete data to support that.

Doesn't pretty everyone who cares about the performance of bare in any
capacity do so because they're running a server that's using
git-upload-pack and the like? Those require you to specify the exact
.git directory you want.

I.e. wouldn't this *only* apply to those doing the equivalent of "git -C
some-dir" to "cd" to a bare repo?

> PS: As an aside, wouldn't this also break libgit2? We could make this
> opt-out behavior, though that requires us to read system config _before_
> discovering the gitdir (as I discussed in [2]).

No it wouldn't? I don't use libgit2, but upthread there's concern that
banning things that look-like-a-repo from being tracked would break it.

Whereas I'm pointing out that we don't need to do that, we can just keep
searching upwards.

But yes, it would "break" anything that assumed you could cd to that
tracked-looks-like-or-is--a-gitdir and have e.g. "git config" pick up
its config instead of our "real repo" config, but that's exactly what we
want in this case isn't it?

I'm just pointing out that we can do it on the fly in setup.c, instead
of forbidding such content from ever being tracked within the
repository, which we'd be doing because we know we're doing the wrong
thing in that setup.c codepath.

Let's just fix that bit in setup.c instead.

> [1] https://lore.kernel.org/git/211109.86v912dtfw.gmgdl@evledraar.gmail.com/
> [2] https://lore.kernel.org/git/kl6lv8vc90ts.fsf@chooglen-macbookpro.roam.corp.google.com


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-14 16:41     ` Glen Choo
  2022-04-14 17:35       ` Junio C Hamano
@ 2022-04-15 20:13       ` Junio C Hamano
  2022-04-15 23:45         ` Glen Choo
  1 sibling, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2022-04-15 20:13 UTC (permalink / raw)
  To: Glen Choo
  Cc: git, Emily Shaffer, justin, Taylor Blau, martinvonz, brian m. carlson

Glen Choo <chooglen@google.com> writes:

> Yes, I mean that even the current directory will be ignored when
> discovery is disabled.
>
>>                                                I am not sure that
>> is realistically feasible (I am thinking of cases like "git fetch"
>> going to the remote repository on the local disk that is bare to run
>> "git upload-pack"), but if the fallout is not too bad, it may be a
>> good heuristics.
>
> Good detail - I hadn't considered the impact on our own child processes.
> I suspect this might be a huge undertaking. Unless there is significant
> interest in this option, I probably won't pursue it further.

So, I was doing a bit of experiment.  The change to the set-up
essentially means that working in either a bare repository or in the
.git/ subdirectory of a non-bare repository that you used to be able
to do with

    cd bare.git && git cmd
    cd .git/ && git cmd

is now forbidden, unless you explicitly say which directory you want
to use as the GIT_DIR, i.e.

    cd bare.git && GIT_DIR=. git cmd
    cd .git/ && GIT_DIR=. git cmd

The required change to the code is surprisingly small, but the
fallout is very big.  Partial patches to two tests are attached with
some commentary at the end.

 setup.c          | 9 +++++++--

diff --git c/setup.c w/setup.c
index a7b36f3ffb..52304ae57e 100644
--- c/setup.c
+++ w/setup.c
@@ -1203,12 +1203,15 @@ static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
 	 * Test in the following order (relative to the dir):
 	 * - .git (file containing "gitdir: <path>")
 	 * - .git/
-	 * - ./ (bare)
+	 * - ./ (bare) --- not
 	 * - ../.git
 	 * - ../.git/
-	 * - ../ (bare)
+	 * - ../ (bare) --- not
 	 * - ../../.git
 	 *   etc.
+	 * we used to test the directory itself as a bare one
+	 * after checking if the directory has .git/ that is a repository
+	 * and before moving up the hierarchy, but no longer do so.
 	 */
 	one_filesystem = !git_env_bool("GIT_DISCOVERY_ACROSS_FILESYSTEM", 0);
 	if (one_filesystem)
@@ -1238,12 +1241,14 @@ static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
 			return GIT_DIR_DISCOVERED;
 		}
 
+#if 0
 		if (is_git_directory(dir->buf)) {
 			if (!ensure_valid_ownership(dir->buf))
 				return GIT_DIR_INVALID_OWNERSHIP;
 			strbuf_addstr(gitdir, ".");
 			return GIT_DIR_BARE;
 		}
+#endif
 
 		if (offset <= min_offset)
 			return GIT_DIR_HIT_CEILING;


THe above is all it takes to disable the recognition of a bare
repository during the repositiry discovery.  The code to handle
explicitly specified GIT_DIR is way before the loop being touched
here and works OK as before.

Interestingly, many breakages in tests are not due to "clone"
failing to operate in "the other side".  For example, this one tries
to inspect a bare repository that was created by a clone operation,
and we need to tell "git" where the GIT_DIR is, even after we chdir
to it.  I do not think this is so common but with our popularity,
not-so-common population is still an absolute large number of users
that we do not want to hurt.

diff --git c/t/t5601-clone.sh w/t/t5601-clone.sh
index 4a61f2c901..822ee3164a 100755
--- c/t/t5601-clone.sh
+++ w/t/t5601-clone.sh
@@ -111,9 +111,9 @@ test_expect_success 'clone --mirror' '
 	git clone --mirror src mirror &&
 	test -f mirror/HEAD &&
 	test ! -f mirror/file &&
-	FETCH="$(cd mirror && git config remote.origin.fetch)" &&
+	FETCH="$(cd mirror && GIT_DIR=. git config remote.origin.fetch)" &&
 	test "+refs/*:refs/*" = "$FETCH" &&
-	MIRROR="$(cd mirror && git config --bool remote.origin.mirror)" &&
+	MIRROR="$(cd mirror && GIT_DIR=. git config --bool remote.origin.mirror)" &&
 	test "$MIRROR" = true
 
 '
@@ -151,7 +151,7 @@ test_expect_success 'clone --mirror does not repeat tags' '
 	 git tag some-tag HEAD) &&
 	git clone --mirror src mirror2 &&
 	(cd mirror2 &&
-	 git show-ref 2> clone.err > clone.out) &&
+	 GIT_DIR=. git show-ref 2>clone.err >clone.out) &&
 	! grep Duplicate mirror2/clone.err &&
 	grep some-tag mirror2/clone.out
 


The next example is "init" test, which seems to reveal that refusing
to discover a bare repository is not exactly an approach that is
workable.

The first two hunks just show the same pattern as above.  The
check_config helper is given a directory, under which 'config' and
'refs' should exist (in other words, the first part of the helper
emulates setup.c::is_git_directory()).  Then it tries to read from
the configuration file found in that directory.  Be it a bare
repository or .git/ subdirectory of a non-bare repository the caller
has given us, we need to do the same GIT_DIR=. dance as the above.

The last hunk does not really fix.  The scenario is this:

    * A bare repository bare-ancestor-aliased.git is prepared

    * Its config file says "[alias] aliasedinit=init" in it

    * In that bare repository (next to 'refs', 'config'), we create
      a subdirectory 'plain-nested'.

    * We go to that bare-ancestor-aliased.git/plain-nested/
      directory and say "git aliasedinit".

This wants to be able to create a subdirectory in this bare
repository and make that subdirectory an independent repository (it
happens to be a non-bare repository, but a bare repository should
also work---think of .git/modules/name/ bare repository that is
designed to be pointed at with gitfile: link from a submodule
working tree).

Now, in order for the last step to be able to even _use_ the alias,
it has to treat the bare-ancestor-aliased.git directory as a bare
repository and read its configuration.  But by explicitly setting
GIT_DIR to that location, the behaviour of "git init" itself is
changed.  It no longer initializes the current directory,
i.e. plain-nested subdirectory of the bare-ancestor-aliased.git;
GIT_DIR tells us to (re)initialize it instead.

This fundamentally does not work.

It is a separate matter if it makes sense to allow creating a new
repository inside a bare repository (it does---that is what the
modern submodule layout uses), or to allow an alias to help doing so
defined in the top-level bare repository (it probably does---the
end-user may want to have a handy way to customize how submodules
are configured).  But it seems to tell us that with today's external
interface we have for "git init", we cannot take configuration from
a shallower level and use it to drive "git init" to create a new
repository at a deeper level.

diff --git c/t/t0001-init.sh w/t/t0001-init.sh
index d479303efa..57f2eedac0 100755
--- c/t/t0001-init.sh
+++ w/t/t0001-init.sh
@@ -6,6 +6,7 @@ TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 check_config () {
+	# NEEDSWORK: refs could be a file in post reftable world.
 	if test_path_is_dir "$1" &&
 	   test_path_is_file "$1/config" && test_path_is_dir "$1/refs"
 	then
@@ -21,8 +22,8 @@ check_config () {
 		return 1
 	fi
 
-	bare=$(cd "$1" && git config --bool core.bare)
-	worktree=$(cd "$1" && git config core.worktree) ||
+	bare=$(cd "$1" && GIT_DIR=. git config --bool core.bare)
+	worktree=$(cd "$1" && GIT_DIR=. git config core.worktree) ||
 	worktree=unset
 
 	test "$bare" = "$2" && test "$worktree" = "$3" || {
@@ -84,7 +85,7 @@ test_expect_success 'plain nested in bare through aliased command' '
 		echo "[alias] aliasedinit = init" >>config &&
 		mkdir plain-nested &&
 		cd plain-nested &&
-		git aliasedinit
+		GIT_DIR=.. git aliasedinit ;# does not work
 	) &&
 	check_config bare-ancestor-aliased.git/plain-nested/.git false unset
 '

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-14 17:35       ` Junio C Hamano
  2022-04-14 18:19         ` Junio C Hamano
@ 2022-04-15 21:33         ` Glen Choo
  2022-04-15 22:17           ` Junio C Hamano
  2022-04-15 22:43           ` Glen Choo
  1 sibling, 2 replies; 50+ messages in thread
From: Glen Choo @ 2022-04-15 21:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Emily Shaffer, justin, Taylor Blau, martinvonz, brian m. carlson

Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> Yes, I mean that even the current directory will be ignored when
>> discovery is disabled.
>
> OK.
>
>>>                                                I am not sure that
>>> is realistically feasible (I am thinking of cases like "git fetch"
>>> going to the remote repository on the local disk that is bare to run
>>> "git upload-pack"), but if the fallout is not too bad, it may be a
>>> good heuristics.
>>
>> Good detail - I hadn't considered the impact on our own child processes.
>> I suspect this might be a huge undertaking. Unless there is significant
>> interest in this option, I probably won't pursue it further.
>
> I do not necessarily think so.  The entry points to transport on the
> server side are quite limited (and the client side is dealing with
> your own repositories anyway), and they already know which directory
> in the server filesystem to hand to the upload-pack and friends, so
> it would be a matter of passing GIT_DIR=$there when they call into the
> run_command() API, if they are not already doing so.

FWIW I experimented with turning off bare repo recognition altogether
and seeing what breaks.

CI Run: https://github.com/chooglen/git/runs/6042600953?check_suite_focus=true

AFAICT, most of the test failures are what we'd expect if we stopped
recognizing bare repos altogether. One stands out to me in particular
though - in t5550-http-fetch-dumb.sh, 

    git clone $HTTPD_URL/dumb/repo.git clone-tmpl

yields 

    ++ git clone http://127.0.0.1:5550/dumb/repo.git clone-tmpl
    Cloning into 'clone-tmpl'...
    fatal: repository 'http://127.0.0.1:5550/dumb/repo.git/' not found

This sounds to me like Git isn't recognizing the static http files as a
remote Git repo, and if so, --git-dir doesn't sound like it'll save us.
But I'm not that familiar with the transport code so I don't know if
this is a big deal or just a quirk with our httpd tests.

----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----

diff --git a/setup.c b/setup.c
index 2600548776..794d9624d5 100644
--- a/setup.c
+++ b/setup.c
@@ -1196,10 +1196,10 @@ static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
 			return GIT_DIR_DISCOVERED;
 		}
 
-		if (is_git_directory(dir->buf)) {
-			strbuf_addstr(gitdir, ".");
-			return GIT_DIR_BARE;
-		}
+		/* if (is_git_directory(dir->buf)) { */
+		/* 	strbuf_addstr(gitdir, "."); */
+		/* 	return GIT_DIR_BARE; */
+		/* } */
 
 		if (offset <= min_offset)
 			return GIT_DIR_HIT_CEILING;


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-15 21:33         ` Glen Choo
@ 2022-04-15 22:17           ` Junio C Hamano
  2022-04-16  0:52             ` Taylor Blau
  2022-04-15 22:43           ` Glen Choo
  1 sibling, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2022-04-15 22:17 UTC (permalink / raw)
  To: Glen Choo
  Cc: git, Emily Shaffer, justin, Taylor Blau, martinvonz, brian m. carlson

Glen Choo <chooglen@google.com> writes:

> FWIW I experimented with turning off bare repo recognition altogether
> and seeing what breaks.

I guess our mails crossed ;-)  I've done a similar one locally and
sent a report out earlier, and I think the part of the function in
question that we disabled is identical.

>     ++ git clone http://127.0.0.1:5550/dumb/repo.git clone-tmpl
>     Cloning into 'clone-tmpl'...
>     fatal: repository 'http://127.0.0.1:5550/dumb/repo.git/' not found
>
> This sounds to me like Git isn't recognizing the static http files as a
> remote Git repo, and if so, --git-dir doesn't sound like it'll save us.

If the http server side we ship _depends_ on the fact that we
historically consider that it is enough to chdir into a directory to
use that directory as a bare repository, it is not all that
surprising that the server side infrastructure needs to do an
equivalent of "export GIT_DIR=." in addition to chdir it already
does into the directory.  There may be other things that the http
responder in the affected test needs to do before it can recognize
that the dumb/repo.git URL refers to a valid bare repository, and
until that happens, the above experiment may not start working.

I am not worried about that kind for breakage all that much, because
it is entirely under _our_ control how an HTTP request received
results in a "git" invocation, how "git clone/fetch ssh://" invokes
the process that runs "git upload-pack" on the other side in the
directory requested, etc.

What worries me more is the effect on _other_ people's server
implementations ("server", because that is one major class of use
case of bare repositories).  Essentially they need to identify the
places where they depend on the current behaviour (i.e. going to the
bare repository is enough to use it) and export GIT_DIR=. when they
invoke "git" there.  The actual change that is necessary might be
small, but identifying the places that need such changes may be
added burden to them.

Thanks.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-15 21:33         ` Glen Choo
  2022-04-15 22:17           ` Junio C Hamano
@ 2022-04-15 22:43           ` Glen Choo
  1 sibling, 0 replies; 50+ messages in thread
From: Glen Choo @ 2022-04-15 22:43 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Emily Shaffer, justin, Taylor Blau, martinvonz, brian m. carlson

Glen Choo <chooglen@google.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Glen Choo <chooglen@google.com> writes:
>>
>>> Yes, I mean that even the current directory will be ignored when
>>> discovery is disabled.
>>
>> OK.
>>
>>>>                                                I am not sure that
>>>> is realistically feasible (I am thinking of cases like "git fetch"
>>>> going to the remote repository on the local disk that is bare to run
>>>> "git upload-pack"), but if the fallout is not too bad, it may be a
>>>> good heuristics.
>>>
>>> Good detail - I hadn't considered the impact on our own child processes.
>>> I suspect this might be a huge undertaking. Unless there is significant
>>> interest in this option, I probably won't pursue it further.
>>
>> I do not necessarily think so.  The entry points to transport on the
>> server side are quite limited (and the client side is dealing with
>> your own repositories anyway), and they already know which directory
>> in the server filesystem to hand to the upload-pack and friends, so
>> it would be a matter of passing GIT_DIR=$there when they call into the
>> run_command() API, if they are not already doing so.
>
> FWIW I experimented with turning off bare repo recognition altogether
> and seeing what breaks.
>
> CI Run: https://github.com/chooglen/git/runs/6042600953?check_suite_focus=true
>
> AFAICT, most of the test failures are what we'd expect if we stopped
> recognizing bare repos altogether. One stands out to me in particular
> though - in t5550-http-fetch-dumb.sh, 
>
>     git clone $HTTPD_URL/dumb/repo.git clone-tmpl
>
> yields 
>
>     ++ git clone http://127.0.0.1:5550/dumb/repo.git clone-tmpl
>     Cloning into 'clone-tmpl'...
>     fatal: repository 'http://127.0.0.1:5550/dumb/repo.git/' not found
>
> This sounds to me like Git isn't recognizing the static http files as a
> remote Git repo, and if so, --git-dir doesn't sound like it'll save us.
> But I'm not that familiar with the transport code so I don't know if
> this is a big deal or just a quirk with our httpd tests.

Ok I think this is a false alarm - the previous test does some setup on
the server, which is a bare repo. It was the _setup_ that was broken,
not the actual `git clone`.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-15 20:13       ` Junio C Hamano
@ 2022-04-15 23:45         ` Glen Choo
  2022-04-15 23:59           ` Glen Choo
  2022-04-16  1:00           ` Taylor Blau
  0 siblings, 2 replies; 50+ messages in thread
From: Glen Choo @ 2022-04-15 23:45 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Emily Shaffer, justin, Taylor Blau, martinvonz, brian m. carlson

Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> Yes, I mean that even the current directory will be ignored when
>> discovery is disabled.
>>
>>>                                                I am not sure that
>>> is realistically feasible (I am thinking of cases like "git fetch"
>>> going to the remote repository on the local disk that is bare to run
>>> "git upload-pack"), but if the fallout is not too bad, it may be a
>>> good heuristics.
>>
>> Good detail - I hadn't considered the impact on our own child processes.
>> I suspect this might be a huge undertaking. Unless there is significant
>> interest in this option, I probably won't pursue it further.
>
> So, I was doing a bit of experiment.  The change to the set-up
> essentially means that working in either a bare repository or in the
> .git/ subdirectory of a non-bare repository that you used to be able
> to do with
>
>     cd bare.git && git cmd
>     cd .git/ && git cmd
>
> is now forbidden, unless you explicitly say which directory you want
> to use as the GIT_DIR, i.e.
>
>     cd bare.git && GIT_DIR=. git cmd
>     cd .git/ && GIT_DIR=. git cmd
>
> The required change to the code is surprisingly small, but the
> fallout is very big.  Partial patches to two tests are attached with
> some commentary at the end.

Thanks for running the experiment and the thorough writeup! Within the
Google Git team, there was quite a bit of interest in this approach
because of how simple it is, but the projected impact on users is
potentially very high; this experiment is a great starting heuristic.

> Interestingly, many breakages in tests are not due to "clone"
> failing to operate in "the other side".  For example, this one tries
> to inspect a bare repository that was created by a clone operation,
> and we need to tell "git" where the GIT_DIR is, even after we chdir
> to it.  I do not think this is so common but with our popularity,
> not-so-common population is still an absolute large number of users
> that we do not want to hurt.
>
> diff --git c/t/t5601-clone.sh w/t/t5601-clone.sh
> index 4a61f2c901..822ee3164a 100755
> --- c/t/t5601-clone.sh
> +++ w/t/t5601-clone.sh
> @@ -111,9 +111,9 @@ test_expect_success 'clone --mirror' '
>  	git clone --mirror src mirror &&
>  	test -f mirror/HEAD &&
>  	test ! -f mirror/file &&
> -	FETCH="$(cd mirror && git config remote.origin.fetch)" &&
> +	FETCH="$(cd mirror && GIT_DIR=. git config remote.origin.fetch)" &&
>  	test "+refs/*:refs/*" = "$FETCH" &&
> -	MIRROR="$(cd mirror && git config --bool remote.origin.mirror)" &&
> +	MIRROR="$(cd mirror && GIT_DIR=. git config --bool remote.origin.mirror)" &&
>  	test "$MIRROR" = true
>  
>  '
> @@ -151,7 +151,7 @@ test_expect_success 'clone --mirror does not repeat tags' '
>  	 git tag some-tag HEAD) &&
>  	git clone --mirror src mirror2 &&
>  	(cd mirror2 &&
> -	 git show-ref 2> clone.err > clone.out) &&
> +	 GIT_DIR=. git show-ref 2>clone.err >clone.out) &&
>  	! grep Duplicate mirror2/clone.err &&
>  	grep some-tag mirror2/clone.out

The assumption when I made this proposal is that --git-dir is a
sufficient workaround when we don't detect bare repos. As such, I'm of
the opinion that this is the "expected" result of such a change - we
really do expect that `git clone --bare` no longer works without
--git-dir.

I don't think this case (by itself) means that we can ignore bare repos.

> The next example is "init" test, which seems to reveal that refusing
> to discover a bare repository is not exactly an approach that is
> workable.
>
> The first two hunks just show the same pattern as above.  The
> check_config helper is given a directory, under which 'config' and
> 'refs' should exist (in other words, the first part of the helper
> emulates setup.c::is_git_directory()).  Then it tries to read from
> the configuration file found in that directory.  Be it a bare
> repository or .git/ subdirectory of a non-bare repository the caller
> has given us, we need to do the same GIT_DIR=. dance as the above.
>
> The last hunk does not really fix.  The scenario is this:
>
>     * A bare repository bare-ancestor-aliased.git is prepared
>
>     * Its config file says "[alias] aliasedinit=init" in it
>
>     * In that bare repository (next to 'refs', 'config'), we create
>       a subdirectory 'plain-nested'.
>
>     * We go to that bare-ancestor-aliased.git/plain-nested/
>       directory and say "git aliasedinit".
>
> This wants to be able to create a subdirectory in this bare
> repository and make that subdirectory an independent repository (it
> happens to be a non-bare repository, but a bare repository should
> also work---think of .git/modules/name/ bare repository that is
> designed to be pointed at with gitfile: link from a submodule
> working tree).
>
> Now, in order for the last step to be able to even _use_ the alias,
> it has to treat the bare-ancestor-aliased.git directory as a bare
> repository and read its configuration.  But by explicitly setting
> GIT_DIR to that location, the behaviour of "git init" itself is
> changed.  It no longer initializes the current directory,
> i.e. plain-nested subdirectory of the bare-ancestor-aliased.git;
> GIT_DIR tells us to (re)initialize it instead.

However, this case shows that --git-dir won't work at all with "git
init", and I wouldn't be surprised if there are other breakages hiding
just out of plain sight. This worries me a lot more, and I think
disabling bare repo detection wholesale might be a nonstarter.

I wonder if all we need to do is make setup.c a little smarter - we
recognize bare repos, but *only* if they are contained in a directory
named `.git/`. This should fix "git init" I think, because even though
we'd ignore "./ (bare)", we'd recognize `../.git/` and we get the right
behavior again.

I haven't tested it yet, but this proposal sounds like it has quite a
few merits to me:

- Easy to explain.
- Easy rationalization ("We used to be quite cavalier about detecting
  bare repos, but now we're being more strict by default").
- Fixes the embedded bare repo problem (since we don't allow .git).
- No-op and neglible performance hit for non-bare repo users, even if
  they occasionally run "git" inside ".git".

As with the original proposal of "ignore bare repos altogether", this
will still be a headache for bare repo users (unless they don't mind
renaming their bare repo directory to ".git"), so this behavior needs to
be configurable, but perhaps the fallout is small enough that this
could be a safe default.

> This fundamentally does not work.
>
> It is a separate matter if it makes sense to allow creating a new
> repository inside a bare repository (it does---that is what the
> modern submodule layout uses), or to allow an alias to help doing so
> defined in the top-level bare repository (it probably does---the
> end-user may want to have a handy way to customize how submodules
> are configured).  But it seems to tell us that with today's external
> interface we have for "git init", we cannot take configuration from
> a shallower level and use it to drive "git init" to create a new
> repository at a deeper level.
>  '

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-15 23:45         ` Glen Choo
@ 2022-04-15 23:59           ` Glen Choo
  2022-04-16  1:00           ` Taylor Blau
  1 sibling, 0 replies; 50+ messages in thread
From: Glen Choo @ 2022-04-15 23:59 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Emily Shaffer, justin, Taylor Blau, martinvonz, brian m. carlson


> I wonder if all we need to do is make setup.c a little smarter - we
> recognize bare repos, but *only* if they are contained in a directory
> named `.git/`. This should fix "git init" I think, because even though
> we'd ignore "./ (bare)", we'd recognize `../.git/` and we get the right
> behavior again.
> 
> > It is a separate matter if it makes sense to allow creating a new
> > repository inside a bare repository (it does---that is what the
> > modern submodule layout uses), or to allow an alias to help doing so
> > defined in the top-level bare repository (it probably does---the
> > end-user may want to have a handy way to customize how submodules
> > are configured).  But it seems to tell us that with today's external
> > interface we have for "git init", we cannot take configuration from
> > a shallower level and use it to drive "git init" to create a new
> > repository at a deeper level.

Hah, I forgot that submodules are stored as `.git/modules/foo` and not
`.git/modules/foo/.git`, so this doesn't preserve the existing behavior
for submodules nested > 1 level.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-06 22:43 Bare repositories in the working tree are a security risk Glen Choo
                   ` (3 preceding siblings ...)
  2022-04-13 20:37 ` Glen Choo
@ 2022-04-16  0:34 ` Glen Choo
  2022-04-16  0:41 ` Glen Choo
  5 siblings, 0 replies; 50+ messages in thread
From: Glen Choo @ 2022-04-16  0:34 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer, justin, Taylor Blau


Hi all! Thanks for chiming in.

I'm sorry I wasn't able to participate in a timely manner, and unfortunately
(once again..), I will be out of the office (returning 27 Apr, PDT) and my
colleagues will be keeping tabs on the discussion in my stead (once again..).

Nevertheless, I think the conversation has been pretty fruitful, so I'm
optimistic about next steps. Here's my understanding of the conversation thus
far - do chime in if you feel I'm off base or if I've missed something:

* We all agree that something needs to be done about embedded bare repos. This
  is a pretty good starting point IMO, because we agree that 'do nothing' isn't
  a good response.

* There are use cases for embedded bare repos that don't have great alternatives
  (e.g. libgit2 uses bare repos in its tests). Even if this workflow is frowned
  upon (I personally don't think we should support it), I don't think we're
  ready to categorically declare that Git should ban embedded bare repos
  altogether (e.g. the way we ban .GiT).

* We want additional protection on the client besides `git fsck`; there are 
  a few ways to do this:

  1. Prevent checking out an embedded bare repo.
  2. Detect if the bare repo is embedded and refuse to work with it.
  3. Detect if the bare repo is embedded and do not read its config/hooks, but
     everything else still 'works'.
  4. Don't detect bare repos.
  5. Only detect bare repos that are named `.git` [1].

  (I've responded with my thoughts on each of these approaches in-thread).

With that in mind, here's what I propose we do next:

* Merge the `git fsck` patch [2] if we think that it is useful despite the
  potentially huge number of false positives. That patch needs some fixing; I'll
  make the changes when I'm back.

* I'll experiment with (5), and if it seems promising, I'll propose this as an
  opt-in feature, with the intent of making it opt-out in the future. We'll
  opt-into this at Google to help figure out if this is a good default or not.

* If that direction turns out not to be promising, I'll pursue (1), since that
  is the only option that can be configured per-repo, which should hopefully
  minimize the fallout.

Given that this embedded bare repo problem has been known for a long time, I
don't think we need to rush out a fix, but (especially since I'll be OOO) I'm
more than happy for someone to take my ideas (or any ideas) and run with them.

[1] https://lore.kernel.org/git/kl6l5yn99ahv.fsf@chooglen-macbookpro.roam.corp.google.com/
[2] https://lore.kernel.org/git/20220406232231.47714-1-chooglen@google.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-06 22:43 Bare repositories in the working tree are a security risk Glen Choo
                   ` (4 preceding siblings ...)
  2022-04-16  0:34 ` Glen Choo
@ 2022-04-16  0:41 ` Glen Choo
  2022-04-16  1:28   ` Taylor Blau
  5 siblings, 1 reply; 50+ messages in thread
From: Glen Choo @ 2022-04-16  0:41 UTC (permalink / raw)
  To: git
  Cc: Emily Shaffer, justin, Taylor Blau, Johannes Schindelin,
	Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Junio C Hamano, brian m. carlson, rsbecker


(resending this because I think the original got blocked, and I messed
up the CC list anyway.)

Hi all! Thanks for chiming in.

I'm sorry I wasn't able to participate in a timely manner, and unfortunately
(once again..), I will be out of the office (returning 27 Apr, PDT) and my
colleagues will be keeping tabs on the discussion in my stead (once again..).

Nevertheless, I think the conversation has been pretty fruitful, so I'm
optimistic about next steps. Here's my understanding of the conversation thus
far - do chime in if you feel I'm off base or if I've missed something:

* We all agree that something needs to be done about embedded bare repos. This
  is a pretty good starting point IMO, because we agree that 'do nothing' isn't
  a good response.

* There are use cases for embedded bare repos that don't have great alternatives
  (e.g. libgit2 uses bare repos in its tests). Even if this workflow is frowned
  upon (I personally don't think we should support it), I don't think we're
  ready to categorically declare that Git should ban embedded bare repos
  altogether (e.g. the way we ban .GiT).

* We want additional protection on the client besides `git fsck`; there are 
  a few ways to do this:

  1. Prevent checking out an embedded bare repo.
  2. Detect if the bare repo is embedded and refuse to work with it.
  3. Detect if the bare repo is embedded and do not read its config/hooks, but
     everything else still 'works'.
  4. Don't detect bare repos.
  5. Only detect bare repos that are named `.git` [1].

  (I've responded with my thoughts on each of these approaches in-thread).

With that in mind, here's what I propose we do next:

* Merge the `git fsck` patch [2] if we think that it is useful despite the
  potentially huge number of false positives. That patch needs some fixing; I'll
  make the changes when I'm back.

* I'll experiment with (5), and if it seems promising, I'll propose this as an
  opt-in feature, with the intent of making it opt-out in the future. We'll
  opt-into this at Google to help figure out if this is a good default or not.

* If that direction turns out not to be promising, I'll pursue (1), since that
  is the only option that can be configured per-repo, which should hopefully
  minimize the fallout.

Given that this embedded bare repo problem has been known for a long time, I
don't think we need to rush out a fix, but (especially since I'll be OOO) I'm
more than happy for someone to take my ideas (or any ideas) and run with them.

[1] https://lore.kernel.org/git/kl6l5yn99ahv.fsf@chooglen-macbookpro.roam.corp.google.com/
[2] https://lore.kernel.org/git/20220406232231.47714-1-chooglen@google.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-15 22:17           ` Junio C Hamano
@ 2022-04-16  0:52             ` Taylor Blau
  0 siblings, 0 replies; 50+ messages in thread
From: Taylor Blau @ 2022-04-16  0:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Glen Choo, git, Emily Shaffer, justin, martinvonz, brian m. carlson

On Fri, Apr 15, 2022 at 03:17:54PM -0700, Junio C Hamano wrote:
> What worries me more is the effect on _other_ people's server
> implementations ("server", because that is one major class of use
> case of bare repositories).  Essentially they need to identify the
> places where they depend on the current behaviour (i.e. going to the
> bare repository is enough to use it) and export GIT_DIR=. when they
> invoke "git" there.  The actual change that is necessary might be
> small, but identifying the places that need such changes may be
> added burden to them.

I'm definitely biased here, but I think that this is probably too big of
a burden to place on forge administrators.

Every repository on GitHub (and I suspect GitLab, too) is written as a
bare repository on disk, and so having to inject `GIT_DIR=.` into every
git invocation, while doable, seems like an unnecessary pain to impose.

I suppose you could hide this behavior behind an opt-in configuration
setting, e.g., `core.bareDiscovery` which determines whether or not we
will try to discover bare repositories without `GIT_DIR` in the
environment. That setting could default to true to avoid breaking
existing use-cases.

But I think that's just masking the overall pain that this approach
would incur, without providing much protection, so I wouldn't be unhappy
to see us pursue a different approach here.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-15 23:45         ` Glen Choo
  2022-04-15 23:59           ` Glen Choo
@ 2022-04-16  1:00           ` Taylor Blau
  2022-04-16  1:18             ` Junio C Hamano
  1 sibling, 1 reply; 50+ messages in thread
From: Taylor Blau @ 2022-04-16  1:00 UTC (permalink / raw)
  To: Glen Choo
  Cc: Junio C Hamano, git, Emily Shaffer, justin, martinvonz, brian m. carlson

On Fri, Apr 15, 2022 at 04:45:32PM -0700, Glen Choo wrote:
> However, this case shows that --git-dir won't work at all with "git
> init", and I wouldn't be surprised if there are other breakages hiding
> just out of plain sight. This worries me a lot more, and I think
> disabling bare repo detection wholesale might be a nonstarter.
>
> I wonder if all we need to do is make setup.c a little smarter - we
> recognize bare repos, but *only* if they are contained in a directory
> named `.git/`. This should fix "git init" I think, because even though
> we'd ignore "./ (bare)", we'd recognize `../.git/` and we get the right
> behavior again.
>
> I haven't tested it yet, but this proposal sounds like it has quite a
> few merits to me:
>
> - Easy to explain.
> - Easy rationalization ("We used to be quite cavalier about detecting
>   bare repos, but now we're being more strict by default").
> - Fixes the embedded bare repo problem (since we don't allow .git).
> - No-op and neglible performance hit for non-bare repo users, even if
>   they occasionally run "git" inside ".git".
>
> As with the original proposal of "ignore bare repos altogether", this
> will still be a headache for bare repo users (unless they don't mind
> renaming their bare repo directory to ".git"), so this behavior needs to
> be configurable, but perhaps the fallout is small enough that this
> could be a safe default.

I think you are underestimating the fallout, at least if I'm
understanding your proposal correctly.

Is the proposal to only detect bare repositories that are called `.git`?
I think that's what you're suggesting, though can't we just as easily
embed a bare repository named ".git" in a clone as long as its not in
the root directory?

But like I said in an earlier message, I think that this is vastly
underestimating the number of legitimate repositories that are stored as
bare repos on disk (not embedded in a non-bare repo, nor named ".git").

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-16  1:00           ` Taylor Blau
@ 2022-04-16  1:18             ` Junio C Hamano
  2022-04-16  1:30               ` Taylor Blau
  0 siblings, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2022-04-16  1:18 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Glen Choo, git, Emily Shaffer, justin, martinvonz, brian m. carlson

Taylor Blau <me@ttaylorr.com> writes:

> Is the proposal to only detect bare repositories that are called `.git`?
> I think that's what you're suggesting, though can't we just as easily
> embed a bare repository named ".git" in a clone as long as its not in
> the root directory?

I do not think "you can use your bare repository as before ONLY if
the directory is named .git; otherwise you must use GIT_DIR to point
at it" would fly; the Glen's exception may help many uses of ".git
subdirectory of a non-bare repository as if it were a bare" you can
find in tests, but does not help real-world use cases where there
may be bunch of bare repositories named "$project.git" at all.

But I have to point out that your attack above would not work, as we
do not allow ".git" directory in the index to begin with.  IOW, you
as an attacker may be able to prepare such a tree with nonstandard
tools, but the victim won't be able to check it out (and
fsck-during-transfer would probably block the cloning).

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-16  0:41 ` Glen Choo
@ 2022-04-16  1:28   ` Taylor Blau
  2022-04-21 18:25     ` Emily Shaffer
  2022-04-29 23:57     ` Glen Choo
  0 siblings, 2 replies; 50+ messages in thread
From: Taylor Blau @ 2022-04-16  1:28 UTC (permalink / raw)
  To: Glen Choo
  Cc: git, Emily Shaffer, justin, Taylor Blau, Johannes Schindelin,
	Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Junio C Hamano, brian m. carlson, rsbecker

On Fri, Apr 15, 2022 at 05:41:59PM -0700, Glen Choo wrote:
> * We all agree that something needs to be done about embedded bare repos. This
>   is a pretty good starting point IMO, because we agree that 'do nothing' isn't
>   a good response.

To be totally honest, I am not absolutely convinced.

I agree that it's sub-optimal that Git is an attack vector for remote
code execution, but I think there is significant social engineering
required in order to meaningfully exploit this. Particularly because an
attacker must convince their victim to:

  - clone the repository,
  - cd into the embedded bare repository, and
  - run a git command

Scripting around the output of git commands in your PS1 makes the latter
more likely, so I think it's worthwhile to explore how to either prevent
this type of attack or make it substantially less likely to have a user
run git commands that execute parts of the config opportunistically.

That said, I think there are other approaches that we could take that
would hopefully disrupt fewer existing workflows.

> * There are use cases for embedded bare repos that don't have great alternatives
>   (e.g. libgit2 uses bare repos in its tests). Even if this workflow is frowned
>   upon (I personally don't think we should support it), I don't think we're
>   ready to categorically declare that Git should ban embedded bare repos
>   altogether (e.g. the way we ban .GiT).

I think there are a handful of legitimate reasons that we might want to
continue supporting this; for projects like libgit2 and Git LFS, it's
useful to have repositories in a known state to execute tests in. Having
bare Git repositories contained in some "test fixture" directory is a
really easy way to do just that.

> * We want additional protection on the client besides `git fsck`; there are
>   a few ways to do this:

I'm a little late to chime into the thread, but I appreciate you
summarizing some of the approaches discussed so far. Let me add my
thoughts on each of these in order:

>   1. Prevent checking out an embedded bare repo.
>   2. Detect if the bare repo is embedded and refuse to work with it.
>   3. Detect if the bare repo is embedded and do not read its config/hooks, but
>      everything else still 'works'.
>   4. Don't detect bare repos.
>   5. Only detect bare repos that are named `.git` [1].
>
>   (I've responded with my thoughts on each of these approaches in-thread).

  1. Likely disrupts too many legitimate workflows for us to adopt
     without designing some way to declare an embedded bare repository
     is "safe".
  2. Ditto.
  3. This seems the most promising approach so far. Similar to (1), I
     would also want to make sure we provide an easy way to declare a
     bare repository as "safe" in order to avoid permanently disrupting
     valid workflows that have accumulated over the past >15 years.
  4. Seems like this approach is too heavy-handed.
  5. Ditto.

> With that in mind, here's what I propose we do next:
>
> * Merge the `git fsck` patch [2] if we think that it is useful despite the
>   potentially huge number of false positives. That patch needs some fixing; I'll
>   make the changes when I'm back.

If there are lots of false positives, we should consider downgrading the
severity of the proposed `EMBEDDED_BARE_REPO` fsck check to "info". I'm
not clear if there is another reason why this patch would have a
significant number of false positives (i.e., if the detection mechanism
is over-zealous).

But if not, and this does detect only legitimate embedded bare
repositories, we should use it as a reminder to consider how many
use-cases and workflows will be affected by whatever approach we take
here, if any.

> * I'll experiment with (5), and if it seems promising, I'll propose this as an
>   opt-in feature, with the intent of making it opt-out in the future. We'll
>   opt-into this at Google to help figure out if this is a good default or not.
>
> * If that direction turns out not to be promising, I'll pursue (1), since that
>   is the only option that can be configured per-repo, which should hopefully
>   minimize the fallout.

Here's an alternative approach, which I haven't seen discussed thus far:

When a bare repository is embedded in another repository, avoid reading
its config by default. This means that most Git commands will still
work, but without the possibility of running any "executable" portions
of the config. To opt-out (i.e., to allow legitimate use-cases to start
reading embedded bare repository config again), the embedding repository
would have to set a multi-valued `safe.embeddedRepo` configuration. This
would specify a list of paths relative to the embedding repository's
root of known-safe bare repositories.

I think there are a couple of desirable attributes of this approach:

  - It minimally disrupts bare repositories, restricting the change to
    only embedded repositories.
  - It allows most Git commands to continue working as expected (modulo
    reading the config), hopefully making the population of users whose
    workflows will suddenly break pretty small.
  - It requires the user to explicitly opt-in to the unsafe behavior,
    because an attacker could not influence the embedding repository's
    `safe.embeddedRepo` config.

If we were going to do something about this, I would strongly advocate
for something that resembles the above. Or at the very least, some
solution that captures the attributes I outlined there.

I would be happy to work together with you (or anybody!) on developing
patches in that direction, so let me know if you are interested in
coordinating our efforts.

> Given that this embedded bare repo problem has been known for a long time, I
> don't think we need to rush out a fix, but (especially since I'll be OOO) I'm
> more than happy for someone to take my ideas (or any ideas) and run with them.

No rush. Regardless of your time out-of-office, we should take advantage
of the fact that this is a long-known wart to carefully craft a solution
that provides some additional safety while disrupting as few existing
workflows as possible.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-16  1:18             ` Junio C Hamano
@ 2022-04-16  1:30               ` Taylor Blau
  0 siblings, 0 replies; 50+ messages in thread
From: Taylor Blau @ 2022-04-16  1:30 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Taylor Blau, Glen Choo, git, Emily Shaffer, justin, martinvonz,
	brian m. carlson

On Fri, Apr 15, 2022 at 06:18:14PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Is the proposal to only detect bare repositories that are called `.git`?
> > I think that's what you're suggesting, though can't we just as easily
> > embed a bare repository named ".git" in a clone as long as its not in
> > the root directory?
>
> I do not think "you can use your bare repository as before ONLY if
> the directory is named .git; otherwise you must use GIT_DIR to point
> at it" would fly; the Glen's exception may help many uses of ".git
> subdirectory of a non-bare repository as if it were a bare" you can
> find in tests, but does not help real-world use cases where there
> may be bunch of bare repositories named "$project.git" at all.

Agreed.

> But I have to point out that your attack above would not work, as we
> do not allow ".git" directory in the index to begin with.  IOW, you
> as an attacker may be able to prepare such a tree with nonstandard
> tools, but the victim won't be able to check it out (and
> fsck-during-transfer would probably block the cloning).

Makes sense, and thanks for the reminder; I agree.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-16  1:28   ` Taylor Blau
@ 2022-04-21 18:25     ` Emily Shaffer
  2022-04-21 18:29       ` Emily Shaffer
  2022-04-21 19:09       ` Taylor Blau
  2022-04-29 23:57     ` Glen Choo
  1 sibling, 2 replies; 50+ messages in thread
From: Emily Shaffer @ 2022-04-21 18:25 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Glen Choo, Git List, justin, Johannes Schindelin,
	Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Junio C Hamano, brian m. carlson, Randall S. Becker

On Fri, Apr 15, 2022 at 6:28 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Fri, Apr 15, 2022 at 05:41:59PM -0700, Glen Choo wrote:
> > * We want additional protection on the client besides `git fsck`; there are
> >   a few ways to do this:
>
> I'm a little late to chime into the thread, but I appreciate you
> summarizing some of the approaches discussed so far. Let me add my
> thoughts on each of these in order:
>
> >   1. Prevent checking out an embedded bare repo.
> >   2. Detect if the bare repo is embedded and refuse to work with it.
> >   3. Detect if the bare repo is embedded and do not read its config/hooks, but
> >      everything else still 'works'.
> >   4. Don't detect bare repos.
> >   5. Only detect bare repos that are named `.git` [1].
> >
> >   (I've responded with my thoughts on each of these approaches in-thread).
>
>   1. Likely disrupts too many legitimate workflows for us to adopt
>      without designing some way to declare an embedded bare repository
>      is "safe".
>   2. Ditto.
>   3. This seems the most promising approach so far. Similar to (1), I
>      would also want to make sure we provide an easy way to declare a
>      bare repository as "safe" in order to avoid permanently disrupting
>      valid workflows that have accumulated over the past >15 years.

I'd like to think a little more about how we want to determine that a
bare repo isn't embedded - wantonly scanning up the filesystem for any
gitdir above the current one is really expensive. When I tried that
approach for the purposes of including some shared config between
superproject and submodules, it slowed down the Git test suite by
something like 3-5x. So, I suppose that "we think this is bare" means
refs/ and objects/ present in a directory that isn't named '.git/',
and then we hunt anywhere above us for another .git/? Of course being
careful not to accept another bare repo as the "parent" gitdir.

Does it work for submodules? I suppose it works for non-bare
submodules - and for bare submodules, e.g.
'submodule-having-project.git/modules/some-submodule/' we can rely on
the user to turn that flag on if they want their submodule's config
and hooks to be noticed from the gitdir. (From
'worktree-for-submodule-having-project/some-submodule' there is a
'.git' pointer, so I'd expect things to work normally that way,
right?)

As long as we are careful to avoid searching the filesystem in *every*
case, this seems like a pretty reasonable approach to me.

>   4. Seems like this approach is too heavy-handed.
>   5. Ditto.
>
> > With that in mind, here's what I propose we do next:
> >
> > * Merge the `git fsck` patch [2] if we think that it is useful despite the
> >   potentially huge number of false positives. That patch needs some fixing; I'll
> >   make the changes when I'm back.
>
> If there are lots of false positives, we should consider downgrading the
> severity of the proposed `EMBEDDED_BARE_REPO` fsck check to "info". I'm
> not clear if there is another reason why this patch would have a
> significant number of false positives (i.e., if the detection mechanism
> is over-zealous).
>
> But if not, and this does detect only legitimate embedded bare
> repositories, we should use it as a reminder to consider how many
> use-cases and workflows will be affected by whatever approach we take
> here, if any.
>
> > * I'll experiment with (5), and if it seems promising, I'll propose this as an
> >   opt-in feature, with the intent of making it opt-out in the future. We'll
> >   opt-into this at Google to help figure out if this is a good default or not.
> >
> > * If that direction turns out not to be promising, I'll pursue (1), since that
> >   is the only option that can be configured per-repo, which should hopefully
> >   minimize the fallout.
>
> Here's an alternative approach, which I haven't seen discussed thus far:
>
> When a bare repository is embedded in another repository, avoid reading
> its config by default. This means that most Git commands will still
> work, but without the possibility of running any "executable" portions
> of the config. To opt-out (i.e., to allow legitimate use-cases to start
> reading embedded bare repository config again), the embedding repository
> would have to set a multi-valued `safe.embeddedRepo` configuration. This
> would specify a list of paths relative to the embedding repository's
> root of known-safe bare repositories.
>
> I think there are a couple of desirable attributes of this approach:
>
>   - It minimally disrupts bare repositories, restricting the change to
>     only embedded repositories.
>   - It allows most Git commands to continue working as expected (modulo
>     reading the config), hopefully making the population of users whose
>     workflows will suddenly break pretty small.
>   - It requires the user to explicitly opt-in to the unsafe behavior,
>     because an attacker could not influence the embedding repository's
>     `safe.embeddedRepo` config.
>
> If we were going to do something about this, I would strongly advocate
> for something that resembles the above. Or at the very least, some
> solution that captures the attributes I outlined there.

Nice - a more strict spin on proposal 3 from above, if I understand it
right. Rather than allowing any and all bare repos, avoid someone
sneaking in a malicious one next to legitimate ones by using an
allowlist. Seems reasonable to me.

 - Emily

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-21 18:25     ` Emily Shaffer
@ 2022-04-21 18:29       ` Emily Shaffer
  2022-04-21 18:47         ` Junio C Hamano
  2022-04-21 19:09       ` Taylor Blau
  1 sibling, 1 reply; 50+ messages in thread
From: Emily Shaffer @ 2022-04-21 18:29 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Glen Choo, Git List, justin, Johannes Schindelin,
	Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Junio C Hamano, brian m. carlson, Randall S. Becker

On Thu, Apr 21, 2022 at 11:25 AM Emily Shaffer <emilyshaffer@google.com> wrote:
>
> On Fri, Apr 15, 2022 at 6:28 PM Taylor Blau <me@ttaylorr.com> wrote:
> >
> > On Fri, Apr 15, 2022 at 05:41:59PM -0700, Glen Choo wrote:
> > > * We want additional protection on the client besides `git fsck`; there are
> > >   a few ways to do this:
> >
> > I'm a little late to chime into the thread, but I appreciate you
> > summarizing some of the approaches discussed so far. Let me add my
> > thoughts on each of these in order:
> >
> > >   1. Prevent checking out an embedded bare repo.
> > >   2. Detect if the bare repo is embedded and refuse to work with it.
> > >   3. Detect if the bare repo is embedded and do not read its config/hooks, but
> > >      everything else still 'works'.
> > >   4. Don't detect bare repos.
> > >   5. Only detect bare repos that are named `.git` [1].
> > >
> > >   (I've responded with my thoughts on each of these approaches in-thread).
> >
> >   1. Likely disrupts too many legitimate workflows for us to adopt
> >      without designing some way to declare an embedded bare repository
> >      is "safe".
> >   2. Ditto.
> >   3. This seems the most promising approach so far. Similar to (1), I
> >      would also want to make sure we provide an easy way to declare a
> >      bare repository as "safe" in order to avoid permanently disrupting
> >      valid workflows that have accumulated over the past >15 years.
>
> I'd like to think a little more about how we want to determine that a
> bare repo isn't embedded - wantonly scanning up the filesystem for any
> gitdir above the current one is really expensive. When I tried that
> approach for the purposes of including some shared config between
> superproject and submodules, it slowed down the Git test suite by
> something like 3-5x. So, I suppose that "we think this is bare" means
> refs/ and objects/ present in a directory that isn't named '.git/',
> and then we hunt anywhere above us for another .git/? Of course being
> careful not to accept another bare repo as the "parent" gitdir.
>
> Does it work for submodules? I suppose it works for non-bare
> submodules - and for bare submodules, e.g.
> 'submodule-having-project.git/modules/some-submodule/' we can rely on
> the user to turn that flag on if they want their submodule's config
> and hooks to be noticed from the gitdir. (From
> 'worktree-for-submodule-having-project/some-submodule' there is a
> '.git' pointer, so I'd expect things to work normally that way,
> right?)
>
> As long as we are careful to avoid searching the filesystem in *every*
> case, this seems like a pretty reasonable approach to me.
>
> >   4. Seems like this approach is too heavy-handed.
> >   5. Ditto.
> >
> > > With that in mind, here's what I propose we do next:
> > >
> > > * Merge the `git fsck` patch [2] if we think that it is useful despite the
> > >   potentially huge number of false positives. That patch needs some fixing; I'll
> > >   make the changes when I'm back.
> >
> > If there are lots of false positives, we should consider downgrading the
> > severity of the proposed `EMBEDDED_BARE_REPO` fsck check to "info". I'm
> > not clear if there is another reason why this patch would have a
> > significant number of false positives (i.e., if the detection mechanism
> > is over-zealous).
> >
> > But if not, and this does detect only legitimate embedded bare
> > repositories, we should use it as a reminder to consider how many
> > use-cases and workflows will be affected by whatever approach we take
> > here, if any.
> >
> > > * I'll experiment with (5), and if it seems promising, I'll propose this as an
> > >   opt-in feature, with the intent of making it opt-out in the future. We'll
> > >   opt-into this at Google to help figure out if this is a good default or not.
> > >
> > > * If that direction turns out not to be promising, I'll pursue (1), since that
> > >   is the only option that can be configured per-repo, which should hopefully
> > >   minimize the fallout.
> >
> > Here's an alternative approach, which I haven't seen discussed thus far:
> >
> > When a bare repository is embedded in another repository, avoid reading
> > its config by default. This means that most Git commands will still
> > work, but without the possibility of running any "executable" portions
> > of the config. To opt-out (i.e., to allow legitimate use-cases to start
> > reading embedded bare repository config again), the embedding repository
> > would have to set a multi-valued `safe.embeddedRepo` configuration. This
> > would specify a list of paths relative to the embedding repository's
> > root of known-safe bare repositories.
> >
> > I think there are a couple of desirable attributes of this approach:
> >
> >   - It minimally disrupts bare repositories, restricting the change to
> >     only embedded repositories.
> >   - It allows most Git commands to continue working as expected (modulo
> >     reading the config), hopefully making the population of users whose
> >     workflows will suddenly break pretty small.
> >   - It requires the user to explicitly opt-in to the unsafe behavior,
> >     because an attacker could not influence the embedding repository's
> >     `safe.embeddedRepo` config.
> >
> > If we were going to do something about this, I would strongly advocate
> > for something that resembles the above. Or at the very least, some
> > solution that captures the attributes I outlined there.
>
> Nice - a more strict spin on proposal 3 from above, if I understand it
> right. Rather than allowing any and all bare repos, avoid someone
> sneaking in a malicious one next to legitimate ones by using an
> allowlist. Seems reasonable to me.

Ah, another thing I forgot to mention. There has been a little
discussion in the past about isolating "safe" parts of config (and
gitdir) from "unsafe" parts, e.g. "which configs and embedded scripts
are executables", to help better protect from zipfile-type attacks,
which are very similar to this kind of attack. I wonder if it makes
sense to consider that sort of thing as a needswork for this bugfix?
e.g. "/* NEEDSWORK: Only ignore unsafe configs and hooks instead of
ignoring the entire embedded config and hooks in the future */"?

>
>  - Emily

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-21 18:29       ` Emily Shaffer
@ 2022-04-21 18:47         ` Junio C Hamano
  2022-04-21 18:54           ` Taylor Blau
  0 siblings, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2022-04-21 18:47 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: Taylor Blau, Glen Choo, Git List, justin, Johannes Schindelin,
	Ævar Arnfjörð Bjarmason, Derrick Stolee,
	brian m. carlson, Randall S. Becker

Emily Shaffer <emilyshaffer@google.com> writes:

>> Nice - a more strict spin on proposal 3 from above, if I understand it
>> right. Rather than allowing any and all bare repos, avoid someone
>> sneaking in a malicious one next to legitimate ones by using an
>> allowlist. Seems reasonable to me.
>
> Ah, another thing I forgot to mention. There has been a little
> discussion in the past about isolating "safe" parts of config (and
> gitdir) from "unsafe" parts, e.g. "which configs and embedded scripts
> are executables", to help better protect from zipfile-type attacks,
> which are very similar to this kind of attack. I wonder if it makes
> sense to consider that sort of thing as a needswork for this bugfix?
> e.g. "/* NEEDSWORK: Only ignore unsafe configs and hooks instead of
> ignoring the entire embedded config and hooks in the future */"?

There have been such discussions in the past and they all went
nowhere because such safe-listing fundamentally does not work.  What
you consider "safe" today may turn out to be "unsafe" and in a later
version of Git will stop honoring it, and for those who depended on
it being listed as "safe", such a security fix will be a regression.

Disabling the whole thing if the file can be tainted is the only
sensible way forward without promising users that they will be hurt
by such changes/regressions in the future, I would think.

So, no, I do not think such a NEEDSWORK comment is welcome.

Thanks.



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-21 18:47         ` Junio C Hamano
@ 2022-04-21 18:54           ` Taylor Blau
  0 siblings, 0 replies; 50+ messages in thread
From: Taylor Blau @ 2022-04-21 18:54 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Emily Shaffer, Taylor Blau, Glen Choo, Git List, justin,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason,
	Derrick Stolee, brian m. carlson, Randall S. Becker

On Thu, Apr 21, 2022 at 11:47:34AM -0700, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
>
> > Ah, another thing I forgot to mention. There has been a little
> > discussion in the past about isolating "safe" parts of config (and
> > gitdir) from "unsafe" parts, e.g. "which configs and embedded scripts
> > are executables", to help better protect from zipfile-type attacks,
> > which are very similar to this kind of attack. I wonder if it makes
> > sense to consider that sort of thing as a needswork for this bugfix?
> > e.g. "/* NEEDSWORK: Only ignore unsafe configs and hooks instead of
> > ignoring the entire embedded config and hooks in the future */"?
>
> There have been such discussions in the past and they all went
> nowhere because such safe-listing fundamentally does not work.  What
> you consider "safe" today may turn out to be "unsafe" and in a later
> version of Git will stop honoring it, and for those who depended on
> it being listed as "safe", such a security fix will be a regression.
>
> Disabling the whole thing if the file can be tainted is the only
> sensible way forward without promising users that they will be hurt
> by such changes/regressions in the future, I would think.

I assume when Junio says "safe-listing" he is talking about your "which
configs and embedded scripts are executables". I have tossed that idea
around in my own head for a little while, and in addition to the points
that Junio refers to, I think that this could be confusing for users.

I worry about forcing the user to consider which parts of their
config and hooks are being read/ignored, and then re-interpret their
meaning in light of that.

That seems like it would be an unnecessarily tricky position to put
users in, and I think we could get around it by either reading the
config/hooks, or ignoring them entirely.

(It also seems error-prone to me: just trying to list which parts of our
config could lead to command-execution is challenging for me at least.
In addition to the ongoing maintenance cost, a clever attacker would
almost certainly be able to spot some obscure piece of config that we
didn't consider and then use it in their attack.)

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-21 18:25     ` Emily Shaffer
  2022-04-21 18:29       ` Emily Shaffer
@ 2022-04-21 19:09       ` Taylor Blau
  2022-04-21 21:01         ` Emily Shaffer
  1 sibling, 1 reply; 50+ messages in thread
From: Taylor Blau @ 2022-04-21 19:09 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: Taylor Blau, Glen Choo, Git List, justin, Johannes Schindelin,
	Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Junio C Hamano, brian m. carlson, Randall S. Becker

On Thu, Apr 21, 2022 at 11:25:39AM -0700, Emily Shaffer wrote:
> On Fri, Apr 15, 2022 at 6:28 PM Taylor Blau <me@ttaylorr.com> wrote:
> >
> > On Fri, Apr 15, 2022 at 05:41:59PM -0700, Glen Choo wrote:
> > > * We want additional protection on the client besides `git fsck`; there are
> > >   a few ways to do this:
> >
> > I'm a little late to chime into the thread, but I appreciate you
> > summarizing some of the approaches discussed so far. Let me add my
> > thoughts on each of these in order:
> >
> > >   1. Prevent checking out an embedded bare repo.
> > >   2. Detect if the bare repo is embedded and refuse to work with it.
> > >   3. Detect if the bare repo is embedded and do not read its config/hooks, but
> > >      everything else still 'works'.
> > >   4. Don't detect bare repos.
> > >   5. Only detect bare repos that are named `.git` [1].
> > >
> > >   (I've responded with my thoughts on each of these approaches in-thread).
> >
> >   1. Likely disrupts too many legitimate workflows for us to adopt
> >      without designing some way to declare an embedded bare repository
> >      is "safe".
> >   2. Ditto.
> >   3. This seems the most promising approach so far. Similar to (1), I
> >      would also want to make sure we provide an easy way to declare a
> >      bare repository as "safe" in order to avoid permanently disrupting
> >      valid workflows that have accumulated over the past >15 years.
>
> I'd like to think a little more about how we want to determine that a
> bare repo isn't embedded - wantonly scanning up the filesystem for any
> gitdir above the current one is really expensive. When I tried that
> approach for the purposes of including some shared config between
> superproject and submodules, it slowed down the Git test suite by
> something like 3-5x. So, I suppose that "we think this is bare" means
> refs/ and objects/ present in a directory that isn't named '.git/',
> and then we hunt anywhere above us for another .git/? Of course being
> careful not to accept another bare repo as the "parent" gitdir.

Definitely worth considering. Fundamentally I think if you're going to
allow checking out embedded bare repositories (which I strongly think
that we should), any search for "is this repository embedded in another
one?" needs to walk along its entire path up to the filesystem root.

I'd think that we could offer some tools, perhaps "safe.boundary",
specifying a list of directories to halt our traversal at. IOW, if you
have a bunch of bare repositories underneath /data/repositories, and
know that they're all safe, you could set your boundary there to signal
to Git "if my path is a subdirectory of anything in safe.boundary, then
I don't care whether the repo is embedded or not".

> Does it work for submodules? I suppose it works for non-bare
> submodules - and for bare submodules, e.g.
> 'submodule-having-project.git/modules/some-submodule/' we can rely on
> the user to turn that flag on if they want their submodule's config
> and hooks to be noticed from the gitdir. (From
> 'worktree-for-submodule-having-project/some-submodule' there is a
> '.git' pointer, so I'd expect things to work normally that way,
> right?)

I don't enough about submodules to comment here, sorry.

> > Here's an alternative approach, which I haven't seen discussed thus
> > far: [...]
>
> Nice - a more strict spin on proposal 3 from above, if I understand it
> right. Rather than allowing any and all bare repos, avoid someone
> sneaking in a malicious one next to legitimate ones by using an
> allowlist. Seems reasonable to me.

We'd probably want to allow saying "all embedded bare repositories are
safe to read config/hooks from", too. I hadn't considered this approach
as a way to read some embedded repos and not others; I suspect the
overwhelmingly common use-case would be: `git config --local
safe.embeddedRepo '*'`.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-21 19:09       ` Taylor Blau
@ 2022-04-21 21:01         ` Emily Shaffer
  2022-04-21 21:22           ` Taylor Blau
  0 siblings, 1 reply; 50+ messages in thread
From: Emily Shaffer @ 2022-04-21 21:01 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Glen Choo, Git List, justin, Johannes Schindelin,
	Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Junio C Hamano, brian m. carlson, Randall S. Becker,
	Martin von Zweigbergk

On Thu, Apr 21, 2022 at 12:09 PM Taylor Blau <me@ttaylorr.com> wrote:
> On Thu, Apr 21, 2022 at 11:25:39AM -0700, Emily Shaffer wrote:
> > On Fri, Apr 15, 2022 at 6:28 PM Taylor Blau <me@ttaylorr.com> wrote:
> > >   3. This seems the most promising approach so far. Similar to (1), I
> > >      would also want to make sure we provide an easy way to declare a
> > >      bare repository as "safe" in order to avoid permanently disrupting
> > >      valid workflows that have accumulated over the past >15 years.
> >
> > I'd like to think a little more about how we want to determine that a
> > bare repo isn't embedded - wantonly scanning up the filesystem for any
> > gitdir above the current one is really expensive. When I tried that
> > approach for the purposes of including some shared config between
> > superproject and submodules, it slowed down the Git test suite by
> > something like 3-5x. So, I suppose that "we think this is bare" means
> > refs/ and objects/ present in a directory that isn't named '.git/',
> > and then we hunt anywhere above us for another .git/? Of course being
> > careful not to accept another bare repo as the "parent" gitdir.
>
> Definitely worth considering. Fundamentally I think if you're going to
> allow checking out embedded bare repositories (which I strongly think
> that we should), any search for "is this repository embedded in another
> one?" needs to walk along its entire path up to the filesystem root.
>
> I'd think that we could offer some tools, perhaps "safe.boundary",
> specifying a list of directories to halt our traversal at. IOW, if you
> have a bunch of bare repositories underneath /data/repositories, and
> know that they're all safe, you could set your boundary there to signal
> to Git "if my path is a subdirectory of anything in safe.boundary, then
> I don't care whether the repo is embedded or not".

Ah, I'm sorry to say that I was more on board before I read this mail
than I am after :)

I don't think there is much reason to continue searching above the
first found '.git' because we disallow '.git' from being committed or
checked out, right? So this extra filesystem hunting and ceiling math
seems unnecessary to me. I know I am slightly allergic to searching
the filesystem for a parent repo to begin with, so I'm sure I've got
some bias here ;)

> > > Here's an alternative approach, which I haven't seen discussed thus
> > > far: [...]
> >
> > Nice - a more strict spin on proposal 3 from above, if I understand it
> > right. Rather than allowing any and all bare repos, avoid someone
> > sneaking in a malicious one next to legitimate ones by using an
> > allowlist. Seems reasonable to me.
>
> We'd probably want to allow saying "all embedded bare repositories are
> safe to read config/hooks from", too. I hadn't considered this approach
> as a way to read some embedded repos and not others; I suspect the
> overwhelmingly common use-case would be: `git config --local
> safe.embeddedRepo '*'`.

Ah, I dislike this option for the exact reason I mentioned - avoiding
a malicious repo being snuck in next to legitimate repos. I'd prefer
to rely on exact matching only - but as the config needs to be set by
every contributor every time the set of bare repos changes, that
sounds impossible to manage for a project which may be constantly
adding and removing these repos.

That said.... I'm biased again, but if you want to be certain of the
state of another repository in order to run tests in it... why not use
a submodule for that repository? Not a helpful comment for those
already using embedded bare repos, though ;)

Thanks for the food for thought.
 - Emily

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-21 21:01         ` Emily Shaffer
@ 2022-04-21 21:22           ` Taylor Blau
  0 siblings, 0 replies; 50+ messages in thread
From: Taylor Blau @ 2022-04-21 21:22 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: Taylor Blau, Glen Choo, Git List, justin, Johannes Schindelin,
	Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Junio C Hamano, brian m. carlson, Randall S. Becker,
	Martin von Zweigbergk

On Thu, Apr 21, 2022 at 02:01:44PM -0700, Emily Shaffer wrote:
> I don't think there is much reason to continue searching above the
> first found '.git' because we disallow '.git' from being committed or
> checked out, right? So this extra filesystem hunting and ceiling math
> seems unnecessary to me. I know I am slightly allergic to searching
> the filesystem for a parent repo to begin with, so I'm sure I've got
> some bias here ;)

Right; once we find a non-bare embedding repository we should absolutely
stop our search. I was suggesting we could stop our search earlier for
environments that use non-embedded bare repositories with a fairly deep
directory structure.

E.g., if you store lots of bare repositories in /data/repositories and
you know that none of them are embedded, we could quickly determine
whether or not the cwd is a descendent of /data/repositories and avoid
the search entirely if so.

> > We'd probably want to allow saying "all embedded bare repositories are
> > safe to read config/hooks from", too. I hadn't considered this approach
> > as a way to read some embedded repos and not others; I suspect the
> > overwhelmingly common use-case would be: `git config --local
> > safe.embeddedRepo '*'`.
>
> Ah, I dislike this option for the exact reason I mentioned - avoiding
> a malicious repo being snuck in next to legitimate repos. I'd prefer
> to rely on exact matching only - but as the config needs to be set by
> every contributor every time the set of bare repos changes, that
> sounds impossible to manage for a project which may be constantly
> adding and removing these repos.

I share your distaste for this sort of thing, but I think we have to
recognize that there are likely to be many repos that embed dozens or
more bare repos inside of them, and asking them to opt-in each one
individually seems like a fairly large request for Git to make of them.

> That said.... I'm biased again, but if you want to be certain of the
> state of another repository in order to run tests in it... why not use
> a submodule for that repository? Not a helpful comment for those
> already using embedded bare repos, though ;)

Exactly, re: your last sentence.

> Thanks for the food for thought.

Ditto!

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-16  1:28   ` Taylor Blau
  2022-04-21 18:25     ` Emily Shaffer
@ 2022-04-29 23:57     ` Glen Choo
  2022-04-30  1:14       ` Taylor Blau
  2022-05-02 14:05       ` Philip Oakley
  1 sibling, 2 replies; 50+ messages in thread
From: Glen Choo @ 2022-04-29 23:57 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Emily Shaffer, justin, Taylor Blau, Johannes Schindelin,
	Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Junio C Hamano, brian m. carlson, rsbecker

Thanks so much for the response - it's really helpful, and there's a lot of food
for thought here.

(Sorry that I didn't get back to you sooner, there really is a lot of
great stuff here to think about :))

Taylor Blau <me@ttaylorr.com> writes:

> On Fri, Apr 15, 2022 at 05:41:59PM -0700, Glen Choo wrote:
>> * There are use cases for embedded bare repos that don't have great alternatives
>>   (e.g. libgit2 uses bare repos in its tests). Even if this workflow is frowned
>>   upon (I personally don't think we should support it), I don't think we're
>>   ready to categorically declare that Git should ban embedded bare repos
>>   altogether (e.g. the way we ban .GiT).
>
> I think there are a handful of legitimate reasons that we might want to
> continue supporting this; for projects like libgit2 and Git LFS, it's
> useful to have repositories in a known state to execute tests in. Having
> bare Git repositories contained in some "test fixture" directory is a
> really easy way to do just that.

If I were designing Git from scratch, I would probably block embedded bare repos
from being committed altogether - if an embedded bare repo doesn't behave
particularly differently from `.git` (which we pretty much agree we should not
support), then this is just an inherently dangerous way to work.

But yes, we have historically allowed embedded bare repos, and I don't think we
should stop supporting them altogether. For instance, I don't see a good
alternative for the test fixture use case:

- Submodules aren't a good fit because they only allow you to include the
  contents of a submodule's tree, whereas in a test fixture, you really do want
  the gitdir internals to be source controlled so that you get nice predictable
  results.
- Users could store the repos in some other form e.g. CDN, tarball. It's fine
  when running from a test script, but it's pretty awful to author/review any
  changes.
- Perhaps the users could munge the bare repo at commit time e.g. instead of
  storing (refs/, objects/, HEAD), they could store (test_refs/, test_objects/,
  test_HEAD), which would later get turned into the bare repo in the test
  script. It's a little silly, but not unreasonable for a test script, I think.

So I'll leave behind this idea of "blocking embedding bare repos" for
now; I think there are more promising proposals in this thread.

>>   1. Prevent checking out an embedded bare repo.
>>   2. Detect if the bare repo is embedded and refuse to work with it.
>>   3. Detect if the bare repo is embedded and do not read its config/hooks, but
>>      everything else still 'works'.
>>   4. Don't detect bare repos.
>>   5. Only detect bare repos that are named `.git` [1].
>>
>>   (I've responded with my thoughts on each of these approaches in-thread).
>
>   1. Likely disrupts too many legitimate workflows for us to adopt
>      without designing some way to declare an embedded bare repository
>      is "safe".
>   2. Ditto.
>   3. This seems the most promising approach so far. Similar to (1), I
>      would also want to make sure we provide an easy way to declare a
>      bare repository as "safe" in order to avoid permanently disrupting
>      valid workflows that have accumulated over the past >15 years.
>   4. Seems like this approach is too heavy-handed.
>   5. Ditto.

If I understand you correctly, it seems like we can ship any of the options from
1.-3., provided there is an easy way to opt-in known, "safe" bare repos.

After some more reflection, I tend to agree that 4.-5. are too heavy-handed for
the general population because they would break nearly all bare repo users. The
performance cost of detecting an embedded bare repo isn't ideal either, but that
can be easily fixed via GIT_CEILING_DIRECTORIES or an allow-list of "safe"
repos, and it's a lot gentler than just ignoring bare repos altogether.

I suspect that there is a subset of the user population who would love to
disable bare repo detection because they a) never expect to see bare repos in a
given environment and b) want to be defensive about the possibility of
unexpected bare repos. Since this really heavy-handed, I don't think this fixes
the embedded bare repo problem for everyone, but that subset of users will be
very well-served :) I think that there must be some coherent framework that
encompasses this idea + the allow-list you proposed, but I haven't come up with
one yet.

>> With that in mind, here's what I propose we do next:
>>
>> * Merge the `git fsck` patch [2] if we think that it is useful despite the
>>   potentially huge number of false positives. That patch needs some fixing; I'll
>>   make the changes when I'm back.
>
> If there are lots of false positives, we should consider downgrading the
> severity of the proposed `EMBEDDED_BARE_REPO` fsck check to "info". I'm
> not clear if there is another reason why this patch would have a
> significant number of false positives (i.e., if the detection mechanism
> is over-zealous).
>
> But if not, and this does detect only legitimate embedded bare
> repositories, we should use it as a reminder to consider how many
> use-cases and workflows will be affected by whatever approach we take
> here, if any.

Yes, that's good to keep in mind. After mulling about it some more, I don't have
a clear direction on the fsck patch to be honest, I'll leave this alone for now
and I'll return to it if I get a clearer picture.

>> * I'll experiment with (5), and if it seems promising, I'll propose this as an
>>   opt-in feature, with the intent of making it opt-out in the future. We'll
>>   opt-into this at Google to help figure out if this is a good default or not.
>>
>> * If that direction turns out not to be promising, I'll pursue (1), since that
>>   is the only option that can be configured per-repo, which should hopefully
>>   minimize the fallout.
>
> Here's an alternative approach, which I haven't seen discussed thus far:
>
> When a bare repository is embedded in another repository, avoid reading
> its config by default. This means that most Git commands will still
> work, but without the possibility of running any "executable" portions
> of the config. To opt-out (i.e., to allow legitimate use-cases to start
> reading embedded bare repository config again), the embedding repository
> would have to set a multi-valued `safe.embeddedRepo` configuration. This
> would specify a list of paths relative to the embedding repository's
> root of known-safe bare repositories.
>
> I think there are a couple of desirable attributes of this approach:
>
>   - It minimally disrupts bare repositories, restricting the change to
>     only embedded repositories.
>   - It allows most Git commands to continue working as expected (modulo
>     reading the config), hopefully making the population of users whose
>     workflows will suddenly break pretty small.
>   - It requires the user to explicitly opt-in to the unsafe behavior,
>     because an attacker could not influence the embedding repository's
>     `safe.embeddedRepo` config.
>
> If we were going to do something about this, I would strongly advocate
> for something that resembles the above. Or at the very least, some
> solution that captures the attributes I outlined there.

I really like the `safe.embeddedRepo` idea, though I'm not convinced about
"respect only the safe parts of the embedded repo". I'll address the latter
first.

I think brian m. carlson was coming from a similar direction, and the "respect
only the safe parts of the embedded repo" part of the proposal sounds similar to
his [1]. Both seem to be motivated by your second point - protect as many
workflows as we can. It's a good guiding principle, and I think it's a good
place to start from. That said, I'm not sure that this proposal serves these
users that well:

- Not reading the config might break the embedded bare repos in ways we don't
  expect (e.g. not reading core.repositoryformatversion).

- Users who use embedded bare repos as test fixtures presumably want their tests
  to mimic real-world scenarios as closely as possible; running in this
  half-state of "use some parts of the repo but not others" doesn't seem a good
  fit for that use case.

- This complicates the rules significantly for the user, who now has to figure
  out which parts of the bare repo are respected and which are not.

- I'm also of the opinion that changing the rules like this actually does affect
  workflows, even if it doesn't break libgit2's tests.
  - A diligent user still has to convince themselves that the tests are passing
    for the right reasons, possibly adapting to the new rules (e.g. by
    selectively enabling `safe.embeddedRepo` on the right test fixtures).
  - A less diligent user might not even realize the change has happened and
    end up with difficult to debug results somewhere down the line.

I'm also not keen on it for other reasons:

- This expands the attack surface significantly, and I'm pessimistic that we
  can maintain a list of the 'safe' parts of a bare repo. A lot of attention has
  been focused on config/hooks, but I wouldn't be surprised if a creative
  attacker finds some other avenue that we missed (maybe a buffer overflow
  attack on a malicious index file?).

- I expect that this is also going to be really complex to implement and
  maintain; instead of looking in a single gitdir for everything, we now look in
  two gitdirs.

What is promising is an allow-listing scheme like `safe.embeddedRepo` that can
be enabled on a per-repository basis. . Using an allowlist to selectively choose
*entire* embedded bare repos preserves the first and third attributes you
described, and it keeps things simple(-ish) for us and users. Breaking libgit2
and Git LFS this way is pretty harsh, but it will give us the confidence that we
have communicated the behavior change (and fix!) to the relevant users, rather
than having them find out the wrong way.

Some extra thoughts (in case they're helpful):

- It's pretty important to get the format of `safe.embeddedRepo` 'correct', but
  what 'correct' is is up for debate. For example, should we allow '*'? (I think
  so, but I know some don't ;)).

- There might be some unifying principles behind "allowlisting certain embedded
  bare repos" and "disabling/enabling bare repo detection" that can guide our
  fix.
  - Perhaps we could allow different 'levels' of bare repo protection, like
    'allow all bare repos', 'allow only non-embedded bare repos', 'allow no bare
    repos', 'allow embedded bare repos but not their configs".

  - If we do want to discourage embedded bare repos (and flip the default), this
    kind of gradual roll-out might give projects a way to incrementally migrate.

[1] https://lore.kernel.org/git/Yk9Wcr74gvhtyOi7@camp.crustytoothpaste.net

>
> I would be happy to work together with you (or anybody!) on developing
> patches in that direction, so let me know if you are interested in
> coordinating our efforts.

Thanks! Once again, I really appreciate the response. I think it's moved the
discussion forward in a meaningful way :) I think the easiest next step is for
me (or whoever :)) to propose patches and to hash out the details there.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-29 23:57     ` Glen Choo
@ 2022-04-30  1:14       ` Taylor Blau
  2022-05-02 19:39         ` Glen Choo
  2022-05-02 14:05       ` Philip Oakley
  1 sibling, 1 reply; 50+ messages in thread
From: Taylor Blau @ 2022-04-30  1:14 UTC (permalink / raw)
  To: Glen Choo
  Cc: git, Emily Shaffer, justin, Johannes Schindelin,
	Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Junio C Hamano, brian m. carlson, rsbecker

On Fri, Apr 29, 2022 at 04:57:51PM -0700, Glen Choo wrote:
> Thanks so much for the response - it's really helpful, and there's a lot of food
> for thought here.
>
> (Sorry that I didn't get back to you sooner, there really is a lot of
> great stuff here to think about :))

No problem, I'm glad that you found it helpful (and likewise!).

> So I'll leave behind this idea of "blocking embedding bare repos" for
> now; I think there are more promising proposals in this thread.

I am in strong agreement with you here, but I would add an additional
point which is that even if we encouraged submodules over embedded bare
repos, or suggested storing bare repos on CDNs or what have you, there's
significant momentum in the "do nothing" category which we have to take
into account, too.

So even if we made it as easy as possible to convert to, for e.g., using
submodules, getting the millions (?) of repositories with embedded bare
repositories in them that have accumulated over the years to actually
change seems virtually impossible to me.

> >>   1. Prevent checking out an embedded bare repo.
> >>   2. Detect if the bare repo is embedded and refuse to work with it.
> >>   3. Detect if the bare repo is embedded and do not read its config/hooks, but
> >>      everything else still 'works'.
> >>   4. Don't detect bare repos.
> >>   5. Only detect bare repos that are named `.git` [1].
> >>
> >>   (I've responded with my thoughts on each of these approaches in-thread).
> >
> >   1. Likely disrupts too many legitimate workflows for us to adopt
> >      without designing some way to declare an embedded bare repository
> >      is "safe".
> >   2. Ditto.
> >   3. This seems the most promising approach so far. Similar to (1), I
> >      would also want to make sure we provide an easy way to declare a
> >      bare repository as "safe" in order to avoid permanently disrupting
> >      valid workflows that have accumulated over the past >15 years.
> >   4. Seems like this approach is too heavy-handed.
> >   5. Ditto.
>
> If I understand you correctly, it seems like we can ship any of the options from
> 1.-3., provided there is an easy way to opt-in known, "safe" bare repos.

After thinking about it some more, I think that we should probably try
to ship (3) of the ones that we agree are viable, but more on that
below...

> Yes, that's good to keep in mind. After mulling about it some more, I don't have
> a clear direction on the fsck patch to be honest, I'll leave this alone for now
> and I'll return to it if I get a clearer picture.

Sounds good. I'm happy to have ideas bounced off of me in the meantime ;).

> I really like the `safe.embeddedRepo` idea, though I'm not convinced about
> "respect only the safe parts of the embedded repo". I'll address the latter
> first.

To be clear, I am advocating for "only the safe parts" insofar as "read
repository extensions, core.repositoryFormatVersion and literally
nothing else". I'm definitely not suggesting we go and enumerate every
configurable value, determine whether it's safe or not, and then read
only the safe ones. That approach seems doomed to fail, since no matter
how clever we are, there will always be some slightly-cleverer attacker
who can find a vector that we missed.

> I think brian m. carlson was coming from a similar direction, and the "respect
> only the safe parts of the embedded repo" part of the proposal sounds similar to
> his [1]. Both seem to be motivated by your second point - protect as many
> workflows as we can. It's a good guiding principle, and I think it's a good
> place to start from. That said, I'm not sure that this proposal serves these
> users that well:
>
> - Not reading the config might break the embedded bare repos in ways we don't
>   expect (e.g. not reading core.repositoryformatversion).

I'm hoping that if we include this among the list of essential- and
known-safe config values that we'll mitigate this reasonably well.

> - Users who use embedded bare repos as test fixtures presumably want their tests
>   to mimic real-world scenarios as closely as possible; running in this
>   half-state of "use some parts of the repo but not others" doesn't seem a good
>   fit for that use case.

It's hard to estimate how many of these tests will get broken. But I
think the important trade-off to consider between "abort all Git
operations in embedded bare repositories" and "warn, and avoid reading
config/hooks" is how disruptive the change will be. The latter seems far
less disruptive to me, so I would rather us favor that over an
overly-conservative approach.

> - This complicates the rules significantly for the user, who now has to figure
>   out which parts of the bare repo are respected and which are not.

On this point I disagree, but I suspect we weren't on the same page
about what "only the safe parts" meant when you wrote this. To be
extra-extra clear, I don't think we should read some parts of config and
not other, I mean we should read _only_ the above listed parts (the
format version and extensions) and nothing else.

> - I'm also of the opinion that changing the rules like this actually does affect
>   workflows, even if it doesn't break libgit2's tests.
>   - A diligent user still has to convince themselves that the tests are passing
>     for the right reasons, possibly adapting to the new rules (e.g. by
>     selectively enabling `safe.embeddedRepo` on the right test fixtures).
>   - A less diligent user might not even realize the change has happened and
>     end up with difficult to debug results somewhere down the line.

I am sympathetic to what you're saying, but I (a) think there's still a
tradeoff here that doesn't obviously point us in one direction or the
other and (b) we should equally keep in mind other workflows besides
just test fixtures. Does that change our thinking at all? I'm not sure.

> I'm also not keen on it for other reasons:
>
> - This expands the attack surface significantly, and I'm pessimistic that we
>   can maintain a list of the 'safe' parts of a bare repo. A lot of attention has
>   been focused on config/hooks, but I wouldn't be surprised if a creative
>   attacker finds some other avenue that we missed (maybe a buffer overflow
>   attack on a malicious index file?).

I disagree, though again I suspect we were thinking of different thing
when saying "only read safe parts of the config". Still though, I would
argue that it limits the attack surface at the right level, which is to
say any vector that we _did_ miss is something that we should just fix
(e.g., preventing a buffer overflow) and not "oops, this config value
does specify an executable".

(We shouldn't have to deal with the index file, though, since a bare
repository would not read the index, no?).

> - I expect that this is also going to be really complex to implement and
>   maintain; instead of looking in a single gitdir for everything, we now look in
>   two gitdirs.

I'd think that any approach we take that has different behavior
for bare repositories depending on whether or not they are embedded has
to do a similar check, so I don't think this adds significant
complexity. Though not having written any code here yet, I'd take what I
say with a huge grain of salt ;-).

> What is promising is an allow-listing scheme like `safe.embeddedRepo` that can
> be enabled on a per-repository basis. . Using an allowlist to selectively choose
> *entire* embedded bare repos preserves the first and third attributes you
> described, and it keeps things simple(-ish) for us and users. Breaking libgit2
> and Git LFS this way is pretty harsh, but it will give us the confidence that we
> have communicated the behavior change (and fix!) to the relevant users, rather
> than having them find out the wrong way.

Hmm. I still am pretty convicted that this (avoid working with unknown
embedded bare repositories) is too harsh of a change to
make the default. Replace "Breaking libgit2 and Git LFS" with "breaking
many hundreds of thousands of repositories" [1], and I think that we
would need to come up with something more lightweight.

> Some extra thoughts (in case they're helpful):
>
> - It's pretty important to get the format of `safe.embeddedRepo` 'correct', but
>   what 'correct' is is up for debate. For example, should we allow '*'? (I think
>   so, but I know some don't ;)).

Stolee (cc'd) will have an interesting perspective here (at least as it
relates to the 2.35.3 release), I think.

> - There might be some unifying principles behind "allowlisting certain embedded
>   bare repos" and "disabling/enabling bare repo detection" that can guide our
>   fix.
>   - Perhaps we could allow different 'levels' of bare repo protection, like
>     'allow all bare repos', 'allow only non-embedded bare repos', 'allow no bare
>     repos', 'allow embedded bare repos but not their configs".
>
>   - If we do want to discourage embedded bare repos (and flip the default), this
>     kind of gradual roll-out might give projects a way to incrementally migrate.

Ah! Are you suggesting a global configuration setting that controls the
behavior of embedded bare repositories that _aren't_ listed in a
repositories safe.embeddedRepo list?

That sort of thing could work, though I'd argue strongly that any Git
2.x.y release should make the default behavior "avoid config/hooks"
as opposed to "refuse to work". We could consider changing that default
in Git 3.x.y, but I feel strongly that any Git 2.x.y release should
cater as much to existing workflows as possible without significantly
compromising on attack surface area.

Thanks,
Taylor

[1]: I have no idea if that figure is right, but I suspect it's in the
right order of magnitude. I could look into it further, though.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-29 23:57     ` Glen Choo
  2022-04-30  1:14       ` Taylor Blau
@ 2022-05-02 14:05       ` Philip Oakley
  2022-05-02 18:50         ` Junio C Hamano
  1 sibling, 1 reply; 50+ messages in thread
From: Philip Oakley @ 2022-05-02 14:05 UTC (permalink / raw)
  To: Glen Choo, Taylor Blau
  Cc: git, Emily Shaffer, justin, Johannes Schindelin,
	Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Junio C Hamano, brian m. carlson, rsbecker

On 30/04/2022 00:57, Glen Choo wrote:
> If I were designing Git from scratch, I would probably block embedded bare repos
> from being committed altogether - if an embedded bare repo doesn't behave
> particularly differently from `.git` (which we pretty much agree we should not
> support), then this is just an inherently dangerous way to work.
>
> But yes, we have historically allowed embedded bare repos, and I don't think we
> should stop supporting them altogether. For instance, I don't see a good
> alternative for the test fixture use case:
>
> - Submodules aren't a good fit because they only allow you to include the
>   contents of a submodule's tree, whereas in a test fixture, you really do want
>   the gitdir internals to be source controlled so that you get nice predictable
>   results.
> - Users could store the repos in some other form e.g. CDN, tarball. It's fine
>   when running from a test script, but it's pretty awful to author/review any
>   changes.

Can I check that storing bundles (of other repos) within a repository is
considered reasonably safe?

I have been looking at how Git's documentation could carry with it small
exemplar repositories that cover the commit hierarchies shown in the
various man pages to allow users to see, explore and understand the man
page examples. I'd settled on bundles as a reasonable compromise, with
the exemplar repositories being generated via the test suite
(obviously). With the recent focus on security issues, I thought it
worth asking now.

> - Perhaps the users could munge the bare repo at commit time e.g. instead of
>   storing (refs/, objects/, HEAD), they could store (test_refs/, test_objects/,
>   test_HEAD), which would later get turned into the bare repo in the test
>   script. It's a little silly, but not unreasonable for a test script, I think.
--
Philip

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-05-02 14:05       ` Philip Oakley
@ 2022-05-02 18:50         ` Junio C Hamano
  0 siblings, 0 replies; 50+ messages in thread
From: Junio C Hamano @ 2022-05-02 18:50 UTC (permalink / raw)
  To: Philip Oakley
  Cc: Glen Choo, Taylor Blau, git, Emily Shaffer, justin,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason,
	Derrick Stolee, brian m. carlson, rsbecker

Philip Oakley <philipoakley@iee.email> writes:

> Can I check that storing bundles (of other repos) within a repository is
> considered reasonably safe?

I think the threat model we are protecting against is that we
shouldn't have to worry about "git clone" and then "cd" into the
hierarchy.  If you did "make" in there and their Makefile (or
"runme.sh" shipped as part of the tree) is malicious, that is not
something we can solve.

So, I would say it is safe to have a bundle, as in its statinary
state it does not actively do anything bad, even when you did "ls"
in a directory that stores it.

> I have been looking at how Git's documentation could carry with it small
> exemplar repositories that cover the commit hierarchies shown in the
> various man pages to allow users to see, explore and understand the man
> page examples. I'd settled on bundles as a reasonable compromise, with
> the exemplar repositories being generated via the test suite
> (obviously). With the recent focus on security issues, I thought it
> worth asking now.

A bundle would be an OK vehicle.  This depends on the size of the
sample project, but another that may be more suitable may be to
create the repository on the fly in the "test suite" you were
planning to use to extract from the bundle.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Bare repositories in the working tree are a security risk
  2022-04-30  1:14       ` Taylor Blau
@ 2022-05-02 19:39         ` Glen Choo
  0 siblings, 0 replies; 50+ messages in thread
From: Glen Choo @ 2022-05-02 19:39 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Emily Shaffer, justin, Johannes Schindelin,
	Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Junio C Hamano, brian m. carlson, rsbecker

Taylor Blau <me@ttaylorr.com> writes:

>> >>   1. Prevent checking out an embedded bare repo.
>> >>   2. Detect if the bare repo is embedded and refuse to work with it.
>> >>   3. Detect if the bare repo is embedded and do not read its config/hooks, but
>> >>      everything else still 'works'.
>> >>   4. Don't detect bare repos.
>> >>   5. Only detect bare repos that are named `.git` [1].
>> >>
>> >>   (I've responded with my thoughts on each of these approaches in-thread).
>> >
>> >   1. Likely disrupts too many legitimate workflows for us to adopt
>> >      without designing some way to declare an embedded bare repository
>> >      is "safe".
>> >   2. Ditto.
>> >   3. This seems the most promising approach so far. Similar to (1), I
>> >      would also want to make sure we provide an easy way to declare a
>> >      bare repository as "safe" in order to avoid permanently disrupting
>> >      valid workflows that have accumulated over the past >15 years.
>> >   4. Seems like this approach is too heavy-handed.
>> >   5. Ditto.
>>
>> If I understand you correctly, it seems like we can ship any of the options from
>> 1.-3., provided there is an easy way to opt-in known, "safe" bare repos.
>
> After thinking about it some more, I think that we should probably try
> to ship (3) of the ones that we agree are viable, but more on that
> below...
>
>> I really like the `safe.embeddedRepo` idea, though I'm not convinced about
>> "respect only the safe parts of the embedded repo". I'll address the latter
>> first.
>
> To be clear, I am advocating for "only the safe parts" insofar as "read
> repository extensions, core.repositoryFormatVersion and literally
> nothing else". I'm definitely not suggesting we go and enumerate every
> configurable value, determine whether it's safe or not, and then read
> only the safe ones. That approach seems doomed to fail, since no matter
> how clever we are, there will always be some slightly-cleverer attacker
> who can find a vector that we missed.

[...]

>> - This complicates the rules significantly for the user, who now has to figure
>>   out which parts of the bare repo are respected and which are not.
>
> On this point I disagree, but I suspect we weren't on the same page
> about what "only the safe parts" meant when you wrote this. To be
> extra-extra clear, I don't think we should read some parts of config and
> not other, I mean we should read _only_ the above listed parts (the
> format version and extensions) and nothing else.

Ah, to clarify, I'm taking an even more pessimistic stance here, which
is that I'd prefer to avoid trusting any parts of the repo (not the
config) as "safe", e.g. I'm not even keen on something as
innocuous-looking as "We trust _only_ HEAD, refs/ and objects/".

Maybe this is overly-pessimistic though; your response (further down)
suggests that this might not be as bad as I think.

>> - I'm also of the opinion that changing the rules like this actually does affect
>>   workflows, even if it doesn't break libgit2's tests.
>>   - A diligent user still has to convince themselves that the tests are passing
>>     for the right reasons, possibly adapting to the new rules (e.g. by
>>     selectively enabling `safe.embeddedRepo` on the right test fixtures).
>>   - A less diligent user might not even realize the change has happened and
>>     end up with difficult to debug results somewhere down the line.
>
> I am sympathetic to what you're saying, but I (a) think there's still a
> tradeoff here that doesn't obviously point us in one direction or the
> other and (b) we should equally keep in mind other workflows besides
> just test fixtures. Does that change our thinking at all? I'm not sure.

Just musing here. I think (b) is particularly important to keep in mind.
If a workflow is well-served by something else, I really don't mind
breaking it at all.

I suppose the test fixture use case comes up the most frequently because
it is one that we are familiar with, and doesn't seem to have a good
alternative.

Maybe we're approaching the limits of what we can know without
performing some studies of users in the wild.

>> I'm also not keen on it for other reasons:
>>
>> - This expands the attack surface significantly, and I'm pessimistic that we
>>   can maintain a list of the 'safe' parts of a bare repo. A lot of attention has
>>   been focused on config/hooks, but I wouldn't be surprised if a creative
>>   attacker finds some other avenue that we missed (maybe a buffer overflow
>>   attack on a malicious index file?).
>
> I disagree, though again I suspect we were thinking of different thing
> when saying "only read safe parts of the config". Still though, I would
> argue that it limits the attack surface at the right level, which is to
> say any vector that we _did_ miss is something that we should just fix
> (e.g., preventing a buffer overflow) and not "oops, this config value
> does specify an executable".
>
> (We shouldn't have to deal with the index file, though, since a bare
> repository would not read the index, no?).

I think this comment highlights the difference in opinion (in a good
way!).

IMO 'only reading parts of the repo and not others' will fix this
particular arbitrary code execution problem, but it won't fix (and might
even open us up to) other, future problems with embedded bare repos.

From this comment, I get the impression (and I hope I'm representing you
correctly) that you think these future problems are manageable, and
don't justify breaking more workflows than we need to in order to fix
this arbitrary code execution problem.

I think it's a healthy disagreement though - there's enough common
ground for us to start working our approaches separately (plus I don't
think either of us will be able to convince the other by thinking up
hypotheticals on a mailing list thread :p), and the discussion will
probably get even better once we start comparing, reviewing, and
internally testing the patches.

(Ah, I'm not sure about the index file in particular, I was just
speculating.)

>> - I expect that this is also going to be really complex to implement and
>>   maintain; instead of looking in a single gitdir for everything, we now look in
>>   two gitdirs.
>
> I'd think that any approach we take that has different behavior
> for bare repositories depending on whether or not they are embedded has
> to do a similar check, so I don't think this adds significant
> complexity. Though not having written any code here yet, I'd take what I
> say with a huge grain of salt ;-).

;)

>> - There might be some unifying principles behind "allowlisting certain embedded
>>   bare repos" and "disabling/enabling bare repo detection" that can guide our
>>   fix.
>>   - Perhaps we could allow different 'levels' of bare repo protection, like
>>     'allow all bare repos', 'allow only non-embedded bare repos', 'allow no bare
>>     repos', 'allow embedded bare repos but not their configs".
>>
>>   - If we do want to discourage embedded bare repos (and flip the default), this
>>     kind of gradual roll-out might give projects a way to incrementally migrate.
>
> Ah! Are you suggesting a global configuration setting that controls the
> behavior of embedded bare repositories that _aren't_ listed in a
> repositories safe.embeddedRepo list?

Yes, and an even more restrictive mode that disables bare repo detection
altogether. I suspect this is a safe, non-disruptive default for many
user, but it obviously can't be the default today (or possibly ever?
we'd need a lot of user testing for sure).

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2022-05-02 19:39 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-06 22:43 Bare repositories in the working tree are a security risk Glen Choo
2022-04-06 23:22 ` [PATCH] fsck: detect bare repos in trees and warn Glen Choo
2022-04-07 12:42   ` Johannes Schindelin
2022-04-07 13:21     ` Derrick Stolee
2022-04-07 14:14       ` Ævar Arnfjörð Bjarmason
2022-04-14 20:02         ` Glen Choo
2022-04-15 12:46           ` Ævar Arnfjörð Bjarmason
2022-04-07 15:11       ` Junio C Hamano
2022-04-13 22:24       ` Glen Choo
2022-04-07 13:12   ` Ævar Arnfjörð Bjarmason
2022-04-07 15:20   ` Junio C Hamano
2022-04-07 18:38 ` Bare repositories in the working tree are a security risk John Cai
2022-04-07 21:24 ` brian m. carlson
2022-04-07 21:53   ` Justin Steven
2022-04-07 22:10     ` brian m. carlson
2022-04-07 22:40       ` rsbecker
2022-04-08  5:54       ` Junio C Hamano
2022-04-14  0:03         ` Junio C Hamano
2022-04-14  0:04         ` Glen Choo
2022-04-13 23:44       ` Glen Choo
2022-04-13 20:37 ` Glen Choo
2022-04-13 23:36   ` Junio C Hamano
2022-04-14 16:41     ` Glen Choo
2022-04-14 17:35       ` Junio C Hamano
2022-04-14 18:19         ` Junio C Hamano
2022-04-15 21:33         ` Glen Choo
2022-04-15 22:17           ` Junio C Hamano
2022-04-16  0:52             ` Taylor Blau
2022-04-15 22:43           ` Glen Choo
2022-04-15 20:13       ` Junio C Hamano
2022-04-15 23:45         ` Glen Choo
2022-04-15 23:59           ` Glen Choo
2022-04-16  1:00           ` Taylor Blau
2022-04-16  1:18             ` Junio C Hamano
2022-04-16  1:30               ` Taylor Blau
2022-04-16  0:34 ` Glen Choo
2022-04-16  0:41 ` Glen Choo
2022-04-16  1:28   ` Taylor Blau
2022-04-21 18:25     ` Emily Shaffer
2022-04-21 18:29       ` Emily Shaffer
2022-04-21 18:47         ` Junio C Hamano
2022-04-21 18:54           ` Taylor Blau
2022-04-21 19:09       ` Taylor Blau
2022-04-21 21:01         ` Emily Shaffer
2022-04-21 21:22           ` Taylor Blau
2022-04-29 23:57     ` Glen Choo
2022-04-30  1:14       ` Taylor Blau
2022-05-02 19:39         ` Glen Choo
2022-05-02 14:05       ` Philip Oakley
2022-05-02 18:50         ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).