All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] avoid touching ref code in non-repositories
@ 2016-03-05 22:08 Jeff King
  2016-03-05 22:10 ` [PATCH 1/6] setup: make startup_info available everywhere Jeff King
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Jeff King @ 2016-03-05 22:08 UTC (permalink / raw)
  To: David Turner; +Cc: git, Michael Haggerty, Nguyễn Thái Ngọc Duy

Here's the series I mentioned earlier, to avoid avoid accessing the ref
code if we know we aren't in a repository. For those just joining us,
the end goal is to be able to detect and complain when code tries to
open a ref without having run check_repository_format() or similar. Once
we have pluggable ref backends, doing so would be wrong, since we won't
know which backend we're supposed to be using.

It's already _kind of_ wrong, in the sense that we should not be looking
for file-backend refs if we know we don't have a repository. But nobody
really noticed in practice, because you do not generally have
".git/HEAD" lying around in non-repository directories. So it was a de
facto noop, though you could construct pathological cases where it did
the wrong thing.

The patches are:

  [1/6]: setup: make startup_info available everywhere
  [2/6]: setup: set startup_info->have_repository more reliably
  [3/6]: remote: don't resolve HEAD in non-repository
  [4/6]: mailmap: do not resolve blobs in a non-repository
  [5/6]: grep: turn off gitlink detection for --no-index
  [6/6]: use setup_git_directory() in test-* programs

There's a sort-of "7/6" I used to find these spots:

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 81f68f8..22c1b6d 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -947,6 +947,8 @@ static struct ref_cache *lookup_ref_cache(const char *submodule)
 {
 	struct ref_cache *refs;
 
+	assert(startup_info->have_repository);
+
 	if (!submodule || !*submodule)
 		return &ref_cache;
 
@@ -1414,6 +1416,8 @@ static const char *resolve_ref_1(const char *refname,
 	int depth = MAXDEPTH;
 	int bad_name = 0;
 
+	assert(startup_info->have_repository);
+
 	if (flags)
 		*flags = 0;
 

Running the test suite with that patch and 1/6 turned up the cases I've
fixed here. But I'm not including it here for a few reasons:

  1. I'm not sure that asserting is the best behavior. It's great for
     finding problems (and that's why I used assert() -- to get a
     coredump for the backtrace), but in practice it might be friendlier
     to turn it into a silent noop ("no, I don't even have a ref
     backend, so your ref definitely does not exist").

  2. In a pluggable ref-backend world, this isn't where we'd put the
     asserts, anyway. We'd do it when accessing the ref_backend vtable
     (or just starting with a dummy vtable that asserts, or returns
     "nope, no refs", or whatever).

So it was useful for debugging/analysis now, and we may want something
like it later, but I think that should come on top the ref-backend
series.

-Peff

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 1/6] setup: make startup_info available everywhere
  2016-03-05 22:08 [PATCH 0/6] avoid touching ref code in non-repositories Jeff King
@ 2016-03-05 22:10 ` Jeff King
  2016-03-05 22:11 ` [PATCH 2/6] setup: set startup_info->have_repository more reliably Jeff King
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2016-03-05 22:10 UTC (permalink / raw)
  To: David Turner; +Cc: git, Michael Haggerty, Nguyễn Thái Ngọc Duy

Commit a60645f (setup: remember whether repository was
found, 2010-08-05) introduced the startup_info structure,
which records some parts of the setup_git_directory()
process (notably, whether we actually found a repository or
not).

One of the uses of this data is for functions to behave
appropriately based on whether we are in a repo. But the
startup_info struct is just a pointer to storage provided by
the main program, and the only program that sets it up is
the git.c wrapper. Thus builtins have access to
startup_info, but externally linked programs do not.

Worse, library code which is accessible from both has to be
careful about accessing startup_info. This can be used to
trigger a die("BUG") via get_sha1():

	$ git fast-import <<-\EOF
	tag foo
	from HEAD:./whatever
	EOF

	fatal: BUG: startup_info struct is not initialized.

Obviously that's fairly nonsensical input to feed to
fast-import, but we should never hit a die("BUG"). And there
may be other ways to trigger it if other non-builtins
resolve sha1s.

So let's point the storage for startup_info to a static
variable in setup.c, making it available to all users of the
library code. We _could_ turn startup_info into a regular
extern struct, but doing so would mean tweaking all of the
existing use sites. So let's leave the pointer indirection
in place.  We can, however, drop any checks for NULL, as
they will always be false (and likewise, we can drop the
test covering this case, which was a rather artificial
situation using one of the test-* programs).

Signed-off-by: Jeff King <peff@peff.net>
---
 cache.h                        |  2 +-
 environment.c                  |  1 -
 git.c                          |  3 ---
 setup.c                        | 10 ++++++----
 sha1_name.c                    |  3 ---
 t/t1506-rev-parse-diagnosis.sh |  5 -----
 6 files changed, 7 insertions(+), 17 deletions(-)

diff --git a/cache.h b/cache.h
index b829410..9f09540 100644
--- a/cache.h
+++ b/cache.h
@@ -1771,7 +1771,7 @@ int split_cmdline(char *cmdline, const char ***argv);
 /* Takes a negative value returned by split_cmdline */
 const char *split_cmdline_strerror(int cmdline_errno);
 
-/* git.c */
+/* setup.c */
 struct startup_info {
 	int have_repository;
 	const char *prefix;
diff --git a/environment.c b/environment.c
index 6dec9d0..6cc0a77 100644
--- a/environment.c
+++ b/environment.c
@@ -64,7 +64,6 @@ int grafts_replace_parents = 1;
 int core_apply_sparse_checkout;
 int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
-struct startup_info *startup_info;
 unsigned long pack_size_limit_cfg;
 
 #ifndef PROTECT_HFS_DEFAULT
diff --git a/git.c b/git.c
index 6cc0c07..968a8a4 100644
--- a/git.c
+++ b/git.c
@@ -15,7 +15,6 @@ const char git_more_info_string[] =
 	   "concept guides. See 'git help <command>' or 'git help <concept>'\n"
 	   "to read about a specific subcommand or concept.");
 
-static struct startup_info git_startup_info;
 static int use_pager = -1;
 static char *orig_cwd;
 static const char *env_names[] = {
@@ -637,8 +636,6 @@ int main(int argc, char **av)
 	const char *cmd;
 	int done_help = 0;
 
-	startup_info = &git_startup_info;
-
 	cmd = git_extract_argv0_path(argv[0]);
 	if (!cmd)
 		cmd = "git-help";
diff --git a/setup.c b/setup.c
index de1a2a7..fa7a306 100644
--- a/setup.c
+++ b/setup.c
@@ -7,6 +7,9 @@ static int inside_work_tree = -1;
 static int work_tree_config_is_bogus;
 static struct string_list unknown_extensions = STRING_LIST_INIT_DUP;
 
+static struct startup_info the_startup_info;
+struct startup_info *startup_info = &the_startup_info;
+
 /*
  * The input parameter must contain an absolute path, and it must already be
  * normalized.
@@ -905,10 +908,9 @@ const char *setup_git_directory_gently(int *nongit_ok)
 	else
 		setenv(GIT_PREFIX_ENVIRONMENT, "", 1);
 
-	if (startup_info) {
-		startup_info->have_repository = !nongit_ok || !*nongit_ok;
-		startup_info->prefix = prefix;
-	}
+	startup_info->have_repository = !nongit_ok || !*nongit_ok;
+	startup_info->prefix = prefix;
+
 	return prefix;
 }
 
diff --git a/sha1_name.c b/sha1_name.c
index 3acf221..776101e 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -1353,9 +1353,6 @@ static char *resolve_relative_path(const char *rel)
 	if (!starts_with(rel, "./") && !starts_with(rel, "../"))
 		return NULL;
 
-	if (!startup_info)
-		die("BUG: startup_info struct is not initialized.");
-
 	if (!is_inside_work_tree())
 		die("relative path syntax can't be used outside working tree.");
 
diff --git a/t/t1506-rev-parse-diagnosis.sh b/t/t1506-rev-parse-diagnosis.sh
index 613d9bf..86c2ff2 100755
--- a/t/t1506-rev-parse-diagnosis.sh
+++ b/t/t1506-rev-parse-diagnosis.sh
@@ -166,11 +166,6 @@ test_expect_success 'relative path when cwd is outside worktree' '
 	grep "relative path syntax can.t be used outside working tree." error
 '
 
-test_expect_success 'relative path when startup_info is NULL' '
-	test_must_fail test-match-trees HEAD:./file.txt HEAD:./file.txt 2>error &&
-	grep "BUG: startup_info struct is not initialized." error
-'
-
 test_expect_success '<commit>:file correctly diagnosed after a pathname' '
 	test_must_fail git rev-parse file.txt HEAD:file.txt 1>actual 2>error &&
 	test_i18ngrep ! "exists on disk" error &&
-- 
2.8.0.rc1.318.g2193183

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/6] setup: set startup_info->have_repository more reliably
  2016-03-05 22:08 [PATCH 0/6] avoid touching ref code in non-repositories Jeff King
  2016-03-05 22:10 ` [PATCH 1/6] setup: make startup_info available everywhere Jeff King
@ 2016-03-05 22:11 ` Jeff King
  2016-03-05 22:11 ` [PATCH 3/6] remote: don't resolve HEAD in non-repository Jeff King
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2016-03-05 22:11 UTC (permalink / raw)
  To: David Turner; +Cc: git, Michael Haggerty, Nguyễn Thái Ngọc Duy

When setup_git_directory() is called, we set a flag in
startup_info to indicate we have a repository. But there are
a few other mechanisms by which we might set up a repo:

  1. When creating a new repository via init_db(), we
     transition from no-repo to being in a repo. We should
     tweak this flag at that moment.

  2. In enter_repo(), a stricter form of
     setup_git_directory() used by server-side programs, we
     check the repository format config. After doing so, we
     know we're in a repository, and can set the flag.

With these changes, library code can now reliably tell
whether we are in a repository and act accordingly. We'll
leave the "prefix" field as NULL, which is what happens when
setup_git_directory() finds there is no prefix.

Signed-off-by: Jeff King <peff@peff.net>
---
I did these patches directly on master.  When applied on top of the
other check_repository_format series I posted a few days ago, this one
will have a minor textual conflict (that other series dropped the
pointless return from this function).

 builtin/init-db.c | 1 +
 setup.c           | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index 6223b7d..da531f6 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -322,6 +322,7 @@ int set_git_dir_init(const char *git_dir, const char *real_git_dir,
 		set_git_dir(real_path(git_dir));
 		git_link = NULL;
 	}
+	startup_info->have_repository = 1;
 	return 0;
 }
 
diff --git a/setup.c b/setup.c
index fa7a306..3439ec6 100644
--- a/setup.c
+++ b/setup.c
@@ -986,7 +986,9 @@ int check_repository_format_version(const char *var, const char *value, void *cb
 
 int check_repository_format(void)
 {
-	return check_repository_format_gently(get_git_dir(), NULL);
+	check_repository_format_gently(get_git_dir(), NULL);
+	startup_info->have_repository = 1;
+	return 0;
 }
 
 /*
-- 
2.8.0.rc1.318.g2193183

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/6] remote: don't resolve HEAD in non-repository
  2016-03-05 22:08 [PATCH 0/6] avoid touching ref code in non-repositories Jeff King
  2016-03-05 22:10 ` [PATCH 1/6] setup: make startup_info available everywhere Jeff King
  2016-03-05 22:11 ` [PATCH 2/6] setup: set startup_info->have_repository more reliably Jeff King
@ 2016-03-05 22:11 ` Jeff King
  2016-03-05 22:13 ` [PATCH 4/6] mailmap: do not resolve blobs in a non-repository Jeff King
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2016-03-05 22:11 UTC (permalink / raw)
  To: David Turner; +Cc: git, Michael Haggerty, Nguyễn Thái Ngọc Duy

The remote-config code wants to look at HEAD to mark the
current branch specially. But if we are not in a repository
(e.g., running "git archive --remote"), this makes no sense;
there is no HEAD to look at, and we have no current branch.

This doesn't really cause any bugs in practice (if you are
not in a repo, you probably don't have a .git/HEAD file),
but we should be more careful about triggering the refs code
at all in a non-repo. As we grow new ref backends, we would
not even know which backend to use.

Signed-off-by: Jeff King <peff@peff.net>
---
 remote.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/remote.c b/remote.c
index fc02698..28fd676 100644
--- a/remote.c
+++ b/remote.c
@@ -455,7 +455,6 @@ static void read_config(void)
 {
 	static int loaded;
 	struct object_id oid;
-	const char *head_ref;
 	int flag;
 
 	if (loaded)
@@ -463,10 +462,12 @@ static void read_config(void)
 	loaded = 1;
 
 	current_branch = NULL;
-	head_ref = resolve_ref_unsafe("HEAD", 0, oid.hash, &flag);
-	if (head_ref && (flag & REF_ISSYMREF) &&
-	    skip_prefix(head_ref, "refs/heads/", &head_ref)) {
-		current_branch = make_branch(head_ref, 0);
+	if (startup_info->have_repository) {
+		const char *head_ref = resolve_ref_unsafe("HEAD", 0, oid.hash, &flag);
+		if (head_ref && (flag & REF_ISSYMREF) &&
+		    skip_prefix(head_ref, "refs/heads/", &head_ref)) {
+			current_branch = make_branch(head_ref, 0);
+		}
 	}
 	git_config(handle_config, NULL);
 	alias_all_urls();
-- 
2.8.0.rc1.318.g2193183

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/6] mailmap: do not resolve blobs in a non-repository
  2016-03-05 22:08 [PATCH 0/6] avoid touching ref code in non-repositories Jeff King
                   ` (2 preceding siblings ...)
  2016-03-05 22:11 ` [PATCH 3/6] remote: don't resolve HEAD in non-repository Jeff King
@ 2016-03-05 22:13 ` Jeff King
  2016-03-05 22:15 ` [PATCH 5/6] grep: turn off gitlink detection for --no-index Jeff King
  2016-03-05 22:16 ` [PATCH 6/6] use setup_git_directory() in test-* programs Jeff King
  5 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2016-03-05 22:13 UTC (permalink / raw)
  To: David Turner; +Cc: git, Michael Haggerty, Nguyễn Thái Ngọc Duy

The mailmap code may be triggered outside of a repository by
git-shortlog. There is no point in looking up a name like
"HEAD:.mailmap" there; without a repository, we have no
refs.

This is unlikely to matter much in practice for the current
code, as we would simply fail to find the ref. But as the
refs code learns about new backends, this is more important;
without a repository, we do not even know which backend to
look at.

Signed-off-by: Jeff King <peff@peff.net>
---
I debated whether ".mailmap" in the $PWD should be respected in a
non-repository. If it were 2005, I think I could go either way. But it's
not, and changing it now seems like a pointless and unnecessary
regression, even if there is some grounds to argue for it.

 mailmap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mailmap.c b/mailmap.c
index f4a0f1c..9726237 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -250,7 +250,8 @@ int read_mailmap(struct string_list *map, char **repo_abbrev)
 		git_mailmap_blob = "HEAD:.mailmap";
 
 	err |= read_mailmap_file(map, ".mailmap", repo_abbrev);
-	err |= read_mailmap_blob(map, git_mailmap_blob, repo_abbrev);
+	if (startup_info->have_repository)
+		err |= read_mailmap_blob(map, git_mailmap_blob, repo_abbrev);
 	err |= read_mailmap_file(map, git_mailmap_file, repo_abbrev);
 	return err;
 }
-- 
2.8.0.rc1.318.g2193183

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 5/6] grep: turn off gitlink detection for --no-index
  2016-03-05 22:08 [PATCH 0/6] avoid touching ref code in non-repositories Jeff King
                   ` (3 preceding siblings ...)
  2016-03-05 22:13 ` [PATCH 4/6] mailmap: do not resolve blobs in a non-repository Jeff King
@ 2016-03-05 22:15 ` Jeff King
  2016-03-07  1:29   ` Junio C Hamano
  2016-03-05 22:16 ` [PATCH 6/6] use setup_git_directory() in test-* programs Jeff King
  5 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2016-03-05 22:15 UTC (permalink / raw)
  To: David Turner; +Cc: git, Michael Haggerty, Nguyễn Thái Ngọc Duy

If we are running "git grep --no-index" outside of a git
repository, we behave roughly like "grep -r", examining all
files in the current directory and its subdirectories.
However, because we use fill_directory() to do the
recursion, it will skip over any directories which look like
sub-repositories.

For a normal git operation (like "git grep" in a repository)
this makes sense; we do not want to cross the boundary out
of our current repository into a submodule. But for
"--no-index" without a repository, we should look at all
files, including embedded repositories.

There is one exception, though: we probably should _not_
descend into ".git" directories. Doing so is inefficient and
unlikely to turn up useful hits.

This patch drops our use of dir.c's gitlink-detection, but
we do still avoid ".git". That makes us more like tools such
as "ack" or "ag", which also know to avoid cruft in .git.

As a bonus, this also drops our usage of the ref code
when we are outside of a repository, making the transition
to pluggable ref backends cleaner.

Based-on-a-patch-by: David Turner <dturner@twopensource.com>
Signed-off-by: Jeff King <peff@peff.net>
---
I hope the reasoning above makes sense. My ulterior motive is the
"bonus", but I really think the new behavior is what people would expect
(i.e., that "git grep --no-index" is basically a replacement for "ack",
etc).

 builtin/grep.c  |  1 +
 t/t7810-grep.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/builtin/grep.c b/builtin/grep.c
index aa7435f..0636cd7 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -528,6 +528,7 @@ static int grep_directory(struct grep_opt *opt, const struct pathspec *pathspec,
 	int i, hit = 0;
 
 	memset(&dir, 0, sizeof(dir));
+	dir.flags |= DIR_NO_GITLINKS;
 	if (exc_std)
 		setup_standard_excludes(&dir);
 
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index b540944..1e72971 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -905,6 +905,33 @@ test_expect_success 'inside git repository but with --no-index' '
 	)
 '
 
+test_expect_success 'grep --no-index descends into repos, but not .git' '
+	rm -fr non &&
+	mkdir -p non/git &&
+	(
+		GIT_CEILING_DIRECTORIES="$(pwd)/non" &&
+		export GIT_CEILING_DIRECTORIES &&
+		cd non/git &&
+
+		echo magic >file &&
+		git init repo &&
+		(
+			cd repo &&
+			echo magic >file &&
+			git add file &&
+			git commit -m foo &&
+			echo magic >.git/file
+		) &&
+
+		cat >expect <<-\EOF &&
+		file
+		repo/file
+		EOF
+		git grep -l --no-index magic >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_expect_success 'setup double-dash tests' '
 cat >double-dash <<EOF &&
 --
-- 
2.8.0.rc1.318.g2193183

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 6/6] use setup_git_directory() in test-* programs
  2016-03-05 22:08 [PATCH 0/6] avoid touching ref code in non-repositories Jeff King
                   ` (4 preceding siblings ...)
  2016-03-05 22:15 ` [PATCH 5/6] grep: turn off gitlink detection for --no-index Jeff King
@ 2016-03-05 22:16 ` Jeff King
  2016-03-07  1:28   ` Junio C Hamano
  5 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2016-03-05 22:16 UTC (permalink / raw)
  To: David Turner; +Cc: git, Michael Haggerty, Nguyễn Thái Ngọc Duy

Some of the test-* programs rely on examining refs, but did
not bother to make sure we are actually in a git repository.
Let's have them call setup_git_directory() to do so.

Signed-off-by: Jeff King <peff@peff.net>
---
As discussed elsewhere, test-match-trees isn't actually used in the test
suite (except for the weird use which was removed in patch 1 here). So
we could also just get rid of it, but maybe somebody uses it for
debugging or something.

 test-match-trees.c      | 2 ++
 test-revision-walking.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/test-match-trees.c b/test-match-trees.c
index 109f03e..4dad709 100644
--- a/test-match-trees.c
+++ b/test-match-trees.c
@@ -6,6 +6,8 @@ int main(int ac, char **av)
 	unsigned char hash1[20], hash2[20], shifted[20];
 	struct tree *one, *two;
 
+	setup_git_directory();
+
 	if (get_sha1(av[1], hash1))
 		die("cannot parse %s as an object name", av[1]);
 	if (get_sha1(av[2], hash2))
diff --git a/test-revision-walking.c b/test-revision-walking.c
index 285f06b..3d03133 100644
--- a/test-revision-walking.c
+++ b/test-revision-walking.c
@@ -50,6 +50,8 @@ int main(int argc, char **argv)
 	if (argc < 2)
 		return 1;
 
+	setup_git_directory();
+
 	if (!strcmp(argv[1], "run-twice")) {
 		printf("1st\n");
 		if (!run_revision_walk())
-- 
2.8.0.rc1.318.g2193183

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 6/6] use setup_git_directory() in test-* programs
  2016-03-05 22:16 ` [PATCH 6/6] use setup_git_directory() in test-* programs Jeff King
@ 2016-03-07  1:28   ` Junio C Hamano
  0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2016-03-07  1:28 UTC (permalink / raw)
  To: Jeff King
  Cc: David Turner, git, Michael Haggerty,
	Nguyễn Thái Ngọc Duy

Jeff King <peff@peff.net> writes:

> Some of the test-* programs rely on examining refs, but did
> not bother to make sure we are actually in a git repository.
> Let's have them call setup_git_directory() to do so.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> As discussed elsewhere, test-match-trees isn't actually used in the test
> suite (except for the weird use which was removed in patch 1 here). So
> we could also just get rid of it, but maybe somebody uses it for
> debugging or something.

I tend to agree that it is very tempting to remove that one.  It was
a throw-away program I wrote early while inventing the subtree merge
strategy.

It would be useful only to those who want to extend the heuristics
of figuring out how to shift two trees to make them match, without
having to exercise the whole merge-recursive machinery.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 5/6] grep: turn off gitlink detection for --no-index
  2016-03-05 22:15 ` [PATCH 5/6] grep: turn off gitlink detection for --no-index Jeff King
@ 2016-03-07  1:29   ` Junio C Hamano
  2016-03-07 15:51     ` [PATCH v2 " Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2016-03-07  1:29 UTC (permalink / raw)
  To: Jeff King
  Cc: David Turner, git, Michael Haggerty,
	Nguyễn Thái Ngọc Duy

Jeff King <peff@peff.net> writes:

> If we are running "git grep --no-index" outside of a git
> repository, we behave roughly like "grep -r", examining all
> files in the current directory and its subdirectories.
> However, because we use fill_directory() to do the
> recursion, it will skip over any directories which look like
> sub-repositories.
>
> For a normal git operation (like "git grep" in a repository)
> this makes sense; we do not want to cross the boundary out
> of our current repository into a submodule. But for
> "--no-index" without a repository, we should look at all
> files, including embedded repositories.
>
> There is one exception, though: we probably should _not_
> descend into ".git" directories. Doing so is inefficient and
> unlikely to turn up useful hits.
>
> This patch drops our use of dir.c's gitlink-detection, but
> we do still avoid ".git". That makes us more like tools such
> as "ack" or "ag", which also know to avoid cruft in .git.
>
> As a bonus, this also drops our usage of the ref code
> when we are outside of a repository, making the transition
> to pluggable ref backends cleaner.
>
> Based-on-a-patch-by: David Turner <dturner@twopensource.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I hope the reasoning above makes sense. My ulterior motive is the
> "bonus", but I really think the new behavior is what people would expect
> (i.e., that "git grep --no-index" is basically a replacement for "ack",
> etc).

I agree with --no-index part, but the caller of grep_directory() is
"either no-index, or untracked".  I am not sure if the latter wants
this new behaviour.

>
>  builtin/grep.c  |  1 +
>  t/t7810-grep.sh | 27 +++++++++++++++++++++++++++
>  2 files changed, 28 insertions(+)
>
> diff --git a/builtin/grep.c b/builtin/grep.c
> index aa7435f..0636cd7 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -528,6 +528,7 @@ static int grep_directory(struct grep_opt *opt, const struct pathspec *pathspec,
>  	int i, hit = 0;
>  
>  	memset(&dir, 0, sizeof(dir));
> +	dir.flags |= DIR_NO_GITLINKS;
>  	if (exc_std)
>  		setup_standard_excludes(&dir);
>  
> diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
> index b540944..1e72971 100755
> --- a/t/t7810-grep.sh
> +++ b/t/t7810-grep.sh
> @@ -905,6 +905,33 @@ test_expect_success 'inside git repository but with --no-index' '
>  	)
>  '
>  
> +test_expect_success 'grep --no-index descends into repos, but not .git' '
> +	rm -fr non &&
> +	mkdir -p non/git &&
> +	(
> +		GIT_CEILING_DIRECTORIES="$(pwd)/non" &&
> +		export GIT_CEILING_DIRECTORIES &&
> +		cd non/git &&
> +
> +		echo magic >file &&
> +		git init repo &&
> +		(
> +			cd repo &&
> +			echo magic >file &&
> +			git add file &&
> +			git commit -m foo &&
> +			echo magic >.git/file
> +		) &&
> +
> +		cat >expect <<-\EOF &&
> +		file
> +		repo/file
> +		EOF
> +		git grep -l --no-index magic >actual &&
> +		test_cmp expect actual
> +	)
> +'
> +
>  test_expect_success 'setup double-dash tests' '
>  cat >double-dash <<EOF &&
>  --

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 5/6] grep: turn off gitlink detection for --no-index
  2016-03-07  1:29   ` Junio C Hamano
@ 2016-03-07 15:51     ` Jeff King
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2016-03-07 15:51 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: David Turner, git, Michael Haggerty,
	Nguyễn Thái Ngọc Duy

On Sun, Mar 06, 2016 at 05:29:01PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > If we are running "git grep --no-index" outside of a git
> > repository, we behave roughly like "grep -r", examining all
> > files in the current directory and its subdirectories.
> > However, because we use fill_directory() to do the
> > recursion, it will skip over any directories which look like
> > sub-repositories.
> >
> > For a normal git operation (like "git grep" in a repository)
> > this makes sense; we do not want to cross the boundary out
> > of our current repository into a submodule. But for
> > "--no-index" without a repository, we should look at all
> > files, including embedded repositories.
> >
> > There is one exception, though: we probably should _not_
> > descend into ".git" directories. Doing so is inefficient and
> > unlikely to turn up useful hits.
> >
> > This patch drops our use of dir.c's gitlink-detection, but
> > we do still avoid ".git". That makes us more like tools such
> > as "ack" or "ag", which also know to avoid cruft in .git.
> >
> > As a bonus, this also drops our usage of the ref code
> > when we are outside of a repository, making the transition
> > to pluggable ref backends cleaner.
> >
> > Based-on-a-patch-by: David Turner <dturner@twopensource.com>
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> > I hope the reasoning above makes sense. My ulterior motive is the
> > "bonus", but I really think the new behavior is what people would expect
> > (i.e., that "git grep --no-index" is basically a replacement for "ack",
> > etc).
> 
> I agree with --no-index part, but the caller of grep_directory() is
> "either no-index, or untracked".  I am not sure if the latter wants
> this new behaviour.

I didn't consider that. I could see arguments either way for
--untracked, but I think I'd lean towards it doing the "usual" thing
with embedded untracked repos (i.e., ignoring them).

We could do that by checking startup_info->have_repository. Which makes
me wonder if that check should actually go deep into dir.c, and all
callers (if there are any) who don't have a repository should not treat
gitlinks specially.

But I guess one can actually do "--no-index" inside a repository, so we
really do need to predicate it on that option, not "are we in a repo".

So here's a replacement patch:

-- >8 --
Subject: grep: turn off gitlink detection for --no-index

If we are running "git grep --no-index" outside of a git
repository, we behave roughly like "grep -r", examining all
files in the current directory and its subdirectories.
However, because we use fill_directory() to do the
recursion, it will skip over any directories which look like
sub-repositories.

For a normal git operation (like "git grep" in a repository)
this makes sense; we do not want to cross the boundary out
of our current repository into a submodule. But for
"--no-index" without a repository, we should look at all
files, including embedded repositories.

There is one exception, though: we probably should _not_
descend into ".git" directories. Doing so is inefficient and
unlikely to turn up useful hits.

This patch drops our use of dir.c's gitlink-detection, but
we do still avoid ".git". That makes us more like tools such
as "ack" or "ag", which also know to avoid cruft in .git.

As a bonus, this also drops our usage of the ref code
when we are outside of a repository, making the transition
to pluggable ref backends cleaner.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/grep.c  |  6 ++++--
 t/t7810-grep.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index aa7435f..111b6f6 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -522,12 +522,14 @@ static int grep_objects(struct grep_opt *opt, const struct pathspec *pathspec,
 }
 
 static int grep_directory(struct grep_opt *opt, const struct pathspec *pathspec,
-			  int exc_std)
+			  int exc_std, int use_index)
 {
 	struct dir_struct dir;
 	int i, hit = 0;
 
 	memset(&dir, 0, sizeof(dir));
+	if (!use_index)
+		dir.flags |= DIR_NO_GITLINKS;
 	if (exc_std)
 		setup_standard_excludes(&dir);
 
@@ -902,7 +904,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		int use_exclude = (opt_exclude < 0) ? use_index : !!opt_exclude;
 		if (list.nr)
 			die(_("--no-index or --untracked cannot be used with revs."));
-		hit = grep_directory(&opt, &pathspec, use_exclude);
+		hit = grep_directory(&opt, &pathspec, use_exclude, use_index);
 	} else if (0 <= opt_exclude) {
 		die(_("--[no-]exclude-standard cannot be used for tracked contents."));
 	} else if (!list.nr) {
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index b540944..1e72971 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -905,6 +905,33 @@ test_expect_success 'inside git repository but with --no-index' '
 	)
 '
 
+test_expect_success 'grep --no-index descends into repos, but not .git' '
+	rm -fr non &&
+	mkdir -p non/git &&
+	(
+		GIT_CEILING_DIRECTORIES="$(pwd)/non" &&
+		export GIT_CEILING_DIRECTORIES &&
+		cd non/git &&
+
+		echo magic >file &&
+		git init repo &&
+		(
+			cd repo &&
+			echo magic >file &&
+			git add file &&
+			git commit -m foo &&
+			echo magic >.git/file
+		) &&
+
+		cat >expect <<-\EOF &&
+		file
+		repo/file
+		EOF
+		git grep -l --no-index magic >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_expect_success 'setup double-dash tests' '
 cat >double-dash <<EOF &&
 --
-- 
2.8.0.rc1.318.g2193183

^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-03-07 15:51 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-05 22:08 [PATCH 0/6] avoid touching ref code in non-repositories Jeff King
2016-03-05 22:10 ` [PATCH 1/6] setup: make startup_info available everywhere Jeff King
2016-03-05 22:11 ` [PATCH 2/6] setup: set startup_info->have_repository more reliably Jeff King
2016-03-05 22:11 ` [PATCH 3/6] remote: don't resolve HEAD in non-repository Jeff King
2016-03-05 22:13 ` [PATCH 4/6] mailmap: do not resolve blobs in a non-repository Jeff King
2016-03-05 22:15 ` [PATCH 5/6] grep: turn off gitlink detection for --no-index Jeff King
2016-03-07  1:29   ` Junio C Hamano
2016-03-07 15:51     ` [PATCH v2 " Jeff King
2016-03-05 22:16 ` [PATCH 6/6] use setup_git_directory() in test-* programs Jeff King
2016-03-07  1:28   ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.