All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC/PATCH] grep --no-index: allow to grep without git exclusions
@ 2011-07-20 12:50 Bert Wesarg
  2011-07-20 20:57 ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Bert Wesarg @ 2011-07-20 12:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Bert Wesarg

It is currently not possible to grep in arbitrary files with --no-index,
because the standard git exclusions are active when using --no-index.

Suppress this when the new option --no-exclude-standard was provided.

Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
---

The name comes from git-ls-files, and I think it is worthwhile to add the full
range of exclusion flags to git-grep. Namely --exclude=<patter>,
--exclude-from=<file>, and --exclude-per-directory=<file>. Which should 
only be honored when using --no-index, obviously.

 Documentation/git-grep.txt |    7 +++++
 builtin/grep.c             |   17 ++++++++++---
 t/t7810-grep.sh            |   55 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 75 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 07b3c6a..be9cf8b 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -24,6 +24,7 @@ SYNOPSIS
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
 	   [--cached | --no-index | <tree>...]
+	   [--no-exclude-standard]
 	   [--] [<pathspec>...]
 
 DESCRIPTION
@@ -200,6 +201,12 @@ OPTIONS
 	Do not output matched lines; instead, exit with status 0 when
 	there is a match and with non-zero status when there isn't.
 
+--exclude-standard::
+--[no-]exclude-standard::
+	Use the standard git exclusions: .git/info/exclude, .gitignore
+	in each directory, and the user's global exclusion file when
+	searching in the work tree with `--no-index`.
+
 <tree>...::
 	Instead of searching tracked files in the working tree, search
 	blobs in the given trees.
diff --git a/builtin/grep.c b/builtin/grep.c
index cccf8da..f2fce73 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -637,13 +637,15 @@ static int grep_objects(struct grep_opt *opt, const struct pathspec *pathspec,
 	return hit;
 }
 
-static int grep_directory(struct grep_opt *opt, const struct pathspec *pathspec)
+static int grep_directory(struct grep_opt *opt, const struct pathspec *pathspec,
+			  int exclude_standard)
 {
 	struct dir_struct dir;
 	int i, hit = 0;
 
 	memset(&dir, 0, sizeof(dir));
-	setup_standard_excludes(&dir);
+	if (exclude_standard)
+		setup_standard_excludes(&dir);
 
 	fill_directory(&dir, pathspec->raw);
 	for (i = 0; i < dir.nr; i++) {
@@ -761,7 +763,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	struct string_list path_list = STRING_LIST_INIT_NODUP;
 	int i;
 	int dummy;
-	int use_index = 1;
+	int use_index = 1, exclude_standard = 1;
 	enum {
 		pattern_type_unspecified = 0,
 		pattern_type_bre,
@@ -839,6 +841,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		OPT_BOOLEAN('p', "show-function", &opt.funcname,
 			"show a line with the function name before matches"),
 		OPT_GROUP(""),
+		OPT_BOOLEAN(0, "exclude-standard", &exclude_standard,
+			"don't use standard excludes, needs --no-index"),
+		OPT_GROUP(""),
 		OPT_CALLBACK('f', NULL, &opt, "file",
 			"read patterns from file", file_callback),
 		{ OPTION_CALLBACK, 'e', NULL, &opt, "pattern",
@@ -936,6 +941,10 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		break; /* nothing */
 	}
 
+	/* --no-exclude-standard needs --no-index */
+	if (use_index && !exclude_standard)
+		die(_("--no-exclude-standard does not make sense without --no-index."));
+
 	if (use_index && !startup_info->have_repository)
 		/* die the same way as if we did it at the beginning */
 		setup_git_directory();
@@ -1050,7 +1059,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			die(_("--cached cannot be used with --no-index."));
 		if (list.nr)
 			die(_("--no-index cannot be used with revs."));
-		hit = grep_directory(&opt, &pathspec);
+		hit = grep_directory(&opt, &pathspec, exclude_standard);
 	} else if (!list.nr) {
 		if (!cached)
 			setup_work_tree();
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index a29ae45..15eb52b 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -587,6 +587,61 @@ test_expect_success 'inside git repository but with --no-index' '
 	)
 '
 
+test_expect_success 'outside of git repository and --no-exclude-standard' '
+	rm -fr non &&
+	mkdir -p non/git/sub &&
+	echo hello >non/git/file1 &&
+	echo world >non/git/sub/file2 &&
+	echo ".*o*" >non/git/.gitignore &&
+	echo "fi*" >>non/git/.gitignore &&
+	{
+		echo file1:hello &&
+		echo sub/file2:world
+	} >non/expect.full &&
+	: >non/expect.empty &&
+	echo file2:world >non/expect.sub &&
+	(
+		GIT_CEILING_DIRECTORIES="$(pwd)/non/git" &&
+		export GIT_CEILING_DIRECTORIES &&
+		cd non/git &&
+		test_must_fail git grep l &&
+		git grep --no-index --no-exclude-standard l >../actual.full &&
+		test_cmp ../expect.full ../actual.full
+		cd sub &&
+		test_must_fail git grep l &&
+		git grep --no-index l >../../actual.sub &&
+		test_cmp ../../expect.sub ../../actual.sub
+	)
+'
+
+test_expect_success 'inside git repository but with --no-index and --no-exclude-standard' '
+	rm -fr is &&
+	mkdir -p is/git/sub &&
+	echo hello >is/git/file1 &&
+	echo world >is/git/sub/file2 &&
+	echo ".*o*" >is/git/.gitignore &&
+	echo "fi*" >>is/git/.gitignore &&
+	{
+		echo file1:hello &&
+		echo sub/file2:world
+	} >is/expect.full &&
+	: >is/expect.empty &&
+	echo file2:world >is/expect.sub &&
+	(
+		cd is/git &&
+		git init &&
+		test_must_fail git grep l >../actual.empty1 &&
+		test_cmp ../expect.empty ../actual.empty1 &&
+		git grep --no-index --no-exclude-standard l >../actual.full &&
+		test_cmp ../expect.full ../actual.full &&
+		cd sub &&
+		test_must_fail git grep l >../../actual.sub &&
+		test_cmp ../../expect.empty ../../actual.sub &&
+		git grep --no-index --no-exclude-standard l >../../actual.sub &&
+		test_cmp ../../expect.sub ../../actual.sub
+	)
+'
+
 test_expect_success 'setup double-dash tests' '
 cat >double-dash <<EOF &&
 --
-- 
1.7.6.588.g8d735

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC/PATCH] grep --no-index: allow to grep without git exclusions
  2011-07-20 12:50 [RFC/PATCH] grep --no-index: allow to grep without git exclusions Bert Wesarg
@ 2011-07-20 20:57 ` Junio C Hamano
  2011-07-21  3:47   ` Nguyen Thai Ngoc Duy
  2011-07-21  7:11   ` Bert Wesarg
  0 siblings, 2 replies; 7+ messages in thread
From: Junio C Hamano @ 2011-07-20 20:57 UTC (permalink / raw)
  To: Bert Wesarg; +Cc: git

Taken in total isolation, this patch does allow a use case where we did
not allow, but when considered in a larger picture of what "grep" is used
for and how different use cases the command should support, a few random
thoughts come to mind:

 - Like "diff --no-index", "grep --no-index" is about a directory that is
   not managed by git at all with random collection of files. Do we even
   want to be using any "exclude" in such a use case? Wouldn't it actually
   be a bug that we pay attention to standard-exlcludes in the current
   code, as .gitignore files scattered in such a directory should _not_
   mean anything, as it is not a git working tree at all?

 - Even in a git managed directory, you _could_ use "grep --no-index" to
   find hits from both tracked and untracked files. In this particular use
   case, it makes some sense to pay attention to "exclude", as that would
   catch what _could_ be committed, and paths that would be excluded won't
   be part of that set (unless you use "add -f"). But wouldn't that use
   case better be covered by a switch that is different from --no-index
   (which means "These are not managed by git at all")? It is still about
   files in a git working tree, it is just that the user wants us to pay
   attention also to untracked files, e.g. "grep --untracked-too"?

So I think the patch identified a good problem to solve, but it might be a
wrong solution that encourages a use of a wrong option (i.e. --no-index)
only because we do not have the right one (i.e. "I am in the working tree,
but I want untracked ones also considered.").

What do people think if we did this a bit differently?

 - Since 3081623 (grep --no-index: allow use of "git grep" outside a git
   repository, 2010-01-15) and 59332d1 (Resurrect "git grep --no-index",
   2010-02-06), "grep --no-index" incorrectly paid attention to the
   exclude patterns. We shouldn't have, and we'd fix that bug.

 - It might be useful to be able to "git grep" both tracked and untracked
   (i.e. new files you may want to "git add") paths, but there is no good
   way to do so. Introduce a new option --untracked-too (or more suitable
   name --- I am bad at naming and not married to this one) to allow
   this. This mode always takes "exclude" into account.

Opinions?

Regarding the patch:

> +	/* --no-exclude-standard needs --no-index */
> +	if (use_index && !exclude_standard)
> +		die(_("--no-exclude-standard does not make sense without --no-index."));

For that matter,

    $ git grep --no-exclude-standard --exclude-standard --cached -e foo

should be an error, no?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC/PATCH] grep --no-index: allow to grep without git exclusions
  2011-07-20 20:57 ` Junio C Hamano
@ 2011-07-21  3:47   ` Nguyen Thai Ngoc Duy
  2011-07-21  7:11   ` Bert Wesarg
  1 sibling, 0 replies; 7+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-07-21  3:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Bert Wesarg, git

On Thu, Jul 21, 2011 at 3:57 AM, Junio C Hamano <gitster@pobox.com> wrote:
>  - It might be useful to be able to "git grep" both tracked and untracked
>   (i.e. new files you may want to "git add") paths, but there is no good
>   way to do so. Introduce a new option --untracked-too (or more suitable
>   name --- I am bad at naming and not married to this one) to allow
>   this. This mode always takes "exclude" into account.

--ls-files ([c|d|o|i|s|u|k|m])+? This opens door for grepping only
untracked files, for instant.
-- 
Duy

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC/PATCH] grep --no-index: allow to grep without git exclusions
  2011-07-20 20:57 ` Junio C Hamano
  2011-07-21  3:47   ` Nguyen Thai Ngoc Duy
@ 2011-07-21  7:11   ` Bert Wesarg
  2011-07-21  7:25     ` Bert Wesarg
  2011-07-21 16:22     ` Junio C Hamano
  1 sibling, 2 replies; 7+ messages in thread
From: Bert Wesarg @ 2011-07-21  7:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nguyễn Thái Ngọc

On Wed, Jul 20, 2011 at 22:57, Junio C Hamano <gitster@pobox.com> wrote:
> Taken in total isolation, this patch does allow a use case where we did
> not allow, but when considered in a larger picture of what "grep" is used
> for and how different use cases the command should support, a few random
> thoughts come to mind:
>
>  - Like "diff --no-index", "grep --no-index" is about a directory that is
>   not managed by git at all with random collection of files. Do we even
>   want to be using any "exclude" in such a use case? Wouldn't it actually
>   be a bug that we pay attention to standard-exlcludes in the current
>   code, as .gitignore files scattered in such a directory should _not_
>   mean anything, as it is not a git working tree at all?
>
>  - Even in a git managed directory, you _could_ use "grep --no-index" to
>   find hits from both tracked and untracked files. In this particular use
>   case, it makes some sense to pay attention to "exclude", as that would
>   catch what _could_ be committed, and paths that would be excluded won't
>   be part of that set (unless you use "add -f"). But wouldn't that use
>   case better be covered by a switch that is different from --no-index
>   (which means "These are not managed by git at all")? It is still about
>   files in a git working tree, it is just that the user wants us to pay
>   attention also to untracked files, e.g. "grep --untracked-too"?
>
> So I think the patch identified a good problem to solve, but it might be a
> wrong solution that encourages a use of a wrong option (i.e. --no-index)
> only because we do not have the right one (i.e. "I am in the working tree,
> but I want untracked ones also considered.").
>
> What do people think if we did this a bit differently?
>
>  - Since 3081623 (grep --no-index: allow use of "git grep" outside a git
>   repository, 2010-01-15) and 59332d1 (Resurrect "git grep --no-index",
>   2010-02-06), "grep --no-index" incorrectly paid attention to the
>   exclude patterns. We shouldn't have, and we'd fix that bug.

I considered this when I noticed this, but feared the
backward-incompatibility. If you call it a bug, than we can change for
the better.

>
>  - It might be useful to be able to "git grep" both tracked and untracked
>   (i.e. new files you may want to "git add") paths, but there is no good
>   way to do so. Introduce a new option --untracked-too (or more suitable
>   name --- I am bad at naming and not married to this one) to allow
>   this. This mode always takes "exclude" into account.

My proposal would be to name it --include-untracked.

>
> Opinions?
>
> Regarding the patch:
>
>> +     /* --no-exclude-standard needs --no-index */
>> +     if (use_index && !exclude_standard)
>> +             die(_("--no-exclude-standard does not make sense without --no-index."));
>
> For that matter,
>
>    $ git grep --no-exclude-standard --exclude-standard --cached -e foo
>
> should be an error, no?
>

It should be. But I think that unveils one of the shortcomings of the
(any) option parser: You wont get notified when an option was given,
regardless of its value. To handle the above I would have to use
OPTION_CALLBACK to set an addition flag exc_given (like it is done in
git-ls-files) and test against this. The same problem is possible for
a number option, if you want to know whether the option was actually
given on the command line, one need to invent an invalid value (which
isn't always possible) and use this as the initializer or use
OPTION_CALLBACK again.

Bert

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC/PATCH] grep --no-index: allow to grep without git exclusions
  2011-07-21  7:11   ` Bert Wesarg
@ 2011-07-21  7:25     ` Bert Wesarg
  2011-07-21 16:22     ` Junio C Hamano
  1 sibling, 0 replies; 7+ messages in thread
From: Bert Wesarg @ 2011-07-21  7:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nguyễn Thái Ngọc

2011/7/21 Bert Wesarg <bert.wesarg@googlemail.com>:
>
> It should be. But I think that unveils one of the shortcomings of the
> (any) option parser: You wont get notified when an option was given,
> regardless of its value. To handle the above I would have to use
> OPTION_CALLBACK to set an addition flag exc_given (like it is done in
> git-ls-files) and test against this. The same problem is possible for
> a number option, if you want to know whether the option was actually
> given on the command line, one need to invent an invalid value (which
> isn't always possible) and use this as the initializer or use
> OPTION_CALLBACK again.

Here is a proof-of-concept patch (probably whitespace damaged) to make
it more clear what I meant:

--- >8 ---
Subject: [PoC/RFC/PATCH] parse-option: flag the presence of an option regardless
 of its value

Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>

---
 parse-options.c |   15 +++++++++++++--
 parse-options.h |   20 ++++++++++++++++++++
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/parse-options.c b/parse-options.c
index 73bd28a..df0f483 100644
--- a/parse-options.c
+++ b/parse-options.c
@@ -156,7 +156,10 @@ static int parse_short_opt(struct parse_opt_ctx_t
*p, const struct option *optio
 	for (; options->type != OPTION_END; options++) {
 		if (options->short_name == *p->opt) {
 			p->opt = p->opt[1] ? p->opt + 1 : NULL;
-			return get_value(p, options, OPT_SHORT);
+			if (options->given) {
+				*options->given = 1;
+				return get_value(p, options, OPT_SHORT);
+			}
 		}

 		/*
@@ -171,6 +174,9 @@ static int parse_short_opt(struct parse_opt_ctx_t
*p, const struct option *optio
 		char *arg;
 		int rc;

+		if (options->given)
+			*options->given = 1;
+
 		while (isdigit(p->opt[len]))
 			len++;
 		arg = xmemdupz(p->opt, len);
@@ -254,6 +260,8 @@ is_abbreviated:
 				continue;
 			p->opt = rest + 1;
 		}
+		if (options->given)
+			*options->given = 1;
 		return get_value(p, options, flags);
 	}

@@ -276,8 +284,11 @@ static int parse_nodash_opt(struct
parse_opt_ctx_t *p, const char *arg,
 	for (; options->type != OPTION_END; options++) {
 		if (!(options->flags & PARSE_OPT_NODASH))
 			continue;
-		if (options->short_name == arg[0] && arg[1] == '\0')
+		if (options->short_name == arg[0] && arg[1] == '\0') {
+			if (options->given)
+				*options->given = 1;
 			return get_value(p, options, OPT_SHORT);
+		}
 	}
 	return -2;
 }
diff --git a/parse-options.h b/parse-options.h
index d1b12fe..7fe3d49 100644
--- a/parse-options.h
+++ b/parse-options.h
@@ -100,6 +100,10 @@ typedef int parse_opt_ll_cb(struct parse_opt_ctx_t *ctx,
  *   OPTION_{BIT,SET_INT,SET_PTR} store the {mask,integer,pointer} to put in
  *   the value when met.
  *   CALLBACKS can use it like they want.
+ *
+ * `given`::
+ *   Pointer to a flag variable which is set to 1 if this option was given.
+ *   The flag should be 0 initialized, to make sense.
  */
 struct option {
 	enum parse_opt_type type;
@@ -112,6 +116,7 @@ struct option {
 	int flags;
 	parse_opt_cb *callback;
 	intptr_t defval;
+	int *given;
 };

 #define OPT_END()                   { OPTION_END }
@@ -146,6 +151,21 @@ struct option {
 	{ OPTION_CALLBACK, (s), (l), (v), "when", (h), PARSE_OPT_OPTARG, \
 		parse_opt_color_flag_cb, (intptr_t)"always" }

+/* same as above but with given flag pointer */
+#define OPT_BIT_GIVEN(s, l, v, h, b, g) \
+	{ OPTION_BIT,     (s), (l), (v), NULL, (h), PARSE_OPT_NOARG, NULL, (b), (g) }
+#define OPT_NEGBIT_GIVEN(s, l, v, h, b, g) \
+	{ OPTION_NEGBIT,  (s), (l), (v), NULL, (h), PARSE_OPT_NOARG, NULL, (b), (g) }
+#define OPT_BOOLEAN_GIVEN(s, l, v, h, g) \
+	{ OPTION_BOOLEAN, (s), (l), (v), NULL, (h), PARSE_OPT_NOARG, NULL, 0, (g) }
+#define OPT_SET_INT_GIVEN(s, l, v, h, i, g) \
+	{ OPTION_SET_INT, (s), (l), (v), NULL, (h), PARSE_OPT_NOARG, NULL, (i), (g) }
+#define OPT_SET_PTR_GIVEN(s, l, v, h, p, g) \
+	{ OPTION_SET_PTR, (s), (l), (v), NULL, (h), PARSE_OPT_NOARG, NULL, (p), (g) }
+#define OPT_INTEGER_GIVEN(s, l, v, h, g) \
+	{ OPTION_INTEGER, (s), (l), (v), "n", (h), 0, NULL, 0, (g) }
+#define OPT_STRING_GIVEN(s, l, v, a, h, g) \
+	{ OPTION_STRING,  (s), (l), (v), (a), (h), 0, NULL, 0, (g) }

 /* parse_options() will filter out the processed options and leave the
  * non-option arguments in argv[].
-- 
1.7.6.588.g8d735

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC/PATCH] grep --no-index: allow to grep without git exclusions
  2011-07-21  7:11   ` Bert Wesarg
  2011-07-21  7:25     ` Bert Wesarg
@ 2011-07-21 16:22     ` Junio C Hamano
  2011-07-21 17:23       ` Bert Wesarg
  1 sibling, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2011-07-21 16:22 UTC (permalink / raw)
  To: Bert Wesarg; +Cc: Junio C Hamano, git, Nguyễn Thái Ngọc

Bert Wesarg <bert.wesarg@googlemail.com> writes:

> It should be. But I think that unveils one of the shortcomings of the
> (any) option parser: You wont get notified when an option was given,
> regardless of its value. To handle the above I would have to use
> OPTION_CALLBACK to set an addition flag exc_given (like it is done in
> git-ls-files) and test against this.

Prepare a three-value variable, initialized to -1, set it to 0 on --no-foo
and set it to 1 on --foo. Use the default if the variable is still -1.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC/PATCH] grep --no-index: allow to grep without git exclusions
  2011-07-21 16:22     ` Junio C Hamano
@ 2011-07-21 17:23       ` Bert Wesarg
  0 siblings, 0 replies; 7+ messages in thread
From: Bert Wesarg @ 2011-07-21 17:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nguyễn Thái Ngọc

2011/7/21 Junio C Hamano <gitster@pobox.com>:
> Bert Wesarg <bert.wesarg@googlemail.com> writes:
>
>> It should be. But I think that unveils one of the shortcomings of the
>> (any) option parser: You wont get notified when an option was given,
>> regardless of its value. To handle the above I would have to use
>> OPTION_CALLBACK to set an addition flag exc_given (like it is done in
>> git-ls-files) and test against this.
>
> Prepare a three-value variable, initialized to -1, set it to 0 on --no-foo
> and set it to 1 on --foo. Use the default if the variable is still -1.
>

Thats the 'invent an invalid value'-case I described. Which does not
necessarily exist.

Having the proposed 'given' flag available I would need only one
variable for all the options (these coming from ls-files) to check if
any of them where given, and could decide if the given options would
make sense.

Bert

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-07-21 17:23 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-20 12:50 [RFC/PATCH] grep --no-index: allow to grep without git exclusions Bert Wesarg
2011-07-20 20:57 ` Junio C Hamano
2011-07-21  3:47   ` Nguyen Thai Ngoc Duy
2011-07-21  7:11   ` Bert Wesarg
2011-07-21  7:25     ` Bert Wesarg
2011-07-21 16:22     ` Junio C Hamano
2011-07-21 17:23       ` Bert Wesarg

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.