* [regression?] trailing slash required in .gitattributes @ 2013-03-19 17:57 Jeff King 2013-03-19 18:10 ` Junio C Hamano ` (2 more replies) 0 siblings, 3 replies; 66+ messages in thread From: Jeff King @ 2013-03-19 17:57 UTC (permalink / raw) To: git; +Cc: Jean-Noël AVILA Prior to v1.8.1.1, if I did this: git init echo content >foo && mkdir subdir && echo content >subdir/bar && echo "subdir export-ignore" >.gitattributes git add . && git commit -m one && git archive HEAD | tar tf - my archive would contain only "foo" and ".gitattributes", not subdir. As of v1.8.1.1, the attribute on subdir is ignored unless it is written with a trailing slash, like: subdir/ export-ignore The issue bisects to 94bc671 (Add directory pattern matching to attributes, 2012-12-08). That commit actually tests not only that "subdir/" matches, but also that just "subdir" does not match. The commit message there is vague about the reasoning, but my understanding is that it was meant to harmonize gitignore and gitattributes, the former of which can take "dir/". I don't have a problem with offering "dir/" to match only directories, but what is the point in disallowing just "dir" to match a directory? It seems like a pointless regression to me, but I'm not clear whether it was intentional or not (and if it was intentional, I think we would need to handle it with a proper transition period, not in a maint release). -Peff ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [regression?] trailing slash required in .gitattributes 2013-03-19 17:57 [regression?] trailing slash required in .gitattributes Jeff King @ 2013-03-19 18:10 ` Junio C Hamano 2013-03-19 18:10 ` Jeff King 2013-03-25 6:05 ` [PATCH 0/4] attr directory matching regression Nguyễn Thái Ngọc Duy 2 siblings, 0 replies; 66+ messages in thread From: Junio C Hamano @ 2013-03-19 18:10 UTC (permalink / raw) To: Jeff King; +Cc: git, Jean-Noël AVILA Jeff King <peff@peff.net> writes: > Prior to v1.8.1.1, if I did this: > > git init > echo content >foo && > mkdir subdir && > echo content >subdir/bar && > echo "subdir export-ignore" >.gitattributes > git add . && > git commit -m one && > git archive HEAD | tar tf - > > my archive would contain only "foo" and ".gitattributes", not subdir. As > of v1.8.1.1, the attribute on subdir is ignored unless it is written > with a trailing slash, like: > > subdir/ export-ignore > > The issue bisects to 94bc671 (Add directory pattern matching to > attributes, 2012-12-08). That commit actually tests not only that > "subdir/" matches, but also that just "subdir" does not match. > > The commit message there is vague about the reasoning, but my > understanding is that it was meant to harmonize gitignore and > gitattributes, the former of which can take "dir/". I don't have a > problem with offering "dir/" to match only directories, but what is the > point in disallowing just "dir" to match a directory? > > It seems like a pointless regression to me, but I'm not clear whether it > was intentional or not (and if it was intentional, I think we would need > to handle it with a proper transition period, not in a maint release). I agree that is a pointless regression. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [regression?] trailing slash required in .gitattributes 2013-03-19 17:57 [regression?] trailing slash required in .gitattributes Jeff King 2013-03-19 18:10 ` Junio C Hamano @ 2013-03-19 18:10 ` Jeff King 2013-03-22 22:24 ` Jeff King 2013-03-25 6:05 ` [PATCH 0/4] attr directory matching regression Nguyễn Thái Ngọc Duy 2 siblings, 1 reply; 66+ messages in thread From: Jeff King @ 2013-03-19 18:10 UTC (permalink / raw) To: git; +Cc: Jean-Noël AVILA On Tue, Mar 19, 2013 at 01:57:56PM -0400, Jeff King wrote: > Prior to v1.8.1.1, if I did this: > > git init > echo content >foo && > mkdir subdir && > echo content >subdir/bar && > echo "subdir export-ignore" >.gitattributes > git add . && > git commit -m one && > git archive HEAD | tar tf - > > my archive would contain only "foo" and ".gitattributes", not subdir. As > of v1.8.1.1, the attribute on subdir is ignored unless it is written > with a trailing slash, like: > > subdir/ export-ignore > > The issue bisects to 94bc671 (Add directory pattern matching to > attributes, 2012-12-08). That commit actually tests not only that > "subdir/" matches, but also that just "subdir" does not match. Sorry, I mis-read the tests. They are not testing that "subdir" does not work, only that "subdir/" will match only a directory, not a regular file. Which does make sense. So I think the regression is accidental. And we would want tests like this on top (which currently fail): diff --git a/t/t5002-archive-attr-pattern.sh b/t/t5002-archive-attr-pattern.sh index 0c847fb..3be809c 100755 --- a/t/t5002-archive-attr-pattern.sh +++ b/t/t5002-archive-attr-pattern.sh @@ -27,6 +27,10 @@ test_expect_success 'setup' ' echo ignored-only-if-dir/ export-ignore >>.git/info/attributes && git add ignored-only-if-dir && + mkdir -p ignored-without-slash && + echo ignored without slash >ignored-without-slash/foo && + git add ignored-without-slash/foo && + echo ignored-without-slash export-ignore >>.git/info/attributes && mkdir -p one-level-lower/two-levels-lower/ignored-only-if-dir && echo ignored by ignored dir >one-level-lower/two-levels-lower/ignored-only-if-dir/ignored-by-ignored-dir && @@ -49,6 +53,8 @@ test_expect_missing archive/ignored-ony-if-dir/ignored-by-ignored-dir test_expect_exists archive/not-ignored-dir/ test_expect_missing archive/ignored-only-if-dir/ test_expect_missing archive/ignored-ony-if-dir/ignored-by-ignored-dir +test_expect_missing archive/ignored-without-slash/ && +test_expect_missing archive/ignored-without-slash/foo && test_expect_exists archive/one-level-lower/ test_expect_missing archive/one-level-lower/two-levels-lower/ignored-only-if-dir/ test_expect_missing archive/one-level-lower/two-levels-lower/ignored-ony-if-dir/ignored-by-ignored-dir ^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [regression?] trailing slash required in .gitattributes 2013-03-19 18:10 ` Jeff King @ 2013-03-22 22:24 ` Jeff King 2013-03-22 23:08 ` Junio C Hamano 2013-03-23 4:18 ` [regression?] trailing slash required in .gitattributes Duy Nguyen 0 siblings, 2 replies; 66+ messages in thread From: Jeff King @ 2013-03-22 22:24 UTC (permalink / raw) To: git; +Cc: Nguyễn Thái Ngọc Duy, Jean-Noël AVILA On Tue, Mar 19, 2013 at 02:10:42PM -0400, Jeff King wrote: > > The issue bisects to 94bc671 (Add directory pattern matching to > > attributes, 2012-12-08). That commit actually tests not only that > > "subdir/" matches, but also that just "subdir" does not match. > [...] > So I think the regression is accidental. And we would want tests like > this on top (which currently fail): > [...] I'm having trouble figuring out the right solution for this. The problem is in path_matches, which used to receive just the unadorned pathname, and now receives "path/" for directories. It now looks like this: > static int path_matches(const char *pathname, int pathlen, > const char *basename, > const struct pattern *pat, > const char *base, int baselen) > { > const char *pattern = pat->pattern; > int prefix = pat->nowildcardlen; > > if ((pat->flags & EXC_FLAG_MUSTBEDIR) && > ((!pathlen) || (pathname[pathlen-1] != '/'))) > return 0; This first stanza checks that a pattern like "foo/" must be matched by a real directory. Which is fine; that's the point of adding the "/" to the pattern. > if (pat->flags & EXC_FLAG_NODIR) { > return match_basename(basename, > pathlen - (basename - pathname), > pattern, prefix, > pat->patternlen, pat->flags); > } > return match_pathname(pathname, pathlen, > base, baselen, > pattern, prefix, pat->patternlen, pat->flags); > } But then here we'll end up feeding "foo/" to be compared with "foo", which we don't want. For a pattern "foo", we want to match _either_ "foo/" or "foo". So you'd think something like: if (pathlen && pathname[pathlen-1] == '/') pathlen--; would work. But it seems that match_basename, despite taking the length of all of the strings we pass it, will happily use NUL-terminated functions like strcmp or fnmatch. Converting the former to check lengths should be pretty straightforward. But there is no version of fnmatch that does what we want. I wonder if we using wildmatch can get around this limitation. -Peff ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [regression?] trailing slash required in .gitattributes 2013-03-22 22:24 ` Jeff King @ 2013-03-22 23:08 ` Junio C Hamano 2013-03-23 8:39 ` Jeff King 2013-03-23 4:18 ` [regression?] trailing slash required in .gitattributes Duy Nguyen 1 sibling, 1 reply; 66+ messages in thread From: Junio C Hamano @ 2013-03-22 23:08 UTC (permalink / raw) To: Jeff King Cc: git, Nguyễn Thái Ngọc Duy, Jean-Noël AVILA Jeff King <peff@peff.net> writes: > if (pathlen && pathname[pathlen-1] == '/') > pathlen--; > > would work. But it seems that match_basename, despite taking the length > of all of the strings we pass it, will happily use NUL-terminated > functions like strcmp or fnmatch. Converting the former to check lengths > should be pretty straightforward. But there is no version of fnmatch > that does what we want. I wonder if we using wildmatch can get around > this limitation. Or save away pathname[pathlen], temporarily NUL terminate and call these functions? ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [regression?] trailing slash required in .gitattributes 2013-03-22 23:08 ` Junio C Hamano @ 2013-03-23 8:39 ` Jeff King 2013-03-24 5:25 ` Junio C Hamano 2013-03-26 18:39 ` [PATCH 0/4] attribute regression fix for maint-1.8.1 and upward Junio C Hamano 0 siblings, 2 replies; 66+ messages in thread From: Jeff King @ 2013-03-23 8:39 UTC (permalink / raw) To: Junio C Hamano Cc: git, Nguyễn Thái Ngọc Duy, Jean-Noël AVILA On Fri, Mar 22, 2013 at 04:08:08PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > if (pathlen && pathname[pathlen-1] == '/') > > pathlen--; > > > > would work. But it seems that match_basename, despite taking the length > > of all of the strings we pass it, will happily use NUL-terminated > > functions like strcmp or fnmatch. Converting the former to check lengths > > should be pretty straightforward. But there is no version of fnmatch > > that does what we want. I wonder if we using wildmatch can get around > > this limitation. > > Or save away pathname[pathlen], temporarily NUL terminate and call > these functions? Yeah, that is a possibility, though it involves casting away some constness. Patch is below, which seems to work. It still feels really ugly to me, and like match_basename is misdesigned and should respect the lengths we pass it. Also, if it does respect the lengths, it should be able to go much faster (e.g., in the common case, we can drop a ton of strcmp_icase calls if we just check the lengths beforehand). I feel like Duy was working on something like this recently, but I don't see anything in pu. --- diff --git a/attr.c b/attr.c index e2f9377..bd00a78 100644 --- a/attr.c +++ b/attr.c @@ -663,20 +663,58 @@ static int path_matches(const char *pathname, int pathlen, { const char *pattern = pat->pattern; int prefix = pat->nowildcardlen; + char path_munge = 0; + char pattern_munge = 0; + int ret; if ((pat->flags & EXC_FLAG_MUSTBEDIR) && ((!pathlen) || (pathname[pathlen-1] != '/'))) return 0; + /* + * Drop trailing slash from path, as we would want + * an unadorned pattern like "foo" to match both the + * file "foo" and the directory "foo/". + */ + if (pathlen && pathname[pathlen-1] == '/') { + pathlen--; + + /* + * The match_* functions, despite taking a string length, will + * happily read all the way up to the NUL-terminating character. + * So we must not only shrink pathlen, but munge the buffer + * to NUL-terminate it. + */ + path_munge = pathname[pathlen]; + ((char *)pathname)[pathlen] = '\0'; + } + + /* + * The pattern up to patternlen will not include a + * trailing slash, but it may still be present in the string. + * And since the match_* functions will read up to the NUL, + * we need to terminate the buffer. + */ + pattern_munge = pattern[pat->patternlen]; + ((char *)pattern)[pat->patternlen] = '\0'; + if (pat->flags & EXC_FLAG_NODIR) { - return match_basename(basename, - pathlen - (basename - pathname), - pattern, prefix, - pat->patternlen, pat->flags); - } - return match_pathname(pathname, pathlen, - base, baselen, - pattern, prefix, pat->patternlen, pat->flags); + ret = match_basename(basename, + pathlen - (basename - pathname), + pattern, prefix, + pat->patternlen, pat->flags); + } + else { + ret = match_pathname(pathname, pathlen, + base, baselen, + pattern, prefix, + pat->patternlen, pat->flags); + } + + if (path_munge) + ((char *)pathname)[pathlen] = path_munge; + ((char *)pattern)[pat->patternlen] = pattern_munge; + return ret; } static int macroexpand_one(int attr_nr, int rem); diff --git a/t/t5002-archive-attr-pattern.sh b/t/t5002-archive-attr-pattern.sh index 0c847fb..3be809c 100755 --- a/t/t5002-archive-attr-pattern.sh +++ b/t/t5002-archive-attr-pattern.sh @@ -27,6 +27,10 @@ test_expect_success 'setup' ' echo ignored-only-if-dir/ export-ignore >>.git/info/attributes && git add ignored-only-if-dir && + mkdir -p ignored-without-slash && + echo ignored without slash >ignored-without-slash/foo && + git add ignored-without-slash/foo && + echo ignored-without-slash export-ignore >>.git/info/attributes && mkdir -p one-level-lower/two-levels-lower/ignored-only-if-dir && echo ignored by ignored dir >one-level-lower/two-levels-lower/ignored-only-if-dir/ignored-by-ignored-dir && @@ -49,6 +53,8 @@ test_expect_missing archive/ignored-ony-if-dir/ignored-by-ignored-dir test_expect_exists archive/not-ignored-dir/ test_expect_missing archive/ignored-only-if-dir/ test_expect_missing archive/ignored-ony-if-dir/ignored-by-ignored-dir +test_expect_missing archive/ignored-without-slash/ && +test_expect_missing archive/ignored-without-slash/foo && test_expect_exists archive/one-level-lower/ test_expect_missing archive/one-level-lower/two-levels-lower/ignored-only-if-dir/ test_expect_missing archive/one-level-lower/two-levels-lower/ignored-ony-if-dir/ignored-by-ignored-dir ^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [regression?] trailing slash required in .gitattributes 2013-03-23 8:39 ` Jeff King @ 2013-03-24 5:25 ` Junio C Hamano 2013-03-26 18:39 ` [PATCH 0/4] attribute regression fix for maint-1.8.1 and upward Junio C Hamano 1 sibling, 0 replies; 66+ messages in thread From: Junio C Hamano @ 2013-03-24 5:25 UTC (permalink / raw) To: Jeff King Cc: git, Nguyễn Thái Ngọc Duy, Jean-Noël AVILA Jeff King <peff@peff.net> writes: > Yeah, that is a possibility, though it involves casting away some > constness. Patch is below, which seems to work. Hmm, because this was after I read this part: > ... match_basename, despite taking the length > of all of the strings we pass it, will happily use NUL-terminated > functions like strcmp or fnmatch. I actually meant to do that inside match_basename(), not in one particular caller of it. Otherwise, new callers to match_basename() will also suffer from this broken API that pretends as if it takes counted strings but uses strings as NUL terminated, no? > It still feels really ugly to me, and like match_basename is misdesigned > and should respect the lengths we pass it. Exactly. ^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH 0/4] attribute regression fix for maint-1.8.1 and upward 2013-03-23 8:39 ` Jeff King 2013-03-24 5:25 ` Junio C Hamano @ 2013-03-26 18:39 ` Junio C Hamano 2013-03-26 18:39 ` [PATCH 1/4] attr.c::path_matches(): the basename is part of the pathname Junio C Hamano ` (5 more replies) 1 sibling, 6 replies; 66+ messages in thread From: Junio C Hamano @ 2013-03-26 18:39 UTC (permalink / raw) To: git; +Cc: pclouds, peff, avila.jn So here is an attempt to fix the unintended regression, on top of 9db9eecfe5c2 (attr: avoid calling find_basename() twice per path, 2013-01-16). It consists of four patches. The first patch is not essential to the fix, but I think it clarifies what is going on in this codepath. The second patch addresses the issue Jeff noticed; it appears as if match_basename() takes counted strings, but one of the strings was not a counted string at all. Its length was given to the function because the caller already had one, so that we do not have to do strlen() ourselves. And the other one was meant to be a counted string, but the callee was not using it as such. The patch makes them both counted strings and treat them as such. The third patch is the main fix. As I said in the log message, I didn't look at it very carefully, so extra sets of eyeballs are very much appreciated. The last one is a test stolen from Jeff to seal the series. It needs sign-off from Jeff. Jeff King (1): make sure a pattern without trailing slash matches a directory Junio C Hamano (3): attr.c::path_matches(): the basename is part of the pathname dir.c::match_basename(): pay attention to the length of string parameters attr.c::path_matches(): special case paths that end with a slash attr.c | 23 ++++++++++++----------- dir.c | 31 +++++++++++++++++++++++++++---- t/t5002-archive-attr-pattern.sh | 6 ++++++ 3 files changed, 45 insertions(+), 15 deletions(-) -- 1.8.2-350-g3df87a1 ^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH 1/4] attr.c::path_matches(): the basename is part of the pathname 2013-03-26 18:39 ` [PATCH 0/4] attribute regression fix for maint-1.8.1 and upward Junio C Hamano @ 2013-03-26 18:39 ` Junio C Hamano 2013-03-26 18:49 ` Jeff King 2013-03-26 18:39 ` [PATCH 2/4] dir.c::match_basename(): pay attention to the length of string parameters Junio C Hamano ` (4 subsequent siblings) 5 siblings, 1 reply; 66+ messages in thread From: Junio C Hamano @ 2013-03-26 18:39 UTC (permalink / raw) To: git; +Cc: pclouds, peff, avila.jn The function takes two strings (pathname and basename) as if they are independent strings, but in reality, the latter is always pointing into a substring in the former. Clarify this relationship by expressing the latter as an offset into the former. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- attr.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/attr.c b/attr.c index ab2aab2..4cfe0ee 100644 --- a/attr.c +++ b/attr.c @@ -655,7 +655,7 @@ static void prepare_attr_stack(const char *path, int dirlen) } static int path_matches(const char *pathname, int pathlen, - const char *basename, + int basename_offset, const struct pattern *pat, const char *base, int baselen) { @@ -667,8 +667,8 @@ static int path_matches(const char *pathname, int pathlen, return 0; if (pat->flags & EXC_FLAG_NODIR) { - return match_basename(basename, - pathlen - (basename - pathname), + return match_basename(pathname + basename_offset, + pathlen - basename_offset, pattern, prefix, pat->patternlen, pat->flags); } @@ -701,7 +701,7 @@ static int fill_one(const char *what, struct match_attr *a, int rem) return rem; } -static int fill(const char *path, int pathlen, const char *basename, +static int fill(const char *path, int pathlen, int basename_offset, struct attr_stack *stk, int rem) { int i; @@ -711,7 +711,7 @@ static int fill(const char *path, int pathlen, const char *basename, struct match_attr *a = stk->attrs[i]; if (a->is_macro) continue; - if (path_matches(path, pathlen, basename, + if (path_matches(path, pathlen, basename_offset, &a->u.pat, base, stk->originlen)) rem = fill_one("fill", a, rem); } @@ -750,7 +750,8 @@ static void collect_all_attrs(const char *path) { struct attr_stack *stk; int i, pathlen, rem, dirlen; - const char *basename, *cp, *last_slash = NULL; + const char *cp, *last_slash = NULL; + int basename_offset; for (cp = path; *cp; cp++) { if (*cp == '/' && cp[1]) @@ -758,10 +759,10 @@ static void collect_all_attrs(const char *path) } pathlen = cp - path; if (last_slash) { - basename = last_slash + 1; + basename_offset = last_slash + 1 - path; dirlen = last_slash - path; } else { - basename = path; + basename_offset = 0; dirlen = 0; } @@ -771,7 +772,7 @@ static void collect_all_attrs(const char *path) rem = attr_nr; for (stk = attr_stack; 0 < rem && stk; stk = stk->prev) - rem = fill(path, pathlen, basename, stk, rem); + rem = fill(path, pathlen, basename_offset, stk, rem); } int git_check_attr(const char *path, int num, struct git_attr_check *check) -- 1.8.2-350-g3df87a1 ^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH 1/4] attr.c::path_matches(): the basename is part of the pathname 2013-03-26 18:39 ` [PATCH 1/4] attr.c::path_matches(): the basename is part of the pathname Junio C Hamano @ 2013-03-26 18:49 ` Jeff King 2013-03-27 1:40 ` Duy Nguyen 0 siblings, 1 reply; 66+ messages in thread From: Jeff King @ 2013-03-26 18:49 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, pclouds, avila.jn On Tue, Mar 26, 2013 at 11:39:28AM -0700, Junio C Hamano wrote: > The function takes two strings (pathname and basename) as if they > are independent strings, but in reality, the latter is always > pointing into a substring in the former. > > Clarify this relationship by expressing the latter as an offset into > the former. > > Signed-off-by: Junio C Hamano <gitster@pobox.com> This is a huge improvement in maintainability. My initial fix attempt was to just xstrdup() the strings (knowing that the performance would be horrible, but I was still investigating correctness issues at that point). And of course I ran into this same issue as I tried to make a copy of pathname. So even without the rest of the fix, this is definitely a good idea. :) -Peff ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 1/4] attr.c::path_matches(): the basename is part of the pathname 2013-03-26 18:49 ` Jeff King @ 2013-03-27 1:40 ` Duy Nguyen 0 siblings, 0 replies; 66+ messages in thread From: Duy Nguyen @ 2013-03-27 1:40 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git, avila.jn On Wed, Mar 27, 2013 at 1:49 AM, Jeff King <peff@peff.net> wrote: > On Tue, Mar 26, 2013 at 11:39:28AM -0700, Junio C Hamano wrote: > >> The function takes two strings (pathname and basename) as if they >> are independent strings, but in reality, the latter is always >> pointing into a substring in the former. >> >> Clarify this relationship by expressing the latter as an offset into >> the former. >> >> Signed-off-by: Junio C Hamano <gitster@pobox.com> > > This is a huge improvement in maintainability. My initial fix attempt > was to just xstrdup() the strings (knowing that the performance would be > horrible, but I was still investigating correctness issues at that > point). And of course I ran into this same issue as I tried to make a > copy of pathname. > > So even without the rest of the fix, this is definitely a good idea. :) match_{base,path}name and their exclude callers do the same thing. I guess I'm used to it and did not see the maintainability issue. Maybe we should do the same there too. -- Duy ^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH 2/4] dir.c::match_basename(): pay attention to the length of string parameters 2013-03-26 18:39 ` [PATCH 0/4] attribute regression fix for maint-1.8.1 and upward Junio C Hamano 2013-03-26 18:39 ` [PATCH 1/4] attr.c::path_matches(): the basename is part of the pathname Junio C Hamano @ 2013-03-26 18:39 ` Junio C Hamano 2013-03-26 18:55 ` Jeff King 2013-03-26 18:39 ` [PATCH 3/4] attr.c::path_matches(): special case paths that end with a slash Junio C Hamano ` (3 subsequent siblings) 5 siblings, 1 reply; 66+ messages in thread From: Junio C Hamano @ 2013-03-26 18:39 UTC (permalink / raw) To: git; +Cc: pclouds, peff, avila.jn The function takes two counted strings (<basename, basenamelen> and <pattern, patternlen>) as parameters, together with prefix (the length of the prefix in pattern that is to be matched literally without globbing against the basename) and EXC_* flags that tells it how to match the pattern against the basename. However, it did not pay attention to the length of these counted strings. Update them to do the following: * When the entire pattern is to be matched literally, the pattern matches the basename only when the lengths of them are the same, and they match up to that length. * When the pattern is "*" followed by a string to be matched literally, make sure that the basenamelen is equal or longer than the "literal" part of the pattern, and the tail of the basename string matches that literal part. * Otherwise, make sure we use only the counted part of the strings when calling fnmatch_icase(). Because these counted strings are full strings most of the time, avoid unnecessary allocation. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- dir.c | 31 +++++++++++++++++++++++++++---- 1 file changed, 27 insertions(+), 4 deletions(-) diff --git a/dir.c b/dir.c index 5a83aa7..aa16303 100644 --- a/dir.c +++ b/dir.c @@ -537,15 +537,38 @@ int match_basename(const char *basename, int basenamelen, int flags) { if (prefix == patternlen) { - if (!strcmp_icase(pattern, basename)) + if (patternlen == basenamelen && + !strncmp_icase(pattern, basename, basenamelen)) return 1; } else if (flags & EXC_FLAG_ENDSWITH) { + /* "*literal" matching against "fooliteral" */ if (patternlen - 1 <= basenamelen && - !strcmp_icase(pattern + 1, - basename + basenamelen - patternlen + 1)) + !strncmp_icase(pattern + 1, + basename + basenamelen - (patternlen - 1), + patternlen - 1)) return 1; } else { - if (fnmatch_icase(pattern, basename, 0) == 0) + int match_status; + struct strbuf pat = STRBUF_INIT; + struct strbuf base = STRBUF_INIT; + const char *use_pat = pattern; + const char *use_base = basename; + + if (pattern[patternlen]) { + strbuf_add(&pat, pattern, patternlen); + use_pat = pat.buf; + } + if (basename[basenamelen]) { + strbuf_add(&base, basename, basenamelen); + use_base = base.buf; + } + match_status = fnmatch_icase(use_pat, use_base, 0); + if (use_pat) + strbuf_release(&pat); + if (use_base) + strbuf_release(&base); + + if (match_status == 0) return 1; } return 0; -- 1.8.2-350-g3df87a1 ^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH 2/4] dir.c::match_basename(): pay attention to the length of string parameters 2013-03-26 18:39 ` [PATCH 2/4] dir.c::match_basename(): pay attention to the length of string parameters Junio C Hamano @ 2013-03-26 18:55 ` Jeff King 2013-03-26 20:39 ` Jeff King 0 siblings, 1 reply; 66+ messages in thread From: Jeff King @ 2013-03-26 18:55 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, pclouds, avila.jn On Tue, Mar 26, 2013 at 11:39:29AM -0700, Junio C Hamano wrote: > The function takes two counted strings (<basename, basenamelen> and > <pattern, patternlen>) as parameters, together with prefix (the > length of the prefix in pattern that is to be matched literally > without globbing against the basename) and EXC_* flags that tells it > how to match the pattern against the basename. > > However, it did not pay attention to the length of these counted > strings. Update them to do the following: > > * When the entire pattern is to be matched literally, the pattern > matches the basename only when the lengths of them are the same, > and they match up to that length. > > * When the pattern is "*" followed by a string to be matched > literally, make sure that the basenamelen is equal or longer than > the "literal" part of the pattern, and the tail of the basename > string matches that literal part. > > * Otherwise, make sure we use only the counted part of the strings > when calling fnmatch_icase(). Because these counted strings are > full strings most of the time, avoid unnecessary allocation. I think this is OK, with the intention that we would eventually drop the allocations from your third bullet point in favor of using a byte-counted version of fnmatch (i.e., nwildmatch). But until then we're going to see a performance drop. The pattern is usually going to be NUL-terminated at the length counter, but every time we feed a directory, it's going to run into this allocation. And we do it once for _every_ directory against _every_ wildcard gitignore pattern. So I think it is probably going to be measurable. I guess we can try measuring it on something like WebKit, which has plenty of both directories and gitattributes. -Peff ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/4] dir.c::match_basename(): pay attention to the length of string parameters 2013-03-26 18:55 ` Jeff King @ 2013-03-26 20:39 ` Jeff King 2013-03-26 20:49 ` Junio C Hamano 0 siblings, 1 reply; 66+ messages in thread From: Jeff King @ 2013-03-26 20:39 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, pclouds, avila.jn On Tue, Mar 26, 2013 at 02:55:59PM -0400, Jeff King wrote: > > * Otherwise, make sure we use only the counted part of the strings > > when calling fnmatch_icase(). Because these counted strings are > > full strings most of the time, avoid unnecessary allocation. > > I think this is OK, with the intention that we would eventually drop the > allocations from your third bullet point in favor of using a > byte-counted version of fnmatch (i.e., nwildmatch). But until then we're > going to see a performance drop. > > The pattern is usually going to be NUL-terminated at the length counter, > but every time we feed a directory, it's going to run into this > allocation. And we do it once for _every_ directory against _every_ > wildcard gitignore pattern. So I think it is probably going to be > measurable. I guess we can try measuring it on something like WebKit, > which has plenty of both directories and gitattributes. I timed this doing "git archive HEAD" on webkit.git before and after. It actually ended up not mattering much (I think because it is only the directories which are affected, not each individually path, so it's a much smaller number than you'd think). The best-of-five timing was slightly slower, but was within the noise. So I do still think it would make sense to go to a byte-limited version of fnmatch eventually, just for code cleanliness and predictability of performance, but this is really not a bad solution in the interim. -Peff ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/4] dir.c::match_basename(): pay attention to the length of string parameters 2013-03-26 20:39 ` Jeff King @ 2013-03-26 20:49 ` Junio C Hamano 2013-03-26 21:29 ` Jeff King 0 siblings, 1 reply; 66+ messages in thread From: Junio C Hamano @ 2013-03-26 20:49 UTC (permalink / raw) To: Jeff King; +Cc: git, pclouds, avila.jn Jeff King <peff@peff.net> writes: > I timed this doing "git archive HEAD" on webkit.git before and after. It > actually ended up not mattering much (I think because it is only the > directories which are affected, not each individually path, so it's a > much smaller number than you'd think). The best-of-five timing was > slightly slower, but was within the noise. Interesting. Because "archive" has to incur a large I/O cost anyway, I expected extra allocation for correctness for only the directory paths would be dwarfed in the noise. I actually care more about cases other than "archive", though. Do we even feed directory paths to the machinery? > So I do still think it would make sense to go to a byte-limited version > of fnmatch eventually, just for code cleanliness and predictability of > performance, but this is really not a bad solution in the interim. Yes, what we do with wildmatch is a separate issue for 'master' and upwards. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/4] dir.c::match_basename(): pay attention to the length of string parameters 2013-03-26 20:49 ` Junio C Hamano @ 2013-03-26 21:29 ` Jeff King 2013-03-26 22:33 ` Junio C Hamano 0 siblings, 1 reply; 66+ messages in thread From: Jeff King @ 2013-03-26 21:29 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, pclouds, avila.jn On Tue, Mar 26, 2013 at 01:49:10PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > I timed this doing "git archive HEAD" on webkit.git before and after. It > > actually ended up not mattering much (I think because it is only the > > directories which are affected, not each individually path, so it's a > > much smaller number than you'd think). The best-of-five timing was > > slightly slower, but was within the noise. > > Interesting. Because "archive" has to incur a large I/O cost > anyway, I expected extra allocation for correctness for only the > directory paths would be dwarfed in the noise. > > I actually care more about cases other than "archive", though. Do > we even feed directory paths to the machinery? In general, no, I don't think so. That's why I tested "archive", since I knew it did. In the normal case, we should just feed file paths, meaning we only run into this code path when somebody has "foo/" in their pattern. Testing like: git ls-files -z >files time git check-attr --stdin -z -a <files >/dev/null showed a difference well within the noise. > > So I do still think it would make sense to go to a byte-limited version > > of fnmatch eventually, just for code cleanliness and predictability of > > performance, but this is really not a bad solution in the interim. > > Yes, what we do with wildmatch is a separate issue for 'master' and > upwards. Oh, agreed. I just wanted to see how much performance would be impacted for the interim. But it seems that it's not. So I think your series is the right direction, but we would want to factor out the allocation code and use it from match_pathname, as well. -Peff ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/4] dir.c::match_basename(): pay attention to the length of string parameters 2013-03-26 21:29 ` Jeff King @ 2013-03-26 22:33 ` Junio C Hamano 2013-03-27 1:04 ` Jeff King 0 siblings, 1 reply; 66+ messages in thread From: Junio C Hamano @ 2013-03-26 22:33 UTC (permalink / raw) To: Jeff King; +Cc: git, pclouds, avila.jn Jeff King <peff@peff.net> writes: > So I think your series is the right direction, but we would want to > factor out the allocation code and use it from match_pathname, as well. I am deep into today's integration cycle, so perhaps in the meantime you can help with a follow-up patch ;-)? ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/4] dir.c::match_basename(): pay attention to the length of string parameters 2013-03-26 22:33 ` Junio C Hamano @ 2013-03-27 1:04 ` Jeff King 0 siblings, 0 replies; 66+ messages in thread From: Jeff King @ 2013-03-27 1:04 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, pclouds, avila.jn On Tue, Mar 26, 2013 at 03:33:40PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > So I think your series is the right direction, but we would want to > > factor out the allocation code and use it from match_pathname, as well. > > I am deep into today's integration cycle, so perhaps in the meantime > you can help with a follow-up patch ;-)? Your afternoon integration is my dinner-time. :) I'll try to look at it tomorrow. -Peff ^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH 3/4] attr.c::path_matches(): special case paths that end with a slash 2013-03-26 18:39 ` [PATCH 0/4] attribute regression fix for maint-1.8.1 and upward Junio C Hamano 2013-03-26 18:39 ` [PATCH 1/4] attr.c::path_matches(): the basename is part of the pathname Junio C Hamano 2013-03-26 18:39 ` [PATCH 2/4] dir.c::match_basename(): pay attention to the length of string parameters Junio C Hamano @ 2013-03-26 18:39 ` Junio C Hamano 2013-03-26 19:05 ` Jeff King 2013-03-26 18:39 ` [PATCH 4/4] make sure a pattern without trailing slash matches a directory Junio C Hamano ` (2 subsequent siblings) 5 siblings, 1 reply; 66+ messages in thread From: Junio C Hamano @ 2013-03-26 18:39 UTC (permalink / raw) To: git; +Cc: pclouds, peff, avila.jn The function is given a string that ends with a slash to signal that the path is a directory to make sure that a pattern that ends with a slash (i.e. MUSTBEDIR) can tell directories and non-directories apart. However, the pattern itself (pat->pattern and pat->patternlen) that came from such a MUSTBEDIR pattern is represented as a string that ends with a slash, but patternlen does not count that trailing slash. A MUSTBEDIR pattern "element/" is represented as a counted string <"element/", 7> and this must match match pathname "element/". Because match_basename() wants to see pathname "element" to match against the pattern <"element/", 7>, reduce the length of the path to exclude the trailing slash when calling match_basename(). A similar adjustment for match_pathname() might be needed, but I didn't look into it. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- attr.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/attr.c b/attr.c index 4cfe0ee..00a0016 100644 --- a/attr.c +++ b/attr.c @@ -661,14 +661,14 @@ static int path_matches(const char *pathname, int pathlen, { const char *pattern = pat->pattern; int prefix = pat->nowildcardlen; + int isdir = (pathlen && pathname[pathlen - 1] == '/'); - if ((pat->flags & EXC_FLAG_MUSTBEDIR) && - ((!pathlen) || (pathname[pathlen-1] != '/'))) + if ((pat->flags & EXC_FLAG_MUSTBEDIR) && !isdir) return 0; if (pat->flags & EXC_FLAG_NODIR) { return match_basename(pathname + basename_offset, - pathlen - basename_offset, + pathlen - basename_offset - isdir, pattern, prefix, pat->patternlen, pat->flags); } -- 1.8.2-350-g3df87a1 ^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH 3/4] attr.c::path_matches(): special case paths that end with a slash 2013-03-26 18:39 ` [PATCH 3/4] attr.c::path_matches(): special case paths that end with a slash Junio C Hamano @ 2013-03-26 19:05 ` Jeff King 2013-03-26 21:33 ` Jeff King 2013-03-28 19:49 ` Jeff King 0 siblings, 2 replies; 66+ messages in thread From: Jeff King @ 2013-03-26 19:05 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, pclouds, avila.jn On Tue, Mar 26, 2013 at 11:39:30AM -0700, Junio C Hamano wrote: > A similar adjustment for match_pathname() might be needed, but I > didn't look into it. I notice that match_pathname takes _two_ lengths for the pattern: the nowildcardlen (called "prefix", and the full patternlen). But the first thing it does is: if (*pattern == '/') { pattern++; prefix--; } which seems obviously wrong, as patternlen should be dropped, too. But we do not seem to look at patternlen at all! I think we can drop the parameter totally. We do seem to use strncmp_icase through the rest of the function, though, which should be OK. The one exception is that we call fnmatch at the end. Should the allocation hack from the previous patch make its way into an "fnmatch_icase_bytes()" function, so we can use it here, too? And then when we have a more efficient solution, we can just plug it in there. -Peff ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 3/4] attr.c::path_matches(): special case paths that end with a slash 2013-03-26 19:05 ` Jeff King @ 2013-03-26 21:33 ` Jeff King 2013-03-27 1:30 ` Duy Nguyen 2013-03-28 19:49 ` Jeff King 1 sibling, 1 reply; 66+ messages in thread From: Jeff King @ 2013-03-26 21:33 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, pclouds, avila.jn On Tue, Mar 26, 2013 at 03:05:58PM -0400, Jeff King wrote: > On Tue, Mar 26, 2013 at 11:39:30AM -0700, Junio C Hamano wrote: > > > A similar adjustment for match_pathname() might be needed, but I > > didn't look into it. > > I notice that match_pathname takes _two_ lengths for the pattern: the > nowildcardlen (called "prefix", and the full patternlen). But the first > thing it does is: > > if (*pattern == '/') { > pattern++; > prefix--; > } > > which seems obviously wrong, as patternlen should be dropped, too. But > we do not seem to look at patternlen at all! I think we can drop the > parameter totally. > > We do seem to use strncmp_icase through the rest of the function, > though, which should be OK. The one exception is that we call fnmatch at > the end. Should the allocation hack from the previous patch make its way > into an "fnmatch_icase_bytes()" function, so we can use it here, too? > And then when we have a more efficient solution, we can just plug it in > there. Hmm. match_pathname does have this: /* * baselen does not count the trailing slash. base[] may or * may not end with a trailing slash though. */ if (pathlen < baselen + 1 || (baselen && pathname[baselen] != '/') || strncmp_icase(pathname, base, baselen)) return 0; which seems to imply that the trailing slash is important here, and that we should not drop it when passing the path to match_pathname. I'm still trying to figure out exactly what it is that the extra slash check is for, and whether it might not have the same problem. -Peff ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 3/4] attr.c::path_matches(): special case paths that end with a slash 2013-03-26 21:33 ` Jeff King @ 2013-03-27 1:30 ` Duy Nguyen 0 siblings, 0 replies; 66+ messages in thread From: Duy Nguyen @ 2013-03-27 1:30 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git, avila.jn On Wed, Mar 27, 2013 at 4:33 AM, Jeff King <peff@peff.net> wrote: > Hmm. match_pathname does have this: > > /* > * baselen does not count the trailing slash. base[] may or > * may not end with a trailing slash though. > */ > if (pathlen < baselen + 1 || > (baselen && pathname[baselen] != '/') || > strncmp_icase(pathname, base, baselen)) > return 0; > > which seems to imply that the trailing slash is important here, and that > we should not drop it when passing the path to match_pathname. I'm > still trying to figure out exactly what it is that the extra slash check > is for, and whether it might not have the same problem. The "may not end with a trailing slash" can only happen when baselen == 0. And that rule is documented in code, at this line in last_exclude_matching_from_list: assert(x->baselen == 0 || x->base[x->baselen - 1] == '/'); -- Duy ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 3/4] attr.c::path_matches(): special case paths that end with a slash 2013-03-26 19:05 ` Jeff King 2013-03-26 21:33 ` Jeff King @ 2013-03-28 19:49 ` Jeff King 1 sibling, 0 replies; 66+ messages in thread From: Jeff King @ 2013-03-28 19:49 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, pclouds, avila.jn On Tue, Mar 26, 2013 at 03:05:58PM -0400, Jeff King wrote: > On Tue, Mar 26, 2013 at 11:39:30AM -0700, Junio C Hamano wrote: > > > A similar adjustment for match_pathname() might be needed, but I > > didn't look into it. > [...] > We do seem to use strncmp_icase through the rest of the function, > though, which should be OK. The one exception is that we call fnmatch at > the end. Should the allocation hack from the previous patch make its way > into an "fnmatch_icase_bytes()" function, so we can use it here, too? > And then when we have a more efficient solution, we can just plug it in > there. Hmm, yeah, there is more going on here than just that. If I add the tests below, the first one (a wildcard) passes, because you fixed the fnmatch code path. But the deep/ ones do not, as they should be going through match_pathname. I expected the deep/with/wildcard one to fail (because of the fnmatch problem I mentioned above), but not the deep/and/slashless one, which should be using strncmp. I'll see if I can track down the cause. -Peff --- diff --git a/t/t5002-archive-attr-pattern.sh b/t/t5002-archive-attr-pattern.sh index 3be809c..234a615 100755 --- a/t/t5002-archive-attr-pattern.sh +++ b/t/t5002-archive-attr-pattern.sh @@ -32,6 +32,21 @@ test_expect_success 'setup' ' git add ignored-without-slash/foo && echo ignored-without-slash export-ignore >>.git/info/attributes && + mkdir -p wildcard-without-slash && + echo "ignored without slash" >wildcard-without-slash/foo && + git add wildcard-without-slash/foo && + echo "wild*-without-slash export-ignore" >>.git/info/attributes && + + mkdir -p deep/and/slashless && + echo "ignored without slash" >deep/and/slashless/foo && + git add deep/and/slashless/foo && + echo deep/and/slashless export-ignore >>.git/info/attributes && + + mkdir -p deep/with/wildcard && + echo "ignored without slash" >deep/with/wildcard/foo && + git add deep/with/wildcard/foo && + echo "deep/*t*/wildcard export-ignore" >>.git/info/attributes && + mkdir -p one-level-lower/two-levels-lower/ignored-only-if-dir && echo ignored by ignored dir >one-level-lower/two-levels-lower/ignored-only-if-dir/ignored-by-ignored-dir && git add one-level-lower && @@ -55,6 +70,12 @@ test_expect_missing archive/ignored-without-slash/foo && test_expect_missing archive/ignored-ony-if-dir/ignored-by-ignored-dir test_expect_missing archive/ignored-without-slash/ && test_expect_missing archive/ignored-without-slash/foo && +test_expect_missing archive/wildcard-without-slash/ +test_expect_missing archive/wildcard-without-slash/foo && +test_expect_missing archive/deep/and/slashless/ && +test_expect_missing archive/deep/and/slashless/foo && +test_expect_missing archive/deep/with/wildcard/ && +test_expect_missing archive/deep/with/wildcard/foo && test_expect_exists archive/one-level-lower/ test_expect_missing archive/one-level-lower/two-levels-lower/ignored-only-if-dir/ test_expect_missing archive/one-level-lower/two-levels-lower/ignored-ony-if-dir/ignored-by-ignored-dir ^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH 4/4] make sure a pattern without trailing slash matches a directory 2013-03-26 18:39 ` [PATCH 0/4] attribute regression fix for maint-1.8.1 and upward Junio C Hamano ` (2 preceding siblings ...) 2013-03-26 18:39 ` [PATCH 3/4] attr.c::path_matches(): special case paths that end with a slash Junio C Hamano @ 2013-03-26 18:39 ` Junio C Hamano 2013-03-26 19:08 ` Jeff King 2013-03-27 1:13 ` [PATCH 0/4] attribute regression fix for maint-1.8.1 and upward Duy Nguyen 2013-03-28 21:43 ` [PATCH v2 0/6] " Jeff King 5 siblings, 1 reply; 66+ messages in thread From: Junio C Hamano @ 2013-03-26 18:39 UTC (permalink / raw) To: git; +Cc: Jeff King, pclouds, avila.jn From: Jeff King <peff@peff.net> Prior to v1.8.1.1, with: git init echo content >foo && mkdir subdir && echo content >subdir/bar && echo "subdir export-ignore" >.gitattributes git add . && git commit -m one && git archive HEAD | tar tf - the resulting archive would contain only "foo" and ".gitattributes", not subdir. This was broken with a recent change that intended to allow "subdir/ export-ignore" to also exclude the directory, but instead ended up _requiring_ the trailing slash by mistake. A pattern "subdir" should match any path "subdir", whether it is a directory or a non-diretory. A pattern "subdir/" insists that a path "subdir" must be a directory for it to match. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- t/t5002-archive-attr-pattern.sh | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/t/t5002-archive-attr-pattern.sh b/t/t5002-archive-attr-pattern.sh index 0c847fb..3be809c 100755 --- a/t/t5002-archive-attr-pattern.sh +++ b/t/t5002-archive-attr-pattern.sh @@ -27,6 +27,10 @@ test_expect_success 'setup' ' echo ignored-only-if-dir/ export-ignore >>.git/info/attributes && git add ignored-only-if-dir && + mkdir -p ignored-without-slash && + echo ignored without slash >ignored-without-slash/foo && + git add ignored-without-slash/foo && + echo ignored-without-slash export-ignore >>.git/info/attributes && mkdir -p one-level-lower/two-levels-lower/ignored-only-if-dir && echo ignored by ignored dir >one-level-lower/two-levels-lower/ignored-only-if-dir/ignored-by-ignored-dir && @@ -49,6 +53,8 @@ test_expect_exists archive/not-ignored-dir/ignored-only-if-dir test_expect_exists archive/not-ignored-dir/ test_expect_missing archive/ignored-only-if-dir/ test_expect_missing archive/ignored-ony-if-dir/ignored-by-ignored-dir +test_expect_missing archive/ignored-without-slash/ && +test_expect_missing archive/ignored-without-slash/foo && test_expect_exists archive/one-level-lower/ test_expect_missing archive/one-level-lower/two-levels-lower/ignored-only-if-dir/ test_expect_missing archive/one-level-lower/two-levels-lower/ignored-ony-if-dir/ignored-by-ignored-dir -- 1.8.2-350-g3df87a1 ^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH 4/4] make sure a pattern without trailing slash matches a directory 2013-03-26 18:39 ` [PATCH 4/4] make sure a pattern without trailing slash matches a directory Junio C Hamano @ 2013-03-26 19:08 ` Jeff King 0 siblings, 0 replies; 66+ messages in thread From: Jeff King @ 2013-03-26 19:08 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, pclouds, avila.jn On Tue, Mar 26, 2013 at 11:39:31AM -0700, Junio C Hamano wrote: > From: Jeff King <peff@peff.net> > > Prior to v1.8.1.1, with: > > git init > echo content >foo && > mkdir subdir && > echo content >subdir/bar && > echo "subdir export-ignore" >.gitattributes > git add . && > git commit -m one && > git archive HEAD | tar tf - > > the resulting archive would contain only "foo" and ".gitattributes", > not subdir. This was broken with a recent change that intended to > allow "subdir/ export-ignore" to also exclude the directory, but > instead ended up _requiring_ the trailing slash by mistake. Yeah, I think that is fine. I'd just squash this test and description into the previous patch, though (I do not care about dropping my commit count by 1). And of course, Signed-off-by: Jeff King <peff@peff.net> > A pattern "subdir" should match any path "subdir", whether it is a > directory or a non-diretory. A pattern "subdir/" insists that a > path "subdir" must be a directory for it to match. s/diretory/directory/ -Peff ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 0/4] attribute regression fix for maint-1.8.1 and upward 2013-03-26 18:39 ` [PATCH 0/4] attribute regression fix for maint-1.8.1 and upward Junio C Hamano ` (3 preceding siblings ...) 2013-03-26 18:39 ` [PATCH 4/4] make sure a pattern without trailing slash matches a directory Junio C Hamano @ 2013-03-27 1:13 ` Duy Nguyen 2013-03-27 3:57 ` Junio C Hamano 2013-03-28 21:43 ` [PATCH v2 0/6] " Jeff King 5 siblings, 1 reply; 66+ messages in thread From: Duy Nguyen @ 2013-03-27 1:13 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, peff, avila.jn qOn Tue, Mar 26, 2013 at 11:39:27AM -0700, Junio C Hamano wrote: > So here is an attempt to fix the unintended regression, on top of > 9db9eecfe5c2 (attr: avoid calling find_basename() twice per path, > 2013-01-16). It consists of four patches. Not that I disagree with this. Just wanted to see how far the "dtype" idea went. How about this? git_check_attr() now takes dtype as an argument and the caller must not add the trailing slash. This could be split into two patches, one for git_check_attr prototype change, and the other the real meat. -- 8< -- diff --git a/archive.c b/archive.c index 93e00bb..ab811b3 100644 --- a/archive.c +++ b/archive.c @@ -112,7 +112,7 @@ static int write_archive_entry(const unsigned char *sha1, const char *base, write_archive_entry_fn_t write_entry = c->write_entry; struct git_attr_check check[2]; const char *path_without_prefix; - int err; + int err, dtype; args->convert = 0; strbuf_reset(&path); @@ -120,18 +120,18 @@ static int write_archive_entry(const unsigned char *sha1, const char *base, strbuf_add(&path, args->base, args->baselen); strbuf_add(&path, base, baselen); strbuf_addstr(&path, filename); - if (S_ISDIR(mode) || S_ISGITLINK(mode)) - strbuf_addch(&path, '/'); + dtype = S_ISDIR(mode) || S_ISGITLINK(mode) ? DT_DIR : DT_REG; path_without_prefix = path.buf + args->baselen; setup_archive_check(check); - if (!git_check_attr(path_without_prefix, ARRAY_SIZE(check), check)) { + if (!git_check_attr(path_without_prefix, dtype, ARRAY_SIZE(check), check)) { if (ATTR_TRUE(check[0].value)) return 0; args->convert = ATTR_TRUE(check[1].value); } if (S_ISDIR(mode) || S_ISGITLINK(mode)) { + strbuf_addch(&path, '/'); if (args->verbose) fprintf(stderr, "%.*s\n", (int)path.len, path.buf); err = write_entry(args, sha1, path.buf, path.len, mode); diff --git a/attr.c b/attr.c index e2f9377..ab6422c 100644 --- a/attr.c +++ b/attr.c @@ -255,6 +255,8 @@ static struct match_attr *parse_attr_line(const char *line, const char *src, &res->u.pat.patternlen, &res->u.pat.flags, &res->u.pat.nowildcardlen); + if (res->u.pat.flags & EXC_FLAG_MUSTBEDIR) + p[res->u.pat.patternlen] = '\0'; if (res->u.pat.flags & EXC_FLAG_NEGATIVE) { warning(_("Negative patterns are ignored in git attributes\n" "Use '\\!' for literal leading exclamation.")); @@ -657,15 +659,14 @@ static void prepare_attr_stack(const char *path, int dirlen) } static int path_matches(const char *pathname, int pathlen, - const char *basename, + const char *basename, int dtype, const struct pattern *pat, const char *base, int baselen) { const char *pattern = pat->pattern; int prefix = pat->nowildcardlen; - if ((pat->flags & EXC_FLAG_MUSTBEDIR) && - ((!pathlen) || (pathname[pathlen-1] != '/'))) + if ((pat->flags & EXC_FLAG_MUSTBEDIR) && dtype != DT_DIR) return 0; if (pat->flags & EXC_FLAG_NODIR) { @@ -704,7 +705,7 @@ static int fill_one(const char *what, struct match_attr *a, int rem) } static int fill(const char *path, int pathlen, const char *basename, - struct attr_stack *stk, int rem) + int dtype, struct attr_stack *stk, int rem) { int i; const char *base = stk->origin ? stk->origin : ""; @@ -713,7 +714,7 @@ static int fill(const char *path, int pathlen, const char *basename, struct match_attr *a = stk->attrs[i]; if (a->is_macro) continue; - if (path_matches(path, pathlen, basename, + if (path_matches(path, pathlen, basename, dtype, &a->u.pat, base, stk->originlen)) rem = fill_one("fill", a, rem); } @@ -748,20 +749,17 @@ static int macroexpand_one(int attr_nr, int rem) * Collect all attributes for path into the array pointed to by * check_all_attr. */ -static void collect_all_attrs(const char *path) +static void collect_all_attrs(const char *path, int dtype) { struct attr_stack *stk; int i, pathlen, rem, dirlen; - const char *basename, *cp, *last_slash = NULL; + const char *basename, *cp; - for (cp = path; *cp; cp++) { - if (*cp == '/' && cp[1]) - last_slash = cp; - } - pathlen = cp - path; - if (last_slash) { - basename = last_slash + 1; - dirlen = last_slash - path; + cp = strrchr(path, '/'); + pathlen = strlen(path); + if (cp) { + basename = cp + 1; + dirlen = cp - path; } else { basename = path; dirlen = 0; @@ -773,14 +771,14 @@ static void collect_all_attrs(const char *path) rem = attr_nr; for (stk = attr_stack; 0 < rem && stk; stk = stk->prev) - rem = fill(path, pathlen, basename, stk, rem); + rem = fill(path, pathlen, basename, dtype, stk, rem); } -int git_check_attr(const char *path, int num, struct git_attr_check *check) +int git_check_attr(const char *path, int dtype, int num, struct git_attr_check *check) { int i; - collect_all_attrs(path); + collect_all_attrs(path, dtype); for (i = 0; i < num; i++) { const char *value = check_all_attr[check[i].attr->attr_nr].value; @@ -796,7 +794,7 @@ int git_all_attrs(const char *path, int *num, struct git_attr_check **check) { int i, count, j; - collect_all_attrs(path); + collect_all_attrs(path, DT_REG); /* Count the number of attributes that are set. */ count = 0; diff --git a/attr.h b/attr.h index 8b08d33..ce39c9c 100644 --- a/attr.h +++ b/attr.h @@ -36,7 +36,7 @@ struct git_attr_check { */ char *git_attr_name(struct git_attr *); -int git_check_attr(const char *path, int, struct git_attr_check *); +int git_check_attr(const char *path, int, int, struct git_attr_check *); /* * Retrieve all attributes that apply to the specified path. *num diff --git a/builtin/check-attr.c b/builtin/check-attr.c index 075d01d..261e57d 100644 --- a/builtin/check-attr.c +++ b/builtin/check-attr.c @@ -49,7 +49,7 @@ static void check_attr(const char *prefix, int cnt, char *full_path = prefix_path(prefix, prefix ? strlen(prefix) : 0, file); if (check != NULL) { - if (git_check_attr(full_path, cnt, check)) + if (git_check_attr(full_path, DT_REG, cnt, check)) die("git_check_attr died"); output_attr(cnt, check, file); } else { diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index f069462..7a77288 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -894,7 +894,7 @@ static int no_try_delta(const char *path) struct git_attr_check check[1]; setup_delta_attr_check(check); - if (git_check_attr(path, ARRAY_SIZE(check), check)) + if (git_check_attr(path, DT_REG, ARRAY_SIZE(check), check)) return 0; if (ATTR_FALSE(check->value)) return 1; diff --git a/convert.c b/convert.c index 3520252..3f09cbb 100644 --- a/convert.c +++ b/convert.c @@ -755,7 +755,7 @@ static void convert_attrs(struct conv_attrs *ca, const char *path) git_config(read_convert_config, NULL); } - if (!git_check_attr(path, NUM_CONV_ATTRS, ccheck)) { + if (!git_check_attr(path, DT_REG, NUM_CONV_ATTRS, ccheck)) { ca->crlf_action = git_path_check_crlf(path, ccheck + 4); if (ca->crlf_action == CRLF_GUESS) ca->crlf_action = git_path_check_crlf(path, ccheck + 0); diff --git a/ll-merge.c b/ll-merge.c index fb61ea6..1944392 100644 --- a/ll-merge.c +++ b/ll-merge.c @@ -342,7 +342,7 @@ static int git_path_check_merge(const char *path, struct git_attr_check check[2] check[0].attr = git_attr("merge"); check[1].attr = git_attr("conflict-marker-size"); } - return git_check_attr(path, 2, check); + return git_check_attr(path, DT_REG, 2, check); } static void normalize_file(mmfile_t *mm, const char *path) @@ -399,7 +399,7 @@ int ll_merge_marker_size(const char *path) if (!check.attr) check.attr = git_attr("conflict-marker-size"); - if (!git_check_attr(path, 1, &check) && check.value) { + if (!git_check_attr(path, DT_REG, 1, &check) && check.value) { marker_size = atoi(check.value); if (marker_size <= 0) marker_size = DEFAULT_CONFLICT_MARKER_SIZE; diff --git a/t/t5002-archive-attr-pattern.sh b/t/t5002-archive-attr-pattern.sh index 0c847fb..98ccc3c 100755 --- a/t/t5002-archive-attr-pattern.sh +++ b/t/t5002-archive-attr-pattern.sh @@ -27,6 +27,10 @@ test_expect_success 'setup' ' echo ignored-only-if-dir/ export-ignore >>.git/info/attributes && git add ignored-only-if-dir && + mkdir -p ignored-without-slash && + echo ignored without slash >ignored-without-slash/foo && + git add ignored-without-slash/foo && + echo ignored-without-slash export-ignore >>.git/info/attributes && mkdir -p one-level-lower/two-levels-lower/ignored-only-if-dir && echo ignored by ignored dir >one-level-lower/two-levels-lower/ignored-only-if-dir/ignored-by-ignored-dir && @@ -49,6 +53,8 @@ test_expect_exists archive/not-ignored-dir/ignored-only-if-dir test_expect_exists archive/not-ignored-dir/ test_expect_missing archive/ignored-only-if-dir/ test_expect_missing archive/ignored-ony-if-dir/ignored-by-ignored-dir +test_expect_missing archive/ignored-without-slash/ && +test_expect_missing archive/ignored-without-slash/foo && test_expect_exists archive/one-level-lower/ test_expect_missing archive/one-level-lower/two-levels-lower/ignored-only-if-dir/ test_expect_missing archive/one-level-lower/two-levels-lower/ignored-ony-if-dir/ignored-by-ignored-dir diff --git a/userdiff.c b/userdiff.c index ea43a03..fd4f576 100644 --- a/userdiff.c +++ b/userdiff.c @@ -260,7 +260,7 @@ struct userdiff_driver *userdiff_find_by_path(const char *path) if (!path) return NULL; - if (git_check_attr(path, 1, &check)) + if (git_check_attr(path, DT_REG, 1, &check)) return NULL; if (ATTR_TRUE(check.value)) diff --git a/ws.c b/ws.c index b498d75..34c2145 100644 --- a/ws.c +++ b/ws.c @@ -88,7 +88,7 @@ unsigned whitespace_rule(const char *pathname) struct git_attr_check attr_whitespace_rule; setup_whitespace_attr_check(&attr_whitespace_rule); - if (!git_check_attr(pathname, 1, &attr_whitespace_rule)) { + if (!git_check_attr(pathname, DT_REG, 1, &attr_whitespace_rule)) { const char *value; value = attr_whitespace_rule.value; -- 8< -- ^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH 0/4] attribute regression fix for maint-1.8.1 and upward 2013-03-27 1:13 ` [PATCH 0/4] attribute regression fix for maint-1.8.1 and upward Duy Nguyen @ 2013-03-27 3:57 ` Junio C Hamano 2013-03-27 4:01 ` Duy Nguyen 0 siblings, 1 reply; 66+ messages in thread From: Junio C Hamano @ 2013-03-27 3:57 UTC (permalink / raw) To: Duy Nguyen; +Cc: git, peff, avila.jn Duy Nguyen <pclouds@gmail.com> writes: > How about this? git_check_attr() now takes dtype as an argument > and the caller must not add the trailing slash. This could be > split into two patches, one for git_check_attr prototype change, > and the other the real meat. "git check-attr" fundamentally cannot know, but aside from that do all the callsites know if the path in question is a directory or not? My impression was that there are some cases you do not necessarily know. "Add slash when you _know_ it is a directory, but otherwise pass the path without trailing slash." is easier to understand than "Pass 040000 if you know it is a directory, but otherwise pass 100644", exactly because "otherwise" in both of these instructions include the case where the path in question _is_ a directory (you just do not know what it is). I do not particularly like the "trailing slash on the basename" approach, but it feels less bad than passing dtype down. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 0/4] attribute regression fix for maint-1.8.1 and upward 2013-03-27 3:57 ` Junio C Hamano @ 2013-03-27 4:01 ` Duy Nguyen 0 siblings, 0 replies; 66+ messages in thread From: Duy Nguyen @ 2013-03-27 4:01 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, peff, avila.jn On Wed, Mar 27, 2013 at 10:57 AM, Junio C Hamano <gitster@pobox.com> wrote: > Duy Nguyen <pclouds@gmail.com> writes: > >> How about this? git_check_attr() now takes dtype as an argument >> and the caller must not add the trailing slash. This could be >> split into two patches, one for git_check_attr prototype change, >> and the other the real meat. > > "git check-attr" fundamentally cannot know, but aside from that do > all the callsites know if the path in question is a directory or > not? My impression was that there are some cases you do not > necessarily know. > > "Add slash when you _know_ it is a directory, but otherwise pass the > path without trailing slash." is easier to understand than "Pass > 040000 if you know it is a directory, but otherwise pass 100644", > exactly because "otherwise" in both of these instructions include > the case where the path in question _is_ a directory (you just do > not know what it is). > > I do not particularly like the "trailing slash on the basename" > approach, but it feels less bad than passing dtype down. Fair enough. I'll rebase my changes on top of yours as long term cleanup. Maybe I can make nwildmatch take patternlen too. -- Duy ^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH v2 0/6] attribute regression fix for maint-1.8.1 and upward 2013-03-26 18:39 ` [PATCH 0/4] attribute regression fix for maint-1.8.1 and upward Junio C Hamano ` (4 preceding siblings ...) 2013-03-27 1:13 ` [PATCH 0/4] attribute regression fix for maint-1.8.1 and upward Duy Nguyen @ 2013-03-28 21:43 ` Jeff King 2013-03-28 21:45 ` [PATCH 1/6] attr.c::path_matches(): the basename is part of the pathname Jeff King ` (5 more replies) 5 siblings, 6 replies; 66+ messages in thread From: Jeff King @ 2013-03-28 21:43 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, pclouds, avila.jn On Tue, Mar 26, 2013 at 11:39:27AM -0700, Junio C Hamano wrote: > So here is an attempt to fix the unintended regression, on top of > 9db9eecfe5c2 (attr: avoid calling find_basename() twice per path, > 2013-01-16). It consists of four patches. Here's my update to the series. I think this should fix all of the issues. And it should be very easy to drop in Duy's nwildmatch later on; it can just replace the fnmatch_icase_mem function added in patch 2 below. The main fix in this iteration is that match_pathname receives the same treatment as match_basename, which is done in patches 3 and 4 (the issues were subtly different enough that I didn't want to squash it all together; plus, gotta keep that commit count up). [1/6]: attr.c::path_matches(): the basename is part of the pathname [2/6]: dir.c::match_basename(): pay attention to the length of string parameters [3/6]: dir.c::match_pathname(): adjust patternlen when shifting pattern [4/6]: dir.c::match_pathname(): pay attention to the length of string parameters [5/6]: attr.c::path_matches(): special case paths that end with a slash [6/6]: t: check that a pattern without trailing slash matches a directory -Peff PS I followed your subject-naming convention since I was adding into your series, but it seems quite long to me. I would have just said: "match_basename: pay attention...". ^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH 1/6] attr.c::path_matches(): the basename is part of the pathname 2013-03-28 21:43 ` [PATCH v2 0/6] " Jeff King @ 2013-03-28 21:45 ` Jeff King 2013-03-28 21:47 ` [PATCH 2/6] dir.c::match_basename(): pay attention to the length of string parameters Jeff King ` (4 subsequent siblings) 5 siblings, 0 replies; 66+ messages in thread From: Jeff King @ 2013-03-28 21:45 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, pclouds, avila.jn From: Junio C Hamano <gitster@pobox.com> The function takes two strings (pathname and basename) as if they are independent strings, but in reality, the latter is always pointing into a substring in the former. Clarify this relationship by expressing the latter as an offset into the former. Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jeff King <peff@peff.net> --- This is identical to the original 1/4. attr.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/attr.c b/attr.c index ab2aab2..4cfe0ee 100644 --- a/attr.c +++ b/attr.c @@ -655,7 +655,7 @@ static int path_matches(const char *pathname, int pathlen, } static int path_matches(const char *pathname, int pathlen, - const char *basename, + int basename_offset, const struct pattern *pat, const char *base, int baselen) { @@ -667,8 +667,8 @@ static int path_matches(const char *pathname, int pathlen, return 0; if (pat->flags & EXC_FLAG_NODIR) { - return match_basename(basename, - pathlen - (basename - pathname), + return match_basename(pathname + basename_offset, + pathlen - basename_offset, pattern, prefix, pat->patternlen, pat->flags); } @@ -701,7 +701,7 @@ static int fill_one(const char *what, struct match_attr *a, int rem) return rem; } -static int fill(const char *path, int pathlen, const char *basename, +static int fill(const char *path, int pathlen, int basename_offset, struct attr_stack *stk, int rem) { int i; @@ -711,7 +711,7 @@ static int fill(const char *path, int pathlen, const char *basename, struct match_attr *a = stk->attrs[i]; if (a->is_macro) continue; - if (path_matches(path, pathlen, basename, + if (path_matches(path, pathlen, basename_offset, &a->u.pat, base, stk->originlen)) rem = fill_one("fill", a, rem); } @@ -750,7 +750,8 @@ static void collect_all_attrs(const char *path) { struct attr_stack *stk; int i, pathlen, rem, dirlen; - const char *basename, *cp, *last_slash = NULL; + const char *cp, *last_slash = NULL; + int basename_offset; for (cp = path; *cp; cp++) { if (*cp == '/' && cp[1]) @@ -758,10 +759,10 @@ static void collect_all_attrs(const char *path) } pathlen = cp - path; if (last_slash) { - basename = last_slash + 1; + basename_offset = last_slash + 1 - path; dirlen = last_slash - path; } else { - basename = path; + basename_offset = 0; dirlen = 0; } @@ -771,7 +772,7 @@ static void collect_all_attrs(const char *path) rem = attr_nr; for (stk = attr_stack; 0 < rem && stk; stk = stk->prev) - rem = fill(path, pathlen, basename, stk, rem); + rem = fill(path, pathlen, basename_offset, stk, rem); } int git_check_attr(const char *path, int num, struct git_attr_check *check) -- 1.8.2.13.g0f18d3c ^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH 2/6] dir.c::match_basename(): pay attention to the length of string parameters 2013-03-28 21:43 ` [PATCH v2 0/6] " Jeff King 2013-03-28 21:45 ` [PATCH 1/6] attr.c::path_matches(): the basename is part of the pathname Jeff King @ 2013-03-28 21:47 ` Jeff King 2013-03-28 22:40 ` Jeff King 2013-03-29 1:25 ` Duy Nguyen 2013-03-28 21:47 ` [PATCH 3/6] dir.c::match_pathname(): adjust patternlen when shifting pattern Jeff King ` (3 subsequent siblings) 5 siblings, 2 replies; 66+ messages in thread From: Jeff King @ 2013-03-28 21:47 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, pclouds, avila.jn From: Junio C Hamano <gitster@pobox.com> The function takes two counted strings (<basename, basenamelen> and <pattern, patternlen>) as parameters, together with prefix (the length of the prefix in pattern that is to be matched literally without globbing against the basename) and EXC_* flags that tells it how to match the pattern against the basename. However, it did not pay attention to the length of these counted strings. Update them to do the following: * When the entire pattern is to be matched literally, the pattern matches the basename only when the lengths of them are the same, and they match up to that length. * When the pattern is "*" followed by a string to be matched literally, make sure that the basenamelen is equal or longer than the "literal" part of the pattern, and the tail of the basename string matches that literal part. * Otherwise, use the new fnmatch_icase_mem helper to make sure we only lookmake sure we use only look at the counted part of the strings. Because these counted strings are full strings most of the time, we check for termination to avoid unnecessary allocation. Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jeff King <peff@peff.net> --- Compared to v1: - This factors the fnmatch bits into a helper function so we can reuse it later. As a result, the variable names are changed a bit. - The original did: if (use_pat) strbuf_release(&pat); but AFAICT that was a useless conditional; use_pat always points to either the incoming buffer or the strbuf. But strbuf_release will handle both cases for us. dir.c | 40 ++++++++++++++++++++++++++++++++++++---- 1 file changed, 36 insertions(+), 4 deletions(-) diff --git a/dir.c b/dir.c index 5a83aa7..fac82c1 100644 --- a/dir.c +++ b/dir.c @@ -34,6 +34,33 @@ int fnmatch_icase(const char *pattern, const char *string, int flags) return fnmatch(pattern, string, flags | (ignore_case ? FNM_CASEFOLD : 0)); } +static int fnmatch_icase_mem(const char *pattern, int patternlen, + const char *string, int stringlen, + int flags) +{ + int match_status; + struct strbuf pat_buf = STRBUF_INIT; + struct strbuf str_buf = STRBUF_INIT; + const char *use_pat = pattern; + const char *use_str = string; + + if (pattern[patternlen]) { + strbuf_add(&pat_buf, pattern, patternlen); + use_pat = pat_buf.buf; + } + if (string[stringlen]) { + strbuf_add(&str_buf, string, stringlen); + use_str = str_buf.buf; + } + + match_status = fnmatch_icase(use_pat, use_str, 0); + + strbuf_release(&pat_buf); + strbuf_release(&str_buf); + + return match_status; +} + static size_t common_prefix_len(const char **pathspec) { const char *n, *first; @@ -537,15 +564,20 @@ int match_basename(const char *basename, int basenamelen, int flags) { if (prefix == patternlen) { - if (!strcmp_icase(pattern, basename)) + if (patternlen == basenamelen && + !strncmp_icase(pattern, basename, basenamelen)) return 1; } else if (flags & EXC_FLAG_ENDSWITH) { + /* "*literal" matching against "fooliteral" */ if (patternlen - 1 <= basenamelen && - !strcmp_icase(pattern + 1, - basename + basenamelen - patternlen + 1)) + !strncmp_icase(pattern + 1, + basename + basenamelen - (patternlen - 1), + patternlen - 1)) return 1; } else { - if (fnmatch_icase(pattern, basename, 0) == 0) + if (fnmatch_icase_mem(pattern, patternlen, + basename, basenamelen, + 0) == 0) return 1; } return 0; -- 1.8.2.13.g0f18d3c ^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH 2/6] dir.c::match_basename(): pay attention to the length of string parameters 2013-03-28 21:47 ` [PATCH 2/6] dir.c::match_basename(): pay attention to the length of string parameters Jeff King @ 2013-03-28 22:40 ` Jeff King 2013-03-28 22:49 ` Jeff King 2013-03-29 1:25 ` Duy Nguyen 1 sibling, 1 reply; 66+ messages in thread From: Jeff King @ 2013-03-28 22:40 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, pclouds, avila.jn On Thu, Mar 28, 2013 at 05:47:28PM -0400, Jeff King wrote: > From: Junio C Hamano <gitster@pobox.com> > > The function takes two counted strings (<basename, basenamelen> and > <pattern, patternlen>) as parameters, together with prefix (the > length of the prefix in pattern that is to be matched literally > without globbing against the basename) and EXC_* flags that tells it > how to match the pattern against the basename. > > However, it did not pay attention to the length of these counted > strings. Update them to do the following: > > * When the entire pattern is to be matched literally, the pattern > matches the basename only when the lengths of them are the same, > and they match up to that length. Hrm. Though the tip of this series passes all tests, this one actually breaks bisectability. What happens is that the existing code passes: path=foo/ pathlen=4 pattern=foo/ patternlen=3 match_basename is happy to compare "foo/" to "foo/" and realize they're equal. With this change, we compare "foo" to "foo/" and do not match. It isn't until the later patch where you start passing pathlen=3 that it works again. I wonder if it is worth reordering the series to put the path_matches fix first. -Peff ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/6] dir.c::match_basename(): pay attention to the length of string parameters 2013-03-28 22:40 ` Jeff King @ 2013-03-28 22:49 ` Jeff King 2013-03-28 23:10 ` Junio C Hamano 2013-03-28 23:40 ` Duy Nguyen 0 siblings, 2 replies; 66+ messages in thread From: Jeff King @ 2013-03-28 22:49 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, pclouds, avila.jn On Thu, Mar 28, 2013 at 06:40:27PM -0400, Jeff King wrote: > On Thu, Mar 28, 2013 at 05:47:28PM -0400, Jeff King wrote: > > > From: Junio C Hamano <gitster@pobox.com> > > > > The function takes two counted strings (<basename, basenamelen> and > > <pattern, patternlen>) as parameters, together with prefix (the > > length of the prefix in pattern that is to be matched literally > > without globbing against the basename) and EXC_* flags that tells it > > how to match the pattern against the basename. > > > > However, it did not pay attention to the length of these counted > > strings. Update them to do the following: > > > > * When the entire pattern is to be matched literally, the pattern > > matches the basename only when the lengths of them are the same, > > and they match up to that length. > > Hrm. Though the tip of this series passes all tests, this one actually > breaks bisectability. What happens is that the existing code passes: Ugh. That is a problem, but this series does _not_ pass all tests. I think I failed to run the complete test suite on the correct tip. My match_pathspec fix breaks at least t1011. -Peff ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/6] dir.c::match_basename(): pay attention to the length of string parameters 2013-03-28 22:49 ` Jeff King @ 2013-03-28 23:10 ` Junio C Hamano 2013-03-28 23:40 ` Duy Nguyen 1 sibling, 0 replies; 66+ messages in thread From: Junio C Hamano @ 2013-03-28 23:10 UTC (permalink / raw) To: Jeff King; +Cc: git, pclouds, avila.jn Jeff King <peff@peff.net> writes: > On Thu, Mar 28, 2013 at 06:40:27PM -0400, Jeff King wrote: > >> On Thu, Mar 28, 2013 at 05:47:28PM -0400, Jeff King wrote: >> >> > From: Junio C Hamano <gitster@pobox.com> >> > >> > The function takes two counted strings (<basename, basenamelen> and >> > <pattern, patternlen>) as parameters, together with prefix (the >> > length of the prefix in pattern that is to be matched literally >> > without globbing against the basename) and EXC_* flags that tells it >> > how to match the pattern against the basename. >> > >> > However, it did not pay attention to the length of these counted >> > strings. Update them to do the following: >> > >> > * When the entire pattern is to be matched literally, the pattern >> > matches the basename only when the lengths of them are the same, >> > and they match up to that length. >> >> Hrm. Though the tip of this series passes all tests, this one actually >> breaks bisectability. What happens is that the existing code passes: > > Ugh. That is a problem, but this series does _not_ pass all tests. I > think I failed to run the complete test suite on the correct tip. > > My match_pathspec fix breaks at least t1011. Yeah, the tip of 'jch' (slightly ahead of 'next' that I use myself) has 0003, 1011 and 3001 broken X-<. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/6] dir.c::match_basename(): pay attention to the length of string parameters 2013-03-28 22:49 ` Jeff King 2013-03-28 23:10 ` Junio C Hamano @ 2013-03-28 23:40 ` Duy Nguyen 1 sibling, 0 replies; 66+ messages in thread From: Duy Nguyen @ 2013-03-28 23:40 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git, avila.jn On Fri, Mar 29, 2013 at 5:49 AM, Jeff King <peff@peff.net> wrote: > My match_pathspec fix breaks at least t1011. Haven't looked closely at the series, but I suspect you need this http://article.gmane.org/gmane.comp.version-control.git/219008 -- Duy ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/6] dir.c::match_basename(): pay attention to the length of string parameters 2013-03-28 21:47 ` [PATCH 2/6] dir.c::match_basename(): pay attention to the length of string parameters Jeff King 2013-03-28 22:40 ` Jeff King @ 2013-03-29 1:25 ` Duy Nguyen 2013-03-29 3:02 ` Jeff King 1 sibling, 1 reply; 66+ messages in thread From: Duy Nguyen @ 2013-03-29 1:25 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git, avila.jn On Fri, Mar 29, 2013 at 4:47 AM, Jeff King <peff@peff.net> wrote: > +static int fnmatch_icase_mem(const char *pattern, int patternlen, > + const char *string, int stringlen, > + int flags) > +{ > + int match_status; > + struct strbuf pat_buf = STRBUF_INIT; > + struct strbuf str_buf = STRBUF_INIT; > + const char *use_pat = pattern; > + const char *use_str = string; > + > + if (pattern[patternlen]) { > + strbuf_add(&pat_buf, pattern, patternlen); > + use_pat = pat_buf.buf; > + } > + if (string[stringlen]) { > + strbuf_add(&str_buf, string, stringlen); > + use_str = str_buf.buf; > + } > + > + match_status = fnmatch_icase(use_pat, use_str, 0); You should pass flags in here instead of 0. -- Duy ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/6] dir.c::match_basename(): pay attention to the length of string parameters 2013-03-29 1:25 ` Duy Nguyen @ 2013-03-29 3:02 ` Jeff King 2013-03-29 5:57 ` Junio C Hamano 0 siblings, 1 reply; 66+ messages in thread From: Jeff King @ 2013-03-29 3:02 UTC (permalink / raw) To: Duy Nguyen; +Cc: Junio C Hamano, git, avila.jn On Fri, Mar 29, 2013 at 08:25:00AM +0700, Nguyen Thai Ngoc Duy wrote: > On Fri, Mar 29, 2013 at 4:47 AM, Jeff King <peff@peff.net> wrote: > > +static int fnmatch_icase_mem(const char *pattern, int patternlen, > > + const char *string, int stringlen, > > + int flags) > > +{ > > + int match_status; > > + struct strbuf pat_buf = STRBUF_INIT; > > + struct strbuf str_buf = STRBUF_INIT; > > + const char *use_pat = pattern; > > + const char *use_str = string; > > + > > + if (pattern[patternlen]) { > > + strbuf_add(&pat_buf, pattern, patternlen); > > + use_pat = pat_buf.buf; > > + } > > + if (string[stringlen]) { > > + strbuf_add(&str_buf, string, stringlen); > > + use_str = str_buf.buf; > > + } > > + > > + match_status = fnmatch_icase(use_pat, use_str, 0); > > You should pass flags in here instead of 0. Eek, yeah, that's obviously wrong. Thanks for catching it. Fixing that clears up all of the test failures outside of t5002. And if you move patch 5 ("special case paths that end with a slash") into position 2, it cleans up the mid-series failures of t5002, making the series clean for later bisecting. Thanks for looking it over. -Peff ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/6] dir.c::match_basename(): pay attention to the length of string parameters 2013-03-29 3:02 ` Jeff King @ 2013-03-29 5:57 ` Junio C Hamano 0 siblings, 0 replies; 66+ messages in thread From: Junio C Hamano @ 2013-03-29 5:57 UTC (permalink / raw) To: Jeff King; +Cc: Duy Nguyen, git, avila.jn Jeff King <peff@peff.net> writes: > Eek, yeah, that's obviously wrong. Thanks for catching it. Fixing that > clears up all of the test failures outside of t5002. > > And if you move patch 5 ("special case paths that end with a slash") > into position 2, it cleans up the mid-series failures of t5002, making > the series clean for later bisecting. Yeah, the tip becomes test-clean with that "flags" change, but it appears that there seems to be some mismerge (there is another topic or two in flight that want to touch dir.c for pathspecs) that breaks 0003 and 3001 when merged to 'pu'. Duy, can you take another look? ^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH 3/6] dir.c::match_pathname(): adjust patternlen when shifting pattern 2013-03-28 21:43 ` [PATCH v2 0/6] " Jeff King 2013-03-28 21:45 ` [PATCH 1/6] attr.c::path_matches(): the basename is part of the pathname Jeff King 2013-03-28 21:47 ` [PATCH 2/6] dir.c::match_basename(): pay attention to the length of string parameters Jeff King @ 2013-03-28 21:47 ` Jeff King 2013-03-28 21:48 ` [PATCH 4/6] dir.c::match_pathname(): pay attention to the length of string parameters Jeff King ` (2 subsequent siblings) 5 siblings, 0 replies; 66+ messages in thread From: Jeff King @ 2013-03-28 21:47 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, pclouds, avila.jn If we receive a pattern that starts with "/", we shift it forward to avoid looking at the "/" part. Since the prefix and patternlen parameters are counts of what is in the pattern, we must decrement them as we increment the pointer. We remembered to handle prefix, but not patternlen. This didn't cause any bugs, though, because the patternlen parameter is not actually used. Since it will be used in future patches, let's correct this oversight. Signed-off-by: Jeff King <peff@peff.net> --- New in this iteration. dir.c | 1 + 1 file changed, 1 insertion(+) diff --git a/dir.c b/dir.c index fac82c1..cc4ce8b 100644 --- a/dir.c +++ b/dir.c @@ -597,6 +597,7 @@ int match_pathname(const char *pathname, int pathlen, */ if (*pattern == '/') { pattern++; + patternlen--; prefix--; } -- 1.8.2.13.g0f18d3c ^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH 4/6] dir.c::match_pathname(): pay attention to the length of string parameters 2013-03-28 21:43 ` [PATCH v2 0/6] " Jeff King ` (2 preceding siblings ...) 2013-03-28 21:47 ` [PATCH 3/6] dir.c::match_pathname(): adjust patternlen when shifting pattern Jeff King @ 2013-03-28 21:48 ` Jeff King 2013-03-28 22:30 ` Junio C Hamano 2013-03-29 8:45 ` Duy Nguyen 2013-03-28 21:49 ` [PATCH 5/6] attr.c::path_matches(): special case paths that end with a slash Jeff King 2013-03-28 21:50 ` [PATCH 6/6] t: check that a pattern without trailing slash matches a directory Jeff King 5 siblings, 2 replies; 66+ messages in thread From: Jeff King @ 2013-03-28 21:48 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, pclouds, avila.jn This function takes two counted strings: a <pattern, patternlen> pair and a <pathname, pathlen> pair. But we end up feeding the result to fnmatch, which expects NUL-terminated strings. We can fix this by calling the fnmatch_icase_mem function, which handles re-allocating into a NUL-terminated string if necessary. While we're at it, we can avoid even calling fnmatch in some cases. In addition to patternlen, we get "prefix", the size of the pattern that contains no wildcard characters. We do a straight match of the prefix part first, and then use fnmatch to cover the rest. But if there are no wildcards in the pattern at all, we do not even need to call fnmatch; we would simply be comparing two empty strings. Signed-off-by: Jeff King <peff@peff.net> --- New in this iteration. dir.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/dir.c b/dir.c index cc4ce8b..3ad44c3 100644 --- a/dir.c +++ b/dir.c @@ -624,11 +624,22 @@ int match_pathname(const char *pathname, int pathlen, if (strncmp_icase(pattern, name, prefix)) return 0; pattern += prefix; + patternlen -= prefix; name += prefix; namelen -= prefix; + + /* + * If the whole pattern did not have a wildcard, + * then our prefix match is all we need; we + * do not need to call fnmatch at all. + */ + if (!patternlen && !namelen) + return 1; } - return fnmatch_icase(pattern, name, FNM_PATHNAME) == 0; + return fnmatch_icase_mem(pattern, patternlen, + name, namelen, + FNM_PATHNAME) == 0; } /* Scan the list and let the last match determine the fate. -- 1.8.2.13.g0f18d3c ^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH 4/6] dir.c::match_pathname(): pay attention to the length of string parameters 2013-03-28 21:48 ` [PATCH 4/6] dir.c::match_pathname(): pay attention to the length of string parameters Jeff King @ 2013-03-28 22:30 ` Junio C Hamano 2013-03-29 8:45 ` Duy Nguyen 1 sibling, 0 replies; 66+ messages in thread From: Junio C Hamano @ 2013-03-28 22:30 UTC (permalink / raw) To: Jeff King; +Cc: git, pclouds, avila.jn Jeff King <peff@peff.net> writes: > This function takes two counted strings: a <pattern, patternlen> pair > and a <pathname, pathlen> pair. But we end up feeding the result to > fnmatch, which expects NUL-terminated strings. > > We can fix this by calling the fnmatch_icase_mem function, which > handles re-allocating into a NUL-terminated string if necessary. > > While we're at it, we can avoid even calling fnmatch in some cases. In > addition to patternlen, we get "prefix", the size of the pattern that > contains no wildcard characters. We do a straight match of the prefix > part first, and then use fnmatch to cover the rest. But if there are > no wildcards in the pattern at all, we do not even need to call > fnmatch; we would simply be comparing two empty strings. It is not a new issue, but it would be nicer to have a comment to warn people that this part needs to be adjusted when they want to add support for using FNM_PERIOD in our codebase. Other than that, looks sane to me. Thanks. > > Signed-off-by: Jeff King <peff@peff.net> > --- > New in this iteration. > > dir.c | 13 ++++++++++++- > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/dir.c b/dir.c > index cc4ce8b..3ad44c3 100644 > --- a/dir.c > +++ b/dir.c > @@ -624,11 +624,22 @@ int match_pathname(const char *pathname, int pathlen, > if (strncmp_icase(pattern, name, prefix)) > return 0; > pattern += prefix; > + patternlen -= prefix; > name += prefix; > namelen -= prefix; > + > + /* > + * If the whole pattern did not have a wildcard, > + * then our prefix match is all we need; we > + * do not need to call fnmatch at all. > + */ > + if (!patternlen && !namelen) > + return 1; > } > > - return fnmatch_icase(pattern, name, FNM_PATHNAME) == 0; > + return fnmatch_icase_mem(pattern, patternlen, > + name, namelen, > + FNM_PATHNAME) == 0; > } > > /* Scan the list and let the last match determine the fate. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 4/6] dir.c::match_pathname(): pay attention to the length of string parameters 2013-03-28 21:48 ` [PATCH 4/6] dir.c::match_pathname(): pay attention to the length of string parameters Jeff King 2013-03-28 22:30 ` Junio C Hamano @ 2013-03-29 8:45 ` Duy Nguyen 2013-03-29 10:03 ` Duy Nguyen 2013-03-29 12:05 ` Jeff King 1 sibling, 2 replies; 66+ messages in thread From: Duy Nguyen @ 2013-03-29 8:45 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git, avila.jn On Fri, Mar 29, 2013 at 4:48 AM, Jeff King <peff@peff.net> wrote: > - return fnmatch_icase(pattern, name, FNM_PATHNAME) == 0; > + return fnmatch_icase_mem(pattern, patternlen, > + name, namelen, > + FNM_PATHNAME) == 0; > } I think you (or Junio) should rebase this on maint. Since c41244e (included in maint), this call is turned to wildmatch(WM_PATHNAME) and WM_PATHNAME is _not_ the same as FNM_PATHNAME for patterns like "foo/**/bar". A diff between next and pu shows me that WM_PATHNAME is incorrectly converted to FNM_PATHNAME. I hope that is the cause of all breakages Junio found out on pu. -- Duy ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 4/6] dir.c::match_pathname(): pay attention to the length of string parameters 2013-03-29 8:45 ` Duy Nguyen @ 2013-03-29 10:03 ` Duy Nguyen 2013-03-29 11:32 ` Torsten Bögershausen 2013-03-29 12:05 ` Jeff King 1 sibling, 1 reply; 66+ messages in thread From: Duy Nguyen @ 2013-03-29 10:03 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git, avila.jn On Fri, Mar 29, 2013 at 3:45 PM, Duy Nguyen <pclouds@gmail.com> wrote: > On Fri, Mar 29, 2013 at 4:48 AM, Jeff King <peff@peff.net> wrote: >> - return fnmatch_icase(pattern, name, FNM_PATHNAME) == 0; >> + return fnmatch_icase_mem(pattern, patternlen, >> + name, namelen, >> + FNM_PATHNAME) == 0; >> } > > I think you (or Junio) should rebase this on maint. Since c41244e > (included in maint), this call is turned to wildmatch(WM_PATHNAME) and > WM_PATHNAME is _not_ the same as FNM_PATHNAME for patterns like > "foo/**/bar". A diff between next and pu shows me that WM_PATHNAME is > incorrectly converted to FNM_PATHNAME. I hope that is the cause of all > breakages Junio found out on pu. Just tested. t0003 and t3001 on 'pu' work for me because I have USE_WILDMATCH on (which turns FNM_PATHNAME to WM_PATHNAME). Both break without USE_WILDMATCH. -- Duy ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 4/6] dir.c::match_pathname(): pay attention to the length of string parameters 2013-03-29 10:03 ` Duy Nguyen @ 2013-03-29 11:32 ` Torsten Bögershausen 2013-03-29 11:37 ` Duy Nguyen 0 siblings, 1 reply; 66+ messages in thread From: Torsten Bögershausen @ 2013-03-29 11:32 UTC (permalink / raw) To: Duy Nguyen Cc: Jeff King, Junio C Hamano, git, avila.jn, Torsten Bögershausen On 29.03.13 11:03, Duy Nguyen wrote: > On Fri, Mar 29, 2013 at 3:45 PM, Duy Nguyen <pclouds@gmail.com> wrote: >> On Fri, Mar 29, 2013 at 4:48 AM, Jeff King <peff@peff.net> wrote: >>> - return fnmatch_icase(pattern, name, FNM_PATHNAME) == 0; >>> + return fnmatch_icase_mem(pattern, patternlen, >>> + name, namelen, >>> + FNM_PATHNAME) == 0; >>> } >> >> I think you (or Junio) should rebase this on maint. Since c41244e >> (included in maint), this call is turned to wildmatch(WM_PATHNAME) and >> WM_PATHNAME is _not_ the same as FNM_PATHNAME for patterns like >> "foo/**/bar". A diff between next and pu shows me that WM_PATHNAME is >> incorrectly converted to FNM_PATHNAME. I hope that is the cause of all >> breakages Junio found out on pu. > > Just tested. t0003 and t3001 on 'pu' work for me because I have > USE_WILDMATCH on (which turns FNM_PATHNAME to WM_PATHNAME). Both break > without USE_WILDMATCH. > Hm, tested what? diff --git a/dir.c b/dir.c index 73a08af..0b63167 100644 --- a/dir.c +++ b/dir.c @@ -564,7 +564,7 @@ int match_pathname(const char *pathname, int pathlen, return fnmatch_icase_mem(pattern, patternlen, name, namelen, - FNM_PATHNAME) == 0; + WM_PATHNAME) == 0; } Gives only one breakage, so we are coming closer. *** t3001-ls-files-others-exclude.sh *** [snip] not ok 17 - ls-files with "**" patterns ^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH 4/6] dir.c::match_pathname(): pay attention to the length of string parameters 2013-03-29 11:32 ` Torsten Bögershausen @ 2013-03-29 11:37 ` Duy Nguyen 0 siblings, 0 replies; 66+ messages in thread From: Duy Nguyen @ 2013-03-29 11:37 UTC (permalink / raw) To: Torsten Bögershausen; +Cc: Jeff King, Junio C Hamano, git, avila.jn On Fri, Mar 29, 2013 at 6:32 PM, Torsten Bögershausen <tboegi@web.de> wrote: >> Just tested. t0003 and t3001 on 'pu' work for me because I have >> USE_WILDMATCH on (which turns FNM_PATHNAME to WM_PATHNAME). Both break >> without USE_WILDMATCH. >> > Hm, tested what? Tested t0003 and t3001 with and without USE_WILDMATCH, which is pretty much like you patch, except that wildmatch is used instead of fnmatch. > diff --git a/dir.c b/dir.c > index 73a08af..0b63167 100644 > --- a/dir.c > +++ b/dir.c > @@ -564,7 +564,7 @@ int match_pathname(const char *pathname, int pathlen, > > return fnmatch_icase_mem(pattern, patternlen, > name, namelen, > - FNM_PATHNAME) == 0; > + WM_PATHNAME) == 0; > } > > Gives only one breakage, so we are coming closer. > > > *** t3001-ls-files-others-exclude.sh *** > [snip] > not ok 17 - ls-files with "**" patterns > > > -- Duy ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 4/6] dir.c::match_pathname(): pay attention to the length of string parameters 2013-03-29 8:45 ` Duy Nguyen 2013-03-29 10:03 ` Duy Nguyen @ 2013-03-29 12:05 ` Jeff King 2013-03-29 13:02 ` Duy Nguyen 1 sibling, 1 reply; 66+ messages in thread From: Jeff King @ 2013-03-29 12:05 UTC (permalink / raw) To: Duy Nguyen; +Cc: Junio C Hamano, git, avila.jn On Fri, Mar 29, 2013 at 03:45:35PM +0700, Nguyen Thai Ngoc Duy wrote: > On Fri, Mar 29, 2013 at 4:48 AM, Jeff King <peff@peff.net> wrote: > > - return fnmatch_icase(pattern, name, FNM_PATHNAME) == 0; > > + return fnmatch_icase_mem(pattern, patternlen, > > + name, namelen, > > + FNM_PATHNAME) == 0; > > } > > I think you (or Junio) should rebase this on maint. Since c41244e > (included in maint), this call is turned to wildmatch(WM_PATHNAME) and > WM_PATHNAME is _not_ the same as FNM_PATHNAME for patterns like > "foo/**/bar". A diff between next and pu shows me that WM_PATHNAME is > incorrectly converted to FNM_PATHNAME. I hope that is the cause of all > breakages Junio found out on pu. I don't think we want to rebase; the regression is in the v1.8.1 series, and I suspected that Junio was planning to ship a v1.8.1.6 with the fix. The wildmatch code comes in v1.8.2. So we would want to do any adjustment to the fix when we merge up to maint. -Peff ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 4/6] dir.c::match_pathname(): pay attention to the length of string parameters 2013-03-29 12:05 ` Jeff King @ 2013-03-29 13:02 ` Duy Nguyen 2013-03-29 16:44 ` Junio C Hamano 0 siblings, 1 reply; 66+ messages in thread From: Duy Nguyen @ 2013-03-29 13:02 UTC (permalink / raw) To: Jeff King, Junio C Hamano; +Cc: git, avila.jn On Fri, Mar 29, 2013 at 08:05:40AM -0400, Jeff King wrote: > On Fri, Mar 29, 2013 at 03:45:35PM +0700, Nguyen Thai Ngoc Duy wrote: > > > On Fri, Mar 29, 2013 at 4:48 AM, Jeff King <peff@peff.net> wrote: > > > - return fnmatch_icase(pattern, name, FNM_PATHNAME) == 0; > > > + return fnmatch_icase_mem(pattern, patternlen, > > > + name, namelen, > > > + FNM_PATHNAME) == 0; > > > } > > > > I think you (or Junio) should rebase this on maint. Since c41244e > > (included in maint), this call is turned to wildmatch(WM_PATHNAME) and > > WM_PATHNAME is _not_ the same as FNM_PATHNAME for patterns like > > "foo/**/bar". A diff between next and pu shows me that WM_PATHNAME is > > incorrectly converted to FNM_PATHNAME. I hope that is the cause of all > > breakages Junio found out on pu. > > I don't think we want to rebase; the regression is in the v1.8.1 series, > and I suspected that Junio was planning to ship a v1.8.1.6 with the fix. > The wildmatch code comes in v1.8.2. > > So we would want to do any adjustment to the fix when we merge up to > maint. OK. Then Junio, you may need to resolve the conflict with something like this. Originally match_basename uses fnmatch, not wildmatch. But using wildmatch there too should be fine, now that both match_{base,path}name share fnmatch_icase_mem(). -- 8< -- diff --git a/dir.c b/dir.c index 73a08af..84744df 100644 --- a/dir.c +++ b/dir.c @@ -81,7 +81,9 @@ static int fnmatch_icase_mem(const char *pattern, int patternlen, use_str = str_buf.buf; } - match_status = fnmatch_icase(use_pat, use_str, flags); + if (ignore_case) + flags |= WM_CASEFOLD; + match_status = wildmatch(use_pat, use_str, flags, NULL); strbuf_release(&pat_buf); strbuf_release(&str_buf); @@ -564,7 +566,7 @@ int match_pathname(const char *pathname, int pathlen, return fnmatch_icase_mem(pattern, patternlen, name, namelen, - FNM_PATHNAME) == 0; + WM_PATHNAME) == 0; } /* -- 8< -- ^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH 4/6] dir.c::match_pathname(): pay attention to the length of string parameters 2013-03-29 13:02 ` Duy Nguyen @ 2013-03-29 16:44 ` Junio C Hamano 2013-03-29 17:04 ` Jeff King 2013-03-30 1:40 ` Duy Nguyen 0 siblings, 2 replies; 66+ messages in thread From: Junio C Hamano @ 2013-03-29 16:44 UTC (permalink / raw) To: Duy Nguyen; +Cc: Jeff King, git, avila.jn Duy Nguyen <pclouds@gmail.com> writes: >> So we would want to do any adjustment to the fix when we merge up to >> maint. > > OK. Then Junio, you may need to resolve the conflict with something > like this. Originally match_basename uses fnmatch, not wildmatch. But > using wildmatch there too should be fine, now that both > match_{base,path}name share fnmatch_icase_mem(). Thanks. The result still smells somewhat funny, though. fnmatch_icase_mem() is meant to be a wrapper of fnmatch_icase() for counted strings and its matching semantics should be the same as fnmatch_icase(). With the merge-fix, fnmatch_icase_mem() calls into wildmatch(), but fnmatch_icase() still calls into fnmatch(). The latter's flags are meant to be taken from FNM_* family, but the former takes flags from WM_* family of bits, no? I think you are running with USE_WILDMATCH which may make the differences harder to notice, but the name fnmatch_icase_mem() that is not in the same family as fnmatch but is from the wildmatch() family smells like an accident waiting to happen. I tend to think in the longer term it may be a good idea to build with USE_WILDMATCH unconditionally (we can lose compat/fnmatch), so in the end this may not matter that much, but before that happens, soon after we merge the regression fix with this merge-fix, we may want to update the codebase as if we applied a series that were based on 'maint' as you suggested, i.e. using raw wildmatch() consistently in the match_{base,path}name() codepath. Opinions? > > -- 8< -- > diff --git a/dir.c b/dir.c > index 73a08af..84744df 100644 > --- a/dir.c > +++ b/dir.c > @@ -81,7 +81,9 @@ static int fnmatch_icase_mem(const char *pattern, int patternlen, > use_str = str_buf.buf; > } > > - match_status = fnmatch_icase(use_pat, use_str, flags); > + if (ignore_case) > + flags |= WM_CASEFOLD; > + match_status = wildmatch(use_pat, use_str, flags, NULL); > > strbuf_release(&pat_buf); > strbuf_release(&str_buf); > @@ -564,7 +566,7 @@ int match_pathname(const char *pathname, int pathlen, > > return fnmatch_icase_mem(pattern, patternlen, > name, namelen, > - FNM_PATHNAME) == 0; > + WM_PATHNAME) == 0; > } > > /* > -- 8< -- ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 4/6] dir.c::match_pathname(): pay attention to the length of string parameters 2013-03-29 16:44 ` Junio C Hamano @ 2013-03-29 17:04 ` Jeff King 2013-03-29 17:35 ` Junio C Hamano 2013-03-30 1:40 ` Duy Nguyen 1 sibling, 1 reply; 66+ messages in thread From: Jeff King @ 2013-03-29 17:04 UTC (permalink / raw) To: Junio C Hamano; +Cc: Duy Nguyen, git, avila.jn On Fri, Mar 29, 2013 at 09:44:32AM -0700, Junio C Hamano wrote: > Duy Nguyen <pclouds@gmail.com> writes: > > >> So we would want to do any adjustment to the fix when we merge up to > >> maint. > > > > OK. Then Junio, you may need to resolve the conflict with something > > like this. Originally match_basename uses fnmatch, not wildmatch. But > > using wildmatch there too should be fine, now that both > > match_{base,path}name share fnmatch_icase_mem(). > > Thanks. > > The result still smells somewhat funny, though. > > fnmatch_icase_mem() is meant to be a wrapper of fnmatch_icase() for > counted strings and its matching semantics should be the same as > fnmatch_icase(). > > With the merge-fix, fnmatch_icase_mem() calls into wildmatch(), but > fnmatch_icase() still calls into fnmatch(). > > The latter's flags are meant to be taken from FNM_* family, but the > former takes flags from WM_* family of bits, no? Yeah, that does not seem right. If match_pathname has learned to call into wildmatch instead of fnmatch_icase in the interim, then the right resolution is to convert its call to fnmatch_icase_mem to a new wildmatch_mem. Presumably that can be done by either tweaking fnmatch_icase_mem, or, if wildmatch is ready to take counted strings, calling into it with the right options. > I think you are running with USE_WILDMATCH which may make the > differences harder to notice, but the name fnmatch_icase_mem() that is > not in the same family as fnmatch but is from the wildmatch() family > smells like an accident waiting to happen. Agreed. > I tend to think in the longer term it may be a good idea to build with > USE_WILDMATCH unconditionally (we can lose compat/fnmatch), so in the > end this may not matter that much Yeah, I think that is a sane long-term goal. -Peff ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 4/6] dir.c::match_pathname(): pay attention to the length of string parameters 2013-03-29 17:04 ` Jeff King @ 2013-03-29 17:35 ` Junio C Hamano 2013-03-29 17:44 ` Jeff King 0 siblings, 1 reply; 66+ messages in thread From: Junio C Hamano @ 2013-03-29 17:35 UTC (permalink / raw) To: Jeff King; +Cc: Duy Nguyen, git, avila.jn Jeff King <peff@peff.net> writes: > On Fri, Mar 29, 2013 at 09:44:32AM -0700, Junio C Hamano wrote: > ... >> With the merge-fix, fnmatch_icase_mem() calls into wildmatch(), but >> fnmatch_icase() still calls into fnmatch(). >> >> The latter's flags are meant to be taken from FNM_* family, but the >> former takes flags from WM_* family of bits, no? > > Yeah, that does not seem right. If match_pathname has learned to call > into wildmatch instead of fnmatch_icase in the interim, then the right > resolution is to convert its call to fnmatch_icase_mem to a new > wildmatch_mem. Presumably that can be done by either tweaking > fnmatch_icase_mem, or, if wildmatch is ready to take counted strings, > calling into it with the right options. > >> I think you are running with USE_WILDMATCH which may make the >> differences harder to notice, but the name fnmatch_icase_mem() that is >> not in the same family as fnmatch but is from the wildmatch() family >> smells like an accident waiting to happen. > > Agreed. This may be just the matter of naming. It smelled wrong to me only because the "fnmatch" in the helper fnmatch_icase_mem() told me that it should forever use fnmatch semantics. After reading its (only) two callsites, they are both "the caller has transformed the inputs to this lowest level pathname vs pattern matching function in order to reduce the cost of matching, and now it is time to exercise the matcher". The only thing they care about is that they are calling "the lowest level pathname vs pattern matching function." If we pronounce "fnmatch_icase_mem()" as "match_path_with_pattern()" or something in the original series, the problem may go away ;-) Does any caller pass FNM_* bits to a callchain that reach the new *_mem() function? ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 4/6] dir.c::match_pathname(): pay attention to the length of string parameters 2013-03-29 17:35 ` Junio C Hamano @ 2013-03-29 17:44 ` Jeff King 0 siblings, 0 replies; 66+ messages in thread From: Jeff King @ 2013-03-29 17:44 UTC (permalink / raw) To: Junio C Hamano; +Cc: Duy Nguyen, git, avila.jn On Fri, Mar 29, 2013 at 10:35:17AM -0700, Junio C Hamano wrote: > This may be just the matter of naming. > > It smelled wrong to me only because the "fnmatch" in the helper > fnmatch_icase_mem() told me that it should forever use fnmatch > semantics. After reading its (only) two callsites, they are both > "the caller has transformed the inputs to this lowest level pathname > vs pattern matching function in order to reduce the cost of > matching, and now it is time to exercise the matcher". The only > thing they care about is that they are calling "the lowest level > pathname vs pattern matching function." > > If we pronounce "fnmatch_icase_mem()" as "match_path_with_pattern()" > or something in the original series, the problem may go away ;-) Yes, since there are only the two new added callers, if they both want to switch to wildmatch, then it is fine for the helper function to switch. The danger is if some other caller wants to start using it; I added it with the name I did figuring that other spots might want to use it eventually. But if all of fnmatch is going to go away in favor of wildmatch eventually, then that helper is not all that useful (or it would be even more useful as "wildmatch_mem" or similar). > Does any caller pass FNM_* bits to a callchain that reach the new *_mem() > function? The series only adds two callers, and they provide constant flags; match_basename passes no flags, and should be OK. match_pathname uses FNM_PATHNAME, and would need to be converted to use WM_PATHNAME as part of the conflict resolution. -Peff ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 4/6] dir.c::match_pathname(): pay attention to the length of string parameters 2013-03-29 16:44 ` Junio C Hamano 2013-03-29 17:04 ` Jeff King @ 2013-03-30 1:40 ` Duy Nguyen 1 sibling, 0 replies; 66+ messages in thread From: Duy Nguyen @ 2013-03-30 1:40 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, git, avila.jn On Fri, Mar 29, 2013 at 09:44:32AM -0700, Junio C Hamano wrote: > I tend to think in the longer term it may be a good idea to build > with USE_WILDMATCH unconditionally (we can lose compat/fnmatch), so > in the end this may not matter that much I was thinking about that yesterday. After all, it's the purpose of USE_WILDMATCH and nd/retire-fnmatch series. So how about this patch to enable USE_WILDMATCH for a wider audience? I think it could get merged down to master. If wildmatch's bug history is messy by the time we enter rc cycles, we could revert the patch and cut new releases with fnmatch. If not, after 1.9 is released, I'll submit a patch to remove fnmatch compatiblity later and we will be on wildmatch only. -- 8< -- diff --git a/Makefile b/Makefile index 598d631..8a740f8 100644 --- a/Makefile +++ b/Makefile @@ -106,7 +106,7 @@ all:: # Define NO_FNMATCH_CASEFOLD if your fnmatch function doesn't have the # FNM_CASEFOLD GNU extension. # -# Define USE_WILDMATCH if you want to use Git's wildmatch +# Define NO_USE_WILDMATCH if you do not want to use Git's wildmatch # implementation as fnmatch # # Define NO_GECOS_IN_PWENT if you don't have pw_gecos in struct passwd @@ -1255,7 +1255,7 @@ ifdef NO_FNMATCH_CASEFOLD COMPAT_OBJS += compat/fnmatch/fnmatch.o endif endif -ifdef USE_WILDMATCH +ifndef NO_USE_WILDMATCH COMPAT_CFLAGS += -DUSE_WILDMATCH endif ifdef NO_SETENV -- 8< -- ^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH 5/6] attr.c::path_matches(): special case paths that end with a slash 2013-03-28 21:43 ` [PATCH v2 0/6] " Jeff King ` (3 preceding siblings ...) 2013-03-28 21:48 ` [PATCH 4/6] dir.c::match_pathname(): pay attention to the length of string parameters Jeff King @ 2013-03-28 21:49 ` Jeff King 2013-03-28 21:50 ` [PATCH 6/6] t: check that a pattern without trailing slash matches a directory Jeff King 5 siblings, 0 replies; 66+ messages in thread From: Jeff King @ 2013-03-28 21:49 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, pclouds, avila.jn From: Junio C Hamano <gitster@pobox.com> The function is given a string that ends with a slash to signal that the path is a directory to make sure that a pattern that ends with a slash (i.e. MUSTBEDIR) can tell directories and non-directories apart. However, the pattern itself (pat->pattern and pat->patternlen) that came from such a MUSTBEDIR pattern is represented as a string that ends with a slash, but patternlen does not count that trailing slash. A MUSTBEDIR pattern "element/" is represented as a counted string <"element/", 7> and this must match match pathname "element/". Because match_basename() and match_pathname() want to see pathname "element" to match against the pattern <"element/", 7>, reduce the length of the path to exclude the trailing slash when calling these functions. Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jeff King <peff@peff.net> --- Tweaked since v1 to also drop the trailing slash when we pass the path to match_pathname. attr.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/attr.c b/attr.c index 4cfe0ee..4d620bc 100644 --- a/attr.c +++ b/attr.c @@ -661,18 +661,18 @@ static int path_matches(const char *pathname, int pathlen, { const char *pattern = pat->pattern; int prefix = pat->nowildcardlen; + int isdir = (pathlen && pathname[pathlen - 1] == '/'); - if ((pat->flags & EXC_FLAG_MUSTBEDIR) && - ((!pathlen) || (pathname[pathlen-1] != '/'))) + if ((pat->flags & EXC_FLAG_MUSTBEDIR) && !isdir) return 0; if (pat->flags & EXC_FLAG_NODIR) { return match_basename(pathname + basename_offset, - pathlen - basename_offset, + pathlen - basename_offset - isdir, pattern, prefix, pat->patternlen, pat->flags); } - return match_pathname(pathname, pathlen, + return match_pathname(pathname, pathlen - isdir, base, baselen, pattern, prefix, pat->patternlen, pat->flags); } -- 1.8.2.13.g0f18d3c ^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH 6/6] t: check that a pattern without trailing slash matches a directory 2013-03-28 21:43 ` [PATCH v2 0/6] " Jeff King ` (4 preceding siblings ...) 2013-03-28 21:49 ` [PATCH 5/6] attr.c::path_matches(): special case paths that end with a slash Jeff King @ 2013-03-28 21:50 ` Jeff King 2013-03-28 22:21 ` Eric Sunshine 5 siblings, 1 reply; 66+ messages in thread From: Jeff King @ 2013-03-28 21:50 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, pclouds, avila.jn Prior to v1.8.1.1, with: git init echo content >foo && mkdir subdir && echo content >subdir/bar && echo "subdir export-ignore" >.gitattributes git add . && git commit -m one && git archive HEAD | tar tf - the resulting archive would contain only "foo" and ".gitattributes", not subdir. This was broken with a recent change that intended to allow "subdir/ export-ignore" to also exclude the directory, but instead ended up _requiring_ the trailing slash by mistake. A pattern "subdir" should match any path "subdir", whether it is a directory or a non-diretory. A pattern "subdir/" insists that a path "subdir" must be a directory for it to match. This patch adds test not just for this simple case, but also for deeper cross-directory cases, as well as cases with wildcards. Signed-off-by: Jeff King <peff@peff.net> --- Added new tests since v1 that handle the match_pathname code path. t/t5002-archive-attr-pattern.sh | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/t/t5002-archive-attr-pattern.sh b/t/t5002-archive-attr-pattern.sh index 0c847fb..6667d15 100755 --- a/t/t5002-archive-attr-pattern.sh +++ b/t/t5002-archive-attr-pattern.sh @@ -27,6 +27,25 @@ test_expect_success 'setup' ' echo ignored-only-if-dir/ export-ignore >>.git/info/attributes && git add ignored-only-if-dir && + mkdir -p ignored-without-slash && + echo "ignored without slash" >ignored-without-slash/foo && + git add ignored-without-slash/foo && + echo "ignored-without-slash export-ignore" >>.git/info/attributes && + + mkdir -p wildcard-without-slash && + echo "ignored without slash" >wildcard-without-slash/foo && + git add wildcard-without-slash/foo && + echo "wild*-without-slash export-ignore" >>.git/info/attributes && + + mkdir -p deep/and/slashless && + echo "ignored without slash" >deep/and/slashless/foo && + git add deep/and/slashless/foo && + echo "deep/and/slashless export-ignore" >>.git/info/attributes && + + mkdir -p deep/with/wildcard && + echo "ignored without slash" >deep/with/wildcard/foo && + git add deep/with/wildcard/foo && + echo "deep/*t*/wildcard export-ignore" >>.git/info/attributes && mkdir -p one-level-lower/two-levels-lower/ignored-only-if-dir && echo ignored by ignored dir >one-level-lower/two-levels-lower/ignored-only-if-dir/ignored-by-ignored-dir && @@ -49,6 +68,14 @@ test_expect_missing archive/ignored-ony-if-dir/ignored-by-ignored-dir test_expect_exists archive/not-ignored-dir/ test_expect_missing archive/ignored-only-if-dir/ test_expect_missing archive/ignored-ony-if-dir/ignored-by-ignored-dir +test_expect_missing archive/ignored-without-slash/ && +test_expect_missing archive/ignored-without-slash/foo && +test_expect_missing archive/wildcard-without-slash/ +test_expect_missing archive/wildcard-without-slash/foo && +test_expect_missing archive/deep/and/slashless/ && +test_expect_missing archive/deep/and/slashless/foo && +test_expect_missing archive/deep/with/wildcard/ && +test_expect_missing archive/deep/with/wildcard/foo && test_expect_exists archive/one-level-lower/ test_expect_missing archive/one-level-lower/two-levels-lower/ignored-only-if-dir/ test_expect_missing archive/one-level-lower/two-levels-lower/ignored-ony-if-dir/ignored-by-ignored-dir -- 1.8.2.13.g0f18d3c ^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH 6/6] t: check that a pattern without trailing slash matches a directory 2013-03-28 21:50 ` [PATCH 6/6] t: check that a pattern without trailing slash matches a directory Jeff King @ 2013-03-28 22:21 ` Eric Sunshine 2013-03-28 22:22 ` Jeff King 0 siblings, 1 reply; 66+ messages in thread From: Eric Sunshine @ 2013-03-28 22:21 UTC (permalink / raw) To: Jeff King Cc: Junio C Hamano, Git List, Nguyễn Thái Ngọc Duy, avila.jn On Thu, Mar 28, 2013 at 5:50 PM, Jeff King <peff@peff.net> wrote: > A pattern "subdir" should match any path "subdir", whether it is a > directory or a non-diretory. A pattern "subdir/" insists that a s/diretory/directory/ [1] > path "subdir" must be a directory for it to match. [1]: http://article.gmane.org/gmane.comp.version-control.git/219185/ ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 6/6] t: check that a pattern without trailing slash matches a directory 2013-03-28 22:21 ` Eric Sunshine @ 2013-03-28 22:22 ` Jeff King 0 siblings, 0 replies; 66+ messages in thread From: Jeff King @ 2013-03-28 22:22 UTC (permalink / raw) To: Eric Sunshine Cc: Junio C Hamano, Git List, Nguyễn Thái Ngọc Duy, avila.jn On Thu, Mar 28, 2013 at 06:21:08PM -0400, Eric Sunshine wrote: > On Thu, Mar 28, 2013 at 5:50 PM, Jeff King <peff@peff.net> wrote: > > A pattern "subdir" should match any path "subdir", whether it is a > > directory or a non-diretory. A pattern "subdir/" insists that a > > s/diretory/directory/ [1] I think I've taken proofreading to a new level by missing the error I already corrected somebody else for. Thanks for noticing. :) -Peff ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [regression?] trailing slash required in .gitattributes 2013-03-22 22:24 ` Jeff King 2013-03-22 23:08 ` Junio C Hamano @ 2013-03-23 4:18 ` Duy Nguyen 2013-03-23 4:43 ` Duy Nguyen 1 sibling, 1 reply; 66+ messages in thread From: Duy Nguyen @ 2013-03-23 4:18 UTC (permalink / raw) To: Jeff King; +Cc: git, Jean-Noël AVILA On Fri, Mar 22, 2013 at 06:24:39PM -0400, Jeff King wrote: > I'm having trouble figuring out the right solution for this. Thanks for looking into this. It was on my todo list, but you beat me to it :) > But then here we'll end up feeding "foo/" to be compared with "foo", > which we don't want. For a pattern "foo", we want to match _either_ > "foo/" or "foo". So you'd think something like: > > if (pathlen && pathname[pathlen-1] == '/') > pathlen--; > > would work. But it seems that match_basename, despite taking the length > of all of the strings we pass it, will happily use NUL-terminated > functions like strcmp or fnmatch. Converting the former to check lengths > should be pretty straightforward. But there is no version of fnmatch > that does what we want. I wonder if we using wildmatch can get around > this limitation. You can use nwildmatch() from this patch. I tested it lightly with t3070-wildmatch.sh, feeding the strings with no terminating NUL. It seems to work ok. -- 8< -- Subject: [PATCH] wildmatch: do not require "text" to be NUL-terminated This may be helpful when we just want to match a part of "text". nwildmatch can be used for this purpose. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- wildmatch.c | 25 +++++++++++++------------ wildmatch.h | 11 +++++++++-- 2 files changed, 22 insertions(+), 14 deletions(-) diff --git a/wildmatch.c b/wildmatch.c index 7192bdc..f97ae2a 100644 --- a/wildmatch.c +++ b/wildmatch.c @@ -52,7 +52,8 @@ typedef unsigned char uchar; #define ISXDIGIT(c) (ISASCII(c) && isxdigit(c)) /* Match pattern "p" against "text" */ -static int dowild(const uchar *p, const uchar *text, unsigned int flags) +static int dowild(const uchar *p, const uchar *text, + const uchar *textend, unsigned int flags) { uchar p_ch; const uchar *pattern = p; @@ -60,8 +61,9 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) for ( ; (p_ch = *p) != '\0'; text++, p++) { int matched, match_slash, negated; uchar t_ch, prev_ch; - if ((t_ch = *text) == '\0' && p_ch != '*') + if (text >= textend && p_ch != '*') return WM_ABORT_ALL; + t_ch = *text; if ((flags & WM_CASEFOLD) && ISUPPER(t_ch)) t_ch = tolower(t_ch); if ((flags & WM_CASEFOLD) && ISUPPER(p_ch)) @@ -101,7 +103,7 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) * both foo/bar and foo/a/bar. */ if (p[0] == '/' && - dowild(p + 1, text, flags) == WM_MATCH) + dowild(p + 1, text, textend, flags) == WM_MATCH) return WM_MATCH; match_slash = 1; } else @@ -130,9 +132,7 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) /* the slash is consumed by the top-level for loop */ break; } - while (1) { - if (t_ch == '\0') - break; + while (text < textend) { /* * Try to advance faster when an asterisk is * followed by a literal. We know in this case @@ -145,18 +145,18 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) p_ch = *p; if ((flags & WM_CASEFOLD) && ISUPPER(p_ch)) p_ch = tolower(p_ch); - while ((t_ch = *text) != '\0' && + while (text < textend && (match_slash || t_ch != '/')) { if ((flags & WM_CASEFOLD) && ISUPPER(t_ch)) t_ch = tolower(t_ch); if (t_ch == p_ch) break; - text++; + t_ch = *++text; } if (t_ch != p_ch) return WM_NOMATCH; } - if ((matched = dowild(p, text, flags)) != WM_NOMATCH) { + if ((matched = dowild(p, text, textend, flags)) != WM_NOMATCH) { if (!match_slash || matched != WM_ABORT_TO_STARSTAR) return matched; } else if (!match_slash && t_ch == '/') @@ -261,12 +261,13 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) } } - return *text ? WM_NOMATCH : WM_MATCH; + return text < textend ? WM_NOMATCH : WM_MATCH; } /* Match the "pattern" against the "text" string. */ -int wildmatch(const char *pattern, const char *text, +int nwildmatch(const char *pattern, const char *text, int textlen, unsigned int flags, struct wildopts *wo) { - return dowild((const uchar*)pattern, (const uchar*)text, flags); + return dowild((const uchar*)pattern, (const uchar*)text, + (const uchar*)text + textlen, flags); } diff --git a/wildmatch.h b/wildmatch.h index 4090c8f..cdd7544 100644 --- a/wildmatch.h +++ b/wildmatch.h @@ -12,7 +12,14 @@ struct wildopts; -int wildmatch(const char *pattern, const char *text, - unsigned int flags, +int nwildmatch(const char *pattern, const char *text, + int len, unsigned int flags, struct wildopts *wo); + +/* Match the "pattern" against the "text" string. */ +static inline int wildmatch(const char *pattern, const char *text, + unsigned int flags, struct wildopts *wo) +{ + return nwildmatch(pattern, text, strlen(text), flags, wo); +} #endif -- 1.8.2.83.gc99314b -- 8< -- ^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [regression?] trailing slash required in .gitattributes 2013-03-23 4:18 ` [regression?] trailing slash required in .gitattributes Duy Nguyen @ 2013-03-23 4:43 ` Duy Nguyen 0 siblings, 0 replies; 66+ messages in thread From: Duy Nguyen @ 2013-03-23 4:43 UTC (permalink / raw) To: Jeff King; +Cc: git, Jean-Noël AVILA On Sat, Mar 23, 2013 at 11:18:24AM +0700, Duy Nguyen wrote: > You can use nwildmatch() from this patch. I tested it lightly with > t3070-wildmatch.sh, feeding the strings with no terminating NUL. It > seems to work ok. And valgrind spotted my faults, especially for using strchr. You would need this on top: -- 8< -- diff --git a/wildmatch.c b/wildmatch.c index f97ae2a..939bac8 100644 --- a/wildmatch.c +++ b/wildmatch.c @@ -61,9 +61,13 @@ static int dowild(const uchar *p, const uchar *text, for ( ; (p_ch = *p) != '\0'; text++, p++) { int matched, match_slash, negated; uchar t_ch, prev_ch; - if (text >= textend && p_ch != '*') - return WM_ABORT_ALL; - t_ch = *text; + if (text >= textend) { + if (p_ch != '*') + return WM_ABORT_ALL; + else + t_ch = '\0'; + } else + t_ch = *text; if ((flags & WM_CASEFOLD) && ISUPPER(t_ch)) t_ch = tolower(t_ch); if ((flags & WM_CASEFOLD) && ISUPPER(p_ch)) @@ -115,8 +119,9 @@ static int dowild(const uchar *p, const uchar *text, /* Trailing "**" matches everything. Trailing "*" matches * only if there are no more slash characters. */ if (!match_slash) { - if (strchr((char*)text, '/') != NULL) - return WM_NOMATCH; + for (;text < textend; text++) + if (*text == '/') + return WM_NOMATCH; } return WM_MATCH; } else if (!match_slash && *p == '/') { @@ -125,10 +130,11 @@ static int dowild(const uchar *p, const uchar *text, * with WM_PATHNAME matches the next * directory */ - const char *slash = strchr((char*)text, '/'); - if (!slash) + for (;text < textend; text++) + if (*text == '/') + break; + if (text == textend) return WM_NOMATCH; - text = (const uchar*)slash; /* the slash is consumed by the top-level for loop */ break; } @@ -151,7 +157,7 @@ static int dowild(const uchar *p, const uchar *text, t_ch = tolower(t_ch); if (t_ch == p_ch) break; - t_ch = *++text; + t_ch = ++text < textend ? *text : '\0'; } if (t_ch != p_ch) return WM_NOMATCH; -- 8< -- ^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH 0/4] attr directory matching regression 2013-03-19 17:57 [regression?] trailing slash required in .gitattributes Jeff King 2013-03-19 18:10 ` Junio C Hamano 2013-03-19 18:10 ` Jeff King @ 2013-03-25 6:05 ` Nguyễn Thái Ngọc Duy 2013-03-25 6:05 ` [PATCH 1/4] wildmatch: do not require "text" to be NUL-terminated Nguyễn Thái Ngọc Duy ` (4 more replies) 2 siblings, 5 replies; 66+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2013-03-25 6:05 UTC (permalink / raw) To: git Cc: Jeff King, avila.jn, Junio C Hamano, Nguyễn Thái Ngọc Duy I think the fix is something like this. There is still one thing I'd like to do: make this code not rely on NUL for terminating the patterns. That should remove the ugly "p[len] = '\0'" in prepare_attr_stack() 4/4 and and the reallocation in add_exclude() (in current code). But let's deal with the regression first. Nguyễn Thái Ngọc Duy (4): wildmatch: do not require "text" to be NUL-terminated attr.c: fix pattern{,len} inconsistency in struct match_attr dir.c: make match_{base,path}name respect {basename,path}len attr.c: fix matching "subdir" without the trailing slash attr.c | 11 ++++++++++- dir.c | 13 ++++++++----- dir.h | 2 +- t/t5002-archive-attr-pattern.sh | 6 ++++++ wildmatch.c | 43 ++++++++++++++++++++++++----------------- wildmatch.h | 11 +++++++++-- 6 files changed, 59 insertions(+), 27 deletions(-) -- 1.8.2.82.gc24b958 ^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH 1/4] wildmatch: do not require "text" to be NUL-terminated 2013-03-25 6:05 ` [PATCH 0/4] attr directory matching regression Nguyễn Thái Ngọc Duy @ 2013-03-25 6:05 ` Nguyễn Thái Ngọc Duy 2013-03-25 6:05 ` [PATCH 2/4] attr.c: fix pattern{,len} inconsistency in struct match_attr Nguyễn Thái Ngọc Duy ` (3 subsequent siblings) 4 siblings, 0 replies; 66+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2013-03-25 6:05 UTC (permalink / raw) To: git Cc: Jeff King, avila.jn, Junio C Hamano, Nguyễn Thái Ngọc Duy This may be helpful when we just want to match a part of "text". nwildmatch can be used for this purpose. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- wildmatch.c | 43 +++++++++++++++++++++++++------------------ wildmatch.h | 11 +++++++++-- 2 files changed, 34 insertions(+), 20 deletions(-) diff --git a/wildmatch.c b/wildmatch.c index 7192bdc..939bac8 100644 --- a/wildmatch.c +++ b/wildmatch.c @@ -52,7 +52,8 @@ typedef unsigned char uchar; #define ISXDIGIT(c) (ISASCII(c) && isxdigit(c)) /* Match pattern "p" against "text" */ -static int dowild(const uchar *p, const uchar *text, unsigned int flags) +static int dowild(const uchar *p, const uchar *text, + const uchar *textend, unsigned int flags) { uchar p_ch; const uchar *pattern = p; @@ -60,8 +61,13 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) for ( ; (p_ch = *p) != '\0'; text++, p++) { int matched, match_slash, negated; uchar t_ch, prev_ch; - if ((t_ch = *text) == '\0' && p_ch != '*') - return WM_ABORT_ALL; + if (text >= textend) { + if (p_ch != '*') + return WM_ABORT_ALL; + else + t_ch = '\0'; + } else + t_ch = *text; if ((flags & WM_CASEFOLD) && ISUPPER(t_ch)) t_ch = tolower(t_ch); if ((flags & WM_CASEFOLD) && ISUPPER(p_ch)) @@ -101,7 +107,7 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) * both foo/bar and foo/a/bar. */ if (p[0] == '/' && - dowild(p + 1, text, flags) == WM_MATCH) + dowild(p + 1, text, textend, flags) == WM_MATCH) return WM_MATCH; match_slash = 1; } else @@ -113,8 +119,9 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) /* Trailing "**" matches everything. Trailing "*" matches * only if there are no more slash characters. */ if (!match_slash) { - if (strchr((char*)text, '/') != NULL) - return WM_NOMATCH; + for (;text < textend; text++) + if (*text == '/') + return WM_NOMATCH; } return WM_MATCH; } else if (!match_slash && *p == '/') { @@ -123,16 +130,15 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) * with WM_PATHNAME matches the next * directory */ - const char *slash = strchr((char*)text, '/'); - if (!slash) + for (;text < textend; text++) + if (*text == '/') + break; + if (text == textend) return WM_NOMATCH; - text = (const uchar*)slash; /* the slash is consumed by the top-level for loop */ break; } - while (1) { - if (t_ch == '\0') - break; + while (text < textend) { /* * Try to advance faster when an asterisk is * followed by a literal. We know in this case @@ -145,18 +151,18 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) p_ch = *p; if ((flags & WM_CASEFOLD) && ISUPPER(p_ch)) p_ch = tolower(p_ch); - while ((t_ch = *text) != '\0' && + while (text < textend && (match_slash || t_ch != '/')) { if ((flags & WM_CASEFOLD) && ISUPPER(t_ch)) t_ch = tolower(t_ch); if (t_ch == p_ch) break; - text++; + t_ch = ++text < textend ? *text : '\0'; } if (t_ch != p_ch) return WM_NOMATCH; } - if ((matched = dowild(p, text, flags)) != WM_NOMATCH) { + if ((matched = dowild(p, text, textend, flags)) != WM_NOMATCH) { if (!match_slash || matched != WM_ABORT_TO_STARSTAR) return matched; } else if (!match_slash && t_ch == '/') @@ -261,12 +267,13 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) } } - return *text ? WM_NOMATCH : WM_MATCH; + return text < textend ? WM_NOMATCH : WM_MATCH; } /* Match the "pattern" against the "text" string. */ -int wildmatch(const char *pattern, const char *text, +int nwildmatch(const char *pattern, const char *text, int textlen, unsigned int flags, struct wildopts *wo) { - return dowild((const uchar*)pattern, (const uchar*)text, flags); + return dowild((const uchar*)pattern, (const uchar*)text, + (const uchar*)text + textlen, flags); } diff --git a/wildmatch.h b/wildmatch.h index 4090c8f..cdd7544 100644 --- a/wildmatch.h +++ b/wildmatch.h @@ -12,7 +12,14 @@ struct wildopts; -int wildmatch(const char *pattern, const char *text, - unsigned int flags, +int nwildmatch(const char *pattern, const char *text, + int len, unsigned int flags, struct wildopts *wo); + +/* Match the "pattern" against the "text" string. */ +static inline int wildmatch(const char *pattern, const char *text, + unsigned int flags, struct wildopts *wo) +{ + return nwildmatch(pattern, text, strlen(text), flags, wo); +} #endif -- 1.8.2.82.gc24b958 ^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH 2/4] attr.c: fix pattern{,len} inconsistency in struct match_attr 2013-03-25 6:05 ` [PATCH 0/4] attr directory matching regression Nguyễn Thái Ngọc Duy 2013-03-25 6:05 ` [PATCH 1/4] wildmatch: do not require "text" to be NUL-terminated Nguyễn Thái Ngọc Duy @ 2013-03-25 6:05 ` Nguyễn Thái Ngọc Duy 2013-03-25 6:05 ` [PATCH 3/4] dir.c: make match_{base,path}name respect {basename,path}len Nguyễn Thái Ngọc Duy ` (2 subsequent siblings) 4 siblings, 0 replies; 66+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2013-03-25 6:05 UTC (permalink / raw) To: git Cc: Jeff King, avila.jn, Junio C Hamano, Nguyễn Thái Ngọc Duy When parse_exclude_pattern detects EXC_FLAG_MUSTBEDIR, it sets patternlen _not_ to include the trailing slash and expects the caller to trim it. Of the two callers, add_exclude() does, parse_attr_line() does not. Because of that, after parse_attr_line() returns, we may have pattern "foo/" but its length is reported 3. Some functions do not care about patternlen and will see the pattern as "foo/" while others may see it as "foo". This patch makes patternlen reflect the true length of pattern. This is a bandage patch that's required for the next patch to pass the test suite as that patch will rely on patternlen's correctness. The true fix comes in the patch after the next one. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- attr.c | 2 ++ dir.h | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/attr.c b/attr.c index e2f9377..1818ba5 100644 --- a/attr.c +++ b/attr.c @@ -255,6 +255,8 @@ static struct match_attr *parse_attr_line(const char *line, const char *src, &res->u.pat.patternlen, &res->u.pat.flags, &res->u.pat.nowildcardlen); + if (res->u.pat.flags & EXC_FLAG_MUSTBEDIR) + res->u.pat.patternlen++; if (res->u.pat.flags & EXC_FLAG_NEGATIVE) { warning(_("Negative patterns are ignored in git attributes\n" "Use '\\!' for literal leading exclamation.")); diff --git a/dir.h b/dir.h index c3eb4b5..dc63fc8 100644 --- a/dir.h +++ b/dir.h @@ -40,7 +40,7 @@ struct exclude_list { struct exclude_list *el; const char *pattern; - int patternlen; + int patternlen; /* must equal strlen(pattern) */ int nowildcardlen; const char *base; int baselen; -- 1.8.2.82.gc24b958 ^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH 3/4] dir.c: make match_{base,path}name respect {basename,path}len 2013-03-25 6:05 ` [PATCH 0/4] attr directory matching regression Nguyễn Thái Ngọc Duy 2013-03-25 6:05 ` [PATCH 1/4] wildmatch: do not require "text" to be NUL-terminated Nguyễn Thái Ngọc Duy 2013-03-25 6:05 ` [PATCH 2/4] attr.c: fix pattern{,len} inconsistency in struct match_attr Nguyễn Thái Ngọc Duy @ 2013-03-25 6:05 ` Nguyễn Thái Ngọc Duy 2013-03-25 6:05 ` [PATCH 4/4] attr.c: fix matching "subdir" without the trailing slash Nguyễn Thái Ngọc Duy 2013-03-26 15:10 ` [PATCH 0/4] attr directory matching regression Junio C Hamano 4 siblings, 0 replies; 66+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2013-03-25 6:05 UTC (permalink / raw) To: git Cc: Jeff King, avila.jn, Junio C Hamano, Nguyễn Thái Ngọc Duy When match_basename was split out of excluded_from_list in 593cb88 (exclude: split basename matching code into a separate function - 2012-10-15), it took basenamelen only as a hint. basename was required to be NUL-terminated at the given length. This was fine until match_basename had a new caller (from attr.c) and therefore was no longer excluded_from_list's internal business. Make match_basename stop relying on the NUL assumption above. Do the same for match_pathname. From now on, only pattern is required to be NUL-terminated at the specified patternlen for both match_{base,path}name. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- dir.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/dir.c b/dir.c index 57394e4..30f5241 100644 --- a/dir.c +++ b/dir.c @@ -626,15 +626,18 @@ int match_basename(const char *basename, int basenamelen, int flags) { if (prefix == patternlen) { - if (!strcmp_icase(pattern, basename)) + if (patternlen == basenamelen && + !strncmp_icase(pattern, basename, basenamelen)) return 1; } else if (flags & EXC_FLAG_ENDSWITH) { if (patternlen - 1 <= basenamelen && - !strcmp_icase(pattern + 1, - basename + basenamelen - patternlen + 1)) + !strncmp_icase(pattern + 1, + basename + basenamelen - patternlen + 1, + patternlen - 1)) return 1; } else { - if (fnmatch_icase(pattern, basename, 0) == 0) + if (nwildmatch(pattern, basename, basenamelen, + ignore_case ? WM_CASEFOLD : 0, NULL) == 0) return 1; } return 0; @@ -684,7 +687,7 @@ int match_pathname(const char *pathname, int pathlen, namelen -= prefix; } - return wildmatch(pattern, name, + return nwildmatch(pattern, name, namelen, WM_PATHNAME | (ignore_case ? WM_CASEFOLD : 0), NULL) == 0; } -- 1.8.2.82.gc24b958 ^ permalink raw reply related [flat|nested] 66+ messages in thread
* [PATCH 4/4] attr.c: fix matching "subdir" without the trailing slash 2013-03-25 6:05 ` [PATCH 0/4] attr directory matching regression Nguyễn Thái Ngọc Duy ` (2 preceding siblings ...) 2013-03-25 6:05 ` [PATCH 3/4] dir.c: make match_{base,path}name respect {basename,path}len Nguyễn Thái Ngọc Duy @ 2013-03-25 6:05 ` Nguyễn Thái Ngọc Duy 2013-03-25 7:20 ` Duy Nguyen 2013-03-26 15:10 ` [PATCH 0/4] attr directory matching regression Junio C Hamano 4 siblings, 1 reply; 66+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2013-03-25 6:05 UTC (permalink / raw) To: git Cc: Jeff King, avila.jn, Junio C Hamano, Nguyễn Thái Ngọc Duy The story goes back to 94bc671 (Add directory pattern matching to attributes - 2012-12-08). Before this commit, directories are passed to path_matches without the trailing slash. This is fine for matching pattern "subdir" with "foo/subdir". Patterns like "subdir/" (i.e. match _directory_ subdir) won't work though. So paths are now passed to path_matches with the trailing slash (i.e. "subdir/"). The trailing slash is used as the directory indicator (similar to dtype in exclude case). This makes pattern "subdir/" match directory "subdir/". Pattern "subdir" no longer match subdir, which is now "subdir/". As the trailing slash in pathname is the directory indicator, we do not need to keep it in the pathname for matching. The trailing slash should be turned to dtype "DT_DIR" and stripped out of pathname. This keeps the code pattern similar to exclude. The same applies for the pattern "subdir/". The trailing slash is converted to flag EXC_FLAG_MUSTBEDIR and should not remain in the pattern, as noted in parse_exclude_pattern(). prepare_attr_stack() breaks this and keeps the trailing slash anyway. To sum up, both patterns and pathnames should never have the trailing slash when it comes to match_basename. Reported-and-analyzed-by: Jeff King <peff@peff.net> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- attr.c | 11 +++++++++-- t/t5002-archive-attr-pattern.sh | 6 ++++++ 2 files changed, 15 insertions(+), 2 deletions(-) diff --git a/attr.c b/attr.c index 1818ba5..a95c837 100644 --- a/attr.c +++ b/attr.c @@ -256,7 +256,7 @@ static struct match_attr *parse_attr_line(const char *line, const char *src, &res->u.pat.flags, &res->u.pat.nowildcardlen); if (res->u.pat.flags & EXC_FLAG_MUSTBEDIR) - res->u.pat.patternlen++; + p[res->u.pat.patternlen] = '\0'; if (res->u.pat.flags & EXC_FLAG_NEGATIVE) { warning(_("Negative patterns are ignored in git attributes\n" "Use '\\!' for literal leading exclamation.")); @@ -665,9 +665,16 @@ static int path_matches(const char *pathname, int pathlen, { const char *pattern = pat->pattern; int prefix = pat->nowildcardlen; + int dtype; + + if (pathlen && pathname[pathlen-1] == '/') { + dtype = DT_DIR; + pathlen--; + } else + dtype = DT_REG; if ((pat->flags & EXC_FLAG_MUSTBEDIR) && - ((!pathlen) || (pathname[pathlen-1] != '/'))) + dtype != DT_DIR) return 0; if (pat->flags & EXC_FLAG_NODIR) { diff --git a/t/t5002-archive-attr-pattern.sh b/t/t5002-archive-attr-pattern.sh index 0c847fb..98ccc3c 100755 --- a/t/t5002-archive-attr-pattern.sh +++ b/t/t5002-archive-attr-pattern.sh @@ -27,6 +27,10 @@ test_expect_success 'setup' ' echo ignored-only-if-dir/ export-ignore >>.git/info/attributes && git add ignored-only-if-dir && + mkdir -p ignored-without-slash && + echo ignored without slash >ignored-without-slash/foo && + git add ignored-without-slash/foo && + echo ignored-without-slash export-ignore >>.git/info/attributes && mkdir -p one-level-lower/two-levels-lower/ignored-only-if-dir && echo ignored by ignored dir >one-level-lower/two-levels-lower/ignored-only-if-dir/ignored-by-ignored-dir && @@ -49,6 +53,8 @@ test_expect_exists archive/not-ignored-dir/ignored-only-if-dir test_expect_exists archive/not-ignored-dir/ test_expect_missing archive/ignored-only-if-dir/ test_expect_missing archive/ignored-ony-if-dir/ignored-by-ignored-dir +test_expect_missing archive/ignored-without-slash/ && +test_expect_missing archive/ignored-without-slash/foo && test_expect_exists archive/one-level-lower/ test_expect_missing archive/one-level-lower/two-levels-lower/ignored-only-if-dir/ test_expect_missing archive/one-level-lower/two-levels-lower/ignored-ony-if-dir/ignored-by-ignored-dir -- 1.8.2.82.gc24b958 ^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH 4/4] attr.c: fix matching "subdir" without the trailing slash 2013-03-25 6:05 ` [PATCH 4/4] attr.c: fix matching "subdir" without the trailing slash Nguyễn Thái Ngọc Duy @ 2013-03-25 7:20 ` Duy Nguyen 2013-03-25 9:24 ` Duy Nguyen 0 siblings, 1 reply; 66+ messages in thread From: Duy Nguyen @ 2013-03-25 7:20 UTC (permalink / raw) To: git Cc: Jeff King, avila.jn, Junio C Hamano, Nguyễn Thái Ngọc Duy On Mon, Mar 25, 2013 at 1:05 PM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote: > The story goes back to 94bc671 (Add directory pattern matching to > attributes - 2012-12-08). Before this commit, directories are passed > to path_matches without the trailing slash. This is fine for matching > pattern "subdir" with "foo/subdir". > > Patterns like "subdir/" (i.e. match _directory_ subdir) won't work > though. So paths are now passed to path_matches with the trailing > slash (i.e. "subdir/"). The trailing slash is used as the directory > indicator (similar to dtype in exclude case). This makes pattern > "subdir/" match directory "subdir/". Pattern "subdir" no longer match > subdir, which is now "subdir/". > > As the trailing slash in pathname is the directory indicator, we do > not need to keep it in the pathname for matching. The trailing slash > should be turned to dtype "DT_DIR" and stripped out of pathname. This > keeps the code pattern similar to exclude. On second thought, maybe we should not pass path "subdir/" at all. Instead we create a fake dtype based on the trailing slash and pass it down to attr.c:fill() -> path_matches(), just like how last_exclude_matching_from_list() is called. -- Duy ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 4/4] attr.c: fix matching "subdir" without the trailing slash 2013-03-25 7:20 ` Duy Nguyen @ 2013-03-25 9:24 ` Duy Nguyen 0 siblings, 0 replies; 66+ messages in thread From: Duy Nguyen @ 2013-03-25 9:24 UTC (permalink / raw) To: git; +Cc: Jeff King, avila.jn, Junio C Hamano On Mon, Mar 25, 2013 at 02:20:31PM +0700, Duy Nguyen wrote: > On second thought, maybe we should not pass path "subdir/" at all. > Instead we create a fake dtype based on the trailing slash and pass it > down to attr.c:fill() -> path_matches(), just like how > last_exclude_matching_from_list() is called. I was hoping to make a small patch, but as it turns out, collect_all_attrs() takes a const path that contains the trailing slash, we still need to ignore it in match_{base,path}name so the whole series is still required. The only difference is in the final patch, which is a bit longer: -- 8< -- diff --git a/attr.c b/attr.c index 1818ba5..0b2b716 100644 --- a/attr.c +++ b/attr.c @@ -256,7 +256,7 @@ static struct match_attr *parse_attr_line(const char *line, const char *src, &res->u.pat.flags, &res->u.pat.nowildcardlen); if (res->u.pat.flags & EXC_FLAG_MUSTBEDIR) - res->u.pat.patternlen++; + p[res->u.pat.patternlen] = '\0'; if (res->u.pat.flags & EXC_FLAG_NEGATIVE) { warning(_("Negative patterns are ignored in git attributes\n" "Use '\\!' for literal leading exclamation.")); @@ -659,15 +659,14 @@ static void prepare_attr_stack(const char *path, int dirlen) } static int path_matches(const char *pathname, int pathlen, - const char *basename, + const char *basename, int dtype, const struct pattern *pat, const char *base, int baselen) { const char *pattern = pat->pattern; int prefix = pat->nowildcardlen; - if ((pat->flags & EXC_FLAG_MUSTBEDIR) && - ((!pathlen) || (pathname[pathlen-1] != '/'))) + if ((pat->flags & EXC_FLAG_MUSTBEDIR) && dtype != DT_DIR) return 0; if (pat->flags & EXC_FLAG_NODIR) { @@ -706,7 +705,7 @@ static int fill_one(const char *what, struct match_attr *a, int rem) } static int fill(const char *path, int pathlen, const char *basename, - struct attr_stack *stk, int rem) + int dtype, struct attr_stack *stk, int rem) { int i; const char *base = stk->origin ? stk->origin : ""; @@ -715,7 +714,7 @@ static int fill(const char *path, int pathlen, const char *basename, struct match_attr *a = stk->attrs[i]; if (a->is_macro) continue; - if (path_matches(path, pathlen, basename, + if (path_matches(path, pathlen, basename, dtype, &a->u.pat, base, stk->originlen)) rem = fill_one("fill", a, rem); } @@ -755,11 +754,17 @@ static void collect_all_attrs(const char *path) struct attr_stack *stk; int i, pathlen, rem, dirlen; const char *basename, *cp, *last_slash = NULL; + int dtype; for (cp = path; *cp; cp++) { if (*cp == '/' && cp[1]) last_slash = cp; } + if (cp > path && cp[-1] == '/') { + dtype = DT_DIR; + cp--; + } else + dtype = DT_REG; pathlen = cp - path; if (last_slash) { basename = last_slash + 1; @@ -775,7 +780,7 @@ static void collect_all_attrs(const char *path) rem = attr_nr; for (stk = attr_stack; 0 < rem && stk; stk = stk->prev) - rem = fill(path, pathlen, basename, stk, rem); + rem = fill(path, pathlen, basename, dtype, stk, rem); } int git_check_attr(const char *path, int num, struct git_attr_check *check) diff --git a/t/t5002-archive-attr-pattern.sh b/t/t5002-archive-attr-pattern.sh index 0c847fb..98ccc3c 100755 --- a/t/t5002-archive-attr-pattern.sh +++ b/t/t5002-archive-attr-pattern.sh @@ -27,6 +27,10 @@ test_expect_success 'setup' ' echo ignored-only-if-dir/ export-ignore >>.git/info/attributes && git add ignored-only-if-dir && + mkdir -p ignored-without-slash && + echo ignored without slash >ignored-without-slash/foo && + git add ignored-without-slash/foo && + echo ignored-without-slash export-ignore >>.git/info/attributes && mkdir -p one-level-lower/two-levels-lower/ignored-only-if-dir && echo ignored by ignored dir >one-level-lower/two-levels-lower/ignored-only-if-dir/ignored-by-ignored-dir && @@ -49,6 +53,8 @@ test_expect_exists archive/not-ignored-dir/ignored-only-if-dir test_expect_exists archive/not-ignored-dir/ test_expect_missing archive/ignored-only-if-dir/ test_expect_missing archive/ignored-ony-if-dir/ignored-by-ignored-dir +test_expect_missing archive/ignored-without-slash/ && +test_expect_missing archive/ignored-without-slash/foo && test_expect_exists archive/one-level-lower/ test_expect_missing archive/one-level-lower/two-levels-lower/ignored-only-if-dir/ test_expect_missing archive/one-level-lower/two-levels-lower/ignored-ony-if-dir/ignored-by-ignored-dir -- 8<-- ^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH 0/4] attr directory matching regression 2013-03-25 6:05 ` [PATCH 0/4] attr directory matching regression Nguyễn Thái Ngọc Duy ` (3 preceding siblings ...) 2013-03-25 6:05 ` [PATCH 4/4] attr.c: fix matching "subdir" without the trailing slash Nguyễn Thái Ngọc Duy @ 2013-03-26 15:10 ` Junio C Hamano 4 siblings, 0 replies; 66+ messages in thread From: Junio C Hamano @ 2013-03-26 15:10 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King, avila.jn Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > I think the fix is something like this. There is still one thing I'd > like to do: make this code not rely on NUL for terminating the > patterns. That should remove the ugly "p[len] = '\0'" in > prepare_attr_stack() 4/4 and and the reallocation in add_exclude() (in > current code). But let's deal with the regression first. As a regression fix, we would need a fix that applies to maint-1.8.1, not 'next'. > > Nguyễn Thái Ngọc Duy (4): > wildmatch: do not require "text" to be NUL-terminated > attr.c: fix pattern{,len} inconsistency in struct match_attr > dir.c: make match_{base,path}name respect {basename,path}len > attr.c: fix matching "subdir" without the trailing slash > > attr.c | 11 ++++++++++- > dir.c | 13 ++++++++----- > dir.h | 2 +- > t/t5002-archive-attr-pattern.sh | 6 ++++++ > wildmatch.c | 43 ++++++++++++++++++++++++----------------- > wildmatch.h | 11 +++++++++-- > 6 files changed, 59 insertions(+), 27 deletions(-) ^ permalink raw reply [flat|nested] 66+ messages in thread
end of thread, other threads:[~2013-03-30 1:40 UTC | newest] Thread overview: 66+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-03-19 17:57 [regression?] trailing slash required in .gitattributes Jeff King 2013-03-19 18:10 ` Junio C Hamano 2013-03-19 18:10 ` Jeff King 2013-03-22 22:24 ` Jeff King 2013-03-22 23:08 ` Junio C Hamano 2013-03-23 8:39 ` Jeff King 2013-03-24 5:25 ` Junio C Hamano 2013-03-26 18:39 ` [PATCH 0/4] attribute regression fix for maint-1.8.1 and upward Junio C Hamano 2013-03-26 18:39 ` [PATCH 1/4] attr.c::path_matches(): the basename is part of the pathname Junio C Hamano 2013-03-26 18:49 ` Jeff King 2013-03-27 1:40 ` Duy Nguyen 2013-03-26 18:39 ` [PATCH 2/4] dir.c::match_basename(): pay attention to the length of string parameters Junio C Hamano 2013-03-26 18:55 ` Jeff King 2013-03-26 20:39 ` Jeff King 2013-03-26 20:49 ` Junio C Hamano 2013-03-26 21:29 ` Jeff King 2013-03-26 22:33 ` Junio C Hamano 2013-03-27 1:04 ` Jeff King 2013-03-26 18:39 ` [PATCH 3/4] attr.c::path_matches(): special case paths that end with a slash Junio C Hamano 2013-03-26 19:05 ` Jeff King 2013-03-26 21:33 ` Jeff King 2013-03-27 1:30 ` Duy Nguyen 2013-03-28 19:49 ` Jeff King 2013-03-26 18:39 ` [PATCH 4/4] make sure a pattern without trailing slash matches a directory Junio C Hamano 2013-03-26 19:08 ` Jeff King 2013-03-27 1:13 ` [PATCH 0/4] attribute regression fix for maint-1.8.1 and upward Duy Nguyen 2013-03-27 3:57 ` Junio C Hamano 2013-03-27 4:01 ` Duy Nguyen 2013-03-28 21:43 ` [PATCH v2 0/6] " Jeff King 2013-03-28 21:45 ` [PATCH 1/6] attr.c::path_matches(): the basename is part of the pathname Jeff King 2013-03-28 21:47 ` [PATCH 2/6] dir.c::match_basename(): pay attention to the length of string parameters Jeff King 2013-03-28 22:40 ` Jeff King 2013-03-28 22:49 ` Jeff King 2013-03-28 23:10 ` Junio C Hamano 2013-03-28 23:40 ` Duy Nguyen 2013-03-29 1:25 ` Duy Nguyen 2013-03-29 3:02 ` Jeff King 2013-03-29 5:57 ` Junio C Hamano 2013-03-28 21:47 ` [PATCH 3/6] dir.c::match_pathname(): adjust patternlen when shifting pattern Jeff King 2013-03-28 21:48 ` [PATCH 4/6] dir.c::match_pathname(): pay attention to the length of string parameters Jeff King 2013-03-28 22:30 ` Junio C Hamano 2013-03-29 8:45 ` Duy Nguyen 2013-03-29 10:03 ` Duy Nguyen 2013-03-29 11:32 ` Torsten Bögershausen 2013-03-29 11:37 ` Duy Nguyen 2013-03-29 12:05 ` Jeff King 2013-03-29 13:02 ` Duy Nguyen 2013-03-29 16:44 ` Junio C Hamano 2013-03-29 17:04 ` Jeff King 2013-03-29 17:35 ` Junio C Hamano 2013-03-29 17:44 ` Jeff King 2013-03-30 1:40 ` Duy Nguyen 2013-03-28 21:49 ` [PATCH 5/6] attr.c::path_matches(): special case paths that end with a slash Jeff King 2013-03-28 21:50 ` [PATCH 6/6] t: check that a pattern without trailing slash matches a directory Jeff King 2013-03-28 22:21 ` Eric Sunshine 2013-03-28 22:22 ` Jeff King 2013-03-23 4:18 ` [regression?] trailing slash required in .gitattributes Duy Nguyen 2013-03-23 4:43 ` Duy Nguyen 2013-03-25 6:05 ` [PATCH 0/4] attr directory matching regression Nguyễn Thái Ngọc Duy 2013-03-25 6:05 ` [PATCH 1/4] wildmatch: do not require "text" to be NUL-terminated Nguyễn Thái Ngọc Duy 2013-03-25 6:05 ` [PATCH 2/4] attr.c: fix pattern{,len} inconsistency in struct match_attr Nguyễn Thái Ngọc Duy 2013-03-25 6:05 ` [PATCH 3/4] dir.c: make match_{base,path}name respect {basename,path}len Nguyễn Thái Ngọc Duy 2013-03-25 6:05 ` [PATCH 4/4] attr.c: fix matching "subdir" without the trailing slash Nguyễn Thái Ngọc Duy 2013-03-25 7:20 ` Duy Nguyen 2013-03-25 9:24 ` Duy Nguyen 2013-03-26 15:10 ` [PATCH 0/4] attr directory matching regression Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).