From: Derrick Stolee <stolee@gmail.com>
To: Jeff King <peff@peff.net>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, me@ttaylorr.com, newren@gmail.com,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH v3 09/12] sparse-checkout: properly match escaped characters
Date: Wed, 29 Jan 2020 08:58:59 -0500
Message-ID: <6003bbf2-ad16-0686-dc58-2010fe02ce05@gmail.com>
In-Reply-To: <20200129100309.GA4218@coredump.intra.peff.net>

On 1/29/2020 5:03 AM, Jeff King wrote:
> On Tue, Jan 28, 2020 at 06:26:40PM +0000, Derrick Stolee via GitGitGadget wrote:
> 
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> In cone mode, the sparse-checkout feature uses hashset containment
>> queries to match paths. Make this algorithm respect escaped asterisk
>> (*) and backslash (\) characters.
> 
> Do we also need to worry about other glob metacharacters? E.g., "?" or
> ranges like "[A-Z]"?

These are not part of the .gitignore patterns [1].

[1] https://git-scm.com/docs/gitignore#_pattern_format

>> +static char *dup_and_filter_pattern(const char *pattern)
>> +{
>> +	char *set, *read;
>> +	char *result = xstrdup(pattern);
>> +
>> +	set = result;
>> +	read = result;
>> +
>> +	while (*read) {
>> +		/* skip escape characters (once) */
>> +		if (*read == '\\')
>> +			read++;
>> +
>> +		*set = *read;
>> +
>> +		set++;
>> +		read++;
>> +	}
>> +	*set = 0;
>> +
>> +	if (*(read - 2) == '/' && *(read - 1) == '*')
>> +		*(read - 2) = 0;
>> +
>> +	return result;
>> +}
> 
> Do we need to check that the pattern is longer than 1 character here? If
> it's a single character, it seems like this "*(read - 2)" will
> dereference the byte before the string.

This method is only called by add_pattern_to_hashsets(), which
has a guard against paths of length less than 2, but that's no
excuse for dangerous pointer arithmetic here.

But you also point out an even more confusing thing: why are we
modifying based on the 'read' pointer, and not the 'set' pointer?
This seems to work _accidentally_ only when the pattern has "<something>/*"
and "<something>" has no escape characters.
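To see the mismatch concretely, here is a standalone reproduction under stated assumptions. filter_v3() copies the loop from the patch quoted above, and filter_fixed() applies the set-based check with a length guard, as in the diff further down. xstrdup_sketch() is a hypothetical malloc-based stand-in for Git's xstrdup(). None of these names exist in dir.c. Both loops add one safety check that is not in the patch: a lone trailing backslash is copied literally instead of running past the terminator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Git's xstrdup(): abort instead of returning NULL. */
static char *xstrdup_sketch(const char *s)
{
	char *r = malloc(strlen(s) + 1);

	if (!r)
		abort();
	return strcpy(r, s);
}

/*
 * The v3 logic: unescape in place, then inspect the two bytes before
 * the final 'read' position. After an escape has been skipped, 'read'
 * is ahead of 'set', so those bytes are no longer the last two
 * characters of the filtered string. Like the original, it needs at
 * least two input characters.
 */
static char *filter_v3(const char *pattern)
{
	char *result, *set, *read;

	assert(strlen(pattern) >= 2);
	result = xstrdup_sketch(pattern);
	set = read = result;

	while (*read) {
		/* skip an escape once (added guard: never past the NUL) */
		if (*read == '\\' && read[1])
			read++;
		*set++ = *read++;
	}
	*set = 0;

	if (*(read - 2) == '/' && *(read - 1) == '*')
		*(read - 2) = 0;
	return result;
}

/* The fixed logic: check the filtered output via 'set', with a length guard. */
static char *filter_fixed(const char *pattern)
{
	char *result = xstrdup_sketch(pattern);
	char *set = result, *read = result;
	size_t count = 0;

	while (*read) {
		if (*read == '\\' && read[1])
			read++;
		*set++ = *read++;
		count++;
	}
	*set = 0;

	if (count > 2 && *(set - 1) == '*' && *(set - 2) == '/')
		*(set - 2) = 0;
	return result;
}
```

Given the C string "a\\\\b/*" (the pattern a\\b/*, i.e. an escaped backslash), filter_v3() returns a\b/* with the trailing "/*" still attached. By the time it checks, the byte at read - 1 has already been overwritten by the terminator. filter_fixed() returns a\b. For a pattern without escapes, such as a/b/*, read and set end at the same place and both return a/b. That is why the original code appeared to work.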

I had to recall exactly why we drop this "/*" at all. It is because
the pattern as written _actually_ ends with "/*/", but the in-memory
pattern has already dropped that last slash and applied
PATTERN_FLAG_MUSTBEDIR.
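Here is a minimal sketch of that parsing step, assuming a simplified model. struct sketch_pattern, parse_line_sketch() and the SKETCH_FLAG_* constants are hypothetical stand-ins for Git's pattern parsing, PATTERN_FLAG_NEGATIVE and PATTERN_FLAG_MUSTBEDIR. It shows why the stored text of the line !/zbad\\dir/*/ ends in "/*" rather than "/*/". The result matches the warning text quoted later in this message.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for PATTERN_FLAG_NEGATIVE and PATTERN_FLAG_MUSTBEDIR. */
#define SKETCH_FLAG_NEGATIVE  0x1u
#define SKETCH_FLAG_MUSTBEDIR 0x2u

struct sketch_pattern {
	char text[256]; /* in-memory pattern; escapes are still present */
	unsigned flags;
};

/*
 * Parse one line of a sparse-checkout file. A leading '!' becomes
 * SKETCH_FLAG_NEGATIVE and a trailing '/' becomes SKETCH_FLAG_MUSTBEDIR;
 * neither character is kept in 'text'. Returns -1 if the line does
 * not fit in the fixed-size buffer.
 */
static int parse_line_sketch(const char *line, struct sketch_pattern *p)
{
	size_t len;

	assert(line != NULL && p != NULL);
	p->flags = 0;
	if (*line == '!') {
		p->flags |= SKETCH_FLAG_NEGATIVE;
		line++;
	}
	len = strlen(line);
	if (len > 0 && line[len - 1] == '/') {
		p->flags |= SKETCH_FLAG_MUSTBEDIR;
		len--;
	}
	if (len >= sizeof(p->text))
		return -1;
	memcpy(p->text, line, len);
	p->text[len] = '\0';
	return 0;
}
```

So by the time dup_and_filter_pattern() sees this pattern, the suffix it has to recognize is "/*", and the directory-only meaning lives in the flag.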

Here is a diff that I can apply to this patch to fix this problem
_and_ demonstrate it in the tests:

diff --git a/dir.c b/dir.c
index 579f274d13..277577c8bf 100644
--- a/dir.c
+++ b/dir.c
@@ -633,6 +633,7 @@ int pl_hashmap_cmp(const void *unused_cmp_data,
 static char *dup_and_filter_pattern(const char *pattern)
 {
        char *set, *read;
+       size_t count  = 0;
        char *result = xstrdup(pattern);
 
        set = result;
@@ -647,11 +648,14 @@ static char *dup_and_filter_pattern(const char *pattern)
 
                set++;
                read++;
+               count++;
        }
        *set = 0;
 
-       if (*(read - 2) == '/' && *(read - 1) == '*')
-               *(read - 2) = 0;
+       if (count > 2 &&
+           *(set - 1) == '*' &&
+           *(set - 2) == '/')
+               *(set - 2) = 0;
 
        return result;
 }
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index 0a21a5e15d..20b0465f77 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -383,6 +383,7 @@ test_expect_success BSLASHPSPEC 'pattern-checks: escaped "*"' '
        /*
        !/*/
        /zbad\\dir/
+       !/zbad\\dir/*/
        /zdoes\*not\*exist/
        /zdoes\*exist/
        EOF

With this extra line in the test, but with the old version of this patch
compiled, the test fails with:

'err' is not empty, it contains:
+ cat err
warning: unrecognized negative pattern: '/zbad\\dir/*'
warning: disabling cone pattern matching

To ensure this negative pattern also exists in the later patch, where we
set the patterns using the builtin, I'll add "zbad\\dir/bogus" to the list
of directories to include. That adds another pattern to the set.
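Here is a rough sketch of what including zbad\dir/bogus does to the generated pattern set. It assumes a simplified cone-mode writer. Every parent of a recursive directory gets a /parent/ line plus a negative !/parent/*/ line, and the directory itself gets /dir/. Backslash and asterisk are escaped on output. cone_patterns_for(), append_raw() and append_escaped() are hypothetical names, not Git's implementation.

```c
#include <string.h>

/* Append 's' verbatim; returns -1 if 'out' (capacity 'cap') is too small. */
static int append_raw(char *out, size_t cap, const char *s)
{
	size_t n = strlen(out), m = strlen(s);

	if (n + m >= cap)
		return -1;
	memcpy(out + n, s, m + 1);
	return 0;
}

/* Append the first 'len' bytes of 's', escaping backslash and asterisk. */
static int append_escaped(char *out, size_t cap, const char *s, size_t len)
{
	size_t n = strlen(out);

	for (size_t i = 0; i < len; i++) {
		int esc = (s[i] == '\\' || s[i] == '*');

		if (n + esc + 1 >= cap) {
			out[n] = '\0'; /* keep 'out' terminated on failure */
			return -1;
		}
		if (esc)
			out[n++] = '\\';
		out[n++] = s[i];
	}
	out[n] = '\0';
	return 0;
}

/*
 * Hypothetical sketch: write the cone-mode lines for a single recursive
 * directory 'dir' (no leading or trailing slash) into 'out', one per line.
 * Returns -1 if 'out' is too small.
 */
static int cone_patterns_for(const char *dir, char *out, size_t cap)
{
	int err = 0;
	const char *slash;

	if (cap == 0)
		return -1;
	out[0] = '\0';
	err |= append_raw(out, cap, "/*\n!/*/\n");
	for (slash = strchr(dir, '/'); slash; slash = strchr(slash + 1, '/')) {
		size_t len = (size_t)(slash - dir);

		/* each parent: include it, but exclude its subdirectories */
		err |= append_raw(out, cap, "/");
		err |= append_escaped(out, cap, dir, len);
		err |= append_raw(out, cap, "/\n!/");
		err |= append_escaped(out, cap, dir, len);
		err |= append_raw(out, cap, "/*/\n");
	}
	err |= append_raw(out, cap, "/");
	err |= append_escaped(out, cap, dir, strlen(dir));
	err |= append_raw(out, cap, "/\n");
	return err ? -1 : 0;
}
```

For zbad\dir/bogus, this sketch emits /zbad\\dir/, !/zbad\\dir/*/ and /zbad\\dir/bogus/ after the two root lines. The negative !/zbad\\dir/*/ pattern therefore appears in the generated file, as the updated test expects.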

Thanks,
-Stolee

