All of lore.kernel.org
 help / color / mirror / Atom feed
* bug: sparse config interpretation incorrectness in 2.8.0-rc2
@ 2016-03-17  0:09 Durham Goode
  2016-03-17  0:56 ` Duy Nguyen
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Durham Goode @ 2016-03-17  0:09 UTC (permalink / raw)
  To: git; +Cc: pclouds, Mateusz Kwapich

Using git 2.8.0-rc2, given a repo with the following files:

- one/hideme
- one/donthide
- two/foo

A sparse config of:

cat > .git/info/sparse-checkout <<EOF
/*
!one/hideme
EOF

Results in a repository that only has `one/donthide` in it.  I would 
expect `two/foo`to be present as well.  This worked in 2.6, and 
bisecting it points to d589a67eceacd1cc171bbe94906ca7c9a0edd8c5 "dir.c: 
don't exclude whole dir prematurely" (author cc'd).

The script I used to repro and for bisecting is pasted below:





#!/bin/bash

set -x
rm -rf sparse-test
GIT=git

$GIT init sparse-test
cd sparse-test
$GIT config --add core.sparsecheckout true

mkdir one two
touch one/hideme
touch one/donthide
touch two/foo

$GIT add .
$GIT commit -m "initial commit"
$GIT read-tree --reset -u HEAD

mkdir .git/info
cat > .git/info/sparse-checkout <<EOF
/*
!one/hideme
EOF
$GIT read-tree --reset -u HEAD

ls -R one two
set +x
echo
echo expected: see one/donthide and two/foo
echo actual: see only one/donthide

[ -d two ] && exit 0
exit 1

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: bug: sparse config interpretation incorrectness in 2.8.0-rc2
  2016-03-17  0:09 bug: sparse config interpretation incorrectness in 2.8.0-rc2 Durham Goode
@ 2016-03-17  0:56 ` Duy Nguyen
  2016-03-17  6:49   ` Durham Goode
  2016-03-17  7:51   ` Junio C Hamano
  2016-03-17  7:22 ` Junio C Hamano
  2016-03-17 12:45 ` [PATCH 1/2] dir.c: fix bug in 'nd/exclusion-regression-fix' topic Nguyễn Thái Ngọc Duy
  2 siblings, 2 replies; 30+ messages in thread
From: Duy Nguyen @ 2016-03-17  0:56 UTC (permalink / raw)
  To: Durham Goode; +Cc: git, Mateusz Kwapich

On Wed, Mar 16, 2016 at 05:09:23PM -0700, Durham Goode wrote:
> Using git 2.8.0-rc2, given a repo with the following files:
> 
> - one/hideme
> - one/donthide
> - two/foo
> 
> A sparse config of:
> 
> cat > .git/info/sparse-checkout <<EOF
> /*
> !one/hideme
> EOF
> 
> Results in a repository that only has `one/donthide` in it.  I would
> expect `two/foo`to be present as well.  This worked in 2.6, and
> bisecting it points to d589a67eceacd1cc171bbe94906ca7c9a0edd8c5
> "dir.c: don't exclude whole dir prematurely" (author cc'd).

Thank you. This should fix it. I think I understand why it goes
wrong. I'm going to run some more tests and post a proper patch later.

-- 8< --
diff --git a/dir.c b/dir.c
index 69e0be6..77f38a5 100644
--- a/dir.c
+++ b/dir.c
@@ -1027,7 +1027,6 @@ static struct exclude *last_exclude_matching_from_list(const char *pathname,
 				exc = x;
 				break;
 			}
-			continue;
 		}
 
 		if (x->flags & EXC_FLAG_MUSTBEDIR) {
-- 8< --
--
Duy

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: bug: sparse config interpretation incorrectness in 2.8.0-rc2
  2016-03-17  0:56 ` Duy Nguyen
@ 2016-03-17  6:49   ` Durham Goode
  2016-03-17  7:51   ` Junio C Hamano
  1 sibling, 0 replies; 30+ messages in thread
From: Durham Goode @ 2016-03-17  6:49 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: git, Mateusz Kwapich

On 3/16/16 5:56 PM, Duy Nguyen wrote:
> On Wed, Mar 16, 2016 at 05:09:23PM -0700, Durham Goode wrote:
>> Using git 2.8.0-rc2, given a repo with the following files:
>>
>> - one/hideme
>> - one/donthide
>> - two/foo
>>
>> A sparse config of:
>>
>> cat > .git/info/sparse-checkout <<EOF
>> /*
>> !one/hideme
>> EOF
>>
>> Results in a repository that only has `one/donthide` in it.  I would
>> expect `two/foo`to be present as well.  This worked in 2.6, and
>> bisecting it points to d589a67eceacd1cc171bbe94906ca7c9a0edd8c5
>> "dir.c: don't exclude whole dir prematurely" (author cc'd).
> Thank you. This should fix it. I think I understand why it goes
> wrong. I'm going to run some more tests and post a proper patch later.
>
> -- 8< --
> diff --git a/dir.c b/dir.c
> index 69e0be6..77f38a5 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1027,7 +1027,6 @@ static struct exclude *last_exclude_matching_from_list(const char *pathname,
>   				exc = x;
>   				break;
>   			}
> -			continue;
>   		}
>   
>   		if (x->flags & EXC_FLAG_MUSTBEDIR) {
> -- 8< --
> --
> Duy
To provide some perspective on the severity of this bug: all of our 
users who were using sparse checkouts had their working copy mostly 
deleted when they did a checkout after we upgraded to 2.8.0-rc2.  So I'd 
think this is a fix that should get in to 2.8.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: bug: sparse config interpretation incorrectness in 2.8.0-rc2
  2016-03-17  0:09 bug: sparse config interpretation incorrectness in 2.8.0-rc2 Durham Goode
  2016-03-17  0:56 ` Duy Nguyen
@ 2016-03-17  7:22 ` Junio C Hamano
  2016-03-17 17:51   ` Durham Goode
  2016-03-17 12:45 ` [PATCH 1/2] dir.c: fix bug in 'nd/exclusion-regression-fix' topic Nguyễn Thái Ngọc Duy
  2 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2016-03-17  7:22 UTC (permalink / raw)
  To: Durham Goode; +Cc: git, pclouds, Mateusz Kwapich

Durham Goode <durham@fb.com> writes:

> Using git 2.8.0-rc2, given a repo with the following files:
>
> - one/hideme
> - one/donthide
> - two/foo
>
> A sparse config of:
>
> cat > .git/info/sparse-checkout <<EOF
> /*
> !one/hideme
> EOF
>
> Results in a repository that only has `one/donthide` in it.  I would
> expect `two/foo`to be present as well.  This worked in 2.6,...

2.6 is a tad too old as a reference, as the "!reinclusion" has been
in flux in recent releases.  Can you test these?

 - 2.7.0
 - 2.7.1
 - e79112d

I suspect that v2.7.0 would be broken, v2.7.1 is OK and e79112d
would also be OK (I am guessing this from [1]).  e79112d is the last
version on the 'master' branch without the topic that contains the
commit you bisected down to, but between 2.7.0 and 2.7.1 there was a
reversion of a commit that introduced the original issue.

The commit you bisected down to that is on 'master', IIUC, was a
(faulty) attempt to fix the breakage in a different way.


[Reference]

*1* http://thread.gmane.org/gmane.comp.version-control.git/288228/focus=288273

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: bug: sparse config interpretation incorrectness in 2.8.0-rc2
  2016-03-17  0:56 ` Duy Nguyen
  2016-03-17  6:49   ` Durham Goode
@ 2016-03-17  7:51   ` Junio C Hamano
  2016-03-17 10:17     ` Duy Nguyen
  1 sibling, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2016-03-17  7:51 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Durham Goode, git, Mateusz Kwapich

Duy Nguyen <pclouds@gmail.com> writes:

> On Wed, Mar 16, 2016 at 05:09:23PM -0700, Durham Goode wrote:
>> Using git 2.8.0-rc2, given a repo with the following files:
>> 
>> - one/hideme
>> - one/donthide
>> - two/foo
>> 
>> A sparse config of:
>> 
>> cat > .git/info/sparse-checkout <<EOF
>> /*
>> !one/hideme
>> EOF
>> 
>> Results in a repository that only has `one/donthide` in it.  I would
>> expect `two/foo`to be present as well.  This worked in 2.6, and
>> bisecting it points to d589a67eceacd1cc171bbe94906ca7c9a0edd8c5
>> "dir.c: don't exclude whole dir prematurely" (author cc'd).
>
> Thank you. This should fix it. I think I understand why it goes
> wrong. I'm going to run some more tests and post a proper patch later.

I admit that I've always considered "sparse checkout" was an
uninteresting experimental feature that I do not want to pay too
much attention to, and the only review I did carefully myself was to
make sure that patches around that area would not change the
behaviour of the original code in repositories that do not use that
feature, so please do not laugh too loudly at me if I ask the
obvious ;-) So the pattern list in that info/sparse-checkout file
shares the same logic with the gitignore mechanism, and the paths
that would have been "ignored" if the pattern list were in the
.gitignore file are the ones that would be included in the checkout?

If so, among the three paths Durham lists:

 * one/hideme matches !one/hideme the last, i.e. it would not be
   ignored if the pattern appeared in .gitignore, hence it should not
   be in the checkout;

 * one/donthide matches * the last, i.e. would be ignored, hence it
   should be in the checkout;

 * two/foo (or really anything) matches * the last, i.e. would be
   ignored, hence it should be in the checkout.

Am I reading the bug correctly?

> -- 8< --
> diff --git a/dir.c b/dir.c
> index 69e0be6..77f38a5 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1027,7 +1027,6 @@ static struct exclude *last_exclude_matching_from_list(const char *pathname,
>  				exc = x;
>  				break;
>  			}
> -			continue;
>  		}
>  
>  		if (x->flags & EXC_FLAG_MUSTBEDIR) {
> -- 8< --
> --
> Duy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: bug: sparse config interpretation incorrectness in 2.8.0-rc2
  2016-03-17  7:51   ` Junio C Hamano
@ 2016-03-17 10:17     ` Duy Nguyen
  2016-03-17 13:04       ` Johannes Schindelin
  0 siblings, 1 reply; 30+ messages in thread
From: Duy Nguyen @ 2016-03-17 10:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Durham Goode, Git Mailing List, Mateusz Kwapich

On Thu, Mar 17, 2016 at 2:51 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Duy Nguyen <pclouds@gmail.com> writes:
>
>> On Wed, Mar 16, 2016 at 05:09:23PM -0700, Durham Goode wrote:
>>> Using git 2.8.0-rc2, given a repo with the following files:
>>>
>>> - one/hideme
>>> - one/donthide
>>> - two/foo
>>>
>>> A sparse config of:
>>>
>>> cat > .git/info/sparse-checkout <<EOF
>>> /*
>>> !one/hideme
>>> EOF
>>>
>>> Results in a repository that only has `one/donthide` in it.  I would
>>> expect `two/foo`to be present as well.  This worked in 2.6, and
>>> bisecting it points to d589a67eceacd1cc171bbe94906ca7c9a0edd8c5
>>> "dir.c: don't exclude whole dir prematurely" (author cc'd).
>>
>> Thank you. This should fix it. I think I understand why it goes
>> wrong. I'm going to run some more tests and post a proper patch later.
>
> I admit that I've always considered "sparse checkout" was an
> uninteresting experimental feature that I do not want to pay too
> much attention to, and the only review I did carefully myself was to
> make sure that patches around that area would not change the
> behaviour of the original code in repositories that do not use that
> feature, so please do not laugh too loudly at me if I ask the
> obvious ;-)

Good news for you is there's "sparse checkout v2" in the work, that
would not rely on exclude engine and should be both faster and more
elegant. That should reduce "sparse checkout v1" usage to really small
cases.

> So the pattern list in that info/sparse-checkout file
> shares the same logic with the gitignore mechanism, and the paths
> that would have been "ignored" if the pattern list were in the
> .gitignore file are the ones that would be included in the checkout?
>
> If so, among the three paths Durham lists:
>
>  * one/hideme matches !one/hideme the last, i.e. it would not be
>    ignored if the pattern appeared in .gitignore, hence it should not
>    be in the checkout;
>
>  * one/donthide matches * the last, i.e. would be ignored, hence it
>    should be in the checkout;
>
>  * two/foo (or really anything) matches * the last, i.e. would be
>    ignored, hence it should be in the checkout.
>
> Am I reading the bug correctly?

Yes that's how I read it too. The patterns basically say "include
everything (/*) except one/hideme (!one/hideme)" so two/foo should be
included.

>> -- 8< --
>> diff --git a/dir.c b/dir.c
>> index 69e0be6..77f38a5 100644
>> --- a/dir.c
>> +++ b/dir.c
>> @@ -1027,7 +1027,6 @@ static struct exclude *last_exclude_matching_from_list(const char *pathname,
>>                               exc = x;
>>                               break;
>>                       }
>> -                     continue;
>>               }
>>
>>               if (x->flags & EXC_FLAG_MUSTBEDIR) {
>> -- 8< --
>> --
>> Duy



-- 
Duy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/2] dir.c: fix bug in 'nd/exclusion-regression-fix' topic
  2016-03-17  0:09 bug: sparse config interpretation incorrectness in 2.8.0-rc2 Durham Goode
  2016-03-17  0:56 ` Duy Nguyen
  2016-03-17  7:22 ` Junio C Hamano
@ 2016-03-17 12:45 ` Nguyễn Thái Ngọc Duy
  2016-03-17 12:45   ` Nguyễn Thái Ngọc Duy
                     ` (3 more replies)
  2 siblings, 4 replies; 30+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-03-17 12:45 UTC (permalink / raw)
  To: git
  Cc: durham, mitrandir, Junio C Hamano, Nguyễn Thái Ngọc Duy

The topic in question introduces "sticky" path list for this purpose:
given a path 'abc/def', if 'abc' already matches a pattern X, it's added
to X's sticky path list. When we need to check if 'abc/def' matches
pattern X and see that 'abc' is already in the list, we conclude right
away that 'abc/def' also matches X.

The short reason (*) for sticky path list is to workaround limitations
of matching code that will return "not match" when we compare
'abc/def' and pattern X.

The bug is in this code. Not only it does "when we need to check if
'abc/def' matches...", it does an extra thing: if 'foo/bar' is _not_ in
the list, return 'not matched' by bypassing all matching code with the
"continue;" statement. It should let the remaining code decide match
status instead.

This bug affects both .gitignore and sparse checkout, but it's reported
as a sparse checkout bug, so let's examine how it happens. The
sparse-checkout pattern has two rules

    /*
    !one/hideme

and the worktree has three tracked files, one/hideme, one/showme and
two/showme. What happens is this

* check "one", it matches the first pattern -> positive -> keep
  examining.

*1* "thanks" to 'nd/exclusion-regression-fix' we detect this pair of
  patterns, so we put "one" in the sticky list of pattern "/*"

* enter "one", check "one/hideme", it matches the second pattern
  first (we search from bottom up) -> negative -> excluded

* check "one/showme", it does not match the second pattern.

*2* We then check it against the first pattern and notice the sticky list
  that includes "one", so we decide right away that "one/showme" is
  included.

* leave "one", check "two" which does not match the second pattern.

*3* then we check "two" against the first pattern and notice that this
   pattern has a non-empty sticky list, which contains "one", not "two".
   This bug kicks in and bypasses the true matching logic for pattern
   "/*". As a result, we exclude "two/showme".

One may notice that the order of these steps matter. If *3* occurs
before *1*, then the sticky list at that moment is empty and the bug
does not kick in. Sparse checkout always examines entries in
alphabetical order, so "abc/showme" would be examined before "one" and
not hit this bug!

The last remark is important when we move to .gitignore. We receive the
list of entries with readdir() and cannot control the order of
entries. Which means we can't write a test for .gitignore that will
reliably fail without this fix. Which is why this patch only adds a test
for sparse checkout, even though the same above steps happen to
.gitignore.

(*) The problem is known and will be fixed later and described in
detail then. For this commit, it's sufficient to see the following
link because the long reason is really long:

http://article.gmane.org/gmane.comp.version-control.git/288479

Reported-by: Durham Goode <durham@fb.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 dir.c                                |  1 -
 t/t1011-read-tree-sparse-checkout.sh | 20 ++++++++++++++++++++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index 69e0be6..77f38a5 100644
--- a/dir.c
+++ b/dir.c
@@ -1027,7 +1027,6 @@ static struct exclude *last_exclude_matching_from_list(const char *pathname,
 				exc = x;
 				break;
 			}
-			continue;
 		}
 
 		if (x->flags & EXC_FLAG_MUSTBEDIR) {
diff --git a/t/t1011-read-tree-sparse-checkout.sh b/t/t1011-read-tree-sparse-checkout.sh
index 0c74bee..ecc5e93 100755
--- a/t/t1011-read-tree-sparse-checkout.sh
+++ b/t/t1011-read-tree-sparse-checkout.sh
@@ -274,4 +274,24 @@ test_expect_success 'checkout with --ignore-skip-worktree-bits' '
 	git diff --exit-code HEAD
 '
 
+test_expect_success 'sparse checkout and dir.c sticky bits' '
+	git init sticky &&
+	(
+		cd sticky &&
+		mkdir one two &&
+		touch one/hideme one/showme two/showme &&
+		git add . &&
+		git commit -m initial &&
+		cat >.git/info/sparse-checkout <<-\EOF &&
+		/*
+		!one/hideme
+		EOF
+		git config core.sparsecheckout true &&
+		git checkout &&
+		test_path_is_missing one/hideme &&
+		test_path_is_file    one/showme &&
+		test_path_is_file    two/showme
+	)
+'
+
 test_done
-- 
2.8.0.rc0.210.gd302cd2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 1/2] dir.c: fix bug in 'nd/exclusion-regression-fix' topic
  2016-03-17 12:45 ` [PATCH 1/2] dir.c: fix bug in 'nd/exclusion-regression-fix' topic Nguyễn Thái Ngọc Duy
@ 2016-03-17 12:45   ` Nguyễn Thái Ngọc Duy
  2016-03-17 12:54     ` [PATCH 3/2] dir.c: fix dir re-inclusion rules with "NODIR" and "MUSTBEDIR" Nguyễn Thái Ngọc Duy
  2016-03-17 12:45   ` [PATCH 2/2] dir.c: correct "stuck" logging logic Nguyễn Thái Ngọc Duy
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-03-17 12:45 UTC (permalink / raw)
  To: git
  Cc: durham, mitrandir, Junio C Hamano, Nguyễn Thái Ngọc Duy

The topic in question introduces "sticky" path list for this purpose:
given a path 'abc/def', if 'abc' already matches a pattern X, it's added
to X's sticky path list. When we need to check if 'abc/def' matches
pattern X and see that 'abc' is already in the list, we conclude right
away that 'abc/def' also matches X.

The short reason for sticky path list is to workaround limitations of
matching code (*) that will return "not match" when we compare 'abc/def'
and pattern X.

The bug is in this code. Not only it does "when we need to check if
'abc/def' matches...", it does an extra thing: if 'foo/bar' is _not_ in
the list, return 'not matched' by bypassing all matching code with the
"continue;" statement. It should let the remaining code decide match
status instead.

This bug affects both .gitignore and sparse checkout, but it's reported
as a sparse checkout bug, so let's examine how it happens. The
sparse-checkout pattern has two rules

    /*
    !one/hideme

and the worktree has three tracked files, one/hideme, one/showme and
two/showme. What happens is this

* check "one", it matches the first pattern -> positive -> keep
  examining.

*1* "thanks" to 'nd/exclusion-regression-fix' we detect this pair of
  patterns, so we put "one" in the sticky list of pattern "/*"

* enter "one", check "one/hideme", it matches the second pattern
  first (we search from bottom up) -> negative -> excluded

* check "one/showme", it does not match the second pattern.

*2* We then check it against the first pattern and notice the sticky list
  that includes "one", so we decide right away that "one/showme" is
  included.

* leave "one", check "two" which does not match the second pattern.

*3* then we check "two" against the first pattern and notice that this
   pattern has a non-empty sticky list, which contains "one", not "two".
   This bug kicks in and bypasses the true matching logic for pattern
   "/*". As a result, we exclude "two/showme".

One may notice that the order of these steps matter. If *3* occurs
before *1*, then the sticky list at that moment is empty and the bug
does not kick in. Sparse checkout always examines entries in
alphabetical order, so "abc/showme" would be examined before "one" and
not hit this bug!

The last remark is important when we move to .gitignore. We receive the
list of entries with readdir() and cannot control the order of
entries. Which means we can't write a test for .gitignore that will
reliably fail without this fix. Which is why this patch only adds a test
for sparse checkout, even though the same above steps happen to
.gitignore.

(*) which will be fixed later and described in detail then. For this
commit, it's sufficient to see the following link because the
explanation is long:

http://article.gmane.org/gmane.comp.version-control.git/288479

Reported-by: Durham Goode <durham@fb.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 dir.c                                |  1 -
 t/t1011-read-tree-sparse-checkout.sh | 20 ++++++++++++++++++++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index 69e0be6..77f38a5 100644
--- a/dir.c
+++ b/dir.c
@@ -1027,7 +1027,6 @@ static struct exclude *last_exclude_matching_from_list(const char *pathname,
 				exc = x;
 				break;
 			}
-			continue;
 		}
 
 		if (x->flags & EXC_FLAG_MUSTBEDIR) {
diff --git a/t/t1011-read-tree-sparse-checkout.sh b/t/t1011-read-tree-sparse-checkout.sh
index 0c74bee..ecc5e93 100755
--- a/t/t1011-read-tree-sparse-checkout.sh
+++ b/t/t1011-read-tree-sparse-checkout.sh
@@ -274,4 +274,24 @@ test_expect_success 'checkout with --ignore-skip-worktree-bits' '
 	git diff --exit-code HEAD
 '
 
+test_expect_success 'sparse checkout and dir.c sticky bits' '
+	git init sticky &&
+	(
+		cd sticky &&
+		mkdir one two &&
+		touch one/hideme one/showme two/showme &&
+		git add . &&
+		git commit -m initial &&
+		cat >.git/info/sparse-checkout <<-\EOF &&
+		/*
+		!one/hideme
+		EOF
+		git config core.sparsecheckout true &&
+		git checkout &&
+		test_path_is_missing one/hideme &&
+		test_path_is_file    one/showme &&
+		test_path_is_file    two/showme
+	)
+'
+
 test_done
-- 
2.8.0.rc0.210.gd302cd2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/2] dir.c: correct "stuck" logging logic
  2016-03-17 12:45 ` [PATCH 1/2] dir.c: fix bug in 'nd/exclusion-regression-fix' topic Nguyễn Thái Ngọc Duy
  2016-03-17 12:45   ` Nguyễn Thái Ngọc Duy
@ 2016-03-17 12:45   ` Nguyễn Thái Ngọc Duy
  2016-03-17 12:45   ` Nguyễn Thái Ngọc Duy
  2016-03-18 17:38   ` [PATCH 1/2] dir.c: fix bug in 'nd/exclusion-regression-fix' topic Junio C Hamano
  3 siblings, 0 replies; 30+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-03-17 12:45 UTC (permalink / raw)
  To: git
  Cc: durham, mitrandir, Junio C Hamano, Nguyễn Thái Ngọc Duy

Before the last patch, the loop in last_exclude_matching_from_list()
looks like this (greatly simplified of course)

   exc = NULL;
   for (...) {
      if (sticky_paths.nr) {
         if (matched) {
            exc = non-NULL;
            break;
         }
         continue;
      }
      ...
   }

With this loop, if sticky_paths.nr is non-zero and exc is not NULL, we
know we have found a sticky path and can log " (stuck)".

With the last patch, the "continue;" line is removed and that won't work
anymore. So explicitly keep track of when to print " (stuck)".

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 dir.c                                | 4 +++-
 t/t1011-read-tree-sparse-checkout.sh | 8 ++++++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/dir.c b/dir.c
index 77f38a5..2028094 100644
--- a/dir.c
+++ b/dir.c
@@ -1005,6 +1005,7 @@ static struct exclude *last_exclude_matching_from_list(const char *pathname,
 {
 	struct exclude *exc = NULL; /* undecided */
 	int i, maybe_descend = 0;
+	const char *stuck = "";
 
 	if (!el->nr)
 		return NULL;	/* undefined */
@@ -1024,6 +1025,7 @@ static struct exclude *last_exclude_matching_from_list(const char *pathname,
 			if (*dtype == DT_UNKNOWN)
 				*dtype = get_dtype(NULL, pathname, pathlen);
 			if (match_sticky(x, pathname, pathlen, *dtype)) {
+				stuck = " (stuck)";
 				exc = x;
 				break;
 			}
@@ -1093,7 +1095,7 @@ static struct exclude *last_exclude_matching_from_list(const char *pathname,
 	trace_printf_key(&trace_exclude, "exclude: %.*s vs %s at line %d => %s%s\n",
 			 pathlen, pathname, exc->pattern, exc->srcpos,
 			 exc->flags & EXC_FLAG_NEGATIVE ? "no" : "yes",
-			 exc->sticky_paths.nr ? " (stuck)" : "");
+			 stuck);
 	return exc;
 }
 
diff --git a/t/t1011-read-tree-sparse-checkout.sh b/t/t1011-read-tree-sparse-checkout.sh
index ecc5e93..f0b856f 100755
--- a/t/t1011-read-tree-sparse-checkout.sh
+++ b/t/t1011-read-tree-sparse-checkout.sh
@@ -287,10 +287,14 @@ test_expect_success 'sparse checkout and dir.c sticky bits' '
 		!one/hideme
 		EOF
 		git config core.sparsecheckout true &&
-		git checkout &&
+		GIT_TRACE_EXCLUDE=2 git checkout 2>&1 | grep stuck >stuck-list &&
 		test_path_is_missing one/hideme &&
 		test_path_is_file    one/showme &&
-		test_path_is_file    two/showme
+		test_path_is_file    two/showme &&
+		cat >expected <<-\EOF &&
+		exclude: one/showme vs /* at line 1 => yes (stuck)
+		EOF
+		test_cmp expected stuck-list
 	)
 '
 
-- 
2.8.0.rc0.210.gd302cd2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/2] dir.c: correct "stuck" logging logic
  2016-03-17 12:45 ` [PATCH 1/2] dir.c: fix bug in 'nd/exclusion-regression-fix' topic Nguyễn Thái Ngọc Duy
  2016-03-17 12:45   ` Nguyễn Thái Ngọc Duy
  2016-03-17 12:45   ` [PATCH 2/2] dir.c: correct "stuck" logging logic Nguyễn Thái Ngọc Duy
@ 2016-03-17 12:45   ` Nguyễn Thái Ngọc Duy
  2016-03-18 17:38   ` [PATCH 1/2] dir.c: fix bug in 'nd/exclusion-regression-fix' topic Junio C Hamano
  3 siblings, 0 replies; 30+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-03-17 12:45 UTC (permalink / raw)
  To: git
  Cc: durham, mitrandir, Junio C Hamano, Nguyễn Thái Ngọc Duy

Before the last patch, the loop in last_exclude_matching_from_list()
looks like this (greatly simplified of course)

   exc = NULL;
   for (...) {
      if (sticky_paths.nr) {
         if (matched) {
            exc = something;
            break;
         }
         continue;
      }
      ...
   }

With this loop, if sticky_paths.nr is non-zero and exc is not NULL, we
know we have found a sticky path and can log " (stuck)".

With the last patch, the "continue;" line is removed and that won't work
anymore. So explicitly keep track of when to print " (stuck)".

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 dir.c                                | 4 +++-
 t/t1011-read-tree-sparse-checkout.sh | 8 ++++++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/dir.c b/dir.c
index 77f38a5..2028094 100644
--- a/dir.c
+++ b/dir.c
@@ -1005,6 +1005,7 @@ static struct exclude *last_exclude_matching_from_list(const char *pathname,
 {
 	struct exclude *exc = NULL; /* undecided */
 	int i, maybe_descend = 0;
+	const char *stuck = "";
 
 	if (!el->nr)
 		return NULL;	/* undefined */
@@ -1024,6 +1025,7 @@ static struct exclude *last_exclude_matching_from_list(const char *pathname,
 			if (*dtype == DT_UNKNOWN)
 				*dtype = get_dtype(NULL, pathname, pathlen);
 			if (match_sticky(x, pathname, pathlen, *dtype)) {
+				stuck = " (stuck)";
 				exc = x;
 				break;
 			}
@@ -1093,7 +1095,7 @@ static struct exclude *last_exclude_matching_from_list(const char *pathname,
 	trace_printf_key(&trace_exclude, "exclude: %.*s vs %s at line %d => %s%s\n",
 			 pathlen, pathname, exc->pattern, exc->srcpos,
 			 exc->flags & EXC_FLAG_NEGATIVE ? "no" : "yes",
-			 exc->sticky_paths.nr ? " (stuck)" : "");
+			 stuck);
 	return exc;
 }
 
diff --git a/t/t1011-read-tree-sparse-checkout.sh b/t/t1011-read-tree-sparse-checkout.sh
index ecc5e93..f0b856f 100755
--- a/t/t1011-read-tree-sparse-checkout.sh
+++ b/t/t1011-read-tree-sparse-checkout.sh
@@ -287,10 +287,14 @@ test_expect_success 'sparse checkout and dir.c sticky bits' '
 		!one/hideme
 		EOF
 		git config core.sparsecheckout true &&
-		git checkout &&
+		GIT_TRACE_EXCLUDE=2 git checkout 2>&1 | grep stuck >stuck-list &&
 		test_path_is_missing one/hideme &&
 		test_path_is_file    one/showme &&
-		test_path_is_file    two/showme
+		test_path_is_file    two/showme &&
+		cat >expected <<-\EOF &&
+		exclude: one/showme vs /* at line 1 => yes (stuck)
+		EOF
+		test_cmp expected stuck-list
 	)
 '
 
-- 
2.8.0.rc0.210.gd302cd2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/2] dir.c: fix dir re-inclusion rules with "NODIR" and "MUSTBEDIR"
  2016-03-17 12:45   ` Nguyễn Thái Ngọc Duy
@ 2016-03-17 12:54     ` Nguyễn Thái Ngọc Duy
  2016-03-17 23:49       ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-03-17 12:54 UTC (permalink / raw)
  To: git
  Cc: durham, mitrandir, Junio C Hamano, Nguyễn Thái Ngọc Duy

For NODIR case, the patterns look like this

    *          # exclude dir, dir/file1 and dir/file2..
    !dir       # ..except that dir and everything inside is re-included..
    dir/file2  # ..except (again!) that dir/file2 is excluded
               # ..which means dir/file1 stays included

When we stay at topdir and test "dir", everything is hunky dory, current
code returns "re-include dir" as expected. When we stay in "dir" and
examine "dir/file1", however, match_basename() checks if the base name
component, which is "file1", matches "dir" from the second rule.

This is wrong. We contradict ourselves because earlier we decided to
re-include dir/file1 (from the second rule) when we were at toplevel.
Because the second rule is ignored, we hit the first one and return
"excluded" even though "dir/file1" should be re-included.

In the MUSTBEDIR case, the patterns look like this

    *          # exclude dir, dir/file1 and dir/file2..
    !dir/      #  ..except that dir and everything inside is re-included..
    dir/file2  # ..except (again!) that dir/file2 is excluded
               # ..which means dir/file1 stays included

Again, we're ok at the toplevel, then we enter "dir" and test
"dir/file1". The MUSTBEDIR code tests if the _full_ path (ie. "dir/file1")
is a directory. We want it to test the "dir" part from "dir/file1"
instead.

In both cases, the second decision on "dir/file1" is wrong and
contradicts with the first one on "dir". This is a perfect use case for
"sticky path list" added earlier to solve a different (but quite
similar) problem. So, when we have the right decision the first time, we
mark the path sticky to override the coming wrong decision.

The reason to do this, instead of actually fixing the code to make the
right second decision in the first place, is because it's soooooo
complicated. There are many combinations to take care of, many
optimizations involved to keep the cost on normal/common case (and
what's being described here is NOT normal) down to minimum. On top of
that, exclude code is already complicated as it is. It's best to not
turn the code upside down. Not until this approach is proved
insufficient.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 This is NOT for 2.8.0! Posted here to give you some context on the
 problem mentioned in the first patch.

 If you actually read the link in 1/2, you'll notice this patch is
 completely different. "soooooo complicated" is not an exaggeration. I
 found some problem with that old patch, which ended up with a
 combination explosion of cases I would have to cover separately,
 essentially the same thing I encountered in my first try before that
 patch. I finally admitted I could not fit all that in my brain
 anymore.

 dir.c                              |  5 ++++
 t/t3007-ls-files-other-negative.sh | 51 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/dir.c b/dir.c
index 2028094..a704e8a 100644
--- a/dir.c
+++ b/dir.c
@@ -1090,6 +1090,11 @@ static struct exclude *last_exclude_matching_from_list(const char *pathname,
 					 x->pattern, x->srcpos);
 			return NULL;
 		}
+	} else if (exc->flags & EXC_FLAG_NEGATIVE) {
+		if (*dtype == DT_UNKNOWN)
+			*dtype = get_dtype(NULL, pathname, pathlen);
+		if (*dtype == DT_DIR)
+			add_sticky(exc, pathname, pathlen);
 	}
 
 	trace_printf_key(&trace_exclude, "exclude: %.*s vs %s at line %d => %s%s\n",
diff --git a/t/t3007-ls-files-other-negative.sh b/t/t3007-ls-files-other-negative.sh
index 0797b86..c8f39dd 100755
--- a/t/t3007-ls-files-other-negative.sh
+++ b/t/t3007-ls-files-other-negative.sh
@@ -150,4 +150,55 @@ test_expect_success 'match, literal pathname, nested negatives' '
 	test_cmp tmp/expected tmp/actual
 '
 
+test_expect_success 're-include case, NODIR' '
+	git init re-include-nodir &&
+	(
+		cd re-include-nodir &&
+		mkdir dir &&
+		touch dir/file1 dir/file2 &&
+		cat >.gitignore <<-\EOF &&
+		*
+		!dir
+		dir/file2
+		EOF
+		git ls-files -o --exclude-standard >../actual &&
+		echo dir/file1 >../expected &&
+		test_cmp ../expected ../actual
+	)
+'
+
+test_expect_success 're-include case, MUSTBEDIR with NODIR' '
+	git init re-include-mustbedir &&
+	(
+		cd re-include-mustbedir &&
+		mkdir dir &&
+		touch dir/file1 dir/file2 &&
+		cat >.gitignore <<-\EOF &&
+		*
+		!dir/
+		dir/file2
+		EOF
+		git ls-files -o --exclude-standard >../actual &&
+		echo dir/file1 >../expected &&
+		test_cmp ../expected ../actual
+	)
+'
+
+test_expect_success 're-include case, MUSTBEDIR without NODIR' '
+	git init re-include-pathname &&
+	(
+		cd re-include-pathname &&
+		mkdir -p dir1/dir2 &&
+		touch dir1/dir2/file1 dir1/dir2/file2 &&
+		cat >.gitignore <<-\EOF &&
+		*
+		!dir1/dir2/
+		dir1/dir2/file2
+		EOF
+		git ls-files -o --exclude-standard >../actual &&
+		echo dir1/dir2/file1 >../expected &&
+		test_cmp ../expected ../actual
+	)
+'
+
 test_done
-- 
2.8.0.rc0.210.gd302cd2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: bug: sparse config interpretation incorrectness in 2.8.0-rc2
  2016-03-17 10:17     ` Duy Nguyen
@ 2016-03-17 13:04       ` Johannes Schindelin
  2016-03-17 13:20         ` Duy Nguyen
  0 siblings, 1 reply; 30+ messages in thread
From: Johannes Schindelin @ 2016-03-17 13:04 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Junio C Hamano, Durham Goode, Git Mailing List, Mateusz Kwapich

Hi Duy,

On Thu, 17 Mar 2016, Duy Nguyen wrote:

> Good news for you is there's "sparse checkout v2" in the work, that
> would not rely on exclude engine and should be both faster and more
> elegant. That should reduce "sparse checkout v1" usage to really small
> cases.

I dabbled myself with speeding up the entire exclude engine (essentially,
I use a hash map of the non-wildcard prefixes to the corresponding line
number). So I am interested to see what your approach is. Could you point
me to it (I did not see any obvious branch in your GitHub space)?

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: bug: sparse config interpretation incorrectness in 2.8.0-rc2
  2016-03-17 13:04       ` Johannes Schindelin
@ 2016-03-17 13:20         ` Duy Nguyen
  2016-03-17 13:46           ` Johannes Schindelin
  0 siblings, 1 reply; 30+ messages in thread
From: Duy Nguyen @ 2016-03-17 13:20 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Durham Goode, Git Mailing List, Mateusz Kwapich

On Thu, Mar 17, 2016 at 8:04 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi Duy,
>
> On Thu, 17 Mar 2016, Duy Nguyen wrote:
>
>> Good news for you is there's "sparse checkout v2" in the work, that
>> would not rely on exclude engine and should be both faster and more
>> elegant. That should reduce "sparse checkout v1" usage to really small
>> cases.
>
> I dabbled myself with speeding up the entire exclude engine (essentially,
> I use a hash map of the non-wildcard prefixes to the corresponding line
> number). So I am interested to see what your approach is. Could you point
> me to it (I did not see any obvious branch in your GitHub space)?

First of all it's not about speeding up exclude engine. I think I did
that (sort of) already with untracked cache (have you tried it on
Windows?)

It's Junio's idea actually, what I'm implementing. The index contains
a list of _files_ yes?
What if the index contains also _directories_? Of course we can't
"refresh" or check if the whole directory is "dirty" like a file, so
those dirs will stay registered in the index, but be entirely missing
from worktree. The effect is the same as sparse checkout, except that
we keep all "hidden" files of sparse checkout in the index. In fact I
plan to reuse CE_SKIP_WORKTREE (from sparse checkout) to mark those
dirs.

The advantage is obvious: the index can be much smaller, which leads
to faster update and access. Of course now you can only hide either
whole directory, or hide nothing. It's less flexible than sparse
checkout "v1", but you can use "v1" on top of "v2" for that. Changing
checkout regions involves unfolding or folding certain directories,
sparse patterns are not used.

If you are still curious, you can check out my 'narrow-checkout'
branch on github. The basic is working. Folding/unfolding is not. Be
warned that it's even dirtier than your rebase-helper branch (and will
not be released any time soon either, lots of other stuff to be
finished first).
-- 
Duy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: bug: sparse config interpretation incorrectness in 2.8.0-rc2
  2016-03-17 13:20         ` Duy Nguyen
@ 2016-03-17 13:46           ` Johannes Schindelin
  2016-03-17 14:00             ` Duy Nguyen
  0 siblings, 1 reply; 30+ messages in thread
From: Johannes Schindelin @ 2016-03-17 13:46 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Junio C Hamano, Durham Goode, Git Mailing List, Mateusz Kwapich

Hi Duy,

On Thu, 17 Mar 2016, Duy Nguyen wrote:

> On Thu, Mar 17, 2016 at 8:04 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > On Thu, 17 Mar 2016, Duy Nguyen wrote:
> >
> >> Good news for you is there's "sparse checkout v2" in the work, that
> >> would not rely on exclude engine and should be both faster and more
> >> elegant. That should reduce "sparse checkout v1" usage to really
> >> small cases.
> >
> > I dabbled myself with speeding up the entire exclude engine
> > (essentially, I use a hash map of the non-wildcard prefixes to the
> > corresponding line number). So I am interested to see what your
> > approach is. Could you point me to it (I did not see any obvious
> > branch in your GitHub space)?
> 
> First of all it's not about speeding up exclude engine. I think I did
> that (sort of) already with untracked cache (have you tried it on
> Windows?)

Yes, I tested the untracked cache and enabled it in a couple of my
repositories. It works great. But it has little to do with the exclude
engine ;-)

> It's Junio's idea actually, what I'm implementing. The index contains
> a list of _files_ yes?
> What if the index contains also _directories_? Of course we can't
> "refresh" or check if the whole directory is "dirty" like a file, so
> those dirs will stay registered in the index, but be entirely missing
> from worktree. The effect is the same as sparse checkout, except that
> we keep all "hidden" files of sparse checkout in the index. In fact I
> plan to reuse CE_SKIP_WORKTREE (from sparse checkout) to mark those
> dirs.

Oh, I see.

Unfortunately, this does not help me at all. In the use case I am trying
to get to work fast, we have tons and tons of directories and need *one*
file in pretty much *all* of those directories, and exclude most of the
other files.

To make matters even worse, the list of excluded (or included) files is
constantly changing.

> The advantage is obvious: the index can be much smaller, which leads
> to faster update and access. Of course now you can only hide either
> whole directory, or hide nothing. It's less flexible than sparse
> checkout "v1", but you can use "v1" on top of "v2" for that. Changing
> checkout regions involves unfolding or folding certain directories,
> sparse patterns are not used.
> 
> If you are still curious, you can check out my 'narrow-checkout'
> branch on github. The basic is working. Folding/unfolding is not. Be
> warned that it's even dirtier than your rebase-helper branch (and will
> not be released any time soon either, lots of other stuff to be
> finished first).

Thanks!
Dscho

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: bug: sparse config interpretation incorrectness in 2.8.0-rc2
  2016-03-17 13:46           ` Johannes Schindelin
@ 2016-03-17 14:00             ` Duy Nguyen
  2016-03-18 15:49               ` Johannes Schindelin
  0 siblings, 1 reply; 30+ messages in thread
From: Duy Nguyen @ 2016-03-17 14:00 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Durham Goode, Git Mailing List, Mateusz Kwapich

On Thu, Mar 17, 2016 at 8:46 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Unfortunately, this does not help me at all. In the use case I am trying
> to get to work fast, we have tons and tons of directories and need *one*
> file in pretty much *all* of those directories, and exclude most of the
> other files.
>
> To make matters even worse, the list of excluded (or included) files is
> constantly changing.

OK very weird use case to me :) There's something you might try. [1]
can help trimming unrelated patterns, fewer patterns to match for a
dir, faster matching.

I knew it might help speed up sparse checkout (because we can't spread
sparse patterns over many .gitignore files,  we only have one place
for all patterns). I did try at some point. I just don't remember if I
gave up because I didn't think it's worth the effort or I found out
something wrong with it for sparse checkout case. It may or may help
your case, depending on the patterns, of course.

[1] http://article.gmane.org/gmane.comp.version-control.git/217958
-- 
Duy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: bug: sparse config interpretation incorrectness in 2.8.0-rc2
  2016-03-17  7:22 ` Junio C Hamano
@ 2016-03-17 17:51   ` Durham Goode
  0 siblings, 0 replies; 30+ messages in thread
From: Durham Goode @ 2016-03-17 17:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, pclouds, Mateusz Kwapich

On 3/17/16 12:22 AM, Junio C Hamano wrote:
> Durham Goode <durham@fb.com> writes:
>
>> Using git 2.8.0-rc2, given a repo with the following files:
>>
>> - one/hideme
>> - one/donthide
>> - two/foo
>>
>> A sparse config of:
>>
>> cat > .git/info/sparse-checkout <<EOF
>> /*
>> !one/hideme
>> EOF
>>
>> Results in a repository that only has `one/donthide` in it.  I would
>> expect `two/foo`to be present as well.  This worked in 2.6,...
> 2.6 is a tad too old as a reference, as the "!reinclusion" has been
> in flux in recent releases.  Can you test these?
>
>   - 2.7.0
>   - 2.7.1
>   - e79112d
>
> I suspect that v2.7.0 would be broken, v2.7.1 is OK and e79112d
> would also be OK (I am guessing this from [1]).  e79112d is the last
> version on the 'master' branch without the topic that contains the
> commit you bisected down to, but between 2.7.0 and 2.7.1 there was a
> reversion of a commit that introduced the original issue.
>
> The commit you bisected down to that is on 'master', IIUC, was a
> (faulty) attempt to fix the breakage in a different way.
Since Duy has already sent a fix, I assume it's a real bug and he 
understands what's going on.  If you still want me to test those hashes 
let me know.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/2] dir.c: fix dir re-inclusion rules with "NODIR" and "MUSTBEDIR"
  2016-03-17 12:54     ` [PATCH 3/2] dir.c: fix dir re-inclusion rules with "NODIR" and "MUSTBEDIR" Nguyễn Thái Ngọc Duy
@ 2016-03-17 23:49       ` Junio C Hamano
  2016-03-18  0:15         ` Duy Nguyen
  2016-03-18  4:51         ` Durham Goode
  0 siblings, 2 replies; 30+ messages in thread
From: Junio C Hamano @ 2016-03-17 23:49 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, durham, mitrandir

Thanks for these 5 patches, two of which need to be discarded ;-).
I think you can pick either one of 1/2, pick the one that says
"non-NULL" (as opposed to "something") in the log message for 2/2.

Durham, does it fix your issues if you apply the 1/2 and 2/2 (but
not 3/2) on top of 2.8-rc?

Duy, how comfortable are you with the idea of including this two in
2.8 final?  We have long passed the final -rc, and while it is
probably OK to prolong the cycle and do another -rc, we cannot keep
going like "oops, there is another thing discovered by somebod new"
forever.

Thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/2] dir.c: fix dir re-inclusion rules with "NODIR" and "MUSTBEDIR"
  2016-03-17 23:49       ` Junio C Hamano
@ 2016-03-18  0:15         ` Duy Nguyen
  2016-03-18  5:40           ` Junio C Hamano
  2016-03-18  4:51         ` Durham Goode
  1 sibling, 1 reply; 30+ messages in thread
From: Duy Nguyen @ 2016-03-18  0:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Durham Goode, Mateusz Kwapich

On Fri, Mar 18, 2016 at 6:49 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Thanks for these 5 patches, two of which need to be discarded ;-).
> I think you can pick either one of 1/2, pick the one that says
> "non-NULL" (as opposed to "something") in the log message for 2/2.

Sorry, I did "git send-email ... 00*" and it picked up *.patch~ as
well. non-NULL is the non-backup version.

> Durham, does it fix your issues if you apply the 1/2 and 2/2 (but
> not 3/2) on top of 2.8-rc?
>
> Duy, how comfortable are you with the idea of including this two in
> 2.8 final?  We have long passed the final -rc, and while it is
> probably OK to prolong the cycle and do another -rc, we cannot keep
> going like "oops, there is another thing discovered by somebody new"
> forever.

The fix is well understood. I did feel unsure about that "continue;"
when I first wrote it and should have given it more thought. No other
"unsure" feelings, so we're probably good from now on.
-- 
Duy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/2] dir.c: fix dir re-inclusion rules with "NODIR" and "MUSTBEDIR"
  2016-03-17 23:49       ` Junio C Hamano
  2016-03-18  0:15         ` Duy Nguyen
@ 2016-03-18  4:51         ` Durham Goode
  2016-03-18  5:40           ` Duy Nguyen
  1 sibling, 1 reply; 30+ messages in thread
From: Durham Goode @ 2016-03-18  4:51 UTC (permalink / raw)
  To: Junio C Hamano, Nguyễn Thái Ngọc Duy; +Cc: git, mitrandir

On 3/17/16 4:49 PM, Junio C Hamano wrote:
> Thanks for these 5 patches, two of which need to be discarded ;-).
> I think you can pick either one of 1/2, pick the one that says
> "non-NULL" (as opposed to "something") in the log message for 2/2.
>
> Durham, does it fix your issues if you apply the 1/2 and 2/2 (but
> not 3/2) on top of 2.8-rc?
>
> Duy, how comfortable are you with the idea of including this two in
> 2.8 final?  We have long passed the final -rc, and while it is
> probably OK to prolong the cycle and do another -rc, we cannot keep
> going like "oops, there is another thing discovered by somebod new"
> forever.
>
> Thanks.
Patches 1+2 fix the repro steps in the report, yes.  But I've found 
another case that produces different results in 2.8 than in 2.7:

Given a repo with files:

dir1/dir2/show/file
dir1/dir2/hide/file

and a sparse-checkout of

/*
/dir1/dir2/show
!/dir1/dir2/

the working copy still contains dir1/dir2/hide/file when run from 
2.8.0-rc2. In git 2.6 and 2.7.3 it does not show up (which is the 
expected behavior, from what I understand of the docs).  Repro script is 
below.  Notice, the 'dir2/' part of the paths is important.  If I drop 
that directory, the issue doesn't repro.


#!/bin/bash

set -x
rm -rf sparse-test
GIT=git
$GIT init sparse-test
cd sparse-test
$GIT config --add core.sparsecheckout true

mkdir -p dir1/dir2/show dir1/dir2/hide
touch dir1/dir2/show/file1
touch dir1/dir2/hide/file2

$GIT add .
$GIT commit -m "initial commit"
$GIT read-tree --reset -u HEAD

mkdir .git/info
cat > .git/info/sparse-checkout <<EOF
/*
/dir1/dir2/show
!/dir1/dir2/
EOF
$GIT read-tree --reset -u HEAD

ls -R dir1/dir2
set +x
echo
echo expected: see only "dir1/dir2/show/file"
echo actual: see "dir1/dir2/hide/file" as well

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/2] dir.c: fix dir re-inclusion rules with "NODIR" and "MUSTBEDIR"
  2016-03-18  0:15         ` Duy Nguyen
@ 2016-03-18  5:40           ` Junio C Hamano
  2016-03-18  5:51             ` Duy Nguyen
  2016-03-18  5:58             ` Eric Sunshine
  0 siblings, 2 replies; 30+ messages in thread
From: Junio C Hamano @ 2016-03-18  5:40 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Durham Goode, Mateusz Kwapich

Duy Nguyen <pclouds@gmail.com> writes:

> On Fri, Mar 18, 2016 at 6:49 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> Thanks for these 5 patches, two of which need to be discarded ;-).
>> I think you can pick either one of 1/2, pick the one that says
>> "non-NULL" (as opposed to "something") in the log message for 2/2.
>
> Sorry, I did "git send-email ... 00*" and it picked up *.patch~ as
> well. non-NULL is the non-backup version.

Off-topic.  I wonder what people think about doing something like
this patch.

-- >8 --
send-email: detect and offer to skip backup files

Diligent people save output from format-patch to files, proofread
and edit them and then finally send the result out.  If the
resulting files are sent out with "git send-email 0*", this ends up
sending backup files (e.g. 0001-X.patch.backup or 0001-X.patch~)
left by their editors next to the final version.  Sending them with
"git send-email 0*.patch" (if format-patch was run with the standard
suffix) would avoid such an embarrassment, but not everybody is
careful.

After collecting files to be sent (and sorting them if read from a
directory), notice when the file being sent out has the same name as
the previous file, plus some suffix (e.g. 0001-X.patch was sent, and
we are looking at 0001-X.patch.backup or 0001-X.patch~), and the
suffix begins with a non-alnum (e.g. ".backup" or "~") and ask if
the user really wants to send it out.  Once the user skips sending
such a "backup" file, remember the suffix and stop asking the same
question (e.g. after skipping 0001-X.patch~, skip 0002-Y.patch~
without asking).


 git-send-email.perl | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/git-send-email.perl b/git-send-email.perl
index d356901..74ed01a 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -621,6 +621,8 @@ sub is_format_patch_arg {
 	push @files, $repo->command('format-patch', '-o', tempdir(CLEANUP => 1), @rev_list_opts);
 }
 
+@files = handle_backup_files(@files);
+
 if ($validate) {
 	foreach my $f (@files) {
 		unless (-p $f) {
@@ -1726,6 +1728,44 @@ sub validate_patch {
 	return;
 }
 
+sub handle_backup {
+	my ($last, $lastlen, $file, $known_suffix) = @_;
+	my ($suffix, $skip);
+
+	$skip = 0;
+	if (defined $last &&
+	    ($lastlen < length($file)) &&
+	    (substr($file, 0, $lastlen) eq $last) &&
+	    ($suffix = substr($file, $lastlen)) !~ /^[a-z0-9]/i) {
+		if (defined $known_suffix && $suffix eq $known_suffix) {
+			print "Skipping $file with backup suffix '$known_suffix'.\n";
+			$skip = 1;
+		} else {
+			my $answer = ask("Do you really want to send $file? (y|N): ",
+					 valid_re => qr/^(?:y|n)/i,
+					 default => 'y');
+			$skip = ($answer ne 'y');
+			if ($skip) {
+				$known_suffix = $suffix;
+			}
+		}
+	}
+	return ($skip, $known_suffix);
+}
+
+sub handle_backup_files {
+	my @file = @_;
+	my ($last, $lastlen, $known_suffix, $skip, @result);
+	for my $file (@file) {
+		($skip, $known_suffix) = handle_backup($last, $lastlen,
+						       $file, $known_suffix);
+		push @result, $file unless $skip;
+		$last = $file;
+		$lastlen = length($file);
+	}
+	return @result;
+}
+
 sub file_has_nonascii {
 	my $fn = shift;
 	open(my $fh, '<', $fn)

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/2] dir.c: fix dir re-inclusion rules with "NODIR" and "MUSTBEDIR"
  2016-03-18  4:51         ` Durham Goode
@ 2016-03-18  5:40           ` Duy Nguyen
  2016-03-18  6:21             ` Durham Goode
  2016-03-18 18:00             ` Junio C Hamano
  0 siblings, 2 replies; 30+ messages in thread
From: Duy Nguyen @ 2016-03-18  5:40 UTC (permalink / raw)
  To: Durham Goode; +Cc: Junio C Hamano, Git Mailing List, Mateusz Jakub Kwapich

On Fri, Mar 18, 2016 at 11:51 AM, Durham Goode <durham@fb.com> wrote:
> On 3/17/16 4:49 PM, Junio C Hamano wrote:
>>
>> Thanks for these 5 patches, two of which need to be discarded ;-).
>> I think you can pick either one of 1/2, pick the one that says
>> "non-NULL" (as opposed to "something") in the log message for 2/2.
>>
>> Durham, does it fix your issues if you apply the 1/2 and 2/2 (but
>> not 3/2) on top of 2.8-rc?
>>
>> Duy, how comfortable are you with the idea of including this two in
>> 2.8 final?  We have long passed the final -rc, and while it is
>> probably OK to prolong the cycle and do another -rc, we cannot keep
>> going like "oops, there is another thing discovered by somebod new"
>> forever.
>>
>> Thanks.
>
> Patches 1+2 fix the repro steps in the report, yes.  But I've found another
> case that produces different results in 2.8 than in 2.7:
>
> Given a repo with files:
>
> dir1/dir2/show/file
> dir1/dir2/hide/file
>
> and a sparse-checkout of
>
> /*
> /dir1/dir2/show
> !/dir1/dir2/

I would say this is "undefined behavior" patterns. The intention of
"!" pattern is to revert a subset of a matched pattern, e.g.
!/dir1/dir2/show/something. Combining lines 2 and 3 together,
"!dir1/dir2/" is not only supposed to revert dir1/dir2/show entirely,
but extend the reversion outside of it. At least to me that's not
intended.

Skipping that tricky pair, the pairs "/*" and "!/dir1/dir2/" means
"exclude everything except dir1/dir2" (in .gitignore sense) or
"include everything except dir1/dir2" in sparse checkout sense. Which
results in empty worktree. 1+2 trips when the trailing slash in the
last rule exists and includes both files in show/hide. Patch 3/2 fixes
the tripping and exclude both. If the last rule is "!/dir1/dir2" then
1+2 results in empty worktree as well.

If I swap the last two rules, then it behaves the way we expect,
dir1/dir2/show stays, dir1/dir2/hide is gone (again, the trailing "/"
in !dir1/dir2/ requires patch 3/2).

v2.7.3 differs in the way "!" is handled. It does extend reversion
outside dir1/dir2/show, back to dir1/dir2. While 2.8+1+2 recognizes
and follows the "/*" and "!dir1/dir2/" pair.

The way I interpreted the rules above, though, may be because I'm just
trying to defend the current code. Junio, your call on whether to
revert this whole patch series.

> the working copy still contains dir1/dir2/hide/file when run from 2.8.0-rc2.
> In git 2.6 and 2.7.3 it does not show up (which is the expected behavior,
> from what I understand of the docs).  Repro script is below.  Notice, the
> 'dir2/' part of the paths is important.  If I drop that directory, the issue
> doesn't repro.
-- 
Duy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/2] dir.c: fix dir re-inclusion rules with "NODIR" and "MUSTBEDIR"
  2016-03-18  5:40           ` Junio C Hamano
@ 2016-03-18  5:51             ` Duy Nguyen
  2016-03-18  5:58             ` Eric Sunshine
  1 sibling, 0 replies; 30+ messages in thread
From: Duy Nguyen @ 2016-03-18  5:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Durham Goode, Mateusz Kwapich

On Fri, Mar 18, 2016 at 12:40 PM, Junio C Hamano <gitster@pobox.com> wrote:
> send-email: detect and offer to skip backup files
>
> Diligent people save output from format-patch to files, proofread
> and edit them and then finally send the result out.  If the
> resulting files are sent out with "git send-email 0*", this ends up
> sending backup files (e.g. 0001-X.patch.backup or 0001-X.patch~)
> left by their editors next to the final version.  Sending them with
> "git send-email 0*.patch" (if format-patch was run with the standard
> suffix) would avoid such an embarrassment, but not everybody is
> careful.
>
> After collecting files to be sent (and sorting them if read from a
> directory), notice when the file being sent out has the same name as
> the previous file, plus some suffix (e.g. 0001-X.patch was sent, and
> we are looking at 0001-X.patch.backup or 0001-X.patch~), and the
> suffix begins with a non-alnum (e.g. ".backup" or "~") and ask if
> the user really wants to send it out.  Once the user skips sending
> such a "backup" file, remember the suffix and stop asking the same
> question (e.g. after skipping 0001-X.patch~, skip 0002-Y.patch~
> without asking).

The problem I see is it's hard to review the to-send list.
git-send-email does list it, but the the dashes in file names make
them really hard to split out words. Maybe highlighting can help, or
maybe we can show subject lines instead of file names.

Back to the patch, another case you can catch is when people have some
leftover patches in current dir, then 0*.patch may include unrelated
patches too. A check on patch numbering (contiguous and no duplicates)
might help.

>  git-send-email.perl | 40 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
>
> diff --git a/git-send-email.perl b/git-send-email.perl
> index d356901..74ed01a 100755
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -621,6 +621,8 @@ sub is_format_patch_arg {
>         push @files, $repo->command('format-patch', '-o', tempdir(CLEANUP => 1), @rev_list_opts);
>  }
>
> +@files = handle_backup_files(@files);
> +
>  if ($validate) {
>         foreach my $f (@files) {
>                 unless (-p $f) {
> @@ -1726,6 +1728,44 @@ sub validate_patch {
>         return;
>  }
>
> +sub handle_backup {
> +       my ($last, $lastlen, $file, $known_suffix) = @_;
> +       my ($suffix, $skip);
> +
> +       $skip = 0;
> +       if (defined $last &&
> +           ($lastlen < length($file)) &&
> +           (substr($file, 0, $lastlen) eq $last) &&
> +           ($suffix = substr($file, $lastlen)) !~ /^[a-z0-9]/i) {
> +               if (defined $known_suffix && $suffix eq $known_suffix) {
> +                       print "Skipping $file with backup suffix '$known_suffix'.\n";
> +                       $skip = 1;
> +               } else {
> +                       my $answer = ask("Do you really want to send $file? (y|N): ",
> +                                        valid_re => qr/^(?:y|n)/i,
> +                                        default => 'y');
> +                       $skip = ($answer ne 'y');
> +                       if ($skip) {
> +                               $known_suffix = $suffix;
> +                       }
> +               }
> +       }
> +       return ($skip, $known_suffix);
> +}
> +
> +sub handle_backup_files {
> +       my @file = @_;
> +       my ($last, $lastlen, $known_suffix, $skip, @result);
> +       for my $file (@file) {
> +               ($skip, $known_suffix) = handle_backup($last, $lastlen,
> +                                                      $file, $known_suffix);
> +               push @result, $file unless $skip;
> +               $last = $file;
> +               $lastlen = length($file);
> +       }
> +       return @result;
> +}
> +
>  sub file_has_nonascii {
>         my $fn = shift;
>         open(my $fh, '<', $fn)
-- 
Duy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/2] dir.c: fix dir re-inclusion rules with "NODIR" and "MUSTBEDIR"
  2016-03-18  5:40           ` Junio C Hamano
  2016-03-18  5:51             ` Duy Nguyen
@ 2016-03-18  5:58             ` Eric Sunshine
  1 sibling, 0 replies; 30+ messages in thread
From: Eric Sunshine @ 2016-03-18  5:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Duy Nguyen, Git Mailing List, Durham Goode, Mateusz Kwapich

On Fri, Mar 18, 2016 at 1:40 AM, Junio C Hamano <gitster@pobox.com> wrote:
> send-email: detect and offer to skip backup files
>
> Diligent people save output from format-patch to files, proofread
> and edit them and then finally send the result out.  If the
> resulting files are sent out with "git send-email 0*", this ends up
> sending backup files (e.g. 0001-X.patch.backup or 0001-X.patch~)
> left by their editors next to the final version.  Sending them with
> "git send-email 0*.patch" (if format-patch was run with the standard
> suffix) would avoid such an embarrassment, but not everybody is
> careful.
>
> After collecting files to be sent (and sorting them if read from a
> directory), notice when the file being sent out has the same name as
> the previous file, plus some suffix (e.g. 0001-X.patch was sent, and
> we are looking at 0001-X.patch.backup or 0001-X.patch~), and the
> suffix begins with a non-alnum (e.g. ".backup" or "~") and ask if
> the user really wants to send it out.  Once the user skips sending
> such a "backup" file, remember the suffix and stop asking the same
> question (e.g. after skipping 0001-X.patch~, skip 0002-Y.patch~
> without asking).
>
> diff --git a/git-send-email.perl b/git-send-email.perl
> @@ -1726,6 +1728,44 @@ sub validate_patch {
> +sub handle_backup {
> +       my ($last, $lastlen, $file, $known_suffix) = @_;

Is $lastlen a micro-optimization or does it have some other purpose
I'm overlooking?

> +       my ($suffix, $skip);
> +
> +       $skip = 0;
> +       if (defined $last &&
> +           ($lastlen < length($file)) &&
> +           (substr($file, 0, $lastlen) eq $last) &&
> +           ($suffix = substr($file, $lastlen)) !~ /^[a-z0-9]/i) {
> +               if (defined $known_suffix && $suffix eq $known_suffix) {
> +                       print "Skipping $file with backup suffix '$known_suffix'.\n";
> +                       $skip = 1;
> +               } else {
> +                       my $answer = ask("Do you really want to send $file? (y|N): ",
> +                                        valid_re => qr/^(?:y|n)/i,
> +                                        default => 'y');

Is this 'default' correct or am I reading the code incorrectly?

> +                       $skip = ($answer ne 'y');
> +                       if ($skip) {
> +                               $known_suffix = $suffix;
> +                       }
> +               }
> +       }
> +       return ($skip, $known_suffix);
> +}
> +
> +sub handle_backup_files {
> +       my @file = @_;
> +       my ($last, $lastlen, $known_suffix, $skip, @result);
> +       for my $file (@file) {
> +               ($skip, $known_suffix) = handle_backup($last, $lastlen,
> +                                                      $file, $known_suffix);
> +               push @result, $file unless $skip;
> +               $last = $file;
> +               $lastlen = length($file);
> +       }
> +       return @result;
> +}

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/2] dir.c: fix dir re-inclusion rules with "NODIR" and "MUSTBEDIR"
  2016-03-18  5:40           ` Duy Nguyen
@ 2016-03-18  6:21             ` Durham Goode
  2016-03-18  6:28               ` Duy Nguyen
  2016-03-18 18:00             ` Junio C Hamano
  1 sibling, 1 reply; 30+ messages in thread
From: Durham Goode @ 2016-03-18  6:21 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Junio C Hamano, Git Mailing List, Mateusz Jakub Kwapich



On 3/17/16 10:40 PM, Duy Nguyen wrote:
> On Fri, Mar 18, 2016 at 11:51 AM, Durham Goode <durham@fb.com> wrote:
>> On 3/17/16 4:49 PM, Junio C Hamano wrote:
>>> Thanks for these 5 patches, two of which need to be discarded ;-).
>>> I think you can pick either one of 1/2, pick the one that says
>>> "non-NULL" (as opposed to "something") in the log message for 2/2.
>>>
>>> Durham, does it fix your issues if you apply the 1/2 and 2/2 (but
>>> not 3/2) on top of 2.8-rc?
>>>
>>> Duy, how comfortable are you with the idea of including this two in
>>> 2.8 final?  We have long passed the final -rc, and while it is
>>> probably OK to prolong the cycle and do another -rc, we cannot keep
>>> going like "oops, there is another thing discovered by somebod new"
>>> forever.
>>>
>>> Thanks.
>> Patches 1+2 fix the repro steps in the report, yes.  But I've found another
>> case that produces different results in 2.8 than in 2.7:
>>
>> Given a repo with files:
>>
>> dir1/dir2/show/file
>> dir1/dir2/hide/file
>>
>> and a sparse-checkout of
>>
>> /*
>> /dir1/dir2/show
>> !/dir1/dir2/
> I would say this is "undefined behavior" patterns. The intention of
> "!" pattern is to revert a subset of a matched pattern, e.g.
> !/dir1/dir2/show/something. Combining lines 2 and 3 together,
> "!dir1/dir2/" is not only supposed to revert dir1/dir2/show entirely,
> but extend the reversion outside of it. At least to me that's not
> intended.
After looking at it further, I agree with you.  The new behavior is as 
justifiable as the old. In addition, my description of this secondary 
issue was wrong (I said it left the files around, when in reality it 
resulted in an empty working copy, so git read-tree failed and left the 
files).  So we can ignore that case. Thanks.
>
> Skipping that tricky pair, the pairs "/*" and "!/dir1/dir2/" means
> "exclude everything except dir1/dir2" (in .gitignore sense) or
> "include everything except dir1/dir2" in sparse checkout sense. Which
> results in empty worktree. 1+2 trips when the trailing slash in the
> last rule exists and includes both files in show/hide. Patch 3/2 fixes
> the tripping and exclude both. If the last rule is "!/dir1/dir2" then
> 1+2 results in empty worktree as well.
>
I'm not sure I fully understand.  Here's what I'm seeing, with patch 1, 
2, and 3 applied:  with patterns "/*', "!/dir1/dir2", "/dir1/dir2/show", 
I see the contents of dir1/dir2/show/ (good). If I add a trailing slash 
to the last pattern (so it becomes "/dir1/dir2/show/"), it now results 
in an empty working copy. That seems funky, given that the last rule is 
to include that directory. Am I misunderstanding the trailing slash?

We're pretty far beyond my ability to understand ignore patterns now.  
The patches seem to make the behavior better though, so thanks for that.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/2] dir.c: fix dir re-inclusion rules with "NODIR" and "MUSTBEDIR"
  2016-03-18  6:21             ` Durham Goode
@ 2016-03-18  6:28               ` Duy Nguyen
  0 siblings, 0 replies; 30+ messages in thread
From: Duy Nguyen @ 2016-03-18  6:28 UTC (permalink / raw)
  To: Durham Goode; +Cc: Junio C Hamano, Git Mailing List, Mateusz Jakub Kwapich

On Fri, Mar 18, 2016 at 1:21 PM, Durham Goode <durham@fb.com> wrote:
>> Skipping that tricky pair, the pairs "/*" and "!/dir1/dir2/" means
>> "exclude everything except dir1/dir2" (in .gitignore sense) or
>> "include everything except dir1/dir2" in sparse checkout sense. Which
>> results in empty worktree. 1+2 trips when the trailing slash in the
>> last rule exists and includes both files in show/hide. Patch 3/2 fixes
>> the tripping and exclude both. If the last rule is "!/dir1/dir2" then
>> 1+2 results in empty worktree as well.
>>
> I'm not sure I fully understand.  Here's what I'm seeing, with patch 1, 2,
> and 3 applied:  with patterns "/*', "!/dir1/dir2", "/dir1/dir2/show", I see
> the contents of dir1/dir2/show/ (good). If I add a trailing slash to the
> last pattern (so it becomes "/dir1/dir2/show/"), it now results in an empty
> working copy. That seems funky, given that the last rule is to include that
> directory. Am I misunderstanding the trailing slash?

Ah.. I meant the last slash in !dir1/dir2/, not in dir1/dir2/show. But
that's interesting, will need to run some more tests.

> We're pretty far beyond my ability to understand ignore patterns now.

I'm right there with you. I'm starting to fear that I don't (and
probably won't) understand this thing.
-- 
Duy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: bug: sparse config interpretation incorrectness in 2.8.0-rc2
  2016-03-17 14:00             ` Duy Nguyen
@ 2016-03-18 15:49               ` Johannes Schindelin
  0 siblings, 0 replies; 30+ messages in thread
From: Johannes Schindelin @ 2016-03-18 15:49 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Junio C Hamano, Durham Goode, Git Mailing List, Mateusz Kwapich

Hi Duy,

On Thu, 17 Mar 2016, Duy Nguyen wrote:

> On Thu, Mar 17, 2016 at 8:46 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> > Unfortunately, this does not help me at all. In the use case I am trying
> > to get to work fast, we have tons and tons of directories and need *one*
> > file in pretty much *all* of those directories, and exclude most of the
> > other files.
> >
> > To make matters even worse, the list of excluded (or included) files is
> > constantly changing.
> 
> OK very weird use case to me :) There's something you might try. [1]
> can help trimming unrelated patterns, fewer patterns to match for a
> dir, faster matching.

Sadly, [1] (exclude: filter out patterns not applicable to the current
directory) would not help at all, because the regular use case really
would use the top-level directory as current one.

So I really need to speed up the sparse machinery from O(m*n) (where m is
the number of entries in the sparse-checkout file and n is the number of
files in the index).

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] dir.c: fix bug in 'nd/exclusion-regression-fix' topic
  2016-03-17 12:45 ` [PATCH 1/2] dir.c: fix bug in 'nd/exclusion-regression-fix' topic Nguyễn Thái Ngọc Duy
                     ` (2 preceding siblings ...)
  2016-03-17 12:45   ` Nguyễn Thái Ngọc Duy
@ 2016-03-18 17:38   ` Junio C Hamano
  3 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2016-03-18 17:38 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, durham, mitrandir

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> The topic in question introduces "sticky" path list for this purpose:
> given a path 'abc/def', if 'abc' already matches a pattern X, it's added
> to X's sticky path list. When we need to check if 'abc/def' matches
> pattern X and see that 'abc' is already in the list, we conclude right
> away that 'abc/def' also matches X.

It took me two reads to realize that I can get what this paragraph
wants to say if I ignore "given a path 'abc/def'" at the beginning.
In short, if we already know that a directory 'abc' matches a
pattern, the pattern remembers that fact in the list and further
queries about anything in that directory, e.g. 'abc/def', is
answered with "we already know abc matches, so it also does match".

"Sticky" is probably better understood if called "ancestor directory".

> The short reason (*) for sticky path list is to workaround limitations
> of matching code that will return "not match" when we compare
> 'abc/def' and pattern X.

If one asks about 'abc/def' first, you ask "does '*' match
'abc/def'?"--if there is a limitation with the matching code, this
list would not work at all as a workaround.  But we can safely
assume that before asking about 'abc/def' we would always ask about
'abc', so it is OK.  Is that what happens here?

> The bug is in this code. Not only it does "when we need to check if
> 'abc/def' matches...", it does an extra thing: if 'foo/bar' is _not_ in
> the list, return 'not matched' by bypassing all matching code with the
> "continue;" statement. It should let the remaining code decide match
> status instead.

So once a pattern has a non-empty "sticky" list after examining
'abc' and if you ask about 'foo', because it is not in the list and
instead of saying "it is not in the list, so we need to try matching
the pattern against 'foo' to decide if it matches", it incorrectly
says "'foo' does not match".  Is that what happens here?

An abrupt appearance of 'foo/bar' in the description made me read
the above three times and that is why I dropped '/bar' part in the
above.  The correctness of the 'sticky' hack seems to depend on the
assumption that we would always ask about 'foo' before asking about
'foo/bar', so the scenario presented with 'foo/bar' is implausible;
even when asking about 'foo/bar', we would have 'foo' in the list,
no?

The remainder, assuming that I read the above correctly, made sense
to me and justifies what the update code does.

Thanks.

> This bug affects both .gitignore and sparse checkout, but it's reported
> as a sparse checkout bug, so let's examine how it happens. The
> sparse-checkout pattern has two rules
>
>     /*
>     !one/hideme
>
> and the worktree has three tracked files, one/hideme, one/showme and
> two/showme. What happens is this
>
> * check "one", it matches the first pattern -> positive -> keep
>   examining.
>
> *1* "thanks" to 'nd/exclusion-regression-fix' we detect this pair of
>   patterns, so we put "one" in the sticky list of pattern "/*"
>
> * enter "one", check "one/hideme", it matches the second pattern
>   first (we search from bottom up) -> negative -> excluded
>
> * check "one/showme", it does not match the second pattern.
>
> *2* We then check it against the first pattern and notice the sticky list
>   that includes "one", so we decide right away that "one/showme" is
>   included.
>
> * leave "one", check "two" which does not match the second pattern.
>
> *3* then we check "two" against the first pattern and notice that this
>    pattern has a non-empty sticky list, which contains "one", not "two".
>    This bug kicks in and bypasses the true matching logic for pattern
>    "/*". As a result, we exclude "two/showme".
>
> One may notice that the order of these steps matter. If *3* occurs
> before *1*, then the sticky list at that moment is empty and the bug
> does not kick in. Sparse checkout always examines entries in
> alphabetical order, so "abc/showme" would be examined before "one" and
> not hit this bug!
>
> The last remark is important when we move to .gitignore. We receive the
> list of entries with readdir() and cannot control the order of
> entries. Which means we can't write a test for .gitignore that will
> reliably fail without this fix. Which is why this patch only adds a test
> for sparse checkout, even though the same above steps happen to
> .gitignore.
>
> (*) The problem is known and will be fixed later and described in
> detail then. For this commit, it's sufficient to see the following
> link because the long reason is really long:
>
> http://article.gmane.org/gmane.comp.version-control.git/288479
>
> Reported-by: Durham Goode <durham@fb.com>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  dir.c                                |  1 -
>  t/t1011-read-tree-sparse-checkout.sh | 20 ++++++++++++++++++++
>  2 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/dir.c b/dir.c
> index 69e0be6..77f38a5 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1027,7 +1027,6 @@ static struct exclude *last_exclude_matching_from_list(const char *pathname,
>  				exc = x;
>  				break;
>  			}
> -			continue;
>  		}
>  
>  		if (x->flags & EXC_FLAG_MUSTBEDIR) {
> diff --git a/t/t1011-read-tree-sparse-checkout.sh b/t/t1011-read-tree-sparse-checkout.sh
> index 0c74bee..ecc5e93 100755
> --- a/t/t1011-read-tree-sparse-checkout.sh
> +++ b/t/t1011-read-tree-sparse-checkout.sh
> @@ -274,4 +274,24 @@ test_expect_success 'checkout with --ignore-skip-worktree-bits' '
>  	git diff --exit-code HEAD
>  '
>  
> +test_expect_success 'sparse checkout and dir.c sticky bits' '
> +	git init sticky &&
> +	(
> +		cd sticky &&
> +		mkdir one two &&
> +		touch one/hideme one/showme two/showme &&
> +		git add . &&
> +		git commit -m initial &&
> +		cat >.git/info/sparse-checkout <<-\EOF &&
> +		/*
> +		!one/hideme
> +		EOF
> +		git config core.sparsecheckout true &&
> +		git checkout &&
> +		test_path_is_missing one/hideme &&
> +		test_path_is_file    one/showme &&
> +		test_path_is_file    two/showme
> +	)
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/2] dir.c: fix dir re-inclusion rules with "NODIR" and "MUSTBEDIR"
  2016-03-18  5:40           ` Duy Nguyen
  2016-03-18  6:21             ` Durham Goode
@ 2016-03-18 18:00             ` Junio C Hamano
  2016-03-18 18:37               ` Extending this cycle by a week and reverting !reinclusion topic Junio C Hamano
  2016-03-19  1:03               ` [PATCH 3/2] dir.c: fix dir re-inclusion rules with "NODIR" and "MUSTBEDIR" Duy Nguyen
  1 sibling, 2 replies; 30+ messages in thread
From: Junio C Hamano @ 2016-03-18 18:00 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Durham Goode, Git Mailing List, Mateusz Jakub Kwapich

Duy Nguyen <pclouds@gmail.com> writes:

> v2.7.3 differs in the way "!" is handled. It does extend reversion
> outside dir1/dir2/show, back to dir1/dir2. While 2.8+1+2 recognizes
> and follows the "/*" and "!dir1/dir2/" pair.
>
> The way I interpreted the rules above, though, may be because I'm just
> trying to defend the current code. Junio, your call on whether to
> revert this whole patch series.

It seems that the more you dig into piling changes on this topic
there is somebody else who comes back with yet another corner case.

At this point in the release cycle, I am inclined to say that we
should revert the whole thing from 2.8-rc.  Even with that plan, we
probably need an extra rc as the reverting a topic this late in the
cycle is something I feel uncomfortable with.

What is the ultimate goal of nd/exclusion-regression-fix topic over
the behaviour before 2.7.0 and after 2.7.1 (there was a regression
in 2.7.0 that was reverted in 2.7.1)?  From the cover letter of the
series:

    Take one was bad and reverted in commit 8c72236. Take two provides a
    more complete solution to the pair of rules

      exclude/this
      !exclude/this/except/this

    3/4 should do a better job at stopping regressions in take 1. 4/4
    provides the solution. I think I have tested (and wrote tests) for all
    the cases I can imagine.

"solution to the pair of rules" hints there are some problem in the
pair of rules, without stating what it is trying to solve.

Isn't the root cause of the issue that treat_one_path() when
deciding if it is worth descending into a directory check if the
directory itself is excluded and returns path_excluded, even if some
paths inside it may have a countermanding ! entries that would match
them?  What if we change that part smarter and allow it to descend?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Extending this cycle by a week and reverting !reinclusion topic
  2016-03-18 18:00             ` Junio C Hamano
@ 2016-03-18 18:37               ` Junio C Hamano
  2016-03-19  1:03               ` [PATCH 3/2] dir.c: fix dir re-inclusion rules with "NODIR" and "MUSTBEDIR" Duy Nguyen
  1 sibling, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2016-03-18 18:37 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Duy Nguyen

It seems that people are still finding regressions in the topic for
this item in the 2.8-rc3:

 * Another try to improve the ignore mechanism that lets you say
   "this is excluded" and then later say "oh, no, this part (that is
   a subset of the previous part) is not excluded".  This has still
   a known limitation, though.

Known limitations in a new feature are somewhat OK, but I am getting
an impression that this is not nearly complete yet and it needs more
time to be tested before it inflicts pain to the general public.

I've reverted the offending changes from 'master' and will be
tagging an extra -rc (2.8-rc4) early next week, as reverting a topic
this late in the cycle is something I feel uncomfortable with.

Thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/2] dir.c: fix dir re-inclusion rules with "NODIR" and "MUSTBEDIR"
  2016-03-18 18:00             ` Junio C Hamano
  2016-03-18 18:37               ` Extending this cycle by a week and reverting !reinclusion topic Junio C Hamano
@ 2016-03-19  1:03               ` Duy Nguyen
  1 sibling, 0 replies; 30+ messages in thread
From: Duy Nguyen @ 2016-03-19  1:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Durham Goode, Git Mailing List, Mateusz Jakub Kwapich

On Sat, Mar 19, 2016 at 1:00 AM, Junio C Hamano <gitster@pobox.com> wrote:
> What is the ultimate goal of nd/exclusion-regression-fix topic over
> the behaviour before 2.7.0 and after 2.7.1 (there was a regression
> in 2.7.0 that was reverted in 2.7.1)?  From the cover letter of the
> series:
>
>     Take one was bad and reverted in commit 8c72236. Take two provides a
>     more complete solution to the pair of rules
>
>       exclude/this
>       !exclude/this/except/this
>
>     3/4 should do a better job at stopping regressions in take 1. 4/4
>     provides the solution. I think I have tested (and wrote tests) for all
>     the cases I can imagine.
>
> "solution to the pair of rules" hints there are some problem in the
> pair of rules, without stating what it is trying to solve.
>
> Isn't the root cause of the issue that treat_one_path() when
> deciding if it is worth descending into a directory check if the
> directory itself is excluded and returns path_excluded, even if some
> paths inside it may have a countermanding ! entries that would match
> them?  What if we change that part smarter and allow it to descend?

That's the easy part, detecting a pair and continue descending. The
problem is after you descend, exclude engine may return contradicting
results on old patterns. It's the same thing that 3/2 describes (after
you change patterns from "!dir\ndir/file2" to "dir\n!dir/file2").
-- 
Duy

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2016-03-19  1:04 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-17  0:09 bug: sparse config interpretation incorrectness in 2.8.0-rc2 Durham Goode
2016-03-17  0:56 ` Duy Nguyen
2016-03-17  6:49   ` Durham Goode
2016-03-17  7:51   ` Junio C Hamano
2016-03-17 10:17     ` Duy Nguyen
2016-03-17 13:04       ` Johannes Schindelin
2016-03-17 13:20         ` Duy Nguyen
2016-03-17 13:46           ` Johannes Schindelin
2016-03-17 14:00             ` Duy Nguyen
2016-03-18 15:49               ` Johannes Schindelin
2016-03-17  7:22 ` Junio C Hamano
2016-03-17 17:51   ` Durham Goode
2016-03-17 12:45 ` [PATCH 1/2] dir.c: fix bug in 'nd/exclusion-regression-fix' topic Nguyễn Thái Ngọc Duy
2016-03-17 12:45   ` Nguyễn Thái Ngọc Duy
2016-03-17 12:54     ` [PATCH 3/2] dir.c: fix dir re-inclusion rules with "NODIR" and "MUSTBEDIR" Nguyễn Thái Ngọc Duy
2016-03-17 23:49       ` Junio C Hamano
2016-03-18  0:15         ` Duy Nguyen
2016-03-18  5:40           ` Junio C Hamano
2016-03-18  5:51             ` Duy Nguyen
2016-03-18  5:58             ` Eric Sunshine
2016-03-18  4:51         ` Durham Goode
2016-03-18  5:40           ` Duy Nguyen
2016-03-18  6:21             ` Durham Goode
2016-03-18  6:28               ` Duy Nguyen
2016-03-18 18:00             ` Junio C Hamano
2016-03-18 18:37               ` Extending this cycle by a week and reverting !reinclusion topic Junio C Hamano
2016-03-19  1:03               ` [PATCH 3/2] dir.c: fix dir re-inclusion rules with "NODIR" and "MUSTBEDIR" Duy Nguyen
2016-03-17 12:45   ` [PATCH 2/2] dir.c: correct "stuck" logging logic Nguyễn Thái Ngọc Duy
2016-03-17 12:45   ` Nguyễn Thái Ngọc Duy
2016-03-18 17:38   ` [PATCH 1/2] dir.c: fix bug in 'nd/exclusion-regression-fix' topic Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.