git.vger.kernel.org archive mirror
* [Question] Unicode weirdness breaking tests on ZFS?
@ 2021-11-17 15:17 Derrick Stolee
  2021-11-17 15:41 ` Ævar Arnfjörð Bjarmason
  2021-11-17 16:12 ` Torsten Bögershausen
  0 siblings, 2 replies; 10+ messages in thread
From: Derrick Stolee @ 2021-11-17 15:17 UTC (permalink / raw)
  To: Git Mailing List

I recently had to pave my Linux machine, so I updated it to Ubuntu
21.10 and had the choice to start using the ZFS filesystem. I thought,
"Why not?" but now I maybe see why.

Running the Git test suite at the v2.34.0 tag on my machine results in
these failures:

t0050-filesystem.sh                   (Wstat: 0 Tests: 11 Failed: 0)
  TODO passed:   9-10
t0021-conversion.sh                   (Wstat: 256 Tests: 41 Failed: 1)
  Failed test:  31
  Non-zero exit status: 1
t3910-mac-os-precompose.sh            (Wstat: 256 Tests: 25 Failed: 10)
  Failed tests:  1, 4, 6, 8, 11-16
  TODO passed:   23
  Non-zero exit status: 1

These are all related to the UTF8_NFD_TO_NFC prereq.

Zooming in on t0050, these tests are marked as "test_expect_failure" due
to an assignment of $test_unicode using the UTF8_NFD_TO_NFC prereq:


$test_unicode 'rename (silent unicode normalization)' '
	git mv "$aumlcdiar" "$auml" &&
	git commit -m rename
'

$test_unicode 'merge (silent unicode normalization)' '
	git reset --hard initial &&
	git merge topic
'


The prereq creates a file under one Unicode form of a name and then
checks whether it can be found under the equivalent form:


test_lazy_prereq UTF8_NFD_TO_NFC '
	# check whether FS converts nfd unicode to nfc
	auml=$(printf "\303\244")
	aumlcdiar=$(printf "\141\314\210")
	>"$auml" &&
	test -f "$aumlcdiar"
'
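
For reference, the two names used by the prereq differ only in their byte encoding. A small sketch that dumps both follows; the helper name `hex_of` is made up for illustration and is not part of the Git test suite:

```shell
# hex_of is a hypothetical helper: it prints the bytes of its argument
# as space-separated hex.
hex_of () {
	# Word splitting collapses od's column padding into single spaces.
	set -- $(printf '%s' "$1" | od -An -tx1)
	echo "$*"
}

auml=$(printf '\303\244')          # U+00E4, precomposed (NFC)
aumlcdiar=$(printf '\141\314\210') # U+0061 U+0308, decomposed (NFD)

hex_of "$auml"       # c3 a4
hex_of "$aumlcdiar"  # 61 cc 88
```

A filesystem that treats these two byte strings as the same name makes `test -f "$aumlcdiar"` succeed after only `"$auml"` was created.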


What I see in that first test is that 'git mv' does change the
index, but the filesystem thinks the two names refer to the same
file. This may mean that our 'git add "$aumlcdiar"' from an earlier
test stores the decomposed name in the index as a distinct entry,
and the 'git mv' changes the index without causing any issues in
the filesystem.

This reminds me of running 'git mv README readme' on a
case-insensitive filesystem. Is this not a similar situation?

What I'm trying to work out is whether this test is flawed,
or whether something broke (or never worked?) such that 'git add'
does not pick up the canonical Unicode form from the filesystem.

The other tests all have similar interactions with 'git add'.
I'm hoping that these are just test bugs, and not actually a
functionality issue in Git. Yes, it is confusing that we can
change the unicode of a file in the index without the filesystem
understanding the difference, but that is very similar to how
case-insensitive filesystems work and I don't know what else we
would do here.

These filesystem/unicode things are out of my expertise, so
hopefully someone else has a clearer idea of what is going on.
I'm happy to be a test bed, or even attempt producing patches
to fix the issue once we have that clarity.

Thanks,
-Stolee

* Re: [Question] Unicode weirdness breaking tests on ZFS?
  2021-11-17 15:17 [Question] Unicode weirdness breaking tests on ZFS? Derrick Stolee
@ 2021-11-17 15:41 ` Ævar Arnfjörð Bjarmason
  2021-11-17 16:12 ` Torsten Bögershausen
  1 sibling, 0 replies; 10+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-17 15:41 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Git Mailing List


On Wed, Nov 17 2021, Derrick Stolee wrote:

> [...]
>
> The prereq creates a file under one Unicode form of a name and then
> checks whether it can be found under the equivalent form:
>
>
> test_lazy_prereq UTF8_NFD_TO_NFC '
> 	# check whether FS converts nfd unicode to nfc
> 	auml=$(printf "\303\244")
> 	aumlcdiar=$(printf "\141\314\210")
> 	>"$auml" &&
> 	test -f "$aumlcdiar"
> '
>
> [...]

I haven't used ZFS, but this points to non-POSIX behavior on the FS
itself. It looks like tweaking the "normalization" property might change
it, see: https://manpages.ubuntu.com/manpages/eoan/man8/zfs.8.html

There's also "casesensitivity" and "utf8only".
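
A minimal sketch of inspecting those three properties, assuming the `zfs` command-line tool is installed and the dataset backing the working tree is known. The wrapper name, the dataset name in the example, and the `ZFS_CMD` override (present only so the call can be dry-run with `echo`) are placeholders, not anything taken from this thread:

```shell
# show_unicode_props is a hypothetical wrapper around "zfs get".
# $1: dataset name, e.g. the one that holds the Git working tree.
show_unicode_props () {
	dataset=$1
	${ZFS_CMD:-zfs} get normalization,casesensitivity,utf8only "$dataset"
}

# Example (needs a real dataset name):
#   show_unicode_props rpool/USERDATA/example
```

As far as the zfs documentation goes, all three properties are fixed when a dataset is created, so a value other than "none" for normalization would have been chosen at install time rather than changed later.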

We probably don't want to invoke some ZFS command on every test to
interrogate this, but if we can pass it down from GIT-BUILD-OPTIONS or
similar then we could have a test prereq check this.

Or perhaps it's as simple as changing the "UTF8_NFD_TO_NFC" prereq from
doing a "test -f" to e.g. "echo *" and seeing what it gets back. Perhaps
ZFS says "yes" to "it exists?" but when doing a readdir() it will
canonicalize?
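
A rough sketch of that second idea, written as plain shell rather than as a real prereq: create the file under the precomposed name, then classify what the directory listing hands back. The function name and the use of `mktemp -d` are assumptions, and the answer is whatever the probed filesystem does, so no particular output is implied:

```shell
# Hypothetical probe: which byte form does a directory listing return
# after the file was created under its precomposed (NFC) name?
# $1: directory on the filesystem to probe (defaults to ".").
listing_form_after_nfc_create () {
	auml=$(printf '\303\244')
	aumlcdiar=$(printf '\141\314\210')
	dir=$(mktemp -d "${1:-.}/probe.XXXXXX") || return 1
	(
		cd "$dir" &&
		>"$auml" &&
		for name in *
		do
			case "$name" in
			"$auml") echo nfc ;;
			"$aumlcdiar") echo nfd ;;
			*) echo other ;;
			esac
		done
	)
	status=$?
	rm -rf "$dir"
	return $status
}
```

If the listing comes back as "nfc" while `test -f "$aumlcdiar"` also succeeds, the filesystem matches names across forms on lookup but preserves the bytes it was given.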


* Re: [Question] Unicode weirdness breaking tests on ZFS?
  2021-11-17 15:17 [Question] Unicode weirdness breaking tests on ZFS? Derrick Stolee
  2021-11-17 15:41 ` Ævar Arnfjörð Bjarmason
@ 2021-11-17 16:12 ` Torsten Bögershausen
  2021-11-17 17:06   ` Torsten Bögershausen
  1 sibling, 1 reply; 10+ messages in thread
From: Torsten Bögershausen @ 2021-11-17 16:12 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Git Mailing List

On Wed, Nov 17, 2021 at 10:17:53AM -0500, Derrick Stolee wrote:
> [...]

Interesting.
The tests have always been working on HFS+, then we got
APFS (and needed a small fix) and now ZFS.

I'll have a look - just installing it in a virtual machine.

* Re: [Question] Unicode weirdness breaking tests on ZFS?
  2021-11-17 16:12 ` Torsten Bögershausen
@ 2021-11-17 17:06   ` Torsten Bögershausen
  2021-11-17 17:39     ` Torsten Bögershausen
  0 siblings, 1 reply; 10+ messages in thread
From: Torsten Bögershausen @ 2021-11-17 17:06 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Git Mailing List

On Wed, Nov 17, 2021 at 05:12:26PM +0100, Torsten Bögershausen wrote:
> On Wed, Nov 17, 2021 at 10:17:53AM -0500, Derrick Stolee wrote:
> > [...]
>
> Interesting.
> The tests have always been working on HFS+, then we got
> APFS (and needed a small fix) and now ZFS.
>
> I'll have a look - just installing it in a virtual machine.

So, the virtual machine is up-and-running.

I got 2 messages:

ok 9 - rename (silent unicode normalization) # TODO known breakage vanished
ok 10 - merge (silent unicode normalization) # TODO known breakage vanished

Do you get the same ?

* Re: [Question] Unicode weirdness breaking tests on ZFS?
  2021-11-17 17:06   ` Torsten Bögershausen
@ 2021-11-17 17:39     ` Torsten Bögershausen
  2021-11-17 18:29       ` Derrick Stolee
  0 siblings, 1 reply; 10+ messages in thread
From: Torsten Bögershausen @ 2021-11-17 17:39 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Git Mailing List

On Wed, Nov 17, 2021 at 06:06:13PM +0100, Torsten Bögershausen wrote:
> On Wed, Nov 17, 2021 at 05:12:26PM +0100, Torsten Bögershausen wrote:
> > On Wed, Nov 17, 2021 at 10:17:53AM -0500, Derrick Stolee wrote:
> > > [...]
> >
> > Interesting.
> > The tests have always been working on HFS+, then we got
> > APFS (and needed a small fix) and now ZFS.
> >
> > I'll have a look - just installing it in a virtual machine.
>
> So, the virtual machine is up-and-running.
>
> I got 2 messages:
>
> ok 9 - rename (silent unicode normalization) # TODO known breakage vanished
> ok 10 - merge (silent unicode normalization) # TODO known breakage vanished
>
> Do you get the same ?


Now I am even more puzzled.
Running t0050 with -x gives this:

 Author: A U Thor <author@example.com>
  1 file changed, 0 insertions(+), 0 deletions(-)
   rename "a\314\210" => "\303\244" (100%)
   ok 9 - rename (silent unicode normalization) # TODO known breakage vanished


----------------
When I create a test Git repository with one file named ä in decomposed form
and rename it to ä in precomposed form, Git gives me:

tb@Ubuntu2021:~/ttt$ git mv "$aumlcdiar" "$auml"
fatal: destination exists, source=ä, destination=ä

and in hex form:

tb@Ubuntu2021:~/ttt$ git mv "$aumlcdiar" "$auml" 2>&1 | xxd
00000000: 6661 7461 6c3a 2064 6573 7469 6e61 7469  fatal: destinati
00000010: 6f6e 2065 7869 7374 732c 2073 6f75 7263  on exists, sourc
00000020: 653d 61cc 882c 2064 6573 7469 6e61 7469  e=a.., destinati
00000030: 6f6e 3dc3 a40a                           on=...


* Re: [Question] Unicode weirdness breaking tests on ZFS?
  2021-11-17 17:39     ` Torsten Bögershausen
@ 2021-11-17 18:29       ` Derrick Stolee
  2021-11-17 18:35         ` Derrick Stolee
  0 siblings, 1 reply; 10+ messages in thread
From: Derrick Stolee @ 2021-11-17 18:29 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Git Mailing List

On 11/17/2021 12:39 PM, Torsten Bögershausen wrote:
> On Wed, Nov 17, 2021 at 06:06:13PM +0100, Torsten Bögershausen wrote:
>> On Wed, Nov 17, 2021 at 05:12:26PM +0100, Torsten Bögershausen wrote:
>>> On Wed, Nov 17, 2021 at 10:17:53AM -0500, Derrick Stolee wrote:
>>>> [...]
>>>
>>> Interesting.
>>> The tests have always been working on HFS+, then we got
>>> APFS (and needed a small fix) and now ZFS.
>>>
>>> I'll have a look - just installing it in a virtual machine.
>>
>> So, the virtual machine is up-and-running.
>>
>> I got 2 messages:
>>
>> ok 9 - rename (silent unicode normalization) # TODO known breakage vanished
>> ok 10 - merge (silent unicode normalization) # TODO known breakage vanished
>>
>> Do you get the same ?

Halfway, I see this:

ok 9 - rename (silent unicode normalization) # TODO known breakage vanished
not ok 10 - merge (silent unicode normalization) # TODO known breakage

> Now I am even more puzzled.
> Running t0050 with -x gives this:
> 
>  Author: A U Thor <author@example.com>
>   1 file changed, 0 insertions(+), 0 deletions(-)
>    rename "a\314\210" => "\303\244" (100%)
>    ok 9 - rename (silent unicode normalization) # TODO known breakage vanished
> 
> 
> ----------------
> When I create a test Git repository with one file named ä in decomposed form
> and rename it to ä in precomposed form, Git gives me:
> 
> tb@Ubuntu2021:~/ttt$ git mv "$aumlcdiar" "$auml"
> fatal: destination exists, source=ä, destination=ä
> 
> and in hex form:
> 
> tb@Ubuntu2021:~/ttt$ git mv "$aumlcdiar" "$auml" 2>&1 | xxd
> 00000000: 6661 7461 6c3a 2064 6573 7469 6e61 7469  fatal: destinati
> 00000010: 6f6e 2065 7869 7374 732c 2073 6f75 7263  on exists, sourc
> 00000020: 653d 61cc 882c 2064 6573 7469 6e61 7469  e=a.., destinati
> 00000030: 6f6e 3dc3 a40a                           on=...
 
Interesting: does this "fatal" error not change the exit code? Oddly,
I don't get that failure under -x:

checking known breakage of 0050.9 'rename (silent unicode normalization)': 
        git mv "$aumlcdiar" "$auml" &&
        git commit -m rename

+ git mv ä ä
+ git commit -m rename
[main 591d19c] rename
 Author: A U Thor <author@example.com>
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename "a\314\210" => "\303\244" (100%)
ok 9 - rename (silent unicode normalization) # TODO known breakage vanished

checking known breakage of 0050.10 'merge (silent unicode normalization)': 
        git reset --hard initial &&
        git merge topic

+ git reset --hard initial
error: unable to unlink old 'ä': No such file or directory
fatal: Could not reset index file to revision 'initial'.
error: last command exited with $?=128
not ok 10 - merge (silent unicode normalization) # TODO known breakage


But notice that -x does make test 10 go back to failing.

Thanks,
-Stolee

* Re: [Question] Unicode weirdness breaking tests on ZFS?
  2021-11-17 18:29       ` Derrick Stolee
@ 2021-11-17 18:35         ` Derrick Stolee
  2021-11-19 15:44           ` Torsten Bögershausen
  0 siblings, 1 reply; 10+ messages in thread
From: Derrick Stolee @ 2021-11-17 18:35 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Git Mailing List

On 11/17/2021 1:29 PM, Derrick Stolee wrote:
> On 11/17/2021 12:39 PM, Torsten Bögershausen wrote:
>> On Wed, Nov 17, 2021 at 06:06:13PM +0100, Torsten Bögershausen wrote:
>>> On Wed, Nov 17, 2021 at 05:12:26PM +0100, Torsten Bögershausen wrote:
>>>> I'll have a look - just installing it in a virtual machine.
>>>
>>> So, the virtual machine is up-and-running.
>>>
>>> I got 2 messages:
>>>
>>> ok 9 - rename (silent unicode normalization) # TODO known breakage vanished
>>> ok 10 - merge (silent unicode normalization) # TODO known breakage vanished
>>>
>>> Do you get the same ?
> 
> Halfway, I see this:
> 
> ok 9 - rename (silent unicode normalization) # TODO known breakage vanished
> not ok 10 - merge (silent unicode normalization) # TODO known breakage

Making this even more confusing, my original output shows both of
the TODOs vanishing, but I can't make that happen when running only
this test. However, with "prove -j8 t00*.sh" I can get them to both
vanish:

Test Summary Report
-------------------
t0050-filesystem.sh           (Wstat: 0 Tests: 11 Failed: 0)
  TODO passed:   9-10
t0021-conversion.sh           (Wstat: 256 Tests: 41 Failed: 1)
  Failed test:  31
  Non-zero exit status: 1
Files=53, Tests=2896, 15 wallclock secs ( 0.59 usr  0.07 sys + 26.96 cusr 13.95 csys = 41.57 CPU)
Result: FAIL


* Re: [Question] Unicode weirdness breaking tests on ZFS?
  2021-11-17 18:35         ` Derrick Stolee
@ 2021-11-19 15:44           ` Torsten Bögershausen
  2021-11-19 17:03             ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Torsten Bögershausen @ 2021-11-19 15:44 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Git Mailing List

On Wed, Nov 17, 2021 at 01:35:32PM -0500, Derrick Stolee wrote:
> On 11/17/2021 1:29 PM, Derrick Stolee wrote:
> > On 11/17/2021 12:39 PM, Torsten Bögershausen wrote:
> >> On Wed, Nov 17, 2021 at 06:06:13PM +0100, Torsten Bögershausen wrote:
> >>> On Wed, Nov 17, 2021 at 05:12:26PM +0100, Torsten Bögershausen wrote:
> >>>> I'll have a look - just installing it in a virtual machine.
> >>>
> >>> So, the virtual machine is up-and-running.
> >>>
> >>> I got 2 messages:
> >>>
> >>> ok 9 - rename (silent unicode normalization) # TODO known breakage vanished
> >>> ok 10 - merge (silent unicode normalization) # TODO known breakage vanished
> >>>
> >>> Do you get the same ?
> >
> > Halfway, I see this:
> >
> > ok 9 - rename (silent unicode normalization) # TODO known breakage vanished
> > not ok 10 - merge (silent unicode normalization) # TODO known breakage
>
> Making this even more confusing, my original output shows both of
> the TODOs vanishing, but I can't make that happen when running only
> this test. However, with "prove -j8 t00*.sh" I can get them to both
> vanish:
>
> Test Summary Report
> -------------------
> t0050-filesystem.sh           (Wstat: 0 Tests: 11 Failed: 0)
>   TODO passed:   9-10
> t0021-conversion.sh           (Wstat: 256 Tests: 41 Failed: 1)
>   Failed test:  31
>   Non-zero exit status: 1
> Files=53, Tests=2896, 15 wallclock secs ( 0.59 usr  0.07 sys + 26.96 cusr 13.95 csys = 41.57 CPU)
> Result: FAIL
>

Should we conclude that the underlying os/zfs is not stable ?
Things don't seem to be reproducible.

What Git needs here in t0050 is that stat("ä") behaves the same as stat("a¨"),
when either "ä" or "a¨" exists on disk.
The same goes for open() and all other file system functions.
("ä" is the precomposed form, "a¨" is the decomposed form;
 typically both render to the same glyph on the screen,
 and a hex dump or xxd will show what we had.
 I just use this notation here for illustration.)
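
The stat() requirement above can be sketched as a lookup in both directions: create the file under one form, then look it up under the other. This is only an illustration; the function names are hypothetical, `test -e` stands in for stat(), and the probe directory should live on the filesystem under test:

```shell
# Hypothetical helper: create name $2 in a fresh directory under $1 and
# report whether name $3 is then found there.
other_name_found () {
	dir=$(mktemp -d "$1/probe.XXXXXX") || return 2
	if (cd "$dir" && >"$2" && test -e "$3")
	then
		echo yes
	else
		echo no
	fi
	rm -rf "$dir"
}

# Run the lookup in both directions for the two forms of "ä".
lookup_both_ways () {
	base=${1:-.}
	auml=$(printf '\303\244')
	aumlcdiar=$(printf '\141\314\210')
	echo "create nfc, stat nfd: $(other_name_found "$base" "$auml" "$aumlcdiar")"
	echo "create nfd, stat nfc: $(other_name_found "$base" "$aumlcdiar" "$auml")"
}
```

Two "yes" answers would match what t0050 expects from a normalizing filesystem, and two "no" answers are what a byte-preserving filesystem gives. A mixed result would be the surprising case.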

Should we contact the zfs developers ?


* Re: [Question] Unicode weirdness breaking tests on ZFS?
  2021-11-19 15:44           ` Torsten Bögershausen
@ 2021-11-19 17:03             ` Junio C Hamano
  2021-11-19 18:30               ` Derrick Stolee
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2021-11-19 17:03 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Derrick Stolee, Git Mailing List

Torsten Bögershausen <tboegi@web.de> writes:

> Should we conclude that the underlying os/zfs is not stable ?
> Things don't seem to be reproducible.
>
> What Git needs here in t0050 is that stat("ä") behaves the same as stat("a¨"),
> when either "ä" or "a¨" exists on disk.
> The same goes for open() and all other file system functions.

We either need to see these two are treated as the same thing, or
these two are treated as two distinct filesystem entities, just like
stat("a") and stat("b") are.  What we absolutely need is that the
unification either always happens or never happens, consistently.

I wonder what readdir() is returning.  After creat("ä") in an empty
directory, does readdir() in there return "ä" or "a¨"?  And vice
versa?  Is this also inconsistent?
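
One way to look for that kind of inconsistency is to repeat the same create-then-lookup probe many times and tally the answers. The sketch below assumes nothing about ZFS itself; the function names and the default of 20 rounds are arbitrary choices for illustration:

```shell
# Hypothetical single probe: create the NFC name in a fresh directory
# under $1 and look it up via the NFD name.
probe_once () {
	dir=$(mktemp -d "$1/probe.XXXXXX") || return 2
	if (cd "$dir" && >"$(printf '\303\244')" && test -e "$(printf '\141\314\210')")
	then
		echo unified
	else
		echo distinct
	fi
	rm -rf "$dir"
}

# Repeat the probe and tally the outcomes: one output line means every
# round agreed, two lines mean the filesystem answered inconsistently.
probe_repeatedly () {
	base=${1:-.} rounds=${2:-20}
	i=0
	while [ "$i" -lt "$rounds" ]
	do
		probe_once "$base"
		i=$((i + 1))
	done | sort | uniq -c
}
```

Running several copies of `probe_repeatedly` in parallel might be closer to the "prove -j8" conditions reported earlier in the thread, though that is a guess.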

> ("ä" is the precomposed form, "a¨" is the decomposed form;
>  typically both render to the same glyph on the screen,
>  and a hex dump or xxd will show what we had.
>  I just use this notation here for illustration.)
>
> Should we contact the zfs developers ?

* Re: [Question] Unicode weirdness breaking tests on ZFS?
  2021-11-19 17:03             ` Junio C Hamano
@ 2021-11-19 18:30               ` Derrick Stolee
  0 siblings, 0 replies; 10+ messages in thread
From: Derrick Stolee @ 2021-11-19 18:30 UTC (permalink / raw)
  To: Junio C Hamano, Torsten Bögershausen; +Cc: Git Mailing List

On 11/19/2021 12:03 PM, Junio C Hamano wrote:
> Torsten Bögershausen <tboegi@web.de> writes:
> 
>> Should we conclude that the underlying os/zfs is not stable ?
>> Things don't seem to be reproducible.
>>
>> What Git needs here in t0050 is that stat("ä") behaves the same as stat("a¨"),
>> when either "ä" or "a¨" exists on disk.
>> The same goes for open() and all other file system functions.
> 
> We need to see either that these two are treated as the same thing,
> or that they are treated as two distinct filesystem entities, just
> like stat("a") and stat("b") are.  What we absolutely need is that
> the unification either always happens or never happens, consistently.
> 
> I wonder what readdir() is returning.  After creat("ä") in an empty
> directory, does readdir() in there return "ä" or "a¨"?  And vice
> versa?  Is this also inconsistent?

Following this suggestion, I added a test helper with this code:

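int cmd__create_and_read(int argc, const char **argv)
{
	DIR *dir;
	struct dirent *de;

	/* test-tool passes the subcommand name as argv[0]; the flag is argv[1] */
	if (argc != 2)
		die("select --nfc or --nfd");

	/* strcmp() returns 0 on a match, hence the negation */
	if (!strcmp(argv[1], "--nfc"))
		creat("\303\244", 0766);
	else if (!strcmp(argv[1], "--nfd"))
		creat("\141\314\210", 0766);
	else
		die("select --nfc or --nfd");

	dir = opendir(".");
	if (!dir)
		die_errno("opendir");

	/* print what the filesystem returns, skipping "." and ".." by name */
	while ((de = readdir(dir)) != NULL) {
		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		printf("%s\n", de->d_name);
	}

	closedir(dir);
	return 0;
}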

And then added this test:

test_expect_success 'unicode stuff' '
	mkdir nfc &&
	(
		cd nfc &&
		test-tool create-and-read --nfc >../nfc.txt
	) &&

	mkdir nfd &&
	(
		cd nfd &&
		test-tool create-and-read --nfd >../nfd.txt
	) &&

	test_cmp nfc.txt nfd.txt
'

This test always passes for me, and is essentially doing
a similar check that the prereq is doing, except that it
actually writes both names to files instead of writing
one and doing a read with the other.

After changing the "$test_unicode" instances to instances of
"test_expect_success", I ran t0050 under --stress and quickly
got a failure on the 'merge (silent unicode normalization)'
test:


expecting success of 0050.11 'merge (silent unicode normalization)': 
        git reset --hard initial &&
        git merge topic

+ git reset --hard initial
error: unable to unlink old 'ä': No such file or directory
fatal: Could not reset index file to revision 'initial'.
error: last command exited with $?=128
not ok 11 - merge (silent unicode normalization)


Deleting that test gave mostly consistent results, although I once
got a failure in the "setup unicode normalization tests" test with
a similar error message:

+ git checkout -f main
error: unable to unlink old 'ä': No such file or directory
Switched to branch 'main'
error: last command exited with $?=1
not ok 8 - setup unicode normalization tests
 
>> ("ä" is the precomposed form and "a¨" the decomposed form;
>>  both typically render as the same glyph on the screen,
>>  and only a hex dump or xxd shows which one is stored.
>>  I use this notation here just for illustration.)
>>
>> Should we contact the zfs developers ?

Hopefully someone has a good way to contact them, and I
can start a thread at the appropriate place. To optimize
for their time, what are our minimal reproduction steps?

1. Build Git at the v2.34.0 tag.
2. cd to t/
3. ./t0050-filesystem.sh --stress

Those instructions (given enough time) should reproduce
the failure in test 8, 'setup unicode normalization tests'.

To hit the failure faster, the same steps work with the
'zfs-minimal' branch at my fork [1], which changes the
tests to expect success and so exposes the unpredictable
tests more quickly.

[1] https://github.com/derrickstolee/git/tree/zfs-minimal
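
The --stress option reruns the script in parallel jobs until one of
them fails. For a report aimed at people without a Git build, the
"repeat until failure" part can be sketched as a small loop around any
command. This is an illustration only: run_until_fail is a hypothetical
helper, it runs sequentially rather than in parallel, and the run limit
in the example is arbitrary.

```shell
#!/bin/sh
# Sketch: rerun a command until it fails or a run limit is reached.
# run_until_fail is a hypothetical helper; it only mimics the
# "repeat until failure" part of --stress, not its parallel jobs.

run_until_fail () {
	limit=$1
	shift
	run=1
	while test "$run" -le "$limit"
	do
		if ! "$@" >/dev/null 2>&1
		then
			echo "failed on run $run"
			return 1
		fi
		run=$((run + 1))
	done
	echo "no failure in $limit runs"
}

# Example (from t/ in a built Git tree):
#	run_until_fail 1000 ./t0050-filesystem.sh
```

The run number at which the first failure appears gives a rough idea
of how rare the inconsistency is on a given machine.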

Thanks,
-Stolee


end of thread, other threads:[~2021-11-19 18:30 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-17 15:17 [Question] Unicode weirdness breaking tests on ZFS? Derrick Stolee
2021-11-17 15:41 ` Ævar Arnfjörð Bjarmason
2021-11-17 16:12 ` Torsten Bögershausen
2021-11-17 17:06   ` Torsten Bögershausen
2021-11-17 17:39     ` Torsten Bögershausen
2021-11-17 18:29       ` Derrick Stolee
2021-11-17 18:35         ` Derrick Stolee
2021-11-19 15:44           ` Torsten Bögershausen
2021-11-19 17:03             ` Junio C Hamano
2021-11-19 18:30               ` Derrick Stolee
