git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Bug: Changing folder case with `git mv` crashes on case-insensitive file system
@ 2021-05-03 17:25 Mark Amery
  2021-05-03 22:58 ` brian m. carlson
  2021-05-04 15:19 ` Torsten Bögershausen
  0 siblings, 2 replies; 14+ messages in thread
From: Mark Amery @ 2021-05-03 17:25 UTC (permalink / raw)
  To: git

Attempting to change the case of a folder's name using a command like
`git mv foo FOO` crashes on case-insensitive file systems, like the
default APFS used on Apple Macs.

Here are simple steps to repro this:

    $ mkdir testrepo && cd testrepo && git init
    Initialized empty Git repository in /Users/markamery/testrepo/.git/
    $ mkdir foo && touch foo/bar && git add foo && git commit -m bla
    [master (root-commit) a7e9f5f] bla
    1 file changed, 0 insertions(+), 0 deletions(-)
    create mode 100644 foo/bar
    $ git mv foo FOO
    fatal: renaming 'foo' failed: Invalid argument
    $ echo $?
    128
    $ git status
    On branch master
    nothing to commit, working tree clean

If I create a case-sensitive APFS volume using Disk Utility and try
the commands above on that volume, `git mv foo FOO` works correctly:
it emits no output, exits with a 0 status code, and stages a change
renaming `foo/bar` to `FOO/bar`. However, on my main case-insensitive
volume, `git mv` behaves as shown above: it exits with code 128,
prints an "Invalid argument" error message, and does not stage any
changes.

The command still fails in the same way if you use `git mv --force`
instead of just `git mv`.

Note that previously, `git mv` could not change the case of *file*
names on case-insensitive file systems, until that was fixed in commit
https://github.com/git/git/commit/baa37bff9a845471754d3f47957d58a6ccc30058.
I'm guessing there's a different code path that needs fixing for
changing the case of *folders*.

As far as I can tell, this error has never been reported to the Git
mailing list, but it seems to be encountered frequently;
https://stackoverflow.com/questions/3011625/git-mv-and-only-change-case-of-directory
mentions this bug and has 86000 views.

In case it's relevant, here's my system info as output by `git bugreport`:

    [System Info]
    git version:
    git version 2.31.1
    cpu: x86_64
    no commit associated with this build
    sizeof-long: 8
    sizeof-size_t: 8
    shell-path: /bin/sh
    uname: Darwin 18.7.0 Darwin Kernel Version 18.7.0: Mon Apr 27
20:09:39 PDT 2020; root:xnu-4903.278.35~1/RELEASE_X86_64 x86_64
    compiler info: clang: 11.0.0 (clang-1100.0.33.17)
    libc info: no libc information available
    $SHELL (typically, interactive shell): /bin/bash

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system
  2021-05-03 17:25 Bug: Changing folder case with `git mv` crashes on case-insensitive file system Mark Amery
@ 2021-05-03 22:58 ` brian m. carlson
  2021-05-04  3:46   ` Junio C Hamano
  2021-05-04 15:19 ` Torsten Bögershausen
  1 sibling, 1 reply; 14+ messages in thread
From: brian m. carlson @ 2021-05-03 22:58 UTC (permalink / raw)
  To: Mark Amery; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 2929 bytes --]

On 2021-05-03 at 17:25:43, Mark Amery wrote:
> Attempting to change the case of a folder's name using a command like
> `git mv foo FOO` crashes on case-insensitive file systems, like the
> default APFS used on Apple Macs.
> 
> Here are simple steps to repro this:
> 
>     $ mkdir testrepo && cd testrepo && git init
>     Initialized empty Git repository in /Users/markamery/testrepo/.git/
>     $ mkdir foo && touch foo/bar && git add foo && git commit -m bla
>     [master (root-commit) a7e9f5f] bla
>     1 file changed, 0 insertions(+), 0 deletions(-)
>     create mode 100644 foo/bar
>     $ git mv foo FOO
>     fatal: renaming 'foo' failed: Invalid argument
>     $ echo $?
>     128
>     $ git status
>     On branch master
>     nothing to commit, working tree clean
> 
> If I create a case-sensitive APFS volume using Disk Utility and try
> the commands above on that volume, `git mv foo FOO` works correctly:
> it emits no output, exits with a 0 status code, and stages a change
> renaming `foo/bar` to `FOO/bar`. However, on my main case-insensitive
> volume, `git mv` behaves as shown above: it exits with code 128,
> prints an "Invalid argument" error message, and does not stage any
> changes.

Yeah, this is because your operating system returns EINVAL in this case.
POSIX specifies EINVAL when you're trying to make a directory a
subdirectory of itself.  Which, I mean, I guess is a valid
interpretation here, but it of course makes renaming the path needlessly
difficult.

> The command still fails in the same way if you use `git mv --force`
> instead of just `git mv`.
> 
> Note that previously, `git mv` could not change the case of *file*
> names on case-insensitive file systems, until that was fixed in commit
> https://github.com/git/git/commit/baa37bff9a845471754d3f47957d58a6ccc30058.
> I'm guessing there's a different code path that needs fixing for
> changing the case of *folders*.

My guess is this is because when the argument is a directory, the mode
is set to WORKING_DIRECTORY, and then we check that the mode isn't INDEX
and so rename(2) gets called.  However, I believe there is a -k option
which should make this work, since we ignore the errors.

I suspect part of the problem here is two fold: on macOS we can't
distinguish an attempt to rename the path due to it folding or
canonicalizing to the same thing from a real attempt to move an actual
directory into itself.  The latter would be a problem we'd want to
report, and the former is not.  Unfortunately, detecting this is
difficult because that means we'd have to implement the macOS
canonicalization algorithm in Git and we don't want to do that.

Maybe someone who frequently uses a macOS system that uses a
case-sensitive file system can chime in here, but for now, I think -k
should do what you want.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system
  2021-05-03 22:58 ` brian m. carlson
@ 2021-05-04  3:46   ` Junio C Hamano
  2021-05-04 11:20     ` brian m. carlson
  0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2021-05-04  3:46 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Mark Amery, git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> Yeah, this is because your operating system returns EINVAL in this case.
> POSIX specifies EINVAL when you're trying to make a directory a
> subdirectory of itself.  Which, I mean, I guess is a valid
> interpretation here, but it of course makes renaming the path needlessly
> difficult.
> ...
> I suspect part of the problem here is two fold: on macOS we can't
> distinguish an attempt to rename the path due to it folding or
> canonicalizing to the same thing from a real attempt to move an actual
> directory into itself.  The latter would be a problem we'd want to
> report, and the former is not.  Unfortunately, detecting this is
> difficult because that means we'd have to implement the macOS
> canonicalization algorithm in Git and we don't want to do that.

I agree we'd probably need to resort to macOS specific hack (like we
have NFS or Coda specific hacks), but it may not be too bad.

After seeing EINVAL, we can lstat src 'foo' and dst 'FOO', and
realize that both are directories and have the same st_dev/st_ino,
which should be fairly straightforward, no?

For that, we do not exactly have to depend on any part of macOS-ism;
we do depend on the traditional "within the same device, inum is a
good way to tell if two filesystem entities are the same".

The (mis)design of "git mv a b c d ... DST" that turns the request
into "mv a DST/a && mv b DST/b && ..." too early may make the
fallback behaviour a bit cumbersome to implement (mainly, we need to
strip out the '/foo' part out of the failing dst FOO/foo to get the
real destination directory 'FOO' the user gave us, before checking
with that lstat dance), but since it is an error codepath, we can be
as inefficient as we like, as long as we are correct ;-)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system
  2021-05-04  3:46   ` Junio C Hamano
@ 2021-05-04 11:20     ` brian m. carlson
  2021-05-05 13:51       ` Johannes Schindelin
  0 siblings, 1 reply; 14+ messages in thread
From: brian m. carlson @ 2021-05-04 11:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Mark Amery, git, Johannes Schindelin

[-- Attachment #1: Type: text/plain, Size: 1922 bytes --]

On 2021-05-04 at 03:46:12, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > Yeah, this is because your operating system returns EINVAL in this case.
> > POSIX specifies EINVAL when you're trying to make a directory a
> > subdirectory of itself.  Which, I mean, I guess is a valid
> > interpretation here, but it of course makes renaming the path needlessly
> > difficult.
> > ...
> > I suspect part of the problem here is two fold: on macOS we can't
> > distinguish an attempt to rename the path due to it folding or
> > canonicalizing to the same thing from a real attempt to move an actual
> > directory into itself.  The latter would be a problem we'd want to
> > report, and the former is not.  Unfortunately, detecting this is
> > difficult because that means we'd have to implement the macOS
> > canonicalization algorithm in Git and we don't want to do that.
> 
> I agree we'd probably need to resort to macOS specific hack (like we
> have NFS or Coda specific hacks), but it may not be too bad.
> 
> After seeing EINVAL, we can lstat src 'foo' and dst 'FOO', and
> realize that both are directories and have the same st_dev/st_ino,
> which should be fairly straightforward, no?
> 
> For that, we do not exactly have to depend on any part of macOS-ism;
> we do depend on the traditional "within the same device, inum is a
> good way to tell if two filesystem entities are the same".

Yes, although that won't work on Windows, which I don't believe has the
concept of inodes and almost certainly has the same problem.  CCing
Dscho in case he has some ideas on how we can make this more resilient
there.

In any event, I'm not planning on writing a patch for this since I have
no way to test it, but I'm sure someone who uses macOS could probably
write one reasonably easily.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system
  2021-05-03 17:25 Bug: Changing folder case with `git mv` crashes on case-insensitive file system Mark Amery
  2021-05-03 22:58 ` brian m. carlson
@ 2021-05-04 15:19 ` Torsten Bögershausen
  2021-05-05  0:23   ` Junio C Hamano
  1 sibling, 1 reply; 14+ messages in thread
From: Torsten Bögershausen @ 2021-05-04 15:19 UTC (permalink / raw)
  To: Mark Amery; +Cc: git

On Mon, May 03, 2021 at 06:25:43PM +0100, Mark Amery wrote:
> Attempting to change the case of a folder's name using a command like
> `git mv foo FOO` crashes on case-insensitive file systems, like the
> default APFS used on Apple Macs.
>
> Here are simple steps to repro this:
>
>     $ mkdir testrepo && cd testrepo && git init
>     Initialized empty Git repository in /Users/markamery/testrepo/.git/
>     $ mkdir foo && touch foo/bar && git add foo && git commit -m bla
>     [master (root-commit) a7e9f5f] bla
>     1 file changed, 0 insertions(+), 0 deletions(-)
>     create mode 100644 foo/bar
>     $ git mv foo FOO
>     fatal: renaming 'foo' failed: Invalid argument
>     $ echo $?
>     128
>     $ git status
>     On branch master
>     nothing to commit, working tree clean
>
> If I create a case-sensitive APFS volume using Disk Utility and try
> the commands above on that volume, `git mv foo FOO` works correctly:
> it emits no output, exits with a 0 status code, and stages a change
> renaming `foo/bar` to `FOO/bar`. However, on my main case-insensitive
> volume, `git mv` behaves as shown above: it exits with code 128,
> prints an "Invalid argument" error message, and does not stage any
> changes.
>
> The command still fails in the same way if you use `git mv --force`
> instead of just `git mv`.
>
> Note that previously, `git mv` could not change the case of *file*
> names on case-insensitive file systems, until that was fixed in commit
> https://github.com/git/git/commit/baa37bff9a845471754d3f47957d58a6ccc30058.
> I'm guessing there's a different code path that needs fixing for
> changing the case of *folders*.
>
> As far as I can tell, this error has never been reported to the Git
> mailing list, but it seems to be encountered frequently;
> https://stackoverflow.com/questions/3011625/git-mv-and-only-change-case-of-directory
> mentions this bug and has 86000 views.
>
> In case it's relevant, here's my system info as output by `git bugreport`:
>
>     [System Info]
>     git version:
>     git version 2.31.1
>     cpu: x86_64
>     no commit associated with this build
>     sizeof-long: 8
>     sizeof-size_t: 8
>     shell-path: /bin/sh
>     uname: Darwin 18.7.0 Darwin Kernel Version 18.7.0: Mon Apr 27
> 20:09:39 PDT 2020; root:xnu-4903.278.35~1/RELEASE_X86_64 x86_64
>     compiler info: clang: 11.0.0 (clang-1100.0.33.17)
>     libc info: no libc information available
>     $SHELL (typically, interactive shell): /bin/bash


Thanks for reporting - that's always good.

To my undestanding we try to rename
foo/ into FOO/.
But because FOO/ already "exists" as directory,
Git tries to move foo/ into FOO/foo, which fails.

And no, the problem is probably not restricted to MacOs,
Windows and all case-insenstive file systems should show
the same, but I haven't tested yet, so it's more a suspicion.

The following diff allows to move foo/ into FOO/
If someone wants to make a patch out if, that would be good.



diff --git a/builtin/mv.c b/builtin/mv.c
index 3fccdcb6452..fbf184bcfa9 100644
--- a/builtin/mv.c
+++ b/builtin/mv.c
@@ -163,8 +163,10 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
 		destination = internal_prefix_pathspec(dest_path[0], argv, argc, DUP_BASENAME);
 	else if (!lstat(dest_path[0], &st) &&
 			S_ISDIR(st.st_mode)) {
-		dest_path[0] = add_slash(dest_path[0]);
-		destination = internal_prefix_pathspec(dest_path[0], argv, argc, DUP_BASENAME);
+		if (!ignore_case || strcasecmp(source[0], dest_path[0])) {
+			dest_path[0] = add_slash(dest_path[0]);
+			destination = internal_prefix_pathspec(dest_path[0], argv, argc, DUP_BASENAME);
+		}
 	} else {
 		if (argc != 1)
 			die(_("destination '%s' is not a directory"), dest_path[0]);
@@ -187,9 +189,12 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
 				(dst[length] == 0 || dst[length] == '/')) {
 			bad = _("can not move directory into itself");
 		} else if ((src_is_dir = S_ISDIR(st.st_mode))
-				&& lstat(dst, &st) == 0)
-			bad = _("cannot move directory over file");
-		else if (src_is_dir) {
+			   && lstat(dst, &st) == 0) {
+			if (!ignore_case || strcasecmp(src, dst)){
+				bad = _("cannot move directory over file");
+			}
+		}
+		if (!bad && src_is_dir) {
 			int first = cache_name_pos(src, length), last;

 			if (first >= 0)
@@ -277,7 +282,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
 		if (mode != INDEX && rename(src, dst) < 0) {
 			if (ignore_errors)
 				continue;
-			die_errno(_("renaming '%s' failed"), src);
+			die_errno(_("renaming '%s' into '%s' failed"), src, dst);
 		}
 		if (submodule_gitfile[i]) {
 			if (!update_path_in_gitmodules(src, dst))

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system
  2021-05-04 15:19 ` Torsten Bögershausen
@ 2021-05-05  0:23   ` Junio C Hamano
  2021-05-05  2:12     ` brian m. carlson
  2021-05-06  4:34     ` Torsten Bögershausen
  0 siblings, 2 replies; 14+ messages in thread
From: Junio C Hamano @ 2021-05-05  0:23 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Mark Amery, git

Torsten Bögershausen <tboegi@web.de> writes:

> To my undestanding we try to rename
> foo/ into FOO/.
> But because FOO/ already "exists" as directory,
> Git tries to move foo/ into FOO/foo, which fails.
>
> And no, the problem is probably not restricted to MacOs,
> Windows and all case-insenstive file systems should show
> the same, but I haven't tested yet, so it's more a suspicion.
>
> The following diff allows to move foo/ into FOO/
> If someone wants to make a patch out if, that would be good.

Is strcasecmp() sufficient for macOS whose filesystem has not just
case insensitivity but UTF-8 normalization issues?



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system
  2021-05-05  0:23   ` Junio C Hamano
@ 2021-05-05  2:12     ` brian m. carlson
  2021-05-06  4:34     ` Torsten Bögershausen
  1 sibling, 0 replies; 14+ messages in thread
From: brian m. carlson @ 2021-05-05  2:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Torsten Bögershausen, Mark Amery, git

[-- Attachment #1: Type: text/plain, Size: 1389 bytes --]

On 2021-05-05 at 00:23:05, Junio C Hamano wrote:
> Torsten Bögershausen <tboegi@web.de> writes:
> 
> > To my undestanding we try to rename
> > foo/ into FOO/.
> > But because FOO/ already "exists" as directory,
> > Git tries to move foo/ into FOO/foo, which fails.
> >
> > And no, the problem is probably not restricted to MacOs,
> > Windows and all case-insenstive file systems should show
> > the same, but I haven't tested yet, so it's more a suspicion.
> >
> > The following diff allows to move foo/ into FOO/
> > If someone wants to make a patch out if, that would be good.
> 
> Is strcasecmp() sufficient for macOS whose filesystem has not just
> case insensitivity but UTF-8 normalization issues?

strcasecmp is locale sensitive, so I would say it is not a good choice
here.  Notably, that means we have differing behavior with "i" and "I"
if the locale is Turkish versus English.  Only one of those behaviors
will be a correct match for what the kernel does, since it is not locale
sensitive.  Moreover, on Windows, the version of Unicode used for case
mapping is written into an NTFS file system when it's created, so a
fresh install may have different behavior from an upgrade, and
strcasecmp won't honor that.

And, as you pointed out, it doesn't handle Unicode normalization at all.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system
  2021-05-04 11:20     ` brian m. carlson
@ 2021-05-05 13:51       ` Johannes Schindelin
  2021-05-06  0:38         ` Junio C Hamano
  0 siblings, 1 reply; 14+ messages in thread
From: Johannes Schindelin @ 2021-05-05 13:51 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Junio C Hamano, Mark Amery, git

Hi,

On Tue, 4 May 2021, brian m. carlson wrote:

> On 2021-05-04 at 03:46:12, Junio C Hamano wrote:
> > "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> >
> > > Yeah, this is because your operating system returns EINVAL in this case.
> > > POSIX specifies EINVAL when you're trying to make a directory a
> > > subdirectory of itself.  Which, I mean, I guess is a valid
> > > interpretation here, but it of course makes renaming the path needlessly
> > > difficult.
> > > ...
> > > I suspect part of the problem here is two fold: on macOS we can't
> > > distinguish an attempt to rename the path due to it folding or
> > > canonicalizing to the same thing from a real attempt to move an actual
> > > directory into itself.  The latter would be a problem we'd want to
> > > report, and the former is not.  Unfortunately, detecting this is
> > > difficult because that means we'd have to implement the macOS
> > > canonicalization algorithm in Git and we don't want to do that.
> >
> > I agree we'd probably need to resort to macOS specific hack (like we
> > have NFS or Coda specific hacks), but it may not be too bad.
> >
> > After seeing EINVAL, we can lstat src 'foo' and dst 'FOO', and
> > realize that both are directories and have the same st_dev/st_ino,
> > which should be fairly straightforward, no?
> >
> > For that, we do not exactly have to depend on any part of macOS-ism;
> > we do depend on the traditional "within the same device, inum is a
> > good way to tell if two filesystem entities are the same".
>
> Yes, although that won't work on Windows, which I don't believe has the
> concept of inodes and almost certainly has the same problem.  CCing
> Dscho in case he has some ideas on how we can make this more resilient
> there.

Windows does not really have inodes. But it has what it calls "file IDs".
This concept is pretty much what you expect on inodes, except on that
still-used file system called FAT (for full details, see
https://docs.microsoft.com/en-us/windows/win32/api/fileapi/ns-fileapi-by_handle_file_information):

	In the FAT file system, the file ID is generated from the first
	cluster of the containing directory and the byte offset within the
	directory of the entry for the file. Some defragmentation products
	change this byte offset. (Windows in-box defragmentation does
	not.) Thus, a FAT file ID can change over time. Renaming a file in
	the FAT file system can also change the file ID, but only if the
	new file name is longer than the old one.

In this instance, because we're not actually renaming (yet), the file ID
would probably be okay. But in general, we should assume that we do not
have inodes on Windows.

There is another complication: it is not actually cheap to get to that
file ID. For performance reasons, we introduced that (optional) FSCache
feature in Git for Windows where cache the `lstat()` results because the
assumption that `lstat()` is fast really only holds true on Linux.

In fact, what we do is to use `FindFirstFile()`/`FindNextFile()` to
enumerate an entire directory's worth of `lstat()` data, because funnily
enough, it is a lot faster when we need an entire directory's worth of
`lstat()` data anyway (calling `GetFileAttributes()` for individual files
is of course faster, but not for a dozen files or so).

But even `GetFileAttributes()` won't get you that "file ID". You have to
create a file handle (via `CreateFile()`, which is *expensive*) and then
call `GetFileInformationByHandle()`.

Now, way too much of Git's source code still pretends that `lstat()` is
just this fast operation and we can do it left and right and not say what
we _actually_ want to know. That function is called when we need only
parts of the `lstat()` data. Sometimes it is even used to determine
whether a file or directory is present. But since Git does not have proper
abstraction, `mingw_lstat()` _still_ has to fill in all the information.

So _if_ we need that file ID information, I would be very much in favor of
introducing a proper abstraction, where differentiate between the
intention (think `get_inode(const char *path)`) from the
platform-dependent implementation detail (think `lstat()`, `CreateFile()`
and `GetFileInformationByHandle()`).

Ciao,
Dscho

> In any event, I'm not planning on writing a patch for this since I have
> no way to test it, but I'm sure someone who uses macOS could probably
> write one reasonably easily.
> --
> brian m. carlson (he/him or they/them)
> Houston, Texas, US
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system
  2021-05-05 13:51       ` Johannes Schindelin
@ 2021-05-06  0:38         ` Junio C Hamano
  0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2021-05-06  0:38 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: brian m. carlson, Mark Amery, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> So _if_ we need that file ID information, I would be very much in favor of
> introducing a proper abstraction, where differentiate between the
> intention (think `get_inode(const char *path)`) from the
> platform-dependent implementation detail (think `lstat()`, `CreateFile()`
> and `GetFileInformationByHandle()`).

I agree in principle.  Essentially, we need to

 (1) examine all calls to lstat(2) we make in our codebase, and find
     out what members of "stat" are really used out of the result
     for each callsite.  This will be different from caller to
     caller (some callers may want only ino, other callers may want
     ino, size, and mtime, etc.), and we would learn that there are
     N patterns.

 (2) write N abstracted helper functions (or a single helper that
     takes const char *path, struct stat *, and an enum to tell
     which one of N patterns this call is about).

 (3) replace each lstat(2) call with one of these N abstract helper
     functions.

get_inode() might be one of these N functions, but what is important
is that the current callers that want ino and something else should
not be penalized by making two separte calls get_inode() and
get_other_things(), which, when done naively, would result in two
lstat(2) calls.




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system
  2021-05-05  0:23   ` Junio C Hamano
  2021-05-05  2:12     ` brian m. carlson
@ 2021-05-06  4:34     ` Torsten Bögershausen
  2021-05-06  9:12       ` Mark Amery
  1 sibling, 1 reply; 14+ messages in thread
From: Torsten Bögershausen @ 2021-05-06  4:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Mark Amery, git

On Wed, May 05, 2021 at 09:23:05AM +0900, Junio C Hamano wrote:
> Torsten Bögershausen <tboegi@web.de> writes:
>
> > To my undestanding we try to rename
> > foo/ into FOO/.
> > But because FOO/ already "exists" as directory,
> > Git tries to move foo/ into FOO/foo, which fails.
> >
> > And no, the problem is probably not restricted to MacOs,
> > Windows and all case-insenstive file systems should show
> > the same, but I haven't tested yet, so it's more a suspicion.
> >
> > The following diff allows to move foo/ into FOO/
> > If someone wants to make a patch out if, that would be good.
>
> Is strcasecmp() sufficient for macOS whose filesystem has not just
> case insensitivity but UTF-8 normalization issues?
>

Strictly speaking: no.

The Git code doesn't handle UTF-8 uppper/lower case at all:
git mv bar.txt BAR.TXT works because strcasecmp() is catching it.

git mv bär.txt BÄR.TXT needs the long way:
git mv bär.txt baer.txt && git mv baer.txt BÄR.TXT

We have been restricting the case-change-is-allowed to ASCII filenames
all the time.
There is no information, which code points map onto each other in Git,
since this is all file system dependent.
NTFS has one way, HFS+, APFS another, VFAT a third one, and if I expose
ext4 via SAMBA we probably have another one.
Not mentioniong that ext4 can be use case-insensitve on later Linux kernels,
which sticks to unicode.
Or Git repos running on machines using ISO-8859-1, those should be rare these
days.

That said, people are renaming files in ASCII only and are happy,
and in that sense renaming directories in ASCII can be supported
without major hassle.

And the inode approach mentioned as well:
This could go on top of strcasecmp() to cover non-ASCII filenames
or other oddities, if someone implements it.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system
  2021-05-06  4:34     ` Torsten Bögershausen
@ 2021-05-06  9:12       ` Mark Amery
  2021-05-06 13:11         ` Bagas Sanjaya
                           ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Mark Amery @ 2021-05-06  9:12 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Junio C Hamano, git

So, I'm just a dumb Git user who doesn't even write C, so much of this
discussion is over my head, but I have a few thoughts that may be
helpful:

• The mv utility on Mac is capable of doing `mv bär.txt bÄr.txt` just
fine. Maybe `git mv` can learn something from whatever `mv` does?

• On a case-insensitive file system, `git mv somedir sOMEdir` is a
rename. But on a case-sensitive file system, it might NOT be a rename;
it might be the case that `somedir` and `sOMEdir` both exist and that
the command should put `somedir` inside `sOMEdir`. I mention this
because I can imagine some naive attempts at fixing the original bug
by doing a case-insensitive comparison of the two names ending up
breaking this behaviour on case-sensitive file systems by wrongly
treating such a command as a rename. It's probably worth having a test
that this scenario gets handled cleanly on case-sensitive file
systems? (I haven't checked whether Torsten's proposed diff falls into
this trap or not.)

• Above, Torsten mentions that there are filesystem-specific rules
about what names are equal to each other that Git can't easily handle,
because they go beyond just ASCII case changes. In that case, maybe
the right solution is to always defer the question to the filesystem
rather than Git trying to figure out the answer "in its head"?

  That is: first check the inode or file ID of the src and dst passed
to `git mv`. If they are different and the second one is a folder,
move src inside the existing folder. If either they are the same or
the second one is not a folder, then do a rename.

  It seems to me that this approach automatically handles stuff like
`git mv bär.txt bÄr.txt` plus any other rules about names being equal
(like two different sequences of code points that both express "à"),
all without Git ever needing to explicitly check whether two names are
case-insensitively equal. Am I missing something?

Sorry if any of the above is dumb or if I'm reiterating things others
have already said without realising it.

On Thu, May 6, 2021 at 5:34 AM Torsten Bögershausen <tboegi@web.de> wrote:
>
> On Wed, May 05, 2021 at 09:23:05AM +0900, Junio C Hamano wrote:
> > Torsten Bögershausen <tboegi@web.de> writes:
> >
> > > To my undestanding we try to rename
> > > foo/ into FOO/.
> > > But because FOO/ already "exists" as directory,
> > > Git tries to move foo/ into FOO/foo, which fails.
> > >
> > > And no, the problem is probably not restricted to MacOs,
> > > Windows and all case-insenstive file systems should show
> > > the same, but I haven't tested yet, so it's more a suspicion.
> > >
> > > The following diff allows to move foo/ into FOO/
> > > If someone wants to make a patch out if, that would be good.
> >
> > Is strcasecmp() sufficient for macOS whose filesystem has not just
> > case insensitivity but UTF-8 normalization issues?
> >
>
> Strictly speaking: no.
>
> The Git code doesn't handle UTF-8 uppper/lower case at all:
> git mv bar.txt BAR.TXT works because strcasecmp() is catching it.
>
> git mv bär.txt BÄR.TXT needs the long way:
> git mv bär.txt baer.txt && git mv baer.txt BÄR.TXT
>
> We have been restricting the case-change-is-allowed to ASCII filenames
> all the time.
> There is no information, which code points map onto each other in Git,
> since this is all file system dependent.
> NTFS has one way, HFS+, APFS another, VFAT a third one, and if I expose
> ext4 via SAMBA we probably have another one.
> Not mentioniong that ext4 can be use case-insensitve on later Linux kernels,
> which sticks to unicode.
> Or Git repos running on machines using ISO-8859-1, those should be rare these
> days.
>
> That said, people are renaming files in ASCII only and are happy,
> and in that sense renaming directories in ASCII can be supported
> without major hassle.
>
> And the inode approach mentioned as well:
> This could go on top of strcasecmp() to cover non-ASCII filenames
> or other oddities, if someone implements it.
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system
  2021-05-06  9:12       ` Mark Amery
@ 2021-05-06 13:11         ` Bagas Sanjaya
  2021-05-06 14:53         ` Torsten Bögershausen
  2021-05-06 21:03         ` Junio C Hamano
  2 siblings, 0 replies; 14+ messages in thread
From: Bagas Sanjaya @ 2021-05-06 13:11 UTC (permalink / raw)
  To: Mark Amery, Torsten Bögershausen; +Cc: Junio C Hamano, git

On 06/05/21 16.12, Mark Amery wrote:

> • Above, Torsten mentions that there are filesystem-specific rules
> about what names are equal to each other that Git can't easily handle,
> because they go beyond just ASCII case changes. In that case, maybe
> the right solution is to always defer the question to the filesystem
> rather than Git trying to figure out the answer "in its head"?
> 
>    That is: first check the inode or file ID of the src and dst passed
> to `git mv`. If they are different and the second one is a folder,
> move src inside the existing folder. If either they are the same or
> the second one is not a folder, then do a rename.
> 
>    It seems to me that this approach automatically handles stuff like
> `git mv bär.txt bÄr.txt` plus any other rules about names being equal
> (like two different sequences of code points that both express "à"),
> all without Git ever needing to explicitly check whether two names are
> case-insensitively equal. Am I missing something?

On case when src and dst have different inodes and dst is a file, rather
than a directory, mv will overwrite dst with contents of src (on Linux).

-- 
An old man doll... just what I always wanted! - Clara

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system
  2021-05-06  9:12       ` Mark Amery
  2021-05-06 13:11         ` Bagas Sanjaya
@ 2021-05-06 14:53         ` Torsten Bögershausen
  2021-05-06 21:03         ` Junio C Hamano
  2 siblings, 0 replies; 14+ messages in thread
From: Torsten Bögershausen @ 2021-05-06 14:53 UTC (permalink / raw)
  To: Mark Amery; +Cc: Junio C Hamano, git

On Thu, May 06, 2021 at 10:12:40AM +0100, Mark Amery wrote:
> So, I'm just a dumb Git user who doesn't even write C, so much of this
> discussion is over my head, but I have a few thoughts that may be
> helpful:
>
> • The mv utility on Mac is capable of doing `mv bär.txt bÄr.txt` just
> fine. Maybe `git mv` can learn something from whatever `mv` does?

Yes, Git can do the same.
The thing is, that Git avoids to overwrite of existing files.
Just to be nice.
In that sense, if the destination exists, Git refuses the mv,
unless you use git mv -f
Having said that,
git mv bär.txt bÄr.txt will not work (without the -f)

git mv bear.txt BEAR.txt does work,

>
> • On a case-insensitive file system, `git mv somedir sOMEdir` is a
> rename. But on a case-sensitive file system, it might NOT be a rename;
> it might be the case that `somedir` and `sOMEdir` both exist and that
> the command should put `somedir` inside `sOMEdir`. I mention this
> because I can imagine some naive attempts at fixing the original bug
> by doing a case-insensitive comparison of the two names ending up
> breaking this behaviour on case-sensitive file systems by wrongly
> treating such a command as a rename. It's probably worth having a test
> that this scenario gets handled cleanly on case-sensitive file
> systems? (I haven't checked whether Torsten's proposed diff falls into
> this trap or not.)

Tests are needed - I should have started with those.
But I didn't intend to send a patch (yet), just sharing
ideas and knowledge. Which may enable someone to write a patch.

>
> • Above, Torsten mentions that there are filesystem-specific rules
> about what names are equal to each other that Git can't easily handle,
> because they go beyond just ASCII case changes. In that case, maybe
> the right solution is to always defer the question to the filesystem
> rather than Git trying to figure out the answer "in its head"?

There are different trade-offs:
So far I am only aware of people asking for the
git mv bear.txt BEAR.txt rename.
Just because they are all SW developers ? I don't know.
And just because SW developers are developping Git,
the case-insensitive string compare is good enough,
it is working for them/us.
So things are as they are.

>
>   That is: first check the inode or file ID of the src and dst passed
> to `git mv`. If they are different and the second one is a folder,
> move src inside the existing folder. If either they are the same or
> the second one is not a folder, then do a rename.

Yes. In short: patches are welcome.
In long: inodes don't work on Windows (without a major effort)

>
>   It seems to me that this approach automatically handles stuff like
> `git mv bär.txt bÄr.txt` plus any other rules about names being equal
> (like two different sequences of code points that both express "à"),
> all without Git ever needing to explicitly check whether two names are
> case-insensitively equal. Am I missing something?

That could be a solution. There may are situations/configurations,
where inodes don't work:
What happens if a Windows server exports a file system to MacOs ?
To Linux ?
Do we have working inodes ?
What about other networking combinations ?
Our code should handle them well as well.

>
> Sorry if any of the above is dumb or if I'm reiterating things others
> have already said without realising it.

No problem. Actually I realized that we used top-posting here,
So I remove the reset to make it more readable.

[]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system
  2021-05-06  9:12       ` Mark Amery
  2021-05-06 13:11         ` Bagas Sanjaya
  2021-05-06 14:53         ` Torsten Bögershausen
@ 2021-05-06 21:03         ` Junio C Hamano
  2 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2021-05-06 21:03 UTC (permalink / raw)
  To: Mark Amery; +Cc: Torsten Bögershausen, git

Mark Amery <markrobertamery@gmail.com> writes:

>   That is: first check the inode or file ID of the src and dst passed
> to `git mv`. If they are different and the second one is a folder,
> move src inside the existing folder. If either they are the same or
> the second one is not a folder, then do a rename.
>
>   It seems to me that this approach automatically handles stuff like
> `git mv bär.txt bÄr.txt` plus any other rules about names being equal
> (like two different sequences of code points that both express "à"),
> all without Git ever needing to explicitly check whether two names are
> case-insensitively equal. Am I missing something?
>
> Sorry if any of the above is dumb or if I'm reiterating things others
> have already said without realising it.

There is nothing dumb about the idea.  I think that is essentially
what brian and I outlined in the earlier exchange, except that we
need to use something other than inum on Windows to decide if two
pathnames refer to the same filesystem entity, and the approach we
suggested was to use the inum-based check _after_ seeing the current
code fail with EINVAL in the error codepath, instead of doing so
upfront.

Thanks.


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-05-06 21:03 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-03 17:25 Bug: Changing folder case with `git mv` crashes on case-insensitive file system Mark Amery
2021-05-03 22:58 ` brian m. carlson
2021-05-04  3:46   ` Junio C Hamano
2021-05-04 11:20     ` brian m. carlson
2021-05-05 13:51       ` Johannes Schindelin
2021-05-06  0:38         ` Junio C Hamano
2021-05-04 15:19 ` Torsten Bögershausen
2021-05-05  0:23   ` Junio C Hamano
2021-05-05  2:12     ` brian m. carlson
2021-05-06  4:34     ` Torsten Bögershausen
2021-05-06  9:12       ` Mark Amery
2021-05-06 13:11         ` Bagas Sanjaya
2021-05-06 14:53         ` Torsten Bögershausen
2021-05-06 21:03         ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).