* Bug: Changing folder case with `git mv` crashes on case-insensitive file system @ 2021-05-03 17:25 Mark Amery 2021-05-03 22:58 ` brian m. carlson 2021-05-04 15:19 ` Torsten Bögershausen 0 siblings, 2 replies; 14+ messages in thread From: Mark Amery @ 2021-05-03 17:25 UTC (permalink / raw) To: git Attempting to change the case of a folder's name using a command like `git mv foo FOO` crashes on case-insensitive file systems, like the default APFS used on Apple Macs. Here are simple steps to repro this: $ mkdir testrepo && cd testrepo && git init Initialized empty Git repository in /Users/markamery/testrepo/.git/ $ mkdir foo && touch foo/bar && git add foo && git commit -m bla [master (root-commit) a7e9f5f] bla 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 foo/bar $ git mv foo FOO fatal: renaming 'foo' failed: Invalid argument $ echo $? 128 $ git status On branch master nothing to commit, working tree clean If I create a case-sensitive APFS volume using Disk Utility and try the commands above on that volume, `git mv foo FOO` works correctly: it emits no output, exits with a 0 status code, and stages a change renaming `foo/bar` to `FOO/bar`. However, on my main case-insensitive volume, `git mv` behaves as shown above: it exits with code 128, prints an "Invalid argument" error message, and does not stage any changes. The command still fails in the same way if you use `git mv --force` instead of just `git mv`. Note that previously, `git mv` could not change the case of *file* names on case-insensitive file systems, until that was fixed in commit https://github.com/git/git/commit/baa37bff9a845471754d3f47957d58a6ccc30058. I'm guessing there's a different code path that needs fixing for changing the case of *folders*. As far as I can tell, this error has never been reported to the Git mailing list, but it seems to be encountered frequently; https://stackoverflow.com/questions/3011625/git-mv-and-only-change-case-of-directory mentions this bug and has 86000 views. In case it's relevant, here's my system info as output by `git bugreport`: [System Info] git version: git version 2.31.1 cpu: x86_64 no commit associated with this build sizeof-long: 8 sizeof-size_t: 8 shell-path: /bin/sh uname: Darwin 18.7.0 Darwin Kernel Version 18.7.0: Mon Apr 27 20:09:39 PDT 2020; root:xnu-4903.278.35~1/RELEASE_X86_64 x86_64 compiler info: clang: 11.0.0 (clang-1100.0.33.17) libc info: no libc information available $SHELL (typically, interactive shell): /bin/bash ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system 2021-05-03 17:25 Bug: Changing folder case with `git mv` crashes on case-insensitive file system Mark Amery @ 2021-05-03 22:58 ` brian m. carlson 2021-05-04 3:46 ` Junio C Hamano 2021-05-04 15:19 ` Torsten Bögershausen 1 sibling, 1 reply; 14+ messages in thread From: brian m. carlson @ 2021-05-03 22:58 UTC (permalink / raw) To: Mark Amery; +Cc: git [-- Attachment #1: Type: text/plain, Size: 2929 bytes --] On 2021-05-03 at 17:25:43, Mark Amery wrote: > Attempting to change the case of a folder's name using a command like > `git mv foo FOO` crashes on case-insensitive file systems, like the > default APFS used on Apple Macs. > > Here are simple steps to repro this: > > $ mkdir testrepo && cd testrepo && git init > Initialized empty Git repository in /Users/markamery/testrepo/.git/ > $ mkdir foo && touch foo/bar && git add foo && git commit -m bla > [master (root-commit) a7e9f5f] bla > 1 file changed, 0 insertions(+), 0 deletions(-) > create mode 100644 foo/bar > $ git mv foo FOO > fatal: renaming 'foo' failed: Invalid argument > $ echo $? > 128 > $ git status > On branch master > nothing to commit, working tree clean > > If I create a case-sensitive APFS volume using Disk Utility and try > the commands above on that volume, `git mv foo FOO` works correctly: > it emits no output, exits with a 0 status code, and stages a change > renaming `foo/bar` to `FOO/bar`. However, on my main case-insensitive > volume, `git mv` behaves as shown above: it exits with code 128, > prints an "Invalid argument" error message, and does not stage any > changes. Yeah, this is because your operating system returns EINVAL in this case. POSIX specifies EINVAL when you're trying to make a directory a subdirectory of itself. Which, I mean, I guess is a valid interpretation here, but it of course makes renaming the path needlessly difficult. > The command still fails in the same way if you use `git mv --force` > instead of just `git mv`. > > Note that previously, `git mv` could not change the case of *file* > names on case-insensitive file systems, until that was fixed in commit > https://github.com/git/git/commit/baa37bff9a845471754d3f47957d58a6ccc30058. > I'm guessing there's a different code path that needs fixing for > changing the case of *folders*. My guess is this is because when the argument is a directory, the mode is set to WORKING_DIRECTORY, and then we check that the mode isn't INDEX and so rename(2) gets called. However, I believe there is a -k option which should make this work, since we ignore the errors. I suspect part of the problem here is two fold: on macOS we can't distinguish an attempt to rename the path due to it folding or canonicalizing to the same thing from a real attempt to move an actual directory into itself. The latter would be a problem we'd want to report, and the former is not. Unfortunately, detecting this is difficult because that means we'd have to implement the macOS canonicalization algorithm in Git and we don't want to do that. Maybe someone who frequently uses a macOS system that uses a case-sensitive file system can chime in here, but for now, I think -k should do what you want. -- brian m. carlson (he/him or they/them) Houston, Texas, US [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 263 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system 2021-05-03 22:58 ` brian m. carlson @ 2021-05-04 3:46 ` Junio C Hamano 2021-05-04 11:20 ` brian m. carlson 0 siblings, 1 reply; 14+ messages in thread From: Junio C Hamano @ 2021-05-04 3:46 UTC (permalink / raw) To: brian m. carlson; +Cc: Mark Amery, git "brian m. carlson" <sandals@crustytoothpaste.net> writes: > Yeah, this is because your operating system returns EINVAL in this case. > POSIX specifies EINVAL when you're trying to make a directory a > subdirectory of itself. Which, I mean, I guess is a valid > interpretation here, but it of course makes renaming the path needlessly > difficult. > ... > I suspect part of the problem here is two fold: on macOS we can't > distinguish an attempt to rename the path due to it folding or > canonicalizing to the same thing from a real attempt to move an actual > directory into itself. The latter would be a problem we'd want to > report, and the former is not. Unfortunately, detecting this is > difficult because that means we'd have to implement the macOS > canonicalization algorithm in Git and we don't want to do that. I agree we'd probably need to resort to macOS specific hack (like we have NFS or Coda specific hacks), but it may not be too bad. After seeing EINVAL, we can lstat src 'foo' and dst 'FOO', and realize that both are directories and have the same st_dev/st_ino, which should be fairly straightforward, no? For that, we do not exactly have to depend on any part of macOS-ism; we do depend on the traditional "within the same device, inum is a good way to tell if two filesystem entities are the same". The (mis)design of "git mv a b c d ... DST" that turns the request into "mv a DST/a && mv b DST/b && ..." too early may make the fallback behaviour a bit cumbersome to implement (mainly, we need to strip out the '/foo' part out of the failing dst FOO/foo to get the real destination directory 'FOO' the user gave us, before checking with that lstat dance), but since it is an error codepath, we can be as inefficient as we like, as long as we are correct ;-) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system 2021-05-04 3:46 ` Junio C Hamano @ 2021-05-04 11:20 ` brian m. carlson 2021-05-05 13:51 ` Johannes Schindelin 0 siblings, 1 reply; 14+ messages in thread From: brian m. carlson @ 2021-05-04 11:20 UTC (permalink / raw) To: Junio C Hamano; +Cc: Mark Amery, git, Johannes Schindelin [-- Attachment #1: Type: text/plain, Size: 1922 bytes --] On 2021-05-04 at 03:46:12, Junio C Hamano wrote: > "brian m. carlson" <sandals@crustytoothpaste.net> writes: > > > Yeah, this is because your operating system returns EINVAL in this case. > > POSIX specifies EINVAL when you're trying to make a directory a > > subdirectory of itself. Which, I mean, I guess is a valid > > interpretation here, but it of course makes renaming the path needlessly > > difficult. > > ... > > I suspect part of the problem here is two fold: on macOS we can't > > distinguish an attempt to rename the path due to it folding or > > canonicalizing to the same thing from a real attempt to move an actual > > directory into itself. The latter would be a problem we'd want to > > report, and the former is not. Unfortunately, detecting this is > > difficult because that means we'd have to implement the macOS > > canonicalization algorithm in Git and we don't want to do that. > > I agree we'd probably need to resort to macOS specific hack (like we > have NFS or Coda specific hacks), but it may not be too bad. > > After seeing EINVAL, we can lstat src 'foo' and dst 'FOO', and > realize that both are directories and have the same st_dev/st_ino, > which should be fairly straightforward, no? > > For that, we do not exactly have to depend on any part of macOS-ism; > we do depend on the traditional "within the same device, inum is a > good way to tell if two filesystem entities are the same". Yes, although that won't work on Windows, which I don't believe has the concept of inodes and almost certainly has the same problem. CCing Dscho in case he has some ideas on how we can make this more resilient there. In any event, I'm not planning on writing a patch for this since I have no way to test it, but I'm sure someone who uses macOS could probably write one reasonably easily. -- brian m. carlson (he/him or they/them) Houston, Texas, US [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 263 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system 2021-05-04 11:20 ` brian m. carlson @ 2021-05-05 13:51 ` Johannes Schindelin 2021-05-06 0:38 ` Junio C Hamano 0 siblings, 1 reply; 14+ messages in thread From: Johannes Schindelin @ 2021-05-05 13:51 UTC (permalink / raw) To: brian m. carlson; +Cc: Junio C Hamano, Mark Amery, git Hi, On Tue, 4 May 2021, brian m. carlson wrote: > On 2021-05-04 at 03:46:12, Junio C Hamano wrote: > > "brian m. carlson" <sandals@crustytoothpaste.net> writes: > > > > > Yeah, this is because your operating system returns EINVAL in this case. > > > POSIX specifies EINVAL when you're trying to make a directory a > > > subdirectory of itself. Which, I mean, I guess is a valid > > > interpretation here, but it of course makes renaming the path needlessly > > > difficult. > > > ... > > > I suspect part of the problem here is two fold: on macOS we can't > > > distinguish an attempt to rename the path due to it folding or > > > canonicalizing to the same thing from a real attempt to move an actual > > > directory into itself. The latter would be a problem we'd want to > > > report, and the former is not. Unfortunately, detecting this is > > > difficult because that means we'd have to implement the macOS > > > canonicalization algorithm in Git and we don't want to do that. > > > > I agree we'd probably need to resort to macOS specific hack (like we > > have NFS or Coda specific hacks), but it may not be too bad. > > > > After seeing EINVAL, we can lstat src 'foo' and dst 'FOO', and > > realize that both are directories and have the same st_dev/st_ino, > > which should be fairly straightforward, no? > > > > For that, we do not exactly have to depend on any part of macOS-ism; > > we do depend on the traditional "within the same device, inum is a > > good way to tell if two filesystem entities are the same". > > Yes, although that won't work on Windows, which I don't believe has the > concept of inodes and almost certainly has the same problem. CCing > Dscho in case he has some ideas on how we can make this more resilient > there. Windows does not really have inodes. But it has what it calls "file IDs". This concept is pretty much what you expect on inodes, except on that still-used file system called FAT (for full details, see https://docs.microsoft.com/en-us/windows/win32/api/fileapi/ns-fileapi-by_handle_file_information): In the FAT file system, the file ID is generated from the first cluster of the containing directory and the byte offset within the directory of the entry for the file. Some defragmentation products change this byte offset. (Windows in-box defragmentation does not.) Thus, a FAT file ID can change over time. Renaming a file in the FAT file system can also change the file ID, but only if the new file name is longer than the old one. In this instance, because we're not actually renaming (yet), the file ID would probably be okay. But in general, we should assume that we do not have inodes on Windows. There is another complication: it is not actually cheap to get to that file ID. For performance reasons, we introduced that (optional) FSCache feature in Git for Windows where cache the `lstat()` results because the assumption that `lstat()` is fast really only holds true on Linux. In fact, what we do is to use `FindFirstFile()`/`FindNextFile()` to enumerate an entire directory's worth of `lstat()` data, because funnily enough, it is a lot faster when we need an entire directory's worth of `lstat()` data anyway (calling `GetFileAttributes()` for individual files is of course faster, but not for a dozen files or so). But even `GetFileAttributes()` won't get you that "file ID". You have to create a file handle (via `CreateFile()`, which is *expensive*) and then call `GetFileInformationByHandle()`. Now, way too much of Git's source code still pretends that `lstat()` is just this fast operation and we can do it left and right and not say what we _actually_ want to know. That function is called when we need only parts of the `lstat()` data. Sometimes it is even used to determine whether a file or directory is present. But since Git does not have proper abstraction, `mingw_lstat()` _still_ has to fill in all the information. So _if_ we need that file ID information, I would be very much in favor of introducing a proper abstraction, where differentiate between the intention (think `get_inode(const char *path)`) from the platform-dependent implementation detail (think `lstat()`, `CreateFile()` and `GetFileInformationByHandle()`). Ciao, Dscho > In any event, I'm not planning on writing a patch for this since I have > no way to test it, but I'm sure someone who uses macOS could probably > write one reasonably easily. > -- > brian m. carlson (he/him or they/them) > Houston, Texas, US > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system 2021-05-05 13:51 ` Johannes Schindelin @ 2021-05-06 0:38 ` Junio C Hamano 0 siblings, 0 replies; 14+ messages in thread From: Junio C Hamano @ 2021-05-06 0:38 UTC (permalink / raw) To: Johannes Schindelin; +Cc: brian m. carlson, Mark Amery, git Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > So _if_ we need that file ID information, I would be very much in favor of > introducing a proper abstraction, where differentiate between the > intention (think `get_inode(const char *path)`) from the > platform-dependent implementation detail (think `lstat()`, `CreateFile()` > and `GetFileInformationByHandle()`). I agree in principle. Essentially, we need to (1) examine all calls to lstat(2) we make in our codebase, and find out what members of "stat" are really used out of the result for each callsite. This will be different from caller to caller (some callers may want only ino, other callers may want ino, size, and mtime, etc.), and we would learn that there are N patterns. (2) write N abstracted helper functions (or a single helper that takes const char *path, struct stat *, and an enum to tell which one of N patterns this call is about). (3) replace each lstat(2) call with one of these N abstract helper functions. get_inode() might be one of these N functions, but what is important is that the current callers that want ino and something else should not be penalized by making two separte calls get_inode() and get_other_things(), which, when done naively, would result in two lstat(2) calls. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system 2021-05-03 17:25 Bug: Changing folder case with `git mv` crashes on case-insensitive file system Mark Amery 2021-05-03 22:58 ` brian m. carlson @ 2021-05-04 15:19 ` Torsten Bögershausen 2021-05-05 0:23 ` Junio C Hamano 1 sibling, 1 reply; 14+ messages in thread From: Torsten Bögershausen @ 2021-05-04 15:19 UTC (permalink / raw) To: Mark Amery; +Cc: git On Mon, May 03, 2021 at 06:25:43PM +0100, Mark Amery wrote: > Attempting to change the case of a folder's name using a command like > `git mv foo FOO` crashes on case-insensitive file systems, like the > default APFS used on Apple Macs. > > Here are simple steps to repro this: > > $ mkdir testrepo && cd testrepo && git init > Initialized empty Git repository in /Users/markamery/testrepo/.git/ > $ mkdir foo && touch foo/bar && git add foo && git commit -m bla > [master (root-commit) a7e9f5f] bla > 1 file changed, 0 insertions(+), 0 deletions(-) > create mode 100644 foo/bar > $ git mv foo FOO > fatal: renaming 'foo' failed: Invalid argument > $ echo $? > 128 > $ git status > On branch master > nothing to commit, working tree clean > > If I create a case-sensitive APFS volume using Disk Utility and try > the commands above on that volume, `git mv foo FOO` works correctly: > it emits no output, exits with a 0 status code, and stages a change > renaming `foo/bar` to `FOO/bar`. However, on my main case-insensitive > volume, `git mv` behaves as shown above: it exits with code 128, > prints an "Invalid argument" error message, and does not stage any > changes. > > The command still fails in the same way if you use `git mv --force` > instead of just `git mv`. > > Note that previously, `git mv` could not change the case of *file* > names on case-insensitive file systems, until that was fixed in commit > https://github.com/git/git/commit/baa37bff9a845471754d3f47957d58a6ccc30058. > I'm guessing there's a different code path that needs fixing for > changing the case of *folders*. > > As far as I can tell, this error has never been reported to the Git > mailing list, but it seems to be encountered frequently; > https://stackoverflow.com/questions/3011625/git-mv-and-only-change-case-of-directory > mentions this bug and has 86000 views. > > In case it's relevant, here's my system info as output by `git bugreport`: > > [System Info] > git version: > git version 2.31.1 > cpu: x86_64 > no commit associated with this build > sizeof-long: 8 > sizeof-size_t: 8 > shell-path: /bin/sh > uname: Darwin 18.7.0 Darwin Kernel Version 18.7.0: Mon Apr 27 > 20:09:39 PDT 2020; root:xnu-4903.278.35~1/RELEASE_X86_64 x86_64 > compiler info: clang: 11.0.0 (clang-1100.0.33.17) > libc info: no libc information available > $SHELL (typically, interactive shell): /bin/bash Thanks for reporting - that's always good. To my undestanding we try to rename foo/ into FOO/. But because FOO/ already "exists" as directory, Git tries to move foo/ into FOO/foo, which fails. And no, the problem is probably not restricted to MacOs, Windows and all case-insenstive file systems should show the same, but I haven't tested yet, so it's more a suspicion. The following diff allows to move foo/ into FOO/ If someone wants to make a patch out if, that would be good. diff --git a/builtin/mv.c b/builtin/mv.c index 3fccdcb6452..fbf184bcfa9 100644 --- a/builtin/mv.c +++ b/builtin/mv.c @@ -163,8 +163,10 @@ int cmd_mv(int argc, const char **argv, const char *prefix) destination = internal_prefix_pathspec(dest_path[0], argv, argc, DUP_BASENAME); else if (!lstat(dest_path[0], &st) && S_ISDIR(st.st_mode)) { - dest_path[0] = add_slash(dest_path[0]); - destination = internal_prefix_pathspec(dest_path[0], argv, argc, DUP_BASENAME); + if (!ignore_case || strcasecmp(source[0], dest_path[0])) { + dest_path[0] = add_slash(dest_path[0]); + destination = internal_prefix_pathspec(dest_path[0], argv, argc, DUP_BASENAME); + } } else { if (argc != 1) die(_("destination '%s' is not a directory"), dest_path[0]); @@ -187,9 +189,12 @@ int cmd_mv(int argc, const char **argv, const char *prefix) (dst[length] == 0 || dst[length] == '/')) { bad = _("can not move directory into itself"); } else if ((src_is_dir = S_ISDIR(st.st_mode)) - && lstat(dst, &st) == 0) - bad = _("cannot move directory over file"); - else if (src_is_dir) { + && lstat(dst, &st) == 0) { + if (!ignore_case || strcasecmp(src, dst)){ + bad = _("cannot move directory over file"); + } + } + if (!bad && src_is_dir) { int first = cache_name_pos(src, length), last; if (first >= 0) @@ -277,7 +282,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix) if (mode != INDEX && rename(src, dst) < 0) { if (ignore_errors) continue; - die_errno(_("renaming '%s' failed"), src); + die_errno(_("renaming '%s' into '%s' failed"), src, dst); } if (submodule_gitfile[i]) { if (!update_path_in_gitmodules(src, dst)) ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system 2021-05-04 15:19 ` Torsten Bögershausen @ 2021-05-05 0:23 ` Junio C Hamano 2021-05-05 2:12 ` brian m. carlson 2021-05-06 4:34 ` Torsten Bögershausen 0 siblings, 2 replies; 14+ messages in thread From: Junio C Hamano @ 2021-05-05 0:23 UTC (permalink / raw) To: Torsten Bögershausen; +Cc: Mark Amery, git Torsten Bögershausen <tboegi@web.de> writes: > To my undestanding we try to rename > foo/ into FOO/. > But because FOO/ already "exists" as directory, > Git tries to move foo/ into FOO/foo, which fails. > > And no, the problem is probably not restricted to MacOs, > Windows and all case-insenstive file systems should show > the same, but I haven't tested yet, so it's more a suspicion. > > The following diff allows to move foo/ into FOO/ > If someone wants to make a patch out if, that would be good. Is strcasecmp() sufficient for macOS whose filesystem has not just case insensitivity but UTF-8 normalization issues? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system 2021-05-05 0:23 ` Junio C Hamano @ 2021-05-05 2:12 ` brian m. carlson 2021-05-06 4:34 ` Torsten Bögershausen 1 sibling, 0 replies; 14+ messages in thread From: brian m. carlson @ 2021-05-05 2:12 UTC (permalink / raw) To: Junio C Hamano; +Cc: Torsten Bögershausen, Mark Amery, git [-- Attachment #1: Type: text/plain, Size: 1389 bytes --] On 2021-05-05 at 00:23:05, Junio C Hamano wrote: > Torsten Bögershausen <tboegi@web.de> writes: > > > To my undestanding we try to rename > > foo/ into FOO/. > > But because FOO/ already "exists" as directory, > > Git tries to move foo/ into FOO/foo, which fails. > > > > And no, the problem is probably not restricted to MacOs, > > Windows and all case-insenstive file systems should show > > the same, but I haven't tested yet, so it's more a suspicion. > > > > The following diff allows to move foo/ into FOO/ > > If someone wants to make a patch out if, that would be good. > > Is strcasecmp() sufficient for macOS whose filesystem has not just > case insensitivity but UTF-8 normalization issues? strcasecmp is locale sensitive, so I would say it is not a good choice here. Notably, that means we have differing behavior with "i" and "I" if the locale is Turkish versus English. Only one of those behaviors will be a correct match for what the kernel does, since it is not locale sensitive. Moreover, on Windows, the version of Unicode used for case mapping is written into an NTFS file system when it's created, so a fresh install may have different behavior from an upgrade, and strcasecmp won't honor that. And, as you pointed out, it doesn't handle Unicode normalization at all. -- brian m. carlson (he/him or they/them) Houston, Texas, US [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 263 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system 2021-05-05 0:23 ` Junio C Hamano 2021-05-05 2:12 ` brian m. carlson @ 2021-05-06 4:34 ` Torsten Bögershausen 2021-05-06 9:12 ` Mark Amery 1 sibling, 1 reply; 14+ messages in thread From: Torsten Bögershausen @ 2021-05-06 4:34 UTC (permalink / raw) To: Junio C Hamano; +Cc: Mark Amery, git On Wed, May 05, 2021 at 09:23:05AM +0900, Junio C Hamano wrote: > Torsten Bögershausen <tboegi@web.de> writes: > > > To my undestanding we try to rename > > foo/ into FOO/. > > But because FOO/ already "exists" as directory, > > Git tries to move foo/ into FOO/foo, which fails. > > > > And no, the problem is probably not restricted to MacOs, > > Windows and all case-insenstive file systems should show > > the same, but I haven't tested yet, so it's more a suspicion. > > > > The following diff allows to move foo/ into FOO/ > > If someone wants to make a patch out if, that would be good. > > Is strcasecmp() sufficient for macOS whose filesystem has not just > case insensitivity but UTF-8 normalization issues? > Strictly speaking: no. The Git code doesn't handle UTF-8 uppper/lower case at all: git mv bar.txt BAR.TXT works because strcasecmp() is catching it. git mv bär.txt BÄR.TXT needs the long way: git mv bär.txt baer.txt && git mv baer.txt BÄR.TXT We have been restricting the case-change-is-allowed to ASCII filenames all the time. There is no information, which code points map onto each other in Git, since this is all file system dependent. NTFS has one way, HFS+, APFS another, VFAT a third one, and if I expose ext4 via SAMBA we probably have another one. Not mentioniong that ext4 can be use case-insensitve on later Linux kernels, which sticks to unicode. Or Git repos running on machines using ISO-8859-1, those should be rare these days. That said, people are renaming files in ASCII only and are happy, and in that sense renaming directories in ASCII can be supported without major hassle. And the inode approach mentioned as well: This could go on top of strcasecmp() to cover non-ASCII filenames or other oddities, if someone implements it. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system 2021-05-06 4:34 ` Torsten Bögershausen @ 2021-05-06 9:12 ` Mark Amery 2021-05-06 13:11 ` Bagas Sanjaya ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Mark Amery @ 2021-05-06 9:12 UTC (permalink / raw) To: Torsten Bögershausen; +Cc: Junio C Hamano, git So, I'm just a dumb Git user who doesn't even write C, so much of this discussion is over my head, but I have a few thoughts that may be helpful: • The mv utility on Mac is capable of doing `mv bär.txt bÄr.txt` just fine. Maybe `git mv` can learn something from whatever `mv` does? • On a case-insensitive file system, `git mv somedir sOMEdir` is a rename. But on a case-sensitive file system, it might NOT be a rename; it might be the case that `somedir` and `sOMEdir` both exist and that the command should put `somedir` inside `sOMEdir`. I mention this because I can imagine some naive attempts at fixing the original bug by doing a case-insensitive comparison of the two names ending up breaking this behaviour on case-sensitive file systems by wrongly treating such a command as a rename. It's probably worth having a test that this scenario gets handled cleanly on case-sensitive file systems? (I haven't checked whether Torsten's proposed diff falls into this trap or not.) • Above, Torsten mentions that there are filesystem-specific rules about what names are equal to each other that Git can't easily handle, because they go beyond just ASCII case changes. In that case, maybe the right solution is to always defer the question to the filesystem rather than Git trying to figure out the answer "in its head"? That is: first check the inode or file ID of the src and dst passed to `git mv`. If they are different and the second one is a folder, move src inside the existing folder. If either they are the same or the second one is not a folder, then do a rename. It seems to me that this approach automatically handles stuff like `git mv bär.txt bÄr.txt` plus any other rules about names being equal (like two different sequences of code points that both express "à"), all without Git ever needing to explicitly check whether two names are case-insensitively equal. Am I missing something? Sorry if any of the above is dumb or if I'm reiterating things others have already said without realising it. On Thu, May 6, 2021 at 5:34 AM Torsten Bögershausen <tboegi@web.de> wrote: > > On Wed, May 05, 2021 at 09:23:05AM +0900, Junio C Hamano wrote: > > Torsten Bögershausen <tboegi@web.de> writes: > > > > > To my undestanding we try to rename > > > foo/ into FOO/. > > > But because FOO/ already "exists" as directory, > > > Git tries to move foo/ into FOO/foo, which fails. > > > > > > And no, the problem is probably not restricted to MacOs, > > > Windows and all case-insenstive file systems should show > > > the same, but I haven't tested yet, so it's more a suspicion. > > > > > > The following diff allows to move foo/ into FOO/ > > > If someone wants to make a patch out if, that would be good. > > > > Is strcasecmp() sufficient for macOS whose filesystem has not just > > case insensitivity but UTF-8 normalization issues? > > > > Strictly speaking: no. > > The Git code doesn't handle UTF-8 uppper/lower case at all: > git mv bar.txt BAR.TXT works because strcasecmp() is catching it. > > git mv bär.txt BÄR.TXT needs the long way: > git mv bär.txt baer.txt && git mv baer.txt BÄR.TXT > > We have been restricting the case-change-is-allowed to ASCII filenames > all the time. > There is no information, which code points map onto each other in Git, > since this is all file system dependent. > NTFS has one way, HFS+, APFS another, VFAT a third one, and if I expose > ext4 via SAMBA we probably have another one. > Not mentioniong that ext4 can be use case-insensitve on later Linux kernels, > which sticks to unicode. > Or Git repos running on machines using ISO-8859-1, those should be rare these > days. > > That said, people are renaming files in ASCII only and are happy, > and in that sense renaming directories in ASCII can be supported > without major hassle. > > And the inode approach mentioned as well: > This could go on top of strcasecmp() to cover non-ASCII filenames > or other oddities, if someone implements it. > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system 2021-05-06 9:12 ` Mark Amery @ 2021-05-06 13:11 ` Bagas Sanjaya 2021-05-06 14:53 ` Torsten Bögershausen 2021-05-06 21:03 ` Junio C Hamano 2 siblings, 0 replies; 14+ messages in thread From: Bagas Sanjaya @ 2021-05-06 13:11 UTC (permalink / raw) To: Mark Amery, Torsten Bögershausen; +Cc: Junio C Hamano, git On 06/05/21 16.12, Mark Amery wrote: > • Above, Torsten mentions that there are filesystem-specific rules > about what names are equal to each other that Git can't easily handle, > because they go beyond just ASCII case changes. In that case, maybe > the right solution is to always defer the question to the filesystem > rather than Git trying to figure out the answer "in its head"? > > That is: first check the inode or file ID of the src and dst passed > to `git mv`. If they are different and the second one is a folder, > move src inside the existing folder. If either they are the same or > the second one is not a folder, then do a rename. > > It seems to me that this approach automatically handles stuff like > `git mv bär.txt bÄr.txt` plus any other rules about names being equal > (like two different sequences of code points that both express "à"), > all without Git ever needing to explicitly check whether two names are > case-insensitively equal. Am I missing something? On case when src and dst have different inodes and dst is a file, rather than a directory, mv will overwrite dst with contents of src (on Linux). -- An old man doll... just what I always wanted! - Clara ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system 2021-05-06 9:12 ` Mark Amery 2021-05-06 13:11 ` Bagas Sanjaya @ 2021-05-06 14:53 ` Torsten Bögershausen 2021-05-06 21:03 ` Junio C Hamano 2 siblings, 0 replies; 14+ messages in thread From: Torsten Bögershausen @ 2021-05-06 14:53 UTC (permalink / raw) To: Mark Amery; +Cc: Junio C Hamano, git On Thu, May 06, 2021 at 10:12:40AM +0100, Mark Amery wrote: > So, I'm just a dumb Git user who doesn't even write C, so much of this > discussion is over my head, but I have a few thoughts that may be > helpful: > > • The mv utility on Mac is capable of doing `mv bär.txt bÄr.txt` just > fine. Maybe `git mv` can learn something from whatever `mv` does? Yes, Git can do the same. The thing is, that Git avoids to overwrite of existing files. Just to be nice. In that sense, if the destination exists, Git refuses the mv, unless you use git mv -f Having said that, git mv bär.txt bÄr.txt will not work (without the -f) git mv bear.txt BEAR.txt does work, > > • On a case-insensitive file system, `git mv somedir sOMEdir` is a > rename. But on a case-sensitive file system, it might NOT be a rename; > it might be the case that `somedir` and `sOMEdir` both exist and that > the command should put `somedir` inside `sOMEdir`. I mention this > because I can imagine some naive attempts at fixing the original bug > by doing a case-insensitive comparison of the two names ending up > breaking this behaviour on case-sensitive file systems by wrongly > treating such a command as a rename. It's probably worth having a test > that this scenario gets handled cleanly on case-sensitive file > systems? (I haven't checked whether Torsten's proposed diff falls into > this trap or not.) Tests are needed - I should have started with those. But I didn't intend to send a patch (yet), just sharing ideas and knowledge. Which may enable someone to write a patch. > > • Above, Torsten mentions that there are filesystem-specific rules > about what names are equal to each other that Git can't easily handle, > because they go beyond just ASCII case changes. In that case, maybe > the right solution is to always defer the question to the filesystem > rather than Git trying to figure out the answer "in its head"? There are different trade-offs: So far I am only aware of people asking for the git mv bear.txt BEAR.txt rename. Just because they are all SW developers ? I don't know. And just because SW developers are developping Git, the case-insensitive string compare is good enough, it is working for them/us. So things are as they are. > > That is: first check the inode or file ID of the src and dst passed > to `git mv`. If they are different and the second one is a folder, > move src inside the existing folder. If either they are the same or > the second one is not a folder, then do a rename. Yes. In short: patches are welcome. In long: inodes don't work on Windows (without a major effort) > > It seems to me that this approach automatically handles stuff like > `git mv bär.txt bÄr.txt` plus any other rules about names being equal > (like two different sequences of code points that both express "à"), > all without Git ever needing to explicitly check whether two names are > case-insensitively equal. Am I missing something? That could be a solution. There may are situations/configurations, where inodes don't work: What happens if a Windows server exports a file system to MacOs ? To Linux ? Do we have working inodes ? What about other networking combinations ? Our code should handle them well as well. > > Sorry if any of the above is dumb or if I'm reiterating things others > have already said without realising it. No problem. Actually I realized that we used top-posting here, So I remove the reset to make it more readable. [] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system 2021-05-06 9:12 ` Mark Amery 2021-05-06 13:11 ` Bagas Sanjaya 2021-05-06 14:53 ` Torsten Bögershausen @ 2021-05-06 21:03 ` Junio C Hamano 2 siblings, 0 replies; 14+ messages in thread From: Junio C Hamano @ 2021-05-06 21:03 UTC (permalink / raw) To: Mark Amery; +Cc: Torsten Bögershausen, git Mark Amery <markrobertamery@gmail.com> writes: > That is: first check the inode or file ID of the src and dst passed > to `git mv`. If they are different and the second one is a folder, > move src inside the existing folder. If either they are the same or > the second one is not a folder, then do a rename. > > It seems to me that this approach automatically handles stuff like > `git mv bär.txt bÄr.txt` plus any other rules about names being equal > (like two different sequences of code points that both express "à"), > all without Git ever needing to explicitly check whether two names are > case-insensitively equal. Am I missing something? > > Sorry if any of the above is dumb or if I'm reiterating things others > have already said without realising it. There is nothing dumb about the idea. I think that is essentially what brian and I outlined in the earlier exchange, except that we need to use something other than inum on Windows to decide if two pathnames refer to the same filesystem entity, and the approach we suggested was to use the inum-based check _after_ seeing the current code fail with EINVAL in the error codepath, instead of doing so upfront. Thanks. ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2021-05-06 21:03 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-05-03 17:25 Bug: Changing folder case with `git mv` crashes on case-insensitive file system Mark Amery 2021-05-03 22:58 ` brian m. carlson 2021-05-04 3:46 ` Junio C Hamano 2021-05-04 11:20 ` brian m. carlson 2021-05-05 13:51 ` Johannes Schindelin 2021-05-06 0:38 ` Junio C Hamano 2021-05-04 15:19 ` Torsten Bögershausen 2021-05-05 0:23 ` Junio C Hamano 2021-05-05 2:12 ` brian m. carlson 2021-05-06 4:34 ` Torsten Bögershausen 2021-05-06 9:12 ` Mark Amery 2021-05-06 13:11 ` Bagas Sanjaya 2021-05-06 14:53 ` Torsten Bögershausen 2021-05-06 21:03 ` Junio C Hamano
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.