git.vger.kernel.org archive mirror
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: Junio C Hamano <gitster@pobox.com>,
	Mark Amery <markrobertamery@gmail.com>,
	git@vger.kernel.org
Subject: Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system
Date: Wed, 5 May 2021 15:51:14 +0200 (CEST)
Message-ID: <nycvar.QRO.7.76.6.2105051528030.50@tvgsbejvaqbjf.bet>
In-Reply-To: <YJEuBqVVa/7x+jrZ@camp.crustytoothpaste.net>

Hi,

On Tue, 4 May 2021, brian m. carlson wrote:

> On 2021-05-04 at 03:46:12, Junio C Hamano wrote:
> > "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> >
> > > Yeah, this is because your operating system returns EINVAL in this case.
> > > POSIX specifies EINVAL when you're trying to make a directory a
> > > subdirectory of itself.  Which, I mean, I guess is a valid
> > > interpretation here, but it of course makes renaming the path needlessly
> > > difficult.
> > > ...
> > > I suspect part of the problem here is two fold: on macOS we can't
> > > distinguish an attempt to rename the path due to it folding or
> > > canonicalizing to the same thing from a real attempt to move an actual
> > > directory into itself.  The latter would be a problem we'd want to
> > > report, and the former is not.  Unfortunately, detecting this is
> > > difficult because that means we'd have to implement the macOS
> > > canonicalization algorithm in Git and we don't want to do that.
> >
> > I agree we'd probably need to resort to macOS specific hack (like we
> > have NFS or Coda specific hacks), but it may not be too bad.
> >
> > After seeing EINVAL, we can lstat src 'foo' and dst 'FOO', and
> > realize that both are directories and have the same st_dev/st_ino,
> > which should be fairly straightforward, no?
> >
> > For that, we do not exactly have to depend on any part of macOS-ism;
> > we do depend on the traditional "within the same device, inum is a
> > good way to tell if two filesystem entities are the same".
>
> Yes, although that won't work on Windows, which I don't believe has the
> concept of inodes and almost certainly has the same problem.  CCing
> Dscho in case he has some ideas on how we can make this more resilient
> there.

Windows does not really have inodes, but it has what it calls "file IDs".
These behave pretty much the way you would expect inodes to behave, except
on that still-used file system called FAT (for full details, see
https://docs.microsoft.com/en-us/windows/win32/api/fileapi/ns-fileapi-by_handle_file_information):

	In the FAT file system, the file ID is generated from the first
	cluster of the containing directory and the byte offset within the
	directory of the entry for the file. Some defragmentation products
	change this byte offset. (Windows in-box defragmentation does
	not.) Thus, a FAT file ID can change over time. Renaming a file in
	the FAT file system can also change the file ID, but only if the
	new file name is longer than the old one.

In this instance, because we're not actually renaming (yet), the file ID
would probably be okay. But in general, we should assume that we do not
have inodes on Windows.
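
For reference, the POSIX-side check suggested in the quoted text above
could look roughly like the sketch below. Both function names are made up
for illustration and are not existing Git functions, and what to do once
the alias is detected is deliberately left open:

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, not part of Git: returns 1 if both paths are
 * directories with the same st_dev/st_ino, 0 if they are not, and -1
 * if either lstat() call fails.
 */
static int same_directory(const char *src, const char *dst)
{
	struct stat a, b;

	if (lstat(src, &a) < 0 || lstat(dst, &b) < 0)
		return -1;
	if (!S_ISDIR(a.st_mode) || !S_ISDIR(b.st_mode))
		return 0;
	return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}

/*
 * Hypothetical wrapper: returns 0 if rename() succeeded, 1 if it
 * failed with EINVAL but src and dst name the same directory (the
 * case-folding situation), and -1 for any other failure.
 */
static int rename_or_detect_alias(const char *src, const char *dst)
{
	int saved_errno;

	if (!rename(src, dst))
		return 0;
	saved_errno = errno;
	if (saved_errno == EINVAL && same_directory(src, dst) == 1)
		return 1;
	errno = saved_errno;	/* report the original rename() error */
	return -1;
}
```

This relies only on the traditional "same device, same inode" rule, which
is exactly the part that does not carry over to Windows.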

There is another complication: it is not actually cheap to get to that
file ID. For performance reasons, we introduced the (optional) FSCache
feature in Git for Windows, where we cache the `lstat()` results, because
the assumption that `lstat()` is fast really only holds true on Linux.

In fact, what we do is to use `FindFirstFile()`/`FindNextFile()` to
enumerate an entire directory's worth of `lstat()` data, because funnily
enough, it is a lot faster when we need an entire directory's worth of
`lstat()` data anyway (calling `GetFileAttributes()` is of course faster
for a single file, but no longer once we need a dozen files or so).

But even `GetFileAttributes()` won't get you that "file ID". You have to
create a file handle (via `CreateFile()`, which is *expensive*) and then
call `GetFileInformationByHandle()`.

Now, way too much of Git's source code still pretends that `lstat()` is
just this fast operation that we can call left and right without saying
what we _actually_ want to know. It is called even when we need only
parts of the `lstat()` data, and sometimes merely to determine whether a
file or directory is present. But since Git does not have a proper
abstraction, `mingw_lstat()` _still_ has to fill in all the information.

So _if_ we need that file ID information, I would be very much in favor of
introducing a proper abstraction, where we differentiate between the
intention (think `get_inode(const char *path)`) and the
platform-dependent implementation detail (think `lstat()`, `CreateFile()`
and `GetFileInformationByHandle()`).
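
To make that a bit more concrete, here is a minimal sketch of what such an
abstraction might look like. `struct file_identity`, `get_file_identity()`
and `same_file_identity()` are hypothetical names that exist nowhere in
Git. The Windows branch uses `CreateFileA()` only to keep the sketch short
(real code would more likely want the wide-character variant), and the FAT
caveat quoted above still applies to the IDs it returns:

```c
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/stat.h>
#endif

/* Hypothetical type: identifies a filesystem entity on one volume. */
struct file_identity {
	uint64_t volume;
	uint64_t id;
};

/* Hypothetical API: returns 0 and fills `out`, or -1 on failure. */
static int get_file_identity(const char *path, struct file_identity *out)
{
#ifdef _WIN32
	BY_HANDLE_FILE_INFORMATION info;
	BOOL ok;
	/* The expensive part: a handle is needed to query the file ID. */
	HANDLE h = CreateFileA(path, 0,
			       FILE_SHARE_READ | FILE_SHARE_WRITE |
			       FILE_SHARE_DELETE,
			       NULL, OPEN_EXISTING,
			       /* allow directories, do not follow links */
			       FILE_FLAG_BACKUP_SEMANTICS |
			       FILE_FLAG_OPEN_REPARSE_POINT,
			       NULL);

	if (h == INVALID_HANDLE_VALUE)
		return -1;
	ok = GetFileInformationByHandle(h, &info);
	CloseHandle(h);
	if (!ok)
		return -1;
	out->volume = info.dwVolumeSerialNumber;
	out->id = ((uint64_t)info.nFileIndexHigh << 32) | info.nFileIndexLow;
	return 0;
#else
	struct stat st;

	if (lstat(path, &st) < 0)
		return -1;
	out->volume = (uint64_t)st.st_dev;
	out->id = (uint64_t)st.st_ino;
	return 0;
#endif
}

/* Returns 1 if both paths resolve to the same entity, 0 if not, -1 on error. */
static int same_file_identity(const char *a, const char *b)
{
	struct file_identity x, y;

	if (get_file_identity(a, &x) < 0 || get_file_identity(b, &y) < 0)
		return -1;
	return x.volume == y.volume && x.id == y.id;
}
```

The point is that callers state what they want to know (are these two
paths the same thing?), and only the implementation decides whether that
costs an `lstat()` or a `CreateFile()` round trip.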

Ciao,
Dscho

> In any event, I'm not planning on writing a patch for this since I have
> no way to test it, but I'm sure someone who uses macOS could probably
> write one reasonably easily.
> --
> brian m. carlson (he/him or they/them)
> Houston, Texas, US
>
