* RFD: Handling case-colliding filenames on case-insensitive filesystems
@ 2011-02-23 17:11 Johan Herland
  2011-02-23 18:56 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Johan Herland @ 2011-02-23 17:11 UTC (permalink / raw)
  To: git

Hi,

At $dayjob we recently had a problem where a developer pushed a commit 
that added new files, two of which were named "foobar.TXT" 
and "FOOBAR.txt". When this commit (or anything based on it) is checked 
out by one of our Windows developers, Git maps two files in its index 
to a single file on the filesystem, and ends up reporting a diff on one 
of those files. The diff won't go away unless one (or both) of the 
case-colliding files is removed from the repo. Obviously, the 
persisting diff prevents the developer from easily rebasing, switching 
branches, merging, bisecting and a number of other useful tasks.

The root of the problem is that the case-colliding files were added in 
the first place, and this should obviously be prevented in projects 
that aim to be compatible with case-insensitive filesystems. To that 
end, I'm currently writing an update hook which will prevent 
case-colliding files from being pushed to our central repo.
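A minimal sketch of the core check such a hook would need, in Python. It assumes the hook has already collected the full list of paths in the pushed tree (for example from `git ls-tree -r --name-only <commit>`); the function name `find_case_collisions` is made up for illustration, and `str.casefold()` only approximates the folding rules of real case-insensitive filesystems. It compares whole paths, so two directories that differ only in case are not detected unless they contain identically named files.

```python
from collections import defaultdict


def find_case_collisions(paths):
    """Return groups of paths that differ only by case.

    Hypothetical helper: 'paths' is assumed to be the complete list of
    file paths in the tree being pushed.
    """
    groups = defaultdict(list)
    for path in paths:
        # casefold() is an approximation; NTFS and HFS+ each use their
        # own case-folding tables.
        groups[path.casefold()].append(path)
    # Only groups with more than one member are collisions.
    return [sorted(names) for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    for group in find_case_collisions(["foobar.TXT", "FOOBAR.txt", "README"]):
        print("case collision:", ", ".join(group))
```

A hook built on this would reject the push whenever the returned list is non-empty.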

However, given that this has already happened, how can we design Git to 
handle this situation more gracefully? In other words, how can we 
better handle checking out filenames that collide on case-insensitive 
filesystems?

My first idea was to simply refuse checking out trees with 
case-colliding filenames. I.e. when core.ignoreCase is enabled, we 
check whether any of the files we're about to check out map to the same 
filesystem representation, and if they do, we abort the checkout and 
complain loudly to the user. However, that doesn't really help the user 
at all. Failure to checkout would only make it much harder to fix the 
issue.

A colleague suggested instead that Git should notice that the collision 
will occur, and work around the failure to represent the repository 
objects in the file system with a one-to-one match. Either by checking 
out only _one_ of the colliding files, or by using a non-colliding name 
for the second file. After all, Git already has functionality for 
manipulating the file contents on checkout (CRLF conversion). Doesn't 
it make sense to add functionality for manipulating the _directory_ 
contents on checkout as well? Even if that makes sense, I'm not sure 
that implementing it will be straightforward.

Are there better suggestions on how to deal with this?


Thanks,

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

* Re: RFD: Handling case-colliding filenames on case-insensitive filesystems
  2011-02-23 17:11 RFD: Handling case-colliding filenames on case-insensitive filesystems Johan Herland
@ 2011-02-23 18:56 ` Junio C Hamano
  2011-02-23 19:01   ` Shawn Pearce
  2011-02-23 19:07 ` Jay Soffian
  2011-02-24  0:30 ` Johan Herland
  2 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2011-02-23 18:56 UTC (permalink / raw)
  To: Johan Herland; +Cc: git

Johan Herland <johan@herland.net> writes:

> Are there better suggestions on how to deal with this?

Just off the top of my head, perhaps you could take the same route as
symbolic link support on filesystems that are not symlink-capable?

* Re: RFD: Handling case-colliding filenames on case-insensitive filesystems
  2011-02-23 18:56 ` Junio C Hamano
@ 2011-02-23 19:01   ` Shawn Pearce
  2011-02-23 19:27     ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Shawn Pearce @ 2011-02-23 19:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, git

On Wed, Feb 23, 2011 at 10:56, Junio C Hamano <gitster@pobox.com> wrote:
> Johan Herland <johan@herland.net> writes:
>
>> Are there better suggestions on how to deal with this?
>
> Just off the top of my head, perhaps you could take the same route as
> symbolic link support on filesystems that are not symlink-capable?

I don't know how that helps here, Junio. On those systems we write a
text file holding the symlink contents. That text file name is at
least still unique in the working directory.


Perhaps instead the "colliding file" becomes a directory that stores
all of the files below it, each with a unique name and a table of
contents, e.g.:

  foo.txt/

    git-contents:
      file-A foo.TXT
      file-B FOO.txt

    file-A:
      ... the contents of foo.TXT ...

    file-B:
      ... the contents of FOO.txt ...

It would be hard to work with in the index, and the project's build
system might fail, but at least the user can edit both files using
normal tools in the working tree, and can see which one is which using
the magic git-contents file.

This is such an odd corner case though, we need really good tests for
it, because it won't come up in daily usage very often. :-(

-- 
Shawn.

* Re: RFD: Handling case-colliding filenames on case-insensitive filesystems
  2011-02-23 17:11 RFD: Handling case-colliding filenames on case-insensitive filesystems Johan Herland
  2011-02-23 18:56 ` Junio C Hamano
@ 2011-02-23 19:07 ` Jay Soffian
  2011-02-23 19:17   ` Matthieu Moy
  2011-02-23 22:52   ` Marc Branchaud
  2011-02-24  0:30 ` Johan Herland
  2 siblings, 2 replies; 12+ messages in thread
From: Jay Soffian @ 2011-02-23 19:07 UTC (permalink / raw)
  To: Johan Herland; +Cc: git

On Wed, Feb 23, 2011 at 12:11 PM, Johan Herland <johan@herland.net> wrote:
> A colleague suggested instead that Git should notice that the collision
> will occur, and work around the failure to represent the repository
> objects in the file system with a one-to-one match. Either by checking
> out only _one_ of the colliding files, or by using a non-colliding name
> for the second file. After all, Git already has functionality for
> manipulating the file contents on checkout (CRLF conversion). Doesn't
> it make sense to add functionality for manipulating the _directory_
> contents on checkout as well? Even if that makes sense, I'm not sure
> that implementing it will be straightforward.
>
> Are there better suggestions on how to deal with this?

The general problem is aliasing in the working-tree, of which
case-insensitivity is the most common form, but it also happens due to
HFS's use of NFD. A search on gmane for "insensitive" or "nfd" will
return many hits.

I think the argument against remapping filenames is that it doesn't
really help the user.

Let's say (for the sake of argument) that git supported remapping
between the index and the working-tree. Further, my repo has:

$ cat Foo.c
#include "Foo.h"

$ cat foo.c
#include "foo.h"

And on a case-insensitive file-system, git has remapped foo.[ch] to
foo~2.[ch] for the purposes of avoiding collisions on checkout.

The checkout can't be compiled correctly, so what's the point of even
allowing it?

(I'm not saying this is right/wrong, just that's been one of the
arguments against remapping.)
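To make the hypothetical remapping concrete, here is a minimal Python sketch. Git does not implement this; the function name `remap_colliding` and the `~N` suffix placement are assumptions taken from the foo~2.[ch] example above. The first path seen keeps its name, and each later path that folds to the same name gets a numeric suffix before its extension. The sketch does not check whether a remapped name collides with yet another existing path.

```python
import os


def remap_colliding(paths):
    """Map each index path to a working-tree name that avoids case collisions.

    Hypothetical illustration only; not part of Git.
    """
    seen = {}      # folded name -> number of paths seen so far
    mapping = {}   # index path -> working-tree path
    for path in paths:
        key = path.casefold()
        count = seen.get(key, 0) + 1
        seen[key] = count
        if count == 1:
            mapping[path] = path
        else:
            stem, ext = os.path.splitext(path)
            mapping[path] = f"{stem}~{count}{ext}"
    return mapping


if __name__ == "__main__":
    for src, dst in remap_colliding(["Foo.c", "Foo.h", "foo.c", "foo.h"]).items():
        print(src, "->", dst)
```

Note that foo.c becomes foo~2.c while still containing `#include "foo.h"`, which now resolves to Foo.h on the case-insensitive filesystem; that is exactly the build breakage described above.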

j.

* Re: RFD: Handling case-colliding filenames on case-insensitive filesystems
  2011-02-23 19:07 ` Jay Soffian
@ 2011-02-23 19:17   ` Matthieu Moy
  2011-02-23 22:52   ` Marc Branchaud
  1 sibling, 0 replies; 12+ messages in thread
From: Matthieu Moy @ 2011-02-23 19:17 UTC (permalink / raw)
  To: Jay Soffian; +Cc: Johan Herland, git

Jay Soffian <jaysoffian@gmail.com> writes:

> The checkout can't be compiled correctly, so what's the point of even
> allowing it?

There's at least one: allow the user to fix it.

I'm not a user of a case-insensitive filesystem, but I guess it must be
terribly frustrating for a user to have the tool say "your repo is so
broken that I'm not even going to show you what's in it".

Now, it doesn't solve the problem of people having case-colliding
filenames on purpose ...

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

* Re: RFD: Handling case-colliding filenames on case-insensitive filesystems
  2011-02-23 19:01   ` Shawn Pearce
@ 2011-02-23 19:27     ` Junio C Hamano
  2011-02-24  0:58       ` Johan Herland
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2011-02-23 19:27 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Johan Herland, git

Shawn Pearce <spearce@spearce.org> writes:

> On Wed, Feb 23, 2011 at 10:56, Junio C Hamano <gitster@pobox.com> wrote:
>> Johan Herland <johan@herland.net> writes:
>>
>>> Are there better suggestions on how to deal with this?
>>
>> Just off the top of my head, perhaps you could take the same route as
>> symbolic link support on filesystems that are not symlink-capable?
>
> I don't know how that helps here, Junio. On those systems we write a
> text file holding the symlink contents. That text file name is at
> least still unique in the working directory.

Heh, I probably should have more explicitly hinted that I was suggesting
to rename, e.g. foo-1.txt, when checking out conflicting paths.  I chose
not to be precise because I knew readers were intelligent enough to read
what's between my lines themselves ;-).

Just like a text file that records link target is useless as a symlink,
such a file would be useless for its original purpose (e.g. renaming
xt_TCPMSS.c to xt_tcpmss-1.c to avoid a collision with xt_tcpmss.c would
not help when its associated Makefile wants to build xt_TCPMSS.o and
xt_tcpmss.o next to each other), just like your "treat everybody the same
way and make that a directory" approach.

I think two things are sensible to do, are relatively low hanging fruits,
and are of low risk:

 - break checkout on such a tree on incapable filesystems; and

 - per project configuration (or attribute given to paths underneath a
   particular directory) that forbids or warns addition of case colliding
   paths to the index; enforce it at write_index() codepath; and

 - if we choose to just warn in the second item above instead of downright
   forbidding, barf in cache_tree_update() codepath when the per project
   configuration (or attribute) triggers upon case colliding paths, to
   prevent a commit from being made.

I think "warn at add time, fail at write-tree time" is preferable,
as it might be more convenient if you can add hello.c while you still have
HELLO.c in the index as long as you do not forget to remove HELLO.c from
the index before making your next commit.
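The "warn at add time, fail at write-tree time" policy can be modelled with a toy index in Python. This is only a sketch of the policy, not Git's code: the class `ToyIndex`, its methods, and `CaseCollisionError` are invented for illustration, and the real enforcement points named above are the write_index() and cache_tree_update() codepaths.

```python
class CaseCollisionError(Exception):
    """Raised when a tree would contain case-colliding paths."""


class ToyIndex:
    """Toy stand-in for the index; holds only a set of paths."""

    def __init__(self):
        self.paths = set()

    def add(self, path):
        """Add a path; return warnings for any case collisions (never fails)."""
        clashes = sorted(
            p for p in self.paths
            if p != path and p.casefold() == path.casefold()
        )
        self.paths.add(path)
        return [f"warning: '{path}' collides with '{c}'" for c in clashes]

    def remove(self, path):
        self.paths.discard(path)

    def write_tree(self):
        """Refuse to produce a tree while collisions remain in the index."""
        folded = {}
        for p in sorted(self.paths):
            other = folded.setdefault(p.casefold(), p)
            if other != p:
                raise CaseCollisionError(f"'{p}' collides with '{other}'")
        return sorted(self.paths)
```

With this model, adding hello.c while HELLO.c is still present only produces a warning, and write_tree() succeeds once HELLO.c has been removed.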

* Re: RFD: Handling case-colliding filenames on case-insensitive filesystems
  2011-02-23 19:07 ` Jay Soffian
  2011-02-23 19:17   ` Matthieu Moy
@ 2011-02-23 22:52   ` Marc Branchaud
  2011-02-23 23:09     ` Greg Troxel
  1 sibling, 1 reply; 12+ messages in thread
From: Marc Branchaud @ 2011-02-23 22:52 UTC (permalink / raw)
  To: Jay Soffian; +Cc: Johan Herland, git

On 11-02-23 02:07 PM, Jay Soffian wrote:
> On Wed, Feb 23, 2011 at 12:11 PM, Johan Herland <johan@herland.net> wrote:
>> A colleague suggested instead that Git should notice that the collision
>> will occur, and work around the failure to represent the repository
>> objects in the file system with a one-to-one match. Either by checking
>> out only _one_ of the colliding files, or by using a non-colliding name
>> for the second file. After all, Git already has functionality for
>> manipulating the file contents on checkout (CRLF conversion). Doesn't
>> it make sense to add functionality for manipulating the _directory_
>> contents on checkout as well? Even if that makes sense, I'm not sure
>> that implementing it will be straightforward.
>>
>> Are there better suggestions on how to deal with this?
> 
> The general problem is aliasing in the working-tree, of which
> case-insensitivity is the most common form, but it also happens due to
> HFS's use of NFD. A search on gmane for "insensitive" or "nfd" will
> return many hits.
> 
> I think the argument against remapping filenames is that it doesn't
> really help the user.
> 
> Let's say (for the sake of argument) that git supported remapping
> between the index and the working-tree. Further, my repo has:
> 
> $ cat Foo.c
> #include "Foo.h"
> 
> $ cat foo.c
> #include "foo.h"
> 
> And on a case-insensitive file-system, git has remapped foo.[ch] to
> foo~2.[ch] for the purposes of avoiding collisions on checkout.
> 
> The checkout can't be compiled correctly, so what's the point of even
> allowing it?

In our case it would be useful to still have that checkout because the people
working on the case-insensitive systems are dealing with a different part of
the tree and don't care about the part with the collision.

A build designed to exploit case-sensitivity obviously won't work on a
case-insensitive system, but there's no reason to expect a git repo to have a
single, monolithic build.  There are a couple of parts of our code tree --
parts that are out of our control -- that use case sensitive file names, but
most of it doesn't.  It would be good if git would allow people on
case-insensitive systems to work with the repository, if not the complete build.

I suggest:

1. Git should emit a warning when checking out a case-colliding file (or
directory) on a case-insensitive system.  I don't really care _what_ gets
checked out for that file -- whatever it is ain't gonna work anyway.  Let's
say it checks out the associated blob the first time it runs across
thing.foo, but then emits the warning when it tries to check out Thing.Foo.

2. Git should forbid (yes, *forbid*) a user on a case-insensitive system from
adding any change to any files stored in the repository under
case-conflicting names.  The error message should basically be "You need to
use a case-sensitive system to work on this file."

3. I'm OK with git allowing case-insensitive users to forcibly delete
case-conflicting files.  "git rm thing.foo" should, on case-insensitive
systems, fail and display all case-colliding names for
[tT][hH][iI][nN][gG].[fF][oO][oO], and tell the user to use -f if they really
want to delete *all* those files.

		M.

* Re: RFD: Handling case-colliding filenames on case-insensitive filesystems
  2011-02-23 22:52   ` Marc Branchaud
@ 2011-02-23 23:09     ` Greg Troxel
  0 siblings, 0 replies; 12+ messages in thread
From: Greg Troxel @ 2011-02-23 23:09 UTC (permalink / raw)
  To: Marc Branchaud; +Cc: Jay Soffian, Johan Herland, git

Marc Branchaud <marcnarc@xiplink.com> writes:

> On 11-02-23 02:07 PM, Jay Soffian wrote:
>
>> And on a case-insensitive file-system, git has remapped foo.[ch] to
>> foo~2.[ch] for the purposes of avoiding collisions on checkout.
>> 
>> The checkout can't be compiled correctly, so what's the point of even
>> allowing it?
>
> In our case it would be useful to still have that checkout because the
> people working on the case-insensitive systems are dealing with a
> different part of the tree and don't care about the part with the
> collision.

Agreed; I've had this problem too.  In particular, a repository with
multiple packages imported, one of which was a Linux flavor that has
conflicting names in case-preserving filesystems.  The result was an
apparently modified checkout, but the offending files were not
interesting to the project.  So some sort of remapping, and the
subsequent prohibition on modifications (perhaps to either) seems like a
good plan.

Perhaps by default the checkout should just error out, but then it would
be good to have a variable to instead translate the duplicated names.


* Re: RFD: Handling case-colliding filenames on case-insensitive filesystems
  2011-02-23 17:11 RFD: Handling case-colliding filenames on case-insensitive filesystems Johan Herland
  2011-02-23 18:56 ` Junio C Hamano
  2011-02-23 19:07 ` Jay Soffian
@ 2011-02-24  0:30 ` Johan Herland
  2 siblings, 0 replies; 12+ messages in thread
From: Johan Herland @ 2011-02-24  0:30 UTC (permalink / raw)
  To: git

On Wednesday 23 February 2011, Johan Herland wrote:
> Are there better suggestions on how to deal with this?

Just a small note that I forgot in the first email:

For the record, this issue has been discussed on Stack Overflow
<URL: http://stackoverflow.com/questions/2528589/git-windows-case-sensitive-file-names-not-handled-properly>
and the last comment there suggests an alternative way to work around the
case-colliding problem without having to remove either file from the repo:

Use sparse-checkout to exclude the case-colliding files from the working 
tree.

Obviously, this isn't a permanent solution, but I thought I'd just throw it 
out there, as something to consider until a more permanent solution is in 
place.
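As a rough illustration of that workaround, the Python sketch below builds the lines one might place in `$GIT_DIR/info/sparse-checkout` (with `core.sparseCheckout` enabled) to keep everything except the colliding paths. The function name `sparse_checkout_patterns` is hypothetical, the list of tree paths is assumed to be available already, and paths containing glob metacharacters would need escaping that this sketch does not attempt.

```python
def sparse_checkout_patterns(paths):
    """Build sparse-checkout lines that exclude every case-colliding path.

    Hypothetical helper: 'paths' is assumed to be the list of all file
    paths in the tree to be checked out.
    """
    groups = {}
    for path in paths:
        groups.setdefault(path.casefold(), []).append(path)

    lines = ["/*"]  # start by including everything
    for names in groups.values():
        if len(names) > 1:
            # A leading "!" negates the pattern; "/" anchors it at the top.
            lines.extend(f"!/{name}" for name in sorted(names))
    return lines


if __name__ == "__main__":
    print("\n".join(sparse_checkout_patterns(
        ["doc/foobar.TXT", "doc/FOOBAR.txt", "README"])))
```

The colliding files stay in the repository and the index but are left out of the working tree, so no spurious diff appears.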


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

* Re: RFD: Handling case-colliding filenames on case-insensitive filesystems
  2011-02-23 19:27     ` Junio C Hamano
@ 2011-02-24  0:58       ` Johan Herland
  2011-02-24  1:26         ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Johan Herland @ 2011-02-24  0:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Shawn Pearce

On Wednesday 23 February 2011, Junio C Hamano wrote:
> I think two things are sensible to do, are relatively low hanging fruits,
> and are of low risk:
> 
>  - break checkout on such a tree on incapable filesystems; and

Wouldn't that be a regression from the current state (where the poor user in 
a case-insensitive worktree can at least "git rm" the offending files, and 
keep working without assistance from a case-sensitive worktree)?

What about giving a warning on checkout, instead, explaining the problem, 
and advising that - for now - the user can remove the offending files with 
"git rm"?

>  - per project configuration (or attribute given to paths underneath a
>    particular directory) that forbids or warns addition of case colliding
>    paths to the index; enforce it at write_index() codepath; and
> 
>  - if we choose to just warn in the second item above instead of
>    downright forbidding, barf in cache_tree_update() codepath when the
>    per project configuration (or attribute) triggers upon case colliding
>    paths, to prevent a commit from being made.

I support making this a per-project configuration that will trigger at tree-
creation (i.e. commit) time. I would even argue that the default should be 
to warn about (though maybe not refuse) case-colliding filenames, since they 
are either (a) directly harmful for cross-platform projects, or (b) probably 
unwanted in most projects anyway.

Having a per-project configuration sure beats trying to solve the problem in 
a hook script (using "pre-commit" introduces the logistical problem of making 
sure everybody installs/enables the hook, whereas using "update" requires 
(precious) server runtime, triggers too late in the developer's workflow 
(forcing the developer to amend/rebase), and probably confuses newbie 
developers as well).

> I think "warn at add time, fail at write-tree time" is preferable,
> as it might be more convenient if you can add hello.c while you still
> have HELLO.c in the index as long as you do not forget to remove HELLO.c
> from the index before making your next commit.

Agreed.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

* Re: RFD: Handling case-colliding filenames on case-insensitive filesystems
  2011-02-24  0:58       ` Johan Herland
@ 2011-02-24  1:26         ` Junio C Hamano
  2011-02-24  8:50           ` Johan Herland
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2011-02-24  1:26 UTC (permalink / raw)
  To: Johan Herland; +Cc: Junio C Hamano, git, Shawn Pearce

Johan Herland <johan@herland.net> writes:

> On Wednesday 23 February 2011, Junio C Hamano wrote:
>> I think two things are sensible to do, are relatively low hanging fruits,
>> and are of low risk:
>> 
>>  - break checkout on such a tree on incapable filesystems; and
>
> Wouldn't that be a regression from the current state (where the poor user in 
> a case-insensitive worktree can at least "git rm" the offending files, and 
> keep working without assistance from a case-sensitive worktree)?

Depends on the definition of "break".  I meant "exit with non-zero
status", not necessarily changing what is left in the working tree from
what the current code gives us.

* Re: RFD: Handling case-colliding filenames on case-insensitive filesystems
  2011-02-24  1:26         ` Junio C Hamano
@ 2011-02-24  8:50           ` Johan Herland
  0 siblings, 0 replies; 12+ messages in thread
From: Johan Herland @ 2011-02-24  8:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Shawn Pearce

On Thursday 24 February 2011, Junio C Hamano wrote:
> Johan Herland <johan@herland.net> writes:
> > On Wednesday 23 February 2011, Junio C Hamano wrote:
> >> I think two things are sensible to do, are relatively low hanging
> >> fruits, and are of low risk:
> >> 
> >>  - break checkout on such a tree on incapable filesystems; and
> > 
> > Wouldn't that be a regression from the current state (where the poor
> > user in a case-insensitive worktree can at least "git rm" the
> > offending files, and keep working without assistance from a
> > case-sensitive worktree)?
> 
> Depends on the definition of "break".  I meant "exit with non-zero
> status", not necessarily changing what is left in the working tree from
> what the current code gives us.

Ah, ok. I have no problem with that.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

end of thread, other threads:[~2011-02-24  8:50 UTC | newest]

Thread overview: 12+ messages
2011-02-23 17:11 RFD: Handling case-colliding filenames on case-insensitive filesystems Johan Herland
2011-02-23 18:56 ` Junio C Hamano
2011-02-23 19:01   ` Shawn Pearce
2011-02-23 19:27     ` Junio C Hamano
2011-02-24  0:58       ` Johan Herland
2011-02-24  1:26         ` Junio C Hamano
2011-02-24  8:50           ` Johan Herland
2011-02-23 19:07 ` Jay Soffian
2011-02-23 19:17   ` Matthieu Moy
2011-02-23 22:52   ` Marc Branchaud
2011-02-23 23:09     ` Greg Troxel
2011-02-24  0:30 ` Johan Herland
