All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] fast-import.c: always honor the filename case
@ 2014-02-02 13:13 Reuben Hawkins
  2014-02-02 20:08 ` Torsten Bögershausen
  2014-02-02 23:00 ` Jeff King
  0 siblings, 2 replies; 9+ messages in thread
From: Reuben Hawkins @ 2014-02-02 13:13 UTC (permalink / raw)
  To: git; +Cc: dpotapov, tboegi, Reuben Hawkins

fast-import should not use strncmp_icase.  When it does, files with
similar names, but different case can be lost in the import.  For
example...

M 100644 :1 FileName.txt
D Filename.txt

...would end up deleting FileName from the index during the fast-
import when strncmp_icase is used and core.ignorecase=true.  The
intent in the above snippet is to rename the file, not delete it.

Replacing strncmp_icase with strncmp in fast-import.c fixes the
issue.

alternatives:
* check if the filesystem is case-preserving.  If it is, don't
  set core.ignorecase=true.  This, however, exposes another issue
  where git is tricked by stat into thinking that tracked files
  are untracked on case-preserving and case-insensitive filesystems.
---
 fast-import.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index f4d9969..62e28c0 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1500,7 +1500,7 @@ static int tree_content_set(
 	t = root->tree;
 	for (i = 0; i < t->entry_count; i++) {
 		e = t->entries[i];
-		if (e->name->str_len == n && !strncmp_icase(p, e->name->str_dat, n)) {
+		if (e->name->str_len == n && !strncmp(p, e->name->str_dat, n)) {
 			if (!slash1) {
 				if (!S_ISDIR(mode)
 						&& e->versions[1].mode == mode
@@ -1593,7 +1593,7 @@ static int tree_content_remove(
 	t = root->tree;
 	for (i = 0; i < t->entry_count; i++) {
 		e = t->entries[i];
-		if (e->name->str_len == n && !strncmp_icase(p, e->name->str_dat, n)) {
+		if (e->name->str_len == n && !strncmp(p, e->name->str_dat, n)) {
 			if (slash1 && !S_ISDIR(e->versions[1].mode))
 				/*
 				 * If p names a file in some subdirectory, and a
@@ -1663,7 +1663,7 @@ static int tree_content_get(
 	t = root->tree;
 	for (i = 0; i < t->entry_count; i++) {
 		e = t->entries[i];
-		if (e->name->str_len == n && !strncmp_icase(p, e->name->str_dat, n)) {
+		if (e->name->str_len == n && !strncmp(p, e->name->str_dat, n)) {
 			if (!slash1)
 				goto found_entry;
 			if (!S_ISDIR(e->versions[1].mode))
-- 
1.8.5.3.1.gac93028.dirty

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] fast-import.c: always honor the filename case
  2014-02-02 13:13 [PATCH] fast-import.c: always honor the filename case Reuben Hawkins
@ 2014-02-02 20:08 ` Torsten Bögershausen
       [not found]   ` <CAD_8n+RZACW0380co75gWSwVmCJdcH4COsySTF3BFCyKEumXNA@mail.gmail.com>
  2014-02-02 23:00 ` Jeff King
  1 sibling, 1 reply; 9+ messages in thread
From: Torsten Bögershausen @ 2014-02-02 20:08 UTC (permalink / raw)
  To: Reuben Hawkins, git; +Cc: dpotapov, tboegi

On 2014-02-02 14.13, Reuben Hawkins wrote:
> fast-import should not use strncmp_icase.  When it does, files with
> similar names, but different case can be lost in the import.  For
> example...
> 
> M 100644 :1 FileName.txt
> D Filename.txt
That seems to be wrong, shouldn't it be
D Filename.txt
M 100644 :1 FileName.txt

How do you generate the export/import stream ?

Please see here:
>https://www.kernel.org/pub/software/scm/git/docs/git-fast-import.html
>Handling Renames

>When importing a renamed file or directory, simply delete the old name(s) and modify the new name(s) during the corresponding commit. 
>Git performs rename detection after-the-fact, rather than explicitly during a commit.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] fast-import.c: always honor the filename case
  2014-02-02 13:13 [PATCH] fast-import.c: always honor the filename case Reuben Hawkins
  2014-02-02 20:08 ` Torsten Bögershausen
@ 2014-02-02 23:00 ` Jeff King
  2014-02-03 17:21   ` Junio C Hamano
  1 sibling, 1 reply; 9+ messages in thread
From: Jeff King @ 2014-02-02 23:00 UTC (permalink / raw)
  To: Reuben Hawkins; +Cc: git, dpotapov, tboegi, Joshua Jensen

[+cc Joshua Jensen, who wrote 50906e0]

On Sun, Feb 02, 2014 at 07:13:04AM -0600, Reuben Hawkins wrote:

> fast-import should not use strncmp_icase.

I am not sure of that. My gut feeling is that core.ignorecase is
completely about the _filesystem_, and that git should generally be
case-sensitive internally. But I do not know that everyone agrees. Your
commit is basically a revert of 50906e0 (Support case folding in git
fast-import when core.ignorecase=true, 2010-10-03). And here's some
additional discussion specifically regarding fast-import:

  http://thread.gmane.org/gmane.comp.version-control.git/200597

So I think there is a discussion to be had[1].

> When it does, files with
> similar names, but different case can be lost in the import.  For
> example...
> 
> M 100644 :1 FileName.txt
> D Filename.txt
> 
> ...would end up deleting FileName from the index during the fast-
> import when strncmp_icase is used and core.ignorecase=true.  The
> intent in the above snippet is to rename the file, not delete it.

There may be a separate bug where fast-import needs to be smarter about
ordering operations in such a case (or perhaps fast-export generators
need to be more careful about the order of their output). But it might
be fixable without disabling case-insensitivity entirely.

-Peff

[1] I am mostly trying to connect people on various sides of the
    discussion here. So take my "gut feeling" above with a grain of
    salt, as it does not come from experience nor thinking too hard
    about the issue.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] fast-import.c: always honor the filename case
  2014-02-02 23:00 ` Jeff King
@ 2014-02-03 17:21   ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2014-02-03 17:21 UTC (permalink / raw)
  To: Jeff King; +Cc: Reuben Hawkins, git, dpotapov, tboegi, Joshua Jensen

Jeff King <peff@peff.net> writes:

> [+cc Joshua Jensen, who wrote 50906e0]
>
> On Sun, Feb 02, 2014 at 07:13:04AM -0600, Reuben Hawkins wrote:
>
>> fast-import should not use strncmp_icase.
>
> I am not sure of that. My gut feeling is that core.ignorecase is
> completely about the _filesystem_, and that git should generally be
> case-sensitive internally.

I agree; if squashing mixed cases into a single canonical one is
desired in one use case (like Joshua described in $gmane/200597),
that should have been done as an optional feature, not by default
(ideally, it should probably be done in a filter between exporters
and the fast-import, I would think).

> [1] I am mostly trying to connect people on various sides of the
>     discussion here. So take my "gut feeling" above with a grain of
>     salt, as it does not come from experience nor thinking too hard
>     about the issue.

Thanks; that is exactly what is expected of project elders ;-)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] fast-import.c: always honor the filename case
       [not found]   ` <CAD_8n+RZACW0380co75gWSwVmCJdcH4COsySTF3BFCyKEumXNA@mail.gmail.com>
@ 2014-02-03 20:21     ` Torsten Bögershausen
  2014-02-04  0:14       ` Junio C Hamano
       [not found]       ` <CAD_8n+RuwQEXJRCOr+B_PqA7z6LkFdbcRZkiiVJsEhJ=+YjRDg@mail.gmail.com>
  0 siblings, 2 replies; 9+ messages in thread
From: Torsten Bögershausen @ 2014-02-03 20:21 UTC (permalink / raw)
  To: Reuben Hawkins, Torsten Bögershausen; +Cc: git, Dmitry Potapov

[]
> So to summarize, when fast-import uses strncmp_icase (what fast-import does now) import on a repository where ignorecase=true is wrong.  My patch, "fast-import.c: always honor the filename case" fixes this.  Can you verify?
>
> Thanks in advance,
> Reuben
>
Yes, I can verify. My feeling is that
a) the fast-export should generate the rename the other way around.
   Would that be feasable ?
   Or generate a real rename ?
  (I'm not using fast-export or import myself)

b)  As a workaround, does it help if you manually set core.ignorecase false ?

c)  Does it help to use git-hg-remote ? (could be another workaround)

And no,  50906e04e8f48215b0 does not include any test cases.
(try git show 50906e04e8)

This is only a short answer, I can prepare a longer answer about ignorecase the next days.
/Torsten

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] fast-import.c: always honor the filename case
  2014-02-03 20:21     ` Torsten Bögershausen
@ 2014-02-04  0:14       ` Junio C Hamano
       [not found]       ` <CAD_8n+RuwQEXJRCOr+B_PqA7z6LkFdbcRZkiiVJsEhJ=+YjRDg@mail.gmail.com>
  1 sibling, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2014-02-04  0:14 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Reuben Hawkins, git, Dmitry Potapov, Joshua Jensen

Torsten Bögershausen <tboegi@web.de> writes:

> []
>> So to summarize, when fast-import uses strncmp_icase (what fast-import does now) import on a repository where ignorecase=true is wrong.  My patch, "fast-import.c: always honor the filename case" fixes this.  Can you verify?
>>
>> Thanks in advance,
>> Reuben
>>
> Yes, I can verify. My feeling is that
> a) the fast-export should generate the rename the other way around.
>    Would that be feasable ?
>    Or generate a real rename ?
>   (I'm not using fast-export or import myself)

I do not think this matters.  Or at least, it should not matter.  As
Peff said in the nearby message, core.ignorecase is completely about
the _filesystem_, and that git should generally be case-sensitive
internally.

And fast-import is about reading internal representation of paths
and data to populate the repository, without having to guess what
pathnames were meant to be used---the guess is only needed if we
need to consult the filesystem that loses case information.

The change made by 50906e04 (Support case folding in git fast-import
when core.ignorecase=true, 2010-10-03) should probably be half
reverted by making the case-squashing an optional feature, that
could be used even on a system with case sensitive filesystems.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] fast-import.c: always honor the filename case
       [not found]       ` <CAD_8n+RuwQEXJRCOr+B_PqA7z6LkFdbcRZkiiVJsEhJ=+YjRDg@mail.gmail.com>
@ 2014-02-05 21:19         ` Torsten Bögershausen
       [not found]           ` <CAD_8n+Thn3tNTYxLK49mDOGdLpWRCFUCJo9b76UbAjnCdqXsRQ@mail.gmail.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Torsten Bögershausen @ 2014-02-05 21:19 UTC (permalink / raw)
  To: Reuben Hawkins, Torsten Bögershausen; +Cc: git, Dmitry Potapov

On 2014-02-03 23.11, Reuben Hawkins wrote:
>
>
>
> On Mon, Feb 3, 2014 at 2:21 PM, Torsten Bögershausen <tboegi@web.de <mailto:tboegi@web.de>> wrote:
>
>     []
>     > So to summarize, when fast-import uses strncmp_icase (what fast-import does now) import on a repository where ignorecase=true is wrong.  My patch, "fast-import.c: always honor the filename case" fixes this.  Can you verify?
>     >
>     > Thanks in advance,
>     > Reuben
>     >
>     Yes, I can verify. My feeling is that
>     a) the fast-export should generate the rename the other way around.
>        Would that be feasable ?
>
>
> I *think* this is feasible.  I did try this, and it worked, but I didn't like the idea of having to fix all the exporters.  I know about hg-export and svn-export, but I assume there are more that I don't know about, and maybe even other tools out there none of us know about, which would also have to be fixed in the same way.  As such, I decided fixing fast-export isn't really the right thing to do...I don't really think fast-export is broken to begin with.  I'm hoping there is a way to fix ignorecase such that it doesn't create this type of problem with this...
>
> M 100644 :1 Filename.txt
> D FileName.txt
>
> ..maybe by very carefully identifying when ignorecase should apply and when it shouldn't (I'm still trying to sort that out, the docs on ignorecase aren't specific).
>
> But for what it's worth, to "fix" fast-export, I added a check in builtin/fast-export.c in the function depth_first before all the other checks so it would always make diff_filepair->status == 'D' the lesser when not equal...something like this (I'm not looking at the code now, so this may *not* be what I did)...
>
> if (a->status != b->status) {
>   if (a->status == 'D') return -1;
>   if (b->status == 'D') return 1;
> }
>
> /Reuben
>  
>
>        Or generate a real rename ?
>
>
> A rename may also work, but it may not.  And that would also require fixing all other exporters.  If I understand the docs well enough, a rename would be done like so...
>
> R Filename.txt FileName.txt
>
> I think in the ignorecase=true situation, case folding would happen and this would be a nop like this...
>
> R Filename.txt Filename.txt
>
> ...Right?  I haven't tested this, but I *suspect* the result would be to not rename the file when ignorecase=true...I definitely think it's worth a try just to know the result, but this fixes the ignorecase problem in fast-import by passing a requirement to all the fast-exporters...a semi-artificial requirement created because ignorecase *could* be true, but may not be.
> /Reuben
>  
>
>       (I'm not using fast-export or import myself)
>
>     b)  As a workaround, does it help if you manually set core.ignorecase false ?
>
>
> Yes, this works.  It makes a single step clone, git clone hg::..., into a multi step process like this...
>
> $ mkdir test
> $ git init
> $ git config core.ignorecase false
> $ git remote add origin hg::whatever
> $ git fetch origin
> $ git reset --hard origin/master
> $ git branch --set-upstream-to=origin/master master
>
> That isn't a too big deal for people fluent in GIT (if you only have to do it once and wrote it down too maybe).  It works, just not ideally and it's easy to get burned on because git clone sort-of works, it just doesn't truly clone.  The resulting cloned repo can be mangled by case folding.
>
> Typically, unless somebody explains the multi step process to everybody, some people will have master with commit xxxxxx and others will have the exact same master with commit yyyyyy.  Some will have Filename.txt instead of FileName.txt way back in history.  Merging those branches is a mess.
>
> So setting core.ignorecase=flase does work, it's just a bit cumbersome.  My fingers really want to just type git clone hg::whatever and I hope to get a true 'clone' as in an exact, identical copy on all machines regardless of filesystem.  If GIT wants to do case folding after that I suppose it would be fine.  Maybe I'm expecting too much, but I've been under the impression for years that a clone of any git repo will have all the exact same commit id's.
> /Reuben
>
>
>     c)  Does it help to use git-hg-remote ? (could be another workaround)
>
>
> Yes, sorry, I guess I wasn't clear on that point.  That is what I'm using. 
> /Reuben
>
>
>     And no,  50906e04e8f48215b0 does not include any test cases.
>     (try git show 50906e04e8)
>
>     This is only a short answer, I can prepare a longer answer about ignorecase the next days.
>     /Torsten
>
>
> Thank you!  That would be very helpful.  I'm still trying to wrap my head around what ignorecase really needs to do, when and where it needs to do it and what it shouldn't do.  I suspect ignorecase is touching too many code paths and needs to be reined in a bit.
>
> I'm also wondering if it's possible to test a bunch of situations in 'make tests' with ignorecase=true/false....but I can't think of any way other than mounting filesystems on loopbacks to setup the tests (to ensure a vfat fs for example)....do you know a better way?
>
> /Reuben 
>
My experience and understanding is that ignorecase=true is useful when working in a Windows environment.
Some tools (or users) change the case of a filename (or directory name) and Git compensates for that.
Which means that you can rename a file behind the back of Git, and Git is not complaining.
(Junio recently explained it much better in detail)

The thing is that you can push and pull between different machines, and ignorecase=true makes sure that
Git finds the "right file", similar as Windows finds it.

The same is true for directory names, and I could call the whole ignorecase=true feature
a kind of "don't worry" packet, or a "Changing case is harmless insurance".

Since 50906e04e8 this is even more true, since even the fast-import is covered,
The importer will find the "right file" even if the exporter did not tell us about the rename.
(Or should we say re-case ?)

Some trouble starts when you push and pull such a repo to a Linux machine.
(You can replace Windows with Mac OS+case insensitve HFS+ and Linux with
Mac OS + case sensitive HFS+ (or Unix))

Especially the the case folding of directory names is interesting:
For Linux "Dir1/File", "dir1/file" or "dir1/File" are completely different, for Windows not.
So if a mixture of Windows/Linux systems is working together it can help to set
ignorecase=false on Windows, to detect some of the mess seen by Linux.
(and thats why I suggested to set ignorecase=false as a workaround)

Reading the answers from Peff and Junio, I am convinced that the fast-import should
not look at core.ignorecase at all.

If I am allowed to make a suggestion:
the patch for fast-import.c (0001-fast-import.c-ignorecase-iff-explicitly-told-to.patch)
 you send out is a good starting point.

(I would avoid using ignore_case and  strncmp_icase().
 A file-local variable like fi_ignore_case or so and a local function will isolate variables.
 
Perhaps slow_same_name() from name-hash.c can be used (it needs to be made non-local)

On top of that, a test case could be good.
 You have send a shell script to demonstrate the problem using fast-export and fast-import.
 This can be used as a start.

And if you want to experiement with case-sensitive/unsensitve file systems
under Mac OS, simply get a USB drive (8 GB Flash stick could be enough)
and format it with the HFS+ version you don't have on your hard disk.
/Torsten
 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] fast-import.c: always honor the filename case
       [not found]           ` <CAD_8n+Thn3tNTYxLK49mDOGdLpWRCFUCJo9b76UbAjnCdqXsRQ@mail.gmail.com>
@ 2014-02-09 20:34             ` Torsten Bögershausen
       [not found]               ` <CAD_8n+ToUDbXrVuru7GV7toYKHXuQb8vL3B_-sfzQdXZFqzD2A@mail.gmail.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Torsten Bögershausen @ 2014-02-09 20:34 UTC (permalink / raw)
  To: Reuben Hawkins, Torsten Bögershausen; +Cc: git, Dmitry Potapov

On 2014-02-06 12.24, Reuben Hawkins wrote:
[snipped away minor interesting stuff]

    Reading the answers from Peff and Junio, I am convinced that the fast-import should
    not look at core.ignorecase at all.


Agreed, but my patch 0001-fast-import.c-ignorecase-iff-... is working very well (for me anyway), and the ignore-case option may be useful to the git-p4 importer (which I guess is something that should be determined for sure).

If you want, you can turn this into a real patch and send it to the list.
I think  Pete Wyckoff <pw@padd.com> is one of the experts about p4.

And the same is for fast-export fixes you have made: If you have the time,
convert it into a patch and send it to the list.

[]
> BTW, if you can, can you give me a quick overview of testing it git?  I can run 'make tests' easy enough, but there seems to be a well defined framework of testing written in bash and c..  Is there a doc on that framework anywhere?
I'm not sure if there is a document, and yes, it's a nice framework.
You can have a look at t0050, it gives a good overview over the most important
feattures in the framework, I would say.

(And the shell scripts uses a subset of the POSIX shell, which means we avoid
bash-ism things like "[[ ]]" or "==")
/Torsten

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] fast-import.c: always honor the filename case
       [not found]               ` <CAD_8n+ToUDbXrVuru7GV7toYKHXuQb8vL3B_-sfzQdXZFqzD2A@mail.gmail.com>
@ 2014-02-11 17:29                 ` Torsten Bögershausen
  0 siblings, 0 replies; 9+ messages in thread
From: Torsten Bögershausen @ 2014-02-11 17:29 UTC (permalink / raw)
  To: Reuben Hawkins, Torsten Bögershausen; +Cc: git, Dmitry Potapov

On 2014-02-10 15.24, Reuben Hawkins wrote:
>
>
>
> On Sun, Feb 9, 2014 at 2:34 PM, Torsten Bögershausen <tboegi@web.de <mailto:tboegi@web.de>> wrote:
>
>     On 2014-02-06 12 <tel:2014-02-06%2012>.24, Reuben Hawkins wrote:
>     [snipped away minor interesting stuff]
>
>         Reading the answers from Peff and Junio, I am convinced that the fast-import should
>         not look at core.ignorecase at all.
>
>
>     Agreed, but my patch 0001-fast-import.c-ignorecase-iff-... is working very well (for me anyway), and the ignore-case option may be useful to the git-p4 importer (which I guess is something that should be determined for sure).
>
>     If you want, you can turn this into a real patch and send it to the list.
>     I think  Pete Wyckoff <pw@padd.com <mailto:pw@padd.com>> is one of the experts about p4.
>
>
> hehe. I thought it was a real patch....so I guess I'm not sure what you mean.  What do I need to do to make the patch more real (pardon my ignorance)?
> /Reuben
I have found a patch which was send as an attachment,
and that's why it got the file name 0001-fast-import.c-ignorecase-iff.... here
So yes, it was a real patch. Please re-send it inline, not as an attachment, as you did in
"[PATCH] fast-import.c: always honor the filename case"
>
>
>     And the same is for fast-export fixes you have made: If you have the time,
>     convert it into a patch and send it to the list.
>
>      
>
>
> Do you mean the fast-export fixes where I pushed all the delete operations to the front of the list?
Yes
/Torsten

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2014-02-11 17:29 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-02 13:13 [PATCH] fast-import.c: always honor the filename case Reuben Hawkins
2014-02-02 20:08 ` Torsten Bögershausen
     [not found]   ` <CAD_8n+RZACW0380co75gWSwVmCJdcH4COsySTF3BFCyKEumXNA@mail.gmail.com>
2014-02-03 20:21     ` Torsten Bögershausen
2014-02-04  0:14       ` Junio C Hamano
     [not found]       ` <CAD_8n+RuwQEXJRCOr+B_PqA7z6LkFdbcRZkiiVJsEhJ=+YjRDg@mail.gmail.com>
2014-02-05 21:19         ` Torsten Bögershausen
     [not found]           ` <CAD_8n+Thn3tNTYxLK49mDOGdLpWRCFUCJo9b76UbAjnCdqXsRQ@mail.gmail.com>
2014-02-09 20:34             ` Torsten Bögershausen
     [not found]               ` <CAD_8n+ToUDbXrVuru7GV7toYKHXuQb8vL3B_-sfzQdXZFqzD2A@mail.gmail.com>
2014-02-11 17:29                 ` Torsten Bögershausen
2014-02-02 23:00 ` Jeff King
2014-02-03 17:21   ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.