From: Linus Torvalds <torvalds@linux-foundation.org>
To: Daniel Barkalow <barkalow@iabervon.org>
Cc: Martin Langhoff <martin.langhoff@gmail.com>,
	Jeff King <peff@peff.net>,
	"Shawn O. Pearce" <spearce@spearce.org>,
	Esko Luontola <esko.luontola@gmail.com>,
	git@vger.kernel.org
Subject: Re: Cross-Platform Version Control
Date: Wed, 13 May 2009 14:29:24 -0700 (PDT)
Message-ID: <alpine.LFD.2.01.0905131420500.3343@localhost.localdomain>
In-Reply-To: <alpine.LNX.2.00.0905131639580.2147@iabervon.org>



On Wed, 13 May 2009, Daniel Barkalow wrote:
> > 
> > Now, the simple OS X case is not a huge problem, since the lstat will 
> > succeed with the fixed-up filename too.
> 
> I'm not seeing what the general case is, and how it could possibly behave.

Here's a simple example.

Let's say that your company uses Latin1 internally for your filesystems, 
because your tools really aren't utf-8 ready. 

This is NOT AT ALL unnatural - it's how lots of people used to work with 
Linux over the years, and it's largely how people still use FAT, I suspect 
(except it's not latin1, it's some windows-specific 8-bits-per-character 
mapping).


IOW, if you have a file called 'åäö', it literally is encoded as 
'\xe5\xe4\xf6' (if you wonder why I picked those three letters, it's 
because they are the regular extra letters in Swedish - Swedish has 29 
letters in its alphabet, and those three letters really are letters in 
their own right, they are NOT 'a' and 'o' with some dots/rings on top).

IOW, if you open such a file, you need to use those three bytes.
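
A minimal sketch of those byte values, using Python purely for 
illustration (the variable names are made up for this example):

```python
# The three letters, written as escapes so the snippet does not depend on
# the encoding of this file: U+00E5 (a-ring), U+00E4 (a-umlaut), U+00F6 (o-umlaut).
name = "\u00e5\u00e4\u00f6"

# What a Latin1 filesystem stores: one byte per letter.
latin1_bytes = name.encode("latin-1")

print(latin1_bytes)       # b'\xe5\xe4\xf6'
print(len(latin1_bytes))  # 3
```

Any open() or lstat() call has to pass exactly these three bytes.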

Now, even if your OS and your tools use Latin1 on disk, you may realize 
that you'd like to interact with others who use UTF-8, and would want the 
git archive that you export to use nice portable UTF-8.

But you absolutely MUST NOT just do a conversion at "readdir()" time. If 
you do that, then your three-byte filename turns into a six-byte utf-8 
sequence of '\xc3\xa5\xc3\xa4\xc3\xb6' and the thing is, now "lstat()" 
won't work on that sequence.
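
The failure can be reproduced with a short Python sketch. It assumes a 
filesystem that stores filename bytes exactly as given (typical on Linux; 
other systems may reject or rewrite the name), and it only imitates the 
readdir()-time conversion, it is not how git does anything:

```python
import os
import tempfile

# Assumption: the filesystem keeps filename bytes untouched (e.g. ext4 on Linux).
with tempfile.TemporaryDirectory() as tmp:
    tmp_b = os.fsencode(tmp)

    # Create a file whose on-disk name is the three Latin1 bytes.
    on_disk = "\u00e5\u00e4\u00f6".encode("latin-1")
    with open(os.path.join(tmp_b, on_disk), "wb"):
        pass

    # Imitate "convert at readdir() time": Latin1 bytes -> UTF-8 bytes.
    raw = os.listdir(tmp_b)[0]
    converted = raw.decode("latin-1").encode("utf-8")

    # lstat() with the name as stored on disk works.
    os.lstat(os.path.join(tmp_b, raw))

    # lstat() with the converted name looks for a file that does not exist.
    try:
        os.lstat(os.path.join(tmp_b, converted))
        converted_found = True
    except FileNotFoundError:
        converted_found = False

print(converted)        # b'\xc3\xa5\xc3\xa4\xc3\xb6'
print(converted_found)
```

On a byte-transparent filesystem the second lstat() fails, because the 
only file in the directory is still named by the original three bytes.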

So obviously you could always turn things _back_ for lstat(), but quite 
frankly, that's (a) insane (b) incompetent and (c) not even always 
well-defined.
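
One way to read point (c), sketched in Python: not every UTF-8 name has a 
Latin1 spelling, so the reverse mapping is only partial. The helper 
to_latin1() is hypothetical and exists only for this example:

```python
def to_latin1(utf8_name: bytes):
    """Return the Latin1 bytes for a UTF-8 name, or None if none exist."""
    try:
        return utf8_name.decode("utf-8").encode("latin-1")
    except UnicodeEncodeError:
        # The name contains a character that Latin1 cannot represent.
        return None

# The six-byte UTF-8 name maps back to the three Latin1 bytes.
print(to_latin1(b"\xc3\xa5\xc3\xa4\xc3\xb6"))  # b'\xe5\xe4\xf6'

# The euro sign (U+20AC) is not in Latin1, so there is nothing to map it to.
print(to_latin1("\u20ac".encode("utf-8")))     # None
```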

> There's the "insensitive" behavior: if you create "foo" and look for 
> "FOO", it's there, but readdir() reports "foo".
> 
> There's the "converting" behavior: if you create "foo", readdir() reports 
> "FOO", but lstat("foo") returns it.

Then there's the behaviour above: you want your git repository to have 
utf-8, but your filesystem doesn't convert anything at all, and your 
regular tools (think editors etc) are all Latin1.

Latin1 is going away, I hope, but I bet EUC-JP etc still exist. 

		Linus
