From: Andreas Ericsson <ae@op5.se>
To: Esko Luontola <esko.luontola@gmail.com>
Cc: Robin Rosenberg <robin.rosenberg@dewire.com>, git@vger.kernel.org
Subject: Re: [RFC 1/8] UTF helpers
Date: Wed, 13 May 2009 12:02:10 +0200
Message-ID: <4A0A9AA2.1000004@op5.se>
In-Reply-To: <4A0A91CE.3080905@gmail.com>

Esko Luontola wrote:
> Robin Rosenberg wrote on 13.5.2009 8:24:
>> If the conclusion is that this is a way forward, then I
>> could start working on a completely new set of much cleaner patches.
> 
> That would be great!
> 
> I see that in those early patches you took the approach of converting 
> the filenames from the local encoding to UTF-8 at the outer edges of 
> Git. That obviously was the easiest way to make the changes with minimal 
> changes to Git.
> 
> I've been thinking about a bit more extensive approach, which should 
> serve the interest of all stakeholders:
> 
> 
> Now the tree object contains the following information for each file: 
> filename, mode, sha1. To that would be added one more string: filename
> encoding. Unless the encoding is specified (such as in old commits 
> before the encoding information was added), the default encoding is 
> "binary", which is the same as how Git works now (it treats filenames as
> a series of bytes, ignoring their encoding completely).
> 

[ long and incompatible plan removed ]

> One big question is whether this change will require a change to the
> repository format. Will it be possible to add the encoding field to the
> tree object without breaking compatibility with older Git clients? If
> compatibility needs to be broken, how can it be done in a controlled
> fashion?
> 

Generally, when one wants to change one of the basic object types in
git, some extraordinary benefit has to be shown that is not aimed
at just a few people. Academic benefits (i.e., "non-real-worldy") do
not fall into that category. In fact, it's so rare for someone to
provide such enormous benefit that the only time a core object format
in git has been incompatibly changed is when Linus decided that trees
should be able to have subtrees. The change reduced the repository
size for the early git-tracked Linux kernel to about 4% of its
original size, so there was a clear, indisputable and obvious benefit
huge enough to warrant breaking the git repository format entirely
just to get it in (I might have gotten those details entirely wrong,
but it was something along those lines).

So unless you can change tree objects in a way that lets older git
clients understand them while still adding this encoding cruft
(it's cruft to me), I think your chances of getting such a change
into the git core are about the size of the colour green.
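
To make the compatibility concern concrete, here is a rough sketch of
how a reader walks a tree object today. It assumes the usual layout of
one entry (mode, space, raw name bytes, NUL, 20 raw SHA-1 bytes); the
helper names and the trailing "ISO-8859-1" field are made up for
illustration and are not taken from any patch in this thread.

```python
import hashlib


def make_entry(mode: str, name: bytes, sha1: bytes) -> bytes:
    # Layout: ASCII octal mode, space, raw name bytes, NUL, 20 raw SHA-1 bytes.
    return mode.encode("ascii") + b" " + name + b"\0" + sha1


def parse_tree(body: bytes):
    """Parse a tree object body into (mode, name, sha1_hex) tuples.

    Names are returned as raw bytes: nothing in the format says which
    encoding they are in.
    """
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space + 1)
        mode = body[pos:space].decode("ascii")
        name = body[space + 1:nul]
        sha1 = body[nul + 1:nul + 21]
        if len(sha1) != 20:
            raise ValueError("truncated tree entry")
        entries.append((mode, name, sha1.hex()))
        pos = nul + 21  # the next entry is assumed to start right here
    return entries


sha_a = hashlib.sha1(b"a").digest()
sha_b = hashlib.sha1(b"b").digest()
# A Latin-1 encoded name; the tree stores these bytes verbatim.
latin1_name = b"r\xe4ksm\xf6rg\xe5s.txt"

# A tree in the current format parses cleanly.
current = make_entry("100644", latin1_name, sha_a) + make_entry("100644", b"z.txt", sha_b)
entries = parse_tree(current)

# Hypothetical: append an encoding string after the first entry's SHA-1.
# A reader written for the current format has no way to skip it, so the
# extra bytes get glued onto the next entry's mode.
extended = (make_entry("100644", latin1_name, sha_a) + b"ISO-8859-1\0"
            + make_entry("100644", b"z.txt", sha_b))
misread = parse_tree(extended)
```

The toy parser above does not validate the mode, so it silently returns
a nonsense mode for the second entry. A real client would presumably
reject the tree as corrupt instead, which is the kind of breakage the
rollout steps below are meant to avoid.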

If you're *really* serious about it though, here's how to go about
it:

1. Make the changes so that newer git can always read and operate
on trees without the encoding information, regardless of what the
configuration says.
2. Modify the 1.4.x branch to support this new format too, at least
for reading trees with the information in it. Otherwise some
package maintainers will just ignore such compatibility.
3. Modify the 1.5.x branch similarly.
4. Make it configurable, but turned off by default and with a big
fat warning when it's turned on.
5. 2 years later, remove the warning.
6. 2 years later, turn it on by default.
7. 2 years later, remove the config option and make it a new
major release, but maintain the two codepaths forever.


1.[45].x branches are imaginary. They represent the branch that
gets created when a new release in that series is necessary for
some reason.


I haven't perused Robin's patches enough to know how they would
interact with older git, and I'm not really interested in encoding
issues. English being the lingua franca of the internet and open-source
development anyway, every project I've ever seen has only files
named in a manner that would fit nicely into 7-bit ASCII.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Register now for Nordic Meet on Nagios, June 3-4 in Stockholm
 http://nordicmeetonnagios.op5.org/

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.


Thread overview: 20+ messages
2009-05-12 22:50 [RFC 0/8] Antique UTF-8 filename support Robin Rosenberg
2009-05-12 22:50 ` [RFC 1/8] UTF helpers Robin Rosenberg
2009-05-12 22:50   ` [RFC 2/8] Messages in locale Robin Rosenberg
2009-05-12 22:50     ` [RFC 3/8] Extend tests to cover locale wrt to commit messages Robin Rosenberg
2009-05-12 22:50       ` [RFC 4/8] UTF file names Robin Rosenberg
     [not found]         ` <1242168631-30753-6-git-send-email-robin.rosenberg@dewire.com>
2009-05-12 22:50           ` [RFC 6/8] test of utf_locallinks Robin Rosenberg
2009-05-12 22:50             ` [RFC 7/8] Convert symlink dest in diff Robin Rosenberg
2009-05-12 22:50               ` [RFC 8/8] UTF-8 in non-SHA1-objects Robin Rosenberg
2009-05-13  0:20   ` [RFC 1/8] UTF helpers Johannes Schindelin
2009-05-13  5:24     ` Robin Rosenberg
2009-05-13  9:24       ` Esko Luontola
2009-05-13 10:02         ` Andreas Ericsson (this message)
2009-05-13 10:21           ` Esko Luontola
2009-05-13 11:44             ` Alex Riesen
2009-05-13 18:48         ` Junio C Hamano
2009-05-13 19:31           ` Esko Luontola
2009-05-13 20:10             ` Junio C Hamano
2009-05-13 10:14       ` Johannes Schindelin
2009-05-14  4:38       ` Junio C Hamano
2009-05-14 13:57         ` Jay Soffian
