From: Jeff King <peff@peff.net>
To: "Torsten Bögershausen" <tboegi@web.de>
Cc: Yuri <yuri@rawbw.com>, Junio C Hamano <gitster@pobox.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [BUG REPORT] File names that contain UTF8 characters are unnecessarily escaped in 'git status .' messages
Date: Thu, 27 May 2021 10:02:01 -0400
Message-ID: <YK+mWZP+sl3zXECx@coredump.intra.peff.net>
In-Reply-To: <20210527045628.uvesihyhtqrfyfae@tb-raspi4>

On Thu, May 27, 2021 at 06:56:28AM +0200, Torsten Bögershausen wrote:

> On Wed, May 26, 2021 at 04:41:38PM -0700, Yuri wrote:
> > On 5/26/21 4:32 PM, Junio C Hamano wrote:
> > > "git config core.quotepath no"?
> >
> >
> > I didn't have the 'core.quotepath' value set. 'git config core.quotepath no'
> > changed the behavior to no quoting.
> >
> > So it looks like the default value of 'core.quotepath' is incorrect: it
> > should be based on terminal capabilities.
> >
> 
> These are 2 different things.
> If you are in a project where only ASCII names are allowed (for whatever reason),
> you may want `git config core.quotepath no`, regardless of what the terminal can do.
> 
> (Besides that, are there terminals that don't handle UTF-8 these days?)

I don't think core.quotepath is just about UTF-8. It is agnostic to the
encoding of the paths, so it is really a question of whether to just
pass through bytes with the high bit set.
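
To make "pass through bytes with the high bit set" concrete, here is a
rough Python sketch of the decision as I understand it. It is not Git's
actual code; quote_path() and its quote_high_bit flag are made-up names
standing in for the real quoting routine and core.quotepath, and the
exact escape set is my assumption.

```python
# Illustrative only: approximates C-style path quoting on raw bytes.
# quote_high_bit=True stands in for core.quotepath=yes (an assumption
# about the mapping, not Git's real implementation).
_NAMED = {0x07: b"a", 0x08: b"b", 0x09: b"t", 0x0A: b"n", 0x0B: b"v",
          0x0C: b"f", 0x0D: b"r", 0x22: b'"', 0x5C: b"\\"}

def quote_path(raw: bytes, quote_high_bit: bool = True) -> bytes:
    out = bytearray()
    needs_quotes = False
    for b in raw:
        if b in _NAMED:
            # Control characters with a short name, plus '"' and '\'.
            out += b"\\" + _NAMED[b]
            needs_quotes = True
        elif b < 0x20 or b == 0x7F or (b >= 0x80 and quote_high_bit):
            # Everything else that is escaped becomes a 3-digit octal escape.
            out += b"\\%03o" % b
            needs_quotes = True
        else:
            # Passed through untouched, whatever encoding it belongs to.
            out.append(b)
    return b'"' + bytes(out) + b'"' if needs_quotes else bytes(out)
```

Note that nothing in there knows about UTF-8 or latin1; the flag only
decides whether bytes >= 0x80 are escaped or copied as-is.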

So I think the more accurate question is: do the paths in your
repositories generally contain bytes that your terminal can interpret
sensibly?  I'd guess the answer is usually yes, even if you are using
latin1 or similar (or else "ls" would show you mojibake, too).

But there's a follow-on, too: do all the other things which consume
quoted path output likewise handle it? Setting core.quotepath will
impact all parts of Git, including plumbing. So a script that parses
diff-tree output, for example, will see a difference.
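
As an illustration of what such a script has to cope with, here is a
minimal Python sketch that undoes the C-style quoting on a single path
field. unquote_git_path() is a hypothetical helper, not something Git
ships, and it assumes the field is already isolated, that the paths
are UTF-8 by default, and that escapes are either a single named
character or three octal digits.

```python
# Hypothetical helper: undo C-style quoting on one path field.
_SIMPLE = {"a": 0x07, "b": 0x08, "f": 0x0C, "n": 0x0A, "r": 0x0D,
           "t": 0x09, "v": 0x0B, '"': 0x22, "\\": 0x5C}

def unquote_git_path(field: str, encoding: str = "utf-8") -> str:
    # Unquoted fields are returned unchanged.
    if not (len(field) >= 2 and field.startswith('"') and field.endswith('"')):
        return field
    body = field[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out += ch.encode(encoding)
            i += 1
        elif body[i + 1] in _SIMPLE:
            # Named escape such as \t or \".
            out.append(_SIMPLE[body[i + 1]])
            i += 2
        else:
            # Assumed to be a three-digit octal escape such as \321.
            out.append(int(body[i + 1:i + 4], 8))
            i += 4
    # surrogateescape keeps bytes that are not valid in `encoding`.
    return out.decode(encoding, errors="surrogateescape")
```

A script written this way sees the same filename whether or not
high-bit bytes were escaped. Asking the plumbing command for
NUL-terminated output with -z is another way to avoid depending on
the quoting at all.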

I'd guess that most text-processing tools these days are reasonably
happy with high-bit chars. But if we were to flip the default, we might
see regressions with:

  - very old / obscure systems (I'd guess even old versions of GNU tools
    are good, but who knows what Solaris sed will do)

  - some scripting languages (like perl and ruby) have internal strings
    that are encoding-aware, and so they are picky about reading
    high-bit input from a descriptor, especially if it isn't utf8.
    The fix is usually easy-ish, but may be a surprise for some folks
    (OTOH, I can imagine it fixes bugs in sloppily-written scripts which
    did not anticipate the incoming filenames being quoted ;) ).
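
To show the kind of "easy-ish" fix meant in the second point, here is
a small sketch in Python rather than perl or ruby (Python has the same
split between bytes and encoding-aware strings). The sample filename
and the helper names decode_path()/encode_path() are made up for the
example.

```python
# "café.txt" encoded as latin1: the 0xE9 byte is not valid UTF-8.
raw = b"caf\xe9.txt\n"

def decode_path(line: bytes) -> str:
    # A strict utf-8 decode would raise UnicodeDecodeError on this input.
    # surrogateescape maps each undecodable byte to a lone surrogate so
    # the original bytes can be recovered later.
    return line.rstrip(b"\n").decode("utf-8", errors="surrogateescape")

def encode_path(path: str) -> bytes:
    # Inverse of decode_path(): turns the surrogates back into raw bytes.
    return path.encode("utf-8", errors="surrogateescape")

path = decode_path(raw)
original = encode_path(path)  # same bytes the tool emitted, minus the newline
```

The point is only that the reader has to opt in to tolerating
non-utf8 bytes; a script that decodes strictly would start failing on
such a path once it is no longer escaped.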

As Git is used more and more internationally, I suspect the value of
defaulting core.quotepath=no increases. And as time goes on and people
tend to standardize on utf8-aware tools and environments, the risk of
doing so decreases. So while core.quotepath=yes was a conservative
choice in 2007, it might be time to look at switching.

> Anyway, if you prefer UTF-8 as a default,
> 
> git config --global core.quotepath yes
> 
> is your friend (like mine)

Just a nit/clarification for other readers, but I think you have yes/no
flipped here and earlier in your message.

-Peff
