git.vger.kernel.org archive mirror
* [Question] Is it normal for accented characters to be shown as decomposed Unicode on GNU/Linux?
@ 2015-06-22 13:17 Bastien Traverse
  2015-06-22 15:04 ` Charles Bailey
  0 siblings, 1 reply; 3+ messages in thread
From: Bastien Traverse @ 2015-06-22 13:17 UTC
  To: git

Hi everybody,

I have a repository where some files and folders contain accented
characters due to being in French. Such names include "rêve" (dream),
"réunion" (meeting) etc.

Whether already in version control or not, git tools only show their
*decomposed* representation (I use a UTF-8 locale, see below), but don't
accept those representations as input (and auto-completion is broken for
those), which is a bit misleading (test case follows).

I've seen the threads about accented characters on OSX and the use of
'core.precomposeunicode', but as I'm running on GNU/Linux I thought this
shouldn't apply.

Since I've already had a problem in git with a weirdly encoded character
(see http://thread.gmane.org/gmane.comp.version-control.git/269710), I
wanted to get some feedback to determine whether my setup was the cause
of it or if it was normal to see decomposed file names in git. I found
in man git-status:

> If a filename contains whitespace or other nonprintable
> characters, that field will be quoted in the manner of a C string
> literal: surrounded by ASCII double quote (34) characters, and with
> interior special characters backslash-escaped.

So does everybody using accented characters see them in decomposed form
in git? And if so, why doesn't software built on top of it (like gitit [1])
inherit those decomposed representations?

[1] http://gitit.net/

Thanks!

---
test case:
$ mkdir accent-test && cd !$
$ git init
$ touch rêve réunion
$ git status
On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	"r\303\251union"
	"r\303\252ve"
$ git add .
$ git commit -m "accent test"
[master (root commit) 0d776b7] accent test
 2 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 "r\303\251union"
 create mode 100644 "r\303\252ve"
$ git log --summary
commit 0d776b7a09d5384a76066999431507018e292efe
Author: Bastien Traverse <bastien@traverse.email>
Date:   2015-06-22 14:13:46 +0200

    accent test

 create mode 100644 "r\303\251union"
 create mode 100644 "r\303\252ve"
$ mv rêve reve
$ git status
On branch master
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	deleted:    "r\303\252ve"

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	reve

no changes added to commit (use "git add" and/or "git commit -a")
$ git add [[TAB-TAB]]
"r\303\252ve"  reve
$ git add "[[TAB]] --> git add "\"r\\303\\252ve\""
fatal: pathspec '"r\303\252ve"' did not match any files
$ git add "r\303\252ve"
fatal: pathspec 'r\303\252ve' did not match any files
$ git add rêve reve OR git add .
$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	renamed:    "r\303\252ve" -> reve

I'm running an up-to-date Arch Linux with the following software versions
and locale config:

$ uname -a
Linux xxx 4.0.5-1-ARCH #1 SMP PREEMPT Sat Jun 6 18:37:49 CEST 2015 x86_64 GNU/Linux
$ bash --version
GNU bash, version 4.3.39(1)-release (x86_64-unknown-linux-gnu)
$ git --version
git version 2.4.3
$ locale
LANG=fr_FR.utf8
LC_CTYPE="fr_FR.utf8"
LC_NUMERIC=fr_FR.utf8
LC_TIME=fr_FR.utf8
LC_COLLATE="fr_FR.utf8"
LC_MONETARY=fr_FR.utf8
LC_MESSAGES="fr_FR.utf8"
LC_PAPER=fr_FR.utf8
LC_NAME="fr_FR.utf8"
LC_ADDRESS="fr_FR.utf8"
LC_TELEPHONE="fr_FR.utf8"
LC_MEASUREMENT=fr_FR.utf8
LC_IDENTIFICATION="fr_FR.utf8"
LC_ALL=
$ localectl
   System Locale: LANG=fr_FR.UTF8
       VC Keymap: fr
      X11 Layout: fr
     X11 Variant: oss

Cheers


* Re: [Question] Is it normal for accented characters to be shown as decomposed Unicode on GNU/Linux?
  2015-06-22 13:17 [Question] Is it normal for accented characters to be shown as decomposed Unicode on GNU/Linux? Bastien Traverse
@ 2015-06-22 15:04 ` Charles Bailey
  2015-06-22 16:13   ` Bastien Traverse
  0 siblings, 1 reply; 3+ messages in thread
From: Charles Bailey @ 2015-06-22 15:04 UTC
  To: Bastien Traverse; +Cc: git

On Mon, Jun 22, 2015 at 03:17:40PM +0200, Bastien Traverse wrote:
> test case:
> $ mkdir accent-test && cd !$
> $ git init
> $ touch rêve réunion
> $ git status
> On branch master
> 
> Initial commit
> 
> Untracked files:
>   (use "git add <file>..." to include in what will be committed)
> 
> 	"r\303\251union"
> 	"r\303\252ve"

Note that these aren't "decomposed" (in the unicode decomposition
sense) but are merely octal escaped representations of the utf-8
encoded file names.
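
One way to check the difference, as a minimal sketch assuming a POSIX
shell with od available (the variable names are only illustrative):
printf expands \NNN octal escapes in its format string, so it can rebuild
the bytes behind the quoted name and compare them with what a genuinely
decomposed (NFD) name would contain.

```shell
# Hypothetical check, not from the thread: printf interprets \NNN octal
# escapes in its format string, so this rebuilds the bytes git displayed.
precomposed=$(printf 'r\303\252ve')

# For comparison, a truly decomposed (NFD) name would be "e" followed by
# U+0302 COMBINING CIRCUMFLEX ACCENT, i.e. the UTF-8 bytes 0xCC 0x82.
decomposed=$(printf 're\314\202ve')

# Dump both as hex: the first contains c3 aa (U+00EA, a single precomposed
# code point), the second contains 65 cc 82 (base letter plus combining mark).
printf '%s' "$precomposed" | od -An -tx1
printf '%s' "$decomposed"  | od -An -tx1
```

If the name on disk were decomposed, git would show it as
"re\314\202ve" rather than "r\303\252ve".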

My understanding is that this is normal and probably dates back (at least
for status) as far as:

	commit a734d0b10bd0f5554abb3acdf11426040cfc4df0
	Author: Dmitry Potapov <dpotapov@gmail.com>
	Date:   Fri Mar 7 05:30:58 2008 +0300

	Make private quote_path() in wt-status.c available as quote_path_relative()

	[...]

The behaviour can be changed by setting the git config variable
"core.quotePath" to false.
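
For instance, a rough sketch of the effect in a scratch repository
(the temporary directory and variable names are assumptions made for
illustration; git must be installed):

```shell
# Scratch repository; the directory is arbitrary and only used for this demo.
repo=$(mktemp -d)
cd "$repo" && git init -q

# Create "rêve" from its UTF-8 bytes so the name is precomposed for certain.
name=$(printf 'r\303\252ve')
touch "$name"

# Default behaviour (core.quotePath=true): bytes above 0x7f are shown as
# octal escapes and the whole name is wrapped in double quotes.
quoted=$(git -c core.quotePath=true status --short)

# With core.quotePath=false the UTF-8 name is printed verbatim.
plain=$(git -c core.quotePath=false status --short)
printf '%s\n%s\n' "$quoted" "$plain"

# Make the setting permanent for this repository only
# (add --global to apply it to every repository of the current user).
git config core.quotePath false
```

The -c option only overrides the variable for a single invocation, which
is handy for comparing the two outputs before changing any configuration.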


* Re: [Question] Is it normal for accented characters to be shown as decomposed Unicode on GNU/Linux?
  2015-06-22 15:04 ` Charles Bailey
@ 2015-06-22 16:13   ` Bastien Traverse
  0 siblings, 0 replies; 3+ messages in thread
From: Bastien Traverse @ 2015-06-22 16:13 UTC
  To: Charles Bailey; +Cc: git

On 22/06/2015 17:04, Charles Bailey wrote:
> Note that these aren't "decomposed" (in the unicode decomposition
> sense) but are merely octal escaped representations of the utf-8
> encoded file names.

Thanks, I had read that term in a similar context (German umlauts) and
thought it correctly described the phenomenon. The keywords "octal
escape" return more precise results :)

> My understanding is that this is normal and probably dates back (at least
> for status) as far as:
> 
> 	commit a734d0b10bd0f5554abb3acdf11426040cfc4df0
> 	Author: Dmitry Potapov <dpotapov@gmail.com>
> 	Date:   Fri Mar 7 05:30:58 2008 +0300
> 
> 	Make private quote_path() in wt-status.c available as quote_path_relative()
> 
> 	[...]
> 
> The behaviour can be changed by setting the git config variable
> "core.quotePath" to false.

This is awesome, thank you. Indeed I just tried my test case with this
config option set to false and accented characters appear normally.

Thank you!

