* [BUG REPORT] File names that contain UTF8 characters are unnecessarily escaped in 'git status .' messages @ 2021-05-26 22:47 Yuri 2021-05-26 23:32 ` Junio C Hamano 0 siblings, 1 reply; 14+ messages in thread From: Yuri @ 2021-05-26 22:47 UTC (permalink / raw) To: Git Mailing List I have the file that contains the "∞" character in its name. When this file was modified, 'git status .' showed it as: > modified: "file-name-\342\210\236.ext" It replaced the UTF8 character with its byte representation, and put the file name in quotes. git should show such files without escaping when the terminal is able to show UTF8 characters because escaping decreases readability. $ env | grep TERM COLORTERM=truecolor TERM=xterm-256color $ env | grep LANG LANG=C.UTF-8 $ env | grep CTYPE LC_CTYPE=en_US.UTF-8 Thanks, Yuri ^ permalink raw reply [flat|nested] 14+ messages in thread
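The escaped name in the report can be reproduced byte by byte: "∞" is three bytes in UTF-8, and each one is printed as a three-digit octal escape. The following is a minimal sketch only. `c_quote` is a hypothetical helper written for this illustration, not Git's implementation, and it simplifies by writing every non-printable byte as octal.

```python
def c_quote(name: str) -> str:
    """Hypothetical helper: mimic the quoting seen in the report.

    Assumption: bytes outside printable ASCII become \\ooo octal escapes,
    and the whole name is wrapped in double quotes when anything was escaped.
    """
    raw = name.encode("utf-8")
    if all(0x20 <= b < 0x7F and b not in b'"\\' for b in raw):
        return name  # plain printable ASCII is shown as-is
    out = []
    for b in raw:
        if b in b'"\\':
            out.append("\\" + chr(b))   # quote and backslash get a backslash
        elif 0x20 <= b < 0x7F:
            out.append(chr(b))          # printable ASCII passes through
        else:
            out.append(f"\\{b:03o}")    # everything else as 3-digit octal
    return '"' + "".join(out) + '"'


print(c_quote("file-name-∞.ext"))  # "file-name-\342\210\236.ext"
```

The octal values 342, 210 and 236 are the bytes 0xE2, 0x88 and 0x9E, which is the UTF-8 encoding of U+221E.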
* Re: [BUG REPORT] File names that contain UTF8 characters are unnecessarily escaped in 'git status .' messages 2021-05-26 22:47 [BUG REPORT] File names that contain UTF8 characters are unnecessarily escaped in 'git status .' messages Yuri @ 2021-05-26 23:32 ` Junio C Hamano 2021-05-26 23:41 ` Yuri 0 siblings, 1 reply; 14+ messages in thread From: Junio C Hamano @ 2021-05-26 23:32 UTC (permalink / raw) To: Yuri; +Cc: Git Mailing List Yuri <yuri@rawbw.com> writes: > I have the file that contains the "∞" character in its name. "git config core.quotepath no"? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [BUG REPORT] File names that contain UTF8 characters are unnecessarily escaped in 'git status .' messages 2021-05-26 23:32 ` Junio C Hamano @ 2021-05-26 23:41 ` Yuri 2021-05-27 4:56 ` Torsten Bögershausen 0 siblings, 1 reply; 14+ messages in thread From: Yuri @ 2021-05-26 23:41 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git Mailing List On 5/26/21 4:32 PM, Junio C Hamano wrote: > "git config core.quotepath no"? I didn't have the 'core.quotepath' value set. 'git config core.quotepath no' changed the behavior to no quoting. So it looks like the default value of 'core.quotepath' is incorrect: it should be based on terminal capabilities. Yuri ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [BUG REPORT] File names that contain UTF8 characters are unnecessarily escaped in 'git status .' messages 2021-05-26 23:41 ` Yuri @ 2021-05-27 4:56 ` Torsten Bögershausen 2021-05-27 14:02 ` Jeff King 0 siblings, 1 reply; 14+ messages in thread From: Torsten Bögershausen @ 2021-05-27 4:56 UTC (permalink / raw) To: Yuri; +Cc: Junio C Hamano, Git Mailing List On Wed, May 26, 2021 at 04:41:38PM -0700, Yuri wrote: > On 5/26/21 4:32 PM, Junio C Hamano wrote: > > "git config core.quotepath no"? > > > I didn't have the 'core.quotepath' value set. 'git config core.quotepath no' > changed the behavior to no quoting. > > So it looks like the default value of 'core.quotepath' is incorrect: it > should be based on terminal capabilities. > These are 2 different things. If you are in a project where only ASCII names are allowed (for whatever reason), you may want `git config core.quotepath no`, regardless of what the terminal can do. (Besides that, are there terminals that don't handle UTF-8 these days?) Anyway, if you prefer UTF-8 as a default, git config --global core.quotepath yes is your friend (like mine) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [BUG REPORT] File names that contain UTF8 characters are unnecessarily escaped in 'git status .' messages 2021-05-27 4:56 ` Torsten Bögershausen @ 2021-05-27 14:02 ` Jeff King 2021-05-27 20:50 ` Yuri 0 siblings, 1 reply; 14+ messages in thread From: Jeff King @ 2021-05-27 14:02 UTC (permalink / raw) To: Torsten Bögershausen; +Cc: Yuri, Junio C Hamano, Git Mailing List On Thu, May 27, 2021 at 06:56:28AM +0200, Torsten Bögershausen wrote: > On Wed, May 26, 2021 at 04:41:38PM -0700, Yuri wrote: > > On 5/26/21 4:32 PM, Junio C Hamano wrote: > > > "git config core.quotepath no"? > > > > > > I didn't have the 'core.quotepath' value set. 'git config core.quotepath no' > > changed the behavior to no quoting. > > > > So it looks like the default value of 'core.quotepath' is incorrect: it > > should be based on terminal capabilities. > > > > These are 2 different things. > If you are in a project where only ASCII names are allowed (for whatever reason), > you may want `git config core.quotepath no`, regardless of what the terminal can do. > > (Besides that, are there terminals that don't handle UTF-8 these days?) I don't think core.quotepath is just about UTF-8. It is agnostic to the encoding of the paths, so it is really a question of whether to just pass through bytes with the high bit set. So I think the more accurate question is: do the paths in your repositories generally contain bytes that your terminal can interpret sensibly? I'd guess the answer is usually yes, even if you are using latin1 or similar (or else "ls" would show you mojibake, too). But there's a follow-on, too: do all the other things which consume quoted path output likewise handle it? Setting core.quotepath will impact all parts of Git, including plumbing. So a script that parses diff-tree output, for example, will see a difference. I'd guess that most text-processing tools these days are reasonably happy with high-bit chars. 
But if we were to flip the default, we might see regressions with: - very old / obscure systems (I'd guess even old versions of GNU tools are good, but who knows what Solaris sed will do) - some scripting languages (like perl and ruby) have internal strings that are encoding-aware, and so they are picky about reading high-bit input from a descriptor, especially if it isn't utf8. The fix is usually easy-ish, but may be a surprise for some folks (OTOH, I can imagine it fixes bugs in sloppily-written scripts which did not anticipate the incoming filenames being quoted ;) ). As Git is used more and more internationally, I suspect the value of defaulting core.quotepath=no increases. And as time goes on and people tend to standardize on utf8-aware tools and environments, the risk of doing so decreases. So while core.quotepath=yes was a conservative choice in 2007, it might be time to look at switching. > Anyway, if you prefer UTF-8 as a default, > > git config --global core.quotepath yes > > is your friend (like mine) Just a nit/clarification for other readers, but I think you have yes/no flipped here and earlier in your message. -Peff ^ permalink raw reply [flat|nested] 14+ messages in thread
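The point about encoding-aware scripting languages applies to Python as well, which the thread does not mention. As a hedged sketch of the "easy-ish" kind of fix: read paths as bytes and decode them with the `surrogateescape` error handler, so names that are not valid UTF-8 still survive a round trip. The helper names and the sample file names are assumptions made for this example.

```python
def decode_path(raw: bytes) -> str:
    """Decode a path without failing on bytes that are not valid UTF-8."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_path(text: str) -> bytes:
    """Inverse of decode_path: recover the exact original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


utf8_name = "file-name-∞.ext".encode("utf-8")
latin1_name = "caf\u00e9.txt".encode("latin-1")  # b'caf\xe9.txt', invalid as UTF-8

for raw in (utf8_name, latin1_name):
    text = decode_path(raw)
    # The round trip preserves the bytes whatever the original encoding was.
    assert encode_path(text) == raw
```

A strict `latin1_name.decode("utf-8")` would raise `UnicodeDecodeError`, which is the kind of surprise described above for unquoted high-bit input.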
* Re: [BUG REPORT] File names that contain UTF8 characters are unnecessarily escaped in 'git status .' messages 2021-05-27 14:02 ` Jeff King @ 2021-05-27 20:50 ` Yuri 2021-05-28 4:39 ` Bagas Sanjaya 0 siblings, 1 reply; 14+ messages in thread From: Yuri @ 2021-05-27 20:50 UTC (permalink / raw) To: Jeff King, Torsten Bögershausen; +Cc: Junio C Hamano, Git Mailing List It's not clear from the conversation if git reads terminal capabilities at all. But the default behavior, without any options set, should be to read terminal capabilities, and write non-ASCII characters verbatim when terminal supports this and escape them when terminal doesn't support them. Current default behavior appears to be to always escape non-ASCII characters. Then options can change this basic behavior according to user's choice. Yuri ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [BUG REPORT] File names that contain UTF8 characters are unnecessarily escaped in 'git status .' messages 2021-05-27 20:50 ` Yuri @ 2021-05-28 4:39 ` Bagas Sanjaya 2021-05-28 4:45 ` Yuri 0 siblings, 1 reply; 14+ messages in thread From: Bagas Sanjaya @ 2021-05-28 4:39 UTC (permalink / raw) To: Yuri, Jeff King, Torsten Bögershausen Cc: Junio C Hamano, Git Mailing List On 28/05/21 03.50, Yuri wrote: > It's not clear from the conversation if git reads terminal capabilities > at all. > > > But the default behavior, without any options set, should be to read > terminal capabilities, and write non-ASCII characters verbatim when > terminal supports this and escape them when terminal doesn't support them. > > Current default behavior appears to be to always escape non-ASCII > characters. So the current default only supports ASCII and escapes other characters, right? -- An old man doll... just what I always wanted! - Clara ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [BUG REPORT] File names that contain UTF8 characters are unnecessarily escaped in 'git status .' messages 2021-05-28 4:39 ` Bagas Sanjaya @ 2021-05-28 4:45 ` Yuri 2021-05-29 9:27 ` Torsten Bögershausen 0 siblings, 1 reply; 14+ messages in thread From: Yuri @ 2021-05-28 4:45 UTC (permalink / raw) To: Bagas Sanjaya, Jeff King, Torsten Bögershausen Cc: Junio C Hamano, Git Mailing List On 5/27/21 9:39 PM, Bagas Sanjaya wrote: > So the current default only supports ASCII and escapes other > characters, right? It appears this way. Yuri ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [BUG REPORT] File names that contain UTF8 characters are unnecessarily escaped in 'git status .' messages 2021-05-28 4:45 ` Yuri @ 2021-05-29 9:27 ` Torsten Bögershausen 2021-05-30 21:44 ` Jeff King 0 siblings, 1 reply; 14+ messages in thread From: Torsten Bögershausen @ 2021-05-29 9:27 UTC (permalink / raw) To: Yuri; +Cc: Bagas Sanjaya, Jeff King, Junio C Hamano, Git Mailing List On Thu, May 27, 2021 at 09:45:53PM -0700, Yuri wrote: > On 5/27/21 9:39 PM, Bagas Sanjaya wrote: > > So the current default is only supports ASCII, and escape other > > characters, right? > > > It appears this way. > Yes, that is how it is. After reading the wiki here: https://wiki.gentoo.org/wiki/UTF-8 (There are many other web pages as well) I am not sure that there is a reliable way for Git to detect whether the terminal is capable of handling UTF-8. This should work reliably under Linux, Windows, Mac and all the supported Unix-ish platforms. Besides that, the output of git commands can be fed into other programs via a pipe using "|" on the command line or redirected to a file. And what is a terminal? We need to consider that we run programs like `less` or `more` which need to be UTF-8 compatible. Most of them are probably UTF-8 compliant (and LANG is set to xx.UTF-8) these days. And most repositories have been fed with filenames encoded in UTF-8 as well. Having said that, the default could be switched some day in the future. Before that is "safe", there may be a transition phase, where users are warned that the default may change. Scripts calling git need to use `git -c core.quotepath=yes`, or no, whatever input they expect. Sorry for the longish answer. Changing one thing for some users may affect hundreds, thousands or millions of other users later, cause surprises, and require debugging and fixing effort. Does someone want to come up with a patch that announces a possible change? ^ permalink raw reply [flat|nested] 14+ messages in thread
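The advice that scripts should pass `git -c core.quotepath=...` explicitly can be sketched as follows. `git_argv` and `run_git` are hypothetical wrapper names chosen for this example; the only Git features assumed are the `-c name=value` option and the boolean `core.quotepath` setting discussed in this thread.

```python
import subprocess


def git_argv(args, quotepath):
    """Build a git command line that pins core.quotepath for this one call."""
    value = "true" if quotepath else "false"
    return ["git", "-c", f"core.quotepath={value}", *args]


def run_git(args, quotepath):
    """Run git and return its raw stdout bytes (needs git on PATH)."""
    result = subprocess.run(git_argv(args, quotepath),
                            check=True, stdout=subprocess.PIPE)
    return result.stdout


# The script states what it expects instead of relying on the default.
print(git_argv(["status", "--short"], quotepath=True))
```

Pinning the value per invocation means the script keeps working whether or not the default is ever switched, and whatever the user has in their own configuration.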
* Re: [BUG REPORT] File names that contain UTF8 characters are unnecessarily escaped in 'git status .' messages 2021-05-29 9:27 ` Torsten Bögershausen @ 2021-05-30 21:44 ` Jeff King 2021-05-30 21:55 ` Yuri 2021-05-30 22:23 ` Junio C Hamano 0 siblings, 2 replies; 14+ messages in thread From: Jeff King @ 2021-05-30 21:44 UTC (permalink / raw) To: Torsten Bögershausen Cc: Yuri, Bagas Sanjaya, Junio C Hamano, Git Mailing List On Sat, May 29, 2021 at 11:27:52AM +0200, Torsten Bögershausen wrote: > I am not sure that there is a reliable way for Git to detect whether the > terminal is capable of handling UTF-8. > This should work reliably under Linux, Windows, Mac and all the supported > Unix-ish platforms. Yeah, I'm not sure how such a check would be done. On most Linux systems I've seen, $LANG will mention "en_US.UTF-8" or similar. But I've no idea how portable that convention is, not to mention that people may have more complex setups anyway (e.g., not setting $LANG but setting some of LC_*). But more importantly, this is not even a UTF-8 problem. It is "can your terminal do something sensible with high-bit characters in filenames of your repositories". We don't know the encoding of those filenames (and you may even have a mix). (And likewise "terminal" here is really "whatever consumes Git's output, be it the terminal or some program you've piped to"). > Having said that, the default could be switched some day in the future. > Before that is "safe", there may be a transition phase, > where users are warned that the default may change. > Scripts calling git need to use `git -c core.quotepath=yes`, or no, > whatever input they expect. Yes. If we're going to do anything, I think it would be to say "most terminals and programs deal with high-bit characters OK these days, so switching the default is more likely to fix things than break them". I suspect most scripts would be OK either way. 
They need to handle maybe-quoted filenames already, so it is really just a question of whether the consuming program is OK with the high bits. If so, we could probably get away with just a mention in the release notes, rather than an annoying transition phase (which is likely to simply confuse most users, who are unaware of the issue entirely). But I'd feel more confident if whoever proposes such a change does some research on how piping such names into common tools and scripting languages works (both for utf8 and non-utf8 names). -Peff ^ permalink raw reply [flat|nested] 14+ messages in thread
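What "handling maybe-quoted filenames" involves on the consuming side can be sketched like this. `unquote_path` is a hypothetical helper, and the set of escapes it understands (C-style single-letter escapes plus octal) is an assumption based on the output shown in the original report, not a statement of Git's exact rules.

```python
import re

# Single-letter C escapes and the byte each one stands for (assumed set).
_SIMPLE = {b"a": b"\a", b"b": b"\b", b"t": b"\t", b"n": b"\n", b"v": b"\v",
           b"f": b"\f", b"r": b"\r", b'"': b'"', b"\\": b"\\"}
_ESCAPE = re.compile(rb"\\(?:([0-7]{1,3})|(.))", re.DOTALL)


def unquote_path(field: bytes) -> bytes:
    """Return the raw path bytes for a field that may or may not be quoted."""
    if len(field) < 2 or not (field.startswith(b'"') and field.endswith(b'"')):
        return field  # not quoted: already the raw name

    def replace(match):
        octal, letter = match.groups()
        if octal is not None:
            return bytes([int(octal, 8) & 0xFF])      # \342 -> 0xE2
        return _SIMPLE.get(letter, match.group(0))    # unknown escapes kept

    return _ESCAPE.sub(replace, field[1:-1])


quoted = b'"file-name-\\342\\210\\236.ext"'
print(unquote_path(quoted).decode("utf-8"))  # file-name-∞.ext
```

Working on bytes keeps the helper independent of the path encoding, which matches the observation that the encoding of the filenames is not known.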
* Re: [BUG REPORT] File names that contain UTF8 characters are unnecessarily escaped in 'git status .' messages 2021-05-30 21:44 ` Jeff King @ 2021-05-30 21:55 ` Yuri 2021-05-31 1:14 ` Thomas Guyot 2021-05-30 22:23 ` Junio C Hamano 1 sibling, 1 reply; 14+ messages in thread From: Yuri @ 2021-05-30 21:55 UTC (permalink / raw) To: Jeff King, Torsten Bögershausen Cc: Bagas Sanjaya, Junio C Hamano, Git Mailing List On 5/30/21 2:44 PM, Jeff King wrote: > Yeah, I'm not sure how such a check would be done. On most Linux systems > I've seen, $LANG will mention "en_US.UTF-8" or similar. But I've no idea > how portable that convention is, not to mention that people may have > more complex setups anyway (e.g., not setting $LANG but setting some of > LC_*). When 'locale charmap' prints 'UTF-8' the terminal can be assumed to be able to accept UTF-8 characters. 'locale charmap', I think, determines this only based on environment variables. Yuri ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [BUG REPORT] File names that contain UTF8 characters are unnecessarily escaped in 'git status .' messages 2021-05-30 21:55 ` Yuri @ 2021-05-31 1:14 ` Thomas Guyot 2021-05-31 3:35 ` Bagas Sanjaya 0 siblings, 1 reply; 14+ messages in thread From: Thomas Guyot @ 2021-05-31 1:14 UTC (permalink / raw) To: Yuri, Jeff King, Torsten Bögershausen Cc: Bagas Sanjaya, Junio C Hamano, Git Mailing List On 2021-05-30 17:55, Yuri wrote: > On 5/30/21 2:44 PM, Jeff King wrote: >> Yeah, I'm not sure how such a check would be done. On most Linux systems >> I've seen, $LANG will mention "en_US.UTF-8" or similar. But I've no idea >> how portable that convention is, not to mention that people may have >> more complex setups anyway (e.g., not setting $LANG but setting some of >> LC_*). > > > When 'locale charmap' prints 'UTF-8' the terminal can be assumed to be > able to accept UTF-8 characters. > > 'locale charmap', I think, determines this only based on environment > variables. > Hi Yuri, Even if the terminal supports UTF8, will it print it properly? The font used could have no or minimal utf8 support. Even when it's supported, some characters might look alike and this could have undesired consequences (e.g. accidentally switching from a normal space to a non-break space while renaming a file that has spaces...). I believe repos with utf8 files are rare enough and it could be left to the user to select whether to use utf8 or not... An option like "auto" or "detect" could make it automatic but I'm not convinced it should be the default. Oh, and looking at "locale charmap", it doesn't check the terminal capabilities at all - it just prints the charmap based on LC_ALL or LC_CTYPE value, or default if they're unset. It doesn't matter what terminal you're on... Regards, -- Thomas ^ permalink raw reply [flat|nested] 14+ messages in thread
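The environment-only nature of this kind of check can be illustrated with a rough sketch. `codeset_from_env` is a hypothetical function that only approximates what `locale charmap` reports: it applies the POSIX precedence LC_ALL, then LC_CTYPE, then LANG, and reads the codeset suffix from the locale name, whereas the real tool looks up the locale definition. Nothing here inspects the terminal.

```python
def codeset_from_env(env):
    """Guess the codeset from locale variables alone (no terminal involved)."""
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:  # unset or empty falls through to the next variable
            break
    else:
        return None
    # Locale names look like language[_territory][.codeset][@modifier].
    name = value.split("@", 1)[0]
    if "." not in name:
        return None  # e.g. plain "C": no codeset in the name itself
    return name.split(".", 1)[1]


# The environment from the original report.
print(codeset_from_env({"LANG": "C.UTF-8", "LC_CTYPE": "en_US.UTF-8"}))  # UTF-8
```

A result of UTF-8 therefore says what the user's locale claims, which is weaker than knowing what the terminal, its font, or a program at the end of a pipe can actually handle.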
* Re: [BUG REPORT] File names that contain UTF8 characters are unnecessarily escaped in 'git status .' messages 2021-05-31 1:14 ` Thomas Guyot @ 2021-05-31 3:35 ` Bagas Sanjaya 0 siblings, 0 replies; 14+ messages in thread From: Bagas Sanjaya @ 2021-05-31 3:35 UTC (permalink / raw) To: Thomas Guyot, Yuri, Jeff King, Torsten Bögershausen Cc: Junio C Hamano, Git Mailing List On 31/05/21 08.14, Thomas Guyot wrote: > Even if the terminal supports UTF8, will it print it properly? The font > used could have no or minimal utf8 support. Even when it's supported, > some characters might look alike and this could have undesired > consequences (e.g. accidentally switching from a normal space to a > non-break space while renaming a file that has spaces...). On Linux distributions, Noto and DejaVu fonts are often installed as default fonts, because Noto has almost complete Unicode coverage and DejaVu Mono has become the go-to monospace font. And yeah, we steer clear of using non-monospace fonts (either serif or sans serif), because many terminal-only programs depend on text alignment, which often can be achieved only with monospace fonts, and reading text on a terminal screen is vertically oriented, as opposed to horizontally oriented text like books. -- An old man doll... just what I always wanted! - Clara ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [BUG REPORT] File names that contain UTF8 characters are unnecessarily escaped in 'git status .' messages 2021-05-30 21:44 ` Jeff King 2021-05-30 21:55 ` Yuri @ 2021-05-30 22:23 ` Junio C Hamano 1 sibling, 0 replies; 14+ messages in thread From: Junio C Hamano @ 2021-05-30 22:23 UTC (permalink / raw) To: Jeff King Cc: Torsten Bögershausen, Yuri, Bagas Sanjaya, Git Mailing List Jeff King <peff@peff.net> writes: > Yes. If we're going to do anything, I think it would be to say "most > terminals and programs deal with high-bit characters OK these days, so > switching the default is more likely to fix things than break them". Amen to that. The conservative setting was from v1.5.3 days in 2007, and it would be highly disappointing if the situation hasn't changed in the 14 years. ^ permalink raw reply [flat|nested] 14+ messages in thread