git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Let's use the Win32 API more precisely
@ 2019-06-27  9:37 Johannes Schindelin via GitGitGadget
  2019-06-27  9:37 ` [PATCH 1/2] mingw: get pw_name in UTF-8 format Johannes Schindelin via GitGitGadget
  2019-06-27  9:37 ` [PATCH 2/2] mingw: use Unicode functions explicitly Johannes Schindelin via GitGitGadget
  0 siblings, 2 replies; 8+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-06-27  9:37 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

For many Win32 functions, there actually exist two variants: one that takes 
const char * ("ANSI", meaning the current code page) and wchar_t * 
("Unicode", i.e. UTF-16, at least for all practical matters).

These functions have "A" and "W" suffixes, respectively, e.g. 
GetFileAttributesW(). The symbols without this suffix are #defined to the 
*W() versions if the constant UNICODE is defined before including the
Windows headers, and to *A() otherwise.

Let's not rely on this constant, but explicitly say what we want: we want
the Unicode versions, as they seem to be used by the ANSI flavor anyway.

Johannes Schindelin (2):
  mingw: get pw_name in UTF-8 format
  mingw: use Unicode functions explicitly

 compat/mingw.c     | 12 +++++++++---
 compat/poll/poll.c |  2 +-
 compat/winansi.c   | 10 ++++++----
 3 files changed, 16 insertions(+), 8 deletions(-)


base-commit: aa25c82427ae70aebf3b8f970f2afd54e9a2a8c6
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-147%2Fdscho%2Fansi-unicode-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-147/dscho/ansi-unicode-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/147
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] mingw: get pw_name in UTF-8 format
  2019-06-27  9:37 [PATCH 0/2] Let's use the Win32 API more precisely Johannes Schindelin via GitGitGadget
@ 2019-06-27  9:37 ` Johannes Schindelin via GitGitGadget
  2019-07-04 20:13   ` Beat Bolli
  2019-06-27  9:37 ` [PATCH 2/2] mingw: use Unicode functions explicitly Johannes Schindelin via GitGitGadget
  1 sibling, 1 reply; 8+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-06-27  9:37 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Previously, we would have obtained the user name encoded in whatever the
current code page is.

Note: the "user name" here does not denote the full name but instead the
short logon name.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 compat/mingw.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 9b6d2400e1..8526876262 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -1946,13 +1946,19 @@ struct passwd *getpwuid(int uid)
 	static unsigned initialized;
 	static char user_name[100];
 	static struct passwd *p;
+	wchar_t buf[100];
 	DWORD len;
 
 	if (initialized)
 		return p;
 
-	len = sizeof(user_name);
-	if (!GetUserName(user_name, &len)) {
+	len = sizeof(buf);
+	if (!GetUserNameW(buf, &len)) {
+		initialized = 1;
+		return NULL;
+	}
+
+	if (xwcstoutf(user_name, buf, sizeof(user_name)) < 0) {
 		initialized = 1;
 		return NULL;
 	}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] mingw: use Unicode functions explicitly
  2019-06-27  9:37 [PATCH 0/2] Let's use the Win32 API more precisely Johannes Schindelin via GitGitGadget
  2019-06-27  9:37 ` [PATCH 1/2] mingw: get pw_name in UTF-8 format Johannes Schindelin via GitGitGadget
@ 2019-06-27  9:37 ` Johannes Schindelin via GitGitGadget
  2019-06-27 17:26   ` Eric Sunshine
  1 sibling, 1 reply; 8+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-06-27  9:37 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Many Win32 API functions actually exist in two variants: one with
the `A` suffix that takes ANSI parameters (`char *` or `const char *`)
and one with the `W` suffix that takes Unicode parameters (`wchar_t *`
or `const wchar_t *`).

The ANSI variant assumes that the strings are encoded according to
whatever is the current locale. This is not what Git wants to use on
Windows: we assume that `char *` variables point to strings encoded in
UTF-8.

There is a pseudo UTF-8 locale on Windows, but it does not work
as one might expect. In addition, if we overrode the user's locale, that
would modify the behavior of programs spawned by Git (such as editors,
difftools, etc), therefore we cannot use that pseudo locale.

Further, it is actually highly encouraged to use the Unicode versions
instead of the ANSI versions, so let's do precisely that.

Note: when calling the Win32 API functions _without_ any suffix, it
depends whether the `UNICODE` constant is defined before the relevant
headers are #include'd. Without that constant, the ANSI variants are
used. Let's be explicit and avoid that ambiguity.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 compat/mingw.c     |  2 +-
 compat/poll/poll.c |  2 +-
 compat/winansi.c   | 10 ++++++----
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 8526876262..b8a62bf914 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -1407,7 +1407,7 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **deltaen
 	do_unset_environment_variables();
 
 	/* Determine whether or not we are associated to a console */
-	cons = CreateFile("CONOUT$", GENERIC_WRITE,
+	cons = CreateFileW(L"CONOUT$", GENERIC_WRITE,
 			FILE_SHARE_WRITE, NULL, OPEN_EXISTING,
 			FILE_ATTRIBUTE_NORMAL, NULL);
 	if (cons == INVALID_HANDLE_VALUE) {
diff --git a/compat/poll/poll.c b/compat/poll/poll.c
index 4459408c7d..8f24b80252 100644
--- a/compat/poll/poll.c
+++ b/compat/poll/poll.c
@@ -150,7 +150,7 @@ win32_compute_revents (HANDLE h, int *p_sought)
       if (!once_only)
 	{
 	  NtQueryInformationFile = (PNtQueryInformationFile)
-	    GetProcAddress (GetModuleHandle ("ntdll.dll"),
+	    GetProcAddress (GetModuleHandleW (L"ntdll.dll"),
 			    "NtQueryInformationFile");
 	  once_only = TRUE;
 	}
diff --git a/compat/winansi.c b/compat/winansi.c
index f4f08237f9..cd947e048e 100644
--- a/compat/winansi.c
+++ b/compat/winansi.c
@@ -599,7 +599,7 @@ int winansi_isatty(int fd)
 void winansi_init(void)
 {
 	int con1, con2;
-	char name[32];
+	wchar_t name[32];
 
 	/* check if either stdout or stderr is a console output screen buffer */
 	con1 = is_console(1);
@@ -619,13 +619,15 @@ void winansi_init(void)
 	}
 
 	/* create a named pipe to communicate with the console thread */
-	xsnprintf(name, sizeof(name), "\\\\.\\pipe\\winansi%lu", GetCurrentProcessId());
-	hwrite = CreateNamedPipe(name, PIPE_ACCESS_OUTBOUND,
+	if (swprintf(name, ARRAY_SIZE(name) - 1, L"\\\\.\\pipe\\winansi%lu",
+		     GetCurrentProcessId()) < 0)
+		die("Could not initialize winansi pipe name");
+	hwrite = CreateNamedPipeW(name, PIPE_ACCESS_OUTBOUND,
 		PIPE_TYPE_BYTE | PIPE_WAIT, 1, BUFFER_SIZE, 0, 0, NULL);
 	if (hwrite == INVALID_HANDLE_VALUE)
 		die_lasterr("CreateNamedPipe failed");
 
-	hread = CreateFile(name, GENERIC_READ, 0, NULL, OPEN_EXISTING, 0, NULL);
+	hread = CreateFileW(name, GENERIC_READ, 0, NULL, OPEN_EXISTING, 0, NULL);
 	if (hread == INVALID_HANDLE_VALUE)
 		die_lasterr("CreateFile for named pipe failed");
 
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] mingw: use Unicode functions explicitly
  2019-06-27  9:37 ` [PATCH 2/2] mingw: use Unicode functions explicitly Johannes Schindelin via GitGitGadget
@ 2019-06-27 17:26   ` Eric Sunshine
  2019-06-27 18:54     ` Johannes Schindelin
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Sunshine @ 2019-06-27 17:26 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: Git List, Junio C Hamano, Johannes Schindelin

On Thu, Jun 27, 2019 at 5:37 AM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> Many Win32 API functions actually exist in two variants: one with
> the `A` suffix that takes ANSI parameters (`char *` or `const char *`)
> and one with the `W` suffix that takes Unicode parameters (`wchar_t *`
> or `const wchar_t *`).
>
> The ANSI variant assumes that the strings are encoded according to
> whatever is the current locale. This is not what Git wants to use on
> Windows: we assume that `char *` variables point to strings encoded in
> UTF-8.
>
> There is a pseudo UTF-8 locale on Windows, but it does not work
> as one might expect.

What does "does not work as one might expect" mean? The reader is left
hanging, not knowing why or how the UTF-8 locale on Windows is
undesirable.

> In addition, if we overrode the user's locale, that
> would modify the behavior of programs spawned by Git (such as editors,
> difftools, etc), therefore we cannot use that pseudo locale.
>
> Further, it is actually highly encouraged to use the Unicode versions
> instead of the ANSI versions, so let's do precisely that.
>
> Note: when calling the Win32 API functions _without_ any suffix, it
> depends whether the `UNICODE` constant is defined before the relevant
> headers are #include'd. Without that constant, the ANSI variants are
> used. Let's be explicit and avoid that ambiguity.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] mingw: use Unicode functions explicitly
  2019-06-27 17:26   ` Eric Sunshine
@ 2019-06-27 18:54     ` Johannes Schindelin
  0 siblings, 0 replies; 8+ messages in thread
From: Johannes Schindelin @ 2019-06-27 18:54 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Johannes Schindelin via GitGitGadget, Git List, Junio C Hamano

Hi Eric,

On Thu, 27 Jun 2019, Eric Sunshine wrote:

> On Thu, Jun 27, 2019 at 5:37 AM Johannes Schindelin via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > Many Win32 API functions actually exist in two variants: one with
> > the `A` suffix that takes ANSI parameters (`char *` or `const char *`)
> > and one with the `W` suffix that takes Unicode parameters (`wchar_t *`
> > or `const wchar_t *`).
> >
> > The ANSI variant assumes that the strings are encoded according to
> > whatever is the current locale. This is not what Git wants to use on
> > Windows: we assume that `char *` variables point to strings encoded in
> > UTF-8.
> >
> > There is a pseudo UTF-8 locale on Windows, but it does not work
> > as one might expect.
>
> What does "does not work as one might expect" mean? The reader is left
> hanging, not knowing why or how the UTF-8 locale on Windows is
> undesirable.

Should I really bore the reader with half-details? At least one of those
behaviors was reported as an unbootable Windows 7, but some others could
not reproduce.

I really did not want to write a novel here that is 1) largely irrelevant
and 2) would take way too long to write because I forgot most of the
details and would have to look them up again.

Hopefully we can do without the full story here, as even one report that
it does not work as expected is enough to make this an unfeasible option
for us, and that's that as far as I am concerned.

Ciao,
Dscho

>
> > In addition, if we overrode the user's locale, that
> > would modify the behavior of programs spawned by Git (such as editors,
> > difftools, etc), therefore we cannot use that pseudo locale.
> >
> > Further, it is actually highly encouraged to use the Unicode versions
> > instead of the ANSI versions, so let's do precisely that.
> >
> > Note: when calling the Win32 API functions _without_ any suffix, it
> > depends whether the `UNICODE` constant is defined before the relevant
> > headers are #include'd. Without that constant, the ANSI variants are
> > used. Let's be explicit and avoid that ambiguity.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] mingw: get pw_name in UTF-8 format
  2019-06-27  9:37 ` [PATCH 1/2] mingw: get pw_name in UTF-8 format Johannes Schindelin via GitGitGadget
@ 2019-07-04 20:13   ` Beat Bolli
  2019-07-04 21:52     ` Johannes Schindelin
  0 siblings, 1 reply; 8+ messages in thread
From: Beat Bolli @ 2019-07-04 20:13 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget, git
  Cc: Junio C Hamano, Johannes Schindelin

On 27.06.19 11:37, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> Previously, we would have obtained the user name encoded in whatever the
> current code page is.
> 
> Note: the "user name" here does not denote the full name but instead the
> short logon name.
> 
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  compat/mingw.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/compat/mingw.c b/compat/mingw.c
> index 9b6d2400e1..8526876262 100644
> --- a/compat/mingw.c
> +++ b/compat/mingw.c
> @@ -1946,13 +1946,19 @@ struct passwd *getpwuid(int uid)
>  	static unsigned initialized;
>  	static char user_name[100];
>  	static struct passwd *p;
> +	wchar_t buf[100];
>  	DWORD len;
>  
>  	if (initialized)
>  		return p;
>  
> -	len = sizeof(user_name);
> -	if (!GetUserName(user_name, &len)) {
> +	len = sizeof(buf);

I think this should be "len = sizeof(buf) / sizeof(buf[0])".

GetUserNameW() takes the number of characters, not bytes.

> +	if (!GetUserNameW(buf, &len)) {
> +		initialized = 1;
> +		return NULL;
> +	}
> +
> +	if (xwcstoutf(user_name, buf, sizeof(user_name)) < 0) {
>  		initialized = 1;
>  		return NULL;
>  	}
> 

Cheers,
Beat

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] mingw: get pw_name in UTF-8 format
  2019-07-04 20:13   ` Beat Bolli
@ 2019-07-04 21:52     ` Johannes Schindelin
  2019-07-04 21:54       ` Beat Bolli
  0 siblings, 1 reply; 8+ messages in thread
From: Johannes Schindelin @ 2019-07-04 21:52 UTC (permalink / raw)
  To: Beat Bolli; +Cc: Johannes Schindelin via GitGitGadget, git, Junio C Hamano

Hi Beat,

On Thu, 4 Jul 2019, Beat Bolli wrote:

> On 27.06.19 11:37, Johannes Schindelin via GitGitGadget wrote:
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > Previously, we would have obtained the user name encoded in whatever the
> > current code page is.
> >
> > Note: the "user name" here does not denote the full name but instead the
> > short logon name.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >  compat/mingw.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/compat/mingw.c b/compat/mingw.c
> > index 9b6d2400e1..8526876262 100644
> > --- a/compat/mingw.c
> > +++ b/compat/mingw.c
> > @@ -1946,13 +1946,19 @@ struct passwd *getpwuid(int uid)
> >  	static unsigned initialized;
> >  	static char user_name[100];
> >  	static struct passwd *p;
> > +	wchar_t buf[100];
> >  	DWORD len;
> >
> >  	if (initialized)
> >  		return p;
> >
> > -	len = sizeof(user_name);
> > -	if (!GetUserName(user_name, &len)) {
> > +	len = sizeof(buf);
>
> I think this should be "len = sizeof(buf) / sizeof(buf[0])".
>
> GetUserNameW() takes the number of characters, not bytes.

Good catch.

How about I use `ARRAY_SIZE(buf)`? Since this is already in `next`, I will
prepare a follow-up patch.

Ciao,
Dscho

>
> > +	if (!GetUserNameW(buf, &len)) {
> > +		initialized = 1;
> > +		return NULL;
> > +	}
> > +
> > +	if (xwcstoutf(user_name, buf, sizeof(user_name)) < 0) {
> >  		initialized = 1;
> >  		return NULL;
> >  	}
> >
>
> Cheers,
> Beat
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] mingw: get pw_name in UTF-8 format
  2019-07-04 21:52     ` Johannes Schindelin
@ 2019-07-04 21:54       ` Beat Bolli
  0 siblings, 0 replies; 8+ messages in thread
From: Beat Bolli @ 2019-07-04 21:54 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Johannes Schindelin via GitGitGadget, git, Junio C Hamano

Hi Dscho

On 04.07.19 23:52, Johannes Schindelin wrote:
> Hi Beat,
> 
> On Thu, 4 Jul 2019, Beat Bolli wrote:
> 
>> On 27.06.19 11:37, Johannes Schindelin via GitGitGadget wrote:
>>> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>>>
>>> Previously, we would have obtained the user name encoded in whatever the
>>> current code page is.
>>>
>>> Note: the "user name" here does not denote the full name but instead the
>>> short logon name.
>>>
>>> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>>> ---
>>>  compat/mingw.c | 10 ++++++++--
>>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/compat/mingw.c b/compat/mingw.c
>>> index 9b6d2400e1..8526876262 100644
>>> --- a/compat/mingw.c
>>> +++ b/compat/mingw.c
>>> @@ -1946,13 +1946,19 @@ struct passwd *getpwuid(int uid)
>>>  	static unsigned initialized;
>>>  	static char user_name[100];
>>>  	static struct passwd *p;
>>> +	wchar_t buf[100];
>>>  	DWORD len;
>>>
>>>  	if (initialized)
>>>  		return p;
>>>
>>> -	len = sizeof(user_name);
>>> -	if (!GetUserName(user_name, &len)) {
>>> +	len = sizeof(buf);
>>
>> I think this should be "len = sizeof(buf) / sizeof(buf[0])".
>>
>> GetUserNameW() takes the number of characters, not bytes.
> 
> Good catch.
> 
> How about I use `ARRAY_SIZE(buf)`? Since this is already in `next`, I will
> prepare a follow-up patch.

Of course, ARRAY_SIZE() is even better. I didn't remember it existed.

Cheers,
Beat

> 
> Ciao,
> Dscho
> 
>>
>>> +	if (!GetUserNameW(buf, &len)) {
>>> +		initialized = 1;
>>> +		return NULL;
>>> +	}
>>> +
>>> +	if (xwcstoutf(user_name, buf, sizeof(user_name)) < 0) {
>>>  		initialized = 1;
>>>  		return NULL;
>>>  	}
>>>
>>
>> Cheers,
>> Beat
>>


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-07-04 21:54 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-27  9:37 [PATCH 0/2] Let's use the Win32 API more precisely Johannes Schindelin via GitGitGadget
2019-06-27  9:37 ` [PATCH 1/2] mingw: get pw_name in UTF-8 format Johannes Schindelin via GitGitGadget
2019-07-04 20:13   ` Beat Bolli
2019-07-04 21:52     ` Johannes Schindelin
2019-07-04 21:54       ` Beat Bolli
2019-06-27  9:37 ` [PATCH 2/2] mingw: use Unicode functions explicitly Johannes Schindelin via GitGitGadget
2019-06-27 17:26   ` Eric Sunshine
2019-06-27 18:54     ` Johannes Schindelin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).