* Need multibyte advice - Shift-JIS
@ 2019-02-27 13:04 Randall S. Becker
  2019-02-27 14:08 ` Michal Suchánek
  0 siblings, 1 reply; 18+ messages in thread
From: Randall S. Becker @ 2019-02-27 13:04 UTC (permalink / raw)
  To: git

Hi Git Team,

I have to admit being perplexed by this one. I have been asked to support
the Shift-JIS character set in file contents, comments, and logs, for a
partner of mine. I know there are a few ways to do this, but I'm looking for
the official non-hacky way to do this. This is CLI only, and our pager,
less, does not support multi-byte, so I'm looking for options there also.

I normally do all of my UTF-16 work from a workstation via ECLIPSE, with
UTF-8 comments, so I have never really encountered this as an issue,
although our UTF-16 HTML files look pretty ugly in a diff.

The platform (NonStop) does not have a lot of UTF-16 tooling by default
(less does not support it), so I may have to write stuff, which is no issue.
We are on 2.21.0 officially as of yesterday.

Kind Regards,
Randall

-- Brief whoami:
NonStop developer since approximately 211288444200000000
UNIX developer since approximately 421664400
-- In my real life, I talk too much.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Need multibyte advice - Shift-JIS
  2019-02-27 13:04 Need multibyte advice - Shift-JIS Randall S. Becker
@ 2019-02-27 14:08 ` Michal Suchánek
  2019-02-27 15:54   ` Randall S. Becker
  0 siblings, 1 reply; 18+ messages in thread
From: Michal Suchánek @ 2019-02-27 14:08 UTC (permalink / raw)
  To: Randall S. Becker; +Cc: git

On Wed, 27 Feb 2019 08:04:08 -0500
"Randall S. Becker" <rsbecker@nexbridge.com> wrote:

> Hi Git Team,
> 
> I have to admit being perplexed by this one. I have been asked to support
> the Shift-JIS character set in file contents, comments, and logs, for a
> partner of mine. I know there are a few ways to do this, but I'm looking for
> the official non-hacky way to do this. This is CLI only, and our pager,
> less, does not support multi-byte, so I'm looking for options there also.

SJIS is about as much multibyte as UTF-8.

Why do you think less does not support it?

Last time I looked there was an SJIS locale for libc, so it is only a
matter of generating the correct locales and using them. Of course, if you
are running in UTF-8, SJIS will look like garbage.

HTH

Michal
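
The point that SJIS and UTF-8 are both multibyte, and that SJIS bytes read
as UTF-8 look like garbage, can be illustrated with a small sketch. The
sample string is an arbitrary choice for illustration and does not come
from the thread; the codec names are Python's built-in "shift_jis" and
"utf-8".

```python
# Illustrative sample text (an assumption, not taken from the thread).
text = "日本語のコメント"

sjis_bytes = text.encode("shift_jis")
utf8_bytes = text.encode("utf-8")

# Both encodings need more than one byte per character for this text.
print(len(text), len(sjis_bytes), len(utf8_bytes))

# Interpreting Shift-JIS bytes as UTF-8 produces replacement characters,
# which is the "garbage" a UTF-8 terminal or pager would show.
misread = sjis_bytes.decode("utf-8", errors="replace")
print("\ufffd" in misread)

# Decoding with the matching codec recovers the original text.
print(sjis_bytes.decode("shift_jis") == text)
```

The sketch only shows the byte-level mismatch; it says nothing about which
locales or charsets a particular build of less accepts.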


* RE: Need multibyte advice - Shift-JIS
  2019-02-27 14:08 ` Michal Suchánek
@ 2019-02-27 15:54   ` Randall S. Becker
  2019-02-27 16:11     ` Michal Suchánek
  0 siblings, 1 reply; 18+ messages in thread
From: Randall S. Becker @ 2019-02-27 15:54 UTC (permalink / raw)
  To: 'Michal Suchánek'; +Cc: git

On February 27, 2019 9:09, Michal Suchánek wrote:
> SJIS is about as much multibyte as UTF-8.
> 
> Why do you think less does not support it?
> 
> Last time I looked there was SJIS locale for libc so it is only matter of
> generating the correct locales and using them. Of course, if you are running
> in UTF-8 SJIS will look like garbage.

Sadly, I did not personally build less on this platform, and the libc used
by the platform vendor's less did not include UTF-16. cat works fine, but
the usual LESSCHARSET=utf-16 is unsupported, so I am looking for an
alternative. THAT is why I think less does not support it. Sorry, I should
have made that clearer.

cat works fine, so if I set GIT_PAGER=cat, I can at least see the diffs
cleanly in SJIS, but this partner wants a pager that is usable.
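
One possible way to keep a usable pager, sketched here under assumptions
the thread does not confirm, is to transcode the Shift-JIS output to UTF-8
before it reaches less. The function name transcode_stream and its
parameters are hypothetical, and the sketch assumes the terminal itself
can display UTF-8.

```python
import codecs


def transcode_stream(src, dst, source_encoding="shift_jis", chunk_size=4096):
    """Copy bytes from src to dst, converting source_encoding to UTF-8.

    An incremental decoder is used so that a multibyte character split
    across two chunks is still decoded correctly.
    """
    decoder = codecs.getincrementaldecoder(source_encoding)(errors="replace")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decode(chunk).encode("utf-8"))
    # Flush anything the decoder is still holding at end of input.
    dst.write(decoder.decode(b"", final=True).encode("utf-8"))
```

Saved in a script (say, a hypothetical sjis2utf8.py that calls
transcode_stream(sys.stdin.buffer, sys.stdout.buffer)), this could be
placed in front of less in the pager command, since Git passes GIT_PAGER
to the shell: GIT_PAGER='python3 sjis2utf8.py | less' with
LESSCHARSET=utf-8. Whether that combination works on this platform is
untested.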



* Re: Need multibyte advice - Shift-JIS
  2019-02-27 15:54   ` Randall S. Becker
@ 2019-02-27 16:11     ` Michal Suchánek
  2019-02-27 16:19       ` Randall S. Becker
  0 siblings, 1 reply; 18+ messages in thread
From: Michal Suchánek @ 2019-02-27 16:11 UTC (permalink / raw)
  To: Randall S. Becker; +Cc: git

On Wed, 27 Feb 2019 10:54:23 -0500
"Randall S. Becker" <rsbecker@nexbridge.com> wrote:

> On February 27, 2019 9:09, Michal Suchánek wrote:
> Sadly, I did not personally build less on this platform, and the libc used
> did not include UTF-16, on the platform vendor supplied less. cat works
> fine, but the usual LESSCHARSET=utf-16 is unsupported, so I am looking for
> an alternative. THAT is why I think less does not support it. Sorry, I
> should have made that more clear.
> 
> cat works fine, so if I set GIT_PAGER=cat, I can at least see the diffs
> cleanly in SJIS, but this partner wants a pager that is usable.
> 

So you want to use SJIS because UTF-16 is not supported. So what is the
problem with SJIS (or UTF-8 for that matter)?

Thanks

Michal


* RE: Need multibyte advice - Shift-JIS
  2019-02-27 16:11     ` Michal Suchánek
@ 2019-02-27 16:19       ` Randall S. Becker
  2019-02-27 16:28         ` Michal Suchánek
  0 siblings, 1 reply; 18+ messages in thread
From: Randall S. Becker @ 2019-02-27 16:19 UTC (permalink / raw)
  To: 'Michal Suchánek'; +Cc: git

On February 27, 2019 11:11, Michal Suchánek wrote:
> So you want to use SJIS because UTF-16 is not supported. So what is the
> problem with SJIS (or UTF-8 for that matter)?

The partner I am working with is using multi-byte SJIS, which is also not
supported by this incarnation of less. As a result, UTF-8 does not work
either in this situation. The content is definitely multi-byte. I know this
was fixed in Red Hat's less in 2016, but the fix did not make it to this
platform.



* Re: Need multibyte advice - Shift-JIS
  2019-02-27 16:19       ` Randall S. Becker
@ 2019-02-27 16:28         ` Michal Suchánek
  2019-02-27 16:33           ` Randall S. Becker
  0 siblings, 1 reply; 18+ messages in thread
From: Michal Suchánek @ 2019-02-27 16:28 UTC (permalink / raw)
  To: Randall S. Becker; +Cc: git

On Wed, 27 Feb 2019 11:19:33 -0500
"Randall S. Becker" <rsbecker@nexbridge.com> wrote:

> On February 27, 2019 11:11, Michal Suchánek wrote:
> The partner I am working with is using multi-byte SJIS, which is also not supported by this incarnation of less. As a result, UTF-8 does not work either in this situation. The content is definitely multi-byte. I know this was fixed in RedHat's Less in 2016, but did not make this platform.
> 

Both UTF-8 and SJIS are multibyte, and both are supported by less
in general. If your particular less cannot support them, then it is broken
and you should fix it or get it fixed.

HTH

Michal


* RE: Need multibyte advice - Shift-JIS
  2019-02-27 16:28         ` Michal Suchánek
@ 2019-02-27 16:33           ` Randall S. Becker
  2019-02-27 16:51             ` Michal Suchánek
  0 siblings, 1 reply; 18+ messages in thread
From: Randall S. Becker @ 2019-02-27 16:33 UTC (permalink / raw)
  To: 'Michal Suchánek'; +Cc: git

On February 27, 2019 11:29 Michal Suchánek wrote:
> Both UTF-8 and SJIS are multibyte, and both are supported by less in
> general. If your particular less cannot support them, then it is broken
> and you should fix it or get it fixed.

To be more specific, the UTF-8 implementation in less on this platform
presents the data as unusable junk, as expected. SJIS is multi-byte, but it
is not one of the allowed encodings in less. I am not empowered to "get it
fixed". Thanks for your advice.



* Re: Need multibyte advice - Shift-JIS
  2019-02-27 16:33           ` Randall S. Becker
@ 2019-02-27 16:51             ` Michal Suchánek
  2019-02-27 17:03               ` Randall S. Becker
  0 siblings, 1 reply; 18+ messages in thread
From: Michal Suchánek @ 2019-02-27 16:51 UTC (permalink / raw)
  To: Randall S. Becker; +Cc: git

On Wed, 27 Feb 2019 11:33:47 -0500
"Randall S. Becker" <rsbecker@nexbridge.com> wrote:

> On February 27, 2019 11:29 Michal Suchánek wrote:
> To be more specific, the implementation of less' UTF-8 on this platform will present the data as unusable junk as expected. SJIS is multi-byte, but is not one of the allowed encodings in less. I am not empowered to "get it fixed". Thanks for your advice.
> 

How is this 'allowed encodings in less' defined?

Michal


* RE: Need multibyte advice - Shift-JIS
  2019-02-27 16:51             ` Michal Suchánek
@ 2019-02-27 17:03               ` Randall S. Becker
  2019-02-27 17:14                 ` Michal Suchánek
  0 siblings, 1 reply; 18+ messages in thread
From: Randall S. Becker @ 2019-02-27 17:03 UTC (permalink / raw)
  To: 'Michal Suchánek'; +Cc: git

On February 27, 2019 11:52, Michal Suchánek wrote:
> How is this 'allowed encodings in less' defined?

When you run less with LESSCHARSET=encoding, if the encoding is not known,
you get the error:
invalid charset name

Having done my due diligence, I read the man page on the platform before
asking the question. It lists the following as the only allowed encodings:
ascii, iso8859, latin1, latin9, dos, IBM-1047, koi8-r, next, utf-8, windows.
The utf-8 variant does not know how to display SJIS multi-byte forms, as
on other platforms. Does that make sense now?



* Re: Need multibyte advice - Shift-JIS
  2019-02-27 17:03               ` Randall S. Becker
@ 2019-02-27 17:14                 ` Michal Suchánek
  2019-02-27 17:38                   ` Randall S. Becker
  0 siblings, 1 reply; 18+ messages in thread
From: Michal Suchánek @ 2019-02-27 17:14 UTC (permalink / raw)
  To: Randall S. Becker; +Cc: git

On Wed, 27 Feb 2019 12:03:58 -0500
"Randall S. Becker" <rsbecker@nexbridge.com> wrote:

> On February 27, 2019 11:52, Michal Suchánek wrote:
> When you run less with LESSCHARSET=encoding, if the encoding is not known, you get the error:
> invalid charset name
> 
> Doing the due diligence, I actually read the man page on the platform before asking the question, which listed the following as the only allowed encodings: ascii, iso8859, latin1, latin9, dos, IBM-1047, koi8-r, next, utf-8, windows. The utf-8 variant does not know how to display its multi-byte forms in SJIS, as with other platforms. Does that make sense now?
> 

Does the said man page also mention LESSCHARDEF or LESSOPEN?

Michal
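
For context on the LESSOPEN question: the less man page documents an
"input pipe" preprocessor, set as LESSOPEN="|command %s", whose standard
output replaces the file contents that less displays. A minimal sketch of
such a preprocessor follows. The function name preprocess and the idea of
converting Shift-JIS to UTF-8 inside it are assumptions for illustration,
not something proposed verbatim in the thread, and nothing here confirms
how the vendor-supplied less on NonStop behaves.

```python
def preprocess(path, out, source_encoding="shift_jis"):
    """Write the file at path to out as UTF-8 if it is valid Shift-JIS.

    Returns True when output was written. On a decode error nothing is
    written, so less would fall back to showing the original file.
    """
    with open(path, "rb") as handle:
        raw = handle.read()
    try:
        # Strict decoding: refuse input that is not valid in the codec.
        text = raw.decode(source_encoding)
    except UnicodeDecodeError:
        return False
    out.write(text.encode("utf-8"))
    return True
```

A wrapper script would call preprocess(sys.argv[1], sys.stdout.buffer).
Note that a pure-ASCII file also decodes as valid Shift-JIS, so this check
cannot by itself distinguish the two. The man page for GNU less also
describes a "|-" prefix that applies the preprocessor to standard input,
which is what a Git pager reads; whether this build honors it would need
to be checked.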


* RE: Need multibyte advice - Shift-JIS
  2019-02-27 17:14                 ` Michal Suchánek
@ 2019-02-27 17:38                   ` Randall S. Becker
  2019-02-27 17:50                     ` Michal Suchánek
  0 siblings, 1 reply; 18+ messages in thread
From: Randall S. Becker @ 2019-02-27 17:38 UTC (permalink / raw)
  To: 'Michal Suchánek'; +Cc: git

On February 27, 2019 12:15, Michal Suchánek wrote:
> Does the said man page also mention LESSCHARDEF or LESSOPEN?

Of course it does.



* Re: Need multibyte advice - Shift-JIS
  2019-02-27 17:38                   ` Randall S. Becker
@ 2019-02-27 17:50                     ` Michal Suchánek
  2019-02-27 17:59                       ` Randall S. Becker
  0 siblings, 1 reply; 18+ messages in thread
From: Michal Suchánek @ 2019-02-27 17:50 UTC (permalink / raw)
  To: Randall S. Becker; +Cc: git

On Wed, 27 Feb 2019 12:38:06 -0500
"Randall S. Becker" <rsbecker@nexbridge.com> wrote:

> On February 27, 2019 12:15, Michal Suchánek wrote:
> > To: Randall S. Becker <rsbecker@nexbridge.com>
> > Cc: git@vger.kernel.org
> > Subject: Re: Need multibyte advice - Shift-JIS
> > 
> > On Wed, 27 Feb 2019 12:03:58 -0500
> > "Randall S. Becker" <rsbecker@nexbridge.com> wrote:
> >   
> > > On February 27, 2019 11:52, Michal Suchánek wrote:  
> > > > To: Randall S. Becker <rsbecker@nexbridge.com>
> > > > Cc: git@vger.kernel.org
> > > > Subject: Re: Need multibyte advice - Shift-JIS
> > > >
> > > > On Wed, 27 Feb 2019 11:33:47 -0500
> > > > "Randall S. Becker" <rsbecker@nexbridge.com> wrote:
> > > >  
> > > > > On February 27, 2019 11:29 Michal Suchánek wrote:  
> > > > > > On Wed, 27 Feb 2019 11:19:33 -0500 "Randall S. Becker"
> > > > > > <rsbecker@nexbridge.com> wrote:
> > > > > >  
> > > > > > > On February 27, 2019 11:11, Michal Suchánek wrote:  
> > > > > > > > On Wed, 27 Feb 2019 10:54:23 -0500 "Randall S. Becker"
> > > > > > > > <rsbecker@nexbridge.com> wrote:
> > > > > > > >  
> > > > > > > > > On February 27, 2019 9:09, Michal Suchánek wrote:  
> > > > > > > > > > On Wed, 27 Feb 2019 08:04:08 -0500 "Randall S. Becker"
> > > > > > > > > > <rsbecker@nexbridge.com> wrote:
> > > > > > > > > >  
> > > > > > > > > > > Hi Git Team,
> > > > > > > > > > >
> > > > > > > > > > > I have to admit being perplexed by this one. I have
> > > > > > > > > > > been asked to support the Shift-JIS character set in
> > > > > > > > > > > file contents, comments, and logs, for a partner of
> > > > > > > > > > > mine. I know there are a few ways to do this, but I'm
> > > > > > > > > > > looking for the official non-hacky way  
> > > > > > to do this.  
> > > > > > > > > > > This is CLI only, and our pager, less, does not
> > > > > > > > > > > support multi-byte, so I'm looking  
> > > > > > > > > for  
> > > > > > > > > > options there also.
> > > > > > > > > >
> > > > > > > > > > SJIS is about as much multibyte as UTF-8.
> > > > > > > > > >
> > > > > > > > > > Why do you think less does not support it?
> > > > > > > > > >
> > > > > > > > > > Last time I looked there was SJIS locale for libc so it
> > > > > > > > > > is only matter of generating the correct locales and
> > > > > > > > > > using them. Of course, if you are  
> > > > > > > > > running  
> > > > > > > > > > in UTF-8 SJIS will look like garbage.  
> > > > > > > > >
> > > > > > > > > Sadly, I did not personally build less on this platform,
> > > > > > > > > and the libc used did not include UTF-16, on the platform
> > > > > > > > > vendor supplied less. cat works fine, but the usual
> > > > > > > > > LESSCHARSET=utf-16 is unsupported, so I am looking for an
> > > > > > > > > alternative. THAT is why I think less does not support it.
> > > > > > > > > Sorry, I should have made that more  
> > > > > > clear.  
> > > > > > > > >
> > > > > > > > > cat works fine, so if I set GIT_PAGER=cat, I can at least
> > > > > > > > > see the diffs cleanly in SJIS, but this partner wants a pager that is  
> > usable.  
> > > > > > > > >  
> > > > > > > >
> > > > > > > > So you want to use SJIS because UTF-16 is not supported. So
> > > > > > > > what is the problem with SJIS (or UTF-8 for that matter)?  
> > > > > > >
> > > > > > > The partner I am working with is using multi-byte SJIS, which
> > > > > > > is also not  
> > > > > > supported by this incarnation of less. As a result, UTF-8 does
> > > > > > not work either in this situation. The content is definitely multi-byte.
> > > > > > I know this was fixed in RedHat's Less in 2016, but did not make
> > > > > > this  
> > > > platform.  
> > > > > > >  
> > > > > >
> > > > > > Both UTF-8 and SJIS is multibyte and both is supported by less
> > > > > > in general. If your particular less cannot support it then it is
> > > > > > broken and you should fix it or get it fixed.  
> > > > >
> > > > > To be more specific, the implementation of less' UTF-8 on this
> > > > > platform will  
> > > > present the data as unusable junk as expected. SJIS is multi-byte,
> > > > but is not one of the allowed encodings in less. I am not empowered to  
> > "get it fixed".  
> > > > Thanks for your advice.  
> > > > >  
> > > >
> > > > How is this 'allowed encodings in less' defined?  
> > >
> > > When you run less with LESSCHARSET=encoding, if the encoding is not  
> > known, you get the error:  
> > > invalid charset name
> > >
> > > Doing the due diligence, I actually read the man page on the platform  
> > before asking the question, which listed the following as the only allowed
> > encodings: ascii, iso8859, latin1, latin9, dos, IBM-1047, koi8-r, next, utf-8,
> > windows. The utf-8 variant does not know how to display its multi-byte
> > forms in SJIS, as with other platforms. Does that make sense now?  
> > >  
> > 
> > Does the said man page also mention LESSCHARDEF or LESSOPEN?  
> 
> Of course it does.
> 

So what's the problem with displaying SJIS or even UTF-16 in less,
exactly?

Also if you really don't like less there is lv.

Michal


* RE: Need multibyte advice - Shift-JIS
  2019-02-27 17:50                     ` Michal Suchánek
@ 2019-02-27 17:59                       ` Randall S. Becker
  2019-02-27 18:18                         ` Michal Suchánek
  0 siblings, 1 reply; 18+ messages in thread
From: Randall S. Becker @ 2019-02-27 17:59 UTC (permalink / raw)
  To: 'Michal Suchánek'; +Cc: git

On February 27, 2019 12:51, Michal Suchánek wrote:
> To: Randall S. Becker <rsbecker@nexbridge.com>
> Cc: git@vger.kernel.org
> Subject: Re: Need multibyte advice - Shift-JIS
> 
> On Wed, 27 Feb 2019 12:38:06 -0500
> "Randall S. Becker" <rsbecker@nexbridge.com> wrote:
> 
> > On February 27, 2019 12:15, Michal Suchánek wrote:
> > > To: Randall S. Becker <rsbecker@nexbridge.com>
> > > Cc: git@vger.kernel.org
> > > Subject: Re: Need multibyte advice - Shift-JIS
> > >
> > > On Wed, 27 Feb 2019 12:03:58 -0500
> > > "Randall S. Becker" <rsbecker@nexbridge.com> wrote:
> > >
> > > > On February 27, 2019 11:52, Michal Suchánek wrote:
> > > > > To: Randall S. Becker <rsbecker@nexbridge.com>
> > > > > Cc: git@vger.kernel.org
> > > > > Subject: Re: Need multibyte advice - Shift-JIS
> > > > >
> > > > > On Wed, 27 Feb 2019 11:33:47 -0500 "Randall S. Becker"
> > > > > <rsbecker@nexbridge.com> wrote:
> > > > >
> > > > > > On February 27, 2019 11:29 Michal Suchánek wrote:
> > > > > > > On Wed, 27 Feb 2019 11:19:33 -0500 "Randall S. Becker"
> > > > > > > <rsbecker@nexbridge.com> wrote:
> > > > > > >
> > > > > > > > On February 27, 2019 11:11, Michal Suchánek wrote:
> > > > > > > > > On Wed, 27 Feb 2019 10:54:23 -0500 "Randall S. Becker"
> > > > > > > > > <rsbecker@nexbridge.com> wrote:
> > > > > > > > >
> > > > > > > > > > On February 27, 2019 9:09, Michal Suchánek wrote:
> > > > > > > > > > > On Wed, 27 Feb 2019 08:04:08 -0500 "Randall S. Becker"
> > > > > > > > > > > <rsbecker@nexbridge.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Git Team,
> > > > > > > > > > > >
> > > > > > > > > > > > I have to admit being perplexed by this one. I
> > > > > > > > > > > > have been asked to support the Shift-JIS character
> > > > > > > > > > > > set in file contents, comments, and logs, for a
> > > > > > > > > > > > partner of mine. I know there are a few ways to do
> > > > > > > > > > > > this, but I'm looking for the official non-hacky
> > > > > > > > > > > > way
> > > > > > > to do this.
> > > > > > > > > > > > This is CLI only, and our pager, less, does not
> > > > > > > > > > > > support multi-byte, so I'm looking
> > > > > > > > > > for
> > > > > > > > > > > options there also.
> > > > > > > > > > >
> > > > > > > > > > > SJIS is about as much multibyte as UTF-8.
> > > > > > > > > > >
> > > > > > > > > > > Why do you think less does not support it?
> > > > > > > > > > >
> > > > > > > > > > > Last time I looked there was SJIS locale for libc so
> > > > > > > > > > > it is only matter of generating the correct locales
> > > > > > > > > > > and using them. Of course, if you are
> > > > > > > > > > running
> > > > > > > > > > > in UTF-8 SJIS will look like garbage.
> > > > > > > > > >
> > > > > > > > > > Sadly, I did not personally build less on this
> > > > > > > > > > platform, and the libc used did not include UTF-16, on
> > > > > > > > > > the platform vendor supplied less. cat works fine, but
> > > > > > > > > > the usual
> > > > > > > > > > LESSCHARSET=utf-16 is unsupported, so I am looking for
> > > > > > > > > > an alternative. THAT is why I think less does not support it.
> > > > > > > > > > Sorry, I should have made that more
> > > > > > > clear.
> > > > > > > > > >
> > > > > > > > > > cat works fine, so if I set GIT_PAGER=cat, I can at
> > > > > > > > > > least see the diffs cleanly in SJIS, but this partner
> > > > > > > > > > wants a pager that is
> > > usable.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > So you want to use SJIS because UTF-16 is not supported.
> > > > > > > > > So what is the problem with SJIS (or UTF-8 for that matter)?
> > > > > > > >
> > > > > > > > The partner I am working with is using multi-byte SJIS,
> > > > > > > > which is also not
> > > > > > > supported by this incarnation of less. As a result, UTF-8
> > > > > > > does not work either in this situation. The content is definitely
> multi-byte.
> > > > > > > I know this was fixed in RedHat's Less in 2016, but did not
> > > > > > > make this
> > > > > platform.
> > > > > > > >
> > > > > > >
> > > > > > > Both UTF-8 and SJIS is multibyte and both is supported by
> > > > > > > less in general. If your particular less cannot support it
> > > > > > > then it is broken and you should fix it or get it fixed.
> > > > > >
> > > > > > To be more specific, the implementation of less' UTF-8 on this
> > > > > > platform will
> > > > > present the data as unusable junk as expected. SJIS is
> > > > > multi-byte, but is not one of the allowed encodings in less. I
> > > > > am not empowered to
> > > "get it fixed".
> > > > > Thanks for your advice.
> > > > > >
> > > > >
> > > > > How is this 'allowed encodings in less' defined?
> > > >
> > > > When you run less with LESSCHARSET=encoding, if the encoding is
> > > > not
> > > known, you get the error:
> > > > invalid charset name
> > > >
> > > > Doing the due diligence, I actually read the man page on the
> > > > platform
> > > before asking the question, which listed the following as the only
> > > allowed
> > > encodings: ascii, iso8859, latin1, latin9, dos, IBM-1047, koi8-r,
> > > next, utf-8, windows. The utf-8 variant does not know how to display
> > > its multi-byte forms in SJIS, as with other platforms. Does that make
> sense now?
> > > >
> > >
> > > Does the said man page also mention LESSCHARDEF or LESSOPEN?
> >
> > Of course it does.
> >
> 
> So what's the problem with displaying SJIS or even UTF-16 in less, exactly?
> 
> Also if you really don't like less there is lv.

I'm sorry if I was not clear about all this. NonStop is not a Linux platform. It is POSIX. Not all utilities are available and not all utilities have all capabilities. lv is not available for the platform. less considers the data binary and displays what usually is displayed when you try to use it for binary multibyte. You get @^@- and such. It does not present the data in the correct character set for the user.

This was only one part of my original question. I am searching elsewhere for support on pagers, because this really is not an appropriate discussion for the git group to focus on, so let's drop this, please, as not worth continuing. My original request was more about how to set up the file attributes, difference engine, and the rest of the git infrastructure. The partner I am working with is doing this with git hooks, which I am not really happy about. Let's prune this discussion as not worthy.



* Re: Need multibyte advice - Shift-JIS
  2019-02-27 17:59                       ` Randall S. Becker
@ 2019-02-27 18:18                         ` Michal Suchánek
  2019-02-27 18:50                           ` Randall S. Becker
  0 siblings, 1 reply; 18+ messages in thread
From: Michal Suchánek @ 2019-02-27 18:18 UTC (permalink / raw)
  To: Randall S. Becker; +Cc: git

On Wed, 27 Feb 2019 12:59:15 -0500
"Randall S. Becker" <rsbecker@nexbridge.com> wrote:

> On February 27, 2019 12:51, Michal Suchánek wrote:
> > To: Randall S. Becker <rsbecker@nexbridge.com>
> > Cc: git@vger.kernel.org
> > Subject: Re: Need multibyte advice - Shift-JIS
> > 

> I'm sorry if I was not clear about all this. NonStop is not a Linux platform. It is POSIX. Not all utilities are available and not all utilities have all capabilities. lv is not available for the platform. less considers the data binary and displays what usually is displayed when you try to use it for binary multibyte. You get @^@- and such. It does not present the data in the correct character set for the user.
> 
> This was only one part of my original question. I am searching elsewhere for support on pagers, because this really is not an appropriate discussion for the git group to focus on, so let's drop this, please, as not worth continuing. My original request was more about how to set up the file attributes, difference engine, and the rest of the git infrastructure. The partner I am working with is doing this with git hooks, which I am not really happy about. Let's prune this discussion as not worthy.
> 

Yes, this is totally unclear. Setting git hooks is possible but setting
LESSCHARDEF is not?

Is patching git acceptable or is that out of the question too?

What are your requirements, exactly?

If your data is in fact UTF-16 how is SJIS involved?

Michal


* RE: Need multibyte advice - Shift-JIS
  2019-02-27 18:18                         ` Michal Suchánek
@ 2019-02-27 18:50                           ` Randall S. Becker
  2019-02-27 18:59                             ` Michal Suchánek
  2019-02-27 19:36                             ` Johannes Sixt
  0 siblings, 2 replies; 18+ messages in thread
From: Randall S. Becker @ 2019-02-27 18:50 UTC (permalink / raw)
  To: 'Michal Suchánek'; +Cc: git

On February 27, 2019 13:18, Michal Suchánek wrote:
> On Wed, 27 Feb 2019 12:59:15 -0500
> "Randall S. Becker" <rsbecker@nexbridge.com> wrote:
> 
> > On February 27, 2019 12:51, Michal Suchánek wrote:
> > > To: Randall S. Becker <rsbecker@nexbridge.com>
> > > Cc: git@vger.kernel.org
> > > Subject: Re: Need multibyte advice - Shift-JIS
> > >
> 
> > I'm sorry if I was not clear about all this. NonStop is not a Linux platform. It
> is POSIX. Not all utilities are available and not all utilities have all capabilities.
> lv is not available for the platform. less considers the data binary and displays
> what usually is displayed when you try to use it for binary multibyte. You get
> @^@- and such. It does not present the data in the correct character set for
> the user.
> >
> > This was only one part of my original question. I am searching elsewhere
> for support on pagers, because this really is not an appropriate discussion for
> the git group to focus on, so let's drop this, please, as not worth continuing.
> My original request was more about how to set up the file attributes,
> difference engine, and the rest of the git infrastructure. The partner I am
> working with is doing this with git hooks, which I am not really happy about.
> Let's prune this discussion as not worthy.
> >
> 
> Yes, this is totally unclear. Setting git hooks is possible but setting
> LESSCHARDEF is not?
It can enter the environment simply through .profile, where we can change GIT_PAGER. We have established that separately.

> Is patching git acceptable or is that out of the question too?
I have done a bunch of git patching. Where, specifically?

> What are your requirements, exactly?
Source code and comments contain SJIS content. The requirement is to be able to move seamlessly in and out of git, and have git show/diff/log display SJIS as well as ASCII content. How that happens is open. UTF-16 is a red herring; it was only an attempt to get at the SJIS content in a way that avoids the limitation imposed by less.
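
For reference, the two usual ways of pointing git at a different pager can be sketched as follows. This is only a sketch under assumptions: it presumes a POSIX shell and that cat is an acceptable stand-in, as mentioned earlier in the thread. GIT_PAGER and core.pager are standard git settings; the scratch file path is made up for illustration.

```shell
# Lines that could go in ~/.profile: use cat as git's pager for this login.
GIT_PAGER=cat
export GIT_PAGER

# Alternative: record the choice in a git config file instead of the environment.
# A scratch file (hypothetical path) is used so no real configuration is touched.
scratch_config="${TMPDIR:-/tmp}/pager-sketch.$$"
git config --file "$scratch_config" core.pager cat
git config --file "$scratch_config" core.pager   # prints the stored value
```

When both are present, git consults GIT_PAGER before core.pager, so the environment variable is the easier one to override per session.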




* Re: Need multibyte advice - Shift-JIS
  2019-02-27 18:50                           ` Randall S. Becker
@ 2019-02-27 18:59                             ` Michal Suchánek
  2019-02-27 19:36                             ` Johannes Sixt
  1 sibling, 0 replies; 18+ messages in thread
From: Michal Suchánek @ 2019-02-27 18:59 UTC (permalink / raw)
  To: Randall S. Becker; +Cc: git

On Wed, 27 Feb 2019 13:50:18 -0500
"Randall S. Becker" <rsbecker@nexbridge.com> wrote:

> On February 27, 2019 13:18, Michal Suchánek wrote:
> > On Wed, 27 Feb 2019 12:59:15 -0500
> > "Randall S. Becker" <rsbecker@nexbridge.com> wrote:
> >   
> > > On February 27, 2019 12:51, Michal Suchánek wrote:  
> > > > To: Randall S. Becker <rsbecker@nexbridge.com>
> > > > Cc: git@vger.kernel.org
> > > > Subject: Re: Need multibyte advice - Shift-JIS
> > > >  
> >   
> > > I'm sorry if I was not clear about all this. NonStop is not a Linux platform. It  
> > is POSIX. Not all utilities are available and not all utilities have all capabilities.
> > lv is not available for the platform. less considers the data binary and displays
> > what usually is displayed when you try to use it for binary multibyte. You get
> > @^@- and such. It does not present the data in the correct character set for
> > the user.  
> > >
> > > This was only one part of my original question. I am searching elsewhere  
> > for support on pagers, because this really is not an appropriate discussion for
> > the git group to focus on, so let's drop this, please, as not worth continuing.
> > My original request was more about how to set up the file attributes,
> > difference engine, and the rest of the git infrastructure. The partner I am
> > working with is doing this with git hooks, which I am not really happy about.
> > Let's prune this discussion as not worthy.  
> > >  
> > 
> > Yes, this is totally unclear. Setting git hooks is possible but setting
> > LESSCHARDEF is not?  
> It can enter the environment simply through .profile, where we can change GIT_PAGER. We have established that separately.
> 
> > Is patching git acceptable or is that out of the question too?  
> I have done a bunch of git patching. Where, specifically?
> 
> > What are your requirements, exactly?  
> Source code and comments contain SJIS content. The requirement is to be able to move seamlessly in and out of git, and have git show/diff/log display SJIS as well as ASCII content. How that happens is open. UTF-16 is a red herring; it was only an attempt to get at the SJIS content in a way that avoids the limitation imposed by less.
> 
> 

 - less is a very limited pager when it comes to international support.
   Patches to support Japanese specifically were at one time merged in
   Red Hat's less but then reverted because they broke other languages.
   If the 'cat' pager is good enough apart from the lack of paging,
   'less -r' provides the same output plus paging. You can fine-tune
   which characters are displayed verbatim and which are treated
   specially with LESSCHARDEF, but it might not work well for SJIS. You
   can also set the LESS variable to include -r by default.

 - If patching git is not out of the question, I don't see how adding a
   full-featured pager such as lv is.

 - I don't think git itself needs any adjustments to handle SJIS if it
   can handle UTF-8 already. UTF-16 would be a different story.
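
A sketch of the 'less -r' suggestion above, assuming the platform's less accepts -r (display raw bytes rather than caret notation) and that the terminal itself is set up to show SJIS. Neither assumption is confirmed for the platform discussed here, and the sample file name is hypothetical.

```shell
# Assumption: the vendor-supplied less understands -r.
# Git only supplies its own default (LESS=FRX) when LESS is unset,
# so the value set here is what less will see.
LESS=-r
export LESS
GIT_PAGER=less
export GIT_PAGER

# Quick check outside git (hypothetical file name):
#   less sample-sjis.txt
```

With -r, less no longer knows how wide each character is on screen, so long lines may wrap or scroll oddly; that is the trade-off against plain cat.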

Michal


* Re: Need multibyte advice - Shift-JIS
  2019-02-27 18:50                           ` Randall S. Becker
  2019-02-27 18:59                             ` Michal Suchánek
@ 2019-02-27 19:36                             ` Johannes Sixt
  2019-02-27 19:53                               ` Randall S. Becker
  1 sibling, 1 reply; 18+ messages in thread
From: Johannes Sixt @ 2019-02-27 19:36 UTC (permalink / raw)
  To: Randall S. Becker, 'Michal Suchánek'; +Cc: git

Am 27.02.19 um 19:50 schrieb Randall S. Becker:
> On February 27, 2019 13:18, Michal Suchánek wrote:
>> What are your requirements, exactly?
> Source code and comments contain SJIS content. The requirement is to
> be able to move seamlessly in and out of git, and have git show/diff/log
> display SJIS as well as ASCII content. How that happens is open.
> UTF-16 is a red herring; it was only an attempt to get at the SJIS
> content in a way that avoids the limitation imposed by less.

When your file content contains ShiftJIS, you should set an attribute in
.gitattributes:

*.sourcecode	encoding=ShiftJIS
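
Git 2.18 and later also document a working-tree-encoding attribute, which asks git to convert a file between the named encoding in the working tree and UTF-8 in the repository. Whether that is preferable to storing the SJIS bytes as-is is a judgment call for the project. The following is only a sketch of how the attribute could be tried in a scratch repository. The *.sourcecode pattern is the placeholder used above, the file name is made up, mktemp -d is assumed to exist, and the spelling SHIFT-JIS is an assumption that has to match a name the local iconv accepts.

```shell
# Scratch repository so that no real project is modified.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Declare the working-tree encoding for the placeholder pattern.
printf '%s\n' '*.sourcecode working-tree-encoding=SHIFT-JIS' > .gitattributes

# Ask git which value it would apply to a (hypothetical) file name.
git check-attr working-tree-encoding -- example.sourcecode
```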

When your git commit messages contain ShiftJIS, you should configure

 git config i18n.commitEncoding ShiftJIS

More precisely, this assumes that the editor you use to compose commit
messages writes .git/COMMIT_EDITMSG in ShiftJIS.

When your terminal or pager is configured so that it interprets the byte
stream that it receives from applications for display as ShiftJIS, then
you should configure

 git config i18n.logOutputEncoding ShiftJIS

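You can set this independently from the other settings. When it is not
set, git falls back to i18n.commitEncoding if that is set, and to UTF-8
otherwise. That is, if your terminal or pager supports UTF-8 and you have
set i18n.commitEncoding to ShiftJIS as above, you should set this
configuration to UTF-8 rather than leave it unset.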
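
The two i18n settings can be tried in a scratch repository before touching a real one. A minimal sketch, assuming mktemp -d is available and using SHIFT_JIS as a stand-in spelling of the encoding name (which spelling works depends on the iconv that git was built with):

```shell
enc=SHIFT_JIS            # assumed spelling; adjust to what the local iconv accepts
repo=$(mktemp -d)        # scratch repository, nothing real is touched
git init -q "$repo"
cd "$repo"

# Commit messages are written by the editor in this encoding.
git config i18n.commitEncoding "$enc"
# The terminal or pager expects this encoding on output.
git config i18n.logOutputEncoding "$enc"

# Show what was stored.
git config --get-regexp '^i18n\.'
```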

And, of course, you must have built Git with iconv, which must have
support for ShiftJIS.
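
One way to check that the local iconv knows the encoding is a round trip through UTF-8. This sketch assumes that the iconv command-line tool is built on the same library git uses, which may not hold on every platform, and that the name SHIFT_JIS is accepted (iconv -l lists the names actually available). The sample bytes are the three katakana テスト in Shift-JIS.

```shell
enc=SHIFT_JIS   # assumed spelling

# テスト in Shift-JIS: 0x83 0x65, 0x83 0x58, 0x83 0x67 (octal escapes for printf).
sjis_bytes() { printf '\203\145\203\130\203\147'; }

original=$(sjis_bytes | od -An -tx1)
roundtrip=$(sjis_bytes | iconv -f "$enc" -t UTF-8 | iconv -f UTF-8 -t "$enc" | od -An -tx1)

if [ "$original" = "$roundtrip" ]; then
    echo "iconv round trip through UTF-8 preserved the bytes"
else
    echo "iconv could not round-trip $enc" >&2
fi
```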

-- Hannes


* RE: Need multibyte advice - Shift-JIS
  2019-02-27 19:36                             ` Johannes Sixt
@ 2019-02-27 19:53                               ` Randall S. Becker
  0 siblings, 0 replies; 18+ messages in thread
From: Randall S. Becker @ 2019-02-27 19:53 UTC (permalink / raw)
  To: 'Johannes Sixt', 'Michal Suchánek'; +Cc: git

On February 27, 2019 14:36, Johannes Sixt wrote:
> Am 27.02.19 um 19:50 schrieb Randall S. Becker:
> > On February 27, 2019 13:18, Michal Suchánek wrote:
> >> What are your requirements, exactly?
> > Source code and comments contain SJIS content. The requirement is to
> > be able to move seamlessly in and out of git, and have git
> > show/diff/log display SJIS as well as ASCII content. How that happens
> > is open.
> > UTF-16 is a red herring; it was only an attempt to get at the SJIS
> > content in a way that avoids the limitation imposed by less.
> 
> When your file content contains ShiftJIS, you should set an attribute in
> .gitattributes:
> 
> *.sourcecode	encoding=ShiftJIS
> 
> When your git commit messages contain ShiftJIS, you should configure
> 
>  git config i18n.commitEncoding ShiftJIS
> 
> More precisely, this assumes that the editor you use to compose commit
> messages writes .git/COMMIT_EDITMSG in ShiftJIS.
> 
> When your terminal or pager is configured so that it interprets the byte
> stream that it receives from applications for display as ShiftJIS, then you
> should configure
> 
>  git config i18n.logOutputEncoding ShiftJIS
> 
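> You can set this independently from the other settings. When it is not set,
> git falls back to i18n.commitEncoding if that is set, and to UTF-8 otherwise.
> That is, if your terminal or pager supports UTF-8 and you have set
> i18n.commitEncoding to ShiftJIS as above, you should set this configuration
> to UTF-8 rather than leave it unset.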
> 
> And, of course, you must have built Git with iconv, which must have support
> for ShiftJIS.

Thanks. I will forward this along and check the iconv build and rev-level we have used in git. Many thanks.




Thread overview: 18+ messages
2019-02-27 13:04 Need multibyte advice - Shift-JIS Randall S. Becker
2019-02-27 14:08 ` Michal Suchánek
2019-02-27 15:54   ` Randall S. Becker
2019-02-27 16:11     ` Michal Suchánek
2019-02-27 16:19       ` Randall S. Becker
2019-02-27 16:28         ` Michal Suchánek
2019-02-27 16:33           ` Randall S. Becker
2019-02-27 16:51             ` Michal Suchánek
2019-02-27 17:03               ` Randall S. Becker
2019-02-27 17:14                 ` Michal Suchánek
2019-02-27 17:38                   ` Randall S. Becker
2019-02-27 17:50                     ` Michal Suchánek
2019-02-27 17:59                       ` Randall S. Becker
2019-02-27 18:18                         ` Michal Suchánek
2019-02-27 18:50                           ` Randall S. Becker
2019-02-27 18:59                             ` Michal Suchánek
2019-02-27 19:36                             ` Johannes Sixt
2019-02-27 19:53                               ` Randall S. Becker
