git.vger.kernel.org archive mirror
From: Luke Diamand <luke@diamand.org>
To: Andrew Oakley <andrew@adoakley.name>,
	Tzadik Vanderhoof <tzadik.vanderhoof@gmail.com>
Cc: Git List <git@vger.kernel.org>, Feiyang Xue <me@feiyangxue.com>
Subject: Re: [PATCH 2/2] git-p4: do not decode data from perforce by default
Date: Fri, 30 Apr 2021 15:33:11 +0000
Message-ID: <021c0caf-8e6f-4fbb-6ff7-40bacbe5de38@diamand.org>
In-Reply-To: <20210430095342.58134e4e@ado-tr>



On 30/04/2021 08:53, Andrew Oakley wrote:
> On Thu, 29 Apr 2021 03:00:06 -0700
> Tzadik Vanderhoof <tzadik.vanderhoof@gmail.com> wrote:
>> However, on Windows, UTF-8 strings passed to "p4 submit -d" are
>> somehow converted to the default Windows code page by the time they
>> are stored in the Perforce database, probably as part of the process
>> of passing the command line arguments to the Windows p4 executable.
>> However, the "code page" data is *not* converted to UTF-8 on the way
>> back from p4 to git-p4.py.  The only way to get it into UTF-8 is to
>> call string.decode().  As a result, this patch, which takes out the
>> call to string.decode() will not work on Windows.
> 
> Thanks for that explanation, the reencoding of the data on Windows is
> not something I was expecting.  Given the behaviour you've described, I
> suspect that there might be two different problems that we are trying
> to solve.
> 
> The perforce depot I'm working with has a mixture of encodings, and
> commits are created from a variety of different environments. The
> majority of commits are ASCII or UTF-8, there are a small number that
> are in some other encoding.  Any attempt to reencode the data is likely
> to make the problem worse in at least some cases.
> 
> I suspect that other perforce depots are used primarily from Windows
> machines, and have data that is encoded in a mostly consistent way but
> the encoding is not UTF-8.  Re-encoding the data for git makes sense in
> that case.  Is this the kind of repository you have?
> 
> If there are these two different cases then we probably need to come up
> with a patch that solves both issues.
> 
> For my cases where we've got a repository containing all sorts of junk,
> it sounds like it might be awkward to create a test case that works on
> Windows.
> 


For reference, the Perforce internationalization notes:

https://www.perforce.com/perforce/doc.current/user/i18nnotes.txt

Tzadik - is your server Unicode-enabled or not? That would be
interesting to know:

     p4 counters | grep -i unicode

I suspect it is not. As far as I understand it, the server only
converts to/from UTF-8 when Unicode mode is enabled. Without that
setting, p4d and p4 are (probably) not doing any conversions.

I think it might be useful to clarify exactly what conversions are 
actually happening.

I wonder what encoding Perforce thinks you've got in place.
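
To make the conversion question concrete, here is a minimal Python
sketch. It is not git-p4 code: the helper name classify_bytes() and the
sample string are made up for illustration, and cp1252 is only an
assumed example of a "default Windows code page". It shows how the same
text looks as UTF-8 and as code-page bytes, and what happens when each
is decoded with the wrong assumption.

```python
# Illustrative only: classify_bytes() and the sample text are hypothetical,
# not part of git-p4 or Perforce.

def classify_bytes(raw: bytes) -> str:
    """Crudely classify a byte string as ascii, utf-8, or something else."""
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: some other encoding, e.g. a Windows code page.
        return "other"


sample = "Z\u00fcrich"                   # "Zurich" with a u-umlaut
utf8_bytes = sample.encode("utf-8")      # what a UTF-8 client would produce
cp1252_bytes = sample.encode("cp1252")   # assumed Windows code page

kinds = [classify_bytes(b) for b in (b"plain", utf8_bytes, cp1252_bytes)]

# Decoding code-page bytes with the matching code page recovers the text.
recovered = cp1252_bytes.decode("cp1252")

# Decoding UTF-8 bytes as cp1252 does not fail; it silently yields mojibake.
mojibake = utf8_bytes.decode("cp1252")
```

The last step is the awkward one for a depot with mixed encodings:
applying a single decode to everything succeeds without error on data
that was already UTF-8, but changes the text.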

Thread overview: 13+ messages
2021-04-12  8:52 [PATCH 0/2] git-p4: encoding of data from perforce Andrew Oakley
2021-04-12  8:52 ` [PATCH 1/2] git-p4: avoid decoding more " Andrew Oakley
2021-04-12  8:52 ` [PATCH 2/2] git-p4: do not decode data from perforce by default Andrew Oakley
2021-04-29 10:00   ` Tzadik Vanderhoof
2021-04-30  8:53     ` Andrew Oakley
2021-04-30 15:33       ` Luke Diamand [this message]
2021-04-30 18:08         ` Tzadik Vanderhoof
2021-05-04 21:01           ` Andrew Oakley
2021-05-04 21:46             ` Tzadik Vanderhoof
2021-05-05  1:11               ` Junio C Hamano
2021-05-05  4:02                 ` Tzadik Vanderhoof
2021-05-05  4:06                   ` Tzadik Vanderhoof
2021-05-05  4:34                   ` Junio C Hamano
