From: Yang Zhao <yang.zhao@skyboxlabs.com>
To: git@vger.kernel.org
Cc: Yang Zhao <yang.zhao@skyboxlabs.com>
Subject: [RFC PATCH 4/4] git-p4: use utf-8 encoding for file paths throughout
Date: Wed, 27 Nov 2019 17:28:07 -0800
Message-ID: <20191128012807.3103-5-yang.zhao@skyboxlabs.com>
In-Reply-To: <20191128012807.3103-1-yang.zhao@skyboxlabs.com>

Decode file paths in responses from p4 as early as possible so that the
rest of the flow works with unicode strings throughout. Python 3 treats
bytes and str as distinct types, so this avoids mixing the two when
handling paths.

Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
---

This is probably the riskiest patch in the set. It's very likely that
I've neglected certain corner cases when decoding path data.
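
For reference, the decoding fallback that this patch moves into
p4CmdList can be sketched on its own as below. This is only a minimal
sketch: `decode_depot_path` and its `path_encoding` parameter are
hypothetical names standing in for the gitConfig('git-p4.pathEncoding')
lookup, and are not part of git-p4.py.

```python
def decode_depot_path(raw, path_encoding=None):
    """Decode a depot path received from p4 as bytes into str.

    Hypothetical helper for illustration only; `path_encoding` stands in
    for the value of the git-p4.pathEncoding config (None when unset).
    """
    try:
        # Pure-ASCII paths need no encoding guess at all.
        return raw.decode('ascii')
    except UnicodeDecodeError:
        # Fall back to the configured encoding, defaulting to UTF-8.
        encoding = path_encoding or 'utf-8'
        # 'replace' substitutes U+FFFD for undecodable bytes instead of
        # raising, so a wrong encoding guess degrades the path rather
        # than aborting the import.
        return raw.decode(encoding, 'replace')


# ascii() keeps the output printable regardless of terminal encoding.
print(ascii(decode_depot_path(b'//depot/main/file.txt')))
print(ascii(decode_depot_path(b'//depot/\xe4.txt', 'cp1252')))
print(ascii(decode_depot_path(b'//depot/\xe4.txt')))
```

The last call shows the corner case I am most worried about: without a
matching pathEncoding, the byte is silently replaced and the resulting
path no longer round-trips to the original depot path.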

 git-p4.py | 34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 6821d6aafd..bd693e1404 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -650,11 +650,27 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
             if use_encoded_streams:
                 # Decode unmarshalled dict to use str keys and values, except for:
                 #   - `data` which may contain arbitrary binary data
-                #   - `depotFile` which may contain non-UTF8 encoded text
+                #   - `depotFile` which may contain non-UTF8 encoded text, and is decoded
+                #     according to git-p4.pathEncoding config
                 decoded_entry = {}
                 for key, value in entry.items():
                     key = key.decode()
-                    decoded_entry[key] = value.decode() if not (key in ['data', 'depotFile'] or isinstance(value, str)) else value
+                    if key == 'data':
+                        pass
+                    elif key == 'depotFile':
+                        try:
+                            value = value.decode('ascii')
+                        except UnicodeDecodeError:
+                            encoding = 'utf-8'
+                            if gitConfig('git-p4.pathEncoding'):
+                                encoding = gitConfig('git-p4.pathEncoding')
+                            value = value.decode(encoding, 'replace')
+                            if verbose:
+                                print('Path with non-ASCII characters detected. Used %s to decode: %s ' % (encoding, value))
+                    elif not isinstance(value, str):
+                        value = value.decode()
+
+                    decoded_entry[key] = value
                 entry = decoded_entry
             if skip_info:
                 if 'code' in entry and entry['code'] == 'info':
@@ -2758,24 +2774,11 @@ def writeToGitStream(self, gitMode, relPath, contents):
             self.gitStream.write(d)
         self.gitStream.write('\n')
 
-    def encodeWithUTF8(self, path):
-        try:
-            path.decode('ascii')
-        except:
-            encoding = 'utf8'
-            if gitConfig('git-p4.pathEncoding'):
-                encoding = gitConfig('git-p4.pathEncoding')
-            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
-            if self.verbose:
-                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
-        return path
-
     # output one file from the P4 stream
     # - helper for streamP4Files
 
     def streamOneP4File(self, file, contents):
         relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
-        relPath = self.encodeWithUTF8(relPath)
         if verbose:
             if 'fileSize' in self.stream_file:
                 size = int(self.stream_file['fileSize'])
@@ -2858,7 +2861,6 @@ def streamOneP4File(self, file, contents):
 
     def streamOneP4Deletion(self, file):
         relPath = self.stripRepoPath(file['path'], self.branchPrefixes)
-        relPath = self.encodeWithUTF8(relPath)
         if verbose:
             sys.stdout.write("delete %s\n" % relPath)
             sys.stdout.flush()
-- 
2.24.0.windows.2

