All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yang Zhao <yang.zhao@skyboxlabs.com>
To: git@vger.kernel.org
Cc: Yang Zhao <yang.zhao@skyboxlabs.com>
Subject: [PATCH 04/13] git-p4: decode response from p4 to str for python3
Date: Fri,  6 Dec 2019 16:33:22 -0800	[thread overview]
Message-ID: <20191207003333.3228-5-yang.zhao@skyboxlabs.com> (raw)
In-Reply-To: <20191207003333.3228-1-yang.zhao@skyboxlabs.com>

The marshalled dict in the response given on STDOUT by p4 uses `str` for
keys and string values. When run using python3, these values are
deserialized as `bytes`, leading to a whole host of problems as the rest
of the code assumes `str` is used throughout.

This patch changes the deserialization behaviour such that, as much as
possible, text output from p4 is decoded to native unicode strings.
Exceptions are made for the field `data` as it is usually arbitrary
binary data. `depotFile[0-9]*`, `path`, and `clientFile` are also exempt
as they contain path information which may not be UTF-8 encoding
compatible, and must survive round-trip back to p4.

Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>

SQUASH: use unicode string internally throughout
---
 git-p4.py | 61 ++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 45 insertions(+), 16 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index ebeef35a92..6720c7b24a 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -157,6 +157,19 @@ def die(msg):
         sys.stderr.write(msg + "\n")
         sys.exit(1)
 
+# We need different encoding/decoding strategies for text data being passed
+# around in pipes depending on python version
+if sys.version_info.major >= 3:
+    def decode_text_stream(s):
+        return s.decode() if isinstance(s, bytes) else s
+    def encode_text_stream(s):
+        return s.encode() if isinstance(s, str) else s
+else:
+    def decode_text_stream(s):
+        return s
+    def encode_text_stream(s):
+        return s.encode('utf_8') if isinstance(s, unicode) else s
+
 def write_pipe(c, stdin):
     if verbose:
         sys.stderr.write('Writing pipe: %s\n' % str(c))
@@ -186,7 +199,7 @@ def read_pipe_full(c):
     expand = isinstance(c,basestring)
     p = subprocess.Popen(c, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=expand)
     (out, err) = p.communicate()
-    return (p.returncode, out, err)
+    return (p.returncode, out, decode_text_stream(err))
 
 def read_pipe(c, ignore_error=False):
     """ Read output from  command. Returns the output text on
@@ -209,11 +222,11 @@ def read_pipe_text(c):
     if retcode != 0:
         return None
     else:
-        return out.rstrip()
+        return decode_text_stream(out).rstrip()
 
-def p4_read_pipe(c, ignore_error=False):
+def p4_read_pipe(c, ignore_error=False, raw=False):
     real_cmd = p4_build_cmd(c)
-    return read_pipe(real_cmd, ignore_error)
+    return read_pipe(real_cmd, ignore_error, raw=raw)
 
 def read_pipe_lines(c):
     if verbose:
@@ -222,7 +235,7 @@ def read_pipe_lines(c):
     expand = isinstance(c, basestring)
     p = subprocess.Popen(c, stdout=subprocess.PIPE, shell=expand)
     pipe = p.stdout
-    val = pipe.readlines()
+    val = [decode_text_stream(line) for line in pipe.readlines()]
     if pipe.close() or p.wait():
         die('Command failed: %s' % str(c))
 
@@ -253,6 +266,7 @@ def p4_has_move_command():
     cmd = p4_build_cmd(["move", "-k", "@from", "@to"])
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     (out, err) = p.communicate()
+    err = decode_text_stream(err)
     # return code will be 1 in either case
     if err.find("Invalid option") >= 0:
         return False
@@ -633,6 +647,20 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
     try:
         while True:
             entry = marshal.load(p4.stdout)
+            if bytes is not str:
+                # Decode unmarshalled dict to use str keys and values, except for:
+                #   - `data` which may contain arbitrary binary data
+                #   - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text
+                decoded_entry = {}
+                for key, value in entry.items():
+                    key = key.decode()
+                    if isinstance(value, bytes) and not (key in ('data', 'path', 'clientFile') or key.startswith('depotFile')):
+                        value = value.decode()
+                    decoded_entry[key] = value
+                # Parse out data if it's an error response
+                if decoded_entry.get('code') == 'error' and 'data' in decoded_entry:
+                    decoded_entry['data'] = decoded_entry['data'].decode()
+                entry = decoded_entry
             if skip_info:
                 if 'code' in entry and entry['code'] == 'info':
                     continue
@@ -850,6 +878,7 @@ def branch_exists(branch):
     cmd = [ "git", "rev-parse", "--symbolic", "--verify", branch ]
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     out, _ = p.communicate()
+    out = decode_text_stream(out)
     if p.returncode:
         return False
     # expect exactly one line of output: the branch name
@@ -1993,7 +2022,7 @@ def applyCommit(self, id):
         tmpFile = os.fdopen(handle, "w+b")
         if self.isWindows:
             submitTemplate = submitTemplate.replace("\n", "\r\n")
-        tmpFile.write(submitTemplate)
+        tmpFile.write(encode_text_stream(submitTemplate))
         tmpFile.close()
 
         if self.prepare_p4_only:
@@ -2040,11 +2069,11 @@ def applyCommit(self, id):
             if self.edit_template(fileName):
                 # read the edited message and submit
                 tmpFile = open(fileName, "rb")
-                message = tmpFile.read()
+                message = decode_text_stream(tmpFile.read())
                 tmpFile.close()
                 if self.isWindows:
                     message = message.replace("\r\n", "\n")
-                submitTemplate = message[:message.index(separatorLine)]
+                submitTemplate = encode_text_stream(message[:message.index(separatorLine)])
 
                 if update_shelve:
                     p4_write_pipe(['shelve', '-r', '-i'], submitTemplate)
@@ -2145,7 +2174,7 @@ def exportGitTags(self, gitTags):
                 print("Not creating p4 label %s for tag due to option" \
                       " --prepare-p4-only" % name)
             else:
-                p4_write_pipe(["label", "-i"], labelTemplate)
+                p4_write_pipe(["label", "-i"], encode_text_stream(labelTemplate))
 
                 # Use the label
                 p4_system(["tag", "-l", name] +
@@ -2469,7 +2498,7 @@ def append(self, view_line):
 
     def convert_client_path(self, clientFile):
         # chop off //client/ part to make it relative
-        if not clientFile.startswith(self.client_prefix):
+        if not decode_path(clientFile).startswith(self.client_prefix):
             die("No prefix '%s' on clientFile '%s'" %
                 (self.client_prefix, clientFile))
         return clientFile[len(self.client_prefix):]
@@ -2729,7 +2758,7 @@ def splitFilesIntoBranches(self, commit):
         return branches
 
     def writeToGitStream(self, gitMode, relPath, contents):
-        self.gitStream.write('M %s inline %s\n' % (gitMode, relPath))
+        self.gitStream.write(encode_text_stream(u'M {} inline {}\n'.format(gitMode, relPath)))
         self.gitStream.write('data %d\n' % sum(len(d) for d in contents))
         for d in contents:
             self.gitStream.write(d)
@@ -2770,7 +2799,7 @@ def streamOneP4File(self, file, contents):
             git_mode = "120000"
             # p4 print on a symlink sometimes contains "target\n";
             # if it does, remove the newline
-            data = ''.join(contents)
+            data = ''.join(decode_text_stream(c) for c in contents)
             if not data:
                 # Some version of p4 allowed creating a symlink that pointed
                 # to nothing.  This causes p4 errors when checking out such
@@ -2824,7 +2853,7 @@ def streamOneP4File(self, file, contents):
         pattern = p4_keywords_regexp_for_type(type_base, type_mods)
         if pattern:
             regexp = re.compile(pattern, re.VERBOSE)
-            text = ''.join(contents)
+            text = ''.join(decode_text_stream(c) for c in contents)
             text = regexp.sub(r'$\1$', text)
             contents = [ text ]
 
@@ -2839,7 +2868,7 @@ def streamOneP4Deletion(self, file):
         if verbose:
             sys.stdout.write("delete %s\n" % relPath)
             sys.stdout.flush()
-        self.gitStream.write("D %s\n" % relPath)
+        self.gitStream.write(encode_text_stream(u'D {}\n'.format(relPath)))
 
         if self.largeFileSystem and self.largeFileSystem.isLargeFile(relPath):
             self.largeFileSystem.removeLargeFile(relPath)
@@ -2939,9 +2968,9 @@ def streamP4FilesCbSelf(entry):
                 if 'shelved_cl' in f:
                     # Handle shelved CLs using the "p4 print file@=N" syntax to print
                     # the contents
-                    fileArg = '%s@=%d' % (f['path'], f['shelved_cl'])
+                    fileArg = f['path'] + encode_text_stream('@={}'.format(f['shelved_cl']))
                 else:
-                    fileArg = '%s#%s' % (f['path'], f['rev'])
+                    fileArg = f['path'] + encode_text_stream('#{}'.format(f['rev']))
 
                 fileArgs.append(fileArg)
 
-- 
2.21.0.windows.1


  parent reply	other threads:[~2019-12-07  0:33 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-07  0:33 [PATCH 00/13] git-p4: python3 compatibility Yang Zhao
2019-12-07  0:33 ` [PATCH 01/13] ci: also run linux-gcc pipeline with python-3.7 environment Yang Zhao
2019-12-10 10:30   ` SZEDER Gábor
2019-12-10 19:11     ` Yang Zhao
2019-12-12 14:13       ` SZEDER Gábor
2019-12-12 17:04         ` Yang Zhao
2019-12-12 17:15           ` SZEDER Gábor
2019-12-12 19:02             ` Yang Zhao
2019-12-07  0:33 ` [PATCH 02/13] git-p4: make python-2.7 the oldest supported version Yang Zhao
2019-12-07  0:33 ` [PATCH 03/13] git-p4: simplify python version detection Yang Zhao
2019-12-07  0:33 ` Yang Zhao [this message]
2019-12-07  0:33 ` [PATCH 05/13] git-p4: properly encode/decode communication with git for python 3 Yang Zhao
2019-12-07  0:33 ` [PATCH 06/13] git-p4: convert path to unicode before processing them Yang Zhao
2019-12-07  0:33 ` [PATCH 06/13] git-p4: open .gitp4-usercache.txt in text mode Yang Zhao
2019-12-07  0:33 ` [PATCH 07/13] git-p4: convert path to unicode before processing them Yang Zhao
2019-12-07  0:33 ` [PATCH 07/13] git-p4: open .gitp4-usercache.txt in text mode Yang Zhao
2019-12-07  0:33 ` [PATCH 08/13] git-p4: use marshal format version 2 when sending to p4 Yang Zhao
2019-12-07  0:33 ` [PATCH 09/13] git-p4: fix freezing while waiting for fast-import progress Yang Zhao
2019-12-07  0:33 ` [PATCH 10/13] git-p4: use functools.reduce instead of reduce Yang Zhao
2019-12-07  0:33 ` [PATCH 11/13] git-p4: use dict.items() iteration for python3 compatibility Yang Zhao
2019-12-07  0:33 ` [PATCH 12/13] git-p4: simplify regex pattern generation for parsing diff-tree Yang Zhao
2019-12-07  0:33 ` [PATCH 13/13] git-p4: use python3's input() everywhere Yang Zhao
2019-12-07  1:09 ` [PATCH 00/13] git-p4: python3 compatibility Denton Liu
2019-12-07  7:29   ` Yang Zhao
2019-12-07 16:21     ` Ben Keene
2019-12-07 19:59       ` Yang Zhao
2019-12-09 15:03         ` Ben Keene
2019-12-09 18:54           ` Ben Keene
2019-12-09 19:48             ` Johannes Schindelin
2019-12-10 14:20               ` Ben Keene
2019-12-09 20:21           ` Yang Zhao
2019-12-13 17:10             ` Ben Keene
2019-12-07  7:34 ` Yang Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191207003333.3228-5-yang.zhao@skyboxlabs.com \
    --to=yang.zhao@skyboxlabs.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.