Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 0/1] git-p4.py: Cast byte strings to unicode strings in python3
@ 2019-11-13 21:07 Ben Keene via GitGitGadget
  2019-11-13 21:07 ` [PATCH 1/1] " Ben Keene via GitGitGadget
                   ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-11-13 21:07 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano

commit: git-p4.py: Cast byte strings to unicode strings in python3

I tried to run git-p4 under python3 and it failed with an error that it
could not connect to the P4 server. This is caused by the return values from
the process.popen returning byte strings and the code is failing when it is
comparing these with literal strings which are Unicode in Python 3.

To support this, I added a new function ustring() in the code that
determines if python is natively supporting Unicode (Python 3) or not
(Python 2). 

 * If the python version supports Unicode (Python 3), it will cast the text
   (expected a byte string) to UTF-8. This allows the existing code to match
   literal strings as expected.
   
   
 * If the python version does not natively support Unicode (Python 2) the
   ustring() function does not change the byte string, maintaining current
   behavior.
   
   

There are a few notable methods changed:

 * pipe functions have their output passed through the ustring() function:
   
    * read_pipe_full(c)
    * p4_has_move_command()
   
   
 * p4CmdList has new conditional code to parse the dictionary marshaled from
   the process call. Both the keys and values are converted to Unicode.
   
   
 * gitConfig passes the return value through ustring() so all calls to
   gitConfig return unicode values.
   
   

Signed-off-by: Ben Keene seraphire@gmail.com [seraphire@gmail.com]

Ben Keene (1):
  Cast byte strings to unicode strings in python3

 git-p4.py | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)


base-commit: d9f6f3b6195a0ca35642561e530798ad1469bd41
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-463%2Fseraphire%2Fseraphire%2Fp4-python3-unicode-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-463/seraphire/seraphire/p4-python3-unicode-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/463
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 1/1] Cast byte strings to unicode strings in python3
  2019-11-13 21:07 [PATCH 0/1] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
@ 2019-11-13 21:07 ` " Ben Keene via GitGitGadget
  2019-11-14  2:25 ` [PATCH 0/1] git-p4.py: " Junio C Hamano
  2019-11-15 14:39 ` [PATCH v2 0/3] " Ben Keene via GitGitGadget
  2 siblings, 0 replies; 46+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-11-13 21:07 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <bkeene@partswatch.com>

Signed-off-by: Ben Keene <seraphire@gmail.com>
---
 git-p4.py | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 60c73b6a37..6e8b3a26cd 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -36,12 +36,22 @@
     unicode = str
     bytes = bytes
     basestring = (str,bytes)
+    isunicode = True
+    def ustring(text):
+        """Returns the byte string as a unicode string"""
+        if text == '' or text == b'':
+            return ''
+        return unicode(text, "utf-8")
 else:
     # 'unicode' exists, must be Python 2
     str = str
     unicode = unicode
     bytes = str
     basestring = basestring
+    isunicode = False
+    def ustring(text):
+        """Returns the byte string unchanged"""
+        return text
 
 try:
     from subprocess import CalledProcessError
@@ -196,6 +206,8 @@ def read_pipe_full(c):
     expand = isinstance(c,basestring)
     p = subprocess.Popen(c, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=expand)
     (out, err) = p.communicate()
+    out = ustring(out)
+    err = ustring(err)
     return (p.returncode, out, err)
 
 def read_pipe(c, ignore_error=False):
@@ -263,6 +275,7 @@ def p4_has_move_command():
     cmd = p4_build_cmd(["move", "-k", "@from", "@to"])
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     (out, err) = p.communicate()
+    err = ustring(err)
     # return code will be 1 in either case
     if err.find("Invalid option") >= 0:
         return False
@@ -646,10 +659,18 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
             if skip_info:
                 if 'code' in entry and entry['code'] == 'info':
                     continue
+                if b'code' in entry and entry[b'code'] == b'info':
+                    continue
             if cb is not None:
                 cb(entry)
             else:
-                result.append(entry)
+                if isunicode:
+                    out = {}
+                    for key, value in entry.items():
+                        out[ustring(key)] = ustring(value)
+                    result.append(out)
+                else:
+                    result.append(entry)
     except EOFError:
         pass
     exitCode = p4.wait()
@@ -792,7 +813,7 @@ def gitConfig(key, typeSpecifier=None):
         cmd += [ key ]
         s = read_pipe(cmd, ignore_error=True)
         _gitConfig[key] = s.strip()
-    return _gitConfig[key]
+    return ustring(_gitConfig[key])
 
 def gitConfigBool(key):
     """Return a bool, using git config --bool.  It is True only if the
@@ -860,6 +881,7 @@ def branch_exists(branch):
     cmd = [ "git", "rev-parse", "--symbolic", "--verify", branch ]
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     out, _ = p.communicate()
+    out = ustring(out)
     if p.returncode:
         return False
     # expect exactly one line of output: the branch name
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/1] git-p4.py: Cast byte strings to unicode strings in python3
  2019-11-13 21:07 [PATCH 0/1] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
  2019-11-13 21:07 ` [PATCH 1/1] " Ben Keene via GitGitGadget
@ 2019-11-14  2:25 ` " Junio C Hamano
  2019-11-14  9:46   ` Luke Diamand
  2019-11-15 14:39 ` [PATCH v2 0/3] " Ben Keene via GitGitGadget
  2 siblings, 1 reply; 46+ messages in thread
From: Junio C Hamano @ 2019-11-14  2:25 UTC (permalink / raw)
  To: Luke Diamand; +Cc: git, Ben Keene, Ben Keene via GitGitGadget

"Ben Keene via GitGitGadget" <gitgitgadget@gmail.com> writes:

> commit: git-p4.py: Cast byte strings to unicode strings in python3

Luke, this patch [*1*] came in my way, but I am hardly an expert on
Py2to3 and know nothing about P4.  Could you take a look at them,
please?

Thanks.


[References]

<0bca930ff82623bbef172b4cb6c36ef8e5c46098.1573679258.git.gitgitgadget@gmail.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/1] git-p4.py: Cast byte strings to unicode strings in python3
  2019-11-14  2:25 ` [PATCH 0/1] git-p4.py: " Junio C Hamano
@ 2019-11-14  9:46   ` Luke Diamand
  0 siblings, 0 replies; 46+ messages in thread
From: Luke Diamand @ 2019-11-14  9:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Users, Ben Keene, Ben Keene via GitGitGadget

On Thu, 14 Nov 2019 at 02:25, Junio C Hamano <gitster@pobox.com> wrote:
>
> "Ben Keene via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > commit: git-p4.py: Cast byte strings to unicode strings in python3
>
> Luke, this patch [*1*] came in my way, but I am hardly an expert on
> Py2to3 and know nothing about P4.  Could you take a look at them,
> please?
>
> Thanks.
>
>
> [References]
>
> <0bca930ff82623bbef172b4cb6c36ef8e5c46098.1573679258.git.gitgitgadget@gmail.com>

I just quickly tried it, and with git-p4 switched to using python3,
the unit tests fail.

$ make -C t T=t98*

But it looks like a reasonable approach, and with the demise of
Python2 fast approaching it would be good to get this fully working!

Luke

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 0/3] git-p4.py: Cast byte strings to unicode strings in python3
  2019-11-13 21:07 [PATCH 0/1] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
  2019-11-13 21:07 ` [PATCH 1/1] " Ben Keene via GitGitGadget
  2019-11-14  2:25 ` [PATCH 0/1] git-p4.py: " Junio C Hamano
@ 2019-11-15 14:39 ` " Ben Keene via GitGitGadget
  2019-11-15 14:39   ` [PATCH v2 1/3] " Ben Keene via GitGitGadget
                     ` (3 more replies)
  2 siblings, 4 replies; 46+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-11-15 14:39 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano

git-p4.py: Cast byte strings to unicode strings in python3

I tried to run git-p4 under python3 and it failed with an error that it
could not connect to the P4 server. This PR covers updating the git-p4.py
python script to work with unicode strings in python3.

Changes since v1: Commit: (0435d0e) 2019-11-14

The problem was caused by the ustring() function being called on a string
that had already been cast as a unicode string. This second call to
ustring() would fail with an error of "decoding str is not supported"

The following changes were made to fix this:

The call to ustring() in the gitConfig() function is actually unnecessary
because the read_pipe() function returns unicode strings so the call has
been removed.

The ustring() function was given a new conditional test to see if the value
is already a unicode value. If it is, the value will be returned without any
casting.

These two changes should fix the immediate fail. However, I do not have an
environment that I can run the test suite against so I don't know if another
error will be uncovered yet. I'm still working on it.

v1: (Initial Commit)

This is caused by the return values from the process.popen returning byte
strings and the code is failing when it is comparing these with literal
strings which are Unicode in Python 3.

To support this, I added a new function ustring() in the code that
determines if python is natively supporting Unicode (Python 3) or not
(Python 2). 

 * If the python version supports Unicode (Python 3), it will cast the text
   (expected a byte string) to UTF-8. This allows the existing code to match
   literal strings as expected.
   
   
 * If the python version does not natively support Unicode (Python 2) the
   ustring() function does not change the byte string, maintaining current
   behavior.
   
   

There are a few notable methods changed:

 * pipe functions have their output passed through the ustring() function:
   
    * read_pipe_full(c)
    * p4_has_move_command()
   
   
 * p4CmdList has new conditional code to parse the dictionary marshaled from
   the process call. Both the keys and values are converted to Unicode.
   
   
 * gitConfig passes the return value through ustring() so all calls to
   gitConfig return unicode values.
   
   

Signed-off-by: Ben Keene seraphire@gmail.com [seraphire@gmail.com]

Ben Keene (3):
  Cast byte strings to unicode strings in python3
  FIX: cast as unicode fails when a value is already unicode
  FIX: wrap return for read_pipe_lines in ustring() and wrap GitLFS read
    of the pointer file in ustring()

 git-p4.py | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)


base-commit: d9f6f3b6195a0ca35642561e530798ad1469bd41
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-463%2Fseraphire%2Fseraphire%2Fp4-python3-unicode-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-463/seraphire/seraphire/p4-python3-unicode-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/463

Range-diff vs v1:

 1:  0bca930ff8 = 1:  0bca930ff8 Cast byte strings to unicode strings in python3
 -:  ---------- > 2:  0435d0e2cb FIX: cast as unicode fails when a value is already unicode
 -:  ---------- > 3:  2288690b94 FIX: wrap return for read_pipe_lines in ustring() and wrap GitLFS read of the pointer file in ustring()

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 1/3] Cast byte strings to unicode strings in python3
  2019-11-15 14:39 ` [PATCH v2 0/3] " Ben Keene via GitGitGadget
@ 2019-11-15 14:39   ` " Ben Keene via GitGitGadget
  2019-11-15 14:39   ` [PATCH v2 2/3] FIX: cast as unicode fails when a value is already unicode Ben Keene via GitGitGadget
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 46+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-11-15 14:39 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <bkeene@partswatch.com>

Signed-off-by: Ben Keene <seraphire@gmail.com>
---
 git-p4.py | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 60c73b6a37..6e8b3a26cd 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -36,12 +36,22 @@
     unicode = str
     bytes = bytes
     basestring = (str,bytes)
+    isunicode = True
+    def ustring(text):
+        """Returns the byte string as a unicode string"""
+        if text == '' or text == b'':
+            return ''
+        return unicode(text, "utf-8")
 else:
     # 'unicode' exists, must be Python 2
     str = str
     unicode = unicode
     bytes = str
     basestring = basestring
+    isunicode = False
+    def ustring(text):
+        """Returns the byte string unchanged"""
+        return text
 
 try:
     from subprocess import CalledProcessError
@@ -196,6 +206,8 @@ def read_pipe_full(c):
     expand = isinstance(c,basestring)
     p = subprocess.Popen(c, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=expand)
     (out, err) = p.communicate()
+    out = ustring(out)
+    err = ustring(err)
     return (p.returncode, out, err)
 
 def read_pipe(c, ignore_error=False):
@@ -263,6 +275,7 @@ def p4_has_move_command():
     cmd = p4_build_cmd(["move", "-k", "@from", "@to"])
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     (out, err) = p.communicate()
+    err = ustring(err)
     # return code will be 1 in either case
     if err.find("Invalid option") >= 0:
         return False
@@ -646,10 +659,18 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
             if skip_info:
                 if 'code' in entry and entry['code'] == 'info':
                     continue
+                if b'code' in entry and entry[b'code'] == b'info':
+                    continue
             if cb is not None:
                 cb(entry)
             else:
-                result.append(entry)
+                if isunicode:
+                    out = {}
+                    for key, value in entry.items():
+                        out[ustring(key)] = ustring(value)
+                    result.append(out)
+                else:
+                    result.append(entry)
     except EOFError:
         pass
     exitCode = p4.wait()
@@ -792,7 +813,7 @@ def gitConfig(key, typeSpecifier=None):
         cmd += [ key ]
         s = read_pipe(cmd, ignore_error=True)
         _gitConfig[key] = s.strip()
-    return _gitConfig[key]
+    return ustring(_gitConfig[key])
 
 def gitConfigBool(key):
     """Return a bool, using git config --bool.  It is True only if the
@@ -860,6 +881,7 @@ def branch_exists(branch):
     cmd = [ "git", "rev-parse", "--symbolic", "--verify", branch ]
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     out, _ = p.communicate()
+    out = ustring(out)
     if p.returncode:
         return False
     # expect exactly one line of output: the branch name
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 2/3] FIX: cast as unicode fails when a value is already unicode
  2019-11-15 14:39 ` [PATCH v2 0/3] " Ben Keene via GitGitGadget
  2019-11-15 14:39   ` [PATCH v2 1/3] " Ben Keene via GitGitGadget
@ 2019-11-15 14:39   ` Ben Keene via GitGitGadget
  2019-11-15 14:39   ` [PATCH v2 3/3] FIX: wrap return for read_pipe_lines in ustring() and wrap GitLFS read of the pointer file in ustring() Ben Keene via GitGitGadget
  2019-12-02 19:02   ` [PATCH v3 0/1] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
  3 siblings, 0 replies; 46+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-11-15 14:39 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

Signed-off-by: Ben Keene <seraphire@gmail.com>
---
 git-p4.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/git-p4.py b/git-p4.py
index 6e8b3a26cd..b088095b15 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -39,6 +39,8 @@
     isunicode = True
     def ustring(text):
         """Returns the byte string as a unicode string"""
+        if isinstance(text, unicode):
+            return text
         if text == '' or text == b'':
             return ''
         return unicode(text, "utf-8")
@@ -813,7 +815,7 @@ def gitConfig(key, typeSpecifier=None):
         cmd += [ key ]
         s = read_pipe(cmd, ignore_error=True)
         _gitConfig[key] = s.strip()
-    return ustring(_gitConfig[key])
+    return _gitConfig[key]
 
 def gitConfigBool(key):
     """Return a bool, using git config --bool.  It is True only if the
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 3/3] FIX: wrap return for read_pipe_lines in ustring() and wrap GitLFS read of the pointer file in ustring()
  2019-11-15 14:39 ` [PATCH v2 0/3] " Ben Keene via GitGitGadget
  2019-11-15 14:39   ` [PATCH v2 1/3] " Ben Keene via GitGitGadget
  2019-11-15 14:39   ` [PATCH v2 2/3] FIX: cast as unicode fails when a value is already unicode Ben Keene via GitGitGadget
@ 2019-11-15 14:39   ` Ben Keene via GitGitGadget
  2019-12-02 19:02   ` [PATCH v3 0/1] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
  3 siblings, 0 replies; 46+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-11-15 14:39 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

Signed-off-by: Ben Keene <seraphire@gmail.com>
---
 git-p4.py | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/git-p4.py b/git-p4.py
index b088095b15..83f59ddca5 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -180,6 +180,11 @@ def die(msg):
         sys.exit(1)
 
 def write_pipe(c, stdin):
+    """Writes stdin to the command's stdin
+    Returns the number of bytes written.
+
+    Be aware - the byte count may change between 
+    Python2 and Python3"""
     if verbose:
         sys.stderr.write('Writing pipe: %s\n' % str(c))
 
@@ -249,6 +254,11 @@ def read_pipe_lines(c):
     val = pipe.readlines()
     if pipe.close() or p.wait():
         die('Command failed: %s' % str(c))
+    # Unicode conversion from str
+    # Iterate and fix in-place to avoid a second list in memory.
+    if isunicode:
+        for i in range(len(val)):
+            val[i] = ustring(val[i])
 
     return val
 
@@ -1268,7 +1278,7 @@ def generatePointer(self, contentFile):
             ['git', 'lfs', 'pointer', '--file=' + contentFile],
             stdout=subprocess.PIPE
         )
-        pointerFile = pointerProcess.stdout.read()
+        pointerFile = ustring(pointerProcess.stdout.read())
         if pointerProcess.wait():
             os.remove(contentFile)
             die('git-lfs pointer command failed. Did you install the extension?')
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v3 0/1] git-p4.py: Cast byte strings to unicode strings in python3
  2019-11-15 14:39 ` [PATCH v2 0/3] " Ben Keene via GitGitGadget
                     ` (2 preceding siblings ...)
  2019-11-15 14:39   ` [PATCH v2 3/3] FIX: wrap return for read_pipe_lines in ustring() and wrap GitLFS read of the pointer file in ustring() Ben Keene via GitGitGadget
@ 2019-12-02 19:02   ` Ben Keene via GitGitGadget
  2019-12-02 19:02     ` [PATCH v3 1/1] Python3 support for t9800 tests. Basic P4/Python3 support Ben Keene via GitGitGadget
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
  3 siblings, 2 replies; 46+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-02 19:02 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano

Issue: The current git-p4.py script does not work with python3.

I have attempted to use the P4 integration built into GIT and I was unable
to get the program to run because I have Python 3.8 installed on my
computer. I was able to get the program to run when I downgraded my python
to version 2.7. However, python 2 is reaching its end of life.

Submission: I am submitting a patch for the git-p4.py script that partially
supports python 3.8. This code was able to pass the basic tests (t9800) when
run against Python3. This provides basic functionality. 

In an attempt to pass the t9822 P4 path-encoding test, a new parameter for
git P4 Clone was introduced. 

--encoding Format-identifier

This will create the GIT repository following the current functionality;
however, before importing the files from P4, it will set the
git-p4.pathEncoding option so any files or paths that are encoded with
non-ASCII/non-UTF-8 formats will import correctly.

Technical details: The script was updated by futurize (
https://python-future.org/futurize.html) to support Py2/Py3 syntax. The few
references to classes in future were reworked so that future would not be
required. The existing code test for Unicode support was extended to
normalize the classes “unicode” and “bytes” to across platforms:

 * ‘unicode’ is an alias for ‘str’ in Py3 and is the unicode class in Py2.
 * ‘bytes’ is bytes in Py3 and an alias for ‘str’ in Py2.

New coercion methods were written for both Python2 and Python3:

 * as_string(text) – In Python3, this encodes a bytes object as a UTF-8
   encoded Unicode string. 
 * as_bytes(text) – In Python3, this decodes a Unicode string to an array of
   bytes.

In Python2, these functions do not change the data since a ‘str’ object
function in both roles as strings and byte arrays. This reduces the
potential impact on backward compatibility with Python 2.

 * to_unicode(text) – ensures that the supplied data is encoded as a UTF-8
   string. This function will encode data in both Python2 and Python3. * 
      path_as_string(path) – This function is an extension function that
      honors the option “git-p4.pathEncoding” to convert a set of bytes or
      characters to UTF-8. If the str/bytes cannot decode as ASCII, it will
      use the encodeWithUTF8() method to convert the custom encoded bytes to
      Unicode in UTF-8.
   
   

Generally speaking, information in the script is converted to Unicode as
early as possible and converted back to a byte array just before passing to
external programs or files. The exception to this rule is P4 Repository file
paths.

Paths are not converted but left as “bytes” so the original file path
encoding can be preserved. This formatting is required for commands that
interact with the P4 file path. When the file path is used by GIT, it is
converted with encodeWithUTF8().

Signed-off-by: Ben Keene seraphire@gmail.com [seraphire@gmail.com]

Ben Keene (1):
  Python3 support for t9800 tests. Basic P4/Python3 support

 git-p4.py | 825 +++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 628 insertions(+), 197 deletions(-)


base-commit: d9f6f3b6195a0ca35642561e530798ad1469bd41
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-463%2Fseraphire%2Fseraphire%2Fp4-python3-unicode-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-463/seraphire/seraphire/p4-python3-unicode-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/463

Range-diff vs v2:

 1:  0bca930ff8 < -:  ---------- Cast byte strings to unicode strings in python3
 2:  0435d0e2cb < -:  ---------- FIX: cast as unicode fails when a value is already unicode
 3:  2288690b94 < -:  ---------- FIX: wrap return for read_pipe_lines in ustring() and wrap GitLFS read of the pointer file in ustring()
 -:  ---------- > 1:  02b3843e9f Python3 support for t9800 tests. Basic P4/Python3 support

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v3 1/1] Python3 support for t9800 tests. Basic P4/Python3 support
  2019-12-02 19:02   ` [PATCH v3 0/1] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
@ 2019-12-02 19:02     ` Ben Keene via GitGitGadget
  2019-12-03  0:18       ` Denton Liu
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
  1 sibling, 1 reply; 46+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-02 19:02 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

Signed-off-by: Ben Keene <seraphire@gmail.com>
---
 git-p4.py | 825 +++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 628 insertions(+), 197 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 60c73b6a37..6f82184fe5 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -26,22 +26,87 @@
 import zlib
 import ctypes
 import errno
+import os.path
+import codecs
+import io
 
 # support basestring in python3
 try:
     unicode = unicode
 except NameError:
     # 'unicode' is undefined, must be Python 3
-    str = str
+    #
+    # For Python3 which is natively unicode, we will use 
+    # unicode for internal information but all P4 Data
+    # will remain in bytes
+    isunicode = True
     unicode = str
     bytes = bytes
-    basestring = (str,bytes)
+
+    def as_string(text):
+        """Return a byte array as a unicode string"""
+        if text == None:
+            return None
+        if isinstance(text, bytes):
+            return unicode(text, "utf-8")
+        else:
+            return text
+
+    def as_bytes(text):
+        """Return a Unicode string as a byte array"""
+        if text == None:
+            return None
+        if isinstance(text, bytes):
+            return text
+        else:
+            return bytes(text, "utf-8")
+
+    def to_unicode(text):
+        """Return a byte array as a unicode string"""
+        return as_string(text)    
+
+    def path_as_string(path):
+        """ Converts a path to the UTF8 encoded string """
+        if isinstance(path, unicode):
+            return path
+        return encodeWithUTF8(path).decode('utf-8')
+    
 else:
     # 'unicode' exists, must be Python 2
-    str = str
+    #
+    # We will treat the data as:
+    #   str   -> str
+    #   bytes -> str
+    # So for Python2 these functions are no-ops
+    # and will leave the data in the ambiguious
+    # string/bytes state
+    isunicode = False
     unicode = unicode
     bytes = str
-    basestring = basestring
+
+    def as_string(text):
+        """ Return text unaltered (for Python3 support) """
+        return text
+
+    def as_bytes(text):
+        """ Return text unaltered (for Python3 support) """
+        return text
+
+    def to_unicode(text):
+        """Return a string as a unicode string"""
+        return text.decode('utf-8')
+    
+    def path_as_string(path):
+        """ Converts a path to the UTF8 encoded bytes """
+        return encodeWithUTF8(path)
+
+
+ 
+# Check for raw_input support
+try:
+    raw_input
+except NameError:
+    raw_input = input
 
 try:
     from subprocess import CalledProcessError
@@ -75,7 +140,11 @@ def p4_build_cmd(cmd):
     location. It means that hooking into the environment, or other configuration
     can be done more easily.
     """
-    real_cmd = ["p4"]
+    # Look for the P4 binary
+    if (platform.system() == "Windows"):
+        real_cmd = ["p4.exe"]    
+    else:
+        real_cmd = ["p4"]
 
     user = gitConfig("git-p4.user")
     if len(user) > 0:
@@ -105,7 +174,7 @@ def p4_build_cmd(cmd):
         # Provide a way to not pass this option by setting git-p4.retries to 0
         real_cmd += ["-r", str(retries)]
 
-    if isinstance(cmd,basestring):
+    if not isinstance(cmd, list):
         real_cmd = ' '.join(real_cmd) + ' ' + cmd
     else:
         real_cmd += cmd
@@ -168,10 +237,11 @@ def die(msg):
         sys.exit(1)
 
 def write_pipe(c, stdin):
+    """Executes the command 'c', passing 'stdin' on the standard input"""
     if verbose:
         sys.stderr.write('Writing pipe: %s\n' % str(c))
 
-    expand = isinstance(c,basestring)
+    expand = not isinstance(c, list)
     p = subprocess.Popen(c, stdin=subprocess.PIPE, shell=expand)
     pipe = p.stdin
     val = pipe.write(stdin)
@@ -179,11 +249,11 @@ def write_pipe(c, stdin):
     if p.wait():
         die('Command failed: %s' % str(c))
 
-    return val
 
 def p4_write_pipe(c, stdin):
+    """ Runs a P4 command 'c', passing 'stdin' data to P4"""
     real_cmd = p4_build_cmd(c)
-    return write_pipe(real_cmd, stdin)
+    write_pipe(real_cmd, stdin)
 
 def read_pipe_full(c):
     """ Read output from  command. Returns a tuple
@@ -193,9 +263,11 @@ def read_pipe_full(c):
     if verbose:
         sys.stderr.write('Reading pipe: %s\n' % str(c))
 
-    expand = isinstance(c,basestring)
+    expand = not isinstance(c, list)
     p = subprocess.Popen(c, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=expand)
     (out, err) = p.communicate()
+    out = as_string(out)
+    err = as_string(err)
     return (p.returncode, out, err)
 
 def read_pipe(c, ignore_error=False):
@@ -222,19 +294,31 @@ def read_pipe_text(c):
         return out.rstrip()
 
 def p4_read_pipe(c, ignore_error=False):
+    """ Read output from the P4 command 'c'. Returns the output text on
+        success. On failure, terminates execution, unless
+        ignore_error is True, when it returns an empty string.
+    """
     real_cmd = p4_build_cmd(c)
     return read_pipe(real_cmd, ignore_error)
 
 def read_pipe_lines(c):
+    """ Returns a list of text from executing the command 'c'.
+        The program will die if the command fails to execute.
+    """
     if verbose:
         sys.stderr.write('Reading pipe: %s\n' % str(c))
 
-    expand = isinstance(c, basestring)
+    expand = not isinstance(c, list)
     p = subprocess.Popen(c, stdout=subprocess.PIPE, shell=expand)
     pipe = p.stdout
     val = pipe.readlines()
     if pipe.close() or p.wait():
         die('Command failed: %s' % str(c))
+    # Unicode conversion from byte-string
+    # Iterate and fix in-place to avoid a second list in memory.
+    if isunicode:
+        for i in range(len(val)):
+            val[i] = as_string(val[i])
 
     return val
 
@@ -263,6 +347,8 @@ def p4_has_move_command():
     cmd = p4_build_cmd(["move", "-k", "@from", "@to"])
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     (out, err) = p.communicate()
+    out=as_string(out)
+    err=as_string(err)
     # return code will be 1 in either case
     if err.find("Invalid option") >= 0:
         return False
@@ -272,7 +358,7 @@ def p4_has_move_command():
     return True
 
 def system(cmd, ignore_error=False):
-    expand = isinstance(cmd,basestring)
+    expand = not isinstance(cmd, list)
     if verbose:
         sys.stderr.write("executing %s\n" % str(cmd))
     retcode = subprocess.call(cmd, shell=expand)
@@ -282,9 +368,10 @@ def system(cmd, ignore_error=False):
     return retcode
 
 def p4_system(cmd):
-    """Specifically invoke p4 as the system command. """
+    """ Specifically invoke p4 as the system command. 
+    """
     real_cmd = p4_build_cmd(cmd)
-    expand = isinstance(real_cmd, basestring)
+    expand = not isinstance(real_cmd, list)
     retcode = subprocess.call(real_cmd, shell=expand)
     if retcode:
         raise CalledProcessError(retcode, real_cmd)
@@ -390,16 +477,20 @@ def p4_last_change():
     return int(results[0]['change'])
 
 def p4_describe(change, shelved=False):
-    """Make sure it returns a valid result by checking for
-       the presence of field "time".  Return a dict of the
-       results."""
+    """ Returns information about the requested P4 change list.
+
+        Data returns is not string encoded (returned as bytes)
+    """
+    # Make sure it returns a valid result by checking for
+    #   the presence of field "time".  Return a dict of the
+    #   results.
 
     cmd = ["describe", "-s"]
     if shelved:
         cmd += ["-S"]
     cmd += [str(change)]
 
-    ds = p4CmdList(cmd, skip_info=True)
+    ds = p4CmdList(cmd, skip_info=True, encode_data=False)
     if len(ds) != 1:
         die("p4 describe -s %d did not return 1 result: %s" % (change, str(ds)))
 
@@ -409,21 +500,31 @@ def p4_describe(change, shelved=False):
         die("p4 describe -s %d exited with %d: %s" % (change, d["p4ExitCode"],
                                                       str(d)))
     if "code" in d:
-        if d["code"] == "error":
+        if d["code"] == b"error":
             die("p4 describe -s %d returned error code: %s" % (change, str(d)))
 
     if "time" not in d:
         die("p4 describe -s %d returned no \"time\": %s" % (change, str(d)))
 
+    # Convert depotFile(X) to be UTF-8 encoded, as this is what GIT
+    # requires. This will also allow us to encode the rest of the text
+    # at the same time to simplify textual processing later.
+    keys=d.keys()
+    for key in keys:
+        if key.startswith('depotFile'):
+            d[key]=d[key] #DepotPath(d[key])
+        elif key == 'path':
+            d[key]=d[key] #DepotPath(d[key])
+        else:
+            d[key] = as_string(d[key])
+
     return d
 
-#
-# Canonicalize the p4 type and return a tuple of the
-# base type, plus any modifiers.  See "p4 help filetypes"
-# for a list and explanation.
-#
 def split_p4_type(p4type):
-
+    """ Canonicalize the p4 type and return a tuple of the
+        base type, plus any modifiers.  See "p4 help filetypes"
+        for a list and explanation.
+    """
     p4_filetypes_historical = {
         "ctempobj": "binary+Sw",
         "ctext": "text+C",
@@ -452,18 +553,16 @@ def split_p4_type(p4type):
         mods = s[1]
     return (base, mods)
 
-#
-# return the raw p4 type of a file (text, text+ko, etc)
-#
 def p4_type(f):
+    """ return the raw p4 type of a file (text, text+ko, etc)
+    """
     results = p4CmdList(["fstat", "-T", "headType", wildcard_encode(f)])
     return results[0]['headType']
 
-#
-# Given a type base and modifier, return a regexp matching
-# the keywords that can be expanded in the file
-#
 def p4_keywords_regexp_for_type(base, type_mods):
+    """ Given a type base and modifier, return a regexp matching
+        the keywords that can be expanded in the file
+    """
     if base in ("text", "unicode", "binary"):
         kwords = None
         if "ko" in type_mods:
@@ -482,12 +581,11 @@ def p4_keywords_regexp_for_type(base, type_mods):
     else:
         return None
 
-#
-# Given a file, return a regexp matching the possible
-# RCS keywords that will be expanded, or None for files
-# with kw expansion turned off.
-#
 def p4_keywords_regexp_for_file(file):
+    """ Given a file, return a regexp matching the possible
+        RCS keywords that will be expanded, or None for files
+        with kw expansion turned off.
+    """
     if not os.path.exists(file):
         return None
     else:
@@ -522,7 +620,7 @@ def getP4OpenedType(file):
 # Return the set of all p4 labels
 def getP4Labels(depotPaths):
     labels = set()
-    if isinstance(depotPaths,basestring):
+    if not isinstance(depotPaths, list):
         depotPaths = [depotPaths]
 
     for l in p4CmdList(["labels"] + ["%s..." % p for p in depotPaths]):
@@ -531,8 +629,8 @@ def getP4Labels(depotPaths):
 
     return labels
 
-# Return the set of all git tags
 def getGitTags():
+    """Return the set of all git tags"""
     gitTags = set()
     for line in read_pipe_lines(["git", "tag"]):
         tag = line.strip()
@@ -565,7 +663,7 @@ def parseDiffTreeEntry(entry):
 
     If the pattern is not matched, None is returned."""
 
-    match = diffTreePattern().next().match(entry)
+    match = next(diffTreePattern()).match(entry)
     if match:
         return {
             'src_mode': match.group(1),
@@ -584,6 +682,38 @@ def isModeExec(mode):
     # otherwise False.
     return mode[-3:] == "755"
 
+def encodeWithUTF8(path, verbose = False):
+    """ Ensure that the path is encoded as a UTF-8 string
+
+        Returns bytes(P3)/str(P2)
+    """
+   
+    if isunicode:
+        try:
+            if isinstance(path, unicode):
+                # It is already unicode, cast it as a bytes
+                # that is encoded as utf-8.
+                return path.encode('utf-8', 'strict')
+            path.decode('ascii', 'strict')
+        except:
+            encoding = 'utf8'
+            if gitConfig('git-p4.pathEncoding'):
+                encoding = gitConfig('git-p4.pathEncoding')
+            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
+            if verbose:
+                print('\nNOTE:Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, to_unicode(path)))
+    else:    
+        try:
+            path.decode('ascii')
+        except:
+            encoding = 'utf8'
+            if gitConfig('git-p4.pathEncoding'):
+                encoding = gitConfig('git-p4.pathEncoding')
+            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
+            if verbose:
+                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
+    return path
+
 class P4Exception(Exception):
     """ Base class for exceptions from the p4 client """
     def __init__(self, exit_code):
@@ -607,9 +737,25 @@ def isModeExecChanged(src_mode, dst_mode):
     return isModeExec(src_mode) != isModeExec(dst_mode)
 
 def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
-        errors_as_exceptions=False):
+        errors_as_exceptions=False, encode_data=True):
+    """ Executes a P4 command:  'cmd' optionally passing 'stdin' to the command's
+        standard input via a temporary file with 'stdin_mode' mode.
+
+        Output from the command is optionally passed to the callback function 'cb'.
+        If 'cb' is None, the response from the command is parsed into a list
+        of resulting dictionaries. (For each block read from the process pipe.)
+
+        If 'skip_info' is true, information in a block read that has a code type of
+        'info' will be skipped.
 
-    if isinstance(cmd,basestring):
+        If 'errors_as_exceptions' is set to true (the default is false) the error
+        code returned from the execution will generate an exception.
+
+        If 'encode_data' is set to true (the default) the data that is returned 
+        by this function will be passed through the "as_string" function.
+    """
+
+    if not isinstance(cmd, list):
         cmd = "-G " + cmd
         expand = True
     else:
@@ -626,11 +772,11 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
     stdin_file = None
     if stdin is not None:
         stdin_file = tempfile.TemporaryFile(prefix='p4-stdin', mode=stdin_mode)
-        if isinstance(stdin,basestring):
+        if not isinstance(stdin, list):
             stdin_file.write(stdin)
         else:
             for i in stdin:
-                stdin_file.write(i + '\n')
+                stdin_file.write(as_bytes(i) + b'\n')
         stdin_file.flush()
         stdin_file.seek(0)
 
@@ -644,12 +790,15 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
         while True:
             entry = marshal.load(p4.stdout)
             if skip_info:
-                if 'code' in entry and entry['code'] == 'info':
+                if b'code' in entry and entry[b'code'] == b'info':
                     continue
             if cb is not None:
                 cb(entry)
             else:
-                result.append(entry)
+                out = {}
+                for key, value in entry.items():
+                    out[as_string(key)] = (as_string(value) if encode_data else value)
+                result.append(out)
     except EOFError:
         pass
     exitCode = p4.wait()
@@ -677,6 +826,7 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
     return result
 
 def p4Cmd(cmd):
+    """ Executes a P4 command an returns the results in a dictionary"""
     list = p4CmdList(cmd)
     result = {}
     for entry in list:
@@ -772,6 +922,7 @@ def extractSettingsGitLog(log):
     return values
 
 def gitBranchExists(branch):
+    """Checks to see if a given branch exists in the git repo"""
     proc = subprocess.Popen(["git", "rev-parse", branch],
                             stderr=subprocess.PIPE, stdout=subprocess.PIPE);
     return proc.wait() == 0;
@@ -785,20 +936,22 @@ def gitDeleteRef(ref):
 _gitConfig = {}
 
 def gitConfig(key, typeSpecifier=None):
+    """ Return a configuration setting from GIT
+	"""
     if key not in _gitConfig:
         cmd = [ "git", "config" ]
         if typeSpecifier:
             cmd += [ typeSpecifier ]
         cmd += [ key ]
         s = read_pipe(cmd, ignore_error=True)
-        _gitConfig[key] = s.strip()
+        _gitConfig[key] = as_string(s).strip()
     return _gitConfig[key]
 
 def gitConfigBool(key):
-    """Return a bool, using git config --bool.  It is True only if the
-       variable is set to true, and False if set to false or not present
-       in the config."""
-
+    """ Return a bool, using git config --bool.  It is True only if the
+        variable is set to true, and False if set to false or not present
+        in the config.
+    """
     if key not in _gitConfig:
         _gitConfig[key] = gitConfig(key, '--bool') == "true"
     return _gitConfig[key]
@@ -822,6 +975,11 @@ def gitConfigList(key):
             _gitConfig[key] = []
     return _gitConfig[key]
 
+def gitConfigSet(key, value):
+    """ Set the git configuration key 'key' to 'value' for this session
+    """
+    _gitConfig[key] = value
+
 def p4BranchesInGit(branchesAreInRemotes=True):
     """Find all the branches whose names start with "p4/", looking
        in remotes or heads as specified by the argument.  Return
@@ -860,6 +1018,7 @@ def branch_exists(branch):
     cmd = [ "git", "rev-parse", "--symbolic", "--verify", branch ]
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     out, _ = p.communicate()
+    out = as_string(out)
     if p.returncode:
         return False
     # expect exactly one line of output: the branch name
@@ -869,7 +1028,7 @@ def findUpstreamBranchPoint(head = "HEAD"):
     branches = p4BranchesInGit()
     # map from depot-path to branch name
     branchByDepotPath = {}
-    for branch in branches.keys():
+    for branch in list(branches.keys()):
         tip = branches[branch]
         log = extractLogMessageFromGitCommit(tip)
         settings = extractSettingsGitLog(log)
@@ -940,7 +1099,8 @@ def createOrUpdateBranchesFromOrigin(localRefPrefix = "refs/remotes/p4/", silent
             system("git update-ref %s %s" % (remoteHead, originHead))
 
 def originP4BranchesExist():
-        return gitBranchExists("origin") or gitBranchExists("origin/p4") or gitBranchExists("origin/p4/master")
+    """Checks if origin/p4/master exists"""
+    return gitBranchExists("origin") or gitBranchExists("origin/p4") or gitBranchExists("origin/p4/master")
 
 
 def p4ParseNumericChangeRange(parts):
@@ -1035,7 +1195,7 @@ def p4ChangesForPaths(depotPaths, changeRange, requestedBlockSize):
     changes = sorted(changes)
     return changes
 
-def p4PathStartsWith(path, prefix):
+def p4PathStartsWith(path, prefix, verbose = False):
     # This method tries to remedy a potential mixed-case issue:
     #
     # If UserA adds  //depot/DirA/file1
@@ -1043,9 +1203,22 @@ def p4PathStartsWith(path, prefix):
     #
     # we may or may not have a problem. If you have core.ignorecase=true,
     # we treat DirA and dira as the same directory
+    
+    # Since we have to deal with mixed encodings for p4 file
+    # paths, first perform a simple startswith check, this covers
+    # the case that the formats and path are identical.
+    if as_bytes(path).startswith(as_bytes(prefix)):
+        return True
+    
+    # attempt to convert the prefix and path both to utf8
+    path_utf8 = encodeWithUTF8(path)
+    prefix_utf8 = encodeWithUTF8(prefix)
+
     if gitConfigBool("core.ignorecase"):
-        return path.lower().startswith(prefix.lower())
-    return path.startswith(prefix)
+        # Check if we match byte-per-byte.  
+        
+        return path_utf8.lower().startswith(prefix_utf8.lower())
+    return path_utf8.startswith(prefix_utf8)
 
 def getClientSpec():
     """Look at the p4 client spec, create a View() object that contains
@@ -1063,7 +1236,7 @@ def getClientSpec():
     client_name = entry["Client"]
 
     # just the keys that start with "View"
-    view_keys = [ k for k in entry.keys() if k.startswith("View") ]
+    view_keys = [ k for k in list(entry.keys()) if k.startswith("View") ]
 
     # hold this new View
     view = View(client_name)
@@ -1101,18 +1274,24 @@ def wildcard_decode(path):
     # Cannot have * in a filename in windows; untested as to
     # what p4 would do in such a case.
     if not platform.system() == "Windows":
-        path = path.replace("%2A", "*")
-    path = path.replace("%23", "#") \
-               .replace("%40", "@") \
-               .replace("%25", "%")
+        path = path.replace(b"%2A", b"*")
+    path = path.replace(b"%23", b"#") \
+               .replace(b"%40", b"@") \
+               .replace(b"%25", b"%")
     return path
 
 def wildcard_encode(path):
     # do % first to avoid double-encoding the %s introduced here
-    path = path.replace("%", "%25") \
-               .replace("*", "%2A") \
-               .replace("#", "%23") \
-               .replace("@", "%40")
+    if isinstance(path, unicode):
+        path = path.replace("%", "%25") \
+                   .replace("*", "%2A") \
+                   .replace("#", "%23") \
+                   .replace("@", "%40")
+    else:
+        path = path.replace(b"%", b"%25") \
+                   .replace(b"*", b"%2A") \
+                   .replace(b"#", b"%23") \
+                   .replace(b"@", b"%40")
     return path
 
 def wildcard_present(path):
@@ -1244,7 +1423,7 @@ def generatePointer(self, contentFile):
             ['git', 'lfs', 'pointer', '--file=' + contentFile],
             stdout=subprocess.PIPE
         )
-        pointerFile = pointerProcess.stdout.read()
+        pointerFile = as_string(pointerProcess.stdout.read())
         if pointerProcess.wait():
             os.remove(contentFile)
             die('git-lfs pointer command failed. Did you install the extension?')
@@ -1305,7 +1484,7 @@ def processContent(self, git_mode, relPath, contents):
         else:
             return LargeFileSystem.processContent(self, git_mode, relPath, contents)
 
-class Command:
+class Command(object):
     delete_actions = ( "delete", "move/delete", "purge" )
     add_actions = ( "add", "branch", "move/add" )
 
@@ -1320,7 +1499,7 @@ def ensure_value(self, attr, value):
             setattr(self, attr, value)
         return getattr(self, attr)
 
-class P4UserMap:
+class P4UserMap(object):
     def __init__(self):
         self.userMapFromPerforceServer = False
         self.myP4UserId = None
@@ -1345,10 +1524,14 @@ def p4UserIsMe(self, p4User):
             return True
 
     def getUserCacheFilename(self):
+        """ Returns the filename of the username cache """
         home = os.environ.get("HOME", os.environ.get("USERPROFILE"))
-        return home + "/.gitp4-usercache.txt"
+        return os.path.join(home, ".gitp4-usercache.txt")
 
     def getUserMapFromPerforceServer(self):
+        """ Creates the usercache from the data in P4.
+        """
+        
         if self.userMapFromPerforceServer:
             return
         self.users = {}
@@ -1371,21 +1554,24 @@ def getUserMapFromPerforceServer(self):
                 self.emails[email] = user
 
         s = ''
-        for (key, val) in self.users.items():
+        for (key, val) in list(self.users.items()):
             s += "%s\t%s\n" % (key.expandtabs(1), val.expandtabs(1))
 
-        open(self.getUserCacheFilename(), "wb").write(s)
+        cache = io.open(self.getUserCacheFilename(), "wb")
+        cache.write(as_bytes(s))
+        cache.close()
         self.userMapFromPerforceServer = True
 
     def loadUserMapFromCache(self):
+        """ Reads the P4 username to git email map """
         self.users = {}
         self.userMapFromPerforceServer = False
         try:
-            cache = open(self.getUserCacheFilename(), "rb")
+            cache = io.open(self.getUserCacheFilename(), "rb")
             lines = cache.readlines()
             cache.close()
             for line in lines:
-                entry = line.strip().split("\t")
+                entry = as_string(line).strip().split("\t")
                 self.users[entry[0]] = entry[1]
         except IOError:
             self.getUserMapFromPerforceServer()
@@ -1585,21 +1771,27 @@ def prepareLogMessage(self, template, message, jobs):
         return result
 
     def patchRCSKeywords(self, file, pattern):
-        # Attempt to zap the RCS keywords in a p4 controlled file matching the given pattern
+        """ Attempt to zap the RCS keywords in a p4 
+            controlled file matching the given pattern
+        """
+        bSubLine = as_bytes(r'$\1$')
         (handle, outFileName) = tempfile.mkstemp(dir='.')
         try:
-            outFile = os.fdopen(handle, "w+")
-            inFile = open(file, "r")
-            regexp = re.compile(pattern, re.VERBOSE)
+            outFile = os.fdopen(handle, "w+b")
+            inFile = open(file, "rb")
+            regexp = re.compile(as_bytes(pattern), re.VERBOSE)
             for line in inFile.readlines():
-                line = regexp.sub(r'$\1$', line)
+                line = regexp.sub(bSubLine, line)
                 outFile.write(line)
             inFile.close()
             outFile.close()
+            outFile = None
             # Forcibly overwrite the original file
             os.unlink(file)
             shutil.move(outFileName, file)
         except:
+            if outFile != None:
+                outFile.close()
             # cleanup our temporary file
             os.unlink(outFileName)
             print("Failed to strip RCS keywords in %s" % file)
@@ -1722,14 +1914,14 @@ def prepareSubmitTemplate(self, changelist=None):
                 break
         if not change_entry:
             die('Failed to decode output of p4 change -o')
-        for key, value in change_entry.iteritems():
+        for key, value in list(change_entry.items()):
             if key.startswith('File'):
                 if 'depot-paths' in settings:
                     if not [p for p in settings['depot-paths']
-                            if p4PathStartsWith(value, p)]:
+                            if p4PathStartsWith(value, p, self.verbose)]:
                         continue
                 else:
-                    if not p4PathStartsWith(value, self.depotPath):
+                    if not p4PathStartsWith(value, self.depotPath, self.verbose):
                         continue
                 files_list.append(value)
                 continue
@@ -1779,7 +1971,8 @@ def edit_template(self, template_file):
             return True
 
         while True:
-            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ")
+            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ").lower() \
+                .strip()[0]
             if response == 'y':
                 return True
             if response == 'n':
@@ -1817,8 +2010,8 @@ def get_diff_description(self, editedFiles, filesToAdd, symlinks):
     def applyCommit(self, id):
         """Apply one commit, return True if it succeeded."""
 
-        print("Applying", read_pipe(["git", "show", "-s",
-                                     "--format=format:%h %s", id]))
+        print(("Applying", read_pipe(["git", "show", "-s",
+                                     "--format=format:%h %s", id])))
 
         (p4User, gitEmail) = self.p4UserForCommit(id)
 
@@ -1939,8 +2132,23 @@ def applyCommit(self, id):
                     # disable the read-only bit on windows.
                     if self.isWindows and file not in editedFiles:
                         os.chmod(file, stat.S_IWRITE)
-                    self.patchRCSKeywords(file, kwfiles[file])
-                    fixed_rcs_keywords = True
+                    
+                    try:
+                        self.patchRCSKeywords(file, kwfiles[file])
+                        fixed_rcs_keywords = True
+                    except:
+                        # We are throwing an exception, undo all open edits
+                        for f in editedFiles:
+                            p4_revert(f)
+                        raise
+            else:
+                # They do not have attemptRCSCleanup set, this might be the fail point
+                # Check to see if the file has RCS keywords and suggest setting the property.
+                for file in editedFiles | filesToDelete:
+                    if p4_keywords_regexp_for_file(file) != None:
+                        print("At least one file in this commit has RCS Keywords that may be causing problems. ")
+                        print("Consider:\ngit config git-p4.attemptRCSCleanup true")
+                        break
 
             if fixed_rcs_keywords:
                 print("Retrying the patch with RCS keywords cleaned up")
@@ -1966,7 +2174,7 @@ def applyCommit(self, id):
             p4_delete(f)
 
         # Set/clear executable bits
-        for f in filesToChangeExecBit.keys():
+        for f in list(filesToChangeExecBit.keys()):
             mode = filesToChangeExecBit[f]
             setP4ExecBit(f, mode)
 
@@ -2003,7 +2211,7 @@ def applyCommit(self, id):
         tmpFile = os.fdopen(handle, "w+b")
         if self.isWindows:
             submitTemplate = submitTemplate.replace("\n", "\r\n")
-        tmpFile.write(submitTemplate)
+        tmpFile.write(as_bytes(submitTemplate))
         tmpFile.close()
 
         if self.prepare_p4_only:
@@ -2053,8 +2261,8 @@ def applyCommit(self, id):
                 message = tmpFile.read()
                 tmpFile.close()
                 if self.isWindows:
-                    message = message.replace("\r\n", "\n")
-                submitTemplate = message[:message.index(separatorLine)]
+                    message = message.replace(b"\r\n", b"\n")
+                submitTemplate = message[:message.index(as_bytes(separatorLine))]
 
                 if update_shelve:
                     p4_write_pipe(['shelve', '-r', '-i'], submitTemplate)
@@ -2164,6 +2372,50 @@ def exportGitTags(self, gitTags):
                 if verbose:
                     print("created p4 label for tag %s" % name)
 
+    def run_hook(self, hook_name, args = []):
+        """ Runs a hook if it is found.
+
+            Returns NONE if the hook does not exist
+            Returns TRUE if the exit code is 0, FALSE for a non-zero exit code.
+        """
+        hook_file = self.find_hook(hook_name)
+        if hook_file == None:
+            if self.verbose:
+                print("Skipping hook: %s" % hook_name)
+            return None
+
+        if self.verbose:
+            print("hooks_path = %s " % hooks_path)
+            print("hook_file = %s " % hook_file)
+
+        # Run the hook
+        # TODO - allow non-list format
+        cmd = [hook_file] + args
+        return subprocess.call(cmd) == 0
+
+    def find_hook(self, hook_name):
+        """ Locates the hook file for the given operating system.
+        """
+        hooks_path = gitConfig("core.hooksPath")
+        if len(hooks_path) <= 0:
+            hooks_path = os.path.join(os.environ.get("GIT_DIR", ".git"), "hooks")
+
+        # Look in the obvious place
+        hook_file = os.path.join(hooks_path, hook_name)
+        if os.path.isfile(hook_file) and os.access(hook_file, os.X_OK):
+            return hook_file
+
+        # if we are windows, we will also allow them to have the hooks have extensions
+        if (platform.system() == "Windows"):
+            for ext in ['.exe', '.bat', 'ps1']:
+                if os.path.isfile(hook_file + ext) and os.access(hook_file + ext, os.X_OK):
+                    return hook_file + ext
+
+        # We didn't find the file
+        return None
+
+
+
     def run(self, args):
         if len(args) == 0:
             self.master = currentGitBranch()
@@ -2219,7 +2471,7 @@ def run(self, args):
             self.clientSpecDirs = getClientSpec()
 
         # Check for the existence of P4 branches
-        branchesDetected = (len(p4BranchesInGit().keys()) > 1)
+        branchesDetected = (len(list(p4BranchesInGit().keys())) > 1)
 
         if self.useClientSpec and not branchesDetected:
             # all files are relative to the client spec
@@ -2314,12 +2566,8 @@ def run(self, args):
             sys.exit("number of commits (%d) must match number of shelved changelist (%d)" %
                      (len(commits), num_shelves))
 
-        hooks_path = gitConfig("core.hooksPath")
-        if len(hooks_path) <= 0:
-            hooks_path = os.path.join(os.environ.get("GIT_DIR", ".git"), "hooks")
-
-        hook_file = os.path.join(hooks_path, "p4-pre-submit")
-        if os.path.isfile(hook_file) and os.access(hook_file, os.X_OK) and subprocess.call([hook_file]) != 0:
+        rtn = self.run_hook("p4-pre-submit")
+        if rtn == False:
             sys.exit(1)
 
         #
@@ -2332,8 +2580,8 @@ def run(self, args):
         last = len(commits) - 1
         for i, commit in enumerate(commits):
             if self.dry_run:
-                print(" ", read_pipe(["git", "show", "-s",
-                                      "--format=format:%h %s", commit]))
+                print((" ", read_pipe(["git", "show", "-s",
+                                      "--format=format:%h %s", commit])))
                 ok = True
             else:
                 ok = self.applyCommit(commit)
@@ -2351,7 +2599,7 @@ def run(self, args):
                         if self.conflict_behavior == "ask":
                             print("What do you want to do?")
                             response = raw_input("[s]kip this commit but apply"
-                                                 " the rest, or [q]uit? ")
+                                                 " the rest, or [q]uit? ").lower().strip()[0]
                             if not response:
                                 continue
                         elif self.conflict_behavior == "skip":
@@ -2403,8 +2651,8 @@ def run(self, args):
                         star = "*"
                     else:
                         star = " "
-                    print(star, read_pipe(["git", "show", "-s",
-                                           "--format=format:%h %s",  c]))
+                    print((star, read_pipe(["git", "show", "-s",
+                                           "--format=format:%h %s",  c])))
                 print("You will have to do 'git p4 sync' and rebase.")
 
         if gitConfigBool("git-p4.exportLabels"):
@@ -2533,6 +2781,7 @@ def cloneExcludeCallback(option, opt_str, value, parser):
     # ("-//depot/A/..." becomes "/depot/A/..." after option parsing)
     parser.values.cloneExclude += ["/" + re.sub(r"\.\.\.$", "", value)]
 
+
 class P4Sync(Command, P4UserMap):
 
     def __init__(self):
@@ -2610,7 +2859,7 @@ def __init__(self):
         self.knownBranches = {}
         self.initialParents = {}
 
-        self.tz = "%+03d%02d" % (- time.timezone / 3600, ((- time.timezone % 3600) / 60))
+        self.tz = "%+03d%02d" % (- time.timezone // 3600, ((- time.timezone % 3600) // 60))
         self.labels = {}
 
     # Force a checkpoint in fast-import and wait for it to finish
@@ -2624,17 +2873,23 @@ def checkpoint(self):
     def isPathWanted(self, path):
         for p in self.cloneExclude:
             if p.endswith("/"):
-                if p4PathStartsWith(path, p):
+                if p4PathStartsWith(path, p, self.verbose):
                     return False
             # "-//depot/file1" without a trailing "/" should only exclude "file1", but not "file111" or "file1_dir/file2"
             elif path.lower() == p.lower():
                 return False
         for p in self.depotPaths:
-            if p4PathStartsWith(path, p):
+            if p4PathStartsWith(path, p, self.verbose):
                 return True
         return False
 
     def extractFilesFromCommit(self, commit, shelved=False, shelved_cl = 0):
+        """ Generates the list of files to be added in this git commit.
+
+            commit     = Unicode[] - data read from the P4 commit
+            shelved    = Bool      - Is the P4 commit flagged as being shelved.
+            shelved_cl = Unicode   - Numeric string with the changelist number.
+        """
         files = []
         fnum = 0
         while "depotFile%s" % fnum in commit:
@@ -2676,7 +2931,7 @@ def stripRepoPath(self, path, prefixes):
             path = self.clientSpecDirs.map_in_client(path)
             if self.detectBranches:
                 for b in self.knownBranches:
-                    if p4PathStartsWith(path, b + "/"):
+                    if p4PathStartsWith(path, b + "/", self.verbose):
                         path = path[len(b)+1:]
 
         elif self.keepRepoPath:
@@ -2684,12 +2939,12 @@ def stripRepoPath(self, path, prefixes):
             # //depot/; just look at first prefix as they all should
             # be in the same depot.
             depot = re.sub("^(//[^/]+/).*", r'\1', prefixes[0])
-            if p4PathStartsWith(path, depot):
+            if p4PathStartsWith(path, depot, self.verbose):
                 path = path[len(depot):]
 
         else:
             for p in prefixes:
-                if p4PathStartsWith(path, p):
+                if p4PathStartsWith(path, p, self.verbose):
                     path = path[len(p):]
                     break
 
@@ -2697,8 +2952,11 @@ def stripRepoPath(self, path, prefixes):
         return path
 
     def splitFilesIntoBranches(self, commit):
-        """Look at each depotFile in the commit to figure out to what
-           branch it belongs."""
+        """ Look at each depotFile in the commit to figure out to what
+            branch it belongs.
+
+            Data in the commit will NOT be encoded
+        """
 
         if self.clientSpecDirs:
             files = self.extractFilesFromCommit(commit)
@@ -2727,10 +2985,10 @@ def splitFilesIntoBranches(self, commit):
             else:
                 relPath = self.stripRepoPath(path, self.depotPaths)
 
-            for branch in self.knownBranches.keys():
+            for branch in list(self.knownBranches.keys()):
                 # add a trailing slash so that a commit into qt/4.2foo
                 # doesn't end up in qt/4.2, e.g.
-                if p4PathStartsWith(relPath, branch + "/"):
+                if p4PathStartsWith(relPath, branch + "/", self.verbose):
                     if branch not in branches:
                         branches[branch] = []
                     branches[branch].append(file)
@@ -2739,36 +2997,34 @@ def splitFilesIntoBranches(self, commit):
         return branches
 
     def writeToGitStream(self, gitMode, relPath, contents):
-        self.gitStream.write('M %s inline %s\n' % (gitMode, relPath))
+        """ Writes the bytes[] 'contents' to the git fast-import
+            with the given 'gitMode' and 'relPath' as the relative
+            path.
+        """
+        self.gitStream.write('M %s inline %s\n' % (gitMode, as_string(relPath)))
         self.gitStream.write('data %d\n' % sum(len(d) for d in contents))
         for d in contents:
-            self.gitStream.write(d)
+            self.gitStreamBytes.write(d)
         self.gitStream.write('\n')
 
-    def encodeWithUTF8(self, path):
-        try:
-            path.decode('ascii')
-        except:
-            encoding = 'utf8'
-            if gitConfig('git-p4.pathEncoding'):
-                encoding = gitConfig('git-p4.pathEncoding')
-            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
-            if self.verbose:
-                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
-        return path
-
-    # output one file from the P4 stream
-    # - helper for streamP4Files
-
     def streamOneP4File(self, file, contents):
+        """ output one file from the P4 stream to the git inbound stream.
+            helper for streamP4files.
+
+            contents should be a bytes (bytes) 
+        """
         relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
-        relPath = self.encodeWithUTF8(relPath)
+        relPath = encodeWithUTF8(relPath, self.verbose)
         if verbose:
             if 'fileSize' in self.stream_file:
                 size = int(self.stream_file['fileSize'])
             else:
                 size = 0 # deleted files don't get a fileSize apparently
-            sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], relPath, size/1024/1024))
+            #if isunicode:
+            #    sys.stdout.write('\r%s --> %s (%i MB)\n' % (path_as_string(file['depotFile']), to_unicode(relPath), size//1024//1024))
+            #else:
+            #    sys.stdout.write('\r%s --> %s (%i MB)\n' % (path_as_string(file['depotFile']), relPath, size//1024//1024))
+            sys.stdout.write('\r%s --> %s (%i MB)\n' % (path_as_string(file['depotFile']), as_string(relPath), size//1024//1024))
             sys.stdout.flush()
 
         (type_base, type_mods) = split_p4_type(file["type"])
@@ -2786,7 +3042,7 @@ def streamOneP4File(self, file, contents):
                 # to nothing.  This causes p4 errors when checking out such
                 # a change, and errors here too.  Work around it by ignoring
                 # the bad symlink; hopefully a future change fixes it.
-                print("\nIgnoring empty symlink in %s" % file['depotFile'])
+                print("\nIgnoring empty symlink in %s" % path_as_string(file['depotFile']))
                 return
             elif data[-1] == '\n':
                 contents = [data[:-1]]
@@ -2826,16 +3082,16 @@ def streamOneP4File(self, file, contents):
             # Ideally, someday, this script can learn how to generate
             # appledouble files directly and import those to git, but
             # non-mac machines can never find a use for apple filetype.
-            print("\nIgnoring apple filetype file %s" % file['depotFile'])
+            print("\nIgnoring apple filetype file %s" % path_as_string(file['depotFile']))
             return
 
         # Note that we do not try to de-mangle keywords on utf16 files,
         # even though in theory somebody may want that.
-        pattern = p4_keywords_regexp_for_type(type_base, type_mods)
+        pattern = as_bytes(p4_keywords_regexp_for_type(type_base, type_mods))
         if pattern:
             regexp = re.compile(pattern, re.VERBOSE)
-            text = ''.join(contents)
-            text = regexp.sub(r'$\1$', text)
+            text = b''.join(contents)
+            text = regexp.sub(as_bytes(r'$\1$'), text)
             contents = [ text ]
 
         if self.largeFileSystem:
@@ -2845,7 +3101,7 @@ def streamOneP4File(self, file, contents):
 
     def streamOneP4Deletion(self, file):
         relPath = self.stripRepoPath(file['path'], self.branchPrefixes)
-        relPath = self.encodeWithUTF8(relPath)
+        relPath = encodeWithUTF8(relPath, self.verbose)
         if verbose:
             sys.stdout.write("delete %s\n" % relPath)
             sys.stdout.flush()
@@ -2854,21 +3110,25 @@ def streamOneP4Deletion(self, file):
         if self.largeFileSystem and self.largeFileSystem.isLargeFile(relPath):
             self.largeFileSystem.removeLargeFile(relPath)
 
-    # handle another chunk of streaming data
     def streamP4FilesCb(self, marshalled):
+        """ Callback function for recording P4 chunks of data for streaming 
+            into GIT.
+
+            marshalled data is bytes[] from the caller
+        """
 
         # catch p4 errors and complain
         err = None
-        if "code" in marshalled:
-            if marshalled["code"] == "error":
-                if "data" in marshalled:
-                    err = marshalled["data"].rstrip()
+        if b"code" in marshalled:
+            if marshalled[b"code"] == b"error":
+                if b"data" in marshalled:
+                    err = marshalled[b"data"].rstrip()
 
         if not err and 'fileSize' in self.stream_file:
             required_bytes = int((4 * int(self.stream_file["fileSize"])) - calcDiskFree())
             if required_bytes > 0:
                 err = 'Not enough space left on %s! Free at least %i MB.' % (
-                    os.getcwd(), required_bytes/1024/1024
+                    os.getcwd(), required_bytes//1024//1024
                 )
 
         if err:
@@ -2884,11 +3144,11 @@ def streamP4FilesCb(self, marshalled):
             # ignore errors, but make sure it exits first
             self.importProcess.wait()
             if f:
-                die("Error from p4 print for %s: %s" % (f, err))
+                die("Error from p4 print for %s: %s" % (path_as_string(f), err))
             else:
                 die("Error from p4 print: %s" % err)
 
-        if 'depotFile' in marshalled and self.stream_have_file_info:
+        if b'depotFile' in marshalled and self.stream_have_file_info:
             # start of a new file - output the old one first
             self.streamOneP4File(self.stream_file, self.stream_contents)
             self.stream_file = {}
@@ -2897,14 +3157,17 @@ def streamP4FilesCb(self, marshalled):
 
         # pick up the new file information... for the
         # 'data' field we need to append to our array
-        for k in marshalled.keys():
-            if k == 'data':
+        for k in list(marshalled.keys()):
+            if k == b'data':
                 if 'streamContentSize' not in self.stream_file:
                     self.stream_file['streamContentSize'] = 0
-                self.stream_file['streamContentSize'] += len(marshalled['data'])
-                self.stream_contents.append(marshalled['data'])
+                self.stream_file['streamContentSize'] += len(marshalled[b'data'])
+                self.stream_contents.append(marshalled[b'data'])
             else:
-                self.stream_file[k] = marshalled[k]
+                if k == b'depotFile':
+                    self.stream_file[as_string(k)] = marshalled[k]
+                else:
+                    self.stream_file[as_string(k)] = as_string(marshalled[k])
 
         if (verbose and
             'streamContentSize' in self.stream_file and
@@ -2912,14 +3175,15 @@ def streamP4FilesCb(self, marshalled):
             'depotFile' in self.stream_file):
             size = int(self.stream_file["fileSize"])
             if size > 0:
-                progress = 100*self.stream_file['streamContentSize']/size
-                sys.stdout.write('\r%s %d%% (%i MB)' % (self.stream_file['depotFile'], progress, int(size/1024/1024)))
+                progress = 100.0*self.stream_file['streamContentSize']/size
+                sys.stdout.write('\r%s %4.1f%% (%i MB)' % (path_as_string(self.stream_file['depotFile']), progress, int(size//1024//1024)))
                 sys.stdout.flush()
 
         self.stream_have_file_info = True
 
-    # Stream directly from "p4 files" into "git fast-import"
     def streamP4Files(self, files):
+        """ Stream directly from "p4 files" into "git fast-import" 
+        """
         filesForCommit = []
         filesToRead = []
         filesToDelete = []
@@ -2940,7 +3204,7 @@ def streamP4Files(self, files):
             self.stream_contents = []
             self.stream_have_file_info = False
 
-            # curry self argument
+            # Callback for P4 command to collect file content
             def streamP4FilesCbSelf(entry):
                 self.streamP4FilesCb(entry)
 
@@ -2949,9 +3213,9 @@ def streamP4FilesCbSelf(entry):
                 if 'shelved_cl' in f:
                     # Handle shelved CLs using the "p4 print file@=N" syntax to print
                     # the contents
-                    fileArg = '%s@=%d' % (f['path'], f['shelved_cl'])
+                    fileArg = b'%s@=%d' % (f['path'], as_bytes(f['shelved_cl']))
                 else:
-                    fileArg = '%s#%s' % (f['path'], f['rev'])
+                    fileArg = b'%s#%s' % (f['path'], as_bytes(f['rev']))
 
                 fileArgs.append(fileArg)
 
@@ -2971,7 +3235,7 @@ def make_email(self, userid):
 
     def streamTag(self, gitStream, labelName, labelDetails, commit, epoch):
         """ Stream a p4 tag.
-        commit is either a git commit, or a fast-import mark, ":<p4commit>"
+            commit is either a git commit, or a fast-import mark, ":<p4commit>"
         """
 
         if verbose:
@@ -2994,7 +3258,7 @@ def streamTag(self, gitStream, labelName, labelDetails, commit, epoch):
 
         gitStream.write("tagger %s\n" % tagger)
 
-        print("labelDetails=",labelDetails)
+        print(("labelDetails=",labelDetails))
         if 'Description' in labelDetails:
             description = labelDetails['Description']
         else:
@@ -3016,7 +3280,7 @@ def hasBranchPrefix(self, path):
         if not self.branchPrefixes:
             return True
         hasPrefix = [p for p in self.branchPrefixes
-                        if p4PathStartsWith(path, p)]
+                        if p4PathStartsWith(path, p, self.verbose)]
         if not hasPrefix and self.verbose:
             print('Ignoring file outside of prefix: {0}'.format(path))
         return hasPrefix
@@ -3043,7 +3307,22 @@ def commit(self, details, files, branch, parent = "", allow_empty=False):
                 .format(details['change']))
             return
 
+        # fast-import:
+        #'commit' SP <ref> LF
+	    #mark?
+	    #original-oid?
+	    #('author' (SP <name>)? SP LT <email> GT SP <when> LF)?
+	    #'committer' (SP <name>)? SP LT <email> GT SP <when> LF
+	    #('encoding' SP <encoding>)?
+	    #data
+	    #('from' SP <commit-ish> LF)?
+	    #('merge' SP <commit-ish> LF)*
+	    #(filemodify | filedelete | filecopy | filerename | filedeleteall | notemodify)*
+	    #LF?
+        
+        #'commit' - <ref> is the name of the branch to make the commit on
         self.gitStream.write("commit %s\n" % branch)
+        #'mark' SP :<idnum>
         self.gitStream.write("mark :%s\n" % details["change"])
         self.committedChanges.add(int(details["change"]))
         committer = ""
@@ -3053,19 +3332,29 @@ def commit(self, details, files, branch, parent = "", allow_empty=False):
 
         self.gitStream.write("committer %s\n" % committer)
 
-        self.gitStream.write("data <<EOT\n")
-        self.gitStream.write(details["desc"])
+        # Per https://git-scm.com/docs/git-fast-import
+        # The preferred method for creating the commit message is to supply the 
+        # byte count in the data method and not to use a Delimited format. 
+        # Collect all the text in the commit message into a single string and 
+        # compute the byte count.
+        commitText = details["desc"]
         if len(jobs) > 0:
-            self.gitStream.write("\nJobs: %s" % (' '.join(jobs)))
-
+            commitText += "\nJobs: %s" % (' '.join(jobs))
         if not self.suppress_meta_comment:
-            self.gitStream.write("\n[git-p4: depot-paths = \"%s\": change = %s" %
-                                (','.join(self.branchPrefixes), details["change"]))
-            if len(details['options']) > 0:
-                self.gitStream.write(": options = %s" % details['options'])
-            self.gitStream.write("]\n")
+            # coherce the path to the correct formatting in the branch prefixes as well.
+            dispPaths = []
+            for p in self.branchPrefixes:
+                dispPaths += [path_as_string(p)]
 
-        self.gitStream.write("EOT\n\n")
+            commitText += ("\n[git-p4: depot-paths = \"%s\": change = %s" %
+                                (','.join(dispPaths), details["change"]))
+            if len(details['options']) > 0:
+                commitText += (": options = %s" % details['options'])
+            commitText += "]"
+        commitText += "\n" 
+        self.gitStream.write("data %s\n" % len(as_bytes(commitText)))
+        self.gitStream.write(commitText)
+        self.gitStream.write("\n")
 
         if len(parent) > 0:
             if self.verbose:
@@ -3133,7 +3422,7 @@ def getLabels(self):
             self.labels[newestChange] = [output, revisions]
 
         if self.verbose:
-            print("Label changes: %s" % self.labels.keys())
+            print("Label changes: %s" % list(self.labels.keys()))
 
     # Import p4 labels as git tags. A direct mapping does not
     # exist, so assume that if all the files are at the same revision
@@ -3234,7 +3523,7 @@ def getBranchMapping(self):
                 source = paths[0]
                 destination = paths[1]
                 ## HACK
-                if p4PathStartsWith(source, self.depotPaths[0]) and p4PathStartsWith(destination, self.depotPaths[0]):
+                if p4PathStartsWith(source, self.depotPaths[0], self.verbose) and p4PathStartsWith(destination, self.depotPaths[0], self.verbose):
                     source = source[len(self.depotPaths[0]):-4]
                     destination = destination[len(self.depotPaths[0]):-4]
 
@@ -3276,7 +3565,7 @@ def getBranchMapping(self):
 
     def getBranchMappingFromGitBranches(self):
         branches = p4BranchesInGit(self.importIntoRemotes)
-        for branch in branches.keys():
+        for branch in list(branches.keys()):
             if branch == "master":
                 branch = "main"
             else:
@@ -3388,14 +3677,14 @@ def importChanges(self, changes, origin_revision=0):
             self.updateOptionDict(description)
 
             if not self.silent:
-                sys.stdout.write("\rImporting revision %s (%s%%)" % (change, cnt * 100 / len(changes)))
+                sys.stdout.write("\rImporting revision %s (%4.1f%%)" % (change, cnt * 100 / len(changes)))
                 sys.stdout.flush()
             cnt = cnt + 1
 
             try:
                 if self.detectBranches:
                     branches = self.splitFilesIntoBranches(description)
-                    for branch in branches.keys():
+                    for branch in list(branches.keys()):
                         ## HACK  --hwn
                         branchPrefix = self.depotPaths[0] + branch + "/"
                         self.branchPrefixes = [ branchPrefix ]
@@ -3464,6 +3753,7 @@ def importChanges(self, changes, origin_revision=0):
                 sys.exit(1)
 
     def sync_origin_only(self):
+        """ Ensures that the origin has been synchronized if one is set """
         if self.syncWithOrigin:
             self.hasOrigin = originP4BranchesExist()
             if self.hasOrigin:
@@ -3472,30 +3762,35 @@ def sync_origin_only(self):
                 system("git fetch origin")
 
     def importHeadRevision(self, revision):
-        print("Doing initial import of %s from revision %s into %s" % (' '.join(self.depotPaths), revision, self.branch))
-
+        # Re-encode depot text
+        dispPaths = []
+        utf8Paths = []
+        for p in self.depotPaths:
+            dispPaths += [path_as_string(p)]
+        print("Doing initial import of %s from revision %s into %s" % (' '.join(dispPaths), revision, self.branch))
         details = {}
         details["user"] = "git perforce import user"
-        details["desc"] = ("Initial import of %s from the state at revision %s\n"
-                           % (' '.join(self.depotPaths), revision))
+        details["desc"] = ("Initial import of %s from the state at revision %s\n" %
+                           (' '.join(dispPaths), revision))
         details["change"] = revision
         newestRevision = 0
+        del dispPaths
 
         fileCnt = 0
         fileArgs = ["%s...%s" % (p,revision) for p in self.depotPaths]
 
-        for info in p4CmdList(["files"] + fileArgs):
+        for info in p4CmdList(["files"] + fileArgs, encode_data = False):
 
-            if 'code' in info and info['code'] == 'error':
+            if 'code' in info and info['code'] == b'error':
                 sys.stderr.write("p4 returned an error: %s\n"
-                                 % info['data'])
-                if info['data'].find("must refer to client") >= 0:
+                                 % as_string(info['data']))
+                if info['data'].find(b"must refer to client") >= 0:
                     sys.stderr.write("This particular p4 error is misleading.\n")
                     sys.stderr.write("Perhaps the depot path was misspelled.\n");
                     sys.stderr.write("Depot path:  %s\n" % " ".join(self.depotPaths))
                 sys.exit(1)
             if 'p4ExitCode' in info:
-                sys.stderr.write("p4 exitcode: %s\n" % info['p4ExitCode'])
+                sys.stderr.write("p4 exitcode: %s\n" % as_string(info['p4ExitCode']))
                 sys.exit(1)
 
 
@@ -3508,8 +3803,10 @@ def importHeadRevision(self, revision):
                 #fileCnt = fileCnt + 1
                 continue
 
+            # Save all the file information, howerver do not translate the depotFile name at 
+            # this time. Leave that as bytes since the encoding may vary.
             for prop in ["depotFile", "rev", "action", "type" ]:
-                details["%s%s" % (prop, fileCnt)] = info[prop]
+                details["%s%s" % (prop, fileCnt)] = (info[prop] if prop == "depotFile" else as_string(info[prop]))
 
             fileCnt = fileCnt + 1
 
@@ -3529,13 +3826,18 @@ def importHeadRevision(self, revision):
             print(self.gitError.read())
 
     def openStreams(self):
+        """ Opens the fast import pipes.  Note that the git* streams are wrapped
+            to expect Unicode text.  To send a raw byte Array, use the importProcess
+            underlying port
+        """
         self.importProcess = subprocess.Popen(["git", "fast-import"],
                                               stdin=subprocess.PIPE,
                                               stdout=subprocess.PIPE,
                                               stderr=subprocess.PIPE);
-        self.gitOutput = self.importProcess.stdout
-        self.gitStream = self.importProcess.stdin
-        self.gitError = self.importProcess.stderr
+        self.gitOutput = Py23File(self.importProcess.stdout, verbose = self.verbose)
+        self.gitStream = Py23File(self.importProcess.stdin, verbose = self.verbose)
+        self.gitError = Py23File(self.importProcess.stderr, verbose = self.verbose)
+        self.gitStreamBytes = self.importProcess.stdin
 
     def closeStreams(self):
         self.gitStream.close()
@@ -3584,13 +3886,13 @@ def run(self, args):
                 if short in branches:
                     self.p4BranchesInGit = [ short ]
             else:
-                self.p4BranchesInGit = branches.keys()
+                self.p4BranchesInGit = list(branches.keys())
 
             if len(self.p4BranchesInGit) > 1:
                 if not self.silent:
                     print("Importing from/into multiple branches")
                 self.detectBranches = True
-                for branch in branches.keys():
+                for branch in list(branches.keys()):
                     self.initialParents[self.refPrefix + branch] = \
                         branches[branch]
 
@@ -3870,19 +4172,25 @@ def __init__(self):
                                  help="where to leave result of the clone"),
             optparse.make_option("--bare", dest="cloneBare",
                                  action="store_true", default=False),
+            optparse.make_option("--encoding", dest="setPathEncoding",
+                                 action="store", default=None,
+                                 help="Sets the path encoding for this depot")
         ]
         self.cloneDestination = None
         self.needsGit = False
         self.cloneBare = False
+        self.setPathEncoding = None
 
     def defaultDestination(self, args):
+        """Returns the last path component as the default git 
+        repository directory name"""
         ## TODO: use common prefix of args?
         depotPath = args[0]
         depotDir = re.sub("(@[^@]*)$", "", depotPath)
         depotDir = re.sub("(#[^#]*)$", "", depotDir)
         depotDir = re.sub(r"\.\.\.$", "", depotDir)
         depotDir = re.sub(r"/$", "", depotDir)
-        return os.path.split(depotDir)[1]
+        return depotDir.split('/')[-1]
 
     def run(self, args):
         if len(args) < 1:
@@ -3894,19 +4202,29 @@ def run(self, args):
 
         depotPaths = args
 
+        # If we have an encoding provided, ignore what may already exist
+        # in the registry. This will ensure we show the displayed values
+        # using the correct encoding.
+        if self.setPathEncoding:
+            gitConfigSet("git-p4.pathEncoding", self.setPathEncoding)
+
+        # If more than 1 path element is supplied, the last element
+        # is the clone destination.
         if not self.cloneDestination and len(depotPaths) > 1:
             self.cloneDestination = depotPaths[-1]
             depotPaths = depotPaths[:-1]
 
+        dispPaths = []
         for p in depotPaths:
             if not p.startswith("//"):
                 sys.stderr.write('Depot paths must start with "//": %s\n' % p)
                 return False
+            dispPaths += [path_as_string(p)]
 
         if not self.cloneDestination:
             self.cloneDestination = self.defaultDestination(args)
 
-        print("Importing from %s into %s" % (', '.join(depotPaths), self.cloneDestination))
+        print("Importing from %s into %s" % (', '.join(dispPaths), path_as_string(self.cloneDestination)))
 
         if not os.path.exists(self.cloneDestination):
             os.makedirs(self.cloneDestination)
@@ -3919,6 +4237,13 @@ def run(self, args):
         if retcode:
             raise CalledProcessError(retcode, init_cmd)
 
+        # Set the encoding if it was provided command line
+        if self.setPathEncoding:
+            init_cmd= ["git", "config", "git-p4.pathEncoding", self.setPathEncoding]
+            retcode = subprocess.call(init_cmd)
+            if retcode:
+                raise CalledProcessError(retcode, init_cmd)
+
         if not P4Sync.run(self, depotPaths):
             return False
 
@@ -3974,7 +4299,7 @@ def findLastP4Revision(self, starting_point):
             to find the P4 commit we are based on, and the depot-paths.
         """
 
-        for parent in (range(65535)):
+        for parent in (list(range(65535))):
             log = extractLogMessageFromGitCommit("{0}^{1}".format(starting_point, parent))
             settings = extractSettingsGitLog(log)
             if 'change' in settings:
@@ -4080,6 +4405,107 @@ def run(self, args):
             print("%s <= %s (%s)" % (branch, ",".join(settings["depot-paths"]), settings["change"]))
         return True
 
+class Py23File():
+    """ Python2/3 Unicode File Wrapper 
+    """
+    
+    stream_handle = None
+    verbose       = False
+    debug_handle  = None
+   
+    def __init__(self, stream_handle, verbose = False,
+                 debug_handle = None):
+        """ Create a Python3 compliant Unicode to Byte String
+            Windows compatible wrapper
+
+            stream_handle = the underlying file-like handle
+            verbose       = Boolean if content should be echoed
+            debug_handle  = A file-like handle data is duplicately written to
+        """
+        self.stream_handle = stream_handle
+        self.verbose       = verbose
+        self.debug_handle  = debug_handle
+
+    def write(self, utf8string):
+        """ Writes the utf8 encoded string to the underlying 
+            file stream
+        """
+        self.stream_handle.write(as_bytes(utf8string))
+        if self.verbose:
+            sys.stderr.write("Stream Output: %s" % utf8string)
+            sys.stderr.flush()
+        if self.debug_handle:
+            self.debug_handle.write(as_bytes(utf8string))
+
+    def read(self, size = None):
+        """ Reads int charcters from the underlying stream 
+            and converts it to utf8.
+
+            Be aware, the size value is for reading the underlying
+            bytes so the value may be incorrect. Usage of the size
+            value is discouraged.
+        """
+        if size == None:
+            return as_string(self.stream_handle.read())
+        else:
+            return as_string(self.stream_handle.read(size))
+
+    def readline(self):
+        """ Reads a line from the underlying byte stream 
+            and converts it to utf8
+        """
+        return as_string(self.stream_handle.readline())
+
+    def readlines(self, sizeHint = None):
+        """ Returns a list containing lines from the file converted to unicode.
+
+            sizehint - Optional. If the optional sizehint argument is 
+            present, instead of reading up to EOF, whole lines totalling 
+            approximately sizehint bytes are read.
+        """
+        lines = self.stream_handle.readlines(sizeHint)
+        for i in range(0, len(lines)):
+            lines[i] = as_string(lines[i])
+        return lines
+
+    def close(self):
+        """ Closes the underlying byte stream """
+        self.stream_handle.close()
+
+    def flush(self):
+        """ Flushes the underlying byte stream """
+        self.stream_handle.flush()
+
+class DepotPath():
+    """ Describes a DepotPath or File
+    """
+
+    raw_path = None
+    utf8_path = None
+    bytes_path = None
+
+    def __init__(self, path):
+        """ Creates a new DepotPath with the path encoded
+            with by the P4 repository
+        """
+        raw_path = path
+
+    def raw():
+        """ Returns the path as it was originally found
+            in the P4 repository
+        """
+        return raw_path
+
+    def startswith(self, prefix, start = None, end = None):
+        """ Return True if string starts with the prefix, otherwise 
+            return False. prefix can also be a tuple of prefixes to 
+            look for. With optional start, test string beginning at 
+            that position. With optional end, stop comparing 
+            string at that position.
+        """
+        return raw_path.startswith(prefix, start, end)
+
+
 class HelpFormatter(optparse.IndentedHelpFormatter):
     def __init__(self):
         optparse.IndentedHelpFormatter.__init__(self)
@@ -4113,7 +4539,7 @@ def printUsage(commands):
 
 def main():
     if len(sys.argv[1:]) == 0:
-        printUsage(commands.keys())
+        printUsage(list(commands.keys()))
         sys.exit(2)
 
     cmdName = sys.argv[1]
@@ -4123,7 +4549,7 @@ def main():
     except KeyError:
         print("unknown command %s" % cmdName)
         print("")
-        printUsage(commands.keys())
+        printUsage(list(commands.keys()))
         sys.exit(2)
 
     options = cmd.options
@@ -4140,7 +4566,12 @@ def main():
                                    description = cmd.description,
                                    formatter = HelpFormatter())
 
-    (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
+    try:
+        (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
+    except:
+        parser.print_help()
+        raise
+
     global verbose
     verbose = cmd.verbose
     if cmd.needsGit:
@@ -4155,8 +4586,8 @@ def main():
                         chdir(cdup);
 
         if not isValidGitDir(cmd.gitdir):
-            if isValidGitDir(cmd.gitdir + "/.git"):
-                cmd.gitdir += "/.git"
+            if isValidGitDir(os.path.join(cmd.gitdir, ".git")):
+                cmd.gitdir = os.path.join(cmd.gitdir, ".git")
             else:
                 die("fatal: cannot locate git repository at %s" % cmd.gitdir)
 
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3 1/1] Python3 support for t9800 tests. Basic P4/Python3 support
  2019-12-02 19:02     ` [PATCH v3 1/1] Python3 support for t9800 tests. Basic P4/Python3 support Ben Keene via GitGitGadget
@ 2019-12-03  0:18       ` Denton Liu
  2019-12-03 16:03         ` Ben Keene
  0 siblings, 1 reply; 46+ messages in thread
From: Denton Liu @ 2019-12-03  0:18 UTC (permalink / raw)
  To: Ben Keene via GitGitGadget; +Cc: git, Ben Keene, Junio C Hamano

Hi Ben,

Thanks for the contribution!

> Subject: Python3 support for t9800 tests. Basic P4/Python3 support

In git.git, the convention for commit subjects is to use 
"<area>: <summary>". Perhaps something like, "git-p4: support Python 3"?
Although I doubt this patch should remain as is... More below.

On Mon, Dec 02, 2019 at 07:02:16PM +0000, Ben Keene via GitGitGadget wrote:
> From: Ben Keene <seraphire@gmail.com>

It would be nice to have a bit more information about what this patch
does. Could you please fill this in with some more details about the
whats and, more importantly, the _whys_ of your change?

> 
> Signed-off-by: Ben Keene <seraphire@gmail.com>
> ---
>  git-p4.py | 825 +++++++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 628 insertions(+), 197 deletions(-)

This is a very big change to be done in one patch. Could you please
split this into multiple smaller patches that each do one logical
change? For example, you could have the following series of changes:

	1. git-p4: use p4.exe if on Windows
	2. git-p4: introduce encoding helper functions # this is to
	        introduce the as_string(), as_bytes(), etc. functions
	3. git-p4: start using the encoding helper functions
	...

This was just an example and you don't have to follow those literally. I
just wanted to give you an idea of what I meant.

You can see Documentation/SubmittingPatches#separate-commits for more
information.

Thanks,

Denton

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3 1/1] Python3 support for t9800 tests. Basic P4/Python3 support
  2019-12-03  0:18       ` Denton Liu
@ 2019-12-03 16:03         ` Ben Keene
  2019-12-04  6:14           ` Denton Liu
  0 siblings, 1 reply; 46+ messages in thread
From: Ben Keene @ 2019-12-03 16:03 UTC (permalink / raw)
  To: Denton Liu, Ben Keene via GitGitGadget; +Cc: git, Junio C Hamano


On 12/2/2019 7:18 PM, Denton Liu wrote:
> Hi Ben,
>
> Thanks for the contribution!
>
>> Subject: Python3 support for t9800 tests. Basic P4/Python3 support
> In git.git, the convention for commit subjects is to use
> "<area>: <summary>". Perhaps something like, "git-p4: support Python 3"?
> Although I doubt this patch should remain as is... More below.
I didn't realize the email message from gitgitgadget was going to be the 
commit message, I thought it was the PR message.  I'll work on changing 
that!
>
> On Mon, Dec 02, 2019 at 07:02:16PM +0000, Ben Keene via GitGitGadget wrote:
>> From: Ben Keene <seraphire@gmail.com>
> It would be nice to have a bit more information about what this patch
> does. Could you please fill this in with some more details about the
> whats and, more importantly, the _whys_ of your change?
Sure, I'll add more detail.
>> Signed-off-by: Ben Keene <seraphire@gmail.com>
>> ---
>>   git-p4.py | 825 +++++++++++++++++++++++++++++++++++++++++-------------
>>   1 file changed, 628 insertions(+), 197 deletions(-)
> This is a very big change to be done in one patch. Could you please
> split this into multiple smaller patches that each do one logical
> change? For example, you could have the following series of changes:
>
> 	1. git-p4: use p4.exe if on Windows
> 	2. git-p4: introduce encoding helper functions # this is to
> 	        introduce the as_string(), as_bytes(), etc. functions
> 	3. git-p4: start using the encoding helper functions
> 	...
>
> This was just an example and you don't have to follow those literally. I
> just wanted to give you an idea of what I meant.
>
> You can see Documentation/SubmittingPatches#separate-commits for more
> information.
>
> Thanks,
>
> Denton
So my last question would be, should I open a different PR on 
gitgitgadget? I can cherry-pick my changes into another branch and 
restart my submission?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3 1/1] Python3 support for t9800 tests. Basic P4/Python3 support
  2019-12-03 16:03         ` Ben Keene
@ 2019-12-04  6:14           ` Denton Liu
  0 siblings, 0 replies; 46+ messages in thread
From: Denton Liu @ 2019-12-04  6:14 UTC (permalink / raw)
  To: Ben Keene; +Cc: Ben Keene via GitGitGadget, git, Junio C Hamano

On Tue, Dec 03, 2019 at 11:03:31AM -0500, Ben Keene wrote:
> So my last question would be, should I open a different PR on gitgitgadget?
> I can cherry-pick my changes into another branch and restart my submission?

You can reuse the same PR. Just force-push to overwrite your old commits
and then you'll be able to `/submit` again to send another revision.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3
  2019-12-02 19:02   ` [PATCH v3 0/1] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
  2019-12-02 19:02     ` [PATCH v3 1/1] Python3 support for t9800 tests. Basic P4/Python3 support Ben Keene via GitGitGadget
@ 2019-12-04 22:29     ` Ben Keene via GitGitGadget
  2019-12-04 22:29       ` [PATCH v4 01/11] git-p4: select p4 binary by operating-system Ben Keene via GitGitGadget
                         ` (11 more replies)
  1 sibling, 12 replies; 46+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-04 22:29 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano

Issue: The current git-p4.py script does not work with python3.

I have attempted to use the P4 integration built into GIT and I was unable
to get the program to run because I have Python 3.8 installed on my
computer. I was able to get the program to run when I downgraded my python
to version 2.7. However, python 2 is reaching its end of life.

Submission: I am submitting a patch for the git-p4.py script that partially 
supports python 3.8. This code was able to pass the basic tests (t9800) when
run against Python3. This provides basic functionality. 

In an attempt to pass the t9822 P4 path-encoding test, a new parameter for
git P4 Clone was introduced. 

--encoding Format-identifier

This will create the GIT repository following the current functionality;
however, before importing the files from P4, it will set the
git-p4.pathEncoding option so any files or paths that are encoded with
non-ASCII/non-UTF-8 formats will import correctly.

Technical details: The script was updated by futurize (
https://python-future.org/futurize.html) to support Py2/Py3 syntax. The few
references to classes in future were reworked so that future would not be
required. The existing code test for Unicode support was extended to
normalize the classes “unicode” and “bytes” to across platforms:

 * ‘unicode’ is an alias for ‘str’ in Py3 and is the unicode class in Py2.
 * ‘bytes’ is bytes in Py3 and an alias for ‘str’ in Py2.

New coercion methods were written for both Python2 and Python3:

 * as_string(text) – In Python3, this encodes a bytes object as a UTF-8
   encoded Unicode string. 
 * as_bytes(text) – In Python3, this decodes a Unicode string to an array of
   bytes.

In Python2, these functions do not change the data since a ‘str’ object
function in both roles as strings and byte arrays. This reduces the
potential impact on backward compatibility with Python 2.

 * to_unicode(text) – ensures that the supplied data is encoded as a UTF-8
   string. This function will encode data in both Python2 and Python3. * 
      path_as_string(path) – This function is an extension function that
      honors the option “git-p4.pathEncoding” to convert a set of bytes or
      characters to UTF-8. If the str/bytes cannot decode as ASCII, it will
      use the encodeWithUTF8() method to convert the custom encoded bytes to
      Unicode in UTF-8.
   
   

Generally speaking, information in the script is converted to Unicode as
early as possible and converted back to a byte array just before passing to
external programs or files. The exception to this rule is P4 Repository file
paths.

Paths are not converted but left as “bytes” so the original file path
encoding can be preserved. This formatting is required for commands that
interact with the P4 file path. When the file path is used by GIT, it is
converted with encodeWithUTF8().

Signed-off-by: Ben Keene seraphire@gmail.com [seraphire@gmail.com]

Ben Keene (11):
  git-p4: select p4 binary by operating-system
  git-p4: change the expansion test from basestring to list
  git-p4: add new helper functions for python3 conversion
  git-p4: python3 syntax changes
  git-p4: Add new functions in preparation of usage
  git-p4: Fix assumed path separators to be more Windows friendly
  git-p4: Add a helper class for stream writing
  git-p4: p4CmdList  - support Unicode encoding
  git-p4: Add usability enhancements
  git-p4: Support python3 for basic P4 clone, sync, and submit
  git-p4: Added --encoding parameter to p4 clone

 Documentation/git-p4.txt        |   5 +
 git-p4.py                       | 690 ++++++++++++++++++++++++--------
 t/t9822-git-p4-path-encoding.sh | 101 +++++
 3 files changed, 629 insertions(+), 167 deletions(-)


base-commit: 228f53135a4a41a37b6be8e4d6e2b6153db4a8ed
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-463%2Fseraphire%2Fseraphire%2Fp4-python3-unicode-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-463/seraphire/seraphire/p4-python3-unicode-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/463

Range-diff vs v3:

  -:  ---------- >  1:  4012426993 git-p4: select p4 binary by operating-system
  -:  ---------- >  2:  0ef2f56b04 git-p4: change the expansion test from basestring to list
  -:  ---------- >  3:  f0e658b984 git-p4: add new helper functions for python3 conversion
  -:  ---------- >  4:  3c41db3e91 git-p4: python3 syntax changes
  -:  ---------- >  5:  1bf7b073b0 git-p4: Add new functions in preparation of usage
  -:  ---------- >  6:  8f5752c127 git-p4: Fix assumed path separators to be more Windows friendly
  -:  ---------- >  7:  10dc059444 git-p4: Add a helper class for stream writing
  -:  ---------- >  8:  e1a424a955 git-p4: p4CmdList  - support Unicode encoding
  -:  ---------- >  9:  4fc49313f0 git-p4: Add usability enhancements
  1:  02b3843e9f ! 10:  04a0aedbaa Python3 support for t9800 tests. Basic P4/Python3 support
     @@ -1,159 +1,60 @@
      Author: Ben Keene <seraphire@gmail.com>
      
     -    Python3 support for t9800 tests. Basic P4/Python3 support
     +    git-p4: Support python3 for basic P4 clone, sync, and submit
     +
     +    Issue: Python 3 is still not properly supported for any use with the git-p4 python code.
     +    Warning - this is a very large atomic commit.  The commit text is also very large.
     +
     +    Change the code such that, with the exception of P4 depot paths and depot files, all text read by git-p4 is cast as a string as soon as possible and converted back to bytes as late as possible, following Python2 to Python3 conversion best practices.
     +
     +    Important: Do not cast the bytes that contain the p4 depot path or p4 depot file name.  These should be left as bytes until used.
     +
     +    These two values should not be converted because the encoding of these values is unknown.  git-p4 supports a configuration value git-p4.pathEncoding that is used by the encodeWithUTF8()  to determine what a UTF8 version of the path and filename should be.  However, since depot path and depot filename need to be sent to P4 in their original encoding, they will be left as byte streams until they are actually used:
     +
     +    * When sent to P4, the bytes are literally passed to the p4 command
     +    * When displayed in text for the user, they should be passed through the path_as_string() function
     +    * When used by GIT they should be passed through the encodeWithUTF8() function
     +
     +    Change all the rest of system calls to cast output (stdin) as_bytes() and input (stdout) as_string().  This retains existing Python 2 support, and adds python 3 support for these functions:
     +    * read_pipe_full
     +    * read_pipe_lines
     +    * p4_has_move_command (used internally)
     +    * gitConfig
     +    * branch_exists
     +    * GitLFS.generatePointer
     +    * applyCommit - template must be read and written to the temporary file as_bytes() since it is created in memory as a string.
     +    * streamOneP4File(file, contents) - wrap calls to the depotFile in path_as_string() for display. The file contents must be retained as bytes, so update the RCS changes to be forced to bytes.
     +    * streamP4Files
     +    * importHeadRevision(revision) - encode the depotPaths for display separate from the text for processing.
     +
     +    Py23File usage -
     +    Change the P4Sync.OpenStreams() function to cast the gitOutput, gitStream, and gitError streams as Py23File() wrapper classes.  This facilitates taking strings in both python 2 and python 3 and casting them to bytes in the wrapper class instead of having to modify each method. Since the fast-import command also expects a raw byte stream for file content, add a new stream handle - gitStreamBytes which is an unwrapped verison of gitStream.
     +
     +    Literal text -
     +    Depending on context, most literal text does not need casting to unicode or bytes as the text is Python dependent - In python 2, the string is implied as 'str' and python 3 the string is implied as 'unicode'. Under these conditions, they match the rest of the operating text, following best practices.  However, when a literal string is used in functions that are dealing with the raw input from and raw ouput to files streams, literal bytes may be required. Additionally, functions that are dealing with P4 depot paths or P4 depot file names are also dealing with bytes and will require the same casting as bytes.  The following functions cast text as byte strings:
     +    * wildcard_decode(path) - the path parameter is a P4 depot and is bytes. Cast all the literals to bytes.
     +    * wildcard_encode(path) - the path parameter is a P4 depot and is bytes. Cast all the literals to bytes.
     +    * streamP4FilesCb(marshalled) - the marshalled data is in bytes. Cast the literals as bytes. When using this data to manipulate self.stream_file, encode all the marshalled data except for the 'depotFile' name.
     +    * streamP4Files
     +
     +    Special behavior:
     +    * p4_describe - encoding is disabled for the depotFile(x) and path elements since these are depot path and depo filenames.
     +    * p4PathStartsWith(path, prefix) - Since P4 depot paths can contain non-UTF-8 encoded strings, change this method to compare paths while supporting the optional encoding.
     +       - First, perform a byte-to-byte check to see if the path and prefix are both identical text.  There is no need to perform encoding conversions if the text is identical.
     +       - If the byte check fails, pass both the path and prefix through encodeWithUTF8() to ensure both paths are using the same encoding. Then perform the test as originally written.
     +    * patchRCSKeywords(file, pattern) - the parameters of file and pattern are both strings. However this function changes the contents of the file itentified by name "file". Treat the content of this file as binary to ensure that python does not accidently change the original encoding. The regular expression is cast as_bytes() and run against the file as_bytes(). The P4 keywords are ASCII strings and cannot span lines so iterating over each line of the file is acceptable.
     +    * writeToGitStream(gitMode, relPath, contents) - Since 'contents' is already bytes data, instead of using the self.gitStream, use the new self.gitStreamBytes - the unwrapped gitStream that does not cast as_bytes() the binary data.
     +    * commit(details, files, branch, parent = "", allow_empty=False) - Changed the encoding for the commit message to the preferred format for fast-import. The number of bytes is sent in the data block instead of using the EOT marker.
     +    * Change the code for handling the user cache to use binary files. Cast text as_bytes() when writing to the cache and as_string() when reading from the cache.  This makes the reading and writing of the cache determinstic in it's encoding. Unlike file paths, P4 encodes the user names in UTF-8 encoding so no additional string encoding is required.
      
          Signed-off-by: Ben Keene <seraphire@gmail.com>
     +    (cherry picked from commit 65ff0c74ebe62a200b4385ecfd4aa618ce091f48)
      
       diff --git a/git-p4.py b/git-p4.py
       --- a/git-p4.py
       +++ b/git-p4.py
      @@
     - import zlib
     - import ctypes
     - import errno
     -+import os.path
     -+import codecs
     -+import io
     - 
     - # support basestring in python3
     - try:
     -     unicode = unicode
     - except NameError:
     -     # 'unicode' is undefined, must be Python 3
     --    str = str
     -+    #
     -+    # For Python3 which is natively unicode, we will use 
     -+    # unicode for internal information but all P4 Data
     -+    # will remain in bytes
     -+    isunicode = True
     -     unicode = str
     -     bytes = bytes
     --    basestring = (str,bytes)
     -+
     -+    def as_string(text):
     -+        """Return a byte array as a unicode string"""
     -+        if text == None:
     -+            return None
     -+        if isinstance(text, bytes):
     -+            return unicode(text, "utf-8")
     -+        else:
     -+            return text
     -+
     -+    def as_bytes(text):
     -+        """Return a Unicode string as a byte array"""
     -+        if text == None:
     -+            return None
     -+        if isinstance(text, bytes):
     -+            return text
     -+        else:
     -+            return bytes(text, "utf-8")
     -+
     -+    def to_unicode(text):
     -+        """Return a byte array as a unicode string"""
     -+        return as_string(text)    
     -+
     -+    def path_as_string(path):
     -+        """ Converts a path to the UTF8 encoded string """
     -+        if isinstance(path, unicode):
     -+            return path
     -+        return encodeWithUTF8(path).decode('utf-8')
     -+    
     - else:
     -     # 'unicode' exists, must be Python 2
     --    str = str
     -+    #
     -+    # We will treat the data as:
     -+    #   str   -> str
     -+    #   bytes -> str
     -+    # So for Python2 these functions are no-ops
     -+    # and will leave the data in the ambiguious
     -+    # string/bytes state
     -+    isunicode = False
     -     unicode = unicode
     -     bytes = str
     --    basestring = basestring
     -+
     -+    def as_string(text):
     -+        """ Return text unaltered (for Python3 support) """
     -+        return text
     -+
     -+    def as_bytes(text):
     -+        """ Return text unaltered (for Python3 support) """
     -+        return text
     -+
     -+    def to_unicode(text):
     -+        """Return a string as a unicode string"""
     -+        return text.decode('utf-8')
     -+    
     -+    def path_as_string(path):
     -+        """ Converts a path to the UTF8 encoded bytes """
     -+        return encodeWithUTF8(path)
     -+
     -+
     -+ 
     -+# Check for raw_input support
     -+try:
     -+    raw_input
     -+except NameError:
     -+    raw_input = input
     - 
     - try:
     -     from subprocess import CalledProcessError
     -@@
     -     location. It means that hooking into the environment, or other configuration
     -     can be done more easily.
     -     """
     --    real_cmd = ["p4"]
     -+    # Look for the P4 binary
     -+    if (platform.system() == "Windows"):
     -+        real_cmd = ["p4.exe"]    
     -+    else:
     -+        real_cmd = ["p4"]
     - 
     -     user = gitConfig("git-p4.user")
     -     if len(user) > 0:
     -@@
     -         # Provide a way to not pass this option by setting git-p4.retries to 0
     -         real_cmd += ["-r", str(retries)]
     - 
     --    if isinstance(cmd,basestring):
     -+    if not isinstance(cmd, list):
     -         real_cmd = ' '.join(real_cmd) + ' ' + cmd
     -     else:
     -         real_cmd += cmd
     -@@
     -         sys.exit(1)
     - 
     - def write_pipe(c, stdin):
     -+    """Executes the command 'c', passing 'stdin' on the standard input"""
     -     if verbose:
     -         sys.stderr.write('Writing pipe: %s\n' % str(c))
     - 
     --    expand = isinstance(c,basestring)
     -+    expand = not isinstance(c, list)
     -     p = subprocess.Popen(c, stdin=subprocess.PIPE, shell=expand)
     -     pipe = p.stdin
     -     val = pipe.write(stdin)
     -@@
     -     if p.wait():
     -         die('Command failed: %s' % str(c))
     - 
     --    return val
     - 
     - def p4_write_pipe(c, stdin):
     -+    """ Runs a P4 command 'c', passing 'stdin' data to P4"""
     -     real_cmd = p4_build_cmd(c)
     --    return write_pipe(real_cmd, stdin)
     -+    write_pipe(real_cmd, stdin)
     - 
     - def read_pipe_full(c):
     -     """ Read output from  command. Returns a tuple
     -@@
     -     if verbose:
     -         sys.stderr.write('Reading pipe: %s\n' % str(c))
     - 
     --    expand = isinstance(c,basestring)
     -+    expand = not isinstance(c, list)
     +     expand = not isinstance(c, list)
           p = subprocess.Popen(c, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=expand)
           (out, err) = p.communicate()
      +    out = as_string(out)
     @@ -179,10 +80,7 @@
           if verbose:
               sys.stderr.write('Reading pipe: %s\n' % str(c))
       
     --    expand = isinstance(c, basestring)
     -+    expand = not isinstance(c, list)
     -     p = subprocess.Popen(c, stdout=subprocess.PIPE, shell=expand)
     -     pipe = p.stdout
     +@@
           val = pipe.readlines()
           if pipe.close() or p.wait():
               die('Command failed: %s' % str(c))
     @@ -203,28 +101,6 @@
           # return code will be 1 in either case
           if err.find("Invalid option") >= 0:
               return False
     -@@
     -     return True
     - 
     - def system(cmd, ignore_error=False):
     --    expand = isinstance(cmd,basestring)
     -+    expand = not isinstance(cmd, list)
     -     if verbose:
     -         sys.stderr.write("executing %s\n" % str(cmd))
     -     retcode = subprocess.call(cmd, shell=expand)
     -@@
     -     return retcode
     - 
     - def p4_system(cmd):
     --    """Specifically invoke p4 as the system command. """
     -+    """ Specifically invoke p4 as the system command. 
     -+    """
     -     real_cmd = p4_build_cmd(cmd)
     --    expand = isinstance(real_cmd, basestring)
     -+    expand = not isinstance(real_cmd, list)
     -     retcode = subprocess.call(real_cmd, shell=expand)
     -     if retcode:
     -         raise CalledProcessError(retcode, real_cmd)
      @@
           return int(results[0]['change'])
       
     @@ -234,7 +110,7 @@
      -       results."""
      +    """ Returns information about the requested P4 change list.
      +
     -+        Data returns is not string encoded (returned as bytes)
     ++        Data returned is not string encoded (returned as bytes)
      +    """
      +    # Make sure it returns a valid result by checking for
      +    #   the presence of field "time".  Return a dict of the
     @@ -261,218 +137,29 @@
           if "time" not in d:
               die("p4 describe -s %d returned no \"time\": %s" % (change, str(d)))
       
     -+    # Convert depotFile(X) to be UTF-8 encoded, as this is what GIT
     -+    # requires. This will also allow us to encode the rest of the text
     -+    # at the same time to simplify textual processing later.
     ++    # Do not convert 'depotFile(X)' or 'path' to be UTF-8 encoded, however 
     ++    # cast as_string() the rest of the text. 
      +    keys=d.keys()
      +    for key in keys:
      +        if key.startswith('depotFile'):
     -+            d[key]=d[key] #DepotPath(d[key])
     ++            d[key]=d[key] 
      +        elif key == 'path':
     -+            d[key]=d[key] #DepotPath(d[key])
     ++            d[key]=d[key] 
      +        else:
      +            d[key] = as_string(d[key])
      +
           return d
       
     --#
     --# Canonicalize the p4 type and return a tuple of the
     --# base type, plus any modifiers.  See "p4 help filetypes"
     --# for a list and explanation.
     --#
     - def split_p4_type(p4type):
     --
     -+    """ Canonicalize the p4 type and return a tuple of the
     -+        base type, plus any modifiers.  See "p4 help filetypes"
     -+        for a list and explanation.
     -+    """
     -     p4_filetypes_historical = {
     -         "ctempobj": "binary+Sw",
     -         "ctext": "text+C",
     -@@
     -         mods = s[1]
     -     return (base, mods)
     - 
     --#
     --# return the raw p4 type of a file (text, text+ko, etc)
     --#
     - def p4_type(f):
     -+    """ return the raw p4 type of a file (text, text+ko, etc)
     -+    """
     -     results = p4CmdList(["fstat", "-T", "headType", wildcard_encode(f)])
     -     return results[0]['headType']
     - 
     --#
     --# Given a type base and modifier, return a regexp matching
     --# the keywords that can be expanded in the file
     --#
     - def p4_keywords_regexp_for_type(base, type_mods):
     -+    """ Given a type base and modifier, return a regexp matching
     -+        the keywords that can be expanded in the file
     -+    """
     -     if base in ("text", "unicode", "binary"):
     -         kwords = None
     -         if "ko" in type_mods:
     -@@
     -     else:
     -         return None
     - 
     --#
     --# Given a file, return a regexp matching the possible
     --# RCS keywords that will be expanded, or None for files
     --# with kw expansion turned off.
     --#
     - def p4_keywords_regexp_for_file(file):
     -+    """ Given a file, return a regexp matching the possible
     -+        RCS keywords that will be expanded, or None for files
     -+        with kw expansion turned off.
     -+    """
     -     if not os.path.exists(file):
     -         return None
     -     else:
     -@@
     - # Return the set of all p4 labels
     - def getP4Labels(depotPaths):
     -     labels = set()
     --    if isinstance(depotPaths,basestring):
     -+    if not isinstance(depotPaths, list):
     -         depotPaths = [depotPaths]
     - 
     -     for l in p4CmdList(["labels"] + ["%s..." % p for p in depotPaths]):
     -@@
     - 
     -     return labels
     - 
     --# Return the set of all git tags
     - def getGitTags():
     -+    """Return the set of all git tags"""
     -     gitTags = set()
     -     for line in read_pipe_lines(["git", "tag"]):
     -         tag = line.strip()
     -@@
     - 
     -     If the pattern is not matched, None is returned."""
     - 
     --    match = diffTreePattern().next().match(entry)
     -+    match = next(diffTreePattern()).match(entry)
     -     if match:
     -         return {
     -             'src_mode': match.group(1),
     -@@
     -     # otherwise False.
     -     return mode[-3:] == "755"
     - 
     -+def encodeWithUTF8(path, verbose = False):
     -+    """ Ensure that the path is encoded as a UTF-8 string
     -+
     -+        Returns bytes(P3)/str(P2)
     -+    """
     -+   
     -+    if isunicode:
     -+        try:
     -+            if isinstance(path, unicode):
     -+                # It is already unicode, cast it as a bytes
     -+                # that is encoded as utf-8.
     -+                return path.encode('utf-8', 'strict')
     -+            path.decode('ascii', 'strict')
     -+        except:
     -+            encoding = 'utf8'
     -+            if gitConfig('git-p4.pathEncoding'):
     -+                encoding = gitConfig('git-p4.pathEncoding')
     -+            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
     -+            if verbose:
     -+                print('\nNOTE:Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, to_unicode(path)))
     -+    else:    
     -+        try:
     -+            path.decode('ascii')
     -+        except:
     -+            encoding = 'utf8'
     -+            if gitConfig('git-p4.pathEncoding'):
     -+                encoding = gitConfig('git-p4.pathEncoding')
     -+            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
     -+            if verbose:
     -+                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
     -+    return path
     -+
     - class P4Exception(Exception):
     -     """ Base class for exceptions from the p4 client """
     -     def __init__(self, exit_code):
     -@@
     -     return isModeExec(src_mode) != isModeExec(dst_mode)
     - 
     - def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
     --        errors_as_exceptions=False):
     -+        errors_as_exceptions=False, encode_data=True):
     -+    """ Executes a P4 command:  'cmd' optionally passing 'stdin' to the command's
     -+        standard input via a temporary file with 'stdin_mode' mode.
     -+
     -+        Output from the command is optionally passed to the callback function 'cb'.
     -+        If 'cb' is None, the response from the command is parsed into a list
     -+        of resulting dictionaries. (For each block read from the process pipe.)
     -+
     -+        If 'skip_info' is true, information in a block read that has a code type of
     -+        'info' will be skipped.
     - 
     --    if isinstance(cmd,basestring):
     -+        If 'errors_as_exceptions' is set to true (the default is false) the error
     -+        code returned from the execution will generate an exception.
     -+
     -+        If 'encode_data' is set to true (the default) the data that is returned 
     -+        by this function will be passed through the "as_string" function.
     -+    """
     -+
     -+    if not isinstance(cmd, list):
     -         cmd = "-G " + cmd
     -         expand = True
     -     else:
     -@@
     -     stdin_file = None
     -     if stdin is not None:
     -         stdin_file = tempfile.TemporaryFile(prefix='p4-stdin', mode=stdin_mode)
     --        if isinstance(stdin,basestring):
     -+        if not isinstance(stdin, list):
     -             stdin_file.write(stdin)
     -         else:
     -             for i in stdin:
     --                stdin_file.write(i + '\n')
     -+                stdin_file.write(as_bytes(i) + b'\n')
     -         stdin_file.flush()
     -         stdin_file.seek(0)
     - 
     -@@
     -         while True:
     -             entry = marshal.load(p4.stdout)
     -             if skip_info:
     --                if 'code' in entry and entry['code'] == 'info':
     -+                if b'code' in entry and entry[b'code'] == b'info':
     -                     continue
     -             if cb is not None:
     -                 cb(entry)
     -             else:
     --                result.append(entry)
     -+                out = {}
     -+                for key, value in entry.items():
     -+                    out[as_string(key)] = (as_string(value) if encode_data else value)
     -+                result.append(out)
     -     except EOFError:
     -         pass
     -     exitCode = p4.wait()
     + #
      @@
           return result
       
       def p4Cmd(cmd):
     -+    """ Executes a P4 command an returns the results in a dictionary"""
     ++    """ Executes a P4 command and returns the results in a dictionary
     ++    """
           list = p4CmdList(cmd)
           result = {}
           for entry in list:
     -@@
     -     return values
     - 
     - def gitBranchExists(branch):
     -+    """Checks to see if a given branch exists in the git repo"""
     -     proc = subprocess.Popen(["git", "rev-parse", branch],
     -                             stderr=subprocess.PIPE, stdout=subprocess.PIPE);
     -     return proc.wait() == 0;
      @@
       _gitConfig = {}
       
     @@ -490,29 +177,6 @@
           return _gitConfig[key]
       
       def gitConfigBool(key):
     --    """Return a bool, using git config --bool.  It is True only if the
     --       variable is set to true, and False if set to false or not present
     --       in the config."""
     --
     -+    """ Return a bool, using git config --bool.  It is True only if the
     -+        variable is set to true, and False if set to false or not present
     -+        in the config.
     -+    """
     -     if key not in _gitConfig:
     -         _gitConfig[key] = gitConfig(key, '--bool') == "true"
     -     return _gitConfig[key]
     -@@
     -             _gitConfig[key] = []
     -     return _gitConfig[key]
     - 
     -+def gitConfigSet(key, value):
     -+    """ Set the git configuration key 'key' to 'value' for this session
     -+    """
     -+    _gitConfig[key] = value
     -+
     - def p4BranchesInGit(branchesAreInRemotes=True):
     -     """Find all the branches whose names start with "p4/", looking
     -        in remotes or heads as specified by the argument.  Return
      @@
           cmd = [ "git", "rev-parse", "--symbolic", "--verify", branch ]
           p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     @@ -521,34 +185,6 @@
           if p.returncode:
               return False
           # expect exactly one line of output: the branch name
     -@@
     -     branches = p4BranchesInGit()
     -     # map from depot-path to branch name
     -     branchByDepotPath = {}
     --    for branch in branches.keys():
     -+    for branch in list(branches.keys()):
     -         tip = branches[branch]
     -         log = extractLogMessageFromGitCommit(tip)
     -         settings = extractSettingsGitLog(log)
     -@@
     -             system("git update-ref %s %s" % (remoteHead, originHead))
     - 
     - def originP4BranchesExist():
     --        return gitBranchExists("origin") or gitBranchExists("origin/p4") or gitBranchExists("origin/p4/master")
     -+    """Checks if origin/p4/master exists"""
     -+    return gitBranchExists("origin") or gitBranchExists("origin/p4") or gitBranchExists("origin/p4/master")
     - 
     - 
     - def p4ParseNumericChangeRange(parts):
     -@@
     -     changes = sorted(changes)
     -     return changes
     - 
     --def p4PathStartsWith(path, prefix):
     -+def p4PathStartsWith(path, prefix, verbose = False):
     -     # This method tries to remedy a potential mixed-case issue:
     -     #
     -     # If UserA adds  //depot/DirA/file1
      @@
           #
           # we may or may not have a problem. If you have core.ignorecase=true,
     @@ -574,15 +210,6 @@
       
       def getClientSpec():
           """Look at the p4 client spec, create a View() object that contains
     -@@
     -     client_name = entry["Client"]
     - 
     -     # just the keys that start with "View"
     --    view_keys = [ k for k in entry.keys() if k.startswith("View") ]
     -+    view_keys = [ k for k in list(entry.keys()) if k.startswith("View") ]
     - 
     -     # hold this new View
     -     view = View(client_name)
      @@
           # Cannot have * in a filename in windows; untested as to
           # what p4 would do in such a case.
     @@ -626,45 +253,16 @@
                   os.remove(contentFile)
                   die('git-lfs pointer command failed. Did you install the extension?')
      @@
     -         else:
     -             return LargeFileSystem.processContent(self, git_mode, relPath, contents)
     - 
     --class Command:
     -+class Command(object):
     -     delete_actions = ( "delete", "move/delete", "purge" )
     -     add_actions = ( "add", "branch", "move/add" )
     - 
     -@@
     -             setattr(self, attr, value)
     -         return getattr(self, attr)
     - 
     --class P4UserMap:
     -+class P4UserMap(object):
     -     def __init__(self):
     -         self.userMapFromPerforceServer = False
     -         self.myP4UserId = None
     -@@
     -             return True
     - 
     -     def getUserCacheFilename(self):
     -+        """ Returns the filename of the username cache """
     -         home = os.environ.get("HOME", os.environ.get("USERPROFILE"))
     --        return home + "/.gitp4-usercache.txt"
     -+        return os.path.join(home, ".gitp4-usercache.txt")
     +         return os.path.join(home, ".gitp4-usercache.txt")
       
           def getUserMapFromPerforceServer(self):
      +        """ Creates the usercache from the data in P4.
      +        """
     -+        
               if self.userMapFromPerforceServer:
                   return
               self.users = {}
      @@
     -                 self.emails[email] = user
     - 
     -         s = ''
     --        for (key, val) in self.users.items():
     -+        for (key, val) in list(self.users.items()):
     +         for (key, val) in list(self.users.items()):
                   s += "%s\t%s\n" % (key.expandtabs(1), val.expandtabs(1))
       
      -        open(self.getUserCacheFilename(), "wb").write(s)
     @@ -674,7 +272,8 @@
               self.userMapFromPerforceServer = True
       
           def loadUserMapFromCache(self):
     -+        """ Reads the P4 username to git email map """
     ++        """ Reads the P4 username to git email map 
     ++        """
               self.users = {}
               self.userMapFromPerforceServer = False
               try:
     @@ -721,80 +320,6 @@
                   # cleanup our temporary file
                   os.unlink(outFileName)
                   print("Failed to strip RCS keywords in %s" % file)
     -@@
     -                 break
     -         if not change_entry:
     -             die('Failed to decode output of p4 change -o')
     --        for key, value in change_entry.iteritems():
     -+        for key, value in list(change_entry.items()):
     -             if key.startswith('File'):
     -                 if 'depot-paths' in settings:
     -                     if not [p for p in settings['depot-paths']
     --                            if p4PathStartsWith(value, p)]:
     -+                            if p4PathStartsWith(value, p, self.verbose)]:
     -                         continue
     -                 else:
     --                    if not p4PathStartsWith(value, self.depotPath):
     -+                    if not p4PathStartsWith(value, self.depotPath, self.verbose):
     -                         continue
     -                 files_list.append(value)
     -                 continue
     -@@
     -             return True
     - 
     -         while True:
     --            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ")
     -+            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ").lower() \
     -+                .strip()[0]
     -             if response == 'y':
     -                 return True
     -             if response == 'n':
     -@@
     -     def applyCommit(self, id):
     -         """Apply one commit, return True if it succeeded."""
     - 
     --        print("Applying", read_pipe(["git", "show", "-s",
     --                                     "--format=format:%h %s", id]))
     -+        print(("Applying", read_pipe(["git", "show", "-s",
     -+                                     "--format=format:%h %s", id])))
     - 
     -         (p4User, gitEmail) = self.p4UserForCommit(id)
     - 
     -@@
     -                     # disable the read-only bit on windows.
     -                     if self.isWindows and file not in editedFiles:
     -                         os.chmod(file, stat.S_IWRITE)
     --                    self.patchRCSKeywords(file, kwfiles[file])
     --                    fixed_rcs_keywords = True
     -+                    
     -+                    try:
     -+                        self.patchRCSKeywords(file, kwfiles[file])
     -+                        fixed_rcs_keywords = True
     -+                    except:
     -+                        # We are throwing an exception, undo all open edits
     -+                        for f in editedFiles:
     -+                            p4_revert(f)
     -+                        raise
     -+            else:
     -+                # They do not have attemptRCSCleanup set, this might be the fail point
     -+                # Check to see if the file has RCS keywords and suggest setting the property.
     -+                for file in editedFiles | filesToDelete:
     -+                    if p4_keywords_regexp_for_file(file) != None:
     -+                        print("At least one file in this commit has RCS Keywords that may be causing problems. ")
     -+                        print("Consider:\ngit config git-p4.attemptRCSCleanup true")
     -+                        break
     - 
     -             if fixed_rcs_keywords:
     -                 print("Retrying the patch with RCS keywords cleaned up")
     -@@
     -             p4_delete(f)
     - 
     -         # Set/clear executable bits
     --        for f in filesToChangeExecBit.keys():
     -+        for f in list(filesToChangeExecBit.keys()):
     -             mode = filesToChangeExecBit[f]
     -             setP4ExecBit(f, mode)
     - 
      @@
               tmpFile = os.fdopen(handle, "w+b")
               if self.isWindows:
     @@ -815,179 +340,6 @@
       
                       if update_shelve:
                           p4_write_pipe(['shelve', '-r', '-i'], submitTemplate)
     -@@
     -                 if verbose:
     -                     print("created p4 label for tag %s" % name)
     - 
     -+    def run_hook(self, hook_name, args = []):
     -+        """ Runs a hook if it is found.
     -+
     -+            Returns NONE if the hook does not exist
     -+            Returns TRUE if the exit code is 0, FALSE for a non-zero exit code.
     -+        """
     -+        hook_file = self.find_hook(hook_name)
     -+        if hook_file == None:
     -+            if self.verbose:
     -+                print("Skipping hook: %s" % hook_name)
     -+            return None
     -+
     -+        if self.verbose:
     -+            print("hooks_path = %s " % hooks_path)
     -+            print("hook_file = %s " % hook_file)
     -+
     -+        # Run the hook
     -+        # TODO - allow non-list format
     -+        cmd = [hook_file] + args
     -+        return subprocess.call(cmd) == 0
     -+
     -+    def find_hook(self, hook_name):
     -+        """ Locates the hook file for the given operating system.
     -+        """
     -+        hooks_path = gitConfig("core.hooksPath")
     -+        if len(hooks_path) <= 0:
     -+            hooks_path = os.path.join(os.environ.get("GIT_DIR", ".git"), "hooks")
     -+
     -+        # Look in the obvious place
     -+        hook_file = os.path.join(hooks_path, hook_name)
     -+        if os.path.isfile(hook_file) and os.access(hook_file, os.X_OK):
     -+            return hook_file
     -+
     -+        # if we are windows, we will also allow them to have the hooks have extensions
     -+        if (platform.system() == "Windows"):
     -+            for ext in ['.exe', '.bat', 'ps1']:
     -+                if os.path.isfile(hook_file + ext) and os.access(hook_file + ext, os.X_OK):
     -+                    return hook_file + ext
     -+
     -+        # We didn't find the file
     -+        return None
     -+
     -+
     -+
     -     def run(self, args):
     -         if len(args) == 0:
     -             self.master = currentGitBranch()
     -@@
     -             self.clientSpecDirs = getClientSpec()
     - 
     -         # Check for the existence of P4 branches
     --        branchesDetected = (len(p4BranchesInGit().keys()) > 1)
     -+        branchesDetected = (len(list(p4BranchesInGit().keys())) > 1)
     - 
     -         if self.useClientSpec and not branchesDetected:
     -             # all files are relative to the client spec
     -@@
     -             sys.exit("number of commits (%d) must match number of shelved changelist (%d)" %
     -                      (len(commits), num_shelves))
     - 
     --        hooks_path = gitConfig("core.hooksPath")
     --        if len(hooks_path) <= 0:
     --            hooks_path = os.path.join(os.environ.get("GIT_DIR", ".git"), "hooks")
     --
     --        hook_file = os.path.join(hooks_path, "p4-pre-submit")
     --        if os.path.isfile(hook_file) and os.access(hook_file, os.X_OK) and subprocess.call([hook_file]) != 0:
     -+        rtn = self.run_hook("p4-pre-submit")
     -+        if rtn == False:
     -             sys.exit(1)
     - 
     -         #
     -@@
     -         last = len(commits) - 1
     -         for i, commit in enumerate(commits):
     -             if self.dry_run:
     --                print(" ", read_pipe(["git", "show", "-s",
     --                                      "--format=format:%h %s", commit]))
     -+                print((" ", read_pipe(["git", "show", "-s",
     -+                                      "--format=format:%h %s", commit])))
     -                 ok = True
     -             else:
     -                 ok = self.applyCommit(commit)
     -@@
     -                         if self.conflict_behavior == "ask":
     -                             print("What do you want to do?")
     -                             response = raw_input("[s]kip this commit but apply"
     --                                                 " the rest, or [q]uit? ")
     -+                                                 " the rest, or [q]uit? ").lower().strip()[0]
     -                             if not response:
     -                                 continue
     -                         elif self.conflict_behavior == "skip":
     -@@
     -                         star = "*"
     -                     else:
     -                         star = " "
     --                    print(star, read_pipe(["git", "show", "-s",
     --                                           "--format=format:%h %s",  c]))
     -+                    print((star, read_pipe(["git", "show", "-s",
     -+                                           "--format=format:%h %s",  c])))
     -                 print("You will have to do 'git p4 sync' and rebase.")
     - 
     -         if gitConfigBool("git-p4.exportLabels"):
     -@@
     -     # ("-//depot/A/..." becomes "/depot/A/..." after option parsing)
     -     parser.values.cloneExclude += ["/" + re.sub(r"\.\.\.$", "", value)]
     - 
     -+
     - class P4Sync(Command, P4UserMap):
     - 
     -     def __init__(self):
     -@@
     -         self.knownBranches = {}
     -         self.initialParents = {}
     - 
     --        self.tz = "%+03d%02d" % (- time.timezone / 3600, ((- time.timezone % 3600) / 60))
     -+        self.tz = "%+03d%02d" % (- time.timezone // 3600, ((- time.timezone % 3600) // 60))
     -         self.labels = {}
     - 
     -     # Force a checkpoint in fast-import and wait for it to finish
     -@@
     -     def isPathWanted(self, path):
     -         for p in self.cloneExclude:
     -             if p.endswith("/"):
     --                if p4PathStartsWith(path, p):
     -+                if p4PathStartsWith(path, p, self.verbose):
     -                     return False
     -             # "-//depot/file1" without a trailing "/" should only exclude "file1", but not "file111" or "file1_dir/file2"
     -             elif path.lower() == p.lower():
     -                 return False
     -         for p in self.depotPaths:
     --            if p4PathStartsWith(path, p):
     -+            if p4PathStartsWith(path, p, self.verbose):
     -                 return True
     -         return False
     - 
     -     def extractFilesFromCommit(self, commit, shelved=False, shelved_cl = 0):
     -+        """ Generates the list of files to be added in this git commit.
     -+
     -+            commit     = Unicode[] - data read from the P4 commit
     -+            shelved    = Bool      - Is the P4 commit flagged as being shelved.
     -+            shelved_cl = Unicode   - Numeric string with the changelist number.
     -+        """
     -         files = []
     -         fnum = 0
     -         while "depotFile%s" % fnum in commit:
     -@@
     -             path = self.clientSpecDirs.map_in_client(path)
     -             if self.detectBranches:
     -                 for b in self.knownBranches:
     --                    if p4PathStartsWith(path, b + "/"):
     -+                    if p4PathStartsWith(path, b + "/", self.verbose):
     -                         path = path[len(b)+1:]
     - 
     -         elif self.keepRepoPath:
     -@@
     -             # //depot/; just look at first prefix as they all should
     -             # be in the same depot.
     -             depot = re.sub("^(//[^/]+/).*", r'\1', prefixes[0])
     --            if p4PathStartsWith(path, depot):
     -+            if p4PathStartsWith(path, depot, self.verbose):
     -                 path = path[len(depot):]
     - 
     -         else:
     -             for p in prefixes:
     --                if p4PathStartsWith(path, p):
     -+                if p4PathStartsWith(path, p, self.verbose):
     -                     path = path[len(p):]
     -                     break
     - 
      @@
               return path
       
     @@ -1002,19 +354,6 @@
       
               if self.clientSpecDirs:
                   files = self.extractFilesFromCommit(commit)
     -@@
     -             else:
     -                 relPath = self.stripRepoPath(path, self.depotPaths)
     - 
     --            for branch in self.knownBranches.keys():
     -+            for branch in list(self.knownBranches.keys()):
     -                 # add a trailing slash so that a commit into qt/4.2foo
     -                 # doesn't end up in qt/4.2, e.g.
     --                if p4PathStartsWith(relPath, branch + "/"):
     -+                if p4PathStartsWith(relPath, branch + "/", self.verbose):
     -                     if branch not in branches:
     -                         branches[branch] = []
     -                     branches[branch].append(file)
      @@
               return branches
       
     @@ -1031,18 +370,6 @@
      +            self.gitStreamBytes.write(d)
               self.gitStream.write('\n')
       
     --    def encodeWithUTF8(self, path):
     --        try:
     --            path.decode('ascii')
     --        except:
     --            encoding = 'utf8'
     --            if gitConfig('git-p4.pathEncoding'):
     --                encoding = gitConfig('git-p4.pathEncoding')
     --            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
     --            if self.verbose:
     --                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
     --        return path
     --
      -    # output one file from the P4 stream
      -    # - helper for streamP4Files
      -
     @@ -1053,18 +380,13 @@
      +            contents should be a bytes (bytes) 
      +        """
               relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
     --        relPath = self.encodeWithUTF8(relPath)
     -+        relPath = encodeWithUTF8(relPath, self.verbose)
     +         relPath = encodeWithUTF8(relPath, self.verbose)
               if verbose:
     -             if 'fileSize' in self.stream_file:
     +@@
                       size = int(self.stream_file['fileSize'])
                   else:
                       size = 0 # deleted files don't get a fileSize apparently
     --            sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], relPath, size/1024/1024))
     -+            #if isunicode:
     -+            #    sys.stdout.write('\r%s --> %s (%i MB)\n' % (path_as_string(file['depotFile']), to_unicode(relPath), size//1024//1024))
     -+            #else:
     -+            #    sys.stdout.write('\r%s --> %s (%i MB)\n' % (path_as_string(file['depotFile']), relPath, size//1024//1024))
     +-            sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], relPath, size//1024//1024))
      +            sys.stdout.write('\r%s --> %s (%i MB)\n' % (path_as_string(file['depotFile']), as_string(relPath), size//1024//1024))
                   sys.stdout.flush()
       
     @@ -1100,15 +422,6 @@
       
               if self.largeFileSystem:
      @@
     - 
     -     def streamOneP4Deletion(self, file):
     -         relPath = self.stripRepoPath(file['path'], self.branchPrefixes)
     --        relPath = self.encodeWithUTF8(relPath)
     -+        relPath = encodeWithUTF8(relPath, self.verbose)
     -         if verbose:
     -             sys.stdout.write("delete %s\n" % relPath)
     -             sys.stdout.flush()
     -@@
               if self.largeFileSystem and self.largeFileSystem.isLargeFile(relPath):
                   self.largeFileSystem.removeLargeFile(relPath)
       
     @@ -1133,13 +446,6 @@
       
               if not err and 'fileSize' in self.stream_file:
                   required_bytes = int((4 * int(self.stream_file["fileSize"])) - calcDiskFree())
     -             if required_bytes > 0:
     -                 err = 'Not enough space left on %s! Free at least %i MB.' % (
     --                    os.getcwd(), required_bytes/1024/1024
     -+                    os.getcwd(), required_bytes//1024//1024
     -                 )
     - 
     -         if err:
      @@
                   # ignore errors, but make sure it exits first
                   self.importProcess.wait()
     @@ -1155,12 +461,10 @@
                   self.streamOneP4File(self.stream_file, self.stream_contents)
                   self.stream_file = {}
      @@
     - 
               # pick up the new file information... for the
               # 'data' field we need to append to our array
     --        for k in marshalled.keys():
     +         for k in list(marshalled.keys()):
      -            if k == 'data':
     -+        for k in list(marshalled.keys()):
      +            if k == b'data':
                       if 'streamContentSize' not in self.stream_file:
                           self.stream_file['streamContentSize'] = 0
     @@ -1178,12 +482,10 @@
               if (verbose and
                   'streamContentSize' in self.stream_file and
      @@
     -             'depotFile' in self.stream_file):
                   size = int(self.stream_file["fileSize"])
                   if size > 0:
     --                progress = 100*self.stream_file['streamContentSize']/size
     --                sys.stdout.write('\r%s %d%% (%i MB)' % (self.stream_file['depotFile'], progress, int(size/1024/1024)))
     -+                progress = 100.0*self.stream_file['streamContentSize']/size
     +                 progress = 100.0*self.stream_file['streamContentSize']/size
     +-                sys.stdout.write('\r%s %4.1f%% (%i MB)' % (self.stream_file['depotFile'], progress, int(size//1024//1024)))
      +                sys.stdout.write('\r%s %4.1f%% (%i MB)' % (path_as_string(self.stream_file['depotFile']), progress, int(size//1024//1024)))
                       sys.stdout.flush()
       
     @@ -1227,24 +529,6 @@
       
               if verbose:
      @@
     - 
     -         gitStream.write("tagger %s\n" % tagger)
     - 
     --        print("labelDetails=",labelDetails)
     -+        print(("labelDetails=",labelDetails))
     -         if 'Description' in labelDetails:
     -             description = labelDetails['Description']
     -         else:
     -@@
     -         if not self.branchPrefixes:
     -             return True
     -         hasPrefix = [p for p in self.branchPrefixes
     --                        if p4PathStartsWith(path, p)]
     -+                        if p4PathStartsWith(path, p, self.verbose)]
     -         if not hasPrefix and self.verbose:
     -             print('Ignoring file outside of prefix: {0}'.format(path))
     -         return hasPrefix
     -@@
                       .format(details['change']))
                   return
       
     @@ -1307,58 +591,6 @@
       
               if len(parent) > 0:
                   if self.verbose:
     -@@
     -             self.labels[newestChange] = [output, revisions]
     - 
     -         if self.verbose:
     --            print("Label changes: %s" % self.labels.keys())
     -+            print("Label changes: %s" % list(self.labels.keys()))
     - 
     -     # Import p4 labels as git tags. A direct mapping does not
     -     # exist, so assume that if all the files are at the same revision
     -@@
     -                 source = paths[0]
     -                 destination = paths[1]
     -                 ## HACK
     --                if p4PathStartsWith(source, self.depotPaths[0]) and p4PathStartsWith(destination, self.depotPaths[0]):
     -+                if p4PathStartsWith(source, self.depotPaths[0], self.verbose) and p4PathStartsWith(destination, self.depotPaths[0], self.verbose):
     -                     source = source[len(self.depotPaths[0]):-4]
     -                     destination = destination[len(self.depotPaths[0]):-4]
     - 
     -@@
     - 
     -     def getBranchMappingFromGitBranches(self):
     -         branches = p4BranchesInGit(self.importIntoRemotes)
     --        for branch in branches.keys():
     -+        for branch in list(branches.keys()):
     -             if branch == "master":
     -                 branch = "main"
     -             else:
     -@@
     -             self.updateOptionDict(description)
     - 
     -             if not self.silent:
     --                sys.stdout.write("\rImporting revision %s (%s%%)" % (change, cnt * 100 / len(changes)))
     -+                sys.stdout.write("\rImporting revision %s (%4.1f%%)" % (change, cnt * 100 / len(changes)))
     -                 sys.stdout.flush()
     -             cnt = cnt + 1
     - 
     -             try:
     -                 if self.detectBranches:
     -                     branches = self.splitFilesIntoBranches(description)
     --                    for branch in branches.keys():
     -+                    for branch in list(branches.keys()):
     -                         ## HACK  --hwn
     -                         branchPrefix = self.depotPaths[0] + branch + "/"
     -                         self.branchPrefixes = [ branchPrefix ]
     -@@
     -                 sys.exit(1)
     - 
     -     def sync_origin_only(self):
     -+        """ Ensures that the origin has been synchronized if one is set """
     -         if self.syncWithOrigin:
     -             self.hasOrigin = originP4BranchesExist()
     -             if self.hasOrigin:
      @@
                       system("git fetch origin")
       
     @@ -1439,61 +671,6 @@
           def closeStreams(self):
               self.gitStream.close()
      @@
     -                 if short in branches:
     -                     self.p4BranchesInGit = [ short ]
     -             else:
     --                self.p4BranchesInGit = branches.keys()
     -+                self.p4BranchesInGit = list(branches.keys())
     - 
     -             if len(self.p4BranchesInGit) > 1:
     -                 if not self.silent:
     -                     print("Importing from/into multiple branches")
     -                 self.detectBranches = True
     --                for branch in branches.keys():
     -+                for branch in list(branches.keys()):
     -                     self.initialParents[self.refPrefix + branch] = \
     -                         branches[branch]
     - 
     -@@
     -                                  help="where to leave result of the clone"),
     -             optparse.make_option("--bare", dest="cloneBare",
     -                                  action="store_true", default=False),
     -+            optparse.make_option("--encoding", dest="setPathEncoding",
     -+                                 action="store", default=None,
     -+                                 help="Sets the path encoding for this depot")
     -         ]
     -         self.cloneDestination = None
     -         self.needsGit = False
     -         self.cloneBare = False
     -+        self.setPathEncoding = None
     - 
     -     def defaultDestination(self, args):
     -+        """Returns the last path component as the default git 
     -+        repository directory name"""
     -         ## TODO: use common prefix of args?
     -         depotPath = args[0]
     -         depotDir = re.sub("(@[^@]*)$", "", depotPath)
     -         depotDir = re.sub("(#[^#]*)$", "", depotDir)
     -         depotDir = re.sub(r"\.\.\.$", "", depotDir)
     -         depotDir = re.sub(r"/$", "", depotDir)
     --        return os.path.split(depotDir)[1]
     -+        return depotDir.split('/')[-1]
     - 
     -     def run(self, args):
     -         if len(args) < 1:
     -@@
     - 
     -         depotPaths = args
     - 
     -+        # If we have an encoding provided, ignore what may already exist
     -+        # in the registry. This will ensure we show the displayed values
     -+        # using the correct encoding.
     -+        if self.setPathEncoding:
     -+            gitConfigSet("git-p4.pathEncoding", self.setPathEncoding)
     -+
     -+        # If more than 1 path element is supplied, the last element
     -+        # is the clone destination.
     -         if not self.cloneDestination and len(depotPaths) > 1:
                   self.cloneDestination = depotPaths[-1]
                   depotPaths = depotPaths[:-1]
       
     @@ -1512,177 +689,3 @@
       
               if not os.path.exists(self.cloneDestination):
                   os.makedirs(self.cloneDestination)
     -@@
     -         if retcode:
     -             raise CalledProcessError(retcode, init_cmd)
     - 
     -+        # Set the encoding if it was provided command line
     -+        if self.setPathEncoding:
     -+            init_cmd= ["git", "config", "git-p4.pathEncoding", self.setPathEncoding]
     -+            retcode = subprocess.call(init_cmd)
     -+            if retcode:
     -+                raise CalledProcessError(retcode, init_cmd)
     -+
     -         if not P4Sync.run(self, depotPaths):
     -             return False
     - 
     -@@
     -             to find the P4 commit we are based on, and the depot-paths.
     -         """
     - 
     --        for parent in (range(65535)):
     -+        for parent in (list(range(65535))):
     -             log = extractLogMessageFromGitCommit("{0}^{1}".format(starting_point, parent))
     -             settings = extractSettingsGitLog(log)
     -             if 'change' in settings:
     -@@
     -             print("%s <= %s (%s)" % (branch, ",".join(settings["depot-paths"]), settings["change"]))
     -         return True
     - 
     -+class Py23File():
     -+    """ Python2/3 Unicode File Wrapper 
     -+    """
     -+    
     -+    stream_handle = None
     -+    verbose       = False
     -+    debug_handle  = None
     -+   
     -+    def __init__(self, stream_handle, verbose = False,
     -+                 debug_handle = None):
     -+        """ Create a Python3 compliant Unicode to Byte String
     -+            Windows compatible wrapper
     -+
     -+            stream_handle = the underlying file-like handle
     -+            verbose       = Boolean if content should be echoed
     -+            debug_handle  = A file-like handle data is duplicately written to
     -+        """
     -+        self.stream_handle = stream_handle
     -+        self.verbose       = verbose
     -+        self.debug_handle  = debug_handle
     -+
     -+    def write(self, utf8string):
     -+        """ Writes the utf8 encoded string to the underlying 
     -+            file stream
     -+        """
     -+        self.stream_handle.write(as_bytes(utf8string))
     -+        if self.verbose:
     -+            sys.stderr.write("Stream Output: %s" % utf8string)
     -+            sys.stderr.flush()
     -+        if self.debug_handle:
     -+            self.debug_handle.write(as_bytes(utf8string))
     -+
     -+    def read(self, size = None):
     -+        """ Reads int charcters from the underlying stream 
     -+            and converts it to utf8.
     -+
     -+            Be aware, the size value is for reading the underlying
     -+            bytes so the value may be incorrect. Usage of the size
     -+            value is discouraged.
     -+        """
     -+        if size == None:
     -+            return as_string(self.stream_handle.read())
     -+        else:
     -+            return as_string(self.stream_handle.read(size))
     -+
     -+    def readline(self):
     -+        """ Reads a line from the underlying byte stream 
     -+            and converts it to utf8
     -+        """
     -+        return as_string(self.stream_handle.readline())
     -+
     -+    def readlines(self, sizeHint = None):
     -+        """ Returns a list containing lines from the file converted to unicode.
     -+
     -+            sizehint - Optional. If the optional sizehint argument is 
     -+            present, instead of reading up to EOF, whole lines totalling 
     -+            approximately sizehint bytes are read.
     -+        """
     -+        lines = self.stream_handle.readlines(sizeHint)
     -+        for i in range(0, len(lines)):
     -+            lines[i] = as_string(lines[i])
     -+        return lines
     -+
     -+    def close(self):
     -+        """ Closes the underlying byte stream """
     -+        self.stream_handle.close()
     -+
     -+    def flush(self):
     -+        """ Flushes the underlying byte stream """
     -+        self.stream_handle.flush()
     -+
     -+class DepotPath():
     -+    """ Describes a DepotPath or File
     -+    """
     -+
     -+    raw_path = None
     -+    utf8_path = None
     -+    bytes_path = None
     -+
     -+    def __init__(self, path):
     -+        """ Creates a new DepotPath with the path encoded
     -+            with by the P4 repository
     -+        """
     -+        raw_path = path
     -+
     -+    def raw():
     -+        """ Returns the path as it was originally found
     -+            in the P4 repository
     -+        """
     -+        return raw_path
     -+
     -+    def startswith(self, prefix, start = None, end = None):
     -+        """ Return True if string starts with the prefix, otherwise 
     -+            return False. prefix can also be a tuple of prefixes to 
     -+            look for. With optional start, test string beginning at 
     -+            that position. With optional end, stop comparing 
     -+            string at that position.
     -+        """
     -+        return raw_path.startswith(prefix, start, end)
     -+
     -+
     - class HelpFormatter(optparse.IndentedHelpFormatter):
     -     def __init__(self):
     -         optparse.IndentedHelpFormatter.__init__(self)
     -@@
     - 
     - def main():
     -     if len(sys.argv[1:]) == 0:
     --        printUsage(commands.keys())
     -+        printUsage(list(commands.keys()))
     -         sys.exit(2)
     - 
     -     cmdName = sys.argv[1]
     -@@
     -     except KeyError:
     -         print("unknown command %s" % cmdName)
     -         print("")
     --        printUsage(commands.keys())
     -+        printUsage(list(commands.keys()))
     -         sys.exit(2)
     - 
     -     options = cmd.options
     -@@
     -                                    description = cmd.description,
     -                                    formatter = HelpFormatter())
     - 
     --    (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
     -+    try:
     -+        (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
     -+    except:
     -+        parser.print_help()
     -+        raise
     -+
     -     global verbose
     -     verbose = cmd.verbose
     -     if cmd.needsGit:
     -@@
     -                         chdir(cdup);
     - 
     -         if not isValidGitDir(cmd.gitdir):
     --            if isValidGitDir(cmd.gitdir + "/.git"):
     --                cmd.gitdir += "/.git"
     -+            if isValidGitDir(os.path.join(cmd.gitdir, ".git")):
     -+                cmd.gitdir = os.path.join(cmd.gitdir, ".git")
     -             else:
     -                 die("fatal: cannot locate git repository at %s" % cmd.gitdir)
     - 
  -:  ---------- > 11:  883ef45ca5 git-p4: Added --encoding parameter to p4 clone

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v4 01/11] git-p4: select p4 binary by operating-system
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
@ 2019-12-04 22:29       ` Ben Keene via GitGitGadget
  2019-12-05 10:19         ` Denton Liu
  2019-12-04 22:29       ` [PATCH v4 02/11] git-p4: change the expansion test from basestring to list Ben Keene via GitGitGadget
                         ` (10 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-04 22:29 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

Depending on the version of GIT and Python installed, the perforce program (p4) may not resolve on Windows without the program extension.

Check the operating system (platform.system) and if it is reporting that it is Windows, use the full filename of "p4.exe" instead of "p4"

The original code unconditionally used "p4" as the binary filename.

This change is Python2 and Python3 compatible.

Thanks to: Junio C Hamano <gitster@pobox.com> and  Denton Liu <liu.denton@gmail.com> for patiently explaining proper format for my submissions.

Signed-off-by: Ben Keene <seraphire@gmail.com>
(cherry picked from commit 9a3a5c4e6d29dbef670072a9605c7a82b3729434)
---
 git-p4.py | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/git-p4.py b/git-p4.py
index 60c73b6a37..b2ffbc057b 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -75,7 +75,11 @@ def p4_build_cmd(cmd):
     location. It means that hooking into the environment, or other configuration
     can be done more easily.
     """
-    real_cmd = ["p4"]
+    # Look for the P4 binary
+    if (platform.system() == "Windows"):
+        real_cmd = ["p4.exe"]    
+    else:
+        real_cmd = ["p4"]
 
     user = gitConfig("git-p4.user")
     if len(user) > 0:
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v4 02/11] git-p4: change the expansion test from basestring to list
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
  2019-12-04 22:29       ` [PATCH v4 01/11] git-p4: select p4 binary by operating-system Ben Keene via GitGitGadget
@ 2019-12-04 22:29       ` Ben Keene via GitGitGadget
  2019-12-05 10:27         ` Denton Liu
  2019-12-04 22:29       ` [PATCH v4 03/11] git-p4: add new helper functions for python3 conversion Ben Keene via GitGitGadget
                         ` (9 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-04 22:29 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

Python 3+ handles strings differently than Python 2.7.  Since Python 2 is reaching it's end of life, a series of changes are being submitted to enable python 3.7+ support. The current code fails basic tests under python 3.7.

Change references to basestring in the isinstance tests to use list instead. This prepares the code to remove all references to basestring.

The original code used basestring in a test to determine if a list or literal string was passed into 9 different functions.  This is used to determine if the shell should be evoked when calling subprocess methods.

Signed-off-by: Ben Keene <seraphire@gmail.com>
(cherry picked from commit 5b1b1c145479b5d5fd242122737a3134890409e6)
---
 git-p4.py | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index b2ffbc057b..0f27996393 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -109,7 +109,7 @@ def p4_build_cmd(cmd):
         # Provide a way to not pass this option by setting git-p4.retries to 0
         real_cmd += ["-r", str(retries)]
 
-    if isinstance(cmd,basestring):
+    if not isinstance(cmd, list):
         real_cmd = ' '.join(real_cmd) + ' ' + cmd
     else:
         real_cmd += cmd
@@ -175,7 +175,7 @@ def write_pipe(c, stdin):
     if verbose:
         sys.stderr.write('Writing pipe: %s\n' % str(c))
 
-    expand = isinstance(c,basestring)
+    expand = not isinstance(c, list)
     p = subprocess.Popen(c, stdin=subprocess.PIPE, shell=expand)
     pipe = p.stdin
     val = pipe.write(stdin)
@@ -197,7 +197,7 @@ def read_pipe_full(c):
     if verbose:
         sys.stderr.write('Reading pipe: %s\n' % str(c))
 
-    expand = isinstance(c,basestring)
+    expand = not isinstance(c, list)
     p = subprocess.Popen(c, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=expand)
     (out, err) = p.communicate()
     return (p.returncode, out, err)
@@ -233,7 +233,7 @@ def read_pipe_lines(c):
     if verbose:
         sys.stderr.write('Reading pipe: %s\n' % str(c))
 
-    expand = isinstance(c, basestring)
+    expand = not isinstance(c, list)
     p = subprocess.Popen(c, stdout=subprocess.PIPE, shell=expand)
     pipe = p.stdout
     val = pipe.readlines()
@@ -276,7 +276,7 @@ def p4_has_move_command():
     return True
 
 def system(cmd, ignore_error=False):
-    expand = isinstance(cmd,basestring)
+    expand = not isinstance(cmd, list)
     if verbose:
         sys.stderr.write("executing %s\n" % str(cmd))
     retcode = subprocess.call(cmd, shell=expand)
@@ -288,7 +288,7 @@ def system(cmd, ignore_error=False):
 def p4_system(cmd):
     """Specifically invoke p4 as the system command. """
     real_cmd = p4_build_cmd(cmd)
-    expand = isinstance(real_cmd, basestring)
+    expand = not isinstance(real_cmd, list)
     retcode = subprocess.call(real_cmd, shell=expand)
     if retcode:
         raise CalledProcessError(retcode, real_cmd)
@@ -526,7 +526,7 @@ def getP4OpenedType(file):
 # Return the set of all p4 labels
 def getP4Labels(depotPaths):
     labels = set()
-    if isinstance(depotPaths,basestring):
+    if not isinstance(depotPaths, list):
         depotPaths = [depotPaths]
 
     for l in p4CmdList(["labels"] + ["%s..." % p for p in depotPaths]):
@@ -613,7 +613,7 @@ def isModeExecChanged(src_mode, dst_mode):
 def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
         errors_as_exceptions=False):
 
-    if isinstance(cmd,basestring):
+    if not isinstance(cmd, list):
         cmd = "-G " + cmd
         expand = True
     else:
@@ -630,7 +630,7 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
     stdin_file = None
     if stdin is not None:
         stdin_file = tempfile.TemporaryFile(prefix='p4-stdin', mode=stdin_mode)
-        if isinstance(stdin,basestring):
+        if not isinstance(stdin, list):
             stdin_file.write(stdin)
         else:
             for i in stdin:
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v4 03/11] git-p4: add new helper functions for python3 conversion
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
  2019-12-04 22:29       ` [PATCH v4 01/11] git-p4: select p4 binary by operating-system Ben Keene via GitGitGadget
  2019-12-04 22:29       ` [PATCH v4 02/11] git-p4: change the expansion test from basestring to list Ben Keene via GitGitGadget
@ 2019-12-04 22:29       ` Ben Keene via GitGitGadget
  2019-12-05 10:40         ` Denton Liu
  2019-12-04 22:29       ` [PATCH v4 04/11] git-p4: python3 syntax changes Ben Keene via GitGitGadget
                         ` (8 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-04 22:29 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

Python 3+ handles strings differently than Python 2.7.  Since Python 2 is reaching it's end of life, a series of changes are being submitted to enable python 3.7+ support. The current code fails basic tests under python 3.7.

Change the existing unicode test add new support functions for python2-python3 support.

Define the following variables:
- isunicode - a boolean variable that states if the version of python natively supports unicode (true) or not (false). This is true for Python3 and false for Python2.
- unicode - a type alias for the datatype that holds a unicode string.  It is assigned to a str under python 3 and the unicode type for Python2.
- bytes - a type alias for an array of bytes.  It is assigned the native bytes type for Python3 and str for Python2.

Add the following new functions:

- as_string(text) - A new function that will convert a byte array to a unicode (UTF-8) string under python 3.  Under python 2, this returns the string unchanged.
- as_bytes(text) - A new function that will convert a unicode string to a byte array under python 3.  Under python 2, this returns the string unchanged.
- to_unicode(text) - Converts a text string as Unicode(UTF-8) on both Python2 and Python3.

Add a new function alias raw_input:
If raw_input does not exist (it was renamed to input in python 3) alias input as raw_input.

The AS_STRING and AS_BYTES functions allow for modifying the code with a minimal amount of impact on Python2 support.  When a string is expected, the as_string() will be used to convert "cast" the incoming "bytes" to a string type. Conversely as_bytes() will be used to convert a "string" to a "byte array" type. Since Python2 overloads the datatype 'str' to serve both purposes, the Python2 versions of these function do not change the data, since the str functions as both a byte array and a string.

basestring is removed since its only references are found in tests that were changed in the previous change list.

Signed-off-by: Ben Keene <seraphire@gmail.com>
(cherry picked from commit 7921aeb3136b07643c1a503c2d9d8b5ada620356)
---
 git-p4.py | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 66 insertions(+), 4 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 0f27996393..93dfd0920a 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -32,16 +32,78 @@
     unicode = unicode
 except NameError:
     # 'unicode' is undefined, must be Python 3
-    str = str
+    #
+    # For Python3 which is natively unicode, we will use 
+    # unicode for internal information but all P4 Data
+    # will remain in bytes
+    isunicode = True
     unicode = str
     bytes = bytes
-    basestring = (str,bytes)
+
+    def as_string(text):
+        """Return a byte array as a unicode string"""
+        if text == None:
+            return None
+        if isinstance(text, bytes):
+            return unicode(text, "utf-8")
+        else:
+            return text
+
+    def as_bytes(text):
+        """Return a Unicode string as a byte array"""
+        if text == None:
+            return None
+        if isinstance(text, bytes):
+            return text
+        else:
+            return bytes(text, "utf-8")
+
+    def to_unicode(text):
+        """Return a byte array as a unicode string"""
+        return as_string(text)    
+
+    def path_as_string(path):
+        """ Converts a path to the UTF8 encoded string """
+        if isinstance(path, unicode):
+            return path
+        return encodeWithUTF8(path).decode('utf-8')
+    
 else:
     # 'unicode' exists, must be Python 2
-    str = str
+    #
+    # We will treat the data as:
+    #   str   -> str
+    #   bytes -> str
+    # So for Python2 these functions are no-ops
+    # and will leave the data in the ambiguious
+    # string/bytes state
+    isunicode = False
     unicode = unicode
     bytes = str
-    basestring = basestring
+
+    def as_string(text):
+        """ Return text unaltered (for Python3 support) """
+        return text
+
+    def as_bytes(text):
+        """ Return text unaltered (for Python3 support) """
+        return text
+
+    def to_unicode(text):
+        """Return a string as a unicode string"""
+        return text.decode('utf-8')
+    
+    def path_as_string(path):
+        """ Converts a path to the UTF8 encoded bytes """
+        return encodeWithUTF8(path)
+
+
+ 
+# Check for raw_input support
+try:
+    raw_input
+except NameError:
+    raw_input = input
 
 try:
     from subprocess import CalledProcessError
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v4 04/11] git-p4: python3 syntax changes
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
                         ` (2 preceding siblings ...)
  2019-12-04 22:29       ` [PATCH v4 03/11] git-p4: add new helper functions for python3 conversion Ben Keene via GitGitGadget
@ 2019-12-04 22:29       ` Ben Keene via GitGitGadget
  2019-12-05 11:02         ` Denton Liu
  2019-12-04 22:29       ` [PATCH v4 05/11] git-p4: Add new functions in preparation of usage Ben Keene via GitGitGadget
                         ` (7 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-04 22:29 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

Python 3+ handles strings differently than Python 2.7.  Since Python 2 is reaching it's end of life, a series of changes are being submitted to enable python 3.7+ support. The current code fails basic tests under python 3.7.

There are a number of translations suggested by modernize/futureize that should be taken to fix numerous non-string specific issues.

Change references to the X.next() iterator to the function next(X) which is compatible with both Python2 and Python3.

Change references to X.keys() to list(X.keys()) to return a list that can be iterated in both Python2 and Python3.

Add the literal text (object) to the end of class definitions to be consistent with Python3 class definition.

Change integer divison to use "//" instead of "/"  Under Both python2 and python3 // will return a floor()ed result which matches existing functionality.

Change the format string for displaying decimal values from %d to %4.1f% when displaying a progress.  This avoids displaying long repeating decimals in user displayed text.

Signed-off-by: Ben Keene <seraphire@gmail.com>
(cherry picked from commit bde6b83296aa9b3e7a584c5ce2b571c7287d8f9f)
---
 git-p4.py | 55 +++++++++++++++++++++++++++++--------------------------
 1 file changed, 29 insertions(+), 26 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 93dfd0920a..b283ef1029 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -26,6 +26,9 @@
 import zlib
 import ctypes
 import errno
+import os.path
+import codecs
+import io
 
 # support basestring in python3
 try:
@@ -631,7 +634,7 @@ def parseDiffTreeEntry(entry):
 
     If the pattern is not matched, None is returned."""
 
-    match = diffTreePattern().next().match(entry)
+    match = next(diffTreePattern()).match(entry)
     if match:
         return {
             'src_mode': match.group(1),
@@ -935,7 +938,7 @@ def findUpstreamBranchPoint(head = "HEAD"):
     branches = p4BranchesInGit()
     # map from depot-path to branch name
     branchByDepotPath = {}
-    for branch in branches.keys():
+    for branch in list(branches.keys()):
         tip = branches[branch]
         log = extractLogMessageFromGitCommit(tip)
         settings = extractSettingsGitLog(log)
@@ -1129,7 +1132,7 @@ def getClientSpec():
     client_name = entry["Client"]
 
     # just the keys that start with "View"
-    view_keys = [ k for k in entry.keys() if k.startswith("View") ]
+    view_keys = [ k for k in list(entry.keys()) if k.startswith("View") ]
 
     # hold this new View
     view = View(client_name)
@@ -1371,7 +1374,7 @@ def processContent(self, git_mode, relPath, contents):
         else:
             return LargeFileSystem.processContent(self, git_mode, relPath, contents)
 
-class Command:
+class Command(object):
     delete_actions = ( "delete", "move/delete", "purge" )
     add_actions = ( "add", "branch", "move/add" )
 
@@ -1386,7 +1389,7 @@ def ensure_value(self, attr, value):
             setattr(self, attr, value)
         return getattr(self, attr)
 
-class P4UserMap:
+class P4UserMap(object):
     def __init__(self):
         self.userMapFromPerforceServer = False
         self.myP4UserId = None
@@ -1437,7 +1440,7 @@ def getUserMapFromPerforceServer(self):
                 self.emails[email] = user
 
         s = ''
-        for (key, val) in self.users.items():
+        for (key, val) in list(self.users.items()):
             s += "%s\t%s\n" % (key.expandtabs(1), val.expandtabs(1))
 
         open(self.getUserCacheFilename(), "wb").write(s)
@@ -1788,7 +1791,7 @@ def prepareSubmitTemplate(self, changelist=None):
                 break
         if not change_entry:
             die('Failed to decode output of p4 change -o')
-        for key, value in change_entry.iteritems():
+        for key, value in list(change_entry.items()):
             if key.startswith('File'):
                 if 'depot-paths' in settings:
                     if not [p for p in settings['depot-paths']
@@ -2032,7 +2035,7 @@ def applyCommit(self, id):
             p4_delete(f)
 
         # Set/clear executable bits
-        for f in filesToChangeExecBit.keys():
+        for f in list(filesToChangeExecBit.keys()):
             mode = filesToChangeExecBit[f]
             setP4ExecBit(f, mode)
 
@@ -2285,7 +2288,7 @@ def run(self, args):
             self.clientSpecDirs = getClientSpec()
 
         # Check for the existence of P4 branches
-        branchesDetected = (len(p4BranchesInGit().keys()) > 1)
+        branchesDetected = (len(list(p4BranchesInGit().keys())) > 1)
 
         if self.useClientSpec and not branchesDetected:
             # all files are relative to the client spec
@@ -2676,7 +2679,7 @@ def __init__(self):
         self.knownBranches = {}
         self.initialParents = {}
 
-        self.tz = "%+03d%02d" % (- time.timezone / 3600, ((- time.timezone % 3600) / 60))
+        self.tz = "%+03d%02d" % (- time.timezone // 3600, ((- time.timezone % 3600) // 60))
         self.labels = {}
 
     # Force a checkpoint in fast-import and wait for it to finish
@@ -2793,7 +2796,7 @@ def splitFilesIntoBranches(self, commit):
             else:
                 relPath = self.stripRepoPath(path, self.depotPaths)
 
-            for branch in self.knownBranches.keys():
+            for branch in list(self.knownBranches.keys()):
                 # add a trailing slash so that a commit into qt/4.2foo
                 # doesn't end up in qt/4.2, e.g.
                 if p4PathStartsWith(relPath, branch + "/"):
@@ -2834,7 +2837,7 @@ def streamOneP4File(self, file, contents):
                 size = int(self.stream_file['fileSize'])
             else:
                 size = 0 # deleted files don't get a fileSize apparently
-            sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], relPath, size/1024/1024))
+            sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], relPath, size//1024//1024))
             sys.stdout.flush()
 
         (type_base, type_mods) = split_p4_type(file["type"])
@@ -2934,7 +2937,7 @@ def streamP4FilesCb(self, marshalled):
             required_bytes = int((4 * int(self.stream_file["fileSize"])) - calcDiskFree())
             if required_bytes > 0:
                 err = 'Not enough space left on %s! Free at least %i MB.' % (
-                    os.getcwd(), required_bytes/1024/1024
+                    os.getcwd(), required_bytes//1024//1024
                 )
 
         if err:
@@ -2963,7 +2966,7 @@ def streamP4FilesCb(self, marshalled):
 
         # pick up the new file information... for the
         # 'data' field we need to append to our array
-        for k in marshalled.keys():
+        for k in list(marshalled.keys()):
             if k == 'data':
                 if 'streamContentSize' not in self.stream_file:
                     self.stream_file['streamContentSize'] = 0
@@ -2978,8 +2981,8 @@ def streamP4FilesCb(self, marshalled):
             'depotFile' in self.stream_file):
             size = int(self.stream_file["fileSize"])
             if size > 0:
-                progress = 100*self.stream_file['streamContentSize']/size
-                sys.stdout.write('\r%s %d%% (%i MB)' % (self.stream_file['depotFile'], progress, int(size/1024/1024)))
+                progress = 100.0*self.stream_file['streamContentSize']/size
+                sys.stdout.write('\r%s %4.1f%% (%i MB)' % (self.stream_file['depotFile'], progress, int(size//1024//1024)))
                 sys.stdout.flush()
 
         self.stream_have_file_info = True
@@ -3060,7 +3063,7 @@ def streamTag(self, gitStream, labelName, labelDetails, commit, epoch):
 
         gitStream.write("tagger %s\n" % tagger)
 
-        print("labelDetails=",labelDetails)
+        print(("labelDetails=",labelDetails))
         if 'Description' in labelDetails:
             description = labelDetails['Description']
         else:
@@ -3199,7 +3202,7 @@ def getLabels(self):
             self.labels[newestChange] = [output, revisions]
 
         if self.verbose:
-            print("Label changes: %s" % self.labels.keys())
+            print("Label changes: %s" % list(self.labels.keys()))
 
     # Import p4 labels as git tags. A direct mapping does not
     # exist, so assume that if all the files are at the same revision
@@ -3342,7 +3345,7 @@ def getBranchMapping(self):
 
     def getBranchMappingFromGitBranches(self):
         branches = p4BranchesInGit(self.importIntoRemotes)
-        for branch in branches.keys():
+        for branch in list(branches.keys()):
             if branch == "master":
                 branch = "main"
             else:
@@ -3454,14 +3457,14 @@ def importChanges(self, changes, origin_revision=0):
             self.updateOptionDict(description)
 
             if not self.silent:
-                sys.stdout.write("\rImporting revision %s (%s%%)" % (change, cnt * 100 / len(changes)))
+                sys.stdout.write("\rImporting revision %s (%4.1f%%)" % (change, cnt * 100 / len(changes)))
                 sys.stdout.flush()
             cnt = cnt + 1
 
             try:
                 if self.detectBranches:
                     branches = self.splitFilesIntoBranches(description)
-                    for branch in branches.keys():
+                    for branch in list(branches.keys()):
                         ## HACK  --hwn
                         branchPrefix = self.depotPaths[0] + branch + "/"
                         self.branchPrefixes = [ branchPrefix ]
@@ -3650,13 +3653,13 @@ def run(self, args):
                 if short in branches:
                     self.p4BranchesInGit = [ short ]
             else:
-                self.p4BranchesInGit = branches.keys()
+                self.p4BranchesInGit = list(branches.keys())
 
             if len(self.p4BranchesInGit) > 1:
                 if not self.silent:
                     print("Importing from/into multiple branches")
                 self.detectBranches = True
-                for branch in branches.keys():
+                for branch in list(branches.keys()):
                     self.initialParents[self.refPrefix + branch] = \
                         branches[branch]
 
@@ -4040,7 +4043,7 @@ def findLastP4Revision(self, starting_point):
             to find the P4 commit we are based on, and the depot-paths.
         """
 
-        for parent in (range(65535)):
+        for parent in (list(range(65535))):
             log = extractLogMessageFromGitCommit("{0}^{1}".format(starting_point, parent))
             settings = extractSettingsGitLog(log)
             if 'change' in settings:
@@ -4179,7 +4182,7 @@ def printUsage(commands):
 
 def main():
     if len(sys.argv[1:]) == 0:
-        printUsage(commands.keys())
+        printUsage(list(commands.keys()))
         sys.exit(2)
 
     cmdName = sys.argv[1]
@@ -4189,7 +4192,7 @@ def main():
     except KeyError:
         print("unknown command %s" % cmdName)
         print("")
-        printUsage(commands.keys())
+        printUsage(list(commands.keys()))
         sys.exit(2)
 
     options = cmd.options
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v4 05/11] git-p4: Add new functions in preparation of usage
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
                         ` (3 preceding siblings ...)
  2019-12-04 22:29       ` [PATCH v4 04/11] git-p4: python3 syntax changes Ben Keene via GitGitGadget
@ 2019-12-04 22:29       ` Ben Keene via GitGitGadget
  2019-12-05 10:50         ` Denton Liu
  2019-12-04 22:29       ` [PATCH v4 06/11] git-p4: Fix assumed path separators to be more Windows friendly Ben Keene via GitGitGadget
                         ` (6 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-04 22:29 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

This changelist is an intermediate submission for migrating the P4 support from Python2 to Python3. The code needs access to the encodeWithUTF8() for support of non-UTF8 filenames in the clone class as well as the sync class.

Move the function encodeWithUTF8() from the P4Sync class to a stand-alone function.  This will allow other classes to use this function without instanciating the P4Sync class. Change the self.verbose reference to an optional method parameter. Update the existing references to this function to pass the self.verbose since it is no longer available on "self" since the function is no longer contained on the P4Sync class.

Modify the functions write_pipe() and p4_write_pipe() to remove the return value.  The return value for both functions is the number of bytes, but the meaning is lost under python3 since the count does not match the number of characters that may have been encoded.  Additionally, the return value was never used, so this is removed to avoid future ambiguity.

Add a new method gitConfigSet(). This method will set a value in the git configuration cache list.

Signed-off-by: Ben Keene <seraphire@gmail.com>
(cherry picked from commit affe888f432bb6833df78962e8671fccdf76c47a)
---
 git-p4.py | 60 ++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 44 insertions(+), 16 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index b283ef1029..2659531c2e 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -237,6 +237,8 @@ def die(msg):
         sys.exit(1)
 
 def write_pipe(c, stdin):
+    """ Executes the command 'c', passing 'stdin' on the standard input
+    """
     if verbose:
         sys.stderr.write('Writing pipe: %s\n' % str(c))
 
@@ -248,11 +250,12 @@ def write_pipe(c, stdin):
     if p.wait():
         die('Command failed: %s' % str(c))
 
-    return val
 
 def p4_write_pipe(c, stdin):
+    """ Runs a P4 command 'c', passing 'stdin' data to P4
+    """
     real_cmd = p4_build_cmd(c)
-    return write_pipe(real_cmd, stdin)
+    write_pipe(real_cmd, stdin)
 
 def read_pipe_full(c):
     """ Read output from  command. Returns a tuple
@@ -653,6 +656,38 @@ def isModeExec(mode):
     # otherwise False.
     return mode[-3:] == "755"
 
+def encodeWithUTF8(path, verbose = False):
+    """ Ensure that the path is encoded as a UTF-8 string
+
+        Returns bytes(P3)/str(P2)
+    """
+   
+    if isunicode:
+        try:
+            if isinstance(path, unicode):
+                # It is already unicode, cast it as a bytes
+                # that is encoded as utf-8.
+                return path.encode('utf-8', 'strict')
+            path.decode('ascii', 'strict')
+        except:
+            encoding = 'utf8'
+            if gitConfig('git-p4.pathEncoding'):
+                encoding = gitConfig('git-p4.pathEncoding')
+            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
+            if verbose:
+                print('\nNOTE:Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, to_unicode(path)))
+    else:    
+        try:
+            path.decode('ascii')
+        except:
+            encoding = 'utf8'
+            if gitConfig('git-p4.pathEncoding'):
+                encoding = gitConfig('git-p4.pathEncoding')
+            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
+            if verbose:
+                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
+    return path
+
 class P4Exception(Exception):
     """ Base class for exceptions from the p4 client """
     def __init__(self, exit_code):
@@ -891,6 +926,11 @@ def gitConfigList(key):
             _gitConfig[key] = []
     return _gitConfig[key]
 
+def gitConfigSet(key, value):
+    """ Set the git configuration key 'key' to 'value' for this session
+    """
+    _gitConfig[key] = value
+
 def p4BranchesInGit(branchesAreInRemotes=True):
     """Find all the branches whose names start with "p4/", looking
        in remotes or heads as specified by the argument.  Return
@@ -2814,24 +2854,12 @@ def writeToGitStream(self, gitMode, relPath, contents):
             self.gitStream.write(d)
         self.gitStream.write('\n')
 
-    def encodeWithUTF8(self, path):
-        try:
-            path.decode('ascii')
-        except:
-            encoding = 'utf8'
-            if gitConfig('git-p4.pathEncoding'):
-                encoding = gitConfig('git-p4.pathEncoding')
-            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
-            if self.verbose:
-                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
-        return path
-
     # output one file from the P4 stream
     # - helper for streamP4Files
 
     def streamOneP4File(self, file, contents):
         relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
-        relPath = self.encodeWithUTF8(relPath)
+        relPath = encodeWithUTF8(relPath, self.verbose)
         if verbose:
             if 'fileSize' in self.stream_file:
                 size = int(self.stream_file['fileSize'])
@@ -2914,7 +2942,7 @@ def streamOneP4File(self, file, contents):
 
     def streamOneP4Deletion(self, file):
         relPath = self.stripRepoPath(file['path'], self.branchPrefixes)
-        relPath = self.encodeWithUTF8(relPath)
+        relPath = encodeWithUTF8(relPath, self.verbose)
         if verbose:
             sys.stdout.write("delete %s\n" % relPath)
             sys.stdout.flush()
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v4 06/11] git-p4: Fix assumed path separators to be more Windows friendly
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
                         ` (4 preceding siblings ...)
  2019-12-04 22:29       ` [PATCH v4 05/11] git-p4: Add new functions in preparation of usage Ben Keene via GitGitGadget
@ 2019-12-04 22:29       ` Ben Keene via GitGitGadget
  2019-12-05 13:38         ` Junio C Hamano
  2019-12-04 22:29       ` [PATCH v4 07/11] git-p4: Add a helper class for stream writing Ben Keene via GitGitGadget
                         ` (5 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-04 22:29 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

When a computer is configured to use Git for windows and Python for windows, and not a Unix subsystem like cygwin or WSL, the directory separator changes and causes git-p4 to fail to properly determine paths.

Fix 3 path separator errors:

1. getUserCacheFilename should not use string concatenation. Change this code to use os.path.join to build an OS tolerant path.
2. defaultDestiantion used the OS.path.split to split depot paths.  This is incorrect on windows. Change the code to split on a forward slash(/) instead since depot paths use this character regardless  of the operating system.
3. The call to isvalidGitDir() in the main code also used a literal forward slash. Change the cose to use os.path.join to correctly format the path for the operating system.

These three changes allow the suggested windows configuration to properly locate files while retaining the existing behavior on non-windows operating systems.

Signed-off-by: Ben Keene <seraphire@gmail.com>
(cherry picked from commit a5b45c12c3861638a933b05a1ffee0c83978dcb2)
---
 git-p4.py | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 2659531c2e..7ac8cb42ef 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -1454,8 +1454,10 @@ def p4UserIsMe(self, p4User):
             return True
 
     def getUserCacheFilename(self):
+        """ Returns the filename of the username cache 
+	    """
         home = os.environ.get("HOME", os.environ.get("USERPROFILE"))
-        return home + "/.gitp4-usercache.txt"
+        return os.path.join(home, ".gitp4-usercache.txt")
 
     def getUserMapFromPerforceServer(self):
         if self.userMapFromPerforceServer:
@@ -3973,13 +3975,16 @@ def __init__(self):
         self.cloneBare = False
 
     def defaultDestination(self, args):
+        """ Returns the last path component as the default git 
+            repository directory name
+        """
         ## TODO: use common prefix of args?
         depotPath = args[0]
         depotDir = re.sub("(@[^@]*)$", "", depotPath)
         depotDir = re.sub("(#[^#]*)$", "", depotDir)
         depotDir = re.sub(r"\.\.\.$", "", depotDir)
         depotDir = re.sub(r"/$", "", depotDir)
-        return os.path.split(depotDir)[1]
+        return depotDir.split('/')[-1]
 
     def run(self, args):
         if len(args) < 1:
@@ -4252,8 +4257,8 @@ def main():
                         chdir(cdup);
 
         if not isValidGitDir(cmd.gitdir):
-            if isValidGitDir(cmd.gitdir + "/.git"):
-                cmd.gitdir += "/.git"
+            if isValidGitDir(os.path.join(cmd.gitdir, ".git")):
+                cmd.gitdir = os.path.join(cmd.gitdir, ".git")
             else:
                 die("fatal: cannot locate git repository at %s" % cmd.gitdir)
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v4 07/11] git-p4: Add a helper class for stream writing
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
                         ` (5 preceding siblings ...)
  2019-12-04 22:29       ` [PATCH v4 06/11] git-p4: Fix assumed path separators to be more Windows friendly Ben Keene via GitGitGadget
@ 2019-12-04 22:29       ` Ben Keene via GitGitGadget
  2019-12-05 13:42         ` Junio C Hamano
  2019-12-04 22:29       ` [PATCH v4 08/11] git-p4: p4CmdList - support Unicode encoding Ben Keene via GitGitGadget
                         ` (4 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-04 22:29 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

This is a transtional commit that does not change current behvior.  It adds a new class Py23File.

Following the Python recommendation of keeping text as unicode internally and only converting to and from bytes on input and output, this class provides an interface for the methods used for reading and writing files and file like streams.

Create a class that wraps the input and output functions used by the git-p4.py code for reading and writing to standard file handles.

The methods of this class should take a Unicode string for writing and return unicode strings in reads.  This class should be a drop-in for existing file like streams

The following methods should be coded for supporting existing read/write calls:
* write - this should write a Unicode string to the underlying stream
* read - this should read from the underlying stream and cast the bytes as a unicode string
* readline - this should read one line of text from the underlying stream and cast it as a unicode string
* readline - this should read a number of lines, optionally hinted, and cast each line as a unicode string

The expression "cast as a unicode string" is used because the code should use the AS_BYTES() and AS_UNICODE() functions instead of cohercing the data to actual unicode strings or bytes.  This allows python 2 code to continue to use the internal "str" data type instead of converting the data back and forth to actual unicode strings. This retains current python2 support while python3 support may be incomplete.

Signed-off-by: Ben Keene <seraphire@gmail.com>
(cherry picked from commit 12919111fbaa3e4c0c4c2fdd4f79744cc683d860)
---
 git-p4.py | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/git-p4.py b/git-p4.py
index 7ac8cb42ef..0da640be93 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -4182,6 +4182,72 @@ def run(self, args):
             print("%s <= %s (%s)" % (branch, ",".join(settings["depot-paths"]), settings["change"]))
         return True
 
+class Py23File():
+    """ Python2/3 Unicode File Wrapper 
+    """
+    
+    stream_handle = None
+    verbose       = False
+    debug_handle  = None
+   
+    def __init__(self, stream_handle, verbose = False):
+        """ Create a Python3 compliant Unicode to Byte String
+            Windows compatible wrapper
+
+            stream_handle = the underlying file-like handle
+            verbose       = Boolean if content should be echoed
+        """
+        self.stream_handle = stream_handle
+        self.verbose       = verbose
+
+    def write(self, utf8string):
+        """ Writes the utf8 encoded string to the underlying 
+            file stream
+        """
+        self.stream_handle.write(as_bytes(utf8string))
+        if self.verbose:
+            sys.stderr.write("Stream Output: %s" % utf8string)
+            sys.stderr.flush()
+
+    def read(self, size = None):
+        """ Reads int charcters from the underlying stream 
+            and converts it to utf8.
+
+            Be aware, the size value is for reading the underlying
+            bytes so the value may be incorrect. Usage of the size
+            value is discouraged.
+        """
+        if size == None:
+            return as_string(self.stream_handle.read())
+        else:
+            return as_string(self.stream_handle.read(size))
+
+    def readline(self):
+        """ Reads a line from the underlying byte stream 
+            and converts it to utf8
+        """
+        return as_string(self.stream_handle.readline())
+
+    def readlines(self, sizeHint = None):
+        """ Returns a list containing lines from the file converted to unicode.
+
+            sizehint - Optional. If the optional sizehint argument is 
+            present, instead of reading up to EOF, whole lines totalling 
+            approximately sizehint bytes are read.
+        """
+        lines = self.stream_handle.readlines(sizeHint)
+        for i in range(0, len(lines)):
+            lines[i] = as_string(lines[i])
+        return lines
+
+    def close(self):
+        """ Closes the underlying byte stream """
+        self.stream_handle.close()
+
+    def flush(self):
+        """ Flushes the underlying byte stream """
+        self.stream_handle.flush()
+
 class HelpFormatter(optparse.IndentedHelpFormatter):
     def __init__(self):
         optparse.IndentedHelpFormatter.__init__(self)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v4 08/11] git-p4: p4CmdList  - support Unicode encoding
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
                         ` (6 preceding siblings ...)
  2019-12-04 22:29       ` [PATCH v4 07/11] git-p4: Add a helper class for stream writing Ben Keene via GitGitGadget
@ 2019-12-04 22:29       ` Ben Keene via GitGitGadget
  2019-12-05 13:55         ` Junio C Hamano
  2019-12-04 22:29       ` [PATCH v4 09/11] git-p4: Add usability enhancements Ben Keene via GitGitGadget
                         ` (3 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-04 22:29 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

The p4CmdList is a commonly used function in the git-p4 code. It is used to execute a command in P4 and return the results of the call in a list.

Change this code to take a new optional parameter, encode_data that will optionally convert the data AS_STRING() that isto be returned by the function.

Change the code so that the key will always be encoded AS_STRING()

Data that is passed for standard input (stdin) should be AS_BYTES() to ensure unicode text that is supplied will be written out as bytes.

Additionally, change literal text prior to conversion to be literal bytes.

Signed-off-by: Ben Keene <seraphire@gmail.com>
(cherry picked from commit 88306ac269186cbd0f6dc6cfd366b50b28ee4886)
---
 git-p4.py | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 0da640be93..f7c0ef0c53 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -711,7 +711,23 @@ def isModeExecChanged(src_mode, dst_mode):
     return isModeExec(src_mode) != isModeExec(dst_mode)
 
 def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
-        errors_as_exceptions=False):
+        errors_as_exceptions=False, encode_data=True):
+    """ Executes a P4 command:  'cmd' optionally passing 'stdin' to the command's
+        standard input via a temporary file with 'stdin_mode' mode.
+
+        Output from the command is optionally passed to the callback function 'cb'.
+        If 'cb' is None, the response from the command is parsed into a list
+        of resulting dictionaries. (For each block read from the process pipe.)
+
+        If 'skip_info' is true, information in a block read that has a code type of
+        'info' will be skipped.
+
+        If 'errors_as_exceptions' is set to true (the default is false) the error
+        code returned from the execution will generate an exception.
+
+        If 'encode_data' is set to true (the default) the data that is returned 
+        by this function will be passed through the "as_string" function.
+    """
 
     if not isinstance(cmd, list):
         cmd = "-G " + cmd
@@ -734,7 +750,7 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
             stdin_file.write(stdin)
         else:
             for i in stdin:
-                stdin_file.write(i + '\n')
+                stdin_file.write(as_bytes(i) + b'\n')
         stdin_file.flush()
         stdin_file.seek(0)
 
@@ -748,12 +764,15 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
         while True:
             entry = marshal.load(p4.stdout)
             if skip_info:
-                if 'code' in entry and entry['code'] == 'info':
+                if b'code' in entry and entry[b'code'] == b'info':
                     continue
             if cb is not None:
                 cb(entry)
             else:
-                result.append(entry)
+                out = {}
+                for key, value in entry.items():
+                    out[as_string(key)] = (as_string(value) if encode_data else value)
+                result.append(out)
     except EOFError:
         pass
     exitCode = p4.wait()
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v4 09/11] git-p4: Add usability enhancements
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
                         ` (7 preceding siblings ...)
  2019-12-04 22:29       ` [PATCH v4 08/11] git-p4: p4CmdList - support Unicode encoding Ben Keene via GitGitGadget
@ 2019-12-04 22:29       ` Ben Keene via GitGitGadget
  2019-12-05 14:04         ` Junio C Hamano
  2019-12-04 22:29       ` [PATCH v4 10/11] git-p4: Support python3 for basic P4 clone, sync, and submit Ben Keene via GitGitGadget
                         ` (2 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-04 22:29 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

Issue: when prompting the user with raw_input, the tests are not forgiving of user input.  For example, on the first query asks for a yes/no response. If the user enters the full word "yes" or "no" the test will fail. Additionally, offer the suggestion of setting git-p4.attemptRCSCleanup when applying a commit fails because of RCS keywords. Both of these changes are usability enhancement suggestions.

Change the code prompting the user for input to sanitize the user input before checking the response by asking the response as a lower case string, trimming leading/trailing spaces, and returning the first character.

Change the applyCommit() method that when applying a commit fails becasue of the P4 RCS Keywords, the user should consider setting git-p4.attemptRCSCleanup.

Signed-off-by: Ben Keene <seraphire@gmail.com>
(cherry picked from commit 1fab571664f5b6ad4ef321199f52615a32a9f8c7)
---
 git-p4.py | 31 ++++++++++++++++++++++++++-----
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index f7c0ef0c53..f13e4645a3 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -1909,7 +1909,8 @@ def edit_template(self, template_file):
             return True
 
         while True:
-            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ")
+            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ").lower() \
+                .strip()[0]
             if response == 'y':
                 return True
             if response == 'n':
@@ -2069,8 +2070,23 @@ def applyCommit(self, id):
                     # disable the read-only bit on windows.
                     if self.isWindows and file not in editedFiles:
                         os.chmod(file, stat.S_IWRITE)
-                    self.patchRCSKeywords(file, kwfiles[file])
-                    fixed_rcs_keywords = True
+                    
+                    try:
+                        self.patchRCSKeywords(file, kwfiles[file])
+                        fixed_rcs_keywords = True
+                    except:
+                        # We are throwing an exception, undo all open edits
+                        for f in editedFiles:
+                            p4_revert(f)
+                        raise
+            else:
+                # They do not have attemptRCSCleanup set, this might be the fail point
+                # Check to see if the file has RCS keywords and suggest setting the property.
+                for file in editedFiles | filesToDelete:
+                    if p4_keywords_regexp_for_file(file) != None:
+                        print("At least one file in this commit has RCS Keywords that may be causing problems. ")
+                        print("Consider:\ngit config git-p4.attemptRCSCleanup true")
+                        break
 
             if fixed_rcs_keywords:
                 print("Retrying the patch with RCS keywords cleaned up")
@@ -2481,7 +2497,7 @@ def run(self, args):
                         if self.conflict_behavior == "ask":
                             print("What do you want to do?")
                             response = raw_input("[s]kip this commit but apply"
-                                                 " the rest, or [q]uit? ")
+                                                 " the rest, or [q]uit? ").lower().strip()[0]
                             if not response:
                                 continue
                         elif self.conflict_behavior == "skip":
@@ -4327,7 +4343,12 @@ def main():
                                    description = cmd.description,
                                    formatter = HelpFormatter())
 
-    (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
+    try:
+        (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
+    except:
+        parser.print_help()
+        raise
+
     global verbose
     verbose = cmd.verbose
     if cmd.needsGit:
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v4 10/11] git-p4: Support python3 for basic P4 clone, sync, and submit
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
                         ` (8 preceding siblings ...)
  2019-12-04 22:29       ` [PATCH v4 09/11] git-p4: Add usability enhancements Ben Keene via GitGitGadget
@ 2019-12-04 22:29       ` Ben Keene via GitGitGadget
  2019-12-04 22:29       ` [PATCH v4 11/11] git-p4: Added --encoding parameter to p4 clone Ben Keene via GitGitGadget
  2019-12-05  9:54       ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Luke Diamand
  11 siblings, 0 replies; 46+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-04 22:29 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

Issue: Python 3 is still not properly supported for any use with the git-p4 python code.
Warning - this is a very large atomic commit.  The commit text is also very large.

Change the code such that, with the exception of P4 depot paths and depot files, all text read by git-p4 is cast as a string as soon as possible and converted back to bytes as late as possible, following Python2 to Python3 conversion best practices.

Important: Do not cast the bytes that contain the p4 depot path or p4 depot file name.  These should be left as bytes until used.

These two values should not be converted because the encoding of these values is unknown.  git-p4 supports a configuration value git-p4.pathEncoding that is used by the encodeWithUTF8()  to determine what a UTF8 version of the path and filename should be.  However, since depot path and depot filename need to be sent to P4 in their original encoding, they will be left as byte streams until they are actually used:

* When sent to P4, the bytes are literally passed to the p4 command
* When displayed in text for the user, they should be passed through the path_as_string() function
* When used by GIT they should be passed through the encodeWithUTF8() function

Change all the rest of system calls to cast output (stdin) as_bytes() and input (stdout) as_string().  This retains existing Python 2 support, and adds python 3 support for these functions:
* read_pipe_full
* read_pipe_lines
* p4_has_move_command (used internally)
* gitConfig
* branch_exists
* GitLFS.generatePointer
* applyCommit - template must be read and written to the temporary file as_bytes() since it is created in memory as a string.
* streamOneP4File(file, contents) - wrap calls to the depotFile in path_as_string() for display. The file contents must be retained as bytes, so update the RCS changes to be forced to bytes.
* streamP4Files
* importHeadRevision(revision) - encode the depotPaths for display separate from the text for processing.

Py23File usage -
Change the P4Sync.OpenStreams() function to cast the gitOutput, gitStream, and gitError streams as Py23File() wrapper classes.  This facilitates taking strings in both python 2 and python 3 and casting them to bytes in the wrapper class instead of having to modify each method. Since the fast-import command also expects a raw byte stream for file content, add a new stream handle - gitStreamBytes which is an unwrapped verison of gitStream.

Literal text -
Depending on context, most literal text does not need casting to unicode or bytes as the text is Python dependent - In python 2, the string is implied as 'str' and python 3 the string is implied as 'unicode'. Under these conditions, they match the rest of the operating text, following best practices.  However, when a literal string is used in functions that are dealing with the raw input from and raw ouput to files streams, literal bytes may be required. Additionally, functions that are dealing with P4 depot paths or P4 depot file names are also dealing with bytes and will require the same casting as bytes.  The following functions cast text as byte strings:
* wildcard_decode(path) - the path parameter is a P4 depot and is bytes. Cast all the literals to bytes.
* wildcard_encode(path) - the path parameter is a P4 depot and is bytes. Cast all the literals to bytes.
* streamP4FilesCb(marshalled) - the marshalled data is in bytes. Cast the literals as bytes. When using this data to manipulate self.stream_file, encode all the marshalled data except for the 'depotFile' name.
* streamP4Files

Special behavior:
* p4_describe - encoding is disabled for the depotFile(x) and path elements since these are depot path and depo filenames.
* p4PathStartsWith(path, prefix) - Since P4 depot paths can contain non-UTF-8 encoded strings, change this method to compare paths while supporting the optional encoding.
   - First, perform a byte-to-byte check to see if the path and prefix are both identical text.  There is no need to perform encoding conversions if the text is identical.
   - If the byte check fails, pass both the path and prefix through encodeWithUTF8() to ensure both paths are using the same encoding. Then perform the test as originally written.
* patchRCSKeywords(file, pattern) - the parameters of file and pattern are both strings. However this function changes the contents of the file itentified by name "file". Treat the content of this file as binary to ensure that python does not accidently change the original encoding. The regular expression is cast as_bytes() and run against the file as_bytes(). The P4 keywords are ASCII strings and cannot span lines so iterating over each line of the file is acceptable.
* writeToGitStream(gitMode, relPath, contents) - Since 'contents' is already bytes data, instead of using the self.gitStream, use the new self.gitStreamBytes - the unwrapped gitStream that does not cast as_bytes() the binary data.
* commit(details, files, branch, parent = "", allow_empty=False) - Changed the encoding for the commit message to the preferred format for fast-import. The number of bytes is sent in the data block instead of using the EOT marker.
* Change the code for handling the user cache to use binary files. Cast text as_bytes() when writing to the cache and as_string() when reading from the cache.  This makes the reading and writing of the cache determinstic in it's encoding. Unlike file paths, P4 encodes the user names in UTF-8 encoding so no additional string encoding is required.

Signed-off-by: Ben Keene <seraphire@gmail.com>
(cherry picked from commit 65ff0c74ebe62a200b4385ecfd4aa618ce091f48)
---
 git-p4.py | 287 ++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 205 insertions(+), 82 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index f13e4645a3..05db2ec657 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -268,6 +268,8 @@ def read_pipe_full(c):
     expand = not isinstance(c, list)
     p = subprocess.Popen(c, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=expand)
     (out, err) = p.communicate()
+    out = as_string(out)
+    err = as_string(err)
     return (p.returncode, out, err)
 
 def read_pipe(c, ignore_error=False):
@@ -294,10 +296,17 @@ def read_pipe_text(c):
         return out.rstrip()
 
 def p4_read_pipe(c, ignore_error=False):
+    """ Read output from the P4 command 'c'. Returns the output text on
+        success. On failure, terminates execution, unless
+        ignore_error is True, when it returns an empty string.
+    """
     real_cmd = p4_build_cmd(c)
     return read_pipe(real_cmd, ignore_error)
 
 def read_pipe_lines(c):
+    """ Returns a list of text from executing the command 'c'.
+        The program will die if the command fails to execute.
+    """
     if verbose:
         sys.stderr.write('Reading pipe: %s\n' % str(c))
 
@@ -307,6 +316,11 @@ def read_pipe_lines(c):
     val = pipe.readlines()
     if pipe.close() or p.wait():
         die('Command failed: %s' % str(c))
+    # Unicode conversion from byte-string
+    # Iterate and fix in-place to avoid a second list in memory.
+    if isunicode:
+        for i in range(len(val)):
+            val[i] = as_string(val[i])
 
     return val
 
@@ -335,6 +349,8 @@ def p4_has_move_command():
     cmd = p4_build_cmd(["move", "-k", "@from", "@to"])
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     (out, err) = p.communicate()
+    out=as_string(out)
+    err=as_string(err)
     # return code will be 1 in either case
     if err.find("Invalid option") >= 0:
         return False
@@ -462,16 +478,20 @@ def p4_last_change():
     return int(results[0]['change'])
 
 def p4_describe(change, shelved=False):
-    """Make sure it returns a valid result by checking for
-       the presence of field "time".  Return a dict of the
-       results."""
+    """ Returns information about the requested P4 change list.
+
+        Data returned is not string encoded (returned as bytes)
+    """
+    # Make sure it returns a valid result by checking for
+    #   the presence of field "time".  Return a dict of the
+    #   results.
 
     cmd = ["describe", "-s"]
     if shelved:
         cmd += ["-S"]
     cmd += [str(change)]
 
-    ds = p4CmdList(cmd, skip_info=True)
+    ds = p4CmdList(cmd, skip_info=True, encode_data=False)
     if len(ds) != 1:
         die("p4 describe -s %d did not return 1 result: %s" % (change, str(ds)))
 
@@ -481,12 +501,23 @@ def p4_describe(change, shelved=False):
         die("p4 describe -s %d exited with %d: %s" % (change, d["p4ExitCode"],
                                                       str(d)))
     if "code" in d:
-        if d["code"] == "error":
+        if d["code"] == b"error":
             die("p4 describe -s %d returned error code: %s" % (change, str(d)))
 
     if "time" not in d:
         die("p4 describe -s %d returned no \"time\": %s" % (change, str(d)))
 
+    # Do not convert 'depotFile(X)' or 'path' to be UTF-8 encoded, however 
+    # cast as_string() the rest of the text. 
+    keys=d.keys()
+    for key in keys:
+        if key.startswith('depotFile'):
+            d[key]=d[key] 
+        elif key == 'path':
+            d[key]=d[key] 
+        else:
+            d[key] = as_string(d[key])
+
     return d
 
 #
@@ -800,6 +831,8 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
     return result
 
 def p4Cmd(cmd):
+    """ Executes a P4 command and returns the results in a dictionary
+    """
     list = p4CmdList(cmd)
     result = {}
     for entry in list:
@@ -908,13 +941,15 @@ def gitDeleteRef(ref):
 _gitConfig = {}
 
 def gitConfig(key, typeSpecifier=None):
+    """ Return a configuration setting from GIT
+	"""
     if key not in _gitConfig:
         cmd = [ "git", "config" ]
         if typeSpecifier:
             cmd += [ typeSpecifier ]
         cmd += [ key ]
         s = read_pipe(cmd, ignore_error=True)
-        _gitConfig[key] = s.strip()
+        _gitConfig[key] = as_string(s).strip()
     return _gitConfig[key]
 
 def gitConfigBool(key):
@@ -988,6 +1023,7 @@ def branch_exists(branch):
     cmd = [ "git", "rev-parse", "--symbolic", "--verify", branch ]
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     out, _ = p.communicate()
+    out = as_string(out)
     if p.returncode:
         return False
     # expect exactly one line of output: the branch name
@@ -1171,9 +1207,22 @@ def p4PathStartsWith(path, prefix):
     #
     # we may or may not have a problem. If you have core.ignorecase=true,
     # we treat DirA and dira as the same directory
+    
+    # Since we have to deal with mixed encodings for p4 file
+    # paths, first perform a simple startswith check, this covers
+    # the case that the formats and path are identical.
+    if as_bytes(path).startswith(as_bytes(prefix)):
+        return True
+    
+    # attempt to convert the prefix and path both to utf8
+    path_utf8 = encodeWithUTF8(path)
+    prefix_utf8 = encodeWithUTF8(prefix)
+
     if gitConfigBool("core.ignorecase"):
-        return path.lower().startswith(prefix.lower())
-    return path.startswith(prefix)
+        # Check if we match byte-per-byte.  
+        
+        return path_utf8.lower().startswith(prefix_utf8.lower())
+    return path_utf8.startswith(prefix_utf8)
 
 def getClientSpec():
     """Look at the p4 client spec, create a View() object that contains
@@ -1229,18 +1278,24 @@ def wildcard_decode(path):
     # Cannot have * in a filename in windows; untested as to
     # what p4 would do in such a case.
     if not platform.system() == "Windows":
-        path = path.replace("%2A", "*")
-    path = path.replace("%23", "#") \
-               .replace("%40", "@") \
-               .replace("%25", "%")
+        path = path.replace(b"%2A", b"*")
+    path = path.replace(b"%23", b"#") \
+               .replace(b"%40", b"@") \
+               .replace(b"%25", b"%")
     return path
 
 def wildcard_encode(path):
     # do % first to avoid double-encoding the %s introduced here
-    path = path.replace("%", "%25") \
-               .replace("*", "%2A") \
-               .replace("#", "%23") \
-               .replace("@", "%40")
+    if isinstance(path, unicode):
+        path = path.replace("%", "%25") \
+                   .replace("*", "%2A") \
+                   .replace("#", "%23") \
+                   .replace("@", "%40")
+    else:
+        path = path.replace(b"%", b"%25") \
+                   .replace(b"*", b"%2A") \
+                   .replace(b"#", b"%23") \
+                   .replace(b"@", b"%40")
     return path
 
 def wildcard_present(path):
@@ -1372,7 +1427,7 @@ def generatePointer(self, contentFile):
             ['git', 'lfs', 'pointer', '--file=' + contentFile],
             stdout=subprocess.PIPE
         )
-        pointerFile = pointerProcess.stdout.read()
+        pointerFile = as_string(pointerProcess.stdout.read())
         if pointerProcess.wait():
             os.remove(contentFile)
             die('git-lfs pointer command failed. Did you install the extension?')
@@ -1479,6 +1534,8 @@ def getUserCacheFilename(self):
         return os.path.join(home, ".gitp4-usercache.txt")
 
     def getUserMapFromPerforceServer(self):
+        """ Creates the usercache from the data in P4.
+        """
         if self.userMapFromPerforceServer:
             return
         self.users = {}
@@ -1504,18 +1561,22 @@ def getUserMapFromPerforceServer(self):
         for (key, val) in list(self.users.items()):
             s += "%s\t%s\n" % (key.expandtabs(1), val.expandtabs(1))
 
-        open(self.getUserCacheFilename(), "wb").write(s)
+        cache = io.open(self.getUserCacheFilename(), "wb")
+        cache.write(as_bytes(s))
+        cache.close()
         self.userMapFromPerforceServer = True
 
     def loadUserMapFromCache(self):
+        """ Reads the P4 username to git email map 
+        """
         self.users = {}
         self.userMapFromPerforceServer = False
         try:
-            cache = open(self.getUserCacheFilename(), "rb")
+            cache = io.open(self.getUserCacheFilename(), "rb")
             lines = cache.readlines()
             cache.close()
             for line in lines:
-                entry = line.strip().split("\t")
+                entry = as_string(line).strip().split("\t")
                 self.users[entry[0]] = entry[1]
         except IOError:
             self.getUserMapFromPerforceServer()
@@ -1715,21 +1776,27 @@ def prepareLogMessage(self, template, message, jobs):
         return result
 
     def patchRCSKeywords(self, file, pattern):
-        # Attempt to zap the RCS keywords in a p4 controlled file matching the given pattern
+        """ Attempt to zap the RCS keywords in a p4 
+            controlled file matching the given pattern
+        """
+        bSubLine = as_bytes(r'$\1$')
         (handle, outFileName) = tempfile.mkstemp(dir='.')
         try:
-            outFile = os.fdopen(handle, "w+")
-            inFile = open(file, "r")
-            regexp = re.compile(pattern, re.VERBOSE)
+            outFile = os.fdopen(handle, "w+b")
+            inFile = open(file, "rb")
+            regexp = re.compile(as_bytes(pattern), re.VERBOSE)
             for line in inFile.readlines():
-                line = regexp.sub(r'$\1$', line)
+                line = regexp.sub(bSubLine, line)
                 outFile.write(line)
             inFile.close()
             outFile.close()
+            outFile = None
             # Forcibly overwrite the original file
             os.unlink(file)
             shutil.move(outFileName, file)
         except:
+            if outFile != None:
+                outFile.close()
             # cleanup our temporary file
             os.unlink(outFileName)
             print("Failed to strip RCS keywords in %s" % file)
@@ -2149,7 +2216,7 @@ def applyCommit(self, id):
         tmpFile = os.fdopen(handle, "w+b")
         if self.isWindows:
             submitTemplate = submitTemplate.replace("\n", "\r\n")
-        tmpFile.write(submitTemplate)
+        tmpFile.write(as_bytes(submitTemplate))
         tmpFile.close()
 
         if self.prepare_p4_only:
@@ -2199,8 +2266,8 @@ def applyCommit(self, id):
                 message = tmpFile.read()
                 tmpFile.close()
                 if self.isWindows:
-                    message = message.replace("\r\n", "\n")
-                submitTemplate = message[:message.index(separatorLine)]
+                    message = message.replace(b"\r\n", b"\n")
+                submitTemplate = message[:message.index(as_bytes(separatorLine))]
 
                 if update_shelve:
                     p4_write_pipe(['shelve', '-r', '-i'], submitTemplate)
@@ -2843,8 +2910,11 @@ def stripRepoPath(self, path, prefixes):
         return path
 
     def splitFilesIntoBranches(self, commit):
-        """Look at each depotFile in the commit to figure out to what
-           branch it belongs."""
+        """ Look at each depotFile in the commit to figure out to what
+            branch it belongs.
+
+            Data in the commit will NOT be encoded
+        """
 
         if self.clientSpecDirs:
             files = self.extractFilesFromCommit(commit)
@@ -2885,16 +2955,22 @@ def splitFilesIntoBranches(self, commit):
         return branches
 
     def writeToGitStream(self, gitMode, relPath, contents):
-        self.gitStream.write('M %s inline %s\n' % (gitMode, relPath))
+        """ Writes the bytes[] 'contents' to the git fast-import
+            with the given 'gitMode' and 'relPath' as the relative
+            path.
+        """
+        self.gitStream.write('M %s inline %s\n' % (gitMode, as_string(relPath)))
         self.gitStream.write('data %d\n' % sum(len(d) for d in contents))
         for d in contents:
-            self.gitStream.write(d)
+            self.gitStreamBytes.write(d)
         self.gitStream.write('\n')
 
-    # output one file from the P4 stream
-    # - helper for streamP4Files
-
     def streamOneP4File(self, file, contents):
+        """ output one file from the P4 stream to the git inbound stream.
+            helper for streamP4files.
+
+            contents should be a bytes (bytes) 
+        """
         relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
         relPath = encodeWithUTF8(relPath, self.verbose)
         if verbose:
@@ -2902,7 +2978,7 @@ def streamOneP4File(self, file, contents):
                 size = int(self.stream_file['fileSize'])
             else:
                 size = 0 # deleted files don't get a fileSize apparently
-            sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], relPath, size//1024//1024))
+            sys.stdout.write('\r%s --> %s (%i MB)\n' % (path_as_string(file['depotFile']), as_string(relPath), size//1024//1024))
             sys.stdout.flush()
 
         (type_base, type_mods) = split_p4_type(file["type"])
@@ -2920,7 +2996,7 @@ def streamOneP4File(self, file, contents):
                 # to nothing.  This causes p4 errors when checking out such
                 # a change, and errors here too.  Work around it by ignoring
                 # the bad symlink; hopefully a future change fixes it.
-                print("\nIgnoring empty symlink in %s" % file['depotFile'])
+                print("\nIgnoring empty symlink in %s" % path_as_string(file['depotFile']))
                 return
             elif data[-1] == '\n':
                 contents = [data[:-1]]
@@ -2960,16 +3036,16 @@ def streamOneP4File(self, file, contents):
             # Ideally, someday, this script can learn how to generate
             # appledouble files directly and import those to git, but
             # non-mac machines can never find a use for apple filetype.
-            print("\nIgnoring apple filetype file %s" % file['depotFile'])
+            print("\nIgnoring apple filetype file %s" % path_as_string(file['depotFile']))
             return
 
         # Note that we do not try to de-mangle keywords on utf16 files,
         # even though in theory somebody may want that.
-        pattern = p4_keywords_regexp_for_type(type_base, type_mods)
+        pattern = as_bytes(p4_keywords_regexp_for_type(type_base, type_mods))
         if pattern:
             regexp = re.compile(pattern, re.VERBOSE)
-            text = ''.join(contents)
-            text = regexp.sub(r'$\1$', text)
+            text = b''.join(contents)
+            text = regexp.sub(as_bytes(r'$\1$'), text)
             contents = [ text ]
 
         if self.largeFileSystem:
@@ -2988,15 +3064,19 @@ def streamOneP4Deletion(self, file):
         if self.largeFileSystem and self.largeFileSystem.isLargeFile(relPath):
             self.largeFileSystem.removeLargeFile(relPath)
 
-    # handle another chunk of streaming data
     def streamP4FilesCb(self, marshalled):
+        """ Callback function for recording P4 chunks of data for streaming 
+            into GIT.
+
+            marshalled data is bytes[] from the caller
+        """
 
         # catch p4 errors and complain
         err = None
-        if "code" in marshalled:
-            if marshalled["code"] == "error":
-                if "data" in marshalled:
-                    err = marshalled["data"].rstrip()
+        if b"code" in marshalled:
+            if marshalled[b"code"] == b"error":
+                if b"data" in marshalled:
+                    err = marshalled[b"data"].rstrip()
 
         if not err and 'fileSize' in self.stream_file:
             required_bytes = int((4 * int(self.stream_file["fileSize"])) - calcDiskFree())
@@ -3018,11 +3098,11 @@ def streamP4FilesCb(self, marshalled):
             # ignore errors, but make sure it exits first
             self.importProcess.wait()
             if f:
-                die("Error from p4 print for %s: %s" % (f, err))
+                die("Error from p4 print for %s: %s" % (path_as_string(f), err))
             else:
                 die("Error from p4 print: %s" % err)
 
-        if 'depotFile' in marshalled and self.stream_have_file_info:
+        if b'depotFile' in marshalled and self.stream_have_file_info:
             # start of a new file - output the old one first
             self.streamOneP4File(self.stream_file, self.stream_contents)
             self.stream_file = {}
@@ -3032,13 +3112,16 @@ def streamP4FilesCb(self, marshalled):
         # pick up the new file information... for the
         # 'data' field we need to append to our array
         for k in list(marshalled.keys()):
-            if k == 'data':
+            if k == b'data':
                 if 'streamContentSize' not in self.stream_file:
                     self.stream_file['streamContentSize'] = 0
-                self.stream_file['streamContentSize'] += len(marshalled['data'])
-                self.stream_contents.append(marshalled['data'])
+                self.stream_file['streamContentSize'] += len(marshalled[b'data'])
+                self.stream_contents.append(marshalled[b'data'])
             else:
-                self.stream_file[k] = marshalled[k]
+                if k == b'depotFile':
+                    self.stream_file[as_string(k)] = marshalled[k]
+                else:
+                    self.stream_file[as_string(k)] = as_string(marshalled[k])
 
         if (verbose and
             'streamContentSize' in self.stream_file and
@@ -3047,13 +3130,14 @@ def streamP4FilesCb(self, marshalled):
             size = int(self.stream_file["fileSize"])
             if size > 0:
                 progress = 100.0*self.stream_file['streamContentSize']/size
-                sys.stdout.write('\r%s %4.1f%% (%i MB)' % (self.stream_file['depotFile'], progress, int(size//1024//1024)))
+                sys.stdout.write('\r%s %4.1f%% (%i MB)' % (path_as_string(self.stream_file['depotFile']), progress, int(size//1024//1024)))
                 sys.stdout.flush()
 
         self.stream_have_file_info = True
 
-    # Stream directly from "p4 files" into "git fast-import"
     def streamP4Files(self, files):
+        """ Stream directly from "p4 files" into "git fast-import" 
+        """
         filesForCommit = []
         filesToRead = []
         filesToDelete = []
@@ -3074,7 +3158,7 @@ def streamP4Files(self, files):
             self.stream_contents = []
             self.stream_have_file_info = False
 
-            # curry self argument
+            # Callback for P4 command to collect file content
             def streamP4FilesCbSelf(entry):
                 self.streamP4FilesCb(entry)
 
@@ -3083,9 +3167,9 @@ def streamP4FilesCbSelf(entry):
                 if 'shelved_cl' in f:
                     # Handle shelved CLs using the "p4 print file@=N" syntax to print
                     # the contents
-                    fileArg = '%s@=%d' % (f['path'], f['shelved_cl'])
+                    fileArg = b'%s@=%d' % (f['path'], as_bytes(f['shelved_cl']))
                 else:
-                    fileArg = '%s#%s' % (f['path'], f['rev'])
+                    fileArg = b'%s#%s' % (f['path'], as_bytes(f['rev']))
 
                 fileArgs.append(fileArg)
 
@@ -3105,7 +3189,7 @@ def make_email(self, userid):
 
     def streamTag(self, gitStream, labelName, labelDetails, commit, epoch):
         """ Stream a p4 tag.
-        commit is either a git commit, or a fast-import mark, ":<p4commit>"
+            commit is either a git commit, or a fast-import mark, ":<p4commit>"
         """
 
         if verbose:
@@ -3177,7 +3261,22 @@ def commit(self, details, files, branch, parent = "", allow_empty=False):
                 .format(details['change']))
             return
 
+        # fast-import:
+        #'commit' SP <ref> LF
+	    #mark?
+	    #original-oid?
+	    #('author' (SP <name>)? SP LT <email> GT SP <when> LF)?
+	    #'committer' (SP <name>)? SP LT <email> GT SP <when> LF
+	    #('encoding' SP <encoding>)?
+	    #data
+	    #('from' SP <commit-ish> LF)?
+	    #('merge' SP <commit-ish> LF)*
+	    #(filemodify | filedelete | filecopy | filerename | filedeleteall | notemodify)*
+	    #LF?
+        
+        #'commit' - <ref> is the name of the branch to make the commit on
         self.gitStream.write("commit %s\n" % branch)
+        #'mark' SP :<idnum>
         self.gitStream.write("mark :%s\n" % details["change"])
         self.committedChanges.add(int(details["change"]))
         committer = ""
@@ -3187,19 +3286,29 @@ def commit(self, details, files, branch, parent = "", allow_empty=False):
 
         self.gitStream.write("committer %s\n" % committer)
 
-        self.gitStream.write("data <<EOT\n")
-        self.gitStream.write(details["desc"])
+        # Per https://git-scm.com/docs/git-fast-import
+        # The preferred method for creating the commit message is to supply the 
+        # byte count in the data method and not to use a Delimited format. 
+        # Collect all the text in the commit message into a single string and 
+        # compute the byte count.
+        commitText = details["desc"]
         if len(jobs) > 0:
-            self.gitStream.write("\nJobs: %s" % (' '.join(jobs)))
-
+            commitText += "\nJobs: %s" % (' '.join(jobs))
         if not self.suppress_meta_comment:
-            self.gitStream.write("\n[git-p4: depot-paths = \"%s\": change = %s" %
-                                (','.join(self.branchPrefixes), details["change"]))
-            if len(details['options']) > 0:
-                self.gitStream.write(": options = %s" % details['options'])
-            self.gitStream.write("]\n")
+            # coherce the path to the correct formatting in the branch prefixes as well.
+            dispPaths = []
+            for p in self.branchPrefixes:
+                dispPaths += [path_as_string(p)]
 
-        self.gitStream.write("EOT\n\n")
+            commitText += ("\n[git-p4: depot-paths = \"%s\": change = %s" %
+                                (','.join(dispPaths), details["change"]))
+            if len(details['options']) > 0:
+                commitText += (": options = %s" % details['options'])
+            commitText += "]"
+        commitText += "\n" 
+        self.gitStream.write("data %s\n" % len(as_bytes(commitText)))
+        self.gitStream.write(commitText)
+        self.gitStream.write("\n")
 
         if len(parent) > 0:
             if self.verbose:
@@ -3606,30 +3715,35 @@ def sync_origin_only(self):
                 system("git fetch origin")
 
     def importHeadRevision(self, revision):
-        print("Doing initial import of %s from revision %s into %s" % (' '.join(self.depotPaths), revision, self.branch))
-
+        # Re-encode depot text
+        dispPaths = []
+        utf8Paths = []
+        for p in self.depotPaths:
+            dispPaths += [path_as_string(p)]
+        print("Doing initial import of %s from revision %s into %s" % (' '.join(dispPaths), revision, self.branch))
         details = {}
         details["user"] = "git perforce import user"
-        details["desc"] = ("Initial import of %s from the state at revision %s\n"
-                           % (' '.join(self.depotPaths), revision))
+        details["desc"] = ("Initial import of %s from the state at revision %s\n" %
+                           (' '.join(dispPaths), revision))
         details["change"] = revision
         newestRevision = 0
+        del dispPaths
 
         fileCnt = 0
         fileArgs = ["%s...%s" % (p,revision) for p in self.depotPaths]
 
-        for info in p4CmdList(["files"] + fileArgs):
+        for info in p4CmdList(["files"] + fileArgs, encode_data = False):
 
-            if 'code' in info and info['code'] == 'error':
+            if 'code' in info and info['code'] == b'error':
                 sys.stderr.write("p4 returned an error: %s\n"
-                                 % info['data'])
-                if info['data'].find("must refer to client") >= 0:
+                                 % as_string(info['data']))
+                if info['data'].find(b"must refer to client") >= 0:
                     sys.stderr.write("This particular p4 error is misleading.\n")
                     sys.stderr.write("Perhaps the depot path was misspelled.\n");
                     sys.stderr.write("Depot path:  %s\n" % " ".join(self.depotPaths))
                 sys.exit(1)
             if 'p4ExitCode' in info:
-                sys.stderr.write("p4 exitcode: %s\n" % info['p4ExitCode'])
+                sys.stderr.write("p4 exitcode: %s\n" % as_string(info['p4ExitCode']))
                 sys.exit(1)
 
 
@@ -3642,8 +3756,10 @@ def importHeadRevision(self, revision):
                 #fileCnt = fileCnt + 1
                 continue
 
+            # Save all the file information, howerver do not translate the depotFile name at 
+            # this time. Leave that as bytes since the encoding may vary.
             for prop in ["depotFile", "rev", "action", "type" ]:
-                details["%s%s" % (prop, fileCnt)] = info[prop]
+                details["%s%s" % (prop, fileCnt)] = (info[prop] if prop == "depotFile" else as_string(info[prop]))
 
             fileCnt = fileCnt + 1
 
@@ -3663,13 +3779,18 @@ def importHeadRevision(self, revision):
             print(self.gitError.read())
 
     def openStreams(self):
+        """ Opens the fast import pipes.  Note that the git* streams are wrapped
+            to expect Unicode text.  To send a raw byte Array, use the importProcess
+            underlying port
+        """
         self.importProcess = subprocess.Popen(["git", "fast-import"],
                                               stdin=subprocess.PIPE,
                                               stdout=subprocess.PIPE,
                                               stderr=subprocess.PIPE);
-        self.gitOutput = self.importProcess.stdout
-        self.gitStream = self.importProcess.stdin
-        self.gitError = self.importProcess.stderr
+        self.gitOutput = Py23File(self.importProcess.stdout, verbose = self.verbose)
+        self.gitStream = Py23File(self.importProcess.stdin, verbose = self.verbose)
+        self.gitError = Py23File(self.importProcess.stderr, verbose = self.verbose)
+        self.gitStreamBytes = self.importProcess.stdin
 
     def closeStreams(self):
         self.gitStream.close()
@@ -4035,15 +4156,17 @@ def run(self, args):
             self.cloneDestination = depotPaths[-1]
             depotPaths = depotPaths[:-1]
 
+        dispPaths = []
         for p in depotPaths:
             if not p.startswith("//"):
                 sys.stderr.write('Depot paths must start with "//": %s\n' % p)
                 return False
+            dispPaths += [path_as_string(p)]
 
         if not self.cloneDestination:
             self.cloneDestination = self.defaultDestination(args)
 
-        print("Importing from %s into %s" % (', '.join(depotPaths), self.cloneDestination))
+        print("Importing from %s into %s" % (', '.join(dispPaths), path_as_string(self.cloneDestination)))
 
         if not os.path.exists(self.cloneDestination):
             os.makedirs(self.cloneDestination)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v4 11/11] git-p4: Added --encoding parameter to p4 clone
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
                         ` (9 preceding siblings ...)
  2019-12-04 22:29       ` [PATCH v4 10/11] git-p4: Support python3 for basic P4 clone, sync, and submit Ben Keene via GitGitGadget
@ 2019-12-04 22:29       ` Ben Keene via GitGitGadget
  2019-12-05  9:54       ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Luke Diamand
  11 siblings, 0 replies; 46+ messages in thread
From: Ben Keene via GitGitGadget @ 2019-12-04 22:29 UTC (permalink / raw)
  To: git; +Cc: Ben Keene, Junio C Hamano, Ben Keene

From: Ben Keene <seraphire@gmail.com>

The test t9822 did not have any tests that had encoded a directory name in ISO8859-1.

Additionally, to make it easier for the user to clone new repositories with a non-UTF-8 encoded path in P4, add a new parameter to p4clone "--encoding" that sets the

Add new tests that use ISO8859-1 encoded text in both the directory and file names.  git-p4.pathEncoding.

Update the View class in the git-p4 code to properly cast text as_string() except for depot path and filenames.

Update the documentation to include the new command line parameter for p4clone

Signed-off-by: Ben Keene <seraphire@gmail.com>
(cherry picked from commit e26f6309d60c6c1615320d4a9071935e23efe6fb)
---
 Documentation/git-p4.txt        |   5 ++
 git-p4.py                       |  61 +++++++++++++------
 t/t9822-git-p4-path-encoding.sh | 101 ++++++++++++++++++++++++++++++++
 3 files changed, 149 insertions(+), 18 deletions(-)

diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
index 3494a1db3e..f54af3c917 100644
--- a/Documentation/git-p4.txt
+++ b/Documentation/git-p4.txt
@@ -305,6 +305,11 @@ options described above.
 --bare::
 	Perform a bare clone.  See linkgit:git-clone[1].
 
+--encoding <encoding>::
+    Optionally sets the git-p4.pathEncoding configuration value in 
+	the newly created Git repository before files are synchronized 
+	from P4. See git-p4.pathEncoding for more information.
+
 Submit options
 ~~~~~~~~~~~~~~
 These options can be used to modify 'git p4 submit' behavior.
diff --git a/git-p4.py b/git-p4.py
index 05db2ec657..1f2e43430a 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -1228,7 +1228,7 @@ def getClientSpec():
     """Look at the p4 client spec, create a View() object that contains
        all the mappings, and return it."""
 
-    specList = p4CmdList("client -o")
+    specList = p4CmdList("client -o", encode_data=False)
     if len(specList) != 1:
         die('Output from "client -o" is %d lines, expecting 1' %
             len(specList))
@@ -1237,7 +1237,7 @@ def getClientSpec():
     entry = specList[0]
 
     # the //client/ name
-    client_name = entry["Client"]
+    client_name = as_string(entry["Client"])
 
     # just the keys that start with "View"
     view_keys = [ k for k in list(entry.keys()) if k.startswith("View") ]
@@ -2637,19 +2637,25 @@ def run(self, args):
         return True
 
 class View(object):
-    """Represent a p4 view ("p4 help views"), and map files in a
-       repo according to the view."""
+    """ Represent a p4 view ("p4 help views"), and map files in a
+        repo according to the view.
+    """
 
     def __init__(self, client_name):
         self.mappings = []
-        self.client_prefix = "//%s/" % client_name
+        # the client prefix is saved in bytes as it is used for comparison
+        # against server data.
+        self.client_prefix = as_bytes("//%s/" % client_name)
         # cache results of "p4 where" to lookup client file locations
         self.client_spec_path_cache = {}
 
     def append(self, view_line):
-        """Parse a view line, splitting it into depot and client
-           sides.  Append to self.mappings, preserving order.  This
-           is only needed for tag creation."""
+        """ Parse a view line, splitting it into depot and client
+            sides.  Append to self.mappings, preserving order.  This
+            is only needed for tag creation.
+
+            view_line should be in bytes (depot path encoding)
+        """
 
         # Split the view line into exactly two words.  P4 enforces
         # structure on these lines that simplifies this quite a bit.
@@ -2662,28 +2668,28 @@ def append(self, view_line):
         # The line is already white-space stripped.
         # The two words are separated by a single space.
         #
-        if view_line[0] == '"':
+        if view_line[0] == b'"':
             # First word is double quoted.  Find its end.
-            close_quote_index = view_line.find('"', 1)
+            close_quote_index = view_line.find(b'"', 1)
             if close_quote_index <= 0:
-                die("No first-word closing quote found: %s" % view_line)
+                die("No first-word closing quote found: %s" % path_as_string(view_line))
             depot_side = view_line[1:close_quote_index]
             # skip closing quote and space
             rhs_index = close_quote_index + 1 + 1
         else:
-            space_index = view_line.find(" ")
+            space_index = view_line.find(b" ")
             if space_index <= 0:
-                die("No word-splitting space found: %s" % view_line)
+                die("No word-splitting space found: %s" % path_as_string(view_line))
             depot_side = view_line[0:space_index]
             rhs_index = space_index + 1
 
         # prefix + means overlay on previous mapping
-        if depot_side.startswith("+"):
+        if depot_side.startswith(b"+"):
             depot_side = depot_side[1:]
 
         # prefix - means exclude this path, leave out of mappings
         exclude = False
-        if depot_side.startswith("-"):
+        if depot_side.startswith(b"-"):
             exclude = True
             depot_side = depot_side[1:]
 
@@ -2694,7 +2700,7 @@ def convert_client_path(self, clientFile):
         # chop off //client/ part to make it relative
         if not clientFile.startswith(self.client_prefix):
             die("No prefix '%s' on clientFile '%s'" %
-                (self.client_prefix, clientFile))
+                (as_string(self.client_prefix)), path_as_string(clientFile))
         return clientFile[len(self.client_prefix):]
 
     def update_client_spec_path_cache(self, files):
@@ -2706,9 +2712,9 @@ def update_client_spec_path_cache(self, files):
         if len(fileArgs) == 0:
             return  # All files in cache
 
-        where_result = p4CmdList(["-x", "-", "where"], stdin=fileArgs)
+        where_result = p4CmdList(["-x", "-", "where"], stdin=fileArgs, encode_data=False)
         for res in where_result:
-            if "code" in res and res["code"] == "error":
+            if "code" in res and res["code"] == b"error":
                 # assume error is "... file(s) not in client view"
                 continue
             if "clientFile" not in res:
@@ -4125,10 +4131,14 @@ def __init__(self):
                                  help="where to leave result of the clone"),
             optparse.make_option("--bare", dest="cloneBare",
                                  action="store_true", default=False),
+            optparse.make_option("--encoding", dest="setPathEncoding",
+                                 action="store", default=None,
+                                 help="Sets the path encoding for this depot")
         ]
         self.cloneDestination = None
         self.needsGit = False
         self.cloneBare = False
+        self.setPathEncoding = None
 
     def defaultDestination(self, args):
         """ Returns the last path component as the default git 
@@ -4152,6 +4162,14 @@ def run(self, args):
 
         depotPaths = args
 
+        # If we have an encoding provided, ignore what may already exist
+        # in the registry. This will ensure we show the displayed values
+        # using the correct encoding.
+        if self.setPathEncoding:
+            gitConfigSet("git-p4.pathEncoding", self.setPathEncoding)
+
+        # If more than 1 path element is supplied, the last element
+        # is the clone destination.
         if not self.cloneDestination and len(depotPaths) > 1:
             self.cloneDestination = depotPaths[-1]
             depotPaths = depotPaths[:-1]
@@ -4179,6 +4197,13 @@ def run(self, args):
         if retcode:
             raise CalledProcessError(retcode, init_cmd)
 
+        # Set the encoding if it was provided command line
+        if self.setPathEncoding:
+            init_cmd= ["git", "config", "git-p4.pathEncoding", self.setPathEncoding]
+            retcode = subprocess.call(init_cmd)
+            if retcode:
+                raise CalledProcessError(retcode, init_cmd)
+
         if not P4Sync.run(self, depotPaths):
             return False
 
diff --git a/t/t9822-git-p4-path-encoding.sh b/t/t9822-git-p4-path-encoding.sh
index 572d395498..cf8a15b2e4 100755
--- a/t/t9822-git-p4-path-encoding.sh
+++ b/t/t9822-git-p4-path-encoding.sh
@@ -4,9 +4,20 @@ test_description='Clone repositories with non ASCII paths'
 
 . ./lib-git-p4.sh
 
+# lowercase filename
+# UTF8    - HEX:   a-\xc3\xa4_o-\xc3\xb6_u-\xc3\xbc
+#         - octal: a-\303\244_o-\303\266_u-\303\274
+# ISO8859 - HEX:   a-\xe4_o-\xf6_u-\xfc
 UTF8_ESCAPED="a-\303\244_o-\303\266_u-\303\274.txt"
 ISO8859_ESCAPED="a-\344_o-\366_u-\374.txt"
 
+# lowercase directory
+# UTF8    - HEX:   dir_a-\xc3\xa4_o-\xc3\xb6_u-\xc3\xbc
+# ISO8859 - HEX:   dir_a-\xe4_o-\xf6_u-\xfc
+DIR_UTF8_ESCAPED="dir_a-\303\244_o-\303\266_u-\303\274"
+DIR_ISO8859_ESCAPED="dir_a-\344_o-\366_u-\374"
+
+
 ISO8859="$(printf "$ISO8859_ESCAPED")" &&
 echo content123 >"$ISO8859" &&
 rm "$ISO8859" || {
@@ -58,6 +69,22 @@ test_expect_success 'Clone repo containing iso8859-1 encoded paths with git-p4.p
 	)
 '
 
+test_expect_success 'Clone repo containing iso8859-1 encoded paths with using --encoding parameter' '
+	test_when_finished cleanup_git &&
+	(
+		git p4 clone --encoding iso8859 --destination="$git" //depot &&
+		cd "$git" &&
+		UTF8="$(printf "$UTF8_ESCAPED")" &&
+		echo "$UTF8" >expect &&
+		git -c core.quotepath=false ls-files >actual &&
+		test_cmp expect actual &&
+
+		echo content123 >expect &&
+		cat "$UTF8" >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_expect_success 'Delete iso8859-1 encoded paths and clone' '
 	(
 		cd "$cli" &&
@@ -74,4 +101,78 @@ test_expect_success 'Delete iso8859-1 encoded paths and clone' '
 	)
 '
 
+# These tests will create a directory with ISO8859-1 characters in both the 
+# directory and the path.  Since it is possible to clone a path instead of using
+# the whole client-spec.  Check both versions:  client-spec and with a direct
+# path using --encoding
+test_expect_success 'Create a repo containing iso8859-1 encoded directory and filename' '
+	(
+		DIR_ISO8859="$(printf "$DIR_ISO8859_ESCAPED")" &&
+		ISO8859="$(printf "$ISO8859_ESCAPED")" &&
+		cd "$cli" &&
+		mkdir "$DIR_ISO8859" && 
+		cd "$DIR_ISO8859" &&
+		echo content123 >"$ISO8859" &&
+		p4 add "$ISO8859" &&
+		p4 submit -d "test commit (encoded directory)"
+	)
+'
+
+test_expect_success 'Clone repo containing iso8859-1 encoded depot path and files with git-p4.pathEncoding' '
+	test_when_finished cleanup_git &&
+	(
+		DIR_ISO8859="$(printf "$DIR_ISO8859_ESCAPED")" &&
+		DIR_UTF8="$(printf "$DIR_UTF8_ESCAPED")" &&
+		cd "$git" &&
+		git init . &&
+		git config git-p4.pathEncoding iso8859-1 &&
+		git p4 clone --use-client-spec --destination="$git" "//depot/$DIR_ISO8859" &&
+		cd "$DIR_UTF8" &&
+		UTF8="$(printf "$UTF8_ESCAPED")" &&
+		echo "$UTF8" >expect &&
+		git -c core.quotepath=false ls-files >actual &&
+		test_cmp expect actual &&
+
+		echo content123 >expect &&
+		cat "$UTF8" >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'Clone repo containing iso8859-1 encoded depot path and files with git-p4.pathEncoding, without --use-client-spec' '
+	test_when_finished cleanup_git &&
+	(
+		DIR_ISO8859="$(printf "$DIR_ISO8859_ESCAPED")" &&
+		cd "$git" &&
+		git init . &&
+		git config git-p4.pathEncoding iso8859-1 &&
+		git p4 clone --destination="$git" "//depot/$DIR_ISO8859" &&
+		UTF8="$(printf "$UTF8_ESCAPED")" &&
+		echo "$UTF8" >expect &&
+		git -c core.quotepath=false ls-files >actual &&
+		test_cmp expect actual &&
+
+		echo content123 >expect &&
+		cat "$UTF8" >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'Clone repo containing iso8859-1 encoded depot path and files with using --encoding parameter' '
+	test_when_finished cleanup_git &&
+	(
+		DIR_ISO8859="$(printf "$DIR_ISO8859_ESCAPED")" &&
+		git p4 clone --encoding iso8859 --destination="$git" "//depot/$DIR_ISO8859" &&
+		cd "$git" &&
+		UTF8="$(printf "$UTF8_ESCAPED")" &&
+		echo "$UTF8" >expect &&
+		git -c core.quotepath=false ls-files >actual &&
+		test_cmp expect actual &&
+
+		echo content123 >expect &&
+		cat "$UTF8" >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3
  2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
                         ` (10 preceding siblings ...)
  2019-12-04 22:29       ` [PATCH v4 11/11] git-p4: Added --encoding parameter to p4 clone Ben Keene via GitGitGadget
@ 2019-12-05  9:54       ` Luke Diamand
  2019-12-05 16:16         ` Ben Keene
  11 siblings, 1 reply; 46+ messages in thread
From: Luke Diamand @ 2019-12-05  9:54 UTC (permalink / raw)
  To: Ben Keene via GitGitGadget; +Cc: Git Users, Ben Keene, Junio C Hamano

On Wed, 4 Dec 2019 at 22:29, Ben Keene via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> Issue: The current git-p4.py script does not work with python3.
>
> I have attempted to use the P4 integration built into GIT and I was unable
> to get the program to run because I have Python 3.8 installed on my
> computer. I was able to get the program to run when I downgraded my python
> to version 2.7. However, python 2 is reaching its end of life.
>
> Submission: I am submitting a patch for the git-p4.py script that partially
> supports python 3.8. This code was able to pass the basic tests (t9800) when
> run against Python3. This provides basic functionality.
>
> In an attempt to pass the t9822 P4 path-encoding test, a new parameter for
> git P4 Clone was introduced.
>
> --encoding Format-identifier
>
> This will create the GIT repository following the current functionality;
> however, before importing the files from P4, it will set the
> git-p4.pathEncoding option so any files or paths that are encoded with
> non-ASCII/non-UTF-8 formats will import correctly.
>
> Technical details: The script was updated by futurize (
> https://python-future.org/futurize.html) to support Py2/Py3 syntax. The few
> references to classes in future were reworked so that future would not be
> required. The existing code test for Unicode support was extended to
> normalize the classes “unicode” and “bytes” to across platforms:
>
>  * ‘unicode’ is an alias for ‘str’ in Py3 and is the unicode class in Py2.
>  * ‘bytes’ is bytes in Py3 and an alias for ‘str’ in Py2.
>
> New coercion methods were written for both Python2 and Python3:
>
>  * as_string(text) – In Python3, this encodes a bytes object as a UTF-8
>    encoded Unicode string.
>  * as_bytes(text) – In Python3, this decodes a Unicode string to an array of
>    bytes.
>
> In Python2, these functions do not change the data since a ‘str’ object
> function in both roles as strings and byte arrays. This reduces the
> potential impact on backward compatibility with Python 2.
>
>  * to_unicode(text) – ensures that the supplied data is encoded as a UTF-8
>    string. This function will encode data in both Python2 and Python3. *
>       path_as_string(path) – This function is an extension function that
>       honors the option “git-p4.pathEncoding” to convert a set of bytes or
>       characters to UTF-8. If the str/bytes cannot decode as ASCII, it will
>       use the encodeWithUTF8() method to convert the custom encoded bytes to
>       Unicode in UTF-8.
>
>
>
> Generally speaking, information in the script is converted to Unicode as
> early as possible and converted back to a byte array just before passing to
> external programs or files. The exception to this rule is P4 Repository file
> paths.
>
> Paths are not converted but left as “bytes” so the original file path
> encoding can be preserved. This formatting is required for commands that
> interact with the P4 file path. When the file path is used by GIT, it is
> converted with encodeWithUTF8().
>

Almost all the tests pass now - nice!

(There's one test that fails for me, t9830-git-p4-symlink-dir.sh).

Nitpicking:

- There are some bits of trailing whitespace around - can you strip
those out? You can use "git diff --check".
- Also I think the convention for git commits is that they be limited
to 72 (?) characters.
- In 10dc commit message, s/behvior/behavior
- Maybe submit 4fc4 as a separate patch series? It doesn't seem
directly related to your python3 changes.
- s/howerver/however/

The comment at line 3261 (showing the fast-import syntax) has wonky
indentation, and needs a space after the '#'.

This code looked like we're duplicating stuff:

+    if isinstance(path, unicode):
+        path = path.replace("%", "%25") \
+                   .replace("*", "%2A") \
+                   .replace("#", "%23") \
+                   .replace("@", "%40")
+    else:
+        path = path.replace(b"%", b"%25") \
+                   .replace(b"*", b"%2A") \
+                   .replace(b"#", b"%23") \
+                   .replace(b"@", b"%40")

I wonder if we can have a helper to do this?

In patchRCSKeywords() you've added code to cleanup outFile. But I
wonder if we could just use a 'finally' block, or a contextexpr ("with
blah as outFile:")

I don't know if it's worth doing now that you've got it going, but at
one point I tried simplifying code like this:

   path_as_string(file['depotFile'])
and
   marshalled[b'data']

by using a dictionary with overloaded operators which would do the
bytes/string conversion automatically. However, your approach isn't
actually _that_ invasive, so maybe this is not necessary.

Looks good though, thanks!
Luke






> Signed-off-by: Ben Keene seraphire@gmail.com [seraphire@gmail.com]
>
> Ben Keene (11):
>   git-p4: select p4 binary by operating-system
>   git-p4: change the expansion test from basestring to list
>   git-p4: add new helper functions for python3 conversion
>   git-p4: python3 syntax changes
>   git-p4: Add new functions in preparation of usage
>   git-p4: Fix assumed path separators to be more Windows friendly
>   git-p4: Add a helper class for stream writing
>   git-p4: p4CmdList  - support Unicode encoding
>   git-p4: Add usability enhancements
>   git-p4: Support python3 for basic P4 clone, sync, and submit
>   git-p4: Added --encoding parameter to p4 clone
>
>  Documentation/git-p4.txt        |   5 +
>  git-p4.py                       | 690 ++++++++++++++++++++++++--------
>  t/t9822-git-p4-path-encoding.sh | 101 +++++
>  3 files changed, 629 insertions(+), 167 deletions(-)
>
>
> base-commit: 228f53135a4a41a37b6be8e4d6e2b6153db4a8ed
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-463%2Fseraphire%2Fseraphire%2Fp4-python3-unicode-v4
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-463/seraphire/seraphire/p4-python3-unicode-v4
> Pull-Request: https://github.com/gitgitgadget/git/pull/463
>
> Range-diff vs v3:
>
>   -:  ---------- >  1:  4012426993 git-p4: select p4 binary by operating-system
>   -:  ---------- >  2:  0ef2f56b04 git-p4: change the expansion test from basestring to list
>   -:  ---------- >  3:  f0e658b984 git-p4: add new helper functions for python3 conversion
>   -:  ---------- >  4:  3c41db3e91 git-p4: python3 syntax changes
>   -:  ---------- >  5:  1bf7b073b0 git-p4: Add new functions in preparation of usage
>   -:  ---------- >  6:  8f5752c127 git-p4: Fix assumed path separators to be more Windows friendly
>   -:  ---------- >  7:  10dc059444 git-p4: Add a helper class for stream writing
>   -:  ---------- >  8:  e1a424a955 git-p4: p4CmdList  - support Unicode encoding
>   -:  ---------- >  9:  4fc49313f0 git-p4: Add usability enhancements
>   1:  02b3843e9f ! 10:  04a0aedbaa Python3 support for t9800 tests. Basic P4/Python3 support
>      @@ -1,159 +1,60 @@
>       Author: Ben Keene <seraphire@gmail.com>
>
>      -    Python3 support for t9800 tests. Basic P4/Python3 support
>      +    git-p4: Support python3 for basic P4 clone, sync, and submit
>      +
>      +    Issue: Python 3 is still not properly supported for any use with the git-p4 python code.
>      +    Warning - this is a very large atomic commit.  The commit text is also very large.
>      +
>      +    Change the code such that, with the exception of P4 depot paths and depot files, all text read by git-p4 is cast as a string as soon as possible and converted back to bytes as late as possible, following Python2 to Python3 conversion best practices.
>      +
>      +    Important: Do not cast the bytes that contain the p4 depot path or p4 depot file name.  These should be left as bytes until used.
>      +
>      +    These two values should not be converted because the encoding of these values is unknown.  git-p4 supports a configuration value git-p4.pathEncoding that is used by the encodeWithUTF8()  to determine what a UTF8 version of the path and filename should be.  However, since depot path and depot filename need to be sent to P4 in their original encoding, they will be left as byte streams until they are actually used:
>      +
>      +    * When sent to P4, the bytes are literally passed to the p4 command
>      +    * When displayed in text for the user, they should be passed through the path_as_string() function
>      +    * When used by GIT they should be passed through the encodeWithUTF8() function
>      +
>      +    Change all the rest of system calls to cast output (stdin) as_bytes() and input (stdout) as_string().  This retains existing Python 2 support, and adds python 3 support for these functions:
>      +    * read_pipe_full
>      +    * read_pipe_lines
>      +    * p4_has_move_command (used internally)
>      +    * gitConfig
>      +    * branch_exists
>      +    * GitLFS.generatePointer
>      +    * applyCommit - template must be read and written to the temporary file as_bytes() since it is created in memory as a string.
>      +    * streamOneP4File(file, contents) - wrap calls to the depotFile in path_as_string() for display. The file contents must be retained as bytes, so update the RCS changes to be forced to bytes.
>      +    * streamP4Files
>      +    * importHeadRevision(revision) - encode the depotPaths for display separate from the text for processing.
>      +
>      +    Py23File usage -
>      +    Change the P4Sync.OpenStreams() function to cast the gitOutput, gitStream, and gitError streams as Py23File() wrapper classes.  This facilitates taking strings in both python 2 and python 3 and casting them to bytes in the wrapper class instead of having to modify each method. Since the fast-import command also expects a raw byte stream for file content, add a new stream handle - gitStreamBytes which is an unwrapped verison of gitStream.
>      +
>      +    Literal text -
>      +    Depending on context, most literal text does not need casting to unicode or bytes as the text is Python dependent - In python 2, the string is implied as 'str' and python 3 the string is implied as 'unicode'. Under these conditions, they match the rest of the operating text, following best practices.  However, when a literal string is used in functions that are dealing with the raw input from and raw ouput to files streams, literal bytes may be required. Additionally, functions that are dealing with P4 depot paths or P4 depot file names are also dealing with bytes and will require the same casting as bytes.  The following functions cast text as byte strings:
>      +    * wildcard_decode(path) - the path parameter is a P4 depot and is bytes. Cast all the literals to bytes.
>      +    * wildcard_encode(path) - the path parameter is a P4 depot and is bytes. Cast all the literals to bytes.
>      +    * streamP4FilesCb(marshalled) - the marshalled data is in bytes. Cast the literals as bytes. When using this data to manipulate self.stream_file, encode all the marshalled data except for the 'depotFile' name.
>      +    * streamP4Files
>      +
>      +    Special behavior:
>      +    * p4_describe - encoding is disabled for the depotFile(x) and path elements since these are depot path and depo filenames.
>      +    * p4PathStartsWith(path, prefix) - Since P4 depot paths can contain non-UTF-8 encoded strings, change this method to compare paths while supporting the optional encoding.
>      +       - First, perform a byte-to-byte check to see if the path and prefix are both identical text.  There is no need to perform encoding conversions if the text is identical.
>      +       - If the byte check fails, pass both the path and prefix through encodeWithUTF8() to ensure both paths are using the same encoding. Then perform the test as originally written.
>      +    * patchRCSKeywords(file, pattern) - the parameters of file and pattern are both strings. However this function changes the contents of the file itentified by name "file". Treat the content of this file as binary to ensure that python does not accidently change the original encoding. The regular expression is cast as_bytes() and run against the file as_bytes(). The P4 keywords are ASCII strings and cannot span lines so iterating over each line of the file is acceptable.
>      +    * writeToGitStream(gitMode, relPath, contents) - Since 'contents' is already bytes data, instead of using the self.gitStream, use the new self.gitStreamBytes - the unwrapped gitStream that does not cast as_bytes() the binary data.
>      +    * commit(details, files, branch, parent = "", allow_empty=False) - Changed the encoding for the commit message to the preferred format for fast-import. The number of bytes is sent in the data block instead of using the EOT marker.
>      +    * Change the code for handling the user cache to use binary files. Cast text as_bytes() when writing to the cache and as_string() when reading from the cache.  This makes the reading and writing of the cache determinstic in it's encoding. Unlike file paths, P4 encodes the user names in UTF-8 encoding so no additional string encoding is required.
>
>           Signed-off-by: Ben Keene <seraphire@gmail.com>
>      +    (cherry picked from commit 65ff0c74ebe62a200b4385ecfd4aa618ce091f48)
>
>        diff --git a/git-p4.py b/git-p4.py
>        --- a/git-p4.py
>        +++ b/git-p4.py
>       @@
>      - import zlib
>      - import ctypes
>      - import errno
>      -+import os.path
>      -+import codecs
>      -+import io
>      -
>      - # support basestring in python3
>      - try:
>      -     unicode = unicode
>      - except NameError:
>      -     # 'unicode' is undefined, must be Python 3
>      --    str = str
>      -+    #
>      -+    # For Python3 which is natively unicode, we will use
>      -+    # unicode for internal information but all P4 Data
>      -+    # will remain in bytes
>      -+    isunicode = True
>      -     unicode = str
>      -     bytes = bytes
>      --    basestring = (str,bytes)
>      -+
>      -+    def as_string(text):
>      -+        """Return a byte array as a unicode string"""
>      -+        if text == None:
>      -+            return None
>      -+        if isinstance(text, bytes):
>      -+            return unicode(text, "utf-8")
>      -+        else:
>      -+            return text
>      -+
>      -+    def as_bytes(text):
>      -+        """Return a Unicode string as a byte array"""
>      -+        if text == None:
>      -+            return None
>      -+        if isinstance(text, bytes):
>      -+            return text
>      -+        else:
>      -+            return bytes(text, "utf-8")
>      -+
>      -+    def to_unicode(text):
>      -+        """Return a byte array as a unicode string"""
>      -+        return as_string(text)
>      -+
>      -+    def path_as_string(path):
>      -+        """ Converts a path to the UTF8 encoded string """
>      -+        if isinstance(path, unicode):
>      -+            return path
>      -+        return encodeWithUTF8(path).decode('utf-8')
>      -+
>      - else:
>      -     # 'unicode' exists, must be Python 2
>      --    str = str
>      -+    #
>      -+    # We will treat the data as:
>      -+    #   str   -> str
>      -+    #   bytes -> str
>      -+    # So for Python2 these functions are no-ops
>      -+    # and will leave the data in the ambiguious
>      -+    # string/bytes state
>      -+    isunicode = False
>      -     unicode = unicode
>      -     bytes = str
>      --    basestring = basestring
>      -+
>      -+    def as_string(text):
>      -+        """ Return text unaltered (for Python3 support) """
>      -+        return text
>      -+
>      -+    def as_bytes(text):
>      -+        """ Return text unaltered (for Python3 support) """
>      -+        return text
>      -+
>      -+    def to_unicode(text):
>      -+        """Return a string as a unicode string"""
>      -+        return text.decode('utf-8')
>      -+
>      -+    def path_as_string(path):
>      -+        """ Converts a path to the UTF8 encoded bytes """
>      -+        return encodeWithUTF8(path)
>      -+
>      -+
>      -+
>      -+# Check for raw_input support
>      -+try:
>      -+    raw_input
>      -+except NameError:
>      -+    raw_input = input
>      -
>      - try:
>      -     from subprocess import CalledProcessError
>      -@@
>      -     location. It means that hooking into the environment, or other configuration
>      -     can be done more easily.
>      -     """
>      --    real_cmd = ["p4"]
>      -+    # Look for the P4 binary
>      -+    if (platform.system() == "Windows"):
>      -+        real_cmd = ["p4.exe"]
>      -+    else:
>      -+        real_cmd = ["p4"]
>      -
>      -     user = gitConfig("git-p4.user")
>      -     if len(user) > 0:
>      -@@
>      -         # Provide a way to not pass this option by setting git-p4.retries to 0
>      -         real_cmd += ["-r", str(retries)]
>      -
>      --    if isinstance(cmd,basestring):
>      -+    if not isinstance(cmd, list):
>      -         real_cmd = ' '.join(real_cmd) + ' ' + cmd
>      -     else:
>      -         real_cmd += cmd
>      -@@
>      -         sys.exit(1)
>      -
>      - def write_pipe(c, stdin):
>      -+    """Executes the command 'c', passing 'stdin' on the standard input"""
>      -     if verbose:
>      -         sys.stderr.write('Writing pipe: %s\n' % str(c))
>      -
>      --    expand = isinstance(c,basestring)
>      -+    expand = not isinstance(c, list)
>      -     p = subprocess.Popen(c, stdin=subprocess.PIPE, shell=expand)
>      -     pipe = p.stdin
>      -     val = pipe.write(stdin)
>      -@@
>      -     if p.wait():
>      -         die('Command failed: %s' % str(c))
>      -
>      --    return val
>      -
>      - def p4_write_pipe(c, stdin):
>      -+    """ Runs a P4 command 'c', passing 'stdin' data to P4"""
>      -     real_cmd = p4_build_cmd(c)
>      --    return write_pipe(real_cmd, stdin)
>      -+    write_pipe(real_cmd, stdin)
>      -
>      - def read_pipe_full(c):
>      -     """ Read output from  command. Returns a tuple
>      -@@
>      -     if verbose:
>      -         sys.stderr.write('Reading pipe: %s\n' % str(c))
>      -
>      --    expand = isinstance(c,basestring)
>      -+    expand = not isinstance(c, list)
>      +     expand = not isinstance(c, list)
>            p = subprocess.Popen(c, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=expand)
>            (out, err) = p.communicate()
>       +    out = as_string(out)
>      @@ -179,10 +80,7 @@
>            if verbose:
>                sys.stderr.write('Reading pipe: %s\n' % str(c))
>
>      --    expand = isinstance(c, basestring)
>      -+    expand = not isinstance(c, list)
>      -     p = subprocess.Popen(c, stdout=subprocess.PIPE, shell=expand)
>      -     pipe = p.stdout
>      +@@
>            val = pipe.readlines()
>            if pipe.close() or p.wait():
>                die('Command failed: %s' % str(c))
>      @@ -203,28 +101,6 @@
>            # return code will be 1 in either case
>            if err.find("Invalid option") >= 0:
>                return False
>      -@@
>      -     return True
>      -
>      - def system(cmd, ignore_error=False):
>      --    expand = isinstance(cmd,basestring)
>      -+    expand = not isinstance(cmd, list)
>      -     if verbose:
>      -         sys.stderr.write("executing %s\n" % str(cmd))
>      -     retcode = subprocess.call(cmd, shell=expand)
>      -@@
>      -     return retcode
>      -
>      - def p4_system(cmd):
>      --    """Specifically invoke p4 as the system command. """
>      -+    """ Specifically invoke p4 as the system command.
>      -+    """
>      -     real_cmd = p4_build_cmd(cmd)
>      --    expand = isinstance(real_cmd, basestring)
>      -+    expand = not isinstance(real_cmd, list)
>      -     retcode = subprocess.call(real_cmd, shell=expand)
>      -     if retcode:
>      -         raise CalledProcessError(retcode, real_cmd)
>       @@
>            return int(results[0]['change'])
>
>      @@ -234,7 +110,7 @@
>       -       results."""
>       +    """ Returns information about the requested P4 change list.
>       +
>      -+        Data returns is not string encoded (returned as bytes)
>      ++        Data returned is not string encoded (returned as bytes)
>       +    """
>       +    # Make sure it returns a valid result by checking for
>       +    #   the presence of field "time".  Return a dict of the
>      @@ -261,218 +137,29 @@
>            if "time" not in d:
>                die("p4 describe -s %d returned no \"time\": %s" % (change, str(d)))
>
>      -+    # Convert depotFile(X) to be UTF-8 encoded, as this is what GIT
>      -+    # requires. This will also allow us to encode the rest of the text
>      -+    # at the same time to simplify textual processing later.
>      ++    # Do not convert 'depotFile(X)' or 'path' to be UTF-8 encoded, however
>      ++    # cast as_string() the rest of the text.
>       +    keys=d.keys()
>       +    for key in keys:
>       +        if key.startswith('depotFile'):
>      -+            d[key]=d[key] #DepotPath(d[key])
>      ++            d[key]=d[key]
>       +        elif key == 'path':
>      -+            d[key]=d[key] #DepotPath(d[key])
>      ++            d[key]=d[key]
>       +        else:
>       +            d[key] = as_string(d[key])
>       +
>            return d
>
>      --#
>      --# Canonicalize the p4 type and return a tuple of the
>      --# base type, plus any modifiers.  See "p4 help filetypes"
>      --# for a list and explanation.
>      --#
>      - def split_p4_type(p4type):
>      --
>      -+    """ Canonicalize the p4 type and return a tuple of the
>      -+        base type, plus any modifiers.  See "p4 help filetypes"
>      -+        for a list and explanation.
>      -+    """
>      -     p4_filetypes_historical = {
>      -         "ctempobj": "binary+Sw",
>      -         "ctext": "text+C",
>      -@@
>      -         mods = s[1]
>      -     return (base, mods)
>      -
>      --#
>      --# return the raw p4 type of a file (text, text+ko, etc)
>      --#
>      - def p4_type(f):
>      -+    """ return the raw p4 type of a file (text, text+ko, etc)
>      -+    """
>      -     results = p4CmdList(["fstat", "-T", "headType", wildcard_encode(f)])
>      -     return results[0]['headType']
>      -
>      --#
>      --# Given a type base and modifier, return a regexp matching
>      --# the keywords that can be expanded in the file
>      --#
>      - def p4_keywords_regexp_for_type(base, type_mods):
>      -+    """ Given a type base and modifier, return a regexp matching
>      -+        the keywords that can be expanded in the file
>      -+    """
>      -     if base in ("text", "unicode", "binary"):
>      -         kwords = None
>      -         if "ko" in type_mods:
>      -@@
>      -     else:
>      -         return None
>      -
>      --#
>      --# Given a file, return a regexp matching the possible
>      --# RCS keywords that will be expanded, or None for files
>      --# with kw expansion turned off.
>      --#
>      - def p4_keywords_regexp_for_file(file):
>      -+    """ Given a file, return a regexp matching the possible
>      -+        RCS keywords that will be expanded, or None for files
>      -+        with kw expansion turned off.
>      -+    """
>      -     if not os.path.exists(file):
>      -         return None
>      -     else:
>      -@@
>      - # Return the set of all p4 labels
>      - def getP4Labels(depotPaths):
>      -     labels = set()
>      --    if isinstance(depotPaths,basestring):
>      -+    if not isinstance(depotPaths, list):
>      -         depotPaths = [depotPaths]
>      -
>      -     for l in p4CmdList(["labels"] + ["%s..." % p for p in depotPaths]):
>      -@@
>      -
>      -     return labels
>      -
>      --# Return the set of all git tags
>      - def getGitTags():
>      -+    """Return the set of all git tags"""
>      -     gitTags = set()
>      -     for line in read_pipe_lines(["git", "tag"]):
>      -         tag = line.strip()
>      -@@
>      -
>      -     If the pattern is not matched, None is returned."""
>      -
>      --    match = diffTreePattern().next().match(entry)
>      -+    match = next(diffTreePattern()).match(entry)
>      -     if match:
>      -         return {
>      -             'src_mode': match.group(1),
>      -@@
>      -     # otherwise False.
>      -     return mode[-3:] == "755"
>      -
>      -+def encodeWithUTF8(path, verbose = False):
>      -+    """ Ensure that the path is encoded as a UTF-8 string
>      -+
>      -+        Returns bytes(P3)/str(P2)
>      -+    """
>      -+
>      -+    if isunicode:
>      -+        try:
>      -+            if isinstance(path, unicode):
>      -+                # It is already unicode, cast it as a bytes
>      -+                # that is encoded as utf-8.
>      -+                return path.encode('utf-8', 'strict')
>      -+            path.decode('ascii', 'strict')
>      -+        except:
>      -+            encoding = 'utf8'
>      -+            if gitConfig('git-p4.pathEncoding'):
>      -+                encoding = gitConfig('git-p4.pathEncoding')
>      -+            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
>      -+            if verbose:
>      -+                print('\nNOTE:Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, to_unicode(path)))
>      -+    else:
>      -+        try:
>      -+            path.decode('ascii')
>      -+        except:
>      -+            encoding = 'utf8'
>      -+            if gitConfig('git-p4.pathEncoding'):
>      -+                encoding = gitConfig('git-p4.pathEncoding')
>      -+            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
>      -+            if verbose:
>      -+                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
>      -+    return path
>      -+
>      - class P4Exception(Exception):
>      -     """ Base class for exceptions from the p4 client """
>      -     def __init__(self, exit_code):
>      -@@
>      -     return isModeExec(src_mode) != isModeExec(dst_mode)
>      -
>      - def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
>      --        errors_as_exceptions=False):
>      -+        errors_as_exceptions=False, encode_data=True):
>      -+    """ Executes a P4 command:  'cmd' optionally passing 'stdin' to the command's
>      -+        standard input via a temporary file with 'stdin_mode' mode.
>      -+
>      -+        Output from the command is optionally passed to the callback function 'cb'.
>      -+        If 'cb' is None, the response from the command is parsed into a list
>      -+        of resulting dictionaries. (For each block read from the process pipe.)
>      -+
>      -+        If 'skip_info' is true, information in a block read that has a code type of
>      -+        'info' will be skipped.
>      -
>      --    if isinstance(cmd,basestring):
>      -+        If 'errors_as_exceptions' is set to true (the default is false) the error
>      -+        code returned from the execution will generate an exception.
>      -+
>      -+        If 'encode_data' is set to true (the default) the data that is returned
>      -+        by this function will be passed through the "as_string" function.
>      -+    """
>      -+
>      -+    if not isinstance(cmd, list):
>      -         cmd = "-G " + cmd
>      -         expand = True
>      -     else:
>      -@@
>      -     stdin_file = None
>      -     if stdin is not None:
>      -         stdin_file = tempfile.TemporaryFile(prefix='p4-stdin', mode=stdin_mode)
>      --        if isinstance(stdin,basestring):
>      -+        if not isinstance(stdin, list):
>      -             stdin_file.write(stdin)
>      -         else:
>      -             for i in stdin:
>      --                stdin_file.write(i + '\n')
>      -+                stdin_file.write(as_bytes(i) + b'\n')
>      -         stdin_file.flush()
>      -         stdin_file.seek(0)
>      -
>      -@@
>      -         while True:
>      -             entry = marshal.load(p4.stdout)
>      -             if skip_info:
>      --                if 'code' in entry and entry['code'] == 'info':
>      -+                if b'code' in entry and entry[b'code'] == b'info':
>      -                     continue
>      -             if cb is not None:
>      -                 cb(entry)
>      -             else:
>      --                result.append(entry)
>      -+                out = {}
>      -+                for key, value in entry.items():
>      -+                    out[as_string(key)] = (as_string(value) if encode_data else value)
>      -+                result.append(out)
>      -     except EOFError:
>      -         pass
>      -     exitCode = p4.wait()
>      + #
>       @@
>            return result
>
>        def p4Cmd(cmd):
>      -+    """ Executes a P4 command an returns the results in a dictionary"""
>      ++    """ Executes a P4 command and returns the results in a dictionary
>      ++    """
>            list = p4CmdList(cmd)
>            result = {}
>            for entry in list:
>      -@@
>      -     return values
>      -
>      - def gitBranchExists(branch):
>      -+    """Checks to see if a given branch exists in the git repo"""
>      -     proc = subprocess.Popen(["git", "rev-parse", branch],
>      -                             stderr=subprocess.PIPE, stdout=subprocess.PIPE);
>      -     return proc.wait() == 0;
>       @@
>        _gitConfig = {}
>
>      @@ -490,29 +177,6 @@
>            return _gitConfig[key]
>
>        def gitConfigBool(key):
>      --    """Return a bool, using git config --bool.  It is True only if the
>      --       variable is set to true, and False if set to false or not present
>      --       in the config."""
>      --
>      -+    """ Return a bool, using git config --bool.  It is True only if the
>      -+        variable is set to true, and False if set to false or not present
>      -+        in the config.
>      -+    """
>      -     if key not in _gitConfig:
>      -         _gitConfig[key] = gitConfig(key, '--bool') == "true"
>      -     return _gitConfig[key]
>      -@@
>      -             _gitConfig[key] = []
>      -     return _gitConfig[key]
>      -
>      -+def gitConfigSet(key, value):
>      -+    """ Set the git configuration key 'key' to 'value' for this session
>      -+    """
>      -+    _gitConfig[key] = value
>      -+
>      - def p4BranchesInGit(branchesAreInRemotes=True):
>      -     """Find all the branches whose names start with "p4/", looking
>      -        in remotes or heads as specified by the argument.  Return
>       @@
>            cmd = [ "git", "rev-parse", "--symbolic", "--verify", branch ]
>            p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
>      @@ -521,34 +185,6 @@
>            if p.returncode:
>                return False
>            # expect exactly one line of output: the branch name
>      -@@
>      -     branches = p4BranchesInGit()
>      -     # map from depot-path to branch name
>      -     branchByDepotPath = {}
>      --    for branch in branches.keys():
>      -+    for branch in list(branches.keys()):
>      -         tip = branches[branch]
>      -         log = extractLogMessageFromGitCommit(tip)
>      -         settings = extractSettingsGitLog(log)
>      -@@
>      -             system("git update-ref %s %s" % (remoteHead, originHead))
>      -
>      - def originP4BranchesExist():
>      --        return gitBranchExists("origin") or gitBranchExists("origin/p4") or gitBranchExists("origin/p4/master")
>      -+    """Checks if origin/p4/master exists"""
>      -+    return gitBranchExists("origin") or gitBranchExists("origin/p4") or gitBranchExists("origin/p4/master")
>      -
>      -
>      - def p4ParseNumericChangeRange(parts):
>      -@@
>      -     changes = sorted(changes)
>      -     return changes
>      -
>      --def p4PathStartsWith(path, prefix):
>      -+def p4PathStartsWith(path, prefix, verbose = False):
>      -     # This method tries to remedy a potential mixed-case issue:
>      -     #
>      -     # If UserA adds  //depot/DirA/file1
>       @@
>            #
>            # we may or may not have a problem. If you have core.ignorecase=true,
>      @@ -574,15 +210,6 @@
>
>        def getClientSpec():
>            """Look at the p4 client spec, create a View() object that contains
>      -@@
>      -     client_name = entry["Client"]
>      -
>      -     # just the keys that start with "View"
>      --    view_keys = [ k for k in entry.keys() if k.startswith("View") ]
>      -+    view_keys = [ k for k in list(entry.keys()) if k.startswith("View") ]
>      -
>      -     # hold this new View
>      -     view = View(client_name)
>       @@
>            # Cannot have * in a filename in windows; untested as to
>            # what p4 would do in such a case.
>      @@ -626,45 +253,16 @@
>                    os.remove(contentFile)
>                    die('git-lfs pointer command failed. Did you install the extension?')
>       @@
>      -         else:
>      -             return LargeFileSystem.processContent(self, git_mode, relPath, contents)
>      -
>      --class Command:
>      -+class Command(object):
>      -     delete_actions = ( "delete", "move/delete", "purge" )
>      -     add_actions = ( "add", "branch", "move/add" )
>      -
>      -@@
>      -             setattr(self, attr, value)
>      -         return getattr(self, attr)
>      -
>      --class P4UserMap:
>      -+class P4UserMap(object):
>      -     def __init__(self):
>      -         self.userMapFromPerforceServer = False
>      -         self.myP4UserId = None
>      -@@
>      -             return True
>      -
>      -     def getUserCacheFilename(self):
>      -+        """ Returns the filename of the username cache """
>      -         home = os.environ.get("HOME", os.environ.get("USERPROFILE"))
>      --        return home + "/.gitp4-usercache.txt"
>      -+        return os.path.join(home, ".gitp4-usercache.txt")
>      +         return os.path.join(home, ".gitp4-usercache.txt")
>
>            def getUserMapFromPerforceServer(self):
>       +        """ Creates the usercache from the data in P4.
>       +        """
>      -+
>                if self.userMapFromPerforceServer:
>                    return
>                self.users = {}
>       @@
>      -                 self.emails[email] = user
>      -
>      -         s = ''
>      --        for (key, val) in self.users.items():
>      -+        for (key, val) in list(self.users.items()):
>      +         for (key, val) in list(self.users.items()):
>                    s += "%s\t%s\n" % (key.expandtabs(1), val.expandtabs(1))
>
>       -        open(self.getUserCacheFilename(), "wb").write(s)
>      @@ -674,7 +272,8 @@
>                self.userMapFromPerforceServer = True
>
>            def loadUserMapFromCache(self):
>      -+        """ Reads the P4 username to git email map """
>      ++        """ Reads the P4 username to git email map
>      ++        """
>                self.users = {}
>                self.userMapFromPerforceServer = False
>                try:
>      @@ -721,80 +320,6 @@
>                    # cleanup our temporary file
>                    os.unlink(outFileName)
>                    print("Failed to strip RCS keywords in %s" % file)
>      -@@
>      -                 break
>      -         if not change_entry:
>      -             die('Failed to decode output of p4 change -o')
>      --        for key, value in change_entry.iteritems():
>      -+        for key, value in list(change_entry.items()):
>      -             if key.startswith('File'):
>      -                 if 'depot-paths' in settings:
>      -                     if not [p for p in settings['depot-paths']
>      --                            if p4PathStartsWith(value, p)]:
>      -+                            if p4PathStartsWith(value, p, self.verbose)]:
>      -                         continue
>      -                 else:
>      --                    if not p4PathStartsWith(value, self.depotPath):
>      -+                    if not p4PathStartsWith(value, self.depotPath, self.verbose):
>      -                         continue
>      -                 files_list.append(value)
>      -                 continue
>      -@@
>      -             return True
>      -
>      -         while True:
>      --            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ")
>      -+            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ").lower() \
>      -+                .strip()[0]
>      -             if response == 'y':
>      -                 return True
>      -             if response == 'n':
>      -@@
>      -     def applyCommit(self, id):
>      -         """Apply one commit, return True if it succeeded."""
>      -
>      --        print("Applying", read_pipe(["git", "show", "-s",
>      --                                     "--format=format:%h %s", id]))
>      -+        print(("Applying", read_pipe(["git", "show", "-s",
>      -+                                     "--format=format:%h %s", id])))
>      -
>      -         (p4User, gitEmail) = self.p4UserForCommit(id)
>      -
>      -@@
>      -                     # disable the read-only bit on windows.
>      -                     if self.isWindows and file not in editedFiles:
>      -                         os.chmod(file, stat.S_IWRITE)
>      --                    self.patchRCSKeywords(file, kwfiles[file])
>      --                    fixed_rcs_keywords = True
>      -+
>      -+                    try:
>      -+                        self.patchRCSKeywords(file, kwfiles[file])
>      -+                        fixed_rcs_keywords = True
>      -+                    except:
>      -+                        # We are throwing an exception, undo all open edits
>      -+                        for f in editedFiles:
>      -+                            p4_revert(f)
>      -+                        raise
>      -+            else:
>      -+                # They do not have attemptRCSCleanup set, this might be the fail point
>      -+                # Check to see if the file has RCS keywords and suggest setting the property.
>      -+                for file in editedFiles | filesToDelete:
>      -+                    if p4_keywords_regexp_for_file(file) != None:
>      -+                        print("At least one file in this commit has RCS Keywords that may be causing problems. ")
>      -+                        print("Consider:\ngit config git-p4.attemptRCSCleanup true")
>      -+                        break
>      -
>      -             if fixed_rcs_keywords:
>      -                 print("Retrying the patch with RCS keywords cleaned up")
>      -@@
>      -             p4_delete(f)
>      -
>      -         # Set/clear executable bits
>      --        for f in filesToChangeExecBit.keys():
>      -+        for f in list(filesToChangeExecBit.keys()):
>      -             mode = filesToChangeExecBit[f]
>      -             setP4ExecBit(f, mode)
>      -
>       @@
>                tmpFile = os.fdopen(handle, "w+b")
>                if self.isWindows:
>      @@ -815,179 +340,6 @@
>
>                        if update_shelve:
>                            p4_write_pipe(['shelve', '-r', '-i'], submitTemplate)
>      -@@
>      -                 if verbose:
>      -                     print("created p4 label for tag %s" % name)
>      -
>      -+    def run_hook(self, hook_name, args = []):
>      -+        """ Runs a hook if it is found.
>      -+
>      -+            Returns NONE if the hook does not exist
>      -+            Returns TRUE if the exit code is 0, FALSE for a non-zero exit code.
>      -+        """
>      -+        hook_file = self.find_hook(hook_name)
>      -+        if hook_file == None:
>      -+            if self.verbose:
>      -+                print("Skipping hook: %s" % hook_name)
>      -+            return None
>      -+
>      -+        if self.verbose:
>      -+            print("hooks_path = %s " % hooks_path)
>      -+            print("hook_file = %s " % hook_file)
>      -+
>      -+        # Run the hook
>      -+        # TODO - allow non-list format
>      -+        cmd = [hook_file] + args
>      -+        return subprocess.call(cmd) == 0
>      -+
>      -+    def find_hook(self, hook_name):
>      -+        """ Locates the hook file for the given operating system.
>      -+        """
>      -+        hooks_path = gitConfig("core.hooksPath")
>      -+        if len(hooks_path) <= 0:
>      -+            hooks_path = os.path.join(os.environ.get("GIT_DIR", ".git"), "hooks")
>      -+
>      -+        # Look in the obvious place
>      -+        hook_file = os.path.join(hooks_path, hook_name)
>      -+        if os.path.isfile(hook_file) and os.access(hook_file, os.X_OK):
>      -+            return hook_file
>      -+
>      -+        # if we are windows, we will also allow them to have the hooks have extensions
>      -+        if (platform.system() == "Windows"):
>      -+            for ext in ['.exe', '.bat', 'ps1']:
>      -+                if os.path.isfile(hook_file + ext) and os.access(hook_file + ext, os.X_OK):
>      -+                    return hook_file + ext
>      -+
>      -+        # We didn't find the file
>      -+        return None
>      -+
>      -+
>      -+
>      -     def run(self, args):
>      -         if len(args) == 0:
>      -             self.master = currentGitBranch()
>      -@@
>      -             self.clientSpecDirs = getClientSpec()
>      -
>      -         # Check for the existence of P4 branches
>      --        branchesDetected = (len(p4BranchesInGit().keys()) > 1)
>      -+        branchesDetected = (len(list(p4BranchesInGit().keys())) > 1)
>      -
>      -         if self.useClientSpec and not branchesDetected:
>      -             # all files are relative to the client spec
>      -@@
>      -             sys.exit("number of commits (%d) must match number of shelved changelist (%d)" %
>      -                      (len(commits), num_shelves))
>      -
>      --        hooks_path = gitConfig("core.hooksPath")
>      --        if len(hooks_path) <= 0:
>      --            hooks_path = os.path.join(os.environ.get("GIT_DIR", ".git"), "hooks")
>      --
>      --        hook_file = os.path.join(hooks_path, "p4-pre-submit")
>      --        if os.path.isfile(hook_file) and os.access(hook_file, os.X_OK) and subprocess.call([hook_file]) != 0:
>      -+        rtn = self.run_hook("p4-pre-submit")
>      -+        if rtn == False:
>      -             sys.exit(1)
>      -
>      -         #
>      -@@
>      -         last = len(commits) - 1
>      -         for i, commit in enumerate(commits):
>      -             if self.dry_run:
>      --                print(" ", read_pipe(["git", "show", "-s",
>      --                                      "--format=format:%h %s", commit]))
>      -+                print((" ", read_pipe(["git", "show", "-s",
>      -+                                      "--format=format:%h %s", commit])))
>      -                 ok = True
>      -             else:
>      -                 ok = self.applyCommit(commit)
>      -@@
>      -                         if self.conflict_behavior == "ask":
>      -                             print("What do you want to do?")
>      -                             response = raw_input("[s]kip this commit but apply"
>      --                                                 " the rest, or [q]uit? ")
>      -+                                                 " the rest, or [q]uit? ").lower().strip()[0]
>      -                             if not response:
>      -                                 continue
>      -                         elif self.conflict_behavior == "skip":
>      -@@
>      -                         star = "*"
>      -                     else:
>      -                         star = " "
>      --                    print(star, read_pipe(["git", "show", "-s",
>      --                                           "--format=format:%h %s",  c]))
>      -+                    print((star, read_pipe(["git", "show", "-s",
>      -+                                           "--format=format:%h %s",  c])))
>      -                 print("You will have to do 'git p4 sync' and rebase.")
>      -
>      -         if gitConfigBool("git-p4.exportLabels"):
>      -@@
>      -     # ("-//depot/A/..." becomes "/depot/A/..." after option parsing)
>      -     parser.values.cloneExclude += ["/" + re.sub(r"\.\.\.$", "", value)]
>      -
>      -+
>      - class P4Sync(Command, P4UserMap):
>      -
>      -     def __init__(self):
>      -@@
>      -         self.knownBranches = {}
>      -         self.initialParents = {}
>      -
>      --        self.tz = "%+03d%02d" % (- time.timezone / 3600, ((- time.timezone % 3600) / 60))
>      -+        self.tz = "%+03d%02d" % (- time.timezone // 3600, ((- time.timezone % 3600) // 60))
>      -         self.labels = {}
>      -
>      -     # Force a checkpoint in fast-import and wait for it to finish
>      -@@
>      -     def isPathWanted(self, path):
>      -         for p in self.cloneExclude:
>      -             if p.endswith("/"):
>      --                if p4PathStartsWith(path, p):
>      -+                if p4PathStartsWith(path, p, self.verbose):
>      -                     return False
>      -             # "-//depot/file1" without a trailing "/" should only exclude "file1", but not "file111" or "file1_dir/file2"
>      -             elif path.lower() == p.lower():
>      -                 return False
>      -         for p in self.depotPaths:
>      --            if p4PathStartsWith(path, p):
>      -+            if p4PathStartsWith(path, p, self.verbose):
>      -                 return True
>      -         return False
>      -
>      -     def extractFilesFromCommit(self, commit, shelved=False, shelved_cl = 0):
>      -+        """ Generates the list of files to be added in this git commit.
>      -+
>      -+            commit     = Unicode[] - data read from the P4 commit
>      -+            shelved    = Bool      - Is the P4 commit flagged as being shelved.
>      -+            shelved_cl = Unicode   - Numeric string with the changelist number.
>      -+        """
>      -         files = []
>      -         fnum = 0
>      -         while "depotFile%s" % fnum in commit:
>      -@@
>      -             path = self.clientSpecDirs.map_in_client(path)
>      -             if self.detectBranches:
>      -                 for b in self.knownBranches:
>      --                    if p4PathStartsWith(path, b + "/"):
>      -+                    if p4PathStartsWith(path, b + "/", self.verbose):
>      -                         path = path[len(b)+1:]
>      -
>      -         elif self.keepRepoPath:
>      -@@
>      -             # //depot/; just look at first prefix as they all should
>      -             # be in the same depot.
>      -             depot = re.sub("^(//[^/]+/).*", r'\1', prefixes[0])
>      --            if p4PathStartsWith(path, depot):
>      -+            if p4PathStartsWith(path, depot, self.verbose):
>      -                 path = path[len(depot):]
>      -
>      -         else:
>      -             for p in prefixes:
>      --                if p4PathStartsWith(path, p):
>      -+                if p4PathStartsWith(path, p, self.verbose):
>      -                     path = path[len(p):]
>      -                     break
>      -
>       @@
>                return path
>
>      @@ -1002,19 +354,6 @@
>
>                if self.clientSpecDirs:
>                    files = self.extractFilesFromCommit(commit)
>      -@@
>      -             else:
>      -                 relPath = self.stripRepoPath(path, self.depotPaths)
>      -
>      --            for branch in self.knownBranches.keys():
>      -+            for branch in list(self.knownBranches.keys()):
>      -                 # add a trailing slash so that a commit into qt/4.2foo
>      -                 # doesn't end up in qt/4.2, e.g.
>      --                if p4PathStartsWith(relPath, branch + "/"):
>      -+                if p4PathStartsWith(relPath, branch + "/", self.verbose):
>      -                     if branch not in branches:
>      -                         branches[branch] = []
>      -                     branches[branch].append(file)
>       @@
>                return branches
>
>      @@ -1031,18 +370,6 @@
>       +            self.gitStreamBytes.write(d)
>                self.gitStream.write('\n')
>
>      --    def encodeWithUTF8(self, path):
>      --        try:
>      --            path.decode('ascii')
>      --        except:
>      --            encoding = 'utf8'
>      --            if gitConfig('git-p4.pathEncoding'):
>      --                encoding = gitConfig('git-p4.pathEncoding')
>      --            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
>      --            if self.verbose:
>      --                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
>      --        return path
>      --
>       -    # output one file from the P4 stream
>       -    # - helper for streamP4Files
>       -
>      @@ -1053,18 +380,13 @@
>       +            contents should be a bytes (bytes)
>       +        """
>                relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
>      --        relPath = self.encodeWithUTF8(relPath)
>      -+        relPath = encodeWithUTF8(relPath, self.verbose)
>      +         relPath = encodeWithUTF8(relPath, self.verbose)
>                if verbose:
>      -             if 'fileSize' in self.stream_file:
>      +@@
>                        size = int(self.stream_file['fileSize'])
>                    else:
>                        size = 0 # deleted files don't get a fileSize apparently
>      --            sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], relPath, size/1024/1024))
>      -+            #if isunicode:
>      -+            #    sys.stdout.write('\r%s --> %s (%i MB)\n' % (path_as_string(file['depotFile']), to_unicode(relPath), size//1024//1024))
>      -+            #else:
>      -+            #    sys.stdout.write('\r%s --> %s (%i MB)\n' % (path_as_string(file['depotFile']), relPath, size//1024//1024))
>      +-            sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], relPath, size//1024//1024))
>       +            sys.stdout.write('\r%s --> %s (%i MB)\n' % (path_as_string(file['depotFile']), as_string(relPath), size//1024//1024))
>                    sys.stdout.flush()
>
>      @@ -1100,15 +422,6 @@
>
>                if self.largeFileSystem:
>       @@
>      -
>      -     def streamOneP4Deletion(self, file):
>      -         relPath = self.stripRepoPath(file['path'], self.branchPrefixes)
>      --        relPath = self.encodeWithUTF8(relPath)
>      -+        relPath = encodeWithUTF8(relPath, self.verbose)
>      -         if verbose:
>      -             sys.stdout.write("delete %s\n" % relPath)
>      -             sys.stdout.flush()
>      -@@
>                if self.largeFileSystem and self.largeFileSystem.isLargeFile(relPath):
>                    self.largeFileSystem.removeLargeFile(relPath)
>
>      @@ -1133,13 +446,6 @@
>
>                if not err and 'fileSize' in self.stream_file:
>                    required_bytes = int((4 * int(self.stream_file["fileSize"])) - calcDiskFree())
>      -             if required_bytes > 0:
>      -                 err = 'Not enough space left on %s! Free at least %i MB.' % (
>      --                    os.getcwd(), required_bytes/1024/1024
>      -+                    os.getcwd(), required_bytes//1024//1024
>      -                 )
>      -
>      -         if err:
>       @@
>                    # ignore errors, but make sure it exits first
>                    self.importProcess.wait()
>      @@ -1155,12 +461,10 @@
>                    self.streamOneP4File(self.stream_file, self.stream_contents)
>                    self.stream_file = {}
>       @@
>      -
>                # pick up the new file information... for the
>                # 'data' field we need to append to our array
>      --        for k in marshalled.keys():
>      +         for k in list(marshalled.keys()):
>       -            if k == 'data':
>      -+        for k in list(marshalled.keys()):
>       +            if k == b'data':
>                        if 'streamContentSize' not in self.stream_file:
>                            self.stream_file['streamContentSize'] = 0
>      @@ -1178,12 +482,10 @@
>                if (verbose and
>                    'streamContentSize' in self.stream_file and
>       @@
>      -             'depotFile' in self.stream_file):
>                    size = int(self.stream_file["fileSize"])
>                    if size > 0:
>      --                progress = 100*self.stream_file['streamContentSize']/size
>      --                sys.stdout.write('\r%s %d%% (%i MB)' % (self.stream_file['depotFile'], progress, int(size/1024/1024)))
>      -+                progress = 100.0*self.stream_file['streamContentSize']/size
>      +                 progress = 100.0*self.stream_file['streamContentSize']/size
>      +-                sys.stdout.write('\r%s %4.1f%% (%i MB)' % (self.stream_file['depotFile'], progress, int(size//1024//1024)))
>       +                sys.stdout.write('\r%s %4.1f%% (%i MB)' % (path_as_string(self.stream_file['depotFile']), progress, int(size//1024//1024)))
>                        sys.stdout.flush()
>
>      @@ -1227,24 +529,6 @@
>
>                if verbose:
>       @@
>      -
>      -         gitStream.write("tagger %s\n" % tagger)
>      -
>      --        print("labelDetails=",labelDetails)
>      -+        print(("labelDetails=",labelDetails))
>      -         if 'Description' in labelDetails:
>      -             description = labelDetails['Description']
>      -         else:
>      -@@
>      -         if not self.branchPrefixes:
>      -             return True
>      -         hasPrefix = [p for p in self.branchPrefixes
>      --                        if p4PathStartsWith(path, p)]
>      -+                        if p4PathStartsWith(path, p, self.verbose)]
>      -         if not hasPrefix and self.verbose:
>      -             print('Ignoring file outside of prefix: {0}'.format(path))
>      -         return hasPrefix
>      -@@
>                        .format(details['change']))
>                    return
>
>      @@ -1307,58 +591,6 @@
>
>                if len(parent) > 0:
>                    if self.verbose:
>      -@@
>      -             self.labels[newestChange] = [output, revisions]
>      -
>      -         if self.verbose:
>      --            print("Label changes: %s" % self.labels.keys())
>      -+            print("Label changes: %s" % list(self.labels.keys()))
>      -
>      -     # Import p4 labels as git tags. A direct mapping does not
>      -     # exist, so assume that if all the files are at the same revision
>      -@@
>      -                 source = paths[0]
>      -                 destination = paths[1]
>      -                 ## HACK
>      --                if p4PathStartsWith(source, self.depotPaths[0]) and p4PathStartsWith(destination, self.depotPaths[0]):
>      -+                if p4PathStartsWith(source, self.depotPaths[0], self.verbose) and p4PathStartsWith(destination, self.depotPaths[0], self.verbose):
>      -                     source = source[len(self.depotPaths[0]):-4]
>      -                     destination = destination[len(self.depotPaths[0]):-4]
>      -
>      -@@
>      -
>      -     def getBranchMappingFromGitBranches(self):
>      -         branches = p4BranchesInGit(self.importIntoRemotes)
>      --        for branch in branches.keys():
>      -+        for branch in list(branches.keys()):
>      -             if branch == "master":
>      -                 branch = "main"
>      -             else:
>      -@@
>      -             self.updateOptionDict(description)
>      -
>      -             if not self.silent:
>      --                sys.stdout.write("\rImporting revision %s (%s%%)" % (change, cnt * 100 / len(changes)))
>      -+                sys.stdout.write("\rImporting revision %s (%4.1f%%)" % (change, cnt * 100 / len(changes)))
>      -                 sys.stdout.flush()
>      -             cnt = cnt + 1
>      -
>      -             try:
>      -                 if self.detectBranches:
>      -                     branches = self.splitFilesIntoBranches(description)
>      --                    for branch in branches.keys():
>      -+                    for branch in list(branches.keys()):
>      -                         ## HACK  --hwn
>      -                         branchPrefix = self.depotPaths[0] + branch + "/"
>      -                         self.branchPrefixes = [ branchPrefix ]
>      -@@
>      -                 sys.exit(1)
>      -
>      -     def sync_origin_only(self):
>      -+        """ Ensures that the origin has been synchronized if one is set """
>      -         if self.syncWithOrigin:
>      -             self.hasOrigin = originP4BranchesExist()
>      -             if self.hasOrigin:
>       @@
>                        system("git fetch origin")
>
>      @@ -1439,61 +671,6 @@
>            def closeStreams(self):
>                self.gitStream.close()
>       @@
>      -                 if short in branches:
>      -                     self.p4BranchesInGit = [ short ]
>      -             else:
>      --                self.p4BranchesInGit = branches.keys()
>      -+                self.p4BranchesInGit = list(branches.keys())
>      -
>      -             if len(self.p4BranchesInGit) > 1:
>      -                 if not self.silent:
>      -                     print("Importing from/into multiple branches")
>      -                 self.detectBranches = True
>      --                for branch in branches.keys():
>      -+                for branch in list(branches.keys()):
>      -                     self.initialParents[self.refPrefix + branch] = \
>      -                         branches[branch]
>      -
>      -@@
>      -                                  help="where to leave result of the clone"),
>      -             optparse.make_option("--bare", dest="cloneBare",
>      -                                  action="store_true", default=False),
>      -+            optparse.make_option("--encoding", dest="setPathEncoding",
>      -+                                 action="store", default=None,
>      -+                                 help="Sets the path encoding for this depot")
>      -         ]
>      -         self.cloneDestination = None
>      -         self.needsGit = False
>      -         self.cloneBare = False
>      -+        self.setPathEncoding = None
>      -
>      -     def defaultDestination(self, args):
>      -+        """Returns the last path component as the default git
>      -+        repository directory name"""
>      -         ## TODO: use common prefix of args?
>      -         depotPath = args[0]
>      -         depotDir = re.sub("(@[^@]*)$", "", depotPath)
>      -         depotDir = re.sub("(#[^#]*)$", "", depotDir)
>      -         depotDir = re.sub(r"\.\.\.$", "", depotDir)
>      -         depotDir = re.sub(r"/$", "", depotDir)
>      --        return os.path.split(depotDir)[1]
>      -+        return depotDir.split('/')[-1]
>      -
>      -     def run(self, args):
>      -         if len(args) < 1:
>      -@@
>      -
>      -         depotPaths = args
>      -
>      -+        # If we have an encoding provided, ignore what may already exist
>      -+        # in the registry. This will ensure we show the displayed values
>      -+        # using the correct encoding.
>      -+        if self.setPathEncoding:
>      -+            gitConfigSet("git-p4.pathEncoding", self.setPathEncoding)
>      -+
>      -+        # If more than 1 path element is supplied, the last element
>      -+        # is the clone destination.
>      -         if not self.cloneDestination and len(depotPaths) > 1:
>                    self.cloneDestination = depotPaths[-1]
>                    depotPaths = depotPaths[:-1]
>
>      @@ -1512,177 +689,3 @@
>
>                if not os.path.exists(self.cloneDestination):
>                    os.makedirs(self.cloneDestination)
>      -@@
>      -         if retcode:
>      -             raise CalledProcessError(retcode, init_cmd)
>      -
>      -+        # Set the encoding if it was provided command line
>      -+        if self.setPathEncoding:
>      -+            init_cmd= ["git", "config", "git-p4.pathEncoding", self.setPathEncoding]
>      -+            retcode = subprocess.call(init_cmd)
>      -+            if retcode:
>      -+                raise CalledProcessError(retcode, init_cmd)
>      -+
>      -         if not P4Sync.run(self, depotPaths):
>      -             return False
>      -
>      -@@
>      -             to find the P4 commit we are based on, and the depot-paths.
>      -         """
>      -
>      --        for parent in (range(65535)):
>      -+        for parent in (list(range(65535))):
>      -             log = extractLogMessageFromGitCommit("{0}^{1}".format(starting_point, parent))
>      -             settings = extractSettingsGitLog(log)
>      -             if 'change' in settings:
>      -@@
>      -             print("%s <= %s (%s)" % (branch, ",".join(settings["depot-paths"]), settings["change"]))
>      -         return True
>      -
>      -+class Py23File():
>      -+    """ Python2/3 Unicode File Wrapper
>      -+    """
>      -+
>      -+    stream_handle = None
>      -+    verbose       = False
>      -+    debug_handle  = None
>      -+
>      -+    def __init__(self, stream_handle, verbose = False,
>      -+                 debug_handle = None):
>      -+        """ Create a Python3 compliant Unicode to Byte String
>      -+            Windows compatible wrapper
>      -+
>      -+            stream_handle = the underlying file-like handle
>      -+            verbose       = Boolean if content should be echoed
>      -+            debug_handle  = A file-like handle data is duplicately written to
>      -+        """
>      -+        self.stream_handle = stream_handle
>      -+        self.verbose       = verbose
>      -+        self.debug_handle  = debug_handle
>      -+
>      -+    def write(self, utf8string):
>      -+        """ Writes the utf8 encoded string to the underlying
>      -+            file stream
>      -+        """
>      -+        self.stream_handle.write(as_bytes(utf8string))
>      -+        if self.verbose:
>      -+            sys.stderr.write("Stream Output: %s" % utf8string)
>      -+            sys.stderr.flush()
>      -+        if self.debug_handle:
>      -+            self.debug_handle.write(as_bytes(utf8string))
>      -+
>      -+    def read(self, size = None):
>      -+        """ Reads int charcters from the underlying stream
>      -+            and converts it to utf8.
>      -+
>      -+            Be aware, the size value is for reading the underlying
>      -+            bytes so the value may be incorrect. Usage of the size
>      -+            value is discouraged.
>      -+        """
>      -+        if size == None:
>      -+            return as_string(self.stream_handle.read())
>      -+        else:
>      -+            return as_string(self.stream_handle.read(size))
>      -+
>      -+    def readline(self):
>      -+        """ Reads a line from the underlying byte stream
>      -+            and converts it to utf8
>      -+        """
>      -+        return as_string(self.stream_handle.readline())
>      -+
>      -+    def readlines(self, sizeHint = None):
>      -+        """ Returns a list containing lines from the file converted to unicode.
>      -+
>      -+            sizehint - Optional. If the optional sizehint argument is
>      -+            present, instead of reading up to EOF, whole lines totalling
>      -+            approximately sizehint bytes are read.
>      -+        """
>      -+        lines = self.stream_handle.readlines(sizeHint)
>      -+        for i in range(0, len(lines)):
>      -+            lines[i] = as_string(lines[i])
>      -+        return lines
>      -+
>      -+    def close(self):
>      -+        """ Closes the underlying byte stream """
>      -+        self.stream_handle.close()
>      -+
>      -+    def flush(self):
>      -+        """ Flushes the underlying byte stream """
>      -+        self.stream_handle.flush()
>      -+
>      -+class DepotPath():
>      -+    """ Describes a DepotPath or File
>      -+    """
>      -+
>      -+    raw_path = None
>      -+    utf8_path = None
>      -+    bytes_path = None
>      -+
>      -+    def __init__(self, path):
>      -+        """ Creates a new DepotPath with the path encoded
>      -+            with by the P4 repository
>      -+        """
>      -+        raw_path = path
>      -+
>      -+    def raw():
>      -+        """ Returns the path as it was originally found
>      -+            in the P4 repository
>      -+        """
>      -+        return raw_path
>      -+
>      -+    def startswith(self, prefix, start = None, end = None):
>      -+        """ Return True if string starts with the prefix, otherwise
>      -+            return False. prefix can also be a tuple of prefixes to
>      -+            look for. With optional start, test string beginning at
>      -+            that position. With optional end, stop comparing
>      -+            string at that position.
>      -+        """
>      -+        return raw_path.startswith(prefix, start, end)
>      -+
>      -+
>      - class HelpFormatter(optparse.IndentedHelpFormatter):
>      -     def __init__(self):
>      -         optparse.IndentedHelpFormatter.__init__(self)
>      -@@
>      -
>      - def main():
>      -     if len(sys.argv[1:]) == 0:
>      --        printUsage(commands.keys())
>      -+        printUsage(list(commands.keys()))
>      -         sys.exit(2)
>      -
>      -     cmdName = sys.argv[1]
>      -@@
>      -     except KeyError:
>      -         print("unknown command %s" % cmdName)
>      -         print("")
>      --        printUsage(commands.keys())
>      -+        printUsage(list(commands.keys()))
>      -         sys.exit(2)
>      -
>      -     options = cmd.options
>      -@@
>      -                                    description = cmd.description,
>      -                                    formatter = HelpFormatter())
>      -
>      --    (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
>      -+    try:
>      -+        (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
>      -+    except:
>      -+        parser.print_help()
>      -+        raise
>      -+
>      -     global verbose
>      -     verbose = cmd.verbose
>      -     if cmd.needsGit:
>      -@@
>      -                         chdir(cdup);
>      -
>      -         if not isValidGitDir(cmd.gitdir):
>      --            if isValidGitDir(cmd.gitdir + "/.git"):
>      --                cmd.gitdir += "/.git"
>      -+            if isValidGitDir(os.path.join(cmd.gitdir, ".git")):
>      -+                cmd.gitdir = os.path.join(cmd.gitdir, ".git")
>      -             else:
>      -                 die("fatal: cannot locate git repository at %s" % cmd.gitdir)
>      -
>   -:  ---------- > 11:  883ef45ca5 git-p4: Added --encoding parameter to p4 clone
>
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 01/11] git-p4: select p4 binary by operating-system
  2019-12-04 22:29       ` [PATCH v4 01/11] git-p4: select p4 binary by operating-system Ben Keene via GitGitGadget
@ 2019-12-05 10:19         ` Denton Liu
  2019-12-05 16:32           ` Ben Keene
  0 siblings, 1 reply; 46+ messages in thread
From: Denton Liu @ 2019-12-05 10:19 UTC (permalink / raw)
  To: Ben Keene via GitGitGadget; +Cc: git, Ben Keene, Junio C Hamano

Hi Ben,

First of all, as a note to you and possibly others, I don't have much
(read: any) experience with git-p4. I do have experience with Python and
how git.git generally does things so I'll be reviewing from that
perspective.

On Wed, Dec 04, 2019 at 10:29:27PM +0000, Ben Keene via GitGitGadget wrote:
> From: Ben Keene <seraphire@gmail.com>
> 
> Depending on the version of GIT and Python installed, the perforce program (p4) may not resolve on Windows without the program extension.

Nit: "GIT" should be written as "Git" when referring to the whole
project and "git" when referring to the command. Never in all-caps.

Also, please wrap your paragraphs at 72 characters. I'll say it once
here but it applies to your whole series.

> 
> Check the operating system (platform.system) and if it is reporting that it is Windows, use the full filename of "p4.exe" instead of "p4"
> 
> The original code unconditionally used "p4" as the binary filename.

As a rule of thumb, we want to state the problem first before we state
what we did (and why). I'd move this paragraph up.

> 
> This change is Python2 and Python3 compatible.
> 
> Thanks to: Junio C Hamano <gitster@pobox.com> and  Denton Liu <liu.denton@gmail.com> for patiently explaining proper format for my submissions.

I appreciate the credit but I don't think it's necessary. At _most_, you
could include the

	Helped-by: Junio C Hamano <gitster@pobox.com>
	Helped-by: Denton Liu <liu.denton@gmail.com>

tags before your signoff but I don't think we've done anything to
warrant it.

> 
> Signed-off-by: Ben Keene <seraphire@gmail.com>
> (cherry picked from commit 9a3a5c4e6d29dbef670072a9605c7a82b3729434)

You should remove this line in all of your commits. The referenced
commit isn't public so the information isn't very useful. Also, try to
not include anything after your signoff so if this hypothetically were
useful information, you'd include it before your signoff.

If it's information that's ephemerally useful for current reviewers but
not for future readers of your commit in the log message, you can
include it after the three hyphens...

> ---
like this and it won't be included as part of the log message.

>  git-p4.py | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/git-p4.py b/git-p4.py
> index 60c73b6a37..b2ffbc057b 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -75,7 +75,11 @@ def p4_build_cmd(cmd):
>      location. It means that hooking into the environment, or other configuration
>      can be done more easily.
>      """
> -    real_cmd = ["p4"]
> +    # Look for the P4 binary

I don't think this comment is necessary as the code itself is pretty
self-explanatory.

> +    if (platform.system() == "Windows"):
> +        real_cmd = ["p4.exe"]    

You have trailing whitespace here. Try to run `git diff --check` before
committing to ensure that you have no whitespace errors.

Thanks,

Denton

> +    else:
> +        real_cmd = ["p4"]
>  
>      user = gitConfig("git-p4.user")
>      if len(user) > 0:
> -- 
> gitgitgadget
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 02/11] git-p4: change the expansion test from basestring to list
  2019-12-04 22:29       ` [PATCH v4 02/11] git-p4: change the expansion test from basestring to list Ben Keene via GitGitGadget
@ 2019-12-05 10:27         ` Denton Liu
  2019-12-05 17:05           ` Ben Keene
  0 siblings, 1 reply; 46+ messages in thread
From: Denton Liu @ 2019-12-05 10:27 UTC (permalink / raw)
  To: Ben Keene via GitGitGadget; +Cc: git, Ben Keene, Junio C Hamano

Hi Ben,

On Wed, Dec 04, 2019 at 10:29:28PM +0000, Ben Keene via GitGitGadget wrote:
> From: Ben Keene <seraphire@gmail.com>
> 
> Python 3+ handles strings differently than Python 2.7.

Do you mean Python 3?

> Since Python 2 is reaching it's end of life, a series of changes are being submitted to enable python 3.7+ support. The current code fails basic tests under python 3.7.

Python 3.5 doesn't reach EOL until Q4 2020[1]. We should be testing
these changes under 3.5 to ensure that we're not accidentally
introducing stuff that's not backwards compatible.

> 
> Change references to basestring in the isinstance tests to use list instead. This prepares the code to remove all references to basestring.
> 
> The original code used basestring in a test to determine if a list or literal string was passed into 9 different functions.  This is used to determine if the shell should be evoked when calling subprocess methods.

Once again, I'd swap the above two paragraphs. Problem then solution.

Also, did you mean "invoked" instead of "evoked"?

> 
> Signed-off-by: Ben Keene <seraphire@gmail.com>
> (cherry picked from commit 5b1b1c145479b5d5fd242122737a3134890409e6)
> ---
>  git-p4.py | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)

The patch itself looks good, though.

[1]: https://devguide.python.org/#branchstatus

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 03/11] git-p4: add new helper functions for python3 conversion
  2019-12-04 22:29       ` [PATCH v4 03/11] git-p4: add new helper functions for python3 conversion Ben Keene via GitGitGadget
@ 2019-12-05 10:40         ` Denton Liu
  2019-12-05 18:42           ` Ben Keene
  0 siblings, 1 reply; 46+ messages in thread
From: Denton Liu @ 2019-12-05 10:40 UTC (permalink / raw)
  To: Ben Keene via GitGitGadget; +Cc: git, Ben Keene, Junio C Hamano

On Wed, Dec 04, 2019 at 10:29:29PM +0000, Ben Keene via GitGitGadget wrote:
> From: Ben Keene <seraphire@gmail.com>
> 
> Python 3+ handles strings differently than Python 2.7.  Since Python 2 is reaching it's end of life, a series of changes are being submitted to enable python 3.7+ support. The current code fails basic tests under python 3.7.
> 
> Change the existing unicode test add new support functions for python2-python3 support.
> 
> Define the following variables:
> - isunicode - a boolean variable that states if the version of python natively supports unicode (true) or not (false). This is true for Python3 and false for Python2.
> - unicode - a type alias for the datatype that holds a unicode string.  It is assigned to a str under python 3 and the unicode type for Python2.
> - bytes - a type alias for an array of bytes.  It is assigned the native bytes type for Python3 and str for Python2.
> 
> Add the following new functions:
> 
> - as_string(text) - A new function that will convert a byte array to a unicode (UTF-8) string under python 3.  Under python 2, this returns the string unchanged.
> - as_bytes(text) - A new function that will convert a unicode string to a byte array under python 3.  Under python 2, this returns the string unchanged.
> - to_unicode(text) - Converts a text string as Unicode(UTF-8) on both Python2 and Python3.
> 
> Add a new function alias raw_input:
> If raw_input does not exist (it was renamed to input in python 3) alias input as raw_input.
> 
> The AS_STRING and AS_BYTES functions allow for modifying the code with a minimal amount of impact on Python2 support.  When a string is expected, the as_string() will be used to convert "cast" the incoming "bytes" to a string type. Conversely as_bytes() will be used to convert a "string" to a "byte array" type. Since Python2 overloads the datatype 'str' to serve both purposes, the Python2 versions of these function do not change the data, since the str functions as both a byte array and a string.

How come AS_STRING and AS_BYTES are all-caps here?

> 
> basestring is removed since its only references are found in tests that were changed in the previous change list.
> 
> Signed-off-by: Ben Keene <seraphire@gmail.com>
> (cherry picked from commit 7921aeb3136b07643c1a503c2d9d8b5ada620356)
> ---
>  git-p4.py | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 66 insertions(+), 4 deletions(-)
> 
> diff --git a/git-p4.py b/git-p4.py
> index 0f27996393..93dfd0920a 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -32,16 +32,78 @@
>      unicode = unicode
>  except NameError:
>      # 'unicode' is undefined, must be Python 3
> -    str = str
> +    #
> +    # For Python3 which is natively unicode, we will use 
> +    # unicode for internal information but all P4 Data
> +    # will remain in bytes
> +    isunicode = True
>      unicode = str
>      bytes = bytes
> -    basestring = (str,bytes)
> +
> +    def as_string(text):
> +        """Return a byte array as a unicode string"""
> +        if text == None:

Nit: use `text is None` instead. Actually, any time you're checking an
object to see if it's None, you should use `is` instead of `==` since
there's usually only one None reference.

> +            return None
> +        if isinstance(text, bytes):
> +            return unicode(text, "utf-8")
> +        else:
> +            return text
> +
> +    def as_bytes(text):
> +        """Return a Unicode string as a byte array"""
> +        if text == None:
> +            return None
> +        if isinstance(text, bytes):
> +            return text
> +        else:
> +            return bytes(text, "utf-8")
> +
> +    def to_unicode(text):
> +        """Return a byte array as a unicode string"""
> +        return as_string(text)    
> +
> +    def path_as_string(path):
> +        """ Converts a path to the UTF8 encoded string """
> +        if isinstance(path, unicode):
> +            return path
> +        return encodeWithUTF8(path).decode('utf-8')
> +    

Trailing whitespace.

>  else:
>      # 'unicode' exists, must be Python 2
> -    str = str
> +    #
> +    # We will treat the data as:
> +    #   str   -> str
> +    #   bytes -> str
> +    # So for Python2 these functions are no-ops
> +    # and will leave the data in the ambiguious
> +    # string/bytes state
> +    isunicode = False
>      unicode = unicode
>      bytes = str
> -    basestring = basestring
> +
> +    def as_string(text):
> +        """ Return text unaltered (for Python3 support) """

I didn't mention this in earlier emails but it's been bothering me a
lot: is there any reason why you write it as "Python3" vs. "Python 3"
sometimes (and Python2 as well)? If there's no difference, then we
should probably stick to one variant in both the commit messages and in
the code. (I prefer the spaced variant.)

> +        return text
> +
> +    def as_bytes(text):
> +        """ Return text unaltered (for Python3 support) """
> +        return text
> +
> +    def to_unicode(text):
> +        """Return a string as a unicode string"""
> +        return text.decode('utf-8')
> +    

Trailing whitespace.

> +    def path_as_string(path):
> +        """ Converts a path to the UTF8 encoded bytes """
> +        return encodeWithUTF8(path)
> +
> +
> + 

Trailing whitespace.

> +# Check for raw_input support
> +try:
> +    raw_input
> +except NameError:
> +    raw_input = input
>  
>  try:
>      from subprocess import CalledProcessError
> -- 
> gitgitgadget
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 05/11] git-p4: Add new functions in preparation of usage
  2019-12-04 22:29       ` [PATCH v4 05/11] git-p4: Add new functions in preparation of usage Ben Keene via GitGitGadget
@ 2019-12-05 10:50         ` Denton Liu
  2019-12-05 19:23           ` Ben Keene
  0 siblings, 1 reply; 46+ messages in thread
From: Denton Liu @ 2019-12-05 10:50 UTC (permalink / raw)
  To: Ben Keene via GitGitGadget; +Cc: git, Ben Keene, Junio C Hamano

> Subject: git-p4: Add new functions in preparation of usage

Nit: as a convention, you should lowercase the letter after the colon in
the subject. As in "git-p4: add new functions..."

This applies for other patches as well.

On Wed, Dec 04, 2019 at 10:29:31PM +0000, Ben Keene via GitGitGadget wrote:
> From: Ben Keene <seraphire@gmail.com>
> 
> This changelist is an intermediate submission for migrating the P4 support from Python2 to Python3. The code needs access to the encodeWithUTF8() for support of non-UTF8 filenames in the clone class as well as the sync class.
> 
> Move the function encodeWithUTF8() from the P4Sync class to a stand-alone function.  This will allow other classes to use this function without instanciating the P4Sync class. Change the self.verbose reference to an optional method parameter. Update the existing references to this function to pass the self.verbose since it is no longer available on "self" since the function is no longer contained on the P4Sync class.

Hmmm, so does the patch before this not actually work since
encodeWithUTF8() isn't defined yet? When you reroll this series, you
should swap the order of the patches since the previous patch depends on
this one, not the other way around.

> 
> Modify the functions write_pipe() and p4_write_pipe() to remove the return value.  The return value for both functions is the number of bytes, but the meaning is lost under python3 since the count does not match the number of characters that may have been encoded.  Additionally, the return value was never used, so this is removed to avoid future ambiguity.
> 
> Add a new method gitConfigSet(). This method will set a value in the git configuration cache list.
> 
> Signed-off-by: Ben Keene <seraphire@gmail.com>
> (cherry picked from commit affe888f432bb6833df78962e8671fccdf76c47a)
> ---
>  git-p4.py | 60 ++++++++++++++++++++++++++++++++++++++++---------------
>  1 file changed, 44 insertions(+), 16 deletions(-)
> 
> diff --git a/git-p4.py b/git-p4.py
> index b283ef1029..2659531c2e 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -237,6 +237,8 @@ def die(msg):
>          sys.exit(1)
>  
>  def write_pipe(c, stdin):
> +    """ Executes the command 'c', passing 'stdin' on the standard input
> +    """
>      if verbose:
>          sys.stderr.write('Writing pipe: %s\n' % str(c))
>  
> @@ -248,11 +250,12 @@ def write_pipe(c, stdin):
>      if p.wait():
>          die('Command failed: %s' % str(c))
>  
> -    return val
>  
>  def p4_write_pipe(c, stdin):
> +    """ Runs a P4 command 'c', passing 'stdin' data to P4
> +    """
>      real_cmd = p4_build_cmd(c)
> -    return write_pipe(real_cmd, stdin)
> +    write_pipe(real_cmd, stdin)
>  
>  def read_pipe_full(c):
>      """ Read output from  command. Returns a tuple
> @@ -653,6 +656,38 @@ def isModeExec(mode):
>      # otherwise False.
>      return mode[-3:] == "755"
>  
> +def encodeWithUTF8(path, verbose = False):

Nit: no spaces surrounding `=` in default args.

> +    """ Ensure that the path is encoded as a UTF-8 string
> +
> +        Returns bytes(P3)/str(P2)
> +    """
> +   

Trailing whitespace.

> +    if isunicode:
> +        try:
> +            if isinstance(path, unicode):
> +                # It is already unicode, cast it as a bytes
> +                # that is encoded as utf-8.
> +                return path.encode('utf-8', 'strict')
> +            path.decode('ascii', 'strict')
> +        except:
> +            encoding = 'utf8'
> +            if gitConfig('git-p4.pathEncoding'):
> +                encoding = gitConfig('git-p4.pathEncoding')
> +            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
> +            if verbose:
> +                print('\nNOTE:Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, to_unicode(path)))
> +    else:    

Trailing whitespace.

> +        try:
> +            path.decode('ascii')
> +        except:
> +            encoding = 'utf8'
> +            if gitConfig('git-p4.pathEncoding'):
> +                encoding = gitConfig('git-p4.pathEncoding')
> +            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
> +            if verbose:
> +                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
> +    return path
> +
>  class P4Exception(Exception):
>      """ Base class for exceptions from the p4 client """
>      def __init__(self, exit_code):
> @@ -891,6 +926,11 @@ def gitConfigList(key):
>              _gitConfig[key] = []
>      return _gitConfig[key]
>  
> +def gitConfigSet(key, value):
> +    """ Set the git configuration key 'key' to 'value' for this session
> +    """
> +    _gitConfig[key] = value
> +
>  def p4BranchesInGit(branchesAreInRemotes=True):
>      """Find all the branches whose names start with "p4/", looking
>         in remotes or heads as specified by the argument.  Return
> @@ -2814,24 +2854,12 @@ def writeToGitStream(self, gitMode, relPath, contents):
>              self.gitStream.write(d)
>          self.gitStream.write('\n')
>  
> -    def encodeWithUTF8(self, path):
> -        try:
> -            path.decode('ascii')
> -        except:
> -            encoding = 'utf8'
> -            if gitConfig('git-p4.pathEncoding'):
> -                encoding = gitConfig('git-p4.pathEncoding')
> -            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
> -            if self.verbose:
> -                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
> -        return path
> -
>      # output one file from the P4 stream
>      # - helper for streamP4Files
>  
>      def streamOneP4File(self, file, contents):
>          relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
> -        relPath = self.encodeWithUTF8(relPath)
> +        relPath = encodeWithUTF8(relPath, self.verbose)
>          if verbose:
>              if 'fileSize' in self.stream_file:
>                  size = int(self.stream_file['fileSize'])
> @@ -2914,7 +2942,7 @@ def streamOneP4File(self, file, contents):
>  
>      def streamOneP4Deletion(self, file):
>          relPath = self.stripRepoPath(file['path'], self.branchPrefixes)
> -        relPath = self.encodeWithUTF8(relPath)
> +        relPath = encodeWithUTF8(relPath, self.verbose)
>          if verbose:
>              sys.stdout.write("delete %s\n" % relPath)
>              sys.stdout.flush()
> -- 
> gitgitgadget
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 04/11] git-p4: python3 syntax changes
  2019-12-04 22:29       ` [PATCH v4 04/11] git-p4: python3 syntax changes Ben Keene via GitGitGadget
@ 2019-12-05 11:02         ` Denton Liu
  0 siblings, 0 replies; 46+ messages in thread
From: Denton Liu @ 2019-12-05 11:02 UTC (permalink / raw)
  To: Ben Keene via GitGitGadget; +Cc: git, Ben Keene, Junio C Hamano

On Wed, Dec 04, 2019 at 10:29:30PM +0000, Ben Keene via GitGitGadget wrote:
> From: Ben Keene <seraphire@gmail.com>
> 
> Python 3+ handles strings differently than Python 2.7.  Since Python 2 is reaching it's end of life, a series of changes are being submitted to enable python 3.7+ support. The current code fails basic tests under python 3.7.
> 
> There are a number of translations suggested by modernize/futureize that should be taken to fix numerous non-string specific issues.
> 
> Change references to the X.next() iterator to the function next(X) which is compatible with both Python2 and Python3.
> 
> Change references to X.keys() to list(X.keys()) to return a list that can be iterated in both Python2 and Python3.

I don't think this is necessary. From what I can tell, using the
key-view of the dict objects is fine since we're always doing so in a
read-only manner.

> 
> Add the literal text (object) to the end of class definitions to be consistent with Python3 class definition.

Since we're going to be dropping Python 2 soon, do we need this? I get
that we'd be mixing old-style with new-style classes in Python 2 vs
Python 3 but it's not like we do anything with the classess related to
type() or isinstance().

Anyway, I'm going to stop here since it's way past my bedtime. I hope
that my suggestions so far have been helpful.

> 
> Change integer divison to use "//" instead of "/"  Under Both python2 and python3 // will return a floor()ed result which matches existing functionality.
> 
> Change the format string for displaying decimal values from %d to %4.1f% when displaying a progress.  This avoids displaying long repeating decimals in user displayed text.
> 
> Signed-off-by: Ben Keene <seraphire@gmail.com>
> (cherry picked from commit bde6b83296aa9b3e7a584c5ce2b571c7287d8f9f)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 06/11] git-p4: Fix assumed path separators to be more Windows friendly
  2019-12-04 22:29       ` [PATCH v4 06/11] git-p4: Fix assumed path separators to be more Windows friendly Ben Keene via GitGitGadget
@ 2019-12-05 13:38         ` Junio C Hamano
  2019-12-05 19:37           ` Ben Keene
  0 siblings, 1 reply; 46+ messages in thread
From: Junio C Hamano @ 2019-12-05 13:38 UTC (permalink / raw)
  To: Ben Keene via GitGitGadget; +Cc: git, Ben Keene

"Ben Keene via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Ben Keene <seraphire@gmail.com>
>
> When a computer is configured to use Git for windows and Python for windows, and not a Unix subsystem like cygwin or WSL, the directory separator changes and causes git-p4 to fail to properly determine paths.
>
> Fix 3 path separator errors:
>
> 1. getUserCacheFilename should not use string concatenation. Change this code to use os.path.join to build an OS tolerant path.
> 2. defaultDestiantion used the OS.path.split to split depot paths.  This is incorrect on windows. Change the code to split on a forward slash(/) instead since depot paths use this character regardless  of the operating system.
> 3. The call to isvalidGitDir() in the main code also used a literal forward slash. Change the cose to use os.path.join to correctly format the path for the operating system.

s/isvalid/isValid/;
s/cose/code/; 

Also please wrap your lines at around 72 columns (that will let
reviewers quote what you write (which adds "> " prefix and consumes
2 more columns), and would allow us a handful of exchanges (each
round adding ">" prefix to consume 1 more column) before bumping
into the right edge of the terminal at 80 columns.

> These three changes allow the suggested windows configuration to properly locate files while retaining the existing behavior on non-windows operating systems.
>
> Signed-off-by: Ben Keene <seraphire@gmail.com>
> (cherry picked from commit a5b45c12c3861638a933b05a1ffee0c83978dcb2)

As Denton mentioned, general public do not care if you "cherry
picked" it from your earlier unpublished work.  Remove it.

Aside from these small nits, the proposed log message for this step
is quite cleanly done and easily readable.  All the decisions are
clearly written and agreeable.  Nicely done.

> ---
>  git-p4.py | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/git-p4.py b/git-p4.py
> index 2659531c2e..7ac8cb42ef 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -1454,8 +1454,10 @@ def p4UserIsMe(self, p4User):
>              return True
>  
>      def getUserCacheFilename(self):
> +        """ Returns the filename of the username cache 
> +	    """

Inconsistent use of spaces and a tab I see on these two lines.
Intended?

>          home = os.environ.get("HOME", os.environ.get("USERPROFILE"))
> -        return home + "/.gitp4-usercache.txt"
> +        return os.path.join(home, ".gitp4-usercache.txt")
>  
>      def getUserMapFromPerforceServer(self):
>          if self.userMapFromPerforceServer:
> @@ -3973,13 +3975,16 @@ def __init__(self):
>          self.cloneBare = False
>  
>      def defaultDestination(self, args):
> +        """ Returns the last path component as the default git 
> +            repository directory name
> +        """
>          ## TODO: use common prefix of args?
>          depotPath = args[0]
>          depotDir = re.sub("(@[^@]*)$", "", depotPath)
>          depotDir = re.sub("(#[^#]*)$", "", depotDir)
>          depotDir = re.sub(r"\.\.\.$", "", depotDir)
>          depotDir = re.sub(r"/$", "", depotDir)
> -        return os.path.split(depotDir)[1]
> +        return depotDir.split('/')[-1]
>  
>      def run(self, args):
>          if len(args) < 1:
> @@ -4252,8 +4257,8 @@ def main():
>                          chdir(cdup);
>  
>          if not isValidGitDir(cmd.gitdir):
> -            if isValidGitDir(cmd.gitdir + "/.git"):
> -                cmd.gitdir += "/.git"
> +            if isValidGitDir(os.path.join(cmd.gitdir, ".git")):
> +                cmd.gitdir = os.path.join(cmd.gitdir, ".git")
>              else:
>                  die("fatal: cannot locate git repository at %s" % cmd.gitdir)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 07/11] git-p4: Add a helper class for stream writing
  2019-12-04 22:29       ` [PATCH v4 07/11] git-p4: Add a helper class for stream writing Ben Keene via GitGitGadget
@ 2019-12-05 13:42         ` Junio C Hamano
  2019-12-05 19:52           ` Ben Keene
  0 siblings, 1 reply; 46+ messages in thread
From: Junio C Hamano @ 2019-12-05 13:42 UTC (permalink / raw)
  To: Ben Keene via GitGitGadget; +Cc: git, Ben Keene

"Ben Keene via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Ben Keene <seraphire@gmail.com>
>
> This is a transtional commit that does not change current behvior.  It adds a new class Py23File.

Perhaps s/transitional/preparatory/?  It does not change the
behaviour because nobody uses the class yet, if I understand
correctly.  Which is fine.

It is kind of surprising that each project needs to reinvent and
maintain a wrapper class like this one, as what the new class does
smells quite generic.

> Following the Python recommendation of keeping text as unicode internally and only converting to and from bytes on input and output, this class provides an interface for the methods used for reading and writing files and file like streams.
>
> Create a class that wraps the input and output functions used by the git-p4.py code for reading and writing to standard file handles.
>
> The methods of this class should take a Unicode string for writing and return unicode strings in reads.  This class should be a drop-in for existing file like streams
>
> The following methods should be coded for supporting existing read/write calls:
> * write - this should write a Unicode string to the underlying stream
> * read - this should read from the underlying stream and cast the bytes as a unicode string
> * readline - this should read one line of text from the underlying stream and cast it as a unicode string
> * readline - this should read a number of lines, optionally hinted, and cast each line as a unicode string
>
> The expression "cast as a unicode string" is used because the code should use the AS_BYTES() and AS_UNICODE() functions instead of cohercing the data to actual unicode strings or bytes.  This allows python 2 code to continue to use the internal "str" data type instead of converting the data back and forth to actual unicode strings. This retains current python2 support while python3 support may be incomplete.
>
> Signed-off-by: Ben Keene <seraphire@gmail.com>
> (cherry picked from commit 12919111fbaa3e4c0c4c2fdd4f79744cc683d860)
> ---
>  git-p4.py | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 66 insertions(+)
>
> diff --git a/git-p4.py b/git-p4.py
> index 7ac8cb42ef..0da640be93 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -4182,6 +4182,72 @@ def run(self, args):
>              print("%s <= %s (%s)" % (branch, ",".join(settings["depot-paths"]), settings["change"]))
>          return True
>  
> +class Py23File():
> +    """ Python2/3 Unicode File Wrapper 
> +    """
> +    
> +    stream_handle = None
> +    verbose       = False
> +    debug_handle  = None
> +   
> +    def __init__(self, stream_handle, verbose = False):
> +        """ Create a Python3 compliant Unicode to Byte String
> +            Windows compatible wrapper
> +
> +            stream_handle = the underlying file-like handle
> +            verbose       = Boolean if content should be echoed
> +        """
> +        self.stream_handle = stream_handle
> +        self.verbose       = verbose
> +
> +    def write(self, utf8string):
> +        """ Writes the utf8 encoded string to the underlying 
> +            file stream
> +        """
> +        self.stream_handle.write(as_bytes(utf8string))
> +        if self.verbose:
> +            sys.stderr.write("Stream Output: %s" % utf8string)
> +            sys.stderr.flush()
> +
> +    def read(self, size = None):
> +        """ Reads int charcters from the underlying stream 
> +            and converts it to utf8.
> +
> +            Be aware, the size value is for reading the underlying
> +            bytes so the value may be incorrect. Usage of the size
> +            value is discouraged.
> +        """
> +        if size == None:
> +            return as_string(self.stream_handle.read())
> +        else:
> +            return as_string(self.stream_handle.read(size))
> +
> +    def readline(self):
> +        """ Reads a line from the underlying byte stream 
> +            and converts it to utf8
> +        """
> +        return as_string(self.stream_handle.readline())
> +
> +    def readlines(self, sizeHint = None):
> +        """ Returns a list containing lines from the file converted to unicode.
> +
> +            sizehint - Optional. If the optional sizehint argument is 
> +            present, instead of reading up to EOF, whole lines totalling 
> +            approximately sizehint bytes are read.
> +        """
> +        lines = self.stream_handle.readlines(sizeHint)
> +        for i in range(0, len(lines)):
> +            lines[i] = as_string(lines[i])
> +        return lines
> +
> +    def close(self):
> +        """ Closes the underlying byte stream """
> +        self.stream_handle.close()
> +
> +    def flush(self):
> +        """ Flushes the underlying byte stream """
> +        self.stream_handle.flush()
> +
>  class HelpFormatter(optparse.IndentedHelpFormatter):
>      def __init__(self):
>          optparse.IndentedHelpFormatter.__init__(self)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 08/11] git-p4: p4CmdList  - support Unicode encoding
  2019-12-04 22:29       ` [PATCH v4 08/11] git-p4: p4CmdList - support Unicode encoding Ben Keene via GitGitGadget
@ 2019-12-05 13:55         ` Junio C Hamano
  2019-12-05 20:23           ` Ben Keene
  0 siblings, 1 reply; 46+ messages in thread
From: Junio C Hamano @ 2019-12-05 13:55 UTC (permalink / raw)
  To: Ben Keene via GitGitGadget; +Cc: git, Ben Keene

"Ben Keene via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Ben Keene <seraphire@gmail.com>
>
> The p4CmdList is a commonly used function in the git-p4 code. It is used to execute a command in P4 and return the results of the call in a list.

Somewhere in the midway of the series, the log message starts using
all-caps AS_STRING and AS_BYTES to describe some specific things,
and it would help readers if the first one of these steps explain
what they mean (I am guessing AS_STRING is an unicode object in both
Python 2 and 3, and AS_BYTES is a plain vanilla string in Python 2,
or something like that?).

> Change this code to take a new optional parameter, encode_data that will optionally convert the data AS_STRING() that isto be returned by the function.

s/isto/is to/;

This sentence is a bit hard to read.

This change does not make the function optionally convert the input
we feed to the p4 command---it only changes the values in the
command output.  But the readers cannot tell that easily until
reading to the very end of the sentence, i.e. "returned by the
function", as written.

We probably want to be a bit more explicit to say what gets
converted; perhaps renaming the parameter to encode_cmd_output may
help.

> Change the code so that the key will always be encoded AS_STRING()

s/key/key of the returned hash/ or something to clarify what key you
are talking about.

> Data that is passed for standard input (stdin) should be AS_BYTES() to ensure unicode text that is supplied will be written out as bytes.

"Data that is passed to the standard input stream of the p4 process"
to clarify whose standard input you are talking about (iow, "git p4"
also has and it may use its standard input, but this function does
not muck with it).


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 09/11] git-p4: Add usability enhancements
  2019-12-04 22:29       ` [PATCH v4 09/11] git-p4: Add usability enhancements Ben Keene via GitGitGadget
@ 2019-12-05 14:04         ` Junio C Hamano
  2019-12-05 15:40           ` Ben Keene
  0 siblings, 1 reply; 46+ messages in thread
From: Junio C Hamano @ 2019-12-05 14:04 UTC (permalink / raw)
  To: Ben Keene via GitGitGadget; +Cc: git, Ben Keene

"Ben Keene via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Ben Keene <seraphire@gmail.com>
>
> Issue: when prompting the user with raw_input, the tests are not forgiving of user input.  For example, on the first query asks for a yes/no response. If the user enters the full word "yes" or "no" the test will fail. Additionally, offer the suggestion of setting git-p4.attemptRCSCleanup when applying a commit fails because of RCS keywords. Both of these changes are usability enhancement suggestions.

Drop "Issue: " and upcase "when" that follows.  The rest of the
paragraph reads a lot better without it as a human friendly
description.

"are usability enhancement suggestions"???  Leaves readers wonder
who suggested them, or you are suggesting but are willing the change
to be dropped, or what.  Be a bit more assertive if you want to say
that you believe these two would improve usability.

> Change the code prompting the user for input to sanitize the user input before checking the response by asking the response as a lower case string, trimming leading/trailing spaces, and returning the first character.
>
> Change the applyCommit() method that when applying a commit fails becasue of the P4 RCS Keywords, the user should consider setting git-p4.attemptRCSCleanup.

s/becasue/because/;

I have a feeling that these two may be worth doing but are totally
separate issues, deserving two separate commits.  Is there a good
reason why these two must go hand-in-hand?


> Signed-off-by: Ben Keene <seraphire@gmail.com>
> (cherry picked from commit 1fab571664f5b6ad4ef321199f52615a32a9f8c7)
> ---
>  git-p4.py | 31 ++++++++++++++++++++++++++-----
>  1 file changed, 26 insertions(+), 5 deletions(-)
>
> diff --git a/git-p4.py b/git-p4.py
> index f7c0ef0c53..f13e4645a3 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -1909,7 +1909,8 @@ def edit_template(self, template_file):
>              return True
>  
>          while True:
> -            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ")
> +            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ").lower() \
> +                .strip()[0]

You could have saved the patch by doing

	+	.lower().strip()[0]

instead, no?

I wonder if it would be better to write a thin wrapper around raw_input()
that does the "downcase and take the first meaningful letter" thing
for you and call it prompt() or something like that.

> @@ -4327,7 +4343,12 @@ def main():
>                                     description = cmd.description,
>                                     formatter = HelpFormatter())
>  
> -    (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
> +    try:
> +        (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
> +    except:
> +        parser.print_help()
> +        raise
> +

This change may be a good idea to give help text when the command
line parsing fails, but a good change deserves to be explained.  I
do not think I saw any mention of it in the proposed log message,
though.

>      global verbose
>      verbose = cmd.verbose
>      if cmd.needsGit:

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 09/11] git-p4: Add usability enhancements
  2019-12-05 14:04         ` Junio C Hamano
@ 2019-12-05 15:40           ` Ben Keene
  0 siblings, 0 replies; 46+ messages in thread
From: Ben Keene @ 2019-12-05 15:40 UTC (permalink / raw)
  To: Junio C Hamano, Ben Keene via GitGitGadget; +Cc: git


On 12/5/2019 9:04 AM, Junio C Hamano wrote:
> "Ben Keene via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Ben Keene <seraphire@gmail.com>
>>
>> Issue: when prompting the user with raw_input, the tests are not forgiving of user input.  For example, on the first query asks for a yes/no response. If the user enters the full word "yes" or "no" the test will fail. Additionally, offer the suggestion of setting git-p4.attemptRCSCleanup when applying a commit fails because of RCS keywords. Both of these changes are usability enhancement suggestions.
> Drop "Issue: " and upcase "when" that follows.  The rest of the
> paragraph reads a lot better without it as a human friendly
> description.
>
> "are usability enhancement suggestions"???  Leaves readers wonder
> who suggested them, or you are suggesting but are willing the change
> to be dropped, or what.  Be a bit more assertive if you want to say
> that you believe these two would improve usability.
Thank you and I reworked my submissions. I'm moving them to a separate 
PR and will split the commit into 3 separate commits.
>> Change the code prompting the user for input to sanitize the user input before checking the response by asking the response as a lower case string, trimming leading/trailing spaces, and returning the first character.
>>
>> Change the applyCommit() method that when applying a commit fails becasue of the P4 RCS Keywords, the user should consider setting git-p4.attemptRCSCleanup.
> s/becasue/because/;
>
> I have a feeling that these two may be worth doing but are totally
> separate issues, deserving two separate commits.  Is there a good
> reason why these two must go hand-in-hand?
>
Good idea, and I split them out.
>> Signed-off-by: Ben Keene <seraphire@gmail.com>
>> (cherry picked from commit 1fab571664f5b6ad4ef321199f52615a32a9f8c7)
>> ---
>>   git-p4.py | 31 ++++++++++++++++++++++++++-----
>>   1 file changed, 26 insertions(+), 5 deletions(-)
>>
>> diff --git a/git-p4.py b/git-p4.py
>> index f7c0ef0c53..f13e4645a3 100755
>> --- a/git-p4.py
>> +++ b/git-p4.py
>> @@ -1909,7 +1909,8 @@ def edit_template(self, template_file):
>>               return True
>>   
>>           while True:
>> -            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ")
>> +            response = raw_input("Submit template unchanged. Submit anyway? [y]es, [n]o (skip this patch) ").lower() \
>> +                .strip()[0]
> You could have saved the patch by doing
>
> 	+	.lower().strip()[0]
>
> instead, no?
>
> I wonder if it would be better to write a thin wrapper around raw_input()
> that does the "downcase and take the first meaningful letter" thing
> for you and call it prompt() or something like that.
I created a new function prompt() as you suggested.
>> @@ -4327,7 +4343,12 @@ def main():
>>                                      description = cmd.description,
>>                                      formatter = HelpFormatter())
>>   
>> -    (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
>> +    try:
>> +        (cmd, args) = parser.parse_args(sys.argv[2:], cmd);
>> +    except:
>> +        parser.print_help()
>> +        raise
>> +
> This change may be a good idea to give help text when the command
> line parsing fails, but a good change deserves to be explained.  I
> do not think I saw any mention of it in the proposed log message,
> though.

Yes, you're right.  I split this out into a separate commit as well and 
gave it a place or prominence.

>>       global verbose
>>       verbose = cmd.verbose
>>       if cmd.needsGit:

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3
  2019-12-05  9:54       ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Luke Diamand
@ 2019-12-05 16:16         ` Ben Keene
  2019-12-05 18:51           ` Denton Liu
  0 siblings, 1 reply; 46+ messages in thread
From: Ben Keene @ 2019-12-05 16:16 UTC (permalink / raw)
  To: Luke Diamand, Ben Keene via GitGitGadget; +Cc: Git Users, Junio C Hamano


On 12/5/2019 4:54 AM, Luke Diamand wrote:
> On Wed, 4 Dec 2019 at 22:29, Ben Keene via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> Issue: The current git-p4.py script does not work with python3.
>>
>> I have attempted to use the P4 integration built into GIT and I was unable
>> to get the program to run because I have Python 3.8 installed on my
>> computer. I was able to get the program to run when I downgraded my python
>> to version 2.7. However, python 2 is reaching its end of life.
>>
>> Submission: I am submitting a patch for the git-p4.py script that partially
>> supports python 3.8. This code was able to pass the basic tests (t9800) when
>> run against Python3. This provides basic functionality.
>>
>> In an attempt to pass the t9822 P4 path-encoding test, a new parameter for
>> git P4 Clone was introduced.
>>
>> --encoding Format-identifier
>>
>> This will create the GIT repository following the current functionality;
>> however, before importing the files from P4, it will set the
>> git-p4.pathEncoding option so any files or paths that are encoded with
>> non-ASCII/non-UTF-8 formats will import correctly.
>>
>> Technical details: The script was updated by futurize (
>> https://python-future.org/futurize.html) to support Py2/Py3 syntax. The few
>> references to classes in future were reworked so that future would not be
>> required. The existing code test for Unicode support was extended to
>> normalize the classes “unicode” and “bytes” to across platforms:
>>
>>   * ‘unicode’ is an alias for ‘str’ in Py3 and is the unicode class in Py2.
>>   * ‘bytes’ is bytes in Py3 and an alias for ‘str’ in Py2.
>>
>> New coercion methods were written for both Python2 and Python3:
>>
>>   * as_string(text) – In Python3, this encodes a bytes object as a UTF-8
>>     encoded Unicode string.
>>   * as_bytes(text) – In Python3, this decodes a Unicode string to an array of
>>     bytes.
>>
>> In Python2, these functions do not change the data since a ‘str’ object
>> function in both roles as strings and byte arrays. This reduces the
>> potential impact on backward compatibility with Python 2.
>>
>>   * to_unicode(text) – ensures that the supplied data is encoded as a UTF-8
>>     string. This function will encode data in both Python2 and Python3. *
>>        path_as_string(path) – This function is an extension function that
>>        honors the option “git-p4.pathEncoding” to convert a set of bytes or
>>        characters to UTF-8. If the str/bytes cannot decode as ASCII, it will
>>        use the encodeWithUTF8() method to convert the custom encoded bytes to
>>        Unicode in UTF-8.
>>
>>
>>
>> Generally speaking, information in the script is converted to Unicode as
>> early as possible and converted back to a byte array just before passing to
>> external programs or files. The exception to this rule is P4 Repository file
>> paths.
>>
>> Paths are not converted but left as “bytes” so the original file path
>> encoding can be preserved. This formatting is required for commands that
>> interact with the P4 file path. When the file path is used by GIT, it is
>> converted with encodeWithUTF8().
>>
> Almost all the tests pass now - nice!
>
> (There's one test that fails for me, t9830-git-p4-symlink-dir.sh).


Which version of Python are running the failing test against?  I run it 
against Python 2.7 and it passes the test. I don't expect all Python 3.x 
tests to pass yet, just t9800.


>
> Nitpicking:
>
> - There are some bits of trailing whitespace around - can you strip
> those out? You can use "git diff --check".


Is there a way that I can find out which branches I need to remove white 
space from now that they have been committed?


> - Also I think the convention for git commits is that they be limited
> to 72 (?) characters.


I'm going through all my commits and fixing them.


> - In 10dc commit message, s/behvior/behavior
> - Maybe submit 4fc4 as a separate patch series? It doesn't seem
> directly related to your python3 changes.


I moved the enhancements to https://github.com/git/git/pull/675


> - s/howerver/however/
>
> The comment at line 3261 (showing the fast-import syntax) has wonky
> indentation, and needs a space after the '#'.
>
> This code looked like we're duplicating stuff:
>
> +    if isinstance(path, unicode):
> +        path = path.replace("%", "%25") \
> +                   .replace("*", "%2A") \
> +                   .replace("#", "%23") \
> +                   .replace("@", "%40")
> +    else:
> +        path = path.replace(b"%", b"%25") \
> +                   .replace(b"*", b"%2A") \
> +                   .replace(b"#", b"%23") \
> +                   .replace(b"@", b"%40")
>
> I wonder if we can have a helper to do this?

I was just looking at this code block, and at this time, I'm not sure if 
the text coming in will be Unicode or bytes, so I'm hesitant to change 
it until more of the code is converted, but I understand about the 
duplication.


>
> In patchRCSKeywords() you've added code to cleanup outFile. But I
> wonder if we could just use a 'finally' block, or a contextexpr ("with
> blah as outFile:")
>
> I don't know if it's worth doing now that you've got it going, but at
> one point I tried simplifying code like this:
>
>     path_as_string(file['depotFile'])
> and
>     marshalled[b'data']
>
> by using a dictionary with overloaded operators which would do the
> bytes/string conversion automatically. However, your approach isn't
> actually _that_ invasive, so maybe this is not necessary.
>
> Looks good though, thanks!
> Luke
>
I toyed with making a class object that would hold the path data and 
have methods to cast to bytes and encodeWithUTF8() and Unicode versions, 
but it quickly got out of hand.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 01/11] git-p4: select p4 binary by operating-system
  2019-12-05 10:19         ` Denton Liu
@ 2019-12-05 16:32           ` Ben Keene
  0 siblings, 0 replies; 46+ messages in thread
From: Ben Keene @ 2019-12-05 16:32 UTC (permalink / raw)
  To: Denton Liu, Ben Keene via GitGitGadget; +Cc: git, Junio C Hamano


On 12/5/2019 5:19 AM, Denton Liu wrote:
> Hi Ben,
>
> First of all, as a note to you and possibly others, I don't have much
> (read: any) experience with git-p4. I do have experience with Python and
> how git.git generally does things so I'll be reviewing from that
> perspective.
>
> On Wed, Dec 04, 2019 at 10:29:27PM +0000, Ben Keene via GitGitGadget wrote:
>> From: Ben Keene <seraphire@gmail.com>
>>
>> Depending on the version of GIT and Python installed, the perforce program (p4) may not resolve on Windows without the program extension.
> Nit: "GIT" should be written as "Git" when referring to the whole
> project and "git" when referring to the command. Never in all-caps.
>
> Also, please wrap your paragraphs at 72 characters. I'll say it once
> here but it applies to your whole series.


Got it. I'll update all my commit messages to fit within this space.  I 
didn't realize
they didn't word wrap properly. (I'm using a GUI tool to manage this.)


>> Check the operating system (platform.system) and if it is reporting that it is Windows, use the full filename of "p4.exe" instead of "p4"
>>
>> The original code unconditionally used "p4" as the binary filename.
> As a rule of thumb, we want to state the problem first before we state
> what we did (and why). I'd move this paragraph up.
>
>> This change is Python2 and Python3 compatible.
>>
>> Thanks to: Junio C Hamano <gitster@pobox.com> and  Denton Liu <liu.denton@gmail.com> for patiently explaining proper format for my submissions.
> I appreciate the credit but I don't think it's necessary. At _most_, you
> could include the
>
> 	Helped-by: Junio C Hamano <gitster@pobox.com>
> 	Helped-by: Denton Liu <liu.denton@gmail.com>
>
> tags before your signoff but I don't think we've done anything to
> warrant it.


Thank you, I'll keep that in mind for the next submission!


>> Signed-off-by: Ben Keene <seraphire@gmail.com>
>> (cherry picked from commit 9a3a5c4e6d29dbef670072a9605c7a82b3729434)
> You should remove this line in all of your commits. The referenced
> commit isn't public so the information isn't very useful. Also, try to
> not include anything after your signoff so if this hypothetically were
> useful information, you'd include it before your signoff.
>
> If it's information that's ephemerally useful for current reviewers but
> not for future readers of your commit in the log message, you can
> include it after the three hyphens...


I'll look to pull these out before I update my submission.


>> ---
> like this and it won't be included as part of the log message.
>
>>   git-p4.py | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/git-p4.py b/git-p4.py
>> index 60c73b6a37..b2ffbc057b 100755
>> --- a/git-p4.py
>> +++ b/git-p4.py
>> @@ -75,7 +75,11 @@ def p4_build_cmd(cmd):
>>       location. It means that hooking into the environment, or other configuration
>>       can be done more easily.
>>       """
>> -    real_cmd = ["p4"]
>> +    # Look for the P4 binary
> I don't think this comment is necessary as the code itself is pretty
> self-explanatory.
>
>> +    if (platform.system() == "Windows"):
>> +        real_cmd = ["p4.exe"]
> You have trailing whitespace here. Try to run `git diff --check` before
> committing to ensure that you have no whitespace errors.
>
> Thanks,
>
> Denton
>
>> +    else:
>> +        real_cmd = ["p4"]
>>   
>>       user = gitConfig("git-p4.user")
>>       if len(user) > 0:
>> -- 
>> gitgitgadget
>>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 02/11] git-p4: change the expansion test from basestring to list
  2019-12-05 10:27         ` Denton Liu
@ 2019-12-05 17:05           ` Ben Keene
  0 siblings, 0 replies; 46+ messages in thread
From: Ben Keene @ 2019-12-05 17:05 UTC (permalink / raw)
  To: Denton Liu, Ben Keene via GitGitGadget; +Cc: git, Junio C Hamano


On 12/5/2019 5:27 AM, Denton Liu wrote:
> Hi Ben,
>
> On Wed, Dec 04, 2019 at 10:29:28PM +0000, Ben Keene via GitGitGadget wrote:
>> From: Ben Keene <seraphire@gmail.com>
>>
>> Python 3+ handles strings differently than Python 2.7.
> Do you mean Python 3?
>
>> Since Python 2 is reaching it's end of life, a series of changes are being submitted to enable python 3.7+ support. The current code fails basic tests under python 3.7.
> Python 3.5 doesn't reach EOL until Q4 2020[1]. We should be testing
> these changes under 3.5 to ensure that we're not accidentally
> introducing stuff that's not backwards compatible.


I changed my commit text to say support for version 3.5 (which is 
actually the version I am running the test with).


>> Change references to basestring in the isinstance tests to use list instead. This prepares the code to remove all references to basestring.
>>
>> The original code used basestring in a test to determine if a list or literal string was passed into 9 different functions.  This is used to determine if the shell should be evoked when calling subprocess methods.
> Once again, I'd swap the above two paragraphs. Problem then solution.
>
> Also, did you mean "invoked" instead of "evoked"?


Changed.  And yes, I meant 'invoked'. I wasn't trying to make my code 
feel anything!


>> Signed-off-by: Ben Keene <seraphire@gmail.com>
>> (cherry picked from commit 5b1b1c145479b5d5fd242122737a3134890409e6)
>> ---
>>   git-p4.py | 18 +++++++++---------
>>   1 file changed, 9 insertions(+), 9 deletions(-)
> The patch itself looks good, though.
>
> [1]: https://devguide.python.org/#branchstatus

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 03/11] git-p4: add new helper functions for python3 conversion
  2019-12-05 10:40         ` Denton Liu
@ 2019-12-05 18:42           ` Ben Keene
  0 siblings, 0 replies; 46+ messages in thread
From: Ben Keene @ 2019-12-05 18:42 UTC (permalink / raw)
  To: Denton Liu, Ben Keene via GitGitGadget; +Cc: git, Junio C Hamano


On 12/5/2019 5:40 AM, Denton Liu wrote:
> On Wed, Dec 04, 2019 at 10:29:29PM +0000, Ben Keene via GitGitGadget wrote:
>> From: Ben Keene <seraphire@gmail.com>
>>
>> Python 3+ handles strings differently than Python 2.7.  Since Python 2 is reaching it's end of life, a series of changes are being submitted to enable python 3.7+ support. The current code fails basic tests under python 3.7.
>>
>> Change the existing unicode test add new support functions for python2-python3 support.
>>
>> Define the following variables:
>> - isunicode - a boolean variable that states if the version of python natively supports unicode (true) or not (false). This is true for Python3 and false for Python2.
>> - unicode - a type alias for the datatype that holds a unicode string.  It is assigned to a str under python 3 and the unicode type for Python2.
>> - bytes - a type alias for an array of bytes.  It is assigned the native bytes type for Python3 and str for Python2.
>>
>> Add the following new functions:
>>
>> - as_string(text) - A new function that will convert a byte array to a unicode (UTF-8) string under python 3.  Under python 2, this returns the string unchanged.
>> - as_bytes(text) - A new function that will convert a unicode string to a byte array under python 3.  Under python 2, this returns the string unchanged.
>> - to_unicode(text) - Converts a text string as Unicode(UTF-8) on both Python2 and Python3.
>>
>> Add a new function alias raw_input:
>> If raw_input does not exist (it was renamed to input in python 3) alias input as raw_input.
>>
>> The AS_STRING and AS_BYTES functions allow for modifying the code with a minimal amount of impact on Python2 support.  When a string is expected, the as_string() will be used to convert "cast" the incoming "bytes" to a string type. Conversely as_bytes() will be used to convert a "string" to a "byte array" type. Since Python2 overloads the datatype 'str' to serve both purposes, the Python2 versions of these function do not change the data, since the str functions as both a byte array and a string.
> How come AS_STRING and AS_BYTES are all-caps here?


I changed them.  I used all caps to designate that they are code string. 
I changed them to as_string() and as_bytes()


>
>> basestring is removed since its only references are found in tests that were changed in the previous change list.
>>
>> Signed-off-by: Ben Keene <seraphire@gmail.com>
>> (cherry picked from commit 7921aeb3136b07643c1a503c2d9d8b5ada620356)
>> ---
>>   git-p4.py | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++----
>>   1 file changed, 66 insertions(+), 4 deletions(-)
>>
>> diff --git a/git-p4.py b/git-p4.py
>> index 0f27996393..93dfd0920a 100755
>> --- a/git-p4.py
>> +++ b/git-p4.py
>> @@ -32,16 +32,78 @@
>>       unicode = unicode
>>   except NameError:
>>       # 'unicode' is undefined, must be Python 3
>> -    str = str
>> +    #
>> +    # For Python3 which is natively unicode, we will use
>> +    # unicode for internal information but all P4 Data
>> +    # will remain in bytes
>> +    isunicode = True
>>       unicode = str
>>       bytes = bytes
>> -    basestring = (str,bytes)
>> +
>> +    def as_string(text):
>> +        """Return a byte array as a unicode string"""
>> +        if text == None:
> Nit: use `text is None` instead. Actually, any time you're checking an
> object to see if it's None, you should use `is` instead of `==` since
> there's usually only one None reference.

I changed this in this commit and will attempt to fix this in all the 
following commits as well.


>
>> +            return None
>> +        if isinstance(text, bytes):
>> +            return unicode(text, "utf-8")
>> +        else:
>> +            return text
>> +
>> +    def as_bytes(text):
>> +        """Return a Unicode string as a byte array"""
>> +        if text == None:
>> +            return None
>> +        if isinstance(text, bytes):
>> +            return text
>> +        else:
>> +            return bytes(text, "utf-8")
>> +
>> +    def to_unicode(text):
>> +        """Return a byte array as a unicode string"""
>> +        return as_string(text)
>> +
>> +    def path_as_string(path):
>> +        """ Converts a path to the UTF8 encoded string """
>> +        if isinstance(path, unicode):
>> +            return path
>> +        return encodeWithUTF8(path).decode('utf-8')
>> +
> Trailing whitespace.
>
>>   else:
>>       # 'unicode' exists, must be Python 2
>> -    str = str
>> +    #
>> +    # We will treat the data as:
>> +    #   str   -> str
>> +    #   bytes -> str
>> +    # So for Python2 these functions are no-ops
>> +    # and will leave the data in the ambiguious
>> +    # string/bytes state
>> +    isunicode = False
>>       unicode = unicode
>>       bytes = str
>> -    basestring = basestring
>> +
>> +    def as_string(text):
>> +        """ Return text unaltered (for Python3 support) """
> I didn't mention this in earlier emails but it's been bothering me a
> lot: is there any reason why you write it as "Python3" vs. "Python 3"
> sometimes (and Python2 as well)? If there's no difference, then we
> should probably stick to one variant in both the commit messages and in
> the code. (I prefer the spaced variant.)


The difference was sloppy typing.  Like the "is None" and trailing white 
spaces, I'll work on fixing these.


>> +        return text
>> +
>> +    def as_bytes(text):
>> +        """ Return text unaltered (for Python3 support) """
>> +        return text
>> +
>> +    def to_unicode(text):
>> +        """Return a string as a unicode string"""
>> +        return text.decode('utf-8')
>> +
> Trailing whitespace.
>
>> +    def path_as_string(path):
>> +        """ Converts a path to the UTF8 encoded bytes """
>> +        return encodeWithUTF8(path)
>> +
>> +
>> +
> Trailing whitespace.
>
>> +# Check for raw_input support
>> +try:
>> +    raw_input
>> +except NameError:
>> +    raw_input = input
>>   
>>   try:
>>       from subprocess import CalledProcessError
>> -- 
>> gitgitgadget
>>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3
  2019-12-05 16:16         ` Ben Keene
@ 2019-12-05 18:51           ` Denton Liu
  2019-12-05 20:47             ` Ben Keene
  0 siblings, 1 reply; 46+ messages in thread
From: Denton Liu @ 2019-12-05 18:51 UTC (permalink / raw)
  To: Ben Keene
  Cc: Luke Diamand, Ben Keene via GitGitGadget, Git Users, Junio C Hamano

On Thu, Dec 05, 2019 at 11:16:27AM -0500, Ben Keene wrote:
> 
> On 12/5/2019 4:54 AM, Luke Diamand wrote:
> > On Wed, 4 Dec 2019 at 22:29, Ben Keene via GitGitGadget
> > - There are some bits of trailing whitespace around - can you strip
> > those out? You can use "git diff --check".
> 
> 
> Is there a way that I can find out which branches I need to remove white
> space from now that they have been committed?

I'm assuming you mean commits? You can run

	git log --check master..

and git will highlight the whitespace errors.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 05/11] git-p4: Add new functions in preparation of usage
  2019-12-05 10:50         ` Denton Liu
@ 2019-12-05 19:23           ` Ben Keene
  0 siblings, 0 replies; 46+ messages in thread
From: Ben Keene @ 2019-12-05 19:23 UTC (permalink / raw)
  To: Denton Liu, Ben Keene via GitGitGadget; +Cc: git, Junio C Hamano


On 12/5/2019 5:50 AM, Denton Liu wrote:
>> Subject: git-p4: Add new functions in preparation of usage
> Nit: as a convention, you should lowercase the letter after the colon in
> the subject. As in "git-p4: add new functions..."
>
> This applies for other patches as well.


Got it.  Changing all leading characters to lower case.


>
> On Wed, Dec 04, 2019 at 10:29:31PM +0000, Ben Keene via GitGitGadget wrote:
>> From: Ben Keene <seraphire@gmail.com>
>>
>> This changelist is an intermediate submission for migrating the P4 support from Python2 to Python3. The code needs access to the encodeWithUTF8() for support of non-UTF8 filenames in the clone class as well as the sync class.
>>
>> Move the function encodeWithUTF8() from the P4Sync class to a stand-alone function.  This will allow other classes to use this function without instanciating the P4Sync class. Change the self.verbose reference to an optional method parameter. Update the existing references to this function to pass the self.verbose since it is no longer available on "self" since the function is no longer contained on the P4Sync class.
> Hmmm, so does the patch before this not actually work since
> encodeWithUTF8() isn't defined yet? When you reroll this series, you
> should swap the order of the patches since the previous patch depends on
> this one, not the other way around.

Good catch.  That's correct, the encodeWithUTF8() should be first.  I 
moved that commit earlier in the chain and actually split it up from the 
changes to write_pipe and gitConfigSet() so the text will be easier to see.


>> Modify the functions write_pipe() and p4_write_pipe() to remove the return value.  The return value for both functions is the number of bytes, but the meaning is lost under python3 since the count does not match the number of characters that may have been encoded.  Additionally, the return value was never used, so this is removed to avoid future ambiguity.
>>
>> Add a new method gitConfigSet(). This method will set a value in the git configuration cache list.
>>
>> Signed-off-by: Ben Keene <seraphire@gmail.com>
>> (cherry picked from commit affe888f432bb6833df78962e8671fccdf76c47a)
>> ---
>>   git-p4.py | 60 ++++++++++++++++++++++++++++++++++++++++---------------
>>   1 file changed, 44 insertions(+), 16 deletions(-)
>>
>> diff --git a/git-p4.py b/git-p4.py
>> index b283ef1029..2659531c2e 100755
>> --- a/git-p4.py
>> +++ b/git-p4.py
>> @@ -237,6 +237,8 @@ def die(msg):
>>           sys.exit(1)
>>   
>>   def write_pipe(c, stdin):
>> +    """ Executes the command 'c', passing 'stdin' on the standard input
>> +    """
>>       if verbose:
>>           sys.stderr.write('Writing pipe: %s\n' % str(c))
>>   
>> @@ -248,11 +250,12 @@ def write_pipe(c, stdin):
>>       if p.wait():
>>           die('Command failed: %s' % str(c))
>>   
>> -    return val
>>   
>>   def p4_write_pipe(c, stdin):
>> +    """ Runs a P4 command 'c', passing 'stdin' data to P4
>> +    """
>>       real_cmd = p4_build_cmd(c)
>> -    return write_pipe(real_cmd, stdin)
>> +    write_pipe(real_cmd, stdin)
>>   
>>   def read_pipe_full(c):
>>       """ Read output from  command. Returns a tuple
>> @@ -653,6 +656,38 @@ def isModeExec(mode):
>>       # otherwise False.
>>       return mode[-3:] == "755"
>>   
>> +def encodeWithUTF8(path, verbose = False):
> Nit: no spaces surrounding `=` in default args.


Fixed


>> +    """ Ensure that the path is encoded as a UTF-8 string
>> +
>> +        Returns bytes(P3)/str(P2)
>> +    """
>> +
> Trailing whitespace.
>
>> +    if isunicode:
>> +        try:
>> +            if isinstance(path, unicode):
>> +                # It is already unicode, cast it as a bytes
>> +                # that is encoded as utf-8.
>> +                return path.encode('utf-8', 'strict')
>> +            path.decode('ascii', 'strict')
>> +        except:
>> +            encoding = 'utf8'
>> +            if gitConfig('git-p4.pathEncoding'):
>> +                encoding = gitConfig('git-p4.pathEncoding')
>> +            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
>> +            if verbose:
>> +                print('\nNOTE:Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, to_unicode(path)))
>> +    else:
> Trailing whitespace.
>
>> +        try:
>> +            path.decode('ascii')
>> +        except:
>> +            encoding = 'utf8'
>> +            if gitConfig('git-p4.pathEncoding'):
>> +                encoding = gitConfig('git-p4.pathEncoding')
>> +            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
>> +            if verbose:
>> +                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
>> +    return path
>> +
>>   class P4Exception(Exception):
>>       """ Base class for exceptions from the p4 client """
>>       def __init__(self, exit_code):
>> @@ -891,6 +926,11 @@ def gitConfigList(key):
>>               _gitConfig[key] = []
>>       return _gitConfig[key]
>>   
>> +def gitConfigSet(key, value):
>> +    """ Set the git configuration key 'key' to 'value' for this session
>> +    """
>> +    _gitConfig[key] = value
>> +
>>   def p4BranchesInGit(branchesAreInRemotes=True):
>>       """Find all the branches whose names start with "p4/", looking
>>          in remotes or heads as specified by the argument.  Return
>> @@ -2814,24 +2854,12 @@ def writeToGitStream(self, gitMode, relPath, contents):
>>               self.gitStream.write(d)
>>           self.gitStream.write('\n')
>>   
>> -    def encodeWithUTF8(self, path):
>> -        try:
>> -            path.decode('ascii')
>> -        except:
>> -            encoding = 'utf8'
>> -            if gitConfig('git-p4.pathEncoding'):
>> -                encoding = gitConfig('git-p4.pathEncoding')
>> -            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
>> -            if self.verbose:
>> -                print('Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path))
>> -        return path
>> -
>>       # output one file from the P4 stream
>>       # - helper for streamP4Files
>>   
>>       def streamOneP4File(self, file, contents):
>>           relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
>> -        relPath = self.encodeWithUTF8(relPath)
>> +        relPath = encodeWithUTF8(relPath, self.verbose)
>>           if verbose:
>>               if 'fileSize' in self.stream_file:
>>                   size = int(self.stream_file['fileSize'])
>> @@ -2914,7 +2942,7 @@ def streamOneP4File(self, file, contents):
>>   
>>       def streamOneP4Deletion(self, file):
>>           relPath = self.stripRepoPath(file['path'], self.branchPrefixes)
>> -        relPath = self.encodeWithUTF8(relPath)
>> +        relPath = encodeWithUTF8(relPath, self.verbose)
>>           if verbose:
>>               sys.stdout.write("delete %s\n" % relPath)
>>               sys.stdout.flush()
>> -- 
>> gitgitgadget
>>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 06/11] git-p4: Fix assumed path separators to be more Windows friendly
  2019-12-05 13:38         ` Junio C Hamano
@ 2019-12-05 19:37           ` Ben Keene
  0 siblings, 0 replies; 46+ messages in thread
From: Ben Keene @ 2019-12-05 19:37 UTC (permalink / raw)
  To: Junio C Hamano, Ben Keene via GitGitGadget; +Cc: git


On 12/5/2019 8:38 AM, Junio C Hamano wrote:
> "Ben Keene via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Ben Keene <seraphire@gmail.com>
>>
>> When a computer is configured to use Git for windows and Python for windows, and not a Unix subsystem like cygwin or WSL, the directory separator changes and causes git-p4 to fail to properly determine paths.
>>
>> Fix 3 path separator errors:
>>
>> 1. getUserCacheFilename should not use string concatenation. Change this code to use os.path.join to build an OS tolerant path.
>> 2. defaultDestiantion used the OS.path.split to split depot paths.  This is incorrect on windows. Change the code to split on a forward slash(/) instead since depot paths use this character regardless  of the operating system.
>> 3. The call to isvalidGitDir() in the main code also used a literal forward slash. Change the cose to use os.path.join to correctly format the path for the operating system.
> s/isvalid/isValid/;
> s/cose/code/;
>
> Also please wrap your lines at around 72 columns (that will let
> reviewers quote what you write (which adds "> " prefix and consumes
> 2 more columns), and would allow us a handful of exchanges (each
> round adding ">" prefix to consume 1 more column) before bumping
> into the right edge of the terminal at 80 columns.
>
>> These three changes allow the suggested windows configuration to properly locate files while retaining the existing behavior on non-windows operating systems.
>>
>> Signed-off-by: Ben Keene <seraphire@gmail.com>
>> (cherry picked from commit a5b45c12c3861638a933b05a1ffee0c83978dcb2)
> As Denton mentioned, general public do not care if you "cherry
> picked" it from your earlier unpublished work.  Remove it.
>
> Aside from these small nits, the proposed log message for this step
> is quite cleanly done and easily readable.  All the decisions are
> clearly written and agreeable.  Nicely done.


Thank you. I've been working through all the commits and updating them.


>> ---
>>   git-p4.py | 13 +++++++++----
>>   1 file changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/git-p4.py b/git-p4.py
>> index 2659531c2e..7ac8cb42ef 100755
>> --- a/git-p4.py
>> +++ b/git-p4.py
>> @@ -1454,8 +1454,10 @@ def p4UserIsMe(self, p4User):
>>               return True
>>   
>>       def getUserCacheFilename(self):
>> +        """ Returns the filename of the username cache
>> +	    """
> Inconsistent use of spaces and a tab I see on these two lines.
> Intended?

Good catch! It should have been spaces.  Corrected.


>
>>           home = os.environ.get("HOME", os.environ.get("USERPROFILE"))
>> -        return home + "/.gitp4-usercache.txt"
>> +        return os.path.join(home, ".gitp4-usercache.txt")
>>   
>>       def getUserMapFromPerforceServer(self):
>>           if self.userMapFromPerforceServer:
>> @@ -3973,13 +3975,16 @@ def __init__(self):
>>           self.cloneBare = False
>>   
>>       def defaultDestination(self, args):
>> +        """ Returns the last path component as the default git
>> +            repository directory name
>> +        """
>>           ## TODO: use common prefix of args?
>>           depotPath = args[0]
>>           depotDir = re.sub("(@[^@]*)$", "", depotPath)
>>           depotDir = re.sub("(#[^#]*)$", "", depotDir)
>>           depotDir = re.sub(r"\.\.\.$", "", depotDir)
>>           depotDir = re.sub(r"/$", "", depotDir)
>> -        return os.path.split(depotDir)[1]
>> +        return depotDir.split('/')[-1]
>>   
>>       def run(self, args):
>>           if len(args) < 1:
>> @@ -4252,8 +4257,8 @@ def main():
>>                           chdir(cdup);
>>   
>>           if not isValidGitDir(cmd.gitdir):
>> -            if isValidGitDir(cmd.gitdir + "/.git"):
>> -                cmd.gitdir += "/.git"
>> +            if isValidGitDir(os.path.join(cmd.gitdir, ".git")):
>> +                cmd.gitdir = os.path.join(cmd.gitdir, ".git")
>>               else:
>>                   die("fatal: cannot locate git repository at %s" % cmd.gitdir)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 07/11] git-p4: Add a helper class for stream writing
  2019-12-05 13:42         ` Junio C Hamano
@ 2019-12-05 19:52           ` Ben Keene
  0 siblings, 0 replies; 46+ messages in thread
From: Ben Keene @ 2019-12-05 19:52 UTC (permalink / raw)
  To: Junio C Hamano, Ben Keene via GitGitGadget; +Cc: git


On 12/5/2019 8:42 AM, Junio C Hamano wrote:
> "Ben Keene via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Ben Keene <seraphire@gmail.com>
>>
>> This is a transtional commit that does not change current behvior.  It adds a new class Py23File.
> Perhaps s/transitional/preparatory/?  It does not change the
> behaviour because nobody uses the class yet, if I understand
> correctly.  Which is fine.
>
> It is kind of surprising that each project needs to reinvent and
> maintain a wrapper class like this one, as what the new class does
> smells quite generic.

It is a rather generic class.  My intention was to avoid adding
any additional dependencies so a small class that only implements
the few methods we need seemed safest.

I cleaned up this commit message as well.

>> Following the Python recommendation of keeping text as unicode internally and only converting to and from bytes on input and output, this class provides an interface for the methods used for reading and writing files and file like streams.
>>
>> Create a class that wraps the input and output functions used by the git-p4.py code for reading and writing to standard file handles.
>>
>> The methods of this class should take a Unicode string for writing and return unicode strings in reads.  This class should be a drop-in for existing file like streams
>>
>> The following methods should be coded for supporting existing read/write calls:
>> * write - this should write a Unicode string to the underlying stream
>> * read - this should read from the underlying stream and cast the bytes as a unicode string
>> * readline - this should read one line of text from the underlying stream and cast it as a unicode string
>> * readline - this should read a number of lines, optionally hinted, and cast each line as a unicode string
>>
>> The expression "cast as a unicode string" is used because the code should use the AS_BYTES() and AS_UNICODE() functions instead of cohercing the data to actual unicode strings or bytes.  This allows python 2 code to continue to use the internal "str" data type instead of converting the data back and forth to actual unicode strings. This retains current python2 support while python3 support may be incomplete.
>>
>> Signed-off-by: Ben Keene <seraphire@gmail.com>
>> (cherry picked from commit 12919111fbaa3e4c0c4c2fdd4f79744cc683d860)
>> ---
>>   git-p4.py | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 66 insertions(+)
>>
>> diff --git a/git-p4.py b/git-p4.py
>> index 7ac8cb42ef..0da640be93 100755
>> --- a/git-p4.py
>> +++ b/git-p4.py
>> @@ -4182,6 +4182,72 @@ def run(self, args):
>>               print("%s <= %s (%s)" % (branch, ",".join(settings["depot-paths"]), settings["change"]))
>>           return True
>>   
>> +class Py23File():
>> +    """ Python2/3 Unicode File Wrapper
>> +    """
>> +
>> +    stream_handle = None
>> +    verbose       = False
>> +    debug_handle  = None
>> +
>> +    def __init__(self, stream_handle, verbose = False):
>> +        """ Create a Python3 compliant Unicode to Byte String
>> +            Windows compatible wrapper
>> +
>> +            stream_handle = the underlying file-like handle
>> +            verbose       = Boolean if content should be echoed
>> +        """
>> +        self.stream_handle = stream_handle
>> +        self.verbose       = verbose
>> +
>> +    def write(self, utf8string):
>> +        """ Writes the utf8 encoded string to the underlying
>> +            file stream
>> +        """
>> +        self.stream_handle.write(as_bytes(utf8string))
>> +        if self.verbose:
>> +            sys.stderr.write("Stream Output: %s" % utf8string)
>> +            sys.stderr.flush()
>> +
>> +    def read(self, size = None):
>> +        """ Reads int charcters from the underlying stream
>> +            and converts it to utf8.
>> +
>> +            Be aware, the size value is for reading the underlying
>> +            bytes so the value may be incorrect. Usage of the size
>> +            value is discouraged.
>> +        """
>> +        if size == None:
>> +            return as_string(self.stream_handle.read())
>> +        else:
>> +            return as_string(self.stream_handle.read(size))
>> +
>> +    def readline(self):
>> +        """ Reads a line from the underlying byte stream
>> +            and converts it to utf8
>> +        """
>> +        return as_string(self.stream_handle.readline())
>> +
>> +    def readlines(self, sizeHint = None):
>> +        """ Returns a list containing lines from the file converted to unicode.
>> +
>> +            sizehint - Optional. If the optional sizehint argument is
>> +            present, instead of reading up to EOF, whole lines totalling
>> +            approximately sizehint bytes are read.
>> +        """
>> +        lines = self.stream_handle.readlines(sizeHint)
>> +        for i in range(0, len(lines)):
>> +            lines[i] = as_string(lines[i])
>> +        return lines
>> +
>> +    def close(self):
>> +        """ Closes the underlying byte stream """
>> +        self.stream_handle.close()
>> +
>> +    def flush(self):
>> +        """ Flushes the underlying byte stream """
>> +        self.stream_handle.flush()
>> +
>>   class HelpFormatter(optparse.IndentedHelpFormatter):
>>       def __init__(self):
>>           optparse.IndentedHelpFormatter.__init__(self)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 08/11] git-p4: p4CmdList - support Unicode encoding
  2019-12-05 13:55         ` Junio C Hamano
@ 2019-12-05 20:23           ` Ben Keene
  0 siblings, 0 replies; 46+ messages in thread
From: Ben Keene @ 2019-12-05 20:23 UTC (permalink / raw)
  To: Junio C Hamano, Ben Keene via GitGitGadget; +Cc: git


On 12/5/2019 8:55 AM, Junio C Hamano wrote:
> "Ben Keene via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Ben Keene <seraphire@gmail.com>
>>
>> The p4CmdList is a commonly used function in the git-p4 code. It is used to execute a command in P4 and return the results of the call in a list.
> Somewhere in the midway of the series, the log message starts using
> all-caps AS_STRING and AS_BYTES to describe some specific things,
> and it would help readers if the first one of these steps explain
> what they mean (I am guessing AS_STRING is an unicode object in both
> Python 2 and 3, and AS_BYTES is a plain vanilla string in Python 2,
> or something like that?).

I rewrote almost the entire commit message. Hopefully this will clarify 
the code.

>> Change this code to take a new optional parameter, encode_data that will optionally convert the data AS_STRING() that isto be returned by the function.
> s/isto/is to/;
>
> This sentence is a bit hard to read.
>
> This change does not make the function optionally convert the input
> we feed to the p4 command---it only changes the values in the
> command output.  But the readers cannot tell that easily until
> reading to the very end of the sentence, i.e. "returned by the
> function", as written.
>
> We probably want to be a bit more explicit to say what gets
> converted; perhaps renaming the parameter to encode_cmd_output may
> help.


I renamed the parameter as suggested.


>> Change the code so that the key will always be encoded AS_STRING()
> s/key/key of the returned hash/ or something to clarify what key you
> are talking about.
>
>> Data that is passed for standard input (stdin) should be AS_BYTES() to ensure unicode text that is supplied will be written out as bytes.
> "Data that is passed to the standard input stream of the p4 process"
> to clarify whose standard input you are talking about (iow, "git p4"
> also has and it may use its standard input, but this function does
> not muck with it).
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3
  2019-12-05 18:51           ` Denton Liu
@ 2019-12-05 20:47             ` Ben Keene
  0 siblings, 0 replies; 46+ messages in thread
From: Ben Keene @ 2019-12-05 20:47 UTC (permalink / raw)
  To: Denton Liu; +Cc: Luke Diamand, Git Users, Junio C Hamano


On 12/5/2019 1:51 PM, Denton Liu wrote:
> On Thu, Dec 05, 2019 at 11:16:27AM -0500, Ben Keene wrote:
>> On 12/5/2019 4:54 AM, Luke Diamand wrote:
>>> On Wed, 4 Dec 2019 at 22:29, Ben Keene via GitGitGadget
>>> - There are some bits of trailing whitespace around - can you strip
>>> those out? You can use "git diff --check".
>>
>> Is there a way that I can find out which branches I need to remove white
>> space from now that they have been committed?
> I'm assuming you mean commits? You can run
>
> 	git log --check master..
>
> and git will highlight the whitespace errors.
Yes, that's exactly what I meant.  Thank you.

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, back to index

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-13 21:07 [PATCH 0/1] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
2019-11-13 21:07 ` [PATCH 1/1] " Ben Keene via GitGitGadget
2019-11-14  2:25 ` [PATCH 0/1] git-p4.py: " Junio C Hamano
2019-11-14  9:46   ` Luke Diamand
2019-11-15 14:39 ` [PATCH v2 0/3] " Ben Keene via GitGitGadget
2019-11-15 14:39   ` [PATCH v2 1/3] " Ben Keene via GitGitGadget
2019-11-15 14:39   ` [PATCH v2 2/3] FIX: cast as unicode fails when a value is already unicode Ben Keene via GitGitGadget
2019-11-15 14:39   ` [PATCH v2 3/3] FIX: wrap return for read_pipe_lines in ustring() and wrap GitLFS read of the pointer file in ustring() Ben Keene via GitGitGadget
2019-12-02 19:02   ` [PATCH v3 0/1] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
2019-12-02 19:02     ` [PATCH v3 1/1] Python3 support for t9800 tests. Basic P4/Python3 support Ben Keene via GitGitGadget
2019-12-03  0:18       ` Denton Liu
2019-12-03 16:03         ` Ben Keene
2019-12-04  6:14           ` Denton Liu
2019-12-04 22:29     ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Ben Keene via GitGitGadget
2019-12-04 22:29       ` [PATCH v4 01/11] git-p4: select p4 binary by operating-system Ben Keene via GitGitGadget
2019-12-05 10:19         ` Denton Liu
2019-12-05 16:32           ` Ben Keene
2019-12-04 22:29       ` [PATCH v4 02/11] git-p4: change the expansion test from basestring to list Ben Keene via GitGitGadget
2019-12-05 10:27         ` Denton Liu
2019-12-05 17:05           ` Ben Keene
2019-12-04 22:29       ` [PATCH v4 03/11] git-p4: add new helper functions for python3 conversion Ben Keene via GitGitGadget
2019-12-05 10:40         ` Denton Liu
2019-12-05 18:42           ` Ben Keene
2019-12-04 22:29       ` [PATCH v4 04/11] git-p4: python3 syntax changes Ben Keene via GitGitGadget
2019-12-05 11:02         ` Denton Liu
2019-12-04 22:29       ` [PATCH v4 05/11] git-p4: Add new functions in preparation of usage Ben Keene via GitGitGadget
2019-12-05 10:50         ` Denton Liu
2019-12-05 19:23           ` Ben Keene
2019-12-04 22:29       ` [PATCH v4 06/11] git-p4: Fix assumed path separators to be more Windows friendly Ben Keene via GitGitGadget
2019-12-05 13:38         ` Junio C Hamano
2019-12-05 19:37           ` Ben Keene
2019-12-04 22:29       ` [PATCH v4 07/11] git-p4: Add a helper class for stream writing Ben Keene via GitGitGadget
2019-12-05 13:42         ` Junio C Hamano
2019-12-05 19:52           ` Ben Keene
2019-12-04 22:29       ` [PATCH v4 08/11] git-p4: p4CmdList - support Unicode encoding Ben Keene via GitGitGadget
2019-12-05 13:55         ` Junio C Hamano
2019-12-05 20:23           ` Ben Keene
2019-12-04 22:29       ` [PATCH v4 09/11] git-p4: Add usability enhancements Ben Keene via GitGitGadget
2019-12-05 14:04         ` Junio C Hamano
2019-12-05 15:40           ` Ben Keene
2019-12-04 22:29       ` [PATCH v4 10/11] git-p4: Support python3 for basic P4 clone, sync, and submit Ben Keene via GitGitGadget
2019-12-04 22:29       ` [PATCH v4 11/11] git-p4: Added --encoding parameter to p4 clone Ben Keene via GitGitGadget
2019-12-05  9:54       ` [PATCH v4 00/11] git-p4.py: Cast byte strings to unicode strings in python3 Luke Diamand
2019-12-05 16:16         ` Ben Keene
2019-12-05 18:51           ` Denton Liu
2019-12-05 20:47             ` Ben Keene

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git