* [PATCH 0/4] git-p4: fix RCS keyword processing encoding errors
@ 2021-12-13 22:54 Joel Holdsworth
  2021-12-13 22:54 ` [PATCH 1/4] git-p4: use with statements to close files after use in patchRCSKeywords Joel Holdsworth
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Joel Holdsworth @ 2021-12-13 22:54 UTC (permalink / raw)
  To: git
  Cc: Tzadik Vanderhoof, Dorgon Chang, Joachim Kuebart, Daniel Levin,
	Johannes Schindelin, Luke Diamand, Ben Keene, Andrew Oakley,
	Joel Holdsworth

This patch-set fixes a family of issues with git-p4's handling of
incoming text data that contains RCS keywords, when those files contain
bytes which are invalid UTF-8 codes.

Among the patches is a fix for the issue, as well as some peripheral
tidy-ups and improvements to the existing code.

This patch-set is compatible and has been tested with both Python 2 and
3, and includes a test.

Joel Holdsworth (4):
  git-p4: use with statements to close files after use in
    patchRCSKeywords
  git-p4: pre-compile RCS keyword regexes
  git-p4: add raw option to read_pipelines
  git-p4: resolve RCS keywords in binary

 git-p4.py             | 66 ++++++++++++++++++-------------------------
 t/t9810-git-p4-rcs.sh | 15 ++++++++++
 2 files changed, 42 insertions(+), 39 deletions(-)

-- 
2.33.0



* [PATCH 1/4] git-p4: use with statements to close files after use in patchRCSKeywords
  2021-12-13 22:54 [PATCH 0/4] git-p4: fix RCS keyword processing encoding errors Joel Holdsworth
@ 2021-12-13 22:54 ` Joel Holdsworth
  2021-12-13 22:54 ` [PATCH 2/4] git-p4: pre-compile RCS keyword regexes Joel Holdsworth
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Joel Holdsworth @ 2021-12-13 22:54 UTC (permalink / raw)
  To: git
  Cc: Tzadik Vanderhoof, Dorgon Chang, Joachim Kuebart, Daniel Levin,
	Johannes Schindelin, Luke Diamand, Ben Keene, Andrew Oakley,
	Joel Holdsworth

Python with statements are used to wrap the execution of a block of code
so that an object can be safely released when execution leaves the
scope.

They are desirable for improving code tidiness, and to ensure that
objects are properly destroyed even when exceptions are thrown.
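
As a minimal sketch of the property described above (not code from the patch; the temporary file and the simulated error are purely illustrative), a file opened in a with statement is closed even when the block raises:

```python
import os
import tempfile

# Illustrative only: open a temp file, fail part-way through writing,
# and observe that the with statement still closed the file object.
handle, path = tempfile.mkstemp()
try:
    with os.fdopen(handle, "w") as out_file:
        out_file.write("partial\n")
        raise RuntimeError("simulated failure mid-write")
except RuntimeError:
    pass

closed_after_error = out_file.closed  # the with block closed it on the way out
os.unlink(path)
```

With the explicit open()/close() pairs that the patch removes, the same exception would have skipped the close() calls.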

Signed-off-by: Joel Holdsworth <jholdsworth@nvidia.com>
---
 git-p4.py | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 2b4500226a..226cdef424 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -1757,14 +1757,11 @@ def patchRCSKeywords(self, file, pattern):
         # Attempt to zap the RCS keywords in a p4 controlled file matching the given pattern
         (handle, outFileName) = tempfile.mkstemp(dir='.')
         try:
-            outFile = os.fdopen(handle, "w+")
-            inFile = open(file, "r")
-            regexp = re.compile(pattern, re.VERBOSE)
-            for line in inFile.readlines():
-                line = regexp.sub(r'$\1$', line)
-                outFile.write(line)
-            inFile.close()
-            outFile.close()
+            with os.fdopen(handle, "w+") as outFile, open(file, "r") as inFile:
+                regexp = re.compile(pattern, re.VERBOSE)
+                for line in inFile.readlines():
+                    line = regexp.sub(r'$\1$', line)
+                    outFile.write(line)
             # Forcibly overwrite the original file
             os.unlink(file)
             shutil.move(outFileName, file)
-- 
2.33.0



* [PATCH 2/4] git-p4: pre-compile RCS keyword regexes
  2021-12-13 22:54 [PATCH 0/4] git-p4: fix RCS keyword processing encoding errors Joel Holdsworth
  2021-12-13 22:54 ` [PATCH 1/4] git-p4: use with statements to close files after use in patchRCSKeywords Joel Holdsworth
@ 2021-12-13 22:54 ` Joel Holdsworth
  2021-12-13 22:54 ` [PATCH 3/4] git-p4: add raw option to read_pipelines Joel Holdsworth
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Joel Holdsworth @ 2021-12-13 22:54 UTC (permalink / raw)
  To: git
  Cc: Tzadik Vanderhoof, Dorgon Chang, Joachim Kuebart, Daniel Levin,
	Johannes Schindelin, Luke Diamand, Ben Keene, Andrew Oakley,
	Joel Holdsworth

Previously git-p4.py would compile one of two regular expressions for
every RCS keyword-enabled file. This patch simplifies the code by
pre-compiling the two regular expressions when the script first
loads.
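
A rough sketch of the idea, using the two patterns from the diff below. The keywords_regexp helper is a hypothetical, simplified stand-in for p4_keywords_regexp_for_type, and the sample line is invented for illustration:

```python
import re

# Patterns as they appear in this patch, compiled once at import time.
re_ko_keywords = re.compile(r'\$(Id|Header)(:[^$\n]+)?\$')
re_k_keywords = re.compile(
    r'\$(Id|Header|Author|Date|DateTime|Change|File|Revision)(:[^$\n]+)?\$')


def keywords_regexp(type_mods):
    """Hypothetical helper: pick the pre-compiled regex for the type modifiers."""
    if "ko" in type_mods:
        return re_ko_keywords
    if "k" in type_mods:
        return re_k_keywords
    return None


# An expanded keyword line (made-up content) collapsed back to its bare form.
line = "rev: $Revision: #3 $ by $Author: alice $\n"
collapsed = keywords_regexp(["k"]).sub(r'$\1$', line)
```

Callers then receive a compiled pattern object directly, so no per-file re.compile() call is needed.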

Signed-off-by: Joel Holdsworth <jholdsworth@nvidia.com>
---
 git-p4.py | 48 ++++++++++++++++++------------------------------
 1 file changed, 18 insertions(+), 30 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 226cdef424..0af83b9c72 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -56,6 +56,9 @@
 
 p4_access_checked = False
 
+re_ko_keywords = re.compile(r'\$(Id|Header)(:[^$\n]+)?\$')
+re_k_keywords = re.compile(r'\$(Id|Header|Author|Date|DateTime|Change|File|Revision)(:[^$\n]+)?\$')
+
 def p4_build_cmd(cmd):
     """Build a suitable p4 command line.
 
@@ -577,20 +580,12 @@ def p4_type(f):
 #
 def p4_keywords_regexp_for_type(base, type_mods):
     if base in ("text", "unicode", "binary"):
-        kwords = None
         if "ko" in type_mods:
-            kwords = 'Id|Header'
+            return re_ko_keywords
         elif "k" in type_mods:
-            kwords = 'Id|Header|Author|Date|DateTime|Change|File|Revision'
+            return re_k_keywords
         else:
             return None
-        pattern = r"""
-            \$              # Starts with a dollar, followed by...
-            (%s)            # one of the keywords, followed by...
-            (:[^$\n]+)?     # possibly an old expansion, followed by...
-            \$              # another dollar
-            """ % kwords
-        return pattern
     else:
         return None
 
@@ -1753,15 +1748,13 @@ def prepareLogMessage(self, template, message, jobs):
 
         return result
 
-    def patchRCSKeywords(self, file, pattern):
-        # Attempt to zap the RCS keywords in a p4 controlled file matching the given pattern
+    def patchRCSKeywords(self, file, regexp):
+        # Attempt to zap the RCS keywords in a p4 controlled file matching the given regex
         (handle, outFileName) = tempfile.mkstemp(dir='.')
         try:
             with os.fdopen(handle, "w+") as outFile, open(file, "r") as inFile:
-                regexp = re.compile(pattern, re.VERBOSE)
                 for line in inFile.readlines():
-                    line = regexp.sub(r'$\1$', line)
-                    outFile.write(line)
+                    outFile.write(regexp.sub(r'$\1$', line))
             # Forcibly overwrite the original file
             os.unlink(file)
             shutil.move(outFileName, file)
@@ -2088,25 +2081,22 @@ def applyCommit(self, id):
             # the patch to see if that's possible.
             if gitConfigBool("git-p4.attemptRCSCleanup"):
                 file = None
-                pattern = None
                 kwfiles = {}
                 for file in editedFiles | filesToDelete:
                     # did this file's delta contain RCS keywords?
-                    pattern = p4_keywords_regexp_for_file(file)
-
-                    if pattern:
+                    regexp = p4_keywords_regexp_for_file(file)
+                    if regexp:
                         # this file is a possibility...look for RCS keywords.
-                        regexp = re.compile(pattern, re.VERBOSE)
                         for line in read_pipe_lines(["git", "diff", "%s^..%s" % (id, id), file]):
                             if regexp.search(line):
                                 if verbose:
-                                    print("got keyword match on %s in %s in %s" % (pattern, line, file))
-                                kwfiles[file] = pattern
+                                    print("got keyword match on %s in %s in %s" % (regexp.pattern, line, file))
+                                kwfiles[file] = regexp
                                 break
 
-                for file in kwfiles:
+                for file, regexp in kwfiles.items():
                     if verbose:
-                        print("zapping %s with %s" % (line,pattern))
+                        print("zapping %s with %s" % (line, regexp.pattern))
                     # File is being deleted, so not open in p4.  Must
                     # disable the read-only bit on windows.
                     if self.isWindows and file not in editedFiles:
@@ -3026,12 +3016,10 @@ def streamOneP4File(self, file, contents):
 
         # Note that we do not try to de-mangle keywords on utf16 files,
         # even though in theory somebody may want that.
-        pattern = p4_keywords_regexp_for_type(type_base, type_mods)
-        if pattern:
-            regexp = re.compile(pattern, re.VERBOSE)
-            text = ''.join(decode_text_stream(c) for c in contents)
-            text = regexp.sub(r'$\1$', text)
-            contents = [ encode_text_stream(text) ]
+        regexp = p4_keywords_regexp_for_type(type_base, type_mods)
+        if regexp:
+            contents = [encode_text_stream(regexp.sub(
+                r'$\1$', ''.join(decode_text_stream(c) for c in contents)))]
 
         if self.largeFileSystem:
             (git_mode, contents) = self.largeFileSystem.processContent(git_mode, relPath, contents)
-- 
2.33.0



* [PATCH 3/4] git-p4: add raw option to read_pipelines
  2021-12-13 22:54 [PATCH 0/4] git-p4: fix RCS keyword processing encoding errors Joel Holdsworth
  2021-12-13 22:54 ` [PATCH 1/4] git-p4: use with statements to close files after use in patchRCSKeywords Joel Holdsworth
  2021-12-13 22:54 ` [PATCH 2/4] git-p4: pre-compile RCS keyword regexes Joel Holdsworth
@ 2021-12-13 22:54 ` Joel Holdsworth
  2021-12-13 22:54 ` [PATCH 4/4] git-p4: resolve RCS keywords in binary Joel Holdsworth
  2021-12-14 22:36 ` [PATCH 0/4] git-p4: fix RCS keyword processing encoding errors Andrew Oakley
  4 siblings, 0 replies; 9+ messages in thread
From: Joel Holdsworth @ 2021-12-13 22:54 UTC (permalink / raw)
  To: git
  Cc: Tzadik Vanderhoof, Dorgon Chang, Joachim Kuebart, Daniel Levin,
	Johannes Schindelin, Luke Diamand, Ben Keene, Andrew Oakley,
	Joel Holdsworth

Previously the read_pipe_lines function always decoded the result lines.
In order to improve support for non-decoded binary processing of data in
git-p4.py, this patch adds a raw option to the function that allows
decoding to be disabled.
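
The effect can be sketched as follows on Python 3. The read_lines function is a hypothetical stand-in, not the git-p4 function itself, and it assumes that decoding means UTF-8; the child command simply emits two CP1252 smart-quote bytes:

```python
import subprocess
import sys


def read_lines(cmd, raw=False):
    """Hypothetical stand-in for read_pipe_lines with a raw switch."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    lines = p.stdout.readlines()
    p.stdout.close()
    if p.wait():
        raise RuntimeError('Command failed: %s' % str(cmd))
    if not raw:
        # Assumption: the default path decodes each line as UTF-8.
        lines = [line.decode('utf-8') for line in lines]
    return lines


# Child process that writes bytes 0x93 and 0x94 around some ASCII text.
cmd = [sys.executable, "-c",
       "import sys; sys.stdout.buffer.write(b'\\x93hi\\x94\\n')"]
raw_lines = read_lines(cmd, raw=True)
```

With raw=True the caller gets the bytes back untouched; with the default, the same command raises UnicodeDecodeError in this sketch.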

Signed-off-by: Joel Holdsworth <jholdsworth@nvidia.com>
---
 git-p4.py | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 0af83b9c72..509feac2d8 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -340,17 +340,19 @@ def p4_read_pipe(c, ignore_error=False, raw=False):
     real_cmd = p4_build_cmd(c)
     return read_pipe(real_cmd, ignore_error, raw=raw)
 
-def read_pipe_lines(c):
+def read_pipe_lines(c, raw=False):
     if verbose:
         sys.stderr.write('Reading pipe: %s\n' % str(c))
 
     expand = not isinstance(c, list)
     p = subprocess.Popen(c, stdout=subprocess.PIPE, shell=expand)
     pipe = p.stdout
-    val = [decode_text_stream(line) for line in pipe.readlines()]
+    lines = pipe.readlines()
+    if not raw:
+        lines = [decode_text_stream(line) for line in lines]
     if pipe.close() or p.wait():
         die('Command failed: %s' % str(c))
-    return val
+    return lines
 
 def p4_read_pipe_lines(c):
     """Specifically invoke p4 on the command supplied. """
-- 
2.33.0



* [PATCH 4/4] git-p4: resolve RCS keywords in binary
  2021-12-13 22:54 [PATCH 0/4] git-p4: fix RCS keyword processing encoding errors Joel Holdsworth
                   ` (2 preceding siblings ...)
  2021-12-13 22:54 ` [PATCH 3/4] git-p4: add raw option to read_pipelines Joel Holdsworth
@ 2021-12-13 22:54 ` Joel Holdsworth
  2021-12-13 23:34   ` Junio C Hamano
  2021-12-14 22:36 ` [PATCH 0/4] git-p4: fix RCS keyword processing encoding errors Andrew Oakley
  4 siblings, 1 reply; 9+ messages in thread
From: Joel Holdsworth @ 2021-12-13 22:54 UTC (permalink / raw)
  To: git
  Cc: Tzadik Vanderhoof, Dorgon Chang, Joachim Kuebart, Daniel Levin,
	Johannes Schindelin, Luke Diamand, Ben Keene, Andrew Oakley,
	Joel Holdsworth

RCS keywords are strings that are replaced with information from
Perforce. Examples include $Date$, $Author$, $File$, $Change$ etc.

Perforce resolves these by expanding them with their expanded values
when files are synced, but Git's data model requires these expanded
values to be converted back into their unexpanded form.

Previously, git-p4.py would implement this behaviour through the use of
regular expressions. However, the regular expression substitution was
applied using decoded strings, i.e. the content of incoming commit diffs
was first decoded from bytes as UTF-8 text, processed with regular
expressions, then encoded back to bytes.

Not only is this behaviour inefficient, but it is also the cause of a
common failure with text files containing invalid UTF-8 data. For
files created in Windows, CP1252 Smart Quote Characters (0x93 and 0x94)
are seen fairly frequently. These codes are invalid in UTF-8, so if the
script encounters any file containing them, on Python 2 the symbols
are corrupted, and on Python 3 the script fails with an exception.
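
The Python 3 failure mode can be reproduced in isolation. This is only an illustration with invented sample data, not code from git-p4:

```python
# CP1252 smart quotes (0x93, 0x94) around plain ASCII text.
data = b'\x93quoted\x94\n'

try:
    data.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    # 0x93 is a UTF-8 continuation byte and cannot start a character.
    utf8_ok = False

# The same bytes are valid when interpreted as CP1252.
as_cp1252 = data.decode('cp1252')
```

Here utf8_ok ends up False, while as_cp1252 holds the text with U+201C and U+201D quotation marks.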

This patch replaces this decoding/encoding with bytes object regular
expressions, so that the substitution is performed directly upon the
source data with no conversions.
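
A small sketch of the byte-wise substitution, using the re_ko_keywords pattern from the diff below; the depot path and revision in the sample chunk are invented:

```python
import re

# Bytes pattern as introduced by this patch.
re_ko_keywords = re.compile(br'\$(Id|Header)(:[^$\n]+)?\$')

# Expanded $Id$ keyword surrounded by CP1252 smart-quote bytes.
chunk = b'\x93$Id: //depot/fileko#1 $\x94\n'

# The replacement template must also be bytes when the pattern is bytes.
cleaned = re_ko_keywords.sub(br'$\1$', chunk)
```

The bytes 0x93 and 0x94 pass through unchanged because nothing is ever decoded.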

A test for smart quote handling has been added to the
t9810-git-p4-rcs.sh test suite.

Signed-off-by: Joel Holdsworth <jholdsworth@nvidia.com>
---
 git-p4.py             | 15 ++++++++-------
 t/t9810-git-p4-rcs.sh | 15 +++++++++++++++
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 509feac2d8..986595bef0 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -56,8 +56,8 @@
 
 p4_access_checked = False
 
-re_ko_keywords = re.compile(r'\$(Id|Header)(:[^$\n]+)?\$')
-re_k_keywords = re.compile(r'\$(Id|Header|Author|Date|DateTime|Change|File|Revision)(:[^$\n]+)?\$')
+re_ko_keywords = re.compile(br'\$(Id|Header)(:[^$\n]+)?\$')
+re_k_keywords = re.compile(br'\$(Id|Header|Author|Date|DateTime|Change|File|Revision)(:[^$\n]+)?\$')
 
 def p4_build_cmd(cmd):
     """Build a suitable p4 command line.
@@ -1754,9 +1754,9 @@ def patchRCSKeywords(self, file, regexp):
         # Attempt to zap the RCS keywords in a p4 controlled file matching the given regex
         (handle, outFileName) = tempfile.mkstemp(dir='.')
         try:
-            with os.fdopen(handle, "w+") as outFile, open(file, "r") as inFile:
+            with os.fdopen(handle, "wb") as outFile, open(file, "rb") as inFile:
                 for line in inFile.readlines():
-                    outFile.write(regexp.sub(r'$\1$', line))
+                    outFile.write(regexp.sub(br'$\1$', line))
             # Forcibly overwrite the original file
             os.unlink(file)
             shutil.move(outFileName, file)
@@ -2089,7 +2089,9 @@ def applyCommit(self, id):
                     regexp = p4_keywords_regexp_for_file(file)
                     if regexp:
                         # this file is a possibility...look for RCS keywords.
-                        for line in read_pipe_lines(["git", "diff", "%s^..%s" % (id, id), file]):
+                        for line in read_pipe_lines(
+                            ["git", "diff", "%s^..%s" % (id, id), file],
+                            raw=True):
                             if regexp.search(line):
                                 if verbose:
                                     print("got keyword match on %s in %s in %s" % (regexp.pattern, line, file))
@@ -3020,8 +3022,7 @@ def streamOneP4File(self, file, contents):
         # even though in theory somebody may want that.
         regexp = p4_keywords_regexp_for_type(type_base, type_mods)
         if regexp:
-            contents = [encode_text_stream(regexp.sub(
-                r'$\1$', ''.join(decode_text_stream(c) for c in contents)))]
+            contents = [regexp.sub(br'$\1$', c) for c in contents]
 
         if self.largeFileSystem:
             (git_mode, contents) = self.largeFileSystem.processContent(git_mode, relPath, contents)
diff --git a/t/t9810-git-p4-rcs.sh b/t/t9810-git-p4-rcs.sh
index e3836888ec..5fe83315ec 100755
--- a/t/t9810-git-p4-rcs.sh
+++ b/t/t9810-git-p4-rcs.sh
@@ -4,6 +4,8 @@ test_description='git p4 rcs keywords'
 
 . ./lib-git-p4.sh
 
+CP1252="\223\224"
+
 test_expect_success 'start p4d' '
 	start_p4d
 '
@@ -32,6 +34,9 @@ test_expect_success 'init depot' '
 		p4 submit -d "filek" &&
 		p4 add -t text+ko fileko &&
 		p4 submit -d "fileko" &&
+		printf "$CP1252" >fileko_cp1252 &&
+		p4 add -t text+ko fileko_cp1252 &&
+		p4 submit -d "fileko_cp1252" &&
 		p4 add -t text file_text &&
 		p4 submit -d "file_text"
 	)
@@ -359,4 +364,14 @@ test_expect_failure 'Add keywords in git which do not match the default p4 value
 	)
 '
 
+test_expect_success 'check cp1252 smart quotes are preserved through RCS keyword processing' '
+	test_when_finished cleanup_git &&
+	git p4 clone --dest="$git" //depot &&
+	(
+		cd "$git" &&
+		printf "$CP1252" >expect &&
+		test_cmp_bin expect fileko_cp1252
+	)
+'
+
 test_done
-- 
2.33.0



* Re: [PATCH 4/4] git-p4: resolve RCS keywords in binary
  2021-12-13 22:54 ` [PATCH 4/4] git-p4: resolve RCS keywords in binary Joel Holdsworth
@ 2021-12-13 23:34   ` Junio C Hamano
  2021-12-14 13:12     ` Joel Holdsworth
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2021-12-13 23:34 UTC (permalink / raw)
  To: Joel Holdsworth
  Cc: git, Tzadik Vanderhoof, Dorgon Chang, Joachim Kuebart,
	Daniel Levin, Johannes Schindelin, Luke Diamand, Ben Keene,
	Andrew Oakley

Joel Holdsworth <jholdsworth@nvidia.com> writes:

> RCS keywords are strings that will are replaced with information from
> Perforce. Examples include $Date$, $Author$, $File$, $Change$ etc.
>
> Perforce resolves these by expanding them with their expanded values
> when files are synced, but Git's data model requires these expanded
> values to be converted back into their unexpanded form.
>
> Previously, git-p4.py would implement this behaviour through the use of
> regular expressions. However, the regular expression substitution was
> applied using decoded strings i.e. the content of incoming commit diffs
> was first decoded from bytes into UTF-8, processed with regular
> expressions, then converted back to bytes.
>
> Not only is this behaviour inefficient, but it is also a cause of a
> common issue caused by text files containing invalid UTF-8 data. For
> files created in Windows, CP1252 Smart Quote Characters (0x93 and 0x94)
> are seen fairly frequently. These codes are invalid in UTF-8, so if the
> script encountered any file containing them, on Python 2 the symbols
> will be corrupted, and on Python 3 the script will fail with an
> exception.

Makes sense, and I am with others who commented on the previous
discussion thread that the right approach to take is to take the
stuff coming from Perforce as byte strings, process them as such and
write them out as byte strings, UNLESS we positively know what the
source and destination encodings are.

And this change we see here, matching with patterns, is perfectly in
line with that direction.  Very nice.

>          try:
> -            with os.fdopen(handle, "w+") as outFile, open(file, "r") as inFile:
> +            with os.fdopen(handle, "wb") as outFile, open(file, "rb") as inFile:

We seem to have lost "w+" and now it is "wb".  I do not see a reason
to make outFile anything but write-only, so the end result looks
good to me, but is it an unrelated "bug"fix that should be explained
as such (e.g. "there is no reason to make outFile read-write, so
instead of using 'w+' just use 'wb' while we make it unencoded
output by adding 'b' to it")?
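
For reference, the difference between the modes can be checked directly. This is an illustrative sketch using a throwaway temporary file, not part of the patch:

```python
import os
import tempfile

handle, path = tempfile.mkstemp()
with os.fdopen(handle, "wb") as out_file:
    # "wb" yields a binary, write-only file object; "w+" would also be readable.
    write_only = out_file.writable() and not out_file.readable()
    out_file.write(b'\x93\x94')

with open(path, "rb") as in_file:
    round_trip = in_file.read()
os.unlink(path)
```

The 'b' is what makes the output unencoded; dropping the '+' only removes read access that the function never used.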


* RE: [PATCH 4/4] git-p4: resolve RCS keywords in binary
  2021-12-13 23:34   ` Junio C Hamano
@ 2021-12-14 13:12     ` Joel Holdsworth
  2021-12-15 21:41       ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Joel Holdsworth @ 2021-12-14 13:12 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Tzadik Vanderhoof, Dorgon Chang, Joachim Kuebart,
	Daniel Levin, Johannes Schindelin, Luke Diamand, Ben Keene,
	Andrew Oakley

> Makes sense, and I am with others who commented on the previous
> discussion thread that the right approach to take is to take the stuff coming
> from Perforce as byte strings, process them as such and write them out as
> byte strings, UNLESS we positively know what the source and destination
> encodings are.
> 
> And this change we see here, matching with patterns, is perfectly in line with
> that direction.  Very nice.

Not bad. Fortunately, it's not possible for $ characters to appear as a component of a multi-byte UTF-8 character, so it's possible to do the matching byte-wise.
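
This property can be checked by brute force. The sketch below (illustrative only) encodes every non-ASCII code point and records the smallest byte value that appears in any multi-byte sequence:

```python
DOLLAR = 0x24  # ASCII '$'

# Smallest byte used in any multi-byte UTF-8 sequence. Surrogates are
# skipped because they cannot be encoded as UTF-8.
lowest = min(
    min(chr(cp).encode('utf-8'))
    for cp in range(0x80, 0x110000)
    if not 0xD800 <= cp <= 0xDFFF
)

dollar_can_appear = lowest <= DOLLAR
```

Every lead and continuation byte of a multi-byte sequence has its high bit set, so a 0x24 byte in valid UTF-8 is always a literal dollar sign.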

> 
> >          try:
> > -            with os.fdopen(handle, "w+") as outFile, open(file, "r") as inFile:
> > +            with os.fdopen(handle, "wb") as outFile, open(file, "rb") as inFile:
> 
> We seem to have lost "w+" and now it is "wb".  I do not see a reason to make
> outFile anything but write-only, so the end result looks good to me, but is it
> an unrelated "bug"fix that should be explained as such (e.g. "there is no
> reason to make outFile read-write, so instead of using 'w+' just use 'wb'
> while we make it unencoded output by adding 'b' to it")?

I am happy to split this change into a separate patch if this is preferred.

Joel


* Re: [PATCH 0/4] git-p4: fix RCS keyword processing encoding errors
  2021-12-13 22:54 [PATCH 0/4] git-p4: fix RCS keyword processing encoding errors Joel Holdsworth
                   ` (3 preceding siblings ...)
  2021-12-13 22:54 ` [PATCH 4/4] git-p4: resolve RCS keywords in binary Joel Holdsworth
@ 2021-12-14 22:36 ` Andrew Oakley
  4 siblings, 0 replies; 9+ messages in thread
From: Andrew Oakley @ 2021-12-14 22:36 UTC (permalink / raw)
  To: Joel Holdsworth
  Cc: git, Tzadik Vanderhoof, Dorgon Chang, Joachim Kuebart,
	Daniel Levin, Johannes Schindelin, Luke Diamand, Ben Keene

On Mon, 13 Dec 2021 22:54:37 +0000
Joel Holdsworth <jholdsworth@nvidia.com> wrote:

> This patch-set fixes a family of issues with git-p4's handling of
> incoming text data that contains RCS keywords, when those files
> contain bytes which are invalid UTF-8 codes.
> 
> Among the patches is a fix for the issue, as well as some peripheral
> tidy-ups and improvements to the existing code.

FWIW, these patches look good to me.

I spent a while trying to understand exactly how perforce handles the
keyword expansion stuff a few years ago.  Other quirks which I can
remember are:
- Files with a filetype of "utf16" files get expanded before we see
  them.  If we want to support that in git-p4 then I think some special
  handling will be required.
- Lines longer than lbr.rcs.maxlen at time of commit are not considered
  to be keyword expansions.  I don't think there is any way to handle
  this, but hopefully it won't ever occur in practice.

I'm not suggesting that these issues need to be solved as part of this
set of patches, just thought that you might want to be aware that there
are some more unsolved issues here.

Thanks


* Re: [PATCH 4/4] git-p4: resolve RCS keywords in binary
  2021-12-14 13:12     ` Joel Holdsworth
@ 2021-12-15 21:41       ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2021-12-15 21:41 UTC (permalink / raw)
  To: Joel Holdsworth
  Cc: git, Tzadik Vanderhoof, Dorgon Chang, Joachim Kuebart,
	Daniel Levin, Johannes Schindelin, Luke Diamand, Ben Keene,
	Andrew Oakley

Joel Holdsworth <jholdsworth@nvidia.com> writes:

>> Makes sense, and I am with others who commented on the previous
>> discussion thread that the right approach to take is to take the stuff coming
>> from Perforce as byte strings, process them as such and write them out as
>> byte strings, UNLESS we positively know what the source and destination
>> encodings are.
>> 
>> And this change we see here, matching with patterns, is perfectly in line with
>> that direction.  Very nice.
>
> Not bad. Fortunately, it's not possible for $ characters to appear as a component of a multi-byte UTF-8 character, so it's possible to do the matching byte-wise.
>
>> 
>> >          try:
>> > -            with os.fdopen(handle, "w+") as outFile, open(file, "r") as inFile:
>> > +            with os.fdopen(handle, "wb") as outFile, open(file, "rb") as inFile:
>> 
>> We seem to have lost "w+" and now it is "wb".  I do not see a reason to make
>> outFile anything but write-only, so the end result looks good to me, but is it
>> an unrelated "bug"fix that should be explained as such (e.g. "there is no
>> reason to make outFile read-write, so instead of using 'w+' just use 'wb'
>> while we make it unencoded output by adding 'b' to it")?
>
> I am happy to split this change into a separate patch if this is preferred.

I do not think this is big enough for a separate patch; just a
mention in the log message is sufficient.
