* [PATCH] git-p4: fix bug with encoding of p4 client name
@ 2022-07-08 8:01 Kilian Kilger via GitGitGadget
2022-07-08 11:28 ` Tao Klerks
2022-07-18 8:57 ` [PATCH v2] " Kilian Kilger via GitGitGadget
0 siblings, 2 replies; 9+ messages in thread
From: Kilian Kilger via GitGitGadget @ 2022-07-08 8:01 UTC (permalink / raw)
To: git; +Cc: Tao Klerks, Kilian Kilger, Kilian Kilger
From: Kilian Kilger <kilian.kilger@sap.com>
The Perforce client name can contain arbitrary characters
which do not decode to UTF-8. Use the fallback strategy
implemented in metadata_stream_to_writable_bytes() also
for the client name.
Signed-off-by: Kilian Kilger <kkilger@gmail.com>
---
git-p4: Fix bug with encoding of P4 client name
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1285%2Fcohomology%2Fmaint-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1285/cohomology/maint-v1
Pull-Request: https://github.com/git/git/pull/1285
git-p4.py | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/git-p4.py b/git-p4.py
index 8fbf6eb1fe3..e65d6a2b0e1 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -854,12 +854,12 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
if bytes is not str:
# Decode unmarshalled dict to use str keys and values, except for:
# - `data` which may contain arbitrary binary data
- # - `desc` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
+ # - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
# - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
decoded_entry = {}
for key, value in entry.items():
key = key.decode()
- if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile') or key.startswith('depotFile')):
+ if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile', 'client') or key.startswith('depotFile')):
value = value.decode()
decoded_entry[key] = value
# Parse out data if it's an error response
@@ -871,6 +871,8 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
continue
if 'desc' in entry:
entry['desc'] = metadata_stream_to_writable_bytes(entry['desc'])
+ if 'client' in entry:
+ entry['client'] = metadata_stream_to_writable_bytes(entry['client'])
if 'FullName' in entry:
entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName'])
if cb is not None:
base-commit: e4a4b31577c7419497ac30cebe30d755b97752c5
--
gitgitgadget
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] git-p4: fix bug with encoding of p4 client name
2022-07-08 8:01 [PATCH] git-p4: fix bug with encoding of p4 client name Kilian Kilger via GitGitGadget
@ 2022-07-08 11:28 ` Tao Klerks
2022-07-08 15:05 ` Junio C Hamano
2022-07-18 8:57 ` [PATCH v2] " Kilian Kilger via GitGitGadget
1 sibling, 1 reply; 9+ messages in thread
From: Tao Klerks @ 2022-07-08 11:28 UTC (permalink / raw)
To: Kilian Kilger via GitGitGadget; +Cc: git, Kilian Kilger, Kilian Kilger
This makes sense to me, and I don't see anything wrong with the "form"
(and nor does GitGitGadget).
Not sure whether formal review sign-off is used on this list, I don't
tend to see it, but do I see "Reviewed-by" on patches, so FWIW:
Reviewed-by: Tao Klerks <tao@klerks.biz>
On Fri, Jul 8, 2022 at 10:01 AM Kilian Kilger via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Kilian Kilger <kilian.kilger@sap.com>
>
> The Perforce client name can contain arbitrary characters
> which do not decode to UTF-8. Use the fallback strategy
> implemented in metadata_stream_to_writable_bytes() also
> for the client name.
>
> Signed-off-by: Kilian Kilger <kkilger@gmail.com>
> ---
> git-p4: Fix bug with encoding of P4 client name
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1285%2Fcohomology%2Fmaint-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1285/cohomology/maint-v1
> Pull-Request: https://github.com/git/git/pull/1285
>
> git-p4.py | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/git-p4.py b/git-p4.py
> index 8fbf6eb1fe3..e65d6a2b0e1 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -854,12 +854,12 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
> if bytes is not str:
> # Decode unmarshalled dict to use str keys and values, except for:
> # - `data` which may contain arbitrary binary data
> - # - `desc` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
> + # - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
> # - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
> decoded_entry = {}
> for key, value in entry.items():
> key = key.decode()
> - if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile') or key.startswith('depotFile')):
> + if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile', 'client') or key.startswith('depotFile')):
> value = value.decode()
> decoded_entry[key] = value
> # Parse out data if it's an error response
> @@ -871,6 +871,8 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
> continue
> if 'desc' in entry:
> entry['desc'] = metadata_stream_to_writable_bytes(entry['desc'])
> + if 'client' in entry:
> + entry['client'] = metadata_stream_to_writable_bytes(entry['client'])
> if 'FullName' in entry:
> entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName'])
> if cb is not None:
>
> base-commit: e4a4b31577c7419497ac30cebe30d755b97752c5
> --
> gitgitgadget
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] git-p4: fix bug with encoding of p4 client name
2022-07-08 11:28 ` Tao Klerks
@ 2022-07-08 15:05 ` Junio C Hamano
0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2022-07-08 15:05 UTC (permalink / raw)
To: Tao Klerks
Cc: Kilian Kilger via GitGitGadget, git, Kilian Kilger, Kilian Kilger
Tao Klerks <tao@klerks.biz> writes:
>
>
> On Fri, Jul 8, 2022 at 10:01 AM Kilian Kilger via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Kilian Kilger <kilian.kilger@sap.com>
>>
>> The Perforce client name can contain arbitrary characters
>> which do not decode to UTF-8. Use the fallback strategy
>> implemented in metadata_stream_to_writable_bytes() also
>> for the client name.
>>
>> Signed-off-by: Kilian Kilger <kkilger@gmail.com>
>> ---
>> ...
>>
>> @@ -871,6 +871,8 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
>> continue
>> if 'desc' in entry:
>> entry['desc'] = metadata_stream_to_writable_bytes(entry['desc'])
>> + if 'client' in entry:
>> + entry['client'] = metadata_stream_to_writable_bytes(entry['client'])
>> if 'FullName' in entry:
>> entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName']
We had two repetitions and now we have three, which is a good time
to see if it makes sense to reduce the temptation for future
developers to add the fourth repetition in the next round, e.g.
for e in ["client", "desc", "FullName"]:
if e in entry:
entry[e] = metadata_stream_to_writable_bytes(entry[e])
or something like that?
> This makes sense to me, and I don't see anything wrong with the "form"
> (and nor does GitGitGadget).
One thing that is a bit problematic is that in-body From does not
match the sign-off. Kilian, which identity do you want to use in
your contribution to this project?
> Not sure whether formal review sign-off is used on this list, I don't
> tend to see it, but do I see "Reviewed-by" on patches, so FWIW:
>
> Reviewed-by: Tao Klerks <tao@klerks.biz>
Thanks.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v2] git-p4: fix bug with encoding of p4 client name
2022-07-08 8:01 [PATCH] git-p4: fix bug with encoding of p4 client name Kilian Kilger via GitGitGadget
2022-07-08 11:28 ` Tao Klerks
@ 2022-07-18 8:57 ` Kilian Kilger via GitGitGadget
2022-07-18 16:36 ` Junio C Hamano
2022-07-21 9:07 ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 " Kilian Kilger via GitGitGadget
1 sibling, 2 replies; 9+ messages in thread
From: Kilian Kilger via GitGitGadget @ 2022-07-18 8:57 UTC (permalink / raw)
To: git; +Cc: Tao Klerks, Kilian Kilger, Kilian Kilger
From: Kilian Kilger <kkilger@gmail.com>
The Perforce client name can contain arbitrary characters
which do not decode to UTF-8. Use the fallback strategy
implemented in metadata_stream_to_writable_bytes() also
for the client name.
Signed-off-by: Kilian Kilger <kkilger@gmail.com>
---
git-p4: Fix bug with encoding of P4 client name
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1285%2Fcohomology%2Fmaint-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1285/cohomology/maint-v2
Pull-Request: https://github.com/git/git/pull/1285
Range-diff vs v1:
1: 7393b59c642 ! 1: 3280a9579bc git-p4: fix bug with encoding of p4 client name
@@
## Metadata ##
-Author: Kilian Kilger <kilian.kilger@sap.com>
+Author: Kilian Kilger <kkilger@gmail.com>
## Commit message ##
git-p4: fix bug with encoding of p4 client name
@@ Commit message
Signed-off-by: Kilian Kilger <kkilger@gmail.com>
## git-p4.py ##
+@@ git-p4.py: def isModeExecChanged(src_mode, dst_mode):
+ return isModeExec(src_mode) != isModeExec(dst_mode)
+
+
++def p4KeysContainingNonUtf8Chars():
++ """Returns all keys which may contain non UTF-8 encoded strings
++ for which a fallback strategy has to be applied.
++ """
++ return ['desc', 'client', 'FullName']
++
++
++def p4KeysContainingBinaryData():
++ """Returns all keys which may contain arbitrary binary data
++ """
++ return ['data']
++
++
++def p4KeyContainsFilePaths(key):
++ """Returns True if the key contains file paths. These are handled by decode_path().
++ Otherwise False.
++ """
++ return key.startswith('depotFile') or key in ['path', 'clientFile']
++
++
++def p4KeyWhichCanBeDirectlyDecoded(key):
++ """Returns True if the key can be directly decoded as UTF-8 string
++ Otherwise False.
++
++ Keys which can not be encoded directly:
++ - `data` which may contain arbitrary binary data
++ - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text
++ - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
++ """
++ if key in p4KeysContainingNonUtf8Chars() or \
++ key in p4KeysContainingBinaryData() or \
++ p4KeyContainsFilePaths(key):
++ return False
++ return True
++
++
+ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
+ errors_as_exceptions=False, *k, **kw):
+
@@ git-p4.py: def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
+ try:
+ while True:
+ entry = marshal.load(p4.stdout)
++
if bytes is not str:
- # Decode unmarshalled dict to use str keys and values, except for:
- # - `data` which may contain arbitrary binary data
+- # Decode unmarshalled dict to use str keys and values, except for:
+- # - `data` which may contain arbitrary binary data
- # - `desc` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
-+ # - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
- # - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
+- # - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
++ # Decode unmarshalled dict to use str keys and values. Special cases are handled below.
decoded_entry = {}
for key, value in entry.items():
key = key.decode()
- if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile') or key.startswith('depotFile')):
-+ if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile', 'client') or key.startswith('depotFile')):
++ if isinstance(value, bytes) and p4KeyWhichCanBeDirectlyDecoded(key):
value = value.decode()
decoded_entry[key] = value
# Parse out data if it's an error response
@@ git-p4.py: def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
+ if skip_info:
+ if 'code' in entry and entry['code'] == 'info':
continue
- if 'desc' in entry:
- entry['desc'] = metadata_stream_to_writable_bytes(entry['desc'])
-+ if 'client' in entry:
-+ entry['client'] = metadata_stream_to_writable_bytes(entry['client'])
- if 'FullName' in entry:
- entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName'])
+- if 'desc' in entry:
+- entry['desc'] = metadata_stream_to_writable_bytes(entry['desc'])
+- if 'FullName' in entry:
+- entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName'])
++ for key in p4KeysContainingNonUtf8Chars():
++ if key in entry:
++ entry[key] = metadata_stream_to_writable_bytes(entry[key])
if cb is not None:
+ cb(entry)
+ else:
git-p4.py | 51 ++++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 42 insertions(+), 9 deletions(-)
diff --git a/git-p4.py b/git-p4.py
index 8fbf6eb1fe3..9323b943c68 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -822,6 +822,42 @@ def isModeExecChanged(src_mode, dst_mode):
return isModeExec(src_mode) != isModeExec(dst_mode)
+def p4KeysContainingNonUtf8Chars():
+ """Returns all keys which may contain non UTF-8 encoded strings
+ for which a fallback strategy has to be applied.
+ """
+ return ['desc', 'client', 'FullName']
+
+
+def p4KeysContainingBinaryData():
+ """Returns all keys which may contain arbitrary binary data
+ """
+ return ['data']
+
+
+def p4KeyContainsFilePaths(key):
+ """Returns True if the key contains file paths. These are handled by decode_path().
+ Otherwise False.
+ """
+ return key.startswith('depotFile') or key in ['path', 'clientFile']
+
+
+def p4KeyWhichCanBeDirectlyDecoded(key):
+ """Returns True if the key can be directly decoded as UTF-8 string
+ Otherwise False.
+
+ Keys which can not be encoded directly:
+ - `data` which may contain arbitrary binary data
+ - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text
+ - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
+ """
+ if key in p4KeysContainingNonUtf8Chars() or \
+ key in p4KeysContainingBinaryData() or \
+ p4KeyContainsFilePaths(key):
+ return False
+ return True
+
+
def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
errors_as_exceptions=False, *k, **kw):
@@ -851,15 +887,13 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
try:
while True:
entry = marshal.load(p4.stdout)
+
if bytes is not str:
- # Decode unmarshalled dict to use str keys and values, except for:
- # - `data` which may contain arbitrary binary data
- # - `desc` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
- # - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
+ # Decode unmarshalled dict to use str keys and values. Special cases are handled below.
decoded_entry = {}
for key, value in entry.items():
key = key.decode()
- if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile') or key.startswith('depotFile')):
+ if isinstance(value, bytes) and p4KeyWhichCanBeDirectlyDecoded(key):
value = value.decode()
decoded_entry[key] = value
# Parse out data if it's an error response
@@ -869,10 +903,9 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
if skip_info:
if 'code' in entry and entry['code'] == 'info':
continue
- if 'desc' in entry:
- entry['desc'] = metadata_stream_to_writable_bytes(entry['desc'])
- if 'FullName' in entry:
- entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName'])
+ for key in p4KeysContainingNonUtf8Chars():
+ if key in entry:
+ entry[key] = metadata_stream_to_writable_bytes(entry[key])
if cb is not None:
cb(entry)
else:
base-commit: e4a4b31577c7419497ac30cebe30d755b97752c5
--
gitgitgadget
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2] git-p4: fix bug with encoding of p4 client name
2022-07-18 8:57 ` [PATCH v2] " Kilian Kilger via GitGitGadget
@ 2022-07-18 16:36 ` Junio C Hamano
2022-07-21 9:07 ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 " Kilian Kilger via GitGitGadget
1 sibling, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2022-07-18 16:36 UTC (permalink / raw)
To: Kilian Kilger via GitGitGadget; +Cc: git, Tao Klerks, Kilian Kilger
"Kilian Kilger via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Kilian Kilger <kkilger@gmail.com>
>
> The Perforce client name can contain arbitrary characters
> which do not decode to UTF-8. Use the fallback strategy
> implemented in metadata_stream_to_writable_bytes() also
> for the client name.
>
> Signed-off-by: Kilian Kilger <kkilger@gmail.com>
> ---
> git-p4: Fix bug with encoding of P4 client name
Sorry, is this to improve the topic that has already last Monday?
If so, it is way too late to update with a replacement patch, like
this. Instead, please send in a follow-up patch on top of what has
already been merged.
The description above seems to be identical to what was in the
previous round, but the patch seems to use a lot more code, so
such a follow-up patch would have plenty to explain how the new
strategy improves over the previous one.
Thanks.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 client name
2022-07-18 8:57 ` [PATCH v2] " Kilian Kilger via GitGitGadget
2022-07-18 16:36 ` Junio C Hamano
@ 2022-07-21 9:07 ` Kilian Kilger via GitGitGadget
2022-07-21 9:07 ` [PATCH v3 1/2] git-p4: fix bug with encoding of p4 " Kilian Kilger via GitGitGadget
` (2 more replies)
1 sibling, 3 replies; 9+ messages in thread
From: Kilian Kilger via GitGitGadget @ 2022-07-21 9:07 UTC (permalink / raw)
To: git; +Cc: Tao Klerks, Kilian Kilger
Kilian Kilger (2):
git-p4: fix bug with encoding of p4 client name
git-p4: refactoring of p4CmdList()
git-p4.py | 51 ++++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 42 insertions(+), 9 deletions(-)
base-commit: e4a4b31577c7419497ac30cebe30d755b97752c5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1285%2Fcohomology%2Fmaint-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1285/cohomology/maint-v3
Pull-Request: https://github.com/git/git/pull/1285
Range-diff vs v2:
-: ----------- > 1: 87e7809b75a git-p4: fix bug with encoding of p4 client name
1: 3280a9579bc ! 2: 4a81423f0e8 git-p4: fix bug with encoding of p4 client name
@@ Metadata
Author: Kilian Kilger <kkilger@gmail.com>
## Commit message ##
- git-p4: fix bug with encoding of p4 client name
+ git-p4: refactoring of p4CmdList()
- The Perforce client name can contain arbitrary characters
- which do not decode to UTF-8. Use the fallback strategy
- implemented in metadata_stream_to_writable_bytes() also
- for the client name.
+ The function p4CmdList executes a Perforce command and
+ decodes the marshalled python dictionary. Special care has to be
+ taken for certain dictionary values which contain non-unicode characters.
+ The old handling contained separate hacks for each of the corresponding
+ dictionary keys. This commit tries to refactor the coding to handle the
+ special cases uniformely.
Signed-off-by: Kilian Kilger <kkilger@gmail.com>
@@ git-p4.py: def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=F
if bytes is not str:
- # Decode unmarshalled dict to use str keys and values, except for:
- # - `data` which may contain arbitrary binary data
-- # - `desc` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
+- # - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
- # - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
+ # Decode unmarshalled dict to use str keys and values. Special cases are handled below.
decoded_entry = {}
for key, value in entry.items():
key = key.decode()
-- if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile') or key.startswith('depotFile')):
+- if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile', 'client') or key.startswith('depotFile')):
+ if isinstance(value, bytes) and p4KeyWhichCanBeDirectlyDecoded(key):
value = value.decode()
decoded_entry[key] = value
@@ git-p4.py: def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=F
continue
- if 'desc' in entry:
- entry['desc'] = metadata_stream_to_writable_bytes(entry['desc'])
+- if 'client' in entry:
+- entry['client'] = metadata_stream_to_writable_bytes(entry['client'])
- if 'FullName' in entry:
- entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName'])
+ for key in p4KeysContainingNonUtf8Chars():
--
gitgitgadget
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v3 1/2] git-p4: fix bug with encoding of p4 client name
2022-07-21 9:07 ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 " Kilian Kilger via GitGitGadget
@ 2022-07-21 9:07 ` Kilian Kilger via GitGitGadget
2022-07-21 9:07 ` [PATCH v3 2/2] git-p4: refactoring of p4CmdList() Kilian Kilger via GitGitGadget
2022-07-21 16:46 ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 client name Junio C Hamano
2 siblings, 0 replies; 9+ messages in thread
From: Kilian Kilger via GitGitGadget @ 2022-07-21 9:07 UTC (permalink / raw)
To: git; +Cc: Tao Klerks, Kilian Kilger, Kilian Kilger
From: Kilian Kilger <kilian.kilger@sap.com>
The Perforce client name can contain arbitrary characters
which do not decode to UTF-8. Use the fallback strategy
implemented in metadata_stream_to_writable_bytes() also
for the client name.
Signed-off-by: Kilian Kilger <kkilger@gmail.com>
Reviewed-by: Tao Klerks <tao@klerks.biz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
git-p4.py | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/git-p4.py b/git-p4.py
index 8fbf6eb1fe3..e65d6a2b0e1 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -854,12 +854,12 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
if bytes is not str:
# Decode unmarshalled dict to use str keys and values, except for:
# - `data` which may contain arbitrary binary data
- # - `desc` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
+ # - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
# - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
decoded_entry = {}
for key, value in entry.items():
key = key.decode()
- if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile') or key.startswith('depotFile')):
+ if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile', 'client') or key.startswith('depotFile')):
value = value.decode()
decoded_entry[key] = value
# Parse out data if it's an error response
@@ -871,6 +871,8 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
continue
if 'desc' in entry:
entry['desc'] = metadata_stream_to_writable_bytes(entry['desc'])
+ if 'client' in entry:
+ entry['client'] = metadata_stream_to_writable_bytes(entry['client'])
if 'FullName' in entry:
entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName'])
if cb is not None:
--
gitgitgadget
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH v3 2/2] git-p4: refactoring of p4CmdList()
2022-07-21 9:07 ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 " Kilian Kilger via GitGitGadget
2022-07-21 9:07 ` [PATCH v3 1/2] git-p4: fix bug with encoding of p4 " Kilian Kilger via GitGitGadget
@ 2022-07-21 9:07 ` Kilian Kilger via GitGitGadget
2022-07-21 16:46 ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 client name Junio C Hamano
2 siblings, 0 replies; 9+ messages in thread
From: Kilian Kilger via GitGitGadget @ 2022-07-21 9:07 UTC (permalink / raw)
To: git; +Cc: Tao Klerks, Kilian Kilger, Kilian Kilger
From: Kilian Kilger <kkilger@gmail.com>
The function p4CmdList executes a Perforce command and
decodes the marshalled python dictionary. Special care has to be
taken for certain dictionary values which contain non-unicode characters.
The old handling contained separate hacks for each of the corresponding
dictionary keys. This commit tries to refactor the coding to handle the
special cases uniformely.
Signed-off-by: Kilian Kilger <kkilger@gmail.com>
---
git-p4.py | 53 ++++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 42 insertions(+), 11 deletions(-)
diff --git a/git-p4.py b/git-p4.py
index e65d6a2b0e1..9323b943c68 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -822,6 +822,42 @@ def isModeExecChanged(src_mode, dst_mode):
return isModeExec(src_mode) != isModeExec(dst_mode)
+def p4KeysContainingNonUtf8Chars():
+ """Returns all keys which may contain non UTF-8 encoded strings
+ for which a fallback strategy has to be applied.
+ """
+ return ['desc', 'client', 'FullName']
+
+
+def p4KeysContainingBinaryData():
+ """Returns all keys which may contain arbitrary binary data
+ """
+ return ['data']
+
+
+def p4KeyContainsFilePaths(key):
+ """Returns True if the key contains file paths. These are handled by decode_path().
+ Otherwise False.
+ """
+ return key.startswith('depotFile') or key in ['path', 'clientFile']
+
+
+def p4KeyWhichCanBeDirectlyDecoded(key):
+ """Returns True if the key can be directly decoded as UTF-8 string
+ Otherwise False.
+
+ Keys which can not be encoded directly:
+ - `data` which may contain arbitrary binary data
+ - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text
+ - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
+ """
+ if key in p4KeysContainingNonUtf8Chars() or \
+ key in p4KeysContainingBinaryData() or \
+ p4KeyContainsFilePaths(key):
+ return False
+ return True
+
+
def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
errors_as_exceptions=False, *k, **kw):
@@ -851,15 +887,13 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
try:
while True:
entry = marshal.load(p4.stdout)
+
if bytes is not str:
- # Decode unmarshalled dict to use str keys and values, except for:
- # - `data` which may contain arbitrary binary data
- # - `desc` or `client` or `FullName` which may contain non-UTF8 encoded text handled below, eagerly converted to bytes
- # - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text, handled by decode_path()
+ # Decode unmarshalled dict to use str keys and values. Special cases are handled below.
decoded_entry = {}
for key, value in entry.items():
key = key.decode()
- if isinstance(value, bytes) and not (key in ('data', 'desc', 'FullName', 'path', 'clientFile', 'client') or key.startswith('depotFile')):
+ if isinstance(value, bytes) and p4KeyWhichCanBeDirectlyDecoded(key):
value = value.decode()
decoded_entry[key] = value
# Parse out data if it's an error response
@@ -869,12 +903,9 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
if skip_info:
if 'code' in entry and entry['code'] == 'info':
continue
- if 'desc' in entry:
- entry['desc'] = metadata_stream_to_writable_bytes(entry['desc'])
- if 'client' in entry:
- entry['client'] = metadata_stream_to_writable_bytes(entry['client'])
- if 'FullName' in entry:
- entry['FullName'] = metadata_stream_to_writable_bytes(entry['FullName'])
+ for key in p4KeysContainingNonUtf8Chars():
+ if key in entry:
+ entry[key] = metadata_stream_to_writable_bytes(entry[key])
if cb is not None:
cb(entry)
else:
--
gitgitgadget
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 client name
2022-07-21 9:07 ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 " Kilian Kilger via GitGitGadget
2022-07-21 9:07 ` [PATCH v3 1/2] git-p4: fix bug with encoding of p4 " Kilian Kilger via GitGitGadget
2022-07-21 9:07 ` [PATCH v3 2/2] git-p4: refactoring of p4CmdList() Kilian Kilger via GitGitGadget
@ 2022-07-21 16:46 ` Junio C Hamano
2 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2022-07-21 16:46 UTC (permalink / raw)
To: Kilian Kilger via GitGitGadget; +Cc: git, Tao Klerks, Kilian Kilger
"Kilian Kilger via GitGitGadget" <gitgitgadget@gmail.com> writes:
> Kilian Kilger (2):
> git-p4: fix bug with encoding of p4 client name
> git-p4: refactoring of p4CmdList()
[PATCH v3 1/2] seems to be identical to 34f67c96 (git-p4: fix bug
with encoding of p4 client name, 2022-07-08) that was already merged
to 'next' yesterday, so I can safely ignore it and take [PATCH v3
2/2] as if it were a single follow-up patch on the same topic and
queue it on top.
Thanks.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2022-07-21 16:46 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-08 8:01 [PATCH] git-p4: fix bug with encoding of p4 client name Kilian Kilger via GitGitGadget
2022-07-08 11:28 ` Tao Klerks
2022-07-08 15:05 ` Junio C Hamano
2022-07-18 8:57 ` [PATCH v2] " Kilian Kilger via GitGitGadget
2022-07-18 16:36 ` Junio C Hamano
2022-07-21 9:07 ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 " Kilian Kilger via GitGitGadget
2022-07-21 9:07 ` [PATCH v3 1/2] git-p4: fix bug with encoding of p4 " Kilian Kilger via GitGitGadget
2022-07-21 9:07 ` [PATCH v3 2/2] git-p4: refactoring of p4CmdList() Kilian Kilger via GitGitGadget
2022-07-21 16:46 ` [PATCH v3 0/2] git-p4: Fix bug with encoding of P4 client name Junio C Hamano
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.