* [PATCH v2 0/3] Transition git-p4.py to support Python 3 only @ 2021-12-10 15:30 Joel Holdsworth 2021-12-10 15:30 ` [PATCH v2 1/3] git-p4: remove support for Python 2 Joel Holdsworth ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Joel Holdsworth @ 2021-12-10 15:30 UTC (permalink / raw) To: git Cc: Tzadik Vanderhoof, Dorgon Chang, Joachim Kuebart, Daniel Levin, Johannes Schindelin, Luke Diamand, Ben Keene, Andrew Oakley, Joel Holdsworth The git-p4.py script currently implements code paths for both Python 2 and 3. Python 2 was discontinued in 2020, and there is no longer any officially supported interpreter. Further development of git-p4.py will require would-be developers to test their changes with all supported dialects of the language. However, if there is no longer any supported runtime environment available, this places an unreasonable burden on the Git project to maintain support for an obselete dialect of the language. The patch-set removes all Python 2-specific code paths. This second revisision of the patch-set is more tightly focussed on the task of retiring Python 2, and excludes previously submitted patches that contain other tidy-ups and bug fixes. Joel Holdsworth (3): git-p4: remove support for Python 2 git-p4: eliminate decode_stream and encode_stream git-p4: add "Nvidia Corporation" to copyright header git-p4.py | 132 +++++++++++++++++++----------------------------------- 1 file changed, 45 insertions(+), 87 deletions(-) -- 2.33.0 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 1/3] git-p4: remove support for Python 2 2021-12-10 15:30 [PATCH v2 0/3] Transition git-p4.py to support Python 3 only Joel Holdsworth @ 2021-12-10 15:30 ` Joel Holdsworth 2021-12-12 17:50 ` Andrew Oakley 2021-12-10 15:31 ` [PATCH v2 2/3] git-p4: eliminate decode_stream and encode_stream Joel Holdsworth 2021-12-10 15:31 ` [PATCH v2 3/3] git-p4: add "Nvidia Corporation" to copyright header Joel Holdsworth 2 siblings, 1 reply; 10+ messages in thread From: Joel Holdsworth @ 2021-12-10 15:30 UTC (permalink / raw) To: git Cc: Tzadik Vanderhoof, Dorgon Chang, Joachim Kuebart, Daniel Levin, Johannes Schindelin, Luke Diamand, Ben Keene, Andrew Oakley, Joel Holdsworth git-p4 previously contained seperate code-paths for Python 2 and 3 to abstract away the differences in string handling behaviour between the two platforms. This patch removes the Python 2 code-paths within this abstraction without removing the abstractions themselves. These will be removed in later patches to further modernise the script. The motivation for this change is that there is a family of issues with git-p4's handling of incoming text data when it contains bytes which cannot be decoded into UTF-8 characters. For text files created in Windows, CP1252 Smart Quote Characters (0x93 and 0x94) are seen fairly frequently. These codes are invalid in UTF-8, so if the script encounters any file or file name containing them, on Python 2 the symbols will be corrupted, and on Python 3 the script will fail with an exception. In order to address these issues it will be necessary to overhaul git-p4's handling of incoming data. Keeping a clean separation between encoded bytes and decoded text is much easier to do in Python 3. If Python 2 support must be maintained, this will require careful testing of the separate code paths for each platform, which is unreasonable given that Python 2 is now thoroughly deprecated. The minimum supported Python version has been set to 3.6. This version is no longer supported by the Python project, however at the current time it is still available for use in RHEL 8. No features from newer versions of Python are currently required. Signed-off-by: Joel Holdsworth <jholdsworth@nvidia.com> --- git-p4.py | 90 ++++++++++++++++++------------------------------------- 1 file changed, 29 insertions(+), 61 deletions(-) diff --git a/git-p4.py b/git-p4.py index 2b4500226a..e3fe86e4f2 100755 --- a/git-p4.py +++ b/git-p4.py @@ -1,4 +1,4 @@ -#!/usr/bin/env python +#!/usr/bin/env python3 # # git-p4.py -- A tool for bidirectional operation between a Perforce depot and git. # @@ -16,8 +16,9 @@ # pylint: disable=too-many-branches,too-many-nested-blocks # import sys -if sys.version_info.major < 3 and sys.version_info.minor < 7: - sys.stderr.write("git-p4: requires Python 2.7 or later.\n") +if (sys.version_info.major < 3 or + (sys.version_info.major == 3 and sys.version_info.minor < 6)): + sys.stderr.write("git-p4: requires Python 3.6 or later.\n") sys.exit(1) import os import optparse @@ -36,16 +37,6 @@ import errno import glob -# On python2.7 where raw_input() and input() are both availble, -# we want raw_input's semantics, but aliased to input for python3 -# compatibility -# support basestring in python3 -try: - if raw_input and input: - input = raw_input -except: - pass - verbose = False # Only labels/tags matching this will be imported/exported @@ -176,35 +167,16 @@ def prompt(prompt_text): if response in choices: return response -# We need different encoding/decoding strategies for text data being passed -# around in pipes depending on python version -if bytes is not str: - # For python3, always encode and decode as appropriate - def decode_text_stream(s): - return s.decode() if isinstance(s, bytes) else s - def encode_text_stream(s): - return s.encode() if isinstance(s, str) else s -else: - # For python2.7, pass read strings as-is, but also allow writing unicode - def decode_text_stream(s): - return s - def encode_text_stream(s): - return s.encode('utf_8') if isinstance(s, unicode) else s +def decode_text_stream(s): + return s.decode() if isinstance(s, bytes) else s +def encode_text_stream(s): + return s.encode() if isinstance(s, str) else s def decode_path(path): """Decode a given string (bytes or otherwise) using configured path encoding options """ encoding = gitConfig('git-p4.pathEncoding') or 'utf_8' - if bytes is not str: - return path.decode(encoding, errors='replace') if isinstance(path, bytes) else path - else: - try: - path.decode('ascii') - except: - path = path.decode(encoding, errors='replace') - if verbose: - print('Path with non-ASCII characters detected. Used {} to decode: {}'.format(encoding, path)) - return path + return path.decode(encoding, errors='replace') if isinstance(path, bytes) else path def run_git_hook(cmd, param=[]): """Execute a hook if the hook exists.""" @@ -289,8 +261,8 @@ def write_pipe(c, stdin): def p4_write_pipe(c, stdin): real_cmd = p4_build_cmd(c) - if bytes is not str and isinstance(stdin, str): - stdin = encode_text_stream(stdin) + if isinstance(stdin, str): + stdin = stdin.encode() return write_pipe(real_cmd, stdin) def read_pipe_full(c): @@ -762,21 +734,18 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False, result = [] try: while True: - entry = marshal.load(p4.stdout) - if bytes is not str: - # Decode unmarshalled dict to use str keys and values, except for: - # - `data` which may contain arbitrary binary data - # - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text - decoded_entry = {} - for key, value in entry.items(): - key = key.decode() - if isinstance(value, bytes) and not (key in ('data', 'path', 'clientFile') or key.startswith('depotFile')): - value = value.decode() - decoded_entry[key] = value - # Parse out data if it's an error response - if decoded_entry.get('code') == 'error' and 'data' in decoded_entry: - decoded_entry['data'] = decoded_entry['data'].decode() - entry = decoded_entry + # Decode unmarshalled dict to use str keys and values, except for: + # - `data` which may contain arbitrary binary data + # - `depotFile[0-9]*`, `path`, or `clientFile` which may contain non-UTF8 encoded text + entry = {} + for key, value in marshal.load(p4.stdout).items(): + key = key.decode() + if isinstance(value, bytes) and not (key in ('data', 'path', 'clientFile') or key.startswith('depotFile')): + value = value.decode() + entry[key] = value + # Parse out data if it's an error response + if entry.get('code') == 'error' and 'data' in entry: + entry['data'] = entry['data'].decode() if skip_info: if 'code' in entry and entry['code'] == 'info': continue @@ -3840,14 +3809,13 @@ def openStreams(self): self.gitStream = self.importProcess.stdin self.gitError = self.importProcess.stderr - if bytes is not str: - # Wrap gitStream.write() so that it can be called using `str` arguments - def make_encoded_write(write): - def encoded_write(s): - return write(s.encode() if isinstance(s, str) else s) - return encoded_write + # Wrap gitStream.write() so that it can be called using `str` arguments + def make_encoded_write(write): + def encoded_write(s): + return write(s.encode() if isinstance(s, str) else s) + return encoded_write - self.gitStream.write = make_encoded_write(self.gitStream.write) + self.gitStream.write = make_encoded_write(self.gitStream.write) def closeStreams(self): if self.gitStream is None: -- 2.33.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/3] git-p4: remove support for Python 2 2021-12-10 15:30 ` [PATCH v2 1/3] git-p4: remove support for Python 2 Joel Holdsworth @ 2021-12-12 17:50 ` Andrew Oakley 2021-12-12 22:01 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 10+ messages in thread From: Andrew Oakley @ 2021-12-12 17:50 UTC (permalink / raw) To: Joel Holdsworth Cc: git, Tzadik Vanderhoof, Dorgon Chang, Joachim Kuebart, Daniel Levin, Johannes Schindelin, Luke Diamand, Ben Keene On Fri, 10 Dec 2021 15:30:59 +0000 Joel Holdsworth <jholdsworth@nvidia.com> wrote: > The motivation for this change is that there is a family of issues > with git-p4's handling of incoming text data when it contains bytes > which cannot be decoded into UTF-8 characters. For text files created > in Windows, CP1252 Smart Quote Characters (0x93 and 0x94) are seen > fairly frequently. These codes are invalid in UTF-8, so if the script > encounters any file or file name containing them, on Python 2 the > symbols will be corrupted, and on Python 3 the script will fail with > an exception. As I've pointed out previously, peforce fails to store the encoding of text like commit messages. With Windows perforce clients, the encoding used seems to be based on the current code page on the client which made the commit. If you're part of a global organisation with people in different locales making commits then you will find that there is not a consistent encoding for commit messages. Given that you don't know the encoding of the text, what's the best thing to do with the data? Options I can see are: - Feed the raw bytes directly into git. The i18n.commitEncoding config option can be set by the user if they want to attempt to decode the commit messages in something other than UTF-8. - Attempt to detect the encoding somehow, feed the raw bytes directly into git and set the encoding on the commit. - Attempt to dedect the encoding somehow and reencode everything into UTF-8. Right now, if you use python2 then you get the behaviour as described in the first of these options. It doesn't "corrupt" anything, it just transfers the bytes from perforce into git. If you use python3 then git-p4 is unusable because it throws exceptions trying to decode things. It's not clear to me how "attempt to detect the encoding somehow" would work. The first option therefore seems like the best choice. I think that this is the problem which really needs solving. Dropping support for python2 doesn't make the issue go away (although it might make it slightly easier to write the code). I think that the python2 compatibility should be maintained at least until the encoding problems have been solved for python3. I previously wrote some patches which attempt to move in what I think is the right direction, but unfortunately they never got upstreamed: https://lore.kernel.org/git/20210412085251.51475-1-andrew@adoakley.name/ Your comments elsewhere that git-p4 could benifit from some clean-up seem accurate to me, and it would be good to see that kind of change. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/3] git-p4: remove support for Python 2 2021-12-12 17:50 ` Andrew Oakley @ 2021-12-12 22:01 ` Ævar Arnfjörð Bjarmason 0 siblings, 0 replies; 10+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-12-12 22:01 UTC (permalink / raw) To: Andrew Oakley Cc: Joel Holdsworth, git, Tzadik Vanderhoof, Dorgon Chang, Joachim Kuebart, Daniel Levin, Johannes Schindelin, Luke Diamand, Ben Keene On Sun, Dec 12 2021, Andrew Oakley wrote: > On Fri, 10 Dec 2021 15:30:59 +0000 > Joel Holdsworth <jholdsworth@nvidia.com> wrote: >> The motivation for this change is that there is a family of issues >> with git-p4's handling of incoming text data when it contains bytes >> which cannot be decoded into UTF-8 characters. For text files created >> in Windows, CP1252 Smart Quote Characters (0x93 and 0x94) are seen >> fairly frequently. These codes are invalid in UTF-8, so if the script >> encounters any file or file name containing them, on Python 2 the >> symbols will be corrupted, and on Python 3 the script will fail with >> an exception. > > As I've pointed out previously, peforce fails to store the encoding of > text like commit messages. With Windows perforce clients, the encoding > used seems to be based on the current code page on the client which > made the commit. If you're part of a global organisation with people > in different locales making commits then you will find that there is > not a consistent encoding for commit messages. > > Given that you don't know the encoding of the text, what's the best > thing to do with the data? Options I can see are: > > - Feed the raw bytes directly into git. The i18n.commitEncoding config > option can be set by the user if they want to attempt to decode the > commit messages in something other than UTF-8. > - Attempt to detect the encoding somehow, feed the raw bytes directly > into git and set the encoding on the commit. > - Attempt to dedect the encoding somehow and reencode everything into > UTF-8. > > Right now, if you use python2 then you get the behaviour as described > in the first of these options. It doesn't "corrupt" anything, it just > transfers the bytes from perforce into git. If you use python3 then > git-p4 is unusable because it throws exceptions trying to decode things. > > [...] > > I think that this is the problem which really needs solving. Dropping > support for python2 doesn't make the issue go away (although it might > make it slightly easier to write the code). I think that the python2 > compatibility should be maintained at least until the encoding problems > have been solved for python3. > > I previously wrote some patches which attempt to move in what I think is > the right direction, but unfortunately they never got upstreamed: > > https://lore.kernel.org/git/20210412085251.51475-1-andrew@adoakley.name/ > > Your comments elsewhere that git-p4 could benifit from some clean-up > seem accurate to me, and it would be good to see that kind of change. This summary makes sense, i.e. if the original SCM doesn't have a declared or consistent encoding then having no "encoding" header etc. in git likewise makes sense, and we should be trying to handle it in our output layer. [Snipped from above]: > It's not clear to me how "attempt to detect the encoding somehow" would > work. The first option therefore seems like the best choice. This really isn't possible to do in the general case, but you can get pretty far with heuristics. The best way to do this that I'm aware of is Mozilla's character detection library: https://www-archive.mozilla.org/projects/intl/chardetinterface Here's an old (maybe/probably not up-to-date) copy of its sources that I've worked with: https://metacpan.org/release/JGMYERS/Encode-Detect-1.01/source/src?sort=[[2,1]] I.e. if you get arbitrary text the best you can do if you're going in blind is to have various character/language frequency tables to try to guess at what encoding is the most plausible, but even then you might still be wrong. I'd think for git-p4 (and git-svn, how do we handle this there?) a sensible first approximation would be to use UTF-8, and if we encounter data that doesn't conform die. Then offer the user to manually configure a "fallback" encoding. I think most real-world projects only have two of those, e.g. some old latin1 data before a UTF-8 migration. And maybe start with a "don't even try" mode, which AFAICT is what you're describing Python 2 doing. If we change the Python 3 code to do what the Python 2 code does now, will we pull the rug from under users who have only ever used the Python 3 code, and are relying on those semantics somehow? ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 2/3] git-p4: eliminate decode_stream and encode_stream 2021-12-10 15:30 [PATCH v2 0/3] Transition git-p4.py to support Python 3 only Joel Holdsworth 2021-12-10 15:30 ` [PATCH v2 1/3] git-p4: remove support for Python 2 Joel Holdsworth @ 2021-12-10 15:31 ` Joel Holdsworth 2021-12-10 15:31 ` [PATCH v2 3/3] git-p4: add "Nvidia Corporation" to copyright header Joel Holdsworth 2 siblings, 0 replies; 10+ messages in thread From: Joel Holdsworth @ 2021-12-10 15:31 UTC (permalink / raw) To: git Cc: Tzadik Vanderhoof, Dorgon Chang, Joachim Kuebart, Daniel Levin, Johannes Schindelin, Luke Diamand, Ben Keene, Andrew Oakley, Joel Holdsworth The decode_stream and encode_stream functions previously abstracted away the differences in string encode/decode behaviour between Python 2 and Python 3. Given that Python 2 is no longer supported, and the code paths for it have been removed, the abstraction is no longer necessary, and the script can therefore be simplified by eliminating these functions. Signed-off-by: Joel Holdsworth <jholdsworth@nvidia.com> --- git-p4.py | 49 +++++++++++++++++++------------------------------ 1 file changed, 19 insertions(+), 30 deletions(-) diff --git a/git-p4.py b/git-p4.py index e3fe86e4f2..5568d44c72 100755 --- a/git-p4.py +++ b/git-p4.py @@ -167,11 +167,6 @@ def prompt(prompt_text): if response in choices: return response -def decode_text_stream(s): - return s.decode() if isinstance(s, bytes) else s -def encode_text_stream(s): - return s.encode() if isinstance(s, str) else s - def decode_path(path): """Decode a given string (bytes or otherwise) using configured path encoding options """ @@ -276,7 +271,7 @@ def read_pipe_full(c): expand = not isinstance(c, list) p = subprocess.Popen(c, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=expand) (out, err) = p.communicate() - return (p.returncode, out, decode_text_stream(err)) + return (p.returncode, out, err.decode()) def read_pipe(c, ignore_error=False, raw=False): """ Read output from command. Returns the output text on @@ -288,22 +283,17 @@ def read_pipe(c, ignore_error=False, raw=False): (retcode, out, err) = read_pipe_full(c) if retcode != 0: if ignore_error: - out = "" + out = b"" else: die('Command failed: %s\nError: %s' % (str(c), err)) - if not raw: - out = decode_text_stream(out) - return out + return out if raw else out.decode() def read_pipe_text(c): """ Read output from a command with trailing whitespace stripped. On error, returns None. """ (retcode, out, err) = read_pipe_full(c) - if retcode != 0: - return None - else: - return decode_text_stream(out).rstrip() + return out.decode().rstrip() if retcode == 0 else None def p4_read_pipe(c, ignore_error=False, raw=False): real_cmd = p4_build_cmd(c) @@ -316,7 +306,7 @@ def read_pipe_lines(c): expand = not isinstance(c, list) p = subprocess.Popen(c, stdout=subprocess.PIPE, shell=expand) pipe = p.stdout - val = [decode_text_stream(line) for line in pipe.readlines()] + val = [line.decode() for line in pipe.readlines()] if pipe.close() or p.wait(): die('Command failed: %s' % str(c)) return val @@ -346,7 +336,7 @@ def p4_has_move_command(): cmd = p4_build_cmd(["move", "-k", "@from", "@to"]) p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE) (out, err) = p.communicate() - err = decode_text_stream(err) + err = err.decode() # return code will be 1 in either case if err.find("Invalid option") >= 0: return False @@ -721,7 +711,7 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False, stdin_file.write(stdin) else: for i in stdin: - stdin_file.write(encode_text_stream(i)) + stdin_file.write(i.encode() if isinstance(i, str) else i) stdin_file.write(b'\n') stdin_file.flush() stdin_file.seek(0) @@ -963,8 +953,7 @@ def branch_exists(branch): cmd = [ "git", "rev-parse", "--symbolic", "--verify", branch ] p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE) - out, _ = p.communicate() - out = decode_text_stream(out) + out = p.communicate()[0].decode() if p.returncode: return False # expect exactly one line of output: the branch name @@ -1349,7 +1338,7 @@ def generatePointer(self, contentFile): ['git', 'lfs', 'pointer', '--file=' + contentFile], stdout=subprocess.PIPE ) - pointerFile = decode_text_stream(pointerProcess.stdout.read()) + pointerFile = pointerProcess.stdout.read().decode() if pointerProcess.wait(): os.remove(contentFile) die('git-lfs pointer command failed. Did you install the extension?') @@ -2148,7 +2137,7 @@ def applyCommit(self, id): tmpFile = os.fdopen(handle, "w+b") if self.isWindows: submitTemplate = submitTemplate.replace("\n", "\r\n") - tmpFile.write(encode_text_stream(submitTemplate)) + tmpFile.write(submitTemplate.encode()) tmpFile.close() submitted = False @@ -2204,8 +2193,8 @@ def applyCommit(self, id): return False # read the edited message and submit - tmpFile = open(fileName, "rb") - message = decode_text_stream(tmpFile.read()) + with open(fileName, "r") as tmpFile: + message = tmpFile.read() tmpFile.close() if self.isWindows: message = message.replace("\r\n", "\n") @@ -2905,7 +2894,7 @@ def splitFilesIntoBranches(self, commit): return branches def writeToGitStream(self, gitMode, relPath, contents): - self.gitStream.write(encode_text_stream(u'M {} inline {}\n'.format(gitMode, relPath))) + self.gitStream.write('M {} inline {}\n'.format(gitMode, relPath)) self.gitStream.write('data %d\n' % sum(len(d) for d in contents)) for d in contents: self.gitStream.write(d) @@ -2947,7 +2936,7 @@ def streamOneP4File(self, file, contents): git_mode = "120000" # p4 print on a symlink sometimes contains "target\n"; # if it does, remove the newline - data = ''.join(decode_text_stream(c) for c in contents) + data = ''.join(c.decode() for c in contents) if not data: # Some version of p4 allowed creating a symlink that pointed # to nothing. This causes p4 errors when checking out such @@ -3001,9 +2990,9 @@ def streamOneP4File(self, file, contents): pattern = p4_keywords_regexp_for_type(type_base, type_mods) if pattern: regexp = re.compile(pattern, re.VERBOSE) - text = ''.join(decode_text_stream(c) for c in contents) + text = ''.join(c.decode() for c in contents) text = regexp.sub(r'$\1$', text) - contents = [ encode_text_stream(text) ] + contents = [text.encode()] if self.largeFileSystem: (git_mode, contents) = self.largeFileSystem.processContent(git_mode, relPath, contents) @@ -3015,7 +3004,7 @@ def streamOneP4Deletion(self, file): if verbose: sys.stdout.write("delete %s\n" % relPath) sys.stdout.flush() - self.gitStream.write(encode_text_stream(u'D {}\n'.format(relPath))) + self.gitStream.write('D {}\n'.format(relPath)) if self.largeFileSystem and self.largeFileSystem.isLargeFile(relPath): self.largeFileSystem.removeLargeFile(relPath) @@ -3115,9 +3104,9 @@ def streamP4FilesCbSelf(entry): if 'shelved_cl' in f: # Handle shelved CLs using the "p4 print file@=N" syntax to print # the contents - fileArg = f['path'] + encode_text_stream('@={}'.format(f['shelved_cl'])) + fileArg = f['path'] + '@={}'.format(f['shelved_cl']).encode() else: - fileArg = f['path'] + encode_text_stream('#{}'.format(f['rev'])) + fileArg = f['path'] + '#{}'.format(f['rev']).encode() fileArgs.append(fileArg) -- 2.33.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 3/3] git-p4: add "Nvidia Corporation" to copyright header 2021-12-10 15:30 [PATCH v2 0/3] Transition git-p4.py to support Python 3 only Joel Holdsworth 2021-12-10 15:30 ` [PATCH v2 1/3] git-p4: remove support for Python 2 Joel Holdsworth 2021-12-10 15:31 ` [PATCH v2 2/3] git-p4: eliminate decode_stream and encode_stream Joel Holdsworth @ 2021-12-10 15:31 ` Joel Holdsworth 2021-12-10 18:57 ` Luke Diamand 2021-12-11 21:19 ` Elijah Newren 2 siblings, 2 replies; 10+ messages in thread From: Joel Holdsworth @ 2021-12-10 15:31 UTC (permalink / raw) To: git Cc: Tzadik Vanderhoof, Dorgon Chang, Joachim Kuebart, Daniel Levin, Johannes Schindelin, Luke Diamand, Ben Keene, Andrew Oakley, Joel Holdsworth The inclusion of the coorporate copyright is a stipulation of the company code release process. --- git-p4.py | 1 + 1 file changed, 1 insertion(+) diff --git a/git-p4.py b/git-p4.py index 5568d44c72..17e18265dc 100755 --- a/git-p4.py +++ b/git-p4.py @@ -5,6 +5,7 @@ # Author: Simon Hausmann <simon@lst.de> # Copyright: 2007 Simon Hausmann <simon@lst.de> # 2007 Trolltech ASA +# 2021 Nvidia Corporation # License: MIT <http://www.opensource.org/licenses/mit-license.php> # # pylint: disable=invalid-name,missing-docstring,too-many-arguments,broad-except -- 2.33.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2 3/3] git-p4: add "Nvidia Corporation" to copyright header 2021-12-10 15:31 ` [PATCH v2 3/3] git-p4: add "Nvidia Corporation" to copyright header Joel Holdsworth @ 2021-12-10 18:57 ` Luke Diamand 2021-12-11 21:19 ` Elijah Newren 1 sibling, 0 replies; 10+ messages in thread From: Luke Diamand @ 2021-12-10 18:57 UTC (permalink / raw) To: Joel Holdsworth Cc: Git Users, Tzadik Vanderhoof, Dorgon Chang, Joachim Kuebart, Daniel Levin, Johannes Schindelin, Ben Keene, Andrew Oakley On Fri, 10 Dec 2021 at 15:31, Joel Holdsworth <jholdsworth@nvidia.com> wrote: > > The inclusion of the coorporate copyright is a stipulation of the > company code release process. This doesn't seem right to me. It seems very odd that there's a patch that just adds a copyright notice, with no further notice. What code does this cover and is now copyrighted? What are the license terms of the copyright holder? I think these things need to be made clear before this could be accepted. > --- > git-p4.py | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/git-p4.py b/git-p4.py > index 5568d44c72..17e18265dc 100755 > --- a/git-p4.py > +++ b/git-p4.py > @@ -5,6 +5,7 @@ > # Author: Simon Hausmann <simon@lst.de> > # Copyright: 2007 Simon Hausmann <simon@lst.de> > # 2007 Trolltech ASA > +# 2021 Nvidia Corporation > # License: MIT <http://www.opensource.org/licenses/mit-license.php> > # > # pylint: disable=invalid-name,missing-docstring,too-many-arguments,broad-except > -- > 2.33.0 > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 3/3] git-p4: add "Nvidia Corporation" to copyright header 2021-12-10 15:31 ` [PATCH v2 3/3] git-p4: add "Nvidia Corporation" to copyright header Joel Holdsworth 2021-12-10 18:57 ` Luke Diamand @ 2021-12-11 21:19 ` Elijah Newren 2021-12-13 0:11 ` brian m. carlson 1 sibling, 1 reply; 10+ messages in thread From: Elijah Newren @ 2021-12-11 21:19 UTC (permalink / raw) To: Joel Holdsworth Cc: Git Mailing List, Tzadik Vanderhoof, Dorgon Chang, Joachim Kuebart, Daniel Levin, Johannes Schindelin, Luke Diamand, Ben Keene, Andrew Oakley On Fri, Dec 10, 2021 at 12:30 PM Joel Holdsworth <jholdsworth@nvidia.com> wrote: > > The inclusion of the coorporate copyright is a stipulation of the > company code release process. > --- > git-p4.py | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/git-p4.py b/git-p4.py > index 5568d44c72..17e18265dc 100755 > --- a/git-p4.py > +++ b/git-p4.py > @@ -5,6 +5,7 @@ > # Author: Simon Hausmann <simon@lst.de> > # Copyright: 2007 Simon Hausmann <simon@lst.de> > # 2007 Trolltech ASA > +# 2021 Nvidia Corporation > # License: MIT <http://www.opensource.org/licenses/mit-license.php> > # > # pylint: disable=invalid-name,missing-docstring,too-many-arguments,broad-except > -- > 2.33.0 Can we just get rid of all these copyright notices from all files in Git? They're obviously out-of-date and not even close to an accurate indicator of authorship. For example, builtin/branch.c has: * Copyright (c) 2006 Kristian Høgsberg <krh@redhat.com> * Based on git-branch.sh by Junio C Hamano. Kristian only authored 1 patch for this file (though that one patch was submitted and attributed to Lars Hjemli in c31820c26b ("Make git-branch a builtin", 2006-10-23) with a note in the commit message about Kristian being the real author). I added a simple replace object to change the author attribution on Kristian's commit back to him, and then... Looking at shortlog, there are 86 different authors, with the top 7 having this many commits: $ git shortlog -sn --no-merges -- builtin/branch.c builtin-branch.c | head -n 7 42 Junio C Hamano 37 Jeff King 30 Nguyễn Thái Ngọc Duy 16 Karthik Nayak 12 René Scharfe 11 Lars Hjemli 11 Ævar Arnfjörð Bjarmason Looking at git blame, there are 61 different authors who have lines of code surviving until today, with the top 7 being: $ git blame -C -C builtin/branch.c | awk '{print $3 " " $4}' | sort | uniq -c | sort -rn | head -n 7 139 (Junio C 129 (Nguyễn Thái 122 (Karthik Nayak 40 (Jeff King 38 (Kristian H�gsberg 36 (Sahil Dua 35 (René Scharfe So the copyright notice is horribly misleading at best. It also seems like the wrong way to figure out the answer to _any_ question I can think of. (Some examples: "Who can review my changes to this file?", "Who do I need to contact for permission to relicense?", "Who should I praise for doing the work of making this code so great for me?", etc.) -- in all cases, shortlog, log, and blame are better tools. Can we just git rid of these lines entirely? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 3/3] git-p4: add "Nvidia Corporation" to copyright header 2021-12-11 21:19 ` Elijah Newren @ 2021-12-13 0:11 ` brian m. carlson 0 siblings, 0 replies; 10+ messages in thread From: brian m. carlson @ 2021-12-13 0:11 UTC (permalink / raw) To: Elijah Newren Cc: Joel Holdsworth, Git Mailing List, Tzadik Vanderhoof, Dorgon Chang, Joachim Kuebart, Daniel Levin, Johannes Schindelin, Luke Diamand, Ben Keene, Andrew Oakley [-- Attachment #1: Type: text/plain, Size: 2350 bytes --] On 2021-12-11 at 21:19:18, Elijah Newren wrote: > On Fri, Dec 10, 2021 at 12:30 PM Joel Holdsworth <jholdsworth@nvidia.com> wrote: > > > > The inclusion of the coorporate copyright is a stipulation of the > > company code release process. > > --- > > git-p4.py | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/git-p4.py b/git-p4.py > > index 5568d44c72..17e18265dc 100755 > > --- a/git-p4.py > > +++ b/git-p4.py > > @@ -5,6 +5,7 @@ > > # Author: Simon Hausmann <simon@lst.de> > > # Copyright: 2007 Simon Hausmann <simon@lst.de> > > # 2007 Trolltech ASA > > +# 2021 Nvidia Corporation > > # License: MIT <http://www.opensource.org/licenses/mit-license.php> > > # > > # pylint: disable=invalid-name,missing-docstring,too-many-arguments,broad-except > > -- > > 2.33.0 > > Can we just git rid of these lines entirely? In the case of the MIT License, it is a condition of the license that the copyright notices be preserved, so no, we cannot remove them. Specifically, the first paragraph, which grants permissions states that they are "subject to the following conditions", one of which is as follows: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. The other is the total exclusion of warranty or liability. As for the rest of the codebase, the GPL v2 states that the exercise of copying and distribution is permitted, "provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty." I am not an attorney, but I'm pretty sure that it would not be permissible to remove a copyright notice unless the code to which it referred were no longer present and that would not count as publishing an appropriate copyright notice. However, that doesn't mean we need to add to them, but I will state that as a contributor who primarily contributes on his own time, I don't think it's unreasonable for a contributor to request that a copyright notice be applied where applicable as attribution, since that's the only compensation one receives for one's contributions. Such copyright notices could live in a central file for convenience, however. -- brian m. carlson (he/him or they/them) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 262 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 1/3] git-p4: remove support for Python 2
@ 2021-12-13 0:30 Tzadik Vanderhoof
0 siblings, 0 replies; 10+ messages in thread
From: Tzadik Vanderhoof @ 2021-12-13 0:30 UTC (permalink / raw)
To: Git List
> On Sun, Dec 12, 2021, 5:39 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
> This summary makes sense, i.e. if the original SCM doesn't have a
> declared or consistent encoding then having no "encoding" header etc. in
> git likewise makes sense, and we should be trying to handle it in our
> output layer.
>
> [Snipped from above]:
>
> > It's not clear to me how "attempt to detect the encoding somehow" would
> > work. The first option therefore seems like the best choice.
>
> This really isn't possible to do in the general case, but you can get
> pretty far with heuristics.
>
> I already submitted a patch several months ago to introduce a "p4.fallbackEncoding" option. It got merged to at least the lowest branch, but I think it died at that point.
I did considerable research into the possible options at the time, and
I'm pretty sure the best approach would be:
Add an optional setting for the user to set the encoding.
When decoding, first try UTF-8. If that succeeds, then it's almost
certain that the encoding really is UTF-8. The nature of UTF-8 is that
non-UTF-8 text almost never just happens to be valid when decoded as
UTF-8.
If that fails, use the new setting if present.
This is what my patch does.
I think it would be better to go beyond that, and if it fails UTF-8,
and the new setting was not specified, then use some well- accepted
heuristic library to detect the encoding.
Frankly anything would be better than the current behavior, which is
to completely crash on the first non UTF-8 character encountered (at
least with Python 3).
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2021-12-13 0:30 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-12-10 15:30 [PATCH v2 0/3] Transition git-p4.py to support Python 3 only Joel Holdsworth 2021-12-10 15:30 ` [PATCH v2 1/3] git-p4: remove support for Python 2 Joel Holdsworth 2021-12-12 17:50 ` Andrew Oakley 2021-12-12 22:01 ` Ævar Arnfjörð Bjarmason 2021-12-10 15:31 ` [PATCH v2 2/3] git-p4: eliminate decode_stream and encode_stream Joel Holdsworth 2021-12-10 15:31 ` [PATCH v2 3/3] git-p4: add "Nvidia Corporation" to copyright header Joel Holdsworth 2021-12-10 18:57 ` Luke Diamand 2021-12-11 21:19 ` Elijah Newren 2021-12-13 0:11 ` brian m. carlson 2021-12-13 0:30 [PATCH v2 1/3] git-p4: remove support for Python 2 Tzadik Vanderhoof
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).