* [PATCH v1] git-p4: fix git-p4.pathEncoding for removed files
From: larsxschneider @ 2016-12-18 17:51 UTC
To: git; +Cc: luke, gitster, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

In a9e38359e3 we taught git-p4 a way to re-encode path names from what
was used in Perforce to UTF-8. This path re-encoding worked properly for
"added" paths. "Removed" paths were not re-encoded and therefore
different from the "added" paths. Consequently, these files were not
removed in a git-p4 cloned Git repository because the path names did not
match.

Fix this by moving the re-encoding to a place that affects "added" and
"removed" paths. Add a test to demonstrate the issue.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---

Notes:
    Base Commit: d1271bddd4 (v2.11.0)
    Diff on Web: https://github.com/git/git/compare/d1271bddd4...larsxschneider:05a82caa69
    Checkout: git fetch https://github.com/larsxschneider/git git-p4/fix-path-encoding-v1 && git checkout 05a82caa69

 git-p4.py                       | 19 +++++++++----------
 t/t9822-git-p4-path-encoding.sh | 16 ++++++++++++++++
 2 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index fd5ca52462..8f311cb4e8 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -2366,6 +2366,15 @@ class P4Sync(Command, P4UserMap):
                     break
 
         path = wildcard_decode(path)
+        try:
+            path.decode('ascii')
+        except:
+            encoding = 'utf8'
+            if gitConfig('git-p4.pathEncoding'):
+                encoding = gitConfig('git-p4.pathEncoding')
+            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
+            if self.verbose:
+                print 'Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path)
         return path
 
     def splitFilesIntoBranches(self, commit):
@@ -2495,16 +2504,6 @@ class P4Sync(Command, P4UserMap):
             text = regexp.sub(r'$\1$', text)
             contents = [ text ]
 
-        try:
-            relPath.decode('ascii')
-        except:
-            encoding = 'utf8'
-            if gitConfig('git-p4.pathEncoding'):
-                encoding = gitConfig('git-p4.pathEncoding')
-            relPath = relPath.decode(encoding, 'replace').encode('utf8', 'replace')
-            if self.verbose:
-                print 'Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, relPath)
-
         if self.largeFileSystem:
             (git_mode, contents) = self.largeFileSystem.processContent(git_mode, relPath, contents)
 
diff --git a/t/t9822-git-p4-path-encoding.sh b/t/t9822-git-p4-path-encoding.sh
index 7b83e696a9..c78477c19b 100755
--- a/t/t9822-git-p4-path-encoding.sh
+++ b/t/t9822-git-p4-path-encoding.sh
@@ -51,6 +51,22 @@ test_expect_success 'Clone repo containing iso8859-1 encoded paths with git-p4.p
 	)
 '
 
+test_expect_success 'Delete iso8859-1 encoded paths and clone' '
+	(
+		cd "$cli" &&
+		ISO8859="$(printf "$ISO8859_ESCAPED")" &&
+		p4 delete "$ISO8859" &&
+		p4 submit -d "remove file"
+	) &&
+	git p4 clone --destination="$git" //depot@all &&
+	test_when_finished cleanup_git &&
+	(
+		cd "$git" &&
+		git -c core.quotepath=false ls-files >actual &&
+		test_must_be_empty actual
+	)
+'
+
 test_expect_success 'kill p4d' '
 	kill_p4d
 '
-- 
2.11.0
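The re-encoding step that this patch moves around can be exercised on its own. Below is a minimal Python 3 sketch, assuming paths arrive as raw bytes and that the git-p4.pathEncoding value is passed in as a plain argument (`reencode_path` and `path_encoding` are hypothetical names, not git-p4 API). Unlike the patch, which targets Python 2 and uses a bare `except:`, it catches `UnicodeDecodeError` explicitly.

```python
from typing import Optional


def reencode_path(path: bytes, path_encoding: Optional[str] = None) -> bytes:
    """Re-encode a Perforce path (raw bytes) to UTF-8.

    Follows the logic in the patch: pure-ASCII paths pass through
    unchanged; anything else is decoded with the configured encoding
    (default 'utf8') and re-encoded as UTF-8, replacing undecodable bytes.
    `path_encoding` stands in for the git-p4.pathEncoding config value.
    """
    try:
        path.decode('ascii')
        return path  # plain ASCII: nothing to do
    except UnicodeDecodeError:
        encoding = path_encoding or 'utf8'
        return path.decode(encoding, 'replace').encode('utf8', 'replace')


# An ISO-8859-1 encoded path: the byte 0xE4 is 'ä' in Latin-1.
latin1_path = 'a-ä.txt'.encode('iso-8859-1')
print(reencode_path(latin1_path, 'iso-8859-1'))  # b'a-\xc3\xa4.txt'
print(reencode_path(b'plain/ascii.txt'))         # b'plain/ascii.txt'
```

Without a configured encoding, the Latin-1 byte is not valid UTF-8 and is replaced with U+FFFD, so depots that store paths in a legacy encoding need git-p4.pathEncoding set.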
* Re: [PATCH v1] git-p4: fix git-p4.pathEncoding for removed files
From: Junio C Hamano @ 2016-12-19 21:29 UTC
To: larsxschneider; +Cc: git, luke

larsxschneider@gmail.com writes:

> From: Lars Schneider <larsxschneider@gmail.com>
>
> In a9e38359e3 we taught git-p4 a way to re-encode path names from what
> was used in Perforce to UTF-8. This path re-encoding worked properly for
> "added" paths. "Removed" paths were not re-encoded and therefore
> different from the "added" paths. Consequently, these files were not
> removed in a git-p4 cloned Git repository because the path names did not
> match.
>
> Fix this by moving the re-encoding to a place that affects "added" and
> "removed" paths. Add a test to demonstrate the issue.
>
> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
> ---

Thanks.

The above description makes me wonder what happens to "modified"
paths, but presumably they are handled in a separate codepath?  Or
does this also cover not just "removed" but also paths with any
change?

Luke, does this look good?
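The mismatch described in the commit message can be reproduced with a toy model. This is hypothetical code, not git-p4: it assumes an ISO-8859-1 depot, always re-encodes additions (as before the fix), and removes a tree entry only if the deletion path matches it byte for byte.

```python
def reencode(path: bytes, encoding: str = 'iso-8859-1') -> bytes:
    """Toy stand-in for git-p4's path re-encoding (assumes a Latin-1 depot)."""
    try:
        path.decode('ascii')
        return path
    except UnicodeDecodeError:
        return path.decode(encoding, 'replace').encode('utf8', 'replace')


def replay(changes, encode_deletions: bool) -> set:
    """Replay (action, depot_path) pairs into a set of Git tree paths.

    Additions are always re-encoded; deletions are re-encoded only when
    `encode_deletions` is True (the behaviour the patch introduces).
    """
    tree = set()
    for action, depot_path in changes:
        if action == 'add':
            tree.add(reencode(depot_path))
        elif action == 'delete':
            path = reencode(depot_path) if encode_deletions else depot_path
            tree.discard(path)  # a mismatched path removes nothing
    return tree


changes = [('add', 'ä.txt'.encode('iso-8859-1')),
           ('delete', 'ä.txt'.encode('iso-8859-1'))]
print(replay(changes, encode_deletions=False))  # {b'\xc3\xa4.txt'}: file survives
print(replay(changes, encode_deletions=True))   # set(): file removed
```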
* Re: [PATCH v1] git-p4: fix git-p4.pathEncoding for removed files
From: Luke Diamand @ 2016-12-20 11:01 UTC
To: Junio C Hamano; +Cc: Lars Schneider, Git Users

On 19 December 2016 at 21:29, Junio C Hamano <gitster@pobox.com> wrote:
> larsxschneider@gmail.com writes:
>
>> From: Lars Schneider <larsxschneider@gmail.com>
>>
>> In a9e38359e3 we taught git-p4 a way to re-encode path names from what
>> was used in Perforce to UTF-8. This path re-encoding worked properly for
>> "added" paths. "Removed" paths were not re-encoded and therefore
>> different from the "added" paths. Consequently, these files were not
>> removed in a git-p4 cloned Git repository because the path names did not
>> match.
>>
>> Fix this by moving the re-encoding to a place that affects "added" and
>> "removed" paths. Add a test to demonstrate the issue.
>>
>> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
>> ---
>
> Thanks.
>
> The above description makes me wonder what happens to "modified"
> paths, but presumably they are handled in a separate codepath?  Or
> does this also cover not just "removed" but also paths with any
> change?
>
> Luke, does this look good?

I'm not totally sure.

In the previous version the conversion happened in streamOneP4File().
There is a counterpart to this, streamOneP4Deletion() which would seem
like the callpoint that needs to know about this.

The change puts the logic into stripRepoPath() instead, which is
indeed called from both of those functions (good), but also from
splitFilesIntoBranches(), but only if self.useClientSpec is set. That
function only gets used if we're doing the automatic branch detection
logic, so it's possible that this code might now be broken and we
wouldn't know.

Lars, what do you think? Other than the above, the change looks good,
so it may all be fine.

(As an aside, this is the heart of the code that's going to need some
careful rework if/when we ever move to Python3).

Luke
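Beyond the branch-detection caller Luke mentions, one general hazard of hiding the conversion inside a widely shared helper (an observation added here, not raised in the thread) is that the conversion is not idempotent for single-byte encodings: a path that passes through it twice gets double-encoded. A small Python 3 illustration, with `encode_with_utf8` as a hypothetical stand-in for the patch's logic and ISO-8859-1 assumed as the configured git-p4.pathEncoding:

```python
def encode_with_utf8(path: bytes, path_encoding: str) -> bytes:
    """Sketch of the re-encoding step with git-p4.pathEncoding = path_encoding."""
    try:
        path.decode('ascii')
        return path
    except UnicodeDecodeError:
        return path.decode(path_encoding, 'replace').encode('utf8', 'replace')


raw = 'ä.txt'.encode('iso-8859-1')            # b'\xe4.txt' as stored in Perforce
once = encode_with_utf8(raw, 'iso-8859-1')    # b'\xc3\xa4.txt', i.e. 'ä.txt'
twice = encode_with_utf8(once, 'iso-8859-1')  # b'\xc3\x83\xc2\xa4.txt', i.e. 'Ã¤.txt'
print(once.decode('utf8'), twice.decode('utf8'))  # ä.txt Ã¤.txt
```

The already-converted UTF-8 bytes are still non-ASCII, so the second pass decodes them as Latin-1 again. Converting at exactly one explicit point per code path avoids this class of mistake.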
* Re: [PATCH v1] git-p4: fix git-p4.pathEncoding for removed files
From: Junio C Hamano @ 2016-12-22 21:23 UTC
To: Luke Diamand; +Cc: Lars Schneider, Git Users

Luke Diamand <luke@diamand.org> writes:

> The change puts the logic into stripRepoPath() instead, which is
> indeed called from both of those functions (good), but also from
> splitFilesIntoBranches(), but only if self.useClientSpec is set. That
> function only gets used if we're doing the automatic branch detection
> logic, so it's possible that this code might now be broken and we
> wouldn't know.
>
> Lars, what do you think? Other than the above, the change looks good,
> so it may all be fine.
>
> (As an aside, this is the heart of the code that's going to need some
> careful rework if/when we ever move to Python3).

Thanks.  I'll merge this as-is to 'next', expecting that further
refinement can be done incrementally.
* [PATCH v2] git-p4: fix git-p4.pathEncoding for removed files
From: Lars Schneider @ 2017-02-09 15:06 UTC
To: git; +Cc: luke, gitster

In a9e38359e3 we taught git-p4 a way to re-encode path names from what
was used in Perforce to UTF-8. This path re-encoding worked properly for
"added" paths. "Removed" paths were not re-encoded and therefore
different from the "added" paths. Consequently, these files were not
removed in a git-p4 cloned Git repository because the path names did not
match.

Fix this by moving the re-encoding to a place that affects "added" and
"removed" paths. Add a test to demonstrate the issue.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---

Hi,

unfortunately, I missed to send this v2. I agree with Luke's review and
I moved the re-encode of the path name to the `streamOneP4File` and
`streamOneP4Deletion` explicitly.

Discussion:
http://public-inbox.org/git/CAE5ih7-=bD_ZoL5pFYfD2Qvy-XE24V_cgge0XoAvuoTK02EDfg@mail.gmail.com/

Thanks,
Lars


Notes:
    Base Commit: 454cb6bd52 (v2.11.0)
    Diff on Web: https://github.com/larsxschneider/git/commit/75ed3e92e2
    Checkout: git fetch https://github.com/larsxschneider/git git-p4/fix-path-encoding-v2 && git checkout 75ed3e92e2

    Interdiff (v1..v2):

    diff --git a/git-p4.py b/git-p4.py
    index 8f311cb4e8..dac8b4955d 100755
    --- a/git-p4.py
    +++ b/git-p4.py
    @@ -2366,15 +2366,6 @@ class P4Sync(Command, P4UserMap):
                         break
     
             path = wildcard_decode(path)
    -        try:
    -            path.decode('ascii')
    -        except:
    -            encoding = 'utf8'
    -            if gitConfig('git-p4.pathEncoding'):
    -                encoding = gitConfig('git-p4.pathEncoding')
    -            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
    -            if self.verbose:
    -                print 'Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path)
             return path
     
         def splitFilesIntoBranches(self, commit):
    @@ -2427,11 +2418,24 @@ class P4Sync(Command, P4UserMap):
                 self.gitStream.write(d)
             self.gitStream.write('\n')
     
    +    def encodeWithUTF8(self, path):
    +        try:
    +            path.decode('ascii')
    +        except:
    +            encoding = 'utf8'
    +            if gitConfig('git-p4.pathEncoding'):
    +                encoding = gitConfig('git-p4.pathEncoding')
    +            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
    +            if self.verbose:
    +                print 'Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path)
    +        return path
    +
         # output one file from the P4 stream
         # - helper for streamP4Files
     
         def streamOneP4File(self, file, contents):
             relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
    +        relPath = self.encodeWithUTF8(relPath)
             if verbose:
                 size = int(self.stream_file['fileSize'])
                 sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], relPath, size/1024/1024))
    @@ -2511,6 +2515,7 @@ class P4Sync(Command, P4UserMap):
     
         def streamOneP4Deletion(self, file):
             relPath = self.stripRepoPath(file['path'], self.branchPrefixes)
    +        relPath = self.encodeWithUTF8(relPath)
             if verbose:
                 sys.stdout.write("delete %s\n" % relPath)
                 sys.stdout.flush()

 git-p4.py                       | 24 ++++++++++++++----------
 t/t9822-git-p4-path-encoding.sh | 16 ++++++++++++++++
 2 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index fd5ca52462..dac8b4955d 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -2418,11 +2418,24 @@ class P4Sync(Command, P4UserMap):
             self.gitStream.write(d)
         self.gitStream.write('\n')
 
+    def encodeWithUTF8(self, path):
+        try:
+            path.decode('ascii')
+        except:
+            encoding = 'utf8'
+            if gitConfig('git-p4.pathEncoding'):
+                encoding = gitConfig('git-p4.pathEncoding')
+            path = path.decode(encoding, 'replace').encode('utf8', 'replace')
+            if self.verbose:
+                print 'Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, path)
+        return path
+
     # output one file from the P4 stream
     # - helper for streamP4Files
 
     def streamOneP4File(self, file, contents):
         relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
+        relPath = self.encodeWithUTF8(relPath)
         if verbose:
             size = int(self.stream_file['fileSize'])
             sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], relPath, size/1024/1024))
@@ -2495,16 +2508,6 @@ class P4Sync(Command, P4UserMap):
             text = regexp.sub(r'$\1$', text)
             contents = [ text ]
 
-        try:
-            relPath.decode('ascii')
-        except:
-            encoding = 'utf8'
-            if gitConfig('git-p4.pathEncoding'):
-                encoding = gitConfig('git-p4.pathEncoding')
-            relPath = relPath.decode(encoding, 'replace').encode('utf8', 'replace')
-            if self.verbose:
-                print 'Path with non-ASCII characters detected. Used %s to encode: %s ' % (encoding, relPath)
-
         if self.largeFileSystem:
             (git_mode, contents) = self.largeFileSystem.processContent(git_mode, relPath, contents)
 
@@ -2512,6 +2515,7 @@ class P4Sync(Command, P4UserMap):
 
     def streamOneP4Deletion(self, file):
         relPath = self.stripRepoPath(file['path'], self.branchPrefixes)
+        relPath = self.encodeWithUTF8(relPath)
         if verbose:
             sys.stdout.write("delete %s\n" % relPath)
             sys.stdout.flush()
diff --git a/t/t9822-git-p4-path-encoding.sh b/t/t9822-git-p4-path-encoding.sh
index 7b83e696a9..c78477c19b 100755
--- a/t/t9822-git-p4-path-encoding.sh
+++ b/t/t9822-git-p4-path-encoding.sh
@@ -51,6 +51,22 @@ test_expect_success 'Clone repo containing iso8859-1 encoded paths with git-p4.p
 	)
 '
 
+test_expect_success 'Delete iso8859-1 encoded paths and clone' '
+	(
+		cd "$cli" &&
+		ISO8859="$(printf "$ISO8859_ESCAPED")" &&
+		p4 delete "$ISO8859" &&
+		p4 submit -d "remove file"
+	) &&
+	git p4 clone --destination="$git" //depot@all &&
+	test_when_finished cleanup_git &&
+	(
+		cd "$git" &&
+		git -c core.quotepath=false ls-files >actual &&
+		test_must_be_empty actual
+	)
+'
+
 test_expect_success 'kill p4d' '
 	kill_p4d
 '

base-commit: 454cb6bd52a4de614a3633e4f547af03d5c3b640
-- 
2.11.0
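The v2 layout, with stripRepoPath() left alone and the conversion called explicitly from both stream helpers, can be modelled in a few lines. This is a hypothetical, heavily simplified Python 3 sketch: `P4SyncSketch`, its method names, the prefix stripping and the fixed `100644` mode are assumptions for illustration, and the emitted lines only imitate fast-import's `M` and `D` commands.

```python
from typing import List, Optional


class P4SyncSketch:
    """Hypothetical, simplified model of the v2 layout.

    strip_repo_path() only strips the depot prefix; the UTF-8 conversion
    is applied explicitly in both the file and the deletion helpers.
    """

    def __init__(self, prefix: bytes, path_encoding: Optional[str] = None):
        self.prefix = prefix
        self.path_encoding = path_encoding  # stands in for git-p4.pathEncoding
        self.commands: List[bytes] = []     # fast-import-style command lines

    def strip_repo_path(self, depot_path: bytes) -> bytes:
        if depot_path.startswith(self.prefix):
            return depot_path[len(self.prefix):]
        return depot_path

    def encode_with_utf8(self, path: bytes) -> bytes:
        try:
            path.decode('ascii')
            return path
        except UnicodeDecodeError:
            encoding = self.path_encoding or 'utf8'
            return path.decode(encoding, 'replace').encode('utf8', 'replace')

    def stream_one_file(self, depot_path: bytes) -> None:
        rel = self.encode_with_utf8(self.strip_repo_path(depot_path))
        self.commands.append(b'M 100644 inline ' + rel)

    def stream_one_deletion(self, depot_path: bytes) -> None:
        rel = self.encode_with_utf8(self.strip_repo_path(depot_path))
        self.commands.append(b'D ' + rel)


sync = P4SyncSketch(b'//depot/', 'iso-8859-1')
depot_path = '//depot/ä.txt'.encode('iso-8859-1')
sync.stream_one_file(depot_path)
sync.stream_one_deletion(depot_path)
for cmd in sync.commands:
    print(cmd)
# b'M 100644 inline \xc3\xa4.txt'
# b'D \xc3\xa4.txt'
```

Because both helpers share the same conversion, the modify and delete commands name the same UTF-8 path, which is what the new t9822 test checks end to end.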
* Re: [PATCH v2] git-p4: fix git-p4.pathEncoding for removed files
From: Junio C Hamano @ 2017-02-09 23:39 UTC
To: Lars Schneider; +Cc: git, luke

Lars Schneider <larsxschneider@gmail.com> writes:

> unfortunately, I missed to send this v2. I agree with Luke's review and
> I moved the re-encode of the path name to the `streamOneP4File` and
> `streamOneP4Deletion` explicitly.
>
> Discussion:
> http://public-inbox.org/git/CAE5ih7-=bD_ZoL5pFYfD2Qvy-XE24V_cgge0XoAvuoTK02EDfg@mail.gmail.com/
>
> Thanks,
> Lars

Thanks.  Will replace but will not immediately merge to 'next' yet,
just in case Luke wants to tell me add his "Reviewed-by:".
* Re: [PATCH v2] git-p4: fix git-p4.pathEncoding for removed files
From: Luke Diamand @ 2017-02-10 22:05 UTC
To: Junio C Hamano; +Cc: Lars Schneider, Git Users

On 9 February 2017 at 23:39, Junio C Hamano <gitster@pobox.com> wrote:
> Lars Schneider <larsxschneider@gmail.com> writes:
>
>> unfortunately, I missed to send this v2. I agree with Luke's review and
>> I moved the re-encode of the path name to the `streamOneP4File` and
>> `streamOneP4Deletion` explicitly.
>>
>> Discussion:
>> http://public-inbox.org/git/CAE5ih7-=bD_ZoL5pFYfD2Qvy-XE24V_cgge0XoAvuoTK02EDfg@mail.gmail.com/
>>
>> Thanks,
>> Lars
>
> Thanks.  Will replace but will not immediately merge to 'next' yet,
> just in case Luke wants to tell me add his "Reviewed-by:".

Yes, this looks good to me now.

Luke
* Re: [PATCH v2] git-p4: fix git-p4.pathEncoding for removed files
From: Junio C Hamano @ 2017-02-10 22:32 UTC
To: Luke Diamand; +Cc: Lars Schneider, Git Users

Luke Diamand <luke@diamand.org> writes:

> On 9 February 2017 at 23:39, Junio C Hamano <gitster@pobox.com> wrote:
>> Lars Schneider <larsxschneider@gmail.com> writes:
>>
>>> unfortunately, I missed to send this v2. I agree with Luke's review and
>>> I moved the re-encode of the path name to the `streamOneP4File` and
>>> `streamOneP4Deletion` explicitly.
>>>
>>> Discussion:
>>> http://public-inbox.org/git/CAE5ih7-=bD_ZoL5pFYfD2Qvy-XE24V_cgge0XoAvuoTK02EDfg@mail.gmail.com/
>>>
>>> Thanks,
>>> Lars
>>
>> Thanks.  Will replace but will not immediately merge to 'next' yet,
>> just in case Luke wants to tell me add his "Reviewed-by:".
>
> Yes, this looks good to me now.

Thanks.