* [PATCH] git-p4: add "--path-encoding" option
@ 2015-08-31 15:40 larsxschneider
2015-08-31 15:40 ` larsxschneider
0 siblings, 1 reply; 5+ messages in thread
From: larsxschneider @ 2015-08-31 15:40 UTC
To: git; +Cc: luke, Lars Schneider
From: Lars Schneider <larsxschneider@gmail.com>
Hi,
I think I discovered a path encoding issue that shows up when you migrate P4 repositories containing path names created on Windows. I added a test case to demonstrate it. Character encoding is a complicated topic, so feedback is highly appreciated.
Thanks,
Lars
Lars Schneider (1):
git-p4: add "--path-encoding" option
Documentation/git-p4.txt | 4 ++++
git-p4.py | 6 ++++++
t/t9821-git-p4-path-encoding.sh | 38 ++++++++++++++++++++++++++++++++++++++
3 files changed, 48 insertions(+)
create mode 100755 t/t9821-git-p4-path-encoding.sh
--
2.5.1.1.g36ff854
* [PATCH] git-p4: add "--path-encoding" option
2015-08-31 15:40 [PATCH] git-p4: add "--path-encoding" option larsxschneider
@ 2015-08-31 15:40 ` larsxschneider
2015-08-31 17:40 ` Junio C Hamano
0 siblings, 1 reply; 5+ messages in thread
From: larsxschneider @ 2015-08-31 15:40 UTC
To: git; +Cc: luke, Lars Schneider
From: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
Documentation/git-p4.txt | 4 ++++
git-p4.py | 6 ++++++
t/t9821-git-p4-path-encoding.sh | 38 ++++++++++++++++++++++++++++++++++++++
3 files changed, 48 insertions(+)
create mode 100755 t/t9821-git-p4-path-encoding.sh
diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
index 82aa5d6..98b6c0f 100644
--- a/Documentation/git-p4.txt
+++ b/Documentation/git-p4.txt
@@ -252,6 +252,10 @@ Git repository:
Use a client spec to find the list of interesting files in p4.
See the "CLIENT SPEC" section below.
+----path-encoding <encoding>::
+ The encoding to use when reading p4 client paths. With this option
+ non ASCII paths are properly stored in Git. For example, the encoding 'cp1252' is often used on Windows systems.
+
-/ <path>::
Exclude selected depot paths when cloning or syncing.
diff --git a/git-p4.py b/git-p4.py
index 073f87b..2b3bfc4 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -1981,6 +1981,8 @@ class P4Sync(Command, P4UserMap):
optparse.make_option("--silent", dest="silent", action="store_true"),
optparse.make_option("--detect-labels", dest="detectLabels", action="store_true"),
optparse.make_option("--import-labels", dest="importLabels", action="store_true"),
+ optparse.make_option("--path-encoding", dest="pathEncoding", type="string",
+ help="Encoding to use for paths"),
optparse.make_option("--import-local", dest="importIntoRemotes", action="store_false",
help="Import into refs/heads/ , not refs/remotes"),
optparse.make_option("--max-changes", dest="maxChanges",
@@ -2025,6 +2027,7 @@ class P4Sync(Command, P4UserMap):
self.clientSpecDirs = None
self.tempBranches = []
self.tempBranchLocation = "git-p4-tmp"
+ self.pathEncoding = None
if gitConfig("git-p4.syncFromOrigin") == "false":
self.syncWithOrigin = False
@@ -2213,6 +2216,9 @@ class P4Sync(Command, P4UserMap):
text = regexp.sub(r'$\1$', text)
contents = [ text ]
+ if self.pathEncoding:
+ relPath = relPath.decode(self.pathEncoding).encode('utf8', 'replace')
+
self.gitStream.write("M %s inline %s\n" % (git_mode, relPath))
# total length...
diff --git a/t/t9821-git-p4-path-encoding.sh b/t/t9821-git-p4-path-encoding.sh
new file mode 100755
index 0000000..f6bb79c
--- /dev/null
+++ b/t/t9821-git-p4-path-encoding.sh
@@ -0,0 +1,38 @@
+#!/bin/sh
+
+test_description='Clone repositories with non ASCII paths'
+
+. ./lib-git-p4.sh
+
+test_expect_success 'start p4d' '
+ start_p4d
+'
+
+test_expect_success 'Create a repo containing cp1251 encoded paths' '
+ cd "$cli" &&
+
+ FILENAME="$(echo "a-ä_o-ö_u-ü.txt" | iconv -f utf-8 -t cp1252)" &&
+ >"$FILENAME" &&
+ p4 add "$FILENAME" &&
+ p4 submit -d "test"
+'
+
+test_expect_success 'Clone repo containing cp1251 encoded paths' '
+ git p4 clone --destination="$git" --path-encoding=cp1252 //depot &&
+ test_when_finished cleanup_git &&
+ (
+ cd "$git" &&
+ git init . &&
+ cat >expect <<-\EOF &&
+ "a-\303\244_o-\303\266_u-\303\274.txt"
+ EOF
+ git ls-files >actual &&
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'kill p4d' '
+ kill_p4d
+'
+
+test_done
--
2.5.1.1.g36ff854
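
The re-encoding step added to git-p4.py above can be tried in isolation. The following is a minimal sketch in Python 3 (the patch itself targets Python 2, where `relPath` is already a byte string); the helper name `reencode_path` is hypothetical and does not exist in git-p4.

```python
def reencode_path(raw_path, path_encoding=None):
    """Return raw_path as UTF-8 bytes.

    Hypothetical helper mirroring the patch: without an encoding the
    bytes pass through untouched, otherwise they are decoded with the
    given encoding and re-encoded as UTF-8.
    """
    if not path_encoding:
        return raw_path
    return raw_path.decode(path_encoding).encode("utf-8", "replace")


# The file name used in the test, as a Windows client would store it.
cp1252_name = "a-\u00e4_o-\u00f6_u-\u00fc.txt".encode("cp1252")

print(cp1252_name)                           # single-byte umlauts
print(reencode_path(cp1252_name, "cp1252"))  # two-byte UTF-8 umlauts
```

Note that the decode step is strict: bytes that are not valid in the chosen encoding raise an error rather than being replaced.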
* Re: [PATCH] git-p4: add "--path-encoding" option
2015-08-31 15:40 ` larsxschneider
@ 2015-08-31 17:40 ` Junio C Hamano
2015-08-31 19:22 ` Torsten Bögershausen
0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2015-08-31 17:40 UTC
To: larsxschneider; +Cc: git, luke
larsxschneider@gmail.com writes:
> From: Lars Schneider <larsxschneider@gmail.com>
>
> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
> ---
> Documentation/git-p4.txt | 4 ++++
> git-p4.py | 6 ++++++
> t/t9821-git-p4-path-encoding.sh | 38 ++++++++++++++++++++++++++++++++++++++
> 3 files changed, 48 insertions(+)
> create mode 100755 t/t9821-git-p4-path-encoding.sh
>
> diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
> index 82aa5d6..98b6c0f 100644
> --- a/Documentation/git-p4.txt
> +++ b/Documentation/git-p4.txt
> @@ -252,6 +252,10 @@ Git repository:
> Use a client spec to find the list of interesting files in p4.
> See the "CLIENT SPEC" section below.
>
> +----path-encoding <encoding>::
> + The encoding to use when reading p4 client paths. With this option
> + non ASCII paths are properly stored in Git. For example, the encoding 'cp1252' is often used on Windows systems.
> +
This line is overly long. Let AsciiDoc wrap it upon output and keep
the source within a reasonable limit (see existing lines around the
new text to see what is considered reasonable).
Do I see too many dashes before the option name, by the way, or is
it my e-mail client tricking my eyes?
> diff --git a/t/t9821-git-p4-path-encoding.sh b/t/t9821-git-p4-path-encoding.sh
> new file mode 100755
> index 0000000..f6bb79c
> --- /dev/null
> +++ b/t/t9821-git-p4-path-encoding.sh
> @@ -0,0 +1,38 @@
> +#!/bin/sh
> +
> +test_description='Clone repositories with non ASCII paths'
> +
> +. ./lib-git-p4.sh
> +
> +test_expect_success 'start p4d' '
> + start_p4d
> +'
> +
> +test_expect_success 'Create a repo containing cp1251 encoded paths' '
> + cd "$cli" &&
> +
> + FILENAME="$(echo "a-ä_o-ö_u-ü.txt" | iconv -f utf-8 -t cp1252)" &&
Hmm, we'd be better off not having a bare UTF-8 sequence in the
source like this, especially when you already have the same thing
backslash-escaped in the "expect" file below. Perhaps
NAME="a-\303\244_o-\303\266_u-\303\274.txt" &&
UTF8=$(printf "$NAME") &&
CP1252=$(printf "$NAME" | iconv -t cp1252) &&
echo "\"$UTF8\"" >expect &&
>"$CP1252" &&
p4 add "$CP1252" &&
...
or something along that line?
> + >"$FILENAME" &&
> + p4 add "$FILENAME" &&
> + p4 submit -d "test"
> +'
> +
> +test_expect_success 'Clone repo containing cp1251 encoded paths' '
> + git p4 clone --destination="$git" --path-encoding=cp1252 //depot &&
> + test_when_finished cleanup_git &&
> + (
> + cd "$git" &&
> + git init . &&
> + cat >expect <<-\EOF &&
> + "a-\303\244_o-\303\266_u-\303\274.txt"
> + EOF
> + git ls-files >actual &&
> + test_cmp expect actual
> + )
> +'
> +
> +test_expect_success 'kill p4d' '
> + kill_p4d
> +'
> +
> +test_done
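
The backslash-escaped name in the "expect" file is the form in which `git ls-files` prints paths containing non-ASCII bytes by default: wrapped in double quotes, with each such byte written as a three-digit octal escape. A rough sketch of that quoting follows; `quote_path` is a hypothetical helper, and it only approximates Git's behaviour (Git uses short escapes such as `\n` and `\t` for some control characters, which this sketch writes as octal).

```python
def quote_path(path_bytes):
    """Approximate Git's default path quoting (illustrative only)."""
    def is_plain(b):
        # Printable ASCII except double quote (0x22) and backslash (0x5c).
        return 0x20 <= b < 0x7F and b not in (0x22, 0x5C)

    if all(is_plain(b) for b in path_bytes):
        return path_bytes.decode("ascii")  # nothing to quote

    out = []
    for b in path_bytes:
        if b in (0x22, 0x5C):
            out.append("\\" + chr(b))      # escape quote and backslash
        elif is_plain(b):
            out.append(chr(b))
        else:
            out.append("\\%03o" % b)       # octal escape per byte
    return '"' + "".join(out) + '"'


utf8_name = "a-\u00e4_o-\u00f6_u-\u00fc.txt".encode("utf-8")
print(quote_path(utf8_name))
```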
* Re: [PATCH] git-p4: add "--path-encoding" option
2015-08-31 17:40 ` Junio C Hamano
@ 2015-08-31 19:22 ` Torsten Bögershausen
2015-08-31 20:09 ` Junio C Hamano
0 siblings, 1 reply; 5+ messages in thread
From: Torsten Bögershausen @ 2015-08-31 19:22 UTC
To: Junio C Hamano, larsxschneider; +Cc: git, luke
On 2015-08-31 19.40, Junio C Hamano wrote:
> larsxschneider@gmail.com writes:
>> +test_expect_success 'Create a repo containing cp1251 encoded paths' '
>> + cd "$cli" &&
>> +
>> + FILENAME="$(echo "a-ä_o-ö_u-ü.txt" | iconv -f utf-8 -t cp1252)" &&
>
> Hmm, we'd be better off not having a bare UTF-8 sequence in the
> source like this, especially when you already have the same thing
> backslash-escaped in the "expect" file below. Perhaps
>
> NAME="a-\303\244_o-\303\266_u-\303\274.txt" &&
>
> UTF8=$(printf "$NAME") &&
> CP1252=$(printf "$NAME" | iconv -t cp1252) &&
> echo "\"$UTF8\"" >expect &&
>
> >"$CP1252" &&
> p4 add "$CP1252" &&
> ...
>
Using file names and iconv like this may not be portable:
- cp1252 may be called CP1252 (or may not be available)
- reading from stdin is not necessarily supported by iconv
- creating files in CP1252 may not be supported under Mac OS
(Not sure about Windows)
One solution could be to use ISO-8859-1, convert into UTF-8,
and "convert into UTF-8" one more time.
We can skip using iconv in the test case completely, and use
something like this:
(Fully untested)
UTF8=$(printf '\303\203\302\204')
NAME=$(printf '\303\204')
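
One way to read the two printf lines above: NAME holds the UTF-8 bytes of "Ä", and UTF8 holds what results when those bytes are treated as ISO-8859-1 and converted to UTF-8 once more. The sketch below reproduces that relationship under this reading; the helper name `double_encode` is hypothetical.

```python
def double_encode(utf8_bytes, assumed_encoding="iso-8859-1"):
    """Treat UTF-8 bytes as if they were in assumed_encoding and
    convert them to UTF-8 again (hypothetical helper)."""
    return utf8_bytes.decode(assumed_encoding).encode("utf-8")


name = b"\303\204"            # UTF-8 for U+00C4, same octal escapes as printf
utf8 = double_encode(name)

print(name)
print(utf8)
print(utf8 == b"\303\203\302\204")
```

With this approach the file name on disk is plain UTF-8, which every platform can create, while the expected result after importing with ISO-8859-1 as the path encoding is the double-encoded form.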
* Re: [PATCH] git-p4: add "--path-encoding" option
2015-08-31 19:22 ` Torsten Bögershausen
@ 2015-08-31 20:09 ` Junio C Hamano
0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2015-08-31 20:09 UTC
To: Torsten Bögershausen; +Cc: larsxschneider, git, luke
Torsten Bögershausen <tboegi@web.de> writes:
> On 2015-08-31 19.40, Junio C Hamano wrote:
>> larsxschneider@gmail.com writes:
>
>>> +test_expect_success 'Create a repo containing cp1251 encoded paths' '
>>> + cd "$cli" &&
>>> +
>>> + FILENAME="$(echo "a-ä_o-ö_u-ü.txt" | iconv -f utf-8 -t cp1252)" &&
>> ...
> Using file names and iconv like this may not be portable:
> - cp1252 may be called CP1252 (or may not be available)
"git grep 'cp[0-9]' t/" does tell us that we refrain from using them,
and I am sure portability worries are a big reason. Thank you
for pointing it out.
> - reading from stdin is not necessarily supported by iconv
"git grep '| iconv' t/" tells me that this is irrelevant; we already
heavily depend on it.
> - creating files in CP1252 may not be supported under Mac OS
> (Not sure about Windows)
The same as the first point, which is a good thing to worry about.
> One solution could be to use ISO-8859-1, convert into UTF-8,
> and "convert into UTF-8" one more time.
I do not quite get it; do you need to do anything more than just
replacing cp1252 with iso-8859-1 in the patch being discussed?
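
As background to this question, the characters used in the test file name happen to encode to the same bytes in cp1252 and ISO-8859-1; the two encodings differ only in the 0x80 to 0x9F range. A small check, with `same_bytes` as a hypothetical helper name:

```python
def same_bytes(text, enc_a="cp1252", enc_b="iso-8859-1"):
    """Return True if text encodes to identical bytes in both encodings
    (hypothetical helper; unencodable text counts as a mismatch)."""
    try:
        return text.encode(enc_a) == text.encode(enc_b)
    except UnicodeEncodeError:
        return False


print(same_bytes("a-\u00e4_o-\u00f6_u-\u00fc.txt"))  # umlauts from the test
print(same_bytes("\u20ac"))                          # euro sign, cp1252 only
```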