* [PATCH] git-p4: add "--path-encoding" option
@ 2015-08-31 15:40 larsxschneider
  2015-08-31 15:40 ` larsxschneider
  0 siblings, 1 reply; 5+ messages in thread
From: larsxschneider @ 2015-08-31 15:40 UTC (permalink / raw)
  To: git; +Cc: luke, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Hi,

I think I found a path encoding issue that appears when migrating P4 repositories containing path names created on Windows. I added a test case to demonstrate it. Character encoding is a complicated topic, so feedback is highly appreciated.

Thanks,
Lars

Lars Schneider (1):
  git-p4: add "--path-encoding" option

 Documentation/git-p4.txt        |  4 ++++
 git-p4.py                       |  6 ++++++
 t/t9821-git-p4-path-encoding.sh | 38 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 48 insertions(+)
 create mode 100755 t/t9821-git-p4-path-encoding.sh

--
2.5.1.1.g36ff854

* [PATCH] git-p4: add "--path-encoding" option
  2015-08-31 15:40 [PATCH] git-p4: add "--path-encoding" option larsxschneider
@ 2015-08-31 15:40 ` larsxschneider
  2015-08-31 17:40   ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: larsxschneider @ 2015-08-31 15:40 UTC (permalink / raw)
  To: git; +Cc: luke, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 Documentation/git-p4.txt        |  4 ++++
 git-p4.py                       |  6 ++++++
 t/t9821-git-p4-path-encoding.sh | 38 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 48 insertions(+)
 create mode 100755 t/t9821-git-p4-path-encoding.sh

diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
index 82aa5d6..98b6c0f 100644
--- a/Documentation/git-p4.txt
+++ b/Documentation/git-p4.txt
@@ -252,6 +252,10 @@ Git repository:
 	Use a client spec to find the list of interesting files in p4.
 	See the "CLIENT SPEC" section below.
 
+----path-encoding <encoding>::
+	The encoding to use when reading p4 client paths. With this option
+	non ASCII paths are properly stored in Git. For example, the encoding 'cp1252' is often used on Windows systems.
+
 -/ <path>::
 	Exclude selected depot paths when cloning or syncing.
 
diff --git a/git-p4.py b/git-p4.py
index 073f87b..2b3bfc4 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -1981,6 +1981,8 @@ class P4Sync(Command, P4UserMap):
                 optparse.make_option("--silent", dest="silent", action="store_true"),
                 optparse.make_option("--detect-labels", dest="detectLabels", action="store_true"),
                 optparse.make_option("--import-labels", dest="importLabels", action="store_true"),
+                optparse.make_option("--path-encoding", dest="pathEncoding", type="string",
+                                     help="Encoding to use for paths"),
                 optparse.make_option("--import-local", dest="importIntoRemotes", action="store_false",
                                      help="Import into refs/heads/ , not refs/remotes"),
                 optparse.make_option("--max-changes", dest="maxChanges",
@@ -2025,6 +2027,7 @@ class P4Sync(Command, P4UserMap):
         self.clientSpecDirs = None
         self.tempBranches = []
         self.tempBranchLocation = "git-p4-tmp"
+        self.pathEncoding = None
 
         if gitConfig("git-p4.syncFromOrigin") == "false":
             self.syncWithOrigin = False
@@ -2213,6 +2216,9 @@ class P4Sync(Command, P4UserMap):
             text = regexp.sub(r'$\1$', text)
             contents = [ text ]
 
+        if self.pathEncoding:
+            relPath = relPath.decode(self.pathEncoding).encode('utf8', 'replace')
+
         self.gitStream.write("M %s inline %s\n" % (git_mode, relPath))
 
         # total length...
diff --git a/t/t9821-git-p4-path-encoding.sh b/t/t9821-git-p4-path-encoding.sh
new file mode 100755
index 0000000..f6bb79c
--- /dev/null
+++ b/t/t9821-git-p4-path-encoding.sh
@@ -0,0 +1,38 @@
+#!/bin/sh
+
+test_description='Clone repositories with non ASCII paths'
+
+. ./lib-git-p4.sh
+
+test_expect_success 'start p4d' '
+	start_p4d
+'
+
+test_expect_success 'Create a repo containing cp1251 encoded paths' '
+	cd "$cli" &&
+
+	FILENAME="$(echo "a-ä_o-ö_u-ü.txt" | iconv -f utf-8 -t cp1252)" &&
+	>"$FILENAME" &&
+	p4 add "$FILENAME" &&
+	p4 submit -d "test"
+'
+
+test_expect_success 'Clone repo containing cp1251 encoded paths' '
+	git p4 clone --destination="$git" --path-encoding=cp1252 //depot &&
+	test_when_finished cleanup_git &&
+	(
+		cd "$git" &&
+		git init . &&
+		cat >expect <<-\EOF &&
+		"a-\303\244_o-\303\266_u-\303\274.txt"
+		EOF
+		git ls-files >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'kill p4d' '
+	kill_p4d
+'
+
+test_done
-- 
2.5.1.1.g36ff854
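
To illustrate the conversion that the git-p4.py hunk above performs, here is a minimal sketch in Python 3. The patch itself calls `decode`/`encode` on a Python 2 `str`; the helper name `to_utf8_path` and the sample bytes are assumptions made up for this illustration and are not part of the patch.

```python
def to_utf8_path(raw_path: bytes, path_encoding: str) -> bytes:
    """Re-encode a path from path_encoding to UTF-8 (hypothetical helper)."""
    # Decode the raw bytes using the encoding the p4 client wrote them in,
    # then encode the resulting text as UTF-8 for Git.
    return raw_path.decode(path_encoding).encode("utf-8")


# "a-ä_o-ö_u-ü.txt" as a cp1252 client would store it: one byte per umlaut.
cp1252_name = b"a-\xe4_o-\xf6_u-\xfc.txt"

utf8_name = to_utf8_path(cp1252_name, "cp1252")
print(utf8_name)  # b'a-\xc3\xa4_o-\xc3\xb6_u-\xc3\xbc.txt'
```

Each umlaut grows from one byte to two, which is why the path looks broken in Git when the bytes are passed through unconverted.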

* Re: [PATCH] git-p4: add "--path-encoding" option
  2015-08-31 15:40 ` larsxschneider
@ 2015-08-31 17:40   ` Junio C Hamano
  2015-08-31 19:22     ` Torsten Bögershausen
  0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2015-08-31 17:40 UTC (permalink / raw)
  To: larsxschneider; +Cc: git, luke

larsxschneider@gmail.com writes:

> From: Lars Schneider <larsxschneider@gmail.com>
>
> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
> ---
>  Documentation/git-p4.txt        |  4 ++++
>  git-p4.py                       |  6 ++++++
>  t/t9821-git-p4-path-encoding.sh | 38 ++++++++++++++++++++++++++++++++++++++
>  3 files changed, 48 insertions(+)
>  create mode 100755 t/t9821-git-p4-path-encoding.sh
>
> diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
> index 82aa5d6..98b6c0f 100644
> --- a/Documentation/git-p4.txt
> +++ b/Documentation/git-p4.txt
> @@ -252,6 +252,10 @@ Git repository:
>  	Use a client spec to find the list of interesting files in p4.
>  	See the "CLIENT SPEC" section below.
>  
> +----path-encoding <encoding>::
> +	The encoding to use when reading p4 client paths. With this option
> +	non ASCII paths are properly stored in Git. For example, the encoding 'cp1252' is often used on Windows systems.
> +

This line is overly long.  Let AsciiDoc wrap it upon output and keep
the source within a reasonable limit (see existing lines around the
new text to see what is considered reasonable).

Do I see too many dashes before the option name, by the way, or is
it my e-mail client tricking my eyes?

> diff --git a/t/t9821-git-p4-path-encoding.sh b/t/t9821-git-p4-path-encoding.sh
> new file mode 100755
> index 0000000..f6bb79c
> --- /dev/null
> +++ b/t/t9821-git-p4-path-encoding.sh
> @@ -0,0 +1,38 @@
> +#!/bin/sh
> +
> +test_description='Clone repositories with non ASCII paths'
> +
> +. ./lib-git-p4.sh
> +
> +test_expect_success 'start p4d' '
> +	start_p4d
> +'
> +
> +test_expect_success 'Create a repo containing cp1251 encoded paths' '
> +	cd "$cli" &&
> +
> +	FILENAME="$(echo "a-ä_o-ö_u-ü.txt" | iconv -f utf-8 -t cp1252)" &&

Hmm, we'd be better off not having a bare UTF-8 sequence in the
source like this, especially when you already have the same thing
backslash-escaped in the "expect" file below.  Perhaps

	NAME="a-\303\244_o-\303\266_u-\303\274.txt" &&

	UTF8=$(printf "$NAME") &&
	CP1252=$(printf "$NAME" | iconv -t cp1252) &&
	echo "\"$UTF8\"" >expect &&

	>"$CP1252" &&
	p4 add "$CP1252" &&
	...

or something along that line?
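
For readers unfamiliar with the octal form: the backslash escapes in the "expect" file are the UTF-8 bytes of the file name, quoted the way "git ls-files" prints non-ASCII paths by default. The sketch below (Python 3) reproduces that quoting in a simplified way. `quote_path` is a hypothetical helper written for this illustration; it is not Git's implementation and ignores Git's short escapes such as `\n` and `\t`.

```python
def quote_path(path: bytes) -> str:
    """Simplified C-style path quoting (hypothetical, not Git's actual code)."""
    out = []
    needs_quotes = False
    for byte in path:
        if byte in (0x22, 0x5C):
            # Double quote and backslash get a backslash prefix.
            out.append("\\" + chr(byte))
            needs_quotes = True
        elif byte < 0x20 or byte >= 0x7F:
            # Control and non-ASCII bytes become three-digit octal escapes.
            out.append("\\%03o" % byte)
            needs_quotes = True
        else:
            out.append(chr(byte))
    text = "".join(out)
    # Only paths that needed an escape are wrapped in double quotes.
    return '"' + text + '"' if needs_quotes else text


utf8_name = "a-ä_o-ö_u-ü.txt".encode("utf-8")
print(quote_path(utf8_name))  # "a-\303\244_o-\303\266_u-\303\274.txt"
```

The printed line matches the content of the "expect" file in the patch, so the same octal string can serve both as printf input and as the expected output.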

> +	>"$FILENAME" &&
> +	p4 add "$FILENAME" &&
> +	p4 submit -d "test"
> +'
> +
> +test_expect_success 'Clone repo containing cp1251 encoded paths' '
> +	git p4 clone --destination="$git" --path-encoding=cp1252 //depot &&
> +	test_when_finished cleanup_git &&
> +	(
> +		cd "$git" &&
> +		git init . &&
> +		cat >expect <<-\EOF &&
> +		"a-\303\244_o-\303\266_u-\303\274.txt"
> +		EOF
> +		git ls-files >actual &&
> +		test_cmp expect actual
> +	)
> +'
> +
> +test_expect_success 'kill p4d' '
> +	kill_p4d
> +'
> +
> +test_done

* Re: [PATCH] git-p4: add "--path-encoding" option
  2015-08-31 17:40   ` Junio C Hamano
@ 2015-08-31 19:22     ` Torsten Bögershausen
  2015-08-31 20:09       ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Torsten Bögershausen @ 2015-08-31 19:22 UTC (permalink / raw)
  To: Junio C Hamano, larsxschneider; +Cc: git, luke

On 2015-08-31 19.40, Junio C Hamano wrote:
> larsxschneider@gmail.com writes:

>> +test_expect_success 'Create a repo containing cp1251 encoded paths' '
>> +	cd "$cli" &&
>> +
>> +	FILENAME="$(echo "a-ä_o-ö_u-ü.txt" | iconv -f utf-8 -t cp1252)" &&
> 
> Hmm, we'd be better off not having a bare UTF-8 sequence in the
> source like this, especially when you already have the same thing
> backslash-escaped in the "expect" file below.  Perhaps
> 
> 	NAME="a-\303\244_o-\303\266_u-\303\274.txt" &&
> 
> 	UTF8=$(printf "$NAME") &&
>         CP1252=$(printf "$NAME" | iconv -t cp1252) &&
>         echo "\"$UTF8\"" >expect &&
> 
>         >"$CP1252" &&
>         p4 add "$CP1252" &&
>         ...
> 
Using file names and iconv like this may not be portable:
- cp1252 may be called CP1252 (or may not be available)
- reading from stdin is not necessarily supported by iconv
- creating files in CP1252 may not be supported under Mac OS
   (Not sure about Windows)


One solution could be to use ISO-8859-1, convert into UTF-8,
and "convert into UTF-8" one more time.

We can skip using iconv in the test case completely, and use
something like this:
(Fully untested)

UTF8=$(printf '\303\203\302\204')
NAME=$(printf '\303\204')

* Re: [PATCH] git-p4: add "--path-encoding" option
  2015-08-31 19:22     ` Torsten Bögershausen
@ 2015-08-31 20:09       ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2015-08-31 20:09 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: larsxschneider, git, luke

Torsten Bögershausen <tboegi@web.de> writes:

> On 2015-08-31 19.40, Junio C Hamano wrote:
>> larsxschneider@gmail.com writes:
>
>>> +test_expect_success 'Create a repo containing cp1251 encoded paths' '
>>> +	cd "$cli" &&
>>> +
>>> +	FILENAME="$(echo "a-ä_o-ö_u-ü.txt" | iconv -f utf-8 -t cp1252)" &&
>>  ...
> Using file names and iconv like this may not be portable:
> - cp1252 may be called CP1252 (or may not be available)

"git grep 'cp[0-9]' t/" does tell us that we refrain from using them
and I am sure portability worries are a big reason.  Thank you
for pointing it out.

> - reading from stdin is not necessarily supported by iconv

"git grep '| iconv' t/" tells me that this is irrelevant; we already
heavily depend on it.

> - creating files in CP1252 may not be supported under Mac OS
>    (Not sure about Windows)

The same as the first point, which is a good thing to worry about.

> One solution could be to use ISO-8859-1, convert into UTF-8,
> and "convert into UTF-8" one more time.

I do not quite get it; do you need to do anything more than just
replacing cp1252 with iso-8859-1 in the patch being discussed?
