All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/8] Antique UTF-8 filename support
@ 2009-05-12 22:50 Robin Rosenberg
  2009-05-12 22:50 ` [RFC 1/8] UTF helpers Robin Rosenberg
  0 siblings, 1 reply; 20+ messages in thread
From: Robin Rosenberg @ 2009-05-12 22:50 UTC (permalink / raw)
  To: git; +Cc: Robin Rosenberg

From: Robin Rosenberg <robin.rosenberg@gmail.com>

Since there is some interest in the topic, now, I can republish my old 2 ½ year old
patches so there is some real code to comment on. They apply on top of
6dcfa306f2b67b733a7eb2d7ded1bc9987809edb, For completness I send
all patches, but the interesing stuff is in patch 4 and 5. Beware of encoding
issues with the test cases.

They do not handle Windows UTF-16 at all, but I think that is just a matter of writing
windows specifc wrappers for the filename and directory handling routines.

Feel free to rewamp and steal ideas and add constructive criticism. Don't even 
think of cherry-picking and rebasing, It's careful handpicking with copy/paste at 
best, but mostly it's fuel for discussions.

I'd admit some parts are quite kludgy and probably slow. as I was primarily 
interested to see if it was even feasible, which it was. however there was simply
no interest, which meant there was no point in optimizing it. It was simply the
wrong problem at the time.

Disclaimer: A problem with this approach is that, although it does character
conversion, if you are on a non-UTF-8 locale it will not let you mange
any repository. That is basically impossible and hence not the goal. It does
help people with the same (or close) languages to cooperate without enforcing
a common encoding as long as stick to the common characters, i.e. the ones
that can be converted between the locales involved.

This is probably the most out-dated patch series ever. 

-- robin

Robin Rosenberg (8):
(mostly obsolete)
  UTF helpers
  Messages in locale.
  Extend tests to cover locale wrt to commit messages.

The interesing stuff (patch 4 & 5)
  UTF file names.
  Extend all tests to work on UTF-8 filenames.

old wip
  test of utf_locallinks
  Convert symlink dest in diff
  UTF-8 in non-SHA1-objects

 Makefile                            |    8 +-
 builtin-add.c                       |    5 +-
 builtin-cat-file.c                  |    6 +-
 builtin-checkout-index.c            |   46 +++-
 builtin-commit-tree.c               |    9 +-
 builtin-ls-files.c                  |   26 ++-
 builtin-ls-tree.c                   |   16 +-
 builtin-rev-parse.c                 |    7 +-
 builtin-update-index.c              |   18 +-
 builtin-write-tree.c                |    5 +-
 diff.c                              |  111 ++++++--
 dir.c                               |   22 +-
 git-commit.sh                       |    5 +
 git-compat-util.h                   |   43 +++
 git-rebase.sh                       |    1 +
 git.c                               |    9 +
 log-tree.c                          |    4 +-
 merge-index.c                       |   25 ++-
 read-cache.c                        |    8 +-
 refs.c                              |   11 +-
 setup.c                             |   28 ++-
 t/lib-read-tree-m-3way.sh           |   38 ++--
 t/t-utf-filenames.sh                |   95 +++++++
 t/t-utf-msg.sh                      |   43 +++
 t/t0000-basic.sh                    |  117 ++++----
 t/t0010-racy-git.sh                 |   10 +-
 t/t1000-read-tree-m-3way.sh         |  240 +++++++++---------
 t/t1001-read-tree-m-2way.sh         |   56 ++--
 t/t1020-subdirectory.sh             |   63 +++---
 t/t1100-commit-tree-options.sh      |   12 +-
 t/t1400-update-ref.sh               |   10 +-
 t/t2000-checkout-cache-clash.sh     |   18 +-
 t/t2001-checkout-cache-clash.sh     |   30 +-
 t/t2002-checkout-cache-u.sh         |    8 +-
 t/t2003-checkout-cache-mkdir.sh     |  118 ++++----
 t/t2004-checkout-cache-temp.sh      |  144 +++++-----
 t/t2100-update-cache-badpath.sh     |   48 ++--
 t/t2101-update-index-reupdate.sh    |   56 ++--
 t/t3000-ls-files-others.sh          |   36 ++--
 t/t3002-ls-files-dashpath.sh        |   24 +-
 t/t3010-ls-files-killed-modified.sh |  104 ++++----
 t/t3020-ls-files-error-unmatch.sh   |   10 +-
 t/t3100-ls-tree-restrict.sh         |  122 +++++-----
 t/t3101-ls-tree-dirname.sh          |   88 +++---
 t/t3400-rebase.sh                   |   18 +-
 t/t3401-rebase-partial.sh           |   24 +-
 t/t3402-rebase-merge.sh             |   17 +-
 t/t3403-rebase-skip.sh              |   10 +-
 t/t3500-cherry.sh                   |   26 +-
 t/t3600-rm.sh                       |   28 +-
 t/t3700-add.sh                      |   30 +-
 t/t4000-diff-format.sh              |   26 +-
 t/t4001-diff-rename.sh              |   20 +-
 t/t4002-diff-basic.sh               |  160 ++++++------
 t/t4003-diff-rename-1.sh            |   66 +++---
 t/t4004-diff-rename-symlink.sh      |   40 ++--
 t/t4005-diff-rename-2.sh            |   54 ++--
 t/t4006-diff-mode.sh                |   14 +-
 t/t4008-diff-break-rewrite.sh       |  100 ++++----
 t/t4009-diff-rename-4.sh            |   63 +++---
 t/t4011-diff-symlink.sh             |   38 ++--
 t/t4012-diff-binary.sh              |   16 +-
 t/t7301-rev-parse.sh                |   20 ++
 t/test-lib.sh                       |   13 +-
 test-utf.c                          |   61 +++++
 utf.c                               |  501 +++++++++++++++++++++++++++++++++++
 utf.h                               |   27 ++
 67 files changed, 2133 insertions(+), 1142 deletions(-)
 create mode 100755 t/t-utf-filenames.sh
 create mode 100755 t/t-utf-msg.sh
 create mode 100755 t/t7301-rev-parse.sh
 create mode 100644 test-utf.c
 create mode 100644 utf.c
 create mode 100644 utf.h

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC 1/8] UTF helpers
  2009-05-12 22:50 [RFC 0/8] Antique UTF-8 filename support Robin Rosenberg
@ 2009-05-12 22:50 ` Robin Rosenberg
  2009-05-12 22:50   ` [RFC 2/8] Messages in locale Robin Rosenberg
  2009-05-13  0:20   ` [RFC 1/8] UTF helpers Johannes Schindelin
  0 siblings, 2 replies; 20+ messages in thread
From: Robin Rosenberg @ 2009-05-12 22:50 UTC (permalink / raw)
  To: git; +Cc: Robin Rosenberg

---
 Makefile          |    8 ++-
 git-compat-util.h |    1 +
 git.c             |    9 +++
 t/test-lib.sh     |    4 +-
 test-utf.c        |   61 ++++++++++++++++
 utf.c             |  207 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 utf.h             |   27 +++++++
 7 files changed, 313 insertions(+), 4 deletions(-)
 create mode 100644 test-utf.c
 create mode 100644 utf.c
 create mode 100644 utf.h

diff --git a/Makefile b/Makefile
index 2d62efb..2d71f01 100644
--- a/Makefile
+++ b/Makefile
@@ -259,7 +259,7 @@ LIB_OBJS = \
 	object.o pack-check.o patch-delta.o path.o pkt-line.o sideband.o \
 	quote.o read-cache.o refs.o run-command.o dir.o object-refs.o \
 	server-info.o setup.o sha1_file.o sha1_name.o strbuf.o \
-	tag.o tree.o usage.o config.o environment.o ctype.o copy.o \
+	tag.o tree.o utf.o usage.o config.o environment.o ctype.o copy.o \
 	fetch-clone.o revision.o pager.o tree-walk.o xdiff-interface.o \
 	write_or_die.o trace.o list-objects.o grep.o \
 	alloc.o merge-file.o path-list.o help.o unpack-trees.o $(DIFF_OBJS) \
@@ -564,6 +564,9 @@ ifdef NO_ACCURATE_DIFF
 endif
 
 # Shell quote (do not use $(call) to accommodate ancient setups);
+ALL_CFLAGS += -DUTF8INTERNAL=1
+ALL_CFLAGS += -DDEBUG=1
+#ALL_CFLAGS += -DTEST=1
 
 SHA1_HEADER_SQ = $(subst ','\'',$(SHA1_HEADER))
 
@@ -811,6 +814,9 @@ export NO_SVN_TESTS
 test: all
 	$(MAKE) -C t/ all
 
+test-utf$X: test-utf.c ctype.o utf.o usage.o
+	$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) test-utf.c utf.c ctype.o usage.o
+
 test-date$X: test-date.c date.o ctype.o
 	$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) test-date.c date.o ctype.o
 
diff --git a/git-compat-util.h b/git-compat-util.h
index 0272d04..f83352b 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -25,6 +25,7 @@
 #include <netinet/in.h>
 #include <sys/types.h>
 #include <dirent.h>
+#include "utf.h"
 
 /* On most systems <limits.h> would have given us this, but
  * not on some systems (e.g. GNU/Hurd).
diff --git a/git.c b/git.c
index 6475847..bd4e726 100644
--- a/git.c
+++ b/git.c
@@ -272,6 +272,15 @@ static void handle_internal_command(int argc, const char **argv, char **envp)
 	};
 	int i;
 
+#ifdef DEBUG
+	if (debug()) {
+		fprintf(stderr,"GIT-");
+		for (i = 1; i<argc; ++i)
+			fprintf(stderr,"%s",argv[i]);
+		fprintf(stderr,"\n");
+	}
+#endif
+
 	/* Turn "git cmd --help" into "git help cmd" */
 	if (argc > 1 && !strcmp(argv[1], "--help")) {
 		argv[1] = argv[0];
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 07cb706..e8aefd8 100755
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -4,11 +4,9 @@
 #
 
 # For repeatability, reset the environment to known value.
-LANG=C
-LC_ALL=C
 PAGER=cat
 TZ=UTC
-export LANG LC_ALL PAGER TZ
+export PAGER TZ
 EDITOR=:
 VISUAL=:
 unset AUTHOR_DATE
diff --git a/test-utf.c b/test-utf.c
new file mode 100644
index 0000000..133eea0
--- /dev/null
+++ b/test-utf.c
@@ -0,0 +1,61 @@
+#include <stdio.h>
+#include <time.h>
+#include <assert.h>
+
+#include "cache.h"
+#include "utf.h"
+
+int main(int argc, char **argv)
+{
+	int i;
+
+#if 0
+	for (i = 1; i < argc; i++) {
+		char result1[100];
+		char result2[100];
+
+		utfcpy(result1, argv[i], strlen(argv[i])+1);
+		localcpy(result2, result1, strlen(result1)+1);
+
+		printf("%s -> %s -> %s\n", argv[i], result1, result2);
+	}
+	return 0;
+#endif
+
+#define test(name) case __LINE__: current_name=name; n++; printf("Testing case #%d: %s\n", n, current_name);
+#define end_test break;
+#define begin_suite() char *current_name=0; int n=1; for (i=0; i<1000; ++i) { switch(i) { 
+#define concats(a,b) #a #b
+
+#undef strcmp
+#define assertStringEquals(a,b) assert(#a #b && strcmp(a,b)==0)
+#define assertIntEquals(a,b) assert(#a #b && (a)==(b))
+
+#define end_suite() }}
+
+	begin_suite();
+
+	test("utfcpy") {
+	  char result[100];
+	  utfcpy(result,"Ändrad",7);
+	  assertStringEquals(result,"\303\204ndrad");
+	} end_test;
+
+	test("utflen") {
+	  int result=utflen("Ändrad",7);
+	  assertIntEquals(result,8);
+	} end_test;
+
+	test("localcpy") {
+	  char result[100];
+	  localcpy(result,"\303\204ndrad",8);
+	  assertStringEquals(result,"Ändrad");
+	} end_test;
+
+	test("locallen") {
+	  int result=locallen("\303\204ndrad",8);
+	  assertIntEquals(result,7);
+	} end_test;
+
+	end_suite();
+}
diff --git a/utf.c b/utf.c
new file mode 100644
index 0000000..eb430b2
--- /dev/null
+++ b/utf.c
@@ -0,0 +1,207 @@
+#undef UTF8INTERNAL
+
+#include <langinfo.h>
+#include <iconv.h>
+#include "cache.h"
+#include <locale.h>
+#include <stdarg.h>
+
+static iconv_t local_to_utf8 = (iconv_t)-1;
+static iconv_t utf8_to_local = (iconv_t)-1;
+static iconv_t utf8_to_utf8 = (iconv_t)-1;
+static int same = 0;
+
+#if TEST
+#define die printf
+#endif
+
+static void	initlocale()
+{
+#ifndef NO_ICONV
+	if (!same && local_to_utf8 == (iconv_t)-1) {
+		setlocale(LC_CTYPE, "");
+		char *local_encoding = nl_langinfo(CODESET);
+#ifdef DEBUG
+		if (debug()) fprintf(stderr,"encoding=%s\n", local_encoding);
+#endif
+		if (strcmp(local_encoding,"UTF-8") == 0) {
+			same = 1;
+			return;
+		}
+		local_to_utf8 = iconv_open("UTF-8",  local_encoding);
+		if (local_to_utf8 == (iconv_t)-1) {
+			die("cannot setup locale conversion from %s: %s", local_encoding, strerror(errno));
+		}
+#ifdef DEBUG
+		if (debug()) fprintf(stderr,"utf8_to_local = iconv_open(%s,UTF-8)\n",local_encoding);
+#endif
+		utf8_to_local = iconv_open(local_encoding,  "UTF-8");
+		if (utf8_to_local == (iconv_t)-1) {
+			die("cannot setup locale conversion from %s: %s", local_encoding, strerror(errno));
+		}
+
+		utf8_to_utf8 = iconv_open("UTF-8","UTF-8");
+		if (utf8_to_utf8 == (iconv_t)-1) {
+			die("cannot setup locale conversion from UTF-8 to UTF-8: %s",strerror(errno));
+		}
+	}
+#endif
+}
+
+int maybe_utf8(const char *local, size_t len)
+{
+  char *self = xcalloc(1,len+1);
+  char *selfp = self;
+  size_t outlen = len+1;
+  int ret = iconv(utf8_to_utf8, (char**)&local, &len, &selfp, &outlen);
+  free(self);
+  P(("maybelocal: %0.*s %s\n", len, local, ret!=-1 ? "yes" : "no"));
+  return ret != -1;
+}
+
+size_t utflen(const char *local, size_t locallen)
+{
+#ifndef NO_ICONV
+	initlocale();
+	if (same) {
+		return locallen;
+	}
+	if (maybe_utf8(local, locallen))
+		return locallen;
+
+	size_t outlen=locallen*6;
+	char *outbuf=xcalloc(outlen,1);
+	char *out=outbuf;
+	iconv(local_to_utf8, NULL, NULL, NULL, NULL);
+	const char *vlocal = local;
+	size_t vlocallen = locallen;
+	if (iconv(local_to_utf8,  (char**)&vlocal,  &vlocallen,  &out,  &outlen) == -1) {
+#if TEST
+		perror("failed");
+#endif
+		free(outbuf);
+		return locallen;
+	}
+	*out = 0;
+	free(outbuf);
+	return locallen*6 - outlen;
+#else
+	return locallen;
+#endif
+}
+
+/* Copy and transform */
+void utfcpy(char *to_utf, char *from_local, size_t localsize)
+{
+#ifdef DEBUG
+	char *a=to_utf,*b=from_local;
+#endif
+#ifndef NO_ICONV
+	initlocale();
+	if (same) {
+		memcpy(to_utf, from_local, localsize);
+		return;
+	}
+	if (maybe_utf8(from_local, localsize)) {
+		memcpy(to_utf, from_local, localsize);
+		return;
+	}
+
+	size_t outlen=localsize*6;
+	iconv(local_to_utf8, NULL, NULL, NULL, NULL);
+	char *vfrom_local = from_local;
+	char *vto_utf = to_utf;
+	size_t vlocalsize = localsize;
+	if (iconv(local_to_utf8,  &vfrom_local,  &vlocalsize,  &vto_utf,  &outlen) == -1) {
+		fprintf(stderr,"Failed to convert %0.*s to UTF\n", localsize, from_local);
+		memcpy(to_utf,  from_local,  localsize);
+	}
+#else
+	memcpy(to_utf, from_local, localsize);
+#endif
+#ifdef DEBUG
+	if (debug()) fprintf(stderr,"%0.*s ->UTF %0.*s\n", localsize, b, localsize*6 - outlen, a);
+#endif
+}
+
+size_t locallen(const char *utf, size_t utflen)
+{
+#ifndef NO_ICONV
+	initlocale();
+	if (same) {
+		return utflen;
+	}
+	char *outbuf=xcalloc(utflen*4,1); /* ??, can we be more specific? */
+	char *out=outbuf;
+	size_t outlen=utflen*4;
+	iconv(utf8_to_local, NULL, NULL, NULL, NULL);
+	char *vutf = utf;
+	size_t vutflen = utflen;
+	if (iconv(utf8_to_local,  (char**)&vutf,  &vutflen,  &out,  &outlen) == -1) {
+#ifdef DEBUG
+		perror("failed");
+#endif
+		free(outbuf);
+		return utflen;
+	}
+	*out = 0;
+	free(outbuf);
+	return utflen*4 - outlen; 	
+#else
+	return utflen;
+#endif
+}
+
+void localcpy(char *tolocal, char *fromutf, size_t utflen)
+{
+#ifdef DEBUG
+	char *a=tolocal,*b=fromutf;
+#endif
+	initlocale();
+	if (same) {
+		memcpy(tolocal, fromutf, utflen);
+		return;
+	}
+#ifndef NO_ICONV
+	iconv(utf8_to_local,  NULL,  NULL,  NULL,  NULL);
+	size_t outlen=utflen*4;
+	char *vfromutf = fromutf;
+	char *vtolocal = tolocal;
+	size_t vutflen = utflen;
+	if (iconv(utf8_to_local,  &vfromutf,  &vutflen,  &vtolocal,  &outlen) == -1) {
+		fprintf(stderr,"Failed to convert %0.*s to LOCAL\n", utflen, fromutf);
+		memcpy(tolocal, fromutf, utflen);
+	}
+#else
+	memcpy(tolocale, fromutf, utflen);
+#endif	
+#ifdef DEBUG
+	if (debug()) fprintf(stderr,"%0.*s ->LOCAL %0.*s\n", utflen, b, utflen*4-outlen, a);
+#endif
+}
+
+int PP(const char *fmt,...)
+{
+  va_list va;
+  va_start(va,fmt);
+  int ret=vfprintf(stderr,fmt,va);
+  va_end(va);
+  return ret;
+}
+
+int debugf=-1;
+
+int debug()
+{
+	if (debugf == -1) {
+		char *f = getenv("DEBUG");
+		if (!f) {
+			debugf = 0;
+		} else if (f[0] != 0) {
+			debugf = 1;
+		} else
+			debugf = 0;
+	}
+	return debugf == 1;
+}
+
diff --git a/utf.h b/utf.h
new file mode 100644
index 0000000..c6c6224
--- /dev/null
+++ b/utf.h
@@ -0,0 +1,27 @@
+#ifndef UTF_H
+#define UTF_H 1
+
+/** The number of octets 'local' would occupy encoded as utf8.
+ *  The input format is assumed to be local
+ */
+extern size_t utflen(const char *local,size_t locallen);
+extern size_t locallen(const char *utf,size_t utflen);
+
+/* Copy and transform */
+extern void utfcpy(char *toutf,char *fromlocal,size_t localen);
+
+/* Copy and transform */
+extern void localcpy(char *tolocal,char *fromutf,size_t utflen);
+
+#ifdef DEBUG
+#define D(x) do { if (debug()) fprintf(stderr,"%s:%d:%s\n",__FILE__,__LINE__,x); } while(0)
+#define P(x) do { if (debug()) { fprintf(stderr,"%s:%d:",__FILE__,__LINE__); PP x; } } while(0)
+int PP(const char *fmt,...);
+int debug();
+
+#else
+#define D(x)
+#define P(x)
+#endif
+
+#endif
-- 
1.6.3.dirty

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC 2/8] Messages in locale.
  2009-05-12 22:50 ` [RFC 1/8] UTF helpers Robin Rosenberg
@ 2009-05-12 22:50   ` Robin Rosenberg
  2009-05-12 22:50     ` [RFC 3/8] Extend tests to cover locale wrt to commit messages Robin Rosenberg
  2009-05-13  0:20   ` [RFC 1/8] UTF helpers Johannes Schindelin
  1 sibling, 1 reply; 20+ messages in thread
From: Robin Rosenberg @ 2009-05-12 22:50 UTC (permalink / raw)
  To: git; +Cc: Robin Rosenberg

---
 builtin-cat-file.c    |    6 +++++-
 builtin-commit-tree.c |    9 ++++++---
 git-rebase.sh         |    1 +
 log-tree.c            |    4 +++-
 refs.c                |   11 ++++++++---
 t/t-utf-msg.sh        |   43 +++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 66 insertions(+), 8 deletions(-)
 create mode 100755 t/t-utf-msg.sh

diff --git a/builtin-cat-file.c b/builtin-cat-file.c
index 6c16bfa..ff275bf 100644
--- a/builtin-cat-file.c
+++ b/builtin-cat-file.c
@@ -145,6 +145,10 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	if (!buf)
 		die("git-cat-file %s: bad file", argv[2]);
 
-	write_or_die(1, buf, size);
+	size_t localsize = locallen(buf,size);
+	char *localbuf = xcalloc(localsize+1,1);
+	localcpy(localbuf, buf, size+1);
+	write_or_die(1, localbuf, localsize);
+	free(localbuf);
 	return 0;
 }
diff --git a/builtin-commit-tree.c b/builtin-commit-tree.c
index e2e690a..8d87ec7 100644
--- a/builtin-commit-tree.c
+++ b/builtin-commit-tree.c
@@ -23,16 +23,19 @@ static void init_buffer(char **bufp, unsigned int *sizep)
 static void add_buffer(char **bufp, unsigned int *sizep, const char *fmt, ...)
 {
 	char one_line[2048];
+	char one_line_utf[2048];
 	va_list args;
-	int len;
+	int len,len_utf;
 	unsigned long alloc, size, newsize;
 	char *buf;
 
 	va_start(args, fmt);
 	len = vsnprintf(one_line, sizeof(one_line), fmt, args);
 	va_end(args);
+	utfcpy(one_line_utf, one_line, len + 1);
+	len_utf = strlen(one_line_utf);
 	size = *sizep;
-	newsize = size + len;
+	newsize = size + len_utf;
 	alloc = (size + 32767) & ~32767;
 	buf = *bufp;
 	if (newsize > alloc) {
@@ -41,7 +44,7 @@ static void add_buffer(char **bufp, unsigned int *sizep, const char *fmt, ...)
 		*bufp = buf;
 	}
 	*sizep = newsize;
-	memcpy(buf + size, one_line, len);
+	memcpy(buf + size, one_line_utf, len_utf);
 }
 
 static void check_valid(unsigned char *sha1, const char *expect)
diff --git a/git-rebase.sh b/git-rebase.sh
index 546fa44..939ac40 100755
--- a/git-rebase.sh
+++ b/git-rebase.sh
@@ -296,6 +296,7 @@ fi
 
 if test -z "$do_merge"
 then
+	LC_CTYPE=sv_SE.UTF-8 \
 	git-format-patch -k --stdout --full-index --ignore-if-in-upstream "$upstream"..ORIG_HEAD |
 	git am --binary -3 -k --resolvemsg="$RESOLVEMSG" \
 		--reflog-action=rebase
diff --git a/log-tree.c b/log-tree.c
index fbe1399..7c2564d 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -104,6 +104,7 @@ static int append_signoff(char *buf, int buf_sz, int at, const char *signoff)
 void show_log(struct rev_info *opt, const char *sep)
 {
 	static char this_header[16384];
+	static char this_header_local[16384];
 	struct log_info *log = opt->loginfo;
 	struct commit *commit = log->commit, *parent = log->parent;
 	int abbrev = opt->diffopt.abbrev;
@@ -217,7 +218,8 @@ void show_log(struct rev_info *opt, const char *sep)
 	if (opt->add_signoff)
 		len = append_signoff(this_header, sizeof(this_header), len,
 				     opt->add_signoff);
-	printf("%s%s%s", this_header, extra, sep);
+	localcpy(this_header_local, this_header, len+1);
+	printf("%s%s%s", this_header_local, extra, sep);
 }
 
 int log_tree_diff_flush(struct rev_info *opt)
diff --git a/refs.c b/refs.c
index 98327d7..cfe2704 100644
--- a/refs.c
+++ b/refs.c
@@ -363,8 +363,9 @@ static int log_ref_write(struct ref_lock *lock,
 	const unsigned char *sha1, const char *msg)
 {
 	int logfd, written, oflags = O_APPEND | O_WRONLY;
-	unsigned maxlen, len;
+	unsigned maxlen, len, len_utf;
 	char *logrec;
+	char *logrec_utf;
 	const char *committer;
 
 	if (log_all_ref_updates) {
@@ -400,10 +401,14 @@ static int log_ref_write(struct ref_lock *lock,
 			sha1_to_hex(sha1),
 			committer);
 	}
-	written = len <= maxlen ? write(logfd, logrec, len) : -1;
+	logrec_utf = xmalloc(len*6);
+	utfcpy(logrec_utf, logrec, len + 1);
+	len_utf = strlen(logrec_utf);
+	written = len_utf <= maxlen ? write(logfd, logrec_utf, len_utf) : -1;
 	free(logrec);
+	free(logrec_utf);
 	close(logfd);
-	if (written != len)
+	if (written != len_utf)
 		return error("Unable to append to %s", lock->log_file);
 	return 0;
 }
diff --git a/t/t-utf-msg.sh b/t/t-utf-msg.sh
new file mode 100755
index 0000000..727d497
--- /dev/null
+++ b/t/t-utf-msg.sh
@@ -0,0 +1,43 @@
+#!/bin/sh
+
+test_description='Test charset management.
+
+This assumes normal tests works fine
+and concentrates commit messages and other
+descriptive data.'
+
+. ./test-lib.sh
+
+export GIT_AUTHOR_NAME='Pär Nördsson'
+export GIT_COMMITTER_NAME='Pär Nördsson'
+export GIT_AUTHOR_DATE='Thu Sep 14 22:54:30 2006 +0000'
+export GIT_COMMITTER_DATE='Thu Sep 14 22:54:30 2006 +0000'
+
+test_expect_success \
+    'add simple text file' \
+    'echo hej >aland.txt &&
+    git-add aland.txt &&
+    git-commit -a -m "Ändrad" &&
+    echo test $(git-ls-files) = "aland.txt\"" &&
+    LC_CTYPE=sv_SE.UTF-8 echo test $(git-ls-files) = "aland.txt\""
+    '
+
+cat >>expected <<EOF
+commit 6905219c78beda5d5efd2a5fe4fbe0a8757bb355
+Author: Pär Nördsson <author@example.com>
+Date:   Thu Sep 14 22:54:30 2006 +0000
+
+    Ändrad
+EOF
+
+test_expect_success \
+    'log' \
+    '
+    git log >actual &&
+    diff -u actual expected
+    '
+
+# todo: git-cat-file commit xxxxxxxxxxxxx
+
+test_done
+
-- 
1.6.3.dirty

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC 3/8] Extend tests to cover locale wrt to commit messages.
  2009-05-12 22:50   ` [RFC 2/8] Messages in locale Robin Rosenberg
@ 2009-05-12 22:50     ` Robin Rosenberg
  2009-05-12 22:50       ` [RFC 4/8] UTF file names Robin Rosenberg
  0 siblings, 1 reply; 20+ messages in thread
From: Robin Rosenberg @ 2009-05-12 22:50 UTC (permalink / raw)
  To: git; +Cc: Robin Rosenberg

---
 t/t1100-commit-tree-options.sh |   12 ++++++------
 t/t1400-update-ref.sh          |   10 +++++-----
 t/t3400-rebase.sh              |    6 +++---
 t/t3401-rebase-partial.sh      |    8 ++++----
 t/t3402-rebase-merge.sh        |   17 ++++++++++-------
 t/t3403-rebase-skip.sh         |   10 +++++-----
 6 files changed, 33 insertions(+), 30 deletions(-)

diff --git a/t/t1100-commit-tree-options.sh b/t/t1100-commit-tree-options.sh
index 19a0ed4..04f2e3b 100755
--- a/t/t1100-commit-tree-options.sh
+++ b/t/t1100-commit-tree-options.sh
@@ -13,10 +13,10 @@ object by defining all environment variables that it understands.
 
 cat >expected <<EOF
 tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
-author Author Name <author@email> 1117148400 +0000
-committer Committer Name <committer@email> 1117150200 +0000
+author Authör Name <author@email> 1117148400 +0000
+committer Committer Näme <committer@email> 1117150200 +0000
 
-comment text
+cömment text
 EOF
 
 test_expect_success \
@@ -25,11 +25,11 @@ test_expect_success \
 
 test_expect_success \
     'construct commit' \
-    'echo comment text |
-     GIT_AUTHOR_NAME="Author Name" \
+    'echo cömment text |
+     GIT_AUTHOR_NAME="Authör Name" \
      GIT_AUTHOR_EMAIL="author@email" \
      GIT_AUTHOR_DATE="2005-05-26 23:00" \
-     GIT_COMMITTER_NAME="Committer Name" \
+     GIT_COMMITTER_NAME="Committer Näme" \
      GIT_COMMITTER_EMAIL="committer@email" \
      GIT_COMMITTER_DATE="2005-05-26 23:30" \
      TZ=GMT git-commit-tree `cat treeid` >commitid 2>/dev/null'
diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
index b3b920e..0daff8a 100755
--- a/t/t1400-update-ref.sh
+++ b/t/t1400-update-ref.sh
@@ -71,12 +71,12 @@ touch .git/logs/refs/heads/master
 test_expect_success \
 	"create $m (logged by touch)" \
 	'GIT_COMMITTER_DATE="2005-05-26 23:30" \
-	 git-update-ref HEAD '"$A"' -m "Initial Creation" &&
+	 git-update-ref HEAD '"$A"' -m "Initial Creation. /mäster" &&
 	 test '"$A"' = $(cat .git/'"$m"')'
 test_expect_success \
 	"update $m (logged by touch)" \
 	'GIT_COMMITTER_DATE="2005-05-26 23:31" \
-	 git-update-ref HEAD'" $B $A "'-m "Switch" &&
+	 git-update-ref HEAD'" $B $A "'-m "Switch /mäster" &&
 	 test '"$B"' = $(cat .git/'"$m"')'
 test_expect_success \
 	"set $m (logged by touch)" \
@@ -84,9 +84,9 @@ test_expect_success \
 	 git-update-ref HEAD'" $A &&
 	 test $A"' = $(cat .git/'"$m"')'
 
-cat >expect <<EOF
-$Z $A $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150200 +0000	Initial Creation
-$A $B $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150260 +0000	Switch
+iconv -f iso-8859-1 -t utf-8 >expect <<EOF
+$Z $A $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150200 +0000	Initial Creation. /mäster
+$A $B $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150260 +0000	Switch /mäster
 $B $A $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150860 +0000
 EOF
 test_expect_success \
diff --git a/t/t3400-rebase.sh b/t/t3400-rebase.sh
index b9d3131..2982615 100755
--- a/t/t3400-rebase.sh
+++ b/t/t3400-rebase.sh
@@ -15,15 +15,15 @@ test_expect_success \
     'prepare repository with topic branch, then rebase against master' \
     'echo First > A &&
      git-update-index --add A &&
-     git-commit -m "Add A." &&
+     git-commit -m "Add Ä." &&
      git checkout -b my-topic-branch &&
      echo Second > B &&
      git-update-index --add B &&
-     git-commit -m "Add B." &&
+     git-commit -m "Add ß." &&
      git checkout -f master &&
      echo Third >> A &&
      git-update-index A &&
-     git-commit -m "Modify A." &&
+     git-commit -m "Modify ¢." &&
      git checkout -f my-topic-branch &&
      git rebase master'
 
diff --git a/t/t3401-rebase-partial.sh b/t/t3401-rebase-partial.sh
index 8b19d3c..801b631 100755
--- a/t/t3401-rebase-partial.sh
+++ b/t/t3401-rebase-partial.sh
@@ -15,23 +15,23 @@ test_expect_success \
     'prepare repository with topic branch' \
     'echo First > A &&
      git-update-index --add A &&
-     git-commit -m "Add A." &&
+     git-commit -m "Add Ä." &&
 
      git-checkout -b my-topic-branch &&
 
      echo Second > B &&
      git-update-index --add B &&
-     git-commit -m "Add B." &&
+     git-commit -m "Add ß." &&
 
      echo AnotherSecond > C &&
      git-update-index --add C &&
-     git-commit -m "Add C." &&
+     git-commit -m "Add ¢." &&
 
      git-checkout -f master &&
 
      echo Third >> A &&
      git-update-index A &&
-     git-commit -m "Modify A."
+     git-commit -m "Modify Ä."
 '
 
 test_expect_success \
diff --git a/t/t3402-rebase-merge.sh b/t/t3402-rebase-merge.sh
index 0779aaa..8045995 100755
--- a/t/t3402-rebase-merge.sh
+++ b/t/t3402-rebase-merge.sh
@@ -7,8 +7,11 @@ test_description='git rebase --merge test'
 
 . ./test-lib.sh
 
-T="A quick brown fox
-jumps over the lazy dog."
+export GIT_AUTHOR_NAME="Pär Nördsson"
+
+T="A quick brown föx
+jumps over the läzy dog."
+
 for i in 1 2 3 4 5 6 7 8 9 10
 do
 	echo "$i $T"
@@ -19,24 +22,24 @@ test_expect_success setup '
 	git commit -m"initial" &&
 	git branch side &&
 	echo "11 $T" >>original &&
-	git commit -a -m"master updates a bit." &&
+	git commit -a -m"mäster updates a bit." &&
 
 	echo "12 $T" >>original &&
-	git commit -a -m"master updates a bit more." &&
+	git commit -a -m"mäster updates a bit more." &&
 
 	git checkout side &&
 	(echo "0 $T" ; cat original) >renamed &&
 	git add renamed &&
 	git update-index --force-remove original &&
-	git commit -a -m"side renames and edits." &&
+	git commit -a -m"side renames and ädits." &&
 
 	tr "[a-z]" "[A-Z]" <original >newfile &&
 	git add newfile &&
-	git commit -a -m"side edits further." &&
+	git commit -a -m"side edits fürther." &&
 
 	tr "[a-m]" "[A-M]" <original >newfile &&
 	rm -f original &&
-	git commit -a -m"side edits once again." &&
+	git commit -a -m"side edits önce again." &&
 
 	git branch test-rebase side &&
 	git branch test-rebase-pick side &&
diff --git a/t/t3403-rebase-skip.sh b/t/t3403-rebase-skip.sh
index 977c498..84f14fd 100755
--- a/t/t3403-rebase-skip.sh
+++ b/t/t3403-rebase-skip.sh
@@ -13,20 +13,20 @@ test_description='git rebase --merge --skip tests'
 test_expect_success setup '
 	echo hello > hello &&
 	git add hello &&
-	git commit -m "hello" &&
+	git commit -m "hällo" &&
 	git branch skip-reference &&
 
 	echo world >> hello &&
-	git commit -a -m "hello world" &&
+	git commit -a -m "hällo world" &&
 	echo goodbye >> hello &&
-	git commit -a -m "goodbye" &&
+	git commit -a -m "göödbye" &&
 
 	git checkout -f skip-reference &&
 	echo moo > hello &&
-	git commit -a -m "we should skip this" &&
+	git commit -a -m "we shöld skip this" &&
 	echo moo > cow &&
 	git add cow &&
-	git commit -m "this should not be skipped" &&
+	git commit -m "this shöd not be skipped" &&
 	git branch pre-rebase skip-reference &&
 	git branch skip-merge skip-reference
 	'
-- 
1.6.3.dirty

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC 4/8] UTF file names.
  2009-05-12 22:50     ` [RFC 3/8] Extend tests to cover locale wrt to commit messages Robin Rosenberg
@ 2009-05-12 22:50       ` Robin Rosenberg
       [not found]         ` <1242168631-30753-6-git-send-email-robin.rosenberg@dewire.com>
  0 siblings, 1 reply; 20+ messages in thread
From: Robin Rosenberg @ 2009-05-12 22:50 UTC (permalink / raw)
  To: git; +Cc: Robin Rosenberg

---
 builtin-add.c            |    5 +-
 builtin-checkout-index.c |   46 ++++++---
 builtin-ls-files.c       |   26 +++++-
 builtin-ls-tree.c        |   16 +++-
 builtin-rev-parse.c      |    7 +-
 builtin-update-index.c   |   18 +++-
 builtin-write-tree.c     |    5 +-
 diff.c                   |   97 ++++++++++++++------
 git-commit.sh            |    5 +
 git-compat-util.h        |   42 +++++++++
 merge-index.c            |   25 ++++--
 read-cache.c             |    8 +-
 setup.c                  |   28 +++++-
 t/t-utf-filenames.sh     |   95 +++++++++++++++++++
 utf.c                    |  230 ++++++++++++++++++++++++++++++++++++++++++++++
 15 files changed, 580 insertions(+), 73 deletions(-)
 create mode 100755 t/t-utf-filenames.sh

diff --git a/builtin-add.c b/builtin-add.c
index febb75e..62ea692 100644
--- a/builtin-add.c
+++ b/builtin-add.c
@@ -131,9 +131,10 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 		return 0;
 	}
 
-	for (i = 0; i < dir.nr; i++)
+	for (i = 0; i < dir.nr; i++) {
+	  P(("Adding '%s'\n", dir.entries[i]->name));
 		add_file_to_index(dir.entries[i]->name, verbose);
-
+	}
 	if (active_cache_changed) {
 		if (write_cache(newfd, active_cache, active_nr) ||
 		    close(newfd) || commit_lock_file(&lock_file))
diff --git a/builtin-checkout-index.c b/builtin-checkout-index.c
index b097c88..745bf9a 100644
--- a/builtin-checkout-index.c
+++ b/builtin-checkout-index.c
@@ -57,16 +57,22 @@ static void write_tempfile_record(const char *name, int prefix_length)
 		for (i = 1; i < 4; i++) {
 			if (i > 1)
 				putchar(' ');
-			if (topath[i][0])
-				fputs(topath[i], stdout);
-			else
+			if (topath[i][0]) {
+				char localbuf[MAXPATHLEN];
+				localcpy(localbuf, topath[i], strlen(topath[i])+1);
+				fputs(localbuf, stdout);
+			} else
 				putchar('.');
 		}
-	} else
-		fputs(topath[checkout_stage], stdout);
-
+	} else {
+		char localbuf[MAXPATHLEN];
+		localcpy(localbuf, topath[checkout_stage], strlen(topath[checkout_stage])+1);
+		fputs(localbuf, stdout);
+	}
 	putchar('\t');
-	write_name_quoted("", 0, name + prefix_length,
+	char localbuf[MAXPATHLEN];
+	localcpy(localbuf, name + prefix_length, strlen(name + prefix_length) + 1);
+	write_name_quoted("", 0, localbuf, 
 		line_termination, stdout);
 	putchar(line_termination);
 
@@ -77,8 +83,11 @@ static void write_tempfile_record(const char *name, int prefix_length)
 
 static int checkout_file(const char *name, int prefix_length)
 {
+	char utfname[MAXPATHLEN];
 	int namelen = strlen(name);
-	int pos = cache_name_pos(name, namelen);
+	utfcpy(utfname, name, namelen + 1);
+	int utfnamelen = strlen(utfname);
+	int pos = cache_name_pos(utfname, utfnamelen);
 	int has_same_name = 0;
 	int did_checkout = 0;
 	int errs = 0;
@@ -88,8 +97,8 @@ static int checkout_file(const char *name, int prefix_length)
 
 	while (pos < active_nr) {
 		struct cache_entry *ce = active_cache[pos];
-		if (ce_namelen(ce) != namelen ||
-		    memcmp(ce->name, name, namelen))
+		if (ce_namelen(ce) != utfnamelen ||
+		    memcmp(ce->name, utfname, utfnamelen))
 			break;
 		has_same_name = 1;
 		pos++;
@@ -224,7 +233,9 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
 			continue;
 		}
 		if (!strncmp(arg, "--prefix=", 9)) {
-			state.base_dir = arg+9;
+			static char utfbasedir[MAXPATHLEN];
+			utfcpy(utfbasedir, arg+9, strlen(arg+9)+1);
+			state.base_dir = utfbasedir;
 			state.base_dir_len = strlen(state.base_dir);
 			continue;
 		}
@@ -262,13 +273,16 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
 		const char *arg = argv[i];
 		const char *p;
 
+		char utfarg[MAXPATHLEN];
+		utfcpy(utfarg, arg, strlen(arg)+1);
+
 		if (all)
 			die("git-checkout-index: don't mix '--all' and explicit filenames");
 		if (read_from_stdin)
 			die("git-checkout-index: don't mix '--stdin' and explicit filenames");
-		p = prefix_path(prefix, prefix_length, arg);
+		p = prefix_path(prefix, prefix_length, utfarg);
 		checkout_file(p, prefix_length);
-		if (p < arg || p > arg + strlen(arg))
+		if (p < arg || p > arg + strlen(utfarg))
 			free((char*)p);
 	}
 
@@ -288,9 +302,11 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
 				path_name = unquote_c_style(buf.buf, NULL);
 			else
 				path_name = buf.buf;
-			p = prefix_path(prefix, prefix_length, path_name);
+			char utfpath_name[MAXPATHLEN];
+			utfcpy(utfpath_name, path_name, strlen(path_name)+1);
+			p = prefix_path(prefix, prefix_length, utfpath_name);
 			checkout_file(p, prefix_length);
-			if (p < path_name || p > path_name + strlen(path_name))
+			if (p < utfpath_name || p > utfpath_name + strlen(utfpath_name))
 				free((char *)p);
 			if (path_name != buf.buf)
 				free(path_name);
diff --git a/builtin-ls-files.c b/builtin-ls-files.c
index ad8c41e..babac34 100644
--- a/builtin-ls-files.c
+++ b/builtin-ls-files.c
@@ -65,6 +65,7 @@ static int match(const char **spec, char *ps_matched,
 			ps_matched++;
 		continue;
 	matched:
+		P(("matched(%s,%s)\n",m+len, filename+len));
 		if (ps_matched)
 			*ps_matched = 1;
 		return 1;
@@ -76,6 +77,9 @@ static void show_dir_entry(const char *tag, struct dir_entry *ent)
 {
 	int len = prefix_len;
 	int offset = prefix_offset;
+	char localpath[MAXPATHLEN];
+
+	P(("show_dir_entry(\"%s\",\"%s\"\n",tag,ent->name));
 
 	if (len >= ent->len)
 		die("git-ls-files: internal error - directory entry not superset of prefix");
@@ -84,7 +88,8 @@ static void show_dir_entry(const char *tag, struct dir_entry *ent)
 		return;
 
 	fputs(tag, stdout);
-	write_name_quoted("", 0, ent->name + offset, line_terminator, stdout);
+	localcpy(localpath, ent->name + offset, strlen(ent->name + offset) + 1);
+	write_name_quoted("", 0, localpath, line_terminator, stdout);
 	putchar(line_terminator);
 }
 
@@ -165,6 +170,8 @@ static void show_ce_entry(const char *tag, struct cache_entry *ce)
 	int len = prefix_len;
 	int offset = prefix_offset;
 
+	P(("show_ce_entry(\"%s\",\"%s\"\n",tag,ce->name));
+
 	if (len >= ce_namelen(ce))
 		die("git-ls-files: internal error - cache entry not superset of prefix");
 
@@ -189,19 +196,23 @@ static void show_ce_entry(const char *tag, struct cache_entry *ce)
 	}
 
 	if (!show_stage) {
+		char localpath[MAXPATHLEN];
 		fputs(tag, stdout);
-		write_name_quoted("", 0, ce->name + offset,
+		localcpy(localpath, ce->name + offset, strlen(ce->name + offset) + 1);
+		write_name_quoted("", 0, localpath,
 				  line_terminator, stdout);
 		putchar(line_terminator);
 	}
 	else {
+		char localpath[MAXPATHLEN];
 		printf("%s%06o %s %d\t",
 		       tag,
 		       ntohl(ce->ce_mode),
 		       abbrev ? find_unique_abbrev(ce->sha1,abbrev)
 				: sha1_to_hex(ce->sha1),
 		       ce_stage(ce));
-		write_name_quoted("", 0, ce->name + offset,
+		localcpy(localpath, ce->name + offset, strlen(ce->name + offset) + 1);
+		write_name_quoted("", 0, localpath,
 				  line_terminator, stdout);
 		putchar(line_terminator);
 	}
@@ -451,6 +462,12 @@ int cmd_ls_files(int argc, const char **argv, const char *prefix)
 
 	pathspec = get_pathspec(prefix, argv + i);
 
+#ifdef DEBUG
+ 	if (pathspec) {
+		P(("pathspec[0]=%s\n",pathspec[0]));
+		P(("pathspec[1]=%s\n",pathspec[1]));
+	}
+#endif
 	/* Verify that the pathspec matches the prefix */
 	if (pathspec)
 		prefix = verify_pathspec(prefix);
@@ -461,6 +478,7 @@ int cmd_ls_files(int argc, const char **argv, const char *prefix)
 		for (num = 0; pathspec[num]; num++)
 			;
 		ps_matched = xcalloc(1, num);
+		P(("allocated ps_matched, %d entries\n",num));
 	}
 
 	if (dir.show_ignored && !exc_given) {
@@ -485,12 +503,14 @@ int cmd_ls_files(int argc, const char **argv, const char *prefix)
 		 */
 		int num, errors = 0;
 		for (num = 0; pathspec[num]; num++) {
+		  	P(("ps_matched[%d]=%d\n",num,ps_matched[num]));
 			if (ps_matched[num])
 				continue;
 			error("pathspec '%s' did not match any.",
 			      pathspec[num] + prefix_offset);
 			errors++;
 		}
+		P(("return errors ? 1 : 0; => %d\n",errors ? 1 : 0));
 		return errors ? 1 : 0;
 	}
 
diff --git a/builtin-ls-tree.c b/builtin-ls-tree.c
index 201defd..4f53b0d 100644
--- a/builtin-ls-tree.c
+++ b/builtin-ls-tree.c
@@ -78,8 +78,14 @@ static int show_tree(const unsigned char *sha1, const char *base, int baselen,
 		printf("%06o %s %s\t", mode, type,
 				abbrev ? find_unique_abbrev(sha1,abbrev)
 					: sha1_to_hex(sha1));
-	write_name_quoted(base + chomp_prefix, baselen - chomp_prefix,
-			  pathname,
+	char localprefix[MAXPATHLEN];
+	int localprefixlen = locallen(base + chomp_prefix, baselen - chomp_prefix);
+	localcpy(localprefix, base + chomp_prefix, baselen - chomp_prefix);
+	
+	char localpathname[MAXPATHLEN];
+	localcpy(localpathname, pathname, strlen(pathname)+1);
+	write_name_quoted(localprefix, localprefixlen,
+			  localpathname,
 			  line_termination, stdout);
 	putchar(line_termination);
 	return retval;
@@ -147,6 +153,12 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 		die("Not a valid object name %s", argv[1]);
 
 	pathspec = get_pathspec(prefix, argv + 2);
+#ifdef DEBUG
+	if (pathspec) {
+	  P(("pathspec[0]=%s\n",pathspec[0]));
+	  P(("pathspec[1]=%s\n",pathspec[1]));
+	}
+#endif
 	tree = parse_tree_indirect(sha1);
 	if (!tree)
 		die("not a tree object");
diff --git a/builtin-rev-parse.c b/builtin-rev-parse.c
index fd3ccc8..da03906 100644
--- a/builtin-rev-parse.c
+++ b/builtin-rev-parse.c
@@ -315,8 +315,11 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 				continue;
 			}
 			if (!strcmp(arg, "--show-prefix")) {
-				if (prefix)
-					puts(prefix);
+				if (prefix) {
+					char localprefix[MAXPATHLEN];
+					localcpy(localprefix, prefix, strlen(prefix)+1);
+					puts(localprefix);
+				}
 				continue;
 			}
 			if (!strcmp(arg, "--show-cdup")) {
diff --git a/builtin-update-index.c b/builtin-update-index.c
index a3c0a45..cfea4cd 100644
--- a/builtin-update-index.c
+++ b/builtin-update-index.c
@@ -274,7 +274,9 @@ static void read_index_info(int line_termination)
 		else
 			path_name = ptr;
 
-		if (!verify_path(path_name)) {
+		char utfpath_name[MAXPATHLEN];
+		utfcpy(utfpath_name, path_name, strlen(path_name) + 1);
+		if (!verify_path(utfpath_name)) {
 			fprintf(stderr, "Ignoring path %s\n", path_name);
 			if (path_name != ptr)
 				free(path_name);
@@ -284,7 +286,7 @@ static void read_index_info(int line_termination)
 
 		if (!mode) {
 			/* mode == 0 means there is no such path -- remove */
-			if (remove_file_from_cache(path_name))
+			if (remove_file_from_cache(utfpath_name))
 				die("git-update-index: unable to remove %s",
 				    ptr);
 		}
@@ -294,7 +296,7 @@ static void read_index_info(int line_termination)
 			 * ptr[-41] is at the beginning of sha1
 			 */
 			ptr[-42] = ptr[-1] = 0;
-			if (add_cacheinfo(mode, sha1, path_name, stage))
+			if (add_cacheinfo(mode, sha1, utfpath_name, stage))
 				die("git-update-index: unable to update %s",
 				    path_name);
 		}
@@ -616,7 +618,9 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 				usage(update_index_usage);
 			die("unknown option %s", path);
 		}
-		update_one(path, prefix, prefix_length);
+		char utfpath[MAXPATHLEN];
+		utfcpy(utfpath, path, strlen(path) + 1);
+		update_one(utfpath, prefix, prefix_length);
 		if (set_executable_bit)
 			chmod_path(set_executable_bit, path);
 	}
@@ -633,11 +637,13 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 				path_name = unquote_c_style(buf.buf, NULL);
 			else
 				path_name = buf.buf;
-			p = prefix_path(prefix, prefix_length, path_name);
+			char utfpath_name[MAXPATHLEN];
+			utfcpy(utfpath_name, path_name, strlen(path_name) + 1);
+			p = prefix_path(prefix, prefix_length, utfpath_name);
 			update_one(p, NULL, 0);
 			if (set_executable_bit)
 				chmod_path(set_executable_bit, p);
-			if (p < path_name || p > path_name + strlen(path_name))
+			if (p < utfpath_name || p > utfpath_name + strlen(utfpath_name))
 				free((char*) p);
 			if (path_name != buf.buf)
 				free(path_name);
diff --git a/builtin-write-tree.c b/builtin-write-tree.c
index 50670dc..84c370f 100644
--- a/builtin-write-tree.c
+++ b/builtin-write-tree.c
@@ -80,7 +80,10 @@ int cmd_write_tree(int argc, const char **argv, const char *unused_prefix)
 	if (argc > 2)
 		die("too many options");
 
-	ret = write_tree(sha1, missing_ok, prefix);
+	char utfprefix[MAXPATHLEN];
+	if (prefix)
+	  utfcpy(utfprefix, prefix, strlen(prefix)+1);
+	ret = write_tree(sha1, missing_ok, prefix!=NULL ? utfprefix : NULL);
 	printf("%s\n", sha1_to_hex(sha1));
 
 	return ret;
diff --git a/diff.c b/diff.c
index 3315378..170ec5a 100644
--- a/diff.c
+++ b/diff.c
@@ -964,9 +964,14 @@ static void builtin_diff(const char *name_a,
 	char *a_one, *b_two;
 	const char *set = diff_get_color(o->color_diff, DIFF_METAINFO);
 	const char *reset = diff_get_color(o->color_diff, DIFF_RESET);
+	char localname_a[MAXPATHLEN];
+	char localname_b[MAXPATHLEN];
 
-	a_one = quote_two("a/", name_a);
-	b_two = quote_two("b/", name_b);
+	localcpy(localname_a, name_a, strlen(name_a) + 1);
+	localcpy(localname_b, name_b, strlen(name_b) + 1);
+
+	a_one = quote_two("a/", localname_a);
+	b_two = quote_two("b/", localname_b);
 	lbl[0] = DIFF_FILE_VALID(one) ? a_one : "/dev/null";
 	lbl[1] = DIFF_FILE_VALID(two) ? b_two : "/dev/null";
 	printf("%sdiff --git %s %s%s\n", set, a_one, b_two, reset);
@@ -995,7 +1000,7 @@ static void builtin_diff(const char *name_a,
 		if ((one->mode ^ two->mode) & S_IFMT)
 			goto free_ab_and_return;
 		if (complete_rewrite) {
-			emit_rewrite_diff(name_a, name_b, one, two);
+			emit_rewrite_diff(localname_a, localname_b, one, two);
 			goto free_ab_and_return;
 		}
 	}
@@ -1372,18 +1377,22 @@ static void prepare_temp_file(const char *name,
 	    work_tree_matches(name, one->sha1)) {
 		struct stat st;
 		if (lstat(name, &st) < 0) {
+			char localname[MAXPATHLEN];
 			if (errno == ENOENT)
 				goto not_a_valid_file;
-			die("stat(%s): %s", name, strerror(errno));
+			localcpy(localname, name, strlen(name) + 1);
+			die("stat(%s): %s", localname, strerror(errno));
 		}
 		if (S_ISLNK(st.st_mode)) {
 			int ret;
 			char buf[PATH_MAX + 1]; /* ought to be SYMLINK_MAX */
+			char localname[MAXPATHLEN];
+			localcpy(localname, name, strlen(name) + 1);
 			if (sizeof(buf) <= st.st_size)
-				die("symlink too long: %s", name);
+				die("symlink too long: %s", localname);
 			ret = readlink(name, buf, st.st_size);
 			if (ret < 0)
-				die("readlink(%s)", name);
+				die("readlink(%s)", localname);
 			prep_temp_blob(temp, buf, st.st_size,
 				       (one->sha1_valid ?
 					one->sha1 : null_sha1),
@@ -1522,7 +1531,9 @@ static void run_external_diff(const char *pgm,
 	retval = spawn_prog(pgm, spawn_arg);
 	remove_tempfile();
 	if (retval) {
-		fprintf(stderr, "external diff died, stopping at %s.\n", name);
+	  	char localname[MAXPATHLEN];
+	  	localcpy(localname, name, strlen(name) + 1);
+		fprintf(stderr, "external diff died, stopping at %s.\n", localname);
 		exit(1);
 	}
 }
@@ -1574,6 +1585,9 @@ static void run_diff(struct diff_filepair *p, struct diff_options *o)
 	char *name_munged, *other_munged;
 	int complete_rewrite = 0;
 	int len;
+	char localname[MAXPATHLEN];
+	char localotherbuf[MAXPATHLEN];
+	char *localother;
 
 	if (DIFF_PAIR_UNMERGED(p)) {
 		/* unmerged */
@@ -1583,8 +1597,14 @@ static void run_diff(struct diff_filepair *p, struct diff_options *o)
 
 	name = p->one->path;
 	other = (strcmp(name, p->two->path) ? p->two->path : NULL);
-	name_munged = quote_one(name);
-	other_munged = quote_one(other);
+	localcpy(localname, name, strlen(name) + 1);
+	if (other) {
+		localcpy(localotherbuf, other, strlen(other) + 1);
+		localother = localotherbuf;
+	} else
+		localother = NULL;
+	name_munged = quote_one(localname);
+	other_munged = quote_one(localother);
 	one = p->one; two = p->two;
 
 	diff_fill_sha1_info(one);
@@ -2093,12 +2113,15 @@ static void diff_flush_raw(struct diff_filepair *p,
 	const char *path_one, *path_two;
 	int inter_name_termination = '\t';
 	int line_termination = options->line_termination;
+	char localpath_one[MAXPATHLEN];
+	char localpath_two[MAXPATHLEN];
 
 	if (!line_termination)
 		inter_name_termination = 0;
-
-	path_one = p->one->path;
-	path_two = p->two->path;
+	localcpy(localpath_one, p->one->path, strlen(p->one->path) + 1);
+	localcpy(localpath_two, p->two->path, strlen(p->two->path) + 1);
+	path_one = localpath_one;
+	path_two = localpath_two;
 	if (line_termination) {
 		path_one = quote_one(path_one);
 		path_two = quote_one(path_two);
@@ -2135,20 +2158,22 @@ static void diff_flush_raw(struct diff_filepair *p,
 	if (two_paths)
 		printf("%c%s", inter_name_termination, path_two);
 	putchar(line_termination);
-	if (path_one != p->one->path)
+	if (path_one != localpath_one)
 		free((void*)path_one);
-	if (path_two != p->two->path)
+	if (path_two != localpath_two)
 		free((void*)path_two);
 }
 
 static void diff_flush_name(struct diff_filepair *p, int line_termination)
 {
-	char *path = p->two->path;
+	char localpath_two[MAXPATHLEN];
+	char *path = localpath_two;
+	localcpy(localpath_two, p->two->path, strlen(p->two->path) + 1);
 
 	if (line_termination)
-		path = quote_one(p->two->path);
+		path = quote_one(localpath_two);
 	printf("%s%c", path, line_termination);
-	if (p->two->path != path)
+	if (localpath_two != path)
 		free(path);
 }
 
@@ -2325,8 +2350,10 @@ static void diff_resolve_rename_copy(void)
 			/* This is a "no-change" entry and should not
 			 * happen anymore, but prepare for broken callers.
 			 */
+		        char localpath[MAXPATHLEN];
+			localcpy(localpath, p->one->path, strlen(p->one->path) + 1);
 			error("feeding unmodified %s to diffcore",
-			      p->one->path);
+			      localpath);
 			p->status = DIFF_STATUS_UNKNOWN;
 		}
 	}
@@ -2359,20 +2386,24 @@ static void flush_one_pair(struct diff_filepair *p, struct diff_options *opt)
 
 static void show_file_mode_name(const char *newdelete, struct diff_filespec *fs)
 {
+        char localpath[MAXPATHLEN];
+	localcpy(localpath, fs->path, strlen(fs->path) + 1);
 	if (fs->mode)
-		printf(" %s mode %06o %s\n", newdelete, fs->mode, fs->path);
+		printf(" %s mode %06o %s\n", newdelete, fs->mode, localpath);
 	else
-		printf(" %s %s\n", newdelete, fs->path);
+		printf(" %s %s\n", newdelete, localpath);
 }
 
 
 static void show_mode_change(struct diff_filepair *p, int show_name)
 {
 	if (p->one->mode && p->two->mode && p->one->mode != p->two->mode) {
-		if (show_name)
+	        if (show_name) {
+		 	char localpath[MAXPATHLEN];
+			localcpy(localpath, p->two->path, strlen(p->two->path) + 1);
 			printf(" mode change %06o => %06o %s\n",
-			       p->one->mode, p->two->mode, p->two->path);
-		else
+			       p->one->mode, p->two->mode, localpath);
+		} else
 			printf(" mode change %06o => %06o\n",
 			       p->one->mode, p->two->mode);
 	}
@@ -2381,10 +2412,16 @@ static void show_mode_change(struct diff_filepair *p, int show_name)
 static void show_rename_copy(const char *renamecopy, struct diff_filepair *p)
 {
 	const char *old, *new;
+	char localpath_one[MAXPATHLEN];
+	char localpath_two[MAXPATHLEN];
+	char localpath_old[MAXPATHLEN];
+	char localpath_new[MAXPATHLEN];
 
 	/* Find common prefix */
-	old = p->one->path;
-	new = p->two->path;
+	localcpy(localpath_old, p->one->path, strlen(p->one->path) + 1);
+	localcpy(localpath_new, p->two->path, strlen(p->two->path) + 1);
+	old = localpath_old;
+	new = localpath_new;
 	while (1) {
 		const char *slash_old, *slash_new;
 		slash_old = strchr(old, '/');
@@ -2400,13 +2437,15 @@ static void show_rename_copy(const char *renamecopy, struct diff_filepair *p)
 	/* p->one->path thru old is the common prefix, and old and new
 	 * through the end of names are renames
 	 */
+	localcpy(localpath_one, p->one->path, strlen(p->one->path) + 1);
+	localcpy(localpath_two, p->two->path, strlen(p->two->path) + 1);
 	if (old != p->one->path)
 		printf(" %s %.*s{%s => %s} (%d%%)\n", renamecopy,
-		       (int)(old - p->one->path), p->one->path,
+		       (int)(old - localpath_one), localpath_one,
 		       old, new, (int)(0.5 + p->score * 100.0/MAX_SCORE));
 	else
 		printf(" %s %s => %s (%d%%)\n", renamecopy,
-		       p->one->path, p->two->path,
+		       localpath_one, localpath_two,
 		       (int)(0.5 + p->score * 100.0/MAX_SCORE));
 	show_mode_change(p, 0);
 }
@@ -2428,7 +2467,9 @@ static void diff_summary(struct diff_filepair *p)
 		break;
 	default:
 		if (p->score) {
-			printf(" rewrite %s (%d%%)\n", p->two->path,
+		  	char localpath[MAXPATHLEN];
+			localcpy(localpath, p->two->path, strlen(p->two->path) + 1);
+			printf(" rewrite %s (%d%%)\n", localpath,
 				(int)(0.5 + p->score * 100.0/MAX_SCORE));
 			show_mode_change(p, 0);
 		} else	show_mode_change(p, 1);
diff --git a/git-commit.sh b/git-commit.sh
index 5b1cf85..e2c647a 100755
--- a/git-commit.sh
+++ b/git-commit.sh
@@ -1,4 +1,9 @@
 #!/bin/sh
+if grep -q "^xALL_CFLAGS += -DDEBUG=1" $(dirname $0)/Makefile
+then
+    set -x
+    PS4='$0:$LINENO:'
+fi
 #
 # Copyright (c) 2005 Linus Torvalds
 # Copyright (c) 2006 Junio C Hamano
diff --git a/git-compat-util.h b/git-compat-util.h
index f83352b..6a61026 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -188,4 +188,46 @@ static inline int sane_case(int x, int high)
 	return x;
 }
 
+#ifndef MAXPATHLEN
+#define MAXPATHLEN 256
+#endif
+
+#ifdef UTF8INTERNAL
+#ifdef NO_ICONV
+#error "NO_ICONV must NOT be set when UTF8INTERNAL is set"
+#endif
+extern int utf_lstat(const char *path,struct stat *buf);
+extern int utf_stat(const char *path,struct stat *buf);
+extern DIR *utf_opendir(const char *path);
+extern struct dirent* utf_readdir(DIR *dir);
+extern int utf_closedir(DIR *dir);
+extern FILE *utf_fopen(const char *path,char *mode);
+extern FILE *utf_freopen(const char *path,char *mode,FILE *stream);
+extern int utf_unlink(const char *path);
+extern int utf_rmdir(const char *path);
+extern int utf_open(const char *path,int flags, ...) __nonnull((1));
+extern int utf_access(const char *path, int mode);
+extern char *utf_getcwd(char *buf,int bufsize);
+extern int utf_creat(const char *path,int mode);
+extern int utf_mkdir(const char *path,int mode);
+extern ssize_t utf_readlink(const char *path,char *buf,size_t bufsiz);
+
+#define lstat(path,buf) utf_lstat(path,buf)
+#define stat(path,buf) utf_stat(path,buf)
+#define opendir(path) utf_opendir(path)
+#define readdir(dir) utf_readdir(dir)
+#define closedir(dir) utf_closedir(dir)
+#define fopen(name,mode) utf_fopen(name,mode)
+#define freopen(name,mode,stream) utf_freopen(name,mode,stream)
+#define unlink(name) utf_unlink(name)
+#define rmdir(name) utf_rmdir(name)
+//#define open(name,flags,mode) utf_open(name,flags,mode)
+#define open utf_open
+#define access(path,mode) utf_access(path,mode)
+#define getcwd(buf,bufsize) utf_getcwd(buf,bufsize)
+#define creat(path,mode) utf_creat(path,mode)
+#define mkdir(path,mode) utf_mkdir(path,mode)
+#define readlink(path,buf,bufsiz) utf_readlink(path,buf,bufsiz)
+#endif
+
 #endif
diff --git a/merge-index.c b/merge-index.c
index 646d090..8a98b49 100644
--- a/merge-index.c
+++ b/merge-index.c
@@ -17,14 +17,23 @@ static void run_program(void)
 	if (pid < 0)
 		die("unable to fork");
 	if (!pid) {
-		execlp(pgm, arguments[0],
-			    arguments[1],
-			    arguments[2],
-			    arguments[3],
-			    arguments[4],
-			    arguments[5],
-			    arguments[6],
-			    arguments[7],
+		char argbuf[8][MAXPATHLEN];
+		char* args[8];
+		int i;
+		for (i=0; i<8; ++i) {
+			if (arguments[i]) {
+			      args[i] = argbuf[i];
+			      localcpy(args[i], arguments[i], strlen(arguments[i]) + 1);
+			}
+		}
+		execlp(pgm, args[0],
+			    args[1],
+			    args[2],
+			    args[3],
+			    args[4],
+			    args[5],
+			    args[6],
+			    args[7],
 			    NULL);
 		die("unable to execute '%s'", pgm);
 	}
diff --git a/read-cache.c b/read-cache.c
index 97c3867..f7642ca 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -701,7 +701,9 @@ int refresh_cache(unsigned int flags)
 			i--;
 			if (allow_unmerged)
 				continue;
-			printf("%s: needs merge\n", ce->name);
+			char localname[MAXPATHLEN];
+			localcpy(localname, ce->name, strlen(ce->name) + 1);
+			printf("%s: needs merge\n", localname);
 			has_errors = 1;
 			continue;
 		}
@@ -721,7 +723,9 @@ int refresh_cache(unsigned int flags)
 			}
 			if (quiet)
 				continue;
-			printf("%s: needs update\n", ce->name);
+			char localname[MAXPATHLEN];
+			localcpy(localname, ce->name, strlen(ce->name) + 1);
+			printf("%s: needs update\n", localname);
 			has_errors = 1;
 			continue;
 		}
diff --git a/setup.c b/setup.c
index 9a46a58..9e500f1 100644
--- a/setup.c
+++ b/setup.c
@@ -43,6 +43,8 @@ const char *prefix_path(const char *prefix, int len, const char *path)
 		memcpy(n, prefix, len);
 		memcpy(n + len, path, speclen+1);
 		path = n;
+	} else {
+		path = xstrdup(path);
 	}
 	return path;
 }
@@ -107,6 +109,8 @@ void verify_non_filename(const char *prefix, const char *arg)
 
 const char **get_pathspec(const char *prefix, const char **pathspec)
 {
+	char utfprefixbuf[MAXPATHLEN];
+	char *utfprefix;
 	const char *entry = *pathspec;
 	const char **p;
 	int prefixlen;
@@ -114,20 +118,36 @@ const char **get_pathspec(const char *prefix, const char **pathspec)
 	if (!prefix && !entry)
 		return NULL;
 
+	if (prefix) {
+	  utfcpy(utfprefixbuf, prefix, strlen(prefix)+1);
+	  utfprefix = utfprefixbuf;
+	} else {
+	  utfprefix = NULL;
+	}
+
 	if (!entry) {
 		static const char *spec[2];
-		spec[0] = prefix;
+		spec[0] = xstrdup(utfprefix);
 		spec[1] = NULL;
 		return spec;
 	}
 
 	/* Otherwise we have to re-write the entries.. */
+	int n;
+	for (n=0; pathspec[n]; ++n)
+		;
+	char **ret = xcalloc(n+1,sizeof(char*));
+	char **r = ret;
 	p = pathspec;
-	prefixlen = prefix ? strlen(prefix) : 0;
+	prefixlen = utfprefix ? strlen(utfprefix) : 0;
 	do {
-		*p = prefix_path(prefix, prefixlen, entry);
+	        char utfentry[MAXPATHLEN];
+		utfcpy(utfentry, entry, strlen(entry)+1);
+		*r = prefix_path(utfprefix, prefixlen, utfentry);
+		P(("*r=%s\n",*r));
+		++r;
 	} while ((entry = *++p) != NULL);
-	return (const char **) pathspec;
+	return (const char**)ret;
 }
 
 /*
diff --git a/t/t-utf-filenames.sh b/t/t-utf-filenames.sh
new file mode 100755
index 0000000..0731086
--- /dev/null
+++ b/t/t-utf-filenames.sh
@@ -0,0 +1,95 @@
+#!/bin/sh
+
+test_description='Test charset management.
+
+This assumes normal tests works fine
+and concentrates on filenames with non-ascii
+characters.'
+
+. ./test-lib.sh
+
+test_expect_success \
+    'add simple text file' \
+    'mkdir å &&
+    echo hej >å/åland.txt &&
+    git-add å/åland.txt &&
+    git-commit -a -m "Changed" &&
+    echo test $(git-ls-files) = "\"\\345/\\345land.txt\"" &&
+    LC_CTYPE=sv_SE.UTF-8 echo test $(git-ls-files) = "\"\\345/\\345land.txt\""
+    '
+
+test_expect_success \
+    'change single text file, first time' \
+    'echo san >>å/åland.txt &&
+    git-commit -a -m "Changed again"
+    '
+test_expect_success \
+    'add simple binary file' \
+    'dd if=/dev/urandom of=å/åäö bs=1 count=312 &&
+    git-add å/åäö &&
+    git-commit -a -m "Changed" &&
+    git-ls-files &&
+    test "$(git-ls-files)" = "\"\\345/\\345land.txt\"
+\"\\345/\\345\\344\\366\""
+    '
+test_expect_success \
+    'change single text file, second time' \
+    'echo san >>å/åland.txt &&
+    git-commit -a -m "Changed igen"
+    '
+test_expect_success \
+    'change single binary file' \
+    'dd if=/dev/urandom of=å/åäö bs=1 count=312 &&
+    git-commit -a -m "Changed igen"
+    '
+
+test_expect_success \
+    'branch and merge, new file' \
+    '
+    git-tag -f HERE &&
+    git-checkout -b "gren1" &&
+    echo >å/öland.txt hej &&
+    git-add . &&
+    git-commit -a -m "Ändrad" &&
+    git-checkout master &&
+    git-pull . gren1 &&
+    test "$(git-ls-files)" = "\"\\345/\\345land.txt\"
+\"\\345/\\345\\344\\366\"
+\"\\345/\\366land.txt\""
+    '
+test_expect_success \
+    'merge old file' \
+    '
+    git-checkout gren1 &&
+    echo >å/öland.txt hejsan &&
+    git-commit -a -m "Ändrad" &&
+    git-checkout master &&
+    git-pull . gren1
+    test "$(git-ls-files)" = "\"\\345/\\345land.txt\"
+\"\\345/\\345\\344\\366\"
+\"\\345/\\366land.txt\""
+    '
+
+test_expect_success \
+    'merge in-tree file' \
+    '
+    echo >>å/öland.txt in master &&
+    git-commit -a -m "in master" && 
+    git-checkout gren1 &&
+    echo >å/öland.txt in branch &&
+    git-update-index å/öland.txt &&
+    git-checkout -m master
+    test "$(git-ls-files)" = "\"\\345/\\345land.txt\"
+\"\\345/\\345\\344\\366\"
+\"\\345/\\366land.txt\""
+    test $(echo $(wc -l <å/öland.txt)) = 6
+    '
+
+test_expect_success \
+    'clone to UTF' \
+    '
+    rm -rf ../trash2 &&
+    LC_ALL=sv_SE.UTF-8 LC_CTYPE=sv_SE.UTF-8 git-clone . ../trash2
+    '
+
+test_done
diff --git a/utf.c b/utf.c
index eb430b2..7c44cac 100644
--- a/utf.c
+++ b/utf.c
@@ -180,6 +180,236 @@ void localcpy(char *tolocal, char *fromutf, size_t utflen)
 #endif
 }
 
+#define MAXRESOURCES 50
+struct resource {
+  void *key;
+  void *value;
+};
+
+static struct resource resources[MAXRESOURCES];
+static void put(void *key, void *value)
+{
+  int i;
+  for (i=0; i<MAXRESOURCES; ++i) {
+    if (resources[i].key == 0) {
+      resources[i].key = key;
+      resources[i].value = value;
+      return;
+    }
+  }
+}
+
+static void* get(void *key)
+{
+  int i;
+  for (i=0; i<MAXRESOURCES; ++i) {
+    if (resources[i].key == key) {
+      return resources[i].value;
+    }
+  }
+  return NULL;
+}
+
+static void* getandremove(void *key)
+{
+  int i;
+  for (i=0; i<MAXRESOURCES; ++i) {
+    if (resources[i].key == key) {
+      resources[i].key = NULL;
+      return resources[i].value;
+    }
+  }
+  return NULL;
+}
+ 
+static void zfree(void *ret)
+{
+  if (ret)
+    free(ret);
+}
+
+int utf_lstat(char *path, struct stat *buf)
+{
+  P(("utf_lstat(\"%s\",buf)[",path));
+  char localpath[MAXPATHLEN];
+  localcpy(localpath, path, strlen(path)+1);
+  P(("\"%s\"]\n",localpath));
+  int ret = lstat(localpath, buf);
+  if (ret >= 0 && (buf->st_mode & S_IFMT) == S_IFLNK) {
+    char sympath[MAXPATHLEN];
+    int n = utf_readlink(path, sympath, sizeof sympath);
+    if (n < 0)
+      die("Panic, cannot read link %s in utf_lstat\n", path);
+    buf->st_size = strlen(sympath);
+  }
+  return ret;
+}
+
+int utf_stat(char *path, struct stat *buf)
+{
+  P(("utf_stat(\"%s\",buf)[",path));
+  char localpath[MAXPATHLEN];
+  localcpy(localpath, path, strlen(path)+1);
+  P(("\"%s\"]\n",localpath));
+  return stat(localpath, buf);
+}
+
+DIR *utf_opendir(char *path)
+{
+  P(("utf_opendir(\"%s\")\n",path));
+  char localpath[MAXPATHLEN];
+  DIR *ret = NULL;
+  localcpy(localpath, path, strlen(path)+1);
+  P(("\"%s\"]\n",localpath));
+  ret = opendir(localpath);
+  if (ret)
+    put(ret, NULL);
+  return ret;
+}
+
+struct dirent* utf_readdir(DIR *dir)
+{
+  P(("utf_readdir(\"%p\")",dir));
+  struct dirent *ret;
+  int len;
+  char utfpath[256];
+  struct dirent *myret;
+
+  zfree(getandremove(dir));
+  ret = readdir(dir);
+  if (ret != NULL) {
+    utfcpy(utfpath, ret->d_name, strlen(ret->d_name)+1);
+    len=sizeof(struct dirent)+strlen(utfpath)+1;
+    myret=malloc(len);
+    memcpy(myret, ret, sizeof (struct dirent));
+    put(dir, myret);
+    strcpy(myret->d_name, utfpath);
+    P(("=>\"%s\"\n",myret->d_name));
+    return myret;
+  } else {
+    return NULL;
+  }
+}
+
+int utf_closedir(DIR *dir)
+{
+  P(("utf_closedir(%p)\n",dir));
+  zfree(getandremove(dir));
+  return closedir(dir);
+}
+
+FILE *utf_fopen(char *path, char *mode)
+{
+  P(("utf_fopen(\"%s\",\"%s\")[",path,mode));
+  char localpath[MAXPATHLEN];
+  localcpy(localpath, path, strlen(path)+1);
+  P(("\"%s\"]\n",localpath));
+  return fopen(localpath, mode);
+} 
+
+FILE *utf_freopen(char *path, char *mode, FILE *stream)
+{
+  P(("utf_freopen(\"%s\",\"%s\",%p)[",path,mode,stream));
+  char localpath[MAXPATHLEN];
+  localcpy(localpath, path, strlen(path)+1);
+  P(("\"%s\"]\n",localpath));
+  return freopen(localpath, mode, stream);
+} 
+
+int utf_unlink(char *path)
+{
+  P(("utf_unlink(\"%s\")[",path));
+  char localpath[MAXPATHLEN];
+  localcpy(localpath, path, strlen(path)+1);
+  P(("\"%s\"]\n",localpath));
+  return unlink(localpath);
+}
+
+int utf_rmdir(char *path)
+{
+  P(("utf_rmdir(\"%s\")[",path));
+  char localpath[MAXPATHLEN];
+  localcpy(localpath, path, strlen(path)+1);
+  P(("\"%s\"]\n",localpath));
+  return rmdir(localpath);
+}
+
+int utf_open(char *path, int flags,...)
+{
+  va_list va;
+  int mode;
+  va_start(va,flags);
+  mode = va_arg(va,int);
+  va_end(va);
+  P(("utf_open(\"%s\",%d,%d)[",path,flags,mode));
+  char localpath[MAXPATHLEN];
+  localcpy(localpath, path, strlen(path)+1);
+  P(("\"%s\"]\n",localpath));
+  return open(localpath, flags, mode);
+}
+
+int utf_access(char *path, int mode)
+{
+  P(("utf_access(\"%s\",%d)[",path,mode));
+  char localpath[MAXPATHLEN];
+  localcpy(localpath, path, strlen(path)+1);
+  P(("\"%s\"]\n",localpath));
+  return access(localpath,mode);
+}
+
+char *utf_getcwd(char *buf,int bufsize)
+{
+  char localbuf[MAXPATHLEN];
+  char *ret=getcwd(localbuf,sizeof localbuf);
+  if (ret != NULL) {
+    if (buf == NULL) {
+      if (bufsize == 0) {
+	buf = malloc(bufsize);
+      } else {
+	buf = malloc(utflen(localbuf,strlen(localbuf)) + 1);
+      }
+    }
+    utfcpy(buf, localbuf, strlen(localbuf) + 1);
+    return buf;
+  } else {
+    return NULL;    
+  }
+}
+
+int utf_creat(const char *path,int mode)
+{
+  P(("utf_creat(\"%s\",%d)[",path,mode));
+  char localpath[MAXPATHLEN];
+  localcpy(localpath, path, strlen(path)+1);
+  P(("\"%s\"]\n",localpath));
+  return creat(localpath, mode);
+}
+
+int utf_mkdir(const char *path,int mode)
+{
+  P(("utf_mkdir(\"%s\",%d)[",path,mode));
+  char localpath[MAXPATHLEN];
+  localcpy(localpath, path, strlen(path)+1);
+  P(("\"%s\"]\n",localpath));
+  return mkdir(localpath, mode);
+}
+
+ssize_t utf_readlink(const char *path,char *buf,size_t bufsiz)
+{
+  P(("utf_readlink(\"%s\",BUF,%d)[",path,bufsiz));
+  char localpath[MAXPATHLEN];
+  localcpy(localpath, path, strlen(path)+1);
+  P(("readlink(%s)\n", localpath));
+  char localret[MAXPATHLEN];
+  ssize_t ret = readlink(localpath, localret, bufsiz);
+  if (ret == -1)
+	return ret;
+  localret[ret] = 0;
+  utfcpy(buf, localret, ret+1);
+  P(("\"%s\"]\n",buf));
+  return strlen(buf);
+}
+
 int PP(const char *fmt,...)
 {
   va_list va;
-- 
1.6.3.dirty

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC 6/8] test of utf_locallinks
       [not found]         ` <1242168631-30753-6-git-send-email-robin.rosenberg@dewire.com>
@ 2009-05-12 22:50           ` Robin Rosenberg
  2009-05-12 22:50             ` [RFC 7/8] Convert symlink dest in diff Robin Rosenberg
  0 siblings, 1 reply; 20+ messages in thread
From: Robin Rosenberg @ 2009-05-12 22:50 UTC (permalink / raw)
  To: git; +Cc: Robin Rosenberg

---
 t/t4004-diff-rename-symlink.sh |    2 +-
 t/t4011-diff-symlink.sh        |    8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/t/t4004-diff-rename-symlink.sh b/t/t4004-diff-rename-symlink.sh
index 663096e..b70d4ae 100755
--- a/t/t4004-diff-rename-symlink.sh
+++ b/t/t4004-diff-rename-symlink.sh
@@ -41,7 +41,7 @@ new file mode 120000
 --- /dev/null
 +++ "b/b\366zbar"
 @@ -0,0 +1 @@
-+xzzzyö
++xzzzyö
 \ No newline at end of file
 diff --git "a/fr\366tz" "b/nitf\366l"
 similarity index 100%
diff --git a/t/t4011-diff-symlink.sh b/t/t4011-diff-symlink.sh
index 0ee622f..b9a0cbf 100755
--- a/t/t4011-diff-symlink.sh
+++ b/t/t4011-diff-symlink.sh
@@ -16,7 +16,7 @@ index 0000000..7c465af
 --- /dev/null
 +++ "b/fr\366tz"
 @@ -0,0 +1 @@
-+xüzzü
++xüzzü
 \ No newline at end of file
 EOF
 
@@ -42,7 +42,7 @@ index 7c465af..0000000
 --- "a/fr\366tz"
 +++ /dev/null
 @@ -1 +0,0 @@
--xüzzü
+-xüzzü
 \ No newline at end of file
 EOF
 
@@ -69,9 +69,9 @@ index 7c465af..df1db54 120000
 --- "a/fr\366tz"
 +++ "b/fr\366tz"
 @@ -1 +1 @@
--xüzzü
+-xüzzü
 \ No newline at end of file
-+üxüuz
++üxüuz
 \ No newline at end of file
 EOF
 
-- 
1.6.3.dirty

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC 7/8] Convert symlink dest in diff
  2009-05-12 22:50           ` [RFC 6/8] test of utf_locallinks Robin Rosenberg
@ 2009-05-12 22:50             ` Robin Rosenberg
  2009-05-12 22:50               ` [RFC 8/8] UTF-8 in non-SHA1-objects Robin Rosenberg
  0 siblings, 1 reply; 20+ messages in thread
From: Robin Rosenberg @ 2009-05-12 22:50 UTC (permalink / raw)
  To: git; +Cc: Robin Rosenberg

---
 diff.c |   14 ++++++++++++--
 1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 170ec5a..c8132a4 100644
--- a/diff.c
+++ b/diff.c
@@ -1319,8 +1319,18 @@ int diff_populate_filespec(struct diff_filespec *s, int size_only)
 				locate_size_cache(s->sha1, 0, s->size);
 		}
 		else {
-			s->data = read_sha1_file(s->sha1, type, &s->size);
-			s->should_free = 1;
+			if (S_ISLNK(s->mode)) {
+				int linksize;
+			        char *linkdata = read_sha1_file(s->sha1, type, &linksize);
+				s->size = locallen(linkdata, linksize);
+				s->data = xmalloc(s->size + 1);
+				localcpy(s->data, linkdata, linksize + 1);
+				s->should_free = 1;
+				free(linkdata);
+			} else {
+				s->data = read_sha1_file(s->sha1, type, &s->size);
+				s->should_free = 1;
+			}
 		}
 	}
 	return 0;
-- 
1.6.3.dirty

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC 8/8] UTF-8 in non-SHA1-objects
  2009-05-12 22:50             ` [RFC 7/8] Convert symlink dest in diff Robin Rosenberg
@ 2009-05-12 22:50               ` Robin Rosenberg
  0 siblings, 0 replies; 20+ messages in thread
From: Robin Rosenberg @ 2009-05-12 22:50 UTC (permalink / raw)
  To: git; +Cc: Robin Rosenberg

---
 dir.c |   22 ++++++++++++++------
 utf.c |   66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 80 insertions(+), 8 deletions(-)

diff --git a/dir.c b/dir.c
index 96389b3..043df50 100644
--- a/dir.c
+++ b/dir.c
@@ -114,22 +114,26 @@ static int add_excludes_from_file_1(const char *fname,
 {
 	struct stat st;
 	int fd, i;
-	long size;
-	char *buf, *entry;
+	long insize, size;
+	char *inbuf, *entry, *buf;
 
 	fd = open(fname, O_RDONLY);
 	if (fd < 0 || fstat(fd, &st) < 0)
 		goto err;
-	size = st.st_size;
-	if (size == 0) {
+	insize = st.st_size;
+	if (insize == 0) {
 		close(fd);
 		return 0;
 	}
-	buf = xmalloc(size+1);
-	if (read(fd, buf, size) != size)
+	inbuf = xmalloc(insize+1);
+	if (read(fd, inbuf, insize) != insize)
 		goto err;
 	close(fd);
 
+	size = utflen(inbuf, insize+1);
+	buf = xmalloc(size);
+	utfcpy(buf, inbuf, insize+1);
+
 	buf[size++] = '\n';
 	entry = buf;
 	for (i = 0; i < size; i++) {
@@ -141,9 +145,11 @@ static int add_excludes_from_file_1(const char *fname,
 			entry = buf + i + 1;
 		}
 	}
+	free(inbuf);
 	return 0;
 
  err:
+	// bug: not freeing inbuf...
 	if (0 <= fd)
 		close(fd);
 	return -1;
@@ -185,6 +191,7 @@ static int excluded_1(const char *pathname,
 		      int pathlen,
 		      struct exclude_list *el)
 {
+	P(("excluded_1(\"%0.*s\") = ", pathlen, pathname));
 	int i;
 
 	if (el->nr) {
@@ -202,6 +209,7 @@ static int excluded_1(const char *pathname,
 				/* match basename */
 				const char *basename = strrchr(pathname, '/');
 				basename = (basename) ? basename+1 : pathname;
+				P(("fnmatch(\"%s\",\"%s\")", exclude, basename));
 				if (fnmatch(exclude, basename, 0) == 0)
 					return to_exclude;
 			}
@@ -218,7 +226,7 @@ static int excluded_1(const char *pathname,
 				    (baselen && pathname[baselen-1] != '/') ||
 				    strncmp(pathname, x->base, baselen))
 				    continue;
-
+				P(("fnmatch(\"%s\",\"%s\")", exclude, pathname+baselen));
 				if (fnmatch(exclude, pathname+baselen,
 					    FNM_PATHNAME) == 0)
 					return to_exclude;
diff --git a/utf.c b/utf.c
index 7c44cac..cca64dc 100644
--- a/utf.c
+++ b/utf.c
@@ -228,9 +228,28 @@ static void zfree(void *ret)
     free(ret);
 }
 
+int isgitpath(char *path)
+{
+  char *gitdir=getenv("GIT_DIR");
+  if (!gitdir)
+    gitdir=".git";
+
+  P(("gitdir=%s\n",gitdir));
+  if (strncmp(path, gitdir, strlen(gitdir)) == 0) {
+    P(("gitdir:YES\n"));
+    return 1;
+  } else {
+    P(("gitdir:NO\n"));
+  }
+  
+  return 0;
+}
+
 int utf_lstat(char *path, struct stat *buf)
 {
   P(("utf_lstat(\"%s\",buf)[",path));
+  if (isgitpath(path))
+    return lstat(path, buf);
   char localpath[MAXPATHLEN];
   localcpy(localpath, path, strlen(path)+1);
   P(("\"%s\"]\n",localpath));
@@ -248,6 +267,8 @@ int utf_lstat(char *path, struct stat *buf)
 int utf_stat(char *path, struct stat *buf)
 {
   P(("utf_stat(\"%s\",buf)[",path));
+  if (isgitpath(path))
+    return stat(path, buf);
   char localpath[MAXPATHLEN];
   localcpy(localpath, path, strlen(path)+1);
   P(("\"%s\"]\n",localpath));
@@ -257,19 +278,27 @@ int utf_stat(char *path, struct stat *buf)
 DIR *utf_opendir(char *path)
 {
   P(("utf_opendir(\"%s\")\n",path));
+  if (isgitpath(path)) {
+    P(("not transforming dir"));
+    return opendir(path);
+  }
   char localpath[MAXPATHLEN];
   DIR *ret = NULL;
   localcpy(localpath, path, strlen(path)+1);
   P(("\"%s\"]\n",localpath));
   ret = opendir(localpath);
   if (ret)
-    put(ret, NULL);
+    put(ret, xmalloc(1));
   return ret;
 }
 
 struct dirent* utf_readdir(DIR *dir)
 {
   P(("utf_readdir(\"%p\")",dir));
+  if (!get(dir)) {
+    P(("Skipping\n"));
+    return readdir(dir);
+  }
   struct dirent *ret;
   int len;
   char utfpath[256];
@@ -294,6 +323,9 @@ struct dirent* utf_readdir(DIR *dir)
 int utf_closedir(DIR *dir)
 {
   P(("utf_closedir(%p)\n",dir));
+  if (!get(dir))
+    return closedir(dir);
+
   zfree(getandremove(dir));
   return closedir(dir);
 }
@@ -301,6 +333,9 @@ int utf_closedir(DIR *dir)
 FILE *utf_fopen(char *path, char *mode)
 {
   P(("utf_fopen(\"%s\",\"%s\")[",path,mode));
+  if (isgitpath(path))
+    return fopen(path,mode);
+
   char localpath[MAXPATHLEN];
   localcpy(localpath, path, strlen(path)+1);
   P(("\"%s\"]\n",localpath));
@@ -310,6 +345,9 @@ FILE *utf_fopen(char *path, char *mode)
 FILE *utf_freopen(char *path, char *mode, FILE *stream)
 {
   P(("utf_freopen(\"%s\",\"%s\",%p)[",path,mode,stream));
+  if (isgitpath(path))
+    return utf_freopen(path, mode, stream);
+
   char localpath[MAXPATHLEN];
   localcpy(localpath, path, strlen(path)+1);
   P(("\"%s\"]\n",localpath));
@@ -319,6 +357,9 @@ FILE *utf_freopen(char *path, char *mode, FILE *stream)
 int utf_unlink(char *path)
 {
   P(("utf_unlink(\"%s\")[",path));
+  if (isgitpath(path))
+    return unlink(path);
+
   char localpath[MAXPATHLEN];
   localcpy(localpath, path, strlen(path)+1);
   P(("\"%s\"]\n",localpath));
@@ -328,6 +369,9 @@ int utf_unlink(char *path)
 int utf_rmdir(char *path)
 {
   P(("utf_rmdir(\"%s\")[",path));
+  if (isgitpath(path))
+    return rmdir(path);
+
   char localpath[MAXPATHLEN];
   localcpy(localpath, path, strlen(path)+1);
   P(("\"%s\"]\n",localpath));
@@ -342,6 +386,9 @@ int utf_open(char *path, int flags,...)
   mode = va_arg(va,int);
   va_end(va);
   P(("utf_open(\"%s\",%d,%d)[",path,flags,mode));
+  if (isgitpath(path))
+    return open(path, flags, mode);
+
   char localpath[MAXPATHLEN];
   localcpy(localpath, path, strlen(path)+1);
   P(("\"%s\"]\n",localpath));
@@ -351,6 +398,9 @@ int utf_open(char *path, int flags,...)
 int utf_access(char *path, int mode)
 {
   P(("utf_access(\"%s\",%d)[",path,mode));
+  if (isgitpath(path))
+    return access(path, mode);
+
   char localpath[MAXPATHLEN];
   localcpy(localpath, path, strlen(path)+1);
   P(("\"%s\"]\n",localpath));
@@ -369,6 +419,11 @@ char *utf_getcwd(char *buf,int bufsize)
 	buf = malloc(utflen(localbuf,strlen(localbuf)) + 1);
       }
     }
+    if (isgitpath(localbuf)) {
+      strcpy(buf, localbuf);
+      return buf;
+    }
+
     utfcpy(buf, localbuf, strlen(localbuf) + 1);
     return buf;
   } else {
@@ -379,6 +434,9 @@ char *utf_getcwd(char *buf,int bufsize)
 int utf_creat(const char *path,int mode)
 {
   P(("utf_creat(\"%s\",%d)[",path,mode));
+  if (isgitpath(path))
+    return creat(path, mode);
+
   char localpath[MAXPATHLEN];
   localcpy(localpath, path, strlen(path)+1);
   P(("\"%s\"]\n",localpath));
@@ -388,6 +446,9 @@ int utf_creat(const char *path,int mode)
 int utf_mkdir(const char *path,int mode)
 {
   P(("utf_mkdir(\"%s\",%d)[",path,mode));
+  if (isgitpath(path))
+    return mkdir(path,mode);
+
   char localpath[MAXPATHLEN];
   localcpy(localpath, path, strlen(path)+1);
   P(("\"%s\"]\n",localpath));
@@ -397,6 +458,9 @@ int utf_mkdir(const char *path,int mode)
 ssize_t utf_readlink(const char *path,char *buf,size_t bufsiz)
 {
   P(("utf_readlink(\"%s\",BUF,%d)[",path,bufsiz));
+  if (isgitpath(path))
+    return readlink(path, buf, bufsiz);
+
   char localpath[MAXPATHLEN];
   localcpy(localpath, path, strlen(path)+1);
   P(("readlink(%s)\n", localpath));
-- 
1.6.3.dirty

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [RFC 1/8] UTF helpers
  2009-05-12 22:50 ` [RFC 1/8] UTF helpers Robin Rosenberg
  2009-05-12 22:50   ` [RFC 2/8] Messages in locale Robin Rosenberg
@ 2009-05-13  0:20   ` Johannes Schindelin
  2009-05-13  5:24     ` Robin Rosenberg
  1 sibling, 1 reply; 20+ messages in thread
From: Johannes Schindelin @ 2009-05-13  0:20 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: TEXT/PLAIN; charset=X-UNKNOWN, Size: 6323 bytes --]

Hi,

On Wed, 13 May 2009, Robin Rosenberg wrote:

> ---

No SOB.

> diff --git a/git.c b/git.c
> index 6475847..bd4e726 100644
> --- a/git.c
> +++ b/git.c
> @@ -272,6 +272,15 @@ static void handle_internal_command(int argc, const char **argv, char **envp)
>  	};
>  	int i;
>  
> +#ifdef DEBUG
> +	if (debug()) {
> +		fprintf(stderr,"GIT-");
> +		for (i = 1; i<argc; ++i)
> +			fprintf(stderr,"%s",argv[i]);
> +		fprintf(stderr,"\n");
> +	}
> +#endif
> +

What does that have to do with UTF support?

> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 07cb706..e8aefd8 100755
> --- a/t/test-lib.sh
> +++  b/t/test-lib.sh
> @@ -4,11 +4,9 @@
>  #
>  
>  # For repeatability, reset the environment to known value.
> -LANG=C
> -LC_ALL=C
>  PAGER=cat
>  TZ=UTC
> -export LANG LC_ALL PAGER TZ
> +export PAGER TZ
>  EDITOR=:
>  VISUAL=:
>  unset AUTHOR_DATE

Likewise.

> diff --git a/test-utf.c b/test-utf.c
> new file mode 100644
> index 0000000..133eea0
> --- /dev/null
> +++ b/test-utf.c
> @@ -0,0 +1,61 @@
> +#include <stdio.h>
> +#include <time.h>
> +#include <assert.h>
> +
> +#include "cache.h"
> +#include "utf.h"
> +
> +int main(int argc, char **argv)
> +{
> +	int i;
> +
> +#if 0
> +	for (i = 1; i < argc; i++) {
> +		char result1[100];
> +		char result2[100];
> +
> +		utfcpy(result1, argv[i], strlen(argv[i])+1);
> +		localcpy(result2, result1, strlen(result1)+1);
> +
> +		printf("%s -> %s -> %s\n", argv[i], result1, result2);
> +	}
> +	return 0;
> +#endif
> +
> +#define test(name) case __LINE__: current_name=name; n++; printf("Testing case #%d: %s\n", n, current_name);
> +#define end_test break;
> +#define begin_suite() char *current_name=0; int n=1; for (i=0; i<1000; ++i) { switch(i) { 
> +#define concats(a,b) #a #b
> +
> +#undef strcmp
> +#define assertStringEquals(a,b) assert(#a #b && strcmp(a,b)==0)
> +#define assertIntEquals(a,b) assert(#a #b && (a)==(b))
> +
> +#define end_suite() }}
> +
> +	begin_suite();
> +
> +	test("utfcpy") {
> +	  char result[100];
> +	  utfcpy(result,"?ndrad",7);
> +	  assertStringEquals(result,"\303\204ndrad");
> +	} end_test;
> +
> +	test("utflen") {
> +	  int result=utflen("?ndrad",7);
> +	  assertIntEquals(result,8);
> +	} end_test;
> +
> +	test("localcpy") {
> +	  char result[100];
> +	  localcpy(result,"\303\204ndrad",8);
> +	  assertStringEquals(result,"?ndrad");
> +	} end_test;
> +
> +	test("locallen") {
> +	  int result=locallen("\303\204ndrad",8);
> +	  assertIntEquals(result,7);
> +	} end_test;
> +
> +	end_suite();
> +}

Should the test-utf binary not rather perform _actions_ (i.e. 
transformations) instead of checks?

> diff --git a/utf.c b/utf.c
> new file mode 100644
> index 0000000..eb430b2
> --- /dev/null
> +++ b/utf.c
> @@ -0,0 +1,207 @@
> +#undef UTF8INTERNAL
> +
> +#include <langinfo.h>
> +#include <iconv.h>
> +#include "cache.h"
> +#include <locale.h>
> +#include <stdarg.h>
> +
> +static iconv_t local_to_utf8 = (iconv_t)-1;
> +static iconv_t utf8_to_local = (iconv_t)-1;
> +static iconv_t utf8_to_utf8 = (iconv_t)-1;
> +static int same = 0;
> +
> +#if TEST
> +#define die printf
> +#endif

This is ugly.

> +
> +static void	initlocale()
> +{
> +#ifndef NO_ICONV
> +	if (!same && local_to_utf8 == (iconv_t)-1) {
> +		setlocale(LC_CTYPE, "");
> +		char *local_encoding = nl_langinfo(CODESET);
> +#ifdef DEBUG
> +		if (debug()) fprintf(stderr,"encoding=%s\n", local_encoding);
> +#endif

This is ugly.

> +		if (strcmp(local_encoding,"UTF-8") == 0) {
> +			same = 1;
> +			return;
> +		}
> +		local_to_utf8 = iconv_open("UTF-8",  local_encoding);
> +		if (local_to_utf8 == (iconv_t)-1) {
> +			die("cannot setup locale conversion from %s: %s", local_encoding, strerror(errno));
> +		}
> +#ifdef DEBUG
> +		if (debug()) fprintf(stderr,"utf8_to_local = iconv_open(%s,UTF-8)\n",local_encoding);
> +#endif

This is ugly.

> +		utf8_to_local = iconv_open(local_encoding,  "UTF-8");
> +		if (utf8_to_local == (iconv_t)-1) {
> +			die("cannot setup locale conversion from %s: %s", local_encoding, strerror(errno));
> +		}
> +
> +		utf8_to_utf8 = iconv_open("UTF-8","UTF-8");
> +		if (utf8_to_utf8 == (iconv_t)-1) {
> +			die("cannot setup locale conversion from UTF-8 to UTF-8: %s",strerror(errno));
> +		}
> +	}
> +#endif
> +}
> +
> +int maybe_utf8(const char *local, size_t len)
> +{
> +  char *self = xcalloc(1,len+1);
> +  char *selfp = self;
> +  size_t outlen = len+1;
> +  int ret = iconv(utf8_to_utf8, (char**)&local, &len, &selfp, &outlen);
> +  free(self);
> +  P(("maybelocal: %0.*s %s\n", len, local, ret!=-1 ? "yes" : "no"));
> +  return ret != -1;
> +}
> +
> +size_t utflen(const char *local, size_t locallen)
> +{
> +#ifndef NO_ICONV
> +	initlocale();
> +	if (same) {
> +		return locallen;
> +	}
> +	if (maybe_utf8(local, locallen))
> +		return locallen;
> +
> +	size_t outlen=locallen*6;
> +	char *outbuf=xcalloc(outlen,1);
> +	char *out=outbuf;
> +	iconv(local_to_utf8, NULL, NULL, NULL, NULL);
> +	const char *vlocal = local;
> +	size_t vlocallen = locallen;
> +	if (iconv(local_to_utf8,  (char**)&vlocal,  &vlocallen,  &out,  &outlen) == -1) {
> +#if TEST
> +		perror("failed");
> +#endif
> +		free(outbuf);
> +		return locallen;
> +	}
> +	*out = 0;
> +	free(outbuf);
> +	return locallen*6 - outlen;
> +#else
> +	return locallen;
> +#endif
> +}
> +
> +/* Copy and transform */
> +void utfcpy(char *to_utf, char *from_local, size_t localsize)
> +{
> +#ifdef DEBUG
> +	char *a=to_utf,*b=from_local;
> +#endif
> +#ifndef NO_ICONV
> +	initlocale();
> +	if (same) {
> +		memcpy(to_utf, from_local, localsize);
> +		return;
> +	}
> +	if (maybe_utf8(from_local, localsize)) {
> +		memcpy(to_utf, from_local, localsize);
> +		return;
> +	}
> +
> +	size_t outlen=localsize*6;
> +	iconv(local_to_utf8, NULL, NULL, NULL, NULL);
> +	char *vfrom_local = from_local;
> +	char *vto_utf = to_utf;
> +	size_t vlocalsize = localsize;
> +	if (iconv(local_to_utf8,  &vfrom_local,  &vlocalsize,  &vto_utf,  &outlen) == -1) {
> +		fprintf(stderr,"Failed to convert %0.*s to UTF\n", localsize, from_local);
> +		memcpy(to_utf,  from_local,  localsize);
> +	}
> +#else
> +	memcpy(to_utf, from_local, localsize);
> +#endif
> +#ifdef DEBUG
> +	if (debug()) fprintf(stderr,"%0.*s ->UTF %0.*s\n", localsize, b, localsize*6 - outlen, a);
> +#endif
> +}

Okay, I'll stop here.  You might want to clean up your patch series before 
resending.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 1/8] UTF helpers
  2009-05-13  0:20   ` [RFC 1/8] UTF helpers Johannes Schindelin
@ 2009-05-13  5:24     ` Robin Rosenberg
  2009-05-13  9:24       ` Esko Luontola
                         ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Robin Rosenberg @ 2009-05-13  5:24 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

> This is ugly.

I told you so. No news.

> Okay, I'll stop here.  You might want to clean up your patch series before 
> resending.

I also told you why why I stopped working on the patches. The patches are not part of
a beauty contest and not meant for inclusion as such. If you cannot see through the
rubble, just ignore the patches. If the conclusion is that this is a way forward, then I
could start working on a completely new set of much cleaner patches.,

-- robin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 1/8] UTF helpers
  2009-05-13  5:24     ` Robin Rosenberg
@ 2009-05-13  9:24       ` Esko Luontola
  2009-05-13 10:02         ` Andreas Ericsson
  2009-05-13 18:48         ` Junio C Hamano
  2009-05-13 10:14       ` Johannes Schindelin
  2009-05-14  4:38       ` Junio C Hamano
  2 siblings, 2 replies; 20+ messages in thread
From: Esko Luontola @ 2009-05-13  9:24 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

Robin Rosenberg wrote on 13.5.2009 8:24:
> If the conclusion is that this is a way forward, then I
> could start working on a completely new set of much cleaner patches.,

That would be great!

I see that in those early patches you took the approach of converting 
the filenames from the local encoding to UTF-8 at the outer edges of 
Git. That obviously was the easiest way to make the changes with minimal 
changes to Git.

I've been thinking about a bit more extensive approach, which should 
serve the interest of all stakeholders:


Now the tree object contains the following information for each file: 
filename, mode, sha1. To that would be added one more string: filename 
encoding. Unless the encoding is specified (such as in old commits 
before the encoding information was added), the default encoding is 
"binary", which is the same as how Git works now (it thinks filenames as 
series of bytes, ignoring their encoding completely).

When a file is added/committed, the following things will happen:

1. Git finds out what is the filename encoding used by the system. Git 
will try to detect it automatically from the environment, and the 
autodetected value can be overridden by setting a config variable 
"i18n.localFilenameEncoding". If autodetection fails, it will default to 
"binary".

2. Git reads the config variable "i18n.commitFilenameEncoding". If 
localFilenameEncoding equals commitFilenameEncoding, or if either of 
them is "binary", go to step 3A. Otherwise go to step 3B.

3A. Git saves the filename together with the local filename encoding. 
The bytes of the filename are not changed when it is stored in the 
repository (the same as now).

3B. Git converts the filename from localFilenameEncoding to 
commitFilenameEncoding. (The commitFilenameEncoding may also specify a 
normalized form for UTF-8, for example "UTF-8 NFC". This is needed for 
Mac OS X.) Then Git saves the filename together with the commit filename 
encoding.


When a file is checked out, the following things will happen:

1. Git reads the actual filename encoding from the repository. If it is 
not specified, "binary" will be assumed.

2. Git detects the local filename encoding, the same was as before. If 
the actual filename encoding equals the local filename encoding, or if 
either of them is "binary", go to step 3A. Otherwise go to step 3B.

3A. Git creates the file using the same bytes as filename as what is 
stored in the repository. This is the same as how Git works now.

3B. Git converts the filename from the actual filename encoding to the 
local filename encoding, and creates the file using the encoding of the 
local platform.


This should fit in with Git's philosophy of not modifying the user's 
data without the user's permission. The data will always be stored 
unchanged into the repository, unless the user specifies 
"i18n.commitFilenameEncoding". The conversions are by default done only 
on checkout. Git will try to serve the needs of the user as well as it 
can by detecting the local filename encoding, but if the user so 
desires, he can disable the conversions by specifying 
"i18n.localFilenameEncoding" as "binary", in which case Git will work 
the same way as it does today.


I was browsing Git's code, and it seems that the encoding information 
would need to be added to struct name_entry in tree-walk.h. A quick 
search reveals that name_entry is used in 15 files, out of which only 4 
files use it more than once. It would probably make sense to create a 
new datatype for the filename, for example "struct encoded_path { const 
char *path; const char *encoding; }", and then provide functions for 
accessing the filename with the right encoding (commit or local).

I might even myself be able to make that change, because Git is not 
legacy software (it has tests) and the needed changes seem quite local. 
I would just need a way to detect the encodings (at first it could rely 
on manually set config variables) and have a library for doing the 
encoding conversions.

One big question is, that will this change require a change to the 
repository format? Will it be possible to add the encoding field to the 
tree object, without breaking compatibility with older Git clients? If 
compatibility needs to be broken, how it can be done in a controlled 
fashion?

-- 
Esko Luontola
www.orfjackal.net

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 1/8] UTF helpers
  2009-05-13  9:24       ` Esko Luontola
@ 2009-05-13 10:02         ` Andreas Ericsson
  2009-05-13 10:21           ` Esko Luontola
  2009-05-13 18:48         ` Junio C Hamano
  1 sibling, 1 reply; 20+ messages in thread
From: Andreas Ericsson @ 2009-05-13 10:02 UTC (permalink / raw)
  To: Esko Luontola; +Cc: Robin Rosenberg, git

Esko Luontola wrote:
> Robin Rosenberg wrote on 13.5.2009 8:24:
>> If the conclusion is that this is a way forward, then I
>> could start working on a completely new set of much cleaner patches.,
> 
> That would be great!
> 
> I see that in those early patches you took the approach of converting 
> the filenames from the local encoding to UTF-8 at the outer edges of 
> Git. That obviously was the easiest way to make the changes with minimal 
> changes to Git.
> 
> I've been thinking about a bit more extensive approach, which should 
> serve the interest of all stakeholders:
> 
> 
> Now the tree object contains the following information for each file: 
> filename, mode, sha1. To that would be added one more string: filename
> encoding. Unless the encoding is specified (such as in old commits 
> before the encoding information was added), the default encoding is 
> "binary", which is the same as how Git works now (it thinks filenames as 
> series of bytes, ignoring their encoding completely).
> 

[ long and incompatible plan removed ]

> One big question is, that will this change require a change to the 
> repository format? Will it be possible to add the encoding field to the 
> tree object, without breaking compatibility with older Git clients? If 
> compatibility needs to be broken, how it can be done in a controlled 
> fashion?
> 

Generally when one wants to change one of the basic object types in
git, some extraordinary benefit has to be shown that is not aimed
at just a few people. Academic benefits (ie, "non-real-worldy") do
not fall into that category. In fact, it's so rare for someone to
provide such enormous benefit that the only time a core object format
in git has been incompatibly changed is when Linus decided that trees
should be able to have subtrees. The change reduced the repository
size for the early git-tracked Linux kernel to about 4% of its
original size, so there was a clear, undisputable and obvious benefit
huge enough to warrant breaking the git repository format entirely
just to get it in (I might have gotten those details entirely wrong,
but it was something along those lines).

So unless you can change tree objects in a way that lets older git
clients understand them while still adding this encoding cruft
(it's cruft to me), I think your chances of getting such a change
into the git core are about the size of the colour green.

If you're *really* serious about it though, here's how to go about
it:

1. Make the changes so that newer git can always read and operate
on trees without the encoding information, regardless of what the
configuration says.
2. Modify 1.4.x branch to support this new format too, at least
for reading trees with the information in it. Otherwise some
package maintainers will just ignore such compatibility.
3. Modify 1.5.x branch similarly.
5. Make it configurable, but turned off by default and with a big
fat warning when its turned on.
6. 2 years later, remove the warning.
7. 2 years lter, turn it on by default.
8. 2 years later, remove the config option and make it a new
major release, but maintain the two codepaths forever.


1.[45].x branches are imaginary. They represent the branch that
gets created when a new release in that series is necessary for
some reason.


I haven't perused Robin's patches enough to know how they would
interact with older git, and I'm not really interested in encoding
issues. English being the lingua franca of internet and opensource
development anyways, every project I've ever seen has only files
named in a manner that would fit nicely into 7-bit ascii.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Register now for Nordic Meet on Nagios, June 3-4 in Stockholm
 http://nordicmeetonnagios.op5.org/

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 1/8] UTF helpers
  2009-05-13  5:24     ` Robin Rosenberg
  2009-05-13  9:24       ` Esko Luontola
@ 2009-05-13 10:14       ` Johannes Schindelin
  2009-05-14  4:38       ` Junio C Hamano
  2 siblings, 0 replies; 20+ messages in thread
From: Johannes Schindelin @ 2009-05-13 10:14 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

Hi,

On Wed, 13 May 2009, Robin Rosenberg wrote:

> > Okay, I'll stop here.  You might want to clean up your patch series 
> > before resending.
> 
> I also told you why why I stopped working on the patches. The patches 
> are not part of a beauty contest and not meant for inclusion as such. If 
> you cannot see through the rubble, just ignore the patches. If the 
> conclusion is that this is a way forward, then I could start working on 
> a completely new set of much cleaner patches.,

You know, I really should have gone to bed, because I managed to mistake 
the "RFC" for "PATCH".

Very sorry for that,
Dscho

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 1/8] UTF helpers
  2009-05-13 10:02         ` Andreas Ericsson
@ 2009-05-13 10:21           ` Esko Luontola
  2009-05-13 11:44             ` Alex Riesen
  0 siblings, 1 reply; 20+ messages in thread
From: Esko Luontola @ 2009-05-13 10:21 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: Robin Rosenberg, git

Andreas Ericsson wrote on 13.5.2009 13:02:
> If you're *really* serious about it though, here's how to go about
> it:

Thanks for the pointers. I'll have a look at what the repository format 
is and how to create a migration path.

-- 
Esko Luontola
www.orfjackal.net

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 1/8] UTF helpers
  2009-05-13 10:21           ` Esko Luontola
@ 2009-05-13 11:44             ` Alex Riesen
  0 siblings, 0 replies; 20+ messages in thread
From: Alex Riesen @ 2009-05-13 11:44 UTC (permalink / raw)
  To: Esko Luontola; +Cc: Andreas Ericsson, Robin Rosenberg, git

2009/5/13 Esko Luontola <esko.luontola@gmail.com>:
> Andreas Ericsson wrote on 13.5.2009 13:02:
>>
>> If you're *really* serious about it though, here's how to go about
>> it:
>
> Thanks for the pointers. I'll have a look at what the repository format is
> and how to create a migration path.
>

While at it, think about how are you going to merge trees where filenames
have changed their encoding (because of local configurations, you know).
And take a look at memory use in your new encoding-per-filename format

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 1/8] UTF helpers
  2009-05-13  9:24       ` Esko Luontola
  2009-05-13 10:02         ` Andreas Ericsson
@ 2009-05-13 18:48         ` Junio C Hamano
  2009-05-13 19:31           ` Esko Luontola
  1 sibling, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2009-05-13 18:48 UTC (permalink / raw)
  To: Esko Luontola; +Cc: Robin Rosenberg, git

Esko Luontola <esko.luontola@gmail.com> writes:

> Robin Rosenberg wrote on 13.5.2009 8:24:
>> If the conclusion is that this is a way forward, then I
>> could start working on a completely new set of much cleaner patches.,
>
> That would be great!
>
> I see that in those early patches you took the approach of converting
> the filenames from the local encoding to UTF-8 at the outer edges of
> Git. That obviously was the easiest way to make the changes with
> minimal changes to Git.

Which would be the _only_ sane approach.

If you allow people to record otherwise exactly the same tree object in
different encoding, like you seem to have in mind, subtree comparision
based on the object name will not work and you will end up always
traversing down to the tip, because you won't know if your subtrees need
filename iconv until you recurse into them and actually take a look.

Don't do it.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 1/8] UTF helpers
  2009-05-13 18:48         ` Junio C Hamano
@ 2009-05-13 19:31           ` Esko Luontola
  2009-05-13 20:10             ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Esko Luontola @ 2009-05-13 19:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Robin Rosenberg, git

Junio C Hamano wrote on 13.5.2009 21:48:
> If you allow people to record otherwise exactly the same tree object in
> different encoding, like you seem to have in mind, subtree comparision
> based on the object name will not work and you will end up always
> traversing down to the tip, because you won't know if your subtrees need
> filename iconv until you recurse into them and actually take a look.

Could you please educate me, that which operations depend on "doing 
subtree comparisons based on the object name", and in about which 
files/functions those comparisons are done? Also, do you mean by "object 
name" the SHA1 of the object, the filename of a file/directory, or 
something else? I'd like to know, which parts of the code I should read, 
to get a better mental model on how Git works.

-- 
Esko Luontola
www.orfjackal.net

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 1/8] UTF helpers
  2009-05-13 19:31           ` Esko Luontola
@ 2009-05-13 20:10             ` Junio C Hamano
  0 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2009-05-13 20:10 UTC (permalink / raw)
  To: Esko Luontola; +Cc: Junio C Hamano, Robin Rosenberg, git

Esko Luontola <esko.luontola@gmail.com> writes:

> Junio C Hamano wrote on 13.5.2009 21:48:
>> If you allow people to record otherwise exactly the same tree object in
>> different encoding, like you seem to have in mind, subtree comparision
>> based on the object name will not work and you will end up always
>> traversing down to the tip, because you won't know if your subtrees need
>> filename iconv until you recurse into them and actually take a look.
>
> Could you please educate me, that which operations depend on "doing
> subtree comparisons based on the object name",

"diff-tree A B" looks at corresponding subtrees of A and B and does not
recurse into the identical subdirectories; "git log -- dir" uses this fact
to speed up the checking of differences.  When a typical commit touches
only a handful of paths in a project with 20k paths, this really matters.

> files/functions those comparisons are done? Also, do you mean by
> "object name" the SHA1 of the object, the filename of a
> file/directory, or something else?

What is colloquially known as SHA-1 has an official terminology; see
Documentation/glossary.txt.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 1/8] UTF helpers
  2009-05-13  5:24     ` Robin Rosenberg
  2009-05-13  9:24       ` Esko Luontola
  2009-05-13 10:14       ` Johannes Schindelin
@ 2009-05-14  4:38       ` Junio C Hamano
  2009-05-14 13:57         ` Jay Soffian
  2 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2009-05-14  4:38 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: Johannes Schindelin, git

Robin Rosenberg <robin.rosenberg@dewire.com> writes:

>> This is ugly.
>
> I told you so. No news.
>
>> Okay, I'll stop here.  You might want to clean up your patch series before 
>> resending.
>
> I also told you why why I stopped working on the patches. The patches are not part of
> a beauty contest and not meant for inclusion as such.

It is rather sad; I suspect that the core of the series is buried in too
much cruft deep enough to discourage many potential reviewers.  I think
the entire series look incoherent because attacking two largely unrelated
things at once.

 (1) Normalizing pathnames internally to UTF-8 and possibly convert it
     back to native upon use (e.g. creat(), lstat(), unlink()) and output.
     As Linus analyzed, this shouldn't be done too early in the callchain
     for performance reasons, but I think your patch would give us a good
     set of starting points to follow where the result from readdir(),
     user input and other things that are pathnames come from and go.

     This part of the patch series was inspiring.  You have to worry about
     gitignore, gitattributes and readlink() vs contents of a blob object
     that records a symbolic link values, which I think either escaped
     analysis people have done so far or being ignored as a small detail,
     but they are important;

 (2) Passing cat-file output through iconv to convert it.

     I think this is unwarranted, even if the object given to cat-file
     happens to be a commit or a tag object and you want to convert their
     messages in native encoding.

     I am not sure what should happen to "cat-file tree", "ls-files" and
     "ls-tree".  The output from these plumbing does show pathnames, but I
     tend to think it is Porcelain's job to turn them into whatever
     encoding they want to use.  So are input to "update-index --stdin",
     but I am still just thinking out loud.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 1/8] UTF helpers
  2009-05-14  4:38       ` Junio C Hamano
@ 2009-05-14 13:57         ` Jay Soffian
  0 siblings, 0 replies; 20+ messages in thread
From: Jay Soffian @ 2009-05-14 13:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Robin Rosenberg, Johannes Schindelin, git

On Thu, May 14, 2009 at 12:38 AM, Junio C Hamano <gitster@pobox.com> wrote:
> It is rather sad; I suspect that the core of the series is buried in too
> much cruft deep enough to discourage many potential reviewers.  I think
> the entire series look incoherent because attacking two largely unrelated
> things at once.
>
>  (1) Normalizing pathnames internally to UTF-8 and possibly convert it
>     back to native upon use (e.g. creat(), lstat(), unlink()) and output.
>     As Linus analyzed, this shouldn't be done too early in the callchain
>     for performance reasons, but I think your patch would give us a good
>     set of starting points to follow where the result from readdir(),
>     user input and other things that are pathnames come from and go.
>
>     This part of the patch series was inspiring.  You have to worry about
>     gitignore, gitattributes and readlink() vs contents of a blob object
>     that records a symbolic link values, which I think either escaped
>     analysis people have done so far or being ignored as a small detail,
>     but they are important;
>
>  (2) Passing cat-file output through iconv to convert it.
>
>     I think this is unwarranted, even if the object given to cat-file
>     happens to be a commit or a tag object and you want to convert their
>     messages in native encoding.
>
>     I am not sure what should happen to "cat-file tree", "ls-files" and
>     "ls-tree".  The output from these plumbing does show pathnames, but I
>     tend to think it is Porcelain's job to turn them into whatever
>     encoding they want to use.  So are input to "update-index --stdin",
>     but I am still just thinking out loud.

I definitely do not have the time to work on unicode/utf-8/i18n
support for git right now, but as an OS X user, it is something that
interests me. When this topic periodically pops up, I squirrel away
the useful messages into my "someday" folder. So even though it may
seem that reviewing these patches is wasted effort, comments like the
above are helpful. I say this because I expect someday someone will
work on this topic, even if it is not me, and hopefully they can
locate prior discussion in the mailing list archives and as such it
will be of some use.

IOW, your comments are useful and appreciated, even if it doesn't lead
to improved patches right away.

j.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2009-05-14 13:57 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-05-12 22:50 [RFC 0/8] Antique UTF-8 filename support Robin Rosenberg
2009-05-12 22:50 ` [RFC 1/8] UTF helpers Robin Rosenberg
2009-05-12 22:50   ` [RFC 2/8] Messages in locale Robin Rosenberg
2009-05-12 22:50     ` [RFC 3/8] Extend tests to cover locale wrt to commit messages Robin Rosenberg
2009-05-12 22:50       ` [RFC 4/8] UTF file names Robin Rosenberg
     [not found]         ` <1242168631-30753-6-git-send-email-robin.rosenberg@dewire.com>
2009-05-12 22:50           ` [RFC 6/8] test of utf_locallinks Robin Rosenberg
2009-05-12 22:50             ` [RFC 7/8] Convert symlink dest in diff Robin Rosenberg
2009-05-12 22:50               ` [RFC 8/8] UTF-8 in non-SHA1-objects Robin Rosenberg
2009-05-13  0:20   ` [RFC 1/8] UTF helpers Johannes Schindelin
2009-05-13  5:24     ` Robin Rosenberg
2009-05-13  9:24       ` Esko Luontola
2009-05-13 10:02         ` Andreas Ericsson
2009-05-13 10:21           ` Esko Luontola
2009-05-13 11:44             ` Alex Riesen
2009-05-13 18:48         ` Junio C Hamano
2009-05-13 19:31           ` Esko Luontola
2009-05-13 20:10             ` Junio C Hamano
2009-05-13 10:14       ` Johannes Schindelin
2009-05-14  4:38       ` Junio C Hamano
2009-05-14 13:57         ` Jay Soffian

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.