From: Robin Rosenberg <robin.rosenberg@dewire.com>
To: git@vger.kernel.org
Cc: Robin Rosenberg <robin.rosenberg@gmail.com>
Subject: [RFC 0/8] Antique UTF-8 filename support
Date: Wed, 13 May 2009 00:50:23 +0200
Message-ID: <1242168631-30753-1-git-send-email-robin.rosenberg@dewire.com>

From: Robin Rosenberg <robin.rosenberg@gmail.com>

Since there is now some interest in the topic, I can republish my 2½-year-old
patches so there is some real code to comment on. They apply on top of
6dcfa306f2b67b733a7eb2d7ded1bc9987809edb. For completeness I am sending
all patches, but the interesting stuff is in patches 4 and 5. Beware of encoding
issues with the test cases.

They do not handle Windows UTF-16 at all, but I think that is just a matter of writing
Windows-specific wrappers for the filename and directory handling routines.
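
As a rough illustration of what such a wrapper would have to do (this is not
code from the series; the function names are hypothetical and Python is used
only for brevity), a filename stored as UTF-8 would be re-encoded as UTF-16
before being handed to a wide-character Windows file API, and converted back
when names are read from a directory:

```python
# Hypothetical sketch, not taken from the patch series.
# Assumption: filenames are kept as UTF-8 internally and only converted
# at the boundary to the platform's wide-character file API.

def utf8_to_utf16_name(name_utf8: bytes) -> bytes:
    """Re-encode a UTF-8 filename as little-endian UTF-16 (no BOM)."""
    return name_utf8.decode("utf-8").encode("utf-16-le")


def utf16_to_utf8_name(name_utf16: bytes) -> bytes:
    """Convert a little-endian UTF-16 filename back to UTF-8."""
    return name_utf16.decode("utf-16-le").encode("utf-8")


# U+00E5 (a with ring above) is one 16-bit unit in UTF-16.
ring_a = "\u00e5".encode("utf-8")
wide = utf8_to_utf16_name(ring_a)

# U+1F600 lies outside the Basic Multilingual Plane, so it needs
# two 16-bit units (a surrogate pair) in UTF-16.
emoji = "\U0001F600".encode("utf-8")
wide_emoji = utf8_to_utf16_name(emoji)
```

Characters outside the Basic Multilingual Plane become surrogate pairs in
UTF-16, so a real wrapper could not assume one 16-bit unit per character when
sizing buffers.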

Feel free to revamp and steal ideas and add constructive criticism. Don't even
think of cherry-picking and rebasing; it's careful handpicking with copy/paste at
best, but mostly it's fuel for discussion.

I admit some parts are quite kludgy and probably slow, as I was primarily
interested in seeing whether it was even feasible, which it was. However, there
was simply no interest, which meant there was no point in optimizing it. It was
simply the wrong problem at the time.

Disclaimer: A problem with this approach is that, although it does character
conversion, if you are on a non-UTF-8 locale it will not let you manage
an arbitrary repository. That is basically impossible and hence not the goal. It
does help people with the same (or close) languages to cooperate without enforcing
a common encoding, as long as they stick to the common characters, i.e. the ones
that can be converted between the locales involved.
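
The "common characters" restriction can be sketched as follows. This is only
an illustration under the assumption that names are stored as UTF-8 and
converted to the user's locale encoding at the boundary; the helper names are
hypothetical and are not the functions in utf.c:

```python
# Hypothetical sketch, not taken from the patch series.
# Assumption: names are stored as UTF-8 and converted to and from the
# locale encoding whenever they cross the user-facing boundary.

def to_locale(name_utf8: bytes, locale_encoding: str) -> bytes:
    """Convert a stored UTF-8 name to the locale encoding.

    Raises UnicodeEncodeError if a character cannot be represented.
    """
    return name_utf8.decode("utf-8").encode(locale_encoding)


def from_locale(name_local: bytes, locale_encoding: str) -> bytes:
    """Convert a name typed in the locale encoding to UTF-8 for storage."""
    return name_local.decode(locale_encoding).encode("utf-8")


def usable_in_all(name_utf8: bytes, encodings) -> bool:
    """True if every character of the name exists in every listed encoding."""
    text = name_utf8.decode("utf-8")
    for enc in encodings:
        try:
            text.encode(enc)
        except UnicodeEncodeError:
            return False
    return True


# "raksmorgas.txt" spelled with U+00E4, U+00F6 and U+00E5: all three
# characters exist in both ISO-8859-1 and ISO-8859-15.
swedish = "r\u00e4ksm\u00f6rg\u00e5s.txt".encode("utf-8")

# The euro sign (U+20AC) exists in ISO-8859-15 but not in ISO-8859-1.
euro = "\u20ac.txt".encode("utf-8")

locales = ["iso-8859-1", "iso-8859-15"]
shared_ok = usable_in_all(swedish, locales)
euro_ok = usable_in_all(euro, locales)
```

Here the Swedish name is usable by collaborators on either locale, while the
euro-sign name is only usable by those on ISO-8859-15; a user on ISO-8859-1
would hit a conversion failure for it.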

This is probably the most outdated patch series ever.

-- robin

Robin Rosenberg (8):
(mostly obsolete)
  UTF helpers
  Messages in locale.
  Extend tests to cover locale wrt to commit messages.

The interesting stuff (patches 4 & 5)
  UTF file names.
  Extend all tests to work on UTF-8 filenames.

old wip
  test of utf_locallinks
  Convert symlink dest in diff
  UTF-8 in non-SHA1-objects

 Makefile                            |    8 +-
 builtin-add.c                       |    5 +-
 builtin-cat-file.c                  |    6 +-
 builtin-checkout-index.c            |   46 +++-
 builtin-commit-tree.c               |    9 +-
 builtin-ls-files.c                  |   26 ++-
 builtin-ls-tree.c                   |   16 +-
 builtin-rev-parse.c                 |    7 +-
 builtin-update-index.c              |   18 +-
 builtin-write-tree.c                |    5 +-
 diff.c                              |  111 ++++++--
 dir.c                               |   22 +-
 git-commit.sh                       |    5 +
 git-compat-util.h                   |   43 +++
 git-rebase.sh                       |    1 +
 git.c                               |    9 +
 log-tree.c                          |    4 +-
 merge-index.c                       |   25 ++-
 read-cache.c                        |    8 +-
 refs.c                              |   11 +-
 setup.c                             |   28 ++-
 t/lib-read-tree-m-3way.sh           |   38 ++--
 t/t-utf-filenames.sh                |   95 +++++++
 t/t-utf-msg.sh                      |   43 +++
 t/t0000-basic.sh                    |  117 ++++----
 t/t0010-racy-git.sh                 |   10 +-
 t/t1000-read-tree-m-3way.sh         |  240 +++++++++---------
 t/t1001-read-tree-m-2way.sh         |   56 ++--
 t/t1020-subdirectory.sh             |   63 +++---
 t/t1100-commit-tree-options.sh      |   12 +-
 t/t1400-update-ref.sh               |   10 +-
 t/t2000-checkout-cache-clash.sh     |   18 +-
 t/t2001-checkout-cache-clash.sh     |   30 +-
 t/t2002-checkout-cache-u.sh         |    8 +-
 t/t2003-checkout-cache-mkdir.sh     |  118 ++++----
 t/t2004-checkout-cache-temp.sh      |  144 +++++-----
 t/t2100-update-cache-badpath.sh     |   48 ++--
 t/t2101-update-index-reupdate.sh    |   56 ++--
 t/t3000-ls-files-others.sh          |   36 ++--
 t/t3002-ls-files-dashpath.sh        |   24 +-
 t/t3010-ls-files-killed-modified.sh |  104 ++++----
 t/t3020-ls-files-error-unmatch.sh   |   10 +-
 t/t3100-ls-tree-restrict.sh         |  122 +++++-----
 t/t3101-ls-tree-dirname.sh          |   88 +++---
 t/t3400-rebase.sh                   |   18 +-
 t/t3401-rebase-partial.sh           |   24 +-
 t/t3402-rebase-merge.sh             |   17 +-
 t/t3403-rebase-skip.sh              |   10 +-
 t/t3500-cherry.sh                   |   26 +-
 t/t3600-rm.sh                       |   28 +-
 t/t3700-add.sh                      |   30 +-
 t/t4000-diff-format.sh              |   26 +-
 t/t4001-diff-rename.sh              |   20 +-
 t/t4002-diff-basic.sh               |  160 ++++++------
 t/t4003-diff-rename-1.sh            |   66 +++---
 t/t4004-diff-rename-symlink.sh      |   40 ++--
 t/t4005-diff-rename-2.sh            |   54 ++--
 t/t4006-diff-mode.sh                |   14 +-
 t/t4008-diff-break-rewrite.sh       |  100 ++++----
 t/t4009-diff-rename-4.sh            |   63 +++---
 t/t4011-diff-symlink.sh             |   38 ++--
 t/t4012-diff-binary.sh              |   16 +-
 t/t7301-rev-parse.sh                |   20 ++
 t/test-lib.sh                       |   13 +-
 test-utf.c                          |   61 +++++
 utf.c                               |  501 +++++++++++++++++++++++++++++++++++
 utf.h                               |   27 ++
 67 files changed, 2133 insertions(+), 1142 deletions(-)
 create mode 100755 t/t-utf-filenames.sh
 create mode 100755 t/t-utf-msg.sh
 create mode 100755 t/t7301-rev-parse.sh
 create mode 100644 test-utf.c
 create mode 100644 utf.c
 create mode 100644 utf.h
