[RFC] embedded TAB and LF in pathnames

* [RFC] embedded TAB and LF in pathnames
@ 2005-10-07 19:35 Junio C Hamano
  2005-10-07 23:29 ` Alex Riesen
  0 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2005-10-07 19:35 UTC (permalink / raw)
  To: git; +Cc: Kai Ruemmler

While I was reviewing git-status fix by Kai Ruemmler, it struck
me that our barebone Porcelain-ish layer got a bit sloppier over
time.  The core layer does not care about any metacharacters in
the pathname, and it has provisions, primarily in the form of
'-z' flag, for carefully written Porcelain layers to handle
pathnames with embedded metacharacters correctly.

One exception, however, is the interaction between the git-diff
family output and git-apply.  We needed to be compatible with
other people's diff, which meant that we should not have to
worry too much about pathnames with embedded TABs and LFs
because GNU diff would not produce usable diff for such things
anyway.  But 'git-diff --names' barfing if a pathname contained
these characters when run without '-z' flag was too much.  This
still breaks 'git-status'.

So I am considering the following changes:

 - 'raw' output format without '-z', upon finding a TAB or LF,
   would not die, but just issue a warning.  However, the paths
   are "munged" in a way described later.

 - '--name-only' and '--name-status' format issue the same
   warning when finding these characters and run without '-z'.
   And the paths are "munged" as well.

 - 'patch' output format also issues a warning.  The paths are
   "munged" but in a slightly different manner from the above.

 - 'git-apply' is taught about the path munging in the diff
   input for git diffs (i.e. 'diff --git') and do sensible
   things.

One possible way for path munging goes like this.  We could take
advantage of the fact that we do not ever output '//' ourselves,
and '//' never appears in valid diffs by other people's tools,
unless done deliberately by hand ("diff -u a//foo. b//foo.c"
from the command line).  So we could use '//' as if it is a
backslash.  Examples.

  "foo/bar.c" --> "foo/bar.c"	(no funny letters - as before)
  "foo\nbar"  --> "foo//0Abar" (double slash followed by 2 hex)
  "foo\tbar"  --> "foo//09bar" (double slash followed by 2 hex)

So a diff output to rename "foo/bar.c" to "foo\nbar.c" would
become:

  diff --git a/foo/bar.c b/foo//0Abar.c
  similarity index 100%
  rename from foo
  rename to foo//0Abar.c

The byte-values subject to this munging is LF for patch output
(because git-apply seems to grok TABs in pathnames just fine),
and TAB and LF for 'raw', '--name-only', '--name-status' without
'-z'.

I have not made up my mind on the exact choice of the quoting
convention.  We could say '///' instead of '//', for example, or
even '//{LF}//' instead of '//0A' proposed above.  One thing I
am trying to avoid is "foo\nbar", which I suspect would be
unfriendly to the Cygwin folks.

^ permalink raw reply	[flat|nested] 33+ messages in thread