From: Linus Torvalds <torvalds@osdl.org>
To: Paul Eggert <eggert@CS.UCLA.EDU>
Cc: Junio C Hamano <junkio@cox.net>,
Robert Fitzsimons <robfitz@273k.net>,
Alex Riesen <raa.lkml@gmail.com>,
git@vger.kernel.org, Kai Ruemmler <kai.ruemmler@gmx.net>
Subject: Re: [PATCH] Try URI quoting for embedded TAB and LF in pathnames
Date: Thu, 13 Oct 2005 22:20:53 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0510132203220.23590@g5.osdl.org>
In-Reply-To: <87irw1q7eu.fsf@penguin.cs.ucla.edu>
On Thu, 13 Oct 2005, Paul Eggert wrote:
>
> Perhaps so, but it has a lot of company. I have even worse problems
> with Mozilla Thunderbird. And as we observed, Pine also has problems
> sending properly-formatted email containing arbitrary binary data.
No, pine does it right. Exactly because it sends _arbitrary_ binary data.
The fact that I turned the terminal into utf-8 mode in order to generate
the bytes (that end up being a garbage string in latin1) is not pine's
fault.
The point being that because the transport was 8-bit clean, I could do
that. I could mix a latin1-encoding with a UTF-8 encoding, and the other
side could see the mixed setting. Now, the other side had no way of
knowing that I mixed things (unless it was a smart human and could read
and understand what I wrote), so any email client would have trouble
showing it.
But it got _transferred_ right, and you could have saved the email, and
turned the terminal into latin1 or utf-8 mode, and done a "cat" both ways,
and you'd have seen both versions.
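[Editorial illustration, not part of the original message.] A minimal sketch of the point above: a byte-for-byte save carries no charset, so the same saved bytes read differently depending on what the terminal assumes. The sample character is an arbitrary choice made for this example.

```python
# Hypothetical sample: U+00E4 ("a" with diaeresis) encoded as UTF-8
# becomes the two bytes C3 A4.
raw = "\u00e4".encode("utf-8")

# A byte-for-byte save keeps exactly these bytes, whatever they "mean".
saved = bytes(raw)

# The interpretation happens only at display time.
as_utf8 = saved.decode("utf-8")      # what a UTF-8 terminal would show
as_latin1 = saved.decode("latin-1")  # what a latin1 terminal would show

print(saved.hex())       # the bytes themselves
print(ascii(as_utf8))    # one character
print(ascii(as_latin1))  # two characters: the latin1 "garbage" reading
```

Nothing is lost in either reading, because the saved bytes themselves never change.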
> I suspect the vast majority of email clients will screw up in
> relatively common cases involving unusual characters in file names.
Not if they just save it.
Oh, sure, they can't _display_ it, since they don't know what it is, but
when they save it, they'd _better_ save it bit-for-bit.
Which is the right thing to do. Then you apply it with "patch", and you
get the right answer.
> Using attachments avoids many of the problems, but lots of patches are
> emailed inline and I'd rather not force people to use attachments to
> send diffs.
Inline or attachment should not matter to any sane email client. If it
does, then the email client isn't sane.
The point is, when you save it, it _has_ to be saved bit-for-bit.
The only difference between a binary attachment and a text thing is that
an email client will _try_ to show the text thing to you as text. It has
no other meaning.
And trying is better than not trying. Attachments are _inferior_ to inline
for that reason.
> > I find that email is very robust - it's basically 8-bit clean. No
> > character encoding, no crap. Just a byte stream. It really _is_ the most
> > reliable format.
>
> Hmm. To test that theory, I just now sent plain-text email to myself,
> containing a carriage-return (CR) byte in the middle of a line.
>
> The CR byte was transliterated into a LF. Ooops.
I'm not surprised, since CR/LF is special for a lot of (sad) reasons. Oh,
well.
I agree that it makes sense to escape \r, and obviously you _have_ to
escape \n. In general, escaping pretty much everything in the 0-31 range
is likely the right approach, since those are never printable anyway.
That, btw, is probably true of the patch contents too, not just the
filename. The exception being \t (and in patch contents, \n is obviously
part of the stream).
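[Editorial illustration, not part of the original message.] A rough sketch of the rule described above: escape every byte in the 0-31 range except the ones explicitly kept (\t for filenames, \t and \n for patch contents). The function name and the backslash-octal escape syntax are assumptions made for this example, not the quoting format the thread settled on. The backslash itself is also escaped here so that the output stays unambiguous.

```python
def escape_control_bytes(data: bytes, keep: bytes = b"\t") -> bytes:
    """Escape bytes 0-31 (except those in `keep`) as backslash-octal.

    Hypothetical helper: the \\ooo syntax is an assumption for this sketch.
    Bytes >= 32, including 8-bit bytes, pass through untouched.
    """
    out = bytearray()
    for b in data:
        if b == 0x5C:                    # backslash: escape to stay reversible
            out += b"\\\\"
        elif b < 32 and b not in keep:   # control byte that is not exempted
            out += b"\\%03o" % b
        else:
            out.append(b)
    return bytes(out)


# Filename: LF and CR are escaped, TAB is left alone.
print(escape_control_bytes(b"weird\nname\r.txt"))
# Patch contents: LF is part of the stream, so keep it as well.
print(escape_control_bytes(b"line one\nline\rtwo\n", keep=b"\t\n"))
```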
> More generally, I suspect inline patches with weird bytes will suffer
> greatly from encoding and recoding by mail agents.
I've had pretty good luck. We do have 8-bit stuff occasionally, but it
almost always makes it through.
Spaces and tabs are much worse (yes, they're more common too). That's
clearly just crap mailers.
> Unfortunately this isn't true for Emacs, and I suspect other mailers
> will have similar problems. For example, with Emacs I can easily save
> either the exact byte-for-byte message body that my mail transfer
> agent gave me; or I can have Emacs decode the message into its
> constituent characters, reencode the result as UTF-8, and put that
> into a file.
Well, as long as there's a choice.
> In neither case, though, am I saving the original byte
> stream that you presented to your mail user agent. Even if I save the
> byte-for-byte message body, it is often in quoted-printable format so
> I'll have to decode strings like "=EF" to recover the original bytes.
You have a broken mail client. Now, I'm not a big fan of QP (I think it
was making a stupid excuse for bad transport), but QP is a _mail_ level
quoting protocol, and the same way a MUA uses QP to encode, the MUA should
have de-coded the QP. It shouldn't leave it to somebody else.
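[Editorial illustration, not part of the original message.] A small sketch of undoing the mail-level quoting before saving, using Python's standard `quopri` module. The sample body is made up for this example; only the "=EF" sequence comes from the quoted text above.

```python
import quopri

# Hypothetical QP-encoded body as it might arrive from the transport.
qp_body = b"na=EFve.txt\n"

# Decoding QP recovers the original bytes; no charset conversion happens.
original = quopri.decodestring(qp_body)
print(original)

# Encoding and decoding again is lossless for the byte stream.
round_trip = quopri.decodestring(quopri.encodestring(original))
print(round_trip == original)
```

Decoding QP this way leaves the character set alone, which is the split of responsibilities argued for above.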
I think GNU emacs is a horrible mistake ("do everything - badly"), but you
may be able to fix it by letting your mail transport agent do the un-QP
for you. A lot of them do, which makes it easier to then use weak MUAs.
Anyway, it sounds like GNU emacs made the wrong choices (hey, I'm not
surprised). It should have decoded QP, not the character set. There are
lots of tools that do charset conversions, that's not very email-specific.
Linus