From: Linus Torvalds <torvalds@osdl.org>
To: Paul Eggert <eggert@CS.UCLA.EDU>
Cc: Junio C Hamano <junkio@cox.net>,
	Robert Fitzsimons <robfitz@273k.net>,
	Alex Riesen <raa.lkml@gmail.com>,
	git@vger.kernel.org, Kai Ruemmler <kai.ruemmler@gmx.net>
Subject: Re: [PATCH] Try URI quoting for embedded TAB and LF in pathnames
Date: Thu, 13 Oct 2005 22:20:53 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0510132203220.23590@g5.osdl.org>
In-Reply-To: <87irw1q7eu.fsf@penguin.cs.ucla.edu>



On Thu, 13 Oct 2005, Paul Eggert wrote:
> 
> Perhaps so, but it has a lot of company.  I have even worse problems
> with Mozilla Thunderbird.  And as we observed, Pine also has problems
> sending properly-formatted email containing arbitrary binary data.

No, pine does it right. Exactly because it sends _arbitrary_ binary data.

The fact that I turned the terminal into utf-8 mode in order to generate 
the bytes (that end up being a garbage string in latin1) is not pine's 
fault. 

The point being that because the transport was 8-bit clean, I could do 
that. I could mix a latin1-encoding with a UTF-8 encoding, and the other 
side could see the mixed setting. Now, the other side had no way of 
knowing that I mixed things (unless it was a smart human and could read 
and understand what I wrote), so any email client would have trouble 
showing it.

But it got _transferred_ right, and you could have saved the email, and 
turned the terminal into latin1 or utf-8 mode, and done a "cat" both ways, 
and you'd have seen both versions.
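
To make the "cat both ways" point concrete, here is a minimal sketch of a
byte stream that mixes the two encodings. The sample text and variable
names are illustrative assumptions, not taken from the original mail. The
bytes are kept untouched and only the interpretation changes:

```python
# One byte stream containing the same character in two encodings.
latin1_part = "é".encode("latin-1")   # single byte 0xE9
utf8_part = "é".encode("utf-8")       # two bytes 0xC3 0xA9
raw = b"latin1: " + latin1_part + b"\nutf8: " + utf8_part + b"\n"

# An 8-bit clean transport plus a bit-for-bit save leaves the bytes unchanged.
saved = bytes(raw)

# "cat" in a latin1 terminal: every byte maps to some character,
# so the UTF-8 part shows up as two garbage characters.
as_latin1 = saved.decode("latin-1")

# "cat" in a UTF-8 terminal: the lone 0xE9 byte is invalid and is
# shown as a replacement character, while the UTF-8 part reads fine.
as_utf8 = saved.decode("utf-8", errors="replace")

print(repr(as_latin1))
print(repr(as_utf8))
```

Neither view is lossy as long as the saved bytes themselves are never
recoded; each view just gets one half right.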

> I suspect the vast majority of email clients will screw up in
> relatively common cases involving unusual characters in file names.

Not if they just save it.

Oh, sure, they can't _display_ it, since they don't know what it is, but 
when they save it, they'd _better_ save it bit-for-bit.

Which is the right thing to do. Then you apply it with "patch", and you 
get the right answer.

> Using attachments avoids many of the problems, but lots of patches are
> emailed inline and I'd rather not force people to use attachments to
> send diffs.

Inline or attachment should not matter to any sane email client. If it 
does, then the email client isn't sane.

The point is, when you save it, it _has_ to be saved bit-for-bit. 

The only difference between a binary attachment and a text thing is that 
an email client will _try_ to show the text thing to you as text. It has 
no other meaning.

And trying is better than not trying. Attachments are _inferior_ to inline 
for that reason.

> > I find that email is very robust - it's basically 8-bit clean. No 
> > character encoding, no crap. Just a byte stream. It really _is_ the most 
> > reliable format.
> 
> Hmm.  To test that theory, I just now sent plain-text email to myself,
> containing a carriage-return (CR) byte in the middle of a line.
> 
> The CR byte was transliterated into a LF.  Ooops.

I'm not surprised, since CR/LF is special for a lot of (sad) reasons. Oh, 
well.

I agree that it makes sense to escape \r, and obviously you _have_ to 
escape \n. In general, escaping pretty much everything in the 0-31 range 
is likely the right approach, since those are never printable anyway.

That, btw, is probably true of the patch contents too, not just the 
filename. The exception being \t (and in patch contents, \n is obviously 
part of the stream).
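
As a rough sketch of "escape everything in the 0-31 range", the helper
below rewrites control bytes as a backslash followed by three octal
digits. The function name, the `keep` parameter, and the octal escape
format are all assumptions made for illustration; the thread had not
settled on a quoting syntax at this point. The backslash itself is
doubled so the result stays unambiguous:

```python
def escape_control_bytes(data: bytes, keep: bytes = b"") -> bytes:
    """Escape bytes 0-31 (except those listed in `keep`) as \\ooo.

    Hypothetical format for illustration only: backslash plus three
    octal digits, with a literal backslash doubled.
    """
    out = bytearray()
    for b in data:
        if b == 0x5C:                      # literal backslash
            out += b"\\\\"
        elif b < 0x20 and b not in keep:   # control byte, not exempted
            out += b"\\%03o" % b
        else:                              # printable ASCII or 8-bit data
            out.append(b)
    return bytes(out)


# Filename: escape every control byte, including TAB.
print(escape_control_bytes(b"a\tb\nc"))

# Leave TAB alone, as suggested for the exception case.
print(escape_control_bytes(b"a\tb\rc", keep=b"\t"))
```

Bytes above 127 pass through untouched, which matches the idea that the
stream is just bytes and only the never-printable range needs quoting.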

> More generally, I suspect inline patches with weird bytes will suffer
> greatly from encoding and recoding by mail agents.

I've had pretty good luck. We do have 8-bit stuff occasionally, but it 
almost always makes it through. 

Spaces and tabs are much worse (yes, they're more common too). That's 
clearly just crap mailers.

> Unfortunately this isn't true for Emacs, and I suspect other mailers
> will have similar problems.  For example, with Emacs I can easily save
> either the exact byte-for-byte message body that my mail transfer
> agent gave me; or I can have Emacs decode the message into its
> constituent characters, reencode the result as UTF-8, and put that
> into a file.

Well, as long as there's a choice.

> In neither case, though, am I saving the original byte
> stream that you presented to your mail user agent.  Even if I save the
> byte-for-byte message body, it is often in quoted-printable format so
> I'll have to decode strings like "=EF" to recover the original bytes.

You have a broken mail client. Now, I'm not a big fan of QP (I think it 
was making a stupid excuse for bad transport), but QP is a _mail_ level 
quoting protocol, and the same way a MUA uses QP to encode, the MUA should 
have de-coded the QP. It shouldn't leave it to somebody else.
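
For reference, undoing QP is a mechanical, charset-independent step. A
minimal sketch using Python's standard `quopri` module (the sample string
is an assumption; only the "=EF" escape comes from the quoted text):

```python
import quopri

# A QP-encoded body fragment containing the "=EF" escape.
encoded = b"na=EFve"

# Decoding yields the original bytes; no character set is involved.
decoded = quopri.decodestring(encoded)

print(decoded)  # the byte 0xEF is restored in place of "=EF"
```

What those bytes mean (latin1, UTF-8, or something else) is a separate
question that the QP layer does not need to answer.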

I think GNU emacs is a horrible mistake ("do everything - badly"), but you 
may be able to fix it by letting your mail transport agent do the un-QP 
for you. A lot of them do, which makes it easier to then use weak MUAs.

Anyway, it sounds like GNU emacs made the wrong choices (hey, I'm not 
surprised). It should have decoded QP, not the character set. There are 
lots of tools that do charset conversions, that's not very email-specific.

			Linus
