linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Michael K. Edwards" <medwards.linux@gmail.com>
To: "David Miller" <davem@davemloft.net>
Cc: alan@lxorguk.ukuu.org.uk, 7eggert@gmx.de, dada1@cosmosbay.com,
	linux-kernel@vger.kernel.org
Subject: Re: sys_write() racy for multi-threaded append?
Date: Tue, 13 Mar 2007 09:24:22 -0700	[thread overview]
Message-ID: <f2b55d220703130924h11786d9fh1d87f90bcd1b3d60@mail.gmail.com> (raw)
In-Reply-To: <20070313.004224.41634994.davem@davemloft.net>

On 3/13/07, David Miller <davem@davemloft.net> wrote:
> You're not even safe over standard output, simply run the program over
> ssh and you suddenly have socket semantics to deal with.

I'm intimately familiar with this one.  Easily worked around by piping
the output through cat or tee.  Not that one should ever write code
for a *nix box that can't cope with stdout being a socket or tty; but
sometimes the quickest way to glue existing code into a pipeline is to
pass /dev/stdout in place of a filename, and there's no real value in
reworking legacy code to handle short writes when you can just drop in
cat.

> In short, if you don't handle short writes, you're writing a program
> for something other than unix.

Right in one.  You're writing a program for a controlled environment,
whether it's a Linux-only API (netlink sockets) or a Linux-only
embedded product.  And there's no need for that controlled environment
to gratuitously reproduce the overly vague semantics of the *nix zoo.

> We're not changing write() to interlock with other parallel callers or
> messing with the f_pos semantics in such cases, that's stupid, please
> cope, kthx.

This is not what I am proposing, and certainly not what I'm
implementing.  Right now f_pos handling is gratuitously thread-unsafe
even in the common, all writes completed normally case.  writev()
opens the largest possible window to f_pos corruption by delaying the
f_pos update until after the vfs_writev() completes.  That's the main
thing I want to see fixed.

_If_ it proves practical to wrap accessor functions around f_pos
accesses, and _if_ accesses to f_pos from outside read_write.c can be
made robust against transient overshoot, then there is no harm in
tightening up f_pos semantics so that successful pipelined writes to
the same file don't collide so easily.  If read-modify-write on f_pos
were atomic, it would be easy to guarantee that it is in a sane state
any time a syscall is not actually outstanding on that struct file.
There's even a potential performance gain from streamlining the
pattern of L1 cache usage in the common case, although I expect it's
negligible.

Making read-modify-write atomic on f_pos is of course not free on all
platforms.  But once you have the accessors, you can decide at kernel
compile time whether or not to interlock them with spinlocks and
memory barriers or some such.  Most platforms probably needn't bother;
there are much bigger time bombs lurking elsewhere in the kernel.  x86
is a bit dicey, because there's no way to atomically update a 64-bit
loff_t in memory, and so in principle you could get a race on carry
across the 32-bit boundary.  (This file instantaneously jumped from
4GB to 8GB in size?  What the hell?  How long would it take you to
spot a preemption between writing out the upper and lower halves of a
64-bit integer?)

In any case, I have certainly gotten the -ENOPATCH message by now.  It
must be nice not to have to care whether things work in the field if
you can point at something stupid that an application programmer did.
I don't always have that luxury.

Cheers,
- Michael

  reply	other threads:[~2007-03-13 16:24 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <7WzUo-1zl-21@gated-at.bofh.it>
     [not found] ` <7WAx2-2pg-21@gated-at.bofh.it>
     [not found]   ` <7WAGF-2Bx-9@gated-at.bofh.it>
     [not found]     ` <7WB07-3g5-33@gated-at.bofh.it>
     [not found]       ` <7WBt7-3SZ-23@gated-at.bofh.it>
2007-03-12  7:53         ` sys_write() racy for multi-threaded append? Bodo Eggert
2007-03-12 16:26           ` Michael K. Edwards
2007-03-12 18:48             ` Bodo Eggert
2007-03-13  0:46               ` Michael K. Edwards
2007-03-13  2:24                 ` Alan Cox
2007-03-13  7:25                   ` Michael K. Edwards
2007-03-13  7:42                     ` David Miller
2007-03-13 16:24                       ` Michael K. Edwards [this message]
2007-03-13 17:59                         ` Michael K. Edwards
2007-03-13 19:09                           ` Christoph Hellwig
2007-03-13 23:40                             ` Michael K. Edwards
2007-03-14  0:09                               ` Michael K. Edwards
2007-03-13 13:15                     ` Alan Cox
2007-03-14 20:09                       ` Michael K. Edwards
2007-03-16 16:43                         ` Frank Ch. Eigler
2007-03-16 17:25                         ` Alan Cox
2007-03-13 14:00                   ` David M. Lloyd
2007-03-08 23:08 Michael K. Edwards
2007-03-08 23:43 ` Eric Dumazet
2007-03-08 23:57   ` Michael K. Edwards
2007-03-09  0:15     ` Eric Dumazet
2007-03-09  0:45       ` Michael K. Edwards
2007-03-09  1:34         ` Benjamin LaHaise
2007-03-09 12:19           ` Michael K. Edwards
2007-03-09 13:44             ` Eric Dumazet
2007-03-09 14:10             ` Alan Cox
2007-03-09 14:59             ` Benjamin LaHaise
2007-03-10  6:43               ` Michael K. Edwards
2007-03-09  5:53         ` Eric Dumazet
2007-03-09 11:52           ` Michael K. Edwards
2007-03-09  0:43 ` Alan Cox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f2b55d220703130924h11786d9fh1d87f90bcd1b3d60@mail.gmail.com \
    --to=medwards.linux@gmail.com \
    --cc=7eggert@gmx.de \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=dada1@cosmosbay.com \
    --cc=davem@davemloft.net \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).