From: "Michael K. Edwards" <medwards.linux@gmail.com>
To: 7eggert@gmx.de
Cc: "Eric Dumazet" <dada1@cosmosbay.com>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>
Subject: Re: sys_write() racy for multi-threaded append?
Date: Mon, 12 Mar 2007 09:26:22 -0700
Message-ID: <f2b55d220703120926k1ab7112fh60227fac670b8b4c@mail.gmail.com>
In-Reply-To: <E1HQfLX-0000fk-68@be1.lrz>

On 3/12/07, Bodo Eggert <7eggert@gmx.de> wrote:
> Michael K. Edwards <medwards.linux@gmail.com> wrote:
> > Actually, I think it would make the kernel (negligibly) faster to bump
> > f_pos before the vfs_write() call.
>
> This is a security risk.
>
> ----------------
> other process:
> unlink(secret_file)
>
> Thread 1:
> write(fd, large)
> (interrupted)
>
> Thread 2:
> lseek(fd, -n, SEEK_CUR)
> read(fd, buf)
> ----------------

I don't entirely follow this.  Which process is supposed to be secure
here, and what information is supposed to be secret from whom?  But in
any case, are you aware that f_pos is clamped to the filesystem's
maximum file size (inode->i_sb->s_maxbytes), not to the current file
size?  Thread 2 can seek/read past the position at which Thread 1's
write began, whether f_pos is bumped before or after the vfs_write.
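
To illustrate the point from user space, here is a minimal sketch that
seeks well beyond the end of a small file and then reads.  The helper
names (make_scratch_file, read_past_eof) and the /tmp scratch path are
made up for this example; only lseek(), read(), write(), mkstemp() and
unlink() are real interfaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an unlinked scratch file holding "hello". */
static int make_scratch_file(void)
{
    char path[] = "/tmp/fpos_demo_XXXXXX";
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                       /* file lives on until fd is closed */
    if (write(fd, "hello", 5) != 5) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: move the file position `gap` bytes past the current
 * end of file, then try to read.  Returns what read() returns, or -1 if
 * the seek itself fails.
 */
static ssize_t read_past_eof(int fd, off_t gap)
{
    char buf[16];
    off_t end, target;

    assert(fd >= 0 && gap > 0);
    end = lseek(fd, 0, SEEK_END);
    if (end < 0)
        return -1;
    target = end + gap;
    if (lseek(fd, target, SEEK_SET) != target)
        return -1;                      /* the seek was refused */
    return read(fd, buf, sizeof buf);   /* nothing stored here yet */
}
```

On an ordinary file the seek is accepted and the read simply reports
end of file; the position is not pulled back to the file size.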

> BTW: The best thing you can do to a program where two threads race for
> writing one fd is to let it crash and burn in the most spectacular way
> allowed without affecting the rest of the system, unless it happens to
> be a pipe and the number of bytes written is less than PIPE_BUF.

That's fine when you're doing integration testing, and should probably be
the default during development.  But if the race is first exposed in
the field, or if the developer is trying to concentrate on a different
problem, "spectacular crash and burn" may do more harm than good.
It's easy enough to refactor the f_pos handling in the kernel so that
it all goes through three or four inline accessor functions, at which
point you can choose your trade-off between speed and idiot-proofness
-- at _kernel_ compile time, or (given future hardware that supports
standardized optionally-atomic-based-on-runtime-flag operations) per
process at run-time.
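
As a rough user-space model of what such accessors might look like
(this is not kernel code; struct demo_file, the demo_pos_* names and
the DEMO_FPOS_ATOMIC switch are all invented for illustration), the
compile-time trade-off could be sketched with C11 atomics:

```c
#include <assert.h>
#include <stdint.h>

#ifdef DEMO_FPOS_ATOMIC                 /* hypothetical build-time switch */
#include <stdatomic.h>
typedef _Atomic int64_t demo_pos_t;     /* "idiot-proof" variant */
#else
typedef int64_t demo_pos_t;             /* fast, unsynchronized variant */
#endif

/* Stand-in for the one field of struct file that matters here. */
struct demo_file {
    demo_pos_t pos;
};

static inline int64_t demo_pos_read(struct demo_file *f)
{
#ifdef DEMO_FPOS_ATOMIC
    return atomic_load(&f->pos);
#else
    return f->pos;
#endif
}

static inline void demo_pos_set(struct demo_file *f, int64_t v)
{
#ifdef DEMO_FPOS_ATOMIC
    atomic_store(&f->pos, v);
#else
    f->pos = v;
#endif
}

/*
 * Bump the position by `count` before the I/O is done and return the old
 * value, i.e. the offset the caller should write at.
 */
static inline int64_t demo_pos_reserve(struct demo_file *f, int64_t count)
{
    assert(count >= 0);
#ifdef DEMO_FPOS_ATOMIC
    return atomic_fetch_add(&f->pos, count);
#else
    int64_t old = f->pos;
    f->pos = old + count;
    return old;
#endif
}
```

With the switch defined, two threads calling demo_pos_reserve() get
disjoint ranges; without it, the code is a plain load and store and the
race is the caller's problem.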

Frankly, I think that unless application programmers poke at some sort
of magic "I promise to handle short writes correctly" bit, write()
should always return either the full number of bytes requested or an
error code.  If they do poke at that bit, the (development) kernel
should deliberately split writes a few percent of the time, just to
exercise the short-write code paths.  And in order to find out where
that magic bit is, they should have to read the kernel code or ask on
LKML (and get the "standard lecture").
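
For reference, "handling short writes correctly" in user space usually
amounts to a loop like the one below.  This is a minimal sketch, and
write_all is a name chosen for this example, not a standard function;
it retries on EINTR and keeps going until every byte is written or a
real error occurs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write all `len` bytes of `buf` to `fd`.
 * Returns 0 on success, -1 on error with errno set by write().
 */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted before any data: retry */
            return -1;                  /* real error */
        }
        p += n;                         /* short write: advance and go again */
        len -= (size_t)n;
    }
    return 0;
}
```

Note that this only addresses a single writer; two threads running this
loop on the same fd can still interleave their chunks.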

Really very IEEE754-like, actually.  (Harp, harp.)

Cheers,
- Michael
