linux-security-module.vger.kernel.org archive mirror
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Andy Lutomirski <luto@amacapital.net>
Cc: David Howells <dhowells@redhat.com>,
	Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Nicolas Dichtel <nicolas.dichtel@6wind.com>,
	raven@themaw.net, Christian Brauner <christian@brauner.io>,
	keyrings@vger.kernel.org, linux-usb@vger.kernel.org,
	linux-block <linux-block@vger.kernel.org>,
	LSM List <linux-security-module@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 11/10] pipe: Add fsync() support [ver #2]
Date: Sat, 2 Nov 2019 16:02:13 -0700
Message-ID: <CAHk-=wgzRU9RjkZG0L9_yrnFN69REkrSokTQOGZMUkvdispvuQ@mail.gmail.com>
In-Reply-To: <E590C3AF-1D09-4927-B83F-DD0A6A148B6D@amacapital.net>

On Sat, Nov 2, 2019 at 3:30 PM Andy Lutomirski <luto@amacapital.net> wrote:
>
> So you allocate memory, vmsplice, and munmap() without reusing it?

You can re-use it as much as you want. Just don't write to it.

So the traditional argument for this was "I do a caching http server".
If you don't ever load the data into user space at all and just push
file data out, you just use splice() from the file to the target. But
if you generate some of the data in memory, and you cache it, you use
vmsplice().

And then it really is very easy to set up: make sure you generate your
caches with a new clean private mmap, and you can throw them out with
munmap (or just over-mmap it with the new cache, of course).
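
A minimal sketch of that setup, assuming Linux with glibc. The helper
names cache_build(), cache_send() and cache_drop() are hypothetical and
only illustrate the lifecycle described above; the mprotect() call is an
extra guard against accidental writes, not something the text requires.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>      /* vmsplice() */
#include <string.h>
#include <sys/mman.h>
#include <sys/uio.h>    /* struct iovec */
#include <unistd.h>

/* Hypothetical helper: build a cache entry in a new, clean private mapping. */
static char *cache_build(const char *body, size_t len, size_t *map_len)
{
    assert(len > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t rounded = (len + page - 1) / page * page;
    char *cache = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (cache == MAP_FAILED)
        return NULL;
    memcpy(cache, body, len);
    /* Generated once; drop write permission so a stray write faults. */
    if (mprotect(cache, rounded, PROT_READ) != 0) {
        munmap(cache, rounded);
        return NULL;
    }
    *map_len = rounded;
    return cache;
}

/* Hand the cached pages to a pipe without copying. May be called many
 * times for the same cache entry. Returns bytes queued (possibly fewer
 * than len), or -1 with errno set. */
static ssize_t cache_send(int pipe_wfd, const char *cache, size_t len)
{
    struct iovec iov = { .iov_base = (void *)cache, .iov_len = len };
    return vmsplice(pipe_wfd, &iov, 1, 0);
}

/* Throw the cache entry away; the address range is never written again. */
static void cache_drop(char *cache, size_t map_len)
{
    munmap(cache, map_len);
}
```

A caller would build the entry once, call cache_send() for each request
that needs it, and call cache_drop() when the entry is evicted.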

If you don't cache it, then there's no advantage to vmsplice() - just
write() it and forget about it. The whole (and only) point of
vmsplice() is when you want to zero-copy the data, and that's
generally likely only an advantage if you can do it multiple times.

But I don't think anybody actually _did_ any of that. But that's
basically the argument for the three splice operations:
write/vmsplice/splice(). Which one you use depends on the lifetime and
the source of your data. write() is obviously for the copy case (the
source data might not be stable), while splice() is for the "data from
another source", and vmsplice() is "data is from stable data in my
vm".
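
The three cases can be put side by side in a short sketch, assuming
Linux with glibc and a pipe as the destination. The wrapper names
push_copy(), push_stable() and push_from_file() are hypothetical; only
write(), vmsplice() and splice() are real system calls.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>      /* splice(), vmsplice() */
#include <sys/types.h>
#include <sys/uio.h>    /* struct iovec */
#include <unistd.h>

/* Copy case: the source may change right after the call returns,
 * so let the kernel copy it. */
static ssize_t push_copy(int pipe_wfd, const void *buf, size_t len)
{
    return write(pipe_wfd, buf, len);
}

/* Stable data in my vm: the caller promises not to write to buf
 * again, so the kernel can take the pages as-is. */
static ssize_t push_stable(int pipe_wfd, const void *buf, size_t len)
{
    assert(buf != NULL);
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    return vmsplice(pipe_wfd, &iov, 1, 0);
}

/* Data from another source: the bytes never enter user space. */
static ssize_t push_from_file(int pipe_wfd, int file_fd,
                              off_t offset, size_t len)
{
    loff_t off = offset;    /* splice() updates this copy, not the file position */
    return splice(file_fd, &off, pipe_wfd, NULL, len, 0);
}
```

All three return the number of bytes queued in the pipe, which may be
less than len, or -1 with errno set.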

There's the reverse op, of course, but we never implemented that:
mmap() on the pipe could do the reverse of a vmsplice() (moving from
the pipe to the vm), but it would only work if everything was
page-aligned, which it effectively never is. It's basically a
benchmark-only operation.

And the existence of vmsplice() is because we actually had code to
play games with making write() do a zero-copy but mark the source as
being COW. It was _wonderful_ for benchmarks, and was completely
useless for real-world cases because in the real world you always took
the COW fault. So vmsplice() is basically a "hey, I know what I'm
doing, and you can just take the page as-is because the source is
stable".

             Linus
