All of lore.kernel.org
 help / color / mirror / Atom feed
From: Milosz Tanski <milosz@adfin.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Jeremy Allison <jra@samba.org>,
	Christoph Hellwig <hch@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-aio@kvack.org" <linux-aio@kvack.org>,
	Mel Gorman <mgorman@suse.de>,
	Volker Lendecke <Volker.Lendecke@sernet.de>,
	Tejun Heo <tj@kernel.org>, Jeff Moyer <jmoyer@redhat.com>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Linux API <linux-api@vger.kernel.org>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	linux-arch@vger.kernel.org, Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH v7 0/5] vfs: Non-blockling buffered fs read (page cache only)
Date: Mon, 30 Mar 2015 21:27:54 -0400	[thread overview]
Message-ID: <CANP1eJEciJRM10ZpC98uXmxs5TP_GP8OX+cBfHK5WEhgaQnzNw@mail.gmail.com> (raw)
In-Reply-To: <20150327093046.53c2769a.akpm@linux-foundation.org>

On Fri, Mar 27, 2015 at 12:30 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Fri, 27 Mar 2015 08:58:54 -0700 Jeremy Allison <jra@samba.org> wrote:
>
>> On Fri, Mar 27, 2015 at 02:01:59AM -0700, Andrew Morton wrote:
>> > On Fri, 27 Mar 2015 01:48:33 -0700 Christoph Hellwig <hch@infradead.org> wrote:
>> >
>> > > On Fri, Mar 27, 2015 at 01:35:16AM -0700, Andrew Morton wrote:
>> > > > fincore() doesn't have to be ugly.  Please address the design issues I
>> > > > raised.  How is pread2() useful to the class of applications which
>> > > > cannot proceed until all data is available?
>> > >
>> > > It actually makes them work correctly?  preadv2( ..., DONTWAIT) will
>> > > return -EGAIN, which causes them to bounce to the threadpool where
>> > > they call preadv(...).
>> >
>> > (I assume you mean RWF_NONBLOCK)
>> >
>> > That isn't how pread2() works.  If the leading one or more pages are
>> > uptodate, pread2() will return a partial read.  Now what?  Either the
>> > application reads the same data a second time via the worker thread
>> > (dumb, but it will usually be a rare case)
>>
>> The problem with the above is that we can't tell the difference
>> between pread2() returning a short read because the pages are not
>> in cache, or because someone truncated the file. So we need some
>> way to differentiate this.
>>
>> My preference from userspace would be for pread2() to return
>> EAGAIN if *all* the data requested is not available (where
>> 'all' can be less than the size requested if the file has
>> been truncated in the meantime).
>>
>> ...
>>
>> The thing I want to avoid is the case where
>> ret < size_wanted means only part of the file
>> is in cache.
>
> From my reading of the code, pread2() will return -EAGAIN only when it
> copied zero bytes to userspace.  ie, the very first page wasn't in
> cache.  If pread2() does copy some data to userspace then it will
> return the amount of data copied.  This is traditional read()
> behaviour.
>
> Maybe there's some other code somewhere in the patch which converts
> that short read into -EAGAIN, dunno - the changelogs don't appear to
> mention it and the manpage update is ambiguous about this.
>
> But from an interface perspective the behaviour you're asking for is
> insane, frankly - if the kernel copied out 8k of data then pread2()
> should return 8k.  Otherwise there's no way for userspace to know that
> the 8k copy actually happened and we have just wasted a great pile of
> CPU doing a pointless memcpy.
>
> I expect that this situation (first part in cache, latter part not in
> cache) is rare - for reasonably small requests the common cases will be
> "all cached" and "nothing cached".  So perhaps the best approach here
> is for samba to add special handling for the short read, to work out
> the reason for its occurrence.
>
> Alternatively we could add another flag to pread2() to select this
> "throw away my data and return -EAGAIN" behaviour.  Presumably
> implemented with an i_size check, but it's gonna be racy.
>
>
>
> I take it from your comments that nobody has actually wired up pread2()
> into samba yet?  That's a bit disturbing, because if we later want to
> go and change something like this short-read behaviour, we're screwed -
> it's a non back-compat userspace-visible change.
>
>
> And a note on cosmetics: why are we using EAGAIN here rather than
> EWOULDBLOCK?  They have the same numerical value, but EWOULDBLOCK is a
> better name - EAGAIN says "run it again", but that won't work.
>

Per definition EWOULDBLOCK seams like a better fit. Like you said
above it won't stop blocking unless you do something. I also did a
search in the kernel source (excluding drivers / sound directories)
use of EAGAIN (even in network code) is like 2 magnitudes bigger then
EWOULDBLOCK. In fact some places that grep found check for both
(although I'm sure it's optimized out).

Does anybody feel strongly about it being EWOULDBLOCK instead of
EAGAIN? Esp. since they are same on Linux? The convention (by numbers)
seams to favor EAGAIN.

-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: milosz@adfin.com

  parent reply	other threads:[~2015-03-31  1:28 UTC|newest]

Thread overview: 94+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-16 18:27 [PATCH v7 0/5] vfs: Non-blockling buffered fs read (page cache only) Milosz Tanski
2015-03-16 18:27 ` [PATCH v7 1/5] vfs: Prepare for adding a new preadv/pwritev with user flags Milosz Tanski
2015-03-16 18:27   ` Milosz Tanski
2015-03-16 21:05   ` Andreas Dilger
2015-03-16 21:05     ` Andreas Dilger
2015-03-16 18:27 ` [PATCH v7 2/5] vfs: Define new syscalls preadv2,pwritev2 Milosz Tanski
2015-03-16 18:27   ` Milosz Tanski
2015-03-16 18:27 ` [PATCH v7 3/5] x86: wire up preadv2 and pwritev2 Milosz Tanski
2015-03-16 18:27   ` Milosz Tanski
2015-03-16 18:27 ` [PATCH v7 4/5] vfs: RWF_NONBLOCK flag for preadv2 Milosz Tanski
2015-03-16 18:27   ` Milosz Tanski
2015-03-16 18:27 ` [PATCH v7 5/5] xfs: add RWF_NONBLOCK support Milosz Tanski
2015-03-16 18:27   ` Milosz Tanski
2015-03-16 22:04   ` Dave Chinner
2015-03-16 18:32 ` [PATCH] Add preadv2/pwritev2 documentation Milosz Tanski
2015-03-27 16:49   ` Andrew Morton
2015-03-30  7:33     ` Christoph Hellwig
2015-03-30  7:33       ` Christoph Hellwig
2015-03-16 18:34 ` [PATCH] fstests: generic test for preadv2 behavior on linux Milosz Tanski
2015-03-16 18:34   ` Milosz Tanski
2015-03-16 21:07   ` Andreas Dilger
2015-03-16 21:07     ` Andreas Dilger
2015-03-16 22:03     ` Milosz Tanski
2015-03-16 22:02   ` Dave Chinner
2015-03-16 22:02     ` Dave Chinner
2015-03-16 22:11     ` Milosz Tanski
2015-03-16 22:56       ` Dave Chinner
2015-03-16 22:56         ` Dave Chinner
2015-03-26 11:55 ` [PATCH v7 0/5] vfs: Non-blockling buffered fs read (page cache only) Christoph Hellwig
2015-03-26 11:55   ` Christoph Hellwig
2015-03-26 19:12   ` Milosz Tanski
2015-03-26 19:12     ` Milosz Tanski
2015-03-27  2:26     ` Milosz Tanski
2015-03-27  2:29     ` Milosz Tanski
2015-03-27  2:29       ` Milosz Tanski
2015-03-27  3:28 ` Andrew Morton
2015-03-27  3:28   ` Andrew Morton
2015-03-27  5:41   ` Volker Lendecke
2015-03-27  5:41     ` Volker Lendecke
2015-03-27  6:08     ` Andrew Morton
2015-03-27  6:08       ` Andrew Morton
2015-03-27  8:02       ` Volker Lendecke
2015-03-27  8:02         ` Volker Lendecke
2015-03-27  8:12         ` Christoph Hellwig
2015-03-27  8:18   ` Christoph Hellwig
2015-03-27  8:18     ` Christoph Hellwig
2015-03-27  8:35     ` Andrew Morton
2015-03-27  8:35       ` Andrew Morton
2015-03-27  8:48       ` Christoph Hellwig
2015-03-27  9:01         ` Andrew Morton
2015-03-27  9:01           ` Andrew Morton
2015-03-27  9:44           ` Volker Lendecke
2015-03-27 15:58           ` Jeremy Allison
2015-03-27 15:58             ` Jeremy Allison
2015-03-27 16:30             ` Andrew Morton
2015-03-27 16:30               ` Andrew Morton
2015-03-27 16:30               ` Andrew Morton
2015-03-27 16:30               ` Andrew Morton
2015-03-27 16:39               ` Jeremy Allison
2015-03-27 16:39                 ` Jeremy Allison
2015-03-27 16:39               ` Andrew Morton
2015-03-27 16:45               ` Milosz Tanski
2015-03-31  1:27               ` Milosz Tanski [this message]
2015-03-27 16:38             ` Milosz Tanski
2015-03-27 16:38               ` Milosz Tanski
2015-03-30  7:36             ` Christoph Hellwig
2015-03-30 17:19               ` Jeremy Allison
2015-03-30 17:19                 ` Jeremy Allison
2015-03-30 22:51                 ` Milosz Tanski
2015-03-30 20:26               ` Andrew Morton
2015-03-30 20:26                 ` Andrew Morton
2015-03-30 20:32                 ` Jeremy Allison
2015-03-30 20:37                   ` Andrew Morton
2015-03-30 20:49                     ` Jeremy Allison
2015-03-30 21:33                       ` Andrew Morton
2015-03-30 22:35                     ` Milosz Tanski
2015-03-30 22:49                   ` Milosz Tanski
2015-03-30 22:57                     ` Andrew Morton
2015-03-30 23:06                       ` Milosz Tanski
2015-03-30 23:06                         ` Milosz Tanski
2015-03-30 23:25                 ` Milosz Tanski
2015-04-04  3:42                 ` Andrew Morton
2015-04-06  3:53                   ` Milosz Tanski
2015-04-06  3:53                     ` Milosz Tanski
2015-03-30 23:09               ` Milosz Tanski
2015-03-27 15:21   ` Milosz Tanski
2015-03-27 15:21     ` Milosz Tanski
2015-03-27 17:04     ` Andrew Morton
2015-03-30  7:40       ` Christoph Hellwig
2015-03-30  7:40         ` Christoph Hellwig
2015-03-30 18:54         ` Andrew Morton
2015-03-30 22:40           ` Milosz Tanski
2015-03-30 22:50             ` Andrew Morton
2015-03-30 22:50               ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CANP1eJEciJRM10ZpC98uXmxs5TP_GP8OX+cBfHK5WEhgaQnzNw@mail.gmail.com \
    --to=milosz@adfin.com \
    --cc=Volker.Lendecke@sernet.de \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=jmoyer@redhat.com \
    --cc=jra@samba.org \
    --cc=linux-aio@kvack.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mtk.manpages@gmail.com \
    --cc=tj@kernel.org \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.