From: Benjamin Coddington <bcodding@redhat.com>
To: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH] NFS: Retry a zero-length short read
Date: Wed, 16 Mar 2016 13:30:12 -0400 (EDT)
Message-ID: <alpine.OSX.2.19.9992.1603161310150.11199@planck>
In-Reply-To: <CAHQdGtTMOjr8SMKOpM2tg0meAAgfLjgbMb5KjGDRoxt7LnEADw@mail.gmail.com>

On Wed, 16 Mar 2016, Trond Myklebust wrote:

> On Wed, Mar 16, 2016 at 11:20 AM, Benjamin Coddington
> <bcodding@redhat.com> wrote:
> > On Wed, 16 Mar 2016, Benjamin Coddington wrote:
> >
> >> On Wed, 16 Mar 2016, Trond Myklebust wrote:
> >>
> >> > On Wed, Mar 16, 2016 at 10:22 AM, Benjamin Coddington
> >> > <bcodding@redhat.com> wrote:
> >> > > On Wed, 16 Mar 2016, Trond Myklebust wrote:
> >> > >
> >> > >> On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington
> >> > >> <bcodding@redhat.com> wrote:
> >> > >> >
> >> > >> > A zero-length short read without eof should be retried rather than sending
> >> > >> > an error to the application.
> >> > >>
> >> > >>
> >> > >> In what situation would returning a 0 length read not be a bug? If the
> >> > >> server intended that we back off and retry, it has the alternative of
> >> > >> sending a JUKEBOX/DELAY error.
> >> > >
> >> > > If the server completes a local read but then another writer comes in and
> >> > > appends to the file before the server checks if it needs to set EOF, then
> >> > > the response might be 0 length without EOF set.
> >> >
> >> > Why isn't that EOF check done atomically with the read itself? This
> >> > still sounds like a server bug to me.
> >>
> >> I don't know -- I would guess because doing that atomically is harder than
> >> not, and I don't know where the RFCs say that a zero length response without
> >> eof is to be treated as an error or condition to be avoided.
> >>
> >> I'll look into that, and respond here.
> >
> > Indeed, it seems that it is more convenient for the linux server to send a
> > zero-length response without eof when the file grows.  It would probably be
> > more helpful if the server handled that case, but I think that 7530 states
> > that it doesn't have to handle that case.
>
> Here is what RFC5661 and RFC7530 say.
>
>    If the READ ended at the end-of-file (formally, in a correctly formed
>    READ request, if offset + count is equal to the size of the file), or
>    the READ request extends beyond the size of the file (if offset +
>    count is greater than the size of the file), eof is returned as TRUE;
>    otherwise, it is FALSE.  A successful READ of an empty file will
>    always return eof as TRUE.
>
> Here is what RFC1813 says:
>
>       eof
>          If the read ended at the end-of-file (formally, in a
>          correctly formed READ request, if READ3args.offset plus
>          READ3resok.count is equal to the size of the file), eof
>          is returned as TRUE; otherwise it is FALSE. A
>          successful READ of an empty file will always return eof
>          as TRUE.

These cover the eof case, but what about server resource exhaustion?
NFS4ERR_DELAY seems inappropriate since it signals that the operation could
not be completed in an appropriate amount of time.

RFC5661 also says:

   If the client specifies a count value of 0 (zero), the READ succeeds
   and returns 0 (zero) bytes of data (subject to access permissions
   checking).  The server may choose to return fewer bytes than
   specified by the client.  The client needs to check for this
   condition and handle the condition appropriately.

So there is at least one case (a count of zero) where a server can send a
read response of zero length without eof set, and this is perfectly fine.
It is also the server's choice to send fewer bytes than requested.
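
To make the client-side handling concrete, here is a minimal user-space
sketch, not the fs/nfs/read.c code.  It assumes a hypothetical in-memory
"server" (mock_file, mock_read) that may return short or zero-length
replies without eof, and a hypothetical client loop (read_full) that
re-issues the READ for the remaining range.  The cap on consecutive
zero-length replies (max_zero_retries) is an assumption added here so the
loop terminates; it is not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory "server" file; not a kernel structure. */
struct mock_file {
	const char *data;
	size_t size;
	unsigned int stalls;	/* upcoming replies with 0 bytes and eof FALSE */
	size_t max_xfer;	/* server-chosen cap on bytes per READ */
};

struct read_reply {
	size_t count;		/* bytes actually returned */
	bool eof;
};

/*
 * Mock READ.  The server may return fewer bytes than requested.  eof is
 * TRUE only when offset + returned count reaches the file size, or when
 * the offset is already at or past the end of the file.
 */
static struct read_reply mock_read(struct mock_file *f, size_t offset,
				   size_t want, char *buf)
{
	struct read_reply r = { 0, false };
	size_t avail;

	if (f->stalls > 0) {	/* e.g. resource exhaustion: no data, no eof */
		f->stalls--;
		return r;
	}
	if (offset >= f->size) {
		r.eof = true;
		return r;
	}
	avail = f->size - offset;
	r.count = want < avail ? want : avail;
	if (r.count > f->max_xfer)
		r.count = f->max_xfer;
	memcpy(buf, f->data + offset, r.count);
	r.eof = (offset + r.count == f->size);
	return r;
}

/*
 * Client loop: treat any reply without eof as a short read and send
 * another READ for the remainder, including when zero bytes came back.
 * Returns the number of bytes read, or -1 once more than
 * max_zero_retries consecutive zero-length replies arrive without eof.
 */
static long read_full(struct mock_file *f, size_t offset, size_t want,
		      char *buf, unsigned int max_zero_retries)
{
	size_t done = 0;
	unsigned int zero_replies = 0;

	assert(buf != NULL);
	while (done < want) {
		struct read_reply r = mock_read(f, offset + done,
						want - done, buf + done);

		done += r.count;
		if (r.eof)
			break;
		if (r.count == 0) {
			if (++zero_replies > max_zero_retries)
				return -1;	/* no progress; give up */
		} else {
			zero_replies = 0;	/* progress made; reset */
		}
	}
	return (long)done;
}
```

With a file of 11 bytes, two stalled replies and a 4-byte transfer cap,
read_full() still collects all 11 bytes; a server that stalls on every
reply ends in -1 instead of looping forever.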

> Where does it say that the eof determination is allowed to be
> non-atomic? Unlike structures such as change_info4, there isn't an
> "atomic" flag to allow the server to communicate to the client that it
> cannot rely on the eof flag. Since the determination is part of the
> same READ operation, you can't point to the "COMPOUNDS are not atomic"
> either.

I don't see where it says that the eof determination is allowed to be
non-atomic, but I do see where it says the amount of data returned is up to
the server.  Why can't the server decide to skip the local read altogether
this time around, returning no data and also not setting eof?

Ben

> >> > > I'm also using https://tools.ietf.org/html/rfc7530#section-16.23.5 to guide
> >> > > how I think the client should behave; it says that the client should retry
> >> > > a short read without eof set.  I think that should include a response with
> >> > > 0 length.
> >>
> >> Here's the verbatim from section 16.23.5:
> >>
> >>    If the server returns a "short read" (i.e., less data than requested
> >>    and eof is set to FALSE), the client should send another READ to get
> >>    the remaining data.  A server may return less data than requested
> >>    under several circumstances.  The file may have been truncated by
> >>    another client or perhaps on the server itself, changing the file
> >>    size from what the requesting client believes to be the case.  This
> >>    would reduce the actual amount of data available to the client.  It
> >>    is possible that the server reduces the transfer size and so returns
> >>    a short read result.  Server resource exhaustion may also result in a
> >>    short read.
> >>
> >> Ben
> >>
> >> > >> > Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
> >> > >> > ---
> >> > >> >  fs/nfs/read.c |    5 -----
> >> > >> >  1 files changed, 0 insertions(+), 5 deletions(-)
> >> > >> >
> >> > >> > diff --git a/fs/nfs/read.c b/fs/nfs/read.c
> >> > >> > index eb31e23..7269d42 100644
> >> > >> > --- a/fs/nfs/read.c
> >> > >> > +++ b/fs/nfs/read.c
> >> > >> > @@ -244,11 +244,6 @@ static void nfs_readpage_retry(struct rpc_task *task,
> >> > >> >
> >> > >> >         /* This is a short read! */
> >> > >> >         nfs_inc_stats(hdr->inode, NFSIOS_SHORTREAD);
> >> > >> > -       /* Has the server at least made some progress? */
> >> > >> > -       if (resp->count == 0) {
> >> > >> > -               nfs_set_pgio_error(hdr, -EIO, argp->offset);
> >> > >> > -               return;
> >> > >> > -       }
> >> > >> >
> >> > >> >         /* For non rpc-based layout drivers, retry-through-MDS */
> >> > >> >         if (!task->tk_ops) {
> >> > >> > --
> >> > >> > 1.7.1
> >> > >> >
> >> > >>
> >> >
>

