Re: [PATCH] NFS: Retry a zero-length short read

From: Benjamin Coddington <bcodding@redhat.com>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>,
	Anna Schumaker <anna.schumaker@netapp.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH] NFS: Retry a zero-length short read
Date: Wed, 16 Mar 2016 13:36:48 -0400 (EDT)	[thread overview]
Message-ID: <alpine.OSX.2.19.9992.1603161333260.11199@planck> (raw)
In-Reply-To: <20160316171857.GA12992@fieldses.org>

On Wed, 16 Mar 2016, J. Bruce Fields wrote:

> On Wed, Mar 16, 2016 at 12:22:22PM -0400, Trond Myklebust wrote:
> > On Wed, Mar 16, 2016 at 11:20 AM, Benjamin Coddington
> > <bcodding@redhat.com> wrote:
> > > On Wed, 16 Mar 2016, Benjamin Coddington wrote:
> > >
> > >> On Wed, 16 Mar 2016, Trond Myklebust wrote:
> > >>
> > >> > On Wed, Mar 16, 2016 at 10:22 AM, Benjamin Coddington
> > >> > <bcodding@redhat.com> wrote:
> > >> > > On Wed, 16 Mar 2016, Trond Myklebust wrote:
> > >> > >
> > >> > >> On Wed, Mar 16, 2016 at 5:17 AM, Benjamin Coddington
> > >> > >> <bcodding@redhat.com> wrote:
> > >> > >> >
> > >> > >> > A zero-length short read without eof should be retried rather than sending
> > >> > >> > an error to the application.
> > >> > >>
> > >> > >>
> > >> > >> In what situation would returning a 0 length read not be a bug? If the
> > >> > >> server intended that we back off and retry, it has the alternative of
> > >> > >> sending a JUKEBOX/DELAY error.
> > >> > >
> > >> > > If the server completes a local read but then another writer comes in and
> > >> > > appends to the file before the server checks if it needs to set EOF, then
> > >> > > the response might be 0 length without EOF set.
> > >> >
> > >> > Why isn't that EOF check done atomically with the read itself? This
> > >> > still sounds like a server bug to me.
> > >>
> > >> I don't know -- I would guess because doing that atomically is harder than
> > >> not, and I don't know where the RFCs say that a zero length response without
> > >> eof is to be treated as an error or condition to be avoided.
> > >>
> > >> I'll look into that, and respond here.
> > >
> > > Indeed, it seems that it is more convenient for the linux server to send a
> > > zero-length response without eof when the file grows.  It would probably be
> > > more helpful if the server handled that case, but I think that 7530 states
> > > that it doesn't have to handle that case.
> >
> > Here is what RFC5661 and RFC7530 say.
> >
> >    If the READ ended at the end-of-file (formally, in a correctly formed
> >    READ request, if offset + count is equal to the size of the file), or
> >    the READ request extends beyond the size of the file (if offset +
> >    count is greater than the size of the file), eof is returned as TRUE;
> >    otherwise, it is FALSE.  A successful READ of an empty file will
> >    always return eof as TRUE.
> >
> > Here is what RFC1813 says:
> >
> >       eof
> >          If the read ended at the end-of-file (formally, in a
> >          correctly formed READ request, if READ3args.offset plus
> >          READ3resok.count is equal to the size of the file), eof
> >          is returned as TRUE; otherwise it is FALSE. A
> >          successful READ of an empty file will always return eof
> >          as TRUE.
> >
> > Where does it say that the eof determination is allowed to be
> > non-atomic? Unlike structures such as change_info4, there isn't an
> > "atomic" flag to allow the server to communicate to the client that it
> > cannot rely on the eof flag. Since the determination is part of the
> > same READ operation, you can't point to the "COMPOUNDS are not atomic"
> > either.
>
> I wonder why READ has eof at all, instead of adopting read()'s
> convention that 0 means eof?
>
> In any case, our server should be able to count on that convention
> internally, so if getting a real "eof" out of the filesystem looks
> daunting, why not for now just check for that case in
> nfsd4_encode_readv()?:

It's probably not really that daunting.  Couldn't we check to see if inode's
size has increased, and if so do the read again?

The concept of READ and eof atomicity should also include a concept of a
boundary or barrier.  The operation is atomic within what bound - the
server's response processing or the underlying filesystem?

Ben

>
> -	eof = (read->rd_offset + maxcount >=
> +	eof = (maxcount == 0) || (read->rd_offset + maxcount >=
>  	       d_inode(read->rd_fhp->fh_dentry)->i_size);
>
>
> (Actually, I'm not sure that's completely right: if 0-length read
> requests are legal then we need to exclude that case.)
>
> --b.
>