From: Jeff Layton <jlayton@redhat.com>
To: Benjamin Coddington <bcodding@redhat.com>,
	Trond Myklebust <trondmy@primarydata.com>
Cc: "anna.schumaker@netapp.com" <anna.schumaker@netapp.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH 4/4] NFS: Always send an unlock for FL_CLOSE
Date: Wed, 22 Feb 2017 14:20:12 -0500
Message-ID: <1487791212.7731.6.camel@redhat.com>
In-Reply-To: <3D0E5D1B-CB46-40B3-AE36-2D7D7B4B9AD1@redhat.com>

On Wed, 2017-02-22 at 12:39 -0500, Benjamin Coddington wrote:
> 
> On 22 Feb 2017, at 11:27, Trond Myklebust wrote:
> 
> > On Wed, 2017-02-22 at 10:42 -0500, Jeff Layton wrote:
> > > On Wed, 2017-02-22 at 09:10 -0500, Benjamin Coddington wrote:
> > > > On 22 Feb 2017, at 8:20, Jeff Layton wrote:
> > > > 
> > > > > On Tue, 2017-02-21 at 10:39 -0500, Benjamin Coddington wrote:
> > > > > > NFS attempts to wait for read and write completion before
> > > > > > unlocking in order to ensure that the data returned was
> > > > > > protected by the lock.  When this waiting is interrupted by a
> > > > > > signal, the unlock may be skipped, and messages similar to the
> > > > > > following are seen in the kernel ring buffer:
> > > > > > 
> > > > > > [20.167876] Leaked locks on dev=0x0:0x2b ino=0x8dd4c3:
> > > > > > [20.168286] POSIX: fl_owner=ffff880078b06940 fl_flags=0x1 fl_type=0x0 fl_pid=20183
> > > > > > [20.168727] POSIX: fl_owner=ffff880078b06680 fl_flags=0x1 fl_type=0x0 fl_pid=20185
> > > > > > 
> > > > > > For NFSv3, the missing unlock will cause the server to refuse
> > > > > > conflicting locks indefinitely.  For NFSv4, the leftover lock
> > > > > > will be removed by the server after the lease timeout.
> > > > > > 
> > > > > > This patch fixes this issue by skipping the wait in order to
> > > > > > immediately send the unlock if the FL_CLOSE flag is set when
> > > > > > signaled.
> > > > > > 
> > > > > > Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
> > > > > > ---
> > > > > >  fs/nfs/file.c | 10 +++++-----
> > > > > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > > > > > 
> > > > > > diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> > > > > > index a490f45df4db..df695f52bb9d 100644
> > > > > > --- a/fs/nfs/file.c
> > > > > > +++ b/fs/nfs/file.c
> > > > > > @@ -697,14 +697,14 @@ do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
> > > > > >  	if (!IS_ERR(l_ctx)) {
> > > > > >  		status = nfs_iocounter_wait(l_ctx);
> > > > > >  		nfs_put_lock_context(l_ctx);
> > > > > > -		if (status < 0)
> > > > > > +		/*  NOTE: special case
> > > > > > +		 * 	If we're signalled while cleaning up locks on process exit, we
> > > > > > +		 * 	still need to complete the unlock.
> > > > > > +		 */
> > > > > > +		if (status < 0 && !(fl->fl_flags & FL_CLOSE))
> > > > > >  			return status;
> > > > > 
> > > > > 
> > > > > Hmm, I don't know if this is safe...
> > > > > 
> > > > > Suppose we have a bunch of buffered writeback going on, and we're
> > > > > sitting here waiting for it so we can do the unlock. The task catches 
> > > > > a signal, and then issues the unlock while writeback is still going on.
> > > > > Another client then grabs the lock, and starts doing reads and writes
> > > > > while this one is still writing back.
> > > > > 
> > > > > I think the unlock really needs to wait until writeback is done,
> > > > > regardless of whether you catch a signal or not.
> > > > 
> > > > The only other way to do this is as you've suggested many times -- by
> > > > putting the LOCKU on an rpc_wait_queue.  But that won't work for NFSv3.
> > > > 
> > > 
> > > IDGI -- why wouldn't that (or something similar) work for v3?
> 
> Because of the layering, as discussed further on.
> 
> > > > For NFSv4, the other thing to do might be to look up the lock state and 
> > > > set NFS_LOCK_LOST, then unlock.  That will cause subsequent writes to
> > > > not attempt recovery.
> > > > 
> > > 
> > > No! LOCK_LOST is really for the case where we've lost contact with the
> > > server. We don't want to get kicked into recovery just because of a
> > > SIGKILL.
> 
> True, it was added for the case where contact is lost, but it does exactly
> what we'd want.
> 
> If we're coming through do_unlk() with FL_CLOSE, then we've just done a
> vfs_fsync() and I believe that all dirty data are now in requests queued up
> and being transmitted.  The signal has caused us to bail out of the wait
> for those pages to be written back, and I don't think new pages can be
> dirtied under this lock.  So, an unlock at this point will at worst cause
> some of that I/O to fail with BAD_STATEID (or OLD_STATEID?).  What we want
> to prevent then is the client retrying those writes under another stateid,
> which is what NFS_LOCK_LOST does.
> 
> > I suggest we add a new lock state: NFS_LOCK_UNLOCKED to allow the client
> > to recognise that the failed operation should not be retried.
> > 
> > In fact, we might want to have nfs4_set_rw_stateid() scan for that, and
> > return an error that could be made to fail the operation.  Furthermore,
> > maybe __nfs4_find_lock_state() should also ignore lock structures in this
> > state so that once the lock is in this state, we always ignore new
> > attempts.  This could also be made to work with pNFS; although for most
> > layout types, the locking will not be enforced by the server, we can at
> > least prevent new I/O requests from being generated after the
> > NFS_LOCK_UNLOCKED state has been declared.
> > 
> > You can't do any of the above with NFSv3, since the locking is decoupled
> > from the actual I/O. Is anyone actually using NLM for mission critical
> > applications that need this kind of protection? (followup question: is
> > there a good reason why they are doing so instead of using NFSv4?)
> 

Well, we do have an nfs_lock_context for all NFS versions though, and
you can get to that via the nfs_page from down in the bowels of the
pageio code. We could have the flag in there instead.

That has the benefit of moving at least some of the logic into
non-version-specific code as well.
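
Something like this is what I have in mind -- just a rough sketch, not
tested, and the flag and helper names below are invented for
illustration:

struct nfs_lock_context {
	/* ... existing fields ... */
	atomic_t io_count;
	unsigned long flags;			/* new: generic state bits */
#define NFS_LOCK_CONTEXT_UNLOCKED 0		/* lock already released */
};

/* Set just before giving up the wait and sending the unlock: */
	set_bit(NFS_LOCK_CONTEXT_UNLOCKED, &l_ctx->flags);

/*
 * Down in the pageio code, the lock context is reachable through the
 * nfs_page, so generic code can refuse to (re)issue I/O under a lock
 * that has already been released:
 */
static bool nfs_page_lock_is_unlocked(struct nfs_page *req)
{
	struct nfs_lock_context *l_ctx = req->wb_lock_context;

	return l_ctx != NULL &&
	       test_bit(NFS_LOCK_CONTEXT_UNLOCKED, &l_ctx->flags);
}

That would keep the check out of the v4 state machinery entirely, so v3
gets the same behavior for free.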

> Yes, people are using it.  And I don't think those people care much about
> whether or not their I/O completes under the lock after a signal.  They care
> that when their job on a cluster gets hung up and their scheduler kills it,
> they can't submit the job again anywhere else on the cluster because the
> server thinks the job's file is locked.  The only fix is to restart the
> server, which is very disruptive.
> 
> So, I'm happy to work on an NFS_LOCK_UNLOCKED approach for v4, and I'd also
> be happy to just leave v3 to always unlock.
> 

Well, that's where their immediate pain point is. If they have
applications that are relying on NLM locking to serialize access to the
files, then this could potentially corrupt their file data.

It may be that no one would ever notice, but those sorts of bug reports
can be very difficult to track down. If we can fix this in a way that
works across versions, then that's what I'd suggest.

-- 
Jeff Layton <jlayton@redhat.com>

Thread overview: 20+ messages
2017-02-21 15:39 [PATCH v3 0/4] Skipped unlocks Benjamin Coddington
2017-02-21 15:39 ` [PATCH 1/4] NFS4: remove a redundant lock range check Benjamin Coddington
2017-02-21 15:39 ` [PATCH 2/4] NFS: Move the flock open mode check into nfs_flock() Benjamin Coddington
2017-02-22 12:12   ` Jeff Layton
2017-02-21 15:39 ` [PATCH 3/4] locks: Set FL_CLOSE when removing flock locks on close() Benjamin Coddington
2017-02-22 12:13   ` Jeff Layton
2017-02-22 12:25     ` Jeff Layton
2017-02-22 13:25       ` Miklos Szeredi
2017-02-22 13:27         ` Benjamin Coddington
2017-02-21 15:39 ` [PATCH 4/4] NFS: Always send an unlock for FL_CLOSE Benjamin Coddington
2017-02-22 13:20   ` Jeff Layton
2017-02-22 14:10     ` Benjamin Coddington
2017-02-22 15:42       ` Jeff Layton
2017-02-22 16:27         ` Trond Myklebust
2017-02-22 17:39           ` Benjamin Coddington
2017-02-22 19:20             ` Jeff Layton [this message]
2017-02-23 11:24               ` Benjamin Coddington
