From: Jeff Layton <jlayton@poochiereds.net>
To: Benjamin Coddington <bcodding@redhat.com>
Cc: trond.myklebust@primarydata.com, linux-nfs@vger.kernel.org
Subject: Re: nfs4_put_lock_state() wants some nfs4_state on cleanup
Date: Wed, 22 Jul 2015 12:34:42 -0400	[thread overview]
Message-ID: <20150722123442.78ed7f67@tlielax.poochiereds.net> (raw)
In-Reply-To: <1437579281-26810-1-git-send-email-bcodding@redhat.com>

On Wed, 22 Jul 2015 11:34:41 -0400
Benjamin Coddington <bcodding@redhat.com> wrote:

> Our QE folks are noticing the old leftover locks WARN popping up in RHEL7
> (it's since been removed).  While investigating upstream, I found I could
> make this happen by locking, then closing and signaling a process in a loop:
> 
>  #0 [ffff88007a4874a0] __schedule at ffffffff81736d8a
>  #1 [ffff88007a4874f0] schedule at ffffffff81737407
>  #2 [ffff88007a487510] do_exit at ffffffff8109e18f
>  #3 [ffff88007a487590] oops_end at ffffffff8101822e
>  #4 [ffff88007a4875c0] no_context at ffffffff81063b55
>  #5 [ffff88007a487630] __bad_area_nosemaphore at ffffffff81063e1b
>  #6 [ffff88007a487680] bad_area_nosemaphore at ffffffff81063fa3
>  #7 [ffff88007a487690] __do_page_fault at ffffffff81064251
>  #8 [ffff88007a4876f0] trace_do_page_fault at ffffffff81064677
>  #9 [ffff88007a487730] do_async_page_fault at ffffffff8105ed0e
> #10 [ffff88007a487750] async_page_fault at ffffffff8173d078
>     [exception RIP: nfs4_put_lock_state+82]
>     RIP: ffffffffa02dd5b2  RSP: ffff88007a487808  RFLAGS: 00010207
>     RAX: 0000003fffffffff  RBX: ffff8800351d2000  RCX: 0000000000000024
>     RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000009
>     RBP: ffff88007a487818   R8: 0000000000000000   R9: 0000000000000000
>     R10: 000000000000028b  R11: 0000000000aaaaaa  R12: ffff88003675e240
>     R13: ffff88003504d5b0  R14: ffff88007a487b30  R15: ffff880035097c40
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> #11 [ffff88007a487800] nfs4_put_lock_state at ffffffffa02dd59b [nfsv4]
> #12 [ffff88007a487820] nfs4_fl_release_lock at ffffffffa02dd605 [nfsv4]
> #13 [ffff88007a487830] locks_release_private at ffffffff81258548
> #14 [ffff88007a487850] locks_free_lock at ffffffff81258dbb
> #15 [ffff88007a487870] locks_dispose_list at ffffffff81258f68
> #16 [ffff88007a4878a0] __posix_lock_file at ffffffff81259ab6
> #17 [ffff88007a487930] posix_lock_inode_wait at ffffffff8125a02a
> #18 [ffff88007a4879b0] do_vfs_lock at ffffffffa02c4687 [nfsv4]
> #19 [ffff88007a4879c0] nfs4_proc_lock at ffffffffa02cc1a1 [nfsv4]
> #20 [ffff88007a487a70] do_unlk at ffffffffa0273d9e [nfs]
> #21 [ffff88007a487ac0] nfs_lock at ffffffffa0273fa9 [nfs]
> #22 [ffff88007a487b10] vfs_lock_file at ffffffff8125a76e
> #23 [ffff88007a487b20] locks_remove_posix at ffffffff8125a819
> #24 [ffff88007a487c10] locks_remove_posix at ffffffff8125a878
> #25 [ffff88007a487c20] filp_close at ffffffff812092a2
> #26 [ffff88007a487c50] put_files_struct at ffffffff812290c5
> #27 [ffff88007a487ca0] exit_files at ffffffff812291c1
> #28 [ffff88007a487cc0] do_exit at ffffffff8109dc5f
> #29 [ffff88007a487d40] do_group_exit at ffffffff8109e3b5
> #30 [ffff88007a487d70] get_signal at ffffffff810a9504
> #31 [ffff88007a487e00] do_signal at ffffffff81014447
> #32 [ffff88007a487f30] do_notify_resume at ffffffff81014b0e
> #33 [ffff88007a487f50] int_signal at ffffffff8173b2fc
> 
> The nfs4_lock_state->ls_state pointer is pointing to free memory.
> 
> I think what's happening here is that a signal is bumping us out of
> do_unlk() waiting on the io_counter while we try to release locks on
> __fput().  Since the lock is never released, it sticks around on the inode
> until another lock replaces it, and when it is freed it wants some bits from
> nfs4_state, but the nfs4_state was already cleaned up.
> 
> Probably we need to do a better job not bailing out of do_unlk on file
> close, but while I work on that, something like the following keeps the
> nfs4_state around for proper cleanup of the nfs4_lock_state:
> 
> Is this sane?
> 
> Ben
> 
> 8<--------------------------------------------------------------------
> From cab3dd59aa1a04f3be28811dfb515afc4a9080a7 Mon Sep 17 00:00:00 2001
> Message-Id: <cab3dd59aa1a04f3be28811dfb515afc4a9080a7.1437578183.git.bcodding@redhat.com>
> From: Benjamin Coddington <bcodding@redhat.com>
> Date: Wed, 22 Jul 2015 11:02:26 -0400
> Subject: [PATCH] NFS: keep nfs4_state for nfs4_lock_state cleanup
> 
> If we fail to release a lock due to an error or signal on file close, we
> might later free the lock if another lock replaces it.  Hold a reference to
> the nfs4_state to ensure it is not released before freeing the
> nfs4_lock_state.
> ---
>  fs/nfs/nfs4state.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 605840d..f93b410 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -827,6 +827,7 @@ static struct nfs4_lock_state *nfs4_alloc_lock_state(struct nfs4_state *state, f
>  		return NULL;
>  	nfs4_init_seqid_counter(&lsp->ls_seqid);
>  	atomic_set(&lsp->ls_count, 1);
> +	atomic_inc(&state->count);
>  	lsp->ls_state = state;
>  	lsp->ls_owner = fl_owner;
>  	lsp->ls_seqid.owner_id = ida_simple_get(&server->lockowner_id, 0, 0, GFP_NOFS);
> @@ -903,6 +904,7 @@ void nfs4_put_lock_state(struct nfs4_lock_state *lsp)
>  		clp->cl_mvops->free_lock_state(server, lsp);
>  	} else
>  		nfs4_free_lock_state(server, lsp);
> +	nfs4_put_open_state(state);
>  }
>  
>  static void nfs4_fl_copy_lock(struct file_lock *dst, struct file_lock *src)

Looks relatively harmless at first glance, and since lock states are
somewhat dependent on an open state, having them hold a reference like
this makes a lot of sense.

The existing behavior is probably fine when FL_CLOSE isn't set, but
when it is we need a stronger guarantee that the lock will be cleaned
up properly.

I think the best fix when FL_CLOSE is set would be to change the code
so that it's not waiting synchronously on the iocounter to go to zero
before submitting the rpc_task. Instead, we should have the LOCKU
rpc_task wait on an rpc_wait_queue for the counter to go to zero.

We might be able to get away with making all LOCKU rpcs do this, but I
think when you catch a signal in the middle of a fcntl() syscall,
you'll probably want to cancel the RPC as well if it hasn't been
successfully transmitted yet.

-- 
Jeff Layton <jlayton@poochiereds.net>

Thread overview: 5+ messages
2015-07-22 15:34 nfs4_put_lock_state() wants some nfs4_state on cleanup Benjamin Coddington
2015-07-22 16:34 ` Jeff Layton [this message]
2015-08-07 11:49   ` Benjamin Coddington
2015-08-07 13:22     ` Jeff Layton
2015-08-07 13:49       ` Benjamin Coddington
