All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andy Adamson <andros@netapp.com>
To: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Peng Tao <bergwolf@gmail.com>, Jim Rees <rees@umich.edu>,
	Christoph Hellwig <hch@infradead.org>,
	linux-nfs@vger.kernel.org, peter honeyman <honey@citi.umich.edu>
Subject: Re: [PATCH v4 00/27] add block layout driver to pnfs client
Date: Mon, 1 Aug 2011 18:57:56 -0400	[thread overview]
Message-ID: <0A94625F-4E4F-4A88-A0DF-63D429924F66@netapp.com> (raw)
In-Reply-To: <1312238117.23392.19.camel@lade.trondhjem.org>


On Aug 1, 2011, at 6:35 PM, Trond Myklebust wrote:

> On Mon, 2011-08-01 at 17:10 -0400, Trond Myklebust wrote: 
>> Looking at the callback code, I see that if tbl->highest_used_slotid !=
>> 0, then we BUG() while holding the backchannel's tbl->slot_tbl_lock
>> spinlock. That seems a likely candidate for the above hang.
>> 
>> Andy, how we are guaranteed that tbl->highest_used_slotid won't take
>> values other than 0, and why do we commit suicide when it does? As far
>> as I can see, there is no guarantee that we call nfs4_cb_take_slot() in
>> nfs4_callback_compound(), however we appear to unconditionally call
>> nfs4_cb_free_slot() provided there is a session.
>> 
>> The other strangeness would be the fact that there is nothing enforcing
>> the NFS4_SESSION_DRAINING flag. If the session is draining, then the
>> back-channel simply ignores that and goes ahead with processing the
>> callback. Is this to avoid deadlocks with the server returning
>> NFS4ERR_BACK_CHAN_BUSY when the client does a DESTROY_SESSION?


When NFS4_SESSION_DRAINING is set, the backchannel is drained first - and waits for current processing to complete signaled by the highest_used_slotid == -1. Any other backchannel requests that occur under the NFS4_SESSION_DRAINING flag are processed - but just the CB_SEQUENCE operation which returns NFS4ERR_DELAY.  It does indeed prevent the BACK_CHAN_BUSY deadlock.

> 
> How about something like the following?

Looks good. Nice catch. One change below and a comment inline

-->Andy

> 
> 8<------------------------------------------------------------------------------- 
> From c0c499b0ca9d0af8cbdc29c40effba38475461d9 Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> Date: Mon, 1 Aug 2011 18:31:12 -0400
> Subject: [PATCH] NFSv4.1: Fix the callback 'highest_used_slotid' behaviour
> 
> Currently, there is no guarantee that we will call nfs4_cb_take_slot() even
> though nfs4_callback_compound() will consistently call
> nfs4_cb_free_slot() provided the cb_process_state has set the 'clp' field.
> The result is that we can trigger the BUG_ON() upon the next call to
> nfs4_cb_take_slot().
> This patch fixes the above problem by using the slot id that was taken in
> the CB_SEQUENCE operation as a flag for whether or not we need to call
> nfs4_cb_free_slot().
> It also fixes an atomicity problem: we need to set tbl->highest_used_slotid
> atomically with the check for NFS4_SESSION_DRAINING, otherwise we end up
> racing with the various tests in nfs4_begin_drain_session().

But the code below doesn't do this - it locks the backchannel slot table to check the flag, but does not set the highest_used_slot atomically with this check.

> 
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
> fs/nfs/callback.h      |    2 +-
> fs/nfs/callback_proc.c |   18 +++++++++++++-----
> fs/nfs/callback_xdr.c  |   24 +++++++-----------------
> 3 files changed, 21 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/nfs/callback.h b/fs/nfs/callback.h
> index b257383..07df5f1 100644
> --- a/fs/nfs/callback.h
> +++ b/fs/nfs/callback.h
> @@ -38,6 +38,7 @@ enum nfs4_callback_opnum {
> struct cb_process_state {
> 	__be32			drc_status;
> 	struct nfs_client	*clp;
> +	int			slotid;
> };
> 
> struct cb_compound_hdr_arg {
> @@ -166,7 +167,6 @@ extern unsigned nfs4_callback_layoutrecall(
> 	void *dummy, struct cb_process_state *cps);
> 
> extern void nfs4_check_drain_bc_complete(struct nfs4_session *ses);
> -extern void nfs4_cb_take_slot(struct nfs_client *clp);
> 
> struct cb_devicenotifyitem {
> 	uint32_t		cbd_notify_type;
> diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
> index 74780f9..1bd2c81 100644
> --- a/fs/nfs/callback_proc.c
> +++ b/fs/nfs/callback_proc.c
> @@ -348,7 +348,7 @@ validate_seqid(struct nfs4_slot_table *tbl, struct cb_sequenceargs * args)
> 	/* Normal */
> 	if (likely(args->csa_sequenceid == slot->seq_nr + 1)) {
> 		slot->seq_nr++;
> -		return htonl(NFS4_OK);
> +		goto out_ok;
> 	}
> 
> 	/* Replay */
> @@ -367,11 +367,14 @@ validate_seqid(struct nfs4_slot_table *tbl, struct cb_sequenceargs * args)
> 	/* Wraparound */
> 	if (args->csa_sequenceid == 1 && (slot->seq_nr + 1) == 0) {
> 		slot->seq_nr = 1;
> -		return htonl(NFS4_OK);
> +		goto out_ok;
> 	}
> 
> 	/* Misordered request */
> 	return htonl(NFS4ERR_SEQ_MISORDERED);
> +out_ok:
> +	tbl->highest_used_slotid = args->csa_slotid;
> +	return htonl(NFS4_OK);
> }
> 
> /*
> @@ -433,26 +436,32 @@ __be32 nfs4_callback_sequence(struct cb_sequenceargs *args,
> 			      struct cb_sequenceres *res,
> 			      struct cb_process_state *cps)
> {
> +	struct nfs4_slot_table *tbl;
> 	struct nfs_client *clp;
> 	int i;
> 	__be32 status = htonl(NFS4ERR_BADSESSION);
> 
> -	cps->clp = NULL;
> -
> 	clp = nfs4_find_client_sessionid(args->csa_addr, &args->csa_sessionid);
> 	if (clp == NULL)
> 		goto out;
> 
> +	tbl = &clp->cl_session->bc_slot_table;
> +
> +	spin_lock(&tbl->slot_tbl_lock);
> 	/* state manager is resetting the session */
> 	if (test_bit(NFS4_SESSION_DRAINING, &clp->cl_session->session_state)) {
> 		status = NFS4ERR_DELAY;
                                 
status = htonl(NFS4ERR_DELAY);


> +		spin_unlock(&tbl->slot_tbl_lock);
> 		goto out;
> 	}
> 
> 	status = validate_seqid(&clp->cl_session->bc_slot_table, args);
> +	spin_unlock(&tbl->slot_tbl_lock);
> 	if (status)
> 		goto out;
> 
> +	cps->slotid = args->csa_slotid;
> +
> 	/*
> 	 * Check for pending referring calls.  If a match is found, a
> 	 * related callback was received before the response to the original
> @@ -469,7 +478,6 @@ __be32 nfs4_callback_sequence(struct cb_sequenceargs *args,
> 	res->csr_slotid = args->csa_slotid;
> 	res->csr_highestslotid = NFS41_BC_MAX_CALLBACKS - 1;
> 	res->csr_target_highestslotid = NFS41_BC_MAX_CALLBACKS - 1;
> -	nfs4_cb_take_slot(clp);
> 
> out:
> 	cps->clp = clp; /* put in nfs4_callback_compound */
> diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c
> index c6c86a7..918ad64 100644
> --- a/fs/nfs/callback_xdr.c
> +++ b/fs/nfs/callback_xdr.c
> @@ -754,26 +754,15 @@ static void nfs4_callback_free_slot(struct nfs4_session *session)
> 	 * Let the state manager know callback processing done.
> 	 * A single slot, so highest used slotid is either 0 or -1
> 	 */
> -	tbl->highest_used_slotid--;
> +	tbl->highest_used_slotid = -1;
> 	nfs4_check_drain_bc_complete(session);
> 	spin_unlock(&tbl->slot_tbl_lock);
> }
> 
> -static void nfs4_cb_free_slot(struct nfs_client *clp)
> +static void nfs4_cb_free_slot(struct cb_process_state *cps)
> {
> -	if (clp && clp->cl_session)
> -		nfs4_callback_free_slot(clp->cl_session);
> -}
> -
> -/* A single slot, so highest used slotid is either 0 or -1 */
> -void nfs4_cb_take_slot(struct nfs_client *clp)
> -{
> -	struct nfs4_slot_table *tbl = &clp->cl_session->bc_slot_table;
> -
> -	spin_lock(&tbl->slot_tbl_lock);
> -	tbl->highest_used_slotid++;
> -	BUG_ON(tbl->highest_used_slotid != 0);
> -	spin_unlock(&tbl->slot_tbl_lock);
> +	if (cps->slotid != -1)
> +		nfs4_callback_free_slot(cps->clp->cl_session);
> }
> 
> #else /* CONFIG_NFS_V4_1 */
> @@ -784,7 +773,7 @@ preprocess_nfs41_op(int nop, unsigned int op_nr, struct callback_op **op)
> 	return htonl(NFS4ERR_MINOR_VERS_MISMATCH);
> }
> 
> -static void nfs4_cb_free_slot(struct nfs_client *clp)
> +static void nfs4_cb_free_slot(struct cb_process_state *cps)
> {
> }
> #endif /* CONFIG_NFS_V4_1 */
> @@ -866,6 +855,7 @@ static __be32 nfs4_callback_compound(struct svc_rqst *rqstp, void *argp, void *r
> 	struct cb_process_state cps = {
> 		.drc_status = 0,
> 		.clp = NULL,
> +		.slotid = -1,
> 	};
> 	unsigned int nops = 0;
> 
> @@ -906,7 +896,7 @@ static __be32 nfs4_callback_compound(struct svc_rqst *rqstp, void *argp, void *r
> 
> 	*hdr_res.status = status;
> 	*hdr_res.nops = htonl(nops);
> -	nfs4_cb_free_slot(cps.clp);
> +	nfs4_cb_free_slot(&cps);
> 	nfs_put_client(cps.clp);
> 	dprintk("%s: done, status = %u\n", __func__, ntohl(status));
> 	return rpc_success;
> -- 
> 1.7.6
> 
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer
> 
> NetApp
> Trond.Myklebust@netapp.com
> www.netapp.com
> 


  reply	other threads:[~2011-08-01 22:57 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-28 17:30 [PATCH v4 00/27] add block layout driver to pnfs client Jim Rees
2011-07-28 17:30 ` [PATCH v4 01/27] pnfs: GETDEVICELIST Jim Rees
2011-07-28 17:30 ` [PATCH v4 02/27] pnfs: add set-clear layoutdriver interface Jim Rees
2011-07-28 17:30 ` [PATCH v4 03/27] pnfs: save layoutcommit lwb at layout header Jim Rees
2011-07-28 17:30 ` [PATCH v4 04/27] pnfs: save layoutcommit cred " Jim Rees
2011-07-28 17:30 ` [PATCH v4 05/27] pnfs: let layoutcommit handle a list of lseg Jim Rees
2011-07-28 18:52   ` Boaz Harrosh
2011-07-28 17:30 ` [PATCH v4 06/27] pnfs: use lwb as layoutcommit length Jim Rees
2011-07-28 17:30 ` [PATCH v4 07/27] NFS41: save layoutcommit cred in layout header init Jim Rees
2011-07-28 17:30 ` [PATCH v4 08/27] pnfs: ask for layout_blksize and save it in nfs_server Jim Rees
2011-07-28 17:30 ` [PATCH v4 09/27] pnfs: cleanup_layoutcommit Jim Rees
2011-07-28 18:26   ` Boaz Harrosh
2011-07-29  3:16     ` Jim Rees
2011-07-28 17:30 ` [PATCH v4 10/27] pnfsblock: add blocklayout Kconfig option, Makefile, and stubs Jim Rees
2011-07-28 17:31 ` [PATCH v4 11/27] pnfsblock: use pageio_ops api Jim Rees
2011-07-28 17:31 ` [PATCH v4 12/27] pnfsblock: basic extent code Jim Rees
2011-07-28 17:31 ` [PATCH v4 13/27] pnfsblock: add device operations Jim Rees
2011-07-28 17:31 ` [PATCH v4 14/27] pnfsblock: remove " Jim Rees
2011-07-28 17:31 ` [PATCH v4 15/27] pnfsblock: lseg alloc and free Jim Rees
2011-07-28 17:31 ` [PATCH v4 16/27] pnfsblock: merge extents Jim Rees
2011-07-28 17:31 ` [PATCH v4 17/27] pnfsblock: call and parse getdevicelist Jim Rees
2011-07-28 17:31 ` [PATCH v4 18/27] pnfsblock: xdr decode pnfs_block_layout4 Jim Rees
2011-07-28 17:31 ` [PATCH v4 19/27] pnfsblock: bl_find_get_extent Jim Rees
2011-07-28 17:31 ` [PATCH v4 20/27] pnfsblock: add extent manipulation functions Jim Rees
2011-07-28 17:31 ` [PATCH v4 21/27] pnfsblock: merge rw extents Jim Rees
2011-07-28 17:31 ` [PATCH v4 22/27] pnfsblock: encode_layoutcommit Jim Rees
2011-07-28 17:31 ` [PATCH v4 23/27] pnfsblock: cleanup_layoutcommit Jim Rees
2011-07-28 17:31 ` [PATCH v4 24/27] pnfsblock: bl_read_pagelist Jim Rees
2011-07-28 17:31 ` [PATCH v4 25/27] pnfsblock: bl_write_pagelist Jim Rees
2011-07-28 17:31 ` [PATCH v4 26/27] pnfsblock: note written INVAL areas for layoutcommit Jim Rees
2011-07-28 17:31 ` [PATCH v4 27/27] pnfsblock: write_pagelist handle zero invalid extents Jim Rees
2011-07-29 15:51 ` [PATCH v4 00/27] add block layout driver to pnfs client Christoph Hellwig
2011-07-29 17:45   ` Peng Tao
2011-07-29 18:44     ` Christoph Hellwig
2011-07-29 18:54   ` Jim Rees
2011-07-29 19:01     ` Christoph Hellwig
2011-07-29 19:13       ` Jim Rees
2011-07-30  1:09         ` Trond Myklebust
2011-07-30  3:26           ` Jim Rees
2011-07-30 14:25             ` Peng Tao
2011-08-01 21:10               ` Trond Myklebust
2011-08-01 22:35                 ` Trond Myklebust
2011-08-01 22:57                   ` Andy Adamson [this message]
2011-08-01 23:11                     ` Trond Myklebust
2011-08-02 17:30                       ` Trond Myklebust
2011-08-02 18:50                         ` [PATCH v2 1/2] NFSv4.1: Fix the callback 'highest_used_slotid' behaviour Trond Myklebust
2011-08-02 18:50                           ` [PATCH v2 2/2] NFSv4.1: Return NFS4ERR_BADSESSION to callbacks during session resets Trond Myklebust
2011-08-03  8:52                           ` [PATCH v2 1/2] NFSv4.1: Fix the callback 'highest_used_slotid' behaviour Peng Tao
2011-08-02  2:21                   ` [PATCH v4 00/27] add block layout driver to pnfs client Jim Rees
2011-08-02  2:29                     ` Myklebust, Trond
2011-08-02  3:23                       ` Jim Rees
2011-08-02 12:28                         ` Trond Myklebust
2011-08-02 12:56                           ` Jim Rees
2011-08-03  1:48                           ` Jim Rees
2011-08-03  2:07                             ` Myklebust, Trond
     [not found]                               ` <2E1EB2CF9ED1CB4AA966F0EB76EAB4430A778AE2-hX7t0kiaRRrlMGe9HJ1VYQK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
2011-08-03  2:11                                 ` Jim Rees
2011-08-03  2:38                               ` Jim Rees
2011-08-03  8:43                                 ` Peng Tao
2011-08-03 11:49                                   ` Jim Rees
2011-08-03 11:53                                   ` Jim Rees
2011-08-03 13:59                                     ` Peng Tao
2011-08-03 14:11                                       ` Jim Rees
2011-07-30 14:18           ` Jim Rees

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0A94625F-4E4F-4A88-A0DF-63D429924F66@netapp.com \
    --to=andros@netapp.com \
    --cc=Trond.Myklebust@netapp.com \
    --cc=bergwolf@gmail.com \
    --cc=hch@infradead.org \
    --cc=honey@citi.umich.edu \
    --cc=linux-nfs@vger.kernel.org \
    --cc=rees@umich.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.