From: Boaz Harrosh <openosd@gmail.com>
To: Christoph Hellwig <hch@lst.de>, linux-nfs@vger.kernel.org
Subject: Re: [PATCH 03/19] pnfs: force a layout commit when encountering busy segments during recall
Date: Sun, 24 Aug 2014 20:49:16 +0300
Message-ID: <53FA259C.9050807@gmail.com>
In-Reply-To: <1408637375-11343-4-git-send-email-hch@lst.de>

On 08/21/2014 07:09 PM, Christoph Hellwig wrote:
> Expedite layout recall processing by forcing a layout commit when
> we see busy segments.  Without it the layout recall might have to wait
> until the VM decided to start writeback for the file, which can introduce
> long delays.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Good god, Hi Christoph

I've been sitting on client RECALL bugs for over a year NOW. I have your scenario,
but as a real DEAD-LOCK instead of an annoying delay.

You have the same deadlock; it is just harder for you to hit. With the objects
layout it is very easy to reproduce. (The files layout would have the same
bug if it supported segments.)

The scenario is as follows:

* Client is doing a LAYOUT_GET and gets back RECALL_CONFLICT

  Comment: If your server is serious about its recalls, then for as long as
  a recall is in progress it will return RECALL_CONFLICT for any segment
  that conflicts with the RECALL.
  In the objects layout this is easy to hit, because the LAYOUT_GET itself
  may trigger the RECALL: if the objects map grows due to the current
  LAYOUT_GET, then all clients are RECALLed, including the one issuing
  the call.
  But this can also happen when another client triggers an operation that
  sends a RECALL to our client while our client is in the middle of
  issuing a LAYOUT_GET.

  So our client is stuck in LAYOUT_GET until the RECALL sent to itself is
  satisfied.

* The RECALL is received but LAYOUTs are busy because they need
  a LAYOUTCOMMIT. ERR_DELAY is returned.

  Note that the server will keep re-sending the RECALL until it succeeds
  (NFS4ERR_NOMATCHING_LAYOUT).

* Ha ha. LAYOUTCOMMIT will never be called, because our client is stuck inside
  LAYOUTGET. We only call LAYOUTCOMMIT from update_inode(), but LAYOUTGET is
  already running inside an update_inode(), and the VFS will not call
  update_inode() twice concurrently; it always waits for the first call to
  finish, then notices the inode_dirty flag and issues a new one.

  So now we are dead-locked: LAYOUT_GET waits for the server to finish the
  RECALL and keeps polling for the LAYOUT.
  The server is stuck polling the RECALL, waiting for the client to do a
  LAYOUTCOMMIT, but that will never happen because the client is waiting for
  the LAYOUT_GET to return.

* The way to solve this is what you did below: push an immediate
  LAYOUTCOMMIT from the recall thread, thus releasing the segments.
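The cycle above can be checked mechanically as a wait-for graph. The sketch below is a hypothetical model, not kernel code. Each party (the client's LAYOUTGET, the server's RECALL, the client's LAYOUTCOMMIT) waits on at most one other. The names `enum actor`, `NO_WAIT` and `has_deadlock()` are invented for illustration. Issuing the LAYOUTCOMMIT from the recall thread corresponds to removing the LAYOUTCOMMIT -> LAYOUTGET edge.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the three parties in the scenario above. */
enum actor {
	ACT_LAYOUTGET,     /* client LAYOUTGET, gets RECALL_CONFLICT */
	ACT_RECALL,        /* server RECALL, gets ERR_DELAY */
	ACT_LAYOUTCOMMIT,  /* client LAYOUTCOMMIT that would free the segments */
	NR_ACTORS
};

#define NO_WAIT (-1)

/* waits_on[a] is the actor that a is blocked on, or NO_WAIT if a can
 * make progress. Each actor waits on at most one other, so a deadlock
 * is simply a cycle reachable by following waits_on[]. */
static bool has_deadlock(const int waits_on[NR_ACTORS])
{
	for (int start = 0; start < NR_ACTORS; start++) {
		int cur = start;

		/* After NR_ACTORS hops without reaching a free actor we must
		 * have revisited some actor, i.e. we are going round a cycle. */
		for (int hop = 0; hop < NR_ACTORS; hop++) {
			cur = waits_on[cur];
			if (cur == NO_WAIT)
				break;
			assert(cur >= 0 && cur < NR_ACTORS);
		}
		if (cur != NO_WAIT)
			return true;
	}
	return false;
}
```

With LAYOUTGET waiting on RECALL, RECALL on LAYOUTCOMMIT, and LAYOUTCOMMIT on LAYOUTGET, `has_deadlock()` reports a cycle. Setting the LAYOUTCOMMIT entry to `NO_WAIT` (committing from the recall thread) breaks it.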

I had a slightly different solution, though:

> ---
>  fs/nfs/callback_proc.c | 16 +++++++++++-----
>  fs/nfs/pnfs.c          |  3 +++
>  2 files changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
> index 41db525..bf017b0 100644
> --- a/fs/nfs/callback_proc.c
> +++ b/fs/nfs/callback_proc.c
> @@ -164,6 +164,7 @@ static u32 initiate_file_draining(struct nfs_client *clp,
>  	struct inode *ino;
>  	struct pnfs_layout_hdr *lo;
>  	u32 rv = NFS4ERR_NOMATCHING_LAYOUT;
> +	bool need_commit = false;
>  	LIST_HEAD(free_me_list);
>  
>  	lo = get_layout_by_fh(clp, &args->cbl_fh, &args->cbl_stateid);
> @@ -172,16 +173,21 @@ static u32 initiate_file_draining(struct nfs_client *clp,
>  
>  	ino = lo->plh_inode;
>  	spin_lock(&ino->i_lock);
> -	if (test_bit(NFS_LAYOUT_BULK_RECALL, &lo->plh_flags) ||
> -	    pnfs_mark_matching_lsegs_invalid(lo, &free_me_list,
> -					&args->cbl_range))
> +	if (test_bit(NFS_LAYOUT_BULK_RECALL, &lo->plh_flags)) {
>  		rv = NFS4ERR_DELAY;
> -	else
> -		rv = NFS4ERR_NOMATCHING_LAYOUT;
> +	} else if (pnfs_mark_matching_lsegs_invalid(lo, &free_me_list,
> +			&args->cbl_range)) {
> +		need_commit = true;
> +		rv = NFS4ERR_DELAY;
> +	}
> +
>  	pnfs_set_layout_stateid(lo, &args->cbl_stateid, true);
>  	spin_unlock(&ino->i_lock);
>  	pnfs_free_lseg_list(&free_me_list);
>  	pnfs_put_layout_hdr(lo);
> +
> +	if (need_commit)
> +		pnfs_layoutcommit_inode(ino, false);
>  	iput(ino);
>  out:
>  	return rv;

I did it like this:

diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index 41db525..59f76bf 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -171,6 +171,14 @@ static u32 initiate_file_draining(struct nfs_client *clp,
 		goto out;
 
 	ino = lo->plh_inode;
+
+	spin_lock(&ino->i_lock);
+	pnfs_set_layout_stateid(lo, &args->cbl_stateid, true);
+	spin_unlock(&ino->i_lock);
+
+	/* kick out any segs held by need to commit */
+	pnfs_layoutcommit_inode(ino, true);
+
 	spin_lock(&ino->i_lock);
 	if (test_bit(NFS_LAYOUT_BULK_RECALL, &lo->plh_flags) ||
 	    pnfs_mark_matching_lsegs_invalid(lo, &free_me_list,
@@ -178,7 +186,7 @@ static u32 initiate_file_draining(struct nfs_client *clp,
 		rv = NFS4ERR_DELAY;
 	else
 		rv = NFS4ERR_NOMATCHING_LAYOUT;
-	pnfs_set_layout_stateid(lo, &args->cbl_stateid, true);
 	spin_unlock(&ino->i_lock);
 	pnfs_free_lseg_list(&free_me_list);
 	pnfs_put_layout_hdr(lo);


Comments:

1. I call pnfs_layoutcommit_inode() regardless of busy segments, because
   if it has nothing to do it returns right away. Segments may be busy
   because they need a commit, but also because they are used by in-flight
   IO, so busy segments are not an exact indication.
   In any case, we can always call pnfs_layoutcommit_inode() to kick a
   LAYOUTCOMMIT; it will never do any harm.

2. This has a performance advantage: any segments held by LAYOUTCOMMIT will
   now be freed, and the RECALL will return success instead of forcing the
   server into one or more extra RECALL rounds with ERR_DELAY.
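Points 1 and 2 can be illustrated with a small hypothetical model (not kernel code). It makes three assumptions:

- A recalled range is pinned either by in-flight I/O or by one reference that only LAYOUTCOMMIT drops.
- One in-flight I/O completes between recall rounds.
- `commit_if_needed()` is a stand-in for `pnfs_layoutcommit_inode()` returning at once when there is nothing to commit.

`COMMIT_BEFORE_CHECK` mirrors the ordering in my patch. `COMMIT_AFTER_REPLY` mirrors deciding on ERR_DELAY first and kicking the commit afterwards, as in your patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of what pins a recalled range. A segment can be
 * "busy" for two unrelated reasons, so busy alone is not an exact
 * indication that a LAYOUTCOMMIT is needed. */
struct lseg_pins {
	int inflight_io;    /* references held by I/O still in flight */
	bool needs_commit;  /* one reference held until LAYOUTCOMMIT */
};

static bool lseg_busy(const struct lseg_pins *p)
{
	return p->inflight_io > 0 || p->needs_commit;
}

/* Stand-in for pnfs_layoutcommit_inode(): returns right away when there
 * is nothing to commit, so calling it unconditionally is harmless. */
static void commit_if_needed(struct lseg_pins *p)
{
	if (p->needs_commit)
		p->needs_commit = false;
}

enum commit_order {
	COMMIT_AFTER_REPLY,   /* decide DELAY first, kick the commit after */
	COMMIT_BEFORE_CHECK,  /* kick the commit, then check for busy lsegs */
};

/* Count recall rounds until the client can answer NOMATCHING_LAYOUT.
 * Assumes one in-flight I/O completes between consecutive rounds. */
static int recall_rounds(struct lseg_pins p, enum commit_order order)
{
	assert(p.inflight_io >= 0);
	for (int round = 1; ; round++) {
		if (order == COMMIT_BEFORE_CHECK)
			commit_if_needed(&p);

		bool busy = lseg_busy(&p);
		if (!busy)
			return round;          /* recall succeeds this round */

		if (order == COMMIT_AFTER_REPLY)
			commit_if_needed(&p);  /* too late for this round */
		if (p.inflight_io > 0)
			p.inflight_io--;       /* some I/O completes meanwhile */
	}
}
```

Suppose only a pending commit pins the range. Committing first then answers the recall in one round instead of two. With only in-flight I/O, both orderings need the same number of rounds, which is why "busy" by itself says nothing about whether a commit would help.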

The protocol allows issuing a LAYOUTCOMMIT while a recall is in progress,
because the RECALL is governed by the back-channel seq_id and the LAYOUTCOMMIT
by the fore-channel seq_id, so neither needs to wait for the other to finish.
(Compare LAYOUT_GET and LAYOUT_COMMIT, which are serialized by the seq_id.)
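The sketch below shows why the two do not block each other. For simplicity it assumes one slot per channel; a real session may have more slots. `struct slot`, `slot_begin()` and `slot_end()` are hypothetical names, not the kernel's session code. A CB_LAYOUTRECALL occupying the back-channel slot does not stop a LAYOUTCOMMIT from starting on the fore channel. Two fore-channel operations, in contrast, contend for the same slot and its seq_id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an NFSv4.1 session with a single slot per
 * channel; each channel has its own slot table and sequence ids. */
struct slot {
	uint32_t seqid;
	bool in_use;
};

struct session {
	struct slot fore;   /* client -> server: LAYOUTGET, LAYOUTCOMMIT, ... */
	struct slot back;   /* server -> client: CB_LAYOUTRECALL, ... */
};

/* Start a request on a slot. It fails only if that same slot is still
 * occupied by an earlier request; other channels are unaffected. */
static bool slot_begin(struct slot *s)
{
	if (s->in_use)
		return false;
	s->in_use = true;
	s->seqid++;          /* each new request on a slot bumps its seqid */
	return true;
}

static void slot_end(struct slot *s)
{
	assert(s->in_use);
	s->in_use = false;
}
```

For example, with `ses.back` held by a recall, `slot_begin(&ses.fore)` still succeeds for the LAYOUTCOMMIT. A second fore-channel request fails until the first one calls `slot_end()`.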


> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 6e0fa71..242e73f 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -604,6 +604,9 @@ pnfs_layout_free_bulk_destroy_list(struct list_head *layout_list,
>  		spin_unlock(&inode->i_lock);
>  		pnfs_free_lseg_list(&lseg_list);
>  		pnfs_put_layout_hdr(lo);
> +
> +		if (ret)
> +			pnfs_layoutcommit_inode(inode, false);
>  		iput(inode);
>  	}
>  	return ret;
> 

With my patch I could get further, but then hit some of the other issues you
have fixes for, with the state_ids and other protocol stuff.

Also, with my patch I hit races in state management, because it waits for the
LAYOUT_COMMIT to execute synchronously from the RECALL thread; your
asynchronous LAYOUT_COMMIT has a lower chance of hitting them. But I think
Trond may have fixed these races since, as I tested this code about
6 months ago.

If you are up to it, you might want to test my synchronous approach and see
if you like it better. I'm testing your code as well to see how it behaves.

BTW: it looks like hch-pnfs/getdeviceinfo has some of the pnfs fixes, while
hch-pnfs/blocklayout-for-3.18 has newer fixes but without the getdeviceinfo
stuff. I'm testing with the older getdeviceinfo branch.

[hch-pnfs == git://git.infradead.org/users/hch/pnfs.git]

[Testing is not so easy, because I need to merge in my pnfs-server as well as
 this code, and I had to do some forward porting since the newest code was
 stuck at about 6 months ago. That was easy; now I need to figure out which
 Ganesha to use.

 The kernel pnfs-server is out of the question because it is stuck on 3.12 and
 will not merge well with this code. But silly me, I can just run a 3.12-based
 server with this code as the client. Yes, I'll do that tomorrow and see which
 gets stuck sooner, Ganesha or Kpnfsd.
]


Thanks for working on this
Boaz


Thread overview: 66+ messages
     [not found] <pnfs block layout driver fixes V2>
2014-08-21 16:09 ` Christoph Hellwig
2014-08-21 16:09   ` [PATCH 01/19] nfs: cap request size to fit a kmalloced page array Christoph Hellwig
2014-08-21 16:09   ` [PATCH 02/19] pnfs: do not pass uninitialized lsegs to ->free_lseg Christoph Hellwig
2014-08-21 16:09   ` [PATCH 03/19] pnfs: force a layout commit when encountering busy segments during recall Christoph Hellwig
2014-08-24 17:49     ` Boaz Harrosh [this message]
2014-08-24 19:18       ` Christoph Hellwig
2014-08-26 14:10         ` [PATCH] pnfs: Kick a pnfs_layoutcommit_inode on recall Boaz Harrosh
2014-08-26 14:26           ` Trond Myklebust
2014-08-26 14:37             ` Boaz Harrosh
2014-08-26 14:52               ` Boaz Harrosh
2014-08-26 14:55               ` Trond Myklebust
2014-08-26 15:02                 ` Boaz Harrosh
2014-08-26 15:24                   ` Matt W. Benjamin
2014-08-26 15:36                     ` Trond Myklebust
2014-08-26 16:56                       ` Boaz Harrosh
2014-08-26 16:59                         ` Trond Myklebust
2014-08-26 17:06                           ` Boaz Harrosh
2014-08-26 17:54                             ` Trond Myklebust
2014-08-26 18:19                               ` Boaz Harrosh
2014-08-26 18:34                                 ` Boaz Harrosh
2014-08-26 18:41                                 ` Trond Myklebust
2014-08-26 19:46                                   ` Trond Myklebust
2014-08-27  8:50                                     ` Boaz Harrosh
2014-08-27  8:22                                   ` Boaz Harrosh
2014-09-09  0:37     ` [PATCH 03/19] pnfs: force a layout commit when encountering busy segments during recall Trond Myklebust
2014-09-09  5:49       ` Christoph Hellwig
2014-09-09 14:38         ` Trond Myklebust
2014-08-21 16:09   ` [PATCH 04/19] pnfs: don't check sequence on new stateids in layoutget Christoph Hellwig
2014-08-21 16:09   ` [PATCH 05/19] pnfs: retry after a bad stateid error from layoutget Christoph Hellwig
2014-08-21 16:09   ` [PATCH 06/19] pnfs: avoid using stale stateids after layoutreturn Christoph Hellwig
2014-08-21 16:09   ` [PATCH 07/19] pnfs: add flag to force read-modify-write in ->write_begin Christoph Hellwig
2014-09-09  3:50     ` Trond Myklebust
2014-09-09  5:53       ` Christoph Hellwig
2014-09-09 14:41         ` Trond Myklebust
2014-08-21 16:09   ` [PATCH 08/19] pnfs: add return_range method Christoph Hellwig
2014-08-25 13:50     ` Anna Schumaker
2014-08-25 14:09       ` Christoph Hellwig
2014-08-25 14:17         ` Anna Schumaker
2014-08-25 14:20           ` Christoph Hellwig
2014-09-09  3:57     ` Trond Myklebust
2014-08-21 16:09   ` [PATCH 09/19] pnfs: allow splicing pre-encoded pages into the layoutcommit args Christoph Hellwig
2014-08-21 16:09   ` [PATCH 10/19] pnfs/blocklayout: reject pnfs blocksize larger than page size Christoph Hellwig
2014-08-21 16:09   ` [PATCH 11/19] pnfs/blocklayout: improve GETDEVICEINFO error reporting Christoph Hellwig
2014-08-21 16:09   ` [PATCH 12/19] pnfs/blocklayout: plug block queues Christoph Hellwig
2014-08-21 16:09   ` [PATCH 13/19] pnfs/blocklayout: correctly decrement extent length Christoph Hellwig
2015-02-09  6:01     ` NeilBrown
2015-02-09 18:24       ` Christoph Hellwig
2014-08-21 16:09   ` [PATCH 14/19] pnfs/blocklayout: remove read-modify-write handling in bl_write_pagelist Christoph Hellwig
2014-09-09  4:43     ` Trond Myklebust
2014-08-21 16:09   ` [PATCH 15/19] pnfs/blocklayout: don't set pages uptodate Christoph Hellwig
2014-09-09  4:48     ` Trond Myklebust
2014-08-21 16:09   ` [PATCH 16/19] pnfs/blocklayout: rewrite extent tracking Christoph Hellwig
2014-08-25 14:36     ` Anna Schumaker
2014-08-25 14:43       ` Christoph Hellwig
2014-08-26  9:06         ` Boaz Harrosh
2014-09-09  4:50     ` Trond Myklebust
2014-08-21 16:09   ` [PATCH 17/19] pnfs/blocklayout: implement the return_range method Christoph Hellwig
2014-09-09  4:03     ` Trond Myklebust
2014-08-21 16:09   ` [PATCH 18/19] pnfs/blocklayout: return layouts on setattr Christoph Hellwig
2014-09-09  4:09     ` Trond Myklebust
2014-08-21 16:09   ` [PATCH 19/19] pnfs/blocklayout: allocate separate pages for the layoutcommit payload Christoph Hellwig
2014-09-09  4:52     ` Trond Myklebust
2014-08-21 16:13   ` pnfs block layout driver fixes V2 Christoph Hellwig
2014-09-09  4:12     ` Trond Myklebust
2014-09-09  5:54       ` Christoph Hellwig
2014-09-09 14:40         ` Trond Myklebust
