From: Trond Myklebust <trond.myklebust@primarydata.com>
To: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	Matt Benjamin <matt@linuxbox.com>
Subject: Re: [PATCH] pnfs: Kick a pnfs_layoutcommit_inode on recall
Date: Tue, 26 Aug 2014 10:26:27 -0400
Message-ID: <CAHQdGtQNDvofWVzn9CmtmbDRSj39BLT_gNDyY34uUFw_nEDe0Q@mail.gmail.com>
In-Reply-To: <53FC9545.4000800@plexistor.com>

On Tue, Aug 26, 2014 at 10:10 AM, Boaz Harrosh <boaz@plexistor.com> wrote:
> From: Boaz Harrosh <boaz@plexistor.com>
>
> This fixes a deadlock in the pnfs recall processing.
>
> pnfs_layoutcommit_inode() is called through update_inode(),
> which the VFS calls because set_inode_dirty is set during
> pnfs write IO.
>
> But the VFS will not schedule another update_inode()
> if it is already inside an update_inode() or an sb-writeback.
>
> As part of writeback pnfs code might get stuck in LAYOUT_GET
> with the server returning ERR_RECALL_CONFLICT because some
> operation has caused the server to RECALL all layouts, including
> those from our client.
>
> So the RECALL is received, but our client is returning ERR_DELAY
> because its write-segments need a LAYOUT_COMMIT, but
> pnfs_layoutcommit_inode will never come because it is scheduled
> behind the LAYOUT_GET which is stuck waiting for the recall to
> finish
>
> Hence the deadlock: the client is stuck polling LAYOUT_GET and
> receiving ERR_RECALL_CONFLICT, while the server is stuck polling
> RECALL and receiving ERR_DELAY.
>
> With pnfs-objects the above condition can easily happen, when
> a file grows beyond a group of devices. The pnfs-objects-server
> will RECALL all layouts because the file-objects-map will
> change and all old layouts will have stale attributes. Therefore
> the RECALL is initiated as part of a LAYOUT_GET, and this can
> be triggered from within a single client operation.
>
> A simple solution is to kick out a pnfs_layoutcommit_inode()
> from within the recall, to free any need-to-commit segments
> and let the client return success on the RECALL, so streaming
> can continue.
>
> This patch is based on 3.17-rc1. It is completely UNTESTED.
> I have tested a version of this patch at around the 3.12 Kernel
> at which point the deadlock was resolved but I hit some race
> conditions on pnfs state management further on, so the actual
> overall processing was not fixed. But hopefully these were fixed
> by Trond and Christoph, and it should work better now.
>
> Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
> ---
>  fs/nfs/callback_proc.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
> index 41db525..8660f96 100644
> --- a/fs/nfs/callback_proc.c
> +++ b/fs/nfs/callback_proc.c
> @@ -171,6 +171,14 @@ static u32 initiate_file_draining(struct nfs_client *clp,
>                 goto out;
>
>         ino = lo->plh_inode;
> +
> +       spin_lock(&ino->i_lock);
> +       pnfs_set_layout_stateid(lo, &args->cbl_stateid, true);
> +       spin_unlock(&ino->i_lock);
> +
> +       /* kick out any segs held by need to commit */
> +       pnfs_layoutcommit_inode(ino, true);

Making this call synchronous could deadlock the entire back channel.
Is there any reason why it can't just be made asynchronous?
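
As a rough illustration of the concern, here is a toy userspace model,
not kernel code. It assumes that the second argument of
pnfs_layoutcommit_inode() is the sync flag, as the "true" in the patch
suggests. Every name below (toy_client, toy_recall, toy_commit_done,
COMMIT_LATENCY) is made up for the sketch, and the latency value is
arbitrary. The point it tries to show: a synchronous commit ties up the
callback thread for the whole round trip, while an asynchronous one lets
the recall return a "delay" status right away and succeed on a retry.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
enum cb_status { CB_OK, CB_DELAY };

/* Hypothetical number of ticks the server needs to answer LAYOUTCOMMIT. */
#define COMMIT_LATENCY 3

struct toy_client {
	bool commit_pending;   /* segments still need a LAYOUTCOMMIT */
	bool commit_in_flight; /* an async LAYOUTCOMMIT has been sent */
	int  blocked_ticks;    /* time the callback thread spent blocked */
};

/* Called when the reply to an asynchronous LAYOUTCOMMIT arrives. */
static void toy_commit_done(struct toy_client *c)
{
	assert(c->commit_in_flight);
	c->commit_in_flight = false;
	c->commit_pending = false;
}

/* Models the recall handler running on the single callback thread. */
static enum cb_status toy_recall(struct toy_client *c, bool sync)
{
	if (c->commit_pending && !c->commit_in_flight) {
		if (sync) {
			/* The callback thread waits for the full round trip. */
			c->blocked_ticks += COMMIT_LATENCY;
			c->commit_pending = false;
		} else {
			/* Fire and forget; the callback thread stays free. */
			c->commit_in_flight = true;
		}
	}
	/* Segments still busy: ask the server to retry the recall later. */
	return c->commit_pending ? CB_DELAY : CB_OK;
}
```

In this model the synchronous variant answers the first recall but
accumulates blocked_ticks on the only callback thread; the asynchronous
variant never blocks, returns CB_DELAY once, and returns CB_OK after
toy_commit_done() has run.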

> +
>         spin_lock(&ino->i_lock);
>         if (test_bit(NFS_LAYOUT_BULK_RECALL, &lo->plh_flags) ||
>             pnfs_mark_matching_lsegs_invalid(lo, &free_me_list,
> @@ -178,7 +186,6 @@ static u32 initiate_file_draining(struct nfs_client *clp,
>                 rv = NFS4ERR_DELAY;
>         else
>                 rv = NFS4ERR_NOMATCHING_LAYOUT;
> -       pnfs_set_layout_stateid(lo, &args->cbl_stateid, true);
>         spin_unlock(&ino->i_lock);
>         pnfs_free_lseg_list(&free_me_list);
>         pnfs_put_layout_hdr(lo);
> --
> 1.9.3



-- 
Trond Myklebust

Linux NFS client maintainer, PrimaryData

trond.myklebust@primarydata.com
