From: Trond Myklebust <trondmy@gmail.com>
To: Olga Kornievskaia <aglo@umich.edu>
Cc: linux-nfs <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v2 25/28] pNFS: Add tracking to limit the number of pNFS retries
Date: Tue, 02 Apr 2019 11:23:21 -0700
Message-ID: <141dfd2929e37043e545458da1534b75f14f4baf.camel@gmail.com>
In-Reply-To: <CAN-5tyE0CC9+JthUp2ic0fQ+OzRrrnWpD9DbVgT47ezD4j-mHw@mail.gmail.com>

On Mon, 2019-04-01 at 12:27 -0400, Olga Kornievskaia wrote:
> On Fri, Mar 29, 2019 at 6:03 PM Trond Myklebust <trondmy@gmail.com>
> wrote:
> > When the client is reading or writing using pNFS, and hits an error
> > on the DS,
> 
> Doesn't the client retry IO against the MDS when IO to the DS fails? I
> find the commit message confusing. What re-tries are we talking about?
> I recall after a while the client will try to get a layout again and
> if it succeeds it will send the IO to the DS. So are you trying to
> prevent these new retries to the DS that will fail (as you say if DS
> is in unrecoverable state)? Then why would there be a fatal error
> since writing thru the MDS should (hopefully) always succeed?

You are thinking about tightly coupled pNFS systems, where the MDS has
a 'special relationship' with the DSes. On a more generic system, such
as flexfiles, there is no point in writing through the MDS, because
the MDS typically has no better chance of success than the client.

As you can see from the patch, that is the main case we're targeting
here. There is no change to the other pNFS layout behaviours.

> > then it typically sends a LAYOUTERROR and/or LAYOUTRETURN
> > to the MDS, before redirtying the failed pages, and going for a new
> > round of reads/writebacks. The problem is that if the server has no
> > way to fix the DS, then we may need a way to interrupt this loop
> > after a set number of attempts have been made.
> > This patch adds an optional module parameter that allows the admin
> > to specify how many times to retry the read/writeback process
> > before
> > failing with a fatal error.
> > The default behaviour is to retry forever.
> > 
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > ---
> >  fs/nfs/direct.c                        |  7 +++++++
> >  fs/nfs/flexfilelayout/flexfilelayout.c |  8 ++++++++
> >  fs/nfs/pagelist.c                      | 14 +++++++++++++-
> >  fs/nfs/write.c                         |  5 +++++
> >  include/linux/nfs_page.h               |  4 +++-
> >  5 files changed, 36 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> > index 2d301a1a73e2..2436bd92bc00 100644
> > --- a/fs/nfs/direct.c
> > +++ b/fs/nfs/direct.c
> > @@ -663,6 +663,8 @@ static void nfs_direct_write_reschedule(struct
> > nfs_direct_req *dreq)
> >         }
> > 
> >         list_for_each_entry_safe(req, tmp, &reqs, wb_list) {
> > +               /* Bump the transmission count */
> > +               req->wb_nio++;
> >                 if (!nfs_pageio_add_request(&desc, req)) {
> >                         nfs_list_move_request(req, &failed);
> >                         spin_lock(&cinfo.inode->i_lock);
> > @@ -703,6 +705,11 @@ static void nfs_direct_commit_complete(struct
> > nfs_commit_data *data)
> >                 req = nfs_list_entry(data->pages.next);
> >                 nfs_list_remove_request(req);
> >                 if (dreq->flags == NFS_ODIRECT_RESCHED_WRITES) {
> > +                       /*
> > +                        * Despite the reboot, the write was
> > successful,
> > +                        * so reset wb_nio.
> > +                        */
> > +                       req->wb_nio = 0;
> >                         /* Note the rewrite will go through mds */
> >                         nfs_mark_request_commit(req, NULL, &cinfo,
> > 0);
> >                 } else
> > diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c
> > b/fs/nfs/flexfilelayout/flexfilelayout.c
> > index 6673d4ff5a2a..9fdbcfd3e39d 100644
> > --- a/fs/nfs/flexfilelayout/flexfilelayout.c
> > +++ b/fs/nfs/flexfilelayout/flexfilelayout.c
> > @@ -28,6 +28,8 @@
> >  #define FF_LAYOUT_POLL_RETRY_MAX     (15*HZ)
> >  #define FF_LAYOUTRETURN_MAXERR 20
> > 
> > +static unsigned short io_maxretrans;
> > +
> >  static void ff_layout_read_record_layoutstats_done(struct rpc_task
> > *task,
> >                 struct nfs_pgio_header *hdr);
> >  static int ff_layout_mirror_prepare_stats(struct pnfs_layout_hdr
> > *lo,
> > @@ -925,6 +927,7 @@ ff_layout_pg_init_read(struct
> > nfs_pageio_descriptor *pgio,
> >         pgm = &pgio->pg_mirrors[0];
> >         pgm->pg_bsize = mirror->mirror_ds->ds_versions[0].rsize;
> > 
> > +       pgio->pg_maxretrans = io_maxretrans;
> >         return;
> >  out_nolseg:
> >         if (pgio->pg_error < 0)
> > @@ -992,6 +995,7 @@ ff_layout_pg_init_write(struct
> > nfs_pageio_descriptor *pgio,
> >                 pgm->pg_bsize = mirror->mirror_ds-
> > >ds_versions[0].wsize;
> >         }
> > 
> > +       pgio->pg_maxretrans = io_maxretrans;
> >         return;
> > 
> >  out_mds:
> > @@ -2515,3 +2519,7 @@ MODULE_DESCRIPTION("The NFSv4 flexfile layout
> > driver");
> > 
> >  module_init(nfs4flexfilelayout_init);
> >  module_exit(nfs4flexfilelayout_exit);
> > +
> > +module_param(io_maxretrans, ushort, 0644);
> > +MODULE_PARM_DESC(io_maxretrans, "The  number of times the NFSv4.1
> > client "
> > +                       "retries an I/O request before returning an
> > error. ");
> > diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> > index b8301c40dd78..4a31284f411e 100644
> > --- a/fs/nfs/pagelist.c
> > +++ b/fs/nfs/pagelist.c
> > @@ -16,8 +16,8 @@
> >  #include <linux/nfs.h>
> >  #include <linux/nfs3.h>
> >  #include <linux/nfs4.h>
> > -#include <linux/nfs_page.h>
> >  #include <linux/nfs_fs.h>
> > +#include <linux/nfs_page.h>
> >  #include <linux/nfs_mount.h>
> >  #include <linux/export.h>
> > 
> > @@ -327,6 +327,7 @@ __nfs_create_request(struct nfs_lock_context
> > *l_ctx, struct page *page,
> >         req->wb_bytes   = count;
> >         req->wb_context = get_nfs_open_context(ctx);
> >         kref_init(&req->wb_kref);
> > +       req->wb_nio = 0;
> >         return req;
> >  }
> > 
> > @@ -370,6 +371,7 @@ nfs_create_subreq(struct nfs_page *req, struct
> > nfs_page *last,
> >                 nfs_lock_request(ret);
> >                 ret->wb_index = req->wb_index;
> >                 nfs_page_group_init(ret, last);
> > +               ret->wb_nio = req->wb_nio;
> >         }
> >         return ret;
> >  }
> > @@ -724,6 +726,7 @@ void nfs_pageio_init(struct
> > nfs_pageio_descriptor *desc,
> >         desc->pg_mirrors_dynamic = NULL;
> >         desc->pg_mirrors = desc->pg_mirrors_static;
> >         nfs_pageio_mirror_init(&desc->pg_mirrors[0], bsize);
> > +       desc->pg_maxretrans = 0;
> >  }
> > 
> >  /**
> > @@ -983,6 +986,15 @@ static int nfs_pageio_do_add_request(struct
> > nfs_pageio_descriptor *desc,
> >                         return 0;
> >                 mirror->pg_base = req->wb_pgbase;
> >         }
> > +
> > +       if (desc->pg_maxretrans && req->wb_nio > desc-
> > >pg_maxretrans) {
> > +               if (NFS_SERVER(desc->pg_inode)->flags &
> > NFS_MOUNT_SOFTERR)
> > +                       desc->pg_error = -ETIMEDOUT;
> > +               else
> > +                       desc->pg_error = -EIO;
> > +               return 0;
> > +       }
> > +
> >         if (!nfs_can_coalesce_requests(prev, req, desc))
> >                 return 0;
> >         nfs_list_move_request(req, &mirror->pg_list);
> > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > index 0712d886ff08..908b166d635d 100644
> > --- a/fs/nfs/write.c
> > +++ b/fs/nfs/write.c
> > @@ -1009,6 +1009,8 @@ static void nfs_write_completion(struct
> > nfs_pgio_header *hdr)
> >                         goto remove_req;
> >                 }
> >                 if (nfs_write_need_commit(hdr)) {
> > +                       /* Reset wb_nio, since the write was
> > successful. */
> > +                       req->wb_nio = 0;
> >                         memcpy(&req->wb_verf, &hdr->verf.verifier,
> > sizeof(req->wb_verf));
> >                         nfs_mark_request_commit(req, hdr->lseg,
> > &cinfo,
> >                                 hdr->pgio_mirror_idx);
> > @@ -1142,6 +1144,7 @@ static struct nfs_page
> > *nfs_try_to_update_request(struct inode *inode,
> >                 req->wb_bytes = end - req->wb_offset;
> >         else
> >                 req->wb_bytes = rqend - req->wb_offset;
> > +       req->wb_nio = 0;
> >         return req;
> >  out_flushme:
> >         /*
> > @@ -1416,6 +1419,8 @@ static void nfs_initiate_write(struct
> > nfs_pgio_header *hdr,
> >   */
> >  static void nfs_redirty_request(struct nfs_page *req)
> >  {
> > +       /* Bump the transmission count */
> > +       req->wb_nio++;
> >         nfs_mark_request_dirty(req);
> >         set_bit(NFS_CONTEXT_RESEND_WRITES, &req->wb_context-
> > >flags);
> >         nfs_end_page_writeback(req);
> > diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
> > index b7d0f15615c2..8b36800d342d 100644
> > --- a/include/linux/nfs_page.h
> > +++ b/include/linux/nfs_page.h
> > @@ -53,6 +53,7 @@ struct nfs_page {
> >         struct nfs_write_verifier       wb_verf;        /* Commit
> > cookie */
> >         struct nfs_page         *wb_this_page;  /* list of reqs for
> > this page */
> >         struct nfs_page         *wb_head;       /* head pointer for
> > req list */
> > +       unsigned short          wb_nio;         /* Number of I/O
> > attempts */
> >  };
> > 
> >  struct nfs_pageio_descriptor;
> > @@ -87,7 +88,6 @@ struct nfs_pgio_mirror {
> >  };
> > 
> >  struct nfs_pageio_descriptor {
> > -       unsigned char           pg_moreio : 1;
> >         struct inode            *pg_inode;
> >         const struct nfs_pageio_ops *pg_ops;
> >         const struct nfs_rw_ops *pg_rw_ops;
> > @@ -105,6 +105,8 @@ struct nfs_pageio_descriptor {
> >         struct nfs_pgio_mirror  pg_mirrors_static[1];
> >         struct nfs_pgio_mirror  *pg_mirrors_dynamic;
> >         u32                     pg_mirror_idx;  /* current mirror
> > */
> > +       unsigned short          pg_maxretrans;
> > +       unsigned char           pg_moreio : 1;
> >  };
> > 
> >  /* arbitrarily selected limit to number of mirrors */
> > --
> > 2.20.1
> > 

