From: Jeff Layton <jlayton@kernel.org>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: devel@lists.nfs-ganesha.org, sostapov@redhat.com,
	Supriti.Singh@suse.com, "open list:NFS, SUNRPC,
	AND..." <linux-nfs@vger.kernel.org>
Subject: Re: [RFC PATCH] rados_cluster: add a "design" manpage
Date: Fri, 01 Jun 2018 06:42:37 -0400
Message-ID: <fbc5ad16342dc1b4d50041bfa1274fc0197486a9.camel@kernel.org>
In-Reply-To: <20180531213733.GB4654@fieldses.org>

On Thu, 2018-05-31 at 17:37 -0400, J. Bruce Fields wrote:
> On Wed, May 23, 2018 at 08:21:40AM -0400, Jeff Layton wrote:
> > From: Jeff Layton <jlayton@redhat.com>
> > 
> > Bruce asked for better design documentation, so this is my attempt at
> > it. Let me know what you think. I'll probably end up squashing this into
> > one of the code patches but for now I'm sending this separately to see
> > if it helps clarify things.
> > 
> > Suggestions and feedback are welcome.
> > 
> > Change-Id: I53cc77f66b2407c2083638e5760666639ba1fd57
> > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > ---
> >  src/doc/man/ganesha-rados-cluster.rst | 227 ++++++++++++++++++++++++++
> >  1 file changed, 227 insertions(+)
> >  create mode 100644 src/doc/man/ganesha-rados-cluster.rst
> > 
> > diff --git a/src/doc/man/ganesha-rados-cluster.rst b/src/doc/man/ganesha-rados-cluster.rst
> > new file mode 100644
> > index 000000000000..1ba2d3c29093
> > --- /dev/null
> > +++ b/src/doc/man/ganesha-rados-cluster.rst
> > @@ -0,0 +1,227 @@
> > +==============================================================================
> > +ganesha-rados-cluster-design -- Clustered RADOS Recovery Backend Design
> > +==============================================================================
> > +
> > +.. program:: ganesha-rados-cluster-design
> > +
> > +This document aims to explain the theory and design behind the
> > +rados_cluster recovery backend, which coordinates grace period
> > +enforcement among multiple, independent NFS servers.
> > +
> > +In order to understand the clustered recovery backend, it's first necessary
> > +to understand how recovery works with a single server:
> > +
> > +Singleton Server Recovery
> > +-------------------------
> > +NFSv4 is a lease-based protocol. Clients set up a relationship to the
> > +server and must periodically renew their lease in order to maintain
> > +their ephemeral state (open files, locks, delegations or layouts).
> > +
> > +When a singleton NFS server is restarted, any ephemeral state is lost. When
> > +the server comes back online, NFS clients detect that the server has
> > +been restarted and will reclaim the ephemeral state that they held at the
> > +time of their last contact with the server.
> > +
> > +Singleton Grace Period
> > +----------------------
> > +
> > +In order to ensure that we don't end up with conflicts, clients are
> > +barred from acquiring any new state while in the Recovery phase. Only
> > +reclaim operations are allowed.
> > +
> > +This period of time is called the **grace period**. Most NFS servers
> > +have a grace period that lasts around two lease periods, however
> 
> knfsd's is one lease period, who does two?
> 
> (Still catching up on the rest.  Looks good.)
> 
> --b.

(cc'ing linux-nfs)

Thanks for having a look. Hmm...you're right.

        nn->nfsd4_lease = 90;   /* default lease time */
        nn->nfsd4_grace = 90;

nit: we should probably add a #define'd constant for that at some
point...but might this be problematic?
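On the #define: I mean something like this (the constant name is
invented here, purely to illustrate):

        /* hypothetical name; nfsd has no such constant today */
        #define NFSD4_LEASE_DFLT        90      /* seconds */

        nn->nfsd4_lease = NFSD4_LEASE_DFLT;     /* default lease time */
        nn->nfsd4_grace = NFSD4_LEASE_DFLT;

As for whether the value itself is problematic: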

In the pessimal case, you might renew your lease just before the server
crashes. It then comes back up quickly and starts the grace period. By
the time the client contacts the server again, the grace period is
almost over and you may have very little time to actually do any reclaim.
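To put rough numbers on that, with the 90s defaults above and a client
that renews once per lease period (timings approximate):

        t =  0s   client renews its lease
        t =  1s   server crashes and quickly restarts; ~90s grace begins
        t = 90s   client next contacts the server, notices the restart
        t = 91s   grace period ends

That leaves the client roughly a second to reclaim everything it held.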

ISTR that when we were working on the server at PD we had determined
that we needed around 2 grace periods + a small fudge factor. I don't
recall the details of how we determined it though.

Even worse, the kernel's lease-break-time (how long the kernel waits
for a conflicted lease, and hence a delegation, to be released before
breaking it) is only half of one lease period:

        $ cat /proc/sys/fs/lease-break-time
        45

Maybe we should be basing the v4 lease time on the lease-break-time
value? It seems like we ought to revoke delegations after two lease
periods rather than after half of one.
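As an illustrative sketch only (lease_break_time is the variable behind
/proc/sys/fs/lease-break-time; none of this is in nfsd today):

        /*
         * Derive the v4 lease from lease-break-time, so that an
         * unreturned delegation gets revoked after two full lease
         * periods instead of half of one.
         */
        nn->nfsd4_lease = lease_break_time / 2; /* 45s => ~22s lease */
        nn->nfsd4_grace = 2 * nn->nfsd4_lease;  /* two lease periods */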
-- 
Jeff Layton <jlayton@kernel.org>
