Linux-NFS Archive on lore.kernel.org
From: Trond Myklebust <trondmy@hammerspace.com>
To: "bfields@fieldses.org" <bfields@fieldses.org>
Cc: "kinglongmee@gmail.com" <kinglongmee@gmail.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"bfields@redhat.com" <bfields@redhat.com>
Subject: Re: [PATCH] SUNRPC/cache: Allow garbage collection of invalid cache entries
Date: Thu, 26 Mar 2020 21:42:19 +0000
Message-ID: <1a0ce8bb1150835f7a25126df2524e8a8fb0e112.camel@hammerspace.com>
In-Reply-To: <20200326204001.GA25053@fieldses.org>

Hi Bruce,

On Thu, 2020-03-26 at 16:40 -0400, bfields@fieldses.org wrote:
> Sorry, just getting back to this:
> 
> On Fri, Feb 07, 2020 at 01:18:17PM -0500, bfields@fieldses.org wrote:
> > On Fri, Feb 07, 2020 at 02:25:27PM +0000, Trond Myklebust wrote:
> > > On Thu, 2020-02-06 at 11:33 -0500, J. Bruce Fields wrote:
> > > > On Tue, Jan 14, 2020 at 11:57:38AM -0500, Trond Myklebust
> > > > wrote:
> > > > > If the cache entry never gets initialised, we want the
> > > > > garbage
> > > > > collector to be able to evict it. Otherwise if the upcall
> > > > > daemon
> > > > > fails to initialise the entry, we end up never expiring it.
> > > > 
> > > > Could you tell us more about what motivated this?
> > > > 
> > > > It's causing failures on pynfs server-reboot tests.  I haven't
> > > > pinned
> > > > down the cause yet, but it looks like it could be a regression
> > > > to the
> > > > behavior Kinglong Mee describes in detail in his original
> > > > patch.
> > > > 
> > > 
> > > Can you point me to the tests that are failing?
> > 
> > I'm basically doing
> > 
> > 	./nfs4.1/testserver.py myserver:/path reboot \
> > 			--serverhelper=examples/server_helper.sh \
> > 			--serverhelperarg=myserver
> > 
> > For all I know at this point, the change could be exposing a pynfs-
> > side
> > bug.
> 
> From a trace, it's clear that the server is actually becoming
> unresponsive, so it's not a pynfs bug.
> 
> > > The motivation here is to allow the garbage collector to do its
> > > job of
> > > evicting cache entries after they are supposed to have timed out.
> > 
> > Understood.  I was curious whether this was found by code
> > inspection or
> > because you'd run across a case where the leak was causing a
> > practical
> > problem.
> 
> I'm still curious.
> 
> > > The fact that uninitialised cache entries are given an infinite
> > > lifetime, and are never evicted is a de facto memory leak if, for
> > > instance, the mountd daemon ignores the cache request, or the
> > > downcall
> > > in expkey_parse() or svc_export_parse() fails without being able
> > > to
> > > update the request.
> 
> If mountd ignores cache requests, or downcalls fail, then the
> server's
> broken anyway.  The server can't do anything without mountd.
> 
> > > The threads that are waiting for the cache replies already have a
> > > mechanism for dealing with timeouts (with cache_wait_req() and
> > > deferred requests), so the question is what is so special about
> > > uninitialised requests that we have to leak them in order to
> > > avoid a
> > > problem with reboot?
> 
> I'm not sure I have this right yet.  I'm just staring at the code and
> at
> Kinglong Mee's description on d6fc8821c2d2.  I think the way it works
> is
> that a cache flush from mountd results in all cache entries (including
> invalid entries that nfsd threads are waiting on) being considered
> expired.  So cache_check() returns an immediate ETIMEDOUT without
> waiting.
> 
> Maybe the cache_is_expired() logic should be something more like:
> 
> 	if (h->expiry_time < seconds_since_boot())
> 		return true;
> 	if (!test_bit(CACHE_VALID, &h->flags))
> 		return false;
> 	return detail->flush_time >= h->last_refresh;
> 
> So invalid cache entries (which are waiting for a reply from mountd)
> can
> expire, but they can't be flushed.  If that makes sense.
> 
> As a stopgap we may want to revert or drop the "Allow garbage
> collection" patch, as the (preexisting) memory leak seems lower
> impact
> than the server hang.
> 
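
For reference, the "can expire, but can't be flushed" behaviour suggested
above can be modelled in userspace roughly as follows. This is only a
sketch: struct model_head, struct model_detail and model_is_expired() are
hypothetical stand-ins rather than the kernel's types, the boolean "valid"
stands in for the CACHE_VALID bit, and the flush comparison is assumed to
be the usual flush_time >= last_refresh check.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the fields of a cache entry the check reads. */
struct model_head {
	time_t expiry_time;	/* entry is stale once this time has passed */
	time_t last_refresh;	/* when the entry was last updated */
	bool valid;		/* models test_bit(CACHE_VALID, &h->flags) */
};

/* Hypothetical stand-in for the per-cache state. */
struct model_detail {
	time_t flush_time;	/* a flush covers entries refreshed at or before this */
};

/*
 * Returns true if the entry should be treated as expired at time 'now'.
 * Any entry, valid or not, expires once expiry_time has passed.
 * An entry that is still waiting for a downcall (not valid) ignores flushes.
 * A valid entry is also expired by a flush that postdates its last refresh.
 */
static bool model_is_expired(const struct model_detail *d,
			     const struct model_head *h, time_t now)
{
	if (h->expiry_time < now)
		return true;
	if (!h->valid)
		return false;
	return d->flush_time >= h->last_refresh;
}
```

With a flush at t=100 and an entry last refreshed at t=50 that expires at
t=200, the model reports a pending (invalid) entry as live at t=150, a
valid one as flushed, and either kind as expired at t=250.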

I believe you were probably seeing the effect of the
cache_listeners_exist() test, which is just wrong for all cache upcall
users except idmapper and svcauth_gss. We should not be creating
negative cache entries just because the rpc.mountd daemon happens to be
slow to connect to the upcall pipes when starting up, or because it
crashes and fails to restart correctly.

That's why, when I resubmitted this patch, I included 
https://git.linux-nfs.org/?p=cel/cel-2.6.git;a=commitdiff;h=b840228cd6096bebe16b3e4eb5d93597d0e02c6d

which turns off that particular test for all the upcalls to rpc.mountd.

Note that the patch series I sent includes a bunch of kernel
tracepoints that we can enable to help debugging if we see anything
similar happening in the future.

Cheers
  Trond
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com




Thread overview: 11+ messages
2020-01-14 16:57 Trond Myklebust
2020-02-06 16:33 ` J. Bruce Fields
2020-02-07 14:25   ` Trond Myklebust
2020-02-07 18:18     ` bfields
2020-02-10 18:47       ` Trond Myklebust
2020-03-26 20:40       ` bfields
2020-03-26 21:42         ` Trond Myklebust [this message]
2020-03-27  1:50           ` J. Bruce Fields
2020-03-27 12:33             ` Trond Myklebust
2020-03-27 15:53               ` [PATCH] SUNRPC/cache: don't allow invalid entries to be flushed J. Bruce Fields
2020-03-27 16:15                 ` Chuck Lever
