Re: nfsd: another possible delegation race

From: "NeilBrown" <neilb@suse.de>
To: "Chuck Lever III" <chuck.lever@oracle.com>
Cc: "Jeff Layton" <jlayton@kernel.org>,
	"Linux NFS Mailing List" <linux-nfs@vger.kernel.org>
Subject: Re: nfsd: another possible delegation race
Date: Wed, 05 Oct 2022 09:17:29 +1100	[thread overview]
Message-ID: <166492184913.14457.15445320504611194255@noble.neil.brown.name> (raw)
In-Reply-To: <A20D72B2-097F-4240-A894-0759B53C02F4@oracle.com>

On Wed, 05 Oct 2022, Chuck Lever III wrote:
> 
> > On Oct 4, 2022, at 1:14 AM, NeilBrown <neilb@suse.de> wrote:
> > 
> > 
> > Hi,
> > I have a customer who experienced a crash in nfsd which appears to be
> > related to delegation return.  I cannot completely rule-out
> >  Commit 548ec0805c39 ("nfsd: fix use-after-free due to delegation race")
> > as the kernel being used didn't have that commit, but the symptoms are
> > quite different, and while exploring I found, I think, a different
> > race.  This new race doesn't obviously address all the symptoms, but
> > maybe it does...
> > 
> > The symptoms were:
> >  1/   WARN_ON(!unhash_delegation_locked(dp));
> >    in nfs4_laundromat complained (delegation wasn't hashed!)
> >  2/   refcount_t: saturated; leaking memory
> >    This came from the refcount_inc in revoke_delegation() called from
> >    nfs4_laundromat(), a few lines below the above warning
> >  3/ BUG: kernel NULL pointer dereference, address: 0000000000000028
> >     This is from the destroy_unhashed_deleg() call at the end of
> >     that same revoke_delegation() call, which calls
> >     nfs4_unlock_deleg_lease() and passes fp->fi_deleg_file, which is
> >     NULL (!!!), to vfs_setlease().
> >  These three happened in a 200usec window.
> > 
> > What I imagine might be happening is that the nfsd_break_deleg_cb()
> > callback is called after destroy_delegation() has unhashed the deleg,
> > but before destroy_unhashed_delegation() gets called.
> > 
> > If nfsd_break_deleg_cb() is called before the unhash - and particularly
> > if nfsd_break_one_deleg()->nfsd4_run_cb() is called before, then the
> > unhash will disconnect the delegation from the recall list, and no
> > harm can be done.
> > Once vfs_setlease(F_UNLCK) is called, the callback can no longer be
> > called, so again no harm is possible.
> > 
> > Between these two is a race window.  The delegation can be put on the
> > recall list, but the deleg will be unhashed and put_deleg_file() will
> > have set fi_deleg_file to NULL - resulting in first WARNING and the
> > BUG.
> 
> That seems plausible. I've been accepting defensive patches like
> what you proposed below, so I can queue that up for v6.2 as soon as
> you post an official version.
> 
> It would help to know the kernel version where you encountered
> these symptoms, and to have a rough description of the workload;
> I assume you do not have a reliable reproducer. I'm wondering if
> there should be a bug report too (bugzilla.linux-nfs.org)?
> 
Kernel version 5.3.18 plus various backported patches SLE15-SP3 from
July 2021, so not ancient but not the most recent either.

I don't have any workload information.
bug 394 seem much the same, though details might be different.

> 
> > I cannot see how the refcount_t warning can be generated ...  so maybe
> > I've missed something.
> 
> stid refcounting does not seem reliable in the current code base.
> It's possible that the overflow is a separate issue that simply
> appeared at the same time or due to the same conditions that
> triggered the BUG.

Maybe ... though refcount is quite noisy about anything suspicious....

Aha.. The refcount is at the start of the structure.  When slub frees an
allocation, it put is on a freelist with pointers at the start of the
allocation.  So freeing an nfsd_deleg will corrupt the refcount.
So when refcount_inc() is called on a freed nfsd_deleg, we get the
"saturation" error, because pointers tend to look like negative numbers.

One would expect to see sc_type corrupt too because it is within the
first 64 bits of the start of the struct, but it is set explicitly in
revoke_delegation() which happens after the struct is freed, and is
where the crash happens.

Oh good - I think I understand it all now.  Thanks :-)

NeilBrown