From: Trond Myklebust <trondmy@hammerspace.com>
To: "bfields@fieldses.org" <bfields@fieldses.org>
Cc: "olivier@bm-services.com" <olivier@bm-services.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"carnil@debian.org" <carnil@debian.org>,
	"chuck.lever@oracle.com" <chuck.lever@oracle.com>
Subject: Re: Kernel panic / list_add corruption when in nfsd4_run_cb_work
Date: Wed, 24 Nov 2021 17:14:53 +0000
Message-ID: <e17bbb0a22b154c77c6ec82aad63424f70bfda94.camel@hammerspace.com>
In-Reply-To: <20211124161038.GC30602@fieldses.org>

On Wed, 2021-11-24 at 11:10 -0500, bfields@fieldses.org wrote:
> On Wed, Nov 24, 2021 at 03:59:47PM +0000, Trond Myklebust wrote:
> > On Wed, 2021-11-24 at 10:29 -0500, Bruce Fields wrote:
> > > On Mon, Nov 22, 2021 at 03:17:28PM +0000, Chuck Lever III wrote:
> > > > 
> > > > 
> > > > > On Nov 22, 2021, at 4:15 AM, Olivier Monaco
> > > > > <olivier@bm-services.com> wrote:
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > I think my problem is related to this thread.
> > > > > 
> > > > > We are running a VMware vCenter platform running 9 groups of
> > > > > virtual machines. Each group includes a VM with NFS for file
> > > > > sharing and 3 VMs with NFS clients, so we are running 9
> > > > > independent file servers.
> > > > 
> > > > I've opened https://bugzilla.linux-nfs.org/show_bug.cgi?id=371
> > > > 
> > > > Just a random thought: would enabling KASAN shed some light?
> > > 
> > > In fact, we've gotten reports from Red Hat QE of a KASAN
> > > use-after-free warning in the laundromat thread, which I think
> > > might be the same bug.  We've been getting occasional reports of
> > > problems here for a long time, but they've been very hard to
> > > reproduce.
> > > 
> > > After fooling with their reproducer, I think I finally have it.
> > > Embarrassingly, it's nothing that complicated.  You can make it
> > > much easier to reproduce by adding an msleep call after the
> > > vfs_setlease in nfs4_set_delegation.
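
For reference, a minimal sketch of that reproducer tweak, assuming the
mainline layout of nfs4_set_delegation() (the context lines and the
sleep duration here are illustrative only; msleep() needs
<linux/delay.h>):

--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ nfs4_set_delegation
 	status = vfs_setlease(fp->fi_deleg_file->nf_file, fl->fl_type, &fl, NULL);
+	/* widen the race window: let a delegation break arrive and queue
+	 * the recall callback before the delegation is hashed below */
+	msleep(100);
 	if (fl)
 		locks_free_lock(fl);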
> > > 
> > > If it's possible to run a patched kernel in production, you could
> > > try the following, and I'd definitely be interested in any results.
> > > 
> > > Otherwise, you can probably work around the problem by disabling
> > > delegations.  Something like
> > > 
> > >         echo "fs.leases-enable = 0" | sudo tee /etc/sysctl.d/nfsd-workaround.conf
> > > 
> > > should do it.
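
Note that a file under /etc/sysctl.d/ is only read at boot (or on
"sysctl --system"), so to turn leases off on the running server right
away as well:

        sudo sysctl -w fs.leases-enable=0

This only prevents new delegations from being handed out; it does not
recall delegations that already exist.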
> > > 
> > > Not sure if this fix is best or if there's something simpler.
> > > 
> > > --b.
> > > 
> > > commit 6de51237589e
> > > Author: J. Bruce Fields <bfields@redhat.com>
> > > Date:   Tue Nov 23 22:31:04 2021 -0500
> > > 
> > >     nfsd: fix use-after-free due to delegation race
> > >     
> > >     A delegation break could arrive as soon as we've called
> > >     vfs_setlease.  A delegation break runs a callback which
> > >     immediately (in nfsd4_cb_recall_prepare) adds the delegation
> > >     to del_recall_lru.  If we then exit nfs4_set_delegation
> > >     without hashing the delegation, it will be freed as soon as
> > >     the callback is done with it, without ever being removed from
> > >     del_recall_lru.
> > >
> > >     Symptoms show up later as use-after-free or list corruption
> > >     warnings, usually in the laundromat thread.
> > >
> > >     Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > > 
> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > index bfad94c70b84..8e063f49240b 100644
> > > --- a/fs/nfsd/nfs4state.c
> > > +++ b/fs/nfsd/nfs4state.c
> > > @@ -5159,15 +5159,16 @@ nfs4_set_delegation(struct nfs4_client *clp, struct svc_fh *fh,
> > >                 locks_free_lock(fl);
> > >         if (status)
> > >                 goto out_clnt_odstate;
> > > +
> > >         status = nfsd4_check_conflicting_opens(clp, fp);
> > > -       if (status)
> > > -               goto out_unlock;
> > >  
> > >         spin_lock(&state_lock);
> > >         spin_lock(&fp->fi_lock);
> > > -       if (fp->fi_had_conflict)
> > > +       if (status || fp->fi_had_conflict) {
> > > +               list_del_init(&dp->dl_recall_lru);
> > > +               dp->dl_time++;
> > >                 status = -EAGAIN;
> > > -       else
> > > +       } else
> > >                 status = hash_delegation_locked(dp, fp);
> > >         spin_unlock(&fp->fi_lock);
> > >         spin_unlock(&state_lock);
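
To spell out the race the patch addresses, using the function names
from the commit message above:

/*
 *  nfsd open path                     delegation break
 *  --------------                     ----------------
 *  nfs4_set_delegation()
 *    vfs_setlease()          ---->    nfsd_break_one_deleg()
 *                                       queues the recall callback
 *                                     nfsd4_cb_recall_prepare()
 *                                       adds dp to nn->del_recall_lru
 *    nfsd4_check_conflicting_opens()
 *      fails, so we return without
 *      hashing the delegation
 *                                     callback finishes and drops the
 *                                     last reference: dp is freed
 *                                     while still linked on
 *                                     del_recall_lru
 */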
> > 
> > Why won't this leak a reference to the stid?
> 
> I'm not seeing it.

Hmm... I thought we were leaking the reference taken by
nfsd_break_one_deleg(), but that does get cleaned up by
nfsd4_cb_recall_release()
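
For anyone following along, that take/put pair looks roughly like this
(paraphrased from current mainline, not quoted verbatim):

	static void nfsd_break_one_deleg(struct nfs4_delegation *dp)
	{
		/* pin the stid for the lifetime of the callback */
		refcount_inc(&dp->dl_stid.sc_count);
		nfsd4_run_cb(&dp->dl_recall);
	}

	static void nfsd4_cb_recall_release(struct nfsd4_callback *cb)
	{
		struct nfs4_delegation *dp = cb_to_delegation(cb);

		/* drop the callback's reference; if the delegation was
		 * never hashed, this may be the final put */
		nfs4_put_stid(&dp->dl_stid);
	}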

> 
> > Afaics nfsd_break_one_deleg() does take a reference before
> > launching
> > the callback daemon, and that reference is released when the
> > delegation is later processed by the laundromat.
> 
> Right.  Basically, there are two "long-lived" references, one taken
> as
> long as the callback's using it, one taken as long as it's "hashed"
> (see
> hash_delegation_locked/unhash_delegation_locked).
> 
> In the -EAGAIN case above, we're holding a temporary reference which
> will be dropped on exit from this function; if a callback's in
> progress,
> it will then drop the final reference.
> 
> > Hmm... Isn't the real bug here rather that the laundromat is
> > corrupting
> > both the nn->client_lru and nn->del_recall_lru lists because it is
> > using list_add() instead of list_move() when adding these objects
> > to
> > the reaplist?
> 
> If that were the problem, I think we'd be hitting it all the time.
> Looking....  No, unhash_delegation_locked did a list_del_init().
> 
> (Is the WARN_ON there too ugly?)
> 

It is a little nasty that we hide the list_del() calls in several
levels of function call, so they probably do deserve a comment.

That said, if the delegation was already unhashed, as in the case here,
unhash_delegation_locked() still ends up not calling list_del_init();
and since the laundromat's list_add() is not conditional on the unhash
having succeeded, the global list is corrupted again.

Yes, it is an unlikely race, but it is possible despite your change.
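
In other words, the laundromat loop would also need to make its
list_add() conditional on the unhash actually succeeding, something
like this (a sketch only, modelled on the mainline nfs4_laundromat()
loop):

	list_for_each_safe(pos, next, &nn->del_recall_lru) {
		dp = list_entry(pos, struct nfs4_delegation, dl_recall_lru);
		if (!state_expired(&lt, dp->dl_time))
			break;
		if (!unhash_delegation_locked(dp))
			/* already unhashed by someone else: dl_recall_lru
			 * may still be linked, so an unconditional
			 * list_add() would splice it onto two lists */
			continue;
		list_add(&dp->dl_recall_lru, &reaplist);
	}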

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com


