ceph-devel.vger.kernel.org archive mirror
From: Jeff Layton <jlayton@kernel.org>
To: Luis Henriques <lhenriques@suse.de>
Cc: ceph-devel@vger.kernel.org, idryomov@gmail.com,
	stable@vger.kernel.org, Sage Weil <sage@redhat.com>,
	Mark Nelson <mnelson@redhat.com>
Subject: Re: [PATCH v2] ceph: ensure we take snap_empty_lock atomically with snaprealm refcount change
Date: Wed, 04 Aug 2021 12:32:54 -0400
Message-ID: <a7566b1207ab66f4bdbeef8b653e97d5849a177f.camel@kernel.org>
In-Reply-To: <87o8adi3bo.fsf@suse.de>

On Wed, 2021-08-04 at 17:26 +0100, Luis Henriques wrote:
> Jeff Layton <jlayton@kernel.org> writes:
> 
> > There is a race in ceph_put_snap_realm. The change to the nref and the
> > spinlock acquisition are not done atomically, so you could decrement nref,
> > and before you take the spinlock, the nref is incremented again. At that
> > point, you end up putting it on the empty list when it shouldn't be
> > there. Eventually __cleanup_empty_realms runs and frees it when it's
> > still in use.
> > 
> > Fix this by protecting the 1->0 transition with atomic_dec_and_lock, and
> > just drop the spinlock if we can get the rwsem.
> > 
> > Because these objects can also undergo a 0->1 refcount transition, we
> > must protect that change as well with the spinlock. Increment locklessly
> > unless the value is at 0, in which case we take the spinlock, increment
> > and then take it off the empty list if it did the 0->1 transition.
> > 
> > With these changes, I'm removing the dout() messages from these
> > functions, as well as in __put_snap_realm. They've always been racy, and
> > it's better to not print values that may be misleading.
> > 
> > Cc: stable@vger.kernel.org
> > Cc: Sage Weil <sage@redhat.com>
> > Reported-by: Mark Nelson <mnelson@redhat.com>
> > URL: https://tracker.ceph.com/issues/46419
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/ceph/snap.c | 34 +++++++++++++++++-----------------
> >  1 file changed, 17 insertions(+), 17 deletions(-)
> > 
> > v2: No functional changes, but I cleaned up the comments a bit and
> >     added another in __put_snap_realm.
> > 
> > diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> > index 9dbc92cfda38..158c11e96fb7 100644
> > --- a/fs/ceph/snap.c
> > +++ b/fs/ceph/snap.c
> > @@ -67,19 +67,19 @@ void ceph_get_snap_realm(struct ceph_mds_client *mdsc,
> >  {
> >  	lockdep_assert_held(&mdsc->snap_rwsem);
> >  
> > -	dout("get_realm %p %d -> %d\n", realm,
> > -	     atomic_read(&realm->nref), atomic_read(&realm->nref)+1);
> >  	/*
> > -	 * since we _only_ increment realm refs or empty the empty
> > -	 * list with snap_rwsem held, adjusting the empty list here is
> > -	 * safe.  we do need to protect against concurrent empty list
> > -	 * additions, however.
> > +	 * The 0->1 and 1->0 transitions must take the snap_empty_lock
> > +	 * atomically with the refcount change. Go ahead and bump the
> > +	 * nref here, unless it's 0, in which case we take the spinlock
> > +	 * and then do the increment and remove it from the list.
> >  	 */
> > -	if (atomic_inc_return(&realm->nref) == 1) {
> > -		spin_lock(&mdsc->snap_empty_lock);
> > +	if (atomic_add_unless(&realm->nref, 1, 0))
> 
> Here you could probably use atomic_inc_not_zero() instead.  But other
> than that it looks good.  Thanks a lot for solving yet another locking
> puzzle!
> 
> Reviewed-by: Luis Henriques <lhenriques@suse.de>
> 
> Cheers,

Good point! That is a little clearer. I'll incorporate that change and
merge it.

Thanks,
-- 
Jeff Layton <jlayton@kernel.org>
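The race in the commit message's first paragraph can be replayed deterministically. The following is a minimal single-threaded sketch, not kernel code. `struct race_state` and `replay_old_race()` are hypothetical names, the refcount is a plain `int`, and membership on the empty list is reduced to a `bool`. It assumes task A (put) drops the last reference and cannot get snap_rwsem, so it takes the path that queues the realm. Task B (get) holds snap_rwsem and follows the pre-patch `atomic_inc_return() == 1` path shown in the diff.

```c
#include <stdbool.h>

/* Hypothetical, simplified state: only what the race touches. */
struct race_state {
	int nref;            /* stands in for realm->nref */
	bool on_empty_list;  /* stands in for membership on the empty list */
};

/* Replays one bad interleaving of the pre-patch get/put paths, step by
 * step, without real threads. Task A starts out holding the only ref. */
static struct race_state replay_old_race(void)
{
	struct race_state r = { .nref = 1, .on_empty_list = false };

	/* A (put): nref 1 -> 0. A cannot get snap_rwsem (B holds it), so it
	 * will queue the realm, but it has not taken snap_empty_lock yet. */
	bool a_dropped_last = (--r.nref == 0);

	/* B (get, snap_rwsem held): nref 0 -> 1, so B takes snap_empty_lock
	 * and removes the realm from the empty list. The realm is not on
	 * the list yet, so this changes nothing. */
	if (++r.nref == 1)
		r.on_empty_list = false;

	/* A resumes: takes snap_empty_lock and queues the realm, even though
	 * B now holds a reference to it. */
	if (a_dropped_last)
		r.on_empty_list = true;

	return r;
}
```

The end state, a realm with nref == 1 that is still on the empty list, is what later allows __cleanup_empty_realms to free a realm that is in use.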

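The fix in the commit message can also be modelled in userspace. Below is a hedged sketch using C11 atomics. A test-and-set flag stands in for snap_empty_lock, and every `model_*` name is hypothetical. `model_add_unless()` follows the documented contract of the kernel's `atomic_add_unless()`. `model_dec_and_lock()` is modelled on the kernel's generic `_atomic_dec_and_lock()`: it decrements locklessly unless the count is 1, and otherwise decrements under the lock. `model_get_realm()` increments locklessly unless nref is 0. It writes this as `add_unless(v, 1, 0)`, which is what the suggested `atomic_inc_not_zero()` means. `model_put_realm()` covers only the branch where snap_rwsem cannot be taken, so the realm is queued rather than destroyed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for mdsc->snap_empty_lock. */
static atomic_flag model_empty_lock = ATOMIC_FLAG_INIT;

static void model_spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		; /* spin until the holder clears the flag */
}

static void model_spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Add @a to @v unless @v == @u; returns true if the add happened. */
static bool model_add_unless(atomic_int *v, int a, int u)
{
	int c = atomic_load(v);

	while (c != u) {
		/* On failure, c is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &c, c + a))
			return true;
	}
	return false;
}

/* Decrement @v; if it reaches 0, return true with @l still held. */
static bool model_dec_and_lock(atomic_int *v, atomic_flag *l)
{
	if (model_add_unless(v, -1, 1))   /* not the 1->0 drop: lockless */
		return false;

	model_spin_lock(l);
	if (atomic_fetch_sub(v, 1) == 1)  /* 1->0 happened under the lock */
		return true;
	model_spin_unlock(l);
	return false;
}

struct model_realm {
	atomic_int nref;
	bool on_empty_list;   /* protected by model_empty_lock */
};

static void model_get_realm(struct model_realm *r)
{
	/* Fast path: any nonzero count can be bumped without the lock. */
	if (model_add_unless(&r->nref, 1, 0))
		return;

	/* 0->1 happens under the lock, so it cannot interleave with a
	 * put that is about to queue the realm. */
	model_spin_lock(&model_empty_lock);
	if (atomic_fetch_add(&r->nref, 1) == 0)
		r->on_empty_list = false;     /* list_del_init() analogue */
	model_spin_unlock(&model_empty_lock);
}

/* Models only the branch where snap_rwsem cannot be taken. */
static void model_put_realm(struct model_realm *r)
{
	assert(atomic_load(&r->nref) > 0);    /* underflow check, advisory */
	if (!model_dec_and_lock(&r->nref, &model_empty_lock))
		return;
	r->on_empty_list = true;              /* still under the lock */
	model_spin_unlock(&model_empty_lock);
}
```

Both the 0->1 and 1->0 transitions now happen with the lock held. As a result, a get cannot slip in between a put's final decrement and its queueing of the realm.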


Thread overview: 3 messages
2021-08-04 15:55 [PATCH v2] ceph: ensure we take snap_empty_lock atomically with snaprealm refcount change Jeff Layton
2021-08-04 16:26 ` Luis Henriques
2021-08-04 16:32   ` Jeff Layton [this message]