From: Jeff Layton <jlayton@kernel.org>
To: ceph-devel@vger.kernel.org
Cc: idryomov@gmail.com, Sage Weil <sage@redhat.com>,
Mark Nelson <mnelson@redhat.com>
Subject: Re: [PATCH] ceph: ensure we take snap_empty_lock atomically with refcount change
Date: Tue, 03 Aug 2021 14:33:22 -0400
Message-ID: <05ed962c013a62dbe1b17dcc19bb02a454bdb094.camel@kernel.org>
In-Reply-To: <20210803175126.29165-1-jlayton@kernel.org>
On Tue, 2021-08-03 at 13:51 -0400, Jeff Layton wrote:
> There is a race in ceph_put_snap_realm. The change to the nref and the
> spinlock acquisition are not done atomically, so you could change nref,
> and before you take the spinlock, the nref is incremented again. At that
> point, you end up putting it on the empty list when it shouldn't be
> there. Eventually __cleanup_empty_realms runs and frees it when it's
> still in-use.
>
> Fix this by protecting the 1->0 transition with atomic_dec_and_lock,
> which should ensure that the race can't occur.
>
> Because these objects can also undergo a 0->1 refcount transition, we
> must protect that change as well with the spinlock. Increment locklessly
> unless the value is at 0, in which case we take the spinlock, increment
> and then take it off the empty list.
>
> With these changes, I'm removing the dout() messages from these
> functions as well. They've always been racy, and it's better to not
> print values that may be misleading.
>
> Cc: Sage Weil <sage@redhat.com>
> Reported-by: Mark Nelson <mnelson@redhat.com>
> URL: https://tracker.ceph.com/issues/46419
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
> fs/ceph/snap.c | 29 ++++++++++++++---------------
> 1 file changed, 14 insertions(+), 15 deletions(-)
>
> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> index 9dbc92cfda38..c81ba22711a5 100644
> --- a/fs/ceph/snap.c
> +++ b/fs/ceph/snap.c
> @@ -67,19 +67,20 @@ void ceph_get_snap_realm(struct ceph_mds_client *mdsc,
> {
> lockdep_assert_held(&mdsc->snap_rwsem);
>
> - dout("get_realm %p %d -> %d\n", realm,
> - atomic_read(&realm->nref), atomic_read(&realm->nref)+1);
> /*
> - * since we _only_ increment realm refs or empty the empty
> - * list with snap_rwsem held, adjusting the empty list here is
> - * safe. we do need to protect against concurrent empty list
> - * additions, however.
> + * The 0->1 and 1->0 transitions must take the snap_empty_lock
> + * atomically with the refcount change. Go ahead and bump the
> + * nref * here, unless it's 0, in which case we take the
I cleaned up the comment block above before merging this into the
testing branch. I've also marked it for stable kernels.
> + * spinlock and then do the increment and remove it from the
> + * list.
> */
> - if (atomic_inc_return(&realm->nref) == 1) {
> - spin_lock(&mdsc->snap_empty_lock);
> + if (atomic_add_unless(&realm->nref, 1, 0))
> + return;
> +
> + spin_lock(&mdsc->snap_empty_lock);
> + if (atomic_inc_return(&realm->nref) == 1)
> list_del_init(&realm->empty_item);
> - spin_unlock(&mdsc->snap_empty_lock);
> - }
> + spin_unlock(&mdsc->snap_empty_lock);
> }
>
> static void __insert_snap_realm(struct rb_root *root,
> @@ -215,21 +216,19 @@ static void __put_snap_realm(struct ceph_mds_client *mdsc,
> }
>
> /*
> - * caller needn't hold any locks
> + * See comments in ceph_get_snap_realm. Caller needn't hold any locks.
> */
> void ceph_put_snap_realm(struct ceph_mds_client *mdsc,
> struct ceph_snap_realm *realm)
> {
> - dout("put_snap_realm %llx %p %d -> %d\n", realm->ino, realm,
> - atomic_read(&realm->nref), atomic_read(&realm->nref)-1);
> - if (!atomic_dec_and_test(&realm->nref))
> + if (!atomic_dec_and_lock(&realm->nref, &mdsc->snap_empty_lock))
> return;
>
> if (down_write_trylock(&mdsc->snap_rwsem)) {
> + spin_unlock(&mdsc->snap_empty_lock);
> __destroy_snap_realm(mdsc, realm);
> up_write(&mdsc->snap_rwsem);
> } else {
> - spin_lock(&mdsc->snap_empty_lock);
> list_add(&realm->empty_item, &mdsc->snap_empty);
> spin_unlock(&mdsc->snap_empty_lock);
> }
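
For anyone following along, here is a rough userspace sketch of the
get/put pattern the patch moves to. It is not the kernel code: it assumes
C11 atomics and a pthread mutex in place of atomic_t and the spinlock, and
the names struct obj, inc_unless(), dec_and_lock(), obj_get(), obj_put()
and the on_empty_list flag (standing in for the empty_item list linkage)
are all made up for illustration.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted object that is parked on an
 * "empty" list while its refcount is 0. */
struct obj {
	atomic_int nref;
	bool on_empty_list;	/* stands in for the list linkage */
};

static pthread_mutex_t empty_lock = PTHREAD_MUTEX_INITIALIZER;

/* Userspace analogue of atomic_add_unless(v, 1, unless): increment
 * unless the current value equals 'unless'. Returns true if it
 * incremented. */
static bool inc_unless(atomic_int *v, int unless)
{
	int old = atomic_load(v);

	while (old != unless) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Userspace analogue of atomic_dec_and_lock(): decrement, and return
 * true with the lock held only if the count dropped to 0. */
static bool dec_and_lock(atomic_int *v, pthread_mutex_t *lock)
{
	int old = atomic_load(v);

	/* Fast path: lockless decrement unless this would be 1->0. */
	while (old != 1) {
		if (atomic_compare_exchange_weak(v, &old, old - 1))
			return false;
	}

	/* Slow path: the possible 1->0 transition happens under the lock. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(v, 1) == 1)
		return true;
	pthread_mutex_unlock(lock);
	return false;
}

static void obj_get(struct obj *o)
{
	/* Lockless unless the count is 0. */
	if (inc_unless(&o->nref, 0))
		return;

	/* 0->1 transition: take the lock, bump, leave the empty list. */
	pthread_mutex_lock(&empty_lock);
	if (atomic_fetch_add(&o->nref, 1) == 0)
		o->on_empty_list = false;
	pthread_mutex_unlock(&empty_lock);
}

static void obj_put(struct obj *o)
{
	if (!dec_and_lock(&o->nref, &empty_lock))
		return;

	/* Count hit 0 with the lock held: park it on the empty list. */
	o->on_empty_list = true;
	pthread_mutex_unlock(&empty_lock);
}
```

The point of the sketch is only that both the 0->1 and the 1->0 transition
happen with the lock held, so a concurrent get can't slip in between the
decrement and the list insertion. It leaves out the trylock/destroy branch
from ceph_put_snap_realm entirely.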
--
Jeff Layton <jlayton@kernel.org>