From: Luis Henriques
To: Jeff Layton
Cc: "Yan, Zheng", ceph-devel, Ilya Dryomov
Subject: Re: [RFC PATCH] ceph: guard against __ceph_remove_cap races
References: <20191212173159.35013-1-jlayton@kernel.org>
	<64d5a16d920098122144e0df8e03df0cadfb2784.camel@kernel.org>
Date: Wed, 11 Nov 2020 11:08:44 +0000
In-Reply-To: <64d5a16d920098122144e0df8e03df0cadfb2784.camel@kernel.org>
	(Jeff Layton's message of "Sun, 15 Dec 2019 17:40:21 -0500")
Message-ID: <871rh0f8w3.fsf@suse.de>

Jeff Layton writes:

> On Sat, 2019-12-14 at 10:46 +0800, Yan, Zheng wrote:
>> On Fri, Dec 13, 2019 at 1:32 AM Jeff Layton wrote:
>> > I believe it's possible that we could end up with racing calls to
>> > __ceph_remove_cap for the same cap. If that happens, the cap->ci
>> > pointer will be zeroed out and we can hit a NULL pointer dereference.
>> >
>> > Once we acquire the s_cap_lock, check for the ci pointer being NULL,
>> > and just return without doing anything if it is.
>> >
>> > URL: https://tracker.ceph.com/issues/43272
>> > Signed-off-by: Jeff Layton
>> > ---
>> >  fs/ceph/caps.c | 21 ++++++++++++++++-----
>> >  1 file changed, 16 insertions(+), 5 deletions(-)
>> >
>> > This is the only scenario that made sense to me in light of Ilya's
>> > analysis on the tracker above. I could be off here though -- the
>> > locking around this code is horrifically complex, and I could be
>> > missing what should guard against this scenario.
>> >
>>
>> I think the simpler fix is, in trim_caps_cb, to check that cap->ci is
>> non-NULL before calling __ceph_remove_cap(). This should work because
>> __ceph_remove_cap() is always called inside i_ceph_lock.
>>
>
> Is that sufficient though? The stack trace in the bug shows it being
> called by ceph_trim_caps, but I think we could hit the same problem
> with other __ceph_remove_cap callers, if they happen to race in at
> the right time.

Sorry for resurrecting this old thread, but we just got a report with
this issue on a kernel that includes commit d6e47819721a ("ceph: hold
i_ceph_lock when removing caps for freeing inode").
Looking at the code, I believe Zheng's suggestion should work, as I
don't see any __ceph_remove_cap callers that don't hold the
i_ceph_lock. So, would something like the diff below be acceptable?

Cheers,
--
Luis

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 8f1d7500a7ec..7dbb73099d2c 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1960,7 +1960,8 @@ static int trim_caps_cb(struct inode *inode, struct ceph_cap *cap, void *arg)
 
 	if (oissued) {
 		/* we aren't the only cap.. just remove us */
-		__ceph_remove_cap(cap, true);
+		if (cap->ci)
+			__ceph_remove_cap(cap, true);
 		(*remaining)--;
 	} else {
 		struct dentry *dentry;

>
>
>> > Thoughts?
>> >
>> > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
>> > index 9d09bb53c1ab..7e39ee8eff60 100644
>> > --- a/fs/ceph/caps.c
>> > +++ b/fs/ceph/caps.c
>> > @@ -1046,11 +1046,22 @@ static void drop_inode_snap_realm(struct ceph_inode_info *ci)
>> >  void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
>> >  {
>> >  	struct ceph_mds_session *session = cap->session;
>> > -	struct ceph_inode_info *ci = cap->ci;
>> > -	struct ceph_mds_client *mdsc =
>> > -		ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
>> > +	struct ceph_inode_info *ci;
>> > +	struct ceph_mds_client *mdsc;
>> >  	int removed = 0;
>> >
>> > +	spin_lock(&session->s_cap_lock);
>> > +	ci = cap->ci;
>> > +	if (!ci) {
>> > +		/*
>> > +		 * Did we race with a competing __ceph_remove_cap call? If
>> > +		 * ci is zeroed out, then just unlock and don't do anything.
>> > +		 * Assume that it's on its way out anyway.
>> > +		 */
>> > +		spin_unlock(&session->s_cap_lock);
>> > +		return;
>> > +	}
>> > +
>> >  	dout("__ceph_remove_cap %p from %p\n", cap, &ci->vfs_inode);
>> >
>> >  	/* remove from inode's cap rbtree, and clear auth cap */
>> > @@ -1058,13 +1069,12 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
>> >  	if (ci->i_auth_cap == cap)
>> >  		ci->i_auth_cap = NULL;
>> >
>> > -	/* remove from session list */
>> > -	spin_lock(&session->s_cap_lock);
>> >  	if (session->s_cap_iterator == cap) {
>> >  		/* not yet, we are iterating over this very cap */
>> >  		dout("__ceph_remove_cap delaying %p removal from session %p\n",
>> >  		     cap, cap->session);
>> >  	} else {
>> > +		/* remove from session list */
>> >  		list_del_init(&cap->session_caps);
>> >  		session->s_nr_caps--;
>> >  		cap->session = NULL;
>> > @@ -1072,6 +1082,7 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
>> >  	}
>> >  	/* protect backpointer with s_cap_lock: see iterate_session_caps */
>> >  	cap->ci = NULL;
>> > +	mdsc = ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
>> >
>> >  	/*
>> >  	 * s_cap_reconnect is protected by s_cap_lock. no one changes
>> > --
>> > 2.23.0
>>
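
As an aside, for anyone trying to reason about the race without the
full kernel sources: below is a minimal userspace sketch of the pattern
both fixes rely on, namely that the cap->ci backpointer is only read
and cleared while holding a lock, so a racing remover observes NULL and
bails out early. This is plain C with pthreads, not the real ceph code;
remove_cap(), struct inode_info, and nr_caps are invented for
illustration, and a pthread mutex stands in for session->s_cap_lock.

#include <pthread.h>
#include <stdio.h>

/* models session->s_cap_lock, which protects the ci backpointer */
static pthread_mutex_t s_cap_lock = PTHREAD_MUTEX_INITIALIZER;

struct inode_info { int nr_caps; };
struct cap { struct inode_info *ci; };	/* ci == NULL: already removed */

static void remove_cap(struct cap *cap)
{
	struct inode_info *ci;

	pthread_mutex_lock(&s_cap_lock);
	ci = cap->ci;
	if (!ci) {
		/* lost the race: a competing remover already cleared ci */
		pthread_mutex_unlock(&s_cap_lock);
		return;
	}
	cap->ci = NULL;		/* publish the removal to other racers */
	ci->nr_caps--;		/* safe: ci was still valid under the lock */
	pthread_mutex_unlock(&s_cap_lock);
	printf("removed cap, nr_caps now %d\n", ci->nr_caps);
}

static void *remover(void *arg)
{
	remove_cap(arg);
	return NULL;
}

int main(void)
{
	struct inode_info ci = { .nr_caps = 1 };
	struct cap cap = { .ci = &ci };
	pthread_t t1, t2;

	/* two racing removals of the same cap; exactly one does the work */
	pthread_create(&t1, NULL, remover, &cap);
	pthread_create(&t2, NULL, remover, &cap);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return ci.nr_caps;	/* 0 on success */
}

Building with "cc -pthread" and running it should print exactly one
"removed cap" line regardless of scheduling: one thread clears the
backpointer and does the bookkeeping, the other takes the early return,
which is the behaviour the s_cap_lock guard in Jeff's patch is meant to
provide in the kernel.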