* Re: [RFC PATCH] ceph: guard against __ceph_remove_cap races
       [not found] ` <64d5a16d920098122144e0df8e03df0cadfb2784.camel@kernel.org>
@ 2020-11-11 11:08   ` Luis Henriques
  2020-11-11 13:09     ` Jeff Layton
  0 siblings, 1 reply; 5+ messages in thread
From: Luis Henriques @ 2020-11-11 11:08 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Yan, Zheng, ceph-devel, Ilya Dryomov

Jeff Layton <jlayton@kernel.org> writes:

> On Sat, 2019-12-14 at 10:46 +0800, Yan, Zheng wrote:
>> On Fri, Dec 13, 2019 at 1:32 AM Jeff Layton <jlayton@kernel.org> wrote:
>> > I believe it's possible that we could end up with racing calls to
>> > __ceph_remove_cap for the same cap. If that happens, the cap->ci
>> > pointer will be zeroed out and we can hit a NULL pointer dereference.
>> >
>> > Once we acquire the s_cap_lock, check for the ci pointer being NULL,
>> > and just return without doing anything if it is.
>> >
>> > URL: https://tracker.ceph.com/issues/43272
>> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
>> > ---
>> >  fs/ceph/caps.c | 21 ++++++++++++++++-----
>> >  1 file changed, 16 insertions(+), 5 deletions(-)
>> >
>> > This is the only scenario that made sense to me in light of Ilya's
>> > analysis on the tracker above. I could be off here though -- the locking
>> > around this code is horrifically complex, and I could be missing what
>> > should guard against this scenario.
>> >
>>
>> I think the simpler fix is, in trim_caps_cb, check if cap->ci is
>> non-null before calling __ceph_remove_cap(). This should work because
>> __ceph_remove_cap() is always called inside i_ceph_lock.
>>
>
> Is that sufficient though? The stack trace in the bug shows it being
> called by ceph_trim_caps, but I think we could hit the same problem with
> other __ceph_remove_cap callers, if they happen to race in at the right
> time.

Sorry for resurrecting this old thread, but we just got a report with this
issue on a kernel that includes commit d6e47819721a ("ceph: hold
i_ceph_lock when removing caps for freeing inode").

Looking at the code, I believe Zheng's suggestion should work, as I don't
see any __ceph_remove_cap callers that don't hold the i_ceph_lock. So,
would something like the diff below be acceptable?

Cheers,
--
Luis

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 8f1d7500a7ec..7dbb73099d2c 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1960,7 +1960,8 @@ static int trim_caps_cb(struct inode *inode, struct ceph_cap *cap, void *arg)
 
 	if (oissued) {
 		/* we aren't the only cap.. just remove us */
-		__ceph_remove_cap(cap, true);
+		if (cap->ci)
+			__ceph_remove_cap(cap, true);
 		(*remaining)--;
 	} else {
 		struct dentry *dentry;

>
>
>> > Thoughts?
>> >
>> > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
>> > index 9d09bb53c1ab..7e39ee8eff60 100644
>> > --- a/fs/ceph/caps.c
>> > +++ b/fs/ceph/caps.c
>> > @@ -1046,11 +1046,22 @@ static void drop_inode_snap_realm(struct ceph_inode_info *ci)
>> >  void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
>> >  {
>> >  	struct ceph_mds_session *session = cap->session;
>> > -	struct ceph_inode_info *ci = cap->ci;
>> > -	struct ceph_mds_client *mdsc =
>> > -		ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
>> > +	struct ceph_inode_info *ci;
>> > +	struct ceph_mds_client *mdsc;
>> >  	int removed = 0;
>> >
>> > +	spin_lock(&session->s_cap_lock);
>> > +	ci = cap->ci;
>> > +	if (!ci) {
>> > +		/*
>> > +		 * Did we race with a competing __ceph_remove_cap call? If
>> > +		 * ci is zeroed out, then just unlock and don't do anything.
>> > +		 * Assume that it's on its way out anyway.
>> > +		 */
>> > +		spin_unlock(&session->s_cap_lock);
>> > +		return;
>> > +	}
>> > +
>> >  	dout("__ceph_remove_cap %p from %p\n", cap, &ci->vfs_inode);
>> >
>> >  	/* remove from inode's cap rbtree, and clear auth cap */
>> > @@ -1058,13 +1069,12 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
>> >  	if (ci->i_auth_cap == cap)
>> >  		ci->i_auth_cap = NULL;
>> >
>> > -	/* remove from session list */
>> > -	spin_lock(&session->s_cap_lock);
>> >  	if (session->s_cap_iterator == cap) {
>> >  		/* not yet, we are iterating over this very cap */
>> >  		dout("__ceph_remove_cap delaying %p removal from session %p\n",
>> >  		     cap, cap->session);
>> >  	} else {
>> > +		/* remove from session list */
>> >  		list_del_init(&cap->session_caps);
>> >  		session->s_nr_caps--;
>> >  		cap->session = NULL;
>> > @@ -1072,6 +1082,7 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
>> >  	}
>> >  	/* protect backpointer with s_cap_lock: see iterate_session_caps */
>> >  	cap->ci = NULL;
>> > +	mdsc = ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
>> >
>> >  	/*
>> >  	 * s_cap_reconnect is protected by s_cap_lock. no one changes
>> > --
>> > 2.23.0
>> >

^ permalink raw reply related	[flat|nested] 5+ messages in thread
* Re: [RFC PATCH] ceph: guard against __ceph_remove_cap races
  2020-11-11 11:08 ` [RFC PATCH] ceph: guard against __ceph_remove_cap races Luis Henriques
@ 2020-11-11 13:09   ` Jeff Layton
  2020-11-11 14:11     ` Luis Henriques
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff Layton @ 2020-11-11 13:09 UTC (permalink / raw)
  To: Luis Henriques; +Cc: Yan, Zheng, ceph-devel, Ilya Dryomov

On Wed, 2020-11-11 at 11:08 +0000, Luis Henriques wrote:
> Jeff Layton <jlayton@kernel.org> writes:
>
> > On Sat, 2019-12-14 at 10:46 +0800, Yan, Zheng wrote:
> > > On Fri, Dec 13, 2019 at 1:32 AM Jeff Layton <jlayton@kernel.org> wrote:
> > > > I believe it's possible that we could end up with racing calls to
> > > > __ceph_remove_cap for the same cap. If that happens, the cap->ci
> > > > pointer will be zeroed out and we can hit a NULL pointer dereference.
> > > >
> > > > Once we acquire the s_cap_lock, check for the ci pointer being NULL,
> > > > and just return without doing anything if it is.
> > > >
> > > > URL: https://tracker.ceph.com/issues/43272
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > >  fs/ceph/caps.c | 21 ++++++++++++++++-----
> > > >  1 file changed, 16 insertions(+), 5 deletions(-)
> > > >
> > > > This is the only scenario that made sense to me in light of Ilya's
> > > > analysis on the tracker above. I could be off here though -- the locking
> > > > around this code is horrifically complex, and I could be missing what
> > > > should guard against this scenario.
> > > >
> > >
> > > I think the simpler fix is, in trim_caps_cb, check if cap->ci is
> > > non-null before calling __ceph_remove_cap(). This should work because
> > > __ceph_remove_cap() is always called inside i_ceph_lock.
> > >
> >
> > Is that sufficient though? The stack trace in the bug shows it being
> > called by ceph_trim_caps, but I think we could hit the same problem with
> > other __ceph_remove_cap callers, if they happen to race in at the right
> > time.
>
> Sorry for resurrecting this old thread, but we just got a report with this
> issue on a kernel that includes commit d6e47819721a ("ceph: hold
> i_ceph_lock when removing caps for freeing inode").
>
> Looking at the code, I believe Zheng's suggestion should work, as I don't
> see any __ceph_remove_cap callers that don't hold the i_ceph_lock. So,
> would something like the diff below be acceptable?
>
> Cheers,

I'm still not convinced that's the correct fix.

Why would trim_caps_cb be subject to this race when other
__ceph_remove_cap callers are not? Maybe the right fix is to test for a
NULL cap->ci in __ceph_remove_cap and just return early if it is?
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: [RFC PATCH] ceph: guard against __ceph_remove_cap races
  2020-11-11 13:09     ` Jeff Layton
@ 2020-11-11 14:11       ` Luis Henriques
  2020-11-11 14:24         ` Jeff Layton
  0 siblings, 1 reply; 5+ messages in thread
From: Luis Henriques @ 2020-11-11 14:11 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Yan, Zheng, ceph-devel, Ilya Dryomov

Jeff Layton <jlayton@kernel.org> writes:

> On Wed, 2020-11-11 at 11:08 +0000, Luis Henriques wrote:
>> Jeff Layton <jlayton@kernel.org> writes:
>>
>> > On Sat, 2019-12-14 at 10:46 +0800, Yan, Zheng wrote:
>> > > On Fri, Dec 13, 2019 at 1:32 AM Jeff Layton <jlayton@kernel.org> wrote:
>> > > > I believe it's possible that we could end up with racing calls to
>> > > > __ceph_remove_cap for the same cap. If that happens, the cap->ci
>> > > > pointer will be zeroed out and we can hit a NULL pointer dereference.
>> > > >
>> > > > Once we acquire the s_cap_lock, check for the ci pointer being NULL,
>> > > > and just return without doing anything if it is.
>> > > >
>> > > > URL: https://tracker.ceph.com/issues/43272
>> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
>> > > > ---
>> > > >  fs/ceph/caps.c | 21 ++++++++++++++++-----
>> > > >  1 file changed, 16 insertions(+), 5 deletions(-)
>> > > >
>> > > > This is the only scenario that made sense to me in light of Ilya's
>> > > > analysis on the tracker above. I could be off here though -- the locking
>> > > > around this code is horrifically complex, and I could be missing what
>> > > > should guard against this scenario.
>> > > >
>> > >
>> > > I think the simpler fix is, in trim_caps_cb, check if cap->ci is
>> > > non-null before calling __ceph_remove_cap(). This should work because
>> > > __ceph_remove_cap() is always called inside i_ceph_lock.
>> > >
>> >
>> > Is that sufficient though? The stack trace in the bug shows it being
>> > called by ceph_trim_caps, but I think we could hit the same problem with
>> > other __ceph_remove_cap callers, if they happen to race in at the right
>> > time.
>>
>> Sorry for resurrecting this old thread, but we just got a report with this
>> issue on a kernel that includes commit d6e47819721a ("ceph: hold
>> i_ceph_lock when removing caps for freeing inode").
>>
>> Looking at the code, I believe Zheng's suggestion should work, as I don't
>> see any __ceph_remove_cap callers that don't hold the i_ceph_lock. So,
>> would something like the diff below be acceptable?
>>
>> Cheers,
>
> I'm still not convinced that's the correct fix.
>
> Why would trim_caps_cb be subject to this race when other
> __ceph_remove_cap callers are not? Maybe the right fix is to test for a
> NULL cap->ci in __ceph_remove_cap and just return early if it is?

I see, you're probably right. Looking again at the code, I see that there
are two possible places where this race could occur, and they're both used
as callbacks in ceph_iterate_session_caps: trim_caps_cb and
remove_session_caps_cb. These callbacks get the struct ceph_cap as an
argument, and only then do they acquire the i_ceph_lock. Since this isn't
protected by session->s_cap_lock, I guess this is where the race window
is, where cap->ci can be set to NULL.

Below is the patch you suggested. If you think that's acceptable, I can
resend it with a proper commit message.

Cheers,
--
Luis

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index ded4229c314a..917dfaf0bd01 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -1140,12 +1140,17 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
 {
 	struct ceph_mds_session *session = cap->session;
 	struct ceph_inode_info *ci = cap->ci;
-	struct ceph_mds_client *mdsc =
-		ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
+	struct ceph_mds_client *mdsc;
+
 	int removed = 0;
 
+	if (!ci)
+		return;
+
 	dout("__ceph_remove_cap %p from %p\n", cap, &ci->vfs_inode);
 
+	mdsc = ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
+
 	/* remove from inode's cap rbtree, and clear auth cap */
 	rb_erase(&cap->ci_node, &ci->i_caps);
 	if (ci->i_auth_cap == cap) {

^ permalink raw reply related	[flat|nested] 5+ messages in thread
* Re: [RFC PATCH] ceph: guard against __ceph_remove_cap races
  2020-11-11 14:11       ` Luis Henriques
@ 2020-11-11 14:24         ` Jeff Layton
  2020-11-11 14:34           ` Luis Henriques
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff Layton @ 2020-11-11 14:24 UTC (permalink / raw)
  To: Luis Henriques; +Cc: Yan, Zheng, ceph-devel, Ilya Dryomov

On Wed, 2020-11-11 at 14:11 +0000, Luis Henriques wrote:
>
>

I think this looks reasonable. Minor nits below:

> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index ded4229c314a..917dfaf0bd01 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -1140,12 +1140,17 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
>  {
>  	struct ceph_mds_session *session = cap->session;
>  	struct ceph_inode_info *ci = cap->ci;
> -	struct ceph_mds_client *mdsc =
> -		ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
> +	struct ceph_mds_client *mdsc;
> +

nit: remove the above newline

>  	int removed = 0;
> 

Maybe add a comment here to the effect that a NULL cap->ci indicates
that the remove has already been done?

> +	if (!ci)
> +		return;
> +
>  	dout("__ceph_remove_cap %p from %p\n", cap, &ci->vfs_inode);
> 
> +	mdsc = ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
> +

There's a ceph_inode_to_client helper now that may make this a bit more
readable.

>  	/* remove from inode's cap rbtree, and clear auth cap */
>  	rb_erase(&cap->ci_node, &ci->i_caps);
>  	if (ci->i_auth_cap == cap) {
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: [RFC PATCH] ceph: guard against __ceph_remove_cap races
  2020-11-11 14:24         ` Jeff Layton
@ 2020-11-11 14:34           ` Luis Henriques
  0 siblings, 0 replies; 5+ messages in thread
From: Luis Henriques @ 2020-11-11 14:34 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Yan, Zheng, ceph-devel, Ilya Dryomov

Jeff Layton <jlayton@kernel.org> writes:

> On Wed, 2020-11-11 at 14:11 +0000, Luis Henriques wrote:
>>
>>
>
> I think this looks reasonable. Minor nits below:
>
>> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
>> index ded4229c314a..917dfaf0bd01 100644
>> --- a/fs/ceph/caps.c
>> +++ b/fs/ceph/caps.c
>> @@ -1140,12 +1140,17 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
>>  {
>>  	struct ceph_mds_session *session = cap->session;
>>  	struct ceph_inode_info *ci = cap->ci;
>> -	struct ceph_mds_client *mdsc =
>> -		ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
>> +	struct ceph_mds_client *mdsc;
>> +
>
> nit: remove the above newline
>
>>  	int removed = 0;
>>
>
> Maybe add a comment here to the effect that a NULL cap->ci indicates
> that the remove has already been done?
>
>> +	if (!ci)
>> +		return;
>> +
>>  	dout("__ceph_remove_cap %p from %p\n", cap, &ci->vfs_inode);
>>
>> +	mdsc = ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
>> +
>
> There's a ceph_inode_to_client helper now that may make this a bit more
> readable.
>
>>  	/* remove from inode's cap rbtree, and clear auth cap */
>>  	rb_erase(&cap->ci_node, &ci->i_caps);
>>  	if (ci->i_auth_cap == cap) {

Thanks, Jeff. I'll re-post this soon with your suggestions. I just want
to run some more local tests to make sure things aren't breaking with
this change.

Cheers,
--
Luis

^ permalink raw reply	[flat|nested] 5+ messages in thread
end of thread, other threads:[~2020-11-11 14:34 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20191212173159.35013-1-jlayton@kernel.org>
     [not found] ` <CAAM7YAmquOg5ESMAMa5y0gGAR-UAivYF8m+nqrJNmK=SzG6+wA@mail.gmail.com>
     [not found]   ` <64d5a16d920098122144e0df8e03df0cadfb2784.camel@kernel.org>
2020-11-11 11:08     ` [RFC PATCH] ceph: guard against __ceph_remove_cap races Luis Henriques
2020-11-11 13:09       ` Jeff Layton
2020-11-11 14:11         ` Luis Henriques
2020-11-11 14:24           ` Jeff Layton
2020-11-11 14:34             ` Luis Henriques
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.