* Re: [RFC PATCH] ceph: guard against __ceph_remove_cap races
       [not found] ` <64d5a16d920098122144e0df8e03df0cadfb2784.camel@kernel.org>
@ 2020-11-11 11:08   ` Luis Henriques
  2020-11-11 13:09     ` Jeff Layton
  0 siblings, 1 reply; 5+ messages in thread
From: Luis Henriques @ 2020-11-11 11:08 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Yan, Zheng, ceph-devel, Ilya Dryomov

Jeff Layton <jlayton@kernel.org> writes:

> On Sat, 2019-12-14 at 10:46 +0800, Yan, Zheng wrote:
>> On Fri, Dec 13, 2019 at 1:32 AM Jeff Layton <jlayton@kernel.org> wrote:
>> > I believe it's possible that we could end up with racing calls to
>> > __ceph_remove_cap for the same cap. If that happens, the cap->ci
>> > pointer will be zeroed out and we can hit a NULL pointer dereference.
>> >
>> > Once we acquire the s_cap_lock, check for the ci pointer being NULL,
>> > and just return without doing anything if it is.
>> >
>> > URL: https://tracker.ceph.com/issues/43272
>> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
>> > ---
>> >  fs/ceph/caps.c | 21 ++++++++++++++++-----
>> >  1 file changed, 16 insertions(+), 5 deletions(-)
>> >
>> > This is the only scenario that made sense to me in light of Ilya's
>> > analysis on the tracker above. I could be off here though -- the locking
>> > around this code is horrifically complex, and I could be missing what
>> > should guard against this scenario.
>> >
>>
>> I think the simpler fix is, in trim_caps_cb, check if cap->ci is
>> non-null before calling __ceph_remove_cap(). This should work because
>> __ceph_remove_cap() is always called inside i_ceph_lock.
>>
>
> Is that sufficient though? The stack trace in the bug shows it being
> called by ceph_trim_caps, but I think we could hit the same problem with
> other __ceph_remove_cap callers, if they happen to race in at the right
> time.

Sorry for resurrecting this old thread, but we just got a report with this
issue on a kernel that includes commit d6e47819721a ("ceph: hold
i_ceph_lock when removing caps for freeing inode").

Looking at the code, I believe Zheng's suggestion should work, as I don't
see any __ceph_remove_cap callers that don't hold the i_ceph_lock. So,
would something like the diff below be acceptable?

Cheers,
--
Luis

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 8f1d7500a7ec..7dbb73099d2c 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1960,7 +1960,8 @@ static int trim_caps_cb(struct inode *inode, struct ceph_cap *cap, void *arg)
 
 	if (oissued) {
 		/* we aren't the only cap.. just remove us */
-		__ceph_remove_cap(cap, true);
+		if (cap->ci)
+			__ceph_remove_cap(cap, true);
 		(*remaining)--;
 	} else {
 		struct dentry *dentry;

>
>
>> > Thoughts?
>> >
>> > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
>> > index 9d09bb53c1ab..7e39ee8eff60 100644
>> > --- a/fs/ceph/caps.c
>> > +++ b/fs/ceph/caps.c
>> > @@ -1046,11 +1046,22 @@ static void drop_inode_snap_realm(struct ceph_inode_info *ci)
>> >  void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
>> >  {
>> >  	struct ceph_mds_session *session = cap->session;
>> > -	struct ceph_inode_info *ci = cap->ci;
>> > -	struct ceph_mds_client *mdsc =
>> > -		ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
>> > +	struct ceph_inode_info *ci;
>> > +	struct ceph_mds_client *mdsc;
>> >  	int removed = 0;
>> >
>> > +	spin_lock(&session->s_cap_lock);
>> > +	ci = cap->ci;
>> > +	if (!ci) {
>> > +		/*
>> > +		 * Did we race with a competing __ceph_remove_cap call? If
>> > +		 * ci is zeroed out, then just unlock and don't do anything.
>> > +		 * Assume that it's on its way out anyway.
>> > +		 */
>> > +		spin_unlock(&session->s_cap_lock);
>> > +		return;
>> > +	}
>> > +
>> >  	dout("__ceph_remove_cap %p from %p\n", cap, &ci->vfs_inode);
>> >
>> >  	/* remove from inode's cap rbtree, and clear auth cap */
>> > @@ -1058,13 +1069,12 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
>> >  	if (ci->i_auth_cap == cap)
>> >  		ci->i_auth_cap = NULL;
>> >
>> > -	/* remove from session list */
>> > -	spin_lock(&session->s_cap_lock);
>> >  	if (session->s_cap_iterator == cap) {
>> >  		/* not yet, we are iterating over this very cap */
>> >  		dout("__ceph_remove_cap delaying %p removal from session %p\n",
>> >  		     cap, cap->session);
>> >  	} else {
>> > +		/* remove from session list */
>> >  		list_del_init(&cap->session_caps);
>> >  		session->s_nr_caps--;
>> >  		cap->session = NULL;
>> > @@ -1072,6 +1082,7 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
>> >  	}
>> >  	/* protect backpointer with s_cap_lock: see iterate_session_caps */
>> >  	cap->ci = NULL;
>> > +	mdsc = ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
>> >
>> >  	/*
>> >  	 * s_cap_reconnect is protected by s_cap_lock. no one changes
>> > --
>> > 2.23.0
>> >

^ permalink raw reply related	[flat|nested] 5+ messages in thread
* Re: [RFC PATCH] ceph: guard against __ceph_remove_cap races
  2020-11-11 11:08 ` [RFC PATCH] ceph: guard against __ceph_remove_cap races Luis Henriques
@ 2020-11-11 13:09   ` Jeff Layton
  2020-11-11 14:11     ` Luis Henriques
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff Layton @ 2020-11-11 13:09 UTC (permalink / raw)
  To: Luis Henriques; +Cc: Yan, Zheng, ceph-devel, Ilya Dryomov

On Wed, 2020-11-11 at 11:08 +0000, Luis Henriques wrote:
> Jeff Layton <jlayton@kernel.org> writes:
>
> > On Sat, 2019-12-14 at 10:46 +0800, Yan, Zheng wrote:
> > > On Fri, Dec 13, 2019 at 1:32 AM Jeff Layton <jlayton@kernel.org> wrote:
> > > > I believe it's possible that we could end up with racing calls to
> > > > __ceph_remove_cap for the same cap. If that happens, the cap->ci
> > > > pointer will be zeroed out and we can hit a NULL pointer dereference.
> > > >
> > > > Once we acquire the s_cap_lock, check for the ci pointer being NULL,
> > > > and just return without doing anything if it is.
> > > >
> > > > URL: https://tracker.ceph.com/issues/43272
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > >  fs/ceph/caps.c | 21 ++++++++++++++++-----
> > > >  1 file changed, 16 insertions(+), 5 deletions(-)
> > > >
> > > > This is the only scenario that made sense to me in light of Ilya's
> > > > analysis on the tracker above. I could be off here though -- the locking
> > > > around this code is horrifically complex, and I could be missing what
> > > > should guard against this scenario.
> > > >
> > >
> > > I think the simpler fix is, in trim_caps_cb, check if cap->ci is
> > > non-null before calling __ceph_remove_cap(). This should work because
> > > __ceph_remove_cap() is always called inside i_ceph_lock.
> > >
> >
> > Is that sufficient though? The stack trace in the bug shows it being
> > called by ceph_trim_caps, but I think we could hit the same problem with
> > other __ceph_remove_cap callers, if they happen to race in at the right
> > time.
>
> Sorry for resurrecting this old thread, but we just got a report with this
> issue on a kernel that includes commit d6e47819721a ("ceph: hold
> i_ceph_lock when removing caps for freeing inode").
>
> Looking at the code, I believe Zheng's suggestion should work, as I don't
> see any __ceph_remove_cap callers that don't hold the i_ceph_lock. So,
> would something like the diff below be acceptable?
>
> Cheers,

I'm still not convinced that's the correct fix.

Why would trim_caps_cb be subject to this race when other
__ceph_remove_cap callers are not? Maybe the right fix is to test for a
NULL cap->ci in __ceph_remove_cap and just return early if it is?
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: [RFC PATCH] ceph: guard against __ceph_remove_cap races
  2020-11-11 13:09     ` Jeff Layton
@ 2020-11-11 14:11       ` Luis Henriques
  2020-11-11 14:24         ` Jeff Layton
  0 siblings, 1 reply; 5+ messages in thread
From: Luis Henriques @ 2020-11-11 14:11 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Yan, Zheng, ceph-devel, Ilya Dryomov

Jeff Layton <jlayton@kernel.org> writes:

> On Wed, 2020-11-11 at 11:08 +0000, Luis Henriques wrote:
>> Jeff Layton <jlayton@kernel.org> writes:
>>
>> > On Sat, 2019-12-14 at 10:46 +0800, Yan, Zheng wrote:
>> > > On Fri, Dec 13, 2019 at 1:32 AM Jeff Layton <jlayton@kernel.org> wrote:
>> > > > I believe it's possible that we could end up with racing calls to
>> > > > __ceph_remove_cap for the same cap. If that happens, the cap->ci
>> > > > pointer will be zeroed out and we can hit a NULL pointer dereference.
>> > > >
>> > > > Once we acquire the s_cap_lock, check for the ci pointer being NULL,
>> > > > and just return without doing anything if it is.
>> > > >
>> > > > URL: https://tracker.ceph.com/issues/43272
>> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
>> > > > ---
>> > > >  fs/ceph/caps.c | 21 ++++++++++++++++-----
>> > > >  1 file changed, 16 insertions(+), 5 deletions(-)
>> > > >
>> > > > This is the only scenario that made sense to me in light of Ilya's
>> > > > analysis on the tracker above. I could be off here though -- the locking
>> > > > around this code is horrifically complex, and I could be missing what
>> > > > should guard against this scenario.
>> > > >
>> > >
>> > > I think the simpler fix is, in trim_caps_cb, check if cap->ci is
>> > > non-null before calling __ceph_remove_cap(). This should work because
>> > > __ceph_remove_cap() is always called inside i_ceph_lock.
>> > >
>> >
>> > Is that sufficient though? The stack trace in the bug shows it being
>> > called by ceph_trim_caps, but I think we could hit the same problem with
>> > other __ceph_remove_cap callers, if they happen to race in at the right
>> > time.
>>
>> Sorry for resurrecting this old thread, but we just got a report with this
>> issue on a kernel that includes commit d6e47819721a ("ceph: hold
>> i_ceph_lock when removing caps for freeing inode").
>>
>> Looking at the code, I believe Zheng's suggestion should work, as I don't
>> see any __ceph_remove_cap callers that don't hold the i_ceph_lock. So,
>> would something like the diff below be acceptable?
>>
>> Cheers,
>
> I'm still not convinced that's the correct fix.
>
> Why would trim_caps_cb be subject to this race when other
> __ceph_remove_cap callers are not? Maybe the right fix is to test for a
> NULL cap->ci in __ceph_remove_cap and just return early if it is?

I see, you're probably right. Looking again at the code, I see that there
are two possible places where this race could occur, and they're both used
as callbacks in ceph_iterate_session_caps: trim_caps_cb and
remove_session_caps_cb. These callbacks get the struct ceph_cap as an
argument, and only then do they acquire the i_ceph_lock. Since this isn't
protected by session->s_cap_lock, I guess this is where the race window
is, where cap->ci can be set to NULL.

Below is the patch you suggested. If you think that's acceptable, I can
resend it with a proper commit message.

Cheers,
--
Luis

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index ded4229c314a..917dfaf0bd01 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -1140,12 +1140,17 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
 {
 	struct ceph_mds_session *session = cap->session;
 	struct ceph_inode_info *ci = cap->ci;
-	struct ceph_mds_client *mdsc =
-		ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
+	struct ceph_mds_client *mdsc;
+
 	int removed = 0;
 
+	if (!ci)
+		return;
+
 	dout("__ceph_remove_cap %p from %p\n", cap, &ci->vfs_inode);
 
+	mdsc = ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
+
 	/* remove from inode's cap rbtree, and clear auth cap */
 	rb_erase(&cap->ci_node, &ci->i_caps);
 	if (ci->i_auth_cap == cap) {

^ permalink raw reply related	[flat|nested] 5+ messages in thread
* Re: [RFC PATCH] ceph: guard against __ceph_remove_cap races
  2020-11-11 14:11       ` Luis Henriques
@ 2020-11-11 14:24         ` Jeff Layton
  2020-11-11 14:34           ` Luis Henriques
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff Layton @ 2020-11-11 14:24 UTC (permalink / raw)
  To: Luis Henriques; +Cc: Yan, Zheng, ceph-devel, Ilya Dryomov

On Wed, 2020-11-11 at 14:11 +0000, Luis Henriques wrote:
>
>

I think this looks reasonable. Minor nits below:

> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index ded4229c314a..917dfaf0bd01 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -1140,12 +1140,17 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
>  {
>  	struct ceph_mds_session *session = cap->session;
>  	struct ceph_inode_info *ci = cap->ci;
> -	struct ceph_mds_client *mdsc =
> -		ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
> +	struct ceph_mds_client *mdsc;
> +

nit: remove the above newline

>  	int removed = 0;
> 

Maybe add a comment here to the effect that a NULL cap->ci indicates
that the remove has already been done?

> +	if (!ci)
> +		return;
> +
>  	dout("__ceph_remove_cap %p from %p\n", cap, &ci->vfs_inode);
> 
> +	mdsc = ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
> +

There's a ceph_inode_to_client helper now that may make this a bit more
readable.

>  	/* remove from inode's cap rbtree, and clear auth cap */
>  	rb_erase(&cap->ci_node, &ci->i_caps);
>  	if (ci->i_auth_cap == cap) {
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: [RFC PATCH] ceph: guard against __ceph_remove_cap races
  2020-11-11 14:24         ` Jeff Layton
@ 2020-11-11 14:34           ` Luis Henriques
  0 siblings, 0 replies; 5+ messages in thread
From: Luis Henriques @ 2020-11-11 14:34 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Yan, Zheng, ceph-devel, Ilya Dryomov

Jeff Layton <jlayton@kernel.org> writes:

> On Wed, 2020-11-11 at 14:11 +0000, Luis Henriques wrote:
>>
>>
>
> I think this looks reasonable. Minor nits below:
>
>> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
>> index ded4229c314a..917dfaf0bd01 100644
>> --- a/fs/ceph/caps.c
>> +++ b/fs/ceph/caps.c
>> @@ -1140,12 +1140,17 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
>>  {
>>  	struct ceph_mds_session *session = cap->session;
>>  	struct ceph_inode_info *ci = cap->ci;
>> -	struct ceph_mds_client *mdsc =
>> -		ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
>> +	struct ceph_mds_client *mdsc;
>> +
>
> nit: remove the above newline
>
>>  	int removed = 0;
>>
>
> Maybe add a comment here to the effect that a NULL cap->ci indicates
> that the remove has already been done?
>
>> +	if (!ci)
>> +		return;
>> +
>>  	dout("__ceph_remove_cap %p from %p\n", cap, &ci->vfs_inode);
>>
>> +	mdsc = ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
>> +
>
> There's a ceph_inode_to_client helper now that may make this a bit more
> readable.
>
>>  	/* remove from inode's cap rbtree, and clear auth cap */
>>  	rb_erase(&cap->ci_node, &ci->i_caps);
>>  	if (ci->i_auth_cap == cap) {

Thanks, Jeff. I'll re-post this soon with your suggestions. I just want
to run some more local tests to make sure things aren't breaking with
this change.

Cheers,
--
Luis

^ permalink raw reply	[flat|nested] 5+ messages in thread
end of thread, other threads:[~2020-11-11 14:34 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20191212173159.35013-1-jlayton@kernel.org>
     [not found] ` <CAAM7YAmquOg5ESMAMa5y0gGAR-UAivYF8m+nqrJNmK=SzG6+wA@mail.gmail.com>
     [not found]   ` <64d5a16d920098122144e0df8e03df0cadfb2784.camel@kernel.org>
2020-11-11 11:08     ` [RFC PATCH] ceph: guard against __ceph_remove_cap races Luis Henriques
2020-11-11 13:09       ` Jeff Layton
2020-11-11 14:11         ` Luis Henriques
2020-11-11 14:24           ` Jeff Layton
2020-11-11 14:34             ` Luis Henriques
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.