* [RFC PATCH] ceph: reduce contention in ceph_check_delayed_caps()
@ 2021-06-25 15:45 Luis Henriques
  2021-06-25 16:54 ` Jeff Layton
  0 siblings, 1 reply; 4+ messages in thread
From: Luis Henriques @ 2021-06-25 15:45 UTC (permalink / raw)
  To: Jeff Layton, Ilya Dryomov
  Cc: ceph-devel, linux-kernel, Luis Henriques, stable

Function ceph_check_delayed_caps() is called from the mdsc->delayed_work
workqueue and it can keep looping for quite some time if caps keep being
added back to the mdsc->cap_delay_list.  This may result in the watchdog
tainting the kernel with the softlockup flag.

This patch re-arranges the loop through the caps list so that it first
moves all the caps from the list onto a temporary list.  Then, with less
locking contention, it calls ceph_check_caps() for each inode.  Any caps
added to the list in the meantime will be handled in the next run.

Cc: stable@vger.kernel.org
Signed-off-by: Luis Henriques <lhenriques@suse.de>
---
Hi Jeff!

So, I've not based this patch on top of your patchset that gets rid of
ceph_async_iput(), so that it's easier to backport to stable kernels.  Of
course I'm not 100% sure this qualifies as stable material.

Other than that, I've been testing this patch and I couldn't see anything
breaking.  Let me know what you think.

(I *think* I've seen a tracker bug for this in the past but I couldn't
find it.  I guess it could be added as a 'Link:' tag.)

Cheers,
--
Luis

 fs/ceph/caps.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index a5e93b185515..727e41e3b939 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -4229,6 +4229,7 @@ void ceph_check_delayed_caps(struct ceph_mds_client *mdsc)
 {
 	struct inode *inode;
 	struct ceph_inode_info *ci;
+	LIST_HEAD(caps_list);
 
 	dout("check_delayed_caps\n");
 	spin_lock(&mdsc->cap_delay_lock);
@@ -4239,19 +4240,23 @@ void ceph_check_delayed_caps(struct ceph_mds_client *mdsc)
 		if ((ci->i_ceph_flags & CEPH_I_FLUSH) == 0 &&
 		    time_before(jiffies, ci->i_hold_caps_max))
 			break;
-		list_del_init(&ci->i_cap_delay_list);
+		list_move_tail(&ci->i_cap_delay_list, &caps_list);
+	}
+	spin_unlock(&mdsc->cap_delay_lock);
 
+	while (!list_empty(&caps_list)) {
+		ci = list_first_entry(&caps_list,
+				      struct ceph_inode_info,
+				      i_cap_delay_list);
+		list_del_init(&ci->i_cap_delay_list);
 		inode = igrab(&ci->vfs_inode);
 		if (inode) {
-			spin_unlock(&mdsc->cap_delay_lock);
 			dout("check_delayed_caps on %p\n", inode);
 			ceph_check_caps(ci, 0, NULL);
 			/* avoid calling iput_final() in tick thread */
 			ceph_async_iput(inode);
-			spin_lock(&mdsc->cap_delay_lock);
 		}
 	}
-	spin_unlock(&mdsc->cap_delay_lock);
 }
 
 /*


* Re: [RFC PATCH] ceph: reduce contention in ceph_check_delayed_caps()
  2021-06-25 15:45 [RFC PATCH] ceph: reduce contention in ceph_check_delayed_caps() Luis Henriques
@ 2021-06-25 16:54 ` Jeff Layton
  2021-06-28  9:04   ` Luis Henriques
  0 siblings, 1 reply; 4+ messages in thread
From: Jeff Layton @ 2021-06-25 16:54 UTC (permalink / raw)
  To: Luis Henriques, Ilya Dryomov; +Cc: ceph-devel, linux-kernel, stable

On Fri, 2021-06-25 at 16:45 +0100, Luis Henriques wrote:
> Function ceph_check_delayed_caps() is called from the mdsc->delayed_work
> workqueue and it can keep looping for quite some time if caps keep being
> added back to the mdsc->cap_delay_list.  This may result in the watchdog
> tainting the kernel with the softlockup flag.
> 
> This patch re-arranges the loop through the caps list so that it first
> moves all the caps from the list onto a temporary list.  Then, with less
> locking contention, it calls ceph_check_caps() for each inode.  Any caps
> added to the list in the meantime will be handled in the next run.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Luis Henriques <lhenriques@suse.de>
> ---
> Hi Jeff!
> 
> So, I've not based this patch on top of your patchset that gets rid of
> ceph_async_iput(), so that it's easier to backport to stable kernels.  Of
> course I'm not 100% sure this qualifies as stable material.
> 
> Other than that, I've been testing this patch and I couldn't see anything
> breaking.  Let me know what you think.
> 
> (I *think* I've seen a tracker bug for this in the past but I couldn't
> find it.  I guess it could be added as a 'Link:' tag.)
> 
> Cheers,
> --
> Luis
> 
>  fs/ceph/caps.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index a5e93b185515..727e41e3b939 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -4229,6 +4229,7 @@ void ceph_check_delayed_caps(struct ceph_mds_client *mdsc)
>  {
>  	struct inode *inode;
>  	struct ceph_inode_info *ci;
> +	LIST_HEAD(caps_list);
>  
>  	dout("check_delayed_caps\n");
>  	spin_lock(&mdsc->cap_delay_lock);
> @@ -4239,19 +4240,23 @@ void ceph_check_delayed_caps(struct ceph_mds_client *mdsc)
>  		if ((ci->i_ceph_flags & CEPH_I_FLUSH) == 0 &&
>  		    time_before(jiffies, ci->i_hold_caps_max))
>  			break;
> -		list_del_init(&ci->i_cap_delay_list);
> +		list_move_tail(&ci->i_cap_delay_list, &caps_list);
> +	}
> +	spin_unlock(&mdsc->cap_delay_lock);
>  
> +	while (!list_empty(&caps_list)) {
> +		ci = list_first_entry(&caps_list,
> +				      struct ceph_inode_info,
> +				      i_cap_delay_list);
> +		list_del_init(&ci->i_cap_delay_list);
>  		inode = igrab(&ci->vfs_inode);
>  		if (inode) {
> -			spin_unlock(&mdsc->cap_delay_lock);
>  			dout("check_delayed_caps on %p\n", inode);
>  			ceph_check_caps(ci, 0, NULL);
>  			/* avoid calling iput_final() in tick thread */
>  			ceph_async_iput(inode);
> -			spin_lock(&mdsc->cap_delay_lock);
>  		}
>  	}
> -	spin_unlock(&mdsc->cap_delay_lock);
>  }
>  
>  /*

I'm not sure this approach is viable, unfortunately. Once you've dropped
the cap_delay_lock, then nothing protects the i_cap_delay_list head
anymore.

So you could detach these objects and put them on the private list, and
then once you drop the spinlock another task could find one of them and
(e.g.) call __cap_delay_requeue on it, potentially corrupting your list.
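
To make that concrete, an interleaving along these lines seems possible
(hand-written illustration only, not from a real trace):

  delayed_work worker                     some other task
  -------------------                     ---------------
  spin_lock(&mdsc->cap_delay_lock);
  list_move_tail(&ci->i_cap_delay_list,
                 &caps_list);
  spin_unlock(&mdsc->cap_delay_lock);
                                          __cap_delay_requeue(mdsc, ci);
                                            spin_lock(&mdsc->cap_delay_lock);
                                            /* ci sits on the worker's private
                                             * caps_list, but this path only
                                             * sees a non-empty list_head and
                                             * relinks it onto cap_delay_list */
                                            spin_unlock(&mdsc->cap_delay_lock);
  list_first_entry(&caps_list, ...);      /* the worker keeps walking and
  list_del_init(&ci->i_cap_delay_list);    * modifying caps_list with no lock
                                           * at all -> list corruption */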

I think we'll need to come up with a different way to do this...
-- 
Jeff Layton <jlayton@kernel.org>



* Re: [RFC PATCH] ceph: reduce contention in ceph_check_delayed_caps()
  2021-06-25 16:54 ` Jeff Layton
@ 2021-06-28  9:04   ` Luis Henriques
  2021-06-28 14:44     ` Jeff Layton
  0 siblings, 1 reply; 4+ messages in thread
From: Luis Henriques @ 2021-06-28  9:04 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Ilya Dryomov, ceph-devel, linux-kernel, stable

On Fri, Jun 25, 2021 at 12:54:44PM -0400, Jeff Layton wrote:
<...>
> I'm not sure this approach is viable, unfortunately. Once you've dropped
> the cap_delay_lock, then nothing protects the i_cap_delay_list head
> anymore.
> 
> So you could detach these objects and put them on the private list, and
> then once you drop the spinlock another task could find one of them and
> (e.g.) call __cap_delay_requeue on it, potentially corrupting your list.
> 
> I think we'll need to come up with a different way to do this...

Ugh, yeah I see what you mean.

Another option I can think of is to time-bound this loop, so that it
would stop after finding the first ci->i_hold_caps_max timestamp that was
set *after* the start of the current run.  I'll see if I can come up with
an RFC shortly.
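
Roughly something like this, maybe (completely untested sketch on top of
the current code, so still using ceph_async_iput(); recovering the queue
time as (i_hold_caps_max - caps_wanted_delay_max * HZ) is my assumption
about how to detect "queued after the loop started"):

void ceph_check_delayed_caps(struct ceph_mds_client *mdsc)
{
	struct inode *inode;
	struct ceph_inode_info *ci;
	unsigned long delay_max = mdsc->fsc->mount_options->caps_wanted_delay_max * HZ;
	unsigned long loop_start = jiffies;

	dout("check_delayed_caps\n");
	spin_lock(&mdsc->cap_delay_lock);
	while (!list_empty(&mdsc->cap_delay_list)) {
		ci = list_first_entry(&mdsc->cap_delay_list,
				      struct ceph_inode_info,
				      i_cap_delay_list);
		/* i_hold_caps_max was set to (queue time + delay_max), so if
		 * this entry was (re)queued after the loop started, stop and
		 * leave it for the next run. */
		if (time_before(loop_start, ci->i_hold_caps_max - delay_max))
			break;
		if ((ci->i_ceph_flags & CEPH_I_FLUSH) == 0 &&
		    time_before(jiffies, ci->i_hold_caps_max))
			break;
		list_del_init(&ci->i_cap_delay_list);

		inode = igrab(&ci->vfs_inode);
		if (inode) {
			spin_unlock(&mdsc->cap_delay_lock);
			dout("check_delayed_caps on %p\n", inode);
			ceph_check_caps(ci, 0, NULL);
			/* avoid calling iput_final() in tick thread */
			ceph_async_iput(inode);
			spin_lock(&mdsc->cap_delay_lock);
		}
	}
	spin_unlock(&mdsc->cap_delay_lock);
}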

Cheers,
--
Luís


* Re: [RFC PATCH] ceph: reduce contention in ceph_check_delayed_caps()
  2021-06-28  9:04   ` Luis Henriques
@ 2021-06-28 14:44     ` Jeff Layton
  0 siblings, 0 replies; 4+ messages in thread
From: Jeff Layton @ 2021-06-28 14:44 UTC (permalink / raw)
  To: Luis Henriques; +Cc: Ilya Dryomov, ceph-devel, linux-kernel, stable

On Mon, 2021-06-28 at 10:04 +0100, Luis Henriques wrote:
> On Fri, Jun 25, 2021 at 12:54:44PM -0400, Jeff Layton wrote:
> <...>
> > I'm not sure this approach is viable, unfortunately. Once you've dropped
> > the cap_delay_lock, then nothing protects the i_cap_delay_list head
> > anymore.
> > 
> > So you could detach these objects and put them on the private list, and
> > then once you drop the spinlock another task could find one of them and
> > (e.g.) call __cap_delay_requeue on it, potentially corrupting your list.
> > 
> > I think we'll need to come up with a different way to do this...
> 
> Ugh, yeah I see what you mean.
> 
> Another option I can think of is to time-bound this loop, so that it
> would stop after finding the first ci->i_hold_caps_max timestamp that was
> set *after* the start of the current run.  I'll see if I can come up with
> an RFC shortly.
> 

Sounds like a reasonable thing to do.

The catch there is that those caps may end up being delayed up to 5s more
than they otherwise would have been, since schedule_delayed() always uses
a 5s delay. That delay could be made more dynamic if it becomes an issue.

Maybe have the schedule_delayed callers calculate and pass in a timeout
and schedule the next run for that point in the future? Then
delayed_work could schedule the next run to coincide with the timeout of
the next entry on the list.
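
Something like this, perhaps (untested sketch; the "delay" parameter is
hypothetical, the current schedule_delayed() takes no argument and just
hardcodes the 5s):

static void schedule_delayed(struct ceph_mds_client *mdsc, unsigned long delay)
{
	unsigned long max_delay = HZ * 5;

	/* callers that don't have a better idea pass 0 and keep the
	 * current fixed 5s cadence */
	if (!delay || delay > max_delay)
		delay = max_delay;
	schedule_delayed_work(&mdsc->delayed_work,
			      round_jiffies_relative(delay));
}

delayed_work() could then compute the delay from the i_hold_caps_max of
the first entry left on cap_delay_list and pass it back in.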
-- 
Jeff Layton <jlayton@kernel.org>

