* [PATCH v2] ceph: cancel delayed work instead of flushing on mdsc teardown
From: Jeff Layton @ 2021-07-29 12:38 UTC
  To: ceph-devel; +Cc: idryomov, Xiubo Li

The first thing metric_delayed_work does is check mdsc->stopping and
return immediately if it's set. That's good, since by that point the
metric structures would already have been torn down, but there is no
locking around mdsc->stopping.

It's possible that the ceph_metric_destroy call could race with the
delayed_work, in which case we could end up with the delayed_work
accessing destroyed percpu variables.

At this point in the mdsc teardown, the "stopping" flag has already been
set, so there's no benefit to flushing the work. Move the work
cancellation in ceph_metric_destroy ahead of the percpu variable
destruction, and eliminate the flush_delayed_work call in
ceph_mdsc_destroy.

Cc: Xiubo Li <xiubli@redhat.com>
Fixes: 18f473b384a6 ("ceph: periodically send perf metrics to MDSes")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/mds_client.c | 1 -
 fs/ceph/metric.c     | 4 ++--
 2 files changed, 2 insertions(+), 3 deletions(-)

v2: just drop the flush call altogether and move the cancel before the
    percpu variables are destroyed (per Xiubo's suggestion).
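
For readers less familiar with the workqueue API, the ordering argument
above can be sketched in userspace C. This is only an analogy under
stated assumptions, not the kernel code: a pthread stands in for the
delayed work, pthread_join() stands in for cancel_delayed_work_sync(),
and a heap-allocated long stands in for the percpu counters. The names
struct metric, metric_worker, metric_init, metric_destroy and
run_teardown_demo are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct ceph_client_metric. */
struct metric {
	pthread_t worker;         /* plays the role of the delayed work */
	atomic_bool stopping;     /* plays the role of mdsc->stopping */
	atomic_bool destroyed;    /* set once the counter has been freed */
	atomic_int ticks;         /* number of worker iterations so far */
	atomic_int bad_accesses;  /* iterations that ran after the free */
	long *counter;            /* plays the role of a percpu counter */
};

/* Periodic work: touches the counter until told to stop. */
static void *metric_worker(void *arg)
{
	struct metric *m = arg;

	while (!atomic_load(&m->stopping)) {
		/* Best-effort detector for running after teardown. */
		if (atomic_load(&m->destroyed))
			atomic_fetch_add(&m->bad_accesses, 1);
		else
			(*m->counter)++;
		atomic_fetch_add(&m->ticks, 1);
		sched_yield();
	}
	return NULL;
}

int metric_init(struct metric *m)
{
	atomic_init(&m->stopping, false);
	atomic_init(&m->destroyed, false);
	atomic_init(&m->ticks, 0);
	atomic_init(&m->bad_accesses, 0);
	m->counter = calloc(1, sizeof(*m->counter));
	if (!m->counter)
		return -1;
	if (pthread_create(&m->worker, NULL, metric_worker, m) != 0) {
		free(m->counter);
		m->counter = NULL;
		return -1;
	}
	return 0;
}

/* Teardown in the order the patch argues for: stop the worker first. */
void metric_destroy(struct metric *m)
{
	if (!m)
		return;

	/* Wait for the worker to finish before touching its data. */
	atomic_store(&m->stopping, true);
	pthread_join(m->worker, NULL);

	/* Only now is it safe to release what the worker was using. */
	free(m->counter);
	m->counter = NULL;
	atomic_store(&m->destroyed, true);
}

/* Returns the number of post-teardown accesses seen, or -1 on error. */
int run_teardown_demo(void)
{
	struct metric m;

	if (metric_init(&m) != 0)
		return -1;
	/* Let the worker run at least once before tearing down. */
	while (atomic_load(&m.ticks) == 0)
		sched_yield();
	metric_destroy(&m);
	assert(m.counter == NULL);
	return atomic_load(&m.bad_accesses);
}
```

Built with something like "cc -pthread", a call to run_teardown_demo()
should return 0, because the worker has been joined before the counter
is freed. Swapping the free ahead of the join would reintroduce the
window this patch closes.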

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index c43091a30ba8..34124fb1605e 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -4979,7 +4979,6 @@ void ceph_mdsc_destroy(struct ceph_fs_client *fsc)
 
 	ceph_metric_destroy(&mdsc->metric);
 
-	flush_delayed_work(&mdsc->metric.delayed_work);
 	fsc->mdsc = NULL;
 	kfree(mdsc);
 	dout("mdsc_destroy %p done\n", mdsc);
diff --git a/fs/ceph/metric.c b/fs/ceph/metric.c
index 5ac151eb0d49..04d5df29bbbf 100644
--- a/fs/ceph/metric.c
+++ b/fs/ceph/metric.c
@@ -302,6 +302,8 @@ void ceph_metric_destroy(struct ceph_client_metric *m)
 	if (!m)
 		return;
 
+	cancel_delayed_work_sync(&m->delayed_work);
+
 	percpu_counter_destroy(&m->total_inodes);
 	percpu_counter_destroy(&m->opened_inodes);
 	percpu_counter_destroy(&m->i_caps_mis);
@@ -309,8 +311,6 @@ void ceph_metric_destroy(struct ceph_client_metric *m)
 	percpu_counter_destroy(&m->d_lease_mis);
 	percpu_counter_destroy(&m->d_lease_hit);
 
-	cancel_delayed_work_sync(&m->delayed_work);
-
 	ceph_put_mds_session(m->session);
 }
 
-- 
2.31.1


* Re: [PATCH v2] ceph: cancel delayed work instead of flushing on mdsc teardown
From: Xiubo Li @ 2021-07-29 12:47 UTC
  To: Jeff Layton, ceph-devel; +Cc: idryomov


On 7/29/21 8:38 PM, Jeff Layton wrote:
> The first thing metric_delayed_work does is check mdsc->stopping and
> return immediately if it's set. That's good, since by that point the
> metric structures would already have been torn down, but there is no
> locking around mdsc->stopping.
>
> It's possible that the ceph_metric_destroy call could race with the
> delayed_work, in which case we could end up with the delayed_work
> accessing destroyed percpu variables.
>
> At this point in the mdsc teardown, the "stopping" flag has already been
> set, so there's no benefit to flushing the work. Move the work
> cancellation in ceph_metric_destroy ahead of the percpu variable
> destruction, and eliminate the flush_delayed_work call in
> ceph_mdsc_destroy.
>
> Cc: Xiubo Li <xiubli@redhat.com>
> Fixes: 18f473b384a6 ("ceph: periodically send perf metrics to MDSes")
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>   fs/ceph/mds_client.c | 1 -
>   fs/ceph/metric.c     | 4 ++--
>   2 files changed, 2 insertions(+), 3 deletions(-)
>
> v2: just drop the flush call altogether and move the cancel before the
>      percpu variables are destroyed (per Xiubo's suggestion).
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index c43091a30ba8..34124fb1605e 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -4979,7 +4979,6 @@ void ceph_mdsc_destroy(struct ceph_fs_client *fsc)
>   
>   	ceph_metric_destroy(&mdsc->metric);
>   
> -	flush_delayed_work(&mdsc->metric.delayed_work);
>   	fsc->mdsc = NULL;
>   	kfree(mdsc);
>   	dout("mdsc_destroy %p done\n", mdsc);
> diff --git a/fs/ceph/metric.c b/fs/ceph/metric.c
> index 5ac151eb0d49..04d5df29bbbf 100644
> --- a/fs/ceph/metric.c
> +++ b/fs/ceph/metric.c
> @@ -302,6 +302,8 @@ void ceph_metric_destroy(struct ceph_client_metric *m)
>   	if (!m)
>   		return;
>   
> +	cancel_delayed_work_sync(&m->delayed_work);
> +
>   	percpu_counter_destroy(&m->total_inodes);
>   	percpu_counter_destroy(&m->opened_inodes);
>   	percpu_counter_destroy(&m->i_caps_mis);
> @@ -309,8 +311,6 @@ void ceph_metric_destroy(struct ceph_client_metric *m)
>   	percpu_counter_destroy(&m->d_lease_mis);
>   	percpu_counter_destroy(&m->d_lease_hit);
>   
> -	cancel_delayed_work_sync(&m->delayed_work);
> -
>   	ceph_put_mds_session(m->session);
>   }
>   

Reviewed-by: Xiubo Li <xiubli@redhat.com>

