* [PATCH] perf/x86/rapl: fix deadlock in rapl_pmu_event_stop
@ 2022-09-17 14:47 Duoming Zhou
  2022-09-19 11:45 ` Peter Zijlstra
  0 siblings, 1 reply; 3+ messages in thread
From: Duoming Zhou @ 2022-09-17 14:47 UTC
  To: linux-kernel, linux-perf-users
  Cc: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	namhyung, tglx, bp, dave.hansen, x86, hpa, Duoming Zhou

There is a deadlock in rapl_pmu_event_stop(). The sequence is
shown below:

    (thread 1)                 |        (thread 2)
rapl_pmu_event_stop()          | rapl_hrtimer_handle()
 ...                           |  if (!pmu->n_active)
 raw_spin_lock_irqsave() //(1) |  ...
  ...                          |
  hrtimer_cancel()             |  raw_spin_lock_irqsave() //(2)
  (block forever)

We hold pmu->lock at position (1) and call hrtimer_cancel() to wait
for rapl_hrtimer_handle() to finish, but rapl_hrtimer_handle() also
needs pmu->lock at position (2). As a result, rapl_pmu_event_stop()
blocks forever.
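
The wait cycle described above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: struct toy_state,
toy_handler_step() and toy_cancel_completes() are hypothetical names made up
for the illustration, and the step loop stands in for a cancel that waits on
a running handler. It is not kernel code.

```c
#include <stdbool.h>

/* Toy model state; none of these names come from the kernel. */
struct toy_state {
	bool lock_held_by_stop;	/* stop path holds the lock, position (1) */
	bool handler_running;	/* handler has started and wants the lock, position (2) */
};

/* One step of the handler: it can finish only if the lock is free. */
static void toy_handler_step(struct toy_state *s)
{
	if (s->handler_running && !s->lock_held_by_stop)
		s->handler_running = false;
}

/*
 * Models a cancel that waits for a running handler to finish.
 * Returns true if the wait ends within max_steps, false otherwise.
 */
static bool toy_cancel_completes(struct toy_state *s, int max_steps)
{
	for (int i = 0; i < max_steps; i++) {
		if (!s->handler_running)
			return true;
		toy_handler_step(s);
	}
	return !s->handler_running;
}
```

With lock_held_by_stop set, the handler never finishes and the modelled
cancel never completes, however many steps are allowed. With the lock free,
the handler finishes on the first step.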

This patch moves hrtimer_cancel() out of the raw_spin_lock_irqsave()
critical section so that rapl_hrtimer_handle() can obtain pmu->lock.
To prevent race conditions, the "if (!pmu->n_active)" check in
rapl_hrtimer_handle() is moved under the protection of
raw_spin_lock_irqsave().

Fixes: 65661f96d3b3 ("perf/x86: Add RAPL hrtimer support")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
 arch/x86/events/rapl.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 77e3a47af5a..97c71538d01 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -219,11 +219,11 @@ static enum hrtimer_restart rapl_hrtimer_handle(struct hrtimer *hrtimer)
 	struct perf_event *event;
 	unsigned long flags;
 
+	raw_spin_lock_irqsave(&pmu->lock, flags);
+
 	if (!pmu->n_active)
 		return HRTIMER_NORESTART;
 
-	raw_spin_lock_irqsave(&pmu->lock, flags);
-
 	list_for_each_entry(event, &pmu->active_list, active_entry)
 		rapl_event_update(event);
 
@@ -281,8 +281,11 @@ static void rapl_pmu_event_stop(struct perf_event *event, int mode)
 	if (!(hwc->state & PERF_HES_STOPPED)) {
 		WARN_ON_ONCE(pmu->n_active <= 0);
 		pmu->n_active--;
-		if (pmu->n_active == 0)
+		if (!pmu->n_active) {
+			raw_spin_unlock_irqrestore(&pmu->lock, flags);
 			hrtimer_cancel(&pmu->hrtimer);
+			raw_spin_lock_irqsave(&pmu->lock, flags);
+		}
 
 		list_del(&event->active_entry);
 
-- 
2.17.1



* Re: [PATCH] perf/x86/rapl: fix deadlock in rapl_pmu_event_stop
  2022-09-17 14:47 [PATCH] perf/x86/rapl: fix deadlock in rapl_pmu_event_stop Duoming Zhou
@ 2022-09-19 11:45 ` Peter Zijlstra
  2022-09-19 15:16   ` duoming
  0 siblings, 1 reply; 3+ messages in thread
From: Peter Zijlstra @ 2022-09-19 11:45 UTC
  To: Duoming Zhou
  Cc: linux-kernel, linux-perf-users, mingo, acme, mark.rutland,
	alexander.shishkin, jolsa, namhyung, tglx, bp, dave.hansen, x86,
	hpa

On Sat, Sep 17, 2022 at 10:47:29PM +0800, Duoming Zhou wrote:
> There is a deadlock in rapl_pmu_event_stop(), the process is
> shown below:
> 
>     (thread 1)                 |        (thread 2)
> rapl_pmu_event_stop()          | rapl_hrtimer_handle()
>  ...                           |  if (!pmu->n_active)
>  raw_spin_lock_irqsave() //(1) |  ...
>   ...                          |
>   hrtimer_cancel()             |  raw_spin_lock_irqsave() //(2)
>   (block forever)
> 
> We hold pmu->lock in position (1) and use hrtimer_cancel() to wait
> rapl_hrtimer_handle() to stop, but rapl_hrtimer_handle() also need
> pmu->lock in position (2). As a result, the rapl_pmu_event_stop()
> will be blocked forever.
> 
> This patch extracts hrtimer_cancel() from the protection of
> raw_spin_lock_irqsave(). As a result, the rapl_hrtimer_handle() could
> obtain the pmu->lock. In order to prevent race conditions, we put
> "if (!pmu->n_active)" in rapl_hrtimer_handle() under the protection
> of raw_spin_lock_irqsave().
> 
> Fixes: 65661f96d3b3 ("perf/x86: Add RAPL hrtimer support")
> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> ---
>  arch/x86/events/rapl.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> index 77e3a47af5a..97c71538d01 100644
> --- a/arch/x86/events/rapl.c
> +++ b/arch/x86/events/rapl.c
> @@ -219,11 +219,11 @@ static enum hrtimer_restart rapl_hrtimer_handle(struct hrtimer *hrtimer)
>  	struct perf_event *event;
>  	unsigned long flags;
>  
> +	raw_spin_lock_irqsave(&pmu->lock, flags);
> +
>  	if (!pmu->n_active)
>  		return HRTIMER_NORESTART;

Except now you return with the lock held...
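
A minimal userspace sketch of this point, assuming a toy lock that only
records whether it is held. All toy_* names and both handler variants are
hypothetical stand-ins, not the code in arch/x86/events/rapl.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for pmu->lock: it only records whether it is held. */
struct toy_lock { bool held; };

static void toy_lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = true; }
static void toy_lock_release(struct toy_lock *l) { assert(l->held); l->held = false; }

struct toy_pmu {
	struct toy_lock lock;
	int n_active;
	int updates;	/* counts how often the "update" work ran */
};

enum toy_restart { TOY_NORESTART, TOY_RESTART };

/* Shape of the handler in the posted patch: the early return skips the unlock. */
static enum toy_restart handler_as_posted(struct toy_pmu *pmu)
{
	toy_lock_acquire(&pmu->lock);
	if (!pmu->n_active)
		return TOY_NORESTART;	/* lock is still held here */
	pmu->updates++;
	toy_lock_release(&pmu->lock);
	return TOY_RESTART;
}

/* Same check under the lock, but with a single unlock on every path. */
static enum toy_restart handler_balanced(struct toy_pmu *pmu)
{
	enum toy_restart ret = TOY_NORESTART;

	toy_lock_acquire(&pmu->lock);
	if (pmu->n_active) {
		pmu->updates++;
		ret = TOY_RESTART;
	}
	toy_lock_release(&pmu->lock);
	return ret;
}
```

In this model handler_as_posted() leaves the lock held whenever n_active is
zero, while handler_balanced() releases it on both paths.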

>  
> -	raw_spin_lock_irqsave(&pmu->lock, flags);
> -
>  	list_for_each_entry(event, &pmu->active_list, active_entry)
>  		rapl_event_update(event);
>  
> @@ -281,8 +281,11 @@ static void rapl_pmu_event_stop(struct perf_event *event, int mode)
>  	if (!(hwc->state & PERF_HES_STOPPED)) {
>  		WARN_ON_ONCE(pmu->n_active <= 0);
>  		pmu->n_active--;
> -		if (pmu->n_active == 0)
> +		if (!pmu->n_active) {
> +			raw_spin_unlock_irqrestore(&pmu->lock, flags);
>  			hrtimer_cancel(&pmu->hrtimer);
> +			raw_spin_lock_irqsave(&pmu->lock, flags);

Doing a lock-break makes the n_active and list_del thing non-atomic,
breaking the whole purpose of the lock.
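
As a rough illustration only: in the toy model below an observer callback
runs in the window where the lock would be dropped. The names (toy_lb_pmu,
toy_stop_with_lock_break, toy_record_mismatch) are hypothetical, and
list_len merely stands in for the length of the active list.

```c
#include <stdbool.h>

/* Toy state: a counter and a list length that should change together. */
struct toy_lb_pmu {
	int n_active;
	int list_len;
};

typedef void (*toy_observer)(const struct toy_lb_pmu *pmu, void *ctx);

/* Models the stop path with a lock-break around the cancel. */
static void toy_stop_with_lock_break(struct toy_lb_pmu *pmu,
				     toy_observer obs, void *ctx)
{
	/* lock would be taken here */
	pmu->n_active--;
	if (pmu->n_active == 0) {
		/* lock dropped: anyone taking the lock now sees this state */
		obs(pmu, ctx);
		/* lock retaken */
	}
	pmu->list_len--;	/* models list_del() */
	/* lock would be released here */
}

/* Observer that records whether counter and list disagree. */
static void toy_record_mismatch(const struct toy_lb_pmu *pmu, void *ctx)
{
	*(bool *)ctx = (pmu->n_active != pmu->list_len);
}
```

Inside the window the observer sees n_active already at zero while the
event is still counted on the list.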

> +		}
>  
>  		list_del(&event->active_entry);


Now; did you actually observe this deadlock or is this a code-reading
exercise? If you saw an actual deadlock, was cpu-hotplug involved?




* Re: [PATCH] perf/x86/rapl: fix deadlock in rapl_pmu_event_stop
  2022-09-19 11:45 ` Peter Zijlstra
@ 2022-09-19 15:16   ` duoming
  0 siblings, 0 replies; 3+ messages in thread
From: duoming @ 2022-09-19 15:16 UTC
  To: Peter Zijlstra
  Cc: linux-kernel, linux-perf-users, mingo, acme, mark.rutland,
	alexander.shishkin, jolsa, namhyung, tglx, bp, dave.hansen, x86,
	hpa

Hello,

On Mon, 19 Sep 2022 13:45:38 +0200 Peter Zijlstra wrote:

> On Sat, Sep 17, 2022 at 10:47:29PM +0800, Duoming Zhou wrote:
> > There is a deadlock in rapl_pmu_event_stop(), the process is
> > shown below:
> > 
> >     (thread 1)                 |        (thread 2)
> > rapl_pmu_event_stop()          | rapl_hrtimer_handle()
> >  ...                           |  if (!pmu->n_active)
> >  raw_spin_lock_irqsave() //(1) |  ...
> >   ...                          |
> >   hrtimer_cancel()             |  raw_spin_lock_irqsave() //(2)
> >   (block forever)
> > 
> > We hold pmu->lock in position (1) and use hrtimer_cancel() to wait
> > rapl_hrtimer_handle() to stop, but rapl_hrtimer_handle() also need
> > pmu->lock in position (2). As a result, the rapl_pmu_event_stop()
> > will be blocked forever.
> > 
> > This patch extracts hrtimer_cancel() from the protection of
> > raw_spin_lock_irqsave(). As a result, the rapl_hrtimer_handle() could
> > obtain the pmu->lock. In order to prevent race conditions, we put
> > "if (!pmu->n_active)" in rapl_hrtimer_handle() under the protection
> > of raw_spin_lock_irqsave().
> > 
> > Fixes: 65661f96d3b3 ("perf/x86: Add RAPL hrtimer support")
> > Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> > ---
> >  arch/x86/events/rapl.c | 9 ++++++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> > index 77e3a47af5a..97c71538d01 100644
> > --- a/arch/x86/events/rapl.c
> > +++ b/arch/x86/events/rapl.c
> > @@ -219,11 +219,11 @@ static enum hrtimer_restart rapl_hrtimer_handle(struct hrtimer *hrtimer)
> >  	struct perf_event *event;
> >  	unsigned long flags;
> >  
> > +	raw_spin_lock_irqsave(&pmu->lock, flags);
> > +
> >  	if (!pmu->n_active)
> >  		return HRTIMER_NORESTART;
> 
> Except now you return with the lock held...
> 
> >  
> > -	raw_spin_lock_irqsave(&pmu->lock, flags);
> > -
> >  	list_for_each_entry(event, &pmu->active_list, active_entry)
> >  		rapl_event_update(event);
> >  
> > @@ -281,8 +281,11 @@ static void rapl_pmu_event_stop(struct perf_event *event, int mode)
> >  	if (!(hwc->state & PERF_HES_STOPPED)) {
> >  		WARN_ON_ONCE(pmu->n_active <= 0);
> >  		pmu->n_active--;
> > -		if (pmu->n_active == 0)
> > +		if (!pmu->n_active) {
> > +			raw_spin_unlock_irqrestore(&pmu->lock, flags);
> >  			hrtimer_cancel(&pmu->hrtimer);
> > +			raw_spin_lock_irqsave(&pmu->lock, flags);
> 
> Doing a lock-break makes the n_active and list_del thing non-atomic,
> breaking the whole purpose of the lock.

Thank you for your time and suggestions! I have come up with another solution
that does not break the atomicity. The details are shown below:

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 77e3a47af5a..7c110092c83 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -281,8 +281,6 @@ static void rapl_pmu_event_stop(struct perf_event *event, int mode)
        if (!(hwc->state & PERF_HES_STOPPED)) {
                WARN_ON_ONCE(pmu->n_active <= 0);
                pmu->n_active--;
-               if (pmu->n_active == 0)
-                       hrtimer_cancel(&pmu->hrtimer);
 
                list_del(&event->active_entry);
 
@@ -300,6 +298,11 @@ static void rapl_pmu_event_stop(struct perf_event *event, int mode)
                hwc->state |= PERF_HES_UPTODATE;
        }
 
+       if (!pmu->n_active) {
+               raw_spin_unlock_irqrestore(&pmu->lock, flags);
+               hrtimer_cancel(&pmu->hrtimer);
+               return;
+       }
        raw_spin_unlock_irqrestore(&pmu->lock, flags);
 }

I moved the hrtimer_cancel() call to the end of rapl_pmu_event_stop(), after
pmu->lock has been released. As a result, the atomicity is preserved and the
deadlock can be avoided.
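
The ordering can be sketched in a small userspace model. This is an
illustration under assumptions, not the kernel code: toy_stop_pmu,
toy_cancel() and toy_event_stop() are hypothetical names, list_len stands in
for the active list, and the model remembers the "last event" decision in a
local variable while the lock is held.

```c
#include <stdbool.h>

/* Toy state for the stop path; the lock is only a flag in this model. */
struct toy_stop_pmu {
	bool lock_held;
	int n_active;
	int list_len;			/* stands in for the active list */
	bool timer_armed;
	bool cancel_saw_lock_held;	/* set if cancel ran under the lock */
};

/* Models the cancel: records the lock state it was called with. */
static void toy_cancel(struct toy_stop_pmu *pmu)
{
	pmu->cancel_saw_lock_held = pmu->lock_held;
	pmu->timer_armed = false;
}

/* Bookkeeping in one critical section, cancel only after the unlock. */
static void toy_event_stop(struct toy_stop_pmu *pmu)
{
	bool last;

	pmu->lock_held = true;		/* models raw_spin_lock_irqsave() */
	pmu->n_active--;
	pmu->list_len--;		/* models list_del() */
	last = (pmu->n_active == 0);
	pmu->lock_held = false;		/* models raw_spin_unlock_irqrestore() */

	if (last)
		toy_cancel(pmu);
}
```

In this model the counter and the list change together while the lock is
held, and the cancel never runs with the lock held.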

> > +		}
> >  
> >  		list_del(&event->active_entry);
> 
> 
> Now; did you actually observe this deadlock or is this a code-reading
> exercise? If you saw an actual deadlock, was cpu-hotplug involved?

I found this bug with a static analysis tool that I wrote myself.

Thank you!

Best regards,
Duoming Zhou
