All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xen/timers: Fix memory leak with cpu hot unplug
@ 2019-03-29 16:57 Andrew Cooper
  2019-04-01  8:27 ` Jan Beulich
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Cooper @ 2019-03-29 16:57 UTC (permalink / raw)
  To: Xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Tim Deegan, Julien Grall,
	Jan Beulich, Ian Jackson

timer_softirq_action() realloc's itself a larger timer heap whenever
necessary, which includes bootstrapping from the empty dummy_heap.  Nothing
ever freed this allocation.

CPU hot unplug and plug has the side effect of zeroing the percpu data area,
which clears ts->heap.  This in turn causes new timers to be put on the list
rather than the heap, and for timer_softirq_action() to bootstrap itself
again.

This in practice leaks ts->heap every time a CPU is hot unplugged and
replugged.  In the cpu notifier, free the heap after migrating all other
timers away.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Tim Deegan <tim@xen.org>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Julien Grall <julien.grall@arm.com>

This texturally depends on "xen/timers: Document and improve the
representation of the timer heap metadata" which was necessary to understand
the problem well enough to fix it, but isn't backporting over this change
isn't too complicated (should the cleanup patch not want to be backported).
---
 xen/common/timer.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/xen/common/timer.c b/xen/common/timer.c
index 98f2c48..afcb1b0 100644
--- a/xen/common/timer.c
+++ b/xen/common/timer.c
@@ -631,6 +631,10 @@ static int cpu_callback(
     case CPU_UP_CANCELED:
     case CPU_DEAD:
         migrate_timers_from_cpu(cpu);
+        ASSERT(heap_metadata(ts->heap)->size == 0);
+        if ( heap_metadata(ts->heap)->limit )
+            xfree(ts->heap);
+        ts->heap = dummy_heap;
         break;
     default:
         break;
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] xen/timers: Fix memory leak with cpu hot unplug
  2019-03-29 16:57 [PATCH] xen/timers: Fix memory leak with cpu hot unplug Andrew Cooper
@ 2019-04-01  8:27 ` Jan Beulich
  2019-04-01 16:59   ` Andrew Cooper
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Beulich @ 2019-04-01  8:27 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Xen-devel, Julien Grall, Ian Jackson

>>> On 29.03.19 at 17:57, <andrew.cooper3@citrix.com> wrote:
> timer_softirq_action() realloc's itself a larger timer heap whenever
> necessary, which includes bootstrapping from the empty dummy_heap.  Nothing
> ever freed this allocation.
> 
> CPU hot unplug and plug has the side effect of zeroing the percpu data area,
> which clears ts->heap.  This in turn causes new timers to be put on the list
> rather than the heap, and for timer_softirq_action() to bootstrap itself
> again.
> 
> This in practice leaks ts->heap every time a CPU is hot unplugged and
> replugged.  In the cpu notifier, free the heap after migrating all other
> timers away.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Could I talk you into using online/offline instead of plug/unplug,
as the latter is pretty tied to the physical operation of adding /
removing a CPU to / from a system, whereas the issue here
also surfaces for pure software actions like suspend/resume
or the xen-hptool operations?

> --- a/xen/common/timer.c
> +++ b/xen/common/timer.c
> @@ -631,6 +631,10 @@ static int cpu_callback(
>      case CPU_UP_CANCELED:
>      case CPU_DEAD:
>          migrate_timers_from_cpu(cpu);
> +        ASSERT(heap_metadata(ts->heap)->size == 0);
> +        if ( heap_metadata(ts->heap)->limit )
> +            xfree(ts->heap);
> +        ts->heap = dummy_heap;
>          break;

I think it would be worthwhile to add a comment to clarify that
the updating of per-CPU data here is not entirely pointless,
despite the zeroing done when bringing a CPU back up.

Additionally - is this really the right thing to do uniformly during
CPU_UP_CANCELED / CPU_DEAD? Shouldn't this be done
conditionally upon park_offline_cpus there, and get done for
CPU_REMOVE in the opposite case (like we do elsewhere, in
particular in cpu_percpu_callback() itself)?

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] xen/timers: Fix memory leak with cpu hot unplug
  2019-04-01  8:27 ` Jan Beulich
@ 2019-04-01 16:59   ` Andrew Cooper
  2019-04-02 10:32     ` Jan Beulich
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Cooper @ 2019-04-01 16:59 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Xen-devel, Julien Grall, Ian Jackson

On 01/04/2019 09:27, Jan Beulich wrote:
>>>> On 29.03.19 at 17:57, <andrew.cooper3@citrix.com> wrote:
>> timer_softirq_action() realloc's itself a larger timer heap whenever
>> necessary, which includes bootstrapping from the empty dummy_heap.  Nothing
>> ever freed this allocation.
>>
>> CPU hot unplug and plug has the side effect of zeroing the percpu data area,
>> which clears ts->heap.  This in turn causes new timers to be put on the list
>> rather than the heap, and for timer_softirq_action() to bootstrap itself
>> again.
>>
>> This in practice leaks ts->heap every time a CPU is hot unplugged and
>> replugged.  In the cpu notifier, free the heap after migrating all other
>> timers away.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Could I talk you into using online/offline instead of plug/unplug,
> as the latter is pretty tied to the physical operation of adding /
> removing a CPU to / from a system, whereas the issue here
> also surfaces for pure software actions like suspend/resume
> or the xen-hptool operations?

Ok.

>
>> --- a/xen/common/timer.c
>> +++ b/xen/common/timer.c
>> @@ -631,6 +631,10 @@ static int cpu_callback(
>>      case CPU_UP_CANCELED:
>>      case CPU_DEAD:
>>          migrate_timers_from_cpu(cpu);
>> +        ASSERT(heap_metadata(ts->heap)->size == 0);
>> +        if ( heap_metadata(ts->heap)->limit )
>> +            xfree(ts->heap);
>> +        ts->heap = dummy_heap;
>>          break;
> I think it would be worthwhile to add a comment to clarify that
> the updating of per-CPU data here is not entirely pointless,
> despite the zeroing done when bringing a CPU back up.

What kind of comment do you think would be useful here? 

ts->heap obviously shouldn’t be left as a wild pointer, and I don't see
why this point is worthy of comment.

>
> Additionally - is this really the right thing to do uniformly during
> CPU_UP_CANCELED / CPU_DEAD? Shouldn't this be done
> conditionally upon park_offline_cpus there, and get done for
> CPU_REMOVE in the opposite case (like we do elsewhere, in
> particular in cpu_percpu_callback() itself)?

Its certainly the safest course of action, and obviously needs to follow
migrate_timers_from_cpu()

Do parked CPUs actually get entered into the online map?  It appears so.

Either way, the actual semantics of park_offline_cpus are undocumented,
and if the intended semantics are to use that bifurcated logic
everywhere, then I think the semantics want rethinking.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] xen/timers: Fix memory leak with cpu hot unplug
  2019-04-01 16:59   ` Andrew Cooper
@ 2019-04-02 10:32     ` Jan Beulich
  0 siblings, 0 replies; 4+ messages in thread
From: Jan Beulich @ 2019-04-02 10:32 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Xen-devel, Julien Grall, Ian Jackson

>>> On 01.04.19 at 18:59, <andrew.cooper3@citrix.com> wrote:
> On 01/04/2019 09:27, Jan Beulich wrote:
>>>>> On 29.03.19 at 17:57, <andrew.cooper3@citrix.com> wrote:
>>> --- a/xen/common/timer.c
>>> +++ b/xen/common/timer.c
>>> @@ -631,6 +631,10 @@ static int cpu_callback(
>>>      case CPU_UP_CANCELED:
>>>      case CPU_DEAD:
>>>          migrate_timers_from_cpu(cpu);
>>> +        ASSERT(heap_metadata(ts->heap)->size == 0);
>>> +        if ( heap_metadata(ts->heap)->limit )
>>> +            xfree(ts->heap);
>>> +        ts->heap = dummy_heap;
>>>          break;
>> I think it would be worthwhile to add a comment to clarify that
>> the updating of per-CPU data here is not entirely pointless,
>> despite the zeroing done when bringing a CPU back up.
> 
> What kind of comment do you think would be useful here? 
> 
> ts->heap obviously shouldn’t be left as a wild pointer, and I don't see
> why this point is worthy of comment.

Oh, I wasn't suggesting to comment the freeing. It's the storing
of dummy_heap's address which might look unnecessary when
being aware of the clearing of per-CPU data.

>> Additionally - is this really the right thing to do uniformly during
>> CPU_UP_CANCELED / CPU_DEAD? Shouldn't this be done
>> conditionally upon park_offline_cpus there, and get done for
>> CPU_REMOVE in the opposite case (like we do elsewhere, in
>> particular in cpu_percpu_callback() itself)?
> 
> Its certainly the safest course of action, and obviously needs to follow
> migrate_timers_from_cpu()
> 
> Do parked CPUs actually get entered into the online map?  It appears so.

No, they don't: It's a full cpu_down() they go through. That's
why the new CPU_REMOVE notification was introduced, for
handlers to be able to free per-CPU data at the appropriate
point in time.

> Either way, the actual semantics of park_offline_cpus are undocumented,
> and if the intended semantics are to use that bifurcated logic
> everywhere, then I think the semantics want rethinking.

Well, I don't mind finding a different, perhaps easier to use
model. Back then, when introducing parking, I couldn't think
of one.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-04-02 10:32 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-29 16:57 [PATCH] xen/timers: Fix memory leak with cpu hot unplug Andrew Cooper
2019-04-01  8:27 ` Jan Beulich
2019-04-01 16:59   ` Andrew Cooper
2019-04-02 10:32     ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.