All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] tick/broadcast: Make movement of broadcast hrtimer robust against hotplug
@ 2015-01-19 10:26 ` Preeti U Murthy
  0 siblings, 0 replies; 6+ messages in thread
From: Preeti U Murthy @ 2015-01-19 10:26 UTC (permalink / raw)
  To: tglx; +Cc: aik, shreyas, linux-kernel, michael, anton, svaidy, linuxppc-dev

Today if a cpu handling broadcasting of wakeups goes offline, it hands over
the job of broadcasting to another cpu in the CPU_DEAD phase. The CPU_DEAD
notifiers are run only after the offline cpu sets its state as CPU_DEAD.
Meanwhile, the kthread doing the offline is scheduled out while waiting for
this transition by queuing a timer. This is fatal because if the cpu on which
this kthread was running has no other work queued on it, it can re-enter deep
idle state, since it sees that a broadcast cpu still exists. However the broadcast
wakeup will never come since the cpu which was handling it is offline, and this cpu
never wakes up to see this because its in deep idle state.

Fix this by setting the broadcast timer to a max value so as to force the cpus
entering deep idle states henceforth to freshly nominate the broadcast cpu. More
importantly this has to be done in the CPU_DYING phase so that it is visible to
all cpus right after exiting stop_machine, which is when they can re-enter idle.
This ensures that handover of the broadcast duty falls in place on offline, without
having to do it explicitly.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---

 kernel/time/clockevents.c    |    2 +-
 kernel/time/tick-broadcast.c |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 5544990..f3907c9 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -568,6 +568,7 @@ int clockevents_notify(unsigned long reason, void *arg)
 
 	case CLOCK_EVT_NOTIFY_CPU_DYING:
 		tick_handover_do_timer(arg);
+		tick_shutdown_broadcast_oneshot(arg);
 		break;
 
 	case CLOCK_EVT_NOTIFY_SUSPEND:
@@ -580,7 +581,6 @@ int clockevents_notify(unsigned long reason, void *arg)
 		break;
 
 	case CLOCK_EVT_NOTIFY_CPU_DEAD:
-		tick_shutdown_broadcast_oneshot(arg);
 		tick_shutdown_broadcast(arg);
 		tick_shutdown(arg);
 		/*
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 066f0ec..e9c1d9b 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -675,8 +675,8 @@ static void broadcast_move_bc(int deadcpu)
 
 	if (!bc || !broadcast_needs_cpu(bc, deadcpu))
 		return;
-	/* This moves the broadcast assignment to this cpu */
-	clockevents_program_event(bc, bc->next_event, 1);
+	/* This allows fresh nomination of broadcast cpu */
+	bc->next_event.tv64 = KTIME_MAX;
 }
 
 /*


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH] tick/broadcast: Make movement of broadcast hrtimer robust against hotplug
@ 2015-01-19 10:26 ` Preeti U Murthy
  0 siblings, 0 replies; 6+ messages in thread
From: Preeti U Murthy @ 2015-01-19 10:26 UTC (permalink / raw)
  To: tglx; +Cc: aik, shreyas, linux-kernel, michael, anton, linuxppc-dev

Today if a cpu handling broadcasting of wakeups goes offline, it hands over
the job of broadcasting to another cpu in the CPU_DEAD phase. The CPU_DEAD
notifiers are run only after the offline cpu sets its state as CPU_DEAD.
Meanwhile, the kthread doing the offline is scheduled out while waiting for
this transition by queuing a timer. This is fatal because if the cpu on which
this kthread was running has no other work queued on it, it can re-enter deep
idle state, since it sees that a broadcast cpu still exists. However the broadcast
wakeup will never come since the cpu which was handling it is offline, and this cpu
never wakes up to see this because its in deep idle state.

Fix this by setting the broadcast timer to a max value so as to force the cpus
entering deep idle states henceforth to freshly nominate the broadcast cpu. More
importantly this has to be done in the CPU_DYING phase so that it is visible to
all cpus right after exiting stop_machine, which is when they can re-enter idle.
This ensures that handover of the broadcast duty falls in place on offline, without
having to do it explicitly.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---

 kernel/time/clockevents.c    |    2 +-
 kernel/time/tick-broadcast.c |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 5544990..f3907c9 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -568,6 +568,7 @@ int clockevents_notify(unsigned long reason, void *arg)
 
 	case CLOCK_EVT_NOTIFY_CPU_DYING:
 		tick_handover_do_timer(arg);
+		tick_shutdown_broadcast_oneshot(arg);
 		break;
 
 	case CLOCK_EVT_NOTIFY_SUSPEND:
@@ -580,7 +581,6 @@ int clockevents_notify(unsigned long reason, void *arg)
 		break;
 
 	case CLOCK_EVT_NOTIFY_CPU_DEAD:
-		tick_shutdown_broadcast_oneshot(arg);
 		tick_shutdown_broadcast(arg);
 		tick_shutdown(arg);
 		/*
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 066f0ec..e9c1d9b 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -675,8 +675,8 @@ static void broadcast_move_bc(int deadcpu)
 
 	if (!bc || !broadcast_needs_cpu(bc, deadcpu))
 		return;
-	/* This moves the broadcast assignment to this cpu */
-	clockevents_program_event(bc, bc->next_event, 1);
+	/* This allows fresh nomination of broadcast cpu */
+	bc->next_event.tv64 = KTIME_MAX;
 }
 
 /*

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: tick/broadcast: Make movement of broadcast hrtimer robust against hotplug
  2015-01-19 10:26 ` Preeti U Murthy
@ 2015-01-20  6:09   ` Michael Ellerman
  -1 siblings, 0 replies; 6+ messages in thread
From: Michael Ellerman @ 2015-01-20  6:09 UTC (permalink / raw)
  To: Preeti U Murthy, tglx; +Cc: aik, shreyas, linux-kernel, anton, linuxppc-dev

On Mon, 2015-19-01 at 10:26:48 UTC, Preeti U Murthy wrote:
> Today if a cpu handling broadcasting of wakeups goes offline, it hands over

It's *the* cpu handling broadcasting of wakeups right? ie. there's only ever
one at a time.

> the job of broadcasting to another cpu in the CPU_DEAD phase. 

I think that would be clearer as "to another cpu, when the cpu going offline
reaches the CPU_DEAD state."

Otherwise it can read as "another cpu (which is) in the CPU_DEAD phase", which
is not what you mean - I think.

> The CPU_DEAD notifiers are run only after the offline cpu sets its state as
> CPU_DEAD. Meanwhile, the kthread doing the offline is scheduled out while

The kthread which is running on a different cpu from either of the first two
cpus you've mentioned.

> waiting for this transition by queuing a timer. This is fatal because if the
> cpu on which this kthread was running has no other work queued on it, it can
> re-enter deep idle state, since it sees that a broadcast cpu still exists.
> However the broadcast wakeup will never come since the cpu which was handling
> it is offline, and this cpu never wakes up to see this because its in deep
> idle state.

Which cpu is "this cpu"? I think you mean the one running the kthread which is
doing the offline, but it's not 100% clear.

> Fix this by setting the broadcast timer to a max value so as to force the cpus
> entering deep idle states henceforth to freshly nominate the broadcast cpu. More
> importantly this has to be done in the CPU_DYING phase so that it is visible to
> all cpus right after exiting stop_machine, which is when they can re-enter idle.
> This ensures that handover of the broadcast duty falls in place on offline, without
> having to do it explicitly.

OK, I don't know the code well enough to say if that's the right fix.

You say:

+	/* This allows fresh nomination of broadcast cpu */
+	bc->next_event.tv64 = KTIME_MAX;

Is that all it does? I see that check in several places in the code.


I assume we're expecting Thomas to merge this?

If so it's probably worth mentioning that it fixes a bug we are seeing on
machines in the wild. So it'd be nice if it went into 3.19 and/or gets sent to
stable.

cheers

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: tick/broadcast: Make movement of broadcast hrtimer robust against hotplug
@ 2015-01-20  6:09   ` Michael Ellerman
  0 siblings, 0 replies; 6+ messages in thread
From: Michael Ellerman @ 2015-01-20  6:09 UTC (permalink / raw)
  To: Preeti U Murthy, tglx; +Cc: aik, shreyas, linuxppc-dev, linux-kernel, anton

On Mon, 2015-19-01 at 10:26:48 UTC, Preeti U Murthy wrote:
> Today if a cpu handling broadcasting of wakeups goes offline, it hands over

It's *the* cpu handling broadcasting of wakeups right? ie. there's only ever
one at a time.

> the job of broadcasting to another cpu in the CPU_DEAD phase. 

I think that would be clearer as "to another cpu, when the cpu going offline
reaches the CPU_DEAD state."

Otherwise it can read as "another cpu (which is) in the CPU_DEAD phase", which
is not what you mean - I think.

> The CPU_DEAD notifiers are run only after the offline cpu sets its state as
> CPU_DEAD. Meanwhile, the kthread doing the offline is scheduled out while

The kthread which is running on a different cpu from either of the first two
cpus you've mentioned.

> waiting for this transition by queuing a timer. This is fatal because if the
> cpu on which this kthread was running has no other work queued on it, it can
> re-enter deep idle state, since it sees that a broadcast cpu still exists.
> However the broadcast wakeup will never come since the cpu which was handling
> it is offline, and this cpu never wakes up to see this because its in deep
> idle state.

Which cpu is "this cpu"? I think you mean the one running the kthread which is
doing the offline, but it's not 100% clear.

> Fix this by setting the broadcast timer to a max value so as to force the cpus
> entering deep idle states henceforth to freshly nominate the broadcast cpu. More
> importantly this has to be done in the CPU_DYING phase so that it is visible to
> all cpus right after exiting stop_machine, which is when they can re-enter idle.
> This ensures that handover of the broadcast duty falls in place on offline, without
> having to do it explicitly.

OK, I don't know the code well enough to say if that's the right fix.

You say:

+	/* This allows fresh nomination of broadcast cpu */
+	bc->next_event.tv64 = KTIME_MAX;

Is that all it does? I see that check in several places in the code.


I assume we're expecting Thomas to merge this?

If so it's probably worth mentioning that it fixes a bug we are seeing on
machines in the wild. So it'd be nice if it went into 3.19 and/or gets sent to
stable.

cheers

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: tick/broadcast: Make movement of broadcast hrtimer robust against hotplug
  2015-01-20  6:09   ` Michael Ellerman
@ 2015-01-20  6:58     ` Preeti U Murthy
  -1 siblings, 0 replies; 6+ messages in thread
From: Preeti U Murthy @ 2015-01-20  6:58 UTC (permalink / raw)
  To: Michael Ellerman, tglx; +Cc: aik, shreyas, linux-kernel, anton, linuxppc-dev

On 01/20/2015 11:39 AM, Michael Ellerman wrote:
> On Mon, 2015-19-01 at 10:26:48 UTC, Preeti U Murthy wrote:
>> Today if a cpu handling broadcasting of wakeups goes offline, it hands over
> 
> It's *the* cpu handling broadcasting of wakeups right? ie. there's only ever
> one at a time.

Right, thanks for pointing this.

> 
>> the job of broadcasting to another cpu in the CPU_DEAD phase. 
> 
> I think that would be clearer as "to another cpu, when the cpu going offline
> reaches the CPU_DEAD state."
> 
> Otherwise it can read as "another cpu (which is) in the CPU_DEAD phase", which
> is not what you mean - I think.

Yes I will word this better.
> 
>> The CPU_DEAD notifiers are run only after the offline cpu sets its state as
>> CPU_DEAD. Meanwhile, the kthread doing the offline is scheduled out while
> 
> The kthread which is running on a different cpu from either of the first two
> cpus you've mentioned.

It could be running on any cpu other than the offline one. The next line
clarifies this - "This is fatal if the cpu on which this kthread was
running". I also say "Meanwhile, the kthread doing the offline" above so
as to clarify that the offline cpu has nothing to do with this kthread.

> 
>> waiting for this transition by queuing a timer. This is fatal because if the
>> cpu on which this kthread was running has no other work queued on it, it can
>> re-enter deep idle state, since it sees that a broadcast cpu still exists.
>> However the broadcast wakeup will never come since the cpu which was handling
>> it is offline, and this cpu never wakes up to see this because its in deep
>> idle state.
> 
> Which cpu is "this cpu"? I think you mean the one running the kthread which is
> doing the offline, but it's not 100% clear.

Right, I will make this correction as well, its ambiguous.

> 
>> Fix this by setting the broadcast timer to a max value so as to force the cpus
>> entering deep idle states henceforth to freshly nominate the broadcast cpu. More
>> importantly this has to be done in the CPU_DYING phase so that it is visible to
>> all cpus right after exiting stop_machine, which is when they can re-enter idle.
>> This ensures that handover of the broadcast duty falls in place on offline, without
>> having to do it explicitly.
> 
> OK, I don't know the code well enough to say if that's the right fix.
> 
> You say:
> 
> +	/* This allows fresh nomination of broadcast cpu */
> +	bc->next_event.tv64 = KTIME_MAX;
> 
> Is that all it does? I see that check in several places in the code.

This change does not affect those parts of the tick broadcast code,
which do not depend on the hrtimer broadcast framework. And for those
parts that do depend on this framework, this plays a critical role in
handing over the broadcast duty.

> 
> 
> I assume we're expecting Thomas to merge this?

Yes, Thomas can you please take this into the next 3.19 rc release ? I
will send out a new patch with the modified changelog. If you find this
acceptable I will port it to the relevant stable kernels.
> 
> If so it's probably worth mentioning that it fixes a bug we are seeing on

I will mention this in the changelog by pointing to the bug report.

Regards
Preeti U Murthy
> machines in the wild. So it'd be nice if it went into 3.19 and/or gets sent to
> stable.
> 
> cheers
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: tick/broadcast: Make movement of broadcast hrtimer robust against hotplug
@ 2015-01-20  6:58     ` Preeti U Murthy
  0 siblings, 0 replies; 6+ messages in thread
From: Preeti U Murthy @ 2015-01-20  6:58 UTC (permalink / raw)
  To: Michael Ellerman, tglx; +Cc: aik, shreyas, linuxppc-dev, linux-kernel, anton

On 01/20/2015 11:39 AM, Michael Ellerman wrote:
> On Mon, 2015-19-01 at 10:26:48 UTC, Preeti U Murthy wrote:
>> Today if a cpu handling broadcasting of wakeups goes offline, it hands over
> 
> It's *the* cpu handling broadcasting of wakeups right? ie. there's only ever
> one at a time.

Right, thanks for pointing this.

> 
>> the job of broadcasting to another cpu in the CPU_DEAD phase. 
> 
> I think that would be clearer as "to another cpu, when the cpu going offline
> reaches the CPU_DEAD state."
> 
> Otherwise it can read as "another cpu (which is) in the CPU_DEAD phase", which
> is not what you mean - I think.

Yes I will word this better.
> 
>> The CPU_DEAD notifiers are run only after the offline cpu sets its state as
>> CPU_DEAD. Meanwhile, the kthread doing the offline is scheduled out while
> 
> The kthread which is running on a different cpu from either of the first two
> cpus you've mentioned.

It could be running on any cpu other than the offline one. The next line
clarifies this - "This is fatal if the cpu on which this kthread was
running". I also say "Meanwhile, the kthread doing the offline" above so
as to clarify that the offline cpu has nothing to do with this kthread.

> 
>> waiting for this transition by queuing a timer. This is fatal because if the
>> cpu on which this kthread was running has no other work queued on it, it can
>> re-enter deep idle state, since it sees that a broadcast cpu still exists.
>> However the broadcast wakeup will never come since the cpu which was handling
>> it is offline, and this cpu never wakes up to see this because its in deep
>> idle state.
> 
> Which cpu is "this cpu"? I think you mean the one running the kthread which is
> doing the offline, but it's not 100% clear.

Right, I will make this correction as well, its ambiguous.

> 
>> Fix this by setting the broadcast timer to a max value so as to force the cpus
>> entering deep idle states henceforth to freshly nominate the broadcast cpu. More
>> importantly this has to be done in the CPU_DYING phase so that it is visible to
>> all cpus right after exiting stop_machine, which is when they can re-enter idle.
>> This ensures that handover of the broadcast duty falls in place on offline, without
>> having to do it explicitly.
> 
> OK, I don't know the code well enough to say if that's the right fix.
> 
> You say:
> 
> +	/* This allows fresh nomination of broadcast cpu */
> +	bc->next_event.tv64 = KTIME_MAX;
> 
> Is that all it does? I see that check in several places in the code.

This change does not affect those parts of the tick broadcast code,
which do not depend on the hrtimer broadcast framework. And for those
parts that do depend on this framework, this plays a critical role in
handing over the broadcast duty.

> 
> 
> I assume we're expecting Thomas to merge this?

Yes, Thomas can you please take this into the next 3.19 rc release ? I
will send out a new patch with the modified changelog. If you find this
acceptable I will port it to the relevant stable kernels.
> 
> If so it's probably worth mentioning that it fixes a bug we are seeing on

I will mention this in the changelog by pointing to the bug report.

Regards
Preeti U Murthy
> machines in the wild. So it'd be nice if it went into 3.19 and/or gets sent to
> stable.
> 
> cheers
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-01-20  6:58 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-19 10:26 [PATCH] tick/broadcast: Make movement of broadcast hrtimer robust against hotplug Preeti U Murthy
2015-01-19 10:26 ` Preeti U Murthy
2015-01-20  6:09 ` Michael Ellerman
2015-01-20  6:09   ` Michael Ellerman
2015-01-20  6:58   ` Preeti U Murthy
2015-01-20  6:58     ` Preeti U Murthy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.