* [PATCH v2 0/2] ia64: prevent irq migration race in __cpu_disable path
@ 2009-02-09 18:13 ` Alex Chiang
  0 siblings, 0 replies; 17+ messages in thread
From: Alex Chiang @ 2009-02-09 18:13 UTC (permalink / raw)
  To: tony.luck; +Cc: Paul E. McKenney, stable, linux-ia64, linux-kernel

This is v2 of my attempt to prevent an oops while offlining CPUs.

The change is that the patch becomes a full revert of Paul's
original patch, along with a long changelog that explains the
situation as best I can determine. It's not 100% satisfactory
to me right now, but the testing we've done supports the patch.

The 2nd patch in the series is mostly cosmetic, and removes a
redundant call to cpu_clear() that we no longer need.

Tony, if you agree with the rationale in 1/2, then this series is
a candidate for .29.

stable team, if Tony pushes upstream for .29, then this series
should be applied to the .27 and .28 stable series.

Thanks.

/ac


^ permalink raw reply	[flat|nested] 17+ messages in thread


* [PATCH v2 1/2] Revert "[IA64] prevent ia64 from invoking irq handlers on offline CPUs"
  2009-02-09 18:13 ` [PATCH v2 0/2] ia64: prevent irq migration race in __cpu_disable Alex Chiang
@ 2009-02-09 18:16   ` Alex Chiang
  -1 siblings, 0 replies; 17+ messages in thread
From: Alex Chiang @ 2009-02-09 18:16 UTC (permalink / raw)
  To: tony.luck, Paul E. McKenney, stable, linux-ia64, linux-kernel

This reverts commit e7b140365b86aaf94374214c6f4e6decbee2eb0a.

Commit e7b14036 removes the targeted disabled CPU from the
cpu_online_map after calls to migrate_platform_irqs and fixup_irqs.

Paul McKenney states that the reasoning behind the patch was to
prevent irq handlers from running on CPUs marked offline because:

	RCU happily ignores CPUs that don't have their bits set in
	cpu_online_map, so if there are RCU read-side critical sections
	in the irq handlers being run, RCU will ignore them.  If the
	other CPUs were running, they might sequence through the RCU
	state machine, which could result in data structures being
	yanked out from under those irq handlers, which in turn could
	result in oopses or worse.

Unfortunately, both ia64 functions above look at cpu_online_map to find
a new CPU to migrate interrupts onto. This means we can potentially
migrate an interrupt off ourself back to... ourself. Uh oh.

This causes an oops when we finally try to process pending interrupts on
the CPU we want to disable. The oops results from calling __do_IRQ with
a NULL pt_regs:

Unable to handle kernel NULL pointer dereference (address 0000000000000040)
Call Trace:
 [<a000000100016930>] show_stack+0x50/0xa0
                                sp=e0000009c922fa00 bsp=e0000009c92214d0
 [<a0000001000171a0>] show_regs+0x820/0x860
                                sp=e0000009c922fbd0 bsp=e0000009c9221478
 [<a00000010003c700>] die+0x1a0/0x2e0
                                sp=e0000009c922fbd0 bsp=e0000009c9221438
 [<a0000001006e92f0>] ia64_do_page_fault+0x950/0xa80
                                sp=e0000009c922fbd0 bsp=e0000009c92213d8
 [<a00000010000c7a0>] ia64_native_leave_kernel+0x0/0x270
                                sp=e0000009c922fc60 bsp=e0000009c92213d8
 [<a0000001000ecdb0>] profile_tick+0xd0/0x1c0
                                sp=e0000009c922fe30 bsp=e0000009c9221398
 [<a00000010003bb90>] timer_interrupt+0x170/0x3e0
                                sp=e0000009c922fe30 bsp=e0000009c9221330
 [<a00000010013a800>] handle_IRQ_event+0x80/0x120
                                sp=e0000009c922fe30 bsp=e0000009c92212f8
 [<a00000010013aa00>] __do_IRQ+0x160/0x4a0
                                sp=e0000009c922fe30 bsp=e0000009c9221290
 [<a000000100012290>] ia64_process_pending_intr+0x2b0/0x360
                                sp=e0000009c922fe30 bsp=e0000009c9221208
 [<a0000001000112d0>] fixup_irqs+0xf0/0x2a0
                                sp=e0000009c922fe30 bsp=e0000009c92211a8
 [<a00000010005bd80>] __cpu_disable+0x140/0x240
                                sp=e0000009c922fe30 bsp=e0000009c9221168
 [<a0000001006c5870>] take_cpu_down+0x50/0xa0
                                sp=e0000009c922fe30 bsp=e0000009c9221148
 [<a000000100122610>] stop_cpu+0xd0/0x200
                                sp=e0000009c922fe30 bsp=e0000009c92210f0
 [<a0000001000e0440>] kthread+0xc0/0x140
                                sp=e0000009c922fe30 bsp=e0000009c92210c8
 [<a000000100014ab0>] kernel_thread_helper+0xd0/0x100
                                sp=e0000009c922fe30 bsp=e0000009c92210a0
 [<a00000010000a4c0>] start_kernel_thread+0x20/0x40
                                sp=e0000009c922fe30 bsp=e0000009c92210a0

I don't like this revert because it is fragile. ia64 is getting lucky
because we seem to only ever process timer interrupts in this path, but
if we ever race with an IPI here, we definitely use RCU and have the
potential of hitting an oops that Paul describes above.

Patching ia64's timer_interrupt() to check for NULL pt_regs is
insufficient though, as we still hit the above oops.

As a short term solution, I do think that this revert is the right
answer. The revert held up under repeated testing (24+ hour test runs)
with this setup:

	- 8-way rx6600
	- randomly toggling CPU online/offline state every 2 seconds
	- running CPU exercisers, memory hog, disk exercisers, and
	  network stressors
	- average system load of ~160

In the long term, we really need to figure out why we set pt_regs = NULL
in ia64_process_pending_intr(). If it turns out that it is unnecessary
to do so, then we could safely re-introduce e7b14036 (along with some
other logic to be smarter about migrating interrupts).

One final note: x86 also removes the disabled CPU from cpu_online_map
and then re-enables interrupts for 1ms, presumably to handle any pending
interrupts:

arch/x86/kernel/irq_32.c (and irq_64.c):
cpu_disable_common:
	[remove cpu from cpu_online_map]

	fixup_irqs():
		for_each_irq:
			[break CPU affinities]

		local_irq_enable();
		mdelay(1);
		local_irq_disable();

So they are doing implicitly what ia64 is doing explicitly.

Cc: stable@kernel.org
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
---
 arch/ia64/kernel/smpboot.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/ia64/kernel/smpboot.c b/arch/ia64/kernel/smpboot.c
index 1146399..2ec5bbf 100644
--- a/arch/ia64/kernel/smpboot.c
+++ b/arch/ia64/kernel/smpboot.c
@@ -736,14 +736,16 @@ int __cpu_disable(void)
 			return -EBUSY;
 	}
 
+	cpu_clear(cpu, cpu_online_map);
+
 	if (migrate_platform_irqs(cpu)) {
 		cpu_set(cpu, cpu_online_map);
 		return (-EBUSY);
 	}
 
 	remove_siblinginfo(cpu);
-	fixup_irqs();
 	cpu_clear(cpu, cpu_online_map);
+	fixup_irqs();
 	local_flush_tlb_all();
 	cpu_clear(cpu, cpu_callin_map);
 	return 0;
-- 
1.6.0.1.161.g7f314



* [PATCH v2 2/2] ia64: Remove redundant cpu_clear() in __cpu_disable path
  2009-02-09 18:13 ` [PATCH v2 0/2] ia64: prevent irq migration race in __cpu_disable Alex Chiang
@ 2009-02-09 18:16   ` Alex Chiang
  -1 siblings, 0 replies; 17+ messages in thread
From: Alex Chiang @ 2009-02-09 18:16 UTC (permalink / raw)
  To: tony.luck, Paul E. McKenney, stable, linux-ia64, linux-kernel

The second call to cpu_clear() is redundant, as we've already removed
the CPU from cpu_online_map before calling migrate_platform_irqs().

Cc: stable@kernel.org
Signed-off-by: Alex Chiang <achiang@hp.com>
---
 arch/ia64/kernel/smpboot.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/smpboot.c b/arch/ia64/kernel/smpboot.c
index 2ec5bbf..5229054 100644
--- a/arch/ia64/kernel/smpboot.c
+++ b/arch/ia64/kernel/smpboot.c
@@ -740,11 +740,10 @@ int __cpu_disable(void)
 
 	if (migrate_platform_irqs(cpu)) {
 		cpu_set(cpu, cpu_online_map);
-		return (-EBUSY);
+		return -EBUSY;
 	}
 
 	remove_siblinginfo(cpu);
-	cpu_clear(cpu, cpu_online_map);
 	fixup_irqs();
 	local_flush_tlb_all();
 	cpu_clear(cpu, cpu_callin_map);
-- 
1.6.0.1.161.g7f314



* Re: [PATCH v2 1/2] Revert "[IA64] prevent ia64 from invoking irq handlers on offline CPUs"
  2009-02-09 18:16   ` [PATCH v2 1/2] Revert "[IA64] prevent ia64 from invoking irq Alex Chiang
@ 2009-02-09 21:17     ` Alex Chiang
  -1 siblings, 0 replies; 17+ messages in thread
From: Alex Chiang @ 2009-02-09 21:17 UTC (permalink / raw)
  To: tony.luck, Paul E. McKenney, stable, linux-ia64, linux-kernel

Hi Tony,

* Alex Chiang <achiang@hp.com>:
> This reverts commit e7b140365b86aaf94374214c6f4e6decbee2eb0a.
> 
> Commit e7b14036 removes the targetted disabled CPU from the
> cpu_online_map after calls to migrate_platform_irqs and fixup_irqs.

I'm currently testing the patch below as a v3.

> Paul McKenney states that the reasoning behind the patch was to
> prevent irq handlers from running on CPUs marked offline because:
> 
> 	RCU happily ignores CPUs that don't have their bits set in
> 	cpu_online_map, so if there are RCU read-side critical sections
> 	in the irq handlers being run, RCU will ignore them.  If the
> 	other CPUs were running, they might sequence through the RCU
> 	state machine, which could result in data structures being
> 	yanked out from under those irq handlers, which in turn could
> 	result in oopses or worse.
> 
> Unfortunately, both ia64 functions above look at cpu_online_map to find
> a new CPU to migrate interrupts onto. This means we can potentially
> migrate an interrupt off ourself back to... ourself. Uh oh.

v3 uses cpu_active_mask to find an interrupt migration target.
This should fix both the oops we were seeing as well as avoid the
issues with RCU that Paul mentions above.

I also think that this fix is simpler for us to think through
rather than making Paul think through the implications of
changing RCU to use cpu_active_mask. :)

So far, it's survived ~45 minutes on my simple test bed (without
any patches, it usually crashes in < 15 minutes). I'm about to
start a longer run on our complex test system that runs under
heavy load.

Hopefully I'll have some results for tomorrow, in which case I'll
send a proper patch.

Thanks.

/ac

diff --git a/arch/ia64/kernel/irq.c b/arch/ia64/kernel/irq.c
index a58f64c..9eaab3c 100644
--- a/arch/ia64/kernel/irq.c
+++ b/arch/ia64/kernel/irq.c
@@ -155,7 +155,7 @@ static void migrate_irqs(void)
 			 */
 			vectors_in_migration[irq] = irq;
 
-			new_cpu = cpumask_any(cpu_online_mask);
+			new_cpu = cpumask_any(cpu_active_mask);
 
 			/*
 			 * Al three are essential, currently WARN_ON.. maybe panic?
diff --git a/arch/ia64/kernel/smpboot.c b/arch/ia64/kernel/smpboot.c
index 1146399..4e8765d 100644
--- a/arch/ia64/kernel/smpboot.c
+++ b/arch/ia64/kernel/smpboot.c
@@ -694,7 +694,7 @@ int migrate_platform_irqs(unsigned int cpu)
 			/*
 			 * Now re-target the CPEI to a different processor
 			 */
-			new_cpei_cpu = any_online_cpu(cpu_online_map);
+			new_cpei_cpu = cpumask_any(cpu_active_mask);
 			mask = cpumask_of(new_cpei_cpu);
 			set_cpei_target_cpu(new_cpei_cpu);
 			desc = irq_desc + ia64_cpe_irq;


* Re: [PATCH v2 1/2] Revert "[IA64] prevent ia64 from invoking irq handlers on offline CPUs"
  2009-02-09 21:17     ` [PATCH v2 1/2] Revert "[IA64] prevent ia64 from invoking irq Alex Chiang
@ 2009-02-09 23:33       ` Alex Chiang
  -1 siblings, 0 replies; 17+ messages in thread
From: Alex Chiang @ 2009-02-09 23:33 UTC (permalink / raw)
  To: tony.luck, Paul E. McKenney, stable, linux-ia64, linux-kernel

Hi Tony,

* Alex Chiang <achiang@hp.com>:
> * Alex Chiang <achiang@hp.com>:
> > This reverts commit e7b140365b86aaf94374214c6f4e6decbee2eb0a.
> > 
> > Commit e7b14036 removes the targetted disabled CPU from the
> > cpu_online_map after calls to migrate_platform_irqs and fixup_irqs.
> 
> I'm currently testing the patch below as a v3.
> 
> > Paul McKenney states that the reasoning behind the patch was to
> > prevent irq handlers from running on CPUs marked offline because:
> > 
> > 	RCU happily ignores CPUs that don't have their bits set in
> > 	cpu_online_map, so if there are RCU read-side critical sections
> > 	in the irq handlers being run, RCU will ignore them.  If the
> > 	other CPUs were running, they might sequence through the RCU
> > 	state machine, which could result in data structures being
> > 	yanked out from under those irq handlers, which in turn could
> > 	result in oopses or worse.
> > 
> > Unfortunately, both ia64 functions above look at cpu_online_map to find
> > a new CPU to migrate interrupts onto. This means we can potentially
> > migrate an interrupt off ourself back to... ourself. Uh oh.
> 
> v3 uses cpu_active_mask to find an interrupt migration target.
> This should fix both the oops we were seeing as well as avoid the
> issues with RCU that Paul mentions above.
> 
> I also think that this fix is simpler for us to think through
> rather than making Paul think through the implications of
> changing RCU to use cpu_active_mask. :)
> 
> So far, it's survived ~45 minutes on my simple test bed (without
> any patches, it usually crashes in < 15 minutes). I'm about to
> start a longer run on our complex test system that runs under
> heavy load.

NAK this patch, I was able to reproduce the crash again.

I'm a little closer to understanding why the original revert
survives my test though.

It seems that during ia64_process_pending_intr(), we will skip
TLB flushes and IPI reschedules.

Vectors lower than IA64_TIMER_VECTOR are masked (because we raise
the TPR), meaning we won't see CMC/CPE interrupts or perfmon
interrupts.

This leaves only IPIs and MCA above IA64_TIMER_VECTOR. The kernel
doesn't actually send many IPIs to itself, so in practice, we
almost never see those.  If we receive an MCA interrupt, well, we
have more problems to worry about than taking a CPU offline (and
whatever implications it may have on RCU). So I'm not concerned
there.

The upshot is that in practice, we pretty much only ever need to
handle the timer interrupt.

The ia64 implementation of timer_interrupt() has this near the
top of the function:

	if (unlikely(cpu_is_offline(smp_processor_id()))) {
		return IRQ_HANDLED;
	}

So if we remove the CPU from cpu_online_map before performing any
of the interrupt migration stuff in __cpu_disable(), I think
we're safe.

Please have a think about this, re-consider my v2 patch series
for inclusion into .29, and let me know what you think.

Thanks.

/ac

> 
> Hopefully I'll have some results for tomorrow, in which case I'll
> send a proper patch.
> 
> Thanks.
> 
> /ac
> 
> diff --git a/arch/ia64/kernel/irq.c b/arch/ia64/kernel/irq.c
> index a58f64c..9eaab3c 100644
> --- a/arch/ia64/kernel/irq.c
> +++ b/arch/ia64/kernel/irq.c
> @@ -155,7 +155,7 @@ static void migrate_irqs(void)
>  			 */
>  			vectors_in_migration[irq] = irq;
>  
> -			new_cpu = cpumask_any(cpu_online_mask);
> +			new_cpu = cpumask_any(cpu_active_mask);
>  
>  			/*
>  			 * Al three are essential, currently WARN_ON.. maybe panic?
> diff --git a/arch/ia64/kernel/smpboot.c b/arch/ia64/kernel/smpboot.c
> index 1146399..4e8765d 100644
> --- a/arch/ia64/kernel/smpboot.c
> +++ b/arch/ia64/kernel/smpboot.c
> @@ -694,7 +694,7 @@ int migrate_platform_irqs(unsigned int cpu)
>  			/*
>  			 * Now re-target the CPEI to a different processor
>  			 */
> -			new_cpei_cpu = any_online_cpu(cpu_online_map);
> +			new_cpei_cpu = cpumask_any(cpu_active_mask);
>  			mask = cpumask_of(new_cpei_cpu);
>  			set_cpei_target_cpu(new_cpei_cpu);
>  			desc = irq_desc + ia64_cpe_irq;
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH v2 1/2] Revert "[IA64] prevent ia64 from invoking irq handlers on offline CPUs"
  2009-02-09 23:33       ` [PATCH v2 1/2] Revert "[IA64] prevent ia64 from invoking irq Alex Chiang
@ 2009-02-09 23:52         ` Russ Anderson
  -1 siblings, 0 replies; 17+ messages in thread
From: Russ Anderson @ 2009-02-09 23:52 UTC (permalink / raw)
  To: Alex Chiang, tony.luck, Paul E. McKenney, stable, linux-ia64,
	linux-kernel
  Cc: rja

On Mon, Feb 09, 2009 at 04:33:24PM -0700, Alex Chiang wrote:
> 
> I'm a little closer to understanding why the original revert
> survives my test though.
> 
> It seems that during ia64_process_pending_intr(), we will skip
> TLB flushes, and IPI reschedules.
> 
> Vectors lower than IA64_TIMER_VECTOR are masked (because we raise
> the TPR), meaning we won't see CMC/CPE interrupts or perfmon
> interrupts.
> 
> This leaves only IPIs and MCA above IA64_TIMER_VECTOR. The kernel
> doesn't actually send many IPIs to itself, so in practice, we
> almost never see those.  If we receive an MCA interrupt, well, we
> have more problems to worry about than taking a CPU offline (and
> whatever implications it may have on RCU). So I'm not concerned
> there.

Keep in mind there are recoverable MCAs on ia64.  It should
be a rare condition to have an MCA surface while taking a CPU
offline, but it could happen.

My main point is to make sure people do not assume that an MCA means
the system is going down.  

> The upshot is that in practice, we pretty much only ever need to
> handle the timer interrupt.

Thanks.
-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc          rja@sgi.com


* Re: [PATCH v2 0/2] ia64: prevent irq migration race in __cpu_disable path
  2009-02-09 18:13 ` [PATCH v2 0/2] ia64: prevent irq migration race in __cpu_disable Alex Chiang
@ 2009-02-10 12:36   ` Paul E. McKenney
  -1 siblings, 0 replies; 17+ messages in thread
From: Paul E. McKenney @ 2009-02-10 12:36 UTC (permalink / raw)
  To: Alex Chiang, tony.luck, stable, linux-ia64, linux-kernel

On Mon, Feb 09, 2009 at 11:13:38AM -0700, Alex Chiang wrote:
> This is v2 of my attempt to prevent an oops while offlining CPUs.
> 
> The change is that the patch becomes a full revert of Paul's
> original patch, along with a long changelog that explains the
> situation as best as I can determine. It's not 100% satisfactory
> to me right now, but the testing we've done supports the patch.
> 
> The 2nd patch in the series is mostly cosmetic, and removes a
> redundant call to cpu_clear() that we no longer need.
> 
> Tony, if you agree with the rationale in 1/2, then this series is
> a candidate for .29.
> 
> stable team, if Tony pushes upstream for .29, then this series
> should be applied to the .27 and .28 stable series.

OK, I'll bite...

Why not use cpu_active_map rather than cpu_online_map to select which
CPU to migrate interrupts to?  That way, we can delay clearing the
bit in cpu_online_map and avoid the questionable scenario where irqs
are being handled by a CPU that appears to be offline.

							Thanx, Paul


* Re: [PATCH v2 0/2] ia64: prevent irq migration race in __cpu_disable path
  2009-02-10 12:36   ` [PATCH v2 0/2] ia64: prevent irq migration race in Paul E. McKenney
@ 2009-02-10 16:11     ` Alex Chiang
  -1 siblings, 0 replies; 17+ messages in thread
From: Alex Chiang @ 2009-02-10 16:11 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: tony.luck, stable, linux-ia64, linux-kernel

* Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> On Mon, Feb 09, 2009 at 11:13:38AM -0700, Alex Chiang wrote:
> > This is v2 of my attempt to prevent an oops while offlining CPUs.
> > 
> > The change is that the patch becomes a full revert of Paul's
> > original patch, along with a long changelog that explains the
> > situation as best as I can determine. It's not 100% satisfactory
> > to me right now, but the testing we've done supports the patch.
> > 
> > The 2nd patch in the series is mostly cosmetic, and removes a
> > redundant call to cpu_clear() that we no longer need.
> > 
> > Tony, if you agree with the rationale in 1/2, then this series is
> > a candidate for .29.
> > 
> > stable team, if Tony pushes upstream for .29, then this series
> > should be applied to the .27 and .28 stable series.
> 
> OK, I'll bite...
> 
> Why not use cpu_active_map rather than cpu_online_map to select which
> CPU to migrate interrupts to?  That way, we can delay clearing the
> bit in cpu_online_map and avoid the questionable scenario where irqs
> are being handled by a CPU that appears to be offline.

I did explain a little bit yesterday here:

	http://lkml.org/lkml/2009/2/9/508

The upshot is that on ia64, in the cpu_down() path, in practice,
we're only seeing the timer interrupt fire, even on a heavily
loaded system with lots of I/O.

And in our timer interrupt routine, we're checking to make sure
that the CPU is online before handling the interrupt.

So at least empirically, we don't seem to allow any offline CPUs
to handle interrupts.

I played around with cpu_active_map yesterday, and realized the
patch I posted was incomplete. When I started fleshing it out a
bit more, I learned that we're simply not using cpu_active_map in
the kernel to the extent that we're using cpu_online_map, and I'm
a bit hesitant to start introducing regressions because I missed
a usage somewhere.

With this below patch, I can't even offline a single CPU, and the
patch is already twice as big as the revert. At this point, the
revert has held up to testing, and in my view, is the clear short
term winner.

I can keep exploring the cpu_active_mask option, but that would
be a .30 activity, and I'd like to get this particular oops fixed
for .29.

Seem like a reasonable way forward?

Thanks.

/ac


diff --git a/arch/ia64/kernel/irq.c b/arch/ia64/kernel/irq.c
index a58f64c..9eaab3c 100644
--- a/arch/ia64/kernel/irq.c
+++ b/arch/ia64/kernel/irq.c
@@ -155,7 +155,7 @@ static void migrate_irqs(void)
 			 */
 			vectors_in_migration[irq] = irq;
 
-			new_cpu = cpumask_any(cpu_online_mask);
+			new_cpu = cpumask_any(cpu_active_mask);
 
 			/*
 			 * Al three are essential, currently WARN_ON.. maybe panic?
diff --git a/arch/ia64/kernel/smpboot.c b/arch/ia64/kernel/smpboot.c
index 1146399..a08175b 100644
--- a/arch/ia64/kernel/smpboot.c
+++ b/arch/ia64/kernel/smpboot.c
@@ -396,7 +396,8 @@ smp_callin (void)
 	/* Setup the per cpu irq handling data structures */
 	__setup_vector_irq(cpuid);
 	notify_cpu_starting(cpuid);
-	cpu_set(cpuid, cpu_online_map);
+	set_cpu_online(cpuid, true);
+	set_cpu_active(cpuid, true);
 	per_cpu(cpu_state, cpuid) = CPU_ONLINE;
 	spin_unlock(&vector_lock);
 	ipi_call_unlock_irq();
@@ -694,7 +695,7 @@ int migrate_platform_irqs(unsigned int cpu)
 			/*
 			 * Now re-target the CPEI to a different processor
 			 */
-			new_cpei_cpu = any_online_cpu(cpu_online_map);
+			new_cpei_cpu = cpumask_any(cpu_active_mask);
 			mask = cpumask_of(new_cpei_cpu);
 			set_cpei_target_cpu(new_cpei_cpu);
 			desc = irq_desc + ia64_cpe_irq;
diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index f0ebb34..f8ae866 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -158,7 +158,7 @@ timer_interrupt (int irq, void *dev_id)
 {
 	unsigned long new_itm;
 
-	if (unlikely(cpu_is_offline(smp_processor_id()))) {
+	if (unlikely(!cpu_active(smp_processor_id()))) {
 		return IRQ_HANDLED;
 	}
 
diff --git a/init/main.c b/init/main.c
index 8442094..c126d23 100644
--- a/init/main.c
+++ b/init/main.c
@@ -514,6 +514,7 @@ static void __init boot_cpu_init(void)
 	int cpu = smp_processor_id();
 	/* Mark the boot cpu "present", "online" etc for SMP and UP case */
 	set_cpu_online(cpu, true);
+	set_cpu_active(cpu, true);
 	set_cpu_present(cpu, true);
 	set_cpu_possible(cpu, true);
 }


* [APPLIED] [PATCH v2 1/2] Revert "[IA64] prevent ia64 from invoking irq
  2009-02-09 21:17     ` [PATCH v2 1/2] Revert "[IA64] prevent ia64 from invoking irq Alex Chiang
  (?)
  (?)
@ 2009-11-12 22:40     ` Tony Lindgren
  -1 siblings, 0 replies; 17+ messages in thread
From: Tony Lindgren @ 2009-11-12 22:40 UTC (permalink / raw)
  To: linux-omap

This patch has been applied to the linux-omap
by youw fwiendly patch wobot.

Branch in linux-omap: for-next

Initial commit ID (Likely to change): 742eb23e178bf582b4a35633fc5470cbe7785166

PatchWorks
http://patchwork.kernel.org/patch/6295/

Git (Likely to change, and takes a while to get mirrored)
http://git.kernel.org/?p=linux/kernel/git/tmlind/linux-omap-2.6.git;a=commit;h=742eb23e178bf582b4a35633fc5470cbe7785166




end of thread, other threads:[~2009-11-12 22:40 UTC | newest]

