linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH]broadcast IPI race condition on CPU hotplug
@ 2005-04-26  2:20 Li Shaohua
  2005-04-26  4:25 ` Zwane Mwaikambo
  2005-04-26 13:21 ` Andi Kleen
  0 siblings, 2 replies; 10+ messages in thread
From: Li Shaohua @ 2005-04-26  2:20 UTC (permalink / raw)
  To: lkml; +Cc: Andrew Morton, Andi Kleen, Zwane Mwaikambo

Hi,
After a CPU is booted but before it's officially up (set online map, and
enable interrupt), the CPU possibly will receive a broadcast IPI. After
it's up, it will handle the stale interrupt soon and maybe cause oops if
it's a smp-call-function-interrupt. This is quite possible in CPU
hotplug case, but nearly can't occur at boot time. Below patch replaces
broadcast IPI with send_ipi_mask just like the cluster mode.

Thanks,
Shaohua

Signed-off-by: Li Shaohua<shaohua.li@intel.com>

--- a/include/asm-i386/mach-default/mach_ipi.h	2005-04-26 09:07:51.695390216 +0800
+++ b/include/asm-i386/mach-default/mach_ipi.h	2005-04-26 09:26:59.235937536 +0800
@@ -11,20 +11,16 @@ static inline void send_IPI_mask(cpumask
 
 static inline void send_IPI_allbutself(int vector)
 {
-	/*
-	 * if there are no other CPUs in the system then we get an APIC send 
-	 * error if we try to broadcast, thus avoid sending IPIs in this case.
-	 */
-	if (!(num_online_cpus() > 1))
-		return;
+	cpumask_t mask = cpu_online_map;
+	cpu_clear(smp_processor_id(), mask);
 
-	__send_IPI_shortcut(APIC_DEST_ALLBUT, vector);
-	return;
+	if (likely(!cpus_empty(mask)))
+		send_IPI_mask(mask, vector);
 }
 
 static inline void send_IPI_all(int vector)
 {
-	__send_IPI_shortcut(APIC_DEST_ALLINC, vector);
+	send_IPI_mask(cpu_online_map, vector);
 }
 
 #endif /* __ASM_MACH_IPI_H */
--- a/arch/x86_64/kernel/genapic_flat.c	2005-03-02 15:38:09.000000000 +0800
+++ b/arch/x86_64/kernel/genapic_flat.c	2005-04-26 09:26:09.298529176 +0800
@@ -45,20 +45,19 @@ static void flat_init_apic_ldr(void)
 	apic_write_around(APIC_LDR, val);
 }
 
+static void flat_send_IPI_mask(cpumask_t cpumask, int vector);
 static void flat_send_IPI_allbutself(int vector)
 {
-	/*
-	 * if there are no other CPUs in the system then
-	 * we get an APIC send error if we try to broadcast.
-	 * thus we have to avoid sending IPIs in this case.
-	 */
-	if (num_online_cpus() > 1)
-		__send_IPI_shortcut(APIC_DEST_ALLBUT, vector, APIC_DEST_LOGICAL);
+	cpumask_t mask = cpu_online_map;
+	cpu_clear(smp_processor_id(), mask);
+
+	if (likely(!cpus_empty(mask)))
+		flat_send_IPI_mask(mask, vector);
 }
 
 static void flat_send_IPI_all(int vector)
 {
-	__send_IPI_shortcut(APIC_DEST_ALLINC, vector, APIC_DEST_LOGICAL);
+	flat_send_IPI_mask(cpu_online_map, vector);
 }
 
 static void flat_send_IPI_mask(cpumask_t cpumask, int vector)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH]broadcast IPI race condition on CPU hotplug
  2005-04-26  4:25 ` Zwane Mwaikambo
@ 2005-04-26  4:24   ` Li Shaohua
  2005-04-26  4:47     ` Zwane Mwaikambo
  0 siblings, 1 reply; 10+ messages in thread
From: Li Shaohua @ 2005-04-26  4:24 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: lkml, Andrew Morton, Andi Kleen

Hi,
On Tue, 2005-04-26 at 12:25, Zwane Mwaikambo wrote:
> 
> > After a CPU is booted but before it's officially up (set online map, and
> > enable interrupt), the CPU possibly will receive a broadcast IPI. After
> > it's up, it will handle the stale interrupt soon and maybe cause oops if
> > it's a smp-call-function-interrupt. This is quite possible in CPU
> > hotplug case, but nearly can't occur at boot time. Below patch replaces
> > broadcast IPI with send_ipi_mask just like the cluster mode.
> 
> Ok, but isn't it sufficient to use send_ipi_mask in smp_call_function 
> instead?
I'm not sure if other routines using broadcast IPI have this bug. Fixing
the send_ipi_all API looks more generic. Is there any reason we should
use broadcast IPI?

Thanks,
Shaohua


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH]broadcast IPI race condition on CPU hotplug
  2005-04-26  2:20 [PATCH]broadcast IPI race condition on CPU hotplug Li Shaohua
@ 2005-04-26  4:25 ` Zwane Mwaikambo
  2005-04-26  4:24   ` Li Shaohua
  2005-04-26 13:21 ` Andi Kleen
  1 sibling, 1 reply; 10+ messages in thread
From: Zwane Mwaikambo @ 2005-04-26  4:25 UTC (permalink / raw)
  To: Li Shaohua; +Cc: lkml, Andrew Morton, Andi Kleen

Hi Shaohua,

On Tue, 26 Apr 2005, Li Shaohua wrote:

> After a CPU is booted but before it's officially up (set online map, and
> enable interrupt), the CPU possibly will receive a broadcast IPI. After
> it's up, it will handle the stale interrupt soon and maybe cause oops if
> it's a smp-call-function-interrupt. This is quite possible in CPU
> hotplug case, but nearly can't occur at boot time. Below patch replaces
> broadcast IPI with send_ipi_mask just like the cluster mode.

Ok, but isn't it sufficient to use send_ipi_mask in smp_call_function 
instead?

Thanks,
	Zwane

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH]broadcast IPI race condition on CPU hotplug
  2005-04-26  4:24   ` Li Shaohua
@ 2005-04-26  4:47     ` Zwane Mwaikambo
  2005-04-26  5:17       ` Li Shaohua
  0 siblings, 1 reply; 10+ messages in thread
From: Zwane Mwaikambo @ 2005-04-26  4:47 UTC (permalink / raw)
  To: Li Shaohua; +Cc: lkml, Andrew Morton, Andi Kleen

On Tue, 26 Apr 2005, Li Shaohua wrote:

> On Tue, 2005-04-26 at 12:25, Zwane Mwaikambo wrote:
> > 
> > > After a CPU is booted but before it's officially up (set online map, and
> > > enable interrupt), the CPU possibly will receive a broadcast IPI. After
> > > it's up, it will handle the stale interrupt soon and maybe cause oops if
> > > it's a smp-call-function-interrupt. This is quite possible in CPU
> > > hotplug case, but nearly can't occur at boot time. Below patch replaces
> > > broadcast IPI with send_ipi_mask just like the cluster mode.
> > 
> > Ok, but isn't it sufficient to use send_ipi_mask in smp_call_function 
> > instead?
> I'm not sure if other routines using broadcast IPI have this bug. Fixing
> the send_ipi_all API looks more generic. Is there any reason we should
> use broadcast IPI?

I'd prefer only touching smp_call_function because the others are 
primitives, we should only be fixing the users of the primitives, 
otherwise we'll end up with code which won't be as versatile.

Thanks,
	Zwane

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH]broadcast IPI race condition on CPU hotplug
  2005-04-26  4:47     ` Zwane Mwaikambo
@ 2005-04-26  5:17       ` Li Shaohua
  0 siblings, 0 replies; 10+ messages in thread
From: Li Shaohua @ 2005-04-26  5:17 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: lkml, Andrew Morton, Andi Kleen

Hi,
On Tue, 2005-04-26 at 12:47, Zwane Mwaikambo wrote:
> > > > After a CPU is booted but before it's officially up (set online map, and
> > > > enable interrupt), the CPU possibly will receive a broadcast IPI. After
> > > > it's up, it will handle the stale interrupt soon and maybe cause oops if
> > > > it's a smp-call-function-interrupt. This is quite possible in CPU
> > > > hotplug case, but nearly can't occur at boot time. Below patch replaces
> > > > broadcast IPI with send_ipi_mask just like the cluster mode.
> > > 
> > > Ok, but isn't it sufficient to use send_ipi_mask in smp_call_function 
> > > instead?
> > I'm not sure if other routines using broadcast IPI have this bug. Fixing
> > the send_ipi_all API looks more generic. Is there any reason we should
> > use broadcast IPI?
> 
> I'd prefer only touching smp_call_function because the others are 
> primitives, we should only be fixing the users of the primitives, 
> otherwise we'll end up with code which won't be as versatile.
Ok, here it it. It's against -mm tree.

Thanks,
Shaohua

Signed-off-by: Li Shaohua<shaohua.li@intel.com>

--- a/arch/i386/kernel/smp.c	2005-04-26 08:47:08.762344760 +0800
+++ b/arch/i386/kernel/smp.c	2005-04-26 13:05:44.558585200 +0800
@@ -527,6 +527,7 @@ int smp_call_function (void (*func) (voi
 {
 	struct call_data_struct data;
 	int cpus;
+	cpumask_t mask;
 
 	/* Holding any lock stops cpus from going down. */
 	spin_lock(&call_lock);
@@ -536,6 +537,8 @@ int smp_call_function (void (*func) (voi
 		spin_unlock(&call_lock);
 		return 0;
 	}
+	mask = cpu_online_map;
+	cpu_clear(smp_processor_id(), mask);
 
 	/* Can deadlock when called with interrupts disabled */
 	WARN_ON(irqs_disabled());
@@ -551,7 +554,7 @@ int smp_call_function (void (*func) (voi
 	mb();
 	
 	/* Send a message to all other CPUs and wait for them to respond */
-	send_IPI_allbutself(CALL_FUNCTION_VECTOR);
+	send_IPI_mask(mask, CALL_FUNCTION_VECTOR);
 
 	/* Wait for response */
 	while (atomic_read(&data.started) != cpus)
--- a/arch/x86_64/kernel/smp.c	2005-04-12 10:12:16.000000000 +0800
+++ b/arch/x86_64/kernel/smp.c	2005-04-26 13:07:26.715055056 +0800
@@ -304,10 +304,12 @@ static void __smp_call_function (void (*
 {
 	struct call_data_struct data;
 	int cpus = num_online_cpus()-1;
+	cpumask_t mask = cpu_online_map;
 
 	if (!cpus)
 		return;
 
+	cpu_clear(smp_processor_id(), mask);
 	data.func = func;
 	data.info = info;
 	atomic_set(&data.started, 0);
@@ -318,7 +320,7 @@ static void __smp_call_function (void (*
 	call_data = &data;
 	wmb();
 	/* Send a message to all other CPUs and wait for them to respond */
-	send_IPI_allbutself(CALL_FUNCTION_VECTOR);
+	send_IPI_mask(mask, CALL_FUNCTION_VECTOR);
 
 	/* Wait for response */
 	while (atomic_read(&data.started) != cpus)



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH]broadcast IPI race condition on CPU hotplug
  2005-04-26  2:20 [PATCH]broadcast IPI race condition on CPU hotplug Li Shaohua
  2005-04-26  4:25 ` Zwane Mwaikambo
@ 2005-04-26 13:21 ` Andi Kleen
  2005-04-27  1:00   ` Zwane Mwaikambo
  2005-04-27  1:11   ` Li Shaohua
  1 sibling, 2 replies; 10+ messages in thread
From: Andi Kleen @ 2005-04-26 13:21 UTC (permalink / raw)
  To: Li Shaohua; +Cc: lkml, Andrew Morton, Andi Kleen, Zwane Mwaikambo

On Tue, Apr 26, 2005 at 10:20:44AM +0800, Li Shaohua wrote:
> Hi,
> After a CPU is booted but before it's officially up (set online map, and
> enable interrupt), the CPU possibly will receive a broadcast IPI. After
> it's up, it will handle the stale interrupt soon and maybe cause oops if
> it's a smp-call-function-interrupt. This is quite possible in CPU
> hotplug case, but nearly can't occur at boot time. Below patch replaces
> broadcast IPI with send_ipi_mask just like the cluster mode.

No way we are making this common operation much slower just
to fix an obscure race at boot time. PLease come up with a fix
that only impacts the boot process.

Thanks,
-Andi


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH]broadcast IPI race condition on CPU hotplug
  2005-04-26 13:21 ` Andi Kleen
@ 2005-04-27  1:00   ` Zwane Mwaikambo
  2005-04-27  1:11   ` Li Shaohua
  1 sibling, 0 replies; 10+ messages in thread
From: Zwane Mwaikambo @ 2005-04-27  1:00 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Li Shaohua, lkml, Andrew Morton

On Tue, 26 Apr 2005, Andi Kleen wrote:

> On Tue, Apr 26, 2005 at 10:20:44AM +0800, Li Shaohua wrote:
> > Hi,
> > After a CPU is booted but before it's officially up (set online map, and
> > enable interrupt), the CPU possibly will receive a broadcast IPI. After
> > it's up, it will handle the stale interrupt soon and maybe cause oops if
> > it's a smp-call-function-interrupt. This is quite possible in CPU
> > hotplug case, but nearly can't occur at boot time. Below patch replaces
> > broadcast IPI with send_ipi_mask just like the cluster mode.
> 
> No way we are making this common operation much slower just
> to fix an obscure race at boot time. PLease come up with a fix
> that only impacts the boot process.

The fix only for smp_call_function seems fine to me since we really don't 
want to broadcast to all processors but only online ones. 
smp_call_function is particularly dangerous because the interrupt handler 
accesses stack data which wouldn't be persistent due to the calling 
processor not waiting for non online processors.

Thanks,
	Zwane


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH]broadcast IPI race condition on CPU hotplug
  2005-04-26 13:21 ` Andi Kleen
  2005-04-27  1:00   ` Zwane Mwaikambo
@ 2005-04-27  1:11   ` Li Shaohua
  2005-04-27 12:56     ` Andi Kleen
  1 sibling, 1 reply; 10+ messages in thread
From: Li Shaohua @ 2005-04-27  1:11 UTC (permalink / raw)
  To: Andi Kleen; +Cc: lkml, Andrew Morton, Zwane Mwaikambo

On Tue, 2005-04-26 at 21:21, Andi Kleen wrote:
> On Tue, Apr 26, 2005 at 10:20:44AM +0800, Li Shaohua wrote:
> > Hi,
> > After a CPU is booted but before it's officially up (set online map, and
> > enable interrupt), the CPU possibly will receive a broadcast IPI. After
> > it's up, it will handle the stale interrupt soon and maybe cause oops if
> > it's a smp-call-function-interrupt. This is quite possible in CPU
> > hotplug case, but nearly can't occur at boot time. Below patch replaces
> > broadcast IPI with send_ipi_mask just like the cluster mode.
> 
> No way we are making this common operation much slower just
> to fix an obscure race at boot time. PLease come up with a fix
> that only impacts the boot process.
We can't prevent a CPU to receive a broadcast interrupt. Ack the
interrupt and mark the cpu online can't be atomic operation, so the CPU
either receives unexpected interrupt or loses interrupt.

Thanks,
Shaohua


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH]broadcast IPI race condition on CPU hotplug
  2005-04-27  1:11   ` Li Shaohua
@ 2005-04-27 12:56     ` Andi Kleen
  2005-04-28  3:15       ` Li Shaohua
  0 siblings, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2005-04-27 12:56 UTC (permalink / raw)
  To: Li Shaohua; +Cc: Andi Kleen, lkml, Andrew Morton, Zwane Mwaikambo

On Wed, Apr 27, 2005 at 09:11:59AM +0800, Li Shaohua wrote:
> On Tue, 2005-04-26 at 21:21, Andi Kleen wrote:
> > On Tue, Apr 26, 2005 at 10:20:44AM +0800, Li Shaohua wrote:
> > > Hi,
> > > After a CPU is booted but before it's officially up (set online map, and
> > > enable interrupt), the CPU possibly will receive a broadcast IPI. After
> > > it's up, it will handle the stale interrupt soon and maybe cause oops if
> > > it's a smp-call-function-interrupt. This is quite possible in CPU
> > > hotplug case, but nearly can't occur at boot time. Below patch replaces
> > > broadcast IPI with send_ipi_mask just like the cluster mode.
> > 
> > No way we are making this common operation much slower just
> > to fix an obscure race at boot time. PLease come up with a fix
> > that only impacts the boot process.
> We can't prevent a CPU to receive a broadcast interrupt. Ack the
> interrupt and mark the cpu online can't be atomic operation, so the CPU
> either receives unexpected interrupt or loses interrupt.

Cant you just check at the end of the CPU bootup if the CPU
got such an APIC interrupt and ack it then? 

-Andi

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH]broadcast IPI race condition on CPU hotplug
  2005-04-27 12:56     ` Andi Kleen
@ 2005-04-28  3:15       ` Li Shaohua
  0 siblings, 0 replies; 10+ messages in thread
From: Li Shaohua @ 2005-04-28  3:15 UTC (permalink / raw)
  To: Andi Kleen; +Cc: lkml, Andrew Morton, Zwane Mwaikambo

On Wed, 2005-04-27 at 20:56, Andi Kleen wrote:
> > > No way we are making this common operation much slower just
> > > to fix an obscure race at boot time. PLease come up with a fix
> > > that only impacts the boot process.
> > We can't prevent a CPU to receive a broadcast interrupt. Ack the
> > interrupt and mark the cpu online can't be atomic operation, so the CPU
> > either receives unexpected interrupt or loses interrupt.
> 
> Cant you just check at the end of the CPU bootup if the CPU
> got such an APIC interrupt and ack it then? 
Ok, There are two solutions:
1. boot sequence hold the 'call_lock' before LAPIC is initialized. So no
broadcast IPI will be received. But you possibly think the lock is hold
too long time.
2. Hold the lock later.
The boot sequence does:
a.hold 'call_lock' (prevent upcoming IPI)
b.enable interrupt (let stale IPI get handled)
c.set cpu online
d.unlock the lock.
The smp_call_function_interrupt does:
if (cpu is offline)
	ack the interrupt and return
But this approach will add check in smp_call_function_interrupt and
enable interrupt before set online map. Don't know if it's an issue.

Seems no other approaches. The ISR and IRR registers are read only.
BTW, send_ipi_mask_bitmask sends a CPU group one time, is it really much
slower than broadcast IPI? I guess we can ignore the overhead, since the
overhead is lighter against the call function interrupt handler.

Thanks,
Shaohua


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2005-04-28  3:18 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-04-26  2:20 [PATCH]broadcast IPI race condition on CPU hotplug Li Shaohua
2005-04-26  4:25 ` Zwane Mwaikambo
2005-04-26  4:24   ` Li Shaohua
2005-04-26  4:47     ` Zwane Mwaikambo
2005-04-26  5:17       ` Li Shaohua
2005-04-26 13:21 ` Andi Kleen
2005-04-27  1:00   ` Zwane Mwaikambo
2005-04-27  1:11   ` Li Shaohua
2005-04-27 12:56     ` Andi Kleen
2005-04-28  3:15       ` Li Shaohua

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).