linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH] linux-2.6.0-test2_mtrr-race-fix_A0
@ 2003-08-06  2:17 john stultz
  2003-08-06 17:02 ` Mark Haverkamp
  0 siblings, 1 reply; 9+ messages in thread
From: john stultz @ 2003-08-06  2:17 UTC (permalink / raw)
  To: lkml; +Cc: Alan Cox, Matt, Martin J. Bligh

All, 
	I've been noticing occasional (and with some kernels, not so
ocassional) hangs after the mtrr init call. Booting with
"initcall_debug" pointed the finger at init_bio(), but further
investigation showed that cpus were actually getting hung in the mtrr
ipi_handler, thus they could not respond to the smp_call_function made
deeper within init_bio(). 

I've found a race in the mtrr ipi_handler() and set_mtrr() functions.

Basically the server, set_mtrr() does the following:

1.1 initialize a flag
1.2 send ipi
1.3 waits for all cpus to check in
1.4 toggle flag
1.5 do stuff
1.6 wait for all cpus to check out
1.7 toggle flag again
1.8 return

While the clients, running ipi_handler() do the following:

2.1 check in
2.2 wait for flag
2.3 do stuff
2.4 check out
2.5 wait for flag
2.6 return

The problem is the flag is on the servers stack! So if 1.7 and 1.8
executes before 2.5 happens, the client's pointer to the flag becomes
invalid (likely overwritten) which causes the client to never see the
flag change, hanging the box.

I believe the solution to this is to just remove the needless
synchronization steps: 1.7 and 2.5. Once all the clients have checked
out, there is no reason to have them hang around only to return all at
once. The final flagging seems needless and racy. 

Amazingly this has been in the kernel for about a year, yet its not
bothered me until very recently. Go figure. 

Please review and comment on the following fix. Please let me know if
I'm just wrong and the final flagging is more needed then I think. 

thanks
-john

diff -Nru a/arch/i386/kernel/cpu/mtrr/main.c b/arch/i386/kernel/cpu/mtrr/main.c
--- a/arch/i386/kernel/cpu/mtrr/main.c	Tue Aug  5 18:45:17 2003
+++ b/arch/i386/kernel/cpu/mtrr/main.c	Tue Aug  5 18:45:17 2003
@@ -165,10 +165,6 @@
 		mtrr_if->set_all();
 
 	atomic_dec(&data->count);
-	while(atomic_read(&data->gate)) {
-		cpu_relax();
-		barrier();
-	}
 	local_irq_restore(flags);
 }
 
@@ -257,7 +253,6 @@
 		barrier();
 	}
 	local_irq_restore(flags);
-	atomic_set(&data.gate,0);
 }
 
 /**




^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [RFC][PATCH] linux-2.6.0-test2_mtrr-race-fix_A0
@ 2003-08-06  8:52 Mathias Fröhlich
  2003-08-06 17:15 ` john stultz
  0 siblings, 1 reply; 9+ messages in thread
From: Mathias Fröhlich @ 2003-08-06  8:52 UTC (permalink / raw)
  To: johnstul; +Cc: linux-kernel


Hi,

You should not remove the barrier past mtrr change. If you do that, it is 
possible that cpu's run with inconsistent mtrrs. This can have bad 
sideeffects since at least the cache snooping protocol used by intel uses 
assumptions about the cachability of memory regions. Those information about 
the cachability is also taken from the mtrrs as far as I remember.
This intel cpu developer manual, which documented the early PII and PPro 
chips, recommended this algorithm. Since actual intel cpus use the same old 
cpu to chipset bus protocol, this old documentation most propably still 
applies.

So the conclusion is that as far as you don't know the exact way all those SMP 
protocols between chipsets and CPUs with all the possible sideeffects very 
well, dont't change this behavour.

But I'm shure that fixes to the stack allocated variable problem are welcome 
:)

   Greetings

      Mathias Fröhlich

-- 
Mathias Fröhlich, email: Mathias.Froehlich@web.de
old email was: frohlich@na.uni-tuebingen.de


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2003-08-07  5:17 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-08-06  2:17 [RFC][PATCH] linux-2.6.0-test2_mtrr-race-fix_A0 john stultz
2003-08-06 17:02 ` Mark Haverkamp
2003-08-06  8:52 Mathias Fröhlich
2003-08-06 17:15 ` john stultz
2003-08-06 18:05   ` Mark Haverkamp
2003-08-06 18:02     ` john stultz
2003-08-06 18:31   ` Zwane Mwaikambo
2003-08-06 20:24     ` john stultz
2003-08-07  5:06       ` Zwane Mwaikambo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).