linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [RFC][PATCH] linux-2.6.0-test2_mtrr-race-fix_A0
@ 2003-08-06  8:52 Mathias Fröhlich
  2003-08-06 17:15 ` john stultz
  0 siblings, 1 reply; 10+ messages in thread
From: Mathias Fröhlich @ 2003-08-06  8:52 UTC (permalink / raw)
  To: johnstul; +Cc: linux-kernel


Hi,

You should not remove the barrier past mtrr change. If you do that, it is 
possible that cpu's run with inconsistent mtrrs. This can have bad 
sideeffects since at least the cache snooping protocol used by intel uses 
assumptions about the cachability of memory regions. Those information about 
the cachability is also taken from the mtrrs as far as I remember.
This intel cpu developer manual, which documented the early PII and PPro 
chips, recommended this algorithm. Since actual intel cpus use the same old 
cpu to chipset bus protocol, this old documentation most propably still 
applies.

So the conclusion is that as far as you don't know the exact way all those SMP 
protocols between chipsets and CPUs with all the possible sideeffects very 
well, dont't change this behavour.

But I'm shure that fixes to the stack allocated variable problem are welcome 
:)

   Greetings

      Mathias Fröhlich

-- 
Mathias Fröhlich, email: Mathias.Froehlich@web.de
old email was: frohlich@na.uni-tuebingen.de


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] linux-2.6.0-test2_mtrr-race-fix_A0
  2003-08-06  8:52 [RFC][PATCH] linux-2.6.0-test2_mtrr-race-fix_A0 Mathias Fröhlich
@ 2003-08-06 17:15 ` john stultz
  2003-08-06 18:05   ` Mark Haverkamp
  2003-08-06 18:31   ` [RFC][PATCH] linux-2.6.0-test2_mtrr-race-fix_A0 Zwane Mwaikambo
  0 siblings, 2 replies; 10+ messages in thread
From: john stultz @ 2003-08-06 17:15 UTC (permalink / raw)
  To: Mathias Fröhlich; +Cc: lkml

On Wed, 2003-08-06 at 01:52, Mathias Fröhlich wrote:
> Hi,
> 
> You should not remove the barrier past mtrr change. If you do that, it is 
> possible that cpu's run with inconsistent mtrrs. This can have bad 
> sideeffects since at least the cache snooping protocol used by intel uses 
> assumptions about the cachability of memory regions. Those information about 
> the cachability is also taken from the mtrrs as far as I remember.
> This intel cpu developer manual, which documented the early PII and PPro 
> chips, recommended this algorithm. Since actual intel cpus use the same old 
> cpu to chipset bus protocol, this old documentation most propably still 
> applies.

Hmm. I should dig up that doc. Its a little hazy in my mind, but I think
I understand your description. I'm glad you caught this, as I can't
imagine the subtle bugs that might have popped up. 

Well, let me look at it again and see if I can come up with a proper
fix. 

Thanks for the knowledgeable feedback and sanity checking!
-john


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] linux-2.6.0-test2_mtrr-race-fix_A0
  2003-08-06 18:05   ` Mark Haverkamp
@ 2003-08-06 18:02     ` john stultz
  2003-08-07 14:38       ` [PATCH] linux-2.6.0-test2_mtrr-race-fix_A0 Mark Haverkamp
  0 siblings, 1 reply; 10+ messages in thread
From: john stultz @ 2003-08-06 18:02 UTC (permalink / raw)
  To: Mark Haverkamp; +Cc: Mathias Fröhlich, lkml

On Wed, 2003-08-06 at 11:05, Mark Haverkamp wrote:
> On Wed, 2003-08-06 at 10:15, john stultz wrote:
> > 
> > Well, let me look at it again and see if I can come up with a proper
> > fix. 
> I added an extra sync up from the caller after the last gate change so
> it is the last one to touch the automatic data.

Ah, you beat me to it! 

I'm actually testing the very same change (comments differ a touch, but
that's ok).

Looks good. If everyone is happy I'd say resubmit it to Andrew.

thanks
-john




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] linux-2.6.0-test2_mtrr-race-fix_A0
  2003-08-06 17:15 ` john stultz
@ 2003-08-06 18:05   ` Mark Haverkamp
  2003-08-06 18:02     ` john stultz
  2003-08-06 18:31   ` [RFC][PATCH] linux-2.6.0-test2_mtrr-race-fix_A0 Zwane Mwaikambo
  1 sibling, 1 reply; 10+ messages in thread
From: Mark Haverkamp @ 2003-08-06 18:05 UTC (permalink / raw)
  To: john stultz; +Cc: Mathias Fröhlich, lkml

On Wed, 2003-08-06 at 10:15, john stultz wrote:
> On Wed, 2003-08-06 at 01:52, Mathias Fröhlich wrote:
> > Hi,
> > 
> > You should not remove the barrier past mtrr change. If you do that, it is 
> > possible that cpu's run with inconsistent mtrrs. This can have bad 
> > sideeffects since at least the cache snooping protocol used by intel uses 
> > assumptions about the cachability of memory regions. Those information about 
> > the cachability is also taken from the mtrrs as far as I remember.
> > This intel cpu developer manual, which documented the early PII and PPro 
> > chips, recommended this algorithm. Since actual intel cpus use the same old 
> > cpu to chipset bus protocol, this old documentation most propably still 
> > applies.
> 
> Hmm. I should dig up that doc. Its a little hazy in my mind, but I think
> I understand your description. I'm glad you caught this, as I can't
> imagine the subtle bugs that might have popped up. 
> 
> Well, let me look at it again and see if I can come up with a proper
> fix. 
> 
> Thanks for the knowledgeable feedback and sanity checking!
> -john


I added an extra sync up from the caller after the last gate change so
it is the last one to touch the automatic data.

Mark.


===== arch/i386/kernel/cpu/mtrr/main.c 1.29 vs edited =====
--- 1.29/arch/i386/kernel/cpu/mtrr/main.c	Tue Jul 15 10:08:48 2003
+++ edited/arch/i386/kernel/cpu/mtrr/main.c	Wed Aug  6 10:46:00 2003
@@ -169,6 +169,7 @@
 		cpu_relax();
 		barrier();
 	}
+	atomic_dec(&data->count);
 	local_irq_restore(flags);
 }
 
@@ -256,8 +257,18 @@
 		cpu_relax();
 		barrier();
 	}
-	local_irq_restore(flags);
+	atomic_set(&data.count, num_booting_cpus() - 1);
 	atomic_set(&data.gate,0);
+
+	/* 
+	 * Wait here for everyone to have seen the gate change
+	 * So we're the last ones to touch 'data'
+	 */
+	while(atomic_read(&data.count)) {
+		cpu_relax();
+		barrier();
+	}
+	local_irq_restore(flags);
 }
 
 /**


-- 
Mark Haverkamp <markh@osdl.org>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] linux-2.6.0-test2_mtrr-race-fix_A0
  2003-08-06 17:15 ` john stultz
  2003-08-06 18:05   ` Mark Haverkamp
@ 2003-08-06 18:31   ` Zwane Mwaikambo
  2003-08-06 20:24     ` john stultz
  1 sibling, 1 reply; 10+ messages in thread
From: Zwane Mwaikambo @ 2003-08-06 18:31 UTC (permalink / raw)
  To: john stultz; +Cc: Mathias Fröhlich, lkml, Mark Haverkamp, Dave Jones

On Wed, 6 Aug 2003, john stultz wrote:

> On Wed, 2003-08-06 at 01:52, Mathias Fröhlich wrote:

> > You should not remove the barrier past mtrr change. If you do that, it is 
> > possible that cpu's run with inconsistent mtrrs. This can have bad 
> > sideeffects since at least the cache snooping protocol used by intel uses 
> > assumptions about the cachability of memory regions. Those information about 
> > the cachability is also taken from the mtrrs as far as I remember.
> > This intel cpu developer manual, which documented the early PII and PPro 
> > chips, recommended this algorithm. Since actual intel cpus use the same old 
> > cpu to chipset bus protocol, this old documentation most propably still 
> > applies.
> 
> Hmm. I should dig up that doc. Its a little hazy in my mind, but I think
> I understand your description. I'm glad you caught this, as I can't
> imagine the subtle bugs that might have popped up. 
> 
> Well, let me look at it again and see if I can come up with a proper
> fix. 

The intel manual (10-36 Memory Cache Control - vol3) actually recommends 
the following procedure I was a bit anal about explicitely setting and 
clearing flags and used the specific TLB flush via cr3->reg->cr3. John 
could you give this a spin on your afflicted systems?

Index: linux-2.5/arch/i386/kernel/cpu/mtrr/generic.c
===================================================================
RCS file: /home/cvs/linux-2.5/arch/i386/kernel/cpu/mtrr/generic.c,v
retrieving revision 1.8
diff -u -p -B -r1.8 generic.c
--- linux-2.5/arch/i386/kernel/cpu/mtrr/generic.c	15 Feb 2003 20:30:00 -0000	1.8
+++ linux-2.5/arch/i386/kernel/cpu/mtrr/generic.c	6 Aug 2003 17:05:59 -0000
@@ -8,6 +8,7 @@
 #include <asm/msr.h>
 #include <asm/system.h>
 #include <asm/cpufeature.h>
+#include <asm/tlbflush.h>
 #include "mtrr.h"
 
 struct mtrr_state {
@@ -241,19 +242,22 @@ static void prepare_set(void)
 	   more invasive changes to the way the kernel boots  */
 	spin_lock(&set_atomicity_lock);
 
+	/*  Enter the no-fill (CD=1, NW=0) cache mode and flush caches. */
+	cr0 = read_cr0() | 0x40000000;	/* set CD flag */
+	cr0 &= ~(0x20000000);		/* clear NW flag */
+	wbinvd();
+	write_cr0(cr0);
+	wbinvd();
+
 	/*  Save value of CR4 and clear Page Global Enable (bit 7)  */
 	if ( cpu_has_pge ) {
 		cr4 = read_cr4();
 		write_cr4(cr4 & (unsigned char) ~(1 << 7));
 	}
 
-	/*  Disable and flush caches. Note that wbinvd flushes the TLBs as
-	    a side-effect  */
-	cr0 = read_cr0() | 0x40000000;
-	wbinvd();
-	write_cr0(cr0);
-	wbinvd();
-
+	/* Flush all TLBs via a mov %cr3, %reg; mov %reg, %cr3 */
+	__flush_tlb();
+	
 	/*  Save MTRR state */
 	rdmsr(MTRRdefType_MSR, deftype_lo, deftype_hi);
 
@@ -263,14 +267,18 @@ static void prepare_set(void)
 
 static void post_set(void)
 {
+	unsigned long cr0;
+
 	/*  Flush caches and TLBs  */
 	wbinvd();
+	__flush_tlb();
 
 	/* Intel (P6) standard MTRRs */
 	wrmsr(MTRRdefType_MSR, deftype_lo, deftype_hi);
 		
-	/*  Enable caches  */
-	write_cr0(read_cr0() & 0xbfffffff);
+	/*  Enable caches, clear previously set CD flag */
+	cr0 = read_cr0() & ~(0x40000000);
+	write_cr0(cr0);
 
 	/*  Restore value of CR4  */
 	if ( cpu_has_pge )

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] linux-2.6.0-test2_mtrr-race-fix_A0
  2003-08-06 18:31   ` [RFC][PATCH] linux-2.6.0-test2_mtrr-race-fix_A0 Zwane Mwaikambo
@ 2003-08-06 20:24     ` john stultz
  2003-08-07  5:06       ` Zwane Mwaikambo
  0 siblings, 1 reply; 10+ messages in thread
From: john stultz @ 2003-08-06 20:24 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: Mathias Fröhlich, lkml, Mark Haverkamp, Dave Jones

On Wed, 2003-08-06 at 11:31, Zwane Mwaikambo wrote:
> The intel manual (10-36 Memory Cache Control - vol3) actually recommends 
> the following procedure I was a bit anal about explicitely setting and 
> clearing flags and used the specific TLB flush via cr3->reg->cr3. John 
> could you give this a spin on your afflicted systems?

I'm not quite sure I follow. Really I've never looked at the mtrr code
before, so forgive my daftness.  I just found a deadlock that was
locking my box caused by the synchronization between ipi_handler() and
set_mtrr() (which Mark's patch seems to properly fix). 

How does this change affect the deadlock? Is this just a separate issue?
thanks
-john



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] linux-2.6.0-test2_mtrr-race-fix_A0
  2003-08-06 20:24     ` john stultz
@ 2003-08-07  5:06       ` Zwane Mwaikambo
  0 siblings, 0 replies; 10+ messages in thread
From: Zwane Mwaikambo @ 2003-08-07  5:06 UTC (permalink / raw)
  To: john stultz; +Cc: Mathias Fröhlich, lkml, Mark Haverkamp, Dave Jones

On Wed, 6 Aug 2003, john stultz wrote:

> On Wed, 2003-08-06 at 11:31, Zwane Mwaikambo wrote:
> > The intel manual (10-36 Memory Cache Control - vol3) actually recommends 
> > the following procedure I was a bit anal about explicitely setting and 
> > clearing flags and used the specific TLB flush via cr3->reg->cr3. John 
> > could you give this a spin on your afflicted systems?
> 
> I'm not quite sure I follow. Really I've never looked at the mtrr code
> before, so forgive my daftness.  I just found a deadlock that was
> locking my box caused by the synchronization between ipi_handler() and
> set_mtrr() (which Mark's patch seems to properly fix). 
> 
> How does this change affect the deadlock? Is this just a separate issue?
> thanks

Mostly seperate issue, but seeing as we where there, there is a slight 
deviation from recommended procedure. Sorry about that, you may ignore it.

Thanks
	Zwane
-- 
function.linuxpower.ca

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] linux-2.6.0-test2_mtrr-race-fix_A0
  2003-08-06 18:02     ` john stultz
@ 2003-08-07 14:38       ` Mark Haverkamp
  0 siblings, 0 replies; 10+ messages in thread
From: Mark Haverkamp @ 2003-08-07 14:38 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, john stultz

On Wed, 2003-08-06 at 11:02, john stultz wrote:
> On Wed, 2003-08-06 at 11:05, Mark Haverkamp wrote:
> > On Wed, 2003-08-06 at 10:15, john stultz wrote:
> > > 
> > > Well, let me look at it again and see if I can come up with a proper
> > > fix. 
> > I added an extra sync up from the caller after the last gate change so
> > it is the last one to touch the automatic data.
> 
> Ah, you beat me to it! 
> 
> I'm actually testing the very same change (comments differ a touch, but
> that's ok).
> 
> Looks good. If everyone is happy I'd say resubmit it to Andrew.
> 
> thanks
> -john

I'd like to submit this patch for inclusion.  It adds an extra sync up
in set_mtrr and ipi_handler to make sure that set_mtrr is the last to
touch the automatic set_mtrr_data.


===== arch/i386/kernel/cpu/mtrr/main.c 1.29 vs edited =====
--- 1.29/arch/i386/kernel/cpu/mtrr/main.c	Tue Jul 15 10:08:48 2003
+++ edited/arch/i386/kernel/cpu/mtrr/main.c	Wed Aug  6 10:46:00 2003
@@ -169,6 +169,7 @@
 		cpu_relax();
 		barrier();
 	}
+	atomic_dec(&data->count);
 	local_irq_restore(flags);
 }
 
@@ -256,8 +257,18 @@
 		cpu_relax();
 		barrier();
 	}
-	local_irq_restore(flags);
+	atomic_set(&data.count, num_booting_cpus() - 1);
 	atomic_set(&data.gate,0);
+
+	/* 
+	 * Wait here for everyone to have seen the gate change
+	 * So we're the last ones to touch 'data'
+	 */
+	while(atomic_read(&data.count)) {
+		cpu_relax();
+		barrier();
+	}
+	local_irq_restore(flags);
 }
 
 /**


-- 
Mark Haverkamp <markh@osdl.org>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] linux-2.6.0-test2_mtrr-race-fix_A0
  2003-08-06  2:17 john stultz
@ 2003-08-06 17:02 ` Mark Haverkamp
  0 siblings, 0 replies; 10+ messages in thread
From: Mark Haverkamp @ 2003-08-06 17:02 UTC (permalink / raw)
  To: john stultz; +Cc: lkml, Alan Cox, Matt, Martin J. Bligh

On Tue, 2003-08-05 at 19:17, john stultz wrote:
> All, 


> 
> Amazingly this has been in the kernel for about a year, yet its not
> bothered me until very recently. Go figure. 

I started seeing this on an 8-way after the mtrr_init function was
changed from core_initcall to subsys_initcall.

> 
> Please review and comment on the following fix. Please let me know if
> I'm just wrong and the final flagging is more needed then I think. 

I applied this patch and so far I have been able to boot every time
without a hang on my 8-way.

Mark.


> 
> thanks
> -john

-- 
Mark Haverkamp <markh@osdl.org>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC][PATCH] linux-2.6.0-test2_mtrr-race-fix_A0
@ 2003-08-06  2:17 john stultz
  2003-08-06 17:02 ` Mark Haverkamp
  0 siblings, 1 reply; 10+ messages in thread
From: john stultz @ 2003-08-06  2:17 UTC (permalink / raw)
  To: lkml; +Cc: Alan Cox, Matt, Martin J. Bligh

All, 
	I've been noticing occasional (and with some kernels, not so
ocassional) hangs after the mtrr init call. Booting with
"initcall_debug" pointed the finger at init_bio(), but further
investigation showed that cpus were actually getting hung in the mtrr
ipi_handler, thus they could not respond to the smp_call_function made
deeper within init_bio(). 

I've found a race in the mtrr ipi_handler() and set_mtrr() functions.

Basically the server, set_mtrr() does the following:

1.1 initialize a flag
1.2 send ipi
1.3 waits for all cpus to check in
1.4 toggle flag
1.5 do stuff
1.6 wait for all cpus to check out
1.7 toggle flag again
1.8 return

While the clients, running ipi_handler() do the following:

2.1 check in
2.2 wait for flag
2.3 do stuff
2.4 check out
2.5 wait for flag
2.6 return

The problem is the flag is on the servers stack! So if 1.7 and 1.8
executes before 2.5 happens, the client's pointer to the flag becomes
invalid (likely overwritten) which causes the client to never see the
flag change, hanging the box.

I believe the solution to this is to just remove the needless
synchronization steps: 1.7 and 2.5. Once all the clients have checked
out, there is no reason to have them hang around only to return all at
once. The final flagging seems needless and racy. 

Amazingly this has been in the kernel for about a year, yet its not
bothered me until very recently. Go figure. 

Please review and comment on the following fix. Please let me know if
I'm just wrong and the final flagging is more needed then I think. 

thanks
-john

diff -Nru a/arch/i386/kernel/cpu/mtrr/main.c b/arch/i386/kernel/cpu/mtrr/main.c
--- a/arch/i386/kernel/cpu/mtrr/main.c	Tue Aug  5 18:45:17 2003
+++ b/arch/i386/kernel/cpu/mtrr/main.c	Tue Aug  5 18:45:17 2003
@@ -165,10 +165,6 @@
 		mtrr_if->set_all();
 
 	atomic_dec(&data->count);
-	while(atomic_read(&data->gate)) {
-		cpu_relax();
-		barrier();
-	}
 	local_irq_restore(flags);
 }
 
@@ -257,7 +253,6 @@
 		barrier();
 	}
 	local_irq_restore(flags);
-	atomic_set(&data.gate,0);
 }
 
 /**




^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2003-08-07 14:39 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-08-06  8:52 [RFC][PATCH] linux-2.6.0-test2_mtrr-race-fix_A0 Mathias Fröhlich
2003-08-06 17:15 ` john stultz
2003-08-06 18:05   ` Mark Haverkamp
2003-08-06 18:02     ` john stultz
2003-08-07 14:38       ` [PATCH] linux-2.6.0-test2_mtrr-race-fix_A0 Mark Haverkamp
2003-08-06 18:31   ` [RFC][PATCH] linux-2.6.0-test2_mtrr-race-fix_A0 Zwane Mwaikambo
2003-08-06 20:24     ` john stultz
2003-08-07  5:06       ` Zwane Mwaikambo
  -- strict thread matches above, loose matches on Subject: below --
2003-08-06  2:17 john stultz
2003-08-06 17:02 ` Mark Haverkamp

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).