* 2.6.12-mm1 boot failure on NUMA box.
@ 2005-06-21  5:07 Martin J. Bligh
  2005-06-21  6:29 ` Andrew Morton
  0 siblings, 1 reply; 20+ messages in thread
From: Martin J. Bligh @ 2005-06-21  5:07 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, Ingo Molnar

OK, after fixing the build failure with Andy's patch here:

http://mbligh.org/abat/apw_pci_assign_unassigned_resources

I get a boot failure on the NUMA-Q box. Full log is here:

http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/6184/debug/console.log

But at the end it prints out lots of weird scheduler stuff, then one more
message, then dies:

| migration cost matrix (max_cache_size: 2097152, cpu: 700 MHz):
---------------------
          [00]    [01]    [02]    [03]    [04]    [05]    [06]    [07]    [08]    [09]    [10]    [11]    [12]    [13]    [14]    [15]
[00]:     -    12.0(0) 12.0(0) 12.0(0) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1)
[01]:  12.0(0)    -    12.0(0) 12.0(0) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1)
[02]:  12.0(0) 12.0(0)    -    12.0(0) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1)
[03]:  12.0(0) 12.0(0) 12.0(0)    -    466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1)
[04]:  466.0(1) 466.0(1) 466.0(1) 466.0(1)    -    12.0(0) 12.0(0) 12.0(0) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1)
[05]:  466.0(1) 466.0(1) 466.0(1) 466.0(1) 12.0(0)    -    12.0(0) 12.0(0) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1)
[06]:  466.0(1) 466.0(1) 466.0(1) 466.0(1) 12.0(0) 12.0(0)    -    12.0(0) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1)
[07]:  466.0(1) 466.0(1) 466.0(1) 466.0(1) 12.0(0) 12.0(0) 12.0(0)    -    466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1)
[08]:  466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1)    -    12.0(0) 12.0(0) 12.0(0) 466.0(1) 466.0(1) 466.0(1) 466.0(1)
[09]:  466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 12.0(0)    -    12.0(0) 12.0(0) 466.0(1) 466.0(1) 466.0(1) 466.0(1)
[10]:  466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 12.0(0) 12.0(0)    -    12.0(0) 466.0(1) 466.0(1) 466.0(1) 466.0(1)
[11]:  466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 12.0(0) 12.0(0) 12.0(0)    -    466.0(1) 466.0(1) 466.0(1) 466.0(1)
[12]:  466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1)    -    12.0(0) 12.0(0) 12.0(0)
[13]:  466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 12.0(0)    -    12.0(0) 12.0(0)
[14]:  466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 12.0(0) 12.0(0)    -    12.0(0)
[15]:  466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 466.0(1) 12.0(0) 12.0(0) 12.0(0)    -   
--------------------------------
| cacheflush times [2]: 12.0 (12000000) 466.0 (466000000)
| calibration delay: 29 seconds
--------------------------------
NET: Registered protocol family 16


I guess I'll try backing out the scheduler patches unless someone else 
has a brighter idea?

M.



* Re: 2.6.12-mm1 boot failure on NUMA box.
  2005-06-21  5:07 2.6.12-mm1 boot failure on NUMA box Martin J. Bligh
@ 2005-06-21  6:29 ` Andrew Morton
  2005-06-21 14:22   ` Martin J. Bligh
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2005-06-21  6:29 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel, mingo

"Martin J. Bligh" <mbligh@mbligh.org> wrote:
>
> OK, after fixing the build failure with Andy's patch here:
> 
> http://mbligh.org/abat/apw_pci_assign_unassigned_resources

yup, I have that now.

> I get a boot failure on the NUMA-Q box. Full log is here:
> 
> http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/6184/debug/console.log
> 
> But at the end it prints out lots of weird scheduler stuff, then one more
> message, then dies:
> 
> | migration cost matrix (max_cache_size: 2097152, cpu: 700 MHz):
> ---------------------

That's Ingo debug stuff.

> --------------------------------
> NET: Registered protocol family 16
> 

Well it got up to core_initcall(netlink_proto_init);

> 
> I guess I'll try backing out the scheduler patches unless someone else 
> has a brighter idea?

It doesn't look like a scheduler thing.  Tried enabling initcall_debug?


* Re: 2.6.12-mm1 boot failure on NUMA box.
  2005-06-21  6:29 ` Andrew Morton
@ 2005-06-21 14:22   ` Martin J. Bligh
  2005-06-21 20:03     ` Andrew Morton
  0 siblings, 1 reply; 20+ messages in thread
From: Martin J. Bligh @ 2005-06-21 14:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, mingo

>> OK, after fixing the build failure with Andy's patch here:
>> 
>> http://mbligh.org/abat/apw_pci_assign_unassigned_resources
> 
> yup, I have that now.

Sweet, thanks.
 
>> I get a boot failure on the NUMA-Q box. Full log is here:
>> 
>> http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/6184/debug/console.log
>> 
>> But at the end it prints out lots of weird scheduler stuff, then one more
>> message, then dies:
>> 
>> | migration cost matrix (max_cache_size: 2097152, cpu: 700 MHz):
>> ---------------------
> 
> That's Ingo debug stuff.
> 
>> --------------------------------
>> NET: Registered protocol family 16
> 
> Well it got up to core_initcall(netlink_proto_init);
> 
>> I guess I'll try backing out the scheduler patches unless someone else 
>> has a brighter idea?
> 
> It doesn't look like a scheduler thing.  Tried enabling initcall_debug?

Humpf. I agree it seemed to get a bit further than that, but I kicked off
a new test before I went to bed, and it does seem to work w/o the sched
patches:

http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/regression_matrix.html

see the "moe" column, comparing:

2.6.12-mm1
+apw_pci_assign_unass
+nosched_2.6.12-mm1

vs

2.6.12-mm1
+apw_pci_assign_unass

rows ?

I can still do initcall debug if you want. Or I guess it's binary chop
search amongst sched patches (or at least the ones that are new in 
this -mm ?)

M.


* Re: 2.6.12-mm1 boot failure on NUMA box.
  2005-06-21 14:22   ` Martin J. Bligh
@ 2005-06-21 20:03     ` Andrew Morton
  2005-06-24 14:11       ` Martin J. Bligh
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2005-06-21 20:03 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel, mingo

"Martin J. Bligh" <mbligh@mbligh.org> wrote:
>
> Or I guess it's binary chop
>  search amongst sched patches (or at least the ones that are new in 
>  this -mm ?)

Yes please.


* Re: 2.6.12-mm1 boot failure on NUMA box.
  2005-06-21 20:03     ` Andrew Morton
@ 2005-06-24 14:11       ` Martin J. Bligh
  2005-06-24 14:14         ` Con Kolivas
  2005-06-24 17:01         ` Ingo Molnar
  0 siblings, 2 replies; 20+ messages in thread
From: Martin J. Bligh @ 2005-06-24 14:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, mingo, Con Kolivas



--Andrew Morton <akpm@osdl.org> wrote (on Tuesday, June 21, 2005 13:03:44 -0700):

> "Martin J. Bligh" <mbligh@mbligh.org> wrote:
>> 
>> Or I guess it's binary chop
>>  search amongst sched patches (or at least the ones that are new in 
>>  this -mm ?)
> 
> Yes please.

OK, still broken with the last 3 backed out, but works with the last
4 backed out. So I guess it's scheduler-cache-hot-autodetect.patch
that breaks it. Con just sent me something else to try to fix it in order
to run next ... will do that.

M.
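
The binary chop described above (still broken with the last 3 patches backed out, working with the last 4) can be sketched as a search for the smallest number of trailing patches whose removal gives a bootable kernel. This is only an illustration: `boots_fn`, `first_good_backout`, and `boots_example` are hypothetical names, the series length of 8 is made up, and the sketch assumes the result is monotonic in the number of patches backed out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test hook: returns nonzero if the kernel boots with the
 * last n patches of the series backed out. In real use this would mean
 * building and boot-testing a kernel; here it is just a callback. */
typedef int (*boots_fn)(size_t n_backed_out);

/* Illustrative stand-in for the result reported in this thread:
 * backing out 3 patches still fails, backing out 4 works. */
int boots_example(size_t n_backed_out)
{
    return n_backed_out >= 4;
}

/*
 * Return the smallest n such that backing out the last n patches boots.
 * Assumes backing out 0 patches fails, backing out all `total` patches
 * works, and the outcome is monotonic in n. The suspect is then the
 * n-th patch counted from the end of the series.
 */
size_t first_good_backout(size_t total, boots_fn boots)
{
    size_t bad = 0;       /* known not to boot */
    size_t good = total;  /* known to boot */

    assert(total > 0);
    while (good - bad > 1) {
        size_t mid = bad + (good - bad) / 2;
        if (boots(mid))
            good = mid;
        else
            bad = mid;
    }
    return good;
}
```

With the thread's result modeled by `boots_example`, `first_good_backout(8, boots_example)` returns 4, so the fourth patch from the end of the series is the suspect.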



* Re: 2.6.12-mm1 boot failure on NUMA box.
  2005-06-24 14:11       ` Martin J. Bligh
@ 2005-06-24 14:14         ` Con Kolivas
  2005-06-24 15:31           ` Martin J. Bligh
  2005-06-24 17:01         ` Ingo Molnar
  1 sibling, 1 reply; 20+ messages in thread
From: Con Kolivas @ 2005-06-24 14:14 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Andrew Morton, linux-kernel, mingo

On Sat, 25 Jun 2005 00:11, Martin J. Bligh wrote:
> --Andrew Morton <akpm@osdl.org> wrote (on Tuesday, June 21, 2005 13:03:44 -0700):
> > "Martin J. Bligh" <mbligh@mbligh.org> wrote:
> >> Or I guess it's binary chop
> >>  search amongst sched patches (or at least the ones that are new in
> >>  this -mm ?)
> >
> > Yes please.
>
> OK, still broken with the last 3 backed out, but works with the last
> 4 backed out. So I guess it's scheduler-cache-hot-autodetect.patch
> that breaks it. Con just sent me something else to try to fix it in order
> to run next ... will do that.

Sorry, that patch I sent isn't a fix for any known problem, it's another tweak 
to my code. If you have breakage elsewhere don't waste your time with my code 
just yet.

Cheers,
Con


* Re: 2.6.12-mm1 boot failure on NUMA box.
  2005-06-24 14:14         ` Con Kolivas
@ 2005-06-24 15:31           ` Martin J. Bligh
  0 siblings, 0 replies; 20+ messages in thread
From: Martin J. Bligh @ 2005-06-24 15:31 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrew Morton, linux-kernel, mingo



--Con Kolivas <kernel@kolivas.org> wrote (on Saturday, June 25, 2005 00:14:26 +1000):

> On Sat, 25 Jun 2005 00:11, Martin J. Bligh wrote:
>> --Andrew Morton <akpm@osdl.org> wrote (on Tuesday, June 21, 2005 13:03:44 -0700):
>> > "Martin J. Bligh" <mbligh@mbligh.org> wrote:
>> >> Or I guess it's binary chop
>> >>  search amongst sched patches (or at least the ones that are new in
>> >>  this -mm ?)
>> > 
>> > Yes please.
>> 
>> OK, still broken with the last 3 backed out, but works with the last
>> 4 backed out. So I guess it's scheduler-cache-hot-autodetect.patch
>> that breaks it. Con just sent me something else to try to fix it in order
>> to run next ... will do that.
> 
> Sorry, that patch I sent isn't a fix for any known problem, it's another tweak 
> to my code. If you have breakage elsewhere don't waste your time with my code 
> just yet.

OK, I'll stack that on top of backing out the last 4 patches, which fixed moe.

M.



* Re: 2.6.12-mm1 boot failure on NUMA box.
  2005-06-24 14:11       ` Martin J. Bligh
  2005-06-24 14:14         ` Con Kolivas
@ 2005-06-24 17:01         ` Ingo Molnar
  2005-06-24 17:09           ` Martin J. Bligh
  2005-06-28 22:09           ` Martin J. Bligh
  1 sibling, 2 replies; 20+ messages in thread
From: Ingo Molnar @ 2005-06-24 17:01 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Andrew Morton, linux-kernel, Con Kolivas


* Martin J. Bligh <mbligh@mbligh.org> wrote:

> OK, still broken with the last 3 backed out, but works with the last 4 
> backed out. So I guess it's scheduler-cache-hot-autodetect.patch that 
> breaks it. Con just sent me something else to try to fix it in order 
> to run next ... will do that.

hm. Does it work if you disable migration-autodetect via passing in e.g.  
migration_cost=1000,2000,3000 on the boot line? Is it perhaps the 
excessive debugging that hurts?

or does it work if you undo the chunk below? Seemed harmless, but has 
CONFIG_NUMA relevance.

	Ingo

--- linux/arch/i386/kernel/timers/timer_tsc.c.orig
+++ linux/arch/i386/kernel/timers/timer_tsc.c
@@ -133,18 +133,15 @@ static unsigned long long monotonic_cloc
 
 /*
  * Scheduler clock - returns current time in nanosec units.
+ *
+ * it's not a problem if the TSC is unsynchronized,
+ * as the scheduler will carefully compensate for it.
  */
 unsigned long long sched_clock(void)
 {
 	unsigned long long this_offset;
 
-	/*
-	 * In the NUMA case we dont use the TSC as they are not
-	 * synchronized across all CPUs.
-	 */
-#ifndef CONFIG_NUMA
-	if (!use_tsc)
-#endif
+	if (!cpu_has_tsc)
 		/* no locking but a rare wrong value is not a big deal */
 		return jiffies_64 * (1000000000 / HZ);
 


* Re: 2.6.12-mm1 boot failure on NUMA box.
  2005-06-24 17:01         ` Ingo Molnar
@ 2005-06-24 17:09           ` Martin J. Bligh
  2005-06-24 19:52             ` Ingo Molnar
  2005-06-28 22:09           ` Martin J. Bligh
  1 sibling, 1 reply; 20+ messages in thread
From: Martin J. Bligh @ 2005-06-24 17:09 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel, Con Kolivas

>> OK, still broken with the last 3 backed out, but works with the last 4 
>> backed out. So I guess it's scheduler-cache-hot-autodetect.patch that 
>> breaks it. Con just sent me something else to try to fix it in order 
>> to run next ... will do that.
> 
> hm. Does it work if you disable migration-autodetect via passing in e.g.  
> migration_cost=1000,2000,3000 on the boot line? Is it perhaps the 
> excessive debugging that hurts.
> 
> or does it work if you undo the chunk below? Seemed harmless, but has 
> CONFIG_NUMA relevance.
> 
> 	Ingo
> 
> --- linux/arch/i386/kernel/timers/timer_tsc.c.orig
> +++ linux/arch/i386/kernel/timers/timer_tsc.c
> @@ -133,18 +133,15 @@ static unsigned long long monotonic_cloc
>  
>  /*
>   * Scheduler clock - returns current time in nanosec units.
> + *
> + * it's not a problem if the TSC is unsynchronized,
> + * as the scheduler will carefully compensate for it.
>   */
>  unsigned long long sched_clock(void)
>  {
>  	unsigned long long this_offset;
>  
> -	/*
> -	 * In the NUMA case we dont use the TSC as they are not
> -	 * synchronized across all CPUs.
> -	 */
> -#ifndef CONFIG_NUMA
> -	if (!use_tsc)
> -#endif
> +	if (!cpu_has_tsc)
>  		/* no locking but a rare wrong value is not a big deal */
>  		return jiffies_64 * (1000000000 / HZ);

Humpf. That does look dangerous on a NUMA-Q. The TSCs aren't synced,
and we can't use them .... have to use PIT, whether the CPUs have TSC
or not.

M.
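
For reference, the fallback path in the quoted hunk derives nanoseconds from the tick counter, so its resolution is one tick. A minimal sketch of that arithmetic in plain userspace C follows; `ns_per_jiffy` and `jiffies_to_ns` are hypothetical helper names, and the tick rate is passed in as a parameter rather than taken from the kernel's `HZ`.

```c
#include <assert.h>
#include <stdint.h>

/* Nanoseconds represented by one jiffy, using the same integer
 * division as the fallback expression: 1000000000 / HZ. */
uint64_t ns_per_jiffy(unsigned int hz)
{
    assert(hz != 0);
    return 1000000000ULL / hz;
}

/* Mirror of "jiffies_64 * (1000000000 / HZ)" for a given tick rate. */
uint64_t jiffies_to_ns(uint64_t jiffies, unsigned int hz)
{
    return jiffies * ns_per_jiffy(hz);
}
```

Assuming a tick rate of 1000, each jiffy is 1,000,000 ns, so a clock built this way advances in 1 ms steps; at a tick rate of 100 the step is 10 ms.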


* Re: 2.6.12-mm1 boot failure on NUMA box.
  2005-06-24 17:09           ` Martin J. Bligh
@ 2005-06-24 19:52             ` Ingo Molnar
  2005-06-24 20:08               ` Nish Aravamudan
  2005-06-24 20:56               ` Martin J. Bligh
  0 siblings, 2 replies; 20+ messages in thread
From: Ingo Molnar @ 2005-06-24 19:52 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Andrew Morton, linux-kernel, Con Kolivas


* Martin J. Bligh <mbligh@mbligh.org> wrote:

> > -	/*
> > -	 * In the NUMA case we dont use the TSC as they are not
> > -	 * synchronized across all CPUs.
> > -	 */
> > -#ifndef CONFIG_NUMA
> > -	if (!use_tsc)
> > -#endif
> > +	if (!cpu_has_tsc)
> >  		/* no locking but a rare wrong value is not a big deal */
> >  		return jiffies_64 * (1000000000 / HZ);
> 
> Humpf. That does look dangerous on a NUMA-Q. The TSCs aren't synced, 
> and we can't use them .... have to use PIT, whether the CPUs have TSC 
> or not.

is the only problem the unsyncedness? That should be fine as far as the 
scheduler is concerned. (we compensate for per-CPU drifts)

	Ingo


* Re: 2.6.12-mm1 boot failure on NUMA box.
  2005-06-24 19:52             ` Ingo Molnar
@ 2005-06-24 20:08               ` Nish Aravamudan
  2005-06-24 20:56               ` Martin J. Bligh
  1 sibling, 0 replies; 20+ messages in thread
From: Nish Aravamudan @ 2005-06-24 20:08 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Martin J. Bligh, Andrew Morton, linux-kernel, Con Kolivas

On 6/24/05, Ingo Molnar <mingo@elte.hu> wrote:
> 
> * Martin J. Bligh <mbligh@mbligh.org> wrote:
> 
> > > -   /*
> > > -    * In the NUMA case we dont use the TSC as they are not
> > > -    * synchronized across all CPUs.
> > > -    */
> > > -#ifndef CONFIG_NUMA
> > > -   if (!use_tsc)
> > > -#endif
> > > +   if (!cpu_has_tsc)
> > >             /* no locking but a rare wrong value is not a big deal */
> > >             return jiffies_64 * (1000000000 / HZ);
> >
> > Humpf. That does look dangerous on a NUMA-Q. The TSCs aren't synced,
> > and we can't use them .... have to use PIT, whether the CPUs have TSC
> > or not.
> 
> is the only problem the unsyncedness? That should be fine as far as the
> scheduler is concerned. (we compensate for per-CPU drifts)

I'm pretty sure if the TSC gets used at all in NUMA-Q, the machine
will hang. Whenever I see that "syncronizing TSC across ## CPUs"
message at boot, I know my test is going to fail on NUMA-Q :) It is
not consistent where the hang will occur, either. Sometimes the
machine will boot but then hang in the middle of kernbench. In any
case, the solution is not to use TSC on NUMA-Q. Martin may be able to
give more technical reasons.

Thanks,
Nish


* Re: 2.6.12-mm1 boot failure on NUMA box.
  2005-06-24 19:52             ` Ingo Molnar
  2005-06-24 20:08               ` Nish Aravamudan
@ 2005-06-24 20:56               ` Martin J. Bligh
  2005-06-25  4:00                 ` Ingo Molnar
  1 sibling, 1 reply; 20+ messages in thread
From: Martin J. Bligh @ 2005-06-24 20:56 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel, Con Kolivas



--On Friday, June 24, 2005 21:52:48 +0200 Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Martin J. Bligh <mbligh@mbligh.org> wrote:
> 
>> > -	/*
>> > -	 * In the NUMA case we dont use the TSC as they are not
>> > -	 * synchronized across all CPUs.
>> > -	 */
>> > -#ifndef CONFIG_NUMA
>> > -	if (!use_tsc)
>> > -#endif
>> > +	if (!cpu_has_tsc)
>> >  		/* no locking but a rare wrong value is not a big deal */
>> >  		return jiffies_64 * (1000000000 / HZ);
>> 
>> Humpf. That does look dangerous on a NUMA-Q. The TSCs aren't synced, 
>> and we can't use them .... have to use PIT, whether the CPUs have TSC 
>> or not.
> 
> is the only problem the unsyncedness? That should be fine as far as the 
> scheduler is concerned. (we compensate for per-CPU drifts)

Well, I think so. But I don't see how you're going to compensate for
large-scale unsynced-ness safely. I've always completely avoided the
TSC altogether on NUMA-Q ... would prefer to keep it that way. We got
lots of weird random crashes, panics, hangs, and -ve time offsets
returned from userspace time commands last time I tried it.

M.



* Re: 2.6.12-mm1 boot failure on NUMA box.
  2005-06-24 20:56               ` Martin J. Bligh
@ 2005-06-25  4:00                 ` Ingo Molnar
  2005-06-25  6:42                   ` Martin J. Bligh
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2005-06-25  4:00 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Andrew Morton, linux-kernel, Con Kolivas


* Martin J. Bligh <Martin.Bligh@us.ibm.com> wrote:

> > is the only problem the unsyncedness? That should be fine as far as the 
> > scheduler is concerned. (we compensate for per-CPU drifts)
> 
> Well, I think so. But I don't see how you're going to compensate for 
> large-scale unsynced-ness safely. I've always completely avoided the 
> TSC altogether on NUMA-Q ... would prefer to keep it that way. We got 
> lots of weird random crashes, panics, hangs, and -ve time offsets
> returned from userspace time commands last time I tried it.

ok. Would be nice to check whether reverting that single change solves 
the boot problem. If it does i'll switch the measurement method to be 
do_gettimeoffset based, that way the measurement will still be accurate.  
(which is needed for precise migration cost results) Right now the 
calibration uses sched_clock() - which was the reason for the change.

(btw., if the TSC is that unreliable on numaq boxes, shouldn't we disable
it for userspace apps too? Or are those hangs purely kernel bugs? In 
which case it might make sense to debug those a bit more - large-scale 
TSC unsyncedness is something that could slip in on other hardware too.)

	Ingo


* Re: 2.6.12-mm1 boot failure on NUMA box.
  2005-06-25  4:00                 ` Ingo Molnar
@ 2005-06-25  6:42                   ` Martin J. Bligh
  2005-06-25  9:09                     ` Ingo Molnar
  2005-06-25 18:08                     ` Lee Revell
  0 siblings, 2 replies; 20+ messages in thread
From: Martin J. Bligh @ 2005-06-25  6:42 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel, Con Kolivas



--Ingo Molnar <mingo@elte.hu> wrote (on Saturday, June 25, 2005 06:00:52 +0200):

> 
> * Martin J. Bligh <Martin.Bligh@us.ibm.com> wrote:
> 
>> > is the only problem the unsyncedness? That should be fine as far as the 
>> > scheduler is concerned. (we compensate for per-CPU drifts)
>> 
>> Well, I think so. But I don't see how you're going to compensate for 
>> large-scale unsynced-ness safely. I've always completely avoided the 
>> TSC altogether on NUMA-Q ... would prefer to keep it that way. We got 
>> lots of weird random crashes, panics, hangs, and -ve time offsets
>> returned from userspace time commands last time I tried it.
> 
> ok. Would be nice to check whether reverting that single change solves 
> the boot problem. If it does i'll switch the measurement method to be 
> do_gettimeoffset based, that way the measurement will still be accurate.  
> (which is needed for precise migration cost results) Right now the 
> calibration uses sched_clock() - which was the reason for the change.

OK, will test that.

> (btw., if the TSC is that unreliable on numaq boxes, shouldnt we disable 
> it for userspace apps too? Or are those hangs purely kernel bugs? In 
> which case it might make sense to debug those a bit more - large-scale 
> TSC unsyncedness is something that could slip in on other hardware too.)

Well it reads reliably. it just reliably reads utter random crap (well,
across CPUs). Not many things read tsc from userspace, and it won't hang
I guess .... depends what their expectations are. I do like gettimeofday
not to go backwards though - that tends to bugger things up ;-)

M.



* Re: 2.6.12-mm1 boot failure on NUMA box.
  2005-06-25  6:42                   ` Martin J. Bligh
@ 2005-06-25  9:09                     ` Ingo Molnar
  2005-06-25 18:08                     ` Lee Revell
  1 sibling, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2005-06-25  9:09 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Andrew Morton, linux-kernel, Con Kolivas


* Martin J. Bligh <mbligh@mbligh.org> wrote:

> > (btw., if the TSC is that unreliable on numaq boxes, shouldnt we disable 
> > it for userspace apps too? Or are those hangs purely kernel bugs? In 
> > which case it might make sense to debug those a bit more - large-scale 
> > TSC unsyncedness is something that could slip in on other hardware too.)
> 
> Well it reads reliably. it just reliably reads utter random crap 
> (well, across CPUs). Not many things read tsc from userspace, and it 
> won't hang I guess .... depends what their expectations are. I do like
> gettimeofday not to go backwards though - that tends to bugger things 
> up ;-)

the patch only adds the TSC back for purposes of sched_clock() (whose 
call sites are robust against cross-CPU migration) - gettimeofday() is 
still using the PIT or HPET.

but i intended this to be a problem-free change - if it causes any
problems i'll switch the code to use gettimeofday() and not the 
[thus-]lower-accuracy sched_clock().

	Ingo
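
The disagreement above is about what happens when a timestamp taken on one CPU is compared with one taken on another CPU whose counter has a different offset. The sketch below models that in userspace C and shows one way a consumer could tolerate a backwards step by clamping. It is an illustration only, not code from the scheduler: `observed_delta_ns` and `clamped_delta_ns` are hypothetical names, and the 5-second offset used in the note below is invented.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model: each CPU's counter reads (true time + a fixed per-CPU offset).
 * A task is stamped on one CPU at t_start and read again on another CPU
 * at t_end. Returns the signed difference the caller would observe.
 */
int64_t observed_delta_ns(uint64_t off_start_cpu, uint64_t off_end_cpu,
                          uint64_t t_start, uint64_t t_end)
{
    assert(t_end >= t_start);  /* true time never runs backwards */
    uint64_t start = t_start + off_start_cpu;
    uint64_t end = t_end + off_end_cpu;
    return (int64_t)(end - start);
}

/* Treat a negative interval as "no time elapsed" instead of trusting it. */
uint64_t clamped_delta_ns(int64_t delta)
{
    return delta < 0 ? 0 : (uint64_t)delta;
}
```

With equal offsets, a 1000 ns interval reads as 1000 ns. If the first CPU's counter is 5 s ahead of the second, the same interval reads as roughly -5 s; clamping turns that into 0, which avoids a negative time but also hides the real elapsed time.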


* Re: 2.6.12-mm1 boot failure on NUMA box.
  2005-06-25  6:42                   ` Martin J. Bligh
  2005-06-25  9:09                     ` Ingo Molnar
@ 2005-06-25 18:08                     ` Lee Revell
  2005-06-25 21:03                       ` Martin J. Bligh
  1 sibling, 1 reply; 20+ messages in thread
From: Lee Revell @ 2005-06-25 18:08 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, Con Kolivas

On Fri, 2005-06-24 at 23:42 -0700, Martin J. Bligh wrote:
> > (btw., if the TSC is that unreliable on numaq boxes, shouldnt we disable 
> > it for userspace apps too? Or are those hangs purely kernel bugs? In 
> > which case it might make sense to debug those a bit more - large-scale 
> > TSC unsyncedness is something that could slip in on other hardware too.)
> 
> Well it reads reliably. it just reliably reads utter random crap (well,
> across CPUs). Not many things read tsc from userspace, and it won't hang
> I guess .... depends what their expectations are. I do like gettimeofday
> not to go backwards though - that tends to bugger things up ;-)

The userspace apps that read the TSC know what they are doing, and have
chosen to use the TSC because they need a cheap, fast timer rather than
a correct one.  Please don't break it.

Lee



* Re: 2.6.12-mm1 boot failure on NUMA box.
  2005-06-25 18:08                     ` Lee Revell
@ 2005-06-25 21:03                       ` Martin J. Bligh
  2005-06-25 21:54                         ` Lee Revell
  0 siblings, 1 reply; 20+ messages in thread
From: Martin J. Bligh @ 2005-06-25 21:03 UTC (permalink / raw)
  To: Lee Revell; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, Con Kolivas

--Lee Revell <rlrevell@joe-job.com> wrote (on Saturday, June 25, 2005 14:08:25 -0400):

> On Fri, 2005-06-24 at 23:42 -0700, Martin J. Bligh wrote:
>> > (btw., if the TSC is that unreliable on numaq boxes, shouldnt we disable 
>> > it for userspace apps too? Or are those hangs purely kernel bugs? In 
>> > which case it might make sense to debug those a bit more - large-scale 
>> > TSC unsyncedness is something that could slip in on other hardware too.)
>> 
>> Well it reads reliably. it just reliably reads utter random crap (well,
>> across CPUs). Not many things read tsc from userspace, and it won't hang
>> I guess .... depends what their expectations are. I do like gettimeofday
>> not to go backwards though - that tends to bugger things up ;-)
> 
> The userspace apps that read the TSC know what they are doing, and have
> chosen to use the TSC because they need a cheap, fast timer rather than
> a correct one.  Please don't break it.

I have no intent, nor method, of doing so. rdtsc is a direct instruction,
without intervention, as I understand it.

M.



* Re: 2.6.12-mm1 boot failure on NUMA box.
  2005-06-25 21:03                       ` Martin J. Bligh
@ 2005-06-25 21:54                         ` Lee Revell
  0 siblings, 0 replies; 20+ messages in thread
From: Lee Revell @ 2005-06-25 21:54 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, Con Kolivas

On Sat, 2005-06-25 at 14:03 -0700, Martin J. Bligh wrote:
> I have no intent, nor method, of doing so. rdtsc is a direct instruction,
> without intervention, as I understand it.

I thought so too, but it turns out the CPU lets you disable it at boot.
This was suggested as a possible fix for the alleged hyperthreading
vulnerability.

Lee



* Re: 2.6.12-mm1 boot failure on NUMA box.
  2005-06-24 17:01         ` Ingo Molnar
  2005-06-24 17:09           ` Martin J. Bligh
@ 2005-06-28 22:09           ` Martin J. Bligh
  1 sibling, 0 replies; 20+ messages in thread
From: Martin J. Bligh @ 2005-06-28 22:09 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel, Con Kolivas



--On Friday, June 24, 2005 19:01:12 +0200 Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Martin J. Bligh <mbligh@mbligh.org> wrote:
> 
>> OK, still broken with the last 3 backed out, but works with the last 4 
>> backed out. So I guess it's scheduler-cache-hot-autodetect.patch that 
>> breaks it. Con just sent me something else to try to fix it in order 
>> to run next ... will do that.
> 
> hm. Does it work if you disable migration-autodetect via passing in e.g.  
> migration_cost=1000,2000,3000 on the boot line? Is it perhaps the 
> excessive debugging that hurts.
> 
> or does it work if you undo the chunk below? Seemed harmless, but has 
> CONFIG_NUMA relevance.

Sorry for slow response - had some problems with machines and the harness.
That didn't fix it, I'm afraid. I'll try to find some more time to beat
on the problem later.

M.

> 	Ingo
> 
> --- linux/arch/i386/kernel/timers/timer_tsc.c.orig
> +++ linux/arch/i386/kernel/timers/timer_tsc.c
> @@ -133,18 +133,15 @@ static unsigned long long monotonic_cloc
>  
>  /*
>   * Scheduler clock - returns current time in nanosec units.
> + *
> + * it's not a problem if the TSC is unsynchronized,
> + * as the scheduler will carefully compensate for it.
>   */
>  unsigned long long sched_clock(void)
>  {
>  	unsigned long long this_offset;
>  
> -	/*
> -	 * In the NUMA case we dont use the TSC as they are not
> -	 * synchronized across all CPUs.
> -	 */
> -#ifndef CONFIG_NUMA
> -	if (!use_tsc)
> -#endif
> +	if (!cpu_has_tsc)
>  		/* no locking but a rare wrong value is not a big deal */
>  		return jiffies_64 * (1000000000 / HZ);
>  
> 
> 
> 




* Re: 2.6.12-mm1 boot failure on NUMA box.
       [not found] <208690000.1119330454@[10.10.2.4].suse.lists.linux.kernel>
@ 2005-06-21 11:31 ` Andi Kleen
  0 siblings, 0 replies; 20+ messages in thread
From: Andi Kleen @ 2005-06-21 11:31 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel

"Martin J. Bligh" <mbligh@mbligh.org> writes:

> OK, after fixing the build failure with Andy's patch here:
> 
> http://mbligh.org/abat/apw_pci_assign_unassigned_resources
> 
> I get a boot failure on the NUMA-Q box. Full log is here:

FWIW i tried 2.6.12-rc6 (not final yet) on a 16 way x86-64 box
and it also always deadlocked early when trying to boot the
other CPUs (in fact when waiting for the migration thread
to process a request). 2.6.11 worked.

-Andi



end of thread, other threads:[~2005-06-28 22:18 UTC | newest]

Thread overview: 20+ messages
2005-06-21  5:07 2.6.12-mm1 boot failure on NUMA box Martin J. Bligh
2005-06-21  6:29 ` Andrew Morton
2005-06-21 14:22   ` Martin J. Bligh
2005-06-21 20:03     ` Andrew Morton
2005-06-24 14:11       ` Martin J. Bligh
2005-06-24 14:14         ` Con Kolivas
2005-06-24 15:31           ` Martin J. Bligh
2005-06-24 17:01         ` Ingo Molnar
2005-06-24 17:09           ` Martin J. Bligh
2005-06-24 19:52             ` Ingo Molnar
2005-06-24 20:08               ` Nish Aravamudan
2005-06-24 20:56               ` Martin J. Bligh
2005-06-25  4:00                 ` Ingo Molnar
2005-06-25  6:42                   ` Martin J. Bligh
2005-06-25  9:09                     ` Ingo Molnar
2005-06-25 18:08                     ` Lee Revell
2005-06-25 21:03                       ` Martin J. Bligh
2005-06-25 21:54                         ` Lee Revell
2005-06-28 22:09           ` Martin J. Bligh
     [not found] <208690000.1119330454@[10.10.2.4].suse.lists.linux.kernel>
2005-06-21 11:31 ` Andi Kleen
