linux-kernel.vger.kernel.org archive mirror
* RE: not fixed in 2.4.23-rc3 (was: Re: 2.4.22 SMP kernel build for hyper threading P4)
@ 2003-11-23 20:16 Brown, Len
  2003-11-23 20:45 ` Eduard Bloch
  0 siblings, 1 reply; 8+ messages in thread
From: Brown, Len @ 2003-11-23 20:16 UTC
  To: Eduard Bloch, linux-kernel

> weird 1+2xHT mode.

Re: BIOS disables CPUs.
It would be good to verify that 2.4.21 still works properly on this box
to verify the hardware isn't hosed.  Also, if your BIOS CMOS has error
logs, it might be good to read them to see what it is thinking.

Also, does the same 3-cpu configuration result when you boot 2.6?

Thanks,
-Len


* Re: not fixed in 2.4.23-rc3 (was: Re: 2.4.22 SMP kernel build for hyper threading P4)
  2003-11-23 20:16 not fixed in 2.4.23-rc3 (was: Re: 2.4.22 SMP kernel build for hyper threading P4) Brown, Len
@ 2003-11-23 20:45 ` Eduard Bloch
  2003-11-24  6:19   ` Len Brown
  0 siblings, 1 reply; 8+ messages in thread
From: Eduard Bloch @ 2003-11-23 20:45 UTC
  To: Brown, Len; +Cc: linux-kernel

#include <hallo.h>
* Brown, Len [Sun, Nov 23 2003, 03:16:11PM]:
> > weird 1+2xHT mode.
> 
> Re: BIOS disables CPUs.
> It would be good to verify that 2.4.21 still works properly on this box
> to verify the hardware isn't hosed.  Also, if your BIOS CMOS has error

It does work fine with 2.4.21; the last part of the log at the mentioned
URL is a few hours old.  The hardware looks okay, the box has been running
for more than 14 months without any hardware trouble.

> logs, it might be good to read them to see what it is thinking.

I have not found any error logs so far; the BIOS help only says that a
CPU is turned off when a severe error occurred.

> Also, does the same 3-cpu configuration result when you boot 2.6?

I cannot promise when I will have a chance to test 2.6 there; it's a
production system.

Kind regards,
Eduard.
-- 
One joy drives away a hundred sorrows.


* Re: not fixed in 2.4.23-rc3 (was: Re: 2.4.22 SMP kernel build for hyper threading P4)
  2003-11-23 20:45 ` Eduard Bloch
@ 2003-11-24  6:19   ` Len Brown
  2003-11-24  7:00     ` William Lee Irwin III
  2003-11-30  9:28     ` Eduard Bloch
  0 siblings, 2 replies; 8+ messages in thread
From: Len Brown @ 2003-11-24  6:19 UTC
  To: Eduard Bloch; +Cc: linux-kernel, davej

On Sun, 2003-11-23 at 15:45, Eduard Bloch wrote:
> #include <hallo.h>
> * Brown, Len [Sun, Nov 23 2003, 03:16:11PM]:
> > > weird 1+2xHT mode.

Please try CONFIG_NR_CPUS=8, or apply the patch below to 2.4.23.

smp_boot_cpus() incorrectly assumes that Local APIC IDs are handed out
contiguously: 0,1,2...

But they're handed out 0,1,6,7 on your system.  #6 happens to be your
boot CPU, smp_boot_cpus() brings up #0 and #1, and never asks to boot #7
-- thus 3 logical processors.  If #0 happened to be your boot processor,
you'd get only 2 logical processors.
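
The effect of the loop bound can be reproduced outside the kernel. The
following is a minimal standalone sketch, not kernel code: count_online()
is a hypothetical helper, the present map is modelled as a plain bitmask,
every wakeup is assumed to succeed, and the limit of 4 used in the note
below is only an assumed NR_CPUS value for this box.

```c
/*
 * Standalone model of the wakeup loop, not kernel code.
 * present_map: bit N set means a CPU with Local APIC ID N exists.
 * boot_apicid: APIC ID of the CPU that is already running.
 * limit:       loop bound (NR_CPUS before the patch, MAX_APICS after).
 * Returns the number of logical CPUs online after the loop.
 */
int count_online(unsigned long present_map, int boot_apicid, int limit)
{
	int online = 1;	/* the boot CPU runs regardless of the loop bound */
	int bit;

	for (bit = 0; bit < limit; bit++) {
		if (bit == boot_apicid)
			continue;	/* never try to start the boot CPU */
		if (!(present_map & (1UL << bit)))
			continue;	/* no CPU with this APIC ID */
		online++;		/* assumed: the wakeup succeeds */
	}
	return online;
}
```

With APIC IDs 0, 1, 6 and 7 present (mask 0xC3), count_online(0xC3, 6, 4)
gives 3 and count_online(0xC3, 0, 4) gives 2, matching the two cases
described above, while a limit of 8 gives 4 for either boot CPU.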

cheers,
-Len


===== arch/i386/kernel/smpboot.c 1.17 vs edited =====
--- 1.17/arch/i386/kernel/smpboot.c	Mon Nov  3 08:48:33 2003
+++ edited/arch/i386/kernel/smpboot.c	Mon Nov 24 01:06:26 2003
@@ -1106,7 +1106,7 @@
 	 */
 	Dprintk("CPU present map: %lx\n", phys_cpu_present_map);
 
-	for (bit = 0; bit < NR_CPUS; bit++) {
+	for (bit = 0; bit < MAX_APICS; bit++) {
 		apicid = cpu_present_to_apicid(bit);
 		
 		/* don't try to boot BAD_APICID */





* Re: not fixed in 2.4.23-rc3 (was: Re: 2.4.22 SMP kernel build for hyper threading P4)
  2003-11-24  6:19   ` Len Brown
@ 2003-11-24  7:00     ` William Lee Irwin III
  2003-11-24 16:49       ` Len Brown
  2003-11-30  9:28     ` Eduard Bloch
  1 sibling, 1 reply; 8+ messages in thread
From: William Lee Irwin III @ 2003-11-24  7:00 UTC
  To: Len Brown; +Cc: Eduard Bloch, linux-kernel, davej

On Sun, 2003-11-23 at 15:45, Eduard Bloch wrote:
>> #include <hallo.h>
>> * Brown, Len [Sun, Nov 23 2003, 03:16:11PM]:
>> > > weird 1+2xHT mode.

On Mon, Nov 24, 2003 at 01:19:07AM -0500, Len Brown wrote:
> Please try CONFIG_NR_CPUS=8, or apply the patch below to 2.4.23.
> smp_boot_cpus() incorrectly assumes that Local APIC IDs are handed out
> contiguously: 0,1,2...
> But they're handed out 0,1,6,7 on your system.  #6 happens to be your
> boot CPU, smp_boot_cpus() brings up #0 and #1, and never asks to boot #7
> -- thus 3 logical processors.  If #0 happened to be your boot processor,
> you'd get only 2 logical processors.

A similar (but more elaborate) fix is in 2.6.


-- wli


* Re: not fixed in 2.4.23-rc3 (was: Re: 2.4.22 SMP kernel build for hyper threading P4)
  2003-11-24  7:00     ` William Lee Irwin III
@ 2003-11-24 16:49       ` Len Brown
  2003-11-24 22:55         ` William Lee Irwin III
  0 siblings, 1 reply; 8+ messages in thread
From: Len Brown @ 2003-11-24 16:49 UTC
  To: William Lee Irwin III; +Cc: Eduard Bloch, linux-kernel, davej

On Mon, 2003-11-24 at 02:00, William Lee Irwin III wrote:

> A similar (but more elaborate) fix is in 2.6.

Why is the additional variable "kicked" in 2.6 necessary?
Appears that kicked == (cpucount + 1), and the loop already
compares that to NR_CPUS via max_cpus:

                if (max_cpus <= cpucount+1)
                        continue;

Though I think it would read more clearly this way:

                if (cpucount + 1 >= max_cpus)
                        break;
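
That the two forms give the same CPU count can be checked with a small
standalone model. It is not kernel code: wake_cpus() and its parameters
are hypothetical, the boot CPU is left out of present_map, and every
wakeup is assumed to succeed. In this model cpucount stops changing once
the limit is reached, so continue only spends the remaining iterations.

```c
/*
 * Standalone model, not kernel code.  Walks APIC ID bits [0, max_apics)
 * and "boots" each present secondary CPU until max_cpus are online.
 * *visited receives the number of loop iterations that were started.
 */
int wake_cpus(unsigned long present_map, int max_apics, int max_cpus,
	      int use_break, int *visited)
{
	int cpucount = 0;	/* secondary CPUs booted so far */
	int bit;

	*visited = 0;
	for (bit = 0; bit < max_apics; bit++) {
		(*visited)++;
		if (!(present_map & (1UL << bit)))
			continue;	/* no CPU with this APIC ID */
		if (cpucount + 1 >= max_cpus) {
			if (use_break)
				break;	/* limit reached: stop scanning */
			continue;	/* limit reached: scan on, boot nothing */
		}
		cpucount++;		/* assumed: the wakeup succeeds */
	}
	return cpucount + 1;	/* total online, including the boot CPU */
}
```

For secondaries at APIC IDs 1, 6 and 7 (mask 0xC2), 32 APIC IDs and
max_cpus = 2, both variants return 2; the continue form starts all 32
iterations, the break form stops after 7.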

Speaking of max_cpus, it would probably be a good thing if maxcpus() did
not allow the administrator to set max_cpus > NR_CPUS at boot time.
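
One way to read that suggestion is a simple clamp applied when the boot
parameter is parsed. The sketch below is only an illustration:
clamp_max_cpus() is a hypothetical helper, not an existing kernel
function, and the real maxcpus() handler is not reproduced here.

```c
/*
 * Hypothetical helper, not an existing kernel function: limits a
 * "maxcpus=" request to the compile-time CPU count.
 */
unsigned int clamp_max_cpus(unsigned int requested, unsigned int nr_cpus)
{
	return requested > nr_cpus ? nr_cpus : requested;
}
```

With nr_cpus = 8, a request of 16 is reduced to 8 and a request of 2 is
passed through unchanged.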

cheers,
-Len





* Re: not fixed in 2.4.23-rc3 (was: Re: 2.4.22 SMP kernel build for hyper threading P4)
  2003-11-24 16:49       ` Len Brown
@ 2003-11-24 22:55         ` William Lee Irwin III
  0 siblings, 0 replies; 8+ messages in thread
From: William Lee Irwin III @ 2003-11-24 22:55 UTC
  To: Len Brown; +Cc: Eduard Bloch, linux-kernel, davej

On Mon, 2003-11-24 at 02:00, William Lee Irwin III wrote:
>> A similar (but more elaborate) fix is in 2.6.

On Mon, Nov 24, 2003 at 11:49:18AM -0500, Len Brown wrote:
> Why is the additional variable "kicked" in 2.6 necessary?
> Appears that kicked == (cpucount + 1), and the loop already
> compares that to NR_CPUS via max_cpus:
>                 if (max_cpus <= cpucount+1)
>                         continue;
> Though I think it would read more clearly this way:
>                 if (cpucount + 1 >= max_cpus)
>                         break;
> Speaking of max_cpus, it would probably be a good thing if maxcpus() did
> not allow the administrator to set max_cpus > NR_CPUS at boot time.

There's some kind of shenanigan going on with cpucount: it's incremented
and decremented all over the place, and I can't be arsed to decipher it.
do_boot_cpu() returns status, so counting successes got the bug fixed.
Fixing max_cpus > NR_CPUS has some kind of core impact. So I've pretty
much punted on both killing kicked and fixing max_cpus. I anticipate
a particular someone I don't want to hear from complaining loudly.

Maybe something like the below (untested) would be helpful.


-- wli


Propagate kicked down to do_boot_cpu() and nuke the redundant cpucount.
The choice of which to nuke was based on cpucount being modified all
over the place and actually being 1 less than all cpus, vs. kicked being
a counter maintained all in one place and actually representing the
total number of cpus. Old semantics are AFAICT preserved apart from
bounding the wakeup loop by kicked < min(NR_CPUS, max_cpus), which is
actually a darn good idea since there's no reason to fiddle with the
rest of the APIC ID space once we've got all the cpus we're allowed to.
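
A rough standalone model of the loop this description aims for is given
below. It is not kernel code and no more tested against a real kernel
than the patch itself: boot_one() is a hypothetical stand-in for
do_boot_cpu(), fail_map is an invented way to mark CPUs that do not
respond, the presence check is a plain bitmask test, and kicked is
assumed to be incremented only when a wakeup succeeds.

```c
/*
 * Stand-in for do_boot_cpu(): returns 0 on success and nonzero when the
 * CPU with this APIC ID does not respond (its bit is set in fail_map).
 */
int boot_one(unsigned long fail_map, int apicid)
{
	return (fail_map & (1UL << apicid)) ? 1 : 0;
}

/*
 * Standalone model of the wakeup loop, not kernel code.  Returns the
 * total number of CPUs online, including the boot CPU.
 */
int boot_all(unsigned long present_map, unsigned long fail_map,
	     int boot_apicid, int max_apics, int nr_cpus, int max_cpus)
{
	int limit = nr_cpus < max_cpus ? nr_cpus : max_cpus;
	int kicked = 1;		/* the boot CPU is the first CPU */
	int bit;

	for (bit = 0; kicked < limit && bit < max_apics; bit++) {
		if (bit == boot_apicid)
			continue;	/* never try to start the boot CPU */
		if (!(present_map & (1UL << bit)))
			continue;	/* no CPU with this APIC ID */
		if (boot_one(fail_map, bit) == 0)
			kicked++;	/* count successes only */
	}
	return kicked;
}
```

Because the bound is on kicked rather than on the APIC ID, the layout
from this thread (IDs 0, 1, 6, 7 with boot CPU 6) reaches 4 CPUs in the
model even with nr_cpus = 4, and a CPU that fails to respond does not
use up a slot.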


diff -urpN linux-2.6.0-test9/arch/i386/kernel/smpboot.c maxcpus-2.6.0-test9/arch/i386/kernel/smpboot.c
--- linux-2.6.0-test9/arch/i386/kernel/smpboot.c	2003-10-25 11:43:36.000000000 -0700
+++ maxcpus-2.6.0-test9/arch/i386/kernel/smpboot.c	2003-11-24 14:44:50.000000000 -0800
@@ -426,8 +426,6 @@ void __init smp_callin(void)
 		synchronize_tsc_ap();
 }
 
-int cpucount;
-
 extern int cpu_idle(void);
 
 /*
@@ -772,7 +770,7 @@ wakeup_secondary_cpu(int phys_apicid, un
 
 extern cpumask_t cpu_initialized;
 
-static int __init do_boot_cpu(int apicid)
+static int __init do_boot_cpu(int apicid, int cpu)
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
  * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
@@ -781,11 +779,10 @@ static int __init do_boot_cpu(int apicid
 {
 	struct task_struct *idle;
 	unsigned long boot_error;
-	int timeout, cpu;
+	int timeout;
 	unsigned long start_eip;
 	unsigned short nmi_high = 0, nmi_low = 0;
 
-	cpu = ++cpucount;
 	/*
 	 * We can't use kernel_thread since we must avoid to
 	 * reschedule the child.
@@ -871,7 +868,6 @@ static int __init do_boot_cpu(int apicid
 		unmap_cpu_to_logical_apicid(cpu);
 		cpu_clear(cpu, cpu_callout_map); /* was set here (do_boot_cpu()) */
 		cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
-		cpucount--;
 	}
 
 	/* mark "stuck" area as not stuck */
@@ -1021,7 +1017,7 @@ static void __init smp_boot_cpus(unsigne
 	Dprintk("CPU present map: %lx\n", physids_coerce(phys_cpu_present_map));
 
 	kicked = 1;
-	for (bit = 0; kicked < NR_CPUS && bit < MAX_APICS; bit++) {
+	for (bit = 0; kicked < min(NR_CPUS, max_cpus) && bit < MAX_APICS; bit++) {
 		apicid = cpu_present_to_apicid(bit);
 		/*
 		 * Don't even attempt to start the boot CPU!
@@ -1031,10 +1027,7 @@ static void __init smp_boot_cpus(unsigne
 
 		if (!check_apicid_present(bit))
 			continue;
-		if (max_cpus <= cpucount+1)
-			continue;
-
-		if (do_boot_cpu(apicid))
+		if (do_boot_cpu(apicid, kicked))
 			printk("CPU #%d not responding - cannot use it.\n",
 								apicid);
 		else
@@ -1055,7 +1048,7 @@ static void __init smp_boot_cpus(unsigne
 			bogosum += cpu_data[cpu].loops_per_jiffy;
 	printk(KERN_INFO
 		"Total of %d processors activated (%lu.%02lu BogoMIPS).\n",
-		cpucount+1,
+		kicked,
 		bogosum/(500000/HZ),
 		(bogosum/(5000/HZ))%100);
 	
@@ -1069,7 +1062,7 @@ static void __init smp_boot_cpus(unsigne
 	 * approved Athlon
 	 */
 	if (tainted & TAINT_UNSAFE_SMP) {
-		if (cpucount)
+		if (kicked > 1)
 			printk (KERN_INFO "WARNING: This combination of AMD processors is not suitable for SMP.\n");
 		else
 			tainted &= ~TAINT_UNSAFE_SMP;
@@ -1113,7 +1106,7 @@ static void __init smp_boot_cpus(unsigne
 	/*
 	 * Synchronize the TSC with the AP
 	 */
-	if (cpu_has_tsc && cpucount && cpu_khz)
+	if (cpu_has_tsc && kicked > 1 && cpu_khz)
 		synchronize_tsc_bp();
 }
 


* Re: not fixed in 2.4.23-rc3 (was: Re: 2.4.22 SMP kernel build for hyper threading P4)
  2003-11-24  6:19   ` Len Brown
  2003-11-24  7:00     ` William Lee Irwin III
@ 2003-11-30  9:28     ` Eduard Bloch
  1 sibling, 0 replies; 8+ messages in thread
From: Eduard Bloch @ 2003-11-30  9:28 UTC
  To: Len Brown; +Cc: linux-kernel, davej

#include <hallo.h>
* Len Brown [Mon, Nov 24 2003, 01:19:07AM]:

> > #include <hallo.h>
> > * Brown, Len [Sun, Nov 23 2003, 03:16:11PM]:
> > > > weird 1+2xHT mode.
> 
> Please try CONFIG_NR_CPUS=8, or apply the patch below to 2.4.23.

The first suggestion (CONFIG_NR_CPUS=8) fixed the problem, thank you.

Kind regards,
Eduard.
-- 
Nothing new ever happens. It is always the same old stories, lived
through by ever new people.
		-- William Faulkner


* not fixed in 2.4.23-rc3 (was: Re: 2.4.22 SMP kernel build for hyper threading P4)
  2003-11-15 16:40 ` Eduard Bloch
@ 2003-11-23 15:06   ` Eduard Bloch
  0 siblings, 0 replies; 8+ messages in thread
From: Eduard Bloch @ 2003-11-23 15:06 UTC
  To: linux-kernel

#include <hallo.h>
* Eduard Bloch [Sat, Nov 15 2003, 05:40:15PM]:

> > I've even built a 2.4.22 kernel with the config-2.4.20-20.9smp
> > configuration that came with RH9.
> 
> I see a very similar behaviour on a dual Xeon system here. But it began
> with 2.4.22. After upgrading from 2.4.21 to 2.4.22, HT was disabled. It
> became even worse, the mainboard (some Serverworks board in Siemens
> Primergy F250 server) decided to disable the second CPU completely (I
> don't have the logs from that moment, sorry).

The problem is still reproducible with 2.4.23-rc3. The box also rebooted
suddenly after 10 days of running 2.4.23-pre9 in the weird 1+2xHT mode.
After the reboot the second CPU disappeared, and I have now seen that
the BIOS disabled both CPUs, maybe because of detected errors.

> After upgrading to 2.4.23-pre9 and enabling the second CPU in the
> mainboard BIOS, I see HT working only on the second CPU. For the first
> there is only a message: WARNING: No sibling found for CPU 0. I tried
> compiling with and without ACPI; it makes no difference. I can live with
> 3 virtual CPUs, but ideally it should be fixed before the 2.4.23 release.
> Needless to say, it still works with 4 virtual CPUs using 2.4.21.
> 
> Kernel messages for those who are interested:

Both logs from 2.4.23-pre and 2.4.21 can be found on:

http://sites.inka.de/W1752/ht-trouble.txt

Kind regards,
Eduard.
-- 
Variety is the spice of life, though every happy marriage seems to
disprove it.
		-- Theodor Fontane


end of thread, other threads:[~2003-11-30  9:29 UTC | newest]

Thread overview: 8+ messages
2003-11-23 20:16 not fixed in 2.4.23-rc3 (was: Re: 2.4.22 SMP kernel build for hyper threading P4) Brown, Len
2003-11-23 20:45 ` Eduard Bloch
2003-11-24  6:19   ` Len Brown
2003-11-24  7:00     ` William Lee Irwin III
2003-11-24 16:49       ` Len Brown
2003-11-24 22:55         ` William Lee Irwin III
2003-11-30  9:28     ` Eduard Bloch
  -- strict thread matches above, loose matches on Subject: below --
2003-11-15 15:40 2.4.22 SMP kernel build for hyper threading P4 Job 317
2003-11-15 16:40 ` Eduard Bloch
2003-11-23 15:06   ` not fixed in 2.4.23-rc3 (was: Re: 2.4.22 SMP kernel build for hyper threading P4) Eduard Bloch
