linux-kernel.vger.kernel.org archive mirror
* RE: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
@ 2006-12-22  4:41 Zhang, Yanmin
  2006-12-22  8:22 ` Ard -kwaak- van Breemen
  0 siblings, 1 reply; 42+ messages in thread
From: Zhang, Yanmin @ 2006-12-22  4:41 UTC
  To: Ard -kwaak- van Breemen
  Cc: Andrew Morton, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

>>-----Original Message-----
>>From: Ard -kwaak- van Breemen [mailto:ard@telegraafnet.nl]
>>Sent: December 22, 2006 5:06
>>To: Zhang, Yanmin
>>Cc: Andrew Morton; Chuck Ebbert; Yinghai Lu; take@libero.it; agalanin@mera.ru; linux-kernel@vger.kernel.org;
>>bugme-daemon@bugzilla.kernel.org; Eric W. Biederman
>>Subject: Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
>>
>>On Thu, Dec 21, 2006 at 04:04:04PM +0800, Zhang, Yanmin wrote:
>>> I couldn't reproduce it on my EM64T machine. I instrumented function start_kernel and
>>> didn't find irq was enabled before calling init_IRQ. It'll be better if the reporter could
>>> instrument function start_kernel to capture which function enables irq.
>>
>>Editing init/main.c:
>>        preempt_disable();
>>        if (!irqs_disabled())
>>                printk("start_kernel(): bug: interrupts were enabled early\n");
>>                printk("BLAAT17");
>>        build_all_zonelists();
>>        if (!irqs_disabled())
>>                printk("start_kernel(): bug: interrupts were enabled early\n");
>>                printk("BLAAT18");
>>        page_alloc_init();
>>        if (!irqs_disabled())
>>                printk("start_kernel(): bug: interrupts were enabled early\n");
>>                printk("BLAAT19");
>>        printk(KERN_NOTICE "Kernel command line: %s\n", saved_command_line);
>>        parse_early_param();
>>        if (!irqs_disabled())
>>                printk("start_kernel(): bug: interrupts were enabled early\n");
>>                printk("BLAAT20");
>>        parse_args("Booting kernel", command_line, __start___param,
>>                   __stop___param - __start___param,
>>                   &unknown_bootoption);
>>                printk("BLAAT21");
>>        if (!irqs_disabled())
>>                printk("start_kernel(): bug: interrupts were enabled early\n");
>>        sort_main_extable();
>>        if (!irqs_disabled())
>>                printk("start_kernel(): bug: interrupts were enabled early\n");
>>                printk("BLAAT22");
>>        trap_init();
>>        if (!irqs_disabled())
>>                printk("start_kernel(): bug: interrupts were enabled early\n");
>>                printk("BLAAT23");
>>
>>Results in:
>>^MAllocating PCI resources starting at 88000000 (gap: 80000000:60000000)
>>^MBLAAT12BLAAT13<6>PERCPU: Allocating 32960 bytes of per cpu data
>>^MBLAAT14BLAAT15BLAAT16BLAAT17Built 2 zonelists.  Total pages: 1032635
>>^MBLAAT18BLAAT19<5>Kernel command line: console=tty0 console=ttyS0,115200 hdb=noprobe hdc=noprobe hdd=noprobe root=/dev/md0 ro panic=30
>>earlyprintk=serial,ttyS0,115200
>>^MBLAAT20<6>ide_setup: hdb=noprobe
>>^Mide_setup: hdc=noprobe
>>^Mide_setup: hdd=noprobe
>>^MBLAAT21start_kernel(): bug: interrupts were enabled early
>>^Mstart_kernel(): bug: interrupts were enabled early
>>^MBLAAT22Initializing CPU#0
>>
>>Hmmm, that actually doesn't make sense to me (unless parse_args is able to enable irq's).
I think parse_args enables irqs when it calls the parameter callbacks.
Could you try the following?
1) Test Andrew's down_write patch for the rw-semaphore;
2) Apply the patch below and check the boot output. If the output contains
"[BUG]..address.", please map the address to a function name using System.map.
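
Mapping a printed %p address to a function name comes down to finding the
System.map entry with the greatest address that is not above the printed one.
A minimal userspace sketch, assuming a hypothetical in-memory table that is
already sorted by address (struct map_entry and lookup_symbol() are
illustrative names, not part of any kernel tool):

```c
#include <stddef.h>
#include <stdint.h>

/* One parsed System.map line: "<address> <type> <name>" (type omitted). */
struct map_entry {
	uint64_t addr;
	const char *name;
};

/*
 * Return the name of the entry with the greatest address <= addr, or NULL
 * if addr lies below the first entry.  The table is assumed to be sorted
 * by ascending address; sort it first if it is not.
 */
const char *lookup_symbol(const struct map_entry *tab, size_t n,
			  uint64_t addr)
{
	size_t lo = 0, hi = n;	/* search window is [lo, hi) */

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tab[mid].addr <= addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	/* lo is now the index of the first entry with an address > addr. */
	return lo ? tab[lo - 1].name : NULL;
}
```

An address that falls between two entries resolves to the earlier one, which
is the function the address lies in as long as the table covers it.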

--- linux-2.6.19/kernel/params.c	2006-12-08 15:32:49.000000000 +0800
+++ linux-2.6.19_work/kernel/params.c	2006-12-22 12:28:38.000000000 +0800
@@ -53,13 +53,22 @@ static int parse_one(char *param,
 		     int (*handle_unknown)(char *param, char *val))
 {
 	unsigned int i;
+	int result;
+	int irq_is_disabled;
 
 	/* Find parameter */
 	for (i = 0; i < num_params; i++) {
 		if (parameq(param, params[i].name)) {
 			DEBUGP("They are equal!  Calling %p\n",
 			       params[i].set);
-			return params[i].set(val, &params[i]);
+			irq_is_disabled = irqs_disabled();
+			result = params[i].set(val, &params[i]);
+			if (irq_is_disabled && !irqs_disabled())
+			{
+				printk("[BUG] parse_one: function %p enabled irq!\n",
+						params[i].set);
+			}
+			return result;
 		}
 	}
 
--- linux-2.6.19/init/main.c	2006-12-08 15:32:49.000000000 +0800
+++ linux-2.6.19_work/init/main.c	2006-12-22 12:28:50.000000000 +0800
@@ -181,6 +181,7 @@ static int __init obsolete_checksetup(ch
 {
 	struct obs_kernel_param *p;
 	int had_early_param = 0;
+	int result, irq_is_disabled;
 
 	p = __setup_start;
 	do {
@@ -197,8 +198,17 @@ static int __init obsolete_checksetup(ch
 				printk(KERN_WARNING "Parameter %s is obsolete,"
 				       " ignored\n", p->str);
 				return 1;
-			} else if (p->setup_func(line + n))
-				return 1;
+			} else {
+				irq_is_disabled = irqs_disabled();
+				result = p->setup_func(line + n);
+				if (irq_is_disabled && !irqs_disabled())
+				{
+					printk("[BUG] obsolete_checksetup: function %p enabled irq!\n",
+							p->setup_func);
+				}
+				if (result)
+					return 1;
+			}
 		}
 		p++;
 	} while (p < __setup_end);
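
Both hunks apply the same pattern: sample the interrupt state, call the
handler, and complain only if interrupts were off before the call and on
after it. A userspace sketch of that pattern, with a plain flag standing in
for the CPU interrupt state (sim_irqs_off, call_checked() and the two
handlers are hypothetical names for illustration, not kernel APIs):

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the CPU interrupt flag; true means interrupts are disabled. */
bool sim_irqs_off = true;

typedef int (*setup_fn)(const char *val);

/* A well-behaved handler: leaves the interrupt state alone. */
int quiet_handler(const char *val)
{
	(void)val;
	return 0;
}

/* A misbehaving handler: returns with interrupts enabled. */
int noisy_handler(const char *val)
{
	(void)val;
	sim_irqs_off = false;
	return 0;
}

/*
 * Call fn and report through *enabled_irq whether it turned interrupts on.
 * The handler's own return value is passed through unchanged.
 */
int call_checked(setup_fn fn, const char *val, bool *enabled_irq)
{
	bool was_off = sim_irqs_off;
	int result = fn(val);

	*enabled_irq = was_off && !sim_irqs_off;
	if (*enabled_irq)
		printf("[BUG] handler enabled irq!\n");
	return result;
}
```

Comparing before and after, rather than only checking the state afterwards,
keeps the message from firing for every later handler once one of them has
already enabled interrupts.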


* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-22  4:41 [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine Zhang, Yanmin
@ 2006-12-22  8:22 ` Ard -kwaak- van Breemen
  2006-12-22  8:30   ` Andrew Morton
  0 siblings, 1 reply; 42+ messages in thread
From: Ard -kwaak- van Breemen @ 2006-12-22  8:22 UTC
  To: Zhang, Yanmin
  Cc: Andrew Morton, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

Hello,
On Fri, Dec 22, 2006 at 12:41:46PM +0800, Zhang, Yanmin wrote:
> I think parse_args enables irqs when it calls the parameter callbacks.
> Could you try the following?
> 1) Test Andrew's down_write patch for the rw-semaphore;
> 2) Apply the patch below and check the boot output. If the output contains
> "[BUG]..address.", please map the address to a function name using System.map.
Without proof^H^H^H^H^Hpasting my dmesg and the "diff", I already
concluded that ide_setup was the culprit. (I've debugged
parse_one, and it barfed around the 3rd parameter, which is
hdb=noprobe.)
Anyway, a bad night of sleep reminds me that our EM64T boxes also
have this line (which actually is a remnant of our VA1220 boxes
;-) ), and they don't barf, so it must be either the combination
of sata_nv together with the pata driver part, *or* just the
pata driver part. (Our Opteron boxes with non-nForce chipsets also work.)

I will trace down the ide_setup today. First loads of coffee.

-- 
program signature;
begin  { telegraaf.com
} writeln("<ard@telegraafnet.nl> TEM2");
end
.


* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-22  8:22 ` Ard -kwaak- van Breemen
@ 2006-12-22  8:30   ` Andrew Morton
  2006-12-22  9:32     ` Stefano Takekawa
                       ` (3 more replies)
  0 siblings, 4 replies; 42+ messages in thread
From: Andrew Morton @ 2006-12-22  8:30 UTC
  To: Ard -kwaak- van Breemen
  Cc: Zhang, Yanmin, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

On Fri, 22 Dec 2006 09:22:48 +0100
Ard -kwaak- van Breemen <ard@telegraafnet.nl> wrote:

> Hello,
> On Fri, Dec 22, 2006 at 12:41:46PM +0800, Zhang, Yanmin wrote:
> > I think parse_args enables irqs when it calls the parameter callbacks.
> > Could you try the following?
> > 1) Test Andrew's down_write patch for the rw-semaphore;
> > 2) Apply the patch below and check the boot output. If the output contains
> > "[BUG]..address.", please map the address to a function name using System.map.
> Without proof^H^H^H^H^Hpasting my dmesg and the "diff", I already
> concluded that ide_setup was the culprit. (I've debugged
> parse_one, and it barfed around the 3rd parameter, which is
> hdb=noprobe.)
> Anyway, a bad night of sleep reminds me that our EM64T boxes also
> have this line (which actually is a remnant of our VA1220 boxes
> ;-) ), and they don't barf, so it must be either the combination
> of sata_nv together with the pata driver part, *or* just the
> pata driver part. (Our Opteron boxes with non-nForce chipsets also work.)
> 

I expect that you'll find that the ide code ends up doing
down_write(pci_bus_sem), which will enable interrupts.

(We don't know which interrupt is pending this early - that'd be
interesting to find out, but we shouldn't be enabling interrupts in there).

To whom do I have to pay how much to get this darn patch tested?



--- a/lib/rwsem-spinlock.c~down_write-preserve-local-irqs
+++ a/lib/rwsem-spinlock.c
@@ -195,13 +195,14 @@ void fastcall __sched __down_write_neste
 {
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk;
+	unsigned long flags;
 
-	spin_lock_irq(&sem->wait_lock);
+	spin_lock_irqsave(&sem->wait_lock, flags);
 
 	if (sem->activity == 0 && list_empty(&sem->wait_list)) {
 		/* granted */
 		sem->activity = -1;
-		spin_unlock_irq(&sem->wait_lock);
+		spin_unlock_irqrestore(&sem->wait_lock, flags);
 		goto out;
 	}
 
@@ -216,7 +217,7 @@ void fastcall __sched __down_write_neste
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we don't need to touch the semaphore struct anymore */
-	spin_unlock_irq(&sem->wait_lock);
+	spin_unlock_irqrestore(&sem->wait_lock, flags);
 
 	/* wait to be given the lock */
 	for (;;) {
_
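
The difference the patch relies on: spin_unlock_irq() turns interrupts on
unconditionally, whereas spin_unlock_irqrestore() puts back whatever state
spin_lock_irqsave() recorded. A userspace model of just that flag handling,
with no real locking involved (all sim_* and state_after_* names are
hypothetical stand-ins, not the kernel primitives):

```c
#include <stdbool.h>

/* Stand-in for the CPU interrupt-enable flag. */
bool sim_irqs_enabled;

/* Model of spin_lock_irq(): disable interrupts, remember nothing. */
void sim_lock_irq(void)
{
	sim_irqs_enabled = false;
}

/* Model of spin_unlock_irq(): enable interrupts unconditionally. */
void sim_unlock_irq(void)
{
	sim_irqs_enabled = true;
}

/* Model of spin_lock_irqsave(): record the old state, then disable. */
void sim_lock_irqsave(bool *flags)
{
	*flags = sim_irqs_enabled;
	sim_irqs_enabled = false;
}

/* Model of spin_unlock_irqrestore(): restore the recorded state. */
void sim_unlock_irqrestore(bool flags)
{
	sim_irqs_enabled = flags;
}

/* Interrupt state after a lock/unlock pair using the _irq variant. */
bool state_after_irq_pair(bool enabled_before)
{
	sim_irqs_enabled = enabled_before;
	sim_lock_irq();
	sim_unlock_irq();
	return sim_irqs_enabled;
}

/* Interrupt state after a pair using the _irqsave/_irqrestore variant. */
bool state_after_irqsave_pair(bool enabled_before)
{
	bool flags;

	sim_irqs_enabled = enabled_before;
	sim_lock_irqsave(&flags);
	sim_unlock_irqrestore(flags);
	return sim_irqs_enabled;
}
```

Entered with interrupts off, as early in start_kernel(), the first pair
leaves them on and the second leaves them off.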



* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-22  8:30   ` Andrew Morton
@ 2006-12-22  9:32     ` Stefano Takekawa
  2006-12-22  9:43       ` Andrew Morton
  2006-12-22 10:30     ` Ard -kwaak- van Breemen
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 42+ messages in thread
From: Stefano Takekawa @ 2006-12-22  9:32 UTC
  To: Andrew Morton
  Cc: Ard -kwaak- van Breemen, Zhang, Yanmin, Chuck Ebbert, Yinghai Lu,
	agalanin, linux-kernel, bugme-daemon, Eric W. Biederman

On Fri, 22/12/2006 at 00.30 -0800, Andrew Morton wrote:
> On Fri, 22 Dec 2006 09:22:48 +0100
> Ard -kwaak- van Breemen <ard@telegraafnet.nl> wrote:
> 
> > Hello,
> > On Fri, Dec 22, 2006 at 12:41:46PM +0800, Zhang, Yanmin wrote:
> > > I think parse_args enables irqs when it calls the parameter callbacks.
> > > Could you try the following?
> > > 1) Test Andrew's down_write patch for the rw-semaphore;
> > > 2) Apply the patch below and check the boot output. If the output contains
> > > "[BUG]..address.", please map the address to a function name using System.map.
> > Without proof^H^H^H^H^Hpasting my dmesg and the "diff", I already
> > concluded that ide_setup was the culprit. (I've debugged
> > parse_one, and it barfed around the 3rd parameter, which is
> > hdb=noprobe.)
> > Anyway, a bad night of sleep reminds me that our EM64T boxes also
> > have this line (which actually is a remnant of our VA1220 boxes
> > ;-) ), and they don't barf, so it must be either the combination
> > of sata_nv together with the pata driver part, *or* just the
> > pata driver part. (Our Opteron boxes with non-nForce chipsets also work.)
> > 
> 
> I expect that you'll find that the ide code ends up doing
> down_write(pci_bus_sem), which will enable interrupts.
> 
> (We don't know which interrupt is pending this early - that'd be
> interesting to find out, but we shouldn't be enabling interrupts in there).
> 
> To whom do I have to pay how much to get this darn patch tested?
> 
> 
> 
> --- a/lib/rwsem-spinlock.c~down_write-preserve-local-irqs
> +++ a/lib/rwsem-spinlock.c
> @@ -195,13 +195,14 @@ void fastcall __sched __down_write_neste
>  {
>  	struct rwsem_waiter waiter;
>  	struct task_struct *tsk;
> +	unsigned long flags;
>  
> -	spin_lock_irq(&sem->wait_lock);
> +	spin_lock_irqsave(&sem->wait_lock, flags);
>  
>  	if (sem->activity == 0 && list_empty(&sem->wait_list)) {
>  		/* granted */
>  		sem->activity = -1;
> -		spin_unlock_irq(&sem->wait_lock);
> +		spin_unlock_irqrestore(&sem->wait_lock, flags);
>  		goto out;
>  	}
>  
> @@ -216,7 +217,7 @@ void fastcall __sched __down_write_neste
>  	list_add_tail(&waiter.list, &sem->wait_list);
>  
>  	/* we don't need to touch the semaphore struct anymore */
> -	spin_unlock_irq(&sem->wait_lock);
> +	spin_unlock_irqrestore(&sem->wait_lock, flags);
>  
>  	/* wait to be given the lock */
>  	for (;;) {
> _
> 
Applied to 2.6.19 it doesn't change anything. It still panics.

How can I have something similar to a serial console on a laptop without
serial port but with a parallel one? Will netconsole work?



-- 
Stefano Takekawa
take@libero.it

Frank:  And why do days get longer in the summer?
Ernest: Because heat makes things expand!




* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-22  9:32     ` Stefano Takekawa
@ 2006-12-22  9:43       ` Andrew Morton
  2006-12-22 13:23         ` Stefano Takekawa
  0 siblings, 1 reply; 42+ messages in thread
From: Andrew Morton @ 2006-12-22  9:43 UTC
  To: Stefano Takekawa
  Cc: Ard -kwaak- van Breemen, Zhang, Yanmin, Chuck Ebbert, Yinghai Lu,
	agalanin, linux-kernel, bugme-daemon, Eric W. Biederman

On Fri, 22 Dec 2006 10:32:51 +0100
Stefano Takekawa <take@libero.it> wrote:

> Applied to 2.6.19 it doesn't change anything. It still panics.

Really?

And you can confirm that converting pci_bus_sem back into a spinlock fixes
it?

> How can I have something similar to a serial console on a laptop without
> serial port but with a parallel one? Will netconsole work?
> 

No, netconsole isn't available for quite some time after the kernel starts.

Your best bet would be to boot with `earlyprintk=vga vga=N', where N is
something which gives lots of rows.  0F01, perhaps.

Then, take a digital photo of the display.


* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-22  8:30   ` Andrew Morton
  2006-12-22  9:32     ` Stefano Takekawa
@ 2006-12-22 10:30     ` Ard -kwaak- van Breemen
  2006-12-22 14:00       ` Ard -kwaak- van Breemen
  2006-12-22 14:35     ` Ard -kwaak- van Breemen
  2006-12-22 14:41     ` Ard -kwaak- van Breemen
  3 siblings, 1 reply; 42+ messages in thread
From: Ard -kwaak- van Breemen @ 2006-12-22 10:30 UTC
  To: Andrew Morton
  Cc: Zhang, Yanmin, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

Hello,
On Fri, Dec 22, 2006 at 12:30:29AM -0800, Andrew Morton wrote:
> To whom do I have to pay how much to get this darn patch tested?
I've already tested that (as I said somewhere in the bugzilla so
it probably got lost somehow :-) ): It doesn't solve the booting
problem, and I really don't have an idea what it does, nor does
it output any debug code. So I left it at: doesn't fix ;-).

Anyway: on to the ide_setup tracking....
(I've noticed that the reporter of this problem also has idebus=66
or something similar, so that explains the early call to
ide_setup in his case.)

-- 
program signature;
begin  { telegraaf.com
} writeln("<ard@telegraafnet.nl> TEM2");
end
.


* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-22  9:43       ` Andrew Morton
@ 2006-12-22 13:23         ` Stefano Takekawa
  0 siblings, 0 replies; 42+ messages in thread
From: Stefano Takekawa @ 2006-12-22 13:23 UTC
  To: Andrew Morton
  Cc: Ard -kwaak- van Breemen, Zhang, Yanmin, Chuck Ebbert, Yinghai Lu,
	agalanin, linux-kernel, bugme-daemon, Eric W. Biederman

On Fri, 22/12/2006 at 01.43 -0800, Andrew Morton wrote:
> On Fri, 22 Dec 2006 10:32:51 +0100
> Stefano Takekawa <take@libero.it> wrote:
> 
> > Applied to 2.6.19 it doesn't change anything. It still panics.
> 
> Really?
> 
> And you can confirm that converting pci_bus_sem back into a spinlock fixes
> it?
> 
> > How can I have something similar to a serial console on a laptop without
> > serial port but with a parallel one? Will netconsole work?
> > 
> 
> No, netconsole isn't available for quite some time after the kernel starts.
> 
> Your best bet would be to boot with `earlyprintk=vga vga=N', where N is
> something which gives lots of rows.  0F01, perhaps.
> 
> Then, take a digital photo of the display.

I can't take any digital photo. Well I got this:
2.6.19 + lib/rwsem-spinlock.c patched + hdc=ide-cd or idebus=66 >> panic
2.6.19 + lib/rwsem-spinlock.c patched + no ide_setup calls >> works!!!
2.6.19 + spinlock reversed >> always works

-- 
Stefano Takekawa
take@libero.it

Frank:  And why do days get longer in the summer?
Ernest: Because heat makes things expand!




* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-22 10:30     ` Ard -kwaak- van Breemen
@ 2006-12-22 14:00       ` Ard -kwaak- van Breemen
  2006-12-22 14:16         ` Ard -kwaak- van Breemen
  0 siblings, 1 reply; 42+ messages in thread
From: Ard -kwaak- van Breemen @ 2006-12-22 14:00 UTC
  To: Andrew Morton
  Cc: Zhang, Yanmin, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

On Fri, Dec 22, 2006 at 11:30:05AM +0100, Ard -kwaak- van Breemen wrote:
> Anyway: on to the ide_setup tracking....
> (I've noticed that the reporter of this problem also has idebus=66
> or something similar, so that explains the early call to
> ide_setup in his case.)

Aaarrgh...
Somewhere between the call to ide_setup and ide_init_hwif_ports
the interrupts get enabled. But I haven't got to the point where
exactly...

include/linux/ide.h:
    266 /*
    267  * ide_init_hwif_ports() is OBSOLETE and will be removed in 2.7 series.
    268  * New ports shouldn't define IDE_ARCH_OBSOLETE_INIT in <asm/ide.h>.
    269  */
    270 #ifdef IDE_ARCH_OBSOLETE_INIT
    271 static inline void ide_init_hwif_ports(hw_regs_t *hw,
    272                                        unsigned long io_addr,
    273                                        unsigned long ctl_addr,
    274                                        int *irq)
    275 {
    276         if (!irqs_disabled()) printk(__FILE__ "%s(): blaat: interrupts were enabled early@%d\n",__FUNCTION__,__LINE__);
    277         if (!ctl_addr) {
    278                 ide_std_init_ports(hw, io_addr, ide_default_io_ctl(io_addr));

drivers/ide/ide.c:
    256 static void init_hwif_default(ide_hwif_t *hwif, unsigned int index)
    257 {
    258         hw_regs_t hw;
    259 
    260         if (!irqs_disabled()) printk(__FILE__ "%s(): blaat: interrupts were enabled early@%d\n",__FUNCTION__,__LINE__);
    261         memset(&hw, 0, sizeof(hw_regs_t));
    262         if (!irqs_disabled()) printk(__FILE__ "%s(): blaat: interrupts were enabled early@%d\n",__FUNCTION__,__LINE__);
    263 
    264         ide_init_hwif_ports(&hw, ide_default_io_base(index), 0, &hwif->irq);
    265         if (!irqs_disabled()) printk(__FILE__ " %s(): blaat: interrupts were enabled early@%d\n",__FUNCTION__,__LINE__);
    266 

dmesg:
BLAAT20Parsing ARGS: console=tty0 console=ttyS0,115200 hdb=noprobe hdc=noprobe hdd=noprobe root=/dev/md0 ro panic=30 earlyprintk=serial,ttyS0,115200
Unknown argument: calling ffffffff80643380
Unknown argument: calling ffffffff80643380
Unknown argument: calling ffffffff80643380
ide_setup: hdb=noprobeinclude/linux/ide.hide_init_hwif_ports(): blaat: interrupts were enabled early@276
include/linux/ide.hide_init_hwif_ports(): blaat: interrupts were enabled early@279
include/linux/ide.hide_init_hwif_ports(): blaat: interrupts were enabled early@284
include/linux/ide.hide_init_hwif_ports(): blaat: interrupts were enabled early@289
drivers/ide/ide.c init_hwif_default(): blaat: interrupts were enabled early@265
drivers/ide/ide.cinit_hwif_default(): blaat: interrupts were enabled early@269
drivers/ide/ide.cinit_ide_data(): blaat: interrupts were enabled early@317

So as I read it: init_hwif_default calls ide_init_hwif_ports with irqs
disabled, but upon entry, the irqs are enabled.
That really makes no sense to me.
So I will continue digging through this code (there must be something
recursive going on).
-- 
program signature;
begin  { telegraaf.com
} writeln("<ard@telegraafnet.nl> TEM2");
end
.


* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-22 14:00       ` Ard -kwaak- van Breemen
@ 2006-12-22 14:16         ` Ard -kwaak- van Breemen
  2006-12-22 19:10           ` Andrew Morton
  0 siblings, 1 reply; 42+ messages in thread
From: Ard -kwaak- van Breemen @ 2006-12-22 14:16 UTC
  To: Andrew Morton
  Cc: Zhang, Yanmin, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

On Fri, Dec 22, 2006 at 03:00:59PM +0100, Ard -kwaak- van Breemen wrote:
>     262         if (!irqs_disabled()) printk(__FILE__ "%s(): blaat: interrupts were enabled early@%d\n",__FUNCTION__,__LINE__);
>     263 
>     264         ide_init_hwif_ports(&hw, ide_default_io_base(index), 0, &hwif->irq);
------------------------------------------^^^^^^^^^^^^^^^^^^^^^^^^^^^
which does a         if (pci_find_device(PCI_ANY_ID, PCI_ANY_ID,
in include/asm-i386/ide.h

which really should be the part that does the irq enabling.

-- 
program signature;
begin  { telegraaf.com
} writeln("<ard@telegraafnet.nl> TEM2");
end
.


* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-22  8:30   ` Andrew Morton
  2006-12-22  9:32     ` Stefano Takekawa
  2006-12-22 10:30     ` Ard -kwaak- van Breemen
@ 2006-12-22 14:35     ` Ard -kwaak- van Breemen
  2006-12-29 15:08       ` Ard -kwaak- van Breemen
  2006-12-22 14:41     ` Ard -kwaak- van Breemen
  3 siblings, 1 reply; 42+ messages in thread
From: Ard -kwaak- van Breemen @ 2006-12-22 14:35 UTC
  To: Andrew Morton
  Cc: Zhang, Yanmin, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

On Fri, Dec 22, 2006 at 12:30:29AM -0800, Andrew Morton wrote:
> I expect that you'll find that the ide code ends up doing
> down_write(pci_bus_sem), which will enable interrupts.
Will down_read(&pci_bus_sem) also enable interrupts?
I ask because that is what gets called:
init/main.c         start_kernel
kernel/params.c      parse_args("Booting kernel"
kernel/params.c       parse_one
drivers/ide/ide.c      ide_setup
drivers/ide/ide.c       init_ide_data
drivers/ide/ide.c        init_hwif_default
include/asm-i386/ide.h    ide_default_io_base(index)
drivers/pci/search.c       pci_find_device
drivers/pci/search.c        pci_find_subsys
                             down_read(&pci_bus_sem);
                 

-- 
program signature;
begin  { telegraaf.com
} writeln("<ard@telegraafnet.nl> TEM2");
end
.


* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-22  8:30   ` Andrew Morton
                       ` (2 preceding siblings ...)
  2006-12-22 14:35     ` Ard -kwaak- van Breemen
@ 2006-12-22 14:41     ` Ard -kwaak- van Breemen
  2006-12-22 15:42       ` Ard -kwaak- van Breemen
  3 siblings, 1 reply; 42+ messages in thread
From: Ard -kwaak- van Breemen @ 2006-12-22 14:41 UTC
  To: Andrew Morton
  Cc: Zhang, Yanmin, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

On Fri, Dec 22, 2006 at 12:30:29AM -0800, Andrew Morton wrote:
> To whom do I have to pay how much to get this darn patch tested?
I've altered your patch to do the spin_lock_irqsave in down_read.
I am very ignorant and stupid. That's why I am doing it without
thinking about whether the irqsave is ok in that region or not.

And the results are:
include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were disabled@49
include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were disabled@51
include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were disabled@59

Meaning: it works.

Repeating: I am very stupid, so I don't know if saving the irq state is ok or
not in down_read.
-- 
program signature;
begin  { telegraaf.com
} writeln("<ard@telegraafnet.nl> TEM2");
end
.


* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-22 14:41     ` Ard -kwaak- van Breemen
@ 2006-12-22 15:42       ` Ard -kwaak- van Breemen
  2006-12-28 23:51         ` Andrew Morton
  0 siblings, 1 reply; 42+ messages in thread
From: Ard -kwaak- van Breemen @ 2006-12-22 15:42 UTC
  To: Andrew Morton
  Cc: Zhang, Yanmin, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

[-- Attachment #1: Type: text/plain, Size: 431 bytes --]

On Fri, Dec 22, 2006 at 03:41:34PM +0100, Ard -kwaak- van Breemen wrote:
> Repeating: I am very stupid, so I don't know if saving the irq state is ok or
> not in down_read.
Andrew Morton's patch, rewritten for down_read, makes the
symptoms go away.

The problem obviously is that ide_setup pokes the PCI
subsystem way too early.
Parsing of the ide parameters should be delayed until the next
run of parse_args, I guess.

[-- Attachment #2: rwsem-spinlock.patch --]
[-- Type: text/plain, Size: 809 bytes --]

--- linux-2.6.19.1/lib/rwsem-spinlock.c	2006-12-11 19:32:53.000000000 +0000
+++ linux-2.6.19/lib/rwsem-spinlock.c	2006-12-22 15:06:52.000000000 +0000
@@ -129,13 +129,14 @@
 {
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk;
+	unsigned long flags;
 
-	spin_lock_irq(&sem->wait_lock);
+	spin_lock_irqsave(&sem->wait_lock, flags); 
 
 	if (sem->activity >= 0 && list_empty(&sem->wait_list)) {
 		/* granted */
 		sem->activity++;
-		spin_unlock_irq(&sem->wait_lock);
+		spin_unlock_irqrestore(&sem->wait_lock, flags); 
 		goto out;
 	}
 
@@ -150,7 +151,7 @@
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we don't need to touch the semaphore struct anymore */
-	spin_unlock_irq(&sem->wait_lock);
+	spin_unlock_irqrestore(&sem->wait_lock, flags); 
 
 	/* wait to be given the lock */
 	for (;;) {
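
The suggestion to delay the ide parameters can be pictured as two parsing
passes over the same command line, where a parameter whose handler needs
later infrastructure is skipped in the first pass. A rough userspace sketch,
assuming a hypothetical table with an "early" flag per parameter (struct
boot_param and count_handled() are illustrative only and are not the
kernel's own parameter machinery):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor: name prefix plus the pass it belongs to. */
struct boot_param {
	const char *prefix;	/* e.g. "console=" */
	bool early;		/* safe to handle before core subsystems are up */
};

/*
 * Count the arguments in argv[0..argc) that would be handled in the given
 * pass.  An argument matches the first table entry whose prefix it starts
 * with; arguments that match nothing are ignored.
 */
size_t count_handled(const struct boot_param *tab, size_t ntab,
		     const char *const *argv, size_t argc, bool early_pass)
{
	size_t handled = 0;

	for (size_t i = 0; i < argc; i++) {
		for (size_t j = 0; j < ntab; j++) {
			size_t len = strlen(tab[j].prefix);

			if (strncmp(argv[i], tab[j].prefix, len) != 0)
				continue;
			if (tab[j].early == early_pass)
				handled++;
			break;	/* first match wins */
		}
	}
	return handled;
}
```

With "hdb=" and friends marked as not early, the first pass would leave them
alone and only the later pass would reach their handler.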


* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-22 14:16         ` Ard -kwaak- van Breemen
@ 2006-12-22 19:10           ` Andrew Morton
  0 siblings, 0 replies; 42+ messages in thread
From: Andrew Morton @ 2006-12-22 19:10 UTC
  To: Ard -kwaak- van Breemen
  Cc: Zhang, Yanmin, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

On Fri, 22 Dec 2006 15:16:55 +0100
Ard -kwaak- van Breemen <ard@telegraafnet.nl> wrote:

> On Fri, Dec 22, 2006 at 03:00:59PM +0100, Ard -kwaak- van Breemen wrote:
> >     262         if (!irqs_disabled()) printk(__FILE__ "%s(): blaat: interrupts were enabled early@%d\n",__FUNCTION__,__LINE__);
> >     263 
> >     264         ide_init_hwif_ports(&hw, ide_default_io_base(index), 0, &hwif->irq);
> ------------------------------------------^^^^^^^^^^^^^^^^^^^^^^^^^^^
> which does a         if (pci_find_device(PCI_ANY_ID, PCI_ANY_ID,
> in include/asm-i386/ide.h
> 
> which really should be the part that does the irq enabling.

doh, I missed down_read():

--- a/lib/rwsem-spinlock.c~down_write-preserve-local-irqs
+++ a/lib/rwsem-spinlock.c
@@ -129,13 +129,14 @@ void fastcall __sched __down_read(struct
 {
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk;
+	unsigned long flags;
 
-	spin_lock_irq(&sem->wait_lock);
+	spin_lock_irqsave(&sem->wait_lock, flags);
 
 	if (sem->activity >= 0 && list_empty(&sem->wait_list)) {
 		/* granted */
 		sem->activity++;
-		spin_unlock_irq(&sem->wait_lock);
+		spin_unlock_irqrestore(&sem->wait_lock, flags);
 		goto out;
 	}
 
@@ -150,7 +151,7 @@ void fastcall __sched __down_read(struct
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we don't need to touch the semaphore struct anymore */
-	spin_unlock_irq(&sem->wait_lock);
+	spin_unlock_irqrestore(&sem->wait_lock, flags);
 
 	/* wait to be given the lock */
 	for (;;) {
@@ -195,13 +196,14 @@ void fastcall __sched __down_write_neste
 {
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk;
+	unsigned long flags;
 
-	spin_lock_irq(&sem->wait_lock);
+	spin_lock_irqsave(&sem->wait_lock, flags);
 
 	if (sem->activity == 0 && list_empty(&sem->wait_list)) {
 		/* granted */
 		sem->activity = -1;
-		spin_unlock_irq(&sem->wait_lock);
+		spin_unlock_irqrestore(&sem->wait_lock, flags);
 		goto out;
 	}
 
@@ -216,7 +218,7 @@ void fastcall __sched __down_write_neste
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we don't need to touch the semaphore struct anymore */
-	spin_unlock_irq(&sem->wait_lock);
+	spin_unlock_irqrestore(&sem->wait_lock, flags);
 
 	/* wait to be given the lock */
 	for (;;) {
_



* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-22 15:42       ` Ard -kwaak- van Breemen
@ 2006-12-28 23:51         ` Andrew Morton
  2006-12-29 10:18           ` Stefano Takekawa
  2006-12-29 12:51           ` Ard -kwaak- van Breemen
  0 siblings, 2 replies; 42+ messages in thread
From: Andrew Morton @ 2006-12-28 23:51 UTC
  To: Ard -kwaak- van Breemen, Greg KH
  Cc: Zhang, Yanmin, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman


Could someone please test this?


From: Andrew Morton <akpm@osdl.org>

Various people have reported machines failing to boot since pci_bus_sem was
switched from a spinlock to an rwsem.

The reason for this is that these people had "ide=" on the kernel commandline,
and ide_setup() can end up calling PCI functions which do
down_read(&pci_bus_sem).


Ard has worked out the call tree:

init/main.c         start_kernel
kernel/params.c      parse_args("Booting kernel"
kernel/params.c       parse_one
drivers/ide/ide.c      ide_setup
drivers/ide/ide.c       init_ide_data
drivers/ide/ide.c        init_hwif_default
include/asm-i386/ide.h    ide_default_io_base(index)
drivers/pci/search.c       pci_find_device
drivers/pci/search.c        pci_find_subsys
                             down_read(&pci_bus_sem);


down_read() will unconditionally enable interrupts and some early interrupt
(source unknown) comes in and whacks the machine, apparently because the LDT
isn't set up yet.

Fix that by avoiding taking the semaphore in the PCI code in this situation.

Cc: Ard -kwaak- van Breemen <ard@telegraafnet.nl>
Cc: "Zhang, Yanmin" <yanmin.zhang@intel.com>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Yinghai Lu <yinghai.lu@amd.com>
Cc: <take@libero.it>
Cc: <agalanin@mera.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 drivers/pci/search.c |   10 ++++++++++
 1 files changed, 10 insertions(+)

diff -puN drivers/pci/search.c~pci-avoid-taking-pci_bus_sem-early-in-boot drivers/pci/search.c
--- a/drivers/pci/search.c~pci-avoid-taking-pci_bus_sem-early-in-boot
+++ a/drivers/pci/search.c
@@ -259,6 +259,16 @@ pci_get_subsys(unsigned int vendor, unsi
 	struct pci_dev *dev;
 
 	WARN_ON(in_interrupt());
+
+	/*
+	 * pci_get_subsys() can be called on the ide_setup() path, super-early
+	 * in boot.  But the down_read() will enable local interrupts, which
+	 * can cause some machines to crash.  So here we detect that situation
+	 * and bail out early.
+	 */
+	if (unlikely(list_empty(pci_devices)))
+		return NULL;
+
 	down_read(&pci_bus_sem);
 	n = from ? from->global_list.next : pci_devices.next;
 
_



* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-28 23:51         ` Andrew Morton
@ 2006-12-29 10:18           ` Stefano Takekawa
  2006-12-29 12:51           ` Ard -kwaak- van Breemen
  1 sibling, 0 replies; 42+ messages in thread
From: Stefano Takekawa @ 2006-12-29 10:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ard -kwaak- van Breemen, Greg KH, Zhang, Yanmin, Chuck Ebbert,
	Yinghai Lu, agalanin, linux-kernel, bugme-daemon,
	Eric W. Biederman

On Thu, 28/12/2006 at 15.51 -0800, Andrew Morton wrote:
> Could someone please test this?
> diff -puN drivers/pci/search.c~pci-avoid-taking-pci_bus_sem-early-in-boot drivers/pci/search.c
> --- a/drivers/pci/search.c~pci-avoid-taking-pci_bus_sem-early-in-boot
> +++ a/drivers/pci/search.c
> @@ -259,6 +259,16 @@ pci_get_subsys(unsigned int vendor, unsi
>  	struct pci_dev *dev;
>  
>  	WARN_ON(in_interrupt());
> +
> +	/*
> +	 * pci_get_subsys() can be called on the ide_setup() path, super-early
> +	 * in boot.  But the down_read() will enable local interrupts, which
> +	 * can cause some machines to crash.  So here we detect that situation
> +	 * and bail out early.
> +	 */
> +	if (unlikely(list_empty(pci_devices)))
> +		return NULL;
> +
>  	down_read(&pci_bus_sem);
>  	n = from ? from->global_list.next : pci_devices.next;
>  
> _
> 
Applied to 2.6.19, it fails to compile:

CC      drivers/pci/search.o
drivers/pci/search.c: In function ‘pci_get_subsys’:
drivers/pci/search.c:269: error: incompatible type for argument 1 of
‘list_empty’
make[2]: *** [drivers/pci/search.o] Error 1
make[1]: *** [drivers/pci] Error 2
make: *** [drivers] Error 2

drivers/pci/search.c
268       */
269        if (unlikely(list_empty(pci_devices)))
270               return NULL;
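
The error above comes from passing the list head itself where
list_empty() expects a pointer to it. A minimal user-space sketch of
that, assuming a simplified stand-in for the kernel's struct list_head
(this is not the real <linux/list.h>, and pci_list_is_empty() is a
hypothetical helper used only for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's list type (assumption). */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty circular list points back at itself. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }

/* Like the kernel helper, this takes a POINTER to the list head. */
static int list_empty(const struct list_head *head)
{
	assert(head != NULL);
	return head->next == head;
}

/* Insert a new entry right after the head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Hypothetical stand-in for the global device list in drivers/pci. */
static struct list_head pci_devices = LIST_HEAD_INIT(pci_devices);

/* Hypothetical helper: list_empty(pci_devices) would pass a struct by
 * value and fail to compile, so the address-of operator is needed. */
static int pci_list_is_empty(void)
{
	return list_empty(&pci_devices);
}
```

With this shape, the check in the patch would presumably need to read
list_empty(&pci_devices).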


-- 
Stefano Takekawa
take@libero.it

Frank:  And why do days get longer in the summer?
Ernest: Because heat makes things expand!




* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-28 23:51         ` Andrew Morton
  2006-12-29 10:18           ` Stefano Takekawa
@ 2006-12-29 12:51           ` Ard -kwaak- van Breemen
  2006-12-29 13:27             ` Ard -kwaak- van Breemen
  1 sibling, 1 reply; 42+ messages in thread
From: Ard -kwaak- van Breemen @ 2006-12-29 12:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Greg KH, Zhang, Yanmin, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

Hello Andrew,
On Thu, Dec 28, 2006 at 03:51:48PM -0800, Andrew Morton wrote:
> Could someone please test this?
Without testing I declare it won't fix it 8-D
> Ard has worked out the call tree:
> 
> init/main.c         start_kernel
> kernel/params.c      parse_args("Booting kernel"
> kernel/params.c       parse_one
> drivers/ide/ide.c      ide_setup
> drivers/ide/ide.c       init_ide_data
> drivers/ide/ide.c        init_hwif_default
> include/asm-i386/ide.h    ide_default_io_base(index)
> drivers/pci/search.c       pci_find_device
> drivers/pci/search.c        pci_find_subsys
------------------------------^^^^^^^^^^^^^^
Your patch patches pci_get_subsys, while pci_find_subsys does the
down_read...
I will try it on the right function, and see what we get.

Regards,
Ard

-- 
program signature;
begin  { telegraaf.com
} writeln("<ard@telegraafnet.nl> TEM2");
end
.


* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-29 12:51           ` Ard -kwaak- van Breemen
@ 2006-12-29 13:27             ` Ard -kwaak- van Breemen
  2006-12-29 14:10               ` Ard -kwaak- van Breemen
  0 siblings, 1 reply; 42+ messages in thread
From: Ard -kwaak- van Breemen @ 2006-12-29 13:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Greg KH, Zhang, Yanmin, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

On Fri, Dec 29, 2006 at 01:51:08PM +0100, Ard -kwaak- van Breemen wrote:
> I will try it on the right function, and see what we get.

In function: 186 static struct pci_dev * pci_find_subsys(unsigned
int vendor,

203        if (unlikely(list_empty(&pci_devices))) {
204                 printk("Pci device list empty, preventing down_read\n");
205                return NULL;
206         }

delivers:
ard@supergirl:~$ sudo grep -C1 'Pci device list empty' /var/log/kern.log
Dec 29 14:17:47 localhost kernel: include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were disabled@49
Dec 29 14:17:47 localhost kernel: Pci device list empty, preventing down_read
Dec 29 14:17:47 localhost kernel: include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were disabled@51
--
Dec 29 14:17:47 localhost kernel: include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were disabled@49
Dec 29 14:17:47 localhost kernel: Pci device list empty, preventing down_read
Dec 29 14:17:47 localhost kernel: include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were disabled@51
--
Dec 29 14:17:47 localhost kernel: include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were disabled@49
Dec 29 14:17:47 localhost kernel: Pci device list empty, preventing down_read
Dec 29 14:17:47 localhost kernel: include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were disabled@51
--
Dec 29 14:17:47 localhost kernel: include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were disabled@49
Dec 29 14:17:47 localhost kernel: Pci device list empty, preventing down_read
Dec 29 14:17:47 localhost kernel: include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were disabled@51
--
Dec 29 14:17:47 localhost kernel: include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were disabled@49
Dec 29 14:17:47 localhost kernel: Pci device list empty, preventing down_read
Dec 29 14:17:47 localhost kernel: include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were disabled@51
--
Dec 29 14:17:47 localhost kernel: include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were disabled@49
Dec 29 14:17:47 localhost kernel: Pci device list empty, preventing down_read
Dec 29 14:17:47 localhost kernel: include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were disabled@51
--
Dec 29 14:17:47 localhost kernel: include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were disabled@49
Dec 29 14:17:47 localhost kernel: Pci device list empty, preventing down_read
Dec 29 14:17:47 localhost kernel: include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were disabled@51
--
Dec 29 14:17:47 localhost kernel: include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were disabled@49
Dec 29 14:17:47 localhost kernel: Pci device list empty, preventing down_read
Dec 29 14:17:47 localhost kernel: include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were disabled@51
--
Dec 29 14:17:47 localhost kernel: include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were disabled@49
Dec 29 14:17:47 localhost kernel: Pci device list empty, preventing down_read
Dec 29 14:17:47 localhost kernel: include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were disabled@51
--
Dec 29 14:17:47 localhost kernel: include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were disabled@49
Dec 29 14:17:47 localhost kernel: Pci device list empty, preventing down_read
Dec 29 14:17:47 localhost kernel: include/asm-i386/ide.h ide_default_io_base(): blaat: interrupts were disabled@51

I don't see any other warnings, so I guess the patch is working now :-).


I will clean up the patches found on this list to fix and detect this.

program signature;
begin  { telegraaf.com
} writeln("<ard@telegraafnet.nl> TEM2");
end
.


* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-29 13:27             ` Ard -kwaak- van Breemen
@ 2006-12-29 14:10               ` Ard -kwaak- van Breemen
  2006-12-29 15:01                 ` Ard -kwaak- van Breemen
  0 siblings, 1 reply; 42+ messages in thread
From: Ard -kwaak- van Breemen @ 2006-12-29 14:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Greg KH, Zhang, Yanmin, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

[-- Attachment #1: Type: text/plain, Size: 683 bytes --]

On Fri, Dec 29, 2006 at 02:27:59PM +0100, Ard -kwaak- van Breemen wrote:
> I will clean up the patches found on this list to fix and detect this.

Preliminary patches:
- pci fix of Andrews patches
- parse-one detection of Yanmin
- start_kernel detection and workaround (disable them again)

These are the patches that I am about to test in the next 2
hours... :-)
Anyway: I think it is possible that other drivers are also
potential irq enablers as soon as they are called from parse_one.
Usually I compile network drivers as modules, but in diskless
setups this might not be the case :-).
-- 
program signature;
begin  { telegraaf.com
} writeln("<ard@telegraafnet.nl> TEM2");
end
.

[-- Attachment #2: pci-prevent-downread.patch --]
[-- Type: text/plain, Size: 1292 bytes --]

--- linux-2.6.19.vanilla/drivers/pci/search.c	2006-11-29 21:57:37.000000000 +0000
+++ linux-2.6.19/drivers/pci/search.c	2006-12-29 13:58:51.000000000 +0000
@@ -193,6 +193,17 @@
 	struct pci_dev *dev;
 
 	WARN_ON(in_interrupt());
+
+	/*
+	 * pci_find_subsys() can be called on the ide_setup() path, super-early
+	 * in boot.  But the down_read() will enable local interrupts, which
+	 * can cause some machines to crash.  So here we detect and flag that
+	 * situation and bail out early.
+	 */
+	if(unlikely(list_empty(&pci_devices))) {
+		printk(KERN_INFO "pci_find_subsys() called while pci_devices is still empty\n");
+		return NULL;
+	}
 	down_read(&pci_bus_sem);
 	n = from ? from->global_list.next : pci_devices.next;
 
@@ -259,6 +270,16 @@
 	struct pci_dev *dev;
 
 	WARN_ON(in_interrupt());
+	/*
+	 * pci_get_subsys() can potentially be called by drivers super-early
+	 * in boot.  But the down_read() will enable local interrupts, which
+	 * can cause some machines to crash.  So here we detect and flag that
+	 * situation and bail out early.
+	 */
+	if(unlikely(list_empty(&pci_devices))) {
+		printk(KERN_NOTICE "pci_get_subsys() called while pci_devices is still empty\n");
+		return NULL;
+	}
 	down_read(&pci_bus_sem);
 	n = from ? from->global_list.next : pci_devices.next;
 

[-- Attachment #3: param-parse-irq-enable-detection.patch --]
[-- Type: text/plain, Size: 749 bytes --]

--- linux-2.6.19.vanilla/kernel/params.c	2006-11-29 21:57:37.000000000 +0000
+++ linux-2.6.19/kernel/params.c	2006-12-29 14:02:48.000000000 +0000
@@ -53,13 +53,20 @@
 		     int (*handle_unknown)(char *param, char *val))
 {
 	unsigned int i;
+	int result;
+	int irq_was_disabled;
 
 	/* Find parameter */
 	for (i = 0; i < num_params; i++) {
 		if (parameq(param, params[i].name)) {
 			DEBUGP("They are equal!  Calling %p\n",
 			       params[i].set);
-			return params[i].set(val, &params[i]);
+			irq_was_disabled = irqs_disabled();
+			result=params[i].set(val, &params[i]);
+			if (irq_was_disabled && !irqs_disabled()) {
+				printk(KERN_WARNING "[BUG] parse_one: kerneloption '%s' enabled irq!\n",param);
+			}
+			return result;
 		}
 	}
 

[-- Attachment #4: main-irq-enable-detection-and-disable-again.patch --]
[-- Type: text/plain, Size: 493 bytes --]

--- linux-2.6.19.vanilla/init/main.c	2006-11-29 21:57:37.000000000 +0000
+++ linux-2.6.19/init/main.c	2006-12-29 13:58:37.000000000 +0000
@@ -525,6 +525,10 @@
 	parse_args("Booting kernel", command_line, __start___param,
 		   __stop___param - __start___param,
 		   &unknown_bootoption);
+	if (!irqs_disabled()) {
+		printk(KERN_WARNING "start_kernel(): bug: interrupts were enabled *very* early, fixing it\n");
+		local_irq_disable();
+	}
 	sort_main_extable();
 	trap_init();
 	rcu_init();


* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-29 14:10               ` Ard -kwaak- van Breemen
@ 2006-12-29 15:01                 ` Ard -kwaak- van Breemen
  2006-12-29 15:05                   ` Ard -kwaak- van Breemen
                                     ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Ard -kwaak- van Breemen @ 2006-12-29 15:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Greg KH, Zhang, Yanmin, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

On Fri, Dec 29, 2006 at 03:10:58PM +0100, Ard -kwaak- van Breemen wrote:
> Preliminary patches:
> - pci fix of Andrews patches
The printk might be too verbose. I think removing them is ok
since the only thing that has happened is that it prevents
entering the loop and the semaphores. The only thing that bugs me
is if list_empty can be used like that. (in other words: don't we
need semaphores around that).

> - parse-one detection of Yanmin
It doesn't flag it. I am working on that.

> - start_kernel detection and workaround (disable them again)
main-irq-enable-detection-and-disable-again.patch is working
great. I love to see that one included in the kernel.

-- 
program signature;
begin  { telegraaf.com
} writeln("<ard@telegraafnet.nl> TEM2");
end
.


* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-29 15:01                 ` Ard -kwaak- van Breemen
@ 2006-12-29 15:05                   ` Ard -kwaak- van Breemen
  2006-12-29 15:24                   ` Ard -kwaak- van Breemen
  2006-12-29 15:42                   ` Ard -kwaak- van Breemen
  2 siblings, 0 replies; 42+ messages in thread
From: Ard -kwaak- van Breemen @ 2006-12-29 15:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Greg KH, Zhang, Yanmin, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

On Fri, Dec 29, 2006 at 04:01:32PM +0100, Ard -kwaak- van Breemen wrote:
> > - parse-one detection of Yanmin
> It doesn't flag it. I am working on that.
Since it goes to a callback to obsolete_checksetup()
Argh... my calltree was a little flawed :-(...

-- 
program signature;
begin  { telegraaf.com
} writeln("<ard@telegraafnet.nl> TEM2");
end
.


* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-22 14:35     ` Ard -kwaak- van Breemen
@ 2006-12-29 15:08       ` Ard -kwaak- van Breemen
  0 siblings, 0 replies; 42+ messages in thread
From: Ard -kwaak- van Breemen @ 2006-12-29 15:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Zhang, Yanmin, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

On Fri, Dec 22, 2006 at 03:35:20PM +0100, Ard -kwaak- van Breemen wrote:
> On Fri, Dec 22, 2006 at 12:30:29AM -0800, Andrew Morton wrote:
> > I expect that you'll find that the ide code ends up doing
> > down_write(pci_bus_sem), which will enable interrupts.
> will:         down_read(&pci_bus_sem);
> also enable interrupts?
> Since that is called:
> init/main.c         start_kernel
> kernel/params.c      parse_args("Booting kernel"
> kernel/params.c       parse_one
-----------------------------------------------
  init/main.c            unknown_bootoption
  init/main.c             obsolete_checksetup
-----------------------------------------------
  > drivers/ide/ide.c      ide_setup
  > drivers/ide/ide.c       init_ide_data
  > drivers/ide/ide.c        init_hwif_default
  > include/asm-i386/ide.h    ide_default_io_base(index)
  > drivers/pci/search.c       pci_find_device
  > drivers/pci/search.c        pci_find_subsys

Fixes in the calltree
-- 
program signature;
begin  { telegraaf.com
} writeln("<ard@telegraafnet.nl> TEM2");
end
.


* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-29 15:01                 ` Ard -kwaak- van Breemen
  2006-12-29 15:05                   ` Ard -kwaak- van Breemen
@ 2006-12-29 15:24                   ` Ard -kwaak- van Breemen
  2006-12-29 15:42                   ` Ard -kwaak- van Breemen
  2 siblings, 0 replies; 42+ messages in thread
From: Ard -kwaak- van Breemen @ 2006-12-29 15:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Greg KH, Zhang, Yanmin, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

[-- Attachment #1: Type: text/plain, Size: 914 bytes --]

On Fri, Dec 29, 2006 at 04:01:32PM +0100, Ard -kwaak- van Breemen wrote:
> > - parse-one detection of Yanmin
> It doesn't flag it. I am working on that.
As said: it was doing a callback to obsolete_...
This replacement patch is less bloated and still gives enough info.
It won't check for callbacks or whatever, just which parameter
b0rked it.
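
The detection idea (sample the irq state before each parameter handler
and compare afterwards) can be sketched in user space as follows. This
is a simulation only: irqs_on is a plain flag standing in for the CPU
interrupt state, and struct param, quiet_handler(), leaky_handler() and
find_irq_enabler() are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simulated interrupt-enable flag; false models early boot (assumption). */
static bool irqs_on = false;

typedef void (*param_handler)(void);

/* Hypothetical, stripped-down parameter table entry. */
struct param {
	const char *name;
	param_handler set;
};

/* A handler that leaves the interrupt state alone. */
static void quiet_handler(void)
{
}

/* A handler that (wrongly) leaves interrupts enabled on return. */
static void leaky_handler(void)
{
	irqs_on = true;
}

/*
 * Run every handler in order; return the name of the first parameter
 * whose handler flipped interrupts from disabled to enabled, or NULL
 * if none did.
 */
static const char *find_irq_enabler(const struct param *params, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		bool was_disabled = !irqs_on;

		assert(params[i].set != NULL);
		params[i].set();
		if (was_disabled && irqs_on)
			return params[i].name;
	}
	return NULL;
}
```

Because the check only fires on a disabled-to-enabled transition, only
the first offending parameter is flagged; later ones already start
with interrupts on.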

Output of dmesg without the pci-patch applied:
ard@supergirl:~$ dmesg|grep -B5 -A1 'interrupts were enabled'
Kernel command line: console=tty0 console=ttyS0,115200 hdb=noprobe hdc=noprobe hdd=noprobe root=/dev/md0 ro panic=30 earlyprintk=serial,ttyS0,115200 
ide_setup: hdb=noprobe
parse_args(): option 'hdb=noprobe' enabled irq's!
ide_setup: hdc=noprobe
ide_setup: hdd=noprobe
start_kernel(): bug: interrupts were enabled *very* early, fixing it
Initializing CPU#0

-- 
program signature;
begin  { telegraaf.com
} writeln("<ard@telegraafnet.nl> TEM2");
end
.

[-- Attachment #2: param-parse-irq-enable-detection.patch --]
[-- Type: text/plain, Size: 572 bytes --]

--- linux-2.6.19.vanilla/kernel/params.c	2006-11-29 21:57:37.000000000 +0000
+++ linux-2.6.19/kernel/params.c	2006-12-29 15:14:26.000000000 +0000
@@ -143,9 +143,14 @@
 
 	while (*args) {
 		int ret;
+		int irq_was_disabled;
 
 		args = next_arg(args, &param, &val);
+		irq_was_disabled=irqs_disabled();
 		ret = parse_one(param, val, params, num, unknown);
+		if(irq_was_disabled && !irqs_disabled()) {
+			printk(KERN_WARNING "parse_args(): option '%s' enabled irq's!\n",param);
+		}
 		switch (ret) {
 		case -ENOENT:
 			printk(KERN_ERR "%s: Unknown parameter `%s'\n",


* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-29 15:01                 ` Ard -kwaak- van Breemen
  2006-12-29 15:05                   ` Ard -kwaak- van Breemen
  2006-12-29 15:24                   ` Ard -kwaak- van Breemen
@ 2006-12-29 15:42                   ` Ard -kwaak- van Breemen
  2006-12-30 19:46                     ` [PATCH 2.6.20-rc2-git1] start_kernel: Test if irq's got enabled early, barf, and disable them again Ard -kwaak- van Breemen
                                       ` (2 more replies)
  2 siblings, 3 replies; 42+ messages in thread
From: Ard -kwaak- van Breemen @ 2006-12-29 15:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Greg KH, Zhang, Yanmin, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

Hello,
On Fri, Dec 29, 2006 at 04:01:32PM +0100, Ard -kwaak- van Breemen wrote:
> On Fri, Dec 29, 2006 at 03:10:58PM +0100, Ard -kwaak- van Breemen wrote:
> > Preliminary patches:
> > - pci fix of Andrews patches
> The printk might be too verbose. I think removing them is ok

I'll stick with the verbose printk, because otherwise we will never
know that something is foul.

> since the only thing that has happened is that it prevents
> entering the loop and the semaphores. The only thing that bugs me
> is if list_empty can be used like that. (in other words: don't we
> need semaphores around that).

I was wondering about the validity of pci_devices at that time.
But on the other hand: if that was not wrong, people would have
complained much earlier.

Anyway, I think that's it: those 3 patches will fix and guard the
problems we've seen.

-- 
program signature;
begin  { telegraaf.com
} writeln("<ard@telegraafnet.nl> TEM2");
end
.


* [PATCH 2.6.20-rc2-git1] start_kernel: Test if irq's got enabled early, barf, and disable them again
  2006-12-29 15:42                   ` Ard -kwaak- van Breemen
@ 2006-12-30 19:46                     ` Ard -kwaak- van Breemen
  2008-03-03 22:46                       ` Tony Luck
  2006-12-30 19:58                     ` [PATCH 2.6.20-rc2-git1] kernelparams: detect if and which parameter parsing enabled irq's Ard -kwaak- van Breemen
  2006-12-30 20:15                     ` [PATCH 2.6.20-rc2-git1] PCI: prevent down_read when pci_devices is empty Ard -kwaak- van Breemen
  2 siblings, 1 reply; 42+ messages in thread
From: Ard -kwaak- van Breemen @ 2006-12-30 19:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Greg KH, Zhang, Yanmin, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

The calls made by parse_args to other initialization code might
enable interrupts again way too early.
Having interrupts on this early can make systems PANIC when they
initialize the IRQ controllers (which happens later in the code).
This patch detects that irq's are enabled again, barfs about it
and disables them again as a safety net.

Signed-off-by: Ard van Breemen <ard@telegraafnet.nl>

--- linux-2.6.20-rc2-git1/init/main.c.orig	2006-12-30 17:41:13.000000000 +0000
+++ linux-2.6.20-rc2-git1/init/main.c	2006-12-30 17:44:02.000000000 +0000
@@ -538,6 +538,10 @@ asmlinkage void __init start_kernel(void
 	parse_args("Booting kernel", command_line, __start___param,
 		   __stop___param - __start___param,
 		   &unknown_bootoption);
+	if (!irqs_disabled()) {
+		printk(KERN_WARNING "start_kernel(): bug: interrupts were enabled *very* early, fixing it\n");
+		local_irq_disable();
+	}
 	sort_main_extable();
 	trap_init();
 	rcu_init();


* [PATCH 2.6.20-rc2-git1] kernelparams: detect if and which parameter parsing enabled irq's
  2006-12-29 15:42                   ` Ard -kwaak- van Breemen
  2006-12-30 19:46                     ` [PATCH 2.6.20-rc2-git1] start_kernel: Test if irq's got enabled early, barf, and disable them again Ard -kwaak- van Breemen
@ 2006-12-30 19:58                     ` Ard -kwaak- van Breemen
  2006-12-30 20:15                     ` [PATCH 2.6.20-rc2-git1] PCI: prevent down_read when pci_devices is empty Ard -kwaak- van Breemen
  2 siblings, 0 replies; 42+ messages in thread
From: Ard -kwaak- van Breemen @ 2006-12-30 19:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Greg KH, Zhang, Yanmin, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

The parsing of some kernel parameters seems to enable irq's at a
stage where irq's are not supposed to be enabled (particularly the
ide kernel parameters).  Having irq's enabled before the irq
controller is initialized might lead to a kernel panic.
This patch only detects this behaviour and warns about which
parameter caused it.

Signed-off-by: Ard van Breemen <ard@telegraafnet.nl>

--- linux-2.6.19.vanilla/kernel/params.c	2006-11-29 21:57:37.000000000 +0000
+++ linux-2.6.19.ok/kernel/params.c	2006-12-29 15:14:26.000000000 +0000
@@ -143,9 +143,14 @@ int parse_args(const char *name,
 
 	while (*args) {
 		int ret;
+		int irq_was_disabled;
 
 		args = next_arg(args, &param, &val);
+		irq_was_disabled=irqs_disabled();
 		ret = parse_one(param, val, params, num, unknown);
+		if(irq_was_disabled && !irqs_disabled()) {
+			printk(KERN_WARNING "parse_args(): option '%s' enabled irq's!\n",param);
+		}
 		switch (ret) {
 		case -ENOENT:
 			printk(KERN_ERR "%s: Unknown parameter `%s'\n",


* [PATCH 2.6.20-rc2-git1] PCI: prevent down_read when pci_devices is empty
  2006-12-29 15:42                   ` Ard -kwaak- van Breemen
  2006-12-30 19:46                     ` [PATCH 2.6.20-rc2-git1] start_kernel: Test if irq's got enabled early, barf, and disable them again Ard -kwaak- van Breemen
  2006-12-30 19:58                     ` [PATCH 2.6.20-rc2-git1] kernelparams: detect if and which parameter parsing enabled irq's Ard -kwaak- van Breemen
@ 2006-12-30 20:15                     ` Ard -kwaak- van Breemen
  2 siblings, 0 replies; 42+ messages in thread
From: Ard -kwaak- van Breemen @ 2006-12-30 20:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Greg KH, Zhang, Yanmin, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

The pci_find_subsys gets called very early by obsolete ide setup
parameters.  This is a bogus call since pci is not initialized
yet, so the list is empty.  But in the mean time, interrupts get
enabled by down_read. This can result in a kernel panic when the
irq controller gets initialized.
This patch checks if the device list is empty before taking the
semaphore, and hence will not enable irq's. Furthermore it will
inform that it is called while pci_devices is empty as a reminder
that the ide code needs to be fixed.
The pci_get_subsys can get called in the same manner, and as such
is patched in the same manner.

Signed-off-by: Ard van Breemen <ard@telegraafnet.nl>
----
This patch is an adaptation of Andrew Morton's patch.

--- linux-2.6.19.vanilla/drivers/pci/search.c	2006-11-29 21:57:37.000000000 +0000
+++ linux-2.6.19.ok/drivers/pci/search.c	2006-12-29 15:38:18.000000000 +0000
@@ -193,6 +193,17 @@ static struct pci_dev * pci_find_subsys(
 	struct pci_dev *dev;
 
 	WARN_ON(in_interrupt());
+
+	/*
+	 * pci_find_subsys() can be called on the ide_setup() path, super-early
+	 * in boot.  But the down_read() will enable local interrupts, which
+	 * can cause some machines to crash.  So here we detect and flag that
+	 * situation and bail out early.
+	 */
+	if(unlikely(list_empty(&pci_devices))) {
+		printk(KERN_INFO "pci_find_subsys() called while pci_devices is still empty\n");
+		return NULL;
+	}
 	down_read(&pci_bus_sem);
 	n = from ? from->global_list.next : pci_devices.next;
 
@@ -259,6 +270,16 @@ pci_get_subsys(unsigned int vendor, unsi
 	struct pci_dev *dev;
 
 	WARN_ON(in_interrupt());
+	/*
+	 * pci_get_subsys() can potentially be called by drivers super-early
+	 * in boot.  But the down_read() will enable local interrupts, which
+	 * can cause some machines to crash.  So here we detect and flag that
+	 * situation and bail out early.
+	 */
+	if(unlikely(list_empty(&pci_devices))) {
+		printk(KERN_NOTICE "pci_get_subsys() called while pci_devices is still empty\n");
+		return NULL;
+	}
 	down_read(&pci_bus_sem);
 	n = from ? from->global_list.next : pci_devices.next;
 


* Re: [PATCH 2.6.20-rc2-git1] start_kernel: Test if irq's got enabled early, barf, and disable them again
  2006-12-30 19:46                     ` [PATCH 2.6.20-rc2-git1] start_kernel: Test if irq's got enabled early, barf, and disable them again Ard -kwaak- van Breemen
@ 2008-03-03 22:46                       ` Tony Luck
  2008-03-04  0:34                         ` Stephen Rothwell
  0 siblings, 1 reply; 42+ messages in thread
From: Tony Luck @ 2008-03-03 22:46 UTC (permalink / raw)
  To: Ard -kwaak- van Breemen
  Cc: Andrew Morton, Greg KH, Zhang, Yanmin, Chuck Ebbert, Yinghai Lu,
	take, agalanin, linux-kernel, bugme-daemon, Eric W. Biederman,
	sfr

>  +               printk(KERN_WARNING "start_kernel(): bug: interrupts were enabled *very* early, fixing it\n");

I built and booted the next-20080303 tag from linux-next and
found the above warning in my console log on ia64 (this is
new ... I've never seen this message before, even though
this patch was applied January 2007).

Hunting this down, I found the enabler was the lock_kernel() call
on line 536 of init/main.c ... doesn't that happen to other archs
too?  We get into the first call to lock_kernel() with current->lock_depth
set to -1, so we call down(&kernel_sem) ... which does spin_lock_irq()
and then spin_unlock_irq() ... leaving interrupts enabled.
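
A user-space sketch of why a lock/unlock pair of the _irq flavour
leaves interrupts on even if they were off on entry, compared with the
save/restore flavour. Everything here is simulated: irqs_on is a plain
flag, and the sim_* functions are hypothetical models of
spin_lock_irq()/spin_unlock_irq() and
spin_lock_irqsave()/spin_unlock_irqrestore(), not the kernel
implementations.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated CPU interrupt-enable flag; false models early boot. */
static bool irqs_on = false;

/* Models spin_lock_irq(): disables interrupts unconditionally. */
static void sim_lock_irq(void)
{
	irqs_on = false;
}

/* Models spin_unlock_irq(): enables interrupts unconditionally. */
static void sim_unlock_irq(void)
{
	assert(!irqs_on);	/* interrupts are off while the lock is held */
	irqs_on = true;
}

/* Models spin_lock_irqsave(): remembers the previous state. */
static bool sim_lock_irqsave(void)
{
	bool was_on = irqs_on;

	irqs_on = false;
	return was_on;
}

/* Models spin_unlock_irqrestore(): puts the previous state back. */
static void sim_unlock_irqrestore(bool was_on)
{
	assert(!irqs_on);
	irqs_on = was_on;
}

/* Interrupt state after a plain _irq lock/unlock pair. */
static bool irq_state_after_plain_pair(bool start_on)
{
	irqs_on = start_on;
	sim_lock_irq();
	sim_unlock_irq();
	return irqs_on;
}

/* Interrupt state after a save/restore lock/unlock pair. */
static bool irq_state_after_save_pair(bool start_on)
{
	bool flags;

	irqs_on = start_on;
	flags = sim_lock_irqsave();
	sim_unlock_irqrestore(flags);
	return irqs_on;
}
```

In this model the plain pair always ends with interrupts on, while the
save/restore pair returns to whatever state the caller started in.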

What else changed to make this suddenly kick out now? It
doesn't happen from a build from Linus' tree.

-Tony


* Re: [PATCH 2.6.20-rc2-git1] start_kernel: Test if irq's got enabled early, barf, and disable them again
  2008-03-03 22:46                       ` Tony Luck
@ 2008-03-04  0:34                         ` Stephen Rothwell
  0 siblings, 0 replies; 42+ messages in thread
From: Stephen Rothwell @ 2008-03-04  0:34 UTC (permalink / raw)
  To: Tony Luck
  Cc: Ard -kwaak- van Breemen, Andrew Morton, Greg KH, Zhang, Yanmin,
	Chuck Ebbert, Yinghai Lu, take, agalanin, linux-kernel,
	bugme-daemon, Eric W. Biederman

[-- Attachment #1: Type: text/plain, Size: 1333 bytes --]

Hi Tony,

On Mon, 3 Mar 2008 14:46:52 -0800 "Tony Luck" <tony.luck@intel.com> wrote:
>
> >  +               printk(KERN_WARNING "start_kernel(): bug: interrupts were enabled *very* early, fixing it\n");
> 
> I built and booted the next-20080303 tag from linux-next and
> found the above warning in my console log on ia64 (this is
> new ... I've never seen this message before, even though
> this patch was applied January 2007).
> 
> Hunting this down, I found the enabler was the lock_kernel() call
> on line 536 of init/main.c ... doesn't that happen to other archs
> too?  We get into the first call to lock_kernel() with current->lock_depth
> set to -1, so we call down(&kernel_sem) ... which does spin_lock_irq()
> and then spin_unlock_irq() ... leaving interrupts enabled.
> 
> What else changed to make this suddenly kick out now? It
> doesn't happen from a build from Linus' tree.

This is Willy's generic semaphore code that is included in linux-next.
It is being discussed in another thread "linux-next: Tree for Feb 29:
WARNING: at kernel/lockdep.c:2024 trace_hardirqs_on" on the
linux-next@vger.kernel.org mailing list (archived at
http://marc.info/?l=linux-next&m=120440556729954&w=2).

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/



* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-22 18:42 ` Ard -kwaak- van Breemen
@ 2006-12-22 19:39   ` Stefano Takekawa
  0 siblings, 0 replies; 42+ messages in thread
From: Stefano Takekawa @ 2006-12-22 19:39 UTC (permalink / raw)
  To: Ard -kwaak- van Breemen
  Cc: Zhang, Yanmin, Andrew Morton, Chuck Ebbert, Yinghai Lu, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

> I am pretty sure the i386 tree has the same problem but I haven't checked yet.
> Anyway: the panic is just a way of noticing. The bug is that irq's are enabled
> before the irq controller is set up.

A very similar i386 linux installation works fine on my laptop, but that
i386 kernel never had the problem.

-- 
Stefano Takekawa
take@libero.it

Frank:  And why do days get longer in the summer?
Ernest: Because heat makes things expand!




* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-21  8:04 [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine Zhang, Yanmin
  2006-12-21 19:52 ` Ard -kwaak- van Breemen
  2006-12-21 21:05 ` Ard -kwaak- van Breemen
@ 2006-12-22 18:42 ` Ard -kwaak- van Breemen
  2006-12-22 19:39   ` Stefano Takekawa
  2 siblings, 1 reply; 42+ messages in thread
From: Ard -kwaak- van Breemen @ 2006-12-22 18:42 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Andrew Morton, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

Hello,
On Thu, Dec 21, 2006 at 04:04:04PM +0800, Zhang, Yanmin wrote:
> I couldn't reproduce it on my EM64T machine. I instrumented function start_kernel and
> didn't find irq was enabled before calling init_IRQ. It'll be better if the reporter could
> instrument function start_kernel to capture which function enables irq.

I can confirm this is a *GENERIC* X86_64 problem:
----
Kernel command line: console=tty0 console=ttyS0,115200 hdb=noprobe root=/dev/md0
init/main.c start_kernel(): interrupts were disabled@525
ide_setup: hdb=noprobe
init/main.c start_kernel(): interrupts were enabled@529
...
start_kernel(): bug: interrupts were enabled early
----
This is on a Dell 1950 with Core 2 Duo processors.

You have to have ide compiled in, and set ide options to get the irq's enabled,
and then have a setup which will have an irq pending before the irq controller
get's initialized to get the panic. The dell1950 does not panic, the kernel
merely warns.

I am pretty sure the i386 tree has the same problem but I haven't checked yet.
Anyway: the panic is just a way of noticing. The bug is that irq's are enabled
before the irq controller is set up.

But to make the ide_setup/irq bug go away, I think it might be an acceptable
solution to just disable the irq's again after the parse_args, and just to wait
until the SATA tree takes over the IDE tree.
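
A rough userspace sketch of the "disable them again after parse_args" idea,
for illustration only. Nothing here is kernel code: sim_irq_flag stands in
for the CPU interrupt flag, and sim_irqs_disabled(), sim_local_irq_disable(),
sim_parse_args_leaky() and sim_warn_and_redisable() are hypothetical names
that only model the check-warn-disable sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the CPU interrupt flag (assumption, not kernel state). */
static bool sim_irq_flag;

static bool sim_irqs_disabled(void) { return !sim_irq_flag; }
static void sim_local_irq_disable(void) { sim_irq_flag = false; }

/* Hypothetical stand-in for a parameter handler that leaves irqs enabled. */
static void sim_parse_args_leaky(void) { sim_irq_flag = true; }

/*
 * If interrupts are found enabled, complain and turn them off again.
 * Returns true when a repair was needed, false when all was fine.
 */
static bool sim_warn_and_redisable(const char *where)
{
	if (sim_irqs_disabled())
		return false;

	fprintf(stderr, "%s: interrupts were enabled early, disabling again\n",
		where);
	sim_local_irq_disable();
	assert(sim_irqs_disabled());
	return true;
}
```

Calling sim_parse_args_leaky() and then sim_warn_and_redisable("after
parse_args") reports once and leaves the simulated flag off; this hides the
symptom but does not fix whatever enabled the irqs.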

-- 
program signature;
begin  { telegraaf.com
} writeln("<ard@telegraafnet.nl> TEM2");
end
.


* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-21  8:04 [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine Zhang, Yanmin
  2006-12-21 19:52 ` Ard -kwaak- van Breemen
@ 2006-12-21 21:05 ` Ard -kwaak- van Breemen
  2006-12-22 18:42 ` Ard -kwaak- van Breemen
  2 siblings, 0 replies; 42+ messages in thread
From: Ard -kwaak- van Breemen @ 2006-12-21 21:05 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Andrew Morton, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

On Thu, Dec 21, 2006 at 04:04:04PM +0800, Zhang, Yanmin wrote:
> I couldn't reproduce it on my EM64T machine. I instrumented function start_kernel and
> didn't find irq was enabled before calling init_IRQ. It'll be better if the reporter could
> instrument function start_kernel to capture which function enables irq.

Editing init/main.c:
        preempt_disable();
        if (!irqs_disabled())
                printk("start_kernel(): bug: interrupts were enabled early\n");
                printk("BLAAT17");
        build_all_zonelists();
        if (!irqs_disabled())
                printk("start_kernel(): bug: interrupts were enabled early\n");
                printk("BLAAT18");
        page_alloc_init();
        if (!irqs_disabled())
                printk("start_kernel(): bug: interrupts were enabled early\n");
                printk("BLAAT19");
        printk(KERN_NOTICE "Kernel command line: %s\n", saved_command_line);
        parse_early_param();
        if (!irqs_disabled())
                printk("start_kernel(): bug: interrupts were enabled early\n");
                printk("BLAAT20");
        parse_args("Booting kernel", command_line, __start___param,
                   __stop___param - __start___param,
                   &unknown_bootoption);
                printk("BLAAT21");
        if (!irqs_disabled())
                printk("start_kernel(): bug: interrupts were enabled early\n");
        sort_main_extable();
        if (!irqs_disabled())
                printk("start_kernel(): bug: interrupts were enabled early\n");
                printk("BLAAT22");
        trap_init();
        if (!irqs_disabled())
                printk("start_kernel(): bug: interrupts were enabled early\n");
                printk("BLAAT23");

Results in:
^MAllocating PCI resources starting at 88000000 (gap: 80000000:60000000)
^MBLAAT12BLAAT13<6>PERCPU: Allocating 32960 bytes of per cpu data
^MBLAAT14BLAAT15BLAAT16BLAAT17Built 2 zonelists.  Total pages: 1032635
^MBLAAT18BLAAT19<5>Kernel command line: console=tty0 console=ttyS0,115200 hdb=noprobe hdc=noprobe hdd=noprobe root=/dev/md0 ro panic=30 earlyprintk=serial,ttyS0,115200 
^MBLAAT20<6>ide_setup: hdb=noprobe
^Mide_setup: hdc=noprobe
^Mide_setup: hdd=noprobe
^MBLAAT21start_kernel(): bug: interrupts were enabled early
^Mstart_kernel(): bug: interrupts were enabled early
^MBLAAT22Initializing CPU#0

Hmmm, that actually doesn't make sense to me (unless parse_args is able to enable irq's).
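
A note on reading the log: without braces only the first printk is guarded
by each if, so the BLAAT markers print unconditionally. The first "bug" line
shows up after BLAAT21 and none before BLAAT20, which brackets parse_args.
The same bisection can be sketched in userspace as below. This is only a
model: sim_irqs_on stands in for the interrupt flag, the step table and all
sim_* names are hypothetical, and parse_args is marked as the culprit only
because that is what the log above points to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the CPU interrupt flag (assumption, not kernel state). */
static bool sim_irqs_on;

struct sim_step {
	const char *name;
	void (*run)(void);
};

static void sim_noop(void) { }
static void sim_enables_irqs(void) { sim_irqs_on = true; }

/*
 * Run the steps in order and check the flag after each one.
 * Returns the index of the first step after which interrupts are
 * enabled, or -1 if they stay disabled throughout.
 */
static int first_step_enabling_irqs(const struct sim_step *steps, size_t n)
{
	size_t i;

	assert(steps != NULL || n == 0);
	sim_irqs_on = false;
	for (i = 0; i < n; i++) {
		steps[i].run();
		if (sim_irqs_on)
			return (int)i;
	}
	return -1;
}

/* Hypothetical sequence modelled on the calls instrumented above. */
static const struct sim_step sim_boot_steps[] = {
	{ "build_all_zonelists", sim_noop },
	{ "page_alloc_init",     sim_noop },
	{ "parse_early_param",   sim_noop },
	{ "parse_args",          sim_enables_irqs },
	{ "sort_main_extable",   sim_noop },
	{ "trap_init",           sim_noop },
};
```

With this table, first_step_enabling_irqs(sim_boot_steps, 6) returns the
index of the "parse_args" entry.
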
-- 
program signature;
begin  { telegraaf.com
} writeln("<ard@telegraafnet.nl> TEM2");
end
.


* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-21 19:52 ` Ard -kwaak- van Breemen
@ 2006-12-21 20:11   ` Andrew Morton
  0 siblings, 0 replies; 42+ messages in thread
From: Andrew Morton @ 2006-12-21 20:11 UTC (permalink / raw)
  To: Ard -kwaak- van Breemen
  Cc: Zhang, Yanmin, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

On Thu, 21 Dec 2006 20:52:40 +0100
Ard -kwaak- van Breemen <ard@telegraafnet.nl> wrote:

> Hello,
> 
> On Thu, Dec 21, 2006 at 04:04:04PM +0800, Zhang, Yanmin wrote:
> > I couldn't reproduce it on my EM64T machine. I instrumented function start_kernel and
> > didn't find irq was enabled before calling init_IRQ. It'll be better if the reporter could
> > instrument function start_kernel to capture which function enables irq.
> Just diving into the sources.
> Is that something like:
> if(!raw_irqs_disabled_flags) printk "irqs are enabled";
> 
> (At that moment it might have crashed already.. :-)).
> 
> I don't see the complete context yet, but I hope the irq is
> triggered after the irq is somehow enabled.
> 
> BTW: the panic occurs on half of my boards on tyan S2891 with 2
> opterons, of which the only difference seems to be the purchase
> date (and hence probably the motherboard revisions). (Haven't got
> time yet to pull them out of the rack and compare the
> motherboards).

please, I'm still waiting for someone to tell me whether this "fixes" it:


--- a/lib/rwsem-spinlock.c~down_write-preserve-local-irqs
+++ a/lib/rwsem-spinlock.c
@@ -195,13 +195,14 @@ void fastcall __sched __down_write_neste
 {
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk;
+	unsigned long flags;
 
-	spin_lock_irq(&sem->wait_lock);
+	spin_lock_irqsave(&sem->wait_lock, flags);
 
 	if (sem->activity == 0 && list_empty(&sem->wait_list)) {
 		/* granted */
 		sem->activity = -1;
-		spin_unlock_irq(&sem->wait_lock);
+		spin_unlock_irqrestore(&sem->wait_lock, flags);
 		goto out;
 	}
 
@@ -216,7 +217,7 @@ void fastcall __sched __down_write_neste
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we don't need to touch the semaphore struct anymore */
-	spin_unlock_irq(&sem->wait_lock);
+	spin_unlock_irqrestore(&sem->wait_lock, flags);
 
 	/* wait to be given the lock */
 	for (;;) {
_



* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-21  8:04 [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine Zhang, Yanmin
@ 2006-12-21 19:52 ` Ard -kwaak- van Breemen
  2006-12-21 20:11   ` Andrew Morton
  2006-12-21 21:05 ` Ard -kwaak- van Breemen
  2006-12-22 18:42 ` Ard -kwaak- van Breemen
  2 siblings, 1 reply; 42+ messages in thread
From: Ard -kwaak- van Breemen @ 2006-12-21 19:52 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Andrew Morton, Chuck Ebbert, Yinghai Lu, take, agalanin,
	linux-kernel, bugme-daemon, Eric W. Biederman

Hello,

On Thu, Dec 21, 2006 at 04:04:04PM +0800, Zhang, Yanmin wrote:
> I couldn't reproduce it on my EM64T machine. I instrumented function start_kernel and
> didn't find irq was enabled before calling init_IRQ. It'll be better if the reporter could
> instrument function start_kernel to capture which function enables irq.
Just diving into the sources.
Is that something like:
if(!raw_irqs_disabled_flags) printk "irqs are enabled";

(At that moment it might have crashed already.. :-)).

I don't see the complete context yet, but I hope the irq is
triggered after the irq is somehow enabled.

BTW: the panic occurs on half of my boards on tyan S2891 with 2
opterons, of which the only difference seems to be the purchase
date (and hence probably the motherboard revisions). (Haven't got
time yet to pull them out of the rack and compare the
motherboards).


-- 
program signature;
begin  { telegraaf.com
} writeln("<ard@telegraafnet.nl> TEM2");
end
.


* RE: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
@ 2006-12-21  8:04 Zhang, Yanmin
  2006-12-21 19:52 ` Ard -kwaak- van Breemen
                   ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Zhang, Yanmin @ 2006-12-21  8:04 UTC (permalink / raw)
  To: Andrew Morton, Chuck Ebbert
  Cc: Yinghai Lu, ard, take, agalanin, linux-kernel, bugme-daemon,
	Eric W. Biederman

>>-----Original Message-----
>>From: Andrew Morton [mailto:akpm@osdl.org]
>>Sent: 20 December 2006 18:38
>>To: Chuck Ebbert
>>Cc: Yinghai Lu; ard@telegraafnet.nl; take@libero.it; agalanin@mera.ru; linux-kernel@vger.kernel.org; bugme-daemon@bugzilla.kernel.org;
>>Eric W. Biederman; Zhang, Yanmin
>>Subject: Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
>>
>>On Wed, 20 Dec 2006 04:59:19 -0500
>>Chuck Ebbert <76306.1226@compuserve.com> wrote:
>>
>>> > On 12/19/06, Chuck Ebbert <76306.1226@compuserve.com> wrote:
>>> > > So an external interrupt occurred, the system tried to use interrupt
>>> > > descriptor #39 decimal (irq 7), but the descriptor was invalid.
>>> >
>>> > but the irq is disabled at that time.
>>> >
>>> > can you use attached diff to verify if the irq is enable somehow?
>>>
>>> But it seems interrupts are on--look at the flags:
>>>
>>>         RSP: 0018:ffffffff803cdf68  EFLAGS: 00010246
>>>
>>
>>down_write()->__down_write()->__down_write_nested()->spin_unlock_irq()->dead
>>
>>Could someone please test this?
I couldn't reproduce it on my EM64T machine. I instrumented start_kernel() and
did not find irqs enabled before the call to init_IRQ(). It would be better if
the reporter could instrument start_kernel() to capture which function enables
irqs.


* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-20 10:37 ` Andrew Morton
@ 2006-12-20 10:55   ` Arjan van de Ven
  0 siblings, 0 replies; 42+ messages in thread
From: Arjan van de Ven @ 2006-12-20 10:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Chuck Ebbert, Yinghai Lu, ard, take, agalanin, linux-kernel,
	bugme-daemon, Eric W. Biederman, Zhang Yanmin

On Wed, 2006-12-20 at 02:37 -0800, Andrew Morton wrote:
> On Wed, 20 Dec 2006 04:59:19 -0500
> Chuck Ebbert <76306.1226@compuserve.com> wrote:
> 
> > > On 12/19/06, Chuck Ebbert <76306.1226@compuserve.com> wrote:
> > > > So an external interrupt occurred, the system tried to use interrupt
> > > > descriptor #39 decimal (irq 7), but the descriptor was invalid.
> > > 
> > > but the irq is disabled at that time.
> > > 
> > > can you use attached diff to verify if the irq is enable somehow?
> > 
> > But it seems interrupts are on--look at the flags:
> > 
> >         RSP: 0018:ffffffff803cdf68  EFLAGS: 00010246
> > 
> 
> down_write()->__down_write() -> __down_write_nested()->spin_unlock_irq()->dead

since down_write() sleeps..... what?




* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-20  9:59 Chuck Ebbert
  2006-12-20 10:12 ` Yinghai Lu
@ 2006-12-20 10:37 ` Andrew Morton
  2006-12-20 10:55   ` Arjan van de Ven
  1 sibling, 1 reply; 42+ messages in thread
From: Andrew Morton @ 2006-12-20 10:37 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: Yinghai Lu, ard, take, agalanin, linux-kernel, bugme-daemon,
	Eric W. Biederman, Zhang Yanmin

On Wed, 20 Dec 2006 04:59:19 -0500
Chuck Ebbert <76306.1226@compuserve.com> wrote:

> > On 12/19/06, Chuck Ebbert <76306.1226@compuserve.com> wrote:
> > > So an external interrupt occurred, the system tried to use interrupt
> > > descriptor #39 decimal (irq 7), but the descriptor was invalid.
> > 
> > but the irq is disabled at that time.
> > 
> > can you use attached diff to verify if the irq is enable somehow?
> 
> But it seems interrupts are on--look at the flags:
> 
>         RSP: 0018:ffffffff803cdf68  EFLAGS: 00010246
> 

down_write()->__down_write()->__down_write_nested()->spin_unlock_irq()->dead

Could someone please test this?


--- a/lib/rwsem-spinlock.c~a
+++ a/lib/rwsem-spinlock.c
@@ -195,13 +195,14 @@ void fastcall __sched __down_write_neste
 {
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk;
+	unsigned long flags;
 
-	spin_lock_irq(&sem->wait_lock);
+	spin_lock_irqsave(&sem->wait_lock, flags);
 
 	if (sem->activity == 0 && list_empty(&sem->wait_list)) {
 		/* granted */
 		sem->activity = -1;
-		spin_unlock_irq(&sem->wait_lock);
+		spin_unlock_irqrestore(&sem->wait_lock, flags);
 		goto out;
 	}
 
@@ -216,7 +217,7 @@ void fastcall __sched __down_write_neste
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we don't need to touch the semaphore struct anymore */
-	spin_unlock_irq(&sem->wait_lock);
+	spin_unlock_irqrestore(&sem->wait_lock, flags);
 
 	/* wait to be given the lock */
 	for (;;) {
_
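
For readers following the call chain: spin_unlock_irq() re-enables local
interrupts unconditionally, while spin_unlock_irqrestore() puts back whatever
state spin_lock_irqsave() recorded. A minimal userspace model of the
uncontended fast path, before and after the patch, might look like this. It
is a sketch under stated assumptions: model_irqs_on stands in for the
interrupt flag and every model_* / fast_path_* name is hypothetical, not a
kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the CPU interrupt flag (assumption, not kernel state). */
static bool model_irqs_on;

/* Model of the _irq variants: lock disables, unlock always enables. */
static void model_lock_irq(void)   { model_irqs_on = false; }
static void model_unlock_irq(void) { model_irqs_on = true; }

/* Model of the _irqsave/_irqrestore variants: remember and restore. */
static bool model_lock_irqsave(void)
{
	bool flags = model_irqs_on;

	model_irqs_on = false;
	return flags;
}

static void model_unlock_irqrestore(bool flags) { model_irqs_on = flags; }

/* Fast path before the patch; returns the interrupt state on exit. */
static bool fast_path_before_patch(bool irqs_on_entry)
{
	model_irqs_on = irqs_on_entry;
	model_lock_irq();
	assert(!model_irqs_on);	/* critical section runs with irqs off */
	model_unlock_irq();
	return model_irqs_on;
}

/* Fast path after the patch; returns the interrupt state on exit. */
static bool fast_path_after_patch(bool irqs_on_entry)
{
	bool flags;

	model_irqs_on = irqs_on_entry;
	flags = model_lock_irqsave();
	assert(!model_irqs_on);	/* critical section runs with irqs off */
	model_unlock_irqrestore(flags);
	return model_irqs_on;
}
```

Entered with interrupts off, the "before" path exits with them on and the
"after" path exits with them still off, which is the behaviour the patch is
meant to test for.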



* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-20  9:59 Chuck Ebbert
@ 2006-12-20 10:12 ` Yinghai Lu
  2006-12-20 10:37 ` Andrew Morton
  1 sibling, 0 replies; 42+ messages in thread
From: Yinghai Lu @ 2006-12-20 10:12 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: ard, take, agalanin, linux-kernel, bugme-daemon,
	Eric W. Biederman, Zhang Yanmin, Andrew Morton

On 12/20/06, Chuck Ebbert <76306.1226@compuserve.com> wrote:
> But it seems interrupts are on--look at the flags:
>
>         RSP: 0018:ffffffff803cdf68  EFLAGS: 00010246

Yes, the IF bit is set.
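
For reference, IF is bit 9 of EFLAGS (mask 0x200), so the check against the
value from the oops can be written out as below. The names EFLAGS_IF_MASK,
eflags_irqs_enabled() and check_oops_flags() are made up for this sketch and
are not taken from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Interrupt enable flag: bit 9 of EFLAGS/RFLAGS. */
#define EFLAGS_IF_MASK (UINT64_C(1) << 9)

/* Returns 1 if the saved flags value has interrupts enabled, else 0. */
static int eflags_irqs_enabled(uint64_t eflags)
{
	return (eflags & EFLAGS_IF_MASK) != 0;
}

/* The EFLAGS value printed in the oops quoted in this thread. */
static void check_oops_flags(void)
{
	assert(eflags_irqs_enabled(UINT64_C(0x00010246)));
}
```

The low bits 0x246 contain 0x200, hence IF is set in the reported value.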

Maybe someone (one of the reporters) could add !irqs_disabled() checks with
a printk to start_kernel() in init/main.c to see which function causes irqs
to get enabled.

YH


* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
@ 2006-12-20  9:59 Chuck Ebbert
  2006-12-20 10:12 ` Yinghai Lu
  2006-12-20 10:37 ` Andrew Morton
  0 siblings, 2 replies; 42+ messages in thread
From: Chuck Ebbert @ 2006-12-20  9:59 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: ard, take, agalanin, linux-kernel, bugme-daemon,
	Eric W. Biederman, Zhang Yanmin, Andrew Morton

> On 12/19/06, Chuck Ebbert <76306.1226@compuserve.com> wrote:
> > So an external interrupt occurred, the system tried to use interrupt
> > descriptor #39 decimal (irq 7), but the descriptor was invalid.
> 
> but the irq is disabled at that time.
> 
> can you use attached diff to verify if the irq is enable somehow?

But it seems interrupts are on--look at the flags:

        RSP: 0018:ffffffff803cdf68  EFLAGS: 00010246

-- 
MBTI: IXTP


* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-20  6:42 Chuck Ebbert
@ 2006-12-20  9:11 ` Yinghai Lu
  0 siblings, 0 replies; 42+ messages in thread
From: Yinghai Lu @ 2006-12-20  9:11 UTC (permalink / raw)
  To: Chuck Ebbert, agalanin, take
  Cc: Andrew Morton, Zhang Yanmin, Eric W. Biederman, bugme-daemon,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 308 bytes --]

On 12/19/06, Chuck Ebbert <76306.1226@compuserve.com> wrote:
> So an external interrupt occurred, the system tried to use interrupt
> descriptor #39 decimal (irq 7), but the descriptor was invalid.

But irqs are disabled at that time.

Can you use the attached diff to verify whether irqs are somehow enabled?

YH

[-- Attachment #2: test_init_isa_irqs.diff --]
[-- Type: text/x-patch, Size: 497 bytes --]

diff --git a/arch/x86_64/kernel/i8259.c b/arch/x86_64/kernel/i8259.c
index d73c79e..fedde34 100644
--- a/arch/x86_64/kernel/i8259.c
+++ b/arch/x86_64/kernel/i8259.c
@@ -421,7 +421,11 @@ void __init init_ISA_irqs (void)
 {
 	int i;
 
+	if (!irqs_disabled())
+		printk("init_ISA_irqs(): -1  bug: interrupts were enabled early\n");
 	init_bsp_APIC();
+	if (!irqs_disabled())
+		printk("init_ISA_irqs(): -2  bug: interrupts were enabled early\n");
 	init_8259A(0);
 
 	for (i = 0; i < NR_IRQS; i++) {


* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
@ 2006-12-20  6:42 Chuck Ebbert
  2006-12-20  9:11 ` Yinghai Lu
  0 siblings, 1 reply; 42+ messages in thread
From: Chuck Ebbert @ 2006-12-20  6:42 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Zhang Yanmin, Eric W. Biederman, bugme-daemon, linux-kernel

In-Reply-To: <20061219172900.37312b38.akpm@osdl.org>

On Tue, 19 Dec 2006 17:29:00 -0800, Andrew Morton wrote:

> Quoting the bug report:

> general protection fault: 013b [1] PREEMPT 

That '013b' is critical information.

Bit 0: 1: exception source is external to the processor
Bit 1: 1: there is a problem with an interrupt descriptor in the IDT
Bit 2: n/a
Bits 15-3: index of the problem descriptor

So an external interrupt occurred, the system tried to use interrupt
descriptor #39 decimal (irq 7), but the descriptor was invalid.
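
The arithmetic, spelled out as a small sketch: 0x013b has bits 0 and 1 set,
and 0x013b >> 3 is 39. The struct and function names below are made up for
illustration, and the vector-to-irq step assumes external IRQ n is delivered
on vector 0x20 + n; that base is an assumption of this sketch, not something
shown in the oops.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a selector-format error code, per the bit layout listed above. */
struct sel_error_code {
	unsigned ext;    /* bit 0: source is external to the processor */
	unsigned idt;    /* bit 1: index refers to a descriptor in the IDT */
	unsigned ti;     /* bit 2: not applicable when idt is set */
	unsigned index;  /* bits 15-3: index of the problem descriptor */
};

static struct sel_error_code decode_sel_error_code(uint16_t code)
{
	struct sel_error_code d;

	d.ext = code & 1u;
	d.idt = (code >> 1) & 1u;
	d.ti = (code >> 2) & 1u;
	d.index = (code >> 3) & 0x1fffu;
	return d;
}

/* Assumption for illustration: IRQ n arrives on vector 0x20 + n. */
#define ASSUMED_FIRST_EXTERNAL_VECTOR 0x20u

static int vector_to_irq(unsigned vector)
{
	assert(vector >= ASSUMED_FIRST_EXTERNAL_VECTOR);
	return (int)(vector - ASSUMED_FIRST_EXTERNAL_VECTOR);
}
```

Feeding 0x013b through decode_sel_error_code() gives ext = 1, idt = 1 and
index = 39; under the assumed base, vector 39 corresponds to irq 7.
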
-- 
MBTI: IXTP



* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
  2006-12-18 16:48 ` Eric W. Biederman
@ 2006-12-20  1:29   ` Andrew Morton
  0 siblings, 0 replies; 42+ messages in thread
From: Andrew Morton @ 2006-12-20  1:29 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, bugme-daemon, Zhang Yanmin, linux-kernel,
	take, ard, agalanin

On Mon, 18 Dec 2006 09:48:01 -0700
ebiederm@xmission.com (Eric W. Biederman) wrote:

> bugme-daemon@bugzilla.kernel.org writes:
> 
> > http://bugzilla.kernel.org/show_bug.cgi?id=7505
> >
> > ------- Additional Comments From agalanin@mera.ru  2006-12-18 07:39 -------
> > OK, fixed.
> 
> 
> Greg.
> 
> It appears commit d71374dafbba7ec3f67371d3b7e9f6310a588808 which
> replaced the pci bus spinlock with a semaphore causes some systems not
> to boot.  I haven't a clue why.   
> 
> So I figure I would toss the ball over to your court to see if you can
> look and see what needs to happen to resolve this problem.
> 
> There appears to be at least one positive confirmation that reverting
> this patch fixes the problems.
> 

That's weird.

Quoting the bug report:


Here is the output from a kernel booted with the 'earlyprintk' option.

Linux version 2.6.19-rc5 (root@gaa) (gcc version 4.1.2 20060901 (prerelease) 
(Debian 4.1.1-13)) #2 PREEMPT Sat Nov 11 16:04:00 MSK 2006
Command line: BOOT_IMAGE=Linux-bug ro root=303 
video=radeonfb:mode:1024x768-16@60 idebus=66 earlyprintk=serial,ttyS0,9600,keep
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
 BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000001fff0000 (usable)
 BIOS-e820: 000000001fff0000 - 000000001fff3000 (ACPI NVS)
 BIOS-e820: 000000001fff3000 - 0000000020000000 (ACPI data)
 BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
end_pfn_map = 1048576
kernel direct mapping tables up to 100000000 @ 8000-d000
DMI 2.2 present.
Zone PFN ranges:
  DMA             0 ->     4096
  DMA32        4096 ->  1048576
  Normal    1048576 ->  1048576
early_node_map[2] active PFN ranges
    0:        0 ->      159
    0:      256 ->   131056
Nvidia board detected. Ignoring ACPI timer override.
ACPI: PM-Timer IO Port: 0x4008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] disabled)
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: BIOS IRQ0 pin2 override ignored.
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
Nosave address range: 000000000009f000 - 00000000000a0000
Nosave address range: 00000000000a0000 - 00000000000f0000
Nosave address range: 00000000000f0000 - 0000000000100000
Allocating PCI resources starting at 30000000 (gap: 20000000:c0000000)
Built 1 zonelists.  Total pages: 128336
Kernel command line: BOOT_IMAGE=Linux-bug ro root=303 
video=radeonfb:mode:1024x768-16@60 idebus=66 earlyprintk=serial,ttyS0,9600,keep
ide_setup: idebus=66
Initializing CPU#0
general protection fault: 013b [1] PREEMPT 
CPU 0 
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.19-rc5 #2
RIP: 0010:[<ffffffff8010fac6>]  [<ffffffff8010fac6>] init_8259A+0xb6/0xf0
RSP: 0018:ffffffff803cdf68  EFLAGS: 00010246
RAX: 00000000000000ff RBX: 0000000000000246 RCX: 00000000b4fcb55f
RDX: 0000000000000011 RSI: ffffffff8013cf40 RDI: 0000000000000199
RBP: 0000000000000000 R08: 00000000ffffffff R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000070 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff803c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000f0aed9 CR3: 0000000000101000 CR4: 00000000000006a0
Process swapper (pid: 0, threadinfo ffffffff803cc000, task ffffffff80360360)
Stack:  0000000000000000 ffffffff803d3a46 800089360a40206f 0000000000090000
 000000000008e000 ffffffff803d3ab9 0000000000000000 ffffffff803ddd99
 0000000000090000 ffffffff803cf65a 0000000000000000 0000000000090000
Call Trace:
 [<ffffffff803d3a46>] init_ISA_irqs+0x16/0x80
 [<ffffffff803d3ab9>] init_IRQ+0x9/0x1e0
 [<ffffffff803ddd99>] rcu_cpu_notify+0x49/0x60
 [<ffffffff803cf65a>] start_kernel+0xda/0x1f0
 [<ffffffff803cf146>] _sinittext+0x146/0x150


I assume we went splat in start_kernel->trap_init->cpu_init.  We shouldn't
have touched pci_bus_lock that early?  Perhaps acpi does PCI things very
early..

Conceivably an accidental early local_irq_enable could cause bad things,
but that rwsem should be 100% uncontended.

Could the reporters please determine whether disabling the various
CONFIG_DEBUG_* options prevents this?  Such as CONFIG_DEBUG_LOCKDEP,
CONFIG_DEBUG_LOCK_ALLOC, CONFIG_PROVE_LOCKING, etc?

Also, some additional oops traces would be nice, if we can get them.

(Please do reply-to-all via email from now on, rather than using the
bugzilla UI).



* Re: [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine
       [not found] <200612181543.kBIFhcIc001555@fire-2.osdl.org>
@ 2006-12-18 16:48 ` Eric W. Biederman
  2006-12-20  1:29   ` Andrew Morton
  0 siblings, 1 reply; 42+ messages in thread
From: Eric W. Biederman @ 2006-12-18 16:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: bugme-daemon, Zhang Yanmin, linux-kernel

bugme-daemon@bugzilla.kernel.org writes:

> http://bugzilla.kernel.org/show_bug.cgi?id=7505
>
> ------- Additional Comments From agalanin@mera.ru  2006-12-18 07:39 -------
> OK, fixed.


Greg.

It appears commit d71374dafbba7ec3f67371d3b7e9f6310a588808 which
replaced the pci bus spinlock with a semaphore causes some systems not
to boot.  I haven't a clue why.   

So I figure I would toss the ball over to your court to see if you can
look and see what needs to happen to resolve this problem.

There appears to be at least one positive confirmation that reverting
this patch fixes the problems.

Eric

end of thread

Thread overview: 42+ messages
2006-12-22  4:41 [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine Zhang, Yanmin
2006-12-22  8:22 ` Ard -kwaak- van Breemen
2006-12-22  8:30   ` Andrew Morton
2006-12-22  9:32     ` Stefano Takekawa
2006-12-22  9:43       ` Andrew Morton
2006-12-22 13:23         ` Stefano Takekawa
2006-12-22 10:30     ` Ard -kwaak- van Breemen
2006-12-22 14:00       ` Ard -kwaak- van Breemen
2006-12-22 14:16         ` Ard -kwaak- van Breemen
2006-12-22 19:10           ` Andrew Morton
2006-12-22 14:35     ` Ard -kwaak- van Breemen
2006-12-29 15:08       ` Ard -kwaak- van Breemen
2006-12-22 14:41     ` Ard -kwaak- van Breemen
2006-12-22 15:42       ` Ard -kwaak- van Breemen
2006-12-28 23:51         ` Andrew Morton
2006-12-29 10:18           ` Stefano Takekawa
2006-12-29 12:51           ` Ard -kwaak- van Breemen
2006-12-29 13:27             ` Ard -kwaak- van Breemen
2006-12-29 14:10               ` Ard -kwaak- van Breemen
2006-12-29 15:01                 ` Ard -kwaak- van Breemen
2006-12-29 15:05                   ` Ard -kwaak- van Breemen
2006-12-29 15:24                   ` Ard -kwaak- van Breemen
2006-12-29 15:42                   ` Ard -kwaak- van Breemen
2006-12-30 19:46                     ` [PATCH 2.6.20-rc2-git1] start_kernel: Test if irq's got enabled early, barf, and disable them again Ard -kwaak- van Breemen
2008-03-03 22:46                       ` Tony Luck
2008-03-04  0:34                         ` Stephen Rothwell
2006-12-30 19:58                     ` [PATCH 2.6.20-rc2-git1] kernelparams: detect if and which parameter parsing enabled irq's Ard -kwaak- van Breemen
2006-12-30 20:15                     ` [PATCH 2.6.20-rc2-git1] PCI: prevent down_read when pci_devices is empty Ard -kwaak- van Breemen
  -- strict thread matches above, loose matches on Subject: below --
2006-12-21  8:04 [Bug 7505] Linux-2.6.18 fails to boot on AMD64 machine Zhang, Yanmin
2006-12-21 19:52 ` Ard -kwaak- van Breemen
2006-12-21 20:11   ` Andrew Morton
2006-12-21 21:05 ` Ard -kwaak- van Breemen
2006-12-22 18:42 ` Ard -kwaak- van Breemen
2006-12-22 19:39   ` Stefano Takekawa
2006-12-20  9:59 Chuck Ebbert
2006-12-20 10:12 ` Yinghai Lu
2006-12-20 10:37 ` Andrew Morton
2006-12-20 10:55   ` Arjan van de Ven
2006-12-20  6:42 Chuck Ebbert
2006-12-20  9:11 ` Yinghai Lu
     [not found] <200612181543.kBIFhcIc001555@fire-2.osdl.org>
2006-12-18 16:48 ` Eric W. Biederman
2006-12-20  1:29   ` Andrew Morton
