[APIC] Avoid change apic_id failure panic
diff mbox series

Message ID 20040603101313.GB6578@gondor.apana.org.au
State New, archived
Headers show
Series
  • [APIC] Avoid change apic_id failure panic
Related show

Commit Message

Herbert Xu June 3, 2004, 10:13 a.m. UTC
Hi:

I've received two reports at http://bugs.debian.org/251207 where
ioapic caused machines to lock up during booting due to the
change apic_id panic in arch/i386/kernel/io_apic.c.

Since it appears that we can avoid panicking at all, I think we
should replace the panic calls with the following patch which
attempts to continue after the failure.

I've also done the same thing to the other panic() call in the
same function.

Cheers,

Comments

Andrew Morton June 3, 2004, 11:20 p.m. UTC | #1
Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> I've received two reports at http://bugs.debian.org/251207 where
> ioapic caused machines to lock up during booting due to the
> change apic_id panic in arch/i386/kernel/io_apic.c.
> 
> Since it appears that we can avoid panicking at all, I think we
> should replace the panic calls with the following patch which
> attempts to continue after the failure.
> 
> I've also done the same thing to the other panic() call in the
> same function.

Well.  Question is, why are we getting insame APIC ID's in there in the
first place?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
William Lee Irwin III June 3, 2004, 11:37 p.m. UTC | #2
Herbert Xu <herbert@gondor.apana.org.au> wrote:
>> I've received two reports at http://bugs.debian.org/251207 where
>> ioapic caused machines to lock up during booting due to the
>> change apic_id panic in arch/i386/kernel/io_apic.c.
>> Since it appears that we can avoid panicking at all, I think we
>> should replace the panic calls with the following patch which
>> attempts to continue after the failure.
>> I've also done the same thing to the other panic() call in the
>> same function.

On Thu, Jun 03, 2004 at 04:20:45PM -0700, Andrew Morton wrote:
> Well.  Question is, why are we getting insame APIC ID's in there in the
> first place?

They're usually not insane. xAPIC's have 8-bit physical ID's, not 4-bit,
but no one's bothered tracking whether the APIC's found were serial APIC
or xAPIC, and then dispatching on that in the IO-APIC physid check to
avoid unnecessarily renumbering the things or panicking.

This is my longstanding complaint regarding APIC_BROADCAST_ID being wrong.


-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Andrew Morton June 3, 2004, 11:44 p.m. UTC | #3
William Lee Irwin III <wli@holomorphy.com> wrote:
>
> Herbert Xu <herbert@gondor.apana.org.au> wrote:
> >> I've received two reports at http://bugs.debian.org/251207 where
> >> ioapic caused machines to lock up during booting due to the
> >> change apic_id panic in arch/i386/kernel/io_apic.c.
> >> Since it appears that we can avoid panicking at all, I think we
> >> should replace the panic calls with the following patch which
> >> attempts to continue after the failure.
> >> I've also done the same thing to the other panic() call in the
> >> same function.
> 
> On Thu, Jun 03, 2004 at 04:20:45PM -0700, Andrew Morton wrote:
> > Well.  Question is, why are we getting insame APIC ID's in there in the
> > first place?
> 
> They're usually not insane. xAPIC's have 8-bit physical ID's, not 4-bit,
> but no one's bothered tracking whether the APIC's found were serial APIC
> or xAPIC, and then dispatching on that in the IO-APIC physid check to
> avoid unnecessarily renumbering the things or panicking.

So... what should we do?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
William Lee Irwin III June 3, 2004, 11:48 p.m. UTC | #4
William Lee Irwin III <wli@holomorphy.com> wrote:
>> They're usually not insane. xAPIC's have 8-bit physical ID's, not 4-bit,
>> but no one's bothered tracking whether the APIC's found were serial APIC
>> or xAPIC, and then dispatching on that in the IO-APIC physid check to
>> avoid unnecessarily renumbering the things or panicking.

On Thu, Jun 03, 2004 at 04:44:15PM -0700, Andrew Morton wrote:
> So... what should we do?

In all likelihood, make the thing a variable or return a value based
on the result of GET_APIC_VERSION(), particularly since guessing wrong
either way takes out machines before console_init(). Returning values
based on the result of GET_APIC_VERSION() has a predecent, get_maxlvt().


-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Patch
diff mbox series

diff -Nru a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
--- a/arch/i386/kernel/io_apic.c	2004-06-03 20:02:42 +10:00
+++ b/arch/i386/kernel/io_apic.c	2004-06-03 20:02:42 +10:00
@@ -2365,8 +2365,10 @@ 
 				break;
 		}
 
-		if (i == IO_APIC_MAX_ID)
-			panic("Max apic_id exceeded!\n");
+		if (i == IO_APIC_MAX_ID) {
+			printk(KERN_ERR "Max apic_id exceeded!\n");
+			return -1;
+		}
 
 		printk(KERN_WARNING "IOAPIC[%d]: apic_id %d already used, "
 			"trying %d\n", ioapic, apic_id, i);
@@ -2374,9 +2376,6 @@ 
 		apic_id = i;
 	} 
 
-	tmp = apicid_to_cpu_present(apic_id);
-	physids_or(apic_id_map, apic_id_map, tmp);
-
 	if (reg_00.bits.ID != apic_id) {
 		reg_00.bits.ID = apic_id;
 
@@ -2386,9 +2385,15 @@ 
 		spin_unlock_irqrestore(&ioapic_lock, flags);
 
 		/* Sanity check */
-		if (reg_00.bits.ID != apic_id)
-			panic("IOAPIC[%d]: Unable change apic_id!\n", ioapic);
+		if (reg_00.bits.ID != apic_id) {
+			printk(KERN_ERR "IOAPIC[%d]: Unable change apic_id!\n",
+			       ioapic);
+			return -1;
+		}
 	}
+
+	tmp = apicid_to_cpu_present(apic_id);
+	physids_or(apic_id_map, apic_id_map, tmp);
 
 	printk(KERN_INFO "IOAPIC[%d]: Assigned apic_id %d\n", ioapic, apic_id);
 
diff -Nru a/arch/i386/kernel/mpparse.c b/arch/i386/kernel/mpparse.c
--- a/arch/i386/kernel/mpparse.c	2004-06-03 20:02:41 +10:00
+++ b/arch/i386/kernel/mpparse.c	2004-06-03 20:02:41 +10:00
@@ -881,6 +881,7 @@ 
 	u32			gsi_base)
 {
 	int			idx = 0;
+	int			apicid;
 
 	if (nr_ioapics >= MAX_IO_APICS) {
 		printk(KERN_ERR "ERROR: Max # of I/O APICs (%d) exceeded "
@@ -893,14 +894,19 @@ 
 		return;
 	}
 
-	idx = nr_ioapics++;
+	idx = nr_ioapics;
 
 	mp_ioapics[idx].mpc_type = MP_IOAPIC;
 	mp_ioapics[idx].mpc_flags = MPC_APIC_USABLE;
 	mp_ioapics[idx].mpc_apicaddr = address;
 
 	set_fixmap_nocache(FIX_IO_APIC_BASE_0 + idx, address);
-	mp_ioapics[idx].mpc_apicid = io_apic_get_unique_id(idx, id);
+	apicid = io_apic_get_unique_id(idx, id);
+	if (apicid < 0) {
+		clear_fixmap(FIX_IO_APIC_BASE_0 + idx);
+		return;
+	}
+	mp_ioapics[idx].mpc_apicid = apicid;
 	mp_ioapics[idx].mpc_apicver = io_apic_get_version(idx);
 	
 	/* 
@@ -912,6 +918,7 @@ 
 	mp_ioapic_routing[idx].gsi_end = gsi_base + 
 		io_apic_get_redir_entries(idx);
 
+	nr_ioapics++;
 	printk("IOAPIC[%d]: apic_id %d, version %d, address 0x%lx, "
 		"GSI %d-%d\n", idx, mp_ioapics[idx].mpc_apicid, 
 		mp_ioapics[idx].mpc_apicver, mp_ioapics[idx].mpc_apicaddr,