linux-kernel.vger.kernel.org archive mirror
* [BUGFIX PATCH] kexec & iosapic: kexec oops when iosapic was removed
@ 2012-08-10  7:23 Hanjun Guo
  2012-08-11  3:10 ` [BUGFIX PATCH][RESEND] " Hanjun Guo
  0 siblings, 1 reply; 6+ messages in thread
From: Hanjun Guo @ 2012-08-10  7:23 UTC (permalink / raw)
  To: linux-kernel, linux-pci
  Cc: Eric Biederman, Vivek Goyal, Haren Myneni, Tony Luck, Jiang Liu,
	Toshi Kani, Yinghai Lu, Yasuaki Ishimatsu, Taku Izumi,
	Wen Congyang, Tang Chen, Hanjun Guo, Jianguo Wu

Hi all,

We are working on a node hot-plug project, and the IOSAPIC is one of the
devices that can be removed. However, when we used kexec to start a new
kernel after an IOSAPIC had been removed, an oops occurred.

I reviewed the code and found the following call path:
iosapic_remove
  iosapic_free
    memset(&iosapic_lists[index], 0, sizeof(iosapic_lists[0]))
      iosapic_lists[index].addr was set to 0;

Then, when we kexec a new kernel:
kexec_disable_iosapic
  iosapic_write(rte->iosapic,..)
    __iosapic_write(iosapic->addr, reg, val);
      addr was set to 0 by iosapic_remove, so the write faults and the oops happens
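
To make the failure mode concrete, here is a minimal user-space sketch of the
sequence above. It is not the kernel code: iosapic_model, rte_model,
iosapic_table, model_iosapic_free() and model_write_would_fault() are
hypothetical stand-ins, and the real structures carry more fields. It only
shows that zeroing the table slot leaves the RTE's pointer intact while the
address behind it becomes 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct iosapic_model {
	volatile unsigned int *addr;	/* register window; NULL once the slot is freed */
	unsigned int gsi_base;
};

struct rte_model {
	struct iosapic_model *iosapic;	/* keeps pointing into the table after removal */
	int rte_index;
};

#define NR_IOSAPICS_MODEL 4
static struct iosapic_model iosapic_table[NR_IOSAPICS_MODEL];

/* Models the memset() in iosapic_free(): the slot is zeroed, nothing else changes. */
static void model_iosapic_free(int index)
{
	assert(index >= 0 && index < NR_IOSAPICS_MODEL);
	memset(&iosapic_table[index], 0, sizeof(iosapic_table[0]));
}

/* Returns 1 if a register write through this RTE would use address 0. */
static int model_write_would_fault(const struct rte_model *rte)
{
	assert(rte != NULL && rte->iosapic != NULL);
	return rte->iosapic->addr == NULL;
}
```

For a slot whose addr is set, model_write_would_fault() returns 0. After
model_iosapic_free() on that slot it returns 1 for the same RTE, which
corresponds to the fault in kexec_disable_iosapic.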

Here is the oops information:

Starting new kernel
kexec[11336]: Oops 8804682956800 [1]
Modules linked in: raw(N) ipv6(N) acpi_cpufreq(N) binfmt_misc(N) fuse(N)
nls_iso8859_1(N) loop(N) ipmi_si(N) ipmi_devintf(N) ipmi_msghandler(N)
mca_ereport(N) scsi_ereport(N) nic_ereport(N) pcie_ereport(N) err_transport(N)
nvlist(PN) dm_mod(N) tpm_tis(N) tpm(N) ppdev(N) tpm_bios(N) serio_raw(N)
i2c_i801(N) iTCO_wdt(N) i2c_core(N) iTCO_vendor_support(N) sg(N) ioatdma(N)
igb(N) mptctl(N) dca(N) parport_pc(N) parport(N) container(N) button(N)
usbhid(N) hid(N) uhci_hcd(N) ehci_hcd(N) usbcore(N) sd_mod(N) crc_t10dif(N)
ext3(N) mbcache(N) jbd(N) fan(N) processor(N) ide_pci_generic(N) ide_core(N)
ata_piix(N) libata(N) mptsas(N) mptscsih(N) mptbase(N) scsi_transport_sas(N)
scsi_mod(N) thermal(N) thermal_sys(N) hwmon(N)

Supported: Yes, External

Pid: 11336, CPU 0, comm:                kexec
psr : 0000101009522030 ifs : 8000000000000791 ip  : [<a00000010004c160>]    Tainted: P          N  (2.6.32.12_RAS_V1R3C00B011)
ip is at kexec_disable_iosapic+0x120/0x1e0
unat: 0000000000000000 pfs : 0000000000000791 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : 65519aa6a555a659
ldrs: 0000000000000000 ccv : 00000000ea3cf51e fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a00000010004c150 b6  : a000000100012620 b7  : a00000010000cda0
f6  : 000000000000000000000 f7  : 1003e0000000002000000
f8  : 1003e0000000050000003 f9  : 1003e0000028fb97183cd
f10 : 1003ee9f380df3c548b67 f11 : 1003e00000000000000cc
r1  : a0000001016cf660 r2  : 0000000000000000 r3  : 0000000000000000
r8  : 0000001009526030 r9  : a000000100012620 r10 : e00000010053f600
r11 : c0000000fec34040 r12 : e00000078f76fd30 r13 : e00000078f760000
r14 : 0000000000000000 r15 : 0000000000000000 r16 : 0000000000000000
r17 : 0000000000000000 r18 : 0000000000007fff r19 : 0000000000000000
r20 : 0000000000000000 r21 : e00000010053f590 r22 : a000000100cf0000
r23 : 0000000000000036 r24 : e0000007002f8a84 r25 : 0000000000000022
r26 : e0000007002f8a88 r27 : 0000000000000020 r28 : 0000000000000002
r29 : a0000001012c8c60 r30 : 0000000000000000 r31 : 0000000000322e49

Call Trace:
 [<a000000100018ca0>] show_stack+0x80/0xa0
                                sp=e00000078f76f8f0 bsp=e00000078f761380
 [<a000000100019300>] show_regs+0x640/0x920
                                sp=e00000078f76fac0 bsp=e00000078f761328
 [<a00000010002a130>] die+0x190/0x2e0
                                sp=e00000078f76fad0 bsp=e00000078f7612e8
 [<a000000100922fa0>] ia64_do_page_fault+0x840/0xb20
                                sp=e00000078f76fad0 bsp=e00000078f761288
 [<a00000010000d5c0>] ia64_native_leave_kernel+0x0/0x270
                                sp=e00000078f76fb60 bsp=e00000078f761288
 [<a00000010004c160>] kexec_disable_iosapic+0x120/0x1e0
                                sp=e00000078f76fd30 bsp=e00000078f761200
 [<a000000100016970>] machine_shutdown+0x110/0x140
                                sp=e00000078f76fd30 bsp=e00000078f7611c8
 [<a000000100133530>] kernel_kexec+0xd0/0x120
                                sp=e00000078f76fd30 bsp=e00000078f7611a0
 [<a0000001000eca40>] sys_reboot+0x480/0x4e0
                                sp=e00000078f76fd30 bsp=e00000078f761128
 [<a00000010000d420>] ia64_ret_from_syscall+0x0/0x20
                                sp=e00000078f76fe30 bsp=e00000078f761120
Kernel panic - not syncing: Fatal exception
irq 69: nobody cared (try booting with the "irqpoll" option)


Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
---
 arch/ia64/kernel/iosapic.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kernel/iosapic.c b/arch/ia64/kernel/iosapic.c
index ef4b5d8..11ce1ec 100644
--- a/arch/ia64/kernel/iosapic.c
+++ b/arch/ia64/kernel/iosapic.c
@@ -276,6 +276,9 @@ kexec_disable_iosapic(void)
 		vec = irq_to_vector(irq);
 		list_for_each_entry(rte, &info->rtes,
 				rte_list) {
+			if (rte->refcnt == NO_REF_RTE)
+				continue;
+
 			iosapic_write(rte->iosapic,
 					IOSAPIC_RTE_LOW(rte->rte_index),
 					IOSAPIC_MASK|vec);
-- 
1.7.6.1




* [BUGFIX PATCH][RESEND] kexec & iosapic: kexec oops when iosapic was removed
  2012-08-10  7:23 [BUGFIX PATCH] kexec & iosapic: kexec oops when iosapic was removed Hanjun Guo
@ 2012-08-11  3:10 ` Hanjun Guo
  2012-08-13  2:54   ` Luck, Tony
  0 siblings, 1 reply; 6+ messages in thread
From: Hanjun Guo @ 2012-08-11  3:10 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu
  Cc: linux-ia64, linux-kernel, Jiang Liu, Eric Biederman, Vivek Goyal,
	Haren Myneni, Toshi Kani, Yinghai Lu, Yasuaki Ishimatsu,
	Taku Izumi, Wen Congyang, Tang Chen, Hanjun Guo, Jianguo Wu

Hi all,

We are working on a node hot-plug project, and the IOSAPIC is one of the
devices that can be removed. However, when we used kexec to start a new
kernel after an IOSAPIC had been removed, an oops occurred.

I reviewed the code and found the following call path:
iosapic_remove
  iosapic_free
    memset(&iosapic_lists[index], 0, sizeof(iosapic_lists[0]))
      iosapic_lists[index].addr was set to 0;

Then, when we kexec a new kernel:
kexec_disable_iosapic
  iosapic_write(rte->iosapic,..)
    __iosapic_write(iosapic->addr, reg, val);
      addr was set to 0 by iosapic_remove, so the write faults and the oops happens

Here is the oops information:

Starting new kernel
kexec[11336]: Oops 8804682956800 [1]
Modules linked in: raw(N) ipv6(N) acpi_cpufreq(N) binfmt_misc(N) fuse(N)
nls_iso8859_1(N) loop(N) ipmi_si(N) ipmi_devintf(N) ipmi_msghandler(N)
mca_ereport(N) scsi_ereport(N) nic_ereport(N) pcie_ereport(N) err_transport(N)
nvlist(PN) dm_mod(N) tpm_tis(N) tpm(N) ppdev(N) tpm_bios(N) serio_raw(N)
i2c_i801(N) iTCO_wdt(N) i2c_core(N) iTCO_vendor_support(N) sg(N) ioatdma(N)
igb(N) mptctl(N) dca(N) parport_pc(N) parport(N) container(N) button(N)
usbhid(N) hid(N) uhci_hcd(N) ehci_hcd(N) usbcore(N) sd_mod(N) crc_t10dif(N)
ext3(N) mbcache(N) jbd(N) fan(N) processor(N) ide_pci_generic(N) ide_core(N)
ata_piix(N) libata(N) mptsas(N) mptscsih(N) mptbase(N) scsi_transport_sas(N)
scsi_mod(N) thermal(N) thermal_sys(N) hwmon(N)

Supported: Yes, External

Pid: 11336, CPU 0, comm:                kexec
psr : 0000101009522030 ifs : 8000000000000791 ip  : [<a00000010004c160>]    Tainted: P          N  (2.6.32.12_RAS_V1R3C00B011)
ip is at kexec_disable_iosapic+0x120/0x1e0
unat: 0000000000000000 pfs : 0000000000000791 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : 65519aa6a555a659
ldrs: 0000000000000000 ccv : 00000000ea3cf51e fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a00000010004c150 b6  : a000000100012620 b7  : a00000010000cda0
f6  : 000000000000000000000 f7  : 1003e0000000002000000
f8  : 1003e0000000050000003 f9  : 1003e0000028fb97183cd
f10 : 1003ee9f380df3c548b67 f11 : 1003e00000000000000cc
r1  : a0000001016cf660 r2  : 0000000000000000 r3  : 0000000000000000
r8  : 0000001009526030 r9  : a000000100012620 r10 : e00000010053f600
r11 : c0000000fec34040 r12 : e00000078f76fd30 r13 : e00000078f760000
r14 : 0000000000000000 r15 : 0000000000000000 r16 : 0000000000000000
r17 : 0000000000000000 r18 : 0000000000007fff r19 : 0000000000000000
r20 : 0000000000000000 r21 : e00000010053f590 r22 : a000000100cf0000
r23 : 0000000000000036 r24 : e0000007002f8a84 r25 : 0000000000000022
r26 : e0000007002f8a88 r27 : 0000000000000020 r28 : 0000000000000002
r29 : a0000001012c8c60 r30 : 0000000000000000 r31 : 0000000000322e49

Call Trace:
 [<a000000100018ca0>] show_stack+0x80/0xa0
                                sp=e00000078f76f8f0 bsp=e00000078f761380
 [<a000000100019300>] show_regs+0x640/0x920
                                sp=e00000078f76fac0 bsp=e00000078f761328
 [<a00000010002a130>] die+0x190/0x2e0
                                sp=e00000078f76fad0 bsp=e00000078f7612e8
 [<a000000100922fa0>] ia64_do_page_fault+0x840/0xb20
                                sp=e00000078f76fad0 bsp=e00000078f761288
 [<a00000010000d5c0>] ia64_native_leave_kernel+0x0/0x270
                                sp=e00000078f76fb60 bsp=e00000078f761288
 [<a00000010004c160>] kexec_disable_iosapic+0x120/0x1e0
                                sp=e00000078f76fd30 bsp=e00000078f761200
 [<a000000100016970>] machine_shutdown+0x110/0x140
                                sp=e00000078f76fd30 bsp=e00000078f7611c8
 [<a000000100133530>] kernel_kexec+0xd0/0x120
                                sp=e00000078f76fd30 bsp=e00000078f7611a0
 [<a0000001000eca40>] sys_reboot+0x480/0x4e0
                                sp=e00000078f76fd30 bsp=e00000078f761128
 [<a00000010000d420>] ia64_ret_from_syscall+0x0/0x20
                                sp=e00000078f76fe30 bsp=e00000078f761120
Kernel panic - not syncing: Fatal exception
irq 69: nobody cared (try booting with the "irqpoll" option)


Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
---
 arch/ia64/kernel/iosapic.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kernel/iosapic.c b/arch/ia64/kernel/iosapic.c
index ef4b5d8..11ce1ec 100644
--- a/arch/ia64/kernel/iosapic.c
+++ b/arch/ia64/kernel/iosapic.c
@@ -276,6 +276,9 @@ kexec_disable_iosapic(void)
 		vec = irq_to_vector(irq);
 		list_for_each_entry(rte, &info->rtes,
 				rte_list) {
+			if (rte->refcnt == NO_REF_RTE)
+				continue;
+
 			iosapic_write(rte->iosapic,
 					IOSAPIC_RTE_LOW(rte->rte_index),
 					IOSAPIC_MASK|vec);
-- 
1.7.6.1




* RE: [BUGFIX PATCH][RESEND] kexec & iosapic: kexec oops when iosapic was removed
  2012-08-11  3:10 ` [BUGFIX PATCH][RESEND] " Hanjun Guo
@ 2012-08-13  2:54   ` Luck, Tony
  2012-08-16 10:28     ` Hanjun Guo
  0 siblings, 1 reply; 6+ messages in thread
From: Luck, Tony @ 2012-08-13  2:54 UTC (permalink / raw)
  To: Hanjun Guo, Yu, Fenghua
  Cc: linux-ia64, linux-kernel, Jiang Liu, Eric Biederman, Vivek Goyal,
	Haren Myneni, Toshi Kani, Yinghai Lu, Yasuaki Ishimatsu,
	Taku Izumi, Wen Congyang, Tang Chen, Jianguo Wu

>		vec = irq_to_vector(irq);
> 		list_for_each_entry(rte, &info->rtes,
> 				rte_list) {
> +			if (rte->refcnt == NO_REF_RTE)
> +				continue;
> +
> 			iosapic_write(rte->iosapic,
> 					IOSAPIC_RTE_LOW(rte->rte_index),
> 					IOSAPIC_MASK|vec);

This will work - but is it papering over a problem when you removed the
iosapic? Should we really have removed this "rte" from rte_list when the
iosapic was removed?
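
As a rough illustration of that alternative, the sketch below unlinks every
entry that refers to a removed iosapic from a list. It is a user-space model
under assumptions: the kernel uses struct list_head and list_for_each_entry,
whereas this uses a hypothetical singly linked rte_model, and
model_unlink_rtes() is an invented name rather than an existing function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the kernel structures differ. */
struct iosapic_model {
	int id;
};

struct rte_model {
	struct rte_model *next;
	struct iosapic_model *iosapic;
};

/*
 * Unlink every RTE on the list that refers to `gone`.
 * Returns the number of entries unlinked. The entries themselves are
 * not freed here; the caller owns their storage.
 */
static int model_unlink_rtes(struct rte_model **head,
			     const struct iosapic_model *gone)
{
	int unlinked = 0;

	assert(head != NULL && gone != NULL);
	while (*head != NULL) {
		if ((*head)->iosapic == gone) {
			struct rte_model *victim = *head;

			*head = victim->next;	/* bypass the stale entry */
			victim->next = NULL;
			unlinked++;
		} else {
			head = &(*head)->next;	/* keep this entry, move on */
		}
	}
	return unlinked;
}
```

With the stale entries gone from the list, a later walk such as the one in
kexec_disable_iosapic would never reach an RTE whose iosapic was removed, so
no per-entry check would be needed there.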

-Tony


* Re: [BUGFIX PATCH][RESEND] kexec & iosapic: kexec oops when iosapic was removed
  2012-08-13  2:54   ` Luck, Tony
@ 2012-08-16 10:28     ` Hanjun Guo
  2012-08-16 19:33       ` Toshi Kani
  0 siblings, 1 reply; 6+ messages in thread
From: Hanjun Guo @ 2012-08-16 10:28 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Yu, Fenghua, linux-ia64, linux-kernel, Jiang Liu, Eric Biederman,
	Vivek Goyal, Haren Myneni, Toshi Kani, Yinghai Lu,
	Yasuaki Ishimatsu, Taku Izumi, Wen Congyang, Tang Chen,
	Jianguo Wu

On 2012/8/13 10:54, Luck, Tony wrote:
>> 		vec = irq_to_vector(irq);
>> 		list_for_each_entry(rte, &info->rtes,
>> 				rte_list) {
>> +			if (rte->refcnt == NO_REF_RTE)
>> +				continue;
>> +
>> 			iosapic_write(rte->iosapic,
>> 					IOSAPIC_RTE_LOW(rte->rte_index),
>> 					IOSAPIC_MASK|vec);
> 
> This will work - but is it papering over a problem when you removed the
> iosapic? Should we really have removed this "rte" from rte_list when the
> iosapic was removed?
> 
> -Tony
> 

Hi Tony,
    Thanks for your comments, and sorry for the late reply.

We only set rte->refcnt to NO_REF_RTE when a GSI is unregistered and no device
remains attached to that RTE, and we increase rte->refcnt when a GSI is
registered and the RTE already exists, so the "rte" is not removed from
rte_list when the iosapic is removed.

In fact, the rte_list stays static when an existing iosapic is removed or
added after boot.

Should we remove the RTE from the rte_list? If so, there is more work to do
than this patch covers.
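
The refcnt behaviour described above can be modelled roughly as follows. This
is a user-space sketch, not the code in iosapic.c: rte_model, the model_*
helpers and the `linked` flag are hypothetical, MODEL_NO_REF_RTE is assumed to
be 0, and registration is simplified to always increment the count.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NO_REF_RTE 0	/* assumed sentinel: no device uses this RTE */

/* Hypothetical stand-in for an RTE entry. */
struct rte_model {
	int refcnt;
	int linked;	/* 1 while the entry sits on its rte_list */
};

/* Registering a GSI takes a reference and makes sure the entry is listed. */
static void model_register_gsi(struct rte_model *rte)
{
	assert(rte != NULL);
	rte->refcnt++;
	rte->linked = 1;
}

/* Unregistering drops a reference but deliberately never unlinks the entry. */
static void model_unregister_gsi(struct rte_model *rte)
{
	assert(rte != NULL && rte->refcnt > MODEL_NO_REF_RTE);
	rte->refcnt--;
}

/* The check the patch adds: only entries that are still referenced get masked. */
static int model_should_mask(const struct rte_model *rte)
{
	assert(rte != NULL);
	return rte->linked && rte->refcnt != MODEL_NO_REF_RTE;
}
```

After the last model_unregister_gsi() the entry is still linked but
model_should_mask() returns 0, which is the situation the refcnt check in the
patch relies on.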

Thanks
Hanjun Guo



* Re: [BUGFIX PATCH][RESEND] kexec & iosapic: kexec oops when iosapic was removed
  2012-08-16 10:28     ` Hanjun Guo
@ 2012-08-16 19:33       ` Toshi Kani
  2012-08-20  7:46         ` Hanjun Guo
  0 siblings, 1 reply; 6+ messages in thread
From: Toshi Kani @ 2012-08-16 19:33 UTC (permalink / raw)
  To: Hanjun Guo
  Cc: Luck, Tony, Yu, Fenghua, linux-ia64, linux-kernel, Jiang Liu,
	Eric Biederman, Vivek Goyal, Haren Myneni, Yinghai Lu,
	Yasuaki Ishimatsu, Taku Izumi, Wen Congyang, Tang Chen,
	Jianguo Wu

On Thu, 2012-08-16 at 18:28 +0800, Hanjun Guo wrote:
> On 2012/8/13 10:54, Luck, Tony wrote:
> >> 		vec = irq_to_vector(irq);
> >> 		list_for_each_entry(rte, &info->rtes,
> >> 				rte_list) {
> >> +			if (rte->refcnt == NO_REF_RTE)
> >> +				continue;
> >> +
> >> 			iosapic_write(rte->iosapic,
> >> 					IOSAPIC_RTE_LOW(rte->rte_index),
> >> 					IOSAPIC_MASK|vec);
> > 
> > This will work - but is it papering over a problem when you removed the
> > iosapic? Should we really have removed this "rte" from rte_list when the
> > iosapic was removed?
> > 
> > -Tony
> > 
> 
> Hi Tony,
>     Thanks for your comments, and sorry for the late reply.
> 
> We only set rte->refcnt to NO_REF_RTE when a GSI is unregistered and no device
> remains attached to that RTE, and we increase rte->refcnt when a GSI is
> registered and the RTE already exists, so the "rte" is not removed from
> rte_list when the iosapic is removed.

Hi Hanjun,

I think updating rte->refcnt makes sense as long as rte->iosapic points
to a valid iosapic entry.  It looks odd to me that rte->iosapic is left
pointing to an invalid iosapic entry after this iosapic is removed.  So, I
agree with Tony's concern.

Thanks,
-Toshi


> In fact, the rte_list stays static when an existing iosapic is removed or
> added after boot.
> 
> Should we remove the RTE from the rte_list? If so, there is more work to do
> than this patch covers.
> 
> Thanks
> Hanjun Guo
> 




* Re: [BUGFIX PATCH][RESEND] kexec & iosapic: kexec oops when iosapic was removed
  2012-08-16 19:33       ` Toshi Kani
@ 2012-08-20  7:46         ` Hanjun Guo
  0 siblings, 0 replies; 6+ messages in thread
From: Hanjun Guo @ 2012-08-20  7:46 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Luck, Tony, Yu, Fenghua, linux-ia64, linux-kernel, Jiang Liu,
	Eric Biederman, Vivek Goyal, Haren Myneni, Yinghai Lu,
	Yasuaki Ishimatsu, Taku Izumi, Wen Congyang, Tang Chen,
	Jianguo Wu

On 2012/8/17 3:33, Toshi Kani wrote:
> On Thu, 2012-08-16 at 18:28 +0800, Hanjun Guo wrote:
>> On 2012/8/13 10:54, Luck, Tony wrote:
>>>> 		vec = irq_to_vector(irq);
>>>> 		list_for_each_entry(rte, &info->rtes,
>>>> 				rte_list) {
>>>> +			if (rte->refcnt == NO_REF_RTE)
>>>> +				continue;
>>>> +
>>>> 			iosapic_write(rte->iosapic,
>>>> 					IOSAPIC_RTE_LOW(rte->rte_index),
>>>> 					IOSAPIC_MASK|vec);
>>>
>>> This will work - but is it papering over a problem when you removed the
>>> iosapic? Should we really have removed this "rte" from rte_list when the
>>> iosapic was removed?
>>>
>>> -Tony
>>>
>>
>> Hi Tony,
>>     Thanks for your comments, and sorry for the late reply.
>>
>> We only set rte->refcnt to NO_REF_RTE when a GSI is unregistered and no device
>> remains attached to that RTE, and we increase rte->refcnt when a GSI is
>> registered and the RTE already exists, so the "rte" is not removed from
>> rte_list when the iosapic is removed.
> 
> Hi Hanjun,
> 
> I think updating rte->refcnt makes sense as long as rte->iosapic points
> to a valid iosapic entry.  It looks odd to me that rte->iosapic is left
> pointing to an invalid iosapic entry after this iosapic is removed.  So, I
> agree with Tony's concern.

Hi Toshi,

I agree with you and Tony.
I will find a better solution, do some cleanup, and then send another patch.

Thanks
Hanjun Guo




