linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/3] PCI/AER: Fix NULL pci_ops return when hotplug a pci bus which was doing aer error inject
@ 2012-08-11 11:52 Yijing Wang
  2012-08-11 11:52 ` [PATCH 2/3] PCI/AER: Clean pci_bus_ops when related pci bus was removed Yijing Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Yijing Wang @ 2012-08-11 11:52 UTC (permalink / raw)
  To: yijingwang21, Bjorn Helgaas, linux-pci
  Cc: Hanjun Guo, Jiang Liu, Yinghai Lu, Huang Ying, Yijing Wang, Jiang Liu

When we inject aer errors to the target pci device by aer_inject module, the pci_ops of pci
bus which the target device is on will be assign to pci_ops_aer.So if the target pci device
is a bridge, once we hotplug the pci bus(child bus) which the target device bridges to, child
bus's pci_ops will be assigned to pci_ops_aer too.Now every access to the child bus's device
will result to system panic, because it return NULL pci_ops in pci_read_aer.

CallTrace:
bash[5908]: NaT consumption 17179869216 [1]
Modules linked in: aer_inject cpufreq_conservative cpufreq_userspace cpufreq_pow
ersave acpi_cpufreq binfmt_misc fuse nls_iso8859_1 loop ipmi_si(+) ipmi_devintf
ipmi_msghandler dm_mod ppdev iTCO_wdt iTCO_vendor_support sg igb parport_pc i2c_
i801 mptctl i2c_core serio_raw hid_generic lpc_ich mfd_core parport button conta
iner usbhid hid uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif ext3 mbca
che jbd fan processor ide_pci_generic ide_core ata_piix libata mptsas mptscsih m
ptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon

Pid: 5908, CPU 9, comm:                 bash
psr : 00001010085a2010 ifs : 800000000000048e ip  : [<a000000220b815b0>]    Not
tainted (3.5.0-rc6yijing-repo)
ip is at pci_read_aer+0x330/0x460 [aer_inject]
unat: 0000000000000000 pfs : 000000000000048e rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : 65519aa6a6969aa5
ldrs: 0000000000000000 ccv : ffffffff00000001 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a000000220b815b0 b6  : a000000220b81280 b7  : a0000001006d56a0
f6  : 1003e0000000000000005 f7  : 1003e0000000000000028
f8  : 1003e00000000000000c8 f9  : 1003e0000000000000005
f10 : 1003e627ec1e2f4c0d8a7 f11 : 1003e0000000000000011
r1  : a0000001014e63c0 r2  : 0000000000000738 r3  : 000000000000fffe
r8  : 0000000000000736 r9  : 0000000000000042 r10 : e000001f08f4c898
r11 : 0000000000000000 r12 : e000000f3dfcfdc0 r13 : e000000f3dfc0000
r14 : 0000000000000738 r15 : 0000000000004000 r16 : a000000220b827c8
r17 : a000000220b827b8 r18 : ffffffffffffff00 r19 : e000000f073b0110
r20 : 0000000000000042 r21 : e000000f073b0114 r22 : 0000000000000000
r23 : e000000f073b0118 r24 : a0000001009e0e49 r25 : 0000000000000001
r26 : 0000000000007041 r27 : e000000f3dfcfde0 r28 : 0000000000000000
r29 : e000000f3dfcfc08 r30 : a000000220b827c8 r31 : e000001f074d6000

Call Trace:
 [<a000000100016500>] show_stack+0x80/0xa0
                                sp=e000000f3dfcf800 bsp=e000000f3dfc1758
 [<a000000100016b60>] show_regs+0x640/0x920
                                sp=e000000f3dfcf9d0 bsp=e000000f3dfc1700
 [<a000000100040770>] die+0x190/0x2c0
                                sp=e000000f3dfcf9e0 bsp=e000000f3dfc16c0
 [<a0000001000408f0>] die_if_kernel+0x50/0x80
                                sp=e000000f3dfcf9e0 bsp=e000000f3dfc1690
 [<a000000100903a90>] ia64_fault+0xf0/0x15e0
                                sp=e000000f3dfcf9e0 bsp=e000000f3dfc1640
 [<a00000010000c0a0>] ia64_native_leave_kernel+0x0/0x270
                                sp=e000000f3dfcfbf0 bsp=e000000f3dfc1640
 [<a000000220b815b0>] pci_read_aer+0x330/0x460 [aer_inject]
                                sp=e000000f3dfcfdc0 bsp=e000000f3dfc15c8
 [<a0000001004ace00>] pci_bus_read_config_dword+0xe0/0x140
                                sp=e000000f3dfcfdc0 bsp=e000000f3dfc1580
 [<a0000001004b0c10>] pci_bus_read_dev_vendor_id+0x50/0x200
                                sp=e000000f3dfcfdd0 bsp=e000000f3dfc1530
 [<a0000001008d3d10>] pci_scan_single_device+0x90/0x200
                                sp=e000000f3dfcfdd0 bsp=e000000f3dfc14f8
 [<a0000001004b24b0>] pci_scan_slot+0xb0/0x320
                                sp=e000000f3dfcfde0 bsp=e000000f3dfc14a8
 [<a0000001008d9e90>] pci_scan_child_bus+0x90/0x2e0
                                sp=e000000f3dfcfde0 bsp=e000000f3dfc1468
 [<a0000001008d9580>] pci_scan_bridge+0x540/0xdc0
                                sp=e000000f3dfcfde0 bsp=e000000f3dfc13d0
 [<a0000001008da0b0>] pci_scan_child_bus+0x2b0/0x2e0
                                sp=e000000f3dfcfe00 bsp=e000000f3dfc1390
 [<a0000001008d5bd0>] pci_rescan_bus+0x50/0x220
                                sp=e000000f3dfcfe00 bsp=e000000f3dfc1358
 [<a0000001004c2ab0>] bus_rescan_store+0xf0/0x160
                                sp=e000000f3dfcfe10 bsp=e000000f3dfc1328
 [<a0000001006110b0>] bus_attr_store+0x70/0xa0
                                sp=e000000f3dfcfe20 bsp=e000000f3dfc12f0
 [<a000000100343b00>] sysfs_write_file+0x240/0x340
                                sp=e000000f3dfcfe20 bsp=e000000f3dfc1298
 [<a00000010025e230>] vfs_write+0x1b0/0x3a0
                                sp=e000000f3dfcfe20 bsp=e000000f3dfc1250
 [<a00000010025e5e0>] sys_write+0x80/0x100
                                sp=e000000f3dfcfe20 bsp=e000000f3dfc11d0
 [<a00000010000bf20>] ia64_ret_from_syscall+0x0/0x20
                                sp=e000000f3dfcfe30 bsp=e000000f3dfc11d0
 [<a000000000040720>] __kernel_syscall_via_break+0x0/0x20
                                sp=e000000f3dfd0000 bsp=e000000f3dfc11d0
Disabling lock debugging due to kernel taint

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Jiang Liu <liuj97@gmail.com>
---
 drivers/pci/pcie/aer/aer_inject.c |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pcie/aer/aer_inject.c b/drivers/pci/pcie/aer/aer_inject.c
index 5222986..fc28785 100644
--- a/drivers/pci/pcie/aer/aer_inject.c
+++ b/drivers/pci/pcie/aer/aer_inject.c
@@ -109,6 +109,19 @@ static struct aer_error *__find_aer_error_by_dev(struct pci_dev *dev)
 	return __find_aer_error((u16)domain, dev->bus->number, dev->devfn);
 }
 
+static bool pci_is_upstream_bus(struct pci_bus *bus, struct pci_bus *up_bus)
+{
+	struct pci_bus *pbus = bus->parent;
+
+	while (pbus) {
+		if (pbus == up_bus)
+			return true;
+		pbus = pbus->parent;
+	}
+
+	return false;
+}
+
 /* inject_lock must be held before calling */
 static struct pci_ops *__find_pci_bus_ops(struct pci_bus *bus)
 {
@@ -118,6 +131,13 @@ static struct pci_ops *__find_pci_bus_ops(struct pci_bus *bus)
 		if (bus_ops->bus == bus)
 			return bus_ops->ops;
 	}
+
+	/* here can't find bus_ops, fall back to get bus_ops of upstream bus */
+	list_for_each_entry(bus_ops, &pci_bus_ops_list, list) {
+		if (pci_is_upstream_bus(bus, bus_ops->bus))
+			return bus_ops->ops;
+	}
+
 	return NULL;
 }
 
@@ -506,6 +526,7 @@ static struct miscdevice aer_inject_device = {
 	.fops = &aer_inject_fops,
 };
 
+
 static int __init aer_inject_init(void)
 {
 	return misc_register(&aer_inject_device);
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/3] PCI/AER: Clean pci_bus_ops when related pci bus was removed
  2012-08-11 11:52 [PATCH 1/3] PCI/AER: Fix NULL pci_ops return when hotplug a pci bus which was doing aer error inject Yijing Wang
@ 2012-08-11 11:52 ` Yijing Wang
  2012-08-25  9:59   ` [RESEND BUGFIX PATCH 2/3] PCI/AER: clean " Yijing Wang
  2012-08-11 11:52 ` [PATCH 3/3] PCI: Check whether pci device has been removed when remove a pci device by sysfs Yijing Wang
  2012-08-25  9:59 ` [RESEND BUGFIX PATCH 1/3] PCI/AER: fix pci_ops return NULL when hotplug a pci bus which was doing aer error inject Yijing Wang
  2 siblings, 1 reply; 14+ messages in thread
From: Yijing Wang @ 2012-08-11 11:52 UTC (permalink / raw)
  To: yijingwang21, Bjorn Helgaas, linux-pci
  Cc: Hanjun Guo, Jiang Liu, Yinghai Lu, Huang Ying, Yijing Wang

When Inject aer errors to the target pci device, a pci_bus_ops will
be allocated for the pci device's pci bus.When the pci bus was removed,
we should also release the pci_bus_ops.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
---
 drivers/pci/pcie/aer/aer_inject.c |   43 ++++++++++++++++++++++++++++++++++++-
 1 files changed, 42 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/pcie/aer/aer_inject.c b/drivers/pci/pcie/aer/aer_inject.c
index fc28785..8ca744f 100644
--- a/drivers/pci/pcie/aer/aer_inject.c
+++ b/drivers/pci/pcie/aer/aer_inject.c
@@ -526,10 +526,50 @@ static struct miscdevice aer_inject_device = {
 	.fops = &aer_inject_fops,
 };
 
+static void aer_clean_pci_bus_ops(struct pci_dev *dev)
+{
+	unsigned long flags;
+	struct pci_bus_ops *bus_ops, *tmp_ops;
+	struct pci_bus *bus;
+	bus = dev->subordinate;
+	if (!bus)
+		return;
+
+	spin_lock_irqsave(&inject_lock, flags);
+	list_for_each_entry_safe(bus_ops, tmp_ops, &pci_bus_ops_list, list)
+		if (bus_ops->bus == bus) {
+			list_del(&bus_ops->list);
+			kfree(bus_ops);
+			break;
+		}
+	spin_unlock_irqrestore(&inject_lock, flags);
+}
+
+static int aer_hp_notify_fn(struct notifier_block *nb,
+		unsigned long event, void *data)
+{
+	switch (event) {
+	case BUS_NOTIFY_DEL_DEVICE:
+		aer_clean_pci_bus_ops(to_pci_dev(data));
+		break;
+	default:
+		return NOTIFY_DONE;
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block aerinj_hp_notifier = {
+	.notifier_call = &aer_hp_notify_fn,
+};
 
 static int __init aer_inject_init(void)
 {
-	return misc_register(&aer_inject_device);
+	int ret;
+	ret = misc_register(&aer_inject_device);
+	if (!ret)
+		bus_register_notifier(&pci_bus_type, &aerinj_hp_notifier);
+	return ret;
 }
 
 static void __exit aer_inject_exit(void)
@@ -538,6 +578,7 @@ static void __exit aer_inject_exit(void)
 	unsigned long flags;
 	struct pci_bus_ops *bus_ops;
 
+	bus_unregister_notifier(&pci_bus_type, &aerinj_hp_notifier);
 	misc_deregister(&aer_inject_device);
 
 	while ((bus_ops = pci_bus_ops_pop())) {
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/3] PCI: Check whether pci device has been removed when remove a pci device by sysfs
  2012-08-11 11:52 [PATCH 1/3] PCI/AER: Fix NULL pci_ops return when hotplug a pci bus which was doing aer error inject Yijing Wang
  2012-08-11 11:52 ` [PATCH 2/3] PCI/AER: Clean pci_bus_ops when related pci bus was removed Yijing Wang
@ 2012-08-11 11:52 ` Yijing Wang
  2012-08-25  9:59   ` [RESEND BUGFIX PATCH 3/3] PCI: check " Yijing Wang
  2012-08-25  9:59 ` [RESEND BUGFIX PATCH 1/3] PCI/AER: fix pci_ops return NULL when hotplug a pci bus which was doing aer error inject Yijing Wang
  2 siblings, 1 reply; 14+ messages in thread
From: Yijing Wang @ 2012-08-11 11:52 UTC (permalink / raw)
  To: yijingwang21, Bjorn Helgaas, linux-pci
  Cc: Hanjun Guo, Jiang Liu, Yinghai Lu, Huang Ying, Yijing Wang

We remove a pci device maybe like this
echo 1 > /sys/bus/pci/devices/xxxx:xx:xx.x/remove
Then remove_store function will be called to complete this remove work,
later the remove work will be queued to sysfs_workqueue by device_schedule_callback.
So if we remove a pci root port device and a pci endpoint device which was the root
port's child device concurrently.The endponit device will be removed when root port's
remove work completed,so when endpoint device itself's remove work start, since endpoint
device has been removed, it will result to oops.

CallTrace:
kworker/u:2[220]: Oops 11003706212352 [1]
Modules linked in: cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi
_cpufreq binfmt_misc fuse nls_iso8859_1 loop ipmi_si ipmi_devintf ipmi_msghandle
r dm_mod igb ppdev iTCO_wdt parport_pc iTCO_vendor_support i2c_i801 parport sg m
ptctl serio_raw i2c_core lpc_ich mfd_core hid_generic button container usbhid hi
d uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif ext3 mbcache jbd fan pr
ocessor ide_pci_generic ide_core ata_piix libata mptsas mptscsih mptbase scsi_tr
ansport_sas scsi_mod thermal thermal_sys hwmon

Pid: 220, CPU 30, comm:          kworker/u:2
psr : 0000121008526030 ifs : 8000000000000388 ip  : [<a0000001004b3081>]    Not
tainted (3.5.0-rc6yijing-repo)
ip is at __pci_remove_bus_device+0x101/0x1e0
unat: 0000000000000000 pfs : 0000000000000388 rsc : 0000000000000003
rnat: ffffffffffffffff bsps: ffffffffffffffff pr  : 0000080001919585
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c9e70433f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000001004b3060 b6  : a0000001004c2400 b7  : a0000001000faae0
f6  : 000000000000000000000 f7  : 1003e00000000000057cd
f8  : 1003e0000000050000003 f9  : 1003e000001cb8678a0d0
f10 : 1003e9a05b7a39369e270 f11 : 1003e000000000000008f
r1  : a0000001014e63c0 r2  : e000001f075dec00 r3  : 0000000000000000
r8  : 0000000000000008 r9  : a0000001012e7308 r10 : 0000000004000000
r11 : e000000f0006e800 r12 : e000001f08dbfe00 r13 : e000001f08db0000
r14 : 0000000000000000 r15 : 0000000000000000 r16 : 0000000000000000
r17 : e000000f0006f008 r18 : 000000000f000000 r19 : a0000001012f3910
r20 : 0000000000100001 r21 : a000000101a62990 r22 : a000000100344580
r23 : 0000000000000000 r24 : 0000000000001000 r25 : 0000000000000000
r26 : a000000101a62988 r27 : e000003f0fc37e60 r28 : e000003f0fc37e68
r29 : e000002f07012be0 r30 : 0000000082aa0260 r31 : 0000000000004000

Call Trace:
 [<a000000100016500>] show_stack+0x80/0xa0
                                sp=e000001f08dbf9c0 bsp=e000001f08db1388
 [<a000000100016b60>] show_regs+0x640/0x920
                                sp=e000001f08dbfb90 bsp=e000001f08db1330
 [<a000000100040770>] die+0x190/0x2c0
                                sp=e000001f08dbfba0 bsp=e000001f08db12f0
 [<a000000100908f60>] ia64_do_page_fault+0x7e0/0xac0
                                sp=e000001f08dbfba0 bsp=e000001f08db1290
 [<a00000010000c0a0>] ia64_native_leave_kernel+0x0/0x270
                                sp=e000001f08dbfc30 bsp=e000001f08db1290
 [<a0000001004b3080>] __pci_remove_bus_device+0x100/0x1e0
                                sp=e000001f08dbfe00 bsp=e000001f08db1250
 [<a0000001004b32f0>] pci_stop_and_remove_bus_device+0x30/0x60
                                sp=e000001f08dbfe00 bsp=e000001f08db1230
 [<a0000001004c2440>] remove_callback+0x40/0x80
                                sp=e000001f08dbfe00 bsp=e000001f08db1208
 [<a0000001003445d0>] sysfs_schedule_callback_work+0x50/0x120
                                sp=e000001f08dbfe00 bsp=e000001f08db11d0
 [<a0000001000bc2d0>] process_one_work+0x6f0/0xae0
                                sp=e000001f08dbfe00 bsp=e000001f08db1158
 [<a0000001000bcf70>] worker_thread+0x3b0/0xc80
                                sp=e000001f08dbfe00 bsp=e000001f08db1060
 [<a0000001000cf050>] kthread+0x110/0x140
                                sp=e000001f08dbfe00 bsp=e000001f08db1028
 [<a000000100014590>] kernel_thread_helper+0x30/0x60
                                sp=e000001f08dbfe30 bsp=e000001f08db1000
 [<a00000010000a0c0>] start_kernel_thread+0x20/0x40
                                sp=e000001f08dbfe30 bsp=e000001f08db1000
Disabling lock debugging due to kernel taint
Unable to handle kernel NULL pointer dereference (address 0000000000000048)
kworker/u:2[220]: Oops 11012296146944 [2]

Pid: 220, CPU 30, comm:          kworker/u:2
psr : 0000121008022038 ifs : 8000000000000288 ip  : [<a0000001000c4961>]    Tain
ted: G      D      (3.5.0-rc6yijing-repo)
ip is at wq_worker_sleeping+0x61/0x200
unat: 0000000000000000 pfs : 0000000000000288 rsc : 0000000000000003
rnat: 0000121008026038 bsps: a0000001000407e0 pr  : 965a684515516955
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000001000c4920 b6  : a0000001000f9fc0 b7  : a0000001000faae0
f6  : 000000000000000000000 f7  : 1003e9e3779b97f4a7c16
f8  : 1003e0000000050000003 f9  : 1003e000001cb87e8a5a8
f10 : 1003e9a78b92717b9f0f8 f11 : 1003e000000000000008f
r1  : a0000001014e63c0 r2  : 0000000000000000 r3  : fffffffffffc1200
r8  : 0000000000000000 r9  : 000000000000001e r10 : a000000101432530
r11 : a000000101432530 r12 : e000001f08dbfb70 r13 : e000001f08db0000
r14 : 0000000000001000 r15 : a000000101432620 r16 : e000003000245d40
r17 : fffffffffffc5c00 r18 : e000003000245d00 r19 : 00000000000000f8
r20 : e000001f08db0070 r21 : 0000000000000048 r22 : e000003000245ce8
r23 : e000003000245ce0 r24 : a000000101a638e0 r25 : ffffffffff48e500
r26 : e000003f088a0098 r27 : 0000000000000400 r28 : 0000000000000001
r29 : 000000000420806c r30 : e000001f08db0014 r31 : 0000000000000000

Call Trace:
 [<a000000100016500>] show_stack+0x80/0xa0
                                sp=e000001f08dbf730 bsp=e000001f08db16f8
 [<a000000100016b60>] show_regs+0x640/0x920
                                sp=e000001f08dbf900 bsp=e000001f08db16a0
 [<a000000100040770>] die+0x190/0x2c0
                                sp=e000001f08dbf910 bsp=e000001f08db1660
 [<a000000100908f60>] ia64_do_page_fault+0x7e0/0xac0
                                sp=e000001f08dbf910 bsp=e000001f08db1600
 [<a00000010000c0a0>] ia64_native_leave_kernel+0x0/0x270
                                sp=e000001f08dbf9a0 bsp=e000001f08db1600
 [<a0000001000c4960>] wq_worker_sleeping+0x60/0x200
                                sp=e000001f08dbfb70 bsp=e000001f08db15b8
 [<a0000001009007e0>] __schedule+0x14c0/0x18c0
                                sp=e000001f08dbfb70 bsp=e000001f08db1440
 [<a000000100900ea0>] schedule+0x60/0x140
                                sp=e000001f08dbfb80 bsp=e000001f08db13e0
 [<a000000100090d10>] do_exit+0xef0/0x1740
                                sp=e000001f08dbfb80 bsp=e000001f08db1330
 [<a000000100040840>] die+0x260/0x2c0
                                sp=e000001f08dbfba0 bsp=e000001f08db12f0
 [<a000000100908f60>] ia64_do_page_fault+0x7e0/0xac0
                                sp=e000001f08dbfba0 bsp=e000001f08db1290
 [<a00000010000c0a0>] ia64_native_leave_kernel+0x0/0x270
                                sp=e000001f08dbfc30 bsp=e000001f08db1290
 [<a0000001004b3080>] __pci_remove_bus_device+0x100/0x1e0
                                sp=e000001f08dbfe00 bsp=e000001f08db1250
 [<a0000001004b32f0>] pci_stop_and_remove_bus_device+0x30/0x60
                                sp=e000001f08dbfe00 bsp=e000001f08db1230
 [<a0000001004c2440>] remove_callback+0x40/0x80
                                sp=e000001f08dbfe00 bsp=e000001f08db1208
 [<a0000001003445d0>] sysfs_schedule_callback_work+0x50/0x120
                                sp=e000001f08dbfe00 bsp=e000001f08db11d0
 [<a0000001000bc2d0>] process_one_work+0x6f0/0xae0
                                sp=e000001f08dbfe00 bsp=e000001f08db1158
 [<a0000001000bcf70>] worker_thread+0x3b0/0xc80
                                sp=e000001f08dbfe00 bsp=e000001f08db1060
 [<a0000001000cf050>] kthread+0x110/0x140
                                sp=e000001f08dbfe00 bsp=e000001f08db1028
 [<a000000100014590>] kernel_thread_helper+0x30/0x60
                                sp=e000001f08dbfe30 bsp=e000001f08db1000
 [<a00000010000a0c0>] start_kernel_thread+0x20/0x40
                                sp=e000001f08dbfe30 bsp=e000001f08db1000
Fixing recursive fault but reboot is needed!
Modules linked in: cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi
_cpufreq binfmt_misc fuse nls_iso8859_1 loop ipmi_si ipmi_devintf ipmi_msghandle
r dm_mod igb ppdev iTCO_wdt parport_pc iTCO_vendor_support i2c_i801 parport sg m
ptctl serio_raw i2c_core lpc_ich mfd_core hid_generic button container usbhid hi
d uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif ext3 mbcache jbd fan pr
ocessor ide_pci_generic ide_core ata_piix libata mptsas mptscsih mptbase scsi_tr
ansport_sas scsi_mod thermal thermal_sys hwmon

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
---
 drivers/pci/pci-sysfs.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 6869009..b0be682 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -332,7 +332,10 @@ static void remove_callback(struct device *dev)
 	struct pci_dev *pdev = to_pci_dev(dev);
 
 	mutex_lock(&pci_remove_rescan_mutex);
+	if (!pdev->is_added)
+		goto out;
 	pci_stop_and_remove_bus_device(pdev);
+out:
 	mutex_unlock(&pci_remove_rescan_mutex);
 }
 
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RESEND BUGFIX PATCH 1/3] PCI/AER: fix pci_ops return NULL when hotplug a pci bus which was doing aer error inject
  2012-08-11 11:52 [PATCH 1/3] PCI/AER: Fix NULL pci_ops return when hotplug a pci bus which was doing aer error inject Yijing Wang
  2012-08-11 11:52 ` [PATCH 2/3] PCI/AER: Clean pci_bus_ops when related pci bus was removed Yijing Wang
  2012-08-11 11:52 ` [PATCH 3/3] PCI: Check whether pci device has been removed when remove a pci device by sysfs Yijing Wang
@ 2012-08-25  9:59 ` Yijing Wang
  2012-08-27  1:23   ` Huang Ying
  2012-08-27  8:49   ` Chen Gong
  2 siblings, 2 replies; 14+ messages in thread
From: Yijing Wang @ 2012-08-25  9:59 UTC (permalink / raw)
  To: Bjorn Helgaas, Rusty Russell, Mauro Carvalho Chehab
  Cc: PCI, Jiang Liu, Huang Ying, Hanjun Guo, linux-kernel

When we inject aer errors to the target pci device by aer_inject module, the pci_ops of pci
bus which the target device is on will be assign to pci_ops_aer.So if the target pci device
is a bridge, once we hotplug the pci bus(child bus) which the target device bridges to, child
bus's pci_ops will be assigned to pci_ops_aer too.Now every access to the child bus's device
will result to system panic, because it return NULL pci_ops in pci_read_aer.
This patch fix this.

CallTrace:
bash[5908]: NaT consumption 17179869216 [1]
Modules linked in: aer_inject cpufreq_conservative cpufreq_userspace cpufreq_pow
ersave acpi_cpufreq binfmt_misc fuse nls_iso8859_1 loop ipmi_si(+) ipmi_devintf
ipmi_msghandler dm_mod ppdev iTCO_wdt iTCO_vendor_support sg igb parport_pc i2c_
i801 mptctl i2c_core serio_raw hid_generic lpc_ich mfd_core parport button conta
iner usbhid hid uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif ext3 mbca
che jbd fan processor ide_pci_generic ide_core ata_piix libata mptsas mptscsih m
ptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon

Pid: 5908, CPU 9, comm:                 bash
psr : 00001010085a2010 ifs : 800000000000048e ip  : [<a000000220b815b0>]    Not
tainted (3.5.0-rc6yijing-repo)
ip is at pci_read_aer+0x330/0x460 [aer_inject]
unat: 0000000000000000 pfs : 000000000000048e rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : 65519aa6a6969aa5
ldrs: 0000000000000000 ccv : ffffffff00000001 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a000000220b815b0 b6  : a000000220b81280 b7  : a0000001006d56a0
f6  : 1003e0000000000000005 f7  : 1003e0000000000000028
f8  : 1003e00000000000000c8 f9  : 1003e0000000000000005
f10 : 1003e627ec1e2f4c0d8a7 f11 : 1003e0000000000000011
r1  : a0000001014e63c0 r2  : 0000000000000738 r3  : 000000000000fffe
r8  : 0000000000000736 r9  : 0000000000000042 r10 : e000001f08f4c898
r11 : 0000000000000000 r12 : e000000f3dfcfdc0 r13 : e000000f3dfc0000
r14 : 0000000000000738 r15 : 0000000000004000 r16 : a000000220b827c8
r17 : a000000220b827b8 r18 : ffffffffffffff00 r19 : e000000f073b0110
r20 : 0000000000000042 r21 : e000000f073b0114 r22 : 0000000000000000
r23 : e000000f073b0118 r24 : a0000001009e0e49 r25 : 0000000000000001
r26 : 0000000000007041 r27 : e000000f3dfcfde0 r28 : 0000000000000000
r29 : e000000f3dfcfc08 r30 : a000000220b827c8 r31 : e000001f074d6000

Call Trace:
 [<a000000100016500>] show_stack+0x80/0xa0
                                sp=e000000f3dfcf800 bsp=e000000f3dfc1758
 [<a000000100016b60>] show_regs+0x640/0x920
                                sp=e000000f3dfcf9d0 bsp=e000000f3dfc1700
 [<a000000100040770>] die+0x190/0x2c0
                                sp=e000000f3dfcf9e0 bsp=e000000f3dfc16c0
 [<a0000001000408f0>] die_if_kernel+0x50/0x80
                                sp=e000000f3dfcf9e0 bsp=e000000f3dfc1690
 [<a000000100903a90>] ia64_fault+0xf0/0x15e0
                                sp=e000000f3dfcf9e0 bsp=e000000f3dfc1640
 [<a00000010000c0a0>] ia64_native_leave_kernel+0x0/0x270
                                sp=e000000f3dfcfbf0 bsp=e000000f3dfc1640
 [<a000000220b815b0>] pci_read_aer+0x330/0x460 [aer_inject]
                                sp=e000000f3dfcfdc0 bsp=e000000f3dfc15c8
 [<a0000001004ace00>] pci_bus_read_config_dword+0xe0/0x140
                                sp=e000000f3dfcfdc0 bsp=e000000f3dfc1580
 [<a0000001004b0c10>] pci_bus_read_dev_vendor_id+0x50/0x200
                                sp=e000000f3dfcfdd0 bsp=e000000f3dfc1530
 [<a0000001008d3d10>] pci_scan_single_device+0x90/0x200
                                sp=e000000f3dfcfdd0 bsp=e000000f3dfc14f8
 [<a0000001004b24b0>] pci_scan_slot+0xb0/0x320
                                sp=e000000f3dfcfde0 bsp=e000000f3dfc14a8
 [<a0000001008d9e90>] pci_scan_child_bus+0x90/0x2e0
                                sp=e000000f3dfcfde0 bsp=e000000f3dfc1468
 [<a0000001008d9580>] pci_scan_bridge+0x540/0xdc0
                                sp=e000000f3dfcfde0 bsp=e000000f3dfc13d0
 [<a0000001008da0b0>] pci_scan_child_bus+0x2b0/0x2e0
                                sp=e000000f3dfcfe00 bsp=e000000f3dfc1390
 [<a0000001008d5bd0>] pci_rescan_bus+0x50/0x220
                                sp=e000000f3dfcfe00 bsp=e000000f3dfc1358
 [<a0000001004c2ab0>] bus_rescan_store+0xf0/0x160
                                sp=e000000f3dfcfe10 bsp=e000000f3dfc1328
 [<a0000001006110b0>] bus_attr_store+0x70/0xa0
                                sp=e000000f3dfcfe20 bsp=e000000f3dfc12f0
 [<a000000100343b00>] sysfs_write_file+0x240/0x340
                                sp=e000000f3dfcfe20 bsp=e000000f3dfc1298
 [<a00000010025e230>] vfs_write+0x1b0/0x3a0
                                sp=e000000f3dfcfe20 bsp=e000000f3dfc1250
 [<a00000010025e5e0>] sys_write+0x80/0x100
                                sp=e000000f3dfcfe20 bsp=e000000f3dfc11d0
 [<a00000010000bf20>] ia64_ret_from_syscall+0x0/0x20
                                sp=e000000f3dfcfe30 bsp=e000000f3dfc11d0
 [<a000000000040720>] __kernel_syscall_via_break+0x0/0x20
                                sp=e000000f3dfd0000 bsp=e000000f3dfc11d0
Disabling lock debugging due to kernel taint

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Jiang Liu <liuj97@gmail.com>
---
 drivers/pci/pcie/aer/aer_inject.c |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pcie/aer/aer_inject.c b/drivers/pci/pcie/aer/aer_inject.c
index 5222986..fc28785 100644
--- a/drivers/pci/pcie/aer/aer_inject.c
+++ b/drivers/pci/pcie/aer/aer_inject.c
@@ -109,6 +109,19 @@ static struct aer_error *__find_aer_error_by_dev(struct pci_dev *dev)
 	return __find_aer_error((u16)domain, dev->bus->number, dev->devfn);
 }

+static bool pci_is_upstream_bus(struct pci_bus *bus, struct pci_bus *up_bus)
+{
+	struct pci_bus *pbus = bus->parent;
+
+	while (pbus) {
+		if (pbus == up_bus)
+			return true;
+		pbus = pbus->parent;
+	}
+
+	return false;
+}
+
 /* inject_lock must be held before calling */
 static struct pci_ops *__find_pci_bus_ops(struct pci_bus *bus)
 {
@@ -118,6 +131,13 @@ static struct pci_ops *__find_pci_bus_ops(struct pci_bus *bus)
 		if (bus_ops->bus == bus)
 			return bus_ops->ops;
 	}
+
+	/* here can't find bus_ops, fall back to get bus_ops of upstream bus */
+	list_for_each_entry(bus_ops, &pci_bus_ops_list, list) {
+		if (pci_is_upstream_bus(bus, bus_ops->bus))
+			return bus_ops->ops;
+	}
+
 	return NULL;
 }

@@ -506,6 +526,7 @@ static struct miscdevice aer_inject_device = {
 	.fops = &aer_inject_fops,
 };

+
 static int __init aer_inject_init(void)
 {
 	return misc_register(&aer_inject_device);
-- 
1.7.1



.





^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RESEND BUGFIX PATCH 2/3] PCI/AER: clean pci_bus_ops when related pci bus was removed
  2012-08-11 11:52 ` [PATCH 2/3] PCI/AER: Clean pci_bus_ops when related pci bus was removed Yijing Wang
@ 2012-08-25  9:59   ` Yijing Wang
  0 siblings, 0 replies; 14+ messages in thread
From: Yijing Wang @ 2012-08-25  9:59 UTC (permalink / raw)
  To: Bjorn Helgaas, Rusty Russell, Mauro Carvalho Chehab
  Cc: PCI, Jiang Liu, Huang Ying, Hanjun Guo, linux-kernel

When Inject aer errors to the target pci device, a pci_bus_ops will
be allocated for the pci device's pci bus.When the pci bus was removed,
we should also release the pci_bus_ops.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
---
 drivers/pci/pcie/aer/aer_inject.c |   43 ++++++++++++++++++++++++++++++++++++-
 1 files changed, 42 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/pcie/aer/aer_inject.c b/drivers/pci/pcie/aer/aer_inject.c
index fc28785..8ca744f 100644
--- a/drivers/pci/pcie/aer/aer_inject.c
+++ b/drivers/pci/pcie/aer/aer_inject.c
@@ -526,10 +526,50 @@ static struct miscdevice aer_inject_device = {
 	.fops = &aer_inject_fops,
 };

+static void aer_clean_pci_bus_ops(struct pci_dev *dev)
+{
+	unsigned long flags;
+	struct pci_bus_ops *bus_ops, *tmp_ops;
+	struct pci_bus *bus;
+	bus = dev->subordinate;
+	if (!bus)
+		return;
+
+	spin_lock_irqsave(&inject_lock, flags);
+	list_for_each_entry_safe(bus_ops, tmp_ops, &pci_bus_ops_list, list)
+		if (bus_ops->bus == bus) {
+			list_del(&bus_ops->list);
+			kfree(bus_ops);
+			break;
+		}
+	spin_unlock_irqrestore(&inject_lock, flags);
+}
+
+static int aer_hp_notify_fn(struct notifier_block *nb,
+		unsigned long event, void *data)
+{
+	switch (event) {
+	case BUS_NOTIFY_DEL_DEVICE:
+		aer_clean_pci_bus_ops(to_pci_dev(data));
+		break;
+	default:
+		return NOTIFY_DONE;
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block aerinj_hp_notifier = {
+	.notifier_call = &aer_hp_notify_fn,
+};

 static int __init aer_inject_init(void)
 {
-	return misc_register(&aer_inject_device);
+	int ret;
+	ret = misc_register(&aer_inject_device);
+	if (!ret)
+		bus_register_notifier(&pci_bus_type, &aerinj_hp_notifier);
+	return ret;
 }

 static void __exit aer_inject_exit(void)
@@ -538,6 +578,7 @@ static void __exit aer_inject_exit(void)
 	unsigned long flags;
 	struct pci_bus_ops *bus_ops;

+	bus_unregister_notifier(&pci_bus_type, &aerinj_hp_notifier);
 	misc_deregister(&aer_inject_device);

 	while ((bus_ops = pci_bus_ops_pop())) {
-- 
1.7.1



.





^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RESEND BUGFIX PATCH 3/3] PCI: check whether pci device has been removed when remove a pci device by sysfs
  2012-08-11 11:52 ` [PATCH 3/3] PCI: Check whether pci device has been removed when remove a pci device by sysfs Yijing Wang
@ 2012-08-25  9:59   ` Yijing Wang
  2012-08-25 14:39     ` Jiang Liu
  0 siblings, 1 reply; 14+ messages in thread
From: Yijing Wang @ 2012-08-25  9:59 UTC (permalink / raw)
  To: Bjorn Helgaas, PCI, linux-kernel; +Cc: Jiang Liu

We remove a pci device maybe like this
echo 1 > /sys/bus/pci/devices/xxxx:xx:xx.x/remove
Then remove_store function will be called to complete this remove work,
later the remove work will be queued to sysfs_workqueue by device_schedule_callback.
So if we remove a pci root port device and a pci endpoint device which was the root
port's child device concurrently.The endponit device will be removed when root port's
remove work completed,so when endpoint device itself's remove work start, since endpoint
device has been removed, it will result to oops.
This patch fix this.

CallTrace:
kworker/u:2[220]: Oops 11003706212352 [1]
Modules linked in: cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi
_cpufreq binfmt_misc fuse nls_iso8859_1 loop ipmi_si ipmi_devintf ipmi_msghandle
r dm_mod igb ppdev iTCO_wdt parport_pc iTCO_vendor_support i2c_i801 parport sg m
ptctl serio_raw i2c_core lpc_ich mfd_core hid_generic button container usbhid hi
d uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif ext3 mbcache jbd fan pr
ocessor ide_pci_generic ide_core ata_piix libata mptsas mptscsih mptbase scsi_tr
ansport_sas scsi_mod thermal thermal_sys hwmon

Pid: 220, CPU 30, comm:          kworker/u:2
psr : 0000121008526030 ifs : 8000000000000388 ip  : [<a0000001004b3081>]    Not
tainted (3.5.0-rc6yijing-repo)
ip is at __pci_remove_bus_device+0x101/0x1e0
unat: 0000000000000000 pfs : 0000000000000388 rsc : 0000000000000003
rnat: ffffffffffffffff bsps: ffffffffffffffff pr  : 0000080001919585
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c9e70433f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000001004b3060 b6  : a0000001004c2400 b7  : a0000001000faae0
f6  : 000000000000000000000 f7  : 1003e00000000000057cd
f8  : 1003e0000000050000003 f9  : 1003e000001cb8678a0d0
f10 : 1003e9a05b7a39369e270 f11 : 1003e000000000000008f
r1  : a0000001014e63c0 r2  : e000001f075dec00 r3  : 0000000000000000
r8  : 0000000000000008 r9  : a0000001012e7308 r10 : 0000000004000000
r11 : e000000f0006e800 r12 : e000001f08dbfe00 r13 : e000001f08db0000
r14 : 0000000000000000 r15 : 0000000000000000 r16 : 0000000000000000
r17 : e000000f0006f008 r18 : 000000000f000000 r19 : a0000001012f3910
r20 : 0000000000100001 r21 : a000000101a62990 r22 : a000000100344580
r23 : 0000000000000000 r24 : 0000000000001000 r25 : 0000000000000000
r26 : a000000101a62988 r27 : e000003f0fc37e60 r28 : e000003f0fc37e68
r29 : e000002f07012be0 r30 : 0000000082aa0260 r31 : 0000000000004000

Call Trace:
 [<a000000100016500>] show_stack+0x80/0xa0
                                sp=e000001f08dbf9c0 bsp=e000001f08db1388
 [<a000000100016b60>] show_regs+0x640/0x920
                                sp=e000001f08dbfb90 bsp=e000001f08db1330
 [<a000000100040770>] die+0x190/0x2c0
                                sp=e000001f08dbfba0 bsp=e000001f08db12f0
 [<a000000100908f60>] ia64_do_page_fault+0x7e0/0xac0
                                sp=e000001f08dbfba0 bsp=e000001f08db1290
 [<a00000010000c0a0>] ia64_native_leave_kernel+0x0/0x270
                                sp=e000001f08dbfc30 bsp=e000001f08db1290
 [<a0000001004b3080>] __pci_remove_bus_device+0x100/0x1e0
                                sp=e000001f08dbfe00 bsp=e000001f08db1250
 [<a0000001004b32f0>] pci_stop_and_remove_bus_device+0x30/0x60
                                sp=e000001f08dbfe00 bsp=e000001f08db1230
 [<a0000001004c2440>] remove_callback+0x40/0x80
                                sp=e000001f08dbfe00 bsp=e000001f08db1208
 [<a0000001003445d0>] sysfs_schedule_callback_work+0x50/0x120
                                sp=e000001f08dbfe00 bsp=e000001f08db11d0
 [<a0000001000bc2d0>] process_one_work+0x6f0/0xae0
                                sp=e000001f08dbfe00 bsp=e000001f08db1158
 [<a0000001000bcf70>] worker_thread+0x3b0/0xc80
                                sp=e000001f08dbfe00 bsp=e000001f08db1060
 [<a0000001000cf050>] kthread+0x110/0x140
                                sp=e000001f08dbfe00 bsp=e000001f08db1028
 [<a000000100014590>] kernel_thread_helper+0x30/0x60
                                sp=e000001f08dbfe30 bsp=e000001f08db1000
 [<a00000010000a0c0>] start_kernel_thread+0x20/0x40
                                sp=e000001f08dbfe30 bsp=e000001f08db1000
Disabling lock debugging due to kernel taint
Unable to handle kernel NULL pointer dereference (address 0000000000000048)
kworker/u:2[220]: Oops 11012296146944 [2]

Pid: 220, CPU 30, comm:          kworker/u:2
psr : 0000121008022038 ifs : 8000000000000288 ip  : [<a0000001000c4961>]    Tain
ted: G      D      (3.5.0-rc6yijing-repo)
ip is at wq_worker_sleeping+0x61/0x200
unat: 0000000000000000 pfs : 0000000000000288 rsc : 0000000000000003
rnat: 0000121008026038 bsps: a0000001000407e0 pr  : 965a684515516955
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000001000c4920 b6  : a0000001000f9fc0 b7  : a0000001000faae0
f6  : 000000000000000000000 f7  : 1003e9e3779b97f4a7c16
f8  : 1003e0000000050000003 f9  : 1003e000001cb87e8a5a8
f10 : 1003e9a78b92717b9f0f8 f11 : 1003e000000000000008f
r1  : a0000001014e63c0 r2  : 0000000000000000 r3  : fffffffffffc1200
r8  : 0000000000000000 r9  : 000000000000001e r10 : a000000101432530
r11 : a000000101432530 r12 : e000001f08dbfb70 r13 : e000001f08db0000
r14 : 0000000000001000 r15 : a000000101432620 r16 : e000003000245d40
r17 : fffffffffffc5c00 r18 : e000003000245d00 r19 : 00000000000000f8
r20 : e000001f08db0070 r21 : 0000000000000048 r22 : e000003000245ce8
r23 : e000003000245ce0 r24 : a000000101a638e0 r25 : ffffffffff48e500
r26 : e000003f088a0098 r27 : 0000000000000400 r28 : 0000000000000001
r29 : 000000000420806c r30 : e000001f08db0014 r31 : 0000000000000000

Call Trace:
 [<a000000100016500>] show_stack+0x80/0xa0
                                sp=e000001f08dbf730 bsp=e000001f08db16f8
 [<a000000100016b60>] show_regs+0x640/0x920
                                sp=e000001f08dbf900 bsp=e000001f08db16a0
 [<a000000100040770>] die+0x190/0x2c0
                                sp=e000001f08dbf910 bsp=e000001f08db1660
 [<a000000100908f60>] ia64_do_page_fault+0x7e0/0xac0
                                sp=e000001f08dbf910 bsp=e000001f08db1600
 [<a00000010000c0a0>] ia64_native_leave_kernel+0x0/0x270
                                sp=e000001f08dbf9a0 bsp=e000001f08db1600
 [<a0000001000c4960>] wq_worker_sleeping+0x60/0x200
                                sp=e000001f08dbfb70 bsp=e000001f08db15b8
 [<a0000001009007e0>] __schedule+0x14c0/0x18c0
                                sp=e000001f08dbfb70 bsp=e000001f08db1440
 [<a000000100900ea0>] schedule+0x60/0x140
                                sp=e000001f08dbfb80 bsp=e000001f08db13e0
 [<a000000100090d10>] do_exit+0xef0/0x1740
                                sp=e000001f08dbfb80 bsp=e000001f08db1330
 [<a000000100040840>] die+0x260/0x2c0
                                sp=e000001f08dbfba0 bsp=e000001f08db12f0
 [<a000000100908f60>] ia64_do_page_fault+0x7e0/0xac0
                                sp=e000001f08dbfba0 bsp=e000001f08db1290
 [<a00000010000c0a0>] ia64_native_leave_kernel+0x0/0x270
                                sp=e000001f08dbfc30 bsp=e000001f08db1290
 [<a0000001004b3080>] __pci_remove_bus_device+0x100/0x1e0
                                sp=e000001f08dbfe00 bsp=e000001f08db1250
 [<a0000001004b32f0>] pci_stop_and_remove_bus_device+0x30/0x60
                                sp=e000001f08dbfe00 bsp=e000001f08db1230
 [<a0000001004c2440>] remove_callback+0x40/0x80
                                sp=e000001f08dbfe00 bsp=e000001f08db1208
 [<a0000001003445d0>] sysfs_schedule_callback_work+0x50/0x120
                                sp=e000001f08dbfe00 bsp=e000001f08db11d0
 [<a0000001000bc2d0>] process_one_work+0x6f0/0xae0
                                sp=e000001f08dbfe00 bsp=e000001f08db1158
 [<a0000001000bcf70>] worker_thread+0x3b0/0xc80
                                sp=e000001f08dbfe00 bsp=e000001f08db1060
 [<a0000001000cf050>] kthread+0x110/0x140
                                sp=e000001f08dbfe00 bsp=e000001f08db1028
 [<a000000100014590>] kernel_thread_helper+0x30/0x60
                                sp=e000001f08dbfe30 bsp=e000001f08db1000
 [<a00000010000a0c0>] start_kernel_thread+0x20/0x40
                                sp=e000001f08dbfe30 bsp=e000001f08db1000
Fixing recursive fault but reboot is needed!
Modules linked in: cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi
_cpufreq binfmt_misc fuse nls_iso8859_1 loop ipmi_si ipmi_devintf ipmi_msghandle
r dm_mod igb ppdev iTCO_wdt parport_pc iTCO_vendor_support i2c_i801 parport sg m
ptctl serio_raw i2c_core lpc_ich mfd_core hid_generic button container usbhid hi
d uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif ext3 mbcache jbd fan pr
ocessor ide_pci_generic ide_core ata_piix libata mptsas mptscsih mptbase scsi_tr
ansport_sas scsi_mod thermal thermal_sys hwmon

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
---
 drivers/pci/pci-sysfs.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 6869009..b0be682 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -332,7 +332,10 @@ static void remove_callback(struct device *dev)
 	struct pci_dev *pdev = to_pci_dev(dev);

 	mutex_lock(&pci_remove_rescan_mutex);
+	if (!pdev->is_added)
+		goto out;
 	pci_stop_and_remove_bus_device(pdev);
+out:
 	mutex_unlock(&pci_remove_rescan_mutex);
 }

-- 
1.7.1



.





^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RESEND BUGFIX PATCH 3/3] PCI: check whether pci device has been removed when remove a pci device by sysfs
  2012-08-25  9:59   ` [RESEND BUGFIX PATCH 3/3] PCI: check " Yijing Wang
@ 2012-08-25 14:39     ` Jiang Liu
  2012-08-27  6:42       ` Yijing Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Jiang Liu @ 2012-08-25 14:39 UTC (permalink / raw)
  To: Yijing Wang; +Cc: Bjorn Helgaas, PCI, linux-kernel

Hi Yijing,
	The patch only patially fix the issue, there exists still small race
condition window because pdev->is_added isn't a reliable flag to depend on.
	--Gerry

On 08/25/2012 05:59 PM, Yijing Wang wrote:
> We remove a pci device maybe like this
> echo 1 > /sys/bus/pci/devices/xxxx:xx:xx.x/remove
> Then remove_store function will be called to complete this remove work,
> later the remove work will be queued to sysfs_workqueue by device_schedule_callback.
> So if we remove a pci root port device and a pci endpoint device which was the root
> port's child device concurrently.The endponit device will be removed when root port's
> remove work completed,so when endpoint device itself's remove work start, since endpoint
> device has been removed, it will result to oops.
> This patch fix this.
> 
> CallTrace:
> kworker/u:2[220]: Oops 11003706212352 [1]
> Modules linked in: cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi
> _cpufreq binfmt_misc fuse nls_iso8859_1 loop ipmi_si ipmi_devintf ipmi_msghandle
> r dm_mod igb ppdev iTCO_wdt parport_pc iTCO_vendor_support i2c_i801 parport sg m
> ptctl serio_raw i2c_core lpc_ich mfd_core hid_generic button container usbhid hi
> d uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif ext3 mbcache jbd fan pr
> ocessor ide_pci_generic ide_core ata_piix libata mptsas mptscsih mptbase scsi_tr
> ansport_sas scsi_mod thermal thermal_sys hwmon
> 
> Pid: 220, CPU 30, comm:          kworker/u:2
> psr : 0000121008526030 ifs : 8000000000000388 ip  : [<a0000001004b3081>]    Not
> tainted (3.5.0-rc6yijing-repo)
> ip is at __pci_remove_bus_device+0x101/0x1e0
> unat: 0000000000000000 pfs : 0000000000000388 rsc : 0000000000000003
> rnat: ffffffffffffffff bsps: ffffffffffffffff pr  : 0000080001919585
> ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c9e70433f
> csd : 0000000000000000 ssd : 0000000000000000
> b0  : a0000001004b3060 b6  : a0000001004c2400 b7  : a0000001000faae0
> f6  : 000000000000000000000 f7  : 1003e00000000000057cd
> f8  : 1003e0000000050000003 f9  : 1003e000001cb8678a0d0
> f10 : 1003e9a05b7a39369e270 f11 : 1003e000000000000008f
> r1  : a0000001014e63c0 r2  : e000001f075dec00 r3  : 0000000000000000
> r8  : 0000000000000008 r9  : a0000001012e7308 r10 : 0000000004000000
> r11 : e000000f0006e800 r12 : e000001f08dbfe00 r13 : e000001f08db0000
> r14 : 0000000000000000 r15 : 0000000000000000 r16 : 0000000000000000
> r17 : e000000f0006f008 r18 : 000000000f000000 r19 : a0000001012f3910
> r20 : 0000000000100001 r21 : a000000101a62990 r22 : a000000100344580
> r23 : 0000000000000000 r24 : 0000000000001000 r25 : 0000000000000000
> r26 : a000000101a62988 r27 : e000003f0fc37e60 r28 : e000003f0fc37e68
> r29 : e000002f07012be0 r30 : 0000000082aa0260 r31 : 0000000000004000
> 
> Call Trace:
>  [<a000000100016500>] show_stack+0x80/0xa0
>                                 sp=e000001f08dbf9c0 bsp=e000001f08db1388
>  [<a000000100016b60>] show_regs+0x640/0x920
>                                 sp=e000001f08dbfb90 bsp=e000001f08db1330
>  [<a000000100040770>] die+0x190/0x2c0
>                                 sp=e000001f08dbfba0 bsp=e000001f08db12f0
>  [<a000000100908f60>] ia64_do_page_fault+0x7e0/0xac0
>                                 sp=e000001f08dbfba0 bsp=e000001f08db1290
>  [<a00000010000c0a0>] ia64_native_leave_kernel+0x0/0x270
>                                 sp=e000001f08dbfc30 bsp=e000001f08db1290
>  [<a0000001004b3080>] __pci_remove_bus_device+0x100/0x1e0
>                                 sp=e000001f08dbfe00 bsp=e000001f08db1250
>  [<a0000001004b32f0>] pci_stop_and_remove_bus_device+0x30/0x60
>                                 sp=e000001f08dbfe00 bsp=e000001f08db1230
>  [<a0000001004c2440>] remove_callback+0x40/0x80
>                                 sp=e000001f08dbfe00 bsp=e000001f08db1208
>  [<a0000001003445d0>] sysfs_schedule_callback_work+0x50/0x120
>                                 sp=e000001f08dbfe00 bsp=e000001f08db11d0
>  [<a0000001000bc2d0>] process_one_work+0x6f0/0xae0
>                                 sp=e000001f08dbfe00 bsp=e000001f08db1158
>  [<a0000001000bcf70>] worker_thread+0x3b0/0xc80
>                                 sp=e000001f08dbfe00 bsp=e000001f08db1060
>  [<a0000001000cf050>] kthread+0x110/0x140
>                                 sp=e000001f08dbfe00 bsp=e000001f08db1028
>  [<a000000100014590>] kernel_thread_helper+0x30/0x60
>                                 sp=e000001f08dbfe30 bsp=e000001f08db1000
>  [<a00000010000a0c0>] start_kernel_thread+0x20/0x40
>                                 sp=e000001f08dbfe30 bsp=e000001f08db1000
> Disabling lock debugging due to kernel taint
> Unable to handle kernel NULL pointer dereference (address 0000000000000048)
> kworker/u:2[220]: Oops 11012296146944 [2]
> 
> Pid: 220, CPU 30, comm:          kworker/u:2
> psr : 0000121008022038 ifs : 8000000000000288 ip  : [<a0000001000c4961>]    Tain
> ted: G      D      (3.5.0-rc6yijing-repo)
> ip is at wq_worker_sleeping+0x61/0x200
> unat: 0000000000000000 pfs : 0000000000000288 rsc : 0000000000000003
> rnat: 0000121008026038 bsps: a0000001000407e0 pr  : 965a684515516955
> ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
> csd : 0000000000000000 ssd : 0000000000000000
> b0  : a0000001000c4920 b6  : a0000001000f9fc0 b7  : a0000001000faae0
> f6  : 000000000000000000000 f7  : 1003e9e3779b97f4a7c16
> f8  : 1003e0000000050000003 f9  : 1003e000001cb87e8a5a8
> f10 : 1003e9a78b92717b9f0f8 f11 : 1003e000000000000008f
> r1  : a0000001014e63c0 r2  : 0000000000000000 r3  : fffffffffffc1200
> r8  : 0000000000000000 r9  : 000000000000001e r10 : a000000101432530
> r11 : a000000101432530 r12 : e000001f08dbfb70 r13 : e000001f08db0000
> r14 : 0000000000001000 r15 : a000000101432620 r16 : e000003000245d40
> r17 : fffffffffffc5c00 r18 : e000003000245d00 r19 : 00000000000000f8
> r20 : e000001f08db0070 r21 : 0000000000000048 r22 : e000003000245ce8
> r23 : e000003000245ce0 r24 : a000000101a638e0 r25 : ffffffffff48e500
> r26 : e000003f088a0098 r27 : 0000000000000400 r28 : 0000000000000001
> r29 : 000000000420806c r30 : e000001f08db0014 r31 : 0000000000000000
> 
> Call Trace:
>  [<a000000100016500>] show_stack+0x80/0xa0
>                                 sp=e000001f08dbf730 bsp=e000001f08db16f8
>  [<a000000100016b60>] show_regs+0x640/0x920
>                                 sp=e000001f08dbf900 bsp=e000001f08db16a0
>  [<a000000100040770>] die+0x190/0x2c0
>                                 sp=e000001f08dbf910 bsp=e000001f08db1660
>  [<a000000100908f60>] ia64_do_page_fault+0x7e0/0xac0
>                                 sp=e000001f08dbf910 bsp=e000001f08db1600
>  [<a00000010000c0a0>] ia64_native_leave_kernel+0x0/0x270
>                                 sp=e000001f08dbf9a0 bsp=e000001f08db1600
>  [<a0000001000c4960>] wq_worker_sleeping+0x60/0x200
>                                 sp=e000001f08dbfb70 bsp=e000001f08db15b8
>  [<a0000001009007e0>] __schedule+0x14c0/0x18c0
>                                 sp=e000001f08dbfb70 bsp=e000001f08db1440
>  [<a000000100900ea0>] schedule+0x60/0x140
>                                 sp=e000001f08dbfb80 bsp=e000001f08db13e0
>  [<a000000100090d10>] do_exit+0xef0/0x1740
>                                 sp=e000001f08dbfb80 bsp=e000001f08db1330
>  [<a000000100040840>] die+0x260/0x2c0
>                                 sp=e000001f08dbfba0 bsp=e000001f08db12f0
>  [<a000000100908f60>] ia64_do_page_fault+0x7e0/0xac0
>                                 sp=e000001f08dbfba0 bsp=e000001f08db1290
>  [<a00000010000c0a0>] ia64_native_leave_kernel+0x0/0x270
>                                 sp=e000001f08dbfc30 bsp=e000001f08db1290
>  [<a0000001004b3080>] __pci_remove_bus_device+0x100/0x1e0
>                                 sp=e000001f08dbfe00 bsp=e000001f08db1250
>  [<a0000001004b32f0>] pci_stop_and_remove_bus_device+0x30/0x60
>                                 sp=e000001f08dbfe00 bsp=e000001f08db1230
>  [<a0000001004c2440>] remove_callback+0x40/0x80
>                                 sp=e000001f08dbfe00 bsp=e000001f08db1208
>  [<a0000001003445d0>] sysfs_schedule_callback_work+0x50/0x120
>                                 sp=e000001f08dbfe00 bsp=e000001f08db11d0
>  [<a0000001000bc2d0>] process_one_work+0x6f0/0xae0
>                                 sp=e000001f08dbfe00 bsp=e000001f08db1158
>  [<a0000001000bcf70>] worker_thread+0x3b0/0xc80
>                                 sp=e000001f08dbfe00 bsp=e000001f08db1060
>  [<a0000001000cf050>] kthread+0x110/0x140
>                                 sp=e000001f08dbfe00 bsp=e000001f08db1028
>  [<a000000100014590>] kernel_thread_helper+0x30/0x60
>                                 sp=e000001f08dbfe30 bsp=e000001f08db1000
>  [<a00000010000a0c0>] start_kernel_thread+0x20/0x40
>                                 sp=e000001f08dbfe30 bsp=e000001f08db1000
> Fixing recursive fault but reboot is needed!
> Modules linked in: cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi
> _cpufreq binfmt_misc fuse nls_iso8859_1 loop ipmi_si ipmi_devintf ipmi_msghandle
> r dm_mod igb ppdev iTCO_wdt parport_pc iTCO_vendor_support i2c_i801 parport sg m
> ptctl serio_raw i2c_core lpc_ich mfd_core hid_generic button container usbhid hi
> d uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif ext3 mbcache jbd fan pr
> ocessor ide_pci_generic ide_core ata_piix libata mptsas mptscsih mptbase scsi_tr
> ansport_sas scsi_mod thermal thermal_sys hwmon
> 
> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
> ---
>  drivers/pci/pci-sysfs.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index 6869009..b0be682 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -332,7 +332,10 @@ static void remove_callback(struct device *dev)
>  	struct pci_dev *pdev = to_pci_dev(dev);
> 
>  	mutex_lock(&pci_remove_rescan_mutex);
> +	if (!pdev->is_added)
> +		goto out;
>  	pci_stop_and_remove_bus_device(pdev);
> +out:
>  	mutex_unlock(&pci_remove_rescan_mutex);
>  }
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RESEND BUGFIX PATCH 1/3] PCI/AER: fix pci_ops return NULL when hotplug a pci bus which was doing aer error inject
  2012-08-25  9:59 ` [RESEND BUGFIX PATCH 1/3] PCI/AER: fix pci_ops return NULL when hotplug a pci bus which was doing aer error inject Yijing Wang
@ 2012-08-27  1:23   ` Huang Ying
  2012-08-27 15:05     ` Jiang Liu
  2012-08-27  8:49   ` Chen Gong
  1 sibling, 1 reply; 14+ messages in thread
From: Huang Ying @ 2012-08-27  1:23 UTC (permalink / raw)
  To: Yijing Wang
  Cc: Bjorn Helgaas, Rusty Russell, Mauro Carvalho Chehab, PCI,
	Jiang Liu, Hanjun Guo, linux-kernel

On Sat, 2012-08-25 at 17:59 +0800, Yijing Wang wrote:
> When we inject aer errors to the target pci device by aer_inject module, the pci_ops of pci
> bus which the target device is on will be assign to pci_ops_aer.So if the target pci device
> is a bridge, once we hotplug the pci bus(child bus) which the target device bridges to, child
> bus's pci_ops will be assigned to pci_ops_aer too.Now every access to the child bus's device
> will result to system panic, because it return NULL pci_ops in pci_read_aer.
> This patch fix this.
> 
> CallTrace:
> bash[5908]: NaT consumption 17179869216 [1]
> Modules linked in: aer_inject cpufreq_conservative cpufreq_userspace cpufreq_pow
> ersave acpi_cpufreq binfmt_misc fuse nls_iso8859_1 loop ipmi_si(+) ipmi_devintf
> ipmi_msghandler dm_mod ppdev iTCO_wdt iTCO_vendor_support sg igb parport_pc i2c_
> i801 mptctl i2c_core serio_raw hid_generic lpc_ich mfd_core parport button conta
> iner usbhid hid uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif ext3 mbca
> che jbd fan processor ide_pci_generic ide_core ata_piix libata mptsas mptscsih m
> ptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
> 
> Pid: 5908, CPU 9, comm:                 bash
> psr : 00001010085a2010 ifs : 800000000000048e ip  : [<a000000220b815b0>]    Not
> tainted (3.5.0-rc6yijing-repo)
> ip is at pci_read_aer+0x330/0x460 [aer_inject]
> unat: 0000000000000000 pfs : 000000000000048e rsc : 0000000000000003
> rnat: 0000000000000000 bsps: 0000000000000000 pr  : 65519aa6a6969aa5
> ldrs: 0000000000000000 ccv : ffffffff00000001 fpsr: 0009804c8a70033f
> csd : 0000000000000000 ssd : 0000000000000000
> b0  : a000000220b815b0 b6  : a000000220b81280 b7  : a0000001006d56a0
> f6  : 1003e0000000000000005 f7  : 1003e0000000000000028
> f8  : 1003e00000000000000c8 f9  : 1003e0000000000000005
> f10 : 1003e627ec1e2f4c0d8a7 f11 : 1003e0000000000000011
> r1  : a0000001014e63c0 r2  : 0000000000000738 r3  : 000000000000fffe
> r8  : 0000000000000736 r9  : 0000000000000042 r10 : e000001f08f4c898
> r11 : 0000000000000000 r12 : e000000f3dfcfdc0 r13 : e000000f3dfc0000
> r14 : 0000000000000738 r15 : 0000000000004000 r16 : a000000220b827c8
> r17 : a000000220b827b8 r18 : ffffffffffffff00 r19 : e000000f073b0110
> r20 : 0000000000000042 r21 : e000000f073b0114 r22 : 0000000000000000
> r23 : e000000f073b0118 r24 : a0000001009e0e49 r25 : 0000000000000001
> r26 : 0000000000007041 r27 : e000000f3dfcfde0 r28 : 0000000000000000
> r29 : e000000f3dfcfc08 r30 : a000000220b827c8 r31 : e000001f074d6000
> 
> Call Trace:
>  [<a000000100016500>] show_stack+0x80/0xa0
>                                 sp=e000000f3dfcf800 bsp=e000000f3dfc1758
>  [<a000000100016b60>] show_regs+0x640/0x920
>                                 sp=e000000f3dfcf9d0 bsp=e000000f3dfc1700
>  [<a000000100040770>] die+0x190/0x2c0
>                                 sp=e000000f3dfcf9e0 bsp=e000000f3dfc16c0
>  [<a0000001000408f0>] die_if_kernel+0x50/0x80
>                                 sp=e000000f3dfcf9e0 bsp=e000000f3dfc1690
>  [<a000000100903a90>] ia64_fault+0xf0/0x15e0
>                                 sp=e000000f3dfcf9e0 bsp=e000000f3dfc1640
>  [<a00000010000c0a0>] ia64_native_leave_kernel+0x0/0x270
>                                 sp=e000000f3dfcfbf0 bsp=e000000f3dfc1640
>  [<a000000220b815b0>] pci_read_aer+0x330/0x460 [aer_inject]
>                                 sp=e000000f3dfcfdc0 bsp=e000000f3dfc15c8
>  [<a0000001004ace00>] pci_bus_read_config_dword+0xe0/0x140
>                                 sp=e000000f3dfcfdc0 bsp=e000000f3dfc1580
>  [<a0000001004b0c10>] pci_bus_read_dev_vendor_id+0x50/0x200
>                                 sp=e000000f3dfcfdd0 bsp=e000000f3dfc1530
>  [<a0000001008d3d10>] pci_scan_single_device+0x90/0x200
>                                 sp=e000000f3dfcfdd0 bsp=e000000f3dfc14f8
>  [<a0000001004b24b0>] pci_scan_slot+0xb0/0x320
>                                 sp=e000000f3dfcfde0 bsp=e000000f3dfc14a8
>  [<a0000001008d9e90>] pci_scan_child_bus+0x90/0x2e0
>                                 sp=e000000f3dfcfde0 bsp=e000000f3dfc1468
>  [<a0000001008d9580>] pci_scan_bridge+0x540/0xdc0
>                                 sp=e000000f3dfcfde0 bsp=e000000f3dfc13d0
>  [<a0000001008da0b0>] pci_scan_child_bus+0x2b0/0x2e0
>                                 sp=e000000f3dfcfe00 bsp=e000000f3dfc1390
>  [<a0000001008d5bd0>] pci_rescan_bus+0x50/0x220
>                                 sp=e000000f3dfcfe00 bsp=e000000f3dfc1358
>  [<a0000001004c2ab0>] bus_rescan_store+0xf0/0x160
>                                 sp=e000000f3dfcfe10 bsp=e000000f3dfc1328
>  [<a0000001006110b0>] bus_attr_store+0x70/0xa0
>                                 sp=e000000f3dfcfe20 bsp=e000000f3dfc12f0
>  [<a000000100343b00>] sysfs_write_file+0x240/0x340
>                                 sp=e000000f3dfcfe20 bsp=e000000f3dfc1298
>  [<a00000010025e230>] vfs_write+0x1b0/0x3a0
>                                 sp=e000000f3dfcfe20 bsp=e000000f3dfc1250
>  [<a00000010025e5e0>] sys_write+0x80/0x100
>                                 sp=e000000f3dfcfe20 bsp=e000000f3dfc11d0
>  [<a00000010000bf20>] ia64_ret_from_syscall+0x0/0x20
>                                 sp=e000000f3dfcfe30 bsp=e000000f3dfc11d0
>  [<a000000000040720>] __kernel_syscall_via_break+0x0/0x20
>                                 sp=e000000f3dfd0000 bsp=e000000f3dfc11d0
> Disabling lock debugging due to kernel taint
> 
> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
> Signed-off-by: Jiang Liu <liuj97@gmail.com>
> ---
>  drivers/pci/pcie/aer/aer_inject.c |   21 +++++++++++++++++++++
>  1 files changed, 21 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer/aer_inject.c b/drivers/pci/pcie/aer/aer_inject.c
> index 5222986..fc28785 100644
> --- a/drivers/pci/pcie/aer/aer_inject.c
> +++ b/drivers/pci/pcie/aer/aer_inject.c
> @@ -109,6 +109,19 @@ static struct aer_error *__find_aer_error_by_dev(struct pci_dev *dev)
>  	return __find_aer_error((u16)domain, dev->bus->number, dev->devfn);
>  }
> 
> +static bool pci_is_upstream_bus(struct pci_bus *bus, struct pci_bus *up_bus)
> +{
> +	struct pci_bus *pbus = bus->parent;
> +
> +	while (pbus) {
> +		if (pbus == up_bus)
> +			return true;
> +		pbus = pbus->parent;
> +	}
> +
> +	return false;
> +}
> +
>  /* inject_lock must be held before calling */
>  static struct pci_ops *__find_pci_bus_ops(struct pci_bus *bus)
>  {
> @@ -118,6 +131,13 @@ static struct pci_ops *__find_pci_bus_ops(struct pci_bus *bus)
>  		if (bus_ops->bus == bus)
>  			return bus_ops->ops;
>  	}
> +
> +	/* here can't find bus_ops, fall back to get bus_ops of upstream bus */
> +	list_for_each_entry(bus_ops, &pci_bus_ops_list, list) {
> +		if (pci_is_upstream_bus(bus, bus_ops->bus))
> +			return bus_ops->ops;
> +	}
> +
>  	return NULL;
>  }
> 
> @@ -506,6 +526,7 @@ static struct miscdevice aer_inject_device = {
>  	.fops = &aer_inject_fops,
>  };
> 
> +
>  static int __init aer_inject_init(void)
>  {
>  	return misc_register(&aer_inject_device);

After

# rmmod aer_inject

What will happen?

Best Regards,
Huang Ying



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RESEND BUGFIX PATCH 3/3] PCI: check whether pci device has been removed when remove a pci device by sysfs
  2012-08-25 14:39     ` Jiang Liu
@ 2012-08-27  6:42       ` Yijing Wang
  0 siblings, 0 replies; 14+ messages in thread
From: Yijing Wang @ 2012-08-27  6:42 UTC (permalink / raw)
  To: Jiang Liu; +Cc: Bjorn Helgaas, PCI, linux-kernel

On 2012/8/25 22:39, Jiang Liu wrote:
> Hi Yijing,
> 	The patch only patially fix the issue, there exists still small race
> condition window because pdev->is_added isn't a reliable flag to depend on.
> 	--Gerry
> 

Hi Gerry,
    You are right, add pdev->is_added flag check here only fix the race condition window
between remove and rescan sysfs interfaces. Maybe we need a more comprehensive solution to
solve these problems between hotplug/remove/rescan actions.Next, I will do a more detailed test for
[RFC PATCH v1 00/22] introduce PCI bus lock to serialize PCI hotplug operations patches.
That is a more comprehensive solution actually.

---------
Thanks.
Yijing

> On 08/25/2012 05:59 PM, Yijing Wang wrote:
>> We remove a pci device maybe like this
>> echo 1 > /sys/bus/pci/devices/xxxx:xx:xx.x/remove
>> Then remove_store function will be called to complete this remove work,
>> later the remove work will be queued to sysfs_workqueue by device_schedule_callback.
>> So if we remove a pci root port device and a pci endpoint device which was the root
>> port's child device concurrently.The endponit device will be removed when root port's
>> remove work completed,so when endpoint device itself's remove work start, since endpoint
>> device has been removed, it will result to oops.
>> This patch fix this.
>>
>> CallTrace:
>> kworker/u:2[220]: Oops 11003706212352 [1]
>> Modules linked in: cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi
>> _cpufreq binfmt_misc fuse nls_iso8859_1 loop ipmi_si ipmi_devintf ipmi_msghandle
>> r dm_mod igb ppdev iTCO_wdt parport_pc iTCO_vendor_support i2c_i801 parport sg m
>> ptctl serio_raw i2c_core lpc_ich mfd_core hid_generic button container usbhid hi
>> d uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif ext3 mbcache jbd fan pr
>> ocessor ide_pci_generic ide_core ata_piix libata mptsas mptscsih mptbase scsi_tr
>> ansport_sas scsi_mod thermal thermal_sys hwmon



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RESEND BUGFIX PATCH 1/3] PCI/AER: fix pci_ops return NULL when hotplug a pci bus which was doing aer error inject
  2012-08-25  9:59 ` [RESEND BUGFIX PATCH 1/3] PCI/AER: fix pci_ops return NULL when hotplug a pci bus which was doing aer error inject Yijing Wang
  2012-08-27  1:23   ` Huang Ying
@ 2012-08-27  8:49   ` Chen Gong
  2012-08-28  0:47     ` Yijing Wang
  1 sibling, 1 reply; 14+ messages in thread
From: Chen Gong @ 2012-08-27  8:49 UTC (permalink / raw)
  To: Yijing Wang
  Cc: Bjorn Helgaas, Rusty Russell, Mauro Carvalho Chehab, PCI,
	Jiang Liu, Huang Ying, Hanjun Guo, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3308 bytes --]

On Sat, Aug 25, 2012 at 05:59:44PM +0800, Yijing Wang wrote:
> Date:	Sat, 25 Aug 2012 17:59:44 +0800
> From: Yijing Wang <wangyijing@huawei.com>
> To: Bjorn Helgaas <bhelgaas@google.com>, Rusty Russell
>  <rusty@rustcorp.com.au>, Mauro Carvalho Chehab <mchehab@redhat.com>
> CC: PCI <linux-pci@vger.kernel.org>, Jiang Liu <liuj97@gmail.com>, Huang
>  Ying <ying.huang@intel.com>, Hanjun Guo <guohanjun@huawei.com>,
>  linux-kernel@vger.kernel.org
> Subject: [RESEND BUGFIX PATCH 1/3] PCI/AER: fix pci_ops return NULL when
>  hotplug a pci bus which was doing aer error inject
> User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:14.0) Gecko/20120713
>  Thunderbird/14.0
> 
> When we inject aer errors to the target pci device by aer_inject module, the pci_ops of pci
> bus which the target device is on will be assign to pci_ops_aer.So if the target pci device
> is a bridge, once we hotplug the pci bus(child bus) which the target device bridges to, child
> bus's pci_ops will be assigned to pci_ops_aer too.Now every access to the child bus's device
> will result to system panic, because it return NULL pci_ops in pci_read_aer.
> This patch fix this.
> 
> CallTrace:
> bash[5908]: NaT consumption 17179869216 [1]
> Modules linked in: aer_inject cpufreq_conservative cpufreq_userspace cpufreq_pow
> ersave acpi_cpufreq binfmt_misc fuse nls_iso8859_1 loop ipmi_si(+) ipmi_devintf
> ipmi_msghandler dm_mod ppdev iTCO_wdt iTCO_vendor_support sg igb parport_pc i2c_
> i801 mptctl i2c_core serio_raw hid_generic lpc_ich mfd_core parport button conta
> iner usbhid hid uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif ext3 mbca
> che jbd fan processor ide_pci_generic ide_core ata_piix libata mptsas mptscsih m
> ptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
> 
[...]
> 
> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
> Signed-off-by: Jiang Liu <liuj97@gmail.com>
> ---
>  drivers/pci/pcie/aer/aer_inject.c |   21 +++++++++++++++++++++
>  1 files changed, 21 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer/aer_inject.c b/drivers/pci/pcie/aer/aer_inject.c
> index 5222986..fc28785 100644
> --- a/drivers/pci/pcie/aer/aer_inject.c
> +++ b/drivers/pci/pcie/aer/aer_inject.c
> @@ -109,6 +109,19 @@ static struct aer_error *__find_aer_error_by_dev(struct pci_dev *dev)
>  	return __find_aer_error((u16)domain, dev->bus->number, dev->devfn);
>  }
> 
> +static bool pci_is_upstream_bus(struct pci_bus *bus, struct pci_bus *up_bus)
> +{
> +	struct pci_bus *pbus = bus->parent;
> +
> +	while (pbus) {
> +		if (pbus == up_bus)
> +			return true;
> +		pbus = pbus->parent;
> +	}
> +
> +	return false;
> +}
> +
>  /* inject_lock must be held before calling */
>  static struct pci_ops *__find_pci_bus_ops(struct pci_bus *bus)
>  {
> @@ -118,6 +131,13 @@ static struct pci_ops *__find_pci_bus_ops(struct pci_bus *bus)
>  		if (bus_ops->bus == bus)
>  			return bus_ops->ops;
>  	}
> +
> +	/* here can't find bus_ops, fall back to get bus_ops of upstream bus */
> +	list_for_each_entry(bus_ops, &pci_bus_ops_list, list) {
> +		if (pci_is_upstream_bus(bus, bus_ops->bus))
> +			return bus_ops->ops;
> +	}
> +
>  	return NULL;
>  }
> 
At least, when returning NULL, a proper check and protection is needed.


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RESEND BUGFIX PATCH 1/3] PCI/AER: fix pci_ops return NULL when hotplug a pci bus which was doing aer error inject
  2012-08-27  1:23   ` Huang Ying
@ 2012-08-27 15:05     ` Jiang Liu
  2012-08-28  0:38       ` Huang Ying
  0 siblings, 1 reply; 14+ messages in thread
From: Jiang Liu @ 2012-08-27 15:05 UTC (permalink / raw)
  To: Huang Ying
  Cc: Yijing Wang, Bjorn Helgaas, Rusty Russell, Mauro Carvalho Chehab,
	PCI, Hanjun Guo, linux-kernel

Is it ok to ignore such a case? After all, aer_inject is just a test tool:)
It's not worth to change the core logic for such a corner case.
--Gerry

On 08/27/2012 09:23 AM, Huang Ying wrote:
> On Sat, 2012-08-25 at 17:59 +0800, Yijing Wang wrote:
>> When we inject aer errors to the target pci device by aer_inject module, the pci_ops of pci
>> bus which the target device is on will be assign to pci_ops_aer.So if the target pci device
>> is a bridge, once we hotplug the pci bus(child bus) which the target device bridges to, child
>> bus's pci_ops will be assigned to pci_ops_aer too.Now every access to the child bus's device
>> will result to system panic, because it return NULL pci_ops in pci_read_aer.
>> This patch fix this.
>>
>> CallTrace:
>> bash[5908]: NaT consumption 17179869216 [1]
>> Modules linked in: aer_inject cpufreq_conservative cpufreq_userspace cpufreq_pow
>> ersave acpi_cpufreq binfmt_misc fuse nls_iso8859_1 loop ipmi_si(+) ipmi_devintf
>> ipmi_msghandler dm_mod ppdev iTCO_wdt iTCO_vendor_support sg igb parport_pc i2c_
>> i801 mptctl i2c_core serio_raw hid_generic lpc_ich mfd_core parport button conta
>> iner usbhid hid uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif ext3 mbca
>> che jbd fan processor ide_pci_generic ide_core ata_piix libata mptsas mptscsih m
>> ptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
>>
>> Pid: 5908, CPU 9, comm:                 bash
>> psr : 00001010085a2010 ifs : 800000000000048e ip  : [<a000000220b815b0>]    Not
>> tainted (3.5.0-rc6yijing-repo)
>> ip is at pci_read_aer+0x330/0x460 [aer_inject]
>> unat: 0000000000000000 pfs : 000000000000048e rsc : 0000000000000003
>> rnat: 0000000000000000 bsps: 0000000000000000 pr  : 65519aa6a6969aa5
>> ldrs: 0000000000000000 ccv : ffffffff00000001 fpsr: 0009804c8a70033f
>> csd : 0000000000000000 ssd : 0000000000000000
>> b0  : a000000220b815b0 b6  : a000000220b81280 b7  : a0000001006d56a0
>> f6  : 1003e0000000000000005 f7  : 1003e0000000000000028
>> f8  : 1003e00000000000000c8 f9  : 1003e0000000000000005
>> f10 : 1003e627ec1e2f4c0d8a7 f11 : 1003e0000000000000011
>> r1  : a0000001014e63c0 r2  : 0000000000000738 r3  : 000000000000fffe
>> r8  : 0000000000000736 r9  : 0000000000000042 r10 : e000001f08f4c898
>> r11 : 0000000000000000 r12 : e000000f3dfcfdc0 r13 : e000000f3dfc0000
>> r14 : 0000000000000738 r15 : 0000000000004000 r16 : a000000220b827c8
>> r17 : a000000220b827b8 r18 : ffffffffffffff00 r19 : e000000f073b0110
>> r20 : 0000000000000042 r21 : e000000f073b0114 r22 : 0000000000000000
>> r23 : e000000f073b0118 r24 : a0000001009e0e49 r25 : 0000000000000001
>> r26 : 0000000000007041 r27 : e000000f3dfcfde0 r28 : 0000000000000000
>> r29 : e000000f3dfcfc08 r30 : a000000220b827c8 r31 : e000001f074d6000
>>
>> Call Trace:
>>  [<a000000100016500>] show_stack+0x80/0xa0
>>                                 sp=e000000f3dfcf800 bsp=e000000f3dfc1758
>>  [<a000000100016b60>] show_regs+0x640/0x920
>>                                 sp=e000000f3dfcf9d0 bsp=e000000f3dfc1700
>>  [<a000000100040770>] die+0x190/0x2c0
>>                                 sp=e000000f3dfcf9e0 bsp=e000000f3dfc16c0
>>  [<a0000001000408f0>] die_if_kernel+0x50/0x80
>>                                 sp=e000000f3dfcf9e0 bsp=e000000f3dfc1690
>>  [<a000000100903a90>] ia64_fault+0xf0/0x15e0
>>                                 sp=e000000f3dfcf9e0 bsp=e000000f3dfc1640
>>  [<a00000010000c0a0>] ia64_native_leave_kernel+0x0/0x270
>>                                 sp=e000000f3dfcfbf0 bsp=e000000f3dfc1640
>>  [<a000000220b815b0>] pci_read_aer+0x330/0x460 [aer_inject]
>>                                 sp=e000000f3dfcfdc0 bsp=e000000f3dfc15c8
>>  [<a0000001004ace00>] pci_bus_read_config_dword+0xe0/0x140
>>                                 sp=e000000f3dfcfdc0 bsp=e000000f3dfc1580
>>  [<a0000001004b0c10>] pci_bus_read_dev_vendor_id+0x50/0x200
>>                                 sp=e000000f3dfcfdd0 bsp=e000000f3dfc1530
>>  [<a0000001008d3d10>] pci_scan_single_device+0x90/0x200
>>                                 sp=e000000f3dfcfdd0 bsp=e000000f3dfc14f8
>>  [<a0000001004b24b0>] pci_scan_slot+0xb0/0x320
>>                                 sp=e000000f3dfcfde0 bsp=e000000f3dfc14a8
>>  [<a0000001008d9e90>] pci_scan_child_bus+0x90/0x2e0
>>                                 sp=e000000f3dfcfde0 bsp=e000000f3dfc1468
>>  [<a0000001008d9580>] pci_scan_bridge+0x540/0xdc0
>>                                 sp=e000000f3dfcfde0 bsp=e000000f3dfc13d0
>>  [<a0000001008da0b0>] pci_scan_child_bus+0x2b0/0x2e0
>>                                 sp=e000000f3dfcfe00 bsp=e000000f3dfc1390
>>  [<a0000001008d5bd0>] pci_rescan_bus+0x50/0x220
>>                                 sp=e000000f3dfcfe00 bsp=e000000f3dfc1358
>>  [<a0000001004c2ab0>] bus_rescan_store+0xf0/0x160
>>                                 sp=e000000f3dfcfe10 bsp=e000000f3dfc1328
>>  [<a0000001006110b0>] bus_attr_store+0x70/0xa0
>>                                 sp=e000000f3dfcfe20 bsp=e000000f3dfc12f0
>>  [<a000000100343b00>] sysfs_write_file+0x240/0x340
>>                                 sp=e000000f3dfcfe20 bsp=e000000f3dfc1298
>>  [<a00000010025e230>] vfs_write+0x1b0/0x3a0
>>                                 sp=e000000f3dfcfe20 bsp=e000000f3dfc1250
>>  [<a00000010025e5e0>] sys_write+0x80/0x100
>>                                 sp=e000000f3dfcfe20 bsp=e000000f3dfc11d0
>>  [<a00000010000bf20>] ia64_ret_from_syscall+0x0/0x20
>>                                 sp=e000000f3dfcfe30 bsp=e000000f3dfc11d0
>>  [<a000000000040720>] __kernel_syscall_via_break+0x0/0x20
>>                                 sp=e000000f3dfd0000 bsp=e000000f3dfc11d0
>> Disabling lock debugging due to kernel taint
>>
>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>> Signed-off-by: Jiang Liu <liuj97@gmail.com>
>> ---
>>  drivers/pci/pcie/aer/aer_inject.c |   21 +++++++++++++++++++++
>>  1 files changed, 21 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/pci/pcie/aer/aer_inject.c b/drivers/pci/pcie/aer/aer_inject.c
>> index 5222986..fc28785 100644
>> --- a/drivers/pci/pcie/aer/aer_inject.c
>> +++ b/drivers/pci/pcie/aer/aer_inject.c
>> @@ -109,6 +109,19 @@ static struct aer_error *__find_aer_error_by_dev(struct pci_dev *dev)
>>  	return __find_aer_error((u16)domain, dev->bus->number, dev->devfn);
>>  }
>>
>> +static bool pci_is_upstream_bus(struct pci_bus *bus, struct pci_bus *up_bus)
>> +{
>> +	struct pci_bus *pbus = bus->parent;
>> +
>> +	while (pbus) {
>> +		if (pbus == up_bus)
>> +			return true;
>> +		pbus = pbus->parent;
>> +	}
>> +
>> +	return false;
>> +}
>> +
>>  /* inject_lock must be held before calling */
>>  static struct pci_ops *__find_pci_bus_ops(struct pci_bus *bus)
>>  {
>> @@ -118,6 +131,13 @@ static struct pci_ops *__find_pci_bus_ops(struct pci_bus *bus)
>>  		if (bus_ops->bus == bus)
>>  			return bus_ops->ops;
>>  	}
>> +
>> +	/* here can't find bus_ops, fall back to get bus_ops of upstream bus */
>> +	list_for_each_entry(bus_ops, &pci_bus_ops_list, list) {
>> +		if (pci_is_upstream_bus(bus, bus_ops->bus))
>> +			return bus_ops->ops;
>> +	}
>> +
>>  	return NULL;
>>  }
>>
>> @@ -506,6 +526,7 @@ static struct miscdevice aer_inject_device = {
>>  	.fops = &aer_inject_fops,
>>  };
>>
>> +
>>  static int __init aer_inject_init(void)
>>  {
>>  	return misc_register(&aer_inject_device);
> 
> After
> 
> # rmmod aer_inject
> 
> What will happen?
> 
> Best Regards,
> Huang Ying
> 
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RESEND BUGFIX PATCH 1/3] PCI/AER: fix pci_ops return NULL when hotplug a pci bus which was doing aer error inject
  2012-08-27 15:05     ` Jiang Liu
@ 2012-08-28  0:38       ` Huang Ying
  2012-08-28  0:53         ` Yijing Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Huang Ying @ 2012-08-28  0:38 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Yijing Wang, Bjorn Helgaas, Rusty Russell, Mauro Carvalho Chehab,
	PCI, Hanjun Guo, linux-kernel

On Mon, 2012-08-27 at 23:05 +0800, Jiang Liu wrote:
> Is it ok to ignore such a case? After all, aer_inject is just a test tool:)
> It's not worth to change the core logic for such a corner case.
> --Gerry

Why ignore?  At least you can prevent aer_inject from unload if
something special happened.

Best Regards,
Huang Ying

> On 08/27/2012 09:23 AM, Huang Ying wrote:
> > On Sat, 2012-08-25 at 17:59 +0800, Yijing Wang wrote:
> >> When we inject aer errors to the target pci device by aer_inject module, the pci_ops of pci
> >> bus which the target device is on will be assign to pci_ops_aer.So if the target pci device
> >> is a bridge, once we hotplug the pci bus(child bus) which the target device bridges to, child
> >> bus's pci_ops will be assigned to pci_ops_aer too.Now every access to the child bus's device
> >> will result to system panic, because it return NULL pci_ops in pci_read_aer.
> >> This patch fix this.
> >>
> >> CallTrace:
> >> bash[5908]: NaT consumption 17179869216 [1]
> >> Modules linked in: aer_inject cpufreq_conservative cpufreq_userspace cpufreq_pow
> >> ersave acpi_cpufreq binfmt_misc fuse nls_iso8859_1 loop ipmi_si(+) ipmi_devintf
> >> ipmi_msghandler dm_mod ppdev iTCO_wdt iTCO_vendor_support sg igb parport_pc i2c_
> >> i801 mptctl i2c_core serio_raw hid_generic lpc_ich mfd_core parport button conta
> >> iner usbhid hid uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif ext3 mbca
> >> che jbd fan processor ide_pci_generic ide_core ata_piix libata mptsas mptscsih m
> >> ptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
> >>
> >> Pid: 5908, CPU 9, comm:                 bash
> >> psr : 00001010085a2010 ifs : 800000000000048e ip  : [<a000000220b815b0>]    Not
> >> tainted (3.5.0-rc6yijing-repo)
> >> ip is at pci_read_aer+0x330/0x460 [aer_inject]
> >> unat: 0000000000000000 pfs : 000000000000048e rsc : 0000000000000003
> >> rnat: 0000000000000000 bsps: 0000000000000000 pr  : 65519aa6a6969aa5
> >> ldrs: 0000000000000000 ccv : ffffffff00000001 fpsr: 0009804c8a70033f
> >> csd : 0000000000000000 ssd : 0000000000000000
> >> b0  : a000000220b815b0 b6  : a000000220b81280 b7  : a0000001006d56a0
> >> f6  : 1003e0000000000000005 f7  : 1003e0000000000000028
> >> f8  : 1003e00000000000000c8 f9  : 1003e0000000000000005
> >> f10 : 1003e627ec1e2f4c0d8a7 f11 : 1003e0000000000000011
> >> r1  : a0000001014e63c0 r2  : 0000000000000738 r3  : 000000000000fffe
> >> r8  : 0000000000000736 r9  : 0000000000000042 r10 : e000001f08f4c898
> >> r11 : 0000000000000000 r12 : e000000f3dfcfdc0 r13 : e000000f3dfc0000
> >> r14 : 0000000000000738 r15 : 0000000000004000 r16 : a000000220b827c8
> >> r17 : a000000220b827b8 r18 : ffffffffffffff00 r19 : e000000f073b0110
> >> r20 : 0000000000000042 r21 : e000000f073b0114 r22 : 0000000000000000
> >> r23 : e000000f073b0118 r24 : a0000001009e0e49 r25 : 0000000000000001
> >> r26 : 0000000000007041 r27 : e000000f3dfcfde0 r28 : 0000000000000000
> >> r29 : e000000f3dfcfc08 r30 : a000000220b827c8 r31 : e000001f074d6000
> >>
> >> Call Trace:
> >>  [<a000000100016500>] show_stack+0x80/0xa0
> >>                                 sp=e000000f3dfcf800 bsp=e000000f3dfc1758
> >>  [<a000000100016b60>] show_regs+0x640/0x920
> >>                                 sp=e000000f3dfcf9d0 bsp=e000000f3dfc1700
> >>  [<a000000100040770>] die+0x190/0x2c0
> >>                                 sp=e000000f3dfcf9e0 bsp=e000000f3dfc16c0
> >>  [<a0000001000408f0>] die_if_kernel+0x50/0x80
> >>                                 sp=e000000f3dfcf9e0 bsp=e000000f3dfc1690
> >>  [<a000000100903a90>] ia64_fault+0xf0/0x15e0
> >>                                 sp=e000000f3dfcf9e0 bsp=e000000f3dfc1640
> >>  [<a00000010000c0a0>] ia64_native_leave_kernel+0x0/0x270
> >>                                 sp=e000000f3dfcfbf0 bsp=e000000f3dfc1640
> >>  [<a000000220b815b0>] pci_read_aer+0x330/0x460 [aer_inject]
> >>                                 sp=e000000f3dfcfdc0 bsp=e000000f3dfc15c8
> >>  [<a0000001004ace00>] pci_bus_read_config_dword+0xe0/0x140
> >>                                 sp=e000000f3dfcfdc0 bsp=e000000f3dfc1580
> >>  [<a0000001004b0c10>] pci_bus_read_dev_vendor_id+0x50/0x200
> >>                                 sp=e000000f3dfcfdd0 bsp=e000000f3dfc1530
> >>  [<a0000001008d3d10>] pci_scan_single_device+0x90/0x200
> >>                                 sp=e000000f3dfcfdd0 bsp=e000000f3dfc14f8
> >>  [<a0000001004b24b0>] pci_scan_slot+0xb0/0x320
> >>                                 sp=e000000f3dfcfde0 bsp=e000000f3dfc14a8
> >>  [<a0000001008d9e90>] pci_scan_child_bus+0x90/0x2e0
> >>                                 sp=e000000f3dfcfde0 bsp=e000000f3dfc1468
> >>  [<a0000001008d9580>] pci_scan_bridge+0x540/0xdc0
> >>                                 sp=e000000f3dfcfde0 bsp=e000000f3dfc13d0
> >>  [<a0000001008da0b0>] pci_scan_child_bus+0x2b0/0x2e0
> >>                                 sp=e000000f3dfcfe00 bsp=e000000f3dfc1390
> >>  [<a0000001008d5bd0>] pci_rescan_bus+0x50/0x220
> >>                                 sp=e000000f3dfcfe00 bsp=e000000f3dfc1358
> >>  [<a0000001004c2ab0>] bus_rescan_store+0xf0/0x160
> >>                                 sp=e000000f3dfcfe10 bsp=e000000f3dfc1328
> >>  [<a0000001006110b0>] bus_attr_store+0x70/0xa0
> >>                                 sp=e000000f3dfcfe20 bsp=e000000f3dfc12f0
> >>  [<a000000100343b00>] sysfs_write_file+0x240/0x340
> >>                                 sp=e000000f3dfcfe20 bsp=e000000f3dfc1298
> >>  [<a00000010025e230>] vfs_write+0x1b0/0x3a0
> >>                                 sp=e000000f3dfcfe20 bsp=e000000f3dfc1250
> >>  [<a00000010025e5e0>] sys_write+0x80/0x100
> >>                                 sp=e000000f3dfcfe20 bsp=e000000f3dfc11d0
> >>  [<a00000010000bf20>] ia64_ret_from_syscall+0x0/0x20
> >>                                 sp=e000000f3dfcfe30 bsp=e000000f3dfc11d0
> >>  [<a000000000040720>] __kernel_syscall_via_break+0x0/0x20
> >>                                 sp=e000000f3dfd0000 bsp=e000000f3dfc11d0
> >> Disabling lock debugging due to kernel taint
> >>
> >> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
> >> Signed-off-by: Jiang Liu <liuj97@gmail.com>
> >> ---
> >>  drivers/pci/pcie/aer/aer_inject.c |   21 +++++++++++++++++++++
> >>  1 files changed, 21 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/drivers/pci/pcie/aer/aer_inject.c b/drivers/pci/pcie/aer/aer_inject.c
> >> index 5222986..fc28785 100644
> >> --- a/drivers/pci/pcie/aer/aer_inject.c
> >> +++ b/drivers/pci/pcie/aer/aer_inject.c
> >> @@ -109,6 +109,19 @@ static struct aer_error *__find_aer_error_by_dev(struct pci_dev *dev)
> >>  	return __find_aer_error((u16)domain, dev->bus->number, dev->devfn);
> >>  }
> >>
> >> +static bool pci_is_upstream_bus(struct pci_bus *bus, struct pci_bus *up_bus)
> >> +{
> >> +	struct pci_bus *pbus = bus->parent;
> >> +
> >> +	while (pbus) {
> >> +		if (pbus == up_bus)
> >> +			return true;
> >> +		pbus = pbus->parent;
> >> +	}
> >> +
> >> +	return false;
> >> +}
> >> +
> >>  /* inject_lock must be held before calling */
> >>  static struct pci_ops *__find_pci_bus_ops(struct pci_bus *bus)
> >>  {
> >> @@ -118,6 +131,13 @@ static struct pci_ops *__find_pci_bus_ops(struct pci_bus *bus)
> >>  		if (bus_ops->bus == bus)
> >>  			return bus_ops->ops;
> >>  	}
> >> +
> >> +	/* here can't find bus_ops, fall back to get bus_ops of upstream bus */
> >> +	list_for_each_entry(bus_ops, &pci_bus_ops_list, list) {
> >> +		if (pci_is_upstream_bus(bus, bus_ops->bus))
> >> +			return bus_ops->ops;
> >> +	}
> >> +
> >>  	return NULL;
> >>  }
> >>
> >> @@ -506,6 +526,7 @@ static struct miscdevice aer_inject_device = {
> >>  	.fops = &aer_inject_fops,
> >>  };
> >>
> >> +
> >>  static int __init aer_inject_init(void)
> >>  {
> >>  	return misc_register(&aer_inject_device);
> > 
> > After
> > 
> > # rmmod aer_inject
> > 
> > What will happen?
> > 
> > Best Regards,
> > Huang Ying
> > 
> > 
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RESEND BUGFIX PATCH 1/3] PCI/AER: fix pci_ops return NULL when hotplug a pci bus which was doing aer error inject
  2012-08-27  8:49   ` Chen Gong
@ 2012-08-28  0:47     ` Yijing Wang
  0 siblings, 0 replies; 14+ messages in thread
From: Yijing Wang @ 2012-08-28  0:47 UTC (permalink / raw)
  To: Bjorn Helgaas, Rusty Russell, Mauro Carvalho Chehab, PCI,
	Jiang Liu, Huang Ying, Hanjun Guo, linux-kernel, gong.chen

>> bash[5908]: NaT consumption 17179869216 [1]
>> Modules linked in: aer_inject cpufreq_conservative cpufreq_userspace cpufreq_pow
>> ersave acpi_cpufreq binfmt_misc fuse nls_iso8859_1 loop ipmi_si(+) ipmi_devintf
>> ipmi_msghandler dm_mod ppdev iTCO_wdt iTCO_vendor_support sg igb parport_pc i2c_
>> i801 mptctl i2c_core serio_raw hid_generic lpc_ich mfd_core parport button conta
>> iner usbhid hid uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif ext3 mbca
>> che jbd fan processor ide_pci_generic ide_core ata_piix libata mptsas mptscsih m
>> ptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
>>
> [...]
>>
>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>> Signed-off-by: Jiang Liu <liuj97@gmail.com>
>> ---
>>  drivers/pci/pcie/aer/aer_inject.c |   21 +++++++++++++++++++++
>>  1 files changed, 21 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/pci/pcie/aer/aer_inject.c b/drivers/pci/pcie/aer/aer_inject.c
>> index 5222986..fc28785 100644
>> --- a/drivers/pci/pcie/aer/aer_inject.c
>> +++ b/drivers/pci/pcie/aer/aer_inject.c
>> @@ -109,6 +109,19 @@ static struct aer_error *__find_aer_error_by_dev(struct pci_dev *dev)
>>  	return __find_aer_error((u16)domain, dev->bus->number, dev->devfn);
>>  }
>>
>> +static bool pci_is_upstream_bus(struct pci_bus *bus, struct pci_bus *up_bus)
>> +{
>> +	struct pci_bus *pbus = bus->parent;
>> +
>> +	while (pbus) {
>> +		if (pbus == up_bus)
>> +			return true;
>> +		pbus = pbus->parent;
>> +	}
>> +
>> +	return false;
>> +}
>> +
>>  /* inject_lock must be held before calling */
>>  static struct pci_ops *__find_pci_bus_ops(struct pci_bus *bus)
>>  {
>> @@ -118,6 +131,13 @@ static struct pci_ops *__find_pci_bus_ops(struct pci_bus *bus)
>>  		if (bus_ops->bus == bus)
>>  			return bus_ops->ops;
>>  	}
>> +
>> +	/* here can't find bus_ops, fall back to get bus_ops of upstream bus */
>> +	list_for_each_entry(bus_ops, &pci_bus_ops_list, list) {
>> +		if (pci_is_upstream_bus(bus, bus_ops->bus))
>> +			return bus_ops->ops;
>> +	}
>> +
>>  	return NULL;
>>  }
>>
> At least, when returning NULL, a proper check and protection is needed.
> 
Hi Chen Gong,
    Thanks for your comments. It's real dangerous when returning NULL, Since pci_read_aer/pci_write_aer functions
had no any protection codes to check it. I think maybe we can treat this situation as a read/write access error, and set *val = 0 ?
Another way here is panic system, Becasue this is a really unexpected situation.

-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RESEND BUGFIX PATCH 1/3] PCI/AER: fix pci_ops return NULL when hotplug a pci bus which was doing aer error inject
  2012-08-28  0:38       ` Huang Ying
@ 2012-08-28  0:53         ` Yijing Wang
  0 siblings, 0 replies; 14+ messages in thread
From: Yijing Wang @ 2012-08-28  0:53 UTC (permalink / raw)
  To: Huang Ying
  Cc: Jiang Liu, Bjorn Helgaas, Rusty Russell, Mauro Carvalho Chehab,
	PCI, Hanjun Guo, linux-kernel

On 2012/8/28 8:38, Huang Ying wrote:
> On Mon, 2012-08-27 at 23:05 +0800, Jiang Liu wrote:
>> Is it ok to ignore such a case? After all, aer_inject is just a test tool:)
>> It's not worth to change the core logic for such a corner case.
>> --Gerry
> 
> Why ignore?  At least you can prevent aer_inject from unload if
> something special happened.
> 

Hi Huang Ying,
   Thanks for your comments. It's my negligence. I will add some protection code when do #rmmod aer_inject(a race condition window about bus_ops),
I will correct it in the new version patch.

----------
Thanks!
Yijing


>> On 08/27/2012 09:23 AM, Huang Ying wrote:
>>> On Sat, 2012-08-25 at 17:59 +0800, Yijing Wang wrote:
>>>> When we inject aer errors to the target pci device by aer_inject module, the pci_ops of pci
>>>> bus which the target device is on will be assign to pci_ops_aer.So if the target pci device
>>>> is a bridge, once we hotplug the pci bus(child bus) which the target device bridges to, child
>>>> bus's pci_ops will be assigned to pci_ops_aer too.Now every access to the child bus's device
>>>> will result to system panic, because it return NULL pci_ops in pci_read_aer.
>>>> This patch fix this.




^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2012-08-28  0:57 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-11 11:52 [PATCH 1/3] PCI/AER: Fix NULL pci_ops return when hotplug a pci bus which was doing aer error inject Yijing Wang
2012-08-11 11:52 ` [PATCH 2/3] PCI/AER: Clean pci_bus_ops when related pci bus was removed Yijing Wang
2012-08-25  9:59   ` [RESEND BUGFIX PATCH 2/3] PCI/AER: clean " Yijing Wang
2012-08-11 11:52 ` [PATCH 3/3] PCI: Check whether pci device has been removed when remove a pci device by sysfs Yijing Wang
2012-08-25  9:59   ` [RESEND BUGFIX PATCH 3/3] PCI: check " Yijing Wang
2012-08-25 14:39     ` Jiang Liu
2012-08-27  6:42       ` Yijing Wang
2012-08-25  9:59 ` [RESEND BUGFIX PATCH 1/3] PCI/AER: fix pci_ops return NULL when hotplug a pci bus which was doing aer error inject Yijing Wang
2012-08-27  1:23   ` Huang Ying
2012-08-27 15:05     ` Jiang Liu
2012-08-28  0:38       ` Huang Ying
2012-08-28  0:53         ` Yijing Wang
2012-08-27  8:49   ` Chen Gong
2012-08-28  0:47     ` Yijing Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).