From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pb0-f46.google.com ([209.85.160.46]:41642 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754115Ab2HYOmQ (ORCPT ); Sat, 25 Aug 2012 10:42:16 -0400 Message-ID: <5038E3B7.5090601@gmail.com> Date: Sat, 25 Aug 2012 22:39:51 +0800 From: Jiang Liu MIME-Version: 1.0 To: Yijing Wang CC: Bjorn Helgaas , PCI , linux-kernel@vger.kernel.org Subject: Re: [RESEND BUGFIX PATCH 3/3] PCI: check whether pci device has been removed when remove a pci device by sysfs References: <1344685946-8172-3-git-send-email-wangyijing@huawei.com> <5038A21C.4070200@huawei.com> In-Reply-To: <5038A21C.4070200@huawei.com> Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-pci-owner@vger.kernel.org List-ID: Hi Yijing, The patch only patially fix the issue, there exists still small race condition window because pdev->is_added isn't a reliable flag to depend on. --Gerry On 08/25/2012 05:59 PM, Yijing Wang wrote: > We remove a pci device maybe like this > echo 1 > /sys/bus/pci/devices/xxxx:xx:xx.x/remove > Then remove_store function will be called to complete this remove work, > later the remove work will be queued to sysfs_workqueue by device_schedule_callback. > So if we remove a pci root port device and a pci endpoint device which was the root > port's child device concurrently.The endponit device will be removed when root port's > remove work completed,so when endpoint device itself's remove work start, since endpoint > device has been removed, it will result to oops. > This patch fix this. > > CallTrace: > kworker/u:2[220]: Oops 11003706212352 [1] > Modules linked in: cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi > _cpufreq binfmt_misc fuse nls_iso8859_1 loop ipmi_si ipmi_devintf ipmi_msghandle > r dm_mod igb ppdev iTCO_wdt parport_pc iTCO_vendor_support i2c_i801 parport sg m > ptctl serio_raw i2c_core lpc_ich mfd_core hid_generic button container usbhid hi > d uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif ext3 mbcache jbd fan pr > ocessor ide_pci_generic ide_core ata_piix libata mptsas mptscsih mptbase scsi_tr > ansport_sas scsi_mod thermal thermal_sys hwmon > > Pid: 220, CPU 30, comm: kworker/u:2 > psr : 0000121008526030 ifs : 8000000000000388 ip : [] Not > tainted (3.5.0-rc6yijing-repo) > ip is at __pci_remove_bus_device+0x101/0x1e0 > unat: 0000000000000000 pfs : 0000000000000388 rsc : 0000000000000003 > rnat: ffffffffffffffff bsps: ffffffffffffffff pr : 0000080001919585 > ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c9e70433f > csd : 0000000000000000 ssd : 0000000000000000 > b0 : a0000001004b3060 b6 : a0000001004c2400 b7 : a0000001000faae0 > f6 : 000000000000000000000 f7 : 1003e00000000000057cd > f8 : 1003e0000000050000003 f9 : 1003e000001cb8678a0d0 > f10 : 1003e9a05b7a39369e270 f11 : 1003e000000000000008f > r1 : a0000001014e63c0 r2 : e000001f075dec00 r3 : 0000000000000000 > r8 : 0000000000000008 r9 : a0000001012e7308 r10 : 0000000004000000 > r11 : e000000f0006e800 r12 : e000001f08dbfe00 r13 : e000001f08db0000 > r14 : 0000000000000000 r15 : 0000000000000000 r16 : 0000000000000000 > r17 : e000000f0006f008 r18 : 000000000f000000 r19 : a0000001012f3910 > r20 : 0000000000100001 r21 : a000000101a62990 r22 : a000000100344580 > r23 : 0000000000000000 r24 : 0000000000001000 r25 : 0000000000000000 > r26 : a000000101a62988 r27 : e000003f0fc37e60 r28 : e000003f0fc37e68 > r29 : e000002f07012be0 r30 : 0000000082aa0260 r31 : 0000000000004000 > > Call Trace: > [] show_stack+0x80/0xa0 > sp=e000001f08dbf9c0 bsp=e000001f08db1388 > [] show_regs+0x640/0x920 > sp=e000001f08dbfb90 bsp=e000001f08db1330 > [] die+0x190/0x2c0 > sp=e000001f08dbfba0 bsp=e000001f08db12f0 > [] ia64_do_page_fault+0x7e0/0xac0 > sp=e000001f08dbfba0 bsp=e000001f08db1290 > [] ia64_native_leave_kernel+0x0/0x270 > sp=e000001f08dbfc30 bsp=e000001f08db1290 > [] __pci_remove_bus_device+0x100/0x1e0 > sp=e000001f08dbfe00 bsp=e000001f08db1250 > [] pci_stop_and_remove_bus_device+0x30/0x60 > sp=e000001f08dbfe00 bsp=e000001f08db1230 > [] remove_callback+0x40/0x80 > sp=e000001f08dbfe00 bsp=e000001f08db1208 > [] sysfs_schedule_callback_work+0x50/0x120 > sp=e000001f08dbfe00 bsp=e000001f08db11d0 > [] process_one_work+0x6f0/0xae0 > sp=e000001f08dbfe00 bsp=e000001f08db1158 > [] worker_thread+0x3b0/0xc80 > sp=e000001f08dbfe00 bsp=e000001f08db1060 > [] kthread+0x110/0x140 > sp=e000001f08dbfe00 bsp=e000001f08db1028 > [] kernel_thread_helper+0x30/0x60 > sp=e000001f08dbfe30 bsp=e000001f08db1000 > [] start_kernel_thread+0x20/0x40 > sp=e000001f08dbfe30 bsp=e000001f08db1000 > Disabling lock debugging due to kernel taint > Unable to handle kernel NULL pointer dereference (address 0000000000000048) > kworker/u:2[220]: Oops 11012296146944 [2] > > Pid: 220, CPU 30, comm: kworker/u:2 > psr : 0000121008022038 ifs : 8000000000000288 ip : [] Tain > ted: G D (3.5.0-rc6yijing-repo) > ip is at wq_worker_sleeping+0x61/0x200 > unat: 0000000000000000 pfs : 0000000000000288 rsc : 0000000000000003 > rnat: 0000121008026038 bsps: a0000001000407e0 pr : 965a684515516955 > ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f > csd : 0000000000000000 ssd : 0000000000000000 > b0 : a0000001000c4920 b6 : a0000001000f9fc0 b7 : a0000001000faae0 > f6 : 000000000000000000000 f7 : 1003e9e3779b97f4a7c16 > f8 : 1003e0000000050000003 f9 : 1003e000001cb87e8a5a8 > f10 : 1003e9a78b92717b9f0f8 f11 : 1003e000000000000008f > r1 : a0000001014e63c0 r2 : 0000000000000000 r3 : fffffffffffc1200 > r8 : 0000000000000000 r9 : 000000000000001e r10 : a000000101432530 > r11 : a000000101432530 r12 : e000001f08dbfb70 r13 : e000001f08db0000 > r14 : 0000000000001000 r15 : a000000101432620 r16 : e000003000245d40 > r17 : fffffffffffc5c00 r18 : e000003000245d00 r19 : 00000000000000f8 > r20 : e000001f08db0070 r21 : 0000000000000048 r22 : e000003000245ce8 > r23 : e000003000245ce0 r24 : a000000101a638e0 r25 : ffffffffff48e500 > r26 : e000003f088a0098 r27 : 0000000000000400 r28 : 0000000000000001 > r29 : 000000000420806c r30 : e000001f08db0014 r31 : 0000000000000000 > > Call Trace: > [] show_stack+0x80/0xa0 > sp=e000001f08dbf730 bsp=e000001f08db16f8 > [] show_regs+0x640/0x920 > sp=e000001f08dbf900 bsp=e000001f08db16a0 > [] die+0x190/0x2c0 > sp=e000001f08dbf910 bsp=e000001f08db1660 > [] ia64_do_page_fault+0x7e0/0xac0 > sp=e000001f08dbf910 bsp=e000001f08db1600 > [] ia64_native_leave_kernel+0x0/0x270 > sp=e000001f08dbf9a0 bsp=e000001f08db1600 > [] wq_worker_sleeping+0x60/0x200 > sp=e000001f08dbfb70 bsp=e000001f08db15b8 > [] __schedule+0x14c0/0x18c0 > sp=e000001f08dbfb70 bsp=e000001f08db1440 > [] schedule+0x60/0x140 > sp=e000001f08dbfb80 bsp=e000001f08db13e0 > [] do_exit+0xef0/0x1740 > sp=e000001f08dbfb80 bsp=e000001f08db1330 > [] die+0x260/0x2c0 > sp=e000001f08dbfba0 bsp=e000001f08db12f0 > [] ia64_do_page_fault+0x7e0/0xac0 > sp=e000001f08dbfba0 bsp=e000001f08db1290 > [] ia64_native_leave_kernel+0x0/0x270 > sp=e000001f08dbfc30 bsp=e000001f08db1290 > [] __pci_remove_bus_device+0x100/0x1e0 > sp=e000001f08dbfe00 bsp=e000001f08db1250 > [] pci_stop_and_remove_bus_device+0x30/0x60 > sp=e000001f08dbfe00 bsp=e000001f08db1230 > [] remove_callback+0x40/0x80 > sp=e000001f08dbfe00 bsp=e000001f08db1208 > [] sysfs_schedule_callback_work+0x50/0x120 > sp=e000001f08dbfe00 bsp=e000001f08db11d0 > [] process_one_work+0x6f0/0xae0 > sp=e000001f08dbfe00 bsp=e000001f08db1158 > [] worker_thread+0x3b0/0xc80 > sp=e000001f08dbfe00 bsp=e000001f08db1060 > [] kthread+0x110/0x140 > sp=e000001f08dbfe00 bsp=e000001f08db1028 > [] kernel_thread_helper+0x30/0x60 > sp=e000001f08dbfe30 bsp=e000001f08db1000 > [] start_kernel_thread+0x20/0x40 > sp=e000001f08dbfe30 bsp=e000001f08db1000 > Fixing recursive fault but reboot is needed! > Modules linked in: cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi > _cpufreq binfmt_misc fuse nls_iso8859_1 loop ipmi_si ipmi_devintf ipmi_msghandle > r dm_mod igb ppdev iTCO_wdt parport_pc iTCO_vendor_support i2c_i801 parport sg m > ptctl serio_raw i2c_core lpc_ich mfd_core hid_generic button container usbhid hi > d uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif ext3 mbcache jbd fan pr > ocessor ide_pci_generic ide_core ata_piix libata mptsas mptscsih mptbase scsi_tr > ansport_sas scsi_mod thermal thermal_sys hwmon > > Signed-off-by: Yijing Wang > --- > drivers/pci/pci-sysfs.c | 3 +++ > 1 files changed, 3 insertions(+), 0 deletions(-) > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c > index 6869009..b0be682 100644 > --- a/drivers/pci/pci-sysfs.c > +++ b/drivers/pci/pci-sysfs.c > @@ -332,7 +332,10 @@ static void remove_callback(struct device *dev) > struct pci_dev *pdev = to_pci_dev(dev); > > mutex_lock(&pci_remove_rescan_mutex); > + if (!pdev->is_added) > + goto out; > pci_stop_and_remove_bus_device(pdev); > +out: > mutex_unlock(&pci_remove_rescan_mutex); > } >