All of lore.kernel.org
 help / color / mirror / Atom feed
* Latest bk kernel does not properly free PCI IO & MEM allocations
@ 2005-03-09 18:38 Prarit Bhargava
  2005-03-10 15:07 ` Prarit Bhargava
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: Prarit Bhargava @ 2005-03-09 18:38 UTC (permalink / raw)
  To: linux-hotplug



Hello all,

I noticed this in testing of a hotplug driver I've written for SGI Altix hardware.

If you cat /proc/iomem and check lspci after removing a device via hotplug:

[root@cranberry3 linux-2.6.9-hotplug]# lspci
01:01.0 Co-processor: Silicon Graphics, Inc. IOC4 I/O controller (rev 4f)
01:03.0 SCSI storage controller: QLogic Corp. ISP12160 Dual Channel Ultra3 SCSI Processor (rev 06)
01:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15)
03:02.0 ATM network controller: FORE Systems Inc ForeRunnerHE ATM Adapter
04:01.0 SCSI storage controller: QLogic Corp. QLA2200 64-bit Fibre Channel Adapter (rev 05)
[root@cranberry3 linux-2.6.9-hotplug]# cat /proc/iomem 
c00004080c200000-c00004080c2fffff : 0000:01:01.0
c00004080f400000-c00004080f4fffff : 0000:03:02.0
c00004080fe00000-c00004080fe00fff : 0000:04:01.0
c00004080fe01000-c00004080fe01fff : 0000:04:02.0
  00220000-0023ffff : 0000:04:01.0

There are still allocations listed for device 04:02.0 even though it has been
removed from the system.

I tracked down the issue to the following:

When pci_remove_bus_device is called on the device in the slot (should be a safe 
thing to do) both the pci_driver remove function for a device and the function 
pci_free_resources are called.

In the case of the QLA2x00 driver, the driver's remove function calls
pci_release_regions.  This function releases the IO & MEM allocs for the pci
device and kfree's them.

As previously mentioned, pci_free_resources is called.  This function attempts 
to use and free IO & MEM allocs for the pci device.

This may (about 5% of the time) causes an oops, ie) this is a critical issue for anyone
using hotplug.

This raises the following question -- should a driver be cleaning up resources or should
PCI?

P.



-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_ide95&alloc_id\x14396&op=click
_______________________________________________
Linux-hotplug-devel mailing list  http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Latest bk kernel does not properly free PCI IO & MEM allocations
  2005-03-09 18:38 Latest bk kernel does not properly free PCI IO & MEM allocations Prarit Bhargava
@ 2005-03-10 15:07 ` Prarit Bhargava
  2005-03-10 17:17 ` Greg KH
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Prarit Bhargava @ 2005-03-10 15:07 UTC (permalink / raw)
  To: linux-hotplug

[-- Attachment #1: Type: text/plain, Size: 2698 bytes --]

The following patch reproduces the oops 100% of the time by doing the 
following.

Boot machine.
Load Hotplug driver (if not already loaded)
Disable Slot that has MEM and IO allocations
cat /proc/iomem <--- leads to oops.

P.

Prarit Bhargava wrote:

>
>
> Hello all,
>
> I noticed this in testing of a hotplug driver I've written for SGI 
> Altix hardware.
>
> If you cat /proc/iomem and check lspci after removing a device via 
> hotplug:
>
> [root@cranberry3 linux-2.6.9-hotplug]# lspci
> 01:01.0 Co-processor: Silicon Graphics, Inc. IOC4 I/O controller (rev 4f)
> 01:03.0 SCSI storage controller: QLogic Corp. ISP12160 Dual Channel 
> Ultra3 SCSI Processor (rev 06)
> 01:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 
> Gigabit Ethernet (rev 15)
> 03:02.0 ATM network controller: FORE Systems Inc ForeRunnerHE ATM Adapter
> 04:01.0 SCSI storage controller: QLogic Corp. QLA2200 64-bit Fibre 
> Channel Adapter (rev 05)
> [root@cranberry3 linux-2.6.9-hotplug]# cat /proc/iomem 
> c00004080c200000-c00004080c2fffff : 0000:01:01.0
> c00004080f400000-c00004080f4fffff : 0000:03:02.0
> c00004080fe00000-c00004080fe00fff : 0000:04:01.0
> c00004080fe01000-c00004080fe01fff : 0000:04:02.0
>  00220000-0023ffff : 0000:04:01.0
>
> There are still allocations listed for device 04:02.0 even though it 
> has been
> removed from the system.
>
> I tracked down the issue to the following:
>
> When pci_remove_bus_device is called on the device in the slot (should 
> be a safe thing to do) both the pci_driver remove function for a 
> device and the function pci_free_resources are called.
>
> In the case of the QLA2x00 driver, the driver's remove function calls
> pci_release_regions.  This function releases the IO & MEM allocs for 
> the pci
> device and kfree's them.
>
> As previously mentioned, pci_free_resources is called.  This function 
> attempts to use and free IO & MEM allocs for the pci device.
>
> This may (about 5% of the time) causes an oops, ie) this is a critical 
> issue for anyone
> using hotplug.
>
> This raises the following question -- should a driver be cleaning up 
> resources or should
> PCI?
>
> P.
>
>
>
> -------------------------------------------------------
> SF email is sponsored by - The IT Product Guide
> Read honest & candid reviews on hundreds of IT Products from real users.
> Discover which products truly live up to the hype. Start reading now.
> http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
> _______________________________________________
> Linux-hotplug-devel mailing list  http://linux-hotplug.sourceforge.net
> Linux-hotplug-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel
>

[-- Attachment #2: repro.patch --]
[-- Type: text/x-patch, Size: 628 bytes --]

--- kernel/resource.c.orig	2005-03-10 14:57:58.317008676 -0500
+++ kernel/resource.c	2005-03-10 14:58:02.310172689 -0500
@@ -495,20 +495,21 @@
 			break;
 		if (res->start <= start && res->end >= end) {
 			if (!(res->flags & IORESOURCE_BUSY)) {
 				p = &res->child;
 				continue;
 			}
 			if (res->start != start || res->end != end)
 				break;
 			*p = res->sibling;
 			write_unlock(&resource_lock);
+			memset(res, 0, sizeof(*res));
 			kfree(res);
 			return;
 		}
 		p = &res->sibling;
 	}
 
 	write_unlock(&resource_lock);
 
 	printk(KERN_WARNING "Trying to free nonexistent resource <%08lx-%08lx>\n", start, end);
 }

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Latest bk kernel does not properly free PCI IO & MEM allocations
  2005-03-09 18:38 Latest bk kernel does not properly free PCI IO & MEM allocations Prarit Bhargava
  2005-03-10 15:07 ` Prarit Bhargava
@ 2005-03-10 17:17 ` Greg KH
  2005-03-10 18:16 ` Prarit Bhargava
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Greg KH @ 2005-03-10 17:17 UTC (permalink / raw)
  To: linux-hotplug

On Wed, Mar 09, 2005 at 01:38:33PM -0500, Prarit Bhargava wrote:
> 
> 
> Hello all,
> 
> I noticed this in testing of a hotplug driver I've written for SGI Altix 
> hardware.
> 
> If you cat /proc/iomem and check lspci after removing a device via hotplug:
> 
> [root@cranberry3 linux-2.6.9-hotplug]# lspci
> 01:01.0 Co-processor: Silicon Graphics, Inc. IOC4 I/O controller (rev 4f)
> 01:03.0 SCSI storage controller: QLogic Corp. ISP12160 Dual Channel Ultra3 
> SCSI Processor (rev 06)
> 01:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit 
> Ethernet (rev 15)
> 03:02.0 ATM network controller: FORE Systems Inc ForeRunnerHE ATM Adapter
> 04:01.0 SCSI storage controller: QLogic Corp. QLA2200 64-bit Fibre Channel 
> Adapter (rev 05)
> [root@cranberry3 linux-2.6.9-hotplug]# cat /proc/iomem 
> c00004080c200000-c00004080c2fffff : 0000:01:01.0
> c00004080f400000-c00004080f4fffff : 0000:03:02.0
> c00004080fe00000-c00004080fe00fff : 0000:04:01.0
> c00004080fe01000-c00004080fe01fff : 0000:04:02.0
>  00220000-0023ffff : 0000:04:01.0
> 
> There are still allocations listed for device 04:02.0 even though it has 
> been
> removed from the system.
> 
> I tracked down the issue to the following:
> 
> When pci_remove_bus_device is called on the device in the slot (should be a 
> safe thing to do) both the pci_driver remove function for a device and the 
> function pci_free_resources are called.
> 
> In the case of the QLA2x00 driver, the driver's remove function calls
> pci_release_regions.  This function releases the IO & MEM allocs for the pci
> device and kfree's them.
> 
> As previously mentioned, pci_free_resources is called.  This function 
> attempts to use and free IO & MEM allocs for the pci device.

Where does it attempt to use it?  I see a loop to free the resources,
not anything else.

Also, if you enable slab debugging, does this always occur?

And, this should be brought up on the linux-pci mailing list, not the
hotplug list.

thanks,

greg k-h


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_ide95&alloc_id\x14396&op=click
_______________________________________________
Linux-hotplug-devel mailing list  http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Latest bk kernel does not properly free PCI IO & MEM allocations
  2005-03-09 18:38 Latest bk kernel does not properly free PCI IO & MEM allocations Prarit Bhargava
  2005-03-10 15:07 ` Prarit Bhargava
  2005-03-10 17:17 ` Greg KH
@ 2005-03-10 18:16 ` Prarit Bhargava
  2005-03-10 18:50 ` Prarit Bhargava
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Prarit Bhargava @ 2005-03-10 18:16 UTC (permalink / raw)
  To: linux-hotplug



Greg KH wrote:

>>I tracked down the issue to the following:
>>
>>When pci_remove_bus_device is called on the device in the slot (should be a 
>>safe thing to do) both the pci_driver remove function for a device and the 
>>function pci_free_resources are called.
>>
>>In the case of the QLA2x00 driver, the driver's remove function calls
>>pci_release_regions.  This function releases the IO & MEM allocs for the pci
>>device and kfree's them.
>>
>>As previously mentioned, pci_free_resources is called.  This function 
>>attempts to use and free IO & MEM allocs for the pci device.
>>    
>>
>
>Where does it attempt to use it?  I see a loop to free the resources,
>not anything else.
>
>  
>
Sorry -- I confused you.  I meant "use and then free" in that it uses 
kfree'd memory and then attempts to free the resources. 

>Also, if you enable slab debugging, does this always occur?
>  
>
Haven't tried that ... I'll give it a shot and report back to the list.

>And, this should be brought up on the linux-pci mailing list, not the
>hotplug list.
>
>  
>
I'll defer to your judgement on that :)  Given the code paths above it 
seemed to be a hotplug issue to me :)

>thanks,
>
>  
>
Thanks for your reply -- I'll turn on slab debug and see what happens...

>greg k-h
>
>
>  
>
P.

>-------------------------------------------------------
>SF email is sponsored by - The IT Product Guide
>Read honest & candid reviews on hundreds of IT Products from real users.
>Discover which products truly live up to the hype. Start reading now.
>http://ads.osdn.com/?ad_ide95&alloc_id\x14396&op=click
>_______________________________________________
>Linux-hotplug-devel mailing list  http://linux-hotplug.sourceforge.net
>Linux-hotplug-devel@lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel
>
>  
>


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_ide95&alloc_id\x14396&op=click
_______________________________________________
Linux-hotplug-devel mailing list  http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Latest bk kernel does not properly free PCI IO & MEM allocations
  2005-03-09 18:38 Latest bk kernel does not properly free PCI IO & MEM allocations Prarit Bhargava
                   ` (2 preceding siblings ...)
  2005-03-10 18:16 ` Prarit Bhargava
@ 2005-03-10 18:50 ` Prarit Bhargava
  2005-03-10 23:23 ` Greg KH
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Prarit Bhargava @ 2005-03-10 18:50 UTC (permalink / raw)
  To: linux-hotplug


> Greg KH wrote: Also, if you enable slab debugging, does this always 
> occur?
>  
>

Sorry Greg -- still waiting on linux-pci subscription ... :(

Anyway just to continue the discussion -- here's a run with SLAB debug 
on.  The oops is now 100% reproducible. 

[root@altix3 ~]# lspci
01:01.0 Co-processor: Silicon Graphics, Inc. IOC4 I/O controller (rev 53)
01:03.0 SCSI storage controller: QLogic Corp. ISP12160 Dual Channel 
Ultra3 SCSI Processor (rev 06)
01:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 
Gigabit Ethernet (rev 15)
02:01.0 Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 02)
02:01.1 Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 02)
03:01.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X 
Fusion-MPT Dual Ultra320 SCSI (rev 07)
03:01.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X 
Fusion-MPT Dual Ultra320 SCSI (rev 07)
05:01.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X 
Fusion-MPT Dual Ultra320 SCSI (rev 07)
05:01.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X 
Fusion-MPT Dual Ultra320 SCSI (rev 07)
[root@altix3 ~]# cat /proc/iomem
c00000880c200000-c00000880c2fffff : 0000:01:01.0
c00000880f400000-c00000880f41ffff : 0000:03:01.0
c00000880d400000-c00000880d41ffff : 0000:05:01.0
c00001880cc00000-c00001880cc00fff : 0000:02:01.0
c00001880cc01000-c00001880cc01fff : 0000:02:01.1
[root@altix3 ~]# cat /proc/ioports
00000060-0000006f : i8042
130027ad160-130027ad163 : PM1a_EVT_BLK
130027ad170-130027ad171 : PM1a_CNT_BLK
130027ad180-130027ad183 : PM_TMR
130027ad190-130027ad197 : GPE0_BLK
c00000880c200100-c00000880c20011f : ide0
c00000880c200140-c00000880c200163 : ide0
c00000880c600000-c00000880c6000ff : 0000:01:03.0
c00001880ca00000-c00001880ca000ff : 0000:02:01.0
c00000880f200000-c00000880f2000ff : 0000:03:01.0
c00000880d200000-c00000880d2000ff : 0000:05:01.0
[root@altix3 ~]# !modprobe
modprobe sgi_hotplug
[root@altix3 ~]# cd /sys/bus/pci/slots/
[root@altix3 slots]# ls
m_001\x01_b_1_s_2  m_001\x01_b_3_s_1  m_001\x01_b_4_s_2  m_001\x01_b_6_s_1
m_001\x01_b_2_s_1  m_001\x01_b_3_s_2  m_001\x01_b_5_s_1  m_001\x01_b_6_s_2
m_001\x01_b_2_s_2  m_001\x01_b_4_s_1  m_001\x01_b_5_s_2
[root@altix3 slots]# cd m_001\x01_b_5_s_1
[root@altix3 m_001\x01_b_5_s_1]# echo 0 > power
Begin PCI Hot-Plug Message for -> m_001\x01_b_5_s_1
Remove operation successful
End PCI Hot-Plug Message for -> m_001\x01_b_5_s_1

[root@altix3 m_001\x01_b_5_s_1]# lspci
01:01.0 Co-processor: Silicon Graphics, Inc. IOC4 I/O controller (rev 53)
01:03.0 SCSI storage controller: QLogic Corp. ISP12160 Dual Channel 
Ultra3 SCSI Processor (rev 06)
01:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 
Gigabit Ethernet (rev 15)
02:01.0 Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 02)
02:01.1 Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 02)
03:01.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X 
Fusion-MPT Dual Ultra320 SCSI (rev 07)
03:01.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X 
Fusion-MPT Dual Ultra320 SCSI (rev 07)
[root@altix3 m_001\x01_b_5_s_1]# cat /proc/iomem
Segmentation fault
Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b6b8b
cat[2871]: Oops 11012296146944 [1]
Modules linked in: sgi_hotplug(U) md5(U) ipv6(U) parport_pc(U) lp(U) 
parport(U) autofs4(U) sunrpc(U) ds(U) yenta_socket(U) pcmcia_core(U) 
ipt_REJECT(U) ipt_state(U) ip_conntrack(U) iptable_filter(U) 
ip_tables(U) vfat(U) fat(U) button(U) sgiioc4(U) tg3(U) dm_snapshot(U) 
dm_zero(U) dm_mirror(U) ext3(U) jbd(U) dm_mod(U) qla2300(U) qla2xxx(U) 
scsi_transport_fc(U) qla1280(U) mptscsih(U) mptbase(U) sd_mod(U) scsi_mod(U)

Pid: 2871, CPU 3, comm:                  cat
psr : 0000121008126010 ifs : 800000000000028d ip  : 
[<a00000010007ddf1>]    Not tainted
ip is at r_show+0x51/0x120
unat: 0000000000000000 pfs : 0000000000000894 rsc : 0000000000000003
rnat: 0000101008126010 bsps: a00000010022fe80 pr  : 000000000955a959
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a00000010015ee70 b6  : a00000010007db80 b7  : a00000010007dda0
f6  : 0fffafffffffff0000000 f7  : 0ffdec000000000000000
f8  : 10002c000000000000000 f9  : 100038000000000000000
f10 : 0fffebffffffff4000000 f11 : 1003e0000000000000000
r1  : a00000010096a980 r2  : a000000100702a18 r3  : e00001b07be04748
r8  : 6b6b6b6b6b6b6b6b r9  : 6b6b6b6b6b6b6b8b r10 : a000000100634858
r11 : a00000010007dda0 r12 : e00000b0759afe00 r13 : e00000b0759a8000
r14 : 000000000000ffff r15 : ffffffffffffffff r16 : a000000100702a08
r17 : 6b6b6b6b6b6b6b6b r18 : 0000000000000062 r19 : e00000b0759afe20
r20 : a0000001005cfa00 r21 : 0000000000000270 r22 : a0000001005cf790
r23 : a000000100778a18 r24 : a0000001007124d3 r25 : 0000000000000073
r26 : e00000b0759afe10 r27 : ffffffffffffffff r28 : e00000b0759afdd0
r29 : 000000000000000b r30 : 000000000000000c r31 : fffffffffffffffe

Call Trace:
 [<a000000100016a40>] show_stack+0x80/0xa0
                                spà0000b0759af9b0 bspà0000b0759a9110
 [<a000000100017350>] show_regs+0x890/0x8c0
                                spà0000b0759afb80 bspà0000b0759a90c8
 [<a00000010003c970>] die+0x150/0x240
                                spà0000b0759afba0 bspà0000b0759a9088
 [<a00000010005d870>] ia64_do_page_fault+0x9f0/0xba0
                                spà0000b0759afba0 bspà0000b0759a9020
 [<a00000010000f480>] ia64_leave_kernel+0x0/0x260
                                spà0000b0759afc30 bspà0000b0759a9020
 [<a00000010007ddf0>] r_show+0x50/0x120
                                spà0000b0759afe00 bspà0000b0759a8fb0
 [<a00000010015ee70>] seq_read+0x730/0x940
                                spà0000b0759afe10 bspà0000b0759a8f28
 [<a000000100111590>] vfs_read+0x290/0x360
                                spà0000b0759afe20 bspà0000b0759a8ed8
 [<a000000100111bd0>] sys_read+0x70/0xe0
                                spà0000b0759afe20 bspà0000b0759a8e60


[root@altix3 m_001\x01_b_5_s_1]#


I'm close to what I think an appropriate solution is.  I could just remove
the erroneous call to pci_free_resources, however, I believe that this may
cause some drivers (that are dependent on the pci_free_resources call) to 
break. 

Since pci_free_resources is a legacy (deprecated?) call that we should modify
the code such that release_resources sets the appropriate pci_dev->res pointers
to null, and pci_free_resources only attempts to free a resource if the 
pointer is not NULL.

At that time we would output a warning message about the deprecation of the
call ... IMO, pci_free_resources should never be called.  

Of course, you could argue that the drivers are broken if they depend on 
pci_free_resources anyway ;)


P.

>


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_ide95&alloc_id\x14396&op=click
_______________________________________________
Linux-hotplug-devel mailing list  http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Latest bk kernel does not properly free PCI IO & MEM allocations
  2005-03-09 18:38 Latest bk kernel does not properly free PCI IO & MEM allocations Prarit Bhargava
                   ` (3 preceding siblings ...)
  2005-03-10 18:50 ` Prarit Bhargava
@ 2005-03-10 23:23 ` Greg KH
  2005-03-11 21:30 ` Prarit Bhargava
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Greg KH @ 2005-03-10 23:23 UTC (permalink / raw)
  To: linux-hotplug

On Thu, Mar 10, 2005 at 01:50:24PM -0500, Prarit Bhargava wrote:
> 
> 
> I'm close to what I think an appropriate solution is.  I could just remove
> the erroneous call to pci_free_resources, however, I believe that this may
> cause some drivers (that are dependent on the pci_free_resources call) to 
> break. 
> 
> Since pci_free_resources is a legacy (deprecated?) call that we should 
> modify the code such that release_resources sets the appropriate
> pci_dev->res pointers to null, and pci_free_resources only attempts to
> free a resource if the pointer is not NULL.

pci_free_resources is not a legacy call, it's a function internal to the
pci core.

Does the following patch fix the issue for you?

thanks,

greg k-h

=== drivers/pci/remove.c 1.12 vs edited ==--- 1.12/drivers/pci/remove.c	2005-01-18 12:22:39 -08:00
+++ edited/drivers/pci/remove.c	2005-03-10 15:22:52 -08:00
@@ -19,7 +19,7 @@ static void pci_free_resources(struct pc
 	pci_cleanup_rom(dev);
 	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
 		struct resource *res = dev->resource + i;
-		if (res->parent)
+		if (res && res->parent)
 			release_resource(res);
 	}
 }


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_ide95&alloc_id\x14396&op=click
_______________________________________________
Linux-hotplug-devel mailing list  http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Latest bk kernel does not properly free PCI IO & MEM allocations
  2005-03-09 18:38 Latest bk kernel does not properly free PCI IO & MEM allocations Prarit Bhargava
                   ` (4 preceding siblings ...)
  2005-03-10 23:23 ` Greg KH
@ 2005-03-11 21:30 ` Prarit Bhargava
  2005-03-12  7:38 ` Greg KH
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Prarit Bhargava @ 2005-03-11 21:30 UTC (permalink / raw)
  To: linux-hotplug

[-- Attachment #1: Type: text/plain, Size: 1175 bytes --]

Hi Greg,

Greg KH wrote:

>pci_free_resources is not a legacy call, it's a function internal to the
>pci core.
>  
>
Sorry -- I read that back and realize how incorrect that was.

What I meant to say was pci_free_resources calls release_resource, where
release_region calls __release_region.  __release_region is called a 
"legacy" function?  I
assumed this meant that it was to be deprecated?  Oh ... Uh ... does the 
term Legacy
really refer to Legacy address space?  (Prarit tries not to look 
sheepish asking that question.)

>Does the following patch fix the issue for you?
>
>  
>
Nope, because the res pointer still points to an "old" memory address -- 
then the res value check succeeds and I would oops if res->parent was 
NULL. (Haven't tried it but I think it will still die if SLAB debug is 
on ...)

How about the attached patch?  (I think the warning is a bit wordy ... 
if you have a better idea I'm all for it ... If you want to dump the 
warning I'm fine with that too...)

P.

P.S.  Greg, I've sent 3 separate requests to the linux PCI list but 
haven't heard anything back.  Do you know who the maintainer is?  ... 
I'll double check my spam filter ...




[-- Attachment #2: pci.patch --]
[-- Type: text/x-patch, Size: 928 bytes --]

===== drivers/pci/remove.c 1.6 vs edited =====
--- 1.6/drivers/pci/remove.c	2005-01-14 18:06:55 -05:00
+++ edited/drivers/pci/remove.c	2005-03-11 21:18:22 -05:00
@@ -19,8 +19,12 @@
 	pci_cleanup_rom(dev);
 	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
 		struct resource *res = dev->resource + i;
-		if (res->parent)
+		if (res && res->parent) {
+			printk(KERN_WARNING "WARNING: PCI RESOURCE 0x%lx:0x%lx for dev %s free'd by PCI subsystem.",
+				pci_resource_start(dev,i), pci_resource_end(dev,i), dev->slot_name);
+			printk(KERN_WARNING " This should be done at the driver level.");
 			release_resource(res);
+		}
 	}
 }
 
===== kernel/resource.c 1.26 vs edited =====
--- 1.26/kernel/resource.c	2005-01-08 00:44:13 -05:00
+++ edited/kernel/resource.c	2005-03-11 20:56:47 -05:00
@@ -505,6 +505,7 @@
 			*p = res->sibling;
 			write_unlock(&resource_lock);
 			kfree(res);
+			res = NULL;
 			return;
 		}
 		p = &res->sibling;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Latest bk kernel does not properly free PCI IO & MEM allocations
  2005-03-09 18:38 Latest bk kernel does not properly free PCI IO & MEM allocations Prarit Bhargava
                   ` (5 preceding siblings ...)
  2005-03-11 21:30 ` Prarit Bhargava
@ 2005-03-12  7:38 ` Greg KH
  2005-03-12 17:04 ` prarit
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Greg KH @ 2005-03-12  7:38 UTC (permalink / raw)
  To: linux-hotplug

On Fri, Mar 11, 2005 at 04:30:07PM -0500, Prarit Bhargava wrote:
> Hi Greg,
> 
> Greg KH wrote:
> 
> >pci_free_resources is not a legacy call, it's a function internal to the
> >pci core.
> > 
> >
> Sorry -- I read that back and realize how incorrect that was.
> 
> What I meant to say was pci_free_resources calls release_resource, where
> release_region calls __release_region.  __release_region is called a 
> "legacy" function?

What documentation calls it that?

> >Does the following patch fix the issue for you?
> >
> > 
> >
> Nope, because the res pointer still points to an "old" memory address -- 
> then the res value check succeeds and I would oops if res->parent was 
> NULL. (Haven't tried it but I think it will still die if SLAB debug is 
> on ...)
> 
> How about the attached patch?  (I think the warning is a bit wordy ... 
> if you have a better idea I'm all for it ... If you want to dump the 
> warning I'm fine with that too...)

I don't like the wording at all, no one will notice it (trust me I've
tried stuff like this before...)

But what's really curious, is why no one has hit this before.  Nothing
has changed recently in this area of the kernel.  Did this used to work
before?  Does it work just fine without the patch for other drivers?

thanks,

greg k-h


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_ide95&alloc_id\x14396&op=click
_______________________________________________
Linux-hotplug-devel mailing list  http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Latest bk kernel does not properly free PCI IO & MEM allocations
  2005-03-09 18:38 Latest bk kernel does not properly free PCI IO & MEM allocations Prarit Bhargava
                   ` (6 preceding siblings ...)
  2005-03-12  7:38 ` Greg KH
@ 2005-03-12 17:04 ` prarit
  2005-03-13  0:49 ` Re: Latest bk kernel does not properly free PCI IO & MEM prarit
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: prarit @ 2005-03-12 17:04 UTC (permalink / raw)
  To: linux-hotplug

Greg,

Thanks for the reply.  I'm at home today and don't have access to my SGI
account.  If you want to fwd this to the mailing list please feel free.
If I don't see it Monday morning, I'll fwd it back to the list then.

>> What I meant to say was pci_free_resources calls release_resource, where
>> release_region calls __release_region.  __release_region is called a 
>> "legacy" function?
 
> What documentation calls it that?

Hrmm ... I distinctly remember seeing that somewhere.  I'll find it again.
I also recall the term "compatibility cruft" used ... in ioport.h?

>I don"t like the wording at all, no one will notice it (trust me I"ve
> tried stuff like this before...)

Yeah ... I know.  I've tried it before too.  I've been working in this
space for a long time and it's clear to me that it needs a compile time 
#define DEBUG printk option.  Issues within the resource allocation code 
are impossible to debug without sticking a large # of printk's 
throughout the code.

>But what"s really curious, is why no one has hit this before.  Nothing
> has changed recently in this area of the kernel.  Did this used to work
> before?  Does it work just fine without the patch for other drivers?

I haven't gone back through kernels to determine where this "breakage"
occurred, but I do know that it is in the 2.6.9 kernel as I have been 
developing on a RHEL4 platform.

This is how I stumbled across this:  As you know I'm building and am
testing an SGI Altix Hotplug Driver.  

While testing I started up a few memory stress and IO stress tests
*that did not involve the card I was targeting for the test* and
approximately 5% of the time I hit a NULL pointer oops.  

At first glance, I thought the issue was within the sysfs/proc 
filesystems or pci_free_resources as that's where the oops' were 
-- obviously with more inspection I realized that didn't make
sense and it wasn't  the case.

A few things have to happen (in the precise order) for someone
to hit this issue.  Note that I have been running on 16, 32, and
64 cpu systems so it is very like #4 below is due to another CPU.

1.  HP slot is disabled via sysfs.
2.  PCI driver must call release_regions
3.  release_regions kfree's resource structures
4.  Context switch/Another CPU: resource area is alloc'd by 
    something else.
5.  pci_free_resources is called

Possible oops right here.

I've also stumbled across the case where the pci_free_resources case 
tried to free non-existant regions -- dumping the memory address indicated
in one case that I was looking at char data ... I recall that it 
looked like I was looking at the word "qla".  I dumped /proc/iomem
and /proc/ioports and incurred an oops in /proc .

Suppose #4 above doesn't happen.  Step #5 occurs and no one (user or 
system) is the wiser.  The memory is still intact -- no oops.

Additionally, I've seen people using PCI Hotplug in the field. 
Typically when removing a card sysadmins tend to quiesce the system
before removal.  That makes #4 just that much less likely to occur.

:) :)  Want the long explanation? :) :)

P.


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_ide95&alloc_id\x14396&op=click
_______________________________________________
Linux-hotplug-devel mailing list  http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Re: Latest bk kernel does not properly free PCI IO & MEM
  2005-03-09 18:38 Latest bk kernel does not properly free PCI IO & MEM allocations Prarit Bhargava
                   ` (7 preceding siblings ...)
  2005-03-12 17:04 ` prarit
@ 2005-03-13  0:49 ` prarit
  2005-03-15  6:11 ` Re: Latest bk kernel does not properly free PCI IO & MEM allocations Rajesh Shah
  2005-03-15 12:55 ` Prarit Bhargava
  10 siblings, 0 replies; 12+ messages in thread
From: prarit @ 2005-03-13  0:49 UTC (permalink / raw)
  To: linux-hotplug

> So running a UP kernel on your box doesn't cause any problems?

Gee .. I haven't actually tried it :).  That's probably a 
good test to run ... I'll try that first thing on 
Monday.

> > Additionally, I've seen people using PCI Hotplug in the field. 
> > Typically when removing a card sysadmins tend to quiesce the system
> > before removal.  That makes #4 just that much less likely to occur.
> 
> It's usually a good idea to do this :)
> 

In some sense, yes.  I would argue that it's not a 
PCI HotPlug issue, but a driver issue -- most of the
drivers out there don't pay a lot of attention to the
stability of the driver remove function :(

I've seen all sorts of things in removes that are done
wrong -- from not terminating timers, to not quiecing
queues, to not properly kfree'ing memory.

Qlogic drivers are especially bad :) ... That's why
I hesitate to test with them in the system but in this
case it was what was available to me.

> > :) :)  Want the long explanation? :) :)
> 
> Sure...
> 

I'll go back to work and pull out the oops' etc.

> Anyway, I'll add the basic, NULL out the pointer and check for it patch
> to the tree soon, so that your oops should not happen anymore.

Thanks Greg :)  I've tested with that patch and it
seems better.  IMO though, we still have a glitch
in PCI.  I'm not convinced that it should be calling
pci_free_resources at all.  

> 
> thanks,
> 
> greg k-h
> 

Thanks,

P.


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_ide95&alloc_id\x14396&op=click
_______________________________________________
Linux-hotplug-devel mailing list  http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Re: Latest bk kernel does not properly free PCI IO & MEM allocations
  2005-03-09 18:38 Latest bk kernel does not properly free PCI IO & MEM allocations Prarit Bhargava
                   ` (8 preceding siblings ...)
  2005-03-13  0:49 ` Re: Latest bk kernel does not properly free PCI IO & MEM prarit
@ 2005-03-15  6:11 ` Rajesh Shah
  2005-03-15 12:55 ` Prarit Bhargava
  10 siblings, 0 replies; 12+ messages in thread
From: Rajesh Shah @ 2005-03-15  6:11 UTC (permalink / raw)
  To: linux-hotplug

On Sun, Mar 13, 2005 at 12:49:42AM +0000, prarit wrote:
> 
> I'll go back to work and pull out the oops' etc.
> 
> > Anyway, I'll add the basic, NULL out the pointer and check for it patch
> > to the tree soon, so that your oops should not happen anymore.
> 
> Thanks Greg :)  I've tested with that patch and it
> seems better.  IMO though, we still have a glitch
> in PCI.  I'm not convinced that it should be calling
> pci_free_resources at all.  
> 
Note that ia64 arch code calls pci_claim_resources() at bus
scan time (in pcibios_fixup_bus()). This adds the resources
in pci_dev structure to the resource list, even before the
corresponding driver is loaded. It is fairly common for a
device driver to then call pci_request_regions() when it is
loaded. This causes a new resource descriptor to be allocated
and added to the resource list.  For example, on my tiger4
ia64 system, /proc/iomem shows:

<snip>
f9000000-fbffffff : PCI Bus 0000:00
  f9ff0000-f9ff03ff : 0000:00:1d.7
    f9ff0000-f9ff03ff : ehci_hcd
  fa000000-faffffff : 0000:01:01.0
  fbfa0000-fbfbffff : 0000:01:01.0
  fbfd0000-fbfd0fff : 0000:01:01.0
  fbfe0000-fbffffff : 0000:01:00.0
    fbfe0000-fbffffff : e1000
<snip>

Since we have 2 resource descriptors for the same resource in
some cases, they both do need to be removed at device remove
time. However, I saw only 1 resource in your /proc/iomem (which
was presumably after the driver had loaded). I wonder if this is
the problem in this case.

In any case, claiming and freeing pci resources needs wider
cleanup. See

http://marc.theaimsgroup.com/?l=linux-kernel&m\x110841094228134&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m\x110842019012889&w=2

for related discussions.

Rajesh


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_ide95&alloc_id\x14396&op=click
_______________________________________________
Linux-hotplug-devel mailing list  http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Latest bk kernel does not properly free PCI IO & MEM allocations
  2005-03-09 18:38 Latest bk kernel does not properly free PCI IO & MEM allocations Prarit Bhargava
                   ` (9 preceding siblings ...)
  2005-03-15  6:11 ` Re: Latest bk kernel does not properly free PCI IO & MEM allocations Rajesh Shah
@ 2005-03-15 12:55 ` Prarit Bhargava
  10 siblings, 0 replies; 12+ messages in thread
From: Prarit Bhargava @ 2005-03-15 12:55 UTC (permalink / raw)
  To: linux-hotplug



Rajesh Shah wrote:

>Note that ia64 arch code calls pci_claim_resources() at bus
>scan time (in pcibios_fixup_bus()). This adds the resources
>  
>
I've noticed this as well ...

>in pci_dev structure to the resource list, even before the
>corresponding driver is loaded. It is fairly common for a
>device driver to then call pci_request_regions() when it is
>loaded. This causes a new resource descriptor to be allocated
>and added to the resource list.  For example, on my tiger4
>ia64 system, /proc/iomem shows:
>
><snip>
>f9000000-fbffffff : PCI Bus 0000:00
>  f9ff0000-f9ff03ff : 0000:00:1d.7
>    f9ff0000-f9ff03ff : ehci_hcd
>  fa000000-faffffff : 0000:01:01.0
>  fbfa0000-fbfbffff : 0000:01:01.0
>  fbfd0000-fbfd0fff : 0000:01:01.0
>  fbfe0000-fbffffff : 0000:01:00.0
>    fbfe0000-fbffffff : e1000
><snip>
>
>  
>
UGH.  This gets worse and worse, doesn't it :)

>Since we have 2 resource descriptors for the same resource in
>some cases, they both do need to be removed at device remove
>time. However, I saw only 1 resource in your /proc/iomem (which
>was presumably after the driver had loaded). I wonder if this is
>the problem in this case.
>
>In any case, claiming and freeing pci resources needs wider
>cleanup. See
>
>http://marc.theaimsgroup.com/?l=linux-kernel&m\x110841094228134&w=2
>http://marc.theaimsgroup.com/?l=linux-kernel&m\x110842019012889&w=2
>
>for related discussions.
>
>  
>
Thanks Rajesh -- I appreciate the pointers to earlier discussions.

P.



-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_ide95&alloc_id\x14396&op=click
_______________________________________________
Linux-hotplug-devel mailing list  http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2005-03-15 12:55 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-03-09 18:38 Latest bk kernel does not properly free PCI IO & MEM allocations Prarit Bhargava
2005-03-10 15:07 ` Prarit Bhargava
2005-03-10 17:17 ` Greg KH
2005-03-10 18:16 ` Prarit Bhargava
2005-03-10 18:50 ` Prarit Bhargava
2005-03-10 23:23 ` Greg KH
2005-03-11 21:30 ` Prarit Bhargava
2005-03-12  7:38 ` Greg KH
2005-03-12 17:04 ` prarit
2005-03-13  0:49 ` Re: Latest bk kernel does not properly free PCI IO & MEM prarit
2005-03-15  6:11 ` Re: Latest bk kernel does not properly free PCI IO & MEM allocations Rajesh Shah
2005-03-15 12:55 ` Prarit Bhargava

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.