* [PATCH] sd name space exhaustion causes system hang
@ 2010-09-20 16:20 Michael Reed
From: Michael Reed @ 2010-09-20 16:20 UTC
  To: linux-scsi; +Cc: Jeremy Higdon, Tony Ernst

Following a site power outage which re-enabled all the ports on my FC
switches, my system subsequently booted with far too many LUNs!  I let
it run, hoping it would make it to multi-user.  It didn't.  :(  It hung
solid after exhausting the last sd device name, sdzzz, and attempting to
create sdaaaa and beyond.  I was unable to get a dump.

Discovered on a 2.6.32.13-based system.

The patch at the bottom corrects this by detecting when the sd name space
(SD_MAX_DISKS indices) is exhausted and failing the sd probe of the
device.  The patch applies to scsi-misc-2.6.
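To make the limit concrete, the following is a small userspace sketch of the naming scheme. It is modelled loosely on the kernel's sd_format_disk_name(), which the patch calls, but it is not that code. model_sd_name() is a hypothetical helper. SD_MAX_DISKS is assumed to be defined as in sd.c, as the count of one-, two- and three-letter suffixes. Nothing in the formatting itself stops at sdzzz: index SD_MAX_DISKS simply becomes sdaaaa, which matches the names that appear in the log.

```c
#include <assert.h>
#include <string.h>

/* Assumed to match sd.c: 26 + 26*26 + 26*26*26 = 18278 names (sda..sdzzz). */
#define SD_MAX_DISKS (((26 * 26) + 26 + 1) * 26)

/*
 * Hypothetical userspace model of the sd naming scheme: the suffix is the
 * index written in "bijective base 26" (a..z, aa..zz, aaa..zzz, aaaa...).
 * Returns a pointer into a static buffer, so it is not reentrant.
 */
const char *model_sd_name(int index)
{
	static char buf[16];            /* room for "sd" + 7 letters + NUL */
	char *p = buf + sizeof(buf) - 1;

	assert(index >= 0);
	*p = '\0';
	do {
		*--p = (char)('a' + index % 26);   /* lowest "digit" first */
		index = index / 26 - 1;            /* -1: this base has no zero digit */
	} while (index >= 0);

	p -= 2;
	memcpy(p, "sd", 2);                    /* prepend the "sd" prefix */
	return p;
}
```

For example, model_sd_name(SD_MAX_DISKS - 1) yields "sdzzz" and model_sd_name(SD_MAX_DISKS) yields "sdaaaa". The overflow into four letters is therefore silent unless something checks the index explicitly.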


Signed-off-by: Michael Reed <mdr@sgi.com>


------------[ cut here ]------------
WARNING: at /usr/src/packages/BUILD/kernel-default-2.6.32.13/linux-2.6.32/block/genhd.c:547 add_disk+0x150/0x3c0()
Modules linked in: dm_mod tpm_tis mptfc qla2xxx tpm mptscsih ide_cd_mod tpm_bios shpchp mptbase lpfc pci_hotplug scsi_transport_fc tg3 button scsi_tgt sg cdrom sd_mod crc_t10dif qla1280 scsi_mod xfs exportfs fan processor sgiioc4 ide_core ioc4 thermal thermal_sys hwmon
Supported: Yes

Call Trace:
 [<a000000100017a80>] show_stack+0x80/0xa0
                                sp=e00000346db7fbf0 bsp=e00000346db712b8
 [<a0000001008e2d30>] dump_stack+0x30/0x50
                                sp=e00000346db7fdc0 bsp=e00000346db712a0
 [<a0000001000b8440>] warn_slowpath_common+0xc0/0x120
                                sp=e00000346db7fdc0 bsp=e00000346db71268
 [<a0000001000b84d0>] warn_slowpath_null+0x30/0x60
                                sp=e00000346db7fdc0 bsp=e00000346db71240
 [<a0000001004393f0>] add_disk+0x150/0x3c0
                                sp=e00000346db7fdc0 bsp=e00000346db71210
 [<a000000204e7c310>] sd_probe_async+0x1f0/0x4a0 [sd_mod]
                                sp=e00000346db7fdd0 bsp=e00000346db711b8
 [<a000000100105180>] run_one_entry+0x180/0x4c0
                                sp=e00000346db7fdd0 bsp=e00000346db71170
 [<a0000001001055b0>] async_thread+0xf0/0x1c0
                                sp=e00000346db7fdd0 bsp=e00000346db71140
 [<a0000001000f51e0>] kthread+0x100/0x140
                                sp=e00000346db7fe00 bsp=e00000346db71108
 [<a0000001000154f0>] kernel_thread_helper+0xd0/0x100
                                sp=e00000346db7fe30 bsp=e00000346db710e0
 [<a00000010000a4c0>] start_kernel_thread+0x20/0x40
                                sp=e00000346db7fe30 bsp=e00000346db710e0
---[ end trace 509bba4ec4cded93 ]---
sd 12:0:42:25: Attached scsi generic sg31636 type 0
 sdzzz:
 sdaaaa: unknown partition table
 unknown partition table
sd 13:0:42:41: [sdaaab] 51200 512-byte logical blocks: (26.2 MB/25.0 MiB)
scsi 3:0:43:6: Direct-Access     Linux    (none)           0818 PQ: 0 ANSI: 5
sd 2:0:43:25: Attached scsi generic sg31637 type 0
sd 13:0:42:41: [sdaaab] Write Protect is off
sd 13:0:42:41: [sdaaab] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA

------------[ cut here ]------------
WARNING: at /usr/src/packages/BUILD/kernel-default-2.6.32.13/linux-2.6.32/fs/sysfs/dir.c:491 sysfs_add_one+0x150/0x180()
sysfs: cannot create duplicate filename '/devices/virtual/bdi/0:0'
Modules linked in: dm_mod tpm_tis mptfc qla2xxx tpm mptscsih ide_cd_mod tpm_bios shpchp mptbase lpfc pci_hotplug scsi_transport_fc tg3 button scsi_tgt sg cdrom sd_mod crc_t10dif qla1280 scsi_mod xfs exportfs fan processor sgiioc4 ide_core ioc4 thermal thermal_sys hwmon
Supported: Yes

Call Trace:
 [<a000000100017a80>] show_stack+0x80/0xa0
                                sp=e00000346db9fb10 bsp=e00000346db915d8
 [<a0000001008e2d30>] dump_stack+0x30/0x50
                                sp=e00000346db9fce0 bsp=e00000346db915c0
 [<a0000001000b8440>] warn_slowpath_common+0xc0/0x120
                                sp=e00000346db9fce0 bsp=e00000346db91588
 [<a0000001000b8590>] warn_slowpath_fmt+0x90/0xc0
                                sp=e00000346db9fce0 bsp=e00000346db91528
 [<a00000010032f8b0>] sysfs_add_one+0x150/0x180
                                sp=e00000346db9fd20 bsp=e00000346db914e8
 [<a000000100330640>] create_dir+0x80/0x100
                                sp=e00000346db9fd20 bsp=e00000346db914b0
 [<a000000100330730>] sysfs_create_dir+0x70/0x100
                                sp=e00000346db9fd40 bsp=e00000346db91490
 [<a0000001004ac570>] kobject_add_internal+0x210/0x5c0
                                sp=e00000346db9fd50 bsp=e00000346db91448
 [<a0000001004acb20>] kobject_add_varg+0x60/0xc0
                                sp=e00000346db9fd50 bsp=e00000346db91410
 [<a0000001004accd0>] kobject_add+0x90/0x140
                                sp=e00000346db9fd50 bsp=e00000346db913a8
 [<a000000100636f00>] device_add+0x1a0/0xbc0
                                sp=e00000346db9fd80 bsp=e00000346db91348
 [<a000000100637950>] device_register+0x30/0x60
                                sp=e00000346db9fd90 bsp=e00000346db91328
 [<a000000100637b00>] device_create_vargs+0x180/0x1a0
                                sp=e00000346db9fd90 bsp=e00000346db912d8
 [<a0000001001c95d0>] bdi_register+0xf0/0x420
                                sp=e00000346db9fd90 bsp=e00000346db91268
 [<a0000001001c9940>] bdi_register_dev+0x40/0x60
                                sp=e00000346db9fdc0 bsp=e00000346db91240
 [<a0000001004395c0>] add_disk+0x320/0x3c0
                                sp=e00000346db9fdc0 bsp=e00000346db91210
 [<a000000204e7c310>] sd_probe_async+0x1f0/0x4a0 [sd_mod]
                                sp=e00000346db9fdd0 bsp=e00000346db911b8
 [<a000000100105180>] run_one_entry+0x180/0x4c0
                                sp=e00000346db9fdd0 bsp=e00000346db91170
 [<a0000001001055b0>] async_thread+0xf0/0x1c0
                                sp=e00000346db9fdd0 bsp=e00000346db91140
 [<a0000001000f51e0>] kthread+0x100/0x140
                                sp=e00000346db9fe00 bsp=e00000346db91108
 [<a0000001000154f0>] kernel_thread_helper+0xd0/0x100
                                sp=e00000346db9fe30 bsp=e00000346db910e0
 [<a00000010000a4c0>] start_kernel_thread+0x20/0x40
                                sp=e00000346db9fe30 bsp=e00000346db910e0
---[ end trace 509bba4ec4cded97 ]---
kobject_add_internal failed for 0:0 with -EEXIST, don't try to register things with the same name in the same directory.


diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 8c9b275..72bb658 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2252,11 +2252,10 @@ static void sd_probe_async(void *data, async_cookie_t cookie)
 	index = sdkp->index;
 	dev = &sdp->sdev_gendev;
 
-	if (index < SD_MAX_DISKS) {
-		gd->major = sd_major((index & 0xf0) >> 4);
-		gd->first_minor = ((index & 0xf) << 4) | (index & 0xfff00);
-		gd->minors = SD_MINORS;
-	}
+	gd->major = sd_major((index & 0xf0) >> 4);
+	gd->first_minor = ((index & 0xf) << 4) | (index & 0xfff00);
+	gd->minors = SD_MINORS;
+
 	gd->fops = &sd_fops;
 	gd->private_data = &sdkp->driver;
 	gd->queue = sdkp->device->request_queue;
@@ -2346,6 +2345,12 @@ static int sd_probe(struct device *dev)
 	if (error)
 		goto out_put;
 
+	if (index >= SD_MAX_DISKS) {
+		error = -ENODEV;
+		sdev_printk(KERN_WARNING, sdp, "SCSI disk (sd) name space exhausted.\n");
+		goto out_free_index;
+	}
+
 	error = sd_format_disk_name("sd", index, gd->disk_name, DISK_NAME_LEN);
 	if (error)
 		goto out_free_index;
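The effect of moving the check can be modelled in userspace. This is a sketch under stated assumptions, not kernel code:

- model_sd_major() copies the sd major table as we understand it. Slot 0 is major 8, slots 1-7 are 65-71, and slots 8-15 are 128-135.
- model_mkdev() packs major:minor with a 20-bit minor field.
- A disk that never had its numbers assigned is assumed to keep the zeroed major and first minor from its allocation.

Under those assumptions, every disk past the limit lands on 0:0 before the patch. That is consistent with the duplicate '/devices/virtual/bdi/0:0' in the second trace. With the patch, the probe fails with -ENODEV instead.

```c
#include <assert.h>
#include <errno.h>

#define MINORBITS    20                          /* assumed minor width */
#define SD_MAX_DISKS (((26 * 26) + 26 + 1) * 26) /* assumed, as in sd.c */

/* Assumed copy of the sd major table: 8, 65-71, 128-135. */
int model_sd_major(int slot)
{
	assert(slot >= 0 && slot < 16);          /* only 16 slots exist */
	if (slot == 0)
		return 8;
	if (slot <= 7)
		return 65 + slot - 1;
	return 128 + slot - 8;
}

/* Pack major:minor with a 20-bit minor field. */
long model_mkdev(int major, int minor)
{
	return ((long)major << MINORBITS) | minor;
}

/* Bit arithmetic copied from the sd_probe_async() lines in the diff. */
long model_devt_for_index(int index)
{
	return model_mkdev(model_sd_major((index & 0xf0) >> 4),
			   ((index & 0xf) << 4) | (index & 0xfff00));
}

/* Pre-patch: over-limit indices skip the assignment, so the
 * zero-initialised major and first_minor (0:0) are left in place. */
long model_probe_old(int index)
{
	if (index < SD_MAX_DISKS)
		return model_devt_for_index(index);
	return model_mkdev(0, 0);
}

/* Patched: the index is refused before anything is assigned. */
long model_probe_new(int index)
{
	if (index >= SD_MAX_DISKS)
		return -ENODEV;
	return model_devt_for_index(index);
}
```

All model_* names are hypothetical. Only the index arithmetic is taken from the diff. In this model, two consecutive over-limit indices produce the same value, which is the collision the patch avoids.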


* Re: [PATCH] sd name space exhaustion causes system hang
@ 2010-09-21 15:08 Hannes Reinecke
From: Hannes Reinecke @ 2010-09-21 15:08 UTC
  To: Michael Reed; +Cc: linux-scsi, Jeremy Higdon, Tony Ernst

Michael Reed wrote:
> Following a site power outage which re-enabled all the ports on my FC
> switches, my system subsequently booted with far too many luns!  I had
> let it run hoping it would make multi-user.  It didn't.  :(  It hung solid
> after exhausting the last sd device, sdzzz, and attempting to create sdaaaa
> and beyond.  I was unable to get a dump.
> 
> Discovered using a 2.6.32.13 based system.
> 
> Patch at the bottom corrects this by detecting when the last index is
> utilized and failing the sd probe of the device.  Patch applies to
> scsi-misc-2.6.
> 
Hmm. Shouldn't we use dynamic majors instead once we run past
SD_MAJORS? We do have enough space in the 'index' field, and the name
is generated dynamically anyway ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH] sd name space exhaustion causes system hang
@ 2010-09-21 15:58 Michael Reed
From: Michael Reed @ 2010-09-21 15:58 UTC
  To: Hannes Reinecke; +Cc: linux-scsi, Jeremy Higdon, Tony Ernst



On 09/21/2010 10:08 AM, Hannes Reinecke wrote:
> Michael Reed wrote:
>> Following a site power outage which re-enabled all the ports on my FC
>> switches, my system subsequently booted with far too many luns!  I had
>> let it run hoping it would make multi-user.  It didn't.  :(  It hung solid
>> after exhausting the last sd device, sdzzz, and attempting to create sdaaaa
>> and beyond.  I was unable to get a dump.
>>
>> Discovered using a 2.6.32.13 based system.
>>
>> Patch at the bottom corrects this by detecting when the last index is
>> utilized and failing the sd probe of the device.  Patch applies to
>> scsi-misc-2.6.
>>
> Hmm. Shouldn't we rather use dynamic majors once we're over
> SD_MAJORS? We do have enough space in the 'index' field and the name
> is generated dynamically anyway ...

A bit beyond the scope of what I was trying to correct, and beyond my
current knowledge of how things work.  If you'd care to guide me, or
provide a patch for testing, I can probably get my config up to
30,000 or so LUNs.

Mike

> 
> Cheers,
> 
> Hannes


* Re: [PATCH] sd name space exhaustion causes system hang
@ 2010-09-21 17:16 Michael Reed
From: Michael Reed @ 2010-09-21 17:16 UTC
  To: Hannes Reinecke; +Cc: linux-scsi, Jeremy Higdon, Tony Ernst



On 09/21/2010 10:58 AM, Michael Reed wrote:
> 
> 
> On 09/21/2010 10:08 AM, Hannes Reinecke wrote:
>> Michael Reed wrote:
>>> Following a site power outage which re-enabled all the ports on my FC
>>> switches, my system subsequently booted with far too many luns!  I had
>>> let it run hoping it would make multi-user.  It didn't.  :(  It hung solid
>>> after exhausting the last sd device, sdzzz, and attempting to create sdaaaa
>>> and beyond.  I was unable to get a dump.
>>>
>>> Discovered using a 2.6.32.13 based system.
>>>
>>> Patch at the bottom corrects this by detecting when the last index is
>>> utilized and failing the sd probe of the device.  Patch applies to
>>> scsi-misc-2.6.
>>>
>> Hmm. Shouldn't we rather use dynamic majors once we're over
>> SD_MAJORS? We do have enough space in the 'index' field and the name
>> is generated dynamically anyway ...
> 
> A bit beyond the scope of what I was trying to correct.  And beyond my
> current knowledge of how things work.  If you'd care to guide me, or
> provide me a patch for testing....  I can probably get my config up to
> 30,000 or so luns.
> 
> Mike

Have we exhausted SD_MAJORS when SD_MAX_DISKS is exhausted?  I don't
think so.

        gd->major = sd_major((index & 0xf0) >> 4);
        gd->first_minor = ((index & 0xf) << 4) | (index & 0xfff00);
        gd->minors = SD_MINORS;

There appear to be 16 majors allocated (SD_MAJORS).  SD_MINORS is also
16; the minors within each disk's range are used for its partitions,
right?  The number of sd devices that fit under one major is therefore
the maximum number of minors divided by SD_MINORS.  A minor number
appears to be 20 bits wide, which gives 1,048,576 minors per major.
Dividing by 16 minors per device leaves room for 65,536 sd devices per
major number.

As the sd name space is currently limited to 18,xxx devices, I don't
believe we've come anywhere near filling the sd major numbers that are
allocated: 16 * 65,536 = 1,048,576 possible sd devices.
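The arithmetic above can be checked mechanically. The sketch reuses the quoted bit manipulation. MINORBITS = 20 follows the 20-bit minor mentioned above, and SD_MAX_DISKS is assumed to be defined as in sd.c. The model_* helpers are hypothetical names. model_mapping_is_injective() brute-forces every 20-bit index and confirms that each index can be recovered from its (major slot, first minor) pair, so no two indices collide.

```c
#include <assert.h>
#include <stdint.h>

#define MINORBITS    20                          /* 20-bit minor, as noted above */
#define SD_MAJORS    16
#define SD_MINORS    16
#define SD_MAX_DISKS (((26 * 26) + 26 + 1) * 26) /* assumed, as in sd.c */

/* Same bit arithmetic as the quoted sd_probe_async() lines. */
unsigned int model_major_slot(uint32_t index)
{
	return (index & 0xf0) >> 4;
}

uint32_t model_first_minor(uint32_t index)
{
	assert(index < (UINT32_C(1) << MINORBITS)); /* higher bits would be masked off */
	return ((index & 0xf) << 4) | (index & 0xfff00);
}

/* Devices that fit under one major: all minors / minors per device. */
uint32_t model_disks_per_major(void)
{
	return (UINT32_C(1) << MINORBITS) / SD_MINORS;
}

uint32_t model_total_disks(void)
{
	return SD_MAJORS * model_disks_per_major();
}

/* Inverse mapping: rebuild the index from (major slot, first minor). */
uint32_t model_index_from_dev(unsigned int slot, uint32_t first_minor)
{
	return (first_minor & 0xfff00) | ((uint32_t)slot << 4) |
	       ((first_minor & 0xf0) >> 4);
}

/* Every 20-bit index round-trips, so no two disks share a major:minor. */
int model_mapping_is_injective(void)
{
	for (uint32_t i = 0; i < (UINT32_C(1) << MINORBITS); i++)
		if (model_index_from_dev(model_major_slot(i),
					 model_first_minor(i)) != i)
			return 0;
	return 1;
}
```

model_disks_per_major() returns 65536 and model_total_disks() returns 1048576, matching the figures above. The last name-space index, SD_MAX_DISKS - 1, maps to major slot 6 with first minor 0x4750, far below the 20-bit ceiling.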

If this is correct, I see no reason to convert the code to dynamic majors,
as it's really quite doubtful that all possible sd devices will ever be
created on a system.

Mike


> 
>>
>> Cheers,
>>
>> Hannes

