linux-kernel.vger.kernel.org archive mirror
* [patch for playing] Patch to support 4000 disks and maintain backward compatibility
@ 2003-04-10 20:39 Badari Pulavarty
  2003-04-10 20:54 ` Randy.Dunlap
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Badari Pulavarty @ 2003-04-10 20:39 UTC
  To: linux-kernel, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 571 bytes --]

Hi,

Here is the sd patch to support > 4000 disks on top of the 32-bit dev_t
work in the 2.5.67-mm tree.

This patch addresses the backward-compatibility issue with device
nodes: all the new disks are addressed through the last major only.

SCSI disks have 16 majors, and each major currently supports 16 disks.
This patch keeps that assumption for the first 15 majors; all the new
disks made addressable by the 32/64-bit dev_t work are added to the
last SCSI disk major. This way, you don't need to create new device
nodes in /dev if you switch between 2.4 and 2.5.

Any comments?

Thanks,
Badari



[-- Attachment #2: sd.new --]
[-- Type: text/x-diff, Size: 2514 bytes --]

--- linux-2.5.67/drivers/scsi/sd.c	Wed Apr  9 13:12:38 2003
+++ linux-2.5.67.new/drivers/scsi/sd.c	Thu Apr 10 13:23:49 2003
@@ -56,7 +56,9 @@
  * Remaining dev_t-handling stuff
  */
 #define SD_MAJORS	16
-#define SD_DISKS	(SD_MAJORS << 4)
+#define SD_DISKS	((SD_MAJORS - 1) << 4)
+#define LAST_MAJOR_DISKS	(1 << (KDEV_MINOR_BITS - 4))
+#define TOTAL_SD_DISKS	(SD_DISKS + LAST_MAJOR_DISKS)
 
 /*
  * Time out in seconds for disks and Magneto-opticals (which are slower).
@@ -85,7 +87,7 @@ struct scsi_disk {
 static LIST_HEAD(sd_devlist);
 static spinlock_t sd_devlist_lock = SPIN_LOCK_UNLOCKED;
 
-static unsigned long sd_index_bits[SD_DISKS / BITS_PER_LONG];
+static unsigned long sd_index_bits[TOTAL_SD_DISKS / BITS_PER_LONG];
 static spinlock_t sd_index_lock = SPIN_LOCK_UNLOCKED;
 
 static void sd_init_onedisk(struct scsi_disk * sdkp, struct gendisk *disk);
@@ -123,7 +125,10 @@ static int sd_major(int major_idx)
 	case 1 ... 7:
 		return SCSI_DISK1_MAJOR + major_idx - 1;
 	case 8 ... 15:
-		return SCSI_DISK8_MAJOR + major_idx;
+		return SCSI_DISK8_MAJOR + major_idx - 8;
+#define MAX_IDX	(TOTAL_SD_DISKS >> 4)
+	case 16 ... MAX_IDX:
+		return SCSI_DISK15_MAJOR;
 	default:
 		BUG();
 		return 0;	/* shut up gcc */
@@ -1313,8 +1318,8 @@ static int sd_attach(struct scsi_device 
 		goto out_free;
 
 	spin_lock(&sd_index_lock);
-	index = find_first_zero_bit(sd_index_bits, SD_DISKS);
-	if (index == SD_DISKS) {
+	index = find_first_zero_bit(sd_index_bits, TOTAL_SD_DISKS);
+	if (index == TOTAL_SD_DISKS) {
 		spin_unlock(&sd_index_lock);
 		error = -EBUSY;
 		goto out_put;
@@ -1329,15 +1334,25 @@ static int sd_attach(struct scsi_device 
 
 	gd->de = sdp->de;
 	gd->major = sd_major(index >> 4);
-	gd->first_minor = (index & 15) << 4;
+#define DISKS_PER_MINOR_MASK	((1 << (KDEV_MINOR_BITS - 4)) - 1)
+	if (index > SD_DISKS) 
+		gd->first_minor = ((index - SD_DISKS) & DISKS_PER_MINOR_MASK) << 4;
+	else
+		gd->first_minor = (index & 15) << 4;
 	gd->minors = 16;
 	gd->fops = &sd_fops;
 
-	if (index >= 26) {
+	if (index < 26) {
+		sprintf(gd->disk_name, "sd%c", 'a' + index % 26);
+	} else if (index < (26*27)) {
 		sprintf(gd->disk_name, "sd%c%c",
 			'a' + index/26-1,'a' + index % 26);
 	} else {
-		sprintf(gd->disk_name, "sd%c", 'a' + index % 26);
+		const unsigned int m1 = (index/ 26 - 1) / 26 - 1;
+		const unsigned int m2 = (index / 26 - 1) % 26;
+		const unsigned int m3 = index % 26;
+		sprintf(gd->disk_name, "sd%c%c%c", 
+			'a' + m1, 'a' + m2, 'a' + m3);
 	}
 
 	sd_init_onedisk(sdkp, gd);
 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
  2003-04-10 20:39 [patch for playing] Patch to support 4000 disks and maintain backward compatibility Badari Pulavarty
@ 2003-04-10 20:54 ` Randy.Dunlap
  2003-04-11  0:08 ` Roman Zippel
  2003-04-11  8:04 ` [patch for playing] Patch to support 4000 disks and maintain Giuliano Pochini
  2 siblings, 0 replies; 13+ messages in thread
From: Randy.Dunlap @ 2003-04-10 20:54 UTC
  To: Badari Pulavarty; +Cc: linux-kernel, linux-scsi

On Thu, 10 Apr 2003 13:39:49 -0700 Badari Pulavarty <pbadari@us.ibm.com> wrote:

| Here is the (sd) patch to support > 4000 disks on 32-bit dev_t work
| in 2.5.67-mm tree.
| 
| This patch addresses the backward compatibility with device nodes
| issue. All the new disks will be addressed by only last major.
| 
| SCSI has 16 majors. Each major supports 16 disks currently.
| This patch leaves this assumption for first 15 majors and all the
| new disks addressable by 32/64 dev_t work will be added to
| SCSI last major#. This way, we don't need to create device
| nodes in /dev, if you switch between 2.4 and 2.5.
| 
| Any comments ?


 #define SD_MAJORS	16
-#define SD_DISKS	(SD_MAJORS << 4)
+#define SD_DISKS	((SD_MAJORS - 1) << 4)
+#define LAST_MAJOR_DISKS	(1 << (KDEV_MINOR_BITS - 4))
+#define TOTAL_SD_DISKS	(SD_DISKS + LAST_MAJOR_DISKS)
 
@@ -85,7 +87,7 @@ struct scsi_disk {
 static LIST_HEAD(sd_devlist);
 static spinlock_t sd_devlist_lock = SPIN_LOCK_UNLOCKED;
 
-static unsigned long sd_index_bits[SD_DISKS / BITS_PER_LONG];
+static unsigned long sd_index_bits[TOTAL_SD_DISKS / BITS_PER_LONG];
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
~~~~~
If there's any chance that TOTAL_SD_DISKS is not a multiple of
BITS_PER_LONG, then the value above had better be

	(TOTAL_SD_DISKS + BITS_PER_LONG - 1) / BITS_PER_LONG


--
~Randy   ['tangent' is not a verb...unless you believe that
          "in English any noun can be verbed."]

* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
  2003-04-10 20:39 [patch for playing] Patch to support 4000 disks and maintain backward compatibility Badari Pulavarty
  2003-04-10 20:54 ` Randy.Dunlap
@ 2003-04-11  0:08 ` Roman Zippel
  2003-04-11  1:25   ` Badari Pulavarty
  2003-04-11  8:04 ` [patch for playing] Patch to support 4000 disks and maintain Giuliano Pochini
  2 siblings, 1 reply; 13+ messages in thread
From: Roman Zippel @ 2003-04-11  0:08 UTC
  To: Badari Pulavarty; +Cc: linux-kernel, linux-scsi

Hi,

On Thu, 10 Apr 2003, Badari Pulavarty wrote:

> This patch addresses the backward compatibility with device nodes
> issue. All the new disks will be addressed by only last major.

This nicely demonstrates that things don't exactly get nicer when one
has to deal with compatibility. It is one more reason to at least
consider a more general solution that all drivers can benefit from.

bye, Roman


* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
  2003-04-11  0:08 ` Roman Zippel
@ 2003-04-11  1:25   ` Badari Pulavarty
  2003-04-11 15:43     ` Joel Becker
  0 siblings, 1 reply; 13+ messages in thread
From: Badari Pulavarty @ 2003-04-11  1:25 UTC
  To: Roman Zippel; +Cc: linux-kernel, linux-scsi

On Thursday 10 April 2003 05:08 pm, Roman Zippel wrote:
> Hi,
>
> On Thu, 10 Apr 2003, Badari Pulavarty wrote:
> > This patch addresses the backward compatibility with device nodes
> > issue. All the new disks will be addressed by only last major.
>
> This nicely demonstrates, that it's not exactly becoming nicer, when one
> has to deal with compatibility. This is one more reason to at least
> consider a more general solution, from which all drivers can benefit from.

I am all for a more general solution (dynamic assignment), if I can get

(1) backward compatibility with device nodes
(2) device nodes that get updated automagically whenever my
<major,minor> changes (maybe due to insmod/rmmod, reboot, etc.)

I can't see (2) happening easily. I know that Greg KH is working on
udev (/dev/ memory filesystem). Once that happens, we have to change
the drivers/subsystems we need to use dynamic allocation. Is all of
this going to happen for 2.6?

That's why I am trying to come up with a half-cooked but workable
solution for 2.6.

Thanks,
Badari

* RE: [patch for playing] Patch to support 4000 disks and maintain
  2003-04-10 20:39 [patch for playing] Patch to support 4000 disks and maintain backward compatibility Badari Pulavarty
  2003-04-10 20:54 ` Randy.Dunlap
  2003-04-11  0:08 ` Roman Zippel
@ 2003-04-11  8:04 ` Giuliano Pochini
  2003-04-11 15:44   ` Joel Becker
  2 siblings, 1 reply; 13+ messages in thread
From: Giuliano Pochini @ 2003-04-11  8:04 UTC
  To: Badari Pulavarty; +Cc: linux-scsi, linux-kernel


On 10-Apr-2003 Badari Pulavarty wrote:
> Hi,
>
> Here is the (sd) patch to support > 4000 disks on 32-bit dev_t work
> in 2.5.67-mm tree.
>
> This patch addresses the backward compatibility with device nodes
> issue. All the new disks will be addressed by only last major.
>
> SCSI has 16 majors. Each major supports 16 disks currently. [...]

4000 discs should be enough for anyone :)
Are >16 partitions/disc possible?


Bye.


* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
  2003-04-11  1:25   ` Badari Pulavarty
@ 2003-04-11 15:43     ` Joel Becker
  0 siblings, 0 replies; 13+ messages in thread
From: Joel Becker @ 2003-04-11 15:43 UTC
  To: Badari Pulavarty; +Cc: Roman Zippel, linux-kernel, linux-scsi

On Thu, Apr 10, 2003 at 06:25:12PM -0700, Badari Pulavarty wrote:
> I can't see (2) happening easily. I know that Greg KH is working on
> udev (/dev/ memory filesystem). Once that happens, we have to change
> drivers/subsystems (we need) to make dynamic allocation. All of this Is 
> going to happen for 2.6 ?

	Just to be clear, let's not repeat the devfsd fiasco.  If we
have dynamic update of device nodes (we need it), I want it to work with
my /dev on ext3.

Joel

-- 

"You look in her eyes, the music begins to play.
 Hopeless romantics, here we go again."

Joel Becker
Senior Member of Technical Staff
Oracle Corporation
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

* Re: [patch for playing] Patch to support 4000 disks and maintain
  2003-04-11  8:04 ` [patch for playing] Patch to support 4000 disks and maintain Giuliano Pochini
@ 2003-04-11 15:44   ` Joel Becker
  2003-04-11 16:28     ` Badari Pulavarty
  0 siblings, 1 reply; 13+ messages in thread
From: Joel Becker @ 2003-04-11 15:44 UTC
  To: Giuliano Pochini; +Cc: Badari Pulavarty, linux-scsi, linux-kernel

On Fri, Apr 11, 2003 at 10:04:30AM +0200, Giuliano Pochini wrote:
> 4000 discs should be enough for anyone :)
> Are >16 partitions/disc possible ?

	>16 partitions/disc is possible once you remove Linux's
arbitrary limit.
	4000 disks is today.  8000 disks is next year at the latest.
Just imagine multipathing today's 4000 disks.

Joel

-- 

"Well-timed silence hath more eloquence than speech."  
         - Martin Fraquhar Tupper

* Re: [patch for playing] Patch to support 4000 disks and maintain
  2003-04-11 15:44   ` Joel Becker
@ 2003-04-11 16:28     ` Badari Pulavarty
  2003-04-11 17:57       ` Joel Becker
  0 siblings, 1 reply; 13+ messages in thread
From: Badari Pulavarty @ 2003-04-11 16:28 UTC
  To: Joel Becker, Giuliano Pochini; +Cc: linux-scsi, linux-kernel

On Friday 11 April 2003 08:44 am, Joel Becker wrote:
> On Fri, Apr 11, 2003 at 10:04:30AM +0200, Giuliano Pochini wrote:
> > 4000 discs should be enough for anyone :)
> > Are >16 partitions/disc possible ?
>
> 	>16 partitions/disc is possible once you remove Linux's
> arbitrary limit.
> 	4000 disks is today.  8000 disks is next year at the latest.

Well! My patch does not hardcode anything to 4000; the limit depends
on how many minor bits there are. But we have to put a hard limit
somewhere...

> Just imagine multipathing today's 4000 disks.

Fortunately, the multipath solution Mike Anderson & Patrick Mansfield
are working on collapses all the disks you see through multiple paths
into the number of real disks (4000). So you don't really need extra
devices to support multipathing.

Thanks,
Badari


* Re: [patch for playing] Patch to support 4000 disks and maintain
  2003-04-11 16:28     ` Badari Pulavarty
@ 2003-04-11 17:57       ` Joel Becker
  2003-04-11 18:12         ` Patrick Mansfield
  0 siblings, 1 reply; 13+ messages in thread
From: Joel Becker @ 2003-04-11 17:57 UTC
  To: Badari Pulavarty; +Cc: Giuliano Pochini, linux-scsi, linux-kernel

On Fri, Apr 11, 2003 at 08:28:32AM -0800, Badari Pulavarty wrote:
> Well !! My patch does not have anything hardcoded to 4000.
> Depends on how many minor bits. But we have to put a hardlimit
> somewhere ..

	Yes, but that hardlimit has to expect the number to be Very
Large in the very near future.  16K disks on a big compute farm is not
too far off.

> Fortunately, the multipath solution Mike Anderson & Patrick Mansfield
> working on, colapses all the disks you see thro multiple paths into 
> number of  realdisks (4000). So you don't really need extra devices 
> to support multipathing.

	Yes, but what if I want to see the multiple paths?  Does their
solution allow you to specify the path behind the 'realdisk'?  Does it
allow querying of the paths?

Joel

-- 

Life's Little Instruction Book #451

	"Don't be afraid to say, 'I'm sorry.'"

* Re: [patch for playing] Patch to support 4000 disks and maintain
  2003-04-11 17:57       ` Joel Becker
@ 2003-04-11 18:12         ` Patrick Mansfield
  2003-04-11 18:35           ` Joel Becker
  0 siblings, 1 reply; 13+ messages in thread
From: Patrick Mansfield @ 2003-04-11 18:12 UTC
  To: Joel Becker
  Cc: Badari Pulavarty, Giuliano Pochini, linux-scsi, linux-kernel,
	Mike Anderson

On Fri, Apr 11, 2003 at 10:57:37AM -0700, Joel Becker wrote:

> > Fortunately, the multipath solution Mike Anderson & Patrick Mansfield
> > working on, colapses all the disks you see thro multiple paths into 
> > number of  realdisks (4000). So you don't really need extra devices 
> > to support multipathing.
> 
> 	Yes, but what if I want to see the multiple paths?  Does their
> solution allow you to specify the path behind the 'realdisk'?  Does it
> allow querying of the paths?
> 
> Joel

Seeing the path to disk relationship - yes (via proc for now, eventually
via sysfs).

Specifying from user space what path goes with what disk (explicitly): no,
that might require user level scanning.

Query a path - seeing and setting path states yes (via proc right now).
Using sg with a specified path - no, but on the TODO list.

I'm trying to pull the current multi-path patch up to 2.5.66 (ouch). 

The last functioning patch was against 2.5.59, some general info:

http://www-124.ibm.com/storageio/multipath/scsi-multipath/index.php

Or the patch:

http://www-124.ibm.com/storageio/multipath/scsi-multipath/releases/2.5.59-mpath-1.patch.gz

-- Patrick Mansfield

* Re: [patch for playing] Patch to support 4000 disks and maintain
  2003-04-11 18:12         ` Patrick Mansfield
@ 2003-04-11 18:35           ` Joel Becker
  2003-04-11 20:04             ` Patrick Mansfield
  0 siblings, 1 reply; 13+ messages in thread
From: Joel Becker @ 2003-04-11 18:35 UTC
  To: Patrick Mansfield
  Cc: Badari Pulavarty, Giuliano Pochini, linux-scsi, linux-kernel,
	Mike Anderson

On Fri, Apr 11, 2003 at 11:12:32AM -0700, Patrick Mansfield wrote:
> I'm trying to pull the current multi-path patch up to 2.5.66 (ouch). 

	I wasn't aware of this work.  This is very interesting.  Two
questions:

1) When does it fail over?  That is, if I am doing I/O to a disk and
someone yanks the Fibre Channel plug, does your multipath wait for a
SCSI timeout to redirect the I/O?

2) If so, have you considered trapping loop up/down events to handle
such a case?  Real users of multipath tech do not want to wait 90s for
failover.

Joel

-- 

"Win95 file and print sharing are for relatively friendly nets."
        - Paul Leach, Microsoft

* Re: [patch for playing] Patch to support 4000 disks and maintain
  2003-04-11 18:35           ` Joel Becker
@ 2003-04-11 20:04             ` Patrick Mansfield
  2003-04-11 23:18               ` Joel Becker
  0 siblings, 1 reply; 13+ messages in thread
From: Patrick Mansfield @ 2003-04-11 20:04 UTC
  To: Joel Becker
  Cc: Badari Pulavarty, Giuliano Pochini, linux-scsi, linux-kernel,
	Mike Anderson

On Fri, Apr 11, 2003 at 11:35:43AM -0700, Joel Becker wrote:
> On Fri, Apr 11, 2003 at 11:12:32AM -0700, Patrick Mansfield wrote:
> > I'm trying to pull the current multi-path patch up to 2.5.66 (ouch). 
> 
> 	I wasn't aware of this work.  This is very interesting.  Two
> questions:
> 
> 1) When does it failover?  Meaning, if I I/O to a disk, but someone
> yanks the fibrechannel plug.  Does your multipath wait for a SCSI
> timeout to redirect the I/O?

> 2) If so, have you considered trapping loop up/down events to handle
> such a case?  Real users of multipath tech do not want to wait 90s for
> failover.
> 
> Joel

Generally it fails a path when we get a path-specific error; it fails the
IO if we get a device (i.e. logical unit) error. If no paths are
available, the IO is failed (though this could be made
user-settable).

Behaviour on a cable removal is fibre-, adapter- and adapter-driver-specific:
the qla driver has some sort of timeout on a port down that can be
lowered. Fibre Channel can immediately complete (with failure) an
outstanding IO on a port down (or SCN notification), but loop-attached
setups do not always give you notification.

So with loop attached (AFAIK) you still might have to wait for a timeout
if you yank a disk.

Timeouts are the hardest to deal with: since we don't know where the
error occurred, we generally should not fail the IO (or the path).

If we had user scanning, and some sort of hotplug for targets coming and
going, those could be used to add and remove (or just fail) paths (at
least for switch-attached setups).

-- Patrick Mansfield

* Re: [patch for playing] Patch to support 4000 disks and maintain
  2003-04-11 20:04             ` Patrick Mansfield
@ 2003-04-11 23:18               ` Joel Becker
  0 siblings, 0 replies; 13+ messages in thread
From: Joel Becker @ 2003-04-11 23:18 UTC
  To: Patrick Mansfield
  Cc: Badari Pulavarty, Giuliano Pochini, linux-scsi, linux-kernel,
	Mike Anderson

On Fri, Apr 11, 2003 at 01:04:07PM -0700, Patrick Mansfield wrote:
> If we had user scanning, and some sort of hotplug for targets coming and
> going, those be used to add and remove (or just fail) paths (at least for
> switch attached).

	That's the issue.  We need notification of add and remove, so we
don't find our multipathing hanging for 90s or more.  That's a big deal
to people, and it's a problem with the current md multipath.

Joel

-- 

 "I'm living so far beyond my income that we may almost be said
 to be living apart."
         - e e cummings

end of thread

Thread overview: 13+ messages
2003-04-10 20:39 [patch for playing] Patch to support 4000 disks and maintain backward compatibility Badari Pulavarty
2003-04-10 20:54 ` Randy.Dunlap
2003-04-11  0:08 ` Roman Zippel
2003-04-11  1:25   ` Badari Pulavarty
2003-04-11 15:43     ` Joel Becker
2003-04-11  8:04 ` [patch for playing] Patch to support 4000 disks and maintain Giuliano Pochini
2003-04-11 15:44   ` Joel Becker
2003-04-11 16:28     ` Badari Pulavarty
2003-04-11 17:57       ` Joel Becker
2003-04-11 18:12         ` Patrick Mansfield
2003-04-11 18:35           ` Joel Becker
2003-04-11 20:04             ` Patrick Mansfield
2003-04-11 23:18               ` Joel Becker
