linux-kernel.vger.kernel.org archive mirror
* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
@ 2003-04-10 23:33 Andries.Brouwer
  2003-04-10 23:37 ` Badari Pulavarty
  0 siblings, 1 reply; 29+ messages in thread
From: Andries.Brouwer @ 2003-04-10 23:33 UTC (permalink / raw)
  To: Andries.Brouwer, linux-kernel, linux-scsi, pbadari

    From: Badari Pulavarty <pbadari@us.ibm.com>

    > Then we don't know which disks have disappeared. Pity.
    > If the number space is infinite then
    >    index = next_index++;
    > gives a new number each time we need one.

    Yes!! I agree. I am not worried about running out of them.
    I am more worried about names slipping. I at least hope
    to see device names not changing by just doing
    rmmod/insmod.

But you see, the present sd_index_bits[] gives no such
guarantee. In sd_detach a bit is cleared, in sd_attach
the first free bit is given out. There is no memory.
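
To make the difference concrete, here is a minimal user-space sketch of
the two schemes. It is not the actual sd.c code: bitmap_attach,
bitmap_detach, counter_attach and the 64-disk limit are made up for
illustration; only sd_index_bits[] and next_index come from the
discussion above.

```c
#include <assert.h>
#include <limits.h>

#define SD_MAX_DISKS 64   /* hypothetical limit, for illustration only */
#define BITS_PER_WORD ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Bitmap scheme: one bit per index in use, in the spirit of sd_index_bits[]. */
static unsigned long index_bits[(SD_MAX_DISKS + BITS_PER_WORD - 1) / BITS_PER_WORD];

/* Attach: hand out the lowest free index, or -1 if none is left. */
int bitmap_attach(void)
{
    for (int i = 0; i < SD_MAX_DISKS; i++) {
        unsigned long mask = 1UL << (i % BITS_PER_WORD);

        if (!(index_bits[i / BITS_PER_WORD] & mask)) {
            index_bits[i / BITS_PER_WORD] |= mask;
            return i;
        }
    }
    return -1;
}

/* Detach: clear the bit. Nothing records which disk used to own it. */
void bitmap_detach(int index)
{
    assert(index >= 0 && index < SD_MAX_DISKS);
    index_bits[index / BITS_PER_WORD] &= ~(1UL << (index % BITS_PER_WORD));
}

/* Counter scheme: index = next_index++; numbers are never reused. */
static int next_index;

int counter_attach(void)
{
    return next_index++;
}
```

With the bitmap, detaching index 1 and attaching any disk afterwards
hands index 1 out again, whichever disk that happens to be. With the
counter, every attach gets a number that has never been used before.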

Andries

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
  2003-04-10 23:33 [patch for playing] Patch to support 4000 disks and maintain backward compatibility Andries.Brouwer
@ 2003-04-10 23:37 ` Badari Pulavarty
  0 siblings, 0 replies; 29+ messages in thread
From: Badari Pulavarty @ 2003-04-10 23:37 UTC (permalink / raw)
  To: Andries.Brouwer, linux-kernel, linux-scsi

On Thursday 10 April 2003 04:33 pm, Andries.Brouwer@cwi.nl wrote:
>     From: Badari Pulavarty <pbadari@us.ibm.com>
>
>     > Then we don't know which disks have disappeared. Pity.
>     > If the number space is infinite then
>     >    index = next_index++;
>     > gives a new number each time we need one.
>
>     Yes!! I agree. I am not worried about running out of them.
>     I am more worried about names slipping. I at least hope
>     to see device names not changing by just doing
>     rmmod/insmod.
>
> But you see, the present sd_index_bits[] gives no such
> guarantee. In sd_detach a bit is cleared, in sd_attach
> the first free bit is given out. There is no memory.

But the disks are probed in the same manner as last time
(if the disks/controllers are not moved, crashed, etc.).
So we will end up getting the same names.

Of course, we need a device naming solution to fix
this for real.

Thanks,
Badari

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
@ 2003-04-13 13:59 Paul McKenney
  0 siblings, 0 replies; 29+ messages in thread
From: Paul McKenney @ 2003-04-13 13:59 UTC (permalink / raw)
  To: James Bottomley
  Cc: Andries.Brouwer, Linux Kernel, linux-kernel-owner,
	SCSI Mailing List, pbadari





> On Fri, 2003-04-11 at 20:13, Paul McKenney wrote:
> > Some compatibility needs more code than other compatibility.
> > The desired compatibility includes the following, much of which
> > has been noted earlier in this thread, and some of which may
> > need to wait for multipath I/O and other of which might be best
> > provided by a volume manager:
>
> We're talking about two types of compatibility.  The first (which
> everyone agrees on) is that /dev names don't change between 2.4 and
> 2.5.  The second is direct numerical compatibility so that a /dev still
> using the old 8:8 scheme works.
>
> What you're asking for is more a wish list of enhancements:

True, and the more disks one has, the more one would be wishing
for them.

> > o     It must be possible to switch between 2.4 and 2.5/6
> >       kernels without a given disk's name changing.
>
> By and large, this is true.  There will be problems where probe order
> has altered because of changes to bus enumeration schemes or for other
> reasons.

Exactly.

> > o     New 2.5/6 installations should see a clean disk naming
> >       scheme without historical cruft.
>
> They'll all see /dev/sd<A>[n] as in 2.4
>
> > o     Removing or adding one disk should not affect the
> >       names of other disks.  Ideally, moving a given disk
> >       from one place to another should not change its
> >       name.  "The good news is that we repaired your
> >       disk.  The bad news is that, due to the resulting
> >       name changes, your application thoroughly
> >       corrupted all of its data."
>
> True until a reboot, as in 2.4

Again, exactly.  The current state requires the sysadm to
stop the application from starting automatically on reboot so
that the application's pathnames can be changed -- after
the sysadm works out what the names have changed to.  Ouch.

This does not necessarily need to be done in the kernel;
one could use a volume manager as noted above, or udev
with appropriate plugins, as you noted below.

> > o     Adding or removing a FC or SCSI adapter should not
> >       affect the names of disks hanging off of other
> >       FC or SCSI controllers.  Ideally, the name of
> >       a disk should not change when its FC or SCSI
> >       controller is moved from one slot to another.
>
> That's not a 2.4 guarantee, it won't be a 2.5 one.

Again, one really does not want to have to figure out
which disk changed to what name, then hand-edit all
mount commands, application config files, etc. to change
all the disk and partition names.

And again, something like udev or a volume manager might
be appropriate if this functionality is not to be provided
by the base kernel itself.

> > o     Failures of or repairs to the FC fabric should
> >       not change the names of any of the disks (though
> >       a sufficiently thorough failure might make some
> >       of the disks unreachable).
>
> True until reboot, as in 2.4

See above...

> > o     Cluster nodes should ideally have the same name
> >       for a given disk.  Extra credit, though greatly
> >       appreciated by anyone who has ever had to deal
> >       with a cluster where different nodes have different
> >       names for the same disk.  ;-)
>
> That has never been true in Linux, and not in most commercial unixes,
> either.  Cluster tools are used to do this mapping and most commercially
> available linux cluster tools solve this problem, so there's no need for
> the kernel to do it.
>
> A lot of these enhancements will be covered (or at least a solution will
> be facilitated) by udev.  See
>
> http://marc.theaimsgroup.com/?t=105003184600001&r=1&w=2
>
> (unfortunately, the announcement thread has grown rather big).

Yes, given a plugin that used SCSI's UUID, this could work quite well.

                                    Thanx, Paul


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
  2003-04-12  1:13 Paul McKenney
@ 2003-04-12 14:14 ` James Bottomley
  0 siblings, 0 replies; 29+ messages in thread
From: James Bottomley @ 2003-04-12 14:14 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Andries.Brouwer, Linux Kernel, linux-kernel-owner,
	SCSI Mailing List, pbadari

On Fri, 2003-04-11 at 20:13, Paul McKenney wrote:
> Some compatibility needs more code than other compatibility.
> The desired compatibility includes the following, much of which
> has been noted earlier in this thread, and some of which may
> need to wait for multipath I/O and other of which might be best
> provided by a volume manager:

We're talking about two types of compatibility.  The first (which
everyone agrees on) is that /dev names don't change between 2.4 and
2.5.  The second is direct numerical compatibility so that a /dev still
using the old 8:8 scheme works.
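
For reference, the 8:8 scheme packs an 8-bit major and an 8-bit minor
into a 16-bit device number. A rough sketch, with old_dev_t, old_mkdev,
old_major and old_minor as illustrative names rather than kernel
interfaces:

```c
#include <assert.h>

/* Old-style device number: 8 bits of major, 8 bits of minor. */
typedef unsigned short old_dev_t;   /* hypothetical name for this sketch */

/* Pack a (major, minor) pair into one 16-bit value. */
old_dev_t old_mkdev(unsigned int major, unsigned int minor)
{
    assert(major < 256 && minor < 256);
    return (old_dev_t)((major << 8) | minor);
}

/* Unpack the two halves again. */
unsigned int old_major(old_dev_t dev) { return dev >> 8; }
unsigned int old_minor(old_dev_t dev) { return dev & 0xff; }
```

A device node created with major 8 and minor 17 carries the value
0x0811; numeric compatibility means that an existing node with that
number keeps opening the same disk under the new kernel.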

What you're asking for is more a wish list of enhancements:

> o     It must be possible to switch between 2.4 and 2.5/6
>       kernels without a given disk's name changing.

By and large, this is true.  There will be problems where probe order
has altered because of changes to bus enumeration schemes or for other
reasons.

> o     New 2.5/6 installations should see a clean disk naming
>       scheme without historical cruft.

They'll all see /dev/sd<A>[n] as in 2.4
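
The sd<A>[n] pattern means one or more letters for the disk, then an
optional partition number. A sketch of how a disk index could be turned
into the letter part, assuming the usual sda ... sdz, sdaa ... sequence;
sd_index_to_name is a hypothetical helper, not the function sd.c uses:

```c
#include <assert.h>
#include <stddef.h>

/* Write "sda", "sdb", ..., "sdz", "sdaa", ... for index 0, 1, ... into buf. */
void sd_index_to_name(unsigned int index, char *buf, size_t len)
{
    char letters[8];   /* a 32-bit index needs at most 7 letters */
    int n = 0;

    /* Bijective base 26: there is no zero digit, so 25 -> "z", 26 -> "aa". */
    for (;;) {
        letters[n++] = (char)('a' + index % 26);
        index /= 26;
        if (index == 0)
            break;
        index--;
    }

    assert(len >= (size_t)n + 3);   /* "sd" + letters + NUL */
    buf[0] = 's';
    buf[1] = 'd';
    /* The letters were produced least significant first; reverse them. */
    for (int i = 0; i < n; i++)
        buf[2 + i] = letters[n - 1 - i];
    buf[2 + n] = '\0';
}
```

Index 0 gives sda, 25 gives sdz, 26 gives sdaa, and 702 is the first
three-letter name, sdaaa, so 4000 disks fit in three letters.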

> o     Removing or adding one disk should not affect the
>       names of other disks.  Ideally, moving a given disk
>       from one place to another should not change its
>       name.  "The good news is that we repaired your
>       disk.  The bad news is that, due to the resulting
>       name changes, your application thoroughly
>       corrupted all of its data."

True until a reboot, as in 2.4

> o     Adding or removing a FC or SCSI adapter should not
>       affect the names of disks hanging off of other
>       FC or SCSI controllers.  Ideally, the name of
>       a disk should not change when its FC or SCSI
>       controller is moved from one slot to another.

That's not a 2.4 guarantee, it won't be a 2.5 one.

> o     Failures of or repairs to the FC fabric should
>       not change the names of any of the disks (though
>       a sufficiently thorough failure might make some
>       of the disks unreachable).

True until reboot, as in 2.4

> o     Cluster nodes should ideally have the same name
>       for a given disk.  Extra credit, though greatly
>       appreciated by anyone who has ever had to deal
>       with a cluster where different nodes have different
>       names for the same disk.  ;-)

That has never been true in Linux, and not in most commercial unixes,
either.  Cluster tools are used to do this mapping and most commercially
available linux cluster tools solve this problem, so there's no need for
the kernel to do it.

A lot of these enhancements will be covered (or at least a solution will
be facilitated) by udev.  See

http://marc.theaimsgroup.com/?t=105003184600001&r=1&w=2

(unfortunately, the announcement thread has grown rather big).

James



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
@ 2003-04-12  1:13 Paul McKenney
  2003-04-12 14:14 ` James Bottomley
  0 siblings, 1 reply; 29+ messages in thread
From: Paul McKenney @ 2003-04-12  1:13 UTC (permalink / raw)
  To: Andries.Brouwer
  Cc: Andries.Brouwer, James.Bottomley, linux-kernel,
	linux-kernel-owner, linux-scsi, pbadari





> > It would also be nice for numeric compatibility to be a compile time option
>
> It sounds as if you expect a lot of old cruft.
> But the compatibility code is just ten lines or so.
> Internally the kernel has an index. Externally there is a dev_t.
> At open() time the dev_t is converted. At registration time
> sd announces interest in three or four dev_t regions.
> That is all.

Some compatibility needs more code than other compatibility.
The desired compatibility includes the following, much of which
has been noted earlier in this thread, and some of which may
need to wait for multipath I/O and other of which might be best
provided by a volume manager:

o     It must be possible to switch between 2.4 and 2.5/6
      kernels without a given disk's name changing.

o     New 2.5/6 installations should see a clean disk naming
      scheme without historical cruft.

o     Removing or adding one disk should not affect the
      names of other disks.  Ideally, moving a given disk
      from one place to another should not change its
      name.  "The good news is that we repaired your
      disk.  The bad news is that, due to the resulting
      name changes, your application thoroughly
      corrupted all of its data."

o     Adding or removing a FC or SCSI adapter should not
      affect the names of disks hanging off of other
      FC or SCSI controllers.  Ideally, the name of
      a disk should not change when its FC or SCSI
      controller is moved from one slot to another.

o     Failures of or repairs to the FC fabric should
      not change the names of any of the disks (though
      a sufficiently thorough failure might make some
      of the disks unreachable).

o     Cluster nodes should ideally have the same name
      for a given disk.  Extra credit, though greatly
      appreciated by anyone who has ever had to deal
      with a cluster where different nodes have different
      names for the same disk.  ;-)

                              Thanx, Paul


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
  2003-04-11 20:14 ` James Bottomley
@ 2003-04-11 23:21   ` Joel Becker
  0 siblings, 0 replies; 29+ messages in thread
From: Joel Becker @ 2003-04-11 23:21 UTC (permalink / raw)
  To: James Bottomley; +Cc: Andries.Brouwer, Linux Kernel, SCSI Mailing List, pbadari

On Fri, Apr 11, 2003 at 03:14:32PM -0500, James Bottomley wrote:
> > Linux does not arbitrarily break old systems. The aim must be
> > to have all combinations of (old/new) kernel with (old/new) glibc
> > to work well in all situations where old kernel + old glibc worked.

	100%

> Well, if you're going to do this, at least make it possible to tie all
> the sd devices to a single major (i.e. the numeric compatibility layer
> simply maps to the new single major scheme internally).  It would also
> be nice for numeric compatibility to be a compile time option too...

	The real issue is that almost all consumers of the new kernel
will have a /dev populated with old numbers.  Only new installs
(completely fresh) won't be burdened.  And new installs won't be the
majority of 2.6 users for quite some time after 2.6.0.

Joel

-- 

"Copy from one, it's plagiarism; copy from two, it's research."
        - Wilson Mizner

Joel Becker
Senior Member of Technical Staff
Oracle Corporation
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
@ 2003-04-11 21:13 Andries.Brouwer
  0 siblings, 0 replies; 29+ messages in thread
From: Andries.Brouwer @ 2003-04-11 21:13 UTC (permalink / raw)
  To: Andries.Brouwer, James.Bottomley; +Cc: linux-kernel, linux-scsi, pbadari

> It would also be nice for numeric compatibility to be a compile time option

It sounds as if you expect a lot of old cruft.
But the compatibility code is just ten lines or so.
Internally the kernel has an index. Externally there is a dev_t.
At open() time the dev_t is converted. At registration time
sd announces interest in three or four dev_t regions.
That is all.
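
A rough sketch of what such a conversion could look like, assuming the
sd majors from devices.txt (8, 65-71 and 128-135) and 16 minors per
disk. The function names and the layout of the internal index are
assumptions made for this example, not the code from the patch:

```c
#include <assert.h>

#define SD_MINORS_PER_DISK 16   /* whole disk + 15 partitions */
#define SD_DISKS_PER_MAJOR 16   /* 256 minors / 16 minors per disk */

/*
 * Map a legacy (major, minor) pair to an internal disk index.
 * Returns -1 if the major is not one of the old sd majors.
 */
int sd_legacy_to_index(unsigned int major, unsigned int minor)
{
    unsigned int major_slot;

    assert(minor < 256);
    if (major == 8)
        major_slot = 0;                     /* first region: disks 0-15 */
    else if (major >= 65 && major <= 71)
        major_slot = 1 + (major - 65);      /* second region: disks 16-127 */
    else if (major >= 128 && major <= 135)
        major_slot = 8 + (major - 128);     /* third region: disks 128-255 */
    else
        return -1;

    return (int)(major_slot * SD_DISKS_PER_MAJOR + minor / SD_MINORS_PER_DISK);
}

/* Partition number within the disk; 0 means the whole disk. */
unsigned int sd_legacy_to_partition(unsigned int minor)
{
    return minor % SD_MINORS_PER_DISK;
}
```

Under these assumptions (8, 17) is partition 1 of disk 1, and
(65, 0) is disk 16. Everything past disk 255 would live only in the
new number space.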

Andries

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
  2003-04-11 19:45 Andries.Brouwer
@ 2003-04-11 20:14 ` James Bottomley
  2003-04-11 23:21   ` Joel Becker
  0 siblings, 1 reply; 29+ messages in thread
From: James Bottomley @ 2003-04-11 20:14 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: Linux Kernel, SCSI Mailing List, pbadari

On Fri, 2003-04-11 at 14:45, Andries.Brouwer@cwi.nl wrote:
> I think compatibility is very important.
> Linux does not arbitrarily break old systems. The aim must be
> to have all combinations of (old/new) kernel with (old/new) glibc
> to work well in all situations where old kernel + old glibc worked.

Well, if you're going to do this, at least make it possible to tie all
the sd devices to a single major (i.e. the numeric compatibility layer
simply maps to the new single major scheme internally).  It would also
be nice for numeric compatibility to be a compile time option too...

It's also possible that SCSI may not be the only consumer of such a
compatibility layer (IDE also has multiple majors), so it may be
worthwhile putting it somewhere more globally useful (like
fs/block_dev.c)

James



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
@ 2003-04-11 19:45 Andries.Brouwer
  2003-04-11 20:14 ` James Bottomley
  0 siblings, 1 reply; 29+ messages in thread
From: Andries.Brouwer @ 2003-04-11 19:45 UTC (permalink / raw)
  To: Andries.Brouwer, James.Bottomley; +Cc: linux-kernel, linux-scsi, pbadari

    From James.Bottomley@SteelEye.com  Fri Apr 11 21:12:28 2003

    > It is me who wants compatibility as far as 8+8 device numbers are
    > concerned, while I can see lots of ways to use new number space.

    This, I'm not too sure about.  I see the value to kernel developers who
    boot between different versions of the kernel, but I think when 2.6 goes
    live and ships to end users, it's better not to have such numeric
    equivalency crufting up the SCSI interfaces.

I think compatibility is very important.
Linux does not arbitrarily break old systems. The aim must be
to have all combinations of (old/new) kernel with (old/new) glibc
to work well in all situations where old kernel + old glibc worked.

Andries



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
  2003-04-11 18:07 Andries.Brouwer
@ 2003-04-11 19:12 ` James Bottomley
  0 siblings, 0 replies; 29+ messages in thread
From: James Bottomley @ 2003-04-11 19:12 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: Linux Kernel, SCSI Mailing List, pbadari

On Fri, 2003-04-11 at 13:07, Andries.Brouwer@cwi.nl wrote:
> It is just that Badari and I were talking about the numbering scheme
> index = next_index++ and he pointed out that the current system
> has a certain weak number preservation guarantee that this
> index = next_index++ does not have. True.

Yes. I was just pointing out this was a byproduct of our compaction
requirement in 8:8, not necessarily a guarantee I think needs
preserving.

> It is me who wants compatibility as far as 8+8 device numbers are
> concerned, while I can see lots of ways to use new number space.

This, I'm not too sure about.  I see the value to kernel developers who
boot between different versions of the kernel, but I think when 2.6 goes
live and ships to end users, it's better not to have such numeric
equivalency crufting up the SCSI interfaces.

James



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
@ 2003-04-11 18:07 Andries.Brouwer
  2003-04-11 19:12 ` James Bottomley
  0 siblings, 1 reply; 29+ messages in thread
From: Andries.Brouwer @ 2003-04-11 18:07 UTC (permalink / raw)
  To: Andries.Brouwer, James.Bottomley; +Cc: linux-kernel, linux-scsi, pbadari

    From James.Bottomley@SteelEye.com  Fri Apr 11 16:33:37 2003

    On Fri, 2003-04-11 at 06:42, Andries.Brouwer@cwi.nl wrote:
    >     Here is my problem..
    > 
    > OK, I see what you mean. I agree.

    Could you elaborate on the reason you want to keep the minor space
    compact?

That is not necessarily what I want.
Indeed, one possible use I see for a large dev_t
is a hash of a proper name.

It is just that Badari and I were talking about the numbering scheme
index = next_index++ and he pointed out that the current system
has a certain weak number preservation guarantee that this
index = next_index++ does not have. True.

It is Roman who wanted to keep the number space compact.

It is me who wants compatibility as far as 8+8 device numbers are
concerned, while I can see lots of ways to use new number space.

(You need not worry that I worry about preservation of numbering
after rmmod. I am not interested. But in case anybody is, there
is a numbering solution that achieves that, too.)

The whole conversation came because there is an array in sd.c
that must go, or must be limited to the size needed for compatibility.

Andries

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
  2003-04-11 14:33 ` James Bottomley
@ 2003-04-11 16:21   ` Badari Pulavarty
  0 siblings, 0 replies; 29+ messages in thread
From: Badari Pulavarty @ 2003-04-11 16:21 UTC (permalink / raw)
  To: James Bottomley, Andries.Brouwer; +Cc: Linux Kernel, SCSI Mailing List

On Friday 11 April 2003 07:33 am, James Bottomley wrote:
> On Fri, 2003-04-11 at 06:42, Andries.Brouwer@cwi.nl wrote:
> >     Here is my problem..
> >
> >     #insmod ips.o
> >       < found 10 disks>
> >     #insmod qla2300.o
> >       < found 10 disks>
> >     #rmmod ips.o
> >        <removed 10 disks>
> >     #insmod ips.o
> >       <found 10 disks - but new names>
> >
> > OK, I see what you mean. I agree.
>
> Could you elaborate on the reason you want to keep the minor space
> compact?  I don't regard the insmod/rmmod problem as valid because if
> you do:
>
> rmmod ips.o
> rmmod qla2300.o
> insmod qla2300.o
> insmod ips.o
>
> All bets are off again. For small kernel dev_t it was essential to keep
> a compact minor space because otherwise we could run out of minors.
> Sparse minors cause no inefficiency in the mid-layer, or in sd.  There
> are problems in sg which could be solved by encoding the device type in
> the minor.

Here the user/admin at least knows what he is doing, so they have to deal
with it. (A proper device naming solution would be great here.)

But if my device names change just by doing rmmod/insmod, it will
be a pain. For example, in my case, I have to redo all my raw device
bindings just to start the database. This will be a problem with
dynamic <major, minor> assignments also. (Again, I will need a proper
device naming solution here.)

Thanks,
Badari


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
  2003-04-11 10:09   ` Douglas Gilbert
@ 2003-04-11 16:12     ` Badari Pulavarty
  0 siblings, 0 replies; 29+ messages in thread
From: Badari Pulavarty @ 2003-04-11 16:12 UTC (permalink / raw)
  To: dougg; +Cc: Andries.Brouwer, linux-kernel, linux-scsi

On Friday 11 April 2003 03:09 am, Douglas Gilbert wrote:
> Badari Pulavarty wrote:
> > Here is my problem..
> >
> > #insmod ips.o
> >   < found 10 disks>
> > #insmod qla2300.o
> >   < found 10 disks>
> > #rmmod ips.o
> >    <removed 10 disks>
> > #insmod ips.o
> >   <found 10 disks - but new names>
>
> Badari,
> In 2.5 let's assume the /dev/sd[a-z][a-z][a-z]
> device addressing is left as is (more or less). To
> identify lots of disks the Vital Product Data page 0x83
> (failing that, the disk serial number) should be used.
>
> This information is available via sysfs (thanks to
> Patrick Mansfield and Mike Anderson).
>
> # cd /sys/bus/scsi/devices
> # find . -follow -name 'name' -exec cat {} \; -print
> SIBM     DNES-309170W            AJF98887
> ./1:0:4:0/name
> SFUJITSU MAM3184MP       UKS0P2300CK0
> ./0:0:1:0/name
>
> It is relatively easy to write user space tools to show
> this information:
> # lsscsi -n
> [0:0:1:0]    disk    FUJITSU  MAM3184MP        0106  /dev/sda
>    name: SFUJITSU MAM3184MP       UKS0P2300CK0
> [1:0:4:0]    disk    IBM      DNES-309170W     SA30  /dev/sdb
>    name: SIBM     DNES-309170W            AJF98887
>
> Each pair of lines links the transient topological and device
> node name ("0:0:1:0" and "/dev/sda" respectively) with a
> (hopefully) invariant "name" for that device.
>
> So if that name was hashed there would be a reasonable mapping
> from that name to the current Linux scsi disk device node name
> (e.g. /dev/sda). So user space tools could work out the mapping
> and provide the "memory" from one boot to the next (and across
> the deletion and re-addition of HBA modules).
>
> Doug Gilbert

Doug,

I completely agree with what you said. One can write a user-space
tool to create/re-create/update device nodes and try to keep device
names consistent. I am sure people (Greg KH) are working on this.

All I am trying to do is come up with a plan to do this for 2.6.
We can do all the user-space stuff, make generic dynamic
<major, minor> assignment, and somehow make the /dev/ nodes
update magically every time we rmmod/insmod or reboot. But there are
lots of dependencies and players here. Is all of this going to happen
for 2.6? I would love to see this happen. But please, don't leave it
halfway (leaving the device node mapping to the user/admin).
And also, how do we deal with booting/running 2.4?

If not, my patch is a fallback solution.

Thanks,
Badari

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
  2003-04-11  1:25   ` Badari Pulavarty
@ 2003-04-11 15:43     ` Joel Becker
  0 siblings, 0 replies; 29+ messages in thread
From: Joel Becker @ 2003-04-11 15:43 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: Roman Zippel, linux-kernel, linux-scsi

On Thu, Apr 10, 2003 at 06:25:12PM -0700, Badari Pulavarty wrote:
> I can't see (2) happening easily. I know that Greg KH is working on
> udev (/dev/ memory filesystem). Once that happens, we have to change
> the drivers/subsystems (the ones we need) to do dynamic allocation.
> Is all of this going to happen for 2.6?

	Just to be clear, let's not repeat the devfsd fiasco.  If we
have dynamic update of device nodes (we need it), I want it to work with
my /dev on ext3.

Joel

-- 

"You look in her eyes, the music begins to play.
 Hopeless romantics, here we go again."

Joel Becker
Senior Member of Technical Staff
Oracle Corporation
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
  2003-04-11 11:42 Andries.Brouwer
@ 2003-04-11 14:33 ` James Bottomley
  2003-04-11 16:21   ` Badari Pulavarty
  0 siblings, 1 reply; 29+ messages in thread
From: James Bottomley @ 2003-04-11 14:33 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: Linux Kernel, SCSI Mailing List, pbadari

On Fri, 2003-04-11 at 06:42, Andries.Brouwer@cwi.nl wrote:
>     Here is my problem..
> 
>     #insmod ips.o
>       < found 10 disks>
>     #insmod qla2300.o
>       < found 10 disks>
>     #rmmod ips.o
>        <removed 10 disks>
>     #insmod ips.o
>       <found 10 disks - but new names>
> 
> OK, I see what you mean. I agree.

Could you elaborate on the reason you want to keep the minor space
compact?  I don't regard the insmod/rmmod problem as valid because if
you do:

rmmod ips.o
rmmod qla2300.o
insmod qla2300.o
insmod ips.o

All bets are off again. For small kernel dev_t it was essential to keep
a compact minor space because otherwise we could run out of minors.
Sparse minors cause no inefficiency in the mid-layer, or in sd.  There
are problems in sg which could be solved by encoding the device type in
the minor.

> [I see that dougg wants to solve such things by properly naming,
> but that is a higher level. Given a large number space an
> easier solution is to give each module its own part of the
> number space.]

Please, no.  Dividing up the minor space like this would be a step
backwards (adding more policy to the kernel).  Someone would also have
to manage this scheme.

James



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
@ 2003-04-11 11:42 Andries.Brouwer
  2003-04-11 14:33 ` James Bottomley
  0 siblings, 1 reply; 29+ messages in thread
From: Andries.Brouwer @ 2003-04-11 11:42 UTC (permalink / raw)
  To: Andries.Brouwer, linux-kernel, linux-scsi, pbadari

    From: Badari Pulavarty <pbadari@us.ibm.com>

    >     > But you see, the present sd_index_bits[] gives no such
    >     > guarantee. In sd_detach a bit is cleared, in sd_attach
    >     > the first free bit is given out. There is no memory.
    >
    >     But the disks are probed in the same manner as last time
    >     (if the disks/controllers are not moved, crashed etc..).
    >     So we will end up getting same names.
    >
    > Oh, but if next_index is 0 in the module (or reset by the
    > init_module code), then also with index = next_index++
    > things will be the same after rmmod/insmod.

    Here is my problem..

    #insmod ips.o
      < found 10 disks>
    #insmod qla2300.o
      < found 10 disks>
    #rmmod ips.o
       <removed 10 disks>
    #insmod ips.o
      <found 10 disks - but new names>

OK, I see what you mean. I agree.

Andries


[I see that dougg wants to solve such things by properly naming,
but that is a higher level. Given a large number space an
easier solution is to give each module its own part of the
number space.]
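
As a sketch of what that could mean: each module owns a fixed-size
block of indexes and counts upward inside it. The block size, the
module_id numbering, and both function names are invented for this
example; who hands out the module ids is left open.

```c
#include <assert.h>

#define INDEXES_PER_MODULE 1024u   /* hypothetical block size */
#define MAX_MODULES 16u            /* hypothetical number of blocks */

/* Next unused offset inside each module's private block. */
static unsigned int next_local[MAX_MODULES];

/* Forget a module's allocations, as its init code might on insmod. */
void module_reset(unsigned int module_id)
{
    assert(module_id < MAX_MODULES);
    next_local[module_id] = 0;
}

/* Give out the next index in this module's private range. */
unsigned int module_attach(unsigned int module_id)
{
    assert(module_id < MAX_MODULES);
    assert(next_local[module_id] < INDEXES_PER_MODULE);
    return module_id * INDEXES_PER_MODULE + next_local[module_id]++;
}
```

Reloading module 0 and resetting its counter gives its disks their old
numbers back without disturbing the numbers in module 1's block.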



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
  2003-04-11  1:09 ` Badari Pulavarty
@ 2003-04-11 10:09   ` Douglas Gilbert
  2003-04-11 16:12     ` Badari Pulavarty
  0 siblings, 1 reply; 29+ messages in thread
From: Douglas Gilbert @ 2003-04-11 10:09 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: Andries.Brouwer, linux-kernel, linux-scsi

Badari Pulavarty wrote:
> On Thursday 10 April 2003 04:53 pm, Andries.Brouwer@cwi.nl wrote:
> 
>>    From: Badari Pulavarty <pbadari@us.ibm.com>
>>
>>    >     I am more worried about names slipping. I at least hope
>>    >     to see device names not changing by just doing
>>    >     rmmod/insmod.
>>    >
>>    > But you see, the present sd_index_bits[] gives no such
>>    > guarantee. In sd_detach a bit is cleared, in sd_attach
>>    > the first free bit is given out. There is no memory.
>>
>>    But the disks are probed in the same manner as last time
>>    (if the disks/controllers are not moved, crashed etc..).
>>    So we will end up getting same names.
>>
>>Oh, but if next_index is 0 in the module (or reset by the
>>init_module code), then also with index = next_index++
>>things will be the same after rmmod/insmod.
> 
> 
> Here is my problem..
> 
> #insmod ips.o
>   < found 10 disks>
> #insmod qla2300.o
>   < found 10 disks>
> #rmmod ips.o
>    <removed 10 disks>
> #insmod ips.o
>   <found 10 disks - but new names>

Badari,
In 2.5 let's assume the /dev/sd[a-z][a-z][a-z]
device addressing is left as is (more or less). To
identify lots of disks the Vital Product Data page 0x83
(failing that, the disk serial number) should be used.

This information is available via sysfs (thanks to
Patrick Mansfield and Mike Anderson).

# cd /sys/bus/scsi/devices
# find . -follow -name 'name' -exec cat {} \; -print
SIBM     DNES-309170W            AJF98887
./1:0:4:0/name
SFUJITSU MAM3184MP       UKS0P2300CK0
./0:0:1:0/name

It is relatively easy to write user space tools to show
this information:
# lsscsi -n
[0:0:1:0]    disk    FUJITSU  MAM3184MP        0106  /dev/sda
   name: SFUJITSU MAM3184MP       UKS0P2300CK0
[1:0:4:0]    disk    IBM      DNES-309170W     SA30  /dev/sdb
   name: SIBM     DNES-309170W            AJF98887

Each pair of lines links the transient topological and device
node name ("0:0:1:0" and "/dev/sda" respectively) with a
(hopefully) invariant "name" for that device.

So if that name was hashed there would be a reasonable mapping
from that name to the current Linux scsi disk device node name
(e.g. /dev/sda). So user space tools could work out the mapping
and provide the "memory" from one boot to the next (and across
the deletion and re-addition of HBA modules).
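
A minimal sketch of the core of such a tool, assuming the "name"
strings shown above are stable. It uses 32-bit FNV-1a as the hash,
which is just one possible choice, and keeps the table in memory; a
real tool would have to save it somewhere to survive a reboot.
name_hash, remember_disk and the table size are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 32-bit FNV-1a hash of a NUL-terminated string. */
uint32_t name_hash(const char *name)
{
    uint32_t h = 2166136261u;

    while (*name) {
        h ^= (unsigned char)*name++;
        h *= 16777619u;
    }
    return h;
}

#define MAX_KNOWN 64   /* hypothetical table size */

/* Every disk name seen so far; the slot number is the remembered index. */
static struct {
    uint32_t hash;
    char name[64];
} known[MAX_KNOWN];
static int known_count;

/*
 * Return the slot remembered for this name, adding it if it is new.
 * Returns -1 if the table is full.
 */
int remember_disk(const char *name)
{
    uint32_t h = name_hash(name);

    /* The hash is a quick filter; the string compare settles collisions. */
    for (int i = 0; i < known_count; i++)
        if (known[i].hash == h && strcmp(known[i].name, name) == 0)
            return i;

    if (known_count == MAX_KNOWN)
        return -1;
    assert(strlen(name) < sizeof(known[0].name));
    known[known_count].hash = h;
    strcpy(known[known_count].name, name);
    return known_count++;
}
```

A disk that disappears and comes back with the same name gets the same
slot, whatever /dev/sdX node the kernel happens to give it that time.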

Doug Gilbert





^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
  2003-04-11  0:08 ` Roman Zippel
@ 2003-04-11  1:25   ` Badari Pulavarty
  2003-04-11 15:43     ` Joel Becker
  0 siblings, 1 reply; 29+ messages in thread
From: Badari Pulavarty @ 2003-04-11  1:25 UTC (permalink / raw)
  To: Roman Zippel; +Cc: linux-kernel, linux-scsi

On Thursday 10 April 2003 05:08 pm, Roman Zippel wrote:
> Hi,
>
> On Thu, 10 Apr 2003, Badari Pulavarty wrote:
> > This patch addresses the backward compatibility with device nodes
> > issue. All the new disks will be addressed by only last major.
>
> This nicely demonstrates, that it's not exactly becoming nicer, when one
> has to deal with compatibility. This is one more reason to at least
> consider a more general solution, from which all drivers can benefit.

I am all for a more general solution (dynamic assignment), if I can get

(1) backward compatibility with device nodes
(2) device nodes updated automagically whenever my
<major,minor> changes (maybe due to insmod/rmmod, reboot, etc.)

I can't see (2) happening easily. I know that Greg KH is working on
udev (/dev/ memory filesystem). Once that happens, we have to change
the drivers/subsystems (the ones we need) to do dynamic allocation.
Is all of this going to happen for 2.6?

That's why I am trying to come up with a half-cooked but workable
solution for 2.6.

Thanks,
Badari


* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
  2003-04-10 23:53 Andries.Brouwer
@ 2003-04-11  1:09 ` Badari Pulavarty
  2003-04-11 10:09   ` Douglas Gilbert
  0 siblings, 1 reply; 29+ messages in thread
From: Badari Pulavarty @ 2003-04-11  1:09 UTC (permalink / raw)
  To: Andries.Brouwer, linux-kernel, linux-scsi

On Thursday 10 April 2003 04:53 pm, Andries.Brouwer@cwi.nl wrote:
>     From: Badari Pulavarty <pbadari@us.ibm.com>
>
>     >     I am more worried about names slipping. I at least hope
>     >     to see device names not changing by just doing
>     >     rmmod/insmod.
>     >
>     > But you see, the present sd_index_bits[] gives no such
>     > guarantee. In sd_detach a bit is cleared, in sd_attach
>     > the first free bit is given out. There is no memory.
>
>     But the disks are probed in the same manner as last time
>     (if the disks/controllers are not moved, crashed etc..).
>     So we will end up getting same names.
>
> Oh, but if next_index is 0 in the module (or reset by the
> init_module code), then also with index = next_index++
> things will be the same after rmmod/insmod.

Here is my problem:

# insmod ips.o
  <found 10 disks>
# insmod qla2300.o
  <found 10 disks>
# rmmod ips.o
  <removed 10 disks>
# insmod ips.o
  <found 10 disks - but new names>
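
To make the difference between the two policies concrete, here is a
small user space model (a sketch, not sd.c code). in_use[] stands in
for sd_index_bits[] and next_index for the proposed counter; MAX_DISKS,
PER_HBA and all function names are made up for the illustration. It
assumes sd itself stays loaded, so the counter is never reset:

```c
#define MAX_DISKS 64	/* hypothetical limit, large enough for the demo */
#define PER_HBA   10	/* disks found by each HBA driver in the example */

static unsigned char in_use[MAX_DISKS];	/* models sd_index_bits[] */
static int next_index;			/* models index = next_index++ */

/* First-free-bit policy: lowest unused index, or -1 if none is left. */
static int attach_first_free(void)
{
	int i;

	for (i = 0; i < MAX_DISKS; i++) {
		if (!in_use[i]) {
			in_use[i] = 1;
			return i;
		}
	}
	return -1;
}

/* Counter policy: always a fresh index; detach has nothing to undo. */
static int attach_counter(void)
{
	return next_index++;
}

/*
 * insmod ips.o; insmod qla2300.o; rmmod ips.o; insmod ips.o
 * Returns the index of the first ips disk after the reload.
 */
static int reload_first_free(void)
{
	int i;

	for (i = 0; i < 2 * PER_HBA; i++)	/* ips: 0..9, qla2300: 10..19 */
		attach_first_free();
	for (i = 0; i < PER_HBA; i++)		/* rmmod ips.o */
		in_use[i] = 0;
	return attach_first_free();		/* insmod ips.o again */
}

static int reload_counter(void)
{
	int i;

	for (i = 0; i < 2 * PER_HBA; i++)	/* ips: 0..9, qla2300: 10..19 */
		attach_counter();
	/* rmmod ips.o: nothing to do */
	return attach_counter();		/* insmod ips.o again */
}
```

In this model reload_first_free() hands index 0 back to the first ips
disk (sda again), while reload_counter() gives it index 20, so the
disk that was sda comes back as sdu. The first-free policy only keeps
the names here because nothing else attached in between; it still has
no memory.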

- Badari


* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
@ 2003-04-11  0:13 Andries.Brouwer
  0 siblings, 0 replies; 29+ messages in thread
From: Andries.Brouwer @ 2003-04-11  0:13 UTC (permalink / raw)
  To: Andries.Brouwer, zippel; +Cc: linux-kernel, linux-scsi, pbadari

    From: Roman Zippel <zippel@linux-m68k.org>

    > The conclusion is that the easy way out is to define MAX_NR_DISKS.
    > A different way out, especially when we use 32+32, is to kill this
    > sd_index_bits[] array, and give each disk a new number: replace
    >     index = find_first_zero_bit(sd_index_bits, SD_DISKS);
    > by
    >     index = next_index++;

    Unless you fix all programs which scan /dev/sg*, you had better keep
    the used range dense, so this is not really an option.

That only holds for the first 256 minors (of the first 8 majors).
Since we want to be completely backwards compatible, nothing
changes there.

But people who want to use new features must update their programs
or at least recompile.

Andries




* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
  2003-04-10 20:39 Badari Pulavarty
  2003-04-10 20:54 ` Randy.Dunlap
@ 2003-04-11  0:08 ` Roman Zippel
  2003-04-11  1:25   ` Badari Pulavarty
  1 sibling, 1 reply; 29+ messages in thread
From: Roman Zippel @ 2003-04-11  0:08 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: linux-kernel, linux-scsi

Hi,

On Thu, 10 Apr 2003, Badari Pulavarty wrote:

> This patch addresses the backward compatibility with device nodes
> issue. All the new disks will be addressed by only last major.

This nicely demonstrates that things do not exactly become nicer when one
has to deal with compatibility. This is one more reason to at least
consider a more general solution, from which all drivers can benefit.

bye, Roman



* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
  2003-04-10 22:09 Andries.Brouwer
  2003-04-10 22:22 ` Badari Pulavarty
@ 2003-04-10 23:57 ` Roman Zippel
  1 sibling, 0 replies; 29+ messages in thread
From: Roman Zippel @ 2003-04-10 23:57 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: linux-kernel, linux-scsi, pbadari

Hi,

On Fri, 11 Apr 2003 Andries.Brouwer@cwi.nl wrote:

> The conclusion is that the easy way out is to define MAX_NR_DISKS.
> A different way out, especially when we use 32+32, is to kill this
> sd_index_bits[] array, and give each disk a new number: replace
> 	index = find_first_zero_bit(sd_index_bits, SD_DISKS);
> by
> 	index = next_index++;

This one is fun:
http://www.ussg.iu.edu/hypermail/linux/kernel/0103.3/0394.html

Anyway, unless you fix all programs which scan /dev/sg*, you had better keep
the used range dense, so this is not really an option.

bye, Roman



* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
@ 2003-04-10 23:53 Andries.Brouwer
  2003-04-11  1:09 ` Badari Pulavarty
  0 siblings, 1 reply; 29+ messages in thread
From: Andries.Brouwer @ 2003-04-10 23:53 UTC (permalink / raw)
  To: Andries.Brouwer, linux-kernel, linux-scsi, pbadari

    From: Badari Pulavarty <pbadari@us.ibm.com>

    >     I am more worried about names slipping. I at least hope
    >     to see device names not changing by just doing
    >     rmmod/insmod.
    >
    > But you see, the present sd_index_bits[] gives no such
    > guarantee. In sd_detach a bit is cleared, in sd_attach
    > the first free bit is given out. There is no memory.

    But the disks are probed in the same manner as last time
    (if the disks/controllers are not moved, crashed etc..).
    So we will end up getting same names.

Oh, but if next_index is 0 in the module (or reset by the
init_module code), then also with index = next_index++
things will be the same after rmmod/insmod.

Andries




* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
  2003-04-10 23:09 Andries.Brouwer
@ 2003-04-10 23:16 ` Badari Pulavarty
  0 siblings, 0 replies; 29+ messages in thread
From: Badari Pulavarty @ 2003-04-10 23:16 UTC (permalink / raw)
  To: Andries.Brouwer, linux-kernel, linux-scsi

On Thursday 10 April 2003 04:09 pm, Andries.Brouwer@cwi.nl wrote:
>     > A different way out, especially when we use 32+32, is to kill this
>     > sd_index_bits[] array, and give each disk a new number: replace
>     > 	index = find_first_zero_bit(sd_index_bits, SD_DISKS);
>     > by
>     > 	index = next_index++;
>
>     I wish it is that simple. We use sd_index_bits[] since we could
>     sd_detach() and then sd_attach()  few disks. We will end up with
>     holes, name slippage without this. We need to know what disks are
>     currently being in use.
>
> It is that simple. (At least with 64-bit dev_t.)
> Look at the use of sd_index_bits[]. It is static in sd.c.
> There is the definition, the first free bit is found (and set)
> in sd_attach() to provide our disk with a number, this bit is
> cleared again in sd_detach().
>
> That is all. In other words, a mechanism to give an unused number
> to each disk for which sd_attach() is called.
>
> Now suppose we do nothing in sd_detach().
> Then we don't know which disks have disappeared. Pity.
> If the number space is infinite then
> 	index = next_index++;
> gives a new number each time we need one.

Yes!! I agree. I am not worried about running out of them.
I am more worried about names slipping. I at least hope
to see device names not changing by just doing
rmmod/insmod.

Thanks,
Badari


* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
@ 2003-04-10 23:09 Andries.Brouwer
  2003-04-10 23:16 ` Badari Pulavarty
  0 siblings, 1 reply; 29+ messages in thread
From: Andries.Brouwer @ 2003-04-10 23:09 UTC (permalink / raw)
  To: Andries.Brouwer, linux-kernel, linux-scsi, pbadari

    > A different way out, especially when we use 32+32, is to kill this
    > sd_index_bits[] array, and give each disk a new number: replace
    > 	index = find_first_zero_bit(sd_index_bits, SD_DISKS);
    > by
    > 	index = next_index++;

    I wish it is that simple. We use sd_index_bits[] since we could
    sd_detach() and then sd_attach()  few disks. We will end up with
    holes, name slippage without this. We need to know what disks are
    currently being in use.

It is that simple. (At least with 64-bit dev_t.)
Look at the use of sd_index_bits[]. It is static in sd.c.
There is the definition, the first free bit is found (and set)
in sd_attach() to provide our disk with a number, this bit is
cleared again in sd_detach().

That is all. In other words, a mechanism to give an unused number
to each disk for which sd_attach() is called.

Now suppose we do nothing in sd_detach().
Then we don't know which disks have disappeared. Pity.
If the number space is infinite then
	index = next_index++;
gives a new number each time we need one.

Now that it is finite, some estimates are needed. How often
will sd_attach() be called during the uptime of this kernel /
the lifetime of this computer? And how much space is available?

Among 2^64 device numbers, 2^48 reserved for scsi disks
is a very small fraction. With at most 2^12 partitions
on each disk that would leave room for 2^36 disks.
Do you think that during the lifetime of this computer a new
scsi disk will be added more than 68 * 10^9 times?
That would be adding more than 400 disks each second for five years.

You see, 2^64 is not infinite, but it is close.
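
Spelling out that estimate (a back-of-the-envelope check in C; the
macro and function names are made up, and a year is taken as 365 days):

```c
#include <stdint.h>

#define SD_DEV_BITS	48	/* device-number bits reserved for scsi disks */
#define PARTITION_BITS	12	/* at most 2^12 partitions per disk */

/* Number of distinct disk indexes: 2^(48 - 12) = 2^36. */
static uint64_t max_disks(void)
{
	return (uint64_t)1 << (SD_DEV_BITS - PARTITION_BITS);
}

/* Whole attaches per second needed to use up 'disks' indexes in 'years'. */
static uint64_t attaches_per_second(uint64_t disks, unsigned int years)
{
	const uint64_t seconds = (uint64_t)years * 365 * 24 * 60 * 60;

	return disks / seconds;
}
```

max_disks() is 68719476736, and attaches_per_second(max_disks(), 5)
comes to 435, i.e. a bit over 400 new disks every second for five
years.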

Andries


* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
  2003-04-10 22:09 Andries.Brouwer
@ 2003-04-10 22:22 ` Badari Pulavarty
  2003-04-10 23:57 ` Roman Zippel
  1 sibling, 0 replies; 29+ messages in thread
From: Badari Pulavarty @ 2003-04-10 22:22 UTC (permalink / raw)
  To: Andries.Brouwer, linux-kernel, linux-scsi

On Thursday 10 April 2003 03:09 pm, Andries.Brouwer@cwi.nl wrote:

>
> I try to make sure there are no assumptions about the
> size or structure of device numbers anywhere outside kdev_t.h.
> In particular I object to the use of KDEV_MINOR_BITS.
>
> Apart from this formal point, there is also the practical point:
> suppose 64 = 32+32 is used, so that KDEV_MINOR_BITS equals 32.
> Then LAST_MAJOR_DISKS is 2^28 and sd_index_bits[] would be 32 MB array.
> Unreasonable.

Agreed!! (I mentioned the sd_index_bits[] array size earlier, in my
previous postings.)

>
> The conclusion is that the easy way out is to define MAX_NR_DISKS.

Unfortunately, MAX_NR_DISKS will be dependent on KDEV_MINOR_BITS.
We can't set MAX_NR_DISKS to an arbitrary value: if there are not
enough MINOR bits, it won't work. The only way to make this work is
to do dynamic major allocation and update the /dev/ entries for them.

> A different way out, especially when we use 32+32, is to kill this
> sd_index_bits[] array, and give each disk a new number: replace
> 	index = find_first_zero_bit(sd_index_bits, SD_DISKS);
> by
> 	index = next_index++;
>
I wish it were that simple. We use sd_index_bits[] because we could
sd_detach() and then sd_attach() a few disks. Without it we would end
up with holes and name slippage. We need to know which disks are
currently in use.

Thanks,
Badari



* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
@ 2003-04-10 22:09 Andries.Brouwer
  2003-04-10 22:22 ` Badari Pulavarty
  2003-04-10 23:57 ` Roman Zippel
  0 siblings, 2 replies; 29+ messages in thread
From: Andries.Brouwer @ 2003-04-10 22:09 UTC (permalink / raw)
  To: linux-kernel, linux-scsi, pbadari

[I noticed that I have been unsubscribed for a day or so,
so may not have seen earlier mail.]

	From: Badari Pulavarty <pbadari@us.ibm.com>

	--- linux-2.5.67/drivers/scsi/sd.c	Wed Apr  9 13:12:38 2003
	+++ linux-2.5.67.new/drivers/scsi/sd.c	Thu Apr 10 13:23:49 2003
	@@ -56,7 +56,9 @@
	  * Remaining dev_t-handling stuff
	  */
	 #define SD_MAJORS	16
	-#define SD_DISKS	(SD_MAJORS << 4)
	+#define SD_DISKS	((SD_MAJORS - 1) << 4)
	+#define LAST_MAJOR_DISKS	(1 << (KDEV_MINOR_BITS - 4))
	+#define TOTAL_SD_DISKS	(SD_DISKS + LAST_MAJOR_DISKS)
	 
	-static unsigned long sd_index_bits[SD_DISKS / BITS_PER_LONG];
	+static unsigned long sd_index_bits[TOTAL_SD_DISKS / BITS_PER_LONG];

I try to make sure there are no assumptions about the
size or structure of device numbers anywhere outside kdev_t.h.
In particular I object to the use of KDEV_MINOR_BITS.

Apart from this formal point, there is also the practical point:
suppose 64 = 32+32 is used, so that KDEV_MINOR_BITS equals 32.
Then LAST_MAJOR_DISKS is 2^28 and sd_index_bits[] would be a 32 MB array.
Unreasonable.
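
For the record, the arithmetic behind that figure, as a small C sketch.
bitmap_bytes() is a made-up helper; it assumes one bit per disk and
4 minor bits used for partitions, as in the patch, and it ignores the
240 disks of the first 15 majors:

```c
#include <stddef.h>

/* Bytes needed for a one-bit-per-disk bitmap covering the last major. */
static size_t bitmap_bytes(unsigned int minor_bits)
{
	size_t disks = (size_t)1 << (minor_bits - 4);	/* 16 minors per disk */

	return disks / 8;
}
```

bitmap_bytes(32) is 2^28 / 8 = 2^25 bytes, i.e. 32 MB, whereas
bitmap_bytes(20) would be a harmless 8192 bytes.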

The conclusion is that the easy way out is to define MAX_NR_DISKS.
A different way out, especially when we use 32+32, is to kill this
sd_index_bits[] array, and give each disk a new number: replace
	index = find_first_zero_bit(sd_index_bits, SD_DISKS);
by
	index = next_index++;

Andries


* Re: [patch for playing] Patch to support 4000 disks and maintain backward compatibility
  2003-04-10 20:39 Badari Pulavarty
@ 2003-04-10 20:54 ` Randy.Dunlap
  2003-04-11  0:08 ` Roman Zippel
  1 sibling, 0 replies; 29+ messages in thread
From: Randy.Dunlap @ 2003-04-10 20:54 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: linux-kernel, linux-scsi

On Thu, 10 Apr 2003 13:39:49 -0700 Badari Pulavarty <pbadari@us.ibm.com> wrote:

| Here is the (sd) patch to support > 4000 disks on 32-bit dev_t work
| in 2.5.67-mm tree.
| 
| This patch addresses the backward compatibility with device nodes
| issue. All the new disks will be addressed by only last major.
| 
| SCSI has 16 majors. Each major supports 16 disks currently.
| This patch leaves this assumption for first 15 majors and all the
| new disks addressable by 32/64 dev_t work will be added to
| SCSI last major#. This way, we don't need to create device
| nodes in /dev, if you switch between 2.4 and 2.5.
| 
| Any comments ?


 #define SD_MAJORS	16
-#define SD_DISKS	(SD_MAJORS << 4)
+#define SD_DISKS	((SD_MAJORS - 1) << 4)
+#define LAST_MAJOR_DISKS	(1 << (KDEV_MINOR_BITS - 4))
+#define TOTAL_SD_DISKS	(SD_DISKS + LAST_MAJOR_DISKS)
 
@@ -85,7 +87,7 @@ struct scsi_disk {
 static LIST_HEAD(sd_devlist);
 static spinlock_t sd_devlist_lock = SPIN_LOCK_UNLOCKED;
 
-static unsigned long sd_index_bits[SD_DISKS / BITS_PER_LONG];
+static unsigned long sd_index_bits[TOTAL_SD_DISKS / BITS_PER_LONG];
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If there's any chance that TOTAL_SD_DISKS is not a multiple of
BITS_PER_LONG, then the size above had better be

	(TOTAL_SD_DISKS + BITS_PER_LONG - 1) / BITS_PER_LONG
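
To illustrate with numbers (a sketch only: DEMO_MINOR_BITS = 20 and a
32-bit long are assumptions picked for the example, and the DEMO_ and
LONGS_ names are made up):

```c
#define DEMO_BITS_PER_LONG	32	/* assumed: 32-bit long */
#define DEMO_MINOR_BITS		20	/* assumed value of KDEV_MINOR_BITS */

#define DEMO_SD_DISKS		((16 - 1) << 4)			/* 240 */
#define DEMO_LAST_MAJOR_DISKS	(1 << (DEMO_MINOR_BITS - 4))	/* 65536 */
#define DEMO_TOTAL_SD_DISKS	(DEMO_SD_DISKS + DEMO_LAST_MAJOR_DISKS)

/* Truncating division, as in the patch: drops a partial last word. */
#define LONGS_TRUNC(bits)	((bits) / DEMO_BITS_PER_LONG)

/* Rounding up: always enough words to hold every bit. */
#define LONGS_ROUND_UP(bits) \
	(((bits) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)
```

With these values the total is 65776 bits. LONGS_TRUNC() gives 2055
words, which hold only 65760 bits (16 short), while LONGS_ROUND_UP()
gives 2056 words. For the old SD_DISKS value of 256 both agree.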


--
~Randy   ['tangent' is not a verb...unless you believe that
          "in English any noun can be verbed."]


* [patch for playing] Patch to support 4000 disks and maintain backward compatibility
@ 2003-04-10 20:39 Badari Pulavarty
  2003-04-10 20:54 ` Randy.Dunlap
  2003-04-11  0:08 ` Roman Zippel
  0 siblings, 2 replies; 29+ messages in thread
From: Badari Pulavarty @ 2003-04-10 20:39 UTC (permalink / raw)
  To: linux-kernel, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 571 bytes --]

Hi,

Here is the (sd) patch to support > 4000 disks on top of the 32-bit
dev_t work in the 2.5.67-mm tree.

This patch addresses the issue of backward compatibility with
device nodes. All the new disks will be addressed through the last
major only.

SCSI has 16 majors. Each major currently supports 16 disks.
This patch keeps that layout for the first 15 majors, and all the
new disks made addressable by the 32/64-bit dev_t work are added to
the last SCSI major. This way, we don't need to create device
nodes in /dev when switching between 2.4 and 2.5.

Any comments ?

Thanks,
Badari



[-- Attachment #2: sd.new --]
[-- Type: text/x-diff, Size: 2514 bytes --]

--- linux-2.5.67/drivers/scsi/sd.c	Wed Apr  9 13:12:38 2003
+++ linux-2.5.67.new/drivers/scsi/sd.c	Thu Apr 10 13:23:49 2003
@@ -56,7 +56,9 @@
  * Remaining dev_t-handling stuff
  */
 #define SD_MAJORS	16
-#define SD_DISKS	(SD_MAJORS << 4)
+#define SD_DISKS	((SD_MAJORS - 1) << 4)
+#define LAST_MAJOR_DISKS	(1 << (KDEV_MINOR_BITS - 4))
+#define TOTAL_SD_DISKS	(SD_DISKS + LAST_MAJOR_DISKS)
 
 /*
  * Time out in seconds for disks and Magneto-opticals (which are slower).
@@ -85,7 +87,7 @@ struct scsi_disk {
 static LIST_HEAD(sd_devlist);
 static spinlock_t sd_devlist_lock = SPIN_LOCK_UNLOCKED;
 
-static unsigned long sd_index_bits[SD_DISKS / BITS_PER_LONG];
+static unsigned long sd_index_bits[TOTAL_SD_DISKS / BITS_PER_LONG];
 static spinlock_t sd_index_lock = SPIN_LOCK_UNLOCKED;
 
 static void sd_init_onedisk(struct scsi_disk * sdkp, struct gendisk *disk);
@@ -123,7 +125,10 @@ static int sd_major(int major_idx)
 	case 1 ... 7:
 		return SCSI_DISK1_MAJOR + major_idx - 1;
 	case 8 ... 15:
-		return SCSI_DISK8_MAJOR + major_idx;
+		return SCSI_DISK8_MAJOR + major_idx - 8;
+#define MAX_IDX	(TOTAL_SD_DISKS >> 4)
+	case 16 ... MAX_IDX:
+		return SCSI_DISK15_MAJOR;
 	default:
 		BUG();
 		return 0;	/* shut up gcc */
@@ -1313,8 +1318,8 @@ static int sd_attach(struct scsi_device 
 		goto out_free;
 
 	spin_lock(&sd_index_lock);
-	index = find_first_zero_bit(sd_index_bits, SD_DISKS);
-	if (index == SD_DISKS) {
+	index = find_first_zero_bit(sd_index_bits, TOTAL_SD_DISKS);
+	if (index == TOTAL_SD_DISKS) {
 		spin_unlock(&sd_index_lock);
 		error = -EBUSY;
 		goto out_put;
@@ -1329,15 +1334,25 @@ static int sd_attach(struct scsi_device 
 
 	gd->de = sdp->de;
 	gd->major = sd_major(index >> 4);
-	gd->first_minor = (index & 15) << 4;
+#define DISKS_PER_MINOR_MASK	((1 << (KDEV_MINOR_BITS - 4)) - 1)
+	if (index > SD_DISKS) 
+		gd->first_minor = ((index - SD_DISKS) & DISKS_PER_MINOR_MASK) << 4;
+	else
+		gd->first_minor = (index & 15) << 4;
 	gd->minors = 16;
 	gd->fops = &sd_fops;
 
-	if (index >= 26) {
+	if (index < 26) {
+		sprintf(gd->disk_name, "sd%c", 'a' + index % 26);
+	} else if (index < (26*27)) {
 		sprintf(gd->disk_name, "sd%c%c",
 			'a' + index/26-1,'a' + index % 26);
 	} else {
-		sprintf(gd->disk_name, "sd%c", 'a' + index % 26);
+		const unsigned int m1 = (index/ 26 - 1) / 26 - 1;
+		const unsigned int m2 = (index / 26 - 1) % 26;
+		const unsigned int m3 = index % 26;
+		sprintf(gd->disk_name, "sd%c%c%c", 
+			'a' + m1, 'a' + m2, 'a' + m3);
 	}
 
 	sd_init_onedisk(sdkp, gd);
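
For playing with the naming part outside the kernel, the three-way
branch above can be lifted into a stand-alone helper. This is only a
sketch: sd_name() is a made-up user space function, it uses snprintf()
instead of sprintf(), and the arithmetic is copied from the hunk above:

```c
#include <stdio.h>

/* Build the disk name for a given index into buf (at least 6 bytes). */
static void sd_name(unsigned int index, char *buf, size_t len)
{
	if (index < 26) {
		/* sda .. sdz */
		snprintf(buf, len, "sd%c", (int)('a' + index));
	} else if (index < 26 * 27) {
		/* sdaa .. sdzz */
		snprintf(buf, len, "sd%c%c",
			 (int)('a' + index / 26 - 1), (int)('a' + index % 26));
	} else {
		/* sdaaa and up */
		const unsigned int m1 = (index / 26 - 1) / 26 - 1;
		const unsigned int m2 = (index / 26 - 1) % 26;
		const unsigned int m3 = index % 26;

		snprintf(buf, len, "sd%c%c%c",
			 (int)('a' + m1), (int)('a' + m2), (int)('a' + m3));
	}
}
```

Index 0 gives sda, 25 gives sdz, 26 gives sdaa, 701 gives sdzz, 702
gives sdaaa and 4000 gives sdeww. Three letters run out after index
18277 (26 + 26^2 + 26^3 names in total); beyond that the first letter
goes past 'z', so a larger index range would need a fourth letter.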
 

