linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] lazy allocation of struct block_device
@ 2001-08-31  4:43 Alexander Viro
  0 siblings, 0 replies; 16+ messages in thread
From: Alexander Viro @ 2001-08-31  4:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andries Brouwer, Alan Cox, linux-kernel

	Linus, I've looked into that (allocating ->i_bdev upon open()).
There are several problems with this approach and none of the solutions I
can see looks like a clear winner.

1) when do we drop ->i_bdev?
2) how should we change the refcounting on struct block_device?
3) what to do with beasts that don't have major number and just set
->i_bdev when they allocate an inode?

Possible solutions:

a) once we had set ->i_bdev it's there to stay - we free it only when
inode itself is freed.  No changes in refcounting, devfs-like animals
simply do allocation earlier then the rest.

b) ->i_bdev itself doesn't contribute to block_device refcount.  We keep
struct block_device as long as it's opened by somebody.  When everyone
who had inode opened is gone we reset ->i_bdev.

c) allocation/freeing of block_device is controlled by driver (and partitioning
code).  bdcache always corresponds to the block devices that actually are
there.  ->i_bdev is set upon open() and doesn't contribute anything to
refcount.  When we set it, inode is placed on the list anchored in ->i_bdev.
When device goes away we reset ->i_bdev and take inode out of the list.

d) we trust ->i_bdev only for opened inodes.  Whenever we try to open
the thing we do cache lookup (by ->i_rdev) and set ->i_bdev.  Refcounting
on block_device - as in (b).

Each of these variants has problems.

a) Allocation and freeing logics differ.  Not pretty and we still can
overpopulate the cache.

b) We don't know when the last opener of inode goes away.  We know that
for block_device itself, but what to do if several inodes have the same
major/minor?  We can add ->i_openers or something like that, but it means
growing already bloated struct inode _and_ adding overhead to open()/close().
Cases that set ->i_bdev early are not a problem - we could just start with
non-zero ->i_openers.

c) Interesting, and might be the Right Thing(tm), but... we will have to
add register_<something> to drivers of non-partitioned devices (partitioned
ones can do that in grok_partitions()).  We also need to find a reusable
list in struct inode (shouldn't be a problem, since a lot of stuff is never
used for devices) or add a new list_head there.

d) Early-set cases are trouble - we need a new flag ("trust ->i_bdev") to
prevent cache lookups/reassigning ->i_bdev for them.


I could argue for any of these variants - (a) means minimally intrusive
changes, (c) has a potential for moving partition stuff into struct
block_device and opens a way for cleaning a lot of code in drivers/killing
a bunch of arrays/removing a lot of crap from devfs, (b) and (d) give the
smallest cache sizes.

In the long run I'd probably prefer (c), simply because it allows very nice
cleanups for 2.5 - we will need some sort of per-partition structure
allocated when we find a partition anyway and we might point ->i_bdev
to these just fine.  Moreover, we are getting rid of the "same methods
table for everyone with given major" crap, which is a Good Thing(tm).

Linus, it's your decision - it seriously affects architecture.  Either
way, patch won't be too large and all variants can be done in 2.4 just
fine.  Similar problems exist for character devices, but there we
may need more intrusive changes.

Please, comment.
								Al


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] lazy allocation of struct block_device
@ 2001-09-02 20:05 Andries.Brouwer
  0 siblings, 0 replies; 16+ messages in thread
From: Andries.Brouwer @ 2001-09-02 20:05 UTC (permalink / raw)
  To: Andries.Brouwer, viro; +Cc: alan, linux-kernel, torvalds

>> How many bits should a dev_t have? Well, enough.

> Enough for what? To cover all currently supported devices?

Enough to avoid the hassles that one has when dev_t is too small.

dev_t is a communication channel - you come with a cookie
and get a device in return.

Now NFS uses a 64-bit dev_t. If you choose a smaller one
then you have to invent some mapping between your 16 or 32 bits
and the 64 of NFS. I have not myself used systems that use a
64-bit dev_t (except for my own Linux machine :-) but have seen
systems with 32 bits divided 8+24 or 12+20 or 14+18 or 16+16,
so your mapping may have to depend on what is on the other side.
Not difficult, but annoying. A hassle for the sysadm.
There is no hassle with 64-bit dev_t.

In reality nobody wants a dev_t. We want a string.
A device path that gives the bus and SCSI ID or USB address
or internet URL plus protocol where to find this device.
But such a device path is large and of unknown shape.
Current user space software cannot easily handle such new objects.
Life becomes simpler if a disk on my local ethernet that
requires a password before use can be accessed as /dev/eda
not different from /dev/hda. Some as yet unspecified attach()
system call can turn device paths into numbers (dev_t),
and a following mknod() can attach a Unix filename to the number.
You see that in such a setup the dev_t is a handle, maybe a
pointer, not unlike the filehandles that NFS uses.
If your machine has more than 1 GB of memory, maybe you want
to use more than 32 bits for your handle.

The above is just fiction - I don't know how devices will be handled
in the future. But I find it very easy to conjure up scenarios
where having 64-bit dev_t would be very useful in order to make
sure that our current body of programs keeps working also in new
circumstances.

Andries

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] lazy allocation of struct block_device
  2001-09-02 17:25 Andries.Brouwer
@ 2001-09-02 18:46 ` Alexander Viro
  0 siblings, 0 replies; 16+ messages in thread
From: Alexander Viro @ 2001-09-02 18:46 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: alan, linux-kernel, torvalds



On Sun, 2 Sep 2001 Andries.Brouwer@cwi.nl wrote:

> Since several people react, let us fork the discussion
> and talk about dev_t for a while.
> 
> Your reaction ("not in any tree I'd ever run") is too strong,
> more emotion than thought. (Or maybe you only want to please Linus.)

Not really, especially since if Linus will go that way I'll simply fork
the tree.
 
> How long must filenames (maximally) be? Well, long enough.
> And I used systems with namelength 4, 6, 8, 11, 12, 14, 31, 255, maybe more.
> And 255 is long enough. Is it ridiculous to allow 255? Isn't 100 enough?
> Yes, 100 is also enough, as tar shows. But going to 255 has essentially
> zero cost, so has only advantages.
> People do not have to use such long names, but they can, if they want.
> The longest filename on this machine here has length 97.
> 
> How many bits should a dev_t have? Well, enough.

Enough for what? To cover all currently supported devices? Or to allocate
a major for each driver that will ever be written? I can see the value of
the former, but not the latter.

> Now you malign glibc ("this piece of shit"), but in reality this
> has very little to do with glibc. The only important use (important
> efficiency-wise) of dev_t is in the stat() system call.
> Now the present Linux stat64 call uses 96-bit dev_t, 16 bits info
> and 80 bits padding. Even if you use some tiny libc, you get
> 96 bits - in other words, the inefficiency is in the kernel.
> Thus, I do not quite understand why you say that you prefer
> to have only the disadvantages, and that you never in your life
> want the advantages of a large dev_t.

Mostly for the same reasons why I don't like the idea of perpetuating
*FAT and friends.  dev_t as a way to specify device was tolerable on
v7.  It doesn't scale and the fact that you need to expand it indicates
exactly that.  IOW, if we ever need that many device types - numbers
are not going to work.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] lazy allocation of struct block_device
@ 2001-09-02 17:25 Andries.Brouwer
  2001-09-02 18:46 ` Alexander Viro
  0 siblings, 1 reply; 16+ messages in thread
From: Andries.Brouwer @ 2001-09-02 17:25 UTC (permalink / raw)
  To: Andries.Brouwer, viro; +Cc: alan, linux-kernel, torvalds

    From viro@math.psu.edu Sun Sep  2 13:39:05 2001

    On Sun, 2 Sep 2001 Andries.Brouwer@cwi.nl wrote:

    > That majors allocation is frozen is not my problem.
    > I make life simple with a 64-bit dev_t. Everybody else already has the
    > disadvantages of a 64-bit dev_t, since glibc moves 64 bits around,
    > but not the advantage of a large namespace.

    Not an option for any tree I'd ever run on my box and IIRC Linus was not
    likely to inflict that on his.  As for the glibc...
        a) not everything uses this piece of shit
        b) it would be really nice to get rid of it completely someday

    > But that is a separate discussion.

    <nod>

Since several people react, let us fork the discussion
and talk about dev_t for a while.

Your reaction ("not in any tree I'd ever run") is too strong,
more emotion than thought. (Or maybe you only want to please Linus.)

How long must filenames (maximally) be? Well, long enough.
And I used systems with namelength 4, 6, 8, 11, 12, 14, 31, 255, maybe more.
And 255 is long enough. Is it ridiculous to allow 255? Isn't 100 enough?
Yes, 100 is also enough, as tar shows. But going to 255 has essentially
zero cost, so has only advantages.
People do not have to use such long names, but they can, if they want.
The longest filename on this machine here has length 97.

How many bits should a dev_t have? Well, enough.
And 16 is not enough, and people go through painful contortions.
And 64 is enough. Is it ridiculous to allow 64? Isn't 32 enough?
Maybe, in most cases, yes, although there are some slight problems
with NFS and interoperability - certainly 64 makes life easier.
Moreover, going to 64 has essentially zero cost, so has only advantages.

Now you malign glibc ("this piece of shit"), but in reality this
has very little to do with glibc. The only important use (important
efficiency-wise) of dev_t is in the stat() system call.
Now the present Linux stat64 call uses 96-bit dev_t, 16 bits info
and 80 bits padding. Even if you use some tiny libc, you get
96 bits - in other words, the inefficiency is in the kernel.

Thus, I do not quite understand why you say that you prefer
to have only the disadvantages, and that you never in your life
want the advantages of a large dev_t.

Andries

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] lazy allocation of struct block_device
  2001-09-02 15:38 ` Richard Gooch
  2001-09-02 16:07   ` Alexander Viro
@ 2001-09-02 16:16   ` Richard Gooch
  1 sibling, 0 replies; 16+ messages in thread
From: Richard Gooch @ 2001-09-02 16:16 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Andries.Brouwer, alan, linux-kernel, torvalds

Alexander Viro writes:
> 
> 
> On Sun, 2 Sep 2001, Richard Gooch wrote:
> 
> > Yep. Having to allocate/search for a device structure during open(2)
> > is insane. It's wasteful. For ext2fs you can do it in lookup(), and
> > for devfs (or similar) you can do the ideal thing: when the device
> > entry is registered with the FS (i.e. only once).
> 
> No.  We _must_ do it on ->open() for cases when it had been NULL.  Driver
> might be not there at ->lookup() time and hunting down all inodes with
> give major/minor is insanity.

Apologies. Let me clarify what I meant: it's insane to only ever
allow allocate/search during open(2).

Oh, and yeah: doing allocate/search during lookup() for ext2fs is a
bad idea. Wasn't thinking straight.

> Now, if inode had been created by the driver - sure, we create the
> association between it and <whatever>_device from the very
> beginning.

Yep. That's what I meant.

> We must support device nodes on normal filesystems.

Of course. No argument.

> Support for such beasts is there to stay.  However, we shouldn't do
> things that make allocation of new majors mandatory.  IOW, as far as
> I'm concerned solutions that do not allow "establish connection with
> foo_device upon inode creation and don't bother with device number
> for that one" are broken.

Nod.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] lazy allocation of struct block_device
  2001-09-02 15:38 ` Richard Gooch
@ 2001-09-02 16:07   ` Alexander Viro
  2001-09-02 16:16   ` Richard Gooch
  1 sibling, 0 replies; 16+ messages in thread
From: Alexander Viro @ 2001-09-02 16:07 UTC (permalink / raw)
  To: Richard Gooch; +Cc: Andries.Brouwer, alan, linux-kernel, torvalds



On Sun, 2 Sep 2001, Richard Gooch wrote:

> Yep. Having to allocate/search for a device structure during open(2)
> is insane. It's wasteful. For ext2fs you can do it in lookup(), and
> for devfs (or similar) you can do the ideal thing: when the device
> entry is registered with the FS (i.e. only once).

No.  We _must_ do it on ->open() for cases when it had been NULL.  Driver
might be not there at ->lookup() time and hunting down all inodes with
give major/minor is insanity.

Now, if inode had been created by the driver - sure, we create the association
between it and <whatever>_device from the very beginning.

We must support device nodes on normal filesystems.  Period.  They are there
on tons of boxen and current support for device nodes on synthetic trees
is _not_ suitable for general use (== I'd question sanity of anyone using
it on production multiuser boxen).

Support for such beasts is there to stay.  However, we shouldn't do things
that make allocation of new majors mandatory.  IOW, as far as I'm concerned
solutions that do not allow "establish connection with foo_device upon
inode creation and don't bother with device number for that one" are
broken.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] lazy allocation of struct block_device
  2001-09-02 10:24 Andries.Brouwer
  2001-09-02 11:38 ` Alexander Viro
@ 2001-09-02 15:38 ` Richard Gooch
  2001-09-02 16:07   ` Alexander Viro
  2001-09-02 16:16   ` Richard Gooch
  1 sibling, 2 replies; 16+ messages in thread
From: Richard Gooch @ 2001-09-02 15:38 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Andries.Brouwer, alan, linux-kernel, torvalds

Alexander Viro writes:
> 
> 
> On Sun, 2 Sep 2001 Andries.Brouwer@cwi.nl wrote:
> 
> > That majors allocation is frozen is not my problem.
> > I make life simple with a 64-bit dev_t. Everybody else already has the
> > disadvantages of a 64-bit dev_t, since glibc moves 64 bits around,
> > but not the advantage of a large namespace.
> 
> Not an option for any tree I'd ever run on my box and IIRC Linus was not
> likely to inflict that on his.  As for the glibc...
> 	a) not everything uses this piece of shit

Nod.

> 	b) it would be really nice to get rid of it completely someday

Or at the very least, to selectively not compile large chunks of stuff
that went in that I will never need. Stuff that isn't there on libc5
and other Unices. Something that takes up less than half a floppy.
And having source and binary compatibility between major releases
wouldn't hurt either!

> > Concerning devfs, I don't use it and have not really thought about it.
> > I think my point of view would be that devfs provides a different object.
> > A device node in a filesystem is a pair (pathname, dev_t).
> > Opening it gives the triple (pathname, dev_t, kdev_t).
> > What devfs provides is (pathname), after opening (pathname, kdev_t).
> 
> Wait a second.  To hell with devfs, I'm talking about any synthetic
> tree containing device nodes.  Being forced to allocate a point in
> numeric namespace just to be able to associate a driver-created
> inode with (also driver-created) device...  Looks rather bogus, not
> to mention the ugliness of maintaining said numeric namespace.

Nod.

> > But I think the pointer kdev_t (or i_bcdev) must still be recomputed:
> > it remains true that modules can be unloaded.
> 
> Umm... So what? They can be unloaded only when we have device not opened.
> We might leave both allocation and freeing to driver-side code (and that
> includes grok_partitions()) and let the freeing code reset ->i_bdev and
> friends in all relevant inodes.

Yep. Having to allocate/search for a device structure during open(2)
is insane. It's wasteful. For ext2fs you can do it in lookup(), and
for devfs (or similar) you can do the ideal thing: when the device
entry is registered with the FS (i.e. only once).


				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] lazy allocation of struct block_device
  2001-09-02 11:38 ` Alexander Viro
@ 2001-09-02 12:49   ` Alan Cox
  0 siblings, 0 replies; 16+ messages in thread
From: Alan Cox @ 2001-09-02 12:49 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Andries.Brouwer, alan, linux-kernel, torvalds

> Not an option for any tree I'd ever run on my box and IIRC Linus was not
> likely to inflict that on his.  As for the glibc...
> 	a) not everything uses this piece of shit
> 	b) it would be really nice to get rid of it completely someday

In the userspace world the 64bit dev_t doesn't show up in anyones profiles
its an unusual opaque cookie compare, or a cast (and the cast case gcc
generates nice code for anyway). 

Alan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] lazy allocation of struct block_device
  2001-09-02 10:24 Andries.Brouwer
@ 2001-09-02 11:38 ` Alexander Viro
  2001-09-02 12:49   ` Alan Cox
  2001-09-02 15:38 ` Richard Gooch
  1 sibling, 1 reply; 16+ messages in thread
From: Alexander Viro @ 2001-09-02 11:38 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: alan, linux-kernel, torvalds



On Sun, 2 Sep 2001 Andries.Brouwer@cwi.nl wrote:

> That majors allocation is frozen is not my problem.
> I make life simple with a 64-bit dev_t. Everybody else already has the
> disadvantages of a 64-bit dev_t, since glibc moves 64 bits around,
> but not the advantage of a large namespace.

Not an option for any tree I'd ever run on my box and IIRC Linus was not
likely to inflict that on his.  As for the glibc...
	a) not everything uses this piece of shit
	b) it would be really nice to get rid of it completely someday

> But that is a separate discussion.

<nod>

> Concerning devfs, I don't use it and have not really thought about it.
> I think my point of view would be that devfs provides a different object.
> A device node in a filesystem is a pair (pathname, dev_t).
> Opening it gives the triple (pathname, dev_t, kdev_t).
> What devfs provides is (pathname), after opening (pathname, kdev_t).

Wait a second.  To hell with devfs, I'm talking about any synthetic tree
containing device nodes.  Being forced to allocate a point in numeric
namespace just to be able to associate a driver-created inode with
(also driver-created) device...  Looks rather bogus, not to mention the
ugliness of maintaining said numeric namespace.

> But I think the pointer kdev_t (or i_bcdev) must still be recomputed:
> it remains true that modules can be unloaded.

Umm... So what? They can be unloaded only when we have device not opened.
We might leave both allocation and freeing to driver-side code (and that
includes grok_partitions()) and let the freeing code reset ->i_bdev and
friends in all relevant inodes.
 
> Alternatives like putting these things in a list are much worse,
> since that would slow down the handling of all inodes, not only
> device nodes.

???
blkdev_open() would do list_add()
freeing foo_device would go through the list and reset ->i_bdev/do list_del()
for each element.

Where's the overhead for non-device inodes?


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] lazy allocation of struct block_device
@ 2001-09-02 10:24 Andries.Brouwer
  2001-09-02 11:38 ` Alexander Viro
  2001-09-02 15:38 ` Richard Gooch
  0 siblings, 2 replies; 16+ messages in thread
From: Andries.Brouwer @ 2001-09-02 10:24 UTC (permalink / raw)
  To: Andries.Brouwer, viro; +Cc: alan, linux-kernel, torvalds

    From viro@math.psu.edu Sun Sep  2 01:54:52 2001

    On Sat, 1 Sep 2001 Andries.Brouwer@cwi.nl wrote:

    > Since the refcount is for the device struct, we cannot do anything
    > until that count hits zero. Then the release method of the device struct
    > is called (which frees it, or decrements a refcount for the module).

    Wait a minute. You had suggested (upthread) that these objects are allocated
    by partition code and contain per-partition data. Freeing them once the device
    is closed looks very odd.

The partition data itself is in the gendisk chain.
A device struct contains a pointer to it.

But this is a separate discussion: what is the precise contents of
device struct and driver struct.

(I have used different versions. My main interest was always in
turning k[b]dev_t into a pointer and removing the arrays. Since some
of the arrays are 1-dimensional and some are 2-dimensional, the
mechanical conversion leads to a majorstruct that contains the integer
major and a name function and the contents of the 1-dimensional arrays,
and a minorstruct that contains the integer minor and the contents
of the 2-dimensional arrays and a pointer to the majorstruct.
People tend to panic "don't you know that there is no 1-1 corr.."
when they hear "majorstruct" so I started using "driverstruct".)

    > After this i_bcdev cannot be used anymore.
    > Since we don't know whether this happened already, i_bcdev must be
    > recomputed on each open or mount.

    ... and that means that we can't make devfs-like filesystems just set
    ->i_bdev (or ->i_cdev) at read_super() (or lookup()) and avoid all mess
    with major/minor allocation.  IMO that's unfortunate, especially since
    majors allocation is on the permanent freeze.

That majors allocation is frozen is not my problem.
I make life simple with a 64-bit dev_t. Everybody else already has the
disadvantages of a 64-bit dev_t, since glibc moves 64 bits around,
but not the advantage of a large namespace.
But that is a separate discussion.

Concerning devfs, I don't use it and have not really thought about it.
I think my point of view would be that devfs provides a different object.
A device node in a filesystem is a pair (pathname, dev_t).
Opening it gives the triple (pathname, dev_t, kdev_t).
What devfs provides is (pathname), after opening (pathname, kdev_t).
But I think the pointer kdev_t (or i_bcdev) must still be recomputed:
it remains true that modules can be unloaded.

    > (One might invent additional data structure to avoid this recomputation,
    > but data structures take memory and add the complication that they
    > must be kept consistent and up-to-date. Since mounting, or opening
    > a block device, are very infrequent operations, it does not matter
    > that we do a possibly superfluous bdopen().)

    Once you look at character devices (they have same set of problems)
    frequency goes up big way.

True. I still think there would be no problem - this reopen is not expensive.

Alternatives like putting these things in a list are much worse,
since that would slow down the handling of all inodes, not only
device nodes.

Andries

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] lazy allocation of struct block_device
  2001-09-01 20:42 Andries.Brouwer
  2001-09-01 21:26 ` Jamie Lokier
  2001-09-01 23:41 ` Anton Altaparmakov
@ 2001-09-01 23:54 ` Alexander Viro
  2 siblings, 0 replies; 16+ messages in thread
From: Alexander Viro @ 2001-09-01 23:54 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: alan, linux-kernel, torvalds



On Sat, 1 Sep 2001 Andries.Brouwer@cwi.nl wrote:

>     How do you tell when inode is not opened anymore?
> ...
>     When do you reset it [i_bcdev]?
> 
> Now I see what you are asking - at first I thought you wondered
> about the bookkeeping for the device struct, and there is no problem there,
> but you ask about the bookkeeping for i_bcdev.
> 
> Since the refcount is for the device struct, we cannot do anything
> until that count hits zero. Then the release method of the device struct
> is called (which frees it, or decrements a refcount for the module).

Wait a minute. You had suggested (upthread) that these objects are allocated
by partition code and contain per-partition data. Freeing them once the device
is closed looks very odd.

> After this i_bcdev cannot be used anymore.
> Since we don't know whether this happened already, i_bcdev must be
> recomputed on each open or mount.

... and that means that we can't make devfs-like filesystems just set
->i_bdev (or ->i_cdev) at read_super() (or lookup()) and avoid all mess
with major/minor allocation.  IMO that's unfortunate, especially since
majors allocation is on the permanent freeze.

> (One might invent additional data structure to avoid this recomputation,
> but data structures take memory and add the complication that they
> must be kept consistent and up-to-date. Since mounting, or opening
> a block device, are very infrequent operations, it does not matter
> that we do a possibly superfluous bdopen().)

Once you look at character devices (they have same set of problems)
frequency goes up big way.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] lazy allocation of struct block_device
  2001-09-01 20:42 Andries.Brouwer
  2001-09-01 21:26 ` Jamie Lokier
@ 2001-09-01 23:41 ` Anton Altaparmakov
  2001-09-01 23:54 ` Alexander Viro
  2 siblings, 0 replies; 16+ messages in thread
From: Anton Altaparmakov @ 2001-09-01 23:41 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Andries.Brouwer, viro, alan, linux-kernel, torvalds

At 22:26 01/09/2001, Jamie Lokier wrote:
>Andries.Brouwer@cwi.nl wrote:
> >[...]
> > However, a union is not so bad. It seems a pity to avoid unions
> > and waste 4 bytes for every inode with separate i_bdev and i_cdev
> > instead of a single i_bcdev.
>
>Please, a union of different pointer types is much nicer.  You can have
>i_bdev and i_cdev without wasting any bytes.
>
>This form works with GCC 2.96:
>
>                 union {
>                         struct char_device * i_cdev;
>                         struct block_device * i_bdev;
>                 };

It sure does. One of the nicest new gcc features IMHO!

>If you're using a really old compiler that doesn't support anonymous unions,
>(GCC 2.95 might be in this category, I'm not sure),

GCC 2.95 does NOT support this. I only found out after I had people 
complain to me that Linux-NTFS userspace tools would not compile for them 
and it turned out they were using gcc-2.95... and I 2.96. - I have since 
then verified this myself:

egcs and gcc up to 2.95 do not support unnamed structs/unions.

gcc-2.96 and gcc-3.0 support them fine.

>  then you'll need this:
>
>         #define i_bdev __i_bcdev_union.i_bdev
>         #define i_cdev __i_bcdev_union.i_cdev
>                 union {
>                         struct char_device * i_cdev;
>                         struct block_device * i_bdev;
>                 } __i_bcdev_union;

Neat trick! Thanks! I was wondering what to do with NTFS TNG driver (which 
uses unnamed structs/unions extensively) and this just might solve my 
problems without having to rewrite half the driver... (-;

Best regards,

         Anton


-- 
   "Nothing succeeds like success." - Alexandre Dumas
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Linux NTFS Maintainer / WWW: http://linux-ntfs.sf.net/
ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] lazy allocation of struct block_device
  2001-09-01 20:42 Andries.Brouwer
@ 2001-09-01 21:26 ` Jamie Lokier
  2001-09-01 23:41 ` Anton Altaparmakov
  2001-09-01 23:54 ` Alexander Viro
  2 siblings, 0 replies; 16+ messages in thread
From: Jamie Lokier @ 2001-09-01 21:26 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: viro, alan, linux-kernel, torvalds

Andries.Brouwer@cwi.nl wrote:
>     From viro@math.psu.edu Sat Sep  1 18:26:53 2001
>     > A kdev_t is a pointer to a struct that has the info now found in
>     > the arrays (and major, minor fields, and a name function..).
>     > This struct is allocated by the driver.
> 
>     Umm... Apply the arguments from the char_device thread - pointers to
>     unions are rather bad idea.  IOW, kdev_t must die - kernel always
>     knows which kind we are dealing with.
>[...]
> However, a union is not so bad. It seems a pity to avoid unions
> and waste 4 bytes for every inode with separate i_bdev and i_cdev
> instead of a single i_bcdev.

Please, a union of different pointer types is much nicer.  You can have
i_bdev and i_cdev without wasting any bytes.

This form works with GCC 2.96:

		union {
			struct char_device * i_cdev;
			struct block_device * i_bdev;
		};

If you're using a really old compiler that doesn't support anonymous unions,
(GCC 2.95 might be in this category, I'm not sure), then you'll need this:

	#define i_bdev __i_bcdev_union.i_bdev
	#define i_cdev __i_bcdev_union.i_cdev
		union {
			struct char_device * i_cdev;
			struct block_device * i_bdev;
		} __i_bcdev_union;

Either way, you avoid pointers to unions and you also avoid having a
named union type which contains pointers.

-- Jamie

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] lazy allocation of struct block_device
@ 2001-09-01 20:42 Andries.Brouwer
  2001-09-01 21:26 ` Jamie Lokier
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Andries.Brouwer @ 2001-09-01 20:42 UTC (permalink / raw)
  To: Andries.Brouwer, viro; +Cc: alan, linux-kernel, torvalds

    From viro@math.psu.edu Sat Sep  1 18:26:53 2001

    On Sat, 1 Sep 2001 Andries.Brouwer@cwi.nl wrote:

    > A kdev_t is a pointer to a struct that has the info now found in
    > the arrays (and major, minor fields, and a name function..).
    > This struct is allocated by the driver.

    Umm... Apply the arguments from the char_device thread - pointers to
    unions are rather bad idea.  IOW, kdev_t must die - kernel always
    knows which kind we are dealing with.

I don't mind.
Nothing really changes if you split it into kbdev_t and kcdev_t.

(Splitting is a lot of work, a large source edit,
but basically straightforward. I did similar things a few times,
splitting MKDEV into MKBDEV and MKCDEV [and MKXDEV].
However, a union is not so bad. It seems a pity to avoid unions
and waste 4 bytes for every inode with separate i_bdev and i_cdev
instead of a single i_bcdev.
Anyway, this has nothing to do with the present discussion.)

...
    > This "a refcount" is the openct field of the device struct,
    > somewhat like the present bd_openers.
    > 
    > The decrements of the refcount are done in kill_super() for s_dev
    > and at the close/umount corresponding to the open/mount that set it for i_bcdev.

    ??? So you decrement twice on umount?

Yes, and increment twice on mount.
That may be easiest since s_dev can also come from get_unnamed_dev().

...
    How do you tell when inode is not opened anymore?
...
    When do you reset it [i_bcdev]?

Now I see what you are asking - at first I thought you wondered
about the bookkeeping for the device struct, and there is no problem there,
but you ask about the bookkeeping for i_bcdev.

Since the refcount is for the device struct, we cannot do anything
until that count hits zero. Then the release method of the device struct
is called (which frees it, or decrements a refcount for the module).
After this i_bcdev cannot be used anymore.
Since we don't know whether this happened already, i_bcdev must be
recomputed on each open or mount.

(One might invent additional data structure to avoid this recomputation,
but data structures take memory and add the complication that they
must be kept consistent and up-to-date. Since mounting, or opening
a block device, are very infrequent operations, it does not matter
that we do a possibly superfluous bdopen().)

Andries

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] lazy allocation of struct block_device
  2001-09-01 13:30 Andries.Brouwer
@ 2001-09-01 16:26 ` Alexander Viro
  0 siblings, 0 replies; 16+ messages in thread
From: Alexander Viro @ 2001-09-01 16:26 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: torvalds, alan, linux-kernel



On Sat, 1 Sep 2001 Andries.Brouwer@cwi.nl wrote:

> A kdev_t is a pointer to a struct that has the info now found in
> the arrays (and major, minor fields, and a name function..).
> This struct is allocated by the driver.
> Maybe it is statically compiled in, and no refcounting is needed.
> Maybe it is a static struct in the driver which is a module.
> Now refcounting is needed so that we can refuse to unload the driver
> when the count is nonzero.
> Maybe the struct was allocated dynamically, and we need a refcount
> to be able to free it again.
> Only the driver knows such details.

Umm... Apply the arguments from the char_device thread - pointers to
unions are rather bad idea.  IOW, kdev_t must die - kernel always
knows which kind we are dealing with.

> To be more precise, I usually used two levels: driverstruct and
> devicestruct, where a kdev_t is a pointer to a devicestruct
> and the devicestruct contains a pointer to the driverstruct.
> In the block device case, the handling of devicestructs is the
> task of the partitioning code. Of course the driverstructs belong
> to the driver.
> 
> An inode has fields kdev_t i_dev and dev_t i_rdev and kdev_t i_bcdev
> where the last two are significant only for devices, and the last one
> only for opened devices. It is the opened version of i_rdev, and
> significant whenever non-NULL.

How do you tell when inode is not opened anymore? Notice that counting
openers of _device_ is wrong answer - several inodes can have the same
major:minor.  By the time when the last one is closed you may have a
lot of trouble finding other ones...

> Concerning refcounting:
> i_dev comes from s_dev and no refcount is required as long as sb exists
> s_dev comes from get_unnamed_dev() or ROOT_DEV or i_bcdev
> and a refcount must be incremented when it is set
> i_bcdev comes from opening i_rdev and a refcount must be incremented
> when it is set.

When do you reset it?

> This "a refcount" is the openct field of the device struct,
> somewhat like the present bd_openers.
> 
> The decrements of the refcount are done in kill_super() for s_dev
> and at the close/umount corresponding to the open/mount that set it for i_bcdev.

??? So you decrement twice on umount?


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] lazy allocation of struct block_device
@ 2001-09-01 13:30 Andries.Brouwer
  2001-09-01 16:26 ` Alexander Viro
  0 siblings, 1 reply; 16+ messages in thread
From: Andries.Brouwer @ 2001-09-01 13:30 UTC (permalink / raw)
  To: torvalds, viro; +Cc: alan, linux-kernel

(This sounds like a fragment in a discussion where I have not seen
the previous fragment.)

        Linus, I've looked into that (allocating ->i_bdev upon open()).
    There are several problems with this approach and none of the solutions I
    can see looks like a clear winner.

    1) when do we drop ->i_bdev?
    2) how should we change the refcounting on struct block_device?
    3) what to do with beasts that don't have major number and just set
    ->i_bdev when they allocate an inode?

(I'll use my own terminology since the above questions about i_bdev
only make sense if you first define the precise intended function of i_bdev.
Probably you'll be able to translate, or tell me what point I should address.)

A kdev_t is a pointer to a struct that has the info now found in
the arrays (and major, minor fields, and a name function..).
This struct is allocated by the driver.
Maybe it is statically compiled in, and no refcounting is needed.
Maybe it is a static struct in the driver which is a module.
Now refcounting is needed so that we can refuse to unload the driver
when the count is nonzero.
Maybe the struct was allocated dynamically, and we need a refcount
to be able to free it again.
Only the driver knows such details.

To be more precise, I usually used two levels: driverstruct and
devicestruct, where a kdev_t is a pointer to a devicestruct
and the devicestruct contains a pointer to the driverstruct.
In the block device case, the handling of devicestructs is the
task of the partitioning code. Of course the driverstructs belong
to the driver.

An inode has fields kdev_t i_dev and dev_t i_rdev and kdev_t i_bcdev
where the last two are significant only for devices, and the last one
only for opened devices. It is the opened version of i_rdev, and
significant whenever non-NULL.
(It was a mistake of mine to make i_rdev a kdev_t: device nodes
can just contain random numbers. The current code contains the same
mistake when the mknod() code does init_special_inode(inode, mode, rdev);
One should first start worrying about rdev when the thing is opened.)

Concerning refcounting:
i_dev comes from s_dev and no refcount is required as long as sb exists
s_dev comes from get_unnamed_dev() or ROOT_DEV or i_bcdev
and a refcount must be incremented when it is set
i_bcdev comes from opening i_rdev and a refcount must be incremented
when it is set.
This "a refcount" is the openct field of the device struct,
somewhat like the present bd_openers.

The decrements of the refcount are done in kill_super() for s_dev
and at the close/umount corresponding to the open/mount that set it for i_bcdev.

Andries

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2001-09-02 20:06 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-08-31  4:43 [RFC] lazy allocation of struct block_device Alexander Viro
2001-09-01 13:30 Andries.Brouwer
2001-09-01 16:26 ` Alexander Viro
2001-09-01 20:42 Andries.Brouwer
2001-09-01 21:26 ` Jamie Lokier
2001-09-01 23:41 ` Anton Altaparmakov
2001-09-01 23:54 ` Alexander Viro
2001-09-02 10:24 Andries.Brouwer
2001-09-02 11:38 ` Alexander Viro
2001-09-02 12:49   ` Alan Cox
2001-09-02 15:38 ` Richard Gooch
2001-09-02 16:07   ` Alexander Viro
2001-09-02 16:16   ` Richard Gooch
2001-09-02 17:25 Andries.Brouwer
2001-09-02 18:46 ` Alexander Viro
2001-09-02 20:05 Andries.Brouwer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).