linux-kernel.vger.kernel.org archive mirror
* LANANA: To Pending Device Number Registrants
@ 2001-05-14 19:19 H. Peter Anvin
  2001-05-14 19:36 ` Jeff Garzik
                   ` (3 more replies)
  0 siblings, 4 replies; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-14 19:19 UTC (permalink / raw)
  To: Linux Kernel Mailing List

First of all, I apologize for not having sent this notice out sooner. 
This kind of writing is very painful to deal with.

Linus Torvalds has requested a moratorium on new device number
assignments. His hope is that a new and better method for device space
handling will emerge as a result.

Alan Cox has requested that I maintain a forked registry for his -ac
kernel patch tree.  I have agreed to do so once I have forked off the
"final" version of the registry for Linus' tree.  At that time I will
process the backlog for the benefit of the -ac registry only.  Please
have patience until I can get that to happen.

Please note that this is not my decision (in fact, I have serious
concerns with it.)  In particular, /dev namespace coordination still
applies.

Sincerely,

	H. Peter Anvin
	The Linux Assigned Names and Numbers Authority

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 19:19 LANANA: To Pending Device Number Registrants H. Peter Anvin
@ 2001-05-14 19:36 ` Jeff Garzik
  2001-05-14 19:57   ` H. Peter Anvin
  2001-05-14 20:09 ` Richard Gooch
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 317+ messages in thread
From: Jeff Garzik @ 2001-05-14 19:36 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Linux Kernel Mailing List, Linus Torvalds, viro

"H. Peter Anvin" wrote:
> Linus Torvalds has requested a moratorium on new device number
> assignments. His hope is that a new and better method for device space
> handling will emerge as a result.

Here's my suggestion for a solution.

Once I work through a bunch of net driver problems, I want to release a
snapshot block device driver (freezes a blkdev in time).  For this, I
needed a block major.  After hearing about the device number freeze, I
was wondering if this solution works:

Register block device using existing API, and obtain a dynamically
assigned major number.  Export a tiny ramfs which lists all device
nodes.  Mounted on /dev/snap, /dev/snap/0 would be the first blkdev for
snap's dynamically assigned major.  (Al Viro said he has skeleton code
to create such an fs, IIRC)

This solution
(a) keeps from grot-ing up /proc even more [I had considered
proc_mknod() until viro talked me out of it]
(b) does not require centrally assigned majors and minors.
(c) does not require devfs.  most distros ship without it afaik, and
switching to it is not an overnight process, and requires devfsd to be
useful in the real world.

-- 
Jeff Garzik      | Game called on account of naked chick
Building 1024    |
MandrakeSoft     |


* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 19:36 ` Jeff Garzik
@ 2001-05-14 19:57   ` H. Peter Anvin
  2001-05-14 20:04     ` Jeff Garzik
                       ` (2 more replies)
  0 siblings, 3 replies; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-14 19:57 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Linux Kernel Mailing List, Linus Torvalds, viro

Jeff Garzik wrote:
> 
> "H. Peter Anvin" wrote:
> > Linus Torvalds has requested a moratorium on new device number
> > assignments. His hope is that a new and better method for device space
> > handling will emerge as a result.
> 
> Here's my suggestion for a solution.
> 
> Once I work through a bunch of net driver problems, I want to release a
> snapshot block device driver (freezes a blkdev in time).  For this, I
> needed a block major.  After hearing about the device number freeze, I
> was wondering if this solution works:
> 
> Register block device using existing API, and obtain a dynamically
> assigned major number.  Export a tiny ramfs which lists all device
> nodes.  Mounted on /dev/snap, /dev/snap/0 would be the first blkdev for
> snap's dynamically assigned major.  (Al Viro said he has skeleton code
> to create such an fs, IIRC)
> 
> This solution
> (a) keeps from grot-ing up /proc even more [I had considered
> proc_mknod() until viro talked me out of it]
> (b) does not require centrally assigned majors and minors.
> (c) does not require devfs.  most distros ship without it afaik, and
> switching to it is not an overnight process, and requires devfsd to be
> useful in the real world.
> 

It does, however, not manage permissions, nor does it provide for a sane
namespace (it exposes too many internal implementation details in the
interface -- in particular, the driver becomes part of the namespace, and
devices move around between drivers regularly.)

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt


* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 19:57   ` H. Peter Anvin
@ 2001-05-14 20:04     ` Jeff Garzik
  2001-05-14 20:09     ` Alan Cox
  2001-05-14 23:32     ` Richard Gooch
  2 siblings, 0 replies; 317+ messages in thread
From: Jeff Garzik @ 2001-05-14 20:04 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Linux Kernel Mailing List, Linus Torvalds, viro

"H. Peter Anvin" wrote:
> Jeff Garzik wrote:
> > Register block device using existing API, and obtain a dynamically
> > assigned major number.  Export a tiny ramfs which lists all device
> > nodes.  Mounted on /dev/snap, /dev/snap/0 would be the first blkdev for
> > snap's dynamically assigned major.  (Al Viro said he has skeleton code
> > to create such an fs, IIRC)

> It does, however, not manage permissions, nor does it provide for a sane
> namespace (it exposes too many internal implementation details in the
> interface -- in particular, the driver becomes part of the namespace, and
> devices move around between drivers regularly.)

True -- that capability is provided by devfs+devfsd, which is supported
in the driver by using the existing 2.4 devfs hooks.  For that case one
would not mount the driver's devfs.

I look at it as an effective transition mechanism.  Device numbers are
frozen now, but devfs is not deployed now.  My solution gets us from
point A to point B, and is IMHO workable in the stable 2.4 series.  We
can go 100% devfs or whatever in 2.5 or 3.0.

Regards,

	Jeff


-- 
Jeff Garzik      | Game called on account of naked chick
Building 1024    |
MandrakeSoft     |


* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 19:19 LANANA: To Pending Device Number Registrants H. Peter Anvin
  2001-05-14 19:36 ` Jeff Garzik
@ 2001-05-14 20:09 ` Richard Gooch
  2001-05-14 20:14   ` Jeff Garzik
  2001-05-15 17:37 ` Pavel Machek
  2001-05-16 15:58 ` Kurt Garloff
  3 siblings, 1 reply; 317+ messages in thread
From: Richard Gooch @ 2001-05-14 20:09 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: H. Peter Anvin, Linux Kernel Mailing List, Linus Torvalds, viro

Jeff Garzik writes:
> "H. Peter Anvin" wrote:
> > Linus Torvalds has requested a moratorium on new device number
> > assignments. His hope is that a new and better method for device space
> > handling will emerge as a result.
> 
> Here's my suggestion for a solution.
> 
> Once I work through a bunch of net driver problems, I want to release a
> snapshot block device driver (freezes a blkdev in time).  For this, I
> needed a block major.  After hearing about the device number freeze, I
> was wondering if this solution works:
> 
> Register block device using existing API, and obtain a dynamically
> assigned major number.  Export a tiny ramfs which lists all device
> nodes.  Mounted on /dev/snap, /dev/snap/0 would be the first blkdev for
> snap's dynamically assigned major.  (Al Viro said he has skeleton code
> to create such an fs, IIRC)
> 
> This solution
> (a) keeps from grot-ing up /proc even more [I had considered
> proc_mknod() until viro talked me out of it]
> (b) does not require centrally assigned majors and minors.
> (c) does not require devfs.  most distros ship without it afaik, and
> switching to it is not an overnight process, and requires devfsd to be
> useful in the real world.

So we add yet another series of hacks to avoid doing what's
necessary?!?

BTW: I once made a patch that put back in the compatibility device
names in the kernel, so you don't need to run devfsd for this.
Obviously, that's not a patch that Linus would want in his kernel
(otherwise he wouldn't have made me take them out in the first place),
but it is something vendors can add in their patchsets (does anybody
ship a virgin kernel?).

This patch is very small and clean. It touches two places in
fs/devfs/base.c and creates one new file in fs/devfs.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca


* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 19:57   ` H. Peter Anvin
  2001-05-14 20:04     ` Jeff Garzik
@ 2001-05-14 20:09     ` Alan Cox
  2001-05-14 20:24       ` Jeff Garzik
  2001-05-14 23:32     ` Richard Gooch
  2 siblings, 1 reply; 317+ messages in thread
From: Alan Cox @ 2001-05-14 20:09 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jeff Garzik, Linux Kernel Mailing List, Linus Torvalds, viro

> > (c) does not require devfs.  most distros ship without it afaik, and
> > switching to it is not an overnight process, and requires devfsd to be
> > useful in the real world.
> > 
> 
> It does, however, not manage permissions, nor does it provide for a sane
> namespace (it exposes too many internal implementation details in the
> interface -- in particular, the driver becomes part of the namespace, and
> devices move around between drivers regularly.)

It is also very hard to tar that device file.

As to devfsd, well, Al Viro was reporting races in it long ago that I don't
believe Richard has had time to fix, nor has anyone else fixed them.

What is the state of devfs there?



* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 20:09 ` Richard Gooch
@ 2001-05-14 20:14   ` Jeff Garzik
  0 siblings, 0 replies; 317+ messages in thread
From: Jeff Garzik @ 2001-05-14 20:14 UTC (permalink / raw)
  To: Richard Gooch
  Cc: H. Peter Anvin, Linux Kernel Mailing List, Linus Torvalds, viro

Richard Gooch wrote:
> So we add yet another series of hacks to avoid doing what's
> necessary?!?

We cannot change the world in a day.  :)

"Doing what's necessary" is way beyond the scope of 2.4, IMHO.

-- 
Jeff Garzik      | Game called on account of naked chick
Building 1024    |
MandrakeSoft     |


* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 20:09     ` Alan Cox
@ 2001-05-14 20:24       ` Jeff Garzik
  2001-05-14 20:27         ` H. Peter Anvin
                           ` (2 more replies)
  0 siblings, 3 replies; 317+ messages in thread
From: Jeff Garzik @ 2001-05-14 20:24 UTC (permalink / raw)
  To: Alan Cox; +Cc: H. Peter Anvin, Linux Kernel Mailing List, Linus Torvalds, viro

Note also that persistence of permissions and hardcoded in-kernel naming
is a problem throughout proc...  It's not unique to in-driver
filesystems.
-- 
Jeff Garzik      | Game called on account of naked chick
Building 1024    |
MandrakeSoft     |


* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 20:24       ` Jeff Garzik
@ 2001-05-14 20:27         ` H. Peter Anvin
  2001-05-14 22:21           ` Alan Cox
  2001-05-14 20:29         ` Linus Torvalds
  2001-05-14 21:18         ` Alan Cox
  2 siblings, 1 reply; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-14 20:27 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Alan Cox, Linux Kernel Mailing List, Linus Torvalds, viro

Jeff Garzik wrote:
> 
> Note also that persistence of permissions and hardcoded in-kernel naming
> is a problem throughout proc...  It's not unique to in-driver
> filesystems.
>

It's not so much about hardcoding the names as hardcoding the *STRUCTURE*
of the names.  For example, the current devfs has /dev/misc/* which is
completely bogus -- it exposes an implementation detail (using the
miscdev API as opposed to the charmajor API) which should be hidden; in
fact a number of drivers have started their lives as miscdev devices and
changed over time.

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt


* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 20:24       ` Jeff Garzik
  2001-05-14 20:27         ` H. Peter Anvin
@ 2001-05-14 20:29         ` Linus Torvalds
  2001-05-14 20:55           ` Neil Brown
                             ` (4 more replies)
  2001-05-14 21:18         ` Alan Cox
  2 siblings, 5 replies; 317+ messages in thread
From: Linus Torvalds @ 2001-05-14 20:29 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Alan Cox, H. Peter Anvin, Linux Kernel Mailing List, viro



On Mon, 14 May 2001, Jeff Garzik wrote:
>
> Note also that persistence of permissions and hardcoded in-kernel naming
> is a problem throughout proc...  It's not unique to in-driver
> filesystems.

Also note how a 32-bit (or 64-bit) dev_t does NOT make it any easier to
manage permissions or anything like that anyway. Look at the current mess
/dev is. Imagine it an order of magnitude worse.

Big device numbers are _not_ a solution. I will accept a 32-bit one, but
no more, and I will _not_ accept a "manage by hand" approach any more. The
time has long since come to say "No". Which I've done. If you can't make
it manage the thing automatically with a script, you won't get a hardcoded
major device number just because you're lazy.

End of discussion.

		Linus
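
For reference, a small sketch of the arithmetic behind the dev_t sizes
discussed above. It assumes the classic 16-bit layout (major in the high
8 bits, minor in the low 8 bits). The function names and the minor_bits
parameter are illustrative only, and the thread does not say how a 32-bit
dev_t would be split between major and minor.

```python
def make_dev(major, minor, minor_bits=8):
    """Pack a major/minor pair into a single device number."""
    if major < 0:
        raise ValueError("major must be non-negative")
    if not 0 <= minor < (1 << minor_bits):
        raise ValueError("minor does not fit in minor_bits")
    return (major << minor_bits) | minor


def split_dev(dev, minor_bits=8):
    """Unpack a device number into a (major, minor) tuple."""
    return dev >> minor_bits, dev & ((1 << minor_bits) - 1)


# /dev/sda1 is conventionally block major 8, minor 1.
sda1 = make_dev(8, 1)
print(hex(sda1), split_dev(sda1))
```

With 8 bits on each side there are at most 256 majors and 256 minors per
major, which is the scarcity the registry was trying to manage by hand.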



* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 20:29         ` Linus Torvalds
@ 2001-05-14 20:55           ` Neil Brown
  2001-05-14 21:20             ` Alan Cox
                               ` (2 more replies)
  2001-05-14 21:09           ` Andi Kleen
                             ` (3 subsequent siblings)
  4 siblings, 3 replies; 317+ messages in thread
From: Neil Brown @ 2001-05-14 20:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff Garzik, Alan Cox, H. Peter Anvin, Linux Kernel Mailing List, viro

On Monday May 14, torvalds@transmeta.com wrote:
> 
> End of discussion.
> 
> 		Linus
> 

...and start of education please...

I want to create a new block device - it is a different interface to
the software-raid code that allows the arrays to be partitioned using
normal partition tables.

So I need a major number - to give to devfs_register_blkdev at least.
You don't want me to have a hardcoded one (which is fine) so I need a
dynamically allocated one - yes?

This means that we need some analogue to {get,put}_unnamed_dev that
manages a range of dynamically allocated majors.
Is there such a beast already, or does someone need to write it?
What range(s) should be used for block devices? 

Am I missing something obvious here?

NeilBrown


* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 20:29         ` Linus Torvalds
  2001-05-14 20:55           ` Neil Brown
@ 2001-05-14 21:09           ` Andi Kleen
  2001-05-14 21:11           ` Rik van Riel
                             ` (2 subsequent siblings)
  4 siblings, 0 replies; 317+ messages in thread
From: Andi Kleen @ 2001-05-14 21:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff Garzik, Alan Cox, H. Peter Anvin, Linux Kernel Mailing List, viro

On Mon, May 14, 2001 at 01:29:51PM -0700, Linus Torvalds wrote:
> Big device numbers are _not_ a solution. I will accept a 32-bit one, but
> no more, and I will _not_ accept a "manage by hand" approach any more. The
> time has long since come to say "No". Which I've done. If you can't make
> it manage the thing automatically with a script, you won't get a hardcoded
> major device number just because you're lazy.

As far as I can see it just needs a /proc/devices that also outputs
minor ranges with names, and a small program similar to scsidev to 
generate nodes in /dev based on that on the fly on early bootup.

Is that what you have in mind?

-Andi
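
A rough sketch of the kind of boot-time script described above. It parses
text in the existing /proc/devices format (section header, then
"major name" lines) and prints the mknod commands it would run. The minor
ranges Andi mentions are not part of that format, so the node count is
passed in by hand here. SAMPLE, the "snap" entry and its major 254 are
made-up illustration data, not real assignments.

```python
SAMPLE = """Character devices:
  1 mem
 10 misc

Block devices:
  8 sd
254 snap
"""


def parse_proc_devices(text):
    """Return {"char": {name: [majors]}, "block": {name: [majors]}}."""
    tables = {"char": {}, "block": {}}
    section = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("Character devices"):
            section = "char"
        elif line.startswith("Block devices"):
            section = "block"
        elif section is not None:
            major, name = line.split(None, 1)
            # One driver name may own several majors, so keep a list.
            tables[section].setdefault(name, []).append(int(major))
    return tables


def plan_nodes(tables, kind, driver, count, directory="/dev"):
    """Build mknod command lines for minors 0..count-1 of a driver."""
    major = tables[kind][driver][0]
    flag = "b" if kind == "block" else "c"
    return [
        f"mknod {directory}/{driver}{minor} {flag} {major} {minor}"
        for minor in range(count)
    ]


for command in plan_nodes(parse_proc_devices(SAMPLE), "block", "snap", 2):
    print(command)
```

Because the major is looked up by name on every boot, nothing in the script
depends on the number staying the same between boots. Permissions and
ownership would still have to come from somewhere else.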


* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 20:29         ` Linus Torvalds
  2001-05-14 20:55           ` Neil Brown
  2001-05-14 21:09           ` Andi Kleen
@ 2001-05-14 21:11           ` Rik van Riel
  2001-05-14 21:23             ` Alan Cox
  2001-05-14 21:16           ` Alan Cox
  2001-05-14 23:34           ` LANANA: To Pending Device Number Registrants Richard Gooch
  4 siblings, 1 reply; 317+ messages in thread
From: Rik van Riel @ 2001-05-14 21:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff Garzik, Alan Cox, H. Peter Anvin, Linux Kernel Mailing List, viro

On Mon, 14 May 2001, Linus Torvalds wrote:

> End of discussion.

I've been doubting whether to work on both the -ac kernels
and the -linus tree, but this is a pretty good argument for
sticking with -ac and just ignoring the -linus tree...

Let's see what happens...

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/



* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 20:29         ` Linus Torvalds
                             ` (2 preceding siblings ...)
  2001-05-14 21:11           ` Rik van Riel
@ 2001-05-14 21:16           ` Alan Cox
  2001-05-14 22:05             ` Alexander Viro
  2001-05-14 23:34           ` LANANA: To Pending Device Number Registrants Richard Gooch
  4 siblings, 1 reply; 317+ messages in thread
From: Alan Cox @ 2001-05-14 21:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff Garzik, Alan Cox, H. Peter Anvin, Linux Kernel Mailing List, viro

> Big device numbers are _not_ a solution. I will accept a 32-bit one, but
> no more, and I will _not_ accept a "manage by hand" approach any more. The
> time has long since come to say "No". Which I've done. If you can't make
> it manage the thing automatically with a script, you won't get a hardcoded
> major device number just because you're lazy.

And on that issue I'm so convinced you are wrong I'm prepared to maintain
sensible Unix device behaviour in the -ac pretty much indefinitely.

> End of discussion.

And that is precisely why ....


Abstract device file systems are beautiful concepts but they don't solve
the device name space problem and they introduce hideous incompatibilities
with existing software. Plan 9 is beautiful. It has a userbase approximately
the size of Linux 0.12 - because it is not compatible.

Alan



* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 20:24       ` Jeff Garzik
  2001-05-14 20:27         ` H. Peter Anvin
  2001-05-14 20:29         ` Linus Torvalds
@ 2001-05-14 21:18         ` Alan Cox
  2 siblings, 0 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-14 21:18 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Alan Cox, H. Peter Anvin, Linux Kernel Mailing List,
	Linus Torvalds, viro

> Note also that persistence of permissions and hardcoded in-kernel naming
> is a problem throughout proc...  It's not unique to in-driver
> filesystems.

And the /proc namespace is a walking testimony to why numbers are not the
primary problem in /dev space and tidiness.


* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 20:55           ` Neil Brown
@ 2001-05-14 21:20             ` Alan Cox
  2001-05-14 21:37               ` Neil Brown
  2001-05-14 21:24             ` Jeff Garzik
  2001-05-15  6:41             ` Linus Torvalds
  2 siblings, 1 reply; 317+ messages in thread
From: Alan Cox @ 2001-05-14 21:20 UTC (permalink / raw)
  To: Neil Brown
  Cc: Linus Torvalds, Jeff Garzik, Alan Cox, H. Peter Anvin,
	Linux Kernel Mailing List, viro

> This means that we need some analogue to {get,put}_unnamed_dev that
> manages a range of dynamically allocated majors.
> Is there such a beast already, or does someone need to write it?
> What range(s) should be used for block devices? 
> 
> Am I missing something obvious here?

Obvious question: Do you need your majors to be together in order, or can
you pick 8 random numbers each boot and expect the user to cope ?

Equally if they were static numbers do they have to be together or scattered ?



* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 21:11           ` Rik van Riel
@ 2001-05-14 21:23             ` Alan Cox
  2001-05-15  0:33               ` Rik van Riel
  0 siblings, 1 reply; 317+ messages in thread
From: Alan Cox @ 2001-05-14 21:23 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Linus Torvalds, Jeff Garzik, Alan Cox, H. Peter Anvin,
	Linux Kernel Mailing List, viro

> I've been doubting whether to work on both the -ac kernels
> and the -linus tree, but this is a pretty good argument for
> sticking with -ac and just ignoring the -linus tree...

Time will make that decision. Linus kindly gave us all the power to vote with
our feet. One thing I absolutely refuse to do is to let a disagreement over
some specific device implementation turn into an excuse for a wider difference
in the trees.

So yes -ac might have static majors but the rest of it I intend to keep merging
with Linus and tracking closely to his tree. Certainly not ignoring the -linus
tree. 

Alan



* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 20:55           ` Neil Brown
  2001-05-14 21:20             ` Alan Cox
@ 2001-05-14 21:24             ` Jeff Garzik
  2001-05-14 21:33               ` Neil Brown
  2001-05-15  6:41             ` Linus Torvalds
  2 siblings, 1 reply; 317+ messages in thread
From: Jeff Garzik @ 2001-05-14 21:24 UTC (permalink / raw)
  To: Neil Brown
  Cc: Linus Torvalds, Alan Cox, H. Peter Anvin,
	Linux Kernel Mailing List, viro

Neil Brown wrote:
> So I need a major number - to give to devfs_register_blkdev at least.
> You don't want me to have a hardcoded one (which is fine) so I need a
> dynamically allocated one - yes?
> 
> This means that we need some analogue to {get,put}_unnamed_dev that
> manages a range of dynamically allocated majors.
> Is there such a beast already, or does someone need to write it?
> What range(s) should be used for block devices?

register_blkdev will assign a dynamic major to your block device, if a
static one is not provided.  This has been true since 2.2, maybe 2.0
IIRC.

-- 
Jeff Garzik      | Game called on account of naked chick
Building 1024    |
MandrakeSoft     |


* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 21:24             ` Jeff Garzik
@ 2001-05-14 21:33               ` Neil Brown
  0 siblings, 0 replies; 317+ messages in thread
From: Neil Brown @ 2001-05-14 21:33 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Neil Brown, Linus Torvalds, Alan Cox, H. Peter Anvin,
	Linux Kernel Mailing List, viro

On Monday May 14, jgarzik@mandrakesoft.com wrote:
> Neil Brown wrote:
> > So I need a major number - to give to devfs_register_blkdev at least.
> > You don't want me to have a hardcoded one (which is fine) so I need a
> > dynamically allocated one - yes?
> > 
> > This means that we need some analogue to {get,put}_unnamed_dev that
> > manages a range of dynamically allocated majors.
> > Is there such a beast already, or does someone need to write it?
> > What range(s) should be used for block devices?
> 
> register_blkdev will assign a dynamic major to your block device, if a
> static one is not provided.  This has been true since 2.2, maybe 2.0
> IIRC.

Oh, yes.  So it does.  Give it '0' and it will choose one.  I went
looking for this functionality once and just couldn't find it.... must
be blind :-)

Thanks,
NeilBrown
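
A toy userspace model of the behaviour described above: passing 0 as the
major asks the table to pick a free one. This is not kernel code. The
MajorTable class, the top-down scan, the table size of 255 and the return
conventions are assumptions made for illustration, and "mdp" is a
hypothetical driver name.

```python
MAX_BLKDEV = 255  # assumed table size for this sketch


class MajorTable:
    """Tracks which driver name owns which block major."""

    def __init__(self):
        self.owners = {}

    def register(self, major, name):
        """Register name on a major; major 0 means "choose one for me".

        Returns the chosen major for a dynamic request and 0 for a
        successful static request.
        """
        if major == 0:
            # Scan downward so dynamic majors stay clear of the low,
            # statically assigned numbers.
            for candidate in range(MAX_BLKDEV - 1, 0, -1):
                if candidate not in self.owners:
                    self.owners[candidate] = name
                    return candidate
            raise RuntimeError("no free major left")
        if major in self.owners:
            raise ValueError(f"major {major} already owned by {self.owners[major]}")
        self.owners[major] = name
        return 0


table = MajorTable()
table.register(8, "sd")               # static request
print(table.register(0, "snap"))      # dynamic request
print(table.register(0, "mdp"))       # next dynamic request
```

The number handed back can differ from boot to boot depending on load
order, so userspace has to look it up (for example in /proc/devices)
rather than hardcode it.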


* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 21:20             ` Alan Cox
@ 2001-05-14 21:37               ` Neil Brown
  0 siblings, 0 replies; 317+ messages in thread
From: Neil Brown @ 2001-05-14 21:37 UTC (permalink / raw)
  To: Alan Cox
  Cc: Neil Brown, Linus Torvalds, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List, viro

On Monday May 14, alan@lxorguk.ukuu.org.uk wrote:
> > This means that we need some analogue to {get,put}_unnamed_dev that
> > manages a range of dynamically allocated majors.
> > Is there such a beast already, or does someone need to write it?
> > What range(s) should be used for block devices? 
> > 
> > Am I missing something obvious here?
> 
> Obvious question: Do you need your majors to be together in order, or can
> you pick 8 random numbers each boot and expect the user to cope ?
> 
> Equally if they were static numbers do they have to be together or scattered ?

I think you are assuming that I need multiple majors so that
potentially all 256 possible md devices can each be partitioned.  Is
that right?

I wasn't going to be that generous.  If you want to partition md
devices, you can only partition the first 16, into up to 15 partitions each.

If you have need for more arrays or more partitions than that, then
your problem is less like a peanut, and LVM begins to look less like a
sledgehammer.

NeilBrown
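
The limits quoted above fit exactly into one 8-bit minor space: 16 arrays
times 16 minors each (the whole array plus 15 partitions) is 256. A sketch
of one possible layout follows. The message gives only the limits, so the
sd-style encoding (array in the high 4 bits, partition in the low 4 bits,
partition 0 meaning the whole array) and the function names are
assumptions.

```python
PARTITION_BITS = 4   # 16 minors per array: whole device + 15 partitions
MAX_ARRAYS = 16      # 16 arrays * 16 minors = 256 minors in one major


def md_minor(array, partition=0):
    """Map (array index, partition number) to a minor number."""
    if not 0 <= array < MAX_ARRAYS:
        raise ValueError("only the first 16 arrays can be partitioned")
    if not 0 <= partition < (1 << PARTITION_BITS):
        raise ValueError("at most 15 partitions per array")
    return (array << PARTITION_BITS) | partition


def md_from_minor(minor):
    """Inverse mapping: minor number to (array index, partition number)."""
    return minor >> PARTITION_BITS, minor & ((1 << PARTITION_BITS) - 1)


print(md_minor(1, 3), md_from_minor(255))
```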


* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 21:16           ` Alan Cox
@ 2001-05-14 22:05             ` Alexander Viro
  2001-05-14 22:30               ` Alan Cox
  2001-05-14 23:01               ` Interrupted sound with 2.4.4-ac6 Hermann Himmelbauer
  0 siblings, 2 replies; 317+ messages in thread
From: Alexander Viro @ 2001-05-14 22:05 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, Jeff Garzik, H. Peter Anvin, Linux Kernel Mailing List



On Mon, 14 May 2001, Alan Cox wrote:

> Abstract device file systems are beautiful concepts but they don't solve
> the device name space problem and they introduce hideous incompatibilities
> with existing software. 

Let me get this straight. You are talking about software that would be
	a) device-specific,
	b) Linux-only,
	c) working with devices that do not exist in 2.4.

Would you mind demonstrating such wonder? Old devices are still there,
AFAICS. Ext2 (reiserfs, devfs, abortion-of-your-choice-fs) still has
the ability to create device nodes for them.



* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 20:27         ` H. Peter Anvin
@ 2001-05-14 22:21           ` Alan Cox
  2001-05-14 23:43             ` Jan Niehusmann
  0 siblings, 1 reply; 317+ messages in thread
From: Alan Cox @ 2001-05-14 22:21 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jeff Garzik, Alan Cox, Linux Kernel Mailing List, Linus Torvalds, viro

> It's not so much about hardcoding the names as hardcoding the *STRUCTURE*
> of the names.  For example, the current devfs has /dev/misc/* which is
> completely bogus -- it exposes an implementation detail (using the

The fact kernel space touches on naming directly is itself bogus. devfsd doing
it is nice, devfs doing naming - well it could be done.

> miscdev API as opposed to the charmajor API) which should be hidden; in
> fact a number of drivers have started their lives as miscdev devices and
> changed over time.

IMHO we have three types of namespaces:

1.	Kernel interface namespace. Policy set by the kernel. Mappings
	constant and well defined where the underlying objects are well
	defined

	- inodes, dev_t

	You can make it a string if you like but at the end of the day it
	has to be an opaque handle. For constant devices it also has to be
	a constant name. Otherwise the /dev file I archived with the corporate
	backup system turns out to be a different device when I restore the 
	box after a problem and I reformat the wrong disk...

	And yes some stuff really is dynamic. Trying to talk about USB
	devices in a constant way is kind of hard. 

	We could re-encode these as strings. In fact that's how AmigaOS and
	VMS seem to do it. But we have the standards that are based on
	numbers and people who do like to assume they have meaning (eg
	hdparm, some scsi tools, ...). At best the string is a variable length
	encoding of a cookie.

	Another real horror we have is trademarks. We've already had people
	force changes on the name of kernel modules by threatening/asking 
	because names have trademark value and they argue module names 
	could be confusing.

	*NOBODY* at the 2.5 kernel summit had an answer to that problem.

2.	User namespaces. Language dependent. Policy dependent.
	Can be dynamic, can be static, can be arbitrary. Not set by kernel
	policy

		/dev/foo

	Generally has to be centrally managed to put an order on the
	name spaces.

3.	Enumeration spaces

	Things like /dev/disc on devfs. Spaces that allow you to walk across
	all devices with a given property. A device can be in many

I don't care how #2 is implemented.  I care that I can implement #2 any way
I like. I care that I can implement #2 compatibly with my existing apps and
my existing backup system. I care that people who don't speak a word of English
can internationalise it.
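
The backup concern in point 1 can be seen directly in the tar format: a
device node is archived as a type plus a (major, minor) pair, not as "the
snapshot driver". A minimal sketch using Python's tarfile module follows.
The node name dev/snap0 and major 254 are made-up values for illustration.

```python
import io
import tarfile


def archive_block_node(name, major, minor):
    """Write one block-device entry into an in-memory tar archive."""
    info = tarfile.TarInfo(name=name)
    info.type = tarfile.BLKTYPE
    info.mode = 0o660
    info.devmajor = major
    info.devminor = minor
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.addfile(info)
    return buf.getvalue()


def read_back(data):
    """Return (name, major, minor) of the first archive member."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        member = tar.getmembers()[0]
        return member.name, member.devmajor, member.devminor


print(read_back(archive_block_node("dev/snap0", 254, 0)))
```

Restoring such an archive recreates a node with the recorded numbers. If
the driver was given a different major on the restored system, the node
refers to whatever device now owns the old number.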




* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 22:05             ` Alexander Viro
@ 2001-05-14 22:30               ` Alan Cox
  2001-05-14 22:48                 ` Alexander Viro
  2001-05-15  4:12                 ` LANANA: Getting out of hand? God
  2001-05-14 23:01               ` Interrupted sound with 2.4.4-ac6 Hermann Himmelbauer
  1 sibling, 2 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-14 22:30 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Alan Cox, Linus Torvalds, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List

> Would you mind demonstrating such wonder? Old devices are still there,
> AFAICS. Ext2 (reiserfs, devfs, abortion-of-your-choice-fs) still has
> the ability to create device nodes for them.

Except that Linus won't hand out major numbers, which means I can't even boot
simply off such a device. I bet the vendors in question don't think the sun
shines out of Linus' backside any more.



* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 22:48                 ` Alexander Viro
@ 2001-05-14 22:46                   ` Alan Cox
  2001-05-14 22:53                     ` Alexander Viro
  0 siblings, 1 reply; 317+ messages in thread
From: Alan Cox @ 2001-05-14 22:46 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Alan Cox, Linus Torvalds, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List

> > Except that Linus won't hand out major numbers, which means I can't even boot
> > simply off such a device. I bet the vendors in question don't think the sun
> > shines out of Linus' backside any more.
> 
> Not really. Special-casing for mounting root is trivially solvable. BTDT,
> and you've reviewed the patch.

And lilo ?



^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 22:30               ` Alan Cox
@ 2001-05-14 22:48                 ` Alexander Viro
  2001-05-14 22:46                   ` Alan Cox
  2001-05-15  4:12                 ` LANANA: Getting out of hand? God
  1 sibling, 1 reply; 317+ messages in thread
From: Alexander Viro @ 2001-05-14 22:48 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, Jeff Garzik, H. Peter Anvin, Linux Kernel Mailing List



On Mon, 14 May 2001, Alan Cox wrote:

> > Would you mind demonstrating such wonder? Old devices are still there,
> > AFAICS. Ext2 (reiserfs, devfs, abortion-of-your-choice-fs) still has
> > the ability to create device nodes for them.
> 
> Except that Linus wont hand out major numbers, which means I can't even boot
> simply off such a device. I bet the vendors in question dont think the sun
> shines out of linus backside any more.

Not really. Special-casing for mounting root is trivially solvable. BTDT,
and you've reviewed the patch.


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 22:46                   ` Alan Cox
@ 2001-05-14 22:53                     ` Alexander Viro
  2001-05-14 22:54                       ` H. Peter Anvin
  2001-05-14 22:55                       ` Alan Cox
  0 siblings, 2 replies; 317+ messages in thread
From: Alexander Viro @ 2001-05-14 22:53 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, Jeff Garzik, H. Peter Anvin, Linux Kernel Mailing List



On Mon, 14 May 2001, Alan Cox wrote:

> > > Except that Linus wont hand out major numbers, which means I can't even boot
> > > simply off such a device. I bet the vendors in question dont think the sun
> > > shines out of linus backside any more.
> > 
> > Not really. Special-casing for mounting root is trivially solvable. BTDT,
> > and you've reviewed the patch.
> 
> And lilo ?

LILO uses BIOS, for fsck sake. It couldn't care less for device numbers
on the kernel side. Ask Andries how much do they have in common with
BIOS drive numbers.


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 22:53                     ` Alexander Viro
@ 2001-05-14 22:54                       ` H. Peter Anvin
  2001-05-14 23:00                         ` Alexander Viro
                                           ` (2 more replies)
  2001-05-14 22:55                       ` Alan Cox
  1 sibling, 3 replies; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-14 22:54 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Alan Cox, Linus Torvalds, Jeff Garzik, Linux Kernel Mailing List

Alexander Viro wrote:
> 
> On Mon, 14 May 2001, Alan Cox wrote:
> 
> > > > Except that Linus wont hand out major numbers, which means I can't even boot
> > > > simply off such a device. I bet the vendors in question dont think the sun
> > > > shines out of linus backside any more.
> > >
> > > Not really. Special-casing for mounting root is trivially solvable. BTDT,
> > > and you've reviewed the patch.
> >
> > And lilo ?
> 
> LILO uses BIOS, for fsck sake. It couldn't care less for device numbers
> on the kernel side. Ask Andries how much do they have in common with
> BIOS drive numbers.
> 

That's not the issue.  LILO takes whatever you pass to root= and converts
it to a device number at /sbin/lilo time.  An idiotic practice on the
part of LILO, in my opinion, that ought to have been fixed a long time
ago.
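
As an illustration of why this matters, here is a minimal sketch of the kind of conversion described above. It assumes the traditional 16-bit device number with an 8-bit major and an 8-bit minor; the helper name `pack_legacy_dev` is hypothetical and is not taken from LILO's source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack a major/minor pair into the traditional
 * 16-bit device number (8-bit major in the high byte, 8-bit minor in
 * the low byte). */
static uint16_t pack_legacy_dev(unsigned int major, unsigned int minor)
{
    assert(major < 256 && minor < 256);  /* each field is only 8 bits wide */
    return (uint16_t)((major << 8) | minor);
}
```

With IDE on major 3, `pack_legacy_dev(3, 1)` yields 0x0301. Once only that number is recorded at install time, the name given to root= is no longer available at boot, so a kernel that assigns majors differently would have no way to recover which device was meant.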

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 22:53                     ` Alexander Viro
  2001-05-14 22:54                       ` H. Peter Anvin
@ 2001-05-14 22:55                       ` Alan Cox
  2001-05-14 23:11                         ` Dan Hollis
                                           ` (2 more replies)
  1 sibling, 3 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-14 22:55 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Alan Cox, Linus Torvalds, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List

> > And lilo ?
> 
> LILO uses BIOS, for fsck sake. It couldn't care less for device numbers
> on the kernel side. Ask Andries how much do they have in common with
> BIOS drive numbers.

grep MAJOR lilo-21.4.4/*|wc -l
    323

Also hdparm
raidtools
psmisc
mtools
mt-st
gpm
joystick

and that is a simple grep of BUILD/*/*.c on RH 7.0. Im not even looking deeper
into subdirectories or powertools or anything like a full debian archive




^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 23:00                         ` Alexander Viro
@ 2001-05-14 22:58                           ` Alan Cox
  2001-05-14 23:29                             ` Alexander Viro
                                               ` (2 more replies)
  2001-05-14 23:39                           ` LANANA: " Richard Gooch
  1 sibling, 3 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-14 22:58 UTC (permalink / raw)
  To: Alexander Viro
  Cc: H. Peter Anvin, Alan Cox, Linus Torvalds, Jeff Garzik,
	Linux Kernel Mailing List

> Oh, _that_ one. <shrug> pass rootname=driver!name (or whatever syntax
> you prefer) to the kernel and call do_mount() instead of sys_mknod() in
> prepare_namespace() (rootfs patch). BFD.

Yet another 2.5 project. If Linus wants to go play with name driven devices
and you want to help him great, but if he'd care to put out
linux-2.5.0.tar.gz _before_ starting that would be good for all of us



^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 22:54                       ` H. Peter Anvin
@ 2001-05-14 23:00                         ` Alexander Viro
  2001-05-14 22:58                           ` Alan Cox
  2001-05-14 23:39                           ` LANANA: " Richard Gooch
  2001-05-14 23:18                         ` Arjan van de Ven
  2001-05-15  5:56                         ` Oliver Neukum
  2 siblings, 2 replies; 317+ messages in thread
From: Alexander Viro @ 2001-05-14 23:00 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Alan Cox, Linus Torvalds, Jeff Garzik, Linux Kernel Mailing List



On Mon, 14 May 2001, H. Peter Anvin wrote:

> > LILO uses BIOS, for fsck sake. It couldn't care less for device numbers
> > on the kernel side. Ask Andries how much do they have in common with
> > BIOS drive numbers.
> > 
> 
> That's not the issue.  LILO takes whatever you pass to root= and converts
> it to a device number at /sbin/lilo time.  An idiotic practice on the
> part of LILO, in my opinion, that ought to have been fixed a long time
> ago.

Oh, _that_ one. <shrug> pass rootname=driver!name (or whatever syntax
you prefer) to the kernel and call do_mount() instead of sys_mknod() in
prepare_namespace() (rootfs patch). BFD.


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Interrupted sound with 2.4.4-ac6
  2001-05-14 22:05             ` Alexander Viro
  2001-05-14 22:30               ` Alan Cox
@ 2001-05-14 23:01               ` Hermann Himmelbauer
  1 sibling, 0 replies; 317+ messages in thread
From: Hermann Himmelbauer @ 2001-05-14 23:01 UTC (permalink / raw)
  To: Alexander Viro, Linux Kernel Mailing List

Hi,
I built a nice mp3 player out of an AMD 486-DX133 and a soundblaster
es1371. I always used 2.2.16 and it worked properly. Due to several
reasons I want to switch to 2.4, so I tried my luck with 2.4.4-ac6.

Basically it works, but the sound drops out (around 0.5 to 5 seconds of
silence) from time to time although the system is around 44% idle. This
never happened with 2.2.16. The dropouts are not periodic: sometimes
there are none for a minute, and sometimes a single dropout lasts around
5 seconds.

I have to state that the data comes from an nfs-mounted directory -
perhaps this is a reason?

Another interesting thing is that during those dropouts the processor
usage of mpg123 decreases from 53% to around 20%, so it looks as if
mpg123 can either not get data or not output data.

Do you have any clues? Are there perhaps some kernel parameters to tune
(buffer size, dma...)?

		Best Regards,
		Hermann



-- 
 ,_,
(O,O)     "There is more to life than increasing its speed."
(   )     -- Gandhi
-"-"--------------------------------------------------------------

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 22:55                       ` Alan Cox
@ 2001-05-14 23:11                         ` Dan Hollis
  2001-05-14 23:19                           ` Alan Cox
  2001-05-14 23:23                         ` Alexander Viro
  2001-05-15  1:10                         ` Keith Owens
  2 siblings, 1 reply; 317+ messages in thread
From: Dan Hollis @ 2001-05-14 23:11 UTC (permalink / raw)
  To: Alan Cox; +Cc: Jeff Garzik, H. Peter Anvin, Linux Kernel Mailing List

On Mon, 14 May 2001, Alan Cox wrote:
> grep MAJOR lilo-21.4.4/*|wc -l
>     323
> Also hdparm
> raidtools
> psmisc
> mtools
> mt-st
> gpm
> joystick

so we now have a list of stuff that needs to be fixed 8)

or at least, a cross section sampling of stuff to design a new API for.

-Dan


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 22:54                       ` H. Peter Anvin
  2001-05-14 23:00                         ` Alexander Viro
@ 2001-05-14 23:18                         ` Arjan van de Ven
  2001-05-14 23:20                           ` Alan Cox
  2001-05-15 18:57                           ` Kai Henningsen
  2001-05-15  5:56                         ` Oliver Neukum
  2 siblings, 2 replies; 317+ messages in thread
From: Arjan van de Ven @ 2001-05-14 23:18 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

In article <3B006229.EA65A868@transmeta.com> you wrote:

> That's not the issue.  LILO takes whatever you pass to root= and converts
> it to a device number at /sbin/lilo time.  An idiotic practice on the
> part of LILO, in my opinion, that ought to have been fixed a long time
> ago.

That's why you want mount-root-by-partition-label, not by device

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 23:11                         ` Dan Hollis
@ 2001-05-14 23:19                           ` Alan Cox
  0 siblings, 0 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-14 23:19 UTC (permalink / raw)
  To: Dan Hollis
  Cc: Alan Cox, Jeff Garzik, H. Peter Anvin, Linux Kernel Mailing List

> >     323
> > Also hdparm
> > raidtools
> > psmisc
> > mtools
> > mt-st
> > gpm
> > joystick
> 
> so we now have a list of stuff that needs to be fixed 8)
> or at least, a cross section sampling of stuff to design a new API for.

Yes. Most of it actually uses the major stuff to answer the question
'what ioctls are valid' 'what type of thing am I bashing on'

Just issuing ioctls doesnt help as we have overlaps 8( Also we dont want to
get into the DOS like execute 135 queries to figure out what it is by
deep magic patterns.

One suggestion is to do something like
	if(MAJOR_HAS(st, named-property))

which solves that and also nicely fixes an extant problem in that there isnt
a good way to break down hierarchies of ioctl features right now. That is
one thing devfs namespaces can be abused to solve but isnt really the right
use of it.
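
A rough sketch of what such a named-property query might look like follows. Everything in it is an assumption made for illustration: the table, the property names, and the `major_has` function are hypothetical, and it takes a bare major number where the `MAJOR_HAS(st, ...)` form above would take a stat result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table: each driver lists the named properties it supports. */
struct major_props {
    unsigned int major;
    const char *props[4];   /* NULL-terminated list of property names */
};

static const struct major_props prop_table[] = {
    { 3, { "disk", "ide-geometry", NULL } },   /* IDE (illustrative) */
    { 8, { "disk", "scsi-generic", NULL } },   /* SCSI disk (illustrative) */
};

/* Return 1 if the driver behind `major` advertises `prop`, else 0. */
static int major_has(unsigned int major, const char *prop)
{
    assert(prop != NULL);
    for (size_t i = 0; i < sizeof(prop_table) / sizeof(prop_table[0]); i++) {
        if (prop_table[i].major != major)
            continue;
        for (size_t j = 0; j < 4 && prop_table[i].props[j] != NULL; j++)
            if (strcmp(prop_table[i].props[j], prop) == 0)
                return 1;
    }
    return 0;
}
```

A tool would then ask `major_has(m, "disk")` instead of comparing `m` against a hardcoded list of majors, so a new driver only needs to advertise the right properties to be recognised.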



^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 23:18                         ` Arjan van de Ven
@ 2001-05-14 23:20                           ` Alan Cox
  2001-05-15 18:57                           ` Kai Henningsen
  1 sibling, 0 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-14 23:20 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: H. Peter Anvin, linux-kernel

> > it to a device number at /sbin/lilo time.  An idiotic practice on the
> > part of LILO, in my opinion, that ought to have been fixed a long time
> > ago.
> 
> That's why you want mount-root-by-partition-label, not by device

Which in itself adds the 'and how does the label tell me what modules to load'
question..

For 2.5 with a clean ramfs root / initrd / uuid scan setup I would agree
entirely that uuid root is a good thing.

Alan


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 22:55                       ` Alan Cox
  2001-05-14 23:11                         ` Dan Hollis
@ 2001-05-14 23:23                         ` Alexander Viro
  2001-05-15  1:10                         ` Keith Owens
  2 siblings, 0 replies; 317+ messages in thread
From: Alexander Viro @ 2001-05-14 23:23 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, Jeff Garzik, H. Peter Anvin, Linux Kernel Mailing List



On Mon, 14 May 2001, Alan Cox wrote:

> grep MAJOR lilo-21.4.4/*|wc -l
>     323

/me looks and barfs.

Alan, had you actually looked at it? It will require massive changes
whenever you introduce a new major. And most of such areas are stuff
that doesn't matter for new devices anyway - geometry, for example.


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 22:58                           ` Alan Cox
@ 2001-05-14 23:29                             ` Alexander Viro
  2001-05-15  4:20                             ` God
  2001-05-15  7:48                             ` 2.4 " bert hubert
  2 siblings, 0 replies; 317+ messages in thread
From: Alexander Viro @ 2001-05-14 23:29 UTC (permalink / raw)
  To: Alan Cox
  Cc: H. Peter Anvin, Linus Torvalds, Jeff Garzik, Linux Kernel Mailing List



On Mon, 14 May 2001, Alan Cox wrote:

> > Oh, _that_ one. <shrug> pass rootname=driver!name (or whatever syntax
> > you prefer) to the kernel and call do_mount() instead of sys_mknod() in
> > prepare_namespace() (rootfs patch). BFD.
> 
> Yet another 2.5 project. If Linus wants to go play with name driven devices
> and you want to help him great, but if he'd care to put out
> linux-2.5.0.tar.gz _before_ starting that would be good for all of us

Frankly, I'd love to see 2.4.5-pre2 before June ;-)

BTW, rootfs is backwards-compatible - all setups that used to work still
do. And yes, it includes devfs/nfs-root/initrd without linuxrc/initrd
with linuxrc that terminates/initrd with linuxrc that execs init/loading
ramdisk from floppies, etc. Testing was a bitch ;-/


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 19:57   ` H. Peter Anvin
  2001-05-14 20:04     ` Jeff Garzik
  2001-05-14 20:09     ` Alan Cox
@ 2001-05-14 23:32     ` Richard Gooch
  2 siblings, 0 replies; 317+ messages in thread
From: Richard Gooch @ 2001-05-14 23:32 UTC (permalink / raw)
  To: Alan Cox
  Cc: H. Peter Anvin, Jeff Garzik, Linux Kernel Mailing List,
	Linus Torvalds, viro

Alan Cox writes:
> > > (c) does not require devfs.  most distros ship without it afaik, and
> > > switching to it is not an overnight process, and requires devfsd to be
> > > useful in the real world.
> > > 
> > 
> > It does, however, not manage permissions, nor does it provide for a sane
> > namespace (it exposes too many internal implementation details in the
> > interface -- in particular, the driver becomes part of the namespace, and
> > devices move around between drivers regularly.)
> 
> It is also very hard to tar that device file.
> 
> As to devfsd well Al Viro was reporting races in it long ago that I
> don't believe Richard has had time to fix nor has anyone else fixed.

Actually, it was devfs, not devfsd that Al was complaining about.
Fortunately these races are hard to trigger without deliberately
trying to trigger them, otherwise I'd be inundated with bug reports
:-/

> What is the state on devfs there ?

Getting very close now. This last weekend was my first time for ages
that I've had an uninterrupted weekend to hack on Linux and didn't
have other really urgent stuff to deal with.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 20:29         ` Linus Torvalds
                             ` (3 preceding siblings ...)
  2001-05-14 21:16           ` Alan Cox
@ 2001-05-14 23:34           ` Richard Gooch
  4 siblings, 0 replies; 317+ messages in thread
From: Richard Gooch @ 2001-05-14 23:34 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, Jeff Garzik, Alan Cox, H. Peter Anvin,
	Linux Kernel Mailing List, viro

Andi Kleen writes:
> On Mon, May 14, 2001 at 01:29:51PM -0700, Linus Torvalds wrote:
> > Big device numbers are _not_ a solution. I will accept a 32-bit one, but
> > no more, and I will _not_ accept a "manage by hand" approach any more. The
> > time has long since come to say "No". Which I've done. If you can't make
> > it manage the thing automatically with a script, you won't get a hardcoded
> > major device number just because you're lazy.
> 
> As far as I can see it just needs a /proc/devices that also outputs
> minor ranges with names, and a small program similar to scsidev to 
> generate nodes in /dev based on that on the fly on early bootup.

You can do that with devfs. It provides all this information. If you
really don't want to mount devfs over /dev, then mount it elsewhere
and just use it as an information source to populate /dev. No need to
add more code to the kernel to do it another way.
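
For comparison, the /proc/devices route Andi describes would start from something like the sketch below. It only handles the existing two-column "major name" lines; the minor ranges he mentions are not part of that format, so they are not parsed here. The names `parse_devices_line` and `DEV_NAME_MAX` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define DEV_NAME_MAX 32

/* Parse one "<major> <name>" line in the style of /proc/devices.
 * Returns 1 on success, 0 for section headers, blank lines, or
 * otherwise malformed input. */
static int parse_devices_line(const char *line, unsigned int *major,
                              char name[DEV_NAME_MAX])
{
    assert(line != NULL && major != NULL && name != NULL);
    /* %31s leaves room for the terminating NUL in a 32-byte buffer. */
    return sscanf(line, "%u %31s", major, name) == 2;
}
```

An early boot script or small program would feed each line of the file through this and create the matching node in /dev; that step is left out because it depends on the naming policy chosen.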

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 23:00                         ` Alexander Viro
  2001-05-14 22:58                           ` Alan Cox
@ 2001-05-14 23:39                           ` Richard Gooch
  1 sibling, 0 replies; 317+ messages in thread
From: Richard Gooch @ 2001-05-14 23:39 UTC (permalink / raw)
  To: Alan Cox
  Cc: Alexander Viro, H. Peter Anvin, Linus Torvalds, Jeff Garzik,
	Linux Kernel Mailing List

Alan Cox writes:
> > Oh, _that_ one. <shrug> pass rootname=driver!name (or whatever syntax
> > you prefer) to the kernel and call do_mount() instead of sys_mknod() in
> > prepare_namespace() (rootfs patch). BFD.
> 
> Yet another 2.5 project. If Linus wants to go play with name driven
> devices and you want to help him great, but if he'd care to put out
> linux-2.5.0.tar.gz _before_ starting that would be good for all of
> us

I use LILO and I pass a devfs name for the ROOT fs. You just need to
pass the name as a string. If you type it at the LILO prompt, it gets
passed as a string (and thus devfs will use it to descend the tree).
Also, you can put in /etc/lilo.conf:
	append = "root=/dev/scsi/host0/bus0/target0/lun0/part2"

and it will pass the string. Works nicely.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 22:21           ` Alan Cox
@ 2001-05-14 23:43             ` Jan Niehusmann
  2001-05-14 23:48               ` Alan Cox
  0 siblings, 1 reply; 317+ messages in thread
From: Jan Niehusmann @ 2001-05-14 23:43 UTC (permalink / raw)
  To: Alan Cox
  Cc: H. Peter Anvin, Jeff Garzik, Linux Kernel Mailing List,
	Linus Torvalds, viro

On Mon, May 14, 2001 at 11:21:00PM +0100, Alan Cox wrote:
> 	You can make it a string if you like but at the end of the day 
> 	has to be an opaque handle. For constant devices it also has to be
> 	a constant name. Otherwise the /dev file I archived with the corporate
> 	backup system turns out to be a different device when I restore the 
> 	box after a problem and I reformat the wrong disk...

Why can't we configure this in user space? I think of something like
/etc/major-numbers. We could then tell the kernel at module load time what
major number to use for a given driver.

The corporate backup system then only needs to restore /dev and 
/etc/major-numbers at the same time. 

I don't think this is the ideal solution. But it has some nice
properties:

- no policy in kernel. Neither device names nor numbers are hard-coded
- no daemons needed, only some simple startup scripts 
- no special filesystems needed, /dev is a simple tar-compatible directory
- everybody can add drivers to his system as he wants, without the need
  to register a number. One entry in a config file is enough
- every single system only needs as many major numbers as there are 
  drivers - so even 256 majors should be enough in most cases. 
  (this may be limited by the fact that the existing numbers should be
  recommended as standard entries in /etc/major-numbers to stay backward
  compatible)

Of course there are disadvantages; the biggest problem I see is drivers
compiled into the kernel. They need to get their major number from the
command line, I think, which is pretty ugly.
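
To make the proposal concrete, here is a minimal sketch of the lookup a module-loading script might do. The file format (one "driver major" pair per line) is an assumption, since the proposal does not define one, and `lookup_major` is a hypothetical name. It works on the file contents as a string so it can be tried without touching /etc.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Look up `driver` in a config text made of "<driver> <major>" lines.
 * Returns the major number, or -1 if the driver is not listed.
 * Lines longer than 63 characters are skipped. */
static int lookup_major(const char *config, const char *driver)
{
    assert(config != NULL && driver != NULL);
    const char *p = config;
    while (*p != '\0') {
        size_t len = strcspn(p, "\n");   /* length of the current line */
        char line[64], name[32];
        int major;
        if (len < sizeof(line)) {
            memcpy(line, p, len);
            line[len] = '\0';
            if (sscanf(line, "%31s %d", name, &major) == 2 &&
                strcmp(name, driver) == 0)
                return major;
        }
        p += len;
        if (*p == '\n')
            p++;                         /* step over the line terminator */
    }
    return -1;
}
```

The result would then be handed to the driver when the module is loaded. How that hand-off happens (a module parameter, for instance) is not specified in the proposal and is left open here.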

Perhaps the above is pure bullshit and my proposal does not work for
several reasons. But I think we should try to define our requirements for
the device numbering/naming system, and then find a solution that meets
these requirements - the final reason for choosing one solution should be
a technical one, not personal preference.

Jan


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 23:43             ` Jan Niehusmann
@ 2001-05-14 23:48               ` Alan Cox
  0 siblings, 0 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-14 23:48 UTC (permalink / raw)
  To: Jan Niehusmann
  Cc: Alan Cox, H. Peter Anvin, Jeff Garzik, Linux Kernel Mailing List,
	Linus Torvalds, viro

> Why can't we configure this in user space? I think of something like
> /etc/major-numbers. We could then tell the kernel at module load time what
> major number to use for a given driver.

We've got one of those lists. H Peter Anvin maintains it.

Don't get me wrong - if in 2.5.x someone can produce a scheme which works and
works well I'll be more than happy to be proved wrong, and Im sure hpa will
be glad his registrar role now becomes default device naming not numbers
and I suspect moves into the FHS/LSB. After all numbering or not most
vendors will want to ship a common device naming scheme, at least unless
they have fundamental reasons why not to (eg a desire to use arabic names)

Alan


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 21:23             ` Alan Cox
@ 2001-05-15  0:33               ` Rik van Riel
  2001-05-16  9:04                 ` Ingo Oeser
  0 siblings, 1 reply; 317+ messages in thread
From: Rik van Riel @ 2001-05-15  0:33 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List, viro

On Mon, 14 May 2001, Alan Cox wrote:

> > I've been doubting whether to work on both the -ac kernels
> > and the -linus tree, but this is a pretty good argument for
> > sticking with -ac and just ignoring the -linus tree...
>
> Time will make that decision. Linus kindly gave us all the power
> to vote with our feet. One thing I absolutely refuse to do is to
> let a disagreement over some specific device implementation turn
> into an excuse for a wider difference in the trees.

Agreed. However, if this thing means I cannot use the -linus
tree without devfs, then it will also mean my VM stuff only
gets tested on -ac kernels...

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 22:55                       ` Alan Cox
  2001-05-14 23:11                         ` Dan Hollis
  2001-05-14 23:23                         ` Alexander Viro
@ 2001-05-15  1:10                         ` Keith Owens
  2 siblings, 0 replies; 317+ messages in thread
From: Keith Owens @ 2001-05-15  1:10 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Mon, 14 May 2001 23:55:37 +0100 (BST), 
Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>> > And lilo ?
>Also hdparm
>raidtools
>psmisc
>mtools
>mt-st
>gpm
>joystick

kmod, /etc/modules.conf:

alias block-major-what-random-number-did-the-kernel-pick-this-time driver_name
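
The point can be spelled out with a small sketch. It assumes the block-major-N alias convention that the modules.conf line above implies; `block_major_alias` is a hypothetical helper, not kernel or modutils code.

```c
#include <assert.h>
#include <stdio.h>

/* Build the module alias name requested for a given block major.
 * Returns the length written, or -1 if the buffer is too small. */
static int block_major_alias(char *buf, size_t len, unsigned int major)
{
    assert(buf != NULL);
    int n = snprintf(buf, len, "block-major-%u", major);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Because the alias is keyed on the number, a major that changes from boot to boot leaves nothing stable to write in modules.conf.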


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: Getting out of hand?
  2001-05-14 22:30               ` Alan Cox
  2001-05-14 22:48                 ` Alexander Viro
@ 2001-05-15  4:12                 ` God
  2001-05-15  4:30                   ` Linus Torvalds
  1 sibling, 1 reply; 317+ messages in thread
From: God @ 2001-05-15  4:12 UTC (permalink / raw)
  To: Alan Cox
  Cc: Alexander Viro, Linus Torvalds, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List

On Mon, 14 May 2001, Alan Cox wrote:

> Subject: Re: LANANA: To Pending Device Number Registrants
> 
> > Would you mind demonstrating such wonder? Old devices are still there,
> > AFAICS. Ext2 (reiserfs, devfs, abortion-of-your-choice-fs) still has
> > the ability to create device nodes for them.
> 
> Except that Linus wont hand out major numbers, which means I can't even boot
> simply off such a device. I bet the vendors in question dont think the sun
> shines out of linus backside any more.
> 


ouch ....  can't we all just get along? :<


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 22:58                           ` Alan Cox
  2001-05-14 23:29                             ` Alexander Viro
@ 2001-05-15  4:20                             ` God
  2001-05-15  7:48                             ` 2.4 " bert hubert
  2 siblings, 0 replies; 317+ messages in thread
From: God @ 2001-05-15  4:20 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List

On Mon, 14 May 2001, Alan Cox wrote:
> 
> Yet another 2.5 project. If Linus wants to go play with name driven devices
> and you want to help him great, but if he'd care to put out
> linux-2.5.0.tar.gz _before_ starting that would be good for all of us


ACK! .. 2.5?? .. gawd .. I just installed 2.4.4 like a week ago ... eeek
...




^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: Getting out of hand?
  2001-05-15  4:12                 ` LANANA: Getting out of hand? God
@ 2001-05-15  4:30                   ` Linus Torvalds
  2001-05-15  5:17                     ` Linus Torvalds
                                       ` (3 more replies)
  0 siblings, 4 replies; 317+ messages in thread
From: Linus Torvalds @ 2001-05-15  4:30 UTC (permalink / raw)
  To: Alan Cox
  Cc: Alexander Viro, Jeff Garzik, H. Peter Anvin, Linux Kernel Mailing List


On Mon, 14 May 2001, Alan Cox wrote:
> 
> Except that Linus wont hand out major numbers, which means I can't even boot
> simply off such a device. I bet the vendors in question dont think the sun
> shines out of linus backside any more.

Actually, it does. It's just that some people have gotten so blinded by my
a** that they can no longer see it any more ;)

The problem I have is that there are lots of _good_ solutions, but they
all imply a bit more work than the bad ones. 

What does that result in? Everybody continues to use the simple old setup,
which required no thought at all, but that is a pain to maintain.

For example, the only thing you need in order to boot is to have a nice
clean "disk" major number. That's it. Nothing fancy, nothing more. 

Look at what we have now:

 - ramdisk: major 1. Fair enough - ramdisk is special, in that it doesn't
   have any "real hardware". No problem.
 - SCSI disks:
	major 8, 65-71,
 - Compaq smart2:
	major 72-79
 - Compaq CISS:
	major 104-111
 - DASD:
	major 94
 - IDE:
	major 3, 22, 33-34, 56-57, 88-91

and then the small random ones.

NONE of these major numbers have _any_ redeeming qualities except for the
ramdisk. They should all be _one_ major number, namely "disk". There are
absolutely NO advantages to having separate devices for some strange
compaq controllers and IDE disks. There is _no_ point in having some SCSI
disks show up at major 8, while others (who just happen to be attached to
a scsi bus that is not driven by the generic SCSI layer) show up at major
104 or whatever.

And it will never ever get fixed, unless somebody says "No more!". Which
I'm trying my best to say, except some people are so comfortable rolling
around in the shit that they have re-defined shit to be the new standard.

When Microsoft defines darkness to be standard, we laugh at them. When we
do it, Alan Cox stands up for it and claims that it's the best thing since
sliced bread. Double standards, anybody?

What I'm saying is: "No more SHIT!". I'm more than happy to give out a new
standard number for _disks_. I'm NOT AT ALL willing to say "Ok, Peter, go
ahead and give the next braindamaged Compaq/RedHat/Xxxx engineer another
random number so that we can dig ourselves deeper and deeper into this
shithole that Alan and others like so much".

How hard is it to generate a new "disk driver framework", and let people
register themselves, kind of like the "misc" drivers do. Except we'd only
allow DISKS. You could add something like

	register_disk_driver("compaq-ciss", nr_disks, &my_queue);

and then the disk driver framework will select a range of minor numbers
for the disks, and forward all requests that come to those minor numbers
to "my_queue". No major numbers. No fixed minors. And the user sees _one_
disk major, and doesn't care _what_ the hell is behind it.
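
A minimal user-space sketch of that registration idea is below. It is an illustration only: the struct layout, the return value, the fixed table size, the one-minor-per-disk simplification, and `disk_driver_for_minor` are all assumptions, and the queue is an opaque pointer standing in for whatever the real request queue type would be.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DISK_DRIVERS 16
#define MINOR_LIMIT (1u << 20)   /* assumes a 20-bit minor space */

struct disk_driver {
    const char *name;
    unsigned int first_minor;
    unsigned int nr_disks;
    void *queue;                 /* opaque stand-in for the request queue */
};

static struct disk_driver drivers[MAX_DISK_DRIVERS];
static size_t nr_drivers;
static unsigned int next_minor;

/* Hand the driver the next free range of minors under the single "disk"
 * major. Returns the first minor of the range, or -1 if no room is left. */
static int register_disk_driver(const char *name, unsigned int nr_disks,
                                void *queue)
{
    assert(name != NULL);
    if (nr_drivers == MAX_DISK_DRIVERS || nr_disks == 0 ||
        nr_disks > MINOR_LIMIT - next_minor)
        return -1;
    struct disk_driver *d = &drivers[nr_drivers++];
    d->name = name;
    d->first_minor = next_minor;
    d->nr_disks = nr_disks;
    d->queue = queue;
    next_minor += nr_disks;
    return (int)d->first_minor;
}

/* Find the driver whose queue should receive a request for `minor`. */
static const struct disk_driver *disk_driver_for_minor(unsigned int minor)
{
    for (size_t i = 0; i < nr_drivers; i++)
        if (minor >= drivers[i].first_minor &&
            minor < drivers[i].first_minor + drivers[i].nr_disks)
            return &drivers[i];
    return NULL;
}
```

In this sketch minors are handed out in registration order, so which driver ends up behind a given minor depends only on what was registered on that machine, not on any fixed table.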

But no. When I tell people "enough is enough", people want to continue
with the unbearably stupid and ugly thing we've always had, without
realizing that the _real_ problem is not that we have too few major
numbers, but the real problem is that people have mis-used the ones we
_do_ have, and the fact that we have too few _minor_ numbers (which is
easily fixable, and where 20 bits is plenty).

		Linus


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: Getting out of hand?
  2001-05-15  4:30                   ` Linus Torvalds
@ 2001-05-15  5:17                     ` Linus Torvalds
  2001-05-15  8:24                     ` Geert Uytterhoeven
                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 317+ messages in thread
From: Linus Torvalds @ 2001-05-15  5:17 UTC (permalink / raw)
  To: Alan Cox
  Cc: Alexander Viro, Jeff Garzik, H. Peter Anvin, Linux Kernel Mailing List


On Mon, 14 May 2001, Linus Torvalds wrote:
> 
> How hard is it to generate a new "disk driver framework", and let people
> register themselves, kind of like the "misc" drivers do. Except we'd only
> allow DISKS. You could add something like
> 
> 	register_disk_driver("compaq-ciss", nr_disks, &my_queue);

Note: one _important_ part of this is that absolutely _nobody_ registers a
disk driver except for a controller that is physically found on the
machine. 

None of this stupid "we have numbers pre-allocated for hardware that does
not even exist on this machine" crap that the current setup is full of. 

This way, you can pretty much depend on the fact that in any "normal"
configuration, you'll find disks at "disk0", "disk1", ... completely
regardless of whether the machine has a IDE controller, a "old-fashioned
SCSI" controller, or a Compaq smart-raid controller. And THAT is useful.  
You can migrate filesystem setups from one machine to another, without
worrying about the fact that one machine has IDE disks and another has
SCSI disks - the filesystem will just work, and the kernels will just
boot.
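
The registration scheme described above can be sketched as a small userspace
model. This is only an illustration under stated assumptions: the body of
register_disk_driver(), the disk_to_queue() helper, the table sizes and the
choice to ignore partitions are all hypothetical, and the queue is just an
opaque pointer standing in for a driver's request queue.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DRIVERS 16
#define MINOR_BITS  20                 /* the "20 bits is plenty" figure */
#define MAX_MINORS  (1u << MINOR_BITS)

/* One contiguous range of disk indices owned by one driver. */
struct disk_range {
    const char *name;   /* driver name, e.g. "compaq-ciss" */
    unsigned first;     /* first disk index handed to this driver */
    unsigned count;     /* number of disks the driver registered */
    void *queue;        /* stand-in for the driver's request queue */
};

static struct disk_range ranges[MAX_DRIVERS];
static unsigned nr_ranges;
static unsigned next_disk;

/* Hypothetical model: hand out the next free range of disk indices.
 * Returns the first index assigned, or -1 if nothing could be assigned. */
int register_disk_driver(const char *name, unsigned nr_disks, void *queue)
{
    assert(name && queue);
    if (nr_ranges == MAX_DRIVERS || nr_disks == 0 ||
        nr_disks > MAX_MINORS - next_disk)
        return -1;
    ranges[nr_ranges].name = name;
    ranges[nr_ranges].first = next_disk;
    ranges[nr_ranges].count = nr_disks;
    ranges[nr_ranges].queue = queue;
    nr_ranges++;
    next_disk += nr_disks;
    return (int)(next_disk - nr_disks);
}

/* Forward a disk index to whichever driver registered it. */
void *disk_to_queue(unsigned disk)
{
    unsigned i;

    for (i = 0; i < nr_ranges; i++)
        if (disk - ranges[i].first < ranges[i].count)
            return ranges[i].queue;
    return NULL;    /* no driver owns this index */
}
```

In this model a machine that registers two IDE disks and then three CISS
disks ends up with disk0..disk4, and nothing is reserved for controllers
that were never found.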

THAT is how it is supposed to work.

For people who care about where the disks are (0.01% of all people, and
half of those are misguided anyway), you can have a /proc interface or an
ioctl or something. 

But don't make excuses for the current setup. And understand why we must
NOT continue to just give out major numbers indiscriminately.

[ Oh, and _please_ don't Cc: me on this discussion. I'm not that
  interested. I know what I want, and I've let the current mess go on for
  too long. If it takes some pain to fix it, then so be it. It needs to be
  fixed, even if people suddenly start thinking that the light of my a**
  dimmed a bit. That's ok. I just don't want to really fill my inbox - I
  read the kernel mailing list with a newsreader and the "D" key. ]

		Linus


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 22:54                       ` H. Peter Anvin
  2001-05-14 23:00                         ` Alexander Viro
  2001-05-14 23:18                         ` Arjan van de Ven
@ 2001-05-15  5:56                         ` Oliver Neukum
  2001-05-15  5:59                           ` H. Peter Anvin
  2 siblings, 1 reply; 317+ messages in thread
From: Oliver Neukum @ 2001-05-15  5:56 UTC (permalink / raw)
  To: H. Peter Anvin, Alexander Viro
  Cc: Alan Cox, Linus Torvalds, Jeff Garzik, Linux Kernel Mailing List

On Tuesday, 15. May 2001 00:54, H. Peter Anvin wrote:
> Alexander Viro wrote:
> > On Mon, 14 May 2001, Alan Cox wrote:
> > > > > Except that Linus wont hand out major numbers, which means I can't
> > > > > even boot simply off such a device. I bet the vendors in question
> > > > > dont think the sun shines out of linus backside any more.
> > > >
> > > > Not really. Special-casing for mounting root is trivially solvable.
> > > > BTDT, and you've reviewed the patch.
> > >
> > > And lilo ?
> >
> > LILO uses BIOS, for fsck sake. It couldn't care less for device numbers
> > on the kernel side. Ask Andries how much do they have in common with
> > BIOS drive numbers.
>
> That's not the issue.  LILO takes whatever you pass to root= and converts
> it to a device number at /sbin/lilo time.  An idiotic practice on the
> part of LILO, in my opinion, that ought to have been fixed a long time
> ago.

And happily passes a "root=" argument through "append=" for the kernel to 
evaluate.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15  5:56                         ` Oliver Neukum
@ 2001-05-15  5:59                           ` H. Peter Anvin
  0 siblings, 0 replies; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-15  5:59 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Alexander Viro, Alan Cox, Jeff Garzik, Linux Kernel Mailing List

Oliver Neukum wrote:
> >
> > That's not the issue.  LILO takes whatever you pass to root= and converts
> > it to a device number at /sbin/lilo time.  An idiotic practice on the
> > part of LILO, in my opinion, that ought to have been fixed a long time
> > ago.
> 
> And happily passes a "root=" argument through "append=" for the kernel to
> evaluate.
> 

Sure, but it's not the way they have convinced users to set their systems
up.

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 20:55           ` Neil Brown
  2001-05-14 21:20             ` Alan Cox
  2001-05-14 21:24             ` Jeff Garzik
@ 2001-05-15  6:41             ` Linus Torvalds
  2001-05-15  8:57               ` Alan Cox
                                 ` (2 more replies)
  2 siblings, 3 replies; 317+ messages in thread
From: Linus Torvalds @ 2001-05-15  6:41 UTC (permalink / raw)
  To: Neil Brown
  Cc: Jeff Garzik, Alan Cox, H. Peter Anvin, Linux Kernel Mailing List, viro


On Tue, 15 May 2001, Neil Brown wrote:
> 
> I want to create a new block device - it is a different interface to
> the software-raid code that allows the arrays to be partitioned using
> normal partition tables.

See the other posts about creating a "disk" layer. Think of it as just a
simple "lvm" thing, except on a higher level (ie not on the request level,
but on the level _before_ we get to queuing the thing at all).

Plug the thing in at "__blk_get_queue()", and you're done.

> So I need a major number - to give to devfs_register_blkdev at least.
> You don't want me to have a hardcoded one (which is fine) so I need a
> dynamically allocated one - yes?

If you are willing to use devfs, you can just use a major nr of zero, and
devfs will allocate a device for you. 

Not everybody likes devfs, and there are bootstrap issues with this
approach, but it is the simple "get things working quickly" approach that
needs _zero_ changes or infrastructure.

> This means that we need some analogue to {get,put}_unnamed_dev that
> manages a range of dynamically allocated majors.

We already do have that. And have had it for a long time. It's pretty much
been part of "register_blkdev()" since day one (not quite true, but I bet
that code has been there since the days of Linux-1.0.x). 

You just pass in a major number of zero to "register_blkdev()", and it
will make one up for you.
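
A simplified userspace model of that behaviour, for illustration only. It
assumes the allocator scans downward from the top of a fixed-size table and
takes the first free slot; model_register_blkdev(), the table size and the
return conventions are assumptions of this sketch, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLKDEV 255   /* table size assumed for this model */

static const char *blkdev_names[MAX_BLKDEV];   /* NULL means the slot is free */

/* Userspace model only: returns the allocated major when major == 0,
 * 0 when a specific major was claimed, and -1 on failure. */
int model_register_blkdev(unsigned major, const char *name)
{
    assert(name);
    if (major == 0) {
        /* Dynamic case: take the highest free slot, never slot 0. */
        for (major = MAX_BLKDEV - 1; major > 0; major--) {
            if (!blkdev_names[major]) {
                blkdev_names[major] = name;
                return (int)major;
            }
        }
        return -1;   /* table is full */
    }
    /* Static case: the caller asked for one particular major. */
    if (major >= MAX_BLKDEV || blkdev_names[major])
        return -1;
    blkdev_names[major] = name;
    return 0;
}
```

The driver has to remember the value it was handed, since that is the number
that later shows up next to its name in /proc/devices.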

devfs inherited this behaviour from the first version, I think.

> Am I missing something obvious here?

The fact that it already exists, and has existed for 5+ years, but that
nobody really uses it?

Nobody really uses it because it would require you to add a line or two to
your init scripts to pick up the major number from /proc/devices, and
that's obviously too hard. Much better to just hardcode random numbers,
right?
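
The "line or two" in question amounts to looking the driver name up in
/proc/devices. A minimal sketch in C, assuming the usual layout of that file
(a "Character devices:" section and a "Block devices:" section, each line
holding a major number and a name); find_major() is a hypothetical helper
and takes an already opened stream so that it can be tried on any text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the major number listed for `name` in the
 * given section ("Character devices:" or "Block devices:") of a
 * /proc/devices-style stream, or -1 if the name is not listed there. */
int find_major(FILE *fp, const char *section, const char *name)
{
    char line[256];
    int in_section = 0;

    assert(fp && section && name);
    while (fgets(line, sizeof line, fp)) {
        int major;
        char drv[64];

        line[strcspn(line, "\n")] = '\0';      /* strip the newline */
        if (strcmp(line, section) == 0) {
            in_section = 1;
            continue;
        }
        if (line[0] == '\0') {                 /* blank line ends a section */
            in_section = 0;
            continue;
        }
        if (in_section && sscanf(line, "%d %63s", &major, drv) == 2 &&
            strcmp(drv, name) == 0)
            return major;
    }
    return -1;
}
```

A startup script or small helper would open /proc/devices, call this with
the driver's name, and create the device node with the major it returns.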

		Linus


^ permalink raw reply	[flat|nested] 317+ messages in thread

* 2.4 To Pending Device Number Registrants
  2001-05-14 22:58                           ` Alan Cox
  2001-05-14 23:29                             ` Alexander Viro
  2001-05-15  4:20                             ` God
@ 2001-05-15  7:48                             ` bert hubert
  2001-05-15  8:54                               ` Alan Cox
  2 siblings, 1 reply; 317+ messages in thread
From: bert hubert @ 2001-05-15  7:48 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: torvalds

On Mon, May 14, 2001 at 11:58:39PM +0100, Alan Cox wrote:
> Yet another 2.5 project. If Linus wants to go play with name driven devices
> and you want to help him great, but if he'd care to put out
> linux-2.5.0.tar.gz _before_ starting that would be good for all of us

Well, that's one thing. 2.4 will not need userspace changes internally, so
any funky major/minor number dynamic allocation stuff needs to be solved
without userspace help. This probably rules out most everything, unless a
setup is found that will special case all of /dev/ currently existing.

So I would think that this block of new major number allocations holds for
2.5 and not 2.4. Also, if I'm correct, 2.4 won't be needing a lot of new
major numbers anyhow.

This all means that a lot of the current hubbub is unjustified - 2.5 is not
there yet. Yes there is urgency and this way of forcing discussion is a very
Linus-esque way of trying to achieve something.

But unless I'm wrong, there is no way that this can affect a 2.4 without
userspace changes which have historically been considered forbidden within a
stable series.

Regards,

bert

-- 
http://www.PowerDNS.com      Versatile DNS Services  
Trilab                       The Technology People   
'SYN! .. SYN|ACK! .. ACK!' - the mating call of the internet

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: Getting out of hand?
  2001-05-15  4:30                   ` Linus Torvalds
  2001-05-15  5:17                     ` Linus Torvalds
@ 2001-05-15  8:24                     ` Geert Uytterhoeven
  2001-05-15  8:48                     ` Alan Cox
  2001-05-15 21:16                     ` Martin Dalecki
  3 siblings, 0 replies; 317+ messages in thread
From: Geert Uytterhoeven @ 2001-05-15  8:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Alexander Viro, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List

On Mon, 14 May 2001, Linus Torvalds wrote:
> On Mon, 14 May 2001, Alan Cox wrote:
> > Except that Linus wont hand out major numbers, which means I can't even boot
> > simply off such a device. I bet the vendors in question dont think the sun
> > shines out of linus backside any more.
> 
> For example, the only thing you need in order to boot is to have a nice
> clean "disk" major number. That's it. Nothing fancy, nothing more. 
> 
> Look at what we have now:
> 
>  - ramdisk: major 1. Fair enough - ramdisk is special, in that it doesn't
>    have any "real hardware". No problem.
>  - SCSI disks:
> 	major 8, 65-71,
>  - Compaq smart2:
> 	major 72-79
>  - Compaq CISS:
> 	major 104-111
>  - DASD:
> 	major 94
>  - IDE:
> 	major 3, 22, 33-34, 56-57, 88-91
> 
> and then the small random ones.
> 
> NONE of these major numbers have _any_ redeeming qualities except for the
> ramdisk. They should all be _one_ major number, namely "disk". There are
> absolutely NO advantages to having separate devices for some strange
> compaq controllers and IDE disks. There is _no_ point in having some SCSI
> disks show up at major 8, while others (who just happen to be attached to
> a scsi bus that is not driven by the generic SCSI layer) show up at major
> 104 or whatever.

    [...]

> How hard is it to generate a new "disk driver framework", and let people
> register themselves, kind of like the "misc" drivers do. Except we'd only
> allow DISKS. You could add something like
> 
> 	register_disk_driver("compaq-ciss", nr_disks, &my_queue);
> 
> and then the disk driver framework will select a range of minor numbers
> for the disks, and forward all requests that come to those minor numbers
> to "my_queue". No major numbers. No fixed minors. And the user sees _one_
> disk major, and doesn't care _what_ the hell is behind it.

Looks exactly like what we used to do for serial ports on the m68k platform...

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: Getting out of hand?
  2001-05-15  4:30                   ` Linus Torvalds
  2001-05-15  5:17                     ` Linus Torvalds
  2001-05-15  8:24                     ` Geert Uytterhoeven
@ 2001-05-15  8:48                     ` Alan Cox
  2001-05-15 21:16                     ` Martin Dalecki
  3 siblings, 0 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-15  8:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Alexander Viro, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List

> How hard is it to generate a new "disk driver framework", and let people
> register themselves, kind of like the "misc" drivers do. Except we'd only
> allow DISKS. You could add something like
> 
> 	register_disk_driver("compaq-ciss", nr_disks, &my_queue);

Why bother? Devfs does that already. That's the enumeration problem.

> and then the disk driver framework will select a range of minor numbers
> for the disks, and forward all requests that come to those minor numbers
> to "my_queue". No major numbers. No fixed minors. And the user sees _one_
> disk major, and doesn't care _what_ the hell is behind it.

The user running devfs sees /dev/disc or /devices/disc and doesn't care
what's behind it already. They also see what is scsi and the like, provided
they care to ask. The latter is essential to make ioctl work.

Doing a grep across a large amount of source code I found several very
definite uses of the device type:

1	Is file A the same as file B	

	This continues to work fine

2	Is file A on mountpoint B

	This continues to work fine

3	Are you running this on a sane device

	Joystick, hdparm, ...

4	Which ioctl set can I use on device A

	This breaks. Examples of this include tools like mt-st which has to
	use different ioctls according to the tape class. Our ioctls overlap
	so it isnt safe to issue them and pray

5	Deep nasty lowlevel grungy knowledge

	Things like lilo that knows and to an extent has to know more about
	the universe than is nice.

3 and 4 are variants of the same thing really: the lack of any way other than
the major number to say 'What ioctl classes does this device support'. IMHO
that's a thing you have to fix first - a way to query the device and get back
{"disk", "scsi-disk", "scsi-lowlevel"} or {"disk", "cpqarray"}

The underlying name/number thing is a red herring. You don't need that to
do /dev/disc nicely. devfs rather proved it. 

Alan


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: 2.4 To Pending Device Number Registrants
  2001-05-15  7:48                             ` 2.4 " bert hubert
@ 2001-05-15  8:54                               ` Alan Cox
  2001-05-15  9:09                                 ` bert hubert
  0 siblings, 1 reply; 317+ messages in thread
From: Alan Cox @ 2001-05-15  8:54 UTC (permalink / raw)
  To: bert hubert; +Cc: Linux Kernel Mailing List, torvalds

> So I would think that this block of new major number allocations holds for
> 2.5 and not 2.4. Also, if I'm correct, 2.4 won't be needing a lot of new
> major numbers anyhow.

I wouldn't bet on that. Going to a 32bit dev_t internally without user space
noticing seems to be quite doable if we have to. Right now it doesn't
worry me; in 2 years' time, when 2.6 is approaching release, the picture might
have changed a fair bit.


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15  6:41             ` Linus Torvalds
@ 2001-05-15  8:57               ` Alan Cox
  2001-05-15  9:08                 ` Linus Torvalds
  2001-05-15 11:44               ` Neil Brown
  2001-05-15 15:51               ` John Fremlin
  2 siblings, 1 reply; 317+ messages in thread
From: Alan Cox @ 2001-05-15  8:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Neil Brown, Jeff Garzik, Alan Cox, H. Peter Anvin,
	Linux Kernel Mailing List, viro

> The fact that it already exists, and has existed for 5+ years, but that
> nobody really uses it?
> 
> Nobody really uses it because it would require you to add a line or two to
> your init scripts to pick up the major number from /proc/devices, and
> that's obviously too hard. Much better to just hardcode random numbers,
> right?

modprobe ?


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15  8:57               ` Alan Cox
@ 2001-05-15  9:08                 ` Linus Torvalds
  2001-05-15  9:26                   ` Alan Cox
  2001-05-15  9:28                   ` Alan Cox
  0 siblings, 2 replies; 317+ messages in thread
From: Linus Torvalds @ 2001-05-15  9:08 UTC (permalink / raw)
  To: Alan Cox
  Cc: Neil Brown, Jeff Garzik, H. Peter Anvin, Linux Kernel Mailing List, viro


On Tue, 15 May 2001, Alan Cox wrote:
> > 
> > Nobody really uses it because it would require you to add a line or two to
> > your init scripts to pick up the major number from /proc/devices, and
> > that's obviously too hard. Much better to just hardcode random numbers,
> > right?
> 
> modprobe ?

I was being ironic.

Yes, it's used. Not very widely at all, and historically what has actually
happened is that people have used the dynamic numbers for a while, but in
order to become "real members of society" they've applied for a real
static major number even if the dynamic one worked fine.

Silly, yes. 

Note that my whole argument is that we do NOT need more of the static
numbers, and we should NOT expand the major number space
unnecessarily. We _can_ make do with devfs (trivially - no need to do
anything at all, as devfs already handles the case of dynamic major
numbers quite well). 

But the fact remains that some users want to (a) avoid devfs and (b) have
static maintenance. And I'm ok with that too, but only if the static major
number is in the form of a _generic_ number that has absolutely nothing to
do with any specific drivers (which is why I'd be perfectly ok with still
adding a "disk" major number, but which is why I do NOT want to have Peter
give out "the random number of today" to various stupid device drivers).

So we seem to be in violent agreement here.

		Linus


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: 2.4 To Pending Device Number Registrants
  2001-05-15  8:54                               ` Alan Cox
@ 2001-05-15  9:09                                 ` bert hubert
  0 siblings, 0 replies; 317+ messages in thread
From: bert hubert @ 2001-05-15  9:09 UTC (permalink / raw)
  To: Linux Kernel Mailing List

On Tue, May 15, 2001 at 09:54:33AM +0100, Alan Cox wrote:
> > So I would think that this block of new major number allocations holds for
> > 2.5 and not 2.4. Also, if I'm correct, 2.4 won't be needing a lot of new
> > major numbers anyhow.
> 
> I wouldn't bet on that. Going to a 32bit dev_t internally without user space
> noticing seems to be quite doable if we have to. Right now it doesn't
> worry me; in 2 years' time, when 2.6 is approaching release, the picture might
> have changed a fair bit.

I think that we then have two distinct problems: 1) finding a solution for 2.4
that does not change userspace 2) finding a solution for 2.5/2.6 that is
Right.

Personally I'm not sure what 2.4 stands to gain from a redesign. While 2.4
is obviously developing, a stable series needs to solve real problems or
improve performance - I know the way major numbers are allocated right now
is ugly and doesn't scale very well. But is 2.4 the place to fix that?

So the question is: does this new policy hold for 2.4 as well and if so,
why.

Regards,

bert

-- 
http://www.PowerDNS.com      Versatile DNS Services  
Trilab                       The Technology People   
'SYN! .. SYN|ACK! .. ACK!' - the mating call of the internet

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15  9:08                 ` Linus Torvalds
@ 2001-05-15  9:26                   ` Alan Cox
  2001-05-15  9:49                     ` Alexander Viro
                                       ` (2 more replies)
  2001-05-15  9:28                   ` Alan Cox
  1 sibling, 3 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-15  9:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Neil Brown, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List, viro

> But the fact remains that some users want to (a) avoid devfs and (b) have
> static maintenance. And I'm ok with that too, but only if the static major
> number is in the form of a _generic_ number that has absolutely nothing to
> do with any specific drivers (which is why I'd be perfectly ok with still
> adding a "disk" major number, but which is why I do NOT want to have Peter
> give out "the random number of today" to various stupid device drivers).
> So we seem to be in violent agreement here.

Ok.

Then we need to open the second discussion which is:

Given a file handle 'X' how do I find out what ioctl groups I should apply to
it. So we can go from

	if(MAJOR(st.st_rdev) == ST_MAJOR)
		issue_scsi_ioctls
	else if(MAJOR(st.st_rdev) == FTAPE_MAJOR)
		issue_ftape_ioctls
	else ..
	else
		error

to

	/* Use scsi if possible [scsi, ide-scsi, usb-scsi, ...] */
	if(HAS_FEATURE_SET(fd, "scsi-tape"))
		...
	else if(HAS_FEATURE_SET(fd, "floppy-tape"))
		..
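
A rough sketch of what the second form might look like from the application
side. Nothing like this exists yet: the idea that a device reports a
NULL-terminated list of class names, and the names has_feature_set() and
pick_tape_ioctls(), are assumptions made purely for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical check: does the list of ioctl classes reported by a
 * device (NULL-terminated) contain the class we want to use? */
int has_feature_set(const char *const *classes, const char *wanted)
{
    assert(classes && wanted);
    for (; *classes; classes++)
        if (strcmp(*classes, wanted) == 0)
            return 1;
    return 0;
}

enum tape_kind { TAPE_NONE, TAPE_SCSI, TAPE_FLOPPY };

/* Prefer the SCSI command set when it is offered, then fall back to
 * floppy tape; no major number is consulted anywhere. */
enum tape_kind pick_tape_ioctls(const char *const *classes)
{
    if (has_feature_set(classes, "scsi-tape"))
        return TAPE_SCSI;
    if (has_feature_set(classes, "floppy-tape"))
        return TAPE_FLOPPY;
    return TAPE_NONE;
}
```

The open question is how the class list gets from the kernel to the
application in the first place; the sketch takes it as given.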

Alan


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15  9:08                 ` Linus Torvalds
  2001-05-15  9:26                   ` Alan Cox
@ 2001-05-15  9:28                   ` Alan Cox
  2001-05-15 15:15                     ` Linus Torvalds
  1 sibling, 1 reply; 317+ messages in thread
From: Alan Cox @ 2001-05-15  9:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Neil Brown, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List, viro

> do with any specific drivers (which is why I'd be perfectly ok with still
> adding a "disk" major number, but which is why I do NOT want to have Peter
> give out "the random number of today" to various stupid device drivers).

For block devices that seems to work well. char devices are harder and I'd
rather issue the occasional new major than have people registering automatic
cabbage slicers as a tty or a disk because they cant get a device id.

Alan


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15  9:26                   ` Alan Cox
@ 2001-05-15  9:49                     ` Alexander Viro
  2001-05-15  9:51                       ` Alan Cox
  2001-05-15 15:10                     ` Linus Torvalds
  2001-05-15 21:40                     ` Chip Salzenberg
  2 siblings, 1 reply; 317+ messages in thread
From: Alexander Viro @ 2001-05-15  9:49 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, Neil Brown, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List



On Tue, 15 May 2001, Alan Cox wrote:

> to
> 
> 	/* Use scsi if possible [scsi, ide-scsi, usb-scsi, ...] */
> 	if(HAS_FEATURE_SET(fd, "scsi-tape"))
> 		...
> 	else if(HAS_FEATURE_SET(fd, "floppy-tape"))
> 		..

Alan, if we are doing that we might as well use saner interface than
ioctl(2). In the case you've mentioned we don't want "make device SYS$FOO17
do special action OP$LOUD$BARF4269". We want "make device rewind the tape".
Or "tell us geometry". Or "eject the media". Application doesn't
_care_ whether it is ejecting floppy on Sun or IDE CD, or SCSI
CD or ZIP disk sitting on parallel port. The fact that currently it
has to know is a Bad Thing(tm).

At the very least we need ioctls sorted by function, not by device.


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15  9:49                     ` Alexander Viro
@ 2001-05-15  9:51                       ` Alan Cox
  2001-05-15 10:12                         ` Alexander Viro
  2001-05-15 15:16                         ` Linus Torvalds
  0 siblings, 2 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-15  9:51 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Alan Cox, Linus Torvalds, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List

> Alan, if we are doing that we might as well use saner interface than
> ioctl(2). In the case you've mentioned we don't want "make device SYS$FOO17
> do special action OP$LOUD$BARF4269". We want "make device rewind the tape".
> Or "tell us geometry". Or "eject the media". Application doesn't

Counter argument; We dont want the bloat of making a floppy tape have
delusions of grandeur in kernel space when mt-st can do it in userspace.

Alan


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15  9:51                       ` Alan Cox
@ 2001-05-15 10:12                         ` Alexander Viro
  2001-05-15 10:36                           ` Alan Cox
  2001-05-15 15:16                         ` Linus Torvalds
  1 sibling, 1 reply; 317+ messages in thread
From: Alexander Viro @ 2001-05-15 10:12 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, Neil Brown, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List



On Tue, 15 May 2001, Alan Cox wrote:

> > Alan, if we are doing that we might as well use saner interface than
> > ioctl(2). In the case you've mentioned we don't want "make device SYS$FOO17
> > do special action OP$LOUD$BARF4269". We want "make device rewind the tape".
> > Or "tell us geometry". Or "eject the media". Application doesn't
> 
> Counter argument; We dont want the bloat of making a floppy tape have
> delusions of grandeur in kernel space when mt-st can do it in userspace.

Cost of adding IOCTL_REWIND_TAPE - two words in each tape driver. That
alone kills a bunch of crap in userland and makes _both_ sides more
maintainable.

Idea that ioctls belong to drivers is bogus. Some of them do, but that's
exactly the case when it's something deeply specific to the driver. As
in "make the printer puke on the top of next page". And even that might be
better off as IOCTL_LART.

IOW, even if we stay with ioctl(2) every place where we do the "if scsi tape
... else if floppy tape ..." is bogus.
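
To make the "sorted by function" idea concrete, here is a small userspace
model. It is a sketch under assumptions, not a proposal for the real
interface: the command names, the per-device table and do_cmd() are all
hypothetical, and the handlers take an opaque pointer instead of real
driver state.

```c
#include <assert.h>
#include <stddef.h>

/* Generic, function-oriented commands shared by every driver. */
enum generic_cmd { CMD_REWIND, CMD_EJECT, CMD_GET_GEOMETRY, CMD_COUNT };

typedef int (*cmd_fn)(void *priv);

struct device {
    const char *name;
    cmd_fn ops[CMD_COUNT];   /* NULL where the device cannot do it */
    void *priv;              /* driver-private state */
};

/* The caller names the action; it never needs to know which driver
 * sits behind the device. Returns -1 for unsupported commands. */
int do_cmd(struct device *d, enum generic_cmd c)
{
    assert(d && (unsigned)c < CMD_COUNT);
    if (!d->ops[c])
        return -1;
    return d->ops[c](d->priv);
}

/* Example handler: a tape driver whose only state is a position. */
int tape_rewind(void *priv)
{
    *(long *)priv = 0;
    return 0;
}
```

Each driver fills in only the slots it supports, so the per-driver cost of a
new generic command is one table entry plus the handler itself.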


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 10:12                         ` Alexander Viro
@ 2001-05-15 10:36                           ` Alan Cox
  0 siblings, 0 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-15 10:36 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Alan Cox, Linus Torvalds, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List

> Cost of adding IOCTL_REWIND_TAPE - two words in each tape driver. That
> alone kills a bunch of crap in userland and makes _both_ sides more
> maintainable.

A lot lot more than that. There are some cases where what you are saying is
true and we have duplication. The worst culprit was the cd layer and that
has been cleaned up for a while now.

In most of the other cases it varies what is done in drive firmware or in
userspace by the app. Reimplementing half of the drive firmware in kernel
space rather than user space does not appeal.

Were it just normalising ioctl numbers I'd agree 100%.



^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15  6:41             ` Linus Torvalds
  2001-05-15  8:57               ` Alan Cox
@ 2001-05-15 11:44               ` Neil Brown
  2001-05-15 15:34                 ` Linus Torvalds
  2001-05-15 15:51               ` John Fremlin
  2 siblings, 1 reply; 317+ messages in thread
From: Neil Brown @ 2001-05-15 11:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff Garzik, Alan Cox, H. Peter Anvin, Linux Kernel Mailing List, viro

On Monday May 14, torvalds@transmeta.com wrote:
> 
> On Tue, 15 May 2001, Neil Brown wrote:
> > 
> > I want to create a new block device - it is a different interface to
> > the software-raid code that allows the arrays to be partitioned using
> > normal partition tables.
> 
> See the other posts about creating a "disk" layer. Think of it as just a
> simple "lvm" thing, except on a higher level (ie not on the request level,
> but on the level _before_ we get to queuing the thing at all).
> 
> Plug the thing in at "__blk_get_queue()", and you're done.
> 

Ok, I'm beginning to get the idea.
Of course, setting the "queue" function that __blk_get_queue calls to do
a lookup of the minor and choose an appropriate queue for the "real"
device won't work, as you need to munge bh->b_rdev too.
But you could define a make_request_fn instead which simply
changes b_rdev from the major/minor of the virtual disk device to the
(dynamically allocated) major/minor of the real device.

You would still need to make sure the blk_size[], blksize_size[],
hardsect_size[], max_readahead[], max_sectors[] all got handled
properly.  That's probably not too hard.

So this would mean that my new driver (mdp) gets a dynamically
allocated major number which is probably never seen from user-space
(though I could look in /proc/devices if I wanted to), and it is
accessed through /dev/diskAN for some value of A and different
partitions N.

So far I'm with you (I think).

Does the minor number for this "disk" layer have N bits for partition
number and 8-N bits (later to be 20-N bits or similar) for device
number?   If so we are limited to a smallish number of discs for now.
If not (and partitions are packed densely) then changing the
partitioning of a drive could be awkward.
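
The arithmetic behind that question, as a short sketch. The struct and the
two function names are made up for illustration; the only inputs are the
minor width (8 bits today, perhaps 20 later) and the number of bits N given
to the partition number.

```c
#include <assert.h>

struct disk_minor {
    unsigned disk;   /* device number within the "disk" major */
    unsigned part;   /* partition number on that device */
};

/* How many disks fit when part_bits of the minor go to partitions. */
unsigned max_disks(unsigned minor_bits, unsigned part_bits)
{
    assert(part_bits <= minor_bits && minor_bits < 32);
    return 1u << (minor_bits - part_bits);
}

/* Split a minor into (disk, partition) for a fixed partition width. */
struct disk_minor split_minor(unsigned minor, unsigned part_bits)
{
    struct disk_minor m;

    assert(part_bits < 32);
    m.disk = minor >> part_bits;
    m.part = minor & ((1u << part_bits) - 1);
    return m;
}
```

With N = 4, an 8-bit minor leaves room for 16 disks of 16 partitions each,
while a 20-bit minor leaves room for 65536 disks.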

Finally, how do I say that I want the root filesystem to be on a
particular "mdp" device+partition.  I cannot assume that my device
will be the first to register with the "disk" layer, so I cannot be
sure that "root=/dev/diska1" will work.
Maybe a boot line option like:
    diska=mdp,0
could be used.  Each device that registers with "disk" defines a
__setup handler for "disk" which checks if it should register as a
particular "disk".
So if "mdp" finds that there is an mdp device 0, and wants to register
it with "disk" then:
  if "diskX=mdp,0" is a boot option for some X, register it with "disk"
    as device "X-'a'"
  else register it with "disk" as "largest-available-device".
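
A sketch of how such an option could be parsed, assuming the whole
"diskX=driver,unit" string is available. The struct and parse_disk_opt()
are hypothetical names; the slot is simply X-'a', as in the rule above.

```c
#include <assert.h>
#include <stdio.h>

struct disk_opt {
    int slot;          /* 0 for "diska", 1 for "diskb", ... */
    char driver[32];   /* driver name, e.g. "mdp" */
    int unit;          /* which of that driver's devices */
};

/* Parse "diskX=driver,unit". Returns 0 on success, -1 if malformed. */
int parse_disk_opt(const char *opt, struct disk_opt *out)
{
    char letter;

    assert(opt && out);
    if (sscanf(opt, "disk%c=%31[^,],%d",
               &letter, out->driver, &out->unit) != 3)
        return -1;
    if (letter < 'a' || letter > 'z' || out->unit < 0)
        return -1;
    out->slot = letter - 'a';
    return 0;
}
```

A driver would compare the parsed name and unit against its own devices and
claim the requested slot on a match, falling back to the next available
slot otherwise.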

Does that make sense?   It feels a bit ugly though


The brick wall that I feel I am hitting is that there needs to
be a name space to communicate device identities between kernelspace
and userspace.  You are saying that major numbers are no longer to be
that namespace (at least, not for new drivers) so we have to come up
with a new namespace, which will obviously involve textual names.
There seem to be three options:

 1/ devfs - but not everybody likes that (and there are certainly
     aspects that I am uncomfortable with)
 2/ ugly hacks like the above and like name_to_kdev_t
 3/ "something else" which either hasn't been proposed or hasn't been
     agreed on. (I have my own ideas but insufficient time).

I guess your goal in establishing the moratorium on new majors is to
force people to break down this brick wall, either by accepting devfs,
pushing through ugly hacks, or finding that "something else".

I'll keep looking for a sledgehammer:-)

NeilBrown

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15  9:26                   ` Alan Cox
  2001-05-15  9:49                     ` Alexander Viro
@ 2001-05-15 15:10                     ` Linus Torvalds
  2001-05-15 15:29                       ` Alexander Viro
                                         ` (5 more replies)
  2001-05-15 21:40                     ` Chip Salzenberg
  2 siblings, 6 replies; 317+ messages in thread
From: Linus Torvalds @ 2001-05-15 15:10 UTC (permalink / raw)
  To: Alan Cox
  Cc: Neil Brown, Jeff Garzik, H. Peter Anvin, Linux Kernel Mailing List, viro


On Tue, 15 May 2001, Alan Cox wrote:
> 
> Given a file handle 'X' how do I find out what ioctl groups I should apply to
> it. So we can go from
> 
> 	if(MAJOR(st.st_rdev) == ST_MAJOR)
> 		issue_scsi_ioctls
> 	else if(MAJOR(st.st_rdev) == FTAPE_MAJOR)
> 		issue_ftape_ioctls
> 	else ..
> 	else
> 		error

Ugh. You do this?

And you don't realize that the whole system is too broken for words?

What is the horrible app that does something like this? 

The fix, I think, is to make the ioctl commands much more regular. That is
probably true in general, and fixing that would hopefully fix the need for
horrible code like the above.

That said:

> 	/* Use scsi if possible [scsi, ide-scsi, usb-scsi, ...] */
> 	if(HAS_FEATURE_SET(fd, "scsi-tape"))
> 		...
> 	else if(HAS_FEATURE_SET(fd, "floppy-tape"))
> 		..

doesn't look horrible, and I don't see why we couldn't expose the "driver
name" for any file descriptor. We already do for some: "fstatfs()" is
largely the same thing on another level.

		Linus


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15  9:28                   ` Alan Cox
@ 2001-05-15 15:15                     ` Linus Torvalds
  2001-05-15 15:19                       ` Jeff Garzik
  0 siblings, 1 reply; 317+ messages in thread
From: Linus Torvalds @ 2001-05-15 15:15 UTC (permalink / raw)
  To: Alan Cox
  Cc: Neil Brown, Jeff Garzik, H. Peter Anvin, Linux Kernel Mailing List, viro


On Tue, 15 May 2001, Alan Cox wrote:
> 
> For block devices that seems to work well. char devices are harder and I'd
> rather issue the occasional new major than have people registering automatic
> cabbage slicers as a tty or a disk because they can't get a device id.

What are the valid cases that couldn't just register as a misc'ish
driver? The one that stands out is serial devices (you have hundreds of
them), but that's the same argument as a disk anyway.

I'd be much happier with trying to expand on /proc/devices etc, so that
the user _can_ get valid information. Otherwise you end up with the stupid
setup where we keep track of static allocations of numbers for truly
specialty hardware ("I have a lip-frobnicator made by Acme Industries that
I wrote a driver for, and I need 16 minor numbers for it").

Right now we have wasted the minors in the misc device the same way we
wasted the majors in general, and for the same (bogus) reasons.

		Linus



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15  9:51                       ` Alan Cox
  2001-05-15 10:12                         ` Alexander Viro
@ 2001-05-15 15:16                         ` Linus Torvalds
  2001-05-15 20:55                           ` Alan Cox
  1 sibling, 1 reply; 317+ messages in thread
From: Linus Torvalds @ 2001-05-15 15:16 UTC (permalink / raw)
  To: Alan Cox
  Cc: Alexander Viro, Neil Brown, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List


On Tue, 15 May 2001, Alan Cox wrote:
> 
> Counter argument: we don't want the bloat of making a floppy tape have
> delusions of grandeur in kernel space when mt-st can do it in userspace.

Counter-counter-argument: we could just export the ioctl's, and make a
"user-level-filesystem". Except it's not a filesystem, but a driver.

		Linus



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 15:15                     ` Linus Torvalds
@ 2001-05-15 15:19                       ` Jeff Garzik
  2001-05-15 15:45                         ` Linus Torvalds
  0 siblings, 1 reply; 317+ messages in thread
From: Jeff Garzik @ 2001-05-15 15:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Neil Brown, H. Peter Anvin, Linux Kernel Mailing List, viro

Linus Torvalds wrote:
> 
> On Tue, 15 May 2001, Alan Cox wrote:
> >
> > For block devices that seems to work well. char devices are harder and I'd
> > rather issue the occasional new major than have people registering automatic
> > cabbage slicers as a tty or a disk because they can't get a device id.
> 
> What are the valid cases that couldn't just register as a misc'ish
> driver? The one that stands out is serial devices (you have hundreds of
> them), but that's the same argument as a disk anyway.

/dev/fbN, /dev/dspN, /dev/videoN, ...

-- 
Jeff Garzik      | Game called on account of naked chick
Building 1024    |
MandrakeSoft     |


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 15:10                     ` Linus Torvalds
@ 2001-05-15 15:29                       ` Alexander Viro
  2001-05-15 17:21                       ` James Simmons
                                         ` (4 subsequent siblings)
  5 siblings, 0 replies; 317+ messages in thread
From: Alexander Viro @ 2001-05-15 15:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Neil Brown, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List



On Tue, 15 May 2001, Linus Torvalds wrote:

> What is the horrible app that does something like this? 

eject(1), for one thing. And yes, it's ugly beyond belief - don't read
without a barfbag. BTW, LILO is not better, to put it _very_ mildly.

> > 	/* Use scsi if possible [scsi, ide-scsi, usb-scsi, ...] */
> > 	if(HAS_FEATURE_SET(fd, "scsi-tape"))
> > 		...
> > 	else if(HAS_FEATURE_SET(fd, "floppy-tape"))
> > 		..
> 
> doesn't look horrible, and I don't see why we couldn't expose the "driver
> name" for any file descriptor. We already do for some: "fstatfs()" is
> largely the same thing on another level.

Well, yes, if you can extract fs type from fstatfs() output. I don't
think that ->s_magic (i.e. ->f_type) is a good way to do that, though.
We have unused space in struct statfs and IMO putting the name there
is a good idea. Has an additional nice property of killing the crap
like switch by magic numbers.
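
For reference, the "switch by magic numbers" that a name field would make unnecessary looks roughly like the sketch below. The three magic values are the superblock magics published in the kernel headers; the EX_* names and both function names are invented for this example, and the table is deliberately incomplete.

```c
#include <sys/vfs.h>	/* fstatfs(), struct statfs (Linux) */

/* Superblock magic numbers as published in the kernel headers. */
#define EX_EXT2_MAGIC	0xEF53L
#define EX_PROC_MAGIC	0x9FA0L
#define EX_NFS_MAGIC	0x6969L

/* Map an f_type value to a filesystem name; incomplete by design. */
const char *fs_name_from_magic(long f_type)
{
	switch (f_type) {
	case EX_EXT2_MAGIC:
		return "ext2";
	case EX_PROC_MAGIC:
		return "proc";
	case EX_NFS_MAGIC:
		return "nfs";
	default:
		return "unknown";
	}
}

/* Ask the kernel which filesystem an open descriptor lives on. */
const char *fs_name_from_fd(int fd)
{
	struct statfs sfs;

	if (fstatfs(fd, &sfs) != 0)
		return "unknown";
	return fs_name_from_magic((long)sfs.f_type);
}
```

If the name were reported in struct statfs directly, fs_name_from_magic() and its table would disappear from user space.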



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 11:44               ` Neil Brown
@ 2001-05-15 15:34                 ` Linus Torvalds
  2001-05-16  1:00                   ` Daniel Phillips
  2001-05-16  3:25                   ` Neil Brown
  0 siblings, 2 replies; 317+ messages in thread
From: Linus Torvalds @ 2001-05-15 15:34 UTC (permalink / raw)
  To: Neil Brown
  Cc: Jeff Garzik, Alan Cox, H. Peter Anvin, Linux Kernel Mailing List, viro


On Tue, 15 May 2001, Neil Brown wrote:
> 
> Of course setting the "queue" function that __blk_get_queue calls to do
> a lookup of the minor and choose an appropriate queue for the "real"
> device won't work as you need to munge bh->b_rdev too.

What I would do is:
 - remove b_rdev completely. No driver is actually interested in what the
   device number is, the only thing they want to use it for is to look up
   which device index we have (and for doing the partition handling, but
   as discussed in a completely independent discussion a few months ago,
   we should handle that as a lvm remapping thing, NOT at the driver
   level!)
 - replace it with b_index

Then, the "get_queue" functions basically end up doing the mapping of

	b_dev -> <queue,b_index>

and the LVM remapping would do

	<queue1,b_index1> -> <queue2,b_index2>

instead of what happens now. Right now we have:

	/* At request creation time */
	b_rdev = b_dev  (equivalent to <queue,b_index> <- b_dev in get_queue)

	/* at lvm remapping time */
	b_rdev = mapping(b_rdev);

The current reliance on b_rdev is rather confusing, and has resulted in a
ton of bugs exactly because of that.

> You would still need to make sure the blk_size[], blksize_size[],
> hardsect_size[], max_readahead[], max_sectors[] all got handled
> properly.

Actually, I think Jens did most of these, and moved them into a device
array.

> Does the minor number for this "disk" layer have N bits for partition
> number and 8-N bits (later to be 20-N bits or similar) for device
> number?

I'd go with N=8, and only use this for the "new" cases. We've seen that
N=4 is too small (SCSI), and N=6 (IDE) is already too cramped with an 8-bit
minor number.

BUT! Note that when you do the partition handling in get_queue too (and
thus index is an index to the _device_ and has nothing to do with
partitions), you can trivially allow different majors to have different
numbers of partition bits, because the driver no longer cares. This is
required so that the get_queue remapping can easily handle the legacy
IDE/SCSI numbers anyway, so it's easyish to just have both at the same
time - you could have N=4 for a "disk major for old users that need the
16-bit device numbers and a single major", with N=8 for the "new style
major" which doesn't fit in 8 bits.
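
The arithmetic behind N can be sketched as follows, assuming (as in the question being answered) that the low N bits of the minor select the partition and the remaining bits select the disk. The struct and function names are made up for illustration, and part_bits is assumed to be smaller than minor_bits, with both well under 32.

```c
/* A minor number split into <disk index, partition>. */
struct disk_minor {
	unsigned int index;	/* which disk under this major */
	unsigned int partition;	/* value of the low part_bits bits */
};

/* Split `minor`, treating its low part_bits bits as the partition number. */
struct disk_minor split_minor(unsigned int minor, unsigned int part_bits)
{
	struct disk_minor d;

	d.index = minor >> part_bits;
	d.partition = minor & ((1u << part_bits) - 1u);
	return d;
}

/* How many disks fit under one major for a given minor width. */
unsigned int disks_per_major(unsigned int minor_bits, unsigned int part_bits)
{
	return 1u << (minor_bits - part_bits);
}
```

With an 8-bit minor, N=4 leaves room for 16 disks per major and N=6 for only 4; with a 20-bit minor, N=8 leaves room for 4096.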

> Finally, how do I say that I want the root filesystem to be on a
> particular "mdp" device+partition.  I cannot assume that my device
> will be the first to register with the "disk" layer, so I cannot be
> sure that "root=/dev/diska1" will work.

You have never been able to really assume that. Disks move around. 

A lot of people seem to think that controller type or location on the PCI
bus should somehow have some "meaning", and that it guarantees that the
disks don't move in the namespace. That's crap. You can do that in user
space ("what controller are you on?") if you really really care.

		Linus



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 15:19                       ` Jeff Garzik
@ 2001-05-15 15:45                         ` Linus Torvalds
  2001-05-15 17:27                           ` James Simmons
  0 siblings, 1 reply; 317+ messages in thread
From: Linus Torvalds @ 2001-05-15 15:45 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Alan Cox, Neil Brown, H. Peter Anvin, Linux Kernel Mailing List, viro


On Tue, 15 May 2001, Jeff Garzik wrote:

> Linus Torvalds wrote:
> > 
> > What are the valid cases that couldn't just register as a misc'ish
> > driver? The one that stands out is serial devices (you have hundreds of
> > them), but that's the same argument as a disk anyway.
> 
> /dev/fbN, /dev/dspN, /dev/videoN, ...

I still don't see why they couldn't be misc drivers? 

Sure, some of them already exist and all that, and we need to support
their major numbers just for backwards compatibility reasons. But a simple
"give me 16 minors, please" should work fine, together with minimal
infrastructure to create the nodes.

Think of the problem as a hot-plug issue. We don't want to statically
allocate device numbers etc for hotplug - we create the nodes on an
as-needed basis when the device is plugged in, and it's fairly easy to do
with a /sbin/hotplug kind of approach.

Static devices like /dev/fbN are no different. They were just plugged in
before the OS booted.

We already need (and largely _have_) the infrastructure for creating
device nodes dynamically: modprobe has done this since pretty much day
one, and /sbin/hotplug allows for it too. What's so distasteful about
applying the same logic to pretty much _all_ devices, and getting away from
the silly static number allocation?

There are _very_ few things that need static numbers, and 99% of them are
either (a) legacy reasons, ie people _know_ that IDE is major 3 or (b)
really ugly stuff like the ioctl() example Alan posted which is not really
due to wanting static major numbers at all, but is using static knowledge
to work around _other_ problems.

Fixing those other problems would be good too ;)

		Linus



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15  6:41             ` Linus Torvalds
  2001-05-15  8:57               ` Alan Cox
  2001-05-15 11:44               ` Neil Brown
@ 2001-05-15 15:51               ` John Fremlin
  2 siblings, 0 replies; 317+ messages in thread
From: John Fremlin @ 2001-05-15 15:51 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

Linus Torvalds <torvalds@transmeta.com> writes:

[...]

> Nobody really uses it because it would require you to add a line or
> two to your init scripts to pick up the major number from
> /proc/devices, and that's obviously too hard. Much better to just
> hardcode random numbers, right?

And thereby avoid using procfs. Hardcoding is the way the BSDs seem to
be going.
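
For context, picking up a dynamically assigned major from /proc/devices amounts to something like the sketch below. It assumes the usual layout of that file (a "Character devices:" section and a "Block devices:" section, each with one "major name" pair per line); the function name and its arguments are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Return the major registered under `name` in /proc/devices-style text
 * read from `f`, or -1 if it is not listed. A non-zero want_block selects
 * the "Block devices:" section, zero the "Character devices:" section. */
int find_major(FILE *f, const char *name, int want_block)
{
	char line[128], drv[64];
	int in_block = 0, maj;

	want_block = !!want_block;
	while (fgets(line, sizeof line, f)) {
		if (strncmp(line, "Character devices:", 18) == 0) {
			in_block = 0;
		} else if (strncmp(line, "Block devices:", 14) == 0) {
			in_block = 1;
		} else if (in_block == want_block &&
			   sscanf(line, "%d %63s", &maj, drv) == 2 &&
			   strcmp(drv, name) == 0) {
			return maj;
		}
	}
	return -1;
}
```

An init script or helper would call this on an open /proc/devices and pass the result to mknod, which is the "line or two" referred to in the quoted text.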

Clueless suggestion: I suppose you could allocate numbers on kernel
build or something.

[...]

-- 

	http://ape.n3.net


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 15:10                     ` Linus Torvalds
  2001-05-15 15:29                       ` Alexander Viro
@ 2001-05-15 17:21                       ` James Simmons
  2001-05-15 17:25                         ` Alexander Viro
  2001-05-15 18:02                       ` Ingo Oeser
                                         ` (3 subsequent siblings)
  5 siblings, 1 reply; 317+ messages in thread
From: James Simmons @ 2001-05-15 17:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Neil Brown, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List, viro


> > Given a file handle 'X' how do I find out what ioctl groups I should apply to
> > it. So we can go from
> > 
> > 	if(MAJOR(st.st_rdev) == ST_MAJOR)
> > 		issue_scsi_ioctls
> > 	else if(MAJOR(st.st_rdev) == FTAPE_MAJOR)
> > 		issue_ftape_ioctls
> > 	else ..
> > 	else
> > 		error

[snip]..

> The fix, I think, is to make the ioctl commands much more regular. That is
> probably true in general, and fixing that would hopefully fix the need for
> horrible code like the above.

   I have to agree. We also ran into this problem for the framebuffer
layer. If you look in fb.h you see a bunch of driver-specific ioctls apart
from the standard ioctl space. One proposal I had was that we allocate the
first 50 to be "standard" ioctl calls which all drivers could support
and the rest driver specific. Well, no one liked that. Now that I look back,
I agree. It was a bad idea. So what is the solution?

What I wish was done was the very first ioctl call was a generic ioctl
call to pass driver specific data. Basically you have something like this:

struct fb_driver_specific_data {
	__u32 magic_identifier;
	__u32 size_of_data_packet;
	char *data_buffer;
};

ioctl(fd, FBIO_DRIVER_SPECIFIC, &data);	/* data: struct fb_driver_specific_data */

This data would be passed to the driver. The driver would then validate it
and process the data. If the data is not meant for it, the driver would
ignore it. Here you have just one ioctl call that every driver could use,
yet it can still do driver-specific functionality.
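
The validate-then-process step could look roughly like this on the driver side. This is only a user-space sketch of the idea: the magic value, the backlight payload, the function name and the choice of -ENOTTY for "not addressed to this driver" are all assumptions made for the example, not part of fbdev.

```c
#include <stddef.h>
#include <stdint.h>
#include <errno.h>

/* User-space stand-in for the proposed packet layout. */
struct drv_packet {
	uint32_t magic;		/* which driver the payload is meant for */
	uint32_t size;		/* payload length in bytes */
	const void *data;	/* payload */
};

#define EX_DRV_MAGIC	0x46423031u	/* hypothetical per-driver magic */

struct ex_backlight {			/* hypothetical payload */
	uint32_t level;
};

/* Returns 0 if the packet was accepted, -ENOTTY if it is not addressed to
 * this driver (and is therefore ignored), or -EINVAL if it is addressed to
 * this driver but has the wrong size. */
int ex_handle_packet(const struct drv_packet *p, uint32_t *level_out)
{
	const struct ex_backlight *bl;

	if (p == NULL || p->magic != EX_DRV_MAGIC)
		return -ENOTTY;
	if (p->data == NULL || p->size != sizeof(struct ex_backlight))
		return -EINVAL;
	bl = p->data;
	*level_out = bl->level;
	return 0;
}
```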

   The rest of the ioctl space would be "standard" ioctl calls that all
drivers, or more than 90% of them, could use. A good example is blanking.
Pretty much all graphics hardware supports this function.
   Now what about extensions to this standard functionality? Again the
blanking function is a good example. Several handheld devices support back
and front lighting, and you can control the level of brightness of the
screen. In this case you can expand the blanking function. You can promise
that some sort of blanking can occur, but you can't promise you can control
the brightness of the screen. If you ask a driver to do this and it can't
deliver, you just let the user know you can't do it. Simple as that.

> doesn't look horrible, and I don't see why we couldn't expose the "driver
> name" for any file descriptor. We already do for some: "fstatfs()" is
> largely the same thing on another level.

I don't find that a bad thing either.



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 17:21                       ` James Simmons
@ 2001-05-15 17:25                         ` Alexander Viro
  2001-05-15 17:29                           ` James Simmons
  0 siblings, 1 reply; 317+ messages in thread
From: Alexander Viro @ 2001-05-15 17:25 UTC (permalink / raw)
  To: James Simmons
  Cc: Linus Torvalds, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List



On Tue, 15 May 2001, James Simmons wrote:

> What I wish was done was the very first ioctl call was a generic ioctl
> call to pass driver specific data. Basically you have something like this:
> 
> struct fb_driver_specific_data {
> 	__u32 magic_identifier;
> 	__u32 size_of_data_packet;
> 	char *data_buffer;
> };

It's called write(2). magic_identifier: which file we are writing to.
size_of_data_packet: length. data_buffer: buffer we write from.

And if write() has too much overhead - we'd better fix _that_, because
it's much more likely hotspot than ioctl ever will be.



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 15:45                         ` Linus Torvalds
@ 2001-05-15 17:27                           ` James Simmons
  2001-05-15 17:43                             ` Linus Torvalds
  2001-05-15 20:02                             ` Dan Hollis
  0 siblings, 2 replies; 317+ messages in thread
From: James Simmons @ 2001-05-15 17:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff Garzik, Alan Cox, Neil Brown, H. Peter Anvin,
	Linux Kernel Mailing List, viro


> > /dev/fbN, /dev/dspN, /dev/videoN, ...
> 
> I still don't see why they couldn't be misc drivers? 
> 
> Sure, some of them already exist and all that, and we need to support
> their major numbers just for backwards compatibility reasons. But a simple
> "give me 16 minors, please" should work fine, together with minimal
> infrastructure to create the nodes.
> 
> Think of the problem as a hot-plug issue. We don't want to statically
> allocate device numbers etc for hotplug - we create the nodes on an
> as-needed basis when the device is plugged in, and it's fairly easy to do
> with a /sbin/hotplug kind of approach.
> 
> Static devices like /dev/fbN are no different. They were just plugged in
> before the OS booted.

Actually there are hotplug video cards. High-end servers have hot-swappable
graphics cards. Would you want to take down a very important server
because the graphics card went dead? You pull it out and you plug a new
one in. Also there are PCMCIA video cards. I have seen them for the
handheld iPAQs. It is only a matter of time before all devices are hot
swappable.



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 17:25                         ` Alexander Viro
@ 2001-05-15 17:29                           ` James Simmons
  2001-05-15 17:32                             ` Alexander Viro
  2001-05-15 18:04                             ` Linus Torvalds
  0 siblings, 2 replies; 317+ messages in thread
From: James Simmons @ 2001-05-15 17:29 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Linus Torvalds, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List


> > What I wish was done was the very first ioctl call was a generic ioctl
> > call to pass driver specific data. Basically you have something like this:
> > 
> > struct fb_driver_specific_data {
> > 	__u32 magic_identifier;
> > 	__u32 size_of_data_packet;
> > 	char *data_buffer;
> > };
> 
> It's called write(2). magic_identifier: which file we are writing to.
> size_of_data_packet: length. data_buffer: buffer we write from.
> 
> And if write() has too much overhead - we'd better fix _that_, because
> it's much more likely hotspot than ioctl ever will be.

I would use write except we use write to draw into the framebuffer. If I
write to the framebuffer with that data the only thing that will happen is
I will get pretty colors on my screen. 



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 17:29                           ` James Simmons
@ 2001-05-15 17:32                             ` Alexander Viro
  2001-05-15 17:44                               ` James Simmons
  2001-05-15 21:46                               ` Chip Salzenberg
  2001-05-15 18:04                             ` Linus Torvalds
  1 sibling, 2 replies; 317+ messages in thread
From: Alexander Viro @ 2001-05-15 17:32 UTC (permalink / raw)
  To: James Simmons
  Cc: Linus Torvalds, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List



On Tue, 15 May 2001, James Simmons wrote:

> I would use write except we use write to draw into the framebuffer. If I
> write to the framebuffer with that data the only thing that will happen is
> I will get pretty colors on my screen. 

Yes. And we also use write to send data to printer. So what? Nobody makes
you use the same file.



* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 19:19 LANANA: To Pending Device Number Registrants H. Peter Anvin
  2001-05-14 19:36 ` Jeff Garzik
  2001-05-14 20:09 ` Richard Gooch
@ 2001-05-15 17:37 ` Pavel Machek
  2001-05-17 11:32   ` Alan Cox
  2001-05-16 15:58 ` Kurt Garloff
  3 siblings, 1 reply; 317+ messages in thread
From: Pavel Machek @ 2001-05-15 17:37 UTC (permalink / raw)
  To: H. Peter Anvin, torvalds; +Cc: Linux Kernel Mailing List


Hi!

> First of all, I apologize for not having sent this notice out sooner. 
> This kind of writing is very painful to deal with.
> 
> Linus Torvalds has requested a moratorium on new device number
> assignments. His hope is that a new and better method for device space
> handing will emerge as a result.

Linus, is that wise? I could understand a moratorium during 2.5, but during 2.4?!

And worse, what about drivers that want to be merged into 2.2?
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 17:27                           ` James Simmons
@ 2001-05-15 17:43                             ` Linus Torvalds
  2001-05-15 18:04                               ` Jeff Garzik
                                                 ` (3 more replies)
  2001-05-15 20:02                             ` Dan Hollis
  1 sibling, 4 replies; 317+ messages in thread
From: Linus Torvalds @ 2001-05-15 17:43 UTC (permalink / raw)
  To: James Simmons
  Cc: Jeff Garzik, Alan Cox, Neil Brown, H. Peter Anvin,
	Linux Kernel Mailing List, viro


On Tue, 15 May 2001, James Simmons wrote:
> > 
> > Static devices like /dev/fbN are no different. They were just plugged in
> > before the OS booted.
> 
> Actually there are hotplug video cards. High-end servers have hot-swappable
> graphics cards. Would you want to take down a very important server
> because the graphics card went dead? You pull it out and you plug a new
> one in. Also there are PCMCIA video cards. I have seen them for the
> handheld iPAQs. It is only a matter of time before all devices are hot
> swappable.

True, but not really necessarily important.

The thing is, even if the device happens to be soldered down, inside a
computer that is locked in a safe, the question boils down to a fairly
simple one: "how do we approach devices?".

Do we approach devices as something static, or do we approach them as more
dynamic entities? Do we consider soldered-down devices to be fundamentally
different from the ones that can be hot-plugged?

And my opinion is that the "hot-plugged" approach works for devices even
if they are soldered down - the "plugging" event just always happens
before the OS is booted, and people just don't unplug it. So we might as
well consider devices to always be hot-pluggable, whether that is actually
physically true or not. Because that will always work, and that way we
don't create any artificial distinctions (and they often really _are_
artificial: historically soldered-down devices tend to eventually move in a
more hot-pluggable direction, as you point out).

Now, if we just fundamentally try to think about any device as being
hot-pluggable, you realize that things like "which PCI slot is this device
in" are completely _worthless_ as device identification, because they
fundamentally take the wrong approach, and they don't fit the generic
approach at all.

But this is also why I don't think static device numbers make any
sense. It's silly to have the same disk show up as different devices just
because it is connected to a different kind of controller. And it is
_really_ silly to statically pre-allocate device numbers based on the
"location" of a device. 

We should strive for a setup where device plugin causes that device to
show up in /dev, and everywhere else it is needed. And the logical
extension of such a setup is to consider built-in devices to be plugged in
at bootup.

This is true to the point that I would not actually think that it is a bad
idea to call /sbin/hotplug when we enumerate the motherboard devices. In
fact, if you look at the current network drivers, this is exactly what
will happen: when we auto-detect the motherboard devices, we _will_
actually call /sbin/hotplug to tell that we've "inserted" a network
device.

It's just that we haven't really mounted the root filesystem yet, so
user-land never actually "sees" this fact. But I think it's the right
approach to take, and realizing that even static devices are just a
sub-case of the problem of dynamic allocation means that you tend to
automatically also see that static device number allocation is just
broken.

[ The biggest silliness is this "let's try to make the disks appear in the
  same order that the BIOS probes them". Now THAT is really stupid, and it
  goes on a lot more than I'd ever like to see. ]

		Linus



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 17:32                             ` Alexander Viro
@ 2001-05-15 17:44                               ` James Simmons
  2001-05-15 18:18                                 ` Ingo Oeser
                                                   ` (2 more replies)
  2001-05-15 21:46                               ` Chip Salzenberg
  1 sibling, 3 replies; 317+ messages in thread
From: James Simmons @ 2001-05-15 17:44 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Linus Torvalds, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List


> > I would use write except we use write to draw into the framebuffer. If I
> > write to the framebuffer with that data the only thing that will happen is
> > I will get pretty colors on my screen. 
> 
> Yes. And we also use write to send data to printer. So what? Nobody makes
> you use the same file.

Well, creating a new device wouldn't make Linus happy right now. I do
agree ioctl calls are evil!!!! You only have X amount of them. With write
you can have an infinite number of different functions to perform on a
device. I didn't design fbdev :-( If I had, it would have been far
different. I do plan on some day merging drm and fbdev into one interface,
so I plan to change this behavior. I'd like to see this interface ioctl-less
(is there such a word ???). You mmap to alter buffers. Mmap is much more
flexible than write for graphics buffers anyway. You use write to pass
"data" to the driver.



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 15:10                     ` Linus Torvalds
  2001-05-15 15:29                       ` Alexander Viro
  2001-05-15 17:21                       ` James Simmons
@ 2001-05-15 18:02                       ` Ingo Oeser
  2001-05-15 19:31                       ` Richard Gooch
                                         ` (2 subsequent siblings)
  5 siblings, 0 replies; 317+ messages in thread
From: Ingo Oeser @ 2001-05-15 18:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Neil Brown, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List, viro

On Tue, May 15, 2001 at 08:10:29AM -0700, Linus Torvalds wrote:
> That said:
> 
> > 	/* Use scsi if possible [scsi, ide-scsi, usb-scsi, ...] */
> > 	if(HAS_FEATURE_SET(fd, "scsi-tape"))
> > 		...
> > 	else if(HAS_FEATURE_SET(fd, "floppy-tape"))
> > 		..
> 
> doesn't look horrible,

That is good, and it is what other OSes have done for years. They call it
"DEVCAPS" or "device capabilities". They use bitmasks for this
(which might not be perfect).

> and I don't see why we couldn't expose the "driver
> name" for any file descriptor.

Because we don't like to replace:

   if (st.device == MAJOR_1)
      bla
   else if ...

with

   if (!strcmp(st.device,"driver_1") )
      bla
   else if ...

?

There is no win doing it this way, because every time we add a
new driver that fits or change the name of one, we need to add
support for it.

But the device majors are not needed for this, that's true ;-)
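
A capability bitmask of the kind mentioned above could be tested like this. The bit names and the helper are hypothetical and only illustrate the idea; nothing like this is exported by the kernel today.

```c
#include <stdint.h>

/* Hypothetical capability bits, invented for this sketch. */
#define EX_CAP_TAPE		(1u << 0)	/* generic tape operations */
#define EX_CAP_SCSI_CMDS	(1u << 1)	/* accepts SCSI command ioctls */
#define EX_CAP_EJECT		(1u << 2)	/* media can be ejected */

/* True if every bit in `wanted` is present in `caps`. */
int ex_has_caps(uint32_t caps, uint32_t wanted)
{
	return (caps & wanted) == wanted;
}
```

An application then asks "does this fd support SCSI commands?" rather than "is this driver called st?", so a new driver that sets the same bits works without any change to the application.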

Regards

Ingo Oeser
-- 
10.+11.03.2001 - 3. Chemnitzer LinuxTag <http://www.tu-chemnitz.de/linux/tag>
         <<<<<<<<<<<<     been there and had much fun   >>>>>>>>>>>>


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 17:43                             ` Linus Torvalds
@ 2001-05-15 18:04                               ` Jeff Garzik
  2001-05-15 18:15                                 ` Linus Torvalds
                                                   ` (2 more replies)
  2001-05-15 18:19                               ` James Simmons
                                                 ` (2 subsequent siblings)
  3 siblings, 3 replies; 317+ messages in thread
From: Jeff Garzik @ 2001-05-15 18:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: James Simmons, Alan Cox, Neil Brown, H. Peter Anvin,
	Linux Kernel Mailing List, viro

Linus Torvalds wrote:
> And my opinion is that the "hot-plugged" approach works for devices even
> if they are soldered down

agreed, as you probably know :)


> Now, if we just fundamentally try to think about any device as being
> hot-pluggable, you realize that things like "which PCI slot is this device
> in" are completely _worthless_ as device identification, because they
> fundamentally take the wrong approach, and they don't fit the generic
> approach at all.

Should I interpret this as you disagreeing with
exporting-bus-info-to-userspace type additions?  ie. some random
get-info ioctl spits out pci_dev->slot_name to userspace.

I believe there are rare cases where this is useful.  When one already
has the /dev node (via an open fd used for ioctl, usually), additionally
you need the bus info to make an association between an active device on
the hardware bus, and an active driver in the kernel.  X could use this
info to figure out which fbdev devices to avoid.  SCSI is already using
similar info, as of 2.4.4, as are net devs.  Userspace apps that diddle
hardware are a definite minority case, but for that case the PCI slot
info is useful.


> This is true to the point that I would not actually think that it is a bad
> idea to call /sbin/hotplug when we enumerate the motherboard devices.

Don't ask for it or you might actually get it.... ;-)  I think having a
pci_driver for northbridge and southbridge devices would make ACPI-free
PM easy and achievable.

	Jeff


-- 
Jeff Garzik      | Game called on account of naked chick
Building 1024    |
MandrakeSoft     |


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 17:29                           ` James Simmons
  2001-05-15 17:32                             ` Alexander Viro
@ 2001-05-15 18:04                             ` Linus Torvalds
  2001-05-15 18:58                               ` Johannes Erdfelt
                                                 ` (3 more replies)
  1 sibling, 4 replies; 317+ messages in thread
From: Linus Torvalds @ 2001-05-15 18:04 UTC (permalink / raw)
  To: James Simmons
  Cc: Alexander Viro, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List


On Tue, 15 May 2001, James Simmons wrote:
> > 
> > And if write() has too much overhead - we'd better fix _that_, because
> > it's much more likely hotspot than ioctl ever will be.
> 
> I would use write except we use write to draw into the framebuffer. If I
> write to the framebuffer with that data the only thing that will happen is
> I will get pretty colors on my screen. 

Note that this was the same argument that the USB people had, and it was
wrong then. It's wrong now.

The USB people decided on using ioctl's, because the way USB works you
send a packet down a "USB pipe", which is identified by the direction, the
device number and the type (and other details). So what the USB system
does to expose this to user land is very similar to what you propose for
ioctl's: a structured ioctl that has a "data" field.

What Al is saying, and what makes perfect sense is that you generate a
separate fd for each "pipe". It's even more obvious in the case of USB,
because, by golly, the things are actually _called_ "pipes" in the USB
documentation, which should have made people make the immediate
association. Instead of doing

	fd = open("unstructured-name" ...);
	ioctl(fd, MAGICIOCTL, { structured data });

you do

	fd = open("/structured/name", ...);
	write(fd, data, size);

or possibly you take a more socket-like approach and do

	fd = socket(part-of-the-structure);
	bind(fd, more-of-the-structure);
	connect(fd, last-part-of-the-structure);

and use write() there (or use "sendto()" etc which allow more dynamic
structure constructs - you don't have to statically bind the fd early at
bind/connect time).

See? 

Don't get boxed in by thinking that you only have one fd. Even if you have
only one _device_node_, you can have multiple fd's. In fact, you can, with
the Linux VFS layer, fairly easily do things like

	mknod /dev/fd0 c X Y

and then use

	fd = open("/dev/fd0/colourspace", O_RDWR);

and your device just implements some trivial "lookup()" functions (you
don't _have_ to be a directory to allow name lookups - although right now
I suspect that you can confuse the VFS layer if you aren't. That's a VFS
layer deficiency, if so. Nobody has tested it, but it should be really
easy to fix if somebody is really interested).

Note that with these kinds of things, you don't need ugly ioctl's. The
code, I bet, would be a LOT more readable. There's nothing fundamentally
impossible with having

	> /dev/fd0/eject

cause an eject event on /dev/fd0. It would be fairly easy, I bet, to
expand the current "struct file_operations def_blk_fops" to also include
_dentry_ operations, and then all of this could be done by fs/block_dev.c,
with the actual device drivers not having to know about it.

Same thing with character devices. We should be fairly easily able to make
something like

	fd = open("/dev/fd0/colourspace=1", ...)

be fully parsed by the fs/block_dev.c layer: we could add a nice string to
"bd_op->open()", to be passed in to the device driver to do with as it
wishes. That would require _no_ changes from device driver writers except
the addition of a new argument ("const char * arg") and the option of
using that argument for extra structure.
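
As a rough illustration of the "extra string passed to open()" idea, here
is a user-space sketch of how a driver might split such an argument into a
key and a value. The names parse_open_arg and struct open_arg are
hypothetical and exist only for this example; no such hook is implied to
exist in fs/block_dev.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of splitting "key=value"; not a kernel structure. */
struct open_arg {
	char key[32];
	char value[32];
};

/*
 * Split an argument such as "colourspace=1" into key and value.
 * A bare word such as "eject" yields an empty value.
 * Returns 0 on success, -1 if the input is NULL, empty, has an empty
 * key, or does not fit into the fixed-size fields.
 */
int parse_open_arg(const char *arg, struct open_arg *out)
{
	const char *eq;
	size_t klen, vlen;

	assert(out != NULL);
	if (arg == NULL || *arg == '\0')
		return -1;

	eq = strchr(arg, '=');
	klen = eq ? (size_t)(eq - arg) : strlen(arg);
	vlen = eq ? strlen(eq + 1) : 0;

	if (klen == 0 || klen >= sizeof(out->key) || vlen >= sizeof(out->value))
		return -1;

	memcpy(out->key, arg, klen);
	out->key[klen] = '\0';
	memcpy(out->value, eq ? eq + 1 : "", vlen);
	out->value[vlen] = '\0';
	return 0;
}
```

With this sketch, "colourspace=1" parses to key "colourspace" and value "1",
while a plain "eject" parses to key "eject" with an empty value.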

This, btw, is Al Viro's wet dream. But I have to agree: using name spaces
etc is MUCH preferable to ioctl's, makes code more readable and logical,
and often makes it possible to do things you couldn't sanely do before
(control these things from scripts etc).

And using ASCII names ("eject") instead of numbers (see the "FDEJECT" and
"CDROMEJECT" etc #defines) sure as hell makes for easier maintenance and
avoids the whole issue of maintaining static numbers (all the same things
that make me hate device number maintenance makes me also hate the fact
that we need to maintain this list of ioctl numbers etc). By using
descriptive names, the "maintenance" simply does not exist.
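
The point about names versus numbers can be sketched in user-space C. The
following is only an illustration under assumed names (struct fake_drive,
run_command and the three commands are all made up here): commands are
looked up by their ASCII name in a table, so adding a new one means adding
a table entry rather than registering a new number anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-device state, used only for this illustration. */
struct fake_drive {
	int ejected;
	int locked;
};

typedef int (*cmd_handler)(struct fake_drive *);

static int do_eject(struct fake_drive *d)
{
	if (d->locked)
		return -1;	/* refuse to eject a locked drive */
	d->ejected = 1;
	return 0;
}

static int do_lock(struct fake_drive *d)   { d->locked = 1; return 0; }
static int do_unlock(struct fake_drive *d) { d->locked = 0; return 0; }

/* Commands are found by name; no central number registry is involved. */
static const struct {
	const char *name;
	cmd_handler fn;
} commands[] = {
	{ "eject",  do_eject  },
	{ "lock",   do_lock   },
	{ "unlock", do_unlock },
};

/* Returns the handler's result, or -1 for an unknown command name. */
int run_command(struct fake_drive *d, const char *name)
{
	size_t i;

	assert(d != NULL && name != NULL);
	for (i = 0; i < sizeof(commands) / sizeof(commands[0]); i++)
		if (strcmp(commands[i].name, name) == 0)
			return commands[i].fn(d);
	return -1;
}
```

An unknown name such as "rewind" simply fails the lookup, which is the
string equivalent of an unsupported ioctl number.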

		Linus



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 18:04                               ` Jeff Garzik
@ 2001-05-15 18:15                                 ` Linus Torvalds
  2001-05-15 19:36                                   ` Jonathan Lundell
  2001-05-17 21:23                                   ` Kai Henningsen
  2001-05-15 19:33                                 ` Kai Henningsen
  2001-05-16  7:25                                 ` Geert Uytterhoeven
  2 siblings, 2 replies; 317+ messages in thread
From: Linus Torvalds @ 2001-05-15 18:15 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: James Simmons, Alan Cox, Neil Brown, H. Peter Anvin,
	Linux Kernel Mailing List, viro


On Tue, 15 May 2001, Jeff Garzik wrote:
> 
> > Now, if we just fundamentally try to think about any device as being
> > hot-pluggable, you realize that things like "which PCI slot is this device
> > in" are completely _worthless_ as device identification, because they
> > fundamentally take the wrong approach, and they don't fit the generic
> > approach at all.
> 
> Should I interpret this as you disagreeing with
> exporting-bus-info-to-userspace type additions?  ie. some random
> get-info ioctl spits out pci_dev->slot_name to userspace.

Yes and no.

I'm absolutely _not_ against exporting information. That kind of
information can be very useful to help the user diagnose things, by
visualizing device layout etc (ie think here of a "device
manager" application that does all the pretty graphics that people so
enjoy).

Giving that kind of information to the user can be very useful indeed. And
I have no arguments against it.

The part I absolutely detest is when the information becomes more than
just "information", and is used to enforce a world-view. Anybody who uses
physical location for naming devices (ie you have to know where the hell
the thing is in order to look it up), is so far out to lunch that it's not
even funny. And the sad fact is that this is pretty much how ALL unixes
have historically done things ("Oh, you want to see the disk? Sure. It's
on scsi bus 1, channel 2, ID 3, lun 0, so you just open /dev/s1c3l0 and
you're done! Easy as pie!").

Keep it informational. And NEVER EVER make it part of the design.

That way, people who grew up with big unix machines can have their scripts
that creates the stupid names dynamically on the fly, and still play at
being bound to a static naming scheme that was silly 20 years ago and is
just incredibly stupid today. There's a script for doing exactly this for
SCSI. I forget what it's called, because I obviously think the thing is
stupid, but giving people the power to do even silly things is what Linux
is all about.

		Linus



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 17:44                               ` James Simmons
@ 2001-05-15 18:18                                 ` Ingo Oeser
  2001-05-15 18:36                                   ` James Simmons
  2001-05-15 18:42                                 ` Alexander Viro
  2001-05-16  8:29                                 ` Helge Hafting
  2 siblings, 1 reply; 317+ messages in thread
From: Ingo Oeser @ 2001-05-15 18:18 UTC (permalink / raw)
  To: James Simmons; +Cc: Alexander Viro, Linux Kernel Mailing List

On Tue, May 15, 2001 at 10:44:23AM -0700, James Simmons wrote:
> different. I do plan on some day merging drm and fbdev into one interface. So
> I plan to change this behavior. I like to see this interface ioctl-less
> (is there such a word ???). You mmap to alter buffers. Mmap is much more
> flexible than write for graphics buffers anyways. You use write to pass
> "data" to the driver.

The only problem with mmap(): You cannot know if the page
changed under your a**.

What would first mmap()ed page of the screen look like, if some
accelerator wrote a line there? Invalidating all mmap()ed pages
for each and every accelerator command would be evil. Forbidding
reads of that page is evil, too.

I have the same problem with DSPs, which like to mmap() some of
their memory into the application, but can alter this memory
every instruction they execute.

mmap() has its beauties, but ...

Regards

Ingo Oeser
-- 
10.+11.03.2001 - 3. Chemnitzer LinuxTag <http://www.tu-chemnitz.de/linux/tag>
         <<<<<<<<<<<<     been there and had much fun   >>>>>>>>>>>>


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 17:43                             ` Linus Torvalds
  2001-05-15 18:04                               ` Jeff Garzik
@ 2001-05-15 18:19                               ` James Simmons
  2001-05-15 20:23                               ` Alan Cox
  2001-05-15 21:52                               ` Andreas Dilger
  3 siblings, 0 replies; 317+ messages in thread
From: James Simmons @ 2001-05-15 18:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff Garzik, Alan Cox, Neil Brown, H. Peter Anvin,
	Linux Kernel Mailing List, viro


> And my opinion is that the "hot-plugged" approach works for devices even
> if they are soldered down - the "plugging" event just always happens
> before the OS is booted, and people just don't unplug it. 

I absolutely agree with you. This is the way I always think of devices.

> Now, if we just fundamentally try to think about any device as being
> hot-pluggable, you realize that things like "which PCI slot is this device
> in" are completely _worthless_ as device identification, because they
> fundamentally take the wrong approach, and they don't fit the generic
> approach at all.
>
> But this is also why I don't think static device numbers make any
> sense. It's silly to have the same disk show up as different devices just
> because it is connected to a different kind of controller. And it is
> _really_ silly to statically pre-allocate device numbers based on the
> "location" of a device. 

   I agree. It gets worse when we consider multiple-bus and NUMA machines.
On a NUMA system you have to ask yourself how a piece of hardware should
look from the node it is on and from another node. Should they look the
same? Then there is the visibility issue: which node can see this piece of
hardware? Some devices span several nodes, some only one node. On
some NUMA systems you can also set up "raid"-like systems: if one piece of
hardware fails on a node, then another piece of hardware on some other node
that was caching the state of the hardware that just failed takes
over. Then we have the famous ISA devices in multiple-PCI-bus systems.
NUMA systems with nodes that have multiple buses would just compound the
problem.
 



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 18:18                                 ` Ingo Oeser
@ 2001-05-15 18:36                                   ` James Simmons
  0 siblings, 0 replies; 317+ messages in thread
From: James Simmons @ 2001-05-15 18:36 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: Alexander Viro, Linux Kernel Mailing List


> The only problem with mmap(): You cannot know if the page
> changed under your a**.
> 
> What would first mmap()ed page of the screen look like, if some
> accelerator wrote a line there? Invalidating all mmap()ed pages
> for each and every accelerator command would be evil. Forbidding
> reads of that page is evil, too.
> 
> I have the same problem with DSPs, which like to mmap() some of
> their memory into the application, but can alter this memory
> every instruction they execute.

I have known about this problem for some time :-( Unfortunately most cards
don't have OpenGL or some similar API on chip. Of course you don't have to
invalidate all the mappings, only the ones the accelerator affected. This
plus proper serialization could overcome the problem.




* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 17:44                               ` James Simmons
  2001-05-15 18:18                                 ` Ingo Oeser
@ 2001-05-15 18:42                                 ` Alexander Viro
  2001-05-16  8:29                                 ` Helge Hafting
  2 siblings, 0 replies; 317+ messages in thread
From: Alexander Viro @ 2001-05-15 18:42 UTC (permalink / raw)
  To: James Simmons
  Cc: Linus Torvalds, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List



On Tue, 15 May 2001, James Simmons wrote:

> Well creating a new device wouldn't make Linus happy right now. I do
> agree ioctl calls are evil!!!! You only have X amount of them. With write
> you can have an infinite number of different functions to perform on a
> device. I didn't design fbdev :-( If I did it would have been far
> different. I do plan on some day merging drm and fbdev into one interface. So
> I plan to change this behavior. I like to see this interface ioctl-less
> (is there such a word ???). You mmap to alter buffers. Mmap is much more
> flexible than write for graphics buffers anyways. You use write to pass
> "data" to the driver.

For data - maybe (but you lose any semblance of network transparency).
For commands? No fscking way in hell.

Look, we used to live in the world where every bloody action with
every bloody device required a special application (or macro, or
special library or equivalent abortion). It sucked. It _still_
sucks that way in CP/M and VMS lands.

The only reason for such suckitude is laziness. We shouldn't need to
do fscking voodoo to change modem speed. We shouldn't need it to change
font. We shouldn't need it to rewind tape or format disk. We _have_ to
do it because "ioctls are good enough and allow to do it fast, dirty and
without thinking about good APIs".

But guess what? mmap() also means that you need special applications.
mmap() also doesn't work over the network. You may need it as a performance
hack for massive data transfers, but if you hit memory bandwidth limitations
on stuff like changing palette... Well, maybe you should spend time doing
something more productive. Like picking nose or masturbating.



* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 23:18                         ` Arjan van de Ven
  2001-05-14 23:20                           ` Alan Cox
@ 2001-05-15 18:57                           ` Kai Henningsen
  1 sibling, 0 replies; 317+ messages in thread
From: Kai Henningsen @ 2001-05-15 18:57 UTC (permalink / raw)
  To: linux-kernel

alan@lxorguk.ukuu.org.uk (Alan Cox)  wrote on 15.05.01 in <E14zRdW-0001gY-00@the-village.bc.nu>:

> > > it to a device number at /sbin/lilo time.  An idiotic practice on the
> > > part of LILO, in my opinion, that ought to have been fixed a long time
> > > ago.
> >
> > That's why you want mount-root-by-partition-label, not by device
>
> Which in itself adds the 'and how does the label tell me what modules to
> load' question..

The same way the old root device does: not. Why should it? It doesn't tell  
you what to eat for lunch either.

MfG Kai


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 18:04                             ` Linus Torvalds
@ 2001-05-15 18:58                               ` Johannes Erdfelt
  2001-05-15 19:17                                 ` Linus Torvalds
  2001-05-17 20:40                                 ` Kai Henningsen
  2001-05-15 20:03                               ` James Simmons
                                                 ` (2 subsequent siblings)
  3 siblings, 2 replies; 317+ messages in thread
From: Johannes Erdfelt @ 2001-05-15 18:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: James Simmons, Alexander Viro, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List

On Tue, May 15, 2001, Linus Torvalds <torvalds@transmeta.com> wrote:
> 
> On Tue, 15 May 2001, James Simmons wrote:
> > > 
> > > And if write() has too much overhead - we'd better fix _that_, because
> > > it's much more likely hotspot than ioctl ever will be.
> > 
> > I would use write except we use write to draw into the framebuffer. If I
> > write to the framebuffer with that data the only thing that will happen is
> > I will get pretty colors on my screen. 
> 
> Note that this was the same argument that the USB people had, and it was
> wrong then. It's wrong now.

Hardly pretty and definitely not perfect, but that's mostly because we
haven't solved the other problems yet.

> The USB people decided on using ioctl's, because the way USB works you
> send a packet down a "USB pipe", which is identified by the direction, the
> device number and the type (and other details). So what the USB system
> does to expose this to user land is very similar to what you propose for
> ioctl's: a structured ioctl that has a "data" field.
> 
> What Al is saying, and what makes perfect sense is that you generate a
> separate fd for each "pipe". It's even more obvious in the case of USB,
> because, by golly, the things are actually _called_ "pipes" in the USB
> documentation, which should have made people make the immediate
> association. Instead of doing

That was not why we did it that way. We used ioctl's because there is no
way to apply USB semantics to file streams.

> 	fd = open("unstructured-name" ...);
> 	ioctl(fd, MAGICIOCTL, { structured data });
> 
> you do
> 
> 	fd = open("/structured/name", ...);
> 	write(fd, data, size);

That was the plan, but unfortunately USB pipes aren't what you or I
would consider a pipe normally.

Take isochronous pipes for instance. How would we apply that to a
normal file stream? Or an interrupt pipe?

Even bulk has issues because USB pipes aren't necessarily streams; they
can be packetized in the pseudo-weird way that USB does things.

I'm all for separating out each endpoint into a separate file in your
/structured/name way since it's necessary for multi-interface devices,
but we still have problems with sharing the default control pipe for
instance.

> or possibly you take a more socket-like approach and do
> 
> 	fd = socket(part-of-the-structure);
> 	bind(fd, more-of-the-structure)
> 	connect(fd, last-part-of-the-structure);

I don't like sockets since we do have a well-bounded set of endpoints. We
don't have 4 billion IP's with 64k ports to choose from. We have x
endpoints that the device tells us about ahead of time.

> and use write() there (or use "sendto()" etc which allow more dynamic
> structure constructs - you don't have to statically bind the fd early at
> bind/connect time.

sendto() is the only reason I can think of that sockets would be useful.
Bulk pipes are more like a sequenced datagram transport than anything.

However, it still doesn't seem to apply cleanly to the other 3 pipe
types.

Perhaps we can create some new families specifically for USB?

> This, btw, is Al Viro's wet dream. But I have to agree: using name spaces
> etc is MUCH preferable to ioctl's, makes code more readable and logical,
> and often makes it possible to do things you couldn't sanely do before
> (control these things from scripts etc).

I think this is an excellent idea as well. USB is just a poor example
since the real problem with USB is not the naming, it's applying USB
semantics to file streams or sockets.

JE



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 18:58                               ` Johannes Erdfelt
@ 2001-05-15 19:17                                 ` Linus Torvalds
  2001-05-15 19:23                                   ` H. Peter Anvin
  2001-05-15 19:43                                   ` Johannes Erdfelt
  2001-05-17 20:40                                 ` Kai Henningsen
  1 sibling, 2 replies; 317+ messages in thread
From: Linus Torvalds @ 2001-05-15 19:17 UTC (permalink / raw)
  To: Johannes Erdfelt
  Cc: James Simmons, Alexander Viro, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List


On Tue, 15 May 2001, Johannes Erdfelt wrote:
> 
> Even bulk has issues because USB pipes aren't necessarily streams; they
> can be packetized in the pseudo-weird way that USB does things.

This is ok. "pipe" does not mean that the write data doesn't have
boundaries.

Think about UDP. It's done with file descriptors, yet it is very much
packetized. 

Even a regular "pipe" actually has packet behaviour: a single write of at
most PIPE_BUF bytes is guaranteed by UNIX to complete atomically, which is
exactly so that people can use pipes in a "packet" environment.

A file descriptor does NOT imply that the data you read or write must be
one mushy stream of bytes. It's ok to honour write() packet boundaries
etc.
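
A minimal sketch of a file descriptor that honours write() boundaries,
assuming an AF_UNIX datagram socket pair as a stand-in (this is not USB
code, and the function name packet_demo is made up for the example). Two
write() calls of 3 and 5 bytes are read back by two read() calls that
return 3 and 5 bytes respectively, even though the read buffer could hold
both packets at once.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write two "packets" with write() and read them back with read().
 * Each read() returns exactly one packet; the lengths are stored in
 * len1 and len2. Returns 0 on success, -1 on a system call failure.
 */
int packet_demo(ssize_t *len1, ssize_t *len2)
{
	int sv[2];
	char buf[64];
	int rc = -1;

	assert(len1 != NULL && len2 != NULL);

	/* A datagram socket pair: plain file descriptors, packet semantics. */
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	if (write(sv[0], "abc", 3) != 3 || write(sv[0], "defgh", 5) != 5)
		goto out;

	/* The buffer could hold both packets, but boundaries are kept. */
	*len1 = read(sv[1], buf, sizeof(buf));
	*len2 = read(sv[1], buf, sizeof(buf));
	if (*len1 >= 0 && *len2 >= 0)
		rc = 0;
out:
	close(sv[0]);
	close(sv[1]);
	return rc;
}
```

A plain pipe() would behave differently on the read side: it keeps each
small write atomic, but a single large read() may return the bytes of
several writes together.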

You should absolutely NOT think that "we cannot send a packet down the
control pipe because multiple writers might confuse each other". You can
still require that separate packets be cleanly delimited.

It's a huge mistake to think that you _have_ to use ioctl's to get
"packet" behaviour, or to get structured reads/writes. 

The advantage of read/write is that it doesn't _force_ a packet on you,
but the kernel really doesn't care if you have some structure to your read
and write requests.

> > or possibly you take a more socket-like approach and do
> > 
> > 	fd = socket(part-of-the-structure);
> > 	bind(fd, more-of-the-structure)
> > 	connect(fd, last-part-of-the-structure);
> 
> I don't like sockets since we do have a well-bounded set of endpoints. We
> don't have 4 billion IP's with 64k ports to choose from. We have x
> endpoints that the device tells us about ahead of time.

Note that "sockets" != "IPv4". Sockets just have names, they can be IPv4
(4+2 byte things), they can be pathnames (UNIX domain) and they can be
large IPv6 (16+2 or whatever). Or they could be small USB names. There's
nothing fundamentally wrong with "binding" a one-byte address and a
one-byte "interface" name. You'd just create a AF_USB layer ;)

But no, I don't actually like sockets all that much myself. They are hard
to use from scripts, and many more people are familiar with open/close and
read/write.

		Linus



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 19:17                                 ` Linus Torvalds
@ 2001-05-15 19:23                                   ` H. Peter Anvin
  2001-05-15 19:43                                   ` Johannes Erdfelt
  1 sibling, 0 replies; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-15 19:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Erdfelt, James Simmons, Alexander Viro, Alan Cox,
	Neil Brown, Jeff Garzik, Linux Kernel Mailing List

Linus Torvalds wrote:
> 
> But no, I don't actually like sockets all that much myself. They are hard
> to use from scripts, and many more people are familiar with open/close and
> read/write.
> 

I always thought it was really strange that I couldn't open() a AF_UNIX
socket in order to write() to it (as a stream socket, of course.)  It
really makes a lot of things harder to do than it needs to be, and I
would still like to see this generalization done.
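
The behaviour being described can be observed with a short sketch. The
helper name open_errno_on_socket and whatever path is passed to it are
assumptions made for illustration. It binds an AF_UNIX stream socket to a
path and then calls open() on that path; on Linux the open() is documented
to fail with ENXIO rather than connect to the socket.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Bind a stream socket to `path`, then try to open() that path.
 * Returns the errno from open() (ENXIO on Linux), 0 if open() somehow
 * succeeded, or -1 if the socket could not be set up.
 */
int open_errno_on_socket(const char *path)
{
	struct sockaddr_un addr;
	int sock, fd, err;

	assert(path != NULL);
	if (strlen(path) >= sizeof(addr.sun_path))
		return -1;

	sock = socket(AF_UNIX, SOCK_STREAM, 0);
	if (sock < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, path);
	unlink(path);	/* remove a stale socket from an earlier run */

	if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(sock, 1) < 0) {
		close(sock);
		return -1;
	}

	/* The socket exists in the filesystem, yet open() refuses it. */
	fd = open(path, O_WRONLY);
	err = (fd < 0) ? errno : 0;
	if (fd >= 0)
		close(fd);

	close(sock);
	unlink(path);
	return err;
}
```

A caller that wants to write to such a socket therefore has to go through
socket() and connect() instead of open().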

That being said, if USB exported a filesystem I don't see any good reason
why you shouldn't be able to advertise "socket" (S_ISSOCK()) objects and
simply have them accept open("/dev/usb/blah/blah") instead of
connect(AF_USB, ...) -- and still use send() and recv() where it is more
appropriate to do so than using read() and write().

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 15:10                     ` Linus Torvalds
                                         ` (2 preceding siblings ...)
  2001-05-15 18:02                       ` Ingo Oeser
@ 2001-05-15 19:31                       ` Richard Gooch
  2001-05-15 19:37                         ` H. Peter Anvin
                                           ` (2 more replies)
  2001-05-15 20:58                       ` Alan Cox
  2001-05-15 21:42                       ` Chip Salzenberg
  5 siblings, 3 replies; 317+ messages in thread
From: Richard Gooch @ 2001-05-15 19:31 UTC (permalink / raw)
  To: Ingo Oeser
  Cc: Linus Torvalds, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List, viro

Ingo Oeser writes:
> On Tue, May 15, 2001 at 08:10:29AM -0700, Linus Torvalds wrote:
> > and I don't see why we couldn't expose the "driver
> > name" for any file descriptor.
> 
> Because we don't like to replace:
> 
>    if (st.device == MAJOR_1)
>       bla
>    else if ...
> 
> with
> 
>    if (!strcmp(st.device,"driver_1") )
>       bla
>    else if ...
> 
> ?
> 
> There is no win doing it this way, because every time we add a
> new driver that fits or change the name of one, we need to add
> support for it.

Now look at how we can already do these things with devfs. Let's say
I've opened /dev/cdroms/cdrom0 and it's sitting on fd=3.
% ls -lF /proc/self/fd/3
lrwx------   1 root     root           64 May 15 13:24 /proc/self/fd/3 -> /dev/ide/host0/bus0/target1/lun0/cd

So, in my application I do:
	len = readlink ("/proc/self/fd/3", buffer, buflen - 1);
	if (len < 0) {
		perror ("readlink");
		exit (1);
	}
	buffer[len] = '\0';	/* readlink() does not NUL-terminate */
	if (len < 2 || strcmp (buffer + len - 2, "cd") != 0) {
		fprintf (stderr, "Not a CD-ROM! Bugger off.\n");
		exit (1);
	}
	if (strncmp (buffer, "/dev/ide", 8) == 0) do_ide (fd);
	else if (strncmp (buffer, "/dev/scsi", 9) == 0) do_scsi (fd);
	else do_generic (fd);

That's a lot cleaner than relying on magic numbers, IMNSHO.
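
For reference, the lookup step can be written as a small self-contained
helper. This is a sketch under assumptions: the names fd_to_path and
has_suffix are invented here, and it relies only on the Linux
/proc/self/fd/<n> symlinks, not on devfs. The devfs path layout shown in
the listing above is not something this code can confirm.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Resolve the path behind an open descriptor via /proc/self/fd/<fd>.
 * readlink() does not NUL-terminate, so the buffer is terminated here.
 * Returns the path length, or -1 on error or if the buffer is too small.
 */
ssize_t fd_to_path(int fd, char *buf, size_t buflen)
{
	char link[64];
	ssize_t len;

	assert(buf != NULL);
	if (buflen < 2)
		return -1;

	snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
	len = readlink(link, buf, buflen - 1);
	if (len < 0 || (size_t)len >= buflen - 1)
		return -1;	/* error, or the result may be truncated */
	buf[len] = '\0';
	return len;
}

/* True if `path` ends with `suffix` (for example "/cd" under devfs). */
int has_suffix(const char *path, const char *suffix)
{
	size_t plen = strlen(path), slen = strlen(suffix);

	return plen >= slen && strcmp(path + plen - slen, suffix) == 0;
}
```

Matching on a full path component such as "/cd" rather than the last two
characters avoids accepting unrelated names that merely end in "cd".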

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 18:04                               ` Jeff Garzik
  2001-05-15 18:15                                 ` Linus Torvalds
@ 2001-05-15 19:33                                 ` Kai Henningsen
  2001-05-16  7:25                                 ` Geert Uytterhoeven
  2 siblings, 0 replies; 317+ messages in thread
From: Kai Henningsen @ 2001-05-15 19:33 UTC (permalink / raw)
  To: linux-kernel

torvalds@transmeta.com (Linus Torvalds)  wrote on 15.05.01 in <Pine.LNX.4.21.0105151107290.2112-100000@penguin.transmeta.com>:

> just incredibly stupid today. There's a script for doing exactly this for
> SCSI. I forget what it's called, because I obviously think the thing is
> stupid, but giving people the power to do even silly things is what Linux
> is all about.

Are you maybe talking about scsidev? It can produce names like
/dev/scsi/sdh24-e000c0i12l0p1 (ugh). It can *also* create names like
/dev/scsi/QAt-p3 for "that's the third partition on the Quantum Atlas, I  
shouldn't put important stuff there because Quantums like to break". (The  
QAt part comes from a config file.)

The latter I've used for quite a while (until I found mount-by-UUID). The  
former is unspeakably ugly.

MfG Kai


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 18:15                                 ` Linus Torvalds
@ 2001-05-15 19:36                                   ` Jonathan Lundell
  2001-05-15 20:18                                     ` Linus Torvalds
                                                       ` (2 more replies)
  2001-05-17 21:23                                   ` Kai Henningsen
  1 sibling, 3 replies; 317+ messages in thread
From: Jonathan Lundell @ 2001-05-15 19:36 UTC (permalink / raw)
  To: Linus Torvalds, Jeff Garzik
  Cc: James Simmons, Alan Cox, Neil Brown, H. Peter Anvin,
	Linux Kernel Mailing List, viro

At 11:15 AM -0700 2001-05-15, Linus Torvalds wrote:
>The part I absolutely detest is when the information becomes more than
>just "information", and is used to enforce a world-view. Anybody who uses
>physical location for naming devices (ie you have to know where the hell
>the thing is in order to look it up), is so far out to lunch that it's not
>even funny. And the sad fact is that this is pretty much how ALL unixes
>have historically done things ("Oh, you want to see the disk? Sure. It's
>on scsi bus 1, channel 2, ID 3, lun 0, so you just open /dev/s1c3l0 and
>you're done! Easy as pie!").
>
>Keep it informational. And NEVER EVER make it part of the design.

What about:

1 (network domain). I have two network interfaces that I connect to 
two different network segments, eth0 & eth1; they're ifconfig'd to 
the appropriate IP and MAC addresses. I really do need to know 
physically which (physical) hole to plug my eth0 cable into. 
(Extension: same situation, but it's a firewall and I've got 12 ports 
to connect.) (Extension #2: if I add a NIC to the system and reboot, 
I'd really prefer that the NICs already in use didn't get renumbered.)

2 (disk domain). I have multiple spindles on multiple SCSI adapters. 
I want to allocate them to more than one RAID0/1/5 set, with the 
usual considerations of putting mirrors on different adapters, 
spreading my RAID5 drives optimally, ditto stripes. I need (eg) SCSI 
paths to config all this, and I further need real physical locations 
to identify failed drives that need to be hot-replaced. The mirror 
members will move around as drives are replaced and hot spares come 
into play.

Seems like more than merely informational.

(A side observation: PCI or SCSI bus/device/lun/etc paths are not 
physical locations; you also need external hardware-specific 
knowledge to be able to talk about real physical locations in a way 
that does the system operator any good.)
-- 
/Jonathan Lundell.


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 19:31                       ` Richard Gooch
@ 2001-05-15 19:37                         ` H. Peter Anvin
  2001-05-15 20:10                         ` Alan Cox
  2001-05-15 21:41                         ` Richard Gooch
  2 siblings, 0 replies; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-15 19:37 UTC (permalink / raw)
  To: Richard Gooch
  Cc: Ingo Oeser, Linus Torvalds, Alan Cox, Neil Brown, Jeff Garzik,
	Linux Kernel Mailing List, viro

Richard Gooch wrote:
> 
> Ingo Oeser writes:
> > On Tue, May 15, 2001 at 08:10:29AM -0700, Linus Torvalds wrote:
> > > and I don't see why we couldn't expose the "driver
> > > name" for any file descriptor.
> >
> > Because we don't like to replace:
> >
> >    if (st.device == MAJOR_1)
> >       bla
> >    else if ...
> >
> > with
> >
> >    if (!strcmp(st.device,"driver_1") )
> >       bla
> >    else if ...
> >
> > ?
> >
> > There is no win doing it this way, because every time we add a
> > new driver that fits or change the name of one, we need to add
> > support for it.
> 
> Now look at how we can already do these things with devfs. Let's say
> I've opened /dev/cdroms/cdrom0 and it's sitting on fd=3.
> % ls -lF /proc/self/fd/3
> lrwx------   1 root     root           64 May 15 13:24 /proc/self/fd/3 -> /dev/ide/host0/bus0/target1/lun0/cd
> 
> So, in my application I do:
>         len = readlink ("/proc/self/3", buffer, buflen);
>         if (strcmp (buffer + len - 2, "cd") != 0) {
>                 fprintf (stderr, "Not a CD-ROM! Bugger off.\n");
>                 exit (1);
>         }
>         if (strncmp (buffer, "/dev/ide", 8) == 0) do_ide (fd);
>         else if (strncmp (buffer, "/dev/scsi", 9) == 0) do_scsi (fd);
>         else do_generic (fd);
> 
> That's a lot cleaner than relying on magic numbers, IMNSHO.
> 

You know, Richard, this was an example on what *NOT* to do!

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 19:17                                 ` Linus Torvalds
  2001-05-15 19:23                                   ` H. Peter Anvin
@ 2001-05-15 19:43                                   ` Johannes Erdfelt
  2001-05-15 21:58                                     ` Chip Salzenberg
                                                       ` (3 more replies)
  1 sibling, 4 replies; 317+ messages in thread
From: Johannes Erdfelt @ 2001-05-15 19:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: James Simmons, Alexander Viro, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List

On Tue, May 15, 2001, Linus Torvalds <torvalds@transmeta.com> wrote:
> 
> On Tue, 15 May 2001, Johannes Erdfelt wrote:
> > 
> > Even bulk has issues because USB pipes aren't necessarily streams; they
> > can be packetized in the pseudo-weird way that USB does things.
> 
> This is ok. "pipe" does not mean that the write data doesn't have
> boundaries.
> 
> Think about UDP. It's done with file descriptors, yet it is very much
> packetized. 
> 
> Even a regular "pipe" actually has packet behaviour: a single write of at
> most PIPE_BUF bytes is guaranteed by UNIX to complete atomically, which is
> exactly so that people can use pipes in a "packet" environment.
> 
> A file descriptor does NOT imply that the data you read or write must be
> one mushy stream of bytes. It's ok to honour write() packet boundaries
> etc.
> 
> You should absolutely NOT think that "we cannot send a packet down the
> control pipe because multiple writers might confuse each other". You can
> still require that separate packets be cleanly delimited.
> 
> It's a huge mistake to think that you _have_ to use ioctl's to get
> "packet" behaviour, or to get structured reads/writes. 

We never made that assumption. We used ioctls since that was the easiest
and most consistent way of solving the problem at the moment. We never
said it was the perfect solution :)

> The advantage of read/write is that it doesn't _force_ a packet on you,
> but the kernel really doesn't care if you have some structure to your read
> and write requests.

No argument here. I completely agree.
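
A minimal sketch of the point about write() boundaries, assuming a
Linux/POSIX system and using an AF_UNIX datagram socketpair purely as a
stand-in for a packetized device pipe. The function name
packet_boundaries_demo is made up for illustration and is not part of any
USB API:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write two "packets" into one end of a datagram socketpair and read them
 * back from the other end. Stores the length of each read in lens[0] and
 * lens[1]. Returns 0 on success, -1 on any failure.
 */
int packet_boundaries_demo(ssize_t lens[2])
{
	int sv[2];
	char buf[64];
	int rc = -1;

	assert(lens != NULL);

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
		return -1;

	/* Two separate write() calls: each one is a packet of its own. */
	if (write(sv[0], "abc", 3) != 3 || write(sv[0], "defgh", 5) != 5)
		goto out;

	/* The buffer could hold both, yet each read() returns one packet. */
	lens[0] = read(sv[1], buf, sizeof buf);
	lens[1] = read(sv[1], buf, sizeof buf);
	if (lens[0] < 0 || lens[1] < 0)
		goto out;
	rc = 0;
out:
	close(sv[0]);
	close(sv[1]);
	return rc;
}
```

Nothing here needs an ioctl: the structure comes entirely from how the
writes are issued, which is the behaviour a driver could choose to honour.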

The problem with the shared control pipe is not 2 writers stomping on
each other, it's permissions. You can have 2 interfaces on one device
which are completely separate from each other and you'd like 2 separate
users/programs to have access to each interface. Each endpoint is
guaranteed to be unique to an interface, except for the default control
pipe.

A simple solution would be to clone the default control pipe for each
interface and manage the permissions independently.

The major problem with read/write and USB is that while it can solve the
problem for control and bulk pipes, it can't for interrupt and
isochronous pipes.

> > > or possibly you take a more socket-like approach and do
> > > 
> > > 	fd = socket(part-of-the-structure);
> > > 	bind(fd, more-of-the-structure)
> > > 	connect(fd, last-part-of-the-structure);
> > 
> > I don't like sockets since we do have a well-bounded set of endpoints. We
> > don't have 4 billion IP's with 64k ports to choose from. We have x
> > endpoints that the device tells us about ahead of time.
> 
> Note that "sockets" != "IPv4". Sockets just have names, they can be IPv4
> (4+2 byte things), they can be pathnames (UNIX domain) and they can be
> large IPv6 (16+2 or whatever). Or they could be small USB names. There's
> nothing fundamentally wrong with "binding" a one-byte address and a
> one-byte "interface" name. You'd just create an AF_USB layer ;)

I had always made the assumption that sockets were created because you
couldn't easily map IPv4 semantics onto filesystems. It's unreasonable
to have a file for every possible IP address/port you can communicate
with.

It's not so unreasonable with USB however since the data set (endpoints)
is significantly smaller and manageable. It can be placed in the
filesystem namespace without any problems.

That being said, we can't solve all of USB's problems that way. AF_USB
would probably solve them all however.

> But no, I don't actually like sockets all that much myself. They are hard
> to use from scripts, and many more people are familiar with open/close and
> read/write.

Agreed.

It would be nice to use open/close/read/write for control and bulk and
sockets for interrupt and isochronous.

Although I think that's just too complicated. It's probably easier to
make everything a socket and deal with it that way.

JE



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 17:27                           ` James Simmons
  2001-05-15 17:43                             ` Linus Torvalds
@ 2001-05-15 20:02                             ` Dan Hollis
  1 sibling, 0 replies; 317+ messages in thread
From: Dan Hollis @ 2001-05-15 20:02 UTC (permalink / raw)
  To: James Simmons
  Cc: Jeff Garzik, Alan Cox, Neil Brown, H. Peter Anvin,
	Linux Kernel Mailing List

On Tue, 15 May 2001, James Simmons wrote:
> Actually there are hotplug video cards. High end servers have hot swappable
> graphics cards. Would you want to take down a very important server
> because the graphics card went dead? You pull it out and you plug a new
> one in. Also there are PCMCIA video cards. I have seen them for the hand
> held ipaqs. It is only a matter of time before all devices are hot
> swappable.

All PCI is potentially hot pluggable right now.

There is also firewire to contend with.

-Dan



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 18:04                             ` Linus Torvalds
  2001-05-15 18:58                               ` Johannes Erdfelt
@ 2001-05-15 20:03                               ` James Simmons
  2001-05-15 20:06                                 ` H. Peter Anvin
                                                   ` (3 more replies)
  2001-05-15 21:22                               ` Jan Harkes
  2001-05-15 21:39                               ` Martin Dalecki
  3 siblings, 4 replies; 317+ messages in thread
From: James Simmons @ 2001-05-15 20:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Viro, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List


> What Al is saying, and what makes perfect sense is that you generate a
> separate fd for each "pipe". It's even more obvious in the case of USB,
> because, by golly, the things are actually _called_ "pipes" in the USB
> documentation, which should have made people make the immediate
> association. Instead of doing

Graphics cards are the same way. Especially high end ones. They have pipes
as well. For low end cards you can think of them as single pipeline cards
with one pipe.

> See? 
> 
> Don't get boxed in by thinking that you only have one fd. Even if you have
> only one _device_node_, you can have multiple fd's. In fact, you can, with
> the Linux VFS layer, fairly easily do things like
> 
> 	mknod /dev/fd0 c X Y
> 
> and then use
> 
> 	fd = open("/dev/fd0/colourspace", O_RDWR);

Yipes!! I have to say UNIX has a tendency to teach you ioctl is the only
way. I have never thought outside of the box, nor seen anyone else think in
this manner. This is absolutely brilliant!!! I can see a lot of
possibilities with this.




* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:03                               ` James Simmons
@ 2001-05-15 20:06                                 ` H. Peter Anvin
  2001-05-15 20:28                                   ` James Simmons
  2001-05-15 20:14                                 ` Alexander Viro
                                                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-15 20:06 UTC (permalink / raw)
  To: James Simmons
  Cc: Linus Torvalds, Alexander Viro, Alan Cox, Neil Brown,
	Jeff Garzik, Linux Kernel Mailing List

James Simmons wrote:
> >
> > Don't get boxed in by thinking that you only have one fd. Even if you have
> > only one _device_node_, you can have multiple fd's. In fact, you can, with
> > the Linux VFS layer, fairly easily do things like
> >
> >       mknod /dev/fd0 c X Y
> >
> > and then use
> >
> >       fd = open("/dev/fd0/colourspace", O_RDWR);
> 
> Yipes!! I have to say UNIX has a tendency to teach you ioctl is the only
> way. I have never thought outside of the box, nor seen anyone else think in
> this manner. This is absolutely brilliant!!! I can see a lot of
> possibilities with this.
> 

I actually suggested something like this a while ago, mainly w.r.t. how
to deal with serial ports (e.g. /dev/ttyS0/callout instead of /dev/cua0).

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 19:31                       ` Richard Gooch
  2001-05-15 19:37                         ` H. Peter Anvin
@ 2001-05-15 20:10                         ` Alan Cox
  2001-05-15 21:41                         ` Richard Gooch
  2 siblings, 0 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-15 20:10 UTC (permalink / raw)
  To: Richard Gooch
  Cc: Ingo Oeser, Linus Torvalds, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List, viro

> 	len = readlink ("/proc/self/3", buffer, buflen);
> 	if (strcmp (buffer + len - 2, "cd") != 0) {
> 		fprintf (stderr, "Not a CD-ROM! Bugger off.\n");
> 		exit (1);

And on my box cd is the cabbage dicer whoops

What I actually want to know is 'does it do cd ioctls, can I talk scsi
commands to it, does it support cabbage dicing' ...



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:03                               ` James Simmons
  2001-05-15 20:06                                 ` H. Peter Anvin
@ 2001-05-15 20:14                                 ` Alexander Viro
  2001-05-15 20:30                                   ` H. Peter Anvin
                                                     ` (3 more replies)
  2001-05-15 20:17                                 ` H. Peter Anvin
  2001-05-15 21:59                                 ` Chip Salzenberg
  3 siblings, 4 replies; 317+ messages in thread
From: Alexander Viro @ 2001-05-15 20:14 UTC (permalink / raw)
  To: James Simmons
  Cc: Linus Torvalds, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List



On Tue, 15 May 2001, James Simmons wrote:

> > only one _device_node_, you can have multiple fd's. In fact, you can, with
> > the Linux VFS layer, fairly easily do things like
> > 
> > 	mknod /dev/fd0 c X Y
> > 
> > and then use
> > 
> > 	fd = open("/dev/fd0/colourspace", O_RDWR);
> 
> Yipes!! I have to say UNIX has a tendency to teach you ioctl is the only
> way. I have never thought outside of the box, nor seen anyone else think in
> this manner. This is absolutely brilliant!!! I can see a lot of
> possibilities with this.

The thing being, why the hell create these device/directory hybrids?
Driver can export a tree and we mount it on fb0. After that you have
the whole set - yes, /dev/fb0/colourspace, etc. - no problem. And no
need to do mknod, BTW. Yes, we'll need to use /dev/fb0/frame for
frame itself. BFD...

You see, as soon as you want slightly more structured stuff (deeper than
one level) you need the dentry tree, yodda, yodda. IOW, you need a
filesystem anyway and it's easy to implement. Want me to do framebufferfs?
Would make a nice demo.  No majors. No minors. No ioctls. Less code than
in current tree.  ~3 days to implement.



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:03                               ` James Simmons
  2001-05-15 20:06                                 ` H. Peter Anvin
  2001-05-15 20:14                                 ` Alexander Viro
@ 2001-05-15 20:17                                 ` H. Peter Anvin
  2001-05-15 21:59                                 ` Chip Salzenberg
  3 siblings, 0 replies; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-15 20:17 UTC (permalink / raw)
  To: James Simmons
  Cc: Linus Torvalds, Alexander Viro, Alan Cox, Neil Brown,
	Jeff Garzik, Linux Kernel Mailing List

James Simmons wrote:
> >
> >       fd = open("/dev/fd0/colourspace", O_RDWR);
> 
> Yipes!! I have to say UNIX has a tendency to teach you ioctl is the only
> way. I have never thought outside of the box, nor seen anyone else think in
> this manner. This is absolutely brilliant!!! I can see a lot of
> possibilities with this.
> 

By the way, since this is of general interest...

I asked the POSIX people if there was anything in the Austin (Unix 2002)
draft that would prohibit this behaviour.  The response was more or less
of the form "we are not really sure if it's within the spec, but it is
perfectly reasonable."  The (only) issue seems to be whether or not the
requirement to deliver ENOTDIR in cases like this is absolute or if this
is a permissible extension.  The way I interpret what I got back was
pretty much "go for it and don't worry about it."
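
For reference, a small sketch of the conventional behaviour the ENOTDIR
question is about, assuming a Linux/POSIX system with a writable /tmp. The
function name errno_for_path_under_file and the temporary file name are
made up for illustration. On an ordinary filesystem, looking up a name
beneath a non-directory fails:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Create a regular file, then try to open a name "beneath" it.
 * Returns the errno from the failed open(), 0 if the open unexpectedly
 * succeeded, or -1 if the temporary file could not be set up.
 */
int errno_for_path_under_file(void)
{
	char file[] = "/tmp/notdir-demo-XXXXXX";
	char sub[sizeof file + sizeof "/colourspace"];
	int fd, sub_fd, err, n;

	fd = mkstemp(file);
	if (fd < 0)
		return -1;
	close(fd);

	/* Build "<file>/colourspace"; the buffer is sized to fit exactly. */
	n = snprintf(sub, sizeof sub, "%s/colourspace", file);
	assert(n > 0 && (size_t)n < sizeof sub);

	sub_fd = open(sub, O_RDONLY);
	if (sub_fd >= 0) {
		err = 0;
		close(sub_fd);
	} else {
		err = errno;
	}
	unlink(file);
	return err;
}
```

A device node that answered such lookups itself would return a descriptor
here instead, which is the extension being asked about.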

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 19:36                                   ` Jonathan Lundell
@ 2001-05-15 20:18                                     ` Linus Torvalds
  2001-05-15 20:26                                       ` Dan Hollis
                                                         ` (6 more replies)
  2001-05-18  2:18                                     ` Jonathan Lundell
  2001-05-19  8:42                                     ` Kai Henningsen
  2 siblings, 7 replies; 317+ messages in thread
From: Linus Torvalds @ 2001-05-15 20:18 UTC (permalink / raw)
  To: Jonathan Lundell
  Cc: Jeff Garzik, James Simmons, Alan Cox, Neil Brown, H. Peter Anvin,
	Linux Kernel Mailing List, viro


On Tue, 15 May 2001, Jonathan Lundell wrote:
> >
> >Keep it informational. And NEVER EVER make it part of the design.
> 
> What about:
> 
> 1 (network domain). I have two network interfaces that I connect to 
> two different network segments, eth0 & eth1;

So?

Informational. You can always ask what "eth0" and "eth1" are.

There's another side to this: repeatability. A setup should be
_repeatable_.

This is what we have now. Network devices are called "eth0..N", and nobody
is complaining about the fact that the numbering is basically random. It
is _repeatable_ as long as you don't change your hardware setup, and the
numbering has effectively _nothing_ to do with "location".

You don't say "oh, I have my network card in PCI bus #2, slot #3,
subfunction #1, so I should do 'ifconfig netp2s3f1'". Right?

The location of the device is _meaningless_. 

Linux gets this right. We don't give 100Mbps cards different names from
10Mbps cards - and pcmcia cards show up in the same namespace as cardbus,
which is the same namespace as ISA. And it doesn't matter what _driver_ we
use.

The "eth0..N" naming is done RIGHT!

> 2 (disk domain). I have multiple spindles on multiple SCSI adapters. 

So? Same deal. You don't have eth0..N, you have disk0..N. 

What's the problem? It's _repeatable_, in that as long as you don't change
your disks, they'll show up the same way. But the 0..N doesn't imply that
the disks are anywhere special.

Linux gets this _somewhat_ right. The /dev/sdxxx naming is correct (or, if
you look at only IDE devices, /dev/hdxxx). The problem is that we don't
have a unified namespace, so unlike eth0..N we do _not_ have a unified
namespace for disks.

Your argument that names change if you add disks etc is complete crap. OF
COURSE they change. You cannot avoid it. Whatever scheme you use will
cause name-changes. The location-based one causes exactly the same kinds
of problems, except they are even worse - now you have to care which ID
your disk has etc. 

The argument that "if you use numbering based on where in the SCSI chain
the disk is, disks don't pop in and out" is absolute crap. It's not true
even for SCSI any more (there are devices that will acquire their location
dynamically), and it has never been true anywhere else. Give it up.

		Linus



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 17:43                             ` Linus Torvalds
  2001-05-15 18:04                               ` Jeff Garzik
  2001-05-15 18:19                               ` James Simmons
@ 2001-05-15 20:23                               ` Alan Cox
  2001-05-15 20:28                                 ` H. Peter Anvin
  2001-05-15 21:52                               ` Andreas Dilger
  3 siblings, 1 reply; 317+ messages in thread
From: Alan Cox @ 2001-05-15 20:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: James Simmons, Jeff Garzik, Alan Cox, Neil Brown, H. Peter Anvin,
	Linux Kernel Mailing List, viro

> And my opinion is that the "hot-plugged" approach works for devices even
> if they are soldered down - the "plugging" event just always happens
> before the OS is booted, and people just don't unplug it. So we might as

This is true on one condition. That you can ask the device what it is,
what it does and to an extent where it is and how you get to it.

Right now that's much of what majors are about - but I still believe this is
2.5 stuff

> show up in /dev, and everywhere else it is needed. And the logical
> extension of such a setup is to consider built-in devices to be plugged in
> at bootup.

Agreed

> [ The biggest silliness is this "let's try to make the disks appear in the
>   same order that the BIOS probes them". Now THAT is really stupid, and it
>   goes on a lot more than I'd ever like to see. ]

Right - Lilo needs to know, but nobody else should except when they need to
ask, e.g. to find which disk failed


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:18                                     ` Linus Torvalds
@ 2001-05-15 20:26                                       ` Dan Hollis
  2001-05-15 22:14                                         ` Miles Lane
  2001-05-15 21:29                                       ` Alex Bligh - linux-kernel
                                                         ` (5 subsequent siblings)
  6 siblings, 1 reply; 317+ messages in thread
From: Dan Hollis @ 2001-05-15 20:26 UTC (permalink / raw)
  To: Linux Kernel Mailing List

This thread is becoming high enough volume and likely to become much more
so, perhaps a separate ml should be set up for it? linux-device-management
perhaps?

-Dan



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:23                               ` Alan Cox
@ 2001-05-15 20:28                                 ` H. Peter Anvin
  0 siblings, 0 replies; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-15 20:28 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, James Simmons, Jeff Garzik, Neil Brown,
	Linux Kernel Mailing List, viro

Alan Cox wrote:
> 
> > [ The biggest silliness is this "let's try to make the disks appear in the
> >   same order that the BIOS probes them". Now THAT is really stupid, and it
> >   goes on a lot more than I'd ever like to see. ]
> 
> Right - Lilo needs to know, but nobody else should except when they need to
> ask, e.g. to find which disk failed
> 

There would be some value to an informational ioctl() or other query
mechanism to give the firmware identifier (BIOS on PC platforms.)  This
may, of course, be "null" in which case you need to give an error message
if you're trying to boot from it!

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:06                                 ` H. Peter Anvin
@ 2001-05-15 20:28                                   ` James Simmons
  2001-05-15 21:20                                     ` Nicolas Pitre
  0 siblings, 1 reply; 317+ messages in thread
From: James Simmons @ 2001-05-15 20:28 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Alexander Viro, Alan Cox, Neil Brown,
	Jeff Garzik, Linux Kernel Mailing List


> I actually suggested something like this a while ago, mainly w.r.t. how
> to deal with serial ports (e.g. /dev/ttyS0/callout instead of /dev/cua0).

Very brilliant. I'd like to see this as well, plus include the other serial
devices.



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:14                                 ` Alexander Viro
@ 2001-05-15 20:30                                   ` H. Peter Anvin
  2001-05-15 20:41                                     ` Alexander Viro
  2001-05-15 20:37                                   ` Linus Torvalds
                                                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-15 20:30 UTC (permalink / raw)
  To: Alexander Viro
  Cc: James Simmons, Linus Torvalds, Alan Cox, Neil Brown, Jeff Garzik,
	Linux Kernel Mailing List

Alexander Viro wrote:
> 
> On Tue, 15 May 2001, James Simmons wrote:
> 
> > > only one _device_node_, you can have multiple fd's. In fact, you can, with
> > > the Linux VFS layer, fairly easily do things like
> > >
> > >     mknod /dev/fd0 c X Y
> > >
> > > and then use
> > >
> > >     fd = open("/dev/fd0/colourspace", O_RDWR);
> >
> > Yipes!! I have to say UNIX has a tendency to teach you ioctl is the only
> > way. I have never thought outside of the box, nor seen anyone else think in
> > this manner. This is absolutely brilliant!!! I can see a lot of
> > possibilities with this.
> 
> The thing being, why the hell create these device/directory hybrids?
> 

Permission management.  The permissions on the subnodes are inherited
from the main node, which is stored on a persistent medium.

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:14                                 ` Alexander Viro
  2001-05-15 20:30                                   ` H. Peter Anvin
@ 2001-05-15 20:37                                   ` Linus Torvalds
  2001-05-15 20:56                                     ` Jeff Garzik
                                                       ` (2 more replies)
  2001-05-15 20:57                                   ` LANANA: To Pending Device Number Registrants James Simmons
  2001-05-17 20:33                                   ` Kai Henningsen
  3 siblings, 3 replies; 317+ messages in thread
From: Linus Torvalds @ 2001-05-15 20:37 UTC (permalink / raw)
  To: Alexander Viro
  Cc: James Simmons, Alan Cox, Neil Brown, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List


On Tue, 15 May 2001, Alexander Viro wrote:
> 
> The thing being, why the hell create these device/directory hybrids?

Backwards compatibility, and the ability to automatically take advantage
of existing filesystems without having any administrative worries.

No real technical reason, in other words.

But trust me: avoiding administrative worry is a _big_ plus.

And making people think of device nodes as more of a "window" into the
driver is a good thing anyway.

> Driver can export a tree and we mount it on fb0. After that you have
> the whole set - yes, /dev/fb0/colourspace, etc. - no problem. And no
> need to do mknod, BTW. Yes, we'll need to use /dev/fb0/frame for
> frame itself. BFD...

Actually, we can just continue to use "/dev/fb0", which would continue to
work the way it has always worked.

It's a mistake to think that a directory has to be a directory. Or to
think that a device node has to be a device node. It's perfectly ok to
just think of it as namespaces. So opening /dev/fb0 continues to open the
"master fd", whatever that means (in this case, the actual frame
buffer). The namespaces _under_ /dev/fb0 would be the control channels, or
in fact _anything_ that the frame buffer driver wants to expose.

They might also be exactly the same channel, except with certain magic
bits set. The example Peter gave was fine: tty devices could very usefully
be opened with something like

	fd = open("/dev/tty00/nonblock,9600,n8", O_RDWR);

where we actually open up exactly the same channel as if we opened up
/dev/cua00, we just set the speed etc at the same time. Which makes things
a hell of a lot more readable, AND they are again easily done from
scripts. The above is exactly the kind of thing that UNIX has not done
well, and some others have done better (let's face it, even _DOS_ did it
better, for chrissake! Those callout devices and those ioctl's are a pain
in the ass, for no really good reason).

Using ASCII names for these kinds of channel controls is fine.
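
As a rough sketch of what decoding such an ASCII name could look like, here
is a small parser for the "nonblock,9600,n8" part of the path. Everything
in it (struct tty_opts, parse_tty_opts, and the exact token grammar) is an
assumption made for illustration; no existing driver interface is implied:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical settings a tty driver might accept via the path name. */
struct tty_opts {
	bool nonblock;
	long baud;      /* 0 if not given */
	char parity;    /* 'n', 'e' or 'o'; 0 if not given */
	int  data_bits; /* 5..8; 0 if not given */
};

/*
 * Parse a comma-separated option string such as "nonblock,9600,n8".
 * Returns 0 on success, -1 on an empty, oversized or unrecognised token.
 */
int parse_tty_opts(const char *spec, struct tty_opts *out)
{
	const char *p = spec;

	assert(spec != NULL && out != NULL);
	memset(out, 0, sizeof *out);

	while (*p != '\0') {
		char tok[16], *end;
		size_t n = strcspn(p, ",");   /* length of the next token */

		if (n == 0 || n >= sizeof tok)
			return -1;
		memcpy(tok, p, n);
		tok[n] = '\0';

		if (strcmp(tok, "nonblock") == 0) {
			out->nonblock = true;
		} else if (n == 2 && strchr("neo", tok[0]) != NULL &&
			   tok[1] >= '5' && tok[1] <= '8') {
			/* Parity letter followed by the data-bit count. */
			out->parity = tok[0];
			out->data_bits = tok[1] - '0';
		} else {
			/* Anything else must be a positive decimal speed. */
			long baud = strtol(tok, &end, 10);
			if (end == tok || *end != '\0' || baud <= 0)
				return -1;
			out->baud = baud;
		}

		p += n;
		if (*p == ',')
			p++;
	}
	return 0;
}
```

The same string is just as easy to produce from a shell script, which is
the readability argument being made above.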

> You see, as soon as you want slightly more structured stuff (deeper than
> one level) you need the dentry tree, yodda, yodda. IOW, you need a
> filesystem anyway and it's easy to implement.

I want to ease people into this notion. I'm personally perfectly happy to
make it a real filesystem, if you are willing to write the code. But I've
become convinced that the transition has to be really simple, with no
administrative work.

It should be a case of "Just plug in a new kernel, and suddenly your
existing filesystem just allows you to do more! 20% more for the same
price! AND we'll throw in this useful ginzu knife for just 4.95 for
shipping and handling. Absolutely free!"

		Linus



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:30                                   ` H. Peter Anvin
@ 2001-05-15 20:41                                     ` Alexander Viro
  2001-05-15 20:51                                       ` Linus Torvalds
  0 siblings, 1 reply; 317+ messages in thread
From: Alexander Viro @ 2001-05-15 20:41 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: James Simmons, Linus Torvalds, Alan Cox, Neil Brown, Jeff Garzik,
	Linux Kernel Mailing List



On Tue, 15 May 2001, H. Peter Anvin wrote:

> Permission management.  The permissions on the subnodes are inherited
> from the main node, which is stored on a persistent medium.

If you want them all to inherit it - inherit from mountpoint. End of story.
Yes, it means that permission(9) will need vfsmount argument. But we
_will_ need that anyway. For per-mountpoint read-only, if nothing else.

Want details? Please. We have the ->getattr() method. Currently not
used, but intended to be used by ...stat family (with the current
behaviour being default). Now, let's pass to permission(9), notify_change(9)
and ->{set,get}attr()  both vfsmount and dentry. See what I mean?

We get (essentially for free)
	* per-mountpoint read-only flag (I've already done nosuid, noexec
and nodev per-mountpoint)
	* ability to have inodes that simply don't have owners - ownership
is determined (and handled) by the functions/methods above. So FAT and
friends can get rid of knowledge of "uid=,gid=" crap.
	* ability to inherit ownership from mountpoint and if fs wants it -
update the ownership of mountpoint.



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:41                                     ` Alexander Viro
@ 2001-05-15 20:51                                       ` Linus Torvalds
  2001-05-16  1:01                                         ` Daniel Phillips
  0 siblings, 1 reply; 317+ messages in thread
From: Linus Torvalds @ 2001-05-15 20:51 UTC (permalink / raw)
  To: Alexander Viro
  Cc: H. Peter Anvin, James Simmons, Alan Cox, Neil Brown, Jeff Garzik,
	Linux Kernel Mailing List


On Tue, 15 May 2001, Alexander Viro wrote:
>
> If you want them all to inherit it - inherit from mountpoint.

..which is exactly what the device node ends up being. The implicit
mount-point.

At which point, btw, it is completely indistinguishable to user space
whether the thing is implemented as a full filesystem, or whether it's
just that the device node exports a simple "lookup()" that it passes down
to the device driver. So this is also the point where it becomes nothing
but an implementation issue, and as such it's much less contentious.

Done right, they'll be automatic mount-points, which gives us:
 - perfect backwards compatibility (opening just the node will do what it
   has always done)
 - _zero_ extra system administration.

And I really think the zero system administration thing is the important
one. For some reason, sysadmin is where all the fights break out (see
devfs, but historically we had all the same problems with the original
device naming etc).

Sysadmin and editors. The holy wars of UNIX.

		Linus



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 15:16                         ` Linus Torvalds
@ 2001-05-15 20:55                           ` Alan Cox
  0 siblings, 0 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-15 20:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Alexander Viro, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List

> On Tue, 15 May 2001, Alan Cox wrote:
> > Counter argument; We don't want the bloat of making a floppy tape have
> > delusions of grandeur in kernel space when mt-st can do it in userspace.
> 
> Counter-counter-argument: we could just export the ioctl's, and make a
> "user-level-filesystem". Except it's not a filesystem, but a driver.

Still pushes code into kernel space. I'm all for 'tapes' as one set of objects
and a cleaner user space fallback than peering at the major


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:37                                   ` Linus Torvalds
@ 2001-05-15 20:56                                     ` Jeff Garzik
  2001-05-15 21:22                                     ` James Simmons
  2001-05-17 10:42                                     ` Pavel Machek
  2 siblings, 0 replies; 317+ messages in thread
From: Jeff Garzik @ 2001-05-15 20:56 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Linus Torvalds wrote:
> It should be a case of "Just plug in a new kernel, and suddenly your
> existing filesystem just allows you to do more! 20% more for the same
> price! AND we'll throw in this useful ginzu knife for just 4.95 for
> shipping and handling. Absolutely free!"

...Linus demonstrates why American culture is a bad influence on you.


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:14                                 ` Alexander Viro
  2001-05-15 20:30                                   ` H. Peter Anvin
  2001-05-15 20:37                                   ` Linus Torvalds
@ 2001-05-15 20:57                                   ` James Simmons
  2001-05-17 20:33                                   ` Kai Henningsen
  3 siblings, 0 replies; 317+ messages in thread
From: James Simmons @ 2001-05-15 20:57 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Linus Torvalds, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List


> You see, as soon as you want slightly more structured stuff (deeper than
> one level) you need the dentry tree, yodda, yodda. IOW, you need a
> filesystem anyway and it's easy to implement. Want me to do framebufferfs?
> Would make a nice demo.  No majors. No minors. No ioctls. Less code than
> in current tree.  ~3 days to implement.

Yes. I'd like to give this fbdevfs a try. Once tested, I have no problem
placing it into my kernel tree. I planned on reworking the fbdev
layer anyway for 2.5.X. As Linus pointed out, there is the issue of backwards
compatibility. Maybe name it something else. Since I'd like to see fbdev
and drm merge, we need a new name anyway. Later I can migrate DRI
functionality into this filesystem. It would be a nice demo. It would be
really cool if I could stream the framebuffer image over a network :-)
 




* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 15:10                     ` Linus Torvalds
                                         ` (3 preceding siblings ...)
  2001-05-15 19:31                       ` Richard Gooch
@ 2001-05-15 20:58                       ` Alan Cox
  2001-05-15 21:42                       ` Chip Salzenberg
  5 siblings, 0 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-15 20:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Neil Brown, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List, viro

> > 	else
> > 		error
> 
> Ugh. You do this?

Lots of apps do it - hdparm, mt-st, lilo

> The fix, I think, is to make the ioctl commands much more regular. That is
> probably true in general, and fixing that would hopefully fix the need for
> horrible code like the above.

Definitely.  Some of it would remain but it becomes

		if (ioctl(foo) == -1 && errno == EOPNOTSUPP)
			bang_it_by_hand();


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: Getting out of hand?
  2001-05-15  4:30                   ` Linus Torvalds
                                       ` (2 preceding siblings ...)
  2001-05-15  8:48                     ` Alan Cox
@ 2001-05-15 21:16                     ` Martin Dalecki
  3 siblings, 0 replies; 317+ messages in thread
From: Martin Dalecki @ 2001-05-15 21:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Alexander Viro, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List

Linus Torvalds wrote:
> 
> On Mon, 14 May 2001, Alan Cox wrote:
> >
> > Except that Linus wont hand out major numbers, which means I can't even boot
> > simply off such a device. I bet the vendors in question dont think the sun
> > shines out of linus backside any more.
> 
> Actually, it does. It's just that some people have gotten so blinded by my
> a** that they can no longer see it any more ;)
> 
> The problem I have is that there are lots of _good_ solutions, but they
> all imply a bit more work than the bad ones.
> 
> What does that result in? Everybody continues to use the simple old setup,
> which required no thought at all, but that is a pain to maintain.
> 
> For example, the only thing you need in order to boot is to have a nice
> clean "disk" major number. That's it. Nothing fancy, nothing more.
> 
> Look at what we have now:
> 
>  - ramdisk: major 1. Fair enough - ramdisk is special, in that it doesn't
>    have any "real hardware". No problem.
>  - SCSI disks:
>         major 8, 65-71,
>  - Compaq smart2:
>         major 72-79
>  - Compaq CISS:
>         major 104-111
>  - DASD:
>         major 94
>  - IDE:
>         major 3, 22, 33-34, 56-57, 88-91
> 
> and then the small random ones.
> 
> NONE of these major numbers have _any_ redeeming qualities except for the
> ramdisk. They should all be _one_ major number, namely "disk". There are
> absolutely NO advantages to having separate devices for some strange
> compaq controllers and IDE disks. There is _no_ point in having some SCSI
> disks show up at major 8, while others (who just happen to be attached to
> a scsi bus that is not driven by the generic SCSI layer) show up at major
> 104 or whatever.

And then the IDE stuff is stupid to use the same major numbers for
what are in fact entirely different devices, like CD-ROMs and IDE disks,
on the same major... This makes it VERY uncomfortable to treat,
for example, the sector size and driver read-ahead as properties
tied to the major number alone... In fact Linux bundles
read-ahead with the major number only, especially inside the RAID drivers,
which is entirely wrong! (See the blksize_size and read_ahead arrays.)

And yes, the RAID drivers in particular are *VERY* stupid in
terms of major/minor number usage.

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:28                                   ` James Simmons
@ 2001-05-15 21:20                                     ` Nicolas Pitre
  2001-05-15 21:28                                       ` James Simmons
                                                         ` (2 more replies)
  0 siblings, 3 replies; 317+ messages in thread
From: Nicolas Pitre @ 2001-05-15 21:20 UTC (permalink / raw)
  To: James Simmons
  Cc: H. Peter Anvin, Linus Torvalds, Alexander Viro, Alan Cox,
	Neil Brown, Jeff Garzik, Linux Kernel Mailing List



On Tue, 15 May 2001, James Simmons wrote:

>
> > I actually suggested something like this a while ago, mainly w.r.t. how
> > to deal with serial ports (e.g. /dev/ttyS0/callout instead of /dev/cua0).
>
> Very brilliant. I'd like to see this as well, plus include the other serial
> devices.

Personally, I'd really like to see /dev/ttyS0 be the first detected serial
port on a system, /dev/ttyS1 the second, etc.  Currently there are plenty of
different kinds of serial hardware, each with its own driver and /dev
entries.  For embedded systems with serial consoles, and also across
architectures, this is a pain since the filesystem, namely /etc/inittab,
has to be adjusted for each different type of UART.  This is not the case
for the different types of NICs, and that's a good thing.


Nicolas


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 18:04                             ` Linus Torvalds
  2001-05-15 18:58                               ` Johannes Erdfelt
  2001-05-15 20:03                               ` James Simmons
@ 2001-05-15 21:22                               ` Jan Harkes
  2001-05-15 21:39                               ` Martin Dalecki
  3 siblings, 0 replies; 317+ messages in thread
From: Jan Harkes @ 2001-05-15 21:22 UTC (permalink / raw)
  To: Linux Kernel Mailing List

On Tue, May 15, 2001 at 11:04:27AM -0700, Linus Torvalds wrote:
> And using ASCII names ("eject") instead of numbers (see the "FDEJECT" and
> "CDROMEJECT" etc #defines) sure as hell makes for easier maintenance and
> avoids the whole issue of maintaining static numbers (all the same things
> that make me hate device number maintenance makes me also hate the fact
> that we need to maintain this list of ioctl numbers etc). By using
> descriptive names, the "maintenance" simply does not exist.

If people couldn't even agree on using the same ioctl number, why
would they agree on using the same ASCII name? In other words, there
will still be maintenance, it just moves the problem into a different
(and hopefully more maintainable) 'namespace'.

Jan


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:37                                   ` Linus Torvalds
  2001-05-15 20:56                                     ` Jeff Garzik
@ 2001-05-15 21:22                                     ` James Simmons
  2001-05-17 10:42                                     ` Pavel Machek
  2 siblings, 0 replies; 317+ messages in thread
From: James Simmons @ 2001-05-15 21:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Viro, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List


> I want to ease people into this notion. I'm personally perfectly happy to
> make it a real filesystem, if you are willing to write the code. But I've
> become convinced that the transition has to be really simple, with no
> administrative work.

Just one thing. Since we would end up with a system where each device is a
filesystem of some type, how does this fit into devfs? I can't imagine
having an fstab file with 20-some filesystems for different types of
devices. I'd like to see the ability to mount device filesystems, but I'd
like to have them all mounted at once at boot time. Of course there might
be some users who don't feel this way. They might want to control what gets
mounted where. I guess devfsd could be expanded to handle this.


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:20                                     ` Nicolas Pitre
@ 2001-05-15 21:28                                       ` James Simmons
  2001-05-15 21:31                                         ` H. Peter Anvin
                                                           ` (3 more replies)
  2001-05-16  0:59                                       ` Daniel Phillips
  2001-05-16  7:17                                       ` Kai Henningsen
  2 siblings, 4 replies; 317+ messages in thread
From: James Simmons @ 2001-05-15 21:28 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: H. Peter Anvin, Linus Torvalds, Alexander Viro, Alan Cox,
	Neil Brown, Jeff Garzik, Linux Kernel Mailing List


> Personally, I'd really like to see /dev/ttyS0 be the first detected serial
> port on a system, /dev/ttyS1 the second, etc.  Currently there are plenty of
> different kinds of serial hardware, each with its own driver and /dev
> entries.  For embedded systems with serial consoles, and also across
> architectures, this is a pain since the filesystem, namely /etc/inittab,
> has to be adjusted for each different type of UART.  This is not the case
> for the different types of NICs, and that's a good thing.

I couldn't agree with you more. It gives me headaches at work. One note:
there is an exception to the eth0 thing, USB to USB networking. It uses usb0,
etc. I personally wish it used eth0.


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:18                                     ` Linus Torvalds
  2001-05-15 20:26                                       ` Dan Hollis
@ 2001-05-15 21:29                                       ` Alex Bligh - linux-kernel
  2001-05-15 21:36                                         ` Linus Torvalds
                                                           ` (2 more replies)
  2001-05-15 21:51                                       ` Mark Frazer
                                                         ` (4 subsequent siblings)
  6 siblings, 3 replies; 317+ messages in thread
From: Alex Bligh - linux-kernel @ 2001-05-15 21:29 UTC (permalink / raw)
  To: Linus Torvalds, Jonathan Lundell
  Cc: Jeff Garzik, James Simmons, Alan Cox, Neil Brown, H. Peter Anvin,
	Linux Kernel Mailing List, viro, Alex Bligh - linux-kernel

> The argument that "if you use numbering based on where in the SCSI chain
> the disk is, disks don't pop in and out" is absolute crap. It's not true
> even for SCSI any more (there are devices that will acquire their location
> dynamically), and it has never been true anywhere else. Give it up.

Q: Let us assume you have dynamic numbering disk0..N as you suggest,
   and you have some s/w RAID of SCSI disks. A disk fails, and is (hot)
   removed. Life continues. You reboot the machine. Disks are now numbered
   disk0..(N-1). If the RAID config specifies using disk0..N thusly, it
   is going to be very confused, as stripes will appear in the wrong place.
   Doesn't that mean the file specifying the RAID config is going to have
   to enumerate SCSI IDs (or something configuration invariant) as
   opposed to use the disk0..N numbering anyway? Sure it can interrogate
   each disk0..N to see which has the ID that it actually wanted, but
   doesn't this rather subvert the purpose?

I.e., given one could create /dev/disk/?.+, isn't the important
argument that they share common major device numbers etc., not whether
they linearly reorder precisely to 0..N as opposed to having some form
of identifier guaranteed to be static across reboots and config changes?
--
Alex Bligh

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:28                                       ` James Simmons
@ 2001-05-15 21:31                                         ` H. Peter Anvin
  2001-05-15 21:43                                         ` Johannes Erdfelt
                                                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-15 21:31 UTC (permalink / raw)
  To: James Simmons
  Cc: Nicolas Pitre, Linus Torvalds, Alexander Viro, Alan Cox,
	Neil Brown, Jeff Garzik, Linux Kernel Mailing List

James Simmons wrote:
> 
> > Personally, I'd really like to see /dev/ttyS0 be the first detected serial
> > port on a system, /dev/ttyS1 the second, etc.  Currently there are plenty of
> > different kinds of serial hardware, each with its own driver and /dev
> > entries.  For embedded systems with serial consoles, and also across
> > architectures, this is a pain since the filesystem, namely /etc/inittab,
> > has to be adjusted for each different type of UART.  This is not the case
> > for the different types of NICs, and that's a good thing.
> 
> I couldn't agree with you more. It gives me headaches at work. One note:
> there is an exception to the eth0 thing, USB to USB networking. It uses usb0,
> etc. I personally wish it used eth0.
> 

"ethX" is only used for Ethernet.  Other types of network devices use
other names.

Personally, I would also like to see network devices manifest in the
filesystem namespace like everything else.

	-=hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:29                                       ` Alex Bligh - linux-kernel
@ 2001-05-15 21:36                                         ` Linus Torvalds
  2001-05-15 22:03                                         ` Jeff Mahoney
  2001-05-15 22:42                                         ` Andreas Dilger
  2 siblings, 0 replies; 317+ messages in thread
From: Linus Torvalds @ 2001-05-15 21:36 UTC (permalink / raw)
  To: Alex Bligh - linux-kernel
  Cc: Jonathan Lundell, Jeff Garzik, James Simmons, Alan Cox,
	Neil Brown, H. Peter Anvin, Linux Kernel Mailing List, viro


On Tue, 15 May 2001, Alex Bligh - linux-kernel wrote:
> 
> Q: Let us assume you have dynamic numbering disk0..N as you suggest,
>    and you have some s/w RAID of SCSI disks. A disk fails, and is (hot)
>    removed. Life continues. You reboot the machine. Disks are now numbered
>    disk0..(N-1). If the RAID config specifies using disk0..N thusly,

If you have a raid config like that, then you're screwed _whatever_ you
do.

Look into using UUID's, which fix this properly.

And note, btw, how I think the md autorun stuff does all of this the RIGHT
way. Where RIGHT very much includes not using positional information etc.
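
A minimal sketch of the UUID idea, not the md autorun code itself: the
array configuration names its members by an identifier stored on each
disk, and the dynamic disk0..N name is only looked up when the array is
assembled. The struct, the function name and the UUID strings a caller
would pass in are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one disk found during this boot. */
struct disk {
	const char *name;	/* dynamic name, e.g. "disk1"; may change per boot */
	const char *uuid;	/* identifier stored on the disk itself */
};

/*
 * Return the detected disk that carries `uuid`, or NULL if no such
 * disk is present. Position in the `found` array is never consulted,
 * so renumbering after a disk is removed cannot confuse the caller.
 */
const struct disk *find_by_uuid(const struct disk *found, size_t nfound,
				const char *uuid)
{
	for (size_t i = 0; i < nfound; i++)
		if (strcmp(found[i].uuid, uuid) == 0)
			return &found[i];
	return NULL;
}
```

With a config that lists three member UUIDs and a boot where the middle
disk is gone, the lookup returns NULL for the missing member instead of
silently handing back the disk that slid into its old slot.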

		Linus


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 18:04                             ` Linus Torvalds
                                                 ` (2 preceding siblings ...)
  2001-05-15 21:22                               ` Jan Harkes
@ 2001-05-15 21:39                               ` Martin Dalecki
  3 siblings, 0 replies; 317+ messages in thread
From: Martin Dalecki @ 2001-05-15 21:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: James Simmons, Alexander Viro, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List

Linus Torvalds wrote:

> and then use
> 
>         fd = open("/dev/fd0/colourspace", O_RDWR);

> This, btw, is Al Viro's wet dream. But I have to agree: using name spaces
> etc is MUCH preferable to ioctl's, makes code more readable and logical,
> and often makes it possible to do things you couldn't sanely do before
> (control these things from scripts etc).
> 
> And using ASCII names ("eject") instead of numbers (see the "FDEJECT" and
> "CDROMEJECT" etc #defines) sure as hell makes for easier maintenance and
> avoids the whole issue of maintaining static numbers (all the same things
> that make me hate device number maintenance makes me also hate the fact
> that we need to maintain this list of ioctl numbers etc). By using
> descriptive names, the "maintenance" simply does not exist.


Blah blah blah.... Now we have just one ugly, cluttered, undocumented
(please insert the list of your favourite invectives here) /proc.
This way we would have TONS of them.

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15  9:26                   ` Alan Cox
  2001-05-15  9:49                     ` Alexander Viro
  2001-05-15 15:10                     ` Linus Torvalds
@ 2001-05-15 21:40                     ` Chip Salzenberg
  2001-05-15 22:12                       ` Alan Cox
                                         ` (2 more replies)
  2 siblings, 3 replies; 317+ messages in thread
From: Chip Salzenberg @ 2001-05-15 21:40 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, Neil Brown, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List, viro

According to Alan Cox:
> Given a file handle 'X' how do I find out what ioctl groups I should
> apply to it.

Wouldn't it be better just to *try* ioctls and see which ones work and
which ones don't?

This ioctl situation reminds me of how novice programmers assume that
they have to call access() or stat() and check a file for existence
and readability before calling open().  But that's just stupid when
you think about it, because if the file isn't there and the open()
fails, that's OK!  Failures are not fatal.

Similarly, ioctl failures are not fatal.  Just Try Them.
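
A rough sketch of "just try it" in C, assuming Linux and glibc. The enum
and the function name are made up for illustration, and the set of errno
values treated as "not supported" is an assumption, since drivers are
not consistent about which one they return.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Hypothetical result type: what happened when we tried the request. */
enum try_result { TRY_OK, TRY_UNSUPPORTED, TRY_ERROR };

/*
 * Issue the ioctl without probing the device type first.
 * TRY_UNSUPPORTED means "this fd does not understand the request",
 * so the caller can fall back to something else. TRY_ERROR is a
 * genuine failure; errno is left set for the caller to inspect.
 */
enum try_result try_ioctl(int fd, unsigned long request, void *arg)
{
	if (ioctl(fd, request, arg) != -1)
		return TRY_OK;

	/* Assumption: these are the "not for this device" answers. */
	if (errno == ENOTTY || errno == EINVAL || errno == EOPNOTSUPP)
		return TRY_UNSUPPORTED;

	return TRY_ERROR;
}
```

The caller then branches on the result rather than on a prior guess
about what kind of device sits behind the descriptor.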
-- 
Chip Salzenberg              - a.k.a. -             <chip@valinux.com>
 "We have no fuel on board, plus or minus 8 kilograms."  -- NEAR tech

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 19:31                       ` Richard Gooch
  2001-05-15 19:37                         ` H. Peter Anvin
  2001-05-15 20:10                         ` Alan Cox
@ 2001-05-15 21:41                         ` Richard Gooch
  2001-05-15 21:47                           ` Alexander Viro
                                             ` (5 more replies)
  2 siblings, 6 replies; 317+ messages in thread
From: Richard Gooch @ 2001-05-15 21:41 UTC (permalink / raw)
  To: Alan Cox
  Cc: Ingo Oeser, Linus Torvalds, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List, viro

Alan Cox writes:
> > 	len = readlink ("/proc/self/3", buffer, buflen);
> > 	if (strcmp (buffer + len - 2, "cd") != 0) {
> > 		fprintf (stderr, "Not a CD-ROM! Bugger off.\n");
> > 		exit (1);
> 
> And on my box cd is the cabbage dicer whoops

Actually, no, because it's guaranteed that a trailing "/cd" is a
CD-ROM. That's the standard.
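
Taking the trailing "/cd" rule at face value, a somewhat more careful
version of the quoted check might look like the sketch below. It
assumes the intended link is /proc/self/fd/N on Linux, compares the
whole last path component (so "abcd" does not match), and does not rely
on readlink() NUL-terminating the buffer, which it never does. Both
function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return 1 if the first `len` bytes of `path` end in "/<leaf>",
 * 0 otherwise. `path` does not need to be NUL-terminated.
 */
int path_ends_with_leaf(const char *path, size_t len, const char *leaf)
{
	size_t n = strlen(leaf);

	return len > n &&
	       path[len - n - 1] == '/' &&
	       memcmp(path + len - n, leaf, n) == 0;
}

/*
 * Resolve the path behind an open descriptor and test its last
 * component. Returns 1 or 0, or -1 if the link cannot be read.
 */
int fd_path_has_leaf(int fd, const char *leaf)
{
	char link[64];
	char target[4096];
	ssize_t len;

	snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
	len = readlink(link, target, sizeof target);
	if (len < 0)
		return -1;

	return path_ends_with_leaf(target, (size_t)len, leaf);
}
```

Whether a name suffix is a sound way to identify a device class at all
is exactly what is being disputed here; the sketch only makes the
string test itself correct.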

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 15:10                     ` Linus Torvalds
                                         ` (4 preceding siblings ...)
  2001-05-15 20:58                       ` Alan Cox
@ 2001-05-15 21:42                       ` Chip Salzenberg
  2001-05-15 21:46                         ` Alexander Viro
  5 siblings, 1 reply; 317+ messages in thread
From: Chip Salzenberg @ 2001-05-15 21:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Neil Brown, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List, viro

According to Linus Torvalds:
> I don't see why we couldn't expose the "driver name" for any file
> descriptor.

Is it wise to assume that there is only one such name for *any* file
descriptor?
-- 
Chip Salzenberg              - a.k.a. -             <chip@valinux.com>
 "We have no fuel on board, plus or minus 8 kilograms."  -- NEAR tech

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:28                                       ` James Simmons
  2001-05-15 21:31                                         ` H. Peter Anvin
@ 2001-05-15 21:43                                         ` Johannes Erdfelt
  2001-05-15 21:49                                           ` James Simmons
  2001-05-16  7:05                                           ` Kai Henningsen
  2001-05-15 22:07                                         ` Alan Cox
  2001-05-16  7:11                                         ` Kai Henningsen
  3 siblings, 2 replies; 317+ messages in thread
From: Johannes Erdfelt @ 2001-05-15 21:43 UTC (permalink / raw)
  To: James Simmons
  Cc: Nicolas Pitre, H. Peter Anvin, Linus Torvalds, Alexander Viro,
	Alan Cox, Neil Brown, Jeff Garzik, Linux Kernel Mailing List

On Tue, May 15, 2001, James Simmons <jsimmons@transvirtual.com> wrote:
> > Personally, I'd really like to see /dev/ttyS0 be the first detected serial
> > port on a system, /dev/ttyS1 the second, etc.  Currently there are plenty of
> > different kinds of serial hardware, each with its own driver and /dev
> > entries.  For embedded systems with serial consoles, and also across
> > architectures, this is a pain since the filesystem, namely /etc/inittab,
> > has to be adjusted for each different type of UART.  This is not the case
> > for the different types of NICs, and that's a good thing.
> 
> I couldn't agree with you more. It gives me headaches at work. One note:
> there is an exception to the eth0 thing, USB to USB networking. It uses usb0,
> etc. I personally wish it used eth0.

USB to USB networking cables aren't ethernet.

There are USB to ethernet adapters and they do appear as eth0.

JE


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 17:32                             ` Alexander Viro
  2001-05-15 17:44                               ` James Simmons
@ 2001-05-15 21:46                               ` Chip Salzenberg
  2001-05-15 21:50                                 ` James Simmons
  1 sibling, 1 reply; 317+ messages in thread
From: Chip Salzenberg @ 2001-05-15 21:46 UTC (permalink / raw)
  To: Alexander Viro
  Cc: James Simmons, Linus Torvalds, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List

According to Alexander Viro:
> On Tue, 15 May 2001, James Simmons wrote:
> > I would use write except we use write to draw into the framebuffer. If I
> > write to the framebuffer with that data the only thing that will happen is
> > I will get pretty colors on my screen. 
> 
> Yes. And we also use write to send data to printer. So what? Nobody makes
> you use the same file.

You're talking about /dev/fb0 vs. /dev/fb0ctl, right?

Would that driver authors routinely used such clean designs.

PS: No, readers, AFAIK, there is no such thing as /dev/fb0ctl.  Yet.
-- 
Chip Salzenberg              - a.k.a. -             <chip@valinux.com>
 "We have no fuel on board, plus or minus 8 kilograms."  -- NEAR tech

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:42                       ` Chip Salzenberg
@ 2001-05-15 21:46                         ` Alexander Viro
  2001-05-15 21:57                           ` H. Peter Anvin
  2001-05-15 22:18                           ` Alan Cox
  0 siblings, 2 replies; 317+ messages in thread
From: Alexander Viro @ 2001-05-15 21:46 UTC (permalink / raw)
  To: Chip Salzenberg
  Cc: Linus Torvalds, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List



On Tue, 15 May 2001, Chip Salzenberg wrote:

> According to Linus Torvalds:
> > I don't see why we couldn't expose the "driver name" for any file
> > descriptor.
> 
> Is it wise to assume that there is only one such name for *any* file
> descriptor?

Type of filesystem where the file came from? Sure.


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:41                         ` Richard Gooch
@ 2001-05-15 21:47                           ` Alexander Viro
  2001-05-15 22:14                           ` Alan Cox
                                             ` (4 subsequent siblings)
  5 siblings, 0 replies; 317+ messages in thread
From: Alexander Viro @ 2001-05-15 21:47 UTC (permalink / raw)
  To: Richard Gooch
  Cc: Alan Cox, Ingo Oeser, Linus Torvalds, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List



On Tue, 15 May 2001, Richard Gooch wrote:

> Alan Cox writes:
> > > 	len = readlink ("/proc/self/3", buffer, buflen);
> > > 	if (strcmp (buffer + len - 2, "cd") != 0) {
> > > 		fprintf (stderr, "Not a CD-ROM! Bugger off.\n");
> > > 		exit (1);
> > 
> > And on my box cd is the cabbage dicer whoops
> 
> Actually, no, because it's guaranteed that a trailing "/cd" is a
> CD-ROM. That's the standard.

Set by...?


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:43                                         ` Johannes Erdfelt
@ 2001-05-15 21:49                                           ` James Simmons
  2001-05-16  7:05                                           ` Kai Henningsen
  1 sibling, 0 replies; 317+ messages in thread
From: James Simmons @ 2001-05-15 21:49 UTC (permalink / raw)
  To: Johannes Erdfelt
  Cc: Nicolas Pitre, H. Peter Anvin, Linus Torvalds, Alexander Viro,
	Alan Cox, Neil Brown, Jeff Garzik, Linux Kernel Mailing List


> > I couldn't agree with you more. It gives me headaches at work. One note:
> > there is an exception to the eth0 thing, USB to USB networking. It uses usb0,
> > etc. I personally wish it used eth0.
> 
> USB to USB networking cables aren't ethernet.

I'm talking about a wireless connection. ipaq USB cradle to PC. 



^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:46                               ` Chip Salzenberg
@ 2001-05-15 21:50                                 ` James Simmons
  0 siblings, 0 replies; 317+ messages in thread
From: James Simmons @ 2001-05-15 21:50 UTC (permalink / raw)
  To: Chip Salzenberg
  Cc: Alexander Viro, Linus Torvalds, Alan Cox, Neil Brown,
	Jeff Garzik, H. Peter Anvin, Linux Kernel Mailing List


> > Yes. And we also use write to send data to printer. So what? Nobody makes
> > you use the same file.
> 
> You're talking about /dev/fb0 vs. /dev/fb0ctl, right?

No. We are talking about a fbdev filesystem versus what we have now.
See later in the thread.


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:18                                     ` Linus Torvalds
  2001-05-15 20:26                                       ` Dan Hollis
  2001-05-15 21:29                                       ` Alex Bligh - linux-kernel
@ 2001-05-15 21:51                                       ` Mark Frazer
  2001-05-15 22:35                                       ` Bob Glamm
                                                         ` (3 subsequent siblings)
  6 siblings, 0 replies; 317+ messages in thread
From: Mark Frazer @ 2001-05-15 21:51 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Linus Torvalds <torvalds@transmeta.com> [01/05/15 16:28]:
> 
> The "eth0..N" naming is done RIGHT!

Nothing to do with the kernel but, one should then argue that the
current stuff in /etc/sysconfig/network-scripts is broken for hotplug as
placing a new network adapter into your bus will renumber your interfaces
causing them to be ifconfig'd wrongly.  You'd want to associate the IP
configuration stuff with the particular network interface, by MAC address.

As for the software-RAID getting messed up, isn't that what volume labels
are for?

What else in the current distro setups is going to need to change for
hotplug?

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 17:43                             ` Linus Torvalds
                                                 ` (2 preceding siblings ...)
  2001-05-15 20:23                               ` Alan Cox
@ 2001-05-15 21:52                               ` Andreas Dilger
  3 siblings, 0 replies; 317+ messages in thread
From: Andreas Dilger @ 2001-05-15 21:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: James Simmons, Jeff Garzik, Alan Cox, Neil Brown, H. Peter Anvin,
	Linux Kernel Mailing List, viro

Linus writes:
> And my opinion is that the "hot-plugged" approach works for devices even
> if they are soldered down - the "plugging" event just always happens
> before the OS is booted, and people just don't unplug it. So we might as
> well consider devices to always be hot-pluggable, whether that is actually
> physically true or not. Because that will always work, and that way we
> don't create any artificial distinctions (and they often really _are_
> artificial: historically soldered-down devices tend to eventually move in a
> more hot-pluggable direction, as you point out).

Basically, what you describe is the system that AIX has used since day 1.

All devices are "discovered" at boot time (even if they are soldered-down),
and registered in a database (for Linux that database could be /dev and
some text files in /etc)...

Every device driver supports hot-plugging.  No "static" major or minor
numbers per se (i.e. no HPA needed), but devices are static in the sense
of "once a device node is created in /dev, it does not change major/minor
numbers until removed", so it is the same device name across reboots.
Devices keep a device node in /dev until specifically unconfigured (so
sysadmins know a device disappeared) but are seen as "unavailable", and
can be permanently removed via "rmdev".  Real hot-plug devices on Linux
could remove their own device node when removed, I suppose.

> This is true to the point that I would not actually think that it is a bad
> idea to call /sbin/hotplug when we enumerate the motherboard devices. In
> fact, if you look at the current network drivers, this is exactly what
> will happen: when we auto-detect the motherboard devices, we _will_
> actually call /sbin/hotplug to tell that we've "inserted" a network
> device.

Yes, in AIX it is always possible to re-run device detection (cfgmgr)
to find new devices (if they do not announce that fact themselves).
This would re-traverse all of the system busses (including CPU, memory,
motherboard(s), etc) looking for new devices, and each item found
spits out either endpoints or children which need further enumeration.
It is also possible to only selectively run device detection, so you
could, say, ask it to check for new disks only on a specific adapter
(important if you have 300 disks on a system).

This way you can detect new disks, adapter cards, serial ports, etc. any
time after the system is up.  All disks are identified as "/dev/hdiskX",
(serial ports /dev/ttySX, etc) and if you really need to know more
about the location of the device that is stored as part of the device
attributes (e.g. physical path to that device, I/O addresses, some device
attributes like serial number/MAC/etc).

> [ The biggest silliness is this "let's try to make the disks appear in the
>   same order that the BIOS probes them". Now THAT is really stupid, and it
>   goes on a lot more than I'd ever like to see. ]

Of course, since AIX uses exclusively LVM, it can identify which disk
is which no matter where it has moved, so it never had that legacy "I
have a DOS partition on C: and D: and I don't want to toast them" issue.
Since AIX runs only on proprietary H/W it always can tell you which slot
a card is in if you need to identify it (which PCI can do, but ISA can't).

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:46                         ` Alexander Viro
@ 2001-05-15 21:57                           ` H. Peter Anvin
  2001-05-15 22:07                             ` Chip Salzenberg
  2001-05-15 22:18                           ` Alan Cox
  1 sibling, 1 reply; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-15 21:57 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Chip Salzenberg, Linus Torvalds, Alan Cox, Neil Brown,
	Jeff Garzik, Linux Kernel Mailing List

Alexander Viro wrote:
> 
> On Tue, 15 May 2001, Chip Salzenberg wrote:
> 
> > According to Linus Torvalds:
> > > I don't see why we couldn't expose the "driver name" for any file
> > > descriptor.
> >
> > Is it wise to assume that there is only one such name for *any* file
> > descriptor?
> 
> Type of filesystem where the file came from? Sure.
> 

I think this is the wrong question.  A device can inherently belong to
multiple device classes, and it really should be thought of as such.  For
example a disk may belong, at the same time, to the "scsi", "disk" and
"scsi-disk" device classes, meaning that it supports the union of the
"scsi" common interfaces, "disk" common interfaces, and "scsi-disk"
common interfaces.
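
A minimal Python sketch of this idea, with the caveat that the class
table and every operation name in it are hypothetical and only serve the
illustration: a device carries a set of classes, and the interfaces it
supports are the union of what each class contributes.

```python
# Hypothetical class -> operations table; all operation names are made up.
CLASS_OPS = {
    "scsi": {"inquiry", "send_command"},
    "disk": {"read_block", "write_block", "get_geometry"},
    # An operation may exist only for the combined class.
    "scsi-disk": {"mode_sense_caching"},
}


def interfaces(classes):
    """Union of the operations contributed by every class of a device."""
    ops = set()
    for cls in classes:
        ops |= CLASS_OPS.get(cls, set())
    return ops


def supports(classes, op):
    """Ask explicitly whether one interface is available on a device."""
    return op in interfaces(classes)


# A SCSI disk belongs to three classes at once.
SCSI_DISK = {"scsi", "disk", "scsi-disk"}
```

With this shape, supports(SCSI_DISK, "inquiry") and
supports(SCSI_DISK, "read_block") are both true, while a device that is
only in "disk" does not get "mode_sense_caching".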

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 19:43                                   ` Johannes Erdfelt
@ 2001-05-15 21:58                                     ` Chip Salzenberg
  2001-05-16  8:51                                     ` Helge Hafting
                                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 317+ messages in thread
From: Chip Salzenberg @ 2001-05-15 21:58 UTC (permalink / raw)
  To: Johannes Erdfelt
  Cc: Linus Torvalds, James Simmons, Alexander Viro, Alan Cox,
	Neil Brown, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List

According to Johannes Erdfelt:
> I had always made the assumption that sockets were created because you
> couldn't easily map IPv4 semantics onto filesystems. It's unreasonable
> to have a file for every possible IP address/port you can communicate
> with.

I think you're right on both counts, but I'm sure you'll agree that
just because some undergrad at Berkeley did something a certain way 20
years ago doesn't mean we have to follow it blindly. :-)

IIRC, Plan 9 allocates TCP connections rather like Linux allocates
ptys.  When we allocate a pty we don't have to say what program we're
going to connect to; we allocate it and then use it as we like.
Similarly, in Plan 9 you allocate a TCP connection without having to
say who you're going to connect to.  The main differences between the
Plan 9 approach and the socket approach are:

  1. Plan 9 connections are filesystem entities (like our ptys)
  2. Control is done via read/write on a separate control channel,
     which is *also* a filesystem entity.

USB could use a similar approach.  And since each client would
allocate a new connection entity for its own use -- even if it's going
to connect to a device that someone else is already connected to --
permissions become quite simple to manage.

Come to think of it, the mechanism I'm describing could address all
hotpluggable devices....
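
The allocate-then-connect pattern described above can be modelled in a
few lines of Python. This is a toy in-memory model, not Plan 9 code: the
ConnectionDir class, its method names, and the "connect"/"hangup" command
strings are assumptions chosen for the example. The point it shows is
that each client owns its own connection entity, so permission checks are
per entity even when two clients talk to the same device.

```python
class ConnectionDir:
    """Toy model of a directory that hands out connection entities."""

    def __init__(self):
        self.conns = {}
        self.next_id = 0

    def clone(self, owner):
        """Allocate a fresh connection; no peer has to be named yet."""
        cid = self.next_id
        self.next_id += 1
        self.conns[cid] = {"owner": owner, "peer": None, "data": []}
        return cid

    def _check(self, cid, who):
        """Only the client that allocated a connection may use it."""
        conn = self.conns[cid]
        if conn["owner"] != who:
            raise PermissionError(f"{who} does not own connection {cid}")
        return conn

    def ctl_write(self, cid, who, command):
        """Control channel: text commands written like ordinary data."""
        conn = self._check(cid, who)
        verb, _, arg = command.partition(" ")
        if verb == "connect" and arg:
            conn["peer"] = arg
        elif verb == "hangup":
            conn["peer"] = None
        else:
            raise ValueError(f"unknown control command: {command!r}")

    def data_write(self, cid, who, payload):
        """Data channel: only usable once the connection has a peer."""
        conn = self._check(cid, who)
        if conn["peer"] is None:
            raise RuntimeError("connection has no peer yet")
        conn["data"].append(payload)
```

Two clients that both write "connect usb!printer" (a made-up address) to
their own control channels end up with two separate entities, and neither
can write to the other's.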
-- 
Chip Salzenberg              - a.k.a. -             <chip@valinux.com>
 "We have no fuel on board, plus or minus 8 kilograms."  -- NEAR tech


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:03                               ` James Simmons
                                                   ` (2 preceding siblings ...)
  2001-05-15 20:17                                 ` H. Peter Anvin
@ 2001-05-15 21:59                                 ` Chip Salzenberg
  2001-05-15 22:51                                   ` James Simmons
  3 siblings, 1 reply; 317+ messages in thread
From: Chip Salzenberg @ 2001-05-15 21:59 UTC (permalink / raw)
  To: James Simmons
  Cc: Linus Torvalds, Alexander Viro, Alan Cox, Neil Brown,
	Jeff Garzik, H. Peter Anvin, Linux Kernel Mailing List

According to James Simmons:
> Graphics cards are the same way. Especially high end ones. They have pipes
> as well. For low end cards you can think of them as single pipeline cards
> with one pipe.

It still frosts my shorts that DRM (e.g. /dev/dri/card0) doesn't use
write().  It's a natural way to feed pipelines.  But no, it's a raft
of ioctl() calls.  *sigh*
-- 
Chip Salzenberg              - a.k.a. -             <chip@valinux.com>
 "We have no fuel on board, plus or minus 8 kilograms."  -- NEAR tech


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:29                                       ` Alex Bligh - linux-kernel
  2001-05-15 21:36                                         ` Linus Torvalds
@ 2001-05-15 22:03                                         ` Jeff Mahoney
  2001-05-15 22:42                                         ` Andreas Dilger
  2 siblings, 0 replies; 317+ messages in thread
From: Jeff Mahoney @ 2001-05-15 22:03 UTC (permalink / raw)
  To: Linux Kernel Mailing List

On Tue, May 15, 2001 at 10:29:38PM +0100, Alex Bligh - linux-kernel wrote:
> > The argument that "if you use numbering based on where in the SCSI chain
> > the disk is, disks don't pop in and out" is absolute crap. It's not true
> > even for SCSI any more (there are devices that will aquire their location
> > dynamically), and it has never been true anywhere else. Give it up.
> 
> Q: Let us assume you have dynamic numbering disk0..N as you suggest,
>    and you have some s/w RAID of SCSI disks. A disk fails, and is (hot)
>    removed. Life continues. You reboot the machine. Disks are now numbered
>    disk0..(N-1). If the RAID config specifies using disk0..N thusly, it
>    is going to be very confused, as stripes will appear in the wrong place.
>    Doesn't that mean the file specifying the RAID config is going to have
>    to enumerate SCSI IDs (or something configuration invariant) as
>    opposed to use the disk0..N numbering anyway? Sure it can interrogate
>    each disk0..N to see which has the ID that it actually wanted, but
>    doesn't this rather subvert the purpose?
> 
> IE, given one could create /dev/disk/?.+, isn't the important
> argument that they share common major device numbers etc., not whether
> they linearly reorder precisely to 0..N as opposed to have some form
> of identifier guaranteed to be static across reboot & config change.

    I was thinking about something along these lines earlier, and I guess this is
    the perfect time to bring it up.

    Before I started working with Linux full time, I did a lot of work as an
    admin with Digital UNIX/Tru64. Tru64 v5 has a feature that at first glance
    I wasn't too sure about, but it's sort of grown on me.

    /dev/disk/* is populated with entries of the style dsk0a, dsk0b, etc.. The
    numbering of the disk is bus independent, ID independent, even transport
    independent. The disks are kept track of by disk serial number, and so for
    example, if you have three disks with serials "456", "123", "789", they
    would be recognized as dsk0, dsk1, dsk2, in the order of discovery. Once
    discovered, the disk with a certain serial number will _always_ be at a
    certain "dsk#" location, stored in /etc/disk-mappings or some such file in
    the root filesystem.

    Under the current system, if dsk1 fails, and the system reboots, all the
    disk numbers slide down. Linus mentioned "of course your disk numbers will
    change" earlier. There's no real reason they have to, but there's also no
    reason you have to access them in the sys-v'ish c0d0p0.. style either.

    What I liked about the way Tru64 did it is that the disk numbering stays
    the same. You could've just taken the disk out for a day to put in another
    system, and you're planning on putting it back in. Re-attach the disk w/
    serial number "123", and it gets assigned "dsk1" again. Add another disk to
    the system, it gets dsk3, regardless of whether dsk1 was re-attached or
    not.

    This approach has the advantage of abstracting the device type from the
    user, as well as offering reproducible ordering.  The clear and immediate
    exception to all of this is replacing a failed disk, RAID or not. Since the
    mapping is done via user space, a simple utility, or even a text editor
    could remap a "new" disk to an "old" location, thus making the disk
    replaced. The "moving" of the disk to a different location shouldn't be
    much different than a device disappearing and reappearing.
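
A minimal Python sketch of such a mapping, under stated assumptions: the
DiskMap class, the JSON file, and the method names are invented for the
example and say nothing about how Tru64 actually stores its table. It
shows the three behaviours described above: indexes are handed out in
discovery order, a returning serial number gets its old index back, and a
replacement disk can be remapped onto a failed disk's index from user
space.

```python
import json
import os
import tempfile


class DiskMap:
    """Persistent serial-number -> disk index table (hypothetical format)."""

    def __init__(self, path):
        self.path = path
        self.table = {}
        if os.path.exists(path):
            with open(path) as f:
                self.table = json.load(f)

    def _save(self):
        with open(self.path, "w") as f:
            json.dump(self.table, f, indent=2, sort_keys=True)

    def attach(self, serial):
        """Return the stable index for a serial, allocating one if new."""
        if serial not in self.table:
            # Next free slot is one past the highest index ever recorded,
            # so an absent disk's index is never given to someone else.
            self.table[serial] = max(self.table.values(), default=-1) + 1
            self._save()
        return self.table[serial]

    def replace(self, old_serial, new_serial):
        """Give a failed disk's index to its replacement."""
        self.table[new_serial] = self.table.pop(old_serial)
        self._save()


def demo():
    path = os.path.join(tempfile.mkdtemp(), "disk-mappings.json")
    m = DiskMap(path)
    first = [m.attach(s) for s in ("456", "123", "789")]  # discovery order
    m = DiskMap(path)            # simulate a reboot: reload from disk
    new = m.attach("999")        # a new disk appears while "123" is away
    back = m.attach("123")       # "123" is re-attached later
    m.replace("123", "abc")      # "123" fails for good; remap its slot
    return first, new, back, m.attach("abc")
```

Here demo() walks through the scenario from the text: three disks are
discovered, one is taken out for a day, a fourth is added, and the
returning disk still lands on its old number.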

    Of course, this is all a high level view of the whole process, but I
    thought I'd throw it out there for comment.

    -Jeff

-- 
Jeff Mahoney
jeffm@suse.com
jeffm@csh.rit.edu


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:28                                       ` James Simmons
  2001-05-15 21:31                                         ` H. Peter Anvin
  2001-05-15 21:43                                         ` Johannes Erdfelt
@ 2001-05-15 22:07                                         ` Alan Cox
  2001-05-16  7:11                                         ` Kai Henningsen
  3 siblings, 0 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-15 22:07 UTC (permalink / raw)
  To: James Simmons
  Cc: Nicolas Pitre, H. Peter Anvin, Linus Torvalds, Alexander Viro,
	Alan Cox, Neil Brown, Jeff Garzik, Linux Kernel Mailing List

> I couldn't agree with you more. It gives me headaches at work. One note,
> their is a except to the eth0 thing. USB to USB networking. It uses usb0,
> etc. I personally which they use eth0.  
> 

The packet framing is quite different so it doesn't really make sense. For
wireless it does use ethernet packet format so it makes sense.



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:57                           ` H. Peter Anvin
@ 2001-05-15 22:07                             ` Chip Salzenberg
  2001-05-15 22:11                               ` H. Peter Anvin
  0 siblings, 1 reply; 317+ messages in thread
From: Chip Salzenberg @ 2001-05-15 22:07 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Alexander Viro, Linus Torvalds, Alan Cox, Neil Brown,
	Jeff Garzik, Linux Kernel Mailing List

According to H. Peter Anvin:
> A device can inherently belong to multiple device classes, and it
> really should be thought of as such.

And then there are layering technologies like LVM and loopback.
They should be included in a discovery, but if you limit yourself
to one "device type", there's no place for them.

> For example a disk may belong, at the same time, to the "scsi",
> "disk" and "scsi-disk" device classes [...]

True, but in a sane system, "scsi" + "disk" implies "scsi-disk".
-- 
Chip Salzenberg              - a.k.a. -             <chip@valinux.com>
 "We have no fuel on board, plus or minus 8 kilograms."  -- NEAR tech


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 22:07                             ` Chip Salzenberg
@ 2001-05-15 22:11                               ` H. Peter Anvin
  0 siblings, 0 replies; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-15 22:11 UTC (permalink / raw)
  To: Chip Salzenberg
  Cc: Alexander Viro, Linus Torvalds, Alan Cox, Neil Brown,
	Jeff Garzik, Linux Kernel Mailing List

Chip Salzenberg wrote:
> 
> According to H. Peter Anvin:
> > A device can inherently belong to multiple device classes, and it
> > really should be thought of as such.
> 
> And then there are layering technologies like LVM and loopback.
> They should be included in a discovery, but if you limit yourself
> to one "device type", there's no place for them.
> 
> > For example a disk may belong, at the same time, to the "scsi",
> > "disk" and "scsi-disk" device classes [...]
> 
> True, but in a sane system, "scsi" + "disk" implies "scsi-disk".
> 

Well, of course, but it's still a separate class.  An operation can
belong to "scsi-disk" that doesn't belong in either "scsi" or "disk". 
You can replace the - with an upside-down U if you want; it's not in
Latin-1 unfortunately.

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:40                     ` Chip Salzenberg
@ 2001-05-15 22:12                       ` Alan Cox
  2001-05-15 22:19                         ` H. Peter Anvin
  2001-05-15 23:39                         ` Chip Salzenberg
  2001-05-15 22:49                       ` James Simmons
  2001-05-15 23:22                       ` Kenneth Johansson
  2 siblings, 2 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-15 22:12 UTC (permalink / raw)
  To: Chip Salzenberg
  Cc: Alan Cox, Linus Torvalds, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List, viro

> Wouldn't it be better just to *try* ioctls and see which ones work and
> which ones don't?

1. We have overlaps

2. I've seen code where people play clever ioctl tricks to deduce a device
type and it ends up looking like one of those chemistry identification
charts (hopefully minus the "do you see smoke?" step)

It should be clean and explicit



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:41                         ` Richard Gooch
  2001-05-15 21:47                           ` Alexander Viro
@ 2001-05-15 22:14                           ` Alan Cox
  2001-05-15 22:24                           ` Richard Gooch
                                             ` (3 subsequent siblings)
  5 siblings, 0 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-15 22:14 UTC (permalink / raw)
  To: Richard Gooch
  Cc: Alan Cox, Ingo Oeser, Linus Torvalds, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List, viro

> > > 	if (strcmp (buffer + len - 2, "cd") != 0) {
> > > 		fprintf (stderr, "Not a CD-ROM! Bugger off.\n");
> > > 		exit (1);
> > 
> > And on my box cd is the cabbage dicer whoops
> 
> Actually, no, because it's guaranteed that a trailing "/cd" is a
> CD-ROM. That's the standard.

And it's back to /dev/disc versus /dev/disk. Don't muddle user-picked file
names with kernel namespace bindings please.




* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:26                                       ` Dan Hollis
@ 2001-05-15 22:14                                         ` Miles Lane
  0 siblings, 0 replies; 317+ messages in thread
From: Miles Lane @ 2001-05-15 22:14 UTC (permalink / raw)
  To: Dan Hollis; +Cc: Linux Kernel Mailing List

On 15 May 2001 13:26:58 -0700, Dan Hollis wrote:
> This thread is becoming high enough volume and likely to become much more
> so, perhaps a separate ml should be set up for it? linux-device-management
> perhaps?

I agree that this is going to be a very high-volume discussion.  OTOH,
this discussion is going to have a fundamental impact on nearly everyone
doing driver work in the kernel tree.  It's hard for me to conceive of
kernel hackers who wouldn't want to track this discussion and thereby
gain a much better understanding of the implementation and design issues
surrounding Linux driver development.  Taking the discussion to another
list is likely to fail and would also deprive many of having this
information in their faces, which is probably where it belongs.

Happy trails,
        Miles



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:46                         ` Alexander Viro
  2001-05-15 21:57                           ` H. Peter Anvin
@ 2001-05-15 22:18                           ` Alan Cox
  1 sibling, 0 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-15 22:18 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Chip Salzenberg, Linus Torvalds, Alan Cox, Neil Brown,
	Jeff Garzik, H. Peter Anvin, Linux Kernel Mailing List

> Type of filesystem where the file came from? Sure.

Who says it comes from only one - even on devfs that is not true

/dev/disk/4 is quite possibly

	disk
	scsi-disk
	scsi-device
	usb-scsi-device
	usb-device

all at once



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 22:12                       ` Alan Cox
@ 2001-05-15 22:19                         ` H. Peter Anvin
  2001-05-15 22:28                           ` Alan Cox
  2001-05-15 23:39                         ` Chip Salzenberg
  1 sibling, 1 reply; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-15 22:19 UTC (permalink / raw)
  To: Alan Cox
  Cc: Chip Salzenberg, Linus Torvalds, Neil Brown, Jeff Garzik,
	Linux Kernel Mailing List, viro

Alan Cox wrote:
> 
> > Wouldn't it be better just to *try* ioctls and see which ones work and
> > which ones don't?
> 
> 1. We have overlaps
> 

1. is of course a problem in itself.  Someone who creates overlapping
ioctls should be spanked, hard.

> 2. I've seen code where people play clever ioctl tricks to deduce a device
> type and it ends up looking like one of those chemistry identification
> charts (hopefully minus do you see smoke ?)
> 
> It should be clean and explicit

Agreed, but "determining type of device" and "determining if interface X
is available on this device" are different operations.  If the latter is
what you want, you want to *explicitly* perform the latter operation.

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:41                         ` Richard Gooch
  2001-05-15 21:47                           ` Alexander Viro
  2001-05-15 22:14                           ` Alan Cox
@ 2001-05-15 22:24                           ` Richard Gooch
  2001-05-15 22:27                             ` H. Peter Anvin
  2001-05-15 22:38                             ` Alexander Viro
  2001-05-15 22:28                           ` Richard Gooch
                                             ` (2 subsequent siblings)
  5 siblings, 2 replies; 317+ messages in thread
From: Richard Gooch @ 2001-05-15 22:24 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Alan Cox, Ingo Oeser, Linus Torvalds, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List

Alexander Viro writes:
> 
> 
> On Tue, 15 May 2001, Richard Gooch wrote:
> 
> > Alan Cox writes:
> > > > 	len = readlink ("/proc/self/3", buffer, buflen);
> > > > 	if (strcmp (buffer + len - 2, "cd") != 0) {
> > > > 		fprintf (stderr, "Not a CD-ROM! Bugger off.\n");
> > > > 		exit (1);
> > > 
> > > And on my box cd is the cabbage dicer whoops
> > 
> > Actually, no, because it's guaranteed that a trailing "/cd" is a
> > CD-ROM. That's the standard.
> 
> Set by...?

Me&Linus. The device name authority (Peter). Whoever. If you want
Peter to bless it, then fine. But the standard is there. Violators
will be persecuted.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 22:24                           ` Richard Gooch
@ 2001-05-15 22:27                             ` H. Peter Anvin
  2001-05-15 22:38                             ` Alexander Viro
  1 sibling, 0 replies; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-15 22:27 UTC (permalink / raw)
  To: Richard Gooch
  Cc: Alexander Viro, Alan Cox, Ingo Oeser, Linus Torvalds, Neil Brown,
	Jeff Garzik, Linux Kernel Mailing List

Richard Gooch wrote:
> 
> Alexander Viro writes:
> >
> >
> > On Tue, 15 May 2001, Richard Gooch wrote:
> >
> > > Alan Cox writes:
> > > > >         len = readlink ("/proc/self/3", buffer, buflen);
> > > > >         if (strcmp (buffer + len - 2, "cd") != 0) {
> > > > >                 fprintf (stderr, "Not a CD-ROM! Bugger off.\n");
> > > > >                 exit (1);
> > > >
> > > > And on my box cd is the cabbage dicer whoops
> > >
> > > Actually, no, because it's guaranteed that a trailing "/cd" is a
> > > CD-ROM. That's the standard.
> >
> > Set by...?
> 
> Me&Linus. The device name authority (Peter). Whoever. If you want
> Peter to bless it, then fine. But the standard is there. Violators
> will be persecuted.
> 

No, bad designers will be persecuted.  This is bad design.

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:41                         ` Richard Gooch
                                             ` (2 preceding siblings ...)
  2001-05-15 22:24                           ` Richard Gooch
@ 2001-05-15 22:28                           ` Richard Gooch
  2001-05-15 22:32                             ` H. Peter Anvin
  2001-05-15 22:33                             ` Alan Cox
  2001-05-16  7:21                           ` Geert Uytterhoeven
  2001-05-16 18:22                           ` Richard Gooch
  5 siblings, 2 replies; 317+ messages in thread
From: Richard Gooch @ 2001-05-15 22:28 UTC (permalink / raw)
  To: Alan Cox
  Cc: Richard Gooch, Ingo Oeser, Linus Torvalds, Neil Brown,
	Jeff Garzik, H. Peter Anvin, Linux Kernel Mailing List, viro

Alan Cox writes:
> > > > 	if (strcmp (buffer + len - 2, "cd") != 0) {
> > > > 		fprintf (stderr, "Not a CD-ROM! Bugger off.\n");
> > > > 		exit (1);
> > > 
> > > And on my box cd is the cabbage dicer whoops
> > 
> > Actually, no, because it's guaranteed that a trailing "/cd" is a
> > CD-ROM. That's the standard.
> 
> And its back to /dev/disc versus /dev/disk. Dont muddle user picked
> file names with kernel namespace bindings please.

Even if we have per-device filesystems, we are going to have the same
issue, in one form or another. If we have a "/devicetype" trailing
component added on, then somewhere it has to report "CD-ROM" or "cd"
or "Compact Disc".
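
For reference, the name-based check under debate can be sketched in
Python as follows. This assumes the /proc/self/fd/<n> symlinks that Linux
provides; the helper names and the example device path are made up for
the illustration. The sketch compares the whole final path component
rather than the last two characters, so that a name merely ending in the
letters "cd" is not accepted.

```python
import os


def fd_name(fd):
    """Return the path the kernel reports for an open descriptor.

    Relies on Linux's /proc/self/fd/<n> symlinks.
    """
    return os.readlink(f"/proc/self/fd/{fd}")


def has_trailing_component(path, component):
    """True if the last path component equals component exactly."""
    return path.rstrip("/").rsplit("/", 1)[-1] == component


def looks_like_cdrom(fd):
    """Name-based guess only; it trusts whoever chose the name."""
    return has_trailing_component(fd_name(fd), "cd")
```

Note that this only inspects a name. It reports nothing about which
interfaces the device behind the descriptor actually supports.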

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 22:19                         ` H. Peter Anvin
@ 2001-05-15 22:28                           ` Alan Cox
  2001-05-15 22:34                             ` H. Peter Anvin
  0 siblings, 1 reply; 317+ messages in thread
From: Alan Cox @ 2001-05-15 22:28 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Alan Cox, Chip Salzenberg, Linus Torvalds, Neil Brown,
	Jeff Garzik, Linux Kernel Mailing List, viro

> 1. is of course a problem in itself.  Someone who creates overlapping
> ioctls should be spanked, hard.

No argument there. But there is no LANANA ioctl body

> Agreed, but "determining type of device" and "determining if interface X
> is available on this device" are different operations.  If the latter is
> what you want, you want to *explicitly* perform the latter operation.

Both should be clean and explicit


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 22:28                           ` Richard Gooch
@ 2001-05-15 22:32                             ` H. Peter Anvin
  2001-05-15 22:33                             ` Alan Cox
  1 sibling, 0 replies; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-15 22:32 UTC (permalink / raw)
  To: Richard Gooch
  Cc: Alan Cox, Ingo Oeser, Linus Torvalds, Neil Brown, Jeff Garzik,
	Linux Kernel Mailing List, viro

Richard Gooch wrote:
> 
> Even if we have per-device filesystems, we are going to have the same
> issue, in one form or another. If we have a "/devicetype" trailing
> component added on, then somewhere it has to report "CD-ROM" or "cd"
> or "Compact Disc".
> 

Again, many device types aren't mutually exclusive.  It's a set, not an
enum.

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 22:28                           ` Richard Gooch
  2001-05-15 22:32                             ` H. Peter Anvin
@ 2001-05-15 22:33                             ` Alan Cox
  1 sibling, 0 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-15 22:33 UTC (permalink / raw)
  To: Richard Gooch
  Cc: Alan Cox, Richard Gooch, Ingo Oeser, Linus Torvalds, Neil Brown,
	Jeff Garzik, H. Peter Anvin, Linux Kernel Mailing List, viro

> Even if we have per-device filesystems, we are going to have the same
> issue, in one form or another. If we have a "/devicetype" trailing
> component added on, then somewhere it has to report "CD-ROM" or "cd"
> or "Compact Disc".

When I ask it. Not when I name it


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 22:28                           ` Alan Cox
@ 2001-05-15 22:34                             ` H. Peter Anvin
  0 siblings, 0 replies; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-15 22:34 UTC (permalink / raw)
  To: Alan Cox
  Cc: Chip Salzenberg, Linus Torvalds, Neil Brown, Jeff Garzik,
	Linux Kernel Mailing List, viro

Alan Cox wrote:
> 
> > 1. is of course a problem in itself.  Someone who creates overlapping
> > ioctls should be spanked, hard.
> 
> No argument there. But there is no LANANA ioctl body
> 

I thought Michael Chastain was maintaining this set.  No, we haven't made
it an official LANANA function, mostly because I didn't want to push.

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:18                                     ` Linus Torvalds
                                                         ` (2 preceding siblings ...)
  2001-05-15 21:51                                       ` Mark Frazer
@ 2001-05-15 22:35                                       ` Bob Glamm
  2001-05-16  0:56                                       ` Jonathan Lundell
                                                         ` (2 subsequent siblings)
  6 siblings, 0 replies; 317+ messages in thread
From: Bob Glamm @ 2001-05-15 22:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jonathan Lundell, Jeff Garzik, James Simmons, Alan Cox,
	Neil Brown, H. Peter Anvin, Linux Kernel Mailing List, viro

> > >Keep it informational. And NEVER EVER make it part of the design.
> > 
> > What about:
> > 
> > 1 (network domain). I have two network interfaces that I connect to 
> > two different network segments, eth0 & eth1;
> 
> So?
> 
> Informational. You can always ask what "eth0" and "eth1" are.
[...] 
> The location of the device is _meaningless_. 
[...]

Roast me if I'm wrong or if this has been beat to death, but there
seem to be two sides of the issue here:

  1) Device numbering/naming.  It is immaterial to the kernel how the
     devices are enumerated or named.  In fact, I would argue that the
     naming could be non-deterministic across reboots.

  2) Device identification.  The end-user or user-space software needs
     to be able to configure certain devices a certain way.  They too
     don't (shouldn't) care what numbers or names are given to the
     devices, as long as they can configure the proper device correctly.

I don't disagree that a move toward dynamic device enumeration/naming is
the right way to go; in fact, I don't disagree
that a 32-bit dev_t would be more than adequate (and sparse) for most
configurations - even the largest configured machines wouldn't have more
than several million device names/nodes.

However, I *do* see that there is a LOT of end-user software that still
depends on static numbering to partially identify devices.  Yes, it is
half-baked that major 8 gets you SCSI devices, and then after you open
all the minor devices you STILL get to do all the device-specific ioctl()
calls to identify the device capabilities of the controller or each target
on the controller.  But I don't think that arbitrarily slamming the door
on static naming/numbering to force people to change arguably broken
code or semantics is the right move to make either.

Instead, what about doing the transformation gradually?  Static and
dynamic enumeration shouldn't have to be mutually exclusive.  E.g.
in the interim devices could be accessed via dynamically enumerated/named
nodes as well as the old statically enumerated/named nodes.  The
current device enumeration space seems to be sparse enough to take
care of this for most cases.

During this transition, end-user software would have the chance to be
re-written to use the new dynamically enumerated/named device scheme,
perhaps with a somewhat standardized interface to make identification 
and capability detection of devices easier from software.  At some
scheduled point in future kernel development support for the old
static enumeration/naming scheme would be dropped.

Finally, there has to be an *easy* way of identifying devices from software.
You're right, I don't care if my network cards are numbered 0-1-2, 2-0-1,
or in any other permutation, *as long as I can write something like this*:

  # start up networking
  for i in eth0 eth1 eth2; do
      identify device $i
      get configuration/config procedure for device $i identity
      configure $i
  done

Ideally, the identity of device $i would remain the same across reboots.
Note that the device isn't named by its identity, rather, the identity is
acquired from the device.
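
A minimal Python sketch of that loop, assuming the identity is something
like a MAC address reported by the device itself. The function name, the
dictionaries, and the sample addresses are all hypothetical. The point is
that configuration is looked up by identity, so the result does not
depend on which ethN name a card happens to receive on a given boot.

```python
def plan_configuration(devices, configs):
    """Map each device name to the settings chosen for its identity.

    devices: name -> identity, as acquired from the device on this boot
    configs: identity -> settings, kept across reboots
    """
    plan = {}
    for name in sorted(devices):
        identity = devices[name]
        settings = configs.get(identity)
        if settings is not None:
            plan[name] = settings
    return plan


def by_identity(devices, plan):
    """Invert a plan so it can be compared across boots."""
    return {devices[name]: settings for name, settings in plan.items()}


# Settings keyed by identity (sample MAC addresses are made up).
CONFIGS = {
    "00:11:22:33:44:55": {"address": "10.0.0.1"},
    "66:77:88:99:aa:bb": {"address": "192.168.1.1"},
}

# Two boots in which the same cards were numbered in a different order.
BOOT_A = {"eth0": "00:11:22:33:44:55", "eth1": "66:77:88:99:aa:bb"}
BOOT_B = {"eth0": "66:77:88:99:aa:bb", "eth1": "00:11:22:33:44:55"}
```

On both sample boots each card ends up with the same address, even though
the names eth0 and eth1 are swapped between them.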

This gets difficult for certain situations but I think those situations
are rare.  Most modern hardware I've seen has some intrinsic identification
built on board.

> Linux gets this right. We don't give 100Mbps cards different names from
> 10Mbps cards - and pcmcia cards show up in the same namespace as cardbus,
> which is the same namespace as ISA. And it doesn't matter what _driver_ we
> use.
> 
> The "eth0..N" naming is done RIGHT!
> 
> > 2 (disk domain). I have multiple spindles on multiple SCSI adapters. 
> 
> So? Same deal. You don't have eth0..N, you have disk0..N. 
[...]
> Linux gets this _somewhat_ right. The /dev/sdxxx naming is correct (or, if
> you look at only IDE devices, /dev/hdxxx). The problem is that we don't
> have a unified namespace, so unlike eth0..N we do _not_ have a unified
> namespace for disks.

This numbering seems to be a kernel categorization policy.  E.g.,
I have k eth devices, numbered eth0..k-1.  I have m disks, numbered
disc0..m-1, I have n video adapters, numbered fb0..n-1, etc.  This
implies that at some point someone will have to maintain a list of 
device categories.

IMHO the example isn't consistent though.  ethXX devices are a different
level of classification from diskYY.  I would argue that *all* network
devices should be named net0, net1, etc., be they Ethernet, Token Ring, Fibre
Channel, ATM.  Just as different disks should be named disk0, disk1, etc., whether
they are IDE, SCSI, ESDI, or some other controller format.

-Bob


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 22:24                           ` Richard Gooch
  2001-05-15 22:27                             ` H. Peter Anvin
@ 2001-05-15 22:38                             ` Alexander Viro
  1 sibling, 0 replies; 317+ messages in thread
From: Alexander Viro @ 2001-05-15 22:38 UTC (permalink / raw)
  To: Richard Gooch
  Cc: Alan Cox, Ingo Oeser, Linus Torvalds, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List



On Tue, 15 May 2001, Richard Gooch wrote:

> Me&Linus. The device name authority (Peter). Whoever. If you want
> Peter to bless it, then fine. But the standard is there. Violators
> will be persecuted.

Ah, standard on names in devfs? About as relevant as GOSIP...


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:29                                       ` Alex Bligh - linux-kernel
  2001-05-15 21:36                                         ` Linus Torvalds
  2001-05-15 22:03                                         ` Jeff Mahoney
@ 2001-05-15 22:42                                         ` Andreas Dilger
  2 siblings, 0 replies; 317+ messages in thread
From: Andreas Dilger @ 2001-05-15 22:42 UTC (permalink / raw)
  To: Alex Bligh - linux-kernel
  Cc: Linus Torvalds, Jonathan Lundell, Jeff Garzik, James Simmons,
	Alan Cox, Neil Brown, H. Peter Anvin, Linux Kernel Mailing List,
	viro

Alex Bligh writes:
> Q: Let us assume you have dynamic numbering disk0..N as you suggest,
>    and you have some s/w RAID of SCSI disks. A disk fails, and is (hot)
>    removed. Life continues. You reboot the machine. Disks are now numbered
>    disk0..(N-1). If the RAID config specifies using disk0..N thusly, it
>    is going to be very confused, as stripes will appear in the wrong place.

But this already happens with the SCSI device numbering, so no big change.
It would also happen if you have multi-path access to the RAID box (i.e.
two SCSI controllers), or with FC where there IS no "physical address",
or if you move the disks to a different type of SCSI controller (with a
different detection order than other controllers in the system), etc.

>    Doesn't that mean the file specifying the RAID config is going to have
>    to enumerate SCSI IDs (or something configuration invariant) as
>    opposed to use the disk0..N numbering anyway?

No such thing as configuration invariant in some cases.

>    Sure it can interrogate each disk0..N to see which has the ID that
>    it actually wanted, but doesn't this rather subvert the purpose?

Not at all.  To be robust, the (software) RAID system should ONLY access
disks that it knows belong to a given RAID set.  To do otherwise is
useless.  This is what LVM does, and it surprises me if MD RAID does
anything else (never really looked into it...).
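
As a rough illustration of the point above (a sketch only, not how LVM or
MD actually store their metadata): if each member disk carries a label
naming its set and its position in that set, the RAID code can find its
members by scanning whatever disk0..N happens to exist, and the numbering
itself stops mattering. The struct fields set_id and slot, and the
function find_member, are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical on-disk label: which RAID set a disk belongs to and
 * which position (slot) it occupies within that set. A disk that is
 * not part of any set has set_id == NULL. */
struct disk_label {
    const char *set_id;  /* assumed unique per RAID set */
    int slot;            /* position of this disk within the set */
};

/* Scan the currently visible disks (array index i stands for disk<i>)
 * and return the index of the disk holding (set_id, slot), or -1 if
 * no visible disk carries that label. */
static int find_member(const struct disk_label *disks, size_t n,
                       const char *set_id, int slot)
{
    assert(disks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (disks[i].set_id != NULL &&
            disks[i].slot == slot &&
            strcmp(disks[i].set_id, set_id) == 0)
            return (int)i;
    }
    return -1;
}
```

If the disk in slot 1 is removed and the machine rebooted, the slot-2
member shows up one index lower, and the lookup still finds it; the
missing slot is reported as absent rather than silently replaced by a
neighbour.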

In any case, a sane system would likely not expose all of the underlying
disks that make up a RAID set as a "disk", after that RAID set was built.
At configuration time, any /dev/disk{A,B,C} that went into the RAID
set would be removed, and the resulting RAID volume would become a new
/dev/diskX, just like any other disk in the system.  If you really needed
more information about the RAID configuration, use the RAID tools to
query the attributes of /dev/diskX.  That would solve a _lot_ of problems
with disks that make up meta-volumes being accessed via /dev/hdX instead
of /dev/mdY.

> IE, given one could create /dev/disk/?.+, isn't the important
> argument that they share common major device numbers etc., not whether
> they linearly reorder precisely to 0..N as opposed to have some form
> of identifier guaranteed to be static across reboot & config change.

I don't think the objective is necessarily to have a _packed_ device
numbering, nor one that changes randomly after each reboot, but just a
generic device naming independent of physical location, access method, etc.

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:40                     ` Chip Salzenberg
  2001-05-15 22:12                       ` Alan Cox
@ 2001-05-15 22:49                       ` James Simmons
  2001-05-15 23:22                       ` Kenneth Johansson
  2 siblings, 0 replies; 317+ messages in thread
From: James Simmons @ 2001-05-15 22:49 UTC (permalink / raw)
  To: Chip Salzenberg
  Cc: Alan Cox, Linus Torvalds, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List, viro


> According to Alan Cox:
> > Given a file handle 'X' how do I find out what ioctl groups I should
> > apply to it.
> 
> Wouldn't it be better just to *try* ioctls and see which ones work and
> which ones don't?

You can do this with the tty layer. Just open /dev/tty and try TIOCLINUX.
On my serial console it fails, but the exact same program works when I run
it on my VT. It is the only way to see if these ioctl calls work; there is
no other way to tell whether you're on a serial console or a VT.


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:59                                 ` Chip Salzenberg
@ 2001-05-15 22:51                                   ` James Simmons
  0 siblings, 0 replies; 317+ messages in thread
From: James Simmons @ 2001-05-15 22:51 UTC (permalink / raw)
  To: Chip Salzenberg
  Cc: Linus Torvalds, Alexander Viro, Alan Cox, Neil Brown,
	Jeff Garzik, H. Peter Anvin, Linux Kernel Mailing List


> > Graphics cards are the same way. Especially high end ones. They have pipes
> > as well. For low end cards you can think of them as single pipeline cards
> > with one pipe.
> 
> It still frosts my shorts that DRM (e.g. /dev/dri/card0) doesn't use
> write().  It's a natural way to feed pipelines.  But no, it's a raft
> of ioctl() calls.  *sigh*

I never liked this either. ioctl calls are slooooooooooooooooooooooooooow.


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:40                     ` Chip Salzenberg
  2001-05-15 22:12                       ` Alan Cox
  2001-05-15 22:49                       ` James Simmons
@ 2001-05-15 23:22                       ` Kenneth Johansson
  2 siblings, 0 replies; 317+ messages in thread
From: Kenneth Johansson @ 2001-05-15 23:22 UTC (permalink / raw)
  To: Chip Salzenberg
  Cc: Alan Cox, Linus Torvalds, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List, viro

Chip Salzenberg wrote:

> According to Alan Cox:
> > Given a file handle 'X' how do I find out what ioctl groups I should
> > apply to it.
>
> Wouldn't it be better just to *try* ioctls and see which ones work and
> which ones don't?

As ioctls are just numbers that can be valid but mean totally different things depending on the device, I don't see how this is going to work. It took me close to a month to figure out why my new 10000rpm SCSI disk constantly ended up with a read-only filesystem. What I did not know at the time was that the /dev/sg[x] numbering had been changed by adding something to the SCSI chain, and my backup software now sent the eject command to the disk instead of to the tape. This ioctl means spin down when it is sent to the disk, and that in turn produces a fatal error and a remount to ro for the mounted filesystem.

This problem would not have existed for me if things had been mapped more statically; this is why I am not overly happy to hear that things are going to be even more volatile. But I guess that it's possible to solve most of these issues in userspace.

One way of looking at the problem is to take it to its logical extreme. This would be a kernel with no persistent storage for devices, no major/minor numbers, and no name policy. The kernel would simply put any device it finds in /dev/devno/[x], where x is just a number that should be assumed to change on every reboot.

Any userspace solution that can create stable device links out of this is one that would get my vote. This obviously needs some type of standard way to query the devices, or else there's no way it can work.



^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 22:12                       ` Alan Cox
  2001-05-15 22:19                         ` H. Peter Anvin
@ 2001-05-15 23:39                         ` Chip Salzenberg
  2001-05-16 20:37                           ` Alan Cox
  1 sibling, 1 reply; 317+ messages in thread
From: Chip Salzenberg @ 2001-05-15 23:39 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, Neil Brown, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List, viro

According to Alan Cox:
> Chip:
> > Wouldn't it be better just to *try* ioctls and see which ones work and
> > which ones don't?
> 
> 1. We have overlaps

We all agree that overlaps need to be eliminated over time.  In the
meantime, as a coping strategy: I'll bet you that for any two given
device classes, there is at least one ioctl that works on only one of
them.  (I'm only talking about an interim workaround!  Calm down!  Put
down those bats!)

> 2. I've seen code where people play clever ioctl tricks to deduce a
> device type and it ends up looking like one of those chemistry
> identification charts (hopefully minus do you see smoke ?)

I don't mean to suggest that ioctls be used to deduce device types
(except in the case of overlapping ioctl numbers, which shouldn't be
all *that* common (I hope)).  I mean to suggest that the question
"What device type are you?" usually shouldn't even be asked!

If you want to do X to the device on fd, just call ioctl(fd, X, ...).
Either it works or it doesn't.
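
In C, the "just try it" approach comes down to checking the return value
and errno. A minimal sketch, using TIOCGWINSZ purely as an example
request; the helper name try_winsize is invented here. This cannot guard
against the overlap problem: if another driver gives the same request
number a different meaning, the call may succeed and do the wrong thing.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Ask the device behind fd for its terminal window size.
 * Returns 0 on success, or a negative errno value on failure
 * (for example -ENOTTY when the device does not support the
 * request, or -EBADF when fd is not an open descriptor). */
static int try_winsize(int fd, struct winsize *ws)
{
    if (ioctl(fd, TIOCGWINSZ, ws) == -1)
        return -errno;
    return 0;
}
```

A caller would treat -ENOTTY as "this device doesn't do that" and carry
on, without ever asking what type of device it is.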

I realize that overlapping ioctls throw a monkey wrench into this
world view.  Is it a bigger wrench than the wrenching pain that we'll
have to live through to make device identification reliable?  Depends
on how many ioctls overlap, and how easily we could make them stop
overlapping.
-- 
Chip Salzenberg              - a.k.a. -             <chip@valinux.com>
 "We have no fuel on board, plus or minus 8 kilograms."  -- NEAR tech

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:18                                     ` Linus Torvalds
                                                         ` (3 preceding siblings ...)
  2001-05-15 22:35                                       ` Bob Glamm
@ 2001-05-16  0:56                                       ` Jonathan Lundell
  2001-05-16  2:31                                         ` Andrew Morton
  2001-05-16  6:56                                         ` Jonathan Lundell
  2001-05-16  7:24                                       ` Geert Uytterhoeven
  2001-05-16 16:04                                       ` Michael Meissner
  6 siblings, 2 replies; 317+ messages in thread
From: Jonathan Lundell @ 2001-05-16  0:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff Garzik, James Simmons, Alan Cox, Neil Brown, H. Peter Anvin,
	Linux Kernel Mailing List, viro

At 1:18 PM -0700 2001-05-15, Linus Torvalds wrote:
>> 1 (network domain). I have two network interfaces that I connect to
>> two different network segments, eth0 & eth1;
>
>So?
>
>Informational. You can always ask what "eth0" and "eth1" are.
>
>There's another side to this: repeatability. A setup should be
>_repeatable_.
>
>This is what we have now. Network devices are called "eth0..N", and nobody
>is complaining about the fact that the numbering is basically random. It
>is _repeatable_ as long as you don't change your hardware setup, and the
>numbering has effectively _nothing_ to do with "location".
>
>You don't say "oh, I have my network card in PCI bus #2, slot #3,
>subfunction #1, so I should do 'ifconfig netp2s3f1'". Right?
>
>The location of the device is _meaningless_.

I *like* eth0..n (I'd like net0..n better). And I *can't* ask what 
eth0 and eth1 are, by the way, but I should be able to (Jeff Garzik 
has proposed an extension to ethtool to help out this lack, but it's 
not in Linux today, and needs concrete implementation anyway).

But that's not my point. I'm *not* proposing that we exchange eth0 
for geographic names. I'm suggesting, though, that the location of 
the device is *not* meaningless, because it's the physically-located 
RJ45 socket (or whatever) that I have to connect a particular cable 
to. Sure, no big deal for systems with a single connection, but it 
becomes a real pain when you've got a dozen, which is a reasonable 
number for some network-infrastructure functions (eg firewalls).

When I ifconfig one of a collection of interfaces, I'm very much 
talking about the specific physical interface connected via a 
specific physical cable to a specific physical switch port.

Bob Glamm  is on the right track with

At 5:35 PM -0500 2001-05-15, Bob Glamm wrote:
>   # start up networking
>   for i in eth0 eth1 eth2; do
>       identify device $i
>       get configuration/config procedure for device $i identity
>       configure $i
>   done

...it's just that right now the connection between eth* and its 
physical identity isn't made.
-- 
/Jonathan Lundell.

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:20                                     ` Nicolas Pitre
  2001-05-15 21:28                                       ` James Simmons
@ 2001-05-16  0:59                                       ` Daniel Phillips
  2001-05-16  1:34                                         ` Nicolas Pitre
                                                           ` (2 more replies)
  2001-05-16  7:17                                       ` Kai Henningsen
  2 siblings, 3 replies; 317+ messages in thread
From: Daniel Phillips @ 2001-05-16  0:59 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linux Kernel Mailing List

On Tuesday 15 May 2001 23:20, Nicolas Pitre wrote:
> Personally, I'd really like to see /dev/ttyS0 be the first detected
> serial port on a system, /dev/ttyS1 the second, etc.

There are well-defined rules for the first four on PC's.  The ttySx 
better match the labels the OEM put on the box.

--
Daniel

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 15:34                 ` Linus Torvalds
@ 2001-05-16  1:00                   ` Daniel Phillips
  2001-05-16 12:58                     ` Jens Axboe
  2001-05-16  3:25                   ` Neil Brown
  1 sibling, 1 reply; 317+ messages in thread
From: Daniel Phillips @ 2001-05-16  1:00 UTC (permalink / raw)
  To: Linus Torvalds, Neil Brown
  Cc: Jeff Garzik, Alan Cox, H. Peter Anvin, Linux Kernel Mailing List, viro

On Tuesday 15 May 2001 17:34, Linus Torvalds wrote:
> On Tue, 15 May 2001, Neil Brown wrote:
> > Of course setting the "queue" function that __blk_get_queue calls to
> > do a lookup of the minor and choose an appropriate queue for the
> > "real" device won't work, as you need to munge bh->b_rdev too.
>
> What I would do is:
>  - remove b_rdev completely.

:-) And b_rsector too?

> [...]

>  - replace it with b_index
>
> Then, the "get_queue" functions basically end up doing the mapping of
>
> 	b_dev -> <queue,b_index>

To clarify, will b_index be in the buffer_head or not?

--
Daniel

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:51                                       ` Linus Torvalds
@ 2001-05-16  1:01                                         ` Daniel Phillips
  2001-05-16  1:04                                           ` H. Peter Anvin
  0 siblings, 1 reply; 317+ messages in thread
From: Daniel Phillips @ 2001-05-16  1:01 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro
  Cc: H. Peter Anvin, James Simmons, Alan Cox, Neil Brown, Jeff Garzik,
	Linux Kernel Mailing List

On Tuesday 15 May 2001 22:51, Linus Torvalds wrote:
> On Tue, 15 May 2001, Alexander Viro wrote:
> > If you want them all to inherit it - inherit from mountpoint.
>
> ..which is exactly what the device node ends up being. The implicit
> mount-point.
>
> At which point, btw, it is completely indistinguishable to user
> space whether the thing is implemented as a full filesystem, or
> whether it's just that the device node exports a simple "lookup()"
> that it passes down to the device driver. So this is also the point
> where it becomes nothing but an implementation issue, and as such
> it's much less contentious.
>
> Done right, they'll be automatic mount-points

Sounds like "treat it like a file and it acts like a file, treat it 
like a directory and it acts like a directory".

--
Daniel

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16  1:01                                         ` Daniel Phillips
@ 2001-05-16  1:04                                           ` H. Peter Anvin
  0 siblings, 0 replies; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-16  1:04 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Linus Torvalds, Alexander Viro, James Simmons, Alan Cox,
	Neil Brown, Jeff Garzik, Linux Kernel Mailing List

Daniel Phillips wrote:
> 
> Sounds like "treat it like a file and it acts like a file, treat it
> like a directory and it acts like a directory".
> 

The original plan was that you only could indirect through it; not
chdir() for example.  One could do the whole enchilada, but then one
would have to expect that open() could have very different effects with
and without O_DIRECTORY (and open() on directories without O_DIRECTORY
should be outlawed with prejudice.)
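
For reference, this is how O_DIRECTORY behaves on ordinary files and
directories today: the open fails with ENOTDIR unless the path is a
directory. A small sketch follows; the helper name opens_as_directory is
made up for this example, and it says nothing about how a device node
with a lookup() method would behave.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to open path strictly as a directory.
 * Returns 1 if it opened as a directory, 0 if it exists but is not
 * a directory, or a negative errno value for any other failure. */
static int opens_as_directory(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd >= 0) {
        close(fd);
        return 1;
    }
    return errno == ENOTDIR ? 0 : -errno;
}
```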

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16  0:59                                       ` Daniel Phillips
@ 2001-05-16  1:34                                         ` Nicolas Pitre
  2001-05-16  1:51                                           ` Jonathan Lundell
  2001-05-16 11:34                                         ` Erik Mouw
  2001-05-17 17:07                                         ` Eric W. Biederman
  2 siblings, 1 reply; 317+ messages in thread
From: Nicolas Pitre @ 2001-05-16  1:34 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Linux Kernel Mailing List



On Wed, 16 May 2001, Daniel Phillips wrote:

> On Tuesday 15 May 2001 23:20, Nicolas Pitre wrote:
> > Personally, I'd really like to see /dev/ttyS0 be the first detected
> > serial port on a system, /dev/ttyS1 the second, etc.
>
> There are well-defined rules for the first four on PC's.  The ttySx
> better match the labels the OEM put on the box.

Then just make them be detected first.


Nicolas


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16  1:34                                         ` Nicolas Pitre
@ 2001-05-16  1:51                                           ` Jonathan Lundell
  0 siblings, 0 replies; 317+ messages in thread
From: Jonathan Lundell @ 2001-05-16  1:51 UTC (permalink / raw)
  To: Nicolas Pitre, Daniel Phillips; +Cc: Linux Kernel Mailing List

At 9:34 PM -0400 2001-05-15, Nicolas Pitre wrote:
>On Wed, 16 May 2001, Daniel Phillips wrote:
>
>> On Tuesday 15 May 2001 23:20, Nicolas Pitre wrote:
>>> Personally, I'd really like to see /dev/ttyS0 be the first detected
>>> serial port on a system, /dev/ttyS1 the second, etc.
>>
>> There are well-defined rules for the first four on PC's.  The ttySx
>> better match the labels the OEM put on the box.
>
>Then just make them be detected first.

Well, they traditionally start with 1, not 0, too. Or have cute 
little icons and no text. Or aren't labelled at all. I'm using one 
fairly well-known dual-port PCI serial board that silently 
interchanged the two ports on a rev change, with no labelling change 
at all ('cause there was no label!). Make your ttySx match *that*!

-- 
/Jonathan Lundell.

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16  0:56                                       ` Jonathan Lundell
@ 2001-05-16  2:31                                         ` Andrew Morton
  2001-05-16  6:56                                         ` Jonathan Lundell
  1 sibling, 0 replies; 317+ messages in thread
From: Andrew Morton @ 2001-05-16  2:31 UTC (permalink / raw)
  To: Jonathan Lundell; +Cc: Linux Kernel Mailing List

Jonathan Lundell wrote:
> 
> ...
> I *like* eth0..n (I'd like net0..n better). And I *can't* ask what
> eth0 and eth1 are, by the way, but I should be able to (Jeff Garzik
> has proposed an extension to ethtool to help out this lack, but it's
> not in Linux today, and needs concrete implementation anyway).
> 
> But that's not my point. I'm *not* proposing that we exchange eth0
> for geographic names. I'm suggesting, though, that the location of
> the device is *not* meaningless, because it's the physically-located
> RJ45 socket (or whatever) that I have to connect a particular cable
> to. Sure, no big deal for systems with a single connection, but it
> becomes a real pain when you've got a dozen, which is a reasonable
> number for some network-infrastructure functions (eg firewalls).
> 
> When I ifconfig one of a collection of interfaces, I'm very much
> talking about the specific physical interface connected via a
> specific physical cable to a specific physical switch port.
> 

Yes, it can be a security trap as well - physically move a card and
your firewall rules end up being applied to the wrong connection.

The 2.4 kernel allows you to rename an interface.  So you can build
a little database of (MAC address/name) pairs. Apply this after booting
and before bringing up the interfaces and everything has the name
you wanted, based on MAC address.

Andi Kleen has an app which does this:

	ftp://ftp.firstfloor.org/pub/ak/smallsrc/nameif.c

but apparently some additional kernel work is needed to make
this work 100% correctly.  I do not know what the specific
problem is.
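
The lookup half of that idea is small. Here is a sketch of a
(MAC address, name) table and the search over it. The struct and function
names are hypothetical and are not taken from nameif.c; the actual rename
step is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/* One row of a hypothetical MAC-to-name table. */
struct mac_name {
    const char *mac;   /* e.g. "00:a0:c9:12:34:56" */
    const char *name;  /* interface name the admin wants for that card */
};

/* Return the configured name for a MAC address, or NULL if the
 * address is not listed. The comparison ignores case because hex
 * digits may be written either way. */
static const char *name_for_mac(const struct mac_name *tab, size_t n,
                                const char *mac)
{
    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (strcasecmp(tab[i].mac, mac) == 0)
            return tab[i].name;
    return NULL;
}
```

A boot script would walk the detected interfaces, read each MAC address,
and rename any interface whose address appears in the table before
bringing it up.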


-

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 15:34                 ` Linus Torvalds
  2001-05-16  1:00                   ` Daniel Phillips
@ 2001-05-16  3:25                   ` Neil Brown
  1 sibling, 0 replies; 317+ messages in thread
From: Neil Brown @ 2001-05-16  3:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

On Tuesday May 15, torvalds@transmeta.com wrote:
> 
> On Tue, 15 May 2001, Neil Brown wrote:
> > 
> > Of course setting the "queue" function that __blk_get_queue calls to do
> > a lookup of the minor and choose an appropriate queue for the "real"
> > device won't work, as you need to munge bh->b_rdev too.
> 
> What I would do is:
>  - remove b_rdev completely. No driver is actually interested in what the
>    device number is, the only thing they want to use it for is to look up
>    which device index we have (and for doing the partition handling, but
>    as discussed in a completely independent discussion a few months ago,
>    we should handle that as a lvm remapping thing, NOT at the driver
>    level!)
>  - replace it with b_index

Wouldn't 
    struct block_device *b_bdev
be a better choice? (though you would need to fiddle with reference
counts then, so maybe not, I'm not sure).

> 
> > You would still need to make sure the blk_size[], blksize_size[],
> > hardsect_size[], max_readahead[], max_sectors[] all got handled
> > properly.
> 
> Actually, I think Jens did most of these, and moved them into a device
> array.
> 

Will this go into 2.4.X anytime soon? or is it 2.5 material?

> > Does the minor number for this "disk" layer have N bits for partition
> > number and 8-N bits (later to be 20-N bits or similar) for device
> > number?
> 
> I'd go with N=8, and only use this for the "new" cases. We've seen that
> N=4 is too small (SCSI), and N=6 (IDE) is already too cramped with an 8-bit
> minor number.

I was assuming that this "disk" device was something that we could do
now, but if N==8, then we need more minor bits before it can be used
effectively, and that means lots of user-space changes doesn't it? So
it won't be in 2.4.
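
To make the bit budget concrete, here is a small sketch of splitting a
minor number into a disk index and a partition number, given N partition
bits. split_minor and struct disk_minor are names made up for this
example, not kernel interfaces.

```c
#include <assert.h>

/* Result of decoding a minor number. */
struct disk_minor {
    unsigned disk;  /* which disk under this major */
    unsigned part;  /* partition number on that disk (0 = whole disk) */
};

/* Split a minor number: the low part_bits bits select the partition,
 * the remaining high bits select the disk. */
static struct disk_minor split_minor(unsigned minor, unsigned part_bits)
{
    struct disk_minor r;

    assert(part_bits < 8 * sizeof minor);
    r.disk = minor >> part_bits;
    r.part = minor & ((1u << part_bits) - 1u);
    return r;
}
```

With N=4, minor 17 decodes to partition 1 of disk 1, and with N=6, minor
65 also decodes to partition 1 of disk 1. With N=8 and only 8 minor bits,
every minor from 0 to 255 decodes to disk 0, which is why the scheme
needs a wider minor before it is useful.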

> 
> BUT! Note that when you do the partition handling in get_queue too (and
> thus index is an index to the _device_ and has nothing to do with
> partitions), you can trivially allow different majors to have different
> numbers of partition bits, because the driver no longer cares. This is
> required so that the get_queue remapping can easily handle the legacy
> IDE/SCSI numbers anyway, so it's easyish to just have both at the same
> time - you could have N=4 for a "disk major for old users that need the
> 16-bit device numbers and a single major", with N=8 for the "new style
> major" which doesn't fit in 8 bits.

Uhm.  If I understand you correctly, you are saying that a single
major (the "disk" major) could have different partition sub-divisions
for different minor ranges.  e.g:
 minor 0-15 is partitions of first scsi drive
 minor 16-31 is partitions of second scsi drive
 minor 64-127 is partitions of first IDE drive

Is that what you mean?  I wouldn't want to have to maintain /dev/...

> 
> > Finally, how do I say that I want the root filesystem to be on a
> > particular "mdp" device+partition.  I cannot assume that my device
> > will be the first to register with the "disk" layer, so I cannot be
> > sure that "root=/dev/diska1" will work.
> 
> You have never been able to really assume that. Disks move around. 
> 
> A lot of people seem to think that controller type or location on the PCI
> bus should somehow have some "meaning", and that it guarantees that the
> disks don't move in the namespace. That's crap. You can do that in user
> space ("what controller are you on?") if you really really care.

This is a topic that seems to be generating a lot of discussion in these
threads.
Clearly each object (drive, partitions, filesystem, pointer, mouse,
framebuffer...) can potentially have several different names, each of
which may be valid in its own context.  I think the kernel should
export all names equally without prejudice.

NeilBrown

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16  0:56                                       ` Jonathan Lundell
  2001-05-16  2:31                                         ` Andrew Morton
@ 2001-05-16  6:56                                         ` Jonathan Lundell
  2001-05-16  8:02                                           ` Vojtech Pavlik
                                                             ` (2 more replies)
  1 sibling, 3 replies; 317+ messages in thread
From: Jonathan Lundell @ 2001-05-16  6:56 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux Kernel Mailing List

At 12:31 PM +1000 2001-05-16, Andrew Morton wrote:
>> When I ifconfig one of a collection of interfaces, I'm very much
>> talking about the specific physical interface connected via a
>> specific physical cable to a specific physical switch port.
>>
>
>Yes, it can be a security trap as well - physically move a card and
>your firewall rules end up being applied to the wrong connection.
>
>The 2.4 kernel allows you to rename an interface.  So you can build
>a little database of (MAC address/name) pairs. Apply this after booting
>and before bringing up the interfaces and everything has the name
>you wanted, based on MAC address.
>
>Andi Kleen has an app which does this:
>
>	ftp://ftp.firstfloor.org/pub/ak/smallsrc/nameif.c
>
>but apparently some additional kernel work is needed to make
>this work 100% correctly.  I do not know what the specific
>problem is.

There's a bit of a catch 22, though, if you don't have unique MAC 
addresses in the system (across multiple interfaces). It's common 
practice in the SPARC world (Solaris, anyway) for all the interfaces 
to default to a single system-wide MAC address. The fact that MAC 
addresses are at least semi-volatile is also bothersome.

It's also true that some buses simply don't yield up physical 
locations (ISA springs to mind, and I gather that FC is squishy that 
way), but it's desirable to be able to make the connection all ways 
(eth# <-> bus location <-> physical location <-> MAC address) in a 
uniform manner. (Where MAC address might be something else in a 
non-Ethernet domain.)
-- 
/Jonathan Lundell.

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:43                                         ` Johannes Erdfelt
  2001-05-15 21:49                                           ` James Simmons
@ 2001-05-16  7:05                                           ` Kai Henningsen
  1 sibling, 0 replies; 317+ messages in thread
From: Kai Henningsen @ 2001-05-16  7:05 UTC (permalink / raw)
  To: linux-kernel

jsimmons@transvirtual.com (James Simmons)  wrote on 15.05.01 in <Pine.LNX.4.10.10105151448150.22038-100000@www.transvirtual.com>:

> > > I couldn't agree with you more. It gives me headaches at work. One note,
> > > there is an exception to the eth0 thing. USB to USB networking. It uses
> > > usb0, etc. I personally wish they used eth0.
> >
> > USB to USB networking cables aren't ethernet.
>
> I'm talking about a wireless connection. ipaq USB cradle to PC.

I don't know about USB, but I do know about PPP.

The point is, Ethernet is *different* from PPP. The frame formats are  
different, even the protocols (aside from IP) are different.

It's similar to the difference between serial and parallel ports. Sure,  
for some things, they're the same - but for others, they really aren't,  
and that's why it makes sense to call the one ttyS0 and the other lp0.

Similar for eth0 vs. ppp0.

Yes, both are network interfaces. But no, you don't do ARP on ppp0, for  
example (you do LCP instead, and it does different stuff, too).

MfG Kai

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:28                                       ` James Simmons
                                                           ` (2 preceding siblings ...)
  2001-05-15 22:07                                         ` Alan Cox
@ 2001-05-16  7:11                                         ` Kai Henningsen
  2001-05-16  7:43                                           ` Alexander Viro
  3 siblings, 1 reply; 317+ messages in thread
From: Kai Henningsen @ 2001-05-16  7:11 UTC (permalink / raw)
  To: linux-kernel

hpa@transmeta.com (H. Peter Anvin)  wrote on 15.05.01 in <3B01A044.F72BFDD1@transmeta.com>:

> Personally, I would also like to see network devices manifest in the
> filesystem namespace like everything else.

Yes.

Can we have a meta-rule?

*Every* by-name kernel interface should have a filesystem variant.

That is, if there's a kernel interface where you give the kernel a string  
to identify an in-kernel object, there should be some place in the file  
system (after mounting any prerequisites) that will respond to a path  
ending in that name.

That doesn't necessarily mean the parent will be a readable directory -  
that would, of course, be preferable, but if enumerating all objects is a  
problem, then dropping this requirement is much preferable to not having  
a pathname.

MfG Kai

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:20                                     ` Nicolas Pitre
  2001-05-15 21:28                                       ` James Simmons
  2001-05-16  0:59                                       ` Daniel Phillips
@ 2001-05-16  7:17                                       ` Kai Henningsen
  2 siblings, 0 replies; 317+ messages in thread
From: Kai Henningsen @ 2001-05-16  7:17 UTC (permalink / raw)
  To: linux-kernel

phillips@bonn-fries.net (Daniel Phillips)  wrote on 16.05.01 in <01051602593001.00406@starship>:

> On Tuesday 15 May 2001 23:20, Nicolas Pitre wrote:
> > Personally, I'd really like to see /dev/ttyS0 be the first detected
> > serial port on a system, /dev/ttyS1 the second, etc.
>
> There are well-defined rules for the first four on PC's.  The ttySx
> better match the labels the OEM put on the box.

Sorry, but that turns out not to be the case.

There are some rules for devices called COMx (x=1..4, ports much more than  
interrupts[1]), but I haven't ever seen a box that had a "ttySx" label  
from the manufacturer. And you can easily renumber those; the BIOS uses a  
4-entry lookup table for the ports.

[1] COM1=3F8, COM2=2F8, COM3=3E8, COM4=2E8
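
A rough sketch of the 4-entry lookup table idea, for illustration only. The
array and function names are made up here, and the values are just the
conventional PC defaults from the footnote, not something read out of a real
BIOS:

```c
/* Conventional PC I/O base addresses for COM1..COM4 (assumed defaults;
 * a real BIOS keeps its own table, and the entries can be reordered). */
const unsigned short com_base[4] = { 0x3F8, 0x2F8, 0x3E8, 0x2E8 };

/* Hypothetical helper: return the I/O base for COMn (n = 1..4),
 * or 0 if n is out of range. */
unsigned short com_port_base(int n)
{
    if (n < 1 || n > 4)
        return 0;
    return com_base[n - 1];
}
```

Renumbering the ports then amounts to permuting the four table entries, which
is why the COMx numbers say little about what is printed on the box.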

MfG Kai

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:41                         ` Richard Gooch
                                             ` (3 preceding siblings ...)
  2001-05-15 22:28                           ` Richard Gooch
@ 2001-05-16  7:21                           ` Geert Uytterhoeven
  2001-05-16 18:22                           ` Richard Gooch
  5 siblings, 0 replies; 317+ messages in thread
From: Geert Uytterhoeven @ 2001-05-16  7:21 UTC (permalink / raw)
  To: Richard Gooch
  Cc: Alan Cox, Ingo Oeser, Linus Torvalds, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List, viro

On Tue, 15 May 2001, Richard Gooch wrote:
> Alan Cox writes:
> > > 	len = readlink ("/proc/self/3", buffer, buflen);
> > > 	if (strcmp (buffer + len - 2, "cd") != 0) {
> > > 		fprintf (stderr, "Not a CD-ROM! Bugger off.\n");
> > > 		exit (1);
> > 
> > And on my box cd is the cabbage dicer whoops
> 
> Actually, no, because it's guaranteed that a trailing "/cd" is a
> CD-ROM. That's the standard.

Then check for `/cd' at the end instead of `cd' :-)
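
Something along these lines, as a minimal sketch. The function name is
invented for the example, and it assumes the caller passes the length that
readlink(2) returned, since readlink does not NUL-terminate the buffer:

```c
#include <string.h>
#include <sys/types.h>

/* Return 1 if the first len bytes of target end in "/cd", else 0.
 * len is the value returned by readlink(2); because the buffer is not
 * NUL-terminated, compare with memcmp rather than strcmp. */
int target_is_cdrom(const char *target, ssize_t len)
{
    static const char suffix[] = "/cd";
    const size_t slen = sizeof(suffix) - 1;

    /* Covers a failed readlink (-1) and targets shorter than the suffix. */
    if (len < 0 || (size_t)len < slen)
        return 0;
    return memcmp(target + len - slen, suffix, slen) == 0;
}
```

With the leading slash in the suffix, a target that merely ends in the
letters "cd" no longer matches.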

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:18                                     ` Linus Torvalds
                                                         ` (4 preceding siblings ...)
  2001-05-16  0:56                                       ` Jonathan Lundell
@ 2001-05-16  7:24                                       ` Geert Uytterhoeven
  2001-05-16 23:26                                         ` Alan Cox
  2001-05-16 16:04                                       ` Michael Meissner
  6 siblings, 1 reply; 317+ messages in thread
From: Geert Uytterhoeven @ 2001-05-16  7:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jonathan Lundell, Jeff Garzik, James Simmons, Alan Cox,
	Neil Brown, H. Peter Anvin, Linux Kernel Mailing List, viro

On Tue, 15 May 2001, Linus Torvalds wrote:
> On Tue, 15 May 2001, Jonathan Lundell wrote:
> > 2 (disk domain). I have multiple spindles on multiple SCSI adapters. 
> 
> So? Same deal. You don't have eth0..N, you have disk0..N. 
> 
> What's the problem? It's _repeatable_, in that as long as you don't change
> your disks, they'll show up the same way. But the 0..N doesn't imply that
> the disks are anywhere special.

Are FireWire (and USB) disks always detected in the same order? Or does it
behave like ADB, where you never know which mouse/keyboard is which
mouse/keyboard?

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 18:04                               ` Jeff Garzik
  2001-05-15 18:15                                 ` Linus Torvalds
  2001-05-15 19:33                                 ` Kai Henningsen
@ 2001-05-16  7:25                                 ` Geert Uytterhoeven
  2 siblings, 0 replies; 317+ messages in thread
From: Geert Uytterhoeven @ 2001-05-16  7:25 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Linus Torvalds, James Simmons, Alan Cox, Neil Brown,
	H. Peter Anvin, Linux Kernel Mailing List, viro

On Tue, 15 May 2001, Jeff Garzik wrote:
> Linus Torvalds wrote:
> > Now, if we just fundamentally try to think about any device as being
> > hot-pluggable, you realize that things like "which PCI slot is this device
> > in" are completely _worthless_ as device identification, because they
> > fundamentally take the wrong approach, and they don't fit the generic
> > approach at all.
> 
> Should I interpret this as you disagreeing with
> exporting-bus-info-to-userspace type additions?  ie. some random
> get-info ioctl spits out pci_dev->slot_name to userspace.
> 
> I believe there are rare cases where this is useful.  When one already
> has the /dev node (via an open fd used for ioctl, usually), additionally
> you need the bus info to make an association between an active device on
> the hardware bus, and an active driver in the kernel.  X could use this
> info to figure out which fbdev devices to avoid.  SCSI is already using
> similar info, as of 2.4.4, as are net devs.  Userspace apps that diddle
> hardware are a definite minority case, but for that case the PCI slot
> info is useful.

X can already look at the fb_fix_screeninfo.{smem,mmio}_start fields and match
that with pci_dev.resource[*].

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16  7:11                                         ` Kai Henningsen
@ 2001-05-16  7:43                                           ` Alexander Viro
  2001-05-16  9:45                                             ` Malcolm Beattie
  0 siblings, 1 reply; 317+ messages in thread
From: Alexander Viro @ 2001-05-16  7:43 UTC (permalink / raw)
  To: Kai Henningsen; +Cc: linux-kernel



On 16 May 2001, Kai Henningsen wrote:

> hpa@transmeta.com (H. Peter Anvin)  wrote on 15.05.01 in <3B01A044.F72BFDD1@transmeta.com>:
> 
> > Personally, I would also like to see network devices manifest in the
> > filesystem namespace like everything else.
> 
> Yes.
> 
> Can we have a meta-rule?
> 
> *Every* by-name kernel interface should have a filesystem variant.
> 
> That is, if there's a kernel interface where you give the kernel a string  
> to identify an in-kernel object, there should be some place in the file  
> system (after mounting any prerequisites) that will respond to a path  
> ending in that name.

You'll get in trouble with that in exactly one case: filesystem types.
No, it would make a lot of sense to have them as fs objects. For one
thing, we could turn mount(2) into
	open appropriate fs type
	convince the sucker that you are allowed, tell which device you want,
etc.
	open mountpoint
	mount(fs_fd, dir_fd)
Would work like charm, especially since we could fit the network filesystems
into the same scheme and get rid of the kludges a-la ncpfs mount sequence.

There's only one sore spot: how'd you mount _that_ fs? ;-)


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16  6:56                                         ` Jonathan Lundell
@ 2001-05-16  8:02                                           ` Vojtech Pavlik
  2001-05-16 12:20                                           ` Bogdan Costescu
  2001-05-16 14:37                                           ` Jonathan Lundell
  2 siblings, 0 replies; 317+ messages in thread
From: Vojtech Pavlik @ 2001-05-16  8:02 UTC (permalink / raw)
  To: Jonathan Lundell; +Cc: Andrew Morton, Linux Kernel Mailing List

On Tue, May 15, 2001 at 11:56:41PM -0700, Jonathan Lundell wrote:
> At 12:31 PM +1000 2001-05-16, Andrew Morton wrote:
> >  > When I ifconfig one of a collection of interfaces, I'm very much
> >>  talking about the specific physical interface connected via a
> >  > specific physical cable to a specific physical switch port.
> >>
> >
> >Yes, it can be a security trap as well - physically move a card and
> >your firewall rules end up being applied to the wrong connection.
> >
> >The 2.4 kernel allows you to rename an interface.  So you can build
> >a little database of (MAC address/name) pairs. Apply this after booting
> >and before bringing up the interfaces and everything has the name
> >you wanted, based on MAC address.
> >
> >Andi Kleen has an app which does this:
> >
> >	ftp://ftp.firstfloor.org/pub/ak/smallsrc/nameif.c
> >
> >but apparently some additional kernel work is needed to make
> >this work 100% correctly.  I do not know what the specific
> >problem is.
> 
> There's a bit of a catch 22, though, if you don't have unique MAC 
> addresses in the system (across multiple interfaces). It's common 
> practice in the SPARC world (Solaris, anyway) for all the interfaces 
> to default to a single system-wide MAC address. The fact that MAC 
> addresses are at least semi-volatile is also bothersome.
> 
> It's also  true that some buses simply don't yield up physical 
> locations (ISA springs to mind,

ISA is quite fine, you can use the i/o space as physical locations.

> and I gather that FC is squishy that 
> way), but it's desirable to be able to make the connection all ways 
> (eth# <-> bus location <-> physical location <-> MAC address) in a 
> uniform manner. (Where MAC address might be something else in a 
> non-Ethernet domain.)

Yes.

-- 
Vojtech Pavlik
SuSE Labs

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 17:44                               ` James Simmons
  2001-05-15 18:18                                 ` Ingo Oeser
  2001-05-15 18:42                                 ` Alexander Viro
@ 2001-05-16  8:29                                 ` Helge Hafting
  2001-05-16 17:16                                   ` James Simmons
  2 siblings, 1 reply; 317+ messages in thread
From: Helge Hafting @ 2001-05-16  8:29 UTC (permalink / raw)
  To: James Simmons, linux-kernel

James Simmons wrote:
> 
> > > I would use write except we use write to draw into the framebuffer. If I
> > > write to the framebuffer with that data the only thing that will happen is
> > > I will get pretty colors on my screen.
> >
> > Yes. And we also use write to send data to printer. So what? Nobody makes
> > you use the same file.
> 
> Well creating a new device wouldn't make linus happen right now. I do
> agree ioctl calls are evil!!!! You only have X amount of them. With write
> you can have infinte amounts of different functions to perform on a
> device. I didn't design fbdev :-( If I did it would have been far
> different. I do plan on some day merging drm and fbdev into one interface. So
> I plan to change this behavior. I like to see this interface ioctl-less
> (is their such a word ???). You mmap to alter buffers. Mmap is much more
> flexiable than write for graphics buffers anyways. You use write to pass
> "data" to the driver.

mmap is fine for a fb, but please don't remove read/write.
I can now do a screendump with "cat /dev/fb/0 > file", 
because everything is a file.
Having 
/dev/fb/0/brightness
/dev/fb/0/opengl
and so on seems to be a better approach.

Helge Hafting

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 19:43                                   ` Johannes Erdfelt
  2001-05-15 21:58                                     ` Chip Salzenberg
@ 2001-05-16  8:51                                     ` Helge Hafting
  2001-05-17 10:20                                     ` Pavel Machek
  2001-05-19  8:18                                     ` Kai Henningsen
  3 siblings, 0 replies; 317+ messages in thread
From: Helge Hafting @ 2001-05-16  8:51 UTC (permalink / raw)
  To: Johannes Erdfelt; +Cc: linux-kernel

Johannes Erdfelt wrote:

> I had always made the assumption that sockets were created because you
> couldn't easily map IPv4 semantics onto filesystems. It's unreasonable
> to have a file for every possible IP address/port you can communicate
> with.

You could have open("/ipv4/127.0.0.1/80") without having pre-allocated
files and directories.  The "ipfs" driver would simply
accept any valid address without looking it up in any directory
structure.
Preallocation problems are no argument against a fs; in this
case tradition is.
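
To illustrate "accept any valid address without looking it up", here is a
small userspace sketch. The "ipfs" driver does not exist; the function name
and the exact path layout "/ipv4/<address>/<port>" are assumptions taken from
the example path above:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/* Parse a path of the form "/ipv4/<dotted-quad>/<port>" into *sa.
 * Returns 0 on success, -1 if the path is not a valid address.
 * No directory structure is consulted; validity is purely syntactic. */
int ipfs_parse_path(const char *path, struct sockaddr_in *sa)
{
    char addr[INET_ADDRSTRLEN];
    unsigned int port;
    int consumed = 0;

    /* %15[0-9.] limits the address to what fits in addr[];
     * %n records how many characters of the path were consumed. */
    if (sscanf(path, "/ipv4/%15[0-9.]/%5u%n", addr, &port, &consumed) != 2)
        return -1;
    /* Reject trailing junk and ports that do not fit in 16 bits. */
    if (path[consumed] != '\0' || port > 65535)
        return -1;

    memset(sa, 0, sizeof(*sa));
    sa->sin_family = AF_INET;
    sa->sin_port = htons((unsigned short)port);
    /* inet_pton does the real validation of the dotted quad. */
    if (inet_pton(AF_INET, addr, &sa->sin_addr) != 1)
        return -1;
    return 0;
}
```

The resulting sockaddr_in is what an ordinary connect(2) would take, so such
a lookup would need no pre-created entries at all.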

Helge Hafting

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15  0:33               ` Rik van Riel
@ 2001-05-16  9:04                 ` Ingo Oeser
  0 siblings, 0 replies; 317+ messages in thread
From: Ingo Oeser @ 2001-05-16  9:04 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Alan Cox, Linux Kernel Mailing List

On Mon, May 14, 2001 at 09:33:35PM -0300, Rik van Riel wrote:
> Agreed. However, if this thing means I cannot use the -linus
> tree without devfs, then it will also mean my VM stuff only
> gets tested on -ac kernels...

No Problem. I test most of your VM stuff anyway and I use devfs
on that machine ;-)

PS: It's not that hard to build a machine, which can support
both. E-Mail me, if you would like to know _how_ to do that.

Regards

Ingo Oeser
-- 
10.+11.03.2001 - 3. Chemnitzer LinuxTag <http://www.tu-chemnitz.de/linux/tag>
         <<<<<<<<<<<<     been there and had much fun   >>>>>>>>>>>>

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16  7:43                                           ` Alexander Viro
@ 2001-05-16  9:45                                             ` Malcolm Beattie
  0 siblings, 0 replies; 317+ messages in thread
From: Malcolm Beattie @ 2001-05-16  9:45 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Kai Henningsen, linux-kernel

Alexander Viro writes:
> thing, we could turn mount(2) into
> 	open appropriate fs type
> 	convince the sucker that you are allowed, tell which device you want,
> etc.
> 	open mountpoint
> 	mount(fs_fd, dir_fd)
> Would work like charm, especially since we could fit the network filesystems
> into the same scheme and get rid of the kludges a-la ncpfs mount sequence.
> 
> There's only one sore spot: how'd you mount _that_ fs? ;-)

Start up init with fs_fd on file descriptor 3 and init can put it
where it likes.

--Malcolm

-- 
Malcolm Beattie <mbeattie@sable.ox.ac.uk>
Unix Systems Programmer
Oxford University Computing Services

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16  0:59                                       ` Daniel Phillips
  2001-05-16  1:34                                         ` Nicolas Pitre
@ 2001-05-16 11:34                                         ` Erik Mouw
  2001-05-17 17:07                                         ` Eric W. Biederman
  2 siblings, 0 replies; 317+ messages in thread
From: Erik Mouw @ 2001-05-16 11:34 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Nicolas Pitre, Linux Kernel Mailing List

On Wed, May 16, 2001 at 02:59:30AM +0200, Daniel Phillips wrote:
> On Tuesday 15 May 2001 23:20, Nicolas Pitre wrote:
> > Personally, I'd really like to see /dev/ttyS0 be the first detected
> > serial port on a system, /dev/ttyS1 the second, etc.
> 
> There are well-defined rules for the first four on PC's.  The ttySx 
> better match the labels the OEM put on the box.

Nico's point is that there is a lot of linux beyond PCs. My LART[1] has
three serial ports which I didn't label at all. The official SA1100
serial driver has /dev/ttySA[0-2] allocated. Other ARM systems use
/dev/ttyS0. Guess what happens when you want to install debian-arm on
an SA1100 system. A serial device registry like we have for the sound
cards would be most welcome.


Erik

[1] StrongARM SA1100 embedded board, http://www.lart.tudelft.nl/

-- 
J.A.K. (Erik) Mouw, Information and Communication Theory Group, Department
of Electrical Engineering, Faculty of Information Technology and Systems,
Delft University of Technology, PO BOX 5031,  2600 GA Delft, The Netherlands
Phone: +31-15-2783635  Fax: +31-15-2781843  Email: J.A.K.Mouw@its.tudelft.nl
WWW: http://www-ict.its.tudelft.nl/~erik/

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16  6:56                                         ` Jonathan Lundell
  2001-05-16  8:02                                           ` Vojtech Pavlik
@ 2001-05-16 12:20                                           ` Bogdan Costescu
  2001-05-16 14:37                                           ` Jonathan Lundell
  2 siblings, 0 replies; 317+ messages in thread
From: Bogdan Costescu @ 2001-05-16 12:20 UTC (permalink / raw)
  To: Jonathan Lundell; +Cc: Andrew Morton, Linux Kernel Mailing List

On Tue, 15 May 2001, Jonathan Lundell wrote:

> >The 2.4 kernel allows you to rename an interface.  So you can build
> >a little database of (MAC address/name) pairs. Apply this after booting
> >and before bringing up the interfaces and everything has the name
> >you wanted, based on MAC address.
>
> There's a bit of a catch 22, though, if you don't have unique MAC
> addresses in the system (across multiple interfaces).

The same situation appears when using bonding.o. For several years,
Don Becker's (and derived) network drivers support changing MAC address
when the interface is down. So Al's /dev/eth/<n>/MAC has different values
depending on whether bonding is active or not. Should /dev/eth/<n>/MAC
always have the original value (to be able to uniquely identify this card)
or the in-use value (used by ARP, I believe) ? Or maybe have a
/dev/eth/<n>/MAC_in_use ?
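
One way to picture keeping both values, as a sketch only. The struct and
function names are hypothetical and not taken from any driver; the point is
just that the name database is keyed by the original address while the
in-use address is stored next to it:

```c
#include <string.h>

#define MAC_LEN 6

/* Hypothetical record, one per interface. */
struct nic_ident {
    const char   *name;               /* desired interface name */
    unsigned char perm_mac[MAC_LEN];  /* original address of the card */
    unsigned char cur_mac[MAC_LEN];   /* address currently in use */
};

/* Find the entry whose original address matches mac, or NULL if none.
 * The in-use address is deliberately ignored, since it may be shared. */
const struct nic_ident *nic_by_perm_mac(const struct nic_ident *tab,
                                        size_t n,
                                        const unsigned char mac[MAC_LEN])
{
    for (size_t i = 0; i < n; i++)
        if (memcmp(tab[i].perm_mac, mac, MAC_LEN) == 0)
            return &tab[i];
    return NULL;
}
```

With bonding active, several entries would carry the same cur_mac, but the
lookup by perm_mac still identifies each card uniquely.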

Sincerely,

Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De



^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16  1:00                   ` Daniel Phillips
@ 2001-05-16 12:58                     ` Jens Axboe
  0 siblings, 0 replies; 317+ messages in thread
From: Jens Axboe @ 2001-05-16 12:58 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Linus Torvalds, Neil Brown, Jeff Garzik, Alan Cox,
	H. Peter Anvin, Linux Kernel Mailing List, viro

On Wed, May 16 2001, Daniel Phillips wrote:
> On Tuesday 15 May 2001 17:34, Linus Torvalds wrote:
> > On Tue, 15 May 2001, Neil Brown wrote:
> > > Ofcourse setting the "queue" function that __blk_get_queue call to
> > > do a lookup of the minor and choose an appropriate queue for the
> > > "real" device wont work as you need to munge bh->b_rdev too.
> >
> > What I would do is:
> >  - remove b_rdev completely.
> 
> :-) And b_rsector too?

Way ahead of you, it's gone :-)

Neither of these are part of the buffer_head as a caching entity, they
belong purely in the I/O path. I'll show code in a day or two.

> > [...]
> 
> >  - replace is with b_index
> >
> > Then, the "get_queue" functions basically end up doing the mapping of
> >
> > 	b_dev -> <queue,b_index>
> 
> To clarify, will be b_index be in the buffer_head or not?

It should not.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16  6:56                                         ` Jonathan Lundell
  2001-05-16  8:02                                           ` Vojtech Pavlik
  2001-05-16 12:20                                           ` Bogdan Costescu
@ 2001-05-16 14:37                                           ` Jonathan Lundell
  2001-05-16 14:57                                             ` Vojtech Pavlik
  2001-05-16 15:24                                             ` Jonathan Lundell
  2 siblings, 2 replies; 317+ messages in thread
From: Jonathan Lundell @ 2001-05-16 14:37 UTC (permalink / raw)
  To: Vojtech Pavlik; +Cc: Linux Kernel Mailing List

At 10:02 AM +0200 2001-05-16, Vojtech Pavlik wrote:
>  > It's also  true that some buses simply don't yield up physical
>>  locations (ISA springs to mind,
>
>ISA is quite fine, you can use the i/o space as physical locations.

I meant physical not as in physical-vs-virtual addresses (all ISA 
addresses, memory or IO, are physical in this sense, by the time they 
get to the bus). Rather, I meant that you can't determine which slot 
a given device is plugged into. If you have two NICs in two ISA 
slots, there's no way to distinguish between the slots. In practice, 
you'd have to experiment or remove a card and check the jumpering or 
some such.
-- 
/Jonathan Lundell.

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 14:37                                           ` Jonathan Lundell
@ 2001-05-16 14:57                                             ` Vojtech Pavlik
  2001-05-16 15:24                                             ` Jonathan Lundell
  1 sibling, 0 replies; 317+ messages in thread
From: Vojtech Pavlik @ 2001-05-16 14:57 UTC (permalink / raw)
  To: Jonathan Lundell; +Cc: Linux Kernel Mailing List

On Wed, May 16, 2001 at 07:37:45AM -0700, Jonathan Lundell wrote:
> At 10:02 AM +0200 2001-05-16, Vojtech Pavlik wrote:
> >  > It's also  true that some buses simply don't yield up physical
> >>  locations (ISA springs to mind,
> >
> >ISA is quite fine, you can use the i/o space as physical locations.
> 
> I meant physical not as in physical-vs-virtual addresses (all ISA 
> addresses, memory or IO, are physical in this sense, by the time they 
> get to the bus). Rather, I meant that you can't determine which slot 
> a given device is plugged into. If you have two NICs in two ISA 
> slots, there's no way to distinguish between the slots. In practice, 
> you'd have to experiment or remove a card and check the jumpering or 
> some such.

Yes. But I meant that while this indeed is not possible, still the i/o
port address can be used instead of the slot number, because it at least
is physically jumpered and must be unique.

-- 
Vojtech Pavlik
SuSE Labs

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 14:37                                           ` Jonathan Lundell
  2001-05-16 14:57                                             ` Vojtech Pavlik
@ 2001-05-16 15:24                                             ` Jonathan Lundell
  1 sibling, 0 replies; 317+ messages in thread
From: Jonathan Lundell @ 2001-05-16 15:24 UTC (permalink / raw)
  To: Vojtech Pavlik; +Cc: Linux Kernel Mailing List

At 4:57 PM +0200 2001-05-16, Vojtech Pavlik wrote:
>On Wed, May 16, 2001 at 07:37:45AM -0700, Jonathan Lundell wrote:
>>  At 10:02 AM +0200 2001-05-16, Vojtech Pavlik wrote:
>>  >  > It's also  true that some buses simply don't yield up physical
>>  >>  locations (ISA springs to mind,
>>  >
>>  >ISA is quite fine, you can use the i/o space as physical locations.
>>
>>  I meant physical not as in physical-vs-virtual addresses (all ISA
>>  addresses, memory or IO, are physical in this sense, by the time they
>>  get to the bus). Rather, I meant that you can't determine which slot
>>  a given device is plugged into. If you have two NICs in two ISA
>>  slots, there's no way to distinguish between the slots. In practice,
>>  you'd have to experiment or remove a card and check the jumpering or
>>  some such.
>
>Yes. But I meant that while this indeed is not possible, still the i/o
>port address can be used instead of the slot number, because it at least
>is physically jumpered and must be unique.

Yes, I agree. And it's stable (whereas "physical" PCI addresses are 
not). Best we've got for ISA (though it's true for ISA memory 
addresses as well).

-- 
/Jonathan Lundell.

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-14 19:19 LANANA: To Pending Device Number Registrants H. Peter Anvin
                   ` (2 preceding siblings ...)
  2001-05-15 17:37 ` Pavel Machek
@ 2001-05-16 15:58 ` Kurt Garloff
  3 siblings, 0 replies; 317+ messages in thread
From: Kurt Garloff @ 2001-05-16 15:58 UTC (permalink / raw)
  To: H. Peter Anvin, Linus Torvalds, alan; +Cc: Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 4974 bytes --]

Hi HPA, Linus, Alan,

On Mon, May 14, 2001 at 12:19:34PM -0700, H. Peter Anvin wrote:
> Linus Torvalds has requested a moratorium on new device number
> assignments. His hope is that a new and better method for device space
> handing will emerge as a result.
> 
> Alan Cox has requested that I maintain a forked registry for his -ac
> kernel patch tree.  I have agreed to do so once I have forked off the
> "final" version of the registry for Linus' tree.
[...]

I've been following the discussion and would like to throw in my few cents.

First of all, I see perfectly Linus' point of disliking the manual
maintenance of device numbers. As the current solution is not elegant and
requires a lot of work by the device registrar and probably more and more
work in the future, this should be replaced by a better - automatic and
elegant - mechanism in the future. Point taken.

But I'm really surprised reading that the device registry is about to close
now. 
Well, we need the better mechanism, before doing so.

At first sight, devfs looks like the solution to this. But, on a second
sight, devfs looks like a compromise between a new system, which would 
completely abstract the details of the underlying hardware and just present
the devices from a user interface point of view, and the current system. 
We get rid of static major/minors, but the naming scheme still imposes to
know lots of details about the way your hardware is plugged together.
Furthermore, it seems, many people dislike devfs a lot. Instead of fixing a
few problems with devfs or (more likely) with devfsd, they seem to believe
the concept is fundamentally broken, otherwise their behaviour could only be
considered ridiculous.

So, we currently don't have a solution to this problem. A lot of good ideas
have been proposed in this thread, but no working code that addresses all
the issues (autoloading of modules, repeatability of naming, devices needed
early in boot stage, ...) is there or was at least designed to a point where
implementation is just a question of writing it down.
Furthermore, some userspace apps use the major no to determine the exact
type of a (otherwise similar) device to decide what functionality to offer
or how to implement certain operations.

In short: This change breaks currently working things.
This can all be fixed, of course, but I doubt it can happen in very short
time.

To be honest, I fail to believe that such a policy change is happening now,
in the middle of a stable kernel series, and I fail to believe that this is
really what Linus suggested. 
IMHO, this policy change affects the kernel more than a major rework of VM,
to just give an example. It's very definitely not stuff for 2.4.

As far as I know, HPA has not complained about his workload for the manual
maintenance of the LANANA. Currently, it still seems doable, and HPA is
doing it, fortunately. Thanks! I've not seen anybody complain about the
way he does it either.

So, I would be very pleased if we could agree, that it's acceptable to go on
with the old manual device registry of HPA for the rest of the 2.4 kernel
series. Let's start the new policy with 2.5.1!

And please no fork! Every fork splits the community in two parts and
basically halves our power. I don't think this would help Linux to be
accepted by more people or to be able to be used to solve more problems 
in a nice and efficient way.

I think I do very well understand one of the reasons, why Alan wants to stay
with the old scheme. It (still) works. 
How would you imagine to make a stable Linux distribution, if such a
change happens now? I do not think that all apps can be fixed and a stable,
reproducible and reliable mechanism to manage device nodes can be introduced
within the time frame of 2.4 kernels. 
Even switching a distro to use devfs only looks easier than this.
So, we would need to keep to the old mechanism, if we don't want to risk
stability and manageability of a distro. Which is paramount ...
I would not like to be forced to use -ac kernels.
(Alan, don't get this wrong! It's great that you prepare kernels to test
 somewhat more experimental features or drivers, but I would like to have
 the choice.)

If we start the new device management with 2.5.1, there's plenty of time to
have the mechanism and the kernel stabilize, to get issues like automatic
module loading sorted and to adapt distributions before 2.6/3.0 comes out.
That's fine with me.

Well, maybe you never considered applying this policy change right now
within 2.4, Linus. Well, ignore my posting then, if you like.

Best regards,
-- 
Kurt Garloff                   <kurt@garloff.de>         [Eindhoven, NL]
Physics: Plasma simulations  <K.Garloff@Phys.TUE.NL>  [TU Eindhoven, NL]
Linux: SCSI, Security          <garloff@suse.de>   [SuSE Nuernberg, FRG]
 (See mail header or public key servers for PGP2 and GPG public keys.)

[-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --]

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:18                                     ` Linus Torvalds
                                                         ` (5 preceding siblings ...)
  2001-05-16  7:24                                       ` Geert Uytterhoeven
@ 2001-05-16 16:04                                       ` Michael Meissner
  2001-05-16 21:36                                         ` Andreas Dilger
  6 siblings, 1 reply; 317+ messages in thread
From: Michael Meissner @ 2001-05-16 16:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jonathan Lundell, Jeff Garzik, James Simmons, Alan Cox,
	Neil Brown, H. Peter Anvin, Linux Kernel Mailing List, viro

On Tue, May 15, 2001 at 01:18:09PM -0700, Linus Torvalds wrote:
> 
> On Tue, 15 May 2001, Jonathan Lundell wrote:
> > >
> > >Keep it informational. And NEVER EVER make it part of the design.
> > 
> > What about:
> > 
> > 1 (network domain). I have two network interfaces that I connect to 
> > two different network segments, eth0 & eth1;
> 
> So?
> 
> Informational. You can always ask what "eth0" and "eth1" are.
> 
> There's another side to this: repeatability. A setup should be
> _repeatable_.
> 
> This is what we have now. Network devices are called "eth0..N", and nobody
> is complaining about the fact that the numbering is basically random. It
> is _repeatable_ as long as you don't change your hardware setup, and the
> numbering has effectively _nothing_ to do with "location".

Well, yes and no.  The numbers are currently repeatable for a given kernel, but
I know I and others were bitten by the 2.2 to 2.4 transition, where the kernel
used a different algorithm for the order in which it detected SCSI and network
adapters (e.g., in my machine with 3 SCSI adapters, Linux 2.2 always picked the
Adaptec SCSI adapter built into my motherboard as the first adapter, but 2.4
decided to pick my TekRam 390F adapter).

As lots of people have been saying, you need to know which physical slot to
plug the wire connecting eth0, eth1, etc. into.  Similarly for serial ports, if
I have 3 or 4 (or 127 :-) USB serial devices, I really don't want to have to
change my cabling each time I boot or change OSes (since I doubt my UPS will be
happy if I give it the commands destined for the X10 controller or my remote
boards).

-- 
Michael Meissner, Red Hat, Inc.  (GCC group)
PMB 198, 174 Littleton Road #3, Westford, Massachusetts 01886, USA
Work:	  meissner@redhat.com		phone: +1 978-486-9304
Non-work: meissner@spectacle-pond.org	fax:   +1 978-692-4482

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16  8:29                                 ` Helge Hafting
@ 2001-05-16 17:16                                   ` James Simmons
  0 siblings, 0 replies; 317+ messages in thread
From: James Simmons @ 2001-05-16 17:16 UTC (permalink / raw)
  To: Helge Hafting; +Cc: linux-kernel


> mmap is fine for a fb, but please don't remove read/write.
> I can now do a screendump with "cat /dev/fb/0 > file", 
> because everything is a file.
> Having 
> /dev/fb/0/brightness
> /dev/fb/0/opengl
> and so on seems to be a better approach.

One thing I would like is for the file system to have a different name, so
that apps can move over to it. You will still be able to do the above. I plan
to have, for each framebuffer,

/dev/gfx/frameX. 

So, say your card can do double buffering; then you could do

cat /dev/gfx/frame0 > file1
cat /dev/gfx/frame1 > file2

So you could access the double buffer as well :-)


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 21:41                         ` Richard Gooch
                                             ` (4 preceding siblings ...)
  2001-05-16  7:21                           ` Geert Uytterhoeven
@ 2001-05-16 18:22                           ` Richard Gooch
  2001-05-16 19:36                             ` H. Peter Anvin
                                               ` (3 more replies)
  5 siblings, 4 replies; 317+ messages in thread
From: Richard Gooch @ 2001-05-16 18:22 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Alan Cox, Ingo Oeser, Linus Torvalds, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List, viro

Geert Uytterhoeven writes:
> On Tue, 15 May 2001, Richard Gooch wrote:
> > Alan Cox writes:
> > > > 	len = readlink ("/proc/self/3", buffer, buflen);
> > > > 	if (strcmp (buffer + len - 2, "cd") != 0) {
> > > > 		fprintf (stderr, "Not a CD-ROM! Bugger off.\n");
> > > > 		exit (1);
> > > 
> > > And on my box cd is the cabbage dicer whoops
> > 
> > Actually, no, because it's guaranteed that a trailing "/cd" is a
> > CD-ROM. That's the standard.
> 
> Then  check for `/cd' at the end instead of `cd' :-)

Argh! What I wrote in text is what I meant to say. The code didn't
match. No wonder people seemed to be missing the point. So the line of
code I actually meant was:
	if (strcmp (buffer + len - 3, "/cd") != 0) {
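
A fuller sketch of the check described above, for anyone who wants to try
it. Assumptions, none of which come from the original code: the helper names
(has_suffix, fd_to_name, fd_is_cdrom_by_name) are made up for illustration,
the link is read from /proc/self/fd/<fd>, and the buffer is NUL-terminated by
hand because readlink() does not do that itself. It only tests the naming
convention; it says nothing about whether relying on names is a good idea.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Return 1 if `path` ends with `suffix`, else 0. */
static int has_suffix(const char *path, const char *suffix)
{
    size_t plen = strlen(path);
    size_t slen = strlen(suffix);

    if (plen < slen)
        return 0;               /* too short to contain the suffix */
    return strcmp(path + plen - slen, suffix) == 0;
}

/*
 * Resolve the name behind an open descriptor through /proc/self/fd/<fd>.
 * Returns 0 on success, -1 if the link cannot be read.
 */
static int fd_to_name(int fd, char *buf, size_t buflen)
{
    char link[64];
    ssize_t len;

    if (buflen == 0)
        return -1;
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
    len = readlink(link, buf, buflen - 1);
    if (len < 0)
        return -1;
    buf[len] = '\0';            /* readlink() does not NUL-terminate */
    return 0;
}

/*
 * Hypothetical helper: 1 if the descriptor's name ends in "/cd",
 * 0 if it does not, -1 if the name could not be determined.
 */
static int fd_is_cdrom_by_name(int fd)
{
    char name[4096];

    if (fd_to_name(fd, name, sizeof name) != 0)
        return -1;
    return has_suffix(name, "/cd");
}
```

Note the three-way result: a caller has to decide what to do when /proc is
not mounted or the descriptor has no usable name, since -1 is neither "yes"
nor "no".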

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 18:22                           ` Richard Gooch
@ 2001-05-16 19:36                             ` H. Peter Anvin
  2001-05-16 20:01                             ` Richard Gooch
                                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-16 19:36 UTC (permalink / raw)
  To: Richard Gooch
  Cc: Geert Uytterhoeven, Alan Cox, Ingo Oeser, Linus Torvalds,
	Neil Brown, Jeff Garzik, Linux Kernel Mailing List, viro

Richard Gooch wrote:
> 
> Geert Uytterhoeven writes:
> > On Tue, 15 May 2001, Richard Gooch wrote:
> > > Alan Cox writes:
> > > > >         len = readlink ("/proc/self/3", buffer, buflen);
> > > > >         if (strcmp (buffer + len - 2, "cd") != 0) {
> > > > >                 fprintf (stderr, "Not a CD-ROM! Bugger off.\n");
> > > > >                 exit (1);
> > > >
> > > > And on my box cd is the cabbage dicer whoops
> > >
> > > Actually, no, because it's guaranteed that a trailing "/cd" is a
> > > CD-ROM. That's the standard.
> >
> > Then  check for `/cd' at the end instead of `cd' :-)
> 
> Argh! What I wrote in text is what I meant to say. The code didn't
> match. No wonder people seemed to be missing the point. So the line of
> code I actually meant was:
>         if (strcmp (buffer + len - 3, "/cd") != 0) {
> 

This is still a really bad idea.  You don't want to tie this kind of
thing to the name.

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 18:22                           ` Richard Gooch
  2001-05-16 19:36                             ` H. Peter Anvin
@ 2001-05-16 20:01                             ` Richard Gooch
  2001-05-16 20:05                               ` H. Peter Anvin
                                                 ` (4 more replies)
  2001-05-16 23:51                             ` Alan Cox
  2001-05-16 23:58                             ` Richard Gooch
  3 siblings, 5 replies; 317+ messages in thread
From: Richard Gooch @ 2001-05-16 20:01 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Geert Uytterhoeven, Alan Cox, Ingo Oeser, Linus Torvalds,
	Neil Brown, Jeff Garzik, Linux Kernel Mailing List, viro

H. Peter Anvin writes:
> Richard Gooch wrote:
> > Argh! What I wrote in text is what I meant to say. The code didn't
> > match. No wonder people seemed to be missing the point. So the line of
> > code I actually meant was:
> >         if (strcmp (buffer + len - 3, "/cd") != 0) {
> 
> This is still a really bad idea.  You don't want to tie this kind of
> things to the name.

Why do you think it's a bad idea?

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 20:01                             ` Richard Gooch
@ 2001-05-16 20:05                               ` H. Peter Anvin
  2001-05-16 20:18                               ` Linus Torvalds
                                                 ` (3 subsequent siblings)
  4 siblings, 0 replies; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-16 20:05 UTC (permalink / raw)
  To: Richard Gooch
  Cc: Geert Uytterhoeven, Alan Cox, Ingo Oeser, Linus Torvalds,
	Neil Brown, Jeff Garzik, Linux Kernel Mailing List, viro

Richard Gooch wrote:
> 
> H. Peter Anvin writes:
> > Richard Gooch wrote:
> > > Argh! What I wrote in text is what I meant to say. The code didn't
> > > match. No wonder people seemed to be missing the point. So the line of
> > > code I actually meant was:
> > >         if (strcmp (buffer + len - 3, "/cd") != 0) {
> >
> > This is still a really bad idea.  You don't want to tie this kind of
> > things to the name.
> 
> Why do you think it's a bad idea?
> 

Because you are now, once again, tying two things that are completely and
utterly unrelated: device classification and device name.  It breaks
every time someone comes out with a new device which is "kind of like an
old device, but not really," like CD-writers (which were kind-of-like
WORM, kind-of-like CD-ROM) and DVD (kind-of-like CD)... 

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 20:01                             ` Richard Gooch
  2001-05-16 20:05                               ` H. Peter Anvin
@ 2001-05-16 20:18                               ` Linus Torvalds
  2001-05-16 20:44                               ` Richard Gooch
                                                 ` (2 subsequent siblings)
  4 siblings, 0 replies; 317+ messages in thread
From: Linus Torvalds @ 2001-05-16 20:18 UTC (permalink / raw)
  To: Richard Gooch
  Cc: H. Peter Anvin, Geert Uytterhoeven, Alan Cox, Ingo Oeser,
	Neil Brown, Jeff Garzik, Linux Kernel Mailing List, viro


On Wed, 16 May 2001, Richard Gooch wrote:
> > 
> > This is still a really bad idea.  You don't want to tie this kind of
> > things to the name.
> 
> Why do you think it's a bad idea?

Well, one reason names are bad is that they don't always exist.

If you only have the fd (remember that unix notion of using <stdin> and
<stdout>), you'd have no clue where the thing came from. So something else
than the name is certainly a good idea for some of these issues.

That said, I still think the real problem is rampant use of ioctl's, which
are a bad idea in the first place. Magic numbers are always bad, and are a
sign of bad design.

		Linus


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 23:39                         ` Chip Salzenberg
@ 2001-05-16 20:37                           ` Alan Cox
  0 siblings, 0 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-16 20:37 UTC (permalink / raw)
  To: Chip Salzenberg
  Cc: Alan Cox, Linus Torvalds, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List, viro

> I don't mean to suggest that ioctls be used to deduce device types
> (except in the case of overlapping ioctl numbers, which shouldn't be
> all *that* common (I hope)).  I mean to suggest that the question
> "What device type are you?" usually shouldn't even be asked!

But people need to ask it. Sometimes it really matters. It doesn't have to be
as in-your-face as /dev/hda1 versus /dev/sda1 is, but it has to be possible.


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 20:01                             ` Richard Gooch
  2001-05-16 20:05                               ` H. Peter Anvin
  2001-05-16 20:18                               ` Linus Torvalds
@ 2001-05-16 20:44                               ` Richard Gooch
  2001-05-16 20:54                               ` Richard Gooch
  2001-05-17 21:06                               ` Kai Henningsen
  4 siblings, 0 replies; 317+ messages in thread
From: Richard Gooch @ 2001-05-16 20:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Geert Uytterhoeven, Alan Cox, Ingo Oeser,
	Neil Brown, Jeff Garzik, Linux Kernel Mailing List, viro

Linus Torvalds writes:
> 
> On Wed, 16 May 2001, Richard Gooch wrote:
> > > 
> > > This is still a really bad idea.  You don't want to tie this kind of
> > > things to the name.
> > 
> > Why do you think it's a bad idea?
> 
> Well, one reason names are bad is that they don't always exist.
> 
> If you only have the fd (remember that unix notion of using <stdin>
> and <stdout>), you'd have no clue where the thing came from. So
> something else than the name is certainly a good idea for some of
> these issues.

But, as I described in my original message, you use /proc/self/fd to
find where the fd came from. Or are you saying that you can't rely on
having /proc available?

Or do you have other reasons not to like the scheme I described? One
of the reasons I like it is because it requires no new kernel code.

> That said, I still think the real problem is rampant use of ioctl's,
> which are a bad idea in the first place. Magic numbers are always
> bad, and are a sign of bad design.

No argument from me.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 20:01                             ` Richard Gooch
                                                 ` (2 preceding siblings ...)
  2001-05-16 20:44                               ` Richard Gooch
@ 2001-05-16 20:54                               ` Richard Gooch
  2001-05-16 21:36                                 ` H. Peter Anvin
  2001-05-17 21:06                               ` Kai Henningsen
  4 siblings, 1 reply; 317+ messages in thread
From: Richard Gooch @ 2001-05-16 20:54 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Geert Uytterhoeven, Alan Cox, Ingo Oeser, Linus Torvalds,
	Neil Brown, Jeff Garzik, Linux Kernel Mailing List, viro

H. Peter Anvin writes:
> Richard Gooch wrote:
> > 
> > H. Peter Anvin writes:
> > > Richard Gooch wrote:
> > > > Argh! What I wrote in text is what I meant to say. The code didn't
> > > > match. No wonder people seemed to be missing the point. So the line of
> > > > code I actually meant was:
> > > >         if (strcmp (buffer + len - 3, "/cd") != 0) {
> > >
> > > This is still a really bad idea.  You don't want to tie this kind of
> > > things to the name.
> > 
> > Why do you think it's a bad idea?
> 
> Because you are now, once again, tying two things that are
> completely and utterly unrelated: device classification and device
> name.  It breaks every time someone comes out with a new device
> which is "kind of like an old device, but not really," like
> CD-writers (which was kind-of-like WORM, kind-of-like CD-ROM) and
> DVD (kind-of-like CD)...

But all devices which export a CD-ROM interface will do so. So the
device node that is associated with the CD-ROM driver will export
CD-ROM semantics, and the trailing name will be "/cd".

Other interfaces a device exports, such as a CD-RW, appear as a
different device node ("generic" for SCSI, because we have no CD-RW
classification at this point).

My scheme works already, and works reliably. Nothing had to be done to
support the CD-ROM interface to CD-RW and DVD devices.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 20:54                               ` Richard Gooch
@ 2001-05-16 21:36                                 ` H. Peter Anvin
  2001-05-16 22:11                                   ` Ingo Oeser
  0 siblings, 1 reply; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-16 21:36 UTC (permalink / raw)
  To: Richard Gooch
  Cc: Geert Uytterhoeven, Alan Cox, Ingo Oeser, Linus Torvalds,
	Neil Brown, Jeff Garzik, Linux Kernel Mailing List, viro

Richard Gooch wrote:
> >
> > Because you are now, once again, tying two things that are
> > completely and utterly unrelated: device classification and device
> > name.  It breaks every time someone comes out with a new device
> > which is "kind of like an old device, but not really," like
> > CD-writers (which was kind-of-like WORM, kind-of-like CD-ROM) and
> > DVD (kind-of-like CD)...
> 
> But all devices which export a CD-ROM interface will do so. So the
> device node that is associated with the CD-ROM driver will export
> CD-ROM semantics, and the trailing name will be "/cd".
> 
> Other interfaces a device exports, such as a CD-RW, appear as a
> different device node ("generic" for SCSI, because we have no CD-RW
> classification at this point).
> 
> My scheme works already, and works reliably. Nothing had to be done to
> support the CD-ROM interface to CD-RW and DVD devices.
> 

It's still completely braindamaged: (a) these interfaces aren't
disjoint.  They refer to the same device, and will interfere with each
other; (b) it is highly undesirable to tie the naming to the interfaces
in this way.  It further restricts the namespaces you can export, for one
thing.

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 16:04                                       ` Michael Meissner
@ 2001-05-16 21:36                                         ` Andreas Dilger
  0 siblings, 0 replies; 317+ messages in thread
From: Andreas Dilger @ 2001-05-16 21:36 UTC (permalink / raw)
  To: Michael Meissner
  Cc: Linus Torvalds, Jonathan Lundell, Jeff Garzik, James Simmons,
	Alan Cox, Neil Brown, H. Peter Anvin, Linux Kernel Mailing List,
	viro

Michael Meissner writes:
> On Tue, May 15, 2001 at 01:18:09PM -0700, Linus Torvalds wrote:
> > This is what we have now. Network devices are called "eth0..N", and nobody
> > is complaining about the fact that the numbering is basically random. It
> > is _repeatable_ as long as you don't change your hardware setup, and the
> > numbering has effectively _nothing_ to do with "location".
> 
> Well, yes and no.  The numbers are currently repeatable for a given kernel,
> but I know I and others were bitten by the 2.2 to 2.4 transition, where
> the kernel used a different algorithm for the order in which it detected
> SCSI and network adapters (e.g., in my machine with 3 SCSI adapters, Linux 2.2
> always picked the Adaptec SCSI adapter built into my motherboard as the
> first adapter, but 2.4 decided to pick my TekRam 390F adapter).

With a proper user-space solution for device naming, you wouldn't care what
order the kernel enumerated devices in.  You want the kernel to list all of
the devices (in any order), and then user-space is in charge of creating
(or maintaining) a semi-permanent ID to device mapping regardless of what
the major/minor number or physical device location is.

> As lots of people have been saying, you need to know which physical slot to
> plug the wire connecting eth0, eth1, etc. into.  Similarly for serial ports,
> if I have 3 or 4 (or 127 :-) USB serial devices, I really don't want to have
> to change my cabling each time I boot or change OSes (since I doubt my UPS
> will be happy if I give it the commands destined for the X10 controller or
> my remote boards).

If you keep a "static" database (in userspace) of device name -> physical
device mappings, then you are safe as long as either:
a) There is some way to identify a device which has moved (H/W serial number,
   LVM/fs UUID/label, unique make/model, etc).
b) You can get some physical location information about the device (i.e.
   I/O port address, bus/slot information, etc).

Linux currently always assumes that (b) implies the order of enumerating
devices is fixed, even when it isn't always true (hence problems with
SCSI addressing, ethernet cards, etc).  We _should_ be using (a) as much
as it is possible.  If both (a) and (b) are not true (e.g. two USB mice
(no serial number) and you change your USB layout) then there isn't much
that software can do without user intervention.

At least if there is a simple mapping table maintained in user space(*),
it is easy for users to switch the identities of devices to whatever they
want with little effort, rather than having to redo all of their other
config files.

Note that despite the presence of (b), we should NOT use this information
as part of the device name, since we want to be able to keep the same
device name (possibly with a _small_ bit of user intervention) even if
the device moves around.

Cheers, Andreas

(*) Something like a simple lookup table with name=value pairs (very e.g.):

ide-serial:a1e2a40a5e03=disk0
scsi-serial:xj23as88d=disk1
mdraid-uuid:3b02f06c-2c33-5905-c247-1f806535505c=disk7
isa-ioport:03f8=ttyS0
isa-ioport:02f8=ttyS1
isa-ioport:02e8=ttyS3
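
A minimal sketch of how a user-space tool might consult such a table,
assuming the "identifier=name" line format shown above. The table is
hard-coded here with the same sample entries purely for illustration; a real
tool would presumably read it from a config file, and the function name
lookup_name is made up.

```c
#include <stddef.h>
#include <string.h>

/* Sample entries in "hardware-id=stable-name" form (illustrative only). */
static const char *const mapping[] = {
    "ide-serial:a1e2a40a5e03=disk0",
    "scsi-serial:xj23as88d=disk1",
    "mdraid-uuid:3b02f06c-2c33-5905-c247-1f806535505c=disk7",
    "isa-ioport:03f8=ttyS0",
    "isa-ioport:02f8=ttyS1",
    "isa-ioport:02e8=ttyS3",
};

/*
 * Return the stable name registered for `hw_id`, or NULL if the
 * identifier is not in the table.  The kernel's enumeration order
 * never enters into it: only the identifier is compared.
 */
static const char *lookup_name(const char *hw_id)
{
    size_t idlen = strlen(hw_id);
    size_t i;

    for (i = 0; i < sizeof mapping / sizeof mapping[0]; i++) {
        const char *entry = mapping[i];

        /* Match the whole key, i.e. everything before the '='. */
        if (strncmp(entry, hw_id, idlen) == 0 && entry[idlen] == '=')
            return entry + idlen + 1;
    }
    return NULL;
}
```

Swapping two devices' identities is then a matter of editing two lines of the
table; nothing that refers to "disk0" or "ttyS1" needs to change.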
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 21:36                                 ` H. Peter Anvin
@ 2001-05-16 22:11                                   ` Ingo Oeser
  2001-05-16 22:13                                     ` H. Peter Anvin
  2001-05-16 23:03                                     ` Richard Gooch
  0 siblings, 2 replies; 317+ messages in thread
From: Ingo Oeser @ 2001-05-16 22:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Richard Gooch, Geert Uytterhoeven, Alan Cox, Linus Torvalds,
	Neil Brown, Jeff Garzik, Linux Kernel Mailing List, viro

On Wed, May 16, 2001 at 02:36:44PM -0700, H. Peter Anvin wrote:
> > But all devices which export a CD-ROM interface will do so. So the
> > device node that is associated with the CD-ROM driver will export
> > CD-ROM semantics, and the trailing name will be "/cd".
> > 
> > Other interfaces a device exports, such as a CD-RW, appear as a
> > different device node ("generic" for SCSI, because we have no CD-RW
> > classification at this point).
> > 
> > My scheme works already, and works reliably. Nothing had to be done to
> > support the CD-ROM interface to CD-RW and DVD devices.
> > 
> 
> It's still completely braindamaged: (a) these interfaces aren't
> disjoint.  They refer to the same device, and will interfere with each
> other; (b) it is highly undesirable to tie the naming to the interfaces
> in this way.  It further restricts the namespaces you can export, for one
> thing.

We do this already with ide-scsi. A device is visible as /dev/hda
and /dev/sda at the same time. Or think IDE-CDRW: /dev/hda,
/dev/sr0 and /dev/sg0.

All at the same time.

It is perfectly normal to export different interfaces for the
same device. This is basically what subfunctions on PCI do: same
device with different interfaces.

The only difference is that we do it through a driver with IDE and
through the hardware with a multi-function PCI card.

Applications don't care about devices. They care about entities
that have capabilities and programming interfaces. What they
_really_ are and if this is only emulated is not important.

Sorry, I don't see your point here :-(


Regards

Ingo Oeser
-- 
10.+11.03.2001 - 3. Chemnitzer LinuxTag <http://www.tu-chemnitz.de/linux/tag>
         <<<<<<<<<<<<     been there and had much fun   >>>>>>>>>>>>

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 22:11                                   ` Ingo Oeser
@ 2001-05-16 22:13                                     ` H. Peter Anvin
  2001-05-16 22:21                                       ` Jens Axboe
  2001-05-16 23:03                                     ` Richard Gooch
  1 sibling, 1 reply; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-16 22:13 UTC (permalink / raw)
  To: Ingo Oeser
  Cc: Richard Gooch, Geert Uytterhoeven, Alan Cox, Linus Torvalds,
	Neil Brown, Jeff Garzik, Linux Kernel Mailing List, viro

Ingo Oeser wrote:
> 
> We do this already with ide-scsi. A device is visible as /dev/hda
> and /dev/sda at the same time. Or think IDE-CDRW: /dev/hda,
> /dev/sr0 and /dev/sg0.
> 
> All at the same time.
> 

... and if you don't know about this funny aliasing, you get screwed. 
This is BAD DESIGN, once again.

> It is perfectly normal to export different interfaces for the
> same device. This is basically, what subfunctions on PCI do: Same
> device with different interfaces.
> 
> Just that we do it through a driver with ide and through the
> hardware with a multi function PCI card.
> 
> Applications don't care about devices. They care about entities
> that have capabilities and programming interfaces. What they
> _really_ are and if this is only emulated is not important.
> 
> Sorry, I don't see your point here :-(
> 

That seems to be a common theme with you.

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 22:13                                     ` H. Peter Anvin
@ 2001-05-16 22:21                                       ` Jens Axboe
  0 siblings, 0 replies; 317+ messages in thread
From: Jens Axboe @ 2001-05-16 22:21 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Oeser, Richard Gooch, Geert Uytterhoeven, Alan Cox,
	Linus Torvalds, Neil Brown, Jeff Garzik,
	Linux Kernel Mailing List, viro

On Wed, May 16 2001, H. Peter Anvin wrote:
> Ingo Oeser wrote:
> > 
> > We do this already with ide-scsi. A device is visible as /dev/hda
> > and /dev/sda at the same time. Or think IDE-CDRW: /dev/hda,
> > /dev/sr0 and /dev/sg0.
> > 
> > All at the same time.
> > 
> 
> ... and if you don't know about this funny aliasing, you get screwed. 
> This is BAD DESIGN, once again.

And he's wrong too, we don't do this all the time. If /dev/hda is ide-cd
controlled, then it can't be accessed through /dev/sr0 -- and vice
versa. sg vs sr is different: one is a char device, the other a block device.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 22:11                                   ` Ingo Oeser
  2001-05-16 22:13                                     ` H. Peter Anvin
@ 2001-05-16 23:03                                     ` Richard Gooch
  2001-05-16 23:25                                       ` H. Peter Anvin
  2001-05-16 23:37                                       ` Richard Gooch
  1 sibling, 2 replies; 317+ messages in thread
From: Richard Gooch @ 2001-05-16 23:03 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Oeser, Geert Uytterhoeven, Alan Cox, Linus Torvalds,
	Neil Brown, Jeff Garzik, Linux Kernel Mailing List, viro

H. Peter Anvin writes:
> Ingo Oeser wrote:
> > 
> > We do this already with ide-scsi. A device is visible as /dev/hda
> > and /dev/sda at the same time. Or think IDE-CDRW: /dev/hda,
> > /dev/sr0 and /dev/sg0.
> > 
> > All at the same time.
> > 
> 
> ... and if you don't know about this funny aliasing, you get screwed. 
> This is BAD DESIGN, once again.

We have this aliasing anyway. sg and sr are just one example. If you
care about conflicts, then make sure the drivers lock each other out.
It's got nothing to do with the mechanism to find out whether
something can behave like a CD-ROM or not.

> > Sorry, I don't see your point here :-(
> 
> That seems to be a common theme with you.

C'mon, Peter. No need for that.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 23:03                                     ` Richard Gooch
@ 2001-05-16 23:25                                       ` H. Peter Anvin
  2001-05-16 23:37                                       ` Richard Gooch
  1 sibling, 0 replies; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-16 23:25 UTC (permalink / raw)
  To: Richard Gooch
  Cc: Ingo Oeser, Geert Uytterhoeven, Alan Cox, Linus Torvalds,
	Neil Brown, Jeff Garzik, Linux Kernel Mailing List, viro

Richard Gooch wrote:
> 
> H. Peter Anvin writes:
> > Ingo Oeser wrote:
> > >
> > > We do this already with ide-scsi. A device is visible as /dev/hda
> > > and /dev/sda at the same time. Or think IDE-CDRW: /dev/hda,
> > > /dev/sr0 and /dev/sg0.
> > >
> > > All at the same time.
> > >
> >
> > ... and if you don't know about this funny aliasing, you get screwed.
> > This is BAD DESIGN, once again.
> 
> We have this aliasing anyway. sg and sr are just one example. If you
> care about conflicts, then make sure the drivers lock each other out.
> It's got nothing to do with the mechanism to find out whether
> something can behave like a CD-ROM or not.
> 

No fscking way.  What you're saying is "well, my design is broken, so break
your driver even further."  You're suggesting prohibiting legal (and
useful) operations because you're advocating an idiotic design to
identify devices?  Give me a break.

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16  7:24                                       ` Geert Uytterhoeven
@ 2001-05-16 23:26                                         ` Alan Cox
  2001-05-16 23:31                                           ` H. Peter Anvin
                                                             ` (2 more replies)
  0 siblings, 3 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-16 23:26 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Linus Torvalds, Jonathan Lundell, Jeff Garzik, James Simmons,
	Alan Cox, Neil Brown, H. Peter Anvin, Linux Kernel Mailing List,
	viro

> Are FireWire (and USB) disks always detected in the same order? Or does it
> behave like ADB, where you never know which mouse/keyboard is which
> mouse/keyboard?

USB disks are required (haha etc) to have serial numbers. Firewire similarly
has unique disk identifiers.  


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 23:26                                         ` Alan Cox
@ 2001-05-16 23:31                                           ` H. Peter Anvin
  2001-05-16 23:53                                             ` Linus Torvalds
                                                               ` (2 more replies)
  2001-05-16 23:52                                           ` Linus Torvalds
  2001-05-17  1:26                                           ` Joel Becker
  2 siblings, 3 replies; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-16 23:31 UTC (permalink / raw)
  To: Alan Cox
  Cc: Geert Uytterhoeven, Linus Torvalds, Jonathan Lundell,
	Jeff Garzik, James Simmons, Neil Brown,
	Linux Kernel Mailing List, viro

Alan Cox wrote:
> 
> > Are FireWire (and USB) disks always detected in the same order? Or does it
> > behave like ADB, where you never know which mouse/keyboard is which
> > mouse/keyboard?
> 
> USB disks are required (haha etc) to have serial numbers. Firewire similarly
> has unique disk identifiers.
> 

How about for other device classes?

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 23:03                                     ` Richard Gooch
  2001-05-16 23:25                                       ` H. Peter Anvin
@ 2001-05-16 23:37                                       ` Richard Gooch
  2001-05-16 23:38                                         ` H. Peter Anvin
  2001-05-16 23:41                                         ` Richard Gooch
  1 sibling, 2 replies; 317+ messages in thread
From: Richard Gooch @ 2001-05-16 23:37 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Oeser, Geert Uytterhoeven, Alan Cox, Linus Torvalds,
	Neil Brown, Jeff Garzik, Linux Kernel Mailing List, viro

H. Peter Anvin writes:
> Richard Gooch wrote:
> > We have this aliasing anyway. sg and sr are just one example. If you
> > care about conflicts, then make sure the drivers lock each other out.
> > It's got nothing to do with the mechanism to find out whether
> > something can behave like a CD-ROM or not.
> 
> No fscking way.  What you're saying "well, my design is broken, so
> break your driver even further."  You're suggesting prohibiting
> legal (and useful) operations because you're advocating an idiotic
> design to identify devices?  Give me a break.

Erm, let's start again. My central point is that you can use devfs
names to reliably figure out what kind of device a FD is, as a cleaner
alternative to comparing major numbers. Therefore, I'm challenging the
notion that you need to reserve magic major numbers in order to
distinguish devices.

I suspect you're thinking about a different problem, which is finding
out what a device can do. Implementing some kind of capability list
may well be a good approach to *that* problem. There are some details
to figure out, like how multiple drivers interact with each other.
They could be tricky.
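
A minimal sketch of what such a capability list might look like, assuming a simple bitmask per device. Every name here (`dev_cap`, `DEV_CAP_*`, `dev_has_caps`) is hypothetical and does not correspond to any existing kernel interface; it only illustrates asking "what can this device do?" instead of "what is it called?".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits; none of these names exist in the kernel. */
enum dev_cap {
    DEV_CAP_CDROM   = 1u << 0,  /* can be read like a CD-ROM */
    DEV_CAP_CDWRITE = 1u << 1,  /* can burn discs */
    DEV_CAP_TAPE    = 1u << 2,  /* behaves like a tape drive */
    DEV_CAP_GENERIC = 1u << 3,  /* accepts raw pass-through commands */
};

/* Returns true when every bit in `wanted` is also set in `caps`. */
static bool dev_has_caps(uint32_t caps, uint32_t wanted)
{
    assert(wanted != 0);  /* asking for "no capability" is a caller bug */
    return (caps & wanted) == wanted;
}
```

A CD writer would advertise `DEV_CAP_CDROM | DEV_CAP_CDWRITE`, and a caller would test for exactly the bits it needs. The sketch says nothing about which driver fills in the mask or how drivers lock each other out.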

Now, with the above said, what operations do you think I'm
prohibiting?

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 23:37                                       ` Richard Gooch
@ 2001-05-16 23:38                                         ` H. Peter Anvin
  2001-05-16 23:41                                         ` Richard Gooch
  1 sibling, 0 replies; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-16 23:38 UTC (permalink / raw)
  To: Richard Gooch
  Cc: Ingo Oeser, Geert Uytterhoeven, Alan Cox, Linus Torvalds,
	Neil Brown, Jeff Garzik, Linux Kernel Mailing List, viro

Richard Gooch wrote:
> 
> Erm, let's start again. My central point is that you can use devfs
> names to reliably figure out what kind of device a FD is, as a cleaner
> alternative to comparing major numbers. Therefore, I'm challenging the
> notion that you need to reserve magic major numbers in order to
> distinguish devices.
> 

No one in this tree has made that claim.  Everyone agrees it's butt-ugly.
However, your solution is by and large just as butt-ugly.

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 23:37                                       ` Richard Gooch
  2001-05-16 23:38                                         ` H. Peter Anvin
@ 2001-05-16 23:41                                         ` Richard Gooch
  2001-05-16 23:43                                           ` H. Peter Anvin
  2001-05-16 23:49                                           ` Richard Gooch
  1 sibling, 2 replies; 317+ messages in thread
From: Richard Gooch @ 2001-05-16 23:41 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Oeser, Geert Uytterhoeven, Alan Cox, Linus Torvalds,
	Neil Brown, Jeff Garzik, Linux Kernel Mailing List, viro

H. Peter Anvin writes:
> Richard Gooch wrote:
> > 
> > Erm, let's start again. My central point is that you can use devfs
> > names to reliably figure out what kind of device a FD is, as a cleaner
> > alternative to comparing major numbers. Therefore, I'm challenging the
> > notion that you need to reserve magic major numbers in order to
> > distinguish devices.
> 
> No one in this tree has made that claim.  Everyone agrees it's
> butt-ugly.  However, your solution is by and large just as
> butt-ugly.

So you'd prefer some kind of capability list?

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 23:41                                         ` Richard Gooch
@ 2001-05-16 23:43                                           ` H. Peter Anvin
  2001-05-16 23:49                                           ` Richard Gooch
  1 sibling, 0 replies; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-16 23:43 UTC (permalink / raw)
  To: Richard Gooch
  Cc: Ingo Oeser, Geert Uytterhoeven, Alan Cox, Linus Torvalds,
	Neil Brown, Jeff Garzik, Linux Kernel Mailing List, viro

Richard Gooch wrote:
> 
> H. Peter Anvin writes:
> > Richard Gooch wrote:
> > >
> > > Erm, let's start again. My central point is that you can use devfs
> > > names to reliably figure out what kind of device a FD is, as a cleaner
> > > alternative to comparing major numbers. Therefore, I'm challenging the
> > > notion that you need to reserve magic major numbers in order to
> > > distinguish devices.
> >
> > No one in this tree has made that claim.  Everyone agrees it's
> > butt-ugly.  However, your solution is by and large just as
> > butt-ugly.
> 
> So you'd prefer some kind of capability list?
> 

Yes.

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 23:41                                         ` Richard Gooch
  2001-05-16 23:43                                           ` H. Peter Anvin
@ 2001-05-16 23:49                                           ` Richard Gooch
  2001-05-16 23:55                                             ` H. Peter Anvin
  2001-05-17 21:12                                             ` Kai Henningsen
  1 sibling, 2 replies; 317+ messages in thread
From: Richard Gooch @ 2001-05-16 23:49 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Linux Kernel Mailing List

[Cc: list trimmed because I figure people are getting tired of us:-]
H. Peter Anvin writes:
> Richard Gooch wrote:
> > 
> > H. Peter Anvin writes:
> > > Richard Gooch wrote:
> > > >
> > > > Erm, let's start again. My central point is that you can use devfs
> > > > names to reliably figure out what kind of device a FD is, as a cleaner
> > > > alternative to comparing major numbers. Therefore, I'm challenging the
> > > > notion that you need to reserve magic major numbers in order to
> > > > distinguish devices.
> > >
> > > No one in this tree has made that claim.  Everyone agrees it's
> > > butt-ugly.  However, your solution is by and large just as
> > > butt-ugly.
> > 
> > So you'd prefer some kind of capability list?

OK. How do you figure on dealing with the problem of multiple
high-level drivers talking to the same device? How does sr.o "know"
that this is also a CD-RW? How does sg.o "know" that this is also a
tape?

Where does the responsibility lie for figuring out the capabilities?

Further, which device node/fs/driver exports the capability list?

And what about locking between drivers?

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 18:22                           ` Richard Gooch
  2001-05-16 19:36                             ` H. Peter Anvin
  2001-05-16 20:01                             ` Richard Gooch
@ 2001-05-16 23:51                             ` Alan Cox
  2001-05-16 23:58                             ` Richard Gooch
  3 siblings, 0 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-16 23:51 UTC (permalink / raw)
  To: Richard Gooch
  Cc: Geert Uytterhoeven, Alan Cox, Ingo Oeser, Linus Torvalds,
	Neil Brown, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List, viro

> Argh! What I wrote in text is what I meant to say. The code didn't
> match. No wonder people seemed to be missing the point. So the line of
> code I actually meant was:
> 	if (strcmp (buffer + len - 3, "/cd") != 0) {

drivers/kitchen/bluetooth/vegerack/cd

its the cabbage dicer still ..
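
A minimal sketch of the suffix test under discussion, assuming the devfs path is already in a NUL-terminated buffer. The helper name `path_has_cd_leaf` and the example paths are illustrative only. A length guard is added so short paths are not read out of bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true when `path` ends in the leaf name "/cd".
 * Mirrors the quoted strcmp() test, plus a guard for paths
 * shorter than three characters. */
static bool path_has_cd_leaf(const char *path)
{
    assert(path != NULL);
    size_t len = strlen(path);
    if (len < 3)
        return false;
    return strcmp(path + len - 3, "/cd") == 0;
}
```

The test accepts a path such as "ide/host0/bus0/target0/lun0/cd", but it accepts "drivers/kitchen/bluetooth/vegerack/cd" just as readily. The check itself cannot tell the two apart; only a naming convention can.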

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 23:26                                         ` Alan Cox
  2001-05-16 23:31                                           ` H. Peter Anvin
@ 2001-05-16 23:52                                           ` Linus Torvalds
  2001-05-17  1:26                                           ` Joel Becker
  2 siblings, 0 replies; 317+ messages in thread
From: Linus Torvalds @ 2001-05-16 23:52 UTC (permalink / raw)
  To: Alan Cox
  Cc: Geert Uytterhoeven, Jonathan Lundell, Jeff Garzik, James Simmons,
	Neil Brown, H. Peter Anvin, Linux Kernel Mailing List, viro


On Thu, 17 May 2001, Alan Cox wrote:
>
> > Are FireWire (and USB) disks always detected in the same order? Or does it
> > behave like ADB, where you never know which mouse/keyboard is which
> > mouse/keyboard?
> 
> USB disks are required (haha etc) to have serial numbers. Firewire similarly
> has unique disk identifiers.  

Well, as that doesn't actually work out in practice, the good news is that
USB at least _tries_ to always walk the tree the same way when detecting
devices, so if you don't change where your devices are in the topology
they should show up in similar places.

Of course, "not changing topology" also means things like not powering off
devices with external power supplies etc..

The serial numbers are probably not that reliable.

		Linus


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 23:31                                           ` H. Peter Anvin
@ 2001-05-16 23:53                                             ` Linus Torvalds
  2001-05-17  0:21                                             ` Alan Cox
  2001-05-17  6:43                                             ` Thomas Sailer
  2 siblings, 0 replies; 317+ messages in thread
From: Linus Torvalds @ 2001-05-16 23:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Alan Cox, Geert Uytterhoeven, Jonathan Lundell, Jeff Garzik,
	James Simmons, Neil Brown, Linux Kernel Mailing List, viro


On Wed, 16 May 2001, H. Peter Anvin wrote:
> Alan Cox wrote:
> > 
> > > Are FireWire (and USB) disks always detected in the same order? Or does it
> > > behave like ADB, where you never know which mouse/keyboard is which
> > > mouse/keyboard?
> > 
> > USB disks are required (haha etc) to have serial numbers. Firewire similarly
> > has unique disk identifiers.
> 
> How about for other device classes?

Note that this whole decision hinges on a fact that simply isn't _true_.

You simply _cannot_ get the physical location of many devices. Sometimes
the topology of the bus is basically anonymous - there _is_ no location.

People had better just accept this. Don't get hung up about where
something is.

		Linus


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 23:49                                           ` Richard Gooch
@ 2001-05-16 23:55                                             ` H. Peter Anvin
  2001-05-17 21:12                                             ` Kai Henningsen
  1 sibling, 0 replies; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-16 23:55 UTC (permalink / raw)
  To: Richard Gooch; +Cc: Linux Kernel Mailing List

Richard Gooch wrote:
> 
> OK. How do you figure on dealing with the problem of multiple
> high-level drivers talking to the same device? How does sr.o "know"
> that this is also a CD-RW? How does sg.o "know" that this is also a
> tape?
> 

At some point something talks to the device -- in this case, it's the
SCSI layer.  Follow the interfaces in the kernel and it becomes obvious.

> Where does the responsibility lie for figuring out the capabilities?
> 
> Further, which device node/fs/driver exports the capability list?
> 
> And what about locking between drivers?

Orthogonal issue.  You may want a locking mechanism, but it almost
certainly should not be automatic.

Note that especially ide-scsi is a good example on how *not* to do
things.  The fact that you have to choose one of two interfaces for
different operations (I can't use a CD-writer in the default
configuration!) is insane.

	-hpa


-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 18:22                           ` Richard Gooch
                                               ` (2 preceding siblings ...)
  2001-05-16 23:51                             ` Alan Cox
@ 2001-05-16 23:58                             ` Richard Gooch
  2001-05-17  0:12                               ` H. Peter Anvin
                                                 ` (2 more replies)
  3 siblings, 3 replies; 317+ messages in thread
From: Richard Gooch @ 2001-05-16 23:58 UTC (permalink / raw)
  To: Alan Cox
  Cc: Richard Gooch, Geert Uytterhoeven, Ingo Oeser, Linus Torvalds,
	Neil Brown, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List, viro

Alan Cox writes:
> > Argh! What I wrote in text is what I meant to say. The code didn't
> > match. No wonder people seemed to be missing the point. So the line of
> > code I actually meant was:
> > 	if (strcmp (buffer + len - 3, "/cd") != 0) {
> 
> drivers/kitchen/bluetooth/vegerack/cd
> 
> its the cabbage dicer still ..

No, because it violates the standard. Just as we can define a major
number to have a specific meaning, we can define a name in the devfs
namespace to have a specific meaning.

Yes, it's broken if someone writes a cabbage dicer driver and uses
"cd" as the leaf node name for devfs.

Yes, it's broken if someone writes a cabbage dicer driver and uses
the same major as the IDE CD-ROM or SCSI CD-ROM drivers.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 23:58                             ` Richard Gooch
@ 2001-05-17  0:12                               ` H. Peter Anvin
  2001-05-17  0:24                               ` Alan Cox
  2001-05-17  1:35                               ` Jeff Garzik
  2 siblings, 0 replies; 317+ messages in thread
From: H. Peter Anvin @ 2001-05-17  0:12 UTC (permalink / raw)
  To: Richard Gooch
  Cc: Alan Cox, Geert Uytterhoeven, Ingo Oeser, Linus Torvalds,
	Neil Brown, Jeff Garzik, Linux Kernel Mailing List, viro

Richard Gooch wrote:
> 
> Alan Cox writes:
> > > Argh! What I wrote in text is what I meant to say. The code didn't
> > > match. No wonder people seemed to be missing the point. So the line of
> > > code I actually meant was:
> > >     if (strcmp (buffer + len - 3, "/cd") != 0) {
> >
> > drivers/kitchen/bluetooth/vegerack/cd
> >
> > its the cabbage dicer still ..
> 
> No, because it violates the standard. Just as we can define a major
> number to have a specific meaning, we can define a name in the devfs
> namespace to have a specific meaning.
> 
> Yes, it's broken if someone writes a cabbage dicer driver and uses
> "cd" as the leaf node name for devfs.
> 
> Yes, it's broken if someone writes a cabbage dicer driver and uses
> the same major as the IDE CD-ROM or SCSI CD-ROM drivers.
> 

But unlike the latter case, your case isn't even self-enforcing. 
Furthermore, it puts a lot of future restrictions on the namespace, and
take it from me, you don't want to do that.

That, of course, is in addition to everything else...

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 23:31                                           ` H. Peter Anvin
  2001-05-16 23:53                                             ` Linus Torvalds
@ 2001-05-17  0:21                                             ` Alan Cox
  2001-05-17  7:57                                               ` Geert Uytterhoeven
  2001-05-17 16:26                                               ` James Simmons
  2001-05-17  6:43                                             ` Thomas Sailer
  2 siblings, 2 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-17  0:21 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Alan Cox, Geert Uytterhoeven, Linus Torvalds, Jonathan Lundell,
	Jeff Garzik, James Simmons, Neil Brown,
	Linux Kernel Mailing List, viro

> > USB disks are required (haha etc) to have serial numbers. Firewire similarly
> > has unique disk identifiers.
> 
> How about for other device classes?

Keyboards and mice don't, which is a real pig because it prevents you using
dual head, two usb keyboards and 2 usb mice for a dual user box (assuming
someone fixed the console code mess to cope with multiple console users as
a concept)


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 23:58                             ` Richard Gooch
  2001-05-17  0:12                               ` H. Peter Anvin
@ 2001-05-17  0:24                               ` Alan Cox
  2001-05-17  1:35                               ` Jeff Garzik
  2 siblings, 0 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-17  0:24 UTC (permalink / raw)
  To: Richard Gooch
  Cc: Alan Cox, Richard Gooch, Geert Uytterhoeven, Ingo Oeser,
	Linus Torvalds, Neil Brown, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List, viro

> Yes, it's broken if someone writes a cabbage dicer driver and uses
> "cd" as the leaf node name for devfs.
> 
> Yes, it's broken if someone writes a cabbage dicer driver and uses
> the same major as the IDE CD-ROM or SCSI CD-ROM drivers.

The difference is that one is a kernel interface magic cookie (be it a
variable length one with a 7-bit ASCII mapping that happens to relate to it,
or a short constant); the other is user policy and has no business being set
in a spec.

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 23:26                                         ` Alan Cox
  2001-05-16 23:31                                           ` H. Peter Anvin
  2001-05-16 23:52                                           ` Linus Torvalds
@ 2001-05-17  1:26                                           ` Joel Becker
  2 siblings, 0 replies; 317+ messages in thread
From: Joel Becker @ 2001-05-17  1:26 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List

On Thu, May 17, 2001 at 12:26:12AM +0100, Alan Cox wrote:
> > Are FireWire (and USB) disks always detected in the same order? Or does it
> > behave like ADB, where you never know which mouse/keyboard is which
> > mouse/keyboard?
> 
> USB disks are required (haha etc) to have serial numbers. Firewire similarly
> has unique disk identifiers.  

	"haha etc" indeed.  We've just purchased and tested a BusLink
20GB USB disk.  Works fine with the driver we found.  Great, in fact.
But no serial number at all.

Joel


-- 

"What do you take me for, an idiot?"  
        - General Charles de Gaulle, when a journalist asked him
          if he was happy.

			http://www.jlbec.org/
			jlbec@evilplan.org

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 23:58                             ` Richard Gooch
  2001-05-17  0:12                               ` H. Peter Anvin
  2001-05-17  0:24                               ` Alan Cox
@ 2001-05-17  1:35                               ` Jeff Garzik
  2001-05-17  9:33                                 ` Guest section DW
  2 siblings, 1 reply; 317+ messages in thread
From: Jeff Garzik @ 2001-05-17  1:35 UTC (permalink / raw)
  To: Richard Gooch; +Cc: Linux Kernel Mailing List

To inject a bit of concrete into this discussion, I note that block
devices with dynamically assigned majors don't work with CONFIG_DEVFS and
devfs=only.  Block devices -require- majors currently, due to those
!@#!@# arrays.  However, devfs_register_blkdev always returns zero when
devfs=only, even if it's a block device and a dynamic major is requested.
-- 
Jeff Garzik      | Game called on account of naked chick
Building 1024    |
MandrakeSoft     |

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 23:31                                           ` H. Peter Anvin
  2001-05-16 23:53                                             ` Linus Torvalds
  2001-05-17  0:21                                             ` Alan Cox
@ 2001-05-17  6:43                                             ` Thomas Sailer
  2001-05-17 16:58                                               ` Tim Jansen
  2 siblings, 1 reply; 317+ messages in thread
From: Thomas Sailer @ 2001-05-17  6:43 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Linux Kernel Mailing List

"H. Peter Anvin" schrieb:

> How about for other device classes?

Cheap USB devices (and sometimes even expensive ones)
do not have serial numbers or other unique identifiers.
Therefore some sort of topology based addressing scheme
has to be used in that case.

Tom

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-17  0:21                                             ` Alan Cox
@ 2001-05-17  7:57                                               ` Geert Uytterhoeven
  2001-05-17 16:26                                               ` James Simmons
  1 sibling, 0 replies; 317+ messages in thread
From: Geert Uytterhoeven @ 2001-05-17  7:57 UTC (permalink / raw)
  To: Alan Cox
  Cc: H. Peter Anvin, Linus Torvalds, Jonathan Lundell, Jeff Garzik,
	James Simmons, Neil Brown, Linux Kernel Mailing List, viro

On Thu, 17 May 2001, Alan Cox wrote:
> > > USB disks are required (haha etc) to have serial numbers. Firewire similarly
> > > has unique disk identifiers.
> > 
> > How about for other device classes?
> 
> Keyboards and mice don't, which is a real pig because it prevents you using
> dual head, two usb keyboards and 2 usb mice for a dual user box (assuming
> someone fixed the console code mess to cope with multiple console users as
> a concept)

FYI... http://sf.net/projects/linuxconsole/

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-17  1:35                               ` Jeff Garzik
@ 2001-05-17  9:33                                 ` Guest section DW
  0 siblings, 0 replies; 317+ messages in thread
From: Guest section DW @ 2001-05-17  9:33 UTC (permalink / raw)
  To: Jeff Garzik, Richard Gooch; +Cc: Linux Kernel Mailing List

On Wed, May 16, 2001 at 09:35:09PM -0400, Jeff Garzik wrote:

> To inject a bit of concrete into this discussion, I note that block
> devices with dynamically assigned majors don't work with CONFIG_DEVFS and
> devfs=only.  Block devices -require- majors currently, due to those
> !@#!@# arrays.  However, devfs_register_blkdev always returns zero when
> devfs=only, even if it's a block device and a dynamic major is requested.

Jeff, this is a non-issue.
These arrays you talk about are removed in a simple edit session -
I did it a handful of times - there is no problem there whatsoever.

What you are talking about is kernel-internal. We are free to do
whatever we like inside the kernel, and we have full information.
The only question is how to transmit device identity across the
kernel space/user space boundary.

Andries


[And my solution is to use cookies - 64-bit opaque numbers that
carry no information, and are generated at random by the kernel,
but with the properties: (i) things that have a device number
today keep this number, (ii) the random generation is such that
whenever possible chances are good that after a reboot the same
device will have the same number.]

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 19:43                                   ` Johannes Erdfelt
  2001-05-15 21:58                                     ` Chip Salzenberg
  2001-05-16  8:51                                     ` Helge Hafting
@ 2001-05-17 10:20                                     ` Pavel Machek
  2001-05-18 17:32                                       ` Johannes Erdfelt
  2001-05-19  8:18                                     ` Kai Henningsen
  3 siblings, 1 reply; 317+ messages in thread
From: Pavel Machek @ 2001-05-17 10:20 UTC (permalink / raw)
  To: Johannes Erdfelt
  Cc: Linus Torvalds, James Simmons, Alexander Viro, Alan Cox,
	Neil Brown, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List

Hi!

> > But no, I don't actually like sockets all that much myself. They are hard
> > to use from scripts, and many more people are familiar with open/close and
> > read/write.
> 
> Agreed.
> 
> It would be nice to use open/close/read/write for control and bulk and
> sockets for interrupt and isochronous.

What makes interrupt so different? Last time I checked int pipes were very
similar to bulk pipes... Do you care about "packet boundaries"? You can
somehow emulate with read, too...
								Pavel

-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:37                                   ` Linus Torvalds
  2001-05-15 20:56                                     ` Jeff Garzik
  2001-05-15 21:22                                     ` James Simmons
@ 2001-05-17 10:42                                     ` Pavel Machek
  2001-05-18 18:32                                       ` James Simmons
  2 siblings, 1 reply; 317+ messages in thread
From: Pavel Machek @ 2001-05-17 10:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Viro, James Simmons, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List

Hi!

> They might also be exactly the same channel, except with certain magic
> bits set. The example peter gave was fine: tty devices could very usefully
> be opened with something like
> 
> 	fd = open("/dev/tty00/nonblock,9600,n8", O_RDWR);
> 
> where we actually open up exactly the same channel as if we opened up
> /dev/cua00, we just set the speed etc at the same time. Which makes things

Hmm, there might be a problem with this. How do you change the speed without
reopening the device? [Remember: your mouse knows when you close the device.]
								Pavel
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 17:37 ` Pavel Machek
@ 2001-05-17 11:32   ` Alan Cox
  0 siblings, 0 replies; 317+ messages in thread
From: Alan Cox @ 2001-05-17 11:32 UTC (permalink / raw)
  To: Pavel Machek; +Cc: H. Peter Anvin, torvalds, Linux Kernel Mailing List

> Linus, Is that wise? I could understand moratorium during 2.5, but during 2.4?!
> And worse, what about drivers that want to be merged into 2.2?

2.2 will be using the same forked registry as 2.4-ac. I don't anticipate much
being added to it that will need a major, however.


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-17  0:21                                             ` Alan Cox
  2001-05-17  7:57                                               ` Geert Uytterhoeven
@ 2001-05-17 16:26                                               ` James Simmons
  1 sibling, 0 replies; 317+ messages in thread
From: James Simmons @ 2001-05-17 16:26 UTC (permalink / raw)
  To: Alan Cox
  Cc: H. Peter Anvin, Geert Uytterhoeven, Linus Torvalds,
	Jonathan Lundell, Jeff Garzik, Neil Brown,
	Linux Kernel Mailing List, viro


> > > USB disks are required (haha etc) to have serial numbers. Firewire similarly
> > > has unique disk identifiers.
> > 
> > How about for other device classes?
> 
> Keyboards and mice don't, which is a real pig because it prevents you using
> dual head, two usb keyboards and 2 usb mice for a dual user box (assuming
> someone fixed the console code mess to cope with multiple console users as
> a concept)

Wrong!! I already have a multi-desktop system running at home. I have two
PS/2 keyboards, Sun keyboard, and a USB keyboard hooked up to my system. I
only have two monitors/video cards so I only have two VTs going at the
same time. I even managed to get two X servers running on each VT
independent of each other. This was not easy, but it is possible.
   I can tell you I didn't use any type of id system. How do I deal with
devices like multiple sound cards (yes I have that too) in a multidesktop
environment? With file permission on the device nodes. I created
different groups for different desktops. With a little tweaking with PAM
when I login to a specific workstation I'm automatically added to a
certain desktop group. I can't access any devices that belong to another 
desktop group. When I logout I'm removed from that group. Now xdm needs a
little hacking to make this work. 
   The beauty of this approach is that the admin determines which devices
belong to which desktop. Now for hotpluggable devices: when someone plugs
a device they just unplugged back in and it doesn't end up on the same
device node, that new device will belong to no one. They can then use some
userland utility to grab the device. In this case it is first come, first
served. Surely when you plug in your device you are NOT going to steal a
device in use but one that is available. Now what if two people unplug
their mice at the same time and plug them back in and they are mixed up?
Again I think a userland utility can solve this problem. Here you would
need to be root to fix the problem.


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-17  6:43                                             ` Thomas Sailer
@ 2001-05-17 16:58                                               ` Tim Jansen
  2001-05-17 17:18                                                 ` James Simmons
  2001-05-17 22:03                                                 ` Oliver Neukum
  0 siblings, 2 replies; 317+ messages in thread
From: Tim Jansen @ 2001-05-17 16:58 UTC (permalink / raw)
  To: t.sailer; +Cc: Linux Kernel Mailing List

On Thursday 17 May 2001 08:43, Thomas Sailer wrote:
> Cheap USB devices (and sometimes even expensive ones)
> do not have serial numbers or other unique identifiers.
> Therefore some sort of topology based addressing scheme
> has to be used in that case.

No, there is another addressing scheme that can be used for devices without
a serial number: the vendor and product ids. Most people do not have two devices of
the same kind, so you often do not need the topology at all.
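
A minimal sketch of that fallback order, assuming the caller already knows whether the device reports a serial number and how many devices with the same vendor and product ids are attached. The names `id_scheme` and `choose_scheme` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Which piece of information is used to tell devices apart. */
enum id_scheme {
    ID_SERIAL,          /* device reports a serial number */
    ID_VENDOR_PRODUCT,  /* only one device of this kind is attached */
    ID_TOPOLOGY,        /* last resort: position on the bus */
};

/* same_kind_count: attached devices sharing this vendor/product pair,
 * including the device itself, so it is always at least 1. */
static enum id_scheme choose_scheme(bool has_serial, unsigned same_kind_count)
{
    assert(same_kind_count >= 1);
    if (has_serial)
        return ID_SERIAL;
    if (same_kind_count == 1)
        return ID_VENDOR_PRODUCT;
    return ID_TOPOLOGY;
}
```

Topology is only needed in the last case: two or more identical devices, none of which has a serial number.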

BTW this document describes the use of device ids on windows:
http://www.osr.com/ddk/idstrings_8tt3.htm

bye...

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-16  0:59                                       ` Daniel Phillips
  2001-05-16  1:34                                         ` Nicolas Pitre
  2001-05-16 11:34                                         ` Erik Mouw
@ 2001-05-17 17:07                                         ` Eric W. Biederman
  2001-05-17 19:30                                           ` Jeff Randall
  2 siblings, 1 reply; 317+ messages in thread
From: Eric W. Biederman @ 2001-05-17 17:07 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Nicolas Pitre, Linux Kernel Mailing List

Daniel Phillips <phillips@bonn-fries.net> writes:

> On Tuesday 15 May 2001 23:20, Nicolas Pitre wrote:
> > Personally, I'd really like to see /dev/ttyS0 be the first detected
> > serial port on a system, /dev/ttyS1 the second, etc.
> 
> There are well-defined rules for the first four on PC's.  The ttySx 
> better match the labels the OEM put on the box.

Actually it would be better to have the OEM put a label in the
firmware, and then have a way to query the device for its label.

The legacy rules are nice, but serial ports are done with superio chips
now, and superio chips are almost all ISA PNP chips without device
enumeration and isolation.

Eric



* Re: LANANA: To Pending Device Number Registrants
  2001-05-17 16:58                                               ` Tim Jansen
@ 2001-05-17 17:18                                                 ` James Simmons
  2001-05-17 17:29                                                   ` Geert Uytterhoeven
  2001-05-17 17:41                                                   ` Tim Jansen
  2001-05-17 22:03                                                 ` Oliver Neukum
  1 sibling, 2 replies; 317+ messages in thread
From: James Simmons @ 2001-05-17 17:18 UTC (permalink / raw)
  To: Tim Jansen; +Cc: t.sailer, Linux Kernel Mailing List


> No, there is another addressing scheme that can be used for devices without
> a serial number: the vendor and product ids. Most people do not have two
> devices of the same kind, so you often do not need the topology at all.

I wouldn't make that assumption. I have two PS/2 keyboards attached to my
system, and they have neither serial ids nor vendor or product ids.




* Re: LANANA: To Pending Device Number Registrants
  2001-05-17 17:18                                                 ` James Simmons
@ 2001-05-17 17:29                                                   ` Geert Uytterhoeven
  2001-05-17 17:41                                                   ` Tim Jansen
  1 sibling, 0 replies; 317+ messages in thread
From: Geert Uytterhoeven @ 2001-05-17 17:29 UTC (permalink / raw)
  To: James Simmons; +Cc: Tim Jansen, t.sailer, Linux Kernel Mailing List

On Thu, 17 May 2001, James Simmons wrote:
> > No, there is another addressing scheme that can be used for devices without
> > a serial number: the vendor and product ids. Most people do not have two
> > devices of the same kind, so you often do not need the topology at all.
> 
> I wouldn't make that assumption. I have two PS/2 keyboards attached to my
> system, and they have neither serial ids nor vendor or product ids.

Yes. And it's the hardcore users who have multiple devices of the same kind.
Think e.g. RAID.

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds



* Re: LANANA: To Pending Device Number Registrants
  2001-05-17 17:18                                                 ` James Simmons
  2001-05-17 17:29                                                   ` Geert Uytterhoeven
@ 2001-05-17 17:41                                                   ` Tim Jansen
  1 sibling, 0 replies; 317+ messages in thread
From: Tim Jansen @ 2001-05-17 17:41 UTC (permalink / raw)
  To: Linux Kernel Mailing List

On Thursday 17 May 2001 19:18, you wrote:
> I wouldn't make that assumption. I have two PS/2 keyboards attached to my
> system, and they have neither serial ids nor vendor or product ids.

Yes, PS/2 is a system where you must use the location. That's why a device id 
must contain the id, the serial number AND the location.
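
As a rough illustration of an identifier that carries all three parts, here is
a minimal sketch in C. Everything in it is an assumption made for the example:
the struct, the field names, the location string format, and the rule that a
serial number wins over the location when both sides have one. It is not an
existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical device identity: vendor/product id, serial number, location. */
struct dev_ident {
    uint16_t vendor;      /* 0 if the bus reports no vendor id (e.g. PS/2) */
    uint16_t product;     /* 0 if the bus reports no product id */
    char serial[32];      /* empty string if the device has no serial number */
    char location[32];    /* topology path, e.g. "usb0/2" (made-up format) */
};

/* Return true if 'seen' is the device described by 'want'. */
bool dev_ident_match(const struct dev_ident *want, const struct dev_ident *seen)
{
    assert(want != NULL && seen != NULL);

    /* Different model: never the same device. */
    if (want->vendor != seen->vendor || want->product != seen->product)
        return false;

    /* When both sides carry a serial number, it decides the match. */
    if (want->serial[0] != '\0' && seen->serial[0] != '\0')
        return strcmp(want->serial, seen->serial) == 0;

    /* Otherwise fall back to where the device is plugged in. */
    return strcmp(want->location, seen->location) == 0;
}
```

With this rule, two identical PS/2 keyboards (no ids, no serial numbers) are
told apart purely by location, while a device with a serial number is still
recognized after it moves to another port.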

bye...


* Re: LANANA: To Pending Device Number Registrants
  2001-05-17 17:07                                         ` Eric W. Biederman
@ 2001-05-17 19:30                                           ` Jeff Randall
  0 siblings, 0 replies; 317+ messages in thread
From: Jeff Randall @ 2001-05-17 19:30 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Daniel Phillips, Nicolas Pitre, Linux Kernel Mailing List

Eric W. Biederman wrote:
> Daniel Phillips <phillips@bonn-fries.net> writes:
> > On Tuesday 15 May 2001 23:20, Nicolas Pitre wrote:
> > > Personally, I'd really like to see /dev/ttyS0 be the first detected
> > > serial port on a system, /dev/ttyS1 the second, etc.
> > 
> > There are well-defined rules for the first four on PC's.  The ttySx 
> > better match the labels the OEM put on the box.
> 
> Actually it would be better to have the OEM put a label in the
> firmware, and then have a way to query the device for its label.
> 
> The legacy rules are nice, but serial ports are done with superio chips
> now, and superio chips are almost all ISA PNP chips without device
> enumeration and isolation.

Not all serial ports are superio chips.  There's all kinds of serial
ports on all kinds of different busses being supported under Linux.  The
company I work for supports serial ports on ISA, PCI, SCSI, Ethernet, and
USB at the moment...


-- 
Jeff Randall - Jeff_Randall@digi.com  "A paranoid person is never alone,
                                       he knows he's always the center
                                       of attention..."


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 20:14                                 ` Alexander Viro
                                                     ` (2 preceding siblings ...)
  2001-05-15 20:57                                   ` LANANA: To Pending Device Number Registrants James Simmons
@ 2001-05-17 20:33                                   ` Kai Henningsen
  3 siblings, 0 replies; 317+ messages in thread
From: Kai Henningsen @ 2001-05-17 20:33 UTC (permalink / raw)
  To: linux-kernel

torvalds@transmeta.com (Linus Torvalds)  wrote on 15.05.01 in <Pine.LNX.4.21.0105151328160.2470-100000@penguin.transmeta.com>:

> They might also be exactly the same channel, except with certain magic
> bits set. The example peter gave was fine: tty devices could very usefully
> be opened with something like
>
> 	fd = open("/dev/tty00/nonblock,9600,n8", O_RDWR);
>
> where we actually open up exactly the same channel as if we opened up
> /dev/cua00, we just set the speed etc at the same time. Which makes things
> a hell of a lot more readable, AND they are again easily done from
> scripts. The above is exactly the kind of thing that UNIX has not done
> well, and some others have done better (let's face it, even _DOS_ did it
> better, for chrissake! Those callout devices and those ioctl's are a pain
> in the ass, for no really good reason).

Umm ... where to begin.

1. No, DOS didn't do it better - DOS devices were mostly a bad copy of  
Xenix devices.

2. DOS definitely didn't do it better for serial ports. Serial ports are  
the single most broken devices that DOS supports by default, so much so  
that literally *no* serious program that needed the serial ports used the  
built-in driver. Only toy programs did that. Because those drivers weren't  
anything but toys themselves.

I know this the hard way. I used serial ports under DOS for something like  
ten years.

MfG Kai


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 18:58                               ` Johannes Erdfelt
  2001-05-15 19:17                                 ` Linus Torvalds
@ 2001-05-17 20:40                                 ` Kai Henningsen
  2001-05-17 22:46                                   ` Johannes Erdfelt
  1 sibling, 1 reply; 317+ messages in thread
From: Kai Henningsen @ 2001-05-17 20:40 UTC (permalink / raw)
  To: linux-kernel

johannes@erdfelt.com (Johannes Erdfelt)  wrote on 15.05.01 in <20010515154325.Z5599@sventech.com>:

> I had always made the assumption that sockets were created because you
> couldn't easily map IPv4 semantics onto filesystems. It's unreasonable
> to have a file for every possible IP address/port you can communicate
> with.

Not at all. What is unreasonable is doing an "ls" on the directory in
question.

Big deal; make it mode d--x--x--x. Problem solved.

And I'm pretty certain stuff like that *has* been done - wasn't there a  
ftp file system where you could "ls /mountpoint/ftp.kernel.org/pub/linux"?

MfG Kai


* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 20:01                             ` Richard Gooch
                                                 ` (3 preceding siblings ...)
  2001-05-16 20:54                               ` Richard Gooch
@ 2001-05-17 21:06                               ` Kai Henningsen
  4 siblings, 0 replies; 317+ messages in thread
From: Kai Henningsen @ 2001-05-17 21:06 UTC (permalink / raw)
  To: linux-kernel

rgooch@ras.ucalgary.ca (Richard Gooch)  wrote on 16.05.01 in <200105162054.f4GKsaF10834@vindaloo.ras.ucalgary.ca>:

> H. Peter Anvin writes:
> > Richard Gooch wrote:
> > >
> > > H. Peter Anvin writes:
> > > > Richard Gooch wrote:
> > > > > Argh! What I wrote in text is what I meant to say. The code didn't
> > > > > match. No wonder people seemed to be missing the point. So the line
> > > > > of code I actually meant was:
> > > > >         if (strcmp (buffer + len - 3, "/cd") != 0) {
> > > >
> > > > This is still a really bad idea.  You don't want to tie this kind of
> > > > things to the name.
> > >
> > > Why do you think it's a bad idea?
> >
> > Because you are now, once again, tying two things that are
> > completely and utterly unrelated: device classification and device
> > name.  It breaks every time someone comes out with a new device
> > which is "kind of like an old device, but not really," like
> > CD-writers (which was kind-of-like WORM, kind-of-like CD-ROM) and
> > DVD (kind-of-like CD)...
>
> But all devices which export a CD-ROM interface will do so. So the
> device node that is associated with the CD-ROM driver will export
> CD-ROM semantics, and the trailing name will be "/cd".

Uh, how do they have the filename end in more than one device type suffix  
at the same time?

That was the point, remember. You're trying to find out about a device on  
the end of your file handle, and that device *does* match more than one of  
these.

MfG Kai


* Re: LANANA: To Pending Device Number Registrants
  2001-05-16 23:49                                           ` Richard Gooch
  2001-05-16 23:55                                             ` H. Peter Anvin
@ 2001-05-17 21:12                                             ` Kai Henningsen
  1 sibling, 0 replies; 317+ messages in thread
From: Kai Henningsen @ 2001-05-17 21:12 UTC (permalink / raw)
  To: linux-kernel

hpa@transmeta.com (H. Peter Anvin)  wrote on 16.05.01 in <3B03137A.80221C59@transmeta.com>:

> At some point something talks to the device -- in this case, it's the
> SCSI layer.  Follow the interfaces in the kernel and it becomes obvious.

rc = sys_iskind(int filehandle, const char *driverkind)

rc = 0 or Esomething

Think of it as a generalization of isatty(). Maybe

#define isatty(f) sys_iskind(f, "tty")

:-;
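
No such system call exists; the closest thing available from userland today is
fstat() plus isatty(). A minimal sketch of that approximation follows. The
function name fd_iskind() and the kind strings ("tty", "chr", "blk", "fifo")
are invented for this example, and a real sys_iskind() would ask the driver
rather than look at the file mode.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Userland stand-in for the proposed sys_iskind(); not a real system call.
 * Returns 0 if fd is of the named kind, otherwise a positive errno value.
 */
int fd_iskind(int fd, const char *kind)
{
    struct stat st;

    assert(kind != NULL);
    if (fstat(fd, &st) != 0)
        return errno;                           /* e.g. EBADF */

    if (strcmp(kind, "tty") == 0)
        return isatty(fd) ? 0 : ENOTTY;
    if (strcmp(kind, "chr") == 0)
        return S_ISCHR(st.st_mode) ? 0 : ENODEV;
    if (strcmp(kind, "blk") == 0)
        return S_ISBLK(st.st_mode) ? 0 : ENODEV;
    if (strcmp(kind, "fifo") == 0)
        return S_ISFIFO(st.st_mode) ? 0 : ENODEV;

    return EINVAL;                              /* unknown kind name */
}
```

Unlike a name suffix, a query of this shape lets one descriptor answer yes to
several kinds at once: a serial port is both "tty" and "chr".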

MfG Kai


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 18:15                                 ` Linus Torvalds
  2001-05-15 19:36                                   ` Jonathan Lundell
@ 2001-05-17 21:23                                   ` Kai Henningsen
  1 sibling, 0 replies; 317+ messages in thread
From: Kai Henningsen @ 2001-05-17 21:23 UTC (permalink / raw)
  To: linux-kernel

jlundell@pobox.com (Jonathan Lundell)  wrote on 15.05.01 in <p05100316b7272cdfd50c@[207.213.214.37]>:

> What about:
>
> 1 (network domain). I have two network interfaces that I connect to
> two different network segments, eth0 & eth1; they're ifconfig'd to
> the appropriate IP and MAC addresses. I really do need to know
> physically which (physical) hole to plug my eth0 cable into.

Sorry, the software doesn't know that. Never has, for that matter.

> (Extension: same situation, but it's a firewall and I've got 12 ports
> to connect.) (Extension #2: if I add a NIC to the system and reboot,
> I'd really prefer that the NICs already in use didn't get renumbered.)

Make your config script look at the hardware MAC addresses. Those don't  
change.
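
A minimal sketch of what such a config helper could do, written in C rather
than shell: parse the hardware address and look it up in a site-local table
that assigns each card a stable role. The table layout, the function names,
and the role strings are assumptions for the example; how the address is read
from the interface is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table entry: hardware MAC address -> stable logical role. */
struct mac_role {
    unsigned char mac[6];
    const char *role;
};

/* Parse "aa:bb:cc:dd:ee:ff" into six bytes; return 0 on success, -1 on error. */
int parse_mac(const char *text, unsigned char mac[6])
{
    unsigned int b[6];
    char extra;
    int i;

    assert(text != NULL);
    /* The trailing %c only matches if junk follows the sixth byte. */
    if (sscanf(text, "%2x:%2x:%2x:%2x:%2x:%2x%c",
               &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &extra) != 6)
        return -1;
    for (i = 0; i < 6; i++)
        mac[i] = (unsigned char)b[i];
    return 0;
}

/* Return the role configured for 'mac', or NULL if the card is unknown. */
const char *role_for_mac(const struct mac_role *table, size_t n,
                         const unsigned char mac[6])
{
    size_t i;

    for (i = 0; i < n; i++)
        if (memcmp(table[i].mac, mac, 6) == 0)
            return table[i].role;
    return NULL;
}
```

Adding a NIC then only adds a table entry; the roles of the existing cards do
not depend on probe order.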

> 2 (disk domain). I have multiple spindles on multiple SCSI adapters.
> I want to allocate them to more than one RAID0/1/5 set, with the
> usual considerations of putting mirrors on different adapters,
> spreading my RAID5 drives optimally, ditto stripes. I need (eg) SCSI
> paths to config all this, and I further need real physical locations
> to identify failed drives that need to be hot-replaced. The mirror
> members will move around as drives are replaced and hot spares come
> into play.

Use partition UUIDs, or SCSI serial numbers, or whatever. This works  
today.

> Seems like more than merely informational.

The *location*? Nope. Some unique id for the device, if available at all:  
sure.

> (A side observation: PCI or SCSI bus/device/lun/etc paths are not
> physical locations; you also need external hardware-specific
> knowledge to be able to talk about real physical locations in a way
> that does the system operator any good.)

And those you typically do not have.

MfG Kai


* Re: LANANA: To Pending Device Number Registrants
  2001-05-17 16:58                                               ` Tim Jansen
  2001-05-17 17:18                                                 ` James Simmons
@ 2001-05-17 22:03                                                 ` Oliver Neukum
  1 sibling, 0 replies; 317+ messages in thread
From: Oliver Neukum @ 2001-05-17 22:03 UTC (permalink / raw)
  To: Tim Jansen, t.sailer; +Cc: Linux Kernel Mailing List

On Thursday, 17. May 2001 18:58, Tim Jansen wrote:
> On Thursday 17 May 2001 08:43, Thomas Sailer wrote:
> > Cheap USB devices (and sometimes even expensive ones)
> > do not have serial numbers or other unique identifiers.
> > Therefore some sort of topology based addressing scheme
> > has to be used in that case.
>
> No, there is another addressing scheme that can be used for devices without
> serial number: the vendor and product ids. Most people do not have two
> devices of the same kind, so you often do not need the topology at all.

We need to handle the case, even if it is rare. Thus the code must be there. 
If it is there we may as well use it.

	Regards
		Oliver


* Re: LANANA: To Pending Device Number Registrants
  2001-05-17 20:40                                 ` Kai Henningsen
@ 2001-05-17 22:46                                   ` Johannes Erdfelt
  0 siblings, 0 replies; 317+ messages in thread
From: Johannes Erdfelt @ 2001-05-17 22:46 UTC (permalink / raw)
  To: Kai Henningsen; +Cc: linux-kernel

On Thu, May 17, 2001, Kai Henningsen <kaih@khms.westfalen.de> wrote:
> johannes@erdfelt.com (Johannes Erdfelt)  wrote on 15.05.01 in <20010515154325.Z5599@sventech.com>:
> 
> > I had always made the assumption that sockets were created because you
> > couldn't easily map IPv4 semantics onto filesystems. It's unreasonable
> > to have a file for every possible IP address/port you can communicate
> > with.
> 
> Not at all. What is unreasonable is doing an "ls" on the directory in
> question.
> 
> Big deal; make it mode d--x--x--x. Problem solved.
> 
> And I'm pretty certain stuff like that *has* been done - wasn't there a  
> ftp file system where you could "ls /mountpoint/ftp.kernel.org/pub/linux"?

I think this is the difference between reasonable and unreasonable.

I'm sure it could be done, but should it?

JE



* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 19:36                                   ` Jonathan Lundell
  2001-05-15 20:18                                     ` Linus Torvalds
@ 2001-05-18  2:18                                     ` Jonathan Lundell
  2001-05-19 17:36                                       ` Jonathan Lundell
  2001-05-19 17:45                                       ` Jonathan Lundell
  2001-05-19  8:42                                     ` Kai Henningsen
  2 siblings, 2 replies; 317+ messages in thread
From: Jonathan Lundell @ 2001-05-18  2:18 UTC (permalink / raw)
  To: Kai Henningsen, linux-kernel

At 11:23 PM +0200 2001-05-17, Kai Henningsen wrote:
>jlundell@pobox.com (Jonathan Lundell)  wrote on 15.05.01 in 
><p05100316b7272cdfd50c@[207.213.214.37]>:
>
>>  What about:
>>
>>  1 (network domain). I have two network interfaces that I connect to
>>  two different network segments, eth0 & eth1; they're ifconfig'd to
>>  the appropriate IP and MAC addresses. I really do need to know
>>  physically which (physical) hole to plug my eth0 cable into.
>
>Sorry, the software doesn't know that. Never has, for that matter.

Well, no, it doesn't. That's a problem. Jeff Garzik's ethtool 
extension at least tells me the PCI bus/dev/fcn, though, and from 
that I can write a userland mapping function to the physical 
location. My point, though, is that finding the socket is a real-life 
problem on systems with multiple interfaces. I don't expect the 
kernel to know the physical locations, but the user has to be able to 
get from kernel/ifconfig names (eth#) to sockets, one way or another. 
Support for a uniform means of doing the mapping, even if it needs 
userland help, would be good.
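
The userland mapping function I have in mind could be as small as the sketch
below. It assumes the address arrives as a string in the usual hex
bus:dev.fn notation and that the table of chassis labels is maintained by
hand per machine model; the struct, the function names, and the labels are
made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical site-specific entry: PCI address -> label on the chassis. */
struct pci_label {
    unsigned int bus, dev, fn;
    const char *label;
};

/* Parse "bus:dev.fn" in hex (e.g. "00:0b.0"); return 0 on success, -1 on error. */
int parse_pci_addr(const char *text, unsigned int *bus,
                   unsigned int *dev, unsigned int *fn)
{
    char extra;

    assert(text != NULL && bus != NULL && dev != NULL && fn != NULL);
    /* Exactly three fields; the trailing %c catches leftover characters. */
    if (sscanf(text, "%x:%x.%x%c", bus, dev, fn, &extra) != 3)
        return -1;
    /* PCI allows 256 buses, 32 devices per bus, 8 functions per device. */
    if (*bus > 0xff || *dev > 0x1f || *fn > 7)
        return -1;
    return 0;
}

/* Return the physical label for a PCI address, or NULL if it is not mapped. */
const char *label_for_pci(const struct pci_label *map, size_t n,
                          unsigned int bus, unsigned int dev, unsigned int fn)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (map[i].bus == bus && map[i].dev == dev && map[i].fn == fn)
            return map[i].label;
    return NULL;
}
```

The table is exactly the hardware-specific knowledge the kernel cannot have;
the code only makes the lookup uniform.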

>  > (Extension: same situation, but it's a firewall and I've got 12 ports
>>  to connect.) (Extension #2: if I add a NIC to the system and reboot,
>>  I'd really prefer that the NICs already in use didn't get renumbered.)
>
>Make your config script look at the hardware MAC addresses. Those don't
>change.

They're not necessarily unique, though.

>  > 2 (disk domain). I have multiple spindles on multiple SCSI adapters.
>>  I want to allocate them to more than one RAID0/1/5 set, with the
>>  usual considerations of putting mirrors on different adapters,
>>  spreading my RAID5 drives optimally, ditto stripes. I need (eg) SCSI
>>  paths to config all this, and I further need real physical locations
>>  to identify failed drives that need to be hot-replaced. The mirror
>>  members will move around as drives are replaced and hot spares come
>>  into play.
>
>Use partition UUIDs, or SCSI serial numbers, or whatever. This works
>today.

This pushes the problem back in time: I need to write the UUID, for 
example, at some point. And, with hot-swappable drives, I'm still 
interested in the physical location. I really don't know that there's 
a good answer to this problem, especially with FC, but I need to tell 
an operator, "replace this particular physical drive". It doesn't do 
any good to tell the operator the UUID.

>  > Seems like more than merely informational.
>
>The *location*? Nope. Some unique id for the device, if available at all:
>sure.

What good does it do to tell an operator to connect a cable to a MAC 
address? Or to remove a drive having a particular UUID? If it's "mere 
information", it's *necessary* mere information.

>  > (A side observation: PCI or SCSI bus/device/lun/etc paths are not
>>  physical locations; you also need external hardware-specific
>>  knowledge to be able to talk about real physical locations in a way
>>  that does the system operator any good.)
>
>And those you typically do not have.

But (ideally) should.

-- 
/Jonathan Lundell.


* Re: LANANA: To Pending Device Number Registrants
  2001-05-17 10:20                                     ` Pavel Machek
@ 2001-05-18 17:32                                       ` Johannes Erdfelt
  2001-05-19 10:21                                         ` Pavel Machek
  0 siblings, 1 reply; 317+ messages in thread
From: Johannes Erdfelt @ 2001-05-18 17:32 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Linux Kernel Mailing List

On Thu, May 17, 2001, Pavel Machek <pavel@suse.cz> wrote:
> > > But no, I don't actually like sockets all that much myself. They are hard
> > > to use from scripts, and many more people are familiar with open/close and
> > > read/write.
> > 
> > Agreed.
> > 
> > It would be nice to use open/close/read/write for control and bulk and
> > sockets for interrupt and isochronous.
> 
> What makes interrupt so different? Last time I checked int pipes were very
> similar to bulk pipes... Do you care about "packet boundaries"? You can
> somehow emulate with read, too...

We probably could. It would have interesting semantics however. We would
have to have an ioctl or something else to specify period, and if it's
one shot, etc.

We could probably shoehorn isochronous semantics onto read/write as
well, but I don't want to begin to think how ugly that'll be.

The reason I don't favor the read/write idea for interrupt and
isochronous is the fact that they are so different. We could shoehorn
the semantics onto it, but we'd just be moving the problem from one
place to somewhere else.

A completely ioctl-based solution would work better in that case since
it's cleaner. The only problem would be the fact it's called ioctl.

JE



* Re: LANANA: To Pending Device Number Registrants
  2001-05-17 10:42                                     ` Pavel Machek
@ 2001-05-18 18:32                                       ` James Simmons
  2001-05-19 10:23                                         ` no ioctls for serial ports? [was Re: LANANA: To Pending Device Number Registrants] Pavel Machek
  0 siblings, 1 reply; 317+ messages in thread
From: James Simmons @ 2001-05-18 18:32 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Linus Torvalds, Alexander Viro, Alan Cox, Neil Brown,
	Jeff Garzik, H. Peter Anvin, Linux Kernel Mailing List


> > They might also be exactly the same channel, except with certain magic
> > bits set. The example peter gave was fine: tty devices could very usefully
> > be opened with something like
> > 
> > 	fd = open("/dev/tty00/nonblock,9600,n8", O_RDWR);
> > 
> > where we actually open up exactly the same channel as if we opened up
> > /dev/cua00, we just set the speed etc at the same time. Which makes things
> 
> Hmm, there might be a problem with this. How do you change the speed without
> reopening the device? [Remember: your mouse knows when you close the device]

If you implement it as a filesystem you could have a settings file in the
tty filesystem. Something like this:

echo "115200" >  /dev/tty/settings
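
Whether the settings arrive through a file like that or as a trailing path
component such as "nonblock,9600,n8", something has to turn the text into
line parameters. A minimal sketch of such a parser, with an option grammar
that is only a guess at what the examples in this thread imply (the struct
and function names are made up):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result of parsing an option string like "nonblock,9600,n8". */
struct tty_opts {
    int nonblock;   /* 1 if "nonblock" was given */
    long speed;     /* baud rate, 0 if not given */
    char parity;    /* 'n', 'e' or 'o'; 0 if not given */
    int bits;       /* data bits 5..8; 0 if not given */
};

/* Parse a comma-separated option list; return 0 on success, -1 on error. */
int parse_tty_opts(const char *text, struct tty_opts *out)
{
    const char *p = text;

    assert(text != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    while (*p != '\0') {
        size_t len = strcspn(p, ",");
        char tok[16];
        char *end;
        long n;

        if (len == 0 || len >= sizeof(tok))
            return -1;                  /* empty or oversized field */
        memcpy(tok, p, len);
        tok[len] = '\0';
        p += len;
        if (*p == ',')
            p++;

        if (strcmp(tok, "nonblock") == 0) {
            out->nonblock = 1;
        } else if (len == 2 && strchr("neo", tok[0]) != NULL &&
                   tok[1] >= '5' && tok[1] <= '8') {
            out->parity = tok[0];       /* parity plus data bits, e.g. "n8" */
            out->bits = tok[1] - '0';
        } else {
            n = strtol(tok, &end, 10);  /* anything else must be a speed */
            if (*end != '\0' || n <= 0)
                return -1;
            out->speed = n;
        }
    }
    return 0;
}
```

This covers only the parsing; applying the result to an open line is a
separate step.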




* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 19:43                                   ` Johannes Erdfelt
                                                       ` (2 preceding siblings ...)
  2001-05-17 10:20                                     ` Pavel Machek
@ 2001-05-19  8:18                                     ` Kai Henningsen
  3 siblings, 0 replies; 317+ messages in thread
From: Kai Henningsen @ 2001-05-19  8:18 UTC (permalink / raw)
  To: linux-kernel

johannes@erdfelt.com (Johannes Erdfelt)  wrote on 17.05.01 in <20010517184636.L32405@sventech.com>:

> On Thu, May 17, 2001, Kai Henningsen <kaih@khms.westfalen.de> wrote:
> > johannes@erdfelt.com (Johannes Erdfelt)  wrote on 15.05.01 in
> > <20010515154325.Z5599@sventech.com>:
> >
> > > I had always made the assumption that sockets were created because you
> > > couldn't easily map IPv4 semantics onto filesystems. It's unreasonable
> > > to have a file for every possible IP address/port you can communicate
> > > with.
> >
> > Not at all. What is unreasonable is doing an "ls" on the directory in
> > question.
> >
> > Big deal; make it mode d--x--x--x. Problem solved.
> >
> > And I'm pretty certain stuff like that *has* been done - wasn't there a
> > ftp file system where you could "ls /mountpoint/ftp.kernel.org/pub/linux"?
>
> I think this is the difference between reasonable and unreasonable.

What's unreasonable about it?

> I'm sure it could be done, but should it?

Well, the author of the Midnight Commander seems to think it should.

MfG Kai


* Re: LANANA: To Pending Device Number Registrants
  2001-05-15 19:36                                   ` Jonathan Lundell
  2001-05-15 20:18                                     ` Linus Torvalds
  2001-05-18  2:18                                     ` Jonathan Lundell
@ 2001-05-19  8:42                                     ` Kai Henningsen
  2 siblings, 0 replies; 317+ messages in thread
From: Kai Henningsen @ 2001-05-19  8:42 UTC (permalink / raw)
  To: linux-kernel

jlundell@pobox.com (Jonathan Lundell)  wrote on 17.05.01 in <p05100301b72a335d4b61@[10.128.7.49]>:

> At 11:23 PM +0200 2001-05-17, Kai Henningsen wrote:
> >jlundell@pobox.com (Jonathan Lundell)  wrote on 15.05.01 in
> ><p05100316b7272cdfd50c@[207.213.214.37]>:
> >
> >>  What about:
> >>
> >>  1 (network domain). I have two network interfaces that I connect to
> >>  two different network segments, eth0 & eth1; they're ifconfig'd to
> >>  the appropriate IP and MAC addresses. I really do need to know
> >>  physically which (physical) hole to plug my eth0 cable into.
> >
> >Sorry, the software doesn't know that. Never has, for that matter.
>
> Well, no, it doesn't. That's a problem.

Maybe, but it's not a problem you can solve from the kernel.

> Jeff Garzik's ethtool
> extension at least tells me the PCI bus/dev/fcn, though, and from
> that I can write a userland mapping function to the physical
> location.

I don't see how PCI bus/dev/fcn lets you do that.

> My point, though, is that finding the socket is a real-life
> problem on systems with multiple interfaces. I don't expect the
> kernel to know the physical locations, but the user has to be able to
> get from kernel/ifconfig names (eth#) to sockets, one way or another.

Local documentation is just about the only way to do it.

And one way that'd work fairly well with at least PC network cards is  
putting a sticker with the MAC address on them where you can see it while  
looking for the right place to put your plug.

Not the only way, either.

> Support for a uniform means of doing the mapping, even if it needs
> userland help, would be good.

It doesn't need userland *or* kernel help.

> >  > (Extension: same situation, but it's a firewall and I've got 12 ports
> >>  to connect.) (Extension #2: if I add a NIC to the system and reboot,
> >>  I'd really prefer that the NICs already in use didn't get renumbered.)
> >
> >Make your config script look at the hardware MAC addresses. Those don't
> >change.
>
> They're not necessarily unique, though.

So if you plug both into the same network segment, that segment is broken?  
That looks like very stupid design to me.

It's not as if getting enough unique MAC addresses was particularly  
expensive. These days, even el-cheapo PC network cards get that right.  
(And have for quite a number of years.)

> >  > 2 (disk domain). I have multiple spindles on multiple SCSI adapters.
> >>  I want to allocate them to more than one RAID0/1/5 set, with the
> >>  usual considerations of putting mirrors on different adapters,
> >>  spreading my RAID5 drives optimally, ditto stripes. I need (eg) SCSI
> >>  paths to config all this, and I further need real physical locations
> >>  to identify failed drives that need to be hot-replaced. The mirror
> >>  members will move around as drives are replaced and hot spares come
> >>  into play.
> >
> >Use partition UUIDs, or SCSI serial numbers, or whatever. This works
> >today.
>
> This pushes the problem back in time: I need to write the UUID, for

But not the SCSI serial number.

> example, at some point. And, with hot-swappable drives, I'm still
> interested in the physical location. I really don't know that there's
> a good answer to this problem, especially with FC, but I need to tell
> an operator, "replace this particular physical drive". It doesn't do
> any good to tell the operator the UUID.

Well, if it's a small system, any enumeration plus id-page query will let  
you identify *a* name for the device. There's no need for that name to be  
stable. (The only stable names you need are for mount and friends, and  
those can easily use UUIDs.)

In a big system, where presumably you use lots of similar drives, those  
better have some sort of serial number (which you can, of course, get at  
the same way as above). In that case, part of the preparation of a hot  
swap drive would be to put the serial number on a sticker on the drive (or  
put some other id there and note the correspondence in some database).

And, of course, your software can note which UUID goes with which serial  
number.

If your drives have *no* serial number, you can try a software one ... or  
follow the old advice: don't do that, then. Don't use unidentifiable  
drives in many-similar-drive production systems.

> >  > Seems like more than merely informational.
> >
> >The *location*? Nope. Some unique id for the device, if available at all:
> >sure.
>
> What good does it do to tell an operator to connect a cable to a MAC
> address? Or to remove a drive having a particular UUID? If it's "mere
> information", it's *necessary* mere information.

See above for how that works. As in, actually works in practice. As in, I  
really shouldn't have to explain this.

MfG Kai


* Re: LANANA: To Pending Device Number Registrants
  2001-05-18 17:32                                       ` Johannes Erdfelt
@ 2001-05-19 10:21                                         ` Pavel Machek
  0 siblings, 0 replies; 317+ messages in thread
From: Pavel Machek @ 2001-05-19 10:21 UTC (permalink / raw)
  To: Johannes Erdfelt; +Cc: Linux Kernel Mailing List

Hi!

> > > > But no, I don't actually like sockets all that much myself. They are hard
> > > > to use from scripts, and many more people are familiar with open/close and
> > > > read/write.
> > > 
> > > Agreed.
> > > 
> > > It would be nice to use open/close/read/write for control and bulk and
> > > sockets for interrupt and isochronous.
> > 
> > What makes interrupt so different? Last time I checked int pipes were very
> > similar to bulk pipes... Do you care about "packet boundaries"? You can
> > somehow emulate with read, too...
> 
> We probably could. It would have interesting semantics however. We would
> have to have an ioctl or something else to specify period, and if it's
> one shot, etc.

ioctl for specifying period seems okay to me, and I believe UDP
sockets already have very similar semantics for read/write.

> We could probably shoehorn isochronous semantics onto read/write as
> well, but I don't want to begin to think how ugly that'll be.

What's the problem?

> A completely ioctl solution would work better in that case since it's
> cleaner. The only problem would be the fact it's called ioctl.

I do not think it is cleaner. Could AF_USB be used to get a "clean"
solution?
								Pavel
-- 
The best software in life is free (not shareware)!		Pavel
GCM d? s-: !g p?:+ au- a--@ w+ v- C++@ UL+++ L++ N++ E++ W--- M- Y- R+

^ permalink raw reply	[flat|nested] 317+ messages in thread

* no ioctls for serial ports? [was Re: LANANA: To Pending Device Number Registrants]
  2001-05-18 18:32                                       ` James Simmons
@ 2001-05-19 10:23                                         ` Pavel Machek
  2001-05-19 19:00                                           ` Linus Torvalds
  0 siblings, 1 reply; 317+ messages in thread
From: Pavel Machek @ 2001-05-19 10:23 UTC (permalink / raw)
  To: James Simmons
  Cc: Pavel Machek, Linus Torvalds, Alexander Viro, Alan Cox,
	Neil Brown, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List

Hi!

> > > They might also be exactly the same channel, except with certain magic
> > > bits set. The example peter gave was fine: tty devices could very usefully
> > > be opened with something like
> > > 
> > > 	fd = open("/dev/tty00/nonblock,9600,n8", O_RDWR);
> > > 
> > > where we actually open up exactly the same channel as if we opened up
> > > /dev/cua00, we just set the speed etc at the same time. Which makes things
> > 
> > Hmm, there might be problem with this. How do you change speed without
> > reopening device? [Remember: your mice knows when you close device]
> 
> If you implement it as a filesystem you could have a settings file in the
> tty filesystem. Something like this:
> 
> echo "115200" >  /dev/tty/settings

You can currently do 

stty 115200

or 

stty 19200

when your stdin is a serial port. If it is a filesystem, you'll have a hard
time finding *which* of the serial ports it is, followed by opening it.
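
For reference, what stty does here needs nothing but the descriptor. A minimal sketch with the standard termios calls follows; the function name set_speed is only an illustration, not part of any existing API:

```c
#include <termios.h>

/* Set input and output speed on an already-open terminal descriptor.
 * No path name is needed, only the fd. Returns 0 on success, -1 on
 * failure (for example when fd is not a terminal). */
int set_speed(int fd, speed_t speed)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) == -1)
        return -1;
    if (cfsetispeed(&tio, speed) == -1 || cfsetospeed(&tio, speed) == -1)
        return -1;
    return tcsetattr(fd, TCSANOW, &tio);
}
```

Calling set_speed(0, B19200) corresponds roughly to "stty 19200" and keeps working even after the device node has been unlinked.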

What about this?

bash < /dev/ttyS0 &
rm -r /dev/ttyS0
how does bash change speed of serial line, then?
								Pavel
-- 
The best software in life is free (not shareware)!		Pavel
GCM d? s-: !g p?:+ au- a--@ w+ v- C++@ UL+++ L++ N++ E++ W--- M- Y- R+

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-18  2:18                                     ` Jonathan Lundell
@ 2001-05-19 17:36                                       ` Jonathan Lundell
  2001-05-20  9:37                                         ` Eric W. Biederman
                                                           ` (3 more replies)
  2001-05-19 17:45                                       ` Jonathan Lundell
  1 sibling, 4 replies; 317+ messages in thread
From: Jonathan Lundell @ 2001-05-19 17:36 UTC (permalink / raw)
  To: Kai Henningsen, linux-kernel

At 10:42 AM +0200 2001-05-19, Kai Henningsen wrote:
>  > Jeff Garzik's ethtool
>  > extension at least tells me the PCI bus/dev/fcn, though, and from
>>  that I can write a userland mapping function to the physical
>>  location.
>
>I don't see how PCI bus/dev/fcn lets you do that.

I know from system documentation, or can figure out once and for all 
by experimentation, the correspondence between PCI bus/dev/fcn and 
physical locations. Jeff's extension gives me the mapping between 
eth# and PCI bus/dev/fcn, which is not otherwise available (outside 
the kernel).
-- 
/Jonathan Lundell.

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-18  2:18                                     ` Jonathan Lundell
  2001-05-19 17:36                                       ` Jonathan Lundell
@ 2001-05-19 17:45                                       ` Jonathan Lundell
  1 sibling, 0 replies; 317+ messages in thread
From: Jonathan Lundell @ 2001-05-19 17:45 UTC (permalink / raw)
  To: Kai Henningsen, linux-kernel

At 10:42 AM +0200 2001-05-19, Kai Henningsen wrote:
>  > >Make your config script look at the hardware MAC addresses. Those don't
>>  >change.
>>
>>  They're not necessarily unique, though.
>
>So if you plug both into the same network segment, that segment is broken? 
>That looks like very stupid design to me.
>
>It's not as if getting enough unique MAC addresses was particularly 
>expensive. These days, even el-cheapo PC network cards get that right. 
>(And have for quite a number of years.)

Many do, some don't. Moreover, the MAC address is volatile in that it 
can be changed at will (via, eg, ifconfig).

I assume that the reason that Sun (for example) defaults to all MAC 
addresses on a system being the same is that it doesn't make sense, 
ordinarily, to plug two Ethernet interfaces into the same network 
segment. If, for some reason, you really want to do that, there's 
ifconfig ready to reassign the MAC address.

If I plug both into the same network segments by accident (because I 
can't tell which is which, say), then my configuration is nearly as 
broken with different MAC addresses as with identical ones; the fix 
is to replug correctly, not to change MAC addresses.
-- 
/Jonathan Lundell.

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: no ioctls for serial ports? [was Re: LANANA: To Pending Device Number Registrants]
  2001-05-19 10:23                                         ` no ioctls for serial ports? [was Re: LANANA: To Pending Device Number Registrants] Pavel Machek
@ 2001-05-19 19:00                                           ` Linus Torvalds
  2001-05-19 19:17                                             ` Pavel Machek
  2001-05-19 20:11                                             ` no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNumber Registrants] Abramo Bagnara
  0 siblings, 2 replies; 317+ messages in thread
From: Linus Torvalds @ 2001-05-19 19:00 UTC (permalink / raw)
  To: Pavel Machek
  Cc: James Simmons, Alexander Viro, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List


[ Attribution is gone, so I just deleted it.. ]

> > > > 	fd = open("/dev/tty00/nonblock,9600,n8", O_RDWR);
> > >
> > > Hmm, there might be problem with this. How do you change speed without
> > > reopening device? [Remember: your mice knows when you close device]

The naming scheme is not a replacement for these kinds of ioctl's - it's
just a way to make them less critical, and get people thinking in other
directions so that we don't get _more_ ioctl's.

Remember, the serial lines we already have legacy support for, that's not
going away. The termios-based stuff isn't Linux-only, and we'll
obviously maintain it for the foreseeable future.

But if we can use naming to avoid ioctl's in the future, then THAT is
good. I'm in particular thinking about frame-buffer and similar things,
where we might be able to avoid making the situation worse.

And remember where this discussion started: not ioctl's, but device
numbers. The _biggest_ advantage of naming may be to get rid of the need
for extra major and minor numbers, and cleaning up /dev in the process.

		Linus


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: no ioctls for serial ports? [was Re: LANANA: To Pending Device Number Registrants]
  2001-05-19 19:00                                           ` Linus Torvalds
@ 2001-05-19 19:17                                             ` Pavel Machek
  2001-05-19 19:35                                               ` Linus Torvalds
                                                                 ` (2 more replies)
  2001-05-19 20:11                                             ` no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNumber Registrants] Abramo Bagnara
  1 sibling, 3 replies; 317+ messages in thread
From: Pavel Machek @ 2001-05-19 19:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: James Simmons, Alexander Viro, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List

Hi!

> > > > > 	fd = open("/dev/tty00/nonblock,9600,n8", O_RDWR);
> > > >
> > > > Hmm, there might be problem with this. How do you change speed without
> > > > reopening device? [Remember: your mice knows when you close device]
> 
> The naming scheme is not a replacement for these kinds of ioctl's - it's
> just a way to make them less critical, and get people thinking in other
> directions so that we don't get _more_ ioctl's.
> 
> Remember, the serial lines we already have legacy support for, that's not
> going away. The termios-based stuff isn't Linux-only, and we'll
> obviously maintain it for the foreseeable future.

Well, if we did something like modify(int fd, char *how), you could do

modify(0, "nonblock,9600") 

which looks slightly better than special ioctl. You could even hack
libc to emulate ioctl with modify.
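
To make the idea concrete, here is a sketch of the opposite direction: a hypothetical modify() built in userland on top of the existing fcntl and termios calls. Both modify() and the helper parse_baud() are illustrative names, and only the settings "nonblock" and a few baud rates are recognized:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <termios.h>

/* Map a decimal baud string to a termios speed constant.
 * Returns 0 on success, -1 if the rate is not handled here. */
static int parse_baud(const char *word, speed_t *out)
{
    if (strcmp(word, "9600") == 0)  { *out = B9600;  return 0; }
    if (strcmp(word, "19200") == 0) { *out = B19200; return 0; }
    if (strcmp(word, "38400") == 0) { *out = B38400; return 0; }
    return -1;
}

/* Hypothetical modify(): apply comma-separated settings to fd.
 * Stops at the first setting that fails and returns -1. */
int modify(int fd, const char *how)
{
    char *copy = strdup(how);
    char *save = NULL;
    int rc = 0;

    if (copy == NULL)
        return -1;
    for (char *w = strtok_r(copy, ",", &save); w != NULL && rc == 0;
         w = strtok_r(NULL, ",", &save)) {
        speed_t speed;

        if (strcmp(w, "nonblock") == 0) {
            int flags = fcntl(fd, F_GETFL);
            if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
                rc = -1;
        } else if (parse_baud(w, &speed) == 0) {
            struct termios tio;
            if (tcgetattr(fd, &tio) == -1 ||
                cfsetispeed(&tio, speed) == -1 ||
                cfsetospeed(&tio, speed) == -1 ||
                tcsetattr(fd, TCSANOW, &tio) == -1)
                rc = -1;
        } else {
            rc = -1;   /* unknown setting */
        }
    }
    free(copy);
    return rc;
}
```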

I thought about how to do networking without sockets, and it seems to
me like this kind of modify syscall is needed, because network sockets
connect to *two* different places (one local address and one
remote). Sockets are really nasty :-(.

> But if we can use naming to avoid ioctl's in the future, then THAT is
> good. I'm in particular thinking about frame-buffer and similar things,
> where we might be able to avoid making the situation worse.

Yup. OTOH making the "new" system so powerful that it could lead to
ioctls emulated in libc would be very nice, too.
								Pavel
-- 
The best software in life is free (not shareware)!		Pavel
GCM d? s-: !g p?:+ au- a--@ w+ v- C++@ UL+++ L++ N++ E++ W--- M- Y- R+

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: no ioctls for serial ports? [was Re: LANANA: To Pending Device Number Registrants]
  2001-05-19 19:17                                             ` Pavel Machek
@ 2001-05-19 19:35                                               ` Linus Torvalds
  2001-05-19 19:43                                                 ` Pavel Machek
  2001-05-19 23:57                                                 ` Alexander Viro
  2001-05-20  0:01                                               ` no ioctls for serial ports? [was Re: LANANA: To Pending Device Number Registrants] Alexander Viro
  2001-05-20  9:53                                               ` no ioctls for serial ports? [was Re: LANANA: To Pending Device Num Kai Henningsen
  2 siblings, 2 replies; 317+ messages in thread
From: Linus Torvalds @ 2001-05-19 19:35 UTC (permalink / raw)
  To: Pavel Machek
  Cc: James Simmons, Alexander Viro, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List


On Sat, 19 May 2001, Pavel Machek wrote:
> 
> Well, if we did something like modify(int fd, char *how), you could do
> 
> modify(0, "nonblock,9600") 

What you're really proposing is to make ioctl's be ASCII strings.

Which is not necessarily a bad idea, and I think plan9 did something
similar (or rather, if I remember correctly, plan9 has control streams
that were ASCII. Or am I confused?).

> I thought about how to do networking without sockets, and it seems to
> me like this kind of modify syscall is needed, because network sockets
> connect to *two* different places (one local address and one
> remote). Sockets are really nasty :-(.

One of the horrors of ioctl's is indeed that they are not very
well-defined, and as such cannot be transported over a network without
knowing more about them. Structuring them some way would already be very
useful. The _IOC() macros do this partially, of course, but because it is
a voluntary thing it is not actually followed all that well in general,
and most ioctl names are just random numbers that don't tell the structure
of the arguments or return values.
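
As an illustration of what the _IOC() encoding carries when it is used, the sketch below decodes a command number with the macros from the Linux <linux/ioctl.h> header. struct demo_args, DEMO_GET and describe_ioctl() are invented for this example and do not correspond to any real driver:

```c
#include <linux/ioctl.h>
#include <stdio.h>

/* Hypothetical argument structure and command number. */
struct demo_args {
    int speed;
    int flags;
};
#define DEMO_GET _IOR('d', 1, struct demo_args)

/* Format the fields packed into an _IOC()-style command number:
 * the driver "type" letter, the command index, the argument size,
 * and whether data flows to the caller (read) or from it (write). */
int describe_ioctl(unsigned int cmd, char *buf, size_t len)
{
    unsigned int dir = _IOC_DIR(cmd);

    return snprintf(buf, len, "type='%c' nr=%u size=%u%s%s",
                    (char)_IOC_TYPE(cmd), (unsigned)_IOC_NR(cmd),
                    (unsigned)_IOC_SIZE(cmd),
                    (dir & _IOC_READ) ? " read" : "",
                    (dir & _IOC_WRITE) ? " write" : "");
}
```

A number that was not built with these macros decodes to meaningless fields, which is exactly the "random numbers" problem.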

And a "stream of bytes" is in a very real sense the simplest structure,
and is the unix way (and the plan9 way is to avoid binary streams, and use
ASCII text instead when possible, which probably also makes sense).

However, you can't really use a string. It would really have to be two
memory regions: incoming and outgoing, with an ASCII representation being
the _preferred_ method for stuff that isn't obviously structured or
performance-critical.

Let's not take this too far, though.

		Linus


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: no ioctls for serial ports? [was Re: LANANA: To Pending Device Number Registrants]
  2001-05-19 19:35                                               ` Linus Torvalds
@ 2001-05-19 19:43                                                 ` Pavel Machek
  2001-05-19 20:31                                                   ` Tim Jansen
  2001-05-19 23:57                                                 ` Alexander Viro
  1 sibling, 1 reply; 317+ messages in thread
From: Pavel Machek @ 2001-05-19 19:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: James Simmons, Alexander Viro, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List

Hi!

> > Well, if we did something like modify(int fd, char *how), you could do
> > 
> > modify(0, "nonblock,9600") 
> 
> What you're really proposing is to make ioctl's be ASCII strings.

Yup.

> Which is not necessarily a bad idea, and I think plan9 did something
> similar (or rather, if I remember correctly, plan9 has control streams
> that were ASCII. Or am I confused?).

I think that plan9 uses something different -- they have ttyS0 and
ttyS0ctl. This would leave us with problem "how do I get handle to
ttyS0ctl when I only have handle to ttyS0"?

...

> However, you can't really use a string. It would really have to be two
> memory regions: incoming and outgoing, with an ASCII representation being
> the _preferred_ method for stuff that isn't obviously structured or
> performance-critical.

What are the cases where it is useful to pass data back from the kernel?
...aha, serial controls include possibility to read stuff, right?

								Pavel
-- 
The best software in life is free (not shareware)!		Pavel
GCM d? s-: !g p?:+ au- a--@ w+ v- C++@ UL+++ L++ N++ E++ W--- M- Y- R+

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNumber  Registrants]
  2001-05-19 19:00                                           ` Linus Torvalds
  2001-05-19 19:17                                             ` Pavel Machek
@ 2001-05-19 20:11                                             ` Abramo Bagnara
  1 sibling, 0 replies; 317+ messages in thread
From: Abramo Bagnara @ 2001-05-19 20:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Pavel Machek, James Simmons, Alexander Viro, Alan Cox,
	Neil Brown, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List

Linus Torvalds wrote:
> 
> [ Attribution is gone, so I just deleted it.. ]
> 
> > > > >         fd = open("/dev/tty00/nonblock,9600,n8", O_RDWR);
> > > >
> > > > Hmm, there might be problem with this. How do you change speed without
> > > > reopening device? [Remember: your mice knows when you close device]
> 
> The naming scheme is not a replacement for these kinds of ioctl's - it's
> just a way to make them less critical, and get people thinking in other
> directions so that we don't get _more_ ioctl's.

However
	fchdir(fd);
	s = open("speed", O_WRONLY);
	write(s, "19200\n", 6);

would be enough to solve the problem Pavel is pointing out, also without the
need to use ioctl.


-- 
Abramo Bagnara                       mailto:abramo@alsa-project.org

Opera Unica                          Phone: +39.546.656023
Via Emilia Interna, 140
48014 Castel Bolognese (RA) - Italy

ALSA project               http://www.alsa-project.org
It sounds good!

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: no ioctls for serial ports? [was Re: LANANA: To Pending Device Number Registrants]
  2001-05-19 19:43                                                 ` Pavel Machek
@ 2001-05-19 20:31                                                   ` Tim Jansen
  0 siblings, 0 replies; 317+ messages in thread
From: Tim Jansen @ 2001-05-19 20:31 UTC (permalink / raw)
  To: Linux Kernel Mailing List

On Saturday 19 May 2001 21:43, Pavel Machek wrote:
> I think that plan9 uses something different -- they have ttyS0 and
> ttyS0ctl. This would leave us with problem "how do I get handle to
> ttyS0ctl when I only have handle to ttyS0"?

One possibility is to add multiforked (multi-stream) file support to Linux, 
then you could have a control stream. 


bye...

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: no ioctls for serial ports? [was Re: LANANA: To Pending Device Number Registrants]
  2001-05-19 19:35                                               ` Linus Torvalds
  2001-05-19 19:43                                                 ` Pavel Machek
@ 2001-05-19 23:57                                                 ` Alexander Viro
  2001-05-20  7:18                                                   ` no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNumber Registrants] Abramo Bagnara
  1 sibling, 1 reply; 317+ messages in thread
From: Alexander Viro @ 2001-05-19 23:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Pavel Machek, James Simmons, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List



On Sat, 19 May 2001, Linus Torvalds wrote:

> 
> On Sat, 19 May 2001, Pavel Machek wrote:
> > 
> > Well, if we did something like modify(int fd, char *how), you could do
> > 
> > modify(0, "nonblock,9600") 
> 
> What you're really proposing is to make ioctl's be ASCII strings.
> 
> Which is not necessarily a bad idea, and I think plan9 did something
> similar (or rather, if I remember correctly, plan9 has control streams
> that were ASCII. Or am I confused?).

You are not. Control streams in question look like normal files. Normally
driver exports a tree with several data files (e.g. fd0, fd1, fd2, fd3)
and several control files (e.g. fd0ctl, fd1ctl, fd2ctl, fd3ctl). write()
to the latter passes commands. No extra syscalls needed.

Notice that sometimes it's not ASCII - depends on the nature of stuff you
are passing. Things like setting font, etc. need to pass bitmaps, so some
parts of the stuff you write end up as binary. Which is perfectly sane.

> And a "stream of bytes" is in a very real sense the simplest structure,
> and is the unix way (and the plan9 way is to avoid binary streams, and use
> ASCII text instead when possible, which probably also makes sense).

s/possible/makes sense/. For commands ASCII is OK, but for cases when you
pass binary data as a part of command (not just "something large", but
something that really happens to be a bitmap, etc.) you write it as binary
data.


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: no ioctls for serial ports? [was Re: LANANA: To Pending Device Number Registrants]
  2001-05-19 19:17                                             ` Pavel Machek
  2001-05-19 19:35                                               ` Linus Torvalds
@ 2001-05-20  0:01                                               ` Alexander Viro
  2001-05-20 11:17                                                 ` handling network using filesystem [was Re: no ioctls for serial ports?] Pavel Machek
  2001-05-20  9:53                                               ` no ioctls for serial ports? [was Re: LANANA: To Pending Device Num Kai Henningsen
  2 siblings, 1 reply; 317+ messages in thread
From: Alexander Viro @ 2001-05-20  0:01 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Linus Torvalds, James Simmons, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List



On Sat, 19 May 2001, Pavel Machek wrote:

> I thought about how to do networking without sockets, and it seems to
> me like this kind of modify syscall is needed, because network sockets
> connect to *two* different places (one local address and one
> remote). Sockets are really nasty :-(.

Pavel, take a look at http://plan9.bell-labs.com/sys/man/3/ip


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNumber  Registrants]
  2001-05-19 23:57                                                 ` Alexander Viro
@ 2001-05-20  7:18                                                   ` Abramo Bagnara
  2001-05-20  7:41                                                     ` Alexander Viro
  0 siblings, 1 reply; 317+ messages in thread
From: Abramo Bagnara @ 2001-05-20  7:18 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Linus Torvalds, Pavel Machek, James Simmons, Alan Cox,
	Neil Brown, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List

Alexander Viro wrote:
> 
> On Sat, 19 May 2001, Linus Torvalds wrote:
> 
> >
> > What you're really proposing is to make ioctl's be ASCII strings.
> >
> > Which is not necessarily a bad idea, and I think plan9 did something
> > similar (or rather, if I remember correctly, plan9 has control streams
> > that were ASCII. Or am I confused?).
> 
> You are not. Control streams in question look like normal files. Normally
> driver exports a tree with several data files (e.g. fd0, fd1, fd2, fd3)
> and several control files (e.g. fd0ctl, fd1ctl, fd2ctl, fd3ctl). write()
> to the latter passes commands. No extra syscalls needed.

I've just had a "so simple to risk to be stupid" idea.

To have /proc/self/fd/N/ioctl would not have the potential to suppress
ioctl needs for *all* current uses?

-- 
Abramo Bagnara                       mailto:abramo@alsa-project.org

Opera Unica                          Phone: +39.546.656023
Via Emilia Interna, 140
48014 Castel Bolognese (RA) - Italy

ALSA project               http://www.alsa-project.org
It sounds good!

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNumber Registrants]
  2001-05-20  7:18                                                   ` no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNumber Registrants] Abramo Bagnara
@ 2001-05-20  7:41                                                     ` Alexander Viro
  2001-05-20  8:30                                                       ` no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNumberRegistrants] Abramo Bagnara
  0 siblings, 1 reply; 317+ messages in thread
From: Alexander Viro @ 2001-05-20  7:41 UTC (permalink / raw)
  To: Abramo Bagnara
  Cc: Linus Torvalds, Pavel Machek, James Simmons, Alan Cox,
	Neil Brown, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List



On Sun, 20 May 2001, Abramo Bagnara wrote:

> I've just had a "so simple to risk to be stupid" idea.
> 
> To have /proc/self/fd/N/ioctl would not have the potential to suppress
> ioctl needs for *all* current uses?

No, it wouldn't. For one thing, it messes the only half-decent part of
procfs. For another, the real issue is how to eliminate the bogus
ioctls from userland programs and what to replace them with.

Crappy API won't become better if you simply change the calling conventions.
And problem with ioctls is that most of them are crappy APIs. Coming from
authors' laziness and/or debility.

So there is no easy way to solve that stuff - we'll need to rethink tons
of badly designed interfaces. Finding a way to represent them in fs is
the least of the problems.

And we really need to rethink them. Repackaged shit remains shit and the
whole point of exercise is to get rid of it, not to shove it into a new
pile.


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: no ioctls for serial ports? [was Re: LANANA: To Pending  DeviceNumberRegistrants]
  2001-05-20  7:41                                                     ` Alexander Viro
@ 2001-05-20  8:30                                                       ` Abramo Bagnara
  2001-05-20 10:09                                                         ` Alexander Viro
  0 siblings, 1 reply; 317+ messages in thread
From: Abramo Bagnara @ 2001-05-20  8:30 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Linus Torvalds, Pavel Machek, James Simmons, Alan Cox,
	Neil Brown, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List

Alexander Viro wrote:
> 
> On Sun, 20 May 2001, Abramo Bagnara wrote:
> 
> > I've just had a "so simple to risk to be stupid" idea.
> >
> > To have /proc/self/fd/N/ioctl would not have the potential to suppress
> > ioctl needs for *all* current uses?
> 
> No, it wouldn't. For one thing, it messes the only half-decent part of
> procfs. For another, the real issue is how to eliminate the bogus
> ioctls from userland programs and what to replace them with.

Linus wrote:

> The problem with ioctl is that not only are people passing ioctl's
> pointers to structures, but:
>  - they're not telling how big the structure is
>  - the structure can have pointers to other places
>  - sometimes it modifies the structure passed in

> None of which are "network-nice". Basically, ioctl() is historically used
> as a "pass any crap into driver xxxx, and the driver - and ONLY the driver
> - will know what to do with it".

> And when _only_ a driver knows what the arguments mean, upper layers can't
> encapsulate them. Upper layers cannot make a packet of the argument and
> send it over the network to another machine. Upper layers cannot do
> sanity-checking on things like "is this argument a valid pointer". Which
> means, for example, that not only can you not send the ioctl arguments
> anywhere, but ioctl's have also historically been a hot-bed of bugs.

Suppose now that we have a convention that control streams are in the form:
"ACTION ARGUMENTS"

Then we have
echo "speed 19200" > /proc/self/fd/0/ioctl
instead of
stty 19200
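
On the receiving side, splitting such a line is simple. A minimal sketch, assuming the action is the first whitespace-delimited word and the rest of the line is the argument string (parse_ctl is an illustrative name):

```c
#include <stddef.h>
#include <string.h>

/* Split "ACTION ARGUMENTS" into its two parts.
 * Returns 0 on success, -1 if the line has no action or a
 * destination buffer is too small. */
int parse_ctl(const char *line, char *action, size_t alen,
              char *args, size_t rlen)
{
    size_t n = strcspn(line, " \t\n");   /* length of ACTION */
    const char *rest = line + n;
    size_t m;

    if (n == 0 || n >= alen)
        return -1;
    memcpy(action, line, n);
    action[n] = '\0';

    rest += strspn(rest, " \t");         /* skip the separator */
    m = strcspn(rest, "\n");             /* drop a trailing newline */
    if (m >= rlen)
        return -1;
    memcpy(args, rest, m);
    args[m] = '\0';
    return 0;
}
```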

It seems to me something different from a pile of shit ;-)

And it may also work via NFS (with some changes).

> Crappy API won't become better if you simply change the calling conventions.
> And problem with ioctls is that most of them are crappy APIs. Coming from
> authors' laziness and/or debility.
> 
> So there is no easy way to solve that stuff - we'll need to rethink tons
> of badly designed interfaces.

This is orthogonal wrt ioctl problems pointed by Linus.

I've simply proposed an *infrastructure* for better interfaces.

-- 
Abramo Bagnara                       mailto:abramo@alsa-project.org

Opera Unica                          Phone: +39.546.656023
Via Emilia Interna, 140
48014 Castel Bolognese (RA) - Italy

ALSA project               http://www.alsa-project.org
It sounds good!

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: LANANA: To Pending Device Number Registrants
  2001-05-19 17:36                                       ` Jonathan Lundell
@ 2001-05-20  9:37                                         ` Eric W. Biederman
  2001-05-20 14:16                                         ` Chris Wedgwood
                                                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 317+ messages in thread
From: Eric W. Biederman @ 2001-05-20  9:37 UTC (permalink / raw)
  To: Jonathan Lundell; +Cc: Kai Henningsen, linux-kernel

Jonathan Lundell <jlundell@pobox.com> writes:

> At 10:42 AM +0200 2001-05-19, Kai Henningsen wrote:
> >  > Jeff Garzik's ethtool
> >  > extension at least tells me the PCI bus/dev/fcn, though, and from
> >>  that I can write a userland mapping function to the physical
> >>  location.
> >
> >I don't see how PCI bus/dev/fcn lets you do that.
> 
> I know from system documentation, or can figure out once and for all 
> by experimentation, the correspondence between PCI bus/dev/fcn and 
> physical locations. Jeff's extension gives me the mapping between 
> eth# and PCI bus/dev/fcn, which is not otherwise available (outside 
> the kernel).

Just a second let me reenumerate your pci busses, and change all of the bus
numbers.  Not that this is a bad thought.  It is just you need to know
the tree of PCI busses/bridges up to the root on the machine in question.

Eric

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: no ioctls for serial ports? [was Re: LANANA: To Pending Device Num
  2001-05-19 19:17                                             ` Pavel Machek
  2001-05-19 19:35                                               ` Linus Torvalds
  2001-05-20  0:01                                               ` no ioctls for serial ports? [was Re: LANANA: To Pending Device Number Registrants] Alexander Viro
@ 2001-05-20  9:53                                               ` Kai Henningsen
  2001-05-20 13:40                                                 ` Alexander Viro
  2 siblings, 1 reply; 317+ messages in thread
From: Kai Henningsen @ 2001-05-20  9:53 UTC (permalink / raw)
  To: linux-kernel

pavel@suse.cz (Pavel Machek)  wrote on 19.05.01 in <20010519214321.A9550@atrey.karlin.mff.cuni.cz>:

> I think that plan9 uses something different -- they have ttyS0 and
> ttyS0ctl. This would leave us with problem "how do I get handle to
> ttyS0ctl when I only have handle to ttyS0"?

I've seen this question several times in this thread. I haven't seen the  
obvious answer, though.

Have a new system call:

ctlfd = open_device_control_fd(fd);

If fd is something that doesn't have a control interface (say, it already  
is a control filehandle), this returns an appropriate error code.

This has another benefit, in that you can get control descriptors for  
stuff that doesn't currently have a filename (but does have ioctls), such  
as network sockets.

MfG Kai

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: no ioctls for serial ports? [was Re: LANANA: To Pending  DeviceNumberRegistrants]
  2001-05-20  8:30                                                       ` no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNumberRegistrants] Abramo Bagnara
@ 2001-05-20 10:09                                                         ` Alexander Viro
  0 siblings, 0 replies; 317+ messages in thread
From: Alexander Viro @ 2001-05-20 10:09 UTC (permalink / raw)
  To: Abramo Bagnara
  Cc: Linus Torvalds, Pavel Machek, James Simmons, Alan Cox,
	Neil Brown, Jeff Garzik, H. Peter Anvin,
	Linux Kernel Mailing List



On Sun, 20 May 2001, Abramo Bagnara wrote:

> Suppose now to have a convention that control stream are in the form:
> "ACTION ARGUMENTS"
> 
> Then we have
> echo "speed 19200" > /proc/self/fd/0/ioctl
> instead of
> stty 19200
> 
> It seems to me something different from a pile of shit ;-)

But it isn't.

	a) You are still trying to think of it as an OOB data associated with
normal channel. That is _wrong_. There is no 1-to-1 relation between these
OOB channels and normal ones. Wrong model. Commands are not associated with
data streams. Sometimes you can tie them together, but in many cases you just
can't. Building the infrastructure on that is a Bad Thing(tm).

	b) Way too many ioctls do not have that form. So aside of converting
code to handling the form above you will need to change the bleedin' APIs.
Sorry. No way around that.

	c) Aside of implementing something dumb a-la XDR and putting encoding
part into libc and decoding one into the procfs (which doesn't fix any of
the problems and only adds to ugliness) any method means that you will need
to go through drivers one-by-one. There is no magic way to deal with that
mess at once - the whole problem is that this pile of dung was festering
for too long and became a complete mess. The fact that anyone who felt an
urge to toss into it did so without a second thought also doesn't help.

	I went through that crap about a week ago when I was doing audit of
copy_from_user() callers. And I ask everyone who seriously wants to discuss
the situation: go and read through that code. Write the APIs down. Stare at
them. When you will get the feeling of the things out there (_not_ a vague
"well, they are for passing some commands; how bad can it be?") join the
show.

> > So there is no easy way to solve that stuff - we'll need to rethink tons
> > of badly designed interfaces.
> 
> This is orthogonal wrt ioctl problems pointed by Linus.

No, it isn't. That's the same problem. We have tons of garbage that will have
to be converted to sane form _before_ we can do anything with it. Result of
the braindead attitude of those who were dumping into that pile.

It should be fixed, but it won't be easy and it won't be fast. If you want
to help - wonderful. But keep in mind that it will take months of wading
through the ugliest code we have in the tree. If you've got a weak stomach -
stay out. I've been there and it's not a nice place.

Getting a list of all ioctls in the tree, along with types of their arguments
would be a great start. Anyone willing to help with that?
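
As a possible starting point for such a list, here is a minimal sketch that
splits a request number into direction, type, number and argument size. It
assumes the _IOC encoding from <linux/ioctl.h>; struct ioctl_info and
decode_ioctl() are made-up names for illustration, and old-style ioctls that
do not follow the encoding will decode to garbage.

```c
#include <linux/ioctl.h>

/* Hypothetical record for one entry in an ioctl inventory. */
struct ioctl_info {
	unsigned int dir;	/* _IOC_NONE, _IOC_READ, _IOC_WRITE or both */
	unsigned int type;	/* "magic" character chosen by the driver */
	unsigned int nr;	/* command number within that type */
	unsigned int size;	/* size of the argument, as encoded */
};

/* Split a request number using the standard _IOC_* accessors. */
struct ioctl_info decode_ioctl(unsigned long cmd)
{
	struct ioctl_info info;

	info.dir  = _IOC_DIR(cmd);
	info.type = _IOC_TYPE(cmd);
	info.nr   = _IOC_NR(cmd);
	info.size = _IOC_SIZE(cmd);
	return info;
}
```

The encoded size only says how big the argument is, not what it is, so the
argument types would still have to be read out of the drivers by hand.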
 
> I've simply proposed an *infrastructure* for better interfaces.

	We already have that infrastructure. It's called ramfs. Building
infrastructure on the model that doesn't fit the problem domain is a Bad
Thing(tm). We already have enough ESRitis around.



* handling network using filesystem [was Re: no ioctls for serial ports?]
  2001-05-20  0:01                                               ` no ioctls for serial ports? [was Re: LANANA: To Pending Device Number Registrants] Alexander Viro
@ 2001-05-20 11:17                                                 ` Pavel Machek
  0 siblings, 0 replies; 317+ messages in thread
From: Pavel Machek @ 2001-05-20 11:17 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Linus Torvalds, James Simmons, Alan Cox, Neil Brown, Jeff Garzik,
	H. Peter Anvin, Linux Kernel Mailing List

Hi!

> > I thought about how to do networking without sockets, and it seems to
> > me like this kind of modify syscall is needed, because network sockets
> > connect to *two* different places (one local address and one
> > remote). Sockets are really nasty :-(.
> 
> Pavel, take a look at http://plan9.bell-labs.com/sys/man/3/ip

Looks nice, and it seems they are even able to run BSD socket
emulation over that. Wow.

However, it is still mid-ugly:

       Opening  the  clone  file reserves a connection.  The file
       descriptor returned from the open(2)  will  point  to  the
       control  file,  ctl,  of  the  newly allocated connection.
       Reading ctl returns a text string representing the  number
       of the connection.  Connections may be used either to listen
       for incoming calls  or  to  initiate  calls  to  other
       machines.

So, you open "clone". That creates a directory for you. You can get its
number by reading from the "clone" file.

That's pretty strange, agreed?
								Pavel
-- 
I'm pavel@ucw.cz. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at discuss@linmodems.org


* Re: no ioctls for serial ports? [was Re: LANANA: To Pending Device Num
  2001-05-20  9:53                                               ` no ioctls for serial ports? [was Re: LANANA: To Pending Device Num Kai Henningsen
@ 2001-05-20 13:40                                                 ` Alexander Viro
  2001-05-20 14:27                                                   ` Tim Jansen
                                                                     ` (2 more replies)
  0 siblings, 3 replies; 317+ messages in thread
From: Alexander Viro @ 2001-05-20 13:40 UTC (permalink / raw)
  To: Kai Henningsen; +Cc: linux-kernel



On 20 May 2001, Kai Henningsen wrote:

> I've seen this question several times in this thread. I haven't seen the  
> obvious answer, though.
> 
> Have a new system call:
> 
> ctlfd = open_device_control_fd(fd);
> 
> If fd is something that doesn't have a control interface (say, it already  
> is a control filehandle), this returns an appropriate error code.

It may have several. Which one?

FWIW, I think that mixing network and device ioctls is a bad idea. These
groups are very different and we'd be better off dealing with changes in
them separately.

For devices... I'd say that fpath(2) (same type as getcwd(2)) would be
a good way to deal with that. Or fpath(3) - implemented via readlink(2)
on /proc/self/fd/<n>.



* Re: LANANA: To Pending Device Number Registrants
  2001-05-19 17:36                                       ` Jonathan Lundell
  2001-05-20  9:37                                         ` Eric W. Biederman
@ 2001-05-20 14:16                                         ` Chris Wedgwood
  2001-05-20 15:54                                         ` Jonathan Lundell
  2001-05-20 15:57                                         ` Jonathan Lundell
  3 siblings, 0 replies; 317+ messages in thread
From: Chris Wedgwood @ 2001-05-20 14:16 UTC (permalink / raw)
  To: Jonathan Lundell; +Cc: Kai Henningsen, linux-kernel

On Sat, May 19, 2001 at 10:36:14AM -0700, Jonathan Lundell wrote:

    I know from system documentation, or can figure out once and for
    all by experimentation, the correspondence between PCI
    bus/dev/fcn and physical locations. Jeff's extension gives me the
    mapping between eth# and PCI bus/dev/fcn, which is not otherwise
    available (outside the kernel).

Won't work with hotplug PCI (consider plugging in something with a
bridge).



  --cw


* Re: no ioctls for serial ports? [was Re: LANANA: To Pending Device Num
  2001-05-20 13:40                                                 ` Alexander Viro
@ 2001-05-20 14:27                                                   ` Tim Jansen
  2001-05-20 14:30                                                   ` no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNum Abramo Bagnara
  2001-05-22  5:56                                                   ` no ioctls for serial ports? [was Re: LANANA: To Pending Device Num Pavel Machek
  2 siblings, 0 replies; 317+ messages in thread
From: Tim Jansen @ 2001-05-20 14:27 UTC (permalink / raw)
  To: linux-kernel

On Sunday 20 May 2001 15:40, Alexander Viro wrote:
> > ctlfd = open_device_control_fd(fd);
> > If fd is something that doesn't have a control interface (say, it already
> > is a control filehandle), this returns an appropriate error code.
> It may have several. Which one?


That's why I proposed using multi-stream files. With a syscall like

fd2 = open_substream(fd, "somename")

you could have several control streams and also be prepared if you want to 
support multi-stream filesystems like NTFS in the future...

BTW: how does this work in NT? Do you first open a file and then fork it like 
in my example,  do they have a special open for substreams or is the 
substream always encoded in the filename?

bye...


* Re: no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNum
  2001-05-20 13:40                                                 ` Alexander Viro
  2001-05-20 14:27                                                   ` Tim Jansen
@ 2001-05-20 14:30                                                   ` Abramo Bagnara
  2001-05-20 14:45                                                     ` Alexander Viro
  2001-05-22  5:56                                                   ` no ioctls for serial ports? [was Re: LANANA: To Pending Device Num Pavel Machek
  2 siblings, 1 reply; 317+ messages in thread
From: Abramo Bagnara @ 2001-05-20 14:30 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Kai Henningsen, linux-kernel

Alexander Viro wrote:
> 
> On 20 May 2001, Kai Henningsen wrote:
> 
> > I've seen this question several times in this thread. I haven't seen the
> > obvious answer, though.
> >
> > Have a new system call:
> >
> > ctlfd = open_device_control_fd(fd);
> > If fd is something that doesn't have a control interface (say, it already
> > is a control filehandle), this returns an appropriate error code.
> 
> It may have several. Which one?

Can you explain better this?

-- 
Abramo Bagnara                       mailto:abramo@alsa-project.org

Opera Unica                          Phone: +39.546.656023
Via Emilia Interna, 140
48014 Castel Bolognese (RA) - Italy

ALSA project               http://www.alsa-project.org
It sounds good!


* Re: no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNum
  2001-05-20 14:30                                                   ` no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNum Abramo Bagnara
@ 2001-05-20 14:45                                                     ` Alexander Viro
  2001-05-20 15:00                                                       ` Abramo Bagnara
                                                                         ` (2 more replies)
  0 siblings, 3 replies; 317+ messages in thread
From: Alexander Viro @ 2001-05-20 14:45 UTC (permalink / raw)
  To: Abramo Bagnara; +Cc: Kai Henningsen, linux-kernel



On Sun, 20 May 2001, Abramo Bagnara wrote:

> > It may have several. Which one?
> 
> Can you explain better this?

Example: console. You want to be able to pass font changes. I'm
less than sure that putting them on the same channel as, e.g.,
keyboard mapping changes is a good idea. We can do it, but I don't
see why it's natural thing to do. Moreover, you already have
/dev/vcs<n> and /dev/vcsa<n>. Can you explain what's the difference
between them (per-VC channels) and keyboard mapping (also per-VC)?

Face it, we _already_ have more than one side band.

Moreover, we have channels that are not tied to a particular device -
they are for a group of them. Example: setting timings for IDE controller.
Sure, we can just say "open /dev/hda instead of /dev/hda5", but then we
are back to the "find related file" problem you tried to avoid.



* Re: no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNum
  2001-05-20 14:45                                                     ` Alexander Viro
@ 2001-05-20 15:00                                                       ` Abramo Bagnara
  2001-05-20 15:18                                                         ` Alexander Viro
  2001-05-20 15:26                                                       ` Jakob Østergaard
  2001-05-21 17:45                                                       ` Oliver Xymoron
  2 siblings, 1 reply; 317+ messages in thread
From: Abramo Bagnara @ 2001-05-20 15:00 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Kai Henningsen, linux-kernel

Alexander Viro wrote:
> 
> On Sun, 20 May 2001, Abramo Bagnara wrote:
> 
> > > It may have several. Which one?
> >
> > Can you explain better this?
> 
> Example: console. You want to be able to pass font changes. I'm
> less than sure that putting them on the same channel as, e.g.,
> keyboard mapping changes is a good idea. We can do it, but I don't
> see why it's natural thing to do. Moreover, you already have
> /dev/vcs<n> and /dev/vcsa<n>. Can you explain what's the difference
> between them (per-VC channels) and keyboard mapping (also per-VC)?
> 
> Face it, we _already_ have more than one side band.

This does not imply it's necessarily a good idea.
We are comparing

echo "9600" > /proc/self/fd/0/speed (or /dev/ttyS0/speed)
echo "8" > /proc/self/fd/0/bits (or /dev/ttyS0/bits)

with 

echo -e "speed 9600\nbits 8" > /proc/self/fd/0/ioctl (or
/dev/ttyS0/ioctl).

My personal preference goes to the latter, but it's a matter of taste
(and convention choice)

(echo -n "keymap " ; cat keymap) > /dev/tty1/ioctl
(echo -n "font " ; cat font) > /dev/tty1/ioctl

Does this seem ugly to you?
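
Whoever sits behind a single ioctl file has to parse a small line-oriented
language. A minimal userspace sketch of such a parser, assuming the
"name value" line format of the echo examples above; struct tty_settings
and parse_tty_commands() are hypothetical names, not an existing interface.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical settings that the text commands would update. */
struct tty_settings {
	int speed;
	int bits;
};

/*
 * Apply "name value" lines from text to s.
 * Returns the number of lines applied, or -1 on the first bad line.
 */
int parse_tty_commands(const char *text, struct tty_settings *s)
{
	int applied = 0;

	while (*text) {
		size_t len = strcspn(text, "\n");

		if (len > 0) {		/* blank lines are skipped */
			char line[64], name[16];
			int value;

			if (len >= sizeof(line))
				return -1;
			memcpy(line, text, len);
			line[len] = '\0';
			if (sscanf(line, "%15s %d", name, &value) != 2)
				return -1;
			if (strcmp(name, "speed") == 0)
				s->speed = value;
			else if (strcmp(name, "bits") == 0)
				s->bits = value;
			else
				return -1;
			applied++;
		}
		text += len;
		if (*text == '\n')
			text++;
	}
	return applied;
}
```

Note that a bad line leaves the earlier lines applied, which is one of the
policy questions a combined channel would have to settle.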

> Moreover, we have channels that are not tied to a particular device -
> they are for a group of them. Example: setting timings for IDE controller.
> Sure, we can just say "open /dev/hda instead of /dev/hda5", but then we
> are back to the "find related file" problem you tried to avoid.

It does not seem appropriate to permit changing IDE timings through a
handle to a partition, nor does it seem very safe from a permissions
point of view.

-- 
Abramo Bagnara                       mailto:abramo@alsa-project.org

Opera Unica                          Phone: +39.546.656023
Via Emilia Interna, 140
48014 Castel Bolognese (RA) - Italy

ALSA project               http://www.alsa-project.org
It sounds good!


* Re: no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNum
  2001-05-20 15:00                                                       ` Abramo Bagnara
@ 2001-05-20 15:18                                                         ` Alexander Viro
  2001-05-20 15:40                                                           ` Abramo Bagnara
  0 siblings, 1 reply; 317+ messages in thread
From: Alexander Viro @ 2001-05-20 15:18 UTC (permalink / raw)
  To: Abramo Bagnara; +Cc: Kai Henningsen, linux-kernel



On Sun, 20 May 2001, Abramo Bagnara wrote:

> > Face it, we _already_ have more than one side band.
> 
> This does not imply it's necessarily a good idea.
> We are comparing
> 
> echo "9600" > /proc/self/fd/0/speed (or /dev/ttyS0/speed)
> echo "8" > /proc/self/fd/0/bits (or /dev/ttyS0/bits)
> 
> with 
> 
> echo -e "speed 9600\nbits 8" > /proc/self/fd/0/ioctl (or
> /dev/ttyS0/ioctl).

How about reading from them? You are forcing restriction that may make
sense in some cases, but doesn't look good for everything.

> > Moreover, we have channels that are not tied to a particular device -
> > they are for a group of them. Example: setting timings for IDE controller.
> > Sure, we can just say "open /dev/hda instead of /dev/hda5", but then we
> > are back to the "find related file" problem you tried to avoid.
> 
> It does not seems appropriate to permit to change IDE timings using an
> handle to a partition... nor it seems very safe under a permissions
> point of view.

However, we _do_ allow that. Right now. And yes, I agree that we should
go to separate file for that. And we are right back to finding a related
file.

It's not a function of descriptor. Sorry. Just as with /dev/tty1 -> /dev/vcs1
and its ilk.



* Re: no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNum
  2001-05-20 14:45                                                     ` Alexander Viro
  2001-05-20 15:00                                                       ` Abramo Bagnara
@ 2001-05-20 15:26                                                       ` Jakob Østergaard
  2001-05-20 15:42                                                         ` Alexander Viro
  2001-05-21 17:45                                                       ` Oliver Xymoron
  2 siblings, 1 reply; 317+ messages in thread
From: Jakob Østergaard @ 2001-05-20 15:26 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Abramo Bagnara, Kai Henningsen, linux-kernel

On Sun, May 20, 2001 at 10:45:07AM -0400, Alexander Viro wrote:
> 
> 
> On Sun, 20 May 2001, Abramo Bagnara wrote:
> 
> > > It may have several. Which one?
> > 
> > Can you explain better this?
> 
> Example: console. You want to be able to pass font changes. I'm
> less than sure that putting them on the same channel as, e.g.,
> keyboard mapping changes is a good idea. We can do it, but I don't
> see why it's natural thing to do. Moreover, you already have
> /dev/vcs<n> and /dev/vcsa<n>. Can you explain what's the difference
> between them (per-VC channels) and keyboard mapping (also per-VC)?
> 
> Face it, we _already_ have more than one side band.

Wouldn't it be natural to
  write(fd, <control type>)
  write(fd, <control information>)
  read(fd, reply)

Only one control file for all controls a device understands

> 
> Moreover, we have channels that are not tied to a particular device -
> they are for a group of them. Example: setting timings for IDE controller.
> Sure, we can just say "open /dev/hda instead of /dev/hda5", but then we
> are back to the "find related file" problem you tried to avoid.

If the IDE controller is a device we can control, it should have a device
file and a control device file.

Problem solved, AFAICS.

Controlling an IDE controller by writing to a device that's hanging on one
of its busses is a hack, IMO.

-- 
................................................................
:   jakob@unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............:


* Re: no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNum
  2001-05-20 15:18                                                         ` Alexander Viro
@ 2001-05-20 15:40                                                           ` Abramo Bagnara
  2001-05-20 16:01                                                             ` Alexander Viro
  0 siblings, 1 reply; 317+ messages in thread
From: Abramo Bagnara @ 2001-05-20 15:40 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Kai Henningsen, linux-kernel

Alexander Viro wrote:
> 
> On Sun, 20 May 2001, Abramo Bagnara wrote:
> 
> > > Face it, we _already_ have more than one side band.
> >
> > This does not imply it's necessarily a good idea.
> > We are comparing
> >
> > echo "9600" > /proc/self/fd/0/speed (or /dev/ttyS0/speed)
> > echo "8" > /proc/self/fd/0/bits (or /dev/ttyS0/bits)
> >
> > with
> >
> > echo -e "speed 9600\nbits 8" > /proc/self/fd/0/ioctl (or
> > /dev/ttyS0/ioctl).
> 
> How about reading from them? You are forcing restriction that may make
> sense in some cases, but doesn't look good for everything.

exec 3>/dev/ttyS0/ioctl
exec 4<&3
echo "speed" >&3
cat <&4
exec 3>&-
exec 4<&-

Can you make a counter example where this doesn't look good?

> 
> > > Moreover, we have channels that are not tied to a particular device -
> > > they are for a group of them. Example: setting timings for IDE controller.
> > > Sure, we can just say "open /dev/hda instead of /dev/hda5", but then we
> > > are back to the "find related file" problem you tried to avoid.
> >
> > It does not seems appropriate to permit to change IDE timings using an
> > handle to a partition... nor it seems very safe under a permissions
> > point of view.
> 
> However, we _do_ allow that. Right now. And yes, I agree that we should
> go to separate file for that. And we are right back to finding a related
> file.

I'd prefer to make what you often call a crapectomy: no IDE timing
change using a partition handle. It would be like permitting that on an
LVM handle; it's stupid...

About the tty and vcs split: there the problem is more subtle, and it's
related to a missing separation of keyboard and screen.
Once that choice had been made (i.e. to give the console the same
behaviour as a serial port), someone realized that reading from the
console screen is a sensible action (to fetch the current content).

This is the typical case where having /dev/tty1/ioctl is no
substitute for having another device for reading the console screen.

Note that it's a *different* device (different permission, etc.).

-- 
Abramo Bagnara                       mailto:abramo@alsa-project.org

Opera Unica                          Phone: +39.546.656023
Via Emilia Interna, 140
48014 Castel Bolognese (RA) - Italy

ALSA project               http://www.alsa-project.org
It sounds good!


* Re: no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNum
  2001-05-20 15:26                                                       ` Jakob Østergaard
@ 2001-05-20 15:42                                                         ` Alexander Viro
  0 siblings, 0 replies; 317+ messages in thread
From: Alexander Viro @ 2001-05-20 15:42 UTC (permalink / raw)
  To: Jakob Østergaard; +Cc: Abramo Bagnara, Kai Henningsen, linux-kernel



On Sun, 20 May 2001, Jakob Østergaard wrote:

> > Face it, we _already_ have more than one side band.
> 
> Wouldn't it be natural to
>   write(fd, <control type>)
>   write(fd, <control information>)
>   read(fd, reply)
> 
> Only one control file for all controls a device understands

That's one of the ways to do it. However, it's less than ideal when
you want to mix access to such channels. Again, look at font and
screen contents of VC. You can force everything into that model.
It even makes sense for many cases. Not all of them, though, and
any solution for the rest will handle the special case too.

Example of such a solution (_not_ for sockets - they are very different):
readlink() on /proc/self/fd/<n>, then replace everything past
the last /. BTW, that way you can bind a device into chroot jail, but
leave some subset of channels out of it. Or all of them. Just don't
bind the side channels there.
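
The "replace everything past the last /" step could be sketched as below.
related_path() is a hypothetical helper; the path is assumed to come from
readlink() on /proc/self/fd/<n>, and the name of the side channel is
whatever convention the driver documents.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build in out the path of the file called name that lives in the same
 * directory as path. Returns 0 on success, -1 if path contains no '/'
 * or out is too small.
 */
int related_path(const char *path, const char *name, char *out, size_t size)
{
	const char *slash = strrchr(path, '/');
	int dirlen, n;

	if (!slash)
		return -1;
	dirlen = (int)(slash - path) + 1;	/* keep the trailing '/' */
	n = snprintf(out, size, "%.*s%s", dirlen, path, name);
	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}
```

With "/dev/tty1" and "vcs1" this yields "/dev/vcs1". If the side channel
was not bound into the jail, opening the result simply fails.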

> > Moreover, we have channels that are not tied to a particular device -
> > they are for a group of them. Example: setting timings for IDE controller.
> > Sure, we can just say "open /dev/hda instead of /dev/hda5", but then we
> > are back to the "find related file" problem you tried to avoid.
> 
> If the IDE controller is a device we can control, it should have a device
> file and a control device file.
> 
> Problem solved, AFAICS.

Sure, but the same logic applies to /proc/self/fd/<n>/ioctl. Yes, sometimes
you need to figure out the name of related file. Depends on situation.
Saying that we have one very special related file that corresponds to current
ioctls looks rather bogus.



* Re: LANANA: To Pending Device Number Registrants
  2001-05-19 17:36                                       ` Jonathan Lundell
  2001-05-20  9:37                                         ` Eric W. Biederman
  2001-05-20 14:16                                         ` Chris Wedgwood
@ 2001-05-20 15:54                                         ` Jonathan Lundell
  2001-05-20 15:57                                         ` Jonathan Lundell
  3 siblings, 0 replies; 317+ messages in thread
From: Jonathan Lundell @ 2001-05-20 15:54 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Kai Henningsen, linux-kernel

At 3:37 AM -0600 2001-05-20, Eric W. Biederman wrote:
>Jonathan Lundell <jlundell@pobox.com> writes:
>
>>  At 10:42 AM +0200 2001-05-19, Kai Henningsen wrote:
>>  >  > Jeff Garzik's ethtool
>>  >  > extension at least tells me the PCI bus/dev/fcn, though, and from
>>  >>  that I can write a userland mapping function to the physical
>>  >>  location.
>>  >
>>  >I don't see how PCI bus/dev/fcn lets you do that.
>>
>>  I know from system documentation, or can figure out once and for all
>>  by experimentation, the correspondence between PCI bus/dev/fcn and
>>  physical locations. Jeff's extension gives me the mapping between
>>  eth# and PCI bus/dev/fcn, which is not otherwise available (outside
>>  the kernel).
>
>Just a second let me reenumerate your pci busses, and change all of the bus
>numbers.  Not that this is a bad thought.  It is just you need to know
>the tree of PCI busses/bridges up to the root on the machine in question.

Yes, you do. And it's true that renumbering is problematical; I 
hadn't thought of all the implications. Say, you have a system with 
hot-plug slots on two buses, and someone hot-plugs a card with a 
bridge (fairly common; most dual/quad Ethernet boards have a bridge). 
If the buses were numbered densely to begin with, they're going to 
have to be renumbered above the point that the new bridge was added.

Phooey. Well, it can still be done, but it's a bit more complicated 
than the bus/dev/fcn-to-location map I was imagining. You'd have to 
describe the topology of the built-in buses, and dynamically make the 
correspondences. As you say, "know the tree", by topology, not bus 
numbers.
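
To make "by topology, not bus numbers" concrete, here is a rough sketch
that names a device by the chain of dev.fn pairs from the root down, which
stays the same when buses are renumbered. struct pci_node and topo_path()
are hypothetical; a real tool would have to build the tree from whatever
the kernel exports.

```c
#include <stdio.h>

/* Hypothetical node: a bridge or device, linked to the bridge above it. */
struct pci_node {
	const struct pci_node *parent;	/* NULL for a device on a root bus */
	unsigned int bus;		/* may change; not part of the key */
	unsigned int dev;
	unsigned int fn;
};

/*
 * Write "dev.fn/dev.fn/..." from the root down to node into out.
 * Returns the length written, or -1 if out is too small.
 */
int topo_path(const struct pci_node *node, char *out, size_t size)
{
	int used = 0, n;

	if (node->parent) {
		used = topo_path(node->parent, out, size);
		if (used < 0)
			return -1;
	}
	n = snprintf(out + used, size - used, "%s%02x.%x",
		     node->parent ? "/" : "", node->dev, node->fn);
	if (n < 0 || (size_t)n >= size - used)
		return -1;
	return used + n;
}
```

The location map would then be keyed on strings like "1e.0/04.0/05.1"
instead of bus/dev/fcn triples.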


-- 
/Jonathan Lundell.


* Re: LANANA: To Pending Device Number Registrants
  2001-05-19 17:36                                       ` Jonathan Lundell
                                                           ` (2 preceding siblings ...)
  2001-05-20 15:54                                         ` Jonathan Lundell
@ 2001-05-20 15:57                                         ` Jonathan Lundell
  3 siblings, 0 replies; 317+ messages in thread
From: Jonathan Lundell @ 2001-05-20 15:57 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Kai Henningsen, linux-kernel

At 2:16 AM +1200 2001-05-21, Chris Wedgwood wrote:
>On Sat, May 19, 2001 at 10:36:14AM -0700, Jonathan Lundell wrote:
>
>     I know from system documentation, or can figure out once and for
>     all by experimentation, the correspondence between PCI
>     bus/dev/fcn and physical locations. Jeff's extension gives me the
>     mapping between eth# and PCI bus/dev/fcn, which is not otherwise
>     available (outside the kernel).
>
>Won't work with hotplug PCI (consider plugging in something with a
>bridge).

It's true that hotplug devices make it more complicated, but I think 
the result can be achieved by describing the correspondence 
topologically rather than as a simple b/d/f-to-location table.
-- 
/Jonathan Lundell.


* Re: no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNum
  2001-05-20 15:40                                                           ` Abramo Bagnara
@ 2001-05-20 16:01                                                             ` Alexander Viro
  0 siblings, 0 replies; 317+ messages in thread
From: Alexander Viro @ 2001-05-20 16:01 UTC (permalink / raw)
  To: Abramo Bagnara; +Cc: Kai Henningsen, linux-kernel



On Sun, 20 May 2001, Abramo Bagnara wrote:

> > How about reading from them? You are forcing restriction that may make
> > sense in some cases, but doesn't look good for everything.
> 
> exec 3>/dev/ttyS0/ioctl
> exec 4<&3
> echo "speed" >&3
> cat <&4
> exec 3>&-
> exec 4<&-
> 
> Can you make a counter example where this doesn't look good?

If in your opinion it looks good... Again, you are forcing the policy
decision on a lot of interfaces. For no good reason, AFAICS.

> > However, we _do_ allow that. Right now. And yes, I agree that we should
                                                     ^^^^^^^^^^^^^^^^^^^^^^
> > go to separate file for that. And we are right back to finding a related
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > file.
> 
> I'd prefer to make what you often call a crapectomy: no IDE timing
> change using a partition handle. It's something like to permit that on a
> LVM handle, it's stupid...

Sigh... Sure, but so is the magic way to get ioctl descriptor by file
descriptor. They are two separate files. End of story. Yes, they are
related - provided by the same driver.

Look, emulating ioctl(2) via write(2) is _not_ the goal. We will have
to keep that syscall, for binary compatibility reasons if nothing else.
We can start fixing the applications that use Linux-only ioctls (and
that pretty much means that we'll have to leave the networking ones alone
for quite a while).

What _is_ interesting is a sane API that could be used instead of the
current mess with device ioctls. There's only one reason to go for
fs/n/ioctl scheme - mass conversion of applications with little to
no thinking. Not going to happen. Simply because you'll need to switch
arguments in many of these cases.
 
> About the tty and vcs split: there the problem is more subtle, and it's
> related to a missing separation of keyboard and screen.
> Once that choice had been made (i.e. to give the console the same
> behaviour as a serial port), someone realized that reading from the
> console screen is a sensible action (to fetch the current content).

> This is the typical case where having /dev/tty1/ioctl is no
> substitute for having another device for reading the console screen.
> 
> Note that it's a *different* device (different permission, etc.).

And your ioctl is different from that because...? I can certainly see
good reasons to restrict a user to some subset of the functionality provided
by ioctls - BTW, one more reason why multiple related channels are good.



* Re: no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNum
  2001-05-20 14:45                                                     ` Alexander Viro
  2001-05-20 15:00                                                       ` Abramo Bagnara
  2001-05-20 15:26                                                       ` Jakob Østergaard
@ 2001-05-21 17:45                                                       ` Oliver Xymoron
  2001-05-21 18:14                                                         ` Alexander Viro
  2 siblings, 1 reply; 317+ messages in thread
From: Oliver Xymoron @ 2001-05-21 17:45 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Abramo Bagnara, Kai Henningsen, linux-kernel

On Sun, 20 May 2001, Alexander Viro wrote:

> On Sun, 20 May 2001, Abramo Bagnara wrote:
>
> > > It may have several. Which one?
> >
> > Can you explain better this?
>
> Example: console. You want to be able to pass font changes. I'm
> less than sure that putting them on the same channel as, e.g.,
> keyboard mapping changes is a good idea.

If you've got side channels that are of a packet nature (aka commands),
then they can all happily coexist on one device. If you've got channels
that are streams or intended for mmap, those ought to be different
devices.

> Moreover, we have channels that are not tied to a particular device -
> they are for a group of them. Example: setting timings for IDE controller.
> Sure, we can just say "open /dev/hda instead of /dev/hda5", but then we
> are back to the "find related file" problem you tried to avoid.

Hmmm.. I suspect there's a permission issue lurking here anyway. It's
probably valid to want to give out raw partition access, say to a
database user, but not to give out permission to fiddle with the
underlying drive.

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."



* Re: no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNum
  2001-05-21 17:45                                                       ` Oliver Xymoron
@ 2001-05-21 18:14                                                         ` Alexander Viro
  2001-05-21 18:37                                                           ` Oliver Xymoron
  0 siblings, 1 reply; 317+ messages in thread
From: Alexander Viro @ 2001-05-21 18:14 UTC (permalink / raw)
  To: Oliver Xymoron; +Cc: Abramo Bagnara, Kai Henningsen, linux-kernel



On Mon, 21 May 2001, Oliver Xymoron wrote:

> If you've got side channels that are of a packet nature (aka commands),
> then they can all happily coexist on one device. If you've got channels
> that are streams or intended for mmap, those ought to be different
> devices.

Since you've been referring to -9 - care to take a look at the contents of
uart(3)? Or lpt(3). Or draw(3), for that matter.



* Re: no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNum
  2001-05-21 18:14                                                         ` Alexander Viro
@ 2001-05-21 18:37                                                           ` Oliver Xymoron
  2001-05-21 18:49                                                             ` Alexander Viro
  0 siblings, 1 reply; 317+ messages in thread
From: Oliver Xymoron @ 2001-05-21 18:37 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-kernel

On Mon, 21 May 2001, Alexander Viro wrote:

> On Mon, 21 May 2001, Oliver Xymoron wrote:
>
> > If you've got side channels that are of a packet nature (aka commands),
> > then they can all happily coexist on one device. If you've got channels
> > that are streams or intended for mmap, those ought to be different
> > devices.
>
> Since you've been referring to -9 - care to take a look at the contents of
> uart(3)? Or lpt(3). Or draw(3), for that matter.

K - so what? I'm guessing what you want me to see is that these
implement multiple channels. Is there a reason that eia001stat couldn't be
implemented as

 char status[64];
 int f = open("/dev/eia001ctl", O_RDWR);
 write(f, "stat\n", 5);
 read(f, status, sizeof(status)); /* fills status with "stat foo\n" */

We don't want to implement a separate device node for every OOB ioctl that
returns data, do we? Why should stat be any different?

/dev/draw is interesting but largely irrelevant. And again, colormap and
refresh - why are they not part of ctl? You've got to select on refresh
anyway, might as well accept asynchronous messages through ctl.

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNum
  2001-05-21 18:37                                                           ` Oliver Xymoron
@ 2001-05-21 18:49                                                             ` Alexander Viro
  2001-05-21 19:08                                                               ` Oliver Xymoron
  0 siblings, 1 reply; 317+ messages in thread
From: Alexander Viro @ 2001-05-21 18:49 UTC (permalink / raw)
  To: Oliver Xymoron; +Cc: linux-kernel



On Mon, 21 May 2001, Oliver Xymoron wrote:

> K - so what? I'm guessing what you want me to see is that these
> implement multiple channels. Is there a reason that eia001stat couldn't be
> implemented as
> 
>  f=open("/dev/eia001ctl",O_RDWR);
>  write(f,"stat\n");
>  status=read(f); /* returns "stat foo\n" */

Less convenient.

> We don't want to implement a separate device node for every OOB ioctl that
> returns data, do we? Why should stat be any different?

For every? Probably not. Forcing all of them together? I bet that in many
cases it will be damn inconvenient. You are forcing the policy on all
drivers. For no good reason, AFAICS.

> /dev/draw is interesting but largely irrelevant. And again, colormap and
> refresh - why are they not part of ctl? You've got to select on refresh
> anyway, might as well accept asynchronous messages through ctl.

You've got to do _what_ on refresh?


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNum
  2001-05-21 18:49                                                             ` Alexander Viro
@ 2001-05-21 19:08                                                               ` Oliver Xymoron
  0 siblings, 0 replies; 317+ messages in thread
From: Oliver Xymoron @ 2001-05-21 19:08 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-kernel

On Mon, 21 May 2001, Alexander Viro wrote:

> On Mon, 21 May 2001, Oliver Xymoron wrote:
>
> > K - so what? I'm guessing what you want me to see is that these
> > implement multiple channels. Is there a reason that eia001stat couldn't be
> > implemented as
> >
> >  f=open("/dev/eia001ctl",O_RDWR);
> >  write(f,"stat\n");
> >  status=read(f); /* returns "stat foo\n" */
>
> Less convenient.

True enough.

> > We don't want to implement a separate device node for every OOB ioctl that
> > returns data, do we? Why should stat be any different?
>
> For every? Probably not. Forcing all of them together? I bet that in many
> cases it will be damn inconvenient. You are forcing the policy on all
> drivers. For no good reason, AFAICS.

No - I'm merely pointing out that it's sufficient. And I'm pretty sure we
want to make additional control or stream interfaces the exception rather
than the rule. And having a standard read and write protocol of some sort
for ctl devices is more or less mandatory, otherwise they will all work
differently. This is not to say driver writers aren't allowed to depart
from it, just that it'll be more work if they do.

> > /dev/draw is interesting but largely irrelevant. And again, colormap and
> > refresh - why are they not part of ctl? You've got to select on refresh
> > anyway, might as well accept asynchronous messages through ctl.
>
> You've got to do _what_ on refresh?

I'm guessing some sort of poll or select on the refresh device, assuming a
single-threaded app. But no, I've never used 9 nor am I especially
interested in exploring it in depth, given its license and lack of
community.

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: no ioctls for serial ports? [was Re: LANANA: To Pending Device Num
  2001-05-20 13:40                                                 ` Alexander Viro
  2001-05-20 14:27                                                   ` Tim Jansen
  2001-05-20 14:30                                                   ` no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNum Abramo Bagnara
@ 2001-05-22  5:56                                                   ` Pavel Machek
  2 siblings, 0 replies; 317+ messages in thread
From: Pavel Machek @ 2001-05-22  5:56 UTC (permalink / raw)
  To: Alexander Viro, Kai Henningsen; +Cc: linux-kernel

Hi!

> > I've seen this question several times in this thread. I haven't seen the  
> > obvious answer, though.
> > 
> > Have a new system call:
> > 
> > ctlfd = open_device_control_fd(fd);
> > 
> > If fd is something that doesn't have a control interface (say, it already  
> > is a control filehandle), this returns an appropriate error code.
> 
> It may have several. Which one?
> 
> FWIW, I think that mixing network and device ioctls is a bad idea. These
> groups are very different and we'd be better off dealing with changes in
> them separately.
> 
> For devices... I'd say that fpath(2) (same type as getcwd(2)) would be
> a good way to deal with that. Or fpath(3) - implemented via readlink(2)
> on /proc/self/fd/<n>.

fpath is the *wrong* solution, and extremely ugly.

stty 115200 < /dev/ttyS0 &
rm /dev/ttyS0

or even worse

stty 115200 < /dev/ttyS0 &
ln -s /dev/ttyS1 /dev/ttyS0

What I'm trying to show is that with fpath you can no longer delete
open devices and continue to work with them. I really think that
open_sub(fd, "control") is the right solution.
								Pavel
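
The point about deleted names can be checked with a small sketch. It uses
an ordinary temporary file rather than a real device node, and the
function name fd_outlives_name() is made up for the example.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Returns 1 if an already-open descriptor keeps working after its name
 * is removed while a fresh lookup by path fails, 0 if not, and -1 if the
 * temporary file could not be created.
 */
int fd_outlives_name(void)
{
    char path[] = "/tmp/fdname-XXXXXX";
    int fd = mkstemp(path);
    int reopened, ok;

    if (fd < 0)
        return -1;
    unlink(path);                    /* plays the role of "rm /dev/ttyS0" */
    ok = (write(fd, "x", 1) == 1);   /* the descriptor itself still works */
    reopened = open(path, O_RDWR);   /* any path-based scheme now fails */
    if (reopened >= 0) {
        close(reopened);
        ok = 0;
    }
    close(fd);
    return ok;
}
```

Anything that resolves the control interface through the path recovered
from the descriptor hits the failing open() here, or, in the symlink case,
a different device entirely.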
-- 
I'm pavel@ucw.cz. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at discuss@linmodems.org

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH] AFS filesystem for Linux (2/2)
       [not found] <7146.1033580256@warthog.cambridge.redhat.com>
@ 2002-10-03  0:36 ` Linus Torvalds
  2002-10-03  9:05   ` David Howells
  2002-10-04 14:11   ` [patch] [kkern] " Patrick Audley
  0 siblings, 2 replies; 317+ messages in thread
From: Linus Torvalds @ 2002-10-03  0:36 UTC (permalink / raw)
  To: David Howells; +Cc: linux-kernel


On Wed, 2 Oct 2002, David Howells wrote:
> 
> This patch adds an Andrew File System (AFS) driver to the kernel. Currently
> it only provides read-only, uncached, non-automounted and unsecured support.

Are you sure this is the right way to go?

As far as I can tell, this is a dead end, because we fundamentally cannot
do the local backing store from the kernel.

From my (nonexistent) understanding of how AFS works, would it not be a
whole lot more sensible to implement it as a coda client or something like 
that (with the networking support in-kernel, but with the caching logic 
etc in user space).

I dunno, I just get the feeling that a good AFS client simply cannot be 
done entirely in kernel space, and if you start off like this, you'll 
never get where you really want to go. Pls comment on this (and yeah, the 
comment can be a "Boy, you're really a stupid git, and here's why: xyz", 
but I really want the "xyz" part too ;)

Now, admittedly maybe the user-space daemon approach is crap, and what we
really want is to have some way to cache network stuff on the disk
directly from the kernel, ie just implement a real mapping/page-indexed
cachefs that people could mount and use together with different network 
filesystems.

		Linus


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH] AFS filesystem for Linux (2/2)
  2002-10-03  0:36 ` [PATCH] AFS filesystem for Linux (2/2) Linus Torvalds
@ 2002-10-03  9:05   ` David Howells
  2002-10-03 16:53     ` Jan Harkes
  2002-10-06 16:49     ` Troy Benjegerdes
  2002-10-04 14:11   ` [patch] [kkern] " Patrick Audley
  1 sibling, 2 replies; 317+ messages in thread
From: David Howells @ 2002-10-03  9:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Howells, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 6517 bytes --]


Linus Torvalds wrote:
> On Wed, 2 Oct 2002, David Howells wrote:
> >
> > This patch adds an Andrew File System (AFS) driver to the
> > kernel. Currently it only provides read-only, uncached, non-automounted
> > and unsecured support.
>
> Are you sure this is the right way to go?

I think so. I think it makes sense for the AFS VFS-interface to go as directly
as possible to the network without having to make context switches to get into
userspace.

> As far as I can tell, this is a dead end, because we fundamentally cannot
> do the local backing store from the kernel.

I disagree. I think we can (besides which OpenAFS does so), and that most of
it is probably easier to do here than in userspace. For example:

 (*) Readpage

     The filesystem can either use a BIO to read directly into a new page if
     the page is already in the cache, or it can read from across the network,
     and then use a BIO to write the updated page into the cache. This should
     avoid as much page-aliasing as possible.

 (*) Writepage

     The filesystem can use a BIO to write a page into the cache, whilst
     simultaneously dispatching it across the network. Alternatively, it can
     write the page to the cache (fast) immediately and queue it up to be sent
     across the network (slow) from the cache if memory pressure is high.

 (*) Index searching

     The cache needs to keep an on disc index if the contents are to survive
     rebooting. This can be searched more efficiently from within the kernel.

     I've written a scanning algorithm that can scan any file for fixed length
     records in a manner that allows the disc blocks to be scanned in whatever
     order they come off of the disc. This should make scanning the index
     files faster, and may not actually be possible in userspace.

 (*) On-disc file layout

     The on-disc file layout I'm using is the same as many other unix
     filesystems (direct, indirect and double-indirect block pointers) and is
     fairly simple. The biggest difference is that where I see a hole, I know
     I have to fetch the page from the server, rather than just assuming
     there's an implicit empty page there.

     I'm fetching files a page at a time (on demand from the VM), though I may
     extend this to get bunches of pages for efficiency reasons.

     What I'm not going to do is fetch each file into the cache and
     maintain it there in its entirety from the moment it is opened to the
     moment it is closed. This has two definite disadvantages: you can't open
     a file bigger than the remaining space in the cache, and the size of the
     cache and the sizes of the files opened limits the number of files you
     can have open.
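
A minimal sketch of the layout idea in the last item, assuming small
made-up fan-outs (N_DIRECT, PTRS_PER_BLOCK) and in-memory pointers standing
in for on-disc block numbers; block number 0 is used here to mean "hole",
which is an assumption of the sketch rather than something stated above.

```c
#include <stddef.h>
#include <stdint.h>

#define N_DIRECT        4   /* illustrative sizes only */
#define PTRS_PER_BLOCK  8

struct meta_record {
    uint32_t direct[N_DIRECT];               /* 0 means hole */
    const uint32_t *indirect;                /* PTRS_PER_BLOCK entries or NULL */
    const uint32_t *const *double_indirect;  /* PTRS_PER_BLOCK leaves or NULL */
};

/*
 * Map a page index to a cache block number. A return value of 0 is a
 * hole: the page has not been fetched yet and must come from the server.
 */
uint32_t cache_lookup(const struct meta_record *m, uint32_t page)
{
    const uint32_t *leaf;

    if (page < N_DIRECT)
        return m->direct[page];
    page -= N_DIRECT;
    if (page < PTRS_PER_BLOCK)
        return m->indirect ? m->indirect[page] : 0;
    page -= PTRS_PER_BLOCK;
    if (page < PTRS_PER_BLOCK * PTRS_PER_BLOCK) {
        if (!m->double_indirect)
            return 0;
        leaf = m->double_indirect[page / PTRS_PER_BLOCK];
        return leaf ? leaf[page % PTRS_PER_BLOCK] : 0;
    }
    return 0;   /* beyond what this record can address */
}
```

A readpage implementation along these lines would read from the cache when
the lookup is non-zero and go to the network otherwise.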

Currently my plans are not to support disconnected operation (as there are
likely to be holes in the files cached). This means that I don't need to cache
security information on disc, since I can "retrieve" it from the server upon
opening a file anyway.

One thing I'm currently undecided on is whether the security tokens for a user
should be attached to the struct file * being used, or whether they should be
retrieved from a list attached to the current process in some way.

> From my (nonexistent) understanding of how AFS works, would it not be a
> whole lot more sensible to implement it as a coda client or something like
> that (with the networking support in-kernel, but with the caching logic
> etc in user space).

See above.

As has been suggested to me, it may be possible to offload just the space
reclamation algorithm to userspace. It may also be possible to put the index
maintainer in userspace, and have afs_iget() call it to search for and add
records. "Callbacks" from the server (which indicate that a file has changed
in some way) could also be passed to the index maintainer so that it can
release all the cache blocks for the changed file.

However, the biggest problem with splitting the caching like this is that
there then has to be some sort of locking between kernel space and userspace
to govern access to the allocation bitmaps (or whatever).

Arjan Van de Ven's suggestion is that all the cached data files should be
exposed as files in the cachefs, which then would have an unlink method
available so that the userspace daemon can tell the kernel side cache manager
to reclaim a particular file.

> I dunno, I just get the feeling that a good AFS client simply cannot be
> done entirely in kernel space, and if you start off like this, you'll
> never get where you really want to go. Pls comment on this (and yeah, the
> comment can be a "Boy, you're really a stupid git, and here's why: xyz",
> but I really want the "xyz" part too ;)

I think it can (and should) be done in the kernel (at least for the most part
- there are auxiliary userspace tools that consult the server directly). See
the reasons given above.

> Now, admittedly maybe the user-space daemon approach is crap, and what we
> really want is to have some way to cache network stuff on the disk
> directly from the kernel, ie just implement a real mapping/page-indexed
> cachefs that people could mount and use together with different network
> filesystems.

Hmmm... Interesting idea. There is the problem of working out which files
belong to what source. The AFS filesystem has a three tier approach to
identifying the source of a file: {cell,volume,vnode}, where any given volume
can be on more than one server in a particular cell, and a vnode is the
equivalent of an inode. I suppose NFS, say, could be handled similarly:
{server,export,inode}, and SMB would probably be {server,share,filename}.

The biggest hurdle here is the difference in potential record lengths:-/

		CELL RECORD	CONSISTS OF
	AFS	64 + 16*4	name + 16 volume location servers
	NFS	4		IPv4 address
	SMB	?		server name (maybe just IP address)

		VLDB RECORD	CONSISTS OF
	AFS	64 + 64		volume name, numbers and server info
	NFS	4096?		export path length
	SMB	?		share name

		VNODE INDEX	CONSISTS OF
	AFS	4 + 4		vnode ID number, vnode ID version
	NFS	8		inode number
	SMB	4096?		full file name within share
 or	SMB	4 + 256		cache dir index and filename

To have a heterogenous cache, the VLDB record and vnode index records could be
extended to 2K or 4K in size, or maybe separate catalogues and indices could
be maintained for different filesystem types, and a 0th tier could be a
catalogue of different types held within this cache, complete with information
as to the entry sizes of the tier 1, 2 and 3 catalogues.
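
One way to picture the "0th tier" is a small table of filesystem types and
their per-tier record sizes. The sketch below takes the AFS and NFS figures
from the tables above (including the tentative 4096 for the NFS export
path) and leaves SMB out because its sizes are not given; the struct and
function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct fs_type_entry {
    const char *name;
    size_t cell_rec_size;    /* tier 1 */
    size_t vldb_rec_size;    /* tier 2 */
    size_t vnode_rec_size;   /* tier 3 */
};

/* Tier 0: which filesystem types live in this cache, and how big their
 * records are in each of the three catalogues. */
static const struct fs_type_entry tier0[] = {
    { "afs", 64 + 16 * 4, 64 + 64, 4 + 4 },
    { "nfs", 4,           4096,    8     },
};

const struct fs_type_entry *tier0_lookup(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(tier0) / sizeof(tier0[0]); i++)
        if (strcmp(tier0[i].name, name) == 0)
            return &tier0[i];
    return NULL;   /* type not held in this cache */
}
```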

David


[-- Attachment #2: Type: text/plain, Size: 1806 bytes --]


THE CACHE LAYOUT
================

The local cache will be structured as a set of files:

  (1) Meta-data file. Contains meta data records (~=inodes) for every "file"
      in the cache (including itself).

      Each meta-data record contains a set of direct block pointers, an
      indirect block pointer and a double indirect block pointer by which the
      data to which it points can be located on disc.

      There will always be enough direct pointers to refer to all the blocks
      in a directory directly.

  (2) Cell cache catalogue. Any cell for which we have data cached will be
      recorded in this file.

  (3) Volume location catalogue. Any volume for which we have data cached will
      be recorded in this file. Each VL entry points to the cell record to
      which it belongs.

  (4) A set of indexes (hash table type thing) with fixed size records as
      small as I can make them that cross-reference a volume location and a
      vnode number with an entry in the meta-data file that describes where to
      find the cached data on disc.

      Index entries are also time stamped to show the last time they were
      accessed.

  (5) The cached vnodes (files/dirs/symlinks) themselves.

      Note that in the case of a cached file, any hole in the file _actually_
      represents a page not yet fetched from the server.
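
The index in (4) might look roughly like the sketch below. The field widths,
the use of volume number 0 to mark a free slot, and the names index_rec
and index_scan_block() are all assumptions for illustration. Because every
record is fixed-size and self-contained, each block can be scanned on its
own, in whatever order the blocks arrive from the disc.

```c
#include <stddef.h>
#include <stdint.h>

struct index_rec {
    uint32_t volume;   /* volume location entry; 0 = free slot (assumed) */
    uint32_t vnode;    /* vnode number within that volume */
    uint32_t meta;     /* entry in the meta-data file */
    uint32_t atime;    /* last access time stamp */
};

/*
 * Scan one block's worth of fixed-size records for {volume, vnode}.
 * Returns the matching record, or NULL if this block does not hold it.
 */
const struct index_rec *index_scan_block(const struct index_rec *recs,
                                         size_t nrecs,
                                         uint32_t volume, uint32_t vnode)
{
    size_t i;

    for (i = 0; i < nrecs; i++) {
        if (recs[i].volume == 0)
            continue;   /* skip free slots */
        if (recs[i].volume == volume && recs[i].vnode == vnode)
            return &recs[i];
    }
    return NULL;
}
```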

There is also a bitmap to indicate which blocks are currently allocated.

I wasn't planning on storing user accessibility data in the cache itself,
though I can always change my mind later, because this differs from user to
user, and is subject to change without notice from one attempt to read to
another due to the lack of notifications from the protection server when ACLs
change. This sort of thing will have to be stored in each "struct file".

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH] AFS filesystem for Linux (2/2)
  2002-10-03  9:05   ` David Howells
@ 2002-10-03 16:53     ` Jan Harkes
  2002-10-03 17:45       ` Jan Harkes
                         ` (3 more replies)
  2002-10-06 16:49     ` Troy Benjegerdes
  1 sibling, 4 replies; 317+ messages in thread
From: Jan Harkes @ 2002-10-03 16:53 UTC (permalink / raw)
  To: David Howells; +Cc: Linus Torvalds, linux-kernel

On Thu, Oct 03, 2002 at 10:05:39AM +0100, David Howells wrote:
> Linus Torvalds wrote:
> > On Wed, 2 Oct 2002, David Howells wrote:
> > >
> > > This patch adds an Andrew File System (AFS) driver to the
> > > kernel. Currently it only provides read-only, uncached, non-automounted
> > > and unsecured support.
> >
> > Are you sure this is the right way to go?
> 
> I think so. I think it makes sense for the AFS VFS-interface to go as directly
> as possible to the network without having to make context switches to get into
> userspace.

So you want to eventually link kerberos into the kernel to get the
security right?

> > As far as I can tell, this is a dead end, because we fundamentally cannot
> > do the local backing store from the kernel.
> 
> I disagree. I think we can (besides which OpenAFS does so), and that most of
> it is probably easier to do here than in userspace. For example:
> 
>  (*) Readpage
> 
>      The filesystem can either use a BIO to read directly into a new page if
>      the page is already in the cache, or it can read from across the network,
>      and then use a BIO to write the updated page into the cache. This should
>      avoid as much page-aliasing as possible.

Coda 'solves' the page-aliasing issues by passing the kernel the same
file descriptor as it is using itself to put the data into the container
(cache) file. You could do the same and tell the kernel what the
'expected size' is, it can then block or trigger further fetches when
that part of the file isn't available yet.

We don't need to do it at such a granularity because of the
disconnected operation. It is more reliable as we can return a stale
copy when we lose the network halfway during the fetch.

>  (*) Writepage
> 
>      The filesystem can use a BIO to write a page into the cache, whilst
>      simultaneously dispatching it across the network. Alternatively, it can
>      write the page to the cache (fast) immediately and queue it up to be sent
>      across the network (slow) from the cache if memory pressure is high.

Hmm, a version of AFS that doesn't adhere to AFS semantics, interesting.
Are you going to emulate the same broken behaviour as transarc AFS on
O_RDWR? Basically when you open a file O_RDWR and write some data, and
anyone else 'commits' an update to the file before you close the
filehandle. Your client writes back the previously committed data, which
it has proactively fetched, but with the local metadata (i.e. i_size).
So you end up with something that closely resembles neither of the actual
versions that were written.

>  (*) Index searching
> 
>      The cache needs to keep an on disc index if the contents are to survive
>      rebooting. This can be searched more efficiently from within the kernel.
> 
>      I've written a scanning algorithm that can scan any file for fixed length
>      records in a manner that allows the disc blocks to be scanned in whatever
>      order they come off of the disc. This should make scanning the index
>      files faster, and may not actually be possible in userspace.

Can you say hack. Different underlying filesystems will lay out their
data differently, who says that ext3 with the dirindex hashes or
reiserfs, or foofs will not suddenly break your solution and still work
reliably (and faster) from userspace. When you scan a file from userspace
the kernel will give you readahead, and with a well working elevator
any 'improvements' you obtain really should end up in the noise.

>  (*) On-disc file layout
> 
>      The on-disc file layout I'm using is the same as many other unix
>      filesystems (direct, indirect and double-indirect block pointers) and is
>      fairly simple. The biggest difference where I see a hole, I know I have
>      to fetch the page from the server, rather that just assuming there's an
>      implicit empty page there.

Intermezzo does the same thing, they even proposed a 'punch hole'
syscall to allow a userspace daemon to 'invalidate' parts of a file so
that the kernel will send the upcall to refetch the data from the server.

>      I'm fetching files a page at a time (on demand from the VM), though I may
>      extend this to get bunches of pages for efficiency reasons.

VM/VFS will handle appropriate readahead for you, you might just want to
join the separate requests into one bigger request.

>      moment it is closed. This has two definite disadvantages: you can't open
>      a file bigger than the remaining space in the cache, and the size of the
>      cache and the sizes of the files opened limits the number of files you
>      can have open.

And one definite advantage, you actually provide AFS session semantics.

> > Now, admittedly maybe the user-space daemon approach is crap, and what we
> > really want is to have some way to cache network stuff on the disk
> > directly from the kernel, ie just implement a real mapping/page-indexed
> > cachefs that people could mount and use together with different network
> > filesystems.
> 
> Hmmm... Interesting idea. There is the problem of working out which files
> belong to what source. The AFS filesystem has a three tier approach to
> identifying the source of a file: {cell,volume,vnode}, where any given volume
> can be on more than one server in a particular cell, and a vnode is the
> equivalent of an inode. I suppose NFS, say, could be handled similarly:
> {server,export,inode}, and SMB would probably be {server,share,filename}.

And my current development version of Coda has {cell,volume,vnode,unique}
(128 bits), which is the same size as a UUID which was designed to have
a very high probability of uniqueness. So if I ever consider adding
another 'ident', I'll just switch to identifying each object with a
UUID.

> The biggest hurdle here is the difference in potential record lengths:-/
> 
> 		CELL RECORD	CONSISTS OF
> 	AFS	64 + 16*4	name + 16 volume location servers
> 	NFS	4		IPv4 address
> 	SMB	?		server name (maybe just IP address)

How about IPv6?

> To have a heterogenous cache, the VLDB record and vnode index records could be
> extended to 2K or 4K in size, or maybe separate catalogues and indices could
> be maintained for different filesystem types, and a 0th tier could be a
> catalogue of different types held within this cache, complete with information
> as to the entry sizes of the tier 1, 2 and 3 catalogues.

Or you could use a hash or a userspace daemon that can map a fs-specific
handle to a local cache file.

Jan


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH] AFS filesystem for Linux (2/2)
  2002-10-03 16:53     ` Jan Harkes
@ 2002-10-03 17:45       ` Jan Harkes
  2002-10-03 21:46       ` David Howells
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 317+ messages in thread
From: Jan Harkes @ 2002-10-03 17:45 UTC (permalink / raw)
  To: linux-kernel

On Thu, Oct 03, 2002 at 12:53:04PM -0400, Jan Harkes wrote:
> On Thu, Oct 03, 2002 at 10:05:39AM +0100, David Howells wrote:
> > To have a heterogenous cache, the VLDB record and vnode index records could be
> > extended to 2K or 4K in size, or maybe separate catalogues and indices could
> > be maintained for different filesystem types, and a 0th tier could be a
> > catalogue of different types held within this cache, complete with information
> > as to the entry sizes of the tier 1, 2 and 3 catalogues.
> 
> Or you could use a hash or a userspace daemon that can map a fs-specific
> handle to a local cache file.

I just thought of the fh_to_dentry stuff that is used by knfsd. Those
fh keys should be just the right (and fs independent) thing to index
such a generic fs-cache with.

Jan


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH] AFS filesystem for Linux (2/2)
  2002-10-03 16:53     ` Jan Harkes
  2002-10-03 17:45       ` Jan Harkes
@ 2002-10-03 21:46       ` David Howells
  2002-10-04  8:13       ` David Howells
       [not found]       ` <15381.1033681790@warthog.cambridge.redhat.com>
  3 siblings, 0 replies; 317+ messages in thread
From: David Howells @ 2002-10-03 21:46 UTC (permalink / raw)
  To: David Howells, Linus Torvalds, linux-kernel

Hi Jan,

Do I take it you were (partially) responsible for Coda development? I have to
admit I don't know much about Coda.

> So you want to eventually link kerberos into the kernel to get the
> security right?

That's unnecessary judging by OpenAFS. AFAICT, only the ticket needs to be
cached in the kernel (this is obtained by means of a userspace program), and
then the ticket is passed through the security challenge/response mechanism
provided by RxRPC.

Otherwise, the entire network side of OpenAFS would have to be in userspace
too, I suspect.

It may be possible to offload the security aspects to userspace. I'll have to
think about that.

Besides, I get the impression that NFSv4 may require some level of kerberos
support in the kernel.

> Coda 'solves' the page-aliasing issues by passing the kernel the same file
> descriptor as it is using itself to put the data into the container (cache)
> file. You could do the same and tell the kernel what the 'expected size' is,
> it can then block or trigger further fetches when that part of the file
> isn't available yet.

I presume Coda uses a 1:1 mapping between Coda files and cache files stored
under a local filesystem (such as EXT3). If so, how do you detect holes in the
file, given that the underlying fs doesn't permit you to differentiate between
a hole and a block of zeros?

> We don't need to do it at such a granularity because of the disconnected
> operation. It is more reliable as we can return a stale copy when we lose
> the network halfway during the fetch.

OTOH, if you have a copy that you know is now out of date, then one could
argue that you shouldn't let the user/application see it, as they/it are then
basing anything they do on known "bad" data.

Should I also take it that Coda keeps the old file around until it has fetched
a revised copy? If so, then surely you can't update a file unless your cache
can find room for the entire revised copy. Surely another consequence of this
is that the practical maximum file size you can deal with is half the size of
your cache.

> Hmm, a version of AFS that doesn't adhere to AFS semantics, interesting.
> Are you going to emulate the same broken behaviour as transarc AFS on
> O_RDWR? Basically when you open a file O_RDWR and write some data, and
> anyone else 'commits' an update to the file before you close the
> filehandle. Your client writes back the previously committed data, which it
> has proactively fetched, but with the local metadata (i.e. i_size).  So you
> end up with something that closely resembles neither of the actual versions
> that were written.

What I'm intending to do is have the write VFS method attempt to write the new
data direct to the server and to the cache simultaneously where possible. If
the volume is not available for some reason, I have a number of choices:

 (1) Make the write block until the volume becomes available again.

 (2) Immediately(-ish) fail with an error.

 (3) Store the write in the cache and try and sync up with the volume when it
     becomes available again.

However, with shared writable mappings, this isn't necessarily possible as we
can only really get the data when the VM prods our writepage(s) method. In
this case, we have another choice:

 (4) "Diff" the page in the pagecache against a copy stored in the cache and
     try to send the changes to the server.

Using disconnected operation doesn't actually make this any easier. The
problem of how and when write conflicts are resolved still arises.

There is a fifth option, and that is to try to lock the target file against
other accessors whilst we are trying to write to it (prepare/commit write
maybe).

> Different underlying filesystems will lay out their data differently, who
> says that ext3 with the dirindex hashes or reiserfs, or foofs will not
> suddenly break your solution and still work reliably (and faster) from
> userspace.

Because (and I may not have made this clear) you nominate a block device as
the cache, not an already existing filesystem, and mount it as afscache
filesystem type. _This_ specifies the layout of the cache, and so whatever
other filesystems do is irrelevant.

> Can you say hack.

No need to. I can go direct to the block device through the BIO system, and so
can throw a heap of requests at the blockdev and deal with them as they
complete, in the order they are read off of the disc when scanning catalogues.

> When you scan a file from userspace the kernel will give you readahead, and
> with a well working elevator any 'improvements' you obtain really should end
> up in the noise.

Since I can fire off several requests simultaneously, I effectively obtain a
readahead type of effect, and since I don't have to follow any ordering
constraints (my catalogues are unordered), I can deal with the blocks in
whatever order the elevator delivers them to me.

> Intermezzo does the same thing, they even proposed a 'punch hole' syscall to
> allow a userspace daemon to 'invalidate' parts of a file so that the kernel
> will send the upcall to refetch the data from the server.

I don't need a hole punching syscall or ioctl. Apart from the fact that the
filesystem is already in the kernel and doesn't require a syscall, the cache
filesystem has to discard an entire file as a whole when it notices or is told
of a change.

> VM/VFS will handle appropriate readahead for you, you might just want to
> join the separate requests into one bigger request.

Agreed. That would be a reasonable way of doing it. The reason I thought of
doing it the way I suggested is that I could make the block size bigger in the
cache, and thus reduce index-walking latency for adjacent pages.

> And one definite advantage, you actually provide AFS session semantics.

According to the AFS-3 Architectural Overview, "AFS does _not_ provide for
completely disconnected operation of file system clients" [their emphasis].

Furthermore, the overview also talks about "Chunked Access", in which it
allows files to be pulled over to the client and pushed back to the server in
chunks of 64Kb, thus allowing "AFS files of any size to be accessed from a
client".

Note that 64Kb is also a "default" that can be configured.

It also mentions that the read-entire-file notion was dropped, giving some of
the reasons I've mentioned.
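
As a small worked example of chunked access, the chunks touched by a byte
range can be computed as below. The 65536-byte constant follows the default
mentioned above; the names AFS_CHUNK_SIZE and chunks_for() are invented for
the sketch and are not taken from AFS itself.

```c
#include <stdint.h>

#define AFS_CHUNK_SIZE 65536u   /* the configurable default discussed above */

struct chunk_range {
    uint64_t first;   /* index of the first chunk touched */
    uint64_t last;    /* index of the last chunk touched */
};

/* Chunks covered by the byte range [offset, offset + len), for len > 0. */
struct chunk_range chunks_for(uint64_t offset, uint64_t len)
{
    struct chunk_range r;

    r.first = offset / AFS_CHUNK_SIZE;
    r.last = (offset + len - 1) / AFS_CHUNK_SIZE;
    return r;
}
```

For instance, a 100000-byte access starting at offset 70000 touches chunks
1 and 2 only, so nothing else of the file needs to be in the cache.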

> And my current development version of Coda has {cell,volume,vnode,unique}
> (128 bits), which is the same size as a UUID which was designed to have a
> very high probability of uniqueness. So if I ever consider adding another
> 'ident', I'll just switch to identifying each object with a UUID.

Does this mean that every Coda cell is issued with a 4-byte ID number? Or does
there need to be an additional index in the cache?

> How about IPv6?

These were just examples I know fairly well to illustrate the problems.

> Or you could use a hash or a userspace daemon that can map a fs-specific
> handle to a local cache file.

You still have to store a hash somewhere, and if it's stored in a userspace
daemon's VM, then it'll probably end up being swapped out to disc, and it may
have to be regenerated from indices every time the daemon is restarted (or
else your cache has to be started afresh).

Thanks for your insights though.

Cheers,
David


* Re: [PATCH] AFS filesystem for Linux (2/2)
  2002-10-03 16:53     ` Jan Harkes
  2002-10-03 17:45       ` Jan Harkes
  2002-10-03 21:46       ` David Howells
@ 2002-10-04  8:13       ` David Howells
       [not found]       ` <15381.1033681790@warthog.cambridge.redhat.com>
  3 siblings, 0 replies; 317+ messages in thread
From: David Howells @ 2002-10-04  8:13 UTC (permalink / raw)
  To: Jan Harkes; +Cc: David Howells, Linus Torvalds, linux-kernel


Hi Jan,

> Coda 'solves' the page-aliasing issues by passing the kernel the same file
> descriptor as it is using itself to put the data into the container (cache)
> file. You could do the same and tell the kernel what the 'expected size' is,
> it can then block or trigger further fetches when that part of the file
> isn't available yet.

True. But how should I tell when a page isn't available yet, if I do as Coda
appears to do and directly use a file on another local filesystem (for Coda,
this doesn't matter since it maintains the entire file in its cache).

As for writing, surely Coda's solution is even worse? Since the pages in
question are "owned" by the backing cache file as far as the VM is concerned,
and all writes to a Coda file are ducted directly through to the backing cache
file, how do you (a) tell when the file has changed, and (b) tell which bits
of the file need to be sent back to the server? Do you send the entire file
back every time it gets changed? Do you keep a copy and "diff" the two?

> Basically when you open a file O_RDWR and write some data, and anyone else
> 'commits' an update to the file before you close the filehandle. Your client
> writes back the previously committed data, which it has proactively fetched,
> but with the local metadata (i.e. i_size).

This appears to be a flaw in the protocol, and it's going to be a problem, no
matter where the writes are done.

David


* Re: [PATCH] AFS filesystem for Linux (2/2)
       [not found]       ` <15381.1033681790@warthog.cambridge.redhat.com>
@ 2002-10-04 14:02         ` Jan Harkes
  2002-10-04 14:40           ` Trond Myklebust
  2002-10-04 15:34           ` David Howells
  0 siblings, 2 replies; 317+ messages in thread
From: Jan Harkes @ 2002-10-04 14:02 UTC (permalink / raw)
  To: David Howells; +Cc: linux-kernel

On Thu, Oct 03, 2002 at 10:49:50PM +0100, David Howells wrote:
> Do I take it you were (partially) responsible for Coda development? I have to
> admit I don't know much about Coda.

Yeah, you could say I'm pretty much responsible for where we're going
with Coda nowadays. Don't know whether that's a good thing, because I
can be a bit stubborn on some things sometimes ;)
> 
> > So you want to eventually link kerberos into the kernel to get the
> > security right?
> 
> That's unnecessary judging by OpenAFS. AFAICT, only the ticket needs to be
> cached in the kernel (this is obtained by means of a userspace program), and
> then the ticket is passed through the security challenge/response mechanism
> provided by RxRPC.

The challenge response then might be the same as Coda's, a modified
Needham-Schroeder handshake based on a shared secret which in your case
is obtained through kerberos. But I thought Kerberos was better
integrated in AFS and they actually were even using krb encryption for
the rxrpc packets (i.e. how are packets kept secure after the handshake?)

> Besides, I get the impression that NFSv4 may require some level of kerberos
> support in the kernel.

If not kerberos, at least some form of encryption. IP/Sec would need the
same things.

> > Coda 'solves' the page-aliasing issues by passing the kernel the same file
> > descriptor as it is using itself to put the data into the container (cache)
> > file. You could do the same and tell the kernel what the 'expected size' is,
> > it can then block or trigger further fetches when that part of the file
> > isn't available yet.
> 
> I presume Coda uses a 1:1 mapping between Coda files and cache files stored
> under a local filesystem (such as EXT3). If so, how do you detect holes in the
> file, given that the underlying fs doesn't permit you to differentiate between
> a hole and a block of zeros.

We don't. Coda has the all or nothing fetch. When we get an open upcall
and the file isn't cached we get the whole file, and return the
filehandle we just used to fetch the file, that way even when pages
haven't been flushed to disk yet, the kernel will see the same data. All
reads and writes are wrapped in such a way that readpage and writepage
directly access the cache file, when an mmap is active the Coda inode
isn't even touched anymore.

> > We don't need to do it at such a granularity because of the disconnected
> > operation. It is more reliable as we can return a stale copy when we lose
> > the network halfway during the fetch.
> 
> OTOH, if you have a copy that you know is now out of date, then one could
> argue that you shouldn't let the user/application see it, as they/it are then
> basing anything they do on known "bad" data.

Correct, but it is part of the Coda semantics, while disconnected from
the network/servers you are given the last cached version of a file. 

> Should I also take it that Coda keeps the old file around until it has
> fetched a revised copy? If so, then surely you can't update a file
> unless your cache can find room for the entire revised copy.

Actually, throwing out the old copy is still done, but I consider that a
bug. And from numerous reports I've seen, others consider it a bug as
well. That is mainly because people who use Coda actively at some point
start to expect Coda's semantic model and know that when they open a
file that is cached they should get something, even when the wire is
pulled (or the server dies).

> another consequence of this is that the practical maximum file size
> you can deal with is half the size of your cache.

Pretty much, but we need that extra space for the disconnected writes
anyways, that way we can always roll back to a consistent version.

> > Hmm, a version of AFS that doesn't adhere to AFS semantics, interesting.
> > Are you going to emulate the same broken behaviour as transarc AFS on
> > O_RDWR?

Ok, traditional AFS semantics is 'session semantics'. Very strict,
whenever you read from a file, everything is consistent wrt. the time
you opened the file. Whatever you write isn't committed on the server
until you close the file. This model has great advantages such as
minimizing network traffic, giving lock free read/write consistency.

The problems with the model are, you might be looking at stale data, or
you might overwrite someone else's changes (last writer wins). To avoid
the first you would need to reopen the file regularly, to avoid the
second you need to lock files.
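
The session semantics described above can be sketched as a toy model: each
open takes a private snapshot, reads only ever see that snapshot, and a dirty
snapshot replaces the server copy on close. This is an illustration under
those assumptions only; struct server_file, struct session and the session_*()
helpers are hypothetical names, not AFS or Coda interfaces.

```c
#include <string.h>

#define FILE_MAX 64

/* The server's committed copy of one small file. */
struct server_file {
    char     data[FILE_MAX];
    unsigned version;
};

/* One open file on one client. */
struct session {
    struct server_file *srv;
    char copy[FILE_MAX];    /* snapshot taken at open time */
    int  dirty;             /* set once the snapshot has been modified */
};

/* open: snapshot the server copy; later reads are consistent with this. */
static void session_open(struct session *s, struct server_file *srv)
{
    s->srv = srv;
    memcpy(s->copy, srv->data, FILE_MAX);
    s->dirty = 0;
}

/* read: only the snapshot is visible, never newer server data. */
static const char *session_read(const struct session *s)
{
    return s->copy;
}

/* write: changes the snapshot only; nothing reaches the server yet. */
static void session_write(struct session *s, const char *text)
{
    strncpy(s->copy, text, FILE_MAX - 1);
    s->copy[FILE_MAX - 1] = '\0';
    s->dirty = 1;
}

/* close: a dirty snapshot replaces the server copy (last closer wins). */
static void session_close(struct session *s)
{
    if (s->dirty) {
        memcpy(s->srv->data, s->copy, FILE_MAX);
        s->srv->version++;
    }
}
```

With two writers open at once, whichever closes last silently replaces the
other's data, and a reader that opened earlier keeps seeing the old contents
until it reopens the file.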

However, once AFS became commercial, there was a lot of pressure from
new users to provide something more similar to UNIX semantics. So
whenever a callback was received from the servers, and the file was
open for reading, the client would pull in the latest version. That way
processes (shared databases?) didn't need to do close/open to see new
updates. (oops, lost the read consistency)

Now people started throwing big databases in the filesystem, and the
cache issues became important. So they introduced 'chunked access',
dirty chunks are still written when the file is closed, but also when
the cache is full (oops, lost write consistency).

When a file is open for both reading and writing you get effects like
the one I described above. With a full cache on the writing side you can
imagine even more interesting interactions, where we end up with a file
that contains alternating 64KB blocks of both files. So the nice
combination of these two subtle changes made the AFS3 file system in my
opinion unreliable and unusable. Of course all of this can be avoided by
locking the file, but in that case the original semantics were probably
already more than sufficient.

> What I'm intending to do is have the write VFS method attempt to write the new
> data direct to the server and to the cache simultaneously where possible. If
> the volume is not available for some reason, I have a number of choices:

You don't have the choices if you are trying to implement an AFS
compatible filesystem.

>  (3) Store the write in the cache and try and sync up with the volume when it
>      becomes available again.

That would be sufficient. But you _really_ need to block during the
final close operation when you have to make sure the data is on the
server before returning.

>  (4) "Diff" the page in the pagecache against a copy stored in the cache and
>      try to send the changes to the server.

As far as I know this is impossible without changing the existing AFS
servers.

> I don't need a hole punching syscall or ioctl. Apart from the fact that the
> filesystem is already in the kernel and doesn't require a syscall, the cache
> filesystem has to discard an entire file as a whole when it notices or is told
> of a change.

Doesn't AFS3 give callbacks when it updates a chunk of the file? I guess
it still has retained at least that part of the original semantics, send
callbacks when the file is closed (and the data is 'officially'
committed). It is still up in the air what clients see that read the
file between the chunked writes and the actual file close.

> > And one definite advantage, you actually provide AFS session semantics.
> 
> According to the AFS-3 Architectural Overview, "AFS does _not_ provide for
> completely disconnected operation of file system clients" [their emphasis].

Disconnected operation has never been 'AFS semantics'. That's a Coda thing.

> > And my current development version of Coda has {cell,volume,vnode,unique}
> > (128 bits), which is the same size as a UUID which was designed to have a
> > very high probability of uniqueness. So if I ever consider adding another
> > 'ident', I'll just switch to identifying each object with a UUID.
> 
> Does this mean that every Coda cell is issued with a 4-byte ID number?
> Or does there need to be an additional index in the cache?

A Coda cell is simply a FQDN. Whenever the userspace cache manager
accesses a new cell it assigns it a locally unique ID, which will exist
as long as there are objects from that cell in the cache.
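
A minimal sketch of such a mapping, assuming a small fixed-size table and a
simple counter for handing out IDs. The names (struct cell_table, cell_get(),
cell_put()) and the table sizes are hypothetical and are not taken from the
Coda sources. The table must be zero-initialised before use.

```c
#include <stdint.h>
#include <string.h>

#define MAX_CELLS 16
#define MAX_NAME  64

struct cell_entry {
    char     fqdn[MAX_NAME];
    uint32_t id;        /* 0 means the slot is unused */
    unsigned objects;   /* cached objects that refer to this cell */
};

struct cell_table {
    struct cell_entry entry[MAX_CELLS];
    uint32_t next_id;   /* last ID handed out (wrap-around not handled) */
};

/* Take a reference on the cell's ID, assigning one on first use.
 * Returns 0 if the table is full or the name is too long. */
static uint32_t cell_get(struct cell_table *t, const char *fqdn)
{
    struct cell_entry *slot = NULL;
    int i;

    for (i = 0; i < MAX_CELLS; i++) {
        struct cell_entry *c = &t->entry[i];

        if (c->id != 0 && strcmp(c->fqdn, fqdn) == 0) {
            c->objects++;
            return c->id;
        }
        if (c->id == 0 && slot == NULL)
            slot = c;
    }
    if (slot == NULL || strlen(fqdn) >= MAX_NAME)
        return 0;
    strcpy(slot->fqdn, fqdn);
    slot->id = ++t->next_id;
    slot->objects = 1;
    return slot->id;
}

/* Drop one reference; the ID disappears with the cell's last object. */
static void cell_put(struct cell_table *t, uint32_t id)
{
    int i;

    for (i = 0; i < MAX_CELLS; i++) {
        struct cell_entry *c = &t->entry[i];

        if (id != 0 && c->id == id) {
            if (--c->objects == 0)
                c->id = 0;
            return;
        }
    }
}
```

Because the ID only lives as long as the cell has cached objects, the same
FQDN can receive a different ID later on, so anything persistent has to be
keyed on the name rather than on the number.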

> You still have to store a hash somewhere, and if it's stored in a userspace
> daemon's VM, then it'll probably end up being swapped out to disc,

Why would it probably be swapped out to disc? If you're really worried
about that you could mlock the memory. And if you think that is too
expensive, it is still better to mlock memory in userspace than to
allocate that same memory in kernel space.

>                                                                    and it may
> have to be regenerated from indices every time the daemon is restarted (or
> else your cache has to be started afresh.

Yeah, that's why Coda is using a recoverable VM, basically a mmapped
file with a log where modifications are recorded so that we can
replay/rollback uncommitted operations when we're restarting.

Jan



* Re: [patch] [kkern] AFS filesystem for Linux (2/2)
  2002-10-03  0:36 ` [PATCH] AFS filesystem for Linux (2/2) Linus Torvalds
  2002-10-03  9:05   ` David Howells
@ 2002-10-04 14:11   ` Patrick Audley
  1 sibling, 0 replies; 317+ messages in thread
From: Patrick Audley @ 2002-10-04 14:11 UTC (permalink / raw)
  To: linux-kernel


    Linus> Now, admittedly maybe the user-space daemon approach is
    Linus> crap, and what we really want is to have some way to cache
    Linus> network stuff on the disk directly from the kernel, ie just
    Linus> implement a real mapping/page-indexed cachefs that people
    Linus> could mount and use together with different network
    Linus> filesystems.

    Cachefs has been on our most wanted list for a while now in Linux
biocomputing..  Using NFS for huge datasets with cachefs on Solaris is
a breeze and Linux currently offers no alternative
(well... coda/intermezzo are close but not close enough).  A general
purpose cachefs would be beautiful.

                                                Patrick Audley


-- 
`Every program attempts to expand until it can read mail. Those programs
   which cannot so expand are replaced by ones which can.'' 
                  - The Law of Software Development
...
Patrick Audley                paudley@compbio.dundee.ac.uk
Computational Biology         http://www.compbio.dundee.ac.uk
University of Dundee          http://blackcat.ca
Dundee, Scotland              +44 1382 348721



* Re: [PATCH] AFS filesystem for Linux (2/2)
  2002-10-04 14:02         ` Jan Harkes
@ 2002-10-04 14:40           ` Trond Myklebust
  2002-10-04 15:35             ` David Howells
  2002-10-04 15:34           ` David Howells
  1 sibling, 1 reply; 317+ messages in thread
From: Trond Myklebust @ 2002-10-04 14:40 UTC (permalink / raw)
  To: Jan Harkes; +Cc: David Howells, linux-kernel

>>>>> " " == Jan Harkes <jaharkes@cs.cmu.edu> writes:

    >> Besides, I get the impression that NFSv4 may require some level
    >> of kerberos support in the kernel.

     > If not kerberos, at least some form of encryption. IP/Sec would
     > need the same things.

NFSv4 does indeed require the full kerberos encryption stuff in the
kernel. The RFC specifies that krb5 support is a minimum requirement,
and we will expect to have that in 2.6 (or 3.0 or whatever it's called
these days...)

Cheers,
  Trond


* Re: [PATCH] AFS filesystem for Linux (2/2)
  2002-10-04 14:02         ` Jan Harkes
  2002-10-04 14:40           ` Trond Myklebust
@ 2002-10-04 15:34           ` David Howells
  2002-10-04 16:07             ` Jan Harkes
  1 sibling, 1 reply; 317+ messages in thread
From: David Howells @ 2002-10-04 15:34 UTC (permalink / raw)
  To: Jan Harkes; +Cc: David Howells, linux-kernel


Hi Jan,

> We don't. Coda has the all or nothing fetch. When we get an open upcall and
> the file isn't cached we get the whole file, and return the filehandle we
> just used to fetch the file, that way even when pages haven't been flushed
> to disk yet, the kernel will see the same data. All reads and writes are
> wrapped in such a way that readpage and writepage directly access the cache
> file, when an mmap is active the Coda inode isn't even touched anymore.

So, say you've got a really big file in your cache. Someone changes a single
byte in it. You can work out that the file has changed by one of a number of
means. How do you avoid having to throw the entire file back to the server?

> Pretty much, but we need that extra space for the disconnected writes
> anyways, that way we can always roll back to a consistent version.

How do you sync up with the server on reconnection? This is similar to the
problem you pointed out that I have to solve - how to deal with the file on
the server being changed by another client whilst I'm also trying to write to
it.

> Ok, traditional AFS semantics is 'session semantics'. Very strict, whenever
> you read from a file, everything is consistent wrt. the time you opened the
> file. Whatever you write isn't committed on the server until you close the
> file. This model has great advantages such as minimizing network traffic,
> giving lock free read/write consistency.

Whatever model you choose, you have to accept some compromises.

The model I'm thinking of is as follows:

 (*) Data that the client doesn't have immediately to hand (ie is not yet
     cached) is fetched a chunk at a time upon request of the VM (thus
     allowing data readahead to be driven by the VM).

 (*) All data cached by a client for a particular file is zapped if I get a
     callback from the server and/or the data version number of that file
     appears to have changed.

 (*) In O_SYNC mode, data is written back to the server as promptly as
     possible within the write() call (maybe through the auspices of
     prepare_write and commit_write).

 (*) In non-O_SYNC mode, I would like the data to be written back through the
     page cache's writepage() routine(s). By setting the dirty bits on pages,
     the write will be scheduled by the VM at some point. This would permit
     better write coalescing locally. However, security becomes a problem,
     since I have to say to the server which user I'm doing a store as, and if
     the data is coalesced from writes done as several different users, then
     there could be a problem if the store is rejected.

     How does Coda deal with this security problem?

I admit there are a number of problems with this model that might be
alleviated by better operations being available in the AFS spec (such as data
insertion without having to nominate a new EOF position, and data appending
without needing to know the old or specify a new EOF position).
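
The invalidation rule in the model above (zap everything cached for a file on
a callback or on a data version change) can be sketched as follows. This is a
simplified illustration, not code from the patch: struct cached_file, the
cache_*() helpers and the fixed chunk count are all hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define CHUNKS_PER_FILE 64

/* Per-file cache state: which chunks are held, and for which version. */
struct cached_file {
    uint64_t data_version;              /* version the chunks belong to */
    uint8_t  present[CHUNKS_PER_FILE];  /* 1 if the chunk is cached */
};

static void cache_init(struct cached_file *f, uint64_t version)
{
    memset(f->present, 0, sizeof f->present);
    f->data_version = version;
}

/* Discard every cached chunk of the file. */
static void cache_zap(struct cached_file *f)
{
    memset(f->present, 0, sizeof f->present);
}

/* Called whenever the server reports the file's data version. */
static void cache_note_version(struct cached_file *f, uint64_t server_version)
{
    if (server_version != f->data_version) {
        cache_zap(f);
        f->data_version = server_version;
    }
}

/* Called when the server breaks the callback: "something changed". */
static void cache_callback_broken(struct cached_file *f)
{
    cache_zap(f);
}

static void cache_store_chunk(struct cached_file *f, unsigned chunk)
{
    if (chunk < CHUNKS_PER_FILE)
        f->present[chunk] = 1;
}

static int cache_has_chunk(const struct cached_file *f, unsigned chunk)
{
    return chunk < CHUNKS_PER_FILE && f->present[chunk];
}
```

Nothing here distinguishes clean chunks from locally modified ones; a real
client would have to decide what happens to dirty data before zapping it.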

> Now people started throwing big databases in the filesystem, and the cache
> issues became important. So they introduced 'chunked access', dirty chunks
> are still written when the file is closed, but also when the cache is full
> (oops, lost write consistency).

Anyone using a network filesystem of any type to store big databases is
probably just asking for trouble. IMHO they're far better off talking to a
distributed DB through its own network access protocol. But that's beside the
point.

> >  (4) "Diff" the page in the pagecache against a copy stored in the cache
> >	 and try to send the changes to the server.
> 
> As far as I know this is impossible without changing the existing AFS
> servers.

It can be done entirely in the client, provided it has a copy of the
unmodified page still in its cache.
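
A minimal sketch of that client-side comparison, assuming the pristine copy
and the modified page are both in memory: it finds the smallest contiguous
byte range that differs, which is what would then be stored to the server.
page_diff() and struct dirty_range are hypothetical names used only for this
illustration.

```c
#include <stddef.h>

/* Smallest contiguous range covering every changed byte; len 0 = no change. */
struct dirty_range {
    size_t start;
    size_t len;
};

static struct dirty_range page_diff(const unsigned char *pristine,
                                    const unsigned char *modified,
                                    size_t page_size)
{
    struct dirty_range r = { 0, 0 };
    size_t first = 0;
    size_t last = page_size;

    /* Skip the identical prefix. */
    while (first < page_size && pristine[first] == modified[first])
        first++;
    if (first == page_size)
        return r;           /* pages are identical */

    /* Skip the identical suffix. */
    while (last > first && pristine[last - 1] == modified[last - 1])
        last--;

    r.start = first;
    r.len = last - first;
    return r;
}
```

Two changed bytes far apart in the page still produce one range spanning both,
so this trades some extra bytes on the wire for a single store operation.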

> Doesn't AFS3 give callbacks when it updates a chunk of the file? I guess it
> still has retained at least that part of the original semantics, send
> callbacks when the file is closed (and the data is 'officially'
> committed). It is still up in the air what clients see that read the file
> between the chunked writes and the actual file close.

All it says is that a given file has changed. It doesn't provide a clue as to
where. Hence the entire file has to be flushed.

> Disconnected operation has never been 'AFS semantics'. That's a Coda thing.

I didn't say it was. It's explicitly denied in the AFS docs, though it's
discussed under the future developments section.

> A Coda cell is simply a FQDN, whenever the userspace cachemanager accesses a
> new cell it assigns it a locally unique ID, which will exist as long as there
> are objects from that cell in the cache.

The 32-bit ID still has to be mapped through a catalogue somewhere (may be a
text file that the cache manager reads on starting), and if the catalogue is
external to the cache, the catalogue may change what the ID corresponds to
without the cache being invalidated.

> Why would it probably be swapped out to disc? If you're really worried about
> that you could mlock the memory. And if you think that is too expensive, it
> is still better to mlock memory in userspace than to allocate that same
> memory in kernel space.

I'm storing mine on disc as do normal disc-based FS's. That means it can be a
lot bigger. Besides, you don't really want to mlock memory or store it in the
kernel - that would be a big chunk of memory permanently committed and
unavailable for other uses.

> Yeah, that's why Coda is using a recoverable VM, basically a mmapped
> file with a log where modifications are recorded so that we can
> replay/rollback uncommitted operations when we're restarting.

What's "VM" in this context?

David


* Re: [PATCH] AFS filesystem for Linux (2/2)
  2002-10-04 14:40           ` Trond Myklebust
@ 2002-10-04 15:35             ` David Howells
  2002-10-04 15:53               ` Trond Myklebust
  2002-10-04 16:30               ` Andreas Dilger
  0 siblings, 2 replies; 317+ messages in thread
From: David Howells @ 2002-10-04 15:35 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Jan Harkes, David Howells, linux-kernel


> NFSv4 does indeed require the full kerberos encryption stuff in the
> kernel. The RFC specifies that krb5 support is a minimum requirement, and we
> will expect to have that in 2.6 (or 3.0 or whatever it's called these
> days...)

Might this be something I can make use of for my AFS filesystem too?

David


* Re: [PATCH] AFS filesystem for Linux (2/2)
  2002-10-04 15:35             ` David Howells
@ 2002-10-04 15:53               ` Trond Myklebust
  2002-10-04 15:56                 ` David Howells
  2002-10-04 16:30               ` Andreas Dilger
  1 sibling, 1 reply; 317+ messages in thread
From: Trond Myklebust @ 2002-10-04 15:53 UTC (permalink / raw)
  To: David Howells; +Cc: Jan Harkes, linux-kernel

>>>>> " " == David Howells <dhowells@cambridge.redhat.com> writes:

    >> NFSv4 does indeed require the full kerberos encryption stuff in
    >> the kernel. The RFC specifies that krb5 support is a minimum
    >> requirement, and we will expect to have that in 2.6 (or 3.0 or
    >> whatever it's called these days...)

     > Might this be something I can make use of for my AFS filesystem
     > too?

Possibly. Our intention is to integrate the RPCSEC_GSS security
protocol (see RFC2203) into the sunrpc code, then use krb5 as one of
the authentication flavours.

Whereas I doubt that AFS uses RPCSEC_GSS, I believe that the kerberos
code itself (+ upcall mechanism for getting user tokens etc.) could be
reused by you. I presume that you would make use of the sunrpc code
too?

Cheers,
  Trond


* Re: [PATCH] AFS filesystem for Linux (2/2)
  2002-10-04 15:53               ` Trond Myklebust
@ 2002-10-04 15:56                 ` David Howells
  2002-10-04 16:03                   ` Trond Myklebust
  0 siblings, 1 reply; 317+ messages in thread
From: David Howells @ 2002-10-04 15:56 UTC (permalink / raw)
  To: trond.myklebust; +Cc: David Howells, Jan Harkes, linux-kernel


> Whereas I doubt that AFS uses RPCSEC_GSS, I believe that the kerberos code
> itself (+ upcall mechanism for getting user tokens etc.) could be reused by
> you. I presume that you would make use of the sunrpc code too?

I would if I could, but RxRPC is a sufficiently different protocol to make
that impractical:-/

David


* Re: [PATCH] AFS filesystem for Linux (2/2)
  2002-10-04 15:56                 ` David Howells
@ 2002-10-04 16:03                   ` Trond Myklebust
  2002-10-04 16:17                     ` David Howells
  0 siblings, 1 reply; 317+ messages in thread
From: Trond Myklebust @ 2002-10-04 16:03 UTC (permalink / raw)
  To: David Howells; +Cc: Jan Harkes, linux-kernel

>>>>> " " == David Howells <dhowells@cambridge.redhat.com> writes:

    >> Whereas I doubt that AFS uses RPCSEC_GSS, I believe that the
    >> kerberos code itself (+ upcall mechanism for getting user
    >> tokens etc.) could be reused by you. I presume that you would
    >> make use of the sunrpc code too?

     > I would if I could, but RxRPC is a sufficiently different
     > protocol to make that impractical:-/

How badly does it differ? If you are talking only about a couple of
differences in the RPC headers, then that is something that can easily
be overcome...

Cheers,
  Trond


* Re: [PATCH] AFS filesystem for Linux (2/2)
  2002-10-04 15:34           ` David Howells
@ 2002-10-04 16:07             ` Jan Harkes
  2002-10-04 16:56               ` David Howells
  0 siblings, 1 reply; 317+ messages in thread
From: Jan Harkes @ 2002-10-04 16:07 UTC (permalink / raw)
  To: David Howells; +Cc: linux-kernel

On Fri, Oct 04, 2002 at 04:34:32PM +0100, David Howells wrote:
> > We don't. Coda has the all or nothing fetch. When we get an open upcall and
> > the file isn't cached we get the whole file, and return the filehandle we
> > just used to fetch the file, that way even when pages haven't been flushed
> > to disk yet, the kernel will see the same data. All reads and writes are
> > wrapped in such a way that readpage and writepage directly access the cache
> > file, when an mmap is active the Coda inode isn't even touched anymore.
> 
> So, say you've got a really big file in your cache. Someone changes a single
> byte in it. You can work out that the file has changed by one of a number of
> means. How do you avoid having to throw the entire file back to the server?

Coda actually writes the whole file back. Someone was working on an
rsync based method, but I haven't heard anything for a while. It is
actually mostly databases that update single bytes in large files.
Anything like a text editor will move the original file to a backup copy
and write a new version anyways. Compilers tend to throw away the old
object files and create a new file, etc.

> > Pretty much, but we need that extra space for the disconnected writes
> > anyways, that way we can always roll back to a consistent version.
> 
> How do you sync up with the server on reconnection? This is similar to the
> problem you pointed out that I have to solve - how to deal with the file on
> the server being changed by another client whilst I'm also trying to write to
> it.

We automatically resolve directory conflicts, file conflicts are 'punted'
to userspace. There is a manual repair tool, where a user can pick the
version that he believes is the most appropriate, or he can substitute a
replacement or merged copy.

There is also a framework for application specific resolvers, these can
try to automatically merge f.i. calendar files, or email boxes, or
remove/recompile object files when there is a conflict, etc.

For AFS you don't have to care, its semantics without locking are that
the 'last writer to close its fd' wins.

> Whatever model you choose, you have to accept some compromises.

I thought you were implementing AFS, how can you choose your model?

> The model I'm thinking of is as follows:
...
>  (*) All data cached by a client for a particular file is zapped if I get a
>      callback from the server and/or the data version number of that file
>      appears to have changed.

How about dirty data, that was locally modified? There is a network
latency that comes into play here. If you want last writer wins you
should not zap dirty data.

>  (*) In O_SYNC mode, data is written back to the server as promptly as
>      possible within the write() call (maybe through the auspices of
>      prepare_write and commit_write).

So other clients will read inconsistent data if they don't have that
chunk cached. And if the server generates callbacks on the write of a
chunk, all clients see the update?

>  (*) In non-O_SYNC mode, I would like the data to be written back through the
>      page cache's writepage() routine(s). By setting the dirty bits on pages,
>      the write will be scheduled by the VM at some point. This would permit
>      better write coalescing locally. However, security becomes a problem,
>      since I have to say to the server which user I'm doing a store as, and if
>      the data is coalesced from writes done as several different users, then
>      there could be a problem if the store is rejected.

Ehh, access permissions should already be checked when the file is
opened. NFS is dealing with the same problems here.

>      How does Coda deal with this security problem?

What problem? If you don't have write permission you can't write to the
file. If you do have write permissions you can write. When the last
writer closes its filehandle the data is sent to the server with his
permissions.

Now if the ACL is changed on the servers before the close so that this
last writer loses write access we get a 'conflict' that is punted to
userspace, similar to the case when writing to an already updated file.

> > Why would it probably be swapped out to disc? If you're really worried about
> > that you could mlock the memory. And if you think that is too expensive, it
> > is still better to mlock memory in userspace than to allocate that same
> > memory in kernel space.
> 
> I'm storing mine on disc as do normal disc-based FS's. That means it can be a
> lot bigger. Besides, you don't really want to mlock memory or store it in the
> kernel - that would be a big chunk of memory permanently committed and
> unavailable for other uses.

So I store it in VM on the disk and you store it in your private FS on
disk. Looks like the same thing to me, except I get to enjoy the
benefits of the page/swapcache keeping the data in memory when it is
frequently used without having to do anything smart.

> > Yeah, that's why Coda is using a recoverable VM, basically a mmapped
> > file with a log where modifications are recorded so that we can
> > replay/rollback uncommitted operations when we're restarting.
> 
> What's "VM" in this context?

Virtual Memory. A private mmap of a file. Updates dirty the pages, so
they end up in swap, but we log the same update in a logfile (journal),
when the log fills up, the changes are written directly to the underlying
file. Optionally, when we know that the swap copy and the file copy of
the page are identical, the page is remapped to reduce swap usage. That
is where we store all the metadata and the Coda file -> cache file
mappings so that they can survive a reboot or system crash.
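
The log-and-replay idea can be illustrated with a very small in-memory redo
log: updates are appended as records, a commit marks them durable, and
recovery applies only committed records to the data image. This is a sketch
of the general technique under those assumptions, not Coda's recoverable VM
implementation; struct rlog and the rlog_*() functions are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define LOG_MAX 32
#define REC_MAX 16

/* One logged modification: 'len' bytes of 'data' at offset 'off'. */
struct log_rec {
    size_t        off;
    size_t        len;
    unsigned char data[REC_MAX];
    int           committed;
};

struct rlog {
    struct log_rec rec[LOG_MAX];
    size_t         n;
};

/* Record an update; returns -1 if the log is full or the update too big. */
static int rlog_append(struct rlog *l, size_t off, const void *data, size_t len)
{
    struct log_rec *r;

    if (l->n == LOG_MAX || len > REC_MAX)
        return -1;
    r = &l->rec[l->n++];
    r->off = off;
    r->len = len;
    r->committed = 0;
    memcpy(r->data, data, len);
    return 0;
}

/* Mark everything logged so far as committed. */
static void rlog_commit(struct rlog *l)
{
    size_t i;

    for (i = 0; i < l->n; i++)
        l->rec[i].committed = 1;
}

/* On restart: replay committed records, ignore uncommitted ones. */
static void rlog_recover(const struct rlog *l, unsigned char *image,
                         size_t image_len)
{
    size_t i;

    for (i = 0; i < l->n; i++) {
        const struct log_rec *r = &l->rec[i];

        if (r->committed && r->off <= image_len &&
            r->len <= image_len - r->off)
            memcpy(image + r->off, r->data, r->len);
    }
}
```

Skipping the uncommitted records during recovery is what gives the rollback
behaviour: a half-finished operation simply never reaches the image.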

Jan



* Re: [PATCH] AFS filesystem for Linux (2/2)
  2002-10-04 16:03                   ` Trond Myklebust
@ 2002-10-04 16:17                     ` David Howells
  2002-10-04 17:04                       ` Trond Myklebust
  0 siblings, 1 reply; 317+ messages in thread
From: David Howells @ 2002-10-04 16:17 UTC (permalink / raw)
  To: trond.myklebust; +Cc: David Howells, Jan Harkes, linux-kernel


> > I would if I could, but RxRPC is a sufficiently different
> > protocol to make that impractical:-/
> 
> How badly does it differ? If you are talking only about a couple of
> differences in the RPC headers, then that is something that can easily
> be overcome...

There appears to be more to it than that. RxRPC has sequencing and ACKing
windows and things that I don't think SUNRPC has (basically it tries to do
some of TCP itself). I've attached the struct definitions for the RxRPC header
and the ACK packet body from my RxRPC implementation for your reference.

Furthermore, RxRPC allows for big binary blobs to be interpolated in the
middle of packets (though if I understand it correctly it effectively encodes
them as an XDR octet array of sorts).
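
As an illustration of one way RxRPC differs from SUNRPC, the cid field in the
header below carries both a connection ID and a channel number, with up to
four calls multiplexed per connection. The sketch uses the mask and shift
values from that header; the cid_*() helper names are hypothetical, and the
cid is assumed to have been converted to host byte order already.

```c
#include <stdint.h>

/* Values as given in the rxrpc_header definition. */
#define RXRPC_MAXCALLS      4
#define RXRPC_CHANNELMASK   (RXRPC_MAXCALLS-1)
#define RXRPC_CIDMASK       (~RXRPC_CHANNELMASK)
#define RXRPC_CIDSHIFT      2

/* Connection part of a cid: everything above the low two bits. */
static uint32_t cid_connection(uint32_t cid)
{
    return (cid & (uint32_t)RXRPC_CIDMASK) >> RXRPC_CIDSHIFT;
}

/* Channel part of a cid: the low two bits, i.e. 0..3. */
static uint32_t cid_channel(uint32_t cid)
{
    return cid & (uint32_t)RXRPC_CHANNELMASK;
}

/* Combine a connection number and a channel number into a cid. */
static uint32_t cid_make(uint32_t conn, uint32_t channel)
{
    return (conn << RXRPC_CIDSHIFT) | (channel & (uint32_t)RXRPC_CHANNELMASK);
}
```

For example, connection 5 and channel 3 combine to a cid of 23, and the two
accessors recover 5 and 3 from it.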

David


struct rxrpc_header
{
	u32	epoch;		/* client boot timestamp */

	u32	cid;		/* connection and channel ID */
#define RXRPC_MAXCALLS		4			/* max active calls per conn */
#define RXRPC_CHANNELMASK	(RXRPC_MAXCALLS-1)	/* mask for channel ID */
#define RXRPC_CIDMASK		(~RXRPC_CHANNELMASK)	/* mask for connection ID */
#define RXRPC_CIDSHIFT		2			/* shift for connection ID */

	u32	callNumber;	/* call ID (0 for connection-level packets) */
#define RXRPC_PROCESS_MAXCALLS	(1<<2)	/* maximum number of active calls per conn (power of 2) */

	u32	seq;		/* sequence number of pkt in call stream */
	u32	serial;		/* serial number of pkt sent to network */

	u8	type;		/* packet type */
#define RXRPC_PACKET_TYPE_DATA		1	/* data */
#define RXRPC_PACKET_TYPE_ACK		2	/* ACK */
#define RXRPC_PACKET_TYPE_BUSY		3	/* call reject */
#define RXRPC_PACKET_TYPE_ABORT		4	/* call/connection abort */
#define RXRPC_PACKET_TYPE_ACKALL	5	/* ACK all outstanding packets on call */
#define RXRPC_PACKET_TYPE_CHALLENGE	6	/* connection security challenge (SRVR->CLNT) */
#define RXRPC_PACKET_TYPE_RESPONSE	7	/* connection security response (CLNT->SRVR) */
#define RXRPC_PACKET_TYPE_DEBUG		8	/* debug info request */
#define RXRPC_N_PACKET_TYPES		9	/* number of packet types (incl type 0) */

	u8	flags;		/* packet flags */
#define RXRPC_CLIENT_INITIATED	0x01		/* signifies a packet generated by a client */
#define RXRPC_REQUEST_ACK	0x02		/* request an unconditional ACK of this packet */
#define RXRPC_LAST_PACKET	0x04		/* the last packet from this side for this call */
#define RXRPC_MORE_PACKETS	0x08		/* more packets to come */
#define RXRPC_JUMBO_PACKET	0x20		/* [DATA] this is a jumbo packet */
#define RXRPC_SLOW_START_OK	0x20		/* [ACK] slow start supported */

	u8	userStatus;	/* app-layer defined status */
	u8	securityIndex;	/* security protocol ID */
	u16	_rsvd;		/* reserved (used by kerberos security as cksum) */
	u16	serviceId;	/* service ID */

} __attribute__((packed));


struct rxrpc_ackpacket
{
	u16	bufferSpace;	/* number of packet buffers available */
	u16	maxSkew;	/* diff between serno being ACK'd and highest serial no received */
	u32	firstPacket;	/* sequence no of first ACK'd packet in attached list */
	u32	previousPacket;	/* sequence no of previous packet received */
	u32	serial;		/* serial no of packet that prompted this ACK */

	u8	reason;		/* reason for ACK */
#define RXRPC_ACK_REQUESTED		1	/* ACK was requested on packet */
#define RXRPC_ACK_DUPLICATE		2	/* duplicate packet received */
#define RXRPC_ACK_OUT_OF_SEQUENCE	3	/* out of sequence packet received */
#define RXRPC_ACK_EXCEEDS_WINDOW	4	/* packet received beyond end of ACK window */
#define RXRPC_ACK_NOSPACE		5	/* packet discarded due to lack of buffer space */
#define RXRPC_ACK_PING			6	/* keep alive ACK */
#define RXRPC_ACK_PING_RESPONSE		7	/* response to RXRPC_ACK_PING */
#define RXRPC_ACK_DELAY			8	/* nothing happened since received packet */
#define RXRPC_ACK_IDLE			9	/* ACK due to fully received ACK window */

	u8	nAcks;		/* number of ACKs */
#define RXRPC_MAXACKS	255

	u8	acks[0];	/* list of ACK/NAKs */
#define RXRPC_ACK_TYPE_NACK		0
#define RXRPC_ACK_TYPE_ACK		1

} __attribute__((packed));
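
As a rough illustration of how the cid field above splits into a connection ID
and a channel number, here is a minimal sketch. It is not taken from the patch:
the helper names rxrpc_cid_to_channel() and rxrpc_cid_to_conn() are made up for
this example, and the value is assumed to have been converted to host byte
order already.

```c
#include <assert.h>
#include <stdint.h>

#define RXRPC_MAXCALLS		4			/* max active calls per conn */
#define RXRPC_CHANNELMASK	(RXRPC_MAXCALLS-1)	/* mask for channel ID */
#define RXRPC_CIDMASK		(~RXRPC_CHANNELMASK)	/* mask for connection ID */
#define RXRPC_CIDSHIFT		2			/* shift for connection ID */

/* the mask arithmetic only works if the call limit is a power of 2 */
static_assert((RXRPC_MAXCALLS & (RXRPC_MAXCALLS - 1)) == 0,
	      "RXRPC_MAXCALLS must be a power of 2");

/* hypothetical helper: the channel is the low two bits of cid */
static inline uint32_t rxrpc_cid_to_channel(uint32_t cid)
{
	return cid & RXRPC_CHANNELMASK;
}

/* hypothetical helper: the connection ID with the channel bits shifted out */
static inline uint32_t rxrpc_cid_to_conn(uint32_t cid)
{
	return (cid & RXRPC_CIDMASK) >> RXRPC_CIDSHIFT;
}
```

With these definitions, a cid of 0x12345676 would be read as channel 2 of the
connection whose shifted ID is 0x048D159D.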

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH] AFS filesystem for Linux (2/2)
  2002-10-04 15:35             ` David Howells
  2002-10-04 15:53               ` Trond Myklebust
@ 2002-10-04 16:30               ` Andreas Dilger
  1 sibling, 0 replies; 317+ messages in thread
From: Andreas Dilger @ 2002-10-04 16:30 UTC (permalink / raw)
  To: David Howells; +Cc: Trond Myklebust, Jan Harkes, linux-kernel

On Oct 04, 2002  16:35 +0100, David Howells wrote:
> 
> > NFSv4 does indeed require the full kerberos encryption stuff in the
> > kernel. The RFC specifies that krb5 support is a minimum requirement, and we
> > will expect to have that in 2.6 (or 3.0 or whatever it's called these
> > days...)
> 
> Might this be something I can make use of for my AFS filesystem too?

We will also need kerberos for Lustre when we start implementing
security.  We will be using the GSSAPI for security, so basically
the same as what AFS is using.

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH] AFS filesystem for Linux (2/2)
  2002-10-04 16:07             ` Jan Harkes
@ 2002-10-04 16:56               ` David Howells
  2002-10-04 17:36                 ` Jan Harkes
  0 siblings, 1 reply; 317+ messages in thread
From: David Howells @ 2002-10-04 16:56 UTC (permalink / raw)
  To: David Howells, linux-kernel


Hi Jan,

> We automatically resolve directory conflict, file conflicts are 'punted' to
> userspace. There is a manual repair tool, where a user can pick the version
> that he believes is the most appropriate, or he can substitute a replacement
> or merged copy.
>
> There is also a framework for application specific resolvers, these can try
> to automatically merge f.i. calendar files, or email boxes, or
> remove/recompile object files when there is a conflict, etc.

I see. So users have to know how to resolve conflicts themselves...

> For AFS you don't have to care, its semantics without locking are that
> the 'last writer to close its fd' wins.

Is that a requirement for AFS? I'm looking at trying to pass writes to the
server as soon as possible.

> I thought you were implementing AFS, how can you choose your model?

Where there's leeway, I can choose.

> How about dirty data, that was locally modified?

In some ways, what I'd ideally like to do is have write() contact the server
and actually send the data direct, then assuming that wasn't rejected, update
the local cache. Any bits of the write that don't overlap pages that are
cached _can_ just be discarded once written to the server, as I can always
retrieve them from the server later, complete with their surrounding data.

Alternatively, I can use prepare-write to make sure I hold copies of all the
pages I want to modify, and then use commit-write to write them back. The
problem with doing that is that I potentially have bits either side that may
end up reverting a write made to the server by someone else.

The third option - using writepage() - has the same problems as the second.

However, I could get around this by doing a binary "diff" of the page in the
local cache against the modified page in memory and just send the changes to
the server. Or, I could then retrieve a fresh copy of the page and apply the
differences to that. The problem there is that I can't detect changes that
have made no difference.
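
A minimal sketch of the binary "diff" idea, assuming the cached page and the
modified page are both available as flat byte buffers. The names page_diff()
and struct page_delta are invented for this illustration; it finds only the
smallest single byte range that covers every difference.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical result type: smallest byte range covering every difference */
struct page_delta {
	size_t start;	/* offset of the first differing byte */
	size_t len;	/* length of the range; 0 means no difference seen */
};

static struct page_delta page_diff(const unsigned char *cached,
				   const unsigned char *modified,
				   size_t page_size)
{
	struct page_delta d = { 0, 0 };
	size_t first = 0, last = page_size;

	assert(cached && modified);

	/* skip the common prefix */
	while (first < page_size && cached[first] == modified[first])
		first++;
	if (first == page_size)
		return d;	/* identical, or rewritten with the same bytes */

	/* skip the common suffix; stops at or before 'first' + 1 */
	while (last > first && cached[last - 1] == modified[last - 1])
		last--;

	d.start = first;
	d.len = last - first;
	return d;
}
```

Note that a write which stores the bytes that were already there yields a
zero-length delta, which is exactly the undetectable case mentioned above.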

> There is a network latency that comes into play here. If you want last
> writer wins you should not zap dirty data.

I have to zap pages that some other client has modified. The server has told
me to invalidate it, and so I need to reread it from the server.

This is probably only going to be a problem with shared-writable mappings, and
the problem is due to AFS, not how I do my caching.

> So other clients will read inconsistent data if they don't have that
> chunk cached. And if the server generates callbacks on the write of a
> chunk, all clients see the update?

Not with prepare-/commit-write if commit-write is synchronous.

In any case, if they don't have that chunk cached they'll have to fetch it
from the server.

> Ehh, access permissions should already be checked when the file is
> opened. NFS is dealing with the same problems here.

AFS fileservers don't have a concept of an open file. Every time you do a
fetch or a store (be it for data or metadata) you get a short-term "lease" on
the file that the server breaks if some other client modifies it.

Furthermore, every operation a client issues is checked at the RxRPC
level. This way a client can have several users holding a file open in the
local VFS and just label the operations sent to the server with tokens
representing the users doing the access.

How does Coda do it? Does it "open" the file on the server with a
particular set of credentials for each user that wants to use it?

> >      How does Coda deal with this security problem?
> 
> What problem?

For instance: you have one file in your cache, several people write to it, but
one of the people in the middle had permission taken away as far as the server
is concerned.

Also, when writing the file back, if several people have changed it, whose
credentials do you present to the server as having made the change?

> If you don't have write permission you can't write to the file. If you do
> have write permissions you can write. When the last writer closes its
> filehandle the data is sent to the server with his permissions.

What if someone changes the permissions on the server's copy of the file
whilst you have the file open?

> Now if the ACL is changed on the servers before the close so that this last
> writer loses write access we get a 'conflict' that is punted to userspace,
> similar to the case when writing to an already updated file.

This is what I was getting at. The main issue is how do you deal with a file
in the cache that has been written to by several people, some of whom don't
have write permission any more.

> So I store it in VM on the disk and you store it in your private FS on
> disk. Looks like the same thing to me, except I get to enjoy the benefits of
> the page/swapcache keeping the data in memory when it is frequently used
> without having to do anything smart.

And I get to enjoy the benefits of the pagecache keeping the data in memory
when it is frequently used, and I can also do smart stuff and try to optimise
access.

> Virtual Memory. A private mmap of a file.

I thought you might have meant a "Volume Manager" rather than the kernel's VM
subsystem.

> Updates dirty the pages, so they end up in swap, but we log the same update
> in a logfile (journal), when the log fills up, the changes are written directly
> to the underlying file.

So you keep everything in triplicate whilst running? (or at least duplicate
plus a frequently emptied journal).

> Optionally, when we know that the swap copy and the file copy of the page
> are identical, the page is remapped to reduce swap usage. That is where we
> store all the metadata and the Coda file -> cache file mappings so that they
> can survive a reboot or system crash.

I see... you can have dirty data residing in the cache over a reboot, and thus
you can't just nuke your cache partially or wholly, and you have to keep
careful account of all the entries in there... Hmmm... I'll have to think
about that, but I don't think it'll be a problem, provided I don't let close()
complete until all the dirty data from a file is written to the server.

David

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH] AFS filesystem for Linux (2/2)
  2002-10-04 16:17                     ` David Howells
@ 2002-10-04 17:04                       ` Trond Myklebust
  2002-10-04 17:29                         ` David Howells
  2002-10-07 14:14                         ` David Howells
  0 siblings, 2 replies; 317+ messages in thread
From: Trond Myklebust @ 2002-10-04 17:04 UTC (permalink / raw)
  To: David Howells; +Cc: Jan Harkes, linux-kernel

>>>>> " " == David Howells <dhowells@cambridge.redhat.com> writes:

     > There appears to be more to it than that. RxRPC has sequencing
     > and ACKing windows and things that I don't think SUNRPC has
     > (basically it tries to do some of TCP itself). I've attached
     > the struct definitions for the RxRPC header and the ACK packet
     > body from my RxRPC implementation for your reference.

The 'socket protocol' stuff already copes with UDP w/ congestion
control + TCP. It will eventually do SCTP and IPv6 when we get down to
it all.
Fitting UDP w/ sequencing+acking as an extra protocol should be
possible, and in fact support for sequencing is needed anyway for
the security code in RPCSEC_GSS.

The nice thing about the SunRPC code is that it provides a generic
engine for sending and receiving messages asynchronously. For the
client, the SunRPC specific stuff is almost all in
net/sunrpc/clnt.c. If you write a replacement for that in order to
deal with the RxRPC quirks, you should still be able to make use of
rpciod and the socket + auth code.

Ditto for the server stuff: nobody forces you to use svc_process to
interpret the RPC headers.

     > Furthermore, RxRPC allows for big binary blobs to be
     > interpolated in the middle of packets (though if I understand
     > it correctly it effectively encodes them as an XDR octet array
     > of sorts).

The RPC layer doesn't worry too much about the contents of the data
you send. We're already interpolating pages into our RPC messages...

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH] AFS filesystem for Linux (2/2)
  2002-10-04 17:04                       ` Trond Myklebust
@ 2002-10-04 17:29                         ` David Howells
  2002-10-07 14:14                         ` David Howells
  1 sibling, 0 replies; 317+ messages in thread
From: David Howells @ 2002-10-04 17:29 UTC (permalink / raw)
  To: trond.myklebust; +Cc: David Howells, linux-kernel


> The 'socket protocol' stuff already copes with UDP w/ congestion
> control + TCP. It will eventually do SCTP and IPv6 when we get down to
> it all.
> Fitting UDP w/ sequencing+acking as an extra protocol should be
> possible, and in fact support for sequencing is needed anyway for
> the security code in RPCSEC_GSS.

Interesting idea. What do you mean by an extra protocol exactly? An extra
IPPROTO_xxxx option as might be passed to socket()?

My RxRPC implementation currently does all the incoming message management and
routing in process context in a worker thread.

The following details have to be resolved wherever it is done:

 (*) RxRPC calls (operations) may be aborted or rejected.

 (*) RxRPC packets have to be ACK'd/NAK'd and maybe resent. ACK packets may
     ACK or NAK multiple data packets.

 (*) RxRPC packets have their own "connection number" and "channel number" and
     "call number" spaces on top of UDP port numbers, thus require multi-tier
     routing.

 (*) An operation (a "call") may get challenged or may be challenged for
     security details in the middle of transmission or reception.

 (*) RxRPC packets may get merged together and placed into one UDP
     packet. They then have to be split and processed individually upon
     reception.

I don't see that there should be a problem with the first three (after all,
TCP does all of them). The fourth may be the trickiest, but that may also be
the one NFSv4 has to deal with.
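
To make the ACK/NAK item in the list above a little more concrete, here is a
minimal sketch of walking the acks[] list of an ACK packet to find what needs
resending. It rests on assumptions: that acks[i] describes the data packet
with sequence number firstPacket + i, and that the fields have already been
converted to host byte order. The helper name rxrpc_collect_naks() is made up
for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RXRPC_ACK_TYPE_NACK	0
#define RXRPC_ACK_TYPE_ACK	1

/*
 * Hypothetical helper: collect the sequence numbers that an ACK packet NAKs.
 * Assumes acks[i] refers to packet (first_packet + i).
 * Returns the number of sequence numbers stored in resend[].
 */
static size_t rxrpc_collect_naks(uint32_t first_packet,
				 const uint8_t *acks, uint8_t n_acks,
				 uint32_t *resend, size_t max_resend)
{
	size_t count = 0;
	uint8_t i;

	assert(acks && resend);

	for (i = 0; i < n_acks && count < max_resend; i++) {
		if (acks[i] == RXRPC_ACK_TYPE_NACK)
			resend[count++] = first_packet + i;
	}
	return count;
}
```

Whether the NAK'd packets are resent immediately or after a timeout is a
separate policy decision that this sketch does not address.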

> The nice thing about the SunRPC code is that it provides a generic
> engine for sending and receiving messages asynchronously. For the
> client, the SunRPC specific stuff is almost all in
> net/sunrpc/clnt.c. If you write a replacement for that in order to
> deal with the RxRPC quirks, you should still be able to make use of
> rpciod and the socket + auth code.

It's not entirely clear as to how to use it.

> Ditto for the server stuff: nobody forces you to use svc_process to
> interpret the RPC headers.
> 
>      > Furthermore, RxRPC allows for big binary blobs to be
>      > interpolated in the middle of packets (though if I understand
>      > it correctly it effectively encodes them as an XDR octet array
>      > of sorts).
> 
> The RPC layer doesn't worry too much about the contents of the data
> you send. We're already interpolating pages into our RPC messages...

That's useful. I also pass pages to my rxrpc receive data routine to copy into
directly from skbuffs, but I don't see that being a problem.

David

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH] AFS filesystem for Linux (2/2)
  2002-10-04 16:56               ` David Howells
@ 2002-10-04 17:36                 ` Jan Harkes
  2002-10-07  9:14                   ` David Howells
  0 siblings, 1 reply; 317+ messages in thread
From: Jan Harkes @ 2002-10-04 17:36 UTC (permalink / raw)
  To: David Howells; +Cc: linux-kernel

On Fri, Oct 04, 2002 at 05:56:36PM +0100, David Howells wrote:
> > For AFS you don't have to care, its semantics without locking are that
> > the 'last writer to close its fd' wins.
> 
> Is that a requirement for AFS? I'm looking at trying to pass writes to the
> server as soon as possible.

Last time I checked, yes. Callbacks should be sent only when the file is
closed.

> > There is a network latency that comes into play here. If you want last
> > writer wins you should not zap dirty data.
> 
> I have to zap pages that some other client has modified. The server has told
> me to invalidate it, and so I need to reread it from the server.
>
> This is probably only going to be a problem with shared-writable mappings, and
> the problem is due to AFS, not how I do my caching.

But local modifications are by definition more up-to-date because you
haven't closed the file yet. It is very much a problem of how you do
your caching. I'm trying to tell you AFS has a well-defined consistency
model called session semantics. That directly influences every decision
you make wrt. caching, and everything I've seen up until now shows me
that you're simply implementing an NFS client that happens to speak to
AFS servers.

> > Ehh, access permissions should already be checked when the file is
> > opened. NFS is dealing with the same problems here.
> 
> AFS fileservers don't have a concept of an open file. Every time you do a
> fetch or a store (be it for data or metadata) you get a short-term "lease" on
> the file that the server breaks if some other client modifies it.

Actually AFS is most definitely a stateful filesystem and as such has
the concept of an open file.

Just google for "AFS NFS stateful stateless" and you get literally
hundreds of handouts and lecture notes from distributed systems classes
that describe exactly these differences. UNIX/session semantics,
stateless/stateful, etc.

> Furthermore, every operation a client issues is checked at the RxRPC
> level. This way a client can have several users holding a file open in the
> local VFS and just label the operations sent to the server with tokens
> representing the users doing the access.
> 
> How does Coda do it? Does it "open" the file on the server with a
> particular set of credentials for each user that wants to use it?

Yes. If the user hasn't opened the file before, we have to check the
permissions with the server. So every user that opens the same file will
generate an "open" rpc to the server.

> > >      How does Coda deal with this security problem?
> > 
> > What problem?
> 
> For instance: you have one file in your cache, several people write to it, but
> one of the people in the middle had permission taken away as far as the server
> is concerned.

First of all, typical write-write sharing is extremely low, otherwise
Coda wouldn't have even worked with its optimistic replication and
caching. Second of all permission changes are even rarer.

So on the off chance that a write-write shared file happens to be
affected by a permission change:

- We either succeed in writing the file when the person who lost access
  closed before the last writer who still had access. Big deal, he had
  access when he opened the file, and it is next to impossible to tell
  whether his close arrived before or after the permission was taken away
  except when every operation is somehow given an absolute linear
  ordering in time. Generally too expensive on a distributed system
  with clients that may come and go as they please.

- Or the person who lost access is the last one to close the file and
  trigger the writeback. In this case the servers will deny the write
  operation and the client will declare a conflict on that object and
  switch to disconnected mode. Someone who has write permission can
  repair the conflict.

> Also, when writing the file back, if several people have changed it, whose
> credentials do you present to the server as having made the change?
>
> > If you don't have write permission you can't write to the file. If you do
> > have write permissions you can write. When the last writer closes its
> > filehandle the data is sent to the server with his permissions.

Like I said in the section you quoted yourself, the last writer who
closes the file.

> > Updates dirty the pages, so they end up in swap, but we log the same update
> > in a logfile (journal), when the log fills up, the changes are written directly
> > to the underlying file.
> 
> So you keep everything in triplicate whilst running? (or at least duplicate
> plus a frequently emptied journal).

Actually, for any unmodified data there is only one copy. Once it is
modified there are two (the dirty page and the logged change), why would
there be a third, I don't care about the copy in the backing file...

It has the advantage that even if you pull the powercord, we can still
recover the last consistent set of metadata and from that validate both
the data in the cache files, and revalidate everything against the
servers, which we already need for the disconnected operation.

Jan

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH] AFS filesystem for Linux (2/2)
  2002-10-03  9:05   ` David Howells
  2002-10-03 16:53     ` Jan Harkes
@ 2002-10-06 16:49     ` Troy Benjegerdes
  2002-10-07  9:16       ` David Howells
  1 sibling, 1 reply; 317+ messages in thread
From: Troy Benjegerdes @ 2002-10-06 16:49 UTC (permalink / raw)
  To: David Howells; +Cc: Linus Torvalds, David Howells, linux-kernel

On Thu, Oct 03, 2002 at 10:05:39AM +0100, David Howells wrote:
> 
> Linus Torvalds wrote:
> > On Wed, 2 Oct 2002, David Howells wrote:
> > >
> > > This patch adds an Andrew File System (AFS) driver to the
> > > kernel. Currently it only provides read-only, uncached, non-automounted
> > > and unsecured support.
> >
> > Are you sure this is the right way to go?
> 
> I think so. I think it makes sense for the AFS VFS-interface to go as directly
> as possible to the network without having to make context switches to get into
> userspace.

Hrrm, well, I'm in the middle of deploying AFS (moving away from NFS), and 
one of the ideas I toyed with was how to get a diskless AFS client. (yeah, 
that sounds silly at first, 2GB disks used to be large, now systems with
2GB of RAM are common.) A mostly kernel-based implementation of AFS would
be quite useful for this. (and the remaining bits could be in an
initramfs)

-- 
Troy Benjegerdes | master of mispeeling | 'da hozer' |  hozer@drgw.net
-----"If this message isn't misspelled, I didn't write it" -- Me -----
"Why do musicians compose symphonies and poets write poems? They do it
because life wouldn't have any meaning for them if they didn't. That's 
why I draw cartoons. It's my life." -- Charles Schulz

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH] AFS filesystem for Linux (2/2)
  2002-10-04 17:36                 ` Jan Harkes
@ 2002-10-07  9:14                   ` David Howells
  0 siblings, 0 replies; 317+ messages in thread
From: David Howells @ 2002-10-07  9:14 UTC (permalink / raw)
  To: Jan Harkes; +Cc: David Howells, linux-kernel


> Last time I checked, yes. Callbacks should be sent only when the file is
> closed.

Sent by whom?

The client doesn't send callbacks. The server breaks all the callbacks on a
file whenever the contents or metadata of that file are changed. Closing has
nothing to do with it as far as the server is concerned (indeed, there is no
close).

> But local modifications are by definition more up-to-date because you
> haven't closed the file yet. It is very much a problem of how you do your
> caching. I'm trying to tell you AFS has a well-defined consistency model
> called session semantics. That directly influences every decision you make
> wrt. caching, and everything I've seen up until now shows me that you're
> simply implementing an NFS client that happens to speak to AFS servers.

I'm trying to follow the AFS architecture guide, without sacrificing too much
to cache coherency problems - which it sounds like Coda has... You've said it
requires user intervention (automated maybe) to fix up files for which the
cache has become inconsistent with respect to the server.

The difference is between Write-Back and Write-Through caching (to use h/w CPU
caching terminology). What I'm looking at doing is using Write-Through
caching, but I could make it configurable.

As I've said, Write-Back caching has more tricky security issues, particularly
when more than one user is changing a file on a client, given that the file
can't be "opened" with the server.
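
The difference between the two policies can be sketched with a toy model. This
is an illustration under stated assumptions, not code from the client: the
"server" is just a struct in memory, its permission check is a single flag,
and every name here (toy_server, toy_cache_page, write_through() and so on) is
invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 16	/* tiny page, for illustration only */

/* stand-in for the fileserver's copy of one page */
struct toy_server {
	unsigned char data[PAGE_BYTES];
	bool writable;	/* whether the server currently grants write access */
};

/* stand-in for one locally cached page */
struct toy_cache_page {
	unsigned char data[PAGE_BYTES];
	bool dirty;
};

/* pretend store RPC: the server rechecks permission on every call */
static bool toy_store(struct toy_server *srv, size_t off,
		      const void *buf, size_t len)
{
	if (!srv->writable || off > PAGE_BYTES || len > PAGE_BYTES - off)
		return false;
	memcpy(srv->data + off, buf, len);
	return true;
}

/* write-through: send to the server first, update the cache only on success */
static bool write_through(struct toy_server *srv, struct toy_cache_page *pg,
			  size_t off, const void *buf, size_t len)
{
	assert(srv && pg && buf);
	if (!toy_store(srv, off, buf, len))
		return false;
	memcpy(pg->data + off, buf, len);
	return true;
}

/* write-back: change only the cache now; the server sees it at flush time */
static bool write_back(struct toy_cache_page *pg,
		       size_t off, const void *buf, size_t len)
{
	if (off > PAGE_BYTES || len > PAGE_BYTES - off)
		return false;
	memcpy(pg->data + off, buf, len);
	pg->dirty = true;
	return true;
}

/* flush a dirty page (e.g. at close); fails if access was revoked meanwhile */
static bool flush_page(struct toy_server *srv, struct toy_cache_page *pg)
{
	if (!pg->dirty)
		return true;
	if (!toy_store(srv, 0, pg->data, PAGE_BYTES))
		return false;
	pg->dirty = false;
	return true;
}
```

In the write-through case a revoked permission is noticed at the write itself
and the cache is left untouched; in the write-back case the same revocation
only surfaces at flush time, when the cache already holds the change.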

> Actually AFS is most definitely a stateful filesystem and as such has
> the concept of an open file.

Check the AFS fileserver API. There is no open call (and no close call). You
_can't_ "open" the file on the server. All you can do is fetch/store data or
metadata which gets you a short-term guarantee that you'll be told if what you
have is invalidated. (Admittedly, there are also calls for creating/destroying
various sorts of files, but they aren't pertinent to this discussion).

Thus, every time you access the file, your credentials have to be effectively
rechecked (the fileserver may cache granted permissions, but that's an
optimisation).

Furthermore, whilst the fileserver will inform you if the mode bits on a vnode
are changed, and probably when the ACL is changed on a directory, it won't
inform you when group ACEs of that ACL are changed in the protection
server. Therefore, the access rights granted to any particular set of
credentials may change without a client knowing about it - even if it still
holds a valid unbroken lease.

My point is that it can't be assumed that security details negotiated at
client-side open will still be valid come the first operation on that file.

> First of all, typical write-write sharing is extremely low, otherwise
> Coda wouldn't have even worked with its optimistic replication and
> caching. Second of all permission changes are even rarer.

Rare isn't the same as impossible. If it can happen, it has to be considered
and has to be catered for.

> - We either succeed in writing the file when the person who lost access
>   closed before the last writer who still had access. Big deal, he had
>   access when he opened the file, and it is next to impossible to tell
>   whether his close arrived before or after the permission was taken away
>   except when every operation is somehow given an absolute linear
>   ordering in time. Generally too expensive on a distributed system
>   with clients that may come and go as they please.

So someone else submits the now denied user's changes, thus allowing them to
bypass security... I'm not arguing that you're necessarily wrong. Some
compromise has to be made, and without being able to account to the server for
every one who has made a change to the data being submitted, and perform full
database style rollback and re-execute, the options are limited.

> - Or the person who lost access is the last one to close the file and
>   trigger the writeback. In this case the servers will deny the write
>   operation and the client will declare a conflict on that object and
>   switch to disconnected mode. Someone who has write permission can
>   repair the conflict.

This is not an option I can use.

> Actually, for any unmodified data there is only one copy. Once it is
> modified there are two (the dirty page and the logged change), why would
> there be a third, I don't care about the copy in the backing file...

You listed three (not including RAM): swap, logfile, underlying file.

> It has the advantage that even if you pull the powercord, we can still
> recover the last consistent set of metadata and from that validate both the
> data in the cache files, and revalidate everything against the servers,
> which we already need for the disconnected operation.

Does your cache run in synchronous mode with respect to the disc?

It may be that I can improve matters in my caching by using a journal... and
the VM/VFS has hooks for use by journalling filesystems such as EXT3.

David

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH] AFS filesystem for Linux (2/2)
  2002-10-06 16:49     ` Troy Benjegerdes
@ 2002-10-07  9:16       ` David Howells
  0 siblings, 0 replies; 317+ messages in thread
From: David Howells @ 2002-10-07  9:16 UTC (permalink / raw)
  To: Troy Benjegerdes; +Cc: David Howells, Linus Torvalds, linux-kernel


> Hrrm, well, I'm in the middle of deploying AFS (moving away from NFS), and
> one of the ideas I toyed with was how to get a diskless AFS client. (yeah,
> that sounds silly at first, 2GB disks used to be large, now systems with 2GB
> of RAM are common.) A mostly kernel-based implementation of AFS would be
> quite useful for this. (and the remaining bits could be in an initramfs)

My client will be able to run without using any cache other than the pagecache
(as it does now).

David

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH] AFS filesystem for Linux (2/2)
  2002-10-04 17:04                       ` Trond Myklebust
  2002-10-04 17:29                         ` David Howells
@ 2002-10-07 14:14                         ` David Howells
  2002-10-07 14:54                           ` Trond Myklebust
  1 sibling, 1 reply; 317+ messages in thread
From: David Howells @ 2002-10-07 14:14 UTC (permalink / raw)
  To: trond.myklebust; +Cc: David Howells, linux-kernel


Does NFSv4 involve caching? If so, might working out a common cache API be of
use to you?

David

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH] AFS filesystem for Linux (2/2)
  2002-10-07 14:14                         ` David Howells
@ 2002-10-07 14:54                           ` Trond Myklebust
  2002-10-07 15:36                             ` David Howells
  0 siblings, 1 reply; 317+ messages in thread
From: Trond Myklebust @ 2002-10-07 14:54 UTC (permalink / raw)
  To: David Howells; +Cc: linux-kernel

>>>>> " " == David Howells <dhowells@cambridge.redhat.com> writes:

     > Does NFSv4 involve caching? If so, might working out a common
     > cache API be of use to you?

NFSv4 does not specify that files need to be backed by local storage
the way AFS does if that is what you mean. However it does offer
AFS-like features (such as file delegation / leases) that make a
cachefs a much more feasible proposition.

I, for one, would be very interested in seeing a cachefs add-on for
NFSv4. I think that it would be of great use for GRID / distributed
computation applications, which is where my personal interest in NFSv4
lies.

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [PATCH] AFS filesystem for Linux (2/2)
  2002-10-07 14:54                           ` Trond Myklebust
@ 2002-10-07 15:36                             ` David Howells
  0 siblings, 0 replies; 317+ messages in thread
From: David Howells @ 2002-10-07 15:36 UTC (permalink / raw)
  To: trond.myklebust; +Cc: David Howells, linux-kernel


> NFSv4 does not specify that files need to be backed by local storage
> the way AFS does if that is what you mean. However it does offer
> AFS-like features (such as file delegation / leases) that make a
> cachefs a much more feasible proposition.
> 
> I, for one, would be very interested in seeing a cachefs add-on for
> NFSv4. I think that it would be of great use for GRID / distributed
> computation applications, which is where my personal interest in NFSv4
> lies.

Can you give me some sort of idea as to what keys I might use for indexing?

For instance, AFS has the following:

	PRIMARY KEY	SIZE		AUXILIARY DATA IN INDEX
	==============	==============	============================
	cell name	up to 64 ASCII	- volume location database
					  server addresses

	volume ID	32-bit number	- name of volume
					- associated keys
					- fileserver addresses

	vnode ID	32-bit number	- access time
					- vnode metadata record pointer
					  - vnode ID version
					  - vnode data version
					  - modify time
					  - size
					  - data block pointers

Each index entry of course has a pointer back up the hierarchy.

Furthermore, to determine whether a cached file's contents are still valid, I
can compare the vnode ID version and vnode data version numbers against
the server.
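
As a rough illustration only, the index hierarchy and validity check above
might be sketched in C as follows. Every struct, field, and function name here
is hypothetical, the widths of the version fields are assumed, and none of it
is taken from the AFS patch itself:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cell index entry: keyed by cell name (up to 64 ASCII chars). */
struct cell_index {
	char name[65];			/* 64 characters plus terminating NUL */
};

/* Hypothetical volume index entry: keyed by a 32-bit volume ID. */
struct volume_index {
	uint32_t volume_id;
	const struct cell_index *cell;	/* pointer back up the hierarchy */
};

/* Hypothetical vnode index entry: keyed by a 32-bit vnode ID. */
struct vnode_index {
	uint32_t vnode_id;
	uint32_t id_version;		/* vnode ID version (width assumed) */
	uint32_t data_version;		/* vnode data version (width assumed) */
	const struct volume_index *volume; /* pointer back up the hierarchy */
};

/* Version numbers as reported by the server for the same vnode (assumed shape). */
struct server_status {
	uint32_t id_version;
	uint32_t data_version;
};

/* Cached contents are treated as valid only if both version numbers match. */
static bool cache_entry_valid(const struct vnode_index *cached,
			      const struct server_status *server)
{
	return cached->id_version == server->id_version &&
	       cached->data_version == server->data_version;
}
```

A mismatch in either number would mean the cached data has to be discarded or
refetched; an NFS equivalent would need some comparable per-file change
indicator to play the same role.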

Not all these indices and keys will necessarily be useful for NFS.

David


* Re: LANANA: Getting out of hand?
@ 2001-05-15 16:26 Rick Hohensee
  0 siblings, 0 replies; 317+ messages in thread
From: Rick Hohensee @ 2001-05-15 16:26 UTC (permalink / raw)
  To: linux-kernel

Torvalds sez
>On Mon, 14 May 2001, Alan Cox wrote:
>>
>> Except that Linus wont hand out major numbers, which means I can't even boot
>> simply off such a device. I bet the vendors in question dont think the sun
>> shines out of linus backside any more.
>
>Actually, it does. It's just that some people have gotten so blinded by my
>a** that they can no longer see it any more ;)
>
>The problem I have is that there are lots of _good_ solutions, but they
>all imply a bit more work than the bad ones.
>
>What does that result in? Everybody continues to use the simple old setup,
>which required no thought at all, but that is a pain to maintain.
>
>For example, the only thing you need in order to boot is to have a nice
>clean "disk" major number. That's it. Nothing fancy, nothing more.

To what extent does the code of the various drivers reflect that? I.e., is
there some code that is common to all block devices, that is used by code
common to all disk devices, that is in turn used by all drivers pretending
to be IDE devices, that is in turn used by all code pretending to be EIDE
devices, etc.? If you look at majors, say, as a binary tree which you
walk in accordance with the bits in the major, can drivers nest like that?

I've wondered about that for a long time for various reasons.

Rick Hohensee
www.clienux.com



Thread overview: 317+ messages
2001-05-14 19:19 LANANA: To Pending Device Number Registrants H. Peter Anvin
2001-05-14 19:36 ` Jeff Garzik
2001-05-14 19:57   ` H. Peter Anvin
2001-05-14 20:04     ` Jeff Garzik
2001-05-14 20:09     ` Alan Cox
2001-05-14 20:24       ` Jeff Garzik
2001-05-14 20:27         ` H. Peter Anvin
2001-05-14 22:21           ` Alan Cox
2001-05-14 23:43             ` Jan Niehusmann
2001-05-14 23:48               ` Alan Cox
2001-05-14 20:29         ` Linus Torvalds
2001-05-14 20:55           ` Neil Brown
2001-05-14 21:20             ` Alan Cox
2001-05-14 21:37               ` Neil Brown
2001-05-14 21:24             ` Jeff Garzik
2001-05-14 21:33               ` Neil Brown
2001-05-15  6:41             ` Linus Torvalds
2001-05-15  8:57               ` Alan Cox
2001-05-15  9:08                 ` Linus Torvalds
2001-05-15  9:26                   ` Alan Cox
2001-05-15  9:49                     ` Alexander Viro
2001-05-15  9:51                       ` Alan Cox
2001-05-15 10:12                         ` Alexander Viro
2001-05-15 10:36                           ` Alan Cox
2001-05-15 15:16                         ` Linus Torvalds
2001-05-15 20:55                           ` Alan Cox
2001-05-15 15:10                     ` Linus Torvalds
2001-05-15 15:29                       ` Alexander Viro
2001-05-15 17:21                       ` James Simmons
2001-05-15 17:25                         ` Alexander Viro
2001-05-15 17:29                           ` James Simmons
2001-05-15 17:32                             ` Alexander Viro
2001-05-15 17:44                               ` James Simmons
2001-05-15 18:18                                 ` Ingo Oeser
2001-05-15 18:36                                   ` James Simmons
2001-05-15 18:42                                 ` Alexander Viro
2001-05-16  8:29                                 ` Helge Hafting
2001-05-16 17:16                                   ` James Simmons
2001-05-15 21:46                               ` Chip Salzenberg
2001-05-15 21:50                                 ` James Simmons
2001-05-15 18:04                             ` Linus Torvalds
2001-05-15 18:58                               ` Johannes Erdfelt
2001-05-15 19:17                                 ` Linus Torvalds
2001-05-15 19:23                                   ` H. Peter Anvin
2001-05-15 19:43                                   ` Johannes Erdfelt
2001-05-15 21:58                                     ` Chip Salzenberg
2001-05-16  8:51                                     ` Helge Hafting
2001-05-17 10:20                                     ` Pavel Machek
2001-05-18 17:32                                       ` Johannes Erdfelt
2001-05-19 10:21                                         ` Pavel Machek
2001-05-19  8:18                                     ` Kai Henningsen
2001-05-17 20:40                                 ` Kai Henningsen
2001-05-17 22:46                                   ` Johannes Erdfelt
2001-05-15 20:03                               ` James Simmons
2001-05-15 20:06                                 ` H. Peter Anvin
2001-05-15 20:28                                   ` James Simmons
2001-05-15 21:20                                     ` Nicolas Pitre
2001-05-15 21:28                                       ` James Simmons
2001-05-15 21:31                                         ` H. Peter Anvin
2001-05-15 21:43                                         ` Johannes Erdfelt
2001-05-15 21:49                                           ` James Simmons
2001-05-16  7:05                                           ` Kai Henningsen
2001-05-15 22:07                                         ` Alan Cox
2001-05-16  7:11                                         ` Kai Henningsen
2001-05-16  7:43                                           ` Alexander Viro
2001-05-16  9:45                                             ` Malcolm Beattie
2001-05-16  0:59                                       ` Daniel Phillips
2001-05-16  1:34                                         ` Nicolas Pitre
2001-05-16  1:51                                           ` Jonathan Lundell
2001-05-16 11:34                                         ` Erik Mouw
2001-05-17 17:07                                         ` Eric W. Biederman
2001-05-17 19:30                                           ` Jeff Randall
2001-05-16  7:17                                       ` Kai Henningsen
2001-05-15 20:14                                 ` Alexander Viro
2001-05-15 20:30                                   ` H. Peter Anvin
2001-05-15 20:41                                     ` Alexander Viro
2001-05-15 20:51                                       ` Linus Torvalds
2001-05-16  1:01                                         ` Daniel Phillips
2001-05-16  1:04                                           ` H. Peter Anvin
2001-05-15 20:37                                   ` Linus Torvalds
2001-05-15 20:56                                     ` Jeff Garzik
2001-05-15 21:22                                     ` James Simmons
2001-05-17 10:42                                     ` Pavel Machek
2001-05-18 18:32                                       ` James Simmons
2001-05-19 10:23                                         ` no ioctls for serial ports? [was Re: LANANA: To Pending Device Number Registrants] Pavel Machek
2001-05-19 19:00                                           ` Linus Torvalds
2001-05-19 19:17                                             ` Pavel Machek
2001-05-19 19:35                                               ` Linus Torvalds
2001-05-19 19:43                                                 ` Pavel Machek
2001-05-19 20:31                                                   ` Tim Jansen
2001-05-19 23:57                                                 ` Alexander Viro
2001-05-20  7:18                                                   ` no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNumber Registrants] Abramo Bagnara
2001-05-20  7:41                                                     ` Alexander Viro
2001-05-20  8:30                                                       ` no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNumberRegistrants] Abramo Bagnara
2001-05-20 10:09                                                         ` Alexander Viro
2001-05-20  0:01                                               ` no ioctls for serial ports? [was Re: LANANA: To Pending Device Number Registrants] Alexander Viro
2001-05-20 11:17                                                 ` handling network using filesystem [was Re: no ioctls for serial ports?] Pavel Machek
2001-05-20  9:53                                               ` no ioctls for serial ports? [was Re: LANANA: To Pending Device Num Kai Henningsen
2001-05-20 13:40                                                 ` Alexander Viro
2001-05-20 14:27                                                   ` Tim Jansen
2001-05-20 14:30                                                   ` no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNum Abramo Bagnara
2001-05-20 14:45                                                     ` Alexander Viro
2001-05-20 15:00                                                       ` Abramo Bagnara
2001-05-20 15:18                                                         ` Alexander Viro
2001-05-20 15:40                                                           ` Abramo Bagnara
2001-05-20 16:01                                                             ` Alexander Viro
2001-05-20 15:26                                                       ` Jakob Østergaard
2001-05-20 15:42                                                         ` Alexander Viro
2001-05-21 17:45                                                       ` Oliver Xymoron
2001-05-21 18:14                                                         ` Alexander Viro
2001-05-21 18:37                                                           ` Oliver Xymoron
2001-05-21 18:49                                                             ` Alexander Viro
2001-05-21 19:08                                                               ` Oliver Xymoron
2001-05-22  5:56                                                   ` no ioctls for serial ports? [was Re: LANANA: To Pending Device Num Pavel Machek
2001-05-19 20:11                                             ` no ioctls for serial ports? [was Re: LANANA: To Pending DeviceNumber Registrants] Abramo Bagnara
2001-05-15 20:57                                   ` LANANA: To Pending Device Number Registrants James Simmons
2001-05-17 20:33                                   ` Kai Henningsen
2001-05-15 20:17                                 ` H. Peter Anvin
2001-05-15 21:59                                 ` Chip Salzenberg
2001-05-15 22:51                                   ` James Simmons
2001-05-15 21:22                               ` Jan Harkes
2001-05-15 21:39                               ` Martin Dalecki
2001-05-15 18:02                       ` Ingo Oeser
2001-05-15 19:31                       ` Richard Gooch
2001-05-15 19:37                         ` H. Peter Anvin
2001-05-15 20:10                         ` Alan Cox
2001-05-15 21:41                         ` Richard Gooch
2001-05-15 21:47                           ` Alexander Viro
2001-05-15 22:14                           ` Alan Cox
2001-05-15 22:24                           ` Richard Gooch
2001-05-15 22:27                             ` H. Peter Anvin
2001-05-15 22:38                             ` Alexander Viro
2001-05-15 22:28                           ` Richard Gooch
2001-05-15 22:32                             ` H. Peter Anvin
2001-05-15 22:33                             ` Alan Cox
2001-05-16  7:21                           ` Geert Uytterhoeven
2001-05-16 18:22                           ` Richard Gooch
2001-05-16 19:36                             ` H. Peter Anvin
2001-05-16 20:01                             ` Richard Gooch
2001-05-16 20:05                               ` H. Peter Anvin
2001-05-16 20:18                               ` Linus Torvalds
2001-05-16 20:44                               ` Richard Gooch
2001-05-16 20:54                               ` Richard Gooch
2001-05-16 21:36                                 ` H. Peter Anvin
2001-05-16 22:11                                   ` Ingo Oeser
2001-05-16 22:13                                     ` H. Peter Anvin
2001-05-16 22:21                                       ` Jens Axboe
2001-05-16 23:03                                     ` Richard Gooch
2001-05-16 23:25                                       ` H. Peter Anvin
2001-05-16 23:37                                       ` Richard Gooch
2001-05-16 23:38                                         ` H. Peter Anvin
2001-05-16 23:41                                         ` Richard Gooch
2001-05-16 23:43                                           ` H. Peter Anvin
2001-05-16 23:49                                           ` Richard Gooch
2001-05-16 23:55                                             ` H. Peter Anvin
2001-05-17 21:12                                             ` Kai Henningsen
2001-05-17 21:06                               ` Kai Henningsen
2001-05-16 23:51                             ` Alan Cox
2001-05-16 23:58                             ` Richard Gooch
2001-05-17  0:12                               ` H. Peter Anvin
2001-05-17  0:24                               ` Alan Cox
2001-05-17  1:35                               ` Jeff Garzik
2001-05-17  9:33                                 ` Guest section DW
2001-05-15 20:58                       ` Alan Cox
2001-05-15 21:42                       ` Chip Salzenberg
2001-05-15 21:46                         ` Alexander Viro
2001-05-15 21:57                           ` H. Peter Anvin
2001-05-15 22:07                             ` Chip Salzenberg
2001-05-15 22:11                               ` H. Peter Anvin
2001-05-15 22:18                           ` Alan Cox
2001-05-15 21:40                     ` Chip Salzenberg
2001-05-15 22:12                       ` Alan Cox
2001-05-15 22:19                         ` H. Peter Anvin
2001-05-15 22:28                           ` Alan Cox
2001-05-15 22:34                             ` H. Peter Anvin
2001-05-15 23:39                         ` Chip Salzenberg
2001-05-16 20:37                           ` Alan Cox
2001-05-15 22:49                       ` James Simmons
2001-05-15 23:22                       ` Kenneth Johansson
2001-05-15  9:28                   ` Alan Cox
2001-05-15 15:15                     ` Linus Torvalds
2001-05-15 15:19                       ` Jeff Garzik
2001-05-15 15:45                         ` Linus Torvalds
2001-05-15 17:27                           ` James Simmons
2001-05-15 17:43                             ` Linus Torvalds
2001-05-15 18:04                               ` Jeff Garzik
2001-05-15 18:15                                 ` Linus Torvalds
2001-05-15 19:36                                   ` Jonathan Lundell
2001-05-15 20:18                                     ` Linus Torvalds
2001-05-15 20:26                                       ` Dan Hollis
2001-05-15 22:14                                         ` Miles Lane
2001-05-15 21:29                                       ` Alex Bligh - linux-kernel
2001-05-15 21:36                                         ` Linus Torvalds
2001-05-15 22:03                                         ` Jeff Mahoney
2001-05-15 22:42                                         ` Andreas Dilger
2001-05-15 21:51                                       ` Mark Frazer
2001-05-15 22:35                                       ` Bob Glamm
2001-05-16  0:56                                       ` Jonathan Lundell
2001-05-16  2:31                                         ` Andrew Morton
2001-05-16  6:56                                         ` Jonathan Lundell
2001-05-16  8:02                                           ` Vojtech Pavlik
2001-05-16 12:20                                           ` Bogdan Costescu
2001-05-16 14:37                                           ` Jonathan Lundell
2001-05-16 14:57                                             ` Vojtech Pavlik
2001-05-16 15:24                                             ` Jonathan Lundell
2001-05-16  7:24                                       ` Geert Uytterhoeven
2001-05-16 23:26                                         ` Alan Cox
2001-05-16 23:31                                           ` H. Peter Anvin
2001-05-16 23:53                                             ` Linus Torvalds
2001-05-17  0:21                                             ` Alan Cox
2001-05-17  7:57                                               ` Geert Uytterhoeven
2001-05-17 16:26                                               ` James Simmons
2001-05-17  6:43                                             ` Thomas Sailer
2001-05-17 16:58                                               ` Tim Jansen
2001-05-17 17:18                                                 ` James Simmons
2001-05-17 17:29                                                   ` Geert Uytterhoeven
2001-05-17 17:41                                                   ` Tim Jansen
2001-05-17 22:03                                                 ` Oliver Neukum
2001-05-16 23:52                                           ` Linus Torvalds
2001-05-17  1:26                                           ` Joel Becker
2001-05-16 16:04                                       ` Michael Meissner
2001-05-16 21:36                                         ` Andreas Dilger
2001-05-18  2:18                                     ` Jonathan Lundell
2001-05-19 17:36                                       ` Jonathan Lundell
2001-05-20  9:37                                         ` Eric W. Biederman
2001-05-20 14:16                                         ` Chris Wedgwood
2001-05-20 15:54                                         ` Jonathan Lundell
2001-05-20 15:57                                         ` Jonathan Lundell
2001-05-19 17:45                                       ` Jonathan Lundell
2001-05-19  8:42                                     ` Kai Henningsen
2001-05-17 21:23                                   ` Kai Henningsen
2001-05-15 19:33                                 ` Kai Henningsen
2001-05-16  7:25                                 ` Geert Uytterhoeven
2001-05-15 18:19                               ` James Simmons
2001-05-15 20:23                               ` Alan Cox
2001-05-15 20:28                                 ` H. Peter Anvin
2001-05-15 21:52                               ` Andreas Dilger
2001-05-15 20:02                             ` Dan Hollis
2001-05-15 11:44               ` Neil Brown
2001-05-15 15:34                 ` Linus Torvalds
2001-05-16  1:00                   ` Daniel Phillips
2001-05-16 12:58                     ` Jens Axboe
2001-05-16  3:25                   ` Neil Brown
2001-05-15 15:51               ` John Fremlin
2001-05-14 21:09           ` Andi Kleen
2001-05-14 21:11           ` Rik van Riel
2001-05-14 21:23             ` Alan Cox
2001-05-15  0:33               ` Rik van Riel
2001-05-16  9:04                 ` Ingo Oeser
2001-05-14 21:16           ` Alan Cox
2001-05-14 22:05             ` Alexander Viro
2001-05-14 22:30               ` Alan Cox
2001-05-14 22:48                 ` Alexander Viro
2001-05-14 22:46                   ` Alan Cox
2001-05-14 22:53                     ` Alexander Viro
2001-05-14 22:54                       ` H. Peter Anvin
2001-05-14 23:00                         ` Alexander Viro
2001-05-14 22:58                           ` Alan Cox
2001-05-14 23:29                             ` Alexander Viro
2001-05-15  4:20                             ` God
2001-05-15  7:48                             ` 2.4 " bert hubert
2001-05-15  8:54                               ` Alan Cox
2001-05-15  9:09                                 ` bert hubert
2001-05-14 23:39                           ` LANANA: " Richard Gooch
2001-05-14 23:18                         ` Arjan van de Ven
2001-05-14 23:20                           ` Alan Cox
2001-05-15 18:57                           ` Kai Henningsen
2001-05-15  5:56                         ` Oliver Neukum
2001-05-15  5:59                           ` H. Peter Anvin
2001-05-14 22:55                       ` Alan Cox
2001-05-14 23:11                         ` Dan Hollis
2001-05-14 23:19                           ` Alan Cox
2001-05-14 23:23                         ` Alexander Viro
2001-05-15  1:10                         ` Keith Owens
2001-05-15  4:12                 ` LANANA: Getting out of hand? God
2001-05-15  4:30                   ` Linus Torvalds
2001-05-15  5:17                     ` Linus Torvalds
2001-05-15  8:24                     ` Geert Uytterhoeven
2001-05-15  8:48                     ` Alan Cox
2001-05-15 21:16                     ` Martin Dalecki
2001-05-14 23:01               ` Interrupted sound with 2.4.4-ac6 Hermann Himmelbauer
2001-05-14 23:34           ` LANANA: To Pending Device Number Registrants Richard Gooch
2001-05-14 21:18         ` Alan Cox
2001-05-14 23:32     ` Richard Gooch
2001-05-14 20:09 ` Richard Gooch
2001-05-14 20:14   ` Jeff Garzik
2001-05-15 17:37 ` Pavel Machek
2001-05-17 11:32   ` Alan Cox
2001-05-16 15:58 ` Kurt Garloff
2001-05-15 16:26 LANANA: Getting out of hand? Rick Hohensee
     [not found] <7146.1033580256@warthog.cambridge.redhat.com>
2002-10-03  0:36 ` [PATCH] AFS filesystem for Linux (2/2) Linus Torvalds
2002-10-03  9:05   ` David Howells
2002-10-03 16:53     ` Jan Harkes
2002-10-03 17:45       ` Jan Harkes
2002-10-03 21:46       ` David Howells
2002-10-04  8:13       ` David Howells
     [not found]       ` <15381.1033681790@warthog.cambridge.redhat.com>
2002-10-04 14:02         ` Jan Harkes
2002-10-04 14:40           ` Trond Myklebust
2002-10-04 15:35             ` David Howells
2002-10-04 15:53               ` Trond Myklebust
2002-10-04 15:56                 ` David Howells
2002-10-04 16:03                   ` Trond Myklebust
2002-10-04 16:17                     ` David Howells
2002-10-04 17:04                       ` Trond Myklebust
2002-10-04 17:29                         ` David Howells
2002-10-07 14:14                         ` David Howells
2002-10-07 14:54                           ` Trond Myklebust
2002-10-07 15:36                             ` David Howells
2002-10-04 16:30               ` Andreas Dilger
2002-10-04 15:34           ` David Howells
2002-10-04 16:07             ` Jan Harkes
2002-10-04 16:56               ` David Howells
2002-10-04 17:36                 ` Jan Harkes
2002-10-07  9:14                   ` David Howells
2002-10-06 16:49     ` Troy Benjegerdes
2002-10-07  9:16       ` David Howells
2002-10-04 14:11   ` [patch] [kkern] " Patrick Audley
