linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] new system call mknod64
@ 2003-04-22  1:02 Andries.Brouwer
  2003-04-22  1:32 ` H. Peter Anvin
  0 siblings, 1 reply; 43+ messages in thread
From: Andries.Brouwer @ 2003-04-22  1:02 UTC (permalink / raw)
  To: hpa, linux-kernel

[You prefer sending to l-k only. But my mailbox is aeb@cwi.nl,
and l-k is read elsewhere. What you send there I may or may not see.
If you want me to see it, please cc.]

>> u64, or, if you prefer, as struct { u32 major, minor; }.

> Any reason why we don't just *make it* a struct?

Well, I have also done that of course. Both struct and u64 work well.
Since only kdev_t.h knows about the actual structure of kdev_t
it is very easy to switch.

--------------
typedef struct {
        u32 major;
        u32 minor;
} kdev_t;

#define major(dev)      ((dev).major)
#define minor(dev)      ((dev).minor)
#define mk_kdev(major, minor)   ((kdev_t) { major, minor } )

#define HASHDEV(dev)    (major(dev) ^ minor(dev))       /* arbitrary */
#define NODEV           (mk_kdev(0,0))
#define kdev_none(dev)  (major(dev) == 0 && minor(dev) == 0)

static inline int kdev_same(kdev_t dev1, kdev_t dev2)
{
        return (dev1.major == dev2.major) && (dev1.minor == dev2.minor);
}
--------------

(there are some defines in the tty code that have to be adapted,
that is all)


>> sys_mknod takes unsigned int (instead of dev_t)
>> sys_mknod64 takes two unsigned ints.

> Why unsigned int?  If we have a legacy call it should presumably use
> the legacy __u16 format.

That would become rather ugly. The present situation is not u16,
it depends on the architecture. But unsigned int covers the
present situation on all architectures.

Andries

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-22  1:02 [PATCH] new system call mknod64 Andries.Brouwer
@ 2003-04-22  1:32 ` H. Peter Anvin
  2003-04-22  2:01   ` Jamie Lokier
  0 siblings, 1 reply; 43+ messages in thread
From: H. Peter Anvin @ 2003-04-22  1:32 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <UTC200304220102.h3M126n06187.aeb@smtp.cwi.nl>
By author:    Andries.Brouwer@cwi.nl
In newsgroup: linux.dev.kernel
> 
> Well, I have also done that of course. Both struct and u64 work well.
> Since only kdev_t.h knows about the actual structure of kdev_t
> it is very easy to switch.
> 

The main advantage with making it a struct is that it keep people from
doing stupid stuff like (int)dev where dev is a kdev_t...  There is
all kinds of shit like that in the kernel...

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-22  1:32 ` H. Peter Anvin
@ 2003-04-22  2:01   ` Jamie Lokier
  2003-04-22  2:52     ` H. Peter Anvin
  0 siblings, 1 reply; 43+ messages in thread
From: Jamie Lokier @ 2003-04-22  2:01 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

H. Peter Anvin wrote:
> > Well, I have also done that of course. Both struct and u64 work well.
> > Since only kdev_t.h knows about the actual structure of kdev_t
> > it is very easy to switch.
> > 
> 
> The main advantage with making it a struct is that it keep people from
> doing stupid stuff like (int)dev where dev is a kdev_t...  There is
> all kinds of shit like that in the kernel...

If you want that good quality 64-bit code, try making it a struct
containing just a u64 :)

-- Jamie

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-22  2:01   ` Jamie Lokier
@ 2003-04-22  2:52     ` H. Peter Anvin
  2003-04-22  6:00       ` jw schultz
  0 siblings, 1 reply; 43+ messages in thread
From: H. Peter Anvin @ 2003-04-22  2:52 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: linux-kernel

Jamie Lokier wrote:
>>
>>The main advantage with making it a struct is that it keep people from
>>doing stupid stuff like (int)dev where dev is a kdev_t...  There is
>>all kinds of shit like that in the kernel...
> 
> If you want that good quality 64-bit code, try making it a struct
> containing just a u64 :)
> 

Perhaps:

#if BITS_PER_LONG == 64
typedef struct { u64 val; } kdev_t;

/* Macros for major minor mkdev */
#else
typedef struct { u32 major, minor; } kdev_t;

/* Macros... */
#endif

	-hpa



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-22  2:52     ` H. Peter Anvin
@ 2003-04-22  6:00       ` jw schultz
  2003-04-22 17:17         ` H. Peter Anvin
  0 siblings, 1 reply; 43+ messages in thread
From: jw schultz @ 2003-04-22  6:00 UTC (permalink / raw)
  To: linux-kernel

On Mon, Apr 21, 2003 at 07:52:04PM -0700, H. Peter Anvin wrote:
> Jamie Lokier wrote:
> >>
> >>The main advantage with making it a struct is that it keep people from
> >>doing stupid stuff like (int)dev where dev is a kdev_t...  There is
> >>all kinds of shit like that in the kernel...
> > 
> > If you want that good quality 64-bit code, try making it a struct
> > containing just a u64 :)
> > 
> 
> Perhaps:
> 
> #if BITS_PER_LONG == 64
> typedef struct { u64 val; } kdev_t;
> 
> /* Macros for major minor mkdev */
> #else
> typedef struct { u32 major, minor; } kdev_t;
> 
> /* Macros... */
> #endif
> 

or a union?
typedef union { u64 dev; struct { u32 major, minor; } d; } kdev_t;

<duck>

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-22  6:00       ` jw schultz
@ 2003-04-22 17:17         ` H. Peter Anvin
  0 siblings, 0 replies; 43+ messages in thread
From: H. Peter Anvin @ 2003-04-22 17:17 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <20030422060013.GO16934@pegasys.ws>
By author:    jw schultz <jw@pegasys.ws>
In newsgroup: linux.dev.kernel
>
> On Mon, Apr 21, 2003 at 07:52:04PM -0700, H. Peter Anvin wrote:
> > Jamie Lokier wrote:
> > >>
> > >>The main advantage with making it a struct is that it keep people from
> > >>doing stupid stuff like (int)dev where dev is a kdev_t...  There is
> > >>all kinds of shit like that in the kernel...
> > > 
> > > If you want that good quality 64-bit code, try making it a struct
> > > containing just a u64 :)
> > > 
> > 
> > Perhaps:
> > 
> > #if BITS_PER_LONG == 64
> > typedef struct { u64 val; } kdev_t;
> > 
> > /* Macros for major minor mkdev */
> > #else
> > typedef struct { u32 major, minor; } kdev_t;
> > 
> > /* Macros... */
> > #endif
> > 
> 
> or a union?
> typedef union { u64 dev; struct { u32 major, minor; } d; } kdev_t;
> 

No... what I want to avoid, again, are idiots^Wpeople doing:

      foo = (int) dev->dev;

... or something like that.

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
       [not found]     ` <20030422083014$0fe2@gated-at.bofh.it>
@ 2003-04-22 20:02       ` Arnd Bergmann
  0 siblings, 0 replies; 43+ messages in thread
From: Arnd Bergmann @ 2003-04-22 20:02 UTC (permalink / raw)
  To: Shachar Shemesh, linux-kernel

Shachar Shemesh wrote:


>>On x86_64, the struct produces the same code as the scalar.
>>The same is true on s390x.
>>  
> I don't know how x86_64 is doing, but x86 does not issue a bus error 
> when unaligned value is being accessed. It is therefor reasonable for 
> the compiler not to worry about it. If x86_64 is the same, the results 
> you report seem like a reasonable feature of gcc, rather than a bug.

Right, from include/asm-*/unaligned.h, you can see which archs are
able to do unaligned accesses:

cris, i386, m68k, ppc, ppc64, s390, s390x and x86_64

        Arnd <><

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-21 23:50   ` Jamie Lokier
@ 2003-04-22  8:24     ` Shachar Shemesh
  0 siblings, 0 replies; 43+ messages in thread
From: Shachar Shemesh @ 2003-04-22  8:24 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: H. Peter Anvin, linux-kernel

Jamie Lokier wrote:

>It varies very much between architectures.
>
>I just checked, and simple copies of this structure are absolutely
>atrocious in GCC 3.2 (I tried Alpha, Mips64 and Sparc64).  The code
>was approx. 3 times longer to copy the 32:32 struct than to copy a 64
>bit scalar.
>  
>
Last time I had access to gcc on sparc, copying a struct where the 
compiler guessed that non-aligned access may occure produced code that 
was guarenteed not to crash the program. This was tested in user mode, 
in 32 bit, but still...

Copying a struct with two 32 bit values does not prove to the compiler 
that it will be 64bit aligned. It is therefor reasonable for the 
compiler to assume it needs two 32 bit transfers, rather than one 64 bit 
transfer. Try adding "#pragme align 8", or whatever it is called, and 
seeing if this inefficiency goes away.

>On x86_64, the struct produces the same code as the scalar.
>The same is true on s390x.
>  
>
I don't know how x86_64 is doing, but x86 does not issue a bus error 
when unaligned value is being accessed. It is therefor reasonable for 
the compiler not to worry about it. If x86_64 is the same, the results 
you report seem like a reasonable feature of gcc, rather than a bug.

                Shachar

-- 
Shachar Shemesh
Open Source integration consultant
Home page & resume - http://www.shemesh.biz/



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-21 23:02 ` H. Peter Anvin
@ 2003-04-21 23:50   ` Jamie Lokier
  2003-04-22  8:24     ` Shachar Shemesh
  0 siblings, 1 reply; 43+ messages in thread
From: Jamie Lokier @ 2003-04-21 23:50 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

H. Peter Anvin wrote:
> > and in fact the patches I have been giving out use kdev_t
> > as internal format, where you can think of kdev_t as
> > u64, or, if you prefer, as struct { u32 major, minor; }.
> 
> Any reason why we don't just *make it* a struct?  (Well, besides that
> it'd somewhat suck on 64-bit architectures?)

It varies very much between architectures.

I just checked, and simple copies of this structure are absolutely
atrocious in GCC 3.2 (I tried Alpha, Mips64 and Sparc64).  The code
was approx. 3 times longer to copy the 32:32 struct than to copy a 64
bit scalar.

On x86_64, the struct produces the same code as the scalar.
The same is true on s390x.

If you change this to test 16:16, on i386 (or x86_64 with -m32),
the struct still produces the same code as the scalar.

Looks like a part of GCC that might be easy to improve, given that it
works quite well on some architectures already.

-- Jamie

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-21 18:22           ` Linus Torvalds
  2003-04-21 18:27             ` viro
  2003-04-21 18:35             ` Christoph Hellwig
@ 2003-04-21 23:49             ` Roman Zippel
  2 siblings, 0 replies; 43+ messages in thread
From: Roman Zippel @ 2003-04-21 23:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, David S. Miller, Andries.Brouwer, linux-kernel

Hi,

On Mon, 21 Apr 2003, Linus Torvalds wrote:

> > Not anymore for blockdevices.  And now that Al's back not anymore soon
> > for charater devices, too :)
> 
> Actually, we still do it for both block _and_ character devices.
> 
> Look at "nfs*xdr.c" to see what's up.

This is actually a good example to show how problematic the major/minor 
split is. It depends very much on the nfs server which bits you actually 
get back and it requires some guessing from the client side which bits it 
can use. A linux-2.2 server will happilly truncate that value to 16 bit, 
BSD will give you a 8:24 value back.
It's very unlikely that you can use a 64bit dev_t reliably with nfs in the 
foreseeable future.

bye, Roman



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-21 21:43 Andries.Brouwer
@ 2003-04-21 23:02 ` H. Peter Anvin
  2003-04-21 23:50   ` Jamie Lokier
  0 siblings, 1 reply; 43+ messages in thread
From: H. Peter Anvin @ 2003-04-21 23:02 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <UTC200304212143.h3LLh6e02148.aeb@smtp.cwi.nl>
By author:    Andries.Brouwer@cwi.nl
In newsgroup: linux.dev.kernel
> 
> and in fact the patches I have been giving out use kdev_t
> as internal format, where you can think of kdev_t as
> u64, or, if you prefer, as struct { u32 major, minor; }.
> 

Any reason why we don't just *make it* a struct?  (Well, besides that
it'd somewhat suck on 64-bit architectures?)

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-21 21:48 Andries.Brouwer
@ 2003-04-21 22:07 ` Linus Torvalds
  0 siblings, 0 replies; 43+ messages in thread
From: Linus Torvalds @ 2003-04-21 22:07 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: davem, hch, linux-kernel, viro, zippel


On Mon, 21 Apr 2003 Andries.Brouwer@cwi.nl wrote:
> 
> Yes. I hope you don't mind that I called this 32:32 monster kdev_t.

You two should form a comedy team. For a _very_ geeky audience.

			Linus


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
@ 2003-04-21 21:48 Andries.Brouwer
  2003-04-21 22:07 ` Linus Torvalds
  0 siblings, 1 reply; 43+ messages in thread
From: Andries.Brouwer @ 2003-04-21 21:48 UTC (permalink / raw)
  To: Andries.Brouwer, davem, hch, linux-kernel, torvalds, viro, zippel

> Let's go for 32:32 internal and simply map upon mknod(2) and friends.
> On the syscall boundary.  End of problem.

Yes. I hope you don't mind that I called this 32:32 monster kdev_t.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
@ 2003-04-21 21:43 Andries.Brouwer
  2003-04-21 23:02 ` H. Peter Anvin
  0 siblings, 1 reply; 43+ messages in thread
From: Andries.Brouwer @ 2003-04-21 21:43 UTC (permalink / raw)
  To: torvalds, viro; +Cc: Andries.Brouwer, davem, hch, linux-kernel, zippel

Nice to see this discussion.

Linus says

> The question is only _where_ (not whether) we do the mapping. Right now we 
> keep "dev_t" in the same format as we give back to user space, and thus we 
> always map into that format internally. But we don't have to: we can have 
> an internal format that is different from the one we show users.

and in fact the patches I have been giving out use kdev_t
as internal format, where you can think of kdev_t as
u64, or, if you prefer, as struct { u32 major, minor; }.

As I wrote a month or two ago, my favourite version is to
have register_region work in the kdev_t space, rather than
the dev_t space, since intervals in kdev_t space have a
direct interpretation in terms of major, minor.

Andries

(Both versions do not differ very much;
as far as I am concerned the choice is not very important,
but the kdev_t version is slightly cleaner.)

(As Al already remarked, device numbers do not play much of a role
internally. I removed i_dev. We still have i_rdev.)


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-21 19:35                       ` viro
@ 2003-04-21 20:02                         ` Linus Torvalds
  0 siblings, 0 replies; 43+ messages in thread
From: Linus Torvalds @ 2003-04-21 20:02 UTC (permalink / raw)
  To: viro
  Cc: Christoph Hellwig, Roman Zippel, David S. Miller,
	Andries.Brouwer, linux-kernel


On Mon, 21 Apr 2003 viro@parcelfarce.linux.theplanet.co.uk wrote:
> 
> stat() family, ustat(2), quota syscall, ioctls that pass device numbers,
> /dev/raw, RAID, probably process accounting.
> 
> FWIW, I believe that you are overestimating the amount of internal code
> that cares about device numbers.

I don't think so. I agree that it's not very many places, and in fact the 
reason we currently do _not_ do dev_t replacement at system call boundary 
is that it looks to be so rare that it's easier to always use the user 
representation, and then always do the explicit MINOR/MAJOR in the places 
that use dev_t.

I don't really care which way it is done (ie system call boundary or in 
usage), and I'm happy with either - as long as it always _does_ get done, 
and nobody ever uses the user representation that can have aliases for 
anything important.

(My preference, quite frankly, is to always have major/minor be explicit, 
and never deal with "dev_t" at all. Especially with a 64-bit dev_t it is 
actually often _faster_ and _simpler_ to just carry major/minor around 
explicitly because then gcc won't ever have to worry it's small deficient 
brain about "unsigned long long".)

		Linus


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-21 19:04                   ` Linus Torvalds
@ 2003-04-21 19:59                     ` H. Peter Anvin
  0 siblings, 0 replies; 43+ messages in thread
From: H. Peter Anvin @ 2003-04-21 19:59 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <Pine.LNX.4.44.0304211153270.9109-100000@home.transmeta.com>
By author:    Linus Torvalds <torvalds@transmeta.com>
In newsgroup: linux.dev.kernel
> 
> But that _will_ force aliasing, unless you start doing some really funky 
> things (make the dev_t look more like a UTF-8 unicode-like extension, 
> which is obviously possible). In other words, there will be OTHER values 
> for "dev_t" that will _also_ look like the tuple <3,1>.
> 
>  - other values of dev_t that also look like <3,1> had better act 
>    _identically_ to the legacy values. It _has_ to work this way, since 
>    otherwise you'd have a total maintenance nightmare, with "ls -l"  
>    showing two device files as being identical, yet having different
>    behaviour.
> 

Actually, the lessons learned from many things including UTF-8 (which
unfortunately does have aliasing) seems to indicate that the only
right answer is that noncanonical aliases are *illegal.*  If we do
mapping on the syscall boundary, then the kernel will always report
canonical form, and we should just throw -EINVAL on receiving a
noncanonical device number if such a thing can exist at all.

FWIW, here is a completely alias-free encoding of dev_t which is also
backwards compatible and hole-free:

  dev_t := major<31:8> . minor<31:8> . major<7:0> . minor<7:0>

where . is bitwise concatenation.  One of the major advantages, other
that being alias-free, is that the resulting code is free from
conditionals.

typedef __u64 dev_t;

static inline __u32 MAJOR(dev_t __d)
{
	return (__u32)(__d >> 32) & 0xffffff00 |
	       (__u32)(__d >> 8)  & 0x000000ff;
}
static inline __u32 MINOR(dev_t __d)
{
	return (__u32)(__d >> 8) & 0xffffff00 |
	       (__u32)__d & 0x000000ff;
}
static inline dev_t MKDEV(__u32 __ma, __u32 __mi)
{
	return ((dev_t)(__ma & 0xffffff00) << 32) |
	       ((dev_t)(__ma & 0x000000ff) << 8) |
	       ((dev_t)(__mi & 0xffffff00) << 8) |
	       ((dev_t)__mi & 0x000000ff);
}

In i386 assembly language, using regcall(%eax,%edx,%ecx):

MAJOR:
	movb %ah,%dl
	movl %edx,%eax
	ret

MINOR:
	movb %al,%ah
	movb %dl,%al
	rorl $8,%eax
	ret

MKDEV:
	movl %eax,%ecx
	shll $8,%eax
	movb %dl,%ah
	movb %cl,%al
	shrl $24,%ecx
	movb %cl,%dl
	ret
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-21 19:05                     ` Linus Torvalds
@ 2003-04-21 19:35                       ` viro
  2003-04-21 20:02                         ` Linus Torvalds
  0 siblings, 1 reply; 43+ messages in thread
From: viro @ 2003-04-21 19:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, Roman Zippel, David S. Miller,
	Andries.Brouwer, linux-kernel

On Mon, Apr 21, 2003 at 12:05:32PM -0700, Linus Torvalds wrote:
> 
> On Mon, 21 Apr 2003 viro@parcelfarce.linux.theplanet.co.uk wrote:
> > 
> > Let's go for 32:32 internal and simply map upon mknod(2) and friends.
                                                             ^^^^^^^^^^^ ;-)
> stat() too.

stat() family, ustat(2), quota syscall, ioctls that pass device numbers,
/dev/raw, RAID, probably process accounting.

FWIW, I believe that you are overestimating the amount of internal code
that cares about device numbers.  Recent example: tty drivers.  99% of
references to tty->device were of form
	minor(tty->device)-tty->driver.start_minor.
Adding tty->index initialized to the above
	a) cleans the code up and kills a bunch of typos
	b) is obvious (albeit minor) optimization
	c) makes much more sense from the driver POV - "that's 5th of my
ttys" vs. "when somebody opens a device with dev_t equal to 5:69, they'll
get this tty".  The latter makes sense when we are opening that sucker -
at the same time when we decide which driver will handle it in the first
place.

If anything, I'd rather see code in char_dev.c give us a triple -
file_operations, pointer to whatever object driver wanted to associate
with this device number (tty_driver, in case of tty layer) and index.

We can do that easily (drivers/block/genhd.c has almost exactly what
we need), old drivers can still use minor() to their hearts' contest
and we can start killing a *lot* of ugly warts.  Old register_chrdev()
will work as usual - just tell the char_dev.c that everything with
that major should get a triple (file_operations, NULL, minor).  No
changes required....

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-21 18:54                 ` viro
@ 2003-04-21 19:16                   ` Jörn Engel
  0 siblings, 0 replies; 43+ messages in thread
From: Jörn Engel @ 2003-04-21 19:16 UTC (permalink / raw)
  To: viro
  Cc: Linus Torvalds, Christoph Hellwig, Roman Zippel, David S. Miller,
	Andries.Brouwer, linux-kernel, David Woodhouse

On Mon, 21 April 2003 19:54:24 +0100, viro@parcelfarce.linux.theplanet.co.uk wrote:
> On Mon, Apr 21, 2003 at 11:35:31AM -0700, Linus Torvalds wrote:
> > 
> > Since they are always in canonical format, there is no way for them to 
> > have the aliasing issue. However, even then they _should_ be careful, 
> > since it would be very confusing (and bad) if they consider
> > 
> > 	0x00010100 	(major 1, minor 256)
> > 
> > to be fundamentally different from
> > 
> > 	0x01ff		(major 1, minor 255)
> > 
> > and cause problems that way.
> > 
> > In other words, if I'm a device driver, and I say that I want "range 
> > 0-0xfff" for "major 2", then I had better get _all_ of it.
> 
> Sure.  However, note that right now there is only one driver that
> wants a range bigger than 256 (and has to split it).  UNIX 98 ptys.

Once this whole matter has settled down a little, mtdchar might want
more than 256 as well. The good news though, is that the old range
should stay unchanged for compatibility and the more-than-256 range
can remain unsplit.

Jörn

-- 
Simplicity is prerequisite for reliability.
-- Edsger W. Dijkstra

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-21 18:58                   ` viro
@ 2003-04-21 19:05                     ` Linus Torvalds
  2003-04-21 19:35                       ` viro
  0 siblings, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2003-04-21 19:05 UTC (permalink / raw)
  To: viro
  Cc: Christoph Hellwig, Roman Zippel, David S. Miller,
	Andries.Brouwer, linux-kernel


On Mon, 21 Apr 2003 viro@parcelfarce.linux.theplanet.co.uk wrote:
> 
> Let's go for 32:32 internal and simply map upon mknod(2) and friends.

stat() too.

> On the syscall boundary.  End of problem.

I agree - that would make it always be obvious where the mapping happens, 
_and_ it cleanly avoids the alias issue internally, so that we don't have 
to play games in device drivers that want big ranges.

		Linus


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-21 18:47                 ` Christoph Hellwig
  2003-04-21 18:58                   ` viro
@ 2003-04-21 19:04                   ` Linus Torvalds
  2003-04-21 19:59                     ` H. Peter Anvin
  1 sibling, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2003-04-21 19:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Roman Zippel, David S. Miller, Andries.Brouwer, linux-kernel


On Mon, 21 Apr 2003, Christoph Hellwig wrote:
>
> Why do we need to do a mapping?  Old applications just won't see the
> high bits (they're mapped to whatever overflow value) - values that
> fit into the old 16bit range should never be remapped.

Ehh.. Old and new drivers alike will use the MAJOR() macro, and that macro
had better work with old and new kernels. Agreed? (And if you don't agree,
don't even bother to answer, I'm not really interested in even discussing
something so fundamental).

And regardless of whether a person uses an old or a new library, they had
better see the same MAJOR() and MINOR() values for a legacy device, like
/dev/hda1. In other words, the library version of MAJOR(),MINOR() _has_ to
return the value 3,1, or it can break perfectly valid programs.

Again, if you don't agree, don't even bother sending me email any more
about this issue. This is not negotiable. We _will_ have backwards and
forwards compatibility, and that's final.

This means that MAJOR() has to look at bits 8..15 if the value is small. 
No ifs, buts and maybes about it.

HOWEVER, clearly MAJOR() has to look at other bits too, otherwise it 
wouldn't make any sense to make a bigger dev_t in the first place. The 
current MAJOR() is the logical extension.

But that _will_ force aliasing, unless you start doing some really funky 
things (make the dev_t look more like a UTF-8 unicode-like extension, 
which is obviously possible). In other words, there will be OTHER values 
for "dev_t" that will _also_ look like the tuple <3,1>.

And my requirements are that

 - other values of dev_t that also look like <3,1> had better act 
   _identically_ to the legacy values. It _has_ to work this way, since 
   otherwise you'd have a total maintenance nightmare, with "ls -l"  
   showing two device files as being identical, yet having different
   behaviour.

 - Device drivers that ask for "major 3, minors 0-0xfff" _have_ to do the 
   sane thing. In particular, in the presense of aliases (see above), it 
   has to match _both_ aliases (see above).

These are not things open for discussion. We know what the behaviour MUST 
BE. Aliases _have_ to behave identically, anything else is _indisputably_ 
crap.

And I claim that this means that you have to have a mapping somewhere. 
You're free to come up with new ideas, but I don't think it will work. 
Keep the above rules in mind: backwards compatibility and aliases that 
work identically. That's all it really boils down to.

		Linus


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-21 18:47                 ` Christoph Hellwig
@ 2003-04-21 18:58                   ` viro
  2003-04-21 19:05                     ` Linus Torvalds
  2003-04-21 19:04                   ` Linus Torvalds
  1 sibling, 1 reply; 43+ messages in thread
From: viro @ 2003-04-21 18:58 UTC (permalink / raw)
  To: Christoph Hellwig, Linus Torvalds, Roman Zippel, David S. Miller,
	Andries.Brouwer, linux-kernel

On Mon, Apr 21, 2003 at 07:47:49PM +0100, Christoph Hellwig wrote:
> On Mon, Apr 21, 2003 at 11:44:59AM -0700, Linus Torvalds wrote:
> > We HAVE to do the mapping somewhere. Old applications only use the lower 
> > 16 bits, and that's just something that MUST NOT be broken. 
> > 
> > The question is only _where_ (not whether) we do the mapping. Right now we 
> > keep "dev_t" in teh same format as we give back to user space, and thus we 
> > always map into that format internally. But we don't have to: we can have 
> > an internal format that is different from the one we show users.
> 
> Why do we need to do a mapping?  Old applications just won't see the
> high bits (they're mapped to whatever overflow value) - values that
> fit into the old 16bit range should never be remapped.

Why?  Whenever we deal with fs code, we _do_ map anyway (bytesex, if nothing
else).  Ditto for any network filesystems.

Let's go for 32:32 internal and simply map upon mknod(2) and friends.
On the syscall boundary.  End of problem.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-21 18:35               ` Linus Torvalds
@ 2003-04-21 18:54                 ` viro
  2003-04-21 19:16                   ` Jörn Engel
  0 siblings, 1 reply; 43+ messages in thread
From: viro @ 2003-04-21 18:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, Roman Zippel, David S. Miller,
	Andries.Brouwer, linux-kernel

On Mon, Apr 21, 2003 at 11:35:31AM -0700, Linus Torvalds wrote:
> 
> On Mon, 21 Apr 2003 viro@parcelfarce.linux.theplanet.co.uk wrote:
> > > I personally think that anything that uses "dev_t" in _any_ other way than 
> > > <major,minor> is fundamentally broken.
> > 
> > Do you consider internal use of MKDEV-produced constants broken?
> 
> Since they are always in canonical format, there is no way for them to 
> have the aliasing issue. However, even then they _should_ be careful, 
> since it would be very confusing (and bad) if they consider
> 
> 	0x00010100 	(major 1, minor 256)
> 
> to be fundamentally different from
> 
> 	0x01ff		(major 1, minor 255)
> 
> and cause problems that way.
> 
> In other words, if I'm a device driver, and I say that I want "range 
> 0-0xfff" for "major 2", then I had better get _all_ of it.

Sure.  However, note that right now there is only one driver that
wants a range bigger than 256 (and has to split it).  UNIX 98 ptys.
All other character devices ask for less than that.  Block devices
are not asking for large ranges at all (it's either "map that area
this way" or "here's the range of device numbers for partitions of
that disk").

IOW, the only case that might be tempting is devpts.  And there we
simply don't care whether it's current 128:0--128:255, 129:0--129:255, ...
or 128:0--2047.  These guys live on a virtual fs that doesn't get
exported over network.

For now I'd propose to wrap large ranges over the major boundary _and_
have devpts ask for single range.  That allows to clean pty.c now (and have
current behaviour preserved) and after the switch we get only one change -
all these guys migrate to major 128.  Which, AFAICS, is a Good Thing(tm).
Everything else stays with the numebrs that are used now.

Comments?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-21 18:44               ` Linus Torvalds
  2003-04-21 18:47                 ` Christoph Hellwig
@ 2003-04-21 18:51                 ` H. Peter Anvin
  1 sibling, 0 replies; 43+ messages in thread
From: H. Peter Anvin @ 2003-04-21 18:51 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <Pine.LNX.4.44.0304211141590.9109-100000@home.transmeta.com>
By author:    Linus Torvalds <torvalds@transmeta.com>
In newsgroup: linux.dev.kernel
> 
> Yes, we could make dev_t's internally be always 32+32, and do the
> marshalling at stat() time. That would actually be my preferred approach, 
> and would solve some of the problems with using "dev_t" as an opaque type 
> right now (ie it would solve the "discontiguous region" issue.
> 

That would be Andries' kdev_t approach.

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-21 18:44               ` Linus Torvalds
@ 2003-04-21 18:47                 ` Christoph Hellwig
  2003-04-21 18:58                   ` viro
  2003-04-21 19:04                   ` Linus Torvalds
  2003-04-21 18:51                 ` H. Peter Anvin
  1 sibling, 2 replies; 43+ messages in thread
From: Christoph Hellwig @ 2003-04-21 18:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Roman Zippel, David S. Miller, Andries.Brouwer, linux-kernel

On Mon, Apr 21, 2003 at 11:44:59AM -0700, Linus Torvalds wrote:
> We HAVE to do the mapping somewhere. Old applications only use the lower 
> 16 bits, and that's just something that MUST NOT be broken. 
> 
> The question is only _where_ (not whether) we do the mapping. Right now we 
> keep "dev_t" in teh same format as we give back to user space, and thus we 
> always map into that format internally. But we don't have to: we can have 
> an internal format that is different from the one we show users.

Why do we need to do a mapping?  Old applications just won't see the
high bits (they're mapped to whatever overflow value) - values that
fit into the old 16bit range should never be remapped.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-21 18:35             ` Christoph Hellwig
  2003-04-21 18:42               ` H. Peter Anvin
@ 2003-04-21 18:44               ` Linus Torvalds
  2003-04-21 18:47                 ` Christoph Hellwig
  2003-04-21 18:51                 ` H. Peter Anvin
  1 sibling, 2 replies; 43+ messages in thread
From: Linus Torvalds @ 2003-04-21 18:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Roman Zippel, David S. Miller, Andries.Brouwer, linux-kernel


On Mon, 21 Apr 2003, Christoph Hellwig wrote:
> 
> Just think s/major/dev_lo/g and s/minor/dev_hi/g.  This is the
> represantation for a legacy protocol.  Just because fat thinks
> of a filename as 8+3 Linux filenames don't have to be that format.

Yes, we could make dev_t's internally be always 32+32, and do the
marshalling at stat() time. That would actually be my preferred approach, 
and would solve some of the problems with using "dev_t" as an opaque type 
right now (ie it would solve the "discontiguous region" issue.

> Umm, no.  You're far to major/minor biased to realized live get a lot
> sipler for use if we don't do any complicated mapping of old dev_t
> to the larger dev_t.

We HAVE to do the mapping somewhere. Old applications only use the lower 
16 bits, and that's just something that MUST NOT be broken. 

The question is only _where_ (not whether) we do the mapping. Right now we 
keep "dev_t" in teh same format as we give back to user space, and thus we 
always map into that format internally. But we don't have to: we can have 
an internal format that is different from the one we show users.

		Linus


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-21 18:35             ` Christoph Hellwig
@ 2003-04-21 18:42               ` H. Peter Anvin
  2003-04-21 18:44               ` Linus Torvalds
  1 sibling, 0 replies; 43+ messages in thread
From: H. Peter Anvin @ 2003-04-21 18:42 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <20030421193546.A10287@infradead.org>
By author:    Christoph Hellwig <hch@infradead.org>
In newsgroup: linux.dev.kernel
> 
> Umm, no.  You're far to major/minor biased to realized live get a lot
> sipler for use if we don't do any complicated mapping of old dev_t
> to the larger dev_t.  With the proper ranges we can just map it
> numerically 1:1 to the new dev_t.  Yes, that means it's all in one
> new "major".  But who cares?
> 

Everyone who has to fix up the resulting mess.  There is such a thing
as backwards compatibility, you know, and it's usually a good thing.

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-21 18:22           ` Linus Torvalds
  2003-04-21 18:27             ` viro
@ 2003-04-21 18:35             ` Christoph Hellwig
  2003-04-21 18:42               ` H. Peter Anvin
  2003-04-21 18:44               ` Linus Torvalds
  2003-04-21 23:49             ` Roman Zippel
  2 siblings, 2 replies; 43+ messages in thread
From: Christoph Hellwig @ 2003-04-21 18:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, Roman Zippel, David S. Miller,
	Andries.Brouwer, linux-kernel

On Mon, Apr 21, 2003 at 11:22:51AM -0700, Linus Torvalds wrote:
> Actually, we still do it for both block _and_ character devices.
> 
> Look at "nfs*xdr.c" to see what's up.

Just think s/major/dev_lo/g and s/minor/dev_hi/g.  This is the
represantation for a legacy protocol.  Just because fat thinks
of a filename as 8+3 Linux filenames don't have to be that format.

> The fact that the kernel internally has generalized it away doesn't 
> matter. Any kernel virtualization of the number still _has_ to account for 
> the fact that it's a real thing.
> 
> Put another way:
> 
> 	0x0000000000000101
> 
> _has_ to open the same file as
> 
> 	0x0000000100000001
> 
> because otherwise the kernel virtualization is broken (since they will
> look the same to a user, and they will end up being written to disk the
> same way).

Umm, no.  You're far to major/minor biased to realized live get a lot
sipler for use if we don't do any complicated mapping of old dev_t
to the larger dev_t.  With the proper ranges we can just map it
numerically 1:1 to the new dev_t.  Yes, that means it's all in one
new "major".  But who cares?


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-21 18:27             ` viro
@ 2003-04-21 18:35               ` Linus Torvalds
  2003-04-21 18:54                 ` viro
  0 siblings, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2003-04-21 18:35 UTC (permalink / raw)
  To: viro
  Cc: Christoph Hellwig, Roman Zippel, David S. Miller,
	Andries.Brouwer, linux-kernel


On Mon, 21 Apr 2003 viro@parcelfarce.linux.theplanet.co.uk wrote:
> > I personally think that anything that uses "dev_t" in _any_ other way than 
> > <major,minor> is fundamentally broken.
> 
> Do you consider internal use of MKDEV-produced constants broken?

Since they are always in canonical format, there is no way for them to 
have the aliasing issue. However, even then they _should_ be careful, 
since it would be very confusing (and bad) if they consider

	0x00010100 	(major 1, minor 256)

to be fundamentally different from

	0x01ff		(major 1, minor 255)

and cause problems that way.

In other words, if I'm a device driver, and I say that I want "range 
0-0xfff" for "major 2", then I had better get _all_ of it. That means that 
I'd better get

	0x0200-0x02ff

_and_

	0x00020100-0x00020fff

and quite frankly, I think that ends up being a lot easier to handle if 
you just always consider it to be a <major,minor> split.

(but as long as the end result is equivalent, who cares?)

			Linus


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-21 18:22           ` Linus Torvalds
@ 2003-04-21 18:27             ` viro
  2003-04-21 18:35               ` Linus Torvalds
  2003-04-21 18:35             ` Christoph Hellwig
  2003-04-21 23:49             ` Roman Zippel
  2 siblings, 1 reply; 43+ messages in thread
From: viro @ 2003-04-21 18:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, Roman Zippel, David S. Miller,
	Andries.Brouwer, linux-kernel

On Mon, Apr 21, 2003 at 11:22:51AM -0700, Linus Torvalds wrote:
 
> One way to avoid the bug is to always keep all dev_t numbers in "canonical 
> format". Which happens automatically if the interface is <major, minor> 
> rather than a 64-bit blob.
> 
> I personally think that anything that uses "dev_t" in _any_ other way than 
> <major,minor> is fundamentally broken.

Do you consider internal use of MKDEV-produced constants broken?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-21 18:10         ` Christoph Hellwig
  2003-04-21 18:21           ` viro
@ 2003-04-21 18:22           ` Linus Torvalds
  2003-04-21 18:27             ` viro
                               ` (2 more replies)
  1 sibling, 3 replies; 43+ messages in thread
From: Linus Torvalds @ 2003-04-21 18:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Roman Zippel, David S. Miller, Andries.Brouwer, linux-kernel


On Mon, 21 Apr 2003, Christoph Hellwig wrote:
> 
> Not anymore for blockdevices.  And now that Al's back not anymore soon
> for charater devices, too :)

Actually, we still do it for both block _and_ character devices.

Look at "nfs*xdr.c" to see what's up.

In other words, that split is definitely not virtual. It's a real thing 
with real visibility for users.

The fact that the kernel internally has generalized it away doesn't 
matter. Any kernel virtualization of the number still _has_ to account for 
the fact that it's a real thing.

Put another way:

	0x0000000000000101

_has_ to open the same file as

	0x0000000100000001

because otherwise the kernel virtualization is broken (since they will
look the same to a user, and they will end up being written to disk the
same way).

Thus any code that only looks at 64-bit dev_t without taking this into 
account is BUGGY. 

One way to avoid the bug is to always keep all dev_t numbers in "canonical 
format". Which happens automatically if the interface is <major, minor> 
rather than a 64-bit blob.

I personally think that anything that uses "dev_t" in _any_ other way than 
<major,minor> is fundamentally broken.

		Linus


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-21 18:10         ` Christoph Hellwig
@ 2003-04-21 18:21           ` viro
  2003-04-21 18:22           ` Linus Torvalds
  1 sibling, 0 replies; 43+ messages in thread
From: viro @ 2003-04-21 18:21 UTC (permalink / raw)
  To: Christoph Hellwig, Linus Torvalds, Roman Zippel, David S. Miller,
	Andries.Brouwer, linux-kernel

On Mon, Apr 21, 2003 at 07:10:13PM +0100, Christoph Hellwig wrote:
> On Mon, Apr 21, 2003 at 11:01:21AM -0700, Linus Torvalds wrote:
> > Oh, the split has huge meaning inside the kernel. We split the number 
> > every time we open the device, and use that split to look up the result.
> 
> Not anymore for blockdevices.  And now that Al's back not anymore soon
> for charater devices, too :)

Oh, we certainly _do_ split - simply because there are ranges that
belong to same driver (or driver and object).

	However, the split boundary is not uniform - it depends on
driver/object/whatnot.  IMO it's a moot point by now, anyway - most
of the kernel couldn't care less about device numbers.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-21 18:01       ` Linus Torvalds
@ 2003-04-21 18:10         ` Christoph Hellwig
  2003-04-21 18:21           ` viro
  2003-04-21 18:22           ` Linus Torvalds
  0 siblings, 2 replies; 43+ messages in thread
From: Christoph Hellwig @ 2003-04-21 18:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Roman Zippel, David S. Miller, Andries.Brouwer, hch, linux-kernel

On Mon, Apr 21, 2003 at 11:01:21AM -0700, Linus Torvalds wrote:
> Oh, the split has huge meaning inside the kernel. We split the number 
> every time we open the device, and use that split to look up the result.

Not anymore for blockdevices.  And now that Al's back not anymore soon
for charater devices, too :)

Really, it's a bad idea to introduce this arbitrary split now that
the major/minor split is basically gone inkernel after lots of hard
work.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-21 11:59     ` Roman Zippel
@ 2003-04-21 18:01       ` Linus Torvalds
  2003-04-21 18:10         ` Christoph Hellwig
  0 siblings, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2003-04-21 18:01 UTC (permalink / raw)
  To: Roman Zippel; +Cc: David S. Miller, Andries.Brouwer, hch, linux-kernel


On Mon, 21 Apr 2003, Roman Zippel wrote:
> 
> May I ask what advantage it has to split that number?
> Everywhere it's just a simple number, only when we present that number to 
> the user, we create some kind of illusion that this split has any meaning.

Oh, the split has huge meaning inside the kernel. We split the number 
every time we open the device, and use that split to look up the result.

There's another issue, though, which may or may not be a good thing. If we 
split and re-create the device number, that will always force the "dev_t" 
to be in "canonical form", ie if the major and minor both fit in 8 bits, 
then we will always fit the whole dev_t in 16 bits. 

This shows up as a difference in the two approaches: if you consider the 
user-supplied number as a unsplit binary "dev_t", then the user can supply 
a 64-bit number like 0x00000001000000001, and we will actually use that as 
the dev_t. However, if we split it up, and the user supplies <1,1>, then 
we will always generate 0x0000000000000101 as the 64-bit dev_t, and there 
is never any way to generate the "non-canonical" form.

Does it matter? Probably not. I actually think it's slightly preferable to 
alway shave things in the canonical form, and the networked filesystems 
will generally canonicalize it anyway since they usually split it up into 
major/minor. But it _is_ potentially a user-visible difference.

		Linus


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-20 21:56   ` Linus Torvalds
@ 2003-04-21 11:59     ` Roman Zippel
  2003-04-21 18:01       ` Linus Torvalds
  0 siblings, 1 reply; 43+ messages in thread
From: Roman Zippel @ 2003-04-21 11:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David S. Miller, Andries.Brouwer, hch, linux-kernel

Hi,

On Sun, 20 Apr 2003, Linus Torvalds wrote:

> The kernel should get major and minor numbers. It's a sad mistake that 
> UNIX uses "dev_t" in the first place, and clearly the glibc interface to 
> user mode will have to be that historical braindamage. But we should 
> realize that the _right_ interface is keeping the <major, minor> tuple 
> explicit, and any new system call interfaces should be of that type.

May I ask what advantage it has to split that number?
Everywhere it's just a simple number, only when we present that number to 
the user, we create some kind of illusion that this split has any meaning.

bye, Roman


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-20 22:12 Andries.Brouwer
@ 2003-04-21  6:31 ` H. Peter Anvin
  0 siblings, 0 replies; 43+ messages in thread
From: H. Peter Anvin @ 2003-04-21  6:31 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <UTC200304202212.h3KMCIW15403.aeb@smtp.cwi.nl>
By author:    Andries.Brouwer@cwi.nl
In newsgroup: linux.dev.kernel
>
> > the _right_ interface is keeping the <major, minor> tuple explicit
> 
> Good! I debated with myself and changed three times between
> 
> 	mknod64(path, mode, major, minor);
> 
> and
> 
> 	mknod64(path, mode, devhi, devlo);
> 
> 
> This becomes the fourth time.
> 
> Andries
> 
> [My choice is still unsigned int major, minor.
> Do you prefer __u32?]
> 

Yes, if we are splitting it we definitely should make it __u32
(uint32_t), especially to be nice to the 64/32 platforms.

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
@ 2003-04-20 22:12 Andries.Brouwer
  2003-04-21  6:31 ` H. Peter Anvin
  0 siblings, 1 reply; 43+ messages in thread
From: Andries.Brouwer @ 2003-04-20 22:12 UTC (permalink / raw)
  To: davem, torvalds; +Cc: Andries.Brouwer, hch, linux-kernel

> the _right_ interface is keeping the <major, minor> tuple explicit

Good! I debated with myself and changed three times between

	mknod64(path, mode, major, minor);

and

	mknod64(path, mode, devhi, devlo);

This becomes the fourth time.

Andries


[My choice is still unsigned int major, minor.
Do you prefer __u32?]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-20 21:43 ` David S. Miller
@ 2003-04-20 21:56   ` Linus Torvalds
  2003-04-21 11:59     ` Roman Zippel
  0 siblings, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2003-04-20 21:56 UTC (permalink / raw)
  To: David S. Miller; +Cc: Andries.Brouwer, hch, linux-kernel


On Sun, 20 Apr 2003, David S. Miller wrote:
>    
>    Such an abstract statement nobody can disagree with.
>    Do you have an opinion in the mknod case?
>    
> If you are trying to reach 64-bit dev_t's, why not
> use __u64 as the argument?

That's not correct either.

I will refuse a patch that just takes "dev_t", since I just think that's 
stupid.

The kernel should get major and minor numbers. It's a sad mistake that 
UNIX uses "dev_t" in the first place, and clearly the glibc interface to 
user mode will have to be that historical braindamage. But we should 
realize that the _right_ interface is keeping the <major, minor> tuple 
explicit, and any new system call interfaces should be of that type.

			Linus


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-20 21:26 Andries.Brouwer
@ 2003-04-20 21:43 ` David S. Miller
  2003-04-20 21:56   ` Linus Torvalds
  0 siblings, 1 reply; 43+ messages in thread
From: David S. Miller @ 2003-04-20 21:43 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: hch, linux-kernel, torvalds

   From: Andries.Brouwer@cwi.nl
   Date: Sun, 20 Apr 2003 23:26:24 +0200 (MEST)
   
   Such an abstract statement nobody can disagree with.
   Do you have an opinion in the mknod case?
   
If you are trying to reach 64-bit dev_t's, why not
use __u64 as the argument?

What I wouldn't be happy with would be any usage of
"long" or pointers as that is the usual source of troubles.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
@ 2003-04-20 21:26 Andries.Brouwer
  2003-04-20 21:43 ` David S. Miller
  0 siblings, 1 reply; 43+ messages in thread
From: Andries.Brouwer @ 2003-04-20 21:26 UTC (permalink / raw)
  To: Andries.Brouwer, davem; +Cc: hch, linux-kernel, torvalds

    From davem@redhat.com  Sun Apr 20 23:12:17 2003

    > Yesterday or the day before Linus preferred __u32 etc for this
    > loopinfo64 ioctl, so I did it that way. Here, since mknod is a
    > traditional Unix system call, I am still inclined to prefer
    > (unsigned) int above __u32.  Of course it doesn't matter much.

    To 64-bit platforms implementing 32-bit compatability layers,
    it can matter a ton to use portable vs. non-portable types.

Such an abstract statement nobody can disagree with.
Do you have an opinion in the mknod case?

(For example, I do not suppose anybody would argue that
open() should return an __u32 instead of an int.)

Andries

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-20 20:34 Andries.Brouwer
@ 2003-04-20 21:12 ` David S. Miller
  0 siblings, 0 replies; 43+ messages in thread
From: David S. Miller @ 2003-04-20 21:12 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: hch, linux-kernel, torvalds

On Sun, 2003-04-20 at 13:34, Andries.Brouwer@cwi.nl wrote:
> Yesterday or the day before Linus preferred __u32 etc for this
> loopinfo64 ioctl, so I did it that way. Here, since mknod is a
> traditional Unix system call, I am still inclined to prefer
> (unsigned) int above __u32.  Of course it doesn't matter much.

To 64-bit platforms implementing 32-bit compatability layers,
it can matter a ton to use portable vs. non-portable types.

-- 
David S. Miller <davem@redhat.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
@ 2003-04-20 20:34 Andries.Brouwer
  2003-04-20 21:12 ` David S. Miller
  0 siblings, 1 reply; 43+ messages in thread
From: Andries.Brouwer @ 2003-04-20 20:34 UTC (permalink / raw)
  To: Andries.Brouwer, hch; +Cc: linux-kernel, torvalds

    From: Christoph Hellwig <hch@infradead.org>

    On Sun, Apr 20, 2003 at 08:39:32PM +0200, Andries.Brouwer@cwi.nl wrote:
    > Change the type of the mknod arg from dev_t to unsigned int.
    > Add (for i386) mknod64.

    Please make the argument for mknod/mknod64 __u32/__u64.  And I
    don't think adding the syscall makes sense before the internal
    dev_t representation has changed.

Yes, there is a dozen rather uninteresting patches that can
be applied any moment. But a new system call is more important,
so I show it in public at some earlier stage, so that Linus and
others, like you, can comment.

Yesterday or the day before Linus preferred __u32 etc for this
loopinfo64 ioctl, so I did it that way. Here, since mknod is a
traditional Unix system call, I am still inclined to prefer
(unsigned) int above __u32.  Of course it doesn't matter much.

Andries

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] new system call mknod64
  2003-04-20 18:39 Andries.Brouwer
@ 2003-04-20 18:52 ` Christoph Hellwig
  0 siblings, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2003-04-20 18:52 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: torvalds, linux-kernel

On Sun, Apr 20, 2003 at 08:39:32PM +0200, Andries.Brouwer@cwi.nl wrote:
> Change the type of the mknod arg from dev_t to unsigned int.
> Add (for i386) mknod64.

Please make the argument for mknod/mknod64 __u32/__u64.  And I
don't think adding the syscall makes sense before the internal
dev_t represantion has changed.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH] new system call mknod64
@ 2003-04-20 18:39 Andries.Brouwer
  2003-04-20 18:52 ` Christoph Hellwig
  0 siblings, 1 reply; 43+ messages in thread
From: Andries.Brouwer @ 2003-04-20 18:39 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel

Change the type of the mknod arg from dev_t to unsigned int.
Add (for i386) mknod64.

diff -u --recursive --new-file -X /linux/dontdiff a/arch/i386/kernel/entry.S b/arch/i386/kernel/entry.S
--- a/arch/i386/kernel/entry.S	Tue Mar 25 04:54:29 2003
+++ b/arch/i386/kernel/entry.S	Sun Apr 20 19:13:34 2003
@@ -852,6 +852,7 @@
  	.long sys_clock_gettime		/* 265 */
  	.long sys_clock_getres
  	.long sys_clock_nanosleep
+	.long sys_mknod64
  
  
 nr_syscalls=(.-sys_call_table)/4
diff -u --recursive --new-file -X /linux/dontdiff a/fs/namei.c b/fs/namei.c
--- a/fs/namei.c	Sun Apr 20 12:59:32 2003
+++ b/fs/namei.c	Sun Apr 20 19:13:34 2003
@@ -1403,11 +1403,12 @@
 	return error;
 }
 
-asmlinkage long sys_mknod(const char __user * filename, int mode, dev_t dev)
+static long
+do_mknod(const char __user *filename, int mode, dev_t dev)
 {
 	int error = 0;
-	char * tmp;
-	struct dentry * dentry;
+	char *tmp;
+	struct dentry *dentry;
 	struct nameidata nd;
 
 	if (S_ISDIR(mode))
@@ -1448,6 +1449,19 @@
 	return error;
 }
 
+asmlinkage long
+sys_mknod(const char __user *filename, int mode, unsigned int dev)
+{
+	return do_mknod(filename, mode, dev);
+}
+
+asmlinkage long
+sys_mknod64(const char __user *filename, int mode,
+	    unsigned int devhi, unsigned int devlo)
+{
+	return do_mknod(filename, mode, ((u64) devhi << 32) + devlo);
+}
+
 int vfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
 {
 	int error = may_create(dir, dentry);
@@ -2140,8 +2154,8 @@
 {
 	struct page * page;
 	struct address_space *mapping = dentry->d_inode->i_mapping;
-	page = read_cache_page(mapping, 0, (filler_t *)mapping->a_ops->readpage,
-				NULL);
+	page = read_cache_page(mapping, 0,
+			       (filler_t *)mapping->a_ops->readpage, NULL);
 	if (IS_ERR(page))
 		goto sync_fail;
 	wait_on_page_locked(page);
diff -u --recursive --new-file -X /linux/dontdiff a/include/asm-i386/unistd.h b/include/asm-i386/unistd.h
--- a/include/asm-i386/unistd.h	Mon Feb 24 23:02:56 2003
+++ b/include/asm-i386/unistd.h	Sun Apr 20 19:13:34 2003
@@ -273,8 +273,9 @@
 #define __NR_clock_gettime	(__NR_timer_create+6)
 #define __NR_clock_getres	(__NR_timer_create+7)
 #define __NR_clock_nanosleep	(__NR_timer_create+8)
+#define __NR_sys_mknod64	268
 
-#define NR_syscalls 268
+#define NR_syscalls 269
 
 /* user-visible error numbers are in the range -1 - -124: see <asm-i386/errno.h> */
 

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2003-04-22 19:52 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-04-22  1:02 [PATCH] new system call mknod64 Andries.Brouwer
2003-04-22  1:32 ` H. Peter Anvin
2003-04-22  2:01   ` Jamie Lokier
2003-04-22  2:52     ` H. Peter Anvin
2003-04-22  6:00       ` jw schultz
2003-04-22 17:17         ` H. Peter Anvin
     [not found] <20030421215009$2052@gated-at.bofh.it>
     [not found] ` <20030421231010$7ee3@gated-at.bofh.it>
     [not found]   ` <20030422000016$17e3@gated-at.bofh.it>
     [not found]     ` <20030422083014$0fe2@gated-at.bofh.it>
2003-04-22 20:02       ` Arnd Bergmann
  -- strict thread matches above, loose matches on Subject: below --
2003-04-21 21:48 Andries.Brouwer
2003-04-21 22:07 ` Linus Torvalds
2003-04-21 21:43 Andries.Brouwer
2003-04-21 23:02 ` H. Peter Anvin
2003-04-21 23:50   ` Jamie Lokier
2003-04-22  8:24     ` Shachar Shemesh
2003-04-20 22:12 Andries.Brouwer
2003-04-21  6:31 ` H. Peter Anvin
2003-04-20 21:26 Andries.Brouwer
2003-04-20 21:43 ` David S. Miller
2003-04-20 21:56   ` Linus Torvalds
2003-04-21 11:59     ` Roman Zippel
2003-04-21 18:01       ` Linus Torvalds
2003-04-21 18:10         ` Christoph Hellwig
2003-04-21 18:21           ` viro
2003-04-21 18:22           ` Linus Torvalds
2003-04-21 18:27             ` viro
2003-04-21 18:35               ` Linus Torvalds
2003-04-21 18:54                 ` viro
2003-04-21 19:16                   ` Jörn Engel
2003-04-21 18:35             ` Christoph Hellwig
2003-04-21 18:42               ` H. Peter Anvin
2003-04-21 18:44               ` Linus Torvalds
2003-04-21 18:47                 ` Christoph Hellwig
2003-04-21 18:58                   ` viro
2003-04-21 19:05                     ` Linus Torvalds
2003-04-21 19:35                       ` viro
2003-04-21 20:02                         ` Linus Torvalds
2003-04-21 19:04                   ` Linus Torvalds
2003-04-21 19:59                     ` H. Peter Anvin
2003-04-21 18:51                 ` H. Peter Anvin
2003-04-21 23:49             ` Roman Zippel
2003-04-20 20:34 Andries.Brouwer
2003-04-20 21:12 ` David S. Miller
2003-04-20 18:39 Andries.Brouwer
2003-04-20 18:52 ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).