* RFD: x32 ABI system call numbers
@ 2011-08-26 23:00 H. Peter Anvin
  2011-08-26 23:13 ` Linus Torvalds
  0 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2011-08-26 23:00 UTC (permalink / raw)
  To: LKML, Linus Torvalds, H.J. Lu, Ingo Molnar, Thomas Gleixner

Hello all,

As most of you know, H.J. Lu and I have been working on a native 32-bit
ABI for x86-64 Linux.  H.J. has had a prototype git tree for a while; I
am currently in the process of cleaning up the kernel patches to post.

Before posting, Ingo suggested that I discuss the handling of system
calls, as this affects some of the machinery that needs to go into the
patchset.

x32 uses mostly the compat system calls already available for the i386
ABI (which means it also uses i386 ABI numbers and data structure
layouts).  There are only seven, mostly signal-related, entirely new
system calls, and most of them are trivial wrappers.

x32 uses the same SYSCALL64 instruction as native x86-64.  Currently, on
x86, the choice of system call ABI is a purely local property -- a
64-bit process can call int $0x80 and get the i386 ABI.  I have wanted
to keep this property and avoid testing global state for the meaning of
a system call.  As such, the only thing that is available to distinguish
an x32 system call from an x86-64 system call is the system call number
itself.

In the current patchset, rather than having two separate system call
tables (which would add several instructions to the system call entry
path, including for native 64-bit binaries) we have added the x32 system
calls to the 64-bit system call table with a small gap (starting at 512)
to avoid adding to the cache footprint of native 64-bit processes.

However, this leads to an annoying problem for the system calls which do
*not* need to be duplicated between x86-64 and x32, which is actually
most system calls -- 218 of 310 in the current kernel.  Unfortunately, a
single subsystem -- input -- uses is_compat() on a bunch of the I/O
paths, even changing things like the text format of sysfs entries
depending on the ABI of the user space process.

Rather than duplicating the system call table, we are proposing to deal
with that by setting bit 30 in the system call number across the board
when called from x32, so we end up with:

# Shared system call, sys_read (0)

x86-64:		%eax = 0x00000000
x32:		%eax = 0x40000000

# Unshared system call, sys_stat (4/513)

x86-64:		%eax = 0x00000004
x32:		%eax = 0x40000201

The extra bit would be masked off and only affect device drivers like
input which relies on is_compat().
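
To make the proposed encoding concrete, here is a minimal C sketch of how
the flag bit and the table index would be separated.  The macro and
function names are hypothetical and are not taken from the patchset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the flag bit proposed above (bit 30). */
#define X32_SYSCALL_BIT 0x40000000u

/* True if the caller used the x32 numbering (bit 30 set in %eax). */
static bool is_x32_syscall(uint32_t eax)
{
    return (eax & X32_SYSCALL_BIT) != 0;
}

/* Index into the single shared system call table, flag bit removed. */
static uint32_t syscall_table_index(uint32_t eax)
{
    return eax & ~X32_SYSCALL_BIT;
}
```

With the values from the examples above, 0x40000201 yields table index 513
with the x32 flag set, and 0x40000000 yields index 0 (sys_read).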

The question here is if anyone has a reason to believe this would be
unacceptable.

	-hpa

* Re: RFD: x32 ABI system call numbers
  2011-08-26 23:00 RFD: x32 ABI system call numbers H. Peter Anvin
@ 2011-08-26 23:13 ` Linus Torvalds
  2011-08-26 23:39   ` H. Peter Anvin
  0 siblings, 1 reply; 94+ messages in thread
From: Linus Torvalds @ 2011-08-26 23:13 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: LKML, H.J. Lu, Ingo Molnar, Thomas Gleixner

On Fri, Aug 26, 2011 at 4:00 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> Rather than duplicating the system call table, we are proposing to deal
> with that by setting bit 30 in the system call number across the board
> when called from x32, so we end up with:
>
> # Shared system call, sys_read (0)
>
> x86-64:         %eax = 0x00000000
> x32:            %eax = 0x40000000
>
> # Unshared system call, sys_stat (4/513)
>
> x86-64:         %eax = 0x00000004
> x32:            %eax = 0x40000201
>
> The extra bit would be masked off and only affect device drivers like
> input which relies on is_compat().

So a couple of questions:

 - why do we need another system call model at all?

 - And if that is clarified, why in the name of christ would you
unshare something like 'sys_stat()' to begin with? I really hope that's
just a crazy example, because otherwise I just have to assume that
people are being stupid.

 - Assuming the two others can be explained, and if this is relevant
only for x86-64, why not put it in bit 62? Right now we do

     call *sys_call_table(,%rax,8)

   which means that the high three bits (in a 64-bit word) are the
perfect place to put any flags: they'll be ignored without us having
to do any masking at all (of course, we'd still have to think about
the "cmpq $__NR_syscall_max,%rax" detail, so who knows).
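
A small C sketch of that observation, assuming the dispatch really is a
scaled-index call on the full 64-bit register.  The function names are
illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Byte offset that "call *sys_call_table(,%rax,8)" adds to the table base.
 * The multiplication wraps modulo 2^64, so bits 61..63 of rax fall off.
 */
static uint64_t table_byte_offset(uint64_t rax)
{
    return rax * 8u;
}

/*
 * The caveat: a 64-bit compare against the maximum syscall number still
 * sees the flag bit, so the range check would need to change.
 */
static bool passes_range_check(uint64_t rax, uint64_t nr_max)
{
    return rax <= nr_max;
}
```

A number with bit 62 set selects the same table slot as the unflagged
number, but it would be rejected by an unmodified range check.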

> The question here is if anyone has a reason to believe this would be
> unacceptable.

I think the real question is "why?". I think we're missing a lot of
background for why we'd want yet another set of system calls at all,
and why we'd want another state flag. Why can't the x32 code just use
the native 64-bit system calls entirely?

                                   Linus

* Re: RFD: x32 ABI system call numbers
  2011-08-26 23:13 ` Linus Torvalds
@ 2011-08-26 23:39   ` H. Peter Anvin
  2011-08-27  0:36     ` Linus Torvalds
  2011-09-02  6:17     ` Andy Lutomirski
  0 siblings, 2 replies; 94+ messages in thread
From: H. Peter Anvin @ 2011-08-26 23:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: LKML, H.J. Lu, Ingo Molnar, Thomas Gleixner

On 08/26/2011 04:13 PM, Linus Torvalds wrote:
>>
>> The extra bit would be masked off and only affect device drivers like
>> input which relies on is_compat().
> 
> So a couple of questions:
> 
>  - why do we need another system call model at all?

We think we can get more performance for a process which doesn't need
more than 4 GiB of virtual address space by allowing them to keep
pointers 4 bytes long, while still giving them the advantage of 16
64-bit registers, PC-relative addressing, and so on.  Furthermore, there
are users who seem more willing to port code known to not be 64-bit
clean to x32 than to do a whole new port.

If the question is "why not just thunk this in userspace", the answer is
that we'd like to take advantage of the compat layer already in the kernel.

If the question is "why not just use int $0x80" we actually did that in
early prototyping, but SYSCALL64 is much faster.

>  - And if that is clarified, why in the name of christ would you
> unshare something like 'sys_stat()' to begin with? I really hope that's
> just a crazy example, because otherwise I just have to assume that
> people are being stupid.

sys_stat is unshared because it involves data structures in memory.  In
x32, this invokes compat_sys_newstat just like you would from an i386
process.

In order to not create a completely new ABI we use the i386 in-memory
data structure layout everywhere, except of course for the ones where
the register set differences matter (for some of the signals.)

We have followed the 32-bit model fairly slavishly -- there is LFS vs
non-LFS for example -- to make the porting to x32 easier.  That doesn't
mean that there aren't system calls in our current list that are
unshared when they shouldn't be... I haven't done the full audit of the
list yet.

>  - Assuming the two others can be explained, and if this is relevant
> only for x86-64, why not put it in bit 62? Right now we do
> 
>      call *sys_call_table(,%rax,8)
> 
>    which means that the high three bits (in a 64-bit word) are the
> perfect place to put any flags: they'll be ignored without us having
> to do any masking at all (of course, we'd still have to think about
> the "cmpq $__NR_syscall_max,%rax" detail, so who knows).

First of all, loading a value into the high half of the 64-bit register
means using a 10-byte-long instruction instead of a 5-byte-long
instruction.  Second of all, we decided at some point (I don't know
when) that the system call number is %eax, not %rax, and we actually
mask off the top 32 bits already.  This change thus just means changing:

	movl	%eax, %eax

to

	andl	$~0x40000000, %eax

Note that by keeping bit 31 intact we still do the right thing with the
compare.

(We avoided using bit 31 because there are a number of places the kernel
assumes that a system call number expressed as either an int or a long
must be positive, and that a negative number represents a non-system
call kernel entry, e.g. interrupts.)
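
A rough C model of the masking and the range check described above.  The
names are hypothetical and the maximum of 608 is only an illustrative
value, not the kernel's actual __NR_syscall_max.

```c
#include <stdbool.h>
#include <stdint.h>

#define X32_SYSCALL_BIT 0x40000000u /* hypothetical name for bit 30 */
#define NR_SYSCALL_MAX  608u        /* illustrative value only */

/*
 * Model of the 32-bit and-immediate shown above: drops the upper 32 bits
 * of the register (a 32-bit operation zero-extends) and clears bit 30,
 * leaving bit 31 untouched.
 */
static uint64_t mask_syscall_nr(uint64_t rax)
{
    return (uint32_t)rax & ~X32_SYSCALL_BIT;
}

/* Unsigned compare: a number with bit 31 set is still out of range. */
static bool syscall_nr_in_range(uint64_t rax)
{
    return mask_syscall_nr(rax) <= NR_SYSCALL_MAX;
}

/* Code that treats the number as a signed int sees bit 31 as "negative". */
static bool is_nonnegative_as_int(uint32_t eax)
{
    return (int32_t)eax >= 0;
}
```

Under these assumptions 0x40000201 passes the range check and stays
positive as an int, while anything with bit 31 set fails both tests.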

>> The question here is if anyone has a reason to believe this would be
>> unacceptable.
> 
> I think the real question is "why?". I think we're missing a lot of
> background for why we'd want yet another set of system calls at all,
> and why we'd want another state flag. Why can't the x32 code just use
> the native 64-bit system calls entirely?

I hope I have explained it above.

	-hpa

* Re: RFD: x32 ABI system call numbers
  2011-08-26 23:39   ` H. Peter Anvin
@ 2011-08-27  0:36     ` Linus Torvalds
  2011-08-27  0:43       ` Linus Torvalds
  2011-08-27  0:57       ` H. Peter Anvin
  2011-09-02  6:17     ` Andy Lutomirski
  1 sibling, 2 replies; 94+ messages in thread
From: Linus Torvalds @ 2011-08-27  0:36 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: LKML, H.J. Lu, Ingo Molnar, Thomas Gleixner

On Fri, Aug 26, 2011 at 4:39 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 08/26/2011 04:13 PM, Linus Torvalds wrote:
>>
>>  - why do we need another system call model at all?
>
> We think we can get more performance for a process which doesn't need
> more than 4 GiB of virtual address space by allowing them to keep
> pointers 4 bytes long, while still giving them the advantage of 16
> 64-bit registers, PC-relative addressing, and so on.  Furthermore, there
> are users who seem more willing to port code known to not be 64-bit
> clean to x32 than to do a whole new port.

Let me repeat: what the f*&% does that have to do with a new system call model?

NOTHING.

At most, it means that we should add a PROT_4G to the mmap() system
call, so that user space can say "I want the result in the low 4G"
range. But even that wouldn't need a new system call.

> If the question is "why not just thunk this in userspace", the answer is
> that we'd like to take advantage of the compat layer already in the kernel.

And you *still* don't answer the question, you just make up new red
herring sentences.

Apart from mmap() (and to a lesser degree brk()), the kernel almost
never makes up user pointers. It's user space that points to them, and
gives them as arguments. There are a few system calls that take pointers
to pointers: execve(), readv/writev, send/recvmsg, but then the actual
*example* you give isn't even one of those.
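
To make the pointer-to-pointer case concrete, here is a minimal C sketch
of why readv/writev need translation when user space has 4-byte pointers.
The struct and function names are hypothetical stand-ins, not the kernel's
own definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* What a 32-bit caller stores in memory: 4-byte pointer, 4-byte length. */
struct iovec32 {
    uint32_t iov_base;
    uint32_t iov_len;
};

/* What a 64-bit kernel expects: 8-byte pointer, 8-byte length. */
struct iovec64 {
    uint64_t iov_base;
    uint64_t iov_len;
};

/* Widen each element, as a compat readv/writev entry point has to do. */
static void widen_iovec(const struct iovec32 *in, struct iovec64 *out,
                        size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i].iov_base = in[i].iov_base; /* zero-extend the pointer */
        out[i].iov_len = in[i].iov_len;   /* zero-extend the length */
    }
}
```

A plain pointer argument passed in a register needs no such step, which is
the distinction being drawn here.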

> If the question is "why not just use int $0x80" we actually did that in
> early prototyping, but SYSCALL64 is much faster.

No. The actual question is "why are you giving crazy examples, and
what the f*ck is going on"?

> sys_stat is unshared because it involves data structures in memory.  In
> x32, this invokes compat_sys_newstat just like you would from an i386
> process.

.. and this is exactly the kind of answer that makes me go NAK NAK NAK.

Christ no.

If you are doing a new native x32 model, then you DAMN WELL USE THE
EXISTING "stat()" system call.

There is *ZERO* reason to not use it. Use the standard 64-bit
structure layout. Why the hell would it be a new system call?

That's what I'm arguing against. This kind of crazy "let's make up YET
ANOTHER interface, even though the existing interface would work".
WHY?

If you want to be compatible with "int 0x80" and old libraries, then I
really don't see why you would introduce *anything* new.

And if it is truly a new ABI, then damn it, use the existing 64-bit
system calls as far as possible.

The "mix and match randomly and then introduce new system calls
because you made a bad design decision" sounds just crazy.

                              Linus

* Re: RFD: x32 ABI system call numbers
  2011-08-27  0:36     ` Linus Torvalds
@ 2011-08-27  0:43       ` Linus Torvalds
  2011-08-27  0:53         ` H. Peter Anvin
                           ` (2 more replies)
  2011-08-27  0:57       ` H. Peter Anvin
  1 sibling, 3 replies; 94+ messages in thread
From: Linus Torvalds @ 2011-08-27  0:43 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: LKML, H.J. Lu, Ingo Molnar, Thomas Gleixner

On Fri, Aug 26, 2011 at 5:36 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> There is *ZERO* reason to not use it. Use the standard 64-bit
> structure layout. Why the hell would it be a new system call?

Oh, I see why you do that. It's because our 64-bit 'struct stat' uses
"unsigned long" etc.

Just fix that. Make it use __u64 instead of "unsigned long", and
everything should "just work". The 64-bit kernel will not change any
ABI, and when you compile your new ia32 model, it will do the right
thing too.

The fact that we still use "unsigned long" in the x86 <asm/stat.h> is
certainly a bit embarrassing, but I guess that all predates us being
more aware of 32/64-bit issues. It really should be fixed regardless
of any ia32 interface issues.
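
A simplified C sketch of the layout argument.  The record below is a
hypothetical three-field stand-in, not the real struct stat, and the
fixed-width types model what each compiler would see for "unsigned long".

```c
#include <stddef.h>
#include <stdint.h>

/* Declared with "unsigned long", as seen by an LP64 (x86-64) compiler. */
struct rec_long_lp64 {
    uint64_t dev;
    uint64_t ino;
    uint64_t nlink;
};

/* The same declaration as seen by a compiler where long is 4 bytes. */
struct rec_long_ilp32 {
    uint32_t dev;
    uint32_t ino;
    uint32_t nlink;
};

/* Declared with a fixed 64-bit type: the same under both models. */
struct rec_fixed {
    uint64_t dev;
    uint64_t ino;
    uint64_t nlink;
};
```

With "unsigned long" the two views disagree on size and offsets (24 versus
12 bytes here); with a fixed 64-bit type they agree, so one system call
could serve both.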

                       Linus

* Re: RFD: x32 ABI system call numbers
  2011-08-27  0:43       ` Linus Torvalds
@ 2011-08-27  0:53         ` H. Peter Anvin
  2011-08-27  1:18           ` Linus Torvalds
  2011-08-27  1:12         ` H. Peter Anvin
  2011-09-06 20:40         ` Florian Weimer
  2 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2011-08-27  0:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: LKML, H.J. Lu, Ingo Molnar, Thomas Gleixner

On 08/26/2011 05:43 PM, Linus Torvalds wrote:
> On Fri, Aug 26, 2011 at 5:36 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> There is *ZERO* reason to not use it. Use the standard 64-bit
>> structure layout. Why the hell would it be a new system call?
> 
> Oh, I see why you do that. It's because our 64-bit 'struct stat' uses
> "unsigned long" etc.
> 
> Just fix that. Make it use __u64 instead of "unsigned long", and
> everything should "just work". The 64-bit kernel will not change any
> ABI, and when you compile your new ia32 model, it will do the right
> thing too.
> 
> The fact that we still use "unsigned long" in the x86 <asm/stat.h> is
> certainly a bit embarrassing, but I guess that all predates us being
> more aware of 32/64-bit issues. It really should be fixed regardless
> of any ia32 interface issues.
> 

Unfortunately, there is actually a reason for the use of "unsigned long"
here -- it means that the combination of the time and the _nsec fields
matches struct timespec.  struct timespec/struct timeval is one of those
things that it would be really nice if we *could* change (it's not
inherently pointer-sized, and it really should be 64 bits), but struct
timespec and struct timeval are embedded in a number of memory
structures, some of which have pointers; and they are used by ioctls.

	-hpa

* Re: RFD: x32 ABI system call numbers
  2011-08-27  0:36     ` Linus Torvalds
  2011-08-27  0:43       ` Linus Torvalds
@ 2011-08-27  0:57       ` H. Peter Anvin
  2011-08-27  4:40         ` Christoph Hellwig
  1 sibling, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2011-08-27  0:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: LKML, H.J. Lu, Ingo Molnar, Thomas Gleixner

On 08/26/2011 05:36 PM, Linus Torvalds wrote:
> If you want to be compatible with "int 0x80" and old libraries, then I
> really don't see why you would introduce *anything* new.

Just to be clear, the reason to keep the LFS stuff in there was to be
compatible with the existing 32-bit *programming model*, so that a
program recompiled from i386 to x32 should behave the same.

Not that anyone should compile without -D_FILE_OFFSET_BITS=64 these days...

	-hpa


* Re: RFD: x32 ABI system call numbers
  2011-08-27  0:43       ` Linus Torvalds
  2011-08-27  0:53         ` H. Peter Anvin
@ 2011-08-27  1:12         ` H. Peter Anvin
  2011-08-27  1:42           ` Linus Torvalds
  2011-09-06 20:40         ` Florian Weimer
  2 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2011-08-27  1:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: LKML, H.J. Lu, Ingo Molnar, Thomas Gleixner

For reference, this is the current list (again, unaudited!) of unshared
system calls.  Only the ones with *x32* in the entry point name have
any new code in the kernel at all.

	-hpa

#
# x32 system calls start at 512 to avoid cache impact for native 64 bit
#
512     x32     open                    compat_sys_open
513     x32     stat                    compat_sys_newstat
514     x32     fstat                   compat_sys_newfstat
515     x32     lstat                   compat_sys_newlstat
516     x32     rt_sigaction            sys32_rt_sigaction
517     x32     rt_sigprocmask          sys32_rt_sigprocmask
518     x32     rt_sigreturn            stub_x32_rt_sigreturn
519     x32     ioctl                   compat_sys_ioctl
520     x32     readv                   compat_sys_readv
521     x32     writev                  compat_sys_writev
522     x32     select                  compat_sys_select
523     x32     shmat                   compat_sys_x32_shmat
524     x32     shmctl                  compat_sys_shmctl
525     x32     nanosleep               compat_sys_nanosleep
526     x32     getitimer               compat_sys_getitimer
527     x32     setitimer               compat_sys_setitimer
528     x32     recvfrom                compat_sys_recvfrom
529     x32     sendmsg                 compat_sys_sendmsg
530     x32     recvmsg                 compat_sys_recvmsg
531     x32     setsockopt              compat_sys_setsockopt
532     x32     getsockopt              compat_sys_getsockopt
533     x32     execve                  stub_x32_execve
534     x32     wait4                   compat_sys_wait4
535     x32     semctl                  compat_sys_x32_semctl
536     x32     msgsnd                  compat_sys_x32_msgsnd
537     x32     msgrcv                  compat_sys_x32_msgrcv
538     x32     msgctl                  compat_sys_msgctl
539     x32     fcntl                   compat_sys_fcntl64
540     x32     getdents                compat_sys_getdents
541     x32     gettimeofday            compat_sys_gettimeofday
542     x32     getrlimit               compat_sys_getrlimit
543     x32     getrusage               compat_sys_getrusage
544     x32     sysinfo                 compat_sys_sysinfo
545     x32     times                   compat_sys_times
546     x32     rt_sigpending           sys32_rt_sigpending
547     x32     rt_sigtimedwait         compat_sys_rt_sigtimedwait
548     x32     rt_sigqueueinfo         sys32_rt_sigqueueinfo
549     x32     sigaltstack             stub_x32_sigaltstack
550     x32     utime                   compat_sys_utime
551     x32     ustat                   compat_sys_ustat
552     x32     statfs                  compat_sys_statfs
553     x32     fstatfs                 compat_sys_fstatfs
554     x32     sched_rr_get_interval   sys32_sched_rr_get_interval
555     x32     _sysctl                 compat_sys_sysctl
556     x32     adjtimex                compat_sys_adjtimex
557     x32     setrlimit               compat_sys_setrlimit
558     x32     settimeofday            compat_sys_settimeofday
559     x32     quotactl                sys32_quotactl
560     x32     nfsservctl              compat_sys_nfsservctl
561     x32     time                    compat_sys_time
562     x32     futex                   compat_sys_futex
563     x32     sched_setaffinity       compat_sys_sched_setaffinity
564     x32     sched_getaffinity       compat_sys_sched_getaffinity
565     x32     io_setup                compat_sys_io_setup
566     x32     io_getevents            compat_sys_io_getevents
567     x32     io_submit               compat_sys_io_submit
568     x32     getdents64              compat_sys_getdents64
569     x32     semtimedop              compat_sys_semtimedop
570     x32     timer_create            compat_sys_timer_create
571     x32     timer_settime           compat_sys_timer_settime
572     x32     timer_gettime           compat_sys_timer_gettime
573     x32     clock_settime           compat_sys_clock_settime
574     x32     clock_gettime           compat_sys_clock_gettime
575     x32     clock_getres            compat_sys_clock_getres
576     x32     clock_nanosleep         compat_sys_clock_nanosleep
577     x32     utimes                  compat_sys_utimes
578     x32     mq_open                 compat_sys_mq_open
579     x32     mq_timedsend            compat_sys_mq_timedsend
580     x32     mq_timedreceive         compat_sys_mq_timedreceive
581     x32     mq_notify               compat_sys_mq_notify
582     x32     mq_getsetattr           compat_sys_mq_getsetattr
583     x32     kexec_load              compat_sys_kexec_load
584     x32     waitid                  compat_sys_waitid
585     x32     openat                  compat_sys_openat
586     x32     futimesat               compat_sys_futimesat
587     x32     fstatat64               sys32_fstatat
588     x32     pselect6                compat_sys_pselect6
589     x32     ppoll                   compat_sys_ppoll
590     x32     set_robust_list         compat_sys_set_robust_list
591     x32     get_robust_list         compat_sys_get_robust_list
592     x32     vmsplice                compat_sys_vmsplice
593     x32     move_pages              compat_sys_move_pages
594     x32     utimensat               compat_sys_utimensat
595     x32     signalfd                compat_sys_signalfd
596     x32     timerfd_settime         compat_sys_timerfd_settime
597     x32     timerfd_gettime         compat_sys_timerfd_gettime
598     x32     signalfd4               compat_sys_signalfd4
599     x32     rt_tgsigqueueinfo       compat_sys_rt_tgsigqueueinfo
600     x32     stat64                  sys32_stat64
601     x32     fstat64                 sys32_fstat64
602     x32     lstat64                 sys32_lstat64
603     x32     statfs64                compat_sys_statfs64
604     x32     fstatfs64               compat_sys_fstatfs64
605     x32     recvmmsg                compat_sys_recvmmsg
606     x32     open_by_handle_at       compat_sys_open_by_handle_at
607     x32     clock_adjtime           compat_sys_clock_adjtime
608     x32     sendmmsg                compat_sys_sendmmsg


* Re: RFD: x32 ABI system call numbers
  2011-08-27  0:53         ` H. Peter Anvin
@ 2011-08-27  1:18           ` Linus Torvalds
  2011-08-27  1:35             ` H. Peter Anvin
  0 siblings, 1 reply; 94+ messages in thread
From: Linus Torvalds @ 2011-08-27  1:18 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: LKML, H.J. Lu, Ingo Molnar, Thomas Gleixner

On Fri, Aug 26, 2011 at 5:53 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> Unfortunately, there is actually a reason for the use of "unsigned long"
> here -- it means that the combination of the time and the _nsec fields
> matches struct timespec.  struct timespec/struct timeval is one of those
> things that it would be really nice if we *could* change (it's not
> inherently pointer-sized, and it really should be 64 bits), but struct
> timespec and struct timeval are embedded in a number of memory
> structures, some of which have pointers; and they are used by ioctls.

But for "struct stat"? You can't depend on that anyway.

I do agree that it would be nice to just make "struct timeval" always
be 64 bits, and I actually think it *should* be done for any new ABI.
If for no other reason than "time_t" should be 64-bit, in order to
avoid all the issues with 2038.
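
For reference, a short C sketch of the 2038 limit, assuming it is built on
a system whose own time_t is 64 bits wide.  The helper names are made up
for the illustration.

```c
#include <stdint.h>
#include <time.h>

/* UTC year of the largest value a signed 32-bit time_t can hold. */
static int last_year_of_32bit_time(void)
{
    time_t t = (time_t)INT32_MAX;
    struct tm *tm = gmtime(&t);

    return tm ? tm->tm_year + 1900 : -1;
}

/* What a 32-bit seconds counter holds one second later (it wraps). */
static int32_t one_second_later(int32_t secs)
{
    return (int32_t)((uint32_t)secs + 1u);
}
```

The last representable second falls in 2038; one second later a 32-bit
counter wraps to the most negative value, which decodes as a date in 1901.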

Because the POSIX definition of 'timeval' is *not* that the fields must
be 'long'. It's "time_t" + "suseconds_t", so it should be entirely
possible to make 'struct timeval' use 64-bit fields.

"struct timespec" seems to be designed as "time_t" + "long" which is
sad. But again, we could easily make it be

  typedef __u64 time_t;

  struct timespec {
    time_t tv_sec;
    long tv_nsec;
    long tv_unused;
  };

and it would actually be perfectly compatible with x86-64.
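
A hedged C sketch of why the two layouts line up.  Fixed-width types stand
in for what each ABI's compiler would see, signed 64-bit is used for
tv_sec, and the struct names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* x86-64 view: 8-byte time_t, 8-byte long. */
struct timespec_lp64 {
    int64_t tv_sec;
    int64_t tv_nsec;
};

/* Proposed 32-bit view: 64-bit time_t, 4-byte long, 4 bytes unused. */
struct timespec_x32 {
    int64_t tv_sec;
    int32_t tv_nsec;
    int32_t tv_unused;
};
```

Both structs are 16 bytes with tv_nsec at offset 8.  Since x86 is
little-endian, the 32-bit tv_nsec overlays the low half of the 64-bit
field, so the layouts agree as long as tv_unused is zero.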

And I really do think that a new 32-bit ABI is *much* better off
trying to be compatible with x86-64 (and avoiding things like 2038)
than it is trying to be compatible with the old-style x86-32 binaries.
I realize that it may be *easier* to be compatible with x86-32 and
just add a few new system calls, but I think it's wrong.

2038 is a long time away for legacy binaries. It's *not* all that long
away if you are introducing a new 32-bit mode for performance.

                       Linus

But I think that's independent of 'struct stat' anyway.

* Re: RFD: x32 ABI system call numbers
  2011-08-27  1:18           ` Linus Torvalds
@ 2011-08-27  1:35             ` H. Peter Anvin
  2011-08-27  1:45               ` Linus Torvalds
  0 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2011-08-27  1:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: LKML, H.J. Lu, Ingo Molnar, Thomas Gleixner

On 08/26/2011 06:18 PM, Linus Torvalds wrote:
> On Fri, Aug 26, 2011 at 5:53 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> Unfortunately, there is actually a reason for the use of "unsigned long"
>> here -- it means that the combination of the time and the _nsec fields
>> matches struct timespec.  struct timespec/struct timeval is one of those
>> things that it would be really nice if we *could* change (it's not
>> inherently pointer-sized, and it really should be 64 bits), but struct
>> timespec and struct timeval are embedded in a number of memory
>> structures, some of which have pointers; and they are used by ioctls.
> 
> But for "struct stat"? You can't depend on that anyway.

No, but since we have the conversion function in the kernel already it
seems we might as well use it.

> And I really do think that a new 32-bit ABI is *much* better off
> trying to be compatible with x86-64 (and avoiding things like 2038)
> than it is trying to be compatible with the old-style x86-32 binaries.
> I realize that it may be *easier* to be compatible with x86-32 and
> just add a few new system calls, but I think it's wrong.

It is wrong to more than a small degree, to be sure.  However, "easier"
here is the difference of working through every data structure used by
every ioctl in every driver in the kernel and figuring out which ones have
pointers or pointer-sized items, when that is work that has already been
done once.  Admittedly that does come with some legacy, but it does not
appear to be a significant extra cost.

Y2038 is certainly the single biggest issue here, and I fully admit to
not having a good answer on that one.  Note that applies to every single
32-bit ABI in the Linux kernel, including asm-generic which is used by
all brand new architectures.

	-hpa

* Re: RFD: x32 ABI system call numbers
  2011-08-27  1:12         ` H. Peter Anvin
@ 2011-08-27  1:42           ` Linus Torvalds
  2011-08-29 19:01             ` Geert Uytterhoeven
  0 siblings, 1 reply; 94+ messages in thread
From: Linus Torvalds @ 2011-08-27  1:42 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: LKML, H.J. Lu, Ingo Molnar, Thomas Gleixner

On Fri, Aug 26, 2011 at 6:12 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> For reference, this is the current list (again, unaudited!) of unshared
> system calls.  Only the ones with *x32* in the entry point name have
> any new code in the kernel at all.

So a *lot* of these make me extremely unhappy.

> # x32 system calls start at 512 to avoid cache impact for native 64 bit
> #
> 512     x32     open                    compat_sys_open

The only difference between open and compat_sys_open() is that the
latter doesn't set O_LARGEFILE, no?

That seems *extremely* wrong. Darn it, there is no reason we should
ever allow the old LARGEFILE crap in a new model. So why would we ever
want to use that compat_open()?

> 513     x32     stat                    compat_sys_newstat
> 514     x32     fstat                   compat_sys_newfstat
> 515     x32     lstat                   compat_sys_newlstat

So these I'm unhappy with because I really think we should just use
the 64-bit stat format, instead of dicking around with the legacy
formats.

The native x86-64 stat is *better* than the crazy i386 formats, for
chissake! The i386 "newstat" has those crazy padding fields. They make
no sense.

> 516     x32     rt_sigaction            sys32_rt_sigaction
> 517     x32     rt_sigprocmask          sys32_rt_sigprocmask
> 518     x32     rt_sigreturn            stub_x32_rt_sigreturn

So these may be valid. I don't know what your stub_x32_rt_sigreturn
is, but I hope it still restores the full 64-bit registers? Even in
ULP64, I assume we still use 64-bit registers for "long long" etc?

> 519     x32     ioctl                   compat_sys_ioctl
> 520     x32     readv                   compat_sys_readv
> 521     x32     writev                  compat_sys_writev

Ok, these are the ones I expect. They are all about structures with
pointers in user space.

> 522     x32     select                  compat_sys_select

But this one I really suspect we'd be better off just having 64-bit
timeval, obviating the need for the compat system call.

time_t really *should* be 64-bit, the same way "off_t" should be. If
it's not, there's something wrong.

> 523     x32     shmat                   compat_sys_x32_shmat
> 524     x32     shmctl                  compat_sys_shmctl

I assume this is due to the same "return shm segment in low 4GB" thing.

I do think it would be better to have a "SHM_4G" flag that gets set by
user space, but I guess that's ok.

> 525     x32     nanosleep               compat_sys_nanosleep
> 526     x32     getitimer               compat_sys_getitimer
> 527     x32     setitimer               compat_sys_setitimer

Again, these would be better off with a 64-bit time_t. Seriously.

> 528     x32     recvfrom                compat_sys_recvfrom
> 529     x32     sendmsg                 compat_sys_sendmsg
> 530     x32     recvmsg                 compat_sys_recvmsg

Ok, pointers in user land.

> 531     x32     setsockopt              compat_sys_setsockopt
> 532     x32     getsockopt              compat_sys_getsockopt

Grr. I guess these fall under the same heading.

> 533     x32     execve                  stub_x32_execve

Yes.

> 534     x32     wait4                   compat_sys_wait4

Why is this? "rusage"? Can't we just make that 64-bit?

> 535     x32     semctl                  compat_sys_x32_semctl
> 536     x32     msgsnd                  compat_sys_x32_msgsnd
> 537     x32     msgrcv                  compat_sys_x32_msgrcv
> 538     x32     msgctl                  compat_sys_msgctl

Ok.

> 539     x32     fcntl                   compat_sys_fcntl64

flock?

> 540     x32     getdents                compat_sys_getdents

.. but why this? Isn't 'linux_dirent64' good enough?

> 541     x32     gettimeofday            compat_sys_gettimeofday

64-bit time_t?

> 542     x32     getrlimit               compat_sys_getrlimit
> 543     x32     getrusage               compat_sys_getrusage

Looks like another "should just be 64-bit"

> 544     x32     sysinfo                 compat_sys_sysinfo
> 545     x32     times                   compat_sys_times

64-bit time_t should fix this.
...
> 560     x32     nfsservctl              compat_sys_nfsservctl

We've removed this one.

.. lots more. It really looks annoying to me. A lot of the remaining
ones should just be "int 0x80", they aren't performance critical. Why
do they have to have new system call entry points?

                        Linus

* Re: RFD: x32 ABI system call numbers
  2011-08-27  1:35             ` H. Peter Anvin
@ 2011-08-27  1:45               ` Linus Torvalds
  0 siblings, 0 replies; 94+ messages in thread
From: Linus Torvalds @ 2011-08-27  1:45 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: LKML, H.J. Lu, Ingo Molnar, Thomas Gleixner

On Fri, Aug 26, 2011 at 6:35 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> Y2038 is certainly the single biggest issue here, and I fully admit to
> not having a good answer on that one.  Note that applies to every single
> 32-bit ABI in the Linux kernel, including asm-generic which is used by
> all brand new architectures.

So I've had journalists ask me about it, and I've always said that by
the time 2038 rolls around, we'll all be using 64-bit CPU's.

Which I think is a reasonable answer.

But if those 64-bit CPU's are then running some ILP32 "fast mode",
then that answer goes out the window. And I really think it's
fundamentally wrong to have "off_t" and "time_t" be 32-bit in this day
and age.  I'd hate to introduce a new mode like that. It just makes me
go "Eww".

                         Linus

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-27  0:57       ` H. Peter Anvin
@ 2011-08-27  4:40         ` Christoph Hellwig
  2011-08-29 15:04           ` Arnd Bergmann
  0 siblings, 1 reply; 94+ messages in thread
From: Christoph Hellwig @ 2011-08-27  4:40 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, LKML, H.J. Lu, Ingo Molnar, Thomas Gleixner

On Fri, Aug 26, 2011 at 05:57:34PM -0700, H. Peter Anvin wrote:
> On 08/26/2011 05:36 PM, Linus Torvalds wrote:
> > If you want to be compatible with "int 0x80" and old libraries, then I
> > really don't see why you would introduce *anything* new.
> 
> Just to be clear, the reason to keep the LFS stuff in there was to be
> compatible with the existing 32-bit *programming model*, so that a
> program recompiled from i386 to x32 should behave the same.
> 
> Not that anyone should compile without -D_FILE_OFFSET_BITS=64 these days...

Any new port should not even offer non-LFS system calls.  They are a
pain in the butt, and I would sacrifice a chicken if we could stop glibc
offering it as a default that way.
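
As a minimal sketch of what "LFS only" looks like from the application
side, the fragment below selects the 64-bit file-offset interface and
refuses to build otherwise.  The helper name off_t_bits() is made up for
illustration; _FILE_OFFSET_BITS is the glibc feature test macro already
mentioned above.

```c
/* Must be defined before any system header is included. */
#define _FILE_OFFSET_BITS 64

#include <sys/types.h>

/* Fail the build if the large-file (LFS) interface was not selected. */
_Static_assert(sizeof(off_t) == 8, "off_t must be 64-bit (LFS)");

/* Hypothetical helper (not from the thread): width of off_t, in bits,
 * as seen by this translation unit. */
static int off_t_bits(void)
{
    return (int)(sizeof(off_t) * 8);
}
```

On a port that offers only the LFS system calls, the #define becomes a
no-op and the assertion holds unconditionally.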

> 
> 	-hpa
---end quoted text---

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-27  4:40         ` Christoph Hellwig
@ 2011-08-29 15:04           ` Arnd Bergmann
  2011-08-29 18:31             ` H. Peter Anvin
  0 siblings, 1 reply; 94+ messages in thread
From: Arnd Bergmann @ 2011-08-29 15:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: H. Peter Anvin, Linus Torvalds, LKML, H.J. Lu, Ingo Molnar,
	Thomas Gleixner

On Saturday 27 August 2011, Christoph Hellwig wrote:
> On Fri, Aug 26, 2011 at 05:57:34PM -0700, H. Peter Anvin wrote:
> > On 08/26/2011 05:36 PM, Linus Torvalds wrote:
> > > If you want to be compatible with "int 0x80" and old libraries, then I
> > > really don't see why you would introduce anything new.
> > 
> > Just to be clear, the reason to keep the LFS stuff in there was to be
> > compatible with the existing 32-bit *programming model*, so that a
> > program recompiled from i386 to x32 should behave the same.
> > 
> > Not that anyone should compile without -D_FILE_OFFSET_BITS=64 these days...
> 
> Any new port should not even offer non-LFS system calls.  They are a
> pain in the butt, and I would sacrifice a chicken if we could stop glibc
> offering it as a default that way.

Right. The asm-generic/unistd.h interface doesn't provide them either
for new architectures and expects libc to emulate them for any user
application whose developers can't be bothered to fix their code.

I think I've also commented in the past that I think x32 should use
the same set of syscalls as asm-generic, even if it's more convenient
to use a different ordering.

	Arnd

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-29 15:04           ` Arnd Bergmann
@ 2011-08-29 18:31             ` H. Peter Anvin
  2011-08-30 12:09               ` Arnd Bergmann
  0 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2011-08-29 18:31 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Christoph Hellwig, Linus Torvalds, LKML, H.J. Lu, Ingo Molnar,
	Thomas Gleixner

On 08/29/2011 08:04 AM, Arnd Bergmann wrote:
> 
> Right. The asm-generic/unistd.h interface doesn't provide them either
> for new architectures and expects libc to emulate them for any user
> application whose developers can't be bothered to fix their code.
> 
> I think I've also commented in the past that I think x32 should use
> the same set of syscalls as asm-generic, even if it's more convenient
> to use a different ordering.
> 

It definitely is not convenient to use asm-generic for a whole lot of
reasons, which basically comes down to leveraging the existing x86-64
system calls plus leveraging the i386-on-x86-64 compat layer as much as
possible.

I talked to H.J. this morning and we're certainly dropping the 32-bit
filesystem calls.  I'm going to audit which paths have both time_t
(including struct timespec/timeval) and pointers; that is hopefully a
matter of legwork.  This will mean introducing new ioctls, but it's not
clear how many.

The end result is going to be bigger than the current patchset (which is
+2197 -510, and most of which is just the system call tables themselves;
the balance is only +690 -105), but it is definitely a *better* ABI.

	-hpa

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-27  1:42           ` Linus Torvalds
@ 2011-08-29 19:01             ` Geert Uytterhoeven
  2011-08-29 19:03               ` H. Peter Anvin
                                 ` (2 more replies)
  0 siblings, 3 replies; 94+ messages in thread
From: Geert Uytterhoeven @ 2011-08-29 19:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, LKML, H.J. Lu, Ingo Molnar, Thomas Gleixner

On Sat, Aug 27, 2011 at 03:42, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> time_t really *should* be 64-bit, the same way "off_t" should be. If
> it's not, there's something wrong.

Which will break all this non-portable 32-bit-only source code x32 was invented
for in the first place?
Someone will pass a time_t or off_t and an innocent pointer to a custom
printf-alike function to format it like "%u %s" and it will go bang...
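
A minimal sketch of the portable alternative, assuming a hypothetical
helper format_stamp() that is not part of any code discussed here: the
value is cast to intmax_t and printed with PRIdMAX, so the format string
and the argument agree whether time_t is 32 or 64 bits wide.  Passing a
64-bit time_t to "%u" is undefined behaviour; on ABIs that pass varargs
on a 32-bit stack, the following "%s" then typically picks up half of
the timestamp as its pointer.

```c
#include <inttypes.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: write "<seconds> <label>" into buf.
 * The explicit cast keeps the argument type in step with the
 * conversion specifier regardless of the width of time_t. */
static int format_stamp(char *buf, size_t len, time_t when, const char *label)
{
    return snprintf(buf, len, "%" PRIdMAX " %s", (intmax_t)when, label);
}
```

The same cast works for off_t.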

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-29 19:01             ` Geert Uytterhoeven
@ 2011-08-29 19:03               ` H. Peter Anvin
  2011-08-30  1:17               ` Ted Ts'o
  2011-08-30  1:48               ` Linus Torvalds
  2 siblings, 0 replies; 94+ messages in thread
From: H. Peter Anvin @ 2011-08-29 19:03 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Linus Torvalds, LKML, H.J. Lu, Ingo Molnar, Thomas Gleixner

On 08/29/2011 12:01 PM, Geert Uytterhoeven wrote:
> 
> Which will break all this non-portable 32-bit-only source code x32 was invented
> for in the first place?
> Someone will pass a time_t or off_t and an innocent pointer to a custom
> printf-alike function to format it like "%u %s" and it will go bang...
> 

Yes, this is the downside...

	-hpa


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-29 19:01             ` Geert Uytterhoeven
  2011-08-29 19:03               ` H. Peter Anvin
@ 2011-08-30  1:17               ` Ted Ts'o
  2011-08-30  1:48               ` Linus Torvalds
  2 siblings, 0 replies; 94+ messages in thread
From: Ted Ts'o @ 2011-08-30  1:17 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Linus Torvalds, H. Peter Anvin, LKML, H.J. Lu, Ingo Molnar,
	Thomas Gleixner

On Mon, Aug 29, 2011 at 09:01:37PM +0200, Geert Uytterhoeven wrote:
> On Sat, Aug 27, 2011 at 03:42, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > time_t really *should* be 64-bit, the same way "off_t" should be. If
> > it's not, there's something wrong.
> 
> Which will break all this non-portable 32-bit-only source code x32 was invented
> for in the first place?
> Someone will pass a time_t or off_t and an innocent pointer to a custom
> printf-alike function to format it like "%u %s" and it will go bang...

Well, for old static binaries, the old syscall ABI would still have to
use 32-bit off_t's and time_t's.  And for dynamically linked binaries,
glibc could deal with the compatibility issues with the old ABI,
across the shared library interface, right?

						- Ted

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-29 19:01             ` Geert Uytterhoeven
  2011-08-29 19:03               ` H. Peter Anvin
  2011-08-30  1:17               ` Ted Ts'o
@ 2011-08-30  1:48               ` Linus Torvalds
  2011-08-30  2:16                 ` Kyle Moffett
  2011-08-30  7:00                 ` Geert Uytterhoeven
  2 siblings, 2 replies; 94+ messages in thread
From: Linus Torvalds @ 2011-08-30  1:48 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: H. Peter Anvin, LKML, H.J. Lu, Ingo Molnar, Thomas Gleixner

On Mon, Aug 29, 2011 at 12:01 PM, Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
>
> Which will break all this non-portable 32-bit-only source code x32 was invented
> for in the first place?

NO. HELL NO!

Guys, get a f&*king grip already!

There are absolutely ZERO compatibility issues.

If you want compatibility, you run traditional 32-bit x86.

If you want full 64-bit, you run standard x86-64 binaries.

x32 is *not* about compatibility. It's about pure performance, and
perhaps smaller binaries. Nothing else. If you start blathering about
"compatibility", you're so on the wrong track that it isn't even
funny!

I would seriously suggest that if you want compatible ioctl's, don't
use x32. There's no point. It's not why x32 exists. If this turns into
a "let's add a new compat layer", or "let's just re-do x86-32 all over
again just using 'syscall'", I think the whole thing is totally
pointless.

If something isn't performance-sensitive, why do it in x32 at all? It
involves new libraries, new compiler support, new calling conventions,
and it will be *less well supported* than the traditional compat mode.

The *only* reason to use x32 is:

 - you get all the new x86-64 registers, and the improved calling
convention (big registers, more of them)

 - you get a faster system call.

 - smaller memory footprint (32-bit pointers and long).

But no, "compatibility" isn't one of those reasons. You won't be able
to use old libraries, you won't even be able to use an old compiler,
and I would seriously suggest that maybe we shouldn't care about the
more esoteric ioctl's if they end up being a pain.

                             Linus

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-30  1:48               ` Linus Torvalds
@ 2011-08-30  2:16                 ` Kyle Moffett
  2011-08-30  4:45                   ` H. Peter Anvin
  2011-08-30  7:09                   ` Andi Kleen
  2011-08-30  7:00                 ` Geert Uytterhoeven
  1 sibling, 2 replies; 94+ messages in thread
From: Kyle Moffett @ 2011-08-30  2:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Geert Uytterhoeven, H. Peter Anvin, LKML, H.J. Lu, Ingo Molnar,
	Thomas Gleixner

On Mon, Aug 29, 2011 at 21:48, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Aug 29, 2011 at 12:01 PM, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>> Which will break all this non-portable 32-bit-only source code x32 was invented
>> for in the first place?
>
> There are absolutely ZERO compatibility issues.
>
> If you want compatibility, you run traditional 32-bit x86.
>
> If you want full 64-bit, you run standard x86-64 binaries.
>
> x32 is *not* about compatibility. It's about pure performance, and
> perhaps smaller binaries. Nothing else. If you start blathering about
> "compatibility", you're so on the wrong track that it isn't even
> funny!

I agree.

This is exactly the same reason that PowerPC64 systems are 99%+
32-bit binaries.

When "64-bit" doesn't magically mean "more registers" or "vector ops",
then all it really does is chew up twice as much RAM for every pointer.

The only programs which really care are those which map many gigs
of stuff into memory (IE: big databases, etc).  Even "git" on some
pretty outrageously large repositories can pretty easily page between
pack files without much overhead.

The goal of x32 as I understand it is to allow 32-bit x86 programs to
use all the nifty extra registers and faster instructions (IE: syscall)
without needing to deal with the 2x memory overhead of 64-bit
pointers.
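
To put a rough number on the pointer overhead, here is a small sketch.
The two struct layouts are invented for illustration; fixed-width
integers stand in for pointers so that both layouts can be inspected
from a single build.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for a binary-tree node under the two pointer widths. */
struct node_ptr32 { uint32_t left, right; int32_t key; };
struct node_ptr64 { uint64_t left, right; int32_t key; };

/* Size of one node when "pointers" are 32 bits wide. */
static size_t node_bytes_ptr32(void) { return sizeof(struct node_ptr32); }

/* Size of one node when "pointers" are 64 bits wide (includes any
 * tail padding the ABI adds for alignment). */
static size_t node_bytes_ptr64(void) { return sizeof(struct node_ptr64); }
```

The 32-bit layout is 12 bytes; on x86-64 the 64-bit layout is typically
24 bytes once tail padding is counted, which is where the "twice as much
RAM" estimate for pointer-heavy data comes from.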

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-30  2:16                 ` Kyle Moffett
@ 2011-08-30  4:45                   ` H. Peter Anvin
  2011-08-30  7:06                     ` Geert Uytterhoeven
  2011-08-30  7:09                   ` Andi Kleen
  1 sibling, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2011-08-30  4:45 UTC (permalink / raw)
  To: Kyle Moffett
  Cc: Linus Torvalds, Geert Uytterhoeven, LKML, H.J. Lu, Ingo Molnar,
	Thomas Gleixner

On 08/29/2011 07:16 PM, Kyle Moffett wrote:
> 
> The goal of x32 as I understand it is to allow 32-bit x86 programs to
> use all the nifty extra registers and faster instructions (IE: syscall)
> without needing to deal with the 2x memory overhead of 64-bit
> pointers.
> 

That is the major goal.  A minor goal is to bring x86-64 goodness to
those who have an (irrational) fear of 64 bits thinking it is a major
porting effort.  Thus, *source-level* porting effort matters, but it is
completely subordinate to the major goal.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-30  1:48               ` Linus Torvalds
  2011-08-30  2:16                 ` Kyle Moffett
@ 2011-08-30  7:00                 ` Geert Uytterhoeven
  2011-09-20 18:37                   ` Jan Engelhardt
  1 sibling, 1 reply; 94+ messages in thread
From: Geert Uytterhoeven @ 2011-08-30  7:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, LKML, H.J. Lu, Ingo Molnar, Thomas Gleixner

On Tue, Aug 30, 2011 at 03:48, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Aug 29, 2011 at 12:01 PM, Geert Uytterhoeven
> <geert@linux-m68k.org> wrote:
>>
>> Which will break all this non-portable 32-bit-only source code x32 was invented
>> for in the first place?
>
> NO. HELL NO!
>
> Guys, get a f&*king grip already!
>
> There are absolutely ZERO compatibility issues.
>
> If you want compatibility, you run traditional 32-bit x86.

Sorry, this was mainly pointed to:

"Furthermore, there are users who seem more willing to port code known to
 not be 64-bit clean to x32 than to do a whole new port."

earlier in the thread.

> If you want full 64-bit, you run standard x86-64 binaries.

Sure.

Gr{oetje,eeting}s,

                        Geert

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-30  4:45                   ` H. Peter Anvin
@ 2011-08-30  7:06                     ` Geert Uytterhoeven
  2011-08-30 12:18                       ` Arnd Bergmann
  0 siblings, 1 reply; 94+ messages in thread
From: Geert Uytterhoeven @ 2011-08-30  7:06 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Kyle Moffett, Linus Torvalds, LKML, H.J. Lu, Ingo Molnar,
	Thomas Gleixner

On Tue, Aug 30, 2011 at 06:45, H. Peter Anvin <hpa@zytor.com> wrote:
> On 08/29/2011 07:16 PM, Kyle Moffett wrote:
>> The goal of x32 as I understand it is to allow 32-bit x86 programs to
>> use all the nifty extra registers and faster instructions (IE: syscall)
>> without needing to deal with the 2x memory overhead of 64-bit
>> pointers.

Good. IIRC, the PPC people were thinking about something similar back in 2007,
but it hasn't materialized yet.

> That is the major goal.  A minor goal is to bring x86-64 goodness to
> those who have an (irrational) fear of 64 bits thinking it is a major
> porting effort.  Thus, *source-level* porting effort matters, but it is
> completely subordinate to the major goal.

From my experience, if the source code is not 64-bit clean, it will probably
be a major hassle to make it cope with changed time_t and off_t and other traps
of that kind...

As a first step, gcc should start enabling -Wall by default ;-)
And add a -Wcast option, as grepping for (C-style) casts is way too difficult...

Gr{oetje,eeting}s,

                        Geert

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-30  2:16                 ` Kyle Moffett
  2011-08-30  4:45                   ` H. Peter Anvin
@ 2011-08-30  7:09                   ` Andi Kleen
  2011-08-30  9:56                     ` Alan Cox
  1 sibling, 1 reply; 94+ messages in thread
From: Andi Kleen @ 2011-08-30  7:09 UTC (permalink / raw)
  To: Kyle Moffett
  Cc: Linus Torvalds, Geert Uytterhoeven, H. Peter Anvin, LKML,
	H.J. Lu, Ingo Molnar, Thomas Gleixner

Kyle Moffett <kyle@moffetthome.net> writes:

Old wisdom, as wrong as always.

> The only programs which really care are those which map many gigs
> of stuff into memory (IE: big databases, etc). 

You mean anything that mmaps a file?

2-3GB is not a whole lot these days.

2GB is very similar to Linus' y2038 problem, just you're much more
likely to hit it.

IMHO the only excuse right now for 32bit is to use it in a JIT
that can dynamically expand the pointer. Or for old binaries.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-30  7:09                   ` Andi Kleen
@ 2011-08-30  9:56                     ` Alan Cox
  0 siblings, 0 replies; 94+ messages in thread
From: Alan Cox @ 2011-08-30  9:56 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Kyle Moffett, Linus Torvalds, Geert Uytterhoeven, H. Peter Anvin,
	LKML, H.J. Lu, Ingo Molnar, Thomas Gleixner

> > The only programs which really care are those which map many gigs
> > of stuff into memory (IE: big databases, etc). 
> 
> You mean anything that mmaps a file?
> 2-3GB is not a whole lot these days.

For most stuff it's a lot, and most apps that mmap files mmap the bits
they need. O_LARGEFILE and 64bit time_t do make sense though.

> 2GB is very similar to Linus' y2038 problem, just you're much more
> likely to hit it.

Also if using a 64bit time_t means less compat gunge it makes things much
easier. At that point 32bit time_t becomes a userspace/library thunking
problem.

> IMHO the only excuse right now for 32bit is to use it in a JIT
> that can dynamically expand the pointer. Or for old binaries.

Or for a lot of code which runs way faster in 32bit pointer mode. Less
memory, smaller cache footprint.

Alan

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-29 18:31             ` H. Peter Anvin
@ 2011-08-30 12:09               ` Arnd Bergmann
  2011-08-30 16:35                 ` H. Peter Anvin
  0 siblings, 1 reply; 94+ messages in thread
From: Arnd Bergmann @ 2011-08-30 12:09 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Christoph Hellwig, Linus Torvalds, LKML, H.J. Lu, Ingo Molnar,
	Thomas Gleixner

On Monday 29 August 2011, H. Peter Anvin wrote:
> On 08/29/2011 08:04 AM, Arnd Bergmann wrote:
> > 
> > Right. The asm-generic/unistd.h interface doesn't provide them either
> > for new architectures and expects libc to emulate them for any user
> > application whose developers can't be bothered to fix their code.
> > 
> > I think I've also commented in the past that I think x32 should use
> > the same set of syscalls as asm-generic, even if it's more convenient
> > to use a different ordering.
> > 
> 
> It definitely is not convenient to use asm-generic for a whole lot of
> reasons, which basically comes down to leveraging the existing x86-64
> system calls plus leveraging the i386-on-x86-64 compat layer as much as
> possible.

Yes, but that's not what I meant. My point was that any new architecture
should have only the set of 269 syscalls that asm-generic has, i.e.
none of the syscalls that have been replaced for any number of
reasons (large files, *at, uid32, pselect).

I do agree that you should keep using the x86 data structures unless
there is a good reason to do otherwise, and I agree that you should
keep using the syscall numbers for the calls that remain, but I would
just leave out from the ABI the calls that are no longer necessary.

> I talked to H.J. this morning and we're certainly dropping the 32-bit
> filesystem calls.  I'm going to audit which paths have both time_t
> (including struct timespec/timeval) and pointers; that is hopefully a
> matter of legwork.  This will mean introducing new ioctls, but it's not
> clear how many.
> 
> The end result is going to be bigger than the current patchset (which is
> +2197 -510, and most of which is just the system call tables themselves;
> the balance is only +690 -105), but it is definitely a better ABI.

Ok.

I'm wondering about the time_t changes: given that we are still adding
new 32 bit architectures, should we change the asm-generic API as well
to use 64 bit time_t by default (with fallbacks for the existing ones)?

If you are adding support for these in x32 already, we could use the
same code for regular 32 bit architectures.

	Arnd

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-30  7:06                     ` Geert Uytterhoeven
@ 2011-08-30 12:18                       ` Arnd Bergmann
  0 siblings, 0 replies; 94+ messages in thread
From: Arnd Bergmann @ 2011-08-30 12:18 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: H. Peter Anvin, Kyle Moffett, Linus Torvalds, LKML, H.J. Lu,
	Ingo Molnar, Thomas Gleixner

On Tuesday 30 August 2011, Geert Uytterhoeven wrote:
> On Tue, Aug 30, 2011 at 06:45, H. Peter Anvin <hpa@zytor.com> wrote:
> > On 08/29/2011 07:16 PM, Kyle Moffett wrote:
> >> The goal of x32 as I understand it is to allow 32-bit x86 programs to
> >> use all the nifty extra registers and faster instructions (IE: syscall)
> >> without needing to deal with the 2x memory overhead of 64-bit
> >> pointers.
> 
> Good. IIRC, the PPC people were thinking about something similar back in 2007,
> but it hasn't materialized yet.

I think every arch has thought about it at some point. Note that current powerpc
(and s390) enterprise systems are 64 bit only. s390 wants to add a new 32 bit
ABI along these lines (much simpler to do than on x86 though). MIPS has
had it for a long time (the n32 ABI).

For sh and tile, the 32-on-64 ABI still uses the 64 bit registers all the
time and is incompatible with the native 32 bit ABI.

> > That is the major goal.  A minor goal is to bring x86-64 goodness to
> > those who have an (irrational) fear of 64 bits thinking it is a major
> > porting effort.  Thus, *source-level* porting effort matters, but it is
> > completely subordinate to the major goal.
> 
> From my experience, if the source code is not 64-bit clean, it will probably
> be a major hassle to make it cope with changed time_t and off_t and other traps
> of that kind...

I don't think so. The only major user space projects that I've seen being
fundamentally 32-bit only are the ones that have some kind of language
interpreter that casts freely between pointers and 32 bit wide integers.

The uses of time_t and off_t are typically fairly localized in the programs
by comparison and can be fixed much easier, especially if you don't actually
want to use larger than 32 bit values for these.
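
For the interpreter case, a minimal sketch of the usual fix follows.  The
names value_t, box_ptr() and unbox_ptr() are hypothetical; the point is
only that uintptr_t holds a pointer on ILP32 and LP64 alike, whereas a
32-bit integer silently truncates it on LP64.

```c
#include <stdint.h>

/* Hypothetical boxed value of the kind an interpreter might use. */
typedef uintptr_t value_t;

/* Store a pointer in an integer wide enough to hold it. */
static value_t box_ptr(void *p)
{
    return (value_t)p;
}

/* Recover the original pointer from the boxed value. */
static void *unbox_ptr(value_t v)
{
    return (void *)v;
}
```

Code written this way builds unchanged for i386, x32 and x86-64.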

	Arnd

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-30 12:09               ` Arnd Bergmann
@ 2011-08-30 16:35                 ` H. Peter Anvin
  2011-08-31 16:14                   ` Arnd Bergmann
  0 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2011-08-30 16:35 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Christoph Hellwig, Linus Torvalds, LKML, H.J. Lu, Ingo Molnar,
	Thomas Gleixner

On 08/30/2011 05:09 AM, Arnd Bergmann wrote:
> 
> I'm wondering about the time_t changes: given that we are still adding
> new 32 bit architectures, should we change the asm-generic API as well
> to use 64 bit time_t by default (with fallbacks for the existing ones)?
> 
> If you are adding support for these in x32 already, we could use the
> same code for regular 32 bit architectures.
> 

It seems absolutely boggling insane that we're introducing new
architectures with no legacy whatsoever and use 32-bit time_t on those.

	-hpa

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-30 16:35                 ` H. Peter Anvin
@ 2011-08-31 16:14                   ` Arnd Bergmann
  2011-08-31 16:25                     ` H. Peter Anvin
                                       ` (2 more replies)
  0 siblings, 3 replies; 94+ messages in thread
From: Arnd Bergmann @ 2011-08-31 16:14 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Christoph Hellwig, Linus Torvalds, LKML, H.J. Lu, Ingo Molnar,
	Thomas Gleixner, Richard Kuo, Mark Salter, Jonas Bonn,
	Tobias Klauser

On Tuesday 30 August 2011, H. Peter Anvin wrote:
> On 08/30/2011 05:09 AM, Arnd Bergmann wrote:
> > 
> > I'm wondering about the time_t changes: given that we are still adding
> > new 32 bit architectures, should we change the asm-generic API as well
> > to use 64 bit time_t by default (with fallbacks for the existing ones)?
> > 
> > If you are adding support for these in x32 already, we could use the
> > same code for regular 32 bit architectures.
> > 
> 
> It seems absolutely boggling insane that we're introducing new
> architectures with no legacy whatsoever and use 32-bit time_t on those.

Ok, but I think we do need to consider the potential problems in this.
I would expect a number of things to break if we just define it to
'long long' on new architectures, including:

* pre-c99 C compilers or programs that rely on --std=c89
* padding in struct timespec when you have a long long tv_sec and
  32-bit long tv_nsec. This might cause kernel stack data leakage
  in some kernel interfaces when they don't clear the padding.
* random broken applications assuming that timespec/timeval has
  two 'long' members, instead of using the proper header files.

Obviously these are all fixable for any new ABI, but will cause
some annoyance.
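
The padding point can be sketched as follows.  struct ts64 is a made-up
layout for illustration, not a proposed kernel structure: a 64-bit
tv_sec followed by a 32-bit tv_nsec leaves tail padding on ABIs that
align 64-bit integers to 8 bytes, and those bytes must be cleared before
the object is copied anywhere.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: 64-bit seconds, 32-bit nanoseconds. */
struct ts64 {
    int64_t tv_sec;
    int32_t tv_nsec;
};

/* Bytes in the struct that belong to no member. */
static size_t ts64_padding(void)
{
    return sizeof(struct ts64) - (sizeof(int64_t) + sizeof(int32_t));
}

/* Zero the whole object first so the padding does not carry stale
 * stack contents, then fill in the members. */
static void ts64_fill(struct ts64 *out, int64_t sec, int32_t nsec)
{
    memset(out, 0, sizeof(*out));
    out->tv_sec = sec;
    out->tv_nsec = nsec;
}
```

On x86-64, ts64_padding() is typically 4.  Making both fields the same
width, as suggested later in the thread, removes the padding entirely.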

I've added a few people to Cc who are in various stages of the
process to finalize their upstream kernel ports. It's clearly
the right decision to have time_t 64-bit eventually, the question
is how much work is everyone willing to spend in the short run,
and who is going to test it. In particular, openrisc has just
been merged, so we should not be changing it any more unless
there is a serious problem, but if there is not much legacy user
space with the current ABI yet, it may still be worth switching
over.

	Arnd

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-31 16:14                   ` Arnd Bergmann
@ 2011-08-31 16:25                     ` H. Peter Anvin
  2011-08-31 16:39                       ` Arnd Bergmann
  2011-08-31 16:46                     ` Linus Torvalds
  2011-09-01  6:08                     ` Jonas Bonn
  2 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2011-08-31 16:25 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Christoph Hellwig, Linus Torvalds, LKML, H.J. Lu, Ingo Molnar,
	Thomas Gleixner, Richard Kuo, Mark Salter, Jonas Bonn,
	Tobias Klauser

On 08/31/2011 09:14 AM, Arnd Bergmann wrote:
> 
> Ok, but I think we do need to consider the potential problems in this.
> I would expect a number of things to break if we just define it to
> 'long long' on new architectures, including:
> 
> * pre-c99 C compilers or programs that rely on --std=c89

This is a very long time ago by now.  Pre-C99 compilers without the long
long extension probably don't exist for these new architectures;
applications are a little bit messier, but still.

> * padding in struct timespec when you have a long long tv_sec and
>   32-bit long tv_nsec. This might cause kernel stack data leakage
>   in some kernel interfaces when they don't clear the padding.

Don't do that then.  For what it's worth, I think we currently use the
same size for both fields.

> * random broken applications assuming that timespec/timeval has
>   two 'long' members, instead of using the proper header files.
> 
> Obviously these are all fixable for any new ABI, but will cause
> some annoyance.
> 
> I've added a few people to Cc who are in various stages of the
> process to finalize their upstream kernel ports. It's clearly
> the right decision to have time_t 64-bit eventually, the question
> is how much work is everyone willing to spend in the short run,
> and who is going to test it. In particular, openrisc has just
> been merged, so we should not be changing it any more unless
> there is a serious problem, but if there is not much legacy user
> space with the current ABI yet, it may still be worth switching
> over.

Either way, all of this applies to x32 even more, sadly.

The other thing we probably need to do is to set a date when we
redefine legacy 32-bit time_t to be unsigned.  A good time might be some
time around (time_t)0x60000000 = Thu Jan 14 08:25:36 UTC 2021 if not sooner.
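
The date arithmetic can be checked with a few lines of C.  This is only a
sketch: utc_string() is a hypothetical helper, and it assumes the host's
time_t is wide enough for the value passed in, as on 64-bit Linux.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: render seconds since the epoch as UTC text.
 * Returns buf on success, NULL on failure. */
static const char *utc_string(int64_t secs, char *buf, size_t len)
{
    time_t t = (time_t)secs;
    const struct tm *tm = gmtime(&t);   /* not thread-safe; fine for a demo */

    if (tm == NULL || strftime(buf, len, "%Y-%m-%d %H:%M:%S", tm) == 0)
        return NULL;
    return buf;
}
```

utc_string(0x60000000, ...) gives 2021-01-14 08:25:36, and the last
value a signed 32-bit time_t can hold, 0x7fffffff, gives
2038-01-19 03:14:07.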

	-hpa

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-31 16:25                     ` H. Peter Anvin
@ 2011-08-31 16:39                       ` Arnd Bergmann
  2011-08-31 16:48                         ` Linus Torvalds
  0 siblings, 1 reply; 94+ messages in thread
From: Arnd Bergmann @ 2011-08-31 16:39 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Christoph Hellwig, Linus Torvalds, LKML, H.J. Lu, Ingo Molnar,
	Thomas Gleixner, Richard Kuo, Mark Salter, Jonas Bonn,
	Tobias Klauser

On Wednesday 31 August 2011, H. Peter Anvin wrote:
> On 08/31/2011 09:14 AM, Arnd Bergmann wrote:

> > * padding in struct timespec when you have a long long tv_sec and
> >   32-bit long tv_nsec. This might cause kernel stack data leakage
> >   in some kernel interfaces when they don't clear the padding.
> 
> Don't do that then.  For what it's worth, I think we currently use the
> same size for both fields.

Ok, good point.

> > * random broken applications assuming that timespec/timeval has
> >   two 'long' members, instead of using the proper header files.
> > 
> > Obviously these are all fixable for any new ABI, but will cause
> > some annoyance.
> > 
> > I've added a few people to Cc who are in various stages of the
> > process to finalize their upstream kernel ports. It's clearly
> > the right decision to have time_t 64-bit eventually, the question
> > is how much work is everyone willing to spend in the short run,
> > and who is going to test it. In particular, openrisc has just
> > been merged, so we should not be changing it any more unless
> > there is a serious problem, but if there is not much legacy user
> > space with the current ABI yet, it may still be worth switching
> > over.
> 
> Either way, all of this applies to x32 even more, sadly.
> 
> The other thing that we probably need to do is to set a date when we
> redefine legacy 32-bit time_t to be unsigned.  A good time might be some
> time around (time_t)0x60000000 = Thu Jan 14 08:25:36 UTC 2021 if not sooner.

Well, we could chicken out and just use unsigned int for time_t on new
32 bit ABIs, which would buy us time until ~2106 before we need to
convert everything to 64 bit...

Do you see any side-effects of changing time_t to unsigned, besides
file dates outside of the 1970...2038 range? If we end up having to
introduce new syscalls because of some incompatibility, we could just
as well introduce the full set of syscalls using a new time64_t for
32 bit architectures, like we did for uid32_t and loff_t.
The only problem that I can see right now with changing over is that
the 32-on-64 bit emulation changes from sign-extend to zero-extend.

	Arnd

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-31 16:14                   ` Arnd Bergmann
  2011-08-31 16:25                     ` H. Peter Anvin
@ 2011-08-31 16:46                     ` Linus Torvalds
  2011-08-31 17:05                       ` H.J. Lu
  2011-08-31 17:09                       ` H. Peter Anvin
  2011-09-01  6:08                     ` Jonas Bonn
  2 siblings, 2 replies; 94+ messages in thread
From: Linus Torvalds @ 2011-08-31 16:46 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: H. Peter Anvin, Christoph Hellwig, LKML, H.J. Lu, Ingo Molnar,
	Thomas Gleixner, Richard Kuo, Mark Salter, Jonas Bonn,
	Tobias Klauser

On Wed, Aug 31, 2011 at 9:14 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>
> * padding in struct timespec when you have a long long tv_sec and
>  32-bit long tv_nsec. This might cause kernel stack data leakage
>  in some kernel interfaces when they don't clear the padding.

I suspect the only sane solution to this (having thought about it some
more) is to just say "POSIX is f*^&ing wrong".

I think everybody agrees that time_t *needs* to be 64-bit. That is
only getting more and more clear the closer we get to 2038. There may
be excuses for it for some random crappy 32-bit embedded platform that
nobody really expects to survive for many more years, but it's simply
not debatable for anything else.

And if time_t is 64-bit, then timespec and timeval practically need
to have a 64-bit tv_usec/tv_nsec because anything else causes problems
with packing etc. And that's doubly true in a 64-bit environment with
a 32-bit "sub-environment".

POSIX has been wrong before. Sometimes the solution really is to say
"sorry, you wrote that 20 years ago, and things have changed".

> * random broken applications assuming that timespec/timeval has
>  two 'long' members, instead of using the proper header files.

Those applications are already broken.

I just googled for these kinds of issues, and found this text: "A
timeval has two components, both ints".  Does that happen to work
often? Yes. Does it make it any more correct? Hell no. But people
really used to believe that, and it even used to be true. AND THEY GOT
FIXED.

If you assume two 'long' members, you're already incorrect, because
'time_t' is not at all guaranteed to be 'long'. And if you assume that
'tv_nsec' is "long", you may be correct wrt POSIX, but given the
realities I think it's still perfectly valid to say "you're a moron,
and we need to fix it".

Because paper is what we use to wipe after we've used the toilet. At
some point, "reality" just hits a hell of a lot harder than any paper
ever will.

I really think that "x32" should try to aim *VERY* hard at using the
64-bit system calls, and seeing itself as being a "32-bit application
in a 64-bit world".  That's not just true for time_t (which I think
should be 64-bit on anything new that expects to survive for any
amount of time), but in general.

I could well imagine, for example, that you might have x32
applications that wanted to access huge datasets, and then use special
"accessor" functions for that (think "HIGHMEM.SYS" except within the
application). That really says "think of it as a 64-bit process, but
with a short pointer mode for density" to me.

                         Linus

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-31 16:39                       ` Arnd Bergmann
@ 2011-08-31 16:48                         ` Linus Torvalds
  2011-08-31 19:18                           ` Arnd Bergmann
  0 siblings, 1 reply; 94+ messages in thread
From: Linus Torvalds @ 2011-08-31 16:48 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: H. Peter Anvin, Christoph Hellwig, LKML, H.J. Lu, Ingo Molnar,
	Thomas Gleixner, Richard Kuo, Mark Salter, Jonas Bonn,
	Tobias Klauser

On Wed, Aug 31, 2011 at 9:39 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>
> Well, we could chicken out and just use unsigned int for time_t on new
> 32 bit ABIs, which would buy us time until ~2106 before we need to
> convert everything to 64 bit...

You do realize that there are probably quite a lot of programs that
depend on signed time_t because they really do care about dates before
1970?

"unsigned time_t" is not going to really solve anything. It can be a
crutch for some cases, but seriously, the only solution is to just
admit that a 32-bit time_t just doesn't work.

                Linus

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-31 16:46                     ` Linus Torvalds
@ 2011-08-31 17:05                       ` H.J. Lu
  2011-09-03  2:56                         ` H.J. Lu
  2011-08-31 17:09                       ` H. Peter Anvin
  1 sibling, 1 reply; 94+ messages in thread
From: H.J. Lu @ 2011-08-31 17:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arnd Bergmann, H. Peter Anvin, Christoph Hellwig, LKML,
	Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On Wed, Aug 31, 2011 at 9:46 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I really think that "x32" should try to aim *VERY* hard at using the
> 64-bit system calls, and seeing itself as being a "32-bit application
> in a 64-bit world".  That's not just true for time_t (which I think
> should be 64-bit on anything new that expects to survive for any
> amount of time), but in general.
>

I have been making x32 use 64-bit system calls as much as possible.
Hopefully, I will get it to work in a week or 2.

-- 
H.J.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-31 16:46                     ` Linus Torvalds
  2011-08-31 17:05                       ` H.J. Lu
@ 2011-08-31 17:09                       ` H. Peter Anvin
  2011-08-31 17:19                         ` Linus Torvalds
  2011-09-01 13:30                         ` RFD: x32 ABI system call numbers Avi Kivity
  1 sibling, 2 replies; 94+ messages in thread
From: H. Peter Anvin @ 2011-08-31 17:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arnd Bergmann, Christoph Hellwig, LKML, H.J. Lu, Ingo Molnar,
	Thomas Gleixner, Richard Kuo, Mark Salter, Jonas Bonn,
	Tobias Klauser

On 08/31/2011 09:46 AM, Linus Torvalds wrote:
> On Wed, Aug 31, 2011 at 9:14 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>>
>> * padding in struct timespec when you have a long long tv_sec and
>>  32-bit long tv_nsec. This might cause kernel stack data leakage
>>  in some kernel interfaces when they don't clear the padding.
> 
> I suspect the only sane solution to this (having thought about it some
> more) is to just say "POSIX is f*^&ing wrong".
> 

Urk.  Someone had the bright idea of defining tv_nsec as "long" in the
standard, whereas tv_usec is suseconds_t.  F**** brilliant, and more
than a little bit stupid.

Logically one could work around it by having "struct timespec" contain a
padding member in the endian-appropriate place I guess, and make sure to
clear it in the kernel, but it's rather ugly.  It might have performance
advantages to doing it that way, though.

> I really think that "x32" should try to aim *VERY* hard at using the
> 64-bit system calls, and seeing itself as being a "32-bit application
> in a 64-bit world".  That's not just true for time_t (which I think
> should be 64-bit on anything new that expects to survive for any
> amount of time), but in general.

We're trying for it.  The thing we're trying to avoid is mucking (too
much) with the compat layer for the mega-multiplex system calls like
ioctl.  We can't just use the 64-bit ioctl because ioctl structures
generally contain pointers.

	-hpa


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-31 17:09                       ` H. Peter Anvin
@ 2011-08-31 17:19                         ` Linus Torvalds
  2011-08-31 17:38                           ` H. Peter Anvin
  2012-02-08 21:36                           ` 64-bit time on 32-bit systems H. Peter Anvin
  2011-09-01 13:30                         ` RFD: x32 ABI system call numbers Avi Kivity
  1 sibling, 2 replies; 94+ messages in thread
From: Linus Torvalds @ 2011-08-31 17:19 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Arnd Bergmann, Christoph Hellwig, LKML, H.J. Lu, Ingo Molnar,
	Thomas Gleixner, Richard Kuo, Mark Salter, Jonas Bonn,
	Tobias Klauser

On Wed, Aug 31, 2011 at 10:09 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> I suspect the only sane solution to this (having thought about it some
>> more) is to just say "POSIX is f*^&ing wrong".
>
> Urk.  Someone had the bright idea of defining tv_nsec as "long" in the
> standard, whereas tv_usec is suseconds_t.  F**** brilliant, and more
> than a little bit stupid.

I think tv_nsec was just overlooked, and people thought "it has no
legacy users that were 'int', so we'll just leave it at 'long', which
is guaranteed to be enough for nanoseconds that only needs a range of
32 bits".

In contrast, tv_usec probably *does* have legacy users that are "int".

So POSIX almost certainly only looked backwards, and never thought
about users who would need to make it "long long" for compatibility
reasons.

The fact that *every*other*related*field* in POSIX/SuS has a typedef
exactly for these kinds of reasons just shows how stupid that "long
tv_nsec" thing is.

I suspect that on Linux we can just say "tv_nsec" is suseconds_t too.
Then we can make time_t and suseconds_t just match, and be "__s64" on
all new platforms.

                           Linus

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-31 17:19                         ` Linus Torvalds
@ 2011-08-31 17:38                           ` H. Peter Anvin
  2011-09-01 11:35                             ` Arnd Bergmann
  2012-02-08 21:36                           ` 64-bit time on 32-bit systems H. Peter Anvin
  1 sibling, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2011-08-31 17:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arnd Bergmann, Christoph Hellwig, LKML, H.J. Lu, Ingo Molnar,
	Thomas Gleixner, Richard Kuo, Mark Salter, Jonas Bonn,
	Tobias Klauser

On 08/31/2011 10:19 AM, Linus Torvalds wrote:
> 
> I think tv_nsec was just overlooked, and people thought "it has no
> legacy users that were 'int', so we'll just leave it at 'long', which
> is guaranteed to be enough for nanoseconds that only needs a range of
> 32 bits".
> 
> In contrast, tv_usec probably *does* have legacy users that are "int".
> 
> So POSIX almost certainly only looked backwards, and never thought
> about users who would need to make it "long long" for compatibility
> reasons.
> 
> The fact that *every*other*related*field* in POSIX/SuS has a typedef
> exactly for these kinds of reasons just shows how stupid that "long
> tv_nsec" thing is.
> 
> I suspect that on Linux we can just say "tv_nsec" is suseconds_t too.
> Then we can make time_t and suseconds_t just match, and be "__s64" on
> all new platforms.
> 

Let me see if I can raise this with the POSIX committee.

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-31 16:48                         ` Linus Torvalds
@ 2011-08-31 19:18                           ` Arnd Bergmann
  2011-08-31 19:44                             ` H. Peter Anvin
  2011-08-31 19:49                             ` Geert Uytterhoeven
  0 siblings, 2 replies; 94+ messages in thread
From: Arnd Bergmann @ 2011-08-31 19:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Christoph Hellwig, LKML, H.J. Lu, Ingo Molnar,
	Thomas Gleixner, Richard Kuo, Mark Salter, Jonas Bonn,
	Tobias Klauser

On Wednesday 31 August 2011 09:48:35 Linus Torvalds wrote:
> On Wed, Aug 31, 2011 at 9:39 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > Well, we could chicken out and just use unsigned int for time_t on new
> > 32 bit ABIs, which would buy us time until ~2106 before we need to
> > convert everything to 64 bit...
> 
> You do realize that there are probably quite a lot of programs that
> depend on signed time_t because they really do care about dates before
> 1970?

Yes, it already occurred to me after I had written the above that we
really want it to be signed, especially to allow a meaningful conversion
at least one-way between 32 and 64 bit time_t values.

	Arnd

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-31 19:18                           ` Arnd Bergmann
@ 2011-08-31 19:44                             ` H. Peter Anvin
  2011-08-31 19:54                               ` Alan Cox
  2011-08-31 19:49                             ` Geert Uytterhoeven
  1 sibling, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2011-08-31 19:44 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Linus Torvalds, Christoph Hellwig, LKML, H.J. Lu, Ingo Molnar,
	Thomas Gleixner, Richard Kuo, Mark Salter, Jonas Bonn,
	Tobias Klauser

On 08/31/2011 12:18 PM, Arnd Bergmann wrote:
>>
>> You do realize that there are probably quite a lot of programs that
>> depend on signed time_t because they really do care about dates before
>> 1970?
> 
> Yes, it already occurred to me after I had written the above that we
> really want it to be signed, especially to allow a meaningful conversion
> at least one-way between 32 and 64 bit time_t values.
> 

The only reason I mentioned redefining 32-bit time_t as unsigned was for
*legacy ABIs*.

	-hpa



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-31 19:18                           ` Arnd Bergmann
  2011-08-31 19:44                             ` H. Peter Anvin
@ 2011-08-31 19:49                             ` Geert Uytterhoeven
  1 sibling, 0 replies; 94+ messages in thread
From: Geert Uytterhoeven @ 2011-08-31 19:49 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Linus Torvalds, H. Peter Anvin, Christoph Hellwig, LKML, H.J. Lu,
	Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On Wed, Aug 31, 2011 at 21:18, Arnd Bergmann <arnd@arndb.de> wrote:
> On Wednesday 31 August 2011 09:48:35 Linus Torvalds wrote:
>> On Wed, Aug 31, 2011 at 9:39 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>> >
>> > Well, we could chicken out and just use unsigned int for time_t on new
>> > 32 bit ABIs, which would buy us time until ~2106 before we need to
>> > convert everything to 64 bit...
>>
>> You do realize that there are probably quite a lot of programs that
>> depend on signed time_t because they really do care about dates before
>> 1970?
>
> Yes, it already occurred to me after I had written the above that we
> really want it to be signed, especially to allow a meaningful conversion
> at least one-way between 32 and 64 bit time_t values.

If you care about dates before 1970, you're using time_t not to store the
current time +/- some epsilon, for a "reasonably small epsilon", but to store
real dates. That was never a good idea. During the early days of UNIX, when
the 1970 base was chosen, lots of people born before 1902 were still alive...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-31 19:44                             ` H. Peter Anvin
@ 2011-08-31 19:54                               ` Alan Cox
  2011-08-31 20:02                                 ` H. Peter Anvin
  0 siblings, 1 reply; 94+ messages in thread
From: Alan Cox @ 2011-08-31 19:54 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Arnd Bergmann, Linus Torvalds, Christoph Hellwig, LKML, H.J. Lu,
	Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On Wed, 31 Aug 2011 12:44:04 -0700
"H. Peter Anvin" <hpa@zytor.com> wrote:

> On 08/31/2011 12:18 PM, Arnd Bergmann wrote:
> >>
> >> You do realize that there are probably quite a lot of programs that
> >> depend on signed time_t because they really do care about dates before
> >> 1970?
> > 
> > Yes, it already occurred to me after I had written the above that we
> > really want it to be signed, especially to allow a meaningful conversion
> > at least one-way between 32 and 64 bit time_t values.
> > 
> 
> The only reason I mentioned redefining 32-bit time_t as unsigned was for
> *legacy ABIs*.

But if you redefine it then it's not a legacy ABI any more - it's a new
ABI. Might as well just cause the pain. 64bit has already done much of
the cleaning up.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-31 19:54                               ` Alan Cox
@ 2011-08-31 20:02                                 ` H. Peter Anvin
  2011-08-31 20:55                                   ` Arnd Bergmann
  0 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2011-08-31 20:02 UTC (permalink / raw)
  To: Alan Cox
  Cc: Arnd Bergmann, Linus Torvalds, Christoph Hellwig, LKML, H.J. Lu,
	Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On 08/31/2011 12:54 PM, Alan Cox wrote:
>>
>> The only reason I mentioned redefining 32-bit time_t as unsigned was for
>> *legacy ABIs*.
> 
> But if you redefine it then it's not a legacy ABI any more - it's a new
> ABI.

Only sort-of-kind of.  It's like hacking around the Y2K problem by date
windowing; technically it is an ABI change but when it is going to
break anyway...

	-hpa

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-31 20:02                                 ` H. Peter Anvin
@ 2011-08-31 20:55                                   ` Arnd Bergmann
  2011-08-31 20:58                                     ` H. Peter Anvin
  0 siblings, 1 reply; 94+ messages in thread
From: Arnd Bergmann @ 2011-08-31 20:55 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Alan Cox, Linus Torvalds, Christoph Hellwig, LKML, H.J. Lu,
	Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On Wednesday 31 August 2011 13:02:15 H. Peter Anvin wrote:
> On 08/31/2011 12:54 PM, Alan Cox wrote:
> >>
> >> The only reason I mentioned redefining 32-bit time_t as unsigned was for
> >> *legacy ABIs*.
> > 
> > But if you redefine it then it's not a legacy ABI any more - it's a new
> > ABI.
> 
> Only sort-of-kind of.  It's like hacking around the Y2K problem by date
> windowing; technically it is an ABI change but when it is going to
> break anyway...

But isn't this mostly a glibc thing then? The definition of time_t that
is used by applications comes from bits/typesizes.h, not from the
kernel's linux/types.h. If we use 64 bit time_t values internally
in the kernel and truncate them to 32 bits on the user interface,
there is no visible difference between signed and unsigned values
for data passed from kernel to user when it's interpreted as
signed int anyway.

For the rarer case of user space passing a 32 bit time_t into the
kernel (e.g. utimensat), there is of course a difference.

	Arnd

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-31 20:55                                   ` Arnd Bergmann
@ 2011-08-31 20:58                                     ` H. Peter Anvin
  0 siblings, 0 replies; 94+ messages in thread
From: H. Peter Anvin @ 2011-08-31 20:58 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Alan Cox, Linus Torvalds, Christoph Hellwig, LKML, H.J. Lu,
	Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On 08/31/2011 01:55 PM, Arnd Bergmann wrote:
> 
> But isn't this mostly a glibc thing then? The definition of time_t that
> is used by applications comes from bits/typesizes.h, not from the
> kernel's linux/types.h. If we use 64 bit time_t values internally
> in the kernel and truncate them to 32 bits on the user interface,
> there is no visible difference between signed and unsigned values
> for data passed from kernel to user when it's interpreted as
> signed int anyway.
> 
> For the rarer case of user space passing a 32 bit time_t into the
> kernel (e.g. utimensat), there is of course a difference.
> 

Yes, exactly.  It should be done in a coordinated fashion.

	-hpa

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-31 16:14                   ` Arnd Bergmann
  2011-08-31 16:25                     ` H. Peter Anvin
  2011-08-31 16:46                     ` Linus Torvalds
@ 2011-09-01  6:08                     ` Jonas Bonn
  2 siblings, 0 replies; 94+ messages in thread
From: Jonas Bonn @ 2011-09-01  6:08 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: H. Peter Anvin, Christoph Hellwig, Linus Torvalds, LKML, H.J. Lu,
	Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Tobias Klauser


On Wed, 2011-08-31 at 18:14 +0200, Arnd Bergmann wrote:
> On Tuesday 30 August 2011, H. Peter Anvin wrote:
> > On 08/30/2011 05:09 AM, Arnd Bergmann wrote:
> > > 
> > > I'm wondering about the time_t changes: given that we are still adding
> > > new 32 bit architectures, should we change the asm-generic API as well
> > > to use 64 bit time_t by default (with fallbacks for the existing ones)?
> > > 
> > > If you are adding support for these in x32 already, we could use the
> > > same code for regular 32 bit architectures.
> > > 
> > 
> > It seems absolutely boggling insane that we're introducing new
> > architectures with no legacy whatsoever and use 32-bit time_t on those.
> 
> I've added a few people to Cc who are in various stages of the
> process to finalize their upstream kernel ports. It's clearly
> the right decision to have time_t 64-bit eventually, the question
> is how much work is everyone willing to spend in the short run,
> and who is going to test it. In particular, openrisc has just
> been merged, so we should not be changing it any more unless
> there is a serious problem, but if there is not much legacy user
> space with the current ABI yet, it may still be worth switching
> over.

As far as OpenRISC is concerned, this change can still be made now.  I
know who the users of this platform are and, considering the rest of the
libc churn that comes with dropping the legacy syscalls, I can guarantee
that nobody's going to complain.  OpenRISC may be merged but 3.1's not
released yet so there's still a bit of wiggle room to get this done.

/Jonas


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-31 17:38                           ` H. Peter Anvin
@ 2011-09-01 11:35                             ` Arnd Bergmann
  2011-10-01 19:38                               ` Jonas Bonn
  0 siblings, 1 reply; 94+ messages in thread
From: Arnd Bergmann @ 2011-09-01 11:35 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Christoph Hellwig, LKML, H.J. Lu, Ingo Molnar,
	Thomas Gleixner, Richard Kuo, Mark Salter, Jonas Bonn,
	Tobias Klauser

On Wednesday 31 August 2011, H. Peter Anvin wrote:
> On 08/31/2011 10:19 AM, Linus Torvalds wrote:
> > 
> > I think tv_nsec was just overlooked, and people thought "it has no
> > legacy users that were 'int', so we'll just leave it at 'long', which
> > is guaranteed to be enough for nanoseconds that only needs a range of
> > 32 bits".
> > 
> > In contrast, tv_usec probably does have legacy users that are "int".
> > 
> > So POSIX almost certainly only looked backwards, and never thought
> > about users who would need to make it "long long" for compatibility
> > reasons.
> > 
> > The fact that *every*other*related*field* in POSIX/SuS has a typedef
> > exactly for these kinds of reasons just shows how stupid that "long
> > tv_nsec" thing is.
> > 
> > I suspect that on Linux we can just say "tv_nsec" is suseconds_t too.
> > Then we can make time_t and suseconds_t just match, and be "__s64" on
> > all new platforms.
> > 
> 
> Let me see if I can raise this with the POSIX committee.

Shall we go ahead with this patch for 3.1 in the meantime? This is the
least invasive way I can see to let OpenRISC use 64 bit time_t in the
released kernel.

The worst thing that can happen is that we will have to change it again
if this patch breaks something on OpenRISC, but if we don't do it now,
then we have one more architecture stuck with 32 bit time_t or we will
have to break its ABI.

I'm not completely convinced about the type we should use for tv_nsec
and tv_usec. The main worry I have is that common implementations
of timeval_add() or similar will require an expensive 64 bit division
on 32 bit systems, which they would not need with a 32 bit suseconds_t.
Should we use explicit padding instead in that case?

Interestingly, I noticed that parisc always uses a 32 bit suseconds_t,
even for its 64 bit ABI (which is not used all that much), so it has
implicit padding.

8<----
OpenRISC: change time_t and suseconds_t to 64 bit

time_t should really be 64 bit wide for all new ABIs including 32 bit
architectures, to allow having timestamps beyond 2038. For now, we
leave the default in asm-generic/posix-types to 32 bit wide, but the
plan is to change that in the next merge window so we reduce the
risk of breaking other architectures in the process.

In order to allow struct timespec/timeval to be free of padding, we
need suseconds_t to be the same size, and change the second member
of struct timespec to also be suseconds_t instead of long.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild
index 11162e6..77bcc02 100644
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -38,7 +38,6 @@ generic-y += msgbuf.h
 generic-y += pci.h
 generic-y += percpu.h
 generic-y += poll.h
-generic-y += posix_types.h
 generic-y += resource.h
 generic-y += rmap.h
 generic-y += scatterlist.h
diff --git a/arch/openrisc/include/asm/posix_types.h b/arch/openrisc/include/asm/posix_types.h
new file mode 100644
index 0000000..f0b2944
--- /dev/null
+++ b/arch/openrisc/include/asm/posix_types.h
@@ -0,0 +1,12 @@
+#ifndef __OPENRISC_POSIX_TIME_T
+#define __OPENRISC_POSIX_TIME_T
+
+typedef long long __kernel_suseconds_t;
+#define __kernel_suseconds_t __kernel_suseconds_t
+
+typedef long long __kernel_time_t;
+#define __kernel_time_t __kernel_time_t
+
+#include <asm-generic/posix_types.h>
+
+#endif
diff --git a/include/asm-generic/posix_types.h b/include/asm-generic/posix_types.h
index 3dab008..0c53135 100644
--- a/include/asm-generic/posix_types.h
+++ b/include/asm-generic/posix_types.h
@@ -39,6 +39,10 @@ typedef unsigned int	__kernel_gid_t;
 typedef long		__kernel_suseconds_t;
 #endif
 
+#ifndef __kernel_time_t
+typedef long		__kernel_time_t;
+#endif
+
 #ifndef __kernel_daddr_t
 typedef int		__kernel_daddr_t;
 #endif
@@ -78,7 +82,6 @@ typedef long		__kernel_ptrdiff_t;
  */
 typedef long		__kernel_off_t;
 typedef long long	__kernel_loff_t;
-typedef long		__kernel_time_t;
 typedef long		__kernel_clock_t;
 typedef int		__kernel_timer_t;
 typedef int		__kernel_clockid_t;
diff --git a/include/linux/time.h b/include/linux/time.h
index b306178..207c0aa 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -12,8 +12,8 @@
 #ifndef _STRUCT_TIMESPEC
 #define _STRUCT_TIMESPEC
 struct timespec {
-	__kernel_time_t	tv_sec;			/* seconds */
-	long		tv_nsec;		/* nanoseconds */
+	__kernel_time_t		tv_sec;		/* seconds */
+	__kernel_suseconds_t	tv_nsec;	/* nanoseconds */
 };
 #endif
 

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-31 17:09                       ` H. Peter Anvin
  2011-08-31 17:19                         ` Linus Torvalds
@ 2011-09-01 13:30                         ` Avi Kivity
  2011-09-01 14:13                           ` H. Peter Anvin
  1 sibling, 1 reply; 94+ messages in thread
From: Avi Kivity @ 2011-09-01 13:30 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, LKML, H.J. Lu,
	Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On 08/31/2011 08:09 PM, H. Peter Anvin wrote:
> >  I really think that "x32" should try to aim *VERY* hard at using the
> >  64-bit system calls, and seeing itself as being a "32-bit application
> >  in a 64-bit world".  That's not just true for time_t (which I think
> >  should be 64-bit on anything new that expects to survive for any
> >  amount of time), but in general.
>
> We're trying for it.  The thing we're trying to avoid is mucking (too
> much) with the compat layer for the mega-multiplex system calls like
> ioctl.  We can't just use the 64-bit ioctl because ioctl structures
> generally contain pointers.
>

     struct iovec
     {
         void __user *iov_base;    /* BSD uses caddr_t (1003.1g requires void *) */
         __kernel_size_t iov_len; /* Must be size_t (1003.1g) */
     } __attribute__((x32_abi_64));

     typedef long time_t __attribute__((x32_abi_64));

The x32_abi_64 attribute converts pointers and longs back to 64-bit and 
adjusts the alignment accordingly.  If we tag all userspace visible 
structures with this attribute, we can use the 64-bit ABI without changes.

Issues:
   &my_iovec->iov_base yields something that is not a void ** (reads of
      a 64-bit pointer decay to a 32-bit pointer, writes zero extend)
   printf formats will break
   if someone embeds an iovec in a structure, it will occupy more space
      than expected
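
As an illustration of the layout effect described above (not part of the
proposal itself): the sketch below spells out by hand what a struct tagged
with the proposed x32_abi_64 attribute would presumably look like, using
fixed-width fields so the layout is the 64-bit one whatever the native
pointer size.  The names iovec64 and iovec64_fill are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hand-written equivalent of an iovec tagged x32_abi_64:
 * both members are 64 bits wide regardless of the native pointer size. */
struct iovec64 {
	uint64_t iov_base;	/* user pointer, zero-extended to 64 bits */
	uint64_t iov_len;	/* length, widened to the 64-bit size_t */
};

/* Fill in an entry from native-width values; both conversions zero-extend. */
static inline struct iovec64 iovec64_fill(const void *base, size_t len)
{
	struct iovec64 v;

	v.iov_base = (uint64_t)(uintptr_t)base;
	v.iov_len = (uint64_t)len;
	return v;
}
```

Doing this by hand avoids the compiler extension, at the cost of explicit
conversions wherever the fields are read or written.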


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-01 13:30                         ` RFD: x32 ABI system call numbers Avi Kivity
@ 2011-09-01 14:13                           ` H. Peter Anvin
  2011-09-02  0:49                             ` Pedro Alves
  0 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2011-09-01 14:13 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, LKML, H.J. Lu,
	Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On 09/01/2011 06:30 AM, Avi Kivity wrote:
> On 08/31/2011 08:09 PM, H. Peter Anvin wrote:
>>>  I really think that "x32" should try to aim *VERY* hard at using the
>>>  64-bit system calls, and seeing itself as being a "32-bit application
>>>  in a 64-bit world".  That's not just true for time_t (which I think
>>>  should be 64-bit on anything new that expects to survive for any
>>>  amount of time), but in general.
>>
>> We're trying for it.  The things we're trying to avoid is to muck (too
>> much) with the compat layer for the mega-multiplex system calls like
>> ioctl.  We can't just use the 64-bit ioctl because ioctl structures
>> generally contain pointers.
>>
> 
>      struct iovec
>      {
>          void __user *iov_base;    /* BSD uses caddr_t (1003.1g requires 
> void *) */
>          __kernel_size_t iov_len; /* Must be size_t (1003.1g) */
>      } __attribute__((x32_abi_64));
> 
>      typedef long time_t __attribute__((x32_abi_64));
> 
> The x32_abi_64 attribute converts pointers and longs back to 64-bit and 
> adjusts the alignment accordingly.  If we tag all userspace visible 
> structures with this attribute, we can use the 64-bit ABI without changes.
> 
> Issues:
> &my_iovec->iov_base yields something that is not a void ** (reads of a 
> 64-bit pointer decay to a 32-bit pointer, writes zero extend).
>    printf formats will break
>    if someone embeds an iovec in a structure, it will occupy more space 
> than expected
> 

Yes, the idea of a compiler extension was floated... however, we'd
prefer not to go there if at all possible.

	-hpa

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-01 14:13                           ` H. Peter Anvin
@ 2011-09-02  0:49                             ` Pedro Alves
  2011-09-02  1:51                               ` H. Peter Anvin
  0 siblings, 1 reply; 94+ messages in thread
From: Pedro Alves @ 2011-09-02  0:49 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Avi Kivity, Linus Torvalds, Arnd Bergmann, Christoph Hellwig,
	LKML, H.J. Lu, Ingo Molnar, Thomas Gleixner, Richard Kuo,
	Mark Salter, Jonas Bonn, Tobias Klauser

On Thursday 01 September 2011 15:13:30, H. Peter Anvin wrote:
> On 09/01/2011 06:30 AM, Avi Kivity wrote:
> > On 08/31/2011 08:09 PM, H. Peter Anvin wrote:
> >>>  I really think that "x32" should try to aim *VERY* hard at using the
> >>>  64-bit system calls, and seeing itself as being a "32-bit application
> >>>  in a 64-bit world".  That's not just true for time_t (which I think
> >>>  should be 64-bit on anything new that expects to survive for any
> >>>  amount of time), but in general.
> >>
> >> We're trying for it.  The things we're trying to avoid is to muck (too
> >> much) with the compat layer for the mega-multiplex system calls like
> >> ioctl.  We can't just use the 64-bit ioctl because ioctl structures
> >> generally contain pointers.
> >>
> > 
> >      struct iovec
> >      {
> >          void __user *iov_base;    /* BSD uses caddr_t (1003.1g requires 
> > void *) */
> >          __kernel_size_t iov_len; /* Must be size_t (1003.1g) */
> >      } __attribute__((x32_abi_64));
> > 
> >      typedef long time_t __attribute__((x32_abi_64));
> > 
> > The x32_abi_64 attribute converts pointers and longs back to 64-bit and 
> > adjusts the alignment accordingly.  If we tag all userspace visible 
> > structures with this attribute, we can use the 64-bit ABI without changes.

I would expect no new gcc extension to be needed for that -- there's the
mode attribute (you can read DI as 64-bit):

 typedef void * __kernel_ptr64 __attribute ((mode(DI)));

 struct iovec
 {
   __kernel_ptr64 iov_base;
   ...
 };
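
A minimal sketch of how the mode attribute names widths.  Whether mode(DI)
can be applied to a pointer depends on the compiler back end, so this
sketch applies the attribute to integer types only, which GCC accepts on
common targets.  The typedef and function names are hypothetical.

```c
/* GCC "mode" attribute: SI names a 32-bit integer mode, DI a 64-bit one. */
typedef int si_int __attribute__((mode(SI)));
typedef int di_int __attribute__((mode(DI)));

/* Width of each type in bits, assuming 8-bit bytes. */
static inline unsigned int si_bits(void)
{
	return (unsigned int)sizeof(si_int) * 8;
}

static inline unsigned int di_bits(void)
{
	return (unsigned int)sizeof(di_int) * 8;
}
```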

-- 
Pedro Alves

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-02  0:49                             ` Pedro Alves
@ 2011-09-02  1:51                               ` H. Peter Anvin
  2011-09-02  8:02                                 ` Arnd Bergmann
  2011-09-02  8:42                                 ` Pedro Alves
  0 siblings, 2 replies; 94+ messages in thread
From: H. Peter Anvin @ 2011-09-02  1:51 UTC (permalink / raw)
  To: Pedro Alves
  Cc: Avi Kivity, Linus Torvalds, Arnd Bergmann, Christoph Hellwig,
	LKML, H.J. Lu, Ingo Molnar, Thomas Gleixner, Richard Kuo,
	Mark Salter, Jonas Bonn, Tobias Klauser

On 09/01/2011 05:49 PM, Pedro Alves wrote:
>>>
>>>       struct iovec
>>>       {
>>>           void __user *iov_base;    /* BSD uses caddr_t (1003.1g requires
>>> void *) */
>>>           __kernel_size_t iov_len; /* Must be size_t (1003.1g) */
>>>       } __attribute__((x32_abi_64));
>>>
>>>       typedef long time_t __attribute__((x32_abi_64));
>>>
>>> The x32_abi_64 attribute converts pointers and longs back to 64-bit and
>>> adjusts the alignment accordingly.  If we tag all userspace visible
>>> structures with this attribute, we can use the 64-bit ABI without changes.
>
> I would expect no new gcc extension to be needed for that -- there's the
> mode attribute (you can read DI as 64-bit):
>
>   typedef void * __kernel_ptr64 __attribute ((mode(DI)));
>
>   struct iovec
>   {
>     __kernel_ptr64 iov_base;
>     ...
>   };
>

Does that work for *writing*, too?  That might be a very useful little 
escape hatch for some particularly tight corners.

	-hpa

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-26 23:39   ` H. Peter Anvin
  2011-08-27  0:36     ` Linus Torvalds
@ 2011-09-02  6:17     ` Andy Lutomirski
  1 sibling, 0 replies; 94+ messages in thread
From: Andy Lutomirski @ 2011-09-02  6:17 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, LKML, H.J. Lu, Ingo Molnar, Thomas Gleixner

On 08/26/2011 07:39 PM, H. Peter Anvin wrote:
> On 08/26/2011 04:13 PM, Linus Torvalds wrote:
>>>
>>> The extra bit would be masked off and only affect device drivers like
>>> input which relies on is_compat().
>>
>> So a couple of questions:
>>
>>   - why do we need another system call model at all?
>
> We think we can get more performance for a process which doesn't need
> more than 4 GiB of virtual address space by allowing them to keep
> pointers 4 bytes long, while still giving them the advantage of 16
> 64-bit registers, PC-relative addressing, and so on.  Furthermore, there
> are users who seem more willing to port code known to not be 64-bit
> clean to x32 than to do a whole new port.
>
> If the question is "why not just thunk this in userspace", the answer is
> that we'd like to take advantage of the compat layer already in the kernel.
>
> If the question is "why not just use int $0x80" we actually did that in
> early prototyping, but SYSCALL64 is much faster.

This may be a dumb question, but:

Why not just set some high bit of rax to enable compat syscalls to be 
issued with the 64-bit SYSCALL instruction.  This could be done with 
zero overhead for normal 64-bit code (you can just adjust the existing 
system-call-number-too-high path) and the total kernel patch should be 
just a handful of lines.  Does x32 need any more kernel support than 
that?  You'll confuse strace on new binaries, but that shouldn't be a 
big deal.
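
A rough sketch of the decoding step this suggestion implies, written as
plain C rather than entry-path assembly.  The flag value and all names
here are assumptions made for illustration; the thread does not fix them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit marking a compat-style call; the value is assumed. */
#define SYSCALL_COMPAT_FLAG	(UINT64_C(1) << 30)

struct syscall_decode {
	bool compat;	/* true if the caller asked for the compat table */
	uint64_t nr;	/* table index with the flag masked off */
};

/* Split the raw value from rax into "which table" and "which entry". */
static inline struct syscall_decode decode_syscall_nr(uint64_t rax)
{
	struct syscall_decode d;

	d.compat = (rax & SYSCALL_COMPAT_FLAG) != 0;
	d.nr = rax & ~SYSCALL_COMPAT_FLAG;
	return d;
}
```

Native 64-bit numbers never have the flag set, so the extra test can live
on the existing out-of-range path, which is the zero-overhead property
described above.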

Also, why do ioctls and userspace structs need any translation at all? 
This is a new ABI -- why not just teach x32 code to stick zeros in its 
structs in the appropriate places to use the 64-bit layout.

--Andy

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-02  1:51                               ` H. Peter Anvin
@ 2011-09-02  8:02                                 ` Arnd Bergmann
  2011-09-02  8:42                                 ` Pedro Alves
  1 sibling, 0 replies; 94+ messages in thread
From: Arnd Bergmann @ 2011-09-02  8:02 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Pedro Alves, Avi Kivity, Linus Torvalds, Christoph Hellwig, LKML,
	H.J. Lu, Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On Thursday 01 September 2011 18:51:35 H. Peter Anvin wrote:
> On 09/01/2011 05:49 PM, Pedro Alves wrote:
> >>>
> >>>       struct iovec
> >>>       {
> >>>           void __user *iov_base;    /* BSD uses caddr_t (1003.1g requires
> >>> void *) */
> >>>           __kernel_size_t iov_len; /* Must be size_t (1003.1g) */
> >>>       } __attribute__((x32_abi_64));
> >>>
> >>>       typedef long time_t __attribute__((x32_abi_64));
> >>>
> >>> The x32_abi_64 attribute converts pointers and longs back to 64-bit and
> >>> adjusts the alignment accordingly.  If we tag all userspace visible
> >>> structures with this attribute, we can use the 64-bit ABI without changes.
> >
> > I would expect no new gcc extension to be needed for that -- there's the
> > mode attribute (you can read DI as 64-bit):
> >
> >   typedef void * __kernel_ptr64 __attribute ((mode(DI)));
> >
> >   struct iovec
> >   {
> >     __kernel_ptr64 iov_base;
> >     ...
> >   };
> >
> 
> Does that work for *writing*, too?  That might be a very useful little 
> escape hatch for some particularly tight corners.

I've tried to use that extension in other contexts without much success,
mostly I believe because gcc back-end support for it needs to be there
but wasn't at the time I tried. If the x32 back-end does this correctly,
you win.

A different gcc extension that might turn out to be useful here is the
named address space extension that lets you annotate a pointer to
be different from other pointers. On the SPU architecture we use this
for the distinction between local 18 bit pointers and 64-bit pointers
into the user process address space.

	Arnd

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-02  1:51                               ` H. Peter Anvin
  2011-09-02  8:02                                 ` Arnd Bergmann
@ 2011-09-02  8:42                                 ` Pedro Alves
  1 sibling, 0 replies; 94+ messages in thread
From: Pedro Alves @ 2011-09-02  8:42 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Avi Kivity, Linus Torvalds, Arnd Bergmann, Christoph Hellwig,
	LKML, H.J. Lu, Ingo Molnar, Thomas Gleixner, Richard Kuo,
	Mark Salter, Jonas Bonn, Tobias Klauser

On Friday 02 September 2011 02:51:35, H. Peter Anvin wrote:
> On 09/01/2011 05:49 PM, Pedro Alves wrote:
> >>>
> >>>       struct iovec
> >>>       {
> >>>           void __user *iov_base;    /* BSD uses caddr_t (1003.1g requires
> >>> void *) */
> >>>           __kernel_size_t iov_len; /* Must be size_t (1003.1g) */
> >>>       } __attribute__((x32_abi_64));
> >>>
> >>>       typedef long time_t __attribute__((x32_abi_64));
> >>>
> >>> The x32_abi_64 attribute converts pointers and longs back to 64-bit and
> >>> adjusts the alignment accordingly.  If we tag all userspace visible
> >>> structures with this attribute, we can use the 64-bit ABI without changes.
> >
> > I would expect no new gcc extension to be needed for that -- there's the
> > mode attribute (you can read DI as 64-bit):
> >
> >   typedef void * __kernel_ptr64 __attribute ((mode(DI)));
> >
> >   struct iovec
> >   {
> >     __kernel_ptr64 iov_base;
> >     ...
> >   };
> >
> 
> Does that work for *writing*, too?  That might be a very useful little 
> escape hatch for some particularly tight corners.

With a gcc trunk from earlier this month (that supports -mx32):

typedef void * ptr64 __attribute ((mode(DI)));

struct foo
{
  ptr64 p64;
};

int
foofunc (void)
{
  struct foo foo;
  int i;
  void *p32;
  ptr64 p64;

  p32 = &i;
  p32 = &foofunc;

  p64 = p32;
  p64 = &i;
  p64 = foofunc;

  foo.p64 = p32;

  void **ptrp64 = &p64;  /* gives "warning: initialization from incompatible pointer type [enabled by default]". */

  return (int) (void*) ptrp64;
}

With:

./cc1 ~/mode.c -o mode.o -mx32 -fverbose-asm

gives me:

/home/pedro/mode.c: In function ‘foofunc’:
/home/pedro/mode.c:25:19: warning: initialization from incompatible pointer type [enabled by default]

and produced (note, -O0):

foofunc:
.LFB0:
        .cfi_startproc
        pushq   %rbp    #
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp      #,
        .cfi_def_cfa_register 6
        leaq    -20(%rbp), %rax #, tmp61
        movl    %eax, -4(%rbp)  # tmp61, p32
        movl    $foofunc, -4(%rbp)      #, p32
        mov     -4(%rbp), %eax  # p32,
        movq    %rax, -32(%rbp) #, p64
        leaq    -20(%rbp), %rax #, tmp62
        movq    %rax, -32(%rbp) # tmp62, p64
        movq    $foofunc, -32(%rbp)     #, p64
        mov     -4(%rbp), %eax  # p32, tmp63
        movq    %rax, -16(%rbp) # tmp63, foo.p64
        leaq    -32(%rbp), %rax #, tmp64
        movl    %eax, -8(%rbp)  # tmp64, ptrp64
        movl    -8(%rbp), %eax  # ptrp64, D.2693
        popq    %rbp    #
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc

-- 
Pedro Alves

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-31 17:05                       ` H.J. Lu
@ 2011-09-03  2:56                         ` H.J. Lu
  2011-09-03  3:04                           ` Linus Torvalds
  0 siblings, 1 reply; 94+ messages in thread
From: H.J. Lu @ 2011-09-03  2:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arnd Bergmann, H. Peter Anvin, Christoph Hellwig, LKML,
	Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On Wed, Aug 31, 2011 at 10:05 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Aug 31, 2011 at 9:46 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> I really think that "x32" should try to aim *VERY* hard at using the
>> 64-bit system calls, and seeing itself as being a "32-bit application
>> in a 64-bit world".  That's not just true for time_t (which I think
>> should be 64-bit on anything new that expects to survive for any
>> amount of time), but in general.
>>
>
> I have been making x32 to use 64bit system calls as much as possible.
> Hopefully, I will get it to work in a week or 2.
>

I have an x32 kernel which uses 64bit time_t/timespec/timeval and file
system interface:

1. Use 64bit system calls as much as we can.
2. There are 32 compat system calls for x32 to support structure parameters
with pointers and longs.
3. Use x86-64 vDSO relocatable files directly to generate x32 vDSO.

We need to make a decision on some system calls which take a pointer
to a structure containing a long.  For example, msg calls take a pointer to

          struct msgbuf {
               long mtype;       /* message type, must be > 0 */
               char mtext[1];    /* message data */
           };

We have 3 choices:

1. Use long long for x32 and use 64bit msg system calls.
2. Keep long for x32 and use compat system call.
3. Add a 4byte padding for x32 and sign-extend from mtype to 4byte
padding before passing to 64bit msg system call.

I implemented option 1.  I can also implement option 2 or 3.
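
To make the three choices concrete, here is a sketch of the userspace
layouts each one implies.  The struct and function names are hypothetical,
and option 3 assumes a little-endian machine so that the padding is the
high half of the 64-bit value.

```c
#include <stddef.h>
#include <stdint.h>

/* Option 1: widen mtype so the x32 struct matches the 64-bit layout. */
struct msgbuf_opt1 {
	int64_t mtype;
	char mtext[1];
};

/* Option 2: keep a 32-bit mtype; the kernel has to take the compat path. */
struct msgbuf_opt2 {
	int32_t mtype;
	char mtext[1];
};

/* Option 3: 32-bit mtype plus 4 bytes of padding holding the sign
 * extension, so the kernel can read the pair as one 64-bit value. */
struct msgbuf_opt3 {
	int32_t mtype;
	int32_t mtype_hi;
	char mtext[1];
};

/* Store the type and sign-extend it into the padding word. */
static inline void msgbuf_opt3_set_type(struct msgbuf_opt3 *m, int32_t type)
{
	m->mtype = type;
	m->mtype_hi = type < 0 ? -1 : 0;
}
```

Options 1 and 3 put mtext at the same offset as the 64-bit ABI does;
option 2 does not, which is why it needs the compat call.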

Thanks.


-- 
H.J.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-03  2:56                         ` H.J. Lu
@ 2011-09-03  3:04                           ` Linus Torvalds
  2011-09-03  4:02                             ` H.J. Lu
  0 siblings, 1 reply; 94+ messages in thread
From: Linus Torvalds @ 2011-09-03  3:04 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Arnd Bergmann, H. Peter Anvin, Christoph Hellwig, LKML,
	Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On Fri, Sep 2, 2011 at 7:56 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
> We need to make decision on some system calls which take a pointer
> to structure with long.  For example, msg calls take a pointer to
>
>          struct msgbuf {
>               long mtype;       /* message type, must be > 0 */
>               char mtext[1];    /* message data */
>           };
>
> We have 3 choices:
>
> 1. Use long long for x32 and use 64bit msg system calls.

I think this sounds like the best option. As many plain 64-bit system
calls as humanly possible.

                Linus

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-03  3:04                           ` Linus Torvalds
@ 2011-09-03  4:02                             ` H.J. Lu
  2011-09-03  4:29                               ` H. Peter Anvin
  0 siblings, 1 reply; 94+ messages in thread
From: H.J. Lu @ 2011-09-03  4:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arnd Bergmann, H. Peter Anvin, Christoph Hellwig, LKML,
	Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On Fri, Sep 2, 2011 at 8:04 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Sep 2, 2011 at 7:56 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> We need to make decision on some system calls which take a pointer
>> to structure with long.  For example, msg calls take a pointer to
>>
>>          struct msgbuf {
>>               long mtype;       /* message type, must be > 0 */
>>               char mtext[1];    /* message data */
>>           };
>>
>> We have 3 choices:
>>
>> 1. Use long long for x32 and use 64bit msg system calls.
>
> I think this sounds like the best option. As many plain 64-bit system
> calls as humanly possible.
>

I defined __SNATIVE_LONG_TYPE and  __UNATIVE_LONG_TYPE
in x32 header, which are native signed and unsigned long types. I
used them in

bits/ipc.h:    __UNATIVE_LONG_TYPE __unused1;
bits/ipc.h:    __UNATIVE_LONG_TYPE __unused2;
bits/mqueue.h:  __SNATIVE_LONG_TYPE mq_flags;		/* Message queue flags.  */
bits/mqueue.h:  __SNATIVE_LONG_TYPE mq_maxmsg;	/* Maximum number of messages.  */
bits/mqueue.h:  __SNATIVE_LONG_TYPE mq_msgsize;	/* Maximum message size.  */
bits/mqueue.h:  __SNATIVE_LONG_TYPE mq_curmsgs;	/* Number of messages currently queued.  */
bits/mqueue.h:  __SNATIVE_LONG_TYPE __pad[4];
bits/msq.h:typedef __UNATIVE_LONG_TYPE msgqnum_t;
bits/msq.h:typedef __UNATIVE_LONG_TYPE msglen_t;
bits/msq.h:  __UNATIVE_LONG_TYPE __msg_cbytes; /* current number of bytes on queue */
bits/msq.h:  __UNATIVE_LONG_TYPE __unused4;
bits/msq.h:  __UNATIVE_LONG_TYPE __unused5;
bits/sem.h:  __UNATIVE_LONG_TYPE sem_nsems;	/* number of semaphores in set */
bits/sem.h:  __UNATIVE_LONG_TYPE __unused3;
bits/sem.h:  __UNATIVE_LONG_TYPE __unused4;
bits/shm.h:typedef __UNATIVE_LONG_TYPE shmatt_t;
bits/shm.h:    __UNATIVE_LONG_TYPE __unused4;
bits/shm.h:    __UNATIVE_LONG_TYPE __unused5;
bits/shm.h:    __UNATIVE_LONG_TYPE shmmax;
bits/shm.h:    __UNATIVE_LONG_TYPE shmmin;
bits/shm.h:    __UNATIVE_LONG_TYPE shmmni;
bits/shm.h:    __UNATIVE_LONG_TYPE shmseg;
bits/shm.h:    __UNATIVE_LONG_TYPE shmall;
bits/shm.h:    __UNATIVE_LONG_TYPE __unused1;
bits/shm.h:    __UNATIVE_LONG_TYPE __unused2;
bits/shm.h:    __UNATIVE_LONG_TYPE __unused3;
bits/shm.h:    __UNATIVE_LONG_TYPE __unused4;
bits/shm.h:    __UNATIVE_LONG_TYPE shm_tot;	/* total allocated shm */
bits/shm.h:    __UNATIVE_LONG_TYPE shm_rss;	/* total resident shm */
bits/shm.h:    __UNATIVE_LONG_TYPE shm_swp;	/* total swapped shm */
bits/shm.h:    __UNATIVE_LONG_TYPE swap_attempts;
bits/shm.h:    __UNATIVE_LONG_TYPE swap_successes;
bits/statfs.h:    __SNATIVE_LONG_TYPE f_type;
bits/statfs.h:    __SNATIVE_LONG_TYPE f_bsize;
bits/statfs.h:    __SNATIVE_LONG_TYPE f_namelen;
bits/statfs.h:    __SNATIVE_LONG_TYPE f_frsize;
bits/statfs.h:    __SNATIVE_LONG_TYPE f_flags;
bits/statfs.h:    __SNATIVE_LONG_TYPE f_spare[4];
bits/statfs.h:    __SNATIVE_LONG_TYPE f_type;
bits/statfs.h:    __SNATIVE_LONG_TYPE f_bsize;
bits/statfs.h:    __SNATIVE_LONG_TYPE f_namelen;
bits/statfs.h:    __SNATIVE_LONG_TYPE f_frsize;
bits/statfs.h:    __SNATIVE_LONG_TYPE f_flags;
bits/statfs.h:    __SNATIVE_LONG_TYPE f_spare[4];
bits/stat.h:    __UNATIVE_LONG_TYPE st_atimensec;	/* Nscecs of last access.  */
bits/stat.h:    __UNATIVE_LONG_TYPE st_mtimensec;	/* Nsecs of last modification.  */
bits/stat.h:    __UNATIVE_LONG_TYPE st_ctimensec;	/* Nsecs of last status change.  */
bits/stat.h:    __UNATIVE_LONG_TYPE st_atimensec;	/* Nscecs of last access.  */
bits/stat.h:    __UNATIVE_LONG_TYPE st_mtimensec;	/* Nsecs of last modification.  */
bits/stat.h:    __UNATIVE_LONG_TYPE st_ctimensec;	/* Nsecs of last status change.  */
bits/timex.h:  __SNATIVE_LONG_TYPE offset;/* time offset (usec) */
bits/timex.h:  __SNATIVE_LONG_TYPE freq;/* frequency offset (scaled ppm) */
bits/timex.h:  __SNATIVE_LONG_TYPE maxerror;/* maximum error (usec) */
bits/timex.h:  __SNATIVE_LONG_TYPE esterror;/* estimated error (usec) */
bits/timex.h:  __SNATIVE_LONG_TYPE constant;/* pll time constant */
bits/timex.h:  __SNATIVE_LONG_TYPE precision;/* clock precision (usec) (read only) */
bits/timex.h:  __SNATIVE_LONG_TYPE tolerance;/* clock frequency tolerance (ppm) (read only) */
bits/timex.h:  __SNATIVE_LONG_TYPE tick;/* (modified) usecs between clock ticks */
bits/timex.h:  __SNATIVE_LONG_TYPE ppsfreq;/* pps frequency (scaled ppm) (ro) */
bits/timex.h:  __SNATIVE_LONG_TYPE jitter;/* pps jitter (us) (ro) */
bits/timex.h:  __SNATIVE_LONG_TYPE stabil;/* pps stability (scaled ppm) (ro) */
bits/timex.h:  __SNATIVE_LONG_TYPE jitcnt;/* jitter limit exceeded (ro) */
bits/timex.h:  __SNATIVE_LONG_TYPE calcnt;/* calibration intervals (ro) */
bits/timex.h:  __SNATIVE_LONG_TYPE errcnt;/* calibration errors (ro) */
bits/timex.h:  __SNATIVE_LONG_TYPE stbcnt;/* stability limit exceeded (ro) */
sys/msg.h:    __SNATIVE_LONG_TYPE mtype;	/* type of received/sent message */

and

/* POSIX.1b structure for a time value.  This is like a `struct timeval' but
   has nanoseconds instead of microseconds.  */
struct timespec
  {
    __time_t tv_sec;            /* Seconds.  */
    __SNATIVE_LONG_TYPE tv_nsec;/* Nanoseconds.  */
  };

so that I can use 64bit system calls directly.


-- 
H.J.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-03  4:02                             ` H.J. Lu
@ 2011-09-03  4:29                               ` H. Peter Anvin
  2011-09-03  4:44                                 ` H.J. Lu
  0 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2011-09-03  4:29 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, LKML,
	Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On 09/02/2011 09:02 PM, H.J. Lu wrote:
>
> I defined __SNATIVE_LONG_TYPE and  __UNATIVE_LONG_TYPE
> in x32 header, which are native signed and unsigned long types. I
> used them in
>

What is the definition of these macros?

	-hpa

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-03  4:29                               ` H. Peter Anvin
@ 2011-09-03  4:44                                 ` H.J. Lu
  2011-09-03  5:16                                   ` H. Peter Anvin
  2011-09-03  5:29                                   ` H. Peter Anvin
  0 siblings, 2 replies; 94+ messages in thread
From: H.J. Lu @ 2011-09-03  4:44 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, LKML,
	Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On Fri, Sep 2, 2011 at 9:29 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 09/02/2011 09:02 PM, H.J. Lu wrote:
>>
>> I defined __SNATIVE_LONG_TYPE and  __UNATIVE_LONG_TYPE
>> in x32 header, which are native signed and unsigned long types. I
>> used them in
>>
>
> What is the definition of these macros?
>

I have

#if defined __x86_64__ && __WORDSIZE == 32
#define __INO_T_TYPE		__UQUAD_TYPE
#define __NLINK_T_TYPE		__UQUAD_TYPE
#define __OFF_T_TYPE		__SQUAD_TYPE
#define __RLIM_T_TYPE		__UQUAD_TYPE
#define	__BLKCNT_T_TYPE		__SQUAD_TYPE
#define	__FSFILCNT_T_TYPE	__UQUAD_TYPE
#define	__FSBLKCNT_T_TYPE	__UQUAD_TYPE
#define __TIME_T_TYPE		__SQUAD_TYPE
#define __BLKSIZE_T_TYPE	__SQUAD_TYPE
#define __SUSECONDS_T_TYPE	__SQUAD_TYPE
#define __SNATIVE_LONG_TYPE	__SQUAD_TYPE
#define __UNATIVE_LONG_TYPE	__UQUAD_TYPE
#else
#define __INO_T_TYPE		__ULONGWORD_TYPE
#define __NLINK_T_TYPE		__UWORD_TYPE
#define __OFF_T_TYPE		__SLONGWORD_TYPE
#define __RLIM_T_TYPE		__ULONGWORD_TYPE
#define	__BLKCNT_T_TYPE		__SLONGWORD_TYPE
#define	__FSFILCNT_T_TYPE	__ULONGWORD_TYPE
#define	__FSBLKCNT_T_TYPE	__ULONGWORD_TYPE
#define __TIME_T_TYPE		__SLONGWORD_TYPE
#define __BLKSIZE_T_TYPE	__SLONGWORD_TYPE
#define __SUSECONDS_T_TYPE	__SLONGWORD_TYPE
#define __SNATIVE_LONG_TYPE	__SLONGWORD_TYPE
#define __UNATIVE_LONG_TYPE	__ULONGWORD_TYPE
#endif
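
For readers unfamiliar with the pattern, a reduced sketch of the same
selection follows.  It uses the __ILP32__ macro that GCC defines for x32
builds instead of __WORDSIZE, and the typedef names are hypothetical
stand-ins for __SNATIVE_LONG_TYPE and __UNATIVE_LONG_TYPE.

```c
#include <stdint.h>

/* Pick a type as wide as the kernel's long.  On x32 the C "long" is only
 * 32 bits, so a 64-bit type is chosen explicitly; elsewhere plain long
 * already matches the kernel. */
#if defined(__x86_64__) && defined(__ILP32__)
typedef int64_t snative_long_t;
typedef uint64_t unative_long_t;
#else
typedef long snative_long_t;
typedef unsigned long unative_long_t;
#endif

/* Returns 1 when the chosen type is at least as wide as the C "long". */
static inline int native_long_is_wide_enough(void)
{
	return sizeof(snative_long_t) >= sizeof(long);
}
```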

There is one problem with:

int sysinfo(struct sysinfo *info);

struct sysinfo comes from <linux/kernel.h>:

struct sysinfo {
	long uptime;			/* Seconds since boot */
	unsigned long loads[3];		/* 1, 5, and 15 minute load averages */
	unsigned long totalram;		/* Total usable main memory size */
	unsigned long freeram;		/* Available memory size */
	unsigned long sharedram;	/* Amount of shared memory */
	unsigned long bufferram;	/* Memory used by buffers */
	unsigned long totalswap;	/* Total swap space size */
	unsigned long freeswap;		/* swap space still available */
	unsigned short procs;		/* Number of current processes */
	unsigned short pad;		/* explicit padding for m68k */
	unsigned long totalhigh;	/* Total high memory size */
	unsigned long freehigh;		/* Available high memory size */
	unsigned int mem_unit;		/* Memory unit size in bytes */
	char _f[20-2*sizeof(long)-sizeof(int)];	/* Padding: libc5 uses this.. */
};

I couldn't find a clean way to use long long for x32, so I wound up calling
compat_sys_sysinfo.  Also linux/aio_abi.h has

typedef unsigned long	aio_context_t;

I had to use compat_sys_io_setup for

int io_setup(unsigned nr_events, aio_context_t *ctxp);

Is there a way to support something similar to __SNATIVE_LONG_TYPE
and __UNATIVE_LONG_TYPE in kernel header files?
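
One detail of the sysinfo case above that any such mechanism would have
to handle is the trailing _f[] padding, whose size depends on
sizeof(long).  The helper below is a hypothetical sketch that just
evaluates that expression for a given long width.

```c
#include <stddef.h>

/* Evaluate 20 - 2*sizeof(long) - sizeof(int), the size of the _f[]
 * padding in struct sysinfo, for a given width of long in bytes. */
static inline long sysinfo_pad_bytes(size_t long_size)
{
	return 20 - 2 * (long)long_size - (long)sizeof(int);
}
```

With a 4-byte int this gives 8 bytes of padding when long is 4 bytes and
none when long is 8 bytes.  Swapping the long fields for 64-bit types in
x32 userspace would therefore not be enough on its own to reproduce the
64-bit layout, because the padding expression would still see the 4-byte
long.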


-- 
H.J.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-03  4:44                                 ` H.J. Lu
@ 2011-09-03  5:16                                   ` H. Peter Anvin
  2011-09-03 14:11                                     ` H.J. Lu
  2011-09-03  5:29                                   ` H. Peter Anvin
  1 sibling, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2011-09-03  5:16 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, LKML,
	Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On 09/02/2011 09:44 PM, H.J. Lu wrote:
>>
>> What is the definition of these macros?
>>
> 
> I have
> 
> #if defined __x86_64__ && __WORDSIZE == 32
> #define __INO_T_TYPE		__UQUAD_TYPE
> #define __NLINK_T_TYPE		__UQUAD_TYPE
> #define __OFF_T_TYPE		__SQUAD_TYPE
> #define __RLIM_T_TYPE		__UQUAD_TYPE
> #define	__BLKCNT_T_TYPE		__SQUAD_TYPE
> #define	__FSFILCNT_T_TYPE	__UQUAD_TYPE
> #define	__FSBLKCNT_T_TYPE	__UQUAD_TYPE
> #define __TIME_T_TYPE		__SQUAD_TYPE
> #define __BLKSIZE_T_TYPE	__SQUAD_TYPE
> #define __SUSECONDS_T_TYPE	__SQUAD_TYPE
> #define __SNATIVE_LONG_TYPE	__SQUAD_TYPE
> #define __UNATIVE_LONG_TYPE	__UQUAD_TYPE
> #else
> #define __INO_T_TYPE		__ULONGWORD_TYPE
> #define __NLINK_T_TYPE		__UWORD_TYPE
> #define __OFF_T_TYPE		__SLONGWORD_TYPE
> #define __RLIM_T_TYPE		__ULONGWORD_TYPE
> #define	__BLKCNT_T_TYPE		__SLONGWORD_TYPE
> #define	__FSFILCNT_T_TYPE	__ULONGWORD_TYPE
> #define	__FSBLKCNT_T_TYPE	__ULONGWORD_TYPE
> #define __TIME_T_TYPE		__SLONGWORD_TYPE
> #define __BLKSIZE_T_TYPE	__SLONGWORD_TYPE
> #define __SUSECONDS_T_TYPE	__SLONGWORD_TYPE
> #define __SNATIVE_LONG_TYPE	__SLONGWORD_TYPE
> #define __UNATIVE_LONG_TYPE	__ULONGWORD_TYPE
> #endif
> 

I don't understand the types on the right side...
	
	-hpa

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-03  4:44                                 ` H.J. Lu
  2011-09-03  5:16                                   ` H. Peter Anvin
@ 2011-09-03  5:29                                   ` H. Peter Anvin
  2011-09-03  8:41                                     ` Arnd Bergmann
  2011-09-03 14:15                                     ` H.J. Lu
  1 sibling, 2 replies; 94+ messages in thread
From: H. Peter Anvin @ 2011-09-03  5:29 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, LKML,
	Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On 09/02/2011 09:44 PM, H.J. Lu wrote:
> 
> Is there a way to support something similar to  __SNATIVE_LONG_TYPE
> and  __UNATIVE_LONG_TYPE for kernel header files.
> 

Again, what is the definition you're looking for?  We have __u64 and
__s64, but the question is what alignment those types should be using,
given that we're presumably stuck with using compat_ioctl for ioctl...

	-hpa



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-03  5:29                                   ` H. Peter Anvin
@ 2011-09-03  8:41                                     ` Arnd Bergmann
  2011-09-03 14:04                                       ` Valdis.Kletnieks
  2011-09-03 14:15                                     ` H.J. Lu
  1 sibling, 1 reply; 94+ messages in thread
From: Arnd Bergmann @ 2011-09-03  8:41 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: H.J. Lu, Linus Torvalds, Christoph Hellwig, LKML, Ingo Molnar,
	Thomas Gleixner, Richard Kuo, Mark Salter, Jonas Bonn,
	Tobias Klauser

On Friday 02 September 2011 22:29:38 H. Peter Anvin wrote:
> On 09/02/2011 09:44 PM, H.J. Lu wrote:
> > 
> > Is there a way to support something similar to  __SNATIVE_LONG_TYPE
> > and  __UNATIVE_LONG_TYPE for kernel header files.
> > 
> 
> Again, what is the definition you're looking for?  We have __u64 and
> __s64, but the question is what alignment those types should be using,
> given that we're presumably stuck with using compat_ioctl for ioctl...

I think the above types have to use full 64 bit alignment, otherwise
you are incompatible with any native data structures that have padding
in the 64-bit ABI.

For the ioctl interface however, the __u64/__s64 type in the x32 ABI
must be defined with __attribute__((packed,aligned(4))) to match what
the kernel implements because it emulates the x86-32 ABI.

This also means we need to audit all ioctl definitions that use
some other type like 'unsigned long long' or 'uint64_t' and change
those to use the proper '__u64'.
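
A small sketch of the alignment difference in question, assuming GCC
semantics where an aligned attribute on a typedef may lower the
alignment.  Only aligned(4) is used here, without packed, and the type
and struct names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit integer whose alignment is lowered to 4 bytes, which is how the
 * i386 ABI places it inside structures. */
typedef uint64_t __attribute__((aligned(4))) u64_a4;

/* Hypothetical ioctl argument laid out the i386 way. */
struct ioctl_arg_i386 {
	uint32_t flags;
	u64_a4 offset;		/* placed at byte 4, no padding */
};

/* The same fields with the natural alignment of the build target. */
struct ioctl_arg_native {
	uint32_t flags;
	uint64_t offset;	/* at byte 8 on x86-64, after 4 bytes of padding */
};
```

If a header spells the field as unsigned long long or uint64_t instead of
a typedef like this, the two layouts can silently disagree, which is the
reason for the audit mentioned above.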

	Arnd

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-03  8:41                                     ` Arnd Bergmann
@ 2011-09-03 14:04                                       ` Valdis.Kletnieks
  2011-09-03 16:40                                         ` H. Peter Anvin
  0 siblings, 1 reply; 94+ messages in thread
From: Valdis.Kletnieks @ 2011-09-03 14:04 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: H. Peter Anvin, H.J. Lu, Linus Torvalds, Christoph Hellwig, LKML,
	Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On Sat, 03 Sep 2011 10:41:16 +0200, Arnd Bergmann said:

(Admittedly, I'm tuning in late on this discussion, but...)

> For the ioctl interface however, the __u64/__s64 type in the x32 ABI
> must be defined with __attribute__((packed,aligned(4))) to match what
> the kernel implements because it emulates the x86-32 ABI.

Is this a cast-in-stone issue, or is it still not too late to change that?
And if we change that, can we simplify anything?

* Re: RFD: x32 ABI system call numbers
  2011-09-03  5:16                                   ` H. Peter Anvin
@ 2011-09-03 14:11                                     ` H.J. Lu
  0 siblings, 0 replies; 94+ messages in thread
From: H.J. Lu @ 2011-09-03 14:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, LKML,
	Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On Fri, Sep 2, 2011 at 10:16 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 09/02/2011 09:44 PM, H.J. Lu wrote:
>>>
>>> What is the definition of these macros?
>>>
>>
>> I have
>>
>> #if defined __x86_64__ && __WORDSIZE == 32
>> #define __INO_T_TYPE          __UQUAD_TYPE
>> #define __NLINK_T_TYPE                __UQUAD_TYPE
>> #define __OFF_T_TYPE          __SQUAD_TYPE
>> #define __RLIM_T_TYPE         __UQUAD_TYPE
>> #define       __BLKCNT_T_TYPE         __SQUAD_TYPE
>> #define       __FSFILCNT_T_TYPE       __UQUAD_TYPE
>> #define       __FSBLKCNT_T_TYPE       __UQUAD_TYPE
>> #define __TIME_T_TYPE         __SQUAD_TYPE
>> #define __BLKSIZE_T_TYPE      __SQUAD_TYPE
>> #define __SUSECONDS_T_TYPE    __SQUAD_TYPE
>> #define __SNATIVE_LONG_TYPE   __SQUAD_TYPE
>> #define __UNATIVE_LONG_TYPE   __UQUAD_TYPE
>> #else
>> #define __INO_T_TYPE          __ULONGWORD_TYPE
>> #define __NLINK_T_TYPE                __UWORD_TYPE
>> #define __OFF_T_TYPE          __SLONGWORD_TYPE
>> #define __RLIM_T_TYPE         __ULONGWORD_TYPE
>> #define       __BLKCNT_T_TYPE         __SLONGWORD_TYPE
>> #define       __FSFILCNT_T_TYPE       __ULONGWORD_TYPE
>> #define       __FSBLKCNT_T_TYPE       __ULONGWORD_TYPE
>> #define __TIME_T_TYPE         __SLONGWORD_TYPE
>> #define __BLKSIZE_T_TYPE      __SLONGWORD_TYPE
>> #define __SUSECONDS_T_TYPE    __SLONGWORD_TYPE
>> #define __SNATIVE_LONG_TYPE   __SLONGWORD_TYPE
>> #define __UNATIVE_LONG_TYPE   __ULONGWORD_TYPE
>> #endif
>>
>
> I don't understand the types on the right side...
>

LONGWORD is long and QUAD is a 64-bit int.


-- 
H.J.

* Re: RFD: x32 ABI system call numbers
  2011-09-03  5:29                                   ` H. Peter Anvin
  2011-09-03  8:41                                     ` Arnd Bergmann
@ 2011-09-03 14:15                                     ` H.J. Lu
  1 sibling, 0 replies; 94+ messages in thread
From: H.J. Lu @ 2011-09-03 14:15 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Arnd Bergmann, Christoph Hellwig, LKML,
	Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On Fri, Sep 2, 2011 at 10:29 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 09/02/2011 09:44 PM, H.J. Lu wrote:
>>
>> Is there a way to support something similar to __SNATIVE_LONG_TYPE
>> and __UNATIVE_LONG_TYPE for kernel header files?
>>
>
> Again, what is the definition you're looking for?  We have __u64 and
> __s64, but the question is what alignment those types should be using,
> given that we're presumably stuck with using compat_ioctl for ioctl...
>

I want a type for an integer register with natural alignment, which is:

1. long with 4-byte alignment for ia32.
2. long with 8-byte alignment for x86-64.
3. long long with 8-byte alignment for x32.
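
As a rough sketch of the type described above: this assumes the compiler
defines __ILP32__ together with __x86_64__ when targeting x32 (an
assumption for this example; other snippets in this thread test
__WORDSIZE or a hypothetical __x32__ instead), and the typedef names are
made up.

```c
#include <assert.h>

#if defined(__x86_64__) && defined(__ILP32__)
/* x32: integer registers are 64 bits wide but long is only 32 bits. */
typedef long long          demo_reg_long_t;
typedef unsigned long long demo_reg_ulong_t;
#else
/* ia32 and x86-64: long already matches the integer register width. */
typedef long          demo_reg_long_t;
typedef unsigned long demo_reg_ulong_t;
#endif

static_assert(sizeof(demo_reg_long_t) == sizeof(demo_reg_ulong_t),
	      "signed and unsigned variants must have the same size");

/* Width in bits of the register-sized type on the current target. */
static unsigned int demo_reg_bits(void)
{
	return (unsigned int)(sizeof(demo_reg_long_t) * 8);
}
```

The natural alignment then follows from the underlying type, so no explicit
alignment attribute is needed in this sketch.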

-- 
H.J.

* Re: RFD: x32 ABI system call numbers
  2011-09-03 14:04                                       ` Valdis.Kletnieks
@ 2011-09-03 16:40                                         ` H. Peter Anvin
  2011-09-03 17:16                                           ` Valdis.Kletnieks
  0 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2011-09-03 16:40 UTC (permalink / raw)
  To: Valdis.Kletnieks, Arnd Bergmann
  Cc: H.J. Lu, Linus Torvalds, Christoph Hellwig, LKML, Ingo Molnar,
	Thomas Gleixner, Richard Kuo, Mark Salter, Jonas Bonn,
	Tobias Klauser

Valdis.Kletnieks@vt.edu wrote:

>On Sat, 03 Sep 2011 10:41:16 +0200, Arnd Bergmann said:
>
>(Admittedly, I'm tuning in late on this discussion, but...)
>
>> For the ioctl interface however, the __u64/__s64 type in the x32 ABI
>> must be defined with __attribute__((packed,aligned(4))) to match what
>> the kernel implements because it emulates the x86-32 ABI.
>
>Is this a cast-in-stone issue, or is it still not too late to change
>that?
>And if we change that, can we simplify anything?

The complexity of changing that would be enormous.
-- 
Sent from my mobile phone. Please excuse my brevity and lack of formatting.

* Re: RFD: x32 ABI system call numbers
  2011-09-03 16:40                                         ` H. Peter Anvin
@ 2011-09-03 17:16                                           ` Valdis.Kletnieks
  2011-09-03 17:22                                             ` H.J. Lu
  2011-09-03 17:27                                             ` H. Peter Anvin
  0 siblings, 2 replies; 94+ messages in thread
From: Valdis.Kletnieks @ 2011-09-03 17:16 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Arnd Bergmann, H.J. Lu, Linus Torvalds, Christoph Hellwig, LKML,
	Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On Sat, 03 Sep 2011 09:40:55 PDT, "H. Peter Anvin" said:
> Valdis.Kletnieks@vt.edu wrote:
> 
> >On Sat, 03 Sep 2011 10:41:16 +0200, Arnd Bergmann said:
> >
> >(Admittedly, I'm tuning in late on this discussion, but...)
> >
> >> For the ioctl interface however, the __u64/__s64 type in the x32 ABI
> >> must be defined with __attribute__((packed,aligned(4))) to match what
> >> the kernel implements because it emulates the x86-32 ABI.
> >
> >Is this a cast-in-stone issue, or is it still not too late to change that?
> >And if we change that, can we simplify anything?
> 
> The complexity of changing that would be enormous.

Oh, I know changing the x86-32 ABI is impossible - I meant changing the
decision to emulate that ABI (as opposed to emulating the x86-64 ABI, or a
variant thereof, or something else).  Or are we already committed to that
route, even if we're still trying to figure out what syscalls to include?




* Re: RFD: x32 ABI system call numbers
  2011-09-03 17:16                                           ` Valdis.Kletnieks
@ 2011-09-03 17:22                                             ` H.J. Lu
  2011-09-03 17:28                                               ` H. Peter Anvin
  2011-09-03 17:27                                             ` H. Peter Anvin
  1 sibling, 1 reply; 94+ messages in thread
From: H.J. Lu @ 2011-09-03 17:22 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: H. Peter Anvin, Arnd Bergmann, Linus Torvalds, Christoph Hellwig,
	LKML, Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On Sat, Sep 3, 2011 at 10:16 AM,  <Valdis.Kletnieks@vt.edu> wrote:
> On Sat, 03 Sep 2011 09:40:55 PDT, "H. Peter Anvin" said:
>> Valdis.Kletnieks@vt.edu wrote:
>>
>> >On Sat, 03 Sep 2011 10:41:16 +0200, Arnd Bergmann said:
>> >
>> >(Admittedly, I'm tuning in late on this discussion, but...)
>> >
>> >> For the ioctl interface however, the __u64/__s64 type in the x32 ABI
>> >> must be defined with __attribute__((packed,aligned(4))) to match what
>> >> the kernel implements because it emulates the x86-32 ABI.
>> >
>> >Is this a cast-in-stone issue, or is it still not too late to change that?
>> >And if we change that, can we simplify anything?
>>
>> The complexity of changing that would be enormous.
>
> Oh, I know changing the x86-32 ABI is impossible - I meant changing the
> decision to emulate that ABI (as opposed to emulating the x86-64 ABI, or a
> variant thereof, or something else).  Or are we already committed to that
> route, even if we're still trying to figure out what syscalls to include?

We can't use 64bit ioctl for x32 if indirect pointers are ever passed
to ioctl.


-- 
H.J.

* Re: RFD: x32 ABI system call numbers
  2011-09-03 17:16                                           ` Valdis.Kletnieks
  2011-09-03 17:22                                             ` H.J. Lu
@ 2011-09-03 17:27                                             ` H. Peter Anvin
  2011-09-04 13:51                                               ` Valdis.Kletnieks
  2011-09-04 15:17                                               ` Arnd Bergmann
  1 sibling, 2 replies; 94+ messages in thread
From: H. Peter Anvin @ 2011-09-03 17:27 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Arnd Bergmann, H.J. Lu, Linus Torvalds, Christoph Hellwig, LKML,
	Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On 09/03/2011 10:16 AM, Valdis.Kletnieks@vt.edu wrote:
>>
>> The complexity of changing that would be enormous.
>
> Oh, I know changing the x86-32 ABI is impossible - I meant changing the
> decision to emulate that ABI (as opposed to emulating the x86-64 ABI, or a
> variant thereof, or something else).  Or are we already committed to that
> route, even if we're still trying to figure out what syscalls to include?
>

About ioctl in particular, the ABI has dependencies into almost every 
single driver in the Linux kernel.  It is hard-coded in the kernel that 
there are two paths -- native and compat.  Since pointers are going to 
be 4 bytes, it means we have to use the compat path.

We may be able to cheat a little bit since we encode the argument sizes 
in the ioctl numbers; this solves the case of PPGETTIME/PPSETTIME for 
example (in fact, this ioctl looks currently broken in compat mode!) 
However, at some point the sheer number of data types that can be 
consumed by ioctl is a real concern, so changing the ones we really care 
about -- like timespec/timeval -- while leaving the rest intact so we 
can use the compat path as a general rule would be highly useful.
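
The remark about argument sizes being encoded in the ioctl number can be
illustrated from user space with the _IOR and _IOC_SIZE macros from
<linux/ioctl.h>. The two structs, the 'x' type letter, and the command
number below are made up for the sketch; they are not the ppdev
definitions.

```c
#include <linux/ioctl.h>
#include <stdint.h>

/* Hypothetical time structs with 32-bit and 64-bit fields. */
struct demo_timeval32 {
	int32_t tv_sec;
	int32_t tv_usec;
};

struct demo_timeval64 {
	int64_t tv_sec;
	int64_t tv_usec;
};

/* Same type letter and number; only the argument size differs. */
#define DEMO_GETTIME32 _IOR('x', 0x01, struct demo_timeval32)
#define DEMO_GETTIME64 _IOR('x', 0x01, struct demo_timeval64)

/* A handler could tell the two callers apart from the command alone. */
static int demo_is_64bit_time(unsigned int cmd)
{
	return _IOC_SIZE(cmd) == sizeof(struct demo_timeval64);
}
```

Because the size field is part of the command value, the two macros expand
to different numbers, which is what lets a driver distinguish the layouts
without any extra state.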

	-hpa


* Re: RFD: x32 ABI system call numbers
  2011-09-03 17:22                                             ` H.J. Lu
@ 2011-09-03 17:28                                               ` H. Peter Anvin
  0 siblings, 0 replies; 94+ messages in thread
From: H. Peter Anvin @ 2011-09-03 17:28 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Valdis.Kletnieks, Arnd Bergmann, Linus Torvalds,
	Christoph Hellwig, LKML, Ingo Molnar, Thomas Gleixner,
	Richard Kuo, Mark Salter, Jonas Bonn, Tobias Klauser

On 09/03/2011 10:22 AM, H.J. Lu wrote:
>
> We can't use 64bit ioctl for x32 if indirect pointers are ever passed
> to ioctl.
>

Which they are, in a fairly large number of ioctls.

	-hpa

* Re: RFD: x32 ABI system call numbers
  2011-09-03 17:27                                             ` H. Peter Anvin
@ 2011-09-04 13:51                                               ` Valdis.Kletnieks
  2011-09-04 15:17                                               ` Arnd Bergmann
  1 sibling, 0 replies; 94+ messages in thread
From: Valdis.Kletnieks @ 2011-09-04 13:51 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Arnd Bergmann, H.J. Lu, Linus Torvalds, Christoph Hellwig, LKML,
	Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On Sat, 03 Sep 2011 10:27:42 PDT, "H. Peter Anvin" said:

> About ioctl in particular, the ABI has dependencies into almost every 
> single driver in the Linux kernel.  It is hard-coded in the kernel that 
> there are two paths -- native and compat.  Since pointers are going to 
> be 4 bytes, it means we have to use the compat path.

Ah, that's the part I was missing, thanks.

* Re: RFD: x32 ABI system call numbers
  2011-09-03 17:27                                             ` H. Peter Anvin
  2011-09-04 13:51                                               ` Valdis.Kletnieks
@ 2011-09-04 15:17                                               ` Arnd Bergmann
  2011-09-04 17:08                                                 ` Linus Torvalds
  2011-09-04 18:40                                                 ` H.J. Lu
  1 sibling, 2 replies; 94+ messages in thread
From: Arnd Bergmann @ 2011-09-04 15:17 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Valdis.Kletnieks, H.J. Lu, Linus Torvalds, Christoph Hellwig,
	LKML, Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On Saturday 03 September 2011 10:27:42 H. Peter Anvin wrote:
> On 09/03/2011 10:16 AM, Valdis.Kletnieks@vt.edu wrote:
> >>
> >> The complexity of changing that would be enormous.
> >
> > Oh, I know changing the x86-32 ABI is impossible - I meant changing the
> > decision to emulate that ABI (as opposed to emulating the x86-64 ABI, or a
> > variant thereof, or something else).  Or are we already committed to that
> > route, even if we're still trying to figure out what syscalls to include?
> >
> 
> About ioctl in particular, the ABI has dependencies into almost every 
> single driver in the Linux kernel.  It is hard-coded in the kernel that 
> there are two paths -- native and compat.  Since pointers are going to 
> be 4 bytes, it means we have to use the compat path.
> 
> We may be able to cheat a little bit since we encode the argument sizes 
> in the ioctl numbers; this solves the case of PPGETTIME/PPSETTIME for 
> example (in fact, this ioctl looks currently broken in compat mode!) 
> However, at some point the sheer number of data types that can be 
> consumed by ioctl is a real concern, so changing the ones we really care 
> about -- like timespec/timeval -- while leaving the rest intact so we 
> can use the compat path as a general rule would be highly useful.

The ppdev ioctls are indeed missing in user space, and they are
an example of a different problem than the one I meant.

We really have a number of different cases that we will have to
deal with in different ways:

* different layout and ioctl code due to padding on x86-32,
  x32 is compatible:
  DRM_IOCTL_RADEON_SETPARAM
  DRM_IOCTL_UPDATE_DRAW32
  EXT4_IOC32_GROUP_ADD

* different layout due to padding on x86-32, but same ioctl code:
  RAW_SETBIND
  RAW_GETBIND

* uses time_t, different ioctl code:
  PPPIOCGIDLE32
  VIDIOC_DQBUF32
  VIDIOC_QBUF32
  VIDIOC_QUERYBUF32
  VIDIOC_DQEVENT32

* uses time_t, same ioctl code:
  VIDEO_GET_EVENT
  LPSETTIMEOUT

* Different alignment, three different ioctl numbers:
  FS_IOC_RESVSP_32
  FS_IOC_RESVSP64_32

* manually checks if compat_task:
  input/evdev

* Very complex, no easy solution:
  XFS_IOC_*

* Only needed for x86-32, not for x32:
  sys_quotactl

* Data structures embed time values, not an ioctl
  sys_sendmsg (cmsg)
  sys_recvmsg (cmsg)
  sys_mq_*
  sys_semtimedop

For a lot of these cases, the best option is to change the
kernel headers to use new definitions on x32 before someone
tries to ship a distro, especially when the ioctl command code
is fixed. In the case of the XFS ioctls, I think the only sane
way is to define the x32 ABI to match the 64-bit ABI completely,
while for RAW_GETBIND and VIDEO_GET_EVENT it's probably enough
to make x32 match x86-32.

	Arnd

* Re: RFD: x32 ABI system call numbers
  2011-09-04 15:17                                               ` Arnd Bergmann
@ 2011-09-04 17:08                                                 ` Linus Torvalds
  2011-09-04 18:40                                                 ` H.J. Lu
  1 sibling, 0 replies; 94+ messages in thread
From: Linus Torvalds @ 2011-09-04 17:08 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: H. Peter Anvin, Valdis.Kletnieks, H.J. Lu, Christoph Hellwig,
	LKML, Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On Sun, Sep 4, 2011 at 8:17 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>
> For a lot of these cases, the best option is to change the
> kernel headers to use new definitions on x32 before someone
> tries to ship a distro, especially when the ioctl command code
> is fixed. In the case of the XFS ioctls, I think the only sane
> way is to define the x32 ABI to match the 64-bit ABI completely,
> while for RAW_GETBIND and VIDEO_GET_EVENT it's probably enough
> to make x32 match x86-32.

Ack, ack, ack.

If we make the x32 ioctl system call first do "regular" ioctl, and
then fall back to the compat ones if that fails, x32 can mix and
match. It's not pretty, but I think it's better than the alternative
(which would be to have to use one or the other and add lots of new
compat handling).

Of course, we could also just have "compat_ioctl()" fall back to
native mode in general, and not make a x32 special case.
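
A user-space sketch of the "try regular, then fall back to compat" idea
follows. Everything here is hypothetical: the struct, the function names,
and the toy handlers are invented for illustration, and DEMO_ENOIOCTLCMD
merely stands in for the kernel-internal "command not handled" code.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for the kernel-internal "no such ioctl here" return code. */
#define DEMO_ENOIOCTLCMD 515

typedef long (*demo_ioctl_fn)(unsigned int cmd, unsigned long arg);

/* Hypothetical pair of per-file handlers: native and compat. */
struct demo_file_ops {
	demo_ioctl_fn native_ioctl;
	demo_ioctl_fn compat_ioctl;
};

/* Try the native handler first; fall back to compat if it declines. */
static long demo_x32_ioctl(const struct demo_file_ops *ops,
			   unsigned int cmd, unsigned long arg)
{
	long ret = -DEMO_ENOIOCTLCMD;

	if (ops->native_ioctl)
		ret = ops->native_ioctl(cmd, arg);
	if (ret == -DEMO_ENOIOCTLCMD && ops->compat_ioctl)
		ret = ops->compat_ioctl(cmd, arg);

	/* Neither path knew the command. */
	return ret == -DEMO_ENOIOCTLCMD ? -ENOTTY : ret;
}

/* Toy handlers: command 1 exists only natively, command 2 only in compat. */
static long demo_native(unsigned int cmd, unsigned long arg)
{
	(void)arg;
	return cmd == 1 ? 64 : -DEMO_ENOIOCTLCMD;
}

static long demo_compat(unsigned int cmd, unsigned long arg)
{
	(void)arg;
	return cmd == 2 ? 32 : -DEMO_ENOIOCTLCMD;
}

static const struct demo_file_ops demo_ops = { demo_native, demo_compat };
```

This only helps when the command codes differ between the two layouts. For
the "same ioctl code, different layout" cases listed earlier in the thread,
the first handler would accept the command and misread the argument.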

                     Linus

* Re: RFD: x32 ABI system call numbers
  2011-09-04 15:17                                               ` Arnd Bergmann
  2011-09-04 17:08                                                 ` Linus Torvalds
@ 2011-09-04 18:40                                                 ` H.J. Lu
  2011-09-04 19:06                                                   ` Arnd Bergmann
  2011-09-04 19:31                                                   ` richard -rw- weinberger
  1 sibling, 2 replies; 94+ messages in thread
From: H.J. Lu @ 2011-09-04 18:40 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: H. Peter Anvin, Valdis.Kletnieks, Linus Torvalds,
	Christoph Hellwig, LKML, Ingo Molnar, Thomas Gleixner,
	Richard Kuo, Mark Salter, Jonas Bonn, Tobias Klauser

On Sun, Sep 4, 2011 at 8:17 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Saturday 03 September 2011 10:27:42 H. Peter Anvin wrote:
>> On 09/03/2011 10:16 AM, Valdis.Kletnieks@vt.edu wrote:
>> >>
>> >> The complexity of changing that would be enormous.
>> >
>> > Oh, I know changing the x86-32 ABI is impossible - I meant changing the
>> > decision to emulate that ABI (as opposed to emulating the x86-64 ABI, or a
>> > variant thereof, or something else).  Or are we already committed to that
>> > route, even if we're still trying to figure out what syscalls to include?
>> >
>>
>> About ioctl in particular, the ABI has dependencies into almost every
>> single driver in the Linux kernel.  It is hard-coded in the kernel that
>> there are two paths -- native and compat.  Since pointers are going to
>> be 4 bytes, it means we have to use the compat path.
>>
>> We may be able to cheat a little bit since we encode the argument sizes
>> in the ioctl numbers; this solves the case of PPGETTIME/PPSETTIME for
>> example (in fact, this ioctl looks currently broken in compat mode!)
>> However, at some point the sheer number of data types that can be
>> consumed by ioctl is a real concern, so changing the ones we really care
>> about -- like timespec/timeval -- while leaving the rest intact so we
>> can use the compat path as a general rule would be highly useful.
>
> The ppdev ioctls are indeed missing in user space, and they are
> an example for a different problem than the one I meant.
>
> We really have a number of different cases that we will have to
> deal with in different ways:
>
> * different layout and ioctl code due to padding on x86-32,
>  x32 is compatible:
>  DRM_IOCTL_RADEON_SETPARAM
>  DRM_IOCTL_UPDATE_DRAW32
>  EXT4_IOC32_GROUP_ADD
>
> * different layout due to padding on x86-32, but same ioctl code:
>  RAW_SETBIND
>  RAW_GETBIND
>
> * uses time_t, different ioctl code:
>  PPPIOCGIDLE32
>  VIDIOC_DQBUF32
>  VIDIOC_QBUF32
>  VIDIOC_QUERYBUF32
>  VIDIOC_DQEVENT32
>
> * uses time_t, same ioctl code:
>  VIDEO_GET_EVENT
>  LPSETTIMEOUT
>
> * Different alignment, three different ioctl numbers:
>  FS_IOC_RESVSP_32
>  FS_IOC_RESVSP64_32
>
> * manually checks if compat_task:
>  input/evdev
>
> * Very complex, no easy solution:
>  XFS_IOC_*
>
> * Only needed for x86-32, not for x32:
>  sys_quotactl
>
> * Data structures embed time values, not an ioctl
>  sys_sendmsg (cmsg)
>  sys_recvmsg (cmsg)
>  sys_mq_*
>  sys_semtimedop
>
> For a lot of these cases, the best option is to change the
> kernel headers to use new definitions on x32 before someone
> tries to ship a distro, especially when the ioctl command code
> is fixed. In the case of the XFS ioctls, I think the only sane
> way is to define the x32 ABI to match the 64-bit ABI completely,
> while for RAW_GETBIND and VIDEO_GET_EVENT it's probably enough
> to make x32 match x86-32.
>
>        Arnd
>

I need to use the following compat system calls for x32 due to
pointers or longs in struct passed to system calls.

-- 
H.J.
---
#define __NR_x32_rt_sigaction
#define __NR_x32_rt_sigprocmask
#define __NR_x32_rt_sigreturn
#define __NR_x32_ioctl
#define __NR_x32_readv
#define __NR_x32_writev
#define __NR_x32_recvfrom
#define __NR_x32_sendmsg
#define __NR_x32_recvmsg
#define __NR_x32_execve
#define __NR_x32_times
#define __NR_x32_rt_sigpending
#define __NR_x32_rt_sigtimedwait
#define __NR_x32_rt_sigqueueinfo
#define __NR_x32_sigaltstack
#define __NR_x32__sysctl
#define __NR_x32_timer_create
#define __NR_x32_mq_notify
#define __NR_x32_kexec_load
#define __NR_x32_waitid
#define __NR_x32_set_robust_list
#define __NR_x32_get_robust_list
#define __NR_x32_vmsplice
#define __NR_x32_move_pages
#define __NR_x32_preadv
#define __NR_x32_pwritev
#define __NR_x32_rt_tgsigqueueinfo
#define __NR_x32_recvmmsg
#define __NR_x32_sendmmsg

* Re: RFD: x32 ABI system call numbers
  2011-09-04 18:40                                                 ` H.J. Lu
@ 2011-09-04 19:06                                                   ` Arnd Bergmann
  2011-09-04 19:31                                                     ` H.J. Lu
  2011-09-04 20:11                                                     ` H. Peter Anvin
  2011-09-04 19:31                                                   ` richard -rw- weinberger
  1 sibling, 2 replies; 94+ messages in thread
From: Arnd Bergmann @ 2011-09-04 19:06 UTC (permalink / raw)
  To: H.J. Lu
  Cc: H. Peter Anvin, Valdis.Kletnieks, Linus Torvalds,
	Christoph Hellwig, LKML, Ingo Molnar, Thomas Gleixner,
	Richard Kuo, Mark Salter, Jonas Bonn, Tobias Klauser

On Sunday 04 September 2011 11:40:39 H.J. Lu wrote:
> #define __NR_x32_rt_sigaction
> #define __NR_x32_rt_sigprocmask
> #define __NR_x32_rt_sigreturn
> #define __NR_x32_rt_sigpending
> #define __NR_x32_rt_sigtimedwait
> #define __NR_x32_rt_sigqueueinfo
> #define __NR_x32_rt_tgsigqueueinfo
> #define __NR_x32_sigaltstack
> #define __NR_x32_waitid
> #define __NR_x32_mq_notify
> #define __NR_x32_timer_create

Right, all signal functions certainly need a new implementation.

> #define __NR_x32_ioctl

What do you plan to do for ioctl? Does this mean you want to
have a third file_operations pointer besides ioctl and compat_ioctl?
I would hope that you manage this by using different ioctl command
numbers in the few cases where the x32 version has to differ from
the x86-32 data structure.

> #define __NR_x32_readv
> #define __NR_x32_writev
> #define __NR_x32_preadv
> #define __NR_x32_pwritev
> #define __NR_x32_vmsplice
> #define __NR_x32_move_pages
> #define __NR_x32_execve
> #define __NR_x32_set_robust_list
> #define __NR_x32_get_robust_list
> #define __NR_x32__sysctl
> #define __NR_x32_kexec_load
> #define __NR_x32_times

I guess you could define the x32 iovec etc. to be compatible with the
64 bit one, but that's not a major thing. Why can't you use the
regular x86_32 calls here?

> #define __NR_x32_recvfrom
> #define __NR_x32_sendmsg
> #define __NR_x32_recvmsg
> #define __NR_x32_recvmmsg
> #define __NR_x32_sendmmsg

These today use the MSG_CMSG_COMPAT flag to distinguish native and compat
calls. Do you plan to have another flag here to handle cmsg time values?

What about things like mq_{get,set}attr, quotactl and semtimedop?

	Arnd

* Re: RFD: x32 ABI system call numbers
  2011-09-04 18:40                                                 ` H.J. Lu
  2011-09-04 19:06                                                   ` Arnd Bergmann
@ 2011-09-04 19:31                                                   ` richard -rw- weinberger
  2011-09-04 19:32                                                     ` H.J. Lu
  1 sibling, 1 reply; 94+ messages in thread
From: richard -rw- weinberger @ 2011-09-04 19:31 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Arnd Bergmann, H. Peter Anvin, Valdis.Kletnieks, Linus Torvalds,
	Christoph Hellwig, LKML, Ingo Molnar, Thomas Gleixner,
	Richard Kuo, Mark Salter, Jonas Bonn, Tobias Klauser

On Sun, Sep 4, 2011 at 8:40 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> #define __NR_x32__sysctl

Do we really want sysctl?

Some quotes from its man page:
"Do not use this system call!  See NOTES."

"Glibc does not provide a wrapper for this system call; call it using
syscall(2).

 Or rather... don't call it: use of this system call has long been
discouraged, and it is so unloved that it
 is likely to disappear in a future kernel version.  Remove it from
your programs  now;  use  the  /proc/sys
 interface instead."

-- 
Thanks,
//richard

* Re: RFD: x32 ABI system call numbers
  2011-09-04 19:06                                                   ` Arnd Bergmann
@ 2011-09-04 19:31                                                     ` H.J. Lu
  2011-09-04 21:13                                                       ` Arnd Bergmann
  2011-09-04 20:11                                                     ` H. Peter Anvin
  1 sibling, 1 reply; 94+ messages in thread
From: H.J. Lu @ 2011-09-04 19:31 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: H. Peter Anvin, Valdis.Kletnieks, Linus Torvalds,
	Christoph Hellwig, LKML, Ingo Molnar, Thomas Gleixner,
	Richard Kuo, Mark Salter, Jonas Bonn, Tobias Klauser

On Sun, Sep 4, 2011 at 12:06 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Sunday 04 September 2011 11:40:39 H.J. Lu wrote:
>> #define __NR_x32_rt_sigaction
>> #define __NR_x32_rt_sigprocmask
>> #define __NR_x32_rt_sigreturn
>> #define __NR_x32_rt_sigpending
>> #define __NR_x32_rt_sigtimedwait
>> #define __NR_x32_rt_sigqueueinfo
>> #define __NR_x32_rt_tgsigqueueinfo
>> #define __NR_x32_sigaltstack
>> #define __NR_x32_waitid
>> #define __NR_x32_mq_notify
>> #define __NR_x32_timer_create
>
> Right, all signal functions certainly need a new implementation.

We only need to add stub_x32_rt_sigreturn, stub_x32_execve,
stub_x32_sigaltstack and stub_x32_sigaltstack.  The rest use
x86-32 system calls.

>> #define __NR_x32_ioctl
>
> What do you plan to do for ioctl? Does this mean you want to
> have a third file_operations pointer besides ioctl and compat_ioctl?
> I would hope that you manage this by using different ioctl command
> numbers in the few cases where the x32 version has to differ from
> the x86-32 data structure.

This requires some kernel changes since x32 has 32-bit pointers and 64-bit
time_t/timespec/timeval.  We can use neither straight x86-32 nor x86-64.
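
To spell out why neither existing layout fits, here is a small sketch that
models one hypothetical ioctl argument, struct { void *buf; time_t stamp; },
with fixed-width types for each ABI. The struct and helper names are
invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* ia32: 32-bit pointer, 32-bit time_t. */
struct demo_arg_ia32 {
	uint32_t buf;
	int32_t  stamp;
};

/* x86-64: 64-bit pointer, 64-bit time_t. */
struct demo_arg_x86_64 {
	uint64_t buf;
	int64_t  stamp;
};

/* x32: 32-bit pointer, but 64-bit time_t. */
struct demo_arg_x32 {
	uint32_t buf;
	int64_t  stamp;
};

/* Nonzero when two models agree on the size of both fields. */
#define DEMO_SAME_FIELDS(a, b) \
	(sizeof(((struct a *)0)->buf) == sizeof(((struct b *)0)->buf) && \
	 sizeof(((struct a *)0)->stamp) == sizeof(((struct b *)0)->stamp))

static int demo_x32_matches_ia32(void)
{
	return DEMO_SAME_FIELDS(demo_arg_x32, demo_arg_ia32);
}

static int demo_x32_matches_x86_64(void)
{
	return DEMO_SAME_FIELDS(demo_arg_x32, demo_arg_x86_64);
}
```

Both helpers return 0: the x32 model differs from ia32 in the time field
and from x86-64 in the pointer field.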

>> #define __NR_x32_readv
>> #define __NR_x32_writev
>> #define __NR_x32_preadv
>> #define __NR_x32_pwritev
>> #define __NR_x32_vmsplice
>> #define __NR_x32_move_pages
>> #define __NR_x32_execve
>> #define __NR_x32_set_robust_list
>> #define __NR_x32_get_robust_list
>> #define __NR_x32__sysctl
>> #define __NR_x32_kexec_load
>> #define __NR_x32_times
>
> I guess you could define the x32 iovec etc. to be compatible with the
> 64 bit one, but that's not a major thing. Why can't you use the
> regular x86_32 calls here?

I am using the regular  x86_32 calls for them.

>
>> #define __NR_x32_recvfrom
>> #define __NR_x32_sendmsg
>> #define __NR_x32_recvmsg
>> #define __NR_x32_recvmmsg
>> #define __NR_x32_sendmmsg
>
> These today use the MSG_CMSG_COMPAT flag to distinguish native and compat
> calls. Do you plan to have another flag here to handle cmsg time values?

I am using x86-32 calls for them.

> What about things like mq_{get,set}attr, quotactl and semtimedop?
>

I am using 64bit system calls for x32.

-- 
H.J.

* Re: RFD: x32 ABI system call numbers
  2011-09-04 19:31                                                   ` richard -rw- weinberger
@ 2011-09-04 19:32                                                     ` H.J. Lu
  0 siblings, 0 replies; 94+ messages in thread
From: H.J. Lu @ 2011-09-04 19:32 UTC (permalink / raw)
  To: richard -rw- weinberger
  Cc: Arnd Bergmann, H. Peter Anvin, Valdis.Kletnieks, Linus Torvalds,
	Christoph Hellwig, LKML, Ingo Molnar, Thomas Gleixner,
	Richard Kuo, Mark Salter, Jonas Bonn, Tobias Klauser

On Sun, Sep 4, 2011 at 12:31 PM, richard -rw- weinberger
<richard.weinberger@gmail.com> wrote:
> On Sun, Sep 4, 2011 at 8:40 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> #define __NR_x32__sysctl
>
> Do we really want sysctl?
>
> Some quotes from its man page:
> "Do not use this system call!  See NOTES."
>
> "Glibc does not provide a wrapper for this system call; call it using
> syscall(2).
>
>  Or rather... don't call it: use of this system call has long been
> discouraged, and it is so unloved that it
>  is likely to disappear in a future kernel version.  Remove it from
> your programs  now;  use  the  /proc/sys
>  interface instead."
>

I will remove it.

Thanks.


-- 
H.J.

* Re: RFD: x32 ABI system call numbers
  2011-09-04 19:06                                                   ` Arnd Bergmann
  2011-09-04 19:31                                                     ` H.J. Lu
@ 2011-09-04 20:11                                                     ` H. Peter Anvin
  1 sibling, 0 replies; 94+ messages in thread
From: H. Peter Anvin @ 2011-09-04 20:11 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: H.J. Lu, Valdis.Kletnieks, Linus Torvalds, Christoph Hellwig,
	LKML, Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

> 
>> #define __NR_x32_ioctl
> 
> What do you plan to do for ioctl? Does this mean you want to
> have a third file_operations pointer besides ioctl and compat_ioctl?
> I would hope that you manage this by using different ioctl command
> numbers in the few cases where the x32 version has to differ from
> the x86-32 data structure.
> 

This *should* just be an entry for compat_ioctl in the x86-64 system
call table (as opposed to the i386 system call table.)

>> #define __NR_x32_readv
>> #define __NR_x32_writev
>> #define __NR_x32_preadv
>> #define __NR_x32_pwritev
>> #define __NR_x32_vmsplice
>> #define __NR_x32_move_pages
>> #define __NR_x32_execve
>> #define __NR_x32_set_robust_list
>> #define __NR_x32_get_robust_list
>> #define __NR_x32__sysctl
>> #define __NR_x32_kexec_load
>> #define __NR_x32_times
> 
> I guess you could define the x32 iovec etc. to be compatible with the
> 64 bit one, but that's not a major thing. Why can't you use the
> regular x86_32 calls here?

They still need to be bound to x86-64 system call entry points.  (Well,
in theory they could be invoked via int $0x80, but that is ugly in a
whole lot of ways.)

>> #define __NR_x32_recvfrom
>> #define __NR_x32_sendmsg
>> #define __NR_x32_recvmsg
>> #define __NR_x32_recvmmsg
>> #define __NR_x32_sendmmsg
> 
> These today use the MSG_CMSG_COMPAT flag to distinguish native and compat
> calls. Do you plan to have another flag here to handle cmsg time values?
> 
> What about things like mq_{get,set}attr, quotactl and semtimedop?

	-hpa


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-04 19:31                                                     ` H.J. Lu
@ 2011-09-04 21:13                                                       ` Arnd Bergmann
  2011-09-04 21:25                                                         ` H.J. Lu
  0 siblings, 1 reply; 94+ messages in thread
From: Arnd Bergmann @ 2011-09-04 21:13 UTC (permalink / raw)
  To: H.J. Lu
  Cc: H. Peter Anvin, Valdis.Kletnieks, Linus Torvalds,
	Christoph Hellwig, LKML, Ingo Molnar, Thomas Gleixner,
	Richard Kuo, Mark Salter, Jonas Bonn, Tobias Klauser

On Sunday 04 September 2011 12:31:25 H.J. Lu wrote:
> On Sun, Sep 4, 2011 at 12:06 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> >> #define __NR_x32_ioctl
> >
> > What do you plan to do for ioctl? Does this mean you want to
> > have a third file_operations pointer besides ioctl and compat_ioctl?
> > I would hope that you manage this by using different ioctl command
> > numbers in the few cases where the x32 version has to differ from
> > the x86-32 data structure.
> 
> This requires some kernel changes since x32 has 32bit pointers and 64bit
> time_t/timespec/timeval.   We can't use straight x86-32 nor x86-64.

I understand that it's not easy, but how do you want to get there?
There is no central implementation of ioctl, it's all in the device drivers!

My point was that the part that you do control is the ABI for x32, so
you can change the driver's header files to do things like

#ifndef __x32__
struct foo_ioctl_data {
	time_t	time;
	long		something_else;
	__u64		something_big;
};
#else
struct foo_ioctl_data {
	time_t	time;
	long long	something_else;
	__u64		something_big;
};
#endif

#define FOO_IOCTL_BAR _IOR('f', 0, struct foo_ioctl_data)

#ifdef __KERNEL__
struct compat_foo_ioctl_data {
	compat_time_t	time;
	compat_long_t	something_else;
	compat_u64		something_big;
};
#define FOO_IOCTL32_BAR _IOR('f', 0, struct compat_foo_ioctl_data)

static long foo_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
	void __user *uptr = compat_ptr(arg);

	switch (cmd) {
		case FOO_IOCTL32_BAR: /* regular compat case */
			return foo_compat_ioctl_bar(filp, uptr);
		case FOO_IOCTL_BAR: /* x32 passing native struct */
			return foo_ioctl_bar(filp, uptr);
	}
	return -ENOIOCTLCMD;
}
#endif /* __KERNEL__ */

This way, the same compat_ioctl function can easily support both x86-32 and
x32. In fact, many compat_ioctl handlers already contain two code paths for the
compat_u64 case, where they fall back on the native handler for anything but x86.
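
The reason a single handler can tell the two callers apart is that the
struct size is folded into the ioctl command number. A minimal userspace
sketch of that idea follows. SKETCH_IOR is a hypothetical stand-in for
_IOR() that follows the asm-generic bit layout, the fixed-width structs
stand in for the time_t/long members above, and -ENOTTY is used only
because -ENOIOCTLCMD is not visible outside the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for _IOR(): read direction, argument size, type
 * character and command number packed into one 32-bit word. */
#define SKETCH_IOR(type, nr, size) \
	((2u << 30) | ((uint32_t)(size) << 16) | \
	 ((uint32_t)(type) << 8) | (uint32_t)(nr))

/* Layout an x32 (or x86-64) caller would pass: three 64-bit members. */
struct foo_ioctl_data {
	int64_t  time;
	int64_t  something_else;
	uint64_t something_big;
};

/* Layout an i386 caller would pass: 32-bit time_t and long. */
struct compat_foo_ioctl_data {
	int32_t  time;
	int32_t  something_else;
	uint64_t something_big;
};

static_assert(sizeof(struct foo_ioctl_data) == 24, "native layout");
static_assert(sizeof(struct compat_foo_ioctl_data) == 16, "compat layout");

/* Same type and number, different size, hence different command values. */
#define FOO_IOCTL_BAR   SKETCH_IOR('f', 0, sizeof(struct foo_ioctl_data))
#define FOO_IOCTL32_BAR SKETCH_IOR('f', 0, sizeof(struct compat_foo_ioctl_data))

enum { HANDLED_COMPAT = 1, HANDLED_NATIVE = 2 };

/* One handler, two code paths, selected purely by the command number. */
static long foo_compat_dispatch(uint32_t cmd)
{
	switch (cmd) {
	case FOO_IOCTL32_BAR:	/* regular compat case */
		return HANDLED_COMPAT;
	case FOO_IOCTL_BAR:	/* x32 passing the native struct */
		return HANDLED_NATIVE;
	}
	return -ENOTTY;
}
```

If the two structs happened to have the same size, the two case labels
would collide and the compiler would reject the switch, which is a
useful early warning that this approach cannot distinguish them.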

> >> #define __NR_x32_recvfrom
> >> #define __NR_x32_sendmsg
> >> #define __NR_x32_recvmsg
> >> #define __NR_x32_recvmmsg
> >> #define __NR_x32_sendmmsg
> >
> > These today use the MSG_CMSG_COMPAT flag to distinguish native and compat
> > calls. Do you plan to have another flag here to handle cmsg time values?
> 
> I am using x86-32 calls for them.
>
> > What about things like mq_{get,set}attr, quotactl and semtimedop?
> >
> 
> I am using 64bit system calls for x32.

But isn't that broken? These all pass u64 or time_t values at some point.

	Arnd 

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-04 21:13                                                       ` Arnd Bergmann
@ 2011-09-04 21:25                                                         ` H.J. Lu
  2011-09-04 21:41                                                           ` Arnd Bergmann
  0 siblings, 1 reply; 94+ messages in thread
From: H.J. Lu @ 2011-09-04 21:25 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: H. Peter Anvin, Valdis.Kletnieks, Linus Torvalds,
	Christoph Hellwig, LKML, Ingo Molnar, Thomas Gleixner,
	Richard Kuo, Mark Salter, Jonas Bonn, Tobias Klauser

On Sun, Sep 4, 2011 at 2:13 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Sunday 04 September 2011 12:31:25 H.J. Lu wrote:
>> On Sun, Sep 4, 2011 at 12:06 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> >> #define __NR_x32_ioctl
>> >
>> > What do you plan to do for ioctl? Does this mean you want to
>> > have a third file_operations pointer besides ioctl and compat_ioctl?
>> > I would hope that you manage this by using different ioctl command
>> > numbers in the few cases where the x32 version has to differ from
>> > the x86-32 data structure.
>>
>> This requires some kernel changes since x32 has 32bit pointers and 64bit
>> time_t/timespec/timeval.   We can't use straight x86-32 nor x86-64.
>
> I understand that it's not easy, but how do you want to get there?
> There is no central implementation of ioctl, it's all in the device drivers!
>
> My point was that the part that you do control is the ABI for x32, so
> you can change the driver's header files to do things like
>
> #ifndef __x32__
> struct foo_ioctl_data {
>        time_t  time;
>        long            something_else;
>        __u64           something_big;
> };
> #else
> struct foo_ioctl_data {
>        time_t  time;
>        long long       something_else;
>        __u64           something_big;
> };
> #endif
>
> #define FOO_IOCTL_BAR _IOR('f', 0, struct foo_ioctl_data)
>
> #ifdef __KERNEL__
> struct compat_foo_ioctl_data {
>        compat_time_t   time;
>        compat_long_t   something_else;
>        compat_u64              something_big;
> };
> #define FOO_IOCTL32_BAR _IOR('f', 0, struct compat_foo_ioctl_data)
>
> static long foo_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> {
>        void __user *uptr = compat_ptr(arg);
>
>        switch (cmd) {
>                case FOO_IOCTL32_BAR: /* regular compat case */
>                        return foo_compat_ioctl_bar(filp, uptr);
>                case FOO_IOCTL_BAR: /* x32 passing native struct */
>                        return foo_ioctl_bar(filp, uptr);
>        }
>        return -ENOIOCTLCMD;
> }
> #endif /* __KERNEL__ */
>
> This way, the same compat_ioctl function can easily support both x86-32 and
> x32. In fact, many compat_ioctl handlers already contain two code paths for the
> compat_u64 case, where they fall back on the native handler for anything but x86.

This is one way to deal with ioctl.  I will leave it to Peter.

>> >> #define __NR_x32_recvfrom
>> >> #define __NR_x32_sendmsg
>> >> #define __NR_x32_recvmsg
>> >> #define __NR_x32_recvmmsg
>> >> #define __NR_x32_sendmmsg
>> >
>> > These today use the MSG_CMSG_COMPAT flag to distinguish native and compat
>> > calls. Do you plan to have another flag here to handle cmsg time values?
>>
>> I am using x86-32 calls for them.
>>
>> > What about things like mq_{get,set}attr, quotactl and semtimedop?
>> >
>>
>> I am using 64bit system calls for x32.
>
> But isn't that broken? These all pass u64 or time_t values at some point.
>

time_t isn't a problem since time_t/timeval/timespec are identical for
x32 and x86-64.  As for u64, I added NATIVE_LONG_TYPE, which is
defined as long long for x32,  and use it instead of long in types for
64bit system calls.


-- 
H.J.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-04 21:25                                                         ` H.J. Lu
@ 2011-09-04 21:41                                                           ` Arnd Bergmann
  2011-09-04 22:13                                                             ` H.J. Lu
  0 siblings, 1 reply; 94+ messages in thread
From: Arnd Bergmann @ 2011-09-04 21:41 UTC (permalink / raw)
  To: H.J. Lu
  Cc: H. Peter Anvin, Valdis.Kletnieks, Linus Torvalds,
	Christoph Hellwig, LKML, Ingo Molnar, Thomas Gleixner,
	Richard Kuo, Mark Salter, Jonas Bonn, Tobias Klauser

On Sunday 04 September 2011 14:25:53 H.J. Lu wrote:
> >> >> #define __NR_x32_recvfrom
> >> >> #define __NR_x32_sendmsg
> >> >> #define __NR_x32_recvmsg
> >> >> #define __NR_x32_recvmmsg
> >> >> #define __NR_x32_sendmmsg
> >> >
> >> > These today use the MSG_CMSG_COMPAT flag to distinguish native and compat
> >> > calls. Do you plan to have another flag here to handle cmsg time values?
> >>
> >> I am using x86-32 calls for them.
> >>
> >> > What about things like mq_{get,set}attr, quotactl and semtimedop?
> >> >
> >>
> >> I am using 64bit system calls for x32.
> >
> > But isn't that broken? These all pass u64 or time_t values at some point.
> >
> 
> time_t isn't a problem since time_t/timeval/timespec are identical for
> x32 and x86-64.  As for u64, I added NATIVE_LONG_TYPE, which is
> defined as long long for x32,  and use it instead of long in types for
> 64bit system calls.

Sorry, I misread you as saying you use the compat syscalls for these.
If you use the native 64 bit syscalls, you have the opposite problem:
Some network protocols (e.g. netlink or rxrpc) use other data structures
that require conversion, e.g. 'long' members that x32 will get wrong.

For quotactl I guess you are right, using the 64 bit call instead
of the x86_32 call will just work on x32 like they do on other compat
architectures. The same should be true for semtimedop, I had misinterpreted
that one thinking that you would still need to convert struct sembuf
(you would need that on another architecture which uses 32 bit struct
alignment in one ABI but not the other).

For mq_{get,set}attr, I think you will either have to use the 32 bit
call or conditionally define struct mq_attr to contain 'long long'
members.

	Arnd

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-04 21:41                                                           ` Arnd Bergmann
@ 2011-09-04 22:13                                                             ` H.J. Lu
  2011-09-05  7:48                                                               ` Arnd Bergmann
  0 siblings, 1 reply; 94+ messages in thread
From: H.J. Lu @ 2011-09-04 22:13 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: H. Peter Anvin, Valdis.Kletnieks, Linus Torvalds,
	Christoph Hellwig, LKML, Ingo Molnar, Thomas Gleixner,
	Richard Kuo, Mark Salter, Jonas Bonn, Tobias Klauser

On Sun, Sep 4, 2011 at 2:41 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Sunday 04 September 2011 14:25:53 H.J. Lu wrote:
>> >> >> #define __NR_x32_recvfrom
>> >> >> #define __NR_x32_sendmsg
>> >> >> #define __NR_x32_recvmsg
>> >> >> #define __NR_x32_recvmmsg
>> >> >> #define __NR_x32_sendmmsg
>> >> >
>> >> > These today use the MSG_CMSG_COMPAT flag to distinguish native and compat
>> >> > calls. Do you plan to have another flag here to handle cmsg time values?
>> >>
>> >> I am using x86-32 calls for them.
>> >>
>> >> > What about things like mq_{get,set}attr, quotactl and semtimedop?
>> >> >
>> >>
>> >> I am using 64bit system calls for x32.
>> >
>> > But isn't that broken? These all pass u64 or time_t values at some point.
>> >
>>
>> time_t isn't a problem since time_t/timeval/timespec are identical for
>> x32 and x86-64.  As for u64, I added NATIVE_LONG_TYPE, which is
>> defined as long long for x32,  and use it instead of long in types for
>> 64bit system calls.
>
> Sorry, I misread you as saying you use the compat syscalls for these.
> If you use the native 64 bit syscalls, you have the opposite problem:
> Some network protocols (e.g. netlink or rxrpc) use other data structures
> that require conversion, e.g. 'long' members that x32 will get wrong.

For those, I use x86-32 calls.

> For quotactl I guess you are right, using the 64 bit call instead
> of the x86_32 call will just work on x32 like they do on other compat
> architectures. The same should be true for semtimedop, I had misinterpreted
> that one thinking that you would still need to convert struct sembuf
> (you would need that on another architecture which uses 32 bit struct
> alignment in one ABI but not the other).

I make sure that x32  struct sembuf is identical to 64bit struct sembuf.

> For mq_{get,set}attr, I think you will either have to use the 32 bit
> call or conditionally define struct mq_attr to contain 'long long'
> members.

I have

struct mq_attr
{
  __SNATIVE_LONG_TYPE mq_flags;		/* Message queue flags.  */
  __SNATIVE_LONG_TYPE mq_maxmsg;	/* Maximum number of messages.  */
  __SNATIVE_LONG_TYPE mq_msgsize;	/* Maximum message size.  */
  __SNATIVE_LONG_TYPE mq_curmsgs;	/* Number of messages currently queued.  */
  __SNATIVE_LONG_TYPE __pad[4];
};

__SNATIVE_LONG_TYPE is long long for x32.
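
A rough way to see why this matters is to compare layouts with
fixed-width types. The sketch below is only an illustration: the
sketch_* names are made up, int64_t/int32_t stand in for what 'long'
means on x86-64 and on an ILP32 ABI, and sketch_snative_long assumes
long long is 64 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for __SNATIVE_LONG_TYPE on x32 (assumed: long long, 64 bits). */
typedef long long sketch_snative_long;
static_assert(sizeof(sketch_snative_long) == 8,
	      "sketch assumes a 64-bit long long");

/* What the 64-bit system call expects: every member is a 64-bit long. */
struct sketch_mq_attr_lp64 {
	int64_t mq_flags;
	int64_t mq_maxmsg;
	int64_t mq_msgsize;
	int64_t mq_curmsgs;
	int64_t pad[4];
};

/* What x32 user space would pass if the members stayed plain 32-bit long. */
struct sketch_mq_attr_ilp32 {
	int32_t mq_flags;
	int32_t mq_maxmsg;
	int32_t mq_msgsize;
	int32_t mq_curmsgs;
	int32_t pad[4];
};

/* x32 definition with the members widened, as in the struct above. */
struct sketch_mq_attr_x32 {
	sketch_snative_long mq_flags;
	sketch_snative_long mq_maxmsg;
	sketch_snative_long mq_msgsize;
	sketch_snative_long mq_curmsgs;
	sketch_snative_long pad[4];
};

/* Returns 1 when the widened x32 struct can be handed to the 64-bit call. */
static int sketch_same_layout(void)
{
	return sizeof(struct sketch_mq_attr_x32) ==
		       sizeof(struct sketch_mq_attr_lp64) &&
	       offsetof(struct sketch_mq_attr_x32, mq_curmsgs) ==
		       offsetof(struct sketch_mq_attr_lp64, mq_curmsgs);
}
```

The 32-bit variant is half the size, so passing it to the 64-bit system
call unchanged would make the kernel read past the fields user space
actually filled in.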


-- 
H.J.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-04 22:13                                                             ` H.J. Lu
@ 2011-09-05  7:48                                                               ` Arnd Bergmann
  2011-09-05 15:11                                                                 ` H.J. Lu
  0 siblings, 1 reply; 94+ messages in thread
From: Arnd Bergmann @ 2011-09-05  7:48 UTC (permalink / raw)
  To: H.J. Lu
  Cc: H. Peter Anvin, Valdis.Kletnieks, Linus Torvalds,
	Christoph Hellwig, LKML, Ingo Molnar, Thomas Gleixner,
	Richard Kuo, Mark Salter, Jonas Bonn, Tobias Klauser

On Sunday 04 September 2011 15:13:18 H.J. Lu wrote:
> On Sun, Sep 4, 2011 at 2:41 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Sunday 04 September 2011 14:25:53 H.J. Lu wrote:
> >> >> >> #define __NR_x32_recvfrom
> >> >> >> #define __NR_x32_sendmsg
> >> >> >> #define __NR_x32_recvmsg
> >> >> >> #define __NR_x32_recvmmsg
> >> >> >> #define __NR_x32_sendmmsg
> >> >> >
> >> >> > These today use the MSG_CMSG_COMPAT flag to distinguish native and compat
> >> >> > calls. Do you plan to have another flag here to handle cmsg time values?
> >> >>
> >> >> I am using x86-32 calls for them.
> >> >
> >> > But isn't that broken? These all pass u64 or time_t values at some point.
> >> >
> >>
> >> time_t isn't a problem since time_t/timeval/timespec are identical for
> >> x32 and x86-64.  As for u64, I added NATIVE_LONG_TYPE, which is
> >> defined as long long for x32,  and use it instead of long in types for
> >> 64bit system calls.
> >
> > Sorry, I misread you as saying you use the compat syscalls for these.
> > If you use the native 64 bit syscalls, you have the opposite problem:
> > Some network protocols (e.g. netlink or rxrpc) use other data structures
> > that require conversion, e.g. 'long' members that x32 will get wrong.
> 
> For those, I use x86-32 calls.

So to ask again, what do you plan to do about SCM_TIMESTAMP*?

	Arnd

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-05  7:48                                                               ` Arnd Bergmann
@ 2011-09-05 15:11                                                                 ` H.J. Lu
  2011-09-05 17:21                                                                   ` Arnd Bergmann
  2011-09-09 21:02                                                                   ` H.J. Lu
  0 siblings, 2 replies; 94+ messages in thread
From: H.J. Lu @ 2011-09-05 15:11 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: H. Peter Anvin, Valdis.Kletnieks, Linus Torvalds,
	Christoph Hellwig, LKML, Ingo Molnar, Thomas Gleixner,
	Richard Kuo, Mark Salter, Jonas Bonn, Tobias Klauser

[-- Attachment #1: Type: text/plain, Size: 1926 bytes --]

On Mon, Sep 5, 2011 at 12:48 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Sunday 04 September 2011 15:13:18 H.J. Lu wrote:
>> On Sun, Sep 4, 2011 at 2:41 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> > On Sunday 04 September 2011 14:25:53 H.J. Lu wrote:
>> >> >> >> #define __NR_x32_recvfrom
>> >> >> >> #define __NR_x32_sendmsg
>> >> >> >> #define __NR_x32_recvmsg
>> >> >> >> #define __NR_x32_recvmmsg
>> >> >> >> #define __NR_x32_sendmmsg
>> >> >> >
>> >> >> > These today use the MSG_CMSG_COMPAT flag to distinguish native and compat
>> >> >> > calls. Do you plan to have another flag here to handle cmsg time values?
>> >> >>
>> >> >> I am using x86-32 calls for them.
>> >> >
>> >> > But isn't that broken? These all pass u64 or time_t values at some point.
>> >> >
>> >>
>> >> time_t isn't a problem since time_t/timeval/timespec are identical for
>> >> x32 and x86-64.  As for u64, I added NATIVE_LONG_TYPE, which is
>> >> defined as long long for x32,  and use it instead of long in types for
>> >> 64bit system calls.
>> >
>> > Sorry, I misread you as saying you use the compat syscalls for these.
>> > If you use the native 64 bit syscalls, you have the opposite problem:
>> > Some network protocols (e.g. netlink or rxrpc) use other data structures
>> > that require conversion, e.g. 'long' members that x32 will get wrong.
>>
>> For those, I use x86-32 calls.
>
> So to ask again, what do you plan to do about SCM_TIMESTAMP*?
>

I added MSG_CMSG_COMPAT64 and new compat system calls with
64bit timespec/val to support it.  See the enclosed patch.
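
As a rough userspace illustration of what the flag changes (not the
kernel code itself): the sketch below narrows a timestamp to two 32-bit
fields only when the compat flag is set and the 64-bit flag is not. The
flag values are copied from the patch; the sketch_* struct and function
names are made up, and the check on the compat flag is folded in here
even though in the patch put_cmsg_compat() is only reached for compat
messages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MSG_CMSG_COMPAT   0x80000000u	/* needs 32 bit fixups */
#define SKETCH_MSG_CMSG_COMPAT64 0x20000000u	/* but keep 64 bit timestamps */

struct sketch_timeval64 { int64_t tv_sec; int64_t tv_usec; };
struct sketch_timeval32 { int32_t tv_sec; int32_t tv_usec; };

static_assert(sizeof(struct sketch_timeval64) == 16, "64-bit timeval");
static_assert(sizeof(struct sketch_timeval32) == 8, "32-bit timeval");

/*
 * Copy the timestamp payload into 'out' (at least 16 bytes) and return
 * the number of bytes written: 8 for an i386-style caller, 16 for a
 * native or x32-style caller.
 */
static size_t sketch_put_timestamp(unsigned int msg_flags,
				   const struct sketch_timeval64 *tv,
				   void *out)
{
	int compat = (msg_flags & SKETCH_MSG_CMSG_COMPAT) != 0;
	int keep64 = (msg_flags & SKETCH_MSG_CMSG_COMPAT64) != 0;

	if (compat && !keep64) {
		struct sketch_timeval32 ctv;

		ctv.tv_sec = (int32_t)tv->tv_sec;
		ctv.tv_usec = (int32_t)tv->tv_usec;
		memcpy(out, &ctv, sizeof(ctv));
		return sizeof(ctv);
	}
	memcpy(out, tv, sizeof(*tv));
	return sizeof(*tv);
}
```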

BTW, I also added

compat_sys_preadv64(unsigned long fd, const struct compat_iovec __user *vec,
         unsigned long vlen, loff_t pos)
compat_sys_pwritev64(unsigned long fd, const struct compat_iovec __user *vec,
         unsigned long vlen, loff_t pos)

to support 32bit compat_iovec * and 64bit offset.


-- 
H.J.

[-- Attachment #2: linux-cmsg-x32-1.patch --]
[-- Type: text/plain, Size: 4267 bytes --]

diff --git a/include/linux/socket.h b/include/linux/socket.h
index d0e77f6..86ecf27 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -272,8 +272,10 @@ struct ucred {
 					   descriptor received through
 					   SCM_RIGHTS */
 #if defined(CONFIG_COMPAT)
+#define MSG_CMSG_COMPAT64 0x20000000	/* This message needs 64 bit timestamp */
 #define MSG_CMSG_COMPAT	0x80000000	/* This message needs 32 bit fixups */
 #else
+#define MSG_CMSG_COMPAT64	0	/* We never have 64 bit timestamp */
 #define MSG_CMSG_COMPAT	0		/* We never have 32 bit fixups */
 #endif
 
diff --git a/net/compat.c b/net/compat.c
index c578d93..555b951 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -229,24 +229,26 @@ int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *dat
 		return 0; /* XXX: return error? check spec. */
 	}
 
-	if (level == SOL_SOCKET && type == SCM_TIMESTAMP) {
-		struct timeval *tv = (struct timeval *)data;
-		ctv.tv_sec = tv->tv_sec;
-		ctv.tv_usec = tv->tv_usec;
-		data = &ctv;
-		len = sizeof(ctv);
-	}
-	if (level == SOL_SOCKET &&
-	    (type == SCM_TIMESTAMPNS || type == SCM_TIMESTAMPING)) {
-		int count = type == SCM_TIMESTAMPNS ? 1 : 3;
-		int i;
-		struct timespec *ts = (struct timespec *)data;
-		for (i = 0; i < count; i++) {
-			cts[i].tv_sec = ts[i].tv_sec;
-			cts[i].tv_nsec = ts[i].tv_nsec;
+	if ((MSG_CMSG_COMPAT64 & kmsg->msg_flags) == 0) {
+		if (level == SOL_SOCKET && type == SCM_TIMESTAMP) {
+			struct timeval *tv = (struct timeval *)data;
+			ctv.tv_sec = tv->tv_sec;
+			ctv.tv_usec = tv->tv_usec;
+			data = &ctv;
+			len = sizeof(ctv);
+		}
+		if (level == SOL_SOCKET &&
+		    (type == SCM_TIMESTAMPNS || type == SCM_TIMESTAMPING)) {
+			int count = type == SCM_TIMESTAMPNS ? 1 : 3;
+			int i;
+			struct timespec *ts = (struct timespec *)data;
+			for (i = 0; i < count; i++) {
+				cts[i].tv_sec = ts[i].tv_sec;
+				cts[i].tv_nsec = ts[i].tv_nsec;
+			}
+			data = &cts;
+			len = sizeof(cts[0]) * count;
 		}
-		data = &cts;
-		len = sizeof(cts[0]) * count;
 	}
 
 	cmlen = CMSG_COMPAT_LEN(len);
@@ -781,6 +783,52 @@ asmlinkage long compat_sys_recvmmsg(int fd, struct compat_mmsghdr __user *mmsg,
 	return datagrams;
 }
 
+asmlinkage long compat_sys_sendmsg64(int fd, struct compat_msghdr __user *msg,
+				     unsigned flags)
+{
+	return sys_sendmsg(fd, (struct msghdr __user *)msg,
+			   flags | MSG_CMSG_COMPAT | MSG_CMSG_COMPAT64);
+}
+
+asmlinkage long compat_sys_sendmmsg64(int fd, struct compat_mmsghdr __user *mmsg,
+				      unsigned vlen, unsigned int flags)
+{
+	return __sys_sendmmsg(fd, (struct mmsghdr __user *)mmsg, vlen,
+			      flags | MSG_CMSG_COMPAT | MSG_CMSG_COMPAT64);
+}
+
+asmlinkage long compat_sys_recvmsg64(int fd, struct compat_msghdr __user *msg,
+				     unsigned int flags)
+{
+	return sys_recvmsg(fd, (struct msghdr __user *)msg,
+			   flags | MSG_CMSG_COMPAT | MSG_CMSG_COMPAT64);
+}
+
+asmlinkage long compat_sys_recv64(int fd, void __user *buf, size_t len,
+				  unsigned flags)
+{
+	return sys_recv(fd, buf, len,
+			flags | MSG_CMSG_COMPAT | MSG_CMSG_COMPAT64);
+}
+
+asmlinkage long compat_sys_recvfrom64(int fd, void __user *buf, size_t len,
+				      unsigned flags, struct sockaddr __user *addr,
+				      int __user *addrlen)
+{
+	return sys_recvfrom(fd, buf, len,
+			    flags | MSG_CMSG_COMPAT | MSG_CMSG_COMPAT64,
+			    addr, addrlen);
+}
+
+asmlinkage long compat_sys_recvmmsg64(int fd, struct compat_mmsghdr __user *mmsg,
+				      unsigned vlen, unsigned int flags,
+				      struct timespec __user *timeout)
+{
+	return __sys_recvmmsg(fd, (struct mmsghdr __user *)mmsg, vlen,
+			      flags | MSG_CMSG_COMPAT | MSG_CMSG_COMPAT64,
+			      timeout);
+}
+
 asmlinkage long compat_sys_socketcall(int call, u32 __user *args)
 {
 	int ret;
diff --git a/net/socket.c b/net/socket.c
index 24a7740..fae5472 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2133,7 +2133,7 @@ static int __sys_recvmsg(struct socket *sock, struct msghdr __user *msg,
 	total_len = err;
 
 	cmsg_ptr = (unsigned long)msg_sys->msg_control;
-	msg_sys->msg_flags = flags & (MSG_CMSG_CLOEXEC|MSG_CMSG_COMPAT);
+	msg_sys->msg_flags = flags & (MSG_CMSG_CLOEXEC|MSG_CMSG_COMPAT|MSG_CMSG_COMPAT64);
 
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-05 15:11                                                                 ` H.J. Lu
@ 2011-09-05 17:21                                                                   ` Arnd Bergmann
  2011-09-05 19:34                                                                     ` H.J. Lu
  2011-09-09 21:02                                                                   ` H.J. Lu
  1 sibling, 1 reply; 94+ messages in thread
From: Arnd Bergmann @ 2011-09-05 17:21 UTC (permalink / raw)
  To: H.J. Lu
  Cc: H. Peter Anvin, Valdis.Kletnieks, Linus Torvalds,
	Christoph Hellwig, LKML, Ingo Molnar, Thomas Gleixner,
	Richard Kuo, Mark Salter, Jonas Bonn, Tobias Klauser

On Monday 05 September 2011, H.J. Lu wrote:
> I added MSG_CMSG_COMPAT64 and new compat system calls with
> 64bit timespec/val to support it.  See the enclosed patch.

Yes, looks good. Maybe there should be an #ifdef in there though,
so the other compat architectures don't get the extra code.

> BTW, I also added
> 
> compat_sys_preadv64(unsigned long fd, const struct compat_iovec __user *vec,
>          unsigned long vlen, loff_t pos)
> compat_sys_pwritev64(unsigned long fd, const struct compat_iovec __user *vec,
>          unsigned long vlen, loff_t pos)
> 
> to support 32bit compat_iovec * and 64bit offset.

Does that make much of a difference? I would guess that it's just as
easy to do in libc by splitting the pos argument and calling the
existing compat_sys_preadv. Alternatively, you could make glibc
copy the iovec array to the 64 bit format and call the native syscall,
because compat_rw_copy_check_uvector() otherwise just ends up doing that
in kernel space. Or you just define the x32 libc iovec as

struct iovec {
	void *iov_base;
	unsigned int __pad; /* gets cleared by libc */
	__u64 iov_len;
};
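
To spell out why the padding has to be cleared, here is a small
userspace sketch. It assumes a little-endian machine (as x86 is), models
the 32-bit pointer as uint32_t, and uses made-up sketch_* names; it only
shows how the 64-bit side would interpret the same 16 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* How the 64-bit kernel reads an iovec entry. */
struct sketch_iovec64 {
	uint64_t iov_base;
	uint64_t iov_len;
};

/* The padded x32 layout: 32-bit pointer, explicit pad, 64-bit length. */
struct sketch_iovec_x32 {
	uint32_t iov_base;
	uint32_t pad;		/* must be zero before the system call */
	uint64_t iov_len;
};

static_assert(sizeof(struct sketch_iovec_x32) == sizeof(struct sketch_iovec64),
	      "both layouts must occupy the same 16 bytes");

/* Reinterpret the x32 entry byte for byte, as the native call would. */
static struct sketch_iovec64 sketch_as_native(const struct sketch_iovec_x32 *v)
{
	struct sketch_iovec64 out;

	memcpy(&out, v, sizeof(out));
	return out;
}
```

With pad set to zero the native view has the same base and length. Any
stale bits in pad end up in the upper half of iov_base, so the kernel
would see a pointer outside the 32-bit address space.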

	Arnd

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-05 17:21                                                                   ` Arnd Bergmann
@ 2011-09-05 19:34                                                                     ` H.J. Lu
  2011-09-05 19:54                                                                       ` H.J. Lu
  0 siblings, 1 reply; 94+ messages in thread
From: H.J. Lu @ 2011-09-05 19:34 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: H. Peter Anvin, Valdis.Kletnieks, Linus Torvalds,
	Christoph Hellwig, LKML, Ingo Molnar, Thomas Gleixner,
	Richard Kuo, Mark Salter, Jonas Bonn, Tobias Klauser

On Mon, Sep 5, 2011 at 10:21 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Monday 05 September 2011, H.J. Lu wrote:
>> I added MSG_CMSG_COMPAT64 and new compat system calls with
>> 64bit timespec/val to support it.  See the enclosed patch.
>
> Yes, looks good. Maybe there should be an #ifdef in there though,
> so the other compat architectures don't get the extra code.
>
>> BTW, I also added
>>
>> compat_sys_preadv64(unsigned long fd, const struct compat_iovec __user *vec,
>>          unsigned long vlen, loff_t pos)
>> compat_sys_pwritev64(unsigned long fd, const struct compat_iovec __user *vec,
>>          unsigned long vlen, loff_t pos)
>>
>> to support 32bit compat_iovec * and 64bit offset.
>
> Does that make much of a difference? I would guess that it's just as
> easy to do in libc by splitting the pos argument and calling the
> existing compat_sys_preadv. Alternatively, you could make glibc
> copy the iovec array to the 64 bit format and call the native syscall,
> because compat_rw_copy_check_uvector() otherwise just ends up doing that
> in kernel space. Or you just define the x32 libc iovec to
>
> struct iovec {
>        void *iov_base;
>        unsigned int __pad; /* gets cleared by libc */
>        __u64 iov_len;
> };
>

I need to clear __pad for every readv/writev/preadv/pwritev call
even if it has been cleared before.  Is compat_sys_xxx faster
than this?


-- 
H.J.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-05 19:34                                                                     ` H.J. Lu
@ 2011-09-05 19:54                                                                       ` H.J. Lu
  2011-09-05 19:59                                                                         ` H. Peter Anvin
  0 siblings, 1 reply; 94+ messages in thread
From: H.J. Lu @ 2011-09-05 19:54 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: H. Peter Anvin, Valdis.Kletnieks, Linus Torvalds,
	Christoph Hellwig, LKML, Ingo Molnar, Thomas Gleixner,
	Richard Kuo, Mark Salter, Jonas Bonn, Tobias Klauser

On Mon, Sep 5, 2011 at 12:34 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Sep 5, 2011 at 10:21 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>> On Monday 05 September 2011, H.J. Lu wrote:
>>> I added MSG_CMSG_COMPAT64 and new compat system calls with
>>> 64bit timespec/val to support it.  See the enclosed patch.
>>
>> Yes, looks good. Maybe there should be an #ifdef in there though,
>> so the other compat architectures don't get the extra code.
>>
>>> BTW, I also added
>>>
>>> compat_sys_preadv64(unsigned long fd, const struct compat_iovec __user *vec,
>>>          unsigned long vlen, loff_t pos)
>>> compat_sys_pwritev64(unsigned long fd, const struct compat_iovec __user *vec,
>>>          unsigned long vlen, loff_t pos)
>>>
>>> to support 32bit compat_iovec * and 64bit offset.
>>
>> Does that make much of a difference? I would guess that it's just as
>> easy to do in libc by splitting the pos argument and calling the
>> existing compat_sys_preadv. Alternatively, you could make glibc
>> copy the iovec array to the 64 bit format and call the native syscall,
>> because compat_rw_copy_check_uvector() otherwise just ends up doing that
>> in kernel space. Or you just define the x32 libc iovec to
>>
>> struct iovec {
>>        void *iov_base;
>>        unsigned int __pad; /* gets cleared by libc */
>>        __u64 iov_len;
>> };
>>
>
> I need to clear __pad for every readv/writev/preadv/pwritev call
> even if it has been cleared before.  Is compat_sys_xxx faster
> than this?
>
>

Since  readv/writev/preadv/pwritev have const struct iovec *iov, I
have to copy the whole array.  compat_sys seems more efficient.

-- 
H.J.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-05 19:54                                                                       ` H.J. Lu
@ 2011-09-05 19:59                                                                         ` H. Peter Anvin
  2011-09-05 20:27                                                                           ` Arnd Bergmann
  0 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2011-09-05 19:59 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Arnd Bergmann, Valdis.Kletnieks, Linus Torvalds,
	Christoph Hellwig, LKML, Ingo Molnar, Thomas Gleixner,
	Richard Kuo, Mark Salter, Jonas Bonn, Tobias Klauser

On 09/05/2011 12:54 PM, H.J. Lu wrote:
> 
> Since  readv/writev/preadv/pwritev have const struct iovec *iov, I
> have to copy the whole array.  compat_sys seems more efficient.
> 

compat_sys for these do exactly what we want, right?

	-hpa

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-05 19:59                                                                         ` H. Peter Anvin
@ 2011-09-05 20:27                                                                           ` Arnd Bergmann
  0 siblings, 0 replies; 94+ messages in thread
From: Arnd Bergmann @ 2011-09-05 20:27 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: H.J. Lu, Valdis.Kletnieks, Linus Torvalds, Christoph Hellwig,
	LKML, Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Jonas Bonn, Tobias Klauser

On Monday 05 September 2011 12:59:50 H. Peter Anvin wrote:
> On 09/05/2011 12:54 PM, H.J. Lu wrote:
> > 
> > Since  readv/writev/preadv/pwritev have const struct iovec *iov, I
> > have to copy the whole array.  compat_sys seems more efficient.
> > 
> 
> compat_sys for these do exactly what we want, right?

Quoting from compat_rw_copy_check_uvector():

        if (nr_segs > fast_segs) {
                ret = -ENOMEM;
                iov = kmalloc(nr_segs*sizeof(struct iovec), GFP_KERNEL);
                if (iov == NULL)
                        goto out;
        }
        *ret_pointer = iov;

        /*
         * Single unix specification:
         * We should -EINVAL if an element length is not >= 0 and fitting an
         * ssize_t.
         *
         * In Linux, the total length is limited to MAX_RW_COUNT, there is
         * no overflow possibility.
         */
        tot_len = 0;
        ret = -EINVAL;
        for (seg = 0; seg < nr_segs; seg++) {
                compat_uptr_t buf;
                compat_ssize_t len;

                if (__get_user(len, &uvector->iov_len) ||
                   __get_user(buf, &uvector->iov_base)) {
                        ret = -EFAULT;
                        goto out;
                }
                if (len < 0)    /* size_t not fitting in compat_ssize_t .. */
                        goto out;
                if (!access_ok(vrfy_dir(type), compat_ptr(buf), len)) {
                        ret = -EFAULT;
                        goto out;
                }
                if (len > MAX_RW_COUNT - tot_len)
                        len = MAX_RW_COUNT - tot_len;
                tot_len += len;
                iov->iov_base = compat_ptr(buf);
                iov->iov_len = (compat_size_t) len;
                uvector++;
                iov++;
        }


compared to native rw_copy_check_uvector():

        if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector))) {
                ret = -EFAULT;
                goto out;
        }

        /*
         * According to the Single Unix Specification we should return EINVAL
         * if an element length is < 0 when cast to ssize_t or if the
         * total length would overflow the ssize_t return value of the
         * system call.
         *
         * Linux caps all read/write calls to MAX_RW_COUNT, and avoids the
         * overflow case.
         */
        ret = 0;
        for (seg = 0; seg < nr_segs; seg++) {
                void __user *buf = iov[seg].iov_base;
                ssize_t len = (ssize_t)iov[seg].iov_len;

                /* see if we're about to use an invalid len or if
                 * it's about to overflow ssize_t */
                if (len < 0) {
                        ret = -EINVAL;
                        goto out;
                }
                if (unlikely(!access_ok(vrfy_dir(type), buf, len))) {
                        ret = -EFAULT;
                        goto out;
                }
                if (len > MAX_RW_COUNT - ret) {
                        len = MAX_RW_COUNT - ret;
                        iov[seg].iov_len = len;
                }
                ret += len;
        }

This is better than I thought for the compat version. The only overhead
is in reading the array in word chunks as opposed to a single memcpy for
the native case. This should be barely noticeable within the other stuff
done in the same function. So you are both right, the compat case is good.

I was assuming that this would do something worse, like an extra copy
of the data back to userspace.
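
The compat loop quoted above can be sketched in plain user-space C to show
where the per-element cost comes from. This is a minimal sketch, not the
kernel code: the struct names, widen_iovec() and SKETCH_MAX_RW_COUNT are
hypothetical stand-ins, the cap value assumes 4 KiB pages, and the
__get_user()/access_ok() steps are left out.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct compat_iovec_sketch {
    uint32_t iov_base;  /* 32-bit user pointer */
    int32_t  iov_len;   /* read as signed, so an oversized length shows up as < 0 */
};

struct native_iovec_sketch {
    void  *iov_base;
    size_t iov_len;
};

/* Assumed cap, modelled on INT_MAX rounded down to a 4 KiB page boundary. */
#define SKETCH_MAX_RW_COUNT ((long)(INT_MAX & ~4095L))

/*
 * Widen each 32-bit element into the native layout, one element at a time.
 * Returns the total (capped) length, or -EINVAL for a negative length.
 */
long widen_iovec(const struct compat_iovec_sketch *in,
                 struct native_iovec_sketch *out, size_t nr_segs)
{
    long tot_len = 0;

    for (size_t seg = 0; seg < nr_segs; seg++) {
        long len = in[seg].iov_len;

        if (len < 0)
            return -EINVAL;
        /* Clamp so the running total never exceeds the cap. */
        if (len > SKETCH_MAX_RW_COUNT - tot_len)
            len = SKETCH_MAX_RW_COUNT - tot_len;
        tot_len += len;
        out[seg].iov_base = (void *)(uintptr_t)in[seg].iov_base;
        out[seg].iov_len = (size_t)len;
    }
    return tot_len;
}
```

The only extra work relative to the native path is the element-by-element
widening; nothing is copied back to user space.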

	Arnd

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-27  0:43       ` Linus Torvalds
  2011-08-27  0:53         ` H. Peter Anvin
  2011-08-27  1:12         ` H. Peter Anvin
@ 2011-09-06 20:40         ` Florian Weimer
  2 siblings, 0 replies; 94+ messages in thread
From: Florian Weimer @ 2011-09-06 20:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, LKML, H.J. Lu, Ingo Molnar, Thomas Gleixner

* Linus Torvalds:

> On Fri, Aug 26, 2011 at 5:36 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> There is *ZERO* reason to not use it. Use the standard 64-bit
>> structure layout. Why the hell would it be a new system call?
>
> Oh, I see why you do that. It's because our 64-bit 'struct stat' uses
> "unsigned long" etc.
>
> Just fix that. Make it use __u64 instead of "unsigned long", and
> everything should "just work". The 64-bit kernel will not change any
> ABI, and when you compile your new ia32 model, it will do the right
> thing too.

And even if you don't want to do that (probably somebody wants to
recompile broken applications, same rationale as for n32, see
<http://gcc.gnu.org/ml/gcc/2011-02/msg00243.html>), you can still do
the translation in user space.  Then tools like strace and valgrind
just work, and you don't risk introducing kernel vulnerabilities
through broken translation code.
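
A minimal sketch of what such a user-space translation could look like,
assuming a C library wrapper that receives the wide layout from the kernel
and narrows it for a legacy application. The two structs and narrow_stat()
are hypothetical and reduced to two fields; reporting EOVERFLOW for values
that do not fit is an assumption of this sketch, not something the thread
specifies.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical layouts, reduced to two fields for illustration. */
struct stat_wide_sketch {       /* what the kernel would hand back */
    uint64_t st_ino;
    int64_t  st_size;
};

struct stat_narrow_sketch {     /* what a legacy application expects */
    uint32_t st_ino;
    int32_t  st_size;
};

/* Returns 0 on success, -EOVERFLOW if a value does not fit the narrow type. */
int narrow_stat(const struct stat_wide_sketch *in,
                struct stat_narrow_sketch *out)
{
    if (in->st_ino > UINT32_MAX)
        return -EOVERFLOW;
    if (in->st_size < INT32_MIN || in->st_size > INT32_MAX)
        return -EOVERFLOW;

    out->st_ino = (uint32_t)in->st_ino;
    out->st_size = (int32_t)in->st_size;
    return 0;
}
```

Because the system call itself still uses the 64-bit layout, a tracer sees
the same structure it would see for a native 64-bit process.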

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-05 15:11                                                                 ` H.J. Lu
  2011-09-05 17:21                                                                   ` Arnd Bergmann
@ 2011-09-09 21:02                                                                   ` H.J. Lu
  1 sibling, 0 replies; 94+ messages in thread
From: H.J. Lu @ 2011-09-09 21:02 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: H. Peter Anvin, Valdis.Kletnieks, Linus Torvalds,
	Christoph Hellwig, LKML, Ingo Molnar, Thomas Gleixner,
	Richard Kuo, Mark Salter, Jonas Bonn, Tobias Klauser

On Mon, Sep 5, 2011 at 8:11 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Sep 5, 2011 at 12:48 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>> On Sunday 04 September 2011 15:13:18 H.J. Lu wrote:
>>> On Sun, Sep 4, 2011 at 2:41 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>>> > On Sunday 04 September 2011 14:25:53 H.J. Lu wrote:
>>> >> >> >> #define __NR_x32_recvfrom
>>> >> >> >> #define __NR_x32_sendmsg
>>> >> >> >> #define __NR_x32_recvmsg
>>> >> >> >> #define __NR_x32_recvmmsg
>>> >> >> >> #define __NR_x32_sendmmsg
>>> >> >> >
>>> >> >> > These today use the MSG_CMSG_COMPAT flag to distinguish native and compat
>>> >> >> > calls. Do you plan to have another flag here to handle cmsg time values?
>>> >> >>
>>> >> >> I am using x86-32 calls for them.
>>> >> >
>>> >> > But isn't that broken? These all pass u64 or time_t values at some point.
>>> >> >
>>> >>
>>> >> time_t isn't a problem since time_t/timeval/timespec are identical for
>>> >> x32 and x86-64.  As for u64, I added NATIVE_LONG_TYPE, which is
>>> >> defined as long long for x32,  and use it instead of long in types for
>>> >> 64bit system calls.
>>> >
>>> > Sorry, I misread you as saying you use the compat syscalls for these.
>>> > If you use the native 64 bit syscalls, you have the opposite problem:
>>> > Some network protocols (e.g. netlink or rxrpc) use other data structures
>>> > that require conversion, e.g. 'long' members that x32 will get wrong.
>>>
>>> For those, I use x86-32 calls.
>>
>> So to ask again, what do you plan to do about SCM_TIMESTAMP*?
>>
>
> I added MSG_CMSG_COMPAT64 and new compat system calls with
> 64bit timespec/val to support it.  See the enclosed patch.
>

I decided to define

#define COMPAT_USE_64BIT_TIME \
  ((task_pt_regs(current)->orig_ax & __X32_SYSCALL_BIT) != 0)

instead of adding MSG_CMSG_COMPAT64.  It can be used to check
64bit time_t, timespec and timeval in any system calls.
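
To illustrate the idea outside the kernel, here is a minimal user-space
sketch of the same test. Everything in it is a stand-in: regs_sketch
replaces the saved register frame, SKETCH_X32_SYSCALL_BIT is assumed to be
0x40000000 (the message above does not give a value), and
user_timespec_size() is a hypothetical helper showing how a compat path
might pick a layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value for illustration; the message above does not state it. */
#define SKETCH_X32_SYSCALL_BIT 0x40000000UL

/* Hypothetical stand-in for the saved register frame of the current task. */
struct regs_sketch {
    unsigned long orig_ax;  /* system call number as passed by user space */
};

/* True if the call came in through an x32-flagged system call number. */
bool uses_64bit_time(const struct regs_sketch *regs)
{
    return (regs->orig_ax & SKETCH_X32_SYSCALL_BIT) != 0;
}

/*
 * Size of the timespec a compat path would expect from user space:
 * two 64-bit fields for x32, two 32-bit fields for i386 compat.
 */
size_t user_timespec_size(const struct regs_sketch *regs)
{
    return uses_64bit_time(regs) ? 2 * sizeof(int64_t) : 2 * sizeof(int32_t);
}
```

The check depends only on the system call number of the call in flight, so
it keeps the choice of ABI a local property of each call.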

-- 
H.J.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-08-30  7:00                 ` Geert Uytterhoeven
@ 2011-09-20 18:37                   ` Jan Engelhardt
  0 siblings, 0 replies; 94+ messages in thread
From: Jan Engelhardt @ 2011-09-20 18:37 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Linus Torvalds, H. Peter Anvin, LKML, H.J. Lu, Ingo Molnar,
	Thomas Gleixner


On Tuesday 2011-08-30 09:00, Geert Uytterhoeven wrote:
>
>"Furthermore, there are users who seem more willing to port code known 
>to not be 64-bit clean to x32 than to do a whole new port."

Not to forget, there are qualified developers who are more than willing 
to port your code (known to not be 64-bit clean) to arbitrary 64-bit 
architectures, and properly so.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFD: x32 ABI system call numbers
  2011-09-01 11:35                             ` Arnd Bergmann
@ 2011-10-01 19:38                               ` Jonas Bonn
  0 siblings, 0 replies; 94+ messages in thread
From: Jonas Bonn @ 2011-10-01 19:38 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: H. Peter Anvin, Linus Torvalds, Christoph Hellwig, LKML, H.J. Lu,
	Ingo Molnar, Thomas Gleixner, Richard Kuo, Mark Salter,
	Tobias Klauser


On Thu, 2011-09-01 at 13:35 +0200, Arnd Bergmann wrote:
> On Wednesday 31 August 2011, H. Peter Anvin wrote:
> > On 08/31/2011 10:19 AM, Linus Torvalds wrote:
> > > 
> > > I think tv_nsec was just overlooked, and people thought "it has no
> > > legacy users that were 'int', so we'll just leave it at 'long', which
> > > is guaranteed to be enough for nanoseconds that only needs a range of
> > > 32 bits".
> > > 
> > > In contrast, tv_usec probably does have legacy users that are "int".
> > > 
> > > So POSIX almost certainly only looked backwards, and never thought
> > > about users who would need to make it "long long" for compatibility
> > > reasons.
> > > 
> > > The fact that *every*other*related*field* in POSIX/SuS has a typedef
> > > exactly for these kinds of reasons just shows how stupid that "long
> > > tv_nsec" thing is.
> > > 
> > > I suspect that on Linux we can just say "tv_nsec" is suseconds_t too.
> > > Then we can make time_t and suseconds_t just match, and be "__s64" on
> > > all new platforms.
> > > 
> > 
> > Let me see if I can raise this with the POSIX committee.
> 
> Shall we go ahead with this patch for 3.1 in the meantime? This is the
> least invasive way I can see to let OpenRISC use 64 bit time_t in the
> released kernel.

Was there any consensus reached on this matter?  A couple of days left
to try to sneak this in, if this is what everyone agrees is the right
way forward...

The patch is fine as far as OpenRISC is concerned.

/Jonas



> 
> The worst thing that can happen is that we will have to change it again
> if this patch breaks something on OpenRISC, but if we don't do it now,
> then we have one more architecture stuck with 32 bit time_t or we will
> have to break its ABI.
> 
> I'm not completely convinced about the type we should use for tv_nsec
> and tv_usec. The main worry I have is that common implementations
> of timeval_add() or similar will require an expensive 64 bit division
> on 32 bit systems, which they would not need with a 32 bit suseconds_t.
> Should we use explicit padding instead in that case?
> 
> Interestingly, I noticed that parisc always uses a 32 bit suseconds_t,
> even for its 64 bit ABI (which is not used all that much), so it has
> implicit padding.
> 
> 8<----
> OpenRISC: change time_t and suseconds_t to 64 bit
> 
> time_t should really be 64 bit wide for all new ABIs including 32 bit
> architectures, to allow having timestamps beyond 2038. For now, we
> leave the default in asm-generic/posix-types to 32 bit wide, but the
> plan is to change that in the next merge window so we reduce the
> risk of breaking other architectures in the process.
> 
> In order to allow struct timespec/timeval to be free of padding, we
> need suseconds_t to be the same size, and change the second member
> of struct timespec to also be suseconds_t instead of long.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> 
> diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild
> index 11162e6..77bcc02 100644
> --- a/arch/openrisc/include/asm/Kbuild
> +++ b/arch/openrisc/include/asm/Kbuild
> @@ -38,7 +38,6 @@ generic-y += msgbuf.h
>  generic-y += pci.h
>  generic-y += percpu.h
>  generic-y += poll.h
> -generic-y += posix_types.h
>  generic-y += resource.h
>  generic-y += rmap.h
>  generic-y += scatterlist.h
> diff --git a/arch/openrisc/include/asm/posix_types.h b/arch/openrisc/include/asm/posix_types.h
> new file mode 100644
> index 0000000..f0b2944
> --- /dev/null
> +++ b/arch/openrisc/include/asm/posix_types.h
> @@ -0,0 +1,12 @@
> +#ifndef __OPENRISC_POSIX_TIME_T
> +#define __OPENRISC_POSIX_TIME_T
> +
> +typedef long long __kernel_suseconds_t;
> +#define __kernel_suseconds_t __kernel_suseconds_t
> +
> +typedef long long __kernel_time_t;
> +#define __kernel_time_t __kernel_time_t
> +
> +#include <asm-generic/posix_types.h>
> +
> +#endif
> diff --git a/include/asm-generic/posix_types.h b/include/asm-generic/posix_types.h
> index 3dab008..0c53135 100644
> --- a/include/asm-generic/posix_types.h
> +++ b/include/asm-generic/posix_types.h
> @@ -39,6 +39,10 @@ typedef unsigned int	__kernel_gid_t;
>  typedef long		__kernel_suseconds_t;
>  #endif
>  
> +#ifndef __kernel_time_t
> +typedef long		__kernel_time_t;
> +#endif
> +
>  #ifndef __kernel_daddr_t
>  typedef int		__kernel_daddr_t;
>  #endif
> @@ -78,7 +82,6 @@ typedef long		__kernel_ptrdiff_t;
>   */
>  typedef long		__kernel_off_t;
>  typedef long long	__kernel_loff_t;
> -typedef long		__kernel_time_t;
>  typedef long		__kernel_clock_t;
>  typedef int		__kernel_timer_t;
>  typedef int		__kernel_clockid_t;
> diff --git a/include/linux/time.h b/include/linux/time.h
> index b306178..207c0aa 100644
> --- a/include/linux/time.h
> +++ b/include/linux/time.h
> @@ -12,8 +12,8 @@
>  #ifndef _STRUCT_TIMESPEC
>  #define _STRUCT_TIMESPEC
>  struct timespec {
> -	__kernel_time_t	tv_sec;			/* seconds */
> -	long		tv_nsec;		/* nanoseconds */
> +	__kernel_time_t		tv_sec;		/* seconds */
> +	__kernel_suseconds_t	tv_nsec;	/* nanoseconds */
>  };
>  #endif
>  



^ permalink raw reply	[flat|nested] 94+ messages in thread

* 64-bit time on 32-bit systems
  2011-08-31 17:19                         ` Linus Torvalds
  2011-08-31 17:38                           ` H. Peter Anvin
@ 2012-02-08 21:36                           ` H. Peter Anvin
  1 sibling, 0 replies; 94+ messages in thread
From: H. Peter Anvin @ 2012-02-08 21:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arnd Bergmann, Christoph Hellwig, LKML, H.J. Lu, Ingo Molnar,
	Thomas Gleixner, Richard Kuo, Mark Salter, Jonas Bonn,
	Tobias Klauser, David S. Miller, H.J. Lu

Resuming a long-stuck discussion...

On 08/31/2011 10:19 AM, Linus Torvalds wrote:
> On Wed, Aug 31, 2011 at 10:09 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>
>>> I suspect only sane solution to this (having thought about it some
>>> more) is to just say "POSIX is f*^&ing wrong".
>>
>> Urk.  Someone had the bright idea of defining tv_nsec as "long" in the
>> standard, whereas tv_usec is suseconds_t.  F**** brilliant, and more
>> than a little bit stupid.
> 
> I think tv_nsec was just overlooked, and people thought "it has no
> legacy users that were 'int', so we'll just leave it at 'long', which
> is guaranteed to be enough for nanoseconds that only needs a range of
> 32 bits".
> 
> In contrast, tv_usec probably *does* have legacy users that are "int".
> 
> So POSIX almost certainly only looked backwards, and never thought
> about users who would need to make it "long long" for compatibility
> reasons.
> 
> The fact that *every*other*related*field* in POSIX/SuS has a typedef
> exactly for these kinds of reasons just shows how stupid that "long
> tv_nsec" thing is.
> 
> I suspect that on Linux we can just say "tv_nsec" is suseconds_t too.
> Then we can make time_t and suseconds_t just match, and be "__s64" on
> all new platforms.
> 

So I somewhat accidentally stumbled onto what appears to be the reason
for this while cleaning up posix_types.h last night.

The problem at hand seems to be that suseconds_t is 32 bits on SPARC64.
This appears to be the case in both Linux and Solaris, which is
probably why struct timespec has "long" instead of suseconds_t (Sun
has always been prominent on the POSIX committee.)

As such, I don't think we can redefine struct timespec to have
suseconds_t for the nanosecond field, even on Linux.  We could define
snseconds_t, or we would have to do something really ugly like define a
padding field when on a 32-bit platform (which the kernel would then
have to ignore when reading from userspace by truncating the 64-bit
value rather than signaling an error if the upper 32 bits are set.)

snseconds_t seems semi-reasonable to me... I guess we'd have to push
that at the POSIX people.  Fortunately it shouldn't break anything to
have it be a wider type than is otherwise necessary.
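
A minimal sketch of the padding variant, to make the truncation rule
concrete. The struct and function names are hypothetical; the 64-bit
nanosecond slot stands for a 32-bit tv_nsec plus its padding viewed as one
word, and the range check is an assumption of this sketch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical user-visible layout on a 32-bit ABI with 64-bit time. */
struct user_timespec_sketch {
    int64_t  tv_sec;
    uint64_t tv_nsec_slot;  /* 32-bit tv_nsec plus padding, read as one word */
};

/* Hypothetical kernel-internal form. */
struct kernel_timespec_sketch {
    int64_t tv_sec;
    long    tv_nsec;
};

/*
 * Read a timespec written by a 32-bit process. The upper 32 bits of the
 * nanosecond slot are padding that user space may never have initialised,
 * so they are dropped by truncation instead of being treated as an error.
 */
int read_user_timespec(const struct user_timespec_sketch *u,
                       struct kernel_timespec_sketch *k)
{
    uint32_t nsec = (uint32_t)u->tv_nsec_slot;  /* keep the low 32 bits only */

    if (nsec >= 1000000000u)
        return -EINVAL;

    k->tv_sec = u->tv_sec;
    k->tv_nsec = (long)nsec;
    return 0;
}
```

With a dedicated snseconds_t no such special case would be needed, since the
field width would be part of the type rather than of the padding.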

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 94+ messages in thread

end of thread, other threads:[~2012-02-08 21:37 UTC | newest]

Thread overview: 94+ messages (download: mbox.gz / follow: Atom feed)
2011-08-26 23:00 RFD: x32 ABI system call numbers H. Peter Anvin
2011-08-26 23:13 ` Linus Torvalds
2011-08-26 23:39   ` H. Peter Anvin
2011-08-27  0:36     ` Linus Torvalds
2011-08-27  0:43       ` Linus Torvalds
2011-08-27  0:53         ` H. Peter Anvin
2011-08-27  1:18           ` Linus Torvalds
2011-08-27  1:35             ` H. Peter Anvin
2011-08-27  1:45               ` Linus Torvalds
2011-08-27  1:12         ` H. Peter Anvin
2011-08-27  1:42           ` Linus Torvalds
2011-08-29 19:01             ` Geert Uytterhoeven
2011-08-29 19:03               ` H. Peter Anvin
2011-08-30  1:17               ` Ted Ts'o
2011-08-30  1:48               ` Linus Torvalds
2011-08-30  2:16                 ` Kyle Moffett
2011-08-30  4:45                   ` H. Peter Anvin
2011-08-30  7:06                     ` Geert Uytterhoeven
2011-08-30 12:18                       ` Arnd Bergmann
2011-08-30  7:09                   ` Andi Kleen
2011-08-30  9:56                     ` Alan Cox
2011-08-30  7:00                 ` Geert Uytterhoeven
2011-09-20 18:37                   ` Jan Engelhardt
2011-09-06 20:40         ` Florian Weimer
2011-08-27  0:57       ` H. Peter Anvin
2011-08-27  4:40         ` Christoph Hellwig
2011-08-29 15:04           ` Arnd Bergmann
2011-08-29 18:31             ` H. Peter Anvin
2011-08-30 12:09               ` Arnd Bergmann
2011-08-30 16:35                 ` H. Peter Anvin
2011-08-31 16:14                   ` Arnd Bergmann
2011-08-31 16:25                     ` H. Peter Anvin
2011-08-31 16:39                       ` Arnd Bergmann
2011-08-31 16:48                         ` Linus Torvalds
2011-08-31 19:18                           ` Arnd Bergmann
2011-08-31 19:44                             ` H. Peter Anvin
2011-08-31 19:54                               ` Alan Cox
2011-08-31 20:02                                 ` H. Peter Anvin
2011-08-31 20:55                                   ` Arnd Bergmann
2011-08-31 20:58                                     ` H. Peter Anvin
2011-08-31 19:49                             ` Geert Uytterhoeven
2011-08-31 16:46                     ` Linus Torvalds
2011-08-31 17:05                       ` H.J. Lu
2011-09-03  2:56                         ` H.J. Lu
2011-09-03  3:04                           ` Linus Torvalds
2011-09-03  4:02                             ` H.J. Lu
2011-09-03  4:29                               ` H. Peter Anvin
2011-09-03  4:44                                 ` H.J. Lu
2011-09-03  5:16                                   ` H. Peter Anvin
2011-09-03 14:11                                     ` H.J. Lu
2011-09-03  5:29                                   ` H. Peter Anvin
2011-09-03  8:41                                     ` Arnd Bergmann
2011-09-03 14:04                                       ` Valdis.Kletnieks
2011-09-03 16:40                                         ` H. Peter Anvin
2011-09-03 17:16                                           ` Valdis.Kletnieks
2011-09-03 17:22                                             ` H.J. Lu
2011-09-03 17:28                                               ` H. Peter Anvin
2011-09-03 17:27                                             ` H. Peter Anvin
2011-09-04 13:51                                               ` Valdis.Kletnieks
2011-09-04 15:17                                               ` Arnd Bergmann
2011-09-04 17:08                                                 ` Linus Torvalds
2011-09-04 18:40                                                 ` H.J. Lu
2011-09-04 19:06                                                   ` Arnd Bergmann
2011-09-04 19:31                                                     ` H.J. Lu
2011-09-04 21:13                                                       ` Arnd Bergmann
2011-09-04 21:25                                                         ` H.J. Lu
2011-09-04 21:41                                                           ` Arnd Bergmann
2011-09-04 22:13                                                             ` H.J. Lu
2011-09-05  7:48                                                               ` Arnd Bergmann
2011-09-05 15:11                                                                 ` H.J. Lu
2011-09-05 17:21                                                                   ` Arnd Bergmann
2011-09-05 19:34                                                                     ` H.J. Lu
2011-09-05 19:54                                                                       ` H.J. Lu
2011-09-05 19:59                                                                         ` H. Peter Anvin
2011-09-05 20:27                                                                           ` Arnd Bergmann
2011-09-09 21:02                                                                   ` H.J. Lu
2011-09-04 20:11                                                     ` H. Peter Anvin
2011-09-04 19:31                                                   ` richard -rw- weinberger
2011-09-04 19:32                                                     ` H.J. Lu
2011-09-03 14:15                                     ` H.J. Lu
2011-08-31 17:09                       ` H. Peter Anvin
2011-08-31 17:19                         ` Linus Torvalds
2011-08-31 17:38                           ` H. Peter Anvin
2011-09-01 11:35                             ` Arnd Bergmann
2011-10-01 19:38                               ` Jonas Bonn
2012-02-08 21:36                           ` 64-bit time on 32-bit systems H. Peter Anvin
2011-09-01 13:30                         ` RFD: x32 ABI system call numbers Avi Kivity
2011-09-01 14:13                           ` H. Peter Anvin
2011-09-02  0:49                             ` Pedro Alves
2011-09-02  1:51                               ` H. Peter Anvin
2011-09-02  8:02                                 ` Arnd Bergmann
2011-09-02  8:42                                 ` Pedro Alves
2011-09-01  6:08                     ` Jonas Bonn
2011-09-02  6:17     ` Andy Lutomirski
