* Cleaning up numbering for new x86 syscalls?
@ 2018-11-20  0:22 Andy Lutomirski
  2018-11-20  7:33 ` Ingo Molnar
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Andy Lutomirski @ 2018-11-20  0:22 UTC (permalink / raw)
  To: X86 ML, LKML, Borislav Petkov, Peter Zijlstra, Tycho Andersen,
	Daniel Colascione, Florian Weimer, Carlos O'Donell,
	Rich Felker

Hi all-

We currently have some giant turds in the way that syscalls are
numbered.  We have the x86_32 table, which is totally sane other than
some legacy multiplexers.  Then we have the x86_64 table, which is,
um, demented:

 - The numbers don't match x86_32.  I have no idea why.

 - We use bit 30, which triggers in_x32_syscall().  It should have
been bit 31, but I digress.

 - We have this weird set of extra x32 syscalls that start at 512.
Who wants to bet whether we have no bugs if someone does syscall with,
say, nr == 512 (i.e. not 512 | BIT(30)) or nr == (16 | BIT(30))?  The
latter would be non-compat ioctl with in_x32_syscall() set and hence
in_compat_syscall() set.

 - Bloody restart_syscall() has a different number on x86_64 and
x86_32, which is a big mess.
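
To make the two suspicious cases above concrete, here is a minimal
user-space sketch of how a raw syscall number splits into the x32 flag
(bit 30) and a table index.  The names X32_SYSCALL_BIT, X32_ONLY_BASE,
struct decoded_nr and decode_nr() are illustrative assumptions, not
kernel identifiers, and treating every index from 512 upward as
x32-specific is a simplification based only on "start at 512" above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants; the values come from the description above. */
#define X32_SYSCALL_BIT (UINT32_C(1) << 30) /* bit 30 marks an x32 call */
#define X32_ONLY_BASE   UINT32_C(512)       /* first x32-specific slot */

struct decoded_nr {
	bool x32_flag;      /* bit 30 was set in the raw number */
	uint32_t index;     /* raw number with bit 30 cleared */
	bool x32_only_slot; /* index is in the x32-specific range */
};

static struct decoded_nr decode_nr(uint32_t raw)
{
	struct decoded_nr d;

	d.x32_flag = (raw & X32_SYSCALL_BIT) != 0;
	d.index = raw & ~X32_SYSCALL_BIT;
	d.x32_only_slot = d.index >= X32_ONLY_BASE;
	return d;
}

/* The two suspicious cases from the text. */
void check_cases_from_text(void)
{
	struct decoded_nr a = decode_nr(512);
	struct decoded_nr b = decode_nr(16 | X32_SYSCALL_BIT);

	/* x32-specific slot reached without the x32 bit */
	assert(a.x32_only_slot && !a.x32_flag);
	/* native slot 16 (ioctl in the text) reached with the x32 bit */
	assert(b.index == 16 && b.x32_flag && !b.x32_only_slot);
}
```

Whether a flagged call to a low index is actually a problem depends on
whether that syscall has a separate x32 variant, which this sketch does
not model.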

I propose we consider some subset of the following:

1. Introduce restart_syscall_2().  Make its number be 1024.  Maybe
someday we could start using it instead of restart_syscall().  The
only issue I can see is programs that allow restart_syscall() using
seccomp but don't allow the new variant.

2. Introduce an outright ban on new syscalls with nr < 1024.

3. Introduce an outright ban on the addition of new __x32_compat
syscalls.  If new compat hacks are needed, they can use
in_compat_syscall(), thank you very much.

4. Modify the wrappers of the __x32_compat entries so that they will
return -ENOSYS if in_x32_syscall() returns false.

5. Adjust the scripts so that we only have to wire up new syscalls
once.  They'll have a nr above 1024, and they'll have the same nr on
all x86 variants.

Thoughts?
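
For item 4, the intended behavior can be sketched in plain C as below.
This is only an illustration under assumptions: the boolean parameter
stands in for the result of in_x32_syscall(), and syscall_fn,
demo_x32_handler() and x32_only_wrapper() are hypothetical names rather
than the kernel's actual wrapper macros.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef long (*syscall_fn)(long arg);

/* Hypothetical stand-in for an x32-specific syscall body. */
static long demo_x32_handler(long arg)
{
	return arg + 1;
}

/*
 * Run the x32-specific body only when the caller really is an x32
 * syscall; otherwise behave as if the syscall did not exist.
 * `is_x32` stands in for the result of in_x32_syscall().
 */
static long x32_only_wrapper(bool is_x32, syscall_fn body, long arg)
{
	assert(body != NULL);
	if (!is_x32)
		return -ENOSYS;
	return body(arg);
}
```

With this shape, a call such as nr == 512 without bit 30 would fail
with -ENOSYS instead of reaching the x32-specific body.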

* Re: Cleaning up numbering for new x86 syscalls?
  2018-11-20  0:22 Cleaning up numbering for new x86 syscalls? Andy Lutomirski
@ 2018-11-20  7:33 ` Ingo Molnar
  2018-11-20 23:04   ` Bernd Petrovitsch
  2018-11-20  9:03 ` Florian Weimer
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2018-11-20  7:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, LKML, Borislav Petkov, Peter Zijlstra, Tycho Andersen,
	Daniel Colascione, Florian Weimer, Carlos O'Donell,
	Rich Felker


* Andy Lutomirski <luto@kernel.org> wrote:

> Hi all-
> 
> We currently have some giant turds in the way that syscalls are
> numbered.  We have the x86_32 table, which is totally sane other than
> some legacy multiplexers.  Then we have the x86_64 table, which is,
> um, demented:
> 
>  - The numbers don't match x86_32.  I have no idea why.
> 
>  - We use bit 30, which triggers in_x32_syscall().  It should have
> been bit 31, but I digress.
> 
>  - We have this weird set of extra x32 syscalls that start at 512.
> Who wants to bet whether we have no bugs if someone does syscall with,
> say, nr == 512 (i.e. not 512 | BIT(30)) or nr == (16 | BIT(30))?  The
> latter would be non-compat ioctl with in_x32_syscall() set and hence
> in_compat_syscall() set.
> 
>  - Bloody restart_syscall() has a different number on x86_64 and
> x86_32, which is a big mess.
> 
> I propose we consider some subset of the following:
> 
> 1. Introduce restart_syscall_2().  Make its number be 1024.  Maybe
> someday we could start using it instead of restart_syscall().  The
> only issue I can see is programs that allow restart_syscall() using
> seccomp but don't allow the new variant.
> 
> 2. Introduce an outright ban on new syscalls with nr < 1024.

Also let's make sure it results in a build error or boot panic if someone 
tries.
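
A build-time check of this kind could look roughly like the following
sketch.  It uses plain C11 static_assert rather than any kernel helper,
and NEW_SYSCALL_MIN_NR and DECLARE_NEW_SYSCALL_NR() are hypothetical
names invented for the illustration; the real syscall tables are
generated by scripts, so an actual check would more likely live there.

```c
#include <assert.h> /* static_assert (C11) */

#define NEW_SYSCALL_MIN_NR 1024

/*
 * Hypothetical helper: declare a new syscall number and refuse to
 * compile if it falls below the agreed floor.
 */
#define DECLARE_NEW_SYSCALL_NR(name, nr)                           \
	static_assert((nr) >= NEW_SYSCALL_MIN_NR,                  \
		      "new syscall numbers must be >= 1024");      \
	enum { name = (nr) }

DECLARE_NEW_SYSCALL_NR(NR_restart_syscall_2, 1024);

/* Uncommenting the next line would break the build:
 * DECLARE_NEW_SYSCALL_NR(NR_too_low, 600);
 */
```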

> 3. Introduce an outright ban on the addition of new __x32_compat
> syscalls.  If new compat hacks are needed, they can use
> in_compat_syscall(), thank you very much.

Here too build-time and runtime enforcement would be nice.

> 4. Modify the wrappers of the __x32_compat entries so that they will
> return -ENOSYS if in_x32_syscall() returns false.
> 
> 5. Adjust the scripts so that we only have to wire up new syscalls
> once.  They'll have a nr above 1024, and they'll have the same nr on
> all x86 variants.
> 
> Thoughts?

Fully agreed:

6. Is x32 even used in practice? I still think it was a mistake to add it 
   and some significant distributions like Fedora are not enabling it.

Barring any sane way to phase out x32 support I'd suggest we implement 
all your suggestions.

Thanks,

	Ingo

* Re: Cleaning up numbering for new x86 syscalls?
  2018-11-20  0:22 Cleaning up numbering for new x86 syscalls? Andy Lutomirski
  2018-11-20  7:33 ` Ingo Molnar
@ 2018-11-20  9:03 ` Florian Weimer
  2018-11-20 15:23   ` Andy Lutomirski
  2018-11-20 16:48 ` Tycho Andersen
  2018-11-21 17:14 ` Arnd Bergmann
  3 siblings, 1 reply; 10+ messages in thread
From: Florian Weimer @ 2018-11-20  9:03 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, LKML, Borislav Petkov, Peter Zijlstra, Tycho Andersen,
	Daniel Colascione, Carlos O'Donell, Rich Felker

* Andy Lutomirski:

> 5. Adjust the scripts so that we only have to wire up new syscalls
> once.  They'll have a nr above 1024, and they'll have the same nr on
> all x86 variants.

Is there a sufficiently sized gap on all other architectures as well?
The restriction to the x86 variants seems arbitrary to me.

Thanks,
Florian

* Re: Cleaning up numbering for new x86 syscalls?
  2018-11-20  9:03 ` Florian Weimer
@ 2018-11-20 15:23   ` Andy Lutomirski
  2018-11-20 18:07     ` Josh Poimboeuf
  2018-11-21 17:23     ` Arnd Bergmann
  0 siblings, 2 replies; 10+ messages in thread
From: Andy Lutomirski @ 2018-11-20 15:23 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Andrew Lutomirski, X86 ML, LKML, Borislav Petkov, Peter Zijlstra,
	Tycho Andersen, Daniel Colascione, Carlos O'Donell,
	Rich Felker

On Tue, Nov 20, 2018 at 1:03 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Andy Lutomirski:
>
> > 5. Adjust the scripts so that we only have to wire up new syscalls
> > once.  They'll have a nr above 1024, and they'll have the same nr on
> > all x86 variants.
>
> Is there a sufficiently sized gap on all other architectures as well?
> The restriction to the x86 variants seems arbitrary to me.
>

Fair point.  We have this shiny "generic" syscall list.  Maybe we can
get x86 synced up with it for new syscalls.

* Re: Cleaning up numbering for new x86 syscalls?
  2018-11-20  0:22 Cleaning up numbering for new x86 syscalls? Andy Lutomirski
  2018-11-20  7:33 ` Ingo Molnar
  2018-11-20  9:03 ` Florian Weimer
@ 2018-11-20 16:48 ` Tycho Andersen
  2018-11-21 17:14 ` Arnd Bergmann
  3 siblings, 0 replies; 10+ messages in thread
From: Tycho Andersen @ 2018-11-20 16:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, LKML, Borislav Petkov, Peter Zijlstra, Daniel Colascione,
	Florian Weimer, Carlos O'Donell, Rich Felker

On Mon, Nov 19, 2018 at 04:22:49PM -0800, Andy Lutomirski wrote:
> Hi all-
> 
> We currently have some giant turds in the way that syscalls are
> numbered.  We have the x86_32 table, which is totally sane other than
> some legacy multiplexers.  Then we have the x86_64 table, which is,
> um, demented:
> 
>  - The numbers don't match x86_32.  I have no idea why.
> 
>  - We use bit 30, which triggers in_x32_syscall().  It should have
> been bit 31, but I digress.
> 
>  - We have this weird set of extra x32 syscalls that start at 512.
> Who wants to bet whether we have no bugs if someone does syscall with,
> say, nr == 512 (i.e. not 512 | BIT(30)) or nr == (16 | BIT(30))?  The
> latter would be non-compat ioctl with in_x32_syscall() set and hence
> in_compat_syscall() set.
> 
>  - Bloody restart_syscall() has a different number on x86_64 and
> x86_32, which is a big mess.
> 
> I propose we consider some subset of the following:
> 
> 1. Introduce restart_syscall_2().  Make its number be 1024.  Maybe
> someday we could start using it instead of restart_syscall().  The
> only issue I can see is programs that allow restart_syscall() using
> seccomp but don't allow the new variant.
>
> 2. Introduce an outright ban on new syscalls with nr < 1024.
> 
> 3. Introduce an outright ban on the addition of new __x32_compat
> syscalls.  If new compat hacks are needed, they can use
> in_compat_syscall(), thank you very much.
> 
> 4. Modify the wrappers of the __x32_compat entries so that they will
> return -ENOSYS if in_x32_syscall() returns false.

This sounds like a great idea independent of all of this.

> 5. Adjust the scripts so that we only have to wire up new syscalls
> once.  They'll have a nr above 1024, and they'll have the same nr on
> all x86 variants.
> 
> Thoughts?

+1. Who wants to do it? :D

Tycho

* Re: Cleaning up numbering for new x86 syscalls?
  2018-11-20 15:23   ` Andy Lutomirski
@ 2018-11-20 18:07     ` Josh Poimboeuf
  2018-11-21 17:23     ` Arnd Bergmann
  1 sibling, 0 replies; 10+ messages in thread
From: Josh Poimboeuf @ 2018-11-20 18:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Florian Weimer, X86 ML, LKML, Borislav Petkov, Peter Zijlstra,
	Tycho Andersen, Daniel Colascione, Carlos O'Donell,
	Rich Felker, Adhemerval Zanella

On Tue, Nov 20, 2018 at 07:23:09AM -0800, Andy Lutomirski wrote:
> On Tue, Nov 20, 2018 at 1:03 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * Andy Lutomirski:
> >
> > > 5. Adjust the scripts so that we only have to wire up new syscalls
> > > once.  They'll have a nr above 1024, and they'll have the same nr on
> > > all x86 variants.
> >
> > Is there a sufficiently sized gap on all other architectures as well?
> > The restriction to the x86 variants seems arbitrary to me.
> >
> 
> Fair point.  We have this shiny "generic" syscall list.  Maybe we can
> get x86 synced up with it for new syscalls.

I heard this discussed at Plumbers.  There was a proposal to use the
same syscall numbers across architectures.  Also, when adding new
generic syscalls, they want all arches to be wired up at the same time.

  https://linuxplumbersconf.org/event/2/contributions/149/attachments/129/161/Ideas_to_improve_glibc_and_Kernel_interaction.pdf

Adding Adhemerval to CC.

-- 
Josh

* Re: Cleaning up numbering for new x86 syscalls?
  2018-11-20  7:33 ` Ingo Molnar
@ 2018-11-20 23:04   ` Bernd Petrovitsch
  2018-11-30 23:25     ` Maciej W. Rozycki
  0 siblings, 1 reply; 10+ messages in thread
From: Bernd Petrovitsch @ 2018-11-20 23:04 UTC (permalink / raw)
  To: Ingo Molnar, Andy Lutomirski
  Cc: X86 ML, LKML, Borislav Petkov, Peter Zijlstra, Tycho Andersen,
	Daniel Colascione, Florian Weimer, Carlos O'Donell,
	Rich Felker

On 20/11/2018 08:33, Ingo Molnar wrote:
[...]
> 6. Is x32 even used in practice? I still think it was a mistake to add it 
>    and some significant distributions like Fedora are not enabling it.

x32 works as far as gcc/gas/ld is concerned (at least for compiling
non-trivial programs).
Finding a distribution that actually *delivers* x32 libraries is another
thing (and said non-trivial software uses ATM e.g. libxml2) - at least I
can't find an "x32-Ubuntu".
And no, I don't see a compelling reason to (try to) build the (n+1)th
architecture for the major distributions.
And yes, lots of stuff will not compile out of the box (especially if
one uses a somewhat sane set of gcc options - not only -Wall -Wextra
-Werror) but if one gets software to compile for i386 and x86_64,
getting it to compile for x32 is a Friday afternoon job (more or less).
And yes, there is enough hardware/systems out there that uses 64bit CPUs
(for whatever reason - if only that one can't get a 32bit CPU for that
board) but will never ever need more than 2-3 GB RAM .....

MfG,
	Bernd
-- 
Bernd Petrovitsch                  Email : bernd@petrovitsch.priv.at
                     LUGA : http://www.luga.at

* Re: Cleaning up numbering for new x86 syscalls?
  2018-11-20  0:22 Cleaning up numbering for new x86 syscalls? Andy Lutomirski
                   ` (2 preceding siblings ...)
  2018-11-20 16:48 ` Tycho Andersen
@ 2018-11-21 17:14 ` Arnd Bergmann
  3 siblings, 0 replies; 10+ messages in thread
From: Arnd Bergmann @ 2018-11-21 17:14 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List,
	Borislav Petkov, Peter Zijlstra, Tycho Andersen,
	Daniel Colascione, Florian Weimer, carlos, Rich Felker

On Tue, Nov 20, 2018 at 1:25 AM Andy Lutomirski <luto@kernel.org> wrote:
>
> Hi all-
>
> We currently have some giant turds in the way that syscalls are
> numbered.  We have the x86_32 table, which is totally sane other than
> some legacy multiplexers.  Then we have the x86_64 table, which is,
> um, demented:
>
>  - The numbers don't match x86_32.  I have no idea why.

I think it was an early attempt at cleaning up the table, and only
adding those that were still used. Back in the days, each architecture
had its own table, and of course they started out as separate
top-level architectures.

>  - We use bit 30, which triggers in_x32_syscall().  It should have
> been bit 31, but I digress.
>
>  - We have this weird set of extra x32 syscalls that start at 512.
> Who wants to bet whether we have no bugs if someone does syscall with,
> say, nr == 512 (i.e. not 512 | BIT(30)) or nr == (16 | BIT(30))?  The
> latter would be non-compat ioctl with in_x32_syscall() set and hence
> in_compat_syscall() set.

The comment in the table says it's purely for keeping the calls
in separate cache lines. I don't know if the cache lines make
a difference in the end, but it seems that once we run into the x32
syscall numbers we just treat them like any others; we simply choose
never to call them from a 64-bit glibc.

> I propose we consider some subset of the following:
>
> 1. Introduce restart_syscall_2().  Make its number be 1024.  Maybe
> someday we could start using it instead of restart_syscall().  The
> only issue I can see is programs that allow restart_syscall() using
> seccomp but don't allow the new variant.
>
> 2. Introduce an outright ban on new syscalls with nr < 1024.

This would leave a hole of several hundred numbers if we do it
for all architectures. Wasting multiple kilobytes for a cosmetic
cleanup might be considered excessive.
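
The size of that hole is simple arithmetic, sketched below under the
assumption that each table is a flat array with one pointer-sized slot
per syscall number.  The function name table_hole_bytes() and the
example inputs are hypothetical; the real "next free" number differs
per architecture.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes of unused table slots if numbering jumps from `next_free_nr`
 * straight to `first_new_nr`, with one `entry_size`-byte slot per number.
 */
static size_t table_hole_bytes(unsigned next_free_nr, unsigned first_new_nr,
			       size_t entry_size)
{
	assert(entry_size > 0);
	if (first_new_nr <= next_free_nr)
		return 0;
	return (size_t)(first_new_nr - next_free_nr) * entry_size;
}
```

For example, with a made-up next free number of 400 and 8-byte
pointers, table_hole_bytes(400, 1024, 8) is 624 * 8 = 4992 bytes per
table.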

> 3. Introduce an outright ban on the addition of new __x32_compat
> syscalls.  If new compat hacks are needed, they can use
> in_compat_syscall(), thank you very much.

I would definitely want to keep anything regarding x32 out of the
common syscall implementation. If you want to add on to that
pile, please do it in arch/x86, not in kernel/ or fs/.

If we decide that x32 is a failed experiment and we don't keep
it working in the future, let's just kill it off right away. I'm fairly
sure nobody depends on it for anything real, the only users I
could find are either for showing off benchmark results or for
playing around with it for fun. Most of that fun part has apparently
ended many years ago, but there is still some work going into
debian/x32. We probably need to coordinate with them and see
if they know of actual users before removing it. Popcon lists
5 active users [1] and a sharp downward trend.

> 4. Modify the wrappers of the __x32_compat entries so that they will
> return -ENOSYS if in_x32_syscall() returns false.

No objection here, but what would that help?

> 5. Adjust the scripts so that we only have to wire up new syscalls
> once.  They'll have a nr above 1024, and they'll have the same nr on
> all x86 variants.
>
> Thoughts?

I would definitely welcome assigning the same syscall numbers across
all architectures. It is a needless burden for the libc developers to
figure out for each syscall which kernel is known to support it.
When a call gets added, they typically add logic to check for the
system call at runtime, but for older syscalls, it helps to know when
all architectures support it once the minimum kernel version for
a libc has been raised beyond that.

Please see also the work that Firoz Khan has been posting
for generalizing the tables on all architectures to use the
format we have on x86, arm and s390. I hope we can merge it
all for 4.21, and then build on top of that for generalization and
cleanups.

      Arnd

[1] https://popcon.debian.org/stat/sub-x32.png

* Re: Cleaning up numbering for new x86 syscalls?
  2018-11-20 15:23   ` Andy Lutomirski
  2018-11-20 18:07     ` Josh Poimboeuf
@ 2018-11-21 17:23     ` Arnd Bergmann
  1 sibling, 0 replies; 10+ messages in thread
From: Arnd Bergmann @ 2018-11-21 17:23 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Florian Weimer, the arch/x86 maintainers,
	Linux Kernel Mailing List, Borislav Petkov, Peter Zijlstra,
	Tycho Andersen, Daniel Colascione, carlos, Rich Felker

On Tue, Nov 20, 2018 at 4:35 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Tue, Nov 20, 2018 at 1:03 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * Andy Lutomirski:
> >
> > > 5. Adjust the scripts so that we only have to wire up new syscalls
> > > once.  They'll have a nr above 1024, and they'll have the same nr on
> > > all x86 variants.
> >
> > Is there a sufficiently sized gap on all other architectures as well?
> > The restriction to the x86 variants seems arbitrary to me.
> >
>
> Fair point.  We have this shiny "generic" syscall list.  Maybe we can
> get x86 synced up with it for new syscalls.

The generic table is already a subset of the x86 tables, so there
should be no need to sync up the contents.

It's more critical on other architectures that currently lack a number
of the syscalls that got added in asm-generic and x86 recently,
so I'd like to synchronize these all and add the missing calls
to ensure that each architecture has at least all the calls from
asm-generic table.

After that,  I would hope to come up with a way to add future numbers
to all tables together, either using the same numbers everywhere (plus
an offset where necessary, e.g. mips), or even have an include file
logic so we only need a single file for future additions.
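
The "same numbers plus an offset" idea amounts to something like the
sketch below.  All names here (struct abi_numbering, arch_syscall_nr(),
plain_abi, offset_abi) are hypothetical, and the base of 4000 is only an
example value for an ABI whose numbers do not start at zero.

```c
#include <assert.h>
#include <stddef.h>

struct abi_numbering {
	const char *name;
	unsigned base; /* offset added to every shared number */
};

/* Map a number from the shared list to the number an ABI exposes. */
static unsigned arch_syscall_nr(const struct abi_numbering *abi,
				unsigned shared_nr)
{
	assert(abi != NULL);
	return abi->base + shared_nr;
}

/* Example ABIs: one that uses shared numbers as-is, one with an offset. */
static const struct abi_numbering plain_abi  = { "no-offset", 0 };
static const struct abi_numbering offset_abi = { "offset-4000", 4000 };
```

A single shared list then only needs the per-ABI base to produce every
table entry for a future syscall.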

Note: for y2038, we will have to add around 20 to 25 syscalls to each
32-bit architecture, plus another 10 for those that lack the separate
sys_ipc calls.


      Arnd

* Re: Cleaning up numbering for new x86 syscalls?
  2018-11-20 23:04   ` Bernd Petrovitsch
@ 2018-11-30 23:25     ` Maciej W. Rozycki
  0 siblings, 0 replies; 10+ messages in thread
From: Maciej W. Rozycki @ 2018-11-30 23:25 UTC (permalink / raw)
  To: Bernd Petrovitsch
  Cc: Ingo Molnar, Andy Lutomirski, X86 ML, LKML, Borislav Petkov,
	Peter Zijlstra, Tycho Andersen, Daniel Colascione,
	Florian Weimer, Carlos O'Donell, Rich Felker

On Wed, 21 Nov 2018, Bernd Petrovitsch wrote:

> And yes, lots of stuff will not compile out of the box (especially if
> one uses a somewhat sane set of gcc options - not only -Wall -Wextra
> -Werror) but if one gets software to compile for i386 and x86_64,
> getting it to compile for x32 is a Friday afternoon job (more or less).
> And yes, there is enough hardware/systems out there that uses 64bit CPUs
> (for whatever reason - if only that one can't get a 32bit CPU for that
> board) but will never ever need more than 2-3 GB RAM .....

 The functionally equivalent 64-bit ILP32 MIPS n32 ABI has been around 
supported by Linux and the GNU toolchain for some 17 years now and people 
have been using it, so by now any sane piece of software that does not use 
handcoded assembly should work out of the box for the x86-64 x32 ABI as 
well.

 NB the important advantage of an LP64 ABI over an ILP32 ABI is the 
ability to mmap(2) files that exceed 4GiB in size (and in reality even 
smaller ones, as some user VM space is surely needed for other stuff), 
regardless of how much physical RAM is actually supported or has been 
installed.
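
 The limit can be written down as a small check.  This is a rough sketch
only: can_map_whole_file() is a hypothetical helper, the address space
is modeled as a full 2^pointer_bits bytes (real user space is smaller,
also on 64-bit systems), and `reserved` is a caller-supplied guess at
what the rest of the process needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Can a file of `file_size` bytes be mapped in one piece by a process
 * whose pointers are `pointer_bits` wide, if `reserved` bytes of the
 * address space are already needed for code, stack, heap and so on?
 */
static bool can_map_whole_file(uint64_t file_size, unsigned pointer_bits,
			       uint64_t reserved)
{
	uint64_t space;

	assert(pointer_bits >= 1 && pointer_bits <= 64);
	/* 1 << 64 would overflow, so saturate for 64-bit pointers. */
	space = pointer_bits == 64 ? UINT64_MAX
				   : (UINT64_C(1) << pointer_bits);
	if (reserved >= space)
		return false;
	return file_size <= space - reserved;
}
```

 Note that physical RAM does not appear anywhere in the check; only the
pointer width and the space already in use matter.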

 And these days even a web browser can easily overrun a 4GiB VM space. :(

 FWIW,

  Maciej
