linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* in_compat_syscall() on x86
@ 2021-01-04 12:16 David Laight
  2021-01-04 16:46 ` David Laight
  2021-01-04 16:58 ` Al Viro
  0 siblings, 2 replies; 14+ messages in thread
From: David Laight @ 2021-01-04 12:16 UTC (permalink / raw)
  To: Al Viro, Christoph Hellwig, linux-kernel

On x86 in_compat_syscall() is defined as:
    in_ia32_syscall() || in_x32_syscall()

Now in_ia32_syscall() is a simple check of the TS_COMPAT flag.
However in_x32_syscall() is a horrid beast that has to indirect
through to the original %eax value (ie the syscall number) and
check for a bit there.

So on a kernel with x32 support (probably most distro kernels)
the in_compat_syscall() check is rather more expensive than
one might expect.

It would be muck better if both checks could be done together.
I think this would require the syscall entry code to set a
value in both the 64bit and x32 entry paths.
(Can a process make both 64bit and x32 system calls?)

To do this sensible (probably) requires a byte be allocated
to hold the syscall type - rather than using flag bits
in the 'status' field.

Apart from the syscall entry, the exec code seems to change
the syscall type to that of the binary being executed.
I didn't spot anything else that changes the fields.

But I failed to find the full list of allocated bits for
the 'status' field.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: in_compat_syscall() on x86
  2021-01-04 12:16 in_compat_syscall() on x86 David Laight
@ 2021-01-04 16:46 ` David Laight
  2021-01-04 16:58 ` Al Viro
  1 sibling, 0 replies; 14+ messages in thread
From: David Laight @ 2021-01-04 16:46 UTC (permalink / raw)
  To: David Laight, Al Viro, Christoph Hellwig, linux-kernel; +Cc: X86 ML

Copy x86@kernel.org

> -----Original Message-----
> From: David Laight <David.Laight@ACULAB.COM>
> Sent: 04 January 2021 12:17
> To: Al Viro <viro@zeniv.linux.org.uk>; Christoph Hellwig <hch@lst.de>; linux-kernel@vger.kernel.org
> Subject: in_compat_syscall() on x86
> 
> On x86 in_compat_syscall() is defined as:
>     in_ia32_syscall() || in_x32_syscall()
> 
> Now in_ia32_syscall() is a simple check of the TS_COMPAT flag.
> However in_x32_syscall() is a horrid beast that has to indirect
> through to the original %eax value (ie the syscall number) and
> check for a bit there.
> 
> So on a kernel with x32 support (probably most distro kernels)
> the in_compat_syscall() check is rather more expensive than
> one might expect.
> 
> It would be muck better if both checks could be done together.
> I think this would require the syscall entry code to set a
> value in both the 64bit and x32 entry paths.
> (Can a process make both 64bit and x32 system calls?)
> 
> To do this sensible (probably) requires a byte be allocated
> to hold the syscall type - rather than using flag bits
> in the 'status' field.
> 
> Apart from the syscall entry, the exec code seems to change
> the syscall type to that of the binary being executed.
> I didn't spot anything else that changes the fields.
> 
> But I failed to find the full list of allocated bits for
> the 'status' field.
> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: in_compat_syscall() on x86
  2021-01-04 12:16 in_compat_syscall() on x86 David Laight
  2021-01-04 16:46 ` David Laight
@ 2021-01-04 16:58 ` Al Viro
  2021-01-04 20:41   ` Eric W. Biederman
  1 sibling, 1 reply; 14+ messages in thread
From: Al Viro @ 2021-01-04 16:58 UTC (permalink / raw)
  To: David Laight; +Cc: Christoph Hellwig, linux-kernel

On Mon, Jan 04, 2021 at 12:16:56PM +0000, David Laight wrote:
> On x86 in_compat_syscall() is defined as:
>     in_ia32_syscall() || in_x32_syscall()
> 
> Now in_ia32_syscall() is a simple check of the TS_COMPAT flag.
> However in_x32_syscall() is a horrid beast that has to indirect
> through to the original %eax value (ie the syscall number) and
> check for a bit there.
> 
> So on a kernel with x32 support (probably most distro kernels)
> the in_compat_syscall() check is rather more expensive than
> one might expect.
> 
> It would be muck better if both checks could be done together.
> I think this would require the syscall entry code to set a
> value in both the 64bit and x32 entry paths.
> (Can a process make both 64bit and x32 system calls?)

Yes, it bloody well can.

And I see no benefit in pushing that logics into syscall entry,
since anything that calls in_compat_syscall() more than once
per syscall execution is doing the wrong thing.  Moreover,
in quite a few cases we don't call the sucker at all, and for
all of those pushing that crap into syscall entry logics is
pure loss.

What's the point, really?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: in_compat_syscall() on x86
  2021-01-04 16:58 ` Al Viro
@ 2021-01-04 20:41   ` Eric W. Biederman
  2021-01-04 22:34     ` David Laight
  0 siblings, 1 reply; 14+ messages in thread
From: Eric W. Biederman @ 2021-01-04 20:41 UTC (permalink / raw)
  To: Al Viro; +Cc: David Laight, Christoph Hellwig, linux-kernel, X86 ML

Al Viro <viro@zeniv.linux.org.uk> writes:

> On Mon, Jan 04, 2021 at 12:16:56PM +0000, David Laight wrote:
>> On x86 in_compat_syscall() is defined as:
>>     in_ia32_syscall() || in_x32_syscall()
>> 
>> Now in_ia32_syscall() is a simple check of the TS_COMPAT flag.
>> However in_x32_syscall() is a horrid beast that has to indirect
>> through to the original %eax value (ie the syscall number) and
>> check for a bit there.
>> 
>> So on a kernel with x32 support (probably most distro kernels)
>> the in_compat_syscall() check is rather more expensive than
>> one might expect.

I suggest you check the distro kernels.  I suspect they don't compile in
support for x32.  As far as I can tell x32 is an undead beast of a
subarchitecture that just enough people use that it can't be removed,
but few enough people use it likely has a few lurking scary bugs.

>> It would be muck better if both checks could be done together.
>> I think this would require the syscall entry code to set a
>> value in both the 64bit and x32 entry paths.
>> (Can a process make both 64bit and x32 system calls?)
>
> Yes, it bloody well can.
>
> And I see no benefit in pushing that logics into syscall entry,
> since anything that calls in_compat_syscall() more than once
> per syscall execution is doing the wrong thing.  Moreover,
> in quite a few cases we don't call the sucker at all, and for
> all of those pushing that crap into syscall entry logics is
> pure loss.

The x32 system calls have their own system call table and it would be
trivial to set a flag like TS_COMPAT when looking up a system call from
that table.  I expect such a change would be purely in the noise.

> What's the point, really?

Before we came up with the current games with __copy_siginfo_to_user
and x32_copy_siginfo_to_user I was wondering if we should make such
a change.  The delivery of compat signal frames and core dumps which
do not go through the system call entry path could almost benefit from
a flag that could be set/tested when on those paths.

The fact that only SIGCHLD (which can not trigger a coredump) is
different saves the coredump code from needing such a test.

The fact that the signal frame code is simple enough it can directly
call x32_copy_siginfo_to_user or __copy_siginfo_to_user saves us there.

So I don't think we have any cases where we actually need a flag that
is independent of the system call but we have come very close.


For people who want to optimize I suggest tracking down the handful of
users of x32 and see if x32 can be made to just go away.

Eric


^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: in_compat_syscall() on x86
  2021-01-04 20:41   ` Eric W. Biederman
@ 2021-01-04 22:34     ` David Laight
  2021-01-04 23:04       ` Andy Lutomirski
  0 siblings, 1 reply; 14+ messages in thread
From: David Laight @ 2021-01-04 22:34 UTC (permalink / raw)
  To: 'Eric W. Biederman', Al Viro
  Cc: Christoph Hellwig, linux-kernel, X86 ML

From: Eric W. Biederman
> Sent: 04 January 2021 20:41
> 
> Al Viro <viro@zeniv.linux.org.uk> writes:
> 
> > On Mon, Jan 04, 2021 at 12:16:56PM +0000, David Laight wrote:
> >> On x86 in_compat_syscall() is defined as:
> >>     in_ia32_syscall() || in_x32_syscall()
> >>
> >> Now in_ia32_syscall() is a simple check of the TS_COMPAT flag.
> >> However in_x32_syscall() is a horrid beast that has to indirect
> >> through to the original %eax value (ie the syscall number) and
> >> check for a bit there.
> >>
> >> So on a kernel with x32 support (probably most distro kernels)
> >> the in_compat_syscall() check is rather more expensive than
> >> one might expect.
> 
> I suggest you check the distro kernels.  I suspect they don't compile in
> support for x32.  As far as I can tell x32 is an undead beast of a
> subarchitecture that just enough people use that it can't be removed,
> but few enough people use it likely has a few lurking scary bugs.

It is defined in the Ubuntu kernel configs I've got lurking:
Both 3.8.0-19_generic (Ubuntu 13.04) and 5.4.0-56_generic (probably 20.04).
Which is probably why it is in my test builds (I've just cut out
a lot of modules).

> >> It would be muck better if both checks could be done together.
> >> I think this would require the syscall entry code to set a
> >> value in both the 64bit and x32 entry paths.
> >> (Can a process make both 64bit and x32 system calls?)
> >
> > Yes, it bloody well can.
> >
> > And I see no benefit in pushing that logics into syscall entry,
> > since anything that calls in_compat_syscall() more than once
> > per syscall execution is doing the wrong thing.  Moreover,
> > in quite a few cases we don't call the sucker at all, and for
> > all of those pushing that crap into syscall entry logics is
> > pure loss.
> 
> The x32 system calls have their own system call table and it would be
> trivial to set a flag like TS_COMPAT when looking up a system call from
> that table.  I expect such a change would be purely in the noise.

Certainly a write of 0/1/2 into a dirtied cache line of 'current'
could easily cost absolutely nothing.
Especially if current has already been read.

I also wondered about resetting it to zero when an x32 system call
exits (rather than entry to a 64bit one).

For ia32 the flag is set (with |=) on every syscall entry.
Even though I'm pretty sure it can only change during exec.

> > What's the point, really?
> 
> Before we came up with the current games with __copy_siginfo_to_user
> and x32_copy_siginfo_to_user I was wondering if we should make such
> a change.  The delivery of compat signal frames and core dumps which
> do not go through the system call entry path could almost benefit from
> a flag that could be set/tested when on those paths.

For signal delivery it should (probably) depend on the system call
that setup the signal handler.
Although I'm sure I remember one kernel where some of it was done
in libc (with a single entrypoint for all hadlers).

> The fact that only SIGCHLD (which can not trigger a coredump) is
> different saves the coredump code from needing such a test.
> 
> The fact that the signal frame code is simple enough it can directly
> call x32_copy_siginfo_to_user or __copy_siginfo_to_user saves us there.
> 
> So I don't think we have any cases where we actually need a flag that
> is independent of the system call but we have come very close.

If a program can do both 64bit and x32 system calls you probably
need to generate a 64bit core dump if it has ever made a 64bit
system call??

> For people who want to optimize I suggest tracking down the handful of
> users of x32 and see if x32 can be made to just go away.

Unlikely since Ubuntu seem to have enabled it for years.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: in_compat_syscall() on x86
  2021-01-04 22:34     ` David Laight
@ 2021-01-04 23:04       ` Andy Lutomirski
  2021-01-05  0:47         ` Eric W. Biederman
  2021-01-05  9:53         ` David Laight
  0 siblings, 2 replies; 14+ messages in thread
From: Andy Lutomirski @ 2021-01-04 23:04 UTC (permalink / raw)
  To: David Laight
  Cc: Eric W. Biederman, Al Viro, Christoph Hellwig, linux-kernel, X86 ML


> On Jan 4, 2021, at 2:36 PM, David Laight <David.Laight@aculab.com> wrote:
> 
> From: Eric W. Biederman
>> Sent: 04 January 2021 20:41
>> 
>> Al Viro <viro@zeniv.linux.org.uk> writes:
>> 
>>> On Mon, Jan 04, 2021 at 12:16:56PM +0000, David Laight wrote:
>>>> On x86 in_compat_syscall() is defined as:
>>>>    in_ia32_syscall() || in_x32_syscall()
>>>> 
>>>> Now in_ia32_syscall() is a simple check of the TS_COMPAT flag.
>>>> However in_x32_syscall() is a horrid beast that has to indirect
>>>> through to the original %eax value (ie the syscall number) and
>>>> check for a bit there.
>>>> 
>>>> So on a kernel with x32 support (probably most distro kernels)
>>>> the in_compat_syscall() check is rather more expensive than
>>>> one might expect.
>> 
>> I suggest you check the distro kernels.  I suspect they don't compile in
>> support for x32.  As far as I can tell x32 is an undead beast of a
>> subarchitecture that just enough people use that it can't be removed,
>> but few enough people use it likely has a few lurking scary bugs.
> 
> It is defined in the Ubuntu kernel configs I've got lurking:
> Both 3.8.0-19_generic (Ubuntu 13.04) and 5.4.0-56_generic (probably 20.04).
> Which is probably why it is in my test builds (I've just cut out
> a lot of modules).
> 
>>>> It would be muck better if both checks could be done together.
>>>> I think this would require the syscall entry code to set a
>>>> value in both the 64bit and x32 entry paths.
>>>> (Can a process make both 64bit and x32 system calls?)
>>> 
>>> Yes, it bloody well can.
>>> 
>>> And I see no benefit in pushing that logics into syscall entry,
>>> since anything that calls in_compat_syscall() more than once
>>> per syscall execution is doing the wrong thing.  Moreover,
>>> in quite a few cases we don't call the sucker at all, and for
>>> all of those pushing that crap into syscall entry logics is
>>> pure loss.
>> 
>> The x32 system calls have their own system call table and it would be
>> trivial to set a flag like TS_COMPAT when looking up a system call from
>> that table.  I expect such a change would be purely in the noise.
> 
> Certainly a write of 0/1/2 into a dirtied cache line of 'current'
> could easily cost absolutely nothing.
> Especially if current has already been read.
> 
> I also wondered about resetting it to zero when an x32 system call
> exits (rather than entry to a 64bit one).
> 
> For ia32 the flag is set (with |=) on every syscall entry.
> Even though I'm pretty sure it can only change during exec.

It can change for every syscall. I have tests that do this.

> 
>>> What's the point, really?
>> 
>> Before we came up with the current games with __copy_siginfo_to_user
>> and x32_copy_siginfo_to_user I was wondering if we should make such
>> a change.  The delivery of compat signal frames and core dumps which
>> do not go through the system call entry path could almost benefit from
>> a flag that could be set/tested when on those paths.
> 
> For signal delivery it should (probably) depend on the system call
> that setup the signal handler.

I think it has worked this way for some time now.

> Although I'm sure I remember one kernel where some of it was done
> in libc (with a single entrypoint for all hadlers).
> 
>> The fact that only SIGCHLD (which can not trigger a coredump) is
>> different saves the coredump code from needing such a test.
>> 
>> The fact that the signal frame code is simple enough it can directly
>> call x32_copy_siginfo_to_user or __copy_siginfo_to_user saves us there.
>> 
>> So I don't think we have any cases where we actually need a flag that
>> is independent of the system call but we have come very close.
> 
> If a program can do both 64bit and x32 system calls you probably
> need to generate a 64bit core dump if it has ever made a 64bit
> system call??

I think core dump should (and does) depend on the execution mode at the time of the crash.

It’s worth noting that GCC’s understanding of mixed bitness is horrible.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: in_compat_syscall() on x86
  2021-01-04 23:04       ` Andy Lutomirski
@ 2021-01-05  0:47         ` Eric W. Biederman
  2021-01-05  0:57           ` Al Viro
  2021-01-05  9:53         ` David Laight
  1 sibling, 1 reply; 14+ messages in thread
From: Eric W. Biederman @ 2021-01-05  0:47 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: David Laight, Al Viro, Christoph Hellwig, linux-kernel, X86 ML

Andy Lutomirski <luto@amacapital.net> writes:

>> On Jan 4, 2021, at 2:36 PM, David Laight <David.Laight@aculab.com> wrote:
>> 
>> From: Eric W. Biederman
>>> Sent: 04 January 2021 20:41
>>> 
>>> Al Viro <viro@zeniv.linux.org.uk> writes:
>>> 
>>>> On Mon, Jan 04, 2021 at 12:16:56PM +0000, David Laight wrote:
>>>>> On x86 in_compat_syscall() is defined as:
>>>>>    in_ia32_syscall() || in_x32_syscall()
>>>>> 
>>>>> Now in_ia32_syscall() is a simple check of the TS_COMPAT flag.
>>>>> However in_x32_syscall() is a horrid beast that has to indirect
>>>>> through to the original %eax value (ie the syscall number) and
>>>>> check for a bit there.
>>>>> 
>>>>> So on a kernel with x32 support (probably most distro kernels)
>>>>> the in_compat_syscall() check is rather more expensive than
>>>>> one might expect.
>>> 
>>> I suggest you check the distro kernels.  I suspect they don't compile in
>>> support for x32.  As far as I can tell x32 is an undead beast of a
>>> subarchitecture that just enough people use that it can't be removed,
>>> but few enough people use it likely has a few lurking scary bugs.
>> 
>> It is defined in the Ubuntu kernel configs I've got lurking:
>> Both 3.8.0-19_generic (Ubuntu 13.04) and 5.4.0-56_generic (probably 20.04).
>> Which is probably why it is in my test builds (I've just cut out
>> a lot of modules).

Interesting.  That sounds like something a gentle prod to the Ubuntu
kernel team might get them to disable.  Especially if there are not any
x32 binaries in sight.

Maybe Ubuntu has a reason or maybe someone just enabled the option
because it was there and they could.

>>>>> It would be muck better if both checks could be done together.
>>>>> I think this would require the syscall entry code to set a
>>>>> value in both the 64bit and x32 entry paths.
>>>>> (Can a process make both 64bit and x32 system calls?)
>>>> 
>>>> Yes, it bloody well can.
>>>> 
>>>> And I see no benefit in pushing that logics into syscall entry,
>>>> since anything that calls in_compat_syscall() more than once
>>>> per syscall execution is doing the wrong thing.  Moreover,
>>>> in quite a few cases we don't call the sucker at all, and for
>>>> all of those pushing that crap into syscall entry logics is
>>>> pure loss.
>>> 
>>> The x32 system calls have their own system call table and it would be
>>> trivial to set a flag like TS_COMPAT when looking up a system call from
>>> that table.  I expect such a change would be purely in the noise.
>> 
>> Certainly a write of 0/1/2 into a dirtied cache line of 'current'
>> could easily cost absolutely nothing.
>> Especially if current has already been read.
>> 
>> I also wondered about resetting it to zero when an x32 system call
>> exits (rather than entry to a 64bit one).
>> 
>> For ia32 the flag is set (with |=) on every syscall entry.
>> Even though I'm pretty sure it can only change during exec.
>
> It can change for every syscall. I have tests that do this.
>
>>>> What's the point, really?
>>> 
>>> Before we came up with the current games with __copy_siginfo_to_user
>>> and x32_copy_siginfo_to_user I was wondering if we should make such
>>> a change.  The delivery of compat signal frames and core dumps which
>>> do not go through the system call entry path could almost benefit from
>>> a flag that could be set/tested when on those paths.
>> 
>> For signal delivery it should (probably) depend on the system call
>> that setup the signal handler.
>
> I think it has worked this way for some time now.

It always has, but there is code that called as part of signal delivery
that needs to know if it is ia32 or x32 code (namely
copy_siginfo_to_user32).  The code paths are short enough we don't
strictly need the runtime test on that path and we have been able to
remove it, but it is an example of the kind of path that is not a
syscall entry where it would be nice to set the flag.

>> Although I'm sure I remember one kernel where some of it was done
>> in libc (with a single entrypoint for all hadlers).
>> 
>>> The fact that only SIGCHLD (which can not trigger a coredump) is
>>> different saves the coredump code from needing such a test.
>>> 
>>> The fact that the signal frame code is simple enough it can directly
>>> call x32_copy_siginfo_to_user or __copy_siginfo_to_user saves us there.
>>> 
>>> So I don't think we have any cases where we actually need a flag that
>>> is independent of the system call but we have come very close.
>> 
>> If a program can do both 64bit and x32 system calls you probably
>> need to generate a 64bit core dump if it has ever made a 64bit
>> system call??
>
> I think core dump should (and does) depend on the execution mode at
> the time of the crash.

The core dump code is currently tied to what binary you exec.
The code in exec sets mm->binfmt, and the coredump code uses mm->binfmt
to pick the coredump handler.

An x32 binary will make all kinds of 64bit calls where it doesn't need
the compat handling.  And of course x32 binaries run in 64bit mode with
32bit pointers so looking at the current execution mode doesn't help.

Further fun compat_binfmt_elf is shared between x32 and ia32, because
except for a few stray places they do exactly the same thing.

It is lucky that except for SIGCHLD the signals are between x32 and ia32
are exactly the same so that the kernel can encode them exactly the same
way.

> It’s worth noting that GCC’s understanding of mixed bitness is horrible.

Eric

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: in_compat_syscall() on x86
  2021-01-05  0:47         ` Eric W. Biederman
@ 2021-01-05  0:57           ` Al Viro
  2021-01-06  0:03             ` Eric W. Biederman
  0 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2021-01-05  0:57 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andy Lutomirski, David Laight, Christoph Hellwig, linux-kernel, X86 ML

On Mon, Jan 04, 2021 at 06:47:38PM -0600, Eric W. Biederman wrote:
> >> It is defined in the Ubuntu kernel configs I've got lurking:
> >> Both 3.8.0-19_generic (Ubuntu 13.04) and 5.4.0-56_generic (probably 20.04).
> >> Which is probably why it is in my test builds (I've just cut out
> >> a lot of modules).
> 
> Interesting.  That sounds like something a gentle prod to the Ubuntu
> kernel team might get them to disable.  Especially if there are not any
> x32 binaries in sight.

What for?

> The core dump code is currently tied to what binary you exec.
> The code in exec sets mm->binfmt, and the coredump code uses mm->binfmt
> to pick the coredump handler.
> 
> An x32 binary will make all kinds of 64bit calls where it doesn't need
> the compat handling.  And of course x32 binaries run in 64bit mode with
> 32bit pointers so looking at the current execution mode doesn't help.
> 
> Further fun compat_binfmt_elf is shared between x32 and ia32, because
> except for a few stray places they do exactly the same thing.

FWIW, there's a series cleaning that crap up nicely; as a side benefit,
it converts both compats on mips (o32 and n32) to regular compat_binfmt_elf.c
Yes, the current mainline is bloody awful in that area (PRSTATUS_SIZE and
SET_PR_FPVALID are not for weak stomach), but that's really not hard to
get into sane shape - -next had that done in last cycle and I'm currently
testing (well, building the test kernel) of port of that to 5.11-rc1.

I really don't see the point of getting rid of x32 - mips n32 is *not*
going away, and that's an exact parallel.

PS: if anything, I wonder if we would better off with binfmt_elf{64,32}.o,
built from fs/binfmt_elf.c; it's not that hard to do.  With arseloads
of weirdness going away if we do that...

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: in_compat_syscall() on x86
  2021-01-04 23:04       ` Andy Lutomirski
  2021-01-05  0:47         ` Eric W. Biederman
@ 2021-01-05  9:53         ` David Laight
  2021-01-05 17:35           ` Andy Lutomirski
  1 sibling, 1 reply; 14+ messages in thread
From: David Laight @ 2021-01-05  9:53 UTC (permalink / raw)
  To: 'Andy Lutomirski'
  Cc: Eric W. Biederman, Al Viro, Christoph Hellwig, linux-kernel, X86 ML

From: Andy Lutomirski
> Sent: 04 January 2021 23:04
...
> >> The x32 system calls have their own system call table and it would be
> >> trivial to set a flag like TS_COMPAT when looking up a system call from
> >> that table.  I expect such a change would be purely in the noise.
> >
> > Certainly a write of 0/1/2 into a dirtied cache line of 'current'
> > could easily cost absolutely nothing.
> > Especially if current has already been read.
> >
> > I also wondered about resetting it to zero when an x32 system call
> > exits (rather than entry to a 64bit one).
> >
> > For ia32 the flag is set (with |=) on every syscall entry.
> > Even though I'm pretty sure it can only change during exec.
> 
> It can change for every syscall. I have tests that do this.

Do they still work?
I don't think the ia32 flag is cleared anywhere.

So a 64bit writev() might to badly wrong after an ia32 syscall.

The x32 flag is different, the syscalls just have different numbers.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: in_compat_syscall() on x86
  2021-01-05  9:53         ` David Laight
@ 2021-01-05 17:35           ` Andy Lutomirski
  2021-01-06  9:42             ` David Laight
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Lutomirski @ 2021-01-05 17:35 UTC (permalink / raw)
  To: David Laight
  Cc: Eric W. Biederman, Al Viro, Christoph Hellwig, linux-kernel, X86 ML

On Tue, Jan 5, 2021 at 1:53 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Andy Lutomirski
> > Sent: 04 January 2021 23:04
> ...
> > >> The x32 system calls have their own system call table and it would be
> > >> trivial to set a flag like TS_COMPAT when looking up a system call from
> > >> that table.  I expect such a change would be purely in the noise.
> > >
> > > Certainly a write of 0/1/2 into a dirtied cache line of 'current'
> > > could easily cost absolutely nothing.
> > > Especially if current has already been read.
> > >
> > > I also wondered about resetting it to zero when an x32 system call
> > > exits (rather than entry to a 64bit one).
> > >
> > > For ia32 the flag is set (with |=) on every syscall entry.
> > > Even though I'm pretty sure it can only change during exec.
> >
> > It can change for every syscall. I have tests that do this.
>
> Do they still work?

They seem to.

> I don't think the ia32 flag is cleared anywhere.

It's hiding in arch_exit_to_user_mode_prepare().

--Andy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: in_compat_syscall() on x86
  2021-01-05  0:57           ` Al Viro
@ 2021-01-06  0:03             ` Eric W. Biederman
  2021-01-06  0:11               ` Bernd Petrovitsch
  2021-01-06  0:30               ` Al Viro
  0 siblings, 2 replies; 14+ messages in thread
From: Eric W. Biederman @ 2021-01-06  0:03 UTC (permalink / raw)
  To: Al Viro
  Cc: Andy Lutomirski, David Laight, Christoph Hellwig, linux-kernel, X86 ML

Al Viro <viro@zeniv.linux.org.uk> writes:

> On Mon, Jan 04, 2021 at 06:47:38PM -0600, Eric W. Biederman wrote:
>> >> It is defined in the Ubuntu kernel configs I've got lurking:
>> >> Both 3.8.0-19_generic (Ubuntu 13.04) and 5.4.0-56_generic (probably 20.04).
>> >> Which is probably why it is in my test builds (I've just cut out
>> >> a lot of modules).
>> 
>> Interesting.  That sounds like something a gentle prod to the Ubuntu
>> kernel team might get them to disable.  Especially if there are not any
>> x32 binaries in sight.
>
> What for?

Any code that no one uses is better off disabled or deleted.

There are maintenance and support costs to such code as they cause extra
work when maintaining the kernel, and because the code is practically
never tested inevitably bugs some of which turn into security issues
slip through.

Maybe I am wrong and there are interesting users of x32.  All I remember
is that last time this was discussed someone found a distro that
actually shipped an x32 build to users.  Which was just enough users to
keep x32 alive.  Given that distros are in the process of dropping 32bit
support I suspect that distro may be going if it is not already gone.

There are a lot of weird x32 special cases that it would be nice to get
rid of if no one uses x32, and could certainly be made less of an issue
if distro's that don't actually care about x32 simply stopped compiling
it in.

>> The core dump code is currently tied to what binary you exec.
>> The code in exec sets mm->binfmt, and the coredump code uses mm->binfmt
>> to pick the coredump handler.
>> 
>> An x32 binary will make all kinds of 64bit calls where it doesn't need
>> the compat handling.  And of course x32 binaries run in 64bit mode with
>> 32bit pointers so looking at the current execution mode doesn't help.
>> 
>> Further fun compat_binfmt_elf is shared between x32 and ia32, because
>> except for a few stray places they do exactly the same thing.
>
> FWIW, there's a series cleaning that crap up nicely; as a side benefit,
> it converts both compats on mips (o32 and n32) to regular compat_binfmt_elf.c
> Yes, the current mainline is bloody awful in that area (PRSTATUS_SIZE and
> SET_PR_FPVALID are not for weak stomach), but that's really not hard to
> get into sane shape - -next had that done in last cycle and I'm currently
> testing (well, building the test kernel) of port of that to 5.11-rc1.

That does sound interesting.  Anytime we can clean up arch specific
weirdness so that it simply becomes generic weirdness and it can be
tested and maintained by more people is always nice.

> I really don't see the point of getting rid of x32 - mips n32 is *not*
> going away, and that's an exact parallel.

From what I maintain x32 and n32 are *not* exact parallels.  The change
in size of the timestamps of SIGCHLD is not present on any other
architecture.

The truth is someone years ago royallying messed up struct siginfo and
took a structure that should have been the same on 32bit and 64bit and
would up with a horrible monstrosity of unions.

> PS: if anything, I wonder if we would better off with binfmt_elf{64,32}.o,
> built from fs/binfmt_elf.c; it's not that hard to do.  With arseloads
> of weirdness going away if we do that...

It sounds like we are already on our way to having that.  I suspect you
would need an binfmt_elf_ops to handle the architecture specific bits,
but it should not be too bad.

Eric

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: in_compat_syscall() on x86
  2021-01-06  0:03             ` Eric W. Biederman
@ 2021-01-06  0:11               ` Bernd Petrovitsch
  2021-01-06  0:30               ` Al Viro
  1 sibling, 0 replies; 14+ messages in thread
From: Bernd Petrovitsch @ 2021-01-06  0:11 UTC (permalink / raw)
  To: Eric W. Biederman, Al Viro
  Cc: Andy Lutomirski, David Laight, Christoph Hellwig, linux-kernel, X86 ML

Hi all!

On Tue, 2021-01-05 at 18:03 -0600, Eric W. Biederman wrote:
[...]
> Maybe I am wrong and there are interesting users of x32.  All I remember
> is that last time this was discussed someone found a distro that
> actually shipped an x32 build to users.  Which was just enough users to

https://wiki.debian.org/X32Port ?

[...]

MfG,
	Bernd
-- 
Bernd Petrovitsch                  Email : bernd@petrovitsch.priv.at
There is no cloud, just other people computers. - FSFE
                     LUGA : http://www.luga.at



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: in_compat_syscall() on x86
  2021-01-06  0:03             ` Eric W. Biederman
  2021-01-06  0:11               ` Bernd Petrovitsch
@ 2021-01-06  0:30               ` Al Viro
  1 sibling, 0 replies; 14+ messages in thread
From: Al Viro @ 2021-01-06  0:30 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andy Lutomirski, David Laight, Christoph Hellwig, linux-kernel, X86 ML

On Tue, Jan 05, 2021 at 06:03:15PM -0600, Eric W. Biederman wrote:
> > Yes, the current mainline is bloody awful in that area (PRSTATUS_SIZE and
> > SET_PR_FPVALID are not for weak stomach), but that's really not hard to
> > get into sane shape - -next had that done in last cycle and I'm currently
> > testing (well, building the test kernel) of port of that to 5.11-rc1.
> 
> That does sound interesting.  Anytime we can clean up arch specific
> weirdness so that it simply becomes generic weirdness and it can be
> tested and maintained by more people is always nice.

vfs.git #work.elf-compat, and AFAICS it works.

 arch/Kconfig                               |   3 ++
 arch/arm64/Kconfig                         |   1 -
 arch/ia64/kernel/crash.c                   |   2 +-
 arch/mips/Kconfig                          |   8 ++----
 arch/mips/include/asm/elf.h                |  56 +++++++++++++-----------------------
 arch/mips/include/asm/elfcore-compat.h     |  29 +++++++++++++++++++
 arch/mips/kernel/Makefile                  |   4 +--
 arch/mips/kernel/binfmt_elfn32.c           | 106 --------------------------------------------------------------------
 arch/mips/kernel/binfmt_elfo32.c           | 109 ----------------------------------------------------------------------
 arch/mips/kernel/scall64-n64.S             |   2 +-
 arch/parisc/Kconfig                        |   1 -
 arch/powerpc/Kconfig                       |   1 -
 arch/powerpc/platforms/powernv/opal-core.c |   6 ++--
 arch/s390/Kconfig                          |   1 -
 arch/s390/kernel/crash_dump.c              |   2 +-
 arch/sparc/Kconfig                         |   1 -
 arch/x86/Kconfig                           |   2 +-
 arch/x86/include/asm/compat.h              |  11 -------
 arch/x86/include/asm/elf.h                 |   2 +-
 arch/x86/include/asm/elfcore-compat.h      |  31 ++++++++++++++++++++
 fs/Kconfig.binfmt                          |   2 +-
 fs/binfmt_elf.c                            |  19 ++++++-------
 fs/binfmt_elf_fdpic.c                      |  22 ++++----------
 fs/compat_binfmt_elf.c                     |   7 +----
 include/linux/elfcore-compat.h             |  15 ++++++++--
 include/linux/elfcore.h                    |   7 ++++-
 kernel/kexec_core.c                        |   2 +-
 28 files changed, 128 insertions(+), 324 deletions(-)
 create mode 100644 arch/mips/include/asm/elfcore-compat.h
 delete mode 100644 arch/mips/kernel/binfmt_elfn32.c
 delete mode 100644 arch/mips/kernel/binfmt_elfo32.c
 create mode 100644 arch/x86/include/asm/elfcore-compat.h


^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: in_compat_syscall() on x86
  2021-01-05 17:35           ` Andy Lutomirski
@ 2021-01-06  9:42             ` David Laight
  0 siblings, 0 replies; 14+ messages in thread
From: David Laight @ 2021-01-06  9:42 UTC (permalink / raw)
  To: 'Andy Lutomirski'
  Cc: Eric W. Biederman, Al Viro, Christoph Hellwig, linux-kernel, X86 ML

From: Andy Lutomirski
> Sent: 05 January 2021 17:35
> 
> On Tue, Jan 5, 2021 at 1:53 AM David Laight <David.Laight@aculab.com> wrote:
> > ...
...
> > > > I also wondered about resetting it to zero when an x32 system call
> > > > exits (rather than entry to a 64bit one).
> > > >
> > > > For ia32 the flag is set (with |=) on every syscall entry.
> > > > Even though I'm pretty sure it can only change during exec.
> > >
> > > It can change for every syscall. I have tests that do this.
> >
> > Do they still work?
> 
> They seem to.
> 
> > I don't think the ia32 flag is cleared anywhere.
> 
> It's hiding in arch_exit_to_user_mode_prepare().

No wonder I don't notice.
There is a RMW to clear TS_COMPAT | TS_I386_REGS_POKED as (about)
the very last thing done before returning to user.

Short or renaming the 'status' field and fixing the compilation errors
I can't see an obvious way of detecting any other status flag bits.

If none are used (or a different field is used) then an unconditional
write of the syscall type in the entry paths would remove the need for
the clear in the syscall exit path.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-01-06  9:44 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-04 12:16 in_compat_syscall() on x86 David Laight
2021-01-04 16:46 ` David Laight
2021-01-04 16:58 ` Al Viro
2021-01-04 20:41   ` Eric W. Biederman
2021-01-04 22:34     ` David Laight
2021-01-04 23:04       ` Andy Lutomirski
2021-01-05  0:47         ` Eric W. Biederman
2021-01-05  0:57           ` Al Viro
2021-01-06  0:03             ` Eric W. Biederman
2021-01-06  0:11               ` Bernd Petrovitsch
2021-01-06  0:30               ` Al Viro
2021-01-05  9:53         ` David Laight
2021-01-05 17:35           ` Andy Lutomirski
2021-01-06  9:42             ` David Laight

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).