kernel-hardening.lists.openwall.com archive mirror
* Detecting the availability of VSYSCALL
@ 2019-06-25 15:15 Florian Weimer
  2019-06-25 16:30 ` Thomas Gleixner
  2019-06-25 20:08 ` Kees Cook
  0 siblings, 2 replies; 18+ messages in thread
From: Florian Weimer @ 2019-06-25 15:15 UTC (permalink / raw)
  To: linux-api, kernel-hardening, linux-x86_64, linux-arch
  Cc: Andy Lutomirski, Kees Cook, Carlos O'Donell

We're trying to create portable binaries which use VSYSCALL on older
kernels (to avoid performance regressions), but gracefully degrade to
full system calls on kernels which do not have VSYSCALL support compiled
in (or disabled at boot).

For technical reasons, we cannot use vDSO fallback.  Trying vDSO first
and only then use VSYSCALL is the way this has been tackled in the past,
which is why this userspace ABI breakage goes generally unnoticed.  But
we don't have a dynamic linker in our scenario.

Is there any reliable way to detect that VSYSCALL is unavailable,
without resorting to parsing /proc/self/maps or opening file
descriptors?

Should we try mapping something at the magic address (without MAP_FIXED)
and see if we get back a different address?  Something in the auxiliary
vector would work for us, too, but nothing seems to exist there
unfortunately.
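
For reference, the /proc/self/maps approach that the question hopes to
avoid could look roughly like the sketch below.  The helper names
(maps_line_is_vsyscall, vsyscall_in_maps) are made up for illustration,
and the suffix match on " [vsyscall]" is a heuristic based on how the
kernel usually labels that mapping.  It needs a file descriptor and a
mounted /proc, which is exactly the cost being asked about, and a
listed mapping says nothing about whether vsyscalls are native or
emulated.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if one line of /proc/<pid>/maps ends in " [vsyscall]". */
static int maps_line_is_vsyscall(const char *line)
{
    static const char tag[] = " [vsyscall]";
    size_t len = strcspn(line, "\n");      /* length without the newline */
    size_t tlen = sizeof tag - 1;

    return len >= tlen && memcmp(line + len - tlen, tag, tlen) == 0;
}

/* Returns 1 if the mapping is listed, 0 if not, -1 if /proc is unreadable. */
static int vsyscall_in_maps(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[512];
    int found = 0;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (maps_line_is_vsyscall(line)) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```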

Thanks,
Florian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Detecting the availability of VSYSCALL
  2019-06-25 15:15 Detecting the availability of VSYSCALL Florian Weimer
@ 2019-06-25 16:30 ` Thomas Gleixner
  2019-06-25 16:38   ` Florian Weimer
  2019-06-25 20:08 ` Kees Cook
  1 sibling, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2019-06-25 16:30 UTC (permalink / raw)
  To: Florian Weimer
  Cc: linux-api, kernel-hardening, linux-x86_64, linux-arch,
	Andy Lutomirski, Kees Cook, Carlos O'Donell, x86

On Tue, 25 Jun 2019, Florian Weimer wrote:
> We're trying to create portable binaries which use VSYSCALL on older
> kernels (to avoid performance regressions), but gracefully degrade to
> full system calls on kernels which do not have VSYSCALL support compiled
> in (or disabled at boot).
>
> For technical reasons, we cannot use vDSO fallback.  Trying vDSO first
> and only then use VSYSCALL is the way this has been tackled in the past,
> which is why this userspace ABI breakage goes generally unnoticed.  But
> we don't have a dynamic linker in our scenario.

I'm not following. On newer kernels which usually have vsyscall disabled
you need to use real syscalls anyway, so why are you so worried about
performance on older kernels?  That doesn't make sense.

> Is there any reliable way to detect that VSYSCALL is unavailable,
> without resorting to parsing /proc/self/maps or opening file
> descriptors?

Not that I'm aware of except

    sigaction(SIGSEGV, ...)

/me hides
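
For illustration only, a minimal sketch of such a SIGSEGV probe,
assuming x86-64 Linux and the legacy time() vsyscall entry at
0xffffffffff600400.  The function name vsyscall_callable is made up.
The probe is not thread-safe, it temporarily replaces the process-wide
SIGSEGV handler, and on a vsyscall=none kernel the faulting call may be
logged by the kernel.  It only tells whether the call works, not
whether it is fast.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>

/* Legacy x86-64 vsyscall entry point for time(). */
#define VSYSCALL_TIME_ADDR 0xffffffffff600400UL

static sigjmp_buf probe_env;

static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);   /* unwind out of the faulting call */
}

/* Returns 1 if the call works, 0 if it faults, -1 if the probe cannot run. */
static int vsyscall_callable(void)
{
#if defined(__x86_64__) && defined(__linux__)
    struct sigaction sa, old;
    volatile int result;

    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    if (sigsetjmp(probe_env, 1) == 0) {
        time_t (*vtime)(time_t *) = (time_t (*)(time_t *))VSYSCALL_TIME_ADDR;
        (void)vtime(NULL);      /* faults if the page is absent */
        result = 1;
    } else {
        result = 0;             /* reached via siglongjmp from on_segv */
    }

    sigaction(SIGSEGV, &old, NULL);   /* restore the previous handler */
    return result;
#else
    return -1;                  /* the vsyscall page is x86-64 specific */
#endif
}
```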
 
> Should we try mapping something at the magic address (without MAP_FIXED)
> and see if we get back a different address?  Something in the auxiliary
> vector would work for us, too, but nothing seems to exist there
> unfortunately.

Would, but there is no such thing.

Thanks,

	tglx

* Re: Detecting the availability of VSYSCALL
  2019-06-25 16:30 ` Thomas Gleixner
@ 2019-06-25 16:38   ` Florian Weimer
  2019-06-25 20:11     ` Andy Lutomirski
  0 siblings, 1 reply; 18+ messages in thread
From: Florian Weimer @ 2019-06-25 16:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-api, kernel-hardening, linux-x86_64, linux-arch,
	Andy Lutomirski, Kees Cook, Carlos O'Donell, x86

* Thomas Gleixner:

> On Tue, 25 Jun 2019, Florian Weimer wrote:
>> We're trying to create portable binaries which use VSYSCALL on older
>> kernels (to avoid performance regressions), but gracefully degrade to
>> full system calls on kernels which do not have VSYSCALL support compiled
>> in (or disabled at boot).
>>
>> For technical reasons, we cannot use vDSO fallback.  Trying vDSO first
>> and only then use VSYSCALL is the way this has been tackled in the past,
>> which is why this userspace ABI breakage goes generally unnoticed.  But
>> we don't have a dynamic linker in our scenario.
>
> I'm not following. On newer kernels which usually have vsyscall disabled
> you need to use real syscalls anyway, so why are you so worried about
> performance on older kernels?  That doesn't make sense.

We want binaries that run fast on VSYSCALL kernels, but can fall back to
full system calls on kernels that do not have them (instead of
crashing).

We could parse the vDSO and prefer the functions found there, but this
is for the statically linked case.  We currently do not have a (minimal)
dynamic loader there in that version of the code base, so that doesn't
really work for us.

>> Is there any reliable way to detect that VSYSCALL is unavailable,
>> without resorting to parsing /proc/self/maps or opening file
>> descriptors?
>
> Not that I'm aware of except
>
>     sigaction(SIGSEGV, ...)
>
> /me hides

I know people do this for SIGILL to probe for CPU features, but yeah,
let's just not go there. 8-p

Thanks,
Florian

* Re: Detecting the availability of VSYSCALL
  2019-06-25 15:15 Detecting the availability of VSYSCALL Florian Weimer
  2019-06-25 16:30 ` Thomas Gleixner
@ 2019-06-25 20:08 ` Kees Cook
  2019-06-25 20:13   ` Andy Lutomirski
  1 sibling, 1 reply; 18+ messages in thread
From: Kees Cook @ 2019-06-25 20:08 UTC (permalink / raw)
  To: Florian Weimer
  Cc: linux-api, kernel-hardening, linux-x86_64, linux-arch,
	Andy Lutomirski, Carlos O'Donell

On Tue, Jun 25, 2019 at 05:15:27PM +0200, Florian Weimer wrote:
> Should we try mapping something at the magic address (without MAP_FIXED)
> and see if we get back a different address?  Something in the auxiliary
> vector would work for us, too, but nothing seems to exist there
> unfortunately.

It seems like mmap() won't even work because it's in the high memory
area. I can't map something a page under the vsyscall page either, so I
can't distinguish it with mmap, mprotect, madvise, or msync. :(
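
A small sketch of the hinted-mmap experiment, assuming x86-64 Linux
(the helper name hint_honored is made up).  Because the vsyscall page
lies above the user address range, the kernel never places the mapping
at the hint, so the outcome is the same whether or not the vsyscall
page exists.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Fixed address of the legacy vsyscall page on x86-64. */
#define VSYSCALL_PAGE ((void *)0xffffffffff600000UL)

/*
 * Map one anonymous page with `hint` as a non-binding address hint
 * (no MAP_FIXED).  Returns 1 if the kernel placed the mapping at the
 * hint, 0 if it chose another address, -1 if mmap() failed.
 */
static int hint_honored(void *hint)
{
    void *p = mmap(hint, 4096, PROT_READ,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int honored;

    if (p == MAP_FAILED)
        return -1;
    honored = (p == hint);
    munmap(p, 4096);
    return honored;
}
```

Calling hint_honored(VSYSCALL_PAGE) is expected never to return 1,
which is why the result carries no information about vsyscall support.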

-- 
Kees Cook

* Re: Detecting the availability of VSYSCALL
  2019-06-25 16:38   ` Florian Weimer
@ 2019-06-25 20:11     ` Andy Lutomirski
  2019-06-25 20:47       ` Florian Weimer
  0 siblings, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2019-06-25 20:11 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Thomas Gleixner, Linux API, Kernel Hardening, linux-x86_64,
	linux-arch, Andy Lutomirski, Kees Cook, Carlos O'Donell,
	X86 ML

On Tue, Jun 25, 2019 at 9:39 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Thomas Gleixner:
>
> > On Tue, 25 Jun 2019, Florian Weimer wrote:
> >> We're trying to create portable binaries which use VSYSCALL on older
> >> kernels (to avoid performance regressions), but gracefully degrade to
> >> full system calls on kernels which do not have VSYSCALL support compiled
> >> in (or disabled at boot).
> >>
> >> For technical reasons, we cannot use vDSO fallback.  Trying vDSO first
> >> and only then use VSYSCALL is the way this has been tackled in the past,
> >> which is why this userspace ABI breakage goes generally unnoticed.  But
> >> we don't have a dynamic linker in our scenario.
> >
> > I'm not following. On newer kernels which usually have vsyscall disabled
> > you need to use real syscalls anyway, so why are you so worried about
> > performance on older kernels?  That doesn't make sense.
>
> We want binaries that run fast on VSYSCALL kernels, but can fall back to
> full system calls on kernels that do not have them (instead of
> crashing).

Define "VSYSCALL kernels."  On any remotely recent kernel (*all* new
kernels and all kernels for the last several years that haven't
specifically requested vsyscall=native), using vsyscalls is much, much
slower than just doing syscalls.  I know a way you can tell whether
vsyscalls are fast, but it's unreliable, and I'm disinclined to
suggest it.  There are also at least two pending patch series that
will interfere.

>
> We could parse the vDSO and prefer the functions found there, but this
> is for the statically linked case.  We currently do not have a (minimal)
> dynamic loader there in that version of the code base, so that doesn't
> really work for us.

Is anything preventing you from adding a vDSO parser?  I wrote one
just for this type of use:

$ wc -l tools/testing/selftests/vDSO/parse_vdso.c
269 tools/testing/selftests/vDSO/parse_vdso.c

(Those 269 lines include quite a bit of comment.)


$ head -n8 tools/testing/selftests/vDSO/parse_vdso.c
/*
 * parse_vdso.c: Linux reference vDSO parser
 * Written by Andrew Lutomirski, 2011-2014.
 *
 * This code is meant to be linked in to various programs that run on Linux.
 * As such, it is available with as few restrictions as possible.  This file
 * is licensed under the Creative Commons Zero License, version 1.0,
 * available at http://creativecommons.org/publicdomain/zero/1.0/legalcode

If this license is too restrictive for you, I could probably be
convinced to relicense it, but I'd be surprised :)  In hindsight, I kind
of wish I'd used MIT instead, since the Go runtime took advantage of
the CC0 license to import it without attribution *and* break it quite
badly in the process.

IMO the correct solution is to parse the vDSO and, if that fails, to
use plain syscalls as a fallback.  You should not ship anything that
uses a vsyscall under any circumstances, unless you need the last
ounce of performance on that one ancient version of OpenSuSE that
crashes if the vDSO is enabled.
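
To make the "parse the vDSO, else use plain syscalls" shape concrete,
here is a much reduced sketch.  It is not parse_vdso.c: the names
vdso_lookup and portable_clock_gettime are invented, symbol versions
are ignored, and only DT_HASH is consulted, so on a vDSO without
DT_HASH it simply takes the syscall path.  The symbol name
"__vdso_clock_gettime" is the x86-64 one.

```c
#define _GNU_SOURCE
#include <elf.h>
#include <link.h>          /* ElfW() */
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>      /* getauxval(), AT_SYSINFO_EHDR */
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

typedef int (*clock_gettime_fn)(clockid_t, struct timespec *);

/* Returns the address of function `name` in the vDSO, or NULL. */
static void *vdso_lookup(const char *name)
{
    uintptr_t base = (uintptr_t)getauxval(AT_SYSINFO_EHDR);
    if (base == 0)
        return NULL;                       /* no vDSO mapped */

    const ElfW(Ehdr) *ehdr = (const ElfW(Ehdr) *)base;
    const ElfW(Phdr) *phdr = (const ElfW(Phdr) *)(base + ehdr->e_phoff);
    const ElfW(Dyn) *dyn = NULL;
    uintptr_t bias = 0;
    int have_load = 0;

    for (unsigned i = 0; i < ehdr->e_phnum; i++) {
        if (phdr[i].p_type == PT_LOAD && !have_load) {
            /* Difference between link-time and run-time addresses. */
            bias = base + phdr[i].p_offset - phdr[i].p_vaddr;
            have_load = 1;
        } else if (phdr[i].p_type == PT_DYNAMIC) {
            dyn = (const ElfW(Dyn) *)(base + phdr[i].p_offset);
        }
    }
    if (!have_load || dyn == NULL)
        return NULL;

    const ElfW(Sym) *symtab = NULL;
    const char *strtab = NULL;
    const ElfW(Word) *hash = NULL;
    for (; dyn->d_tag != DT_NULL; dyn++) {
        const void *p = (const void *)(bias + dyn->d_un.d_ptr);
        if (dyn->d_tag == DT_SYMTAB)
            symtab = p;
        else if (dyn->d_tag == DT_STRTAB)
            strtab = p;
        else if (dyn->d_tag == DT_HASH)
            hash = p;
    }
    if (symtab == NULL || strtab == NULL || hash == NULL)
        return NULL;

    /* hash[1] is nchain, which equals the number of symbols. */
    for (ElfW(Word) i = 0; i < hash[1]; i++) {
        const ElfW(Sym) *s = &symtab[i];
        if (s->st_shndx == SHN_UNDEF)
            continue;
        if (ELF64_ST_TYPE(s->st_info) != STT_FUNC)  /* same macro as ELF32 */
            continue;
        if (strcmp(strtab + s->st_name, name) == 0)
            return (void *)(bias + s->st_value);
    }
    return NULL;
}

/* Prefer the vDSO; fall back to the real system call, never to vsyscall. */
static int portable_clock_gettime(clockid_t id, struct timespec *ts)
{
    static clock_gettime_fn fn;
    static int resolved;

    if (!resolved) {
        fn = (clock_gettime_fn)vdso_lookup("__vdso_clock_gettime");
        resolved = 1;
    }
    if (fn != NULL)
        return fn(id, ts);     /* returns a negative errno on failure */
    return (int)syscall(SYS_clock_gettime, id, ts);  /* -1 and errno */
}
```

Note the two paths report errors differently, so a real caller would
normalize them.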

* Re: Detecting the availability of VSYSCALL
  2019-06-25 20:08 ` Kees Cook
@ 2019-06-25 20:13   ` Andy Lutomirski
  0 siblings, 0 replies; 18+ messages in thread
From: Andy Lutomirski @ 2019-06-25 20:13 UTC (permalink / raw)
  To: Kees Cook
  Cc: Florian Weimer, Linux API, Kernel Hardening, linux-x86_64,
	linux-arch, Andy Lutomirski, Carlos O'Donell

On Tue, Jun 25, 2019 at 1:08 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Tue, Jun 25, 2019 at 05:15:27PM +0200, Florian Weimer wrote:
> > Should we try mapping something at the magic address (without MAP_FIXED)
> > and see if we get back a different address?  Something in the auxiliary
> > vector would work for us, too, but nothing seems to exist there
> > unfortunately.
>
> It seems like mmap() won't even work because it's in the high memory
> area. I can't map something a page under the vsyscall page either, so I
> can't distinguish it with mmap, mprotect, madvise, or msync. :(
>

I keep contemplating making munmap() work on it.  That would nicely
answer the question: if munmap() fails, it's not there, and, if
munmap() succeeds, it's not there :)

* Re: Detecting the availability of VSYSCALL
  2019-06-25 20:11     ` Andy Lutomirski
@ 2019-06-25 20:47       ` Florian Weimer
  2019-06-25 21:49         ` Andy Lutomirski
  0 siblings, 1 reply; 18+ messages in thread
From: Florian Weimer @ 2019-06-25 20:47 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Linux API, Kernel Hardening, linux-x86_64,
	linux-arch, Kees Cook, Carlos O'Donell, X86 ML

* Andy Lutomirski:

>> We want binaries that run fast on VSYSCALL kernels, but can fall back to
>> full system calls on kernels that do not have them (instead of
>> crashing).
>
> Define "VSYSCALL kernels."  On any remotely recent kernel (*all* new
> kernels and all kernels for the last several years that haven't
> specifically requested vsyscall=native), using vsyscalls is much, much
> slower than just doing syscalls.  I know a way you can tell whether
> vsyscalls are fast, but it's unreliable, and I'm disinclined to
> suggest it.  There are also at least two pending patch series that
> will interfere.

The fast path is for the benefit of the 2.6.32-based kernel in Red Hat
Enterprise Linux 6.  It doesn't have the vsyscall emulation code yet, I
think.

My hope is to produce (statically linked) binaries that run as fast on
that kernel as they run today, but can gracefully fall back to something
else on kernels without vsyscall support.

>> We could parse the vDSO and prefer the functions found there, but this
>> is for the statically linked case.  We currently do not have a (minimal)
>> dynamic loader there in that version of the code base, so that doesn't
>> really work for us.
>
> Is anything preventing you from adding a vDSO parser?  I wrote one
> just for this type of use:
>
> $ wc -l tools/testing/selftests/vDSO/parse_vdso.c
> 269 tools/testing/selftests/vDSO/parse_vdso.c
>
> (Those 269 lines include quite a bit of comment.)

I'm worried that if I use a custom parser and the binaries start
crashing again because something changed in the kernel (within the scope
permitted by the ELF specification), the kernel won't be fixed.

That is, we'd be in exactly the same situation as today.

Thanks,
Florian

* Re: Detecting the availability of VSYSCALL
  2019-06-25 20:47       ` Florian Weimer
@ 2019-06-25 21:49         ` Andy Lutomirski
  2019-06-26 12:12           ` Florian Weimer
  0 siblings, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2019-06-25 21:49 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Andy Lutomirski, Thomas Gleixner, Linux API, Kernel Hardening,
	linux-x86_64, linux-arch, Kees Cook, Carlos O'Donell, X86 ML

On Tue, Jun 25, 2019 at 1:47 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Andy Lutomirski:
>
> >> We want binaries that run fast on VSYSCALL kernels, but can fall back to
> >> full system calls on kernels that do not have them (instead of
> >> crashing).
> >
> > Define "VSYSCALL kernels."  On any remotely recent kernel (*all* new
> > kernels and all kernels for the last several years that haven't
> > specifically requested vsyscall=native), using vsyscalls is much, much
> > slower than just doing syscalls.  I know a way you can tell whether
> > vsyscalls are fast, but it's unreliable, and I'm disinclined to
> > suggest it.  There are also at least two pending patch series that
> > will interfere.
>
> The fast path is for the benefit of the 2.6.32-based kernel in Red Hat
> Enterprise Linux 6.  It doesn't have the vsyscall emulation code yet, I
> think.
>
> My hope is to produce (statically linked) binaries that run as fast on
> that kernel as they run today, but can gracefully fall back to something
> else on kernels without vsyscall support.
>
> >> We could parse the vDSO and prefer the functions found there, but this
> >> is for the statically linked case.  We currently do not have a (minimal)
> >> dynamic loader there in that version of the code base, so that doesn't
> >> really work for us.
> >
> > Is anything preventing you from adding a vDSO parser?  I wrote one
> > just for this type of use:
> >
> > $ wc -l tools/testing/selftests/vDSO/parse_vdso.c
> > 269 tools/testing/selftests/vDSO/parse_vdso.c
> >
> > (Those 269 lines include quite a bit of comment.)
>
> I'm worried that if I use a custom parser and the binaries start
> crashing again because something changed in the kernel (within the scope
> permitted by the ELF specification), the kernel won't be fixed.
>
> That is, we'd be in exactly the same situation as today.

With my maintainer hat on, the kernel won't do that.  Obviously a
review of my parser would be appreciated, but I consider it to be
fully supported, just like glibc and musl's parsers are fully
supported.  Sadly, I *also* consider the version Go forked for a while
(now fixed) to be supported.  Sigh.

* Re: Detecting the availability of VSYSCALL
  2019-06-25 21:49         ` Andy Lutomirski
@ 2019-06-26 12:12           ` Florian Weimer
  2019-06-26 14:15             ` Andy Lutomirski
  0 siblings, 1 reply; 18+ messages in thread
From: Florian Weimer @ 2019-06-26 12:12 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Linux API, Kernel Hardening, linux-x86_64,
	linux-arch, Kees Cook, Carlos O'Donell, X86 ML

* Andy Lutomirski:

> On Tue, Jun 25, 2019 at 1:47 PM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * Andy Lutomirski:
>>
>> >> We want binaries that run fast on VSYSCALL kernels, but can fall back to
>> >> full system calls on kernels that do not have them (instead of
>> >> crashing).
>> >
>> > Define "VSYSCALL kernels."  On any remotely recent kernel (*all* new
>> > kernels and all kernels for the last several years that haven't
>> > specifically requested vsyscall=native), using vsyscalls is much, much
>> > slower than just doing syscalls.  I know a way you can tell whether
>> > vsyscalls are fast, but it's unreliable, and I'm disinclined to
>> > suggest it.  There are also at least two pending patch series that
>> > will interfere.
>>
>> The fast path is for the benefit of the 2.6.32-based kernel in Red Hat
>> Enterprise Linux 6.  It doesn't have the vsyscall emulation code yet, I
>> think.
>>
>> My hope is to produce (statically linked) binaries that run as fast on
>> that kernel as they run today, but can gracefully fall back to something
>> else on kernels without vsyscall support.
>>
>> >> We could parse the vDSO and prefer the functions found there, but this
>> >> is for the statically linked case.  We currently do not have a (minimal)
>> >> dynamic loader there in that version of the code base, so that doesn't
>> >> really work for us.
>> >
>> > Is anything preventing you from adding a vDSO parser?  I wrote one
>> > just for this type of use:
>> >
>> > $ wc -l tools/testing/selftests/vDSO/parse_vdso.c
>> > 269 tools/testing/selftests/vDSO/parse_vdso.c
>> >
>> > (Those 269 lines include quite a bit of comment.)
>>
>> I'm worried that if I use a custom parser and the binaries start
>> crashing again because something changed in the kernel (within the scope
>> permitted by the ELF specification), the kernel won't be fixed.
>>
>> That is, we'd be in exactly the same situation as today.
>
> With my maintainer hat on, the kernel won't do that.  Obviously a
> review of my parser would be appreciated, but I consider it to be
> fully supported, just like glibc and musl's parsers are fully
> supported.  Sadly, I *also* consider the version Go forked for a while
> (now fixed) to be supported.  Sigh.

We've been burnt once, otherwise we wouldn't be having this
conversation.  It's not just what the kernel does by default; if it's
configurable, it will be disabled by some, and if it's labeled as
“security hardening”, the userspace ABI promise is suddenly forgotten
and it's all userspace's fault for not supporting the new way.

It looks like parsing the vDSO is the only way forward, and we have to
move in that direction if we move at all.

It's tempting to read the machine code on the vsyscall page and analyze
that, but vsyscall=none behavior changed at one point, and you no longer
have any mapping there at all.  So that doesn't work, either.

I do hope the next userspace ABI break will have an option to undo it on
a per-container basis.  Or at least a flag to detect it.

Thanks,
Florian

* Re: Detecting the availability of VSYSCALL
  2019-06-26 12:12           ` Florian Weimer
@ 2019-06-26 14:15             ` Andy Lutomirski
  2019-06-26 15:00               ` Florian Weimer
  0 siblings, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2019-06-26 14:15 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Andy Lutomirski, Thomas Gleixner, Linux API, Kernel Hardening,
	linux-x86_64, linux-arch, Kees Cook, Carlos O'Donell, X86 ML



> On Jun 26, 2019, at 5:12 AM, Florian Weimer <fweimer@redhat.com> wrote:
> 
> * Andy Lutomirski:
> 
>>> On Tue, Jun 25, 2019 at 1:47 PM Florian Weimer <fweimer@redhat.com> wrote:
>>> 
>>> * Andy Lutomirski:
>>> 
>>>>> We want binaries that run fast on VSYSCALL kernels, but can fall back to
>>>>> full system calls on kernels that do not have them (instead of
>>>>> crashing).
>>>> 
>>>> Define "VSYSCALL kernels."  On any remotely recent kernel (*all* new
>>>> kernels and all kernels for the last several years that haven't
>>>> specifically requested vsyscall=native), using vsyscalls is much, much
>>>> slower than just doing syscalls.  I know a way you can tell whether
>>>> vsyscalls are fast, but it's unreliable, and I'm disinclined to
>>>> suggest it.  There are also at least two pending patch series that
>>>> will interfere.
>>> 
>>> The fast path is for the benefit of the 2.6.32-based kernel in Red Hat
>>> Enterprise Linux 6.  It doesn't have the vsyscall emulation code yet, I
>>> think.
>>> 
>>> My hope is to produce (statically linked) binaries that run as fast on
>>> that kernel as they run today, but can gracefully fall back to something
>>> else on kernels without vsyscall support.
>>> 
>>>>> We could parse the vDSO and prefer the functions found there, but this
>>>>> is for the statically linked case.  We currently do not have a (minimal)
>>>>> dynamic loader there in that version of the code base, so that doesn't
>>>>> really work for us.
>>>> 
>>>> Is anything preventing you from adding a vDSO parser?  I wrote one
>>>> just for this type of use:
>>>> 
>>>> $ wc -l tools/testing/selftests/vDSO/parse_vdso.c
>>>> 269 tools/testing/selftests/vDSO/parse_vdso.c
>>>> 
>>>> (Those 269 lines include quite a bit of comment.)
>>> 
>>> I'm worried that if I use a custom parser and the binaries start
>>> crashing again because something changed in the kernel (within the scope
>>> permitted by the ELF specification), the kernel won't be fixed.
>>> 
>>> That is, we'd be in exactly the same situation as today.
>> 
>> With my maintainer hat on, the kernel won't do that.  Obviously a
>> review of my parser would be appreciated, but I consider it to be
>> fully supported, just like glibc and musl's parsers are fully
>> supported.  Sadly, I *also* consider the version Go forked for a while
>> (now fixed) to be supported.  Sigh.
> 
> We've been burnt once, otherwise we wouldn't be having this
> conversation.  It's not just what the kernel does by default; if it's
> configurable, it will be disabled by some, and if it's labeled as
> “security hardening”, the userspace ABI promise is suddenly forgotten
> and it's all userspace's fault for not supporting the new way.
> 
> It looks like parsing the vDSO is the only way forward, and we have to
> move in that direction if we move at all.
> 
> It's tempting to read the machine code on the vsyscall page and analyze
> that, but vsyscall=none behavior changed at one point, and you no longer
> have any mapping there at all.  So that doesn't work, either.

It’s worse than that. I have patches to make the vsyscall be execute-only. And the slowly forthcoming CET patches will change the machine code.

> 
> I do hope the next userspace ABI break will have an option to undo it on
> a per-container basis.  Or at least a flag to detect it.
> 

I didn’t add a flag because the vsyscall page was thoroughly obsolete when all this happened, and I wanted to encourage all new code to just parse the vDSO instead of piling on the hacks.

Anyway, you may be the right person to ask: is there some credible way that the kernel could detect new binaries that don’t need vsyscalls?  Maybe a new ELF note on a static binary or on the ELF interpreter? We can dynamically switch it in principle.

* Re: Detecting the availability of VSYSCALL
  2019-06-26 14:15             ` Andy Lutomirski
@ 2019-06-26 15:00               ` Florian Weimer
  2019-06-26 15:21                 ` Andy Lutomirski
  0 siblings, 1 reply; 18+ messages in thread
From: Florian Weimer @ 2019-06-26 15:00 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Thomas Gleixner, Linux API, Kernel Hardening,
	linux-x86_64, linux-arch, Kees Cook, Carlos O'Donell, X86 ML

* Andy Lutomirski:

> I didn’t add a flag because the vsyscall page was thoroughly obsolete
> when all this happened, and I wanted to encourage all new code to just
> parse the vDSO instead of piling on the hacks.

It turned out that the thorny cases just switched to system calls
instead.  I think we finally completed the transition in glibc upstream
in 2018 (for x86).

> Anyway, you may be the right person to ask: is there some credible way
> that the kernel could detect new binaries that don’t need vsyscalls?
> Maybe a new ELF note on a static binary or on the ELF interpreter? We
> can dynamically switch it in principle.

For this kind of change, markup similar to PT_GNU_STACK would have been
appropriate, I think: Old kernels and loaders would have ignored the
program header and loaded the program anyway, but the vsyscall page
still existed, so that would have been fine. The kernel would have
needed to check the program interpreter or the main executable (without
a program interpreter, i.e., the statically linked case).  Due to the way
the vsyscalls are concentrated in glibc, a dynamically linked executable
would not have needed checking (or re-linking).  I don't think we would
have implemented the full late enablement after dlopen we did for
executable stacks.  In theory, any code could have jumped to the
vsyscall area, but in practice, it's just dynamically linked glibc and
static binaries.

But nowadays, unmarked glibcs which do not depend on vsyscall vastly
outnumber unmarked glibcs which require it.  Therefore, markup of
binaries does not seem to be reasonable today.  I could imagine a
personality flag you can set (if you have CAP_SYS_ADMIN) that re-enables
vsyscall support for new subprocesses.  And a container runtime would do
this based on metadata found in the image.  This way, the container host
itself could be protected, and you could still run legacy images which
require vsyscall.

For the non-container case, if you know that you'll run legacy
workloads, you'd still have the boot parameter.  But I think it could
default to vsyscall=none in many more cases.

Thanks,
Florian

* Re: Detecting the availability of VSYSCALL
  2019-06-26 15:00               ` Florian Weimer
@ 2019-06-26 15:21                 ` Andy Lutomirski
  2019-06-26 15:36                   ` Florian Weimer
  0 siblings, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2019-06-26 15:21 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Andy Lutomirski, Thomas Gleixner, Linux API, Kernel Hardening,
	linux-x86_64, linux-arch, Kees Cook, Carlos O'Donell, X86 ML



> On Jun 26, 2019, at 8:00 AM, Florian Weimer <fweimer@redhat.com> wrote:
> 
> * Andy Lutomirski:
> 
>> I didn’t add a flag because the vsyscall page was thoroughly obsolete
>> when all this happened, and I wanted to encourage all new code to just
>> parse the vDSO instead of piling on the hacks.
> 
> It turned out that the thorny cases just switched to system calls
> instead.  I think we finally completed the transition in glibc upstream
> in 2018 (for x86).
> 
>> Anyway, you may be the right person to ask: is there some credible way
>> that the kernel could detect new binaries that don’t need vsyscalls?
>> Maybe a new ELF note on a static binary or on the ELF interpreter? We
>> can dynamically switch it in principle.
> 
> For this kind of change, markup similar to PT_GNU_STACK would have been
> appropriate, I think: Old kernels and loaders would have ignored the
> program header and loaded the program anyway, but the vsyscall page
> still existed, so that would have been fine. The kernel would have
> needed to check the program interpreter or the main executable (without
> a program interpreter, i.e., the statically linked case).  Due to the way
> the vsyscalls are concentrated in glibc, a dynamically linked executable
> would not have needed checking (or re-linking).  I don't think we would
> have implemented the full late enablement after dlopen we did for
> executable stacks.  In theory, any code could have jumped to the
> vsyscall area, but in practice, it's just dynamically linked glibc and
> static binaries.
> 
> But nowadays, unmarked glibcs which do not depend on vsyscall vastly
> outnumber unmarked glibcs which require it.  Therefore, markup of
> binaries does not seem to be reasonable today.  I could imagine a
> personality flag you can set (if you have CAP_SYS_ADMIN) that re-enables
> vsyscall support for new subprocesses.  And a container runtime would do
> this based on metadata found in the image.  This way, the container host
> itself could be protected, and you could still run legacy images which
> require vsyscall.
> 
> For the non-container case, if you know that you'll run legacy
> workloads, you'd still have the boot parameter.  But I think it could
> default to vsyscall=none in many more cases.
> 

I’m wondering if we can still do it: add a note or other ELF indicator that says “I don’t need vsyscalls.”  Then we can change the default mode to “no vsyscalls if the flag is there, else execute-only vsyscalls”.

Would glibc go along with this?  Would enterprise distros consider backporting such a thing?

I

* Re: Detecting the availability of VSYSCALL
  2019-06-26 15:21                 ` Andy Lutomirski
@ 2019-06-26 15:36                   ` Florian Weimer
  2019-06-26 16:24                     ` Andy Lutomirski
  0 siblings, 1 reply; 18+ messages in thread
From: Florian Weimer @ 2019-06-26 15:36 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Thomas Gleixner, Linux API, Kernel Hardening,
	linux-x86_64, linux-arch, Kees Cook, Carlos O'Donell, X86 ML

* Andy Lutomirski:

> I’m wondering if we can still do it: add a note or other ELF indicator
> that says “I don’t need vsyscalls.”  Then we can change the default
> mode to “no vsyscalls if the flag is there, else execute-only
> vsyscalls”.
>
> Would glibc go along with this?

I think we can make it happen, at least for relatively recent glibc
linked with current binutils.  It's not trivial because it requires
coordination among multiple projects.  We have three or four widely used
link editors now, but we could make it happen.  (Although getting to
PT_GNU_PROPERTY wasn't exactly easy.)

> Would enterprise distros consider backporting such a thing?

Enterprise distros aren't the problem here because they can't remove
vsyscall support for quite a while due to existing customer binaries.
For them, it would just be an additional (and welcome) hardening
opportunity.

The challenge here is container hosting platforms which have already
disabled vsyscall, presumably to protect the container host itself.
They would need to rebuild the container host userspace with the markup
to keep it protected, and then they could switch to a kernel which has
vsyscall-unless-opt-out logic.  That seems to be a bit of a stretch
because from their perspective, there's no problem today.

My guess is that it would be easier to have a personality flag.  Then
they could keep the host largely as-is, and would “only” need a
mechanism to pass through the flag from the image metadata to the actual
container creation.  It's still a change to the container host (and the
kernel change is required as well), but it would not require relinking
every statically linked binary.

Thanks,
Florian

* Re: Detecting the availability of VSYSCALL
  2019-06-26 15:36                   ` Florian Weimer
@ 2019-06-26 16:24                     ` Andy Lutomirski
  2019-06-26 16:45                       ` Florian Weimer
  0 siblings, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2019-06-26 16:24 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Andy Lutomirski, Thomas Gleixner, Linux API, Kernel Hardening,
	linux-x86_64, linux-arch, Kees Cook, Carlos O'Donell, X86 ML


> On Jun 26, 2019, at 8:36 AM, Florian Weimer <fweimer@redhat.com> wrote:
> 
> * Andy Lutomirski:
> 
>> I’m wondering if we can still do it: add a note or other ELF indicator
>> that says “I don’t need vsyscalls.”  Then we can change the default
>> mode to “no vsyscalls if the flag is there, else execute-only
>> vsyscalls”.
>> 
>> Would glibc go along with this?
> 
> I think we can make it happen, at least for relatively recent glibc
> linked with current binutils.  It's not trivial because it requires
> coordination among multiple projects.  We have three or four widely used
> link editors now, but we could make it happen.  (Although getting to
> PT_GNU_PROPERTY wasn't exactly easy.)

Can’t an ELF note be done with some more or less ordinary asm such
that any link editor will insert it correctly?

> 
>> Would enterprise distros consider backporting such a thing?
> 
> Enterprise distros aren't the problem here because they can't remove
> vsyscall support for quite a while due to existing customer binaries.
> For them, it would just be an additional (and welcome) hardening
> opportunity.
> 
> The challenge here is container hosting platforms which have already
> disabled vsyscall, presumably to protect the container host itself.
> They would need to rebuild the container host userspace with the markup
> to keep it protected, and then they could switch to a kernel which has
> vsyscall-unless-opt-out logic.  That seems to be a bit of a stretch
> because from their perspective, there's no problem today.
> 
> My guess is that it would be easier to have a personality flag.  Then
> they could keep the host largely as-is, and would “only” need a
> mechanism to pass through the flag from the image metadata to the actual
> container creation.  It's still a change to the container host (and the
> kernel change is required as well), but it would not require relinking
> every statically linked binary.
> 
> 

The problem with a personality flag is that it needs to have some kind
of sensible behavior for setuid programs, and getting that right in a
way that doesn’t scream “exploit me” while preserving useful
compatibility may be tricky.

* Re: Detecting the availability of VSYSCALL
  2019-06-26 16:24                     ` Andy Lutomirski
@ 2019-06-26 16:45                       ` Florian Weimer
  2019-06-26 16:52                         ` Andy Lutomirski
  0 siblings, 1 reply; 18+ messages in thread
From: Florian Weimer @ 2019-06-26 16:45 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Thomas Gleixner, Linux API, Kernel Hardening,
	linux-x86_64, linux-arch, Kees Cook, Carlos O'Donell, X86 ML

* Andy Lutomirski:

> Can’t an ELF note be done with some more or less ordinary asm such
> that any link editor will insert it correctly?

We've just been over this for the CET enablement.  ELF PT_NOTE parsing
was rejected there.

I don't think binutils ld has a way to set an ELF program header it
doesn't know anything about.
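
For reference, a minimal user-space sketch of what walking the records
of a PT_NOTE segment involves.  This is not the kernel code discussed
for CET; the function name note_present is made up for illustration,
and the sketch assumes the classic 4-byte note alignment and a buffer
that already holds the segment contents.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round a size up to the 4-byte alignment assumed for note fields. */
static size_t note_align4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/*
 * Hypothetical helper: scan a buffer holding the contents of a PT_NOTE
 * segment.  Returns 1 if a note with the given owner name and type is
 * present, 0 otherwise (including when the buffer is malformed).
 */
int note_present(const unsigned char *buf, size_t len,
                 const char *owner, uint32_t type)
{
    size_t off = 0;
    size_t owner_len = strlen(owner) + 1;   /* n_namesz counts the NUL */

    while (len - off >= sizeof(Elf64_Nhdr)) {
        Elf64_Nhdr nh;
        memcpy(&nh, buf + off, sizeof nh);  /* avoid unaligned access */
        off += sizeof nh;

        size_t name_sz = note_align4(nh.n_namesz);
        size_t desc_sz = note_align4(nh.n_descsz);
        if (name_sz > len - off || desc_sz > len - off - name_sz)
            return 0;                       /* truncated note record */

        if (nh.n_type == type && nh.n_namesz == owner_len &&
            memcmp(buf + off, owner, owner_len) == 0)
            return 1;

        off += name_sz + desc_sz;           /* skip name and descriptor */
    }
    return 0;
}
```

Every consumer of such a marker has to repeat this kind of bounds
checking, which is part of why a dedicated program header is
attractive.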

>>> Would enterprise distros consider backporting such a thing?
>> 
>> Enterprise distros aren't the problem here because they can't remove
>> vsyscall support for quite a while due to existing customer binaries.
>> For them, it would just be an additional (and welcome) hardening
>> opportunity.
>> 
>> The challenge here is container hosting platforms which have already
>> disabled vsyscall, presumably to protect the container host itself.
>> They would need to rebuild the container host userspace with the markup
>> to keep it protected, and then they could switch to a kernel which has
>> vsyscall-unless-opt-out logic.  That seems to be a bit of a stretch
>> because from their perspective, there's no problem today.
>> 
>> My guess is that it would be easier to have a personality flag.  Then
>> they could keep the host largely as-is, and would “only” need a
>> mechanism to pass through the flag from the image metadata to the actual
>> container creation.  It's still a change to the container host (and the
>> kernel change is required as well), but it would not require relinking
>> every statically linked binary.

> The problem with a personality flag is that it needs to have some kind
> of sensible behavior for setuid programs, and getting that right in a
> way that doesn’t scream “exploit me” while preserving useful
> compatibility may be tricky.

Are restrictive personality flags still a problem with user namespaces?
I think it would be fine to restrict this one to CAP_SYS_ADMIN.

Thanks,
Florian

* Re: Detecting the availability of VSYSCALL
  2019-06-26 16:45                       ` Florian Weimer
@ 2019-06-26 16:52                         ` Andy Lutomirski
  2019-06-26 17:04                           ` Florian Weimer
  0 siblings, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2019-06-26 16:52 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Andy Lutomirski, Thomas Gleixner, Linux API, Kernel Hardening,
	linux-x86_64, linux-arch, Kees Cook, Carlos O'Donell, X86 ML

On Wed, Jun 26, 2019 at 9:45 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Andy Lutomirski:
>
> > Can’t an ELF note be done with some more or less ordinary asm such
> > that any link editor will insert it correctly?
>
> We've just been over this for the CET enablement.  ELF PT_NOTE parsing
> was rejected there.

No one told me this.  Unless I missed something, the latest kernel
patches still had PT_NOTE parsing.  Can you point me at an
enlightening thread or explain what happened?

> > The problem with a personality flag is that it needs to have some kind
> > of sensible behavior for setuid programs, and getting that right in a
> > way that doesn’t scream “exploit me” while preserving useful
> > compatibility may be tricky.
>
> Are restrictive personality flags still a problem with user namespaces?
> I think it would be fine to restrict this one to CAP_SYS_ADMIN.

We could possibly get away with this, but now we're introducing a
whole new mechanism.  I'd rather just add proper per-namespace
sysctls, but this is a pretty big hammer.

* Re: Detecting the availability of VSYSCALL
  2019-06-26 16:52                         ` Andy Lutomirski
@ 2019-06-26 17:04                           ` Florian Weimer
  2019-06-26 17:14                             ` Andy Lutomirski
  0 siblings, 1 reply; 18+ messages in thread
From: Florian Weimer @ 2019-06-26 17:04 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Linux API, Kernel Hardening, linux-x86_64,
	linux-arch, Kees Cook, Carlos O'Donell, X86 ML

* Andy Lutomirski:

> On Wed, Jun 26, 2019 at 9:45 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * Andy Lutomirski:
>>
>> > Can’t an ELF note be done with some more or less ordinary asm such
>> > that any link editor will insert it correctly?
>>
>> We've just been over this for the CET enablement.  ELF PT_NOTE parsing
>> was rejected there.
>
> No one told me this.  Unless I missed something, the latest kernel
> patches still had PT_NOTE parsing.  Can you point me at an
> enlightening thread or explain what happened?

The ABI was changed rather late, and PT_GNU_PROPERTY has been added.
But this is okay because the kernel only looks at the dynamic loader,
which we can update fairly easily.
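
As a rough illustration of how a program header such as
PT_GNU_PROPERTY can be located, here is a user-space sketch that walks
the program headers of the running executable through the auxiliary
vector.  It is not what the kernel does (the kernel inspects the
dynamic loader, as said above); the function name self_has_phdr is
made up, and the fallback definition of PT_GNU_PROPERTY is only there
for older <elf.h> headers.

```c
#include <elf.h>
#include <link.h>
#include <stddef.h>
#include <sys/auxv.h>

#ifndef PT_GNU_PROPERTY
#define PT_GNU_PROPERTY 0x6474e553  /* fallback for older <elf.h> */
#endif

/*
 * Hypothetical helper: return 1 if the running executable has a
 * program header of the given type, 0 otherwise.  Works without a
 * dynamic linker because the kernel passes AT_PHDR in the auxiliary
 * vector for static binaries as well.
 */
int self_has_phdr(unsigned long p_type)
{
    const unsigned char *base = (const unsigned char *)getauxval(AT_PHDR);
    unsigned long count = getauxval(AT_PHNUM);
    unsigned long entsize = getauxval(AT_PHENT);

    if (base == NULL || entsize < sizeof(ElfW(Phdr)))
        return 0;

    for (unsigned long i = 0; i < count; i++) {
        const ElfW(Phdr) *ph = (const ElfW(Phdr) *)(base + i * entsize);
        if (ph->p_type == p_type)
            return 1;
    }
    return 0;
}
```

Calling self_has_phdr(PT_GNU_PROPERTY) tells whether the link editor
emitted the header at all; interpreting the property contents is a
separate step not shown here.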

The thread is:

Subject: Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file

<87blyu7ubf.fsf@oldenburg2.str.redhat.com> is a message reference in it.

>> > The problem with a personality flag is that it needs to have some kind
>> > of sensible behavior for setuid programs, and getting that right in a
>> > way that doesn’t scream “exploit me” while preserving useful
>> > compatibility may be tricky.
>>
>> Are restrictive personality flags still a problem with user namespaces?
>> I think it would be fine to restrict this one to CAP_SYS_ADMIN.
>
> We could possibly get away with this, but now we're introducing a
> whole new mechanism.  I'd rather just add proper per-namespace
> sysctls, but this is a pretty big hammer.

Oh, I wasn't aware of that.  I thought that this already existed in some
form, e.g. prctl with PR_SET_SECCOMP requiring CAP_SYS_ADMIN unless
PR_SET_NO_NEW_PRIVS was active as well.
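
A minimal sketch of the existing pattern I had in mind: an
unprivileged process sets no_new_privs first and may then install a
seccomp filter without CAP_SYS_ADMIN.  The function names are made up
for illustration, and the filter simply allows everything.

```c
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>

/* Set no_new_privs for the calling thread; 0 on success, -1 on error. */
int enable_no_new_privs(void)
{
    return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/* Returns 1 if no_new_privs is set, 0 if not, -1 on error. */
int no_new_privs_enabled(void)
{
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}

/*
 * Install a trivial allow-everything seccomp filter.  Without
 * CAP_SYS_ADMIN this only succeeds because no_new_privs is set first.
 * Returns 0 on success, -1 on error.
 */
int install_allow_all_filter(void)
{
    struct sock_filter insns[] = {
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof insns / sizeof insns[0],
        .filter = insns,
    };

    if (enable_no_new_privs() != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0);
}
```

The point is only that the restriction is gated on either a capability
or the irreversible no_new_privs bit, so a setuid program can never be
started under a filter chosen by an unprivileged caller.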

Thanks,
Florian

* Re: Detecting the availability of VSYSCALL
  2019-06-26 17:04                           ` Florian Weimer
@ 2019-06-26 17:14                             ` Andy Lutomirski
  0 siblings, 0 replies; 18+ messages in thread
From: Andy Lutomirski @ 2019-06-26 17:14 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Andy Lutomirski, Thomas Gleixner, Linux API, Kernel Hardening,
	linux-x86_64, linux-arch, Kees Cook, Carlos O'Donell, X86 ML

On Wed, Jun 26, 2019 at 10:04 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Andy Lutomirski:
>
> > On Wed, Jun 26, 2019 at 9:45 AM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * Andy Lutomirski:
> >>
> >> > Can’t an ELF note be done with some more or less ordinary asm such
> >> > that any link editor will insert it correctly?
> >>
> >> We've just been over this for the CET enablement.  ELF PT_NOTE parsing
> >> was rejected there.
> >
> > No one told me this.  Unless I missed something, the latest kernel
> > patches still had PT_NOTE parsing.  Can you point me at an
> > enlightening thread or explain what happened?
>
> The ABI was changed rather late, and PT_GNU_PROPERTY has been added.
> But this is okay because the kernel only looks at the dynamic loader,
> which we can update fairly easily.

Ugh.  I replied there.  I don't consider any of that to have much
bearing on what we do for vsyscalls.  That being said, the
PT_GNU_PROPERTY thing sounds like maybe we could use it for a bit
saying "no vsyscalls needed".

>
> The thread is:
>
> Subject: Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
>
> <87blyu7ubf.fsf@oldenburg2.str.redhat.com> is a message reference in it.
>
> >> > The problem with a personality flag is that it needs to have some kind
> >> > of sensible behavior for setuid programs, and getting that right in a
> >> > way that doesn’t scream “exploit me” while preserving useful
> >> > compatibility may be tricky.
> >>
> >> Are restrictive personality flags still a problem with user namespaces?
> >> I think it would be fine to restrict this one to CAP_SYS_ADMIN.
> >
> > We could possibly get away with this, but now we're introducing a
> > whole new mechanism.  I'd rather just add proper per-namespace
> > sysctls, but this is a pretty big hammer.
>
> Oh, I wasn't aware of that.  I thought that this already existed in some
> form, e.g. prctl with PR_SET_SECCOMP requiring CAP_SYS_ADMIN unless
> PR_SET_NO_NEW_PRIVS was active as well.

We do have that, but I don't think we have it for personality.  The
whole personality mechanism scares me a bit due to a lack of this type
of thing, and I'd want to review it carefully before adding a new
personality bit.
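
To make the concern concrete, here is a minimal sketch of how the
personality interface is used today.  The helper names are made up for
illustration.  As far as I know, personality() itself performs no
capability check, so any process can change its own persona with it.

```c
#include <sys/personality.h>

/* Query the current persona without changing it; -1 on error. */
int current_persona(void)
{
    return personality(0xffffffffUL);
}

/* Return 1 if every bit in 'flags' is set in the current persona. */
int persona_has(unsigned long flags)
{
    int p = current_persona();

    return p != -1 && ((unsigned long)p & flags) == flags;
}

/*
 * OR 'flags' (for example ADDR_NO_RANDOMIZE) into the current persona.
 * Returns 0 on success, -1 on error.  The new persona is inherited
 * across fork() and execve().
 */
int persona_add(unsigned long flags)
{
    int p = current_persona();

    if (p == -1)
        return -1;
    return personality((unsigned long)p | flags) == -1 ? -1 : 0;
}
```

A new vsyscall bit would be set through the same call, which is why
its interaction with setuid execs would need a careful look.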


--Andy

Thread overview: 18+ messages
2019-06-25 15:15 Detecting the availability of VSYSCALL Florian Weimer
2019-06-25 16:30 ` Thomas Gleixner
2019-06-25 16:38   ` Florian Weimer
2019-06-25 20:11     ` Andy Lutomirski
2019-06-25 20:47       ` Florian Weimer
2019-06-25 21:49         ` Andy Lutomirski
2019-06-26 12:12           ` Florian Weimer
2019-06-26 14:15             ` Andy Lutomirski
2019-06-26 15:00               ` Florian Weimer
2019-06-26 15:21                 ` Andy Lutomirski
2019-06-26 15:36                   ` Florian Weimer
2019-06-26 16:24                     ` Andy Lutomirski
2019-06-26 16:45                       ` Florian Weimer
2019-06-26 16:52                         ` Andy Lutomirski
2019-06-26 17:04                           ` Florian Weimer
2019-06-26 17:14                             ` Andy Lutomirski
2019-06-25 20:08 ` Kees Cook
2019-06-25 20:13   ` Andy Lutomirski