Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

* Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures
       [not found]   ` <78464155-f459-773f-d0ee-c5bdbeb39e5d@gmail.com>
@ 2020-10-22 20:02     ` Kees Cook
  2020-10-22 22:24       ` Topi Miettinen
  2020-10-23  9:02       ` Catalin Marinas
  0 siblings, 2 replies; 12+ messages in thread
From: Kees Cook @ 2020-10-22 20:02 UTC (permalink / raw)
  To: Topi Miettinen
  Cc: Szabolcs Nagy, Jeremy Linton, linux-arm-kernel, libc-alpha,
	systemd-devel, linux-kernel, Mark Rutland, Mark Brown,
	Dave Martin, Catalin Marinas, Will Deacon, Salvatore Mesoraca,
	kernel-hardening, linux-hardening

On Thu, Oct 22, 2020 at 01:39:07PM +0300, Topi Miettinen wrote:
> But I think SELinux has a more complete solution (execmem) which can track
> the pages better than is possible with seccomp solution which has a very
> narrow field of view. Maybe this facility could be made available to
> non-SELinux systems, for example with prctl()? Then the in-kernel MDWX could
> allow mprotect(PROT_EXEC | PROT_BTI) in case the backing file hasn't been
> modified, the source filesystem isn't writable for the calling process and
> the file descriptor isn't created with memfd_create().

Right. The problem here is that systemd is attempting to mediate a
state change using only syscall details (i.e. with seccomp) instead of
a stateful analysis. Using a MAC is likely the only sane way to do that.
SELinux is a bit difficult to adjust "on the fly" the way systemd would
like to do things, and the more dynamic approach seen with SARA[1] isn't
yet in the kernel. Trying to enforce memory W^X protection correctly
via seccomp isn't really going to work well, as far as I can see.
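
To illustrate the point about seccomp seeing only syscall details, here
is a minimal C model of a stateless check on mprotect(). It is a sketch
under assumptions, not systemd's actual filter: the struct
`mprotect_call` and the function `mdwx_filter_denies` are hypothetical
names, and `PROT_BTI` is defined locally with the arm64 value (0x10) in
case the libc headers lack it.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* arm64 value of PROT_BTI; defined here in case the headers lack it. */
#ifndef PROT_BTI
#define PROT_BTI 0x10
#endif

/* Hypothetical view of one mprotect() call: arguments only, no history. */
struct mprotect_call {
    unsigned long addr;
    unsigned long len;
    unsigned long prot;
};

/*
 * Simplified model of a stateless "deny W^X" rule. The decision depends
 * only on the prot argument of the current call; the rule cannot know
 * whether the mapping was already executable or was ever writable.
 * Returns 1 if the call would be denied, 0 otherwise.
 */
int mdwx_filter_denies(const struct mprotect_call *call)
{
    assert(call != NULL);
    return (call->prot & PROT_EXEC) != 0;
}
```

Under this model a request for PROT_READ | PROT_EXEC | PROT_BTI is
denied like any other request carrying PROT_EXEC, even when the pages
were already executable and the call only adds BTI.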

Regardless, it makes sense to me to have the kernel load the executable
itself with BTI enabled by default. I prefer gaining Catalin's suggested
patch[2]. :)

[1] https://lore.kernel.org/kernel-hardening/1562410493-8661-1-git-send-email-s.mesoraca16@gmail.com/
[2] https://lore.kernel.org/linux-arm-kernel/20201022093104.GB1229@gaia/

-- 
Kees Cook

* Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures
  2020-10-22 20:02     ` BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures Kees Cook
@ 2020-10-22 22:24       ` Topi Miettinen
  2020-10-23 17:52         ` Salvatore Mesoraca
  2020-10-23  9:02       ` Catalin Marinas
  1 sibling, 1 reply; 12+ messages in thread
From: Topi Miettinen @ 2020-10-22 22:24 UTC (permalink / raw)
  To: Kees Cook
  Cc: Szabolcs Nagy, Jeremy Linton, linux-arm-kernel, libc-alpha,
	systemd-devel, linux-kernel, Mark Rutland, Mark Brown,
	Dave Martin, Catalin Marinas, Will Deacon, Salvatore Mesoraca,
	kernel-hardening, linux-hardening

On 22.10.2020 23.02, Kees Cook wrote:
> On Thu, Oct 22, 2020 at 01:39:07PM +0300, Topi Miettinen wrote:
>> But I think SELinux has a more complete solution (execmem) which can track
>> the pages better than is possible with seccomp solution which has a very
>> narrow field of view. Maybe this facility could be made available to
>> non-SELinux systems, for example with prctl()? Then the in-kernel MDWX could
>> allow mprotect(PROT_EXEC | PROT_BTI) in case the backing file hasn't been
>> modified, the source filesystem isn't writable for the calling process and
>> the file descriptor isn't created with memfd_create().
> 
> Right. The problem here is that systemd is attempting to mediate a
> state change using only syscall details (i.e. with seccomp) instead of
> a stateful analysis. Using a MAC is likely the only sane way to do that.
> SELinux is a bit difficult to adjust "on the fly" the way systemd would
> like to do things, and the more dynamic approach seen with SARA[1] isn't
> yet in the kernel.

SARA looks interesting. What is missing is a prctl() to enable all W^X 
protections irrevocably for the current process, then systemd could 
enable it for services with MemoryDenyWriteExecute=yes.

I also didn't see specific measures against memfd_create() or file
system W&X, but perhaps those can be added later. Maybe pkey_mprotect() 
is not handled either unless it uses the same LSM hook as mprotect().

> Trying to enforce memory W^X protection correctly
> via seccomp isn't really going to work well, as far as I can see.

Not in general, but I think it can work well in context of system 
services. Then you can ensure that for a specific service, 
memfd_create() is blocked by seccomp and the file systems are W^X 
because of mount namespaces etc., so there should not be any means to 
construct arbitrary executable pages.

-Topi

* Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures
  2020-10-22 22:24       ` Topi Miettinen
@ 2020-10-23 17:52         ` Salvatore Mesoraca
  2020-10-24 11:34           ` Topi Miettinen
  0 siblings, 1 reply; 12+ messages in thread
From: Salvatore Mesoraca @ 2020-10-23 17:52 UTC (permalink / raw)
  To: Topi Miettinen
  Cc: Kees Cook, Szabolcs Nagy, Jeremy Linton, linux-arm-kernel,
	libc-alpha, systemd-devel, linux-kernel, Mark Rutland,
	Mark Brown, Dave Martin, Catalin Marinas, Will Deacon,
	Kernel Hardening, linux-hardening

Hi,

On Thu, 22 Oct 2020 at 23:24, Topi Miettinen <toiwoton@gmail.com> wrote:
> SARA looks interesting. What is missing is a prctl() to enable all W^X
> protections irrevocably for the current process, then systemd could
> enable it for services with MemoryDenyWriteExecute=yes.

SARA actually has a procattr[0] interface to do just that.
There is also a library[1] to help using it.

> I also didn't see specific measures against memfd_create() or file
> system W&X, but perhaps those can be added later.

You are right, there are no measures against those vectors.
It would be interesting to add them, though.

> Maybe pkey_mprotect()
> is not handled either unless it uses the same LSM hook as mprotect().

IIRC mprotect is implemented more or less as a pkey_mprotect with -1 as pkey.
The same LSM hook should cover both.

Salvatore

[0] https://lore.kernel.org/lkml/1562410493-8661-10-git-send-email-s.mesoraca16@gmail.com/
[1] https://github.com/smeso/libsara

* Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures
  2020-10-23 17:52         ` Salvatore Mesoraca
@ 2020-10-24 11:34           ` Topi Miettinen
  2020-10-24 14:12             ` Salvatore Mesoraca
  0 siblings, 1 reply; 12+ messages in thread
From: Topi Miettinen @ 2020-10-24 11:34 UTC (permalink / raw)
  To: Salvatore Mesoraca
  Cc: Kees Cook, Szabolcs Nagy, Jeremy Linton, linux-arm-kernel,
	libc-alpha, systemd-devel, linux-kernel, Mark Rutland,
	Mark Brown, Dave Martin, Catalin Marinas, Will Deacon,
	Kernel Hardening, linux-hardening

On 23.10.2020 20.52, Salvatore Mesoraca wrote:
> Hi,
> 
> On Thu, 22 Oct 2020 at 23:24, Topi Miettinen <toiwoton@gmail.com> wrote:
>> SARA looks interesting. What is missing is a prctl() to enable all W^X
>> protections irrevocably for the current process, then systemd could
>> enable it for services with MemoryDenyWriteExecute=yes.
> 
> SARA actually has a procattr[0] interface to do just that.
> There is also a library[1] to help using it.

That means that /proc has to be available and writable at that point, so 
setting up procattrs has to be done before mount namespaces are set up. 
In general, it would be nice for sandboxing facilities in kernel if 
there would be a way to start enforcing restrictions only at next 
execve(), like setexeccon() for SELinux and aa_change_onexec() for 
AppArmor. Otherwise the exact order of setting up various sandboxing 
options can be very tricky to arrange correctly, since each option may 
have a subtle effect on the sandboxing features enabled later. In case
of SARA, the operations done between shuffling the mount namespace and 
before execve() shouldn't be affected so it isn't important. Even if it 
did (a new sandboxing feature in the future would need trampolines or 
JIT code generation), maybe the procattr file could be opened early but 
it could be written closer to execve().

-Topi

* Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures
  2020-10-24 11:34           ` Topi Miettinen
@ 2020-10-24 14:12             ` Salvatore Mesoraca
  2020-10-25 13:42               ` Jordan Glover
  0 siblings, 1 reply; 12+ messages in thread
From: Salvatore Mesoraca @ 2020-10-24 14:12 UTC (permalink / raw)
  To: Topi Miettinen
  Cc: Kees Cook, Szabolcs Nagy, Jeremy Linton, linux-arm-kernel,
	libc-alpha, systemd-devel, linux-kernel, Mark Rutland,
	Mark Brown, Dave Martin, Catalin Marinas, Will Deacon,
	Kernel Hardening, linux-hardening

On Sat, 24 Oct 2020 at 12:34, Topi Miettinen <toiwoton@gmail.com> wrote:
>
> On 23.10.2020 20.52, Salvatore Mesoraca wrote:
> > Hi,
> >
> > On Thu, 22 Oct 2020 at 23:24, Topi Miettinen <toiwoton@gmail.com> wrote:
> >> SARA looks interesting. What is missing is a prctl() to enable all W^X
> >> protections irrevocably for the current process, then systemd could
> >> enable it for services with MemoryDenyWriteExecute=yes.
> >
> > SARA actually has a procattr[0] interface to do just that.
> > There is also a library[1] to help using it.
>
> That means that /proc has to be available and writable at that point, so
> setting up procattrs has to be done before mount namespaces are set up.
> In general, it would be nice for sandboxing facilities in kernel if
> there would be a way to start enforcing restrictions only at next
> execve(), like setexeccon() for SELinux and aa_change_onexec() for
> AppArmor. Otherwise the exact order of setting up various sandboxing
> options can be very tricky to arrange correctly, since each option may
> have a subtle effect on the sandboxing features enabled later. In case
> of SARA, the operations done between shuffling the mount namespace and
> before execve() shouldn't be affected so it isn't important. Even if it
> did (a new sandboxing feature in the future would need trampolines or
> JIT code generation), maybe the procattr file could be opened early but
> it could be written closer to execve().

A new "apply on exec" procattr file seems reasonable and relatively easy to add.
As Kees pointed out, the main obstacle here is the fact that SARA is
not upstream :(

Salvatore

* Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures
  2020-10-24 14:12             ` Salvatore Mesoraca
@ 2020-10-25 13:42               ` Jordan Glover
  0 siblings, 0 replies; 12+ messages in thread
From: Jordan Glover @ 2020-10-25 13:42 UTC (permalink / raw)
  To: Salvatore Mesoraca
  Cc: Topi Miettinen, Kees Cook, Szabolcs Nagy, Jeremy Linton,
	linux-arm-kernel, libc-alpha, systemd-devel, linux-kernel,
	Mark Rutland, Mark Brown, Dave Martin, Catalin Marinas,
	Will Deacon, Kernel Hardening, linux-hardening

On Saturday, October 24, 2020 2:12 PM, Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:

> On Sat, 24 Oct 2020 at 12:34, Topi Miettinen <toiwoton@gmail.com> wrote:
>
> > On 23.10.2020 20.52, Salvatore Mesoraca wrote:
> >
> > > Hi,
> > > On Thu, 22 Oct 2020 at 23:24, Topi Miettinen <toiwoton@gmail.com> wrote:
> > >
> > > > SARA looks interesting. What is missing is a prctl() to enable all W^X
> > > > protections irrevocably for the current process, then systemd could
> > > > enable it for services with MemoryDenyWriteExecute=yes.
> > >
> > > SARA actually has a procattr[0] interface to do just that.
> > > There is also a library[1] to help using it.
> >
> > That means that /proc has to be available and writable at that point, so
> > setting up procattrs has to be done before mount namespaces are set up.
> > In general, it would be nice for sandboxing facilities in kernel if
> > there would be a way to start enforcing restrictions only at next
> > execve(), like setexeccon() for SELinux and aa_change_onexec() for
> > AppArmor. Otherwise the exact order of setting up various sandboxing
> > options can be very tricky to arrange correctly, since each option may
> > have a subtle effect on the sandboxing features enabled later. In case
> > of SARA, the operations done between shuffling the mount namespace and
> > before execve() shouldn't be affected so it isn't important. Even if it
> > did (a new sandboxing feature in the future would need trampolines or
> > JIT code generation), maybe the procattr file could be opened early but
> > it could be written closer to execve().
>
> A new "apply on exec" procattr file seems reasonable and relatively easy to add.
> As Kees pointed out, the main obstacle here is the fact that SARA is
> not upstream :(
>
> Salvatore

Is there a chance we will see a new SARA iteration soon on lkml? :)

Jordan

* Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures
  2020-10-22 20:02     ` BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures Kees Cook
  2020-10-22 22:24       ` Topi Miettinen
@ 2020-10-23  9:02       ` Catalin Marinas
  2020-10-24 11:01         ` Topi Miettinen
  1 sibling, 1 reply; 12+ messages in thread
From: Catalin Marinas @ 2020-10-23  9:02 UTC (permalink / raw)
  To: Kees Cook
  Cc: Topi Miettinen, Szabolcs Nagy, Jeremy Linton, linux-arm-kernel,
	libc-alpha, systemd-devel, linux-kernel, Mark Rutland,
	Mark Brown, Dave Martin, Will Deacon, Salvatore Mesoraca,
	kernel-hardening, linux-hardening

On Thu, Oct 22, 2020 at 01:02:18PM -0700, Kees Cook wrote:
> Regardless, it makes sense to me to have the kernel load the executable
> itself with BTI enabled by default. I prefer gaining Catalin's suggested
> patch[2]. :)
[...]
> [2] https://lore.kernel.org/linux-arm-kernel/20201022093104.GB1229@gaia/

I think I first heard the idea from Mark R ;).

It still needs glibc changes to avoid the mprotect(), or at least ignore
the error. Since this is an ABI change and we don't know which kernels
would have it backported, maybe better to still issue the mprotect() but
ignore the failure.

-- 
Catalin

* Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures
  2020-10-23  9:02       ` Catalin Marinas
@ 2020-10-24 11:01         ` Topi Miettinen
  2020-10-26 14:52           ` Catalin Marinas
  0 siblings, 1 reply; 12+ messages in thread
From: Topi Miettinen @ 2020-10-24 11:01 UTC (permalink / raw)
  To: Catalin Marinas, Kees Cook
  Cc: Szabolcs Nagy, Jeremy Linton, linux-arm-kernel, libc-alpha,
	systemd-devel, linux-kernel, Mark Rutland, Mark Brown,
	Dave Martin, Will Deacon, Salvatore Mesoraca, kernel-hardening,
	linux-hardening

On 23.10.2020 12.02, Catalin Marinas wrote:
> On Thu, Oct 22, 2020 at 01:02:18PM -0700, Kees Cook wrote:
>> Regardless, it makes sense to me to have the kernel load the executable
>> itself with BTI enabled by default. I prefer gaining Catalin's suggested
>> patch[2]. :)
> [...]
>> [2] https://lore.kernel.org/linux-arm-kernel/20201022093104.GB1229@gaia/
> 
> I think I first heard the idea from Mark R ;).
> 
> It still needs glibc changes to avoid the mprotect(), or at least ignore
> the error. Since this is an ABI change and we don't know which kernels
> would have it backported, maybe better to still issue the mprotect() but
> ignore the failure.

What about kernel adding an auxiliary vector as a flag to indicate that 
BTI is supported and recommended by the kernel? Then dynamic loader 
could use that to detect that a) the main executable is BTI protected 
and there's no need to mprotect() it and b) PROT_BTI flag should be 
added to all PROT_EXEC pages.

In absence of the vector, the dynamic loader might choose to skip doing 
PROT_BTI at all (since the main executable isn't protected anyway 
either, or maybe even the kernel is up-to-date but it knows that it's 
not recommended for some reason, or maybe the kernel is so ancient that 
it doesn't know about BTI). Optionally it could still read the flag from 
ELF later (for compatibility with old kernels) and then do the 
mprotect() dance, which may trip seccomp filters, possibly fatally.

-Topi

* Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures
  2020-10-24 11:01         ` Topi Miettinen
@ 2020-10-26 14:52           ` Catalin Marinas
  2020-10-26 15:56             ` Dave Martin
  2020-10-26 16:31             ` Topi Miettinen
  0 siblings, 2 replies; 12+ messages in thread
From: Catalin Marinas @ 2020-10-26 14:52 UTC (permalink / raw)
  To: Topi Miettinen
  Cc: Kees Cook, Szabolcs Nagy, Jeremy Linton, linux-arm-kernel,
	libc-alpha, systemd-devel, linux-kernel, Mark Rutland,
	Mark Brown, Dave Martin, Will Deacon, Salvatore Mesoraca,
	kernel-hardening, linux-hardening

On Sat, Oct 24, 2020 at 02:01:30PM +0300, Topi Miettinen wrote:
> On 23.10.2020 12.02, Catalin Marinas wrote:
> > On Thu, Oct 22, 2020 at 01:02:18PM -0700, Kees Cook wrote:
> > > Regardless, it makes sense to me to have the kernel load the executable
> > > itself with BTI enabled by default. I prefer gaining Catalin's suggested
> > > patch[2]. :)
> > [...]
> > > [2] https://lore.kernel.org/linux-arm-kernel/20201022093104.GB1229@gaia/
> > 
> > I think I first heard the idea from Mark R ;).
> > 
> > It still needs glibc changes to avoid the mprotect(), or at least ignore
> > the error. Since this is an ABI change and we don't know which kernels
> > would have it backported, maybe better to still issue the mprotect() but
> > ignore the failure.
> 
> What about kernel adding an auxiliary vector as a flag to indicate that BTI
> is supported and recommended by the kernel? Then dynamic loader could use
> that to detect that a) the main executable is BTI protected and there's no
> need to mprotect() it and b) PROT_BTI flag should be added to all PROT_EXEC
> pages.

We could add a bit to AT_FLAGS, it's always been 0 for Linux.
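
A minimal sketch of how a loader might test such a bit, assuming one
were defined. The bit `AT_FLAGS_BTI_HINT` and both function names are
hypothetical; no such AT_FLAGS bit exists in the kernel. Only
getauxval() and AT_FLAGS are real interfaces.

```c
#include <assert.h>
#include <elf.h>
#include <sys/auxv.h>

/*
 * HYPOTHETICAL bit: not defined by any kernel. It stands in for the
 * "BTI is supported and recommended" hint proposed in this thread.
 */
#define AT_FLAGS_BTI_HINT (1UL << 0)

/* Pure helper so the decision can be checked without kernel support. */
int at_flags_has_bti_hint(unsigned long at_flags)
{
    return (at_flags & AT_FLAGS_BTI_HINT) != 0;
}

/* Reads the real AT_FLAGS entry of the running process. */
int kernel_recommends_bti(void)
{
    return at_flags_has_bti_hint(getauxval(AT_FLAGS));
}
```

Since getauxval() returns 0 for a missing entry, a kernel that does not
provide the hint and one that provides it cleared look the same to this
check.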

> In absence of the vector, the dynamic loader might choose to skip doing
> PROT_BTI at all (since the main executable isn't protected anyway either, or
> maybe even the kernel is up-to-date but it knows that it's not recommended
> for some reason, or maybe the kernel is so ancient that it doesn't know
> about BTI). Optionally it could still read the flag from ELF later (for
> compatibility with old kernels) and then do the mprotect() dance, which may
> trip seccomp filters, possibly fatally.

I think the safest is for the dynamic loader to issue an mprotect() and
ignore the EPERM error. Not all user deployments have this seccomp
filter, so they can still benefit, and user can't tell whether the
kernel change has been backported.

Now, if the dynamic loader silently ignores the mprotect() failure on
the main executable, is there much value in exposing a flag in the aux
vectors? It saves a few (one?) mprotect() calls but I don't think it
matters much. Anyway, I don't mind the flag.

The only potential risk is if the dynamic loader decides not to turn
PROT_BTI on because of some mix and match of objects but AFAIK BTI
allows interworking.

-- 
Catalin

* Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures
  2020-10-26 14:52           ` Catalin Marinas
@ 2020-10-26 15:56             ` Dave Martin
  2020-10-26 16:51               ` Mark Brown
  2020-10-26 16:31             ` Topi Miettinen
  1 sibling, 1 reply; 12+ messages in thread
From: Dave Martin @ 2020-10-26 15:56 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Topi Miettinen, Mark Rutland, Salvatore Mesoraca, systemd-devel,
	Kees Cook, kernel-hardening, Will Deacon, linux-kernel,
	Jeremy Linton, Mark Brown, linux-hardening, libc-alpha,
	linux-arm-kernel

On Mon, Oct 26, 2020 at 02:52:46PM +0000, Catalin Marinas via Libc-alpha wrote:
> On Sat, Oct 24, 2020 at 02:01:30PM +0300, Topi Miettinen wrote:
> > On 23.10.2020 12.02, Catalin Marinas wrote:
> > > On Thu, Oct 22, 2020 at 01:02:18PM -0700, Kees Cook wrote:
> > > > Regardless, it makes sense to me to have the kernel load the executable
> > > > itself with BTI enabled by default. I prefer gaining Catalin's suggested
> > > > patch[2]. :)
> > > [...]
> > > > [2] https://lore.kernel.org/linux-arm-kernel/20201022093104.GB1229@gaia/
> > > 
> > > I think I first heard the idea from Mark R ;).
> > > 
> > > It still needs glibc changes to avoid the mprotect(), or at least ignore
> > > the error. Since this is an ABI change and we don't know which kernels
> > > would have it backported, maybe better to still issue the mprotect() but
> > > ignore the failure.
> > 
> > What about kernel adding an auxiliary vector as a flag to indicate that BTI
> > is supported and recommended by the kernel? Then dynamic loader could use
> > that to detect that a) the main executable is BTI protected and there's no
> > need to mprotect() it and b) PROT_BTI flag should be added to all PROT_EXEC
> > pages.
> 
> We could add a bit to AT_FLAGS, it's always been 0 for Linux.
> 
> > In absence of the vector, the dynamic loader might choose to skip doing
> > PROT_BTI at all (since the main executable isn't protected anyway either, or
> > maybe even the kernel is up-to-date but it knows that it's not recommended
> > for some reason, or maybe the kernel is so ancient that it doesn't know
> > about BTI). Optionally it could still read the flag from ELF later (for
> > compatibility with old kernels) and then do the mprotect() dance, which may
> > trip seccomp filters, possibly fatally.
> 
> I think the safest is for the dynamic loader to issue an mprotect() and
> ignore the EPERM error. Not all user deployments have this seccomp
> filter, so they can still benefit, and user can't tell whether the
> kernel change has been backported.
> 
> Now, if the dynamic loader silently ignores the mprotect() failure on
> the main executable, is there much value in exposing a flag in the aux
> vectors? It saves a few (one?) mprotect() calls but I don't think it
> matters much. Anyway, I don't mind the flag.

I don't see a problem with the aforementioned patch [2] to pre-set BTI
on the pages of the main binary.

The original rationale here was that ld.so doesn't _need_ this, since it
is going to examine the binary's ELF headers anyway.  But equally, if
the binary is marked as supporting BTI then it's safe to enable BTI for
the binary's own pages.


I'd tend to agree that an AT_FLAGS flag doesn't add much.  I think real
EPERMs would only be seen in assert-fail type situations.  Failure of
mmap() is likely to result in a segfault later on, or correct operation
with weakened permissions on some pages.  Given the likely failure
modes, that situation doesn't feel too bad.


> The only potential risk is if the dynamic loader decides not to turn
> PROT_BTI on because of some mix and match of objects but AFAIK BTI
> allows interworking.

Yes, the design means that a page's PROT_BTI can be set safely if the
code in that page was compiled for BTI, irrespective of how other pages
were compiled.  The reasons why we don't do this at finer granularity
are (a) it's not very useful, and (b) ELF images only contain a BTI
property note for the whole image, not per segment.

I think that ld.so already makes this decision at ELF image granularity
(unless someone contradicts me).
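
For illustration, a sketch in C of reading the per-image BTI marking
from the descriptor of an NT_GNU_PROPERTY_TYPE_0 note. The function
name `gnu_property_has_bti` is hypothetical and this is not ld.so's
code; it assumes a 64-bit ELF image in native byte order, with the
caller having already located the note descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifndef GNU_PROPERTY_AARCH64_FEATURE_1_AND
#define GNU_PROPERTY_AARCH64_FEATURE_1_AND 0xc0000000u
#endif
#ifndef GNU_PROPERTY_AARCH64_FEATURE_1_BTI
#define GNU_PROPERTY_AARCH64_FEATURE_1_BTI (1u << 0)
#endif

/*
 * Each property is: u32 type, u32 datasz, then data padded to 8 bytes.
 * Returns 1 if the image is marked BTI-compatible, 0 if the property is
 * absent, the bit is clear, or the descriptor is malformed.
 */
int gnu_property_has_bti(const uint8_t *desc, size_t descsz)
{
    size_t off = 0;

    assert(desc != NULL);
    while (descsz - off >= 8) {
        uint32_t type, datasz;
        size_t step;

        memcpy(&type, desc + off, 4);
        memcpy(&datasz, desc + off + 4, 4);
        off += 8;
        if (datasz > descsz - off)
            return 0;                   /* truncated property */

        if (type == GNU_PROPERTY_AARCH64_FEATURE_1_AND && datasz >= 4) {
            uint32_t features;
            memcpy(&features, desc + off, 4);
            return (features & GNU_PROPERTY_AARCH64_FEATURE_1_BTI) != 0;
        }

        /* Skip the data, rounded up to the next 8-byte boundary. */
        step = ((size_t)datasz + 7u) & ~(size_t)7u;
        if (step > descsz - off)
            return 0;
        off += step;
    }
    return 0;
}
```

The result is a single yes/no answer for the whole image, which matches
the granularity described above: there is nothing in the note to tell
one executable segment from another.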

Cheers
---Dave

* Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures
  2020-10-26 15:56             ` Dave Martin
@ 2020-10-26 16:51               ` Mark Brown
  0 siblings, 0 replies; 12+ messages in thread
From: Mark Brown @ 2020-10-26 16:51 UTC (permalink / raw)
  To: Dave Martin
  Cc: Catalin Marinas, Topi Miettinen, Mark Rutland,
	Salvatore Mesoraca, systemd-devel, Kees Cook, kernel-hardening,
	Will Deacon, linux-kernel, Jeremy Linton, linux-hardening,
	libc-alpha, linux-arm-kernel

On Mon, Oct 26, 2020 at 03:56:35PM +0000, Dave Martin wrote:
> On Mon, Oct 26, 2020 at 02:52:46PM +0000, Catalin Marinas via Libc-alpha wrote:

> > Now, if the dynamic loader silently ignores the mprotect() failure on
> > the main executable, is there much value in exposing a flag in the aux
> > vectors? It saves a few (one?) mprotect() calls but I don't think it
> > matters much. Anyway, I don't mind the flag.

> I don't see a problem with the aforementioned patch [2] to pre-set BTI
> on the pages of the main binary.

Me either FWIW.

* Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures
  2020-10-26 14:52           ` Catalin Marinas
  2020-10-26 15:56             ` Dave Martin
@ 2020-10-26 16:31             ` Topi Miettinen
  1 sibling, 0 replies; 12+ messages in thread
From: Topi Miettinen @ 2020-10-26 16:31 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Kees Cook, Szabolcs Nagy, Jeremy Linton, linux-arm-kernel,
	libc-alpha, systemd-devel, linux-kernel, Mark Rutland,
	Mark Brown, Dave Martin, Will Deacon, Salvatore Mesoraca,
	kernel-hardening, linux-hardening

On 26.10.2020 16.52, Catalin Marinas wrote:
> On Sat, Oct 24, 2020 at 02:01:30PM +0300, Topi Miettinen wrote:
>> On 23.10.2020 12.02, Catalin Marinas wrote:
>>> On Thu, Oct 22, 2020 at 01:02:18PM -0700, Kees Cook wrote:
>>>> Regardless, it makes sense to me to have the kernel load the executable
>>>> itself with BTI enabled by default. I prefer gaining Catalin's suggested
>>>> patch[2]. :)
>>> [...]
>>>> [2] https://lore.kernel.org/linux-arm-kernel/20201022093104.GB1229@gaia/
>>>
>>> I think I first heard the idea from Mark R ;).
>>>
>>> It still needs glibc changes to avoid the mprotect(), or at least ignore
>>> the error. Since this is an ABI change and we don't know which kernels
>>> would have it backported, maybe better to still issue the mprotect() but
>>> ignore the failure.
>>
>> What about kernel adding an auxiliary vector as a flag to indicate that BTI
>> is supported and recommended by the kernel? Then dynamic loader could use
>> that to detect that a) the main executable is BTI protected and there's no
>> need to mprotect() it and b) PROT_BTI flag should be added to all PROT_EXEC
>> pages.
> 
> We could add a bit to AT_FLAGS, it's always been 0 for Linux.

Great!

>> In absence of the vector, the dynamic loader might choose to skip doing
>> PROT_BTI at all (since the main executable isn't protected anyway either, or
>> maybe even the kernel is up-to-date but it knows that it's not recommended
>> for some reason, or maybe the kernel is so ancient that it doesn't know
>> about BTI). Optionally it could still read the flag from ELF later (for
>> compatibility with old kernels) and then do the mprotect() dance, which may
>> trip seccomp filters, possibly fatally.
> 
> I think the safest is for the dynamic loader to issue an mprotect() and
> ignore the EPERM error. Not all user deployments have this seccomp
> filter, so they can still benefit, and user can't tell whether the
> kernel change has been backported.

But the seccomp filter can be set to kill the process, so that's 
definitely not the safest way. I think safest is that when the AT_FLAGS 
bit is seen, ld.so doesn't do any mprotect() calls but instead when 
mapping the segments, mmap() flags are adjusted to include PROT_BTI, so 
mprotect() calls are not necessary. If there's no seccomp filter, 
there's no disadvantage for avoiding the useless mprotect() calls.
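
A small C sketch of that mapping-time choice, under assumptions: the
function name `segment_mmap_prot` and its two boolean inputs are
hypothetical, and `PROT_BTI` is defined locally with the arm64 value
(0x10) in case the headers lack it. It only computes the flags a loader
would pass to mmap(); it does not map anything.

```c
#include <assert.h>
#include <sys/mman.h>

/* arm64 value of PROT_BTI; defined here in case the headers lack it. */
#ifndef PROT_BTI
#define PROT_BTI 0x10
#endif

/*
 * Compute the protection flags for mapping one segment.
 *   bti_hint      - hypothetical "kernel recommends BTI" signal
 *   image_has_bti - whether the ELF image is marked BTI-compatible
 * PROT_BTI is added only to executable mappings, at mmap() time, so no
 * later mprotect() call is needed for it.
 */
int segment_mmap_prot(int prot, int bti_hint, int image_has_bti)
{
    if ((prot & PROT_EXEC) && bti_hint && image_has_bti)
        prot |= PROT_BTI;
    return prot;
}
```

Because the flag is part of the initial mapping request, a filter that
only inspects mprotect() never sees a call that adds PROT_BTI.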

I'd expect the backported kernel change to include both aux vector and 
also using PROT_BTI for the main executable. Then the logic would work 
with backported kernels as well.

If there's no aux vector, all bets are off. The kernel could be old and 
unpatched, even so old that PROT_BTI is not known. Perhaps also in the 
future there may be new technologies which have replaced BTI and the 
kernel could want a previous generation ld.so not to try to use BTI, so 
this could be also indicated with the lack of aux vector. The dynamic 
loader could still attempt to mprotect() the pages, but that could be 
fatal. Getting to the point where the error can be ignored means that 
there's no seccomp filter, at least none set to kill. Perhaps the pain 
is only temporary, new or patched kernels should eventually replace the 
old versions.

> Now, if the dynamic loader silently ignores the mprotect() failure on
> the main executable, is there much value in exposing a flag in the aux
> vectors? It saves a few (one?) mprotect() calls but I don't think it
> matters much. Anyway, I don't mind the flag.

Saving a few system calls is indeed not an issue, but not being able to 
use MDWX and PROT_BTI simultaneously was the original problem (service 
failures).

-Topi
