* Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures [not found] ` <78464155-f459-773f-d0ee-c5bdbeb39e5d@gmail.com> @ 2020-10-22 20:02 ` Kees Cook 2020-10-22 22:24 ` Topi Miettinen 2020-10-23 9:02 ` Catalin Marinas 0 siblings, 2 replies; 12+ messages in thread From: Kees Cook @ 2020-10-22 20:02 UTC (permalink / raw) To: Topi Miettinen Cc: Szabolcs Nagy, Jeremy Linton, linux-arm-kernel, libc-alpha, systemd-devel, linux-kernel, Mark Rutland, Mark Brown, Dave Martin, Catalin Marinas, Will Deacon, Salvatore Mesoraca, kernel-hardening, linux-hardening On Thu, Oct 22, 2020 at 01:39:07PM +0300, Topi Miettinen wrote: > But I think SELinux has a more complete solution (execmem) which can track > the pages better than is possible with seccomp solution which has a very > narrow field of view. Maybe this facility could be made available to > non-SELinux systems, for example with prctl()? Then the in-kernel MDWX could > allow mprotect(PROT_EXEC | PROT_BTI) in case the backing file hasn't been > modified, the source filesystem isn't writable for the calling process and > the file descriptor isn't created with memfd_create(). Right. The problem here is that systemd is attempting to mediate a state change using only syscall details (i.e. with seccomp) instead of a stateful analysis. Using a MAC is likely the only sane way to do that. SELinux is a bit difficult to adjust "on the fly" the way systemd would like to do things, and the more dynamic approach seen with SARA[1] isn't yet in the kernel. Trying to enforce memory W^X protection correctly via seccomp isn't really going to work well, as far as I can see. Regardless, it makes sense to me to have the kernel load the executable itself with BTI enabled by default. I prefer gaining Catalin's suggested patch[2]. :) [1] https://lore.kernel.org/kernel-hardening/1562410493-8661-1-git-send-email-s.mesoraca16@gmail.com/ [2] https://lore.kernel.org/linux-arm-kernel/20201022093104.GB1229@gaia/ -- Kees Cook ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures 2020-10-22 20:02 ` BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures Kees Cook @ 2020-10-22 22:24 ` Topi Miettinen 2020-10-23 17:52 ` Salvatore Mesoraca 2020-10-23 9:02 ` Catalin Marinas 1 sibling, 1 reply; 12+ messages in thread From: Topi Miettinen @ 2020-10-22 22:24 UTC (permalink / raw) To: Kees Cook Cc: Szabolcs Nagy, Jeremy Linton, linux-arm-kernel, libc-alpha, systemd-devel, linux-kernel, Mark Rutland, Mark Brown, Dave Martin, Catalin Marinas, Will Deacon, Salvatore Mesoraca, kernel-hardening, linux-hardening On 22.10.2020 23.02, Kees Cook wrote: > On Thu, Oct 22, 2020 at 01:39:07PM +0300, Topi Miettinen wrote: >> But I think SELinux has a more complete solution (execmem) which can track >> the pages better than is possible with seccomp solution which has a very >> narrow field of view. Maybe this facility could be made available to >> non-SELinux systems, for example with prctl()? Then the in-kernel MDWX could >> allow mprotect(PROT_EXEC | PROT_BTI) in case the backing file hasn't been >> modified, the source filesystem isn't writable for the calling process and >> the file descriptor isn't created with memfd_create(). > > Right. The problem here is that systemd is attempting to mediate a > state change using only syscall details (i.e. with seccomp) instead of > a stateful analysis. Using a MAC is likely the only sane way to do that. > SELinux is a bit difficult to adjust "on the fly" the way systemd would > like to do things, and the more dynamic approach seen with SARA[1] isn't > yet in the kernel. SARA looks interesting. What is missing is a prctl() to enable all W^X protections irrevocably for the current process, then systemd could enable it for services with MemoryDenyWriteExecute=yes. I didn't also see specific measures against memfd_create() or file system W&X, but perhaps those can be added later. Maybe pkey_mprotect() is not handled either unless it uses the same LSM hook as mprotect(). > Trying to enforce memory W^X protection correctly > via seccomp isn't really going to work well, as far as I can see. Not in general, but I think it can work well in context of system services. Then you can ensure that for a specific service, memfd_create() is blocked by seccomp and the file systems are W^X because of mount namespaces etc., so there should not be any means to construct arbitrary executable pages. -Topi ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures 2020-10-22 22:24 ` Topi Miettinen @ 2020-10-23 17:52 ` Salvatore Mesoraca 2020-10-24 11:34 ` Topi Miettinen 0 siblings, 1 reply; 12+ messages in thread From: Salvatore Mesoraca @ 2020-10-23 17:52 UTC (permalink / raw) To: Topi Miettinen Cc: Kees Cook, Szabolcs Nagy, Jeremy Linton, linux-arm-kernel, libc-alpha, systemd-devel, linux-kernel, Mark Rutland, Mark Brown, Dave Martin, Catalin Marinas, Will Deacon, Kernel Hardening, linux-hardening Hi, On Thu, 22 Oct 2020 at 23:24, Topi Miettinen <toiwoton@gmail.com> wrote: > SARA looks interesting. What is missing is a prctl() to enable all W^X > protections irrevocably for the current process, then systemd could > enable it for services with MemoryDenyWriteExecute=yes. SARA actually has a procattr[0] interface to do just that. There is also a library[1] to help using it. > I didn't also see specific measures against memfd_create() or file > system W&X, but perhaps those can be added later. You are right, there are no measures against those vectors. It would be interesting to add them, though. > Maybe pkey_mprotect() > is not handled either unless it uses the same LSM hook as mprotect(). IIRC mprotect is implemented more or less as a pkey_mprotect with -1 as pkey. The same LSM hook should cover both. Salvatore [0] https://lore.kernel.org/lkml/1562410493-8661-10-git-send-email-s.mesoraca16@gmail.com/ [1] https://github.com/smeso/libsara ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures 2020-10-23 17:52 ` Salvatore Mesoraca @ 2020-10-24 11:34 ` Topi Miettinen 2020-10-24 14:12 ` Salvatore Mesoraca 0 siblings, 1 reply; 12+ messages in thread From: Topi Miettinen @ 2020-10-24 11:34 UTC (permalink / raw) To: Salvatore Mesoraca Cc: Kees Cook, Szabolcs Nagy, Jeremy Linton, linux-arm-kernel, libc-alpha, systemd-devel, linux-kernel, Mark Rutland, Mark Brown, Dave Martin, Catalin Marinas, Will Deacon, Kernel Hardening, linux-hardening On 23.10.2020 20.52, Salvatore Mesoraca wrote: > Hi, > > On Thu, 22 Oct 2020 at 23:24, Topi Miettinen <toiwoton@gmail.com> wrote: >> SARA looks interesting. What is missing is a prctl() to enable all W^X >> protections irrevocably for the current process, then systemd could >> enable it for services with MemoryDenyWriteExecute=yes. > > SARA actually has a procattr[0] interface to do just that. > There is also a library[1] to help using it. That means that /proc has to be available and writable at that point, so setting up procattrs has to be done before mount namespaces are set up. In general, it would be nice for sandboxing facilities in kernel if there would be a way to start enforcing restrictions only at next execve(), like setexeccon() for SELinux and aa_change_onexec() for AppArmor. Otherwise the exact order of setting up various sandboxing options can be very tricky to arrange correctly, since each option may have a subtle effect to the sandboxing features enabled later. In case of SARA, the operations done between shuffling the mount namespace and before execve() shouldn't be affected so it isn't important. Even if it did (a new sandboxing feature in the future would need trampolines or JIT code generation), maybe the procattr file could be opened early but it could be written closer to execve(). -Topi ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures 2020-10-24 11:34 ` Topi Miettinen @ 2020-10-24 14:12 ` Salvatore Mesoraca 2020-10-25 13:42 ` Jordan Glover 0 siblings, 1 reply; 12+ messages in thread From: Salvatore Mesoraca @ 2020-10-24 14:12 UTC (permalink / raw) To: Topi Miettinen Cc: Kees Cook, Szabolcs Nagy, Jeremy Linton, linux-arm-kernel, libc-alpha, systemd-devel, linux-kernel, Mark Rutland, Mark Brown, Dave Martin, Catalin Marinas, Will Deacon, Kernel Hardening, linux-hardening On Sat, 24 Oct 2020 at 12:34, Topi Miettinen <toiwoton@gmail.com> wrote: > > On 23.10.2020 20.52, Salvatore Mesoraca wrote: > > Hi, > > > > On Thu, 22 Oct 2020 at 23:24, Topi Miettinen <toiwoton@gmail.com> wrote: > >> SARA looks interesting. What is missing is a prctl() to enable all W^X > >> protections irrevocably for the current process, then systemd could > >> enable it for services with MemoryDenyWriteExecute=yes. > > > > SARA actually has a procattr[0] interface to do just that. > > There is also a library[1] to help using it. > > That means that /proc has to be available and writable at that point, so > setting up procattrs has to be done before mount namespaces are set up. > In general, it would be nice for sandboxing facilities in kernel if > there would be a way to start enforcing restrictions only at next > execve(), like setexeccon() for SELinux and aa_change_onexec() for > AppArmor. Otherwise the exact order of setting up various sandboxing > options can be very tricky to arrange correctly, since each option may > have a subtle effect to the sandboxing features enabled later. In case > of SARA, the operations done between shuffling the mount namespace and > before execve() shouldn't be affected so it isn't important. Even if it > did (a new sandboxing feature in the future would need trampolines or > JIT code generation), maybe the procattr file could be opened early but > it could be written closer to execve(). A new "apply on exec" procattr file seems reasonable and relatively easy to add. As Kees pointed out, the main obstacle here is the fact that SARA is not upstream :( Salvatore ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures 2020-10-24 14:12 ` Salvatore Mesoraca @ 2020-10-25 13:42 ` Jordan Glover 0 siblings, 0 replies; 12+ messages in thread From: Jordan Glover @ 2020-10-25 13:42 UTC (permalink / raw) To: Salvatore Mesoraca Cc: Topi Miettinen, Kees Cook, Szabolcs Nagy, Jeremy Linton, linux-arm-kernel, libc-alpha, systemd-devel, linux-kernel, Mark Rutland, Mark Brown, Dave Martin, Catalin Marinas, Will Deacon, Kernel Hardening, linux-hardening On Saturday, October 24, 2020 2:12 PM, Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote: > On Sat, 24 Oct 2020 at 12:34, Topi Miettinen toiwoton@gmail.com wrote: > > > On 23.10.2020 20.52, Salvatore Mesoraca wrote: > > > > > Hi, > > > On Thu, 22 Oct 2020 at 23:24, Topi Miettinen toiwoton@gmail.com wrote: > > > > > > > SARA looks interesting. What is missing is a prctl() to enable all W^X > > > > protections irrevocably for the current process, then systemd could > > > > enable it for services with MemoryDenyWriteExecute=yes. > > > > > > SARA actually has a procattr[0] interface to do just that. > > > There is also a library[1] to help using it. > > > > That means that /proc has to be available and writable at that point, so > > setting up procattrs has to be done before mount namespaces are set up. > > In general, it would be nice for sandboxing facilities in kernel if > > there would be a way to start enforcing restrictions only at next > > execve(), like setexeccon() for SELinux and aa_change_onexec() for > > AppArmor. Otherwise the exact order of setting up various sandboxing > > options can be very tricky to arrange correctly, since each option may > > have a subtle effect to the sandboxing features enabled later. In case > > of SARA, the operations done between shuffling the mount namespace and > > before execve() shouldn't be affected so it isn't important. Even if it > > did (a new sandboxing feature in the future would need trampolines or > > JIT code generation), maybe the procattr file could be opened early but > > it could be written closer to execve(). > > A new "apply on exec" procattr file seems reasonable and relatively easy to add. > As Kees pointed out, the main obstacle here is the fact that SARA is > not upstream :( > > Salvatore Is there a chance we will see new SARA iteration soon on lkml? :) Jordan ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures 2020-10-22 20:02 ` BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures Kees Cook 2020-10-22 22:24 ` Topi Miettinen @ 2020-10-23 9:02 ` Catalin Marinas 2020-10-24 11:01 ` Topi Miettinen 1 sibling, 1 reply; 12+ messages in thread From: Catalin Marinas @ 2020-10-23 9:02 UTC (permalink / raw) To: Kees Cook Cc: Topi Miettinen, Szabolcs Nagy, Jeremy Linton, linux-arm-kernel, libc-alpha, systemd-devel, linux-kernel, Mark Rutland, Mark Brown, Dave Martin, Will Deacon, Salvatore Mesoraca, kernel-hardening, linux-hardening On Thu, Oct 22, 2020 at 01:02:18PM -0700, Kees Cook wrote: > Regardless, it makes sense to me to have the kernel load the executable > itself with BTI enabled by default. I prefer gaining Catalin's suggested > patch[2]. :) [...] > [2] https://lore.kernel.org/linux-arm-kernel/20201022093104.GB1229@gaia/ I think I first heard the idea at Mark R ;). It still needs glibc changes to avoid the mprotect(), or at least ignore the error. Since this is an ABI change and we don't know which kernels would have it backported, maybe better to still issue the mprotect() but ignore the failure. -- Catalin ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures 2020-10-23 9:02 ` Catalin Marinas @ 2020-10-24 11:01 ` Topi Miettinen 2020-10-26 14:52 ` Catalin Marinas 0 siblings, 1 reply; 12+ messages in thread From: Topi Miettinen @ 2020-10-24 11:01 UTC (permalink / raw) To: Catalin Marinas, Kees Cook Cc: Szabolcs Nagy, Jeremy Linton, linux-arm-kernel, libc-alpha, systemd-devel, linux-kernel, Mark Rutland, Mark Brown, Dave Martin, Will Deacon, Salvatore Mesoraca, kernel-hardening, linux-hardening On 23.10.2020 12.02, Catalin Marinas wrote: > On Thu, Oct 22, 2020 at 01:02:18PM -0700, Kees Cook wrote: >> Regardless, it makes sense to me to have the kernel load the executable >> itself with BTI enabled by default. I prefer gaining Catalin's suggested >> patch[2]. :) > [...] >> [2] https://lore.kernel.org/linux-arm-kernel/20201022093104.GB1229@gaia/ > > I think I first heard the idea at Mark R ;). > > It still needs glibc changes to avoid the mprotect(), or at least ignore > the error. Since this is an ABI change and we don't know which kernels > would have it backported, maybe better to still issue the mprotect() but > ignore the failure. What about kernel adding an auxiliary vector as a flag to indicate that BTI is supported and recommended by the kernel? Then dynamic loader could use that to detect that a) the main executable is BTI protected and there's no need to mprotect() it and b) PROT_BTI flag should be added to all PROT_EXEC pages. In absence of the vector, the dynamic loader might choose to skip doing PROT_BTI at all (since the main executable isn't protected anyway either, or maybe even the kernel is up-to-date but it knows that it's not recommended for some reason, or maybe the kernel is so ancient that it doesn't know about BTI). Optionally it could still read the flag from ELF later (for compatibility with old kernels) and then do the mprotect() dance, which may trip seccomp filters, possibly fatally. -Topi ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures 2020-10-24 11:01 ` Topi Miettinen @ 2020-10-26 14:52 ` Catalin Marinas 2020-10-26 15:56 ` Dave Martin 2020-10-26 16:31 ` Topi Miettinen 0 siblings, 2 replies; 12+ messages in thread From: Catalin Marinas @ 2020-10-26 14:52 UTC (permalink / raw) To: Topi Miettinen Cc: Kees Cook, Szabolcs Nagy, Jeremy Linton, linux-arm-kernel, libc-alpha, systemd-devel, linux-kernel, Mark Rutland, Mark Brown, Dave Martin, Will Deacon, Salvatore Mesoraca, kernel-hardening, linux-hardening On Sat, Oct 24, 2020 at 02:01:30PM +0300, Topi Miettinen wrote: > On 23.10.2020 12.02, Catalin Marinas wrote: > > On Thu, Oct 22, 2020 at 01:02:18PM -0700, Kees Cook wrote: > > > Regardless, it makes sense to me to have the kernel load the executable > > > itself with BTI enabled by default. I prefer gaining Catalin's suggested > > > patch[2]. :) > > [...] > > > [2] https://lore.kernel.org/linux-arm-kernel/20201022093104.GB1229@gaia/ > > > > I think I first heard the idea at Mark R ;). > > > > It still needs glibc changes to avoid the mprotect(), or at least ignore > > the error. Since this is an ABI change and we don't know which kernels > > would have it backported, maybe better to still issue the mprotect() but > > ignore the failure. > > What about kernel adding an auxiliary vector as a flag to indicate that BTI > is supported and recommended by the kernel? Then dynamic loader could use > that to detect that a) the main executable is BTI protected and there's no > need to mprotect() it and b) PROT_BTI flag should be added to all PROT_EXEC > pages. We could add a bit to AT_FLAGS, it's always been 0 for Linux. > In absence of the vector, the dynamic loader might choose to skip doing > PROT_BTI at all (since the main executable isn't protected anyway either, or > maybe even the kernel is up-to-date but it knows that it's not recommended > for some reason, or maybe the kernel is so ancient that it doesn't know > about BTI). Optionally it could still read the flag from ELF later (for > compatibility with old kernels) and then do the mprotect() dance, which may > trip seccomp filters, possibly fatally. I think the safest is for the dynamic loader to issue an mprotect() and ignore the EPERM error. Not all user deployments have this seccomp filter, so they can still benefit, and user can't tell whether the kernel change has been backported. Now, if the dynamic loader silently ignores the mprotect() failure on the main executable, is there much value in exposing a flag in the aux vectors? It saves a few (one?) mprotect() calls but I don't think it matters much. Anyway, I don't mind the flag. The only potential risk is if the dynamic loader decides not to turn PROT_BTI one because of some mix and match of objects but AFAIK BTI allows interworking. -- Catalin ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures 2020-10-26 14:52 ` Catalin Marinas @ 2020-10-26 15:56 ` Dave Martin 2020-10-26 16:51 ` Mark Brown 2020-10-26 16:31 ` Topi Miettinen 1 sibling, 1 reply; 12+ messages in thread From: Dave Martin @ 2020-10-26 15:56 UTC (permalink / raw) To: Catalin Marinas Cc: Topi Miettinen, Mark Rutland, Salvatore Mesoraca, systemd-devel, Kees Cook, kernel-hardening, Will Deacon, linux-kernel, Jeremy Linton, Mark Brown, linux-hardening, libc-alpha, linux-arm-kernel On Mon, Oct 26, 2020 at 02:52:46PM +0000, Catalin Marinas via Libc-alpha wrote: > On Sat, Oct 24, 2020 at 02:01:30PM +0300, Topi Miettinen wrote: > > On 23.10.2020 12.02, Catalin Marinas wrote: > > > On Thu, Oct 22, 2020 at 01:02:18PM -0700, Kees Cook wrote: > > > > Regardless, it makes sense to me to have the kernel load the executable > > > > itself with BTI enabled by default. I prefer gaining Catalin's suggested > > > > patch[2]. :) > > > [...] > > > > [2] https://lore.kernel.org/linux-arm-kernel/20201022093104.GB1229@gaia/ > > > > > > I think I first heard the idea at Mark R ;). > > > > > > It still needs glibc changes to avoid the mprotect(), or at least ignore > > > the error. Since this is an ABI change and we don't know which kernels > > > would have it backported, maybe better to still issue the mprotect() but > > > ignore the failure. > > > > What about kernel adding an auxiliary vector as a flag to indicate that BTI > > is supported and recommended by the kernel? Then dynamic loader could use > > that to detect that a) the main executable is BTI protected and there's no > > need to mprotect() it and b) PROT_BTI flag should be added to all PROT_EXEC > > pages. > > We could add a bit to AT_FLAGS, it's always been 0 for Linux. > > > In absence of the vector, the dynamic loader might choose to skip doing > > PROT_BTI at all (since the main executable isn't protected anyway either, or > > maybe even the kernel is up-to-date but it knows that it's not recommended > > for some reason, or maybe the kernel is so ancient that it doesn't know > > about BTI). Optionally it could still read the flag from ELF later (for > > compatibility with old kernels) and then do the mprotect() dance, which may > > trip seccomp filters, possibly fatally. > > I think the safest is for the dynamic loader to issue an mprotect() and > ignore the EPERM error. Not all user deployments have this seccomp > filter, so they can still benefit, and user can't tell whether the > kernel change has been backported. > > Now, if the dynamic loader silently ignores the mprotect() failure on > the main executable, is there much value in exposing a flag in the aux > vectors? It saves a few (one?) mprotect() calls but I don't think it > matters much. Anyway, I don't mind the flag. I don't see a problem with the aforementioned patch [2] to pre-set BTI on the pages of the main binary. The original rationale here was that ld.so doesn't _need_ this, since it is going to examine the binary's ELF headers anyway. But equally, if the binary is marked as supporting BTI then it's safe to enable BTI for the binary's own pages. I'd tend to agree that an AT_FLAGS flag doesn't add much. I think real EPERMs would only be seen in assert-fail type situations. Failure of mmap() is likely to result in a segfault later on, or correct operation with weakened permissions on some pages. Given the likely failure modes, that situation doesn't feel too bad. > The only potential risk is if the dynamic loader decides not to turn > PROT_BTI one because of some mix and match of objects but AFAIK BTI > allows interworking. Yes, the design means that a page's PROT_BTI can be set safely if the code in that page was compiled for BTI, irrespective of how other pages were compiled. The reasons why we don't do this at finer granularity are (a) is't not very useful, and (b) ELF images only contain a BTI property note for the whole image, not per segment. I think that ld.so already makes this decision at ELF image granularity (unless someone contradicts me). Cheers ---Dave ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures 2020-10-26 15:56 ` Dave Martin @ 2020-10-26 16:51 ` Mark Brown 0 siblings, 0 replies; 12+ messages in thread From: Mark Brown @ 2020-10-26 16:51 UTC (permalink / raw) To: Dave Martin Cc: Catalin Marinas, Topi Miettinen, Mark Rutland, Salvatore Mesoraca, systemd-devel, Kees Cook, kernel-hardening, Will Deacon, linux-kernel, Jeremy Linton, linux-hardening, libc-alpha, linux-arm-kernel [-- Attachment #1: Type: text/plain, Size: 541 bytes --] On Mon, Oct 26, 2020 at 03:56:35PM +0000, Dave Martin wrote: > On Mon, Oct 26, 2020 at 02:52:46PM +0000, Catalin Marinas via Libc-alpha wrote: > > Now, if the dynamic loader silently ignores the mprotect() failure on > > the main executable, is there much value in exposing a flag in the aux > > vectors? It saves a few (one?) mprotect() calls but I don't think it > > matters much. Anyway, I don't mind the flag. > I don't see a problem with the aforementioned patch [2] to pre-set BTI > on the pages of the main binary. Me either FWIW. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures 2020-10-26 14:52 ` Catalin Marinas 2020-10-26 15:56 ` Dave Martin @ 2020-10-26 16:31 ` Topi Miettinen 1 sibling, 0 replies; 12+ messages in thread From: Topi Miettinen @ 2020-10-26 16:31 UTC (permalink / raw) To: Catalin Marinas Cc: Kees Cook, Szabolcs Nagy, Jeremy Linton, linux-arm-kernel, libc-alpha, systemd-devel, linux-kernel, Mark Rutland, Mark Brown, Dave Martin, Will Deacon, Salvatore Mesoraca, kernel-hardening, linux-hardening On 26.10.2020 16.52, Catalin Marinas wrote: > On Sat, Oct 24, 2020 at 02:01:30PM +0300, Topi Miettinen wrote: >> On 23.10.2020 12.02, Catalin Marinas wrote: >>> On Thu, Oct 22, 2020 at 01:02:18PM -0700, Kees Cook wrote: >>>> Regardless, it makes sense to me to have the kernel load the executable >>>> itself with BTI enabled by default. I prefer gaining Catalin's suggested >>>> patch[2]. :) >>> [...] >>>> [2] https://lore.kernel.org/linux-arm-kernel/20201022093104.GB1229@gaia/ >>> >>> I think I first heard the idea at Mark R ;). >>> >>> It still needs glibc changes to avoid the mprotect(), or at least ignore >>> the error. Since this is an ABI change and we don't know which kernels >>> would have it backported, maybe better to still issue the mprotect() but >>> ignore the failure. >> >> What about kernel adding an auxiliary vector as a flag to indicate that BTI >> is supported and recommended by the kernel? Then dynamic loader could use >> that to detect that a) the main executable is BTI protected and there's no >> need to mprotect() it and b) PROT_BTI flag should be added to all PROT_EXEC >> pages. > > We could add a bit to AT_FLAGS, it's always been 0 for Linux. Great! >> In absence of the vector, the dynamic loader might choose to skip doing >> PROT_BTI at all (since the main executable isn't protected anyway either, or >> maybe even the kernel is up-to-date but it knows that it's not recommended >> for some reason, or maybe the kernel is so ancient that it doesn't know >> about BTI). Optionally it could still read the flag from ELF later (for >> compatibility with old kernels) and then do the mprotect() dance, which may >> trip seccomp filters, possibly fatally. > > I think the safest is for the dynamic loader to issue an mprotect() and > ignore the EPERM error. Not all user deployments have this seccomp > filter, so they can still benefit, and user can't tell whether the > kernel change has been backported. But the seccomp filter can be set to kill the process, so that's definitely not the safest way. I think safest is that when the AT_FLAGS bit is seen, ld.so doesn't do any mprotect() calls but instead when mapping the segments, mmap() flags are adjusted to include PROT_BTI, so mprotect() calls are not necessary. If there's no seccomp filter, there's no disadvantage for avoiding the useless mprotect() calls. I'd expect the backported kernel change to include both aux vector and also using PROT_BTI for the main executable. Then the logic would work with backported kernels as well. If there's no aux vector, all bets are off. The kernel could be old and unpatched, even so old that PROT_BTI is not known. Perhaps also in the future there may be new technologies which have replaced BTI and the kernel could want a previous generation ld.so not to try to use BTI, so this could be also indicated with the lack of aux vector. The dynamic loader could still attempt to mprotect() the pages, but that could be fatal. Getting to the point where the error can be ignored means that there's no seccomp filter, at least none set to kill. Perhaps the pain is only temporary, new or patched kernels should eventually replace the old versions. > Now, if the dynamic loader silently ignores the mprotect() failure on > the main executable, is there much value in exposing a flag in the aux > vectors? It saves a few (one?) mprotect() calls but I don't think it > matters much. Anyway, I don't mind the flag. Saving a few system calls is indeed not an issue, but not being able to use MDWX and PROT_BTI simultaneously was the original problem (service failures). -Topi ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2020-10-26 17:28 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <8584c14f-5c28-9d70-c054-7c78127d84ea@arm.com> [not found] ` <20201022075447.GO3819@arm.com> [not found] ` <78464155-f459-773f-d0ee-c5bdbeb39e5d@gmail.com> 2020-10-22 20:02 ` BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures Kees Cook 2020-10-22 22:24 ` Topi Miettinen 2020-10-23 17:52 ` Salvatore Mesoraca 2020-10-24 11:34 ` Topi Miettinen 2020-10-24 14:12 ` Salvatore Mesoraca 2020-10-25 13:42 ` Jordan Glover 2020-10-23 9:02 ` Catalin Marinas 2020-10-24 11:01 ` Topi Miettinen 2020-10-26 14:52 ` Catalin Marinas 2020-10-26 15:56 ` Dave Martin 2020-10-26 16:51 ` Mark Brown 2020-10-26 16:31 ` Topi Miettinen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).