From: Dave Martin <Dave.Martin@arm.com>
To: Jeremy Linton <jeremy.linton@arm.com>
Cc: Szabolcs Nagy <szabolcs.nagy@arm.com>,
	Mark Rutland <mark.rutland@arm.com>,
	systemd-devel@lists.freedesktop.org,
	Kees Cook <keescook@chromium.org>,
	Catalin Marinas <Catalin.Marinas@arm.com>,
	Will Deacon <will.deacon@arm.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Mark Brown <broonie@kernel.org>,
	toiwoton@gmail.com, libc-alpha@sourceware.org,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures
Date: Tue, 27 Oct 2020 14:15:22 +0000
Message-ID: <20201027141522.GD27285@arm.com>
In-Reply-To: <45c64b49-a38b-4b0c-d9cf-6c586dacbcc9@arm.com>

On Mon, Oct 26, 2020 at 05:39:42PM -0500, Jeremy Linton via Libc-alpha wrote:
> Hi,
> 
> On 10/26/20 12:52 PM, Dave Martin wrote:
> >On Mon, Oct 26, 2020 at 04:57:55PM +0000, Szabolcs Nagy via Libc-alpha wrote:
> >>The 10/26/2020 16:24, Dave Martin via Libc-alpha wrote:
> >>>Unrolling this discussion a bit, this problem comes from a few sources:
> >>>
> >>>1) systemd is trying to implement a policy that doesn't fit SECCOMP
> >>>syscall filtering very well.
> >>>
> >>>2) The program is trying to do something not expressible through the
> >>>syscall interface: really the intent is to set PROT_BTI on the page,
> >>>with no intent to set PROT_EXEC on any page that didn't already have it
> >>>set.
> >>>
> >>>
> >>>This limitation of mprotect() was known when I originally added PROT_BTI,
> >>>but at that time we weren't aware of a clear use case that would fail.
> >>>
> >>>
> >>>Would it now help to add something like:
> >>>
> >>>int mchangeprot(void *addr, size_t len, int old_flags, int new_flags)
> >>>{
> >>>	int ret = -EINVAL;
> >>>	mmap_write_lock(current->mm);
> >>>	if (all vmas in [addr .. addr + len) have
> >>>			their mprotect flags set to old_flags) {
> >>>
> >>>		ret = mprotect(addr, len, new_flags);
> >>>	}
> >>>	
> >>>	mmap_write_unlock(current->mm);
> >>>	return ret;
> >>>}
> >>
> >>if more prot flags are introduced then the exact
> >>match for old_flags may be restrictive and currently
> >>there is no way to query these flags to figure out
> >>how to toggle one prot flag in a future proof way,
> >>so i don't think this solves the issue completely.
> >
> >Ack -- I illustrated this model because it makes the seccomp filter's
> >job easy, but it does have limitations.
> >
> >>i think we might need a new api, given that aarch64
> >>now has PROT_BTI and PROT_MTE while existing code
> >>expects RWX only, but i don't know what api is best.
> >
> >An alternative option would be a call that sets / clears chosen
> >flags and leaves others unchanged.
> 
> I tend to favor a set/clear API, but that could also just be done by
> creating a new PROT_BTI_IF_X which enables BTI for areas already set to
> _EXEC. That goes right by the seccomp filters too, and actually is closer to
> what glibc wants to do anyway.

That works, though I'm not so keen on treating PROT_BTI as a special case,
since the problem is likely to recur when other weird per-arch flags get
added...

I also wonder whether we actually care whether the pages are marked
executable or not here; probably the flags can just be independent.  This
rather depends on how the architecture treats the BTI (a.k.a.
GP) pagetable bit for non-executable pages.  I have a feeling we already
allow PROT_BTI && !PROT_EXEC through anyway.


What about a generic-ish set/clear interface that still works by just
adding a couple of PROT_ flags:

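	switch (flags & (PROT_SET | PROT_CLEAR)) {
	/* Mask out the control bit so it never ends up stored in prot. */
	case PROT_SET: prot |= flags & ~PROT_SET; break;
	case PROT_CLEAR: prot &= ~(flags & ~PROT_CLEAR); break;
	case 0: prot = flags; break;

	default:	/* PROT_SET and PROT_CLEAR together */
		return -EINVAL;
	}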

This can't atomically set some flags while clearing some others, but for
simple stuff it seems sufficient and shouldn't be too invasive on the
kernel side.

We will still have to take the mm lock when doing a SET or CLEAR, but
not for the non-set/clear case.
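
To make the intended semantics concrete, here is a minimal userspace model
of that switch.  It is only a sketch: PROT_SET and PROT_CLEAR are
hypothetical flags that exist in no kernel, the bit values chosen for them
are arbitrary, and resolve_prot() is an illustrative helper name rather
than anything in the mprotect() path.

```c
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>	/* PROT_READ, PROT_WRITE, PROT_EXEC */

#ifndef PROT_BTI
#define PROT_BTI	0x10		/* arm64 value; absent on other arches */
#endif

/* Hypothetical flags: not in any kernel, bit values picked arbitrarily. */
#define PROT_SET	0x10000000
#define PROT_CLEAR	0x20000000

/*
 * Given a mapping's current protections and the prot argument passed to
 * mprotect(), return the resulting protections, or -EINVAL if both
 * control flags are given at once.
 */
static int resolve_prot(int cur, int flags)
{
	int bits = flags & ~(PROT_SET | PROT_CLEAR);

	/* A mapping's stored protections never contain the control flags. */
	assert(!(cur & (PROT_SET | PROT_CLEAR)));

	switch (flags & (PROT_SET | PROT_CLEAR)) {
	case PROT_SET:   return cur | bits;	/* add bits, keep the rest */
	case PROT_CLEAR: return cur & ~bits;	/* drop bits, keep the rest */
	case 0:          return bits;		/* classic mprotect(): replace */
	default:         return -EINVAL;
	}
}
```

Under this model, resolve_prot(PROT_READ | PROT_EXEC, PROT_SET | PROT_BTI)
yields PROT_READ | PROT_EXEC | PROT_BTI, whereas passing plain PROT_READ
replaces the protections wholesale, as mprotect() does today.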


Anyway, libc could now do:

	mprotect(addr, len, PROT_SET | PROT_BTI);

with much the same effect as your PROT_BTI_IF_X.
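
Why this would get past the seccomp filter can be modelled in a few lines.
The sketch assumes the filter rejects any mprotect() call whose prot
argument has PROT_EXEC set, which is how systemd's MemoryDenyWriteExecute=
is documented to behave; seccomp only sees the argument value, not the
state of the mapping.  PROT_SET is still hypothetical, and filter_denies()
and the two request helpers are illustrative names, not real APIs.

```c
#include <stdbool.h>
#include <sys/mman.h>	/* PROT_READ, PROT_EXEC */

#ifndef PROT_BTI
#define PROT_BTI	0x10		/* arm64 value; absent on other arches */
#endif
#define PROT_SET	0x10000000	/* hypothetical, arbitrary bit */

/* Model of a seccomp rule that only inspects mprotect()'s prot argument. */
static bool filter_denies(int prot_arg)
{
	return (prot_arg & PROT_EXEC) != 0;
}

/* What libc has to pass today to add BTI to an executable mapping. */
static int bti_request_today(void)
{
	return PROT_READ | PROT_EXEC | PROT_BTI;
}

/* What libc could pass if the hypothetical PROT_SET flag existed. */
static int bti_request_with_set(void)
{
	return PROT_SET | PROT_BTI;
}
```

In this model the request libc makes today is denied because it has to
restate PROT_EXEC, while the PROT_SET form is allowed because it names
only the bit being added.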


JITting or breakpoint-setting code that wants to change the permissions
temporarily, without needing to know whether PROT_BTI is set, could say:

	mprotect(addr, len, PROT_SET | PROT_WRITE);
	*addr = BKPT_INSN;
	mprotect(addr, len, PROT_CLEAR | PROT_WRITE);
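
A short model of that round trip, under the same assumed semantics
(PROT_SET and PROT_CLEAR are hypothetical with arbitrary bit values;
apply_prot() and after_breakpoint_patch() are illustrative helpers, not
kernel functions).  It shows PROT_BTI surviving the temporary PROT_WRITE,
and also one limitation: if the page was already writable, the closing
PROT_CLEAR removes a permission the caller never added.

```c
#include <sys/mman.h>	/* PROT_READ, PROT_WRITE, PROT_EXEC */

#ifndef PROT_BTI
#define PROT_BTI	0x10		/* arm64 value; absent on other arches */
#endif
#define PROT_SET	0x10000000	/* hypothetical */
#define PROT_CLEAR	0x20000000	/* hypothetical */

/* Apply one set or clear request to a mapping's current protections. */
static int apply_prot(int cur, int flags)
{
	int bits = flags & ~(PROT_SET | PROT_CLEAR);

	return (flags & PROT_SET) ? (cur | bits) : (cur & ~bits);
}

/* Protections left behind by the set-write, patch, clear-write sequence. */
static int after_breakpoint_patch(int cur)
{
	cur = apply_prot(cur, PROT_SET | PROT_WRITE);
	/* ... the breakpoint instruction would be written here ... */
	return apply_prot(cur, PROT_CLEAR | PROT_WRITE);
}
```

Starting from PROT_READ | PROT_EXEC | PROT_BTI the sequence ends where it
began, but starting from PROT_READ | PROT_WRITE it ends at PROT_READ.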


Thoughts?

I won't claim this doesn't still have some limitations...

Cheers
---Dave
