Re: [PATCH v6] arm64: Expose FAR_EL1 tag bits in sigcontext

From: Peter Collingbourne <pcc@google.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andrey Konovalov <andreyknvl@google.com>,
	Kevin Brodsky <kevin.brodsky@arm.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Kostya Serebryany <kcc@google.com>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Vincenzo Frascino <vincenzo.frascino@arm.com>,
	Will Deacon <will@kernel.org>, Dave Martin <Dave.Martin@arm.com>,
	Evgenii Stepanov <eugenis@google.com>,
	Richard Henderson <rth@twiddle.net>
Subject: Re: [PATCH v6] arm64: Expose FAR_EL1 tag bits in sigcontext
Date: Thu, 21 May 2020 11:03:52 -0700
Message-ID: <CAMn1gO6cgcP0O85BA_ire9j1L5zvN4i2JFRXO7R=MScXbmWG1g@mail.gmail.com>
In-Reply-To: <87zha1ea98.fsf@x220.int.ebiederm.org>

On Thu, May 21, 2020 at 5:39 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Peter Collingbourne <pcc@google.com> writes:
>
> > On Wed, May 20, 2020 at 2:26 AM Dave Martin <Dave.Martin@arm.com> wrote:
> >>
> >> On Wed, May 20, 2020 at 09:55:03AM +0100, Will Deacon wrote:
> >> > On Tue, May 19, 2020 at 03:00:12PM -0700, Peter Collingbourne wrote:
> >> > > On Mon, May 18, 2020 at 2:53 AM Dave Martin <Dave.Martin@arm.com> wrote:
> >> > > > On Thu, May 14, 2020 at 05:58:21PM -0700, Peter Collingbourne wrote:
> >> > > > > diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> >> > > > > index baa88dc02e5c..5867f2fdbe64 100644
> >> > > > > --- a/arch/arm64/kernel/signal.c
> >> > > > > +++ b/arch/arm64/kernel/signal.c
> >> > > > > @@ -648,6 +648,7 @@ static int setup_sigframe(struct rt_sigframe_user_layout *user,
> >> > > > >                 __put_user_error(ESR_MAGIC, &esr_ctx->head.magic, err);
> >> > > > >                 __put_user_error(sizeof(*esr_ctx), &esr_ctx->head.size, err);
> >> > > > >                 __put_user_error(current->thread.fault_code, &esr_ctx->esr, err);
> >> > > > > +               current->thread.fault_code = 0;
> >> > > >
> >> > > > Perhaps, but we'd need to be careful.  For example, can we run out of
> >> > > > user stack before this and deliver a SIGSEGV, but with the old
> >> > > > fault_code still set?  Then we'd emit the old fault code with the
> >> > > > new "can't deliver signal" signal, which doesn't make sense.
> >> > > >
> >> > > > Stuff may also go wrong with signal prioritisation.
> >> > > >
> >> > > > If a higher-priority signal (say SIGINT) comes in after a data abort
> >> > > > enters the kernel but before the resulting SIGSEGV is dequeued for
> >> > > > delivery, wouldn't we deliver SIGINT first, with the bogus fault code?
> >> > > > With your change we'd then have cleared the fault code by the time we
> >> > > > deliver the SIGSEGV it actually relates to, if I've understood right.
> >> > > >
> >> > > > Today, I think we just attach that fault code to every signal that's
> >> > > > delivered until something overwrites or resets it, which means that
> >> > > > a signal that needs fault_code gets it, at the expense of attaching
> >> > > > it to a bunch of other random signals too.
> >> > > >
> >> > > >
> >> > > > Checking the signal number and si_code might help us to know what we
> >> > > > should be doing with fault_code.  We need to make sure userspace can't
> >> > > > trick us with a non-kernel-generated signal here.  It would also be
> >> > > > necessary to check how PTRACE_SETSIGINFO interacts with this.
> >> > >
> >> > > With these possible interactions in mind I think we should store the
> >> > > fault code and fault address in kernel_siginfo instead of
> >> > > thread_struct (and clear these fields when we receive a siginfo from
> >> > > userspace, i.e. in copy_siginfo_from_user which is used by
> >> > > ptrace(PTRACE_SETSIGINFO) among other places). That way, the
> >> > > information is clearly associated with the signal itself and not the
> >> > > thread, so we don't need to worry about our signal being delivered out
> >> > > of order.
> >> >
> >> > Hmm, I can't see a way to do that that isn't horribly invasive in the core
> >> > signal code. Can you?
> >
> > I think I've come up with a way that doesn't seem to be too invasive.
> > See patch #1 of the series that I'm about to send out.
> >
> >> > But generally, I agree: the per-thread handling of fault_address and
> >> > fault_code appears to be quite broken in the face of signal prioritisation
> >> > and signals that don't correspond directly to hardware trap. It would be
> >> > nice to have some tests for this...
> >> >
> >> > If we want to pile on more bodges, perhaps we could stash the signal number
> >> > to which the fault_{address,code} relate, and then check that at delivery
> >> > and clear on a match. I hate it.
> >>
> >> I agree with Daniel's suggestion in principle, but I was also concerned
> >> about whether it would be too invasive elsewhere.
> >>
> >> Question though: does the core code take special care to make sure that
> >> a force_sig cannot be outprioritised by a regular signal?  If so,
> >> perhaps we get away with it.  I ask this, because the same issue
> >> may be hitting other arches otherwise.
> >
> > Not as far as I can tell. There does appear to be prioritisation for
> > synchronous signals [1] but as far as I can tell nothing to
> > distinguish one of these signals from one with the same signal number
> > sent from userspace (e.g. via kill(2)).
>
> The si_code will differ between signals generated by userspace
> and signals generated by the kernel.
>
> We do allow a little bit of ptrace and sending to yourself to spoof
> kernel generated signals, for reasons of debugging and process migration
> where an existing process needs to be reconstructed.  But the defenses
> should be strong enough you can assume that we reliably distinguish
> between a signal from userspace and a signal from the kernel.

So check for SIGBUS || SIGSEGV and one of the below si_codes, and only
add the context in that case? Seems fragile to me, but I suppose I
could live with it.
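
A rough userspace-compilable sketch of that check, using the si_codes
listed further down in this mail. The function name is made up for
illustration and is not part of any patch; the real check would sit in
the arm64 signal delivery code.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* assumption: glibc, for the Linux-specific BUS_MCEERR_AR */
#endif
#include <signal.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether a signal being delivered should
 * carry the extra fault context.  Only the signal number and si_code
 * are consulted.
 */
static bool wants_fault_context(int signo, int si_code)
{
	switch (signo) {
	case SIGBUS:
		return si_code == BUS_ADRERR ||
		       si_code == BUS_ADRALN ||
		       si_code == BUS_MCEERR_AR;
	case SIGSEGV:
		return si_code == SEGV_ACCERR ||
		       si_code == SEGV_MAPERR;
	default:
		return false;
	}
}
```

The signal number has to be checked together with si_code because the
numeric values overlap between signals (BUS_ADRALN and SEGV_MAPERR are
both 1). A SIGSEGV sent with kill(2) carries SI_USER and matches none
of the cases.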

> I don't fully follow what you are doing but this feels like the
> kind of case where a new si_code has been defined as well as additional
> fields in siginfo.

There is no new si_code for this; the information will be exposed for
several existing si_code types (BUS_ADRERR, BUS_ADRALN, BUS_MCEERR_AR,
SEGV_ACCERR, SEGV_MAPERR), and possibly others in the future
(particularly SEGV_MTESERR, which is part of the proposed MTE patch
set). Note that we already have a union field for BUS_MCEERR_AR, and
we may want to expose it for the other si_codes that already have
union fields as well.

That being said, taking a closer look at siginfo, I think we are in
luck and we might be able to make this work in a reasonable way by
reusing padding (see below).

> In your patchset I really hate that you were going back to
> force_sig_info, and filling out struct siginfo by hand.  That is an
> error prone pattern, and I have fixed enough bugs in the kernel to prove
> that.

To be fair, most of the callers are in helper functions that take
explicit parameters similar to force_sig_fault et al, and the SIGILL
one could easily be made that way as well.

> I take exception to the idea that including the full address might break
> userspace.  That typically means someone has been too lazy to look
> and see what userspace is doing.  When that userspace that might break
> is the same userspace you are changing the kernel to serve that makes me
> nervous.  AKA the userspace that cares about this signal and how it is
> represented in siginfo.

It's not a matter of being lazy. This behaviour isn't just an accident
but has been explicitly documented for years (see the
tagged-pointers.rst file that I changed: "Non-zero tags are not
preserved when delivering signals."), so users can reasonably rely on
it. Furthermore we simply don't have visibility into the majority of
userspace. For example, there are a lot of closed source Android apps
out there, and who knows what signal handlers they're installing and
how they're making use of the si_addr field on e.g. SEGV_MAPERR. We
can't just change the documented semantics under their feet.

It's not the same userspace either. The userspace that's
initially going to be consuming the new fields is in a part of the
Android system that handles and reports crashes, and that's something
that we control unlike all the apps.

Finally, the userspace may need to know whether the tag bits were
actually zero or whether they were just unavailable, otherwise
userspace could for example produce a misleading crash report. Simply
having the kernel set the top bits of si_addr wouldn't accomplish that
due to the kernel's previous behaviour, hence the mask to let
userspace know which bits are accurate.
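
To illustrate how a consumer might use a value/mask pair, here is a
minimal sketch. The struct and function names are hypothetical, and it
assumes the two values describe bits 63:56 of the fault address; it is
not the interface from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical holder for the two values; not a kernel structure. */
struct fault_addr_tag {
	uint8_t top_byte;      /* reported bits 63:56 of the fault address */
	uint8_t top_byte_mask; /* set bits mark which top_byte bits are accurate */
};

/* True only if every tag bit was reported accurately. */
static bool tag_fully_known(struct fault_addr_tag t)
{
	return t.top_byte_mask == 0xff;
}

/*
 * Merge the accurate tag bits into an address taken from si_addr.
 * Bits not covered by the mask are left exactly as si_addr had them.
 */
static uint64_t apply_known_tag_bits(uint64_t si_addr, struct fault_addr_tag t)
{
	uint64_t known = (uint64_t)(t.top_byte & t.top_byte_mask) << 56;
	uint64_t keep = ~((uint64_t)t.top_byte_mask << 56);

	return (si_addr & keep) | known;
}
```

With a mask of zero the address comes back unchanged and
tag_fully_known() returns false, so a crash reporter can print "tag
unavailable" rather than claim the tag was zero.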

> A fix of one instance of SIGILL should not be included with a patch that
> does something else, and really should come before everything else if
> possible.

Fair point. I can see if I can split that part out.

> If this information really belongs in struct siginfo (as it sounds like)
> please actually put the information in siginfo, and let userspace look
> in siginfo to find it.  struct siginfo is a union with plenty of space,
> and plenty of si_codes.
>
> If this applies to multiple cases then it might be trickier but please
> dig into the details, don't toss things into sigcontext just because
> you can't figure out a clean design for reporting this.

If we wanted this in siginfo, one idea that I had was to revert commit
b68a68d3dcc15ebbf23cbe91af1abf57591bd96b and add unsigned char fields
_addr_top_byte and _addr_top_byte_mask in the padding between
_addr_lsb and the union (with comments on all the fields of course to
say when they are filled in). I think that would work since we are
already clearing padding in siginfo. One nice property of the new
fields is that the zero values are correct in the case where the
information isn't being exposed (so old kernels would already have the
correct behaviour). That would only work on architectures with enough
padding there (i.e. alignof(void*) >= 4), so I suppose it could have
an #ifdef __aarch64__ around it.
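
A simplified sketch of the layout argument, compilable in userspace.
These structs are illustrative stand-ins and not the real uapi siginfo
definition: the struct names and the reduced union are assumptions,
and only the field names are modelled on _addr_lsb, _addr_top_byte and
_addr_top_byte_mask.

```c
#include <stddef.h>

/* Stand-in for the existing layout: a short, then a pointer-aligned union. */
struct sigfault_before {
	void *addr;
	short addr_lsb;
	/* implicit padding up to the alignment of the union */
	union {
		struct { void *lower; void *upper; } addr_bnd;
		unsigned int pkey;
	} rest;
};

/* Same layout with the two new bytes placed in that padding. */
struct sigfault_proposed {
	void *addr;
	short addr_lsb;
	unsigned char addr_top_byte;      /* tag bits of the fault address */
	unsigned char addr_top_byte_mask; /* which of those bits are accurate */
	union {
		struct { void *lower; void *upper; } addr_bnd;
		unsigned int pkey;
	} rest;
};

/* Holds when alignof(void *) >= 4 and sizeof(short) == 2. */
_Static_assert(offsetof(struct sigfault_proposed, rest) ==
	       offsetof(struct sigfault_before, rest),
	       "new fields must fit in the existing padding");
```

On a target where pointers are only 2-byte aligned there is no padding
after the short, so the assertion fails to compile there, which is the
reason for restricting the fields to certain architectures.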

Peter

>
> Eric
>
>
> > Peter
> >
> > [1] https://github.com/torvalds/linux/blob/b85051e755b0e9d6dd8f17ef1da083851b83287d/kernel/signal.c#L222
