Re: [PATCH v6] arm64: Expose FAR_EL1 tag bits in sigcontext

From: ebiederm@xmission.com (Eric W. Biederman)
To: Peter Collingbourne <pcc@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>,
	Kevin Brodsky <kevin.brodsky@arm.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Kostya Serebryany <kcc@google.com>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Vincenzo Frascino <vincenzo.frascino@arm.com>,
	Will Deacon <will@kernel.org>, Dave Martin <Dave.Martin@arm.com>,
	Evgenii Stepanov <eugenis@google.com>,
	Richard Henderson <rth@twiddle.net>
Subject: Re: [PATCH v6] arm64: Expose FAR_EL1 tag bits in sigcontext
Date: Thu, 21 May 2020 14:24:45 -0500
Message-ID: <874ks9drb6.fsf@x220.int.ebiederm.org>
In-Reply-To: <CAMn1gO6cgcP0O85BA_ire9j1L5zvN4i2JFRXO7R=MScXbmWG1g@mail.gmail.com> (Peter Collingbourne's message of "Thu, 21 May 2020 11:03:52 -0700")

Peter Collingbourne <pcc@google.com> writes:

> On Thu, May 21, 2020 at 5:39 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> Peter Collingbourne <pcc@google.com> writes:
>>
>> > On Wed, May 20, 2020 at 2:26 AM Dave Martin <Dave.Martin@arm.com> wrote:
>> >>
>> >> On Wed, May 20, 2020 at 09:55:03AM +0100, Will Deacon wrote:
>> >> > On Tue, May 19, 2020 at 03:00:12PM -0700, Peter Collingbourne wrote:
>> >> > > On Mon, May 18, 2020 at 2:53 AM Dave Martin <Dave.Martin@arm.com> wrote:
>> >> > > > On Thu, May 14, 2020 at 05:58:21PM -0700, Peter Collingbourne wrote:
>> >> > > > > diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
>> >> > > > > index baa88dc02e5c..5867f2fdbe64 100644
>> >> > > > > --- a/arch/arm64/kernel/signal.c
>> >> > > > > +++ b/arch/arm64/kernel/signal.c
>> >> > > > > @@ -648,6 +648,7 @@ static int setup_sigframe(struct rt_sigframe_user_layout *user,
>> >> > > > >                 __put_user_error(ESR_MAGIC, &esr_ctx->head.magic, err);
>> >> > > > >                 __put_user_error(sizeof(*esr_ctx), &esr_ctx->head.size, err);
>> >> > > > >                 __put_user_error(current->thread.fault_code, &esr_ctx->esr, err);
>> >> > > > > +               current->thread.fault_code = 0;
>> >> > > >
>> >> > > > Perhaps, but we'd need to be careful.  For example, can we run out of
>> >> > > > user stack before this and deliver a SIGSEGV, but with the old
>> >> > > > fault_code still set?  Then we'd emit the old fault code with the
>> >> > > > new "can't deliver signal" signal, which doesn't make sense.
>> >> > > >
>> >> > > > Stuff may also go wrong with signal prioritisation.
>> >> > > >
>> >> > > > If a higher-priority signal (say SIGINT) comes in after a data abort
>> >> > > > enters the kernel but before the resulting SIGSEGV is dequeued for
>> >> > > > delivery, wouldn't we deliver SIGINT first, with the bogus fault code?
>> >> > > > With your change we'd then have cleared the fault code by the time we
>> >> > > > deliver the SIGSEGV it actually relates to, if I've understood right.
>> >> > > >
>> >> > > > Today, I think we just attach that fault code to every signal that's
>> >> > > > delivered until something overwrites or resets it, which means that
>> >> > > > a signal that needs fault_code gets it, at the expense of attaching
>> >> > > > it to a bunch of other random signals too.
>> >> > > >
>> >> > > >
>> >> > > > Checking the signal number and si_code might help us to know what we
>> >> > > > should be doing with fault_code.  We need to make sure userspace can't
>> >> > > > trick us with a non-kernel-generated signal here.  It would also be
>> >> > > > necessary to check how PTRACE_SETSIGINFO interacts with this.
>> >> > >
>> >> > > With these possible interactions in mind I think we should store the
>> >> > > fault code and fault address in kernel_siginfo instead of
>> >> > > thread_struct (and clear these fields when we receive a siginfo from
>> >> > > userspace, i.e. in copy_siginfo_from_user which is used by
>> >> > > ptrace(PTRACE_SETSIGINFO) among other places). That way, the
>> >> > > information is clearly associated with the signal itself and not the
>> >> > > thread, so we don't need to worry about our signal being delivered out
>> >> > > of order.
>> >> >
>> >> > Hmm, I can't see a way to do that that isn't horribly invasive in the core
>> >> > signal code. Can you?
>> >
>> > I think I've come up with a way that doesn't seem to be too invasive.
>> > See patch #1 of the series that I'm about to send out.
>> >
>> >> > But generally, I agree: the per-thread handling of fault_address and
>> >> > fault_code appears to be quite broken in the face of signal prioritisation
>> >> > and signals that don't correspond directly to a hardware trap. It would be
>> >> > nice to have some tests for this...
>> >> >
>> >> > If we want to pile on more bodges, perhaps we could stash the signal number
>> >> > to which the fault_{address,code} relate, and then check that at delivery
>> >> > and clear on a match. I hate it.
>> >>
>> >> I agree with Daniel's suggestion in principle, but I was also concerned
>> >> about whether it would be too invasive elsewhere.
>> >>
>> >> Question though: does the core code take special care to make sure that
>> >> a force_sig cannot be outprioritised by a regular signal?  If so,
>> >> perhaps we get away with it.  I ask this because the same issue
>> >> may be hitting other arches otherwise.
>> >
>> > Not as far as I can tell. There does appear to be prioritisation for
>> > synchronous signals [1] but as far as I can tell nothing to
>> > distinguish one of these signals from one with the same signal number
>> > sent from userspace (e.g. via kill(2)).
>>
>> The si_code will differ between signals generated by userspace
>> and signals generated by the kernel.
>>
>> We do allow a little bit of ptrace and sending to yourself to spoof
>> kernel generated signals, for reasons of debugging and process migration
>> where an existing process needs to be reconstructed.  But the defenses
>> should be strong enough you can assume that we reliably distinguish
>> between a signal from userspace and a signal from the kernel.
>
> So check for SIGBUS || SIGSEGV and one of the below si_codes, and only
> add the context in that case? Seems fragile to me, but I suppose I
> could live with it.
>
>> I don't fully follow what you are doing but this feels like the
>> kind of case where a new si_code has been defined as well as additional
>> fields in siginfo.
>
> There is no new si_code for this, the information will be exposed for
> several existing si_code types (BUS_ADRERR, BUS_ADRALN, BUS_MCEERR_AR,
> SEGV_ACCERR, SEGV_MAPERR), and possibly others in the future
> (particularly SEGV_MTESERR, which is part of the proposed MTE patch
> set). Note that we already have a union field for BUS_MCEERR_AR, and
> we may want to expose it for the other si_codes that already have
> union fields as well.
>
> That being said, taking a closer look at siginfo, I think we are in
> luck and we might be able to make this work in a reasonable way by
> reusing padding (see below).
>
>> In your patchset I really hate that you were going back to
>> force_sig_info, and filling out struct siginfo by hand.  That is an
>> error prone pattern, and I have fixed enough bugs in the kernel to prove
>> that.
>
> To be fair, most of the callers are in helper functions that take
> explicit parameters similar to force_sig_fault et al, and the SIGILL
> one could easily be made that way as well.
>
>> I take exception to the idea that including the full address might break
>> userspace.  That typically means someone has been too lazy to look
>> and see what userspace is doing.  When that userspace that might break
>> is the same userspace you are changing the kernel to serve that makes me
>> nervous.  AKA the userspace that cares about this signal and how it is
>> represented in siginfo.
>
> It's not a matter of being lazy. This behaviour isn't just an accident
> but has been explicitly documented for years (see the
> tagged-pointers.rst file that I changed: "Non-zero tags are not
> preserved when delivering signals."), so users can reasonably rely on
> it. Furthermore we simply don't have visibility into the majority of
> userspace. For example, there are a lot of closed source Android apps
> out there, and who knows what signal handlers they're installing and
> how they're making use of the si_addr field on e.g. SEGV_MAPERR. We
> can't just change the documented semantics under their feet.
>
> It's also not the same userspace either. The userspace that's
> initially going to be consuming the new fields is in a part of the
> Android system that handles and reports crashes, and that's something
> that we control unlike all the apps.
>
> Finally, the userspace may need to know whether the tag bits were
> actually zero or whether they were just unavailable, otherwise
> userspace could for example produce a misleading crash report. Simply
> having the kernel set the top bits of si_addr wouldn't accomplish that
> due to the kernel's previous behaviour, hence the mask to let
> userspace know which bits are accurate.
>
>> A fix of one instance of SIGILL should not be included with a patch that
>> does something else, and really should come before everything else if
>> possible.
>
> Fair point. I can see if I can split that part out.
>
>> If this information really belongs in struct siginfo (as it sounds like)
>> please actually put the information in siginfo, and let userspace look
>> in siginfo to find it.  struct siginfo is a union with plenty of space,
>> and plenty of si_codes.
>>
>> If this applies to multiple cases then it might be trickier but please
>> dig into the details, don't toss things into sigcontext just because
>> you can't figure out a clean design for reporting this.
>
> If we wanted this in siginfo, one idea that I had was to revert commit
> b68a68d3dcc15ebbf23cbe91af1abf57591bd96b and add unsigned char fields
> _addr_top_byte and _addr_top_byte_mask in the padding between
> _addr_lsb and the union (with comments on all the fields of course to
> say when they are filled in). I think that would work since we are
> already clearing padding in siginfo. One nice property of the new
> fields is that the zero values are correct in the case where the
> information isn't being exposed (so old kernels would already have the
> correct behaviour). That would only work on certain architectures
> (i.e. at least alignof(void*) >= 4) so I suppose it could have an
> #ifdef __aarch64__ around it.

Perhaps add a 4th padding member to the union inside of _sigfault that
adds something like 4 unsigned longs' worth of data, and then have your
fields after the union.
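
A rough sketch of that layout, for illustration only.  The struct and
helper names are hypothetical, the union members are simplified
stand-ins rather than the real _sigfault definition, and the two new
field names are borrowed from the proposal quoted above.  It assumes a
64-bit target where the tag is the top byte of the address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for _sigfault; NOT the kernel's actual layout. */
struct sketch_sigfault {
	void *addr;				/* faulting address, tag stripped */
	union {
		struct { short addr_lsb; } mceerr;	  /* simplified */
		struct { void *lower, *upper; } bnd;	  /* simplified */
		struct { unsigned int pkey; } pkey;	  /* simplified */
		unsigned long pad[4];	/* 4th member: pins the union size */
	} u;
	/* New fields live after the union, so no si_code variant overlaps them. */
	unsigned char addr_top_byte;
	unsigned char addr_top_byte_mask;
};

/* The padding member must be what determines the union's size. */
static_assert(sizeof(((struct sketch_sigfault *)0)->u) ==
	      4 * sizeof(unsigned long), "pad[] sets the union size");
static_assert(offsetof(struct sketch_sigfault, addr_top_byte) ==
	      offsetof(struct sketch_sigfault, u) + 4 * sizeof(unsigned long),
	      "new fields start right after the union");

/*
 * Hypothetical fill helper: zero everything first, so that a zero mask
 * means "tag bits not available", then split the raw fault address
 * into the untagged address and the top byte.
 */
static inline void sketch_fill(struct sketch_sigfault *sf, uint64_t far,
			       unsigned char mask)
{
	memset(sf, 0, sizeof(*sf));
	sf->addr = (void *)(uintptr_t)(far & ~((uint64_t)0xff << 56));
	sf->addr_top_byte = (unsigned char)(far >> 56) & mask;
	sf->addr_top_byte_mask = mask;
}
```

With a layout like this the existing union members keep their offsets,
and a reader that sees a zero mask knows the top byte carries no
information.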

Is it quite a bit of work to gather that information from the
instructions that faulted?  I am just checking that this work really
makes sense.

What I really don't understand is how well this problem generalizes to
other architectures, that is, whether this is something other people
will need to solve at some point as well.

Eric
