From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Sean Christopherson <seanjc@google.com>
Cc: "Russell King, ARM Linux" <linux@armlinux.org.uk>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>, Guo Ren <guoren@kernel.org>,
Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
Michael Ellerman <mpe@ellerman.id.au>,
Heiko Carstens <hca@linux.ibm.com>, gor <gor@linux.ibm.com>,
Christian Borntraeger <borntraeger@de.ibm.com>,
Oleg Nesterov <oleg@redhat.com>, rostedt <rostedt@goodmis.org>,
Ingo Molnar <mingo@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>,
Andy Lutomirski <luto@kernel.org>, paulmck <paulmck@kernel.org>,
Boqun Feng <boqun.feng@gmail.com>,
Paolo Bonzini <pbonzini@redhat.com>, shuah <shuah@kernel.org>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Paul Mackerras <paulus@samba.org>,
linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
linux-csky <linux-csky@vger.kernel.org>,
linux-mips <linux-mips@vger.kernel.org>,
linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
linux-s390 <linux-s390@vger.kernel.org>,
KVM list <kvm@vger.kernel.org>,
linux-kselftest <linux-kselftest@vger.kernel.org>,
Peter Foley <pefoley@google.com>,
Shakeel Butt <shakeelb@google.com>,
Ben Gardon <bgardon@google.com>
Subject: Re: [PATCH 1/5] KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest
Date: Fri, 20 Aug 2021 14:51:03 -0400 (EDT) [thread overview]
Message-ID: <1872633041.20290.1629485463253.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <YR7tzZ98XC6OV2vu@google.com>
----- On Aug 19, 2021, at 7:48 PM, Sean Christopherson seanjc@google.com wrote:
> On Thu, Aug 19, 2021, Mathieu Desnoyers wrote:
>> ----- On Aug 17, 2021, at 8:12 PM, Sean Christopherson seanjc@google.com wrote:
>> > @@ -250,7 +250,7 @@ static int rseq_ip_fixup(struct pt_regs *regs)
>> > * If not nested over a rseq critical section, restart is useless.
>> > * Clear the rseq_cs pointer and return.
>> > */
>> > - if (!in_rseq_cs(ip, &rseq_cs))
>> > + if (!regs || !in_rseq_cs(ip, &rseq_cs))
>>
>> I think clearing the thread's rseq_cs unconditionally here when regs is NULL
>> is not the behavior we want when this is called from xfer_to_guest_mode_work.
>>
>> If we have a scenario where userspace ends up calling this ioctl(KVM_RUN)
>> from within a rseq c.s., we really want a CONFIG_DEBUG_RSEQ=y kernel to
>> kill this application in the rseq_syscall handler when exiting back to usermode
>> when the ioctl eventually returns.
>>
>> However, clearing the thread's rseq_cs will prevent this from happening.
>>
>> So I would favor an approach where we simply do:
>>
>> if (!regs)
>> return 0;
>>
>> Immediately at the beginning of rseq_ip_fixup, before getting the instruction
>> pointer, so effectively skip all side-effects of the ip fixup code. Indeed, it
>> is not relevant to do any fixup here, because it is nested in a ioctl system
>> call.
>>
>> Effectively, this would preserve the SIGSEGV behavior when this ioctl is
>> erroneously called by user-space from a rseq critical section.
>
> Ha, that's effectively what I implemented first, but I changed it because of the
> comment in clear_rseq_cs() that says:
>
> The rseq_cs field is set to NULL on preemption or signal delivery ... as well
> as well as on top of code outside of the rseq assembly block.
>
> Which makes it sound like something might rely on clearing rseq_cs?
This comment is describing succinctly the lazy clear scheme for rseq_cs.
Without the lazy clear scheme, a rseq c.s. would look like:
* init(rseq_cs)
* cpu = TLS->rseq::cpu_id_start
* [1] TLS->rseq::rseq_cs = rseq_cs
* [start_ip] ----------------------------
* [2] if (cpu != TLS->rseq::cpu_id)
* goto abort_ip;
* [3] <last_instruction_in_cs>
* [post_commit_ip] ----------------------------
* [4] TLS->rseq::rseq_cs = NULL
But as a fast-path optimization, [4] is not entirely needed because the rseq_cs
descriptor contains information about the instruction pointer range of the critical
section. Therefore, userspace can omit [4], but if the kernel never clears it, it
means that it will have to re-read the rseq_cs descriptor's content each time it
needs to check it to confirm that it is not nested over a rseq c.s..
So making the kernel lazily clear the rseq_cs pointer is just an optimization which
ensures that the kernel won't do useless work the next time it needs to check
rseq_cs, given that it has already validated that the userspace code is currently
not within the rseq c.s. currently advertised by the rseq_cs field.
>
> Ah, or is it the case that rseq_cs is non-NULL if and only if userspace is in an
> rseq critical section, and because syscalls in critical sections are illegal, by
> definition clearing rseq_cs is a nop unless userspace is misbehaving.
Not quite, as I described above. But we want it to stay set so the CONFIG_DEBUG_RSEQ
code executed when returning from ioctl to userspace will be able to validate that
it is not nested within a rseq critical section.
>
> If that's true, what about explicitly checking that at NOTIFY_RESUME? Or is it
> not worth the extra code to detect an error that will likely be caught anyways?
The error will indeed already be caught on return from ioctl to userspace, so I
don't see any added value in duplicating this check.
Thanks,
Mathieu
>
> diff --git a/kernel/rseq.c b/kernel/rseq.c
> index 35f7bd0fced0..28b8342290b0 100644
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c
> @@ -282,6 +282,13 @@ void __rseq_handle_notify_resume(struct ksignal *ksig,
> struct pt_regs *regs)
>
> if (unlikely(t->flags & PF_EXITING))
> return;
> + if (!regs) {
> +#ifdef CONFIG_DEBUG_RSEQ
> + if (t->rseq && rseq_get_rseq_cs(t, &rseq_cs))
> + goto error;
> +#endif
> + return;
> + }
> ret = rseq_ip_fixup(regs);
> if (unlikely(ret < 0))
> goto error;
>
>> Thanks for looking into this !
>>
>> Mathieu
>>
>> > return clear_rseq_cs(t);
>> > ret = rseq_need_restart(t, rseq_cs.flags);
>> > if (ret <= 0)
>> > --
>> > 2.33.0.rc1.237.g0d66db33f3-goog
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
> > http://www.efficios.com
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
next prev parent reply other threads:[~2021-08-20 18:51 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-18 0:12 [PATCH 0/5] KVM: rseq: Fix and a test for a KVM+rseq bug Sean Christopherson
2021-08-18 0:12 ` [PATCH 1/5] KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest Sean Christopherson
2021-08-19 21:39 ` Mathieu Desnoyers
2021-08-19 23:48 ` Sean Christopherson
2021-08-20 18:51 ` Mathieu Desnoyers [this message]
2021-08-20 22:26 ` Sean Christopherson
2021-09-06 10:28 ` Paolo Bonzini
2021-09-07 14:38 ` Sean Christopherson
2021-08-18 0:12 ` [PATCH 2/5] entry: rseq: Call rseq_handle_notify_resume() in tracehook_notify_resume() Sean Christopherson
2021-08-19 21:41 ` Mathieu Desnoyers
2021-08-18 0:12 ` [PATCH 3/5] tools: Move x86 syscall number fallbacks to .../uapi/ Sean Christopherson
2021-08-18 0:12 ` [PATCH 4/5] KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs Sean Christopherson
2021-08-19 21:52 ` Mathieu Desnoyers
2021-08-19 23:33 ` Sean Christopherson
2021-08-20 18:31 ` Mathieu Desnoyers
2021-08-20 22:25 ` Sean Christopherson
2021-08-18 0:12 ` [PATCH 5/5] KVM: selftests: Remove __NR_userfaultfd syscall fallback Sean Christopherson
2021-09-22 14:12 ` [PATCH 0/5] KVM: rseq: Fix and a test for a KVM+rseq bug Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1872633041.20290.1629485463253.JavaMail.zimbra@efficios.com \
--to=mathieu.desnoyers@efficios.com \
--cc=benh@kernel.crashing.org \
--cc=bgardon@google.com \
--cc=boqun.feng@gmail.com \
--cc=borntraeger@de.ibm.com \
--cc=catalin.marinas@arm.com \
--cc=gor@linux.ibm.com \
--cc=guoren@kernel.org \
--cc=hca@linux.ibm.com \
--cc=kvm@vger.kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-csky@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mips@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=linux@armlinux.org.uk \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=luto@kernel.org \
--cc=mingo@redhat.com \
--cc=mpe@ellerman.id.au \
--cc=oleg@redhat.com \
--cc=paulmck@kernel.org \
--cc=paulus@samba.org \
--cc=pbonzini@redhat.com \
--cc=pefoley@google.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=seanjc@google.com \
--cc=shakeelb@google.com \
--cc=shuah@kernel.org \
--cc=tglx@linutronix.de \
--cc=tsbogend@alpha.franken.de \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).