linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: the x86 sysret_rip test fails on the Intel FRED architecture
       [not found] <SA1PR11MB6734FA9139B9C9F6CC2ED123A8C59@SA1PR11MB6734.namprd11.prod.outlook.com>
@ 2023-01-20 17:45 ` Dave Hansen
       [not found]   ` <eb81f7f2-d266-d999-b41a-e6eae086e731@citrix.com>
  2023-01-21  4:59   ` H. Peter Anvin
  0 siblings, 2 replies; 88+ messages in thread
From: Dave Hansen @ 2023-01-20 17:45 UTC (permalink / raw)
  To: Li, Xin3, tglx, mingo, bp, peterz, dave.hansen, H.Peter Anvin
  Cc: x86, linux-kernel

On 1/19/23 23:49, Li, Xin3 wrote:
> The x86 sysret_rip test has the following assertion:
> 
>         /* R11 and EFLAGS should already match. */
>         assert(ctx->uc_mcontext.gregs[REG_EFL] ==
>                ctx->uc_mcontext.gregs[REG_R11]);
> 
> This is being tested to avoid kernel state leak due to sysret vs iret,
> but that on FRED r11 is *always* preserved, and the test just fails.

Let's figure out the reason that FRED acts differently, first.  Right
now, the SDM says:

	SYSCALL also saves RFLAGS into R11

so that behavior of SYSCALL _looks_ architectural to me.  Was this
change in SYSCALL behavior with FRED intentional?

If not intentional, it might be something that can still be fixed.  If
it is intentional and is going to be with us for a while we have a few
options.  If userspace is _really_ depending on this behavior, we could
just clobber r11 ourselves in the FRED entry path.  If not, we can
remove the assertion in the selftest.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: the x86 sysret_rip test fails on the Intel FRED architecture
       [not found]   ` <eb81f7f2-d266-d999-b41a-e6eae086e731@citrix.com>
@ 2023-01-20 20:50     ` H. Peter Anvin
  2023-01-20 21:10       ` Andrew Cooper
  0 siblings, 1 reply; 88+ messages in thread
From: H. Peter Anvin @ 2023-01-20 20:50 UTC (permalink / raw)
  To: Andrew Cooper, Dave Hansen, Li, Xin3, tglx, mingo, bp, peterz,
	dave.hansen
  Cc: x86, linux-kernel

On January 20, 2023 10:52:02 AM PST, Andrew Cooper <Andrew.Cooper3@citrix.com> wrote:
>On 20/01/2023 5:45 pm, Dave Hansen wrote:
>> On 1/19/23 23:49, Li, Xin3 wrote:
>>> The x86 sysret_rip test has the following assertion:
>>>
>>>         /* R11 and EFLAGS should already match. */
>>>         assert(ctx->uc_mcontext.gregs[REG_EFL] ==
>>>                ctx->uc_mcontext.gregs[REG_R11]);
>>>
>>> This is being tested to avoid kernel state leak due to sysret vs iret,
>>> but that on FRED r11 is *always* preserved, and the test just fails.
>> Let's figure out the reason that FRED acts differently, first.  Right
>> now, the SDM says:
>>
>> 	SYSCALL also saves RFLAGS into R11
>>
>> so that behavior of SYSCALL _looks_ architectural to me.  Was this
>> change in SYSCALL behavior with FRED intentional?
>
>FRED 3.0 Section 7.4 says the only changes for the SYSCALL and SYSENTER
>instructions are the enablement conditions.  Nowhere else is there
>mention of a FRED OS needing to emulate legacy syscall behaviour by
>adjusting %r11/%rcx
>
>However, ERETU does handle flags different to SYSRET (in particular, I
>think you can establish TF on the instruction boundary after SYSCALL
>now).  What are the raw values of REG_EFL and REG_R11 ?
>
>~Andrew
>

Just to avoid any confusion:

Syscall and sysenter in a FRED system are treated equivalently to software interrupts, e.g. INT 0x80. They do not modify any registers.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: the x86 sysret_rip test fails on the Intel FRED architecture
  2023-01-20 20:50     ` H. Peter Anvin
@ 2023-01-20 21:10       ` Andrew Cooper
  2023-01-20 21:17         ` H. Peter Anvin
  0 siblings, 1 reply; 88+ messages in thread
From: Andrew Cooper @ 2023-01-20 21:10 UTC (permalink / raw)
  To: H. Peter Anvin, Dave Hansen, Li, Xin3, tglx, mingo, bp, peterz,
	dave.hansen, Andrew Cooper
  Cc: x86, linux-kernel

On 20/01/2023 8:50 pm, H. Peter Anvin wrote:
> On January 20, 2023 10:52:02 AM PST, Andrew Cooper <Andrew.Cooper3@citrix.com> wrote:
>> On 20/01/2023 5:45 pm, Dave Hansen wrote:
>>> On 1/19/23 23:49, Li, Xin3 wrote:
>>>> The x86 sysret_rip test has the following assertion:
>>>>
>>>>         /* R11 and EFLAGS should already match. */
>>>>         assert(ctx->uc_mcontext.gregs[REG_EFL] ==
>>>>                ctx->uc_mcontext.gregs[REG_R11]);
>>>>
>>>> This is being tested to avoid kernel state leak due to sysret vs iret,
>>>> but that on FRED r11 is *always* preserved, and the test just fails.
>>> Let's figure out the reason that FRED acts differently, first.  Right
>>> now, the SDM says:
>>>
>>> 	SYSCALL also saves RFLAGS into R11
>>>
>>> so that behavior of SYSCALL _looks_ architectural to me.  Was this
>>> change in SYSCALL behavior with FRED intentional?
>> FRED 3.0 Section 7.4 says the only changes for the SYSCALL and SYSENTER
>> instructions are the enablement conditions.  Nowhere else is there
>> mention of a FRED OS needing to emulate legacy syscall behaviour by
>> adjusting %r11/%rcx
>>
>> However, ERETU does handle flags different to SYSRET (in particular, I
>> think you can establish TF on the instruction boundary after SYSCALL
>> now).  What are the raw values of REG_EFL and REG_R11 ?
>>
>> ~Andrew
>>
> Just to avoid any confusion:
>
> Syscall and sysenter in a FRED system are treated equivalently to software interrupts, e.g. INT 0x80. They do not modify any registers.

In which case can Intel please publish a v4 spec which actually says this?

I can't see anything in the v3 spec which mentions a change in register
behaviour for SYSCALL.

~Andrew

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: the x86 sysret_rip test fails on the Intel FRED architecture
  2023-01-20 21:10       ` Andrew Cooper
@ 2023-01-20 21:17         ` H. Peter Anvin
  2023-01-20 21:29           ` Andrew Cooper
  0 siblings, 1 reply; 88+ messages in thread
From: H. Peter Anvin @ 2023-01-20 21:17 UTC (permalink / raw)
  To: Andrew Cooper, Dave Hansen, Li, Xin3, tglx, mingo, bp, peterz,
	dave.hansen
  Cc: x86, linux-kernel

On January 20, 2023 1:10:09 PM PST, Andrew Cooper <Andrew.Cooper3@citrix.com> wrote:
>On 20/01/2023 8:50 pm, H. Peter Anvin wrote:
>> On January 20, 2023 10:52:02 AM PST, Andrew Cooper <Andrew.Cooper3@citrix.com> wrote:
>>> On 20/01/2023 5:45 pm, Dave Hansen wrote:
>>>> On 1/19/23 23:49, Li, Xin3 wrote:
>>>>> The x86 sysret_rip test has the following assertion:
>>>>>
>>>>>         /* R11 and EFLAGS should already match. */
>>>>>         assert(ctx->uc_mcontext.gregs[REG_EFL] ==
>>>>>                ctx->uc_mcontext.gregs[REG_R11]);
>>>>>
>>>>> This is being tested to avoid kernel state leak due to sysret vs iret,
>>>>> but that on FRED r11 is *always* preserved, and the test just fails.
>>>> Let's figure out the reason that FRED acts differently, first.  Right
>>>> now, the SDM says:
>>>>
>>>> 	SYSCALL also saves RFLAGS into R11
>>>>
>>>> so that behavior of SYSCALL _looks_ architectural to me.  Was this
>>>> change in SYSCALL behavior with FRED intentional?
>>> FRED 3.0 Section 7.4 says the only changes for the SYSCALL and SYSENTER
>>> instructions are the enablement conditions.  Nowhere else is there
>>> mention of a FRED OS needing to emulate legacy syscall behaviour by
>>> adjusting %r11/%rcx
>>>
>>> However, ERETU does handle flags different to SYSRET (in particular, I
>>> think you can establish TF on the instruction boundary after SYSCALL
>>> now).  What are the raw values of REG_EFL and REG_R11 ?
>>>
>>> ~Andrew
>>>
>> Just to avoid any confusion:
>>
>> Syscall and sysenter in a FRED system are treated equivalently to software interrupts, e.g. INT 0x80. They do not modify any registers.
>
>In which case can Intel please publish a v4 spec which actually says this?
>
>I can't see anything in the v3 spec which mentions a change in register
>behaviour for SYSCALL.
>
>~Andrew
>

I'll make sure it makes it into the next update.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: the x86 sysret_rip test fails on the Intel FRED architecture
  2023-01-20 21:17         ` H. Peter Anvin
@ 2023-01-20 21:29           ` Andrew Cooper
  0 siblings, 0 replies; 88+ messages in thread
From: Andrew Cooper @ 2023-01-20 21:29 UTC (permalink / raw)
  To: H. Peter Anvin, Dave Hansen, Li, Xin3, tglx, mingo, bp, peterz,
	dave.hansen, Andrew Cooper
  Cc: x86, linux-kernel

On 20/01/2023 9:17 pm, H. Peter Anvin wrote:
> On January 20, 2023 1:10:09 PM PST, Andrew Cooper <Andrew.Cooper3@citrix.com> wrote:
>> On 20/01/2023 8:50 pm, H. Peter Anvin wrote:
>>> On January 20, 2023 10:52:02 AM PST, Andrew Cooper <Andrew.Cooper3@citrix.com> wrote:
>>>> On 20/01/2023 5:45 pm, Dave Hansen wrote:
>>>>> On 1/19/23 23:49, Li, Xin3 wrote:
>>>>>> The x86 sysret_rip test has the following assertion:
>>>>>>
>>>>>>         /* R11 and EFLAGS should already match. */
>>>>>>         assert(ctx->uc_mcontext.gregs[REG_EFL] ==
>>>>>>                ctx->uc_mcontext.gregs[REG_R11]);
>>>>>>
>>>>>> This is being tested to avoid kernel state leak due to sysret vs iret,
>>>>>> but that on FRED r11 is *always* preserved, and the test just fails.
>>>>> Let's figure out the reason that FRED acts differently, first.  Right
>>>>> now, the SDM says:
>>>>>
>>>>> 	SYSCALL also saves RFLAGS into R11
>>>>>
>>>>> so that behavior of SYSCALL _looks_ architectural to me.  Was this
>>>>> change in SYSCALL behavior with FRED intentional?
>>>> FRED 3.0 Section 7.4 says the only changes for the SYSCALL and SYSENTER
>>>> instructions are the enablement conditions.  Nowhere else is there
>>>> mention of a FRED OS needing to emulate legacy syscall behaviour by
>>>> adjusting %r11/%rcx
>>>>
>>>> However, ERETU does handle flags different to SYSRET (in particular, I
>>>> think you can establish TF on the instruction boundary after SYSCALL
>>>> now).  What are the raw values of REG_EFL and REG_R11 ?
>>>>
>>>> ~Andrew
>>>>
>>> Just to avoid any confusion:
>>>
>>> Syscall and sysenter in a FRED system are treated equivalently to software interrupts, e.g. INT 0x80. They do not modify any registers.
>> In which case can Intel please publish a v4 spec which actually says this?
>>
>> I can't see anything in the v3 spec which mentions a change in register
>> behaviour for SYSCALL.
>>
>> ~Andrew
>>
> I'll make sure it makes it into the next update.

Thankyou!

~Andrew

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: the x86 sysret_rip test fails on the Intel FRED architecture
  2023-01-20 17:45 ` the x86 sysret_rip test fails on the Intel FRED architecture Dave Hansen
       [not found]   ` <eb81f7f2-d266-d999-b41a-e6eae086e731@citrix.com>
@ 2023-01-21  4:59   ` H. Peter Anvin
  2023-01-21 16:46     ` Dave Hansen
  2023-01-22  3:38     ` Li, Xin3
  1 sibling, 2 replies; 88+ messages in thread
From: H. Peter Anvin @ 2023-01-21  4:59 UTC (permalink / raw)
  To: Dave Hansen, Li, Xin3, tglx, mingo, bp, peterz, dave.hansen
  Cc: x86, linux-kernel

On January 20, 2023 9:45:26 AM PST, Dave Hansen <dave.hansen@intel.com> wrote:
>On 1/19/23 23:49, Li, Xin3 wrote:
>> The x86 sysret_rip test has the following assertion:
>> 
>>         /* R11 and EFLAGS should already match. */
>>         assert(ctx->uc_mcontext.gregs[REG_EFL] ==
>>                ctx->uc_mcontext.gregs[REG_R11]);
>> 
>> This is being tested to avoid kernel state leak due to sysret vs iret,
>> but that on FRED r11 is *always* preserved, and the test just fails.
>
>Let's figure out the reason that FRED acts differently, first.  Right
>now, the SDM says:
>
>	SYSCALL also saves RFLAGS into R11
>
>so that behavior of SYSCALL _looks_ architectural to me.  Was this
>change in SYSCALL behavior with FRED intentional?
>
>If not intentional, it might be something that can still be fixed.  If
>it is intentional and is going to be with us for a while we have a few
>options.  If userspace is _really_ depending on this behavior, we could
>just clobber r11 ourselves in the FRED entry path.  If not, we can
>remove the assertion in the selftest.

We can't clobber it in the FRED entry path, since it is common for all events, but we could do it in the syscall dispatch.

However, it doesn't seem to make sense to do so to me. The current behavior is much more of an artifact than desired behavior.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: the x86 sysret_rip test fails on the Intel FRED architecture
  2023-01-21  4:59   ` H. Peter Anvin
@ 2023-01-21 16:46     ` Dave Hansen
  2023-01-21 21:47       ` Brian Gerst
  2023-01-22  3:38     ` Li, Xin3
  1 sibling, 1 reply; 88+ messages in thread
From: Dave Hansen @ 2023-01-21 16:46 UTC (permalink / raw)
  To: H. Peter Anvin, Li, Xin3, tglx, mingo, bp, peterz, dave.hansen
  Cc: x86, linux-kernel

On 1/20/23 20:59, H. Peter Anvin wrote:
>> If not intentional, it might be something that can still be fixed.
>> If it is intentional and is going to be with us for a while we have
>> a few options.  If userspace is _really_ depending on this
>> behavior, we could just clobber r11 ourselves in the FRED entry
>> path.  If not, we can remove the assertion in the selftest.
> We can't clobber it in the FRED entry path, since it is common for
> all events, but we could do it in the syscall dispatch.
> 
> However, it doesn't seem to make sense to do so to me. The current
> behavior is much more of an artifact than desired behavior.
I guess the SDM statements really are for the kernel's benefit and not
for userspace.  Userspace _should_ be treating SYSCALL like a CALL and
r11 like any old register that can be clobbered.  Right now, the kernel
just happens to clobber it with RFLAGS.

I do the the odds of anyone relying on this behavior are pretty small.
Let's just zap the check from the selftest, document what we did in the
FRED docs and changelog and move on.

If someone screams later, we can fix in some SYSCALL-specific piece of
the FRED code.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: the x86 sysret_rip test fails on the Intel FRED architecture
  2023-01-21 16:46     ` Dave Hansen
@ 2023-01-21 21:47       ` Brian Gerst
  2023-01-22  3:01         ` Li, Xin3
  0 siblings, 1 reply; 88+ messages in thread
From: Brian Gerst @ 2023-01-21 21:47 UTC (permalink / raw)
  To: Dave Hansen
  Cc: H. Peter Anvin, Li, Xin3, tglx, mingo, bp, peterz, dave.hansen,
	x86, linux-kernel

On Sat, Jan 21, 2023 at 12:34 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 1/20/23 20:59, H. Peter Anvin wrote:
> >> If not intentional, it might be something that can still be fixed.
> >> If it is intentional and is going to be with us for a while we have
> >> a few options.  If userspace is _really_ depending on this
> >> behavior, we could just clobber r11 ourselves in the FRED entry
> >> path.  If not, we can remove the assertion in the selftest.
> > We can't clobber it in the FRED entry path, since it is common for
> > all events, but we could do it in the syscall dispatch.
> >
> > However, it doesn't seem to make sense to do so to me. The current
> > behavior is much more of an artifact than desired behavior.
> I guess the SDM statements really are for the kernel's benefit and not
> for userspace.  Userspace _should_ be treating SYSCALL like a CALL and
> r11 like any old register that can be clobbered.  Right now, the kernel
> just happens to clobber it with RFLAGS.
>
> I do the the odds of anyone relying on this behavior are pretty small.
> Let's just zap the check from the selftest, document what we did in the
> FRED docs and changelog and move on.

Keep the selftest check, but also accept preserved RCX/R11.  What
really matters is that the kernel isn't leaking data.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: the x86 sysret_rip test fails on the Intel FRED architecture
  2023-01-21 21:47       ` Brian Gerst
@ 2023-01-22  3:01         ` Li, Xin3
  2023-01-22  3:28           ` H. Peter Anvin
  0 siblings, 1 reply; 88+ messages in thread
From: Li, Xin3 @ 2023-01-22  3:01 UTC (permalink / raw)
  To: Brian Gerst, Hansen, Dave
  Cc: H. Peter Anvin, tglx, mingo, bp, peterz, dave.hansen, x86, linux-kernel

> > >> If not intentional, it might be something that can still be fixed.
> > >> If it is intentional and is going to be with us for a while we have
> > >> a few options.  If userspace is _really_ depending on this
> > >> behavior, we could just clobber r11 ourselves in the FRED entry
> > >> path.  If not, we can remove the assertion in the selftest.
> > > We can't clobber it in the FRED entry path, since it is common for
> > > all events, but we could do it in the syscall dispatch.
> > >
> > > However, it doesn't seem to make sense to do so to me. The current
> > > behavior is much more of an artifact than desired behavior.
> > I guess the SDM statements really are for the kernel's benefit and not
> > for userspace.  Userspace _should_ be treating SYSCALL like a CALL and
> > r11 like any old register that can be clobbered.  Right now, the
> > kernel just happens to clobber it with RFLAGS.
> >
> > I do the the odds of anyone relying on this behavior are pretty small.
> > Let's just zap the check from the selftest, document what we did in
> > the FRED docs and changelog and move on.
> 
> Keep the selftest check, but also accept preserved RCX/R11.  What really matters is
> that the kernel isn't leaking data.

I feel it the same way, it looks to me that the check is to make sure
R11 doesn't leak any kernel data because the Linux kernel deliberately
overwrites R11 with the value of user level flags just before returning
to user level.

I wanted to zap the check, but as HPA said, this is an artifact to not leak
any kernel data.  I guess it doesn't make a difference if the kernel sets
R11 to 0.

Maybe it's still reasonable to keep such a check for IDT.  However, it makes
no sense for FRED systems, because all GP registers are saved/restored upon
event delivery/return.

Thanks!
  Xin

> 
> --
> Brian Gerst

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: the x86 sysret_rip test fails on the Intel FRED architecture
  2023-01-22  3:01         ` Li, Xin3
@ 2023-01-22  3:28           ` H. Peter Anvin
  0 siblings, 0 replies; 88+ messages in thread
From: H. Peter Anvin @ 2023-01-22  3:28 UTC (permalink / raw)
  To: Li, Xin3, Brian Gerst, Hansen, Dave
  Cc: tglx, mingo, bp, peterz, dave.hansen, x86, linux-kernel

On January 21, 2023 7:01:53 PM PST, "Li, Xin3" <xin3.li@intel.com> wrote:
>> > >> If not intentional, it might be something that can still be fixed.
>> > >> If it is intentional and is going to be with us for a while we have
>> > >> a few options.  If userspace is _really_ depending on this
>> > >> behavior, we could just clobber r11 ourselves in the FRED entry
>> > >> path.  If not, we can remove the assertion in the selftest.
>> > > We can't clobber it in the FRED entry path, since it is common for
>> > > all events, but we could do it in the syscall dispatch.
>> > >
>> > > However, it doesn't seem to make sense to do so to me. The current
>> > > behavior is much more of an artifact than desired behavior.
>> > I guess the SDM statements really are for the kernel's benefit and not
>> > for userspace.  Userspace _should_ be treating SYSCALL like a CALL and
>> > r11 like any old register that can be clobbered.  Right now, the
>> > kernel just happens to clobber it with RFLAGS.
>> >
>> > I do the the odds of anyone relying on this behavior are pretty small.
>> > Let's just zap the check from the selftest, document what we did in
>> > the FRED docs and changelog and move on.
>> 
>> Keep the selftest check, but also accept preserved RCX/R11.  What really matters is
>> that the kernel isn't leaking data.
>
>I feel it the same way, it looks to me that the check is to make sure
>R11 doesn't leak any kernel data because the Linux kernel deliberately
>overwrites R11 with the value of user level flags just before returning
>to user level.
>
>I wanted to zap the check, but as HPA said, this is an artifact to not leak
>any kernel data.  I guess it doesn't make a difference if the kernel sets
>R11 to 0.
>
>Maybe it's still reasonable to keep such a check for IDT.  However, it makes
>no sense for FRED systems, because all GP registers are saved/restored upon
>event delivery/return.
>
>Thanks!
>  Xin
>
>> 
>> --
>> Brian Gerst
>

The big thing is that the system calls that return with sysret v iret on IDT systems need to be consistent, in order to not leak kernel state.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: the x86 sysret_rip test fails on the Intel FRED architecture
  2023-01-21  4:59   ` H. Peter Anvin
  2023-01-21 16:46     ` Dave Hansen
@ 2023-01-22  3:38     ` Li, Xin3
  2023-01-22  4:34       ` Dave Hansen
  1 sibling, 1 reply; 88+ messages in thread
From: Li, Xin3 @ 2023-01-22  3:38 UTC (permalink / raw)
  To: H. Peter Anvin, Hansen, Dave, tglx, mingo, bp, peterz, dave.hansen
  Cc: x86, linux-kernel

> >> The x86 sysret_rip test has the following assertion:
> >>
> >>         /* R11 and EFLAGS should already match. */
> >>         assert(ctx->uc_mcontext.gregs[REG_EFL] ==
> >>                ctx->uc_mcontext.gregs[REG_R11]);
> >>
> >> This is being tested to avoid kernel state leak due to sysret vs
> >> iret, but that on FRED r11 is *always* preserved, and the test just fails.
> >
> >Let's figure out the reason that FRED acts differently, first.  Right
> >now, the SDM says:
> >
> >	SYSCALL also saves RFLAGS into R11
> >
> >so that behavior of SYSCALL _looks_ architectural to me.  Was this
> >change in SYSCALL behavior with FRED intentional?
> >
> >If not intentional, it might be something that can still be fixed.  If
> >it is intentional and is going to be with us for a while we have a few
> >options.  If userspace is _really_ depending on this behavior, we could
> >just clobber r11 ourselves in the FRED entry path.  If not, we can
> >remove the assertion in the selftest.
> 
> We can't clobber it in the FRED entry path, since it is common for all events, but we
> could do it in the syscall dispatch.

Yes, adding "regs->r11 = regs->flags" in the SYSCALL dispatch does make
the test pass.

> 
> However, it doesn't seem to make sense to do so to me. The current behavior is
> much more of an artifact than desired behavior.

We kind of have an agreement that %r11 = %flags after returning from the kernel.

And the question is, is it what we want?

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: the x86 sysret_rip test fails on the Intel FRED architecture
  2023-01-22  3:38     ` Li, Xin3
@ 2023-01-22  4:34       ` Dave Hansen
  2023-01-22  4:44         ` H. Peter Anvin
  2023-01-22 23:45         ` H. Peter Anvin
  0 siblings, 2 replies; 88+ messages in thread
From: Dave Hansen @ 2023-01-22  4:34 UTC (permalink / raw)
  To: Li, Xin3, H. Peter Anvin, tglx, mingo, bp, peterz, dave.hansen
  Cc: x86, linux-kernel

On 1/21/23 19:38, Li, Xin3 wrote:
>> However, it doesn't seem to make sense to do so to me. The current behavior is
>> much more of an artifact than desired behavior.
> We kind of have an agreement that %r11 = %flags after returning from the kernel.
> 
> And the question is, is it what we want?

Can the selftest just set r11=rflags before the syscall?  The old
syscall entry path will set r11=rflags.  The FRED path won't touch it.
Either case will pass an r11==rflags check.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: the x86 sysret_rip test fails on the Intel FRED architecture
  2023-01-22  4:34       ` Dave Hansen
@ 2023-01-22  4:44         ` H. Peter Anvin
  2023-01-22  8:22           ` Li, Xin3
  2023-01-22 23:45         ` H. Peter Anvin
  1 sibling, 1 reply; 88+ messages in thread
From: H. Peter Anvin @ 2023-01-22  4:44 UTC (permalink / raw)
  To: Dave Hansen, Li, Xin3, tglx, mingo, bp, peterz, dave.hansen
  Cc: x86, linux-kernel

On January 21, 2023 8:34:09 PM PST, Dave Hansen <dave.hansen@intel.com> wrote:
>On 1/21/23 19:38, Li, Xin3 wrote:
>>> However, it doesn't seem to make sense to do so to me. The current behavior is
>>> much more of an artifact than desired behavior.
>> We kind of have an agreement that %r11 = %flags after returning from the kernel.
>> 
>> And the question is, is it what we want?
>
>Can the selftest just set r11=rflags before the syscall?  The old
>syscall entry path will set r11=rflags.  The FRED path won't touch it.
>Either case will pass an r11==rflags check.

That's a good idea.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: the x86 sysret_rip test fails on the Intel FRED architecture
  2023-01-22  4:44         ` H. Peter Anvin
@ 2023-01-22  8:22           ` Li, Xin3
  2023-01-22  8:54             ` Ammar Faizi
  0 siblings, 1 reply; 88+ messages in thread
From: Li, Xin3 @ 2023-01-22  8:22 UTC (permalink / raw)
  To: H. Peter Anvin, Hansen, Dave, tglx, mingo, bp, peterz, dave.hansen
  Cc: x86, linux-kernel

> On January 21, 2023 8:34:09 PM PST, Dave Hansen <dave.hansen@intel.com>
> wrote:
> >On 1/21/23 19:38, Li, Xin3 wrote:
> >>> However, it doesn't seem to make sense to do so to me. The current
> >>> behavior is much more of an artifact than desired behavior.
> >> We kind of have an agreement that %r11 = %flags after returning from the
> kernel.
> >>
> >> And the question is, is it what we want?
> >
> >Can the selftest just set r11=rflags before the syscall?  The old
> >syscall entry path will set r11=rflags.  The FRED path won't touch it.
> >Either case will pass an r11==rflags check.
> 
> That's a good idea.

The problem is where/how to set %r11 = %rflags in the test code.

The check happens in the USER1 signal handler, and we could set %r11
just before calling raise(SIGUSR1).  However, the C library implementation
of raise() modifies %r11, thus we can't preserve %r11 until the SYSCALL
instruction. And the test still fails.

Thanks!
  XIn

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: RE: the x86 sysret_rip test fails on the Intel FRED architecture
  2023-01-22  8:22           ` Li, Xin3
@ 2023-01-22  8:54             ` Ammar Faizi
  2023-01-22  9:40               ` H. Peter Anvin
  0 siblings, 1 reply; 88+ messages in thread
From: Ammar Faizi @ 2023-01-22  8:54 UTC (permalink / raw)
  To: Li, Xin3
  Cc: H. Peter Anvin, Dave Hansen, Peter Zijlstra, Dave Hansen,
	Ingo Molnar, Thomas Gleixner, Borislav Petkov, x86 Mailing List,
	Linux Kernel Mailing List

On 1/22/23 3:22 PM, Li, Xin3 wrote:
> The problem is where/how to set %r11 = %rflags in the test code.
> 
> The check happens in the USER1 signal handler, and we could set %r11
> just before calling raise(SIGUSR1).  However, the C library implementation
> of raise() modifies %r11, thus we can't preserve %r11 until the SYSCALL
> instruction. And the test still fails.

 From "man 3 raise":

"""
The raise() function sends a signal to the calling process or thread.
In a single-threaded program it is equivalent to

         kill(getpid(), sig);
"""

Implementing kill syscall with %r11 modified before entering the kernel
may look like this?

static void __raise(int sig)
{
         __asm__ volatile (
                 "pushf\n\t"
                 "popq %%r11\n\t"
                 "syscall"
                 :
                 : "D"(getpid()),        /* %rdi */
                   "S"(sig),             /* %rsi */
                   "a"(__NR_kill)        /* %rax */
                 : "rcx", "r11", "memory"
         );
}

-- 
Ammar Faizi

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: RE: the x86 sysret_rip test fails on the Intel FRED architecture
  2023-01-22  8:54             ` Ammar Faizi
@ 2023-01-22  9:40               ` H. Peter Anvin
  0 siblings, 0 replies; 88+ messages in thread
From: H. Peter Anvin @ 2023-01-22  9:40 UTC (permalink / raw)
  To: Ammar Faizi, Li, Xin3
  Cc: Dave Hansen, Peter Zijlstra, Dave Hansen, Ingo Molnar,
	Thomas Gleixner, Borislav Petkov, x86 Mailing List,
	Linux Kernel Mailing List

On January 22, 2023 12:54:56 AM PST, Ammar Faizi <ammarfaizi2@gnuweeb.org> wrote:
>On 1/22/23 3:22 PM, Li, Xin3 wrote:
>> The problem is where/how to set %r11 = %rflags in the test code.
>> 
>> The check happens in the USER1 signal handler, and we could set %r11
>> just before calling raise(SIGUSR1).  However, the C library implementation
>> of raise() modifies %r11, thus we can't preserve %r11 until the SYSCALL
>> instruction. And the test still fails.
>
>From "man 3 raise":
>
>"""
>The raise() function sends a signal to the calling process or thread.
>In a single-threaded program it is equivalent to
>
>        kill(getpid(), sig);
>"""
>
>Implementing kill syscall with %r11 modified before entering the kernel
>may look like this?
>
>static void __raise(int sig)
>{
>        __asm__ volatile (
>                "pushf\n\t"
>                "popq %%r11\n\t"
>                "syscall"
>                :
>                : "D"(getpid()),        /* %rdi */
>                  "S"(sig),             /* %rsi */
>                  "a"(__NR_kill)        /* %rax */
>                : "rcx", "r11", "memory"
>        );
>}
>

Exactly.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: the x86 sysret_rip test fails on the Intel FRED architecture
  2023-01-22  4:34       ` Dave Hansen
  2023-01-22  4:44         ` H. Peter Anvin
@ 2023-01-22 23:45         ` H. Peter Anvin
  2023-01-23  9:02           ` Ammar Faizi
  1 sibling, 1 reply; 88+ messages in thread
From: H. Peter Anvin @ 2023-01-22 23:45 UTC (permalink / raw)
  To: Dave Hansen, Li, Xin3, tglx, mingo, bp, peterz, dave.hansen
  Cc: x86, linux-kernel

On 1/21/23 20:34, Dave Hansen wrote:
> On 1/21/23 19:38, Li, Xin3 wrote:
>>> However, it doesn't seem to make sense to do so to me. The current behavior is
>>> much more of an artifact than desired behavior.
>> We kind of have an agreement that %r11 = %flags after returning from the kernel.
>>
>> And the question is, is it what we want?
> 
> Can the selftest just set r11=rflags before the syscall?  The old
> syscall entry path will set r11=rflags.  The FRED path won't touch it.
> Either case will pass an r11==rflags check.

Thinking about this some more, the whole point here is to make sure that 
all syscalls work consistently, so that we don't leak a kernel 
implementation detail.

Either r11 and rcx should both be clobbered (set to %rcx and the return 
%rip respectively), or none should be.

As such, I think we want to do something like:



/* Arbitrary values */
const unsigned long r11_sentinel = 0xfeedfacedeadbeef;
const unsigned long rcx_sentinel = 0x5ca1ab1e0b57ac1e;

/* An arbitrary *valid* RFLAGS value */
const unsigned long rflags_sentinel = 0x200893;

enum regs_ok {
	REGS_ERROR  = -1,	/* Invalid register contents */
	REGS_SAVED  =  0,	/* Registers properly preserved */
	REGS_SYSRET =  1	/* Registers match syscall/sysret */
};

/*
  * Returns:
  *  0 = %rcx and %r11 preserved
  *  1 = %rcx and %r11 set to %rflags and %rip
  * -1 = %rcx and/or %r11 set to any other values
  *
  * Note that check_regs_syscall() set %rbx to the syscall return %rip.
  */
static enum regs_ok check_regs_result(unsigned long r11,
	unsigned long rcx, unsigned long rbx)
{
	if (r11 == r11_sentinel && rcx == rcx_sentinel)
		return REGS_SAVED;
	else if (r11 == rflags_sentinel && rcx == rbx)
		return REGS_SYSRET;
	else
		return REGS_ERROR;
}

static enum regs_ok check_regs_syscall(int syscall,
	unsigned long arg1, unsigned long arg2)
{

	register unsigned long r11 asm("%r11");
	unsigned long rcx, rbx, tmp;

	r11 = r11_sentinel;
	rcx = rcx_sentinel;

	asm volatile("push %3; popf; "
		     "lea 1f(%%rip),%2; "
		     "syscall; "
		     "1:"
		     : "+r" (r11), "+c" (rcx), "=b" (rbx)
		     : "g" (rflags_sentinel),
		       "a" (syscall), "D" (arg1), "S" (arg2));

	return check_regs_result(r11, rcx, rbx);
}

enum regs_ok
check_regs_getppid(void)
{
     return check_regs_syscall(__NR_getppid, 0, 0);
}



Test it out with a trivial system call like __NR_getppid which is 
extremely likely to return with SYSRET on an IDT system; check that it 
returns a nonnegative value and then save the result.

All tests from that point on should return the *same* value, including 
the signal handler etc.

I make the result-checking function separate so that it can be fed data 
from e.g. the signal stack or ptrace.

	-hpa

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: the x86 sysret_rip test fails on the Intel FRED architecture
  2023-01-22 23:45         ` H. Peter Anvin
@ 2023-01-23  9:02           ` Ammar Faizi
  2023-01-23 19:43             ` H. Peter Anvin
  2023-01-23 23:53             ` the x86 sysret_rip test fails on the Intel FRED architecture Andrew Cooper
  0 siblings, 2 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-23  9:02 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Dave Hansen, Dave Hansen, Li, Xin3, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, x86, Linux Kernel Mailing List

On 1/23/23 6:45 AM, H. Peter Anvin wrote:
> static enum regs_ok check_regs_syscall(int syscall,
>      unsigned long arg1, unsigned long arg2)
> {
> 
>      register unsigned long r11 asm("%r11");
>      unsigned long rcx, rbx, tmp;

tmp is unused.

>      r11 = r11_sentinel;
>      rcx = rcx_sentinel;
> 
>      asm volatile("push %3; popf; "
>               "lea 1f(%%rip),%2; "
>               "syscall; "
>               "1:"
>               : "+r" (r11), "+c" (rcx), "=b" (rbx)
>               : "g" (rflags_sentinel),
>                 "a" (syscall), "D" (arg1), "S" (arg2));

BTW, I just realized this "push" is unsafe for userspace code if the
compiler decides to inline this inside a leaf function that uses the
redzone.

Reason: Because this "push;" clobbers redzone.

It doesn't always happen, but when that happens it can be confusing to
debug.

A simple workaround is: just compile it with "-mno-red-zone" flag.

Alternative, without using that flag, maybe preserve the value like:

     movq    -8(%rsp), %r12
     pushq   %[rflags_sentinel]
     popf
     movq    %r12, -8(%rsp)
     syscall

with "r12" and "memory" added to the clobber list.

What do you think?

-- 
Ammar Faizi

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: the x86 sysret_rip test fails on the Intel FRED architecture
  2023-01-23  9:02           ` Ammar Faizi
@ 2023-01-23 19:43             ` H. Peter Anvin
  2023-01-23 23:43               ` Ammar Faizi
  2023-01-23 23:53             ` the x86 sysret_rip test fails on the Intel FRED architecture Andrew Cooper
  1 sibling, 1 reply; 88+ messages in thread
From: H. Peter Anvin @ 2023-01-23 19:43 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Dave Hansen, Dave Hansen, Li, Xin3, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, x86, Linux Kernel Mailing List

On January 23, 2023 1:02:22 AM PST, Ammar Faizi <ammarfaizi2@gnuweeb.org> wrote:
>On 1/23/23 6:45 AM, H. Peter Anvin wrote:
>> static enum regs_ok check_regs_syscall(int syscall,
>>      unsigned long arg1, unsigned long arg2)
>> {
>> 
>>      register unsigned long r11 asm("%r11");
>>      unsigned long rcx, rbx, tmp;
>
>tmp is unused.
>
>>      r11 = r11_sentinel;
>>      rcx = rcx_sentinel;
>> 
>>      asm volatile("push %3; popf; "
>>               "lea 1f(%%rip),%2; "
>>               "syscall; "
>>               "1:"
>>               : "+r" (r11), "+c" (rcx), "=b" (rbx)
>>               : "g" (rflags_sentinel),
>>                 "a" (syscall), "D" (arg1), "S" (arg2));
>
>BTW, I just realized this "push" is unsafe for userspace code if the
>compiler decides to inline this inside a leaf function that uses the
>redzone.
>
>Reason: Because this "push;" clobbers redzone.
>
>It doesn't always happen, but when that happens it can be confusing to
>debug.
>
>A simple workaround is: just compile it with "-mno-red-zone" flag.
>
>Alternative, without using that flag, maybe preserve the value like:
>
>    movq    -8(%rsp), %r12
>    pushq   %[rflags_sentinel]
>    popf
>    movq    %r12, -8(%rsp)
>    syscall
>
>with "r12" and "memory" added to the clobber list.
>
>What do you think?
>

Good spotting. %rax needs to be marked clobbered, too.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: the x86 sysret_rip test fails on the Intel FRED architecture
  2023-01-23 19:43             ` H. Peter Anvin
@ 2023-01-23 23:43               ` Ammar Faizi
  2023-01-23 23:58                 ` H. Peter Anvin
  0 siblings, 1 reply; 88+ messages in thread
From: Ammar Faizi @ 2023-01-23 23:43 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Dave Hansen, Dave Hansen, Xin Li, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, x86 Mailing List,
	Linux Kernel Mailing List

On Mon, Jan 23, 2023 at 11:43:35AM -0800, H. Peter Anvin wrote:
> Good spotting. %rax needs to be marked clobbered, too.

Yeah, that 'syscall' variable should be using a "+a" constraint.

But anyway, I found something wrong with this. I was playing with your
code and I found it failed to assert that %r11 == rflags_sentinel on a
non-FRED system. It happens because "popf" doesn't set the %rflags
to the expected value.

I tried to simplify it like this:

    pushq  $0x200893
    popf              # This popf sets %rflags to 0x200a93, not to 0x200893.
    pushf
    popq   %r11

    # Now %r11 == 0x200a93,
    # but the expected value is %r11 == 0x200893.

Looking into their bits:

    (gdb) p/t 0x200893
    $1 = 1000000000100010010011
    (gdb) p/t 0x200a93
    $2 = 1000000000101010010011

Align them to spot differences:

    0x200893 = 0b1000000000100010010011
    0x200a93 = 0b1000000000101010010011
                             ^

Or just xor them to find the differences:

    (gdb) p/x 0x200893 ^ 0x200a93
    $3 = 0x200

** Checks my Intel SDM cheat sheets. **

Then, I was like "Oh, that's (1 << 9) a.k.a. IF. Of course we can't
change rflags[IF] from userspace!!!".

In short, we can't use 0x200893 as the rflags_sentinel value because it
clears the interrupt flag.

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: the x86 sysret_rip test fails on the Intel FRED architecture
  2023-01-23  9:02           ` Ammar Faizi
  2023-01-23 19:43             ` H. Peter Anvin
@ 2023-01-23 23:53             ` Andrew Cooper
  2023-01-24  0:01               ` H. Peter Anvin
  1 sibling, 1 reply; 88+ messages in thread
From: Andrew Cooper @ 2023-01-23 23:53 UTC (permalink / raw)
  To: Ammar Faizi, H. Peter Anvin
  Cc: Dave Hansen, Dave Hansen, Li, Xin3, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, x86, Linux Kernel Mailing List,
	Andrew Cooper

On 23/01/2023 9:02 am, Ammar Faizi wrote:
> On 1/23/23 6:45 AM, H. Peter Anvin wrote:
>> static enum regs_ok check_regs_syscall(int syscall,
>>      unsigned long arg1, unsigned long arg2)
>> {
>>
>>      register unsigned long r11 asm("%r11");
>>      unsigned long rcx, rbx, tmp;
>
> tmp is unused.
>
>>      r11 = r11_sentinel;
>>      rcx = rcx_sentinel;
>>
>>      asm volatile("push %3; popf; "
>>               "lea 1f(%%rip),%2; "
>>               "syscall; "
>>               "1:"
>>               : "+r" (r11), "+c" (rcx), "=b" (rbx)
>>               : "g" (rflags_sentinel),
>>                 "a" (syscall), "D" (arg1), "S" (arg2));
>
> BTW, I just realized this "push" is unsafe for userspace code if the
> compiler decides to inline this inside a leaf function that uses the
> redzone.
>
> Reason: Because this "push;" clobbers redzone.
>
> It doesn't always happen, but when that happens it can be confusing to
> debug.
>
> A simple workaround is: just compile it with "-mno-red-zone" flag.

You can't compile userspace like that, unless you recompile and
statically link everything including libc.

The proper way to fix this is to add a "+r" constraint on the stack
pointer, at which point the compiler will arrange for nothing in the
redzone to be live around the asm block.

~Andrew

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: the x86 sysret_rip test fails on the Intel FRED architecture
  2023-01-23 23:43               ` Ammar Faizi
@ 2023-01-23 23:58                 ` H. Peter Anvin
  2023-01-24  0:26                   ` [RFC PATCH v1 0/2] selftests/x86: sysret_rip update for FRED system Ammar Faizi
  0 siblings, 1 reply; 88+ messages in thread
From: H. Peter Anvin @ 2023-01-23 23:58 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Dave Hansen, Dave Hansen, Xin Li, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, x86 Mailing List,
	Linux Kernel Mailing List

On 1/23/23 15:43, Ammar Faizi wrote:
> 
> Align them to spot differences:
> 
>      0x200893 = 0b1000000000100010010011
>      0x200a93 = 0b1000000000101010010011
>                               ^
> 
> Or just xor them to find the differences:
> 
>      (gdb) p/x 0x200893 ^ 0x200a93
>      $3 = 0x200
> 
> ** Checks my Intel SDM cheat sheets. **
> 
> Then, I was like "Oh, that's (1 << 9) a.k.a. IF. Of course we can't
> change rflags[IF] from userspace!!!".
> 
> In short, we can't use 0x200893 as the rflags_sentinel value because it
> clears the interrupt flag.
> 

Right, my mistake.

	-hpa

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: the x86 sysret_rip test fails on the Intel FRED architecture
  2023-01-23 23:53             ` the x86 sysret_rip test fails on the Intel FRED architecture Andrew Cooper
@ 2023-01-24  0:01               ` H. Peter Anvin
  2023-01-24  2:27                 ` [RFC PATCH v2 0/2] selftests/x86: sysret_rip update for FRED system Ammar Faizi
  0 siblings, 1 reply; 88+ messages in thread
From: H. Peter Anvin @ 2023-01-24  0:01 UTC (permalink / raw)
  To: Andrew Cooper, Ammar Faizi
  Cc: Dave Hansen, Dave Hansen, Li, Xin3, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, x86, Linux Kernel Mailing List

On 1/23/23 15:53, Andrew Cooper wrote:
>>
>> A simple workaround is: just compile it with "-mno-red-zone" flag.
> 
> You can't compile userspace like that, unless you recompile and
> statically link everything including libc.
> 
> The proper way to fix this is to add a "+r" constraint on the stack
> pointer, at which point the compiler will arrange for nothing in the
> redzone to be live around the asm block.
> 

So you are of course right that that is The Right Way[TM] to fix this.

However, to be picky: since the redzone is not in use across function 
calls, it is perfectly legitimate to inhibit the redzone for one 
function only.

	-hpa

^ permalink raw reply	[flat|nested] 88+ messages in thread

* [RFC PATCH v1 0/2] selftests/x86: sysret_rip update for FRED system
  2023-01-23 23:58                 ` H. Peter Anvin
@ 2023-01-24  0:26                   ` Ammar Faizi
  2023-01-24  0:26                     ` [RFC PATCH v1 1/2] selftests/x86: sysret_rip: Handle syscall in a " Ammar Faizi
  2023-01-24  0:26                     ` [RFC PATCH v1 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11` Ammar Faizi
  0 siblings, 2 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-24  0:26 UTC (permalink / raw)
  To: x86 Mailing List
  Cc: Ammar Faizi, H. Peter Anvin, Dave Hansen, Dave Hansen, Xin Li,
	Thomas Gleixner, Andrew Cooper, Brian Gerst, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Shuah Khan, Ingo Molnar,
	Andy Lutomirski, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

From: Ammar Faizi <ammarfaizi2@gnuweeb.org>

On Mon, 23 Jan 2023 15:58:12 -0800, "H. Peter Anvin" wrote:
> On 1/23/23 15:43, Ammar Faizi wrote:
> > 
> > Align them to spot differences:
> > 
> >      0x200893 = 0b1000000000100010010011
> >      0x200a93 = 0b1000000000101010010011
> >                               ^
> > 
> > Or just xor them to find the differences:
> > 
> >      (gdb) p/x 0x200893 ^ 0x200a93
> >      $3 = 0x200
> > 
> > ** Checks my Intel SDM cheat sheets. **
> > 
> > Then, I was like "Oh, that's (1 << 9) a.k.a. IF. Of course we can't
> > change rflags[IF] from userspace!!!".
> > 
> > In short, we can't use 0x200893 as the rflags_sentinel value because it
> > clears the interrupt flag.
> > 
> 
> Right, my mistake.

I changed it to 0x200a93. The test passed on my machine. But I don't
have a FRED system to test the special case.

Didn't manage to apply the feedback from Andrew about the way to handle
redzone properly, though.

Something like this...

----------

This is just an RFC patchset.

Xin Li reported sysret_rip test fails at:

        assert(ctx->uc_mcontext.gregs[REG_EFL] ==
               ctx->uc_mcontext.gregs[REG_R11]);

in a FRED system. Handle the FRED system scenario too. There are two
patches in this series. Comments welcome...

Note: Only tested for 'syscall' sets %rcx=%rip and %r11=%rflags case.
I don't have a FRED system to test it.

How to test this:

  $ make -C tools/testing/selftests/x86
  $ tools/testing/selftests/x86/sysret_rip_64

Link: https://lore.kernel.org/lkml/5d4ad3e3-034f-c7da-d141-9c001c2343af@intel.com
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---

Ammar Faizi (2):
  selftests/x86: sysret_rip: Handle syscall in a FRED system
  selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`

 tools/testing/selftests/x86/sysret_rip.c | 105 ++++++++++++++++++++++-
 1 file changed, 104 insertions(+), 1 deletion(-)


base-commit: e12ad468c22065a2826b2fc4c11d2113a7975301
-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [RFC PATCH v1 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-01-24  0:26                   ` [RFC PATCH v1 0/2] selftests/x86: sysret_rip update for FRED system Ammar Faizi
@ 2023-01-24  0:26                     ` Ammar Faizi
  2023-01-24  1:40                       ` H. Peter Anvin
  2023-01-24  0:26                     ` [RFC PATCH v1 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11` Ammar Faizi
  1 sibling, 1 reply; 88+ messages in thread
From: Ammar Faizi @ 2023-01-24  0:26 UTC (permalink / raw)
  To: x86 Mailing List
  Cc: Ammar Faizi, H. Peter Anvin, Dave Hansen, Dave Hansen, Xin Li,
	Thomas Gleixner, Andrew Cooper, Brian Gerst, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Shuah Khan, Ingo Molnar,
	Andy Lutomirski, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

From: Ammar Faizi <ammarfaizi2@gnuweeb.org>

The current selftest asserts %r11 == %rflags after the 'syscall'
returns to user. Such an assertion doesn't apply to a FRED system
because in a FRED system the 'syscall' instruction does not set
%r11=%rflags and %rcx=%rip.

Handle the FRED case. Now, test that:

  - "syscall" in a FRED system doesn't clobber %rcx and %r11.
  - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.

The 'raise()' function from libc can't be used to control those
registers. Therefore, create a syscall wrapper in inline Assembly to
fully control them.

Fixes: 660602140103 ("selftests/x86: Add a selftest for SYSRET to noncanonical addresses")
Link: https://lore.kernel.org/lkml/25b96960-a07e-a952-5c23-786b55054126@zytor.com
Reported-by: Xin Li <xin3.li@intel.com>
Co-developed-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---

Missing a Signed-off-by tag from HPA. 

@hpa send your sign off if you like this patch.

 tools/testing/selftests/x86/sysret_rip.c | 96 +++++++++++++++++++++++-
 1 file changed, 95 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/sysret_rip.c b/tools/testing/selftests/x86/sysret_rip.c
index 84d74be1d90207ab..9056f2e2674d2bc5 100644
--- a/tools/testing/selftests/x86/sysret_rip.c
+++ b/tools/testing/selftests/x86/sysret_rip.c
@@ -39,6 +39,95 @@ asm (
 extern const char test_page[];
 static void const *current_test_page_addr = test_page;
 
+/* Arbitrary values */
+static const unsigned long r11_sentinel = 0xfeedfacedeadbeef;
+static const unsigned long rcx_sentinel = 0x5ca1ab1e0b57ac1e;
+
+/* An arbitrary *valid* RFLAGS value */
+static const unsigned long rflags_sentinel = 0x200a93;
+
+enum regs_ok {
+	REGS_ERROR  = -1,	/* Invalid register contents */
+	REGS_SAVED  =  0,	/* Registers properly preserved */
+	REGS_SYSRET =  1	/* Registers match syscall/sysret */
+};
+
+/*
+ * Returns:
+ *  0 = %rcx and %r11 preserved.
+ *  1 = %rcx and %r11 set to %rflags and %rip.
+ * -1 = %rcx and/or %r11 set to any other values.
+ *
+ * Note that check_regs_syscall() sets %rbx to the syscall return %rip.
+ */
+static enum regs_ok check_regs_result(unsigned long r11, unsigned long rcx,
+				      unsigned long rbx)
+{
+	if (r11 == r11_sentinel && rcx == rcx_sentinel) {
+		return REGS_SAVED;
+	} else if (r11 == rflags_sentinel && rcx == rbx) {
+		return REGS_SYSRET;
+	} else {
+		printf("[FAIL] check_regs_result\n");
+		printf("        r11_sentinel = %#lx; %%r11 = %#lx;\n", r11_sentinel, r11);
+		printf("        rcx_sentinel = %#lx; %%rcx = %#lx;\n", rcx_sentinel, rcx);
+		printf("        rflags_sentinel = %#lx\n", rflags_sentinel);
+		return REGS_ERROR;
+	}
+}
+
+static long do_syscall(long nr_syscall, unsigned long arg1, unsigned long arg2,
+		       unsigned long arg3, unsigned long arg4,
+		       unsigned long arg5, unsigned long arg6)
+{
+	register unsigned long r11 asm("%r11");
+	register unsigned long r10 asm("%r10");
+	register unsigned long r8 asm("%r8");
+	register unsigned long r9 asm("%r9");
+	unsigned long rcx, rbx;
+
+	r11 = r11_sentinel;
+	rcx = rcx_sentinel;
+	r10 = arg4;
+	r8 = arg5;
+	r9 = arg6;
+
+	asm volatile (
+		"movq	-8(%%rsp), %%r12\n\t"	/* Don't clobber redzone. */
+		"pushq	%[rflags_sentinel]\n\t"
+		"popf\n\t"
+		"movq	%%r12, -8(%%rsp)\n\t"
+		"leaq	1f(%%rip), %[rbx]\n\t"
+		"syscall\n"
+		"1:"
+
+		: "+a" (nr_syscall),
+		  "+r" (r11),
+		  "+c" (rcx),
+		  [rbx] "=b" (rbx)
+
+		: [rflags_sentinel] "g" (rflags_sentinel),
+		  "D" (arg1),	/* %rdi */
+		  "S" (arg2),	/* %rsi */
+		  "d" (arg3),	/* %rdx */
+		  "r" (r10),
+		  "r" (r8),
+		  "r" (r9)
+
+		: "r12", "memory"
+	);
+
+	/*
+	 * Test that:
+	 *
+	 * - "syscall" in a FRED system doesn't clobber %rcx and %r11.
+	 * - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.
+	 *
+	 */
+	assert(check_regs_result(r11, rcx, rbx) != REGS_ERROR);
+	return nr_syscall;
+}
+
 static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *),
 		       int flags)
 {
@@ -101,11 +190,16 @@ static void sigusr1(int sig, siginfo_t *info, void *ctx_void)
 	return;
 }
 
+static void __raise(int sig)
+{
+	do_syscall(__NR_kill, getpid(), sig, 0, 0, 0, 0);
+}
+
 static void test_sigreturn_to(unsigned long ip)
 {
 	rip = ip;
 	printf("[RUN]\tsigreturn to 0x%lx\n", ip);
-	raise(SIGUSR1);
+	__raise(SIGUSR1);
 }
 
 static jmp_buf jmpbuf;
-- 
Ammar Faizi


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [RFC PATCH v1 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`
  2023-01-24  0:26                   ` [RFC PATCH v1 0/2] selftests/x86: sysret_rip update for FRED system Ammar Faizi
  2023-01-24  0:26                     ` [RFC PATCH v1 1/2] selftests/x86: sysret_rip: Handle syscall in a " Ammar Faizi
@ 2023-01-24  0:26                     ` Ammar Faizi
  1 sibling, 0 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-24  0:26 UTC (permalink / raw)
  To: x86 Mailing List
  Cc: Ammar Faizi, H. Peter Anvin, Dave Hansen, Dave Hansen, Xin Li,
	Thomas Gleixner, Andrew Cooper, Brian Gerst, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Shuah Khan, Ingo Molnar,
	Andy Lutomirski, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

From: Ammar Faizi <ammarfaizi2@gnuweeb.org>

Test that:

 - "syscall" in a FRED system doesn't clobber %rcx and %r11.
 - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.

Test them out with a trivial system call like __NR_getppid and friends
which are extremely likely to return with SYSRET on an IDT system; check
that it returns a nonnegative value and then save the result.

Link: https://lore.kernel.org/lkml/25b96960-a07e-a952-5c23-786b55054126@zytor.com
Co-developed-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---

Missing a Signed-off-by tag from HPA. 

@hpa send your sign off if you like this patch.

 tools/testing/selftests/x86/sysret_rip.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/testing/selftests/x86/sysret_rip.c b/tools/testing/selftests/x86/sysret_rip.c
index 9056f2e2674d2bc5..c55f6d04f0ae1f2d 100644
--- a/tools/testing/selftests/x86/sysret_rip.c
+++ b/tools/testing/selftests/x86/sysret_rip.c
@@ -252,8 +252,17 @@ static void test_syscall_fallthrough_to(unsigned long ip)
 	printf("[OK]\tWe survived\n");
 }
 
+static void test_syscall_rcx_r11(void)
+{
+	do_syscall(__NR_getpid, 0, 0, 0, 0, 0, 0);
+	do_syscall(__NR_gettid, 0, 0, 0, 0, 0, 0);
+	do_syscall(__NR_getppid, 0, 0, 0, 0, 0, 0);
+}
+
 int main()
 {
+	test_syscall_rcx_r11();
+
 	/*
 	 * When the kernel returns from a slow-path syscall, it will
 	 * detect whether SYSRET is appropriate.  If it incorrectly
-- 
Ammar Faizi


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v1 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-01-24  0:26                     ` [RFC PATCH v1 1/2] selftests/x86: sysret_rip: Handle syscall in a " Ammar Faizi
@ 2023-01-24  1:40                       ` H. Peter Anvin
  2023-01-24  2:31                         ` Ammar Faizi
                                           ` (2 more replies)
  0 siblings, 3 replies; 88+ messages in thread
From: H. Peter Anvin @ 2023-01-24  1:40 UTC (permalink / raw)
  To: Ammar Faizi, x86 Mailing List
  Cc: Dave Hansen, Dave Hansen, Xin Li, Thomas Gleixner, Andrew Cooper,
	Brian Gerst, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Shuah Khan, Ingo Molnar, Andy Lutomirski, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

On 1/23/23 16:26, Ammar Faizi wrote:
> +
> +static long do_syscall(long nr_syscall, unsigned long arg1, unsigned long arg2,
> +		       unsigned long arg3, unsigned long arg4,
> +		       unsigned long arg5, unsigned long arg6)
> +{
> +	register unsigned long r11 asm("%r11");
> +	register unsigned long r10 asm("%r10");
> +	register unsigned long r8 asm("%r8");
> +	register unsigned long r9 asm("%r9");
> +	unsigned long rcx, rbx;
> +
> +	r11 = r11_sentinel;
> +	rcx = rcx_sentinel;
> +	r10 = arg4;
> +	r8 = arg5;
> +	r9 = arg6;
> +
> +	asm volatile (
> +		"movq	-8(%%rsp), %%r12\n\t"	/* Don't clobber redzone. */
> +		"pushq	%[rflags_sentinel]\n\t"
> +		"popf\n\t"
> +		"movq	%%r12, -8(%%rsp)\n\t"
> +		"leaq	1f(%%rip), %[rbx]\n\t"
> +		"syscall\n"
> +		"1:"
> +
> +		: "+a" (nr_syscall),
> +		  "+r" (r11),
> +		  "+c" (rcx),
> +		  [rbx] "=b" (rbx)
> +
> +		: [rflags_sentinel] "g" (rflags_sentinel),
> +		  "D" (arg1),	/* %rdi */
> +		  "S" (arg2),	/* %rsi */
> +		  "d" (arg3),	/* %rdx */
> +		  "r" (r10),
> +		  "r" (r8),
> +		  "r" (r9)
> +
> +		: "r12", "memory"
> +	);
> +
> +	/*
> +	 * Test that:
> +	 *
> +	 * - "syscall" in a FRED system doesn't clobber %rcx and %r11.
> +	 * - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.
> +	 *
> +	 */
> +	assert(check_regs_result(r11, rcx, rbx) != REGS_ERROR);
> +	return nr_syscall;
> +}
> +

So as per Andrew's comment, add:

register void * rsp asm("%rsp");

...

"+r" (rsp)	/* clobber the redzone */

... as the right way to avoid redzone problems.

	-hpa

^ permalink raw reply	[flat|nested] 88+ messages in thread

* [RFC PATCH v2 0/2] selftests/x86: sysret_rip update for FRED system
  2023-01-24  0:01               ` H. Peter Anvin
@ 2023-01-24  2:27                 ` Ammar Faizi
  2023-01-24  2:27                   ` [RFC PATCH v2 1/2] selftests/x86: sysret_rip: Handle syscall in a " Ammar Faizi
  2023-01-24  2:27                   ` [RFC PATCH v2 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11` Ammar Faizi
  0 siblings, 2 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-24  2:27 UTC (permalink / raw)
  To: H. Peter Anvin, x86 Mailing List
  Cc: Ammar Faizi, Dave Hansen, Dave Hansen, Xin Li, Thomas Gleixner,
	Andrew Cooper, Brian Gerst, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Shuah Khan, Ingo Molnar, Andy Lutomirski,
	Kirill A. Shutemov, Linux Kselftest Mailing List,
	Linux Kernel Mailing List

From: Ammar Faizi <ammarfaizi2@gnuweeb.org>

This is an RFC patchset v2.

Xin Li reported sysret_rip test fails at:

        assert(ctx->uc_mcontext.gregs[REG_EFL] ==
               ctx->uc_mcontext.gregs[REG_R11]);

in a FRED system. Handle the FRED system scenario too. There are two
patches in this series. Comments welcome...

Note: This patchset is only tested for 'syscall' sets %rcx=%rip and
%r11=%rflags case. I don't have a FRED system to test it.

How to test this:

  $ make -C tools/testing/selftests/x86
  $ tools/testing/selftests/x86/sysret_rip_64

Link: https://lore.kernel.org/lkml/5d4ad3e3-034f-c7da-d141-9c001c2343af@intel.com
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---

## Changelog v2:

   - Use "+r"(rsp) as the right way to avoid redzone problems
     per Andrew's comment (hpa).
     (Ref: https://lore.kernel.org/lkml/8f5c24df-514d-5d89-f58f-ec8c3eb1e049@zytor.com )

---

Ammar Faizi (2):
  selftests/x86: sysret_rip: Handle syscall in a FRED system
  selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`

 tools/testing/selftests/x86/sysret_rip.c | 105 ++++++++++++++++++++++-
 1 file changed, 104 insertions(+), 1 deletion(-)


base-commit: e12ad468c22065a2826b2fc4c11d2113a7975301
-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [RFC PATCH v2 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-01-24  2:27                 ` [RFC PATCH v2 0/2] selftests/x86: sysret_rip update for FRED system Ammar Faizi
@ 2023-01-24  2:27                   ` Ammar Faizi
  2023-01-24  5:44                     ` H. Peter Anvin
  2023-01-24  2:27                   ` [RFC PATCH v2 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11` Ammar Faizi
  1 sibling, 1 reply; 88+ messages in thread
From: Ammar Faizi @ 2023-01-24  2:27 UTC (permalink / raw)
  To: H. Peter Anvin, x86 Mailing List
  Cc: Ammar Faizi, Dave Hansen, Dave Hansen, Xin Li, Thomas Gleixner,
	Andrew Cooper, Brian Gerst, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Shuah Khan, Ingo Molnar, Andy Lutomirski,
	Kirill A. Shutemov, Linux Kselftest Mailing List,
	Linux Kernel Mailing List

From: Ammar Faizi <ammarfaizi2@gnuweeb.org>

The current selftest asserts %r11 == %rflags after the 'syscall'
returns to user. Such an assertion doesn't apply to a FRED system
because in a FRED system the 'syscall' instruction does not set
%r11=%rflags and %rcx=%rip.

Handle the FRED case. Now, test that:

  - "syscall" in a FRED system doesn't clobber %rcx and %r11.
  - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.

The 'raise()' function from libc can't be used to control those
registers. Therefore, create a syscall wrapper in inline Assembly to
fully control them.

Fixes: 660602140103 ("selftests/x86: Add a selftest for SYSRET to noncanonical addresses")
Link: https://lore.kernel.org/lkml/25b96960-a07e-a952-5c23-786b55054126@zytor.com
Reported-by: Xin Li <xin3.li@intel.com>
Co-developed-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---

Need hpa's sign off.

## Changelog v2:

   - Use "+r"(rsp) as the right way to avoid redzone problems
     per Andrew's comment (hpa).
     (Ref: https://lore.kernel.org/lkml/8f5c24df-514d-5d89-f58f-ec8c3eb1e049@zytor.com )

 tools/testing/selftests/x86/sysret_rip.c | 96 +++++++++++++++++++++++-
 1 file changed, 95 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/sysret_rip.c b/tools/testing/selftests/x86/sysret_rip.c
index 84d74be1d90207ab..550bc4e69ac46997 100644
--- a/tools/testing/selftests/x86/sysret_rip.c
+++ b/tools/testing/selftests/x86/sysret_rip.c
@@ -39,6 +39,95 @@ asm (
 extern const char test_page[];
 static void const *current_test_page_addr = test_page;
 
+/* Arbitrary values */
+static const unsigned long r11_sentinel = 0xfeedfacedeadbeef;
+static const unsigned long rcx_sentinel = 0x5ca1ab1e0b57ac1e;
+
+/* An arbitrary *valid* RFLAGS value */
+static const unsigned long rflags_sentinel = 0x200a93;
+
+enum regs_ok {
+	REGS_ERROR  = -1,	/* Invalid register contents */
+	REGS_SAVED  =  0,	/* Registers properly preserved */
+	REGS_SYSRET =  1	/* Registers match syscall/sysret */
+};
+
+/*
+ * Returns:
+ *  0 = %rcx and %r11 preserved.
+ *  1 = %rcx and %r11 set to %rflags and %rip.
+ * -1 = %rcx and/or %r11 set to any other values.
+ *
+ * Note that check_regs_syscall() sets %rbx to the syscall return %rip.
+ */
+static enum regs_ok check_regs_result(unsigned long r11, unsigned long rcx,
+				      unsigned long rbx)
+{
+	if (r11 == r11_sentinel && rcx == rcx_sentinel) {
+		return REGS_SAVED;
+	} else if (r11 == rflags_sentinel && rcx == rbx) {
+		return REGS_SYSRET;
+	} else {
+		printf("[FAIL] check_regs_result\n");
+		printf("        r11_sentinel = %#lx; %%r11 = %#lx;\n", r11_sentinel, r11);
+		printf("        rcx_sentinel = %#lx; %%rcx = %#lx;\n", rcx_sentinel, rcx);
+		printf("        rflags_sentinel = %#lx\n", rflags_sentinel);
+		return REGS_ERROR;
+	}
+}
+
+static long do_syscall(long nr_syscall, unsigned long arg1, unsigned long arg2,
+		       unsigned long arg3, unsigned long arg4,
+		       unsigned long arg5, unsigned long arg6)
+{
+	register unsigned long r11 asm("%r11");
+	register unsigned long r10 asm("%r10");
+	register unsigned long r8 asm("%r8");
+	register unsigned long r9 asm("%r9");
+	register void *rsp asm("%rsp");
+	unsigned long rcx, rbx;
+
+	r11 = r11_sentinel;
+	rcx = rcx_sentinel;
+	r10 = arg4;
+	r8 = arg5;
+	r9 = arg6;
+
+	asm volatile (
+		"pushq	%[rflags_sentinel]\n\t"
+		"popf\n\t"
+		"leaq	1f(%%rip), %[rbx]\n\t"
+		"syscall\n"
+		"1:"
+
+		: "+a" (nr_syscall),
+		  "+r" (r11),
+		  "+c" (rcx),
+		  [rbx] "=b" (rbx),
+		  "+r" (rsp)	/* Clobber the redzone */
+
+		: [rflags_sentinel] "g" (rflags_sentinel),
+		  "D" (arg1),	/* %rdi */
+		  "S" (arg2),	/* %rsi */
+		  "d" (arg3),	/* %rdx */
+		  "r" (r10),
+		  "r" (r8),
+		  "r" (r9)
+
+		: "memory"
+	);
+
+	/*
+	 * Test that:
+	 *
+	 * - "syscall" in a FRED system doesn't clobber %rcx and %r11.
+	 * - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.
+	 *
+	 */
+	assert(check_regs_result(r11, rcx, rbx) != REGS_ERROR);
+	return nr_syscall;
+}
+
 static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *),
 		       int flags)
 {
@@ -101,11 +190,16 @@ static void sigusr1(int sig, siginfo_t *info, void *ctx_void)
 	return;
 }
 
+static void __raise(int sig)
+{
+	do_syscall(__NR_kill, getpid(), sig, 0, 0, 0, 0);
+}
+
 static void test_sigreturn_to(unsigned long ip)
 {
 	rip = ip;
 	printf("[RUN]\tsigreturn to 0x%lx\n", ip);
-	raise(SIGUSR1);
+	__raise(SIGUSR1);
 }
 
 static jmp_buf jmpbuf;
-- 
Ammar Faizi


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [RFC PATCH v2 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`
  2023-01-24  2:27                 ` [RFC PATCH v2 0/2] selftests/x86: sysret_rip update for FRED system Ammar Faizi
  2023-01-24  2:27                   ` [RFC PATCH v2 1/2] selftests/x86: sysret_rip: Handle syscall in a " Ammar Faizi
@ 2023-01-24  2:27                   ` Ammar Faizi
  2023-01-24  6:16                     ` H. Peter Anvin
  1 sibling, 1 reply; 88+ messages in thread
From: Ammar Faizi @ 2023-01-24  2:27 UTC (permalink / raw)
  To: H. Peter Anvin, x86 Mailing List
  Cc: Ammar Faizi, Dave Hansen, Dave Hansen, Xin Li, Thomas Gleixner,
	Andrew Cooper, Brian Gerst, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Shuah Khan, Ingo Molnar, Andy Lutomirski,
	Kirill A. Shutemov, Linux Kselftest Mailing List,
	Linux Kernel Mailing List

From: Ammar Faizi <ammarfaizi2@gnuweeb.org>

Test that:

 - "syscall" in a FRED system doesn't clobber %rcx and %r11.
 - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.

Test them out with a trivial system call like __NR_getppid and friends
which are extremely likely to return with SYSRET on an IDT system; check
that it returns a nonnegative value and then save the result.

Link: https://lore.kernel.org/lkml/25b96960-a07e-a952-5c23-786b55054126@zytor.com
Co-developed-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---

Need hpa's sign off.

 tools/testing/selftests/x86/sysret_rip.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/testing/selftests/x86/sysret_rip.c b/tools/testing/selftests/x86/sysret_rip.c
index 550bc4e69ac46997..75c72d34dbc5840c 100644
--- a/tools/testing/selftests/x86/sysret_rip.c
+++ b/tools/testing/selftests/x86/sysret_rip.c
@@ -252,8 +252,17 @@ static void test_syscall_fallthrough_to(unsigned long ip)
 	printf("[OK]\tWe survived\n");
 }
 
+static void test_syscall_rcx_r11(void)
+{
+	do_syscall(__NR_getpid, 0, 0, 0, 0, 0, 0);
+	do_syscall(__NR_gettid, 0, 0, 0, 0, 0, 0);
+	do_syscall(__NR_getppid, 0, 0, 0, 0, 0, 0);
+}
+
 int main()
 {
+	test_syscall_rcx_r11();
+
 	/*
 	 * When the kernel returns from a slow-path syscall, it will
 	 * detect whether SYSRET is appropriate.  If it incorrectly
-- 
Ammar Faizi


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v1 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-01-24  1:40                       ` H. Peter Anvin
@ 2023-01-24  2:31                         ` Ammar Faizi
  2023-01-26 20:08                         ` Ammar Faizi
  2023-01-26 20:16                         ` Ammar Faizi
  2 siblings, 0 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-24  2:31 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: x86 Mailing List, Dave Hansen, Dave Hansen, Xin Li,
	Thomas Gleixner, Andrew Cooper, Brian Gerst, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Shuah Khan, Ingo Molnar,
	Andy Lutomirski, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

On Mon, Jan 23, 2023 at 05:40:23PM -0800, H. Peter Anvin wrote:
> On 1/23/23 16:26, Ammar Faizi wrote:
> > +
> > +static long do_syscall(long nr_syscall, unsigned long arg1, unsigned long arg2,
> > +		       unsigned long arg3, unsigned long arg4,
> > +		       unsigned long arg5, unsigned long arg6)
> > +{
> > +	register unsigned long r11 asm("%r11");
> > +	register unsigned long r10 asm("%r10");
> > +	register unsigned long r8 asm("%r8");
> > +	register unsigned long r9 asm("%r9");
> > +	unsigned long rcx, rbx;
> > +
> > +	r11 = r11_sentinel;
> > +	rcx = rcx_sentinel;
> > +	r10 = arg4;
> > +	r8 = arg5;
> > +	r9 = arg6;
> > +
> > +	asm volatile (
> > +		"movq	-8(%%rsp), %%r12\n\t"	/* Don't clobber redzone. */
> > +		"pushq	%[rflags_sentinel]\n\t"
> > +		"popf\n\t"
> > +		"movq	%%r12, -8(%%rsp)\n\t"
> > +		"leaq	1f(%%rip), %[rbx]\n\t"
> > +		"syscall\n"
> > +		"1:"
> > +
> > +		: "+a" (nr_syscall),
> > +		  "+r" (r11),
> > +		  "+c" (rcx),
> > +		  [rbx] "=b" (rbx)
> > +
> > +		: [rflags_sentinel] "g" (rflags_sentinel),
> > +		  "D" (arg1),	/* %rdi */
> > +		  "S" (arg2),	/* %rsi */
> > +		  "d" (arg3),	/* %rdx */
> > +		  "r" (r10),
> > +		  "r" (r8),
> > +		  "r" (r9)
> > +
> > +		: "r12", "memory"
> > +	);
> > +
> > +	/*
> > +	 * Test that:
> > +	 *
> > +	 * - "syscall" in a FRED system doesn't clobber %rcx and %r11.
> > +	 * - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.
> > +	 *
> > +	 */
> > +	assert(check_regs_result(r11, rcx, rbx) != REGS_ERROR);
> > +	return nr_syscall;
> > +}
> > +
> 
> So as per Andrew's comment, add:
> 
> register void * rsp asm("%rsp");
> 
> ...
> 
> "+r" (rsp)	/* clobber the redzone */
> 
> ... as the right way to avoid redzone problems.

Fixed in v2.

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v2 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-01-24  2:27                   ` [RFC PATCH v2 1/2] selftests/x86: sysret_rip: Handle syscall in a " Ammar Faizi
@ 2023-01-24  5:44                     ` H. Peter Anvin
  0 siblings, 0 replies; 88+ messages in thread
From: H. Peter Anvin @ 2023-01-24  5:44 UTC (permalink / raw)
  To: Ammar Faizi, x86 Mailing List
  Cc: Dave Hansen, Dave Hansen, Xin Li, Thomas Gleixner, Andrew Cooper,
	Brian Gerst, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Shuah Khan, Ingo Molnar, Andy Lutomirski, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List



On 1/23/23 18:27, Ammar Faizi wrote:
> From: Ammar Faizi <ammarfaizi2@gnuweeb.org>
> 
> The current selftest asserts %r11 == %rflags after the 'syscall'
> returns to user. Such an assertion doesn't apply to a FRED system
> because in a FRED system the 'syscall' instruction does not set
> %r11=%rflags and %rcx=%rip.
> 
> Handle the FRED case. Now, test that:
> 
>    - "syscall" in a FRED system doesn't clobber %rcx and %r11.
>    - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.
> 
> The 'raise()' function from libc can't be used to control those
> registers. Therefore, create a syscall wrapper in inline Assembly to
> fully control them.
> 
> Fixes: 660602140103 ("selftests/x86: Add a selftest for SYSRET to noncanonical addresses")
> Link: https://lore.kernel.org/lkml/25b96960-a07e-a952-5c23-786b55054126@zytor.com
> Reported-by: Xin Li <xin3.li@intel.com>
> Co-developed-by: "H. Peter Anvin" <hpa@zytor.com>
> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
> ---
> 
> Need hpa's sign off.
> 

For both patches:

Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v2 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`
  2023-01-24  2:27                   ` [RFC PATCH v2 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11` Ammar Faizi
@ 2023-01-24  6:16                     ` H. Peter Anvin
  2023-01-24  6:41                       ` Ammar Faizi
  0 siblings, 1 reply; 88+ messages in thread
From: H. Peter Anvin @ 2023-01-24  6:16 UTC (permalink / raw)
  To: Ammar Faizi, x86 Mailing List
  Cc: Dave Hansen, Dave Hansen, Xin Li, Thomas Gleixner, Andrew Cooper,
	Brian Gerst, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Shuah Khan, Ingo Molnar, Andy Lutomirski, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List



On 1/23/23 18:27, Ammar Faizi wrote:
> From: Ammar Faizi <ammarfaizi2@gnuweeb.org>
> 
> Test that:
> 
>   - "syscall" in a FRED system doesn't clobber %rcx and %r11.
>   - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.
> 
> Test them out with a trivial system call like __NR_getppid and friends
> which are extremely likely to return with SYSRET on an IDT system; check
> that it returns a nonnegative value and then save the result.
> 

"Nonnegative" here should be "valid"; it is an implementation detail 
that the error value is -1.

However, you are not checking that you don't get a mix of REGS_SAVED and 
REGS_SYSRET, which is a major part of the point.

	-hpa



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v2 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`
  2023-01-24  6:16                     ` H. Peter Anvin
@ 2023-01-24  6:41                       ` Ammar Faizi
  2023-01-24  6:47                         ` Ammar Faizi
  2023-01-24  9:07                         ` H. Peter Anvin
  0 siblings, 2 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-24  6:41 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: x86 Mailing List, Dave Hansen, Dave Hansen, Xin Li,
	Thomas Gleixner, Andrew Cooper, Brian Gerst, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Shuah Khan, Ingo Molnar,
	Andy Lutomirski, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

On Mon, Jan 23, 2023 at 10:16:01PM -0800, H. Peter Anvin wrote:
> On 1/23/23 18:27, Ammar Faizi wrote:
> > Test that:
> > 
> >   - "syscall" in a FRED system doesn't clobber %rcx and %r11.
> >   - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.
> > 
> > Test them out with a trivial system call like __NR_getppid and friends
> > which are extremely likely to return with SYSRET on an IDT system; check
> > that it returns a nonnegative value and then save the result.
> > 
> 
> "Nonnegative" here should be "valid"; it is an implementation detail that
> the error value is -1.

Copy-paste error, will fix that!

> However, you are not checking that you don't get a mix of REGS_SAVED and
> REGS_SYSRET, which is a major part of the point.

Good point!

What do you think of adding this on top of patch #1?

diff --git a/tools/testing/selftests/x86/sysret_rip.c b/tools/testing/selftests/x86/sysret_rip.c
index 75c72d34dbc5840c..3da827713831acbc 100644
--- a/tools/testing/selftests/x86/sysret_rip.c
+++ b/tools/testing/selftests/x86/sysret_rip.c
@@ -47,11 +47,14 @@ static const unsigned long rcx_sentinel = 0x5ca1ab1e0b57ac1e;
 static const unsigned long rflags_sentinel = 0x200a93;
 
 enum regs_ok {
-	REGS_ERROR  = -1,	/* Invalid register contents */
-	REGS_SAVED  =  0,	/* Registers properly preserved */
-	REGS_SYSRET =  1	/* Registers match syscall/sysret */
+	REGS_INIT_VAL	= -2,	/* For init value checker, never returned */
+	REGS_ERROR 	= -1,	/* Invalid register contents */
+	REGS_SAVED 	=  0,	/* Registers properly preserved */
+	REGS_SYSRET	=  1	/* Registers match syscall/sysret */
 };
 
+static enum regs_ok regs_ok_state = REGS_INIT_VAL;
+
 /*
  * Returns:
  *  0 = %rcx and %r11 preserved.
@@ -86,6 +89,7 @@ static long do_syscall(long nr_syscall, unsigned long arg1, unsigned long arg2,
 	register unsigned long r9 asm("%r9");
 	register void *rsp asm("%rsp");
 	unsigned long rcx, rbx;
+	enum regs_ok ret;
 
 	r11 = r11_sentinel;
 	rcx = rcx_sentinel;
@@ -124,7 +128,14 @@ static long do_syscall(long nr_syscall, unsigned long arg1, unsigned long arg2,
 	 * - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.
 	 *
 	 */
-	assert(check_regs_result(r11, rcx, rbx) != REGS_ERROR);
+	ret = check_regs_result(r11, rcx, rbx);
+	assert(ret != REGS_ERROR);
+
+	if (regs_ok_state == REGS_INIT_VAL)
+		regs_ok_state = ret;
+	else
+		assert(ret == regs_ok_state);
+
 	return nr_syscall;
 }
 
-- 
Ammar Faizi


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v2 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`
  2023-01-24  6:41                       ` Ammar Faizi
@ 2023-01-24  6:47                         ` Ammar Faizi
  2023-01-24  9:07                         ` H. Peter Anvin
  1 sibling, 0 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-24  6:47 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: x86 Mailing List, Dave Hansen, Dave Hansen, Xin Li,
	Thomas Gleixner, Andrew Cooper, Brian Gerst, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Shuah Khan, Ingo Molnar,
	Andy Lutomirski, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

On Tue, Jan 24, 2023 at 01:41:30PM +0700, Ammar Faizi wrote:
> On Mon, Jan 23, 2023 at 10:16:01PM -0800, H. Peter Anvin wrote:
> > However, you are not checking that you don't get a mix of REGS_SAVED and
> > REGS_SYSRET, which is a major part of the point.
> 
> Good point!
> 
> What do you think of adding this on top of patch #1?
> 
> diff --git a/tools/testing/selftests/x86/sysret_rip.c b/tools/testing/selftests/x86/sysret_rip.c
> index 75c72d34dbc5840c..3da827713831acbc 100644
> --- a/tools/testing/selftests/x86/sysret_rip.c
> +++ b/tools/testing/selftests/x86/sysret_rip.c
> @@ -47,11 +47,14 @@ static const unsigned long rcx_sentinel = 0x5ca1ab1e0b57ac1e;
>  static const unsigned long rflags_sentinel = 0x200a93;
>  
>  enum regs_ok {
> -	REGS_ERROR  = -1,	/* Invalid register contents */
> -	REGS_SAVED  =  0,	/* Registers properly preserved */
> -	REGS_SYSRET =  1	/* Registers match syscall/sysret */
> +	REGS_INIT_VAL	= -2,	/* For init value checker, never returned */
> +	REGS_ERROR 	= -1,	/* Invalid register contents */
> +	REGS_SAVED 	=  0,	/* Registers properly preserved */
> +	REGS_SYSRET	=  1	/* Registers match syscall/sysret */
>  };
>  
> +static enum regs_ok regs_ok_state = REGS_INIT_VAL;
> +
>  /*
>   * Returns:
>   *  0 = %rcx and %r11 preserved.
> @@ -86,6 +89,7 @@ static long do_syscall(long nr_syscall, unsigned long arg1, unsigned long arg2,
>  	register unsigned long r9 asm("%r9");
>  	register void *rsp asm("%rsp");
>  	unsigned long rcx, rbx;
> +	enum regs_ok ret;
>  
>  	r11 = r11_sentinel;
>  	rcx = rcx_sentinel;
> @@ -124,7 +128,14 @@ static long do_syscall(long nr_syscall, unsigned long arg1, unsigned long arg2,
>  	 * - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.
>  	 *
>  	 */
> -	assert(check_regs_result(r11, rcx, rbx) != REGS_ERROR);
> +	ret = check_regs_result(r11, rcx, rbx);
> +	assert(ret != REGS_ERROR);
> +
> +	if (regs_ok_state == REGS_INIT_VAL)
> +		regs_ok_state = ret;
> +	else
> +		assert(ret == regs_ok_state);
> +
>  	return nr_syscall;
>  }
>  

And oh, on top of that. Add a comment, just to make it clear...

diff --git a/tools/testing/selftests/x86/sysret_rip.c b/tools/testing/selftests/x86/sysret_rip.c
index 3da827713831acbc..3d783f5baee1b66a 100644
--- a/tools/testing/selftests/x86/sysret_rip.c
+++ b/tools/testing/selftests/x86/sysret_rip.c
@@ -131,6 +131,10 @@ static long do_syscall(long nr_syscall, unsigned long arg1, unsigned long arg2,
        ret = check_regs_result(r11, rcx, rbx);
        assert(ret != REGS_ERROR);
 
+       /*
+        * Check that we don't get a mix of REGS_SAVED and REGS_SYSRET.
+        * Need at least 2 times 'syscall' invoked from this function.
+        */
        if (regs_ok_state == REGS_INIT_VAL)
                regs_ok_state = ret;
        else

-- 
Ammar Faizi


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v2 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`
  2023-01-24  6:41                       ` Ammar Faizi
  2023-01-24  6:47                         ` Ammar Faizi
@ 2023-01-24  9:07                         ` H. Peter Anvin
  2023-01-24  9:12                           ` Ammar Faizi
  1 sibling, 1 reply; 88+ messages in thread
From: H. Peter Anvin @ 2023-01-24  9:07 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: x86 Mailing List, Dave Hansen, Dave Hansen, Xin Li,
	Thomas Gleixner, Andrew Cooper, Brian Gerst, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Shuah Khan, Ingo Molnar,
	Andy Lutomirski, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

On January 23, 2023 10:41:20 PM PST, Ammar Faizi <ammarfaizi2@gnuweeb.org> wrote:
>On Mon, Jan 23, 2023 at 10:16:01PM -0800, H. Peter Anvin wrote:
>> On 1/23/23 18:27, Ammar Faizi wrote:
>> > Test that:
>> > 
>> >   - "syscall" in a FRED system doesn't clobber %rcx and %r11.
>> >   - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.
>> > 
>> > Test them out with a trivial system call like __NR_getppid and friends
>> > which are extremely likely to return with SYSRET on an IDT system; check
>> > that it returns a nonnegative value and then save the result.
>> > 
>> 
>> "Nonnegative" here should be "valid"; it is an implementation detail that
>> the error value is -1.
>
>Copy-paste error, will fix that!
>
>> However, you are not checking that you don't get a mix of REGS_SAVED and
>> REGS_SYSRET, which is a major part of the point.
>
>Good point!
>
>What do you think of adding this on top of patch #1?
>
>diff --git a/tools/testing/selftests/x86/sysret_rip.c b/tools/testing/selftests/x86/sysret_rip.c
>index 75c72d34dbc5840c..3da827713831acbc 100644
>--- a/tools/testing/selftests/x86/sysret_rip.c
>+++ b/tools/testing/selftests/x86/sysret_rip.c
>@@ -47,11 +47,14 @@ static const unsigned long rcx_sentinel = 0x5ca1ab1e0b57ac1e;
> static const unsigned long rflags_sentinel = 0x200a93;
> 
> enum regs_ok {
>-	REGS_ERROR  = -1,	/* Invalid register contents */
>-	REGS_SAVED  =  0,	/* Registers properly preserved */
>-	REGS_SYSRET =  1	/* Registers match syscall/sysret */
>+	REGS_INIT_VAL	= -2,	/* For init value checker, never returned */
>+	REGS_ERROR 	= -1,	/* Invalid register contents */
>+	REGS_SAVED 	=  0,	/* Registers properly preserved */
>+	REGS_SYSRET	=  1	/* Registers match syscall/sysret */
> };
> 
>+static enum regs_ok regs_ok_state = REGS_INIT_VAL;
>+
> /*
>  * Returns:
>  *  0 = %rcx and %r11 preserved.
>@@ -86,6 +89,7 @@ static long do_syscall(long nr_syscall, unsigned long arg1, unsigned long arg2,
> 	register unsigned long r9 asm("%r9");
> 	register void *rsp asm("%rsp");
> 	unsigned long rcx, rbx;
>+	enum regs_ok ret;
> 
> 	r11 = r11_sentinel;
> 	rcx = rcx_sentinel;
>@@ -124,7 +128,14 @@ static long do_syscall(long nr_syscall, unsigned long arg1, unsigned long arg2,
> 	 * - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.
> 	 *
> 	 */
>-	assert(check_regs_result(r11, rcx, rbx) != REGS_ERROR);
>+	ret = check_regs_result(r11, rcx, rbx);
>+	assert(ret != REGS_ERROR);
>+
>+	if (regs_ok_state == REGS_INIT_VAL)
>+		regs_ok_state = ret;
>+	else
>+		assert(ret == regs_ok_state);
>+
> 	return nr_syscall;
> }
> 

Works for me, although I would call it REGS_UNDEFINED myself.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v2 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`
  2023-01-24  9:07                         ` H. Peter Anvin
@ 2023-01-24  9:12                           ` Ammar Faizi
  2023-01-24 10:09                             ` [RFC PATCH v3 0/2] selftests/x86: sysret_rip update for FRED system Ammar Faizi
                                               ` (4 more replies)
  0 siblings, 5 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-24  9:12 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: x86 Mailing List, Dave Hansen, Dave Hansen, Xin Li,
	Thomas Gleixner, Andrew Cooper, Brian Gerst, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Shuah Khan, Ingo Molnar,
	Andy Lutomirski, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

On Tue, Jan 24, 2023 at 01:07:38AM -0800, H. Peter Anvin wrote:
> Works for me, although I would call it REGS_UNDEFINED myself.

I'll call it REGS_UNDEFINED too, in v3.

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [RFC PATCH v3 0/2] selftests/x86: sysret_rip update for FRED system
  2023-01-24  9:12                           ` Ammar Faizi
@ 2023-01-24 10:09                             ` Ammar Faizi
  2023-01-24 10:09                               ` [RFC PATCH v3 1/2] selftests/x86: sysret_rip: Handle syscall in a " Ammar Faizi
                                                 ` (2 more replies)
  2023-01-25  3:22                             ` [RFC PATCH v4 0/2] sysret_rip update for the Intel FRED architecture Ammar Faizi
                                               ` (3 subsequent siblings)
  4 siblings, 3 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-24 10:09 UTC (permalink / raw)
  To: H. Peter Anvin, x86 Mailing List
  Cc: Ammar Faizi, Dave Hansen, Dave Hansen, Xin Li, Thomas Gleixner,
	Andrew Cooper, Brian Gerst, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Shuah Khan, Ingo Molnar, Andy Lutomirski,
	Kirill A. Shutemov, Linux Kselftest Mailing List,
	Linux Kernel Mailing List

From: Ammar Faizi <ammarfaizi2@gnuweeb.org>

This is an RFC patchset v3:
sysret_rip test update for Intel FRED architecture. 

Xin Li reported sysret_rip test fails at:

        assert(ctx->uc_mcontext.gregs[REG_EFL] ==
               ctx->uc_mcontext.gregs[REG_R11]);

in a FRED system. Let's handle the FRED system scenario too. The
'syscall' instruction in a FRED system doesn't set %r11=%rflags.

There are two patches in this series.

How to test this:

  $ make -C tools/testing/selftests/x86
  $ tools/testing/selftests/x86/sysret_rip_64

Link: https://lore.kernel.org/lkml/5d4ad3e3-034f-c7da-d141-9c001c2343af@intel.com
Fixes: 660602140103 ("selftests/x86: Add a selftest for SYSRET to noncanonical addresses")
Link: https://lore.kernel.org/lkml/25b96960-a07e-a952-5c23-786b55054126@zytor.com
Reported-by: Xin Li <xin3.li@intel.com>
Co-developed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---

## Changelog v3:

   - Test that we don't get a mix of REGS_SAVED and REGS_SYSRET,
     which is a major part of the point (hpa).

## Changelog v2:

   - Use "+r"(rsp) as the right way to avoid redzone problems
     per Andrew's comment (hpa).
     (Ref: https://lore.kernel.org/lkml/8f5c24df-514d-5d89-f58f-ec8c3eb1e049@zytor.com )

---

Ammar Faizi (2):
  selftests/x86: sysret_rip: Handle syscall in a FRED system
  selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`

 tools/testing/selftests/x86/sysret_rip.c | 120 ++++++++++++++++++++++-
 1 file changed, 119 insertions(+), 1 deletion(-)


base-commit: e12ad468c22065a2826b2fc4c11d2113a7975301
-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [RFC PATCH v3 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-01-24 10:09                             ` [RFC PATCH v3 0/2] selftests/x86: sysret_rip update for FRED system Ammar Faizi
@ 2023-01-24 10:09                               ` Ammar Faizi
  2023-01-24 10:09                               ` [RFC PATCH v3 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11` Ammar Faizi
  2023-01-24 21:32                               ` [RFC PATCH v3 0/2] selftests/x86: sysret_rip update for FRED system Li, Xin3
  2 siblings, 0 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-24 10:09 UTC (permalink / raw)
  To: H. Peter Anvin, x86 Mailing List
  Cc: Ammar Faizi, Dave Hansen, Dave Hansen, Xin Li, Thomas Gleixner,
	Andrew Cooper, Brian Gerst, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Shuah Khan, Ingo Molnar, Andy Lutomirski,
	Kirill A. Shutemov, Linux Kselftest Mailing List,
	Linux Kernel Mailing List

From: Ammar Faizi <ammarfaizi2@gnuweeb.org>

The current selftest asserts %r11 == %rflags after the 'syscall'
returns to user. Such an assertion doesn't apply to a FRED system
because in a FRED system the 'syscall' instruction does not set
%r11=%rflags and %rcx=%rip.

Handle the FRED case. Now, test that:

  - "syscall" in a FRED system doesn't clobber %rcx and %r11.
  - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.

The 'raise()' function from libc can't be used to control those
registers. Therefore, create a syscall wrapper in inline Assembly to
fully control them.

Fixes: 660602140103 ("selftests/x86: Add a selftest for SYSRET to noncanonical addresses")
Link: https://lore.kernel.org/lkml/25b96960-a07e-a952-5c23-786b55054126@zytor.com
Reported-by: Xin Li <xin3.li@intel.com>
Co-developed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 tools/testing/selftests/x86/sysret_rip.c | 111 ++++++++++++++++++++++-
 1 file changed, 110 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/sysret_rip.c b/tools/testing/selftests/x86/sysret_rip.c
index 84d74be1d90207ab..b0d271c19ddd7834 100644
--- a/tools/testing/selftests/x86/sysret_rip.c
+++ b/tools/testing/selftests/x86/sysret_rip.c
@@ -39,6 +39,110 @@ asm (
 extern const char test_page[];
 static void const *current_test_page_addr = test_page;
 
+/* Arbitrary values */
+static const unsigned long r11_sentinel = 0xfeedfacedeadbeef;
+static const unsigned long rcx_sentinel = 0x5ca1ab1e0b57ac1e;
+
+/* An arbitrary *valid* RFLAGS value */
+static const unsigned long rflags_sentinel = 0x200a93;
+
+enum regs_ok {
+	REGS_UNDEFINED	= -2,	/* For init value checker, never returned */
+	REGS_ERROR	= -1,	/* Invalid register contents */
+	REGS_SAVED	=  0,	/* Registers properly preserved */
+	REGS_SYSRET	=  1	/* Registers match syscall/sysret */
+};
+
+static enum regs_ok regs_ok_state = REGS_UNDEFINED;
+
+/*
+ * Returns:
+ *  0 = %rcx and %r11 preserved.
+ *  1 = %rcx and %r11 set to %rflags and %rip.
+ * -1 = %rcx and/or %r11 set to any other values.
+ *
+ * Note that check_regs_syscall() sets %rbx to the syscall return %rip.
+ */
+static enum regs_ok check_regs_result(unsigned long r11, unsigned long rcx,
+				      unsigned long rbx)
+{
+	if (r11 == r11_sentinel && rcx == rcx_sentinel) {
+		return REGS_SAVED;
+	} else if (r11 == rflags_sentinel && rcx == rbx) {
+		return REGS_SYSRET;
+	} else {
+		printf("[FAIL] check_regs_result\n");
+		printf("        r11_sentinel = %#lx; %%r11 = %#lx;\n", r11_sentinel, r11);
+		printf("        rcx_sentinel = %#lx; %%rcx = %#lx;\n", rcx_sentinel, rcx);
+		printf("        rflags_sentinel = %#lx\n", rflags_sentinel);
+		return REGS_ERROR;
+	}
+}
+
+static long do_syscall(long nr_syscall, unsigned long arg1, unsigned long arg2,
+		       unsigned long arg3, unsigned long arg4,
+		       unsigned long arg5, unsigned long arg6)
+{
+	register unsigned long r11 asm("%r11");
+	register unsigned long r10 asm("%r10");
+	register unsigned long r8 asm("%r8");
+	register unsigned long r9 asm("%r9");
+	register void *rsp asm("%rsp");
+	unsigned long rcx, rbx;
+	enum regs_ok ret;
+
+	r11 = r11_sentinel;
+	rcx = rcx_sentinel;
+	r10 = arg4;
+	r8 = arg5;
+	r9 = arg6;
+
+	asm volatile (
+		"pushq	%[rflags_sentinel]\n\t"
+		"popf\n\t"
+		"leaq	1f(%%rip), %[rbx]\n\t"
+		"syscall\n"
+		"1:"
+
+		: "+a" (nr_syscall),
+		  "+r" (r11),
+		  "+c" (rcx),
+		  [rbx] "=b" (rbx),
+		  "+r" (rsp)	/* Clobber the redzone */
+
+		: [rflags_sentinel] "g" (rflags_sentinel),
+		  "D" (arg1),	/* %rdi */
+		  "S" (arg2),	/* %rsi */
+		  "d" (arg3),	/* %rdx */
+		  "r" (r10),
+		  "r" (r8),
+		  "r" (r9)
+
+		: "memory"
+	);
+
+	/*
+	 * Test that:
+	 *
+	 * - "syscall" in a FRED system doesn't clobber %rcx and %r11.
+	 * - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.
+	 *
+	 */
+	ret = check_regs_result(r11, rcx, rbx);
+	assert(ret != REGS_ERROR);
+
+	/*
+	 * Test that we don't get a mix of REGS_SAVED and REGS_SYSRET.
+	 * Need at least 2 times 'syscall' invoked from this function.
+	 */
+	if (regs_ok_state == REGS_UNDEFINED)
+		regs_ok_state = ret;
+	else
+		assert(ret == regs_ok_state);
+
+	return nr_syscall;
+}
+
 static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *),
 		       int flags)
 {
@@ -101,11 +205,16 @@ static void sigusr1(int sig, siginfo_t *info, void *ctx_void)
 	return;
 }
 
+static void __raise(int sig)
+{
+	do_syscall(__NR_kill, getpid(), sig, 0, 0, 0, 0);
+}
+
 static void test_sigreturn_to(unsigned long ip)
 {
 	rip = ip;
 	printf("[RUN]\tsigreturn to 0x%lx\n", ip);
-	raise(SIGUSR1);
+	__raise(SIGUSR1);
 }
 
 static jmp_buf jmpbuf;
-- 
Ammar Faizi


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [RFC PATCH v3 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`
  2023-01-24 10:09                             ` [RFC PATCH v3 0/2] selftests/x86: sysret_rip update for FRED system Ammar Faizi
  2023-01-24 10:09                               ` [RFC PATCH v3 1/2] selftests/x86: sysret_rip: Handle syscall in a " Ammar Faizi
@ 2023-01-24 10:09                               ` Ammar Faizi
  2023-01-24 20:59                                 ` H. Peter Anvin
  2023-01-24 21:32                               ` [RFC PATCH v3 0/2] selftests/x86: sysret_rip update for FRED system Li, Xin3
  2 siblings, 1 reply; 88+ messages in thread
From: Ammar Faizi @ 2023-01-24 10:09 UTC (permalink / raw)
  To: H. Peter Anvin, x86 Mailing List
  Cc: Ammar Faizi, Dave Hansen, Dave Hansen, Xin Li, Thomas Gleixner,
	Andrew Cooper, Brian Gerst, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Shuah Khan, Ingo Molnar, Andy Lutomirski,
	Kirill A. Shutemov, Linux Kselftest Mailing List,
	Linux Kernel Mailing List

From: Ammar Faizi <ammarfaizi2@gnuweeb.org>

Test that:

 - "syscall" in a FRED system doesn't clobber %rcx and %r11.
 - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.

Test them out with a trivial system call like __NR_getppid and friends
which are extremely likely to return with SYSRET on an IDT system.

Link: https://lore.kernel.org/lkml/25b96960-a07e-a952-5c23-786b55054126@zytor.com
Co-developed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 tools/testing/selftests/x86/sysret_rip.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/testing/selftests/x86/sysret_rip.c b/tools/testing/selftests/x86/sysret_rip.c
index b0d271c19ddd7834..bf90fac95a264e2d 100644
--- a/tools/testing/selftests/x86/sysret_rip.c
+++ b/tools/testing/selftests/x86/sysret_rip.c
@@ -267,8 +267,17 @@ static void test_syscall_fallthrough_to(unsigned long ip)
 	printf("[OK]\tWe survived\n");
 }
 
+static void test_syscall_rcx_r11(void)
+{
+	do_syscall(__NR_getpid, 0, 0, 0, 0, 0, 0);
+	do_syscall(__NR_gettid, 0, 0, 0, 0, 0, 0);
+	do_syscall(__NR_getppid, 0, 0, 0, 0, 0, 0);
+}
+
 int main()
 {
+	test_syscall_rcx_r11();
+
 	/*
 	 * When the kernel returns from a slow-path syscall, it will
 	 * detect whether SYSRET is appropriate.  If it incorrectly
-- 
Ammar Faizi


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v3 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`
  2023-01-24 10:09                               ` [RFC PATCH v3 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11` Ammar Faizi
@ 2023-01-24 20:59                                 ` H. Peter Anvin
  2023-01-25  3:29                                   ` Ammar Faizi
  0 siblings, 1 reply; 88+ messages in thread
From: H. Peter Anvin @ 2023-01-24 20:59 UTC (permalink / raw)
  To: Ammar Faizi, x86 Mailing List
  Cc: Dave Hansen, Dave Hansen, Xin Li, Thomas Gleixner, Andrew Cooper,
	Brian Gerst, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Shuah Khan, Ingo Molnar, Andy Lutomirski, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List



On 1/24/23 02:09, Ammar Faizi wrote:
> From: Ammar Faizi <ammarfaizi2@gnuweeb.org>
> 
> Test that:
> 
>   - "syscall" in a FRED system doesn't clobber %rcx and %r11.
>   - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.
> 
> Test them out with a trivial system call like __NR_getppid and friends
> which are extremely likely to return with SYSRET on an IDT system.
> 
> Link: https://lore.kernel.org/lkml/25b96960-a07e-a952-5c23-786b55054126@zytor.com
> Co-developed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>

Add to the description that the purpose of this is to ensure that 
various system calls are *consistent*, as per the comment immediately 
below your code.

	-hpa


^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [RFC PATCH v3 0/2] selftests/x86: sysret_rip update for FRED system
  2023-01-24 10:09                             ` [RFC PATCH v3 0/2] selftests/x86: sysret_rip update for FRED system Ammar Faizi
  2023-01-24 10:09                               ` [RFC PATCH v3 1/2] selftests/x86: sysret_rip: Handle syscall in a " Ammar Faizi
  2023-01-24 10:09                               ` [RFC PATCH v3 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11` Ammar Faizi
@ 2023-01-24 21:32                               ` Li, Xin3
  2023-01-24 21:37                                 ` H. Peter Anvin
  2023-01-24 21:51                                 ` Andrew Cooper
  2 siblings, 2 replies; 88+ messages in thread
From: Li, Xin3 @ 2023-01-24 21:32 UTC (permalink / raw)
  To: Ammar Faizi, H. Peter Anvin, x86 Mailing List
  Cc: Hansen, Dave, Dave Hansen, Thomas Gleixner, andrew.cooper3,
	Brian Gerst, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Shuah Khan, Ingo Molnar, Lutomirski, Andy, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

> From: Ammar Faizi <ammarfaizi2@gnuweeb.org>
> 
> This is an RFC patchset v3:
> sysret_rip test update for Intel FRED architecture.
> 
> Xin Li reported sysret_rip test fails at:
> 
>         assert(ctx->uc_mcontext.gregs[REG_EFL] ==
>                ctx->uc_mcontext.gregs[REG_R11]);

On FRED systems, flags is 0x200a93 and r11 is 0xfeedfacedeadbeef here.

We need to remove or change this assertion, maybe:
  assert(ctx->uc_mcontext.gregs[REG_EFL] == ctx->uc_mcontext.gregs[REG_R11] ||
         r11_sentinel == ctx->uc_mcontext.gregs[REG_R11]);

> 
> in a FRED system. Let's handle the FRED system scenario too. The 'syscall'
> instruction in a FRED system doesn't set %r11=%rflags.
> 
> There are two patches in this series.
> 
> How to test this:
> 
>   $ make -C tools/testing/selftests/x86
>   $ tools/testing/selftests/x86/sysret_rip_64
> 
> Link: https://lore.kernel.org/lkml/5d4ad3e3-034f-c7da-d141-
> 9c001c2343af@intel.com
> Fixes: 660602140103 ("selftests/x86: Add a selftest for SYSRET to noncanonical
> addresses")
> Link: https://lore.kernel.org/lkml/25b96960-a07e-a952-5c23-
> 786b55054126@zytor.com
> Reported-by: Xin Li <xin3.li@intel.com>
> Co-developed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
> ---
> 
> ## Changelog v3:
> 
>    - Test that we don't get a mix of REGS_SAVED and REGS_SYSRET,
>      which is a major part of the point (hpa).
> 
> ## Changelog v2:
> 
>    - Use "+r"(rsp) as the right way to avoid redzone problems
>      per Andrew's comment (hpa).
>      (Ref: https://lore.kernel.org/lkml/8f5c24df-514d-5d89-f58f-
> ec8c3eb1e049@zytor.com )
> 
> ---
> 
> Ammar Faizi (2):
>   selftests/x86: sysret_rip: Handle syscall in a FRED system
>   selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`
> 
>  tools/testing/selftests/x86/sysret_rip.c | 120 ++++++++++++++++++++++-
>  1 file changed, 119 insertions(+), 1 deletion(-)
> 
> 
> base-commit: e12ad468c22065a2826b2fc4c11d2113a7975301
> --
> Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [RFC PATCH v3 0/2] selftests/x86: sysret_rip update for FRED system
  2023-01-24 21:32                               ` [RFC PATCH v3 0/2] selftests/x86: sysret_rip update for FRED system Li, Xin3
@ 2023-01-24 21:37                                 ` H. Peter Anvin
  2023-01-24 23:20                                   ` Li, Xin3
  2023-01-25  3:27                                   ` Ammar Faizi
  2023-01-24 21:51                                 ` Andrew Cooper
  1 sibling, 2 replies; 88+ messages in thread
From: H. Peter Anvin @ 2023-01-24 21:37 UTC (permalink / raw)
  To: Li, Xin3, Ammar Faizi, x86 Mailing List
  Cc: Hansen, Dave, Dave Hansen, Thomas Gleixner, andrew.cooper3,
	Brian Gerst, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Shuah Khan, Ingo Molnar, Lutomirski, Andy, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

On January 24, 2023 1:32:14 PM PST, "Li, Xin3" <xin3.li@intel.com> wrote:
>> From: Ammar Faizi <ammarfaizi2@gnuweeb.org>
>> 
>> This is an RFC patchset v3:
>> sysret_rip test update for Intel FRED architecture.
>> 
>> Xin Li reported sysret_rip test fails at:
>> 
>>         assert(ctx->uc_mcontext.gregs[REG_EFL] ==
>>                ctx->uc_mcontext.gregs[REG_R11]);
>
>On FRED systems, flags is 0x200a93 and r11 is 0xfeedfacedeadbeef here.
>
>We need to remove or change this assertion, maybe:
>  assert(ctx->uc_mcontext.gregs[REG_EFL] == ctx->uc_mcontext.gregs[REG_R11] ||
>         r11_sentinel == ctx->uc_mcontext.gregs[REG_R11]);
>
>> 
>> in a FRED system. Let's handle the FRED system scenario too. The 'syscall'
>> instruction in a FRED system doesn't set %r11=%rflags.
>> 
>> There are two patches in this series.
>> 
>> How to test this:
>> 
>>   $ make -C tools/testing/selftests/x86
>>   $ tools/testing/selftests/x86/sysret_rip_64
>> 
>> Link: https://lore.kernel.org/lkml/5d4ad3e3-034f-c7da-d141-
>> 9c001c2343af@intel.com
>> Fixes: 660602140103 ("selftests/x86: Add a selftest for SYSRET to noncanonical
>> addresses")
>> Link: https://lore.kernel.org/lkml/25b96960-a07e-a952-5c23-
>> 786b55054126@zytor.com
>> Reported-by: Xin Li <xin3.li@intel.com>
>> Co-developed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
>> Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
>> Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
>> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
>> ---
>> 
>> ## Changelog v3:
>> 
>>    - Test that we don't get a mix of REGS_SAVED and REGS_SYSRET,
>>      which is a major part of the point (hpa).
>> 
>> ## Changelog v2:
>> 
>>    - Use "+r"(rsp) as the right way to avoid redzone problems
>>      per Andrew's comment (hpa).
>>      (Ref: https://lore.kernel.org/lkml/8f5c24df-514d-5d89-f58f-
>> ec8c3eb1e049@zytor.com )
>> 
>> ---
>> 
>> Ammar Faizi (2):
>>   selftests/x86: sysret_rip: Handle syscall in a FRED system
>>   selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`
>> 
>>  tools/testing/selftests/x86/sysret_rip.c | 120 ++++++++++++++++++++++-
>>  1 file changed, 119 insertions(+), 1 deletion(-)
>> 
>> 
>> base-commit: e12ad468c22065a2826b2fc4c11d2113a7975301
>> --
>> Ammar Faizi
>
>

This should use check_regs_result() – which is exactly the reason I made that a separate function.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v3 0/2] selftests/x86: sysret_rip update for FRED system
  2023-01-24 21:32                               ` [RFC PATCH v3 0/2] selftests/x86: sysret_rip update for FRED system Li, Xin3
  2023-01-24 21:37                                 ` H. Peter Anvin
@ 2023-01-24 21:51                                 ` Andrew Cooper
  2023-01-24 23:58                                   ` Li, Xin3
  1 sibling, 1 reply; 88+ messages in thread
From: Andrew Cooper @ 2023-01-24 21:51 UTC (permalink / raw)
  To: Li, Xin3, Ammar Faizi, H. Peter Anvin, x86 Mailing List
  Cc: Hansen, Dave, Dave Hansen, Thomas Gleixner, Brian Gerst,
	Ingo Molnar, Borislav Petkov, Peter Zijlstra, Shuah Khan,
	Ingo Molnar, Lutomirski, Andy, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List,
	Andrew Cooper

On 24/01/2023 9:32 pm, Li, Xin3 wrote:
>> From: Ammar Faizi <ammarfaizi2@gnuweeb.org>
>>
>> This is an RFC patchset v3:
>> sysret_rip test update for Intel FRED architecture.
>>
>> Xin Li reported sysret_rip test fails at:
>>
>>         assert(ctx->uc_mcontext.gregs[REG_EFL] ==
>>                ctx->uc_mcontext.gregs[REG_R11]);
> On FRED systems, flags is 0x200a93 and r11 is 0xfeedfacedeadbeef here.

Is this under SIMICS, or elsewhere?  It's doubly weird that flags/%r11
match, but %rip/%rcx don't.

~Andrew

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [RFC PATCH v3 0/2] selftests/x86: sysret_rip update for FRED system
  2023-01-24 21:37                                 ` H. Peter Anvin
@ 2023-01-24 23:20                                   ` Li, Xin3
  2023-01-25  3:27                                   ` Ammar Faizi
  1 sibling, 0 replies; 88+ messages in thread
From: Li, Xin3 @ 2023-01-24 23:20 UTC (permalink / raw)
  To: H. Peter Anvin, Ammar Faizi, x86 Mailing List
  Cc: Hansen, Dave, Dave Hansen, Thomas Gleixner, andrew.cooper3,
	Brian Gerst, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Shuah Khan, Ingo Molnar, Lutomirski, Andy, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

> >>
> >> Xin Li reported sysret_rip test fails at:
> >>
> >>         assert(ctx->uc_mcontext.gregs[REG_EFL] ==
> >>                ctx->uc_mcontext.gregs[REG_R11]);
> >
> >On FRED systems, flags is 0x200a93 and r11 is 0xfeedfacedeadbeef here.
> >
> >We need to remove or change this assertion, maybe:
> >  assert(ctx->uc_mcontext.gregs[REG_EFL] == ctx->uc_mcontext.gregs[REG_R11] ||
> >         r11_sentinel == ctx->uc_mcontext.gregs[REG_R11]);
> >
> >>
> >> in a FRED system. Let's handle the FRED system scenario too. The 'syscall'
> >> instruction in a FRED system doesn't set %r11=%rflags.
> >>
> 
> This should use check_regs_result() – which is exactly the reason I made that a
> separate function.

Exactly.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [RFC PATCH v3 0/2] selftests/x86: sysret_rip update for FRED system
  2023-01-24 21:51                                 ` Andrew Cooper
@ 2023-01-24 23:58                                   ` Li, Xin3
  0 siblings, 0 replies; 88+ messages in thread
From: Li, Xin3 @ 2023-01-24 23:58 UTC (permalink / raw)
  To: andrew.cooper3, Ammar Faizi, H. Peter Anvin, x86 Mailing List
  Cc: Hansen, Dave, Dave Hansen, Thomas Gleixner, Brian Gerst,
	Ingo Molnar, Borislav Petkov, Peter Zijlstra, Shuah Khan,
	Ingo Molnar, Lutomirski, Andy, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List,
	andrew.cooper3

> >>         assert(ctx->uc_mcontext.gregs[REG_EFL] ==
> >>                ctx->uc_mcontext.gregs[REG_R11]);
> > On FRED systems, flags is 0x200a93 and r11 is 0xfeedfacedeadbeef here.
> 
> Is this under SIMICS, or elsewhere?  It's doubly weird that flags/%r11 match, but
> %rip/%rcx don't.

This is on Intel Simics, with FRED enabled.

Flags and %r11 don’t match on FRED, and %rip and %rcx also don't match.

I think it's expected.

Xin

^ permalink raw reply	[flat|nested] 88+ messages in thread

* [RFC PATCH v4 0/2] sysret_rip update for the Intel FRED architecture
  2023-01-24  9:12                           ` Ammar Faizi
  2023-01-24 10:09                             ` [RFC PATCH v3 0/2] selftests/x86: sysret_rip update for FRED system Ammar Faizi
@ 2023-01-25  3:22                             ` Ammar Faizi
  2023-01-25  3:22                               ` [RFC PATCH v4 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system Ammar Faizi
  2023-01-25  3:22                               ` [RFC PATCH v4 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11` Ammar Faizi
  2023-01-25  3:49                             ` [RFC PATCH v5 0/2] sysret_rip update for the Intel FRED architecture Ammar Faizi
                                               ` (2 subsequent siblings)
  4 siblings, 2 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25  3:22 UTC (permalink / raw)
  To: H. Peter Anvin, Xin Li
  Cc: Ammar Faizi, Dave Hansen, Dave Hansen, Thomas Gleixner,
	Andrew Cooper, Brian Gerst, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Shuah Khan, Ingo Molnar, Andy Lutomirski,
	x86 Mailing List, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

From: Ammar Faizi <ammarfaizi2@gnuweeb.org>

This is an RFC patchset v4. There are two patches in this series.

Xin Li reported that the sysret_rip test fails at:

        assert(ctx->uc_mcontext.gregs[REG_EFL] ==
               ctx->uc_mcontext.gregs[REG_R11]);

on the Intel FRED architecture. Let's handle the FRED system scenario
too. The 'syscall' instruction in a FRED system doesn't set %rcx=%rip
and %r11=%rflags.

Syscall and sysenter in a FRED system are treated equivalently to
software interrupts, e.g. INT 0x80. They do not modify any registers.

Link: https://lore.kernel.org/lkml/5d4ad3e3-034f-c7da-d141-9c001c2343af@intel.com

#### Changelog v4:

   - Fix the assertion condition inside the SIGUSR1 handler (Xin Li).

   - Explain the purpose of patch #2 in the commit message (HPA).

   - Update commit message (Ammar).

   - Repeat test_syscall_rcx_r11_consistent() 32 times to be more sure
     that the result is really consistent (Ammar).

#### Changelog v3:

   - Test that we don't get a mix of REGS_SAVED and REGS_SYSRET,
     which is a major part of the point (HPA).

#### Changelog v2:

   - Use "+r"(rsp) as the right way to avoid redzone problems
     per Andrew's comment (HPA).


Co-developed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---

Ammar Faizi (2):
  selftests/x86: sysret_rip: Handle syscall in a FRED system
  selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`

 tools/testing/selftests/x86/sysret_rip.c | 147 +++++++++++++++++++++--
 1 file changed, 138 insertions(+), 9 deletions(-)


base-commit: e12ad468c22065a2826b2fc4c11d2113a7975301
-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [RFC PATCH v4 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-01-25  3:22                             ` [RFC PATCH v4 0/2] sysret_rip update for the Intel FRED architecture Ammar Faizi
@ 2023-01-25  3:22                               ` Ammar Faizi
  2023-01-25  3:37                                 ` Ammar Faizi
  2023-01-25  3:22                               ` [RFC PATCH v4 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11` Ammar Faizi
  1 sibling, 1 reply; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25  3:22 UTC (permalink / raw)
  To: H. Peter Anvin, Xin Li
  Cc: Ammar Faizi, Dave Hansen, Dave Hansen, Thomas Gleixner,
	Andrew Cooper, Brian Gerst, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Shuah Khan, Ingo Molnar, Andy Lutomirski,
	x86 Mailing List, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

From: Ammar Faizi <ammarfaizi2@gnuweeb.org>

The current selftest asserts (%r11 == %rflags) after the 'syscall'
returns to user. Such an assertion doesn't apply to the FRED system
because in that system the 'syscall' instruction does not set
%r11=%rflags and %rcx=%rip.

Handle the FRED case. Now, test that:

  - "syscall" in a FRED system doesn't clobber %rcx and %r11.
  - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.

The 'raise()' function from libc can't be used to control those
registers. Therefore, create a syscall wrapper in inline Assembly to
fully control them.

Fixes: 660602140103 ("selftests/x86: Add a selftest for SYSRET to noncanonical addresses")
Link: https://lore.kernel.org/lkml/25b96960-a07e-a952-5c23-786b55054126@zytor.com
Reported-by: Xin Li <xin3.li@intel.com>
Co-developed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 tools/testing/selftests/x86/sysret_rip.c | 127 +++++++++++++++++++++--
 1 file changed, 120 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/x86/sysret_rip.c b/tools/testing/selftests/x86/sysret_rip.c
index 84d74be1d90207ab..86a31bbac9a85a88 100644
--- a/tools/testing/selftests/x86/sysret_rip.c
+++ b/tools/testing/selftests/x86/sysret_rip.c
@@ -39,6 +39,113 @@ asm (
 extern const char test_page[];
 static void const *current_test_page_addr = test_page;
 
+/* Arbitrary values */
+static const unsigned long r11_sentinel = 0xfeedfacedeadbeef;
+static const unsigned long rcx_sentinel = 0x5ca1ab1e0b57ac1e;
+
+/* An arbitrary *valid* RFLAGS value */
+static const unsigned long rflags_sentinel = 0x200a93;
+
+enum regs_ok {
+	REGS_UNDEFINED	= -2,	/* For consistency checker init, never returned */
+	REGS_ERROR	= -1,	/* Invalid register contents */
+	REGS_SAVED	=  0,	/* Registers properly preserved */
+	REGS_SYSRET	=  1	/* Registers match syscall/sysret */
+};
+
+/*
+ * Returns:
+ *  0 = %rcx and %r11 preserved.
+ *  1 = %rcx and %r11 set to %rflags and %rip.
+ * -1 = %rcx and/or %r11 set to any other values.
+ *
+ * @rbx should be set to the syscall return %rip.
+ */
+static enum regs_ok check_regs_result(unsigned long r11, unsigned long rcx,
+				      unsigned long rbx)
+{
+	if (r11 == r11_sentinel && rcx == rcx_sentinel)
+		return REGS_SAVED;
+
+	if (r11 == rflags_sentinel && rcx == rbx)
+		return REGS_SYSRET;
+
+	printf("[FAIL] check_regs_result\n");
+	printf("        r11_sentinel = %#lx; %%r11 = %#lx;\n", r11_sentinel, r11);
+	printf("        rcx_sentinel = %#lx; %%rcx = %#lx;\n", rcx_sentinel, rcx);
+	printf("        rflags_sentinel = %#lx\n", rflags_sentinel);
+	return REGS_ERROR;
+}
+
+static long do_syscall(long nr_syscall, unsigned long arg1, unsigned long arg2,
+		       unsigned long arg3, unsigned long arg4,
+		       unsigned long arg5, unsigned long arg6)
+{
+	static enum regs_ok regs_ok_state = REGS_UNDEFINED;
+	register unsigned long r11 asm("%r11");
+	register unsigned long r10 asm("%r10");
+	register unsigned long r8 asm("%r8");
+	register unsigned long r9 asm("%r9");
+	register void *rsp asm("%rsp");
+	unsigned long rcx, rbx;
+	enum regs_ok ret;
+
+	r11 = r11_sentinel;
+	rcx = rcx_sentinel;
+	r10 = arg4;
+	r8 = arg5;
+	r9 = arg6;
+
+	asm volatile (
+		"pushq	%[rflags_sentinel]\n\t"
+		"popf\n\t"
+		"leaq	1f(%%rip), %[rbx]\n\t"
+		"syscall\n"
+		"1:"
+
+		: "+a" (nr_syscall),
+		  "+r" (r11),
+		  "+c" (rcx),
+		  [rbx] "=b" (rbx),
+		  "+r" (rsp)	/* Clobber the redzone */
+
+		: [rflags_sentinel] "g" (rflags_sentinel),
+		  "D" (arg1),	/* %rdi */
+		  "S" (arg2),	/* %rsi */
+		  "d" (arg3),	/* %rdx */
+		  "r" (r10),
+		  "r" (r8),
+		  "r" (r9)
+
+		: "memory"
+	);
+
+	/*
+	 * Test that:
+	 *
+	 * - "syscall" in a FRED system doesn't clobber %rcx and %r11.
+	 * - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.
+	 */
+	ret = check_regs_result(r11, rcx, rbx);
+	assert(ret != REGS_ERROR);
+
+	/*
+	 * Test that we don't get a mix of REGS_SAVED and REGS_SYSRET.
+	 * It needs at least calling do_syscall() twice to assert.
+	 */
+	if (regs_ok_state == REGS_UNDEFINED) {
+		/*
+		 * First time calling do_syscall().
+		 */
+		regs_ok_state = ret;
+		return ret;
+	} else {
+		assert(regs_ok_state == ret);
+	}
+
+	return nr_syscall;
+}
+
 static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *),
 		       int flags)
 {
@@ -85,27 +192,33 @@ static void sigsegv_for_sigreturn_test(int sig, siginfo_t *info, void *ctx_void)
 static void sigusr1(int sig, siginfo_t *info, void *ctx_void)
 {
 	ucontext_t *ctx = (ucontext_t*)ctx_void;
+	enum regs_ok ret;
 
 	memcpy(&initial_regs, &ctx->uc_mcontext.gregs, sizeof(gregset_t));
 
+	ret = check_regs_result(ctx->uc_mcontext.gregs[REG_R11],
+				ctx->uc_mcontext.gregs[REG_RCX],
+				ctx->uc_mcontext.gregs[REG_RBX]);
+
+	assert(ret != REGS_ERROR);
+
 	/* Set IP and CX to match so that SYSRET can happen. */
 	ctx->uc_mcontext.gregs[REG_RIP] = rip;
 	ctx->uc_mcontext.gregs[REG_RCX] = rip;
-
-	/* R11 and EFLAGS should already match. */
-	assert(ctx->uc_mcontext.gregs[REG_EFL] ==
-	       ctx->uc_mcontext.gregs[REG_R11]);
-
 	sethandler(SIGSEGV, sigsegv_for_sigreturn_test, SA_RESETHAND);
 
-	return;
+}
+
+static void __raise(int sig)
+{
+	do_syscall(__NR_kill, getpid(), sig, 0, 0, 0, 0);
 }
 
 static void test_sigreturn_to(unsigned long ip)
 {
 	rip = ip;
 	printf("[RUN]\tsigreturn to 0x%lx\n", ip);
-	raise(SIGUSR1);
+	__raise(SIGUSR1);
 }
 
 static jmp_buf jmpbuf;
-- 
Ammar Faizi


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [RFC PATCH v4 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`
  2023-01-25  3:22                             ` [RFC PATCH v4 0/2] sysret_rip update for the Intel FRED architecture Ammar Faizi
  2023-01-25  3:22                               ` [RFC PATCH v4 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system Ammar Faizi
@ 2023-01-25  3:22                               ` Ammar Faizi
  1 sibling, 0 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25  3:22 UTC (permalink / raw)
  To: H. Peter Anvin, Xin Li
  Cc: Ammar Faizi, Dave Hansen, Dave Hansen, Thomas Gleixner,
	Andrew Cooper, Brian Gerst, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Shuah Khan, Ingo Molnar, Andy Lutomirski,
	x86 Mailing List, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

From: Ammar Faizi <ammarfaizi2@gnuweeb.org>

Test that:

  - REGS_SAVED: "syscall" in a FRED system doesn't clobber %rcx and
    %r11.

  - REGS_SYSRET: "syscall" in a non-FRED system sets %rcx=%rip and
    %r11=%rflags.

Test them out with trivial system calls like __NR_getppid and friends
which are extremely likely to return with SYSRET on an IDT system.

Goals of this test:

  - Ensure that the syscall behavior is consistent. It should be either
    always REGS_SAVED or always REGS_SYSRET. Not a mix of them.

  - Ensure that the kernel doesn't leak its internal data when
    returning to userspace.

Link: https://lore.kernel.org/lkml/25b96960-a07e-a952-5c23-786b55054126@zytor.com
Co-developed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 tools/testing/selftests/x86/sysret_rip.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/x86/sysret_rip.c b/tools/testing/selftests/x86/sysret_rip.c
index 86a31bbac9a85a88..5b8576147c92dbbd 100644
--- a/tools/testing/selftests/x86/sysret_rip.c
+++ b/tools/testing/selftests/x86/sysret_rip.c
@@ -271,8 +271,24 @@ static void test_syscall_fallthrough_to(unsigned long ip)
 	printf("[OK]\tWe survived\n");
 }
 
+/*
+ * Ensure that various system calls are consistent.
+ * We should not get a mix of REGS_SAVED and REGS_SYSRET.
+ */
+static void test_syscall_rcx_r11_consistent(void)
+{
+	do_syscall(__NR_getpid, 0, 0, 0, 0, 0, 0);
+	do_syscall(__NR_gettid, 0, 0, 0, 0, 0, 0);
+	do_syscall(__NR_getppid, 0, 0, 0, 0, 0, 0);
+}
+
 int main()
 {
+	int i;
+
+	for (i = 0; i < 32; i++)
+		test_syscall_rcx_r11_consistent();
+
 	/*
 	 * When the kernel returns from a slow-path syscall, it will
 	 * detect whether SYSRET is appropriate.  If it incorrectly
@@ -280,7 +296,7 @@ int main()
 	 * it'll crash on Intel CPUs.
 	 */
 	sethandler(SIGUSR1, sigusr1, 0);
-	for (int i = 47; i < 64; i++)
+	for (i = 47; i < 64; i++)
 		test_sigreturn_to(1UL<<i);
 
 	clearhandler(SIGUSR1);
@@ -291,7 +307,7 @@ int main()
 	test_syscall_fallthrough_to((1UL << 47) - 2*PAGE_SIZE);
 
 	/* These are the interesting cases. */
-	for (int i = 47; i < 64; i++) {
+	for (i = 47; i < 64; i++) {
 		test_syscall_fallthrough_to((1UL<<i) - PAGE_SIZE);
 		test_syscall_fallthrough_to(1UL<<i);
 	}
-- 
Ammar Faizi


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v3 0/2] selftests/x86: sysret_rip update for FRED system
  2023-01-24 21:37                                 ` H. Peter Anvin
  2023-01-24 23:20                                   ` Li, Xin3
@ 2023-01-25  3:27                                   ` Ammar Faizi
  1 sibling, 0 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25  3:27 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Li, Xin3, x86 Mailing List, Hansen, Dave, Dave Hansen,
	Thomas Gleixner, andrew.cooper3, Brian Gerst, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Shuah Khan, Ingo Molnar,
	Lutomirski, Andy, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

On Tue, Jan 24, 2023 at 01:37:43PM -0800, H. Peter Anvin wrote:
> On January 24, 2023 1:32:14 PM PST, "Li, Xin3" <xin3.li@intel.com> wrote:
> >> From: Ammar Faizi <ammarfaizi2@gnuweeb.org>
> >> 
> >> This is an RFC patchset v3:
> >> sysret_rip test update for Intel FRED architecture.
> >> 
> >> Xin Li reported sysret_rip test fails at:
> >> 
> >>         assert(ctx->uc_mcontext.gregs[REG_EFL] ==
> >>                ctx->uc_mcontext.gregs[REG_R11]);
> >
> >On FRED systems, flags is 0x200a93 and r11 is 0xfeedfacedeadbeef here.
> >
> >We need to remove or change this assertion, maybe:
> >  assert(ctx->uc_mcontext.gregs[REG_EFL] == ctx->uc_mcontext.gregs[REG_R11] ||
> >         r11_sentinel == ctx->uc_mcontext.gregs[REG_R11]);
> >
>
> This should use check_regs_result() – which is exactly the reason I made that a separate function.

Fixed in v4.

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v3 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`
  2023-01-24 20:59                                 ` H. Peter Anvin
@ 2023-01-25  3:29                                   ` Ammar Faizi
  0 siblings, 0 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25  3:29 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: x86 Mailing List, Dave Hansen, Dave Hansen, Xin Li,
	Thomas Gleixner, Andrew Cooper, Brian Gerst, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Shuah Khan, Ingo Molnar,
	Andy Lutomirski, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

On Tue, Jan 24, 2023 at 12:59:23PM -0800, H. Peter Anvin wrote:
> 
> 
> On 1/24/23 02:09, Ammar Faizi wrote:
> > From: Ammar Faizi <ammarfaizi2@gnuweeb.org>
> > 
> > Test that:
> > 
> >   - "syscall" in a FRED system doesn't clobber %rcx and %r11.
> >   - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.
> > 
> > Test them out with a trivial system call like __NR_getppid and friends
> > which are extremely likely to return with SYSRET on an IDT system.
> > 
> > Link: https://lore.kernel.org/lkml/25b96960-a07e-a952-5c23-786b55054126@zytor.com
> > Co-developed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> > Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> > Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> > Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
> 
> Add to the description that the purpose of this is to ensure that various
> system calls are *consistent*, as per the comment immediately below your
> code.

Added in v4.

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v4 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-01-25  3:22                               ` [RFC PATCH v4 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system Ammar Faizi
@ 2023-01-25  3:37                                 ` Ammar Faizi
  2023-01-25  3:44                                   ` Ammar Faizi
  0 siblings, 1 reply; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25  3:37 UTC (permalink / raw)
  To: H. Peter Anvin, Xin Li
  Cc: Dave Hansen, Dave Hansen, Thomas Gleixner, Andrew Cooper,
	Brian Gerst, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Shuah Khan, Ingo Molnar, Andy Lutomirski, x86 Mailing List,
	Kirill A. Shutemov, Linux Kselftest Mailing List,
	Linux Kernel Mailing List

On Wed, Jan 25, 2023 at 10:22:39AM +0700, Ammar Faizi wrote:
> +	/*
> +	 * Test that we don't get a mix of REGS_SAVED and REGS_SYSRET.
> +	 * It needs at least calling do_syscall() twice to assert.
> +	 */
> +	if (regs_ok_state == REGS_UNDEFINED) {
> +		/*
> +		 * First time calling do_syscall().
> +		 */
> +		regs_ok_state = ret;
> +		return ret;
> +	} else {

Oops, this should not be returning ret here. Ignore this version.
I'll send v5.

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v4 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-01-25  3:37                                 ` Ammar Faizi
@ 2023-01-25  3:44                                   ` Ammar Faizi
  0 siblings, 0 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25  3:44 UTC (permalink / raw)
  To: H. Peter Anvin, Xin Li
  Cc: Dave Hansen, Dave Hansen, Thomas Gleixner, Andrew Cooper,
	Brian Gerst, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Shuah Khan, Ingo Molnar, Andy Lutomirski, x86 Mailing List,
	Kirill A. Shutemov, Linux Kselftest Mailing List,
	Linux Kernel Mailing List

On Wed, Jan 25, 2023 at 10:37:17AM +0700, Ammar Faizi wrote:
> On Wed, Jan 25, 2023 at 10:22:39AM +0700, Ammar Faizi wrote:
> > +	/*
> > +	 * Test that we don't get a mix of REGS_SAVED and REGS_SYSRET.
> > +	 * It needs at least calling do_syscall() twice to assert.
> > +	 */
> > +	if (regs_ok_state == REGS_UNDEFINED) {
> > +		/*
> > +		 * First time calling do_syscall().
> > +		 */
> > +		regs_ok_state = ret;
> > +		return ret;
> > +	} else {
> 
> Oops, this should not be returning ret here. Ignore this version.
> I'll send v5.

Side note: This mistake doesn't affect the test correctness because
we currenly don't use the return value of do_syscall().

But still worth fixing...

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [RFC PATCH v5 0/2] sysret_rip update for the Intel FRED architecture
  2023-01-24  9:12                           ` Ammar Faizi
  2023-01-24 10:09                             ` [RFC PATCH v3 0/2] selftests/x86: sysret_rip update for FRED system Ammar Faizi
  2023-01-25  3:22                             ` [RFC PATCH v4 0/2] sysret_rip update for the Intel FRED architecture Ammar Faizi
@ 2023-01-25  3:49                             ` Ammar Faizi
  2023-01-25  3:49                               ` [RFC PATCH v5 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system Ammar Faizi
                                                 ` (2 more replies)
  2023-01-25 21:17                             ` [RFC PATCH v6 0/3] " Ammar Faizi
  2023-01-25 23:24                             ` [RFC PATCH v7 0/3] sysret_rip update for the Intel FRED architecture Ammar Faizi
  4 siblings, 3 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25  3:49 UTC (permalink / raw)
  To: H. Peter Anvin, Xin Li
  Cc: Ammar Faizi, Dave Hansen, Dave Hansen, Thomas Gleixner,
	Andrew Cooper, Brian Gerst, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Shuah Khan, Ingo Molnar, Andy Lutomirski,
	x86 Mailing List, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

From: Ammar Faizi <ammarfaizi2@gnuweeb.org>

This is an RFC patchset v5. There are two patches in this series.

Xin Li reported that the sysret_rip test fails at:

        assert(ctx->uc_mcontext.gregs[REG_EFL] ==
               ctx->uc_mcontext.gregs[REG_R11]);

on the Intel FRED architecture. Let's handle the FRED system scenario
too. The 'syscall' instruction in a FRED system doesn't set %rcx=%rip
and %r11=%rflags.

Syscall and sysenter in a FRED system are treated equivalently to
software interrupts, e.g. INT 0x80. They do not modify any registers.

Link: https://lore.kernel.org/lkml/5d4ad3e3-034f-c7da-d141-9c001c2343af@intel.com

#### Changelog v5:

   - Fix do_syscall() return value (Ammar).

#### Changelog v4:

   - Fix the assertion condition inside the SIGUSR1 handler (Xin Li).

   - Explain the purpose of patch #2 in the commit message (HPA).

   - Update commit message (Ammar).

   - Repeat test_syscall_rcx_r11_consistent() 32 times to be more sure
     that the result is really consistent (Ammar).

#### Changelog v3:

   - Test that we don't get a mix of REGS_SAVED and REGS_SYSRET,
     which is a major part of the point (HPA).

#### Changelog v2:

   - Use "+r"(rsp) as the right way to avoid redzone problems
     per Andrew's comment (HPA).


Co-developed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---

Ammar Faizi (2):
  selftests/x86: sysret_rip: Handle syscall in a FRED system
  selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`

 tools/testing/selftests/x86/sysret_rip.c | 146 +++++++++++++++++++++--
 1 file changed, 137 insertions(+), 9 deletions(-)


base-commit: e12ad468c22065a2826b2fc4c11d2113a7975301
-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [RFC PATCH v5 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-01-25  3:49                             ` [RFC PATCH v5 0/2] sysret_rip update for the Intel FRED architecture Ammar Faizi
@ 2023-01-25  3:49                               ` Ammar Faizi
  2023-01-25  8:39                                 ` H. Peter Anvin
  2023-01-25  3:49                               ` [RFC PATCH v5 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11` Ammar Faizi
  2023-01-25  8:22                               ` [RFC PATCH v5 0/2] sysret_rip update for the Intel FRED architecture Li, Xin3
  2 siblings, 1 reply; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25  3:49 UTC (permalink / raw)
  To: H. Peter Anvin, Xin Li
  Cc: Ammar Faizi, Dave Hansen, Dave Hansen, Thomas Gleixner,
	Andrew Cooper, Brian Gerst, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Shuah Khan, Ingo Molnar, Andy Lutomirski,
	x86 Mailing List, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

From: Ammar Faizi <ammarfaizi2@gnuweeb.org>

The current selftest asserts (%r11 == %rflags) after the 'syscall'
returns to user. Such an assertion doesn't apply to the FRED system
because in that system the 'syscall' instruction does not set
%r11=%rflags and %rcx=%rip.

Handle the FRED case. Now, test that:

  - "syscall" in a FRED system doesn't clobber %rcx and %r11.
  - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.

The 'raise()' function from libc can't be used to control those
registers. Therefore, create a syscall wrapper in inline Assembly to
fully control them.

Fixes: 660602140103 ("selftests/x86: Add a selftest for SYSRET to noncanonical addresses")
Link: https://lore.kernel.org/lkml/25b96960-a07e-a952-5c23-786b55054126@zytor.com
Reported-by: Xin Li <xin3.li@intel.com>
Co-developed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 tools/testing/selftests/x86/sysret_rip.c | 126 +++++++++++++++++++++--
 1 file changed, 119 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/x86/sysret_rip.c b/tools/testing/selftests/x86/sysret_rip.c
index 84d74be1d90207ab..d3dc5ec496854179 100644
--- a/tools/testing/selftests/x86/sysret_rip.c
+++ b/tools/testing/selftests/x86/sysret_rip.c
@@ -39,6 +39,112 @@ asm (
 extern const char test_page[];
 static void const *current_test_page_addr = test_page;
 
+/* Arbitrary values */
+static const unsigned long r11_sentinel = 0xfeedfacedeadbeef;
+static const unsigned long rcx_sentinel = 0x5ca1ab1e0b57ac1e;
+
+/* An arbitrary *valid* RFLAGS value */
+static const unsigned long rflags_sentinel = 0x200a93;
+
+enum regs_ok {
+	REGS_UNDEFINED	= -2,	/* For consistency checker init, never returned */
+	REGS_ERROR	= -1,	/* Invalid register contents */
+	REGS_SAVED	=  0,	/* Registers properly preserved */
+	REGS_SYSRET	=  1	/* Registers match syscall/sysret */
+};
+
+/*
+ * Returns:
+ *  0 = %rcx and %r11 preserved.
+ *  1 = %rcx and %r11 set to %rflags and %rip.
+ * -1 = %rcx and/or %r11 set to any other values.
+ *
+ * @rbx should be set to the syscall return %rip.
+ */
+static enum regs_ok check_regs_result(unsigned long r11, unsigned long rcx,
+				      unsigned long rbx)
+{
+	if (r11 == r11_sentinel && rcx == rcx_sentinel)
+		return REGS_SAVED;
+
+	if (r11 == rflags_sentinel && rcx == rbx)
+		return REGS_SYSRET;
+
+	printf("[FAIL] check_regs_result\n");
+	printf("        r11_sentinel = %#lx; %%r11 = %#lx;\n", r11_sentinel, r11);
+	printf("        rcx_sentinel = %#lx; %%rcx = %#lx;\n", rcx_sentinel, rcx);
+	printf("        rflags_sentinel = %#lx\n", rflags_sentinel);
+	return REGS_ERROR;
+}
+
+static long do_syscall(long nr_syscall, unsigned long arg1, unsigned long arg2,
+		       unsigned long arg3, unsigned long arg4,
+		       unsigned long arg5, unsigned long arg6)
+{
+	static enum regs_ok regs_ok_state = REGS_UNDEFINED;
+	register unsigned long r11 asm("%r11");
+	register unsigned long r10 asm("%r10");
+	register unsigned long r8 asm("%r8");
+	register unsigned long r9 asm("%r9");
+	register void *rsp asm("%rsp");
+	unsigned long rcx, rbx;
+	enum regs_ok ret;
+
+	r11 = r11_sentinel;
+	rcx = rcx_sentinel;
+	r10 = arg4;
+	r8 = arg5;
+	r9 = arg6;
+
+	asm volatile (
+		"pushq	%[rflags_sentinel]\n\t"
+		"popf\n\t"
+		"leaq	1f(%%rip), %[rbx]\n\t"
+		"syscall\n"
+		"1:"
+
+		: "+a" (nr_syscall),
+		  "+r" (r11),
+		  "+c" (rcx),
+		  [rbx] "=b" (rbx),
+		  "+r" (rsp)	/* Clobber the redzone */
+
+		: [rflags_sentinel] "g" (rflags_sentinel),
+		  "D" (arg1),	/* %rdi */
+		  "S" (arg2),	/* %rsi */
+		  "d" (arg3),	/* %rdx */
+		  "r" (r10),
+		  "r" (r8),
+		  "r" (r9)
+
+		: "memory"
+	);
+
+	/*
+	 * Test that:
+	 *
+	 * - "syscall" in a FRED system doesn't clobber %rcx and %r11.
+	 * - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.
+	 */
+	ret = check_regs_result(r11, rcx, rbx);
+	assert(ret != REGS_ERROR);
+
+	/*
+	 * Test that we don't get a mix of REGS_SAVED and REGS_SYSRET.
+	 * It needs at least calling do_syscall() twice to assert.
+	 */
+	if (regs_ok_state == REGS_UNDEFINED) {
+		/*
+		 * First time calling do_syscall().
+		 */
+		regs_ok_state = ret;
+	} else {
+		assert(regs_ok_state == ret);
+	}
+
+	return nr_syscall;
+}
+
 static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *),
 		       int flags)
 {
@@ -85,27 +191,33 @@ static void sigsegv_for_sigreturn_test(int sig, siginfo_t *info, void *ctx_void)
 static void sigusr1(int sig, siginfo_t *info, void *ctx_void)
 {
 	ucontext_t *ctx = (ucontext_t*)ctx_void;
+	enum regs_ok ret;
 
 	memcpy(&initial_regs, &ctx->uc_mcontext.gregs, sizeof(gregset_t));
 
+	ret = check_regs_result(ctx->uc_mcontext.gregs[REG_R11],
+				ctx->uc_mcontext.gregs[REG_RCX],
+				ctx->uc_mcontext.gregs[REG_RBX]);
+
+	assert(ret != REGS_ERROR);
+
 	/* Set IP and CX to match so that SYSRET can happen. */
 	ctx->uc_mcontext.gregs[REG_RIP] = rip;
 	ctx->uc_mcontext.gregs[REG_RCX] = rip;
-
-	/* R11 and EFLAGS should already match. */
-	assert(ctx->uc_mcontext.gregs[REG_EFL] ==
-	       ctx->uc_mcontext.gregs[REG_R11]);
-
 	sethandler(SIGSEGV, sigsegv_for_sigreturn_test, SA_RESETHAND);
 
-	return;
+}
+
+static void __raise(int sig)
+{
+	do_syscall(__NR_kill, getpid(), sig, 0, 0, 0, 0);
 }
 
 static void test_sigreturn_to(unsigned long ip)
 {
 	rip = ip;
 	printf("[RUN]\tsigreturn to 0x%lx\n", ip);
-	raise(SIGUSR1);
+	__raise(SIGUSR1);
 }
 
 static jmp_buf jmpbuf;
-- 
Ammar Faizi


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [RFC PATCH v5 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`
  2023-01-25  3:49                             ` [RFC PATCH v5 0/2] sysret_rip update for the Intel FRED architecture Ammar Faizi
  2023-01-25  3:49                               ` [RFC PATCH v5 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system Ammar Faizi
@ 2023-01-25  3:49                               ` Ammar Faizi
  2023-01-25  8:22                               ` [RFC PATCH v5 0/2] sysret_rip update for the Intel FRED architecture Li, Xin3
  2 siblings, 0 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25  3:49 UTC (permalink / raw)
  To: H. Peter Anvin, Xin Li
  Cc: Ammar Faizi, Dave Hansen, Dave Hansen, Thomas Gleixner,
	Andrew Cooper, Brian Gerst, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Shuah Khan, Ingo Molnar, Andy Lutomirski,
	x86 Mailing List, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

From: Ammar Faizi <ammarfaizi2@gnuweeb.org>

Test that:

  - REGS_SAVED: "syscall" in a FRED system doesn't clobber %rcx and
    %r11.

  - REGS_SYSRET: "syscall" in a non-FRED system sets %rcx=%rip and
    %r11=%rflags.

Test them out with trivial system calls like __NR_getppid and friends
which are extremely likely to return with SYSRET on an IDT system.

Goals of this test:

  - Ensure that the syscall behavior is consistent. It should be either
    always REGS_SAVED or always REGS_SYSRET. Not a mix of them.

  - The kernel doesn't leak its internal data when returning to
    userspace.

Link: https://lore.kernel.org/lkml/25b96960-a07e-a952-5c23-786b55054126@zytor.com
Co-developed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 tools/testing/selftests/x86/sysret_rip.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/x86/sysret_rip.c b/tools/testing/selftests/x86/sysret_rip.c
index d3dc5ec496854179..7ebfb603bf45fa0a 100644
--- a/tools/testing/selftests/x86/sysret_rip.c
+++ b/tools/testing/selftests/x86/sysret_rip.c
@@ -270,8 +270,24 @@ static void test_syscall_fallthrough_to(unsigned long ip)
 	printf("[OK]\tWe survived\n");
 }
 
+/*
+ * Ensure that various system calls are consistent.
+ * We should not get a mix of REGS_SAVED and REGS_SYSRET.
+ */
+static void test_syscall_rcx_r11_consistent(void)
+{
+	do_syscall(__NR_getpid, 0, 0, 0, 0, 0, 0);
+	do_syscall(__NR_gettid, 0, 0, 0, 0, 0, 0);
+	do_syscall(__NR_getppid, 0, 0, 0, 0, 0, 0);
+}
+
 int main()
 {
+	int i;
+
+	for (i = 0; i < 32; i++)
+		test_syscall_rcx_r11_consistent();
+
 	/*
 	 * When the kernel returns from a slow-path syscall, it will
 	 * detect whether SYSRET is appropriate.  If it incorrectly
@@ -279,7 +295,7 @@ int main()
 	 * it'll crash on Intel CPUs.
 	 */
 	sethandler(SIGUSR1, sigusr1, 0);
-	for (int i = 47; i < 64; i++)
+	for (i = 47; i < 64; i++)
 		test_sigreturn_to(1UL<<i);
 
 	clearhandler(SIGUSR1);
@@ -290,7 +306,7 @@ int main()
 	test_syscall_fallthrough_to((1UL << 47) - 2*PAGE_SIZE);
 
 	/* These are the interesting cases. */
-	for (int i = 47; i < 64; i++) {
+	for (i = 47; i < 64; i++) {
 		test_syscall_fallthrough_to((1UL<<i) - PAGE_SIZE);
 		test_syscall_fallthrough_to(1UL<<i);
 	}
-- 
Ammar Faizi


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* RE: [RFC PATCH v5 0/2] sysret_rip update for the Intel FRED architecture
  2023-01-25  3:49                             ` [RFC PATCH v5 0/2] sysret_rip update for the Intel FRED architecture Ammar Faizi
  2023-01-25  3:49                               ` [RFC PATCH v5 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system Ammar Faizi
  2023-01-25  3:49                               ` [RFC PATCH v5 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11` Ammar Faizi
@ 2023-01-25  8:22                               ` Li, Xin3
  2023-01-25  8:32                                 ` Ammar Faizi
  2 siblings, 1 reply; 88+ messages in thread
From: Li, Xin3 @ 2023-01-25  8:22 UTC (permalink / raw)
  To: Ammar Faizi, H. Peter Anvin
  Cc: Hansen, Dave, Dave Hansen, Thomas Gleixner, andrew.cooper3,
	Brian Gerst, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Shuah Khan, Ingo Molnar, Lutomirski, Andy, x86 Mailing List,
	Kirill A. Shutemov, Linux Kselftest Mailing List,
	Linux Kernel Mailing List

This version passes on FRED, thanks a lot for quickly fixing it.

  Xin

> From: Ammar Faizi <ammarfaizi2@gnuweeb.org>
> 
> This is an RFC patchset v5. There are two patches in this series.
> 
> Xin Li reported that the sysret_rip test fails at:
> 
>         assert(ctx->uc_mcontext.gregs[REG_EFL] ==
>                ctx->uc_mcontext.gregs[REG_R11]);
> 
> on the Intel FRED architecture. Let's handle the FRED system scenario too. The
> 'syscall' instruction in a FRED system doesn't set %rcx=%rip and %r11=%rflags.
> 
> Syscall and sysenter in a FRED system are treated equivalently to software
> interrupts, e.g. INT 0x80. They do not modify any registers.
> 
> Link: https://lore.kernel.org/lkml/5d4ad3e3-034f-c7da-d141-
> 9c001c2343af@intel.com
> 
> #### Changelog v5:
> 
>    - Fix do_syscall() return value (Ammar).
> 
> #### Changelog v4:
> 
>    - Fix the assertion condition inside the SIGUSR1 handler (Xin Li).
> 
>    - Explain the purpose of patch #2 in the commit message (HPA).
> 
>    - Update commit message (Ammar).
> 
>    - Repeat test_syscall_rcx_r11_consistent() 32 times to be more sure
>      that the result is really consistent (Ammar).
> 
> #### Changelog v3:
> 
>    - Test that we don't get a mix of REGS_SAVED and REGS_SYSRET,
>      which is a major part of the point (HPA).
> 
> #### Changelog v2:
> 
>    - Use "+r"(rsp) as the right way to avoid redzone problems
>      per Andrew's comment (HPA).
> 
> 
> Co-developed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
> ---
> 
> Ammar Faizi (2):
>   selftests/x86: sysret_rip: Handle syscall in a FRED system
>   selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`
> 
>  tools/testing/selftests/x86/sysret_rip.c | 146 +++++++++++++++++++++--
>  1 file changed, 137 insertions(+), 9 deletions(-)
> 
> 
> base-commit: e12ad468c22065a2826b2fc4c11d2113a7975301
> --
> Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v5 0/2] sysret_rip update for the Intel FRED architecture
  2023-01-25  8:22                               ` [RFC PATCH v5 0/2] sysret_rip update for the Intel FRED architecture Li, Xin3
@ 2023-01-25  8:32                                 ` Ammar Faizi
  2023-01-25 17:07                                   ` Li, Xin3
  0 siblings, 1 reply; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25  8:32 UTC (permalink / raw)
  To: Li, Xin3
  Cc: H. Peter Anvin, Hansen, Dave, Dave Hansen, Thomas Gleixner,
	andrew.cooper3, Brian Gerst, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Shuah Khan, Ingo Molnar, Lutomirski, Andy,
	x86 Mailing List, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

On Wed, Jan 25, 2023 at 08:22:48AM +0000, Li, Xin3 wrote:
> This version passes on FRED, thanks a lot for quickly fixing it.

Great!

Can you pick these two patches and include it in the next version of
"x86: enable FRED for x86-64" RFC patchset?

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v5 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-01-25  3:49                               ` [RFC PATCH v5 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system Ammar Faizi
@ 2023-01-25  8:39                                 ` H. Peter Anvin
  2023-01-25  8:53                                   ` Ammar Faizi
  2023-01-25  9:57                                   ` Ammar Faizi
  0 siblings, 2 replies; 88+ messages in thread
From: H. Peter Anvin @ 2023-01-25  8:39 UTC (permalink / raw)
  To: Ammar Faizi, Xin Li
  Cc: Dave Hansen, Dave Hansen, Thomas Gleixner, Andrew Cooper,
	Brian Gerst, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Shuah Khan, Ingo Molnar, Andy Lutomirski, x86 Mailing List,
	Kirill A. Shutemov, Linux Kselftest Mailing List,
	Linux Kernel Mailing List

On 1/24/23 19:49, Ammar Faizi wrote:
> +
> +	/*
> +	 * Test that:
> +	 *
> +	 * - "syscall" in a FRED system doesn't clobber %rcx and %r11.
> +	 * - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.
> +	 */
> +	ret = check_regs_result(r11, rcx, rbx);
> +	assert(ret != REGS_ERROR);
> +
> +	/*
> +	 * Test that we don't get a mix of REGS_SAVED and REGS_SYSRET.
> +	 * It needs at least calling do_syscall() twice to assert.
> +	 */
> +	if (regs_ok_state == REGS_UNDEFINED) {
> +		/*
> +		 * First time calling do_syscall().
> +		 */
> +		regs_ok_state = ret;
> +	} else {
> +		assert(regs_ok_state == ret);
> +	}
> +

[...]

> +	ret = check_regs_result(ctx->uc_mcontext.gregs[REG_R11],
> +				ctx->uc_mcontext.gregs[REG_RCX],
> +				ctx->uc_mcontext.gregs[REG_RBX]);
> +
> +	assert(ret != REGS_ERROR);
> +

This instance, too, needs to be checked against regs_ok_result. It would 
make most sense to move that handling, and the assert() into 
check_regs_result() or into a separate function around it.

>   	/* Set IP and CX to match so that SYSRET can happen. */
>   	ctx->uc_mcontext.gregs[REG_RIP] = rip;
>   	ctx->uc_mcontext.gregs[REG_RCX] = rip;

It would be interesting to have the syscall handler try both with and 
without this (so it would end up doing both IRET and SYSCALL on legacy.) 
Perhaps SIGUSR1 versus SIGUSR2...

	-hpa

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v5 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-01-25  8:39                                 ` H. Peter Anvin
@ 2023-01-25  8:53                                   ` Ammar Faizi
  2023-01-25  9:57                                   ` Ammar Faizi
  1 sibling, 0 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25  8:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Xin Li, Dave Hansen, Dave Hansen, Thomas Gleixner, Andrew Cooper,
	Brian Gerst, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Shuah Khan, Ingo Molnar, Andy Lutomirski, x86 Mailing List,
	Kirill A. Shutemov, Linux Kselftest Mailing List,
	Linux Kernel Mailing List

On Wed, Jan 25, 2023 at 12:39:26AM -0800, H. Peter Anvin wrote:
> > +	ret = check_regs_result(ctx->uc_mcontext.gregs[REG_R11],
> > +				ctx->uc_mcontext.gregs[REG_RCX],
> > +				ctx->uc_mcontext.gregs[REG_RBX]);
> > +
> > +	assert(ret != REGS_ERROR);
> > +
> 
> This instance, too, needs to be checked against regs_ok_result. It would
> make most sense to move that handling, and the assert() into
> check_regs_result() or into a separate function around it.

OK. Sounds better.
 
> >   	/* Set IP and CX to match so that SYSRET can happen. */
> >   	ctx->uc_mcontext.gregs[REG_RIP] = rip;
> >   	ctx->uc_mcontext.gregs[REG_RCX] = rip;
> 
> It would be interesting to have the syscall handler try both with and
> without this (so it would end up doing both IRET and SYSCALL on legacy.)
> Perhaps SIGUSR1 versus SIGUSR2...

We will have a new separate patch for that.

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v5 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-01-25  8:39                                 ` H. Peter Anvin
  2023-01-25  8:53                                   ` Ammar Faizi
@ 2023-01-25  9:57                                   ` Ammar Faizi
  2023-01-25 10:01                                     ` Ammar Faizi
  2023-01-25 10:17                                     ` H. Peter Anvin
  1 sibling, 2 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25  9:57 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Xin Li, Dave Hansen, Dave Hansen, Thomas Gleixner, Andrew Cooper,
	Brian Gerst, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Shuah Khan, Ingo Molnar, Andy Lutomirski, x86 Mailing List,
	Kirill A. Shutemov, Linux Kselftest Mailing List,
	Linux Kernel Mailing List

On Wed, Jan 25, 2023 at 12:39:26AM -0800, H. Peter Anvin wrote:
> >   	/* Set IP and CX to match so that SYSRET can happen. */
> >   	ctx->uc_mcontext.gregs[REG_RIP] = rip;
> >   	ctx->uc_mcontext.gregs[REG_RCX] = rip;
> 
> It would be interesting to have the syscall handler try both with and
> without this (so it would end up doing both IRET and SYSCALL on legacy.)
> Perhaps SIGUSR1 versus SIGUSR2...

Just to clarify this more so I am sure I understand it correctly.

Did you mean to have the same signal handler without modifiying
'REG_RCX' but still change 'REG_RIP'?

IOW, we want to only *remove*:

   ctx->uc_mcontext.gregs[REG_RCX] = rip;

and *keep*:

   ctx->uc_mcontext.gregs[REG_RIP] = rip;

for the SIGUSR2 handler. Thus, inside the entry64 we will jump to the
iret path because %rcx != %r11 upon rt_sigreturn()?

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v5 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-01-25  9:57                                   ` Ammar Faizi
@ 2023-01-25 10:01                                     ` Ammar Faizi
  2023-01-25 10:17                                     ` H. Peter Anvin
  1 sibling, 0 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25 10:01 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Xin Li, Dave Hansen, Dave Hansen, Thomas Gleixner, Andrew Cooper,
	Brian Gerst, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Shuah Khan, Ingo Molnar, Andy Lutomirski, x86 Mailing List,
	Kirill A. Shutemov, Linux Kselftest Mailing List,
	Linux Kernel Mailing List

On 1/25/23 4:57 PM, Ammar Faizi wrote:
> On Wed, Jan 25, 2023 at 12:39:26AM -0800, H. Peter Anvin wrote:
>>>    	/* Set IP and CX to match so that SYSRET can happen. */
>>>    	ctx->uc_mcontext.gregs[REG_RIP] = rip;
>>>    	ctx->uc_mcontext.gregs[REG_RCX] = rip;
>>
>> It would be interesting to have the syscall handler try both with and
>> without this (so it would end up doing both IRET and SYSCALL on legacy.)
>> Perhaps SIGUSR1 versus SIGUSR2...
> 
> Just to clarify this more so I am sure I understand it correctly.
> 
> Did you mean to have the same signal handler without modifiying
> 'REG_RCX' but still change 'REG_RIP'?
> 
> IOW, we want to only *remove*:
> 
>     ctx->uc_mcontext.gregs[REG_RCX] = rip;
> 
> and *keep*:
> 
>     ctx->uc_mcontext.gregs[REG_RIP] = rip;
> 
> for the SIGUSR2 handler. Thus, inside the entry64 we will jump to the
> iret path because %rcx != %r11 upon rt_sigreturn()?

s/%rcx != %r11/%rcx != %rip/

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v5 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-01-25  9:57                                   ` Ammar Faizi
  2023-01-25 10:01                                     ` Ammar Faizi
@ 2023-01-25 10:17                                     ` H. Peter Anvin
  2023-01-25 11:37                                       ` Ammar Faizi
  1 sibling, 1 reply; 88+ messages in thread
From: H. Peter Anvin @ 2023-01-25 10:17 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Xin Li, Dave Hansen, Dave Hansen, Thomas Gleixner, Andrew Cooper,
	Brian Gerst, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Shuah Khan, Ingo Molnar, Andy Lutomirski, x86 Mailing List,
	Kirill A. Shutemov, Linux Kselftest Mailing List,
	Linux Kernel Mailing List

On January 25, 2023 1:57:15 AM PST, Ammar Faizi <ammarfaizi2@gnuweeb.org> wrote:
>On Wed, Jan 25, 2023 at 12:39:26AM -0800, H. Peter Anvin wrote:
>> >   	/* Set IP and CX to match so that SYSRET can happen. */
>> >   	ctx->uc_mcontext.gregs[REG_RIP] = rip;
>> >   	ctx->uc_mcontext.gregs[REG_RCX] = rip;
>> 
>> It would be interesting to have the syscall handler try both with and
>> without this (so it would end up doing both IRET and SYSCALL on legacy.)
>> Perhaps SIGUSR1 versus SIGUSR2...
>
>Just to clarify this more so I am sure I understand it correctly.
>
>Did you mean to have the same signal handler without modifiying
>'REG_RCX' but still change 'REG_RIP'?
>
>IOW, we want to only *remove*:
>
>   ctx->uc_mcontext.gregs[REG_RCX] = rip;
>
>and *keep*:
>
>   ctx->uc_mcontext.gregs[REG_RIP] = rip;
>
>for the SIGUSR2 handler. Thus, inside the entry64 we will jump to the
>iret path because %rcx != %r11 upon rt_sigreturn()?
>

I guess it would depend on what they "normally" are. My #1 impulse would be to leave them both unchanged.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v5 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-01-25 10:17                                     ` H. Peter Anvin
@ 2023-01-25 11:37                                       ` Ammar Faizi
  2023-01-25 17:25                                         ` H. Peter Anvin
  0 siblings, 1 reply; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25 11:37 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Xin Li, Dave Hansen, Dave Hansen, Thomas Gleixner, Andrew Cooper,
	Brian Gerst, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Shuah Khan, Ingo Molnar, Andy Lutomirski, x86 Mailing List,
	Kirill A. Shutemov, Linux Kselftest Mailing List,
	Linux Kernel Mailing List

On Wed, Jan 25, 2023 at 02:17:41AM -0800, H. Peter Anvin wrote:
> I guess it would depend on what they "normally" are. My #1 impulse would be to leave them both unchanged.

Ah okay... I think I understand now. My confusion came from a comment
in that code.

The current SIGUSR1 handler has a comment:

    /* Set IP and CX to match so that SYSRET can happen. */
    ctx->uc_mcontext.gregs[REG_RIP] = rip;
    ctx->uc_mcontext.gregs[REG_RCX] = rip;

So I thought if we leave them both unchanged, then SYSRET can happen
too, because IP and CX match. My initial confusion about that was:

    Where do we actually exercise IRET if the SIGUSR2 handler
    exercises SYSRET then?

I realized my assumption was wrong. The current SIGUSR1 handler
actually forces the kernel to use IRET, not SYSRET. Because the %rip
is set to a non-canonical address. So that's the place where it
exercises IRET.

IOW, my understanding now:

The current SIGUSR1 handler exercises the SYSRET-appropriate condition
detector in the kernel. It doesn't actually go to the SYSRET path
despite the comment saying "SYSRET can happen". That detector must take
us to the IRET path or we will #GP in kernel space on Intel CPUs.

In short, the SIGUSR1 handler asserts that "SYSRET must *not* happen".

The expected SIGUSR2 handler addition exercises the SYSRET path by
leaving REG_IP and REG_CX unchanged.

Am I correct?

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [RFC PATCH v5 0/2] sysret_rip update for the Intel FRED architecture
  2023-01-25  8:32                                 ` Ammar Faizi
@ 2023-01-25 17:07                                   ` Li, Xin3
  2023-01-25 17:24                                     ` H. Peter Anvin
  0 siblings, 1 reply; 88+ messages in thread
From: Li, Xin3 @ 2023-01-25 17:07 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: H. Peter Anvin, Hansen, Dave, Dave Hansen, Thomas Gleixner,
	andrew.cooper3, Brian Gerst, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Shuah Khan, Ingo Molnar, Lutomirski, Andy,
	x86 Mailing List, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

> > This version passes on FRED, thanks a lot for quickly fixing it.
> 
> Great!
> 
> Can you pick these two patches and include it in the next version of
> "x86: enable FRED for x86-64" RFC patchset?

Would it be better to get this patch set merged first?

Otherwise surely I will include it in the FRED patch set.

  Xin

 

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [RFC PATCH v5 0/2] sysret_rip update for the Intel FRED architecture
  2023-01-25 17:07                                   ` Li, Xin3
@ 2023-01-25 17:24                                     ` H. Peter Anvin
  2023-01-25 17:41                                       ` Ammar Faizi
  0 siblings, 1 reply; 88+ messages in thread
From: H. Peter Anvin @ 2023-01-25 17:24 UTC (permalink / raw)
  To: Li, Xin3, Ammar Faizi
  Cc: Hansen, Dave, Dave Hansen, Thomas Gleixner, andrew.cooper3,
	Brian Gerst, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Shuah Khan, Ingo Molnar, Lutomirski, Andy, x86 Mailing List,
	Kirill A. Shutemov, Linux Kselftest Mailing List,
	Linux Kernel Mailing List

On January 25, 2023 9:07:18 AM PST, "Li, Xin3" <xin3.li@intel.com> wrote:
>> > This version passes on FRED, thanks a lot for quickly fixing it.
>> 
>> Great!
>> 
>> Can you pick these two patches and include it in the next version of
>> "x86: enable FRED for x86-64" RFC patchset?
>
>Would it be better to get this patch set merged first?
>
>Otherwise surely I will include it in the FRED patch set.
>
>  Xin
>
> 
>

If the maintainers are ok with it, it would be better to merge it sooner: once we have agreed on the semantics, which I believe we have, we should be testing those semantics and nothing else.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v5 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-01-25 11:37                                       ` Ammar Faizi
@ 2023-01-25 17:25                                         ` H. Peter Anvin
  0 siblings, 0 replies; 88+ messages in thread
From: H. Peter Anvin @ 2023-01-25 17:25 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Xin Li, Dave Hansen, Dave Hansen, Thomas Gleixner, Andrew Cooper,
	Brian Gerst, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Shuah Khan, Ingo Molnar, Andy Lutomirski, x86 Mailing List,
	Kirill A. Shutemov, Linux Kselftest Mailing List,
	Linux Kernel Mailing List

On January 25, 2023 3:37:23 AM PST, Ammar Faizi <ammarfaizi2@gnuweeb.org> wrote:
>On Wed, Jan 25, 2023 at 02:17:41AM -0800, H. Peter Anvin wrote:
>> I guess it would depend on what they "normally" are. My #1 impulse would be to leave them both unchanged.
>
>Ah okay... I think I understand now. My confusion came from a comment
>in that code.
>
>The current SIGUSR1 handler has a comment:
>
>    /* Set IP and CX to match so that SYSRET can happen. */
>    ctx->uc_mcontext.gregs[REG_RIP] = rip;
>    ctx->uc_mcontext.gregs[REG_RCX] = rip;
>
>So I thought if we leave them both unchanged, then SYSRET can happen
>too, because IP and CX match. My initial confusion about that was:
>
>    Where do we actually exercise IRET if the SIGUSR2 handler
>    exercises SYSRET then?
>
>I realized my assumption was wrong. The current SIGUSR1 handler
>actually forces the kernel to use IRET, not SYSRET. Because the %rip
>is set to a non-canonical address. So that's the place where it
>exercises IRET.
>
>IOW, my understanding now:
>
>The current SIGUSR1 handler exercises the SYSRET-appropriate condition
>detector in the kernel. It doesn't actually go to the SYSRET path
>despite the comment saying "SYSRET can happen". That detector must take
>us to the IRET path or we will #GP in kernel space on Intel CPUs.
>
>In short, the SIGUSR1 handler asserts that "SYSRET must *not* happen".
>
>The expected SIGUSR2 handler addition exercises the SYSRET path by
>leaving REG_IP and REG_CX unchanged.
>
>Am I correct?
>

That's the idea.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v5 0/2] sysret_rip update for the Intel FRED architecture
  2023-01-25 17:24                                     ` H. Peter Anvin
@ 2023-01-25 17:41                                       ` Ammar Faizi
  2023-01-25 17:48                                         ` Li, Xin3
  0 siblings, 1 reply; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25 17:41 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Li, Xin3, Hansen, Dave, Dave Hansen, Thomas Gleixner,
	andrew.cooper3, Brian Gerst, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Shuah Khan, Ingo Molnar, Lutomirski, Andy,
	x86 Mailing List, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

On Wed, Jan 25, 2023 at 09:24:40AM -0800, H. Peter Anvin wrote:
> On January 25, 2023 9:07:18 AM PST, "Li, Xin3" <xin3.li@intel.com> wrote:
> > Would it be better to get this patch set merged first?
> > 
> > Otherwise surely I will include it in the FRED patch set.
> 
> If the maintainers are ok with it, it would be better to merge it
> sooner: once we have agreed on the semantics, which I believe we
> have, we should be testing those semantics and nothing else.

OK, let's keep this patchset separated from the FRED support
patchset.

In the meantime, let me address the recent HPA's comments.

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [RFC PATCH v5 0/2] sysret_rip update for the Intel FRED architecture
  2023-01-25 17:41                                       ` Ammar Faizi
@ 2023-01-25 17:48                                         ` Li, Xin3
  2023-02-15  7:42                                           ` Li, Xin3
  0 siblings, 1 reply; 88+ messages in thread
From: Li, Xin3 @ 2023-01-25 17:48 UTC (permalink / raw)
  To: Ammar Faizi, H. Peter Anvin
  Cc: Hansen, Dave, Dave Hansen, Thomas Gleixner, andrew.cooper3,
	Brian Gerst, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Shuah Khan, Ingo Molnar, Lutomirski, Andy, x86 Mailing List,
	Kirill A. Shutemov, Linux Kselftest Mailing List,
	Linux Kernel Mailing List

> > > Would it be better to get this patch set merged first?
> > >
> > > Otherwise surely I will include it in the FRED patch set.
> >
> > If the maintainers are ok with it, it would be better to merge it
> > sooner: once we have agreed on the semantics, which I believe we have,
> > we should be testing those semantics and nothing else.
> 
> OK, let's keep this patchset separated from the FRED support patchset.

Thanks!

This patch set first makes the R11/RCX semantics clearer, and it BTW fixes
FRED tests.

To me it's more of an improvement to the existing code.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [RFC PATCH v6 0/3] sysret_rip update for the Intel FRED architecture
  2023-01-24  9:12                           ` Ammar Faizi
                                               ` (2 preceding siblings ...)
  2023-01-25  3:49                             ` [RFC PATCH v5 0/2] sysret_rip update for the Intel FRED architecture Ammar Faizi
@ 2023-01-25 21:17                             ` Ammar Faizi
  2023-01-25 21:17                               ` [RFC PATCH v6 1/3] selftests/x86: sysret_rip: Handle syscall in a FRED system Ammar Faizi
                                                 ` (2 more replies)
  2023-01-25 23:24                             ` [RFC PATCH v7 0/3] sysret_rip update for the Intel FRED architecture Ammar Faizi
  4 siblings, 3 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25 21:17 UTC (permalink / raw)
  To: H. Peter Anvin, Xin Li
  Cc: Dave Hansen, Dave Hansen, Kirill A. Shutemov, Thomas Gleixner,
	Andrew Cooper, Peter Zijlstra, Brian Gerst, Borislav Petkov,
	Shuah Khan, Ingo Molnar, Andy Lutomirski, x86 Mailing List,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

From: Ammar Faizi <ammarfaizi2@gnuweeb.org>

Hi,

This is an RFC patchset v6. There are three patches in this series.

Xin Li reported that the sysret_rip test fails at:

        assert(ctx->uc_mcontext.gregs[REG_EFL] ==
               ctx->uc_mcontext.gregs[REG_R11]);

on the Intel FRED architecture. Let's handle the FRED system scenario
too. The 'syscall' instruction in a FRED system doesn't set %rcx=%rip
and %r11=%rflags.

Syscall and sysenter in a FRED system are treated equivalently to
software interrupts, e.g. INT 0x80. They do not modify any registers.

Link: https://lore.kernel.org/lkml/5d4ad3e3-034f-c7da-d141-9c001c2343af@intel.com

## Changelog v6:

   - Move the check-regs assertion in sigusr1() to check_regs_result()
     (HPA).

   - Add a new test just like sigusr1(), but don't modify REG_RCX and
     REG_R11. This is used to test SYSRET behavior consistency (HPA).

## Changelog v5:

   - Fix do_syscall() return value (Ammar).

## Changelog v4:

   - Fix the assertion condition inside the SIGUSR1 handler (Xin Li).

   - Explain the purpose of patch #2 in the commit message (HPA).

   - Update commit message (Ammar).

   - Repeat test_syscall_rcx_r11_consistent() 32 times to be more sure
     that the result is really consistent (Ammar).

## Changelog v3:

   - Test that we don't get a mix of REGS_SAVED and REGS_SYSRET,
     which is a major part of the point (HPA).

## Changelog v2:

   - Use "+r"(rsp) as the right way to avoid redzone problems
     per Andrew's comment (HPA).


Co-developed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---

Ammar Faizi (3):
  selftests/x86: sysret_rip: Handle syscall in a FRED system
  selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`
  selftests/x86: sysret_rip: Test opportunistic SYSRET

 tools/testing/selftests/x86/sysret_rip.c | 177 +++++++++++++++++++++--
 1 file changed, 168 insertions(+), 9 deletions(-)


base-commit: e12ad468c22065a2826b2fc4c11d2113a7975301
-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [RFC PATCH v6 1/3] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-01-25 21:17                             ` [RFC PATCH v6 0/3] " Ammar Faizi
@ 2023-01-25 21:17                               ` Ammar Faizi
  2023-01-25 23:01                                 ` Ammar Faizi
  2023-01-25 21:17                               ` [RFC PATCH v6 2/3] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11` Ammar Faizi
  2023-01-25 21:17                               ` [RFC PATCH v6 3/3] selftests/x86: sysret_rip: Test opportunistic SYSRET Ammar Faizi
  2 siblings, 1 reply; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25 21:17 UTC (permalink / raw)
  To: H. Peter Anvin, Xin Li
  Cc: Dave Hansen, Dave Hansen, Kirill A. Shutemov, Thomas Gleixner,
	Andrew Cooper, Peter Zijlstra, Brian Gerst, Borislav Petkov,
	Shuah Khan, Ingo Molnar, Andy Lutomirski, x86 Mailing List,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

The current selftest asserts (%r11 == %rflags) after the 'syscall'
returns to user. Such an assertion doesn't apply to the FRED system
because in that system the 'syscall' instruction does not set
%r11=%rflags and %rcx=%rip.

Handle the FRED case. Now, test that:

  - "syscall" in a FRED system doesn't clobber %rcx and %r11.
  - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.

The 'raise()' function from libc can't be used to control those
registers. Therefore, create a syscall wrapper in inline Assembly to
fully control them.

Fixes: 660602140103 ("selftests/x86: Add a selftest for SYSRET to noncanonical addresses")
Link: https://lore.kernel.org/lkml/25b96960-a07e-a952-5c23-786b55054126@zytor.com
Reported-by: Xin Li <xin3.li@intel.com>
Co-developed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 tools/testing/selftests/x86/sysret_rip.c | 120 +++++++++++++++++++++--
 1 file changed, 113 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/x86/sysret_rip.c b/tools/testing/selftests/x86/sysret_rip.c
index 84d74be1d90207ab..100f55981d77a29b 100644
--- a/tools/testing/selftests/x86/sysret_rip.c
+++ b/tools/testing/selftests/x86/sysret_rip.c
@@ -39,6 +39,110 @@ asm (
 extern const char test_page[];
 static void const *current_test_page_addr = test_page;
 
+/* Arbitrary values */
+static const unsigned long r11_sentinel = 0xfeedfacedeadbeef;
+static const unsigned long rcx_sentinel = 0x5ca1ab1e0b57ac1e;
+
+/* An arbitrary *valid* RFLAGS value */
+static const unsigned long rflags_sentinel = 0x200a93;
+
+enum regs_ok {
+	REGS_UNDEFINED	= -1,	/* For consistency checker init, never returned */
+	REGS_SAVED	=  0,	/* Registers properly preserved */
+	REGS_SYSRET	=  1	/* Registers match syscall/sysret */
+};
+
+/*
+ * REGS_SAVED  = %rcx and %r11 preserved.
+ * REGS_SYSRET = %rcx and %r11 set to %rflags and %rip.
+ * REGS_ERROR  = %rcx and/or %r11 set to any other values.
+ *
+ * @rbx should be set to the syscall return %rip.
+ */
+static void check_regs_result(unsigned long r11, unsigned long rcx,
+			      unsigned long rbx)
+{
+	static enum regs_ok regs_ok_state = REGS_UNDEFINED;
+	enum regs_ok ret;
+
+	/*
+	 * Test that:
+	 *
+	 * - "syscall" in a FRED system doesn't clobber %rcx and %r11.
+	 * - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.
+	 */
+	if (r11 == r11_sentinel && rcx == rcx_sentinel) {
+		ret = REGS_SAVED;
+	} else if (r11 == rflags_sentinel && rcx == rbx) {
+		ret = REGS_SYSRET;
+	} else {
+		printf("[FAIL] check_regs_result\n");
+		printf("        r11_sentinel = %#lx; %%r11 = %#lx;\n", r11_sentinel, r11);
+		printf("        rcx_sentinel = %#lx; %%rcx = %#lx;\n", rcx_sentinel, rcx);
+		printf("        rflags_sentinel = %#lx\n", rflags_sentinel);
+		exit(1);
+	}
+
+
+	/*
+	 * Test that we don't get a mix of REGS_SAVED and REGS_SYSRET.
+	 * It needs at least calling check_regs_result() twice to assert.
+	 */
+	if (regs_ok_state == REGS_UNDEFINED) {
+		/*
+		 * First time calling check_regs_result().
+		 */
+		regs_ok_state = ret;
+	} else {
+		assert(regs_ok_state == ret);
+	}
+}
+
+static long do_syscall(long nr_syscall, unsigned long arg1, unsigned long arg2,
+		       unsigned long arg3, unsigned long arg4,
+		       unsigned long arg5, unsigned long arg6)
+{
+	register unsigned long r11 asm("%r11");
+	register unsigned long r10 asm("%r10");
+	register unsigned long r8 asm("%r8");
+	register unsigned long r9 asm("%r9");
+	register void *rsp asm("%rsp");
+	unsigned long rcx, rbx;
+
+	r11 = r11_sentinel;
+	rcx = rcx_sentinel;
+	r10 = arg4;
+	r8 = arg5;
+	r9 = arg6;
+
+	asm volatile (
+		"pushq	%[rflags_sentinel]\n\t"
+		"popf\n\t"
+		"leaq	1f(%%rip), %[rbx]\n\t"
+		"syscall\n"
+		"1:"
+
+		: "+a" (nr_syscall),
+		  "+r" (r11),
+		  "+c" (rcx),
+		  [rbx] "=b" (rbx),
+		  "+r" (rsp)	/* Clobber the redzone */
+
+		: [rflags_sentinel] "g" (rflags_sentinel),
+		  "D" (arg1),	/* %rdi */
+		  "S" (arg2),	/* %rsi */
+		  "d" (arg3),	/* %rdx */
+		  "r" (r10),
+		  "r" (r8),
+		  "r" (r9)
+
+		: "memory"
+	);
+
+	check_regs_result(r11, rcx, rbx);
+	return nr_syscall;
+}
+
 static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *),
 		       int flags)
 {
@@ -88,24 +192,26 @@ static void sigusr1(int sig, siginfo_t *info, void *ctx_void)
 
 	memcpy(&initial_regs, &ctx->uc_mcontext.gregs, sizeof(gregset_t));
 
+	check_regs_result(ctx->uc_mcontext.gregs[REG_R11],
+			  ctx->uc_mcontext.gregs[REG_RCX],
+			  ctx->uc_mcontext.gregs[REG_RBX]);
+
 	/* Set IP and CX to match so that SYSRET can happen. */
 	ctx->uc_mcontext.gregs[REG_RIP] = rip;
 	ctx->uc_mcontext.gregs[REG_RCX] = rip;
-
-	/* R11 and EFLAGS should already match. */
-	assert(ctx->uc_mcontext.gregs[REG_EFL] ==
-	       ctx->uc_mcontext.gregs[REG_R11]);
-
 	sethandler(SIGSEGV, sigsegv_for_sigreturn_test, SA_RESETHAND);
+}
 
-	return;
+static void __raise(int sig)
+{
+	do_syscall(__NR_kill, getpid(), sig, 0, 0, 0, 0);
 }
 
 static void test_sigreturn_to(unsigned long ip)
 {
 	rip = ip;
 	printf("[RUN]\tsigreturn to 0x%lx\n", ip);
-	raise(SIGUSR1);
+	__raise(SIGUSR1);
 }
 
 static jmp_buf jmpbuf;
-- 
Ammar Faizi


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [RFC PATCH v6 2/3] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`
  2023-01-25 21:17                             ` [RFC PATCH v6 0/3] " Ammar Faizi
  2023-01-25 21:17                               ` [RFC PATCH v6 1/3] selftests/x86: sysret_rip: Handle syscall in a FRED system Ammar Faizi
@ 2023-01-25 21:17                               ` Ammar Faizi
  2023-01-25 21:17                               ` [RFC PATCH v6 3/3] selftests/x86: sysret_rip: Test opportunistic SYSRET Ammar Faizi
  2 siblings, 0 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25 21:17 UTC (permalink / raw)
  To: H. Peter Anvin, Xin Li
  Cc: Dave Hansen, Dave Hansen, Kirill A. Shutemov, Thomas Gleixner,
	Andrew Cooper, Peter Zijlstra, Brian Gerst, Borislav Petkov,
	Shuah Khan, Ingo Molnar, Andy Lutomirski, x86 Mailing List,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

Test that:

  - REGS_SAVED: "syscall" in a FRED system doesn't clobber %rcx and
    %r11.

  - REGS_SYSRET: "syscall" in a non-FRED system sets %rcx=%rip and
    %r11=%rflags.

Test them out with trivial system calls like __NR_getppid and friends
which are extremely likely to return with SYSRET on an IDT system.

Goals of this test:

  - Ensure that the syscall behavior is consistent. It should be either
    always REGS_SAVED or always REGS_SYSRET. Not a mix of them.

  - The kernel doesn't leak its internal data when returning to
    userspace.

Link: https://lore.kernel.org/lkml/25b96960-a07e-a952-5c23-786b55054126@zytor.com
Co-developed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 tools/testing/selftests/x86/sysret_rip.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/x86/sysret_rip.c b/tools/testing/selftests/x86/sysret_rip.c
index 100f55981d77a29b..d45b7f0147cd25ad 100644
--- a/tools/testing/selftests/x86/sysret_rip.c
+++ b/tools/testing/selftests/x86/sysret_rip.c
@@ -264,8 +264,24 @@ static void test_syscall_fallthrough_to(unsigned long ip)
 	printf("[OK]\tWe survived\n");
 }
 
+/*
+ * Ensure that various system calls are consistent.
+ * We should not get a mix of REGS_SAVED and REGS_SYSRET.
+ */
+static void test_syscall_rcx_r11_consistent(void)
+{
+	do_syscall(__NR_getpid, 0, 0, 0, 0, 0, 0);
+	do_syscall(__NR_gettid, 0, 0, 0, 0, 0, 0);
+	do_syscall(__NR_getppid, 0, 0, 0, 0, 0, 0);
+}
+
 int main()
 {
+	int i;
+
+	for (i = 0; i < 32; i++)
+		test_syscall_rcx_r11_consistent();
+
 	/*
 	 * When the kernel returns from a slow-path syscall, it will
 	 * detect whether SYSRET is appropriate.  If it incorrectly
@@ -273,7 +289,7 @@ int main()
 	 * it'll crash on Intel CPUs.
 	 */
 	sethandler(SIGUSR1, sigusr1, 0);
-	for (int i = 47; i < 64; i++)
+	for (i = 47; i < 64; i++)
 		test_sigreturn_to(1UL<<i);
 
 	clearhandler(SIGUSR1);
@@ -284,7 +300,7 @@ int main()
 	test_syscall_fallthrough_to((1UL << 47) - 2*PAGE_SIZE);
 
 	/* These are the interesting cases. */
-	for (int i = 47; i < 64; i++) {
+	for (i = 47; i < 64; i++) {
 		test_syscall_fallthrough_to((1UL<<i) - PAGE_SIZE);
 		test_syscall_fallthrough_to(1UL<<i);
 	}
-- 
Ammar Faizi


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [RFC PATCH v6 3/3] selftests/x86: sysret_rip: Test opportunistic SYSRET
  2023-01-25 21:17                             ` [RFC PATCH v6 0/3] " Ammar Faizi
  2023-01-25 21:17                               ` [RFC PATCH v6 1/3] selftests/x86: sysret_rip: Handle syscall in a FRED system Ammar Faizi
  2023-01-25 21:17                               ` [RFC PATCH v6 2/3] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11` Ammar Faizi
@ 2023-01-25 21:17                               ` Ammar Faizi
  2 siblings, 0 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25 21:17 UTC (permalink / raw)
  To: H. Peter Anvin, Xin Li
  Cc: Dave Hansen, Dave Hansen, Kirill A. Shutemov, Thomas Gleixner,
	Andrew Cooper, Peter Zijlstra, Brian Gerst, Borislav Petkov,
	Shuah Khan, Ingo Molnar, Andy Lutomirski, x86 Mailing List,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

When run on a non-FRED system, test the opportunistic SYSRET fast-path.
Make sure the %rcx/%r11 clobbering behavior is consistent.

When run on a FRED system, test that %rcx/%r11 are preserved when
invoking syscall.

This is similar to what test_syscall_rcx_r11_consistent() is doing, but
with addition it's done via the SIGUSR2 signal handler.

Link: https://lore.kernel.org/lkml/8770815f-0f23-d0c5-e56a-d401827842c9@zytor.com
Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---

On Wed, 25 Jan 2023 00:39:26 -0800, "H. Peter Anvin" wrote:
> >    /* Set IP and CX to match so that SYSRET can happen. */
> >     ctx->uc_mcontext.gregs[REG_RIP] = rip;
> >     ctx->uc_mcontext.gregs[REG_RCX] = rip;
> 
> It would be interesting to have the syscall handler try both with and 
> without this (so it would end up doing both IRET and SYSCALL on legacy.) 
> Perhaps SIGUSR1 versus SIGUSR2...


 tools/testing/selftests/x86/sysret_rip.c | 37 ++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/tools/testing/selftests/x86/sysret_rip.c b/tools/testing/selftests/x86/sysret_rip.c
index d45b7f0147cd25ad..a1e5ec6f08bcd523 100644
--- a/tools/testing/selftests/x86/sysret_rip.c
+++ b/tools/testing/selftests/x86/sysret_rip.c
@@ -275,6 +275,28 @@ static void test_syscall_rcx_r11_consistent(void)
 	do_syscall(__NR_getppid, 0, 0, 0, 0, 0, 0);
 }
 
+static unsigned long usr2_rcx;
+static unsigned long usr2_r11;
+
+static void sigusr2(int sig, siginfo_t *info, void *ctx_void)
+{
+	ucontext_t *ctx = (ucontext_t*)ctx_void;
+
+	usr2_r11 = ctx->uc_mcontext.gregs[REG_R11];
+	usr2_rcx = ctx->uc_mcontext.gregs[REG_RCX];
+
+	check_regs_result(ctx->uc_mcontext.gregs[REG_R11],
+			  ctx->uc_mcontext.gregs[REG_RCX],
+			  ctx->uc_mcontext.gregs[REG_RBX]);
+}
+
+static void test_sysret_consistent(void)
+{
+	printf("[RUN]\ttest_sysret_consistent\n");
+	__raise(SIGUSR2);
+	printf("[OK]\tRCX = %#lx;  R11 = %#lx\n", usr2_rcx, usr2_r11);
+}
+
 int main()
 {
 	int i;
@@ -292,6 +314,21 @@ int main()
 	for (i = 47; i < 64; i++)
 		test_sigreturn_to(1UL<<i);
 
+	/*
+	 *
+	 * When run on a non-FRED system, test the SYSRET path. Make
+	 * sure the %rcx/%r11 clobbering behavior is consistent.
+	 *
+	 * When run on a FRED system, test that %rcx/%r11 are preserved
+	 * when invoking syscall.
+	 *
+	 * This is similar to test_syscall_rcx_r11_consistent(), but via
+	 * a signal handler.
+	 */
+	sethandler(SIGUSR2, sigusr2, 0);
+	for (i = 0; i < 32; i++)
+		test_sysret_consistent();
+
 	clearhandler(SIGUSR1);
 
 	sethandler(SIGSEGV, sigsegv_for_fallthrough, 0);
-- 
Ammar Faizi


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v6 1/3] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-01-25 21:17                               ` [RFC PATCH v6 1/3] selftests/x86: sysret_rip: Handle syscall in a FRED system Ammar Faizi
@ 2023-01-25 23:01                                 ` Ammar Faizi
  0 siblings, 0 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25 23:01 UTC (permalink / raw)
  To: H. Peter Anvin, Xin Li
  Cc: Dave Hansen, Dave Hansen, Kirill A. Shutemov, Thomas Gleixner,
	Andrew Cooper, Peter Zijlstra, Brian Gerst, Borislav Petkov,
	Shuah Khan, Ingo Molnar, Andy Lutomirski, x86 Mailing List,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

On Thu, Jan 26, 2023 at 04:17:12AM +0700, Ammar Faizi wrote:
> +/*
> + * REGS_SAVED  = %rcx and %r11 preserved.
> + * REGS_SYSRET = %rcx and %r11 set to %rflags and %rip.
> + * REGS_ERROR  = %rcx and/or %r11 set to any other values.

Since we moved the assertion into check_regs_result(), we no longer
need REGS_ERROR. check_regs_result() is now a void function.

Will remove that comment...

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [RFC PATCH v7 0/3] sysret_rip update for the Intel FRED architecture
  2023-01-24  9:12                           ` Ammar Faizi
                                               ` (3 preceding siblings ...)
  2023-01-25 21:17                             ` [RFC PATCH v6 0/3] " Ammar Faizi
@ 2023-01-25 23:24                             ` Ammar Faizi
  2023-01-25 23:24                               ` [RFC PATCH v7 1/3] selftests/x86: sysret_rip: Handle syscall in a FRED system Ammar Faizi
                                                 ` (2 more replies)
  4 siblings, 3 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25 23:24 UTC (permalink / raw)
  To: H. Peter Anvin, Xin Li
  Cc: Dave Hansen, Dave Hansen, Kirill A. Shutemov, Thomas Gleixner,
	Andrew Cooper, Peter Zijlstra, Brian Gerst, Borislav Petkov,
	Shuah Khan, Ingo Molnar, Andy Lutomirski, x86 Mailing List,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

From: Ammar Faizi <ammarfaizi2@gnuweeb.org>

Hi,

This is an RFC patchset v7. There are three patches in this series.

Xin Li reported that the sysret_rip test fails at:

        assert(ctx->uc_mcontext.gregs[REG_EFL] ==
               ctx->uc_mcontext.gregs[REG_R11]);

on the Intel FRED architecture. Let's handle the FRED system scenario
too. The 'syscall' instruction in a FRED system doesn't set %rcx=%rip
and %r11=%rflags.

Syscall and sysenter in a FRED system are treated equivalently to
software interrupts, e.g. INT 0x80. They do not modify any registers.

Link: https://lore.kernel.org/lkml/5d4ad3e3-034f-c7da-d141-9c001c2343af@intel.com

## Changelog v7:

   - Fix comment, REGS_ERROR no longer exists in the enum (Ammar).

   - Update commit message (Ammar).

## Changelog v6:

   - Move the check-regs assertion in sigusr1() to check_regs_result()
     (HPA).

   - Add a new test just like sigusr1(), but don't modify REG_RCX and
     REG_R11. This is used to test SYSRET behavior consistency (HPA).

## Changelog v5:

   - Fix do_syscall() return value (Ammar).

## Changelog v4:

   - Fix the assertion condition inside the SIGUSR1 handler (Xin Li).

   - Explain the purpose of patch #2 in the commit message (HPA).

   - Update commit message (Ammar).

   - Repeat test_syscall_rcx_r11_consistent() 32 times to be more sure
     that the result is really consistent (Ammar).

## Changelog v3:

   - Test that we don't get a mix of REGS_SAVED and REGS_SYSRET,
     which is a major part of the point (HPA).

## Changelog v2:

   - Use "+r"(rsp) as the right way to avoid redzone problems
     per Andrew's comment (HPA).


Co-developed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---

Ammar Faizi (3):
  selftests/x86: sysret_rip: Handle syscall in a FRED system
  selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`
  selftests/x86: sysret_rip: Test SYSRET with a signal handler

 tools/testing/selftests/x86/sysret_rip.c | 171 +++++++++++++++++++++--
 1 file changed, 162 insertions(+), 9 deletions(-)


base-commit: e12ad468c22065a2826b2fc4c11d2113a7975301
-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [RFC PATCH v7 1/3] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-01-25 23:24                             ` [RFC PATCH v7 0/3] sysret_rip update for the Intel FRED architecture Ammar Faizi
@ 2023-01-25 23:24                               ` Ammar Faizi
  2023-01-25 23:24                               ` [RFC PATCH v7 2/3] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11` Ammar Faizi
  2023-01-25 23:24                               ` [RFC PATCH v7 3/3] selftests/x86: sysret_rip: Test SYSRET with a signal handler Ammar Faizi
  2 siblings, 0 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25 23:24 UTC (permalink / raw)
  To: H. Peter Anvin, Xin Li
  Cc: Dave Hansen, Dave Hansen, Kirill A. Shutemov, Thomas Gleixner,
	Andrew Cooper, Peter Zijlstra, Brian Gerst, Borislav Petkov,
	Shuah Khan, Ingo Molnar, Andy Lutomirski, x86 Mailing List,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

From: Ammar Faizi <ammarfaizi2@gnuweeb.org>

The current selftest asserts (%r11 == %rflags) after the 'syscall'
returns to user. Such an assertion doesn't apply to the FRED system
because in that system the 'syscall' instruction does not set
%r11=%rflags and %rcx=%rip.

Handle the FRED case. Now, test that:

  - "syscall" in a FRED system doesn't clobber %rcx and %r11.
  - "syscall" in a non-FRED system sets %rcx=%rip and %r11=%rflags.

The 'raise()' function from libc can't be used to control those
registers. Therefore, create a syscall wrapper in inline Assembly to
fully control them.

Fixes: 660602140103 ("selftests/x86: Add a selftest for SYSRET to noncanonical addresses")
Link: https://lore.kernel.org/lkml/25b96960-a07e-a952-5c23-786b55054126@zytor.com
Reported-by: Xin Li <xin3.li@intel.com>
Co-developed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 tools/testing/selftests/x86/sysret_rip.c | 114 +++++++++++++++++++++--
 1 file changed, 107 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/x86/sysret_rip.c b/tools/testing/selftests/x86/sysret_rip.c
index 84d74be1d90207ab..ef3f492d95f6f2a1 100644
--- a/tools/testing/selftests/x86/sysret_rip.c
+++ b/tools/testing/selftests/x86/sysret_rip.c
@@ -39,6 +39,104 @@ asm (
 extern const char test_page[];
 static void const *current_test_page_addr = test_page;
 
+/* Arbitrary values */
+static const unsigned long r11_sentinel = 0xfeedfacedeadbeef;
+static const unsigned long rcx_sentinel = 0x5ca1ab1e0b57ac1e;
+
+/* An arbitrary *valid* RFLAGS value */
+static const unsigned long rflags_sentinel = 0x200a93;
+
+enum regs_ok {
+	REGS_UNDEFINED	= -1,	/* For consistency checker init, never returned */
+	REGS_SAVED	=  0,	/* Registers properly preserved */
+	REGS_SYSRET	=  1	/* Registers match syscall/sysret */
+};
+
+/*
+ * @rbx should be set to the syscall return %rip.
+ */
+static void check_regs_result(unsigned long r11, unsigned long rcx,
+			      unsigned long rbx)
+{
+	static enum regs_ok regs_ok_state = REGS_UNDEFINED;
+	enum regs_ok ret;
+
+	/*
+	 * REGS_SAVED  = %rcx and %r11 preserved (Intel FRED).
+	 * REGS_SYSRET = %rcx and %r11 set to %rflags and %rip.
+	 */
+	if (r11 == r11_sentinel && rcx == rcx_sentinel) {
+		ret = REGS_SAVED;
+	} else if (r11 == rflags_sentinel && rcx == rbx) {
+		ret = REGS_SYSRET;
+	} else {
+		printf("[FAIL] check_regs_result\n");
+		printf("        r11_sentinel = %#lx; %%r11 = %#lx;\n", r11_sentinel, r11);
+		printf("        rcx_sentinel = %#lx; %%rcx = %#lx;\n", rcx_sentinel, rcx);
+		printf("        rflags_sentinel = %#lx\n", rflags_sentinel);
+		exit(1);
+	}
+
+
+	/*
+	 * Test that we don't get a mix of REGS_SAVED and REGS_SYSRET.
+	 * It needs at least calling check_regs_result() twice to assert.
+	 */
+	if (regs_ok_state == REGS_UNDEFINED) {
+		/*
+		 * First time calling check_regs_result().
+		 */
+		regs_ok_state = ret;
+	} else {
+		assert(regs_ok_state == ret);
+	}
+}
+
+static long do_syscall(long nr_syscall, unsigned long arg1, unsigned long arg2,
+		       unsigned long arg3, unsigned long arg4,
+		       unsigned long arg5, unsigned long arg6)
+{
+	register unsigned long r11 asm("%r11");
+	register unsigned long r10 asm("%r10");
+	register unsigned long r8 asm("%r8");
+	register unsigned long r9 asm("%r9");
+	register void *rsp asm("%rsp");
+	unsigned long rcx, rbx;
+
+	r11 = r11_sentinel;
+	rcx = rcx_sentinel;
+	r10 = arg4;
+	r8 = arg5;
+	r9 = arg6;
+
+	asm volatile (
+		"pushq	%[rflags_sentinel]\n\t"
+		"popf\n\t"
+		"leaq	1f(%%rip), %[rbx]\n\t"
+		"syscall\n"
+		"1:"
+
+		: "+a" (nr_syscall),
+		  "+r" (r11),
+		  "+c" (rcx),
+		  [rbx] "=b" (rbx),
+		  "+r" (rsp)	/* Clobber the redzone */
+
+		: [rflags_sentinel] "g" (rflags_sentinel),
+		  "D" (arg1),	/* %rdi */
+		  "S" (arg2),	/* %rsi */
+		  "d" (arg3),	/* %rdx */
+		  "r" (r10),
+		  "r" (r8),
+		  "r" (r9)
+
+		: "memory"
+	);
+
+	check_regs_result(r11, rcx, rbx);
+	return nr_syscall;
+}
+
 static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *),
 		       int flags)
 {
@@ -88,24 +186,26 @@ static void sigusr1(int sig, siginfo_t *info, void *ctx_void)
 
 	memcpy(&initial_regs, &ctx->uc_mcontext.gregs, sizeof(gregset_t));
 
+	check_regs_result(ctx->uc_mcontext.gregs[REG_R11],
+			  ctx->uc_mcontext.gregs[REG_RCX],
+			  ctx->uc_mcontext.gregs[REG_RBX]);
+
 	/* Set IP and CX to match so that SYSRET can happen. */
 	ctx->uc_mcontext.gregs[REG_RIP] = rip;
 	ctx->uc_mcontext.gregs[REG_RCX] = rip;
-
-	/* R11 and EFLAGS should already match. */
-	assert(ctx->uc_mcontext.gregs[REG_EFL] ==
-	       ctx->uc_mcontext.gregs[REG_R11]);
-
 	sethandler(SIGSEGV, sigsegv_for_sigreturn_test, SA_RESETHAND);
+}
 
-	return;
+static void __raise(int sig)
+{
+	do_syscall(__NR_kill, getpid(), sig, 0, 0, 0, 0);
 }
 
 static void test_sigreturn_to(unsigned long ip)
 {
 	rip = ip;
 	printf("[RUN]\tsigreturn to 0x%lx\n", ip);
-	raise(SIGUSR1);
+	__raise(SIGUSR1);
 }
 
 static jmp_buf jmpbuf;
-- 
Ammar Faizi


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [RFC PATCH v7 2/3] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11`
  2023-01-25 23:24                             ` [RFC PATCH v7 0/3] sysret_rip update for the Intel FRED architecture Ammar Faizi
  2023-01-25 23:24                               ` [RFC PATCH v7 1/3] selftests/x86: sysret_rip: Handle syscall in a FRED system Ammar Faizi
@ 2023-01-25 23:24                               ` Ammar Faizi
  2023-01-25 23:24                               ` [RFC PATCH v7 3/3] selftests/x86: sysret_rip: Test SYSRET with a signal handler Ammar Faizi
  2 siblings, 0 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25 23:24 UTC (permalink / raw)
  To: H. Peter Anvin, Xin Li
  Cc: Dave Hansen, Dave Hansen, Kirill A. Shutemov, Thomas Gleixner,
	Andrew Cooper, Peter Zijlstra, Brian Gerst, Borislav Petkov,
	Shuah Khan, Ingo Molnar, Andy Lutomirski, x86 Mailing List,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

From: Ammar Faizi <ammarfaizi2@gnuweeb.org>

Test that:

  - REGS_SAVED: "syscall" in a FRED system doesn't clobber %rcx and
    %r11.

  - REGS_SYSRET: "syscall" in a non-FRED system sets %rcx=%rip and
    %r11=%rflags.

Test them out with trivial system calls like __NR_getppid and friends
which are extremely likely to return with SYSRET on an IDT system.

Goals of this test:

  - Ensure that the syscall behavior is consistent. It must be either
    always REGS_SAVED or always REGS_SYSRET. Not a mix of them.

  - The kernel doesn't leak its internal data when returning to
    userspace.

Link: https://lore.kernel.org/lkml/25b96960-a07e-a952-5c23-786b55054126@zytor.com
Co-developed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 tools/testing/selftests/x86/sysret_rip.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/x86/sysret_rip.c b/tools/testing/selftests/x86/sysret_rip.c
index ef3f492d95f6f2a1..d688cb9d5ac919eb 100644
--- a/tools/testing/selftests/x86/sysret_rip.c
+++ b/tools/testing/selftests/x86/sysret_rip.c
@@ -258,8 +258,24 @@ static void test_syscall_fallthrough_to(unsigned long ip)
 	printf("[OK]\tWe survived\n");
 }
 
+/*
+ * Ensure that various system calls are consistent.
+ * We must not get a mix of REGS_SAVED and REGS_SYSRET.
+ */
+static void test_syscall_rcx_r11_consistent(void)
+{
+	do_syscall(__NR_getpid, 0, 0, 0, 0, 0, 0);
+	do_syscall(__NR_gettid, 0, 0, 0, 0, 0, 0);
+	do_syscall(__NR_getppid, 0, 0, 0, 0, 0, 0);
+}
+
 int main()
 {
+	int i;
+
+	for (i = 0; i < 32; i++)
+		test_syscall_rcx_r11_consistent();
+
 	/*
 	 * When the kernel returns from a slow-path syscall, it will
 	 * detect whether SYSRET is appropriate.  If it incorrectly
@@ -267,7 +283,7 @@ int main()
 	 * it'll crash on Intel CPUs.
 	 */
 	sethandler(SIGUSR1, sigusr1, 0);
-	for (int i = 47; i < 64; i++)
+	for (i = 47; i < 64; i++)
 		test_sigreturn_to(1UL<<i);
 
 	clearhandler(SIGUSR1);
@@ -278,7 +294,7 @@ int main()
 	test_syscall_fallthrough_to((1UL << 47) - 2*PAGE_SIZE);
 
 	/* These are the interesting cases. */
-	for (int i = 47; i < 64; i++) {
+	for (i = 47; i < 64; i++) {
 		test_syscall_fallthrough_to((1UL<<i) - PAGE_SIZE);
 		test_syscall_fallthrough_to(1UL<<i);
 	}
-- 
Ammar Faizi


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [RFC PATCH v7 3/3] selftests/x86: sysret_rip: Test SYSRET with a signal handler
  2023-01-25 23:24                             ` [RFC PATCH v7 0/3] sysret_rip update for the Intel FRED architecture Ammar Faizi
  2023-01-25 23:24                               ` [RFC PATCH v7 1/3] selftests/x86: sysret_rip: Handle syscall in a FRED system Ammar Faizi
  2023-01-25 23:24                               ` [RFC PATCH v7 2/3] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11` Ammar Faizi
@ 2023-01-25 23:24                               ` Ammar Faizi
  2 siblings, 0 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-25 23:24 UTC (permalink / raw)
  To: H. Peter Anvin, Xin Li
  Cc: Dave Hansen, Dave Hansen, Kirill A. Shutemov, Thomas Gleixner,
	Andrew Cooper, Peter Zijlstra, Brian Gerst, Borislav Petkov,
	Shuah Khan, Ingo Molnar, Andy Lutomirski, x86 Mailing List,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

From: Ammar Faizi <ammarfaizi2@gnuweeb.org>

When run on a non-FRED system, test the SYSRET path. Make sure the
%rcx/%r11 clobbering behavior is consistent.

When run on a FRED system, test that %rcx/%r11 are preserved when
invoking syscall.

This is similar to what test_syscall_rcx_r11_consistent() is doing, but
with addition it's done via the SIGUSR2 signal handler.

Link: https://lore.kernel.org/lkml/8770815f-0f23-d0c5-e56a-d401827842c9@zytor.com
Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---

On Wed, 25 Jan 2023 00:39:26 -0800, "H. Peter Anvin" wrote:
> >    /* Set IP and CX to match so that SYSRET can happen. */
> >     ctx->uc_mcontext.gregs[REG_RIP] = rip;
> >     ctx->uc_mcontext.gregs[REG_RCX] = rip;
> 
> It would be interesting to have the syscall handler try both with and 
> without this (so it would end up doing both IRET and SYSCALL on legacy.) 
> Perhaps SIGUSR1 versus SIGUSR2...

 tools/testing/selftests/x86/sysret_rip.c | 37 ++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/tools/testing/selftests/x86/sysret_rip.c b/tools/testing/selftests/x86/sysret_rip.c
index d688cb9d5ac919eb..630ee261559c14b1 100644
--- a/tools/testing/selftests/x86/sysret_rip.c
+++ b/tools/testing/selftests/x86/sysret_rip.c
@@ -269,6 +269,28 @@ static void test_syscall_rcx_r11_consistent(void)
 	do_syscall(__NR_getppid, 0, 0, 0, 0, 0, 0);
 }
 
+static unsigned long usr2_rcx;
+static unsigned long usr2_r11;
+
+static void sigusr2(int sig, siginfo_t *info, void *ctx_void)
+{
+	ucontext_t *ctx = (ucontext_t*)ctx_void;
+
+	usr2_r11 = ctx->uc_mcontext.gregs[REG_R11];
+	usr2_rcx = ctx->uc_mcontext.gregs[REG_RCX];
+
+	check_regs_result(ctx->uc_mcontext.gregs[REG_R11],
+			  ctx->uc_mcontext.gregs[REG_RCX],
+			  ctx->uc_mcontext.gregs[REG_RBX]);
+}
+
+static void test_sysret_consistent(void)
+{
+	printf("[RUN]\ttest_sysret_consistent\n");
+	__raise(SIGUSR2);
+	printf("[OK]\tRCX = %#lx;  R11 = %#lx\n", usr2_rcx, usr2_r11);
+}
+
 int main()
 {
 	int i;
@@ -286,6 +308,21 @@ int main()
 	for (i = 47; i < 64; i++)
 		test_sigreturn_to(1UL<<i);
 
+	/*
+	 *
+	 * When run on a non-FRED system, test the SYSRET path. Make
+	 * sure the %rcx/%r11 clobbering behavior is consistent.
+	 *
+	 * When run on a FRED system, test that %rcx/%r11 are preserved
+	 * when invoking syscall.
+	 *
+	 * This is similar to test_syscall_rcx_r11_consistent(), but via
+	 * a signal handler.
+	 */
+	sethandler(SIGUSR2, sigusr2, 0);
+	for (i = 0; i < 32; i++)
+		test_sysret_consistent();
+
 	clearhandler(SIGUSR1);
 
 	sethandler(SIGSEGV, sigsegv_for_fallthrough, 0);
-- 
Ammar Faizi


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v1 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-01-24  1:40                       ` H. Peter Anvin
  2023-01-24  2:31                         ` Ammar Faizi
@ 2023-01-26 20:08                         ` Ammar Faizi
  2023-02-15  9:17                           ` Andrew Cooper
  2023-01-26 20:16                         ` Ammar Faizi
  2 siblings, 1 reply; 88+ messages in thread
From: Ammar Faizi @ 2023-01-26 20:08 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Dave Hansen, Dave Hansen, Kirill A. Shutemov, Thomas Gleixner,
	Andrew Cooper, Peter Zijlstra, Brian Gerst, Borislav Petkov,
	Shuah Khan, Ingo Molnar, Andy Lutomirski, x86 Mailing List,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

On Mon, Jan 23, 2023 at 05:40:23PM -0800, H. Peter Anvin wrote:
> So as per Andrew's comment, add:
> 
> register void * rsp asm("%rsp");
> 
> ...
> 
> "+r" (rsp)	/* clobber the redzone */
> 
> ... as the right way to avoid redzone problems.

I played with this more. I found something wrong with this. This doesn't
work for me. The compiler still uses red zone despite I use "+r" (rsp).

What did I do wrong?

-----

ammarfaizi2@integral2:/tmp$ gcc -fno-stack-protector -O2 -Wall -Wextra test.c -o test
ammarfaizi2@integral2:/tmp$ objdump --no-show-raw-insn -d test | grep "a_leaf_func_with_red_zone>:" -A8
0000000000001180 <a_leaf_func_with_red_zone>:
    1180:  endbr64 
    1184:  mov    $0x1,%eax
    1189:  mov    %rax,-0x8(%rsp)   ## BUG!!!
    118e:  pushf  
    118f:  pop    %rax
    1190:  mov    -0x8(%rsp),%rax   ## BUG!!!
    1195:  ret


ammarfaizi2@integral2:/tmp$ clang -O2 -Wall -Wextra test.c -o test
ammarfaizi2@integral2:/tmp$ objdump --no-show-raw-insn -d test | grep "a_leaf_func_with_red_zone>:" -A6
0000000000001140 <a_leaf_func_with_red_zone>:
    1140:  mov    $0x1,%eax
    1145:  mov    %rax,-0x8(%rsp)   ## BUG!!!
    114a:  pushf  
    114b:  pop    %rax
    114c:  mov    -0x8(%rsp),%rax   ## BUG!!!
    1151:  ret


-----
ammarfaizi2@integral2:~$ gcc --version
gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

ammarfaizi2@integral2:~$ clang --version
Ubuntu clang version 16.0.0 (++20230124031324+d63e492562f2-1~exp1~20230124151444.705)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin


-----
test.c:

#include <stdio.h>
static inline void clobber_redzone(void)
{
        register void *rsp __asm__("%rsp");
        unsigned long rflags;

        __asm__ volatile ("pushf; popq %1"
                          : "+r" (rsp), "=r" (rflags));
}

static inline void set_red_zone(long *mem, long val)
{
        __asm__ volatile ("movq %[val], %[mem]"
                           : [mem] "=m" (*mem)
                           : [val] "r" (val));
}

static inline long get_red_zone(long *mem)
{
        long ret;

        __asm__ volatile ("movq %[in], %[out]"
                           : [out] "=r" (ret)
                           : [in] "m" (*mem));
        return ret;
}

__attribute__((__noinline__))
long a_leaf_func_with_red_zone(void)
{
        long x;

        set_red_zone(&x, 1);
        clobber_redzone();
        /* The correct retval is 1 */
        return get_red_zone(&x);
}

int main(void)
{
        printf("ret = %ld\n", a_leaf_func_with_red_zone());
        return 0;
}

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v1 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-01-24  1:40                       ` H. Peter Anvin
  2023-01-24  2:31                         ` Ammar Faizi
  2023-01-26 20:08                         ` Ammar Faizi
@ 2023-01-26 20:16                         ` Ammar Faizi
  2 siblings, 0 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-01-26 20:16 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Xin Li, Dave Hansen, Dave Hansen, Kirill A. Shutemov,
	Thomas Gleixner, Andrew Cooper, Peter Zijlstra, Brian Gerst,
	Borislav Petkov, Shuah Khan, Ingo Molnar, Andy Lutomirski,
	x86 Mailing List, Linux Kselftest Mailing List,
	Linux Kernel Mailing List


[ Resending, missed Xin Li from the Cc list... ]

On Mon, Jan 23, 2023 at 05:40:23PM -0800, H. Peter Anvin wrote:
> So as per Andrew's comment, add:
> 
> register void * rsp asm("%rsp");
> 
> ...
> 
> "+r" (rsp)	/* clobber the redzone */
> 
> ... as the right way to avoid redzone problems.

I played with this more. I found something wrong with this. This doesn't
work for me. The compiler still uses red zone despite I use "+r" (rsp).

What did I do wrong?

-----

ammarfaizi2@integral2:/tmp$ gcc -fno-stack-protector -O2 -Wall -Wextra test.c -o test
ammarfaizi2@integral2:/tmp$ objdump --no-show-raw-insn -d test | grep "a_leaf_func_with_red_zone>:" -A8
0000000000001180 <a_leaf_func_with_red_zone>:
    1180:  endbr64 
    1184:  mov    $0x1,%eax
    1189:  mov    %rax,-0x8(%rsp)   ## BUG!!!
    118e:  pushf  
    118f:  pop    %rax
    1190:  mov    -0x8(%rsp),%rax   ## BUG!!!
    1195:  ret


ammarfaizi2@integral2:/tmp$ clang -O2 -Wall -Wextra test.c -o test
ammarfaizi2@integral2:/tmp$ objdump --no-show-raw-insn -d test | grep "a_leaf_func_with_red_zone>:" -A6
0000000000001140 <a_leaf_func_with_red_zone>:
    1140:  mov    $0x1,%eax
    1145:  mov    %rax,-0x8(%rsp)   ## BUG!!!
    114a:  pushf  
    114b:  pop    %rax
    114c:  mov    -0x8(%rsp),%rax   ## BUG!!!
    1151:  ret


-----
ammarfaizi2@integral2:~$ gcc --version
gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

ammarfaizi2@integral2:~$ clang --version
Ubuntu clang version 16.0.0 (++20230124031324+d63e492562f2-1~exp1~20230124151444.705)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin


-----
test.c:

#include <stdio.h>
static inline void clobber_redzone(void)
{
        register void *rsp __asm__("%rsp");
        unsigned long rflags;

        __asm__ volatile ("pushf; popq %1"
                          : "+r" (rsp), "=r" (rflags));
}

static inline void set_red_zone(long *mem, long val)
{
        __asm__ volatile ("movq %[val], %[mem]"
                           : [mem] "=m" (*mem)
                           : [val] "r" (val));
}

static inline long get_red_zone(long *mem)
{
        long ret;

        __asm__ volatile ("movq %[in], %[out]"
                           : [out] "=r" (ret)
                           : [in] "m" (*mem));
        return ret;
}

__attribute__((__noinline__))
long a_leaf_func_with_red_zone(void)
{
        long x;

        set_red_zone(&x, 1);
        clobber_redzone();
        /* The correct retval is 1 */
        return get_red_zone(&x);
}

int main(void)
{
        printf("ret = %ld\n", a_leaf_func_with_red_zone());
        return 0;
}

-- 
Ammar Faizi



^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [RFC PATCH v5 0/2] sysret_rip update for the Intel FRED architecture
  2023-01-25 17:48                                         ` Li, Xin3
@ 2023-02-15  7:42                                           ` Li, Xin3
  2023-02-15  7:51                                             ` Ammar Faizi
  2023-02-18  4:27                                             ` Ammar Faizi
  0 siblings, 2 replies; 88+ messages in thread
From: Li, Xin3 @ 2023-02-15  7:42 UTC (permalink / raw)
  To: Li, Xin3, Ammar Faizi, H. Peter Anvin
  Cc: Hansen, Dave, Dave Hansen, Thomas Gleixner, andrew.cooper3,
	Brian Gerst, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Shuah Khan, Ingo Molnar, Lutomirski, Andy, x86 Mailing List,
	Kirill A. Shutemov, Linux Kselftest Mailing List,
	Linux Kernel Mailing List


> > > > Would it be better to get this patch set merged first?
> > > >
> > > > Otherwise surely I will include it in the FRED patch set.
> > >
> > > If the maintainers are ok with it, it would be better to merge it
> > > sooner: once we have agreed on the semantics, which I believe we
> > > have, we should be testing those semantics and nothing else.
> >
> > OK, let's keep this patchset separated from the FRED support patchset.
> 
> Thanks!
> 
> This patch set first makes the R11/RCX semantics clearer, and it BTW fixes FRED
> tests.
> 
> To me it's more of an improvement to the existing code.

Hi Faizi,

Any update on this patch set?

Thanks!
Xin


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v5 0/2] sysret_rip update for the Intel FRED architecture
  2023-02-15  7:42                                           ` Li, Xin3
@ 2023-02-15  7:51                                             ` Ammar Faizi
  2023-02-18  4:27                                             ` Ammar Faizi
  1 sibling, 0 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-02-15  7:51 UTC (permalink / raw)
  To: Xin Li
  Cc: H. Peter Anvin, Hansen, Dave, Dave Hansen, Thomas Gleixner,
	andrew.cooper3, Brian Gerst, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Shuah Khan, Ingo Molnar, Lutomirski, Andy,
	x86 Mailing List, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

On Wed, Feb 15, 2023 at 07:42:47AM +0000, Li, Xin3 wrote:
> Hi Faizi,
> 
> Any update on this patch set?

Xin,

Before I send the next version, I need an answer for this one:
https://lore.kernel.org/lkml/Y9LfmQ%2Fr1%2FpEP+uv@biznet-home.integral.gnuweeb.org/

I don't think the redzone problem is handled correctly here. Using
"+r" (rsp) constraint doesn't solve the redzone problem.

HPA, Andrew, anybody?

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v1 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-01-26 20:08                         ` Ammar Faizi
@ 2023-02-15  9:17                           ` Andrew Cooper
  2023-02-15 10:29                             ` Andrew Cooper
  2023-02-15 10:42                             ` Ammar Faizi
  0 siblings, 2 replies; 88+ messages in thread
From: Andrew Cooper @ 2023-02-15  9:17 UTC (permalink / raw)
  To: Ammar Faizi, H. Peter Anvin
  Cc: Dave Hansen, Dave Hansen, Kirill A. Shutemov, Thomas Gleixner,
	Peter Zijlstra, Brian Gerst, Borislav Petkov, Shuah Khan,
	Ingo Molnar, Andy Lutomirski, x86 Mailing List,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

On 26/01/2023 8:08 pm, Ammar Faizi wrote:
> On Mon, Jan 23, 2023 at 05:40:23PM -0800, H. Peter Anvin wrote:
>> So as per Andrew's comment, add:
>>
>> register void * rsp asm("%rsp");
>>
>> ...
>>
>> "+r" (rsp)	/* clobber the redzone */
>>
>> ... as the right way to avoid redzone problems.
> I played with this more. I found something wrong with this. This doesn't
> work for me. The compiler still uses red zone despite I use "+r" (rsp).
>
> What did I do wrong?

Well this is a fine mess...

https://godbolt.org/z/MaPM7s8qr does the right thing, but is now
contrary to the prior discussion regarding calls in asm, which concluded
that the "+r"(rsp) was the way to go.

Furthermore GCC regressed in 9.0 and emits:

  warning: listing the stack pointer register 'rsp' in a clobber list is
deprecated [-Wdeprecated]

which might be the intention of the developers, but is wrong seeing as
this is the only way to say "I modify the redzone" to the compiler...

~Andrew

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v1 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-02-15  9:17                           ` Andrew Cooper
@ 2023-02-15 10:29                             ` Andrew Cooper
  2023-02-15 10:44                               ` Ammar Faizi
  2023-02-15 10:42                             ` Ammar Faizi
  1 sibling, 1 reply; 88+ messages in thread
From: Andrew Cooper @ 2023-02-15 10:29 UTC (permalink / raw)
  To: Ammar Faizi, H. Peter Anvin
  Cc: Dave Hansen, Dave Hansen, Kirill A. Shutemov, Thomas Gleixner,
	Peter Zijlstra, Brian Gerst, Borislav Petkov, Shuah Khan,
	Ingo Molnar, Andy Lutomirski, x86 Mailing List,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

On 15/02/2023 9:17 am, Andrew Cooper wrote:
> On 26/01/2023 8:08 pm, Ammar Faizi wrote:
>> On Mon, Jan 23, 2023 at 05:40:23PM -0800, H. Peter Anvin wrote:
>>> So as per Andrew's comment, add:
>>>
>>> register void * rsp asm("%rsp");
>>>
>>> ...
>>>
>>> "+r" (rsp)	/* clobber the redzone */
>>>
>>> ... as the right way to avoid redzone problems.
>> I played with this more. I found something wrong with this. This doesn't
>> work for me. The compiler still uses red zone despite I use "+r" (rsp).
>>
>> What did I do wrong?
> Well this is a fine mess...
>
> https://godbolt.org/z/MaPM7s8qr does the right thing, but is now
> contrary to the prior discussion regarding calls in asm, which concluded
> that the "+r"(rsp) was the way to go.
>
> Furthermore GCC regressed in 9.0 and emits:
>
>   warning: listing the stack pointer register 'rsp' in a clobber list is
> deprecated [-Wdeprecated]
>
> which might be the intention of the developers, but is wrong seeing as
> this is the only way to say "I modify the redzone" to the compiler...

I've opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108799

~Andrew

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v1 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-02-15  9:17                           ` Andrew Cooper
  2023-02-15 10:29                             ` Andrew Cooper
@ 2023-02-15 10:42                             ` Ammar Faizi
  1 sibling, 0 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-02-15 10:42 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Xin Li, H. Peter Anvin, Dave Hansen, Dave Hansen,
	Kirill A. Shutemov, Thomas Gleixner, Peter Zijlstra, Brian Gerst,
	Borislav Petkov, Shuah Khan, Ingo Molnar, Andy Lutomirski,
	x86 Mailing List, Linux Kselftest Mailing List,
	Linux Kernel Mailing List

On Wed, Feb 15, 2023 at 09:17:23AM +0000, Andrew Cooper wrote:
> On 26/01/2023 8:08 pm, Ammar Faizi wrote:
> > What did I do wrong?
> 
> Well this is a fine mess...
> 
> https://godbolt.org/z/MaPM7s8qr does the right thing, but is now
> contrary to the prior discussion regarding calls in asm, which concluded
> that the "+r"(rsp) was the way to go.

Does that also mean the ASM_CALL_CONSTRAINT macro in
arch/x86/include/asm/asm.h macro is wrong?

That macro adds a "+r"(rsp) constraint, and we assume it's safe to
execute the "call" instruction with that constraint in an inline
Assembly.

I am not sure what "+r" (rsp) actually does. And if we are now
complaining, "+r" (rsp) doesn't work. Since when it works? Or at least,
where is that rule written?  I couldn't find any GCC or Clang version
that does it right with the "+r" (rsp) constraint (from a quick playing
with that godbolt link).

> Furthermore GCC regressed in 9.0 and emits:
> 
>   warning: listing the stack pointer register 'rsp' in a clobber list is
> deprecated [-Wdeprecated]
> 
> which might be the intention of the developers, but is wrong seeing as
> this is the only way to say "I modify the redzone" to the compiler...

Yeah, adding "rsp" to the clobber list works. But sadly, it's deprecated
in GCC. Not sure what the reason is.

I think the most straightforward and safest way, for now, is: "Don't
clobber the red zone from the inline asm.".

I will use the previous approach to avoid red-zone clobbering in the
next revision. That's by adding "r12" to the clobber list and preserving
the red zone content in "%r12". 

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v1 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system
  2023-02-15 10:29                             ` Andrew Cooper
@ 2023-02-15 10:44                               ` Ammar Faizi
  0 siblings, 0 replies; 88+ messages in thread
From: Ammar Faizi @ 2023-02-15 10:44 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Xin Li, H. Peter Anvin, Dave Hansen, Dave Hansen,
	Kirill A. Shutemov, Thomas Gleixner, Peter Zijlstra, Brian Gerst,
	Borislav Petkov, Shuah Khan, Ingo Molnar, Andy Lutomirski,
	x86 Mailing List, Linux Kselftest Mailing List,
	Linux Kernel Mailing List

On Wed, Feb 15, 2023 at 10:29:37AM +0000, Andrew Cooper wrote:
> I've opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108799

Added myself to the CC list. Thanks for opening.

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v5 0/2] sysret_rip update for the Intel FRED architecture
  2023-02-15  7:42                                           ` Li, Xin3
  2023-02-15  7:51                                             ` Ammar Faizi
@ 2023-02-18  4:27                                             ` Ammar Faizi
  2023-02-18  4:51                                               ` H. Peter Anvin
  1 sibling, 1 reply; 88+ messages in thread
From: Ammar Faizi @ 2023-02-18  4:27 UTC (permalink / raw)
  To: Xin Li
  Cc: H. Peter Anvin, Dave Hansen, Dave Hansen, Thomas Gleixner,
	Andrew Cooper, Brian Gerst, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Shuah Khan, Ingo Molnar, Andy Lutomirski,
	x86 Mailing List, Kirill A. Shutemov,
	Linux Kselftest Mailing List, Linux Kernel Mailing List

On Wed, Feb 15, 2023 at 07:42:47AM +0000, Li, Xin3 wrote:
> Hi Faizi,
> 
> Any update on this patch set?

No comment from HPA. But after the recent discussion with Andrew, I
think at least it's now clear that we are not going to use "+r"(rsp) to
avoid the red zone problem.

I am on leave today. Will send revision v8 on Monday.

Thanks,

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC PATCH v5 0/2] sysret_rip update for the Intel FRED architecture
  2023-02-18  4:27                                             ` Ammar Faizi
@ 2023-02-18  4:51                                               ` H. Peter Anvin
  0 siblings, 0 replies; 88+ messages in thread
From: H. Peter Anvin @ 2023-02-18  4:51 UTC (permalink / raw)
  To: Ammar Faizi, Xin Li
  Cc: Dave Hansen, Dave Hansen, Thomas Gleixner, Andrew Cooper,
	Brian Gerst, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Shuah Khan, Ingo Molnar, Andy Lutomirski, x86 Mailing List,
	Kirill A. Shutemov, Linux Kselftest Mailing List,
	Linux Kernel Mailing List

On February 17, 2023 8:27:40 PM PST, Ammar Faizi <ammarfaizi2@gnuweeb.org> wrote:
>On Wed, Feb 15, 2023 at 07:42:47AM +0000, Li, Xin3 wrote:
>> Hi Faizi,
>> 
>> Any update on this patch set?
>
>No comment from HPA. But after the recent discussion with Andrew, I
>think at least it's now clear that we are not going to use "+r"(rsp) to
>avoid the red zone problem.
>
>I am on leave today. Will send revision v8 on Monday.
>
>Thanks,
>

My apologies, I missed your latest response in the torrent of email. The redzone issue is weird; it ought to be breaking all over the place, not just this.

Let me take a quick look at it...

^ permalink raw reply	[flat|nested] 88+ messages in thread

end of thread, other threads:[~2023-02-18  4:54 UTC | newest]

Thread overview: 88+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <SA1PR11MB6734FA9139B9C9F6CC2ED123A8C59@SA1PR11MB6734.namprd11.prod.outlook.com>
2023-01-20 17:45 ` the x86 sysret_rip test fails on the Intel FRED architecture Dave Hansen
     [not found]   ` <eb81f7f2-d266-d999-b41a-e6eae086e731@citrix.com>
2023-01-20 20:50     ` H. Peter Anvin
2023-01-20 21:10       ` Andrew Cooper
2023-01-20 21:17         ` H. Peter Anvin
2023-01-20 21:29           ` Andrew Cooper
2023-01-21  4:59   ` H. Peter Anvin
2023-01-21 16:46     ` Dave Hansen
2023-01-21 21:47       ` Brian Gerst
2023-01-22  3:01         ` Li, Xin3
2023-01-22  3:28           ` H. Peter Anvin
2023-01-22  3:38     ` Li, Xin3
2023-01-22  4:34       ` Dave Hansen
2023-01-22  4:44         ` H. Peter Anvin
2023-01-22  8:22           ` Li, Xin3
2023-01-22  8:54             ` Ammar Faizi
2023-01-22  9:40               ` H. Peter Anvin
2023-01-22 23:45         ` H. Peter Anvin
2023-01-23  9:02           ` Ammar Faizi
2023-01-23 19:43             ` H. Peter Anvin
2023-01-23 23:43               ` Ammar Faizi
2023-01-23 23:58                 ` H. Peter Anvin
2023-01-24  0:26                   ` [RFC PATCH v1 0/2] selftests/x86: sysret_rip update for FRED system Ammar Faizi
2023-01-24  0:26                     ` [RFC PATCH v1 1/2] selftests/x86: sysret_rip: Handle syscall in a " Ammar Faizi
2023-01-24  1:40                       ` H. Peter Anvin
2023-01-24  2:31                         ` Ammar Faizi
2023-01-26 20:08                         ` Ammar Faizi
2023-02-15  9:17                           ` Andrew Cooper
2023-02-15 10:29                             ` Andrew Cooper
2023-02-15 10:44                               ` Ammar Faizi
2023-02-15 10:42                             ` Ammar Faizi
2023-01-26 20:16                         ` Ammar Faizi
2023-01-24  0:26                     ` [RFC PATCH v1 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11` Ammar Faizi
2023-01-23 23:53             ` the x86 sysret_rip test fails on the Intel FRED architecture Andrew Cooper
2023-01-24  0:01               ` H. Peter Anvin
2023-01-24  2:27                 ` [RFC PATCH v2 0/2] selftests/x86: sysret_rip update for FRED system Ammar Faizi
2023-01-24  2:27                   ` [RFC PATCH v2 1/2] selftests/x86: sysret_rip: Handle syscall in a " Ammar Faizi
2023-01-24  5:44                     ` H. Peter Anvin
2023-01-24  2:27                   ` [RFC PATCH v2 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11` Ammar Faizi
2023-01-24  6:16                     ` H. Peter Anvin
2023-01-24  6:41                       ` Ammar Faizi
2023-01-24  6:47                         ` Ammar Faizi
2023-01-24  9:07                         ` H. Peter Anvin
2023-01-24  9:12                           ` Ammar Faizi
2023-01-24 10:09                             ` [RFC PATCH v3 0/2] selftests/x86: sysret_rip update for FRED system Ammar Faizi
2023-01-24 10:09                               ` [RFC PATCH v3 1/2] selftests/x86: sysret_rip: Handle syscall in a " Ammar Faizi
2023-01-24 10:09                               ` [RFC PATCH v3 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11` Ammar Faizi
2023-01-24 20:59                                 ` H. Peter Anvin
2023-01-25  3:29                                   ` Ammar Faizi
2023-01-24 21:32                               ` [RFC PATCH v3 0/2] selftests/x86: sysret_rip update for FRED system Li, Xin3
2023-01-24 21:37                                 ` H. Peter Anvin
2023-01-24 23:20                                   ` Li, Xin3
2023-01-25  3:27                                   ` Ammar Faizi
2023-01-24 21:51                                 ` Andrew Cooper
2023-01-24 23:58                                   ` Li, Xin3
2023-01-25  3:22                             ` [RFC PATCH v4 0/2] sysret_rip update for the Intel FRED architecture Ammar Faizi
2023-01-25  3:22                               ` [RFC PATCH v4 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system Ammar Faizi
2023-01-25  3:37                                 ` Ammar Faizi
2023-01-25  3:44                                   ` Ammar Faizi
2023-01-25  3:22                               ` [RFC PATCH v4 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11` Ammar Faizi
2023-01-25  3:49                             ` [RFC PATCH v5 0/2] sysret_rip update for the Intel FRED architecture Ammar Faizi
2023-01-25  3:49                               ` [RFC PATCH v5 1/2] selftests/x86: sysret_rip: Handle syscall in a FRED system Ammar Faizi
2023-01-25  8:39                                 ` H. Peter Anvin
2023-01-25  8:53                                   ` Ammar Faizi
2023-01-25  9:57                                   ` Ammar Faizi
2023-01-25 10:01                                     ` Ammar Faizi
2023-01-25 10:17                                     ` H. Peter Anvin
2023-01-25 11:37                                       ` Ammar Faizi
2023-01-25 17:25                                         ` H. Peter Anvin
2023-01-25  3:49                               ` [RFC PATCH v5 2/2] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11` Ammar Faizi
2023-01-25  8:22                               ` [RFC PATCH v5 0/2] sysret_rip update for the Intel FRED architecture Li, Xin3
2023-01-25  8:32                                 ` Ammar Faizi
2023-01-25 17:07                                   ` Li, Xin3
2023-01-25 17:24                                     ` H. Peter Anvin
2023-01-25 17:41                                       ` Ammar Faizi
2023-01-25 17:48                                         ` Li, Xin3
2023-02-15  7:42                                           ` Li, Xin3
2023-02-15  7:51                                             ` Ammar Faizi
2023-02-18  4:27                                             ` Ammar Faizi
2023-02-18  4:51                                               ` H. Peter Anvin
2023-01-25 21:17                             ` [RFC PATCH v6 0/3] " Ammar Faizi
2023-01-25 21:17                               ` [RFC PATCH v6 1/3] selftests/x86: sysret_rip: Handle syscall in a FRED system Ammar Faizi
2023-01-25 23:01                                 ` Ammar Faizi
2023-01-25 21:17                               ` [RFC PATCH v6 2/3] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11` Ammar Faizi
2023-01-25 21:17                               ` [RFC PATCH v6 3/3] selftests/x86: sysret_rip: Test opportunistic SYSRET Ammar Faizi
2023-01-25 23:24                             ` [RFC PATCH v7 0/3] sysret_rip update for the Intel FRED architecture Ammar Faizi
2023-01-25 23:24                               ` [RFC PATCH v7 1/3] selftests/x86: sysret_rip: Handle syscall in a FRED system Ammar Faizi
2023-01-25 23:24                               ` [RFC PATCH v7 2/3] selftests/x86: sysret_rip: Add more syscall tests with respect to `%rcx` and `%r11` Ammar Faizi
2023-01-25 23:24                               ` [RFC PATCH v7 3/3] selftests/x86: sysret_rip: Test SYSRET with a signal handler Ammar Faizi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).