linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Christian Brauner <brauner@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	paulmck <paulmck@kernel.org>, Boqun Feng <boqun.feng@gmail.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Paul Turner <pjt@google.com>,
	linux-api <linux-api@vger.kernel.org>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Florian Weimer <fw@deneb.enyo.de>,
	David Laight <David.Laight@ACULAB.COM>,
	carlos <carlos@redhat.com>
Subject: Re: [RFC PATCH v2 1/2] rseq: x86: implement abort-at-ip extension
Date: Tue, 11 Jan 2022 12:43:05 -0500 (EST)	[thread overview]
Message-ID: <1626924888.21447.1641922985771.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <20220111110556.inteixgtl5vpmka7@wittgenstein>

----- On Jan 11, 2022, at 6:05 AM, Christian Brauner brauner@kernel.org wrote:

> On Mon, Jan 10, 2022 at 12:16:10PM -0500, Mathieu Desnoyers wrote:
>> Allow rseq critical section abort handlers to optionally figure out at
>> which instruction pointer the rseq critical section was aborted.
>> 
>> This allows implementing rseq critical sections with loops containing
>> the commit instruction, for which having the commit as last instruction
>> of the sequence is not possible.  This is useful to implement adaptative
>> mutexes aware of preemption in user-space. (see [1])
>> 
>> This also allows implementing rseq critical sections with multiple
>> commit steps, and use the abort-at-ip information to figure out what
>> needs to be undone in the abort handler.
>> 
>> Introduce the RSEQ_FLAG_QUERY_ABORT_AT_IP rseq system call flag.  This
>> lets userspace query whether the kernel and architecture supports the
>> abort-at-ip rseq extension.
>> 
>> Only provide this information for rseq critical sections for which the
>> rseq_cs descriptor has the RSEQ_CS_FLAG_ABORT_AT_IP flag set.  Abort
>> handlers for critical sections with this flag set need to readjust the
>> stack pointer.  The abort-at-ip pointer is populated by the kernel on
>> the top of stack on abort.  For x86-32, the abort handler of an
>> abort-at-ip critical section needs to add 4 bytes to the stack pointer.
>> For x86-64, the abort hanler needs to add 136 bytes to the stack
>> pointer: 8 bytes to skip the abort-at-ip value, and 128 bytes to skip
>> the x86-64 redzone for leaf functions.
>> 
>> [1]
>> https://github.com/compudj/rseq-test/blob/adapt-lock-abort-at-ip/test-rseq-adaptative-lock.c#L80
>> 
>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> ---
>> Changes since v1:
>> - Use top of stack to place abort-at-ip value rather than ecx/rcx
>>   register,
>> - Skip redzone on x86-64.
>> ---
>>  arch/x86/include/asm/ptrace.h |  5 +++++
>>  arch/x86/kernel/ptrace.c      | 12 ++++++++++++
>>  include/uapi/linux/rseq.h     |  4 ++++
>>  kernel/rseq.c                 | 28 ++++++++++++++++++++++++++++
>>  4 files changed, 49 insertions(+)
>> 
>> diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
>> index 703663175a5a..c96eb2448110 100644
>> --- a/arch/x86/include/asm/ptrace.h
>> +++ b/arch/x86/include/asm/ptrace.h
>> @@ -387,5 +387,10 @@ extern int do_set_thread_area(struct task_struct *p, int
>> idx,
>>  # define do_set_thread_area_64(p, s, t)	(0)
>>  #endif
>>  
>> +#ifdef CONFIG_RSEQ
>> +# define RSEQ_ARCH_HAS_ABORT_AT_IP
>> +int rseq_abort_at_ip(struct pt_regs *regs, unsigned long ip);
>> +#endif
>> +
>>  #endif /* !__ASSEMBLY__ */
>>  #endif /* _ASM_X86_PTRACE_H */
>> diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
>> index 6d2244c94799..561ed98d12ba 100644
>> --- a/arch/x86/kernel/ptrace.c
>> +++ b/arch/x86/kernel/ptrace.c
>> @@ -1368,3 +1368,15 @@ void user_single_step_report(struct pt_regs *regs)
>>  {
>>  	send_sigtrap(regs, 0, TRAP_BRKPT);
>>  }
>> +
>> +int rseq_abort_at_ip(struct pt_regs *regs, unsigned long ip)
>> +{
>> +	if (user_64bit_mode(regs)) {
>> +		/* Need to skip redzone for leaf functions. */
>> +		regs->sp -= sizeof(u64) + 128;
>> +		return put_user(ip, (u64 __user *)regs->sp);
>> +	} else {
>> +		regs->sp -= sizeof(u32);
>> +		return put_user(ip, (u32 __user *)regs->sp);
>> +	}
> 
> I think it would be really helpful if you added the full explanation for
> sizeof(u64) + 128 or -sizeof(u32) into this codepath or provide
> constants. For folks not familiar with stuff like this it'll look like
> magic numbers. :)

Good point, here is the planned update:

int rseq_abort_at_ip(struct pt_regs *regs, unsigned long ip)
{
        if (user_64bit_mode(regs)) {
                /*
                 * rseq abort-at-ip x86-64 ABI: stack pointer is decremented to
                 * skip the redzone (128 bytes on x86-64), and decremented of
                 * the pointer size (8 bytes).  The aborted address (abort-at-ip)
                 * is stored at this updated stack pointer location (top of stack).
                 *
                 * Skipping the redzone is needed to make sure not to corrupt
                 * stack data when the rseq critical section is within a leaf
                 * function.
                 */
                regs->sp -= sizeof(u64) + 128;
                return put_user(ip, (u64 __user *)regs->sp);
        } else {
                /*
                 * rseq abort-at-ip x86-32 ABI: stack pointer is decremented of
                 * the pointer size (4 bytes).  The aborted address (abort-at-ip)
                 * is stored at this updated stack pointer location (top of stack).
                 */
                regs->sp -= sizeof(u32);
                return put_user(ip, (u32 __user *)regs->sp);
        }
}



> 
>> +}
>> diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h
>> index 9a402fdb60e9..3130232e6d0c 100644
>> --- a/include/uapi/linux/rseq.h
>> +++ b/include/uapi/linux/rseq.h
>> @@ -20,12 +20,14 @@ enum rseq_cpu_id_state {
>>  
>>  enum rseq_flags {
>>  	RSEQ_FLAG_UNREGISTER = (1 << 0),
>> +	RSEQ_FLAG_QUERY_ABORT_AT_IP = (1 << 1),
>>  };
>>  
>>  enum rseq_cs_flags_bit {
>>  	RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT_BIT	= 0,
>>  	RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_BIT	= 1,
>>  	RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT	= 2,
>> +	RSEQ_CS_FLAG_ABORT_AT_IP_BIT		= 3,
>>  };
>>  
>>  enum rseq_cs_flags {
>> @@ -35,6 +37,8 @@ enum rseq_cs_flags {
>>  		(1U << RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_BIT),
>>  	RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE	=
>>  		(1U << RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT),
>> +	RSEQ_CS_FLAG_ABORT_AT_IP		=
>> +		(1U << RSEQ_CS_FLAG_ABORT_AT_IP_BIT),
>>  };
>>  
>>  /*
>> diff --git a/kernel/rseq.c b/kernel/rseq.c
>> index 6d45ac3dae7f..fb52f2d11b56 100644
>> --- a/kernel/rseq.c
>> +++ b/kernel/rseq.c
>> @@ -21,6 +21,13 @@
>>  #define RSEQ_CS_PREEMPT_MIGRATE_FLAGS (RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE | \
>>  				       RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT)
>>  
>> +#ifdef RSEQ_ARCH_HAS_ABORT_AT_IP
>> +static bool rseq_has_abort_at_ip(void) { return true; }
>> +#else
>> +static bool rseq_has_abort_at_ip(void) { return false; }
>> +static int rseq_abort_at_ip(struct pt_regs *regs, unsigned long ip) { return 0;
>> }
>> +#endif
>> +
>>  /*
>>   *
>>   * Restartable sequences are a lightweight interface that allows
>> @@ -79,6 +86,16 @@
>>   *
>>   *       [abort_ip]
>>   *   F1. <failure>
>> + *
>> + * rseq critical sections defined with the RSEQ_CS_FLAG_ABORT_AT_IP flag
>> + * have the following behavior on abort: when the stack grows down: the
>> + * stack pointer is decremented to skip the redzone, and decremented of
>> + * the pointer size.  The aborted address (abort-at-ip) is stored at
>> + * this stack pointer location.  The user-space abort handler needs to
>> + * pop the abort-at-ip address from the stack, and add the redzone size
>> + * to the stack pointer.
>> + *
>> + * TODO: describe stack grows up.
> 
> Is this intentional or did you forget? :)

Since I did not implement abort-at-ip on stack-grows-up architectures, I felt
it would be too early to describe the algorithm. I can simply remove the TODO
altogether and we'll take care of it when we get there ? If I had to try to
wordsmith it, it would look like e.g.:

 *                                    [...] When the stack grows up: the
 * stack pointer is incremented to skip the redzone, and incremented of
 * the pointer size.  The aborted address (abort-at-ip) is stored immediately
 * under this stack pointer location.  The user-space abort handler needs to
 * pop the abort-at-ip address from the stack, and subtract the redzone size
 * from the stack pointer.

[ Please let me know if I got somehow confused in my understanding of stack grows
up architectures. ]

I'm also unsure whether any of the stack grows up architecture have redzones ?
From a quick grep for redzone in Linux arch/, only openrisc, powerpc64 and
x86-64 appear to have redzones.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

  reply	other threads:[~2022-01-11 17:43 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-10 17:16 [RFC PATCH v2 1/2] rseq: x86: implement abort-at-ip extension Mathieu Desnoyers
2022-01-10 17:16 ` [RFC PATCH v2 2/2] selftests: rseq: test abort-at-ip extension on x86 Mathieu Desnoyers
2022-01-11 11:05 ` [RFC PATCH v2 1/2] rseq: x86: implement abort-at-ip extension Christian Brauner
2022-01-11 17:43   ` Mathieu Desnoyers [this message]
2022-01-12  8:46     ` Christian Brauner
2022-01-12 14:47       ` Mathieu Desnoyers
2022-01-12 14:55         ` Christian Brauner
2022-01-12 14:58         ` David Laight
2022-01-12 15:05           ` Mathieu Desnoyers
2022-01-12 15:15             ` David Laight
2022-01-12 15:24               ` Mathieu Desnoyers
2022-01-12 15:34               ` Peter Zijlstra
2022-01-12 15:53                 ` David Laight

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1626924888.21447.1641922985771.JavaMail.zimbra@efficios.com \
    --to=mathieu.desnoyers@efficios.com \
    --cc=David.Laight@ACULAB.COM \
    --cc=boqun.feng@gmail.com \
    --cc=brauner@kernel.org \
    --cc=carlos@redhat.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=fw@deneb.enyo.de \
    --cc=hpa@zytor.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).