* [RFC PATCH 0/4] rseq: Introduce extensible struct rseq @ 2020-07-14 3:03 Mathieu Desnoyers 2020-07-14 3:03 ` [RFC PATCH 1/4] selftests: rseq: Use fixed value as rseq_len parameter Mathieu Desnoyers ` (4 more replies) 0 siblings, 5 replies; 37+ messages in thread From: Mathieu Desnoyers @ 2020-07-14 3:03 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, Thomas Gleixner, Paul E . McKenney, Boqun Feng, H . Peter Anvin, Paul Turner, linux-api, Christian Brauner, Florian Weimer, carlos, Mathieu Desnoyers Recent discussion led to a solution for extending struct rseq. This is an implementation of the proposed solution. Now is a good time to agree on this scheme before the release of glibc 2.32, just in case there are small details to fix on the user-space side in order to allow extending struct rseq. Thanks, Mathieu Mathieu Desnoyers (4): selftests: rseq: Use fixed value as rseq_len parameter rseq: Allow extending struct rseq selftests: rseq: define __rseq_abi with extensible size selftests: rseq: print rseq extensible size in basic test include/linux/sched.h | 4 +++ include/uapi/linux/rseq.h | 42 ++++++++++++++++++++-- kernel/rseq.c | 44 +++++++++++++++++++---- tools/testing/selftests/rseq/basic_test.c | 15 ++++++++ tools/testing/selftests/rseq/rseq.c | 8 +++-- 5 files changed, 101 insertions(+), 12 deletions(-) -- 2.17.1 ^ permalink raw reply [flat|nested] 37+ messages in thread
* [RFC PATCH 1/4] selftests: rseq: Use fixed value as rseq_len parameter 2020-07-14 3:03 [RFC PATCH 0/4] rseq: Introduce extensible struct rseq Mathieu Desnoyers @ 2020-07-14 3:03 ` Mathieu Desnoyers 2020-07-14 3:03 ` [RFC PATCH 2/4] rseq: Allow extending struct rseq Mathieu Desnoyers ` (3 subsequent siblings) 4 siblings, 0 replies; 37+ messages in thread From: Mathieu Desnoyers @ 2020-07-14 3:03 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, Thomas Gleixner, Paul E . McKenney, Boqun Feng, H . Peter Anvin, Paul Turner, linux-api, Christian Brauner, Florian Weimer, carlos, Mathieu Desnoyers The rseq registration and unregistration expect a fixed-size length (32 bytes). In preparation to extend struct rseq, pass a fixed value rather than the size of the rseq structure. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> --- include/uapi/linux/rseq.h | 5 +++++ tools/testing/selftests/rseq/rseq.c | 5 ++--- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h index 9a402fdb60e9..e11d9df5e564 100644 --- a/include/uapi/linux/rseq.h +++ b/include/uapi/linux/rseq.h @@ -37,6 +37,11 @@ enum rseq_cs_flags { (1U << RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT), }; +/* The rseq_len expected by rseq registration is always 32 bytes. */ +enum rseq_len_expected { + RSEQ_LEN_EXPECTED = 32, +}; + /* * struct rseq_cs is aligned on 4 * 8 bytes to ensure it is always * contained within a single cache-line. It is usually declared as diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c index 7159eb777fd3..da2264c679b9 100644 --- a/tools/testing/selftests/rseq/rseq.c +++ b/tools/testing/selftests/rseq/rseq.c @@ -87,7 +87,7 @@ int rseq_register_current_thread(void) } if (__rseq_refcount++) goto end; - rc = sys_rseq(&__rseq_abi, sizeof(struct rseq), 0, RSEQ_SIG); + rc = sys_rseq(&__rseq_abi, RSEQ_LEN_EXPECTED, 0, RSEQ_SIG); if (!rc) { assert(rseq_current_cpu_raw() >= 0); goto end; @@ -115,8 +115,7 @@ int rseq_unregister_current_thread(void) } if (--__rseq_refcount) goto end; - rc = sys_rseq(&__rseq_abi, sizeof(struct rseq), - RSEQ_FLAG_UNREGISTER, RSEQ_SIG); + rc = sys_rseq(&__rseq_abi, RSEQ_LEN_EXPECTED, RSEQ_FLAG_UNREGISTER, RSEQ_SIG); if (!rc) goto end; __rseq_refcount = 1; -- 2.17.1 ^ permalink raw reply related [flat|nested] 37+ messages in thread
* [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-14 3:03 [RFC PATCH 0/4] rseq: Introduce extensible struct rseq Mathieu Desnoyers 2020-07-14 3:03 ` [RFC PATCH 1/4] selftests: rseq: Use fixed value as rseq_len parameter Mathieu Desnoyers @ 2020-07-14 3:03 ` Mathieu Desnoyers 2020-07-14 9:58 ` Florian Weimer ` (2 more replies) 2020-07-14 3:03 ` [RFC PATCH 3/4] selftests: rseq: define __rseq_abi with extensible size Mathieu Desnoyers ` (2 subsequent siblings) 4 siblings, 3 replies; 37+ messages in thread From: Mathieu Desnoyers @ 2020-07-14 3:03 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, Thomas Gleixner, Paul E . McKenney, Boqun Feng, H . Peter Anvin, Paul Turner, linux-api, Christian Brauner, Florian Weimer, carlos, Mathieu Desnoyers Add a __rseq_abi.flags "RSEQ_TLS_FLAG_SIZE", which indicates support for extending struct rseq. This adds two new fields to struct rseq: user_size and kernel_size. The user_size field allows the size of the __rseq_abi definition (which can be overridden by symbol interposition either by a preloaded library or by the application) to be handed over to the kernel at registration. This registration can be performed by a library, e.g. glibc, which does not know there is interposition taking place. The kernel_size is populated by the kernel when the "RSEQ_TLS_FLAG_SIZE" flag is set in __rseq_abi.flags to the minimum between user_size and the offset of the "end" field of struct rseq as known by the kernel. This allows user-space to query which fields are effectively populated by the kernel. A rseq_size field is added to the task struct to keep track of the "kernel_size" effective for each thread. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> --- include/linux/sched.h | 4 ++++ include/uapi/linux/rseq.h | 37 ++++++++++++++++++++++++++++++++-- kernel/rseq.c | 42 +++++++++++++++++++++++++++++++++------ 3 files changed, 75 insertions(+), 8 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 692e327d7455..5d61a3197987 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1147,6 +1147,7 @@ struct task_struct { #ifdef CONFIG_RSEQ struct rseq __user *rseq; u32 rseq_sig; + u32 rseq_size; /* * RmW on rseq_event_mask must be performed atomically * with respect to preemption. @@ -1976,10 +1977,12 @@ static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags) if (clone_flags & CLONE_VM) { t->rseq = NULL; t->rseq_sig = 0; + t->rseq_size = 0; t->rseq_event_mask = 0; } else { t->rseq = current->rseq; t->rseq_sig = current->rseq_sig; + t->rseq_size = current->rseq_size; t->rseq_event_mask = current->rseq_event_mask; } } @@ -1988,6 +1991,7 @@ static inline void rseq_execve(struct task_struct *t) { t->rseq = NULL; t->rseq_sig = 0; + t->rseq_size = 0; t->rseq_event_mask = 0; } diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h index e11d9df5e564..03c0b5e9a859 100644 --- a/include/uapi/linux/rseq.h +++ b/include/uapi/linux/rseq.h @@ -37,6 +37,15 @@ enum rseq_cs_flags { (1U << RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT), }; +enum rseq_tls_flags_bit { + /* enum rseq_cs_flags reserves bits 0-2. */ + RSEQ_TLS_FLAG_SIZE_BIT = 3, +}; + +enum rseq_tls_flags { + RSEQ_TLS_FLAG_SIZE = (1U << RSEQ_TLS_FLAG_SIZE_BIT), +}; + /* The rseq_len expected by rseq registration is always 32 bytes. */ enum rseq_len_expected { RSEQ_LEN_EXPECTED = 32, @@ -133,8 +142,9 @@ struct rseq { * * This field should only be updated by the thread which * registered this data structure. Read by the kernel. - * Mainly used for single-stepping through rseq critical sections - * with debuggers. + * + * The RSEQ_CS flags are mainly used for single-stepping through rseq + * critical sections with debuggers. * * - RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT * Inhibit instruction sequence block restart on preemption @@ -145,8 +155,31 @@ struct rseq { * - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE * Inhibit instruction sequence block restart on migration for * this thread. + * + * - RSEQ_TLS_FLAG_SIZE + * Extensible struct rseq ABI. This flag should be statically + * initialized. */ __u32 flags; + /* + * With __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, user_size should be + * statically initialized to offsetof(struct rseq, end). + */ + __u16 user_size; + /* + * With __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, if the kernel supports + * extensible struct rseq ABI, the kernel_size field is populated by + * the kernel to the minimum between user_size and the offset of the + * "end" field within the struct rseq supported by the kernel on + * successful registration. Should be initialized to 0. + */ + __u16 kernel_size; + + /* + * Very last field of the structure, to calculate size excluding padding + * with offsetof(). + */ + char end[]; } __attribute__((aligned(4 * sizeof(__u64)))); #endif /* _UAPI_LINUX_RSEQ_H */ diff --git a/kernel/rseq.c b/kernel/rseq.c index a4f86a9d6937..bbc57fc18573 100644 --- a/kernel/rseq.c +++ b/kernel/rseq.c @@ -96,6 +96,7 @@ static int rseq_update_cpu_id(struct task_struct *t) static int rseq_reset_rseq_cpu_id(struct task_struct *t) { u32 cpu_id_start = 0, cpu_id = RSEQ_CPU_ID_UNINITIALIZED; + u16 kernel_size = 0; /* * Reset cpu_id_start to its initial state (0). @@ -109,6 +110,11 @@ static int rseq_reset_rseq_cpu_id(struct task_struct *t) */ if (put_user(cpu_id, &t->rseq->cpu_id)) return -EFAULT; + /* + * Reset kernel_size to its initial state (0). + */ + if (put_user(kernel_size, &t->rseq->kernel_size)) + return -EFAULT; return 0; } @@ -266,7 +272,7 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs) if (unlikely(t->flags & PF_EXITING)) return; - if (unlikely(!access_ok(t->rseq, sizeof(*t->rseq)))) + if (unlikely(!access_ok(t->rseq, t->rseq_size))) goto error; ret = rseq_ip_fixup(regs); if (unlikely(ret < 0)) @@ -294,7 +300,7 @@ void rseq_syscall(struct pt_regs *regs) if (!t->rseq) return; - if (!access_ok(t->rseq, sizeof(*t->rseq)) || + if (!access_ok(t->rseq, t->rseq_size) || rseq_get_rseq_cs(t, &rseq_cs) || in_rseq_cs(ip, &rseq_cs)) force_sig(SIGSEGV); } @@ -308,6 +314,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, int, flags, u32, sig) { int ret; + u32 tls_flags; if (flags & RSEQ_FLAG_UNREGISTER) { if (flags & ~RSEQ_FLAG_UNREGISTER) @@ -315,7 +322,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, /* Unregister rseq for current thread. */ if (current->rseq != rseq || !current->rseq) return -EINVAL; - if (rseq_len != sizeof(*rseq)) + if (rseq_len != RSEQ_LEN_EXPECTED) return -EINVAL; if (current->rseq_sig != sig) return -EPERM; @@ -323,6 +330,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, if (ret) return ret; current->rseq = NULL; + current->rseq_size = 0; current->rseq_sig = 0; return 0; } @@ -336,7 +344,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, * the provided address differs from the prior * one. */ - if (current->rseq != rseq || rseq_len != sizeof(*rseq)) + if (current->rseq != rseq || rseq_len != RSEQ_LEN_EXPECTED) return -EINVAL; if (current->rseq_sig != sig) return -EPERM; @@ -349,10 +357,32 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, * ensure the provided rseq is properly aligned and valid. */ if (!IS_ALIGNED((unsigned long)rseq, __alignof__(*rseq)) || - rseq_len != sizeof(*rseq)) + rseq_len != RSEQ_LEN_EXPECTED) return -EINVAL; - if (!access_ok(rseq, rseq_len)) + if (!access_ok(rseq, RSEQ_LEN_EXPECTED)) return -EFAULT; + + /* Handle extensible struct rseq ABI. */ + ret = get_user(tls_flags, &rseq->flags); + if (ret) + return ret; + if (tls_flags & RSEQ_TLS_FLAG_SIZE) { + u16 user_size, kernel_size; + + ret = get_user(user_size, &rseq->user_size); + if (ret) + return ret; + if (user_size < offsetof(struct rseq, kernel_size) + sizeof(u16)) + return -EINVAL; + kernel_size = min_t(u16, user_size, offsetof(struct rseq, end)); + ret = put_user(kernel_size, &rseq->kernel_size); + if (ret) + return ret; + current->rseq_size = kernel_size; + } else { + current->rseq_size = offsetof(struct rseq, user_size); + } + current->rseq = rseq; current->rseq_sig = sig; /* -- 2.17.1 ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-14 3:03 ` [RFC PATCH 2/4] rseq: Allow extending struct rseq Mathieu Desnoyers @ 2020-07-14 9:58 ` Florian Weimer 2020-07-14 12:50 ` Mathieu Desnoyers 2020-07-14 17:24 ` Peter Oskolkov 2020-07-15 11:38 ` Christian Brauner 2 siblings, 1 reply; 37+ messages in thread From: Florian Weimer @ 2020-07-14 9:58 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Peter Zijlstra, linux-kernel, Thomas Gleixner, Paul E . McKenney, Boqun Feng, H . Peter Anvin, Paul Turner, linux-api, Christian Brauner, carlos * Mathieu Desnoyers: > + /* > + * Very last field of the structure, to calculate size excluding padding > + * with offsetof(). > + */ > + char end[]; > } __attribute__((aligned(4 * sizeof(__u64)))); This makes the header incompatible with standard C++. How are extensions going to affect the definition of struct rseq, including its alignment? As things stand now, glibc 2.32 will make the size and alignment of struct rseq part of its ABI, so it can't really change after that. With a different approach, we can avoid making the symbol size part of the ABI, but then we cannot use the __rseq_abi TLS symbol. As a result, interoperability with early adopters would be lost. One way to avoid this problem would be for every library to register its own rseq area, of the appropriate size. Then process-wide coordination in userspace would not be needed. Thanks, Florian ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-14 9:58 ` Florian Weimer @ 2020-07-14 12:50 ` Mathieu Desnoyers 2020-07-14 13:00 ` Florian Weimer 0 siblings, 1 reply; 37+ messages in thread From: Mathieu Desnoyers @ 2020-07-14 12:50 UTC (permalink / raw) To: Florian Weimer Cc: Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api, Christian Brauner, carlos ----- On Jul 14, 2020, at 5:58 AM, Florian Weimer fweimer@redhat.com wrote: > * Mathieu Desnoyers: > >> + /* >> + * Very last field of the structure, to calculate size excluding padding >> + * with offsetof(). >> + */ >> + char end[]; >> } __attribute__((aligned(4 * sizeof(__u64)))); > > This makes the header incompatible with standard C++. One alternative would be to add a helper to compute the effective size on c++, e.g.: /* Always updated with struct rseq_cs declaration. */ #define rseq_last_field kernel_size static inline size_t rseq_effective_size(void) { return offsetof(struct rseq, rseq_last_field) + sizeof(((struct rseq *)NULL)->rseq_last_field); } > > How are extensions going to affect the definition of struct rseq, > including its alignment? The alignment will never decrease. If the structure becomes large enough its alignment could theoretically increase. Would that be an issue ? > As things stand now, glibc 2.32 will make the size and alignment of > struct rseq part of its ABI, so it can't really change after that. Can the size and alignment of a structure be defined as minimum alignment and size values ? For instance, those would be invariant for a given glibc version (if we always use the internal struct rseq declaration), but could be increased in future versions. > With a different approach, we can avoid making the symbol size part of > the ABI, but then we cannot use the __rseq_abi TLS symbol. As a result, > interoperability with early adopters would be lost. Do you mean with a function "getter", and then keeping that pointer around in a per-user TLS ? I would prefer to avoid that because it adds an extra pointer dereference on a fast path. > One way to avoid this problem would be for every library to register its > own rseq area, of the appropriate size. Then process-wide coordination > in userspace would not be needed. I did propose the code to do just that in my initial rseq implementations, but the idea was shutdown by kernel maintainers because it required the kernel to handle a linked-list of rseq areas per thread, which was more complex within the kernel. Thanks, Mathieu > > Thanks, > Florian -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-14 12:50 ` Mathieu Desnoyers @ 2020-07-14 13:00 ` Florian Weimer 2020-07-14 13:19 ` Mathieu Desnoyers 0 siblings, 1 reply; 37+ messages in thread From: Florian Weimer @ 2020-07-14 13:00 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api, Christian Brauner, carlos * Mathieu Desnoyers: >> How are extensions going to affect the definition of struct rseq, >> including its alignment? > > The alignment will never decrease. If the structure becomes large enough > its alignment could theoretically increase. Would that be an issue ? Telling the compiler that struct is larger than it actually is, or that it has more alignment than in memory, results in undefined behavior, even if only fields are accessed in the smaller struct region. An increase in alignment from 32 to 64 is perhaps not likely to have this effect. But the undefined behavior is still there, and has been observed for mismatches like 8 vs 16. >> As things stand now, glibc 2.32 will make the size and alignment of >> struct rseq part of its ABI, so it can't really change after that. > > Can the size and alignment of a structure be defined as minimum alignment > and size values ? For instance, those would be invariant for a given glibc > version (if we always use the internal struct rseq declaration), but could > be increased in future versions. Not if we are talking about a global (TLS) data symbol. No such changes are possible there. We have some workarounds for symbols that live exclusively within glibc, but they don't work if there are libraries out there which interpose the symbol. >> With a different approach, we can avoid making the symbol size part of >> the ABI, but then we cannot use the __rseq_abi TLS symbol. As a result, >> interoperability with early adopters would be lost. > > Do you mean with a function "getter", and then keeping that pointer around > in a per-user TLS ? I would prefer to avoid that because it adds an extra > pointer dereference on a fast path. My choice would have been a function that returns the offset from the thread pointer (which has to be unchanged regarding all threads). Thanks, Florian ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-14 13:00 ` Florian Weimer @ 2020-07-14 13:19 ` Mathieu Desnoyers 2020-07-14 21:30 ` Carlos O'Donell 0 siblings, 1 reply; 37+ messages in thread From: Mathieu Desnoyers @ 2020-07-14 13:19 UTC (permalink / raw) To: Florian Weimer Cc: Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api, Christian Brauner, carlos ----- On Jul 14, 2020, at 9:00 AM, Florian Weimer fweimer@redhat.com wrote: > * Mathieu Desnoyers: > >>> How are extensions going to affect the definition of struct rseq, >>> including its alignment? >> >> The alignment will never decrease. If the structure becomes large enough >> its alignment could theoretically increase. Would that be an issue ? > > Telling the compiler that struct is larger than it actually is, or that > it has more alignment than in memory, results in undefined behavior, > even if only fields are accessed in the smaller struct region. > > An increase in alignment from 32 to 64 is perhaps not likely to have > this effect. But the undefined behavior is still there, and has been > observed for mismatches like 8 vs 16. Good points. > >>> As things stand now, glibc 2.32 will make the size and alignment of >>> struct rseq part of its ABI, so it can't really change after that. >> >> Can the size and alignment of a structure be defined as minimum alignment >> and size values ? For instance, those would be invariant for a given glibc >> version (if we always use the internal struct rseq declaration), but could >> be increased in future versions. > > Not if we are talking about a global (TLS) data symbol. No such changes > are possible there. We have some workarounds for symbols that live > exclusively within glibc, but they don't work if there are libraries out > there which interpose the symbol. OK > >>> With a different approach, we can avoid making the symbol size part of >>> the ABI, but then we cannot use the __rseq_abi TLS symbol. As a result, >>> interoperability with early adopters would be lost. >> >> Do you mean with a function "getter", and then keeping that pointer around >> in a per-user TLS ? I would prefer to avoid that because it adds an extra >> pointer dereference on a fast path. > > My choice would have been a function that returns the offset from the > thread pointer (which has to be unchanged regarding all threads). So AFAIU we would have glibc expose a symbol, e.g.: off_t rseq_tls_offset(void); Which would be typically called by user libraries and applications at initialization to get the offset of the struct rseq. They should store it in a static variable so rseq critical sections can use that offset. Is there an arch-agnostic way to get the thread pointer from user-space code ? That would be needed by all rseq critical section implementations. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-14 13:19 ` Mathieu Desnoyers @ 2020-07-14 21:30 ` Carlos O'Donell 2020-07-15 13:12 ` Mathieu Desnoyers 0 siblings, 1 reply; 37+ messages in thread From: Carlos O'Donell @ 2020-07-14 21:30 UTC (permalink / raw) To: Mathieu Desnoyers, Florian Weimer Cc: Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api, Christian Brauner On 7/14/20 9:19 AM, Mathieu Desnoyers wrote: > Is there an arch-agnostic way to get the thread pointer from user-space code ? That > would be needed by all rseq critical section implementations. Yes, and no. We have void *__builtin_thread_pointer (void), but few architectures implement the builtin so we'd have to go through a round of compiler updates and backports. All targets know how to access the thread pointer because the compiler has to generate IE-mode accesses to the TLS variables. I have filed an enhancement request: Bug 96200 - Implement __builtin_thread_pointer() and __builtin_set_thread_pointer() if TLS is supported https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96200 We have glibc internal macro APIs to access the thread pointer, but I would rather the compiler handle the access since it can schedule the resulting sequence better. On some arches setting the therad pointer needs a syscall or equivalent operation (hppa), and for some arches there is no fixed register (arm) hence the need for __builtin_thread_pointer() to force the compiler to place the pointer into a register for function return. -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-14 21:30 ` Carlos O'Donell @ 2020-07-15 13:12 ` Mathieu Desnoyers 2020-07-15 13:22 ` Florian Weimer 0 siblings, 1 reply; 37+ messages in thread From: Mathieu Desnoyers @ 2020-07-15 13:12 UTC (permalink / raw) To: carlos Cc: Florian Weimer, Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api, Christian Brauner ----- On Jul 14, 2020, at 5:30 PM, carlos carlos@redhat.com wrote: > On 7/14/20 9:19 AM, Mathieu Desnoyers wrote: >> Is there an arch-agnostic way to get the thread pointer from user-space code ? >> That >> would be needed by all rseq critical section implementations. > > Yes, and no. We have void *__builtin_thread_pointer (void), but > few architectures implement the builtin so we'd have to go through > a round of compiler updates and backports. All targets know how to > access the thread pointer because the compiler has to generate > IE-mode accesses to the TLS variables. Practically speaking, I suspect this would mean postponing availability of rseq for widely deployed applications for a few more years ? I can very well see end users upgrading their kernel and using an early-adoption library to use rseq today, but requiring to upgrade the entire toolchain will likely postpone adoption to many years from now. It would be good to start getting feedback from rseq users so we can progress on the system call feature development. Unfortunately everything has been in a stand-still for the past years due to lack of rseq registration coordination in user-space. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-15 13:12 ` Mathieu Desnoyers @ 2020-07-15 13:22 ` Florian Weimer 2020-07-15 13:31 ` Mathieu Desnoyers 0 siblings, 1 reply; 37+ messages in thread From: Florian Weimer @ 2020-07-15 13:22 UTC (permalink / raw) To: Mathieu Desnoyers Cc: carlos, Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api, Christian Brauner * Mathieu Desnoyers: > Practically speaking, I suspect this would mean postponing availability of > rseq for widely deployed applications for a few more years ? There is no rseq support in GCC today, so you have to write assembler code anyway. Thanks, Florian ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-15 13:22 ` Florian Weimer @ 2020-07-15 13:31 ` Mathieu Desnoyers 2020-07-15 13:42 ` Florian Weimer 0 siblings, 1 reply; 37+ messages in thread From: Mathieu Desnoyers @ 2020-07-15 13:31 UTC (permalink / raw) To: Florian Weimer Cc: carlos, Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api, Christian Brauner ----- On Jul 15, 2020, at 9:22 AM, Florian Weimer fweimer@redhat.com wrote: > * Mathieu Desnoyers: > >> Practically speaking, I suspect this would mean postponing availability of >> rseq for widely deployed applications for a few more years ? > > There is no rseq support in GCC today, so you have to write assembler > code anyway. Assembler code is only needed for the rseq critical sections, not for accessing the __rseq_abi. Use-cases like getting the current cpu number do not currently require any assembler for instance. So indeed it could be done today without upgrading the toolchains by writing custom assembler for each architecture to get the thread's struct rseq. AFAIU the ABI to access the thread pointer is fixed for each architecture, right ? How would this allow early-rseq-adopter libraries to interact with glibc ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-15 13:31 ` Mathieu Desnoyers @ 2020-07-15 13:42 ` Florian Weimer 2020-07-15 13:55 ` Christian Brauner 2020-07-15 14:54 ` Mathieu Desnoyers 0 siblings, 2 replies; 37+ messages in thread From: Florian Weimer @ 2020-07-15 13:42 UTC (permalink / raw) To: Mathieu Desnoyers Cc: carlos, Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api, Christian Brauner * Mathieu Desnoyers: > So indeed it could be done today without upgrading the toolchains by > writing custom assembler for each architecture to get the thread's > struct rseq. AFAIU the ABI to access the thread pointer is fixed for > each architecture, right ? Yes, determining the thread pointer and access initial-exec TLS variables is baked into the ABI. > How would this allow early-rseq-adopter libraries to interact with > glibc ? Under all extension proposals I've seen so far, early adopters are essentially incompatible with glibc rseq registration. I don't think you can have it both ways. Thanks, Florian ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-15 13:42 ` Florian Weimer @ 2020-07-15 13:55 ` Christian Brauner 2020-07-15 14:20 ` Mathieu Desnoyers 2020-07-15 14:54 ` Mathieu Desnoyers 1 sibling, 1 reply; 37+ messages in thread From: Christian Brauner @ 2020-07-15 13:55 UTC (permalink / raw) To: Florian Weimer Cc: Mathieu Desnoyers, carlos, Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api On Wed, Jul 15, 2020 at 03:42:11PM +0200, Florian Weimer wrote: > * Mathieu Desnoyers: > > > So indeed it could be done today without upgrading the toolchains by > > writing custom assembler for each architecture to get the thread's > > struct rseq. AFAIU the ABI to access the thread pointer is fixed for > > each architecture, right ? > > Yes, determining the thread pointer and access initial-exec TLS > variables is baked into the ABI. > > > How would this allow early-rseq-adopter libraries to interact with > > glibc ? > > Under all extension proposals I've seen so far, early adopters are > essentially incompatible with glibc rseq registration. I don't think > you can have it both ways. Who are the early adopters? And if we aren't being compatible with them under the extensible schemes proposed we should be able to achieve compatibility with non-early adopters, right? Which I guess is more important. (I still struggle to make sense what qualifies as an early adopter/what the difference to a non-early adopter is.) Christian ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-15 13:55 ` Christian Brauner @ 2020-07-15 14:20 ` Mathieu Desnoyers 0 siblings, 0 replies; 37+ messages in thread From: Mathieu Desnoyers @ 2020-07-15 14:20 UTC (permalink / raw) To: Christian Brauner Cc: Florian Weimer, carlos, Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api ----- On Jul 15, 2020, at 9:55 AM, Christian Brauner christian.brauner@ubuntu.com wrote: > On Wed, Jul 15, 2020 at 03:42:11PM +0200, Florian Weimer wrote: >> * Mathieu Desnoyers: >> >> > So indeed it could be done today without upgrading the toolchains by >> > writing custom assembler for each architecture to get the thread's >> > struct rseq. AFAIU the ABI to access the thread pointer is fixed for >> > each architecture, right ? >> >> Yes, determining the thread pointer and access initial-exec TLS >> variables is baked into the ABI. >> >> > How would this allow early-rseq-adopter libraries to interact with >> > glibc ? >> >> Under all extension proposals I've seen so far, early adopters are >> essentially incompatible with glibc rseq registration. I don't think >> you can have it both ways. > > Who are the early adopters? And if we aren't being compatible with them > under the extensible schemes proposed we should be able to achieve > compatibility with non-early adopters, right? Which I guess is more > important. (I still struggle to make sense what qualifies as an early > adopter/what the difference to a non-early adopter is.) Early adopter libraries and applications are meant to be able to use rseq without requiring upgrade of the entire environment to a newer glibc. I maintain early adopter projects (liburcu, lttng-ust) which postpone using rseq outside of prototype branches until we agree on an ABI to share __rseq_abi between glibc and early adopter libraries. The last thing I want is for those projects to break when an end-user upgrades their glibc. tcmalloc is another early adopter which have less strict compatibility requirements: they are OK with breaking changes requiring upgrading and rebuilding tcmalloc. Indeed, until we cast in stone the layout of struct rseq as exposed by glibc, I think we have some freedom in our definition of "early adopter", because pretty much every relevant open source project which want to use rseq is waiting on glibc to define that ABI, to use rseq either as an early-adopter or through a dependency on newer glibc. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-15 13:42 ` Florian Weimer 2020-07-15 13:55 ` Christian Brauner @ 2020-07-15 14:54 ` Mathieu Desnoyers 2020-07-15 14:58 ` Florian Weimer 1 sibling, 1 reply; 37+ messages in thread From: Mathieu Desnoyers @ 2020-07-15 14:54 UTC (permalink / raw) To: Florian Weimer Cc: carlos, Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api, Christian Brauner ----- On Jul 15, 2020, at 9:42 AM, Florian Weimer fweimer@redhat.com wrote: > * Mathieu Desnoyers: > [...] >> How would this allow early-rseq-adopter libraries to interact with >> glibc ? > > Under all extension proposals I've seen so far, early adopters are > essentially incompatible with glibc rseq registration. I don't think > you can have it both ways. The basic question I'm not sure about is whether we are allowed to increase the size and alignement of __rseq_abi from e.g. glibc 2.32 to glibc 2.33. If not, then we just need to find another way to extend struct rseq, e.g. by adding a pointer to another extended structure in the padding space we have at the end of struct rseq. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-15 14:54 ` Mathieu Desnoyers @ 2020-07-15 14:58 ` Florian Weimer 2020-07-15 15:26 ` Mathieu Desnoyers 0 siblings, 1 reply; 37+ messages in thread From: Florian Weimer @ 2020-07-15 14:58 UTC (permalink / raw) To: Mathieu Desnoyers Cc: carlos, Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api, Christian Brauner * Mathieu Desnoyers: > ----- On Jul 15, 2020, at 9:42 AM, Florian Weimer fweimer@redhat.com wrote: >> * Mathieu Desnoyers: >> > [...] >>> How would this allow early-rseq-adopter libraries to interact with >>> glibc ? >> >> Under all extension proposals I've seen so far, early adopters are >> essentially incompatible with glibc rseq registration. I don't think >> you can have it both ways. > > The basic question I'm not sure about is whether we are allowed to increase > the size and alignement of __rseq_abi from e.g. glibc 2.32 to glibc 2.33. With the current mechanism (global TLS data symbol), we can do that using symbol versioning. That means that we can only do this on a release boundary, and that it's incompatible with other libraries which use an interposing unversioned symbol. Thanks, Florian ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-15 14:58 ` Florian Weimer @ 2020-07-15 15:26 ` Mathieu Desnoyers 0 siblings, 0 replies; 37+ messages in thread From: Mathieu Desnoyers @ 2020-07-15 15:26 UTC (permalink / raw) To: Florian Weimer Cc: carlos, Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api, Christian Brauner ----- On Jul 15, 2020, at 10:58 AM, Florian Weimer fweimer@redhat.com wrote: > * Mathieu Desnoyers: > >> ----- On Jul 15, 2020, at 9:42 AM, Florian Weimer fweimer@redhat.com wrote: >>> * Mathieu Desnoyers: >>> >> [...] >>>> How would this allow early-rseq-adopter libraries to interact with >>>> glibc ? >>> >>> Under all extension proposals I've seen so far, early adopters are >>> essentially incompatible with glibc rseq registration. I don't think >>> you can have it both ways. >> >> The basic question I'm not sure about is whether we are allowed to increase >> the size and alignement of __rseq_abi from e.g. glibc 2.32 to glibc 2.33. > > With the current mechanism (global TLS data symbol), we can do that > using symbol versioning. That means that we can only do this on a > release boundary, That should not be a problem. > and that it's incompatible with other libraries which > use an interposing unversioned symbol. We have the freedom to define the ABI of this shared __rseq_abi symbol right now. Maybe it's not such a good thing to let early adopters use unversioned __rseq_abi symbols. Let me wrap my head around this scenario then, please let me know if I'm misunderstanding something: 1) glibc 2.32 exposes: __rseq_abi (GLIBC_2.32) with size == 32. __rseq_abi with size == 32 is available as a private symbol within glibc - both symbols alias the same contents. 2) glibc 2.33 exposes: __rseq_abi (GLIBC_2.32) with size == 32. __rseq_abi (GLIBC_2.33) with size == 64. __rseq_abi with size == 64 is available as a private symbol within glibc - the three symbols alias the same contents. Then what happens if we have a program or preloaded library defining __rseq_abi (without version) with size == 32 loaded with a glibc 2.33 ? Or what happens if we have a program or preloaded libary defining __rseq_abi (GLIBC_2.32) with size == 32 loaded with a glibc 2.33 ? I wonder if "GLIBC_*" is the right version namespace for this. Considering that the layout of this structure is defined by the Linux kernel UAPI, maybe we'd want version named as "RSEQ_1.0", "RSEQ_2.0" or something similar. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-14 3:03 ` [RFC PATCH 2/4] rseq: Allow extending struct rseq Mathieu Desnoyers 2020-07-14 9:58 ` Florian Weimer @ 2020-07-14 17:24 ` Peter Oskolkov 2020-07-14 17:43 ` Mathieu Desnoyers 2020-07-15 11:38 ` Christian Brauner 2 siblings, 1 reply; 37+ messages in thread From: Peter Oskolkov @ 2020-07-14 17:24 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Peter Zijlstra, Linux Kernel Mailing List, Thomas Gleixner, Paul E . McKenney, Boqun Feng, H . Peter Anvin, Paul Turner, linux-api, Christian Brauner, Florian Weimer, carlos, Peter Oskolkov At Google, we actually extended struct rseq (I will post the patches here once they are fully deployed and we have specific benefits/improvements to report). We did this by adding several fields below __u32 flags (the last field currently), and correspondingly increasing rseq_len in rseq() syscall. If the kernel does not know of this extension, it will return -EINVAL due to an unexpected rseq_len; then the application can either fall-back to the standard/upstream rseq, or bail. If the kernel does know of this extension, it accepts it. If the application passes the old rseq_len (32), the kernel knows that this is an old application and treats it as such. I looked through the archives, but I did not find specifically why the pretty standard approach described above is considered inferior to the one taken in this patch (freeze rseq_len at 32, add additional length fields to struct rseq). Can these be summarized? Thanks, Peter On Mon, Jul 13, 2020 at 8:04 PM Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > Add a __rseq_abi.flags "RSEQ_TLS_FLAG_SIZE", which indicates support for > extending struct rseq. This adds two new fields to struct rseq: > user_size and kernel_size. > > The user_size field allows the size of the __rseq_abi definition (which > can be overridden by symbol interposition either by a preloaded library > or by the application) to be handed over to the kernel at registration. > This registration can be performed by a library, e.g. glibc, which does > not know there is interposition taking place. > > The kernel_size is populated by the kernel when the "RSEQ_TLS_FLAG_SIZE" > flag is set in __rseq_abi.flags to the minimum between user_size and > the offset of the "end" field of struct rseq as known by the kernel. > This allows user-space to query which fields are effectively populated > by the kernel. > > A rseq_size field is added to the task struct to keep track of the > "kernel_size" effective for each thread. > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > --- > include/linux/sched.h | 4 ++++ > include/uapi/linux/rseq.h | 37 ++++++++++++++++++++++++++++++++-- > kernel/rseq.c | 42 +++++++++++++++++++++++++++++++++------ > 3 files changed, 75 insertions(+), 8 deletions(-) > > diff --git a/include/linux/sched.h b/include/linux/sched.h > index 692e327d7455..5d61a3197987 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -1147,6 +1147,7 @@ struct task_struct { > #ifdef CONFIG_RSEQ > struct rseq __user *rseq; > u32 rseq_sig; > + u32 rseq_size; > /* > * RmW on rseq_event_mask must be performed atomically > * with respect to preemption. > @@ -1976,10 +1977,12 @@ static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags) > if (clone_flags & CLONE_VM) { > t->rseq = NULL; > t->rseq_sig = 0; > + t->rseq_size = 0; > t->rseq_event_mask = 0; > } else { > t->rseq = current->rseq; > t->rseq_sig = current->rseq_sig; > + t->rseq_size = current->rseq_size; > t->rseq_event_mask = current->rseq_event_mask; > } > } > @@ -1988,6 +1991,7 @@ static inline void rseq_execve(struct task_struct *t) > { > t->rseq = NULL; > t->rseq_sig = 0; > + t->rseq_size = 0; > t->rseq_event_mask = 0; > } > > diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h > index e11d9df5e564..03c0b5e9a859 100644 > --- a/include/uapi/linux/rseq.h > +++ b/include/uapi/linux/rseq.h > @@ -37,6 +37,15 @@ enum rseq_cs_flags { > (1U << RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT), > }; > > +enum rseq_tls_flags_bit { > + /* enum rseq_cs_flags reserves bits 0-2. */ > + RSEQ_TLS_FLAG_SIZE_BIT = 3, > +}; > + > +enum rseq_tls_flags { > + RSEQ_TLS_FLAG_SIZE = (1U << RSEQ_TLS_FLAG_SIZE_BIT), > +}; > + > /* The rseq_len expected by rseq registration is always 32 bytes. */ > enum rseq_len_expected { > RSEQ_LEN_EXPECTED = 32, > @@ -133,8 +142,9 @@ struct rseq { > * > * This field should only be updated by the thread which > * registered this data structure. Read by the kernel. > - * Mainly used for single-stepping through rseq critical sections > - * with debuggers. > + * > + * The RSEQ_CS flags are mainly used for single-stepping through rseq > + * critical sections with debuggers. > * > * - RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT > * Inhibit instruction sequence block restart on preemption > @@ -145,8 +155,31 @@ struct rseq { > * - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE > * Inhibit instruction sequence block restart on migration for > * this thread. > + * > + * - RSEQ_TLS_FLAG_SIZE > + * Extensible struct rseq ABI. This flag should be statically > + * initialized. > */ > __u32 flags; > + /* > + * With __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, user_size should be > + * statically initialized to offsetof(struct rseq, end). > + */ > + __u16 user_size; > + /* > + * With __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, if the kernel supports > + * extensible struct rseq ABI, the kernel_size field is populated by > + * the kernel to the minimum between user_size and the offset of the > + * "end" field within the struct rseq supported by the kernel on > + * successful registration. Should be initialized to 0. > + */ > + __u16 kernel_size; > + > + /* > + * Very last field of the structure, to calculate size excluding padding > + * with offsetof(). > + */ > + char end[]; > } __attribute__((aligned(4 * sizeof(__u64)))); > > #endif /* _UAPI_LINUX_RSEQ_H */ > diff --git a/kernel/rseq.c b/kernel/rseq.c > index a4f86a9d6937..bbc57fc18573 100644 > --- a/kernel/rseq.c > +++ b/kernel/rseq.c > @@ -96,6 +96,7 @@ static int rseq_update_cpu_id(struct task_struct *t) > static int rseq_reset_rseq_cpu_id(struct task_struct *t) > { > u32 cpu_id_start = 0, cpu_id = RSEQ_CPU_ID_UNINITIALIZED; > + u16 kernel_size = 0; > > /* > * Reset cpu_id_start to its initial state (0). > @@ -109,6 +110,11 @@ static int rseq_reset_rseq_cpu_id(struct task_struct *t) > */ > if (put_user(cpu_id, &t->rseq->cpu_id)) > return -EFAULT; > + /* > + * Reset kernel_size to its initial state (0). > + */ > + if (put_user(kernel_size, &t->rseq->kernel_size)) > + return -EFAULT; > return 0; > } > > @@ -266,7 +272,7 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs) > > if (unlikely(t->flags & PF_EXITING)) > return; > - if (unlikely(!access_ok(t->rseq, sizeof(*t->rseq)))) > + if (unlikely(!access_ok(t->rseq, t->rseq_size))) > goto error; > ret = rseq_ip_fixup(regs); > if (unlikely(ret < 0)) > @@ -294,7 +300,7 @@ void rseq_syscall(struct pt_regs *regs) > > if (!t->rseq) > return; > - if (!access_ok(t->rseq, sizeof(*t->rseq)) || > + if (!access_ok(t->rseq, t->rseq_size) || > rseq_get_rseq_cs(t, &rseq_cs) || in_rseq_cs(ip, &rseq_cs)) > force_sig(SIGSEGV); > } > @@ -308,6 +314,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, > int, flags, u32, sig) > { > int ret; > + u32 tls_flags; > > if (flags & RSEQ_FLAG_UNREGISTER) { > if (flags & ~RSEQ_FLAG_UNREGISTER) > @@ -315,7 +322,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, > /* Unregister rseq for current thread. */ > if (current->rseq != rseq || !current->rseq) > return -EINVAL; > - if (rseq_len != sizeof(*rseq)) > + if (rseq_len != RSEQ_LEN_EXPECTED) > return -EINVAL; > if (current->rseq_sig != sig) > return -EPERM; > @@ -323,6 +330,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, > if (ret) > return ret; > current->rseq = NULL; > + current->rseq_size = 0; > current->rseq_sig = 0; > return 0; > } > @@ -336,7 +344,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, > * the provided address differs from the prior > * one. > */ > - if (current->rseq != rseq || rseq_len != sizeof(*rseq)) > + if (current->rseq != rseq || rseq_len != RSEQ_LEN_EXPECTED) > return -EINVAL; > if (current->rseq_sig != sig) > return -EPERM; > @@ -349,10 +357,32 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, > * ensure the provided rseq is properly aligned and valid. > */ > if (!IS_ALIGNED((unsigned long)rseq, __alignof__(*rseq)) || > - rseq_len != sizeof(*rseq)) > + rseq_len != RSEQ_LEN_EXPECTED) > return -EINVAL; > - if (!access_ok(rseq, rseq_len)) > + if (!access_ok(rseq, RSEQ_LEN_EXPECTED)) > return -EFAULT; > + > + /* Handle extensible struct rseq ABI. */ > + ret = get_user(tls_flags, &rseq->flags); > + if (ret) > + return ret; > + if (tls_flags & RSEQ_TLS_FLAG_SIZE) { > + u16 user_size, kernel_size; > + > + ret = get_user(user_size, &rseq->user_size); > + if (ret) > + return ret; > + if (user_size < offsetof(struct rseq, kernel_size) + sizeof(u16)) > + return -EINVAL; > + kernel_size = min_t(u16, user_size, offsetof(struct rseq, end)); > + ret = put_user(kernel_size, &rseq->kernel_size); > + if (ret) > + return ret; > + current->rseq_size = kernel_size; > + } else { > + current->rseq_size = offsetof(struct rseq, user_size); > + } > + > current->rseq = rseq; > current->rseq_sig = sig; > /* > -- > 2.17.1 > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-14 17:24 ` Peter Oskolkov @ 2020-07-14 17:43 ` Mathieu Desnoyers 2020-07-14 18:33 ` Peter Oskolkov 0 siblings, 1 reply; 37+ messages in thread From: Mathieu Desnoyers @ 2020-07-14 17:43 UTC (permalink / raw) To: Peter Oskolkov Cc: Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api, Christian Brauner, Florian Weimer, carlos, Peter Oskolkov ----- On Jul 14, 2020, at 1:24 PM, Peter Oskolkov posk@posk.io wrote: > At Google, we actually extended struct rseq (I will post the patches > here once they are fully deployed and we have specific > benefits/improvements to report). We did this by adding several fields > below __u32 flags (the last field currently), and correspondingly > increasing rseq_len in rseq() syscall. If the kernel does not know of > this extension, it will return -EINVAL due to an unexpected rseq_len; > then the application can either fall-back to the standard/upstream > rseq, or bail. If the kernel does know of this extension, it accepts > it. If the application passes the old rseq_len (32), the kernel knows > that this is an old application and treats it as such. > > I looked through the archives, but I did not find specifically why the > pretty standard approach described above is considered inferior to the > one taken in this patch (freeze rseq_len at 32, add additional length > fields to struct rseq). Can these be summarized? I think you don't face the issues I'm facing with libc rseq integration because you control the entire user-space software ecosystem at Google. The main issue we face is that the library responsible for registering rseq (either glibc 2.32+, an early-adopter librseq library, or the application) may very well not be the same library defining the __rseq_abi symbol used in the global symbol table. Interposition with ld preload or by defining the __rseq_abi in the program's executable are good examples of this kind of scenario, and those use-cases are supported. So the size of the __rseq_abi structure may be larger than the struct rseq known by glibc (and eventually smaller, if future glibc versions extend their __rseq_abi size but is loaded with an older program/library doing __rseq_abi interposition). So we need some way to allow code defining the __rseq_abi to let the kernel know how much room is available, without necessarily requiring the code responsible for rseq registration to be aware of that extended layout. This is the purpose of the __rseq_abi.flags RSEQ_FLAG_TLS_SIZE and field __rseq_abi.user_size. And we need some way to allow the kernel to let user-space rseq critical sections (user code) know how much of those fields are actually populated by the kernel. This is the purpose of __rseq_abi.flags RSEQ_FLAG_TLS_SIZE with __rseq_abi.kernel_size. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-14 17:43 ` Mathieu Desnoyers @ 2020-07-14 18:33 ` Peter Oskolkov 2020-07-15 2:34 ` Chris Kennelly 0 siblings, 1 reply; 37+ messages in thread From: Peter Oskolkov @ 2020-07-14 18:33 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Peter Oskolkov, Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api, Christian Brauner, Florian Weimer, carlos, Chris Kennelly On Tue, Jul 14, 2020 at 10:43 AM Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > ----- On Jul 14, 2020, at 1:24 PM, Peter Oskolkov posk@posk.io wrote: > > > At Google, we actually extended struct rseq (I will post the patches > > here once they are fully deployed and we have specific > > benefits/improvements to report). We did this by adding several fields > > below __u32 flags (the last field currently), and correspondingly > > increasing rseq_len in rseq() syscall. If the kernel does not know of > > this extension, it will return -EINVAL due to an unexpected rseq_len; > > then the application can either fall-back to the standard/upstream > > rseq, or bail. If the kernel does know of this extension, it accepts > > it. If the application passes the old rseq_len (32), the kernel knows > > that this is an old application and treats it as such. > > > > I looked through the archives, but I did not find specifically why the > > pretty standard approach described above is considered inferior to the > > one taken in this patch (freeze rseq_len at 32, add additional length > > fields to struct rseq). Can these be summarized? > > I think you don't face the issues I'm facing with libc rseq integration > because you control the entire user-space software ecosystem at Google. > > The main issue we face is that the library responsible for registering > rseq (either glibc 2.32+, an early-adopter librseq library, or the > application) may very well not be the same library defining the __rseq_abi > symbol used in the global symbol table. Interposition with ld preload or > by defining the __rseq_abi in the program's executable are good examples > of this kind of scenario, and those use-cases are supported. > > So the size of the __rseq_abi structure may be larger than the struct > rseq known by glibc (and eventually smaller, if future glibc versions > extend their __rseq_abi size but is loaded with an older program/library > doing __rseq_abi interposition). > > So we need some way to allow code defining the __rseq_abi to let the kernel > know how much room is available, without necessarily requiring the code > responsible for rseq registration to be aware of that extended layout. > This is the purpose of the __rseq_abi.flags RSEQ_FLAG_TLS_SIZE and field > __rseq_abi.user_size. > > And we need some way to allow the kernel to let user-space rseq critical > sections (user code) know how much of those fields are actually populated > by the kernel. This is the purpose of __rseq_abi.flags RSEQ_FLAG_TLS_SIZE > with __rseq_abi.kernel_size. Thanks, Mathieu, for the explanation. Yes, multiple unrelated libraries having to share struct rseq complicates matters. Your approach appears to be a way to reconcile the issues you outlined above. Thanks, Peter > > Thanks, > > Mathieu > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-14 18:33 ` Peter Oskolkov @ 2020-07-15 2:34 ` Chris Kennelly 2020-07-15 6:31 ` Florian Weimer 2020-07-15 14:50 ` Mathieu Desnoyers 0 siblings, 2 replies; 37+ messages in thread From: Chris Kennelly @ 2020-07-15 2:34 UTC (permalink / raw) To: Peter Oskolkov Cc: Mathieu Desnoyers, Peter Oskolkov, Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api, Christian Brauner, Florian Weimer, carlos On Tue, Jul 14, 2020 at 2:33 PM Peter Oskolkov <posk@google.com> wrote: > > On Tue, Jul 14, 2020 at 10:43 AM Mathieu Desnoyers > <mathieu.desnoyers@efficios.com> wrote: > > > > ----- On Jul 14, 2020, at 1:24 PM, Peter Oskolkov posk@posk.io wrote: > > > > > At Google, we actually extended struct rseq (I will post the patches > > > here once they are fully deployed and we have specific > > > benefits/improvements to report). We did this by adding several fields > > > below __u32 flags (the last field currently), and correspondingly > > > increasing rseq_len in rseq() syscall. If the kernel does not know of > > > this extension, it will return -EINVAL due to an unexpected rseq_len; > > > then the application can either fall-back to the standard/upstream > > > rseq, or bail. If the kernel does know of this extension, it accepts > > > it. If the application passes the old rseq_len (32), the kernel knows > > > that this is an old application and treats it as such. > > > > > > I looked through the archives, but I did not find specifically why the > > > pretty standard approach described above is considered inferior to the > > > one taken in this patch (freeze rseq_len at 32, add additional length > > > fields to struct rseq). Can these be summarized? > > > > I think you don't face the issues I'm facing with libc rseq integration > > because you control the entire user-space software ecosystem at Google. > > > > The main issue we face is that the library responsible for registering > > rseq (either glibc 2.32+, an early-adopter librseq library, or the > > application) may very well not be the same library defining the __rseq_abi > > symbol used in the global symbol table. Interposition with ld preload or > > by defining the __rseq_abi in the program's executable are good examples > > of this kind of scenario, and those use-cases are supported. Does this work if/when we run out of bytes in the current sizeof(__rseq_abi)? Which library provides the TLS symbol (and N bytes of storage) seems sensitive to the choices the linker makes for us, once the symbol sizes diverge. > > So the size of the __rseq_abi structure may be larger than the struct > > rseq known by glibc (and eventually smaller, if future glibc versions > > extend their __rseq_abi size but is loaded with an older program/library > > doing __rseq_abi interposition). When glibc provides registration, is the anticipated use case that a library would unregister and reregister each thread to "upgrade" it to the most modern version of interface it knows about provided by the kernel? > > So we need some way to allow code defining the __rseq_abi to let the kernel > > know how much room is available, without necessarily requiring the code > > responsible for rseq registration to be aware of that extended layout. > > This is the purpose of the __rseq_abi.flags RSEQ_FLAG_TLS_SIZE and field > > __rseq_abi.user_size. > > > > And we need some way to allow the kernel to let user-space rseq critical > > sections (user code) know how much of those fields are actually populated > > by the kernel. This is the purpose of __rseq_abi.flags RSEQ_FLAG_TLS_SIZE > > with __rseq_abi.kernel_size. I authored the userspace component (https://github.com/google/tcmalloc/commit/ad136d45f75a273b934446699cef8b278c34ec6e) that consumes the extensions Peter mentions and found that minimizing the performance impact of their potential absence was a bit of a challenge. There, I could assume an all-or-nothing registration of the new feature--limited only by kernel availability for thread homogeneity--but inconsistencies across early adopter libraries would mean each thread would have to examine its own TLS to determine if a feature were available. Chris ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-15 2:34 ` Chris Kennelly @ 2020-07-15 6:31 ` Florian Weimer 2020-07-15 10:59 ` Christian Brauner 2020-07-15 14:38 ` Mathieu Desnoyers 2020-07-15 14:50 ` Mathieu Desnoyers 1 sibling, 2 replies; 37+ messages in thread From: Florian Weimer @ 2020-07-15 6:31 UTC (permalink / raw) To: Chris Kennelly Cc: Peter Oskolkov, Mathieu Desnoyers, Peter Oskolkov, Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api, Christian Brauner, carlos * Chris Kennelly: > When glibc provides registration, is the anticipated use case that a > library would unregister and reregister each thread to "upgrade" it to > the most modern version of interface it knows about provided by the > kernel? Absolutely not, that is likely to break other consumers because an expected rseq area becomes dormant instead. > There, I could assume an all-or-nothing registration of the new > feature--limited only by kernel availability for thread > homogeneity--but inconsistencies across early adopter libraries would > mean each thread would have to examine its own TLS to determine if a > feature were available. Exactly. Certain uses of seccomp can also have this effect, presenting a non-homogeneous view. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-15 6:31 ` Florian Weimer @ 2020-07-15 10:59 ` Christian Brauner 2020-07-15 14:38 ` Mathieu Desnoyers 1 sibling, 0 replies; 37+ messages in thread From: Christian Brauner @ 2020-07-15 10:59 UTC (permalink / raw) To: Florian Weimer Cc: Chris Kennelly, Peter Oskolkov, Mathieu Desnoyers, Peter Oskolkov, Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api, carlos On Wed, Jul 15, 2020 at 08:31:05AM +0200, Florian Weimer wrote: > * Chris Kennelly: > > > When glibc provides registration, is the anticipated use case that a > > library would unregister and reregister each thread to "upgrade" it to > > the most modern version of interface it knows about provided by the > > kernel? > > Absolutely not, that is likely to break other consumers because an > expected rseq area becomes dormant instead. > > > There, I could assume an all-or-nothing registration of the new > > feature--limited only by kernel availability for thread > > homogeneity--but inconsistencies across early adopter libraries would > > mean each thread would have to examine its own TLS to determine if a > > feature were available. Fwiw, I pointed this out in the discussions that led up to this patchset. I don't see how this can work if threads don't check for their feature set. > > Exactly. Certain uses of seccomp can also have this effect, > presenting a non-homogeneous view. Good point. There might be threads with a seccomp filter that would block rseq features is what you mean, I assume. Christian ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-15 6:31 ` Florian Weimer 2020-07-15 10:59 ` Christian Brauner @ 2020-07-15 14:38 ` Mathieu Desnoyers 1 sibling, 0 replies; 37+ messages in thread From: Mathieu Desnoyers @ 2020-07-15 14:38 UTC (permalink / raw) To: Florian Weimer Cc: Chris Kennelly, Peter Oskolkov, Peter Oskolkov, Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api, Christian Brauner, carlos ----- On Jul 15, 2020, at 2:31 AM, Florian Weimer fw@deneb.enyo.de wrote: > * Chris Kennelly: > >> When glibc provides registration, is the anticipated use case that a >> library would unregister and reregister each thread to "upgrade" it to >> the most modern version of interface it knows about provided by the >> kernel? > > Absolutely not, that is likely to break other consumers because an > expected rseq area becomes dormant instead. Indeed. > >> There, I could assume an all-or-nothing registration of the new >> feature--limited only by kernel availability for thread >> homogeneity--but inconsistencies across early adopter libraries would >> mean each thread would have to examine its own TLS to determine if a >> feature were available. > > Exactly. Certain uses of seccomp can also have this effect, > presenting a non-homogeneous view. The nice thing about having a consistent feature-set for a given thread group is that it allows specializing the code at thread group startup, rather than requiring to dynamically check for feature availability at runtime in fast-paths. I wonder whether this kind of non-homogeneous view scenario caused by seccomp is something we should support, or something that should be documented as incompatible with rseq ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-15 2:34 ` Chris Kennelly 2020-07-15 6:31 ` Florian Weimer @ 2020-07-15 14:50 ` Mathieu Desnoyers 1 sibling, 0 replies; 37+ messages in thread From: Mathieu Desnoyers @ 2020-07-15 14:50 UTC (permalink / raw) To: Chris Kennelly Cc: Peter Oskolkov, Peter Oskolkov, Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api, Christian Brauner, Florian Weimer, carlos ----- On Jul 14, 2020, at 10:34 PM, Chris Kennelly ckennelly@google.com wrote: > On Tue, Jul 14, 2020 at 2:33 PM Peter Oskolkov <posk@google.com> wrote: >> >> On Tue, Jul 14, 2020 at 10:43 AM Mathieu Desnoyers >> <mathieu.desnoyers@efficios.com> wrote: >> > >> > ----- On Jul 14, 2020, at 1:24 PM, Peter Oskolkov posk@posk.io wrote: >> > >> > > At Google, we actually extended struct rseq (I will post the patches >> > > here once they are fully deployed and we have specific >> > > benefits/improvements to report). We did this by adding several fields >> > > below __u32 flags (the last field currently), and correspondingly >> > > increasing rseq_len in rseq() syscall. If the kernel does not know of >> > > this extension, it will return -EINVAL due to an unexpected rseq_len; >> > > then the application can either fall-back to the standard/upstream >> > > rseq, or bail. If the kernel does know of this extension, it accepts >> > > it. If the application passes the old rseq_len (32), the kernel knows >> > > that this is an old application and treats it as such. >> > > >> > > I looked through the archives, but I did not find specifically why the >> > > pretty standard approach described above is considered inferior to the >> > > one taken in this patch (freeze rseq_len at 32, add additional length >> > > fields to struct rseq). Can these be summarized? >> > >> > I think you don't face the issues I'm facing with libc rseq integration >> > because you control the entire user-space software ecosystem at Google. >> > >> > The main issue we face is that the library responsible for registering >> > rseq (either glibc 2.32+, an early-adopter librseq library, or the >> > application) may very well not be the same library defining the __rseq_abi >> > symbol used in the global symbol table. Interposition with ld preload or >> > by defining the __rseq_abi in the program's executable are good examples >> > of this kind of scenario, and those use-cases are supported. > > Does this work if/when we run out of bytes in the current sizeof(__rseq_abi)? Only if all libraries/programs involved (including glibc) expect that the size of the __rseq_abi can be the smallest possible subset, and only consider it to be "extended" if specific information in the ABI tells them it is the case. > > Which library provides the TLS symbol (and N bytes of storage) seems > sensitive to the choices the linker makes for us, once the symbol > sizes diverge. AFAIU, a symbol defined in the main executable will have precedence over a preloaded library, which has precedence over shared library dependencies, e.g. glibc. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-14 3:03 ` [RFC PATCH 2/4] rseq: Allow extending struct rseq Mathieu Desnoyers 2020-07-14 9:58 ` Florian Weimer 2020-07-14 17:24 ` Peter Oskolkov @ 2020-07-15 11:38 ` Christian Brauner 2020-07-15 12:33 ` Christian Brauner 2 siblings, 1 reply; 37+ messages in thread From: Christian Brauner @ 2020-07-15 11:38 UTC (permalink / raw) To: Mathieu Desnoyers, Florian Weimer Cc: Peter Zijlstra, linux-kernel, Thomas Gleixner, Paul E . McKenney, Boqun Feng, H . Peter Anvin, Paul Turner, linux-api, carlos On Mon, Jul 13, 2020 at 11:03:46PM -0400, Mathieu Desnoyers wrote: > Add a __rseq_abi.flags "RSEQ_TLS_FLAG_SIZE", which indicates support for > extending struct rseq. This adds two new fields to struct rseq: > user_size and kernel_size. > > The user_size field allows the size of the __rseq_abi definition (which > can be overridden by symbol interposition either by a preloaded library > or by the application) to be handed over to the kernel at registration. > This registration can be performed by a library, e.g. glibc, which does > not know there is interposition taking place. > > The kernel_size is populated by the kernel when the "RSEQ_TLS_FLAG_SIZE" > flag is set in __rseq_abi.flags to the minimum between user_size and > the offset of the "end" field of struct rseq as known by the kernel. > This allows user-space to query which fields are effectively populated > by the kernel. > > A rseq_size field is added to the task struct to keep track of the > "kernel_size" effective for each thread. > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > --- > include/linux/sched.h | 4 ++++ > include/uapi/linux/rseq.h | 37 ++++++++++++++++++++++++++++++++-- > kernel/rseq.c | 42 +++++++++++++++++++++++++++++++++------ > 3 files changed, 75 insertions(+), 8 deletions(-) > > diff --git a/include/linux/sched.h b/include/linux/sched.h > index 692e327d7455..5d61a3197987 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -1147,6 +1147,7 @@ struct task_struct { > #ifdef CONFIG_RSEQ > struct rseq __user *rseq; > u32 rseq_sig; > + u32 rseq_size; > /* > * RmW on rseq_event_mask must be performed atomically > * with respect to preemption. > @@ -1976,10 +1977,12 @@ static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags) > if (clone_flags & CLONE_VM) { > t->rseq = NULL; > t->rseq_sig = 0; > + t->rseq_size = 0; > t->rseq_event_mask = 0; > } else { > t->rseq = current->rseq; > t->rseq_sig = current->rseq_sig; > + t->rseq_size = current->rseq_size; > t->rseq_event_mask = current->rseq_event_mask; > } > } > @@ -1988,6 +1991,7 @@ static inline void rseq_execve(struct task_struct *t) > { > t->rseq = NULL; > t->rseq_sig = 0; > + t->rseq_size = 0; > t->rseq_event_mask = 0; > } > > diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h > index e11d9df5e564..03c0b5e9a859 100644 > --- a/include/uapi/linux/rseq.h > +++ b/include/uapi/linux/rseq.h > @@ -37,6 +37,15 @@ enum rseq_cs_flags { > (1U << RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT), > }; > > +enum rseq_tls_flags_bit { > + /* enum rseq_cs_flags reserves bits 0-2. */ > + RSEQ_TLS_FLAG_SIZE_BIT = 3, > +}; > + > +enum rseq_tls_flags { > + RSEQ_TLS_FLAG_SIZE = (1U << RSEQ_TLS_FLAG_SIZE_BIT), > +}; > + > /* The rseq_len expected by rseq registration is always 32 bytes. */ > enum rseq_len_expected { > RSEQ_LEN_EXPECTED = 32, > @@ -133,8 +142,9 @@ struct rseq { > * > * This field should only be updated by the thread which > * registered this data structure. Read by the kernel. > - * Mainly used for single-stepping through rseq critical sections > - * with debuggers. > + * > + * The RSEQ_CS flags are mainly used for single-stepping through rseq > + * critical sections with debuggers. > * > * - RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT > * Inhibit instruction sequence block restart on preemption > @@ -145,8 +155,31 @@ struct rseq { > * - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE > * Inhibit instruction sequence block restart on migration for > * this thread. > + * > + * - RSEQ_TLS_FLAG_SIZE > + * Extensible struct rseq ABI. This flag should be statically > + * initialized. > */ > __u32 flags; > + /* > + * With __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, user_size should be > + * statically initialized to offsetof(struct rseq, end). > + */ > + __u16 user_size; > + /* > + * With __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, if the kernel supports > + * extensible struct rseq ABI, the kernel_size field is populated by > + * the kernel to the minimum between user_size and the offset of the > + * "end" field within the struct rseq supported by the kernel on > + * successful registration. Should be initialized to 0. > + */ > + __u16 kernel_size; (Btw, this what I suggested - minus the user_size part - when I said "expose the size of struct rseq" the kernel knows about. The approach here is of course more general.) It's pretty uncommon to use __u16 for sizes at least in public facing structs. I'd suggest to use __u32 user_size and __u32 kernel_size and if needed, insert padding. Seems you have done this in your union above already. > + > + /* > + * Very last field of the structure, to calculate size excluding padding > + * with offsetof(). > + */ > + char end[]; Hm, could this mess with alignment or break making the struct extensible? Feels like you're adding new members always before this which is also pretty non-standard in terms of how we'd usually extend structs. > } __attribute__((aligned(4 * sizeof(__u64)))); > > #endif /* _UAPI_LINUX_RSEQ_H */ > diff --git a/kernel/rseq.c b/kernel/rseq.c > index a4f86a9d6937..bbc57fc18573 100644 > --- a/kernel/rseq.c > +++ b/kernel/rseq.c > @@ -96,6 +96,7 @@ static int rseq_update_cpu_id(struct task_struct *t) > static int rseq_reset_rseq_cpu_id(struct task_struct *t) > { > u32 cpu_id_start = 0, cpu_id = RSEQ_CPU_ID_UNINITIALIZED; > + u16 kernel_size = 0; > > /* > * Reset cpu_id_start to its initial state (0). > @@ -109,6 +110,11 @@ static int rseq_reset_rseq_cpu_id(struct task_struct *t) > */ > if (put_user(cpu_id, &t->rseq->cpu_id)) > return -EFAULT; > + /* > + * Reset kernel_size to its initial state (0). > + */ > + if (put_user(kernel_size, &t->rseq->kernel_size)) > + return -EFAULT; > return 0; > } > > @@ -266,7 +272,7 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs) > > if (unlikely(t->flags & PF_EXITING)) > return; > - if (unlikely(!access_ok(t->rseq, sizeof(*t->rseq)))) > + if (unlikely(!access_ok(t->rseq, t->rseq_size))) > goto error; > ret = rseq_ip_fixup(regs); > if (unlikely(ret < 0)) > @@ -294,7 +300,7 @@ void rseq_syscall(struct pt_regs *regs) > > if (!t->rseq) > return; > - if (!access_ok(t->rseq, sizeof(*t->rseq)) || > + if (!access_ok(t->rseq, t->rseq_size) || > rseq_get_rseq_cs(t, &rseq_cs) || in_rseq_cs(ip, &rseq_cs)) > force_sig(SIGSEGV); > } > @@ -308,6 +314,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, > int, flags, u32, sig) > { > int ret; > + u32 tls_flags; > > if (flags & RSEQ_FLAG_UNREGISTER) { > if (flags & ~RSEQ_FLAG_UNREGISTER) > @@ -315,7 +322,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, > /* Unregister rseq for current thread. */ > if (current->rseq != rseq || !current->rseq) > return -EINVAL; > - if (rseq_len != sizeof(*rseq)) > + if (rseq_len != RSEQ_LEN_EXPECTED) So I have to say that I think it's not a great to fix the length of the rseq_len argument basically making it somewhat a nop. If I recall correctly Florian said something about the rseq_len becoming part of the glibc abi and that's why it can't be changed? Is there any way we can avoid that so we can use the rseq_len argument to have userspace pass down the size of struct rseq they know about? It's really unintuitive to pass down an extensible struct but the length argument associated with it is fixed. I also think there should be some compile-time sanity checks here similar to what we do in other places see e.g. BUILD_BUG_ON(sizeof(struct clone_args) != CLONE_ARGS_SIZE_VER2); So here should at least be sm like: BUILD_BUG_ON(sizeof(struct rseq) != RSEQ_LEN_EXPECTED); > return -EINVAL; > if (current->rseq_sig != sig) > return -EPERM; > @@ -323,6 +330,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, > if (ret) > return ret; > current->rseq = NULL; > + current->rseq_size = 0; > current->rseq_sig = 0; > return 0; > } > @@ -336,7 +344,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, > * the provided address differs from the prior > * one. > */ > - if (current->rseq != rseq || rseq_len != sizeof(*rseq)) > + if (current->rseq != rseq || rseq_len != RSEQ_LEN_EXPECTED) > return -EINVAL; > if (current->rseq_sig != sig) > return -EPERM; > @@ -349,10 +357,32 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, > * ensure the provided rseq is properly aligned and valid. > */ > if (!IS_ALIGNED((unsigned long)rseq, __alignof__(*rseq)) || > - rseq_len != sizeof(*rseq)) > + rseq_len != RSEQ_LEN_EXPECTED) > return -EINVAL; > - if (!access_ok(rseq, rseq_len)) > + if (!access_ok(rseq, RSEQ_LEN_EXPECTED)) > return -EFAULT; > + > + /* Handle extensible struct rseq ABI. */ > + ret = get_user(tls_flags, &rseq->flags); > + if (ret) > + return ret; > + if (tls_flags & RSEQ_TLS_FLAG_SIZE) { > + u16 user_size, kernel_size; > + > + ret = get_user(user_size, &rseq->user_size); > + if (ret) > + return ret; > + if (user_size < offsetof(struct rseq, kernel_size) + sizeof(u16)) > + return -EINVAL; > + kernel_size = min_t(u16, user_size, offsetof(struct rseq, end)); > + ret = put_user(kernel_size, &rseq->kernel_size); > + if (ret) > + return ret; > + current->rseq_size = kernel_size; > + } else { > + current->rseq_size = offsetof(struct rseq, user_size); > + } > + > current->rseq = rseq; > current->rseq_sig = sig; > /* > -- > 2.17.1 > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-15 11:38 ` Christian Brauner @ 2020-07-15 12:33 ` Christian Brauner 2020-07-15 15:10 ` Mathieu Desnoyers 0 siblings, 1 reply; 37+ messages in thread From: Christian Brauner @ 2020-07-15 12:33 UTC (permalink / raw) To: Mathieu Desnoyers, Florian Weimer Cc: Peter Zijlstra, linux-kernel, Thomas Gleixner, Paul E . McKenney, Boqun Feng, H . Peter Anvin, Paul Turner, linux-api, carlos On Wed, Jul 15, 2020 at 01:38:51PM +0200, Christian Brauner wrote: > On Mon, Jul 13, 2020 at 11:03:46PM -0400, Mathieu Desnoyers wrote: > > Add a __rseq_abi.flags "RSEQ_TLS_FLAG_SIZE", which indicates support for > > extending struct rseq. This adds two new fields to struct rseq: > > user_size and kernel_size. > > > > The user_size field allows the size of the __rseq_abi definition (which > > can be overridden by symbol interposition either by a preloaded library > > or by the application) to be handed over to the kernel at registration. > > This registration can be performed by a library, e.g. glibc, which does > > not know there is interposition taking place. > > > > The kernel_size is populated by the kernel when the "RSEQ_TLS_FLAG_SIZE" > > flag is set in __rseq_abi.flags to the minimum between user_size and > > the offset of the "end" field of struct rseq as known by the kernel. > > This allows user-space to query which fields are effectively populated > > by the kernel. > > > > A rseq_size field is added to the task struct to keep track of the > > "kernel_size" effective for each thread. > > > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > > --- > > include/linux/sched.h | 4 ++++ > > include/uapi/linux/rseq.h | 37 ++++++++++++++++++++++++++++++++-- > > kernel/rseq.c | 42 +++++++++++++++++++++++++++++++++------ > > 3 files changed, 75 insertions(+), 8 deletions(-) > > > > diff --git a/include/linux/sched.h b/include/linux/sched.h > > index 692e327d7455..5d61a3197987 100644 > > --- a/include/linux/sched.h > > +++ b/include/linux/sched.h > > @@ -1147,6 +1147,7 @@ struct task_struct { > > #ifdef CONFIG_RSEQ > > struct rseq __user *rseq; > > u32 rseq_sig; > > + u32 rseq_size; > > /* > > * RmW on rseq_event_mask must be performed atomically > > * with respect to preemption. > > @@ -1976,10 +1977,12 @@ static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags) > > if (clone_flags & CLONE_VM) { > > t->rseq = NULL; > > t->rseq_sig = 0; > > + t->rseq_size = 0; > > t->rseq_event_mask = 0; > > } else { > > t->rseq = current->rseq; > > t->rseq_sig = current->rseq_sig; > > + t->rseq_size = current->rseq_size; > > t->rseq_event_mask = current->rseq_event_mask; > > } > > } > > @@ -1988,6 +1991,7 @@ static inline void rseq_execve(struct task_struct *t) > > { > > t->rseq = NULL; > > t->rseq_sig = 0; > > + t->rseq_size = 0; > > t->rseq_event_mask = 0; > > } > > > > diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h > > index e11d9df5e564..03c0b5e9a859 100644 > > --- a/include/uapi/linux/rseq.h > > +++ b/include/uapi/linux/rseq.h > > @@ -37,6 +37,15 @@ enum rseq_cs_flags { > > (1U << RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT), > > }; > > > > +enum rseq_tls_flags_bit { > > + /* enum rseq_cs_flags reserves bits 0-2. */ > > + RSEQ_TLS_FLAG_SIZE_BIT = 3, > > +}; > > + > > +enum rseq_tls_flags { > > + RSEQ_TLS_FLAG_SIZE = (1U << RSEQ_TLS_FLAG_SIZE_BIT), > > +}; > > + > > /* The rseq_len expected by rseq registration is always 32 bytes. */ > > enum rseq_len_expected { > > RSEQ_LEN_EXPECTED = 32, > > @@ -133,8 +142,9 @@ struct rseq { > > * > > * This field should only be updated by the thread which > > * registered this data structure. Read by the kernel. > > - * Mainly used for single-stepping through rseq critical sections > > - * with debuggers. > > + * > > + * The RSEQ_CS flags are mainly used for single-stepping through rseq > > + * critical sections with debuggers. > > * > > * - RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT > > * Inhibit instruction sequence block restart on preemption > > @@ -145,8 +155,31 @@ struct rseq { > > * - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE > > * Inhibit instruction sequence block restart on migration for > > * this thread. > > + * > > + * - RSEQ_TLS_FLAG_SIZE > > + * Extensible struct rseq ABI. This flag should be statically > > + * initialized. > > */ > > __u32 flags; > > + /* > > + * With __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, user_size should be > > + * statically initialized to offsetof(struct rseq, end). > > + */ > > + __u16 user_size; > > + /* > > + * With __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, if the kernel supports > > + * extensible struct rseq ABI, the kernel_size field is populated by > > + * the kernel to the minimum between user_size and the offset of the > > + * "end" field within the struct rseq supported by the kernel on > > + * successful registration. Should be initialized to 0. > > + */ > > + __u16 kernel_size; > > (Btw, this what I suggested - minus the user_size part - when I said > "expose the size of struct rseq" the kernel knows about. The approach > here is of course more general.) > > It's pretty uncommon to use __u16 for sizes at least in public facing > structs. I'd suggest to use __u32 user_size and __u32 kernel_size and if > needed, insert padding. Seems you have done this in your union above > already. > > > + > > + /* > > + * Very last field of the structure, to calculate size excluding padding > > + * with offsetof(). > > + */ > > + char end[]; > > Hm, could this mess with alignment or break making the struct > extensible? Feels like you're adding new members always before this > which is also pretty non-standard in terms of how we'd usually extend > structs. > > > } __attribute__((aligned(4 * sizeof(__u64)))); > > > > #endif /* _UAPI_LINUX_RSEQ_H */ > > diff --git a/kernel/rseq.c b/kernel/rseq.c > > index a4f86a9d6937..bbc57fc18573 100644 > > --- a/kernel/rseq.c > > +++ b/kernel/rseq.c > > @@ -96,6 +96,7 @@ static int rseq_update_cpu_id(struct task_struct *t) > > static int rseq_reset_rseq_cpu_id(struct task_struct *t) > > { > > u32 cpu_id_start = 0, cpu_id = RSEQ_CPU_ID_UNINITIALIZED; > > + u16 kernel_size = 0; > > > > /* > > * Reset cpu_id_start to its initial state (0). > > @@ -109,6 +110,11 @@ static int rseq_reset_rseq_cpu_id(struct task_struct *t) > > */ > > if (put_user(cpu_id, &t->rseq->cpu_id)) > > return -EFAULT; > > + /* > > + * Reset kernel_size to its initial state (0). > > + */ > > + if (put_user(kernel_size, &t->rseq->kernel_size)) > > + return -EFAULT; > > return 0; > > } > > > > @@ -266,7 +272,7 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs) > > > > if (unlikely(t->flags & PF_EXITING)) > > return; > > - if (unlikely(!access_ok(t->rseq, sizeof(*t->rseq)))) > > + if (unlikely(!access_ok(t->rseq, t->rseq_size))) > > goto error; > > ret = rseq_ip_fixup(regs); > > if (unlikely(ret < 0)) > > @@ -294,7 +300,7 @@ void rseq_syscall(struct pt_regs *regs) > > > > if (!t->rseq) > > return; > > - if (!access_ok(t->rseq, sizeof(*t->rseq)) || > > + if (!access_ok(t->rseq, t->rseq_size) || > > rseq_get_rseq_cs(t, &rseq_cs) || in_rseq_cs(ip, &rseq_cs)) > > force_sig(SIGSEGV); > > } > > @@ -308,6 +314,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, > > int, flags, u32, sig) > > { > > int ret; > > + u32 tls_flags; > > > > if (flags & RSEQ_FLAG_UNREGISTER) { > > if (flags & ~RSEQ_FLAG_UNREGISTER) > > @@ -315,7 +322,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, > > /* Unregister rseq for current thread. */ > > if (current->rseq != rseq || !current->rseq) > > return -EINVAL; > > - if (rseq_len != sizeof(*rseq)) > > + if (rseq_len != RSEQ_LEN_EXPECTED) > > So I have to say that I think it's not a great to fix the length of the > rseq_len argument basically making it somewhat a nop. If I recall > correctly Florian said something about the rseq_len becoming part of the > glibc abi and that's why it can't be changed? > Is there any way we can avoid that so we can use the rseq_len argument > to have userspace pass down the size of struct rseq they know about? > > It's really unintuitive to pass down an extensible struct but the length > argument associated with it is fixed. > > I also think there should be some compile-time sanity checks here > similar to what we do in other places see e.g. > > BUILD_BUG_ON(sizeof(struct clone_args) != CLONE_ARGS_SIZE_VER2); > > So here should at least be sm like: > > BUILD_BUG_ON(sizeof(struct rseq) != RSEQ_LEN_EXPECTED); > So here's a very free-wheeling draft of roughly what I had in mind. Not even compile-tested just to illustrate what I'd change and sorry if that code will make you sob in your hands: From 2879e3c30dbe6ba0fc53884b1c41deaa444924a8 Mon Sep 17 00:00:00 2001 From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Date: Mon, 13 Jul 2020 23:03:46 -0400 Subject: [PATCH] [UNTESTED] rseq: Allow extending struct rseq Add a __rseq_abi.flags "RSEQ_TLS_FLAG_SIZE", which indicates support for extending struct rseq. This adds two new fields to struct rseq: user_size and kernel_size. The user_size field allows the size of the __rseq_abi definition (which can be overridden by symbol interposition either by a preloaded library or by the application) to be handed over to the kernel at registration. This registration can be performed by a library, e.g. glibc, which does not know there is interposition taking place. The kernel_size is populated by the kernel when the "RSEQ_TLS_FLAG_SIZE" flag is set in __rseq_abi.flags to the minimum between user_size and the offset of the "end" field of struct rseq as known by the kernel. This allows user-space to query which fields are effectively populated by the kernel. A rseq_size field is added to the task struct to keep track of the "kernel_size" effective for each thread. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> --- include/linux/sched.h | 4 +++ include/uapi/linux/rseq.h | 36 ++++++++++++++++++-- kernel/rseq.c | 72 ++++++++++++++++++++++++++++++++++++--- 3 files changed, 105 insertions(+), 7 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 692e327d7455..5d61a3197987 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1147,6 +1147,7 @@ struct task_struct { #ifdef CONFIG_RSEQ struct rseq __user *rseq; u32 rseq_sig; + u32 rseq_size; /* * RmW on rseq_event_mask must be performed atomically * with respect to preemption. @@ -1976,10 +1977,12 @@ static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags) if (clone_flags & CLONE_VM) { t->rseq = NULL; t->rseq_sig = 0; + t->rseq_size = 0; t->rseq_event_mask = 0; } else { t->rseq = current->rseq; t->rseq_sig = current->rseq_sig; + t->rseq_size = current->rseq_size; t->rseq_event_mask = current->rseq_event_mask; } } @@ -1988,6 +1991,7 @@ static inline void rseq_execve(struct task_struct *t) { t->rseq = NULL; t->rseq_sig = 0; + t->rseq_size = 0; t->rseq_event_mask = 0; } diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h index 9a402fdb60e9..f10ce38d9712 100644 --- a/include/uapi/linux/rseq.h +++ b/include/uapi/linux/rseq.h @@ -37,6 +37,15 @@ enum rseq_cs_flags { (1U << RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT), }; +enum rseq_tls_flags_bit { + /* enum rseq_cs_flags reserves bits 0-2. */ + RSEQ_TLS_FLAG_SIZE_BIT = 3, +}; + +enum rseq_tls_flags { + RSEQ_TLS_FLAG_SIZE = (1U << RSEQ_TLS_FLAG_SIZE_BIT), +}; + /* * struct rseq_cs is aligned on 4 * 8 bytes to ensure it is always * contained within a single cache-line. It is usually declared as @@ -128,8 +137,9 @@ struct rseq { * * This field should only be updated by the thread which * registered this data structure. Read by the kernel. - * Mainly used for single-stepping through rseq critical sections - * with debuggers. + * + * The RSEQ_CS flags are mainly used for single-stepping through rseq + * critical sections with debuggers. * * - RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT * Inhibit instruction sequence block restart on preemption @@ -140,8 +150,30 @@ struct rseq { * - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE * Inhibit instruction sequence block restart on migration for * this thread. + * + * - RSEQ_TLS_FLAG_SIZE + * Extensible struct rseq ABI. This flag should be statically + * initialized. */ __u32 flags; + /* + * With __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, user_size should be + * statically initialized to offsetof(struct rseq, end). + */ + __u32 user_size; + /* + * With __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, if the kernel supports + * extensible struct rseq ABI, the kernel_size field is populated by + * the kernel to the minimum between user_size and the offset of the + * "end" field within the struct rseq supported by the kernel on + * successful registration. Should be initialized to 0. + */ + __u32 kernel_size; + __u32 active_size; } __attribute__((aligned(4 * sizeof(__u64)))); +#define RSEQ_SIZE_VER0 24 /* sizeof first published struct */ +#define RSEQ_SIZE_VER1 32 /* sizeof second published struct */ +#define RSEQ_SIZE_LATEST RSEQ_SIZE_VER1 /* sizeof last published struct */ + #endif /* _UAPI_LINUX_RSEQ_H */ diff --git a/kernel/rseq.c b/kernel/rseq.c index a4f86a9d6937..9d28b41a99cd 100644 --- a/kernel/rseq.c +++ b/kernel/rseq.c @@ -96,6 +96,7 @@ static int rseq_update_cpu_id(struct task_struct *t) static int rseq_reset_rseq_cpu_id(struct task_struct *t) { u32 cpu_id_start = 0, cpu_id = RSEQ_CPU_ID_UNINITIALIZED; + u16 kernel_size = 0; /* * Reset cpu_id_start to its initial state (0). @@ -109,6 +110,11 @@ static int rseq_reset_rseq_cpu_id(struct task_struct *t) */ if (put_user(cpu_id, &t->rseq->cpu_id)) return -EFAULT; + /* + * Reset kernel_size to its initial state (0). + */ + if (put_user(kernel_size, &t->rseq->kernel_size)) + return -EFAULT; return 0; } @@ -266,7 +272,7 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs) if (unlikely(t->flags & PF_EXITING)) return; - if (unlikely(!access_ok(t->rseq, sizeof(*t->rseq)))) + if (unlikely(!access_ok(t->rseq, t->rseq_size))) goto error; ret = rseq_ip_fixup(regs); if (unlikely(ret < 0)) @@ -294,7 +300,7 @@ void rseq_syscall(struct pt_regs *regs) if (!t->rseq) return; - if (!access_ok(t->rseq, sizeof(*t->rseq)) || + if (!access_ok(t->rseq, t->rseq_size) || rseq_get_rseq_cs(t, &rseq_cs) || in_rseq_cs(ip, &rseq_cs)) force_sig(SIGSEGV); } @@ -308,6 +314,11 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, int, flags, u32, sig) { int ret; + u32 tls_flags; + + BUILD_BUG_ON(offsetofend(struct rseq, flags) != RSEQ_SIZE_VER0); + BUILD_BUG_ON(offsetofend(struct rseq, active_size) != RSEQ_SIZE_VER1); + BUILD_BUG_ON(sizeof(struct rseq) != RSEQ_SIZE_LATEST); if (flags & RSEQ_FLAG_UNREGISTER) { if (flags & ~RSEQ_FLAG_UNREGISTER) @@ -315,14 +326,17 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, /* Unregister rseq for current thread. */ if (current->rseq != rseq || !current->rseq) return -EINVAL; - if (rseq_len != sizeof(*rseq)) + if (rseq_len < RSEQ_SIZE_VER0) return -EINVAL; + if (!access_ok(rseq, rseq_len)) + return -EFAULT; if (current->rseq_sig != sig) return -EPERM; ret = rseq_reset_rseq_cpu_id(current); if (ret) return ret; current->rseq = NULL; + current->rseq_size = 0; current->rseq_sig = 0; return 0; } @@ -336,7 +350,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, * the provided address differs from the prior * one. */ - if (current->rseq != rseq || rseq_len != sizeof(*rseq)) + if (current->rseq != rseq || rseq_len != RSEQ_LEN_EXPECTED) return -EINVAL; if (current->rseq_sig != sig) return -EPERM; @@ -349,10 +363,58 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, * ensure the provided rseq is properly aligned and valid. */ if (!IS_ALIGNED((unsigned long)rseq, __alignof__(*rseq)) || - rseq_len != sizeof(*rseq)) + rseq_len < RSEQ_SIZE_VER0) return -EINVAL; if (!access_ok(rseq, rseq_len)) return -EFAULT; + + /* Handle extensible struct rseq ABI. */ + ret = get_user(tls_flags, &rseq->flags); + if (ret) + return ret; + if (tls_flags & RSEQ_TLS_FLAG_SIZE) { + u32 user_size, kernel_size, active_size; + + /* Can probably be made nicer by using check_zeroed_user(). */ + ret = get_user(user_size, &rseq->user_size); + if (ret) + return ret; + if (user_size != 0) + return -EINVAL; + + ret = get_user(active_size, &rseq->active_size); + if (ret) + return ret; + if (active_size != 0) + return -EINVAL; + + ret = get_user(active_size, &rseq->kernel_size); + if (ret) + return ret; + if (kernel_size != 0) + return -EINVAL; + + /* Calculate the useable size. */ + active_size = min_t(u32, rseq_len, RSEQ_SIZE_LATEST); + ret = put_user(active_size, &rseq->active_size); + if (ret) + return ret; + + /* Let other users know what userspace used to register. */ + ret = put_user(rseq_len, &rseq->user_size); + if (ret) + return -EFAULT; + + /* Let other users know what size the kernel supports. */ + ret = put_user(RSEQ_SIZE_LATEST, &rseq->kernel_size); + if (ret) + return -EFAULT; + + current->rseq_size = active_size; + } else { + current->rseq_size = RSEQ_SIZE_VER0; + } + current->rseq = rseq; current->rseq_sig = sig; /* -- 2.27.0 ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-15 12:33 ` Christian Brauner @ 2020-07-15 15:10 ` Mathieu Desnoyers 2020-07-15 15:33 ` Christian Brauner 0 siblings, 1 reply; 37+ messages in thread From: Mathieu Desnoyers @ 2020-07-15 15:10 UTC (permalink / raw) To: Christian Brauner Cc: Florian Weimer, Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api, carlos ----- On Jul 15, 2020, at 8:33 AM, Christian Brauner christian.brauner@ubuntu.com wrote: [...] > > So here's a very free-wheeling draft of roughly what I had in mind. Not > even compile-tested just to illustrate what I'd change and sorry if that > code will make you sob in your hands: > [...] > + /* > + * With __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, user_size should be > + * statically initialized to offsetof(struct rseq, end). > + */ > + __u32 user_size; > + /* > + * With __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, if the kernel supports > + * extensible struct rseq ABI, the kernel_size field is populated by > + * the kernel to the minimum between user_size and the offset of the > + * "end" field within the struct rseq supported by the kernel on > + * successful registration. Should be initialized to 0. > + */ > + __u32 kernel_size; Moving from __u16 to __u32 for both fields don't achieve much, and increase the size of struct rseq (excluding padding) from 24 bytes to 28 bytes. Note that the struct rseq alignment is 32 bytes. At 24 bytes, it leaves room for exactly one 8 bytes pointer, which can be useful for future extensions. If the size is increased to 28 bytes or more, then we're done and cannot add a pointer. > + __u32 active_size; This additional field takes the very last bytes of padding we have in the current layout. > } __attribute__((aligned(4 * sizeof(__u64)))); > > +#define RSEQ_SIZE_VER0 24 /* sizeof first published struct */ This is incorrect. The sizeof(struct_rseq) with its 32 bytes alignment is 32, not 24. The padding at the end of the structure is considered as part of its size, but we cannot rely on its content being zero-initialized based on the C standard. > +#define RSEQ_SIZE_VER1 32 /* sizeof second published struct */ > +#define RSEQ_SIZE_LATEST RSEQ_SIZE_VER1 /* sizeof last published struct */ > + [...] > @@ -349,10 +363,58 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, > rseq_len, > * ensure the provided rseq is properly aligned and valid. > */ > if (!IS_ALIGNED((unsigned long)rseq, __alignof__(*rseq)) || > - rseq_len != sizeof(*rseq)) > + rseq_len < RSEQ_SIZE_VER0) This could perhaps be changed for future kernels, but will break for existing kernels as soon as rseq_len is increased. This is something we should have planned for in the initial implementation of the system call, but here we are. How do you envision that userspace would handle this failure from older kernels ? Try again with a second system call passing RSEQ_SIZE_VER0 as argument ? > return -EINVAL; > if (!access_ok(rseq, rseq_len)) > return -EFAULT; > + > + /* Handle extensible struct rseq ABI. */ > + ret = get_user(tls_flags, &rseq->flags); > + if (ret) > + return ret; > + if (tls_flags & RSEQ_TLS_FLAG_SIZE) { > + u32 user_size, kernel_size, active_size; > + > + /* Can probably be made nicer by using check_zeroed_user(). */ > + ret = get_user(user_size, &rseq->user_size); > + if (ret) > + return ret; > + if (user_size != 0) > + return -EINVAL; > + > + ret = get_user(active_size, &rseq->active_size); > + if (ret) > + return ret; > + if (active_size != 0) > + return -EINVAL; > + > + ret = get_user(active_size, &rseq->kernel_size); I guess you mean kernel_size here. > + if (ret) > + return ret; > + if (kernel_size != 0) > + return -EINVAL; > + > + /* Calculate the useable size. */ > + active_size = min_t(u32, rseq_len, RSEQ_SIZE_LATEST); Where is the rseq_len supposed to come from in userspace ? Should it be that the code doing the registration uses sizeof(struct rseq), or offsetof(struct rseq, end), or should it read the content of __rseq_abi.user_size ? > + ret = put_user(active_size, &rseq->active_size); > + if (ret) > + return ret; > + > + /* Let other users know what userspace used to register. */ > + ret = put_user(rseq_len, &rseq->user_size); > + if (ret) > + return -EFAULT; > + > + /* Let other users know what size the kernel supports. */ I am not sure what those 3 __u32 fields (user_size, kernel_size, and active_size), plus use of the rseq_len syscall parameter, accomplish which was not accomplished by my __u16 user_size + kernel_size approach ? If anything, it seems to make support of older kernels which do not support an extended rseq_len parameter more complex. Thanks, Mathieu > + ret = put_user(RSEQ_SIZE_LATEST, &rseq->kernel_size); > + if (ret) > + return -EFAULT; > + > + current->rseq_size = active_size; > + } else { > + current->rseq_size = RSEQ_SIZE_VER0; > + } > + > current->rseq = rseq; > current->rseq_sig = sig; > /* > -- > 2.27.0 -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 2/4] rseq: Allow extending struct rseq 2020-07-15 15:10 ` Mathieu Desnoyers @ 2020-07-15 15:33 ` Christian Brauner 0 siblings, 0 replies; 37+ messages in thread From: Christian Brauner @ 2020-07-15 15:33 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Florian Weimer, Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api, carlos On Wed, Jul 15, 2020 at 11:10:47AM -0400, Mathieu Desnoyers wrote: > ----- On Jul 15, 2020, at 8:33 AM, Christian Brauner christian.brauner@ubuntu.com wrote: > [...] > > > > So here's a very free-wheeling draft of roughly what I had in mind. Not > > even compile-tested just to illustrate what I'd change and sorry if that > > code will make you sob in your hands: > > > [...] > > + /* > > + * With __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, user_size should be > > + * statically initialized to offsetof(struct rseq, end). > > + */ > > + __u32 user_size; > > + /* > > + * With __rseq_abi.flags RSEQ_TLS_FLAG_SIZE set, if the kernel supports > > + * extensible struct rseq ABI, the kernel_size field is populated by > > + * the kernel to the minimum between user_size and the offset of the > > + * "end" field within the struct rseq supported by the kernel on > > + * successful registration. Should be initialized to 0. > > + */ > > + __u32 kernel_size; > > Moving from __u16 to __u32 for both fields don't achieve much, and increase > the size of struct rseq (excluding padding) from 24 bytes to 28 bytes. > > Note that the struct rseq alignment is 32 bytes. At 24 bytes, it leaves room > for exactly one 8 bytes pointer, which can be useful for future extensions. > If the size is increased to 28 bytes or more, then we're done and cannot > add a pointer. > > > + __u32 active_size; > > This additional field takes the very last bytes of padding we have in the > current layout. > > > } __attribute__((aligned(4 * sizeof(__u64)))); > > > > +#define RSEQ_SIZE_VER0 24 /* sizeof first published struct */ > > This is incorrect. The sizeof(struct_rseq) with its 32 bytes alignment is 32, > not 24. The padding at the end of the structure is considered as part of its > size, but we cannot rely on its content being zero-initialized based on the > C standard. > > > +#define RSEQ_SIZE_VER1 32 /* sizeof second published struct */ > > +#define RSEQ_SIZE_LATEST RSEQ_SIZE_VER1 /* sizeof last published struct */ > > + > > [...] > > > @@ -349,10 +363,58 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, > > rseq_len, > > * ensure the provided rseq is properly aligned and valid. > > */ > > if (!IS_ALIGNED((unsigned long)rseq, __alignof__(*rseq)) || > > - rseq_len != sizeof(*rseq)) > > + rseq_len < RSEQ_SIZE_VER0) > > This could perhaps be changed for future kernels, but will break for existing > kernels as soon as rseq_len is increased. This is something we should have > planned for in the initial implementation of the system call, but here we are. > > How do you envision that userspace would handle this failure from older kernels ? > Try again with a second system call passing RSEQ_SIZE_VER0 as argument ? > > > return -EINVAL; > > if (!access_ok(rseq, rseq_len)) > > return -EFAULT; > > + > > + /* Handle extensible struct rseq ABI. */ > > + ret = get_user(tls_flags, &rseq->flags); > > + if (ret) > > + return ret; > > + if (tls_flags & RSEQ_TLS_FLAG_SIZE) { > > + u32 user_size, kernel_size, active_size; > > + > > + /* Can probably be made nicer by using check_zeroed_user(). */ > > + ret = get_user(user_size, &rseq->user_size); > > + if (ret) > > + return ret; > > + if (user_size != 0) > > + return -EINVAL; > > + > > + ret = get_user(active_size, &rseq->active_size); > > + if (ret) > > + return ret; > > + if (active_size != 0) > > + return -EINVAL; > > + > > + ret = get_user(active_size, &rseq->kernel_size); > > I guess you mean kernel_size here. > > > + if (ret) > > + return ret; > > + if (kernel_size != 0) > > + return -EINVAL; > > + > > + /* Calculate the useable size. */ > > + active_size = min_t(u32, rseq_len, RSEQ_SIZE_LATEST); > > Where is the rseq_len supposed to come from in userspace ? Should it be > that the code doing the registration uses sizeof(struct rseq), or offsetof(struct rseq, end), > or should it read the content of __rseq_abi.user_size ? > > > + ret = put_user(active_size, &rseq->active_size); > > + if (ret) > > + return ret; > > + > > + /* Let other users know what userspace used to register. */ > > + ret = put_user(rseq_len, &rseq->user_size); > > + if (ret) > > + return -EFAULT; > > + > > + /* Let other users know what size the kernel supports. */ > > I am not sure what those 3 __u32 fields (user_size, kernel_size, and active_size), > plus use of the rseq_len syscall parameter, accomplish which was not accomplished > by my __u16 user_size + kernel_size approach ? If anything, it seems to make support > of older kernels which do not support an extended rseq_len parameter more complex. Yeah, fair point. I really just sketched this. It seemed to me that what you might want to expose all three sizes in some form, i.e. the size the kernel knows about, the size that userspace knows about and the size in use. The advantage of exposing the size the kernel itself knows about and the size that userspace knows about is that you can infer the used size and you don't loose any information. When you only register the kernel used size and the size userspace knows about you can't necessarily infer what size the kernel supports. But I suppose there's no obvious case where this is needed rn. Thanks! Christian ^ permalink raw reply [flat|nested] 37+ messages in thread
* [RFC PATCH 3/4] selftests: rseq: define __rseq_abi with extensible size 2020-07-14 3:03 [RFC PATCH 0/4] rseq: Introduce extensible struct rseq Mathieu Desnoyers 2020-07-14 3:03 ` [RFC PATCH 1/4] selftests: rseq: Use fixed value as rseq_len parameter Mathieu Desnoyers 2020-07-14 3:03 ` [RFC PATCH 2/4] rseq: Allow extending struct rseq Mathieu Desnoyers @ 2020-07-14 3:03 ` Mathieu Desnoyers 2020-07-14 3:03 ` [RFC PATCH 4/4] selftests: rseq: print rseq extensible size in basic test Mathieu Desnoyers 2020-07-14 20:55 ` [RFC PATCH 0/4] rseq: Introduce extensible struct rseq Carlos O'Donell 4 siblings, 0 replies; 37+ messages in thread From: Mathieu Desnoyers @ 2020-07-14 3:03 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, Thomas Gleixner, Paul E . McKenney, Boqun Feng, H . Peter Anvin, Paul Turner, linux-api, Christian Brauner, Florian Weimer, carlos, Mathieu Desnoyers Define the __rseq_abi with the extensible size feature. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> --- tools/testing/selftests/rseq/rseq.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c index da2264c679b9..2c29215d4790 100644 --- a/tools/testing/selftests/rseq/rseq.c +++ b/tools/testing/selftests/rseq/rseq.c @@ -26,6 +26,7 @@ #include <assert.h> #include <signal.h> #include <limits.h> +#include <stddef.h> #include "rseq.h" @@ -33,6 +34,8 @@ __thread volatile struct rseq __rseq_abi = { .cpu_id = RSEQ_CPU_ID_UNINITIALIZED, + .flags = RSEQ_TLS_FLAG_SIZE, + .user_size = offsetof(struct rseq, end), }; /* -- 2.17.1 ^ permalink raw reply related [flat|nested] 37+ messages in thread
* [RFC PATCH 4/4] selftests: rseq: print rseq extensible size in basic test 2020-07-14 3:03 [RFC PATCH 0/4] rseq: Introduce extensible struct rseq Mathieu Desnoyers ` (2 preceding siblings ...) 2020-07-14 3:03 ` [RFC PATCH 3/4] selftests: rseq: define __rseq_abi with extensible size Mathieu Desnoyers @ 2020-07-14 3:03 ` Mathieu Desnoyers 2020-07-14 20:55 ` [RFC PATCH 0/4] rseq: Introduce extensible struct rseq Carlos O'Donell 4 siblings, 0 replies; 37+ messages in thread From: Mathieu Desnoyers @ 2020-07-14 3:03 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, Thomas Gleixner, Paul E . McKenney, Boqun Feng, H . Peter Anvin, Paul Turner, linux-api, Christian Brauner, Florian Weimer, carlos, Mathieu Desnoyers Print whether extensible size feature is supported by the kernel and __rseq_abi definition, along with the contents of the kernel_size field if it is available. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> --- tools/testing/selftests/rseq/basic_test.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/tools/testing/selftests/rseq/basic_test.c b/tools/testing/selftests/rseq/basic_test.c index d8efbfb89193..976e040574a1 100644 --- a/tools/testing/selftests/rseq/basic_test.c +++ b/tools/testing/selftests/rseq/basic_test.c @@ -10,6 +10,7 @@ #include <stdio.h> #include <string.h> #include <sys/time.h> +#include <inttypes.h> #include "rseq.h" @@ -35,6 +36,17 @@ void test_cpu_pointer(void) sched_setaffinity(0, sizeof(affinity), &affinity); } +static void print_rseq_size(void) +{ + bool supported = (__rseq_abi.flags & RSEQ_TLS_FLAG_SIZE) && __rseq_abi.kernel_size != 0; + + printf("extensible struct rseq supported: %s", + supported ? "yes" : "no"); + if (supported) + printf(" (kernel_size=%" PRIu16 ")", __rseq_abi.kernel_size); + printf("\n"); +} + int main(int argc, char **argv) { if (rseq_register_current_thread()) { @@ -42,6 +54,9 @@ int main(int argc, char **argv) errno, strerror(errno)); goto init_thread_error; } + + print_rseq_size(); + printf("testing current cpu\n"); test_cpu_pointer(); if (rseq_unregister_current_thread()) { -- 2.17.1 ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 0/4] rseq: Introduce extensible struct rseq 2020-07-14 3:03 [RFC PATCH 0/4] rseq: Introduce extensible struct rseq Mathieu Desnoyers ` (3 preceding siblings ...) 2020-07-14 3:03 ` [RFC PATCH 4/4] selftests: rseq: print rseq extensible size in basic test Mathieu Desnoyers @ 2020-07-14 20:55 ` Carlos O'Donell 2020-07-15 13:02 ` Mathieu Desnoyers 2020-07-15 15:12 ` Florian Weimer 4 siblings, 2 replies; 37+ messages in thread From: Carlos O'Donell @ 2020-07-14 20:55 UTC (permalink / raw) To: Mathieu Desnoyers, Peter Zijlstra Cc: linux-kernel, Thomas Gleixner, Paul E . McKenney, Boqun Feng, H . Peter Anvin, Paul Turner, linux-api, Christian Brauner, Florian Weimer On 7/13/20 11:03 PM, Mathieu Desnoyers wrote: > Recent discussion led to a solution for extending struct rseq. This is > an implementation of the proposed solution. > > Now is a good time to agree on this scheme before the release of glibc > 2.32, just in case there are small details to fix on the user-space > side in order to allow extending struct rseq. Adding extensibility to the rseq registration process would be great, but we are out of time for the glibc 2.32 release. Should we revert rseq for glibc 2.32 and spend quality time discussing the implications of an extensible design, something that Google already says they are doing? We can, with a clear head, and an agreed upon extension mechanism include rseq in glibc 2.33 (release scheduled for Feburary 1st 2021). We release time boxed every 6 months, no deviation, so you know when your next merge window will be. We have already done the hard work of fixing the nesting signal handler issues, and glibc integration. If we revert today that will also give time for Firefox and Chrome to adjust their sandboxes. Do you wish to go forward with rseq as we have it in glibc 2.32, or do you wish to revert rseq from glibc 2.32, discuss the extension mechanism, and put it back into glibc 2.33 with adjustments? -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 0/4] rseq: Introduce extensible struct rseq 2020-07-14 20:55 ` [RFC PATCH 0/4] rseq: Introduce extensible struct rseq Carlos O'Donell @ 2020-07-15 13:02 ` Mathieu Desnoyers 2020-07-16 13:39 ` Carlos O'Donell 2020-07-15 15:12 ` Florian Weimer 1 sibling, 1 reply; 37+ messages in thread From: Mathieu Desnoyers @ 2020-07-15 13:02 UTC (permalink / raw) To: carlos Cc: Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api, Christian Brauner, Florian Weimer, Chris Kennelly, Linus Torvalds ----- On Jul 14, 2020, at 4:55 PM, carlos carlos@redhat.com wrote: > On 7/13/20 11:03 PM, Mathieu Desnoyers wrote: >> Recent discussion led to a solution for extending struct rseq. This is >> an implementation of the proposed solution. >> >> Now is a good time to agree on this scheme before the release of glibc >> 2.32, just in case there are small details to fix on the user-space >> side in order to allow extending struct rseq. > > Adding extensibility to the rseq registration process would be great, > but we are out of time for the glibc 2.32 release. Of course, and my goal is not to add this support for extensibility before glibc 2.32, but merely to see if we need to change anything in the way it uses rseq today (before the release) in order to facilitate extensibility in the future. > Should we revert rseq for glibc 2.32 and spend quality time discussing > the implications of an extensible design, something that Google already > says they are doing? Google's approach is limited to contexts simpler than multiple unrelated libraries scenarios. Peter Oskolkov stated as a follow-up that my extension approach would be one way to deal with problems associated with sharing __rseq_abi between unrelated libraries: https://lore.kernel.org/lkml/CAPNVh5fiCCJpyeLj_ciWzFrO4fasVXZNhpfKXJhJWJirXdJOjQ@mail.gmail.com/ The fact that Google already have their own rseq extensions internally confirms that planning for extensibility is needed. > We can, with a clear head, and an agreed upon extension mechanism > include rseq in glibc 2.33 (release scheduled for Feburary 1st 2021). > We release time boxed every 6 months, no deviation, so you know when > your next merge window will be. > > We have already done the hard work of fixing the nesting signal > handler issues, and glibc integration. If we revert today that will > also give time for Firefox and Chrome to adjust their sandboxes. > > Do you wish to go forward with rseq as we have it in glibc 2.32, > or do you wish to revert rseq from glibc 2.32, discuss the extension > mechanism, and put it back into glibc 2.33 with adjustments? So here we have a catch-22 situation. Linus wants to see how rseq is being used before accepting additional features (ref. https://lore.kernel.org/lkml/CAHk-=wjk-2c4XvWjdzc-bs9Hbgvy-p7ASSnKKphggr5qDoXRDQ@mail.gmail.com/). This lack of ability to allow user-space to make any large-scale use of the rseq system call in a coordinated fashion blocks wide use of rseq. This coordination is supposed to be done by glibc, and I told every user-space project maintainer who contacted me to hold off using rseq until it is integrated into glibc. "tcmalloc" from Google is the exception because they do not care about ABI compatibility with other libraries (they are OK with a breakage and requiring upgrade). The process I'm going through right now is checking what are our options for extending rseq starting from the current ABI, just to see if we are painting ourselves in a corner with the current glibc integration. However, if we postpone integration of rseq into glibc because of possible future extensibility features, those may never happen because of the lack of usage feedback, due of lack of users, due to lack of coordinated ABI registration. At this point, the main question I would like answered is whether it would be acceptable to increase the size and alignment of the __rseq_abi symbol (which will be exposed by glibc) between e.g. glibc 2.32 and 2.33. If it's not possible, then we can find other solutions, for instance using an indirection with a pointer to an extended structure, but this appears to be slightly less efficient. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 0/4] rseq: Introduce extensible struct rseq 2020-07-15 13:02 ` Mathieu Desnoyers @ 2020-07-16 13:39 ` Carlos O'Donell 2020-07-16 14:45 ` Mathieu Desnoyers 0 siblings, 1 reply; 37+ messages in thread From: Carlos O'Donell @ 2020-07-16 13:39 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api, Christian Brauner, Florian Weimer, Chris Kennelly, Linus Torvalds On 7/15/20 9:02 AM, Mathieu Desnoyers wrote: > At this point, the main question I would like answered is whether > it would be acceptable to increase the size and alignment of > the __rseq_abi symbol (which will be exposed by glibc) between > e.g. glibc 2.32 and 2.33. If it's not possible, then we can > find other solutions, for instance using an indirection with > a pointer to an extended structure, but this appears to be > slightly less efficient. The answer is always a soft "maybe" because it depends exactly on how we do it and what consequences we are willing to accept in the design. For example, static applications that call dlopen will fail if we increase the alignment beyond 32 because we had to special case this scenario. Why did we have to special case it? Because the "static" part of the runtime needs to create the initial thread's static TLS space, and since it doesn't know apriori what will be loaded in the shared library, it needs to make a "best guess" at the alignment requirement at startup. We need to discuss this and agree that it's OK. We already want to deprecate dynamic loading from static applications, so this may not be a problem in general, but I hope you see my point. That there are corner cases to be considered and ironed out. I want to see a detailed design document explaining the various compatibility issues and how we solve them along with the way the extension mechanism would work and how it would be compliant with C/C++ language rules in userspace without adding undue burden of potentially having to use atomic instructions all the time. This includes discussing how the headers change. We should also talk out the options for symbol versioning and their consequences. I haven't seen enough details, and there isn't really enough time to discuss this. I think it is *great* that we are discussing it, but it's safest if we revert rseq, finish the discussion, and then finalize the inclusion for 2.33 with these details ironed out. I feel like we've made all the technical process we need to actually include rseq in glibc, but this discussion, and the google example (even if it doesn't match our use case) shows that if we spend another month hammering out the extension details could yield something we can use for years to come while we work out other details e.g. cpu_opv. I can set aside time in the next month to write up such a document and discuss these issues with you and Florian. The text would form even more of the language we'd have to include in the man page for the feature. In the meantime I think we should revert rseq in glibc and take our time to hash this out without the looming deadline of August 1st for the ABI going out the door. I know this is disappointing, but I think in a month you'll look back at this, we'll have Fedora Rawhide using the new extensible version (and you'll be able to point people at that), and we'll only be 5 months away from an official release with extensible rseq. Could you please respond to Florian's request to revert here? https://sourceware.org/pipermail/libc-alpha/2020-July/116368.html I'm looking for a Signed-off-by from you that you're OK with reverting. -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 0/4] rseq: Introduce extensible struct rseq 2020-07-16 13:39 ` Carlos O'Donell @ 2020-07-16 14:45 ` Mathieu Desnoyers 0 siblings, 0 replies; 37+ messages in thread From: Mathieu Desnoyers @ 2020-07-16 14:45 UTC (permalink / raw) To: carlos Cc: Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api, Christian Brauner, Florian Weimer, Chris Kennelly, Linus Torvalds ----- On Jul 16, 2020, at 9:39 AM, carlos carlos@redhat.com wrote: > On 7/15/20 9:02 AM, Mathieu Desnoyers wrote: >> At this point, the main question I would like answered is whether >> it would be acceptable to increase the size and alignment of >> the __rseq_abi symbol (which will be exposed by glibc) between >> e.g. glibc 2.32 and 2.33. If it's not possible, then we can >> find other solutions, for instance using an indirection with >> a pointer to an extended structure, but this appears to be >> slightly less efficient. > > The answer is always a soft "maybe" because it depends exactly > on how we do it and what consequences we are willing to accept > in the design. > > For example, static applications that call dlopen will fail if > we increase the alignment beyond 32 because we had to special > case this scenario. Why did we have to special case it? Because > the "static" part of the runtime needs to create the initial > thread's static TLS space, and since it doesn't know apriori > what will be loaded in the shared library, it needs to make a > "best guess" at the alignment requirement at startup. > We need to discuss this and agree that it's OK. We already want > to deprecate dynamic loading from static applications, so this > may not be a problem in general, but I hope you see my point. > That there are corner cases to be considered and ironed out. Note that I don't foresee we will explicitly need to increase the alignment value for __rseq_abi beyond 32, but I was merely asking this for sake of completeness, in case extending struct rseq beyond a certain limit ever happens to increase the minimum alignment. > > I want to see a detailed design document explaining the various > compatibility issues and how we solve them along with the way > the extension mechanism would work and how it would be compliant > with C/C++ language rules in userspace without adding undue burden > of potentially having to use atomic instructions all the time. > This includes discussing how the headers change. We should also > talk out the options for symbol versioning and their consequences. > > I haven't seen enough details, and there isn't really enough > time to discuss this. I think it is *great* that we are discussing > it, but it's safest if we revert rseq, finish the discussion, > and then finalize the inclusion for 2.33 with these details > ironed out. Yes, absolutely. > > I feel like we've made all the technical process we need to actually > include rseq in glibc, but this discussion, and the google example > (even if it doesn't match our use case) shows that if we spend another > month hammering out the extension details could yield something we > can use for years to come while we work out other details e.g. cpu_opv. Indeed. Note that the current approach proposed to replace cpu_opv is "sched_pair_cpu", ref. https://lore.kernel.org/lkml/20200619202516.7109-1-mathieu.desnoyers@efficios.com/ > I can set aside time in the next month to write up such a document > and discuss these issues with you and Florian. The text would form > even more of the language we'd have to include in the man page for > the feature. I'll do my best to secure some time to work with you on this in the next month, but I will really have to focus on other projects which I had to delay to make sure the rseq integration was ready for glibc 2.32. > In the meantime I think we should revert rseq in glibc and take > our time to hash this out without the looming deadline of August 1st > for the ABI going out the door. > > I know this is disappointing, but I think in a month you'll look > back at this, we'll have Fedora Rawhide using the new extensible > version (and you'll be able to point people at that), and we'll > only be 5 months away from an official release with extensible > rseq. If this delay gives us a future-proof extensible rseq ABI, I'm absolutely for it! > Could you please respond to Florian's request to revert here? > https://sourceware.org/pipermail/libc-alpha/2020-July/116368.html > > I'm looking for a Signed-off-by from you that you're OK with > reverting. Will do, thanks! Mathieu > > -- > Cheers, > Carlos. -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 0/4] rseq: Introduce extensible struct rseq 2020-07-14 20:55 ` [RFC PATCH 0/4] rseq: Introduce extensible struct rseq Carlos O'Donell 2020-07-15 13:02 ` Mathieu Desnoyers @ 2020-07-15 15:12 ` Florian Weimer 2020-07-15 15:32 ` Mathieu Desnoyers 1 sibling, 1 reply; 37+ messages in thread From: Florian Weimer @ 2020-07-15 15:12 UTC (permalink / raw) To: Carlos O'Donell Cc: Mathieu Desnoyers, Peter Zijlstra, linux-kernel, Thomas Gleixner, Paul E . McKenney, Boqun Feng, H . Peter Anvin, Paul Turner, linux-api, Christian Brauner, Florian Weimer * Carlos O'Donell: > On 7/13/20 11:03 PM, Mathieu Desnoyers wrote: >> Recent discussion led to a solution for extending struct rseq. This is >> an implementation of the proposed solution. >> >> Now is a good time to agree on this scheme before the release of glibc >> 2.32, just in case there are small details to fix on the user-space >> side in order to allow extending struct rseq. > > Adding extensibility to the rseq registration process would be great, > but we are out of time for the glibc 2.32 release. > > Should we revert rseq for glibc 2.32 and spend quality time discussing > the implications of an extensible design, something that Google already > says they are doing? > > We can, with a clear head, and an agreed upon extension mechanism > include rseq in glibc 2.33 (release scheduled for Feburary 1st 2021). > We release time boxed every 6 months, no deviation, so you know when > your next merge window will be. > > We have already done the hard work of fixing the nesting signal > handler issues, and glibc integration. If we revert today that will > also give time for Firefox and Chrome to adjust their sandboxes. > > Do you wish to go forward with rseq as we have it in glibc 2.32, > or do you wish to revert rseq from glibc 2.32, discuss the extension > mechanism, and put it back into glibc 2.33 with adjustments? I posted the glibc revert: <https://sourceware.org/pipermail/libc-alpha/2020-July/116368.html> I do not think we have any other choice at this point. Thanks, Florian ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [RFC PATCH 0/4] rseq: Introduce extensible struct rseq 2020-07-15 15:12 ` Florian Weimer @ 2020-07-15 15:32 ` Mathieu Desnoyers 0 siblings, 0 replies; 37+ messages in thread From: Mathieu Desnoyers @ 2020-07-15 15:32 UTC (permalink / raw) To: Florian Weimer Cc: carlos, Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng, H. Peter Anvin, Paul Turner, linux-api, Christian Brauner, Florian Weimer, Linus Torvalds ----- On Jul 15, 2020, at 11:12 AM, Florian Weimer fweimer@redhat.com wrote: > * Carlos O'Donell: > >> On 7/13/20 11:03 PM, Mathieu Desnoyers wrote: >>> Recent discussion led to a solution for extending struct rseq. This is >>> an implementation of the proposed solution. >>> >>> Now is a good time to agree on this scheme before the release of glibc >>> 2.32, just in case there are small details to fix on the user-space >>> side in order to allow extending struct rseq. >> >> Adding extensibility to the rseq registration process would be great, >> but we are out of time for the glibc 2.32 release. >> >> Should we revert rseq for glibc 2.32 and spend quality time discussing >> the implications of an extensible design, something that Google already >> says they are doing? >> >> We can, with a clear head, and an agreed upon extension mechanism >> include rseq in glibc 2.33 (release scheduled for Feburary 1st 2021). >> We release time boxed every 6 months, no deviation, so you know when >> your next merge window will be. >> >> We have already done the hard work of fixing the nesting signal >> handler issues, and glibc integration. If we revert today that will >> also give time for Firefox and Chrome to adjust their sandboxes. >> >> Do you wish to go forward with rseq as we have it in glibc 2.32, >> or do you wish to revert rseq from glibc 2.32, discuss the extension >> mechanism, and put it back into glibc 2.33 with adjustments? > > I posted the glibc revert: > > <https://sourceware.org/pipermail/libc-alpha/2020-July/116368.html> > > I do not think we have any other choice at this point. This is indeed the safe course of action. Let's hope the overall interest about rseq will be sufficient to justify integrating extensibility support in the rseq system call ABI, otherwise we have a catch-22 situation where everything is stalled again, due to further progress on rseq kernel features awaiting user feedback on the existing implementation, which will never come because the integration of coordinated use across libraries is awaiting further development at the kernel level. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2020-07-16 14:45 UTC | newest] Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-07-14 3:03 [RFC PATCH 0/4] rseq: Introduce extensible struct rseq Mathieu Desnoyers 2020-07-14 3:03 ` [RFC PATCH 1/4] selftests: rseq: Use fixed value as rseq_len parameter Mathieu Desnoyers 2020-07-14 3:03 ` [RFC PATCH 2/4] rseq: Allow extending struct rseq Mathieu Desnoyers 2020-07-14 9:58 ` Florian Weimer 2020-07-14 12:50 ` Mathieu Desnoyers 2020-07-14 13:00 ` Florian Weimer 2020-07-14 13:19 ` Mathieu Desnoyers 2020-07-14 21:30 ` Carlos O'Donell 2020-07-15 13:12 ` Mathieu Desnoyers 2020-07-15 13:22 ` Florian Weimer 2020-07-15 13:31 ` Mathieu Desnoyers 2020-07-15 13:42 ` Florian Weimer 2020-07-15 13:55 ` Christian Brauner 2020-07-15 14:20 ` Mathieu Desnoyers 2020-07-15 14:54 ` Mathieu Desnoyers 2020-07-15 14:58 ` Florian Weimer 2020-07-15 15:26 ` Mathieu Desnoyers 2020-07-14 17:24 ` Peter Oskolkov 2020-07-14 17:43 ` Mathieu Desnoyers 2020-07-14 18:33 ` Peter Oskolkov 2020-07-15 2:34 ` Chris Kennelly 2020-07-15 6:31 ` Florian Weimer 2020-07-15 10:59 ` Christian Brauner 2020-07-15 14:38 ` Mathieu Desnoyers 2020-07-15 14:50 ` Mathieu Desnoyers 2020-07-15 11:38 ` Christian Brauner 2020-07-15 12:33 ` Christian Brauner 2020-07-15 15:10 ` Mathieu Desnoyers 2020-07-15 15:33 ` Christian Brauner 2020-07-14 3:03 ` [RFC PATCH 3/4] selftests: rseq: define __rseq_abi with extensible size Mathieu Desnoyers 2020-07-14 3:03 ` [RFC PATCH 4/4] selftests: rseq: print rseq extensible size in basic test Mathieu Desnoyers 2020-07-14 20:55 ` [RFC PATCH 0/4] rseq: Introduce extensible struct rseq Carlos O'Donell 2020-07-15 13:02 ` Mathieu Desnoyers 2020-07-16 13:39 ` Carlos O'Donell 2020-07-16 14:45 ` Mathieu Desnoyers 2020-07-15 15:12 ` Florian Weimer 2020-07-15 15:32 ` Mathieu Desnoyers
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).