All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
       [not found] <20190212194253.1951-1-mathieu.desnoyers@efficios.com>
@ 2019-02-12 19:42 ` Mathieu Desnoyers
  2019-03-22 20:09   ` Carlos O'Donell
  2019-02-12 19:42 ` [PATCH 2/4] glibc: sched_getcpu(): use rseq cpu_id TLS on Linux Mathieu Desnoyers
  1 sibling, 1 reply; 55+ messages in thread
From: Mathieu Desnoyers @ 2019-02-12 19:42 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Mathieu Desnoyers, Thomas Gleixner, Ben Maurer, Peter Zijlstra,
	Paul E. McKenney, Boqun Feng, Will Deacon, Dave Watson,
	Paul Turner, Rich Felker, linux-kernel, linux-api

Register rseq(2) TLS for each thread (including main), and unregister
for each thread (excluding main). "rseq" stands for Restartable
Sequences.

See the rseq(2) man page proposed here:
  https://lkml.org/lkml/2018/9/19/647

This patch is based on glibc-2.29. The rseq(2) system call was merged
into Linux 4.18.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Carlos O'Donell <carlos@redhat.com>
CC: Florian Weimer <fweimer@redhat.com>
CC: Joseph Myers <joseph@codesourcery.com>
CC: Szabolcs Nagy <szabolcs.nagy@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ben Maurer <bmaurer@fb.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Dave Watson <davejwatson@fb.com>
CC: Paul Turner <pjt@google.com>
CC: Rich Felker <dalias@libc.org>
CC: libc-alpha@sourceware.org
CC: linux-kernel@vger.kernel.org
CC: linux-api@vger.kernel.org
---
Changes since v1:
- Move __rseq_refcount to an extra field at the end of __rseq_abi to
  eliminate one symbol.

  All libraries/programs which try to register rseq (glibc,
  early-adopter applications, early-adopter libraries) should use the
  rseq refcount. It becomes part of the ABI within a user-space
  process, but it's not part of the ABI shared with the kernel per se.

- Restructure how this code is organized so glibc keeps building on
  non-Linux targets.

- Use non-weak symbol for __rseq_abi.

- Move rseq registration/unregistration implementation into its own
  nptl/rseq.c compile unit.

- Move __rseq_abi symbol under GLIBC_2.29.

Changes since v2:
- Move __rseq_refcount to its own symbol, which is less ugly than
  trying to play tricks with the rseq uapi.
- Move __rseq_abi from nptl to csu (C start up), so it can be used
  across glibc, including memory allocator and sched_getcpu(). The
  __rseq_refcount symbol is kept in nptl, because there is no reason
  to use it elsewhere in glibc.

Changes since v3:
- Set __rseq_refcount TLS to 1 on register/set to 0 on unregister
  because glibc is the first/last user.
- Unconditionally register/unregister rseq at thread start/exit, because
  glibc is the first/last user.
- Add missing abilist items.
- Rebase on glibc master commit a502c5294.
- Add NEWS entry.

Changes since v4:
- Do not use "weak" symbols for __rseq_abi and __rseq_refcount. Based on
  "System V Application Binary Interface", weak only affects the link
  editor, not the dynamic linker.
- Install a new sys/rseq.h system header on Linux, which contains the
  RSEQ_SIG definition, __rseq_abi declaration and __rseq_refcount
  declaration. Move those definition/declarations from rseq-internal.h
  to the installed sys/rseq.h header.
- Considering that rseq is only available on Linux, move csu/rseq.c to
  sysdeps/unix/sysv/linux/rseq-sym.c.
- Move __rseq_refcount from nptl/rseq.c to
  sysdeps/unix/sysv/linux/rseq-sym.c, so it is only defined on Linux.
- Move both ABI definitions for __rseq_abi and __rseq_refcount to
  sysdeps/unix/sysv/linux/Versions, so they only appear on Linux.
- Document __rseq_abi and __rseq_refcount volatile.
- Document the RSEQ_SIG signature define.
- Move registration functions from rseq.c to rseq-internal.h static
  inline functions. Introduce empty stubs in misc/rseq-internal.h,
  which can be overridden by architecture code in
  sysdeps/unix/sysv/linux/rseq-internal.h.
- Rename __rseq_register_current_thread and __rseq_unregister_current_thread
  to rseq_register_current_thread and rseq_unregister_current_thread,
  now that those are only visible as internal static inline functions.
- Invoke rseq_register_current_thread() from libc-start.c LIBC_START_MAIN
  rather than nptl init, so applications not linked against
  libpthread.so have rseq registered for their main() thread. Note that
  it is invoked separately for SHARED and !SHARED builds.

Changes since v5:
- Replace __rseq_refcount by __rseq_lib_abi, which contains two
  uint32_t: register_state and refcount. The "register_state" field
  allows inhibiting rseq registration from signal handlers nested on top
  of glibc registration and occuring after rseq unregistration by glibc.
- Introduce enum rseq_register_state, which contains the states allowed
  for the struct rseq_lib_abi register_state field.

Changes since v6:
- Introduce bits/rseq.h to define RSEQ_SIG for each architecture.
  The generic bits/rseq.h does not define RSEQ_SIG, meaning that each
  architecture implementing rseq needs to implement bits/rseq.h.
- Rename enum item RSEQ_REGISTER_NESTED to RSEQ_REGISTER_ONGOING.
- Port to glibc-2.29.
---
 NEWS                                          | 11 +++
 csu/libc-start.c                              | 12 ++-
 misc/Makefile                                 |  3 +-
 misc/rseq-internal.h                          | 34 +++++++
 nptl/pthread_create.c                         |  9 ++
 sysdeps/unix/sysv/linux/Makefile              |  4 +-
 sysdeps/unix/sysv/linux/Versions              |  4 +
 sysdeps/unix/sysv/linux/aarch64/bits/rseq.h   | 24 +++++
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |  2 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |  2 +
 sysdeps/unix/sysv/linux/arm/bits/rseq.h       | 24 +++++
 sysdeps/unix/sysv/linux/arm/libc.abilist      |  2 +
 sysdeps/unix/sysv/linux/bits/rseq.h           | 24 +++++
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |  2 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |  2 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |  2 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |  2 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |  2 +
 .../unix/sysv/linux/microblaze/libc.abilist   |  2 +
 sysdeps/unix/sysv/linux/mips/bits/rseq.h      | 24 +++++
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |  2 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |  2 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |  2 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |  2 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |  2 +
 sysdeps/unix/sysv/linux/powerpc/bits/rseq.h   | 24 +++++
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |  2 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |  2 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |  2 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |  2 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |  2 +
 sysdeps/unix/sysv/linux/rseq-internal.h       | 91 +++++++++++++++++++
 sysdeps/unix/sysv/linux/rseq-sym.c            | 54 +++++++++++
 sysdeps/unix/sysv/linux/s390/bits/rseq.h      | 24 +++++
 .../unix/sysv/linux/s390/s390-32/libc.abilist |  2 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |  2 +
 sysdeps/unix/sysv/linux/sh/libc.abilist       |  2 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |  2 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |  2 +
 sysdeps/unix/sysv/linux/sys/rseq.h            | 65 +++++++++++++
 sysdeps/unix/sysv/linux/x86/bits/rseq.h       | 24 +++++
 .../unix/sysv/linux/x86_64/64/libc.abilist    |  2 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |  2 +
 43 files changed, 501 insertions(+), 6 deletions(-)
 create mode 100644 misc/rseq-internal.h
 create mode 100644 sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/arm/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/mips/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/rseq-internal.h
 create mode 100644 sysdeps/unix/sysv/linux/rseq-sym.c
 create mode 100644 sysdeps/unix/sysv/linux/s390/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/sys/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/x86/bits/rseq.h

diff --git a/NEWS b/NEWS
index 912a9bdc0f..0608c60f7d 100644
--- a/NEWS
+++ b/NEWS
@@ -5,6 +5,17 @@ See the end for copying conditions.
 Please send GNU C library bug reports via <https://sourceware.org/bugzilla/>
 using `glibc' in the "product" field.
 \f
+Version 2.30
+
+Major new features:
+
+* Support for automatically registering threads with the Linux rseq(2)
+  system call has been added. This system call is implemented starting
+  from Linux 4.18. In order to be activated, it requires that glibc is built
+  against kernel headers that include this system call, and that glibc
+  detects availability of that system call at runtime.
+
+\f
 Version 2.29
 
 Major new features:
diff --git a/csu/libc-start.c b/csu/libc-start.c
index 5d9c3675fa..8680afc0ef 100644
--- a/csu/libc-start.c
+++ b/csu/libc-start.c
@@ -22,6 +22,7 @@
 #include <ldsodefs.h>
 #include <exit-thread.h>
 #include <libc-internal.h>
+#include <rseq-internal.h>
 
 #include <elf/dl-tunables.h>
 
@@ -140,7 +141,10 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
 
   __libc_multiple_libcs = &_dl_starting_up && !_dl_starting_up;
 
-#ifndef SHARED
+#ifdef SHARED
+  /* Register rseq ABI to the kernel. */
+  (void) rseq_register_current_thread ();
+#else
   _dl_relocate_static_pie ();
 
   char **ev = &argv[argc + 1];
@@ -218,6 +222,9 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
     }
 # endif
 
+  /* Register rseq ABI to the kernel. */
+  (void) rseq_register_current_thread ();
+
   /* Initialize libpthread if linked in.  */
   if (__pthread_initialize_minimal != NULL)
     __pthread_initialize_minimal ();
@@ -230,8 +237,7 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
 # else
   __pointer_chk_guard_local = pointer_chk_guard;
 # endif
-
-#endif /* !SHARED  */
+#endif
 
   /* Register the destructor of the dynamic linker if there is any.  */
   if (__glibc_likely (rtld_fini != NULL))
diff --git a/misc/Makefile b/misc/Makefile
index cf0daa1161..0ae1dbaf80 100644
--- a/misc/Makefile
+++ b/misc/Makefile
@@ -36,7 +36,8 @@ headers	:= sys/uio.h bits/uio-ext.h bits/uio_lim.h \
 	   syslog.h sys/syslog.h \
 	   bits/syslog.h bits/syslog-ldbl.h bits/syslog-path.h bits/error.h \
 	   bits/select2.h bits/hwcap.h sys/auxv.h \
-	   sys/sysmacros.h bits/sysmacros.h bits/types/struct_iovec.h
+	   sys/sysmacros.h bits/sysmacros.h bits/types/struct_iovec.h \
+	   rseq-internal.h
 
 routines := brk sbrk sstk ioctl \
 	    readv writev preadv preadv64 pwritev pwritev64 \
diff --git a/misc/rseq-internal.h b/misc/rseq-internal.h
new file mode 100644
index 0000000000..915122e4bf
--- /dev/null
+++ b/misc/rseq-internal.h
@@ -0,0 +1,34 @@
+/* Copyright (C) 2018 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2018.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef RSEQ_INTERNAL_H
+#define RSEQ_INTERNAL_H
+
+static inline int
+rseq_register_current_thread (void)
+{
+  return -1;
+}
+
+static inline int
+rseq_unregister_current_thread (void)
+{
+  return -1;
+}
+
+#endif /* rseq-internal.h */
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index 2bd2b10727..90b3419390 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -33,6 +33,7 @@
 #include <default-sched.h>
 #include <futex-internal.h>
 #include <tls-setup.h>
+#include <rseq-internal.h>
 #include "libioP.h"
 
 #include <shlib-compat.h>
@@ -378,6 +379,7 @@ __free_tcb (struct pthread *pd)
 START_THREAD_DEFN
 {
   struct pthread *pd = START_THREAD_SELF;
+  bool has_rseq = false;
 
 #if HP_TIMING_AVAIL
   /* Remember the time when the thread was started.  */
@@ -396,6 +398,9 @@ START_THREAD_DEFN
   if (__glibc_unlikely (atomic_exchange_acq (&pd->setxid_futex, 0) == -2))
     futex_wake (&pd->setxid_futex, 1, FUTEX_PRIVATE);
 
+  /* Register rseq TLS to the kernel. */
+  has_rseq = !rseq_register_current_thread ();
+
 #ifdef __NR_set_robust_list
 # ifndef __ASSUME_SET_ROBUST_LIST
   if (__set_robust_list_avail >= 0)
@@ -573,6 +578,10 @@ START_THREAD_DEFN
     }
 #endif
 
+  /* Unregister rseq TLS from kernel. */
+  if (has_rseq && rseq_unregister_current_thread ())
+    abort();
+
   advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd,
 		      pd->guardsize);
 
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index 5f8c2c7c7d..5b541469ec 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -1,5 +1,5 @@
 ifeq ($(subdir),csu)
-sysdep_routines += errno-loc
+sysdep_routines += errno-loc rseq-sym
 endif
 
 ifeq ($(subdir),assert)
@@ -48,7 +48,7 @@ sysdep_headers += sys/mount.h sys/acct.h sys/sysctl.h \
 		  bits/termios-c_iflag.h bits/termios-c_oflag.h \
 		  bits/termios-baud.h bits/termios-c_cflag.h \
 		  bits/termios-c_lflag.h bits/termios-tcflow.h \
-		  bits/termios-misc.h
+		  bits/termios-misc.h sys/rseq.h bits/rseq.h
 
 tests += tst-clone tst-clone2 tst-clone3 tst-fanotify tst-personality \
 	 tst-quota tst-sync_file_range tst-sysconf-iov_max tst-ttyname \
diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
index f1e12d9c69..ad88c2b7ff 100644
--- a/sysdeps/unix/sysv/linux/Versions
+++ b/sysdeps/unix/sysv/linux/Versions
@@ -174,6 +174,10 @@ libc {
   GLIBC_2.29 {
     getcpu;
   }
+  GLIBC_2.30 {
+    __rseq_abi;
+    __rseq_lib_abi;
+  }
   GLIBC_PRIVATE {
     # functions used in other libraries
     __syscall_rt_sigqueueinfo;
diff --git a/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
new file mode 100644
index 0000000000..543bc5388a
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
@@ -0,0 +1,24 @@
+/* Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* Signature required before each abort handler code.  */
+#define RSEQ_SIG 0xd428bc00	/* BRK #0x45E0.  */
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index 9c330f325e..bc937f585d 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2141,3 +2141,5 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index f630fa4c6f..89cc8b1cfb 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2036,6 +2036,8 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
 GLIBC_2.3 __ctype_toupper_loc F
diff --git a/sysdeps/unix/sysv/linux/arm/bits/rseq.h b/sysdeps/unix/sysv/linux/arm/bits/rseq.h
new file mode 100644
index 0000000000..19d3755837
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/arm/bits/rseq.h
@@ -0,0 +1,24 @@
+/* Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* Signature required before each abort handler code.  */
+#define RSEQ_SIG 0x53053053
diff --git a/sysdeps/unix/sysv/linux/arm/libc.abilist b/sysdeps/unix/sysv/linux/arm/libc.abilist
index b96f45590f..e5055f2d4e 100644
--- a/sysdeps/unix/sysv/linux/arm/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/libc.abilist
@@ -126,6 +126,8 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
 GLIBC_2.4 _IO_2_1_stdin_ D 0xa0
diff --git a/sysdeps/unix/sysv/linux/bits/rseq.h b/sysdeps/unix/sysv/linux/bits/rseq.h
new file mode 100644
index 0000000000..d60f02cfeb
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/bits/rseq.h
@@ -0,0 +1,24 @@
+/* Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* Each architecture supporting rseq should define RSEQ_SIG as a 32-bit
+   signature inserted before each rseq abort label in the code section.  */
diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
index 088a8ee369..546d073cdb 100644
--- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
@@ -1883,6 +1883,8 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
 GLIBC_2.3 __ctype_toupper_loc F
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index f7ff2c57b9..ac1de6e4b3 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2048,6 +2048,8 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
 GLIBC_2.3 __ctype_toupper_loc F
diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
index becd8b1033..cc3445b958 100644
--- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
@@ -1917,6 +1917,8 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
 GLIBC_2.3 __ctype_toupper_loc F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
index 74e42a5209..f7e28bd5a0 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
@@ -127,6 +127,8 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0x98
 GLIBC_2.4 _IO_2_1_stdin_ D 0x98
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
index 4af5a74e8a..b8f00f6111 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
@@ -1992,6 +1992,8 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
 GLIBC_2.3 __ctype_toupper_loc F
diff --git a/sysdeps/unix/sysv/linux/microblaze/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/libc.abilist
index ccef673fd2..19f191434f 100644
--- a/sysdeps/unix/sysv/linux/microblaze/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/libc.abilist
@@ -2133,3 +2133,5 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
diff --git a/sysdeps/unix/sysv/linux/mips/bits/rseq.h b/sysdeps/unix/sysv/linux/mips/bits/rseq.h
new file mode 100644
index 0000000000..19d3755837
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/mips/bits/rseq.h
@@ -0,0 +1,24 @@
+/* Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* Signature required before each abort handler code.  */
+#define RSEQ_SIG 0x53053053
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
index 1054bb599e..fe43507f55 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
@@ -1970,6 +1970,8 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
 GLIBC_2.3 __ctype_toupper_loc F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
index 4f5b5ffebf..b247c6ea9b 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
@@ -1968,6 +1968,8 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
 GLIBC_2.3 __ctype_toupper_loc F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
index 943aee58d4..5339ca52b6 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
@@ -1976,6 +1976,8 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
 GLIBC_2.3 __ctype_toupper_loc F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
index 17a5d17ef9..11f24eb440 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
@@ -1971,6 +1971,8 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
 GLIBC_2.3 __ctype_toupper_loc F
diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
index 4d62a540fd..fd223bfc44 100644
--- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
@@ -2174,3 +2174,5 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
diff --git a/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
new file mode 100644
index 0000000000..19d3755837
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
@@ -0,0 +1,24 @@
+/* Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* Signature required before each abort handler code.  */
+#define RSEQ_SIG 0x53053053
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
index ecc2d6fa13..cc53178e81 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
@@ -1996,6 +1996,8 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
 GLIBC_2.3 __ctype_toupper_loc F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
index f5830f9c33..2de3134bc7 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
@@ -2000,6 +2000,8 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
 GLIBC_2.3 __ctype_toupper_loc F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
index 633d8f4792..aae3def700 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
@@ -126,6 +126,8 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
 GLIBC_2.3 _Exit F
 GLIBC_2.3 _IO_2_1_stderr_ D 0xe0
 GLIBC_2.3 _IO_2_1_stdin_ D 0xe0
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
index 2c712636ef..8d582a3a9b 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
@@ -2231,3 +2231,5 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index 195bc8b2cf..155953f6cf 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2103,3 +2103,5 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h
new file mode 100644
index 0000000000..d676da3701
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/rseq-internal.h
@@ -0,0 +1,91 @@
+/* Copyright (C) 2018 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2018.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef RSEQ_INTERNAL_H
+#define RSEQ_INTERNAL_H
+
+#include <sysdep.h>
+
+#ifdef __NR_rseq
+
+#include <errno.h>
+#include <sys/rseq.h>
+
+static inline int
+rseq_register_current_thread (void)
+{
+  int rc, ret = 0;
+  INTERNAL_SYSCALL_DECL (err);
+
+  if (__rseq_abi.cpu_id == RSEQ_CPU_ID_REGISTRATION_FAILED)
+    return -1;
+  /* Temporarily prevent nested signal handlers from registering rseq.  */
+  __rseq_lib_abi.register_state = RSEQ_REGISTER_ONGOING;
+  if (__rseq_lib_abi.refcount == UINT_MAX)
+    {
+      ret = -1;
+      goto end;
+    }
+  if (__rseq_lib_abi.refcount++)
+    goto end;
+  rc = INTERNAL_SYSCALL_CALL (rseq, err, &__rseq_abi, sizeof (struct rseq),
+                              0, RSEQ_SIG);
+  if (!rc)
+    goto end;
+  if (INTERNAL_SYSCALL_ERRNO (rc, err) != EBUSY)
+    __rseq_abi.cpu_id = RSEQ_CPU_ID_REGISTRATION_FAILED;
+  ret = -1;
+end:
+  __rseq_lib_abi.register_state = RSEQ_REGISTER_ALLOWED;
+  return ret;
+}
+
+static inline int
+rseq_unregister_current_thread (void)
+{
+  int rc, ret = 0;
+  INTERNAL_SYSCALL_DECL (err);
+
+  /* Setting __rseq_register_state = RSEQ_REGISTER_EXITING for the rest of the
+     thread lifetime. Ensures signal handlers nesting just before thread exit
+     don't try to register rseq.  */
+  __rseq_lib_abi.register_state = RSEQ_REGISTER_EXITING;
+  __rseq_lib_abi.refcount = 0;
+  rc = INTERNAL_SYSCALL_CALL (rseq, err, &__rseq_abi, sizeof (struct rseq),
+                              RSEQ_FLAG_UNREGISTER, RSEQ_SIG);
+  if (!rc)
+    goto end;
+  ret = -1;
+end:
+  return ret;
+}
+#else
+static inline int
+rseq_register_current_thread (void)
+{
+  return -1;
+}
+
+static inline int
+rseq_unregister_current_thread (void)
+{
+  return -1;
+}
+#endif
+
+#endif /* rseq-internal.h */
diff --git a/sysdeps/unix/sysv/linux/rseq-sym.c b/sysdeps/unix/sysv/linux/rseq-sym.c
new file mode 100644
index 0000000000..99b277e9d6
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/rseq-sym.c
@@ -0,0 +1,54 @@
+/* Copyright (C) 2018 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2018.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sys/syscall.h>
+#include <stdint.h>
+
+#ifdef __NR_rseq
+#include <sys/rseq.h>
+#else
+
+enum rseq_cpu_id_state {
+  RSEQ_CPU_ID_UNINITIALIZED = -1,
+  RSEQ_CPU_ID_REGISTRATION_FAILED = -2,
+};
+
+/* linux/rseq.h defines struct rseq as aligned on 32 bytes. The kernel ABI
+   size is 20 bytes.  */
+struct rseq {
+  uint32_t cpu_id_start;
+  uint32_t cpu_id;
+  uint64_t rseq_cs;
+  uint32_t flags;
+} __attribute__ ((aligned(4 * sizeof(uint64_t))));
+
+struct rseq_lib_abi
+{
+  uint32_t register_state;
+  uint32_t refcount;
+};
+
+#endif
+
+/* volatile because fields can be read/updated by the kernel.  */
+__thread volatile struct rseq __rseq_abi = {
+  .cpu_id = RSEQ_CPU_ID_UNINITIALIZED,
+};
+
+/* volatile because fields can be read/updated by signal handlers.  */
+__thread volatile struct rseq_lib_abi __rseq_lib_abi;
diff --git a/sysdeps/unix/sysv/linux/s390/bits/rseq.h b/sysdeps/unix/sysv/linux/s390/bits/rseq.h
new file mode 100644
index 0000000000..19d3755837
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h
@@ -0,0 +1,24 @@
+/* Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* Signature required before each abort handler code.  */
+#define RSEQ_SIG 0x53053053
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
index 334def033c..42316d8666 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
@@ -2005,6 +2005,8 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
 GLIBC_2.3 __ctype_toupper_loc F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
index 536f4c4ced..c6c4e55a77 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
@@ -1911,6 +1911,8 @@ GLIBC_2.29 __fentry__ F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
 GLIBC_2.3 __ctype_toupper_loc F
diff --git a/sysdeps/unix/sysv/linux/sh/libc.abilist b/sysdeps/unix/sysv/linux/sh/libc.abilist
index 30ae3b6ebb..8652dfea59 100644
--- a/sysdeps/unix/sysv/linux/sh/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/libc.abilist
@@ -1887,6 +1887,8 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
 GLIBC_2.3 __ctype_toupper_loc F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
index 68b107d080..95b58dfa67 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
@@ -1999,6 +1999,8 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
 GLIBC_2.3 __ctype_toupper_loc F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
index e5b6a4da50..bfd24f9d1c 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
@@ -1940,6 +1940,8 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
 GLIBC_2.3 __ctype_toupper_loc F
diff --git a/sysdeps/unix/sysv/linux/sys/rseq.h b/sysdeps/unix/sysv/linux/sys/rseq.h
new file mode 100644
index 0000000000..83c8976f50
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/sys/rseq.h
@@ -0,0 +1,65 @@
+/* Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+#define _SYS_RSEQ_H	1
+
+/* We use the structures declarations from the kernel headers.  */
+#include <linux/rseq.h>
+/* Architecture-specific rseq signature.  */
+#include <bits/rseq.h>
+#include <stdint.h>
+
+enum rseq_register_state
+{
+  /* Value RSEQ_REGISTER_ALLOWED means it is allowed to update
+     the refcount field and to register/unregister rseq.  */
+  RSEQ_REGISTER_ALLOWED = 0,
+  /* Value RSEQ_REGISTER_ONGOING means a rseq registration is in progress,
+     so it is temporarily forbidden to update the refcount field or to
+     register/unregister rseq for this thread or signal handlers nested
+     on this thread.  */
+  RSEQ_REGISTER_ONGOING = 1,
+  /* Value RSEQ_REGISTER_EXITING means it is forbidden to update the
+     refcount field or to register/unregister rseq for the rest of the
+     thread's lifetime.  */
+  RSEQ_REGISTER_EXITING = 2,
+};
+
+struct rseq_lib_abi
+{
+  uint32_t register_state; /* enum rseq_register_state.  */
+  /* The refcount field keeps track of rseq users, so early adopters
+     of rseq can cooperate amongst each other and with glibc to
+     share rseq thread registration. The refcount field can only be
+     updated when allowed by the value of field register_state.
+     Registering rseq should be performed when incrementing refcount
+     from 0 to 1, and unregistering rseq should be performed when
+     decrementing refcount from 1 to 0.  */
+  uint32_t refcount;
+};
+
+/* volatile because fields can be read/updated by the kernel.  */
+extern __thread volatile struct rseq __rseq_abi
+__attribute__ ((tls_model ("initial-exec")));
+
+/* volatile because fields can be read/updated by signal handlers.  */
+extern __thread volatile struct rseq_lib_abi __rseq_lib_abi
+__attribute__ ((tls_model ("initial-exec")));
+
+#endif /* sys/rseq.h */
diff --git a/sysdeps/unix/sysv/linux/x86/bits/rseq.h b/sysdeps/unix/sysv/linux/x86/bits/rseq.h
new file mode 100644
index 0000000000..19d3755837
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86/bits/rseq.h
@@ -0,0 +1,24 @@
+/* Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* Signature required before each abort handler code.  */
+#define RSEQ_SIG 0x53053053
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index 86dfb0c94d..e9f8411fb2 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -1898,6 +1898,8 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
 GLIBC_2.3 __ctype_toupper_loc F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index dd688263aa..f9432d07f1 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2149,3 +2149,5 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_lib_abi T 0x8
-- 
2.17.1




^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 2/4] glibc: sched_getcpu(): use rseq cpu_id TLS on Linux
       [not found] <20190212194253.1951-1-mathieu.desnoyers@efficios.com>
  2019-02-12 19:42 ` [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) Mathieu Desnoyers
@ 2019-02-12 19:42 ` Mathieu Desnoyers
  2019-03-22 20:13   ` Carlos O'Donell
  1 sibling, 1 reply; 55+ messages in thread
From: Mathieu Desnoyers @ 2019-02-12 19:42 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Mathieu Desnoyers, Thomas Gleixner, Ben Maurer, Peter Zijlstra,
	Paul E. McKenney, Boqun Feng, Will Deacon, Dave Watson,
	Paul Turner, linux-kernel, linux-api

When available, use the cpu_id field from __rseq_abi on Linux to
implement sched_getcpu(). Fall-back on the vgetcpu vDSO if unavailable.

Benchmarks:

x86-64: Intel E5-2630 v3@2.40GHz, 16-core, hyperthreading

glibc sched_getcpu():                     13.7 ns (baseline)
glibc sched_getcpu() using rseq:           2.5 ns (speedup:  5.5x)
inline load cpuid from __rseq_abi TLS:     0.8 ns (speedup: 17.1x)

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Carlos O'Donell <carlos@redhat.com>
CC: Florian Weimer <fweimer@redhat.com>
CC: Joseph Myers <joseph@codesourcery.com>
CC: Szabolcs Nagy <szabolcs.nagy@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ben Maurer <bmaurer@fb.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Dave Watson <davejwatson@fb.com>
CC: Paul Turner <pjt@google.com>
CC: libc-alpha@sourceware.org
CC: linux-kernel@vger.kernel.org
CC: linux-api@vger.kernel.org
---
 sysdeps/unix/sysv/linux/sched_getcpu.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/sysdeps/unix/sysv/linux/sched_getcpu.c b/sysdeps/unix/sysv/linux/sched_getcpu.c
index fb0d317f83..8bfb03778b 100644
--- a/sysdeps/unix/sysv/linux/sched_getcpu.c
+++ b/sysdeps/unix/sysv/linux/sched_getcpu.c
@@ -24,8 +24,8 @@
 #endif
 #include <sysdep-vdso.h>
 
-int
-sched_getcpu (void)
+static int
+vsyscall_sched_getcpu (void)
 {
 #ifdef __NR_getcpu
   unsigned int cpu;
@@ -37,3 +37,24 @@ sched_getcpu (void)
   return -1;
 #endif
 }
+
+#ifdef __NR_rseq
+#include <linux/rseq.h>
+
+extern __attribute__ ((tls_model ("initial-exec")))
+__thread volatile struct rseq __rseq_abi;
+
+int
+sched_getcpu (void)
+{
+  int cpu_id = __rseq_abi.cpu_id;
+
+  return cpu_id >= 0 ? cpu_id : vsyscall_sched_getcpu ();
+}
+#else
+int
+sched_getcpu (void)
+{
+  return vsyscall_sched_getcpu ();
+}
+#endif
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-02-12 19:42 ` [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) Mathieu Desnoyers
@ 2019-03-22 20:09   ` Carlos O'Donell
  2019-03-25 15:54     ` Mathieu Desnoyers
  0 siblings, 1 reply; 55+ messages in thread
From: Carlos O'Donell @ 2019-03-22 20:09 UTC (permalink / raw)
  To: Mathieu Desnoyers, Carlos O'Donell
  Cc: Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Boqun Feng, Will Deacon, Dave Watson, Paul Turner, Rich Felker,
	linux-kernel, linux-api

On 2/12/19 2:42 PM, Mathieu Desnoyers wrote:
> Register rseq(2) TLS for each thread (including main), and unregister
> for each thread (excluding main). "rseq" stands for Restartable
> Sequences.

Thanks, the implementation is looking good, before this goes in it needs
some procedural cleanup and the following:

(a) I would like to see a final position from Florian, either accept,
     object (objection recorded), or sustained objection (blocks this patch),
     on the issue of the registration protocol.

     I suggested a possible way forward to remove the registration protocol
     down the line, leaving only the __rseq_lib_abi as the residual artifact
     once we deprecate coordination with other libraries. Applications would
     have to be written with this in mind, and coordination still remains
     with libc for the time being. Perhaps Mathieu has another suggestion.

(b) I would like to see a final position from Rich Felker, either accept,
     object (objection recorded), or sustained objection (blocks this patch),
     on the patch set as a whole. I trust Rich's opinion as an alternative
     libc maintainer in this matter and as a independent implementor of
     similar standards. Rich will have to implement this for musl, and it
     would be good to have consensus from him.

So while I reviewed this patch, and provided nit-picky feedback below,
I think we need a whole new thread, to finalize the way forward for the
registration process.

Next steps:

- Start a new thread to talk *only* about options for removing the
   refcounted registration.

> See the rseq(2) man page proposed here:
>    https://lkml.org/lkml/2018/9/19/647

Thank you for having a man page, that makes for a very easy to follow
description of the structures, syscall and use cases.

> 
> This patch is based on glibc-2.29. The rseq(2) system call was merged
> into Linux 4.18.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: Carlos O'Donell <carlos@redhat.com>
> CC: Florian Weimer <fweimer@redhat.com>
> CC: Joseph Myers <joseph@codesourcery.com>
> CC: Szabolcs Nagy <szabolcs.nagy@arm.com>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Ben Maurer <bmaurer@fb.com>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> CC: Boqun Feng <boqun.feng@gmail.com>
> CC: Will Deacon <will.deacon@arm.com>
> CC: Dave Watson <davejwatson@fb.com>
> CC: Paul Turner <pjt@google.com>
> CC: Rich Felker <dalias@libc.org>
> CC: libc-alpha@sourceware.org
> CC: linux-kernel@vger.kernel.org
> CC: linux-api@vger.kernel.org
> ---
> Changes since v1:
> - Move __rseq_refcount to an extra field at the end of __rseq_abi to
>    eliminate one symbol.
> 
>    All libraries/programs which try to register rseq (glibc,
>    early-adopter applications, early-adopter libraries) should use the
>    rseq refcount. It becomes part of the ABI within a user-space
>    process, but it's not part of the ABI shared with the kernel per se.
> 
> - Restructure how this code is organized so glibc keeps building on
>    non-Linux targets.
> 
> - Use non-weak symbol for __rseq_abi.
> 
> - Move rseq registration/unregistration implementation into its own
>    nptl/rseq.c compile unit.
> 
> - Move __rseq_abi symbol under GLIBC_2.29.
> 
> Changes since v2:
> - Move __rseq_refcount to its own symbol, which is less ugly than
>    trying to play tricks with the rseq uapi.
> - Move __rseq_abi from nptl to csu (C start up), so it can be used
>    across glibc, including memory allocator and sched_getcpu(). The
>    __rseq_refcount symbol is kept in nptl, because there is no reason
>    to use it elsewhere in glibc.
> 
> Changes since v3:
> - Set __rseq_refcount TLS to 1 on register/set to 0 on unregister
>    because glibc is the first/last user.
> - Unconditionally register/unregister rseq at thread start/exit, because
>    glibc is the first/last user.
> - Add missing abilist items.
> - Rebase on glibc master commit a502c5294.
> - Add NEWS entry.
> 
> Changes since v4:
> - Do not use "weak" symbols for __rseq_abi and __rseq_refcount. Based on
>    "System V Application Binary Interface", weak only affects the link
>    editor, not the dynamic linker.
> - Install a new sys/rseq.h system header on Linux, which contains the
>    RSEQ_SIG definition, __rseq_abi declaration and __rseq_refcount
>    declaration. Move those definition/declarations from rseq-internal.h
>    to the installed sys/rseq.h header.
> - Considering that rseq is only available on Linux, move csu/rseq.c to
>    sysdeps/unix/sysv/linux/rseq-sym.c.
> - Move __rseq_refcount from nptl/rseq.c to
>    sysdeps/unix/sysv/linux/rseq-sym.c, so it is only defined on Linux.
> - Move both ABI definitions for __rseq_abi and __rseq_refcount to
>    sysdeps/unix/sysv/linux/Versions, so they only appear on Linux.
> - Document __rseq_abi and __rseq_refcount volatile.
> - Document the RSEQ_SIG signature define.
> - Move registration functions from rseq.c to rseq-internal.h static
>    inline functions. Introduce empty stubs in misc/rseq-internal.h,
>    which can be overridden by architecture code in
>    sysdeps/unix/sysv/linux/rseq-internal.h.
> - Rename __rseq_register_current_thread and __rseq_unregister_current_thread
>    to rseq_register_current_thread and rseq_unregister_current_thread,
>    now that those are only visible as internal static inline functions.
> - Invoke rseq_register_current_thread() from libc-start.c LIBC_START_MAIN
>    rather than nptl init, so applications not linked against
>    libpthread.so have rseq registered for their main() thread. Note that
>    it is invoked separately for SHARED and !SHARED builds.
> 
> Changes since v5:
> - Replace __rseq_refcount by __rseq_lib_abi, which contains two
>    uint32_t: register_state and refcount. The "register_state" field
>    allows inhibiting rseq registration from signal handlers nested on top
>    of glibc registration and occuring after rseq unregistration by glibc.
> - Introduce enum rseq_register_state, which contains the states allowed
>    for the struct rseq_lib_abi register_state field.
> 
> Changes since v6:
> - Introduce bits/rseq.h to define RSEQ_SIG for each architecture.
>    The generic bits/rseq.h does not define RSEQ_SIG, meaning that each
>    architecture implementing rseq needs to implement bits/rseq.h.
> - Rename enum item RSEQ_REGISTER_NESTED to RSEQ_REGISTER_ONGOING.
> - Port to glibc-2.29.
> ---
>   NEWS                                          | 11 +++
>   csu/libc-start.c                              | 12 ++-
>   misc/Makefile                                 |  3 +-
>   misc/rseq-internal.h                          | 34 +++++++
>   nptl/pthread_create.c                         |  9 ++
>   sysdeps/unix/sysv/linux/Makefile              |  4 +-
>   sysdeps/unix/sysv/linux/Versions              |  4 +
>   sysdeps/unix/sysv/linux/aarch64/bits/rseq.h   | 24 +++++
>   sysdeps/unix/sysv/linux/aarch64/libc.abilist  |  2 +
>   sysdeps/unix/sysv/linux/alpha/libc.abilist    |  2 +
>   sysdeps/unix/sysv/linux/arm/bits/rseq.h       | 24 +++++
>   sysdeps/unix/sysv/linux/arm/libc.abilist      |  2 +
>   sysdeps/unix/sysv/linux/bits/rseq.h           | 24 +++++
>   sysdeps/unix/sysv/linux/hppa/libc.abilist     |  2 +
>   sysdeps/unix/sysv/linux/i386/libc.abilist     |  2 +
>   sysdeps/unix/sysv/linux/ia64/libc.abilist     |  2 +
>   .../sysv/linux/m68k/coldfire/libc.abilist     |  2 +
>   .../unix/sysv/linux/m68k/m680x0/libc.abilist  |  2 +
>   .../unix/sysv/linux/microblaze/libc.abilist   |  2 +
>   sysdeps/unix/sysv/linux/mips/bits/rseq.h      | 24 +++++
>   .../sysv/linux/mips/mips32/fpu/libc.abilist   |  2 +
>   .../sysv/linux/mips/mips32/nofpu/libc.abilist |  2 +
>   .../sysv/linux/mips/mips64/n32/libc.abilist   |  2 +
>   .../sysv/linux/mips/mips64/n64/libc.abilist   |  2 +
>   sysdeps/unix/sysv/linux/nios2/libc.abilist    |  2 +
>   sysdeps/unix/sysv/linux/powerpc/bits/rseq.h   | 24 +++++
>   .../linux/powerpc/powerpc32/fpu/libc.abilist  |  2 +
>   .../powerpc/powerpc32/nofpu/libc.abilist      |  2 +
>   .../linux/powerpc/powerpc64/be/libc.abilist   |  2 +
>   .../linux/powerpc/powerpc64/le/libc.abilist   |  2 +
>   .../unix/sysv/linux/riscv/rv64/libc.abilist   |  2 +
>   sysdeps/unix/sysv/linux/rseq-internal.h       | 91 +++++++++++++++++++
>   sysdeps/unix/sysv/linux/rseq-sym.c            | 54 +++++++++++
>   sysdeps/unix/sysv/linux/s390/bits/rseq.h      | 24 +++++
>   .../unix/sysv/linux/s390/s390-32/libc.abilist |  2 +
>   .../unix/sysv/linux/s390/s390-64/libc.abilist |  2 +
>   sysdeps/unix/sysv/linux/sh/libc.abilist       |  2 +
>   .../sysv/linux/sparc/sparc32/libc.abilist     |  2 +
>   .../sysv/linux/sparc/sparc64/libc.abilist     |  2 +
>   sysdeps/unix/sysv/linux/sys/rseq.h            | 65 +++++++++++++
>   sysdeps/unix/sysv/linux/x86/bits/rseq.h       | 24 +++++
>   .../unix/sysv/linux/x86_64/64/libc.abilist    |  2 +
>   .../unix/sysv/linux/x86_64/x32/libc.abilist   |  2 +
>   43 files changed, 501 insertions(+), 6 deletions(-)
>   create mode 100644 misc/rseq-internal.h
>   create mode 100644 sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
>   create mode 100644 sysdeps/unix/sysv/linux/arm/bits/rseq.h
>   create mode 100644 sysdeps/unix/sysv/linux/bits/rseq.h
>   create mode 100644 sysdeps/unix/sysv/linux/mips/bits/rseq.h
>   create mode 100644 sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
>   create mode 100644 sysdeps/unix/sysv/linux/rseq-internal.h
>   create mode 100644 sysdeps/unix/sysv/linux/rseq-sym.c
>   create mode 100644 sysdeps/unix/sysv/linux/s390/bits/rseq.h
>   create mode 100644 sysdeps/unix/sysv/linux/sys/rseq.h
>   create mode 100644 sysdeps/unix/sysv/linux/x86/bits/rseq.h
> 
> diff --git a/NEWS b/NEWS
> index 912a9bdc0f..0608c60f7d 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -5,6 +5,17 @@ See the end for copying conditions.
>   Please send GNU C library bug reports via <https://sourceware.org/bugzilla/>
>   using `glibc' in the "product" field.
>   \f
> +Version 2.30
> +
> +Major new features:
> +
> +* Support for automatically registering threads with the Linux rseq(2)
> +  system call has been added. This system call is implemented starting
> +  from Linux 4.18. In order to be activated, it requires that glibc is built
> +  against kernel headers that include this system call, and that glibc
> +  detects availability of that system call at runtime.

What benefit does the feature have for users? Can you talk about that please?
Why would a user want to use it. Please feel free to link to an external reference.

> +
> +\f
>   Version 2.29
>   
>   Major new features:
> diff --git a/csu/libc-start.c b/csu/libc-start.c
> index 5d9c3675fa..8680afc0ef 100644
> --- a/csu/libc-start.c
> +++ b/csu/libc-start.c
> @@ -22,6 +22,7 @@
>   #include <ldsodefs.h>
>   #include <exit-thread.h>
>   #include <libc-internal.h>
> +#include <rseq-internal.h>

OK.

>   
>   #include <elf/dl-tunables.h>
>   
> @@ -140,7 +141,10 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
>   
>     __libc_multiple_libcs = &_dl_starting_up && !_dl_starting_up;
>   
> -#ifndef SHARED
> +#ifdef SHARED
> +  /* Register rseq ABI to the kernel. */
> +  (void) rseq_register_current_thread ();

OK, early registration of main thread in shared case.

> +#else
>     _dl_relocate_static_pie ();
>   
>     char **ev = &argv[argc + 1];
> @@ -218,6 +222,9 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
>       }
>   # endif
>   
> +  /* Register rseq ABI to the kernel. */
> +  (void) rseq_register_current_thread ();

OK, early registration of main thread in static case.

> +
>     /* Initialize libpthread if linked in.  */
>     if (__pthread_initialize_minimal != NULL)
>       __pthread_initialize_minimal ();
> @@ -230,8 +237,7 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
>   # else
>     __pointer_chk_guard_local = pointer_chk_guard;
>   # endif
> -
> -#endif /* !SHARED  */
> +#endif

OK.

>   
>     /* Register the destructor of the dynamic linker if there is any.  */
>     if (__glibc_likely (rtld_fini != NULL))
> diff --git a/misc/Makefile b/misc/Makefile
> index cf0daa1161..0ae1dbaf80 100644
> --- a/misc/Makefile
> +++ b/misc/Makefile
> @@ -36,7 +36,8 @@ headers	:= sys/uio.h bits/uio-ext.h bits/uio_lim.h \
>   	   syslog.h sys/syslog.h \
>   	   bits/syslog.h bits/syslog-ldbl.h bits/syslog-path.h bits/error.h \
>   	   bits/select2.h bits/hwcap.h sys/auxv.h \
> -	   sys/sysmacros.h bits/sysmacros.h bits/types/struct_iovec.h
> +	   sys/sysmacros.h bits/sysmacros.h bits/types/struct_iovec.h \
> +	   rseq-internal.h

OK.

>   
>   routines := brk sbrk sstk ioctl \
>   	    readv writev preadv preadv64 pwritev pwritev64 \
> diff --git a/misc/rseq-internal.h b/misc/rseq-internal.h
> new file mode 100644
> index 0000000000..915122e4bf
> --- /dev/null
> +++ b/misc/rseq-internal.h
> @@ -0,0 +1,34 @@
> +/* Copyright (C) 2018 Free Software Foundation, Inc.

Add a one line description before copyright which describes the file.
Is it a stub file. Is it the Linux implementation. etc.

Please add a one sentence first line short description for all new files
you create.

> +   This file is part of the GNU C Library.
> +   Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2018.

We don't use "contributed by" markers anymore, we let the git commit author
carry that information, along with git blame information.

Please remove this contributed by line and all subsequent such lines.

> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef RSEQ_INTERNAL_H
> +#define RSEQ_INTERNAL_H
> +
> +static inline int
> +rseq_register_current_thread (void)
> +{
> +  return -1;
> +}
> +
> +static inline int
> +rseq_unregister_current_thread (void)
> +{
> +  return -1;
> +}
> +

OK.

> +#endif /* rseq-internal.h */
> diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
> index 2bd2b10727..90b3419390 100644
> --- a/nptl/pthread_create.c
> +++ b/nptl/pthread_create.c
> @@ -33,6 +33,7 @@
>   #include <default-sched.h>
>   #include <futex-internal.h>
>   #include <tls-setup.h>
> +#include <rseq-internal.h>

Ok.

>   #include "libioP.h"
>   
>   #include <shlib-compat.h>
> @@ -378,6 +379,7 @@ __free_tcb (struct pthread *pd)
>   START_THREAD_DEFN
>   {
>     struct pthread *pd = START_THREAD_SELF;
> +  bool has_rseq = false;

OK.

>   
>   #if HP_TIMING_AVAIL
>     /* Remember the time when the thread was started.  */
> @@ -396,6 +398,9 @@ START_THREAD_DEFN
>     if (__glibc_unlikely (atomic_exchange_acq (&pd->setxid_futex, 0) == -2))
>       futex_wake (&pd->setxid_futex, 1, FUTEX_PRIVATE);
>   
> +  /* Register rseq TLS to the kernel. */
> +  has_rseq = !rseq_register_current_thread ();

OK, initialize rseq after enabling setxid but before setting up roubust list,
no particularly reason it should be right here that I can see, that's OK though.

> +
>   #ifdef __NR_set_robust_list
>   # ifndef __ASSUME_SET_ROBUST_LIST
>     if (__set_robust_list_avail >= 0)
> @@ -573,6 +578,10 @@ START_THREAD_DEFN
>       }
>   #endif
>   
> +  /* Unregister rseq TLS from kernel. */
> +  if (has_rseq && rseq_unregister_current_thread ())
> +    abort();

OK. Calling abort() seems harsh because it kills the whole process, but I'm OK
with failing catastrophically if that's what we want here. It seems unrecoverable.

> +
>     advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd,
>   		      pd->guardsize);
>   
> diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> index 5f8c2c7c7d..5b541469ec 100644
> --- a/sysdeps/unix/sysv/linux/Makefile
> +++ b/sysdeps/unix/sysv/linux/Makefile
> @@ -1,5 +1,5 @@
>   ifeq ($(subdir),csu)
> -sysdep_routines += errno-loc
> +sysdep_routines += errno-loc rseq-sym

OK.

>   endif
>   
>   ifeq ($(subdir),assert)
> @@ -48,7 +48,7 @@ sysdep_headers += sys/mount.h sys/acct.h sys/sysctl.h \
>   		  bits/termios-c_iflag.h bits/termios-c_oflag.h \
>   		  bits/termios-baud.h bits/termios-c_cflag.h \
>   		  bits/termios-c_lflag.h bits/termios-tcflow.h \
> -		  bits/termios-misc.h
> +		  bits/termios-misc.h sys/rseq.h bits/rseq.h

OK.

>   
>   tests += tst-clone tst-clone2 tst-clone3 tst-fanotify tst-personality \
>   	 tst-quota tst-sync_file_range tst-sysconf-iov_max tst-ttyname \
> diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
> index f1e12d9c69..ad88c2b7ff 100644
> --- a/sysdeps/unix/sysv/linux/Versions
> +++ b/sysdeps/unix/sysv/linux/Versions
> @@ -174,6 +174,10 @@ libc {
>     GLIBC_2.29 {
>       getcpu;
>     }
> +  GLIBC_2.30 {
> +    __rseq_abi;
> +    __rseq_lib_abi;
> +  }

OK, good using GLIBC_2.30 for master.

>     GLIBC_PRIVATE {
>       # functions used in other libraries
>       __syscall_rt_sigqueueinfo;
> diff --git a/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
> new file mode 100644
> index 0000000000..543bc5388a
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
> @@ -0,0 +1,24 @@

Needs a one sentence description.

> +/* Copyright (C) 2019 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +   Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_RSEQ_H
> +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
> +#endif
> +
> +/* Signature required before each abort handler code.  */
> +#define RSEQ_SIG 0xd428bc00	/* BRK #0x45E0.  */

OK.

> diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
> index 9c330f325e..bc937f585d 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
> @@ -2141,3 +2141,5 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

> diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
> index f630fa4c6f..89cc8b1cfb 100644
> --- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
> @@ -2036,6 +2036,8 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

>   GLIBC_2.3 __ctype_b_loc F
>   GLIBC_2.3 __ctype_tolower_loc F
>   GLIBC_2.3 __ctype_toupper_loc F
> diff --git a/sysdeps/unix/sysv/linux/arm/bits/rseq.h b/sysdeps/unix/sysv/linux/arm/bits/rseq.h
> new file mode 100644
> index 0000000000..19d3755837
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/arm/bits/rseq.h
> @@ -0,0 +1,24 @@

Needs a one sentence description.

> +/* Copyright (C) 2019 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +   Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_RSEQ_H
> +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
> +#endif
> +
> +/* Signature required before each abort handler code.  */
> +#define RSEQ_SIG 0x53053053

Why isn't this an arm specific op code? Does the user have to mark this
up as part of a constant pool when placing it in front of the abort handler
to avoid disassembling the constant as code? This was at one point required
to get gdb to work properly.

> diff --git a/sysdeps/unix/sysv/linux/arm/libc.abilist b/sysdeps/unix/sysv/linux/arm/libc.abilist
> index b96f45590f..e5055f2d4e 100644
> --- a/sysdeps/unix/sysv/linux/arm/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/arm/libc.abilist
> @@ -126,6 +126,8 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

>   GLIBC_2.4 _Exit F
>   GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
>   GLIBC_2.4 _IO_2_1_stdin_ D 0xa0
> diff --git a/sysdeps/unix/sysv/linux/bits/rseq.h b/sysdeps/unix/sysv/linux/bits/rseq.h
> new file mode 100644
> index 0000000000..d60f02cfeb
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/bits/rseq.h
> @@ -0,0 +1,24 @@
> +/* Copyright (C) 2019 Free Software Foundation, Inc.

Needs a one sentence description.

> +   This file is part of the GNU C Library.
> +   Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_RSEQ_H
> +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
> +#endif
> +
> +/* Each architecture supporting rseq should define RSEQ_SIG as a 32-bit
> +   signature inserted before each rseq abort label in the code section.  */

Needs a huge explanation about RSEQ_SIG and the reasons why it's per-arch.

Basically cut-and-paste what you wrote to libc-alpha about this.

> diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
> index 088a8ee369..546d073cdb 100644
> --- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
> @@ -1883,6 +1883,8 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

>   GLIBC_2.3 __ctype_b_loc F
>   GLIBC_2.3 __ctype_tolower_loc F
>   GLIBC_2.3 __ctype_toupper_loc F
> diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
> index f7ff2c57b9..ac1de6e4b3 100644
> --- a/sysdeps/unix/sysv/linux/i386/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
> @@ -2048,6 +2048,8 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

>   GLIBC_2.3 __ctype_b_loc F
>   GLIBC_2.3 __ctype_tolower_loc F
>   GLIBC_2.3 __ctype_toupper_loc F
> diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
> index becd8b1033..cc3445b958 100644
> --- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
> @@ -1917,6 +1917,8 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

>   GLIBC_2.3 __ctype_b_loc F
>   GLIBC_2.3 __ctype_tolower_loc F
>   GLIBC_2.3 __ctype_toupper_loc F
> diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
> index 74e42a5209..f7e28bd5a0 100644
> --- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
> @@ -127,6 +127,8 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

>   GLIBC_2.4 _Exit F
>   GLIBC_2.4 _IO_2_1_stderr_ D 0x98
>   GLIBC_2.4 _IO_2_1_stdin_ D 0x98
> diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
> index 4af5a74e8a..b8f00f6111 100644
> --- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
> @@ -1992,6 +1992,8 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

>   GLIBC_2.3 __ctype_b_loc F
>   GLIBC_2.3 __ctype_tolower_loc F
>   GLIBC_2.3 __ctype_toupper_loc F
> diff --git a/sysdeps/unix/sysv/linux/microblaze/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/libc.abilist
> index ccef673fd2..19f191434f 100644
> --- a/sysdeps/unix/sysv/linux/microblaze/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/microblaze/libc.abilist
> @@ -2133,3 +2133,5 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

> diff --git a/sysdeps/unix/sysv/linux/mips/bits/rseq.h b/sysdeps/unix/sysv/linux/mips/bits/rseq.h
> new file mode 100644
> index 0000000000..19d3755837
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/mips/bits/rseq.h
> @@ -0,0 +1,24 @@
> +/* Copyright (C) 2019 Free Software Foundation, Inc.

Needs a one sentence description.

> +   This file is part of the GNU C Library.
> +   Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_RSEQ_H
> +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
> +#endif
> +
> +/* Signature required before each abort handler code.  */
> +#define RSEQ_SIG 0x53053053

Why isn't this a mips-specific op code?

> diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
> index 1054bb599e..fe43507f55 100644
> --- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
> @@ -1970,6 +1970,8 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

>   GLIBC_2.3 __ctype_b_loc F
>   GLIBC_2.3 __ctype_tolower_loc F
>   GLIBC_2.3 __ctype_toupper_loc F
> diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
> index 4f5b5ffebf..b247c6ea9b 100644
> --- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
> @@ -1968,6 +1968,8 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

>   GLIBC_2.3 __ctype_b_loc F
>   GLIBC_2.3 __ctype_tolower_loc F
>   GLIBC_2.3 __ctype_toupper_loc F
> diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
> index 943aee58d4..5339ca52b6 100644
> --- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
> @@ -1976,6 +1976,8 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

>   GLIBC_2.3 __ctype_b_loc F
>   GLIBC_2.3 __ctype_tolower_loc F
>   GLIBC_2.3 __ctype_toupper_loc F
> diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
> index 17a5d17ef9..11f24eb440 100644
> --- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
> @@ -1971,6 +1971,8 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

>   GLIBC_2.3 __ctype_b_loc F
>   GLIBC_2.3 __ctype_tolower_loc F
>   GLIBC_2.3 __ctype_toupper_loc F
> diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
> index 4d62a540fd..fd223bfc44 100644
> --- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
> @@ -2174,3 +2174,5 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

> diff --git a/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
> new file mode 100644
> index 0000000000..19d3755837
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
> @@ -0,0 +1,24 @@
> +/* Copyright (C) 2019 Free Software Foundation, Inc.

Needs a one sentence description.

> +   This file is part of the GNU C Library.
> +   Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_RSEQ_H
> +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
> +#endif
> +
> +/* Signature required before each abort handler code.  */
> +#define RSEQ_SIG 0x53053053

Why isn't this an opcode specific to power?

> diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
> index ecc2d6fa13..cc53178e81 100644
> --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
> @@ -1996,6 +1996,8 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

>   GLIBC_2.3 __ctype_b_loc F
>   GLIBC_2.3 __ctype_tolower_loc F
>   GLIBC_2.3 __ctype_toupper_loc F
> diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
> index f5830f9c33..2de3134bc7 100644
> --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
> @@ -2000,6 +2000,8 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

>   GLIBC_2.3 __ctype_b_loc F
>   GLIBC_2.3 __ctype_tolower_loc F
>   GLIBC_2.3 __ctype_toupper_loc F
> diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
> index 633d8f4792..aae3def700 100644
> --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
> @@ -126,6 +126,8 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

>   GLIBC_2.3 _Exit F
>   GLIBC_2.3 _IO_2_1_stderr_ D 0xe0
>   GLIBC_2.3 _IO_2_1_stdin_ D 0xe0
> diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
> index 2c712636ef..8d582a3a9b 100644
> --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
> @@ -2231,3 +2231,5 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

> diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
> index 195bc8b2cf..155953f6cf 100644
> --- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
> @@ -2103,3 +2103,5 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

> diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h
> new file mode 100644
> index 0000000000..d676da3701
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/rseq-internal.h
> @@ -0,0 +1,91 @@
> +/* Copyright (C) 2018 Free Software Foundation, Inc.

Needs a one sentence description.

> +   This file is part of the GNU C Library.
> +   Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2018.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef RSEQ_INTERNAL_H
> +#define RSEQ_INTERNAL_H
> +
> +#include <sysdep.h>
> +
> +#ifdef __NR_rseq
> +
> +#include <errno.h>
> +#include <sys/rseq.h>
> +

This needs a giant comment describing the reference count algorithm
and how it is intended to work.

> +static inline int
> +rseq_register_current_thread (void)
> +{
> +  int rc, ret = 0;
> +  INTERNAL_SYSCALL_DECL (err);
> +
> +  if (__rseq_abi.cpu_id == RSEQ_CPU_ID_REGISTRATION_FAILED)
> +    return -1;
> +  /* Temporarily prevent nested signal handlers from registering rseq.  */
> +  __rseq_lib_abi.register_state = RSEQ_REGISTER_ONGOING;

OK.

> +  if (__rseq_lib_abi.refcount == UINT_MAX)
> +    {
> +      ret = -1;
> +      goto end;
> +    }
> +  if (__rseq_lib_abi.refcount++)
> +    goto end;

OK.

> +  rc = INTERNAL_SYSCALL_CALL (rseq, err, &__rseq_abi, sizeof (struct rseq),
> +                              0, RSEQ_SIG);
> +  if (!rc)
> +    goto end;
> +  if (INTERNAL_SYSCALL_ERRNO (rc, err) != EBUSY)
> +    __rseq_abi.cpu_id = RSEQ_CPU_ID_REGISTRATION_FAILED;
> +  ret = -1;
> +end:
> +  __rseq_lib_abi.register_state = RSEQ_REGISTER_ALLOWED;

OK.

> +  return ret;
> +}
> +
> +static inline int
> +rseq_unregister_current_thread (void)
> +{
> +  int rc, ret = 0;
> +  INTERNAL_SYSCALL_DECL (err);
> +
> +  /* Setting __rseq_register_state = RSEQ_REGISTER_EXITING for the rest of the
> +     thread lifetime. Ensures signal handlers nesting just before thread exit
> +     don't try to register rseq.  */
> +  __rseq_lib_abi.register_state = RSEQ_REGISTER_EXITING;
> +  __rseq_lib_abi.refcount = 0;
> +  rc = INTERNAL_SYSCALL_CALL (rseq, err, &__rseq_abi, sizeof (struct rseq),
> +                              RSEQ_FLAG_UNREGISTER, RSEQ_SIG);

OK.

> +  if (!rc)
> +    goto end;
> +  ret = -1;
> +end:
> +  return ret;
> +}
> +#else
> +static inline int
> +rseq_register_current_thread (void)
> +{
> +  return -1;
> +}
> +
> +static inline int
> +rseq_unregister_current_thread (void)
> +{
> +  return -1;
> +}

OK.

> +#endif
> +
> +#endif /* rseq-internal.h */
> diff --git a/sysdeps/unix/sysv/linux/rseq-sym.c b/sysdeps/unix/sysv/linux/rseq-sym.c
> new file mode 100644
> index 0000000000..99b277e9d6
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/rseq-sym.c
> @@ -0,0 +1,54 @@
> +/* Copyright (C) 2018 Free Software Foundation, Inc.

Needs a one sentence description.

> +   This file is part of the GNU C Library.
> +   Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2018.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <sys/syscall.h>
> +#include <stdint.h>
> +
> +#ifdef __NR_rseq
> +#include <sys/rseq.h>
> +#else
> +
> +enum rseq_cpu_id_state {
> +  RSEQ_CPU_ID_UNINITIALIZED = -1,
> +  RSEQ_CPU_ID_REGISTRATION_FAILED = -2,

OK.

> +};
> +
> +/* linux/rseq.h defines struct rseq as aligned on 32 bytes. The kernel ABI
> +   size is 20 bytes.  */
> +struct rseq {
> +  uint32_t cpu_id_start;
> +  uint32_t cpu_id;
> +  uint64_t rseq_cs;
> +  uint32_t flags;
> +} __attribute__ ((aligned(4 * sizeof(uint64_t))));

OK, this is the kernel abi.

> +
> +struct rseq_lib_abi
> +{
> +  uint32_t register_state;
> +  uint32_t refcount;

OK. This is the refcount coordination structure for registration.

> +};
> +
> +#endif
> +
> +/* volatile because fields can be read/updated by the kernel.  */
> +__thread volatile struct rseq __rseq_abi = {
> +  .cpu_id = RSEQ_CPU_ID_UNINITIALIZED,

OK.

> +};
> +
> +/* volatile because fields can be read/updated by signal handlers.  */
> +__thread volatile struct rseq_lib_abi __rseq_lib_abi;

OK.

> diff --git a/sysdeps/unix/sysv/linux/s390/bits/rseq.h b/sysdeps/unix/sysv/linux/s390/bits/rseq.h
> new file mode 100644
> index 0000000000..19d3755837
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h
> @@ -0,0 +1,24 @@
> +/* Copyright (C) 2019 Free Software Foundation, Inc.

Needs a one sentence description.

> +   This file is part of the GNU C Library.
> +   Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_RSEQ_H
> +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
> +#endif
> +
> +/* Signature required before each abort handler code.  */
> +#define RSEQ_SIG 0x53053053

Why not a s390 specific value here?

> diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
> index 334def033c..42316d8666 100644
> --- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
> @@ -2005,6 +2005,8 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

>   GLIBC_2.3 __ctype_b_loc F
>   GLIBC_2.3 __ctype_tolower_loc F
>   GLIBC_2.3 __ctype_toupper_loc F
> diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
> index 536f4c4ced..c6c4e55a77 100644
> --- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
> @@ -1911,6 +1911,8 @@ GLIBC_2.29 __fentry__ F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

>   GLIBC_2.3 __ctype_b_loc F
>   GLIBC_2.3 __ctype_tolower_loc F
>   GLIBC_2.3 __ctype_toupper_loc F
> diff --git a/sysdeps/unix/sysv/linux/sh/libc.abilist b/sysdeps/unix/sysv/linux/sh/libc.abilist
> index 30ae3b6ebb..8652dfea59 100644
> --- a/sysdeps/unix/sysv/linux/sh/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/sh/libc.abilist
> @@ -1887,6 +1887,8 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

>   GLIBC_2.3 __ctype_b_loc F
>   GLIBC_2.3 __ctype_tolower_loc F
>   GLIBC_2.3 __ctype_toupper_loc F
> diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
> index 68b107d080..95b58dfa67 100644
> --- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
> @@ -1999,6 +1999,8 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

>   GLIBC_2.3 __ctype_b_loc F
>   GLIBC_2.3 __ctype_tolower_loc F
>   GLIBC_2.3 __ctype_toupper_loc F
> diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
> index e5b6a4da50..bfd24f9d1c 100644
> --- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
> @@ -1940,6 +1940,8 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

>   GLIBC_2.3 __ctype_b_loc F
>   GLIBC_2.3 __ctype_tolower_loc F
>   GLIBC_2.3 __ctype_toupper_loc F
> diff --git a/sysdeps/unix/sysv/linux/sys/rseq.h b/sysdeps/unix/sysv/linux/sys/rseq.h
> new file mode 100644
> index 0000000000..83c8976f50
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/sys/rseq.h
> @@ -0,0 +1,65 @@

Needs a one sentence description.

> +/* Copyright (C) 2019 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +   Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_RSEQ_H
> +#define _SYS_RSEQ_H	1
> +
> +/* We use the structures declarations from the kernel headers.  */
> +#include <linux/rseq.h>
> +/* Architecture-specific rseq signature.  */
> +#include <bits/rseq.h>
> +#include <stdint.h>

OK.

> +
> +enum rseq_register_state
> +{
> +  /* Value RSEQ_REGISTER_ALLOWED means it is allowed to update
> +     the refcount field and to register/unregister rseq.  */
> +  RSEQ_REGISTER_ALLOWED = 0,
> +  /* Value RSEQ_REGISTER_ONGOING means a rseq registration is in progress,
> +     so it is temporarily forbidden to update the refcount field or to
> +     register/unregister rseq for this thread or signal handlers nested
> +     on this thread.  */
> +  RSEQ_REGISTER_ONGOING = 1,
> +  /* Value RSEQ_REGISTER_EXITING means it is forbidden to update the
> +     refcount field or to register/unregister rseq for the rest of the
> +     thread's lifetime.  */
> +  RSEQ_REGISTER_EXITING = 2,

OK.

> +};
> +
> +struct rseq_lib_abi
> +{
> +  uint32_t register_state; /* enum rseq_register_state.  */
> +  /* The refcount field keeps track of rseq users, so early adopters
> +     of rseq can cooperate amongst each other and with glibc to
> +     share rseq thread registration. The refcount field can only be
> +     updated when allowed by the value of field register_state.
> +     Registering rseq should be performed when incrementing refcount
> +     from 0 to 1, and unregistering rseq should be performed when
> +     decrementing refcount from 1 to 0.  */

OK.

> +  uint32_t refcount;
> +};
> +
> +/* volatile because fields can be read/updated by the kernel.  */
> +extern __thread volatile struct rseq __rseq_abi
> +__attribute__ ((tls_model ("initial-exec")));

OK.

> +
> +/* volatile because fields can be read/updated by signal handlers.  */
> +extern __thread volatile struct rseq_lib_abi __rseq_lib_abi
> +__attribute__ ((tls_model ("initial-exec")));

OK.

> +
> +#endif /* sys/rseq.h */
> diff --git a/sysdeps/unix/sysv/linux/x86/bits/rseq.h b/sysdeps/unix/sysv/linux/x86/bits/rseq.h
> new file mode 100644
> index 0000000000..19d3755837
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/x86/bits/rseq.h
> @@ -0,0 +1,24 @@
> +/* Copyright (C) 2019 Free Software Foundation, Inc.

Add one line short description.

> +   This file is part of the GNU C Library.
> +   Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_RSEQ_H
> +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
> +#endif
> +
> +/* Signature required before each abort handler code.  */
> +#define RSEQ_SIG 0x53053053

Why not an x86-specific op code?

> diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
> index 86dfb0c94d..e9f8411fb2 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
> @@ -1898,6 +1898,8 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

>   GLIBC_2.3 __ctype_b_loc F
>   GLIBC_2.3 __ctype_tolower_loc F
>   GLIBC_2.3 __ctype_toupper_loc F
> diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
> index dd688263aa..f9432d07f1 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
> @@ -2149,3 +2149,5 @@ GLIBC_2.28 thrd_yield F
>   GLIBC_2.29 getcpu F
>   GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
>   GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
> +GLIBC_2.30 __rseq_abi T 0x20
> +GLIBC_2.30 __rseq_lib_abi T 0x8

OK.

> 


-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/4] glibc: sched_getcpu(): use rseq cpu_id TLS on Linux
  2019-02-12 19:42 ` [PATCH 2/4] glibc: sched_getcpu(): use rseq cpu_id TLS on Linux Mathieu Desnoyers
@ 2019-03-22 20:13   ` Carlos O'Donell
  0 siblings, 0 replies; 55+ messages in thread
From: Carlos O'Donell @ 2019-03-22 20:13 UTC (permalink / raw)
  To: Mathieu Desnoyers, Carlos O'Donell
  Cc: Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Boqun Feng, Will Deacon, Dave Watson, Paul Turner, linux-kernel,
	linux-api

On 2/12/19 2:42 PM, Mathieu Desnoyers wrote:
> When available, use the cpu_id field from __rseq_abi on Linux to
> implement sched_getcpu(). Fall-back on the vgetcpu vDSO if unavailable.
> 
> Benchmarks:
> 
> x86-64: Intel E5-2630 v3@2.40GHz, 16-core, hyperthreading

This patch looks good to me for master, but is blocked on patch 1/4
being reworked.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

> glibc sched_getcpu():                     13.7 ns (baseline)
> glibc sched_getcpu() using rseq:           2.5 ns (speedup:  5.5x)
> inline load cpuid from __rseq_abi TLS:     0.8 ns (speedup: 17.1x)
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: Carlos O'Donell <carlos@redhat.com>
> CC: Florian Weimer <fweimer@redhat.com>
> CC: Joseph Myers <joseph@codesourcery.com>
> CC: Szabolcs Nagy <szabolcs.nagy@arm.com>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Ben Maurer <bmaurer@fb.com>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> CC: Boqun Feng <boqun.feng@gmail.com>
> CC: Will Deacon <will.deacon@arm.com>
> CC: Dave Watson <davejwatson@fb.com>
> CC: Paul Turner <pjt@google.com>
> CC: libc-alpha@sourceware.org
> CC: linux-kernel@vger.kernel.org
> CC: linux-api@vger.kernel.org
> ---
>   sysdeps/unix/sysv/linux/sched_getcpu.c | 25 +++++++++++++++++++++++--
>   1 file changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/sysdeps/unix/sysv/linux/sched_getcpu.c b/sysdeps/unix/sysv/linux/sched_getcpu.c
> index fb0d317f83..8bfb03778b 100644
> --- a/sysdeps/unix/sysv/linux/sched_getcpu.c
> +++ b/sysdeps/unix/sysv/linux/sched_getcpu.c
> @@ -24,8 +24,8 @@
>   #endif
>   #include <sysdep-vdso.h>
>   
> -int
> -sched_getcpu (void)
> +static int
> +vsyscall_sched_getcpu (void)

OK.

>   {
>   #ifdef __NR_getcpu
>     unsigned int cpu;
> @@ -37,3 +37,24 @@ sched_getcpu (void)
>     return -1;
>   #endif
>   }
> +
> +#ifdef __NR_rseq
> +#include <linux/rseq.h>
> +
> +extern __attribute__ ((tls_model ("initial-exec")))
> +__thread volatile struct rseq __rseq_abi;

OK.

> +
> +int
> +sched_getcpu (void)
> +{
> +  int cpu_id = __rseq_abi.cpu_id;
> +
> +  return cpu_id >= 0 ? cpu_id : vsyscall_sched_getcpu ();

OK. Impressive :-)

> +}
> +#else
> +int
> +sched_getcpu (void)
> +{
> +  return vsyscall_sched_getcpu ();

OK.

> +}
> +#endif
> 


-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-03-22 20:09   ` Carlos O'Donell
@ 2019-03-25 15:54     ` Mathieu Desnoyers
  2019-03-27  9:16         ` Martin Schwidefsky
                         ` (2 more replies)
  0 siblings, 3 replies; 55+ messages in thread
From: Mathieu Desnoyers @ 2019-03-25 15:54 UTC (permalink / raw)
  To: Carlos O'Donell, Paul Burton, Will Deacon, Boqun Feng,
	Heiko Carstens, Vasily Gorbik, Martin Schwidefsky, Russell King,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: carlos, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api

Hi Carlos,

----- On Mar 22, 2019, at 4:09 PM, Carlos O'Donell codonell@redhat.com wrote:

[...]

I took care of all your comments for an upcoming round of patches, except the
following that remain open (see answer inline). I'm adding Linux maintainers
for ARM, aarch64, MIPS, powerpc, s390, x86 in CC to discuss the choice of
code signature prior to the abort handler for each of those architectures.

[...]

>> diff --git a/NEWS b/NEWS
>> index 912a9bdc0f..0608c60f7d 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -5,6 +5,17 @@ See the end for copying conditions.
>>   Please send GNU C library bug reports via <https://sourceware.org/bugzilla/>
>>   using `glibc' in the "product" field.
>>   
>> +Version 2.30
>> +
>> +Major new features:
>> +
>> +* Support for automatically registering threads with the Linux rseq(2)
>> +  system call has been added. This system call is implemented starting
>> +  from Linux 4.18. In order to be activated, it requires that glibc is built
>> +  against kernel headers that include this system call, and that glibc
>> +  detects availability of that system call at runtime.
> 
> What benefit does the feature have for users? Can you talk about that please?
> Why would a user want to use it. Please feel free to link to an external
> reference.

Would the following text work ?

Version 2.30

Major new features:

* Support for automatically registering threads with the Linux rseq(2)
  system call has been added.  This system call is implemented starting
  from Linux 4.18.  The Restartable Sequences ABI accelerates user-space
  operations on per-cpu data.  It allows user-space to perform updates
  on per-cpu data without requiring heavy-weight atomic operations. See
  https://www.efficios.com/blog/2019/02/08/linux-restartable-sequences/
  for further explanation.

  In order to be activated, it requires that glibc is built against
  kernel headers that include this system call, and that glibc detects
  availability of that system call at runtime.


For reference the assembly code I'm pointing at below can be found
in the Linux selftests under:

tools/testing/selftests/rseq/rseq-*.h

>> +++ b/sysdeps/unix/sysv/linux/arm/bits/rseq.h
[...]
>> +
>> +/* Signature required before each abort handler code.  */
>> +#define RSEQ_SIG 0x53053053
> 
> Why isn't this an arm specific op code? Does the user have to mark this
> up as part of a constant pool when placing it in front of the abort handler
> to avoid disassembling the constant as code? This was at one point required
> to get gdb to work properly.
> 

For arm, the abort is defined as:

#define __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown,           \
                                abort_label, version, flags,            \
                                start_ip, post_commit_offset, abort_ip) \
                ".balign 32\n\t"                                        \
                __rseq_str(table_label) ":\n\t"                         \
                ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
                ".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
                ".word " __rseq_str(RSEQ_SIG) "\n\t"                    \
                __rseq_str(label) ":\n\t"                               \
                teardown                                                \
                "b %l[" __rseq_str(abort_label) "]\n\t"

Which contains a copy of the struct rseq_cs for that critical section
close to the actual critical section, within the code, followed by the
signature. The reason why we have a copy of the struct rseq_cs there is
to speed up entry into the critical section by using the "adr" instruction
to compute the address to store into __rseq_cs->rseq_cs.

AFAIU, a literal pool on ARM is defined as something which is always
jumped over (never executed), which is the case here. We always have
an unconditional branch instruction ("b") skipping over each
RSEQ_ASM_DEFINE_ABORT().

Therefore, given that the signature is part of a literal pool on ARM,
it can be any value we choose and should not need to be an actual valid
instruction.

aarch64 defines the abort as:

#define RSEQ_ASM_DEFINE_ABORT(label, abort_label)                               \
        "       b       222f\n"                                                 \
        "       .inst   "       __rseq_str(RSEQ_SIG) "\n"                       \
        __rseq_str(label) ":\n"                                                 \
        "       b       %l[" __rseq_str(abort_label) "]\n"                      \
        "222:\n"

Where the signature actually maps to a valid instruction. Considering that
aarch64 also have literal pools, and we branch over the signature, I wonder
why it's so important to ensure the signature is a valid trap instruction.
Perhaps Will Deacon can enlighten us ?


[...]
>> +++ b/sysdeps/unix/sysv/linux/bits/rseq.h
>> @@ -0,0 +1,24 @@
[...]
>> +
>> +/* Each architecture supporting rseq should define RSEQ_SIG as a 32-bit
>> +   signature inserted before each rseq abort label in the code section.  */
> 
> Needs a huge explanation about RSEQ_SIG and the reasons why it's per-arch.
> 
> Basically cut-and-paste what you wrote to libc-alpha about this.

Updated to:

/* RSEQ_SIG is a signature required before each abort handler code.

   It is a 32-bit value that maps to actual architecture code compiled
   into applications and libraries. It needs to be defined for each
   architecture. When choosing this value, it needs to be taken into
   account that generating invalid instructions may have ill effects on
   tools like objdump, and may also have impact on the CPU speculative
   execution efficiency in some cases.  */


[...]
>> +++ b/sysdeps/unix/sysv/linux/mips/bits/rseq.h
[...]
>> +
>> +/* Signature required before each abort handler code.  */
>> +#define RSEQ_SIG 0x53053053
> 
> Why isn't this a mips-specific op code?

MIPS also has a literal pool just before the abort handler, and it
jumps over it. My understanding is that we can use any signature value
we want, and it does not need to be a valid instruction, similarly to ARM:

#define __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, \
                                abort_label, version, flags, \
                                start_ip, post_commit_offset, abort_ip) \
                ".balign 32\n\t" \
                __rseq_str(table_label) ":\n\t" \
                ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
                LONG " " U32_U64_PAD(__rseq_str(start_ip)) "\n\t" \
                LONG " " U32_U64_PAD(__rseq_str(post_commit_offset)) "\n\t" \
                LONG " " U32_U64_PAD(__rseq_str(abort_ip)) "\n\t" \
                ".word " __rseq_str(RSEQ_SIG) "\n\t" \
                __rseq_str(label) ":\n\t" \
                teardown \
                "b %l[" __rseq_str(abort_label) "]\n\t"

Perhaps Paul Burton can confirm this ?

[...]
>> +++ b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
[...]
>> +/* Signature required before each abort handler code.  */
>> +#define RSEQ_SIG 0x53053053
> 
> Why isn't this an opcode specific to power?

On powerpc 32/64, the abort is placed in a __rseq_failure executable section:

#define RSEQ_ASM_DEFINE_ABORT(label, abort_label)                               \
                ".pushsection __rseq_failure, \"ax\"\n\t"                       \
                ".long " __rseq_str(RSEQ_SIG) "\n\t"                            \
                __rseq_str(label) ":\n\t"                                       \
                "b %l[" __rseq_str(abort_label) "]\n\t"                         \
                ".popsection\n\t"

That section only contains snippets of those trampolines. Arguably, it would be
good if disassemblers could find valid instructions there. Boqun Feng could perhaps
shed some light on this signature choice ? Now would be a good time to decide
once and for all whether a valid instruction would be a better choice.

[...]
>> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h
[...]
>> +
>> +/* Signature required before each abort handler code.  */
>> +#define RSEQ_SIG 0x53053053
> 
> Why not a s390 specific value here?

s390 also has the abort handler in a __rseq_failure section:

#define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label)             \
                ".pushsection __rseq_failure, \"ax\"\n\t"               \
                ".long " __rseq_str(RSEQ_SIG) "\n\t"                    \
                __rseq_str(label) ":\n\t"                               \
                teardown                                                \
                "j %l[" __rseq_str(abort_label) "]\n\t"                 \
                ".popsection\n\t"

Same question applies as powerpc: since disassemblers will try to decode
that instruction, would it be better to define it as a valid one ?

[...]
>> +++ b/sysdeps/unix/sysv/linux/x86/bits/rseq.h
[...]
>> +/* Signature required before each abort handler code.  */
>> +#define RSEQ_SIG 0x53053053
> 
> Why not an x86-specific op code?

On x86, we use this 4-byte signature as operand to a "no-op" instruction
taking 4-byte immediate operand:

x86-32:

#define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label)             \
                ".pushsection __rseq_failure, \"ax\"\n\t"               \
                /* Disassembler-friendly signature: nopl <sig>. */      \
                ".byte 0x0f, 0x1f, 0x05\n\t"                            \
                ".long " __rseq_str(RSEQ_SIG) "\n\t"                    \
                __rseq_str(label) ":\n\t"                               \
                teardown                                                \
                "jmp %l[" __rseq_str(abort_label) "]\n\t"               \
                ".popsection\n\t"


x86-64:

#define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label)             \
                ".pushsection __rseq_failure, \"ax\"\n\t"               \
                /* Disassembler-friendly signature: nopl <sig>(%rip). */\
                ".byte 0x0f, 0x1f, 0x05\n\t"                            \
                ".long " __rseq_str(RSEQ_SIG) "\n\t"                    \
                __rseq_str(label) ":\n\t"                               \
                teardown                                                \
                "jmp %l[" __rseq_str(abort_label) "]\n\t"               \
                ".popsection\n\t"

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-03-25 15:54     ` Mathieu Desnoyers
@ 2019-03-27  9:16         ` Martin Schwidefsky
  2019-04-02  6:02       ` Michael Ellerman
  2019-04-04 20:50       ` Carlos O'Donell
  2 siblings, 0 replies; 55+ messages in thread
From: Martin Schwidefsky @ 2019-03-27  9:16 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Carlos O'Donell, Paul Burton, Will Deacon, Boqun Feng,
	Heiko Carstens, Vasily Gorbik, Russell King,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api

On Mon, 25 Mar 2019 11:54:32 -0400 (EDT)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> >> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h  
> [...]
> >> +
> >> +/* Signature required before each abort handler code.  */
> >> +#define RSEQ_SIG 0x53053053  
> > 
> > Why not a s390 specific value here?  
> 
> s390 also has the abort handler in a __rseq_failure section:
> 
> #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label)             \
>                 ".pushsection __rseq_failure, \"ax\"\n\t"               \
>                 ".long " __rseq_str(RSEQ_SIG) "\n\t"                    \
>                 __rseq_str(label) ":\n\t"                               \
>                 teardown                                                \
>                 "j %l[" __rseq_str(abort_label) "]\n\t"                 \
>                 ".popsection\n\t"
> 
> Same question applies as powerpc: since disassemblers will try to decode
> that instruction, would it be better to define it as a valid one ?
> 
> [...]

A 4-byte sequence starting with 0x53 is decoded as a "diebr" instruction.
And please replace that "j %l[...]" with a "jg %l[...]", the branch target
range of the "j" instruction is 64K, not enough for the general case.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
@ 2019-03-27  9:16         ` Martin Schwidefsky
  0 siblings, 0 replies; 55+ messages in thread
From: Martin Schwidefsky @ 2019-03-27  9:16 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Carlos O'Donell, Paul Burton, Will Deacon, Boqun Feng,
	Heiko Carstens, Vasily Gorbik, Russell King,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Dave Watson

On Mon, 25 Mar 2019 11:54:32 -0400 (EDT)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> >> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h  
> [...]
> >> +
> >> +/* Signature required before each abort handler code.  */
> >> +#define RSEQ_SIG 0x53053053  
> > 
> > Why not a s390 specific value here?  
> 
> s390 also has the abort handler in a __rseq_failure section:
> 
> #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label)             \
>                 ".pushsection __rseq_failure, \"ax\"\n\t"               \
>                 ".long " __rseq_str(RSEQ_SIG) "\n\t"                    \
>                 __rseq_str(label) ":\n\t"                               \
>                 teardown                                                \
>                 "j %l[" __rseq_str(abort_label) "]\n\t"                 \
>                 ".popsection\n\t"
> 
> Same question applies as powerpc: since disassemblers will try to decode
> that instruction, would it be better to define it as a valid one ?
> 
> [...]

A 4-byte sequence starting with 0x53 is decoded as a "diebr" instruction.
And please replace that "j %l[...]" with a "jg %l[...]", the branch target
range of the "j" instruction is 64K, not enough for the general case.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-03-27  9:16         ` Martin Schwidefsky
@ 2019-03-27 20:01           ` Mathieu Desnoyers
  -1 siblings, 0 replies; 55+ messages in thread
From: Mathieu Desnoyers @ 2019-03-27 20:01 UTC (permalink / raw)
  To: schwidefsky
  Cc: Carlos O'Donell, Paul Burton, Will Deacon, Boqun Feng,
	heiko carstens, gor, Russell King, ARM Linux,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api

----- On Mar 27, 2019, at 5:16 AM, schwidefsky schwidefsky@de.ibm.com wrote:

> On Mon, 25 Mar 2019 11:54:32 -0400 (EDT)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>> >> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h
>> [...]
>> >> +
>> >> +/* Signature required before each abort handler code.  */
>> >> +#define RSEQ_SIG 0x53053053
>> > 
>> > Why not a s390 specific value here?
>> 
>> s390 also has the abort handler in a __rseq_failure section:
>> 
>> #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label)             \
>>                 ".pushsection __rseq_failure, \"ax\"\n\t"               \
>>                 ".long " __rseq_str(RSEQ_SIG) "\n\t"                    \
>>                 __rseq_str(label) ":\n\t"                               \
>>                 teardown                                                \
>>                 "j %l[" __rseq_str(abort_label) "]\n\t"                 \
>>                 ".popsection\n\t"
>> 
>> Same question applies as powerpc: since disassemblers will try to decode
>> that instruction, would it be better to define it as a valid one ?
>> 
>> [...]
> 
> A 4-byte sequence starting with 0x53 is decoded as a "diebr" instruction.

Based on the z/Architecture reference summary, it appears that the DIEBR
instruction's opcode is B353. So I suspect that just starting with 0x53 is
not sufficient to make it a valid opcode in the instruction set. Is it
something we should care about ?

> And please replace that "j %l[...]" with a "jg %l[...]", the branch target
> range of the "j" instruction is 64K, not enough for the general case.

I'll also need to use "jg" for RSEQ_ASM_DEFINE_CMPFAIL:

#define RSEQ_ASM_DEFINE_CMPFAIL(label, teardown, cmpfail_label)         \
                ".pushsection __rseq_failure, \"ax\"\n\t"               \
                __rseq_str(label) ":\n\t"                               \
                teardown                                                \
                "jg %l[" __rseq_str(cmpfail_label) "]\n\t"              \
                ".popsection\n\t"

I'll prepare a fix for Linux selftests, and I already pushed a fix within
my librseq repository.

Thanks for pointing it out!

Mathieu

> 
> --
> blue skies,
>   Martin.
> 
> "Reality continues to ruin my life." - Calvin.

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
@ 2019-03-27 20:01           ` Mathieu Desnoyers
  0 siblings, 0 replies; 55+ messages in thread
From: Mathieu Desnoyers @ 2019-03-27 20:01 UTC (permalink / raw)
  To: schwidefsky
  Cc: Carlos O'Donell, Paul Burton, Will Deacon, Boqun Feng,
	heiko carstens, gor, Russell King, ARM Linux,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Dave Watson

----- On Mar 27, 2019, at 5:16 AM, schwidefsky schwidefsky@de.ibm.com wrote:

> On Mon, 25 Mar 2019 11:54:32 -0400 (EDT)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>> >> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h
>> [...]
>> >> +
>> >> +/* Signature required before each abort handler code.  */
>> >> +#define RSEQ_SIG 0x53053053
>> > 
>> > Why not a s390 specific value here?
>> 
>> s390 also has the abort handler in a __rseq_failure section:
>> 
>> #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label)             \
>>                 ".pushsection __rseq_failure, \"ax\"\n\t"               \
>>                 ".long " __rseq_str(RSEQ_SIG) "\n\t"                    \
>>                 __rseq_str(label) ":\n\t"                               \
>>                 teardown                                                \
>>                 "j %l[" __rseq_str(abort_label) "]\n\t"                 \
>>                 ".popsection\n\t"
>> 
>> Same question applies as powerpc: since disassemblers will try to decode
>> that instruction, would it be better to define it as a valid one ?
>> 
>> [...]
> 
> A 4-byte sequence starting with 0x53 is decoded as a "diebr" instruction.

Based on the z/Architecture reference summary, it appears that the DIEBR
instruction's opcode is B353. So I suspect that just starting with 0x53 is
not sufficient to make it a valid opcode in the instruction set. Is it
something we should care about ?

> And please replace that "j %l[...]" with a "jg %l[...]", the branch target
> range of the "j" instruction is 64K, not enough for the general case.

I'll also need to use "jg" for RSEQ_ASM_DEFINE_CMPFAIL:

#define RSEQ_ASM_DEFINE_CMPFAIL(label, teardown, cmpfail_label)         \
                ".pushsection __rseq_failure, \"ax\"\n\t"               \
                __rseq_str(label) ":\n\t"                               \
                teardown                                                \
                "jg %l[" __rseq_str(cmpfail_label) "]\n\t"              \
                ".popsection\n\t"

I'll prepare a fix for Linux selftests, and I already pushed a fix within
my librseq repository.

Thanks for pointing it out!

Mathieu

> 
> --
> blue skies,
>   Martin.
> 
> "Reality continues to ruin my life." - Calvin.

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-03-27  9:16         ` Martin Schwidefsky
@ 2019-03-27 20:38           ` Carlos O'Donell
  -1 siblings, 0 replies; 55+ messages in thread
From: Carlos O'Donell @ 2019-03-27 20:38 UTC (permalink / raw)
  To: Martin Schwidefsky, Mathieu Desnoyers
  Cc: Paul Burton, Will Deacon, Boqun Feng, Heiko Carstens,
	Vasily Gorbik, Russell King, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, carlos, Florian Weimer,
	Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
	Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson,
	Paul Turner, Rich Felker, linux-kernel, linux-api

On 3/27/19 5:16 AM, Martin Schwidefsky wrote:
> On Mon, 25 Mar 2019 11:54:32 -0400 (EDT)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>>>> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h
>> [...]
>>>> +
>>>> +/* Signature required before each abort handler code.  */
>>>> +#define RSEQ_SIG 0x53053053
>>>
>>> Why not a s390 specific value here?
>>
>> s390 also has the abort handler in a __rseq_failure section:
>>
>> #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label)             \
>>                  ".pushsection __rseq_failure, \"ax\"\n\t"               \
>>                  ".long " __rseq_str(RSEQ_SIG) "\n\t"                    \
>>                  __rseq_str(label) ":\n\t"                               \
>>                  teardown                                                \
>>                  "j %l[" __rseq_str(abort_label) "]\n\t"                 \
>>                  ".popsection\n\t"
>>
>> Same question applies as powerpc: since disassemblers will try to decode
>> that instruction, would it be better to define it as a valid one ?
>>
>> [...]
> 
> A 4-byte sequence starting with 0x53 is decoded as a "diebr" instruction.
> And please replace that "j %l[...]" with a "jg %l[...]", the branch target
> range of the "j" instruction is 64K, not enough for the general case.

Why was this particular operated selected?
  
So on s390 the RSEQ_SIG will show up as an unexpected "divide to integer"
instruction that can't be reached by any control flow?

Can we use a NOP with a unique value in an immediate operand?

The goal being to have something that won't confuse during a debug
session, or that the debugger can ignore (like constant pools on Arm)

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
@ 2019-03-27 20:38           ` Carlos O'Donell
  0 siblings, 0 replies; 55+ messages in thread
From: Carlos O'Donell @ 2019-03-27 20:38 UTC (permalink / raw)
  To: Martin Schwidefsky, Mathieu Desnoyers
  Cc: Paul Burton, Will Deacon, Boqun Feng, Heiko Carstens,
	Vasily Gorbik, Russell King, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, carlos, Florian Weimer,
	Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
	Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson,
	Paul Turner

On 3/27/19 5:16 AM, Martin Schwidefsky wrote:
> On Mon, 25 Mar 2019 11:54:32 -0400 (EDT)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>>>> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h
>> [...]
>>>> +
>>>> +/* Signature required before each abort handler code.  */
>>>> +#define RSEQ_SIG 0x53053053
>>>
>>> Why not a s390 specific value here?
>>
>> s390 also has the abort handler in a __rseq_failure section:
>>
>> #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label)             \
>>                  ".pushsection __rseq_failure, \"ax\"\n\t"               \
>>                  ".long " __rseq_str(RSEQ_SIG) "\n\t"                    \
>>                  __rseq_str(label) ":\n\t"                               \
>>                  teardown                                                \
>>                  "j %l[" __rseq_str(abort_label) "]\n\t"                 \
>>                  ".popsection\n\t"
>>
>> Same question applies as powerpc: since disassemblers will try to decode
>> that instruction, would it be better to define it as a valid one ?
>>
>> [...]
> 
> A 4-byte sequence starting with 0x53 is decoded as a "diebr" instruction.
> And please replace that "j %l[...]" with a "jg %l[...]", the branch target
> range of the "j" instruction is 64K, not enough for the general case.

Why was this particular operated selected?
  
So on s390 the RSEQ_SIG will show up as an unexpected "divide to integer"
instruction that can't be reached by any control flow?

Can we use a NOP with a unique value in an immediate operand?

The goal being to have something that won't confuse during a debug
session, or that the debugger can ignore (like constant pools on Arm)

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-03-27 20:38           ` Carlos O'Donell
@ 2019-03-28  7:49             ` Martin Schwidefsky
  -1 siblings, 0 replies; 55+ messages in thread
From: Martin Schwidefsky @ 2019-03-28  7:49 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Mathieu Desnoyers, Paul Burton, Will Deacon, Boqun Feng,
	Heiko Carstens, Vasily Gorbik, Russell King,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api

On Wed, 27 Mar 2019 16:38:32 -0400
"Carlos O'Donell" <codonell@redhat.com> wrote:

> On 3/27/19 5:16 AM, Martin Schwidefsky wrote:
> > On Mon, 25 Mar 2019 11:54:32 -0400 (EDT)
> > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> >   
> >>>> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h  
> >> [...]  
> >>>> +
> >>>> +/* Signature required before each abort handler code.  */
> >>>> +#define RSEQ_SIG 0x53053053  
> >>>
> >>> Why not a s390 specific value here?  
> >>
> >> s390 also has the abort handler in a __rseq_failure section:
> >>
> >> #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label)             \
> >>                  ".pushsection __rseq_failure, \"ax\"\n\t"               \
> >>                  ".long " __rseq_str(RSEQ_SIG) "\n\t"                    \
> >>                  __rseq_str(label) ":\n\t"                               \
> >>                  teardown                                                \
> >>                  "j %l[" __rseq_str(abort_label) "]\n\t"                 \
> >>                  ".popsection\n\t"
> >>
> >> Same question applies as powerpc: since disassemblers will try to decode
> >> that instruction, would it be better to define it as a valid one ?
> >>
> >> [...]  
> > 
> > A 4-byte sequence starting with 0x53 is decoded as a "diebr" instruction.
> > And please replace that "j %l[...]" with a "jg %l[...]", the branch target
> > range of the "j" instruction is 64K, not enough for the general case.  
> 
> Why was this particular operated selected?
>   
> So on s390 the RSEQ_SIG will show up as an unexpected "divide to integer"
> instruction that can't be reached by any control flow?
> 
> Can we use a NOP with a unique value in an immediate operand?
> 
> The goal being to have something that won't confuse during a debug
> session, or that the debugger can ignore (like constant pools on Arm)

I was looking at the wrong table in regard to opcode 0x53. The pattern
0x53...... is not a known instruction as far as the disassembler is
concerned. As Mathieu pointed out "diebr" is actually 0xb353....
Sorry about the confusion.

But why do we need this value in the first place if it can not be reached?

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
@ 2019-03-28  7:49             ` Martin Schwidefsky
  0 siblings, 0 replies; 55+ messages in thread
From: Martin Schwidefsky @ 2019-03-28  7:49 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Mathieu Desnoyers, Paul Burton, Will Deacon, Boqun Feng,
	Heiko Carstens, Vasily Gorbik, Russell King,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney

On Wed, 27 Mar 2019 16:38:32 -0400
"Carlos O'Donell" <codonell@redhat.com> wrote:

> On 3/27/19 5:16 AM, Martin Schwidefsky wrote:
> > On Mon, 25 Mar 2019 11:54:32 -0400 (EDT)
> > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> >   
> >>>> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h  
> >> [...]  
> >>>> +
> >>>> +/* Signature required before each abort handler code.  */
> >>>> +#define RSEQ_SIG 0x53053053  
> >>>
> >>> Why not a s390 specific value here?  
> >>
> >> s390 also has the abort handler in a __rseq_failure section:
> >>
> >> #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label)             \
> >>                  ".pushsection __rseq_failure, \"ax\"\n\t"               \
> >>                  ".long " __rseq_str(RSEQ_SIG) "\n\t"                    \
> >>                  __rseq_str(label) ":\n\t"                               \
> >>                  teardown                                                \
> >>                  "j %l[" __rseq_str(abort_label) "]\n\t"                 \
> >>                  ".popsection\n\t"
> >>
> >> Same question applies as powerpc: since disassemblers will try to decode
> >> that instruction, would it be better to define it as a valid one ?
> >>
> >> [...]  
> > 
> > A 4-byte sequence starting with 0x53 is decoded as a "diebr" instruction.
> > And please replace that "j %l[...]" with a "jg %l[...]", the branch target
> > range of the "j" instruction is 64K, not enough for the general case.  
> 
> Why was this particular operated selected?
>   
> So on s390 the RSEQ_SIG will show up as an unexpected "divide to integer"
> instruction that can't be reached by any control flow?
> 
> Can we use a NOP with a unique value in an immediate operand?
> 
> The goal being to have something that won't confuse during a debug
> session, or that the debugger can ignore (like constant pools on Arm)

I was looking at the wrong table in regard to opcode 0x53. The pattern
0x53...... is not a known instruction as far as the disassembler is
concerned. As Mathieu pointed out "diebr" is actually 0xb353....
Sorry about the confusion.

But why do we need this value in the first place if it can not be reached?

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-03-28  7:49             ` Martin Schwidefsky
@ 2019-03-28 15:42               ` Mathieu Desnoyers
  -1 siblings, 0 replies; 55+ messages in thread
From: Mathieu Desnoyers @ 2019-03-28 15:42 UTC (permalink / raw)
  To: schwidefsky
  Cc: Carlos O'Donell, Paul Burton, Will Deacon, Boqun Feng,
	heiko carstens, gor, Russell King, ARM Linux,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api

----- On Mar 28, 2019, at 3:49 AM, schwidefsky schwidefsky@de.ibm.com wrote:

> On Wed, 27 Mar 2019 16:38:32 -0400
> "Carlos O'Donell" <codonell@redhat.com> wrote:
> 
>> On 3/27/19 5:16 AM, Martin Schwidefsky wrote:
>> > On Mon, 25 Mar 2019 11:54:32 -0400 (EDT)
>> > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>> >   
>> >>>> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h
>> >> [...]
>> >>>> +
>> >>>> +/* Signature required before each abort handler code.  */
>> >>>> +#define RSEQ_SIG 0x53053053
>> >>>
>> >>> Why not a s390 specific value here?
>> >>
>> >> s390 also has the abort handler in a __rseq_failure section:
>> >>
>> >> #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label)             \
>> >>                  ".pushsection __rseq_failure, \"ax\"\n\t"               \
>> >>                  ".long " __rseq_str(RSEQ_SIG) "\n\t"                    \
>> >>                  __rseq_str(label) ":\n\t"                               \
>> >>                  teardown                                                \
>> >>                  "j %l[" __rseq_str(abort_label) "]\n\t"                 \
>> >>                  ".popsection\n\t"
>> >>
>> >> Same question applies as powerpc: since disassemblers will try to decode
>> >> that instruction, would it be better to define it as a valid one ?
>> >>
>> >> [...]
>> > 
>> > A 4-byte sequence starting with 0x53 is decoded as a "diebr" instruction.
>> > And please replace that "j %l[...]" with a "jg %l[...]", the branch target
>> > range of the "j" instruction is 64K, not enough for the general case.
>> 
>> Why was this particular operated selected?

The 0x53053053 signature was selected by myself on x86, where it is a
4-byte operand to a no-op instruction. That value looks like "SEQSEQSE"
in hexadecimal. The goal was to have an uncommon code signature value.

Then it has been used as-is on arm and mips within literal pools (which seems
fine), and also on s390 and powerpc where those seem to generate invalid
instruction within a separate code section.

I'm mainly concerned about the choice of this value on s390 and powerpc.

>>   
>> So on s390 the RSEQ_SIG will show up as an unexpected "divide to integer"
>> instruction that can't be reached by any control flow?
>> 
>> Can we use a NOP with a unique value in an immediate operand?
>> 
>> The goal being to have something that won't confuse during a debug
>> session, or that the debugger can ignore (like constant pools on Arm)
> 
> I was looking at the wrong table in regard to opcode 0x53. The pattern
> 0x53...... is not a known instruction as far as the disassembler is
> concerned. As Mathieu pointed out "diebr" is actually 0xb353....
> Sorry about the confusion.
> 
> But why do we need this value in the first place if it can not be reached?

One reason is to help disassemblers in tools like objdump, gdb, and so on.

Another reason for making this a valid instruction is if the CPU speculative
execution can be helped by making this instruction a valid one, even though
it's not reachable through normal execution. (this appears to be important
at least on aarch64)

However, we want that instruction to be an uncommon one, to reduce the chances
that an attacker can use the rseq abort mechanism to redirect execution to
an unrelated code block that would happen to follow that same 4-byte signature.
For instance, I would not use a no-op which is typically generated by compilers
there.

In summary:

* x86: Uses a no-op instruction ending with a 4-byte operand.
* aarch64: Uses a trap instruction which also has a seldom-used operand value.
* arm: The signature is in a literal pool (there is a jump over the signature).
* mips: The signature is in a literal pool (there is a jump over the signature).
* powerpc: the signature is an unreachable invalid(?) instruction within an
  executable section.
* s390: the signature is an unreachable invalid instruction within an executable
  section.

Thanks,

Mathieu



-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
@ 2019-03-28 15:42               ` Mathieu Desnoyers
  0 siblings, 0 replies; 55+ messages in thread
From: Mathieu Desnoyers @ 2019-03-28 15:42 UTC (permalink / raw)
  To: schwidefsky
  Cc: Carlos O'Donell, Paul Burton, Will Deacon, Boqun Feng,
	heiko carstens, gor, Russell King, ARM Linux,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Dave Watson

----- On Mar 28, 2019, at 3:49 AM, schwidefsky schwidefsky@de.ibm.com wrote:

> On Wed, 27 Mar 2019 16:38:32 -0400
> "Carlos O'Donell" <codonell@redhat.com> wrote:
> 
>> On 3/27/19 5:16 AM, Martin Schwidefsky wrote:
>> > On Mon, 25 Mar 2019 11:54:32 -0400 (EDT)
>> > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>> >   
>> >>>> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h
>> >> [...]
>> >>>> +
>> >>>> +/* Signature required before each abort handler code.  */
>> >>>> +#define RSEQ_SIG 0x53053053
>> >>>
>> >>> Why not a s390 specific value here?
>> >>
>> >> s390 also has the abort handler in a __rseq_failure section:
>> >>
>> >> #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label)             \
>> >>                  ".pushsection __rseq_failure, \"ax\"\n\t"               \
>> >>                  ".long " __rseq_str(RSEQ_SIG) "\n\t"                    \
>> >>                  __rseq_str(label) ":\n\t"                               \
>> >>                  teardown                                                \
>> >>                  "j %l[" __rseq_str(abort_label) "]\n\t"                 \
>> >>                  ".popsection\n\t"
>> >>
>> >> Same question applies as powerpc: since disassemblers will try to decode
>> >> that instruction, would it be better to define it as a valid one ?
>> >>
>> >> [...]
>> > 
>> > A 4-byte sequence starting with 0x53 is decoded as a "diebr" instruction.
>> > And please replace that "j %l[...]" with a "jg %l[...]", the branch target
>> > range of the "j" instruction is 64K, not enough for the general case.
>> 
>> Why was this particular operated selected?

The 0x53053053 signature was selected by myself on x86, where it is a
4-byte operand to a no-op instruction. That value looks like "SEQSEQSE"
in hexadecimal. The goal was to have an uncommon code signature value.

Then it has been used as-is on arm and mips within literal pools (which seems
fine), and also on s390 and powerpc where those seem to generate invalid
instruction within a separate code section.

I'm mainly concerned about the choice of this value on s390 and powerpc.

>>   
>> So on s390 the RSEQ_SIG will show up as an unexpected "divide to integer"
>> instruction that can't be reached by any control flow?
>> 
>> Can we use a NOP with a unique value in an immediate operand?
>> 
>> The goal being to have something that won't confuse during a debug
>> session, or that the debugger can ignore (like constant pools on Arm)
> 
> I was looking at the wrong table in regard to opcode 0x53. The pattern
> 0x53...... is not a known instruction as far as the disassembler is
> concerned. As Mathieu pointed out "diebr" is actually 0xb353....
> Sorry about the confusion.
> 
> But why do we need this value in the first place if it can not be reached?

One reason is to help disassemblers in tools like objdump, gdb, and so on.

Another reason for making this a valid instruction is if the CPU speculative
execution can be helped by making this instruction a valid one, even though
it's not reachable through normal execution. (this appears to be important
at least on aarch64)

However, we want that instruction to be an uncommon one, to reduce the chances
that an attacker can use the rseq abort mechanism to redirect execution to
an unrelated code block that would happen to follow that same 4-byte signature.
For instance, I would not use a no-op which is typically generated by compilers
there.

In summary:

* x86: Uses a no-op instruction ending with a 4-byte operand.
* aarch64: Uses a trap instruction which also has a seldom-used operand value.
* arm: The signature is in a literal pool (there is a jump over the signature).
* mips: The signature is in a literal pool (there is a jump over the signature).
* powerpc: the signature is an unreachable invalid(?) instruction within an
  executable section.
* s390: the signature is an unreachable invalid instruction within an executable
  section.

Thanks,

Mathieu



-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-03-25 15:54     ` Mathieu Desnoyers
  2019-03-27  9:16         ` Martin Schwidefsky
@ 2019-04-02  6:02       ` Michael Ellerman
  2019-04-02  7:08           ` Florian Weimer
  2019-04-04 20:15         ` Carlos O'Donell
  2019-04-04 20:50       ` Carlos O'Donell
  2 siblings, 2 replies; 55+ messages in thread
From: Michael Ellerman @ 2019-04-02  6:02 UTC (permalink / raw)
  To: Mathieu Desnoyers, Carlos O'Donell, Paul Burton, Will Deacon,
	Boqun Feng, Heiko Carstens, Vasily Gorbik, Martin Schwidefsky,
	Russell King, Benjamin Herrenschmidt, Paul Mackerras
  Cc: carlos, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api

Mathieu Desnoyers <mathieu.desnoyers@efficios.com> writes:
> Hi Carlos,
>
> ----- On Mar 22, 2019, at 4:09 PM, Carlos O'Donell codonell@redhat.com wrote:
...
>
> [...]
>>> +++ b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
> [...]
>>> +/* Signature required before each abort handler code.  */
>>> +#define RSEQ_SIG 0x53053053
>> 
>> Why isn't this an opcode specific to power?
>
> On powerpc 32/64, the abort is placed in a __rseq_failure executable section:
>
> #define RSEQ_ASM_DEFINE_ABORT(label, abort_label)                               \
>                 ".pushsection __rseq_failure, \"ax\"\n\t"                       \
>                 ".long " __rseq_str(RSEQ_SIG) "\n\t"                            \
>                 __rseq_str(label) ":\n\t"                                       \
>                 "b %l[" __rseq_str(abort_label) "]\n\t"                         \
>                 ".popsection\n\t"
>
> That section only contains snippets of those trampolines. Arguably, it would be
> good if disassemblers could find valid instructions there. Boqun Feng could perhaps
> shed some light on this signature choice ? Now would be a good time to decide
> once and for all whether a valid instruction would be a better choice.

I'm a bit vague on what we're trying to do here.

But it seems like you want some sort of "eye catcher" prior to the branch?

That value is a valid instruction on current CPUs (rlwimi.
r5,r24,6,1,9), and even if it wasn't it could become one in future.

If you change it to 0x8053530 that is both a valid instruction and is a
nop (conditional trap immediate but with no conditions set).

cheers

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-04-02  6:02       ` Michael Ellerman
@ 2019-04-02  7:08           ` Florian Weimer
  2019-04-04 20:15         ` Carlos O'Donell
  1 sibling, 0 replies; 55+ messages in thread
From: Florian Weimer @ 2019-04-02  7:08 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Mathieu Desnoyers, Carlos O'Donell, Paul Burton, Will Deacon,
	Boqun Feng, Heiko Carstens, Vasily Gorbik, Martin Schwidefsky,
	Russell King, Benjamin Herrenschmidt, Paul Mackerras, carlos,
	Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
	Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson,
	Paul Turner, Rich Felker, linux-kernel, linux-api

* Michael Ellerman:

> I'm a bit vague on what we're trying to do here.
>
> But it seems like you want some sort of "eye catcher" prior to the branch?
>
> That value is a valid instruction on current CPUs (rlwimi.
> r5,r24,6,1,9), and even if it wasn't it could become one in future.
>
> If you change it to 0x8053530 that is both a valid instruction and is a
> nop (conditional trap immediate but with no conditions set).

I think we need something that is very unlikely to appear in the
instruction stream.  It's just a marker.  The instruction will never be
executed, and it does not have to be a trap, either (I believe that a
standard trap instruction would be a bad choice).

Thanks,
Florian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
@ 2019-04-02  7:08           ` Florian Weimer
  0 siblings, 0 replies; 55+ messages in thread
From: Florian Weimer @ 2019-04-02  7:08 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Mathieu Desnoyers, Carlos O'Donell, Paul Burton, Will Deacon,
	Boqun Feng, Heiko Carstens, Vasily Gorbik, Martin Schwidefsky,
	Russell King, Benjamin Herrenschmidt, Paul Mackerras, carlos,
	Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
	Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson

* Michael Ellerman:

> I'm a bit vague on what we're trying to do here.
>
> But it seems like you want some sort of "eye catcher" prior to the branch?
>
> That value is a valid instruction on current CPUs (rlwimi.
> r5,r24,6,1,9), and even if it wasn't it could become one in future.
>
> If you change it to 0x8053530 that is both a valid instruction and is a
> nop (conditional trap immediate but with no conditions set).

I think we need something that is very unlikely to appear in the
instruction stream.  It's just a marker.  The instruction will never be
executed, and it does not have to be a trap, either (I believe that a
standard trap instruction would be a bad choice).

Thanks,
Florian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-04-02  6:02       ` Michael Ellerman
  2019-04-02  7:08           ` Florian Weimer
@ 2019-04-04 20:15         ` Carlos O'Donell
  1 sibling, 0 replies; 55+ messages in thread
From: Carlos O'Donell @ 2019-04-04 20:15 UTC (permalink / raw)
  To: Michael Ellerman, Mathieu Desnoyers, Paul Burton, Will Deacon,
	Boqun Feng, Heiko Carstens, Vasily Gorbik, Martin Schwidefsky,
	Russell King, Benjamin Herrenschmidt, Paul Mackerras
  Cc: carlos, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api

On 4/2/19 2:02 AM, Michael Ellerman wrote:
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> writes:
>> Hi Carlos,
>>
>> ----- On Mar 22, 2019, at 4:09 PM, Carlos O'Donell codonell@redhat.com wrote:
> ...
>>
>> [...]
>>>> +++ b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
>> [...]
>>>> +/* Signature required before each abort handler code.  */
>>>> +#define RSEQ_SIG 0x53053053
>>>
>>> Why isn't this an opcode specific to power?
>>
>> On powerpc 32/64, the abort is placed in a __rseq_failure executable section:
>>
>> #define RSEQ_ASM_DEFINE_ABORT(label, abort_label)                               \
>>                  ".pushsection __rseq_failure, \"ax\"\n\t"                       \
>>                  ".long " __rseq_str(RSEQ_SIG) "\n\t"                            \
>>                  __rseq_str(label) ":\n\t"                                       \
>>                  "b %l[" __rseq_str(abort_label) "]\n\t"                         \
>>                  ".popsection\n\t"
>>
>> That section only contains snippets of those trampolines. Arguably, it would be
>> good if disassemblers could find valid instructions there. Boqun Feng could perhaps
>> shed some light on this signature choice ? Now would be a good time to decide
>> once and for all whether a valid instruction would be a better choice.
> 
> I'm a bit vague on what we're trying to do here.
> 
> But it seems like you want some sort of "eye catcher" prior to the branch?
> 
> That value is a valid instruction on current CPUs (rlwimi.
> r5,r24,6,1,9), and even if it wasn't it could become one in future.
> 
> If you change it to 0x8053530 that is both a valid instruction and is a
> nop (conditional trap immediate but with no conditions set).

The s390 IBM team needs to respond to this and I want to make sure they ACK
the NOP being used here because it impacts them directly.

I'd like to see Martin's opinion on this.

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-04-02  7:08           ` Florian Weimer
@ 2019-04-04 20:32             ` Carlos O'Donell
  -1 siblings, 0 replies; 55+ messages in thread
From: Carlos O'Donell @ 2019-04-04 20:32 UTC (permalink / raw)
  To: Florian Weimer, Michael Ellerman
  Cc: Mathieu Desnoyers, Paul Burton, Will Deacon, Boqun Feng,
	Heiko Carstens, Vasily Gorbik, Martin Schwidefsky, Russell King,
	Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers,
	Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer,
	Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner,
	Rich Felker, linux-kernel, linux-api

On 4/2/19 3:08 AM, Florian Weimer wrote:
> * Michael Ellerman:
> 
>> I'm a bit vague on what we're trying to do here.
>>
>> But it seems like you want some sort of "eye catcher" prior to the branch?
>>
>> That value is a valid instruction on current CPUs (rlwimi.
>> r5,r24,6,1,9), and even if it wasn't it could become one in future.
>>
>> If you change it to 0x8053530 that is both a valid instruction and is a
>> nop (conditional trap immediate but with no conditions set).
> 
> I think we need something that is very unlikely to appear in the
> instruction stream.  It's just a marker.  The instruction will never be
> executed, and it does not have to be a trap, either (I believe that a
> standard trap instruction would be a bad choice).

I assume you want to avoid a standard trap instruction because it would
be common, and so not meet the intent of the RSEQ_SIG choice as being something
that is *uncommon* right?

It is valuable that it be a trap, particularly for constant pools because
it means that a jump into the constant pool will trap.

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
@ 2019-04-04 20:32             ` Carlos O'Donell
  0 siblings, 0 replies; 55+ messages in thread
From: Carlos O'Donell @ 2019-04-04 20:32 UTC (permalink / raw)
  To: Florian Weimer, Michael Ellerman
  Cc: Mathieu Desnoyers, Paul Burton, Will Deacon, Boqun Feng,
	Heiko Carstens, Vasily Gorbik, Martin Schwidefsky, Russell King,
	Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers,
	Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer,
	Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner

On 4/2/19 3:08 AM, Florian Weimer wrote:
> * Michael Ellerman:
> 
>> I'm a bit vague on what we're trying to do here.
>>
>> But it seems like you want some sort of "eye catcher" prior to the branch?
>>
>> That value is a valid instruction on current CPUs (rlwimi.
>> r5,r24,6,1,9), and even if it wasn't it could become one in future.
>>
>> If you change it to 0x8053530 that is both a valid instruction and is a
>> nop (conditional trap immediate but with no conditions set).
> 
> I think we need something that is very unlikely to appear in the
> instruction stream.  It's just a marker.  The instruction will never be
> executed, and it does not have to be a trap, either (I believe that a
> standard trap instruction would be a bad choice).

I assume you want to avoid a standard trap instruction because it would
be common, and so not meet the intent of the RSEQ_SIG choice as being something
that is *uncommon* right?

It is valuable that it be a trap, particularly for constant pools because
it means that a jump into the constant pool will trap.

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-03-25 15:54     ` Mathieu Desnoyers
  2019-03-27  9:16         ` Martin Schwidefsky
  2019-04-02  6:02       ` Michael Ellerman
@ 2019-04-04 20:50       ` Carlos O'Donell
  2019-04-04 21:41           ` Paul Burton
  2 siblings, 1 reply; 55+ messages in thread
From: Carlos O'Donell @ 2019-04-04 20:50 UTC (permalink / raw)
  To: Mathieu Desnoyers, Paul Burton, Will Deacon, Boqun Feng,
	Heiko Carstens, Vasily Gorbik, Martin Schwidefsky, Russell King,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: carlos, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api

On 3/25/19 11:54 AM, Mathieu Desnoyers wrote:
> Hi Carlos,
> 
> ----- On Mar 22, 2019, at 4:09 PM, Carlos O'Donell codonell@redhat.com wrote:
> 
> [...]
> 
> I took care of all your comments for an upcoming round of patches, except the
> following that remain open (see answer inline). I'm adding Linux maintainers
> for ARM, aarch64, MIPS, powerpc, s390, x86 in CC to discuss the choice of
> code signature prior to the abort handler for each of those architectures.

Thank you for kicking off this conversation.

Every architecture should have a reasonable RSEQ_SIG that applies to their
ISA with a comment about why that instruction was chosen. It should be a
conscious choice, without a default.

> * Support for automatically registering threads with the Linux rseq(2)
>    system call has been added.  This system call is implemented starting
>    from Linux 4.18.  The Restartable Sequences ABI accelerates user-space
>    operations on per-cpu data.  It allows user-space to perform updates
>    on per-cpu data without requiring heavy-weight atomic operations. See
>    https://www.efficios.com/blog/2019/02/08/linux-restartable-sequences/
>    for further explanation.
> 
>    In order to be activated, it requires that glibc is built against
>    kernel headers that include this system call, and that glibc detects
>    availability of that system call at runtime.

Suggest:

* Support for automatically registering threads with the Linux rseq(2)
   system call has been added.  This system call is implemented starting
   from Linux 4.18.  The Restartable Sequences ABI accelerates user-space
   operations on per-cpu data.  It allows user-space to perform updates
   on per-cpu data without requiring heavy-weight atomic operations.
   Automatically registering threads allows all libraries, including libc,
   to make immediate use of the rseq(2) support by using the documented ABI.
   See 'man 2 rseq' for the details of the ABI shared between libc and the
   kernel.

> 
> For reference the assembly code I'm pointing at below can be found
> in the Linux selftests under:
> 
> tools/testing/selftests/rseq/rseq-*.h

OK.


>>> +++ b/sysdeps/unix/sysv/linux/arm/bits/rseq.h
> [...]
>>> +
>>> +/* Signature required before each abort handler code.  */
>>> +#define RSEQ_SIG 0x53053053
>>
>> Why isn't this an arm specific op code? Does the user have to mark this
>> up as part of a constant pool when placing it in front of the abort handler
>> to avoid disassembling the constant as code? This was at one point required
>> to get gdb to work properly.
>>
> 
> For arm, the abort is defined as:
> 
> #define __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown,           \
>                                  abort_label, version, flags,            \
>                                  start_ip, post_commit_offset, abort_ip) \
>                  ".balign 32\n\t"                                        \
>                  __rseq_str(table_label) ":\n\t"                         \
>                  ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
>                  ".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
>                  ".word " __rseq_str(RSEQ_SIG) "\n\t"                    \
>                  __rseq_str(label) ":\n\t"                               \
>                  teardown                                                \
>                  "b %l[" __rseq_str(abort_label) "]\n\t"
> 
> Which contains a copy of the struct rseq_cs for that critical section
> close to the actual critical section, within the code, followed by the
> signature. The reason why we have a copy of the struct rseq_cs there is
> to speed up entry into the critical section by using the "adr" instruction
> to compute the address to store into __rseq_cs->rseq_cs.
> 
> AFAIU, a literal pool on ARM is defined as something which is always
> jumped over (never executed), which is the case here. We always have
> an unconditional branch instruction ("b") skipping over each
> RSEQ_ASM_DEFINE_ABORT().
> 
> Therefore, given that the signature is part of a literal pool on ARM,
> it can be any value we choose and should not need to be an actual valid
> instruction.
> 
> aarch64 defines the abort as:
> 
> #define RSEQ_ASM_DEFINE_ABORT(label, abort_label)                               \
>          "       b       222f\n"                                                 \
>          "       .inst   "       __rseq_str(RSEQ_SIG) "\n"                       \
>          __rseq_str(label) ":\n"                                                 \
>          "       b       %l[" __rseq_str(abort_label) "]\n"                      \
>          "222:\n"
> 
> Where the signature actually maps to a valid instruction. Considering that
> aarch64 also have literal pools, and we branch over the signature, I wonder
> why it's so important to ensure the signature is a valid trap instruction.
> Perhaps Will Deacon can enlighten us ?

In the event that you accidentally jump to it then you trap?

However, you want an *uncommon* trap insn.

I think the order of preference is:

1.  An uncommon insn (with random immediate values), in a literal pool, that is
     not a useful ROP/JOP sequence (very uncommon)
2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon)
2b. A NOP to avoid affecting speculative execution (maybe uncommon)

With 2a/2b being roughly equivalent depending on speculative execution policy.

>>> +/* Signature required before each abort handler code.  */
>>> +#define RSEQ_SIG 0x53053053
>>
>> Why isn't this a mips-specific op code?
> 
> MIPS also has a literal pool just before the abort handler, and it
> jumps over it. My understanding is that we can use any signature value
> we want, and it does not need to be a valid instruction, similarly to ARM:
> 
> #define __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, \
>                                  abort_label, version, flags, \
>                                  start_ip, post_commit_offset, abort_ip) \
>                  ".balign 32\n\t" \
>                  __rseq_str(table_label) ":\n\t" \
>                  ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
>                  LONG " " U32_U64_PAD(__rseq_str(start_ip)) "\n\t" \
>                  LONG " " U32_U64_PAD(__rseq_str(post_commit_offset)) "\n\t" \
>                  LONG " " U32_U64_PAD(__rseq_str(abort_ip)) "\n\t" \
>                  ".word " __rseq_str(RSEQ_SIG) "\n\t" \
>                  __rseq_str(label) ":\n\t" \
>                  teardown \
>                  "b %l[" __rseq_str(abort_label) "]\n\t"
> 
> Perhaps Paul Burton can confirm this ?

Yes please.

You also want to avoid the value being a valid MIPS insn that's common.

Did you check that?

> [...]
>>> +++ b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
> [...]
>>> +/* Signature required before each abort handler code.  */
>>> +#define RSEQ_SIG 0x53053053
>>
>> Why isn't this an opcode specific to power?
> 
> On powerpc 32/64, the abort is placed in a __rseq_failure executable section:
> 
> #define RSEQ_ASM_DEFINE_ABORT(label, abort_label)                               \
>                  ".pushsection __rseq_failure, \"ax\"\n\t"                       \
>                  ".long " __rseq_str(RSEQ_SIG) "\n\t"                            \
>                  __rseq_str(label) ":\n\t"                                       \
>                  "b %l[" __rseq_str(abort_label) "]\n\t"                         \
>                  ".popsection\n\t"
> 
> That section only contains snippets of those trampolines. Arguably, it would be
> good if disassemblers could find valid instructions there. Boqun Feng could perhaps
> shed some light on this signature choice ? Now would be a good time to decide
> once and for all whether a valid instruction would be a better choice.

This seems questionable too.

> [...]
>>> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h
> [...]
>>> +
>>> +/* Signature required before each abort handler code.  */
>>> +#define RSEQ_SIG 0x53053053
>>
>> Why not a s390 specific value here?
> 
> s390 also has the abort handler in a __rseq_failure section:
> 
> #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label)             \
>                  ".pushsection __rseq_failure, \"ax\"\n\t"               \
>                  ".long " __rseq_str(RSEQ_SIG) "\n\t"                    \
>                  __rseq_str(label) ":\n\t"                               \
>                  teardown                                                \
>                  "j %l[" __rseq_str(abort_label) "]\n\t"                 \
>                  ".popsection\n\t"
> 
> Same question applies as powerpc: since disassemblers will try to decode
> that instruction, would it be better to define it as a valid one ?

Yes, I think it needs to be a valid uncommon insn or nop.

> [...]
>>> +++ b/sysdeps/unix/sysv/linux/x86/bits/rseq.h
> [...]
>>> +/* Signature required before each abort handler code.  */
>>> +#define RSEQ_SIG 0x53053053
>>
>> Why not an x86-specific op code?
> 
> On x86, we use this 4-byte signature as operand to a "no-op" instruction
> taking 4-byte immediate operand:

That makes perfect sense. Thanks.

So what is left to audit?

In summary:

- Why does AArch64 choose a trap?

- Is the current choice of 0x53053053 OK for MIPS? Does it map to a valid insn?

- What better choice is there for power? Pick a real uncommon insn or nop?

- What better choice is there for s390? Pick a real uncommon insn or nop?
   - Todays choice could become something special in the future since it's unassigned.

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-04-04 20:50       ` Carlos O'Donell
@ 2019-04-04 21:41           ` Paul Burton
  0 siblings, 0 replies; 55+ messages in thread
From: Paul Burton @ 2019-04-04 21:41 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Mathieu Desnoyers, Will Deacon, Boqun Feng, Heiko Carstens,
	Vasily Gorbik, Martin Schwidefsky, Russell King,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api

Hi Carlos / all,

On Thu, Apr 04, 2019 at 04:50:08PM -0400, Carlos O'Donell wrote:
> > > > +/* Signature required before each abort handler code.  */
> > > > +#define RSEQ_SIG 0x53053053
> > > 
> > > Why isn't this a mips-specific op code?
> > 
> > MIPS also has a literal pool just before the abort handler, and it
> > jumps over it. My understanding is that we can use any signature value
> > we want, and it does not need to be a valid instruction, similarly to ARM:
> > 
> > #define __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, \
> >                                  abort_label, version, flags, \
> >                                  start_ip, post_commit_offset, abort_ip) \
> >                  ".balign 32\n\t" \
> >                  __rseq_str(table_label) ":\n\t" \
> >                  ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
> >                  LONG " " U32_U64_PAD(__rseq_str(start_ip)) "\n\t" \
> >                  LONG " " U32_U64_PAD(__rseq_str(post_commit_offset)) "\n\t" \
> >                  LONG " " U32_U64_PAD(__rseq_str(abort_ip)) "\n\t" \
> >                  ".word " __rseq_str(RSEQ_SIG) "\n\t" \
> >                  __rseq_str(label) ":\n\t" \
> >                  teardown \
> >                  "b %l[" __rseq_str(abort_label) "]\n\t"
> > 
> > Perhaps Paul Burton can confirm this ?
> 
> Yes please.
> 
> You also want to avoid the value being a valid MIPS insn that's common.
> 
> Did you check that?

This does not decode as a standard MIPS instruction, though it does
decode for both the microMIPS (ori) & nanoMIPS (lwxs; sll) ISAs.

I imagine I copied the value from another architecture when porting, and
since it doesn't get executed it seemed fine.

One maybe nicer option along the same lines would be 0x72736571 or
0x71657372 (ASCII 'rseq') neither of which decode as a MIPS instruction.

> I think the order of preference is:
> 
> 1.  An uncommon insn (with random immediate values), in a literal pool, that is
>     not a useful ROP/JOP sequence (very uncommon)

For that option on MIPS we could do something like:

  sll $0, $0, 31     # effectively a nop, but looks weird

> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon)

Our break instruction has a 19b immediate in nanoMIPS (20b for microMIPS
& classic MIPS) so that could be something like:

  break 0x7273       # ASCII 'rs'

That's pretty unlikely to be seen in normal code, or the teq instruction
has a rarely used code field (4b in microMIPS, 5b in nanoMIPS, 10b in
classic MIPS) that's meaningless to hardware so something like this
would be possible:

  teq $0, $0, 0x8    # ASCII backspace

> 2b. A NOP to avoid affecting speculative execution (maybe uncommon)
> 
> With 2a/2b being roughly equivalent depending on speculative execution policy.

There are a bunch of potential odd looking nops possible, one of which
would be the sll I mentioned above.

Another option would be to use a priveleged instruction which userland
code can't execute & should normally never contain. That would decode as
a valid instruction & effectively behave like a trap instruction but
look very odd to anyone reading disassembled code. eg:

  mfc0 $0, 13        # Try to read the cause register; take SIGILL

In order to handle MIPS vs microMIPS vs nanoMIPS differences I'm
thinking it may be best to switch to one of these real instructions that
looks strange. The ugly part would be the nest of #ifdef's to deal with
endianness & ISA when defining it as a number...

Thanks,
    Paul

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
@ 2019-04-04 21:41           ` Paul Burton
  0 siblings, 0 replies; 55+ messages in thread
From: Paul Burton @ 2019-04-04 21:41 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Mathieu Desnoyers, Will Deacon, Boqun Feng, Heiko Carstens,
	Vasily Gorbik, Martin Schwidefsky, Russell King,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Dave

Hi Carlos / all,

On Thu, Apr 04, 2019 at 04:50:08PM -0400, Carlos O'Donell wrote:
> > > > +/* Signature required before each abort handler code.  */
> > > > +#define RSEQ_SIG 0x53053053
> > > 
> > > Why isn't this a mips-specific op code?
> > 
> > MIPS also has a literal pool just before the abort handler, and it
> > jumps over it. My understanding is that we can use any signature value
> > we want, and it does not need to be a valid instruction, similarly to ARM:
> > 
> > #define __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, \
> >                                  abort_label, version, flags, \
> >                                  start_ip, post_commit_offset, abort_ip) \
> >                  ".balign 32\n\t" \
> >                  __rseq_str(table_label) ":\n\t" \
> >                  ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
> >                  LONG " " U32_U64_PAD(__rseq_str(start_ip)) "\n\t" \
> >                  LONG " " U32_U64_PAD(__rseq_str(post_commit_offset)) "\n\t" \
> >                  LONG " " U32_U64_PAD(__rseq_str(abort_ip)) "\n\t" \
> >                  ".word " __rseq_str(RSEQ_SIG) "\n\t" \
> >                  __rseq_str(label) ":\n\t" \
> >                  teardown \
> >                  "b %l[" __rseq_str(abort_label) "]\n\t"
> > 
> > Perhaps Paul Burton can confirm this ?
> 
> Yes please.
> 
> You also want to avoid the value being a valid MIPS insn that's common.
> 
> Did you check that?

This does not decode as a standard MIPS instruction, though it does
decode for both the microMIPS (ori) & nanoMIPS (lwxs; sll) ISAs.

I imagine I copied the value from another architecture when porting, and
since it doesn't get executed it seemed fine.

One maybe nicer option along the same lines would be 0x72736571 or
0x71657372 (ASCII 'rseq') neither of which decode as a MIPS instruction.

> I think the order of preference is:
> 
> 1.  An uncommon insn (with random immediate values), in a literal pool, that is
>     not a useful ROP/JOP sequence (very uncommon)

For that option on MIPS we could do something like:

  sll $0, $0, 31     # effectively a nop, but looks weird

> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon)

Our break instruction has a 19b immediate in nanoMIPS (20b for microMIPS
& classic MIPS) so that could be something like:

  break 0x7273       # ASCII 'rs'

That's pretty unlikely to be seen in normal code, or the teq instruction
has a rarely used code field (4b in microMIPS, 5b in nanoMIPS, 10b in
classic MIPS) that's meaningless to hardware so something like this
would be possible:

  teq $0, $0, 0x8    # ASCII backspace

> 2b. A NOP to avoid affecting speculative execution (maybe uncommon)
> 
> With 2a/2b being roughly equivalent depending on speculative execution policy.

There are a bunch of potential odd looking nops possible, one of which
would be the sll I mentioned above.

Another option would be to use a priveleged instruction which userland
code can't execute & should normally never contain. That would decode as
a valid instruction & effectively behave like a trap instruction but
look very odd to anyone reading disassembled code. eg:

  mfc0 $0, 13        # Try to read the cause register; take SIGILL

In order to handle MIPS vs microMIPS vs nanoMIPS differences I'm
thinking it may be best to switch to one of these real instructions that
looks strange. The ugly part would be the nest of #ifdef's to deal with
endianness & ISA when defining it as a number...

Thanks,
    Paul

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-04-04 20:32             ` Carlos O'Donell
@ 2019-04-05  9:16               ` Florian Weimer
  -1 siblings, 0 replies; 55+ messages in thread
From: Florian Weimer @ 2019-04-05  9:16 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Michael Ellerman, Mathieu Desnoyers, Paul Burton, Will Deacon,
	Boqun Feng, Heiko Carstens, Vasily Gorbik, Martin Schwidefsky,
	Russell King, Benjamin Herrenschmidt, Paul Mackerras, carlos,
	Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
	Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson,
	Paul Turner, Rich Felker, linux-kernel, linux-api

* Carlos O'Donell:

> On 4/2/19 3:08 AM, Florian Weimer wrote:
>> * Michael Ellerman:
>>
>>> I'm a bit vague on what we're trying to do here.
>>>
>>> But it seems like you want some sort of "eye catcher" prior to the branch?
>>>
>>> That value is a valid instruction on current CPUs (rlwimi.
>>> r5,r24,6,1,9), and even if it wasn't it could become one in future.
>>>
>>> If you change it to 0x8053530 that is both a valid instruction and is a
>>> nop (conditional trap immediate but with no conditions set).
>>
>> I think we need something that is very unlikely to appear in the
>> instruction stream.  It's just a marker.  The instruction will never be
>> executed, and it does not have to be a trap, either (I believe that a
>> standard trap instruction would be a bad choice).
>
> I assume you want to avoid a standard trap instruction because it would
> be common, and so not meet the intent of the RSEQ_SIG choice as being something
> that is *uncommon* right?

Ideally, RSEQ_SIG would be something that does not show up in the
instruction stream at all, so that it is a reliable marker for the start
of an rseq handler.  I assume the intent here is that the kernel
provides some validation on the program counter it reads from the rseq
area, so that we do not end up with some easily-abused gadget in every
process image.

> It is valuable that it be a trap, particularly for constant pools because
> it means that a jump into the constant pool will trap.

Sorry, I don't understand why this matters in this context.  Would you
please elaborate?

Thanks,
Florian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
@ 2019-04-05  9:16               ` Florian Weimer
  0 siblings, 0 replies; 55+ messages in thread
From: Florian Weimer @ 2019-04-05  9:16 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Michael Ellerman, Mathieu Desnoyers, Paul Burton, Will Deacon,
	Boqun Feng, Heiko Carstens, Vasily Gorbik, Martin Schwidefsky,
	Russell King, Benjamin Herrenschmidt, Paul Mackerras, carlos,
	Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
	Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave

* Carlos O'Donell:

> On 4/2/19 3:08 AM, Florian Weimer wrote:
>> * Michael Ellerman:
>>
>>> I'm a bit vague on what we're trying to do here.
>>>
>>> But it seems like you want some sort of "eye catcher" prior to the branch?
>>>
>>> That value is a valid instruction on current CPUs (rlwimi.
>>> r5,r24,6,1,9), and even if it wasn't it could become one in future.
>>>
>>> If you change it to 0x8053530 that is both a valid instruction and is a
>>> nop (conditional trap immediate but with no conditions set).
>>
>> I think we need something that is very unlikely to appear in the
>> instruction stream.  It's just a marker.  The instruction will never be
>> executed, and it does not have to be a trap, either (I believe that a
>> standard trap instruction would be a bad choice).
>
> I assume you want to avoid a standard trap instruction because it would
> be common, and so not meet the intent of the RSEQ_SIG choice as being something
> that is *uncommon* right?

Ideally, RSEQ_SIG would be something that does not show up in the
instruction stream at all, so that it is a reliable marker for the start
of an rseq handler.  I assume the intent here is that the kernel
provides some validation on the program counter it reads from the rseq
area, so that we do not end up with some easily-abused gadget in every
process image.

> It is valuable that it be a trap, particularly for constant pools because
> it means that a jump into the constant pool will trap.

Sorry, I don't understand why this matters in this context.  Would you
please elaborate?

Thanks,
Florian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-04-05  9:16               ` Florian Weimer
@ 2019-04-05 15:40                 ` Carlos O'Donell
  -1 siblings, 0 replies; 55+ messages in thread
From: Carlos O'Donell @ 2019-04-05 15:40 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Michael Ellerman, Mathieu Desnoyers, Paul Burton, Will Deacon,
	Boqun Feng, Heiko Carstens, Vasily Gorbik, Martin Schwidefsky,
	Russell King, Benjamin Herrenschmidt, Paul Mackerras, carlos,
	Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
	Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson,
	Paul Turner, Rich Felker, linux-kernel, linux-api

On 4/5/19 5:16 AM, Florian Weimer wrote:
> * Carlos O'Donell:
>> It is valuable that it be a trap, particularly for constant pools because
>> it means that a jump into the constant pool will trap.
> 
> Sorry, I don't understand why this matters in this context.  Would you
> please elaborate?

Sorry, I wasn't very clear.

My point is only that any accidental jumps, either with off-by-one (like you
fixed in gcc/glibc's signal unwinding most recently), result in a process fault
rather than executing RSEQ_SIG as a valid instruction *and then* continuing
onwards to the handler.

A process fault is achieved either by a trap, or an invalid instruction, or
a privileged insn (like suggested for MIPS in this thread).

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
@ 2019-04-05 15:40                 ` Carlos O'Donell
  0 siblings, 0 replies; 55+ messages in thread
From: Carlos O'Donell @ 2019-04-05 15:40 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Michael Ellerman, Mathieu Desnoyers, Paul Burton, Will Deacon,
	Boqun Feng, Heiko Carstens, Vasily Gorbik, Martin Schwidefsky,
	Russell King, Benjamin Herrenschmidt, Paul Mackerras, carlos,
	Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
	Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave

On 4/5/19 5:16 AM, Florian Weimer wrote:
> * Carlos O'Donell:
>> It is valuable that it be a trap, particularly for constant pools because
>> it means that a jump into the constant pool will trap.
> 
> Sorry, I don't understand why this matters in this context.  Would you
> please elaborate?

Sorry, I wasn't very clear.

My point is only that any accidental jumps, either with off-by-one (like you
fixed in gcc/glibc's signal unwinding most recently), result in a process fault
rather than executing RSEQ_SIG as a valid instruction *and then* continuing
onwards to the handler.

A process fault is achieved either by a trap, or an invalid instruction, or
a privileged insn (like suggested for MIPS in this thread).

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-04-05 15:40                 ` Carlos O'Donell
@ 2019-04-08 19:20                   ` Tulio Magno Quites Machado Filho
  -1 siblings, 0 replies; 55+ messages in thread
From: Tulio Magno Quites Machado Filho @ 2019-04-08 19:20 UTC (permalink / raw)
  To: Carlos O'Donell, Florian Weimer, Michael Meissner,
	Alan Modra, Peter Bergner, Michael Ellerman
  Cc: Mathieu Desnoyers, Paul Burton, Will Deacon, Boqun Feng,
	Heiko Carstens, Vasily Gorbik, Martin Schwidefsky, Russell King,
	Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers,
	Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer,
	Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner,
	Rich Felker, linux-kernel, linux-api

Carlos O'Donell <codonell@redhat.com> writes:

> On 4/5/19 5:16 AM, Florian Weimer wrote:
>> * Carlos O'Donell:
>>> It is valuable that it be a trap, particularly for constant pools because
>>> it means that a jump into the constant pool will trap.
>> 
>> Sorry, I don't understand why this matters in this context.  Would you
>> please elaborate?
>
> Sorry, I wasn't very clear.
>
> My point is only that any accidental jumps, either with off-by-one (like you
> fixed in gcc/glibc's signal unwinding most recently), result in a process fault
> rather than executing RSEQ_SIG as a valid instruction *and then* continuing
> onwards to the handler.
>
> A process fault is achieved either by a trap, or an invalid instruction, or
> a privileged insn (like suggested for MIPS in this thread).

In that case, mtmsr (Move to Machine State Register) seems a good candidate.

mtmsr is available both on 32 and 64 bits since their first implementations.

It's a privileged instruction and should never appear in userspace
code (causes SIGILL).

Any comments?

-- 
Tulio Magno

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
@ 2019-04-08 19:20                   ` Tulio Magno Quites Machado Filho
  0 siblings, 0 replies; 55+ messages in thread
From: Tulio Magno Quites Machado Filho @ 2019-04-08 19:20 UTC (permalink / raw)
  To: Carlos O'Donell, Florian Weimer, Michael Meissner,
	Alan Modra, Peter Bergner, Michael Ellerman
  Cc: Mathieu Desnoyers, Paul Burton, Will Deacon, Boqun Feng,
	Heiko Carstens, Vasily Gorbik, Martin Schwidefsky, Russell King,
	Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers,
	Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer,
	Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner

Carlos O'Donell <codonell@redhat.com> writes:

> On 4/5/19 5:16 AM, Florian Weimer wrote:
>> * Carlos O'Donell:
>>> It is valuable that it be a trap, particularly for constant pools because
>>> it means that a jump into the constant pool will trap.
>> 
>> Sorry, I don't understand why this matters in this context.  Would you
>> please elaborate?
>
> Sorry, I wasn't very clear.
>
> My point is only that any accidental jumps, either with off-by-one (like you
> fixed in gcc/glibc's signal unwinding most recently), result in a process fault
> rather than executing RSEQ_SIG as a valid instruction *and then* continuing
> onwards to the handler.
>
> A process fault is achieved either by a trap, or an invalid instruction, or
> a privileged insn (like suggested for MIPS in this thread).

In that case, mtmsr (Move to Machine State Register) seems a good candidate.

mtmsr is available both on 32 and 64 bits since their first implementations.

It's a privileged instruction and should never appear in userspace
code (causes SIGILL).

Any comments?

-- 
Tulio Magno

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-04-08 19:20                   ` Tulio Magno Quites Machado Filho
@ 2019-04-08 21:45                     ` Carlos O'Donell
  -1 siblings, 0 replies; 55+ messages in thread
From: Carlos O'Donell @ 2019-04-08 21:45 UTC (permalink / raw)
  To: Tulio Magno Quites Machado Filho, Florian Weimer,
	Michael Meissner, Alan Modra, Peter Bergner, Michael Ellerman,
	Mathieu Desnoyers
  Cc: Paul Burton, Will Deacon, Boqun Feng, Heiko Carstens,
	Vasily Gorbik, Martin Schwidefsky, Russell King,
	Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers,
	Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer,
	Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner,
	Rich Felker, linux-kernel, linux-api

On 4/8/19 3:20 PM, Tulio Magno Quites Machado Filho wrote:
> Carlos O'Donell <codonell@redhat.com> writes:
> 
>> On 4/5/19 5:16 AM, Florian Weimer wrote:
>>> * Carlos O'Donell:
>>>> It is valuable that it be a trap, particularly for constant pools because
>>>> it means that a jump into the constant pool will trap.
>>>
>>> Sorry, I don't understand why this matters in this context.  Would you
>>> please elaborate?
>>
>> Sorry, I wasn't very clear.
>>
>> My point is only that any accidental jumps, either with off-by-one (like you
>> fixed in gcc/glibc's signal unwinding most recently), result in a process fault
>> rather than executing RSEQ_SIG as a valid instruction *and then* continuing
>> onwards to the handler.
>>
>> A process fault is achieved either by a trap, or an invalid instruction, or
>> a privileged insn (like suggested for MIPS in this thread).
> 
> In that case, mtmsr (Move to Machine State Register) seems a good candidate.
> 
> mtmsr is available both on 32 and 64 bits since their first implementations.
> 
> It's a privileged instruction and should never appear in userspace
> code (causes SIGILL).
> 
> Any comments?
  
That seems good to me.

Mathieu,

What's required to move this forward for POWER?

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
@ 2019-04-08 21:45                     ` Carlos O'Donell
  0 siblings, 0 replies; 55+ messages in thread
From: Carlos O'Donell @ 2019-04-08 21:45 UTC (permalink / raw)
  To: Tulio Magno Quites Machado Filho, Florian Weimer,
	Michael Meissner, Alan Modra, Peter Bergner, Michael Ellerman,
	Mathieu Desnoyers
  Cc: Paul Burton, Will Deacon, Boqun Feng, Heiko Carstens,
	Vasily Gorbik, Martin Schwidefsky, Russell King,
	Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers,
	Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer,
	Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner,
	Rich Felker

On 4/8/19 3:20 PM, Tulio Magno Quites Machado Filho wrote:
> Carlos O'Donell <codonell@redhat.com> writes:
> 
>> On 4/5/19 5:16 AM, Florian Weimer wrote:
>>> * Carlos O'Donell:
>>>> It is valuable that it be a trap, particularly for constant pools because
>>>> it means that a jump into the constant pool will trap.
>>>
>>> Sorry, I don't understand why this matters in this context.  Would you
>>> please elaborate?
>>
>> Sorry, I wasn't very clear.
>>
>> My point is only that any accidental jumps, either with off-by-one (like you
>> fixed in gcc/glibc's signal unwinding most recently), result in a process fault
>> rather than executing RSEQ_SIG as a valid instruction *and then* continuing
>> onwards to the handler.
>>
>> A process fault is achieved either by a trap, or an invalid instruction, or
>> a privileged insn (like suggested for MIPS in this thread).
> 
> In that case, mtmsr (Move to Machine State Register) seems a good candidate.
> 
> mtmsr is available both on 32 and 64 bits since their first implementations.
> 
> It's a privileged instruction and should never appear in userspace
> code (causes SIGILL).
> 
> Any comments?
  
That seems good to me.

Mathieu,

What's required to move this forward for POWER?

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-04-08 21:45                     ` Carlos O'Donell
@ 2019-04-09  4:23                       ` Michael Ellerman
  -1 siblings, 0 replies; 55+ messages in thread
From: Michael Ellerman @ 2019-04-09  4:23 UTC (permalink / raw)
  To: Carlos O'Donell, Tulio Magno Quites Machado Filho,
	Florian Weimer, Michael Meissner, Alan Modra, Peter Bergner,
	Mathieu Desnoyers
  Cc: Paul Burton, Will Deacon, Boqun Feng, Heiko Carstens,
	Vasily Gorbik, Martin Schwidefsky, Russell King,
	Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers,
	Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer,
	Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner,
	Rich Felker, linux-kernel, linux-api

Carlos O'Donell <codonell@redhat.com> writes:
> On 4/8/19 3:20 PM, Tulio Magno Quites Machado Filho wrote:
>> Carlos O'Donell <codonell@redhat.com> writes:
>> 
>>> On 4/5/19 5:16 AM, Florian Weimer wrote:
>>>> * Carlos O'Donell:
>>>>> It is valuable that it be a trap, particularly for constant pools because
>>>>> it means that a jump into the constant pool will trap.
>>>>
>>>> Sorry, I don't understand why this matters in this context.  Would you
>>>> please elaborate?
>>>
>>> Sorry, I wasn't very clear.
>>>
>>> My point is only that any accidental jumps, either with off-by-one (like you
>>> fixed in gcc/glibc's signal unwinding most recently), result in a process fault
>>> rather than executing RSEQ_SIG as a valid instruction *and then* continuing
>>> onwards to the handler.
>>>
>>> A process fault is achieved either by a trap, or an invalid instruction, or
>>> a privileged insn (like suggested for MIPS in this thread).
>> 
>> In that case, mtmsr (Move to Machine State Register) seems a good candidate.
>> 
>> mtmsr is available both on 32 and 64 bits since their first implementations.
>> 
>> It's a privileged instruction and should never appear in userspace
>> code (causes SIGILL).

I'd much rather we use a trap with a specific immediate value. Otherwise
someone's going to waste time one day puzzling over why userspace is
doing mtmsr.

It would also complicate things if we ever wanted to emulate mtmsr.

If we want something that is a trap rather than a nop then use 0x0fe50553.

That's "compare the value in r5 with 0x553 and then trap unconditionally".

It shows up in objdump as:

    10000000:	53 05 e5 0f 	twui    r5,1363


The immediate can be anything, I chose that value to mimic the x86 value
Mathieu mentioned.

There's no reason that instruction would ever be generated because the
immediate value serves no purpose. So it satisfies the "very unlikely
to appear" criteria AFAICS.

cheers

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
@ 2019-04-09  4:23                       ` Michael Ellerman
  0 siblings, 0 replies; 55+ messages in thread
From: Michael Ellerman @ 2019-04-09  4:23 UTC (permalink / raw)
  To: Carlos O'Donell, Tulio Magno Quites Machado Filho,
	Florian Weimer, Michael Meissner, Alan Modra, Peter Bergner,
	Mathieu Desnoyers
  Cc: Paul Burton, Will Deacon, Boqun Feng, Heiko Carstens,
	Vasily Gorbik, Martin Schwidefsky, Russell King,
	Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers,
	Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer,
	Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner,
	Rich Felker

Carlos O'Donell <codonell@redhat.com> writes:
> On 4/8/19 3:20 PM, Tulio Magno Quites Machado Filho wrote:
>> Carlos O'Donell <codonell@redhat.com> writes:
>> 
>>> On 4/5/19 5:16 AM, Florian Weimer wrote:
>>>> * Carlos O'Donell:
>>>>> It is valuable that it be a trap, particularly for constant pools because
>>>>> it means that a jump into the constant pool will trap.
>>>>
>>>> Sorry, I don't understand why this matters in this context.  Would you
>>>> please elaborate?
>>>
>>> Sorry, I wasn't very clear.
>>>
>>> My point is only that any accidental jumps, either with off-by-one (like you
>>> fixed in gcc/glibc's signal unwinding most recently), result in a process fault
>>> rather than executing RSEQ_SIG as a valid instruction *and then* continuing
>>> onwards to the handler.
>>>
>>> A process fault is achieved either by a trap, or an invalid instruction, or
>>> a privileged insn (like suggested for MIPS in this thread).
>> 
>> In that case, mtmsr (Move to Machine State Register) seems a good candidate.
>> 
>> mtmsr is available both on 32 and 64 bits since their first implementations.
>> 
>> It's a privileged instruction and should never appear in userspace
>> code (causes SIGILL).

I'd much rather we use a trap with a specific immediate value. Otherwise
someone's going to waste time one day puzzling over why userspace is
doing mtmsr.

It would also complicate things if we ever wanted to emulate mtmsr.

If we want something that is a trap rather than a nop then use 0x0fe50553.

That's "compare the value in r5 with 0x553 and then trap unconditionally".

It shows up in objdump as:

    10000000:	53 05 e5 0f 	twui    r5,1363


The immediate can be anything, I chose that value to mimic the x86 value
Mathieu mentioned.

There's no reason that instruction would ever be generated because the
immediate value serves no purpose. So it satisfies the "very unlikely
to appear" criteria AFAICS.

cheers

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-04-09  4:23                       ` Michael Ellerman
@ 2019-04-09  9:29                         ` Alan Modra
  -1 siblings, 0 replies; 55+ messages in thread
From: Alan Modra @ 2019-04-09  9:29 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Carlos O'Donell, Tulio Magno Quites Machado Filho,
	Florian Weimer, Michael Meissner, Peter Bergner,
	Mathieu Desnoyers, Paul Burton, Will Deacon, Boqun Feng,
	Heiko Carstens, Vasily Gorbik, Martin Schwidefsky, Russell King,
	Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers,
	Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer,
	Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner,
	Rich Felker, linux-kernel, linux-api

On Tue, Apr 09, 2019 at 02:23:53PM +1000, Michael Ellerman wrote:
> I'd much rather we use a trap with a specific immediate value. Otherwise
> someone's going to waste time one day puzzling over why userspace is
> doing mtmsr.

It's data.  We have other data in executable sections.  Anyone who
wonders about odd disassembly just hasn't realized they are
disassembling data.

> It would also complicate things if we ever wanted to emulate mtmsr.

No, because it won't be executed.  If I understand correctly, the only
reason to choose an illegal, trap or privileged insn is to halt
execution earlier rather than later when a program goes off in the
weeds.

> If we want something that is a trap rather than a nop then use 0x0fe50553.
> 
> That's "compare the value in r5 with 0x553 and then trap unconditionally".
> 
> It shows up in objdump as:
> 
>     10000000:	53 05 e5 0f 	twui    r5,1363
> 
> 
> The immediate can be anything, I chose that value to mimic the x86 value
> Mathieu mentioned.
> 
> There's no reason that instruction would ever be generated because the
> immediate value serves no purpose. So it satisfies the "very unlikely
> to appear" criteria AFAICS.

Yes, looks fine to me, except that in VLE mode (do we care?)
".long 0x0fe50553" disassembles as
   0:	0f e5       	se_cmphl r5,r30
   2:	05 53       	se_mullw r3,r5
No illegal/trap/privileged insn there.

".long 0x0fe5000b" might be better to cover VLE.

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
@ 2019-04-09  9:29                         ` Alan Modra
  0 siblings, 0 replies; 55+ messages in thread
From: Alan Modra @ 2019-04-09  9:29 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Carlos O'Donell, Tulio Magno Quites Machado Filho,
	Florian Weimer, Michael Meissner, Peter Bergner,
	Mathieu Desnoyers, Paul Burton, Will Deacon, Boqun Feng,
	Heiko Carstens, Vasily Gorbik, Martin Schwidefsky, Russell King,
	Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers,
	Szabolcs Nagy, libc-alpha

On Tue, Apr 09, 2019 at 02:23:53PM +1000, Michael Ellerman wrote:
> I'd much rather we use a trap with a specific immediate value. Otherwise
> someone's going to waste time one day puzzling over why userspace is
> doing mtmsr.

It's data.  We have other data in executable sections.  Anyone who
wonders about odd disassembly just hasn't realized they are
disassembling data.

> It would also complicate things if we ever wanted to emulate mtmsr.

No, because it won't be executed.  If I understand correctly, the only
reason to choose an illegal, trap or privileged insn is to halt
execution earlier rather than later when a program goes off in the
weeds.

> If we want something that is a trap rather than a nop then use 0x0fe50553.
> 
> That's "compare the value in r5 with 0x553 and then trap unconditionally".
> 
> It shows up in objdump as:
> 
>     10000000:	53 05 e5 0f 	twui    r5,1363
> 
> 
> The immediate can be anything, I chose that value to mimic the x86 value
> Mathieu mentioned.
> 
> There's no reason that instruction would ever be generated because the
> immediate value serves no purpose. So it satisfies the "very unlikely
> to appear" criteria AFAICS.

Yes, looks fine to me, except that in VLE mode (do we care?)
".long 0x0fe50553" disassembles as
   0:	0f e5       	se_cmphl r5,r30
   2:	05 53       	se_mullw r3,r5
No illegal/trap/privileged insn there.

".long 0x0fe5000b" might be better to cover VLE.

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-04-09  9:29                         ` Alan Modra
  (?)
@ 2019-04-09 13:58                         ` Tulio Magno Quites Machado Filho
  2019-04-09 14:13                             ` Carlos O'Donell
  -1 siblings, 1 reply; 55+ messages in thread
From: Tulio Magno Quites Machado Filho @ 2019-04-09 13:58 UTC (permalink / raw)
  To: Alan Modra, Michael Ellerman
  Cc: Carlos O'Donell, Florian Weimer, Michael Meissner,
	Peter Bergner, Mathieu Desnoyers, Paul Burton, Will Deacon,
	Boqun Feng, Heiko Carstens, Vasily Gorbik, Martin Schwidefsky,
	Russell King, Benjamin Herrenschmidt, Paul Mackerras, carlos,
	Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
	Ben Maurer, Peter Zijlstra

Alan Modra <amodra@gmail.com> writes:

> On Tue, Apr 09, 2019 at 02:23:53PM +1000, Michael Ellerman wrote:
>> I'd much rather we use a trap with a specific immediate value. Otherwise
>> someone's going to waste time one day puzzling over why userspace is
>> doing mtmsr.
>
> It's data.  We have other data in executable sections.  Anyone who
> wonders about odd disassembly just hasn't realized they are
> disassembling data.
>
>> It would also complicate things if we ever wanted to emulate mtmsr.
>
> No, because it won't be executed.  If I understand correctly, the only
> reason to choose an illegal, trap or privileged insn is to halt
> execution earlier rather than later when a program goes off in the
> weeds.
>
>> If we want something that is a trap rather than a nop then use 0x0fe50553.
>> 
>> That's "compare the value in r5 with 0x553 and then trap unconditionally".
>> 
>> It shows up in objdump as:
>> 
>>     10000000:	53 05 e5 0f 	twui    r5,1363
>> 
>> 
>> The immediate can be anything, I chose that value to mimic the x86 value
>> Mathieu mentioned.
>> 
>> There's no reason that instruction would ever be generated because the
>> immediate value serves no purpose. So it satisfies the "very unlikely
>> to appear" criteria AFAICS.
>
> Yes, looks fine to me, except that in VLE mode (do we care?)
> ".long 0x0fe50553" disassembles as
>    0:	0f e5       	se_cmphl r5,r30
>    2:	05 53       	se_mullw r3,r5
> No illegal/trap/privileged insn there.
>
> ".long 0x0fe5000b" might be better to cover VLE.

Looks good for me too.

Actually, it better fits what Carlos O'Donnell had requested:

>>> I think the order of preference is:
>>> 
>>> 1.  An uncommon insn (with random immediate values), in a literal pool, that is
>>>      not a useful ROP/JOP sequence (very uncommon)
>>> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon)
>>> 2b. A NOP to avoid affecting speculative execution (maybe uncommon)
>>> 
>>> With 2a/2b being roughly equivalent depending on speculative execution policy.

-- 
Tulio Magno

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-04-09 13:58                         ` Tulio Magno Quites Machado Filho
@ 2019-04-09 14:13                             ` Carlos O'Donell
  0 siblings, 0 replies; 55+ messages in thread
From: Carlos O'Donell @ 2019-04-09 14:13 UTC (permalink / raw)
  To: Tulio Magno Quites Machado Filho, Alan Modra, Michael Ellerman
  Cc: Florian Weimer, Michael Meissner, Peter Bergner,
	Mathieu Desnoyers, Paul Burton, Will Deacon, Boqun Feng,
	Heiko Carstens, Vasily Gorbik, Martin Schwidefsky, Russell King,
	Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers,
	Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer,
	Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner,
	Rich Felker, linux-kernel, linux-api

On 4/9/19 9:58 AM, Tulio Magno Quites Machado Filho wrote:
> Alan Modra <amodra@gmail.com> writes:
>> Yes, looks fine to me, except that in VLE mode (do we care?)
>> ".long 0x0fe50553" disassembles as
>>     0:	0f e5       	se_cmphl r5,r30
>>     2:	05 53       	se_mullw r3,r5
>> No illegal/trap/privileged insn there.
>>
>> ".long 0x0fe5000b" might be better to cover VLE.
> 
> Looks good for me too.

The requirement that it be a valid instruction is simply to aid in the
disassembly of rseq regions which may be hand written assembly with a
thin veneer of CFI/DWARF information.

It has already been pointed out that POWER uses data in the instruction
stream for jump tables to implement switch statements, but that specific
use has compiler support and one presumes good debug information. So as
Alan says, there is already data in the insn stream, though such things
can't be good for performance (pollutes D-cache, problematic for
speculative execution).

> Actually, it better fits what Carlos O'Donnell had requested:
> 
>>>> I think the order of preference is:
>>>>
>>>> 1.  An uncommon insn (with random immediate values), in a literal pool, that is
>>>>       not a useful ROP/JOP sequence (very uncommon)
>>>> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon)
>>>> 2b. A NOP to avoid affecting speculative execution (maybe uncommon)
>>>>
>>>> With 2a/2b being roughly equivalent depending on speculative execution policy.

Yes, though "in a literal pool" is something that is not required, since
users might not want literal pools and so we shouldn't require that
feature (it also pollutes D-cache).

Keep in mind the insn will never execute.

If a trap insn calls out the nature of the signature more clearly then
use that instead.

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
@ 2019-04-09 14:13                             ` Carlos O'Donell
  0 siblings, 0 replies; 55+ messages in thread
From: Carlos O'Donell @ 2019-04-09 14:13 UTC (permalink / raw)
  To: Tulio Magno Quites Machado Filho, Alan Modra, Michael Ellerman
  Cc: Florian Weimer, Michael Meissner, Peter Bergner,
	Mathieu Desnoyers, Paul Burton, Will Deacon, Boqun Feng,
	Heiko Carstens, Vasily Gorbik, Martin Schwidefsky, Russell King,
	Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers,
	Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer

On 4/9/19 9:58 AM, Tulio Magno Quites Machado Filho wrote:
> Alan Modra <amodra@gmail.com> writes:
>> Yes, looks fine to me, except that in VLE mode (do we care?)
>> ".long 0x0fe50553" disassembles as
>>     0:	0f e5       	se_cmphl r5,r30
>>     2:	05 53       	se_mullw r3,r5
>> No illegal/trap/privileged insn there.
>>
>> ".long 0x0fe5000b" might be better to cover VLE.
> 
> Looks good for me too.

The requirement that it be a valid instruction is simply to aid in the
disassembly of rseq regions which may be hand written assembly with a
thin veneer of CFI/DWARF information.

It has already been pointed out that POWER uses data in the instruction
stream for jump tables to implement switch statements, but that specific
use has compiler support and one presumes good debug information. So as
Alan says, there is already data in the insn stream, though such things
can't be good for performance (pollutes D-cache, problematic for
speculative execution).

> Actually, it better fits what Carlos O'Donnell had requested:
> 
>>>> I think the order of preference is:
>>>>
>>>> 1.  An uncommon insn (with random immediate values), in a literal pool, that is
>>>>       not a useful ROP/JOP sequence (very uncommon)
>>>> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon)
>>>> 2b. A NOP to avoid affecting speculative execution (maybe uncommon)
>>>>
>>>> With 2a/2b being roughly equivalent depending on speculative execution policy.

Yes, though "in a literal pool" is something that is not required, since
users might not want literal pools and so we shouldn't require that
feature (it also pollutes D-cache).

Keep in mind the insn will never execute.

If a trap insn calls out the nature of the signature more clearly then
use that instead.

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-04-09 14:13                             ` Carlos O'Donell
@ 2019-04-09 15:45                               ` Mathieu Desnoyers
  -1 siblings, 0 replies; 55+ messages in thread
From: Mathieu Desnoyers @ 2019-04-09 15:45 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Tulio Magno Quites Machado Filho, Alan Modra, Michael Ellerman,
	Florian Weimer, Michael Meissner, Peter Bergner, Paul Burton,
	Will Deacon, Boqun Feng, heiko carstens, gor, schwidefsky,
	Russell King, ARM Linux, Benjamin Herrenschmidt, Paul Mackerras,
	carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
	Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson,
	Paul Turner, Rich Felker, linux-kernel, linux-api

----- On Apr 9, 2019, at 10:13 AM, Carlos O'Donell codonell@redhat.com wrote:

> On 4/9/19 9:58 AM, Tulio Magno Quites Machado Filho wrote:
>> Alan Modra <amodra@gmail.com> writes:
>>> Yes, looks fine to me, except that in VLE mode (do we care?)
>>> ".long 0x0fe50553" disassembles as
>>>     0:	0f e5       	se_cmphl r5,r30
>>>     2:	05 53       	se_mullw r3,r5
>>> No illegal/trap/privileged insn there.
>>>
>>> ".long 0x0fe5000b" might be better to cover VLE.
>> 
>> Looks good for me too.
> 
> The requirement that it be a valid instruction is simply to aid in the
> disassembly of rseq regions which may be hand written assembly with a
> thin veneer of CFI/DWARF information.
> 
> It has already been pointed out that POWER uses data in the instruction
> stream for jump tables to implement switch statements, but that specific
> use has compiler support and one presumes good debug information. So as
> Alan says, there is already data in the insn stream, though such things
> can't be good for performance (pollutes D-cache, problematic for
> speculative execution).
> 
>> Actually, it better fits what Carlos O'Donnell had requested:
>> 
>>>>> I think the order of preference is:
>>>>>
>>>>> 1.  An uncommon insn (with random immediate values), in a literal pool, that is
>>>>>       not a useful ROP/JOP sequence (very uncommon)
>>>>> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon)
>>>>> 2b. A NOP to avoid affecting speculative execution (maybe uncommon)
>>>>>
>>>>> With 2a/2b being roughly equivalent depending on speculative execution policy.
> 
> Yes, though "in a literal pool" is something that is not required, since
> users might not want literal pools and so we shouldn't require that
> feature (it also pollutes D-cache).
> 
> Keep in mind the insn will never execute.
> 
> If a trap insn calls out the nature of the signature more clearly then
> use that instead.

So based on the recent discussions, there are a few things we can conclude:

- Choosing a random value and relying on literal pools is a bad idea, because some
  compilation environments disable them entirely,
- We ideally want the signature to be a valid instruction in the instruction set
  so disassembler/emulator tools don't get confused and we don't hurt speculative
  execution.
- Best option is a trap with an unlikely immediate opcode, because it traps on the
  instruction in case the program try to execute it by mistake,
- Second best would be a no-op with an unlikely immediate opcode,
- We may want to stay away from privileged instructions because they can confuse
  emulators with may try to emulate them,
- Some architectures have big endian/little endian variants. We may need to carefully
  #ifdef each case so the numeric value matches actual instructions,
- Some architectures have extensions to their instruction set (e.g. ARM thumb, power
  VLE) which can be combined with the basic instruction set within the same program.
  We need to decide whether we care what those signatures look like in those
  instruction set extensions or not. Is it a best effort to match real instructions
  or a hard requirement ? If it's a hard requirement, we may need to extend the rseq
  system call with new flags to accept more than one signature.

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
@ 2019-04-09 15:45                               ` Mathieu Desnoyers
  0 siblings, 0 replies; 55+ messages in thread
From: Mathieu Desnoyers @ 2019-04-09 15:45 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Tulio Magno Quites Machado Filho, Alan Modra, Michael Ellerman,
	Florian Weimer, Michael Meissner, Peter Bergner, Paul Burton,
	Will Deacon, Boqun Feng, heiko carstens, gor, schwidefsky,
	Russell King, ARM Linux, Benjamin Herrenschmidt, Paul Mackerras,
	carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner

----- On Apr 9, 2019, at 10:13 AM, Carlos O'Donell codonell@redhat.com wrote:

> On 4/9/19 9:58 AM, Tulio Magno Quites Machado Filho wrote:
>> Alan Modra <amodra@gmail.com> writes:
>>> Yes, looks fine to me, except that in VLE mode (do we care?)
>>> ".long 0x0fe50553" disassembles as
>>>     0:	0f e5       	se_cmphl r5,r30
>>>     2:	05 53       	se_mullw r3,r5
>>> No illegal/trap/privileged insn there.
>>>
>>> ".long 0x0fe5000b" might be better to cover VLE.
>> 
>> Looks good for me too.
> 
> The requirement that it be a valid instruction is simply to aid in the
> disassembly of rseq regions which may be hand written assembly with a
> thin veneer of CFI/DWARF information.
> 
> It has already been pointed out that POWER uses data in the instruction
> stream for jump tables to implement switch statements, but that specific
> use has compiler support and one presumes good debug information. So as
> Alan says, there is already data in the insn stream, though such things
> can't be good for performance (pollutes D-cache, problematic for
> speculative execution).
> 
>> Actually, it better fits what Carlos O'Donnell had requested:
>> 
>>>>> I think the order of preference is:
>>>>>
>>>>> 1.  An uncommon insn (with random immediate values), in a literal pool, that is
>>>>>       not a useful ROP/JOP sequence (very uncommon)
>>>>> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon)
>>>>> 2b. A NOP to avoid affecting speculative execution (maybe uncommon)
>>>>>
>>>>> With 2a/2b being roughly equivalent depending on speculative execution policy.
> 
> Yes, though "in a literal pool" is something that is not required, since
> users might not want literal pools and so we shouldn't require that
> feature (it also pollutes D-cache).
> 
> Keep in mind the insn will never execute.
> 
> If a trap insn calls out the nature of the signature more clearly then
> use that instead.

So based on the recent discussions, there are a few things we can conclude:

- Choosing a random value and relying on literal pools is a bad idea, because some
  compilation environments disable them entirely,
- We ideally want the signature to be a valid instruction in the instruction set
  so disassembler/emulator tools don't get confused and we don't hurt speculative
  execution.
- Best option is a trap with an unlikely immediate opcode, because it traps on the
  instruction in case the program try to execute it by mistake,
- Second best would be a no-op with an unlikely immediate opcode,
- We may want to stay away from privileged instructions because they can confuse
  emulators with may try to emulate them,
- Some architectures have big endian/little endian variants. We may need to carefully
  #ifdef each case so the numeric value matches actual instructions,
- Some architectures have extensions to their instruction set (e.g. ARM thumb, power
  VLE) which can be combined with the basic instruction set within the same program.
  We need to decide whether we care what those signatures look like in those
  instruction set extensions or not. Is it a best effort to match real instructions
  or a hard requirement ? If it's a hard requirement, we may need to extend the rseq
  system call with new flags to accept more than one signature.

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-04-08 21:45                     ` Carlos O'Donell
@ 2019-04-09 16:33                       ` Mathieu Desnoyers
  -1 siblings, 0 replies; 55+ messages in thread
From: Mathieu Desnoyers @ 2019-04-09 16:33 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Tulio Magno Quites Machado Filho, Florian Weimer,
	Michael Meissner, Alan Modra, Peter Bergner, Michael Ellerman,
	Paul Burton, Will Deacon, Boqun Feng, heiko carstens, gor,
	schwidefsky, Russell King, ARM Linux, Benjamin Herrenschmidt,
	Paul Mackerras, carlos, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api

----- On Apr 8, 2019, at 5:45 PM, Carlos O'Donell codonell@redhat.com wrote:

> On 4/8/19 3:20 PM, Tulio Magno Quites Machado Filho wrote:
>> Carlos O'Donell <codonell@redhat.com> writes:
>> 
>>> On 4/5/19 5:16 AM, Florian Weimer wrote:
>>>> * Carlos O'Donell:
>>>>> It is valuable that it be a trap, particularly for constant pools because
>>>>> it means that a jump into the constant pool will trap.
>>>>
>>>> Sorry, I don't understand why this matters in this context.  Would you
>>>> please elaborate?
>>>
>>> Sorry, I wasn't very clear.
>>>
>>> My point is only that any accidental jumps, either with off-by-one (like you
>>> fixed in gcc/glibc's signal unwinding most recently), result in a process fault
>>> rather than executing RSEQ_SIG as a valid instruction *and then* continuing
>>> onwards to the handler.
>>>
>>> A process fault is achieved either by a trap, or an invalid instruction, or
>>> a privileged insn (like suggested for MIPS in this thread).
>> 
>> In that case, mtmsr (Move to Machine State Register) seems a good candidate.
>> 
>> mtmsr is available both on 32 and 64 bits since their first implementations.
>> 
>> It's a privileged instruction and should never appear in userspace
>> code (causes SIGILL).
>> 
>> Any comments?
>  
> That seems good to me.
> 
> Mathieu,
> 
> What's required to move this forward for POWER?

First, we need to decide whether we need rseq to support more than
one signature per process to cover the VLE extension. If so, we'd
need to extend rseq with a new flag for future kernels.

Then, ideally, we'd need a patch on top of the Linux kernel
tools/testing/selftests/rseq/rseq-ppc.h file that updates
the signature value. I think the current discussion leads us
towards a trap:

/*
 * TODO: document instruction objdump output on each architecture and VLE
 * instruction sets.
 */
#define RSEQ_SIG 0x0fe5000b

Should we do anything specific for big/little endian ? Is the byte order
of the instruction encoding the same as data ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
@ 2019-04-09 16:33                       ` Mathieu Desnoyers
  0 siblings, 0 replies; 55+ messages in thread
From: Mathieu Desnoyers @ 2019-04-09 16:33 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Tulio Magno Quites Machado Filho, Florian Weimer,
	Michael Meissner, Alan Modra, Peter Bergner, Michael Ellerman,
	Paul Burton, Will Deacon, Boqun Feng, heiko carstens, gor,
	schwidefsky, Russell King, ARM Linux, Benjamin Herrenschmidt,
	Paul Mackerras, carlos, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner

----- On Apr 8, 2019, at 5:45 PM, Carlos O'Donell codonell@redhat.com wrote:

> On 4/8/19 3:20 PM, Tulio Magno Quites Machado Filho wrote:
>> Carlos O'Donell <codonell@redhat.com> writes:
>> 
>>> On 4/5/19 5:16 AM, Florian Weimer wrote:
>>>> * Carlos O'Donell:
>>>>> It is valuable that it be a trap, particularly for constant pools because
>>>>> it means that a jump into the constant pool will trap.
>>>>
>>>> Sorry, I don't understand why this matters in this context.  Would you
>>>> please elaborate?
>>>
>>> Sorry, I wasn't very clear.
>>>
>>> My point is only that any accidental jumps, either with off-by-one (like you
>>> fixed in gcc/glibc's signal unwinding most recently), result in a process fault
>>> rather than executing RSEQ_SIG as a valid instruction *and then* continuing
>>> onwards to the handler.
>>>
>>> A process fault is achieved either by a trap, or an invalid instruction, or
>>> a privileged insn (like suggested for MIPS in this thread).
>> 
>> In that case, mtmsr (Move to Machine State Register) seems a good candidate.
>> 
>> mtmsr is available both on 32 and 64 bits since their first implementations.
>> 
>> It's a privileged instruction and should never appear in userspace
>> code (causes SIGILL).
>> 
>> Any comments?
>  
> That seems good to me.
> 
> Mathieu,
> 
> What's required to move this forward for POWER?

First, we need to decide whether we need rseq to support more than
one signature per process to cover the VLE extension. If so, we'd
need to extend rseq with a new flag for future kernels.

Then, ideally, we'd need a patch on top of the Linux kernel
tools/testing/selftests/rseq/rseq-ppc.h file that updates
the signature value. I think the current discussion leads us
towards a trap:

/*
 * TODO: document instruction objdump output on each architecture and VLE
 * instruction sets.
 */
#define RSEQ_SIG 0x0fe5000b

Should we do anything specific for big/little endian ? Is the byte order
of the instruction encoding the same as data ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-04-04 21:41           ` Paul Burton
@ 2019-04-09 16:40             ` Mathieu Desnoyers
  -1 siblings, 0 replies; 55+ messages in thread
From: Mathieu Desnoyers @ 2019-04-09 16:40 UTC (permalink / raw)
  To: Paul Burton
  Cc: Carlos O'Donell, Will Deacon, Boqun Feng, heiko carstens,
	gor, schwidefsky, Russell King, ARM Linux,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api

----- On Apr 4, 2019, at 5:41 PM, Paul Burton paul.burton@mips.com wrote:

> Hi Carlos / all,
> 
> On Thu, Apr 04, 2019 at 04:50:08PM -0400, Carlos O'Donell wrote:
>> > > > +/* Signature required before each abort handler code.  */
>> > > > +#define RSEQ_SIG 0x53053053
>> > > 
>> > > Why isn't this a mips-specific op code?
>> > 
>> > MIPS also has a literal pool just before the abort handler, and it
>> > jumps over it. My understanding is that we can use any signature value
>> > we want, and it does not need to be a valid instruction, similarly to ARM:
>> > 
>> > #define __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, \
>> >                                  abort_label, version, flags, \
>> >                                  start_ip, post_commit_offset, abort_ip) \
>> >                  ".balign 32\n\t" \
>> >                  __rseq_str(table_label) ":\n\t" \
>> >                  ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
>> >                  LONG " " U32_U64_PAD(__rseq_str(start_ip)) "\n\t" \
>> >                  LONG " " U32_U64_PAD(__rseq_str(post_commit_offset)) "\n\t" \
>> >                  LONG " " U32_U64_PAD(__rseq_str(abort_ip)) "\n\t" \
>> >                  ".word " __rseq_str(RSEQ_SIG) "\n\t" \
>> >                  __rseq_str(label) ":\n\t" \
>> >                  teardown \
>> >                  "b %l[" __rseq_str(abort_label) "]\n\t"
>> > 
>> > Perhaps Paul Burton can confirm this ?
>> 
>> Yes please.
>> 
>> You also want to avoid the value being a valid MIPS insn that's common.
>> 
>> Did you check that?
> 
> This does not decode as a standard MIPS instruction, though it does
> decode for both the microMIPS (ori) & nanoMIPS (lwxs; sll) ISAs.
> 
> I imagine I copied the value from another architecture when porting, and
> since it doesn't get executed it seemed fine.
> 
> One maybe nicer option along the same lines would be 0x72736571 or
> 0x71657372 (ASCII 'rseq') neither of which decode as a MIPS instruction.
> 
>> I think the order of preference is:
>> 
>> 1.  An uncommon insn (with random immediate values), in a literal pool, that is
>>     not a useful ROP/JOP sequence (very uncommon)
> 
> For that option on MIPS we could do something like:
> 
>  sll $0, $0, 31     # effectively a nop, but looks weird
> 
>> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon)
> 
> Our break instruction has a 19b immediate in nanoMIPS (20b for microMIPS
> & classic MIPS) so that could be something like:
> 
>  break 0x7273       # ASCII 'rs'
> 
> That's pretty unlikely to be seen in normal code, or the teq instruction
> has a rarely used code field (4b in microMIPS, 5b in nanoMIPS, 10b in
> classic MIPS) that's meaningless to hardware so something like this
> would be possible:
> 
>  teq $0, $0, 0x8    # ASCII backspace
> 
>> 2b. A NOP to avoid affecting speculative execution (maybe uncommon)
>> 
>> With 2a/2b being roughly equivalent depending on speculative execution policy.
> 
> There are a bunch of potential odd looking nops possible, one of which
> would be the sll I mentioned above.
> 
> Another option would be to use a priveleged instruction which userland
> code can't execute & should normally never contain. That would decode as
> a valid instruction & effectively behave like a trap instruction but
> look very odd to anyone reading disassembled code. eg:
> 
>  mfc0 $0, 13        # Try to read the cause register; take SIGILL
> 
> In order to handle MIPS vs microMIPS vs nanoMIPS differences I'm
> thinking it may be best to switch to one of these real instructions that
> looks strange. The ugly part would be the nest of #ifdef's to deal with
> endianness & ISA when defining it as a number...

Note that we can have different signatures for each sub-architecture, as
long as they don't have to co-exist within the same process.

Ideally we'd need a patch on top of the Linux kernel
tools/testing/selftests/rseq/rseq-mips.h file that updates
the signature value. I think the current discussion leads us
towards a trap with unlikely immediate operand. Note that we
can special-case with #ifdef for each sub-architecture and endianness
if need be.

/*
 * TODO: document trap instruction objdump output on each sub-architecture
 * instruction sets.
 */
#define RSEQ_SIG 0x########

Should we do anything specific for big/little endian ? Is the byte order
of the instruction encoding the same as data ?

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
@ 2019-04-09 16:40             ` Mathieu Desnoyers
  0 siblings, 0 replies; 55+ messages in thread
From: Mathieu Desnoyers @ 2019-04-09 16:40 UTC (permalink / raw)
  To: Paul Burton
  Cc: Carlos O'Donell, Will Deacon, Boqun Feng, heiko carstens,
	gor, schwidefsky, Russell King, ARM Linux,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Dave Watson

----- On Apr 4, 2019, at 5:41 PM, Paul Burton paul.burton@mips.com wrote:

> Hi Carlos / all,
> 
> On Thu, Apr 04, 2019 at 04:50:08PM -0400, Carlos O'Donell wrote:
>> > > > +/* Signature required before each abort handler code.  */
>> > > > +#define RSEQ_SIG 0x53053053
>> > > 
>> > > Why isn't this a mips-specific op code?
>> > 
>> > MIPS also has a literal pool just before the abort handler, and it
>> > jumps over it. My understanding is that we can use any signature value
>> > we want, and it does not need to be a valid instruction, similarly to ARM:
>> > 
>> > #define __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, \
>> >                                  abort_label, version, flags, \
>> >                                  start_ip, post_commit_offset, abort_ip) \
>> >                  ".balign 32\n\t" \
>> >                  __rseq_str(table_label) ":\n\t" \
>> >                  ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
>> >                  LONG " " U32_U64_PAD(__rseq_str(start_ip)) "\n\t" \
>> >                  LONG " " U32_U64_PAD(__rseq_str(post_commit_offset)) "\n\t" \
>> >                  LONG " " U32_U64_PAD(__rseq_str(abort_ip)) "\n\t" \
>> >                  ".word " __rseq_str(RSEQ_SIG) "\n\t" \
>> >                  __rseq_str(label) ":\n\t" \
>> >                  teardown \
>> >                  "b %l[" __rseq_str(abort_label) "]\n\t"
>> > 
>> > Perhaps Paul Burton can confirm this ?
>> 
>> Yes please.
>> 
>> You also want to avoid the value being a valid MIPS insn that's common.
>> 
>> Did you check that?
> 
> This does not decode as a standard MIPS instruction, though it does
> decode for both the microMIPS (ori) & nanoMIPS (lwxs; sll) ISAs.
> 
> I imagine I copied the value from another architecture when porting, and
> since it doesn't get executed it seemed fine.
> 
> One maybe nicer option along the same lines would be 0x72736571 or
> 0x71657372 (ASCII 'rseq') neither of which decode as a MIPS instruction.
> 
>> I think the order of preference is:
>> 
>> 1.  An uncommon insn (with random immediate values), in a literal pool, that is
>>     not a useful ROP/JOP sequence (very uncommon)
> 
> For that option on MIPS we could do something like:
> 
>  sll $0, $0, 31     # effectively a nop, but looks weird
> 
>> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon)
> 
> Our break instruction has a 19b immediate in nanoMIPS (20b for microMIPS
> & classic MIPS) so that could be something like:
> 
>  break 0x7273       # ASCII 'rs'
> 
> That's pretty unlikely to be seen in normal code, or the teq instruction
> has a rarely used code field (4b in microMIPS, 5b in nanoMIPS, 10b in
> classic MIPS) that's meaningless to hardware so something like this
> would be possible:
> 
>  teq $0, $0, 0x8    # ASCII backspace
> 
>> 2b. A NOP to avoid affecting speculative execution (maybe uncommon)
>> 
>> With 2a/2b being roughly equivalent depending on speculative execution policy.
> 
> There are a bunch of potential odd looking nops possible, one of which
> would be the sll I mentioned above.
> 
> Another option would be to use a priveleged instruction which userland
> code can't execute & should normally never contain. That would decode as
> a valid instruction & effectively behave like a trap instruction but
> look very odd to anyone reading disassembled code. eg:
> 
>  mfc0 $0, 13        # Try to read the cause register; take SIGILL
> 
> In order to handle MIPS vs microMIPS vs nanoMIPS differences I'm
> thinking it may be best to switch to one of these real instructions that
> looks strange. The ugly part would be the nest of #ifdef's to deal with
> endianness & ISA when defining it as a number...

Note that we can have different signatures for each sub-architecture, as
long as they don't have to co-exist within the same process.

Ideally we'd need a patch on top of the Linux kernel
tools/testing/selftests/rseq/rseq-mips.h file that updates
the signature value. I think the current discussion leads us
towards a trap with unlikely immediate operand. Note that we
can special-case with #ifdef for each sub-architecture and endianness
if need be.

/*
 * TODO: document trap instruction objdump output on each sub-architecture
 * instruction sets.
 */
#define RSEQ_SIG 0x########

Should we do anything specific for big/little endian ? Is the byte order
of the instruction encoding the same as data ?

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-04-09  9:29                         ` Alan Modra
@ 2019-04-18 15:31                           ` Mathieu Desnoyers
  -1 siblings, 0 replies; 55+ messages in thread
From: Mathieu Desnoyers @ 2019-04-18 15:31 UTC (permalink / raw)
  To: Alan Modra
  Cc: Michael Ellerman, Carlos O'Donell,
	Tulio Magno Quites Machado Filho, Florian Weimer,
	Michael Meissner, Peter Bergner, Paul Burton, Will Deacon,
	Boqun Feng, heiko carstens, gor, schwidefsky, Russell King,
	ARM Linux, Benjamin Herrenschmidt, Paul Mackerras, carlos,
	Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
	Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson,
	Paul Turner, Rich Felker, linux-kernel, linux-api

----- On Apr 9, 2019, at 5:29 AM, Alan Modra amodra@gmail.com wrote:

> On Tue, Apr 09, 2019 at 02:23:53PM +1000, Michael Ellerman wrote:
>> I'd much rather we use a trap with a specific immediate value. Otherwise
>> someone's going to waste time one day puzzling over why userspace is
>> doing mtmsr.
> 
> It's data.  We have other data in executable sections.  Anyone who
> wonders about odd disassembly just hasn't realized they are
> disassembling data.
> 
>> It would also complicate things if we ever wanted to emulate mtmsr.
> 
> No, because it won't be executed.  If I understand correctly, the only
> reason to choose an illegal, trap or privileged insn is to halt
> execution earlier rather than later when a program goes off in the
> weeds.
> 
>> If we want something that is a trap rather than a nop then use 0x0fe50553.
>> 
>> That's "compare the value in r5 with 0x553 and then trap unconditionally".
>> 
>> It shows up in objdump as:
>> 
>>     10000000:	53 05 e5 0f 	twui    r5,1363
>> 
>> 
>> The immediate can be anything, I chose that value to mimic the x86 value
>> Mathieu mentioned.
>> 
>> There's no reason that instruction would ever be generated because the
>> immediate value serves no purpose. So it satisfies the "very unlikely
>> to appear" criteria AFAICS.
> 
> Yes, looks fine to me, except that in VLE mode (do we care?)
> ".long 0x0fe50553" disassembles as
>   0:	0f e5       	se_cmphl r5,r30
>   2:	05 53       	se_mullw r3,r5
> No illegal/trap/privileged insn there.
> 
> ".long 0x0fe5000b" might be better to cover VLE.

Can you share with us the objdump output of ".long 0x0fe5000b" in
VLE mode ? VLE mode support does not appear to be available in typical
toolchains. Also, is VLE mode only for powerpc 32 be, or also for
powerpc 64 be/le ?

Thanks,

Mathieu



-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
@ 2019-04-18 15:31                           ` Mathieu Desnoyers
  0 siblings, 0 replies; 55+ messages in thread
From: Mathieu Desnoyers @ 2019-04-18 15:31 UTC (permalink / raw)
  To: Alan Modra
  Cc: Michael Ellerman, Carlos O'Donell,
	Tulio Magno Quites Machado Filho, Florian Weimer,
	Michael Meissner, Peter Bergner, Paul Burton, Will Deacon,
	Boqun Feng, heiko carstens, gor, schwidefsky, Russell King,
	ARM Linux, Benjamin Herrenschmidt, Paul Mackerras, carlos,
	Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas

----- On Apr 9, 2019, at 5:29 AM, Alan Modra amodra@gmail.com wrote:

> On Tue, Apr 09, 2019 at 02:23:53PM +1000, Michael Ellerman wrote:
>> I'd much rather we use a trap with a specific immediate value. Otherwise
>> someone's going to waste time one day puzzling over why userspace is
>> doing mtmsr.
> 
> It's data.  We have other data in executable sections.  Anyone who
> wonders about odd disassembly just hasn't realized they are
> disassembling data.
> 
>> It would also complicate things if we ever wanted to emulate mtmsr.
> 
> No, because it won't be executed.  If I understand correctly, the only
> reason to choose an illegal, trap or privileged insn is to halt
> execution earlier rather than later when a program goes off in the
> weeds.
> 
>> If we want something that is a trap rather than a nop then use 0x0fe50553.
>> 
>> That's "compare the value in r5 with 0x553 and then trap unconditionally".
>> 
>> It shows up in objdump as:
>> 
>>     10000000:	53 05 e5 0f 	twui    r5,1363
>> 
>> 
>> The immediate can be anything, I chose that value to mimic the x86 value
>> Mathieu mentioned.
>> 
>> There's no reason that instruction would ever be generated because the
>> immediate value serves no purpose. So it satisfies the "very unlikely
>> to appear" criteria AFAICS.
> 
> Yes, looks fine to me, except that in VLE mode (do we care?)
> ".long 0x0fe50553" disassembles as
>   0:	0f e5       	se_cmphl r5,r30
>   2:	05 53       	se_mullw r3,r5
> No illegal/trap/privileged insn there.
> 
> ".long 0x0fe5000b" might be better to cover VLE.

Can you share with us the objdump output of ".long 0x0fe5000b" in
VLE mode ? VLE mode support does not appear to be available in typical
toolchains. Also, is VLE mode only for powerpc 32 be, or also for
powerpc 64 be/le ?

Thanks,

Mathieu



-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-04-04 21:41           ` Paul Burton
@ 2019-04-18 18:58             ` Mathieu Desnoyers
  -1 siblings, 0 replies; 55+ messages in thread
From: Mathieu Desnoyers @ 2019-04-18 18:58 UTC (permalink / raw)
  To: Paul Burton
  Cc: Carlos O'Donell, Will Deacon, Boqun Feng, heiko carstens,
	gor, schwidefsky, Russell King, ARM Linux,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api

----- On Apr 4, 2019, at 5:41 PM, Paul Burton paul.burton@mips.com wrote:
[...]

>> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon)
> 
> Our break instruction has a 19b immediate in nanoMIPS (20b for microMIPS
> & classic MIPS) so that could be something like:
> 
>  break 0x7273       # ASCII 'rs'
> 

Hi Paul,

I like this uncommon break instruction as signature choice.

However, if I try to compile assembler with a break 0x7273 instruction
with mips64 and mips32 toolchains (gcc version 8.2.0 (Ubuntu 8.2.0-1ubuntu2~18.04))
I get:

/tmp/ccVh9F7T.s: Assembler messages:
/tmp/ccVh9F7T.s:24: Error: operand 1 out of range `break 0x7273'

It works up to the value 0x3FF, which seems to use the top 10
code bits:

   a:	03ff 0007 	break	0x3ff

Would a "break 0x350" be a good choice as well ? 

Any idea why 0x7273 is not accepted by my assembler ?

I also tried crafting the assembler with values between 0x3FF and 0x7273
in the 20 code bits. It seems fine from an objdump perspective:

".long 0x03FFFC7\n\t"

generates:

  10:	003f ffc7 	break	0x3f,0x3ff

What I don't understand is why the instruction generated by my
toolchain ends with the last 6 bits "000111", whereas the mips32
instruction set specifies break as ending with "001101" [1].
What am I missing ?

Also, the nanomips break code [2] has a completely different
instruction layout. Should we use a different signature when
compiling for nanomips ? What #ifdef should we use ? Do I
need a special toolchain to generate nanomips binaries ?

Thanks,

Mathieu

[1] http://hades.mech.northwestern.edu/images/1/16/MIPS32_Architecture_Volume_II-A_Instruction_Set.pdf
[2] https://s3-eu-west-1.amazonaws.com/downloads-mips/I7200/I7200+product+launch/MIPS_nanomips32_ISA_TRM_01_01_MD01247.pdf

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
@ 2019-04-18 18:58             ` Mathieu Desnoyers
  0 siblings, 0 replies; 55+ messages in thread
From: Mathieu Desnoyers @ 2019-04-18 18:58 UTC (permalink / raw)
  To: Paul Burton
  Cc: Carlos O'Donell, Will Deacon, Boqun Feng, heiko carstens,
	gor, schwidefsky, Russell King, ARM Linux,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Dave Watson

----- On Apr 4, 2019, at 5:41 PM, Paul Burton paul.burton@mips.com wrote:
[...]

>> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon)
> 
> Our break instruction has a 19b immediate in nanoMIPS (20b for microMIPS
> & classic MIPS) so that could be something like:
> 
>  break 0x7273       # ASCII 'rs'
> 

Hi Paul,

I like this uncommon break instruction as signature choice.

However, if I try to compile assembler with a break 0x7273 instruction
with mips64 and mips32 toolchains (gcc version 8.2.0 (Ubuntu 8.2.0-1ubuntu2~18.04))
I get:

/tmp/ccVh9F7T.s: Assembler messages:
/tmp/ccVh9F7T.s:24: Error: operand 1 out of range `break 0x7273'

It works up to the value 0x3FF, which seems to use the top 10
code bits:

   a:	03ff 0007 	break	0x3ff

Would a "break 0x350" be a good choice as well ? 

Any idea why 0x7273 is not accepted by my assembler ?

I also tried crafting the assembler with values between 0x3FF and 0x7273
in the 20 code bits. It seems fine from an objdump perspective:

".long 0x03FFFC7\n\t"

generates:

  10:	003f ffc7 	break	0x3f,0x3ff

What I don't understand is why the instruction generated by my
toolchain ends with the last 6 bits "000111", whereas the mips32
instruction set specifies break as ending with "001101" [1].
What am I missing ?

Also, the nanomips break code [2] has a completely different
instruction layout. Should we use a different signature when
compiling for nanomips ? What #ifdef should we use ? Do I
need a special toolchain to generate nanomips binaries ?

Thanks,

Mathieu

[1] http://hades.mech.northwestern.edu/images/1/16/MIPS32_Architecture_Volume_II-A_Instruction_Set.pdf
[2] https://s3-eu-west-1.amazonaws.com/downloads-mips/I7200/I7200+product+launch/MIPS_nanomips32_ISA_TRM_01_01_MD01247.pdf

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-04-18 18:58             ` Mathieu Desnoyers
@ 2019-04-24 15:05               ` Mathieu Desnoyers
  -1 siblings, 0 replies; 55+ messages in thread
From: Mathieu Desnoyers @ 2019-04-24 15:05 UTC (permalink / raw)
  To: Paul Burton
  Cc: Carlos O'Donell, Will Deacon, Boqun Feng, heiko carstens,
	gor, schwidefsky, Russell King, ARM Linux,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api

----- On Apr 18, 2019, at 2:58 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> ----- On Apr 4, 2019, at 5:41 PM, Paul Burton paul.burton@mips.com wrote:
> [...]
> 
>>> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon)
>> 
>> Our break instruction has a 19b immediate in nanoMIPS (20b for microMIPS
>> & classic MIPS) so that could be something like:
>> 
>>  break 0x7273       # ASCII 'rs'
>> 
> 
> Hi Paul,
> 
> I like this uncommon break instruction as signature choice.
> 
> However, if I try to compile assembler with a break 0x7273 instruction
> with mips64 and mips32 toolchains (gcc version 8.2.0 (Ubuntu
> 8.2.0-1ubuntu2~18.04))
> I get:
> 
> /tmp/ccVh9F7T.s: Assembler messages:
> /tmp/ccVh9F7T.s:24: Error: operand 1 out of range `break 0x7273'
> 
> It works up to the value 0x3FF, which seems to use the top 10
> code bits:
> 
>   a:	03ff 0007 	break	0x3ff
> 
> Would a "break 0x350" be a good choice as well ?
> 
> Any idea why 0x7273 is not accepted by my assembler ?
> 
> I also tried crafting the assembler with values between 0x3FF and 0x7273
> in the 20 code bits. It seems fine from an objdump perspective:
> 
> ".long 0x03FFFC7\n\t"
> 
> generates:
> 
>  10:	003f ffc7 	break	0x3f,0x3ff
> 
> What I don't understand is why the instruction generated by my
> toolchain ends with the last 6 bits "000111", whereas the mips32
> instruction set specifies break as ending with "001101" [1].
> What am I missing ?
> 
> Also, the nanomips break code [2] has a completely different
> instruction layout. Should we use a different signature when
> compiling for nanomips ? What #ifdef should we use ? Do I
> need a special toolchain to generate nanomips binaries ?

Hi Paul, I'm still waiting for feedback on the MIPS front.

Meanwhile, I plan to use #define RSEQ_SIG 0x0350000d which maps to:

   0350000d 	break	0x350

and use RSEQ_SIG in assembly with:

   ".word " __rseq_str(RSEQ_SIG) "\n\t"

on big and little endian MIPS, for MIPS32 and MIPS64, based on
code generated with gcc version 8.2.0 (Ubuntu 8.2.0-1ubuntu2~18.04).

Let me know if it needs to be tweaked.

Thanks,

Mathieu


> 
> Thanks,
> 
> Mathieu
> 
> [1]
> http://hades.mech.northwestern.edu/images/1/16/MIPS32_Architecture_Volume_II-A_Instruction_Set.pdf
> [2]
> https://s3-eu-west-1.amazonaws.com/downloads-mips/I7200/I7200+product+launch/MIPS_nanomips32_ISA_TRM_01_01_MD01247.pdf
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
@ 2019-04-24 15:05               ` Mathieu Desnoyers
  0 siblings, 0 replies; 55+ messages in thread
From: Mathieu Desnoyers @ 2019-04-24 15:05 UTC (permalink / raw)
  To: Paul Burton
  Cc: Carlos O'Donell, Will Deacon, Boqun Feng, heiko carstens,
	gor, schwidefsky, Russell King, ARM Linux,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Dave Watson

----- On Apr 18, 2019, at 2:58 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> ----- On Apr 4, 2019, at 5:41 PM, Paul Burton paul.burton@mips.com wrote:
> [...]
> 
>>> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon)
>> 
>> Our break instruction has a 19b immediate in nanoMIPS (20b for microMIPS
>> & classic MIPS) so that could be something like:
>> 
>>  break 0x7273       # ASCII 'rs'
>> 
> 
> Hi Paul,
> 
> I like this uncommon break instruction as signature choice.
> 
> However, if I try to compile assembler with a break 0x7273 instruction
> with mips64 and mips32 toolchains (gcc version 8.2.0 (Ubuntu
> 8.2.0-1ubuntu2~18.04))
> I get:
> 
> /tmp/ccVh9F7T.s: Assembler messages:
> /tmp/ccVh9F7T.s:24: Error: operand 1 out of range `break 0x7273'
> 
> It works up to the value 0x3FF, which seems to use the top 10
> code bits:
> 
>   a:	03ff 0007 	break	0x3ff
> 
> Would a "break 0x350" be a good choice as well ?
> 
> Any idea why 0x7273 is not accepted by my assembler ?
> 
> I also tried crafting the assembler with values between 0x3FF and 0x7273
> in the 20 code bits. It seems fine from an objdump perspective:
> 
> ".long 0x03FFFC7\n\t"
> 
> generates:
> 
>  10:	003f ffc7 	break	0x3f,0x3ff
> 
> What I don't understand is why the instruction generated by my
> toolchain ends with the last 6 bits "000111", whereas the mips32
> instruction set specifies break as ending with "001101" [1].
> What am I missing ?
> 
> Also, the nanomips break code [2] has a completely different
> instruction layout. Should we use a different signature when
> compiling for nanomips ? What #ifdef should we use ? Do I
> need a special toolchain to generate nanomips binaries ?

Hi Paul, I'm still waiting for feedback on the MIPS front.

Meanwhile, I plan to use #define RSEQ_SIG 0x0350000d which maps to:

   0350000d 	break	0x350

and use RSEQ_SIG in assembly with:

   ".word " __rseq_str(RSEQ_SIG) "\n\t"

on big and little endian MIPS, for MIPS32 and MIPS64, based on
code generated with gcc version 8.2.0 (Ubuntu 8.2.0-1ubuntu2~18.04).

Let me know if it needs to be tweaked.

Thanks,

Mathieu


> 
> Thanks,
> 
> Mathieu
> 
> [1]
> http://hades.mech.northwestern.edu/images/1/16/MIPS32_Architecture_Volume_II-A_Instruction_Set.pdf
> [2]
> https://s3-eu-west-1.amazonaws.com/downloads-mips/I7200/I7200+product+launch/MIPS_nanomips32_ISA_TRM_01_01_MD01247.pdf
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-04-24 15:05               ` Mathieu Desnoyers
@ 2019-04-24 23:13                 ` Paul Burton
  -1 siblings, 0 replies; 55+ messages in thread
From: Paul Burton @ 2019-04-24 23:13 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Carlos O'Donell, Will Deacon, Boqun Feng, heiko carstens,
	gor, schwidefsky, Russell King, ARM Linux,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api

Hi Mathieu,

Just following up on a couple of things here.

On Wed, Apr 24, 2019 at 11:05:42AM -0400, Mathieu Desnoyers wrote:
> >>> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon)
> >> 
> >> Our break instruction has a 19b immediate in nanoMIPS (20b for microMIPS
> >> & classic MIPS) so that could be something like:
> >> 
> >>  break 0x7273       # ASCII 'rs'
> >> 
> > 
> > I like this uncommon break instruction as signature choice.
> > 
> > However, if I try to compile assembler with a break 0x7273 instruction
> > with mips64 and mips32 toolchains (gcc version 8.2.0 (Ubuntu
> > 8.2.0-1ubuntu2~18.04))
> > I get:
> > 
> > /tmp/ccVh9F7T.s: Assembler messages:
> > /tmp/ccVh9F7T.s:24: Error: operand 1 out of range `break 0x7273'
> > 
> > It works up to the value 0x3FF, which seems to use the top 10
> > code bits:
> > 
> >   a:	03ff 0007 	break	0x3ff
> > 
> > Would a "break 0x350" be a good choice as well ?

The "break 0x350" instruction seems good to me - it's still going to be
rare.

> > Any idea why 0x7273 is not accepted by my assembler ?

I don't know why the assembler wants a smaller immediate than the
instruction encoding allows... There's a comment in the binutils file
include/opcode/mips.h that reads:

> A breakpoint instruction uses OP, CODE and SPEC (10 bits of the
> breakpoint instruction are not defined; Kane says the breakpoint code
> field in BREAK is 20 bits; yet MIPS assemblers and debuggers only use
> ten bits).  An optional two-operand form of break/sdbbp allows the
> lower ten bits to be set too, and MIPS32 and later architectures allow
> 20 bits to be set with a signal operand (using CODE20).

I suspect there's some history here that predates my involvement (or
possibly just predates me).

> > I also tried crafting the assembler with values between 0x3FF and 0x7273
> > in the 20 code bits. It seems fine from an objdump perspective:
> > 
> > ".long 0x03FFFC7\n\t"
> > 
> > generates:
> > 
> >  10:	003f ffc7 	break	0x3f,0x3ff
> > 
> > What I don't understand is why the instruction generated by my
> > toolchain ends with the last 6 bits "000111", whereas the mips32
> > instruction set specifies break as ending with "001101" [1].
> > What am I missing ?

Were you targeting microMIPS by any chance? There the break32
instructions ends with 000111.

> > Also, the nanomips break code [2] has a completely different
> > instruction layout. Should we use a different signature when
> > compiling for nanomips ? What #ifdef should we use ?

Yes, and __nanomips__. I included the encoding in my reply to your RFC
patch.

> > Do I need a special toolchain to generate nanomips binaries ?

Yes, you can find it here:

  https://codescape.mips.com/components/toolchain/nanomips/latest/index.html

We don't have the nanoMIPS kernel support in mainline yet, I've gotten
various things applied in preparation but also been swamped with other
things so it's taking a while. If you want to see a working downstream
kernel though you can find it here:

  git://git.linux-mips.org/pub/scm/linux-mti.git nanomips-v4.15

> Hi Paul, I'm still waiting for feedback on the MIPS front.
> 
> Meanwhile, I plan to use #define RSEQ_SIG 0x0350000d which maps to:
> 
>    0350000d 	break	0x350
> 
> and use RSEQ_SIG in assembly with:
> 
>    ".word " __rseq_str(RSEQ_SIG) "\n\t"
> 
> on big and little endian MIPS, for MIPS32 and MIPS64, based on
> code generated with gcc version 8.2.0 (Ubuntu 8.2.0-1ubuntu2~18.04).
> 
> Let me know if it needs to be tweaked.

That's fine for the classic MIPS ISA, but won't decode as a break for
microMIPS or nanoMIPS. See my reply to your RFC for valid encodings for
both of those.

Thanks,
    Paul

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
@ 2019-04-24 23:13                 ` Paul Burton
  0 siblings, 0 replies; 55+ messages in thread
From: Paul Burton @ 2019-04-24 23:13 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Carlos O'Donell, Will Deacon, Boqun Feng, heiko carstens,
	gor, schwidefsky, Russell King, ARM Linux,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Dave Watson

Hi Mathieu,

Just following up on a couple of things here.

On Wed, Apr 24, 2019 at 11:05:42AM -0400, Mathieu Desnoyers wrote:
> >>> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon)
> >> 
> >> Our break instruction has a 19b immediate in nanoMIPS (20b for microMIPS
> >> & classic MIPS) so that could be something like:
> >> 
> >>  break 0x7273       # ASCII 'rs'
> >> 
> > 
> > I like this uncommon break instruction as signature choice.
> > 
> > However, if I try to compile assembler with a break 0x7273 instruction
> > with mips64 and mips32 toolchains (gcc version 8.2.0 (Ubuntu
> > 8.2.0-1ubuntu2~18.04))
> > I get:
> > 
> > /tmp/ccVh9F7T.s: Assembler messages:
> > /tmp/ccVh9F7T.s:24: Error: operand 1 out of range `break 0x7273'
> > 
> > It works up to the value 0x3FF, which seems to use the top 10
> > code bits:
> > 
> >   a:	03ff 0007 	break	0x3ff
> > 
> > Would a "break 0x350" be a good choice as well ?

The "break 0x350" instruction seems good to me - it's still going to be
rare.

> > Any idea why 0x7273 is not accepted by my assembler ?

I don't know why the assembler wants a smaller immediate than the
instruction encoding allows... There's a comment in the binutils file
include/opcode/mips.h that reads:

> A breakpoint instruction uses OP, CODE and SPEC (10 bits of the
> breakpoint instruction are not defined; Kane says the breakpoint code
> field in BREAK is 20 bits; yet MIPS assemblers and debuggers only use
> ten bits).  An optional two-operand form of break/sdbbp allows the
> lower ten bits to be set too, and MIPS32 and later architectures allow
> 20 bits to be set with a signal operand (using CODE20).

I suspect there's some history here that predates my involvement (or
possibly just predates me).

> > I also tried crafting the assembler with values between 0x3FF and 0x7273
> > in the 20 code bits. It seems fine from an objdump perspective:
> > 
> > ".long 0x03FFFC7\n\t"
> > 
> > generates:
> > 
> >  10:	003f ffc7 	break	0x3f,0x3ff
> > 
> > What I don't understand is why the instruction generated by my
> > toolchain ends with the last 6 bits "000111", whereas the mips32
> > instruction set specifies break as ending with "001101" [1].
> > What am I missing ?

Were you targeting microMIPS by any chance? There the break32
instructions ends with 000111.

> > Also, the nanomips break code [2] has a completely different
> > instruction layout. Should we use a different signature when
> > compiling for nanomips ? What #ifdef should we use ?

Yes, and __nanomips__. I included the encoding in my reply to your RFC
patch.

> > Do I need a special toolchain to generate nanomips binaries ?

Yes, you can find it here:

  https://codescape.mips.com/components/toolchain/nanomips/latest/index.html

We don't have the nanoMIPS kernel support in mainline yet, I've gotten
various things applied in preparation but also been swamped with other
things so it's taking a while. If you want to see a working downstream
kernel though you can find it here:

  git://git.linux-mips.org/pub/scm/linux-mti.git nanomips-v4.15

> Hi Paul, I'm still waiting for feedback on the MIPS front.
> 
> Meanwhile, I plan to use #define RSEQ_SIG 0x0350000d which maps to:
> 
>    0350000d 	break	0x350
> 
> and use RSEQ_SIG in assembly with:
> 
>    ".word " __rseq_str(RSEQ_SIG) "\n\t"
> 
> on big and little endian MIPS, for MIPS32 and MIPS64, based on
> code generated with gcc version 8.2.0 (Ubuntu 8.2.0-1ubuntu2~18.04).
> 
> Let me know if it needs to be tweaked.

That's fine for the classic MIPS ISA, but won't decode as a break for
microMIPS or nanoMIPS. See my reply to your RFC for valid encodings for
both of those.

Thanks,
    Paul

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
  2019-04-24 23:13                 ` Paul Burton
@ 2019-04-25  0:41                   ` Maciej W. Rozycki
  -1 siblings, 0 replies; 55+ messages in thread
From: Maciej W. Rozycki @ 2019-04-25  0:41 UTC (permalink / raw)
  To: Paul Burton
  Cc: Mathieu Desnoyers, Carlos O'Donell, Will Deacon, Boqun Feng,
	heiko carstens, gor, schwidefsky, Russell King, ARM Linux,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney,
	Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api

On Wed, 24 Apr 2019, Paul Burton wrote:

> > > Any idea why 0x7273 is not accepted by my assembler ?
> 
> I don't know why the assembler wants a smaller immediate than the
> instruction encoding allows... There's a comment in the binutils file
> include/opcode/mips.h that reads:
> 
> > A breakpoint instruction uses OP, CODE and SPEC (10 bits of the
> > breakpoint instruction are not defined; Kane says the breakpoint code
> > field in BREAK is 20 bits; yet MIPS assemblers and debuggers only use
> > ten bits).  An optional two-operand form of break/sdbbp allows the
> > lower ten bits to be set too, and MIPS32 and later architectures allow
> > 20 bits to be set with a signal operand (using CODE20).
> 
> I suspect there's some history here that predates my involvement (or
> possibly just predates me).

 A useful explanation is in the Linux kernel (always good to look there), 
in arch/mips/kernel/traps.c:

	/*
	 * There is the ancient bug in the MIPS assemblers that the break
	 * code starts left to bit 16 instead to bit 6 in the opcode.
	 * Gas is bug-compatible, but not always, grrr...
	 * We handle both cases with a simple heuristics.  --macro
	 */

Unfortunately the bug has been carried over to the microMIPS instruction 
encoding in libopcodes for no reason (i.e. likely by copying the table 
mechanically without analysing it) and I didn't catch it when upstreaming.  
We should have permitted setting all bits in the 20-bit code field in the 
microMIPS encoding with a single operand, but you need two, like with the 
regular MIPS instruction set.

 The note on the MIPS32 assembly ISA permitting to set all the 20 bits 
with a single operand is a stale comment referring to the situation before 
binutils commit 1586d91e32ea ("/ 0 should send SIGFPE not SIGTRAP..."), 
<https://sourceware.org/ml/binutils/2004-07/msg00260.html>, which 
addressed a user ABI compatibility issue as discussed upthread here: 
<https://sourceware.org/ml/binutils/2004-06/msg00188.html> and previously: 
<https://www.linux-mips.org/cgi-bin/mesg.cgi?a=linux-mips&i=40C9F5A4.2050606%40avtrex.com>.

 As this is my mistake with the stale note, I have applied a fix to 
binutils now, commit cd0923370be1 ("MIPS/include: opcode/mips.h: Update 
stale comment for CODE20 operand"), so that it is clear that it is only 
SDBBP that accepts a single 20-bit operand for the code field (for the 
MIPS32 and later ISAs).

 FWIW,

  Maciej

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
@ 2019-04-25  0:41                   ` Maciej W. Rozycki
  0 siblings, 0 replies; 55+ messages in thread
From: Maciej W. Rozycki @ 2019-04-25  0:41 UTC (permalink / raw)
  To: Paul Burton
  Cc: Mathieu Desnoyers, Carlos O'Donell, Will Deacon, Boqun Feng,
	heiko carstens, gor, schwidefsky, Russell King, ARM Linux,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney

On Wed, 24 Apr 2019, Paul Burton wrote:

> > > Any idea why 0x7273 is not accepted by my assembler ?
> 
> I don't know why the assembler wants a smaller immediate than the
> instruction encoding allows... There's a comment in the binutils file
> include/opcode/mips.h that reads:
> 
> > A breakpoint instruction uses OP, CODE and SPEC (10 bits of the
> > breakpoint instruction are not defined; Kane says the breakpoint code
> > field in BREAK is 20 bits; yet MIPS assemblers and debuggers only use
> > ten bits).  An optional two-operand form of break/sdbbp allows the
> > lower ten bits to be set too, and MIPS32 and later architectures allow
> > 20 bits to be set with a signal operand (using CODE20).
> 
> I suspect there's some history here that predates my involvement (or
> possibly just predates me).

 A useful explanation is in the Linux kernel (always good to look there), 
in arch/mips/kernel/traps.c:

	/*
	 * There is the ancient bug in the MIPS assemblers that the break
	 * code starts left to bit 16 instead to bit 6 in the opcode.
	 * Gas is bug-compatible, but not always, grrr...
	 * We handle both cases with a simple heuristics.  --macro
	 */

Unfortunately the bug has been carried over to the microMIPS instruction 
encoding in libopcodes for no reason (i.e. likely by copying the table 
mechanically without analysing it) and I didn't catch it when upstreaming.  
We should have permitted setting all bits in the 20-bit code field in the 
microMIPS encoding with a single operand, but you need two, like with the 
regular MIPS instruction set.

 The note on the MIPS32 assembly ISA permitting to set all the 20 bits 
with a single operand is a stale comment referring to the situation before 
binutils commit 1586d91e32ea ("/ 0 should send SIGFPE not SIGTRAP..."), 
<https://sourceware.org/ml/binutils/2004-07/msg00260.html>, which 
addressed a user ABI compatibility issue as discussed upthread here: 
<https://sourceware.org/ml/binutils/2004-06/msg00188.html> and previously: 
<https://www.linux-mips.org/cgi-bin/mesg.cgi?a=linux-mips&i=40C9F5A4.2050606%40avtrex.com>.

 As this is my mistake with the stale note, I have applied a fix to 
binutils now, commit cd0923370be1 ("MIPS/include: opcode/mips.h: Update 
stale comment for CODE20 operand"), so that it is clear that it is only 
SDBBP that accepts a single 20-bit operand for the code field (for the 
MIPS32 and later ISAs).

 FWIW,

  Maciej

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2019-04-25  0:41 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20190212194253.1951-1-mathieu.desnoyers@efficios.com>
2019-02-12 19:42 ` [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) Mathieu Desnoyers
2019-03-22 20:09   ` Carlos O'Donell
2019-03-25 15:54     ` Mathieu Desnoyers
2019-03-27  9:16       ` Martin Schwidefsky
2019-03-27  9:16         ` Martin Schwidefsky
2019-03-27 20:01         ` Mathieu Desnoyers
2019-03-27 20:01           ` Mathieu Desnoyers
2019-03-27 20:38         ` Carlos O'Donell
2019-03-27 20:38           ` Carlos O'Donell
2019-03-28  7:49           ` Martin Schwidefsky
2019-03-28  7:49             ` Martin Schwidefsky
2019-03-28 15:42             ` Mathieu Desnoyers
2019-03-28 15:42               ` Mathieu Desnoyers
2019-04-02  6:02       ` Michael Ellerman
2019-04-02  7:08         ` Florian Weimer
2019-04-02  7:08           ` Florian Weimer
2019-04-04 20:32           ` Carlos O'Donell
2019-04-04 20:32             ` Carlos O'Donell
2019-04-05  9:16             ` Florian Weimer
2019-04-05  9:16               ` Florian Weimer
2019-04-05 15:40               ` Carlos O'Donell
2019-04-05 15:40                 ` Carlos O'Donell
2019-04-08 19:20                 ` Tulio Magno Quites Machado Filho
2019-04-08 19:20                   ` Tulio Magno Quites Machado Filho
2019-04-08 21:45                   ` Carlos O'Donell
2019-04-08 21:45                     ` Carlos O'Donell
2019-04-09  4:23                     ` Michael Ellerman
2019-04-09  4:23                       ` Michael Ellerman
2019-04-09  9:29                       ` Alan Modra
2019-04-09  9:29                         ` Alan Modra
2019-04-09 13:58                         ` Tulio Magno Quites Machado Filho
2019-04-09 14:13                           ` Carlos O'Donell
2019-04-09 14:13                             ` Carlos O'Donell
2019-04-09 15:45                             ` Mathieu Desnoyers
2019-04-09 15:45                               ` Mathieu Desnoyers
2019-04-18 15:31                         ` Mathieu Desnoyers
2019-04-18 15:31                           ` Mathieu Desnoyers
2019-04-09 16:33                     ` Mathieu Desnoyers
2019-04-09 16:33                       ` Mathieu Desnoyers
2019-04-04 20:15         ` Carlos O'Donell
2019-04-04 20:50       ` Carlos O'Donell
2019-04-04 21:41         ` Paul Burton
2019-04-04 21:41           ` Paul Burton
2019-04-09 16:40           ` Mathieu Desnoyers
2019-04-09 16:40             ` Mathieu Desnoyers
2019-04-18 18:58           ` Mathieu Desnoyers
2019-04-18 18:58             ` Mathieu Desnoyers
2019-04-24 15:05             ` Mathieu Desnoyers
2019-04-24 15:05               ` Mathieu Desnoyers
2019-04-24 23:13               ` Paul Burton
2019-04-24 23:13                 ` Paul Burton
2019-04-25  0:41                 ` Maciej W. Rozycki
2019-04-25  0:41                   ` Maciej W. Rozycki
2019-02-12 19:42 ` [PATCH 2/4] glibc: sched_getcpu(): use rseq cpu_id TLS on Linux Mathieu Desnoyers
2019-03-22 20:13   ` Carlos O'Donell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.