* MPK: pkey_free and key reuse
@ 2017-11-05 10:35 ` Florian Weimer
From: Florian Weimer @ 2017-11-05 10:35 UTC
  To: Dave Hansen, linux-x86_64-u79uwXL29TY76Z2rM5mHXA,
	linux-arch-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-mm, Linux API

[-- Attachment #1: Type: text/plain, Size: 2962 bytes --]

I'm working on adding memory protection key support to glibc.

I don't think pkey_free, as it is implemented today, is very safe due to 
key reuse by a subsequent pkey_alloc.  I see two problems:

(A) pkey_free allows reuse of the key while there are still mappings
that use it.

(B) If a key is reused, existing threads retain their access rights, 
while there is an expectation that pkey_alloc denies access for the 
threads except the current one.
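
To make problem (A) concrete, here is a toy model in C.  It is only a
sketch, not kernel code: the toy_* names, the 16-key bitmap, and the
"lowest free key first" policy are all assumptions made for the
illustration.  It shows how a key that is freed while a mapping is
still tagged with it can be handed out again by the next allocation.

```c
#include <assert.h>

/* Toy model only (hypothetical names): tracks which keys are allocated
   and which keys still have a mapping tagged with them.  */
struct toy_pkeys
{
  unsigned int allocated;      /* Bit N set: key N is allocated.  */
  unsigned int in_use_by_map;  /* Bit N set: a mapping uses key N.  */
};

/* Assumed policy: hand out the lowest free key.  Key 0 is the default
   key and is never handed out.  */
static int
toy_pkey_alloc (struct toy_pkeys *s)
{
  for (int key = 1; key < 16; ++key)
    if ((s->allocated & (1U << key)) == 0)
      {
        s->allocated |= 1U << key;
        return key;
      }
  return -1;                   /* No free key.  */
}

/* Records that some mapping is now tagged with KEY.  */
static void
toy_pkey_mprotect (struct toy_pkeys *s, int key)
{
  s->in_use_by_map |= 1U << key;
}

/* Mirrors problem (A): existing mappings are not consulted.  */
static void
toy_pkey_free (struct toy_pkeys *s, int key)
{
  s->allocated &= ~(1U << key);
}

/* Returns 1 if a freshly allocated key is still attached to a mapping
   that was set up under the previous allocation of the same key.  */
static int
toy_demonstrate_reuse (void)
{
  struct toy_pkeys s = { .allocated = 1U, .in_use_by_map = 0 };
  int first = toy_pkey_alloc (&s);
  assert (first > 0);
  toy_pkey_mprotect (&s, first);
  toy_pkey_free (&s, first);
  int second = toy_pkey_alloc (&s);
  return second == first && (s.in_use_by_map & (1U << second)) != 0;
}
```

In this model the second allocation returns the same key as the first,
so the new owner of the key silently controls access to the old
mapping as well.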

Issue (A) could be fixed by having pkey_free mark the key for reuse,
and only actually reusing it once all those mappings are gone.  This could
have a significant performance cost, but pkey_free is supposed to be rare.

Issue (B) is much harder to fix.  There is no atomic way to change 
access for a single key, so there is always a race condition due to the 
read-modify-write cycle for the PKRU update in user space.  This means 
that even if the kernel iterated over all threads to revoke access on 
pkey_free, there is a chance that the race reinstantiates the old access 
rights.
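
The race can be replayed deterministically on a simulated register.
The sketch below assumes the x86 PKRU layout (two bits per key:
access-disable at bit 2*key, write-disable at bit 2*key + 1).  The
function names and the kernel-side revocation step are hypothetical,
and nothing here touches the real register.

```c
#include <assert.h>
#include <stdint.h>

/* Returns PKRU with the two bits for KEY replaced by RIGHTS
   (bit 0: access disabled, bit 1: write disabled).  */
static uint32_t
pkru_with_rights (uint32_t pkru, int key, uint32_t rights)
{
  uint32_t shift = 2 * (uint32_t) key;
  return (pkru & ~(UINT32_C (3) << shift)) | ((rights & 3) << shift);
}

/* Extracts the two bits for KEY from PKRU.  */
static uint32_t
pkru_rights (uint32_t pkru, int key)
{
  return (pkru >> (2 * (uint32_t) key)) & 3;
}

/* Replays one unlucky interleaving and returns the rights the thread
   ends up with for the revoked key.  */
static uint32_t
simulate_lost_revocation (void)
{
  enum { revoked_key = 1, other_key = 2 };
  uint32_t reg = 0;            /* Thread may access both keys.  */

  /* Step 1: the thread begins an update for other_key and reads the
     register (the "read" of read-modify-write).  */
  uint32_t stale = reg;

  /* Step 2: a hypothetical kernel-side revocation disables access to
     revoked_key for this thread.  */
  reg = pkru_with_rights (reg, revoked_key, 1);
  assert (pkru_rights (reg, revoked_key) == 1);

  /* Step 3: the thread finishes its update from the stale copy and
     writes the whole register back.  */
  reg = pkru_with_rights (stale, other_key, 2);

  return pkru_rights (reg, revoked_key);
}
```

The function returns 0 (no restrictions) for the revoked key: the
write-back in step 3 covers the whole register, so the revocation from
step 2 is lost.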

One way to deal with this is to give up and just remove pkey_free from 
the API (i.e., we wouldn't provide it in glibc).  A slightly less 
drastic way could add two pkey_alloc flags, a flag to disable pkey_free 
for the new key (which would mainly serve as a documentation of intent), 
and another flag which requests a pristine key which has never been used 
before.  With the second flag, and assuming correct key management, 
libraries would have some confidence that other threads in the process 
would not implicitly gain access to the new key (although there is the 
init_pkru= boot flag, which overrides the thread default, so it doesn't 
look like the assumption is actually valid).

All this is of course a bit on thin ice anyway because code could just 
clear the PKRU register at any time.

I'm attaching my glibc patch for reference.  The interesting bits are
probably the test case (and how it creates and joins threads) and the
pkey_set/pkey_get functions.  The support/ subdirectory is just our
testing framework, which is still very young; I needed a few more
functions for debugging, which is why they are in this patch.

Key reuse is not the only problem; we also have an issue with siglongjmp:

   https://sourceware.org/bugzilla/show_bug.cgi?id=22396

I've started wondering whether it even makes sense to expose this 
interface for general use.  I don't think any other architecture will 
implement something like this in the same way (with a PKRU register 
which can simply be cleared, and keys which are easily guessed and 
reused).  I suspect the only use for this functionality is in-memory 
databases which use DAX mappings for persistence, and want to reduce 
risk of persistent data corruption due to random pointer writes.  (And 
maybe execute-only memory, but that's not really benefiting anyone anyway.)

Thanks,
Florian

PS: The manpages need fixing.  Right now, they are misleading.

[-- Attachment #2: glibc-pkey.patch --]
[-- Type: text/x-patch, Size: 54600 bytes --]


This adds system call wrappers for pkey_alloc, pkey_free, pkey_mprotect,
and x86-64 implementations of pkey_get and pkey_set, which abstract over
the PKRU CPU register and hide the actual number of memory protection
keys supported by the CPU.

The system call wrappers use unsigned int instead of unsigned long for
parameters, so that no special treatment for x32 is needed.  The flags
argument is currently unused, and the access rights bit mask is limited
to two bits by the current PKRU register layout anyway.
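
As a rough illustration of what pkey_set and pkey_get abstract over,
the following model operates on an ordinary variable instead of the
PKRU register.  It is a sketch under stated assumptions, not the code
from this patch: the model_* names, the EINVAL argument checks, and the
fixed count of 16 keys (a 32-bit register with two bits per key) are
chosen for the example.

```c
#include <errno.h>
#include <stdint.h>

/* Same values as PKEY_DISABLE_ACCESS and PKEY_DISABLE_WRITE.  */
#define MODEL_PKEY_DISABLE_ACCESS 0x1
#define MODEL_PKEY_DISABLE_WRITE  0x2

enum { model_key_count = 16 };  /* 32-bit register, two bits per key.  */

/* Stand-in for the per-thread PKRU register (assumption: the real
   functions read and write the CPU register instead).  */
static uint32_t model_pkru;

/* Replaces the two access-rights bits for KEY.  */
static int
model_pkey_set (int key, unsigned int access_rights)
{
  if (key < 0 || key >= model_key_count || access_rights > 3)
    {
      errno = EINVAL;
      return -1;
    }
  unsigned int shift = 2 * (unsigned int) key;
  model_pkru = (model_pkru & ~(UINT32_C (3) << shift))
               | ((uint32_t) access_rights << shift);
  return 0;
}

/* Returns the two access-rights bits for KEY, or -1 on a bad key.  */
static int
model_pkey_get (int key)
{
  if (key < 0 || key >= model_key_count)
    {
      errno = EINVAL;
      return -1;
    }
  return (int) ((model_pkru >> (2 * (unsigned int) key)) & 3);
}
```

Callers only see a key number and a two-bit mask; the register width
and the bit positions stay inside the two functions.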

2017-11-04  Florian Weimer  <fweimer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

	Linux: Implement interfaces for memory protection keys
	* support/Makefile (libsupport-routines): Add
	support_test_compare_failure, xraise, xsigaction, xsignal,
	xsysconf.
	* support/check.h (TEST_COMPARE): New macro.
	(support_test_compare_failure): Declare.
	* support/xsignal.h (xraise, xsignal, xsigaction): Declare.
	* support/xunistd.h (xsysconf): Declare.
	* support/support_test_compare_failure.c: New file.
	* support/xraise.c: Likewise.
	* support/xsigaction.c: Likewise.
	* support/xsignal.c: Likewise.
	* support/xsysconf.c: Likewise.
	* sysdeps/unix/sysv/linux/Makefile [misc] (routines): Add
	pkey_set, pkey_get.
	[misc] (tests): Add tst-pkey.
	(tst-pkey): Link with -lpthread.
	* sysdeps/unix/sysv/linux/Versions (GLIBC_2.27): Add pkey_alloc,
	pkey_free, pkey_set, pkey_get, pkey_mprotect.
	* sysdeps/unix/sysv/linux/bits/mman-linux.h (PKEY_DISABLE_ACCESS)
	(PKEY_DISABLE_WRITE): Define.
	(pkey_alloc, pkey_free, pkey_set, pkey_get, pkey_mprotect):
	Declare.
	* sysdeps/unix/sysv/linux/bits/siginfo-consts.h (SEGV_BNDERR)
	(SEGV_PKUERR): Add.
	* sysdeps/unix/sysv/linux/pkey_get.c: New file.
	* sysdeps/unix/sysv/linux/pkey_set.c: Likewise.
	* sysdeps/unix/sysv/linux/syscalls.list (pkey_alloc, pkey_free)
	(pkey_mprotect): Add.
	* sysdeps/unix/sysv/linux/tst-pkey.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/arch-pkey.h: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/pkey_get.c: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/pkey_set.c: Likewise.
	* sysdeps/unix/sysv/linux/**.abilist: Update.

diff --git a/NEWS b/NEWS
index 933085417c..0652012a09 100644
--- a/NEWS
+++ b/NEWS
@@ -38,6 +38,10 @@ Major new features:
 * glibc now provides the <sys/memfd.h> header file and the memfd_create
   system call.
 
+* Support for memory protection keys was added.  The <sys/mman.h> header now
+  declares the functions pkey_alloc, pkey_free, pkey_mprotect, pkey_set,
+  pkey_get.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
 * On GNU/Linux, the obsolete Linux constant PTRACE_SEIZE_DEVEL is no longer
diff --git a/support/Makefile b/support/Makefile
index dafb1737a4..50d4269e24 100644
--- a/support/Makefile
+++ b/support/Makefile
@@ -52,9 +52,10 @@ libsupport-routines = \
   support_record_failure \
   support_run_diff \
   support_shared_allocate \
-  support_write_file_string \
+  support_test_compare_failure \
   support_test_main \
   support_test_verify_impl \
+  support_write_file_string \
   temp_file \
   write_message \
   xaccept \
@@ -84,8 +85,8 @@ libsupport-routines = \
   xpthread_attr_destroy \
   xpthread_attr_init \
   xpthread_attr_setdetachstate \
-  xpthread_attr_setstacksize \
   xpthread_attr_setguardsize \
+  xpthread_attr_setstacksize \
   xpthread_barrier_destroy \
   xpthread_barrier_init \
   xpthread_barrier_wait \
@@ -116,14 +117,18 @@ libsupport-routines = \
   xpthread_sigmask \
   xpthread_spin_lock \
   xpthread_spin_unlock \
+  xraise \
   xreadlink \
   xrealloc \
   xrecvfrom \
   xsendto \
   xsetsockopt \
+  xsigaction \
+  xsignal \
   xsocket \
   xstrdup \
   xstrndup \
+  xsysconf \
   xunlink \
   xwaitpid \
   xwrite \
diff --git a/support/check.h b/support/check.h
index bdcd12952a..29b709c2b0 100644
--- a/support/check.h
+++ b/support/check.h
@@ -86,6 +86,35 @@ void support_test_verify_exit_impl (int status, const char *file, int line,
    does not support reporting failures from a DSO.  */
 void support_record_failure (void);
 
+/* Compare the two numbers LEFT and RIGHT and report failure if they
+   are different.  */
+#define TEST_COMPARE(left, right)                                       \
+  ({                                                                    \
+    __typeof__ (left) __left_value = (left);                            \
+    __typeof__ (right) __right_value = (right);                         \
+    _Static_assert (sizeof (__left_value) <= sizeof (long long),        \
+                    "left value fits into long long");                  \
+    _Static_assert (sizeof (__right_value) <= sizeof (long long),       \
+                    "right value fits into long long");                 \
+    if (__left_value != __right_value                                   \
+        || ((__left_value > 0) != (__right_value > 0)))                 \
+      support_test_compare_failure                                      \
+        (__FILE__, __LINE__,                                            \
+         #left, __left_value, __left_value > 0,                         \
+         #right, __right_value, __right_value > 0);                     \
+  })
+
+/* Internal implementation of TEST_COMPARE.  LEFT_POSITIVE and
+   RIGHT_POSITIVE are used to fit both unsigned long long and long
+   long arguments into LEFT_VALUE and RIGHT_VALUE.  */
+void support_test_compare_failure (const char *file, int line,
+                                   const char *left_expr,
+                                   long long left_value,
+                                   int left_positive,
+                                   const char *right_expr,
+                                   long long right_value,
+                                   int right_positive);
+
 /* Internal function called by the test driver.  */
 int support_report_failure (int status)
   __attribute__ ((weak, warn_unused_result));
diff --git a/support/support_test_compare_failure.c b/support/support_test_compare_failure.c
new file mode 100644
index 0000000000..38fec1ca89
--- /dev/null
+++ b/support/support_test_compare_failure.c
@@ -0,0 +1,46 @@
+/* Reporting numeric comparison failure.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdio.h>
+#include <support/check.h>
+
+static void
+report (const char *which, const char *expr, long long value, int positive)
+{
+  printf ("  %s: ", which);
+  if (positive)
+    printf ("%llu", (unsigned long long) value);
+  else
+    printf ("%lld", value);
+  printf (" (0x%llx); from: %s\n", (unsigned long long) value, expr);
+}
+
+void
+support_test_compare_failure (const char *file, int line,
+                              const char *left_expr,
+                              long long left_value,
+                              int left_positive,
+                              const char *right_expr,
+                              long long right_value,
+                              int right_positive)
+{
+  support_record_failure ();
+  printf ("%s:%d: numeric comparison failure\n", file, line);
+  report (" left", left_expr, left_value, left_positive);
+  report ("right", right_expr, right_value, right_positive);
+}
diff --git a/support/xraise.c b/support/xraise.c
new file mode 100644
index 0000000000..9126c6c3ea
--- /dev/null
+++ b/support/xraise.c
@@ -0,0 +1,27 @@
+/* Error-checking wrapper for raise.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <support/check.h>
+#include <support/xsignal.h>
+
+void
+xraise (int sig)
+{
+  if (raise (sig) != 0)
+    FAIL_EXIT1 ("raise (%d): %m" , sig);
+}
diff --git a/support/xsigaction.c b/support/xsigaction.c
new file mode 100644
index 0000000000..b74c69afae
--- /dev/null
+++ b/support/xsigaction.c
@@ -0,0 +1,27 @@
+/* Error-checking wrapper for sigaction.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <support/check.h>
+#include <support/xsignal.h>
+
+void
+xsigaction (int sig, const struct sigaction *newact, struct sigaction *oldact)
+{
+  if (sigaction (sig, newact, oldact))
+    FAIL_EXIT1 ("sigaction (%d): %m" , sig);
+}
diff --git a/support/xsignal.c b/support/xsignal.c
new file mode 100644
index 0000000000..22a1dd74a7
--- /dev/null
+++ b/support/xsignal.c
@@ -0,0 +1,29 @@
+/* Error-checking wrapper for signal.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <support/check.h>
+#include <support/xsignal.h>
+
+sighandler_t
+xsignal (int sig, sighandler_t handler)
+{
+  sighandler_t result = signal (sig, handler);
+  if (result == SIG_ERR)
+    FAIL_EXIT1 ("signal (%d, %p): %m", sig, handler);
+  return result;
+}
diff --git a/support/xsignal.h b/support/xsignal.h
index 3dc0d9d5ce..3087ed0082 100644
--- a/support/xsignal.h
+++ b/support/xsignal.h
@@ -24,6 +24,14 @@
 
 __BEGIN_DECLS
 
+/* The following functions call the corresponding libc functions and
+   terminate the process on error.  */
+
+void xraise (int sig);
+sighandler_t xsignal (int sig, sighandler_t handler);
+void xsigaction (int sig, const struct sigaction *newact,
+                 struct sigaction *oldact);
+
 /* The following functions call the corresponding libpthread functions
    and terminate the process on error.  */
 
diff --git a/support/xsysconf.c b/support/xsysconf.c
new file mode 100644
index 0000000000..15ab1e26c4
--- /dev/null
+++ b/support/xsysconf.c
@@ -0,0 +1,36 @@
+/* Error-checking wrapper for sysconf.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+#include <support/check.h>
+#include <support/xunistd.h>
+
+long
+xsysconf (int name)
+{
+  /* Detect errors by a changed errno value, in case -1 is a valid
+     value.  Make sure that the caller does not see the zero value for
+     errno.  */
+  int old_errno = errno;
+  errno = 0;
+  long result = sysconf (name);
+  if (errno != 0)
+    FAIL_EXIT1 ("sysconf (%d): %m", name);
+  errno = old_errno;
+  return result;
+}
diff --git a/support/xunistd.h b/support/xunistd.h
index 05c2626a7b..00376f7aae 100644
--- a/support/xunistd.h
+++ b/support/xunistd.h
@@ -39,6 +39,7 @@ void xstat (const char *path, struct stat64 *);
 void xmkdir (const char *path, mode_t);
 void xchroot (const char *path);
 void xunlink (const char *path);
+long xsysconf (int name);
 
 /* Read the link at PATH.  The caller should free the returned string
    with free.  */
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index 53e41510e3..095cf93892 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -18,7 +18,7 @@ sysdep_routines += clone umount umount2 readahead \
 		   setfsuid setfsgid epoll_pwait signalfd \
 		   eventfd eventfd_read eventfd_write prlimit \
 		   personality epoll_wait tee vmsplice splice \
-		   open_by_handle_at
+		   open_by_handle_at pkey_set pkey_get
 
 CFLAGS-gethostid.c = -fexceptions
 CFLAGS-tee.c = -fexceptions -fasynchronous-unwind-tables
@@ -44,7 +44,7 @@ sysdep_headers += sys/mount.h sys/acct.h sys/sysctl.h \
 
 tests += tst-clone tst-clone2 tst-clone3 tst-fanotify tst-personality \
 	 tst-quota tst-sync_file_range test-errno-linux tst-sysconf-iov_max \
-	 tst-memfd_create
+	 tst-memfd_create tst-pkey
 
 # Generate the list of SYS_* macros for the system calls (__NR_*
 # macros).  The file syscall-names.list contains all possible system
@@ -92,6 +92,8 @@ $(objpfx)tst-syscall-list.out: \
 # Separate object file for access to the constant from the UAPI header.
 $(objpfx)tst-sysconf-iov_max: $(objpfx)tst-sysconf-iov_max-uapi.o
 
+$(objpfx)tst-pkey: $(shared-thread-library)
+
 endif # $(subdir) == misc
 
 ifeq ($(subdir),time)
diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
index 992c19729f..798ffc7660 100644
--- a/sysdeps/unix/sysv/linux/Versions
+++ b/sysdeps/unix/sysv/linux/Versions
@@ -168,6 +168,7 @@ libc {
   }
   GLIBC_2.27 {
     memfd_create;
+    pkey_alloc; pkey_free; pkey_set; pkey_get; pkey_mprotect;
   }
   GLIBC_PRIVATE {
     # functions used in other libraries
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index 140ca28abc..85788be12b 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2107,6 +2107,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.27 strfromf128 F
 GLIBC_2.27 strtof128 F
 GLIBC_2.27 strtof128_l F
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index f698e1b2f4..3b463dacbe 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2018,6 +2018,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.27 strfromf128 F
 GLIBC_2.27 strtof128 F
 GLIBC_2.27 strtof128_l F
diff --git a/sysdeps/unix/sysv/linux/arm/libc.abilist b/sysdeps/unix/sysv/linux/arm/libc.abilist
index 8a8af3e3e4..a1315aef35 100644
--- a/sysdeps/unix/sysv/linux/arm/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/libc.abilist
@@ -108,6 +108,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.4 GLIBC_2.4 A
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
diff --git a/sysdeps/unix/sysv/linux/bits/mman-linux.h b/sysdeps/unix/sysv/linux/bits/mman-linux.h
index b091181960..da5ec79334 100644
--- a/sysdeps/unix/sysv/linux/bits/mman-linux.h
+++ b/sysdeps/unix/sysv/linux/bits/mman-linux.h
@@ -109,3 +109,38 @@
 # define MCL_ONFAULT	4		/* Lock all pages that are
 					   faulted in.  */
 #endif
+
+/* Memory protection key support.  */
+#ifdef __USE_GNU
+
+/* Flags for pkey_alloc.  */
+# define PKEY_DISABLE_ACCESS 0x1
+# define PKEY_DISABLE_WRITE 0x2
+
+__BEGIN_DECLS
+
+/* Allocate a new protection key, with the PKEY_DISABLE_* bits
+   specified in ACCESS_RIGHTS.  The protection key mask for the
+   current thread is updated to match the access privilege for the new
+   key.  */
+int pkey_alloc (unsigned int __flags, unsigned int __access_rights) __THROW;
+
+/* Update the access rights for the current thread for KEY, which must
+   have been allocated using pkey_alloc.  */
+int pkey_set (int __key, unsigned int __access_rights) __THROW;
+
+/* Return the access rights for the current thread for KEY, which must
+   have been allocated using pkey_alloc.  */
+int pkey_get (int __key) __THROW;
+
+/* Free an allocated protection key, which must have been allocated
+   using pkey_alloc.  */
+int pkey_free (int __key) __THROW;
+
+/* Apply memory protection flags for KEY to the specified address
+   range.  */
+int pkey_mprotect (void *__addr, size_t __len, int __prot, int __pkey) __THROW;
+
+__END_DECLS
+
+#endif /* __USE_GNU */
diff --git a/sysdeps/unix/sysv/linux/bits/siginfo-consts.h b/sysdeps/unix/sysv/linux/bits/siginfo-consts.h
index 525840cea1..e86b933040 100644
--- a/sysdeps/unix/sysv/linux/bits/siginfo-consts.h
+++ b/sysdeps/unix/sysv/linux/bits/siginfo-consts.h
@@ -111,8 +111,12 @@ enum
 {
   SEGV_MAPERR = 1,		/* Address not mapped to object.  */
 #  define SEGV_MAPERR	SEGV_MAPERR
-  SEGV_ACCERR			/* Invalid permissions for mapped object.  */
+  SEGV_ACCERR,			/* Invalid permissions for mapped object.  */
 #  define SEGV_ACCERR	SEGV_ACCERR
+  SEGV_BNDERR,			/* Bounds checking failure.  */
+#  define SEGV_BNDERR	SEGV_BNDERR
+  SEGV_PKUERR			/* Protection key checking failure.  */
+#  define SEGV_PKUERR	SEGV_PKUERR
 };
 
 /* `si_code' values for SIGBUS signal.  */
diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
index 5b81a6cd7d..7397d728f2 100644
--- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
@@ -1872,6 +1872,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index 51ead9e867..cffdf251d6 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2037,6 +2037,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
index 78b4ee8d40..3292510a55 100644
--- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
@@ -1901,6 +1901,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
index d9c97779e4..636bbdd1a7 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
@@ -109,6 +109,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.4 GLIBC_2.4 A
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0x98
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
index 4acbf7eeed..6952863f86 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
@@ -1986,6 +1986,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/microblaze/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/libc.abilist
index 93f02f08ce..ac5b56abab 100644
--- a/sysdeps/unix/sysv/linux/microblaze/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/libc.abilist
@@ -2107,3 +2107,8 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
index 795e85de70..bb0958e842 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
@@ -1961,6 +1961,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
index dc714057b7..9104eb4d6d 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
@@ -1959,6 +1959,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
index ce7bc9b175..58a5d5e141 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
@@ -1957,6 +1957,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.27 strfromf128 F
 GLIBC_2.27 strtof128 F
 GLIBC_2.27 strtof128_l F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
index 3fdd85eace..2efac14a7d 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
@@ -1952,6 +1952,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.27 strfromf128 F
 GLIBC_2.27 strtof128 F
 GLIBC_2.27 strtof128_l F
diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
index 3e0bcb2a5c..9ef29e4e98 100644
--- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
@@ -2148,3 +2148,8 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
diff --git a/sysdeps/unix/sysv/linux/pkey_get.c b/sysdeps/unix/sysv/linux/pkey_get.c
new file mode 100644
index 0000000000..fc3204c82f
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/pkey_get.c
@@ -0,0 +1,26 @@
+/* Obtaining the thread memory protection key, generic stub.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+
+int
+pkey_get (int key)
+{
+  __set_errno (ENOSYS);
+  return -1;
+}
diff --git a/sysdeps/unix/sysv/linux/pkey_set.c b/sysdeps/unix/sysv/linux/pkey_set.c
new file mode 100644
index 0000000000..f686c4373c
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/pkey_set.c
@@ -0,0 +1,26 @@
+/* Changing the thread memory protection key, generic stub.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+
+int
+pkey_set (int key, unsigned int access_rights)
+{
+  __set_errno (ENOSYS);
+  return -1;
+}
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
index 375c69d9d1..60c024096f 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
@@ -1990,6 +1990,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
index a88172a906..327933c973 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
@@ -1995,6 +1995,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc-le.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc-le.abilist
index fa026a332c..b04c31bc10 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc-le.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc-le.abilist
@@ -2202,3 +2202,8 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc.abilist
index 838f395d78..e0645e9e25 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc.abilist
@@ -109,6 +109,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 _Exit F
 GLIBC_2.3 _IO_2_1_stderr_ D 0xe0
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
index 41b79c496a..ef434c61a7 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
@@ -1990,6 +1990,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.27 strfromf128 F
 GLIBC_2.27 strtof128 F
 GLIBC_2.27 strtof128_l F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
index 68251a0e69..4114a4ce57 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
@@ -1891,6 +1891,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.27 strfromf128 F
 GLIBC_2.27 strtof128 F
 GLIBC_2.27 strtof128_l F
diff --git a/sysdeps/unix/sysv/linux/sh/libc.abilist b/sysdeps/unix/sysv/linux/sh/libc.abilist
index bc1aae275e..f4478b0cc5 100644
--- a/sysdeps/unix/sysv/linux/sh/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/libc.abilist
@@ -1876,6 +1876,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
index 93e6d092ac..136a57fc0e 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
@@ -1983,6 +1983,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.27 strfromf128 F
 GLIBC_2.27 strtof128 F
 GLIBC_2.27 strtof128_l F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
index b11d6764d4..9ad0790829 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
@@ -1920,6 +1920,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.27 strfromf128 F
 GLIBC_2.27 strtof128 F
 GLIBC_2.27 strtof128_l F
diff --git a/sysdeps/unix/sysv/linux/syscalls.list b/sysdeps/unix/sysv/linux/syscalls.list
index 40c4fbb9ea..6f657eea2e 100644
--- a/sysdeps/unix/sysv/linux/syscalls.list
+++ b/sysdeps/unix/sysv/linux/syscalls.list
@@ -110,3 +110,6 @@ setns		EXTRA	setns		i:ii	setns
 process_vm_readv EXTRA	process_vm_readv i:ipipii process_vm_readv
 process_vm_writev EXTRA	process_vm_writev i:ipipii process_vm_writev
 memfd_create    EXTRA	memfd_create	i:si    memfd_create
+pkey_alloc	EXTRA	pkey_alloc	i:ii	pkey_alloc
+pkey_free	EXTRA	pkey_free	i:i	pkey_free
+pkey_mprotect	EXTRA	pkey_mprotect	i:aiii  pkey_mprotect
diff --git a/sysdeps/unix/sysv/linux/tile/tilegx/tilegx32/libc.abilist b/sysdeps/unix/sysv/linux/tile/tilegx/tilegx32/libc.abilist
index e9eb4ff7bd..d4f2094027 100644
--- a/sysdeps/unix/sysv/linux/tile/tilegx/tilegx32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/tile/tilegx/tilegx32/libc.abilist
@@ -2114,3 +2114,8 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
diff --git a/sysdeps/unix/sysv/linux/tile/tilegx/tilegx64/libc.abilist b/sysdeps/unix/sysv/linux/tile/tilegx/tilegx64/libc.abilist
index 8f08e909cd..4916dbabb5 100644
--- a/sysdeps/unix/sysv/linux/tile/tilegx/tilegx64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/tile/tilegx/tilegx64/libc.abilist
@@ -2114,3 +2114,8 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
diff --git a/sysdeps/unix/sysv/linux/tile/tilepro/libc.abilist b/sysdeps/unix/sysv/linux/tile/tilepro/libc.abilist
index e9eb4ff7bd..d4f2094027 100644
--- a/sysdeps/unix/sysv/linux/tile/tilepro/libc.abilist
+++ b/sysdeps/unix/sysv/linux/tile/tilepro/libc.abilist
@@ -2114,3 +2114,8 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
diff --git a/sysdeps/unix/sysv/linux/tst-pkey.c b/sysdeps/unix/sysv/linux/tst-pkey.c
new file mode 100644
index 0000000000..42d50e37c2
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-pkey.c
@@ -0,0 +1,390 @@
+/* Tests for memory protection keys.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+#include <inttypes.h>
+#include <setjmp.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <support/check.h>
+#include <support/support.h>
+#include <support/test-driver.h>
+#include <support/xsignal.h>
+#include <support/xthread.h>
+#include <support/xunistd.h>
+#include <sys/mman.h>
+
+/* Used to force threads to wait until the main thread has set up the
+   keys as intended.  */
+static pthread_barrier_t barrier;
+
+/* The keys used for testing.  These have been allocated with access
+   rights set based on their array index.  */
+enum { key_count = 4 };
+static int keys[key_count];
+static volatile int *pages[key_count];
+
+/* Used to report results from the signal handler.  */
+static volatile void *sigsegv_addr;
+static volatile int sigsegv_code;
+static volatile int sigsegv_pkey;
+static sigjmp_buf sigsegv_jmp;
+
+/* Used to handle expected read or write faults.  */
+static void
+sigsegv_handler (int signum, siginfo_t *info, void *context)
+{
+  sigsegv_addr = info->si_addr;
+  sigsegv_code = info->si_code;
+  sigsegv_pkey = info->si_pkey;
+  siglongjmp (sigsegv_jmp, 2);
+}
+
+static const struct sigaction sigsegv_sigaction =
+  {
+    .sa_flags = SA_RESETHAND | SA_SIGINFO,
+    .sa_sigaction = &sigsegv_handler,
+  };
+
+/* Check if PAGE is readable (if !WRITE) or writable (if WRITE).  */
+static bool
+check_page_access (int page, bool write)
+{
+  /* This is needed to work around bug 22396: On x86-64, siglongjmp
+     does not restore the protection key access rights for the current
+     thread.  We restore only the access rights for the keys under
+     test.  (This is not a general solution to this problem, but it
+     allows testing to proceed after a fault.)  */
+  unsigned saved_rights[key_count];
+  for (int i = 0; i < key_count; ++i)
+    saved_rights[i] = pkey_get (keys[i]);
+
+  volatile int *addr = pages[page];
+  if (test_verbose > 0)
+    {
+      printf ("info: checking access at %p (page %d) for %s\n",
+              addr, page, write ? "writing" : "reading");
+    }
+  int result = sigsetjmp (sigsegv_jmp, 1);
+  if (result == 0)
+    {
+      xsigaction (SIGSEGV, &sigsegv_sigaction, NULL);
+      if (write)
+        *addr = 3;
+      else
+        (void) *addr;
+      xsignal (SIGSEGV, SIG_DFL);
+      if (test_verbose > 0)
+        puts ("  --> access allowed");
+      return true;
+    }
+  else
+    {
+      xsignal (SIGSEGV, SIG_DFL);
+      if (test_verbose > 0)
+        puts ("  --> access denied");
+      TEST_COMPARE (result, 2);
+      TEST_COMPARE ((uintptr_t) sigsegv_addr, (uintptr_t) addr);
+      TEST_COMPARE (sigsegv_code, SEGV_PKUERR);
+      TEST_COMPARE (sigsegv_pkey, keys[page]);
+      for (int i = 0; i < key_count; ++i)
+        TEST_COMPARE (pkey_set (keys[i], saved_rights[i]), 0);
+      return false;
+    }
+}
+
+static volatile sig_atomic_t sigusr1_handler_ran;
+
+/* Used to check that access is revoked in signal handlers.  */
+static void
+sigusr1_handler (int signum)
+{
+  TEST_COMPARE (signum, SIGUSR1);
+  for (int i = 0; i < key_count; ++i)
+    TEST_COMPARE (pkey_get (keys[i]), PKEY_DISABLE_ACCESS);
+  sigusr1_handler_ran = 1;
+}
+
+/* Used to report results from other threads.  */
+struct thread_result
+{
+  int access_rights[key_count];
+  pthread_t next_thread;
+};
+
+/* Return the thread's access rights for the keys under test.  */
+static void *
+get_thread_func (void *closure)
+{
+  struct thread_result *result = xmalloc (sizeof (*result));
+  for (int i = 0; i < key_count; ++i)
+    result->access_rights[i] = pkey_get (keys[i]);
+  memset (&result->next_thread, 0, sizeof (result->next_thread));
+  return result;
+}
+
+/* Wait for initialization and then check that the current thread does
+   not have access through the keys under test.  */
+static void *
+delayed_thread_func (void *closure)
+{
+  bool check_access = *(bool *) closure;
+  pthread_barrier_wait (&barrier);
+  struct thread_result *result = get_thread_func (NULL);
+
+  if (check_access)
+    {
+      /* Also check directly.  This code should not run with other
+         threads in parallel because of the SIGSEGV handler which is
+         installed by check_page_access.  */
+      for (int i = 0; i < key_count; ++i)
+        {
+          TEST_VERIFY (!check_page_access (i, false));
+          TEST_VERIFY (!check_page_access (i, true));
+        }
+    }
+
+  result->next_thread = xpthread_create (NULL, get_thread_func, NULL);
+  return result;
+}
+
+static int
+do_test (void)
+{
+  long pagesize = xsysconf (_SC_PAGESIZE);
+
+  xpthread_barrier_init (&barrier, NULL, 2);
+  bool delayed_thread_check_access = true;
+  pthread_t delayed_thread = xpthread_create
+    (NULL, &delayed_thread_func, &delayed_thread_check_access);
+
+  keys[0] = pkey_alloc (0, 0);
+  if (keys[0] < 0)
+    {
+      if (errno == ENOSYS)
+        {
+          puts ("warning: kernel does not support memory protection keys");
+          return EXIT_UNSUPPORTED;
+        }
+      if (errno == ENOSPC)
+        {
+          puts ("warning: CPU does not support memory protection keys");
+          return EXIT_UNSUPPORTED;
+        }
+      FAIL_EXIT1 ("pkey_alloc: %m");
+    }
+  TEST_COMPARE (pkey_get (keys[0]), 0);
+  for (int i = 1; i < key_count; ++i)
+    {
+      keys[i] = pkey_alloc (0, i);
+      if (keys[i] < 0)
+        FAIL_EXIT1 ("pkey_alloc (0, %d): %m", i);
+      /* pkey_alloc is supposed to change the current thread's access
+         rights for the new key.  */
+      TEST_COMPARE (pkey_get (keys[i]), i);
+    }
+  /* Check that all the keys have the expected access rights for the
+     current thread.  */
+  for (int i = 0; i < key_count; ++i)
+    TEST_COMPARE (pkey_get (keys[i]), i);
+
+  /* Allocate a test page for each key.  */
+  for (int i = 0; i < key_count; ++i)
+    {
+      pages[i] = xmmap (NULL, pagesize, PROT_READ | PROT_WRITE,
+                        MAP_ANONYMOUS | MAP_PRIVATE, -1);
+      TEST_COMPARE (pkey_mprotect ((void *) pages[i], pagesize,
+                                   PROT_READ | PROT_WRITE, keys[i]), 0);
+    }
+
+  /* Check that a thread created before the keys were allocated does
+     not have access to the new keys.  */
+  {
+    pthread_barrier_wait (&barrier);
+    struct thread_result *result = xpthread_join (delayed_thread);
+    for (int i = 0; i < key_count; ++i)
+      TEST_COMPARE (result->access_rights[i],
+                    PKEY_DISABLE_ACCESS);
+    struct thread_result *result2 = xpthread_join (result->next_thread);
+    for (int i = 0; i < key_count; ++i)
+      TEST_COMPARE (result2->access_rights[i],
+                    PKEY_DISABLE_ACCESS);
+    free (result);
+    free (result2);
+  }
+
+  /* Check that the current thread access rights are inherited by new
+     threads.  */
+  {
+    pthread_t get_thread = xpthread_create (NULL, get_thread_func, NULL);
+    struct thread_result *result = xpthread_join (get_thread);
+    for (int i = 0; i < key_count; ++i)
+      TEST_COMPARE (result->access_rights[i], i);
+    free (result);
+  }
+
+  for (int i = 0; i < key_count; ++i)
+    TEST_COMPARE (pkey_get (keys[i]), i);
+
+  /* Check that in a signal handler, there is no access.  */
+  xsignal (SIGUSR1, &sigusr1_handler);
+  xraise (SIGUSR1);
+  xsignal (SIGUSR1, SIG_DFL);
+  TEST_COMPARE (sigusr1_handler_ran, 1);
+
+  /* The first key results in a writable page.  */
+  TEST_VERIFY (check_page_access (0, false));
+  TEST_VERIFY (check_page_access (0, true));
+
+  /* The other keys do not.   */
+  for (int i = 1; i < key_count; ++i)
+    {
+      if (test_verbose)
+        printf ("info: checking access for key %d, bits 0x%x\n",
+                i, pkey_get (keys[i]));
+      for (int j = 0; j < key_count; ++j)
+        TEST_COMPARE (pkey_get (keys[j]), j);
+      if (i & PKEY_DISABLE_ACCESS)
+        {
+          TEST_VERIFY (!check_page_access (i, false));
+          TEST_VERIFY (!check_page_access (i, true));
+        }
+      else
+        {
+          TEST_VERIFY (i & PKEY_DISABLE_WRITE);
+          TEST_VERIFY (check_page_access (i, false));
+          TEST_VERIFY (!check_page_access (i, true));
+        }
+    }
+
+  /* But if we set the current thread's access rights, we gain
+     access.  */
+  for (int do_write = 0; do_write < 2; ++do_write)
+    for (int allowed_key = 0; allowed_key < key_count; ++allowed_key)
+      {
+        for (int i = 0; i < key_count; ++i)
+          if (i == allowed_key)
+            {
+              if (do_write)
+                TEST_COMPARE (pkey_set (keys[i], 0), 0);
+              else
+                TEST_COMPARE (pkey_set (keys[i], PKEY_DISABLE_WRITE), 0);
+            }
+          else
+            TEST_COMPARE (pkey_set (keys[i], PKEY_DISABLE_ACCESS), 0);
+
+        if (test_verbose)
+          printf ("info: key %d is allowed access for %s\n",
+                  allowed_key, do_write ? "writing" : "reading");
+        for (int i = 0; i < key_count; ++i)
+          if (i == allowed_key)
+            {
+              TEST_VERIFY (check_page_access (i, false));
+              TEST_VERIFY (check_page_access (i, true) == do_write);
+            }
+          else
+            {
+              TEST_VERIFY (!check_page_access (i, false));
+              TEST_VERIFY (!check_page_access (i, true));
+            }
+      }
+
+  /* Restore access to all keys, and launch a thread which should
+     inherit that access.  */
+  for (int i = 0; i < key_count; ++i)
+    {
+      TEST_COMPARE (pkey_set (keys[i], 0), 0);
+      TEST_VERIFY (check_page_access (i, false));
+      TEST_VERIFY (check_page_access (i, true));
+    }
+  delayed_thread_check_access = false;
+  delayed_thread = xpthread_create
+    (NULL, delayed_thread_func, &delayed_thread_check_access);
+
+  TEST_COMPARE (pkey_free (keys[0]), 0);
+  /* Second pkey_free will fail because the key has already been
+     freed.  */
+  TEST_COMPARE (pkey_free (keys[0]), -1);
+  TEST_COMPARE (errno, EINVAL);
+  for (int i = 1; i < key_count; ++i)
+    TEST_COMPARE (pkey_free (keys[i]), 0);
+
+  /* Check what happens to running threads which have access to
+     previously allocated protection keys.  The implemented behavior
+     is somewhat dubious: Ideally, pkey_free should revoke access to
+     that key and pkey_alloc of the same (numeric) key should not
+     implicitly confer access to already-running threads, but this is
+     not what happens in practice.  */
+  {
+    /* The limit is in place to avoid running indefinitely in case
+       there are many keys available.  */
+    int *keys_array = xcalloc (100000, sizeof (*keys_array));
+    int keys_allocated = 0;
+    while (keys_allocated < 100000)
+      {
+        int new_key = pkey_alloc (0, PKEY_DISABLE_WRITE);
+        if (new_key < 0)
+          {
+            /* No key reuse observed before running out of keys.  */
+            TEST_COMPARE (errno, ENOSPC);
+            break;
+          }
+        for (int i = 0; i < key_count; ++i)
+          if (new_key == keys[i])
+            {
+              /* We allocated the key with disabled write access.
+                 This should affect the protection state of the
+                 existing page.  */
+              TEST_VERIFY (check_page_access (i, false));
+              TEST_VERIFY (!check_page_access (i, true));
+
+              xpthread_barrier_wait (&barrier);
+              struct thread_result *result = xpthread_join (delayed_thread);
+              /* The thread which was launched before should still have
+                 access to the key.  */
+              TEST_COMPARE (result->access_rights[i], 0);
+              struct thread_result *result2
+                = xpthread_join (result->next_thread);
+              /* Same for a thread which is launched afterwards from
+                 the old thread.  */
+              TEST_COMPARE (result2->access_rights[i], 0);
+              free (result);
+              free (result2);
+              keys_array[keys_allocated++] = new_key;
+              goto after_key_search;
+            }
+        /* Save key for later deallocation.  */
+        keys_array[keys_allocated++] = new_key;
+      }
+  after_key_search:
+    /* Deallocate the keys allocated for testing purposes.  */
+    for (int j = 0; j < keys_allocated; ++j)
+      TEST_COMPARE (pkey_free (keys_array[j]), 0);
+    free (keys_array);
+  }
+
+  for (int i = 0; i < key_count; ++i)
+    xmunmap ((void *) pages[i], pagesize);
+
+  xpthread_barrier_destroy (&barrier);
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index 0a4f7797ac..1ea74f9e8c 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -1878,6 +1878,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/x86_64/arch-pkey.h b/sysdeps/unix/sysv/linux/x86_64/arch-pkey.h
new file mode 100644
index 0000000000..8e9bfdae96
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/arch-pkey.h
@@ -0,0 +1,40 @@
+/* Helper functions for manipulating memory protection keys.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _ARCH_PKEY_H
+#define _ARCH_PKEY_H
+
+/* Return the value of the PKRU register.  */
+static inline unsigned int
+pkey_read (void)
+{
+  unsigned int result;
+  __asm__ volatile (".byte 0x0f, 0x01, 0xee"
+                    : "=a" (result) : "c" (0) : "rdx");
+  return result;
+}
+
+/* Overwrite the PKRU register with VALUE.  */
+static inline void
+pkey_write (unsigned int value)
+{
+  __asm__ volatile (".byte 0x0f, 0x01, 0xef"
+                    : : "a" (value), "c" (0), "d" (0));
+}
+
+#endif /* _ARCH_PKEY_H */
diff --git a/sysdeps/unix/sysv/linux/x86_64/pkey_get.c b/sysdeps/unix/sysv/linux/x86_64/pkey_get.c
new file mode 100644
index 0000000000..3a9bfbe676
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/pkey_get.c
@@ -0,0 +1,33 @@
+/* Reading the per-thread memory protection key, x86_64 version.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <arch-pkey.h>
+#include <errno.h>
+
+int
+pkey_get (int key)
+{
+  if (key < 0 || key > 15)
+    {
+      __set_errno (EINVAL);
+      return -1;
+    }
+  unsigned int pkru = pkey_read ();
+  return (pkru >> (2 * key)) & 3;
+  return 0;
+}
diff --git a/sysdeps/unix/sysv/linux/x86_64/pkey_set.c b/sysdeps/unix/sysv/linux/x86_64/pkey_set.c
new file mode 100644
index 0000000000..91dffd22c3
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/pkey_set.c
@@ -0,0 +1,35 @@
+/* Changing the per-thread memory protection key, x86_64 version.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <arch-pkey.h>
+#include <errno.h>
+
+int
+pkey_set (int key, unsigned int rights)
+{
+  if (key < 0 || key > 15 || rights > 3)
+    {
+      __set_errno (EINVAL);
+      return -1;
+    }
+  unsigned int mask = 3 << (2 * key);
+  unsigned int pkru = pkey_read ();
+  pkru = (pkru & ~mask) | (rights << (2 * key));
+  pkey_write (pkru);
+  return 0;
+}
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index 23f6a91429..1d3d598618 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2121,3 +2121,8 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F


* MPK: pkey_free and key reuse
@ 2017-11-05 10:35 ` Florian Weimer
  0 siblings, 0 replies; 57+ messages in thread
From: Florian Weimer @ 2017-11-05 10:35 UTC (permalink / raw)
  To: Dave Hansen, linux-x86_64, linux-arch; +Cc: linux-mm, Linux API

[-- Attachment #1: Type: text/plain, Size: 2962 bytes --]

I'm working on adding memory protection key support to glibc.

I don't think pkey_free, as it is implemented today, is very safe due to 
key reuse by a subsequent pkey_alloc.  I see two problems:

(A) pkey_free allows reuse of the key while there are still mappings 
that use it.

(B) If a key is reused, existing threads retain their access rights, 
while there is an expectation that pkey_alloc denies access for the 
threads except the current one.

Issue (A) could be fixed by having pkey_free mark the key for reuse, 
and only actually reuse it if all those mappings are gone.  This could 
have a significant performance cost, but pkey_free is supposed to be rare.
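
The deferred reuse suggested for issue (A) can be sketched as a small user-space model.  This is only an illustration, not kernel code: struct key_model and the model_* functions are hypothetical names, and the per-key mapping counter stands in for whatever bookkeeping the kernel would actually need.

```c
#include <assert.h>
#include <stdbool.h>

/* x86 PKRU has room for 16 keys; key 0 is the default key.  */
enum { MODEL_KEYS = 16 };

/* Hypothetical bookkeeping, not an existing kernel structure.  */
struct key_model
{
  bool allocated[MODEL_KEYS];
  bool free_pending[MODEL_KEYS];    /* Freed, but mappings remain.  */
  unsigned int mappings[MODEL_KEYS];
};

/* Return the lowest key that is neither allocated nor waiting for
   its mappings to go away, or -1 if there is none.  */
static int
model_alloc (struct key_model *m)
{
  for (int key = 1; key < MODEL_KEYS; ++key)
    if (!m->allocated[key] && !m->free_pending[key])
      {
        m->allocated[key] = true;
        return key;
      }
  return -1;
}

/* Release KEY.  If mappings still use it, only mark it for reuse.  */
static int
model_free (struct key_model *m, int key)
{
  if (key < 1 || key >= MODEL_KEYS || !m->allocated[key])
    return -1;
  m->allocated[key] = false;
  m->free_pending[key] = m->mappings[key] > 0;
  return 0;
}

/* Record that one more mapping uses KEY.  */
static void
model_map (struct key_model *m, int key)
{
  assert (key >= 0 && key < MODEL_KEYS);
  ++m->mappings[key];
}

/* Drop one mapping; the last one makes a pending key reusable.  */
static void
model_unmap (struct key_model *m, int key)
{
  assert (key >= 0 && key < MODEL_KEYS);
  if (m->mappings[key] > 0 && --m->mappings[key] == 0)
    m->free_pending[key] = false;
}
```

In this model, a key freed while a mapping still carries it is skipped by model_alloc until model_unmap drops the last mapping.  The cost mentioned above would come from maintaining the counter on every mapping change.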

Issue (B) is much harder to fix.  There is no atomic way to change 
access for a single key, so there is always a race condition due to the 
read-modify-write cycle for the PKRU update in user space.  This means 
that even if the kernel iterated over all threads to revoke access on 
pkey_free, there is a chance that the race reinstantiates the old access 
rights.
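
The lost update behind issue (B) can be replayed deterministically.  The sketch below is a simulation with an ordinary variable standing in for one thread's PKRU; the kernel-side revocation step is hypothetical (no such mechanism exists), and the enum constants merely mirror the values of PKEY_DISABLE_ACCESS and PKEY_DISABLE_WRITE.

```c
#include <assert.h>

/* Two rights bits per key, same values as PKEY_DISABLE_ACCESS and
   PKEY_DISABLE_WRITE.  The names are local to this sketch.  */
enum { DISABLE_ACCESS = 1, DISABLE_WRITE = 2 };

/* Pure form of the read-modify-write step: return PKRU with the two
   bits for KEY replaced by RIGHTS.  */
static unsigned int
pkru_with_rights (unsigned int pkru, int key, unsigned int rights)
{
  assert (key >= 0 && key <= 15 && rights <= 3);
  unsigned int mask = 3u << (2 * key);
  return (pkru & ~mask) | (rights << (2 * key));
}

/* Extract the two rights bits for KEY from PKRU.  */
static unsigned int
pkru_rights (unsigned int pkru, int key)
{
  return (pkru >> (2 * key)) & 3;
}

/* Replay one unlucky interleaving and return the thread's final
   simulated PKRU value.  */
static unsigned int
replay_lost_revocation (void)
{
  /* The thread starts with full access through keys 1 and 2.  */
  unsigned int thread_pkru = 0;

  /* 1. The thread begins updating key 2 and reads PKRU.  */
  unsigned int stale = thread_pkru;

  /* 2. Hypothetical: the kernel revokes access through key 1 in
     this thread's register state because the key was freed.  */
  thread_pkru = pkru_with_rights (thread_pkru, 1, DISABLE_ACCESS);

  /* 3. The thread resumes and writes back a value computed from
     the stale read, which undoes step 2.  */
  thread_pkru = pkru_with_rights (stale, 2, DISABLE_WRITE);
  return thread_pkru;
}
```

After the replay, the bits for key 1 are zero again, so the thread has regained the access that was revoked in step 2.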

One way to deal with this is to give up and just remove pkey_free from 
the API (i.e., we wouldn't provide it in glibc).  A slightly less 
drastic way could add two pkey_alloc flags, a flag to disable pkey_free 
for the new key (which would mainly serve as a documentation of intent), 
and another flag which requests a pristine key which has never been used 
before.  With the second flag, and assuming correct key management, 
libraries would have some confidence that other threads in the process 
would not implicitly gain access to the new key (although there is the 
init_pkru= boot flag, which overrides the thread default, so it doesn't 
look like the assumption is actually valid).
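
What a "pristine key" request could mean is sketched below as a user-space allocator model.  It assumes nothing about how the kernel would implement such a flag: struct key_pool, pool_alloc, pool_free and the boolean pristine argument are hypothetical stand-ins for the proposed pkey_alloc flag.

```c
#include <assert.h>
#include <stdbool.h>

enum { POOL_KEYS = 16 };

/* Hypothetical allocator state, one bit per key.  */
struct key_pool
{
  unsigned int in_use;      /* Bit N set: key N is allocated now.  */
  unsigned int ever_used;   /* Bit N set: key N was handed out before.  */
};

/* Allocate the lowest available key.  If PRISTINE is true, skip
   every key that has been handed out before, even if it is free.
   Return -1 if no suitable key is left.  */
static int
pool_alloc (struct key_pool *pool, bool pristine)
{
  for (int key = 1; key < POOL_KEYS; ++key)
    {
      unsigned int bit = 1u << key;
      if (pool->in_use & bit)
        continue;
      if (pristine && (pool->ever_used & bit))
        continue;
      pool->in_use |= bit;
      pool->ever_used |= bit;
      return key;
    }
  return -1;
}

/* Make KEY available again for non-pristine requests.  */
static void
pool_free (struct key_pool *pool, int key)
{
  assert (key >= 1 && key < POOL_KEYS);
  pool->in_use &= ~(1u << key);
}
```

With only 15 allocatable keys in this model, pristine requests run out quickly, so such a flag would only help processes that allocate few keys over their lifetime.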

All this is of course a bit on thin ice anyway because code could just 
clear the PKRU register at any time.

I'm attaching my glibc patch for reference.  The interesting bits are 
probably the test case (and how it creates and joins threads) and the 
pkey_set/pkey_get functions.  The support/ subdirectory is just our 
testing framework which is still very young—I needed a few more 
functions for debugging, which is why they are in this patch.

Key reuse is not the only problem, we also have an issue with siglongjmp:

   https://sourceware.org/bugzilla/show_bug.cgi?id=22396

I've started wondering whether it even makes sense to expose this 
interface for general use.  I don't think any other architecture will 
implement something like this in the same way (with a PKRU register 
which can simply be cleared, and keys which are easily guessed and 
reused).  I suspect the only use for this functionality is in-memory 
databases which use DAX mappings for persistence, and want to reduce 
risk of persistent data corruption due to random pointer writes.  (And 
maybe execute-only memory, but that's not really benefiting anyone anyway.)

Thanks,
Florian

PS: The manpages need fixing.  Right now, they are misleading.

[-- Attachment #2: glibc-pkey.patch --]
[-- Type: text/x-patch, Size: 54571 bytes --]


This adds system call wrappers for pkey_alloc, pkey_free, pkey_mprotect,
and x86-64 implementations of pkey_get and pkey_set, which abstract over
the PKRU CPU register and hide the actual number of memory protection
keys supported by the CPU.

The system call wrappers use unsigned int instead of unsigned long for
parameters, so that no special treatment for x32 is needed.  The flags
argument is currently unused, and the access rights bit mask is limited
to two bits by the current PKRU register layout anyway.

2017-11-04  Florian Weimer  <fweimer@redhat.com>

	Linux: Implement interfaces for memory protection keys
	* support/Makefile (libsupport-routines): Add
	support_test_compare_failure, xraise, xsigaction, xsignal,
	xsysconf.
	* support/check.h (TEST_COMPARE): New macro.
	(support_test_compare_failure): Declare.
	* support/xsignal.h (xraise, xsignal, xsigaction): Declare.
	* support/xunistd.h (xsysconf): Declare.
	* support/support_test_compare_failure.c: New file.
	* support/xraise.c: Likewise.
	* support/xsigaction.c: Likewise.
	* support/xsignal.c: Likewise.
	* support/xsysconf.c: Likewise.
	* sysdeps/unix/sysv/linux/Makefile [misc] (routines): Add
	pkey_set, pkey_get.
	[misc] (tests): Add tst-pkey.
	(tst-pkey): Link with -lpthread.
	* sysdeps/unix/sysv/linux/Versions (GLIBC_2.27): Add pkey_alloc,
	pkey_free, pkey_set, pkey_get, pkey_mprotect.
	* sysdeps/unix/sysv/linux/bits/mman-linux.h (PKEY_DISABLE_ACCESS)
	(PKEY_DISABLE_WRITE): Define.
	(pkey_alloc, pkey_free, pkey_set, pkey_get, pkey_mprotect):
	Declare.
	* sysdeps/unix/sysv/linux/bits/siginfo-consts.h (SEGV_BNDERR)
	(SEGV_PKUERR): Add.
	* sysdeps/unix/sysv/linux/pkey_get.c: New file.
	* sysdeps/unix/sysv/linux/pkey_set.c: Likewise.
	* sysdeps/unix/sysv/linux/syscalls.list (pkey_alloc, pkey_free)
	(pkey_mprotect): Add.
	* sysdeps/unix/sysv/linux/tst-pkey.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/arch-pkey.h: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/pkey_get.c: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/pkey_set.c: Likewise.
	* sysdeps/unix/sysv/linux/**.abilist: Update.

diff --git a/NEWS b/NEWS
index 933085417c..0652012a09 100644
--- a/NEWS
+++ b/NEWS
@@ -38,6 +38,10 @@ Major new features:
 * glibc now provides the <sys/memfd.h> header file and the memfd_create
   system call.
 
+* Support for memory protection keys was added.  The <sys/mman.h> header now
+  declares the functions pkey_alloc, pkey_free, pkey_mprotect, pkey_set,
+  pkey_get.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
 * On GNU/Linux, the obsolete Linux constant PTRACE_SEIZE_DEVEL is no longer
diff --git a/support/Makefile b/support/Makefile
index dafb1737a4..50d4269e24 100644
--- a/support/Makefile
+++ b/support/Makefile
@@ -52,9 +52,10 @@ libsupport-routines = \
   support_record_failure \
   support_run_diff \
   support_shared_allocate \
-  support_write_file_string \
+  support_test_compare_failure \
   support_test_main \
   support_test_verify_impl \
+  support_write_file_string \
   temp_file \
   write_message \
   xaccept \
@@ -84,8 +85,8 @@ libsupport-routines = \
   xpthread_attr_destroy \
   xpthread_attr_init \
   xpthread_attr_setdetachstate \
-  xpthread_attr_setstacksize \
   xpthread_attr_setguardsize \
+  xpthread_attr_setstacksize \
   xpthread_barrier_destroy \
   xpthread_barrier_init \
   xpthread_barrier_wait \
@@ -116,14 +117,18 @@ libsupport-routines = \
   xpthread_sigmask \
   xpthread_spin_lock \
   xpthread_spin_unlock \
+  xraise \
   xreadlink \
   xrealloc \
   xrecvfrom \
   xsendto \
   xsetsockopt \
+  xsigaction \
+  xsignal \
   xsocket \
   xstrdup \
   xstrndup \
+  xsysconf \
   xunlink \
   xwaitpid \
   xwrite \
diff --git a/support/check.h b/support/check.h
index bdcd12952a..29b709c2b0 100644
--- a/support/check.h
+++ b/support/check.h
@@ -86,6 +86,35 @@ void support_test_verify_exit_impl (int status, const char *file, int line,
    does not support reporting failures from a DSO.  */
 void support_record_failure (void);
 
+/* Compare the two numbers LEFT and RIGHT and report failure if they
+   are different.  */
+#define TEST_COMPARE(left, right)                                       \
+  ({                                                                    \
+    __typeof__ (left) __left_value = (left);                            \
+    __typeof__ (right) __right_value = (right);                         \
+    _Static_assert (sizeof (__left_value) <= sizeof (long long),        \
+                    "left value fits into long long");                  \
+    _Static_assert (sizeof (__right_value) <= sizeof (long long),       \
+                    "right value fits into long long");                 \
+    if (__left_value != __right_value                                   \
+        || ((__left_value > 0) != (__right_value > 0)))                 \
+      support_test_compare_failure                                      \
+        (__FILE__, __LINE__,                                            \
+         #left, __left_value, __left_value > 0,                         \
+         #right, __right_value, __right_value > 0);                     \
+  })
+
+/* Internal implementation of TEST_COMPARE.  LEFT_POSITIVE and
+   RIGHT_POSITIVE are used to fit both unsigned long long and long
+   long arguments into LEFT_VALUE and RIGHT_VALUE.  */
+void support_test_compare_failure (const char *file, int line,
+                                   const char *left_expr,
+                                   long long left_value,
+                                   int left_positive,
+                                   const char *right_expr,
+                                   long long right_value,
+                                   int right_positive);
+
 /* Internal function called by the test driver.  */
 int support_report_failure (int status)
   __attribute__ ((weak, warn_unused_result));
diff --git a/support/support_test_compare_failure.c b/support/support_test_compare_failure.c
new file mode 100644
index 0000000000..38fec1ca89
--- /dev/null
+++ b/support/support_test_compare_failure.c
@@ -0,0 +1,46 @@
+/* Reporting numeric comparison failure.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdio.h>
+#include <support/check.h>
+
+static void
+report (const char *which, const char *expr, long long value, int positive)
+{
+  printf ("  %s: ", which);
+  if (positive)
+    printf ("%llu", (unsigned long long) value);
+  else
+    printf ("%lld", value);
+  printf (" (0x%llx); from: %s\n", (unsigned long long) value, expr);
+}
+
+void
+support_test_compare_failure (const char *file, int line,
+                              const char *left_expr,
+                              long long left_value,
+                              int left_positive,
+                              const char *right_expr,
+                              long long right_value,
+                              int right_positive)
+{
+  support_record_failure ();
+  printf ("%s:%d: numeric comparison failure\n", file, line);
+  report (" left", left_expr, left_value, left_positive);
+  report ("right", right_expr, right_value, right_positive);
+}
diff --git a/support/xraise.c b/support/xraise.c
new file mode 100644
index 0000000000..9126c6c3ea
--- /dev/null
+++ b/support/xraise.c
@@ -0,0 +1,27 @@
+/* Error-checking wrapper for raise.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <support/check.h>
+#include <support/xsignal.h>
+
+void
+xraise (int sig)
+{
+  if (raise (sig) != 0)
+    FAIL_EXIT1 ("raise (%d): %m", sig);
+}
diff --git a/support/xsigaction.c b/support/xsigaction.c
new file mode 100644
index 0000000000..b74c69afae
--- /dev/null
+++ b/support/xsigaction.c
@@ -0,0 +1,27 @@
+/* Error-checking wrapper for sigaction.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <support/check.h>
+#include <support/xsignal.h>
+
+void
+xsigaction (int sig, const struct sigaction *newact, struct sigaction *oldact)
+{
+  if (sigaction (sig, newact, oldact) != 0)
+    FAIL_EXIT1 ("sigaction (%d): %m", sig);
+}
diff --git a/support/xsignal.c b/support/xsignal.c
new file mode 100644
index 0000000000..22a1dd74a7
--- /dev/null
+++ b/support/xsignal.c
@@ -0,0 +1,29 @@
+/* Error-checking wrapper for signal.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <support/check.h>
+#include <support/xsignal.h>
+
+sighandler_t
+xsignal (int sig, sighandler_t handler)
+{
+  sighandler_t result = signal (sig, handler);
+  if (result == SIG_ERR)
+    FAIL_EXIT1 ("signal (%d, %p): %m", sig, handler);
+  return result;
+}
diff --git a/support/xsignal.h b/support/xsignal.h
index 3dc0d9d5ce..3087ed0082 100644
--- a/support/xsignal.h
+++ b/support/xsignal.h
@@ -24,6 +24,14 @@
 
 __BEGIN_DECLS
 
+/* The following functions call the corresponding libc functions and
+   terminate the process on error.  */
+
+void xraise (int sig);
+sighandler_t xsignal (int sig, sighandler_t handler);
+void xsigaction (int sig, const struct sigaction *newact,
+                 struct sigaction *oldact);
+
 /* The following functions call the corresponding libpthread functions
    and terminate the process on error.  */
 
diff --git a/support/xsysconf.c b/support/xsysconf.c
new file mode 100644
index 0000000000..15ab1e26c4
--- /dev/null
+++ b/support/xsysconf.c
@@ -0,0 +1,36 @@
+/* Error-checking wrapper for sysconf.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+#include <support/check.h>
+#include <support/xunistd.h>
+
+long
+xsysconf (int name)
+{
+  /* Detect errors by a changed errno value, in case -1 is a valid
+     value.  Make sure that the caller does not see the zero value for
+     errno.  */
+  int old_errno = errno;
+  errno = 0;
+  long result = sysconf (name);
+  if (errno != 0)
+    FAIL_EXIT1 ("sysconf (%d): %m", name);
+  errno = old_errno;
+  return result;
+}
diff --git a/support/xunistd.h b/support/xunistd.h
index 05c2626a7b..00376f7aae 100644
--- a/support/xunistd.h
+++ b/support/xunistd.h
@@ -39,6 +39,7 @@ void xstat (const char *path, struct stat64 *);
 void xmkdir (const char *path, mode_t);
 void xchroot (const char *path);
 void xunlink (const char *path);
+long xsysconf (int name);
 
 /* Read the link at PATH.  The caller should free the returned string
    with free.  */
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index 53e41510e3..095cf93892 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -18,7 +18,7 @@ sysdep_routines += clone umount umount2 readahead \
 		   setfsuid setfsgid epoll_pwait signalfd \
 		   eventfd eventfd_read eventfd_write prlimit \
 		   personality epoll_wait tee vmsplice splice \
-		   open_by_handle_at
+		   open_by_handle_at pkey_set pkey_get
 
 CFLAGS-gethostid.c = -fexceptions
 CFLAGS-tee.c = -fexceptions -fasynchronous-unwind-tables
@@ -44,7 +44,7 @@ sysdep_headers += sys/mount.h sys/acct.h sys/sysctl.h \
 
 tests += tst-clone tst-clone2 tst-clone3 tst-fanotify tst-personality \
 	 tst-quota tst-sync_file_range test-errno-linux tst-sysconf-iov_max \
-	 tst-memfd_create
+	 tst-memfd_create tst-pkey
 
 # Generate the list of SYS_* macros for the system calls (__NR_*
 # macros).  The file syscall-names.list contains all possible system
@@ -92,6 +92,8 @@ $(objpfx)tst-syscall-list.out: \
 # Separate object file for access to the constant from the UAPI header.
 $(objpfx)tst-sysconf-iov_max: $(objpfx)tst-sysconf-iov_max-uapi.o
 
+$(objpfx)tst-pkey: $(shared-thread-library)
+
 endif # $(subdir) == misc
 
 ifeq ($(subdir),time)
diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
index 992c19729f..798ffc7660 100644
--- a/sysdeps/unix/sysv/linux/Versions
+++ b/sysdeps/unix/sysv/linux/Versions
@@ -168,6 +168,7 @@ libc {
   }
   GLIBC_2.27 {
     memfd_create;
+    pkey_alloc; pkey_free; pkey_set; pkey_get; pkey_mprotect;
   }
   GLIBC_PRIVATE {
     # functions used in other libraries
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index 140ca28abc..85788be12b 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2107,6 +2107,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.27 strfromf128 F
 GLIBC_2.27 strtof128 F
 GLIBC_2.27 strtof128_l F
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index f698e1b2f4..3b463dacbe 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2018,6 +2018,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.27 strfromf128 F
 GLIBC_2.27 strtof128 F
 GLIBC_2.27 strtof128_l F
diff --git a/sysdeps/unix/sysv/linux/arm/libc.abilist b/sysdeps/unix/sysv/linux/arm/libc.abilist
index 8a8af3e3e4..a1315aef35 100644
--- a/sysdeps/unix/sysv/linux/arm/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/libc.abilist
@@ -108,6 +108,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.4 GLIBC_2.4 A
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
diff --git a/sysdeps/unix/sysv/linux/bits/mman-linux.h b/sysdeps/unix/sysv/linux/bits/mman-linux.h
index b091181960..da5ec79334 100644
--- a/sysdeps/unix/sysv/linux/bits/mman-linux.h
+++ b/sysdeps/unix/sysv/linux/bits/mman-linux.h
@@ -109,3 +109,38 @@
 # define MCL_ONFAULT	4		/* Lock all pages that are
 					   faulted in.  */
 #endif
+
+/* Memory protection key support.  */
+#ifdef __USE_GNU
+
+/* Access rights for pkey_alloc and pkey_set.  */
+# define PKEY_DISABLE_ACCESS 0x1
+# define PKEY_DISABLE_WRITE 0x2
+
+__BEGIN_DECLS
+
+/* Allocate a new protection key, with the PKEY_DISABLE_* bits
+   specified in ACCESS_RIGHTS.  The protection key mask for the
+   current thread is updated to match the access privilege for the new
+   key.  */
+int pkey_alloc (unsigned int __flags, unsigned int __access_rights) __THROW;
+
+/* Update the access rights for the current thread for KEY, which must
+   have been allocated using pkey_alloc.  */
+int pkey_set (int __key, unsigned int __access_rights) __THROW;
+
+/* Return the access rights for the current thread for KEY, which must
+   have been allocated using pkey_alloc.  */
+int pkey_get (int __key) __THROW;
+
+/* Free an allocated protection key, which must have been allocated
+   using pkey_alloc.  */
+int pkey_free (int __key) __THROW;
+
+/* Apply memory protection flags for KEY to the specified address
+   range.  */
+int pkey_mprotect (void *__addr, size_t __len, int __prot, int __pkey) __THROW;
+
+__END_DECLS
+
+#endif /* __USE_GNU */
diff --git a/sysdeps/unix/sysv/linux/bits/siginfo-consts.h b/sysdeps/unix/sysv/linux/bits/siginfo-consts.h
index 525840cea1..e86b933040 100644
--- a/sysdeps/unix/sysv/linux/bits/siginfo-consts.h
+++ b/sysdeps/unix/sysv/linux/bits/siginfo-consts.h
@@ -111,8 +111,12 @@ enum
 {
   SEGV_MAPERR = 1,		/* Address not mapped to object.  */
 #  define SEGV_MAPERR	SEGV_MAPERR
-  SEGV_ACCERR			/* Invalid permissions for mapped object.  */
+  SEGV_ACCERR,			/* Invalid permissions for mapped object.  */
 #  define SEGV_ACCERR	SEGV_ACCERR
+  SEGV_BNDERR,			/* Bounds checking failure.  */
+#  define SEGV_BNDERR	SEGV_BNDERR
+  SEGV_PKUERR			/* Protection key checking failure.  */
+#  define SEGV_PKUERR	SEGV_PKUERR
 };
 
 /* `si_code' values for SIGBUS signal.  */
diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
index 5b81a6cd7d..7397d728f2 100644
--- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
@@ -1872,6 +1872,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index 51ead9e867..cffdf251d6 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2037,6 +2037,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
index 78b4ee8d40..3292510a55 100644
--- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
@@ -1901,6 +1901,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
index d9c97779e4..636bbdd1a7 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
@@ -109,6 +109,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.4 GLIBC_2.4 A
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0x98
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
index 4acbf7eeed..6952863f86 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
@@ -1986,6 +1986,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/microblaze/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/libc.abilist
index 93f02f08ce..ac5b56abab 100644
--- a/sysdeps/unix/sysv/linux/microblaze/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/libc.abilist
@@ -2107,3 +2107,8 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
index 795e85de70..bb0958e842 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
@@ -1961,6 +1961,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
index dc714057b7..9104eb4d6d 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
@@ -1959,6 +1959,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
index ce7bc9b175..58a5d5e141 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
@@ -1957,6 +1957,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.27 strfromf128 F
 GLIBC_2.27 strtof128 F
 GLIBC_2.27 strtof128_l F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
index 3fdd85eace..2efac14a7d 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
@@ -1952,6 +1952,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.27 strfromf128 F
 GLIBC_2.27 strtof128 F
 GLIBC_2.27 strtof128_l F
diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
index 3e0bcb2a5c..9ef29e4e98 100644
--- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
@@ -2148,3 +2148,8 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
diff --git a/sysdeps/unix/sysv/linux/pkey_get.c b/sysdeps/unix/sysv/linux/pkey_get.c
new file mode 100644
index 0000000000..fc3204c82f
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/pkey_get.c
@@ -0,0 +1,26 @@
+/* Obtaining the thread memory protection key, generic stub.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+
+int
+pkey_get (int key)
+{
+  __set_errno (ENOSYS);
+  return -1;
+}
diff --git a/sysdeps/unix/sysv/linux/pkey_set.c b/sysdeps/unix/sysv/linux/pkey_set.c
new file mode 100644
index 0000000000..f686c4373c
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/pkey_set.c
@@ -0,0 +1,26 @@
+/* Changing the thread memory protection key, generic stub.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+
+int
+pkey_set (int key, unsigned int access_rights)
+{
+  __set_errno (ENOSYS);
+  return -1;
+}
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
index 375c69d9d1..60c024096f 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
@@ -1990,6 +1990,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
index a88172a906..327933c973 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
@@ -1995,6 +1995,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc-le.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc-le.abilist
index fa026a332c..b04c31bc10 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc-le.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc-le.abilist
@@ -2202,3 +2202,8 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc.abilist
index 838f395d78..e0645e9e25 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc.abilist
@@ -109,6 +109,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 _Exit F
 GLIBC_2.3 _IO_2_1_stderr_ D 0xe0
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
index 41b79c496a..ef434c61a7 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
@@ -1990,6 +1990,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.27 strfromf128 F
 GLIBC_2.27 strtof128 F
 GLIBC_2.27 strtof128_l F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
index 68251a0e69..4114a4ce57 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
@@ -1891,6 +1891,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.27 strfromf128 F
 GLIBC_2.27 strtof128 F
 GLIBC_2.27 strtof128_l F
diff --git a/sysdeps/unix/sysv/linux/sh/libc.abilist b/sysdeps/unix/sysv/linux/sh/libc.abilist
index bc1aae275e..f4478b0cc5 100644
--- a/sysdeps/unix/sysv/linux/sh/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/libc.abilist
@@ -1876,6 +1876,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
index 93e6d092ac..136a57fc0e 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
@@ -1983,6 +1983,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.27 strfromf128 F
 GLIBC_2.27 strtof128 F
 GLIBC_2.27 strtof128_l F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
index b11d6764d4..9ad0790829 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
@@ -1920,6 +1920,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.27 strfromf128 F
 GLIBC_2.27 strtof128 F
 GLIBC_2.27 strtof128_l F
diff --git a/sysdeps/unix/sysv/linux/syscalls.list b/sysdeps/unix/sysv/linux/syscalls.list
index 40c4fbb9ea..6f657eea2e 100644
--- a/sysdeps/unix/sysv/linux/syscalls.list
+++ b/sysdeps/unix/sysv/linux/syscalls.list
@@ -110,3 +110,6 @@ setns		EXTRA	setns		i:ii	setns
 process_vm_readv EXTRA	process_vm_readv i:ipipii process_vm_readv
 process_vm_writev EXTRA	process_vm_writev i:ipipii process_vm_writev
 memfd_create    EXTRA	memfd_create	i:si    memfd_create
+pkey_alloc	EXTRA	pkey_alloc	i:ii	pkey_alloc
+pkey_free	EXTRA	pkey_free	i:i	pkey_free
+pkey_mprotect	EXTRA	pkey_mprotect	i:aiii  pkey_mprotect
diff --git a/sysdeps/unix/sysv/linux/tile/tilegx/tilegx32/libc.abilist b/sysdeps/unix/sysv/linux/tile/tilegx/tilegx32/libc.abilist
index e9eb4ff7bd..d4f2094027 100644
--- a/sysdeps/unix/sysv/linux/tile/tilegx/tilegx32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/tile/tilegx/tilegx32/libc.abilist
@@ -2114,3 +2114,8 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
diff --git a/sysdeps/unix/sysv/linux/tile/tilegx/tilegx64/libc.abilist b/sysdeps/unix/sysv/linux/tile/tilegx/tilegx64/libc.abilist
index 8f08e909cd..4916dbabb5 100644
--- a/sysdeps/unix/sysv/linux/tile/tilegx/tilegx64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/tile/tilegx/tilegx64/libc.abilist
@@ -2114,3 +2114,8 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
diff --git a/sysdeps/unix/sysv/linux/tile/tilepro/libc.abilist b/sysdeps/unix/sysv/linux/tile/tilepro/libc.abilist
index e9eb4ff7bd..d4f2094027 100644
--- a/sysdeps/unix/sysv/linux/tile/tilepro/libc.abilist
+++ b/sysdeps/unix/sysv/linux/tile/tilepro/libc.abilist
@@ -2114,3 +2114,8 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
diff --git a/sysdeps/unix/sysv/linux/tst-pkey.c b/sysdeps/unix/sysv/linux/tst-pkey.c
new file mode 100644
index 0000000000..42d50e37c2
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-pkey.c
@@ -0,0 +1,390 @@
+/* Tests for memory protection keys.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+#include <inttypes.h>
+#include <setjmp.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <support/check.h>
+#include <support/support.h>
+#include <support/test-driver.h>
+#include <support/xsignal.h>
+#include <support/xthread.h>
+#include <support/xunistd.h>
+#include <sys/mman.h>
+
+/* Used to force threads to wait until the main thread has set up the
+   keys as intended.  */
+static pthread_barrier_t barrier;
+
+/* The keys used for testing.  These have been allocated with access
+   rights set based on their array index.  */
+enum { key_count = 4 };
+static int keys[key_count];
+static volatile int *pages[key_count];
+
+/* Used to report results from the signal handler.  */
+static volatile void *sigsegv_addr;
+static volatile int sigsegv_code;
+static volatile int sigsegv_pkey;
+static sigjmp_buf sigsegv_jmp;
+
+/* Used to handle expected read or write faults.  */
+static void
+sigsegv_handler (int signum, siginfo_t *info, void *context)
+{
+  sigsegv_addr = info->si_addr;
+  sigsegv_code = info->si_code;
+  sigsegv_pkey = info->si_pkey;
+  siglongjmp (sigsegv_jmp, 2);
+}
+
+static const struct sigaction sigsegv_sigaction =
+  {
+    .sa_flags = SA_RESETHAND | SA_SIGINFO,
+    .sa_sigaction = &sigsegv_handler,
+  };
+
+/* Check if PAGE is readable (if !WRITE) or writable (if WRITE).  */
+static bool
+check_page_access (int page, bool write)
+{
+  /* This is needed to work around bug 22396: On x86-64, siglongjmp
+     does not restore the protection key access rights for the current
+     thread.  We restore only the access rights for the keys under
+     test.  (This is not a general solution to this problem, but it
+     allows testing to proceed after a fault.)  */
+  unsigned saved_rights[key_count];
+  for (int i = 0; i < key_count; ++i)
+    saved_rights[i] = pkey_get (keys[i]);
+
+  volatile int *addr = pages[page];
+  if (test_verbose > 0)
+    {
+      printf ("info: checking access at %p (page %d) for %s\n",
+              addr, page, write ? "writing" : "reading");
+    }
+  int result = sigsetjmp (sigsegv_jmp, 1);
+  if (result == 0)
+    {
+      xsigaction (SIGSEGV, &sigsegv_sigaction, NULL);
+      if (write)
+        *addr = 3;
+      else
+        (void) *addr;
+      xsignal (SIGSEGV, SIG_DFL);
+      if (test_verbose > 0)
+        puts ("  --> access allowed");
+      return true;
+    }
+  else
+    {
+      xsignal (SIGSEGV, SIG_DFL);
+      if (test_verbose > 0)
+        puts ("  --> access denied");
+      TEST_COMPARE (result, 2);
+      TEST_COMPARE ((uintptr_t) sigsegv_addr, (uintptr_t) addr);
+      TEST_COMPARE (sigsegv_code, SEGV_PKUERR);
+      TEST_COMPARE (sigsegv_pkey, keys[page]);
+      for (int i = 0; i < key_count; ++i)
+        TEST_COMPARE (pkey_set (keys[i], saved_rights[i]), 0);
+      return false;
+    }
+}
+
+static volatile sig_atomic_t sigusr1_handler_ran;
+
+/* Used to check that access is revoked in signal handlers.  */
+static void
+sigusr1_handler (int signum)
+{
+  TEST_COMPARE (signum, SIGUSR1);
+  for (int i = 0; i < key_count; ++i)
+    TEST_COMPARE (pkey_get (keys[i]), PKEY_DISABLE_ACCESS);
+  sigusr1_handler_ran = 1;
+}
+
+/* Used to report results from other threads.  */
+struct thread_result
+{
+  int access_rights[key_count];
+  pthread_t next_thread;
+};
+
+/* Return the thread's access rights for the keys under test.  */
+static void *
+get_thread_func (void *closure)
+{
+  struct thread_result *result = xmalloc (sizeof (*result));
+  for (int i = 0; i < key_count; ++i)
+    result->access_rights[i] = pkey_get (keys[i]);
+  memset (&result->next_thread, 0, sizeof (result->next_thread));
+  return result;
+}
+
+/* Wait for initialization and then check that the current thread does
+   not have access through the keys under test.  */
+static void *
+delayed_thread_func (void *closure)
+{
+  bool check_access = *(bool *) closure;
+  pthread_barrier_wait (&barrier);
+  struct thread_result *result = get_thread_func (NULL);
+
+  if (check_access)
+    {
+      /* Also check directly.  This code should not run with other
+         threads in parallel because of the SIGSEGV handler which is
+         installed by check_page_access.  */
+      for (int i = 0; i < key_count; ++i)
+        {
+          TEST_VERIFY (!check_page_access (i, false));
+          TEST_VERIFY (!check_page_access (i, true));
+        }
+    }
+
+  result->next_thread = xpthread_create (NULL, get_thread_func, NULL);
+  return result;
+}
+
+static int
+do_test (void)
+{
+  long pagesize = xsysconf (_SC_PAGESIZE);
+
+  xpthread_barrier_init (&barrier, NULL, 2);
+  bool delayed_thread_check_access = true;
+  pthread_t delayed_thread = xpthread_create
+    (NULL, &delayed_thread_func, &delayed_thread_check_access);
+
+  keys[0] = pkey_alloc (0, 0);
+  if (keys[0] < 0)
+    {
+      if (errno == ENOSYS)
+        {
+          puts ("warning: kernel does not support memory protection keys");
+          return EXIT_UNSUPPORTED;
+        }
+      if (errno == ENOSPC)
+        {
+          puts ("warning: CPU does not support memory protection keys");
+          return EXIT_UNSUPPORTED;
+        }
+      FAIL_EXIT1 ("pkey_alloc: %m");
+    }
+  TEST_COMPARE (pkey_get (keys[0]), 0);
+  for (int i = 1; i < key_count; ++i)
+    {
+      keys[i] = pkey_alloc (0, i);
+      if (keys[i] < 0)
+        FAIL_EXIT1 ("pkey_alloc (0, %d): %m", i);
+      /* pkey_alloc is supposed to change the current thread's access
+         rights for the new key.  */
+      TEST_COMPARE (pkey_get (keys[i]), i);
+    }
+  /* Check that all the keys have the expected access rights for the
+     current thread.  */
+  for (int i = 0; i < key_count; ++i)
+    TEST_COMPARE (pkey_get (keys[i]), i);
+
+  /* Allocate a test page for each key.  */
+  for (int i = 0; i < key_count; ++i)
+    {
+      pages[i] = xmmap (NULL, pagesize, PROT_READ | PROT_WRITE,
+                        MAP_ANONYMOUS | MAP_PRIVATE, -1);
+      TEST_COMPARE (pkey_mprotect ((void *) pages[i], pagesize,
+                                   PROT_READ | PROT_WRITE, keys[i]), 0);
+    }
+
+  /* Check that the thread created before the keys were allocated
+     does not have access to the new keys.  */
+  {
+    pthread_barrier_wait (&barrier);
+    struct thread_result *result = xpthread_join (delayed_thread);
+    for (int i = 0; i < key_count; ++i)
+      TEST_COMPARE (result->access_rights[i],
+                    PKEY_DISABLE_ACCESS);
+    struct thread_result *result2 = xpthread_join (result->next_thread);
+    for (int i = 0; i < key_count; ++i)
+      TEST_COMPARE (result2->access_rights[i],
+                    PKEY_DISABLE_ACCESS);
+    free (result);
+    free (result2);
+  }
+
+  /* Check that the current thread access rights are inherited by new
+     threads.  */
+  {
+    pthread_t get_thread = xpthread_create (NULL, get_thread_func, NULL);
+    struct thread_result *result = xpthread_join (get_thread);
+    for (int i = 0; i < key_count; ++i)
+      TEST_COMPARE (result->access_rights[i], i);
+    free (result);
+  }
+
+  for (int i = 0; i < key_count; ++i)
+    TEST_COMPARE (pkey_get (keys[i]), i);
+
+  /* Check that in a signal handler, there is no access.  */
+  xsignal (SIGUSR1, &sigusr1_handler);
+  xraise (SIGUSR1);
+  xsignal (SIGUSR1, SIG_DFL);
+  TEST_COMPARE (sigusr1_handler_ran, 1);
+
+  /* The first key results in a writable page.  */
+  TEST_VERIFY (check_page_access (0, false));
+  TEST_VERIFY (check_page_access (0, true));
+
+  /* The other keys do not.  */
+  for (int i = 1; i < key_count; ++i)
+    {
+      if (test_verbose)
+        printf ("info: checking access for key %d, bits 0x%x\n",
+                i, pkey_get (keys[i]));
+      for (int j = 0; j < key_count; ++j)
+        TEST_COMPARE (pkey_get (keys[j]), j);
+      if (i & PKEY_DISABLE_ACCESS)
+        {
+          TEST_VERIFY (!check_page_access (i, false));
+          TEST_VERIFY (!check_page_access (i, true));
+        }
+      else
+        {
+          TEST_VERIFY (i & PKEY_DISABLE_WRITE);
+          TEST_VERIFY (check_page_access (i, false));
+          TEST_VERIFY (!check_page_access (i, true));
+        }
+    }
+
+  /* But if we set the current thread's access rights, we gain
+     access.  */
+  for (int do_write = 0; do_write < 2; ++do_write)
+    for (int allowed_key = 0; allowed_key < key_count; ++allowed_key)
+      {
+        for (int i = 0; i < key_count; ++i)
+          if (i == allowed_key)
+            {
+              if (do_write)
+                TEST_COMPARE (pkey_set (keys[i], 0), 0);
+              else
+                TEST_COMPARE (pkey_set (keys[i], PKEY_DISABLE_WRITE), 0);
+            }
+          else
+            TEST_COMPARE (pkey_set (keys[i], PKEY_DISABLE_ACCESS), 0);
+
+        if (test_verbose)
+          printf ("info: key %d is allowed access for %s\n",
+                  allowed_key, do_write ? "writing" : "reading");
+        for (int i = 0; i < key_count; ++i)
+          if (i == allowed_key)
+            {
+              TEST_VERIFY (check_page_access (i, false));
+              TEST_VERIFY (check_page_access (i, true) == do_write);
+            }
+          else
+            {
+              TEST_VERIFY (!check_page_access (i, false));
+              TEST_VERIFY (!check_page_access (i, true));
+            }
+      }
+
+  /* Restore access to all keys, and launch a thread which should
+     inherit that access.  */
+  for (int i = 0; i < key_count; ++i)
+    {
+      TEST_COMPARE (pkey_set (keys[i], 0), 0);
+      TEST_VERIFY (check_page_access (i, false));
+      TEST_VERIFY (check_page_access (i, true));
+    }
+  delayed_thread_check_access = false;
+  delayed_thread = xpthread_create
+    (NULL, delayed_thread_func, &delayed_thread_check_access);
+
+  TEST_COMPARE (pkey_free (keys[0]), 0);
+  /* Second pkey_free will fail because the key has already been
+     freed.  */
+  TEST_COMPARE (pkey_free (keys[0]), -1);
+  TEST_COMPARE (errno, EINVAL);
+  for (int i = 1; i < key_count; ++i)
+    TEST_COMPARE (pkey_free (keys[i]), 0);
+
+  /* Check what happens to running threads which have access to
+     previously allocated protection keys.  The implemented behavior
+     is somewhat dubious: Ideally, pkey_free should revoke access to
+     that key and pkey_alloc of the same (numeric) key should not
+     implicitly confer access to already-running threads, but this is
+     not what happens in practice.  */
+  {
+    /* The limit is in place to avoid running indefinitely in case
+       there are many keys available.  */
+    int *keys_array = xcalloc (100000, sizeof (*keys_array));
+    int keys_allocated = 0;
+    while (keys_allocated < 100000)
+      {
+        int new_key = pkey_alloc (0, PKEY_DISABLE_WRITE);
+        if (new_key < 0)
+          {
+            /* No key reuse observed before running out of keys.  */
+            TEST_COMPARE (errno, ENOSPC);
+            break;
+          }
+        for (int i = 0; i < key_count; ++i)
+          if (new_key == keys[i])
+            {
+              /* We allocated the key with disabled write access.
+                 This should affect the protection state of the
+                 existing page.  */
+              TEST_VERIFY (check_page_access (i, false));
+              TEST_VERIFY (!check_page_access (i, true));
+
+              xpthread_barrier_wait (&barrier);
+              struct thread_result *result = xpthread_join (delayed_thread);
+              /* The thread which was launched before should still have
+                 access to the key.  */
+              TEST_COMPARE (result->access_rights[i], 0);
+              struct thread_result *result2
+                = xpthread_join (result->next_thread);
+              /* Same for a thread which is launched afterwards from
+                 the old thread.  */
+              TEST_COMPARE (result2->access_rights[i], 0);
+              free (result);
+              free (result2);
+              keys_array[keys_allocated++] = new_key;
+              goto after_key_search;
+            }
+        /* Save key for later deallocation.  */
+        keys_array[keys_allocated++] = new_key;
+      }
+  after_key_search:
+    /* Deallocate the keys allocated for testing purposes.  */
+    for (int j = 0; j < keys_allocated; ++j)
+      TEST_COMPARE (pkey_free (keys_array[j]), 0);
+    free (keys_array);
+  }
+
+  for (int i = 0; i < key_count; ++i)
+    xmunmap ((void *) pages[i], pagesize);
+
+  xpthread_barrier_destroy (&barrier);
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index 0a4f7797ac..1ea74f9e8c 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -1878,6 +1878,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/x86_64/arch-pkey.h b/sysdeps/unix/sysv/linux/x86_64/arch-pkey.h
new file mode 100644
index 0000000000..8e9bfdae96
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/arch-pkey.h
@@ -0,0 +1,40 @@
+/* Helper functions for manipulating memory protection keys.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _ARCH_PKEY_H
+#define _ARCH_PKEY_H
+
+/* Return the value of the PKRU register.  */
+static inline unsigned int
+pkey_read (void)
+{
+  unsigned int result;
+  __asm__ volatile (".byte 0x0f, 0x01, 0xee"
+                    : "=a" (result) : "c" (0) : "rdx");
+  return result;
+}
+
+/* Overwrite the PKRU register with VALUE.  */
+static inline void
+pkey_write (unsigned int value)
+{
+  __asm__ volatile (".byte 0x0f, 0x01, 0xef"
+                    : : "a" (value), "c" (0), "d" (0));
+}
+
+#endif /* _ARCH_PKEY_H */
diff --git a/sysdeps/unix/sysv/linux/x86_64/pkey_get.c b/sysdeps/unix/sysv/linux/x86_64/pkey_get.c
new file mode 100644
index 0000000000..3a9bfbe676
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/pkey_get.c
@@ -0,0 +1,33 @@
+/* Reading the per-thread memory protection key, x86_64 version.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <arch-pkey.h>
+#include <errno.h>
+
+int
+pkey_get (int key)
+{
+  if (key < 0 || key > 15)
+    {
+      __set_errno (EINVAL);
+      return -1;
+    }
+  unsigned int pkru = pkey_read ();
+  return (pkru >> (2 * key)) & 3;
+  return 0;
+}
diff --git a/sysdeps/unix/sysv/linux/x86_64/pkey_set.c b/sysdeps/unix/sysv/linux/x86_64/pkey_set.c
new file mode 100644
index 0000000000..91dffd22c3
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/pkey_set.c
@@ -0,0 +1,35 @@
+/* Changing the per-thread memory protection key, x86_64 version.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <arch-pkey.h>
+#include <errno.h>
+
+int
+pkey_set (int key, unsigned int rights)
+{
+  if (key < 0 || key > 15 || rights > 3)
+    {
+      __set_errno (EINVAL);
+      return -1;
+    }
+  unsigned int mask = 3U << (2 * key);
+  unsigned int pkru = pkey_read ();
+  pkru = (pkru & ~mask) | (rights << (2 * key));
+  pkey_write (pkru);
+  return 0;
+}
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index 23f6a91429..1d3d598618 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2121,3 +2121,8 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F


* MPK: pkey_free and key reuse
@ 2017-11-05 10:35 ` Florian Weimer
  0 siblings, 0 replies; 57+ messages in thread
From: Florian Weimer @ 2017-11-05 10:35 UTC (permalink / raw)
  To: Dave Hansen, linux-x86_64, linux-arch; +Cc: linux-mm, Linux API

[-- Attachment #1: Type: text/plain, Size: 2962 bytes --]

I'm working on adding memory protection key support to glibc.

I don't think pkey_free, as it is implemented today, is very safe due to 
key reuse by a subsequent pkey_alloc.  I see two problems:

(A) pkey_free allows reuse of the key while there are still mappings
that use it.

(B) If a key is reused, existing threads retain their access rights, 
while there is an expectation that pkey_alloc denies access for the 
threads except the current one.

Issue (A) could be fixed by having pkey_free mark the key for reuse,
and only actually reuse it if all those mappings are gone.  This could 
have a significant performance cost, but pkey_free is supposed to be rare.
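
To make the idea for issue (A) concrete, here is a minimal user-space sketch of deferred reuse.  It is only a model: the sim_* names, the 16-key table, and the per-key mapping counter are assumptions made for illustration and do not correspond to any existing kernel or glibc interface.  Key 0 is skipped on the assumption that it is the default key.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a per-process key table.  */
enum { SIM_KEY_COUNT = 16 };

struct sim_key
{
  bool allocated;         /* Handed out by sim_alloc, not yet freed.  */
  unsigned int mappings;  /* Mappings currently tagged with this key.  */
};

struct sim_key_table
{
  struct sim_key keys[SIM_KEY_COUNT];
};

/* Return the lowest key that is neither allocated nor still
   referenced by a mapping, or -1 if there is none.  */
static int
sim_alloc (struct sim_key_table *table)
{
  for (int key = 1; key < SIM_KEY_COUNT; ++key)
    if (!table->keys[key].allocated && table->keys[key].mappings == 0)
      {
        table->keys[key].allocated = true;
        return key;
      }
  return -1;
}

/* Mark KEY as free.  The key only becomes eligible for reuse once
   its mapping count has dropped to zero.  */
static int
sim_free (struct sim_key_table *table, int key)
{
  if (key < 1 || key >= SIM_KEY_COUNT || !table->keys[key].allocated)
    return -1;
  table->keys[key].allocated = false;
  return 0;
}

/* Record that a mapping has been tagged with KEY.  */
static void
sim_map (struct sim_key_table *table, int key)
{
  assert (key >= 0 && key < SIM_KEY_COUNT);
  ++table->keys[key].mappings;
}

/* Record that a mapping tagged with KEY has gone away.  */
static void
sim_unmap (struct sim_key_table *table, int key)
{
  assert (key >= 0 && key < SIM_KEY_COUNT);
  assert (table->keys[key].mappings > 0);
  --table->keys[key].mappings;
}
```

In this model, freeing a key that still has a mapping makes the next sim_alloc skip it; the key is handed out again only after sim_unmap has brought its count back to zero.  The cost mentioned above shows up as the bookkeeping in sim_map and sim_unmap.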

Issue (B) is much harder to fix.  There is no atomic way to change 
access for a single key, so there is always a race condition due to the 
read-modify-write cycle for the PKRU update in user space.  This means 
that even if the kernel iterated over all threads to revoke access on 
pkey_free, there is a chance that the race reinstantiates the old access 
rights.
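
The lost update can be replayed deterministically with a stand-in for the register.  This is a sketch under stated assumptions: struct sim_thread and the sim_* functions are hypothetical, the kernel-side revocation is imagined rather than existing, and the only thing taken from the patch is the two-bits-per-key arithmetic used by pkey_set.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for one thread's PKRU register.  */
struct sim_thread
{
  uint32_t pkru;
};

/* First half of a user-space update: read the register.  */
static uint32_t
sim_read (const struct sim_thread *thread)
{
  return thread->pkru;
}

/* Second half: derive the new value from SNAPSHOT and write it.  */
static void
sim_write_back (struct sim_thread *thread, uint32_t snapshot,
                int key, unsigned int rights)
{
  uint32_t mask = UINT32_C (3) << (2 * key);
  thread->pkru = (snapshot & ~mask) | ((uint32_t) rights << (2 * key));
}

/* Hypothetical kernel-side revocation: set the access-disable bit
   for KEY in the thread's register.  */
static void
sim_kernel_revoke (struct sim_thread *thread, int key)
{
  thread->pkru |= UINT32_C (1) << (2 * key);
}

/* Replay the problematic interleaving and return the final register
   value.  The revocation lands between the read and the write.  */
static uint32_t
sim_lost_revocation (int updated_key, int revoked_key)
{
  assert (updated_key >= 0 && updated_key < 16);
  assert (revoked_key >= 0 && revoked_key < 16);
  assert (updated_key != revoked_key);
  struct sim_thread thread = { 0 };
  uint32_t snapshot = sim_read (&thread);
  sim_kernel_revoke (&thread, revoked_key);
  /* Rights value 2 disables writes for UPDATED_KEY.  */
  sim_write_back (&thread, snapshot, updated_key, 2);
  return thread.pkru;
}
```

With updated_key 1 and revoked_key 3, the stale snapshot overwrites the revocation: the bits for key 3 end up clear again even though the thread never touched that key on purpose.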

One way to deal with this is to give up and just remove pkey_free from 
the API (i.e., we wouldn't provide it in glibc).  A slightly less 
drastic way could add two pkey_alloc flags, a flag to disable pkey_free 
for the new key (which would mainly serve as a documentation of intent), 
and another flag which requests a pristine key which has never been used 
before.  With the second flag, and assuming correct key management, 
libraries would have some confidence that other threads in the process 
would not implicitly gain access to the new key (although there is the 
init_pkru= boot flag, which overrides the thread default, so it doesn't 
look like the assumption is actually valid).

All this is of course a bit on thin ice anyway because code could just 
clear the PKRU register at any time.

I'm attaching my glibc patch for reference.  The interesting bits are
probably the test case (and how it creates and joins threads) and the
pkey_set/pkey_get functions.  The support/ subdirectory is just our
testing framework, which is still very young; I needed a few more
functions for debugging, which is why they are in this patch.

Key reuse is not the only problem; we also have an issue with siglongjmp:

   https://sourceware.org/bugzilla/show_bug.cgi?id=22396

I've started wondering whether it even makes sense to expose this 
interface for general use.  I don't think any other architecture will 
implement something like this in the same way (with a PKRU register 
which can simply be cleared, and keys which are easily guessed and 
reused).  I suspect the only use for this functionality is in-memory 
databases which use DAX mappings for persistence, and want to reduce 
risk of persistent data corruption due to random pointer writes.  (And 
maybe execute-only memory, but that's not really benefiting anyone anyway.)

Thanks,
Florian

PS: The manpages need fixing.  Right now, they are misleading.

[-- Attachment #2: glibc-pkey.patch --]
[-- Type: text/x-patch, Size: 54571 bytes --]


This adds system call wrappers for pkey_alloc, pkey_free, pkey_mprotect,
and x86-64 implementations of pkey_get and pkey_set, which abstract over
the PKRU CPU register and hide the actual number of memory protection
keys supported by the CPU.

The system call wrappers use unsigned int instead of unsigned long for
parameters, so that no special treatment for x32 is needed.  The flags
argument is currently unused, and the access rights bit mask is limited
to two bits by the current PKRU register layout anyway.
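
As an illustration of that layout, the following sketch works on a plain 32-bit value instead of the real register.  The pkru_* helper names and the enum constants are assumptions made for this example; the arithmetic mirrors what pkey_get and pkey_set in the patch do after reading PKRU.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match PKEY_DISABLE_ACCESS and PKEY_DISABLE_WRITE.  */
enum { DISABLE_ACCESS = 0x1, DISABLE_WRITE = 0x2 };

/* A 32-bit PKRU image holds two rights bits for each of 16 keys.  */
enum { PKRU_KEY_COUNT = 16 };

/* Extract the two rights bits for KEY from a PKRU image.  */
static unsigned int
pkru_get_rights (uint32_t pkru, int key)
{
  assert (key >= 0 && key < PKRU_KEY_COUNT);
  return (pkru >> (2 * key)) & 3u;
}

/* Return a copy of PKRU with the rights for KEY replaced by RIGHTS.  */
static uint32_t
pkru_with_rights (uint32_t pkru, int key, unsigned int rights)
{
  assert (key >= 0 && key < PKRU_KEY_COUNT);
  assert (rights <= 3);
  uint32_t mask = UINT32_C (3) << (2 * key);
  return (pkru & ~mask) | ((uint32_t) rights << (2 * key));
}
```

For example, write-protecting key 1 in an all-zero image sets bit 3, and clearing the rights for key 15 in an all-ones image only touches the top two bits.  The mask is built from an unsigned constant so that the shift for key 15 stays well defined.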

2017-11-04  Florian Weimer  <fweimer@redhat.com>

	Linux: Implement interfaces for memory protection keys
	* support/Makefile (libsupport-routines): Add
	support_test_compare_failure, xraise, xsigaction, xsignal,
	xsysconf.
	* support/check.h (TEST_COMPARE): New macro.
	(support_test_compare_failure): Declare.
	* support/xsignal.h (xraise, xsignal, xsigaction): Declare.
	* support/xunistd.h (xsysconf): Declare.
	* support/support_test_compare_failure.c: New file.
	* support/xraise.c: Likewise.
	* support/xsigaction.c: Likewise.
	* support/xsignal.c: Likewise.
	* support/xsysconf.c: Likewise.
	* sysdeps/unix/sysv/linux/Makefile [misc] (routines): Add
	pkey_set, pkey_get.
	[misc] (tests): Add tst-pkey.
	(tst-pkey): Link with -lpthread.
	* sysdeps/unix/sysv/linux/Versions (GLIBC_2.27): Add pkey_alloc,
	pkey_free, pkey_set, pkey_get, pkey_mprotect.
	* sysdeps/unix/sysv/linux/bits/mman-linux.h (PKEY_DISABLE_ACCESS)
	(PKEY_DISABLE_WRITE): Define.
	(pkey_alloc, pkey_free, pkey_set, pkey_get, pkey_mprotect):
	Declare.
	* sysdeps/unix/sysv/linux/bits/siginfo-consts.h (SEGV_BNDERR)
	(SEGV_PKUERR): Add.
	* sysdeps/unix/sysv/linux/pkey_get.c: New file.
	* sysdeps/unix/sysv/linux/pkey_set.c: Likewise.
	* sysdeps/unix/sysv/linux/syscalls.list (pkey_alloc, pkey_free)
	(pkey_mprotect): Add.
	* sysdeps/unix/sysv/linux/tst-pkey.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/arch-pkey.h: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/pkey_get.c: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/pkey_set.c: Likewise.
	* sysdeps/unix/sysv/linux/**.abilist: Update.

diff --git a/NEWS b/NEWS
index 933085417c..0652012a09 100644
--- a/NEWS
+++ b/NEWS
@@ -38,6 +38,10 @@ Major new features:
 * glibc now provides the <sys/memfd.h> header file and the memfd_create
   system call.
 
+* Support for memory protection keys was added.  The <sys/mman.h> header now
+  declares the functions pkey_alloc, pkey_free, pkey_mprotect, pkey_set,
+  and pkey_get.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
 * On GNU/Linux, the obsolete Linux constant PTRACE_SEIZE_DEVEL is no longer
diff --git a/support/Makefile b/support/Makefile
index dafb1737a4..50d4269e24 100644
--- a/support/Makefile
+++ b/support/Makefile
@@ -52,9 +52,10 @@ libsupport-routines = \
   support_record_failure \
   support_run_diff \
   support_shared_allocate \
-  support_write_file_string \
+  support_test_compare_failure \
   support_test_main \
   support_test_verify_impl \
+  support_write_file_string \
   temp_file \
   write_message \
   xaccept \
@@ -84,8 +85,8 @@ libsupport-routines = \
   xpthread_attr_destroy \
   xpthread_attr_init \
   xpthread_attr_setdetachstate \
-  xpthread_attr_setstacksize \
   xpthread_attr_setguardsize \
+  xpthread_attr_setstacksize \
   xpthread_barrier_destroy \
   xpthread_barrier_init \
   xpthread_barrier_wait \
@@ -116,14 +117,18 @@ libsupport-routines = \
   xpthread_sigmask \
   xpthread_spin_lock \
   xpthread_spin_unlock \
+  xraise \
   xreadlink \
   xrealloc \
   xrecvfrom \
   xsendto \
   xsetsockopt \
+  xsigaction \
+  xsignal \
   xsocket \
   xstrdup \
   xstrndup \
+  xsysconf \
   xunlink \
   xwaitpid \
   xwrite \
diff --git a/support/check.h b/support/check.h
index bdcd12952a..29b709c2b0 100644
--- a/support/check.h
+++ b/support/check.h
@@ -86,6 +86,35 @@ void support_test_verify_exit_impl (int status, const char *file, int line,
    does not support reporting failures from a DSO.  */
 void support_record_failure (void);
 
+/* Compare the two numbers LEFT and RIGHT and report failure if they
+   are different.  */
+#define TEST_COMPARE(left, right)                                       \
+  ({                                                                    \
+    __typeof__ (left) __left_value = (left);                            \
+    __typeof__ (right) __right_value = (right);                         \
+    _Static_assert (sizeof (__left_value) <= sizeof (long long),        \
+                    "left value fits into long long");                  \
+    _Static_assert (sizeof (__right_value) <= sizeof (long long),       \
+                    "right value fits into long long");                 \
+    if (__left_value != __right_value                                   \
+        || ((__left_value > 0) != (__right_value > 0)))                 \
+      support_test_compare_failure                                      \
+        (__FILE__, __LINE__,                                            \
+         #left, __left_value, __left_value > 0,                         \
+         #right, __right_value, __right_value > 0);                     \
+  })
+
+/* Internal implementation of TEST_COMPARE.  LEFT_POSITIVE and
+   RIGHT_POSITIVE are used to fit both unsigned long long and long
+   long arguments into LEFT_VALUE and RIGHT_VALUE.  */
+void support_test_compare_failure (const char *file, int line,
+                                   const char *left_expr,
+                                   long long left_value,
+                                   int left_positive,
+                                   const char *right_expr,
+                                   long long right_value,
+                                   int right_positive);
+
 /* Internal function called by the test driver.  */
 int support_report_failure (int status)
   __attribute__ ((weak, warn_unused_result));
diff --git a/support/support_test_compare_failure.c b/support/support_test_compare_failure.c
new file mode 100644
index 0000000000..38fec1ca89
--- /dev/null
+++ b/support/support_test_compare_failure.c
@@ -0,0 +1,46 @@
+/* Reporting numeric comparison failure.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdio.h>
+#include <support/check.h>
+
+static void
+report (const char *which, const char *expr, long long value, int positive)
+{
+  printf ("  %s: ", which);
+  if (positive)
+    printf ("%llu", (unsigned long long) value);
+  else
+    printf ("%lld", value);
+  printf (" (0x%llx); from: %s\n", (unsigned long long) value, expr);
+}
+
+void
+support_test_compare_failure (const char *file, int line,
+                              const char *left_expr,
+                              long long left_value,
+                              int left_positive,
+                              const char *right_expr,
+                              long long right_value,
+                              int right_positive)
+{
+  support_record_failure ();
+  printf ("%s:%d: numeric comparison failure\n", file, line);
+  report (" left", left_expr, left_value, left_positive);
+  report ("right", right_expr, right_value, right_positive);
+}
diff --git a/support/xraise.c b/support/xraise.c
new file mode 100644
index 0000000000..9126c6c3ea
--- /dev/null
+++ b/support/xraise.c
@@ -0,0 +1,27 @@
+/* Error-checking wrapper for raise.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <support/check.h>
+#include <support/xsignal.h>
+
+void
+xraise (int sig)
+{
+  if (raise (sig) != 0)
+    FAIL_EXIT1 ("raise (%d): %m", sig);
+}
diff --git a/support/xsigaction.c b/support/xsigaction.c
new file mode 100644
index 0000000000..b74c69afae
--- /dev/null
+++ b/support/xsigaction.c
@@ -0,0 +1,27 @@
+/* Error-checking wrapper for sigaction.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <support/check.h>
+#include <support/xsignal.h>
+
+void
+xsigaction (int sig, const struct sigaction *newact, struct sigaction *oldact)
+{
+  if (sigaction (sig, newact, oldact) != 0)
+    FAIL_EXIT1 ("sigaction (%d): %m", sig);
+}
diff --git a/support/xsignal.c b/support/xsignal.c
new file mode 100644
index 0000000000..22a1dd74a7
--- /dev/null
+++ b/support/xsignal.c
@@ -0,0 +1,29 @@
+/* Error-checking wrapper for signal.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <support/check.h>
+#include <support/xsignal.h>
+
+sighandler_t
+xsignal (int sig, sighandler_t handler)
+{
+  sighandler_t result = signal (sig, handler);
+  if (result == SIG_ERR)
+    FAIL_EXIT1 ("signal (%d, %p): %m", sig, handler);
+  return result;
+}
diff --git a/support/xsignal.h b/support/xsignal.h
index 3dc0d9d5ce..3087ed0082 100644
--- a/support/xsignal.h
+++ b/support/xsignal.h
@@ -24,6 +24,14 @@
 
 __BEGIN_DECLS
 
+/* The following functions call the corresponding libc functions and
+   terminate the process on error.  */
+
+void xraise (int sig);
+sighandler_t xsignal (int sig, sighandler_t handler);
+void xsigaction (int sig, const struct sigaction *newact,
+                 struct sigaction *oldact);
+
 /* The following functions call the corresponding libpthread functions
    and terminate the process on error.  */
 
diff --git a/support/xsysconf.c b/support/xsysconf.c
new file mode 100644
index 0000000000..15ab1e26c4
--- /dev/null
+++ b/support/xsysconf.c
@@ -0,0 +1,36 @@
+/* Error-checking wrapper for sysconf.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+#include <support/check.h>
+#include <support/xunistd.h>
+
+long
+xsysconf (int name)
+{
+  /* Detect errors by a changed errno value, in case -1 is a valid
+     value.  Make sure that the caller does not see the zero value for
+     errno.  */
+  int old_errno = errno;
+  errno = 0;
+  long result = sysconf (name);
+  if (errno != 0)
+    FAIL_EXIT1 ("sysconf (%d): %m", name);
+  errno = old_errno;
+  return result;
+}
diff --git a/support/xunistd.h b/support/xunistd.h
index 05c2626a7b..00376f7aae 100644
--- a/support/xunistd.h
+++ b/support/xunistd.h
@@ -39,6 +39,7 @@ void xstat (const char *path, struct stat64 *);
 void xmkdir (const char *path, mode_t);
 void xchroot (const char *path);
 void xunlink (const char *path);
+long xsysconf (int name);
 
 /* Read the link at PATH.  The caller should free the returned string
    with free.  */
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index 53e41510e3..095cf93892 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -18,7 +18,7 @@ sysdep_routines += clone umount umount2 readahead \
 		   setfsuid setfsgid epoll_pwait signalfd \
 		   eventfd eventfd_read eventfd_write prlimit \
 		   personality epoll_wait tee vmsplice splice \
-		   open_by_handle_at
+		   open_by_handle_at pkey_set pkey_get
 
 CFLAGS-gethostid.c = -fexceptions
 CFLAGS-tee.c = -fexceptions -fasynchronous-unwind-tables
@@ -44,7 +44,7 @@ sysdep_headers += sys/mount.h sys/acct.h sys/sysctl.h \
 
 tests += tst-clone tst-clone2 tst-clone3 tst-fanotify tst-personality \
 	 tst-quota tst-sync_file_range test-errno-linux tst-sysconf-iov_max \
-	 tst-memfd_create
+	 tst-memfd_create tst-pkey
 
 # Generate the list of SYS_* macros for the system calls (__NR_*
 # macros).  The file syscall-names.list contains all possible system
@@ -92,6 +92,8 @@ $(objpfx)tst-syscall-list.out: \
 # Separate object file for access to the constant from the UAPI header.
 $(objpfx)tst-sysconf-iov_max: $(objpfx)tst-sysconf-iov_max-uapi.o
 
+$(objpfx)tst-pkey: $(shared-thread-library)
+
 endif # $(subdir) == misc
 
 ifeq ($(subdir),time)
diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
index 992c19729f..798ffc7660 100644
--- a/sysdeps/unix/sysv/linux/Versions
+++ b/sysdeps/unix/sysv/linux/Versions
@@ -168,6 +168,7 @@ libc {
   }
   GLIBC_2.27 {
     memfd_create;
+    pkey_alloc; pkey_free; pkey_set; pkey_get; pkey_mprotect;
   }
   GLIBC_PRIVATE {
     # functions used in other libraries
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index 140ca28abc..85788be12b 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2107,6 +2107,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.27 strfromf128 F
 GLIBC_2.27 strtof128 F
 GLIBC_2.27 strtof128_l F
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index f698e1b2f4..3b463dacbe 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2018,6 +2018,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.27 strfromf128 F
 GLIBC_2.27 strtof128 F
 GLIBC_2.27 strtof128_l F
diff --git a/sysdeps/unix/sysv/linux/arm/libc.abilist b/sysdeps/unix/sysv/linux/arm/libc.abilist
index 8a8af3e3e4..a1315aef35 100644
--- a/sysdeps/unix/sysv/linux/arm/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/libc.abilist
@@ -108,6 +108,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.4 GLIBC_2.4 A
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
diff --git a/sysdeps/unix/sysv/linux/bits/mman-linux.h b/sysdeps/unix/sysv/linux/bits/mman-linux.h
index b091181960..da5ec79334 100644
--- a/sysdeps/unix/sysv/linux/bits/mman-linux.h
+++ b/sysdeps/unix/sysv/linux/bits/mman-linux.h
@@ -109,3 +109,38 @@
 # define MCL_ONFAULT	4		/* Lock all pages that are
 					   faulted in.  */
 #endif
+
+/* Memory protection key support.  */
+#ifdef __USE_GNU
+
+/* Flags for pkey_alloc.  */
+# define PKEY_DISABLE_ACCESS 0x1
+# define PKEY_DISABLE_WRITE 0x2
+
+__BEGIN_DECLS
+
+/* Allocate a new protection key, with the PKEY_DISABLE_* bits
+   specified in ACCESS_RIGHTS.  The protection key mask for the
+   current thread is updated to match the access privilege for the new
+   key.  */
+int pkey_alloc (unsigned int __flags, unsigned int __access_rights) __THROW;
+
+/* Update the access rights for the current thread for KEY, which must
+   have been allocated using pkey_alloc.  */
+int pkey_set (int __key, unsigned int __access_rights) __THROW;
+
+/* Return the access rights for the current thread for KEY, which must
+   have been allocated using pkey_alloc.  */
+int pkey_get (int __key) __THROW;
+
+/* Free an allocated protection key, which must have been allocated
+   using pkey_alloc.  */
+int pkey_free (int __key) __THROW;
+
+/* Apply memory protection flags for KEY to the specified address
+   range.  */
+int pkey_mprotect (void *__addr, size_t __len, int __prot, int __pkey) __THROW;
+
+__END_DECLS
+
+#endif /* __USE_GNU */
diff --git a/sysdeps/unix/sysv/linux/bits/siginfo-consts.h b/sysdeps/unix/sysv/linux/bits/siginfo-consts.h
index 525840cea1..e86b933040 100644
--- a/sysdeps/unix/sysv/linux/bits/siginfo-consts.h
+++ b/sysdeps/unix/sysv/linux/bits/siginfo-consts.h
@@ -111,8 +111,12 @@ enum
 {
   SEGV_MAPERR = 1,		/* Address not mapped to object.  */
 #  define SEGV_MAPERR	SEGV_MAPERR
-  SEGV_ACCERR			/* Invalid permissions for mapped object.  */
+  SEGV_ACCERR,			/* Invalid permissions for mapped object.  */
 #  define SEGV_ACCERR	SEGV_ACCERR
+  SEGV_BNDERR,			/* Bounds checking failure.  */
+#  define SEGV_BNDERR	SEGV_BNDERR
+  SEGV_PKUERR			/* Protection key checking failure.  */
+#  define SEGV_PKUERR	SEGV_PKUERR
 };
 
 /* `si_code' values for SIGBUS signal.  */
diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
index 5b81a6cd7d..7397d728f2 100644
--- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
@@ -1872,6 +1872,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index 51ead9e867..cffdf251d6 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2037,6 +2037,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
index 78b4ee8d40..3292510a55 100644
--- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
@@ -1901,6 +1901,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
index d9c97779e4..636bbdd1a7 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
@@ -109,6 +109,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.4 GLIBC_2.4 A
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0x98
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
index 4acbf7eeed..6952863f86 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
@@ -1986,6 +1986,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/microblaze/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/libc.abilist
index 93f02f08ce..ac5b56abab 100644
--- a/sysdeps/unix/sysv/linux/microblaze/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/libc.abilist
@@ -2107,3 +2107,8 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
index 795e85de70..bb0958e842 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
@@ -1961,6 +1961,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
index dc714057b7..9104eb4d6d 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
@@ -1959,6 +1959,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
index ce7bc9b175..58a5d5e141 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
@@ -1957,6 +1957,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.27 strfromf128 F
 GLIBC_2.27 strtof128 F
 GLIBC_2.27 strtof128_l F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
index 3fdd85eace..2efac14a7d 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
@@ -1952,6 +1952,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.27 strfromf128 F
 GLIBC_2.27 strtof128 F
 GLIBC_2.27 strtof128_l F
diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
index 3e0bcb2a5c..9ef29e4e98 100644
--- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
@@ -2148,3 +2148,8 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
diff --git a/sysdeps/unix/sysv/linux/pkey_get.c b/sysdeps/unix/sysv/linux/pkey_get.c
new file mode 100644
index 0000000000..fc3204c82f
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/pkey_get.c
@@ -0,0 +1,26 @@
+/* Obtaining the thread memory protection key, generic stub.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+
+int
+pkey_get (int key)
+{
+  __set_errno (ENOSYS);
+  return -1;
+}
diff --git a/sysdeps/unix/sysv/linux/pkey_set.c b/sysdeps/unix/sysv/linux/pkey_set.c
new file mode 100644
index 0000000000..f686c4373c
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/pkey_set.c
@@ -0,0 +1,26 @@
+/* Changing the thread memory protection key, generic stub.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+
+int
+pkey_set (int key, unsigned int access_rights)
+{
+  __set_errno (ENOSYS);
+  return -1;
+}
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
index 375c69d9d1..60c024096f 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
@@ -1990,6 +1990,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
index a88172a906..327933c973 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
@@ -1995,6 +1995,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc-le.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc-le.abilist
index fa026a332c..b04c31bc10 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc-le.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc-le.abilist
@@ -2202,3 +2202,8 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc.abilist
index 838f395d78..e0645e9e25 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc.abilist
@@ -109,6 +109,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 _Exit F
 GLIBC_2.3 _IO_2_1_stderr_ D 0xe0
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
index 41b79c496a..ef434c61a7 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
@@ -1990,6 +1990,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.27 strfromf128 F
 GLIBC_2.27 strtof128 F
 GLIBC_2.27 strtof128_l F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
index 68251a0e69..4114a4ce57 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
@@ -1891,6 +1891,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.27 strfromf128 F
 GLIBC_2.27 strtof128 F
 GLIBC_2.27 strtof128_l F
diff --git a/sysdeps/unix/sysv/linux/sh/libc.abilist b/sysdeps/unix/sysv/linux/sh/libc.abilist
index bc1aae275e..f4478b0cc5 100644
--- a/sysdeps/unix/sysv/linux/sh/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/libc.abilist
@@ -1876,6 +1876,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
index 93e6d092ac..136a57fc0e 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
@@ -1983,6 +1983,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.27 strfromf128 F
 GLIBC_2.27 strtof128 F
 GLIBC_2.27 strtof128_l F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
index b11d6764d4..9ad0790829 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
@@ -1920,6 +1920,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.27 strfromf128 F
 GLIBC_2.27 strtof128 F
 GLIBC_2.27 strtof128_l F
diff --git a/sysdeps/unix/sysv/linux/syscalls.list b/sysdeps/unix/sysv/linux/syscalls.list
index 40c4fbb9ea..6f657eea2e 100644
--- a/sysdeps/unix/sysv/linux/syscalls.list
+++ b/sysdeps/unix/sysv/linux/syscalls.list
@@ -110,3 +110,6 @@ setns		EXTRA	setns		i:ii	setns
 process_vm_readv EXTRA	process_vm_readv i:ipipii process_vm_readv
 process_vm_writev EXTRA	process_vm_writev i:ipipii process_vm_writev
 memfd_create    EXTRA	memfd_create	i:si    memfd_create
+pkey_alloc	EXTRA	pkey_alloc	i:ii	pkey_alloc
+pkey_free	EXTRA	pkey_free	i:i	pkey_free
+pkey_mprotect	EXTRA	pkey_mprotect	i:aiii  pkey_mprotect
diff --git a/sysdeps/unix/sysv/linux/tile/tilegx/tilegx32/libc.abilist b/sysdeps/unix/sysv/linux/tile/tilegx/tilegx32/libc.abilist
index e9eb4ff7bd..d4f2094027 100644
--- a/sysdeps/unix/sysv/linux/tile/tilegx/tilegx32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/tile/tilegx/tilegx32/libc.abilist
@@ -2114,3 +2114,8 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
diff --git a/sysdeps/unix/sysv/linux/tile/tilegx/tilegx64/libc.abilist b/sysdeps/unix/sysv/linux/tile/tilegx/tilegx64/libc.abilist
index 8f08e909cd..4916dbabb5 100644
--- a/sysdeps/unix/sysv/linux/tile/tilegx/tilegx64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/tile/tilegx/tilegx64/libc.abilist
@@ -2114,3 +2114,8 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
diff --git a/sysdeps/unix/sysv/linux/tile/tilepro/libc.abilist b/sysdeps/unix/sysv/linux/tile/tilepro/libc.abilist
index e9eb4ff7bd..d4f2094027 100644
--- a/sysdeps/unix/sysv/linux/tile/tilepro/libc.abilist
+++ b/sysdeps/unix/sysv/linux/tile/tilepro/libc.abilist
@@ -2114,3 +2114,8 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
diff --git a/sysdeps/unix/sysv/linux/tst-pkey.c b/sysdeps/unix/sysv/linux/tst-pkey.c
new file mode 100644
index 0000000000..42d50e37c2
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-pkey.c
@@ -0,0 +1,390 @@
+/* Tests for memory protection keys.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+#include <inttypes.h>
+#include <setjmp.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <support/check.h>
+#include <support/support.h>
+#include <support/test-driver.h>
+#include <support/xsignal.h>
+#include <support/xthread.h>
+#include <support/xunistd.h>
+#include <sys/mman.h>
+
+/* Used to force threads to wait until the main thread has set up the
+   keys as intended.  */
+static pthread_barrier_t barrier;
+
+/* The keys used for testing.  These have been allocated with access
+   rights set based on their array index.  */
+enum { key_count = 4 };
+static int keys[key_count];
+static volatile int *pages[key_count];
+
+/* Used to report results from the signal handler.  */
+static volatile void *sigsegv_addr;
+static volatile int sigsegv_code;
+static volatile int sigsegv_pkey;
+static sigjmp_buf sigsegv_jmp;
+
+/* Used to handle expected read or write faults.  */
+static void
+sigsegv_handler (int signum, siginfo_t *info, void *context)
+{
+  sigsegv_addr = info->si_addr;
+  sigsegv_code = info->si_code;
+  sigsegv_pkey = info->si_pkey;
+  siglongjmp (sigsegv_jmp, 2);
+}
+
+static const struct sigaction sigsegv_sigaction =
+  {
+    .sa_flags = SA_RESETHAND | SA_SIGINFO,
+    .sa_sigaction = &sigsegv_handler,
+  };
+
+/* Check if PAGE is readable (if !WRITE) or writable (if WRITE).  */
+static bool
+check_page_access (int page, bool write)
+{
+  /* This is needed to work around bug 22396: On x86-64, siglongjmp
+     does not restore the protection key access rights for the current
+     thread.  We restore only the access rights for the keys under
+     test.  (This is not a general solution to this problem, but it
+     allows testing to proceed after a fault.)  */
+  unsigned saved_rights[key_count];
+  for (int i = 0; i < key_count; ++i)
+    saved_rights[i] = pkey_get (keys[i]);
+
+  volatile int *addr = pages[page];
+  if (test_verbose > 0)
+    {
+      printf ("info: checking access at %p (page %d) for %s\n",
+              addr, page, write ? "writing" : "reading");
+    }
+  int result = sigsetjmp (sigsegv_jmp, 1);
+  if (result == 0)
+    {
+      xsigaction (SIGSEGV, &sigsegv_sigaction, NULL);
+      if (write)
+        *addr = 3;
+      else
+        (void) *addr;
+      xsignal (SIGSEGV, SIG_DFL);
+      if (test_verbose > 0)
+        puts ("  --> access allowed");
+      return true;
+    }
+  else
+    {
+      xsignal (SIGSEGV, SIG_DFL);
+      if (test_verbose > 0)
+        puts ("  --> access denied");
+      TEST_COMPARE (result, 2);
+      TEST_COMPARE ((uintptr_t) sigsegv_addr, (uintptr_t) addr);
+      TEST_COMPARE (sigsegv_code, SEGV_PKUERR);
+      TEST_COMPARE (sigsegv_pkey, keys[page]);
+      for (int i = 0; i < key_count; ++i)
+        TEST_COMPARE (pkey_set (keys[i], saved_rights[i]), 0);
+      return false;
+    }
+}
+
+static volatile sig_atomic_t sigusr1_handler_ran;
+
+/* Used to check that access is revoked in signal handlers.  */
+static void
+sigusr1_handler (int signum)
+{
+  TEST_COMPARE (signum, SIGUSR1);
+  for (int i = 0; i < key_count; ++i)
+    TEST_COMPARE (pkey_get (keys[i]), PKEY_DISABLE_ACCESS);
+  sigusr1_handler_ran = 1;
+}
+
+/* Used to report results from other threads.  */
+struct thread_result
+{
+  int access_rights[key_count];
+  pthread_t next_thread;
+};
+
+/* Return the thread's access rights for the keys under test.  */
+static void *
+get_thread_func (void *closure)
+{
+  struct thread_result *result = xmalloc (sizeof (*result));
+  for (int i = 0; i < key_count; ++i)
+    result->access_rights[i] = pkey_get (keys[i]);
+  memset (&result->next_thread, 0, sizeof (result->next_thread));
+  return result;
+}
+
+/* Wait for initialization and then check that the current thread does
+   not have access through the keys under test.  */
+static void *
+delayed_thread_func (void *closure)
+{
+  bool check_access = *(bool *) closure;
+  pthread_barrier_wait (&barrier);
+  struct thread_result *result = get_thread_func (NULL);
+
+  if (check_access)
+    {
+      /* Also check directly.  This code should not run with other
+         threads in parallel because of the SIGSEGV handler which is
+         installed by check_page_access.  */
+      for (int i = 0; i < key_count; ++i)
+        {
+          TEST_VERIFY (!check_page_access (i, false));
+          TEST_VERIFY (!check_page_access (i, true));
+        }
+    }
+
+  result->next_thread = xpthread_create (NULL, get_thread_func, NULL);
+  return result;
+}
+
+static int
+do_test (void)
+{
+  long pagesize = xsysconf (_SC_PAGESIZE);
+
+  xpthread_barrier_init (&barrier, NULL, 2);
+  bool delayed_thread_check_access = true;
+  pthread_t delayed_thread = xpthread_create
+    (NULL, &delayed_thread_func, &delayed_thread_check_access);
+
+  keys[0] = pkey_alloc (0, 0);
+  if (keys[0] < 0)
+    {
+      if (errno == ENOSYS)
+        {
+          puts ("warning: kernel does not support memory protection keys");
+          return EXIT_UNSUPPORTED;
+        }
+      if (errno == ENOSPC)
+        {
+          puts ("warning: CPU does not support memory protection keys");
+          return EXIT_UNSUPPORTED;
+        }
+      FAIL_EXIT1 ("pkey_alloc: %m");
+    }
+  TEST_COMPARE (pkey_get (keys[0]), 0);
+  for (int i = 1; i < key_count; ++i)
+    {
+      keys[i] = pkey_alloc (0, i);
+      if (keys[i] < 0)
+        FAIL_EXIT1 ("pkey_alloc (0, %d): %m", i);
+      /* pkey_alloc is supposed to change the current thread's access
+         rights for the new key.  */
+      TEST_COMPARE (pkey_get (keys[i]), i);
+    }
+  /* Check that all the keys have the expected access rights for the
+     current thread.  */
+  for (int i = 0; i < key_count; ++i)
+    TEST_COMPARE (pkey_get (keys[i]), i);
+
+  /* Allocate a test page for each key.  */
+  for (int i = 0; i < key_count; ++i)
+    {
+      pages[i] = xmmap (NULL, pagesize, PROT_READ | PROT_WRITE,
+                        MAP_ANONYMOUS | MAP_PRIVATE, -1);
+      TEST_COMPARE (pkey_mprotect ((void *) pages[i], pagesize,
+                                   PROT_READ | PROT_WRITE, keys[i]), 0);
+    }
+
+  /* Check that the initial thread does not have access to the new
+     keys.  */
+  {
+    pthread_barrier_wait (&barrier);
+    struct thread_result *result = xpthread_join (delayed_thread);
+    for (int i = 0; i < key_count; ++i)
+      TEST_COMPARE (result->access_rights[i],
+                    PKEY_DISABLE_ACCESS);
+    struct thread_result *result2 = xpthread_join (result->next_thread);
+    for (int i = 0; i < key_count; ++i)
+      TEST_COMPARE (result2->access_rights[i],
+                    PKEY_DISABLE_ACCESS);
+    free (result);
+    free (result2);
+  }
+
+  /* Check that the current thread access rights are inherited by new
+     threads.  */
+  {
+    pthread_t get_thread = xpthread_create (NULL, get_thread_func, NULL);
+    struct thread_result *result = xpthread_join (get_thread);
+    for (int i = 0; i < key_count; ++i)
+      TEST_COMPARE (result->access_rights[i], i);
+    free (result);
+  }
+
+  for (int i = 0; i < key_count; ++i)
+    TEST_COMPARE (pkey_get (keys[i]), i);
+
+  /* Check that in a signal handler, there is no access.  */
+  xsignal (SIGUSR1, &sigusr1_handler);
+  xraise (SIGUSR1);
+  xsignal (SIGUSR1, SIG_DFL);
+  TEST_COMPARE (sigusr1_handler_ran, 1);
+
+  /* The first key results in a writable page.  */
+  TEST_VERIFY (check_page_access (0, false));
+  TEST_VERIFY (check_page_access (0, true));
+
+  /* The other keys do not.   */
+  for (int i = 1; i < key_count; ++i)
+    {
+      if (test_verbose)
+        printf ("info: checking access for key %d, bits 0x%x\n",
+                i, pkey_get (keys[i]));
+      for (int j = 0; j < key_count; ++j)
+        TEST_COMPARE (pkey_get (keys[j]), j);
+      if (i & PKEY_DISABLE_ACCESS)
+        {
+          TEST_VERIFY (!check_page_access (i, false));
+          TEST_VERIFY (!check_page_access (i, true));
+        }
+      else
+        {
+          TEST_VERIFY (i & PKEY_DISABLE_WRITE);
+          TEST_VERIFY (check_page_access (i, false));
+          TEST_VERIFY (!check_page_access (i, true));
+        }
+    }
+
+  /* But if we set the current thread's access rights, we gain
+     access.  */
+  for (int do_write = 0; do_write < 2; ++do_write)
+    for (int allowed_key = 0; allowed_key < key_count; ++allowed_key)
+      {
+        for (int i = 0; i < key_count; ++i)
+          if (i == allowed_key)
+            {
+              if (do_write)
+                TEST_COMPARE (pkey_set (keys[i], 0), 0);
+              else
+                TEST_COMPARE (pkey_set (keys[i], PKEY_DISABLE_WRITE), 0);
+            }
+          else
+            TEST_COMPARE (pkey_set (keys[i], PKEY_DISABLE_ACCESS), 0);
+
+        if (test_verbose)
+          printf ("info: key %d is allowed access for %s\n",
+                  allowed_key, do_write ? "writing" : "reading");
+        for (int i = 0; i < key_count; ++i)
+          if (i == allowed_key)
+            {
+              TEST_VERIFY (check_page_access (i, false));
+              TEST_VERIFY (check_page_access (i, true) == do_write);
+            }
+          else
+            {
+              TEST_VERIFY (!check_page_access (i, false));
+              TEST_VERIFY (!check_page_access (i, true));
+            }
+      }
+
+  /* Restore access to all keys, and launch a thread which should
+     inherit that access.  */
+  for (int i = 0; i < key_count; ++i)
+    {
+      TEST_COMPARE (pkey_set (keys[i], 0), 0);
+      TEST_VERIFY (check_page_access (i, false));
+      TEST_VERIFY (check_page_access (i, true));
+    }
+  delayed_thread_check_access = false;
+  delayed_thread = xpthread_create
+    (NULL, delayed_thread_func, &delayed_thread_check_access);
+
+  TEST_COMPARE (pkey_free (keys[0]), 0);
+  /* Second pkey_free will fail because the key has already been
+     freed.  */
+  TEST_COMPARE (pkey_free (keys[0]), -1);
+  TEST_COMPARE (errno, EINVAL);
+  for (int i = 1; i < key_count; ++i)
+    TEST_COMPARE (pkey_free (keys[i]), 0);
+
+  /* Check what happens to running threads which have access to
+     previously allocated protection keys.  The implemented behavior
+     is somewhat dubious: Ideally, pkey_free should revoke access to
+     that key and pkey_alloc of the same (numeric) key should not
+     implicitly confer access to already-running threads, but this is
+     not what happens in practice.  */
+  {
+    /* The limit is in place to avoid running indefinitely in case
+       there are many keys available.  */
+    int *keys_array = xcalloc (100000, sizeof (*keys_array));
+    int keys_allocated = 0;
+    while (keys_allocated < 100000)
+      {
+        int new_key = pkey_alloc (0, PKEY_DISABLE_WRITE);
+        if (new_key < 0)
+          {
+            /* No key reuse observed before running out of keys.  */
+            TEST_COMPARE (errno, ENOSPC);
+            break;
+          }
+        for (int i = 0; i < key_count; ++i)
+          if (new_key == keys[i])
+            {
+              /* We allocated the key with disabled write access.
+                 This should affect the protection state of the
+                 existing page.  */
+              TEST_VERIFY (check_page_access (i, false));
+              TEST_VERIFY (!check_page_access (i, true));
+
+              xpthread_barrier_wait (&barrier);
+              struct thread_result *result = xpthread_join (delayed_thread);
+              /* The thread which was launched before should still have
+                 access to the key.  */
+              TEST_COMPARE (result->access_rights[i], 0);
+              struct thread_result *result2
+                = xpthread_join (result->next_thread);
+              /* Same for a thread which is launched afterwards from
+                 the old thread.  */
+              TEST_COMPARE (result2->access_rights[i], 0);
+              free (result);
+              free (result2);
+              keys_array[keys_allocated++] = new_key;
+              goto after_key_search;
+            }
+        /* Save key for later deallocation.  */
+        keys_array[keys_allocated++] = new_key;
+      }
+  after_key_search:
+    /* Deallocate the keys allocated for testing purposes.  */
+    for (int j = 0; j < keys_allocated; ++j)
+      TEST_COMPARE (pkey_free (keys_array[j]), 0);
+    free (keys_array);
+  }
+
+  for (int i = 0; i < key_count; ++i)
+    xmunmap ((void *) pages[i], pagesize);
+
+  xpthread_barrier_destroy (&barrier);
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index 0a4f7797ac..1ea74f9e8c 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -1878,6 +1878,11 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F
 GLIBC_2.3 GLIBC_2.3 A
 GLIBC_2.3 __ctype_b_loc F
 GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/x86_64/arch-pkey.h b/sysdeps/unix/sysv/linux/x86_64/arch-pkey.h
new file mode 100644
index 0000000000..8e9bfdae96
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/arch-pkey.h
@@ -0,0 +1,40 @@
+/* Helper functions for manipulating memory protection keys.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _ARCH_PKEY_H
+#define _ARCH_PKEY_H
+
+/* Return the value of the PKRU register.  */
+static inline unsigned int
+pkey_read (void)
+{
+  unsigned int result;
+  __asm__ volatile (".byte 0x0f, 0x01, 0xee"
+                    : "=a" (result) : "c" (0) : "rdx");
+  return result;
+}
+
+/* Overwrite the PKRU register with VALUE.  */
+static inline void
+pkey_write (unsigned int value)
+{
+  __asm__ volatile (".byte 0x0f, 0x01, 0xef"
+                    : : "a" (value), "c" (0), "d" (0));
+}
+
+#endif /* _ARCH_PKEY_H */
diff --git a/sysdeps/unix/sysv/linux/x86_64/pkey_get.c b/sysdeps/unix/sysv/linux/x86_64/pkey_get.c
new file mode 100644
index 0000000000..3a9bfbe676
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/pkey_get.c
@@ -0,0 +1,33 @@
+/* Reading the per-thread memory protection key, x86_64 version.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <arch-pkey.h>
+#include <errno.h>
+
+int
+pkey_get (int key)
+{
+  if (key < 0 || key > 15)
+    {
+      __set_errno (EINVAL);
+      return -1;
+    }
+  unsigned int pkru = pkey_read ();
+  return (pkru >> (2 * key)) & 3;
+  return 0;
+}
diff --git a/sysdeps/unix/sysv/linux/x86_64/pkey_set.c b/sysdeps/unix/sysv/linux/x86_64/pkey_set.c
new file mode 100644
index 0000000000..91dffd22c3
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/pkey_set.c
@@ -0,0 +1,35 @@
+/* Changing the per-thread memory protection key, x86_64 version.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <arch-pkey.h>
+#include <errno.h>
+
+int
+pkey_set (int key, unsigned int rights)
+{
+  if (key < 0 || key > 15 || rights > 3)
+    {
+      __set_errno (EINVAL);
+      return -1;
+    }
+  unsigned int mask = 3U << (2 * key);
+  unsigned int pkru = pkey_read ();
+  pkru = (pkru & ~mask) | (rights << (2 * key));
+  pkey_write (pkru);
+  return 0;
+}
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index 23f6a91429..1d3d598618 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2121,3 +2121,8 @@ GLIBC_2.27 GLIBC_2.27 A
 GLIBC_2.27 glob F
 GLIBC_2.27 glob64 F
 GLIBC_2.27 memfd_create F
+GLIBC_2.27 pkey_alloc F
+GLIBC_2.27 pkey_free F
+GLIBC_2.27 pkey_get F
+GLIBC_2.27 pkey_mprotect F
+GLIBC_2.27 pkey_set F

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: MPK: pkey_free and key reuse
  2017-11-05 10:35 ` Florian Weimer
@ 2017-11-08 20:41   ` Dave Hansen
  -1 siblings, 0 replies; 57+ messages in thread
From: Dave Hansen @ 2017-11-08 20:41 UTC (permalink / raw)
  To: Florian Weimer, linux-x86_64, linux-arch; +Cc: linux-mm, Linux API

On 11/05/2017 02:35 AM, Florian Weimer wrote:
> I don't think pkey_free, as it is implemented today, is very safe due to
> key reuse by a subsequent pkey_alloc.  I see two problems:
> 
> (A) pkey_free allows reuse for the key while there are still mappings
> that use it.

I don't agree with this assessment.  Is malloc() unsafe?  If someone
free()s memory that is still in use, a subsequent malloc() would hand
the address out again for reuse.

> (B) If a key is reused, existing threads retain their access rights,
> while there is an expectation that pkey_alloc denies access for the
> threads except the current one.

Where does this expectation come from?  Using the malloc() analogy, we
don't expect that free() in one thread actively takes away references to
the memory held by other threads.

We define free() as only being called on resources to which there are no
active references.  If you free() things in use, bad things happen.
pkey_free() is only to be called when there is nothing actively using
the key.  If you pkey_free() an in-use key, bad things happen.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: MPK: pkey_free and key reuse
@ 2017-11-09 14:48     ` Florian Weimer
  0 siblings, 0 replies; 57+ messages in thread
From: Florian Weimer @ 2017-11-09 14:48 UTC (permalink / raw)
  To: Dave Hansen, linux-x86_64, linux-arch; +Cc: linux-mm, Linux API

On 11/08/2017 09:41 PM, Dave Hansen wrote:
> On 11/05/2017 02:35 AM, Florian Weimer wrote:
>> I don't think pkey_free, as it is implemented today, is very safe due to
>> key reuse by a subsequent pkey_alloc.  I see two problems:
>>
>> (A) pkey_free allows reuse for the key while there are still mappings
>> that use it.
> 
> I don't agree with this assessment.  Is malloc() unsafe?  If someone
> free()s memory that is still in use, a subsequent malloc() would hand
> the address out again for reuse.

I think the disagreement is not about what is considered acceptable 
behavior as such, but what constitutes “use”.

And even with concurrent use, the behavior can be well-defined.  We
make sure that if munmap is called, we do not return before all threads 
have observed in principle that the page is gone (at considerable cost, 
of course, and in most cases, that is total overkill).

I'm pretty sure there is another key reuse scenario which does not even 
involve pkey_free, but I need to write a test first.

>> (B) If a key is reused, existing threads retain their access rights,
>> while there is an expectation that pkey_alloc denies access for the
>> threads except the current one.
> Where does this expectation come from?

For me, it was the access_rights argument to pkey_alloc.  What else 
would it do?  For the current thread, I can already set the rights with 
a PKRU write, so the existence of the syscall argument is puzzling.

> Using the malloc() analogy, we
> don't expect that free() in one thread actively takes away references to
> the memory held by other threads.

But malloc/free isn't expected to be a partial antidote to random 
pointer scribbling.

> We define free() as only being called on resources to which there are no
> active references.  If you free() things in use, bad things happen.
> pkey_free() is only to be called when there is nothing actively using
> the key.  If you pkey_free() an in-use key, bad things happen.

My impression was that MPK was intended as a fallback in case you did 
that, and unrelated code suddenly writes through a dangling pointer and 
accidentally hits the DAX-mapped persistent memory of the database.  To 
prevent that, those pages are mapped write-disabled on all threads
almost all the time, and only if the database needs to write something, 
it temporarily tweaks PKRU so that it gains access.  All that assumes 
that you can actually restrict all threads in the process, but with the 
current implementation, that's not true even if threads never touch keys 
they don't know.
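
A minimal sketch of the pattern described above, assuming glibc 2.27 or
later on a kernel and CPU with protection key support.  The helper name
sketch_guarded_write is hypothetical; pkey_get and pkey_set are the
wrappers added by the patch in this thread.  It only changes the rights
of the calling thread, so it says nothing about what other threads can
do with the key:

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>

/* Copy LEN bytes from SRC to DST, where DST lies in a mapping tagged
   with protection key KEY.  Write access is enabled only for the
   duration of the copy and only for the calling thread.  Returns 0 on
   success, or -1 with errno set if the rights could not be read or
   changed.  (Hypothetical helper, not part of glibc.)  */
static int
sketch_guarded_write (int key, void *dst, const void *src, size_t len)
{
  /* Remember the current rights, normally PKEY_DISABLE_WRITE.  */
  int saved = pkey_get (key);
  if (saved < 0)
    return -1;

  /* Open the write window for this thread only.  */
  if (pkey_set (key, 0) != 0)
    return -1;

  memcpy (dst, src, len);

  /* Close the window again by restoring the saved rights.  */
  return pkey_set (key, saved);
}
```

The mapping is expected to have been tagged beforehand with
pkey_mprotect, using a key obtained from
pkey_alloc (0, PKEY_DISABLE_WRITE).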

I think we should either implement revoke on pkey_alloc, with a 
broadcast to all threads (the pkey_set race can be closed by having a 
vDSO for that, and the revocation code can check %rip to see if the old
PKRU value needs to be fixed up).  Or we add the two pkey_alloc flags I 
mentioned earlier.
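
The read-modify-write race can be replayed deterministically on a plain
integer.  This is a sketch with hypothetical names (sketch_apply,
sketch_lost_update); it uses the same bit arithmetic as pkey_set in the
patch and models the revocation as a direct change to the thread's PKRU
value, which is an assumption about how a kernel-side revoke would
behave:

```c
#include <stdint.h>

/* Rights bits as used by pkey_set and pkey_get.  */
#define SKETCH_DISABLE_ACCESS 0x1u
#define SKETCH_DISABLE_WRITE  0x2u

/* Same arithmetic as pkey_set in the patch, applied to a plain value
   instead of the PKRU register.  */
static uint32_t
sketch_apply (uint32_t pkru, int key, uint32_t rights)
{
  uint32_t mask = UINT32_C (3) << (2 * key);
  return (pkru & ~mask) | (rights << (2 * key));
}

/* Replay one bad interleaving.  The thread wants to make MY_KEY
   read-only; a revocation of REVOKED_KEY lands between its read and
   its write.  Returns the PKRU value the thread ends up with.  */
static uint32_t
sketch_lost_update (uint32_t pkru, int my_key, int revoked_key)
{
  /* Step 1: the thread reads PKRU.  */
  uint32_t stale = pkru;

  /* Step 2: the revocation changes the thread's PKRU behind its
     back (assumed behavior of a hypothetical revoke).  */
  pkru = sketch_apply (pkru, revoked_key, SKETCH_DISABLE_ACCESS);

  /* Step 3: the thread writes back a value computed from the stale
     read, which discards the change made in step 2.  */
  pkru = sketch_apply (stale, my_key, SKETCH_DISABLE_WRITE);
  return pkru;
}
```

Starting from a PKRU value of 0, sketch_lost_update (0, 2, 1) leaves
key 1 with rights 0 (full access) even though it was revoked in step 2.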

Thanks,
Florian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: MPK: pkey_free and key reuse
@ 2017-11-09 16:59       ` Dave Hansen
  0 siblings, 0 replies; 57+ messages in thread
From: Dave Hansen @ 2017-11-09 16:59 UTC (permalink / raw)
  To: Florian Weimer, linux-x86_64, linux-arch; +Cc: linux-mm, Linux API

On 11/09/2017 06:48 AM, Florian Weimer wrote:
> On 11/08/2017 09:41 PM, Dave Hansen wrote:
>>> (B) If a key is reused, existing threads retain their access rights,
>>> while there is an expectation that pkey_alloc denies access for the
>>> threads except the current one.
>> Where does this expectation come from?
> 
> For me, it was the access_rights argument to pkey_alloc.  What else
> would it do?  For the current thread, I can already set the rights with
> a PKRU write, so the existence of the syscall argument is puzzling.

The manpage is pretty bare here.  But the thought was that, in most
cases, you will want to allocate a key and start using it immediately.
This was in response to some feedback on one of the earlier reviews of
the patch set.

>> Using the malloc() analogy, we
>> don't expect that free() in one thread actively takes away references to
>> the memory held by other threads.
> 
> But malloc/free isn't expected to be a partial antidote to random
> pointer scribbling.

Nor are protection keys intended to be an antidote for use-after-free.

> I think we should either implement revoke on pkey_alloc, with a
> broadcast to all threads (the pkey_set race can be closed by having a
> vDSO for that, and the revocation code can check %rip to see if the old
> PKRU value needs to be fixed up).  Or we add the two pkey_alloc flags I
> mentioned earlier.

That sounds awfully complicated to put in-kernel.  I'd be happy to
review the patches after you put them together once we see how it looks.

You basically want threads to broadcast their PKRU values at pkey_free()
time.  That's totally doable... in userspace.  You just need a mechanism
for each thread to periodically check whether it needs an update.  I don't
think we need kernel intervention and vDSO magic for that.
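
One possible shape for such a userspace mechanism, as a sketch only.
All names are hypothetical, and a thread-local integer stands in for
the PKRU register so that the code runs without MPK hardware; real code
would call pkey_set instead.  The sketch does not cover acknowledgement,
so it cannot tell when every thread has dropped the key and reuse is
safe:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Bumped whenever a key is announced as freed.  */
static atomic_uint sketch_generation;
/* Bit K is set once key K has been announced as freed.  */
static atomic_uint sketch_revoked;

/* Stand-in for the calling thread's PKRU register (assumption: real
   code would read and write the register via pkey_get/pkey_set).  */
static _Thread_local uint32_t sketch_pkru;
/* Last generation this thread has processed.  */
static _Thread_local unsigned int sketch_seen_generation;

/* To be called alongside pkey_free: record KEY as revoked, then
   publish the change by bumping the generation counter.  */
static void
sketch_announce_free (int key)
{
  atomic_fetch_or (&sketch_revoked, 1u << key);
  atomic_fetch_add (&sketch_generation, 1);
}

/* To be called periodically by every thread.  Returns 1 if the
   thread had to update its rights, 0 if nothing changed.  */
static int
sketch_poll (void)
{
  unsigned int gen = atomic_load (&sketch_generation);
  if (gen == sketch_seen_generation)
    return 0;

  unsigned int revoked = atomic_load (&sketch_revoked);
  for (int key = 0; key < 16; ++key)
    if (revoked & (1u << key))
      /* Set the access-disable bit (bit 0 of the key's field).  */
      sketch_pkru |= UINT32_C (1) << (2 * key);

  sketch_seen_generation = gen;
  return 1;
}
```

A thread that never calls sketch_poll keeps its old rights, which is
the same exposure the thread discusses, only moved into userspace.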

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: MPK: removing a pkey (was: pkey_free and key reuse)
@ 2017-11-22  8:18     ` Vlastimil Babka
  0 siblings, 0 replies; 57+ messages in thread
From: Vlastimil Babka @ 2017-11-22  8:18 UTC (permalink / raw)
  To: Florian Weimer, Dave Hansen, linux-x86_64, linux-arch
  Cc: linux-mm, Linux API

On 11/05/2017 11:35 AM, Florian Weimer wrote:
> I'm working on adding memory protection key support to glibc.
> 
> I don't think pkey_free, as it is implemented today, is very safe due to 
> key reuse by a subsequent pkey_alloc.  I see two problems:
> 
> (A) pkey_free allows reuse of the key while there are still mappings
> that use it.
> 
> (B) If a key is reused, existing threads retain their access rights, 
> while there is an expectation that pkey_alloc denies access for the 
> threads except the current one.

I have a somewhat related question to API/documentation of pkeys, that
came up from a customer interested in using the feature. The man page of
mprotect/pkey_mprotect doesn't say how to remove a pkey from a set of
pages, i.e. reset it to the default 0 (or the exec-only pkey), so
initially they thought there's no way to do that.

Calling pkey_mprotect() with pkey==0 will fail with EINVAL, because 0
was not allocated by pkey_alloc(). That's fair I guess.

What seems to work to reset the pkey is either calling plain mprotect(),
or calling pkey_mprotect() with pkey == -1, as the former is just wired
to the latter.

So, is plain mprotect() the intended way to reset a pkey and should it
be explicitly documented in the man page?

And, was the pkey == -1 internal wiring supposed to be exposed to the
pkey_mprotect() syscall, or should there have been a pre-check returning
EINVAL in SYSCALL_DEFINE4(pkey_mprotect) before calling
do_mprotect_pkey()? I assume it's too late to change it now anyway (or
not?), so should we also document it?

Thanks,
Vlastimil

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: MPK: removing a pkey
@ 2017-11-22 12:15       ` Florian Weimer
  0 siblings, 0 replies; 57+ messages in thread
From: Florian Weimer @ 2017-11-22 12:15 UTC (permalink / raw)
  To: Vlastimil Babka, Dave Hansen, linux-x86_64, linux-arch
  Cc: linux-mm, Linux API

On 11/22/2017 09:18 AM, Vlastimil Babka wrote:
> And, was the pkey == -1 internal wiring supposed to be exposed to the
> pkey_mprotect() syscall, or should there have been a pre-check returning
> EINVAL in SYSCALL_DEFINE4(pkey_mprotect) before calling
> do_mprotect_pkey()? I assume it's too late to change it now anyway (or
> not?), so should we also document it?

I think the -1 case to set the default key is useful because it
allows you to use a key value of -1 to mean “MPK is not supported”, and
still call pkey_mprotect.

I plan to document this behavior on the glibc side, and glibc will call 
mprotect (not pkey_mprotect) for key -1, so that you won't get ENOSYS 
with kernels which do not support pkey_mprotect.
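
The dispatch described above could be sketched as follows. This is an
illustration under assumptions, not glibc's actual source:
compat_pkey_mprotect and demo_default_key are hypothetical names, and the
raw system call path assumes the installed kernel headers define
SYS_pkey_mprotect.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Route key -1 to plain mprotect() so that callers never see ENOSYS for
   that case; any other key goes to the pkey_mprotect system call. */
static int compat_pkey_mprotect(void *addr, size_t len, int prot, int pkey)
{
    if (pkey == -1)
        return mprotect(addr, len, prot);

#ifdef SYS_pkey_mprotect
    return (int)syscall(SYS_pkey_mprotect, addr, len, prot, pkey);
#else
    errno = ENOSYS;  /* headers predate the pkey system calls */
    return -1;
#endif
}

/* Usage sketch: the -1 path does not depend on MPK support. */
static int demo_default_key(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(p != MAP_FAILED);

    int rc = compat_pkey_mprotect(p, page, PROT_READ, -1);
    munmap(p, page);
    return rc;
}
```

With this shape, code that stores -1 as its "no key" value can call the
same function on every system and only takes the pkey path when it really
holds a key.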

Thanks,
Florian

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: MPK: removing a pkey
@ 2017-11-22 12:46         ` Vlastimil Babka
  0 siblings, 0 replies; 57+ messages in thread
From: Vlastimil Babka @ 2017-11-22 12:46 UTC (permalink / raw)
  To: Florian Weimer, Dave Hansen, linux-x86_64, linux-arch; +Cc: linux-mm, Linux API

On 11/22/2017 01:15 PM, Florian Weimer wrote:
> On 11/22/2017 09:18 AM, Vlastimil Babka wrote:
>> And, was the pkey == -1 internal wiring supposed to be exposed to the
>> pkey_mprotect() syscall, or should there have been a pre-check returning
>> EINVAL in SYSCALL_DEFINE4(pkey_mprotect) before calling
>> do_mprotect_pkey()? I assume it's too late to change it now anyway (or
>> not?), so should we also document it?
>
> I think the -1 case to set the default key is useful because it
> allows you to use a key value of -1 to mean “MPK is not supported”, and
> still call pkey_mprotect.

Hmm, the current manpage says that when MPK is not supported, pkey has to
be specified as 0. Which is a value that doesn't work when MPK *is*
supported. So -1 is more universal indeed.

> I plan to document this behavior on the glibc side, and glibc will call 
> mprotect (not pkey_mprotect) for key -1, so that you won't get ENOSYS 
> with kernels which do not support pkey_mprotect.

Fair enough. What will you do about pkey_alloc() in that case, emulate
ENOSPC? Oh, the manpage already suggests so. And the return value in
that case is... -1. Makes sense :)

> Thanks,
> Florian
> --
> To unsubscribe from this list: send the line "unsubscribe linux-api" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: MPK: removing a pkey
@ 2017-11-22 12:49             ` Florian Weimer
  0 siblings, 0 replies; 57+ messages in thread
From: Florian Weimer @ 2017-11-22 12:49 UTC (permalink / raw)
  To: Vlastimil Babka, Dave Hansen, linux-x86_64, linux-arch
  Cc: linux-mm, Linux API

On 11/22/2017 01:46 PM, Vlastimil Babka wrote:
> On 11/22/2017 01:15 PM, Florian Weimer wrote:
>> On 11/22/2017 09:18 AM, Vlastimil Babka wrote:
>>> And, was the pkey == -1 internal wiring supposed to be exposed to the
>>> pkey_mprotect() syscall, or should there have been a pre-check returning
>>> EINVAL in SYSCALL_DEFINE4(pkey_mprotect) before calling
>>> do_mprotect_pkey()? I assume it's too late to change it now anyway (or
>>> not?), so should we also document it?
>>
>> I think the -1 case to set the default key is useful because it
>> allows you to use a key value of -1 to mean “MPK is not supported”, and
>> still call pkey_mprotect.
>
> Hmm, the current manpage says that when MPK is not supported, pkey has to
> be specified as 0. Which is a value that doesn't work when MPK *is*
> supported. So -1 is more universal indeed.

-1 also chooses a different key if key 0 does not support the requested
protection flags.

>> I plan to document this behavior on the glibc side, and glibc will call
>> mprotect (not pkey_mprotect) for key -1, so that you won't get ENOSYS
>> with kernels which do not support pkey_mprotect.
> 
> Fair enough. What will you do about pkey_alloc() in that case, emulate
> ENOSPC? Oh, the manpage already suggests so. And the return value in
> that case is... -1. Makes sense :)

The manual page is incorrect, the kernel actually returns EINVAL. 
Applications should check for EINVAL (and also ENOSYS) and activate 
fallback code.  Using -1 directly would be a bit reckless IMHO.
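
The errno check suggested above could be sketched like this. The names
mpk_probe, classify_pkey_alloc_errno and probe_pkey are hypothetical, and
the mapping of ENOSPC to "all keys in use" is an assumption added for
completeness; only the EINVAL and ENOSYS cases come from the message above.
The sketch assumes the kernel headers define SYS_pkey_alloc and treats the
call as unsupported when they do not.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

enum mpk_probe { MPK_USABLE, MPK_UNSUPPORTED, MPK_EXHAUSTED, MPK_ERROR };

/* Map an errno value from pkey_alloc to the action the caller should take. */
static enum mpk_probe classify_pkey_alloc_errno(int err)
{
    switch (err) {
    case 0:
        return MPK_USABLE;
    case ENOSYS:  /* kernel has no pkey system calls */
    case EINVAL:  /* with valid arguments: reported above for missing MPK */
        return MPK_UNSUPPORTED;
    case ENOSPC:  /* assumption: every key is already allocated */
        return MPK_EXHAUSTED;
    default:
        return MPK_ERROR;
    }
}

/* Try to allocate a key. *key receives the key on success and -1 on any
   failure, so callers switch to fallback code on MPK_UNSUPPORTED. */
static enum mpk_probe probe_pkey(int *key)
{
    assert(key != NULL);
#ifdef SYS_pkey_alloc
    long rc = syscall(SYS_pkey_alloc, 0UL, 0UL);
    int err = rc < 0 ? errno : 0;
#else
    long rc = -1;
    int err = ENOSYS;
#endif
    *key = rc < 0 ? -1 : (int)rc;
    return classify_pkey_alloc_errno(err);
}
```

Keeping the classification separate from the system call makes the fallback
decision explicit, instead of letting a bare -1 return value double as a key.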

Thanks
Florian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: MPK: removing a pkey
  2017-11-22 12:15       ` Florian Weimer
@ 2017-11-22 16:10         ` Dave Hansen
  -1 siblings, 0 replies; 57+ messages in thread
From: Dave Hansen @ 2017-11-22 16:10 UTC (permalink / raw)
  To: Florian Weimer, Vlastimil Babka, linux-x86_64, linux-arch
  Cc: linux-mm, Linux API

On 11/22/2017 04:15 AM, Florian Weimer wrote:
> On 11/22/2017 09:18 AM, Vlastimil Babka wrote:
>> And, was the pkey == -1 internal wiring supposed to be exposed to the
>> pkey_mprotect() syscall, or should there have been a pre-check returning
>> EINVAL in SYSCALL_DEFINE4(pkey_mprotect) before calling
>> do_mprotect_pkey()? I assume it's too late to change it now anyway (or
>> not?), so should we also document it?
>
> I think the -1 case to set the default key is useful because it
> allows you to use a key value of -1 to mean “MPK is not supported”, and
> still call pkey_mprotect.

The behavior to not allow 0 to be set was unintentional and is a bug.
We should fix that.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: MPK: removing a pkey
@ 2017-11-22 16:21           ` Florian Weimer
  0 siblings, 0 replies; 57+ messages in thread
From: Florian Weimer @ 2017-11-22 16:21 UTC (permalink / raw)
  To: Dave Hansen, Vlastimil Babka, linux-x86_64, linux-arch
  Cc: linux-mm, Linux API

On 11/22/2017 05:10 PM, Dave Hansen wrote:
> On 11/22/2017 04:15 AM, Florian Weimer wrote:
>> On 11/22/2017 09:18 AM, Vlastimil Babka wrote:
>>> And, was the pkey == -1 internal wiring supposed to be exposed to the
>>> pkey_mprotect() syscall, or should there have been a pre-check returning
>>> EINVAL in SYSCALL_DEFINE4(pkey_mprotect) before calling
>>> do_mprotect_pkey()? I assume it's too late to change it now anyway (or
>>> not?), so should we also document it?
>>
>> I think the -1 case to set the default key is useful because it
>> allows you to use a key value of -1 to mean “MPK is not supported”, and
>> still call pkey_mprotect.
> 
> The behavior to not allow 0 to be set was unintentional and is a bug.
> We should fix that.

On the other hand, x86-64 has no single default protection key due to 
the PROT_EXEC emulation.

Florian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: MPK: removing a pkey
@ 2017-11-22 16:32               ` Dave Hansen
  0 siblings, 0 replies; 57+ messages in thread
From: Dave Hansen @ 2017-11-22 16:32 UTC (permalink / raw)
  To: Florian Weimer, Vlastimil Babka, linux-x86_64, linux-arch
  Cc: linux-mm, Linux API

On 11/22/2017 08:21 AM, Florian Weimer wrote:
> On 11/22/2017 05:10 PM, Dave Hansen wrote:
>> On 11/22/2017 04:15 AM, Florian Weimer wrote:
>>> On 11/22/2017 09:18 AM, Vlastimil Babka wrote:
>>>> And, was the pkey == -1 internal wiring supposed to be exposed to the
>>>> pkey_mprotect() signal, or should there have been a pre-check returning
>>>> EINVAL in SYSCALL_DEFINE4(pkey_mprotect), before calling
>>>> do_mprotect_pkey())? I assume it's too late to change it now anyway (or
>>>> not?), so should we also document it?
>>>
>>> I think the -1 case to the set the default key is useful because it
>>> allows you to use a key value of -1 to mean a??MPK is not supporteda??, and
>>> still call pkey_mprotect.
>>
>> The behavior to not allow 0 to be set was unintentional and is a bug.
>> We should fix that.
> 
> On the other hand, x86-64 has no single default protection key due to
> the PROT_EXEC emulation.

No, the default is clearly 0 and documented to be so.  The PROT_EXEC
emulation one should be inaccessible in all the APIs so does not even
show up as *being* a key in the API.  The fact that it's implemented
with pkeys should be pretty immaterial other than the fact that you
can't touch the high bits in PKRU.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: MPK: removing a pkey
  2017-11-22 16:32               ` Dave Hansen
@ 2017-11-23  8:11                 ` Vlastimil Babka
  -1 siblings, 0 replies; 57+ messages in thread
From: Vlastimil Babka @ 2017-11-23  8:11 UTC (permalink / raw)
  To: Dave Hansen, Florian Weimer, linux-x86_64, linux-arch; +Cc: linux-mm, Linux API

On 11/22/2017 05:32 PM, Dave Hansen wrote:
> On 11/22/2017 08:21 AM, Florian Weimer wrote:
>> On 11/22/2017 05:10 PM, Dave Hansen wrote:
>>> On 11/22/2017 04:15 AM, Florian Weimer wrote:
>>>> On 11/22/2017 09:18 AM, Vlastimil Babka wrote:
>>>>> And, was the pkey == -1 internal wiring supposed to be exposed to the
>>>>> pkey_mprotect() signal, or should there have been a pre-check returning
>>>>> EINVAL in SYSCALL_DEFINE4(pkey_mprotect), before calling
>>>>> do_mprotect_pkey())? I assume it's too late to change it now anyway (or
>>>>> not?), so should we also document it?
>>>>
>>>> I think the -1 case to the set the default key is useful because it
>>>> allows you to use a key value of -1 to mean “MPK is not supported”, and
>>>> still call pkey_mprotect.
>>>
>>> The behavior to not allow 0 to be set was unintentional and is a bug.
>>> We should fix that.
>>
>> On the other hand, x86-64 has no single default protection key due to
>> the PROT_EXEC emulation.
> 
> No, the default is clearly 0 and documented to be so.  The PROT_EXEC
> emulation one should be inaccessible in all the APIs so does not even
> show up as *being* a key in the API.  The fact that it's implemented
> with pkeys should be pretty immaterial other than the fact that you
> can't touch the high bits in PKRU.

So, just to be sure, if we call pkey_mprotect() with 0, will it blindly
set 0, or the result of arch_override_mprotect_pkey() (thus equivalent
to a call with -1)? I assume the latter?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: MPK: removing a pkey
  2017-11-22 16:32               ` Dave Hansen
@ 2017-11-23 12:38                 ` Florian Weimer
  -1 siblings, 0 replies; 57+ messages in thread
From: Florian Weimer @ 2017-11-23 12:38 UTC (permalink / raw)
  To: Dave Hansen, Vlastimil Babka, linux-x86_64, linux-arch
  Cc: linux-mm, Linux API

[-- Attachment #1: Type: text/plain, Size: 1906 bytes --]

On 11/22/2017 05:32 PM, Dave Hansen wrote:
> On 11/22/2017 08:21 AM, Florian Weimer wrote:
>> On 11/22/2017 05:10 PM, Dave Hansen wrote:
>>> On 11/22/2017 04:15 AM, Florian Weimer wrote:
>>>> On 11/22/2017 09:18 AM, Vlastimil Babka wrote:
>>>>> And, was the pkey == -1 internal wiring supposed to be exposed to the
>>>>> pkey_mprotect() signal, or should there have been a pre-check returning
>>>>> EINVAL in SYSCALL_DEFINE4(pkey_mprotect), before calling
>>>>> do_mprotect_pkey())? I assume it's too late to change it now anyway (or
>>>>> not?), so should we also document it?
>>>>
>>>> I think the -1 case to the set the default key is useful because it
>>>> allows you to use a key value of -1 to mean “MPK is not supported”, and
>>>> still call pkey_mprotect.
>>>
>>> The behavior to not allow 0 to be set was unintentional and is a bug.
>>> We should fix that.
>>
>> On the other hand, x86-64 has no single default protection key due to
>> the PROT_EXEC emulation.
> 
> No, the default is clearly 0 and documented to be so.  The PROT_EXEC
> emulation one should be inaccessible in all the APIs so does not even
> show up as *being* a key in the API.

I see key 1 in /proc for a PROT_EXEC mapping.  If I supply an explicit 
protection key, that key is used, and the page ends up having read 
access enabled.

The key is also visible in the siginfo_t argument on read access to a 
PROT_EXEC mapping with the default key, so it's not just /proc:

page 1 (0x7f008242d000): read access denied
   SIGSEGV address: 0x7f008242d000
   SIGSEGV code: 4
   SIGSEGV key: 1

I'm attaching my test.

 > The fact that it's implemented
 > with pkeys should be pretty immaterial other than the fact that you
 > can't touch the high bits in PKRU.

I don't see a restriction for PKRU updates.  If I write zero to the PKRU 
register, PROT_EXEC implies PROT_READ, as I would expect.

This is with kernel 4.14.

Florian

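To make the PKRU observations above easier to follow, here is a minimal sketch of how the register could be decoded, assuming the x86 layout of two bits per key (access-disable in the low bit, write-disable in the high bit). The names pkru_rights, pkru_allows_read, pkru_allows_write, PKRU_AD and PKRU_WD are hypothetical helpers for illustration only; they are not part of the attached test or of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the two per-key bits in PKRU.  */
#define PKRU_AD 0x1u		/* Access disable.  */
#define PKRU_WD 0x2u		/* Write disable.  */

/* Return the two rights bits that PKRU holds for KEY (0..15).  */
unsigned int
pkru_rights (uint32_t pkru, int key)
{
  assert (key >= 0 && key < 16);
  return (pkru >> (2 * key)) & (PKRU_AD | PKRU_WD);
}

/* Data reads are allowed when the access-disable bit is clear.  */
bool
pkru_allows_read (uint32_t pkru, int key)
{
  return (pkru_rights (pkru, key) & PKRU_AD) == 0;
}

/* Data writes need both the access-disable and write-disable bits clear.  */
bool
pkru_allows_write (uint32_t pkru, int key)
{
  return pkru_rights (pkru, key) == 0;
}
```

Under this layout a PKRU value of zero grants read and write access for every key, which matches the behavior reported after writing zero to the register. A value such as 0x55555554 (assumed here to be the initial value Linux gives a thread) denies data access for every key except key 0.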
[-- Attachment #2: mpk-default.c --]
[-- Type: text/x-csrc, Size: 4720 bytes --]

#include <err.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#define PKEY_DISABLE_ACCESS 1
#define PKEY_DISABLE_WRITE 2

__attribute__ ((weak, noinline, noclone)) /* Compiler barrier.  */
void
touch (void *buffer)
{
}

__attribute__ ((weak, noinline, noclone)) /* Compiler barrier.  */
void
read_page (void *page)
{
  char buf[16];
  memcpy (buf, page, sizeof (buf));
  touch (buf);
}

__attribute__ ((weak, noinline, noclone)) /* Compiler barrier.  */
void
write_page (void *page)
{
  memset (page, 0, 16);
  touch (page);
}

static volatile void *sigsegv_addr;
static volatile int sigsegv_code;
static volatile int sigsegv_pkey;
static sigjmp_buf sigsegv_jmp;

static void
sigsegv_handler (int signo, siginfo_t *info, void *arg)
{
  sigsegv_addr = info->si_addr;
  sigsegv_code = info->si_code;
  if (info->si_code == 4)	/* 4 is SEGV_PKUERR (protection key fault).  */
    {
      /* Guess the address of the protection key field.  */
      int *ppkey = 2 + ((int *)((&info->si_addr) + 1));
      sigsegv_pkey = *ppkey;
    }
  else
    sigsegv_pkey = -1;
  siglongjmp (sigsegv_jmp, 2);
}

static const struct sigaction sigsegv_sigaction =
  {
    .sa_flags = SA_RESETHAND | SA_SIGINFO,
    .sa_sigaction = &sigsegv_handler,
  };

/* Return the value of the PKRU register.  */
static inline unsigned int
pkey_read (void)
{
  unsigned int result;
  __asm__ volatile (".byte 0x0f, 0x01, 0xee"
                    : "=a" (result) : "c" (0) : "rdx");
  return result;
}

/* Overwrite the PKRU register with VALUE.  */
static inline void
pkey_write (unsigned int value)
{
  __asm__ volatile (".byte 0x0f, 0x01, 0xef"
                    : : "a" (value), "c" (0), "d" (0));
}

enum { page_count = 7 };
static void *pages[page_count];

static void
check_fault_1 (int page, const char *what, void (*op) (void *))
{
  unsigned pkru = pkey_read ();

  int result = sigsetjmp (sigsegv_jmp, 1);
  if (result == 0)
    {
      if (sigaction (SIGSEGV, &sigsegv_sigaction, NULL) != 0)
	err (1, "sigaction");
      op (pages[page]);
      printf ("page %d (%p): %s access allowed\n", page, pages[page], what);
      return;
    }
  else
    {
      if (signal (SIGSEGV, SIG_DFL) == SIG_ERR)
	err (1, "signal");
      printf ("page %d (%p): %s access denied\n", page, pages[page], what);
      printf ("  SIGSEGV address: %p\n", sigsegv_addr);
      printf ("  SIGSEGV code: %d\n", sigsegv_code);
      printf ("  SIGSEGV key: %d\n", sigsegv_pkey);
    }

  /* Preserve PKRU register value (clobbered by signal handler).  */
  pkey_write (pkru);
}

static void
check_fault (int page)
{
  check_fault_1 (page, "read", read_page);
  check_fault_1 (page, "write", write_page);
}

static void
dump_smaps (const char *what)
{
  printf ("info: *** BEGIN %s ***\n", what);
  FILE *fp = fopen ("/proc/self/smaps", "r");
  if (fp == NULL)
    err (1, "fopen");
  while (true)
    {
      int ch = fgetc (fp);
      if (ch == EOF)
	break;
      fputc (ch, stdout);
    }
  if (ferror (fp))
    err (1, "fgetc");
  if (fclose (fp) != 0)
    err (1, "fclose");
  printf ("info: *** END %s ***\n", what);
  fflush (stdout);
}

int
main (void)
{
  int protections[page_count] = 
    { PROT_READ | PROT_WRITE, PROT_EXEC, PROT_READ, PROT_READ,
      PROT_EXEC | PROT_WRITE, PROT_EXEC | PROT_WRITE, PROT_EXEC };
  for (int i = 0; i < page_count; ++i)
    {
      pages[i] = mmap (NULL, 1, protections[i],
		       MAP_ANON | MAP_PRIVATE, -1, 0);
      if (pages[i] == MAP_FAILED)
	err (1, "mmap");
      printf ("page %d: %p\n", i, pages[i]);
    }
      
  int key = syscall (SYS_pkey_alloc, 0, 0);
  if (key < 0)
    err (1, "pkey_alloc");
  printf ("key: %d\n", key);

  if (syscall (SYS_pkey_mprotect, pages[2], 1, PROT_READ, key) != 0)
    err (1, "pkey_mprotect (pages[2])");
  if (syscall (SYS_pkey_mprotect, pages[3], 1, PROT_EXEC, key) != 0)
    err (1, "pkey_mprotect (pages[3])");
  if (syscall (SYS_pkey_mprotect, pages[5], 1, PROT_EXEC | PROT_WRITE, key)
      != 0)
    err (1, "pkey_mprotect (pages[5])");
  /* Page 6: assign the explicit key, then revert to the default with -1.  */
  if (syscall (SYS_pkey_mprotect, pages[6], 1, PROT_EXEC, key) != 0)
    err (1, "pkey_mprotect (pages[6], key)");
  if (syscall (SYS_pkey_mprotect, pages[6], 1, PROT_EXEC, -1) != 0)
    err (1, "pkey_mprotect (pages[6], -1)");

  dump_smaps ("dump before faults");

  /* Probe read and write access to every page with the initial PKRU.  */
  puts ("info: performing accesses");
  fflush (stdout);
  for (int i = 0; i < page_count; ++i)
    check_fault (i);

  /* See what happens if we grant all access rights.  */
  puts ("info: setting PKRU to zero");
  fflush (stdout);
  pkey_write (0);

  for (int i = 0; i < page_count; ++i)
    check_fault (i);

  return 0;
}

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: MPK: pkey_free and key reuse
  2017-11-09 16:59       ` Dave Hansen
@ 2017-11-23 12:48         ` Florian Weimer
  -1 siblings, 0 replies; 57+ messages in thread
From: Florian Weimer @ 2017-11-23 12:48 UTC (permalink / raw)
  To: Dave Hansen, linux-x86_64, linux-arch; +Cc: linux-mm, Linux API

On 11/09/2017 05:59 PM, Dave Hansen wrote:
> On 11/09/2017 06:48 AM, Florian Weimer wrote:
>> On 11/08/2017 09:41 PM, Dave Hansen wrote:
>>>> (B) If a key is reused, existing threads retain their access rights,
>>>> while there is an expectation that pkey_alloc denies access for the
>>>> threads except the current one.
>>> Where does this expectation come from?
>>
>> For me, it was the access_rights argument to pkey_alloc.  What else
>> would it do?  For the current thread, I can already set the rights with
>> a PKRU write, so the existence of the syscall argument is puzzling.
> 
> The manpage is pretty bare here.  But the thought was that, in most
> cases, you will want to allocate a key and start using it immediately.
> This was in response to some feedback on one of the earlier reviews of
> the patch set.

Okay.  In the future, we may want to use these access rights to specify the
default for the signal handler (with a new pkey_alloc flag).  If I can set
the default access rights, that would pretty much solve the sigsetjmp
problem for me, too, and I can start using protection keys in low-level
glibc code.

>>> Using the malloc() analogy, we
>>> don't expect that free() in one thread actively takes away references to
>>> the memory held by other threads.
>>
>> But malloc/free isn't expected to be a partial antidote to random
>> pointer scribbling.
> 
> Nor is protection keys intended to be an antidote for use-after-free.

I'm comparing this to munmap, which is actually such an antidote 
(because it involves an IPI to flush all CPUs which could have seen the 
mapping before).

I'm surprised that pkey_free doesn't perform a similar broadcast.

>> I think we should either implement revoke on pkey_alloc, with a
>> broadcast to all threads (the pkey_set race can be closed by having a
>> vDSO for that and the revocation code can check %rip to see if the old
>> PKRU value needs to be fixed up).  Or we add the two pkey_alloc flags I
>> mentioned earlier.
> 
> That sounds awfully complicated to put in-kernel.  I'd be happy to
> review the patches after you put them together once we see how it looks.

TLB flushes are complicated, too, and very costly, but we still do them 
on unmap, even in cases where they are not required for security reasons.

> You basically want threads to broadcast their PKRU values at pkey_free()
> time.  That's totally doable... in userspace.  You just need a mechanism
> for each thread to periodically check if they need an update.

No, we want the revocation to be immediate, so we'd have to use
something like the setxid broadcast, and we have to make sure that we
aren't in a pkey_set, and if we are, adjust register contents besides
PKRU.  Not pretty at all.  I really don't want to implement that.

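The pkey_set race mentioned above can be replayed on a plain integer standing in for PKRU. This is only a sketch under stated assumptions: pkru_replace_rights and simulate_lost_revocation are hypothetical names, the register is simulated rather than accessed with RDPKRU/WRPKRU, and the "revoker" step stands for whatever mechanism would set the access-disable bit from outside the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Return OLD with the two rights bits of KEY replaced by RIGHTS.
   This is the "modify" step of a user-space read-modify-write.  */
uint32_t
pkru_replace_rights (uint32_t old, int key, unsigned int rights)
{
  assert (key >= 0 && key < 16 && rights <= 3);
  uint32_t mask = (uint32_t) 3 << (2 * key);
  return (old & ~mask) | ((uint32_t) rights << (2 * key));
}

/* Replay one unlucky interleaving and return the final register value.
   The thread only wants to change OTHER_KEY, yet it undoes the
   revocation of REVOKED_KEY because it writes back a stale copy.  */
uint32_t
simulate_lost_revocation (uint32_t pkru, int revoked_key, int other_key)
{
  /* 1. Thread reads the register (RDPKRU).  */
  uint32_t stale = pkru;

  /* 2. Revoker sets access-disable for REVOKED_KEY behind its back.  */
  pkru = pkru_replace_rights (pkru, revoked_key, 1);

  /* 3. Thread writes its update, computed from the stale read (WRPKRU).  */
  pkru = pkru_replace_rights (stale, other_key, 0);

  return pkru;
}
```

Starting from a register value of zero, step 2 sets the access-disable bit for the revoked key, and step 3 clears it again. That is the sense in which an external revocation cannot be made reliable without also fixing up the interrupted update.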
If the broadcast is lazy, I think it defeats its purpose because you 
don't know what kind of access privileges other threads in the system have.

Your solution to all MPK problems seems to be to say that it's undefined 
and applications shouldn't do that.  But if applications only used 
well-defined memory accesses, why would we need MPK?

Thanks,
Florian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: MPK: pkey_free and key reuse
@ 2017-11-23 13:07           ` Vlastimil Babka
  0 siblings, 0 replies; 57+ messages in thread
From: Vlastimil Babka @ 2017-11-23 13:07 UTC (permalink / raw)
  To: Florian Weimer, Dave Hansen, linux-x86_64, linux-arch; +Cc: linux-mm, Linux API

On 11/23/2017 01:48 PM, Florian Weimer wrote:
>>>> Using the malloc() analogy, we
>>>> don't expect that free() in one thread actively takes away references to
>>>> the memory held by other threads.
>>>
>>> But malloc/free isn't expected to be a partial antidote to random
>>> pointer scribbling.
>>
>> Nor is protection keys intended to be an antidote for use-after-free.
> 
> I'm comparing this to munmap, which is actually such an antidote 
> (because it involves an IPI to flush all CPUs which could have seen the 
> mapping before).
> 
> I'm surprised that pkey_free doesn't perform a similar broadcast.

Hmm, I'm not sure this comparison is accurate. IPI flushes in unmap are
done because the shared page tables were updated, and TLBs on other
CPUs might be stale. The closest pkey equivalent would be allocating a
new pkey that only my thread can use, and then using it in
pkey_mprotect() to change some memory region. Then other threads will
lose access and I believe IPIs will be issued and existing TLB mappings
on other CPUs removed.

pkey_free() has AFAICS two potential problems:
- the key is still used in some page tables. Scanning them all and
resetting to 0 would be rather expensive. Maybe we could maintain
per-pkey counters (for pkey != 0) in the mm, which might not be that
expensive, and refuse pkey_free() if the counter is not zero?
- the key is still "used" by other threads in their PKRU. Here I would
think that if the kernel doesn't broadcast pkey_alloc() to other threads,
it also shouldn't broadcast the freeing? We also can't track per-pkey
"threads using pkey" counters, as WRPKRU is pure userspace.

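The per-pkey counter idea from the first point can be sketched as a small user-space model. Everything here is an assumption made for illustration: struct pkey_table and the table_* functions are hypothetical names, the model has no locking, and returning -EBUSY from the free path is a guess at how a refusal might be reported, not existing kernel behavior.

```c
#include <assert.h>
#include <errno.h>

enum { NR_PKEYS = 16 };

/* Hypothetical per-mm bookkeeping.  */
struct pkey_table
{
  unsigned int allocated;		/* Bit N set: key N is allocated.  */
  unsigned long mappings[NR_PKEYS];	/* Mappings tagged with each key.  */
};

/* Hand out the lowest free key; key 0 is the default and never returned.  */
int
table_alloc (struct pkey_table *t)
{
  for (int key = 1; key < NR_PKEYS; ++key)
    if ((t->allocated & (1u << key)) == 0)
      {
	t->allocated |= 1u << key;
	return key;
      }
  return -ENOSPC;
}

/* Record that one more mapping carries KEY.  */
void
table_tag (struct pkey_table *t, int key)
{
  assert (key > 0 && key < NR_PKEYS && (t->allocated & (1u << key)) != 0);
  ++t->mappings[key];
}

/* Record that a mapping carrying KEY was removed or re-keyed.  */
void
table_untag (struct pkey_table *t, int key)
{
  assert (key > 0 && key < NR_PKEYS && t->mappings[key] > 0);
  --t->mappings[key];
}

/* Refuse to free a key while any mapping still uses it.  */
int
table_free (struct pkey_table *t, int key)
{
  if (key <= 0 || key >= NR_PKEYS || (t->allocated & (1u << key)) == 0)
    return -EINVAL;
  if (t->mappings[key] != 0)
    return -EBUSY;
  t->allocated &= ~(1u << key);
  return 0;
}
```

With this bookkeeping a key cannot be handed out again while old mappings still carry it, which addresses the first problem only. The second problem (stale rights in other threads' PKRU) is untouched by it.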
^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: MPK: removing a pkey
@ 2017-11-23 15:00                     ` Dave Hansen
  0 siblings, 0 replies; 57+ messages in thread
From: Dave Hansen @ 2017-11-23 15:00 UTC (permalink / raw)
  To: Vlastimil Babka, Florian Weimer, linux-x86_64, linux-arch
  Cc: linux-mm, Linux API

On 11/23/2017 12:11 AM, Vlastimil Babka wrote:
>> No, the default is clearly 0 and documented to be so.  The PROT_EXEC
>> emulation one should be inaccessible in all the APIs so does not even
>> show up as *being* a key in the API.  The fact that it's implemented
>> with pkeys should be pretty immaterial other than the fact that you
>> can't touch the high bits in PKRU.
> So, just to be sure, if we call pkey_mprotect() with 0, will it blindly
> set 0, or the result of arch_override_mprotect_pkey() (thus equivalent
> to call with -1) ? I assume the latter?

It's supposed to set 0.

-1 was, as far as I remember, an internal-to-the-kernel-only thing to
tell us that a key came from *mprotect()* instead of pkey_mprotect().

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: MPK: removing a pkey
  2017-11-23 12:38                 ` Florian Weimer
@ 2017-11-23 15:09                   ` Dave Hansen
  -1 siblings, 0 replies; 57+ messages in thread
From: Dave Hansen @ 2017-11-23 15:09 UTC (permalink / raw)
  To: Florian Weimer, Vlastimil Babka, linux-x86_64, linux-arch
  Cc: linux-mm, Linux API

On 11/23/2017 04:38 AM, Florian Weimer wrote:
> On 11/22/2017 05:32 PM, Dave Hansen wrote:
>> On 11/22/2017 08:21 AM, Florian Weimer wrote:
>>> On 11/22/2017 05:10 PM, Dave Hansen wrote:
>>>> On 11/22/2017 04:15 AM, Florian Weimer wrote:
>>>>> On 11/22/2017 09:18 AM, Vlastimil Babka wrote:
>>>>>> And, was the pkey == -1 internal wiring supposed to be exposed to the
>>>>>> pkey_mprotect() syscall, or should there have been a pre-check
>>>>>> returning
>>>>>> EINVAL in SYSCALL_DEFINE4(pkey_mprotect), before calling
>>>>>> do_mprotect_pkey())? I assume it's too late to change it now
>>>>>> anyway (or
>>>>>> not?), so should we also document it?
>>>>>
>>>>> I think the -1 case to set the default key is useful because it
>>>>> allows you to use a key value of -1 to mean “MPK is not supported”,
>>>>> and
>>>>> still call pkey_mprotect.
>>>>
>>>> The behavior to not allow 0 to be set was unintentional and is a bug.
>>>> We should fix that.
>>>
>>> On the other hand, x86-64 has no single default protection key due to
>>> the PROT_EXEC emulation.
>>
>> No, the default is clearly 0 and documented to be so.  The PROT_EXEC
>> emulation one should be inaccessible in all the APIs so does not even
>> show up as *being* a key in the API.

I should have been more explicit: the EXEC pkey does not show up in the
syscall API.

> I see key 1 in /proc for a PROT_EXEC mapping.  If I supply an explicit
> protection key, that key is used, and the page ends up having read
> access enabled.
> 
> The key is also visible in the siginfo_t argument on read access to a
> PROT_EXEC mapping with the default key, so it's not just /proc:
> 
> page 1 (0x7f008242d000): read access denied
>   SIGSEGV address: 0x7f008242d000
>   SIGSEGV code: 4
>   SIGSEGV key: 1
> 
> I'm attaching my test.

Yes, it is exposed there.  But, as a non-allocated pkey, the intention
in the kernel was to make sure that it could not be passed to the syscalls.

If that behavior is broken, we should probably fix it.

>> The fact that it's implemented
>> with pkeys should be pretty immaterial other than the fact that you
>> can't touch the high bits in PKRU.
> 
> I don't see a restriction for PKRU updates.  If I write zero to the PKRU
> register, PROT_EXEC implies PROT_READ, as I would expect.

I'll rephrase:
	
	The fact that it's implemented with pkeys should be pretty
	immaterial other than the fact that you must not touch the bits
	controlling PROT_EXEC in PKRU if you want to keep it working.

There is no restriction which is *enforced*.  It's just documented.
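
A minimal sketch of what "not touching" another key's bits means in
practice. PKRU holds two bits per key: access-disable at bit 2*pkey and
write-disable at bit 2*pkey+1. The helper names below are hypothetical,
and the register is modelled as a plain integer so the code runs
without RDPKRU/WRPKRU.

```c
#include <assert.h>
#include <stdint.h>

#define NR_PKEYS 16
#define PKRU_AD  0x1u  /* access-disable bit within a key's 2-bit field */
#define PKRU_WD  0x2u  /* write-disable bit within a key's 2-bit field */

/* Return pkru with only pkey's two bits replaced by rights. */
static uint32_t pkru_set_rights(uint32_t pkru, int pkey, uint32_t rights)
{
    assert(pkey >= 0 && pkey < NR_PKEYS);
    assert((rights & ~(PKRU_AD | PKRU_WD)) == 0);
    unsigned int shift = 2u * (unsigned int)pkey;

    pkru &= ~(0x3u << shift);   /* clear this key's field only */
    pkru |= rights << shift;    /* every other key keeps its bits */
    return pkru;
}

/* Extract the two rights bits for pkey. */
static uint32_t pkru_get_rights(uint32_t pkru, int pkey)
{
    assert(pkey >= 0 && pkey < NR_PKEYS);
    return (pkru >> (2u * (unsigned int)pkey)) & 0x3u;
}
```

Writing a whole new value such as zero, as in the quoted test, clears
the access-disable bit of the PROT_EXEC key as well, which is why
PROT_EXEC then implies PROT_READ again.
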

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: MPK: pkey_free and key reuse
  2017-11-23 12:48         ` Florian Weimer
@ 2017-11-23 15:25           ` Dave Hansen
  -1 siblings, 0 replies; 57+ messages in thread
From: Dave Hansen @ 2017-11-23 15:25 UTC (permalink / raw)
  To: Florian Weimer, linux-x86_64, linux-arch; +Cc: linux-mm, Linux API

On 11/23/2017 04:48 AM, Florian Weimer wrote:
> On 11/09/2017 05:59 PM, Dave Hansen wrote:
>> The manpage is pretty bare here.  But the thought was that, in most
>> cases, you will want to allocate a key and start using it immediately.
>> This was in response to some feedback on one of the earlier reviews of
>> the patch set.
> 
> Okay.  In the future, we may want to use these access rights to specify
> the default for the signal handler (with a new pkey_alloc flag).  If I can
> set the default access rights, that would pretty much solve the sigsetjmp
> problem for me, too, and I can start using protection keys in low-level
> glibc code.

I haven't thought about this much in a year or so, but I think this is
doable.

One bit of advice: please look at features when they go into the
kernel.  Your feedback has been valuable, but not very timely.  I
promise you'll get better results if you give feedback when patches are
being posted rather than when they've been in the kernel for a year.

>>> I think we should either implement revoke on pkey_alloc, with a
>>> broadcast to all threads (the pkey_set race can be closed by having a
>>> vDSO for that and the revocation code can check %rip to see if the old
>>> PKRU value needs to be fixed up).  Or we add the two pkey_alloc flags I
>>> mentioned earlier.
>>
>> That sounds awfully complicated to put in-kernel.  I'd be happy to
>> review the patches after you put them together once we see how it looks.
> 
> TLB flushes are complicated, too, and very costly, but we still do them
> on unmap, even in cases where they are not required for security reasons.

I'll also note that TLB flushes are transparent to software.  What you
are suggesting is not.  That makes it a *LOT* more difficult to implement.

If you have an idea how to do this, I'll happily review patches!

>> You basically want threads to broadcast their PKRU values at pkey_free()
>> time.  That's totally doable... in userspace.  You just need a mechanism
>> for each thread to periodically check if they need an update.
> 
> No, we want the revocation to be immediate, so we'd have to use
> something like the setxid broadcast, and we have to make sure that we
> aren't in a pkey_set, and if we are, adjust register contents besides
> PKRU.  Not pretty at all.  I really don't want to implement that.
> 
> If the broadcast is lazy, I think it defeats its purpose because you
> don't know what kind of access privileges other threads in the system have.
> 
> Your solution to all MPK problems seems to be to say that it's undefined
> and applications shouldn't do that.  But if applications only used
> well-defined memory accesses, why would we need MPK?

BTW, I never call this feature MPK because it looks too much like MPX
and they have nothing to do with each other.  I'd recommend the same to
you.  It keeps your audience less confused.

I understand there is some distaste for where the implementation
settled.  I don't like it, either, in a lot of ways.  If I were to
re-architect it in the CPU, I certainly wouldn't have a user-visible PKRU
and would have found a way to avoid the signal PKRU issues.  But, that
ship has sailed.

I don't see a way to do a broadcast PKRU update.  But, I'd love to be
proven wrong, with code.
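
For illustration, the lazy userspace scheme mentioned in the quoted
text (each thread periodically checks whether it needs an update) could
look roughly like the sketch below. All names are hypothetical, PKRU is
modelled as a plain integer instead of being written with WRPKRU, and
the mask is never cleared when a key is allocated again, which a real
design would have to handle.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical process-wide state, published by the thread freeing a key. */
static _Atomic uint32_t revoke_mask;        /* PKRU bits every thread must set */
static _Atomic unsigned int revoke_generation;

/* Last generation the calling thread has applied. */
static _Thread_local unsigned int seen_generation;

/* Called by the freeing thread: request that all threads lose pkey. */
static void publish_revocation(int pkey)
{
    assert(pkey > 0 && pkey < 16);
    /* Update the mask first so a reader that sees the new generation
       also sees the new mask. */
    atomic_fetch_or(&revoke_mask, 0x3u << (2 * pkey));
    atomic_fetch_add(&revoke_generation, 1);
}

/* Called periodically by every thread; returns the PKRU value to load. */
static uint32_t apply_pending_revocations(uint32_t pkru)
{
    unsigned int gen = atomic_load(&revoke_generation);

    if (gen == seen_generation)
        return pkru;                 /* nothing new since the last check */
    seen_generation = gen;
    return pkru | atomic_load(&revoke_mask);
}
```

Between publish_revocation() and a thread's next check, that thread
still holds its old access rights. That window is the objection raised
in the quoted text against a lazy broadcast.
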

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: MPK: removing a pkey
  2017-11-23 15:00                     ` Dave Hansen
@ 2017-11-23 21:42                       ` Vlastimil Babka
  -1 siblings, 0 replies; 57+ messages in thread
From: Vlastimil Babka @ 2017-11-23 21:42 UTC (permalink / raw)
  To: Dave Hansen, Florian Weimer, linux-x86_64, linux-arch; +Cc: linux-mm, Linux API

On 11/23/2017 04:00 PM, Dave Hansen wrote:
> On 11/23/2017 12:11 AM, Vlastimil Babka wrote:
>>> No, the default is clearly 0 and documented to be so.  The PROT_EXEC
>>> emulation one should be inaccessible in all the APIs so does not even
>>> show up as *being* a key in the API.  The fact that it's implemented
>>> with pkeys should be pretty immaterial other than the fact that you
>>> can't touch the high bits in PKRU.
>> So, just to be sure, if we call pkey_mprotect() with 0, will it blindly
>> set 0, or the result of arch_override_mprotect_pkey() (thus equivalent
>> to call with -1) ? I assume the latter?
> 
> It's supposed to set 0.
> 
> -1 was, as far as I remember, an internal-to-the-kernel-only thing to
> tell us that a key came from *mprotect()* instead of pkey_mprotect().

So, pkey_mprotect(..., 0) will set it to 0, regardless of PROT_EXEC.
pkey_mprotect(..., -1) or mprotect() will set it to 0-or-PROT_EXEC-pkey.

Can't shake the feeling that it's somewhat weird, but I guess it's
flexible at least. So just has to be well documented.
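
The rule summarized above can be written down as a small model. It
follows the summary in this thread rather than the kernel source;
effective_pkey() and its exec_only_pkey parameter are hypothetical
names, and the only case that differs from key 0 is a PROT_EXEC-only
request with no explicit key.

```c
#include <assert.h>
#include <sys/mman.h>   /* PROT_READ, PROT_WRITE, PROT_EXEC */

/*
 * requested_pkey: key passed to pkey_mprotect(), or -1 for plain mprotect().
 * exec_only_pkey: key reserved for PROT_EXEC emulation, or -1 if none
 *                 could be allocated.
 */
static int effective_pkey(int prot, int requested_pkey, int exec_only_pkey)
{
    assert(requested_pkey >= -1 && requested_pkey < 16);

    if (requested_pkey != -1)
        return requested_pkey;       /* explicit key wins, including 0 */
    if (prot == PROT_EXEC && exec_only_pkey > 0)
        return exec_only_pkey;       /* execute-only emulation */
    return 0;                        /* default key */
}
```

In this model effective_pkey(PROT_EXEC, 0, 1) is 0 and
effective_pkey(PROT_EXEC, -1, 1) is 1, matching the two lines of the
summary.
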

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: MPK: removing a pkey
@ 2017-11-23 23:29                           ` Dave Hansen
  0 siblings, 0 replies; 57+ messages in thread
From: Dave Hansen @ 2017-11-23 23:29 UTC (permalink / raw)
  To: Vlastimil Babka, Florian Weimer, linux-x86_64, linux-arch
  Cc: linux-mm, Linux API

On 11/23/2017 01:42 PM, Vlastimil Babka wrote:
>> It's supposed to set 0.
>>
>> -1 was, as far as I remember, an internal-to-the-kernel-only thing to
>> tell us that a key came from *mprotect()* instead of pkey_mprotect().
> So, pkey_mprotect(..., 0) will set it to 0, regardless of PROT_EXEC.

Although weird, the thought here was that pkey_mprotect() callers are
new and should know about the interactions with PROT_EXEC.  They can
also *get* PROT_EXEC semantics if they want.

The only wart here is if you do:

	mprotect(..., PROT_EXEC); // key 10 is now the PROT_EXEC key
	pkey_mprotect(..., PROT_EXEC, key=3);

I'm not sure what this does.  We should probably ensure that it returns
an error.

> pkey_mprotect(..., -1) or mprotect() will set it to 0-or-PROT_EXEC-pkey.
> 
> Can't shake the feeling that it's somewhat weird, but I guess it's
> flexible at least. So just has to be well documented.

It *is* weird.  But, layering on top of legacy APIs is often weird.  I
would have been open to other sane, but less weird ways to do it a year
ago. :)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: MPK: removing a pkey
@ 2017-11-24  8:35                             ` Florian Weimer
  0 siblings, 0 replies; 57+ messages in thread
From: Florian Weimer @ 2017-11-24  8:35 UTC (permalink / raw)
  To: Dave Hansen, Vlastimil Babka, linux-x86_64, linux-arch
  Cc: linux-mm, Linux API

On 11/24/2017 12:29 AM, Dave Hansen wrote:
> Although weird, the thought here was that pkey_mprotect() callers are
> new and should know about the interactions with PROT_EXEC.  They can
> also *get* PROT_EXEC semantics if they want.
> 
> The only wart here is if you do:
> 
> 	mprotect(..., PROT_EXEC); // key 10 is now the PROT_EXEC key

I thought the PROT_EXEC key is always 1?

> 	pkey_mprotect(..., PROT_EXEC, key=3);
> 
> I'm not sure what this does.  We should probably ensure that it returns
> an error.

Without protection key support, PROT_EXEC would imply PROT_READ with an 
ordinary mprotect.  I think it makes sense to stick to this behavior. 
It is what I have documented for glibc:

   <https://sourceware.org/ml/libc-alpha/2017-11/msg00841.html>

Thanks,
Florian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: MPK: removing a pkey
  2017-11-24  8:35                             ` Florian Weimer
@ 2017-11-24  8:38                               ` Vlastimil Babka
  -1 siblings, 0 replies; 57+ messages in thread
From: Vlastimil Babka @ 2017-11-24  8:38 UTC (permalink / raw)
  To: Florian Weimer, Dave Hansen, linux-x86_64, linux-arch; +Cc: linux-mm, Linux API

On 11/24/2017 09:35 AM, Florian Weimer wrote:
> On 11/24/2017 12:29 AM, Dave Hansen wrote:
>> Although weird, the thought here was that pkey_mprotect() callers are
>> new and should know about the interactions with PROT_EXEC.  They can
>> also *get* PROT_EXEC semantics if they want.
>>
>> The only wart here is if you do:
>>
>> 	mprotect(..., PROT_EXEC); // key 10 is now the PROT_EXEC key
> 
> I thought the PROT_EXEC key is always 1?

It seems to assign the first non-allocated one. That can even fail if
there are none left, and then there's no PROT_EXEC read protection. In
practice I expect a PROT_EXEC mapping to be created by the ELF loader (?)
before the program can even call pkey_alloc() itself, so it would be 1.
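
A minimal sketch of "first non-allocated key" allocation, assuming a
16-bit allocation map in which key 0 is always marked as taken. The
function name is hypothetical and this is a model, not the kernel's
allocator.

```c
#include <assert.h>

#define NR_PKEYS 16

/* Allocate the lowest free key in *alloc_map; return -1 if none is left. */
static int first_free_pkey(unsigned int *alloc_map)
{
    assert(alloc_map);

    for (int k = 0; k < NR_PKEYS; k++) {
        if (!(*alloc_map & (1u << k))) {
            *alloc_map |= 1u << k;   /* mark the key as allocated */
            return k;
        }
    }
    return -1;   /* all keys in use: no key for PROT_EXEC emulation */
}
```

With only key 0 taken, the first request gets key 1. If the program has
already allocated keys 1 through 9, the same request gets key 10, which
is the situation in the quoted example.
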

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: MPK: pkey_free and key reuse
@ 2017-11-24 14:55             ` Florian Weimer
  0 siblings, 0 replies; 57+ messages in thread
From: Florian Weimer @ 2017-11-24 14:55 UTC (permalink / raw)
  To: Dave Hansen, linux-x86_64, linux-arch; +Cc: linux-mm, Linux API

On 11/23/2017 04:25 PM, Dave Hansen wrote:
> I don't see a way to do a broadcast PKRU update.  But, I'd love to be
> proven wrong, with code.

I could use the existing setxid broadcast code in glibc to update PKRU 
on all running threads upon a key allocation (before pkey_alloc returns 
to the application), but this won't work for the implicit protection key 
used for PROT_EXEC.  I don't see a good way to get its number, and to 
determine whether a particular mprotect call allocated it.  (We 
obviously don't want to do the broadcast on every mprotect call with 
PROT_EXEC, just in case.)

What's worse, the setxid broadcast is not async-signal-safe, so we can't 
use it from mprotect, which had better be async-signal-safe (I know 
that officially it's not, but it would still be problematic to change 
that IMHO).

(The setxid broadcast mechanism allows us to run a piece of code on all 
threads of the process.  We could look at %rip and see if the signal 
arrived during a pkey_set function call, and make sure that this call 
delivers the right result, by altering the task state before returning.)
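
The fix-up described in the parenthetical above could be modelled as
follows. This is a sketch only: struct thread_snapshot, the scratch
register and the [set_start, set_end) address range are assumptions
made for illustration, not glibc internals, and the case where the
interrupted pkey_set call targets the revoked key itself is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of a thread stopped by the broadcast signal. */
struct thread_snapshot {
    uintptr_t pc;       /* interrupted instruction pointer (%rip) */
    uint32_t  pkru;     /* current PKRU contents */
    uint32_t  scratch;  /* register holding the value pkey_set will write */
};

/*
 * Deny access to pkey in the stopped thread.  [set_start, set_end) is the
 * assumed range of pkey_set between reading PKRU and writing it back.
 */
static void revoke_in_thread(struct thread_snapshot *t, int pkey,
                             uintptr_t set_start, uintptr_t set_end)
{
    assert(t && pkey > 0 && pkey < 16);
    uint32_t deny = 0x1u << (2 * pkey);   /* access-disable bit for pkey */

    t->pkru |= deny;
    /* Interrupted mid-update: the stale copy would be written back and
       undo the revocation, so patch it as well. */
    if (t->pc >= set_start && t->pc < set_end)
        t->scratch |= deny;
}
```
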

Thanks,
Florian

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2017-11-24 14:55 UTC | newest]

Thread overview: 57+ messages
2017-11-05 10:35 MPK: pkey_free and key reuse Florian Weimer
2017-11-05 10:35 ` Florian Weimer
2017-11-05 10:35 ` Florian Weimer
2017-11-08 20:41 ` Dave Hansen
2017-11-08 20:41   ` Dave Hansen
2017-11-09 14:48   ` Florian Weimer
2017-11-09 14:48     ` Florian Weimer
2017-11-09 14:48     ` Florian Weimer
2017-11-09 16:59     ` Dave Hansen
2017-11-09 16:59       ` Dave Hansen
2017-11-09 16:59       ` Dave Hansen
2017-11-23 12:48       ` Florian Weimer
2017-11-23 12:48         ` Florian Weimer
2017-11-23 13:07         ` Vlastimil Babka
2017-11-23 13:07           ` Vlastimil Babka
2017-11-23 15:25         ` Dave Hansen
2017-11-23 15:25           ` Dave Hansen
2017-11-24 14:55           ` Florian Weimer
2017-11-24 14:55             ` Florian Weimer
     [not found] ` <0f006ef4-a7b5-c0cf-5f58-d0fd1f911a54-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-11-22  8:18   ` MPK: removing a pkey (was: pkey_free and key reuse) Vlastimil Babka
2017-11-22  8:18     ` Vlastimil Babka
2017-11-22  8:18     ` Vlastimil Babka
2017-11-22 12:15     ` MPK: removing a pkey Florian Weimer
2017-11-22 12:15       ` Florian Weimer
2017-11-22 12:15       ` Florian Weimer
2017-11-22 12:46       ` Vlastimil Babka
2017-11-22 12:46         ` Vlastimil Babka
2017-11-22 12:46         ` Vlastimil Babka
     [not found]         ` <f0495f01-9821-ec36-56b4-333f109eb761-AlSwsSmVLrQ@public.gmane.org>
2017-11-22 12:49           ` Florian Weimer
2017-11-22 12:49             ` Florian Weimer
2017-11-22 12:49             ` Florian Weimer
2017-11-22 16:10       ` Dave Hansen
2017-11-22 16:10         ` Dave Hansen
2017-11-22 16:21         ` Florian Weimer
2017-11-22 16:21           ` Florian Weimer
2017-11-22 16:21           ` Florian Weimer
     [not found]           ` <9ec19ff3-86f6-7cfe-1a07-1ab1c5d9882c-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-11-22 16:32             ` Dave Hansen
2017-11-22 16:32               ` Dave Hansen
2017-11-22 16:32               ` Dave Hansen
2017-11-23  8:11               ` Vlastimil Babka
2017-11-23  8:11                 ` Vlastimil Babka
     [not found]                 ` <de93997a-7802-96cf-62e2-e59416e745ca-AlSwsSmVLrQ@public.gmane.org>
2017-11-23 15:00                   ` Dave Hansen
2017-11-23 15:00                     ` Dave Hansen
2017-11-23 15:00                     ` Dave Hansen
2017-11-23 21:42                     ` Vlastimil Babka
2017-11-23 21:42                       ` Vlastimil Babka
     [not found]                       ` <2d12777f-615a-8101-2156-cf861ec13aa7-AlSwsSmVLrQ@public.gmane.org>
2017-11-23 23:29                         ` Dave Hansen
2017-11-23 23:29                           ` Dave Hansen
2017-11-23 23:29                           ` Dave Hansen
2017-11-24  8:35                           ` Florian Weimer
2017-11-24  8:35                             ` Florian Weimer
2017-11-24  8:38                             ` Vlastimil Babka
2017-11-24  8:38                               ` Vlastimil Babka
2017-11-23 12:38               ` Florian Weimer
2017-11-23 12:38                 ` Florian Weimer
2017-11-23 15:09                 ` Dave Hansen
2017-11-23 15:09                   ` Dave Hansen
