* [RFC PATCH v3 0/5] getcpu_cache system call
@ 2016-01-28 21:01 ` Mathieu Desnoyers
  0 siblings, 0 replies; 15+ messages in thread
From: Mathieu Desnoyers @ 2016-01-28 21:01 UTC (permalink / raw)
  To: Thomas Gleixner, Paul Turner, Andrew Hunter, Peter Zijlstra
  Cc: linux-kernel, linux-api, Andy Lutomirski, Andi Kleen,
	Dave Watson, Chris Lameter, Ingo Molnar, Ben Maurer,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Linus Torvalds,
	Andrew Morton, Russell King, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Mathieu Desnoyers

Hi,

Here is an update to the getcpu_cache system call following the v2 round
of review. It implements a user-space cache of the CPU number on which
the calling thread is currently running.

Benchmarks show a 44x speedup on ARM compared to a getcpu based on a
system call through glibc, and a 14x speedup on x86-64 compared to
executing lsl from the vDSO through glibc.
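
For reference, here is a minimal sketch of the user-space fast path being
compared above. It assumes the calling thread has already registered the
__getcpu_cache_tls thread-local variable through the new system call,
following the convention described in the man page included in patch 1/5:

    #include <stdint.h>

    /* Registered via GETCPU_CACHE_SET; holds -1 until the kernel updates it. */
    extern __thread volatile int32_t __getcpu_cache_tls;

    static inline int32_t getcpu_fast(void)
    {
            /* Single volatile load; the kernel refreshes it on migration. */
            return __getcpu_cache_tls;
    }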

This patchset is sent as an RFC for feedback. It applies on Linux 4.4,
and I'm targeting the 4.6 merge window.

Feedback is welcome,

Thanks!

Mathieu

Mathieu Desnoyers (5):
  getcpu_cache system call: cache CPU number of running thread
  getcpu_cache: ARM resume notifier
  getcpu_cache: wire up ARM system call
  getcpu_cache: x86 32/64 resume notifier
  getcpu_cache: wire up x86 32/64 system call

 MAINTAINERS                            |   7 ++
 arch/arm/include/uapi/asm/unistd.h     |   1 +
 arch/arm/kernel/calls.S                |   1 +
 arch/arm/kernel/signal.c               |   1 +
 arch/x86/entry/common.c                |   1 +
 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 fs/exec.c                              |   1 +
 include/linux/sched.h                  |  36 ++++++++
 include/uapi/linux/Kbuild              |   1 +
 include/uapi/linux/getcpu_cache.h      |  42 +++++++++
 init/Kconfig                           |  10 +++
 kernel/Makefile                        |   1 +
 kernel/fork.c                          |   4 +
 kernel/getcpu_cache.c                  | 150 +++++++++++++++++++++++++++++++++
 kernel/sched/sched.h                   |   1 +
 kernel/sys_ni.c                        |   3 +
 17 files changed, 262 insertions(+)
 create mode 100644 include/uapi/linux/getcpu_cache.h
 create mode 100644 kernel/getcpu_cache.c

-- 
2.1.4

* [RFC PATCH v3 1/5] getcpu_cache system call: cache CPU number of running thread
  2016-01-28 21:01 ` Mathieu Desnoyers
@ 2016-01-28 21:01 ` Mathieu Desnoyers
  2016-01-28 21:52     ` Thomas Gleixner
  -1 siblings, 1 reply; 15+ messages in thread
From: Mathieu Desnoyers @ 2016-01-28 21:01 UTC (permalink / raw)
  To: Thomas Gleixner, Paul Turner, Andrew Hunter, Peter Zijlstra
  Cc: linux-kernel, linux-api, Andy Lutomirski, Andi Kleen,
	Dave Watson, Chris Lameter, Ingo Molnar, Ben Maurer,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Linus Torvalds,
	Andrew Morton, Russell King, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Mathieu Desnoyers

Expose a new system call allowing each thread to register one userspace
memory area in which to store the CPU number on which the calling thread
is running. Scheduler migration sets the TIF_NOTIFY_RESUME flag on the
current thread. Upon return to user-space, a notify-resume handler
updates the current CPU value within the registered user-space memory
area. User-space can then read the current CPU number directly from
memory.
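
Condensed from the implementation below (with the PF_EXITING check
omitted), the kernel-side flow is the following:

    /* Scheduler, on migration (see __set_task_cpu()): */
    if (t->cpu_cache)
            set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);

    /* Architecture notify-resume handler, on return to user-space: */
    if (t->cpu_cache && put_user(raw_smp_processor_id(), t->cpu_cache))
            force_sig(SIGSEGV, t);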

This getcpu cache is an improvement over the current mechanisms available
for reading the CPU number, and has the following benefits:

- 44x speedup on ARM vs a system call through glibc,
- 14x speedup on x86 compared to calling glibc, which calls into the vdso
  executing a "lsl" instruction,
- 11x speedup on x86 compared to an inlined "lsl" instruction,
- Unlike vdso approaches, this cached value can be read from inline
  assembly (see the sketch below), which makes it a useful building block
  for restartable sequences.
- The getcpu cache approach is portable (e.g. ARM), which is not the
  case for the lsl-based x86 vdso.
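
As a sketch of the inline assembly point above, a single-instruction read
of the registered cache on x86-64 could look as follows. This is
illustrative only; __getcpu_cache_tls follows the convention documented
in the man page below:

    static inline int32_t getcpu_cache_read(void)
    {
            int32_t cpu;

            /* One load from the thread-local cache; no call, no vdso. */
            asm volatile ("movl %1, %0" : "=r" (cpu) : "m" (__getcpu_cache_tls));
            return cpu;
    }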

On x86, yet another possible approach would be to use the gs segment
selector to point to user-space per-cpu data. This approach performs
similarly to the getcpu cache, but it has two disadvantages: it is
not portable, and it is incompatible with existing applications already
using the gs segment selector for other purposes.

This approach is inspired by Paul Turner and Andrew Hunter's work
on percpu atomics, which lets the kernel handle restart of critical
sections:
Ref.:
* https://lkml.org/lkml/2015/10/27/1095
* https://lkml.org/lkml/2015/6/24/665
* https://lwn.net/Articles/650333/
* http://www.linuxplumbersconf.org/2013/ocw/system/presentations/1695/original/LPC%20-%20PerCpu%20Atomics.pdf

Benchmarking various approaches for reading the current CPU number:

ARMv7 Processor rev 10 (v7l)
Machine model: Wandboard i.MX6 Quad Board
- Baseline (empty loop):               10.1 ns
- Read CPU from getcpu cache:          10.1 ns
- glibc 2.19-0ubuntu6.6 getcpu:       445.6 ns
- getcpu system call:                 322.2 ns

x86-64 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz:
- Baseline (empty loop):                1.0 ns
- Read CPU from getcpu cache:           1.0 ns
- Read using gs segment selector:       1.0 ns
- "lsl" inline assembly:               11.2 ns
- glibc 2.19-0ubuntu6.6 getcpu:        14.3 ns
- getcpu system call:                  51.0 ns
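
The per-read cost above can be estimated with a loop of the following
shape. This is only a sketch: the iteration count, clock source, and the
assumption that registration already succeeded are mine, not taken from
the benchmark program used for the numbers above:

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    __thread volatile int32_t __getcpu_cache_tls = -1;  /* registered elsewhere */

    int main(void)
    {
            const uint64_t loops = 100000000ULL;
            struct timespec t0, t1;
            uint64_t i, sum = 0;

            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (i = 0; i < loops; i++)
                    sum += __getcpu_cache_tls;      /* read under test */
            clock_gettime(CLOCK_MONOTONIC, &t1);

            printf("%.1f ns/read (sum=%llu)\n",
                   ((t1.tv_sec - t0.tv_sec) * 1e9 +
                    (t1.tv_nsec - t0.tv_nsec)) / loops,
                   (unsigned long long)sum);
            return 0;
    }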

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
CC: linux-api@vger.kernel.org
---

Changes since v1:
- Return -1, errno=EINVAL if cpu_cache pointer is not aligned on
  sizeof(int32_t).
- Update man page to describe the pointer alignment requirements and
  update atomicity guarantees.
- Add MAINTAINERS file GETCPU_CACHE entry.
- Remove dynamic memory allocation: go back to having a single
  getcpu_cache entry per thread. Update documentation accordingly.
- Rebased on Linux 4.4.

Changes since v2:
- Introduce a "cmd" argument, along with an enum with GETCPU_CACHE_GET
  and GETCPU_CACHE_SET. Introduce a uapi header linux/getcpu_cache.h
  defining this enumeration.
- Split resume notifier architecture implementation from the system call
  wire up in the following arch-specific patches.
- Man pages updates.
- Handle 32-bit compat pointers.
- Simplify handling of getcpu_cache GETCPU_CACHE_SET compiler barrier:
  set the current cpu cache pointer before doing the cache update, and
  set it back to NULL if the update fails. Setting it back to NULL on
  error ensures that no resume notifier will trigger a SIGSEGV if a
  migration happened concurrently.

Rationale for the getcpu_cache system call rather than the thread-local
ABI system call proposed earlier:

Rather than doing a "generic" thread-local ABI, specialize this system
call for a cpu number cache only. Anyway, the thread-local ABI approach
would have required that we introduce "feature" flags, which would have
ended up reimplementing multiplexing of features on top of a system
call. It seems better to introduce one system call per feature instead.

Man page associated:

GETCPU_CACHE(2)       Linux Programmer's Manual      GETCPU_CACHE(2)

NAME
       getcpu_cache  -  cache CPU number on which the calling thread
       is running

SYNOPSIS
       #include <linux/getcpu_cache.h>
       #include <stdint.h>

       int getcpu_cache(int cmd, int32_t **cpu_cachep, int flags);

DESCRIPTION
       The getcpu_cache() system call helps speed up reading the
       current CPU number by ensuring that the memory location
       registered by each user-space thread always contains the
       number of the CPU on which the thread is running when that
       memory location is read.

       The cmd argument is one of the following:

       GETCPU_CACHE_GET
              Get the pointer to the current cpu number  cache  into
              the   memory   location  targeted  by  the  cpu_cachep
              pointer.

       GETCPU_CACHE_SET
              Attempt to set the current cpu number cache  by  using
              the pointer located in the memory location targeted by
              the cpu_cachep pointer. This pointer must  be  aligned
              on 4-byte multiples (natural alignment).

       The cpu_cachep argument is a pointer to an int32_t pointer.
       It is used as an output argument for GETCPU_CACHE_GET, and as
       an input argument for GETCPU_CACHE_SET.

       The  flags argument is currently unused and must be specified
       as 0.

       Typically, a library or application will put the  cpu  number
       cache  in  a  thread-local  storage variable, or other memory
       areas belonging to each thread. It is recommended to  perform
       a  volatile  read of the cpu number cache to prevent the com‐
       piler from doing load tearing. An alternative approach is  to
       read  the  cpu  number cache from inline assembly in a single
       instruction.

       Each thread is responsible for registering its own cpu number
       cache.   Only  one  cpu  cache  address can be registered per
       thread.

       The symbol  __getcpu_cache_tls  is  recommended  to  be  used
       across  libraries  and  applications  wishing  to  register a
       thread-local getcpu_cache. The  attribute  "weak"  is  recom‐
       mended  when  declaring this variable in libraries.  Applica‐
       tions can choose to define their own version of  this  symbol
       without the weak attribute as a performance improvement.

       In  a  typical usage scenario, the thread registering the cpu
       number cache will be performing reads from that cache. It  is
       however  also allowed to read the cpu number cache from other
       threads. The cpu number cache updates performed by the kernel
       provide single-copy atomicity semantics, which guarantee that
       other threads performing single-copy atomic reads of the  cpu
       number cache will always observe a consistent value.

       Memory registered as cpu number cache should never be deallo‐
       cated before the thread which registered it  exits:  specifi‐
       cally, it should not be freed, and the library containing the
       registered thread-local storage should not be dlclose'd.

       Unregistration of the associated cpu cache is implicitly
       performed when a thread or process exits.

RETURN VALUE
       A  return  value  of  0  indicates  success.  On error, -1 is
       returned, and errno is set appropriately.

ERRORS
       EINVAL Either flags is non-zero, an invalid cmd has been
              specified, or the GETCPU_CACHE_SET command has been
              specified and cpu_cachep points to a location
              containing a NULL address or an address which is not
              aligned on 4-byte multiples.

       ENOSYS The  getcpu_cache()  system call is not implemented by
              this kernel.

       EFAULT cpu_cachep is an invalid address, or cpu_cachep points
              to a location containing an invalid address.

       EBUSY  The GETCPU_CACHE_SET command has been specified, and a
              cpu cache address which differs from  the  content  of
              the  memory  location  pointed  to  by  cpu_cachep  is
              already registered for this thread.

       ENOENT The GETCPU_CACHE_GET command has been  specified,  but
              no cpu cache has been registered for this thread.

VERSIONS
       The getcpu_cache() system call was added in Linux 4.X (TODO).

CONFORMING TO
       getcpu_cache() is Linux-specific.

EXAMPLE
       The  following  code  uses  the getcpu_cache() system call to
       keep a thread local storage variable up to date with the cur‐
       rent  CPU  number,  with a fallback on sched_getcpu(3) if the
       cache is not available. For simplicity, it is done in main(),
       but multithreaded programs would need to invoke
       getcpu_cache() from each program thread.

           #define _GNU_SOURCE
           #include <stdlib.h>
           #include <stdio.h>
           #include <unistd.h>
           #include <stdint.h>
           #include <sched.h>
           #include <linux/getcpu_cache.h>
           #include <sys/syscall.h>

           static inline int
           getcpu_cache(int cmd, volatile int32_t **cpu_cachep, int flags)
           {
               return syscall(__NR_getcpu_cache, cmd, cpu_cachep, flags);
           }

           /*
            * __getcpu_cache_tls is recommended as symbol name for the
            * cpu number cache. Weak attribute is recommended when
            * declaring this variable in libraries. Applications can
            * choose to define their own version of this symbol without
            * the weak attribute and access it directly as a
            * performance improvement when it matches the address
            * returned by GETCPU_CACHE_GET. The initial value "-1"
            * will be read in case the getcpu cache is not available.
            */
           __thread __attribute__((weak)) volatile int32_t
                     __getcpu_cache_tls = -1;

           int
           main(int argc, char **argv)
           {
               volatile int32_t *cpu_cache = &__getcpu_cache_tls;
               int32_t cpu;

               /* Try to register the CPU cache. */
               if (getcpu_cache(GETCPU_CACHE_SET, &cpu_cache, 0) < 0) {
                   perror("getcpu_cache set");
                   fprintf(stderr, "Using sched_getcpu() as fallback.\n");
               }

               cpu = __getcpu_cache_tls;    /* Read current CPU number. */
               if (cpu < 0) {
                   /* Fallback on sched_getcpu(). */
                   cpu = sched_getcpu();
               }
               printf("Current CPU number: %d\n", cpu);

               exit(EXIT_SUCCESS);
           }

SEE ALSO
       sched_getcpu(3)

Linux                        2016-01-27              GETCPU_CACHE(2)
---
 MAINTAINERS                       |   7 ++
 fs/exec.c                         |   1 +
 include/linux/sched.h             |  36 +++++++++
 include/uapi/linux/Kbuild         |   1 +
 include/uapi/linux/getcpu_cache.h |  42 +++++++++++
 init/Kconfig                      |  10 +++
 kernel/Makefile                   |   1 +
 kernel/fork.c                     |   4 +
 kernel/getcpu_cache.c             | 150 ++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h              |   1 +
 kernel/sys_ni.c                   |   3 +
 11 files changed, 256 insertions(+)
 create mode 100644 include/uapi/linux/getcpu_cache.h
 create mode 100644 kernel/getcpu_cache.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 233f834..74872e6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4712,6 +4712,13 @@ M:	Joe Perches <joe@perches.com>
 S:	Maintained
 F:	scripts/get_maintainer.pl
 
+GETCPU_CACHE SUPPORT
+M:	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+L:	linux-kernel@vger.kernel.org
+S:	Supported
+F:	kernel/getcpu_cache.c
+F:	include/uapi/linux/getcpu_cache.h
+
 GFS2 FILE SYSTEM
 M:	Steven Whitehouse <swhiteho@redhat.com>
 M:	Bob Peterson <rpeterso@redhat.com>
diff --git a/fs/exec.c b/fs/exec.c
index b06623a..1d66af6 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1594,6 +1594,7 @@ static int do_execveat_common(int fd, struct filename *filename,
 	/* execve succeeded */
 	current->fs->in_exec = 0;
 	current->in_execve = 0;
+	getcpu_cache_execve(current);
 	acct_update_integrals(current);
 	task_numa_free(current);
 	free_bprm(bprm);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index fa39434..2fa2db8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1813,6 +1813,9 @@ struct task_struct {
 	unsigned long	task_state_change;
 #endif
 	int pagefault_disabled;
+#ifdef CONFIG_GETCPU_CACHE
+	int32_t __user *cpu_cache;
+#endif
 /* CPU-specific state of this task */
 	struct thread_struct thread;
 /*
@@ -3190,4 +3193,37 @@ static inline unsigned long rlimit_max(unsigned int limit)
 	return task_rlimit_max(current, limit);
 }
 
+#ifdef CONFIG_GETCPU_CACHE
+void getcpu_cache_fork(struct task_struct *t);
+void getcpu_cache_execve(struct task_struct *t);
+void getcpu_cache_exit(struct task_struct *t);
+void __getcpu_cache_handle_notify_resume(struct task_struct *t);
+static inline void getcpu_cache_set_notify_resume(struct task_struct *t)
+{
+	if (t->cpu_cache)
+		set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
+}
+static inline void getcpu_cache_handle_notify_resume(struct task_struct *t)
+{
+	if (t->cpu_cache)
+		__getcpu_cache_handle_notify_resume(t);
+}
+#else
+static inline void getcpu_cache_fork(struct task_struct *t)
+{
+}
+static inline void getcpu_cache_execve(struct task_struct *t)
+{
+}
+static inline void getcpu_cache_exit(struct task_struct *t)
+{
+}
+static inline void getcpu_cache_set_notify_resume(struct task_struct *t)
+{
+}
+static inline void getcpu_cache_handle_notify_resume(struct task_struct *t)
+{
+}
+#endif
+
 #endif
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index c2e5d6c..98fce4a 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -136,6 +136,7 @@ header-y += futex.h
 header-y += gameport.h
 header-y += genetlink.h
 header-y += gen_stats.h
+header-y += getcpu_cache.h
 header-y += gfs2_ondisk.h
 header-y += gigaset_dev.h
 header-y += gsmmux.h
diff --git a/include/uapi/linux/getcpu_cache.h b/include/uapi/linux/getcpu_cache.h
new file mode 100644
index 0000000..25343b9
--- /dev/null
+++ b/include/uapi/linux/getcpu_cache.h
@@ -0,0 +1,42 @@
+#ifndef _UAPI_LINUX_GETCPU_CACHE_H
+#define _UAPI_LINUX_GETCPU_CACHE_H
+
+/*
+ * linux/getcpu_cache.h
+ *
+ * getcpu_cache system call API
+ *
+ * Copyright (c) 2015, 2016 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/**
+ * enum getcpu_cache_cmd - getcpu_cache system call command
+ * @GETCPU_CACHE_GET: Get the address of the current thread CPU number
+ *                    cache.
+ * @GETCPU_CACHE_SET: Set the address of the current thread CPU number
+ *                    cache.
+ */
+enum getcpu_cache_cmd {
+	GETCPU_CACHE_GET = 0,
+	GETCPU_CACHE_SET = 1,
+};
+
+#endif /* _UAPI_LINUX_GETCPU_CACHE_H */
diff --git a/init/Kconfig b/init/Kconfig
index 235c7a2..fee2fa1 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1614,6 +1614,16 @@ config MEMBARRIER
 
 	  If unsure, say Y.
 
+config GETCPU_CACHE
+	bool "Enable getcpu cache" if EXPERT
+	default y
+	help
+	  Enable the getcpu cache system call. It provides a user-space
+	  cache for the current CPU number value, which speeds up
+	  getting the current CPU number from user-space.
+
+	  If unsure, say Y.
+
 config EMBEDDED
 	bool "Embedded system"
 	option allnoconfig_y
diff --git a/kernel/Makefile b/kernel/Makefile
index 53abf00..b630247 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -103,6 +103,7 @@ obj-$(CONFIG_TORTURE_TEST) += torture.o
 obj-$(CONFIG_MEMBARRIER) += membarrier.o
 
 obj-$(CONFIG_HAS_IOMEM) += memremap.o
+obj-$(CONFIG_GETCPU_CACHE) += getcpu_cache.o
 
 $(obj)/configs.o: $(obj)/config_data.h
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 1155eac..37d0645 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -252,6 +252,7 @@ void __put_task_struct(struct task_struct *tsk)
 	WARN_ON(tsk == current);
 
 	cgroup_free(tsk);
+	getcpu_cache_exit(tsk);
 	task_numa_free(tsk);
 	security_task_free(tsk);
 	exit_creds(tsk);
@@ -1554,6 +1555,9 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	 */
 	copy_seccomp(p);
 
+	if (!(clone_flags & CLONE_THREAD))
+		getcpu_cache_fork(p);
+
 	/*
 	 * Process group and session signals need to be delivered to just the
 	 * parent before the fork or both the parent and the child after the
diff --git a/kernel/getcpu_cache.c b/kernel/getcpu_cache.c
new file mode 100644
index 0000000..f1251d1
--- /dev/null
+++ b/kernel/getcpu_cache.c
@@ -0,0 +1,150 @@
+/*
+ * Copyright (C) 2015 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * getcpu cache system call
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/sched.h>
+#include <linux/uaccess.h>
+#include <linux/syscalls.h>
+#include <linux/compat.h>
+#include <linux/getcpu_cache.h>
+
+static int getcpu_cache_update(int32_t __user *cpu_cache)
+{
+	if (put_user(raw_smp_processor_id(), cpu_cache))
+		return -1;
+	return 0;
+}
+
+/*
+ * This resume handler should always be executed between a migration
+ * triggered by preemption and return to user-space.
+ */
+void __getcpu_cache_handle_notify_resume(struct task_struct *t)
+{
+	if (unlikely(t->flags & PF_EXITING))
+		return;
+	if (getcpu_cache_update(t->cpu_cache))
+		force_sig(SIGSEGV, t);
+}
+
+/*
+ * If parent process has a thread-local ABI, the child inherits. Only applies
+ * when forking a process, not a thread.
+ */
+void getcpu_cache_fork(struct task_struct *t)
+{
+	t->cpu_cache = current->cpu_cache;
+}
+
+void getcpu_cache_execve(struct task_struct *t)
+{
+	t->cpu_cache = NULL;
+}
+
+void getcpu_cache_exit(struct task_struct *t)
+{
+	t->cpu_cache = NULL;
+}
+
+static int __get_cpu_cache_ptr(int32_t __user **cpu_cache,
+		int32_t __user * __user *cpu_cachep)
+{
+#ifdef CONFIG_COMPAT
+	if (is_compat_task()) {
+		compat_uptr_t *compat_cachep = (compat_uptr_t *) cpu_cachep;
+		compat_uptr_t compat_cache;
+
+		if (get_user(compat_cache, compat_cachep))
+			return -EFAULT;
+		*cpu_cache = compat_ptr(compat_cache);
+		return 0;
+	}
+#endif
+	return get_user(*cpu_cache, cpu_cachep);
+}
+
+#define get_cpu_cache_ptr(cpu_cache, cpu_cachep)	\
+	__get_cpu_cache_ptr(&(cpu_cache), cpu_cachep)
+
+static int put_cpu_cache_ptr(int32_t __user *cpu_cache,
+		int32_t __user * __user *cpu_cachep)
+{
+#ifdef CONFIG_COMPAT
+	if (is_compat_task()) {
+		compat_uptr_t compat_cache = ptr_to_compat(cpu_cache);
+		compat_uptr_t *compat_cachep = (compat_uptr_t *) cpu_cachep;
+
+		return put_user(compat_cache, compat_cachep);
+	}
+#endif
+	return put_user(cpu_cache, cpu_cachep);
+}
+
+/*
+ * sys_getcpu_cache - setup getcpu cache for caller thread
+ */
+SYSCALL_DEFINE3(getcpu_cache, int, cmd, int32_t __user * __user *, cpu_cachep,
+		int, flags)
+{
+	if (unlikely(flags))
+		return -EINVAL;
+	switch (cmd) {
+	case GETCPU_CACHE_GET:
+		if (!current->cpu_cache)
+			return -ENOENT;
+		if (put_cpu_cache_ptr(current->cpu_cache, cpu_cachep))
+			return -EFAULT;
+		return 0;
+	case GETCPU_CACHE_SET:
+	{
+		int32_t __user *cpu_cache;
+
+		if (get_cpu_cache_ptr(cpu_cache, cpu_cachep))
+				return -EFAULT;
+		if (unlikely(!IS_ALIGNED((unsigned long)cpu_cache,
+				sizeof(int32_t)) || !cpu_cache))
+			return -EINVAL;
+		/*
+		 * Check if cpu_cache is already registered, and whether
+		 * the address differs from *cpu_cachep.
+		 */
+		if (current->cpu_cache) {
+			if (current->cpu_cache != cpu_cache)
+				return -EBUSY;
+			return 0;
+		}
+		current->cpu_cache = cpu_cache;
+		/*
+		 * Migration checks the getcpu cache to see whether the
+		 * notify_resume flag should be set.
+		 * Therefore, we need to ensure that the scheduler sees
+		 * the getcpu cache pointer update before we update the getcpu
+		 * cache content with the current CPU number.
+		 */
+		barrier();
+		/*
+		 * Do an initial cpu cache update to ensure we won't hit
+		 * SIGSEGV if put_user() fails in the resume notifier.
+		 */
+		if (getcpu_cache_update(cpu_cache)) {
+			current->cpu_cache = NULL;
+			return -EFAULT;
+		}
+		return 0;
+	}
+	default:
+		return -EINVAL;
+	}
+}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b242775..3edcd13 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -957,6 +957,7 @@ static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
 {
 	set_task_rq(p, cpu);
 #ifdef CONFIG_SMP
+	getcpu_cache_set_notify_resume(p);
 	/*
 	 * After ->cpu is set up to a new value, task_rq_lock(p, ...) can be
 	 * successfuly executed on another CPU. We must ensure that updates of
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 0623787..1e1c299 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -249,3 +249,6 @@ cond_syscall(sys_execveat);
 
 /* membarrier */
 cond_syscall(sys_membarrier);
+
+/* thread-local ABI */
+cond_syscall(sys_getcpu_cache);
-- 
2.1.4

* [RFC PATCH v3 2/5] getcpu_cache: ARM resume notifier
@ 2016-01-28 21:01   ` Mathieu Desnoyers
  0 siblings, 0 replies; 15+ messages in thread
From: Mathieu Desnoyers @ 2016-01-28 21:01 UTC (permalink / raw)
  To: Thomas Gleixner, Paul Turner, Andrew Hunter, Peter Zijlstra
  Cc: linux-kernel, linux-api, Andy Lutomirski, Andi Kleen,
	Dave Watson, Chris Lameter, Ingo Molnar, Ben Maurer,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Linus Torvalds,
	Andrew Morton, Russell King, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Mathieu Desnoyers

Call the getcpu_cache_handle_notify_resume() function on return to
userspace if the TIF_NOTIFY_RESUME thread flag is set.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: linux-api@vger.kernel.org
---
 arch/arm/kernel/signal.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index 7b8f214..ff5052c 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -594,6 +594,7 @@ do_work_pending(struct pt_regs *regs, unsigned int thread_flags, int syscall)
 			} else {
 				clear_thread_flag(TIF_NOTIFY_RESUME);
 				tracehook_notify_resume(regs);
+				getcpu_cache_handle_notify_resume(current);
 			}
 		}
 		local_irq_disable();
-- 
2.1.4

* [RFC PATCH v3 3/5] getcpu_cache: wire up ARM system call
@ 2016-01-28 21:01   ` Mathieu Desnoyers
  0 siblings, 0 replies; 15+ messages in thread
From: Mathieu Desnoyers @ 2016-01-28 21:01 UTC (permalink / raw)
  To: Thomas Gleixner, Paul Turner, Andrew Hunter, Peter Zijlstra
  Cc: linux-kernel, linux-api, Andy Lutomirski, Andi Kleen,
	Dave Watson, Chris Lameter, Ingo Molnar, Ben Maurer,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Linus Torvalds,
	Andrew Morton, Russell King, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Mathieu Desnoyers

Wire up the getcpu cache system call on 32-bit ARM.

This provides an ABI improving the speed of a getcpu operation
on ARM by skipping the getcpu system call on the fast path.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: linux-api@vger.kernel.org
---
 arch/arm/include/uapi/asm/unistd.h | 1 +
 arch/arm/kernel/calls.S            | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/arm/include/uapi/asm/unistd.h b/arch/arm/include/uapi/asm/unistd.h
index ede692f..ea0ea94 100644
--- a/arch/arm/include/uapi/asm/unistd.h
+++ b/arch/arm/include/uapi/asm/unistd.h
@@ -417,6 +417,7 @@
 #define __NR_userfaultfd		(__NR_SYSCALL_BASE+388)
 #define __NR_membarrier			(__NR_SYSCALL_BASE+389)
 #define __NR_mlock2			(__NR_SYSCALL_BASE+390)
+#define __NR_getcpu_cache		(__NR_SYSCALL_BASE+391)
 
 /*
  * The following SWIs are ARM private.
diff --git a/arch/arm/kernel/calls.S b/arch/arm/kernel/calls.S
index ac368bb..aa43e29 100644
--- a/arch/arm/kernel/calls.S
+++ b/arch/arm/kernel/calls.S
@@ -400,6 +400,7 @@
 		CALL(sys_userfaultfd)
 		CALL(sys_membarrier)
 		CALL(sys_mlock2)
+/* 391 */	CALL(sys_getcpu_cache)
 #ifndef syscalls_counted
 .equ syscalls_padding, ((NR_syscalls + 3) & ~3) - NR_syscalls
 #define syscalls_counted
-- 
2.1.4

* [RFC PATCH v3 4/5] getcpu_cache: x86 32/64 resume notifier
  2016-01-28 21:01 ` Mathieu Desnoyers
@ 2016-01-28 21:01 ` Mathieu Desnoyers
  -1 siblings, 0 replies; 15+ messages in thread
From: Mathieu Desnoyers @ 2016-01-28 21:01 UTC (permalink / raw)
  To: Thomas Gleixner, Paul Turner, Andrew Hunter, Peter Zijlstra
  Cc: linux-kernel, linux-api, Andy Lutomirski, Andi Kleen,
	Dave Watson, Chris Lameter, Ingo Molnar, Ben Maurer,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Linus Torvalds,
	Andrew Morton, Russell King, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Mathieu Desnoyers

Call the getcpu_cache_handle_notify_resume() function on return to
userspace if the TIF_NOTIFY_RESUME thread flag is set.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: linux-api@vger.kernel.org
---
 arch/x86/entry/common.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 0366374..eb6bcae 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -249,6 +249,7 @@ static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
 		if (cached_flags & _TIF_NOTIFY_RESUME) {
 			clear_thread_flag(TIF_NOTIFY_RESUME);
 			tracehook_notify_resume(regs);
+			getcpu_cache_handle_notify_resume(current);
 		}
 
 		if (cached_flags & _TIF_USER_RETURN_NOTIFY)
-- 
2.1.4

* [RFC PATCH v3 5/5] getcpu_cache: wire up x86 32/64 system call
  2016-01-28 21:01 ` Mathieu Desnoyers
@ 2016-01-28 21:01 ` Mathieu Desnoyers
  -1 siblings, 0 replies; 15+ messages in thread
From: Mathieu Desnoyers @ 2016-01-28 21:01 UTC (permalink / raw)
  To: Thomas Gleixner, Paul Turner, Andrew Hunter, Peter Zijlstra
  Cc: linux-kernel, linux-api, Andy Lutomirski, Andi Kleen,
	Dave Watson, Chris Lameter, Ingo Molnar, Ben Maurer,
	Steven Rostedt, Paul E. McKenney, Josh Triplett, Linus Torvalds,
	Andrew Morton, Russell King, Catalin Marinas, Will Deacon,
	Michael Kerrisk, Mathieu Desnoyers

Wire up the getcpu_cache system call on x86 32/64.

This provides an ABI improving the speed of a getcpu operation
on x86 by removing the need to perform a function call, "lsl"
instruction, or system call on the fast path.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul Turner <pjt@google.com>
CC: Andrew Hunter <ahh@google.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: linux-api@vger.kernel.org
---
 arch/x86/entry/syscalls/syscall_32.tbl | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index f17705e..b3ea491 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -383,3 +383,4 @@
 374	i386	userfaultfd		sys_userfaultfd
 375	i386	membarrier		sys_membarrier
 376	i386	mlock2			sys_mlock2
+377	i386	getcpu_cache		sys_getcpu_cache
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 314a90b..442aaa0 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -332,6 +332,7 @@
 323	common	userfaultfd		sys_userfaultfd
 324	common	membarrier		sys_membarrier
 325	common	mlock2			sys_mlock2
+326	common	getcpu_cache		sys_getcpu_cache
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
-- 
2.1.4

* Re: [RFC PATCH v3 1/5] getcpu_cache system call: cache CPU number of running thread
@ 2016-01-28 21:52     ` Thomas Gleixner
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2016-01-28 21:52 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Paul Turner, Andrew Hunter, Peter Zijlstra, linux-kernel,
	linux-api, Andy Lutomirski, Andi Kleen, Dave Watson,
	Chris Lameter, Ingo Molnar, Ben Maurer, Steven Rostedt,
	Paul E. McKenney, Josh Triplett, Linus Torvalds, Andrew Morton,
	Russell King, Catalin Marinas, Will Deacon, Michael Kerrisk

On Thu, 28 Jan 2016, Mathieu Desnoyers wrote:

> +static int __get_cpu_cache_ptr(int32_t __user **cpu_cache,

cpu_cache is __user ????

> +		int32_t __user * __user *cpu_cachep)
> +{
> +#ifdef CONFIG_COMPAT
> +	if (is_compat_task()) {
> +		compat_uptr_t *compat_cachep = (compat_uptr_t *) cpu_cachep;
> +		compat_uptr_t compat_cache;
> +
> +		if (get_user(compat_cache, compat_cachep))
> +			return -EFAULT;
> +		*cpu_cache = compat_ptr(compat_cache);

sparse should have told you that :)

> +		return 0;
> +	}
> +#endif
> +	return get_user(*cpu_cache, cpu_cachep);
> +}
> +
> +#define get_cpu_cache_ptr(cpu_cache, cpu_cachep)	\
> +	__get_cpu_cache_ptr(&(cpu_cache), cpu_cachep)
> +
> +static int put_cpu_cache_ptr(int32_t __user *cpu_cache,

Ditto

> +		int32_t __user * __user *cpu_cachep)
> +{
> +#ifdef CONFIG_COMPAT
> +	if (is_compat_task()) {
> +		compat_uptr_t compat_cache = ptr_to_compat(cpu_cache);
> +		compat_uptr_t *compat_cachep = (compat_uptr_t *) cpu_cachep;
> +
> +		return put_user(compat_cache, compat_cachep);
> +	}
> +#endif
> +	return put_user(cpu_cache, cpu_cachep);
> +}
> +		current->cpu_cache = cpu_cache;
> +		/*
> +		 * Migration checks the getcpu cache to see whether the
> +		 * notify_resume flag should be set.
> +		 * Therefore, we need to ensure that the scheduler sees
> +		 * the getcpu cache pointer update before we update the getcpu
> +		 * cache content with the current CPU number.
> +		 */
> +		barrier();

And how does that barrier ensure this? Not at all. And why would the scheduler
care? All the scheduler cares about is tsk->cpu_cache.

> +		/*
> +		 * Do an initial cpu cache update to ensure we won't hit
> +		 * SIGSEGV if put_user() fails in the resume notifier.
> +		 */

If you get migrated before that call, then you SIGSEGV nevertheless.

You need that call here for the case you are NOT migrated before returning to
user space because otherwise the variable is not updated.

If you want to verify that user address without a potential SIGSEGV, then you
need to do this before setting current->cpu_cache. You still need the update
after setting current->cpu_cache.

> +		if (getcpu_cache_update(cpu_cache)) {
> +			current->cpu_cache = NULL;
> +			return -EFAULT;
> +		}
> +		return 0;

Thanks,

	tglx

* Re: [RFC PATCH v3 1/5] getcpu_cache system call: cache CPU number of running thread
  2016-01-28 21:52     ` Thomas Gleixner
@ 2016-01-28 22:44       ` Mathieu Desnoyers
  -1 siblings, 0 replies; 15+ messages in thread
From: Mathieu Desnoyers @ 2016-01-28 22:44 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Paul Turner, Andrew Hunter, Peter Zijlstra, linux-kernel,
	linux-api, Andy Lutomirski, Andi Kleen, Dave Watson,
	Chris Lameter, Ingo Molnar, Ben Maurer, rostedt,
	Paul E. McKenney, Josh Triplett, Linus Torvalds, Andrew Morton,
	Russell King, Catalin Marinas, Will Deacon, Michael Kerrisk

----- On Jan 28, 2016, at 4:52 PM, Thomas Gleixner tglx@linutronix.de wrote:

> On Thu, 28 Jan 2016, Mathieu Desnoyers wrote:
> 
>> +static int __get_cpu_cache_ptr(int32_t __user **cpu_cache,
> 
> cpu_cache is __user ????

Note that I mimicked the "get_user()" macro: the first
argument to "get_cpu_cache_ptr()" is a variable within the
kernel address space, from which we take the address
and pass it as the first parameter of __get_cpu_cache_ptr().

int32_t __user **cpu_cache is therefore the right typing
here. I had wrongly assumed that compat_uptr_t carried
an implicit __user tagging, which is my error. This patch
addresses the current sparse warning:

diff --git a/kernel/getcpu_cache.c b/kernel/getcpu_cache.c
index f1251d1..7053611 100644
--- a/kernel/getcpu_cache.c
+++ b/kernel/getcpu_cache.c
@@ -63,7 +63,7 @@ static int __get_cpu_cache_ptr(int32_t __user **cpu_cache,
 {
 #ifdef CONFIG_COMPAT
        if (is_compat_task()) {
-               compat_uptr_t *compat_cachep = (compat_uptr_t *) cpu_cachep;
+               compat_uptr_t __user *compat_cachep = (compat_uptr_t __user *) cpu_cachep;
                compat_uptr_t compat_cache;
 
                if (get_user(compat_cache, compat_cachep))
@@ -84,7 +84,7 @@ static int put_cpu_cache_ptr(int32_t __user *cpu_cache,
 #ifdef CONFIG_COMPAT
        if (is_compat_task()) {
                compat_uptr_t compat_cache = ptr_to_compat(cpu_cache);
-               compat_uptr_t *compat_cachep = (compat_uptr_t *) cpu_cachep;
+               compat_uptr_t __user *compat_cachep = (compat_uptr_t __user *) cpu_cachep;
 
                return put_user(compat_cache, compat_cachep);
        }


> 
>> +		int32_t __user * __user *cpu_cachep)
>> +{
>> +#ifdef CONFIG_COMPAT
>> +	if (is_compat_task()) {
>> +		compat_uptr_t *compat_cachep = (compat_uptr_t *) cpu_cachep;
>> +		compat_uptr_t compat_cache;
>> +
>> +		if (get_user(compat_cache, compat_cachep))
>> +			return -EFAULT;
>> +		*cpu_cache = compat_ptr(compat_cache);
> 
> sparse should have told you that :)
> 
>> +		return 0;
>> +	}
>> +#endif
>> +	return get_user(*cpu_cache, cpu_cachep);
>> +}
>> +
>> +#define get_cpu_cache_ptr(cpu_cache, cpu_cachep)	\
>> +	__get_cpu_cache_ptr(&(cpu_cache), cpu_cachep)
>> +
>> +static int put_cpu_cache_ptr(int32_t __user *cpu_cache,
> 
> Ditto
> 
>> +		int32_t __user * __user *cpu_cachep)
>> +{
>> +#ifdef CONFIG_COMPAT
>> +	if (is_compat_task()) {
>> +		compat_uptr_t compat_cache = ptr_to_compat(cpu_cache);
>> +		compat_uptr_t *compat_cachep = (compat_uptr_t *) cpu_cachep;
>> +
>> +		return put_user(compat_cache, compat_cachep);
>> +	}
>> +#endif
>> +	return put_user(cpu_cache, cpu_cachep);
>> +}
>> +		current->cpu_cache = cpu_cache;
>> +		/*
>> +		 * Migration checks the getcpu cache to see whether the
>> +		 * notify_resume flag should be set.
>> +		 * Therefore, we need to ensure that the scheduler sees
>> +		 * the getcpu cache pointer update before we update the getcpu
>> +		 * cache content with the current CPU number.
>> +		 */
>> +		barrier();
> 
> And how does that barrier ensure this? Not at all. And why would the scheduler
> care? All the scheduler cares about is tsk->cpu_cache.

The case I want to ensure never happens is the following:

The compiler reorders the store of the current->cpu_cache address
after the getcpu_cache_update() store to *cpu_cache. In between,
the scheduler preempts and migrates the task, but does not set the
resume notifier thread flag because it still sees a NULL
current->cpu_cache. We therefore return to userspace with a
wrong CPU number in the cache.

The compiler barrier enforces that the current->cpu_cache address
store is emitted before the *cpu_cache update.
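
As a sketch, the interleaving the barrier rules out looks like this
(assuming the compiler did reorder the two stores; CPU numbers are
arbitrary):

    GETCPU_CACHE_SET (task on CPU 0)      scheduler
    --------------------------------      ---------
    *cpu_cache = 0;      <- reordered
                                          preempt, migrate task to CPU 1;
                                          current->cpu_cache is still NULL,
                                          so TIF_NOTIFY_RESUME is not set
    current->cpu_cache = cpu_cache;
    return to user-space: *cpu_cache == 0, yet the task runs on CPU 1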

> 
>> +		/*
>> +		 * Do an initial cpu cache update to ensure we won't hit
>> +		 * SIGSEGV if put_user() fails in the resume notifier.
>> +		 */
> 
> If you get migrated before that call, then you SIGSEGV nevertheless.

No, because the SIGSEGV is only triggered when returning to userspace.
We are still in the system call here. All the scheduler migration code
cares about is checking current->cpu_cache to see whether it needs to
raise the resume notifier flag. No userspace access happens there.

> 
> You need that call here for the case you are NOT migrated before returning to
> user space because otherwise the variable is not updated.

This call has two goals: indeed, populating the initial current CPU value,
but also checking if the address is valid (and -EFAULT on error).

> 
> If you want to verify that user address without a potential SIGSEGV, then you
> need to do this before setting current->cpu_cache. You still need the update
> after setting current->cpu_cache.

No, because the SIGSEGV can only be triggered by the resume
notifier when returning to userspace from the getcpu_cache
system call. All we need to do is ensure the resume notifier
sees a NULL current->cpu_cache, and no segfault occurs.

Does this make sense, or am I missing something?

Thanks,

Mathieu

> 
>> +		if (getcpu_cache_update(cpu_cache)) {
>> +			current->cpu_cache = NULL;
>> +			return -EFAULT;
>> +		}
>> +		return 0;
> 
> Thanks,
> 
> 	tglx

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v3 1/5] getcpu_cache system call: cache CPU number of running thread
@ 2016-01-28 22:44       ` Mathieu Desnoyers
  0 siblings, 0 replies; 15+ messages in thread
From: Mathieu Desnoyers @ 2016-01-28 22:44 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Paul Turner, Andrew Hunter, Peter Zijlstra,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-api, Andy Lutomirski,
	Andi Kleen, Dave Watson, Chris Lameter, Ingo Molnar, Ben Maurer,
	rostedt, Paul E. McKenney, Josh Triplett, Linus Torvalds,
	Andrew Morton, Russell King, Catalin Marinas, Will Deacon,
	Michael Kerrisk

----- On Jan 28, 2016, at 4:52 PM, Thomas Gleixner tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org wrote:

> On Thu, 28 Jan 2016, Mathieu Desnoyers wrote:
> 
>> +static int __get_cpu_cache_ptr(int32_t __user **cpu_cache,
> 
> cpu_cache is __user ????

Note that I mimicked the "get_user()" macro: the first
argument to "get_cpu_cache_ptr()" is a variable within the
kernel address space, from which we take the address
and pass it as the first parameter of __get_cpu_cache_ptr().

int32_t __user **cpu_cache is therefore the right type
here. I had wrongly assumed that compat_uptr_t carried
an implicit __user tag, which is my error. This patch
addresses the current sparse warning:

diff --git a/kernel/getcpu_cache.c b/kernel/getcpu_cache.c
index f1251d1..7053611 100644
--- a/kernel/getcpu_cache.c
+++ b/kernel/getcpu_cache.c
@@ -63,7 +63,7 @@ static int __get_cpu_cache_ptr(int32_t __user **cpu_cache,
 {
 #ifdef CONFIG_COMPAT
        if (is_compat_task()) {
-               compat_uptr_t *compat_cachep = (compat_uptr_t *) cpu_cachep;
+               compat_uptr_t __user *compat_cachep = (compat_uptr_t __user *) cpu_cachep;
                compat_uptr_t compat_cache;
 
                if (get_user(compat_cache, compat_cachep))
@@ -84,7 +84,7 @@ static int put_cpu_cache_ptr(int32_t __user *cpu_cache,
 #ifdef CONFIG_COMPAT
        if (is_compat_task()) {
                compat_uptr_t compat_cache = ptr_to_compat(cpu_cache);
-               compat_uptr_t *compat_cachep = (compat_uptr_t *) cpu_cachep;
+               compat_uptr_t __user *compat_cachep = (compat_uptr_t __user *) cpu_cachep;
 
                return put_user(compat_cache, compat_cachep);
        }
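
As an aside, a short usage sketch of that get_user()-like convention
(illustrative only, not taken from the patch):

	int32_t __user *cpu_cache;	/* ordinary kernel variable */

	/*
	 * The macro takes &cpu_cache and passes it to
	 * __get_cpu_cache_ptr(), which fills it in from the
	 * user-supplied location (via compat_ptr() for compat tasks).
	 */
	if (get_cpu_cache_ptr(cpu_cache, cpu_cachep))
		return -EFAULT;
	/* cpu_cache now holds the user-space cache address. */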


> 
>> +		int32_t __user * __user *cpu_cachep)
>> +{
>> +#ifdef CONFIG_COMPAT
>> +	if (is_compat_task()) {
>> +		compat_uptr_t *compat_cachep = (compat_uptr_t *) cpu_cachep;
>> +		compat_uptr_t compat_cache;
>> +
>> +		if (get_user(compat_cache, compat_cachep))
>> +			return -EFAULT;
>> +		*cpu_cache = compat_ptr(compat_cache);
> 
> sparse should have told you that :)
> 
>> +		return 0;
>> +	}
>> +#endif
>> +	return get_user(*cpu_cache, cpu_cachep);
>> +}
>> +
>> +#define get_cpu_cache_ptr(cpu_cache, cpu_cachep)	\
>> +	__get_cpu_cache_ptr(&(cpu_cache), cpu_cachep)
>> +
>> +static int put_cpu_cache_ptr(int32_t __user *cpu_cache,
> 
> Ditto
> 
>> +		int32_t __user * __user *cpu_cachep)
>> +{
>> +#ifdef CONFIG_COMPAT
>> +	if (is_compat_task()) {
>> +		compat_uptr_t compat_cache = ptr_to_compat(cpu_cache);
>> +		compat_uptr_t *compat_cachep = (compat_uptr_t *) cpu_cachep;
>> +
>> +		return put_user(compat_cache, compat_cachep);
>> +	}
>> +#endif
>> +	return put_user(cpu_cache, cpu_cachep);
>> +}
>> +		current->cpu_cache = cpu_cache;
>> +		/*
>> +		 * Migration checks the getcpu cache to see whether the
>> +		 * notify_resume flag should be set.
>> +		 * Therefore, we need to ensure that the scheduler sees
>> +		 * the getcpu cache pointer update before we update the getcpu
>> +		 * cache content with the current CPU number.
>> +		 */
>> +		barrier();
> 
> And how does that barrier ensure this? Not at all. And why would the scheduler
> care? All the scheduler cares about is tsk->cpu_cache.

The case I want to ensure never happens is the following:

The compiler reorders the store of the cache address into
current->cpu_cache so that it happens after the
getcpu_cache_update() store to *cpu_cache. In between, the
scheduler preempts and migrates the task, but does not set the
resume notifier thread flag because it still sees a NULL
current->cpu_cache. We therefore return to userspace with a
wrong CPU number in the cache.

The compiler barrier enforces ordering of the current->cpu_cache
address store before updating the *cpu_cache.

> 
>> +		/*
>> +		 * Do an initial cpu cache update to ensure we won't hit
>> +		 * SIGSEGV if put_user() fails in the resume notifier.
>> +		 */
> 
> If you get migrated before that call, then you SIGSEGV nevertheless.

No, because the SIGSEGV is only triggered when returning to userspace.
We are still in the system call here. All we care about in the migration
schedule code is to check the current->cpu_cache to see if we need to
raise the resume notifier flag. No userspace access there.

> 
> You need that call here for the case you are NOT migrated before returning to
> user space because otherwise the variable is not updated.

This call has two goals: indeed, it populates the initial current CPU
value, but it also checks that the address is valid (returning -EFAULT
on error).

> 
> If you want to verify that user address without a potential SIGSEGV, then you
> need to do this before setting current->cpu_cache. You still need the update
> after setting current->cpu_cache.

No, because the SIGSEGV can only be triggered by the resume
notifier when returning to userspace from the getcpu_cache
system call. All we need to do is to ensure the resume notifier
sees a NULL current->cpu_cache, and we don't have any segfault.

Does it make sense, or am I missing something?

Thanks,

Mathieu

> 
>> +		if (getcpu_cache_update(cpu_cache)) {
>> +			current->cpu_cache = NULL;
>> +			return -EFAULT;
>> +		}
>> +		return 0;
> 
> Thanks,
> 
> 	tglx

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v3 1/5] getcpu_cache system call: cache CPU number of running thread
  2016-01-28 22:44       ` Mathieu Desnoyers
  (?)
@ 2016-01-29  8:39       ` Thomas Gleixner
  2016-01-29 16:28         ` Mathieu Desnoyers
  -1 siblings, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2016-01-29  8:39 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Paul Turner, Andrew Hunter, Peter Zijlstra, linux-kernel,
	linux-api, Andy Lutomirski, Andi Kleen, Dave Watson,
	Chris Lameter, Ingo Molnar, Ben Maurer, rostedt,
	Paul E. McKenney, Josh Triplett, Linus Torvalds, Andrew Morton,
	Russell King, Catalin Marinas, Will Deacon, Michael Kerrisk

On Thu, 28 Jan 2016, Mathieu Desnoyers wrote:
> ----- On Jan 28, 2016, at 4:52 PM, Thomas Gleixner tglx@linutronix.de wrote:
> >> +		current->cpu_cache = cpu_cache;
> >> +		/*
> >> +		 * Migration checks the getcpu cache to see whether the
> >> +		 * notify_resume flag should be set.
> >> +		 * Therefore, we need to ensure that the scheduler sees
> >> +		 * the getcpu cache pointer update before we update the getcpu
> >> +		 * cache content with the current CPU number.
> >> +		 */
> >> +		barrier();
> > 
> > And how does that barrier ensure this? Not at all. And why would the scheduler
> > care? All the scheduler cares about is tsk->cpu_cache.
> 
> The case I want to ensure never happens is the following:
> 
> The compiler reorders the store of the cache address into
> current->cpu_cache so that it happens after the
> getcpu_cache_update() store to *cpu_cache. In between, the
> scheduler preempts and migrates the task, but does not set the
> resume notifier thread flag because it still sees a NULL
> current->cpu_cache. We therefore return to userspace with a
> wrong CPU number in the cache.
> 
> The compiler barrier enforces ordering of the current->cpu_cache
> address store before updating the *cpu_cache.

Fair enough. Updating the comment might help.

> > 
> >> +		/*
> >> +		 * Do an initial cpu cache update to ensure we won't hit
> >> +		 * SIGSEGV if put_user() fails in the resume notifier.
> >> +		 */
> > 
> > If you get migrated before that call, then you SIGSEGV nevertheless.
> 
> No, because the SIGSEGV is only triggered when returning to userspace.
> We are still in the system call here. All we care about in the migration
> schedule code is to check the current->cpu_cache to see if we need to
> raise the resume notifier flag. No userspace access there.

True. Should have gone to bed instead of staring at that code tired :)
 
> > You need that call here for the case you are NOT migrated before returning to
> > user space because otherwise the variable is not updated.
> 
> This call has two goals: indeed, it populates the initial current CPU
> value, but it also checks that the address is valid (returning -EFAULT
> on error).

Right. So the comment should mention both.
 
Thanks,

	tglx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v3 1/5] getcpu_cache system call: cache CPU number of running thread
  2016-01-29  8:39       ` Thomas Gleixner
@ 2016-01-29 16:28         ` Mathieu Desnoyers
  0 siblings, 0 replies; 15+ messages in thread
From: Mathieu Desnoyers @ 2016-01-29 16:28 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Paul Turner, Andrew Hunter, Peter Zijlstra, linux-kernel,
	linux-api, Andy Lutomirski, Andi Kleen, Dave Watson,
	Chris Lameter, Ingo Molnar, Ben Maurer, rostedt,
	Paul E. McKenney, Josh Triplett, Linus Torvalds, Andrew Morton,
	Russell King, Catalin Marinas, Will Deacon, Michael Kerrisk

----- On Jan 29, 2016, at 3:39 AM, Thomas Gleixner tglx@linutronix.de wrote:

> On Thu, 28 Jan 2016, Mathieu Desnoyers wrote:
>> ----- On Jan 28, 2016, at 4:52 PM, Thomas Gleixner tglx@linutronix.de wrote:
>> >> +		current->cpu_cache = cpu_cache;
>> >> +		/*
>> >> +		 * Migration checks the getcpu cache to see whether the
>> >> +		 * notify_resume flag should be set.
>> >> +		 * Therefore, we need to ensure that the scheduler sees
>> >> +		 * the getcpu cache pointer update before we update the getcpu
>> >> +		 * cache content with the current CPU number.
>> >> +		 */
>> >> +		barrier();
>> > 
>> > And how does that barrier ensure this? Not at all. And why would the scheduler
>> > care? All the scheduler cares about is tsk->cpu_cache.
>> 
>> The case I want to ensure never happens is the following:
>> 
>> The compiler reorders the store of the cache address into
>> current->cpu_cache so that it happens after the
>> getcpu_cache_update() store to *cpu_cache. In between, the
>> scheduler preempts and migrates the task, but does not set the
>> resume notifier thread flag because it still sees a NULL
>> current->cpu_cache. We therefore return to userspace with a
>> wrong CPU number in the cache.
>> 
>> The compiler barrier enforces ordering of the current->cpu_cache
>> address store before updating the *cpu_cache.
> 
> Fair enough. Updating the comment might help.
> 
>> > 
>> >> +		/*
>> >> +		 * Do an initial cpu cache update to ensure we won't hit
>> >> +		 * SIGSEGV if put_user() fails in the resume notifier.
>> >> +		 */
>> > 
>> > If you get migrated before that call, then you SIGSEGV nevertheless.
>> 
>> No, because the SIGSEGV is only triggered when returning to userspace.
>> We are still in the system call here. All we care about in the migration
>> schedule code is to check the current->cpu_cache to see if we need to
>> raise the resume notifier flag. No userspace access there.
> 
> True. Should have gone to bed instead of staring at that code tired :)
> 
>> > You need that call here for the case you are NOT migrated before returning to
>> > user space because otherwise the variable is not updated.
>> 
>> This call has two goals: indeed, it populates the initial current CPU
>> value, but it also checks that the address is valid (returning -EFAULT
>> on error).
> 
> Right. So the comment should mention both.

Sure, I'm proposing the following documentation update:

diff --git a/kernel/getcpu_cache.c b/kernel/getcpu_cache.c
index 7053611..044f246 100644
--- a/kernel/getcpu_cache.c
+++ b/kernel/getcpu_cache.c
@@ -127,16 +127,27 @@ SYSCALL_DEFINE3(getcpu_cache, int, cmd, int32_t __user * __user *, cpu_cachep,
                }
                current->cpu_cache = cpu_cache;
                /*
-                * Migration checks the getcpu cache to see whether the
-                * notify_resume flag should be set.
+                * Migration reads the current->cpu_cache pointer to
+                * decide whether the notify_resume flag should be set.
                 * Therefore, we need to ensure that the scheduler sees
-                * the getcpu cache pointer update before we update the getcpu
-                * cache content with the current CPU number.
+                * the getcpu cache pointer update before we update the
+                * getcpu cache content with the current CPU number.
+                * This ensures we don't return from the getcpu_cache
+                * system call to userspace with a wrong CPU number in
+                * the cache if preempted and migrated after the initial
+                * successful cpu cache update (below).
+                *
+                * This compiler barrier enforces ordering of the
+                * current->cpu_cache address store before update of the
+                * *cpu_cache.
                 */
                barrier();
                /*
-                * Do an initial cpu cache update to ensure we won't hit
-                * SIGSEGV if put_user() fails in the resume notifier.
+                * Do an initial cpu cache update to populate the
+                * current CPU value, and to check whether the address
+                * is valid, thus ensuring we return -EFAULT in case of
+                * invalid address rather than triggering a SIGSEGV if
+                * put_user() fails in the resume notifier.
                 */
                if (getcpu_cache_update(cpu_cache)) {
                        current->cpu_cache = NULL;
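
While at it, this is roughly how the interface is meant to be used from
user space. GETCPU_CACHE_SET and __NR_getcpu_cache below are
placeholders for whatever the uapi header and the syscall wiring end up
defining, and the third syscall argument is assumed to be a flags word:

/* User-space sketch; GETCPU_CACHE_SET and __NR_getcpu_cache are
 * placeholder names, not taken from the patch. */
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>

static __thread int32_t cpu_cache;	/* kept up to date by the kernel */

static int cpu_cache_register(void)
{
	int32_t *p = &cpu_cache;

	/* Register this thread's cache; fails with EFAULT on a bad address. */
	return syscall(__NR_getcpu_cache, GETCPU_CACHE_SET, &p, 0);
}

static inline int32_t read_current_cpu(void)
{
	return cpu_cache;	/* plain load: no trap into the kernel */
}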

Thanks!

Mathieu


> 
> Thanks,
> 
> 	tglx

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-01-29 16:28 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-28 21:01 [RFC PATCH v3 0/5] getcpu_cache system call Mathieu Desnoyers
2016-01-28 21:01 ` Mathieu Desnoyers
2016-01-28 21:01 ` [RFC PATCH v3 1/5] getcpu_cache system call: cache CPU number of running thread Mathieu Desnoyers
2016-01-28 21:52   ` Thomas Gleixner
2016-01-28 21:52     ` Thomas Gleixner
2016-01-28 22:44     ` Mathieu Desnoyers
2016-01-28 22:44       ` Mathieu Desnoyers
2016-01-29  8:39       ` Thomas Gleixner
2016-01-29 16:28         ` Mathieu Desnoyers
2016-01-28 21:01 ` [RFC PATCH v3 2/5] getcpu_cache: ARM resume notifier Mathieu Desnoyers
2016-01-28 21:01   ` Mathieu Desnoyers
2016-01-28 21:01 ` [RFC PATCH v3 3/5] getcpu_cache: wire up ARM system call Mathieu Desnoyers
2016-01-28 21:01   ` Mathieu Desnoyers
2016-01-28 21:01 ` [RFC PATCH v3 4/5] getcpu_cache: x86 32/64 resume notifier Mathieu Desnoyers
2016-01-28 21:01 ` [RFC PATCH v3 5/5] getcpu_cache: wire up x86 32/64 system call Mathieu Desnoyers

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.