All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 1/2] membarrier: speed up single-threaded private cmd
@ 2017-08-27 19:54 Mathieu Desnoyers
  2017-08-27 19:54 ` [RFC PATCH 2/2] membarrier: provide register sync core cmd Mathieu Desnoyers
  0 siblings, 1 reply; 9+ messages in thread
From: Mathieu Desnoyers @ 2017-08-27 19:54 UTC (permalink / raw)
  To: Paul E . McKenney, Peter Zijlstra
  Cc: linux-kernel, Mathieu Desnoyers, Boqun Feng, Andrew Hunter,
	Maged Michael, gromer, Avi Kivity, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson

Single-threaded processes don't need to issue barriers, since the
thread's current CPU is the only one that matters.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: gromer@google.com
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
---
 kernel/sched/membarrier.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index a92fddc22747..7eec6914d2d2 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -39,7 +39,7 @@ static void membarrier_private_expedited(void)
 	bool fallback = false;
 	cpumask_var_t tmpmask;
 
-	if (num_online_cpus() == 1)
+	if (num_online_cpus() == 1 || get_nr_threads(current) == 1)
 		return;
 
 	/*
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH 2/2] membarrier: provide register sync core cmd
  2017-08-27 19:54 [RFC PATCH 1/2] membarrier: speed up single-threaded private cmd Mathieu Desnoyers
@ 2017-08-27 19:54 ` Mathieu Desnoyers
  2017-08-27 20:35   ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Mathieu Desnoyers @ 2017-08-27 19:54 UTC (permalink / raw)
  To: Paul E . McKenney, Peter Zijlstra
  Cc: linux-kernel, Mathieu Desnoyers, Boqun Feng, Andrew Hunter,
	Maged Michael, gromer, Avi Kivity, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, Andy Lutomirski,
	Will Deacon, Hans Boehm

Add a new MEMBARRIER_CMD_REGISTER_SYNC_CORE command to the membarrier
system call. It allows processes to register their intent to have their
threads issue core serializing barriers in addition to memory barriers
whenever a membarrier command is performed.

It is relevant for reclaim of JIT code, which requires to issue core
serializing barriers on all threads running on behalf of a process
after ensuring the old code is not visible anymore, before re-using
memory for new code.

When a processes returns from a MEMBARRIER_CMD_REGISTER_SYNC_CORE
command, it is guaranteed that all following MEMBARRIER_CMD_SHARED and
MEMBARRIER_CMD_PRIVATE_EXPEDITED issue core serializing barriers, in
addition to the memory barriers, on all of its running threads.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: gromer@google.com
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Andy Lutomirski <luto@kernel.org>
CC: Will Deacon <will.deacon@arm.com>
CC: Hans Boehm <hboehm@google.com>
---
 fs/exec.c                       |  1 +
 include/linux/sched.h           | 52 +++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/membarrier.h |  1 +
 kernel/fork.c                   |  2 ++
 kernel/sched/core.c             |  3 +++
 kernel/sched/membarrier.c       | 37 ++++++++++++++++++++++++++---
 6 files changed, 93 insertions(+), 3 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 62175cbcc801..a4ab3253bac7 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1794,6 +1794,7 @@ static int do_execveat_common(int fd, struct filename *filename,
 	/* execve succeeded */
 	current->fs->in_exec = 0;
 	current->in_execve = 0;
+	membarrier_execve(current);
 	acct_update_integrals(current);
 	task_numa_free(current);
 	free_bprm(bprm);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8337e2db0bb2..b1ecdc4e8b84 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1086,6 +1086,9 @@ struct task_struct {
 	/* Used by LSM modules for access restriction: */
 	void				*security;
 #endif
+#ifdef CONFIG_MEMBARRIER
+	int membarrier_sync_core;
+#endif
 
 	/*
 	 * New fields for task_struct should be added above here, so that
@@ -1623,4 +1626,53 @@ extern long sched_getaffinity(pid_t pid, struct cpumask *mask);
 #define TASK_SIZE_OF(tsk)	TASK_SIZE
 #endif
 
+#ifdef CONFIG_MEMBARRIER
+static inline void membarrier_fork(struct task_struct *t,
+		unsigned long clone_flags)
+{
+	/*
+	 * Coherence of membarrier_sync_core against thread fork is
+	 * protected by siglock. membarrier_fork is called with siglock
+	 * held.
+	 */
+	t->membarrier_sync_core = current->membarrier_sync_core;
+}
+static inline void membarrier_execve(struct task_struct *t)
+{
+	t->membarrier_sync_core = 0;
+}
+static inline void membarrier_sched_out(struct task_struct *t)
+{
+	/*
+	 * Core serialization is performed before the memory barrier
+	 * preceding the store to rq->curr.
+	 */
+	if (unlikely(READ_ONCE(t->membarrier_sync_core)))
+		sync_core();
+}
+static inline void membarrier_sched_in(struct task_struct *t)
+{
+	/*
+	 * Core serialization is performed after the memory barrier
+	 * following the store to rq->curr.
+	 */
+	if (unlikely(READ_ONCE(t->membarrier_sync_core)))
+		sync_core();
+}
+#else
+static inline void membarrier_fork(struct task_struct *t,
+		unsigned long clone_flags)
+{
+}
+static inline void membarrier_execve(struct task_struct *t)
+{
+}
+static inline void membarrier_sched_out(struct task_struct *t)
+{
+}
+static inline void membarrier_sched_in(struct task_struct *t)
+{
+}
+#endif
+
 #endif
diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
index 6d47b3249d8a..bc26fcf8fb7b 100644
--- a/include/uapi/linux/membarrier.h
+++ b/include/uapi/linux/membarrier.h
@@ -67,6 +67,7 @@ enum membarrier_cmd {
 	/* reserved for MEMBARRIER_CMD_SHARED_EXPEDITED (1 << 1) */
 	/* reserved for MEMBARRIER_CMD_PRIVATE (1 << 2) */
 	MEMBARRIER_CMD_PRIVATE_EXPEDITED	= (1 << 3),
+	MEMBARRIER_CMD_REGISTER_SYNC_CORE	= (1 << 4),
 };
 
 #endif /* _UAPI_LINUX_MEMBARRIER_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 17921b0390b4..713b3c932671 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1840,6 +1840,8 @@ static __latent_entropy struct task_struct *copy_process(
 	 */
 	copy_seccomp(p);
 
+	membarrier_fork(p, clone_flags);
+
 	/*
 	 * Process group and session signals need to be delivered to just the
 	 * parent before the fork or both the parent and the child after the
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0e36d9960d91..3ca27d066950 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3292,6 +3292,8 @@ static void __sched notrace __schedule(bool preempt)
 	local_irq_disable();
 	rcu_note_context_switch(preempt);
 
+	membarrier_sched_out(prev);
+
 	/*
 	 * Make sure that signal_pending_state()->signal_pending() below
 	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
@@ -3364,6 +3366,7 @@ static void __sched notrace __schedule(bool preempt)
 
 		/* Also unlocks the rq: */
 		rq = context_switch(rq, prev, next, &rf);
+		membarrier_sched_in(next);
 	} else {
 		rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
 		rq_unlock_irq(rq, &rf);
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 7eec6914d2d2..f128caf64b49 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -25,12 +25,16 @@
  * Bitmask made from a "or" of all commands within enum membarrier_cmd,
  * except MEMBARRIER_CMD_QUERY.
  */
-#define MEMBARRIER_CMD_BITMASK	\
-	(MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED)
+#define MEMBARRIER_CMD_BITMASK			\
+	(MEMBARRIER_CMD_SHARED			\
+	| MEMBARRIER_CMD_PRIVATE_EXPEDITED	\
+	| MEMBARRIER_CMD_REGISTER_SYNC_CORE)
 
 static void ipi_mb(void *info)
 {
-	smp_mb();	/* IPIs should be serializing but paranoid. */
+	/* IPIs should be serializing but paranoid. */
+	smp_mb();
+	sync_core();
 }
 
 static void membarrier_private_expedited(void)
@@ -96,6 +100,30 @@ static void membarrier_private_expedited(void)
 	smp_mb();	/* exit from system call is not a mb */
 }
 
+static void membarrier_register_sync_core(void)
+{
+	struct task_struct *p = current, *t;
+
+	if (get_nr_threads(p) == 1) {
+		p->membarrier_sync_core = 1;
+		return;
+	}
+
+	/*
+	 * Coherence of membarrier_sync_core against thread fork is
+	 * protected by siglock.
+	 */
+	spin_lock(&p->sighand->siglock);
+	for_each_thread(p, t)
+		WRITE_ONCE(t->membarrier_sync_core, 1);
+	spin_unlock(&p->sighand->siglock);
+	/*
+	 * Ensure all future scheduler execution will observe the new
+	 * membarrier_sync_core state for this process.
+	 */
+	synchronize_sched();
+}
+
 /**
  * sys_membarrier - issue memory barriers on a set of threads
  * @cmd:   Takes command values defined in enum membarrier_cmd.
@@ -146,6 +174,9 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
 	case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
 		membarrier_private_expedited();
 		return 0;
+	case MEMBARRIER_CMD_REGISTER_SYNC_CORE:
+		membarrier_register_sync_core();
+		return 0;
 	default:
 		return -EINVAL;
 	}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 2/2] membarrier: provide register sync core cmd
  2017-08-27 19:54 ` [RFC PATCH 2/2] membarrier: provide register sync core cmd Mathieu Desnoyers
@ 2017-08-27 20:35   ` Paul E. McKenney
  2017-08-27 20:52     ` Mathieu Desnoyers
  0 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2017-08-27 20:35 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, linux-kernel, Boqun Feng, Andrew Hunter,
	Maged Michael, gromer, Avi Kivity, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, Andy Lutomirski,
	Will Deacon, Hans Boehm

On Sun, Aug 27, 2017 at 12:54:04PM -0700, Mathieu Desnoyers wrote:
> Add a new MEMBARRIER_CMD_REGISTER_SYNC_CORE command to the membarrier
> system call. It allows processes to register their intent to have their
> threads issue core serializing barriers in addition to memory barriers
> whenever a membarrier command is performed.
> 
> It is relevant for reclaim of JIT code, which requires to issue core
> serializing barriers on all threads running on behalf of a process
> after ensuring the old code is not visible anymore, before re-using
> memory for new code.
> 
> When a processes returns from a MEMBARRIER_CMD_REGISTER_SYNC_CORE
> command, it is guaranteed that all following MEMBARRIER_CMD_SHARED and
> MEMBARRIER_CMD_PRIVATE_EXPEDITED issue core serializing barriers, in
> addition to the memory barriers, on all of its running threads.

I have queued both of these for testing and review:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
  f54cf5123eb9 ("membarrier: Speed up single-threaded private cmd")
  0d6eb99818da ("membarrier: Provide register sync core cmd")

Could the people who asked for this please test it?  The second commit
is non-obvious enough to need some careful review and intensive testing
before I can in good conscious pass it upstream.

							Thanx, Paul

> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> CC: Boqun Feng <boqun.feng@gmail.com>
> CC: Andrew Hunter <ahh@google.com>
> CC: Maged Michael <maged.michael@gmail.com>
> CC: gromer@google.com
> CC: Avi Kivity <avi@scylladb.com>
> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> CC: Paul Mackerras <paulus@samba.org>
> CC: Michael Ellerman <mpe@ellerman.id.au>
> CC: Dave Watson <davejwatson@fb.com>
> CC: Andy Lutomirski <luto@kernel.org>
> CC: Will Deacon <will.deacon@arm.com>
> CC: Hans Boehm <hboehm@google.com>
> ---
>  fs/exec.c                       |  1 +
>  include/linux/sched.h           | 52 +++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/membarrier.h |  1 +
>  kernel/fork.c                   |  2 ++
>  kernel/sched/core.c             |  3 +++
>  kernel/sched/membarrier.c       | 37 ++++++++++++++++++++++++++---
>  6 files changed, 93 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 62175cbcc801..a4ab3253bac7 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1794,6 +1794,7 @@ static int do_execveat_common(int fd, struct filename *filename,
>  	/* execve succeeded */
>  	current->fs->in_exec = 0;
>  	current->in_execve = 0;
> +	membarrier_execve(current);
>  	acct_update_integrals(current);
>  	task_numa_free(current);
>  	free_bprm(bprm);
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 8337e2db0bb2..b1ecdc4e8b84 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1086,6 +1086,9 @@ struct task_struct {
>  	/* Used by LSM modules for access restriction: */
>  	void				*security;
>  #endif
> +#ifdef CONFIG_MEMBARRIER
> +	int membarrier_sync_core;
> +#endif
> 
>  	/*
>  	 * New fields for task_struct should be added above here, so that
> @@ -1623,4 +1626,53 @@ extern long sched_getaffinity(pid_t pid, struct cpumask *mask);
>  #define TASK_SIZE_OF(tsk)	TASK_SIZE
>  #endif
> 
> +#ifdef CONFIG_MEMBARRIER
> +static inline void membarrier_fork(struct task_struct *t,
> +		unsigned long clone_flags)
> +{
> +	/*
> +	 * Coherence of membarrier_sync_core against thread fork is
> +	 * protected by siglock. membarrier_fork is called with siglock
> +	 * held.
> +	 */
> +	t->membarrier_sync_core = current->membarrier_sync_core;
> +}
> +static inline void membarrier_execve(struct task_struct *t)
> +{
> +	t->membarrier_sync_core = 0;
> +}
> +static inline void membarrier_sched_out(struct task_struct *t)
> +{
> +	/*
> +	 * Core serialization is performed before the memory barrier
> +	 * preceding the store to rq->curr.
> +	 */
> +	if (unlikely(READ_ONCE(t->membarrier_sync_core)))
> +		sync_core();
> +}
> +static inline void membarrier_sched_in(struct task_struct *t)
> +{
> +	/*
> +	 * Core serialization is performed after the memory barrier
> +	 * following the store to rq->curr.
> +	 */
> +	if (unlikely(READ_ONCE(t->membarrier_sync_core)))
> +		sync_core();
> +}
> +#else
> +static inline void membarrier_fork(struct task_struct *t,
> +		unsigned long clone_flags)
> +{
> +}
> +static inline void membarrier_execve(struct task_struct *t)
> +{
> +}
> +static inline void membarrier_sched_out(struct task_struct *t)
> +{
> +}
> +static inline void membarrier_sched_in(struct task_struct *t)
> +{
> +}
> +#endif
> +
>  #endif
> diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
> index 6d47b3249d8a..bc26fcf8fb7b 100644
> --- a/include/uapi/linux/membarrier.h
> +++ b/include/uapi/linux/membarrier.h
> @@ -67,6 +67,7 @@ enum membarrier_cmd {
>  	/* reserved for MEMBARRIER_CMD_SHARED_EXPEDITED (1 << 1) */
>  	/* reserved for MEMBARRIER_CMD_PRIVATE (1 << 2) */
>  	MEMBARRIER_CMD_PRIVATE_EXPEDITED	= (1 << 3),
> +	MEMBARRIER_CMD_REGISTER_SYNC_CORE	= (1 << 4),
>  };
> 
>  #endif /* _UAPI_LINUX_MEMBARRIER_H */
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 17921b0390b4..713b3c932671 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1840,6 +1840,8 @@ static __latent_entropy struct task_struct *copy_process(
>  	 */
>  	copy_seccomp(p);
> 
> +	membarrier_fork(p, clone_flags);
> +
>  	/*
>  	 * Process group and session signals need to be delivered to just the
>  	 * parent before the fork or both the parent and the child after the
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 0e36d9960d91..3ca27d066950 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3292,6 +3292,8 @@ static void __sched notrace __schedule(bool preempt)
>  	local_irq_disable();
>  	rcu_note_context_switch(preempt);
> 
> +	membarrier_sched_out(prev);
> +
>  	/*
>  	 * Make sure that signal_pending_state()->signal_pending() below
>  	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
> @@ -3364,6 +3366,7 @@ static void __sched notrace __schedule(bool preempt)
> 
>  		/* Also unlocks the rq: */
>  		rq = context_switch(rq, prev, next, &rf);
> +		membarrier_sched_in(next);
>  	} else {
>  		rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
>  		rq_unlock_irq(rq, &rf);
> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> index 7eec6914d2d2..f128caf64b49 100644
> --- a/kernel/sched/membarrier.c
> +++ b/kernel/sched/membarrier.c
> @@ -25,12 +25,16 @@
>   * Bitmask made from a "or" of all commands within enum membarrier_cmd,
>   * except MEMBARRIER_CMD_QUERY.
>   */
> -#define MEMBARRIER_CMD_BITMASK	\
> -	(MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED)
> +#define MEMBARRIER_CMD_BITMASK			\
> +	(MEMBARRIER_CMD_SHARED			\
> +	| MEMBARRIER_CMD_PRIVATE_EXPEDITED	\
> +	| MEMBARRIER_CMD_REGISTER_SYNC_CORE)
> 
>  static void ipi_mb(void *info)
>  {
> -	smp_mb();	/* IPIs should be serializing but paranoid. */
> +	/* IPIs should be serializing but paranoid. */
> +	smp_mb();
> +	sync_core();
>  }
> 
>  static void membarrier_private_expedited(void)
> @@ -96,6 +100,30 @@ static void membarrier_private_expedited(void)
>  	smp_mb();	/* exit from system call is not a mb */
>  }
> 
> +static void membarrier_register_sync_core(void)
> +{
> +	struct task_struct *p = current, *t;
> +
> +	if (get_nr_threads(p) == 1) {
> +		p->membarrier_sync_core = 1;
> +		return;
> +	}
> +
> +	/*
> +	 * Coherence of membarrier_sync_core against thread fork is
> +	 * protected by siglock.
> +	 */
> +	spin_lock(&p->sighand->siglock);
> +	for_each_thread(p, t)
> +		WRITE_ONCE(t->membarrier_sync_core, 1);
> +	spin_unlock(&p->sighand->siglock);
> +	/*
> +	 * Ensure all future scheduler execution will observe the new
> +	 * membarrier_sync_core state for this process.
> +	 */
> +	synchronize_sched();
> +}
> +
>  /**
>   * sys_membarrier - issue memory barriers on a set of threads
>   * @cmd:   Takes command values defined in enum membarrier_cmd.
> @@ -146,6 +174,9 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
>  	case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
>  		membarrier_private_expedited();
>  		return 0;
> +	case MEMBARRIER_CMD_REGISTER_SYNC_CORE:
> +		membarrier_register_sync_core();
> +		return 0;
>  	default:
>  		return -EINVAL;
>  	}
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 2/2] membarrier: provide register sync core cmd
  2017-08-27 20:35   ` Paul E. McKenney
@ 2017-08-27 20:52     ` Mathieu Desnoyers
  2017-08-27 21:00       ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Mathieu Desnoyers @ 2017-08-27 20:52 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, linux-kernel, Boqun Feng, Andrew Hunter,
	maged michael, gromer, Avi Kivity, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, Andy Lutomirski,
	Will Deacon, Hans Boehm

----- On Aug 27, 2017, at 1:35 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com wrote:

> On Sun, Aug 27, 2017 at 12:54:04PM -0700, Mathieu Desnoyers wrote:
>> Add a new MEMBARRIER_CMD_REGISTER_SYNC_CORE command to the membarrier
>> system call. It allows processes to register their intent to have their
>> threads issue core serializing barriers in addition to memory barriers
>> whenever a membarrier command is performed.
>> 
>> It is relevant for reclaim of JIT code, which requires to issue core
>> serializing barriers on all threads running on behalf of a process
>> after ensuring the old code is not visible anymore, before re-using
>> memory for new code.
>> 
>> When a processes returns from a MEMBARRIER_CMD_REGISTER_SYNC_CORE
>> command, it is guaranteed that all following MEMBARRIER_CMD_SHARED and
>> MEMBARRIER_CMD_PRIVATE_EXPEDITED issue core serializing barriers, in
>> addition to the memory barriers, on all of its running threads.
> 
> I have queued both of these for testing and review:
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
>  f54cf5123eb9 ("membarrier: Speed up single-threaded private cmd")
>  0d6eb99818da ("membarrier: Provide register sync core cmd")
> 
> Could the people who asked for this please test it?  The second commit
> is non-obvious enough to need some careful review and intensive testing
> before I can in good conscious pass it upstream.

Thanks Paul!

Can you pick up my v2 instead ? I added benchmarks in the changelog and
documented the new command in the membarrier public header. No changes
otherwise.

Thanks,

Mathieu

> 
>							Thanx, Paul
> 
>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> CC: Peter Zijlstra <peterz@infradead.org>
>> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>> CC: Boqun Feng <boqun.feng@gmail.com>
>> CC: Andrew Hunter <ahh@google.com>
>> CC: Maged Michael <maged.michael@gmail.com>
>> CC: gromer@google.com
>> CC: Avi Kivity <avi@scylladb.com>
>> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> CC: Paul Mackerras <paulus@samba.org>
>> CC: Michael Ellerman <mpe@ellerman.id.au>
>> CC: Dave Watson <davejwatson@fb.com>
>> CC: Andy Lutomirski <luto@kernel.org>
>> CC: Will Deacon <will.deacon@arm.com>
>> CC: Hans Boehm <hboehm@google.com>
>> ---
>>  fs/exec.c                       |  1 +
>>  include/linux/sched.h           | 52 +++++++++++++++++++++++++++++++++++++++++
>>  include/uapi/linux/membarrier.h |  1 +
>>  kernel/fork.c                   |  2 ++
>>  kernel/sched/core.c             |  3 +++
>>  kernel/sched/membarrier.c       | 37 ++++++++++++++++++++++++++---
>>  6 files changed, 93 insertions(+), 3 deletions(-)
>> 
>> diff --git a/fs/exec.c b/fs/exec.c
>> index 62175cbcc801..a4ab3253bac7 100644
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -1794,6 +1794,7 @@ static int do_execveat_common(int fd, struct filename
>> *filename,
>>  	/* execve succeeded */
>>  	current->fs->in_exec = 0;
>>  	current->in_execve = 0;
>> +	membarrier_execve(current);
>>  	acct_update_integrals(current);
>>  	task_numa_free(current);
>>  	free_bprm(bprm);
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 8337e2db0bb2..b1ecdc4e8b84 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1086,6 +1086,9 @@ struct task_struct {
>>  	/* Used by LSM modules for access restriction: */
>>  	void				*security;
>>  #endif
>> +#ifdef CONFIG_MEMBARRIER
>> +	int membarrier_sync_core;
>> +#endif
>> 
>>  	/*
>>  	 * New fields for task_struct should be added above here, so that
>> @@ -1623,4 +1626,53 @@ extern long sched_getaffinity(pid_t pid, struct cpumask
>> *mask);
>>  #define TASK_SIZE_OF(tsk)	TASK_SIZE
>>  #endif
>> 
>> +#ifdef CONFIG_MEMBARRIER
>> +static inline void membarrier_fork(struct task_struct *t,
>> +		unsigned long clone_flags)
>> +{
>> +	/*
>> +	 * Coherence of membarrier_sync_core against thread fork is
>> +	 * protected by siglock. membarrier_fork is called with siglock
>> +	 * held.
>> +	 */
>> +	t->membarrier_sync_core = current->membarrier_sync_core;
>> +}
>> +static inline void membarrier_execve(struct task_struct *t)
>> +{
>> +	t->membarrier_sync_core = 0;
>> +}
>> +static inline void membarrier_sched_out(struct task_struct *t)
>> +{
>> +	/*
>> +	 * Core serialization is performed before the memory barrier
>> +	 * preceding the store to rq->curr.
>> +	 */
>> +	if (unlikely(READ_ONCE(t->membarrier_sync_core)))
>> +		sync_core();
>> +}
>> +static inline void membarrier_sched_in(struct task_struct *t)
>> +{
>> +	/*
>> +	 * Core serialization is performed after the memory barrier
>> +	 * following the store to rq->curr.
>> +	 */
>> +	if (unlikely(READ_ONCE(t->membarrier_sync_core)))
>> +		sync_core();
>> +}
>> +#else
>> +static inline void membarrier_fork(struct task_struct *t,
>> +		unsigned long clone_flags)
>> +{
>> +}
>> +static inline void membarrier_execve(struct task_struct *t)
>> +{
>> +}
>> +static inline void membarrier_sched_out(struct task_struct *t)
>> +{
>> +}
>> +static inline void membarrier_sched_in(struct task_struct *t)
>> +{
>> +}
>> +#endif
>> +
>>  #endif
>> diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
>> index 6d47b3249d8a..bc26fcf8fb7b 100644
>> --- a/include/uapi/linux/membarrier.h
>> +++ b/include/uapi/linux/membarrier.h
>> @@ -67,6 +67,7 @@ enum membarrier_cmd {
>>  	/* reserved for MEMBARRIER_CMD_SHARED_EXPEDITED (1 << 1) */
>>  	/* reserved for MEMBARRIER_CMD_PRIVATE (1 << 2) */
>>  	MEMBARRIER_CMD_PRIVATE_EXPEDITED	= (1 << 3),
>> +	MEMBARRIER_CMD_REGISTER_SYNC_CORE	= (1 << 4),
>>  };
>> 
>>  #endif /* _UAPI_LINUX_MEMBARRIER_H */
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index 17921b0390b4..713b3c932671 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -1840,6 +1840,8 @@ static __latent_entropy struct task_struct *copy_process(
>>  	 */
>>  	copy_seccomp(p);
>> 
>> +	membarrier_fork(p, clone_flags);
>> +
>>  	/*
>>  	 * Process group and session signals need to be delivered to just the
>>  	 * parent before the fork or both the parent and the child after the
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 0e36d9960d91..3ca27d066950 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -3292,6 +3292,8 @@ static void __sched notrace __schedule(bool preempt)
>>  	local_irq_disable();
>>  	rcu_note_context_switch(preempt);
>> 
>> +	membarrier_sched_out(prev);
>> +
>>  	/*
>>  	 * Make sure that signal_pending_state()->signal_pending() below
>>  	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
>> @@ -3364,6 +3366,7 @@ static void __sched notrace __schedule(bool preempt)
>> 
>>  		/* Also unlocks the rq: */
>>  		rq = context_switch(rq, prev, next, &rf);
>> +		membarrier_sched_in(next);
>>  	} else {
>>  		rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
>>  		rq_unlock_irq(rq, &rf);
>> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
>> index 7eec6914d2d2..f128caf64b49 100644
>> --- a/kernel/sched/membarrier.c
>> +++ b/kernel/sched/membarrier.c
>> @@ -25,12 +25,16 @@
>>   * Bitmask made from a "or" of all commands within enum membarrier_cmd,
>>   * except MEMBARRIER_CMD_QUERY.
>>   */
>> -#define MEMBARRIER_CMD_BITMASK	\
>> -	(MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED)
>> +#define MEMBARRIER_CMD_BITMASK			\
>> +	(MEMBARRIER_CMD_SHARED			\
>> +	| MEMBARRIER_CMD_PRIVATE_EXPEDITED	\
>> +	| MEMBARRIER_CMD_REGISTER_SYNC_CORE)
>> 
>>  static void ipi_mb(void *info)
>>  {
>> -	smp_mb();	/* IPIs should be serializing but paranoid. */
>> +	/* IPIs should be serializing but paranoid. */
>> +	smp_mb();
>> +	sync_core();
>>  }
>> 
>>  static void membarrier_private_expedited(void)
>> @@ -96,6 +100,30 @@ static void membarrier_private_expedited(void)
>>  	smp_mb();	/* exit from system call is not a mb */
>>  }
>> 
>> +static void membarrier_register_sync_core(void)
>> +{
>> +	struct task_struct *p = current, *t;
>> +
>> +	if (get_nr_threads(p) == 1) {
>> +		p->membarrier_sync_core = 1;
>> +		return;
>> +	}
>> +
>> +	/*
>> +	 * Coherence of membarrier_sync_core against thread fork is
>> +	 * protected by siglock.
>> +	 */
>> +	spin_lock(&p->sighand->siglock);
>> +	for_each_thread(p, t)
>> +		WRITE_ONCE(t->membarrier_sync_core, 1);
>> +	spin_unlock(&p->sighand->siglock);
>> +	/*
>> +	 * Ensure all future scheduler execution will observe the new
>> +	 * membarrier_sync_core state for this process.
>> +	 */
>> +	synchronize_sched();
>> +}
>> +
>>  /**
>>   * sys_membarrier - issue memory barriers on a set of threads
>>   * @cmd:   Takes command values defined in enum membarrier_cmd.
>> @@ -146,6 +174,9 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
>>  	case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
>>  		membarrier_private_expedited();
>>  		return 0;
>> +	case MEMBARRIER_CMD_REGISTER_SYNC_CORE:
>> +		membarrier_register_sync_core();
>> +		return 0;
>>  	default:
>>  		return -EINVAL;
>>  	}
>> --
>> 2.11.0

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 2/2] membarrier: provide register sync core cmd
  2017-08-27 20:52     ` Mathieu Desnoyers
@ 2017-08-27 21:00       ` Paul E. McKenney
  2017-08-27 21:39         ` Mathieu Desnoyers
  0 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2017-08-27 21:00 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, linux-kernel, Boqun Feng, Andrew Hunter,
	maged michael, gromer, Avi Kivity, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, Andy Lutomirski,
	Will Deacon, Hans Boehm

On Sun, Aug 27, 2017 at 08:52:58PM +0000, Mathieu Desnoyers wrote:
> ----- On Aug 27, 2017, at 1:35 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com wrote:
> 
> > On Sun, Aug 27, 2017 at 12:54:04PM -0700, Mathieu Desnoyers wrote:
> >> Add a new MEMBARRIER_CMD_REGISTER_SYNC_CORE command to the membarrier
> >> system call. It allows processes to register their intent to have their
> >> threads issue core serializing barriers in addition to memory barriers
> >> whenever a membarrier command is performed.
> >> 
> >> It is relevant for reclaim of JIT code, which requires to issue core
> >> serializing barriers on all threads running on behalf of a process
> >> after ensuring the old code is not visible anymore, before re-using
> >> memory for new code.
> >> 
> >> When a processes returns from a MEMBARRIER_CMD_REGISTER_SYNC_CORE
> >> command, it is guaranteed that all following MEMBARRIER_CMD_SHARED and
> >> MEMBARRIER_CMD_PRIVATE_EXPEDITED issue core serializing barriers, in
> >> addition to the memory barriers, on all of its running threads.
> > 
> > I have queued both of these for testing and review:
> > 
> >  git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
> >  f54cf5123eb9 ("membarrier: Speed up single-threaded private cmd")
> >  0d6eb99818da ("membarrier: Provide register sync core cmd")
> > 
> > Could the people who asked for this please test it?  The second commit
> > is non-obvious enough to need some careful review and intensive testing
> > before I can in good conscious pass it upstream.
> 
> Thanks Paul!
> 
> Can you pick up my v2 instead ? I added benchmarks in the changelog and
> documented the new command in the membarrier public header. No changes
> otherwise.

It looks like 0day Test Robot also wants some additional header files.
Could you please send me a v3 with its complaints addressed?

							Thanx, Paul

> Thanks,
> 
> Mathieu
> 
> > 
> >							Thanx, Paul
> > 
> >> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> >> CC: Peter Zijlstra <peterz@infradead.org>
> >> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >> CC: Boqun Feng <boqun.feng@gmail.com>
> >> CC: Andrew Hunter <ahh@google.com>
> >> CC: Maged Michael <maged.michael@gmail.com>
> >> CC: gromer@google.com
> >> CC: Avi Kivity <avi@scylladb.com>
> >> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> >> CC: Paul Mackerras <paulus@samba.org>
> >> CC: Michael Ellerman <mpe@ellerman.id.au>
> >> CC: Dave Watson <davejwatson@fb.com>
> >> CC: Andy Lutomirski <luto@kernel.org>
> >> CC: Will Deacon <will.deacon@arm.com>
> >> CC: Hans Boehm <hboehm@google.com>
> >> ---
> >>  fs/exec.c                       |  1 +
> >>  include/linux/sched.h           | 52 +++++++++++++++++++++++++++++++++++++++++
> >>  include/uapi/linux/membarrier.h |  1 +
> >>  kernel/fork.c                   |  2 ++
> >>  kernel/sched/core.c             |  3 +++
> >>  kernel/sched/membarrier.c       | 37 ++++++++++++++++++++++++++---
> >>  6 files changed, 93 insertions(+), 3 deletions(-)
> >> 
> >> diff --git a/fs/exec.c b/fs/exec.c
> >> index 62175cbcc801..a4ab3253bac7 100644
> >> --- a/fs/exec.c
> >> +++ b/fs/exec.c
> >> @@ -1794,6 +1794,7 @@ static int do_execveat_common(int fd, struct filename
> >> *filename,
> >>  	/* execve succeeded */
> >>  	current->fs->in_exec = 0;
> >>  	current->in_execve = 0;
> >> +	membarrier_execve(current);
> >>  	acct_update_integrals(current);
> >>  	task_numa_free(current);
> >>  	free_bprm(bprm);
> >> diff --git a/include/linux/sched.h b/include/linux/sched.h
> >> index 8337e2db0bb2..b1ecdc4e8b84 100644
> >> --- a/include/linux/sched.h
> >> +++ b/include/linux/sched.h
> >> @@ -1086,6 +1086,9 @@ struct task_struct {
> >>  	/* Used by LSM modules for access restriction: */
> >>  	void				*security;
> >>  #endif
> >> +#ifdef CONFIG_MEMBARRIER
> >> +	int membarrier_sync_core;
> >> +#endif
> >> 
> >>  	/*
> >>  	 * New fields for task_struct should be added above here, so that
> >> @@ -1623,4 +1626,53 @@ extern long sched_getaffinity(pid_t pid, struct cpumask
> >> *mask);
> >>  #define TASK_SIZE_OF(tsk)	TASK_SIZE
> >>  #endif
> >> 
> >> +#ifdef CONFIG_MEMBARRIER
> >> +static inline void membarrier_fork(struct task_struct *t,
> >> +		unsigned long clone_flags)
> >> +{
> >> +	/*
> >> +	 * Coherence of membarrier_sync_core against thread fork is
> >> +	 * protected by siglock. membarrier_fork is called with siglock
> >> +	 * held.
> >> +	 */
> >> +	t->membarrier_sync_core = current->membarrier_sync_core;
> >> +}
> >> +static inline void membarrier_execve(struct task_struct *t)
> >> +{
> >> +	t->membarrier_sync_core = 0;
> >> +}
> >> +static inline void membarrier_sched_out(struct task_struct *t)
> >> +{
> >> +	/*
> >> +	 * Core serialization is performed before the memory barrier
> >> +	 * preceding the store to rq->curr.
> >> +	 */
> >> +	if (unlikely(READ_ONCE(t->membarrier_sync_core)))
> >> +		sync_core();
> >> +}
> >> +static inline void membarrier_sched_in(struct task_struct *t)
> >> +{
> >> +	/*
> >> +	 * Core serialization is performed after the memory barrier
> >> +	 * following the store to rq->curr.
> >> +	 */
> >> +	if (unlikely(READ_ONCE(t->membarrier_sync_core)))
> >> +		sync_core();
> >> +}
> >> +#else
> >> +static inline void membarrier_fork(struct task_struct *t,
> >> +		unsigned long clone_flags)
> >> +{
> >> +}
> >> +static inline void membarrier_execve(struct task_struct *t)
> >> +{
> >> +}
> >> +static inline void membarrier_sched_out(struct task_struct *t)
> >> +{
> >> +}
> >> +static inline void membarrier_sched_in(struct task_struct *t)
> >> +{
> >> +}
> >> +#endif
> >> +
> >>  #endif
> >> diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
> >> index 6d47b3249d8a..bc26fcf8fb7b 100644
> >> --- a/include/uapi/linux/membarrier.h
> >> +++ b/include/uapi/linux/membarrier.h
> >> @@ -67,6 +67,7 @@ enum membarrier_cmd {
> >>  	/* reserved for MEMBARRIER_CMD_SHARED_EXPEDITED (1 << 1) */
> >>  	/* reserved for MEMBARRIER_CMD_PRIVATE (1 << 2) */
> >>  	MEMBARRIER_CMD_PRIVATE_EXPEDITED	= (1 << 3),
> >> +	MEMBARRIER_CMD_REGISTER_SYNC_CORE	= (1 << 4),
> >>  };
> >> 
> >>  #endif /* _UAPI_LINUX_MEMBARRIER_H */
> >> diff --git a/kernel/fork.c b/kernel/fork.c
> >> index 17921b0390b4..713b3c932671 100644
> >> --- a/kernel/fork.c
> >> +++ b/kernel/fork.c
> >> @@ -1840,6 +1840,8 @@ static __latent_entropy struct task_struct *copy_process(
> >>  	 */
> >>  	copy_seccomp(p);
> >> 
> >> +	membarrier_fork(p, clone_flags);
> >> +
> >>  	/*
> >>  	 * Process group and session signals need to be delivered to just the
> >>  	 * parent before the fork or both the parent and the child after the
> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >> index 0e36d9960d91..3ca27d066950 100644
> >> --- a/kernel/sched/core.c
> >> +++ b/kernel/sched/core.c
> >> @@ -3292,6 +3292,8 @@ static void __sched notrace __schedule(bool preempt)
> >>  	local_irq_disable();
> >>  	rcu_note_context_switch(preempt);
> >> 
> >> +	membarrier_sched_out(prev);
> >> +
> >>  	/*
> >>  	 * Make sure that signal_pending_state()->signal_pending() below
> >>  	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
> >> @@ -3364,6 +3366,7 @@ static void __sched notrace __schedule(bool preempt)
> >> 
> >>  		/* Also unlocks the rq: */
> >>  		rq = context_switch(rq, prev, next, &rf);
> >> +		membarrier_sched_in(next);
> >>  	} else {
> >>  		rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
> >>  		rq_unlock_irq(rq, &rf);
> >> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> >> index 7eec6914d2d2..f128caf64b49 100644
> >> --- a/kernel/sched/membarrier.c
> >> +++ b/kernel/sched/membarrier.c
> >> @@ -25,12 +25,16 @@
> >>   * Bitmask made from a "or" of all commands within enum membarrier_cmd,
> >>   * except MEMBARRIER_CMD_QUERY.
> >>   */
> >> -#define MEMBARRIER_CMD_BITMASK	\
> >> -	(MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED)
> >> +#define MEMBARRIER_CMD_BITMASK			\
> >> +	(MEMBARRIER_CMD_SHARED			\
> >> +	| MEMBARRIER_CMD_PRIVATE_EXPEDITED	\
> >> +	| MEMBARRIER_CMD_REGISTER_SYNC_CORE)
> >> 
> >>  static void ipi_mb(void *info)
> >>  {
> >> -	smp_mb();	/* IPIs should be serializing but paranoid. */
> >> +	/* IPIs should be serializing but paranoid. */
> >> +	smp_mb();
> >> +	sync_core();
> >>  }
> >> 
> >>  static void membarrier_private_expedited(void)
> >> @@ -96,6 +100,30 @@ static void membarrier_private_expedited(void)
> >>  	smp_mb();	/* exit from system call is not a mb */
> >>  }
> >> 
> >> +static void membarrier_register_sync_core(void)
> >> +{
> >> +	struct task_struct *p = current, *t;
> >> +
> >> +	if (get_nr_threads(p) == 1) {
> >> +		p->membarrier_sync_core = 1;
> >> +		return;
> >> +	}
> >> +
> >> +	/*
> >> +	 * Coherence of membarrier_sync_core against thread fork is
> >> +	 * protected by siglock.
> >> +	 */
> >> +	spin_lock(&p->sighand->siglock);
> >> +	for_each_thread(p, t)
> >> +		WRITE_ONCE(t->membarrier_sync_core, 1);
> >> +	spin_unlock(&p->sighand->siglock);
> >> +	/*
> >> +	 * Ensure all future scheduler execution will observe the new
> >> +	 * membarrier_sync_core state for this process.
> >> +	 */
> >> +	synchronize_sched();
> >> +}
> >> +
> >>  /**
> >>   * sys_membarrier - issue memory barriers on a set of threads
> >>   * @cmd:   Takes command values defined in enum membarrier_cmd.
> >> @@ -146,6 +174,9 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
> >>  	case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
> >>  		membarrier_private_expedited();
> >>  		return 0;
> >> +	case MEMBARRIER_CMD_REGISTER_SYNC_CORE:
> >> +		membarrier_register_sync_core();
> >> +		return 0;
> >>  	default:
> >>  		return -EINVAL;
> >>  	}
> >> --
> >> 2.11.0
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 2/2] membarrier: provide register sync core cmd
  2017-08-27 21:00       ` Paul E. McKenney
@ 2017-08-27 21:39         ` Mathieu Desnoyers
  2017-08-28 17:49           ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Mathieu Desnoyers @ 2017-08-27 21:39 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, linux-kernel, Boqun Feng, Andrew Hunter,
	maged michael, gromer, Avi Kivity, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, Andy Lutomirski,
	Will Deacon, Hans Boehm

----- On Aug 27, 2017, at 2:00 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com wrote:

> On Sun, Aug 27, 2017 at 08:52:58PM +0000, Mathieu Desnoyers wrote:
>> ----- On Aug 27, 2017, at 1:35 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com
>> wrote:
>> 
>> > On Sun, Aug 27, 2017 at 12:54:04PM -0700, Mathieu Desnoyers wrote:
>> >> Add a new MEMBARRIER_CMD_REGISTER_SYNC_CORE command to the membarrier
>> >> system call. It allows processes to register their intent to have their
>> >> threads issue core serializing barriers in addition to memory barriers
>> >> whenever a membarrier command is performed.
>> >> 
>> >> It is relevant for reclaim of JIT code, which requires to issue core
>> >> serializing barriers on all threads running on behalf of a process
>> >> after ensuring the old code is not visible anymore, before re-using
>> >> memory for new code.
>> >> 
>> >> When a processes returns from a MEMBARRIER_CMD_REGISTER_SYNC_CORE
>> >> command, it is guaranteed that all following MEMBARRIER_CMD_SHARED and
>> >> MEMBARRIER_CMD_PRIVATE_EXPEDITED issue core serializing barriers, in
>> >> addition to the memory barriers, on all of its running threads.
>> > 
>> > I have queued both of these for testing and review:
>> > 
>> >  git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
>> >  f54cf5123eb9 ("membarrier: Speed up single-threaded private cmd")
>> >  0d6eb99818da ("membarrier: Provide register sync core cmd")
>> > 
>> > Could the people who asked for this please test it?  The second commit
>> > is non-obvious enough to need some careful review and intensive testing
>> > before I can in good conscious pass it upstream.
>> 
>> Thanks Paul!
>> 
>> Can you pick up my v2 instead ? I added benchmarks in the changelog and
>> documented the new command in the membarrier public header. No changes
>> otherwise.
> 
> It looks like 0day Test Robot also wants some additional header files.
> Could you please send me a v3 with its complaints addressed?

I sent a separate patch implementing sync_core() on xtensa. It should
be applied before this patch.

Hopefully we can get some testing from people with xtensa testing
environments.

Thanks,

Mathieu

> 
>							Thanx, Paul
> 
>> Thanks,
>> 
>> Mathieu
>> 
>> > 
>> >							Thanx, Paul
>> > 
>> >> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> >> CC: Peter Zijlstra <peterz@infradead.org>
>> >> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>> >> CC: Boqun Feng <boqun.feng@gmail.com>
>> >> CC: Andrew Hunter <ahh@google.com>
>> >> CC: Maged Michael <maged.michael@gmail.com>
>> >> CC: gromer@google.com
>> >> CC: Avi Kivity <avi@scylladb.com>
>> >> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> >> CC: Paul Mackerras <paulus@samba.org>
>> >> CC: Michael Ellerman <mpe@ellerman.id.au>
>> >> CC: Dave Watson <davejwatson@fb.com>
>> >> CC: Andy Lutomirski <luto@kernel.org>
>> >> CC: Will Deacon <will.deacon@arm.com>
>> >> CC: Hans Boehm <hboehm@google.com>
>> >> ---
>> >>  fs/exec.c                       |  1 +
>> >>  include/linux/sched.h           | 52 +++++++++++++++++++++++++++++++++++++++++
>> >>  include/uapi/linux/membarrier.h |  1 +
>> >>  kernel/fork.c                   |  2 ++
>> >>  kernel/sched/core.c             |  3 +++
>> >>  kernel/sched/membarrier.c       | 37 ++++++++++++++++++++++++++---
>> >>  6 files changed, 93 insertions(+), 3 deletions(-)
>> >> 
>> >> diff --git a/fs/exec.c b/fs/exec.c
>> >> index 62175cbcc801..a4ab3253bac7 100644
>> >> --- a/fs/exec.c
>> >> +++ b/fs/exec.c
>> >> @@ -1794,6 +1794,7 @@ static int do_execveat_common(int fd, struct filename
>> >> *filename,
>> >>  	/* execve succeeded */
>> >>  	current->fs->in_exec = 0;
>> >>  	current->in_execve = 0;
>> >> +	membarrier_execve(current);
>> >>  	acct_update_integrals(current);
>> >>  	task_numa_free(current);
>> >>  	free_bprm(bprm);
>> >> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> >> index 8337e2db0bb2..b1ecdc4e8b84 100644
>> >> --- a/include/linux/sched.h
>> >> +++ b/include/linux/sched.h
>> >> @@ -1086,6 +1086,9 @@ struct task_struct {
>> >>  	/* Used by LSM modules for access restriction: */
>> >>  	void				*security;
>> >>  #endif
>> >> +#ifdef CONFIG_MEMBARRIER
>> >> +	int membarrier_sync_core;
>> >> +#endif
>> >> 
>> >>  	/*
>> >>  	 * New fields for task_struct should be added above here, so that
>> >> @@ -1623,4 +1626,53 @@ extern long sched_getaffinity(pid_t pid, struct cpumask
>> >> *mask);
>> >>  #define TASK_SIZE_OF(tsk)	TASK_SIZE
>> >>  #endif
>> >> 
>> >> +#ifdef CONFIG_MEMBARRIER
>> >> +static inline void membarrier_fork(struct task_struct *t,
>> >> +		unsigned long clone_flags)
>> >> +{
>> >> +	/*
>> >> +	 * Coherence of membarrier_sync_core against thread fork is
>> >> +	 * protected by siglock. membarrier_fork is called with siglock
>> >> +	 * held.
>> >> +	 */
>> >> +	t->membarrier_sync_core = current->membarrier_sync_core;
>> >> +}
>> >> +static inline void membarrier_execve(struct task_struct *t)
>> >> +{
>> >> +	t->membarrier_sync_core = 0;
>> >> +}
>> >> +static inline void membarrier_sched_out(struct task_struct *t)
>> >> +{
>> >> +	/*
>> >> +	 * Core serialization is performed before the memory barrier
>> >> +	 * preceding the store to rq->curr.
>> >> +	 */
>> >> +	if (unlikely(READ_ONCE(t->membarrier_sync_core)))
>> >> +		sync_core();
>> >> +}
>> >> +static inline void membarrier_sched_in(struct task_struct *t)
>> >> +{
>> >> +	/*
>> >> +	 * Core serialization is performed after the memory barrier
>> >> +	 * following the store to rq->curr.
>> >> +	 */
>> >> +	if (unlikely(READ_ONCE(t->membarrier_sync_core)))
>> >> +		sync_core();
>> >> +}
>> >> +#else
>> >> +static inline void membarrier_fork(struct task_struct *t,
>> >> +		unsigned long clone_flags)
>> >> +{
>> >> +}
>> >> +static inline void membarrier_execve(struct task_struct *t)
>> >> +{
>> >> +}
>> >> +static inline void membarrier_sched_out(struct task_struct *t)
>> >> +{
>> >> +}
>> >> +static inline void membarrier_sched_in(struct task_struct *t)
>> >> +{
>> >> +}
>> >> +#endif
>> >> +
>> >>  #endif
>> >> diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
>> >> index 6d47b3249d8a..bc26fcf8fb7b 100644
>> >> --- a/include/uapi/linux/membarrier.h
>> >> +++ b/include/uapi/linux/membarrier.h
>> >> @@ -67,6 +67,7 @@ enum membarrier_cmd {
>> >>  	/* reserved for MEMBARRIER_CMD_SHARED_EXPEDITED (1 << 1) */
>> >>  	/* reserved for MEMBARRIER_CMD_PRIVATE (1 << 2) */
>> >>  	MEMBARRIER_CMD_PRIVATE_EXPEDITED	= (1 << 3),
>> >> +	MEMBARRIER_CMD_REGISTER_SYNC_CORE	= (1 << 4),
>> >>  };
>> >> 
>> >>  #endif /* _UAPI_LINUX_MEMBARRIER_H */
>> >> diff --git a/kernel/fork.c b/kernel/fork.c
>> >> index 17921b0390b4..713b3c932671 100644
>> >> --- a/kernel/fork.c
>> >> +++ b/kernel/fork.c
>> >> @@ -1840,6 +1840,8 @@ static __latent_entropy struct task_struct *copy_process(
>> >>  	 */
>> >>  	copy_seccomp(p);
>> >> 
>> >> +	membarrier_fork(p, clone_flags);
>> >> +
>> >>  	/*
>> >>  	 * Process group and session signals need to be delivered to just the
>> >>  	 * parent before the fork or both the parent and the child after the
>> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> >> index 0e36d9960d91..3ca27d066950 100644
>> >> --- a/kernel/sched/core.c
>> >> +++ b/kernel/sched/core.c
>> >> @@ -3292,6 +3292,8 @@ static void __sched notrace __schedule(bool preempt)
>> >>  	local_irq_disable();
>> >>  	rcu_note_context_switch(preempt);
>> >> 
>> >> +	membarrier_sched_out(prev);
>> >> +
>> >>  	/*
>> >>  	 * Make sure that signal_pending_state()->signal_pending() below
>> >>  	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
>> >> @@ -3364,6 +3366,7 @@ static void __sched notrace __schedule(bool preempt)
>> >> 
>> >>  		/* Also unlocks the rq: */
>> >>  		rq = context_switch(rq, prev, next, &rf);
>> >> +		membarrier_sched_in(next);
>> >>  	} else {
>> >>  		rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
>> >>  		rq_unlock_irq(rq, &rf);
>> >> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
>> >> index 7eec6914d2d2..f128caf64b49 100644
>> >> --- a/kernel/sched/membarrier.c
>> >> +++ b/kernel/sched/membarrier.c
>> >> @@ -25,12 +25,16 @@
>> >>   * Bitmask made from a "or" of all commands within enum membarrier_cmd,
>> >>   * except MEMBARRIER_CMD_QUERY.
>> >>   */
>> >> -#define MEMBARRIER_CMD_BITMASK	\
>> >> -	(MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED)
>> >> +#define MEMBARRIER_CMD_BITMASK			\
>> >> +	(MEMBARRIER_CMD_SHARED			\
>> >> +	| MEMBARRIER_CMD_PRIVATE_EXPEDITED	\
>> >> +	| MEMBARRIER_CMD_REGISTER_SYNC_CORE)
>> >> 
>> >>  static void ipi_mb(void *info)
>> >>  {
>> >> -	smp_mb();	/* IPIs should be serializing but paranoid. */
>> >> +	/* IPIs should be serializing but paranoid. */
>> >> +	smp_mb();
>> >> +	sync_core();
>> >>  }
>> >> 
>> >>  static void membarrier_private_expedited(void)
>> >> @@ -96,6 +100,30 @@ static void membarrier_private_expedited(void)
>> >>  	smp_mb();	/* exit from system call is not a mb */
>> >>  }
>> >> 
>> >> +static void membarrier_register_sync_core(void)
>> >> +{
>> >> +	struct task_struct *p = current, *t;
>> >> +
>> >> +	if (get_nr_threads(p) == 1) {
>> >> +		p->membarrier_sync_core = 1;
>> >> +		return;
>> >> +	}
>> >> +
>> >> +	/*
>> >> +	 * Coherence of membarrier_sync_core against thread fork is
>> >> +	 * protected by siglock.
>> >> +	 */
>> >> +	spin_lock(&p->sighand->siglock);
>> >> +	for_each_thread(p, t)
>> >> +		WRITE_ONCE(t->membarrier_sync_core, 1);
>> >> +	spin_unlock(&p->sighand->siglock);
>> >> +	/*
>> >> +	 * Ensure all future scheduler execution will observe the new
>> >> +	 * membarrier_sync_core state for this process.
>> >> +	 */
>> >> +	synchronize_sched();
>> >> +}
>> >> +
>> >>  /**
>> >>   * sys_membarrier - issue memory barriers on a set of threads
>> >>   * @cmd:   Takes command values defined in enum membarrier_cmd.
>> >> @@ -146,6 +174,9 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
>> >>  	case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
>> >>  		membarrier_private_expedited();
>> >>  		return 0;
>> >> +	case MEMBARRIER_CMD_REGISTER_SYNC_CORE:
>> >> +		membarrier_register_sync_core();
>> >> +		return 0;
>> >>  	default:
>> >>  		return -EINVAL;
>> >>  	}
>> >> --
>> >> 2.11.0
>> 
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 2/2] membarrier: provide register sync core cmd
  2017-08-27 21:39         ` Mathieu Desnoyers
@ 2017-08-28 17:49           ` Paul E. McKenney
  2017-08-30 17:53             ` Mathieu Desnoyers
  0 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2017-08-28 17:49 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, linux-kernel, Boqun Feng, Andrew Hunter,
	maged michael, gromer, Avi Kivity, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, Andy Lutomirski,
	Will Deacon, Hans Boehm

On Sun, Aug 27, 2017 at 09:39:54PM +0000, Mathieu Desnoyers wrote:
> ----- On Aug 27, 2017, at 2:00 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com wrote:
> 
> > On Sun, Aug 27, 2017 at 08:52:58PM +0000, Mathieu Desnoyers wrote:
> >> ----- On Aug 27, 2017, at 1:35 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com
> >> wrote:
> >> 
> >> > On Sun, Aug 27, 2017 at 12:54:04PM -0700, Mathieu Desnoyers wrote:
> >> >> Add a new MEMBARRIER_CMD_REGISTER_SYNC_CORE command to the membarrier
> >> >> system call. It allows processes to register their intent to have their
> >> >> threads issue core serializing barriers in addition to memory barriers
> >> >> whenever a membarrier command is performed.
> >> >> 
> >> >> It is relevant for reclaim of JIT code, which requires to issue core
> >> >> serializing barriers on all threads running on behalf of a process
> >> >> after ensuring the old code is not visible anymore, before re-using
> >> >> memory for new code.
> >> >> 
> >> >> When a processes returns from a MEMBARRIER_CMD_REGISTER_SYNC_CORE
> >> >> command, it is guaranteed that all following MEMBARRIER_CMD_SHARED and
> >> >> MEMBARRIER_CMD_PRIVATE_EXPEDITED issue core serializing barriers, in
> >> >> addition to the memory barriers, on all of its running threads.
> >> > 
> >> > I have queued both of these for testing and review:
> >> > 
> >> >  git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
> >> >  f54cf5123eb9 ("membarrier: Speed up single-threaded private cmd")
> >> >  0d6eb99818da ("membarrier: Provide register sync core cmd")
> >> > 
> >> > Could the people who asked for this please test it?  The second commit
> >> > is non-obvious enough to need some careful review and intensive testing
> >> > before I can in good conscious pass it upstream.
> >> 
> >> Thanks Paul!
> >> 
> >> Can you pick up my v2 instead ? I added benchmarks in the changelog and
> >> documented the new command in the membarrier public header. No changes
> >> otherwise.
> > 
> > It looks like 0day Test Robot also wants some additional header files.
> > Could you please send me a v3 with its complaints addressed?
> 
> I sent a separate patch implementing sync_core() on xtensa. It should
> be applied before this patch.

And it looks like we also need a sync_core() on every CPU family other
than x86, which already has one.  Do we want to make this capability
depend on an ARCH_ kconfig option, or should we just bit the bullet and
implement sync_core() everywhere?

(I had to drop this commit due to downstream complaints.  Left to myself,
I would take the ARCH_ approach, then add it back in, but please let me
know how you would like to proceed.)

							Thanx, Paul

> Hopefully we can get some testing from people with xtensa testing
> environments.
> 
> Thanks,
> 
> Mathieu
> 
> > 
> >							Thanx, Paul
> > 
> >> Thanks,
> >> 
> >> Mathieu
> >> 
> >> > 
> >> >							Thanx, Paul
> >> > 
> >> >> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> >> >> CC: Peter Zijlstra <peterz@infradead.org>
> >> >> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >> >> CC: Boqun Feng <boqun.feng@gmail.com>
> >> >> CC: Andrew Hunter <ahh@google.com>
> >> >> CC: Maged Michael <maged.michael@gmail.com>
> >> >> CC: gromer@google.com
> >> >> CC: Avi Kivity <avi@scylladb.com>
> >> >> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> >> >> CC: Paul Mackerras <paulus@samba.org>
> >> >> CC: Michael Ellerman <mpe@ellerman.id.au>
> >> >> CC: Dave Watson <davejwatson@fb.com>
> >> >> CC: Andy Lutomirski <luto@kernel.org>
> >> >> CC: Will Deacon <will.deacon@arm.com>
> >> >> CC: Hans Boehm <hboehm@google.com>
> >> >> ---
> >> >>  fs/exec.c                       |  1 +
> >> >>  include/linux/sched.h           | 52 +++++++++++++++++++++++++++++++++++++++++
> >> >>  include/uapi/linux/membarrier.h |  1 +
> >> >>  kernel/fork.c                   |  2 ++
> >> >>  kernel/sched/core.c             |  3 +++
> >> >>  kernel/sched/membarrier.c       | 37 ++++++++++++++++++++++++++---
> >> >>  6 files changed, 93 insertions(+), 3 deletions(-)
> >> >> 
> >> >> diff --git a/fs/exec.c b/fs/exec.c
> >> >> index 62175cbcc801..a4ab3253bac7 100644
> >> >> --- a/fs/exec.c
> >> >> +++ b/fs/exec.c
> >> >> @@ -1794,6 +1794,7 @@ static int do_execveat_common(int fd, struct filename
> >> >> *filename,
> >> >>  	/* execve succeeded */
> >> >>  	current->fs->in_exec = 0;
> >> >>  	current->in_execve = 0;
> >> >> +	membarrier_execve(current);
> >> >>  	acct_update_integrals(current);
> >> >>  	task_numa_free(current);
> >> >>  	free_bprm(bprm);
> >> >> diff --git a/include/linux/sched.h b/include/linux/sched.h
> >> >> index 8337e2db0bb2..b1ecdc4e8b84 100644
> >> >> --- a/include/linux/sched.h
> >> >> +++ b/include/linux/sched.h
> >> >> @@ -1086,6 +1086,9 @@ struct task_struct {
> >> >>  	/* Used by LSM modules for access restriction: */
> >> >>  	void				*security;
> >> >>  #endif
> >> >> +#ifdef CONFIG_MEMBARRIER
> >> >> +	int membarrier_sync_core;
> >> >> +#endif
> >> >> 
> >> >>  	/*
> >> >>  	 * New fields for task_struct should be added above here, so that
> >> >> @@ -1623,4 +1626,53 @@ extern long sched_getaffinity(pid_t pid, struct cpumask
> >> >> *mask);
> >> >>  #define TASK_SIZE_OF(tsk)	TASK_SIZE
> >> >>  #endif
> >> >> 
> >> >> +#ifdef CONFIG_MEMBARRIER
> >> >> +static inline void membarrier_fork(struct task_struct *t,
> >> >> +		unsigned long clone_flags)
> >> >> +{
> >> >> +	/*
> >> >> +	 * Coherence of membarrier_sync_core against thread fork is
> >> >> +	 * protected by siglock. membarrier_fork is called with siglock
> >> >> +	 * held.
> >> >> +	 */
> >> >> +	t->membarrier_sync_core = current->membarrier_sync_core;
> >> >> +}
> >> >> +static inline void membarrier_execve(struct task_struct *t)
> >> >> +{
> >> >> +	t->membarrier_sync_core = 0;
> >> >> +}
> >> >> +static inline void membarrier_sched_out(struct task_struct *t)
> >> >> +{
> >> >> +	/*
> >> >> +	 * Core serialization is performed before the memory barrier
> >> >> +	 * preceding the store to rq->curr.
> >> >> +	 */
> >> >> +	if (unlikely(READ_ONCE(t->membarrier_sync_core)))
> >> >> +		sync_core();
> >> >> +}
> >> >> +static inline void membarrier_sched_in(struct task_struct *t)
> >> >> +{
> >> >> +	/*
> >> >> +	 * Core serialization is performed after the memory barrier
> >> >> +	 * following the store to rq->curr.
> >> >> +	 */
> >> >> +	if (unlikely(READ_ONCE(t->membarrier_sync_core)))
> >> >> +		sync_core();
> >> >> +}
> >> >> +#else
> >> >> +static inline void membarrier_fork(struct task_struct *t,
> >> >> +		unsigned long clone_flags)
> >> >> +{
> >> >> +}
> >> >> +static inline void membarrier_execve(struct task_struct *t)
> >> >> +{
> >> >> +}
> >> >> +static inline void membarrier_sched_out(struct task_struct *t)
> >> >> +{
> >> >> +}
> >> >> +static inline void membarrier_sched_in(struct task_struct *t)
> >> >> +{
> >> >> +}
> >> >> +#endif
> >> >> +
> >> >>  #endif
> >> >> diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
> >> >> index 6d47b3249d8a..bc26fcf8fb7b 100644
> >> >> --- a/include/uapi/linux/membarrier.h
> >> >> +++ b/include/uapi/linux/membarrier.h
> >> >> @@ -67,6 +67,7 @@ enum membarrier_cmd {
> >> >>  	/* reserved for MEMBARRIER_CMD_SHARED_EXPEDITED (1 << 1) */
> >> >>  	/* reserved for MEMBARRIER_CMD_PRIVATE (1 << 2) */
> >> >>  	MEMBARRIER_CMD_PRIVATE_EXPEDITED	= (1 << 3),
> >> >> +	MEMBARRIER_CMD_REGISTER_SYNC_CORE	= (1 << 4),
> >> >>  };
> >> >> 
> >> >>  #endif /* _UAPI_LINUX_MEMBARRIER_H */
> >> >> diff --git a/kernel/fork.c b/kernel/fork.c
> >> >> index 17921b0390b4..713b3c932671 100644
> >> >> --- a/kernel/fork.c
> >> >> +++ b/kernel/fork.c
> >> >> @@ -1840,6 +1840,8 @@ static __latent_entropy struct task_struct *copy_process(
> >> >>  	 */
> >> >>  	copy_seccomp(p);
> >> >> 
> >> >> +	membarrier_fork(p, clone_flags);
> >> >> +
> >> >>  	/*
> >> >>  	 * Process group and session signals need to be delivered to just the
> >> >>  	 * parent before the fork or both the parent and the child after the
> >> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >> >> index 0e36d9960d91..3ca27d066950 100644
> >> >> --- a/kernel/sched/core.c
> >> >> +++ b/kernel/sched/core.c
> >> >> @@ -3292,6 +3292,8 @@ static void __sched notrace __schedule(bool preempt)
> >> >>  	local_irq_disable();
> >> >>  	rcu_note_context_switch(preempt);
> >> >> 
> >> >> +	membarrier_sched_out(prev);
> >> >> +
> >> >>  	/*
> >> >>  	 * Make sure that signal_pending_state()->signal_pending() below
> >> >>  	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
> >> >> @@ -3364,6 +3366,7 @@ static void __sched notrace __schedule(bool preempt)
> >> >> 
> >> >>  		/* Also unlocks the rq: */
> >> >>  		rq = context_switch(rq, prev, next, &rf);
> >> >> +		membarrier_sched_in(next);
> >> >>  	} else {
> >> >>  		rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
> >> >>  		rq_unlock_irq(rq, &rf);
> >> >> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> >> >> index 7eec6914d2d2..f128caf64b49 100644
> >> >> --- a/kernel/sched/membarrier.c
> >> >> +++ b/kernel/sched/membarrier.c
> >> >> @@ -25,12 +25,16 @@
> >> >>   * Bitmask made from a "or" of all commands within enum membarrier_cmd,
> >> >>   * except MEMBARRIER_CMD_QUERY.
> >> >>   */
> >> >> -#define MEMBARRIER_CMD_BITMASK	\
> >> >> -	(MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED)
> >> >> +#define MEMBARRIER_CMD_BITMASK			\
> >> >> +	(MEMBARRIER_CMD_SHARED			\
> >> >> +	| MEMBARRIER_CMD_PRIVATE_EXPEDITED	\
> >> >> +	| MEMBARRIER_CMD_REGISTER_SYNC_CORE)
> >> >> 
> >> >>  static void ipi_mb(void *info)
> >> >>  {
> >> >> -	smp_mb();	/* IPIs should be serializing but paranoid. */
> >> >> +	/* IPIs should be serializing but paranoid. */
> >> >> +	smp_mb();
> >> >> +	sync_core();
> >> >>  }
> >> >> 
> >> >>  static void membarrier_private_expedited(void)
> >> >> @@ -96,6 +100,30 @@ static void membarrier_private_expedited(void)
> >> >>  	smp_mb();	/* exit from system call is not a mb */
> >> >>  }
> >> >> 
> >> >> +static void membarrier_register_sync_core(void)
> >> >> +{
> >> >> +	struct task_struct *p = current, *t;
> >> >> +
> >> >> +	if (get_nr_threads(p) == 1) {
> >> >> +		p->membarrier_sync_core = 1;
> >> >> +		return;
> >> >> +	}
> >> >> +
> >> >> +	/*
> >> >> +	 * Coherence of membarrier_sync_core against thread fork is
> >> >> +	 * protected by siglock.
> >> >> +	 */
> >> >> +	spin_lock(&p->sighand->siglock);
> >> >> +	for_each_thread(p, t)
> >> >> +		WRITE_ONCE(t->membarrier_sync_core, 1);
> >> >> +	spin_unlock(&p->sighand->siglock);
> >> >> +	/*
> >> >> +	 * Ensure all future scheduler execution will observe the new
> >> >> +	 * membarrier_sync_core state for this process.
> >> >> +	 */
> >> >> +	synchronize_sched();
> >> >> +}
> >> >> +
> >> >>  /**
> >> >>   * sys_membarrier - issue memory barriers on a set of threads
> >> >>   * @cmd:   Takes command values defined in enum membarrier_cmd.
> >> >> @@ -146,6 +174,9 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
> >> >>  	case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
> >> >>  		membarrier_private_expedited();
> >> >>  		return 0;
> >> >> +	case MEMBARRIER_CMD_REGISTER_SYNC_CORE:
> >> >> +		membarrier_register_sync_core();
> >> >> +		return 0;
> >> >>  	default:
> >> >>  		return -EINVAL;
> >> >>  	}
> >> >> --
> >> >> 2.11.0
> >> 
> >> --
> >> Mathieu Desnoyers
> >> EfficiOS Inc.
> >> http://www.efficios.com
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 2/2] membarrier: provide register sync core cmd
  2017-08-28 17:49           ` Paul E. McKenney
@ 2017-08-30 17:53             ` Mathieu Desnoyers
  2017-08-30 18:01               ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Mathieu Desnoyers @ 2017-08-30 17:53 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, linux-kernel, Boqun Feng, Andrew Hunter,
	maged michael, gromer, Avi Kivity, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, Andy Lutomirski,
	Will Deacon, Hans Boehm

----- On Aug 28, 2017, at 1:49 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com wrote:

> On Sun, Aug 27, 2017 at 09:39:54PM +0000, Mathieu Desnoyers wrote:
>> ----- On Aug 27, 2017, at 2:00 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com
>> wrote:
>> 
>> > On Sun, Aug 27, 2017 at 08:52:58PM +0000, Mathieu Desnoyers wrote:
>> >> ----- On Aug 27, 2017, at 1:35 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com
>> >> wrote:
>> >> 
>> >> > On Sun, Aug 27, 2017 at 12:54:04PM -0700, Mathieu Desnoyers wrote:
>> >> >> Add a new MEMBARRIER_CMD_REGISTER_SYNC_CORE command to the membarrier
>> >> >> system call. It allows processes to register their intent to have their
>> >> >> threads issue core serializing barriers in addition to memory barriers
>> >> >> whenever a membarrier command is performed.
>> >> >> 
>> >> >> It is relevant for reclaim of JIT code, which requires to issue core
>> >> >> serializing barriers on all threads running on behalf of a process
>> >> >> after ensuring the old code is not visible anymore, before re-using
>> >> >> memory for new code.
>> >> >> 
>> >> >> When a processes returns from a MEMBARRIER_CMD_REGISTER_SYNC_CORE
>> >> >> command, it is guaranteed that all following MEMBARRIER_CMD_SHARED and
>> >> >> MEMBARRIER_CMD_PRIVATE_EXPEDITED issue core serializing barriers, in
>> >> >> addition to the memory barriers, on all of its running threads.
>> >> > 
>> >> > I have queued both of these for testing and review:
>> >> > 
>> >> >  git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
>> >> >  f54cf5123eb9 ("membarrier: Speed up single-threaded private cmd")
>> >> >  0d6eb99818da ("membarrier: Provide register sync core cmd")
>> >> > 
>> >> > Could the people who asked for this please test it?  The second commit
>> >> > is non-obvious enough to need some careful review and intensive testing
>> >> > before I can in good conscious pass it upstream.
>> >> 
>> >> Thanks Paul!
>> >> 
>> >> Can you pick up my v2 instead ? I added benchmarks in the changelog and
>> >> documented the new command in the membarrier public header. No changes
>> >> otherwise.
>> > 
>> > It looks like 0day Test Robot also wants some additional header files.
>> > Could you please send me a v3 with its complaints addressed?
>> 
>> I sent a separate patch implementing sync_core() on xtensa. It should
>> be applied before this patch.
> 
> And it looks like we also need a sync_core() on every CPU family other
> than x86, which already has one.  Do we want to make this capability
> depend on an ARCH_ kconfig option, or should we just bit the bullet and
> implement sync_core() everywhere?
> 
> (I had to drop this commit due to downstream complaints.  Left to myself,
> I would take the ARCH_ approach, then add it back in, but please let me
> know how you would like to proceed.)

I plan to go with the ARCH_ approach indeed.

Thanks,

Mathieu

> 
>							Thanx, Paul
> 
>> Hopefully we can get some testing from people with xtensa testing
>> environments.
>> 
>> Thanks,
>> 
>> Mathieu
>> 
>> > 
>> >							Thanx, Paul
>> > 
>> >> Thanks,
>> >> 
>> >> Mathieu
>> >> 
>> >> > 
>> >> >							Thanx, Paul
>> >> > 
>> >> >> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> >> >> CC: Peter Zijlstra <peterz@infradead.org>
>> >> >> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>> >> >> CC: Boqun Feng <boqun.feng@gmail.com>
>> >> >> CC: Andrew Hunter <ahh@google.com>
>> >> >> CC: Maged Michael <maged.michael@gmail.com>
>> >> >> CC: gromer@google.com
>> >> >> CC: Avi Kivity <avi@scylladb.com>
>> >> >> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> >> >> CC: Paul Mackerras <paulus@samba.org>
>> >> >> CC: Michael Ellerman <mpe@ellerman.id.au>
>> >> >> CC: Dave Watson <davejwatson@fb.com>
>> >> >> CC: Andy Lutomirski <luto@kernel.org>
>> >> >> CC: Will Deacon <will.deacon@arm.com>
>> >> >> CC: Hans Boehm <hboehm@google.com>
>> >> >> ---
>> >> >>  fs/exec.c                       |  1 +
>> >> >>  include/linux/sched.h           | 52 +++++++++++++++++++++++++++++++++++++++++
>> >> >>  include/uapi/linux/membarrier.h |  1 +
>> >> >>  kernel/fork.c                   |  2 ++
>> >> >>  kernel/sched/core.c             |  3 +++
>> >> >>  kernel/sched/membarrier.c       | 37 ++++++++++++++++++++++++++---
>> >> >>  6 files changed, 93 insertions(+), 3 deletions(-)
>> >> >> 
>> >> >> diff --git a/fs/exec.c b/fs/exec.c
>> >> >> index 62175cbcc801..a4ab3253bac7 100644
>> >> >> --- a/fs/exec.c
>> >> >> +++ b/fs/exec.c
>> >> >> @@ -1794,6 +1794,7 @@ static int do_execveat_common(int fd, struct filename
>> >> >> *filename,
>> >> >>  	/* execve succeeded */
>> >> >>  	current->fs->in_exec = 0;
>> >> >>  	current->in_execve = 0;
>> >> >> +	membarrier_execve(current);
>> >> >>  	acct_update_integrals(current);
>> >> >>  	task_numa_free(current);
>> >> >>  	free_bprm(bprm);
>> >> >> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> >> >> index 8337e2db0bb2..b1ecdc4e8b84 100644
>> >> >> --- a/include/linux/sched.h
>> >> >> +++ b/include/linux/sched.h
>> >> >> @@ -1086,6 +1086,9 @@ struct task_struct {
>> >> >>  	/* Used by LSM modules for access restriction: */
>> >> >>  	void				*security;
>> >> >>  #endif
>> >> >> +#ifdef CONFIG_MEMBARRIER
>> >> >> +	int membarrier_sync_core;
>> >> >> +#endif
>> >> >> 
>> >> >>  	/*
>> >> >>  	 * New fields for task_struct should be added above here, so that
>> >> >> @@ -1623,4 +1626,53 @@ extern long sched_getaffinity(pid_t pid, struct cpumask
>> >> >> *mask);
>> >> >>  #define TASK_SIZE_OF(tsk)	TASK_SIZE
>> >> >>  #endif
>> >> >> 
>> >> >> +#ifdef CONFIG_MEMBARRIER
>> >> >> +static inline void membarrier_fork(struct task_struct *t,
>> >> >> +		unsigned long clone_flags)
>> >> >> +{
>> >> >> +	/*
>> >> >> +	 * Coherence of membarrier_sync_core against thread fork is
>> >> >> +	 * protected by siglock. membarrier_fork is called with siglock
>> >> >> +	 * held.
>> >> >> +	 */
>> >> >> +	t->membarrier_sync_core = current->membarrier_sync_core;
>> >> >> +}
>> >> >> +static inline void membarrier_execve(struct task_struct *t)
>> >> >> +{
>> >> >> +	t->membarrier_sync_core = 0;
>> >> >> +}
>> >> >> +static inline void membarrier_sched_out(struct task_struct *t)
>> >> >> +{
>> >> >> +	/*
>> >> >> +	 * Core serialization is performed before the memory barrier
>> >> >> +	 * preceding the store to rq->curr.
>> >> >> +	 */
>> >> >> +	if (unlikely(READ_ONCE(t->membarrier_sync_core)))
>> >> >> +		sync_core();
>> >> >> +}
>> >> >> +static inline void membarrier_sched_in(struct task_struct *t)
>> >> >> +{
>> >> >> +	/*
>> >> >> +	 * Core serialization is performed after the memory barrier
>> >> >> +	 * following the store to rq->curr.
>> >> >> +	 */
>> >> >> +	if (unlikely(READ_ONCE(t->membarrier_sync_core)))
>> >> >> +		sync_core();
>> >> >> +}
>> >> >> +#else
>> >> >> +static inline void membarrier_fork(struct task_struct *t,
>> >> >> +		unsigned long clone_flags)
>> >> >> +{
>> >> >> +}
>> >> >> +static inline void membarrier_execve(struct task_struct *t)
>> >> >> +{
>> >> >> +}
>> >> >> +static inline void membarrier_sched_out(struct task_struct *t)
>> >> >> +{
>> >> >> +}
>> >> >> +static inline void membarrier_sched_in(struct task_struct *t)
>> >> >> +{
>> >> >> +}
>> >> >> +#endif
>> >> >> +
>> >> >>  #endif
>> >> >> diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
>> >> >> index 6d47b3249d8a..bc26fcf8fb7b 100644
>> >> >> --- a/include/uapi/linux/membarrier.h
>> >> >> +++ b/include/uapi/linux/membarrier.h
>> >> >> @@ -67,6 +67,7 @@ enum membarrier_cmd {
>> >> >>  	/* reserved for MEMBARRIER_CMD_SHARED_EXPEDITED (1 << 1) */
>> >> >>  	/* reserved for MEMBARRIER_CMD_PRIVATE (1 << 2) */
>> >> >>  	MEMBARRIER_CMD_PRIVATE_EXPEDITED	= (1 << 3),
>> >> >> +	MEMBARRIER_CMD_REGISTER_SYNC_CORE	= (1 << 4),
>> >> >>  };
>> >> >> 
>> >> >>  #endif /* _UAPI_LINUX_MEMBARRIER_H */
>> >> >> diff --git a/kernel/fork.c b/kernel/fork.c
>> >> >> index 17921b0390b4..713b3c932671 100644
>> >> >> --- a/kernel/fork.c
>> >> >> +++ b/kernel/fork.c
>> >> >> @@ -1840,6 +1840,8 @@ static __latent_entropy struct task_struct *copy_process(
>> >> >>  	 */
>> >> >>  	copy_seccomp(p);
>> >> >> 
>> >> >> +	membarrier_fork(p, clone_flags);
>> >> >> +
>> >> >>  	/*
>> >> >>  	 * Process group and session signals need to be delivered to just the
>> >> >>  	 * parent before the fork or both the parent and the child after the
>> >> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> >> >> index 0e36d9960d91..3ca27d066950 100644
>> >> >> --- a/kernel/sched/core.c
>> >> >> +++ b/kernel/sched/core.c
>> >> >> @@ -3292,6 +3292,8 @@ static void __sched notrace __schedule(bool preempt)
>> >> >>  	local_irq_disable();
>> >> >>  	rcu_note_context_switch(preempt);
>> >> >> 
>> >> >> +	membarrier_sched_out(prev);
>> >> >> +
>> >> >>  	/*
>> >> >>  	 * Make sure that signal_pending_state()->signal_pending() below
>> >> >>  	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
>> >> >> @@ -3364,6 +3366,7 @@ static void __sched notrace __schedule(bool preempt)
>> >> >> 
>> >> >>  		/* Also unlocks the rq: */
>> >> >>  		rq = context_switch(rq, prev, next, &rf);
>> >> >> +		membarrier_sched_in(next);
>> >> >>  	} else {
>> >> >>  		rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
>> >> >>  		rq_unlock_irq(rq, &rf);
>> >> >> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
>> >> >> index 7eec6914d2d2..f128caf64b49 100644
>> >> >> --- a/kernel/sched/membarrier.c
>> >> >> +++ b/kernel/sched/membarrier.c
>> >> >> @@ -25,12 +25,16 @@
>> >> >>   * Bitmask made from a "or" of all commands within enum membarrier_cmd,
>> >> >>   * except MEMBARRIER_CMD_QUERY.
>> >> >>   */
>> >> >> -#define MEMBARRIER_CMD_BITMASK	\
>> >> >> -	(MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED)
>> >> >> +#define MEMBARRIER_CMD_BITMASK			\
>> >> >> +	(MEMBARRIER_CMD_SHARED			\
>> >> >> +	| MEMBARRIER_CMD_PRIVATE_EXPEDITED	\
>> >> >> +	| MEMBARRIER_CMD_REGISTER_SYNC_CORE)
>> >> >> 
>> >> >>  static void ipi_mb(void *info)
>> >> >>  {
>> >> >> -	smp_mb();	/* IPIs should be serializing but paranoid. */
>> >> >> +	/* IPIs should be serializing but paranoid. */
>> >> >> +	smp_mb();
>> >> >> +	sync_core();
>> >> >>  }
>> >> >> 
>> >> >>  static void membarrier_private_expedited(void)
>> >> >> @@ -96,6 +100,30 @@ static void membarrier_private_expedited(void)
>> >> >>  	smp_mb();	/* exit from system call is not a mb */
>> >> >>  }
>> >> >> 
>> >> >> +static void membarrier_register_sync_core(void)
>> >> >> +{
>> >> >> +	struct task_struct *p = current, *t;
>> >> >> +
>> >> >> +	if (get_nr_threads(p) == 1) {
>> >> >> +		p->membarrier_sync_core = 1;
>> >> >> +		return;
>> >> >> +	}
>> >> >> +
>> >> >> +	/*
>> >> >> +	 * Coherence of membarrier_sync_core against thread fork is
>> >> >> +	 * protected by siglock.
>> >> >> +	 */
>> >> >> +	spin_lock(&p->sighand->siglock);
>> >> >> +	for_each_thread(p, t)
>> >> >> +		WRITE_ONCE(t->membarrier_sync_core, 1);
>> >> >> +	spin_unlock(&p->sighand->siglock);
>> >> >> +	/*
>> >> >> +	 * Ensure all future scheduler execution will observe the new
>> >> >> +	 * membarrier_sync_core state for this process.
>> >> >> +	 */
>> >> >> +	synchronize_sched();
>> >> >> +}
>> >> >> +
>> >> >>  /**
>> >> >>   * sys_membarrier - issue memory barriers on a set of threads
>> >> >>   * @cmd:   Takes command values defined in enum membarrier_cmd.
>> >> >> @@ -146,6 +174,9 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
>> >> >>  	case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
>> >> >>  		membarrier_private_expedited();
>> >> >>  		return 0;
>> >> >> +	case MEMBARRIER_CMD_REGISTER_SYNC_CORE:
>> >> >> +		membarrier_register_sync_core();
>> >> >> +		return 0;
>> >> >>  	default:
>> >> >>  		return -EINVAL;
>> >> >>  	}
>> >> >> --
>> >> >> 2.11.0
>> >> 
>> >> --
>> >> Mathieu Desnoyers
>> >> EfficiOS Inc.
>> >> http://www.efficios.com
>> 
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 2/2] membarrier: provide register sync core cmd
  2017-08-30 17:53             ` Mathieu Desnoyers
@ 2017-08-30 18:01               ` Paul E. McKenney
  0 siblings, 0 replies; 9+ messages in thread
From: Paul E. McKenney @ 2017-08-30 18:01 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, linux-kernel, Boqun Feng, Andrew Hunter,
	maged michael, gromer, Avi Kivity, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, Andy Lutomirski,
	Will Deacon, Hans Boehm

On Wed, Aug 30, 2017 at 05:53:53PM +0000, Mathieu Desnoyers wrote:
> ----- On Aug 28, 2017, at 1:49 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com wrote:
> 
> > On Sun, Aug 27, 2017 at 09:39:54PM +0000, Mathieu Desnoyers wrote:
> >> ----- On Aug 27, 2017, at 2:00 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com
> >> wrote:
> >> 
> >> > On Sun, Aug 27, 2017 at 08:52:58PM +0000, Mathieu Desnoyers wrote:
> >> >> ----- On Aug 27, 2017, at 1:35 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com
> >> >> wrote:
> >> >> 
> >> >> > On Sun, Aug 27, 2017 at 12:54:04PM -0700, Mathieu Desnoyers wrote:
> >> >> >> Add a new MEMBARRIER_CMD_REGISTER_SYNC_CORE command to the membarrier
> >> >> >> system call. It allows processes to register their intent to have their
> >> >> >> threads issue core serializing barriers in addition to memory barriers
> >> >> >> whenever a membarrier command is performed.
> >> >> >> 
> >> >> >> It is relevant for reclaim of JIT code, which requires to issue core
> >> >> >> serializing barriers on all threads running on behalf of a process
> >> >> >> after ensuring the old code is not visible anymore, before re-using
> >> >> >> memory for new code.
> >> >> >> 
> >> >> >> When a processes returns from a MEMBARRIER_CMD_REGISTER_SYNC_CORE
> >> >> >> command, it is guaranteed that all following MEMBARRIER_CMD_SHARED and
> >> >> >> MEMBARRIER_CMD_PRIVATE_EXPEDITED issue core serializing barriers, in
> >> >> >> addition to the memory barriers, on all of its running threads.
> >> >> > 
> >> >> > I have queued both of these for testing and review:
> >> >> > 
> >> >> >  git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
> >> >> >  f54cf5123eb9 ("membarrier: Speed up single-threaded private cmd")
> >> >> >  0d6eb99818da ("membarrier: Provide register sync core cmd")
> >> >> > 
> >> >> > Could the people who asked for this please test it?  The second commit
> >> >> > is non-obvious enough to need some careful review and intensive testing
> >> >> > before I can in good conscious pass it upstream.
> >> >> 
> >> >> Thanks Paul!
> >> >> 
> >> >> Can you pick up my v2 instead ? I added benchmarks in the changelog and
> >> >> documented the new command in the membarrier public header. No changes
> >> >> otherwise.
> >> > 
> >> > It looks like 0day Test Robot also wants some additional header files.
> >> > Could you please send me a v3 with its complaints addressed?
> >> 
> >> I sent a separate patch implementing sync_core() on xtensa. It should
> >> be applied before this patch.
> > 
> > And it looks like we also need a sync_core() on every CPU family other
> > than x86, which already has one.  Do we want to make this capability
> > depend on an ARCH_ kconfig option, or should we just bit the bullet and
> > implement sync_core() everywhere?
> > 
> > (I had to drop this commit due to downstream complaints.  Left to myself,
> > I would take the ARCH_ approach, then add it back in, but please let me
> > know how you would like to proceed.)
> 
> I plan to go with the ARCH_ approach indeed.

Sounds good, looking forward to seeing it.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-08-30 18:01 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-27 19:54 [RFC PATCH 1/2] membarrier: speed up single-threaded private cmd Mathieu Desnoyers
2017-08-27 19:54 ` [RFC PATCH 2/2] membarrier: provide register sync core cmd Mathieu Desnoyers
2017-08-27 20:35   ` Paul E. McKenney
2017-08-27 20:52     ` Mathieu Desnoyers
2017-08-27 21:00       ` Paul E. McKenney
2017-08-27 21:39         ` Mathieu Desnoyers
2017-08-28 17:49           ` Paul E. McKenney
2017-08-30 17:53             ` Mathieu Desnoyers
2017-08-30 18:01               ` Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.