* [RFC PATCH 1/2] membarrier: speed up single-threaded private cmd @ 2017-08-27 19:54 Mathieu Desnoyers 2017-08-27 19:54 ` [RFC PATCH 2/2] membarrier: provide register sync core cmd Mathieu Desnoyers 0 siblings, 1 reply; 9+ messages in thread From: Mathieu Desnoyers @ 2017-08-27 19:54 UTC (permalink / raw) To: Paul E . McKenney, Peter Zijlstra Cc: linux-kernel, Mathieu Desnoyers, Boqun Feng, Andrew Hunter, Maged Michael, gromer, Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Dave Watson Single-threaded processes don't need to issue barriers, since the thread's current CPU is the only one that matters. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> CC: Boqun Feng <boqun.feng@gmail.com> CC: Andrew Hunter <ahh@google.com> CC: Maged Michael <maged.michael@gmail.com> CC: gromer@google.com CC: Avi Kivity <avi@scylladb.com> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: Paul Mackerras <paulus@samba.org> CC: Michael Ellerman <mpe@ellerman.id.au> CC: Dave Watson <davejwatson@fb.com> --- kernel/sched/membarrier.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c index a92fddc22747..7eec6914d2d2 100644 --- a/kernel/sched/membarrier.c +++ b/kernel/sched/membarrier.c @@ -39,7 +39,7 @@ static void membarrier_private_expedited(void) bool fallback = false; cpumask_var_t tmpmask; - if (num_online_cpus() == 1) + if (num_online_cpus() == 1 || get_nr_threads(current) == 1) return; /* -- 2.11.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [RFC PATCH 2/2] membarrier: provide register sync core cmd 2017-08-27 19:54 [RFC PATCH 1/2] membarrier: speed up single-threaded private cmd Mathieu Desnoyers @ 2017-08-27 19:54 ` Mathieu Desnoyers 2017-08-27 20:35 ` Paul E. McKenney 0 siblings, 1 reply; 9+ messages in thread From: Mathieu Desnoyers @ 2017-08-27 19:54 UTC (permalink / raw) To: Paul E . McKenney, Peter Zijlstra Cc: linux-kernel, Mathieu Desnoyers, Boqun Feng, Andrew Hunter, Maged Michael, gromer, Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Dave Watson, Andy Lutomirski, Will Deacon, Hans Boehm Add a new MEMBARRIER_CMD_REGISTER_SYNC_CORE command to the membarrier system call. It allows processes to register their intent to have their threads issue core serializing barriers in addition to memory barriers whenever a membarrier command is performed. It is relevant for reclaim of JIT code, which requires to issue core serializing barriers on all threads running on behalf of a process after ensuring the old code is not visible anymore, before re-using memory for new code. When a processes returns from a MEMBARRIER_CMD_REGISTER_SYNC_CORE command, it is guaranteed that all following MEMBARRIER_CMD_SHARED and MEMBARRIER_CMD_PRIVATE_EXPEDITED issue core serializing barriers, in addition to the memory barriers, on all of its running threads. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> CC: Boqun Feng <boqun.feng@gmail.com> CC: Andrew Hunter <ahh@google.com> CC: Maged Michael <maged.michael@gmail.com> CC: gromer@google.com CC: Avi Kivity <avi@scylladb.com> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: Paul Mackerras <paulus@samba.org> CC: Michael Ellerman <mpe@ellerman.id.au> CC: Dave Watson <davejwatson@fb.com> CC: Andy Lutomirski <luto@kernel.org> CC: Will Deacon <will.deacon@arm.com> CC: Hans Boehm <hboehm@google.com> --- fs/exec.c | 1 + include/linux/sched.h | 52 +++++++++++++++++++++++++++++++++++++++++ include/uapi/linux/membarrier.h | 1 + kernel/fork.c | 2 ++ kernel/sched/core.c | 3 +++ kernel/sched/membarrier.c | 37 ++++++++++++++++++++++++++--- 6 files changed, 93 insertions(+), 3 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index 62175cbcc801..a4ab3253bac7 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1794,6 +1794,7 @@ static int do_execveat_common(int fd, struct filename *filename, /* execve succeeded */ current->fs->in_exec = 0; current->in_execve = 0; + membarrier_execve(current); acct_update_integrals(current); task_numa_free(current); free_bprm(bprm); diff --git a/include/linux/sched.h b/include/linux/sched.h index 8337e2db0bb2..b1ecdc4e8b84 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1086,6 +1086,9 @@ struct task_struct { /* Used by LSM modules for access restriction: */ void *security; #endif +#ifdef CONFIG_MEMBARRIER + int membarrier_sync_core; +#endif /* * New fields for task_struct should be added above here, so that @@ -1623,4 +1626,53 @@ extern long sched_getaffinity(pid_t pid, struct cpumask *mask); #define TASK_SIZE_OF(tsk) TASK_SIZE #endif +#ifdef CONFIG_MEMBARRIER +static inline void membarrier_fork(struct task_struct *t, + unsigned long clone_flags) +{ + /* + * Coherence of membarrier_sync_core against thread fork is + * protected by siglock. membarrier_fork is called with siglock + * held. + */ + t->membarrier_sync_core = current->membarrier_sync_core; +} +static inline void membarrier_execve(struct task_struct *t) +{ + t->membarrier_sync_core = 0; +} +static inline void membarrier_sched_out(struct task_struct *t) +{ + /* + * Core serialization is performed before the memory barrier + * preceding the store to rq->curr. + */ + if (unlikely(READ_ONCE(t->membarrier_sync_core))) + sync_core(); +} +static inline void membarrier_sched_in(struct task_struct *t) +{ + /* + * Core serialization is performed after the memory barrier + * following the store to rq->curr. + */ + if (unlikely(READ_ONCE(t->membarrier_sync_core))) + sync_core(); +} +#else +static inline void membarrier_fork(struct task_struct *t, + unsigned long clone_flags) +{ +} +static inline void membarrier_execve(struct task_struct *t) +{ +} +static inline void membarrier_sched_out(struct task_struct *t) +{ +} +static inline void membarrier_sched_in(struct task_struct *t) +{ +} +#endif + #endif diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h index 6d47b3249d8a..bc26fcf8fb7b 100644 --- a/include/uapi/linux/membarrier.h +++ b/include/uapi/linux/membarrier.h @@ -67,6 +67,7 @@ enum membarrier_cmd { /* reserved for MEMBARRIER_CMD_SHARED_EXPEDITED (1 << 1) */ /* reserved for MEMBARRIER_CMD_PRIVATE (1 << 2) */ MEMBARRIER_CMD_PRIVATE_EXPEDITED = (1 << 3), + MEMBARRIER_CMD_REGISTER_SYNC_CORE = (1 << 4), }; #endif /* _UAPI_LINUX_MEMBARRIER_H */ diff --git a/kernel/fork.c b/kernel/fork.c index 17921b0390b4..713b3c932671 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1840,6 +1840,8 @@ static __latent_entropy struct task_struct *copy_process( */ copy_seccomp(p); + membarrier_fork(p, clone_flags); + /* * Process group and session signals need to be delivered to just the * parent before the fork or both the parent and the child after the diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 0e36d9960d91..3ca27d066950 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3292,6 +3292,8 @@ static void __sched notrace __schedule(bool preempt) local_irq_disable(); rcu_note_context_switch(preempt); + membarrier_sched_out(prev); + /* * Make sure that signal_pending_state()->signal_pending() below * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE) @@ -3364,6 +3366,7 @@ static void __sched notrace __schedule(bool preempt) /* Also unlocks the rq: */ rq = context_switch(rq, prev, next, &rf); + membarrier_sched_in(next); } else { rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP); rq_unlock_irq(rq, &rf); diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c index 7eec6914d2d2..f128caf64b49 100644 --- a/kernel/sched/membarrier.c +++ b/kernel/sched/membarrier.c @@ -25,12 +25,16 @@ * Bitmask made from a "or" of all commands within enum membarrier_cmd, * except MEMBARRIER_CMD_QUERY. */ -#define MEMBARRIER_CMD_BITMASK \ - (MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED) +#define MEMBARRIER_CMD_BITMASK \ + (MEMBARRIER_CMD_SHARED \ + | MEMBARRIER_CMD_PRIVATE_EXPEDITED \ + | MEMBARRIER_CMD_REGISTER_SYNC_CORE) static void ipi_mb(void *info) { - smp_mb(); /* IPIs should be serializing but paranoid. */ + /* IPIs should be serializing but paranoid. */ + smp_mb(); + sync_core(); } static void membarrier_private_expedited(void) @@ -96,6 +100,30 @@ static void membarrier_private_expedited(void) smp_mb(); /* exit from system call is not a mb */ } +static void membarrier_register_sync_core(void) +{ + struct task_struct *p = current, *t; + + if (get_nr_threads(p) == 1) { + p->membarrier_sync_core = 1; + return; + } + + /* + * Coherence of membarrier_sync_core against thread fork is + * protected by siglock. + */ + spin_lock(&p->sighand->siglock); + for_each_thread(p, t) + WRITE_ONCE(t->membarrier_sync_core, 1); + spin_unlock(&p->sighand->siglock); + /* + * Ensure all future scheduler execution will observe the new + * membarrier_sync_core state for this process. + */ + synchronize_sched(); +} + /** * sys_membarrier - issue memory barriers on a set of threads * @cmd: Takes command values defined in enum membarrier_cmd. @@ -146,6 +174,9 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags) case MEMBARRIER_CMD_PRIVATE_EXPEDITED: membarrier_private_expedited(); return 0; + case MEMBARRIER_CMD_REGISTER_SYNC_CORE: + membarrier_register_sync_core(); + return 0; default: return -EINVAL; } -- 2.11.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [RFC PATCH 2/2] membarrier: provide register sync core cmd 2017-08-27 19:54 ` [RFC PATCH 2/2] membarrier: provide register sync core cmd Mathieu Desnoyers @ 2017-08-27 20:35 ` Paul E. McKenney 2017-08-27 20:52 ` Mathieu Desnoyers 0 siblings, 1 reply; 9+ messages in thread From: Paul E. McKenney @ 2017-08-27 20:35 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Peter Zijlstra, linux-kernel, Boqun Feng, Andrew Hunter, Maged Michael, gromer, Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Dave Watson, Andy Lutomirski, Will Deacon, Hans Boehm On Sun, Aug 27, 2017 at 12:54:04PM -0700, Mathieu Desnoyers wrote: > Add a new MEMBARRIER_CMD_REGISTER_SYNC_CORE command to the membarrier > system call. It allows processes to register their intent to have their > threads issue core serializing barriers in addition to memory barriers > whenever a membarrier command is performed. > > It is relevant for reclaim of JIT code, which requires to issue core > serializing barriers on all threads running on behalf of a process > after ensuring the old code is not visible anymore, before re-using > memory for new code. > > When a processes returns from a MEMBARRIER_CMD_REGISTER_SYNC_CORE > command, it is guaranteed that all following MEMBARRIER_CMD_SHARED and > MEMBARRIER_CMD_PRIVATE_EXPEDITED issue core serializing barriers, in > addition to the memory barriers, on all of its running threads. I have queued both of these for testing and review: git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git f54cf5123eb9 ("membarrier: Speed up single-threaded private cmd") 0d6eb99818da ("membarrier: Provide register sync core cmd") Could the people who asked for this please test it? The second commit is non-obvious enough to need some careful review and intensive testing before I can in good conscious pass it upstream. Thanx, Paul > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > CC: Peter Zijlstra <peterz@infradead.org> > CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > CC: Boqun Feng <boqun.feng@gmail.com> > CC: Andrew Hunter <ahh@google.com> > CC: Maged Michael <maged.michael@gmail.com> > CC: gromer@google.com > CC: Avi Kivity <avi@scylladb.com> > CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> > CC: Paul Mackerras <paulus@samba.org> > CC: Michael Ellerman <mpe@ellerman.id.au> > CC: Dave Watson <davejwatson@fb.com> > CC: Andy Lutomirski <luto@kernel.org> > CC: Will Deacon <will.deacon@arm.com> > CC: Hans Boehm <hboehm@google.com> > --- > fs/exec.c | 1 + > include/linux/sched.h | 52 +++++++++++++++++++++++++++++++++++++++++ > include/uapi/linux/membarrier.h | 1 + > kernel/fork.c | 2 ++ > kernel/sched/core.c | 3 +++ > kernel/sched/membarrier.c | 37 ++++++++++++++++++++++++++--- > 6 files changed, 93 insertions(+), 3 deletions(-) > > diff --git a/fs/exec.c b/fs/exec.c > index 62175cbcc801..a4ab3253bac7 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -1794,6 +1794,7 @@ static int do_execveat_common(int fd, struct filename *filename, > /* execve succeeded */ > current->fs->in_exec = 0; > current->in_execve = 0; > + membarrier_execve(current); > acct_update_integrals(current); > task_numa_free(current); > free_bprm(bprm); > diff --git a/include/linux/sched.h b/include/linux/sched.h > index 8337e2db0bb2..b1ecdc4e8b84 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -1086,6 +1086,9 @@ struct task_struct { > /* Used by LSM modules for access restriction: */ > void *security; > #endif > +#ifdef CONFIG_MEMBARRIER > + int membarrier_sync_core; > +#endif > > /* > * New fields for task_struct should be added above here, so that > @@ -1623,4 +1626,53 @@ extern long sched_getaffinity(pid_t pid, struct cpumask *mask); > #define TASK_SIZE_OF(tsk) TASK_SIZE > #endif > > +#ifdef CONFIG_MEMBARRIER > +static inline void membarrier_fork(struct task_struct *t, > + unsigned long clone_flags) > +{ > + /* > + * Coherence of membarrier_sync_core against thread fork is > + * protected by siglock. membarrier_fork is called with siglock > + * held. > + */ > + t->membarrier_sync_core = current->membarrier_sync_core; > +} > +static inline void membarrier_execve(struct task_struct *t) > +{ > + t->membarrier_sync_core = 0; > +} > +static inline void membarrier_sched_out(struct task_struct *t) > +{ > + /* > + * Core serialization is performed before the memory barrier > + * preceding the store to rq->curr. > + */ > + if (unlikely(READ_ONCE(t->membarrier_sync_core))) > + sync_core(); > +} > +static inline void membarrier_sched_in(struct task_struct *t) > +{ > + /* > + * Core serialization is performed after the memory barrier > + * following the store to rq->curr. > + */ > + if (unlikely(READ_ONCE(t->membarrier_sync_core))) > + sync_core(); > +} > +#else > +static inline void membarrier_fork(struct task_struct *t, > + unsigned long clone_flags) > +{ > +} > +static inline void membarrier_execve(struct task_struct *t) > +{ > +} > +static inline void membarrier_sched_out(struct task_struct *t) > +{ > +} > +static inline void membarrier_sched_in(struct task_struct *t) > +{ > +} > +#endif > + > #endif > diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h > index 6d47b3249d8a..bc26fcf8fb7b 100644 > --- a/include/uapi/linux/membarrier.h > +++ b/include/uapi/linux/membarrier.h > @@ -67,6 +67,7 @@ enum membarrier_cmd { > /* reserved for MEMBARRIER_CMD_SHARED_EXPEDITED (1 << 1) */ > /* reserved for MEMBARRIER_CMD_PRIVATE (1 << 2) */ > MEMBARRIER_CMD_PRIVATE_EXPEDITED = (1 << 3), > + MEMBARRIER_CMD_REGISTER_SYNC_CORE = (1 << 4), > }; > > #endif /* _UAPI_LINUX_MEMBARRIER_H */ > diff --git a/kernel/fork.c b/kernel/fork.c > index 17921b0390b4..713b3c932671 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -1840,6 +1840,8 @@ static __latent_entropy struct task_struct *copy_process( > */ > copy_seccomp(p); > > + membarrier_fork(p, clone_flags); > + > /* > * Process group and session signals need to be delivered to just the > * parent before the fork or both the parent and the child after the > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 0e36d9960d91..3ca27d066950 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -3292,6 +3292,8 @@ static void __sched notrace __schedule(bool preempt) > local_irq_disable(); > rcu_note_context_switch(preempt); > > + membarrier_sched_out(prev); > + > /* > * Make sure that signal_pending_state()->signal_pending() below > * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE) > @@ -3364,6 +3366,7 @@ static void __sched notrace __schedule(bool preempt) > > /* Also unlocks the rq: */ > rq = context_switch(rq, prev, next, &rf); > + membarrier_sched_in(next); > } else { > rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP); > rq_unlock_irq(rq, &rf); > diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c > index 7eec6914d2d2..f128caf64b49 100644 > --- a/kernel/sched/membarrier.c > +++ b/kernel/sched/membarrier.c > @@ -25,12 +25,16 @@ > * Bitmask made from a "or" of all commands within enum membarrier_cmd, > * except MEMBARRIER_CMD_QUERY. > */ > -#define MEMBARRIER_CMD_BITMASK \ > - (MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED) > +#define MEMBARRIER_CMD_BITMASK \ > + (MEMBARRIER_CMD_SHARED \ > + | MEMBARRIER_CMD_PRIVATE_EXPEDITED \ > + | MEMBARRIER_CMD_REGISTER_SYNC_CORE) > > static void ipi_mb(void *info) > { > - smp_mb(); /* IPIs should be serializing but paranoid. */ > + /* IPIs should be serializing but paranoid. */ > + smp_mb(); > + sync_core(); > } > > static void membarrier_private_expedited(void) > @@ -96,6 +100,30 @@ static void membarrier_private_expedited(void) > smp_mb(); /* exit from system call is not a mb */ > } > > +static void membarrier_register_sync_core(void) > +{ > + struct task_struct *p = current, *t; > + > + if (get_nr_threads(p) == 1) { > + p->membarrier_sync_core = 1; > + return; > + } > + > + /* > + * Coherence of membarrier_sync_core against thread fork is > + * protected by siglock. > + */ > + spin_lock(&p->sighand->siglock); > + for_each_thread(p, t) > + WRITE_ONCE(t->membarrier_sync_core, 1); > + spin_unlock(&p->sighand->siglock); > + /* > + * Ensure all future scheduler execution will observe the new > + * membarrier_sync_core state for this process. > + */ > + synchronize_sched(); > +} > + > /** > * sys_membarrier - issue memory barriers on a set of threads > * @cmd: Takes command values defined in enum membarrier_cmd. > @@ -146,6 +174,9 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags) > case MEMBARRIER_CMD_PRIVATE_EXPEDITED: > membarrier_private_expedited(); > return 0; > + case MEMBARRIER_CMD_REGISTER_SYNC_CORE: > + membarrier_register_sync_core(); > + return 0; > default: > return -EINVAL; > } > -- > 2.11.0 > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH 2/2] membarrier: provide register sync core cmd 2017-08-27 20:35 ` Paul E. McKenney @ 2017-08-27 20:52 ` Mathieu Desnoyers 2017-08-27 21:00 ` Paul E. McKenney 0 siblings, 1 reply; 9+ messages in thread From: Mathieu Desnoyers @ 2017-08-27 20:52 UTC (permalink / raw) To: Paul E. McKenney Cc: Peter Zijlstra, linux-kernel, Boqun Feng, Andrew Hunter, maged michael, gromer, Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Dave Watson, Andy Lutomirski, Will Deacon, Hans Boehm ----- On Aug 27, 2017, at 1:35 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com wrote: > On Sun, Aug 27, 2017 at 12:54:04PM -0700, Mathieu Desnoyers wrote: >> Add a new MEMBARRIER_CMD_REGISTER_SYNC_CORE command to the membarrier >> system call. It allows processes to register their intent to have their >> threads issue core serializing barriers in addition to memory barriers >> whenever a membarrier command is performed. >> >> It is relevant for reclaim of JIT code, which requires to issue core >> serializing barriers on all threads running on behalf of a process >> after ensuring the old code is not visible anymore, before re-using >> memory for new code. >> >> When a processes returns from a MEMBARRIER_CMD_REGISTER_SYNC_CORE >> command, it is guaranteed that all following MEMBARRIER_CMD_SHARED and >> MEMBARRIER_CMD_PRIVATE_EXPEDITED issue core serializing barriers, in >> addition to the memory barriers, on all of its running threads. > > I have queued both of these for testing and review: > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git > f54cf5123eb9 ("membarrier: Speed up single-threaded private cmd") > 0d6eb99818da ("membarrier: Provide register sync core cmd") > > Could the people who asked for this please test it? The second commit > is non-obvious enough to need some careful review and intensive testing > before I can in good conscious pass it upstream. Thanks Paul! Can you pick up my v2 instead ? I added benchmarks in the changelog and documented the new command in the membarrier public header. No changes otherwise. Thanks, Mathieu > > Thanx, Paul > >> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> >> CC: Peter Zijlstra <peterz@infradead.org> >> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> >> CC: Boqun Feng <boqun.feng@gmail.com> >> CC: Andrew Hunter <ahh@google.com> >> CC: Maged Michael <maged.michael@gmail.com> >> CC: gromer@google.com >> CC: Avi Kivity <avi@scylladb.com> >> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> >> CC: Paul Mackerras <paulus@samba.org> >> CC: Michael Ellerman <mpe@ellerman.id.au> >> CC: Dave Watson <davejwatson@fb.com> >> CC: Andy Lutomirski <luto@kernel.org> >> CC: Will Deacon <will.deacon@arm.com> >> CC: Hans Boehm <hboehm@google.com> >> --- >> fs/exec.c | 1 + >> include/linux/sched.h | 52 +++++++++++++++++++++++++++++++++++++++++ >> include/uapi/linux/membarrier.h | 1 + >> kernel/fork.c | 2 ++ >> kernel/sched/core.c | 3 +++ >> kernel/sched/membarrier.c | 37 ++++++++++++++++++++++++++--- >> 6 files changed, 93 insertions(+), 3 deletions(-) >> >> diff --git a/fs/exec.c b/fs/exec.c >> index 62175cbcc801..a4ab3253bac7 100644 >> --- a/fs/exec.c >> +++ b/fs/exec.c >> @@ -1794,6 +1794,7 @@ static int do_execveat_common(int fd, struct filename >> *filename, >> /* execve succeeded */ >> current->fs->in_exec = 0; >> current->in_execve = 0; >> + membarrier_execve(current); >> acct_update_integrals(current); >> task_numa_free(current); >> free_bprm(bprm); >> diff --git a/include/linux/sched.h b/include/linux/sched.h >> index 8337e2db0bb2..b1ecdc4e8b84 100644 >> --- a/include/linux/sched.h >> +++ b/include/linux/sched.h >> @@ -1086,6 +1086,9 @@ struct task_struct { >> /* Used by LSM modules for access restriction: */ >> void *security; >> #endif >> +#ifdef CONFIG_MEMBARRIER >> + int membarrier_sync_core; >> +#endif >> >> /* >> * New fields for task_struct should be added above here, so that >> @@ -1623,4 +1626,53 @@ extern long sched_getaffinity(pid_t pid, struct cpumask >> *mask); >> #define TASK_SIZE_OF(tsk) TASK_SIZE >> #endif >> >> +#ifdef CONFIG_MEMBARRIER >> +static inline void membarrier_fork(struct task_struct *t, >> + unsigned long clone_flags) >> +{ >> + /* >> + * Coherence of membarrier_sync_core against thread fork is >> + * protected by siglock. membarrier_fork is called with siglock >> + * held. >> + */ >> + t->membarrier_sync_core = current->membarrier_sync_core; >> +} >> +static inline void membarrier_execve(struct task_struct *t) >> +{ >> + t->membarrier_sync_core = 0; >> +} >> +static inline void membarrier_sched_out(struct task_struct *t) >> +{ >> + /* >> + * Core serialization is performed before the memory barrier >> + * preceding the store to rq->curr. >> + */ >> + if (unlikely(READ_ONCE(t->membarrier_sync_core))) >> + sync_core(); >> +} >> +static inline void membarrier_sched_in(struct task_struct *t) >> +{ >> + /* >> + * Core serialization is performed after the memory barrier >> + * following the store to rq->curr. >> + */ >> + if (unlikely(READ_ONCE(t->membarrier_sync_core))) >> + sync_core(); >> +} >> +#else >> +static inline void membarrier_fork(struct task_struct *t, >> + unsigned long clone_flags) >> +{ >> +} >> +static inline void membarrier_execve(struct task_struct *t) >> +{ >> +} >> +static inline void membarrier_sched_out(struct task_struct *t) >> +{ >> +} >> +static inline void membarrier_sched_in(struct task_struct *t) >> +{ >> +} >> +#endif >> + >> #endif >> diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h >> index 6d47b3249d8a..bc26fcf8fb7b 100644 >> --- a/include/uapi/linux/membarrier.h >> +++ b/include/uapi/linux/membarrier.h >> @@ -67,6 +67,7 @@ enum membarrier_cmd { >> /* reserved for MEMBARRIER_CMD_SHARED_EXPEDITED (1 << 1) */ >> /* reserved for MEMBARRIER_CMD_PRIVATE (1 << 2) */ >> MEMBARRIER_CMD_PRIVATE_EXPEDITED = (1 << 3), >> + MEMBARRIER_CMD_REGISTER_SYNC_CORE = (1 << 4), >> }; >> >> #endif /* _UAPI_LINUX_MEMBARRIER_H */ >> diff --git a/kernel/fork.c b/kernel/fork.c >> index 17921b0390b4..713b3c932671 100644 >> --- a/kernel/fork.c >> +++ b/kernel/fork.c >> @@ -1840,6 +1840,8 @@ static __latent_entropy struct task_struct *copy_process( >> */ >> copy_seccomp(p); >> >> + membarrier_fork(p, clone_flags); >> + >> /* >> * Process group and session signals need to be delivered to just the >> * parent before the fork or both the parent and the child after the >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index 0e36d9960d91..3ca27d066950 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -3292,6 +3292,8 @@ static void __sched notrace __schedule(bool preempt) >> local_irq_disable(); >> rcu_note_context_switch(preempt); >> >> + membarrier_sched_out(prev); >> + >> /* >> * Make sure that signal_pending_state()->signal_pending() below >> * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE) >> @@ -3364,6 +3366,7 @@ static void __sched notrace __schedule(bool preempt) >> >> /* Also unlocks the rq: */ >> rq = context_switch(rq, prev, next, &rf); >> + membarrier_sched_in(next); >> } else { >> rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP); >> rq_unlock_irq(rq, &rf); >> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c >> index 7eec6914d2d2..f128caf64b49 100644 >> --- a/kernel/sched/membarrier.c >> +++ b/kernel/sched/membarrier.c >> @@ -25,12 +25,16 @@ >> * Bitmask made from a "or" of all commands within enum membarrier_cmd, >> * except MEMBARRIER_CMD_QUERY. >> */ >> -#define MEMBARRIER_CMD_BITMASK \ >> - (MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED) >> +#define MEMBARRIER_CMD_BITMASK \ >> + (MEMBARRIER_CMD_SHARED \ >> + | MEMBARRIER_CMD_PRIVATE_EXPEDITED \ >> + | MEMBARRIER_CMD_REGISTER_SYNC_CORE) >> >> static void ipi_mb(void *info) >> { >> - smp_mb(); /* IPIs should be serializing but paranoid. */ >> + /* IPIs should be serializing but paranoid. */ >> + smp_mb(); >> + sync_core(); >> } >> >> static void membarrier_private_expedited(void) >> @@ -96,6 +100,30 @@ static void membarrier_private_expedited(void) >> smp_mb(); /* exit from system call is not a mb */ >> } >> >> +static void membarrier_register_sync_core(void) >> +{ >> + struct task_struct *p = current, *t; >> + >> + if (get_nr_threads(p) == 1) { >> + p->membarrier_sync_core = 1; >> + return; >> + } >> + >> + /* >> + * Coherence of membarrier_sync_core against thread fork is >> + * protected by siglock. >> + */ >> + spin_lock(&p->sighand->siglock); >> + for_each_thread(p, t) >> + WRITE_ONCE(t->membarrier_sync_core, 1); >> + spin_unlock(&p->sighand->siglock); >> + /* >> + * Ensure all future scheduler execution will observe the new >> + * membarrier_sync_core state for this process. >> + */ >> + synchronize_sched(); >> +} >> + >> /** >> * sys_membarrier - issue memory barriers on a set of threads >> * @cmd: Takes command values defined in enum membarrier_cmd. >> @@ -146,6 +174,9 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags) >> case MEMBARRIER_CMD_PRIVATE_EXPEDITED: >> membarrier_private_expedited(); >> return 0; >> + case MEMBARRIER_CMD_REGISTER_SYNC_CORE: >> + membarrier_register_sync_core(); >> + return 0; >> default: >> return -EINVAL; >> } >> -- >> 2.11.0 -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH 2/2] membarrier: provide register sync core cmd 2017-08-27 20:52 ` Mathieu Desnoyers @ 2017-08-27 21:00 ` Paul E. McKenney 2017-08-27 21:39 ` Mathieu Desnoyers 0 siblings, 1 reply; 9+ messages in thread From: Paul E. McKenney @ 2017-08-27 21:00 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Peter Zijlstra, linux-kernel, Boqun Feng, Andrew Hunter, maged michael, gromer, Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Dave Watson, Andy Lutomirski, Will Deacon, Hans Boehm On Sun, Aug 27, 2017 at 08:52:58PM +0000, Mathieu Desnoyers wrote: > ----- On Aug 27, 2017, at 1:35 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com wrote: > > > On Sun, Aug 27, 2017 at 12:54:04PM -0700, Mathieu Desnoyers wrote: > >> Add a new MEMBARRIER_CMD_REGISTER_SYNC_CORE command to the membarrier > >> system call. It allows processes to register their intent to have their > >> threads issue core serializing barriers in addition to memory barriers > >> whenever a membarrier command is performed. > >> > >> It is relevant for reclaim of JIT code, which requires to issue core > >> serializing barriers on all threads running on behalf of a process > >> after ensuring the old code is not visible anymore, before re-using > >> memory for new code. > >> > >> When a processes returns from a MEMBARRIER_CMD_REGISTER_SYNC_CORE > >> command, it is guaranteed that all following MEMBARRIER_CMD_SHARED and > >> MEMBARRIER_CMD_PRIVATE_EXPEDITED issue core serializing barriers, in > >> addition to the memory barriers, on all of its running threads. > > > > I have queued both of these for testing and review: > > > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git > > f54cf5123eb9 ("membarrier: Speed up single-threaded private cmd") > > 0d6eb99818da ("membarrier: Provide register sync core cmd") > > > > Could the people who asked for this please test it? The second commit > > is non-obvious enough to need some careful review and intensive testing > > before I can in good conscious pass it upstream. > > Thanks Paul! > > Can you pick up my v2 instead ? I added benchmarks in the changelog and > documented the new command in the membarrier public header. No changes > otherwise. It looks like 0day Test Robot also wants some additional header files. Could you please send me a v3 with its complaints addressed? Thanx, Paul > Thanks, > > Mathieu > > > > > Thanx, Paul > > > >> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > >> CC: Peter Zijlstra <peterz@infradead.org> > >> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > >> CC: Boqun Feng <boqun.feng@gmail.com> > >> CC: Andrew Hunter <ahh@google.com> > >> CC: Maged Michael <maged.michael@gmail.com> > >> CC: gromer@google.com > >> CC: Avi Kivity <avi@scylladb.com> > >> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> > >> CC: Paul Mackerras <paulus@samba.org> > >> CC: Michael Ellerman <mpe@ellerman.id.au> > >> CC: Dave Watson <davejwatson@fb.com> > >> CC: Andy Lutomirski <luto@kernel.org> > >> CC: Will Deacon <will.deacon@arm.com> > >> CC: Hans Boehm <hboehm@google.com> > >> --- > >> fs/exec.c | 1 + > >> include/linux/sched.h | 52 +++++++++++++++++++++++++++++++++++++++++ > >> include/uapi/linux/membarrier.h | 1 + > >> kernel/fork.c | 2 ++ > >> kernel/sched/core.c | 3 +++ > >> kernel/sched/membarrier.c | 37 ++++++++++++++++++++++++++--- > >> 6 files changed, 93 insertions(+), 3 deletions(-) > >> > >> diff --git a/fs/exec.c b/fs/exec.c > >> index 62175cbcc801..a4ab3253bac7 100644 > >> --- a/fs/exec.c > >> +++ b/fs/exec.c > >> @@ -1794,6 +1794,7 @@ static int do_execveat_common(int fd, struct filename > >> *filename, > >> /* execve succeeded */ > >> current->fs->in_exec = 0; > >> current->in_execve = 0; > >> + membarrier_execve(current); > >> acct_update_integrals(current); > >> task_numa_free(current); > >> free_bprm(bprm); > >> diff --git a/include/linux/sched.h b/include/linux/sched.h > >> index 8337e2db0bb2..b1ecdc4e8b84 100644 > >> --- a/include/linux/sched.h > >> +++ b/include/linux/sched.h > >> @@ -1086,6 +1086,9 @@ struct task_struct { > >> /* Used by LSM modules for access restriction: */ > >> void *security; > >> #endif > >> +#ifdef CONFIG_MEMBARRIER > >> + int membarrier_sync_core; > >> +#endif > >> > >> /* > >> * New fields for task_struct should be added above here, so that > >> @@ -1623,4 +1626,53 @@ extern long sched_getaffinity(pid_t pid, struct cpumask > >> *mask); > >> #define TASK_SIZE_OF(tsk) TASK_SIZE > >> #endif > >> > >> +#ifdef CONFIG_MEMBARRIER > >> +static inline void membarrier_fork(struct task_struct *t, > >> + unsigned long clone_flags) > >> +{ > >> + /* > >> + * Coherence of membarrier_sync_core against thread fork is > >> + * protected by siglock. membarrier_fork is called with siglock > >> + * held. > >> + */ > >> + t->membarrier_sync_core = current->membarrier_sync_core; > >> +} > >> +static inline void membarrier_execve(struct task_struct *t) > >> +{ > >> + t->membarrier_sync_core = 0; > >> +} > >> +static inline void membarrier_sched_out(struct task_struct *t) > >> +{ > >> + /* > >> + * Core serialization is performed before the memory barrier > >> + * preceding the store to rq->curr. > >> + */ > >> + if (unlikely(READ_ONCE(t->membarrier_sync_core))) > >> + sync_core(); > >> +} > >> +static inline void membarrier_sched_in(struct task_struct *t) > >> +{ > >> + /* > >> + * Core serialization is performed after the memory barrier > >> + * following the store to rq->curr. > >> + */ > >> + if (unlikely(READ_ONCE(t->membarrier_sync_core))) > >> + sync_core(); > >> +} > >> +#else > >> +static inline void membarrier_fork(struct task_struct *t, > >> + unsigned long clone_flags) > >> +{ > >> +} > >> +static inline void membarrier_execve(struct task_struct *t) > >> +{ > >> +} > >> +static inline void membarrier_sched_out(struct task_struct *t) > >> +{ > >> +} > >> +static inline void membarrier_sched_in(struct task_struct *t) > >> +{ > >> +} > >> +#endif > >> + > >> #endif > >> diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h > >> index 6d47b3249d8a..bc26fcf8fb7b 100644 > >> --- a/include/uapi/linux/membarrier.h > >> +++ b/include/uapi/linux/membarrier.h > >> @@ -67,6 +67,7 @@ enum membarrier_cmd { > >> /* reserved for MEMBARRIER_CMD_SHARED_EXPEDITED (1 << 1) */ > >> /* reserved for MEMBARRIER_CMD_PRIVATE (1 << 2) */ > >> MEMBARRIER_CMD_PRIVATE_EXPEDITED = (1 << 3), > >> + MEMBARRIER_CMD_REGISTER_SYNC_CORE = (1 << 4), > >> }; > >> > >> #endif /* _UAPI_LINUX_MEMBARRIER_H */ > >> diff --git a/kernel/fork.c b/kernel/fork.c > >> index 17921b0390b4..713b3c932671 100644 > >> --- a/kernel/fork.c > >> +++ b/kernel/fork.c > >> @@ -1840,6 +1840,8 @@ static __latent_entropy struct task_struct *copy_process( > >> */ > >> copy_seccomp(p); > >> > >> + membarrier_fork(p, clone_flags); > >> + > >> /* > >> * Process group and session signals need to be delivered to just the > >> * parent before the fork or both the parent and the child after the > >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c > >> index 0e36d9960d91..3ca27d066950 100644 > >> --- a/kernel/sched/core.c > >> +++ b/kernel/sched/core.c > >> @@ -3292,6 +3292,8 @@ static void __sched notrace __schedule(bool preempt) > >> local_irq_disable(); > >> rcu_note_context_switch(preempt); > >> > >> + membarrier_sched_out(prev); > >> + > >> /* > >> * Make sure that signal_pending_state()->signal_pending() below > >> * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE) > >> @@ -3364,6 +3366,7 @@ static void __sched notrace __schedule(bool preempt) > >> > >> /* Also unlocks the rq: */ > >> rq = context_switch(rq, prev, next, &rf); > >> + membarrier_sched_in(next); > >> } else { > >> rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP); > >> rq_unlock_irq(rq, &rf); > >> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c > >> index 7eec6914d2d2..f128caf64b49 100644 > >> --- a/kernel/sched/membarrier.c > >> +++ b/kernel/sched/membarrier.c > >> @@ -25,12 +25,16 @@ > >> * Bitmask made from a "or" of all commands within enum membarrier_cmd, > >> * except MEMBARRIER_CMD_QUERY. > >> */ > >> -#define MEMBARRIER_CMD_BITMASK \ > >> - (MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED) > >> +#define MEMBARRIER_CMD_BITMASK \ > >> + (MEMBARRIER_CMD_SHARED \ > >> + | MEMBARRIER_CMD_PRIVATE_EXPEDITED \ > >> + | MEMBARRIER_CMD_REGISTER_SYNC_CORE) > >> > >> static void ipi_mb(void *info) > >> { > >> - smp_mb(); /* IPIs should be serializing but paranoid. */ > >> + /* IPIs should be serializing but paranoid. */ > >> + smp_mb(); > >> + sync_core(); > >> } > >> > >> static void membarrier_private_expedited(void) > >> @@ -96,6 +100,30 @@ static void membarrier_private_expedited(void) > >> smp_mb(); /* exit from system call is not a mb */ > >> } > >> > >> +static void membarrier_register_sync_core(void) > >> +{ > >> + struct task_struct *p = current, *t; > >> + > >> + if (get_nr_threads(p) == 1) { > >> + p->membarrier_sync_core = 1; > >> + return; > >> + } > >> + > >> + /* > >> + * Coherence of membarrier_sync_core against thread fork is > >> + * protected by siglock. > >> + */ > >> + spin_lock(&p->sighand->siglock); > >> + for_each_thread(p, t) > >> + WRITE_ONCE(t->membarrier_sync_core, 1); > >> + spin_unlock(&p->sighand->siglock); > >> + /* > >> + * Ensure all future scheduler execution will observe the new > >> + * membarrier_sync_core state for this process. > >> + */ > >> + synchronize_sched(); > >> +} > >> + > >> /** > >> * sys_membarrier - issue memory barriers on a set of threads > >> * @cmd: Takes command values defined in enum membarrier_cmd. > >> @@ -146,6 +174,9 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags) > >> case MEMBARRIER_CMD_PRIVATE_EXPEDITED: > >> membarrier_private_expedited(); > >> return 0; > >> + case MEMBARRIER_CMD_REGISTER_SYNC_CORE: > >> + membarrier_register_sync_core(); > >> + return 0; > >> default: > >> return -EINVAL; > >> } > >> -- > >> 2.11.0 > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH 2/2] membarrier: provide register sync core cmd 2017-08-27 21:00 ` Paul E. McKenney @ 2017-08-27 21:39 ` Mathieu Desnoyers 2017-08-28 17:49 ` Paul E. McKenney 0 siblings, 1 reply; 9+ messages in thread From: Mathieu Desnoyers @ 2017-08-27 21:39 UTC (permalink / raw) To: Paul E. McKenney Cc: Peter Zijlstra, linux-kernel, Boqun Feng, Andrew Hunter, maged michael, gromer, Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Dave Watson, Andy Lutomirski, Will Deacon, Hans Boehm ----- On Aug 27, 2017, at 2:00 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com wrote: > On Sun, Aug 27, 2017 at 08:52:58PM +0000, Mathieu Desnoyers wrote: >> ----- On Aug 27, 2017, at 1:35 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com >> wrote: >> >> > On Sun, Aug 27, 2017 at 12:54:04PM -0700, Mathieu Desnoyers wrote: >> >> Add a new MEMBARRIER_CMD_REGISTER_SYNC_CORE command to the membarrier >> >> system call. It allows processes to register their intent to have their >> >> threads issue core serializing barriers in addition to memory barriers >> >> whenever a membarrier command is performed. >> >> >> >> It is relevant for reclaim of JIT code, which requires to issue core >> >> serializing barriers on all threads running on behalf of a process >> >> after ensuring the old code is not visible anymore, before re-using >> >> memory for new code. >> >> >> >> When a processes returns from a MEMBARRIER_CMD_REGISTER_SYNC_CORE >> >> command, it is guaranteed that all following MEMBARRIER_CMD_SHARED and >> >> MEMBARRIER_CMD_PRIVATE_EXPEDITED issue core serializing barriers, in >> >> addition to the memory barriers, on all of its running threads. >> > >> > I have queued both of these for testing and review: >> > >> > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git >> > f54cf5123eb9 ("membarrier: Speed up single-threaded private cmd") >> > 0d6eb99818da ("membarrier: Provide register sync core cmd") >> > >> > Could the people who asked for this please test it? The second commit >> > is non-obvious enough to need some careful review and intensive testing >> > before I can in good conscious pass it upstream. >> >> Thanks Paul! >> >> Can you pick up my v2 instead ? I added benchmarks in the changelog and >> documented the new command in the membarrier public header. No changes >> otherwise. > > It looks like 0day Test Robot also wants some additional header files. > Could you please send me a v3 with its complaints addressed? I sent a separate patch implementing sync_core() on xtensa. It should be applied before this patch. Hopefully we can get some testing from people with xtensa testing environments. Thanks, Mathieu > > Thanx, Paul > >> Thanks, >> >> Mathieu >> >> > >> > Thanx, Paul >> > >> >> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> >> >> CC: Peter Zijlstra <peterz@infradead.org> >> >> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> >> >> CC: Boqun Feng <boqun.feng@gmail.com> >> >> CC: Andrew Hunter <ahh@google.com> >> >> CC: Maged Michael <maged.michael@gmail.com> >> >> CC: gromer@google.com >> >> CC: Avi Kivity <avi@scylladb.com> >> >> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> >> >> CC: Paul Mackerras <paulus@samba.org> >> >> CC: Michael Ellerman <mpe@ellerman.id.au> >> >> CC: Dave Watson <davejwatson@fb.com> >> >> CC: Andy Lutomirski <luto@kernel.org> >> >> CC: Will Deacon <will.deacon@arm.com> >> >> CC: Hans Boehm <hboehm@google.com> >> >> --- >> >> fs/exec.c | 1 + >> >> include/linux/sched.h | 52 +++++++++++++++++++++++++++++++++++++++++ >> >> include/uapi/linux/membarrier.h | 1 + >> >> kernel/fork.c | 2 ++ >> >> kernel/sched/core.c | 3 +++ >> >> kernel/sched/membarrier.c | 37 ++++++++++++++++++++++++++--- >> >> 6 files changed, 93 insertions(+), 3 deletions(-) >> >> >> >> diff --git a/fs/exec.c b/fs/exec.c >> >> index 62175cbcc801..a4ab3253bac7 100644 >> >> --- a/fs/exec.c >> >> +++ b/fs/exec.c >> >> @@ -1794,6 +1794,7 @@ static int do_execveat_common(int fd, struct filename >> >> *filename, >> >> /* execve succeeded */ >> >> current->fs->in_exec = 0; >> >> current->in_execve = 0; >> >> + membarrier_execve(current); >> >> acct_update_integrals(current); >> >> task_numa_free(current); >> >> free_bprm(bprm); >> >> diff --git a/include/linux/sched.h b/include/linux/sched.h >> >> index 8337e2db0bb2..b1ecdc4e8b84 100644 >> >> --- a/include/linux/sched.h >> >> +++ b/include/linux/sched.h >> >> @@ -1086,6 +1086,9 @@ struct task_struct { >> >> /* Used by LSM modules for access restriction: */ >> >> void *security; >> >> #endif >> >> +#ifdef CONFIG_MEMBARRIER >> >> + int membarrier_sync_core; >> >> +#endif >> >> >> >> /* >> >> * New fields for task_struct should be added above here, so that >> >> @@ -1623,4 +1626,53 @@ extern long sched_getaffinity(pid_t pid, struct cpumask >> >> *mask); >> >> #define TASK_SIZE_OF(tsk) TASK_SIZE >> >> #endif >> >> >> >> +#ifdef CONFIG_MEMBARRIER >> >> +static inline void membarrier_fork(struct task_struct *t, >> >> + unsigned long clone_flags) >> >> +{ >> >> + /* >> >> + * Coherence of membarrier_sync_core against thread fork is >> >> + * protected by siglock. membarrier_fork is called with siglock >> >> + * held. >> >> + */ >> >> + t->membarrier_sync_core = current->membarrier_sync_core; >> >> +} >> >> +static inline void membarrier_execve(struct task_struct *t) >> >> +{ >> >> + t->membarrier_sync_core = 0; >> >> +} >> >> +static inline void membarrier_sched_out(struct task_struct *t) >> >> +{ >> >> + /* >> >> + * Core serialization is performed before the memory barrier >> >> + * preceding the store to rq->curr. >> >> + */ >> >> + if (unlikely(READ_ONCE(t->membarrier_sync_core))) >> >> + sync_core(); >> >> +} >> >> +static inline void membarrier_sched_in(struct task_struct *t) >> >> +{ >> >> + /* >> >> + * Core serialization is performed after the memory barrier >> >> + * following the store to rq->curr. >> >> + */ >> >> + if (unlikely(READ_ONCE(t->membarrier_sync_core))) >> >> + sync_core(); >> >> +} >> >> +#else >> >> +static inline void membarrier_fork(struct task_struct *t, >> >> + unsigned long clone_flags) >> >> +{ >> >> +} >> >> +static inline void membarrier_execve(struct task_struct *t) >> >> +{ >> >> +} >> >> +static inline void membarrier_sched_out(struct task_struct *t) >> >> +{ >> >> +} >> >> +static inline void membarrier_sched_in(struct task_struct *t) >> >> +{ >> >> +} >> >> +#endif >> >> + >> >> #endif >> >> diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h >> >> index 6d47b3249d8a..bc26fcf8fb7b 100644 >> >> --- a/include/uapi/linux/membarrier.h >> >> +++ b/include/uapi/linux/membarrier.h >> >> @@ -67,6 +67,7 @@ enum membarrier_cmd { >> >> /* reserved for MEMBARRIER_CMD_SHARED_EXPEDITED (1 << 1) */ >> >> /* reserved for MEMBARRIER_CMD_PRIVATE (1 << 2) */ >> >> MEMBARRIER_CMD_PRIVATE_EXPEDITED = (1 << 3), >> >> + MEMBARRIER_CMD_REGISTER_SYNC_CORE = (1 << 4), >> >> }; >> >> >> >> #endif /* _UAPI_LINUX_MEMBARRIER_H */ >> >> diff --git a/kernel/fork.c b/kernel/fork.c >> >> index 17921b0390b4..713b3c932671 100644 >> >> --- a/kernel/fork.c >> >> +++ b/kernel/fork.c >> >> @@ -1840,6 +1840,8 @@ static __latent_entropy struct task_struct *copy_process( >> >> */ >> >> copy_seccomp(p); >> >> >> >> + membarrier_fork(p, clone_flags); >> >> + >> >> /* >> >> * Process group and session signals need to be delivered to just the >> >> * parent before the fork or both the parent and the child after the >> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> >> index 0e36d9960d91..3ca27d066950 100644 >> >> --- a/kernel/sched/core.c >> >> +++ b/kernel/sched/core.c >> >> @@ -3292,6 +3292,8 @@ static void __sched notrace __schedule(bool preempt) >> >> local_irq_disable(); >> >> rcu_note_context_switch(preempt); >> >> >> >> + membarrier_sched_out(prev); >> >> + >> >> /* >> >> * Make sure that signal_pending_state()->signal_pending() below >> >> * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE) >> >> @@ -3364,6 +3366,7 @@ static void __sched notrace __schedule(bool preempt) >> >> >> >> /* Also unlocks the rq: */ >> >> rq = context_switch(rq, prev, next, &rf); >> >> + membarrier_sched_in(next); >> >> } else { >> >> rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP); >> >> rq_unlock_irq(rq, &rf); >> >> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c >> >> index 7eec6914d2d2..f128caf64b49 100644 >> >> --- a/kernel/sched/membarrier.c >> >> +++ b/kernel/sched/membarrier.c >> >> @@ -25,12 +25,16 @@ >> >> * Bitmask made from a "or" of all commands within enum membarrier_cmd, >> >> * except MEMBARRIER_CMD_QUERY. >> >> */ >> >> -#define MEMBARRIER_CMD_BITMASK \ >> >> - (MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED) >> >> +#define MEMBARRIER_CMD_BITMASK \ >> >> + (MEMBARRIER_CMD_SHARED \ >> >> + | MEMBARRIER_CMD_PRIVATE_EXPEDITED \ >> >> + | MEMBARRIER_CMD_REGISTER_SYNC_CORE) >> >> >> >> static void ipi_mb(void *info) >> >> { >> >> - smp_mb(); /* IPIs should be serializing but paranoid. */ >> >> + /* IPIs should be serializing but paranoid. */ >> >> + smp_mb(); >> >> + sync_core(); >> >> } >> >> >> >> static void membarrier_private_expedited(void) >> >> @@ -96,6 +100,30 @@ static void membarrier_private_expedited(void) >> >> smp_mb(); /* exit from system call is not a mb */ >> >> } >> >> >> >> +static void membarrier_register_sync_core(void) >> >> +{ >> >> + struct task_struct *p = current, *t; >> >> + >> >> + if (get_nr_threads(p) == 1) { >> >> + p->membarrier_sync_core = 1; >> >> + return; >> >> + } >> >> + >> >> + /* >> >> + * Coherence of membarrier_sync_core against thread fork is >> >> + * protected by siglock. >> >> + */ >> >> + spin_lock(&p->sighand->siglock); >> >> + for_each_thread(p, t) >> >> + WRITE_ONCE(t->membarrier_sync_core, 1); >> >> + spin_unlock(&p->sighand->siglock); >> >> + /* >> >> + * Ensure all future scheduler execution will observe the new >> >> + * membarrier_sync_core state for this process. >> >> + */ >> >> + synchronize_sched(); >> >> +} >> >> + >> >> /** >> >> * sys_membarrier - issue memory barriers on a set of threads >> >> * @cmd: Takes command values defined in enum membarrier_cmd. >> >> @@ -146,6 +174,9 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags) >> >> case MEMBARRIER_CMD_PRIVATE_EXPEDITED: >> >> membarrier_private_expedited(); >> >> return 0; >> >> + case MEMBARRIER_CMD_REGISTER_SYNC_CORE: >> >> + membarrier_register_sync_core(); >> >> + return 0; >> >> default: >> >> return -EINVAL; >> >> } >> >> -- >> >> 2.11.0 >> >> -- >> Mathieu Desnoyers >> EfficiOS Inc. >> http://www.efficios.com -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH 2/2] membarrier: provide register sync core cmd 2017-08-27 21:39 ` Mathieu Desnoyers @ 2017-08-28 17:49 ` Paul E. McKenney 2017-08-30 17:53 ` Mathieu Desnoyers 0 siblings, 1 reply; 9+ messages in thread From: Paul E. McKenney @ 2017-08-28 17:49 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Peter Zijlstra, linux-kernel, Boqun Feng, Andrew Hunter, maged michael, gromer, Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Dave Watson, Andy Lutomirski, Will Deacon, Hans Boehm On Sun, Aug 27, 2017 at 09:39:54PM +0000, Mathieu Desnoyers wrote: > ----- On Aug 27, 2017, at 2:00 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com wrote: > > > On Sun, Aug 27, 2017 at 08:52:58PM +0000, Mathieu Desnoyers wrote: > >> ----- On Aug 27, 2017, at 1:35 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com > >> wrote: > >> > >> > On Sun, Aug 27, 2017 at 12:54:04PM -0700, Mathieu Desnoyers wrote: > >> >> Add a new MEMBARRIER_CMD_REGISTER_SYNC_CORE command to the membarrier > >> >> system call. It allows processes to register their intent to have their > >> >> threads issue core serializing barriers in addition to memory barriers > >> >> whenever a membarrier command is performed. > >> >> > >> >> It is relevant for reclaim of JIT code, which requires to issue core > >> >> serializing barriers on all threads running on behalf of a process > >> >> after ensuring the old code is not visible anymore, before re-using > >> >> memory for new code. > >> >> > >> >> When a processes returns from a MEMBARRIER_CMD_REGISTER_SYNC_CORE > >> >> command, it is guaranteed that all following MEMBARRIER_CMD_SHARED and > >> >> MEMBARRIER_CMD_PRIVATE_EXPEDITED issue core serializing barriers, in > >> >> addition to the memory barriers, on all of its running threads. > >> > > >> > I have queued both of these for testing and review: > >> > > >> > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git > >> > f54cf5123eb9 ("membarrier: Speed up single-threaded private cmd") > >> > 0d6eb99818da ("membarrier: Provide register sync core cmd") > >> > > >> > Could the people who asked for this please test it? The second commit > >> > is non-obvious enough to need some careful review and intensive testing > >> > before I can in good conscious pass it upstream. > >> > >> Thanks Paul! > >> > >> Can you pick up my v2 instead ? I added benchmarks in the changelog and > >> documented the new command in the membarrier public header. No changes > >> otherwise. > > > > It looks like 0day Test Robot also wants some additional header files. > > Could you please send me a v3 with its complaints addressed? > > I sent a separate patch implementing sync_core() on xtensa. It should > be applied before this patch. And it looks like we also need a sync_core() on every CPU family other than x86, which already has one. Do we want to make this capability depend on an ARCH_ kconfig option, or should we just bit the bullet and implement sync_core() everywhere? (I had to drop this commit due to downstream complaints. Left to myself, I would take the ARCH_ approach, then add it back in, but please let me know how you would like to proceed.) Thanx, Paul > Hopefully we can get some testing from people with xtensa testing > environments. > > Thanks, > > Mathieu > > > > > Thanx, Paul > > > >> Thanks, > >> > >> Mathieu > >> > >> > > >> > Thanx, Paul > >> > > >> >> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > >> >> CC: Peter Zijlstra <peterz@infradead.org> > >> >> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > >> >> CC: Boqun Feng <boqun.feng@gmail.com> > >> >> CC: Andrew Hunter <ahh@google.com> > >> >> CC: Maged Michael <maged.michael@gmail.com> > >> >> CC: gromer@google.com > >> >> CC: Avi Kivity <avi@scylladb.com> > >> >> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> > >> >> CC: Paul Mackerras <paulus@samba.org> > >> >> CC: Michael Ellerman <mpe@ellerman.id.au> > >> >> CC: Dave Watson <davejwatson@fb.com> > >> >> CC: Andy Lutomirski <luto@kernel.org> > >> >> CC: Will Deacon <will.deacon@arm.com> > >> >> CC: Hans Boehm <hboehm@google.com> > >> >> --- > >> >> fs/exec.c | 1 + > >> >> include/linux/sched.h | 52 +++++++++++++++++++++++++++++++++++++++++ > >> >> include/uapi/linux/membarrier.h | 1 + > >> >> kernel/fork.c | 2 ++ > >> >> kernel/sched/core.c | 3 +++ > >> >> kernel/sched/membarrier.c | 37 ++++++++++++++++++++++++++--- > >> >> 6 files changed, 93 insertions(+), 3 deletions(-) > >> >> > >> >> diff --git a/fs/exec.c b/fs/exec.c > >> >> index 62175cbcc801..a4ab3253bac7 100644 > >> >> --- a/fs/exec.c > >> >> +++ b/fs/exec.c > >> >> @@ -1794,6 +1794,7 @@ static int do_execveat_common(int fd, struct filename > >> >> *filename, > >> >> /* execve succeeded */ > >> >> current->fs->in_exec = 0; > >> >> current->in_execve = 0; > >> >> + membarrier_execve(current); > >> >> acct_update_integrals(current); > >> >> task_numa_free(current); > >> >> free_bprm(bprm); > >> >> diff --git a/include/linux/sched.h b/include/linux/sched.h > >> >> index 8337e2db0bb2..b1ecdc4e8b84 100644 > >> >> --- a/include/linux/sched.h > >> >> +++ b/include/linux/sched.h > >> >> @@ -1086,6 +1086,9 @@ struct task_struct { > >> >> /* Used by LSM modules for access restriction: */ > >> >> void *security; > >> >> #endif > >> >> +#ifdef CONFIG_MEMBARRIER > >> >> + int membarrier_sync_core; > >> >> +#endif > >> >> > >> >> /* > >> >> * New fields for task_struct should be added above here, so that > >> >> @@ -1623,4 +1626,53 @@ extern long sched_getaffinity(pid_t pid, struct cpumask > >> >> *mask); > >> >> #define TASK_SIZE_OF(tsk) TASK_SIZE > >> >> #endif > >> >> > >> >> +#ifdef CONFIG_MEMBARRIER > >> >> +static inline void membarrier_fork(struct task_struct *t, > >> >> + unsigned long clone_flags) > >> >> +{ > >> >> + /* > >> >> + * Coherence of membarrier_sync_core against thread fork is > >> >> + * protected by siglock. membarrier_fork is called with siglock > >> >> + * held. > >> >> + */ > >> >> + t->membarrier_sync_core = current->membarrier_sync_core; > >> >> +} > >> >> +static inline void membarrier_execve(struct task_struct *t) > >> >> +{ > >> >> + t->membarrier_sync_core = 0; > >> >> +} > >> >> +static inline void membarrier_sched_out(struct task_struct *t) > >> >> +{ > >> >> + /* > >> >> + * Core serialization is performed before the memory barrier > >> >> + * preceding the store to rq->curr. > >> >> + */ > >> >> + if (unlikely(READ_ONCE(t->membarrier_sync_core))) > >> >> + sync_core(); > >> >> +} > >> >> +static inline void membarrier_sched_in(struct task_struct *t) > >> >> +{ > >> >> + /* > >> >> + * Core serialization is performed after the memory barrier > >> >> + * following the store to rq->curr. > >> >> + */ > >> >> + if (unlikely(READ_ONCE(t->membarrier_sync_core))) > >> >> + sync_core(); > >> >> +} > >> >> +#else > >> >> +static inline void membarrier_fork(struct task_struct *t, > >> >> + unsigned long clone_flags) > >> >> +{ > >> >> +} > >> >> +static inline void membarrier_execve(struct task_struct *t) > >> >> +{ > >> >> +} > >> >> +static inline void membarrier_sched_out(struct task_struct *t) > >> >> +{ > >> >> +} > >> >> +static inline void membarrier_sched_in(struct task_struct *t) > >> >> +{ > >> >> +} > >> >> +#endif > >> >> + > >> >> #endif > >> >> diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h > >> >> index 6d47b3249d8a..bc26fcf8fb7b 100644 > >> >> --- a/include/uapi/linux/membarrier.h > >> >> +++ b/include/uapi/linux/membarrier.h > >> >> @@ -67,6 +67,7 @@ enum membarrier_cmd { > >> >> /* reserved for MEMBARRIER_CMD_SHARED_EXPEDITED (1 << 1) */ > >> >> /* reserved for MEMBARRIER_CMD_PRIVATE (1 << 2) */ > >> >> MEMBARRIER_CMD_PRIVATE_EXPEDITED = (1 << 3), > >> >> + MEMBARRIER_CMD_REGISTER_SYNC_CORE = (1 << 4), > >> >> }; > >> >> > >> >> #endif /* _UAPI_LINUX_MEMBARRIER_H */ > >> >> diff --git a/kernel/fork.c b/kernel/fork.c > >> >> index 17921b0390b4..713b3c932671 100644 > >> >> --- a/kernel/fork.c > >> >> +++ b/kernel/fork.c > >> >> @@ -1840,6 +1840,8 @@ static __latent_entropy struct task_struct *copy_process( > >> >> */ > >> >> copy_seccomp(p); > >> >> > >> >> + membarrier_fork(p, clone_flags); > >> >> + > >> >> /* > >> >> * Process group and session signals need to be delivered to just the > >> >> * parent before the fork or both the parent and the child after the > >> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c > >> >> index 0e36d9960d91..3ca27d066950 100644 > >> >> --- a/kernel/sched/core.c > >> >> +++ b/kernel/sched/core.c > >> >> @@ -3292,6 +3292,8 @@ static void __sched notrace __schedule(bool preempt) > >> >> local_irq_disable(); > >> >> rcu_note_context_switch(preempt); > >> >> > >> >> + membarrier_sched_out(prev); > >> >> + > >> >> /* > >> >> * Make sure that signal_pending_state()->signal_pending() below > >> >> * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE) > >> >> @@ -3364,6 +3366,7 @@ static void __sched notrace __schedule(bool preempt) > >> >> > >> >> /* Also unlocks the rq: */ > >> >> rq = context_switch(rq, prev, next, &rf); > >> >> + membarrier_sched_in(next); > >> >> } else { > >> >> rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP); > >> >> rq_unlock_irq(rq, &rf); > >> >> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c > >> >> index 7eec6914d2d2..f128caf64b49 100644 > >> >> --- a/kernel/sched/membarrier.c > >> >> +++ b/kernel/sched/membarrier.c > >> >> @@ -25,12 +25,16 @@ > >> >> * Bitmask made from a "or" of all commands within enum membarrier_cmd, > >> >> * except MEMBARRIER_CMD_QUERY. > >> >> */ > >> >> -#define MEMBARRIER_CMD_BITMASK \ > >> >> - (MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED) > >> >> +#define MEMBARRIER_CMD_BITMASK \ > >> >> + (MEMBARRIER_CMD_SHARED \ > >> >> + | MEMBARRIER_CMD_PRIVATE_EXPEDITED \ > >> >> + | MEMBARRIER_CMD_REGISTER_SYNC_CORE) > >> >> > >> >> static void ipi_mb(void *info) > >> >> { > >> >> - smp_mb(); /* IPIs should be serializing but paranoid. */ > >> >> + /* IPIs should be serializing but paranoid. */ > >> >> + smp_mb(); > >> >> + sync_core(); > >> >> } > >> >> > >> >> static void membarrier_private_expedited(void) > >> >> @@ -96,6 +100,30 @@ static void membarrier_private_expedited(void) > >> >> smp_mb(); /* exit from system call is not a mb */ > >> >> } > >> >> > >> >> +static void membarrier_register_sync_core(void) > >> >> +{ > >> >> + struct task_struct *p = current, *t; > >> >> + > >> >> + if (get_nr_threads(p) == 1) { > >> >> + p->membarrier_sync_core = 1; > >> >> + return; > >> >> + } > >> >> + > >> >> + /* > >> >> + * Coherence of membarrier_sync_core against thread fork is > >> >> + * protected by siglock. > >> >> + */ > >> >> + spin_lock(&p->sighand->siglock); > >> >> + for_each_thread(p, t) > >> >> + WRITE_ONCE(t->membarrier_sync_core, 1); > >> >> + spin_unlock(&p->sighand->siglock); > >> >> + /* > >> >> + * Ensure all future scheduler execution will observe the new > >> >> + * membarrier_sync_core state for this process. > >> >> + */ > >> >> + synchronize_sched(); > >> >> +} > >> >> + > >> >> /** > >> >> * sys_membarrier - issue memory barriers on a set of threads > >> >> * @cmd: Takes command values defined in enum membarrier_cmd. > >> >> @@ -146,6 +174,9 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags) > >> >> case MEMBARRIER_CMD_PRIVATE_EXPEDITED: > >> >> membarrier_private_expedited(); > >> >> return 0; > >> >> + case MEMBARRIER_CMD_REGISTER_SYNC_CORE: > >> >> + membarrier_register_sync_core(); > >> >> + return 0; > >> >> default: > >> >> return -EINVAL; > >> >> } > >> >> -- > >> >> 2.11.0 > >> > >> -- > >> Mathieu Desnoyers > >> EfficiOS Inc. > >> http://www.efficios.com > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH 2/2] membarrier: provide register sync core cmd 2017-08-28 17:49 ` Paul E. McKenney @ 2017-08-30 17:53 ` Mathieu Desnoyers 2017-08-30 18:01 ` Paul E. McKenney 0 siblings, 1 reply; 9+ messages in thread From: Mathieu Desnoyers @ 2017-08-30 17:53 UTC (permalink / raw) To: Paul E. McKenney Cc: Peter Zijlstra, linux-kernel, Boqun Feng, Andrew Hunter, maged michael, gromer, Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Dave Watson, Andy Lutomirski, Will Deacon, Hans Boehm ----- On Aug 28, 2017, at 1:49 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com wrote: > On Sun, Aug 27, 2017 at 09:39:54PM +0000, Mathieu Desnoyers wrote: >> ----- On Aug 27, 2017, at 2:00 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com >> wrote: >> >> > On Sun, Aug 27, 2017 at 08:52:58PM +0000, Mathieu Desnoyers wrote: >> >> ----- On Aug 27, 2017, at 1:35 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com >> >> wrote: >> >> >> >> > On Sun, Aug 27, 2017 at 12:54:04PM -0700, Mathieu Desnoyers wrote: >> >> >> Add a new MEMBARRIER_CMD_REGISTER_SYNC_CORE command to the membarrier >> >> >> system call. It allows processes to register their intent to have their >> >> >> threads issue core serializing barriers in addition to memory barriers >> >> >> whenever a membarrier command is performed. >> >> >> >> >> >> It is relevant for reclaim of JIT code, which requires to issue core >> >> >> serializing barriers on all threads running on behalf of a process >> >> >> after ensuring the old code is not visible anymore, before re-using >> >> >> memory for new code. >> >> >> >> >> >> When a processes returns from a MEMBARRIER_CMD_REGISTER_SYNC_CORE >> >> >> command, it is guaranteed that all following MEMBARRIER_CMD_SHARED and >> >> >> MEMBARRIER_CMD_PRIVATE_EXPEDITED issue core serializing barriers, in >> >> >> addition to the memory barriers, on all of its running threads. >> >> > >> >> > I have queued both of these for testing and review: >> >> > >> >> > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git >> >> > f54cf5123eb9 ("membarrier: Speed up single-threaded private cmd") >> >> > 0d6eb99818da ("membarrier: Provide register sync core cmd") >> >> > >> >> > Could the people who asked for this please test it? The second commit >> >> > is non-obvious enough to need some careful review and intensive testing >> >> > before I can in good conscious pass it upstream. >> >> >> >> Thanks Paul! >> >> >> >> Can you pick up my v2 instead ? I added benchmarks in the changelog and >> >> documented the new command in the membarrier public header. No changes >> >> otherwise. >> > >> > It looks like 0day Test Robot also wants some additional header files. >> > Could you please send me a v3 with its complaints addressed? >> >> I sent a separate patch implementing sync_core() on xtensa. It should >> be applied before this patch. > > And it looks like we also need a sync_core() on every CPU family other > than x86, which already has one. Do we want to make this capability > depend on an ARCH_ kconfig option, or should we just bit the bullet and > implement sync_core() everywhere? > > (I had to drop this commit due to downstream complaints. Left to myself, > I would take the ARCH_ approach, then add it back in, but please let me > know how you would like to proceed.) I plan to go with the ARCH_ approach indeed. Thanks, Mathieu > > Thanx, Paul > >> Hopefully we can get some testing from people with xtensa testing >> environments. >> >> Thanks, >> >> Mathieu >> >> > >> > Thanx, Paul >> > >> >> Thanks, >> >> >> >> Mathieu >> >> >> >> > >> >> > Thanx, Paul >> >> > >> >> >> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> >> >> >> CC: Peter Zijlstra <peterz@infradead.org> >> >> >> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> >> >> >> CC: Boqun Feng <boqun.feng@gmail.com> >> >> >> CC: Andrew Hunter <ahh@google.com> >> >> >> CC: Maged Michael <maged.michael@gmail.com> >> >> >> CC: gromer@google.com >> >> >> CC: Avi Kivity <avi@scylladb.com> >> >> >> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> >> >> >> CC: Paul Mackerras <paulus@samba.org> >> >> >> CC: Michael Ellerman <mpe@ellerman.id.au> >> >> >> CC: Dave Watson <davejwatson@fb.com> >> >> >> CC: Andy Lutomirski <luto@kernel.org> >> >> >> CC: Will Deacon <will.deacon@arm.com> >> >> >> CC: Hans Boehm <hboehm@google.com> >> >> >> --- >> >> >> fs/exec.c | 1 + >> >> >> include/linux/sched.h | 52 +++++++++++++++++++++++++++++++++++++++++ >> >> >> include/uapi/linux/membarrier.h | 1 + >> >> >> kernel/fork.c | 2 ++ >> >> >> kernel/sched/core.c | 3 +++ >> >> >> kernel/sched/membarrier.c | 37 ++++++++++++++++++++++++++--- >> >> >> 6 files changed, 93 insertions(+), 3 deletions(-) >> >> >> >> >> >> diff --git a/fs/exec.c b/fs/exec.c >> >> >> index 62175cbcc801..a4ab3253bac7 100644 >> >> >> --- a/fs/exec.c >> >> >> +++ b/fs/exec.c >> >> >> @@ -1794,6 +1794,7 @@ static int do_execveat_common(int fd, struct filename >> >> >> *filename, >> >> >> /* execve succeeded */ >> >> >> current->fs->in_exec = 0; >> >> >> current->in_execve = 0; >> >> >> + membarrier_execve(current); >> >> >> acct_update_integrals(current); >> >> >> task_numa_free(current); >> >> >> free_bprm(bprm); >> >> >> diff --git a/include/linux/sched.h b/include/linux/sched.h >> >> >> index 8337e2db0bb2..b1ecdc4e8b84 100644 >> >> >> --- a/include/linux/sched.h >> >> >> +++ b/include/linux/sched.h >> >> >> @@ -1086,6 +1086,9 @@ struct task_struct { >> >> >> /* Used by LSM modules for access restriction: */ >> >> >> void *security; >> >> >> #endif >> >> >> +#ifdef CONFIG_MEMBARRIER >> >> >> + int membarrier_sync_core; >> >> >> +#endif >> >> >> >> >> >> /* >> >> >> * New fields for task_struct should be added above here, so that >> >> >> @@ -1623,4 +1626,53 @@ extern long sched_getaffinity(pid_t pid, struct cpumask >> >> >> *mask); >> >> >> #define TASK_SIZE_OF(tsk) TASK_SIZE >> >> >> #endif >> >> >> >> >> >> +#ifdef CONFIG_MEMBARRIER >> >> >> +static inline void membarrier_fork(struct task_struct *t, >> >> >> + unsigned long clone_flags) >> >> >> +{ >> >> >> + /* >> >> >> + * Coherence of membarrier_sync_core against thread fork is >> >> >> + * protected by siglock. membarrier_fork is called with siglock >> >> >> + * held. >> >> >> + */ >> >> >> + t->membarrier_sync_core = current->membarrier_sync_core; >> >> >> +} >> >> >> +static inline void membarrier_execve(struct task_struct *t) >> >> >> +{ >> >> >> + t->membarrier_sync_core = 0; >> >> >> +} >> >> >> +static inline void membarrier_sched_out(struct task_struct *t) >> >> >> +{ >> >> >> + /* >> >> >> + * Core serialization is performed before the memory barrier >> >> >> + * preceding the store to rq->curr. >> >> >> + */ >> >> >> + if (unlikely(READ_ONCE(t->membarrier_sync_core))) >> >> >> + sync_core(); >> >> >> +} >> >> >> +static inline void membarrier_sched_in(struct task_struct *t) >> >> >> +{ >> >> >> + /* >> >> >> + * Core serialization is performed after the memory barrier >> >> >> + * following the store to rq->curr. >> >> >> + */ >> >> >> + if (unlikely(READ_ONCE(t->membarrier_sync_core))) >> >> >> + sync_core(); >> >> >> +} >> >> >> +#else >> >> >> +static inline void membarrier_fork(struct task_struct *t, >> >> >> + unsigned long clone_flags) >> >> >> +{ >> >> >> +} >> >> >> +static inline void membarrier_execve(struct task_struct *t) >> >> >> +{ >> >> >> +} >> >> >> +static inline void membarrier_sched_out(struct task_struct *t) >> >> >> +{ >> >> >> +} >> >> >> +static inline void membarrier_sched_in(struct task_struct *t) >> >> >> +{ >> >> >> +} >> >> >> +#endif >> >> >> + >> >> >> #endif >> >> >> diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h >> >> >> index 6d47b3249d8a..bc26fcf8fb7b 100644 >> >> >> --- a/include/uapi/linux/membarrier.h >> >> >> +++ b/include/uapi/linux/membarrier.h >> >> >> @@ -67,6 +67,7 @@ enum membarrier_cmd { >> >> >> /* reserved for MEMBARRIER_CMD_SHARED_EXPEDITED (1 << 1) */ >> >> >> /* reserved for MEMBARRIER_CMD_PRIVATE (1 << 2) */ >> >> >> MEMBARRIER_CMD_PRIVATE_EXPEDITED = (1 << 3), >> >> >> + MEMBARRIER_CMD_REGISTER_SYNC_CORE = (1 << 4), >> >> >> }; >> >> >> >> >> >> #endif /* _UAPI_LINUX_MEMBARRIER_H */ >> >> >> diff --git a/kernel/fork.c b/kernel/fork.c >> >> >> index 17921b0390b4..713b3c932671 100644 >> >> >> --- a/kernel/fork.c >> >> >> +++ b/kernel/fork.c >> >> >> @@ -1840,6 +1840,8 @@ static __latent_entropy struct task_struct *copy_process( >> >> >> */ >> >> >> copy_seccomp(p); >> >> >> >> >> >> + membarrier_fork(p, clone_flags); >> >> >> + >> >> >> /* >> >> >> * Process group and session signals need to be delivered to just the >> >> >> * parent before the fork or both the parent and the child after the >> >> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> >> >> index 0e36d9960d91..3ca27d066950 100644 >> >> >> --- a/kernel/sched/core.c >> >> >> +++ b/kernel/sched/core.c >> >> >> @@ -3292,6 +3292,8 @@ static void __sched notrace __schedule(bool preempt) >> >> >> local_irq_disable(); >> >> >> rcu_note_context_switch(preempt); >> >> >> >> >> >> + membarrier_sched_out(prev); >> >> >> + >> >> >> /* >> >> >> * Make sure that signal_pending_state()->signal_pending() below >> >> >> * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE) >> >> >> @@ -3364,6 +3366,7 @@ static void __sched notrace __schedule(bool preempt) >> >> >> >> >> >> /* Also unlocks the rq: */ >> >> >> rq = context_switch(rq, prev, next, &rf); >> >> >> + membarrier_sched_in(next); >> >> >> } else { >> >> >> rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP); >> >> >> rq_unlock_irq(rq, &rf); >> >> >> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c >> >> >> index 7eec6914d2d2..f128caf64b49 100644 >> >> >> --- a/kernel/sched/membarrier.c >> >> >> +++ b/kernel/sched/membarrier.c >> >> >> @@ -25,12 +25,16 @@ >> >> >> * Bitmask made from a "or" of all commands within enum membarrier_cmd, >> >> >> * except MEMBARRIER_CMD_QUERY. >> >> >> */ >> >> >> -#define MEMBARRIER_CMD_BITMASK \ >> >> >> - (MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED) >> >> >> +#define MEMBARRIER_CMD_BITMASK \ >> >> >> + (MEMBARRIER_CMD_SHARED \ >> >> >> + | MEMBARRIER_CMD_PRIVATE_EXPEDITED \ >> >> >> + | MEMBARRIER_CMD_REGISTER_SYNC_CORE) >> >> >> >> >> >> static void ipi_mb(void *info) >> >> >> { >> >> >> - smp_mb(); /* IPIs should be serializing but paranoid. */ >> >> >> + /* IPIs should be serializing but paranoid. */ >> >> >> + smp_mb(); >> >> >> + sync_core(); >> >> >> } >> >> >> >> >> >> static void membarrier_private_expedited(void) >> >> >> @@ -96,6 +100,30 @@ static void membarrier_private_expedited(void) >> >> >> smp_mb(); /* exit from system call is not a mb */ >> >> >> } >> >> >> >> >> >> +static void membarrier_register_sync_core(void) >> >> >> +{ >> >> >> + struct task_struct *p = current, *t; >> >> >> + >> >> >> + if (get_nr_threads(p) == 1) { >> >> >> + p->membarrier_sync_core = 1; >> >> >> + return; >> >> >> + } >> >> >> + >> >> >> + /* >> >> >> + * Coherence of membarrier_sync_core against thread fork is >> >> >> + * protected by siglock. >> >> >> + */ >> >> >> + spin_lock(&p->sighand->siglock); >> >> >> + for_each_thread(p, t) >> >> >> + WRITE_ONCE(t->membarrier_sync_core, 1); >> >> >> + spin_unlock(&p->sighand->siglock); >> >> >> + /* >> >> >> + * Ensure all future scheduler execution will observe the new >> >> >> + * membarrier_sync_core state for this process. >> >> >> + */ >> >> >> + synchronize_sched(); >> >> >> +} >> >> >> + >> >> >> /** >> >> >> * sys_membarrier - issue memory barriers on a set of threads >> >> >> * @cmd: Takes command values defined in enum membarrier_cmd. >> >> >> @@ -146,6 +174,9 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags) >> >> >> case MEMBARRIER_CMD_PRIVATE_EXPEDITED: >> >> >> membarrier_private_expedited(); >> >> >> return 0; >> >> >> + case MEMBARRIER_CMD_REGISTER_SYNC_CORE: >> >> >> + membarrier_register_sync_core(); >> >> >> + return 0; >> >> >> default: >> >> >> return -EINVAL; >> >> >> } >> >> >> -- >> >> >> 2.11.0 >> >> >> >> -- >> >> Mathieu Desnoyers >> >> EfficiOS Inc. >> >> http://www.efficios.com >> >> -- >> Mathieu Desnoyers >> EfficiOS Inc. >> http://www.efficios.com -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH 2/2] membarrier: provide register sync core cmd 2017-08-30 17:53 ` Mathieu Desnoyers @ 2017-08-30 18:01 ` Paul E. McKenney 0 siblings, 0 replies; 9+ messages in thread From: Paul E. McKenney @ 2017-08-30 18:01 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Peter Zijlstra, linux-kernel, Boqun Feng, Andrew Hunter, maged michael, gromer, Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Dave Watson, Andy Lutomirski, Will Deacon, Hans Boehm On Wed, Aug 30, 2017 at 05:53:53PM +0000, Mathieu Desnoyers wrote: > ----- On Aug 28, 2017, at 1:49 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com wrote: > > > On Sun, Aug 27, 2017 at 09:39:54PM +0000, Mathieu Desnoyers wrote: > >> ----- On Aug 27, 2017, at 2:00 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com > >> wrote: > >> > >> > On Sun, Aug 27, 2017 at 08:52:58PM +0000, Mathieu Desnoyers wrote: > >> >> ----- On Aug 27, 2017, at 1:35 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com > >> >> wrote: > >> >> > >> >> > On Sun, Aug 27, 2017 at 12:54:04PM -0700, Mathieu Desnoyers wrote: > >> >> >> Add a new MEMBARRIER_CMD_REGISTER_SYNC_CORE command to the membarrier > >> >> >> system call. It allows processes to register their intent to have their > >> >> >> threads issue core serializing barriers in addition to memory barriers > >> >> >> whenever a membarrier command is performed. > >> >> >> > >> >> >> It is relevant for reclaim of JIT code, which requires to issue core > >> >> >> serializing barriers on all threads running on behalf of a process > >> >> >> after ensuring the old code is not visible anymore, before re-using > >> >> >> memory for new code. > >> >> >> > >> >> >> When a processes returns from a MEMBARRIER_CMD_REGISTER_SYNC_CORE > >> >> >> command, it is guaranteed that all following MEMBARRIER_CMD_SHARED and > >> >> >> MEMBARRIER_CMD_PRIVATE_EXPEDITED issue core serializing barriers, in > >> >> >> addition to the memory barriers, on all of its running threads. > >> >> > > >> >> > I have queued both of these for testing and review: > >> >> > > >> >> > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git > >> >> > f54cf5123eb9 ("membarrier: Speed up single-threaded private cmd") > >> >> > 0d6eb99818da ("membarrier: Provide register sync core cmd") > >> >> > > >> >> > Could the people who asked for this please test it? The second commit > >> >> > is non-obvious enough to need some careful review and intensive testing > >> >> > before I can in good conscious pass it upstream. > >> >> > >> >> Thanks Paul! > >> >> > >> >> Can you pick up my v2 instead ? I added benchmarks in the changelog and > >> >> documented the new command in the membarrier public header. No changes > >> >> otherwise. > >> > > >> > It looks like 0day Test Robot also wants some additional header files. > >> > Could you please send me a v3 with its complaints addressed? > >> > >> I sent a separate patch implementing sync_core() on xtensa. It should > >> be applied before this patch. > > > > And it looks like we also need a sync_core() on every CPU family other > > than x86, which already has one. Do we want to make this capability > > depend on an ARCH_ kconfig option, or should we just bit the bullet and > > implement sync_core() everywhere? > > > > (I had to drop this commit due to downstream complaints. Left to myself, > > I would take the ARCH_ approach, then add it back in, but please let me > > know how you would like to proceed.) > > I plan to go with the ARCH_ approach indeed. Sounds good, looking forward to seeing it. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-08-30 18:01 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-08-27 19:54 [RFC PATCH 1/2] membarrier: speed up single-threaded private cmd Mathieu Desnoyers 2017-08-27 19:54 ` [RFC PATCH 2/2] membarrier: provide register sync core cmd Mathieu Desnoyers 2017-08-27 20:35 ` Paul E. McKenney 2017-08-27 20:52 ` Mathieu Desnoyers 2017-08-27 21:00 ` Paul E. McKenney 2017-08-27 21:39 ` Mathieu Desnoyers 2017-08-28 17:49 ` Paul E. McKenney 2017-08-30 17:53 ` Mathieu Desnoyers 2017-08-30 18:01 ` Paul E. McKenney
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.