From: Peter Zijlstra <peterz@infradead.org>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Arnd Bergmann <arnd@arndb.de>, linux-arch@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org,
	Anton Blanchard <anton@ozlabs.org>
Subject: Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
Date: Wed, 2 Dec 2020 15:19:57 +0100
Message-ID: <20201202141957.GJ3021@hirez.programming.kicks-ass.net>
In-Reply-To: <20201128160141.1003903-7-npiggin@gmail.com>

On Sun, Nov 29, 2020 at 02:01:39AM +1000, Nicholas Piggin wrote:
> + *  - A delayed freeing and RCU-like quiescing sequence based on
> + *    mm switching to avoid IPIs completely.

That one's interesting too. So basically you want to count switch_mm()
invocations on each CPU. Then, periodically snapshot the counter on each
CPU, and when they've all changed, increment a global counter.

Then, you snapshot the global counter and wait for it to increment
(twice, I think; the first increment might already be in progress).

The only question here is what should drive this machinery... the tick,
probably.

This shouldn't be too hard to do, I think.

Something a little like so, perhaps?

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 41404afb7f4c..27b64a60a468 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4525,6 +4525,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	 * finish_task_switch()'s mmdrop().
 	 */
 	switch_mm_irqs_off(prev->active_mm, next->mm, next);
+	rq->nr_mm_switches++;
 
 	if (!prev->mm) {                        // from kernel
 		/* will mmdrop() in finish_task_switch(). */
@@ -4739,6 +4740,80 @@ unsigned long long task_sched_runtime(struct task_struct *p)
 	return ns;
 }
 
+static DEFINE_PER_CPU(unsigned long[2], mm_switches);
+
+static struct {
+	unsigned long generation;
+	atomic_t complete;
+	struct wait_queue_head wait;
+} mm_foo = {
+	.generation = 0,
+	.complete = ATOMIC_INIT(-1), // XXX bootstrap, hotplug
+	.wait = __WAIT_QUEUE_HEAD_INITIALIZER(mm_foo.wait),
+};
+
+static bool mm_gen_tick(int cpu, struct rq *rq)
+{
+	unsigned long prev, curr, switches = rq->nr_mm_switches;
+	int idx = READ_ONCE(mm_foo.generation) & 1;
+
+	/* DATA-DEP on mm_foo.generation */
+
+	prev = __this_cpu_read(mm_switches[idx ^ 1]);
+	curr = __this_cpu_read(mm_switches[idx]);
+
+	/* we haven't switched since the last generation */
+	if (prev == switches)
+		return false;
+
+	__this_cpu_write(mm_switches[idx], switches);
+
+	/*
+	 * If @curr is less than @prev, this is the first update of
+	 * this generation; per the above, switches has also increased since,
+	 * so mark our CPU complete.
+	 */
+	if ((long)(curr - prev) < 0 && atomic_dec_and_test(&mm_foo.complete)) {
+		/*
+		 * All CPUs are complete, IOW they all switched at least once
+		 * since the last generation. Reset the completion counter and
+		 * increment the generation.
+		 */
+		atomic_set(&mm_foo.complete, num_online_cpus());
+		/*
+		 * Matches the address dependency above:
+		 *
+		 *   idx = gen & 1		complete = nr_cpus
+		 *   <DATA-DEP>			<WMB>
+		 *   curr = sw[idx]		generation++;
+		 *   prev = sw[idx^1]
+		 *   if (curr < prev)
+		 *	complete--
+		 *
+		 * If we don't observe the new generation, we'll not decrement.
+		 * If we do see the new generation, we must also see the new
+		 * completion count.
+		 */
+		smp_wmb();
+		mm_foo.generation++;
+		return true;
+	}
+
+	return false;
+}
+
+static void mm_gen_wake(void)
+{
+	wake_up_all(&mm_foo.wait);
+}
+
+static void mm_gen_wait(void)
+{
+	unsigned long gen = READ_ONCE(mm_foo.generation);
+	wait_event(mm_foo.wait, READ_ONCE(mm_foo.generation) - gen > 1);
+}
+
 /*
  * This function gets called by the timer code, with HZ frequency.
  * We call it with interrupts disabled.
@@ -4750,6 +4825,7 @@ void scheduler_tick(void)
 	struct task_struct *curr = rq->curr;
 	struct rq_flags rf;
 	unsigned long thermal_pressure;
+	bool wake_mm_gen;
 
 	arch_scale_freq_tick();
 	sched_clock_tick();
@@ -4763,8 +4839,13 @@ void scheduler_tick(void)
 	calc_global_load_tick(rq);
 	psi_task_tick(rq);
 
+	wake_mm_gen = mm_gen_tick(cpu, rq);
+
 	rq_unlock(rq, &rf);
 
+	if (wake_mm_gen)
+		mm_gen_wake();
+
 	perf_event_task_tick();
 
 #ifdef CONFIG_SMP
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index bf9d8da7d35e..62fb685db8d0 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -927,6 +927,7 @@ struct rq {
 	unsigned int		ttwu_pending;
 #endif
 	u64			nr_switches;
+	u64			nr_mm_switches;
 
 #ifdef CONFIG_UCLAMP_TASK
 	/* Utilization clamp values based on CPU's RUNNABLE tasks */