From: Rik van Riel <riel@surriel.com>
To: linux-kernel@vger.kernel.org
Cc: x86@kernel.org, luto@kernel.org, efault@gmx.de,
	kernel-team@fb.com, mingo@kernel.org, dave.hansen@intel.com,
	Rik van Riel <riel@surriel.com>
Subject: [PATCH 4/7] x86,tlb: make lazy TLB mode lazier
Date: Mon, 16 Jul 2018 15:03:34 -0400
Message-ID: <20180716190337.26133-5-riel@surriel.com>
In-Reply-To: <20180716190337.26133-1-riel@surriel.com>

Lazy TLB mode can result in an idle CPU being woken up by a TLB flush,
when all it really needs to do is reload %CR3 at the next context switch,
assuming no page table pages got freed.

Memory ordering is used to prevent race conditions between switch_mm_irqs_off,
which checks whether .tlb_gen changed, and the TLB invalidation code, which
increments .tlb_gen whenever page table entries get invalidated.

The atomic increment in inc_mm_tlb_gen is its own barrier; the context
switch code adds an explicit barrier between reading tlbstate.is_lazy and
next->context.tlb_gen.
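
To make that pairing concrete, the two sides look roughly like this (a
simplified sketch rather than the verbatim kernel code; inc_mm_tlb_gen()
is the existing helper that bumps the generation count):

	/*
	 * Invalidation side: the atomic increment is a full barrier,
	 * so the new tlb_gen is visible before the shootdown code
	 * reads cpu_tlbstate.is_lazy to decide which CPUs to IPI.
	 */
	static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
	{
		return atomic64_inc_return(&mm->context.tlb_gen);
	}

	/*
	 * Context switch side (prev == next path in switch_mm_irqs_off):
	 * the smp_mb() orders the earlier read of cpu_tlbstate.is_lazy
	 * against the read of next->context.tlb_gen below.
	 */
	smp_mb();
	next_tlb_gen = atomic64_read(&next->context.tlb_gen);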

Unlike the 2016 version of this patch, CPUs with cpu_tlbstate.is_lazy set
are not removed from the mm_cpumask(mm), since that would prevent the TLB
flush IPIs at page table free time from being sent to all the CPUs
that need them.
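
Instead, the decision of whom to interrupt moves from mm_cpumask
membership to a per-CPU check. Stripped down, the shootdown side of
that choice looks like the following (the full version, including the
fallback when the temporary cpumask allocation fails, is in the
native_flush_tlb_others() hunk below):

	/*
	 * Skip IPIs to CPUs that are in lazy TLB mode; they will
	 * catch up via tlb_gen at their next context switch. They
	 * stay in mm_cpumask(mm), so flushes at page table free
	 * time can still reach them.
	 */
	for_each_cpu(cpu, lazymask) {
		if (per_cpu(cpu_tlbstate.is_lazy, cpu))
			cpumask_clear_cpu(cpu, lazymask);
	}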

This patch reduces total CPU use in the system by about 1-2% for a
memcache workload on two-socket systems, and by about 1% for a heavily
multi-process netperf run between two systems.

Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Tested-by: Song Liu <songliubraving@fb.com>
---
 arch/x86/mm/tlb.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 59 insertions(+), 9 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 4b73fe835c95..26542cc17043 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -7,6 +7,7 @@
 #include <linux/export.h>
 #include <linux/cpu.h>
 #include <linux/debugfs.h>
+#include <linux/gfp.h>
 
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
@@ -185,6 +186,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 {
 	struct mm_struct *real_prev = this_cpu_read(cpu_tlbstate.loaded_mm);
 	u16 prev_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
+	bool was_lazy = this_cpu_read(cpu_tlbstate.is_lazy);
 	unsigned cpu = smp_processor_id();
 	u64 next_tlb_gen;
 	bool need_flush;
@@ -242,17 +244,40 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 			   next->context.ctx_id);
 
 		/*
-		 * We don't currently support having a real mm loaded without
-		 * our cpu set in mm_cpumask().  We have all the bookkeeping
-		 * in place to figure out whether we would need to flush
-		 * if our cpu were cleared in mm_cpumask(), but we don't
-		 * currently use it.
+		 * Even in lazy TLB mode, the CPU should stay set in the
+		 * mm_cpumask. The TLB shootdown code can figure out from
+		 * cpu_tlbstate.is_lazy whether or not to send an IPI.
 		 */
 		if (WARN_ON_ONCE(real_prev != &init_mm &&
 				 !cpumask_test_cpu(cpu, mm_cpumask(next))))
 			cpumask_set_cpu(cpu, mm_cpumask(next));
 
-		return;
+		/*
+		 * If the CPU is not in lazy TLB mode, we are just switching
+		 * from one thread in a process to another thread in the same
+		 * process. No TLB flush required.
+		 */
+		if (!was_lazy)
+			return;
+
+		/*
+		 * Read the tlb_gen to check whether a flush is needed.
+		 * If the TLB is up to date, just use it.
+		 * The barrier synchronizes with the tlb_gen increment in
+		 * the TLB shootdown code.
+		 */
+		smp_mb();
+		next_tlb_gen = atomic64_read(&next->context.tlb_gen);
+		if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) ==
+				next_tlb_gen)
+			return;
+
+		/*
+		 * TLB contents went out of date while we were in lazy
+		 * mode. Fall through to the TLB switching code below.
+		 */
+		new_asid = prev_asid;
+		need_flush = true;
 	} else {
 		u64 last_ctx_id = this_cpu_read(cpu_tlbstate.last_ctx_id);
 
@@ -454,6 +479,9 @@ static void flush_tlb_func_common(const struct flush_tlb_info *f,
 		 * paging-structure cache to avoid speculatively reading
 		 * garbage into our TLB.  Since switching to init_mm is barely
 		 * slower than a minimal flush, just switch to init_mm.
+		 *
+		 * This should be rare, with native_flush_tlb_others skipping
+		 * IPIs to lazy TLB mode CPUs.
 		 */
 		switch_mm_irqs_off(NULL, &init_mm, NULL);
 		return;
@@ -560,6 +588,9 @@ static void flush_tlb_func_remote(void *info)
 void native_flush_tlb_others(const struct cpumask *cpumask,
 			     const struct flush_tlb_info *info)
 {
+	cpumask_var_t lazymask;
+	unsigned int cpu;
+
 	count_vm_tlb_event(NR_TLB_REMOTE_FLUSH);
 	if (info->end == TLB_FLUSH_ALL)
 		trace_tlb_flush(TLB_REMOTE_SEND_IPI, TLB_FLUSH_ALL);
@@ -583,8 +614,6 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
 		 * that UV should be updated so that smp_call_function_many(),
 		 * etc, are optimal on UV.
 		 */
-		unsigned int cpu;
-
 		cpu = smp_processor_id();
 		cpumask = uv_flush_tlb_others(cpumask, info);
 		if (cpumask)
@@ -592,8 +621,29 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
 					       (void *)info, 1);
 		return;
 	}
-	smp_call_function_many(cpumask, flush_tlb_func_remote,
+
+	/*
+	 * A temporary cpumask is used in order to skip sending IPIs
+	 * to CPUs in lazy TLB state, while keeping them in mm_cpumask(mm).
+	 * If the allocation fails, simply IPI every CPU in mm_cpumask.
+	 */
+	if (!alloc_cpumask_var(&lazymask, GFP_ATOMIC)) {
+		smp_call_function_many(cpumask, flush_tlb_func_remote,
+			       (void *)info, 1);
+		return;
+	}
+
+	cpumask_copy(lazymask, cpumask);
+
+	for_each_cpu(cpu, lazymask) {
+		if (per_cpu(cpu_tlbstate.is_lazy, cpu))
+			cpumask_clear_cpu(cpu, lazymask);
+	}
+
+	smp_call_function_many(lazymask, flush_tlb_func_remote,
 			       (void *)info, 1);
+
+	free_cpumask_var(lazymask);
 }
 
 /*
-- 
2.14.4

