linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/2] Speed MTRR programming up when we can
@ 2019-06-28  2:35 Ricardo Neri
  2019-06-28  2:35 ` [PATCH v2 1/2] x86/cpu/intel: Clear cache self-snoop capability in CPUs with known errata Ricardo Neri
  2019-06-28  2:35 ` [PATCH v2 2/2] x86, mtrr: generic: Skip cache flushes on CPUs with cache self-snooping Ricardo Neri
  0 siblings, 2 replies; 5+ messages in thread
From: Ricardo Neri @ 2019-06-28  2:35 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Alan Cox, Tony Luck, H. Peter Anvin, Andy Shevchenko, Andi Kleen,
	Hans de Goede, Greg Kroah-Hartman, Jordan Borgner,
	Ravi V. Shankar, Mohammad Etemadi, Ricardo Neri, linux-kernel,
	x86, Ricardo Neri

This is the second iteration of this patchset. The first iteration can be
viewed here [1]. 

Programming MTRR registers in multi-processor systems is a rather lengthy
process. Furthermore, all processors must program these registers in
lockstep and with interrupts disabled; the process also involves flushing
caches and TLBs twice. As a result, the process may take a considerable
amount of time.

On some platforms, this can lead to a large skew of the refined-jiffies
clock source. Early in boot, if no other clock is available (e.g.,
booting with hpet=disabled), the refined-jiffies clock source is used to
monitor the TSC clock source. If the skew of refined-jiffies is too large,
Linux wrongly assumes that the TSC is unstable:

     clocksource: timekeeping watchdog on CPU1: Marking clocksource
          'tsc-early' as unstable because the skew is too large:
     clocksource: 'refined-jiffies' wd_now: fffedc10 wd_last:
          fffedb90 mask: ffffffff
     clocksource: 'tsc-early' cs_now: 5eccfddebc cs_last: 5e7e3303d4
          mask: ffffffffffffffff
     tsc: Marking TSC unstable due to clocksource watchdog

As per my measurements, around 98% of the time needed by the procedure to
program MTRRs in multi-processor systems is spent flushing caches with
wbinvd(). As per Section 11.11.8 of the Intel 64 and IA-32 Architectures
Software Developer's Manual, it is not necessary to flush caches if the
CPU supports cache self-snooping. Thus, skipping the cache flushes can
reduce the time needed to complete the programming of the MTRR registers
by several tens of milliseconds.

However, there exist CPU models with errata that affect their self-
snooping capabilities. Such errata may cause unpredictable behavior,
machine check errors, or hangs. For instance:

     "Where two different logical processors have the same code page
      mapped with two different memory types Specifically if one code
      page is mapped by one logical processor as write back and by
      another as uncacheable and certain instruction timing conditions
      occur the system may experience unpredictable behaviour." [2].

Similar errata are reported in other processors as well [3], [4], [5],
[6], and [7].

Thus, in order to confidently leverage self-snooping for the MTRR
programming algorithm, we must first clear this feature on models with
known errata.

By measuring the execution time of mtrr_aps_init() (from which MTRRs
in all CPUs are programmed in lock-step at boot), I find savings in the
time required to program MTRRs as follows:

Platform                      time-with-wbinvd(ms) time-no-wbinvd(ms)
104-core (208 LP) Skylake            1437                 28
2-core (4 LP) Haswell                 114                  2

LP = Logical Processor

Thanks and BR,
Ricardo

Changes since v1:

 * Relocated comment on the utility of cache self-snooping from
   check_memory_type_self_snoop_errata() to the prepare_set() function
   of the generic MTRR programming ops (Thomas Gleixner).
 * In early_init_intel(), moved check_memory_type_self_snoop_errata()
   next to check_mpx_erratum() for improved readability.

[1]. https://lkml.org/lkml/2019/6/27/828
[2]. Erratum BF52, 
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-3600-specification-update.pdf
[3]. Erratum BK47, 
https://www.mouser.com/pdfdocs/2ndgencorefamilymobilespecificationupdate.pdf
[4]. Erratum AAO54, 
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-c5500-c3500-spec-update.pdf
[5]. Errata AZ39, AZ42, 
https://www.intel.com/content/dam/support/us/en/documents/processors/mobile/celeron/sb/320121.pdf
[6]. Errata AQ51, AQ102, AQ104, 
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/pentium-dual-core-desktop-e2000-specification-update.pdf
[7]. Errata AN107, AN109, 
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/pentium-dual-core-specification-update.pdf

Ricardo Neri (2):
  x86/cpu/intel: Clear cache self-snoop capability in CPUs with known
    errata
  x86, mtrr: generic: Skip cache flushes on CPUs with cache
    self-snooping

 arch/x86/kernel/cpu/intel.c        | 27 +++++++++++++++++++++++++++
 arch/x86/kernel/cpu/mtrr/generic.c | 15 +++++++++++++--
 2 files changed, 40 insertions(+), 2 deletions(-)

-- 
2.17.1



* [PATCH v2 1/2] x86/cpu/intel: Clear cache self-snoop capability in CPUs with known errata
  2019-06-28  2:35 [PATCH v2 0/2] Speed MTRR programming up when we can Ricardo Neri
@ 2019-06-28  2:35 ` Ricardo Neri
  2019-06-28  5:25   ` [tip:x86/cpu] " tip-bot for Ricardo Neri
  2019-06-28  2:35 ` [PATCH v2 2/2] x86, mtrr: generic: Skip cache flushes on CPUs with cache self-snooping Ricardo Neri
  1 sibling, 1 reply; 5+ messages in thread
From: Ricardo Neri @ 2019-06-28  2:35 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Alan Cox, Tony Luck, H. Peter Anvin, Andy Shevchenko, Andi Kleen,
	Hans de Goede, Greg Kroah-Hartman, Jordan Borgner,
	Ravi V. Shankar, Mohammad Etemadi, Ricardo Neri, linux-kernel,
	x86, Ricardo Neri, Andy Shevchenko, Andi Kleen, Peter Feiner,
	Rafael J. Wysocki

Processors with self-snooping capability can handle conflicting memory
types across CPUs by snooping their own caches. However, there exist
CPU models in which having conflicting memory types still leads to
unpredictable behavior, machine check errors, or hangs. Clear this feature
on those models to prevent its use.

Cc: Tony Luck <tony.luck@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jordan Borgner <mail@jordan-borgner.de>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: Alan Cox <alan.cox@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/kernel/cpu/intel.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index f17c1a714779..62e366ec0812 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -66,6 +66,32 @@ void check_mpx_erratum(struct cpuinfo_x86 *c)
 	}
 }
 
+/*
+ * Processors with self-snooping capability can handle conflicting memory
+ * types across CPUs by snooping their own caches. However, there exist
+ * CPU models in which having conflicting memory types still leads to
+ * unpredictable behavior, machine check errors, or hangs. Clear this feature
+ * on those models to prevent its use.
+ */
+static void check_memory_type_self_snoop_errata(struct cpuinfo_x86 *c)
+{
+	switch (c->x86_model) {
+	case INTEL_FAM6_CORE_YONAH:
+	case INTEL_FAM6_CORE2_MEROM:
+	case INTEL_FAM6_CORE2_MEROM_L:
+	case INTEL_FAM6_CORE2_PENRYN:
+	case INTEL_FAM6_CORE2_DUNNINGTON:
+	case INTEL_FAM6_NEHALEM:
+	case INTEL_FAM6_NEHALEM_G:
+	case INTEL_FAM6_NEHALEM_EP:
+	case INTEL_FAM6_NEHALEM_EX:
+	case INTEL_FAM6_WESTMERE:
+	case INTEL_FAM6_WESTMERE_EP:
+	case INTEL_FAM6_SANDYBRIDGE:
+		setup_clear_cpu_cap(X86_FEATURE_SELFSNOOP);
+	}
+}
+
 static bool ring3mwait_disabled __read_mostly;
 
 static int __init ring3mwait_disable(char *__unused)
@@ -304,6 +330,7 @@ static void early_init_intel(struct cpuinfo_x86 *c)
 	}
 
 	check_mpx_erratum(c);
+	check_memory_type_self_snoop_errata(c);
 
 	/*
 	 * Get the number of SMT siblings early from the extended topology
-- 
2.17.1



* [PATCH v2 2/2] x86, mtrr: generic: Skip cache flushes on CPUs with cache self-snooping
  2019-06-28  2:35 [PATCH v2 0/2] Speed MTRR programming up when we can Ricardo Neri
  2019-06-28  2:35 ` [PATCH v2 1/2] x86/cpu/intel: Clear cache self-snoop capability in CPUs with known errata Ricardo Neri
@ 2019-06-28  2:35 ` Ricardo Neri
  2019-06-28  5:25   ` [tip:x86/cpu] x86/mtrr: " tip-bot for Ricardo Neri
  1 sibling, 1 reply; 5+ messages in thread
From: Ricardo Neri @ 2019-06-28  2:35 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Alan Cox, Tony Luck, H. Peter Anvin, Andy Shevchenko, Andi Kleen,
	Hans de Goede, Greg Kroah-Hartman, Jordan Borgner,
	Ravi V. Shankar, Mohammad Etemadi, Ricardo Neri, linux-kernel,
	x86, Ricardo Neri, Andy Shevchenko, Andi Kleen, Peter Feiner,
	Rafael J. Wysocki

Programming MTRR registers in multi-processor systems is a rather lengthy
process. Furthermore, all processors must program these registers in
lockstep and with interrupts disabled; the process also involves flushing
caches and TLBs twice. As a result, the process may take a considerable
amount of time.

On some platforms, this can lead to a large skew of the refined-jiffies
clock source. Early in boot, if no other clock is available (e.g.,
booting with hpet=disabled), the refined-jiffies clock source is used to
monitor the TSC clock source. If the skew of refined-jiffies is too large,
Linux wrongly assumes that the TSC is unstable:

  clocksource: timekeeping watchdog on CPU1: Marking clocksource
               'tsc-early' as unstable because the skew is too large:
  clocksource: 'refined-jiffies' wd_now: fffedc10 wd_last:
               fffedb90 mask: ffffffff
  clocksource: 'tsc-early' cs_now: 5eccfddebc cs_last: 5e7e3303d4
               mask: ffffffffffffffff
  tsc: Marking TSC unstable due to clocksource watchdog

As per my measurements, around 98% of the time needed by the procedure to
program MTRRs in multi-processor systems is spent flushing caches with
wbinvd(). As per Section 11.11.8 of the Intel 64 and IA-32 Architectures
Software Developer's Manual, it is not necessary to flush caches if the
CPU supports cache self-snooping. Thus, skipping the cache flushes can
reduce the time needed to complete the programming of the MTRR registers
by several tens of milliseconds.

Cc: Alan Cox <alan.cox@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jordan Borgner <mail@jordan-borgner.de>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Reported-by: Mohammad Etemadi <mohammad.etemadi@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/kernel/cpu/mtrr/generic.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 9356c1c9024d..aa5c064a6a22 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -743,7 +743,15 @@ static void prepare_set(void) __acquires(set_atomicity_lock)
 	/* Enter the no-fill (CD=1, NW=0) cache mode and flush caches. */
 	cr0 = read_cr0() | X86_CR0_CD;
 	write_cr0(cr0);
-	wbinvd();
+
+	/*
+	 * Cache flushing is the most time-consuming step when programming
+	 * the MTRRs. Fortunately, as per the Intel Software Developer's
+	 * Manual, we can skip it if the processor supports cache
+	 * self-snooping.
+	 */
+	if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
+		wbinvd();
 
 	/* Save value of CR4 and clear Page Global Enable (bit 7) */
 	if (boot_cpu_has(X86_FEATURE_PGE)) {
@@ -760,7 +768,10 @@ static void prepare_set(void) __acquires(set_atomicity_lock)
 
 	/* Disable MTRRs, and set the default type to uncached */
 	mtrr_wrmsr(MSR_MTRRdefType, deftype_lo & ~0xcff, deftype_hi);
-	wbinvd();
+
+	/* Again, only flush caches if we have to. */
+	if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
+		wbinvd();
 }
 
 static void post_set(void) __releases(set_atomicity_lock)
-- 
2.17.1



* [tip:x86/cpu] x86/cpu/intel: Clear cache self-snoop capability in CPUs with known errata
  2019-06-28  2:35 ` [PATCH v2 1/2] x86/cpu/intel: Clear cache self-snoop capability in CPUs with known errata Ricardo Neri
@ 2019-06-28  5:25   ` tip-bot for Ricardo Neri
  0 siblings, 0 replies; 5+ messages in thread
From: tip-bot for Ricardo Neri @ 2019-06-28  5:25 UTC
  To: linux-tip-commits
  Cc: hdegoede, tony.luck, andriy.shevchenko, andi.kleen, alan.cox,
	ricardo.neri-calderon, gregkh, mohammad.etemadi, ak, mail, mingo,
	hpa, andriy.shevchenko, bp, ravi.v.shankar, tglx,
	rafael.j.wysocki, linux-kernel, ricardo.neri, pfeiner

Commit-ID:  1e03bff3600101bd9158d005e4313132e55bdec8
Gitweb:     https://git.kernel.org/tip/1e03bff3600101bd9158d005e4313132e55bdec8
Author:     Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
AuthorDate: Thu, 27 Jun 2019 19:35:36 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 28 Jun 2019 07:20:48 +0200

x86/cpu/intel: Clear cache self-snoop capability in CPUs with known errata

Processors with self-snooping capability can handle conflicting memory
types across CPUs by snooping their own caches. However, there exist
CPU models in which having conflicting memory types still leads to
unpredictable behavior, machine check errors, or hangs.

Clear this feature on affected CPUs to prevent its use.

Suggested-by: Alan Cox <alan.cox@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jordan Borgner <mail@jordan-borgner.de>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Mohammad Etemadi <mohammad.etemadi@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/1561689337-19390-2-git-send-email-ricardo.neri-calderon@linux.intel.com

---
 arch/x86/kernel/cpu/intel.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index f17c1a714779..8d6d92ebeb54 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -66,6 +66,32 @@ void check_mpx_erratum(struct cpuinfo_x86 *c)
 	}
 }
 
+/*
+ * Processors with self-snooping capability can handle conflicting memory
+ * types across CPUs by snooping their own caches. However, there exist
+ * CPU models in which having conflicting memory types still leads to
+ * unpredictable behavior, machine check errors, or hangs. Clear this
+ * feature to prevent its use on machines with known errata.
+ */
+static void check_memory_type_self_snoop_errata(struct cpuinfo_x86 *c)
+{
+	switch (c->x86_model) {
+	case INTEL_FAM6_CORE_YONAH:
+	case INTEL_FAM6_CORE2_MEROM:
+	case INTEL_FAM6_CORE2_MEROM_L:
+	case INTEL_FAM6_CORE2_PENRYN:
+	case INTEL_FAM6_CORE2_DUNNINGTON:
+	case INTEL_FAM6_NEHALEM:
+	case INTEL_FAM6_NEHALEM_G:
+	case INTEL_FAM6_NEHALEM_EP:
+	case INTEL_FAM6_NEHALEM_EX:
+	case INTEL_FAM6_WESTMERE:
+	case INTEL_FAM6_WESTMERE_EP:
+	case INTEL_FAM6_SANDYBRIDGE:
+		setup_clear_cpu_cap(X86_FEATURE_SELFSNOOP);
+	}
+}
+
 static bool ring3mwait_disabled __read_mostly;
 
 static int __init ring3mwait_disable(char *__unused)
@@ -304,6 +330,7 @@ static void early_init_intel(struct cpuinfo_x86 *c)
 	}
 
 	check_mpx_erratum(c);
+	check_memory_type_self_snoop_errata(c);
 
 	/*
 	 * Get the number of SMT siblings early from the extended topology


* [tip:x86/cpu] x86/mtrr: Skip cache flushes on CPUs with cache self-snooping
  2019-06-28  2:35 ` [PATCH v2 2/2] x86, mtrr: generic: Skip cache flushes on CPUs with cache self-snooping Ricardo Neri
@ 2019-06-28  5:25   ` tip-bot for Ricardo Neri
  0 siblings, 0 replies; 5+ messages in thread
From: tip-bot for Ricardo Neri @ 2019-06-28  5:25 UTC
  To: linux-tip-commits
  Cc: ravi.v.shankar, mail, alan.cox, linux-kernel,
	ricardo.neri-calderon, andriy.shevchenko, mingo, gregkh,
	andi.kleen, hdegoede, pfeiner, hpa, ak, tglx, bp,
	andriy.shevchenko, ricardo.neri, tony.luck, mohammad.etemadi,
	rafael.j.wysocki

Commit-ID:  fd329f276ecaad7a371d6f91b9bbea031d0c3440
Gitweb:     https://git.kernel.org/tip/fd329f276ecaad7a371d6f91b9bbea031d0c3440
Author:     Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
AuthorDate: Thu, 27 Jun 2019 19:35:37 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 28 Jun 2019 07:21:00 +0200

x86/mtrr: Skip cache flushes on CPUs with cache self-snooping

Programming MTRR registers in multi-processor systems is a rather lengthy
process. Furthermore, all processors must program these registers in
lockstep and with interrupts disabled; the process also involves flushing
caches and TLBs twice. As a result, the process may take a considerable
amount of time.

On some platforms, this can lead to a large skew of the refined-jiffies
clock source. Early in boot, if no other clock is available (e.g.,
booting with hpet=disabled), the refined-jiffies clock source is used to
monitor the TSC clock source. If the skew of refined-jiffies is too large,
Linux wrongly assumes that the TSC is unstable:

  clocksource: timekeeping watchdog on CPU1: Marking clocksource
               'tsc-early' as unstable because the skew is too large:
  clocksource: 'refined-jiffies' wd_now: fffedc10 wd_last:
               fffedb90 mask: ffffffff
  clocksource: 'tsc-early' cs_now: 5eccfddebc cs_last: 5e7e3303d4
               mask: ffffffffffffffff
  tsc: Marking TSC unstable due to clocksource watchdog

As per measurements, around 98% of the time needed by the procedure to
program MTRRs in multi-processor systems is spent flushing caches with
wbinvd(). As per Section 11.11.8 of the Intel 64 and IA-32 Architectures
Software Developer's Manual, it is not necessary to flush caches if the
CPU supports cache self-snooping. Thus, skipping the cache flushes can
reduce the time needed to complete the programming of the MTRR registers
by several tens of milliseconds:

Platform                         Before    After
104-core (208 threads) Skylake   1437ms     28ms
  2-core (  4 threads) Haswell    114ms      2ms

Reported-by: Mohammad Etemadi <mohammad.etemadi@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Alan Cox <alan.cox@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jordan Borgner <mail@jordan-borgner.de>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/1561689337-19390-3-git-send-email-ricardo.neri-calderon@linux.intel.com
---
 arch/x86/kernel/cpu/mtrr/generic.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 9356c1c9024d..aa5c064a6a22 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -743,7 +743,15 @@ static void prepare_set(void) __acquires(set_atomicity_lock)
 	/* Enter the no-fill (CD=1, NW=0) cache mode and flush caches. */
 	cr0 = read_cr0() | X86_CR0_CD;
 	write_cr0(cr0);
-	wbinvd();
+
+	/*
+	 * Cache flushing is the most time-consuming step when programming
+	 * the MTRRs. Fortunately, as per the Intel Software Developer's
+	 * Manual, we can skip it if the processor supports cache
+	 * self-snooping.
+	 */
+	if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
+		wbinvd();
 
 	/* Save value of CR4 and clear Page Global Enable (bit 7) */
 	if (boot_cpu_has(X86_FEATURE_PGE)) {
@@ -760,7 +768,10 @@ static void prepare_set(void) __acquires(set_atomicity_lock)
 
 	/* Disable MTRRs, and set the default type to uncached */
 	mtrr_wrmsr(MSR_MTRRdefType, deftype_lo & ~0xcff, deftype_hi);
-	wbinvd();
+
+	/* Again, only flush caches if we have to. */
+	if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
+		wbinvd();
 }
 
 static void post_set(void) __releases(set_atomicity_lock)
