* [PATCH v1 01/47] x86: mtrr: annotate mtrr_type_lookup() is only implemented on generic_mtrr_ops
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
@ 2015-03-20 23:17 ` Luis R. Rodriguez
2015-03-20 23:17 ` [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR Luis R. Rodriguez
` (46 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:17 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen, Dave Hansen,
Stefan Bader, konrad.wilk, ville.syrjala, david.vrabel, jbeulich,
toshi.kani, bhelgaas, Roger Pau Monné,
xen-devel
From: "Luis R. Rodriguez" <mcgrof@suse.com>
There are a few users of mtrr_type_lookup(), including PAT.
Note that PAT can in theory be enabled without MTRR fully
kicking in; such is the case with Xen.
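As a rough illustration of why the annotation matters to callers: the generic lookup returns 0xFF when its state was never set up, so a user such as PAT has to be ready for "no MTRR information". The sketch below is a standalone model, not the kernel code; every demo_* name and DEMO_MTRR_NO_INFO are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the 0xFF "no MTRR information" return value. */
#define DEMO_MTRR_NO_INFO 0xFF

/* Hypothetical, simplified stand-in for the saved MTRR state. */
struct demo_mtrr_state {
	bool state_set;   /* true only once the generic ops saved the state */
	uint8_t def_type; /* default memory type when the state is valid */
};

/* Model of a lookup that is only meaningful for the generic ops. */
static uint8_t demo_type_lookup(const struct demo_mtrr_state *s)
{
	if (!s->state_set)
		return DEMO_MTRR_NO_INFO;
	return s->def_type;
}

/* A caller can tell "no information" apart from a real memory type. */
static bool demo_have_mtrr_type(uint8_t type)
{
	return type != DEMO_MTRR_NO_INFO;
}
```

What a caller does once it sees the sentinel is a policy decision and is deliberately left out of the sketch.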
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: venkatesh.pallipadi@intel.com
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: bhelgaas@google.com
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xensource.com
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
arch/x86/kernel/cpu/mtrr/generic.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 7d74f7b..09c82de 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -230,6 +230,8 @@ u8 mtrr_type_lookup(u64 start, u64 end)
int repeat;
u64 partial_end;
+ /* XXX: Currently only implemented on generic_mtrr_ops */
+
type = __mtrr_type_lookup(start, end, &partial_end, &repeat);
/*
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
2015-03-20 23:17 ` [PATCH v1 01/47] x86: mtrr: annotate mtrr_type_lookup() is only implemented on generic_mtrr_ops Luis R. Rodriguez
@ 2015-03-20 23:17 ` Luis R. Rodriguez
2015-03-25 19:59 ` Konrad Rzeszutek Wilk
2015-03-27 20:40 ` Toshi Kani
2015-03-20 23:17 ` [PATCH v1 03/47] devres: add devm_ioremap_wc() Luis R. Rodriguez
` (45 subsequent siblings)
47 siblings, 2 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:17 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen, Dave Hansen,
Stefan Bader, konrad.wilk, ville.syrjala, david.vrabel, jbeulich,
toshi.kani, bhelgaas, Roger Pau Monné,
xen-devel
From: "Luis R. Rodriguez" <mcgrof@suse.com>
It is possible to enable CONFIG_MTRR and end up with it
disabled at run time and yet CONFIG_X86_PAT continues
to kick through fully functionally. This can happen
for instance on Xen where MTRR is not supported but
PAT is. This can happen now on Linux as of commit
47591df50 by Juergen introduced as of v3.19.
Technically we should assume the proper CPU
bits would be set to disable MTRR but we can't
always rely on this. At least on the Xen Hypervisor
for instance only X86_FEATURE_MTRR was disabled
as of Xen 4.4 through Xen commit 586ab6a [0],
but not X86_FEATURE_K6_MTRR, X86_FEATURE_CENTAUR_MCR,
or X86_FEATURE_CYRIX_ARR for instance.
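To make the concern concrete, here is a small standalone sketch of why clearing a single feature bit is not enough when the code treats any of several bits as "some MTRR-like facility exists". The bit values and all DEMO_* / demo_* names are invented for the example and do not match the kernel's X86_FEATURE_* encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
enum {
	DEMO_FEAT_MTRR        = 1 << 0,
	DEMO_FEAT_K6_MTRR     = 1 << 1,
	DEMO_FEAT_CYRIX_ARR   = 1 << 2,
	DEMO_FEAT_CENTAUR_MCR = 1 << 3,
};

/* True if any of the four MTRR-like capabilities is advertised. */
static bool demo_any_mtrr_feature(uint32_t caps)
{
	return (caps & (DEMO_FEAT_MTRR | DEMO_FEAT_K6_MTRR |
			DEMO_FEAT_CYRIX_ARR | DEMO_FEAT_CENTAUR_MCR)) != 0;
}

/* Models a hypervisor that hides only the plain MTRR bit. */
static uint32_t demo_mask_only_mtrr(uint32_t caps)
{
	return caps & ~(uint32_t)DEMO_FEAT_MTRR;
}
```

With only the plain bit masked, a CPU that also advertises one of the vendor-specific bits would still look MTRR-capable to a check of this shape, which is the case the commit message says we cannot rule out.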
x86 mtrr code relies on quite a bit of checks for
mtrr_if being set to check to see if MTRR did get
set up. Instead of using that, let's provide a generic
setter which when set we know MTRR is enabled. This
also adds a few checks where they were not before
which could potentially safeguard ourselves against
incorrect usage of MTRR where this was not desirable.
Where possible match error codes as if MTRR was
disabled on arch/x86/include/asm/mtrr.h.
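The pattern described above can be sketched outside the kernel roughly as follows: a flag is set once at init time when an ops pointer was picked, and each entry point tests the flag before touching the ops. This is a simplified model under assumptions, not the patch itself; all demo_* names are hypothetical and the validate callback is a toy.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down ops table standing in for the real one. */
struct demo_ops {
	int (*validate)(unsigned long base, unsigned long size);
};

static const struct demo_ops *demo_if; /* NULL when nothing was detected */
static bool demo_enabled;              /* set once, at init time */

/* Toy validation: reject empty ranges. */
static int demo_validate(unsigned long base, unsigned long size)
{
	(void)base;
	return size ? 0 : -EINVAL;
}

static const struct demo_ops demo_generic_ops = {
	.validate = demo_validate,
};

/* Boot-time init: pick the ops and record that the facility is usable. */
static void demo_bp_init(bool hw_present)
{
	demo_if = hw_present ? &demo_generic_ops : NULL;
	if (demo_if)
		demo_enabled = true;
}

/* Entry point: bail out early, never dereference demo_if when disabled. */
static int demo_add_page(unsigned long base, unsigned long size)
{
	if (!demo_enabled)
		return -ENXIO;
	return demo_if->validate(base, size);
}
```

Calling demo_add_page() before demo_bp_init(true) returns -ENXIO without touching the ops pointer, and afterwards the call goes through to the callback.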
Lastly, since disabling MTRR can happen at run time
and we could end up with PAT enabled, it is best to record
in our logs when MTRR is disabled.
[0] ~/devel/xen (git::stable-4.5)$ git describe --contains 586ab6a
4.4.0-rc1~18
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: venkatesh.pallipadi@intel.com
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: bhelgaas@google.com
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xensource.com
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
arch/x86/include/asm/mtrr.h | 2 ++
arch/x86/kernel/cpu/mtrr/cleanup.c | 2 +-
arch/x86/kernel/cpu/mtrr/generic.c | 5 +++--
arch/x86/kernel/cpu/mtrr/if.c | 3 +++
arch/x86/kernel/cpu/mtrr/main.c | 31 ++++++++++++++++++++++---------
5 files changed, 31 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index f768f62..cade917 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -31,6 +31,7 @@
* arch_phys_wc_add and arch_phys_wc_del.
*/
# ifdef CONFIG_MTRR
+extern int mtrr_enabled;
extern u8 mtrr_type_lookup(u64 addr, u64 end);
extern void mtrr_save_fixed_ranges(void *);
extern void mtrr_save_state(void);
@@ -50,6 +51,7 @@ extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
extern int amd_special_default_mtrr(void);
extern int phys_wc_to_mtrr_index(int handle);
# else
+static const int mtrr_enabled;
static inline u8 mtrr_type_lookup(u64 addr, u64 end)
{
/*
diff --git a/arch/x86/kernel/cpu/mtrr/cleanup.c b/arch/x86/kernel/cpu/mtrr/cleanup.c
index 5f90b85..784dc55 100644
--- a/arch/x86/kernel/cpu/mtrr/cleanup.c
+++ b/arch/x86/kernel/cpu/mtrr/cleanup.c
@@ -880,7 +880,7 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
* Make sure we only trim uncachable memory on machines that
* support the Intel MTRR architecture:
*/
- if (!is_cpu(INTEL) || disable_mtrr_trim)
+ if (!is_cpu(INTEL) || disable_mtrr_trim || !mtrr_enabled)
return 0;
rdmsr(MSR_MTRRdefType, def, dummy);
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 09c82de..df321b2 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -116,7 +116,8 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
u8 prev_match, curr_match;
*repeat = 0;
- if (!mtrr_state_set)
+ /* mtrr_state_set is only set for generic_mtrr_ops */
+ if (!mtrr_state_set || !mtrr_enabled)
return 0xFF;
if (!mtrr_state.enabled)
@@ -290,7 +291,7 @@ static void get_fixed_ranges(mtrr_type *frs)
void mtrr_save_fixed_ranges(void *info)
{
- if (cpu_has_mtrr)
+ if (mtrr_enabled && cpu_has_mtrr)
get_fixed_ranges(mtrr_state.fixed_ranges);
}
diff --git a/arch/x86/kernel/cpu/mtrr/if.c b/arch/x86/kernel/cpu/mtrr/if.c
index d76f13d..e9e001a 100644
--- a/arch/x86/kernel/cpu/mtrr/if.c
+++ b/arch/x86/kernel/cpu/mtrr/if.c
@@ -436,6 +436,9 @@ static int __init mtrr_if_init(void)
{
struct cpuinfo_x86 *c = &boot_cpu_data;
+ if (!mtrr_enabled)
+ return 0;
+
if ((!cpu_has(c, X86_FEATURE_MTRR)) &&
(!cpu_has(c, X86_FEATURE_K6_MTRR)) &&
(!cpu_has(c, X86_FEATURE_CYRIX_ARR)) &&
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index ea5f363..7db9c47 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -59,6 +59,7 @@
#define MTRR_TO_PHYS_WC_OFFSET 1000
u32 num_var_ranges;
+int mtrr_enabled;
unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
static DEFINE_MUTEX(mtrr_mutex);
@@ -84,6 +85,9 @@ static int have_wrcomb(void)
{
struct pci_dev *dev;
+ if (!mtrr_enabled)
+ return 0;
+
dev = pci_get_class(PCI_CLASS_BRIDGE_HOST << 8, NULL);
if (dev != NULL) {
/*
@@ -286,7 +290,7 @@ int mtrr_add_page(unsigned long base, unsigned long size,
int i, replace, error;
mtrr_type ltype;
- if (!mtrr_if)
+ if (!mtrr_enabled)
return -ENXIO;
error = mtrr_if->validate_add_page(base, size, type);
@@ -388,6 +392,8 @@ int mtrr_add_page(unsigned long base, unsigned long size,
static int mtrr_check(unsigned long base, unsigned long size)
{
+ if (!mtrr_enabled)
+ return -ENODEV;
if ((base & (PAGE_SIZE - 1)) || (size & (PAGE_SIZE - 1))) {
pr_warning("mtrr: size and base must be multiples of 4 kiB\n");
pr_debug("mtrr: size: 0x%lx base: 0x%lx\n", size, base);
@@ -463,8 +469,8 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
unsigned long lbase, lsize;
int error = -EINVAL;
- if (!mtrr_if)
- return -ENXIO;
+ if (!mtrr_enabled)
+ return -ENODEV;
max = num_var_ranges;
/* No CPU hotplug when we change MTRR entries */
@@ -523,6 +529,8 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
*/
int mtrr_del(int reg, unsigned long base, unsigned long size)
{
+ if (!mtrr_enabled)
+ return -ENODEV;
if (mtrr_check(base, size))
return -EINVAL;
return mtrr_del_page(reg, base >> PAGE_SHIFT, size >> PAGE_SHIFT);
@@ -545,7 +553,7 @@ int arch_phys_wc_add(unsigned long base, unsigned long size)
{
int ret;
- if (pat_enabled)
+ if (pat_enabled || !mtrr_enabled)
return 0; /* Success! (We don't need to do anything.) */
ret = mtrr_add(base, size, MTRR_TYPE_WRCOMB, true);
@@ -734,6 +742,7 @@ void __init mtrr_bp_init(void)
}
if (mtrr_if) {
+ mtrr_enabled = true;
set_num_var_ranges();
init_table();
if (use_intel()) {
@@ -744,12 +753,13 @@ void __init mtrr_bp_init(void)
mtrr_if->set_all();
}
}
- }
+ } else
+ pr_info("mtrr: system does not support MTRR\n");
}
void mtrr_ap_init(void)
{
- if (!use_intel() || mtrr_aps_delayed_init)
+ if (!use_intel() || mtrr_aps_delayed_init || !mtrr_enabled)
return;
/*
* Ideally we should hold mtrr_mutex here to avoid mtrr entries
@@ -774,6 +784,9 @@ void mtrr_save_state(void)
{
int first_cpu;
+ if (!mtrr_enabled)
+ return;
+
get_online_cpus();
first_cpu = cpumask_first(cpu_online_mask);
smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
@@ -782,7 +795,7 @@ void mtrr_save_state(void)
void set_mtrr_aps_delayed_init(void)
{
- if (!use_intel())
+ if (!use_intel() || !mtrr_enabled)
return;
mtrr_aps_delayed_init = true;
@@ -810,7 +823,7 @@ void mtrr_aps_init(void)
void mtrr_bp_restore(void)
{
- if (!use_intel())
+ if (!use_intel() || !mtrr_enabled)
return;
mtrr_if->set_all();
@@ -818,7 +831,7 @@ void mtrr_bp_restore(void)
static int __init mtrr_init_finialize(void)
{
- if (!mtrr_if)
+ if (!mtrr_enabled)
return 0;
if (use_intel()) {
--
2.3.2.209.gd67f9d5.dirty
* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
2015-03-20 23:17 ` [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR Luis R. Rodriguez
@ 2015-03-25 19:59 ` Konrad Rzeszutek Wilk
2015-03-26 4:38 ` Juergen Gross
2015-03-26 23:35 ` Luis R. Rodriguez
2015-03-27 20:40 ` Toshi Kani
1 sibling, 2 replies; 400+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-03-25 19:59 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied, linux-kernel, linux-fbdev, x86,
xen-devel, Luis R. Rodriguez, Ingo Molnar, Daniel Vetter,
Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen, Dave Hansen, Stefan Bader, ville.syrjala,
david.vrabel, toshi.kani, bhelgaas, Roger Pau Monné,
xen-devel
On Fri, Mar 20, 2015 at 04:17:52PM -0700, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> It is possible to enable CONFIG_MTRR and up with it
> disabled at run time and yet CONFIG_X86_PAT continues
> to kick through fully functionally. This can happen
s/fully/full/ ?
> for instance on Xen where MTRR is not supported but
> PAT is, this can happen now on Linux as of commit
> 47591df50 by Juergen introduced as of v3.19.
s/3.19/4.0/
>
> Technically we should assume the proper CPU
> bits would be set to disable MTRR but we can't
> always rely on this. At least on the Xen Hypervisor
> for instance only X86_FEATURE_MTRR was disabled
> as of Xen 4.4 through Xen commit 586ab6a [0],
> but not X86_FEATURE_K6_MTRR, X86_FEATURE_CENTAUR_MCR,
> or X86_FEATURE_CYRIX_ARR for instance.
Oh, could you send a patch for that to Xen please?
>
> x86 mtrr code relies on quite a bit of checks for
> mtrr_if being set to check to see if MTRR did get
> set up, instead of using that lets provide a generic
> setter which when set we know MTRR is enabled. This
s/we know MTRR is enabled/will let us know that MTRR is enabled/
> also adds a few checks where they were not before
> which could potentially safeguard ourselves against
> incorrect usage of MTRR where this was not desirable.
>
> Where possible match error codes as if MTRR was
> disabled on arch/x86/include/asm/mtrr.h.
>
> Lastly, since disabling MTRR can happen at run time
> and we could end up with PAT enabled best record now
> on our logs when MTRR is disabled.
>
> [0] ~/devel/xen (git::stable-4.5)$ git describe --contains 586ab6a
> 4.4.0-rc1~18
>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Suresh Siddha <suresh.b.siddha@intel.com>
> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Antonino Daplas <adaplas@gmail.com>
> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: venkatesh.pallipadi@intel.com
> Cc: Stefan Bader <stefan.bader@canonical.com>
> Cc: konrad.wilk@oracle.com
> Cc: ville.syrjala@linux.intel.com
> Cc: david.vrabel@citrix.com
> Cc: jbeulich@suse.com
> Cc: toshi.kani@hp.com
> Cc: bhelgaas@google.com
> Cc: Roger Pau Monné <roger.pau@citrix.com>
> Cc: linux-fbdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: xen-devel@lists.xensource.com
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> ---
> arch/x86/include/asm/mtrr.h | 2 ++
> arch/x86/kernel/cpu/mtrr/cleanup.c | 2 +-
> arch/x86/kernel/cpu/mtrr/generic.c | 5 +++--
> arch/x86/kernel/cpu/mtrr/if.c | 3 +++
> arch/x86/kernel/cpu/mtrr/main.c | 31 ++++++++++++++++++++++---------
> 5 files changed, 31 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
> index f768f62..cade917 100644
> --- a/arch/x86/include/asm/mtrr.h
> +++ b/arch/x86/include/asm/mtrr.h
> @@ -31,6 +31,7 @@
> * arch_phys_wc_add and arch_phys_wc_del.
> */
> # ifdef CONFIG_MTRR
> +extern int mtrr_enabled;
> extern u8 mtrr_type_lookup(u64 addr, u64 end);
> extern void mtrr_save_fixed_ranges(void *);
> extern void mtrr_save_state(void);
> @@ -50,6 +51,7 @@ extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
> extern int amd_special_default_mtrr(void);
> extern int phys_wc_to_mtrr_index(int handle);
> # else
> +static const int mtrr_enabled;
> static inline u8 mtrr_type_lookup(u64 addr, u64 end)
> {
> /*
> diff --git a/arch/x86/kernel/cpu/mtrr/cleanup.c b/arch/x86/kernel/cpu/mtrr/cleanup.c
> index 5f90b85..784dc55 100644
> --- a/arch/x86/kernel/cpu/mtrr/cleanup.c
> +++ b/arch/x86/kernel/cpu/mtrr/cleanup.c
> @@ -880,7 +880,7 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
> * Make sure we only trim uncachable memory on machines that
> * support the Intel MTRR architecture:
> */
> - if (!is_cpu(INTEL) || disable_mtrr_trim)
> + if (!is_cpu(INTEL) || disable_mtrr_trim || !mtrr_enabled)
> return 0;
>
> rdmsr(MSR_MTRRdefType, def, dummy);
> diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
> index 09c82de..df321b2 100644
> --- a/arch/x86/kernel/cpu/mtrr/generic.c
> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> @@ -116,7 +116,8 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
> u8 prev_match, curr_match;
>
> *repeat = 0;
> - if (!mtrr_state_set)
> + /* generic_mtrr_ops is only set for generic_mtrr_ops */
> + if (!mtrr_state_set || !mtrr_enabled)
> return 0xFF;
>
> if (!mtrr_state.enabled)
> @@ -290,7 +291,7 @@ static void get_fixed_ranges(mtrr_type *frs)
>
> void mtrr_save_fixed_ranges(void *info)
> {
> - if (cpu_has_mtrr)
> + if (mtrr_enabled && cpu_has_mtrr)
> get_fixed_ranges(mtrr_state.fixed_ranges);
> }
>
> diff --git a/arch/x86/kernel/cpu/mtrr/if.c b/arch/x86/kernel/cpu/mtrr/if.c
> index d76f13d..e9e001a 100644
> --- a/arch/x86/kernel/cpu/mtrr/if.c
> +++ b/arch/x86/kernel/cpu/mtrr/if.c
> @@ -436,6 +436,9 @@ static int __init mtrr_if_init(void)
> {
> struct cpuinfo_x86 *c = &boot_cpu_data;
>
> + if (!mtrr_enabled)
> + return 0;
> +
> if ((!cpu_has(c, X86_FEATURE_MTRR)) &&
> (!cpu_has(c, X86_FEATURE_K6_MTRR)) &&
> (!cpu_has(c, X86_FEATURE_CYRIX_ARR)) &&
> diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
> index ea5f363..7db9c47 100644
> --- a/arch/x86/kernel/cpu/mtrr/main.c
> +++ b/arch/x86/kernel/cpu/mtrr/main.c
> @@ -59,6 +59,7 @@
> #define MTRR_TO_PHYS_WC_OFFSET 1000
>
> u32 num_var_ranges;
> +int mtrr_enabled;
>
> unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
> static DEFINE_MUTEX(mtrr_mutex);
> @@ -84,6 +85,9 @@ static int have_wrcomb(void)
> {
> struct pci_dev *dev;
>
> + if (!mtrr_enabled)
> + return 0;
> +
> dev = pci_get_class(PCI_CLASS_BRIDGE_HOST << 8, NULL);
> if (dev != NULL) {
> /*
> @@ -286,7 +290,7 @@ int mtrr_add_page(unsigned long base, unsigned long size,
> int i, replace, error;
> mtrr_type ltype;
>
> - if (!mtrr_if)
> + if (!mtrr_enabled)
> return -ENXIO;
>
> error = mtrr_if->validate_add_page(base, size, type);
> @@ -388,6 +392,8 @@ int mtrr_add_page(unsigned long base, unsigned long size,
>
> static int mtrr_check(unsigned long base, unsigned long size)
> {
> + if (!mtrr_enabled)
> + return -ENODEV;
> if ((base & (PAGE_SIZE - 1)) || (size & (PAGE_SIZE - 1))) {
> pr_warning("mtrr: size and base must be multiples of 4 kiB\n");
> pr_debug("mtrr: size: 0x%lx base: 0x%lx\n", size, base);
> @@ -463,8 +469,8 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
> unsigned long lbase, lsize;
> int error = -EINVAL;
>
> - if (!mtrr_if)
> - return -ENXIO;
> + if (!mtrr_enabled)
> + return -ENODEV;
>
> max = num_var_ranges;
> /* No CPU hotplug when we change MTRR entries */
> @@ -523,6 +529,8 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
> */
> int mtrr_del(int reg, unsigned long base, unsigned long size)
> {
> + if (!mtrr_enabled)
> + return -ENODEV;
> if (mtrr_check(base, size))
> return -EINVAL;
> return mtrr_del_page(reg, base >> PAGE_SHIFT, size >> PAGE_SHIFT);
> @@ -545,7 +553,7 @@ int arch_phys_wc_add(unsigned long base, unsigned long size)
> {
> int ret;
>
> - if (pat_enabled)
> + if (pat_enabled || !mtrr_enabled)
> return 0; /* Success! (We don't need to do anything.) */
>
> ret = mtrr_add(base, size, MTRR_TYPE_WRCOMB, true);
> @@ -734,6 +742,7 @@ void __init mtrr_bp_init(void)
> }
>
> if (mtrr_if) {
> + mtrr_enabled = true;
> set_num_var_ranges();
> init_table();
> if (use_intel()) {
> @@ -744,12 +753,13 @@ void __init mtrr_bp_init(void)
> mtrr_if->set_all();
> }
> }
> - }
> + } else
> + pr_info("mtrr: system does not support MTRR\n");
'pr_warn' ?
> }
>
> void mtrr_ap_init(void)
> {
> - if (!use_intel() || mtrr_aps_delayed_init)
> + if (!use_intel() || mtrr_aps_delayed_init || !mtrr_enabled)
> return;
> /*
> * Ideally we should hold mtrr_mutex here to avoid mtrr entries
> @@ -774,6 +784,9 @@ void mtrr_save_state(void)
> {
> int first_cpu;
>
> + if (!mtrr_enabled)
> + return;
> +
> get_online_cpus();
> first_cpu = cpumask_first(cpu_online_mask);
> smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
> @@ -782,7 +795,7 @@ void mtrr_save_state(void)
>
> void set_mtrr_aps_delayed_init(void)
> {
> - if (!use_intel())
> + if (!use_intel() || !mtrr_enabled)
> return;
>
> mtrr_aps_delayed_init = true;
> @@ -810,7 +823,7 @@ void mtrr_aps_init(void)
>
> void mtrr_bp_restore(void)
> {
> - if (!use_intel())
> + if (!use_intel() || !mtrr_enabled)
> return;
>
> mtrr_if->set_all();
> @@ -818,7 +831,7 @@ void mtrr_bp_restore(void)
>
> static int __init mtrr_init_finialize(void)
> {
> - if (!mtrr_if)
> + if (!mtrr_enabled)
> return 0;
>
> if (use_intel()) {
> --
> 2.3.2.209.gd67f9d5.dirty
>
* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
2015-03-25 19:59 ` Konrad Rzeszutek Wilk
@ 2015-03-26 4:38 ` Juergen Gross
2015-03-26 23:35 ` Luis R. Rodriguez
1 sibling, 0 replies; 400+ messages in thread
From: Juergen Gross @ 2015-03-26 4:38 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk, Luis R. Rodriguez
Cc: luto, mingo, tglx, hpa, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied, linux-kernel, linux-fbdev, x86,
xen-devel, Luis R. Rodriguez, Ingo Molnar, Daniel Vetter,
Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen, Dave Hansen, Stefan Bader, ville.syrjala,
david.vrabel, toshi.kani, bhelgaas, Roger Pau Monné,
xen-devel
On 03/25/2015 08:59 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Mar 20, 2015 at 04:17:52PM -0700, Luis R. Rodriguez wrote:
>> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>>
>> It is possible to enable CONFIG_MTRR and up with it
>> disabled at run time and yet CONFIG_X86_PAT continues
>> to kick through fully functionally. This can happen
>
> s/fully/full/ ?
>
>
>> for instance on Xen where MTRR is not supported but
>> PAT is, this can happen now on Linux as of commit
>> 47591df50 by Juergen introduced as of v3.19.
>
> s/3.19/4.0/
No, 3.19 is correct.
Juergen
>>
>> Technically we should assume the proper CPU
>> bits would be set to disable MTRR but we can't
>> always rely on this. At least on the Xen Hypervisor
>> for instance only X86_FEATURE_MTRR was disabled
>> as of Xen 4.4 through Xen commit 586ab6a [0],
>> but not X86_FEATURE_K6_MTRR, X86_FEATURE_CENTAUR_MCR,
>> or X86_FEATURE_CYRIX_ARR for instance.
>
> Oh, could you send an patch for that to Xen please?
>>
>> x86 mtrr code relies on quite a bit of checks for
>> mtrr_if being set to check to see if MTRR did get
>> set up, instead of using that lets provide a generic
>> setter which when set we know MTRR is enabled. This
>
> s/we know MTRR is enabled/will let us know that MTRR is enabled/
>
>> also adds a few checks where they were not before
>> which could potentially safeguard ourselves against
>> incorrect usage of MTRR where this was not desirable.
>>
>> Where possible match error codes as if MTRR was
>> disabled on arch/x86/include/asm/mtrr.h.
>>
>> Lastly, since disabling MTRR can happen at run time
>> and we could end up with PAT enabled best record now
>> on our logs when MTRR is disabled.
>>
>> [0] ~/devel/xen (git::stable-4.5)$ git describe --contains 586ab6a
>> 4.4.0-rc1~18
>>
>> Cc: Andy Lutomirski <luto@amacapital.net>
>> Cc: Suresh Siddha <suresh.b.siddha@intel.com>
>> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
>> Cc: Ingo Molnar <mingo@elte.hu>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Juergen Gross <jgross@suse.com>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Cc: Dave Airlie <airlied@redhat.com>
>> Cc: Antonino Daplas <adaplas@gmail.com>
>> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
>> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: venkatesh.pallipadi@intel.com
>> Cc: Stefan Bader <stefan.bader@canonical.com>
>> Cc: konrad.wilk@oracle.com
>> Cc: ville.syrjala@linux.intel.com
>> Cc: david.vrabel@citrix.com
>> Cc: jbeulich@suse.com
>> Cc: toshi.kani@hp.com
>> Cc: bhelgaas@google.com
>> Cc: Roger Pau Monné <roger.pau@citrix.com>
>> Cc: linux-fbdev@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Cc: xen-devel@lists.xensource.com
>> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
>> ---
>> arch/x86/include/asm/mtrr.h | 2 ++
>> arch/x86/kernel/cpu/mtrr/cleanup.c | 2 +-
>> arch/x86/kernel/cpu/mtrr/generic.c | 5 +++--
>> arch/x86/kernel/cpu/mtrr/if.c | 3 +++
>> arch/x86/kernel/cpu/mtrr/main.c | 31 ++++++++++++++++++++++---------
>> 5 files changed, 31 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
>> index f768f62..cade917 100644
>> --- a/arch/x86/include/asm/mtrr.h
>> +++ b/arch/x86/include/asm/mtrr.h
>> @@ -31,6 +31,7 @@
>> * arch_phys_wc_add and arch_phys_wc_del.
>> */
>> # ifdef CONFIG_MTRR
>> +extern int mtrr_enabled;
>> extern u8 mtrr_type_lookup(u64 addr, u64 end);
>> extern void mtrr_save_fixed_ranges(void *);
>> extern void mtrr_save_state(void);
>> @@ -50,6 +51,7 @@ extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
>> extern int amd_special_default_mtrr(void);
>> extern int phys_wc_to_mtrr_index(int handle);
>> # else
>> +static const int mtrr_enabled;
>> static inline u8 mtrr_type_lookup(u64 addr, u64 end)
>> {
>> /*
>> diff --git a/arch/x86/kernel/cpu/mtrr/cleanup.c b/arch/x86/kernel/cpu/mtrr/cleanup.c
>> index 5f90b85..784dc55 100644
>> --- a/arch/x86/kernel/cpu/mtrr/cleanup.c
>> +++ b/arch/x86/kernel/cpu/mtrr/cleanup.c
>> @@ -880,7 +880,7 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
>> * Make sure we only trim uncachable memory on machines that
>> * support the Intel MTRR architecture:
>> */
>> - if (!is_cpu(INTEL) || disable_mtrr_trim)
>> + if (!is_cpu(INTEL) || disable_mtrr_trim || !mtrr_enabled)
>> return 0;
>>
>> rdmsr(MSR_MTRRdefType, def, dummy);
>> diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
>> index 09c82de..df321b2 100644
>> --- a/arch/x86/kernel/cpu/mtrr/generic.c
>> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
>> @@ -116,7 +116,8 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
>> u8 prev_match, curr_match;
>>
>> *repeat = 0;
>> - if (!mtrr_state_set)
>> + /* generic_mtrr_ops is only set for generic_mtrr_ops */
>> + if (!mtrr_state_set || !mtrr_enabled)
>> return 0xFF;
>>
>> if (!mtrr_state.enabled)
>> @@ -290,7 +291,7 @@ static void get_fixed_ranges(mtrr_type *frs)
>>
>> void mtrr_save_fixed_ranges(void *info)
>> {
>> - if (cpu_has_mtrr)
>> + if (mtrr_enabled && cpu_has_mtrr)
>> get_fixed_ranges(mtrr_state.fixed_ranges);
>> }
>>
>> diff --git a/arch/x86/kernel/cpu/mtrr/if.c b/arch/x86/kernel/cpu/mtrr/if.c
>> index d76f13d..e9e001a 100644
>> --- a/arch/x86/kernel/cpu/mtrr/if.c
>> +++ b/arch/x86/kernel/cpu/mtrr/if.c
>> @@ -436,6 +436,9 @@ static int __init mtrr_if_init(void)
>> {
>> struct cpuinfo_x86 *c = &boot_cpu_data;
>>
>> + if (!mtrr_enabled)
>> + return 0;
>> +
>> if ((!cpu_has(c, X86_FEATURE_MTRR)) &&
>> (!cpu_has(c, X86_FEATURE_K6_MTRR)) &&
>> (!cpu_has(c, X86_FEATURE_CYRIX_ARR)) &&
>> diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
>> index ea5f363..7db9c47 100644
>> --- a/arch/x86/kernel/cpu/mtrr/main.c
>> +++ b/arch/x86/kernel/cpu/mtrr/main.c
>> @@ -59,6 +59,7 @@
>> #define MTRR_TO_PHYS_WC_OFFSET 1000
>>
>> u32 num_var_ranges;
>> +int mtrr_enabled;
>>
>> unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
>> static DEFINE_MUTEX(mtrr_mutex);
>> @@ -84,6 +85,9 @@ static int have_wrcomb(void)
>> {
>> struct pci_dev *dev;
>>
>> + if (!mtrr_enabled)
>> + return 0;
>> +
>> dev = pci_get_class(PCI_CLASS_BRIDGE_HOST << 8, NULL);
>> if (dev != NULL) {
>> /*
>> @@ -286,7 +290,7 @@ int mtrr_add_page(unsigned long base, unsigned long size,
>> int i, replace, error;
>> mtrr_type ltype;
>>
>> - if (!mtrr_if)
>> + if (!mtrr_enabled)
>> return -ENXIO;
>>
>> error = mtrr_if->validate_add_page(base, size, type);
>> @@ -388,6 +392,8 @@ int mtrr_add_page(unsigned long base, unsigned long size,
>>
>> static int mtrr_check(unsigned long base, unsigned long size)
>> {
>> + if (!mtrr_enabled)
>> + return -ENODEV;
>> if ((base & (PAGE_SIZE - 1)) || (size & (PAGE_SIZE - 1))) {
>> pr_warning("mtrr: size and base must be multiples of 4 kiB\n");
>> pr_debug("mtrr: size: 0x%lx base: 0x%lx\n", size, base);
>> @@ -463,8 +469,8 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
>> unsigned long lbase, lsize;
>> int error = -EINVAL;
>>
>> - if (!mtrr_if)
>> - return -ENXIO;
>> + if (!mtrr_enabled)
>> + return -ENODEV;
>>
>> max = num_var_ranges;
>> /* No CPU hotplug when we change MTRR entries */
>> @@ -523,6 +529,8 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
>> */
>> int mtrr_del(int reg, unsigned long base, unsigned long size)
>> {
>> + if (!mtrr_enabled)
>> + return -ENODEV;
>> if (mtrr_check(base, size))
>> return -EINVAL;
>> return mtrr_del_page(reg, base >> PAGE_SHIFT, size >> PAGE_SHIFT);
>> @@ -545,7 +553,7 @@ int arch_phys_wc_add(unsigned long base, unsigned long size)
>> {
>> int ret;
>>
>> - if (pat_enabled)
>> + if (pat_enabled || !mtrr_enabled)
>> return 0; /* Success! (We don't need to do anything.) */
>>
>> ret = mtrr_add(base, size, MTRR_TYPE_WRCOMB, true);
>> @@ -734,6 +742,7 @@ void __init mtrr_bp_init(void)
>> }
>>
>> if (mtrr_if) {
>> + mtrr_enabled = true;
>> set_num_var_ranges();
>> init_table();
>> if (use_intel()) {
>> @@ -744,12 +753,13 @@ void __init mtrr_bp_init(void)
>> mtrr_if->set_all();
>> }
>> }
>> - }
>> + } else
>> + pr_info("mtrr: system does not support MTRR\n");
>
> 'pr_warn' ?
>> }
>>
>> void mtrr_ap_init(void)
>> {
>> - if (!use_intel() || mtrr_aps_delayed_init)
>> + if (!use_intel() || mtrr_aps_delayed_init || !mtrr_enabled)
>> return;
>> /*
>> * Ideally we should hold mtrr_mutex here to avoid mtrr entries
>> @@ -774,6 +784,9 @@ void mtrr_save_state(void)
>> {
>> int first_cpu;
>>
>> + if (!mtrr_enabled)
>> + return;
>> +
>> get_online_cpus();
>> first_cpu = cpumask_first(cpu_online_mask);
>> smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
>> @@ -782,7 +795,7 @@ void mtrr_save_state(void)
>>
>> void set_mtrr_aps_delayed_init(void)
>> {
>> - if (!use_intel())
>> + if (!use_intel() || !mtrr_enabled)
>> return;
>>
>> mtrr_aps_delayed_init = true;
>> @@ -810,7 +823,7 @@ void mtrr_aps_init(void)
>>
>> void mtrr_bp_restore(void)
>> {
>> - if (!use_intel())
>> + if (!use_intel() || !mtrr_enabled)
>> return;
>>
>> mtrr_if->set_all();
>> @@ -818,7 +831,7 @@ void mtrr_bp_restore(void)
>>
>> static int __init mtrr_init_finialize(void)
>> {
>> - if (!mtrr_if)
>> + if (!mtrr_enabled)
>> return 0;
>>
>> if (use_intel()) {
>> --
>> 2.3.2.209.gd67f9d5.dirty
>>
* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
2015-03-25 19:59 ` Konrad Rzeszutek Wilk
2015-03-26 4:38 ` Juergen Gross
@ 2015-03-26 23:35 ` Luis R. Rodriguez
2015-04-02 20:13 ` Bjorn Helgaas
1 sibling, 1 reply; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-26 23:35 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: Luis R. Rodriguez, luto, mingo, tglx, hpa, jgross, JBeulich, bp,
suresh.b.siddha, venkatesh.pallipadi, airlied, linux-kernel,
linux-fbdev, x86, xen-devel, Ingo Molnar, Daniel Vetter,
Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen, Dave Hansen, Stefan Bader, ville.syrjala,
david.vrabel, toshi.kani, bhelgaas, Roger Pau Monné,
xen-devel
On Wed, Mar 25, 2015 at 03:59:41PM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Mar 20, 2015 at 04:17:52PM -0700, Luis R. Rodriguez wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >
> > It is possible to enable CONFIG_MTRR and up with it
> > disabled at run time and yet CONFIG_X86_PAT continues
> > to kick through fully functionally. This can happen
>
> s/fully/full/ ?
I'll rephrase this to:
---
It is possible to enable CONFIG_MTRR and up with it
disabled at run time and yet CONFIG_X86_PAT continues
to kick through with all functionally enabled. This
can happen for instance on Xen where MTRR is not
supported but PAT is, this can happen now on Linux as
of commit 47591df50 by Juergen introduced as of v3.19.
---
BTW, as I also mentioned in the cover letter, this is a
good time to decide whether we want to make PAT a first
class citizen and untangle it from its dependency on
MTRR. If so, I can do that later.
> > Technically we should assume the proper CPU
> > bits would be set to disable MTRR but we can't
> > always rely on this. At least on the Xen Hypervisor
> > for instance only X86_FEATURE_MTRR was disabled
> > as of Xen 4.4 through Xen commit 586ab6a [0],
> > but not X86_FEATURE_K6_MTRR, X86_FEATURE_CENTAUR_MCR,
> > or X86_FEATURE_CYRIX_ARR for instance.
>
> Oh, could you send an patch for that to Xen please?
Done.
> > x86 mtrr code relies on quite a bit of checks for
> > mtrr_if being set to check to see if MTRR did get
> > set up, instead of using that lets provide a generic
> > setter which when set we know MTRR is enabled. This
>
> s/we know MTRR is enabled/will let us know that MTRR is enabled/
Amended.
Luis
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
2015-03-26 23:35 ` Luis R. Rodriguez
@ 2015-04-02 20:13 ` Bjorn Helgaas
2015-04-02 20:20 ` Luis R. Rodriguez
0 siblings, 1 reply; 400+ messages in thread
From: Bjorn Helgaas @ 2015-04-02 20:13 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Konrad Rzeszutek Wilk, Luis R. Rodriguez, Andy Lutomirski,
Ingo Molnar, Thomas Gleixner, H. Peter Anvin, jgross,
Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
Dave Airlie, linux-kernel, linux-fbdev, x86, xen-devel,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen, Dave Hansen,
Stefan Bader, Ville Syrjälä,
David Vrabel, Toshi Kani, Roger Pau Monné,
xen-devel
On Thu, Mar 26, 2015 at 6:35 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> I'll rephrase this to:
>
> ---
> It is possible to enable CONFIG_MTRR and up with it
> disabled at run time and yet CONFIG_X86_PAT continues
> to kick through with all functionally enabled. This
> can happen for instance on Xen where MTRR is not
> supported but PAT is, this can happen now on Linux as
> of commit 47591df50 by Juergen introduced as of v3.19.
I still can't parse this. What does "up with it disabled at run time"
mean? And "... continues to kick through"? Probably some idiomatic
usage I'm just too old to understand :)
Please use the conventional citation format:
47591df50512 ("xen: Support Xen pv-domains using PAT")
A one-character typo in a SHA1 makes it completely useless, so it's
nice to have the summary line both for readability and a bit of
redundancy.
Bjorn
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
2015-04-02 20:13 ` Bjorn Helgaas
@ 2015-04-02 20:20 ` Luis R. Rodriguez
2015-04-02 20:28 ` Bjorn Helgaas
0 siblings, 1 reply; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-04-02 20:20 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Konrad Rzeszutek Wilk, Andy Lutomirski, Ingo Molnar,
Thomas Gleixner, H. Peter Anvin, Juergen Gross, Jan Beulich,
Borislav Petkov, Suresh Siddha, venkatesh.pallipadi, Dave Airlie,
linux-kernel, linux-fbdev, x86, xen-devel, Ingo Molnar,
Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen, Dave Hansen, Stefan Bader,
Ville Syrjälä,
David Vrabel, Toshi Kani, Roger Pau Monné,
xen-devel
On Thu, Apr 2, 2015 at 1:13 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>
> On Thu, Mar 26, 2015 at 6:35 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>
> > I'll rephrase this to:
> >
> > ---
> > It is possible to enable CONFIG_MTRR and up with it
> > disabled at run time and yet CONFIG_X86_PAT continues
> > to kick through with all functionally enabled. This
> > can happen for instance on Xen where MTRR is not
> > supported but PAT is, this can happen now on Linux as
> > of commit 47591df50 by Juergen introduced as of v3.19.
>
> I still can't parse this. What does "up with it disabled at run time"
> mean?
It means that even if your CPU/BIOS/system supports MTRR and you use a
kernel with MTRR support enabled, the same exact kernel and system can
end up with MTRR enabled in one run time scenario and disabled in
another. Such is the case, for example, when booting with Xen, where
the hypervisor code disables the CPU bits. If you boot the same system
without Xen you'll get MTRR.
> And "... continues to kick through"? Probably some idiomatic
> usage I'm just too old to understand :)
It means that in both of the above circumstances, even when MTRR ends
up disabled at run time under Xen, the kernel still goes ahead and
enables PAT.
> Please use the conventional citation format:
>
> 47591df50512 ("xen: Support Xen pv-domains using PAT")
>
> A one-character typo in a SHA1 makes it completely useless, so it's
> nice to have the summary line both for readability and a bit of
> redundancy.
Sure, fixed.
Luis
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
2015-04-02 20:20 ` Luis R. Rodriguez
@ 2015-04-02 20:28 ` Bjorn Helgaas
2015-04-02 21:02 ` Luis R. Rodriguez
0 siblings, 1 reply; 400+ messages in thread
From: Bjorn Helgaas @ 2015-04-02 20:28 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Konrad Rzeszutek Wilk, Andy Lutomirski, Ingo Molnar,
Thomas Gleixner, H. Peter Anvin, Juergen Gross, Jan Beulich,
Borislav Petkov, Suresh Siddha, venkatesh.pallipadi, Dave Airlie,
linux-kernel, linux-fbdev, x86, xen-devel, Ingo Molnar,
Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen, Dave Hansen, Stefan Bader,
Ville Syrjälä,
David Vrabel, Toshi Kani, Roger Pau Monné,
xen-devel
On Thu, Apr 2, 2015 at 3:20 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Thu, Apr 2, 2015 at 1:13 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>
>> On Thu, Mar 26, 2015 at 6:35 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>>
>> > I'll rephrase this to:
>> >
>> > ---
>> > It is possible to enable CONFIG_MTRR and up with it
>> > disabled at run time and yet CONFIG_X86_PAT continues
>> > to kick through with all functionally enabled. This
>> > can happen for instance on Xen where MTRR is not
>> > supported but PAT is, this can happen now on Linux as
>> > of commit 47591df50 by Juergen introduced as of v3.19.
>>
>> I still can't parse this. What does "up with it disabled at run time"
>> mean?
>
> It means that technically even if your CPU/BIOS/system did support
> MTRR if you use a kernel with MTRR support enabled you might end up
> with a situation where under one situation MTRR might be enabled and
> at another run time scenario with the same exact kernel and system you
> will end up with MTRR disabled. Such is the case for example when
> booting with Xen, which disables the CPU bits on the hypervisor code.
> If you boot the same system without Xen you'll get MTRR.
Your text is missing some words. You seem to be using "up" as a verb,
but it's not a verb. Maybe you meant "end up"? Even then, it
wouldn't make sense for CONFIG_MTRR to be "disabled at run time"
because CONFIG_MTRR is a compile-time switch. The MTRR
*functionality* could certainly be disabled at run-time, but not
CONFIG_MTRR itself.
>> And "... continues to kick through"? Probably some idiomatic
>> usage I'm just too old to understand :)
>
> That means for example that in both the above circumstances even if
> MTRR went disabled at run time with Xen, the kernel went through with
> getting PAT enabled.
"CONFIG_X86_PAT continues to kick through" doesn't seem a very precise
way of describing this. But maybe it's enough for experts in this
area (which I'm not).
Bjorn
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
2015-04-02 20:28 ` Bjorn Helgaas
@ 2015-04-02 21:02 ` Luis R. Rodriguez
2015-04-02 22:09 ` Bjorn Helgaas
0 siblings, 1 reply; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-04-02 21:02 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Konrad Rzeszutek Wilk, Andy Lutomirski, Ingo Molnar,
Thomas Gleixner, H. Peter Anvin, Juergen Gross, Jan Beulich,
Borislav Petkov, Suresh Siddha, venkatesh.pallipadi, Dave Airlie,
linux-kernel, linux-fbdev, x86, xen-devel, Ingo Molnar,
Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen, Dave Hansen, Stefan Bader,
Ville Syrjälä,
David Vrabel, Toshi Kani, Roger Pau Monné,
xen-devel
On Thu, Apr 02, 2015 at 03:28:51PM -0500, Bjorn Helgaas wrote:
> On Thu, Apr 2, 2015 at 3:20 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Thu, Apr 2, 2015 at 1:13 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> >>
> >> On Thu, Mar 26, 2015 at 6:35 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> >>
> >> > I'll rephrase this to:
> >> >
> >> > ---
> >> > It is possible to enable CONFIG_MTRR and up with it
> >> > disabled at run time and yet CONFIG_X86_PAT continues
> >> > to kick through with all functionally enabled. This
> >> > can happen for instance on Xen where MTRR is not
> >> > supported but PAT is, this can happen now on Linux as
> >> > of commit 47591df50 by Juergen introduced as of v3.19.
> >>
> >> I still can't parse this. What does "up with it disabled at run time"
> >> mean?
> >
> > It means that technically even if your CPU/BIOS/system did support
> > MTRR if you use a kernel with MTRR support enabled you might end up
> > with a situation where under one situation MTRR might be enabled and
> > at another run time scenario with the same exact kernel and system you
> > will end up with MTRR disabled. Such is the case for example when
> > booting with Xen, which disables the CPU bits on the hypervisor code.
> > If you boot the same system without Xen you'll get MTRR.
>
> Your text is missing some words. You seem to be using "up" as a verb,
> but it's not a verb. Maybe you meant "end up"?
Indeed.
> Even then, it
> wouldn't make sense for CONFIG_MTRR to be "disabled at run time"
> because CONFIG_MTRR is a compile-time switch. The MTRR
> *functionality* could certainly be disabled at run-time, but not
> CONFIG_MTRR itself.
I'll clarify.
> >> And "... continues to kick through"? Probably some idiomatic
> >> usage I'm just too old to understand :)
> >
> > That means for example that in both the above circumstances even if
> > MTRR went disabled at run time with Xen, the kernel went through with
> > getting PAT enabled.
>
> "CONFIG_X86_PAT continues to kick through" doesn't seem a very precise
> way of describing this. But maybe it's enough for experts in this
> area (which I'm not).
I've rephrased this to:
---
It is possible to enable CONFIG_MTRR and CONFIG_X86_PAT
and end up with a system with MTRR functionality disabled
PAT functionality enabled. This can happen for instance
on Xen where MTRR is not supported but PAT is. This can
happen on Linux as of commit 47591df50 ("xen: Support Xen
pv-domains using PAT") by Juergen, introduced as of v3.19.
---
Luis
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
2015-04-02 21:02 ` Luis R. Rodriguez
@ 2015-04-02 22:09 ` Bjorn Helgaas
2015-04-02 22:12 ` [Xen-devel] " Luis R. Rodriguez
0 siblings, 1 reply; 400+ messages in thread
From: Bjorn Helgaas @ 2015-04-02 22:09 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Konrad Rzeszutek Wilk, Andy Lutomirski, Ingo Molnar,
Thomas Gleixner, H. Peter Anvin, Juergen Gross, Jan Beulich,
Borislav Petkov, Suresh Siddha, venkatesh.pallipadi, Dave Airlie,
linux-kernel, linux-fbdev, x86, xen-devel, Ingo Molnar,
Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen, Dave Hansen, Stefan Bader,
Ville Syrjälä,
David Vrabel, Toshi Kani, Roger Pau Monné,
xen-devel
On Thu, Apr 2, 2015 at 4:02 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> ---
> It is possible to enable CONFIG_MTRR and CONFIG_X86_PAT
> and end up with a system with MTRR functionality disabled
> PAT functionality enabled.
This is missing a conjunction or something in "MTRR functionality
disabled PAT functionality."
Bjorn
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [Xen-devel] [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
2015-04-02 22:09 ` Bjorn Helgaas
@ 2015-04-02 22:12 ` Luis R. Rodriguez
0 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-04-02 22:12 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-fbdev, Daniel Vetter, Dave Hansen, Jan Beulich,
H. Peter Anvin, Ville Syrjälä,
xen-devel, Suresh Siddha, x86, Tomi Valkeinen, xen-devel,
Ingo Molnar, Borislav Petkov, Jean-Christophe Plagniol-Villard,
Antonino Daplas, Stefan Bader, Dave Airlie, Thomas Gleixner,
Ingo Molnar, Juergen Gross, Toshi Kani, linux-kernel,
Andy Lutomirski, David Vrabel, venkatesh.pallipadi,
Roger Pau Monné
On Thu, Apr 2, 2015 at 3:09 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Thu, Apr 2, 2015 at 4:02 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>
>> ---
>> It is possible to enable CONFIG_MTRR and CONFIG_X86_PAT
>> and end up with a system with MTRR functionality disabled
>> PAT functionality enabled.
>
> This is missing a conjunction or something in "MTRR functionality
> disabled PAT functionality."
"and PAT functionality" -- fixed. Thanks.
Luis
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
2015-03-20 23:17 ` [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR Luis R. Rodriguez
2015-03-25 19:59 ` Konrad Rzeszutek Wilk
@ 2015-03-27 20:40 ` Toshi Kani
2015-03-27 23:56 ` Luis R. Rodriguez
1 sibling, 1 reply; 400+ messages in thread
From: Toshi Kani @ 2015-03-27 20:40 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied, linux-kernel, linux-fbdev, x86,
xen-devel, Luis R. Rodriguez, Ingo Molnar, Daniel Vetter,
Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen, Dave Hansen, Stefan Bader, konrad.wilk,
ville.syrjala, david.vrabel, bhelgaas, Roger Pau Monné,
xen-devel
On Fri, 2015-03-20 at 16:17 -0700, Luis R. Rodriguez wrote:
:
> @@ -734,6 +742,7 @@ void __init mtrr_bp_init(void)
> }
>
> if (mtrr_if) {
> + mtrr_enabled = true;
> set_num_var_ranges();
> init_table();
> if (use_intel()) {
get_mtrr_state();
After setting mtrr_enabled to true, get_mtrr_state() reads
MSR_MTRRdefType and sets 'mtrr_state.enabled', which also indicates if
MTRRs are enabled or not on the system. So, potentially, we could have
a case that mtrr_enabled is set to true, but mtrr_state.enabled is set
to disabled when MTRRs are disabled by BIOS.
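The mismatch described above can be modelled in user space. This is only a sketch under stated assumptions, not the kernel code: struct mtrr_model, have_ops and both helper functions are hypothetical names made up for illustration, and the model ignores the use_intel() condition. The bit positions are those of IA32_MTRR_DEF_TYPE (MSR_MTRRdefType) in the Intel SDM, where bit 11 is the global MTRR enable and bit 10 is the fixed-range enable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in IA32_MTRR_DEF_TYPE (MSR_MTRRdefType), per the Intel SDM. */
#define DEF_TYPE_FE (1u << 10)	/* fixed-range MTRRs enabled */
#define DEF_TYPE_E  (1u << 11)	/* MTRRs enabled at all */

/* Hypothetical stand-in for the state discussed above. */
struct mtrr_model {
	bool have_ops;		/* models "mtrr_if != NULL" */
	uint32_t def_type;	/* models the low word of MSR_MTRRdefType */
};

/* The patch as posted: the flag follows mtrr_if only. */
static bool enabled_from_ops_only(const struct mtrr_model *m)
{
	return m->have_ops;
}

/* What the review asks for: also honour the hardware enable bit. */
static bool enabled_from_ops_and_msr(const struct mtrr_model *m)
{
	return m->have_ops && (m->def_type & DEF_TYPE_E) != 0;
}
```

With have_ops set and the E bit clear (the "disabled by BIOS" case), the first helper reports MTRR as enabled while the second does not, which is the inconsistency between mtrr_enabled and mtrr_state.enabled.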
Thanks,
-Toshi
ps.
I recently cleaned up this part of the MTRR code in the patch below,
which is currently available in the -mm & -next trees.
https://lkml.org/lkml/2015/3/24/1063
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
2015-03-27 20:40 ` Toshi Kani
@ 2015-03-27 23:56 ` Luis R. Rodriguez
2015-04-02 21:49 ` Luis R. Rodriguez
0 siblings, 1 reply; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 23:56 UTC (permalink / raw)
To: Toshi Kani
Cc: Luis R. Rodriguez, luto, mingo, tglx, hpa, jgross, JBeulich, bp,
suresh.b.siddha, venkatesh.pallipadi, airlied, linux-kernel,
linux-fbdev, x86, xen-devel, Ingo Molnar, Daniel Vetter,
Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen, Dave Hansen, Stefan Bader, konrad.wilk,
ville.syrjala, david.vrabel, bhelgaas, Roger Pau Monné,
xen-devel
On Fri, Mar 27, 2015 at 02:40:17PM -0600, Toshi Kani wrote:
> On Fri, 2015-03-20 at 16:17 -0700, Luis R. Rodriguez wrote:
> :
> > @@ -734,6 +742,7 @@ void __init mtrr_bp_init(void)
> > }
> >
> > if (mtrr_if) {
> > + mtrr_enabled = true;
> > set_num_var_ranges();
> > init_table();
> > if (use_intel()) {
> get_mtrr_state();
>
> After setting mtrr_enabled to true, get_mtrr_state() reads
> MSR_MTRRdefType and sets 'mtrr_state.enabled', which also indicates if
> MTRRs are enabled or not on the system. So, potentially, we could have
> a case that mtrr_enabled is set to true, but mtrr_state.enabled is set
> to disabled when MTRRs are disabled by BIOS.
Thanks for the review. In that case we should set mtrr_enabled to false.
> ps.
> I recently cleaned up this part of the MTRR code in the patch below,
> which is currently available in the -mm & -next trees.
> https://lkml.org/lkml/2015/3/24/1063
Great, I will rebase on top of that and try to address the
concern you have raised.
Luis
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
2015-03-27 23:56 ` Luis R. Rodriguez
@ 2015-04-02 21:49 ` Luis R. Rodriguez
2015-04-02 23:52 ` Toshi Kani
0 siblings, 1 reply; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-04-02 21:49 UTC (permalink / raw)
To: Toshi Kani
Cc: Luis R. Rodriguez, luto, mingo, tglx, hpa, jgross, JBeulich, bp,
suresh.b.siddha, venkatesh.pallipadi, airlied, linux-kernel,
linux-fbdev, x86, xen-devel, Ingo Molnar, Daniel Vetter,
Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen, Dave Hansen, Stefan Bader, konrad.wilk,
ville.syrjala, david.vrabel, bhelgaas, Roger Pau Monné,
xen-devel
On Sat, Mar 28, 2015 at 12:56:30AM +0100, Luis R. Rodriguez wrote:
> On Fri, Mar 27, 2015 at 02:40:17PM -0600, Toshi Kani wrote:
> > On Fri, 2015-03-20 at 16:17 -0700, Luis R. Rodriguez wrote:
> > :
> > > @@ -734,6 +742,7 @@ void __init mtrr_bp_init(void)
> > > }
> > >
> > > if (mtrr_if) {
> > > + mtrr_enabled = true;
> > > set_num_var_ranges();
> > > init_table();
> > > if (use_intel()) {
> > get_mtrr_state();
> >
> > After setting mtrr_enabled to true, get_mtrr_state() reads
> > MSR_MTRRdefType and sets 'mtrr_state.enabled', which also indicates if
> > MTRRs are enabled or not on the system. So, potentially, we could have
> > a case that mtrr_enabled is set to true, but mtrr_state.enabled is set
> > to disabled when MTRRs are disabled by BIOS.
>
> Thanks for the review, in this case then we should update mtrr_enabled to false.
>
> > ps.
> > I recently cleaned up this part of the MTRR code in the patch below,
> > which is currently available in the -mm & -next trees.
> > https://lkml.org/lkml/2015/3/24/1063
>
> Great I will rebase and work with that and try to address this
> consideration you have raised.
OK, I'll fold this change into my next respin as well:
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index a83f27a..ecf7cb9 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -438,7 +438,7 @@ static void __init print_mtrr_state(void)
}
/* Grab all of the MTRR state for this CPU into *state */
-void __init get_mtrr_state(void)
+bool __init get_mtrr_state(void)
{
struct mtrr_var_range *vrs;
unsigned long flags;
@@ -482,6 +482,8 @@ void __init get_mtrr_state(void)
post_set();
local_irq_restore(flags);
+
+ return !!mtrr_state.enabled;
}
/* Some BIOS's are messed up and don't set all MTRRs the same! */
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index ea5f363..f96195e 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -734,22 +742,25 @@ void __init mtrr_bp_init(void)
}
if (mtrr_if) {
+ mtrr_enabled = true;
set_num_var_ranges();
init_table();
if (use_intel()) {
- get_mtrr_state();
+ /* BIOS may override */
+ mtrr_enabled = get_mtrr_state();
if (mtrr_cleanup(phys_addr)) {
changed_by_mtrr_cleanup = 1;
@@ -745,11 +755,14 @@ void __init mtrr_bp_init(void)
}
}
}
+
+ if (!mtrr_enabled)
+ pr_info("mtrr: system does not support MTRR\n");
}
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index df5e41f..951884d 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -51,7 +51,7 @@ void set_mtrr_prepare_save(struct set_mtrr_context *ctxt);
void fill_mtrr_var_range(unsigned int index,
u32 base_lo, u32 base_hi, u32 mask_lo, u32 mask_hi);
-void get_mtrr_state(void);
+bool get_mtrr_state(void);
extern void set_mtrr_ops(const struct mtrr_ops *ops);
^ permalink raw reply related [flat|nested] 400+ messages in thread
* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
2015-04-02 21:49 ` Luis R. Rodriguez
@ 2015-04-02 23:52 ` Toshi Kani
2015-04-03 1:08 ` Luis R. Rodriguez
0 siblings, 1 reply; 400+ messages in thread
From: Toshi Kani @ 2015-04-02 23:52 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Luis R. Rodriguez, luto, mingo, tglx, hpa, jgross, JBeulich, bp,
suresh.b.siddha, venkatesh.pallipadi, airlied, linux-kernel,
linux-fbdev, x86, xen-devel, Ingo Molnar, Daniel Vetter,
Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen, Dave Hansen, Stefan Bader, konrad.wilk,
ville.syrjala, david.vrabel, bhelgaas, Roger Pau Monné,
xen-devel
On Thu, 2015-04-02 at 23:49 +0200, Luis R. Rodriguez wrote:
> On Sat, Mar 28, 2015 at 12:56:30AM +0100, Luis R. Rodriguez wrote:
> > On Fri, Mar 27, 2015 at 02:40:17PM -0600, Toshi Kani wrote:
> > > On Fri, 2015-03-20 at 16:17 -0700, Luis R. Rodriguez wrote:
> > > :
> > > > @@ -734,6 +742,7 @@ void __init mtrr_bp_init(void)
> > > > }
> > > >
> > > > if (mtrr_if) {
> > > > + mtrr_enabled = true;
> > > > set_num_var_ranges();
> > > > init_table();
> > > > if (use_intel()) {
> > > get_mtrr_state();
> > >
> > > After setting mtrr_enabled to true, get_mtrr_state() reads
> > > MSR_MTRRdefType and sets 'mtrr_state.enabled', which also indicates if
> > > MTRRs are enabled or not on the system. So, potentially, we could have
> > > a case that mtrr_enabled is set to true, but mtrr_state.enabled is set
> > > to disabled when MTRRs are disabled by BIOS.
> >
> > Thanks for the review, in this case then we should update mtrr_enabled to false.
> >
> > > ps.
> > > I recently cleaned up this part of the MTRR code in the patch below,
> > > which is currently available in the -mm & -next trees.
> > > https://lkml.org/lkml/2015/3/24/1063
> >
> > Great I will rebase and work with that and try to address this
> > consideration you have raised.
>
> OK I'll mesh in this change as well in my next respin:
>
> diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
> index a83f27a..ecf7cb9 100644
> --- a/arch/x86/kernel/cpu/mtrr/generic.c
> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> @@ -438,7 +438,7 @@ static void __init print_mtrr_state(void)
> }
>
> /* Grab all of the MTRR state for this CPU into *state */
> -void __init get_mtrr_state(void)
> +bool __init get_mtrr_state(void)
> {
> struct mtrr_var_range *vrs;
> unsigned long flags;
> @@ -482,6 +482,8 @@ void __init get_mtrr_state(void)
>
> post_set();
> local_irq_restore(flags);
> +
> + return !!mtrr_state.enabled;
This should be:
return mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED;
because the MTRR_STATE_MTRR_FIXED_ENABLED flag is ignored when the
MTRR_STATE_MTRR_ENABLED flag is clear.
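A small user-space sketch of the difference, for illustration only. The two flag values below are assumptions chosen to mirror the flag names above (fixed-range enable in bit 0, global enable in bit 1), and both helper functions are hypothetical.

```c
#include <stdbool.h>

/* Assumed values, mirroring the flag names used above. */
#define MTRR_STATE_MTRR_FIXED_ENABLED	0x01
#define MTRR_STATE_MTRR_ENABLED		0x02

/* True if any bit is set, including "fixed enabled" on its own. */
static bool enabled_any_bit(unsigned char enabled)
{
	return !!enabled;
}

/* True only if the global enable bit is set. */
static bool enabled_master_bit(unsigned char enabled)
{
	return enabled & MTRR_STATE_MTRR_ENABLED;
}
```

For enabled == MTRR_STATE_MTRR_FIXED_ENABLED the first helper returns true and the second returns false. Since the return type is bool, the masked value is normalised to 0 or 1 on return, so wrapping it in an explicit !!( ) gives the same result.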
Thanks,
-Toshi
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
2015-04-02 23:52 ` Toshi Kani
@ 2015-04-03 1:08 ` Luis R. Rodriguez
0 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-04-03 1:08 UTC (permalink / raw)
To: Toshi Kani
Cc: Luis R. Rodriguez, luto, mingo, tglx, hpa, jgross, JBeulich, bp,
suresh.b.siddha, venkatesh.pallipadi, airlied, linux-kernel,
linux-fbdev, x86, xen-devel, Ingo Molnar, Daniel Vetter,
Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen, Dave Hansen, Stefan Bader, konrad.wilk,
ville.syrjala, david.vrabel, bhelgaas, Roger Pau Monné,
xen-devel
On Thu, Apr 02, 2015 at 05:52:16PM -0600, Toshi Kani wrote:
> On Thu, 2015-04-02 at 23:49 +0200, Luis R. Rodriguez wrote:
> > On Sat, Mar 28, 2015 at 12:56:30AM +0100, Luis R. Rodriguez wrote:
> > > On Fri, Mar 27, 2015 at 02:40:17PM -0600, Toshi Kani wrote:
> > > > On Fri, 2015-03-20 at 16:17 -0700, Luis R. Rodriguez wrote:
> > > > :
> > > > > @@ -734,6 +742,7 @@ void __init mtrr_bp_init(void)
> > > > > }
> > > > >
> > > > > if (mtrr_if) {
> > > > > + mtrr_enabled = true;
> > > > > set_num_var_ranges();
> > > > > init_table();
> > > > > if (use_intel()) {
> > > > get_mtrr_state();
> > > >
> > > > After setting mtrr_enabled to true, get_mtrr_state() reads
> > > > MSR_MTRRdefType and sets 'mtrr_state.enabled', which also indicates if
> > > > MTRRs are enabled or not on the system. So, potentially, we could have
> > > > a case that mtrr_enabled is set to true, but mtrr_state.enabled is set
> > > > to disabled when MTRRs are disabled by BIOS.
> > >
> > > Thanks for the review, in this case then we should update mtrr_enabled to false.
> > >
> > > > ps.
> > > > I recently cleaned up this part of the MTRR code in the patch below,
> > > > which is currently available in the -mm & -next trees.
> > > > https://lkml.org/lkml/2015/3/24/1063
> > >
> > > Great I will rebase and work with that and try to address this
> > > consideration you have raised.
> >
> > OK I'll mesh in this change as well in my next respin:
> >
> > diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
> > index a83f27a..ecf7cb9 100644
> > --- a/arch/x86/kernel/cpu/mtrr/generic.c
> > +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> > @@ -438,7 +438,7 @@ static void __init print_mtrr_state(void)
> > }
> >
> > /* Grab all of the MTRR state for this CPU into *state */
> > -void __init get_mtrr_state(void)
> > +bool __init get_mtrr_state(void)
> > {
> > struct mtrr_var_range *vrs;
> > unsigned long flags;
> > @@ -482,6 +482,8 @@ void __init get_mtrr_state(void)
> >
> > post_set();
> > local_irq_restore(flags);
> > +
> > + return !!mtrr_state.enabled;
>
> This should be:
> return mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED;
>
> because the MTRR_STATE_MTRR_FIXED_ENABLED flag is ignored when the
> MTRR_STATE_MTRR_ENABLED flag is clear.
Thanks, I've used
return !!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED);
Amended.
Luis
^ permalink raw reply [flat|nested] 400+ messages in thread
* [PATCH v1 03/47] devres: add devm_ioremap_wc()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
2015-03-20 23:17 ` [PATCH v1 01/47] x86: mtrr: annotate mtrr_type_lookup() is only implemented on generic_mtrr_ops Luis R. Rodriguez
2015-03-20 23:17 ` [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR Luis R. Rodriguez
@ 2015-03-20 23:17 ` Luis R. Rodriguez
2015-03-20 23:49 ` Andy Lutomirski
2015-03-20 23:17 ` [PATCH v1 04/47] pci: add pci_ioremap_wc_bar() Luis R. Rodriguez
` (44 subsequent siblings)
47 siblings, 1 reply; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:17 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
We have devm_ioremap_nocache() but no devm_ioremap_wc()
so add that. This will be used later.
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
Documentation/driver-model/devres.txt | 1 +
include/linux/io.h | 2 ++
lib/devres.c | 29 +++++++++++++++++++++++++++++
3 files changed, 32 insertions(+)
diff --git a/Documentation/driver-model/devres.txt b/Documentation/driver-model/devres.txt
index e1e2bbd..831a536 100644
--- a/Documentation/driver-model/devres.txt
+++ b/Documentation/driver-model/devres.txt
@@ -276,6 +276,7 @@ IOMAP
devm_ioport_unmap()
devm_ioremap()
devm_ioremap_nocache()
+ devm_ioremap_wc()
devm_ioremap_resource() : checks resource, requests memory region, ioremaps
devm_iounmap()
pcim_iomap()
diff --git a/include/linux/io.h b/include/linux/io.h
index 4cc299c..91101a1 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -72,6 +72,8 @@ void __iomem *devm_ioremap(struct device *dev, resource_size_t offset,
resource_size_t size);
void __iomem *devm_ioremap_nocache(struct device *dev, resource_size_t offset,
resource_size_t size);
+void __iomem *devm_ioremap_wc(struct device *dev, resource_size_t offset,
+ resource_size_t size);
void devm_iounmap(struct device *dev, void __iomem *addr);
int check_signature(const volatile void __iomem *io_addr,
const unsigned char *signature, int length);
diff --git a/lib/devres.c b/lib/devres.c
index 0f1dd2e..2eb2bfe 100644
--- a/lib/devres.c
+++ b/lib/devres.c
@@ -72,6 +72,35 @@ void __iomem *devm_ioremap_nocache(struct device *dev, resource_size_t offset,
EXPORT_SYMBOL(devm_ioremap_nocache);
/**
+ * devm_ioremap_wc - Managed ioremap_wc()
+ * @dev: Generic device to remap IO address for
+ * @offset: BUS offset to map
+ * @size: Size of map
+ *
+ * Managed ioremap_wc(). Map is automatically unmapped on driver
+ * detach.
+ */
+void __iomem *devm_ioremap_wc(struct device *dev, resource_size_t offset,
+ resource_size_t size)
+{
+ void __iomem **ptr, *addr;
+
+ ptr = devres_alloc(devm_ioremap_release, sizeof(*ptr), GFP_KERNEL);
+ if (!ptr)
+ return NULL;
+
+ addr = ioremap_wc(offset, size);
+ if (addr) {
+ *ptr = addr;
+ devres_add(dev, ptr);
+ } else
+ devres_free(ptr);
+
+ return addr;
+}
+EXPORT_SYMBOL_GPL(devm_ioremap_wc);
+
+/**
* devm_iounmap - Managed iounmap()
* @dev: Generic device to unmap for
* @addr: Address to unmap
--
2.3.2.209.gd67f9d5.dirty
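
The managed-resource pattern the patch above follows (allocate a tracking record, perform the mapping, register the record on success, free it on failure, release everything on detach) can be modelled in user space. This is a rough sketch and not the devres implementation: every name below (fake_device, res_node, fake_ioremap_wc, fake_iounmap, managed_map_wc, fake_detach, live_mappings) is hypothetical, and malloc() merely stands in for the real mapping call.

```c
#include <assert.h>
#include <stdlib.h>

/* One tracking record per managed mapping (stands in for a devres entry). */
struct res_node {
	void *addr;
	struct res_node *next;
};

/* Hypothetical device that owns a list of managed resources. */
struct fake_device {
	struct res_node *resources;
};

static int live_mappings;	/* counts mappings not yet released */

/* Stand-in for ioremap_wc(): fails for a zero-sized request. */
static void *fake_ioremap_wc(size_t size)
{
	void *p = size ? malloc(size) : NULL;

	if (p)
		live_mappings++;
	return p;
}

static void fake_iounmap(void *addr)
{
	free(addr);
	live_mappings--;
}

/* Same ordering as the patch: allocate the record first, then map. */
static void *managed_map_wc(struct fake_device *dev, size_t size)
{
	struct res_node *node;
	void *addr;

	assert(dev != NULL);

	node = malloc(sizeof(*node));
	if (!node)
		return NULL;

	addr = fake_ioremap_wc(size);
	if (!addr) {
		free(node);		/* mapping failed: drop the record */
		return NULL;
	}

	node->addr = addr;		/* mapping succeeded: register it */
	node->next = dev->resources;
	dev->resources = node;
	return addr;
}

/* Models driver detach: every registered mapping is released. */
static void fake_detach(struct fake_device *dev)
{
	while (dev->resources) {
		struct res_node *node = dev->resources;

		dev->resources = node->next;
		fake_iounmap(node->addr);
		free(node);
	}
}
```

A caller only maps and never unmaps explicitly; fake_detach() plays the role of the automatic cleanup on driver detach that the kernel-doc comment in the patch describes.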
^ permalink raw reply related [flat|nested] 400+ messages in thread
* Re: [PATCH v1 03/47] devres: add devm_ioremap_wc()
2015-03-20 23:17 ` [PATCH v1 03/47] devres: add devm_ioremap_wc() Luis R. Rodriguez
@ 2015-03-20 23:49 ` Andy Lutomirski
2015-03-25 19:50 ` Luis R. Rodriguez
0 siblings, 1 reply; 400+ messages in thread
From: Andy Lutomirski @ 2015-03-20 23:49 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Juergen Gross,
Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
Dave Airlie, linux-kernel, Linux Fbdev development list, X86 ML,
xen-devel, Luis R. Rodriguez, Ingo Molnar, Daniel Vetter,
Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen
On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> We have devm_ioremap_nocache() but no devm_ioremap_wc()
> so add that. This will be used later.
>
> Cc: Suresh Siddha <suresh.b.siddha@intel.com>
> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Antonino Daplas <adaplas@gmail.com>
> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> Cc: linux-fbdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Looks good to me.
> ---
> Documentation/driver-model/devres.txt | 1 +
> include/linux/io.h | 2 ++
> lib/devres.c | 29 +++++++++++++++++++++++++++++
> 3 files changed, 32 insertions(+)
>
> diff --git a/Documentation/driver-model/devres.txt b/Documentation/driver-model/devres.txt
> index e1e2bbd..831a536 100644
> --- a/Documentation/driver-model/devres.txt
> +++ b/Documentation/driver-model/devres.txt
> @@ -276,6 +276,7 @@ IOMAP
> devm_ioport_unmap()
> devm_ioremap()
> devm_ioremap_nocache()
> + devm_ioremap_wc()
> devm_ioremap_resource() : checks resource, requests memory region, ioremaps
> devm_iounmap()
> pcim_iomap()
> diff --git a/include/linux/io.h b/include/linux/io.h
> index 4cc299c..91101a1 100644
> --- a/include/linux/io.h
> +++ b/include/linux/io.h
> @@ -72,6 +72,8 @@ void __iomem *devm_ioremap(struct device *dev, resource_size_t offset,
> resource_size_t size);
> void __iomem *devm_ioremap_nocache(struct device *dev, resource_size_t offset,
> resource_size_t size);
> +void __iomem *devm_ioremap_wc(struct device *dev, resource_size_t offset,
> + resource_size_t size);
> void devm_iounmap(struct device *dev, void __iomem *addr);
> int check_signature(const volatile void __iomem *io_addr,
> const unsigned char *signature, int length);
> diff --git a/lib/devres.c b/lib/devres.c
> index 0f1dd2e..2eb2bfe 100644
> --- a/lib/devres.c
> +++ b/lib/devres.c
> @@ -72,6 +72,35 @@ void __iomem *devm_ioremap_nocache(struct device *dev, resource_size_t offset,
> EXPORT_SYMBOL(devm_ioremap_nocache);
>
> /**
> + * devm_ioremap_wc - Managed ioremap_wc()
> + * @dev: Generic device to remap IO address for
> + * @offset: BUS offset to map
> + * @size: Size of map
> + *
> + * Managed ioremap_wc(). Map is automatically unmapped on driver
> + * detach.
> + */
> +void __iomem *devm_ioremap_wc(struct device *dev, resource_size_t offset,
> + resource_size_t size)
> +{
> + void __iomem **ptr, *addr;
> +
> + ptr = devres_alloc(devm_ioremap_release, sizeof(*ptr), GFP_KERNEL);
> + if (!ptr)
> + return NULL;
> +
> + addr = ioremap_wc(offset, size);
> + if (addr) {
> + *ptr = addr;
> + devres_add(dev, ptr);
> + } else
> + devres_free(ptr);
> +
> + return addr;
> +}
> +EXPORT_SYMBOL_GPL(devm_ioremap_wc);
> +
> +/**
> * devm_iounmap - Managed iounmap()
> * @dev: Generic device to unmap for
> * @addr: Address to unmap
> --
> 2.3.2.209.gd67f9d5.dirty
>
--
Andy Lutomirski
AMA Capital Management, LLC
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v1 03/47] devres: add devm_ioremap_wc()
2015-03-20 23:49 ` Andy Lutomirski
@ 2015-03-25 19:50 ` Luis R. Rodriguez
0 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-25 19:50 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
Juergen Gross, Jan Beulich, Borislav Petkov, Suresh Siddha,
venkatesh.pallipadi, Dave Airlie, linux-kernel,
Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen
On Fri, Mar 20, 2015 at 04:49:51PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
> <mcgrof@do-not-panic.com> wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >
> > We have devm_ioremap_nocache() but no devm_ioremap_wc()
> > so add that. This will be used later.
> >
> > Cc: Suresh Siddha <suresh.b.siddha@intel.com>
> > Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> > Cc: Ingo Molnar <mingo@elte.hu>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Juergen Gross <jgross@suse.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Andy Lutomirski <luto@amacapital.net>
> > Cc: Dave Airlie <airlied@redhat.com>
> > Cc: Antonino Daplas <adaplas@gmail.com>
> > Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> > Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> > Cc: linux-fbdev@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
>
> Looks good to me.
Thanks, I'll peg a Reviewed-by.
Luis
^ permalink raw reply [flat|nested] 400+ messages in thread
* [PATCH v1 04/47] pci: add pci_ioremap_wc_bar()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (2 preceding siblings ...)
2015-03-20 23:17 ` [PATCH v1 03/47] devres: add devm_ioremap_wc() Luis R. Rodriguez
@ 2015-03-20 23:17 ` Luis R. Rodriguez
2015-03-20 23:50 ` Andy Lutomirski
2015-03-25 20:03 ` [Xen-devel] " Konrad Rzeszutek Wilk
2015-03-20 23:17 ` [PATCH v1 05/47] pci: add pci_iomap_wc() variants Luis R. Rodriguez
` (43 subsequent siblings)
47 siblings, 2 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:17 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
This lets drivers take advanate of PAT when available. This
should help with the transition of converting video drivers over
to ioremap_wc() to help with the goal of eventually using
_PAGE_CACHE_UC over _PAGE_CACHE_UC_MINUS on x86 on
ioremap_nocache() (de33c442e)
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/pci/pci.c | 14 ++++++++++++++
include/linux/pci.h | 1 +
2 files changed, 15 insertions(+)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 81f06e8..6afd507 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -137,6 +137,20 @@ void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar)
pci_resource_len(pdev, bar));
}
EXPORT_SYMBOL_GPL(pci_ioremap_bar);
+
+void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar)
+{
+ /*
+ * Make sure the BAR is actually a memory resource, not an IO resource
+ */
+ if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) {
+ WARN_ON(1);
+ return NULL;
+ }
+ return ioremap_wc(pci_resource_start(pdev, bar),
+ pci_resource_len(pdev, bar));
+}
+EXPORT_SYMBOL_GPL(pci_ioremap_wc_bar);
#endif
#define PCI_FIND_CAP_TTL 48
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 211e9da..c235b09 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1667,6 +1667,7 @@ static inline void pci_mmcfg_late_init(void) { }
int pci_ext_cfg_avail(void);
void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar);
+void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar);
#ifdef CONFIG_PCI_IOV
int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
--
2.3.2.209.gd67f9d5.dirty
^ permalink raw reply related [flat|nested] 400+ messages in thread
* Re: [PATCH v1 04/47] pci: add pci_ioremap_wc_bar()
2015-03-20 23:17 ` [PATCH v1 04/47] pci: add pci_ioremap_wc_bar() Luis R. Rodriguez
@ 2015-03-20 23:50 ` Andy Lutomirski
2015-03-25 20:06 ` Luis R. Rodriguez
2015-03-25 20:03 ` [Xen-devel] " Konrad Rzeszutek Wilk
1 sibling, 1 reply; 400+ messages in thread
From: Andy Lutomirski @ 2015-03-20 23:50 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Juergen Gross,
Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
Dave Airlie, linux-kernel, Linux Fbdev development list, X86 ML,
xen-devel, Luis R. Rodriguez, Ingo Molnar, Daniel Vetter,
Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen
On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> This lets drivers take advanate of PAT when available. This
> should help with the transition of converting video drivers over
> to ioremap_wc() to help with the goal of eventually using
> _PAGE_CACHE_UC over _PAGE_CACHE_UC_MINUS on x86 on
> ioremap_nocache() (de33c442e)
>
> Cc: Suresh Siddha <suresh.b.siddha@intel.com>
> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Antonino Daplas <adaplas@gmail.com>
> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> Cc: linux-fbdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> ---
> drivers/pci/pci.c | 14 ++++++++++++++
> include/linux/pci.h | 1 +
> 2 files changed, 15 insertions(+)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 81f06e8..6afd507 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -137,6 +137,20 @@ void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar)
> pci_resource_len(pdev, bar));
> }
> EXPORT_SYMBOL_GPL(pci_ioremap_bar);
> +
> +void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar)
> +{
> + /*
> + * Make sure the BAR is actually a memory resource, not an IO resource
> + */
> + if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) {
> + WARN_ON(1);
> + return NULL;
> + }
if (WARN_ON(...))?
--Andy
^ permalink raw reply [flat|nested] 400+ messages in thread
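For readers unfamiliar with the idiom suggested above: the kernel's WARN_ON() evaluates to the truth value of its condition, so the test and the warning can share one expression. The sketch below compares the two styles in userspace. `warn_on_impl()`, `FAKE_RESOURCE_MEM`, `check_posted()` and `check_suggested()` are hypothetical stand-ins for illustration only; the real macro lives in the kernel and also prints a stack trace.

```c
#include <stdio.h>

/* Userspace stand-in for WARN_ON(): report, then return the condition. */
static int warn_on_impl(int cond, const char *file, int line)
{
	if (cond)
		fprintf(stderr, "WARNING at %s:%d\n", file, line);
	return cond;
}
#define WARN_ON(cond) warn_on_impl(!!(cond), __FILE__, __LINE__)

#define FAKE_RESOURCE_MEM 0x200UL  /* hypothetical flag for this sketch */

/* Style as posted: explicit test, then an unconditional warning. */
static int check_posted(unsigned long flags)
{
	if (!(flags & FAKE_RESOURCE_MEM)) {
		WARN_ON(1);
		return 0;
	}
	return 1;
}

/* Suggested style: warn and branch in a single expression. */
static int check_suggested(unsigned long flags)
{
	if (WARN_ON(!(flags & FAKE_RESOURCE_MEM)))
		return 0;
	return 1;
}
```

Both helpers accept and reject the same inputs; the difference is only in how many lines the check takes.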
* Re: [PATCH v1 04/47] pci: add pci_ioremap_wc_bar()
2015-03-20 23:50 ` Andy Lutomirski
@ 2015-03-25 20:06 ` Luis R. Rodriguez
0 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-25 20:06 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
Juergen Gross, Jan Beulich, Borislav Petkov, Suresh Siddha,
venkatesh.pallipadi, Dave Airlie, linux-kernel,
Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen
On Fri, Mar 20, 2015 at 04:50:32PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
> <mcgrof@do-not-panic.com> wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >
> > This lets drivers take advanate of PAT when available. This
> > should help with the transition of converting video drivers over
> > to ioremap_wc() to help with the goal of eventually using
> > _PAGE_CACHE_UC over _PAGE_CACHE_UC_MINUS on x86 on
> > ioremap_nocache() (de33c442e)
> >
> > Cc: Suresh Siddha <suresh.b.siddha@intel.com>
> > Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> > Cc: Ingo Molnar <mingo@elte.hu>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Juergen Gross <jgross@suse.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Andy Lutomirski <luto@amacapital.net>
> > Cc: Dave Airlie <airlied@redhat.com>
> > Cc: Antonino Daplas <adaplas@gmail.com>
> > Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> > Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> > Cc: linux-fbdev@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> > ---
> > drivers/pci/pci.c | 14 ++++++++++++++
> > include/linux/pci.h | 1 +
> > 2 files changed, 15 insertions(+)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 81f06e8..6afd507 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -137,6 +137,20 @@ void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar)
> > pci_resource_len(pdev, bar));
> > }
> > EXPORT_SYMBOL_GPL(pci_ioremap_bar);
> > +
> > +void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar)
> > +{
> > + /*
> > + * Make sure the BAR is actually a memory resource, not an IO resource
> > + */
> > + if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) {
> > + WARN_ON(1);
> > + return NULL;
> > + }
>
> if (WARN_ON(...))?
Sure, they are equivalent, but this follows the exact same style as
pci_ioremap_bar(), so if we change this one we might as well change the style of
pci_ioremap_bar() too. Let me know if there is any preference. I personally
don't mind the extra line, as it shortens the check.
Luis
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [Xen-devel] [PATCH v1 04/47] pci: add pci_ioremap_wc_bar()
2015-03-20 23:17 ` [PATCH v1 04/47] pci: add pci_ioremap_wc_bar() Luis R. Rodriguez
2015-03-20 23:50 ` Andy Lutomirski
@ 2015-03-25 20:03 ` Konrad Rzeszutek Wilk
2015-03-25 20:39 ` Luis R. Rodriguez
1 sibling, 1 reply; 400+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-03-25 20:03 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied, linux-fbdev, Antonino Daplas,
Daniel Vetter, Luis R. Rodriguez, x86, linux-kernel,
Tomi Valkeinen, xen-devel, Ingo Molnar,
Jean-Christophe Plagniol-Villard
On Fri, Mar 20, 2015 at 04:17:54PM -0700, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> This lets drivers take advanate of PAT when available. This
s/advanate/advantage/
> should help with the transition of converting video drivers over
> to ioremap_wc() to help with the goal of eventually using
> _PAGE_CACHE_UC over _PAGE_CACHE_UC_MINUS on x86 on
> ioremap_nocache() (de33c442e)
Please mention the title of the patch too:
"x86 PAT: fix performance drop for glx, use UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()"
>
> Cc: Suresh Siddha <suresh.b.siddha@intel.com>
> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Antonino Daplas <adaplas@gmail.com>
> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> Cc: linux-fbdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> ---
> drivers/pci/pci.c | 14 ++++++++++++++
> include/linux/pci.h | 1 +
> 2 files changed, 15 insertions(+)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 81f06e8..6afd507 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -137,6 +137,20 @@ void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar)
> pci_resource_len(pdev, bar));
> }
> EXPORT_SYMBOL_GPL(pci_ioremap_bar);
> +
> +void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar)
> +{
> + /*
> + * Make sure the BAR is actually a memory resource, not an IO resource
> + */
> + if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) {
> + WARN_ON(1);
Would it be better to use dev_warn()? That way you can see which BDF it is.
Though WARN will give a nice stack trace that should easily point to the
driver, so perhaps not. Either way, up to you.
> + return NULL;
> + }
> + return ioremap_wc(pci_resource_start(pdev, bar),
> + pci_resource_len(pdev, bar));
> +}
> +EXPORT_SYMBOL_GPL(pci_ioremap_wc_bar);
> #endif
>
> #define PCI_FIND_CAP_TTL 48
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 211e9da..c235b09 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1667,6 +1667,7 @@ static inline void pci_mmcfg_late_init(void) { }
> int pci_ext_cfg_avail(void);
>
> void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar);
> +void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar);
>
> #ifdef CONFIG_PCI_IOV
> int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
> --
> 2.3.2.209.gd67f9d5.dirty
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [Xen-devel] [PATCH v1 04/47] pci: add pci_ioremap_wc_bar()
2015-03-25 20:03 ` [Xen-devel] " Konrad Rzeszutek Wilk
@ 2015-03-25 20:39 ` Luis R. Rodriguez
0 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-25 20:39 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk, Arjan van de Ven, Arjan van de Ven
Cc: Luis R. Rodriguez, luto, mingo, tglx, hpa, jgross, JBeulich, bp,
suresh.b.siddha, venkatesh.pallipadi, airlied, linux-fbdev,
Antonino Daplas, Daniel Vetter, x86, linux-kernel,
Tomi Valkeinen, xen-devel, Ingo Molnar,
Jean-Christophe Plagniol-Villard
On Wed, Mar 25, 2015 at 04:03:46PM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Mar 20, 2015 at 04:17:54PM -0700, Luis R. Rodriguez wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >
> > This lets drivers take advanate of PAT when available. This
>
> s/advanate/advantage/
Amended.
> > should help with the transition of converting video drivers over
> > to ioremap_wc() to help with the goal of eventually using
> > _PAGE_CACHE_UC over _PAGE_CACHE_UC_MINUS on x86 on
> > ioremap_nocache() (de33c442e)
>
> Please mention the title of the patch too:
>
> "x86 PAT: fix performance drop for glx, use UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()"
Added.
> >
> > Cc: Suresh Siddha <suresh.b.siddha@intel.com>
> > Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> > Cc: Ingo Molnar <mingo@elte.hu>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Juergen Gross <jgross@suse.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Andy Lutomirski <luto@amacapital.net>
> > Cc: Dave Airlie <airlied@redhat.com>
> > Cc: Antonino Daplas <adaplas@gmail.com>
> > Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> > Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> > Cc: linux-fbdev@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> > ---
> > drivers/pci/pci.c | 14 ++++++++++++++
> > include/linux/pci.h | 1 +
> > 2 files changed, 15 insertions(+)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 81f06e8..6afd507 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -137,6 +137,20 @@ void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar)
> > pci_resource_len(pdev, bar));
> > }
> > EXPORT_SYMBOL_GPL(pci_ioremap_bar);
> > +
> > +void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar)
> > +{
> > + /*
> > + * Make sure the BAR is actually a memory resource, not an IO resource
> > + */
> > + if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) {
> > + WARN_ON(1);
>
> Would it be better to use dev_warn()? That way you can see which BDF it is.
>
> Though WARN will give a nice stack trace that should easily point to the
> driver, so perhaps not. Either way, up to you.
I'm sticking to the same style and usage as pci_ioremap_bar(). Whatever we pick,
we should make both use the same. More information is always better, and
since we do have dev_warn() it would be nice to use it. However, both
pci_ioremap_wc_bar() and pci_ioremap_bar() use pdev with
pci_resource_flags(), and I believe that if pdev is NULL
we'd get a NULL dereference (dev_driver_string() is used), so it
seems best to stick with a simple WARN_ON(). Arjan, any
preference? Obviously if pdev is NULL your driver is dumb, but this should be
expected while folks develop drivers.
Luis
^ permalink raw reply [flat|nested] 400+ messages in thread
* [PATCH v1 05/47] pci: add pci_iomap_wc() variants
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (3 preceding siblings ...)
2015-03-20 23:17 ` [PATCH v1 04/47] pci: add pci_ioremap_wc_bar() Luis R. Rodriguez
@ 2015-03-20 23:17 ` Luis R. Rodriguez
2015-03-23 17:20 ` Bjorn Helgaas
2015-03-25 20:07 ` Konrad Rzeszutek Wilk
2015-03-20 23:17 ` [PATCH v1 06/47] mtrr: add __arch_phys_wc_add() Luis R. Rodriguez
` (42 subsequent siblings)
47 siblings, 2 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:17 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Bjorn Helgaas, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen, Dave Hansen,
Arnd Bergmann, Michael S. Tsirkin, Stefan Bader, konrad.wilk,
ville.syrjala, david.vrabel, jbeulich, toshi.kani,
Roger Pau Monné,
xen-devel
From: "Luis R. Rodriguez" <mcgrof@suse.com>
This allows drivers to take advantage of write-combining
when possible. Ideally we'd have pci_read_bases() just
peg an IORESOURCE_WC flag for us, but where exactly
video device memory lies varies *largely*, and at times things
are mixed with MMIO registers. Sometimes we can address
the changes in drivers; other times the change requires
intrusive changes.
Although there is also arch_phys_wc_add(), which makes use of
architecture-specific write-combining alternatives (MTRR on
x86 when a system does not have PAT), we avoid polluting
pci_iomap() space with it and force drivers and subsystems
that want to use it to be explicit.
There are a few motivations for this:
a) Take advantage of PAT when available
b) Help bury MTRR code away; MTRR is architecture specific and on
x86 it's replaced by PAT
c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: venkatesh.pallipadi@intel.com
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xensource.com
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
include/asm-generic/pci_iomap.h | 14 ++++++++++
lib/pci_iomap.c | 61 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 75 insertions(+)
diff --git a/include/asm-generic/pci_iomap.h b/include/asm-generic/pci_iomap.h
index 7389c87..b1e17fc 100644
--- a/include/asm-generic/pci_iomap.h
+++ b/include/asm-generic/pci_iomap.h
@@ -15,9 +15,13 @@ struct pci_dev;
#ifdef CONFIG_PCI
/* Create a virtual mapping cookie for a PCI BAR (memory or IO) */
extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max);
+extern void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long max);
extern void __iomem *pci_iomap_range(struct pci_dev *dev, int bar,
unsigned long offset,
unsigned long maxlen);
+extern void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar,
+ unsigned long offset,
+ unsigned long maxlen);
/* Create a virtual mapping cookie for a port on a given PCI device.
* Do not call this directly, it exists to make it easier for architectures
* to override */
@@ -34,12 +38,22 @@ static inline void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned lon
return NULL;
}
+static inline void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long max)
+{
+ return NULL;
+}
static inline void __iomem *pci_iomap_range(struct pci_dev *dev, int bar,
unsigned long offset,
unsigned long maxlen)
{
return NULL;
}
+static inline void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar,
+ unsigned long offset,
+ unsigned long maxlen)
+{
+ return NULL;
+}
#endif
#endif /* __ASM_GENERIC_IO_H */
diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c
index bcce5f1..30b65ae 100644
--- a/lib/pci_iomap.c
+++ b/lib/pci_iomap.c
@@ -52,6 +52,46 @@ void __iomem *pci_iomap_range(struct pci_dev *dev,
EXPORT_SYMBOL(pci_iomap_range);
/**
+ * pci_iomap_wc_range - create a virtual WC mapping cookie for a PCI BAR
+ * @dev: PCI device that owns the BAR
+ * @bar: BAR number
+ * @offset: map memory at the given offset in BAR
+ * @maxlen: max length of the memory to map
+ *
+ * Using this function you will get a __iomem address to your device BAR.
+ * You can access it using ioread*() and iowrite*(). These functions hide
+ * the details if this is a MMIO or PIO address space and will just do what
+ * you expect from them in the correct way. When possible write combining
+ * is used.
+ *
+ * @maxlen specifies the maximum length to map. If you want to get access to
+ * the complete BAR from offset to the end, pass %0 here.
+ * */
+void __iomem *pci_iomap_wc_range(struct pci_dev *dev,
+ int bar,
+ unsigned long offset,
+ unsigned long maxlen)
+{
+ resource_size_t start = pci_resource_start(dev, bar);
+ resource_size_t len = pci_resource_len(dev, bar);
+ unsigned long flags = pci_resource_flags(dev, bar);
+
+ if (len <= offset || !start)
+ return NULL;
+ len -= offset;
+ start += offset;
+ if (maxlen && len > maxlen)
+ len = maxlen;
+ if (flags & IORESOURCE_IO)
+ return __pci_ioport_map(dev, start, len);
+ if (flags & IORESOURCE_MEM)
+ return ioremap_wc(start, len);
+ /* What? */
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(pci_iomap_wc_range);
+
+/**
* pci_iomap - create a virtual mapping cookie for a PCI BAR
* @dev: PCI device that owns the BAR
* @bar: BAR number
@@ -70,4 +110,25 @@ void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen)
return pci_iomap_range(dev, bar, 0, maxlen);
}
EXPORT_SYMBOL(pci_iomap);
+
+/**
+ * pci_iomap_wc - create a virtual WC mapping cookie for a PCI BAR
+ * @dev: PCI device that owns the BAR
+ * @bar: BAR number
+ * @maxlen: length of the memory to map
+ *
+ * Using this function you will get a __iomem address to your device BAR.
+ * You can access it using ioread*() and iowrite*(). These functions hide
+ * the details if this is a MMIO or PIO address space and will just do what
+ * you expect from them in the correct way. When possible write combining
+ * is used.
+ *
+ * @maxlen specifies the maximum length to map. If you want to get access to
+ * the complete BAR without checking for its length first, pass %0 here.
+ * */
+void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long maxlen)
+{
+ return pci_iomap_wc_range(dev, bar, 0, maxlen);
+}
+EXPORT_SYMBOL_GPL(pci_iomap_wc);
#endif /* CONFIG_PCI */
--
2.3.2.209.gd67f9d5.dirty
^ permalink raw reply related [flat|nested] 400+ messages in thread
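The offset and @maxlen handling in pci_iomap_wc_range() is easy to misread, so here is a small worked model of just the length arithmetic. `clamp_len()` is a hypothetical helper written for this note, not part of the patch; it returns the length that would be mapped, or 0 in the cases where the patch returns NULL before mapping anything.

```c
#include <stdint.h>

/*
 * Hypothetical model of the length computation in pci_iomap_wc_range().
 * Returns the number of bytes that would be mapped, or 0 where the
 * function bails out with NULL.
 */
static uint64_t clamp_len(uint64_t start, uint64_t len,
			  unsigned long offset, unsigned long maxlen)
{
	if (len <= offset || !start)   /* offset past the BAR, or BAR unset */
		return 0;
	len -= offset;                 /* bytes left from offset to BAR end */
	if (maxlen && len > maxlen)    /* maxlen == 0 means "no limit" */
		len = maxlen;
	return len;
}
```

With a 0x1000-byte BAR, an offset of 0x800 and @maxlen of 0 maps the remaining 0x800 bytes, while the same offset with @maxlen of 0x100 maps only 0x100. An offset equal to the BAR length maps nothing.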
* Re: [PATCH v1 05/47] pci: add pci_iomap_wc() variants
2015-03-20 23:17 ` [PATCH v1 05/47] pci: add pci_iomap_wc() variants Luis R. Rodriguez
@ 2015-03-23 17:20 ` Bjorn Helgaas
2015-03-26 3:00 ` Luis R. Rodriguez
` (2 more replies)
2015-03-25 20:07 ` Konrad Rzeszutek Wilk
1 sibling, 3 replies; 400+ messages in thread
From: Bjorn Helgaas @ 2015-03-23 17:20 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
jgross, Jan Beulich, Borislav Petkov, Suresh Siddha,
venkatesh.pallipadi, Dave Airlie, linux-kernel, linux-fbdev, x86,
xen-devel, Luis R. Rodriguez, Ingo Molnar, Daniel Vetter,
Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Michael S. Tsirkin,
Stefan Bader, Konrad Rzeszutek Wilk, Ville Syrjälä,
David Vrabel, Toshi Kani, Roger Pau Monné,
xen-devel
Hi Luis,
This seems OK to me, but I'm curious about a few things.
On Fri, Mar 20, 2015 at 6:17 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> This allows drivers to take advantage of write-combining
> when possible. Ideally we'd have pci_read_bases() just
> peg an IORESOURCE_WC flag for us
We do set IORESOURCE_PREFETCH. Do you mean something different?
> but where exactly
> video devices memory lie varies *largely* and at times things
> are mixed with MMIO registers, sometimes we can address
> the changes in drivers, other times the change requires
> intrusive changes.
What does a video device address have to do with this? I do see that
if a BAR maps only a frame buffer, the device might be able to mark it
prefetchable, while if the BAR mapped both a frame buffer and some
registers, it might not be able to make it prefetchable. But that
doesn't seem like it depends on the *address*.
pci_iomap_range() already makes a cacheable mapping if
IORESOURCE_CACHEABLE; I'm guessing that you would like it to
automatically use WC if the BAR is IORESOURCE_PREFETCH, e.g.,
if (flags & IORESOURCE_CACHEABLE)
return ioremap(start, len);
if (flags & IORESOURCE_PREFETCH)
return ioremap_wc(start, len);
return ioremap_nocache(start, len);
Is there a reason not to do that?
> Although there is also arch_phys_wc_add(), which makes use of
> architecture-specific write-combining alternatives (MTRR on
> x86 when a system does not have PAT), we avoid polluting
> pci_iomap() space with it and force drivers and subsystems
> that want to use it to be explicit.
>
> There are a few motivations for this:
>
> a) Take advantage of PAT when available
>
> b) Help bury MTRR code away; MTRR is architecture specific and on
> x86 it's replaced by PAT
>
> c) Help with the goal of eventually using _PAGE_CACHE_UC over
> _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
> ...
> +void __iomem *pci_iomap_wc_range(struct pci_dev *dev,
> + int bar,
> + unsigned long offset,
> + unsigned long maxlen)
> +{
> + resource_size_t start = pci_resource_start(dev, bar);
> + resource_size_t len = pci_resource_len(dev, bar);
> + unsigned long flags = pci_resource_flags(dev, bar);
> +
> + if (len <= offset || !start)
> + return NULL;
> + len -= offset;
> + start += offset;
> + if (maxlen && len > maxlen)
> + len = maxlen;
> + if (flags & IORESOURCE_IO)
> + return __pci_ioport_map(dev, start, len);
> + if (flags & IORESOURCE_MEM)
Should we log a note in dmesg if the BAR is *not* IORESOURCE_PREFETCH?
I know the driver might know it's safe even if the device didn't mark
the BAR as prefetchable, but it does seem like an easy way for a
driver to shoot itself in the foot.
> + return ioremap_wc(start, len);
> + /* What? */
> + return NULL;
> +}
^ permalink raw reply [flat|nested] 400+ messages in thread
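The flag-driven selection proposed in the message above can be modelled as a pure function, which makes the precedence between the two flags explicit. This is a sketch of the suggestion only, not of what the posted series does: `enum map_kind`, `FAKE_PREFETCH`, `FAKE_CACHEABLE` and `pick_mapping()` are hypothetical names, and the flag values are placeholders rather than the kernel's IORESOURCE_* constants.

```c
/* Hypothetical model of the proposed mapping choice; not kernel code. */
enum map_kind {
	MAP_CACHED,     /* would call ioremap() */
	MAP_WC,         /* would call ioremap_wc() */
	MAP_UNCACHED    /* would call ioremap_nocache() */
};

#define FAKE_PREFETCH  0x1UL   /* placeholder for IORESOURCE_PREFETCH */
#define FAKE_CACHEABLE 0x2UL   /* placeholder for IORESOURCE_CACHEABLE */

static enum map_kind pick_mapping(unsigned long flags)
{
	if (flags & FAKE_CACHEABLE)
		return MAP_CACHED;
	if (flags & FAKE_PREFETCH)
		return MAP_WC;
	return MAP_UNCACHED;
}
```

In this ordering a BAR flagged both cacheable and prefetchable gets the cached mapping, and a BAR with neither flag falls through to the uncached one. A driver that knows better than the BAR flags would still need an explicit WC call, which is the case the posted pci_iomap_wc() variants cover.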
* Re: [PATCH v1 05/47] pci: add pci_iomap_wc() variants
2015-03-23 17:20 ` Bjorn Helgaas
@ 2015-03-26 3:00 ` Luis R. Rodriguez
2015-04-21 17:52 ` Luis R. Rodriguez
2015-03-27 19:18 ` Toshi Kani
2015-04-21 19:25 ` Michael S. Tsirkin
2 siblings, 1 reply; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-26 3:00 UTC (permalink / raw)
To: Bjorn Helgaas, Arnd Bergmann, Linus Walleij, Stefano Stabellini,
Julia Lawall, Peter Senna Tschudin
Cc: Luis R. Rodriguez, Andy Lutomirski, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin, jgross, Jan Beulich, Borislav Petkov,
Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
linux-fbdev, x86, xen-devel, Ingo Molnar, Daniel Vetter,
Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Michael S. Tsirkin,
Stefan Bader, Konrad Rzeszutek Wilk, Ville Syrjälä,
David Vrabel, Toshi Kani, Roger Pau Monné,
Benjamin Poirier, linux-pci
On Mon, Mar 23, 2015 at 12:20:47PM -0500, Bjorn Helgaas wrote:
> Hi Luis,
>
> This seems OK to me,
Great.
> but I'm curious about a few things.
>
> On Fri, Mar 20, 2015 at 6:17 PM, Luis R. Rodriguez
> <mcgrof@do-not-panic.com> wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >
> > This allows drivers to take advantage of write-combining
> > when possible. Ideally we'd have pci_read_bases() just
> > peg an IORESOURCE_WC flag for us
>
> We do set IORESOURCE_PREFETCH. Do you mean something different?
I did not think we had a WC IORESOURCE flag. Are you saying that we can use
IORESOURCE_PREFETCH for that purpose? If so then great. As I read it, a PCI BAR
can have PCI_BASE_ADDRESS_MEM_PREFETCH set, and when that's the case we peg
IORESOURCE_PREFETCH. That does seem to be what I want. Questions below.
> > but where exactly
> > video devices memory lie varies *largely* and at times things
> > are mixed with MMIO registers, sometimes we can address
> > the changes in drivers, other times the change requires
> > intrusive changes.
>
> What does a video device address have to do with this? I do see that
> if a BAR maps only a frame buffer, the device might be able to mark it
> prefetchable, while if the BAR mapped both a frame buffer and some
> registers, it might not be able to make it prefetchable. But that
> doesn't seem like it depends on the *address*.
I meant the offsets of each of those within the BAR, registers or framebuffer.
They are typically mixed (primarily on older devices), so your summary of the
problem is indeed what I meant. Remember that we are trying to take advantage
of PAT here when available and avoid MTRR in that case. Do we know that the
PCI BARs that have historically used MTRRs also had IORESOURCE_PREFETCH set?
Is that a fair assumption? I realize they are different things, but that is
precisely why I ask.
> pci_iomap_range() already makes a cacheable mapping if
> IORESOURCE_CACHEABLE; I'm guessing that you would like it to
> automatically use WC if the BAR if IORESOURCE_PREFETCH, e.g.,
>
> if (flags & IORESOURCE_CACHEABLE)
> return ioremap(start, len);
> if (flags & IORESOURCE_PREFETCH)
> return ioremap_wc(start, len);
> return ioremap_nocache(start, len);
Indeed, that's exactly what I think we should strive towards.
> Is there a reason not to do that?
This depends on the exact definition of IORESOURCE_PREFETCH and
PCI_BASE_ADDRESS_MEM_PREFETCH and how they are used all over and
across *all devices*. This didn't look promising for starters:
include/uapi/linux/pci_regs.h:#define PCI_BASE_ADDRESS_MEM_PREFETCH 0x08 /* prefetchable? */
PCI_BASE_ADDRESS_MEM_PREFETCH seems to be BAR specific, so a few questions:
1) Can we rest assured that if PCI_BASE_ADDRESS_MEM_PREFETCH is set on a PCI
BAR, the *full* PCI BAR does want WC? If not, acting on it automatically can
regress functionality. That seems risky. It would not be risky, however, if we
used another API that looks for IORESOURCE_PREFETCH and only then uses
ioremap_wc() -- that way only drivers we know do use the full PCI BAR for WC
would use this API. There's a bit of a problem with this though:
2) Do we know that if a *full PCI BAR* is used for WC that
PCI_BASE_ADDRESS_MEM_PREFETCH *was* definitely set for the PCI BAR? If so then
the API usage would be restricted only to devices that we know *do* adhere to
this. That reduces the possible uses for older drivers and can create
regressions if used loosely without verification... but..
3) If from now on we get folks to commit to using PCI_BASE_ADDRESS_MEM_PREFETCH
for full PCI BARs that do want WC, perhaps newer devices / drivers will use
this very consistently? Can we bank on that, and is it worth it?
4) If a PCI BAR *does not* have PCI_BASE_ADDRESS_MEM_PREFETCH, do we know it
must never want WC?
If we don't have certainty on any of the above I'm afraid we can't do much
right now, but perhaps we can push towards better use of PCI_BASE_ADDRESS_MEM_PREFETCH
and hope folks will set it on a full PCI BAR only if WC is desired.
Thoughts?
> > Although there is also arch_phys_wc_add() that makes use of
> > architecture specific write-combinging alternatives (MTRR on
> > x86 when a system does not have PAT) we void polluting
> > pci_iomap() space with it and force drivers and subsystems
> > that want to use it to be explicit.
> >
> > There are a few motivations for this:
> >
> > a) Take advantage of PAT when available
> >
> > b) Help bury MTRR code away, MTRR is architecture specific and on
> > x86 its replaced by PAT
> >
> > c) Help with the goal of eventually using _PAGE_CACHE_UC over
> > _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
> > ...
>
> > +void __iomem *pci_iomap_wc_range(struct pci_dev *dev,
> > + int bar,
> > + unsigned long offset,
> > + unsigned long maxlen)
> > +{
> > + resource_size_t start = pci_resource_start(dev, bar);
> > + resource_size_t len = pci_resource_len(dev, bar);
> > + unsigned long flags = pci_resource_flags(dev, bar);
> > +
> > + if (len <= offset || !start)
> > + return NULL;
> > + len -= offset;
> > + start += offset;
> > + if (maxlen && len > maxlen)
> > + len = maxlen;
> > + if (flags & IORESOURCE_IO)
> > + return __pci_ioport_map(dev, start, len);
> > + if (flags & IORESOURCE_MEM)
>
> Should we log a note in dmesg if the BAR is *not* IORESOURCE_PREFETCH?
> I know the driver might know it's safe even if the device didn't mark
> the BAR as prefetchable, but it does seem like an easy way for a
> driver to shoot itself in the foot.
You tell me. I would fear this may not be consistent and we'd end up
having bug reports open for something that has historically been a
non-issue. The above questions can help us gauge the risk of this.
Luis
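[Editorial sketch, not part of the patch] The range handling in the quoted
pci_iomap_wc_range(), together with the dmesg note Bjorn suggests, can be
modeled in user space roughly as below. The struct and function names are
hypothetical, only memory BARs are modeled, and the "note" flag merely marks
where such a message would fire; nothing here is kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type; the kernel function returns a pointer instead. */
struct wc_window {
	bool valid;            /* false where the patch returns NULL */
	bool note_no_prefetch; /* true where the suggested dmesg note would fire */
	uint64_t start;        /* start of the range handed to ioremap_wc() */
	uint64_t len;          /* length of that range */
};

/*
 * User-space model of the checks in the quoted pci_iomap_wc_range():
 * reject an unassigned BAR or an offset past its end, then clamp the
 * remaining length to maxlen when maxlen is non-zero.
 */
static struct wc_window compute_wc_window(uint64_t bar_start, uint64_t bar_len,
					  uint64_t offset, uint64_t maxlen,
					  bool bar_prefetchable)
{
	struct wc_window w = { false, false, 0, 0 };

	if (bar_len <= offset || !bar_start)
		return w;

	w.start = bar_start + offset;
	w.len = bar_len - offset;
	if (maxlen && w.len > maxlen)
		w.len = maxlen;
	assert(w.len > 0); /* holds because bar_len > offset */

	w.valid = true;
	/* The driver asked for WC although the BAR is not marked prefetchable. */
	w.note_no_prefetch = !bar_prefetchable;
	return w;
}
```

Whether the note should be emitted at all is exactly the open question in
this exchange; the sketch only shows that it is cheap to compute.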
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v1 05/47] pci: add pci_iomap_wc() variants
2015-03-26 3:00 ` Luis R. Rodriguez
@ 2015-04-21 17:52 ` Luis R. Rodriguez
2015-04-21 18:46 ` Michael S. Tsirkin
0 siblings, 1 reply; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-04-21 17:52 UTC (permalink / raw)
To: Bjorn Helgaas, Arnd Bergmann, Linus Walleij, Stefano Stabellini,
Julia Lawall, Peter Senna Tschudin, Sarah Sharp
Cc: Luis R. Rodriguez, Andy Lutomirski, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin, jgross, Jan Beulich, Borislav Petkov,
Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
linux-fbdev, x86, xen-devel, Ingo Molnar, Daniel Vetter,
Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Michael S. Tsirkin,
Stefan Bader, Konrad Rzeszutek Wilk, Ville Syrjälä,
David Vrabel, Toshi Kani, Roger Pau Monné,
Benjamin Poirier, linux-pci
On Thu, Mar 26, 2015 at 04:00:54AM +0100, Luis R. Rodriguez wrote:
> On Mon, Mar 23, 2015 at 12:20:47PM -0500, Bjorn Helgaas wrote:
> > Hi Luis,
> >
> > This seems OK to me,
>
> Great.
>
> > but I'm curious about a few things.
> >
> > On Fri, Mar 20, 2015 at 6:17 PM, Luis R. Rodriguez
> > <mcgrof@do-not-panic.com> wrote:
> > > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> > >
> > > This allows drivers to take advantage of write-combining
> > > when possible. Ideally we'd have pci_read_bases() just
> > > peg an IORESOURCE_WC flag for us
> >
> > We do set IORESOURCE_PREFETCH. Do you mean something different?
>
> I did not think we had a WC IORESOURCE flag. Are you saying that we can use
> IORESOURCE_PREFETCH for that purpose? If so then great. As I read a PCI BAR
> can have PCI_BASE_ADDRESS_MEM_PREFETCH and when that's the case we peg
> IORESOURCE_PREFETCH. That seems to be what I want indeed. Questions below.
>
> > > but where exactly
> > > video devices memory lie varies *largely* and at times things
> > > are mixed with MMIO registers, sometimes we can address
> > > the changes in drivers, other times the change requires
> > > intrusive changes.
> >
> > What does a video device address have to do with this? I do see that
> > if a BAR maps only a frame buffer, the device might be able to mark it
> > prefetchable, while if the BAR mapped both a frame buffer and some
> > registers, it might not be able to make it prefetchable. But that
> > doesn't seem like it depends on the *address*.
>
> I meant the offsets for each of those, either registers or framebuffer,
> and that typically they are mixed (primarily on older devices), so indeed your
> summary of the problem is what I meant. Let's remember that we are trying to
> take advantage of PAT here when available and avoid MTRR in that case, do we
> know that the same PCI BARs that have always historically used MTRRs had
> IORESOURCE_PREFETCH set, is that a fair assumption ? I realize they are
> different things -- but it's precisely why I ask.
>
> > pci_iomap_range() already makes a cacheable mapping if
> > IORESOURCE_CACHEABLE; I'm guessing that you would like it to
> > automatically use WC if the BAR if IORESOURCE_PREFETCH, e.g.,
> >
> > if (flags & IORESOURCE_CACHEABLE)
> > return ioremap(start, len);
> > if (flags & IORESOURCE_PREFETCH)
> > return ioremap_wc(start, len);
> > return ioremap_nocache(start, len);
>
> Indeed, that's exactly what I think we should strive towards.
>
> > Is there a reason not to do that?
>
> This depends on the exact definition of IORESOURCE_PREFETCH and
> PCI_BASE_ADDRESS_MEM_PREFETCH and how they are used all over and
> across *all devices*. This didn't look promising for starters:
>
> include/uapi/linux/pci_regs.h:#define PCI_BASE_ADDRESS_MEM_PREFETCH 0x08 /* prefetchable? */
>
> PCI_BASE_ADDRESS_MEM_PREFETCH seems to be BAR specific, so a few questions:
>
> 1) Can we rest assured for instance that if we check for
> PCI_BASE_ADDRESS_MEM_PREFETCH and if set that it will *only* be set on a full
> PCI BAR if the full PCI BAR does want WC? If not this can regress
> functionality. That seems risky. It however would not be risky if we used
> another API that did look for IORESOURCE_PREFETCH and if so use ioremap_wc() --
> that way only drivers we know that do use the full PCI bar would use this API.
> There's a bit of a problem with this though:
>
> 2) Do we know that if a *full PCI BAR* is used for WC that
> PCI_BASE_ADDRESS_MEM_PREFETCH *was* definitely set for the PCI BAR? If so then
> the API usage would be restricted only to devices that we know *do* adhere to
> this. That reduces the possible uses for older drivers and can create
> regressions if used loosely without verification... but..
>
> 3) If from now on we get folks to commit to use PCI_BASE_ADDRESS_MEM_PREFETCH
> for full PCI BARs that do want WC perhaps newer devices / drivers will use
> this very consistently ? Can we bank on that and is it worth it ?
>
> 4) If a PCI BAR *does not* have PCI_BASE_ADDRESS_MEM_PREFETCH do we know it
> must never want WC ?
>
> If we don't have certainty on any of the above I'm afraid we can't do much
> right now but perhaps we can push towards better use of PCI_BASE_ADDRESS_MEM_PREFETCH
> and hope folks will only use this for the full PCI BAR only if WC is desired.
>
> Thoughts?
Bjorn, now that you're done schooling me on English, any thoughts on the above?
> > > Although there is also arch_phys_wc_add() that makes use of
> > > architecture specific write-combinging alternatives (MTRR on
> > > x86 when a system does not have PAT) we void polluting
> > > pci_iomap() space with it and force drivers and subsystems
> > > that want to use it to be explicit.
> > >
> > > There are a few motivations for this:
> > >
> > > a) Take advantage of PAT when available
> > >
> > > b) Help bury MTRR code away, MTRR is architecture specific and on
> > > x86 its replaced by PAT
> > >
> > > c) Help with the goal of eventually using _PAGE_CACHE_UC over
> > > _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
> > > ...
> >
> > > +void __iomem *pci_iomap_wc_range(struct pci_dev *dev,
> > > + int bar,
> > > + unsigned long offset,
> > > + unsigned long maxlen)
> > > +{
> > > + resource_size_t start = pci_resource_start(dev, bar);
> > > + resource_size_t len = pci_resource_len(dev, bar);
> > > + unsigned long flags = pci_resource_flags(dev, bar);
> > > +
> > > + if (len <= offset || !start)
> > > + return NULL;
> > > + len -= offset;
> > > + start += offset;
> > > + if (maxlen && len > maxlen)
> > > + len = maxlen;
> > > + if (flags & IORESOURCE_IO)
> > > + return __pci_ioport_map(dev, start, len);
> > > + if (flags & IORESOURCE_MEM)
> >
> > Should we log a note in dmesg if the BAR is *not* IORESOURCE_PREFETCH?
> > I know the driver might know it's safe even if the device didn't mark
> > the BAR as prefetchable, but it does seem like an easy way for a
> > driver to shoot itself in the foot.
>
> You tell me. I would fear this may not be consistent and we'd end up
> having bug reports open for something that has historically been a
> non-issue. The above questions can help us gauge the risk of this.
Now, I'll tell you what I *think*, but these are just guesstimates (TM):
* PCI_BASE_ADDRESS_MEM_PREFETCH likely can imply IORESOURCE_PREFETCH
and use of ioremap_wc() on a full PCI BAR only, but this strict
definition likely cannot be 100% guaranteed and could break some
devices. We need something more concrete and well known, so
that next generation industry standards embrace it and let us in
the kernel automatically detect specific ranges and their respective
page attribute requirements. It might be good to address both the x86
and ARM families here.
Curious: Sarah, how does USB address these different page attribute
needs on USB 3.0?
Luis
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v1 05/47] pci: add pci_iomap_wc() variants
2015-04-21 17:52 ` Luis R. Rodriguez
@ 2015-04-21 18:46 ` Michael S. Tsirkin
0 siblings, 0 replies; 400+ messages in thread
From: Michael S. Tsirkin @ 2015-04-21 18:46 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Bjorn Helgaas, Arnd Bergmann, Linus Walleij, Stefano Stabellini,
Julia Lawall, Peter Senna Tschudin, Sarah Sharp,
Luis R. Rodriguez, Andy Lutomirski, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin, jgross, Jan Beulich, Borislav Petkov,
Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
linux-fbdev, x86, xen-devel, Ingo Molnar, Daniel Vetter,
Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Stefan Bader,
Konrad Rzeszutek Wilk, Ville Syrjälä,
David Vrabel, Toshi Kani, Roger Pau Monné,
Benjamin Poirier, linux-pci
On Tue, Apr 21, 2015 at 07:52:49PM +0200, Luis R. Rodriguez wrote:
> On Thu, Mar 26, 2015 at 04:00:54AM +0100, Luis R. Rodriguez wrote:
> > On Mon, Mar 23, 2015 at 12:20:47PM -0500, Bjorn Helgaas wrote:
> > > Hi Luis,
> > >
> > > This seems OK to me,
> >
> > Great.
> >
> > > but I'm curious about a few things.
> > >
> > > On Fri, Mar 20, 2015 at 6:17 PM, Luis R. Rodriguez
> > > <mcgrof@do-not-panic.com> wrote:
> > > > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> > > >
> > > > This allows drivers to take advantage of write-combining
> > > > when possible. Ideally we'd have pci_read_bases() just
> > > > peg an IORESOURCE_WC flag for us
> > >
> > > We do set IORESOURCE_PREFETCH. Do you mean something different?
> >
> > I did not think we had a WC IORESOURCE flag. Are you saying that we can use
> > IORESOURCE_PREFETCH for that purpose? If so then great. As I read a PCI BAR
> > can have PCI_BASE_ADDRESS_MEM_PREFETCH and when that's the case we peg
> > IORESOURCE_PREFETCH. That seems to be what I want indeed. Questions below.
> >
> > > > but where exactly
> > > > video devices memory lie varies *largely* and at times things
> > > > are mixed with MMIO registers, sometimes we can address
> > > > the changes in drivers, other times the change requires
> > > > intrusive changes.
> > >
> > > What does a video device address have to do with this? I do see that
> > > if a BAR maps only a frame buffer, the device might be able to mark it
> > > prefetchable, while if the BAR mapped both a frame buffer and some
> > > registers, it might not be able to make it prefetchable. But that
> > > doesn't seem like it depends on the *address*.
> >
> > I meant the offsets for each of those, either registers or framebuffer,
> > and that typically they are mixed (primarily on older devices), so indeed your
> > summary of the problem is what I meant. Let's remember that we are trying to
> > take advantage of PAT here when available and avoid MTRR in that case, do we
> > know that the same PCI BARs that have always historically used MTRRs had
> > IORESOURCE_PREFETCH set, is that a fair assumption ? I realize they are
> > different things -- but it's precisely why I ask.
> >
> > > pci_iomap_range() already makes a cacheable mapping if
> > > IORESOURCE_CACHEABLE; I'm guessing that you would like it to
> > > automatically use WC if the BAR if IORESOURCE_PREFETCH, e.g.,
> > >
> > > if (flags & IORESOURCE_CACHEABLE)
> > > return ioremap(start, len);
> > > if (flags & IORESOURCE_PREFETCH)
> > > return ioremap_wc(start, len);
> > > return ioremap_nocache(start, len);
> >
> > Indeed, that's exactly what I think we should strive towards.
> >
> > > Is there a reason not to do that?
> >
> > This depends on the exact definition of IORESOURCE_PREFETCH and
> > PCI_BASE_ADDRESS_MEM_PREFETCH and how they are used all over and
> > across *all devices*. This didn't look promising for starters:
> >
> > include/uapi/linux/pci_regs.h:#define PCI_BASE_ADDRESS_MEM_PREFETCH 0x08 /* prefetchable? */
> >
> > PCI_BASE_ADDRESS_MEM_PREFETCH seems to be BAR specific, so a few questions:
> >
> > 1) Can we rest assured for instance that if we check for
> > PCI_BASE_ADDRESS_MEM_PREFETCH and if set that it will *only* be set on a full
> > PCI BAR if the full PCI BAR does want WC? If not this can regress
> > functionality. That seems risky. It however would not be risky if we used
> > another API that did look for IORESOURCE_PREFETCH and if so use ioremap_wc() --
> > that way only drivers we know that do use the full PCI bar would use this API.
> > There's a bit of a problem with this though:
> >
> > 2) Do we know that if a *full PCI BAR* is used for WC that
> > PCI_BASE_ADDRESS_MEM_PREFETCH *was* definitely set for the PCI BAR? If so then
> > the API usage would be restricted only to devices that we know *do* adhere to
> > this. That reduces the possible uses for older drivers and can create
> > regressions if used loosely without verification... but..
> >
In theory, the PCI spec says this about prefetchable memory:
Bridges are permitted to merge writes into this range (refer to Section 3.2.6).
Exceptions could be:
- devices not behind a bridge (e.g. integrated in a root
complex)
- devices behind a virtual bridge from the same vendor
(which know the bridge won't prefetch)
I worry that WC might also cause more reordering though. I don't
remember whether this is true, off-hand. Bridges can only reorder transactions
according to very specific rules.
> > 3) If from now on we get folks to commit to use PCI_BASE_ADDRESS_MEM_PREFETCH
> > for full PCI BARs that do want WC perhaps newer devices / drivers will use
> > this very consistently ? Can we bank on that and is it worth it ?
Unfortunately there's a separate good reason to mark memory as prefetchable:
it's the only way to get 64-bit addresses for devices behind bridges.
So WC might be *safe* for prefetchable BARs, but might not be a good idea.
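[Editorial sketch] To make the point above concrete: the prefetchable bit and
the 64-bit type are independent fields in the low bits of a memory BAR, so a
device may set "prefetchable" purely to obtain a 64-bit window behind a
bridge. A minimal user-space decode, assuming the bit values from
include/uapi/linux/pci_regs.h; the BAR_* names, struct bar_info and
decode_bar_bits() are hypothetical and not kernel API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Low bits of a BAR; values match the named constants in pci_regs.h. */
#define BAR_SPACE_IO      0x01u /* PCI_BASE_ADDRESS_SPACE_IO */
#define BAR_MEM_TYPE_MASK 0x06u /* PCI_BASE_ADDRESS_MEM_TYPE_MASK */
#define BAR_MEM_TYPE_64   0x04u /* PCI_BASE_ADDRESS_MEM_TYPE_64 */
#define BAR_MEM_PREFETCH  0x08u /* PCI_BASE_ADDRESS_MEM_PREFETCH */

/* Hypothetical summary of one BAR's type bits. */
struct bar_info {
	bool is_io;
	bool is_64bit;
	bool prefetchable;
};

static struct bar_info decode_bar_bits(uint32_t bar)
{
	struct bar_info info = { false, false, false };

	/* I/O BARs carry neither a memory type nor a prefetchable bit. */
	if (bar & BAR_SPACE_IO) {
		info.is_io = true;
		return info;
	}

	info.is_64bit = (bar & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64;
	/* This says bridges may merge writes and prefetch reads; it does
	 * not say whether the driver wants a WC mapping. */
	info.prefetchable = (bar & BAR_MEM_PREFETCH) != 0;
	return info;
}
```

Nothing in the decoded bits distinguishes "prefetchable because WC is wanted"
from "prefetchable because a 64-bit window was needed".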
> >
> > 4) If a PCI BAR *does not* have PCI_BASE_ADDRESS_MEM_PREFETCH do we know it
> > must never want WC ?
That's not true, I think. It means the device can't allow prefetch, but maybe
it does allow combining.
> >
> > If we don't have certainty on any of the above I'm afraid we can't do much
> > right now but perhaps we can push towards better use of PCI_BASE_ADDRESS_MEM_PREFETCH
> > and hope folks will only use this for the full PCI BAR only if WC is desired.
> >
> > Thoughts?
>
> Bjorn, now that you're done schooling me on English, any thoughts on the above?
>
> > > > Although there is also arch_phys_wc_add() that makes use of
> > > > architecture specific write-combinging alternatives (MTRR on
> > > > x86 when a system does not have PAT) we void polluting
> > > > pci_iomap() space with it and force drivers and subsystems
> > > > that want to use it to be explicit.
> > > >
> > > > There are a few motivations for this:
> > > >
> > > > a) Take advantage of PAT when available
> > > >
> > > > b) Help bury MTRR code away, MTRR is architecture specific and on
> > > > x86 its replaced by PAT
> > > >
> > > > c) Help with the goal of eventually using _PAGE_CACHE_UC over
> > > > _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
> > > > ...
> > >
> > > > +void __iomem *pci_iomap_wc_range(struct pci_dev *dev,
> > > > + int bar,
> > > > + unsigned long offset,
> > > > + unsigned long maxlen)
> > > > +{
> > > > + resource_size_t start = pci_resource_start(dev, bar);
> > > > + resource_size_t len = pci_resource_len(dev, bar);
> > > > + unsigned long flags = pci_resource_flags(dev, bar);
> > > > +
> > > > + if (len <= offset || !start)
> > > > + return NULL;
> > > > + len -= offset;
> > > > + start += offset;
> > > > + if (maxlen && len > maxlen)
> > > > + len = maxlen;
> > > > + if (flags & IORESOURCE_IO)
> > > > + return __pci_ioport_map(dev, start, len);
> > > > + if (flags & IORESOURCE_MEM)
> > >
> > > Should we log a note in dmesg if the BAR is *not* IORESOURCE_PREFETCH?
> > > I know the driver might know it's safe even if the device didn't mark
> > > the BAR as prefetchable, but it does seem like an easy way for a
> > > driver to shoot itself in the foot.
> >
> > You tell me. I would fear this may not be consistent and we'd end up
> > having bug reports open for something that has historically been a
> > non-issue. The above questions can help us gauge the risk of this.
>
> Now, I'll tell you what I *think* but these are just guestimates (TM):
>
> * Likely PCI_BASE_ADDRESS_MEM_PREFETCH can imply IORESOURCE_PREFETCH
> and use of ioremap_wc() on a full PCI BAR only, but this strict
> definition likely cannot be 100% guaranteed and could break some
> devices. We need something a bit more concrete and well known so
> that next generation industry standards embrace and let us in
> the kernel automatically detect specific ranges and their respective
> page attribute requirements. Might be good to address here x86 and
> ARM families
>
> Curious: Sarah, how does USB address these different page attribute
> needs on USB 3.0?
>
> Luis
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v1 05/47] pci: add pci_iomap_wc() variants
2015-03-23 17:20 ` Bjorn Helgaas
2015-03-26 3:00 ` Luis R. Rodriguez
@ 2015-03-27 19:18 ` Toshi Kani
2015-04-21 19:25 ` Michael S. Tsirkin
2 siblings, 0 replies; 400+ messages in thread
From: Toshi Kani @ 2015-03-27 19:18 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Luis R. Rodriguez, Andy Lutomirski, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin, jgross, Jan Beulich, Borislav Petkov,
Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
linux-fbdev, x86, xen-devel, Luis R. Rodriguez, Ingo Molnar,
Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Michael S. Tsirkin,
Stefan Bader, Konrad Rzeszutek Wilk, Ville Syrjälä,
David Vrabel, Roger Pau Monné,
xen-devel
On Mon, 2015-03-23 at 12:20 -0500, Bjorn Helgaas wrote:
:
> pci_iomap_range() already makes a cacheable mapping if
> IORESOURCE_CACHEABLE; I'm guessing that you would like it to
> automatically use WC if the BAR if IORESOURCE_PREFETCH, e.g.,
>
> if (flags & IORESOURCE_CACHEABLE)
> return ioremap(start, len);
Is this supposed to be ioremap_cache()? ioremap() is the same as
ioremap_nocache() at least on x86 per arch/x86/include/asm/io.h.
> if (flags & IORESOURCE_PREFETCH)
> return ioremap_wc(start, len);
> return ioremap_nocache(start, len);
>
-Toshi
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v1 05/47] pci: add pci_iomap_wc() variants
2015-03-23 17:20 ` Bjorn Helgaas
2015-03-26 3:00 ` Luis R. Rodriguez
2015-03-27 19:18 ` Toshi Kani
@ 2015-04-21 19:25 ` Michael S. Tsirkin
2015-04-21 19:27 ` Luis R. Rodriguez
2 siblings, 1 reply; 400+ messages in thread
From: Michael S. Tsirkin @ 2015-04-21 19:25 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Luis R. Rodriguez, Andy Lutomirski, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin, jgross, Jan Beulich, Borislav Petkov,
Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
linux-fbdev, x86, xen-devel, Luis R. Rodriguez, Ingo Molnar,
Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Stefan Bader,
Konrad Rzeszutek Wilk, Ville Syrjälä,
David Vrabel, Toshi Kani, Roger Pau Monné,
xen-devel
On Mon, Mar 23, 2015 at 12:20:47PM -0500, Bjorn Helgaas wrote:
> pci_iomap_range() already makes a cacheable mapping if
> IORESOURCE_CACHEABLE; I'm guessing that you would like it to
> automatically use WC if the BAR if IORESOURCE_PREFETCH, e.g.,
>
> if (flags & IORESOURCE_CACHEABLE)
> return ioremap(start, len);
> if (flags & IORESOURCE_PREFETCH)
> return ioremap_wc(start, len);
> return ioremap_nocache(start, len);
>
> Is there a reason not to do that?
I think that's wrong and will break a bunch of things.
PCI prefetch bit merely means bridges can combine writes and prefetch
reads. Prefetch does not affect ordering rules and does not allow
writes to be collapsed.
WC is stronger: it allows collapsing and changes ordering rules.
WC can also hurt latency as small writes are buffered.
To summarise, driver needs to know what it's doing,
we can't set WC in the pci core automatically.
--
MST
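[Editorial sketch] The difference between the proposal quoted above and the
conclusion reached here can be put as two small policies. This is a
user-space illustration only; enum map_type and both function names are
hypothetical, and the real decision in the kernel is made by which mapping
helper the driver calls.

```c
#include <stdbool.h>

/* Hypothetical mapping kinds; stand-ins for ioremap_nocache()/ioremap_wc(). */
enum map_type { MAP_UNCACHED, MAP_WC };

/*
 * The idea argued against above: infer WC from the BAR's prefetchable
 * bit. This would silently change collapsing and ordering behavior for
 * every driver whose BAR happens to be prefetchable.
 */
static enum map_type map_type_inferred(bool bar_prefetchable)
{
	return bar_prefetchable ? MAP_WC : MAP_UNCACHED;
}

/*
 * The outcome of the thread: only an explicit request from the driver
 * selects WC; the prefetchable bit alone never does.
 */
static enum map_type map_type_explicit(bool bar_prefetchable,
				       bool driver_wants_wc)
{
	(void)bar_prefetchable; /* deliberately ignored */
	return driver_wants_wc ? MAP_WC : MAP_UNCACHED;
}
```

With the explicit policy a prefetchable BAR stays uncached unless the driver
opts in, which is the behavior a separate pci_iomap_wc() entry point gives.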
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v1 05/47] pci: add pci_iomap_wc() variants
2015-04-21 19:25 ` Michael S. Tsirkin
@ 2015-04-21 19:27 ` Luis R. Rodriguez
0 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-04-21 19:27 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Bjorn Helgaas, Andy Lutomirski, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin, Juergen Gross, Jan Beulich, Borislav Petkov,
Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
linux-fbdev, x86, xen-devel, Ingo Molnar, Daniel Vetter,
Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Stefan Bader,
Konrad Rzeszutek Wilk, Ville Syrjälä,
David Vrabel, Toshi Kani, Roger Pau Monné,
xen-devel
On Tue, Apr 21, 2015 at 12:25 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> To summarise, driver needs to know what it's doing,
> we can't set WC in the pci core automatically.
Thanks, I'll document this and proceed with device driver helpers to
aid with this.
Luis
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v1 05/47] pci: add pci_iomap_wc() variants
2015-03-20 23:17 ` [PATCH v1 05/47] pci: add pci_iomap_wc() variants Luis R. Rodriguez
2015-03-23 17:20 ` Bjorn Helgaas
@ 2015-03-25 20:07 ` Konrad Rzeszutek Wilk
2015-03-27 18:40 ` Luis R. Rodriguez
1 sibling, 1 reply; 400+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-03-25 20:07 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied, linux-kernel, linux-fbdev, x86,
xen-devel, Luis R. Rodriguez, Ingo Molnar, Daniel Vetter,
Bjorn Helgaas, Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Michael S. Tsirkin,
Stefan Bader, ville.syrjala, david.vrabel, toshi.kani,
Roger Pau Monné,
xen-devel
On Fri, Mar 20, 2015 at 04:17:55PM -0700, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> This allows drivers to take advantage of write-combining
> when possible. Ideally we'd have pci_read_bases() just
> peg an IORESOURCE_WC flag for us but where exactly
> video devices memory lie varies *largely* and at times things
> are mixed with MMIO registers, sometimes we can address
> the changes in drivers, other times the change requires
> intrusive changes.
>
> Although there is also arch_phys_wc_add() that makes use of
> architecture specific write-combinging alternatives (MTRR on
combinging?
> x86 when a system does not have PAT) we void polluting
> pci_iomap() space with it and force drivers and subsystems
> that want to use it to be explicit.
>
> There are a few motivations for this:
>
> a) Take advantage of PAT when available
>
> b) Help bury MTRR code away, MTRR is architecture specific and on
> x86 its replaced by PAT
>
> c) Help with the goal of eventually using _PAGE_CACHE_UC over
> _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Suresh Siddha <suresh.b.siddha@intel.com>
> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Antonino Daplas <adaplas@gmail.com>
> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: venkatesh.pallipadi@intel.com
> Cc: Stefan Bader <stefan.bader@canonical.com>
> Cc: konrad.wilk@oracle.com
> Cc: ville.syrjala@linux.intel.com
> Cc: david.vrabel@citrix.com
> Cc: jbeulich@suse.com
> Cc: toshi.kani@hp.com
> Cc: Roger Pau Monné <roger.pau@citrix.com>
> Cc: linux-fbdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: xen-devel@lists.xensource.com
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> ---
> include/asm-generic/pci_iomap.h | 14 ++++++++++
> lib/pci_iomap.c | 61 +++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 75 insertions(+)
>
> diff --git a/include/asm-generic/pci_iomap.h b/include/asm-generic/pci_iomap.h
> index 7389c87..b1e17fc 100644
> --- a/include/asm-generic/pci_iomap.h
> +++ b/include/asm-generic/pci_iomap.h
> @@ -15,9 +15,13 @@ struct pci_dev;
> #ifdef CONFIG_PCI
> /* Create a virtual mapping cookie for a PCI BAR (memory or IO) */
> extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max);
> +extern void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long max);
> extern void __iomem *pci_iomap_range(struct pci_dev *dev, int bar,
> unsigned long offset,
> unsigned long maxlen);
> +extern void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar,
> + unsigned long offset,
> + unsigned long maxlen);
> /* Create a virtual mapping cookie for a port on a given PCI device.
> * Do not call this directly, it exists to make it easier for architectures
> * to override */
> @@ -34,12 +38,22 @@ static inline void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned lon
> return NULL;
> }
>
> +static inline void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long max)
> +{
> + return NULL;
> +}
> static inline void __iomem *pci_iomap_range(struct pci_dev *dev, int bar,
> unsigned long offset,
> unsigned long maxlen)
> {
> return NULL;
> }
> +static inline void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar,
> + unsigned long offset,
> + unsigned long maxlen)
> +{
> + return NULL;
> +}
> #endif
>
> #endif /* __ASM_GENERIC_IO_H */
> diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c
> index bcce5f1..30b65ae 100644
> --- a/lib/pci_iomap.c
> +++ b/lib/pci_iomap.c
> @@ -52,6 +52,46 @@ void __iomem *pci_iomap_range(struct pci_dev *dev,
> EXPORT_SYMBOL(pci_iomap_range);
>
> /**
> + * pci_iomap_wc_range - create a virtual WC mapping cookie for a PCI BAR
> + * @dev: PCI device that owns the BAR
> + * @bar: BAR number
> + * @offset: map memory at the given offset in BAR
> + * @maxlen: max length of the memory to map
> + *
> + * Using this function you will get a __iomem address to your device BAR.
> + * You can access it using ioread*() and iowrite*(). These functions hide
> + * the details if this is a MMIO or PIO address space and will just do what
> + * you expect from them in the correct way. When possible write combining
> + * is used.
> + *
> + * @maxlen specifies the maximum length to map. If you want to get access to
> + * the complete BAR from offset to the end, pass %0 here.
s/%0/0 ? Or is that some special syntax?
> + * */
> +void __iomem *pci_iomap_wc_range(struct pci_dev *dev,
> + int bar,
> + unsigned long offset,
> + unsigned long maxlen)
> +{
> + resource_size_t start = pci_resource_start(dev, bar);
> + resource_size_t len = pci_resource_len(dev, bar);
> + unsigned long flags = pci_resource_flags(dev, bar);
> +
> + if (len <= offset || !start)
> + return NULL;
> + len -= offset;
> + start += offset;
> + if (maxlen && len > maxlen)
> + len = maxlen;
> + if (flags & IORESOURCE_IO)
> + return __pci_ioport_map(dev, start, len);
> + if (flags & IORESOURCE_MEM)
> + return ioremap_wc(start, len);
> + /* What? */
> + return NULL;
> +}
> +EXPORT_SYMBOL_GPL(pci_iomap_wc_range);
> +
> +/**
> * pci_iomap - create a virtual mapping cookie for a PCI BAR
> * @dev: PCI device that owns the BAR
> * @bar: BAR number
> @@ -70,4 +110,25 @@ void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen)
> return pci_iomap_range(dev, bar, 0, maxlen);
> }
> EXPORT_SYMBOL(pci_iomap);
> +
> +/**
> + * pci_iomap_wc - create a virtual WC mapping cookie for a PCI BAR
> + * @dev: PCI device that owns the BAR
> + * @bar: BAR number
> + * @maxlen: length of the memory to map
> + *
> + * Using this function you will get a __iomem address to your device BAR.
> + * You can access it using ioread*() and iowrite*(). These functions hide
> + * the details if this is a MMIO or PIO address space and will just do what
> + * you expect from them in the correct way. When possible write combining
> + * is used.
> + *
> + * @maxlen specifies the maximum length to map. If you want to get access to
> + * the complete BAR without checking for its length first, pass %0 here.
> + * */
> +void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long maxlen)
> +{
> + return pci_iomap_wc_range(dev, bar, 0, maxlen);
> +}
> +EXPORT_SYMBOL_GPL(pci_iomap_wc);
> #endif /* CONFIG_PCI */
> --
> 2.3.2.209.gd67f9d5.dirty
>
* Re: [PATCH v1 05/47] pci: add pci_iomap_wc() variants
2015-03-25 20:07 ` Konrad Rzeszutek Wilk
@ 2015-03-27 18:40 ` Luis R. Rodriguez
0 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 18:40 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: Luis R. Rodriguez, luto, mingo, tglx, hpa, jgross, JBeulich, bp,
suresh.b.siddha, venkatesh.pallipadi, airlied, linux-kernel,
linux-fbdev, x86, xen-devel, Ingo Molnar, Daniel Vetter,
Bjorn Helgaas, Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Michael S. Tsirkin,
Stefan Bader, ville.syrjala, david.vrabel, toshi.kani,
Roger Pau Monné,
xen-devel
On Wed, Mar 25, 2015 at 04:07:43PM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Mar 20, 2015 at 04:17:55PM -0700, Luis R. Rodriguez wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >
> > This allows drivers to take advantage of write-combining
> > when possible. Ideally we'd have pci_read_bases() just
> > peg an IORESOURCE_WC flag for us but where exactly
> > video devices memory lie varies *largely* and at times things
> > are mixed with MMIO registers, sometimes we can address
> > the changes in drivers, other times the change requires
> > intrusive changes.
> >
> > Although there is also arch_phys_wc_add() that makes use of
> > architecture specific write-combinging alternatives (MTRR on
>
> combinging?
Amended.
> > diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c
> > index bcce5f1..30b65ae 100644
> > --- a/lib/pci_iomap.c
> > +++ b/lib/pci_iomap.c
> > @@ -52,6 +52,46 @@ void __iomem *pci_iomap_range(struct pci_dev *dev,
> > EXPORT_SYMBOL(pci_iomap_range);
> >
> > /**
> > + * pci_iomap_wc_range - create a virtual WC mapping cookie for a PCI BAR
> > + * @dev: PCI device that owns the BAR
> > + * @bar: BAR number
> > + * @offset: map memory at the given offset in BAR
> > + * @maxlen: max length of the memory to map
> > + *
> > + * Using this function you will get a __iomem address to your device BAR.
> > + * You can access it using ioread*() and iowrite*(). These functions hide
> > + * the details if this is a MMIO or PIO address space and will just do what
> > + * you expect from them in the correct way. When possible write combining
> > + * is used.
> > + *
> > + * @maxlen specifies the maximum length to map. If you want to get access to
> > + * the complete BAR from offset to the end, pass %0 here.
>
> s/%0/0 ? Or is that some special syntax?
This copies the syntax of pci_iomap_range() which also uses %0, and as per
Documentation/kernel-doc-nano-HOWTO.txt % is used for constants. See:
scripts/kernel-doc -man -function pci_iomap_range lib/pci_iomap.c | nroff -man | less
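For anyone reading along, the @maxlen behaviour that comment documents (0 meaning
"map from the offset to the end of the BAR") can be modelled in plain userspace C.
This is only a sketch of the length arithmetic in the patch above; clamped_map_len()
is a hypothetical name, not a kernel function, and it returns 0 where the kernel
helper would return NULL.

```c
#include <stdint.h>

/*
 * Userspace model of the length clamping done by pci_iomap_wc_range().
 * clamped_map_len() is a hypothetical helper used for illustration only.
 * It returns the number of bytes that would be mapped, or 0 where the
 * kernel helper would return NULL.
 */
uint64_t clamped_map_len(uint64_t bar_start, uint64_t bar_len,
			 uint64_t offset, uint64_t maxlen)
{
	uint64_t len;

	/* An unassigned BAR, or an offset past its end, maps nothing. */
	if (bar_len <= offset || !bar_start)
		return 0;

	len = bar_len - offset;

	/* maxlen == 0 means "no limit": map up to the end of the BAR. */
	if (maxlen && len > maxlen)
		len = maxlen;

	return len;
}
```

With a 16 MiB BAR, an offset of 8 MiB and maxlen 0 yields the remaining 8 MiB,
while a non-zero maxlen smaller than that caps the mapping.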
Luis
* [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (4 preceding siblings ...)
2015-03-20 23:17 ` [PATCH v1 05/47] pci: add pci_iomap_wc() variants Luis R. Rodriguez
@ 2015-03-20 23:17 ` Luis R. Rodriguez
2015-03-20 23:48 ` Andy Lutomirski
2015-04-02 20:21 ` Bjorn Helgaas
2015-03-20 23:17 ` [PATCH v1 07/47] video: fbdev: atyfb: move framebuffer length fudging to helper Luis R. Rodriguez
` (41 subsequent siblings)
47 siblings, 2 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:17 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
Ideally on systems using PAT we can expect a swift
transition away from MTRR. There can be a few exceptions
to this. One is where device drivers are known to run
on systems whose PAT has errata. Another is observed on
old device drivers where the device combines MMIO
register access with the area the driver later wants
to use MTRR for, all on the same PCI BAR. This situation
can still be addressed by splitting the ioremap'd PCI
BAR into two ioremap() calls, one for the MMIO registers
and another for whatever is desirable for
write-combining, but accomplishing this requires quite
a bit of driver restructuring.
Device drivers which are known to require a large
amount of rework in order to split ioremap'd areas
can use __arch_phys_wc_add() to avoid regressions
when PAT is enabled.
For a good example of a driver where things are neatly
split up on a PCI BAR refer to the infiniband qib
driver. For a good example of a driver where a good
amount of work is required refer to the infiniband
ipath driver.
This is *only* a transitional API, and as such no new
drivers are ever expected to use this.
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
arch/x86/include/asm/io.h | 4 ++++
arch/x86/kernel/cpu/mtrr/main.c | 36 +++++++++++++++++++++++++++++-------
include/linux/io.h | 4 ++++
3 files changed, 37 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 34a5b93..a144d05 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -338,6 +338,10 @@ extern bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
#define IO_SPACE_LIMIT 0xffff
#ifdef CONFIG_MTRR
+extern int __must_check __arch_phys_wc_add(unsigned long base,
+ unsigned long size);
+#define __arch_phys_wc_add __arch_phys_wc_add
+
extern int __must_check arch_phys_wc_add(unsigned long base,
unsigned long size);
extern void arch_phys_wc_del(int handle);
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 7db9c47..5ae830b 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -538,23 +538,24 @@ int mtrr_del(int reg, unsigned long base, unsigned long size)
EXPORT_SYMBOL(mtrr_del);
/**
- * arch_phys_wc_add - add a WC MTRR and handle errors if PAT is unavailable
+ * __arch_phys_wc_add - add a WC MTRR even if PAT is available
* @base: Physical base address
* @size: Size of region
*
- * If PAT is available, this does nothing. If PAT is unavailable, it
- * attempts to add a WC MTRR covering size bytes starting at base and
- * logs an error if this fails.
+ * We typically do not want to use MTRR if PAT is available but there
+ * are some drivers which require significant work to get this to work
+ * properly. This call should only be used by those drivers where it is
+ * clear that hard work is required to modify them to use arch_phys_wc_add()
*
* Drivers must store the return value to pass to mtrr_del_wc_if_needed,
* but drivers should not try to interpret that return value.
*/
-int arch_phys_wc_add(unsigned long base, unsigned long size)
+int __arch_phys_wc_add(unsigned long base, unsigned long size)
{
int ret;
- if (pat_enabled || !mtrr_enabled)
- return 0; /* Success! (We don't need to do anything.) */
+ if (!mtrr_enabled)
+ return 0;
ret = mtrr_add(base, size, MTRR_TYPE_WRCOMB, true);
if (ret < 0) {
@@ -564,6 +565,27 @@ int arch_phys_wc_add(unsigned long base, unsigned long size)
}
return ret + MTRR_TO_PHYS_WC_OFFSET;
}
+EXPORT_SYMBOL_GPL(__arch_phys_wc_add);
+
+/**
+ * arch_phys_wc_add - add a WC MTRR and handle errors if PAT is unavailable
+ * @base: Physical base address
+ * @size: Size of region
+ *
+ * If PAT is available, this does nothing. If PAT is unavailable, it
+ * attempts to add a WC MTRR covering size bytes starting at base and
+ * logs an error if this fails.
+ *
+ * Drivers must store the return value to pass to mtrr_del_wc_if_needed,
+ * but drivers should not try to interpret that return value.
+ */
+int arch_phys_wc_add(unsigned long base, unsigned long size)
+{
+ if (pat_enabled || !mtrr_enabled)
+ return 0; /* Success! (We don't need to do anything.) */
+
+ return __arch_phys_wc_add(base, size);
+}
EXPORT_SYMBOL(arch_phys_wc_add);
/*
diff --git a/include/linux/io.h b/include/linux/io.h
index 91101a1..ecc51c3 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -111,6 +111,10 @@ static inline void arch_phys_wc_del(int handle)
}
#define arch_phys_wc_add arch_phys_wc_add
+#ifndef __arch_phys_wc_add
+#define __arch_phys_wc_add arch_phys_wc_add
+#endif
+
#endif
#endif /* _LINUX_IO_H */
--
2.3.2.209.gd67f9d5.dirty
* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
2015-03-20 23:17 ` [PATCH v1 06/47] mtrr: add __arch_phys_wc_add() Luis R. Rodriguez
@ 2015-03-20 23:48 ` Andy Lutomirski
2015-03-27 19:53 ` Luis R. Rodriguez
2015-04-02 20:21 ` Bjorn Helgaas
1 sibling, 1 reply; 400+ messages in thread
From: Andy Lutomirski @ 2015-03-20 23:48 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Juergen Gross,
Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
Dave Airlie, linux-kernel, Linux Fbdev development list, X86 ML,
xen-devel, Luis R. Rodriguez, Ingo Molnar, Daniel Vetter,
Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen
On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> Ideally on systems using PAT we can expect a swift
> transition away from MTRR. There can be a few exceptions
> to this, one is where device drivers are known to exist
> on PATs with errata, another situation is observed on
> old device drivers where devices had combined MMIO
> register access with whatever area they typically
> later wanted to end up using MTRR for on the same
> PCI BAR. This situation can still be addressed by
> splitting up ioremap'd PCI BAR into two ioremap'd
> calls, one for MMIO registers, and another for whatever
> is desirable for write-combining -- in order to
> accomplish this though quite a bit of driver
> restructuring is required.
>
> Device drivers which are known to require large
> amount of re-work in order to split ioremap'd areas
> can use __arch_phys_wc_add() to avoid regressions
> when PAT is enabled.
>
> For a good example driver where things are neatly
> split up on a PCI BAR refer the infiniband qib
> driver. For a good example of a driver where good
> amount of work is required refer to the infiniband
> ipath driver.
>
> This is *only* a transitive API -- and as such no new
> drivers are ever expected to use this.
What's the exact layout that this helps? I'm sceptical that this can
ever be correct.
Is there some awful driver that has a large ioremap that's supposed to
contain multiple different memtypes? If so, can we ioremap +
set_page_xyz instead?
--Andy
* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
2015-03-20 23:48 ` Andy Lutomirski
@ 2015-03-27 19:53 ` Luis R. Rodriguez
2015-03-27 19:58 ` Andy Lutomirski
0 siblings, 1 reply; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 19:53 UTC (permalink / raw)
To: Andy Lutomirski, Bjorn Helgaas, Ville Syrjälä,
Mauro Carvalho Chehab, Mike Marciniszyn
Cc: Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
Juergen Gross, Jan Beulich, Borislav Petkov, Suresh Siddha,
venkatesh.pallipadi, Dave Airlie, linux-kernel,
Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen
On Fri, Mar 20, 2015 at 04:48:46PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
> <mcgrof@do-not-panic.com> wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >
> > Ideally on systems using PAT we can expect a swift
> > transition away from MTRR. There can be a few exceptions
> > to this, one is where device drivers are known to exist
> > on PATs with errata, another situation is observed on
> > old device drivers where devices had combined MMIO
> > register access with whatever area they typically
> > later wanted to end up using MTRR for on the same
> > PCI BAR. This situation can still be addressed by
> > splitting up ioremap'd PCI BAR into two ioremap'd
> > calls, one for MMIO registers, and another for whatever
> > is desirable for write-combining -- in order to
> > accomplish this though quite a bit of driver
> > restructuring is required.
> >
> > Device drivers which are known to require large
> > amount of re-work in order to split ioremap'd areas
> > can use __arch_phys_wc_add() to avoid regressions
> > when PAT is enabled.
> >
> > For a good example driver where things are neatly
> > split up on a PCI BAR refer the infiniband qib
> > driver. For a good example of a driver where good
> > amount of work is required refer to the infiniband
> > ipath driver.
> >
> > This is *only* a transitive API -- and as such no new
> > drivers are ever expected to use this.
>
> What's the exact layout that this helps? I'm sceptical that this can
> ever be correct.
>
> Is there some awful driver that has a large ioremap that's supposed to
> contain multiple different memtypes?
Yes, I cc'd you just now on one where I made changes to a driver which uses one
PCI BAR with mixed memtypes and uses MTRR to hole in WC. A transition to
arch_phys_wc_add() is therefore not possible if PAT is enabled, as it would
regress those drivers by making the MTRR WC hole trick non-functional.
The changes are non-trivial, so in this series I supplied changes for
one driver only, to show the effort required. The other drivers which
required this were:
Driver File
------------------------------------------------------------
fusion drivers/message/fusion/mptbase.c
ivtv drivers/media/pci/ivtv/ivtvfb.c
ipath drivers/infiniband/hw/ipath/ipath_driver.c
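To make the regression concrete, here is a small userspace model of the two gates
as they appear in the patch: arch_phys_wc_add() bails out when PAT is enabled,
while __arch_phys_wc_add() only checks that MTRR is enabled. The function names
below are hypothetical stand-ins that return whether an MTRR would be attempted;
they do not call into the kernel.

```c
#include <stdbool.h>

/*
 * Model of arch_phys_wc_add(): it returns early ("success, nothing to
 * do") when PAT is enabled or MTRR is disabled, so an MTRR is only
 * attempted on non-PAT systems with MTRR enabled.
 */
bool arch_wc_add_touches_mtrr(bool pat_enabled, bool mtrr_enabled)
{
	return !pat_enabled && mtrr_enabled;
}

/*
 * Model of __arch_phys_wc_add(): PAT is not consulted at all, so a
 * driver relying on the MTRR WC hole trick keeps its MTRR on PAT
 * systems too.
 */
bool forced_wc_add_touches_mtrr(bool pat_enabled, bool mtrr_enabled)
{
	(void)pat_enabled;	/* deliberately ignored */
	return mtrr_enabled;
}
```

On a PAT system with MTRR enabled the first returns false and the second true,
which is exactly the case the three drivers above depend on.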
This series makes those drivers use __arch_phys_wc_add() as a
transitory phase, in the hope that we can address the proper split as the
atyfb change illustrates. For ipath the required changes have a nice template
in the qib driver, as the two share a very similar driver structure and the
qib driver *did* do the nice split.
> If so, can we ioremap + set_page_xyz instead?
I'm not sure I see which call we'd use. Care to provide an example patch
for atyfb as a case in point, as an alternative to the work required
to do the split?
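For reference, the split itself boils down to describing one BAR as two ranges.
The sketch below is a userspace model under an assumed layout (framebuffer at the
start of the BAR, register block from regs_offset to the end); the struct and
function names are hypothetical. In a driver the two ranges would be handed to
ioremap_wc() and ioremap_nocache() respectively.

```c
#include <stdint.h>

/* Memory types the two halves of the BAR are meant to end up with. */
enum memtype { MEMTYPE_UC, MEMTYPE_WC };

/* One physical range plus the caching type it should be mapped with. */
struct io_range {
	uint64_t start;
	uint64_t len;
	enum memtype type;
};

/*
 * Assumed layout: the framebuffer occupies [0, regs_offset) of the BAR.
 * This range is the candidate for a write-combining mapping.
 */
struct io_range split_framebuffer(uint64_t bar_start, uint64_t regs_offset)
{
	struct io_range r = { bar_start, regs_offset, MEMTYPE_WC };

	return r;
}

/*
 * Assumed layout: the MMIO registers occupy [regs_offset, bar_len).
 * The caller must ensure regs_offset <= bar_len. This range must stay
 * uncached.
 */
struct io_range split_registers(uint64_t bar_start, uint64_t bar_len,
				uint64_t regs_offset)
{
	struct io_range r = {
		bar_start + regs_offset,
		bar_len - regs_offset,
		MEMTYPE_UC
	};

	return r;
}
```

The hard part in the real drivers is not this arithmetic but untangling every
access that assumes a single mapping.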
Luis
* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
2015-03-27 19:53 ` Luis R. Rodriguez
@ 2015-03-27 19:58 ` Andy Lutomirski
2015-03-27 20:30 ` Luis R. Rodriguez
0 siblings, 1 reply; 400+ messages in thread
From: Andy Lutomirski @ 2015-03-27 19:58 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Bjorn Helgaas, Ville Syrjälä,
Mauro Carvalho Chehab, Mike Marciniszyn, Luis R. Rodriguez,
Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Juergen Gross,
Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
Dave Airlie, linux-kernel, Linux Fbdev development list, X86 ML,
xen-devel, Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
On Fri, Mar 27, 2015 at 12:53 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Fri, Mar 20, 2015 at 04:48:46PM -0700, Andy Lutomirski wrote:
>> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
>> <mcgrof@do-not-panic.com> wrote:
>> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
>> >
>> > Ideally on systems using PAT we can expect a swift
>> > transition away from MTRR. There can be a few exceptions
>> > to this, one is where device drivers are known to exist
>> > on PATs with errata, another situation is observed on
>> > old device drivers where devices had combined MMIO
>> > register access with whatever area they typically
>> > later wanted to end up using MTRR for on the same
>> > PCI BAR. This situation can still be addressed by
>> > splitting up ioremap'd PCI BAR into two ioremap'd
>> > calls, one for MMIO registers, and another for whatever
>> > is desirable for write-combining -- in order to
>> > accomplish this though quite a bit of driver
>> > restructuring is required.
>> >
>> > Device drivers which are known to require large
>> > amount of re-work in order to split ioremap'd areas
>> > can use __arch_phys_wc_add() to avoid regressions
>> > when PAT is enabled.
>> >
>> > For a good example driver where things are neatly
>> > split up on a PCI BAR refer the infiniband qib
>> > driver. For a good example of a driver where good
>> > amount of work is required refer to the infiniband
>> > ipath driver.
>> >
>> > This is *only* a transitive API -- and as such no new
>> > drivers are ever expected to use this.
>>
>> What's the exact layout that this helps? I'm sceptical that this can
>> ever be correct.
>>
>> Is there some awful driver that has a large ioremap that's supposed to
>> contain multiple different memtypes?
>
> Yes, I cc'd you just now on one where I made changes on a driver which uses one
> PCI with mixed memtypes and uses MTRR to hole in WC. A transition to
> arch_phys_wc_add() is therefore not possible if PAT is enabled as it would
> regress those drivers by making the MTRR WC hole trick non functional.
> The changes are non trivial and so in this series I supplied changes on
> one driver only to show the effort required. The other drivers which
> required this were:
>
> Driver File
> ------------------------------------------------------------
> fusion drivers/message/fusion/mptbase.c
> ivtv drivers/media/pci/ivtv/ivtvfb.c
> ipath drivers/infiniband/hw/ipath/ipath_driver.c
>
> This series makes those drivers use __arch_phys_wc_add() more as a
> transitory phase in hopes we can address the proper split as with the
> atyfb illustrates. For ipath the changes required have a nice template
> with the qib driver as they share very similar driver structure, the
> qib driver *did* do the nice split.
>
>> If so, can we ioremap + set_page_xyz instead?
>
> I'm not sure I see which call we'd use. Care to provide an example patch
> alternative for the atyfb as a case in point alternative to the work required
> to do the split?
>
I'm still confused. Would it be insufficient to ioremap_nocache the
whole thing and then call set_memory_wc on parts of it? (Sorry,
set_page_xyz was a typo.)
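A rough sketch of the bookkeeping this would need, since set_memory_wc() takes a
page-aligned address and a page count: the WC part has to be rounded inward to
whole pages so that no page shared with registers changes type. This is a
userspace model with hypothetical helper names and an assumed 4 KiB page size;
whether set_memory_wc() is actually usable on an ioremap'd range is not settled
here.

```c
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL	/* assumed page size for this model */

/* Index of the first page that lies entirely at or after 'offset'. */
uint64_t wc_first_page(uint64_t offset)
{
	return (offset + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}

/*
 * Number of whole pages inside [offset, offset + len). Partial pages at
 * either end are left out so they keep their uncached type.
 */
uint64_t wc_num_pages(uint64_t offset, uint64_t len)
{
	uint64_t first = wc_first_page(offset);
	uint64_t end = (offset + len) / MODEL_PAGE_SIZE;

	return end > first ? end - first : 0;
}
```

The address passed on would then be the mapping base plus
wc_first_page() * MODEL_PAGE_SIZE, with wc_num_pages() as the page count.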
--Andy
--
Andy Lutomirski
AMA Capital Management, LLC
* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
2015-03-27 19:58 ` Andy Lutomirski
@ 2015-03-27 20:30 ` Luis R. Rodriguez
2015-03-27 21:23 ` Andy Lutomirski
0 siblings, 1 reply; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 20:30 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Bjorn Helgaas, Ville Syrjälä,
Mauro Carvalho Chehab, Mike Marciniszyn, Luis R. Rodriguez,
Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Juergen Gross,
Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
Dave Airlie, linux-kernel, Linux Fbdev development list, X86 ML,
xen-devel, Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
On Fri, Mar 27, 2015 at 12:58:02PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 27, 2015 at 12:53 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Fri, Mar 20, 2015 at 04:48:46PM -0700, Andy Lutomirski wrote:
> >> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
> >> <mcgrof@do-not-panic.com> wrote:
> >> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >> >
> >> > Ideally on systems using PAT we can expect a swift
> >> > transition away from MTRR. There can be a few exceptions
> >> > to this, one is where device drivers are known to exist
> >> > on PATs with errata, another situation is observed on
> >> > old device drivers where devices had combined MMIO
> >> > register access with whatever area they typically
> >> > later wanted to end up using MTRR for on the same
> >> > PCI BAR. This situation can still be addressed by
> >> > splitting up ioremap'd PCI BAR into two ioremap'd
> >> > calls, one for MMIO registers, and another for whatever
> >> > is desirable for write-combining -- in order to
> >> > accomplish this though quite a bit of driver
> >> > restructuring is required.
> >> >
> >> > Device drivers which are known to require large
> >> > amount of re-work in order to split ioremap'd areas
> >> > can use __arch_phys_wc_add() to avoid regressions
> >> > when PAT is enabled.
> >> >
> >> > For a good example driver where things are neatly
> >> > split up on a PCI BAR refer the infiniband qib
> >> > driver. For a good example of a driver where good
> >> > amount of work is required refer to the infiniband
> >> > ipath driver.
> >> >
> >> > This is *only* a transitive API -- and as such no new
> >> > drivers are ever expected to use this.
> >>
> >> What's the exact layout that this helps? I'm sceptical that this can
> >> ever be correct.
> >>
> >> Is there some awful driver that has a large ioremap that's supposed to
> >> contain multiple different memtypes?
> >
> > Yes, I cc'd you just now on one where I made changes on a driver which uses one
> > PCI with mixed memtypes and uses MTRR to hole in WC. A transition to
> > arch_phys_wc_add() is therefore not possible if PAT is enabled as it would
> > regress those drivers by making the MTRR WC hole trick non functional.
> > The changes are non trivial and so in this series I supplied changes on
> > one driver only to show the effort required. The other drivers which
> > required this were:
> >
> > Driver File
> > ------------------------------------------------------------
> > fusion drivers/message/fusion/mptbase.c
> > ivtv drivers/media/pci/ivtv/ivtvfb.c
> > ipath drivers/infiniband/hw/ipath/ipath_driver.c
> >
> > This series makes those drivers use __arch_phys_wc_add() more as a
> > transitory phase in hopes we can address the proper split as with the
> > atyfb illustrates. For ipath the changes required have a nice template
> > with the qib driver as they share very similar driver structure, the
> > qib driver *did* do the nice split.
> >
> >> If so, can we ioremap + set_page_xyz instead?
> >
> > I'm not sure I see which call we'd use. Care to provide an example patch
> > alternative for the atyfb as a case in point alternative to the work required
> > to do the split?
> >
>
> I'm still confused. Would it be insufficient to ioremap_nocache the
> whole thing and then call set_memory_wc on parts of it? (Sorry,
> set_page_xyz was a typo.)
I think that would be a sexy alternative.
In this driver's case the thing is a bit messy as it not only used
the WC MTRR for a hole but it also then used a UC MTRR on top of
it all, so since I already tried to address the split, and if we address
the power of 2 woes, I think it'd be best to try to remove the UC MTRR
and just avoid set_memory_wc() in this driver's case, but for the other cases
(fusion, ivtv, ipath) I think this makes sense.
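On the power of 2 woes, presumably the usual variable-range MTRR constraints are
meant: the size has to be a power of two, at least 4 KiB, and the base has to be
aligned to the size. A minimal userspace check, with fits_single_mtrr() being a
hypothetical name rather than anything in the kernel:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if [base, base + size) could be described by one
 * variable-range MTRR under the assumed constraints: size is a power
 * of two of at least 4 KiB, and base is a multiple of size.
 */
bool fits_single_mtrr(uint64_t base, uint64_t size)
{
	/* size & (size - 1) is zero only for powers of two. */
	if (size < 4096 || (size & (size - 1)) != 0)
		return false;

	/* base must be aligned to the (power of two) size. */
	return (base & (size - 1)) == 0;
}
```

A framebuffer whose length is an 8 MiB BAR minus a 4 KiB register block fails
this check, which is where the length fudging comes from.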
Thoughts?
Luis
* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
2015-03-27 20:30 ` Luis R. Rodriguez
@ 2015-03-27 21:23 ` Andy Lutomirski
2015-03-27 23:04 ` Luis R. Rodriguez
0 siblings, 1 reply; 400+ messages in thread
From: Andy Lutomirski @ 2015-03-27 21:23 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Bjorn Helgaas, Ville Syrjälä,
Mauro Carvalho Chehab, Mike Marciniszyn, Luis R. Rodriguez,
Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Juergen Gross,
Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
Dave Airlie, linux-kernel, Linux Fbdev development list, X86 ML,
xen-devel, Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
On Fri, Mar 27, 2015 at 1:30 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Fri, Mar 27, 2015 at 12:58:02PM -0700, Andy Lutomirski wrote:
>> On Fri, Mar 27, 2015 at 12:53 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> > On Fri, Mar 20, 2015 at 04:48:46PM -0700, Andy Lutomirski wrote:
>> >> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
>> >> <mcgrof@do-not-panic.com> wrote:
>> >> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
>> >> >
>> >> > Ideally on systems using PAT we can expect a swift
>> >> > transition away from MTRR. There can be a few exceptions
>> >> > to this, one is where device drivers are known to exist
>> >> > on PATs with errata, another situation is observed on
>> >> > old device drivers where devices had combined MMIO
>> >> > register access with whatever area they typically
>> >> > later wanted to end up using MTRR for on the same
>> >> > PCI BAR. This situation can still be addressed by
>> >> > splitting up ioremap'd PCI BAR into two ioremap'd
>> >> > calls, one for MMIO registers, and another for whatever
>> >> > is desirable for write-combining -- in order to
>> >> > accomplish this though quite a bit of driver
>> >> > restructuring is required.
>> >> >
>> >> > Device drivers which are known to require large
>> >> > amount of re-work in order to split ioremap'd areas
>> >> > can use __arch_phys_wc_add() to avoid regressions
>> >> > when PAT is enabled.
>> >> >
>> >> > For a good example driver where things are neatly
>> >> > split up on a PCI BAR refer the infiniband qib
>> >> > driver. For a good example of a driver where good
>> >> > amount of work is required refer to the infiniband
>> >> > ipath driver.
>> >> >
>> >> > This is *only* a transitive API -- and as such no new
>> >> > drivers are ever expected to use this.
>> >>
>> >> What's the exact layout that this helps? I'm sceptical that this can
>> >> ever be correct.
>> >>
>> >> Is there some awful driver that has a large ioremap that's supposed to
>> >> contain multiple different memtypes?
>> >
>> > Yes, I cc'd you just now on one where I made changes on a driver which uses one
>> > PCI with mixed memtypes and uses MTRR to hole in WC. A transition to
>> > arch_phys_wc_add() is therefore not possible if PAT is enabled as it would
>> > regress those drivers by making the MTRR WC hole trick non functional.
>> > The changes are non trivial and so in this series I supplied changes on
>> > one driver only to show the effort required. The other drivers which
>> > required this were:
>> >
>> > Driver File
>> > ------------------------------------------------------------
>> > fusion drivers/message/fusion/mptbase.c
>> > ivtv drivers/media/pci/ivtv/ivtvfb.c
>> > ipath drivers/infiniband/hw/ipath/ipath_driver.c
>> >
>> > This series makes those drivers use __arch_phys_wc_add() more as a
>> > transitory phase in hopes we can address the proper split as with the
>> > atyfb illustrates. For ipath the changes required have a nice template
>> > with the qib driver as they share very similar driver structure, the
>> > qib driver *did* do the nice split.
>> >
>> >> If so, can we ioremap + set_page_xyz instead?
>> >
>> > I'm not sure I see which call we'd use. Care to provide an example patch
>> > alternative for the atyfb as a case in point alternative to the work required
>> > to do the split?
>> >
>>
>> I'm still confused. Would it be insufficient to ioremap_nocache the
>> whole thing and then call set_memory_wc on parts of it? (Sorry,
>> set_page_xyz was a typo.)
>
> I think that would be a sexy alternative.
>
> In this driver's case the thing is a bit messy as it not only used
> the WC MTRR for a hole but it also then used a UC MTRR on top of
> it all, so since I already tried to address the split, and if we address
> the power of 2 woes, I think it'd be best to try to remove the UC MTRR
> and just avoid set_page_wc() in this driver's case, but for the other cases
> (fusion, ivtv, ipath) I think this makes sense.
>
> Thoughts?
Once that WC MTRR is in place, I think you really need UC and not UC-
if you want to override it. Otherwise I agree with all of this.
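To spell that out, here is a tiny userspace model of how the page-level type and
the MTRR type combine, restricted to the cases relevant here and based on my
reading of the Intel SDM combining rules. The enum and function names are made up
for this sketch.

```c
/* MTRR types considered in this model (subset). */
enum mtrr_type { MTRR_UC, MTRR_WC, MTRR_WB };

/* Page-level (PAT) types considered in this model (subset). */
enum pat_type { PAT_UC, PAT_UC_MINUS, PAT_WC };

/* Resulting effective type for the subset above. */
enum eff_type { EFF_UC, EFF_WC };

enum eff_type effective_type(enum mtrr_type mtrr, enum pat_type pat)
{
	switch (pat) {
	case PAT_UC:
		/* Strong UC wins over any MTRR type, including WC. */
		return EFF_UC;
	case PAT_WC:
		/* PAT WC yields WC regardless of the MTRR type. */
		return EFF_WC;
	case PAT_UC_MINUS:
		/* UC- is uncached unless a WC MTRR overrides it. */
		return mtrr == MTRR_WC ? EFF_WC : EFF_UC;
	}
	return EFF_UC;
}
```

So with a WC MTRR covering the registers, a UC- mapping ends up WC, and only a
strong UC mapping keeps the registers uncached.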
--Andy
* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
2015-03-27 21:23 ` Andy Lutomirski
@ 2015-03-27 23:04 ` Luis R. Rodriguez
2015-03-27 23:10 ` Andy Lutomirski
0 siblings, 1 reply; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 23:04 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Bjorn Helgaas, Ville Syrjälä,
Mauro Carvalho Chehab, Mike Marciniszyn, Luis R. Rodriguez,
Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Juergen Gross,
Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
Dave Airlie, linux-kernel, Linux Fbdev development list, X86 ML,
xen-devel, Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
On Fri, Mar 27, 2015 at 02:23:16PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 27, 2015 at 1:30 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Fri, Mar 27, 2015 at 12:58:02PM -0700, Andy Lutomirski wrote:
> >> On Fri, Mar 27, 2015 at 12:53 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> >> > On Fri, Mar 20, 2015 at 04:48:46PM -0700, Andy Lutomirski wrote:
> >> >> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
> >> >> <mcgrof@do-not-panic.com> wrote:
> >> >> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >> >> >
> >> >> > Ideally on systems using PAT we can expect a swift
> >> >> > transition away from MTRR. There can be a few exceptions
> >> >> > to this, one is where device drivers are known to exist
> >> >> > on PATs with errata, another situation is observed on
> >> >> > old device drivers where devices had combined MMIO
> >> >> > register access with whatever area they typically
> >> >> > later wanted to end up using MTRR for on the same
> >> >> > PCI BAR. This situation can still be addressed by
> >> >> > splitting up ioremap'd PCI BAR into two ioremap'd
> >> >> > calls, one for MMIO registers, and another for whatever
> >> >> > is desirable for write-combining -- in order to
> >> >> > accomplish this though quite a bit of driver
> >> >> > restructuring is required.
> >> >> >
> >> >> > Device drivers which are known to require large
> >> >> > amount of re-work in order to split ioremap'd areas
> >> >> > can use __arch_phys_wc_add() to avoid regressions
> >> >> > when PAT is enabled.
> >> >> >
> >> >> > For a good example driver where things are neatly
> >> >> > split up on a PCI BAR refer the infiniband qib
> >> >> > driver. For a good example of a driver where good
> >> >> > amount of work is required refer to the infiniband
> >> >> > ipath driver.
> >> >> >
> >> >> > This is *only* a transitive API -- and as such no new
> >> >> > drivers are ever expected to use this.
> >> >>
> >> >> What's the exact layout that this helps? I'm sceptical that this can
> >> >> ever be correct.
> >> >>
> >> >> Is there some awful driver that has a large ioremap that's supposed to
> >> >> contain multiple different memtypes?
> >> >
> >> > Yes, I cc'd you just now on one where I made changes on a driver which uses one
> >> > PCI with mixed memtypes and uses MTRR to hole in WC. A transition to
> >> > arch_phys_wc_add() is therefore not possible if PAT is enabled as it would
> >> > regress those drivers by making the MTRR WC hole trick non functional.
> >> > The changes are non trivial and so in this series I supplied changes on
> >> > one driver only to show the effort required. The other drivers which
> >> > required this were:
> >> >
> >> > Driver File
> >> > ------------------------------------------------------------
> >> > fusion drivers/message/fusion/mptbase.c
> >> > ivtv drivers/media/pci/ivtv/ivtvfb.c
> >> > ipath drivers/infiniband/hw/ipath/ipath_driver.c
> >> >
> >> > This series makes those drivers use __arch_phys_wc_add() more as a
> >> > transitory phase in hopes we can address the proper split as with the
> >> > atyfb illustrates. For ipath the changes required have a nice template
> >> > with the qib driver as they share very similar driver structure, the
> >> > qib driver *did* do the nice split.
> >> >
> >> >> If so, can we ioremap + set_page_xyz instead?
> >> >
> >> > I'm not sure I see which call we'd use. Care to provide an example patch
> >> > alternative for the atyfb as a case in point alternative to the work required
> >> > to do the split?
> >> >
> >>
> >> I'm still confused. Would it be insufficient to ioremap_nocache the
> >> whole thing and then call set_memory_wc on parts of it? (Sorry,
> >> set_page_xyz was a typo.)
> >
> > I think that would be a sexy alternative.
> >
> > In this driver's case the thing is a bit messy as it not only used
> > the WC MTRR for a hole but it also then used a UC MTRR on top of
> > it all, so since I already tried to address the split, and if we address
> > the power of 2 woes, I think it'd be best to try to remove the UC MTRR
> > and just avoid set_page_wc() in this driver's case, but for the other cases
> > (fusion, ivtv, ipath) I think this makes sense.
> >
> > Thoughts?
>
> Once that WC MTRR is in place, I think you really need UC and not UC-
> if you want to override it. Otherwise I agree with all of this.
Do you mean that the UC MTRR work around that was in place might not
have really been effective? Not quite sure what you mean. I don't think
I follow.
Luis
* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
2015-03-27 23:04 ` Luis R. Rodriguez
@ 2015-03-27 23:10 ` Andy Lutomirski
2015-03-27 23:33 ` Luis R. Rodriguez
0 siblings, 1 reply; 400+ messages in thread
From: Andy Lutomirski @ 2015-03-27 23:10 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Bjorn Helgaas, Ville Syrjälä,
Mauro Carvalho Chehab, Mike Marciniszyn, Luis R. Rodriguez,
Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Juergen Gross,
Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
Dave Airlie, linux-kernel, Linux Fbdev development list, X86 ML,
xen-devel, Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
On Fri, Mar 27, 2015 at 4:04 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Fri, Mar 27, 2015 at 02:23:16PM -0700, Andy Lutomirski wrote:
>> On Fri, Mar 27, 2015 at 1:30 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> > On Fri, Mar 27, 2015 at 12:58:02PM -0700, Andy Lutomirski wrote:
>> >> On Fri, Mar 27, 2015 at 12:53 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> >> > On Fri, Mar 20, 2015 at 04:48:46PM -0700, Andy Lutomirski wrote:
>> >> >> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
>> >> >> <mcgrof@do-not-panic.com> wrote:
>> >> >> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
>> >> >> >
>> >> >> > Ideally on systems using PAT we can expect a swift
>> >> >> > transition away from MTRR. There can be a few exceptions
>> >> >> > to this, one is where device drivers are known to exist
>> >> >> > on PATs with errata, another situation is observed on
>> >> >> > old device drivers where devices had combined MMIO
>> >> >> > register access with whatever area they typically
>> >> >> > later wanted to end up using MTRR for on the same
>> >> >> > PCI BAR. This situation can still be addressed by
>> >> >> > splitting up ioremap'd PCI BAR into two ioremap'd
>> >> >> > calls, one for MMIO registers, and another for whatever
>> >> >> > is desirable for write-combining -- in order to
>> >> >> > accomplish this though quite a bit of driver
>> >> >> > restructuring is required.
>> >> >> >
>> >> >> > Device drivers which are known to require large
>> >> >> > amount of re-work in order to split ioremap'd areas
>> >> >> > can use __arch_phys_wc_add() to avoid regressions
>> >> >> > when PAT is enabled.
>> >> >> >
>> >> >> > For a good example driver where things are neatly
>> >> >> > split up on a PCI BAR refer the infiniband qib
>> >> >> > driver. For a good example of a driver where good
>> >> >> > amount of work is required refer to the infiniband
>> >> >> > ipath driver.
>> >> >> >
>> >> >> > This is *only* a transitive API -- and as such no new
>> >> >> > drivers are ever expected to use this.
>> >> >>
>> >> >> What's the exact layout that this helps? I'm sceptical that this can
>> >> >> ever be correct.
>> >> >>
>> >> >> Is there some awful driver that has a large ioremap that's supposed to
>> >> >> contain multiple different memtypes?
>> >> >
>> >> > Yes, I cc'd you just now on one where I made changes on a driver which uses one
>> >> > PCI with mixed memtypes and uses MTRR to hole in WC. A transition to
>> >> > arch_phys_wc_add() is therefore not possible if PAT is enabled as it would
>> >> > regress those drivers by making the MTRR WC hole trick non functional.
>> >> > The changes are non trivial and so in this series I supplied changes on
>> >> > one driver only to show the effort required. The other drivers which
>> >> > required this were:
>> >> >
>> >> > Driver File
>> >> > ------------------------------------------------------------
>> >> > fusion drivers/message/fusion/mptbase.c
>> >> > ivtv drivers/media/pci/ivtv/ivtvfb.c
>> >> > ipath drivers/infiniband/hw/ipath/ipath_driver.c
>> >> >
>> >> > This series makes those drivers use __arch_phys_wc_add() more as a
>> >> > transitory phase in hopes we can address the proper split as with the
>> >> > atyfb illustrates. For ipath the changes required have a nice template
>> >> > with the qib driver as they share very similar driver structure, the
>> >> > qib driver *did* do the nice split.
>> >> >
>> >> >> If so, can we ioremap + set_page_xyz instead?
>> >> >
>> >> > I'm not sure I see which call we'd use. Care to provide an example patch
>> >> > alternative for the atyfb as a case in point alternative to the work required
>> >> > to do the split?
>> >> >
>> >>
>> >> I'm still confused. Would it be insufficient to ioremap_nocache the
>> >> whole thing and then call set_memory_wc on parts of it? (Sorry,
>> >> set_page_xyz was a typo.)
>> >
>> > I think that would be a sexy alternative.
>> >
>> > In this driver's case the thing is a bit messy as it not only used
>> > the WC MTRR for a hole but it also then used a UC MTRR on top of
>> > it all, so since I already tried to address the split, and if we address
>> > the power of 2 woes, I think it'd be best to try to remove the UC MTRR
>> > and just avoid set_page_wc() in this driver's case, but for the other cases
>> > (fusion, ivtv, ipath) I think this makes sense.
>> >
>> > Thoughts?
>>
>> Once that WC MTRR is in place, I think you really need UC and not UC-
>> if you want to override it. Otherwise I agree with all of this.
>
> Do you mean that the UC MTRR work around that was in place might not
> have really been effective? Not quite sure what you mean. I don't think
> I follow.
I mean that the UC MTRR that overrides the WC MTRR was probably fine
(I hope smaller MTRRs override larger MTRRs). But we should just
ditch UC MTRRs entirely, and setting UC in the page tables would work
on all CPUs *if we supported that*. We'd need to add a couple trivial
helpers to do that.
--Andy
>
> Luis
--
Andy Lutomirski
AMA Capital Management, LLC
* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
2015-03-27 23:10 ` Andy Lutomirski
@ 2015-03-27 23:33 ` Luis R. Rodriguez
0 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 23:33 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Bjorn Helgaas, Ville Syrjälä,
Mauro Carvalho Chehab, Mike Marciniszyn, Luis R. Rodriguez,
Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Juergen Gross,
Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
Dave Airlie, linux-kernel, Linux Fbdev development list, X86 ML,
xen-devel, Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
On Fri, Mar 27, 2015 at 04:10:03PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 27, 2015 at 4:04 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Fri, Mar 27, 2015 at 02:23:16PM -0700, Andy Lutomirski wrote:
> >> On Fri, Mar 27, 2015 at 1:30 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> >> > On Fri, Mar 27, 2015 at 12:58:02PM -0700, Andy Lutomirski wrote:
> >> >> On Fri, Mar 27, 2015 at 12:53 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> >> >> > On Fri, Mar 20, 2015 at 04:48:46PM -0700, Andy Lutomirski wrote:
> >> >> >> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
> >> >> >> <mcgrof@do-not-panic.com> wrote:
> >> >> >> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >> >> >> >
> >> >> >> > Ideally on systems using PAT we can expect a swift
> >> >> >> > transition away from MTRR. There can be a few exceptions
> >> >> >> > to this, one is where device drivers are known to exist
> >> >> >> > on PATs with errata, another situation is observed on
> >> >> >> > old device drivers where devices had combined MMIO
> >> >> >> > register access with whatever area they typically
> >> >> >> > later wanted to end up using MTRR for on the same
> >> >> >> > PCI BAR. This situation can still be addressed by
> >> >> >> > splitting up ioremap'd PCI BAR into two ioremap'd
> >> >> >> > calls, one for MMIO registers, and another for whatever
> >> >> >> > is desirable for write-combining -- in order to
> >> >> >> > accomplish this though quite a bit of driver
> >> >> >> > restructuring is required.
> >> >> >> >
> >> >> >> > Device drivers which are known to require large
> >> >> >> > amount of re-work in order to split ioremap'd areas
> >> >> >> > can use __arch_phys_wc_add() to avoid regressions
> >> >> >> > when PAT is enabled.
> >> >> >> >
> >> >> >> > For a good example driver where things are neatly
> >> >> >> > split up on a PCI BAR refer the infiniband qib
> >> >> >> > driver. For a good example of a driver where good
> >> >> >> > amount of work is required refer to the infiniband
> >> >> >> > ipath driver.
> >> >> >> >
> >> >> >> > This is *only* a transitive API -- and as such no new
> >> >> >> > drivers are ever expected to use this.
> >> >> >>
> >> >> >> What's the exact layout that this helps? I'm sceptical that this can
> >> >> >> ever be correct.
> >> >> >>
> >> >> >> Is there some awful driver that has a large ioremap that's supposed to
> >> >> >> contain multiple different memtypes?
> >> >> >
> >> >> > Yes, I cc'd you just now on one where I made changes on a driver which uses one
> >> >> > PCI with mixed memtypes and uses MTRR to hole in WC. A transition to
> >> >> > arch_phys_wc_add() is therefore not possible if PAT is enabled as it would
> >> >> > regress those drivers by making the MTRR WC hole trick non functional.
> >> >> > The changes are non trivial and so in this series I supplied changes on
> >> >> > one driver only to show the effort required. The other drivers which
> >> >> > required this were:
> >> >> >
> >> >> > Driver File
> >> >> > ------------------------------------------------------------
> >> >> > fusion drivers/message/fusion/mptbase.c
> >> >> > ivtv drivers/media/pci/ivtv/ivtvfb.c
> >> >> > ipath drivers/infiniband/hw/ipath/ipath_driver.c
> >> >> >
> >> >> > This series makes those drivers use __arch_phys_wc_add() more as a
> >> >> > transitory phase in hopes we can address the proper split as with the
> >> >> > atyfb illustrates. For ipath the changes required have a nice template
> >> >> > with the qib driver as they share very similar driver structure, the
> >> >> > qib driver *did* do the nice split.
> >> >> >
> >> >> >> If so, can we ioremap + set_page_xyz instead?
> >> >> >
> >> >> > I'm not sure I see which call we'd use. Care to provide an example patch
> >> >> > alternative for the atyfb as a case in point alternative to the work required
> >> >> > to do the split?
> >> >> >
> >> >>
> >> >> I'm still confused. Would it be insufficient to ioremap_nocache the
> >> >> whole thing and then call set_memory_wc on parts of it? (Sorry,
> >> >> set_page_xyz was a typo.)
> >> >
> >> > I think that would be a sexy alternative.
> >> >
> >> > In this driver's case the thing is a bit messy as it not only used
> >> > the WC MTRR for a hole but it also then used a UC MTRR on top of
> >> > it all, so since I already tried to address the split, and if we address
> >> > the power of 2 woes, I think it'd be best to try to remove the UC MTRR
> >> > and just avoid set_page_wc() in this driver's case, but for the other cases
> >> > (fusion, ivtv, ipath) I think this makes sense.
> >> >
> >> > Thoughts?
> >>
> >> Once that WC MTRR is in place, I think you really need UC and not UC-
> >> if you want to override it. Otherwise I agree with all of this.
> >
> > Do you mean that the UC MTRR work around that was in place might not
> > have really been effective? Not quite sure what you mean. I don't think
> > I follow.
>
> I mean that the UC MTRR that overrides the WC MTRR was probably fine
> (I hope smaller MTRRs override larger MTRRs). But we should just
> ditch UC MTRRs entirely,
Agreed, this series does that, and this patch addresses the last
UC MTRR ;)
> and setting UC in the page tables would work on all CPUs *if we supported
> that*. We'd need to add a couple trivial helpers to do that.
OK please check my latest reply and if you do not mind clarify what you mean
there as I am not sure if we're on the same page (no pun) here. I don't quite
follow this last statement.
Luis
* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
2015-03-20 23:17 ` [PATCH v1 06/47] mtrr: add __arch_phys_wc_add() Luis R. Rodriguez
2015-03-20 23:48 ` Andy Lutomirski
@ 2015-04-02 20:21 ` Bjorn Helgaas
2015-04-02 20:55 ` Luis R. Rodriguez
1 sibling, 1 reply; 400+ messages in thread
From: Bjorn Helgaas @ 2015-04-02 20:21 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
jgross, Jan Beulich, Borislav Petkov, Suresh Siddha,
venkatesh.pallipadi, Dave Airlie, linux-kernel, linux-fbdev, x86,
xen-devel, Luis R. Rodriguez, Ingo Molnar, Daniel Vetter,
Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen
On Fri, Mar 20, 2015 at 6:17 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> Ideally on systems using PAT we can expect a swift
> transition away from MTRR. There can be a few exceptions
> to this, one is where device drivers are known to exist
> on PATs with errata,
This probably makes sense to someone steeped in MTRR and PAT, but not
otherwise. "One exception is where drivers are known to exist on PATs
with errata"? The drivers exist, independent of PAT/MTRR/errata. Do
you mean there are drivers that can't be converted from MTRR to PAT
because some PATs are broken?
I don't really know anything about MTRR or PAT; I'm just trying to
figure out how to parse this paragraph.
> another situation is observed on
> old device drivers where devices had combined MMIO
> register access with whatever area they typically
> later wanted to end up using MTRR for on the same
> PCI BAR. This situation can still be addressed by
> splitting up ioremap'd PCI BAR into two ioremap'd
> calls, one for MMIO registers, and another for whatever
> is desirable for write-combining -- in order to
> accomplish this though quite a bit of driver
> restructuring is required.
>
> Device drivers which are known to require large
> amount of re-work in order to split ioremap'd areas
> can use __arch_phys_wc_add() to avoid regressions
> when PAT is enabled.
>
> For a good example driver where things are neatly
> split up on a PCI BAR refer the infiniband qib
> driver. For a good example of a driver where good
> amount of work is required refer to the infiniband
> ipath driver.
>
> This is *only* a transitive API -- and as such no new
> drivers are ever expected to use this.
"transient"? Do you mean you intend to remove this API in the near future?
Bjorn
* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
2015-04-02 20:21 ` Bjorn Helgaas
@ 2015-04-02 20:55 ` Luis R. Rodriguez
2015-04-02 22:35 ` Bjorn Helgaas
0 siblings, 1 reply; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-04-02 20:55 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Luis R. Rodriguez, Andy Lutomirski, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin, jgross, Jan Beulich, Borislav Petkov,
Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
linux-fbdev, x86, xen-devel, Ingo Molnar, Daniel Vetter,
Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen
On Thu, Apr 02, 2015 at 03:21:22PM -0500, Bjorn Helgaas wrote:
> On Fri, Mar 20, 2015 at 6:17 PM, Luis R. Rodriguez
> <mcgrof@do-not-panic.com> wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >
> > Ideally on systems using PAT we can expect a swift
> > transition away from MTRR. There can be a few exceptions
> > to this, one is where device drivers are known to exist
> > on PATs with errata,
>
> This probably makes sense to someone steeped in MTRR and PAT, but not
> otherwise. "One exception is where drivers are known to exist on PATs
> with errata"? The drivers exist, independent of PAT/MTRR/errata. Do
> you mean there are drivers that can't be converted from MTRR to PAT
> because some PATs are broken?
Well there is that but it seems we have motivation to
address the PAT broken systems so this would be one of the
lower priority reasons to consider adding this API. The
more important reason is below.
> I don't really know anything about MTRR or PAT; I'm just trying to
> figure out how to parse this paragraph.
Sure.
> > another situation is observed on
> > old device drivers where devices had combined MMIO
> > register access with whatever area they typically
> > later wanted to end up using MTRR for on the same
> > PCI BAR. This situation can still be addressed by
> > splitting up ioremap'd PCI BAR into two ioremap'd
> > calls, one for MMIO registers, and another for whatever
> > is desirable for write-combining -- in order to
> > accomplish this though quite a bit of driver
> > restructuring is required.
> >
> > Device drivers which are known to require large
> > amount of re-work in order to split ioremap'd areas
> > can use __arch_phys_wc_add() to avoid regressions
> > when PAT is enabled.
> >
> > For a good example driver where things are neatly
> > split up on a PCI BAR refer the infiniband qib
> > driver. For a good example of a driver where good
> > amount of work is required refer to the infiniband
> > ipath driver.
> >
> > This is *only* a transitive API -- and as such no new
> > drivers are ever expected to use this.
>
> "transient"? Do you mean you intend to remove this API in the near future?
That's correct, the problem is that in order to use PAT cleanly we'd need to
change these drivers so they do not overlap ioremap'd areas, otherwise things
can get quite complex, and changing the way a driver does its ioremap() calls
might require a bit of work. The atyfb driver changes I did are an example of
the types of changes that are expected. In the worst cases MTRR "hole" tricks
are used, and as can be observed with the atyfb driver changes there is a
series of things to consider when this is done, especially in light of
eventually making strong UC the default instead of UC-.
I might be able to avoid adding this API by reviewing the users I had
in this series again and seeing if something similar to what I will do on atyfb
can be done in the meantime by using ioremap_uc(). It's not clear to me yet.
Luis
* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
2015-04-02 20:55 ` Luis R. Rodriguez
@ 2015-04-02 22:35 ` Bjorn Helgaas
2015-04-02 22:54 ` Luis R. Rodriguez
0 siblings, 1 reply; 400+ messages in thread
From: Bjorn Helgaas @ 2015-04-02 22:35 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Luis R. Rodriguez, Andy Lutomirski, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin, Juergen Gross, Jan Beulich, Borislav Petkov,
Dave Airlie, linux-kernel, linux-fbdev, x86, xen-devel,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
[-cc Venkatesh, Suresh]
On Thu, Apr 2, 2015 at 3:55 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Thu, Apr 02, 2015 at 03:21:22PM -0500, Bjorn Helgaas wrote:
>> On Fri, Mar 20, 2015 at 6:17 PM, Luis R. Rodriguez
>> <mcgrof@do-not-panic.com> wrote:
>> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
>> > This is *only* a transitive API -- and as such no new
>> > drivers are ever expected to use this.
>>
>> "transient"? Do you mean you intend to remove this API in the near future?
>
> That's correct, the problem is that in order to use PAT cleanly we'd need to
> change these drivers ...
I was just trying to ask whether you intended to write "transient"
instead of "transitive." But I'm not doing a very good job :)
Bjorn
* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
2015-04-02 22:35 ` Bjorn Helgaas
@ 2015-04-02 22:54 ` Luis R. Rodriguez
0 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-04-02 22:54 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
Juergen Gross, Jan Beulich, Borislav Petkov, Dave Airlie,
linux-kernel, linux-fbdev, x86, xen-devel, Ingo Molnar,
Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen
On Thu, Apr 2, 2015 at 3:35 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> [-cc Venkatesh, Suresh]
>
> On Thu, Apr 2, 2015 at 3:55 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> On Thu, Apr 02, 2015 at 03:21:22PM -0500, Bjorn Helgaas wrote:
>>> On Fri, Mar 20, 2015 at 6:17 PM, Luis R. Rodriguez
>>> <mcgrof@do-not-panic.com> wrote:
>>> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
>>> > This is *only* a transitive API -- and as such no new
>>> > drivers are ever expected to use this.
>>>
>>> "transient"? Do you mean you intend to remove this API in the near future?
>>
>> That's correct, the problem is that in order to use PAT cleanly we'd need to
>> change these drivers ...
>
> I was just trying to ask whether you intended to write "transient"
> instead of "transitive." But I'm not doing a very good job :)
Yes, corrected, thanks.
Luis
* [PATCH v1 07/47] video: fbdev: atyfb: move framebuffer length fudging to helper
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (5 preceding siblings ...)
2015-03-20 23:17 ` [PATCH v1 06/47] mtrr: add __arch_phys_wc_add() Luis R. Rodriguez
@ 2015-03-20 23:17 ` Luis R. Rodriguez
2015-03-20 23:17 ` [PATCH v1 08/47] video: fbdev: atyfb: clarify ioremap() base and length used Luis R. Rodriguez
` (40 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:17 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Linus Torvalds, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
The size of the framebuffer to be used needs to
be fudged to account for the different types of
devices that are out there. This helper captures what
is required to do that; we'll reuse it later.
This has no functional changes.
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/aty/atyfb_base.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
index 8789e48..16936bb 100644
--- a/drivers/video/fbdev/aty/atyfb_base.c
+++ b/drivers/video/fbdev/aty/atyfb_base.c
@@ -427,6 +427,20 @@ static struct {
#endif /* CONFIG_FB_ATY_CT */
};
+/*
+ * Last page of 8 MB (4 MB on ISA) aperture is MMIO,
+ * unless the auxiliary register aperture is used.
+ */
+static void aty_fudge_framebuffer_len(struct fb_info *info)
+{
+ struct atyfb_par *par = (struct atyfb_par *) info->par;
+
+ if (!par->aux_start &&
+ (info->fix.smem_len == 0x800000 ||
+ (par->bus_type == ISA && info->fix.smem_len == 0x400000)))
+ info->fix.smem_len -= GUI_RESERVE;
+}
+
static int correct_chipset(struct atyfb_par *par)
{
u8 rev;
@@ -2603,14 +2617,7 @@ static int aty_init(struct fb_info *info)
if (par->pll_ops->resume_pll)
par->pll_ops->resume_pll(info, &par->pll);
- /*
- * Last page of 8 MB (4 MB on ISA) aperture is MMIO,
- * unless the auxiliary register aperture is used.
- */
- if (!par->aux_start &&
- (info->fix.smem_len == 0x800000 ||
- (par->bus_type == ISA && info->fix.smem_len == 0x400000)))
- info->fix.smem_len -= GUI_RESERVE;
+ aty_fudge_framebuffer_len(info);
/*
* Disable register access through the linear aperture
--
2.3.2.209.gd67f9d5.dirty
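
[Annotation] The behaviour of the helper in this patch can be exercised outside the kernel with a small userspace model. This is a sketch under stated assumptions: the struct and names below are hypothetical stand-ins for struct fb_info / struct atyfb_par, and the reserve is assumed to be one 4K page, which matches the "last page" wording in the comment.

```c
#include <stdint.h>

#define MODEL_PAGE_SIZE   0x1000u		/* assumption: 4K pages */
#define MODEL_GUI_RESERVE (1 * MODEL_PAGE_SIZE)	/* assumption: one page */

enum model_bus { MODEL_BUS_PCI, MODEL_BUS_ISA };

/* Hypothetical stand-in for the fields the helper looks at. */
struct model_fb {
	uint32_t smem_len;	/* framebuffer length in bytes */
	uintptr_t aux_start;	/* non-zero if the auxiliary aperture is used */
	enum model_bus bus_type;
};

/* Same condition as aty_fudge_framebuffer_len() in the patch: drop the
 * last page of an 8 MB (4 MB on ISA) aperture unless the auxiliary
 * register aperture is used. */
static void model_fudge_framebuffer_len(struct model_fb *fb)
{
	if (!fb->aux_start &&
	    (fb->smem_len == 0x800000 ||
	     (fb->bus_type == MODEL_BUS_ISA && fb->smem_len == 0x400000)))
		fb->smem_len -= MODEL_GUI_RESERVE;
}
```

With these assumptions an 8 MB PCI aperture without an auxiliary aperture shrinks to 0x7ff000, while a 4 MB aperture is only adjusted on ISA.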
* [PATCH v1 08/47] video: fbdev: atyfb: clarify ioremap() base and length used
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (6 preceding siblings ...)
2015-03-20 23:17 ` [PATCH v1 07/47] video: fbdev: atyfb: move framebuffer length fudging to helper Luis R. Rodriguez
@ 2015-03-20 23:17 ` Luis R. Rodriguez
2015-03-20 23:17 ` [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around Luis R. Rodriguez
` (39 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:17 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Linus Torvalds, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
This has no functional changes. It just adjusts
the ioremap() call for the framebuffer to use
the same values we later use for the framebuffer,
which will make it easier to review the next change.
The size of the framebuffer varies but since this is
for PCI we *know* this defaults to 0x800000.
atyfb_setup_generic() is *only* used on PCI probe.
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/aty/atyfb_base.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
index 16936bb..8025624 100644
--- a/drivers/video/fbdev/aty/atyfb_base.c
+++ b/drivers/video/fbdev/aty/atyfb_base.c
@@ -3489,7 +3489,9 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
/* Map in frame buffer */
info->fix.smem_start = addr;
- info->screen_base = ioremap(addr, 0x800000);
+ info->fix.smem_len = 0x800000;
+
+ info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
if (info->screen_base == NULL) {
ret = -ENOMEM;
goto atyfb_setup_generic_fail;
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (7 preceding siblings ...)
2015-03-20 23:17 ` [PATCH v1 08/47] video: fbdev: atyfb: clarify ioremap() base and length used Luis R. Rodriguez
@ 2015-03-20 23:17 ` Luis R. Rodriguez
2015-03-20 23:52 ` Andy Lutomirski
2015-03-21 9:15 ` Ville Syrjälä
2015-03-20 23:18 ` [PATCH v1 10/47] video: fbdev: atyfb: use arch_phys_wc_add() and ioremap_wc() Luis R. Rodriguez
` (38 subsequent siblings)
47 siblings, 2 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:17 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Linus Torvalds, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
The atyfb driver uses an MTRR work around since some
cards use the same PCI BAR for the framebuffer and MMIO.
On such cards the last page is used for MMIO and the rest for
the framebuffer, so on those cards we ioremap() the MMIO
page alone, then again ioremap() the full framebuffer
including the MMIO space *and* ___then___ use an MTRR with
MTRR_TYPE_WRCOMB on the full PCI BAR... and finally punch a
"hole" with an MTRR_TYPE_UNCACHABLE MTRR only for MMIO.
This is a terrible fucking work around, and should by no means
be necessary. However, evidence from converting a large series
of drivers to ioremap_wc() for the framebuffer shows that, around
the time MTRR started becoming popular, devices did not have things
lined up for easily separating the framebuffer and MMIO register
access. In some cases a driver requires significant intrusive
changes in order to split the mapping into an ioremap() for the MMIO
registers and an ioremap_wc() for the framebuffer; at other times a
bit of careful study of the driver suffices. This driver
falls into the latter category.
We can replace the MTRR_TYPE_UNCACHABLE MTRR
work around by using ioremap_nocache(); the length of the
MMIO space should already be correct. The other part we
need to correct is ensuring we ioremap() only the required
size for the framebuffer. Since the ioremap() happens early
on probe for PCI devices, before aty_init() where we typically
adjust the length and know how to do it, we can fix this by
pegging the bus type as PCI on PCI probe, and then fudging
the framebuffer length just as we do in aty_init().
The last thing we must do to remain sane is ensure we
use info->fix.smem_start and info->fix.smem_len for
the framebuffer MTRR, as we know those are always well adjusted.
The *one* concern here would be if the MTRR is not in units
of 4K, __but__ we already know that in the PCI case this cannot
happen: in the shared space setting the MTRR would be up to
0x7ff000 and, assuming a 4K page:
; 0x7ff000 / 0x1000
2047
Also, internally when MTRR is used mtrr_add() will use mtrr_check()
and that should splat a warning when the MTRR base and size are
not compatible with what is expected for MTRR usage.
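The page-unit arithmetic above can be checked with a small userspace
sketch. The helper names and SKETCH_* constants below are illustrative
assumptions, not kernel code; the sketch only assumes a 4 KiB page and a
0x800000 aperture with the last page reserved for MMIO. It reports two
separate properties: being a whole number of pages, which is what the
4K-units concern is about, and being a power of two, which this
arithmetic does not establish.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants, assuming a 4 KiB page as in the text above. */
#define SKETCH_PAGE_SIZE 0x1000u
#define SKETCH_APERTURE  0x800000u /* full shared PCI BAR */
#define SKETCH_FB_LEN    (SKETCH_APERTURE - SKETCH_PAGE_SIZE) /* 0x7ff000 */

/* True when len is a whole number of 4 KiB pages. */
static inline bool is_page_multiple(uint32_t len)
{
	return (len % SKETCH_PAGE_SIZE) == 0;
}

/* Number of whole pages covered by len. */
static inline uint32_t page_count(uint32_t len)
{
	return len / SKETCH_PAGE_SIZE;
}

/* True when len is a non-zero power of two. */
static inline bool is_power_of_two(uint32_t len)
{
	return len != 0 && (len & (len - 1)) == 0;
}
```

With these helpers, page_count(SKETCH_FB_LEN) gives the 2047 computed
above, while is_power_of_two(SKETCH_FB_LEN) is false.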
This fix lets us nuke the MTRR_TYPE_UNCACHABLE MTRR "hole".
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/aty/atyfb.h | 1 -
drivers/video/fbdev/aty/atyfb_base.c | 28 ++++++----------------------
2 files changed, 6 insertions(+), 23 deletions(-)
diff --git a/drivers/video/fbdev/aty/atyfb.h b/drivers/video/fbdev/aty/atyfb.h
index 1f39a62..89ec439 100644
--- a/drivers/video/fbdev/aty/atyfb.h
+++ b/drivers/video/fbdev/aty/atyfb.h
@@ -184,7 +184,6 @@ struct atyfb_par {
spinlock_t int_lock;
#ifdef CONFIG_MTRR
int mtrr_aper;
- int mtrr_reg;
#endif
u32 mem_cntl;
struct crtc saved_crtc;
diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
index 8025624..8875e56 100644
--- a/drivers/video/fbdev/aty/atyfb_base.c
+++ b/drivers/video/fbdev/aty/atyfb_base.c
@@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
#ifdef CONFIG_MTRR
par->mtrr_aper = -1;
- par->mtrr_reg = -1;
if (!nomtrr) {
- /* Cover the whole resource. */
- par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
+ par->mtrr_aper = mtrr_add(info->fix.smem_start,
+ info->fix.smem_len,
MTRR_TYPE_WRCOMB, 1);
- if (par->mtrr_aper >= 0 && !par->aux_start) {
- /* Make a hole for mmio. */
- par->mtrr_reg = mtrr_add(par->res_start + 0x800000 -
- GUI_RESERVE, GUI_RESERVE,
- MTRR_TYPE_UNCACHABLE, 1);
- if (par->mtrr_reg < 0) {
- mtrr_del(par->mtrr_aper, 0, 0);
- par->mtrr_aper = -1;
- }
- }
}
#endif
@@ -2776,10 +2765,6 @@ aty_init_exit:
par->pll_ops->set_pll(info, &par->saved_pll);
#ifdef CONFIG_MTRR
- if (par->mtrr_reg >= 0) {
- mtrr_del(par->mtrr_reg, 0, 0);
- par->mtrr_reg = -1;
- }
if (par->mtrr_aper >= 0) {
mtrr_del(par->mtrr_aper, 0, 0);
par->mtrr_aper = -1;
@@ -3466,7 +3451,7 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
}
info->fix.mmio_start = raddr;
- par->ati_regbase = ioremap(info->fix.mmio_start, 0x1000);
+ par->ati_regbase = ioremap_nocache(info->fix.mmio_start, 0x1000);
if (par->ati_regbase == NULL)
return -ENOMEM;
@@ -3491,6 +3476,8 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
info->fix.smem_start = addr;
info->fix.smem_len = 0x800000;
+ aty_fudge_framebuffer_len(info);
+
info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
if (info->screen_base == NULL) {
ret = -ENOMEM;
@@ -3563,6 +3550,7 @@ static int atyfb_pci_probe(struct pci_dev *pdev,
return -ENOMEM;
}
par = info->par;
+ par->bus_type = PCI;
info->fix = atyfb_fix;
info->device = &pdev->dev;
par->pci_id = pdev->device;
@@ -3732,10 +3720,6 @@ static void atyfb_remove(struct fb_info *info)
#endif
#ifdef CONFIG_MTRR
- if (par->mtrr_reg >= 0) {
- mtrr_del(par->mtrr_reg, 0, 0);
- par->mtrr_reg = -1;
- }
if (par->mtrr_aper >= 0) {
mtrr_del(par->mtrr_aper, 0, 0);
par->mtrr_aper = -1;
--
2.3.2.209.gd67f9d5.dirty
* Re: [PATCH v1 09/47] video: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
2015-03-20 23:17 ` [PATCH v1 09/47] video: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around Luis R. Rodriguez
@ 2015-03-20 23:52 ` Andy Lutomirski
2015-03-27 20:12 ` Luis R. Rodriguez
2015-03-21 9:15 ` Ville Syrjälä
1 sibling, 1 reply; 400+ messages in thread
From: Andy Lutomirski @ 2015-03-20 23:52 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Juergen Gross,
Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
Dave Airlie, linux-kernel, Linux Fbdev development list, X86 ML,
xen-devel, Luis R. Rodriguez, Ingo Molnar, Linus Torvalds,
Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen
On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> The atyfb driver uses an MTRR work around since some
> cards use the same PCI BAR for the framebuffer and MMIO.
> On such cards the last page is used for MMIO and the rest for
> the framebuffer, so on those cards we ioremap() the MMIO
> page alone, then again ioremap() the full framebuffer
> including the MMIO space *and* ___then___ use an MTRR with
> MTRR_TYPE_WRCOMB on the full PCI BAR... and finally punch a
> "hole" with an MTRR_TYPE_UNCACHABLE MTRR only for MMIO.
>
> This is a terrible fucking work around, and should by no means
> be necessary. However, evidence from converting a large series
> of drivers to ioremap_wc() for the framebuffer shows that, around
> the time MTRR started becoming popular, devices did not have things
> lined up for easily separating the framebuffer and MMIO register
> access. In some cases a driver requires significant intrusive
> changes in order to split the mapping into an ioremap() for the MMIO
> registers and an ioremap_wc() for the framebuffer; at other times a
> bit of careful study of the driver suffices. This driver
> falls into the latter category.
>
> We can replace the MTRR_TYPE_UNCACHABLE MTRR
> work around by using ioremap_nocache(); the length of the
> MMIO space should already be correct. The other part we
> need to correct is ensuring we ioremap() only the required
> size for the framebuffer. Since the ioremap() happens early
> on probe for PCI devices, before aty_init() where we typically
> adjust the length and know how to do it, we can fix this by
> pegging the bus type as PCI on PCI probe, and then fudging
> the framebuffer length just as we do in aty_init().
>
> The last thing we must do to remain sane is ensure we
> use info->fix.smem_start and info->fix.smem_len for
> the framebuffer MTRR, as we know those are always well adjusted.
> The *one* concern here would be if the MTRR is not in units
> of 4K, __but__ we already know that in the PCI case this cannot
> happen: in the shared space setting the MTRR would be up to
> 0x7ff000 and, assuming a 4K page:
>
> ; 0x7ff000 / 0x1000
> 2047
>
> Also, internally when MTRR is used mtrr_add() will use mtrr_check()
> and that should splat a warning when the MTRR base and size are
> not compatible with what is expected for MTRR usage.
>
> This fix lets us nuke the MTRR_TYPE_UNCACHABLE MTRR "hole".
>
> Cc: Suresh Siddha <suresh.b.siddha@intel.com>
> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Antonino Daplas <adaplas@gmail.com>
> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> Cc: linux-fbdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> ---
> drivers/video/fbdev/aty/atyfb.h | 1 -
> drivers/video/fbdev/aty/atyfb_base.c | 28 ++++++----------------------
> 2 files changed, 6 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/video/fbdev/aty/atyfb.h b/drivers/video/fbdev/aty/atyfb.h
> index 1f39a62..89ec439 100644
> --- a/drivers/video/fbdev/aty/atyfb.h
> +++ b/drivers/video/fbdev/aty/atyfb.h
> @@ -184,7 +184,6 @@ struct atyfb_par {
> spinlock_t int_lock;
> #ifdef CONFIG_MTRR
> int mtrr_aper;
> - int mtrr_reg;
> #endif
> u32 mem_cntl;
> struct crtc saved_crtc;
> diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> index 8025624..8875e56 100644
> --- a/drivers/video/fbdev/aty/atyfb_base.c
> +++ b/drivers/video/fbdev/aty/atyfb_base.c
> @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
>
> #ifdef CONFIG_MTRR
> par->mtrr_aper = -1;
> - par->mtrr_reg = -1;
> if (!nomtrr) {
> - /* Cover the whole resource. */
> - par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> + par->mtrr_aper = mtrr_add(info->fix.smem_start,
> + info->fix.smem_len,
> MTRR_TYPE_WRCOMB, 1);
> - if (par->mtrr_aper >= 0 && !par->aux_start) {
> - /* Make a hole for mmio. */
> - par->mtrr_reg = mtrr_add(par->res_start + 0x800000 -
> - GUI_RESERVE, GUI_RESERVE,
> - MTRR_TYPE_UNCACHABLE, 1);
> - if (par->mtrr_reg < 0) {
> - mtrr_del(par->mtrr_aper, 0, 0);
> - par->mtrr_aper = -1;
> - }
> - }
> }
> #endif
>
> @@ -2776,10 +2765,6 @@ aty_init_exit:
> par->pll_ops->set_pll(info, &par->saved_pll);
>
> #ifdef CONFIG_MTRR
> - if (par->mtrr_reg >= 0) {
> - mtrr_del(par->mtrr_reg, 0, 0);
> - par->mtrr_reg = -1;
> - }
> if (par->mtrr_aper >= 0) {
> mtrr_del(par->mtrr_aper, 0, 0);
> par->mtrr_aper = -1;
> @@ -3466,7 +3451,7 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
> }
>
> info->fix.mmio_start = raddr;
> - par->ati_regbase = ioremap(info->fix.mmio_start, 0x1000);
> + par->ati_regbase = ioremap_nocache(info->fix.mmio_start, 0x1000);
Double-check me, but I think that ioremap_nocache + WC MTRR = WC. I
think we might need ioremap_nocache_me_harder (or maybe ioremap_x86_uc
if you prefer that bikeshed color) for this.
* Re: [PATCH v1 09/47] video: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
2015-03-20 23:52 ` Andy Lutomirski
@ 2015-03-27 20:12 ` Luis R. Rodriguez
2015-03-27 21:21 ` Andy Lutomirski
0 siblings, 1 reply; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 20:12 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
Juergen Gross, Jan Beulich, Borislav Petkov, Suresh Siddha,
venkatesh.pallipadi, Dave Airlie, linux-kernel,
Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
Linus Torvalds, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
On Fri, Mar 20, 2015 at 04:52:18PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
> <mcgrof@do-not-panic.com> wrote:
> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> > index 8025624..8875e56 100644
> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> >
> > #ifdef CONFIG_MTRR
> > par->mtrr_aper = -1;
> > - par->mtrr_reg = -1;
> > if (!nomtrr) {
> > - /* Cover the whole resource. */
> > - par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> > + par->mtrr_aper = mtrr_add(info->fix.smem_start,
> > + info->fix.smem_len,
> > MTRR_TYPE_WRCOMB, 1);
> > - if (par->mtrr_aper >= 0 && !par->aux_start) {
> > - /* Make a hole for mmio. */
> > - par->mtrr_reg = mtrr_add(par->res_start + 0x800000 -
> > - GUI_RESERVE, GUI_RESERVE,
> > - MTRR_TYPE_UNCACHABLE, 1);
> > - if (par->mtrr_reg < 0) {
> > - mtrr_del(par->mtrr_aper, 0, 0);
> > - par->mtrr_aper = -1;
> > - }
> > - }
> > }
> > #endif
> >
> > @@ -2776,10 +2765,6 @@ aty_init_exit:
> > par->pll_ops->set_pll(info, &par->saved_pll);
> >
> > #ifdef CONFIG_MTRR
> > - if (par->mtrr_reg >= 0) {
> > - mtrr_del(par->mtrr_reg, 0, 0);
> > - par->mtrr_reg = -1;
> > - }
> > if (par->mtrr_aper >= 0) {
> > mtrr_del(par->mtrr_aper, 0, 0);
> > par->mtrr_aper = -1;
> > @@ -3466,7 +3451,7 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
> > }
> >
> > info->fix.mmio_start = raddr;
> > - par->ati_regbase = ioremap(info->fix.mmio_start, 0x1000);
> > + par->ati_regbase = ioremap_nocache(info->fix.mmio_start, 0x1000);
>
> Double-check me, but I think that ioremap_nocache + WC MTRR = WC.
Precisely, in this case the WC hole was obtained by using MTRR WC. This
patch removes that WC hole trick and now we can be explicit about
only wanting ioremap_nocache() on the registers; that is, WC is not
desired here and is not used. The patch does not highlight the fact
that another ioremap() call was left in place for the framebuffer:
info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
That is the one we convert to ioremap_wc() later, after this patch.
This patch just removes the hole solution. That's all.
Luis
* Re: [PATCH v1 09/47] video: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
2015-03-27 20:12 ` Luis R. Rodriguez
@ 2015-03-27 21:21 ` Andy Lutomirski
2015-03-27 23:31 ` Luis R. Rodriguez
0 siblings, 1 reply; 400+ messages in thread
From: Andy Lutomirski @ 2015-03-27 21:21 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
Juergen Gross, Jan Beulich, Borislav Petkov, Suresh Siddha,
venkatesh.pallipadi, Dave Airlie, linux-kernel,
Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
Linus Torvalds, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
On Fri, Mar 27, 2015 at 1:12 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Fri, Mar 20, 2015 at 04:52:18PM -0700, Andy Lutomirski wrote:
>> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
>> <mcgrof@do-not-panic.com> wrote:
>> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
>> > index 8025624..8875e56 100644
>> > --- a/drivers/video/fbdev/aty/atyfb_base.c
>> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
>> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
>> >
>> > #ifdef CONFIG_MTRR
>> > par->mtrr_aper = -1;
>> > - par->mtrr_reg = -1;
>> > if (!nomtrr) {
>> > - /* Cover the whole resource. */
>> > - par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
>> > + par->mtrr_aper = mtrr_add(info->fix.smem_start,
>> > + info->fix.smem_len,
>> > MTRR_TYPE_WRCOMB, 1);
>> > - if (par->mtrr_aper >= 0 && !par->aux_start) {
>> > - /* Make a hole for mmio. */
>> > - par->mtrr_reg = mtrr_add(par->res_start + 0x800000 -
>> > - GUI_RESERVE, GUI_RESERVE,
>> > - MTRR_TYPE_UNCACHABLE, 1);
>> > - if (par->mtrr_reg < 0) {
>> > - mtrr_del(par->mtrr_aper, 0, 0);
>> > - par->mtrr_aper = -1;
>> > - }
>> > - }
>> > }
>> > #endif
>> >
>> > @@ -2776,10 +2765,6 @@ aty_init_exit:
>> > par->pll_ops->set_pll(info, &par->saved_pll);
>> >
>> > #ifdef CONFIG_MTRR
>> > - if (par->mtrr_reg >= 0) {
>> > - mtrr_del(par->mtrr_reg, 0, 0);
>> > - par->mtrr_reg = -1;
>> > - }
>> > if (par->mtrr_aper >= 0) {
>> > mtrr_del(par->mtrr_aper, 0, 0);
>> > par->mtrr_aper = -1;
>> > @@ -3466,7 +3451,7 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
>> > }
>> >
>> > info->fix.mmio_start = raddr;
>> > - par->ati_regbase = ioremap(info->fix.mmio_start, 0x1000);
>> > + par->ati_regbase = ioremap_nocache(info->fix.mmio_start, 0x1000);
>>
>> Double-check me, but I think that ioremap_nocache + WC MTRR = WC.
>
> Precisely, in this case the WC hole was obtained by using MTRR WC. This
> patch removes that WC hole trick and now we can be explicit about
> only wanting ioremap_nocache() on the registers; that is, WC is not
> desired here and is not used. The patch does not highlight the fact
> that another ioremap() call was left in place for the framebuffer:
>
> info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
>
> That is the one we convert to ioremap_wc() later, after this patch.
> This patch just removes the hole solution. That's all.
>
I don't understand.
If I read it right, there's a 2^n byte BAR. You're requesting WC for
the whole thing using arch_phys_wc_add. On a PAT system that has no
effect and all is well. On a non-PAT system, it adds an MTRR. That
means that you need to override the MTRR somehow for the mmio regs,
and UC- won't do the trick.
Or am I missing something here?
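The concern about UC- can be modelled as a small lookup. This is only a
sketch of how a page-table (PAT) memory type combines with an MTRR type
on x86, as documented in the Intel SDM's effective-memory-type table;
the enum and function names are hypothetical and it models only the
types discussed in this thread.

```c
/* Hypothetical names; only the memory types discussed here are modelled. */
enum sketch_memtype { SK_UC, SK_UC_MINUS, SK_WC, SK_WB };

/*
 * Effective memory type when a page-table (PAT) type is combined with
 * the MTRR type covering the same physical range.
 */
static inline enum sketch_memtype
sketch_effective_type(enum sketch_memtype pat, enum sketch_memtype mtrr)
{
	switch (pat) {
	case SK_UC:       /* strong UC: uncached whatever the MTRR says */
		return SK_UC;
	case SK_WC:       /* PAT WC: write-combining whatever the MTRR says */
		return SK_WC;
	case SK_UC_MINUS: /* UC-: a WC MTRR wins, otherwise uncached */
		return mtrr == SK_WC ? SK_WC : SK_UC;
	default:          /* PAT WB: the MTRR type applies */
		return mtrr;
	}
}
```

In this model sketch_effective_type(SK_UC_MINUS, SK_WC) yields SK_WC,
which is the case being questioned, whereas a strong UC mapping stays
uncached under a WC MTRR.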
--Andy
> Luis
--
Andy Lutomirski
AMA Capital Management, LLC
* Re: [PATCH v1 09/47] video: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
2015-03-27 21:21 ` Andy Lutomirski
@ 2015-03-27 23:31 ` Luis R. Rodriguez
0 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 23:31 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
Juergen Gross, Jan Beulich, Borislav Petkov, Suresh Siddha,
venkatesh.pallipadi, Dave Airlie, linux-kernel,
Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
Linus Torvalds, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
On Fri, Mar 27, 2015 at 02:21:34PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 27, 2015 at 1:12 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Fri, Mar 20, 2015 at 04:52:18PM -0700, Andy Lutomirski wrote:
> >> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
> >> <mcgrof@do-not-panic.com> wrote:
> >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> >> > index 8025624..8875e56 100644
> >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> >> >
> >> > #ifdef CONFIG_MTRR
> >> > par->mtrr_aper = -1;
> >> > - par->mtrr_reg = -1;
> >> > if (!nomtrr) {
> >> > - /* Cover the whole resource. */
> >> > - par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> >> > + par->mtrr_aper = mtrr_add(info->fix.smem_start,
> >> > + info->fix.smem_len,
> >> > MTRR_TYPE_WRCOMB, 1);
> >> > - if (par->mtrr_aper >= 0 && !par->aux_start) {
> >> > - /* Make a hole for mmio. */
> >> > - par->mtrr_reg = mtrr_add(par->res_start + 0x800000 -
> >> > - GUI_RESERVE, GUI_RESERVE,
> >> > - MTRR_TYPE_UNCACHABLE, 1);
> >> > - if (par->mtrr_reg < 0) {
> >> > - mtrr_del(par->mtrr_aper, 0, 0);
> >> > - par->mtrr_aper = -1;
> >> > - }
> >> > - }
> >> > }
> >> > #endif
> >> >
> >> > @@ -2776,10 +2765,6 @@ aty_init_exit:
> >> > par->pll_ops->set_pll(info, &par->saved_pll);
> >> >
> >> > #ifdef CONFIG_MTRR
> >> > - if (par->mtrr_reg >= 0) {
> >> > - mtrr_del(par->mtrr_reg, 0, 0);
> >> > - par->mtrr_reg = -1;
> >> > - }
> >> > if (par->mtrr_aper >= 0) {
> >> > mtrr_del(par->mtrr_aper, 0, 0);
> >> > par->mtrr_aper = -1;
> >> > @@ -3466,7 +3451,7 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
> >> > }
> >> >
> >> > info->fix.mmio_start = raddr;
> >> > - par->ati_regbase = ioremap(info->fix.mmio_start, 0x1000);
> >> > + par->ati_regbase = ioremap_nocache(info->fix.mmio_start, 0x1000);
> >>
> >> Double-check me, but I think that ioremap_nocache + WC MTRR = WC.
> >
> > Precisely, in this case the WC hole was obtained by using MTRR WC. This
> > patch removes that WC hole trick and now we can be explicit about
> > only wanting ioremap_nocache() on the registers; that is, WC is not
> > desired here and is not used. The patch does not highlight the fact
> > that another ioremap() call was left in place for the framebuffer:
> >
> > info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
> >
> > That is the one we convert to ioremap_wc() later, after this patch.
> > This patch just removes the hole solution. That's all.
> >
>
> I don't understand.
>
> If I read it right, there's a 2^n byte BAR. You're requesting WC for
> the whole thing using arch_phys_wc_add.
I believe there is a misunderstanding about the order of changes.
Let's separate when we use mtrr_add() vs. arch_phys_wc_add() to avoid
confusion, as in this patch we don't yet use arch_phys_wc_add(). That
is done in the next patch.
The commit log describes best the state of affairs prior to this
patch:
The atyfb driver uses an MTRR work around since some
cards use the same PCI BAR for the framebuffer and MMIO.
On such cards the last page is used for MMIO and the rest for
the framebuffer, so on those cards we ioremap() the MMIO
page alone, then again ioremap() the full framebuffer
including the MMIO space *and* ___then___ use an MTRR with
MTRR_TYPE_WRCOMB on the full PCI BAR... and finally punch a
"hole" with an MTRR_TYPE_UNCACHABLE MTRR only for MMIO.
Then this patch drops the MTRR_TYPE_UNCACHABLE hole
and instead corrects the ioremap() call for the framebuffer
so that it covers the framebuffer alone. For the MMIO
area we switch to ioremap_nocache(). The MTRR that is left
is now only *for the framebuffer* and it should not be touching
the MMIO area. So the MMIO area has its own ioremap_nocache()
mapping, and the framebuffer is left with an ioremap() followed
by an mtrr_add() call.
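The split described above can be pictured with a small userspace
sketch. Everything here is an illustrative assumption rather than driver
code: the struct, the helper names, and the sizes (a 0x800000 BAR whose
last 0x1000 bytes are MMIO, as in the commit log). It only shows that
the framebuffer range and the MMIO range do not overlap once the
framebuffer length is reduced by one page.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a physical address range. */
struct sketch_range {
	uint64_t start;
	uint64_t len;
};

#define SKETCH_BAR_LEN  0x800000u /* shared PCI BAR size */
#define SKETCH_MMIO_LEN 0x1000u   /* last page holds the registers */

/* Framebuffer: everything in the BAR except the last page. */
static inline struct sketch_range sketch_fb_range(uint64_t bar_start)
{
	struct sketch_range r = { bar_start, SKETCH_BAR_LEN - SKETCH_MMIO_LEN };
	return r;
}

/* MMIO registers: the last page of the BAR. */
static inline struct sketch_range sketch_mmio_range(uint64_t bar_start)
{
	struct sketch_range r = {
		bar_start + SKETCH_BAR_LEN - SKETCH_MMIO_LEN,
		SKETCH_MMIO_LEN
	};
	return r;
}

/* True when the two half-open ranges share at least one byte. */
static inline bool sketch_ranges_overlap(struct sketch_range a,
					 struct sketch_range b)
{
	return a.start < b.start + b.len && b.start < a.start + a.len;
}
```

For a made-up BAR base of 0xfd000000, the framebuffer range is 0x7ff000
bytes long, the MMIO range starts at 0xfd7ff000, and the two do not
overlap.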
The next patch replaces the mtrr_add() with arch_phys_wc_add()
and then also uses ioremap_wc().
> On a PAT system that has no effect and all is well.
Yeah, we're not doing arch_phys_wc_add() on the entire PCI BAR.
That was dumb; this fixes that, and in this patch mtrr_add()
is still used.
> On a non-PAT system, it adds an MTRR. That
> means that you need to override the MTRR somehow for the mmio regs,
> and UC- won't do the trick.
We don't need to solve that problem here as the MTRR should only
be for the framebuffer.
Luis
* Re: [PATCH v1 09/47] video: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
2015-03-20 23:17 ` [PATCH v1 09/47] video: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around Luis R. Rodriguez
2015-03-20 23:52 ` Andy Lutomirski
@ 2015-03-21 9:15 ` Ville Syrjälä
2015-03-27 8:37 ` Ville Syrjälä
2015-03-27 19:38 ` Luis R. Rodriguez
1 sibling, 2 replies; 400+ messages in thread
From: Ville Syrjälä @ 2015-03-21 9:15 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied, linux-kernel, linux-fbdev, x86,
xen-devel, Luis R. Rodriguez, Ingo Molnar, Linus Torvalds,
Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen
On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> index 8025624..8875e56 100644
> --- a/drivers/video/fbdev/aty/atyfb_base.c
> +++ b/drivers/video/fbdev/aty/atyfb_base.c
> @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
>
> #ifdef CONFIG_MTRR
> par->mtrr_aper = -1;
> - par->mtrr_reg = -1;
> if (!nomtrr) {
> - /* Cover the whole resource. */
> - par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> + par->mtrr_aper = mtrr_add(info->fix.smem_start,
> + info->fix.smem_len,
> MTRR_TYPE_WRCOMB, 1);
MTRRs need power of two size, so how is this supposed to work?
> - if (par->mtrr_aper >= 0 && !par->aux_start) {
> - /* Make a hole for mmio. */
> - par->mtrr_reg = mtrr_add(par->res_start + 0x800000 -
> - GUI_RESERVE, GUI_RESERVE,
> - MTRR_TYPE_UNCACHABLE, 1);
> - if (par->mtrr_reg < 0) {
> - mtrr_del(par->mtrr_aper, 0, 0);
> - par->mtrr_aper = -1;
> - }
> - }
> }
> #endif
>
> @@ -2776,10 +2765,6 @@ aty_init_exit:
> par->pll_ops->set_pll(info, &par->saved_pll);
>
> #ifdef CONFIG_MTRR
> - if (par->mtrr_reg >= 0) {
> - mtrr_del(par->mtrr_reg, 0, 0);
> - par->mtrr_reg = -1;
> - }
> if (par->mtrr_aper >= 0) {
> mtrr_del(par->mtrr_aper, 0, 0);
> par->mtrr_aper = -1;
> @@ -3466,7 +3451,7 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
> }
>
> info->fix.mmio_start = raddr;
> - par->ati_regbase = ioremap(info->fix.mmio_start, 0x1000);
> + par->ati_regbase = ioremap_nocache(info->fix.mmio_start, 0x1000);
> if (par->ati_regbase == NULL)
> return -ENOMEM;
>
> @@ -3491,6 +3476,8 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
> info->fix.smem_start = addr;
> info->fix.smem_len = 0x800000;
>
> + aty_fudge_framebuffer_len(info);
> +
> info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
> if (info->screen_base == NULL) {
> ret = -ENOMEM;
> @@ -3563,6 +3550,7 @@ static int atyfb_pci_probe(struct pci_dev *pdev,
> return -ENOMEM;
> }
> par = info->par;
> + par->bus_type = PCI;
> info->fix = atyfb_fix;
> info->device = &pdev->dev;
> par->pci_id = pdev->device;
> @@ -3732,10 +3720,6 @@ static void atyfb_remove(struct fb_info *info)
> #endif
>
> #ifdef CONFIG_MTRR
> - if (par->mtrr_reg >= 0) {
> - mtrr_del(par->mtrr_reg, 0, 0);
> - par->mtrr_reg = -1;
> - }
> if (par->mtrr_aper >= 0) {
> mtrr_del(par->mtrr_aper, 0, 0);
> par->mtrr_aper = -1;
> --
> 2.3.2.209.gd67f9d5.dirty
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Ville Syrjälä
syrjala@sci.fi
http://www.sci.fi/~syrjala/
* Re: [PATCH v1 09/47] video: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
2015-03-21 9:15 ` Ville Syrjälä
@ 2015-03-27 8:37 ` Ville Syrjälä
2015-03-27 19:38 ` Luis R. Rodriguez
2015-03-27 19:38 ` Luis R. Rodriguez
1 sibling, 1 reply; 400+ messages in thread
From: Ville Syrjälä @ 2015-03-27 8:37 UTC (permalink / raw)
To: Luis R. Rodriguez, luto, mingo, tglx, hpa, jgross, JBeulich, bp,
suresh.b.siddha, venkatesh.pallipadi, airlied, linux-kernel,
linux-fbdev, x86, xen-devel, Luis R. Rodriguez, Ingo Molnar,
Linus Torvalds, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> > index 8025624..8875e56 100644
> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> >
> > #ifdef CONFIG_MTRR
> > par->mtrr_aper = -1;
> > - par->mtrr_reg = -1;
> > if (!nomtrr) {
> > - /* Cover the whole resource. */
> > - par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> > + par->mtrr_aper = mtrr_add(info->fix.smem_start,
> > + info->fix.smem_len,
> > MTRR_TYPE_WRCOMB, 1);
>
> MTRRs need power of two size, so how is this supposed to work?
Still waiting for an answer...
>
> > - if (par->mtrr_aper >= 0 && !par->aux_start) {
> > - /* Make a hole for mmio. */
> > - par->mtrr_reg = mtrr_add(par->res_start + 0x800000 -
> > - GUI_RESERVE, GUI_RESERVE,
> > - MTRR_TYPE_UNCACHABLE, 1);
> > - if (par->mtrr_reg < 0) {
> > - mtrr_del(par->mtrr_aper, 0, 0);
> > - par->mtrr_aper = -1;
> > - }
> > - }
> > }
> > #endif
> >
> > @@ -2776,10 +2765,6 @@ aty_init_exit:
> > par->pll_ops->set_pll(info, &par->saved_pll);
> >
> > #ifdef CONFIG_MTRR
> > - if (par->mtrr_reg >= 0) {
> > - mtrr_del(par->mtrr_reg, 0, 0);
> > - par->mtrr_reg = -1;
> > - }
> > if (par->mtrr_aper >= 0) {
> > mtrr_del(par->mtrr_aper, 0, 0);
> > par->mtrr_aper = -1;
> > @@ -3466,7 +3451,7 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
> > }
> >
> > info->fix.mmio_start = raddr;
> > - par->ati_regbase = ioremap(info->fix.mmio_start, 0x1000);
> > + par->ati_regbase = ioremap_nocache(info->fix.mmio_start, 0x1000);
> > if (par->ati_regbase == NULL)
> > return -ENOMEM;
> >
> > @@ -3491,6 +3476,8 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
> > info->fix.smem_start = addr;
> > info->fix.smem_len = 0x800000;
> >
> > + aty_fudge_framebuffer_len(info);
> > +
> > info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
> > if (info->screen_base == NULL) {
> > ret = -ENOMEM;
> > @@ -3563,6 +3550,7 @@ static int atyfb_pci_probe(struct pci_dev *pdev,
> > return -ENOMEM;
> > }
> > par = info->par;
> > + par->bus_type = PCI;
> > info->fix = atyfb_fix;
> > info->device = &pdev->dev;
> > par->pci_id = pdev->device;
> > @@ -3732,10 +3720,6 @@ static void atyfb_remove(struct fb_info *info)
> > #endif
> >
> > #ifdef CONFIG_MTRR
> > - if (par->mtrr_reg >= 0) {
> > - mtrr_del(par->mtrr_reg, 0, 0);
> > - par->mtrr_reg = -1;
> > - }
> > if (par->mtrr_aper >= 0) {
> > mtrr_del(par->mtrr_aper, 0, 0);
> > par->mtrr_aper = -1;
> > --
> > 2.3.2.209.gd67f9d5.dirty
> >
>
> --
> Ville Syrjälä
> syrjala@sci.fi
> http://www.sci.fi/~syrjala/
--
Ville Syrjälä
syrjala@sci.fi
http://www.sci.fi/~syrjala/
* Re: [PATCH v1 09/47] video: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
2015-03-27 8:37 ` Ville Syrjälä
@ 2015-03-27 19:38 ` Luis R. Rodriguez
0 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 19:38 UTC (permalink / raw)
To: Ville Syrjälä,
Luis R. Rodriguez, luto, mingo, tglx, hpa, jgross, JBeulich, bp,
suresh.b.siddha, venkatesh.pallipadi, airlied, linux-kernel,
linux-fbdev, x86, xen-devel, Ingo Molnar, Linus Torvalds,
Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen
On Fri, Mar 27, 2015 at 10:37:04AM +0200, Ville Syrjälä wrote:
> On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> > On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> > > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> > > index 8025624..8875e56 100644
> > > --- a/drivers/video/fbdev/aty/atyfb_base.c
> > > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> > > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> > >
> > > #ifdef CONFIG_MTRR
> > > par->mtrr_aper = -1;
> > > - par->mtrr_reg = -1;
> > > if (!nomtrr) {
> > > - /* Cover the whole resource. */
> > > - par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> > > + par->mtrr_aper = mtrr_add(info->fix.smem_start,
> > > + info->fix.smem_len,
> > > MTRR_TYPE_WRCOMB, 1);
> >
> > MTRRs need power of two size, so how is this supposed to work?
>
> Still waiting for an answer...
Sorry was in the desert for a bit, I'm back now.
Luis
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
2015-03-21 9:15 ` Ville Syrjälä
2015-03-27 8:37 ` Ville Syrjälä
@ 2015-03-27 19:38 ` Luis R. Rodriguez
2015-03-27 19:43 ` Andy Lutomirski
1 sibling, 1 reply; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 19:38 UTC (permalink / raw)
To: Ville Syrjälä,
Bjorn Helgaas, Luis R. Rodriguez, Andy Lutomirski
Cc: mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied, linux-kernel, linux-fbdev, x86,
xen-devel, Ingo Molnar, Linus Torvalds, Daniel Vetter,
Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen
On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> > index 8025624..8875e56 100644
> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> >
> > #ifdef CONFIG_MTRR
> > par->mtrr_aper = -1;
> > - par->mtrr_reg = -1;
> > if (!nomtrr) {
> > - /* Cover the whole resource. */
> > - par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> > + par->mtrr_aper = mtrr_add(info->fix.smem_start,
> > + info->fix.smem_len,
> > MTRR_TYPE_WRCOMB, 1);
>
> MTRRs need power of two size, so how is this supposed to work?
As per mtrr_add_page() [0] the base and size are just supposed to be in units
of 4 KiB. Although the practice is to use powers of 2 in *some* drivers, this
is not standardized and by no means recorded as a requirement. Obviously
powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
will use mtrr_check() to verify the same requirement. Furthermore,
as per my commit log message:
---
The last thing we must do to remain sane is ensure we
use the info->fix.smem_start and info->fix.smem_len for
the framebuffer MTRR as we know that is always well adjusted.
The *one* concern here would be if the MTRR is not in units
of 4K __but__ we already know that in the PCI case this cannot
happen, in the shared space setting the MTRR would be up to
0x7ff000 and assuming a 4K page:
; 0x7ff000 / 0x1000
2047
Also, internally when MTRR is used mtrr_add() will use mtrr_check()
and that should splat a warning when the MTRR base and size are
not compatible with what is expected for MTRR usage.
---
If any of this is too risky we can use the __arch_phys_wc_add() (or as
Andy suggested perhaps use set_page_* stuff, although I am still evaluating
this) but I did this change to show the effort required for a change when
the registers / framebuffer is on the same PCI BAR but at different offsets.
[0] scripts/kernel-doc -man -function mtrr_add_page arch/x86/kernel/cpu/mtrr/main.c | nroff -man | less
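As a rough illustration of the arithmetic above, the sketch below separates two
properties of a size: whether it is a whole number of 4 KiB pages, and whether
it is a power of two. The helper names are made up for this example and are not
kernel APIs; only the value 0x7ff000 comes from the commit log.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 0x1000ULL /* 4 KiB page, as assumed in the commit log */

/* Hypothetical helper: true when size is a whole number of 4 KiB pages. */
static inline bool sketch_is_page_multiple(uint64_t size)
{
	return (size % SKETCH_PAGE_SIZE) == 0;
}

/* Hypothetical helper: number of whole 4 KiB pages in size. */
static inline uint64_t sketch_size_in_pages(uint64_t size)
{
	return size / SKETCH_PAGE_SIZE;
}

/* Hypothetical helper: true when size is a non-zero power of two. */
static inline bool sketch_is_power_of_two(uint64_t size)
{
	return size != 0 && (size & (size - 1)) == 0;
}
```

For 0x7ff000 the page-multiple check passes and the page count is 2047, while
the power-of-two check fails; 0x800000 passes both.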
Luis
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
2015-03-27 19:38 ` Luis R. Rodriguez
@ 2015-03-27 19:43 ` Andy Lutomirski
2015-03-27 19:57 ` Luis R. Rodriguez
0 siblings, 1 reply; 400+ messages in thread
From: Andy Lutomirski @ 2015-03-27 19:43 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Ville Syrjälä,
Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin, Juergen Gross, Jan Beulich, Borislav Petkov,
Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
Linus Torvalds, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
>> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
>> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
>> > index 8025624..8875e56 100644
>> > --- a/drivers/video/fbdev/aty/atyfb_base.c
>> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
>> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
>> >
>> > #ifdef CONFIG_MTRR
>> > par->mtrr_aper = -1;
>> > - par->mtrr_reg = -1;
>> > if (!nomtrr) {
>> > - /* Cover the whole resource. */
>> > - par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
>> > + par->mtrr_aper = mtrr_add(info->fix.smem_start,
>> > + info->fix.smem_len,
>> > MTRR_TYPE_WRCOMB, 1);
>>
>> MTRRs need power of two size, so how is this supposed to work?
>
> As per mtrr_add_page() [0] the base and size are just supposed to be in units
> of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
> is not standardized and by no means recorded as a requirement. Obviously
> powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
> will use mtrr_check() to verify the the same requirement. Furthermore,
> as per my commit log message:
Whatever the code may or may not do, the x86 architecture uses
power-of-two MTRR sizes. So I'm confused.
--Andy
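A sketch of where the power-of-two constraint comes from: a variable-range MTRR
is described by a base and a mask, and an address falls in the range when
(addr & mask) == (base & mask). Such a pair can only describe a block whose
size is a power of two and whose base is aligned to that size. The function
names below are illustrative, not kernel APIs, and the mask here ignores the
physical-address-width truncation that the real registers apply.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True when (base, size) can be expressed as a single base/mask pair:
 * size must be a non-zero power of two and base a multiple of size.
 */
static inline bool sketch_mtrr_encodable(uint64_t base, uint64_t size)
{
	if (size == 0 || (size & (size - 1)) != 0)
		return false;
	return (base & (size - 1)) == 0;
}

/* Mask for a power-of-two size: every address bit above the block offset. */
static inline uint64_t sketch_mtrr_mask(uint64_t size)
{
	return ~(size - 1);
}

/* Hardware-style match test for one base/mask pair. */
static inline bool sketch_mtrr_matches(uint64_t addr, uint64_t base,
				       uint64_t mask)
{
	return (addr & mask) == (base & mask);
}
```

With an arbitrary 8 MiB-aligned base such as 0xd0000000, a size of 0x800000 is
encodable and a size of 0x7ff000 is not.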
>
> ---
> The last thing we do must do to remain sane is ensure we
> use the info->fix.smem_start and info->fix.smem_len for
> the framebuffer MTRR as we know that is always well adjusted.
> The *one* concern here would be if the MTRR is not in units
> of 4K __but__ we already know that in the PCI case this cannot
> happen, in the shared space setting the MTRR would be up to
> 0x7ff000 and assuming a 4K page:
>
> ; 0x7ff000 / 0x1000
> 2047
>
> Also, internally when MTRR is used mtrr_add() will use mtrr_check()
> and that should splat a warning when the MTRR base and size are
> not compatible with what is expected for MTRR usage.
> ---
>
> If any of this is too risky we can use the __arch_phys_wc_add() (or as
> Andy suggested perhaps use set_page_* stuff, although I am still evaluating
> this) but I did this change to show the effort required for a change when
> the registers / framebuffer is on the same PCI BAR but at different offsets.
>
> [0] scripts/kernel-doc -man -function mtrr_add_page arch/x86/kernel/cpu/mtrr/main.c | nroff -man | less
>
> Luis
--
Andy Lutomirski
AMA Capital Management, LLC
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
2015-03-27 19:43 ` Andy Lutomirski
@ 2015-03-27 19:57 ` Luis R. Rodriguez
2015-03-27 21:56 ` Ville Syrjälä
0 siblings, 1 reply; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 19:57 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Ville Syrjälä,
Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin, Juergen Gross, Jan Beulich, Borislav Petkov,
Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
Linus Torvalds, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> >> > index 8025624..8875e56 100644
> >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> >> >
> >> > #ifdef CONFIG_MTRR
> >> > par->mtrr_aper = -1;
> >> > - par->mtrr_reg = -1;
> >> > if (!nomtrr) {
> >> > - /* Cover the whole resource. */
> >> > - par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> >> > + par->mtrr_aper = mtrr_add(info->fix.smem_start,
> >> > + info->fix.smem_len,
> >> > MTRR_TYPE_WRCOMB, 1);
> >>
> >> MTRRs need power of two size, so how is this supposed to work?
> >
> > As per mtrr_add_page() [0] the base and size are just supposed to be in units
> > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
> > is not standardized and by no means recorded as a requirement. Obviously
> > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
> > will use mtrr_check() to verify the the same requirement. Furthermore,
> > as per my commit log message:
>
> Whatever the code may or may not do, the x86 architecture uses
> power-of-two MTRR sizes. So I'm confused.
There should be no confusion: I simply did not know that *was* the
requirement for x86. If that is the case we should add a check for that
and perhaps generalize a helper that does the power-of-two size adjustment;
the cleanest I found was the vesafb driver solution.
Thoughts?
Luis
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
2015-03-27 19:57 ` Luis R. Rodriguez
@ 2015-03-27 21:56 ` Ville Syrjälä
2015-03-27 22:02 ` Andy Lutomirski
2015-03-28 0:21 ` Luis R. Rodriguez
0 siblings, 2 replies; 400+ messages in thread
From: Ville Syrjälä @ 2015-03-27 21:56 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Andy Lutomirski, Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar,
Thomas Gleixner, H. Peter Anvin, Juergen Gross, Jan Beulich,
Borislav Petkov, Suresh Siddha, venkatesh.pallipadi, Dave Airlie,
linux-kernel, Linux Fbdev development list, X86 ML, xen-devel,
Ingo Molnar, Linus Torvalds, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
> On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
> > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> > >> > index 8025624..8875e56 100644
> > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> > >> >
> > >> > #ifdef CONFIG_MTRR
> > >> > par->mtrr_aper = -1;
> > >> > - par->mtrr_reg = -1;
> > >> > if (!nomtrr) {
> > >> > - /* Cover the whole resource. */
> > >> > - par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> > >> > + par->mtrr_aper = mtrr_add(info->fix.smem_start,
> > >> > + info->fix.smem_len,
> > >> > MTRR_TYPE_WRCOMB, 1);
> > >>
> > >> MTRRs need power of two size, so how is this supposed to work?
> > >
> > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
> > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
> > > is not standardized and by no means recorded as a requirement. Obviously
> > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
> > > will use mtrr_check() to verify the the same requirement. Furthermore,
> > > as per my commit log message:
> >
> > Whatever the code may or may not do, the x86 architecture uses
> > power-of-two MTRR sizes. So I'm confused.
>
> There should be no confusion, I simply did not know that *was* the
> requirement for x86, if that is the case we should add a check for that
> and perhaps generalize a helper that does the power of two helper changes,
> the cleanest I found was the vesafb driver solution.
>
> Thoughts?
The vesafb solution is bad since you'll end up covering only
the first 4MB of the framebuffer instead of the almost 8MB you want.
Which in practice will mean throwing away half the VRAM since you really
don't want the massive performance hit from accessing it as UC. And that
would mean giving up decent display resolutions as well :(
And the other option of trying to cover the remainder with multiple ever
smaller MTRRs doesn't work either since you'll run out of MTRRs very
quickly.
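The two alternatives can be put in numbers with a short sketch. It assumes a
framebuffer of 0x7ff000 bytes (the figure from the commit log) starting at an
8 MiB-aligned base, and uses made-up helper names. The block count is what a
greedy split into naturally aligned power-of-two blocks would need, not a
claim about what any driver does.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 0x1000ULL /* 4 KiB page */

/* Largest power of two that is <= size (returns 0 for size == 0). */
static inline uint64_t sketch_rounddown_pow2(uint64_t size)
{
	uint64_t p = 1;

	if (size == 0)
		return 0;
	while (p <= size / 2)
		p *= 2;
	return p;
}

/*
 * Number of power-of-two blocks needed to cover size exactly with a greedy
 * largest-first split, assuming the base is aligned to the next power of
 * two above size: one block per set bit in the page count.
 */
static inline unsigned int sketch_blocks_needed(uint64_t size)
{
	uint64_t pages = size / SKETCH_PAGE_SIZE;
	unsigned int count = 0;

	while (pages) {
		count += (unsigned int)(pages & 1);
		pages >>= 1;
	}
	return count;
}
```

For 0x7ff000 the round-down is 0x400000, roughly half the framebuffer, and an
exact cover takes 11 blocks. That is more than the handful of variable-range
registers (often eight) that CPUs of that era provide, some of which the BIOS
typically already uses.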
This is precisely why I used the hole method in atyfb in the first
place.
I don't really like the idea of any new mtrr code not supporting that
use case, especially as these things tend to be present in older machines
where PAT isn't an option.
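For readers unfamiliar with the hole method, here is a minimal model of it. It
relies on the documented MTRR overlap rule that UC wins when overlapping
variable ranges disagree and one of them is UC. The struct, enum and function
names are invented for the example, the base address is arbitrary, and the
layout (an 8 MiB WC range with a 4 KiB UC range on its last page) is inferred
from the sizes mentioned in this thread rather than copied from atyfb.

```c
#include <stddef.h>
#include <stdint.h>

enum sketch_mem_type { SKETCH_NONE, SKETCH_UC, SKETCH_WC };

struct sketch_range {
	uint64_t base;
	uint64_t size;
	enum sketch_mem_type type;
};

/* Resolve the type for addr across overlapping ranges: UC overrides WC. */
static inline enum sketch_mem_type
sketch_effective_type(const struct sketch_range *r, size_t n, uint64_t addr)
{
	enum sketch_mem_type result = SKETCH_NONE;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Skip ranges that do not contain addr. */
		if (addr < r[i].base || addr - r[i].base >= r[i].size)
			continue;
		if (r[i].type == SKETCH_UC)
			return SKETCH_UC;
		result = r[i].type;
	}
	return result;
}

/* Hole layout: WC over 8 MiB, UC over its final 4 KiB page (two registers). */
static inline enum sketch_mem_type sketch_hole_lookup(uint64_t addr)
{
	const struct sketch_range layout[] = {
		{ 0xd0000000ULL, 0x800000ULL, SKETCH_WC },
		{ 0xd07ff000ULL, 0x001000ULL, SKETCH_UC },
	};

	return sketch_effective_type(layout,
				     sizeof(layout) / sizeof(layout[0]), addr);
}
```

In this model every framebuffer address below the last page resolves to WC and
the last page resolves to UC, using two registers instead of eleven.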
--
Ville Syrjälä
syrjala@sci.fi
http://www.sci.fi/~syrjala/
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
2015-03-27 21:56 ` Ville Syrjälä
@ 2015-03-27 22:02 ` Andy Lutomirski
2015-03-28 0:28 ` Luis R. Rodriguez
2015-03-28 0:21 ` Luis R. Rodriguez
1 sibling, 1 reply; 400+ messages in thread
From: Andy Lutomirski @ 2015-03-27 22:02 UTC (permalink / raw)
To: Ville Syrjälä,
Luis R. Rodriguez, Andy Lutomirski, Bjorn Helgaas,
Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
Juergen Gross, Jan Beulich, Borislav Petkov, Suresh Siddha,
venkatesh.pallipadi, Dave Airlie, linux-kernel,
Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
Linus Torvalds, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
On Fri, Mar 27, 2015 at 2:56 PM, Ville Syrjälä <syrjala@sci.fi> wrote:
> On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
>> On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
>> > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
>> > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
>> > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
>> > >> > index 8025624..8875e56 100644
>> > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
>> > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
>> > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
>> > >> >
>> > >> > #ifdef CONFIG_MTRR
>> > >> > par->mtrr_aper = -1;
>> > >> > - par->mtrr_reg = -1;
>> > >> > if (!nomtrr) {
>> > >> > - /* Cover the whole resource. */
>> > >> > - par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
>> > >> > + par->mtrr_aper = mtrr_add(info->fix.smem_start,
>> > >> > + info->fix.smem_len,
>> > >> > MTRR_TYPE_WRCOMB, 1);
>> > >>
>> > >> MTRRs need power of two size, so how is this supposed to work?
>> > >
>> > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
>> > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
>> > > is not standardized and by no means recorded as a requirement. Obviously
>> > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
>> > > will use mtrr_check() to verify the the same requirement. Furthermore,
>> > > as per my commit log message:
>> >
>> > Whatever the code may or may not do, the x86 architecture uses
>> > power-of-two MTRR sizes. So I'm confused.
>>
>> There should be no confusion, I simply did not know that *was* the
>> requirement for x86, if that is the case we should add a check for that
>> and perhaps generalize a helper that does the power of two helper changes,
>> the cleanest I found was the vesafb driver solution.
>>
>> Thoughts?
>
> The vesafb solution is bad since you'll only end up covering only
> the first 4MB of the framebuffer instead of the almost 8MB you want.
> Which in practice will mean throwing away half the VRAM since you really
> don't want the massive performance hit from accessing it as UC. And that
> would mean giving up decent display resolutions as well :(
>
> And the other option of trying to cover the remainder with multiple ever
> smaller MTRRs doesn't work either since you'll run out of MTRRs very
> quickly.
>
> This is precisely why I used the hole method in atyfb in the first
> place.
>
> I don't really like the idea of any new mtrr code not supporting that
> use case, especially as these things tend to be present in older machines
> where PAT isn't an option.
According to the Intel SDM, volume 3, section 11.5.2.1, table 11-6,
non-PAT CPUs that have a WC MTRR, PCD = 1, and PWT = 1 (aka UC) have
an effective memory type of UC. Hence my suggestion to add
ioremap_x86_uc and/or set_memory_x86_uc to punch a UC hole in an
otherwise WC MTRR-covered region.
ioremap_nocache is UC- (even on non-PAT unless I misunderstood how
this stuff works), so ioremap_nocache by itself isn't good enough.
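The distinction can be sketched in terms of the page-table bits. The bit
positions are the standard x86 PWT (bit 3) and PCD (bit 4) flags; the macro and
function names are invented, and the rule encoded here is only the one cited
above from Table 11-6 for a WC MTRR on a non-PAT CPU, not the full table.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_PWT (1u << 3) /* page write-through */
#define SKETCH_PAGE_PCD (1u << 4) /* page cache disable */

/* Page attribute bits for strong UC and for UC-, as used in this thread. */
#define SKETCH_ATTR_UC       (SKETCH_PAGE_PCD | SKETCH_PAGE_PWT)
#define SKETCH_ATTR_UC_MINUS (SKETCH_PAGE_PCD)

/*
 * Per the reading above: under a WC MTRR, only PCD = 1 together with
 * PWT = 1 is stated to give an effective type of UC.
 */
static inline bool sketch_forces_uc_under_wc_mtrr(uint32_t pte_flags)
{
	uint32_t both = SKETCH_PAGE_PCD | SKETCH_PAGE_PWT;

	return (pte_flags & both) == both;
}
```

Under this reading a UC mapping punches the hole, while a UC- mapping (PCD set,
PWT clear) is not guaranteed to.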
--Andy
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
2015-03-27 22:02 ` Andy Lutomirski
@ 2015-03-28 0:28 ` Luis R. Rodriguez
2015-03-28 12:23 ` Ville Syrjälä
0 siblings, 1 reply; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-28 0:28 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Ville Syrjälä,
Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin, Juergen Gross, Jan Beulich, Borislav Petkov,
Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
Linus Torvalds, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
On Fri, Mar 27, 2015 at 03:02:10PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 27, 2015 at 2:56 PM, Ville Syrjälä <syrjala@sci.fi> wrote:
> > On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
> >> On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
> >> > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> >> > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> >> > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> >> > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> >> > >> > index 8025624..8875e56 100644
> >> > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> >> > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> >> > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> >> > >> >
> >> > >> > #ifdef CONFIG_MTRR
> >> > >> > par->mtrr_aper = -1;
> >> > >> > - par->mtrr_reg = -1;
> >> > >> > if (!nomtrr) {
> >> > >> > - /* Cover the whole resource. */
> >> > >> > - par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> >> > >> > + par->mtrr_aper = mtrr_add(info->fix.smem_start,
> >> > >> > + info->fix.smem_len,
> >> > >> > MTRR_TYPE_WRCOMB, 1);
> >> > >>
> >> > >> MTRRs need power of two size, so how is this supposed to work?
> >> > >
> >> > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
> >> > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
> >> > > is not standardized and by no means recorded as a requirement. Obviously
> >> > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
> >> > > will use mtrr_check() to verify the the same requirement. Furthermore,
> >> > > as per my commit log message:
> >> >
> >> > Whatever the code may or may not do, the x86 architecture uses
> >> > power-of-two MTRR sizes. So I'm confused.
> >>
> >> There should be no confusion, I simply did not know that *was* the
> >> requirement for x86, if that is the case we should add a check for that
> >> and perhaps generalize a helper that does the power of two helper changes,
> >> the cleanest I found was the vesafb driver solution.
> >>
> >> Thoughts?
> >
> > The vesafb solution is bad since you'll only end up covering only
> > the first 4MB of the framebuffer instead of the almost 8MB you want.
> > Which in practice will mean throwing away half the VRAM since you really
> > don't want the massive performance hit from accessing it as UC. And that
> > would mean giving up decent display resolutions as well :(
> >
> > And the other option of trying to cover the remainder with multiple ever
> > smaller MTRRs doesn't work either since you'll run out of MTRRs very
> > quickly.
> >
> > This is precisely why I used the hole method in atyfb in the first
> > place.
> >
> > I don't really like the idea of any new mtrr code not supporting that
> > use case, especially as these things tend to be present in older machines
> > where PAT isn't an option.
>
> According to the Intel SDM, volume 3, section 11.5.2.1, table 11-6,
> non-PAT CPUs that have a WC MTRR, PCD = 1, and PWT = 1 (aka UC) have
> an effective memory type of UC. Hence my suggestion to add
> ioremap_x86_uc and/or set_memory_x86_uc to punch a UC hole in an
> otherwise WC MTRR-covered region.
OK I think I get it now.
And I take it this would hopefully only be used for non-PAT systems?
Would there be a use case for PAT systems? I wonder if we can wrap
this under some APIs to make it clean and hide this dirty thing
behind the scenes. It seems fragile and error prone, and my hope
would be that we won't need more specialization in this area for
PAT systems.
> ioremap_nocache is UC- (even on non-PAT unless I misunderstood how
> this stuff works), so ioremap_nocache by itself isn't good enough.
Thanks for the clarification.
Luis
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
2015-03-28 0:28 ` Luis R. Rodriguez
@ 2015-03-28 12:23 ` Ville Syrjälä
2015-04-01 23:52 ` Luis R. Rodriguez
0 siblings, 1 reply; 400+ messages in thread
From: Ville Syrjälä @ 2015-03-28 12:23 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Andy Lutomirski, Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar,
Thomas Gleixner, H. Peter Anvin, Juergen Gross, Jan Beulich,
Borislav Petkov, Suresh Siddha, venkatesh.pallipadi, Dave Airlie,
linux-kernel, Linux Fbdev development list, X86 ML, xen-devel,
Ingo Molnar, Linus Torvalds, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
On Sat, Mar 28, 2015 at 01:28:18AM +0100, Luis R. Rodriguez wrote:
> On Fri, Mar 27, 2015 at 03:02:10PM -0700, Andy Lutomirski wrote:
> > On Fri, Mar 27, 2015 at 2:56 PM, Ville Syrjälä <syrjala@sci.fi> wrote:
> > > On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
> > >> On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
> > >> > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > >> > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> > >> > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> > >> > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> > >> > >> > index 8025624..8875e56 100644
> > >> > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> > >> > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> > >> > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> > >> > >> >
> > >> > >> > #ifdef CONFIG_MTRR
> > >> > >> > par->mtrr_aper = -1;
> > >> > >> > - par->mtrr_reg = -1;
> > >> > >> > if (!nomtrr) {
> > >> > >> > - /* Cover the whole resource. */
> > >> > >> > - par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> > >> > >> > + par->mtrr_aper = mtrr_add(info->fix.smem_start,
> > >> > >> > + info->fix.smem_len,
> > >> > >> > MTRR_TYPE_WRCOMB, 1);
> > >> > >>
> > >> > >> MTRRs need power of two size, so how is this supposed to work?
> > >> > >
> > >> > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
> > >> > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
> > >> > > is not standardized and by no means recorded as a requirement. Obviously
> > >> > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
> > >> > > will use mtrr_check() to verify the the same requirement. Furthermore,
> > >> > > as per my commit log message:
> > >> >
> > >> > Whatever the code may or may not do, the x86 architecture uses
> > >> > power-of-two MTRR sizes. So I'm confused.
> > >>
> > >> There should be no confusion, I simply did not know that *was* the
> > >> requirement for x86, if that is the case we should add a check for that
> > >> and perhaps generalize a helper that does the power of two helper changes,
> > >> the cleanest I found was the vesafb driver solution.
> > >>
> > >> Thoughts?
> > >
> > > The vesafb solution is bad since you'll only end up covering only
> > > the first 4MB of the framebuffer instead of the almost 8MB you want.
> > > Which in practice will mean throwing away half the VRAM since you really
> > > don't want the massive performance hit from accessing it as UC. And that
> > > would mean giving up decent display resolutions as well :(
> > >
> > > And the other option of trying to cover the remainder with multiple ever
> > > smaller MTRRs doesn't work either since you'll run out of MTRRs very
> > > quickly.
> > >
> > > This is precisely why I used the hole method in atyfb in the first
> > > place.
> > >
> > > I don't really like the idea of any new mtrr code not supporting that
> > > use case, especially as these things tend to be present in older machines
> > > where PAT isn't an option.
> >
> > According to the Intel SDM, volume 3, section 11.5.2.1, table 11-6,
> > non-PAT CPUs that have a WC MTRR, PCD = 1, and PWT = 1 (aka UC) have
> > an effective memory type of UC. Hence my suggestion to add
> > ioremap_x86_uc and/or set_memory_x86_uc to punch a UC hole in an
> > otherwise WC MTRR-covered region.
>
> OK I think I get it now.
>
> And I take it this would hopefully only be used for non-PAT systems?
> Would there be a use case for PAT systems? I wonder if we can wrap
> this under some APIs to make it clean and hide this dirty thing
> behind the scenes, it seems a fragile and error prone and my hope
> would be that we won't need more specialization in this area for
> PAT systems.
One potential complication is kernel vs. userspace mmap. MTRR applies to
the physical address, but PAT applies to the virtual address, so with
the WC MTRR you get WC for userspace "for free" as well. Also the
userspace mmap request will have the length of smem_len (at most), so
it won't be the nice power of two in that case.
Also on PAT systems w/o a BIOS provided WC MTRR, the fbdev mmap seems
to be total crap at the moment. IIRC I have a patch to fix things a bit...
From 4e6d70d223f35953c8a11a58cf3376a8a001fa4f Mon Sep 17 00:00:00 2001
From: Ville Syrjala <syrjala@sci.fi>
Date: Fri, 15 Apr 2011 04:02:43 +0300
Subject: [PATCH] fb: writecombine fb
---
drivers/video/fbdev/core/fbmem.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
index 0705d88..ecbde0e 100644
--- a/drivers/video/fbdev/core/fbmem.c
+++ b/drivers/video/fbdev/core/fbmem.c
@@ -1396,6 +1396,7 @@ fb_mmap(struct file *file, struct vm_area_struct * vma)
unsigned long mmio_pgoff;
unsigned long start;
u32 len;
+ bool mmio = false;
if (!info)
return -ENODEV;
@@ -1426,11 +1427,20 @@ fb_mmap(struct file *file, struct vm_area_struct * vma)
vma->vm_pgoff -= mmio_pgoff;
start = info->fix.mmio_start;
len = info->fix.mmio_len;
+ mmio = true;
}
mutex_unlock(&info->mm_lock);
+ if (!mmio) {
+ vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+ vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+
+ if (!vm_iomap_memory(vma, start, len))
+ return 0;
+ }
+
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
- fb_pgprotect(file, vma, start);
+ vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
return vm_iomap_memory(vma, start, len);
}
Perhaps it's time I tried to send that upstream properly :P
--
Ville Syrjälä
syrjala@sci.fi
http://www.sci.fi/~syrjala/
^ permalink raw reply related [flat|nested] 400+ messages in thread
* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
2015-03-28 12:23 ` Ville Syrjälä
@ 2015-04-01 23:52 ` Luis R. Rodriguez
2015-04-02 0:04 ` Andy Lutomirski
0 siblings, 1 reply; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-04-01 23:52 UTC (permalink / raw)
To: Ville Syrjälä,
Andy Lutomirski, Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar,
Thomas Gleixner, H. Peter Anvin, Juergen Gross, Jan Beulich,
Borislav Petkov, Suresh Siddha, venkatesh.pallipadi, Dave Airlie,
linux-kernel, Linux Fbdev development list, X86 ML, xen-devel,
Ingo Molnar, Linus Torvalds, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
On Sat, Mar 28, 2015 at 02:23:34PM +0200, Ville Syrjälä wrote:
> On Sat, Mar 28, 2015 at 01:28:18AM +0100, Luis R. Rodriguez wrote:
> > On Fri, Mar 27, 2015 at 03:02:10PM -0700, Andy Lutomirski wrote:
> > > On Fri, Mar 27, 2015 at 2:56 PM, Ville Syrjälä <syrjala@sci.fi> wrote:
> > > > On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
> > > >> On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
> > > >> > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > > >> > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> > > >> > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> > > >> > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> > > >> > >> > index 8025624..8875e56 100644
> > > >> > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> > > >> > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> > > >> > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> > > >> > >> >
> > > >> > >> > #ifdef CONFIG_MTRR
> > > >> > >> > par->mtrr_aper = -1;
> > > >> > >> > - par->mtrr_reg = -1;
> > > >> > >> > if (!nomtrr) {
> > > >> > >> > - /* Cover the whole resource. */
> > > >> > >> > - par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> > > >> > >> > + par->mtrr_aper = mtrr_add(info->fix.smem_start,
> > > >> > >> > + info->fix.smem_len,
> > > >> > >> > MTRR_TYPE_WRCOMB, 1);
> > > >> > >>
> > > >> > >> MTRRs need power of two size, so how is this supposed to work?
> > > >> > >
> > > >> > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
> > > >> > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
> > > >> > > is not standardized and by no means recorded as a requirement. Obviously
> > > >> > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
> > > >> > > will use mtrr_check() to verify the the same requirement. Furthermore,
> > > >> > > as per my commit log message:
> > > >> >
> > > >> > Whatever the code may or may not do, the x86 architecture uses
> > > >> > power-of-two MTRR sizes. So I'm confused.
> > > >>
> > > >> There should be no confusion, I simply did not know that *was* the
> > > >> requirement for x86, if that is the case we should add a check for that
> > > >> and perhaps generalize a helper that does the power of two helper changes,
> > > >> the cleanest I found was the vesafb driver solution.
> > > >>
> > > >> Thoughts?
> > > >
> > > > The vesafb solution is bad since you'll only end up covering only
> > > > the first 4MB of the framebuffer instead of the almost 8MB you want.
> > > > Which in practice will mean throwing away half the VRAM since you really
> > > > don't want the massive performance hit from accessing it as UC. And that
> > > > would mean giving up decent display resolutions as well :(
> > > >
> > > > And the other option of trying to cover the remainder with multiple ever
> > > > smaller MTRRs doesn't work either since you'll run out of MTRRs very
> > > > quickly.
> > > >
> > > > This is precisely why I used the hole method in atyfb in the first
> > > > place.
> > > >
> > > > I don't really like the idea of any new mtrr code not supporting that
> > > > use case, especially as these things tend to be present in older machines
> > > > where PAT isn't an option.
> > >
> > > According to the Intel SDM, volume 3, section 11.5.2.1, table 11-6,
> > > non-PAT CPUs that have a WC MTRR, PCD = 1, and PWT = 1 (aka UC) have
> > > an effective memory type of UC.
This is true, but non-PAT systems that just use ioremap() will default to
_PAGE_CACHE_MODE_UC_MINUS, not _PAGE_CACHE_MODE_UC, and _PAGE_CACHE_MODE_UC_MINUS
on Linux has PCD = 1, PWT = 0. The list comes from:
uint16_t __cachemode2pte_tbl[_PAGE_CACHE_MODE_NUM] = {
[_PAGE_CACHE_MODE_WB ] = 0 | 0 ,
[_PAGE_CACHE_MODE_WC ] = _PAGE_PWT | 0 ,
[_PAGE_CACHE_MODE_UC_MINUS] = 0 | _PAGE_PCD,
[_PAGE_CACHE_MODE_UC ] = _PAGE_PWT | _PAGE_PCD,
[_PAGE_CACHE_MODE_WT ] = 0 | _PAGE_PCD,
[_PAGE_CACHE_MODE_WP ] = 0 | _PAGE_PCD,
};
This can better be read here:
PAT
|PCD
||PWT
|||
000 WB _PAGE_CACHE_MODE_WB
001 WC _PAGE_CACHE_MODE_WC
010 UC- _PAGE_CACHE_MODE_UC_MINUS
011 UC _PAGE_CACHE_MODE_UC
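As a rough illustration of the table above, here is a minimal userspace sketch
(not kernel code) of how the four cache modes map onto the PWT and PCD
page-table bits. The enum and helper names are made up for this example; the
bit positions (PWT is bit 3, PCD is bit 4) are the architectural ones for a
4 KiB PTE.

```c
#include <stdint.h>

/* Architectural bit positions in an x86 4 KiB page-table entry. */
#define PTE_PWT (UINT16_C(1) << 3) /* page-level write-through */
#define PTE_PCD (UINT16_C(1) << 4) /* page-level cache disable */

/* Hypothetical stand-ins for the _PAGE_CACHE_MODE_* values discussed above. */
enum cache_mode { MODE_WB, MODE_WC, MODE_UC_MINUS, MODE_UC, MODE_NUM };

/* Mirrors the four rows of the PCD/PWT table above. */
static const uint16_t mode_to_pte_bits[MODE_NUM] = {
	[MODE_WB]       = 0,
	[MODE_WC]       = PTE_PWT,
	[MODE_UC_MINUS] = PTE_PCD,
	[MODE_UC]       = PTE_PWT | PTE_PCD,
};

/* Return the PCD/PWT bits a PTE would carry for the given mode. */
static uint16_t pte_bits_for_mode(enum cache_mode mode)
{
	return mode_to_pte_bits[mode];
}

/* Reverse lookup: recover the mode from PCD/PWT, ignoring all other bits. */
static enum cache_mode mode_for_pte_bits(uint16_t pte)
{
	int pcd = !!(pte & PTE_PCD);
	int pwt = !!(pte & PTE_PWT);

	return (enum cache_mode)((pcd << 1) | pwt);
}
```

With this, UC- comes out as PCD set and PWT clear, which is the "10" in the
table.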
On x86 ioremap() defaults to ioremap_nocache() and right now that uses
_PAGE_CACHE_MODE_UC_MINUS not _PAGE_CACHE_MODE_UC. We have two cases
to consider for non-PAT systems then:
a) Right now ioremap() and ioremap_nocache() default to _PAGE_CACHE_MODE_UC_MINUS
on x86. In this case a mapping under a WC MTRR uses PWT=0, PCD=1, and
table 11-6 seems to place this situation on non-PAT systems as
"implementation defined" and not encouraged.
b) When commit de33c442e "x86 PAT: fix performance drop for glx, use
UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()"
gets reverted and we use _PAGE_CACHE_MODE_UC by default. In this
case both ioremap() and ioremap_nocache() on x86 will default to
_PAGE_CACHE_MODE_UC, so we'll end up, as you note, with
an effective memory type of UC.
If I've understood this correctly then neither of these situations is good and
it's just by chance that on some systems situation a) has led to proper WC.
On a PAT system we have slightly different combinatorial results (based on Table
11-7):
a) Right now, with ioremap() and ioremap_nocache() defaulting to
_PAGE_CACHE_MODE_UC_MINUS: _PAGE_CACHE_MODE_UC_MINUS + MTRR WC = WC
b) When commit de33c442e gets reverted: _PAGE_CACHE_MODE_UC + MTRR WC = UC
So to be clear, right now atyfb should work fine on PAT systems
with de33c442e in place; once it is reverted, as things stand we'd end
up with a UC effective memory type.
For both PAT and non-PAT systems, when commit de33c442e gets reverted
we'd end up with UC as the effective memory type for atyfb. Right
now it should work on PAT systems and by chance it's suspected to work
on non-PAT systems. We want to phase out MTRR though, especially to avoid
all this insane combinatorial nightmare.
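To make the combinations above easier to follow, here is a small userspace
sketch that encodes only the four WC-MTRR cases as they are read in this mail
(SDM tables 11-6 and 11-7). It is not a full model of either table, the names
are invented for illustration, and the non-PAT UC- case is flagged rather than
trusted since it is described above as implementation defined.

```c
#include <stdbool.h>

/* Hypothetical names; only the page types ioremap() can pick by default. */
enum page_type { PAGE_UC_MINUS, PAGE_UC };
enum eff_type { EFF_WC, EFF_UC };

struct eff_result {
	enum eff_type type;
	bool grey_area; /* true when the result is described as implementation defined */
};

/*
 * Effective memory type for a page mapped with the given page type and
 * covered by a WC MTRR, following the reading of the SDM tables given in
 * this mail. has_pat selects the PAT (11-7) or non-PAT (11-6) reading.
 */
static struct eff_result wc_mtrr_effective(enum page_type page, bool has_pat)
{
	struct eff_result r = { EFF_UC, false };

	if (page == PAGE_UC)
		return r; /* strong UC wins over a WC MTRR in both cases */

	/* UC- (PCD=1, PWT=0) lets the WC MTRR through. */
	r.type = EFF_WC;
	r.grey_area = !has_pat; /* only by chance on non-PAT systems */
	return r;
}
```

The only case that changes when de33c442e is reverted is the page type passed
in: PAGE_UC_MINUS today, PAGE_UC afterwards.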
> > > Hence my suggestion to add
> > > ioremap_x86_uc and/or set_memory_x86_uc to punch a UC hole in an
> > > otherwise WC MTRR-covered region.
To be clear I think you mean then that ioremap_x86_uc() would help us avoid the
jumps between combinatorial issues with MTRR on PAT / non-PAT systems before
and after commit de33c442e gets reverted. So for instance if we had on the
atyfb driver:
ioremap_x86_uc(PCI BAR)
ioremap_wc(framebuffer)
arch_phys_wc_add(PCI BAR)
On non-PAT systems, on the MMIO region with PWT=1, PCD=1 we'd end up with UC.
Sadly though, since _PAGE_CACHE_MODE_WC on non-PAT has PWT=1, PCD=0, the WC
MTRR that follows would mean we'd end up with another grey area (but
similar to before, as technically an effective memory type of WC).
On PAT systems the above would not use MTRRs but we'd be counting on
overlapping memory types -- it's not clear if aliasing here is a problem.
Also Intel SDM, volume 3, section "11.11.4 Range Size and Alignment Requirement"
describes that: "the minimum range size is 4 KiB, the base address must be on
a 4 KiB boundary. For ranges greater than 4 KiB each range must be of length
2^n and its base address must be aligned on a 2^n boundary where n is a value
equal to or greater than 12. The base-address alignment value cannot be less
than its length. For example, an 8-KiB range cannot be aligned on a
4-KiB boundary. It must be aligned on at least an 8-KiB boundary."
So to answer my own question: indeed, our framebuffer base address must be
aligned on a 2^n boundary, and the size also has to be a power of 2. MTRR supports
fixed range sizes and variable range sizes; the MMIO region does
not need to abide by the power of 2 rule as a fixed range size of 4 KiB
could be used, although upon review of our own implementation it's unclear if
that is what is used for 4 KiB sized MTRRs.
Hence my arch_phys_wc_add(PCI BAR) as above.
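The size and alignment rule quoted above can be sketched in plain C as below.
This is only a userspace illustration with hypothetical helper names, not the
kernel's mtrr_check(); the second helper assumes the base is aligned well
enough for the largest chunk and simply counts one MTRR per set bit in the
page count.

```c
#include <stdbool.h>
#include <stdint.h>

#define MTRR_MIN_SIZE UINT64_C(4096) /* 4 KiB, i.e. n = 12 */

/* True if size is a power of two of at least 4 KiB. */
static bool mtrr_size_ok(uint64_t size)
{
	return size >= MTRR_MIN_SIZE && (size & (size - 1)) == 0;
}

/* True if a single variable-range MTRR could describe [base, base + size). */
static bool mtrr_range_ok(uint64_t base, uint64_t size)
{
	return mtrr_size_ok(size) && (base & (size - 1)) == 0;
}

/*
 * Number of power-of-two MTRRs needed to cover a size that is a multiple
 * of 4 KiB, assuming the base is aligned well enough for the largest chunk:
 * one MTRR per set bit in the page count.
 */
static unsigned int mtrr_count_for_size(uint64_t size)
{
	uint64_t pages = size / MTRR_MIN_SIZE;
	unsigned int count = 0;

	while (pages) {
		pages &= pages - 1; /* clear the lowest set bit */
		count++;
	}
	return count;
}
```

For example, a framebuffer of 8 MiB minus one 4 KiB MMIO page (0x7ff000 bytes)
needs 11 MTRRs by this count, which is the "running out of MTRRs" concern
raised earlier in this thread.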
> > OK I think I get it now.
> >
> > And I take it this would hopefully only be used for non-PAT systems?
Since we would likely want to use ioremap_x86_uc() on PAT systems as well, we
could make it effective for both PAT and non-PAT. Later, when
we get ioremap() to default to strong UC, we could drop ioremap_x86_uc() as we'd
only need it as a transitional measure until then -- that is, unless we want a strong
UC ioremap primitive which always follows strong UC when available, regardless
of these default transitions.
The big issue I see here is simply the combinatorial issues, so I do think
it's best to annotate these corner cases well and avoid them.
> > Would there be a use case for PAT systems? I wonder if we can wrap
> > this under some APIs to make it clean and hide this dirty thing
> > behind the scenes, it seems a fragile and error prone and my hope
> > would be that we won't need more specialization in this area for
> > PAT systems.
>
> One potential complication is kernel vs. userspace mmap. MTRR applies to
> the physical address, but PAT applies to the virtual address, so with
> the WC MTRR you get WC for userspace "for free" as well.
What is the performance impact of having the conversion being done by the
kernel? Has anyone done measurements? If significant, can't the subsystem mmap()
cache the phys address for PAT? Shouldn't the TLB take care of those considerations
for us? If this is generally desirable, shouldn't we just generalize the cache
for devices for O(1) access through a generic API?
Can the difference, other than a possible performance hit, imply a userspace
visible change?
If the performance / userspace effect is negligible then I'd expect the gains
from cleaner code / APIs to outweigh the cons. After all the goal is to
streamline PAT when possible here.
> Also the
> userspace mmaps request will have the length of smem_len (at most), so
> it won't be the nice power of two in that case.
Does that length change imply a userspace visible change?
> Also on PAT systems w/o a BIOS provided WC MTRR, the fbdev mmap seems
> to be total crap at the moment. IIRC I have a patch to fix things a bit...
Isn't that because of the lack of ioremap_wc() calls? You seem to be
doing this with pgprot_writecombine() instead; more on this strategy
below though.
> From 4e6d70d223f35953c8a11a58cf3376a8a001fa4f Mon Sep 17 00:00:00 2001
> From: Ville Syrjala <syrjala@sci.fi>
> Date: Fri, 15 Apr 2011 04:02:43 +0300
> Subject: [PATCH] fb: writecombine fb
>
> ---
> drivers/video/fbdev/core/fbmem.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
> index 0705d88..ecbde0e 100644
> --- a/drivers/video/fbdev/core/fbmem.c
> +++ b/drivers/video/fbdev/core/fbmem.c
> @@ -1396,6 +1396,7 @@ fb_mmap(struct file *file, struct vm_area_struct * vma)
> unsigned long mmio_pgoff;
> unsigned long start;
> u32 len;
> + bool mmio = false;
>
> if (!info)
> return -ENODEV;
> @@ -1426,11 +1427,20 @@ fb_mmap(struct file *file, struct vm_area_struct * vma)
> vma->vm_pgoff -= mmio_pgoff;
> start = info->fix.mmio_start;
> len = info->fix.mmio_len;
> + mmio = true;
> }
> mutex_unlock(&info->mm_lock);
>
> + if (!mmio) {
> + vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> + vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
> +
> + if (!vm_iomap_memory(vma, start, len))
> + return 0;
> + }
> +
> vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> - fb_pgprotect(file, vma, start);
> + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>
> return vm_iomap_memory(vma, start, len);
> }
>
> Perhaps it's time I tried to send that upstream properly :P
Let's assume all drivers use ioremap_wc() on the framebuffer; would the
above not be needed then (disregarding the corner cases such as atyfb)?
If your goal is to generalize a place to make framebuffer WC instead of doing
that at mmap() why not do it at register_framebuffer() time and do it
only once? I suspect all this might also be easier to do and generalize
after this series.
So as we can see from this series there are tons of drivers that can safely
be moved to use ioremap_wc() already, provided there are no regressions with
the simple ioremap_wc() / arch_phys_wc_add() switch. There are only a few corner
cases to address after that. Addressing both of these *first* would simplify
the code and grammatically make it a bit more consistent while trying to avoid a
generalized regression. I believe a generalized solution is definitely in order
but we also should first address the corner cases.
So how about we:
a) convert all drivers that are safe to convert over to ioremap_wc() /
arch_phys_wc_add()
b) address all corner cases and try to avoid further combinatorial
issues
c) after a while push for reverting de33c442e
d) generalize a solution for framebuffer
Ideally as I mentioned in the other thread with Bjorn we could even
have the WC be done further below for us but it was very unclear
if we could accomplish this due to the definition of the PCI flags,
the way we'd use it and the way they could be integrated on hardware
by manufacturers. I think generalizing things under the framebuffer
code would be a good intermediate step but I think we need to phase
this in in light of the corner cases, combinatorial issues with
PAT / non-PAT and eventual reverting goals of commit de33c442e
in order to generalize strong UC.
Luis
* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
2015-04-01 23:52 ` Luis R. Rodriguez
@ 2015-04-02 0:04 ` Andy Lutomirski
2015-04-02 19:45 ` Luis R. Rodriguez
0 siblings, 1 reply; 400+ messages in thread
From: Andy Lutomirski @ 2015-04-02 0:04 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Ville Syrjälä,
Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin, Juergen Gross, Jan Beulich, Borislav Petkov,
Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
Linus Torvalds, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
On Wed, Apr 1, 2015 at 4:52 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Sat, Mar 28, 2015 at 02:23:34PM +0200, Ville Syrjälä wrote:
>> On Sat, Mar 28, 2015 at 01:28:18AM +0100, Luis R. Rodriguez wrote:
>> > On Fri, Mar 27, 2015 at 03:02:10PM -0700, Andy Lutomirski wrote:
>> > > On Fri, Mar 27, 2015 at 2:56 PM, Ville Syrjälä <syrjala@sci.fi> wrote:
>> > > > On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
>> > > >> On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
>> > > >> > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> > > >> > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
>> > > >> > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
>> > > >> > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
>> > > >> > >> > index 8025624..8875e56 100644
>> > > >> > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
>> > > >> > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
>> > > >> > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
>> > > >> > >> >
>> > > >> > >> > #ifdef CONFIG_MTRR
>> > > >> > >> > par->mtrr_aper = -1;
>> > > >> > >> > - par->mtrr_reg = -1;
>> > > >> > >> > if (!nomtrr) {
>> > > >> > >> > - /* Cover the whole resource. */
>> > > >> > >> > - par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
>> > > >> > >> > + par->mtrr_aper = mtrr_add(info->fix.smem_start,
>> > > >> > >> > + info->fix.smem_len,
>> > > >> > >> > MTRR_TYPE_WRCOMB, 1);
>> > > >> > >>
>> > > >> > >> MTRRs need power of two size, so how is this supposed to work?
>> > > >> > >
>> > > >> > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
>> > > >> > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
>> > > >> > > is not standardized and by no means recorded as a requirement. Obviously
>> > > >> > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
>> > > >> > > will use mtrr_check() to verify the the same requirement. Furthermore,
>> > > >> > > as per my commit log message:
>> > > >> >
>> > > >> > Whatever the code may or may not do, the x86 architecture uses
>> > > >> > power-of-two MTRR sizes. So I'm confused.
>> > > >>
>> > > >> There should be no confusion, I simply did not know that *was* the
>> > > >> requirement for x86, if that is the case we should add a check for that
>> > > >> and perhaps generalize a helper that does the power of two helper changes,
>> > > >> the cleanest I found was the vesafb driver solution.
>> > > >>
>> > > >> Thoughts?
>> > > >
>> > > > The vesafb solution is bad since you'll only end up covering only
>> > > > the first 4MB of the framebuffer instead of the almost 8MB you want.
>> > > > Which in practice will mean throwing away half the VRAM since you really
>> > > > don't want the massive performance hit from accessing it as UC. And that
>> > > > would mean giving up decent display resolutions as well :(
>> > > >
>> > > > And the other option of trying to cover the remainder with multiple ever
>> > > > smaller MTRRs doesn't work either since you'll run out of MTRRs very
>> > > > quickly.
>> > > >
>> > > > This is precisely why I used the hole method in atyfb in the first
>> > > > place.
>> > > >
>> > > > I don't really like the idea of any new mtrr code not supporting that
>> > > > use case, especially as these things tend to be present in older machines
>> > > > where PAT isn't an option.
>> > >
>> > > According to the Intel SDM, volume 3, section 11.5.2.1, table 11-6,
>> > > non-PAT CPUs that have a WC MTRR, PCD = 1, and PWT = 1 (aka UC) have
>> > > an effective memory type of UC.
>
> This is true but non-PAT systems that use just ioremap() will default to
> _PAGE_CACHE_MODE_UC_MINUS, not _PAGE_CACHE_MODE_UC, and _PAGE_CACHE_MODE_UC_MINUS
> on Linux has PCD = 1, PWT = 0. The list comes from:
>
> uint16_t __cachemode2pte_tbl[_PAGE_CACHE_MODE_NUM] = {
> [_PAGE_CACHE_MODE_WB ] = 0 | 0 ,
> [_PAGE_CACHE_MODE_WC ] = _PAGE_PWT | 0 ,
> [_PAGE_CACHE_MODE_UC_MINUS] = 0 | _PAGE_PCD,
> [_PAGE_CACHE_MODE_UC ] = _PAGE_PWT | _PAGE_PCD,
> [_PAGE_CACHE_MODE_WT ] = 0 | _PAGE_PCD,
> [_PAGE_CACHE_MODE_WP ] = 0 | _PAGE_PCD,
> };
>
> This can better be read here:
>
> PAT
> |PCD
> ||PWT
> |||
> 000 WB _PAGE_CACHE_MODE_WB
> 001 WC _PAGE_CACHE_MODE_WC
> 010 UC- _PAGE_CACHE_MODE_UC_MINUS
> 011 UC _PAGE_CACHE_MODE_UC
>
> On x86 ioremap() defaults to ioremap_nocache() and right now that uses
> _PAGE_CACHE_MODE_UC_MINUS not _PAGE_CACHE_MODE_UC. We have two cases
> to consider for non-PAT systems then:
>
> a) Right now as ioremap() and ioremap_nocache() default to _PAGE_CACHE_MODE_UC_MINUS
> on x86. In this case using a WC MTRR seems to use PWT=0, PCD=1, and
> table table 11-6 on non-PAT systems seems to place this situation as
> "implementation defined" and not encouraged.
>
> a) when commit de33c442e "x86 PAT: fix performance drop for glx, use
> UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()"
> gets reverted and we use _PAGE_CACHE_MODE_UC by default. In this
> case on x86 for both ioremap() and ioremap_nocache() as they will
> both default to _PAGE_CACHE_MODE_UC we'll end up as you note with
> an effective memory type of UC.
>
> If I've understood this correctly then neither of these situations are good and
> its just by chance that on some systems situation a) has lead to proper WC.
>
> On a PAT system we have a bit different combinatorial results (based on Table
> 11-7):
>
> a) Right now ioremap() and ioremap_nocache() defaulting to
> _PAGE_CACHE_MODE_UC_MINUS yields + MTRR WC = WC
>
> b) When commit de33c442e gets reverted _PAGE_CACHE_MODE_UC + MTRR WC = UC
>
> So to be clear right now atyfb should work fine on PAT systems
> with de33c442e in place, once reverted as-is right now we'd end
> up with UC effective memory type.
>
> For both PAT and non-PAT systems when commit de33c442e gets reverted
> we'd end up with UC as the effective memory type for atyfb. Right
> now it shoud work on PAT systems and by chance its suspected to work
> on non-PAT systems. We want to phase MTRR though, specially to avoid
> all this insane combinatorial nightmware.
>
>> > > Hence my suggestion to add
>> > > ioremap_x86_uc and/or set_memory_x86_uc to punch a UC hole in an
>> > > otherwise WC MTRR-covered region.
>
> To be clear I think you mean then that ioremap_x86_uc() would help us avoid the
> jumps between combinatorial issues with MTRR on PAT / non-PAT systems before
> and after commit de33c442e gets reverted. So for instance if we had on the
> atyfb driver:
>
> ioremap_x86_uc(PCI BAR)
> ioremap_wc(framebuffer)
> arch_phys_add_wc(PCI BAR)
>
> On non-PAT systems on the MMIO region with PWT=1, PCD=1 we'd end up with UC.
> Sadly though since _PAGE_CACHE_WC on non-PAT has PWT=1, PCD=0, the WC
> MTRR that follows would mean we'd end up with another grey area (but
> similar to before as technically an effectivethe memory type of WC).
>
> On PAT systems the above would not use MTRRs but we'd be counting on
> overlapping memory types -- its not clear if aliasing here is a problem.
>
> Also Intel SDM, volume 3, section "11.11.4 Range Size and Alignment Requirement"
> describes that: "the minimum range size is 4 KiB, the base address must be on
> a 4 KiB boundary. For ranges greater than 4 KiB each range must be of length
> 2^n and its base address must be alinged on a 2^n boundary where n is a value
> equal or greatar then 12. The base-address alignment value cannot be less
> than its length. For example, an 8-KiB range cannot be aligned on a
> 4-KiB boundary. It must be aligned on at least an 8-KiB boundary"
>
> So to answer my own question: indeed, our framebuffer base address must be
> aligned on a 2^n boundary, the size also has to be a power of 2. MTRR supports
> fixed range sizes and variable range sizes, in case of the MMIO that does
> not need to abide by the power of 2 rule as a fixed range size of 4 KiB
> could be used although upon review ouf our own implemetnation its unclear if
> that is what is used for 4 KiB sized MTRRs.
>
> Hence my arch_phys_add_wc(PCI BAR) as above.
>
>> > OK I think I get it now.
>> >
>> > And I take it this would hopefully only be used for non-PAT systems?
>
> Since we likely could care to use ioremap_x86_uc() on PAT systems as well we
> could make the effective for both PAT and non-PAT obviously then. Later when
> we get ioremap() to default to strong UC we could drop ioremap_x86_uc() as we'd
> only need it as transitory until then -- that is unless we want perhaps a strong
> UC ioremap primitive which is always following strong UC when available regardless
> of these default transitions.
>
> The big issue I see here is simply the combinatorial issues, so I do think
> its best to annotate these corner cases well and avoid them.
>
>> > Would there be a use case for PAT systems? I wonder if we can wrap
>> > this under some APIs to make it clean and hide this dirty thing
>> > behind the scenes, it seems a fragile and error prone and my hope
>> > would be that we won't need more specialization in this area for
>> > PAT systems.
>>
>> One potential complication is kernel vs. userspace mmap. MTRR applies to
>> the physical address, but PAT applies to the virtual address, so with
>> the WC MTRR you get WC for userspace "for free" as well.
>
> What is the performance impact of having the conversion being done by the
> kernel? Has anyone done measurements? If significant can't the subsystem mmap()
> cache the phys address for PAT? Shouldn't the TLB take care of those considerations
> for us? If this is generally desirable shouldn't we just generalize the cache
> for devices for O(1) access through a generic API?
We're pretty much required to keep the PTE memory types consistent for
aliases of the same page. I think that the x86 pageattr code is
supposed to take care of this. IOW, if everything is working right,
then the supposedly uncached mmap should either fail, be promoted to
WC, or cause the existing WC map to degrade to UC. The code is really
overcomplicated right now.
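A toy userspace sketch of the "fail" option, assuming a simple table of
reserved physical ranges, could look like the following. This is not the
actual x86 pageattr / PAT code, and all names here are made up for
illustration:

```c
#include <stddef.h>
#include <stdint.h>

enum memtype { MT_WB, MT_WC, MT_UC_MINUS, MT_UC };

struct reservation {
	uint64_t start, end; /* half-open physical range [start, end) */
	enum memtype type;
};

#define MAX_RESERVATIONS 16

struct memtype_table {
	struct reservation r[MAX_RESERVATIONS];
	size_t n;
};

/*
 * Record that [start, end) is mapped with the given type. Returns 0 on
 * success, -1 if the range overlaps an existing reservation of a different
 * type (an inconsistent alias) or if the table is full.
 */
static int memtype_reserve(struct memtype_table *t, uint64_t start,
			   uint64_t end, enum memtype type)
{
	size_t i;

	for (i = 0; i < t->n; i++) {
		const struct reservation *cur = &t->r[i];
		int overlaps = start < cur->end && cur->start < end;

		if (overlaps && cur->type != type)
			return -1;
	}
	if (t->n == MAX_RESERVATIONS)
		return -1;

	t->r[t->n++] = (struct reservation){ start, end, type };
	return 0;
}
```

Promoting the new mapping or degrading the existing one would replace the
early return with a type change; the sketch only shows the rejection case.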
--Andy
* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
2015-04-02 0:04 ` Andy Lutomirski
@ 2015-04-02 19:45 ` Luis R. Rodriguez
2015-04-02 19:50 ` Andy Lutomirski
0 siblings, 1 reply; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-04-02 19:45 UTC (permalink / raw)
To: Andy Lutomirski, Mel Gorman, Vlastimil Babka
Cc: Ville Syrjälä,
Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin, Juergen Gross, Jan Beulich, Borislav Petkov,
Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
Linus Torvalds, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
On Wed, Apr 01, 2015 at 05:04:08PM -0700, Andy Lutomirski wrote:
> On Wed, Apr 1, 2015 at 4:52 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Sat, Mar 28, 2015 at 02:23:34PM +0200, Ville Syrjälä wrote:
> >> On Sat, Mar 28, 2015 at 01:28:18AM +0100, Luis R. Rodriguez wrote:
> >> > On Fri, Mar 27, 2015 at 03:02:10PM -0700, Andy Lutomirski wrote:
> >> > > On Fri, Mar 27, 2015 at 2:56 PM, Ville Syrjälä <syrjala@sci.fi> wrote:
> >> > > > On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
> >> > > >> On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
> >> > > >> > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> >> > > >> > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> >> > > >> > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> >> > > >> > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> >> > > >> > >> > index 8025624..8875e56 100644
> >> > > >> > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> >> > > >> > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> >> > > >> > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> >> > > >> > >> >
> >> > > >> > >> > #ifdef CONFIG_MTRR
> >> > > >> > >> > par->mtrr_aper = -1;
> >> > > >> > >> > - par->mtrr_reg = -1;
> >> > > >> > >> > if (!nomtrr) {
> >> > > >> > >> > - /* Cover the whole resource. */
> >> > > >> > >> > - par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> >> > > >> > >> > + par->mtrr_aper = mtrr_add(info->fix.smem_start,
> >> > > >> > >> > + info->fix.smem_len,
> >> > > >> > >> > MTRR_TYPE_WRCOMB, 1);
> >> > > >> > >>
> >> > > >> > >> MTRRs need power of two size, so how is this supposed to work?
> >> > > >> > >
> >> > > >> > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
> >> > > >> > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
> >> > > >> > > is not standardized and by no means recorded as a requirement. Obviously
> >> > > >> > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
> >> > > >> > > will use mtrr_check() to verify the the same requirement. Furthermore,
> >> > > >> > > as per my commit log message:
> >> > > >> >
> >> > > >> > Whatever the code may or may not do, the x86 architecture uses
> >> > > >> > power-of-two MTRR sizes. So I'm confused.
> >> > > >>
> >> > > >> There should be no confusion, I simply did not know that *was* the
> >> > > >> requirement for x86, if that is the case we should add a check for that
> >> > > >> and perhaps generalize a helper that does the power of two helper changes,
> >> > > >> the cleanest I found was the vesafb driver solution.
> >> > > >>
> >> > > >> Thoughts?
> >> > > >
> >> > > > The vesafb solution is bad since you'll only end up covering only
> >> > > > the first 4MB of the framebuffer instead of the almost 8MB you want.
> >> > > > Which in practice will mean throwing away half the VRAM since you really
> >> > > > don't want the massive performance hit from accessing it as UC. And that
> >> > > > would mean giving up decent display resolutions as well :(
> >> > > >
> >> > > > And the other option of trying to cover the remainder with multiple ever
> >> > > > smaller MTRRs doesn't work either since you'll run out of MTRRs very
> >> > > > quickly.
> >> > > >
> >> > > > This is precisely why I used the hole method in atyfb in the first
> >> > > > place.
> >> > > >
> >> > > > I don't really like the idea of any new mtrr code not supporting that
> >> > > > use case, especially as these things tend to be present in older machines
> >> > > > where PAT isn't an option.
> >> > >
> >> > > According to the Intel SDM, volume 3, section 11.5.2.1, table 11-6,
> >> > > non-PAT CPUs that have a WC MTRR, PCD = 1, and PWT = 1 (aka UC) have
> >> > > an effective memory type of UC.
> >
> > This is true but non-PAT systems that use just ioremap() will default to
> > _PAGE_CACHE_MODE_UC_MINUS, not _PAGE_CACHE_MODE_UC, and _PAGE_CACHE_MODE_UC_MINUS
> > on Linux has PCD = 1, PWT = 0. The list comes from:
> >
> > uint16_t __cachemode2pte_tbl[_PAGE_CACHE_MODE_NUM] = {
> > [_PAGE_CACHE_MODE_WB ] = 0 | 0 ,
> > [_PAGE_CACHE_MODE_WC ] = _PAGE_PWT | 0 ,
> > [_PAGE_CACHE_MODE_UC_MINUS] = 0 | _PAGE_PCD,
> > [_PAGE_CACHE_MODE_UC ] = _PAGE_PWT | _PAGE_PCD,
> > [_PAGE_CACHE_MODE_WT ] = 0 | _PAGE_PCD,
> > [_PAGE_CACHE_MODE_WP ] = 0 | _PAGE_PCD,
> > };
> >
> > This can better be read here:
> >
> > PAT
> > |PCD
> > ||PWT
> > |||
> > 000 WB _PAGE_CACHE_MODE_WB
> > 001 WC _PAGE_CACHE_MODE_WC
> > 010 UC- _PAGE_CACHE_MODE_UC_MINUS
> > 011 UC _PAGE_CACHE_MODE_UC
> >
> > On x86 ioremap() defaults to ioremap_nocache() and right now that uses
> > _PAGE_CACHE_MODE_UC_MINUS not _PAGE_CACHE_MODE_UC. We have two cases
> > to consider for non-PAT systems then:
> >
> > a) Right now as ioremap() and ioremap_nocache() default to _PAGE_CACHE_MODE_UC_MINUS
> > on x86. In this case using a WC MTRR seems to use PWT=0, PCD=1, and
> > table table 11-6 on non-PAT systems seems to place this situation as
> > "implementation defined" and not encouraged.
> >
> > a) when commit de33c442e "x86 PAT: fix performance drop for glx, use
> > UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()"
> > gets reverted and we use _PAGE_CACHE_MODE_UC by default. In this
> > case on x86 for both ioremap() and ioremap_nocache() as they will
> > both default to _PAGE_CACHE_MODE_UC we'll end up as you note with
> > an effective memory type of UC.
> >
> > If I've understood this correctly then neither of these situations are good and
> > its just by chance that on some systems situation a) has lead to proper WC.
> >
> > On a PAT system we have a bit different combinatorial results (based on Table
> > 11-7):
> >
> > a) Right now ioremap() and ioremap_nocache() defaulting to
> > _PAGE_CACHE_MODE_UC_MINUS yields + MTRR WC = WC
> >
> > b) When commit de33c442e gets reverted _PAGE_CACHE_MODE_UC + MTRR WC = UC
> >
> > So to be clear right now atyfb should work fine on PAT systems
> > with de33c442e in place, once reverted as-is right now we'd end
> > up with UC effective memory type.
> >
> > For both PAT and non-PAT systems when commit de33c442e gets reverted
> > we'd end up with UC as the effective memory type for atyfb. Right
> > now it shoud work on PAT systems and by chance its suspected to work
> > on non-PAT systems. We want to phase MTRR though, specially to avoid
> > all this insane combinatorial nightmware.
> >
> >> > > Hence my suggestion to add
> >> > > ioremap_x86_uc and/or set_memory_x86_uc to punch a UC hole in an
> >> > > otherwise WC MTRR-covered region.
> >
> > To be clear I think you mean then that ioremap_x86_uc() would help us avoid the
> > jumps between combinatorial issues with MTRR on PAT / non-PAT systems before
> > and after commit de33c442e gets reverted. So for instance if we had on the
> > atyfb driver:
> >
> > ioremap_x86_uc(PCI BAR)
> > ioremap_wc(framebuffer)
> > arch_phys_add_wc(PCI BAR)
> >
> > On non-PAT systems on the MMIO region with PWT=1, PCD=1 we'd end up with UC.
> > Sadly though since _PAGE_CACHE_WC on non-PAT has PWT=1, PCD=0, the WC
> > MTRR that follows would mean we'd end up with another grey area (but
> > similar to before as technically an effectivethe memory type of WC).
> >
> > On PAT systems the above would not use MTRRs but we'd be counting on
> > overlapping memory types -- its not clear if aliasing here is a problem.
> >
> > Also Intel SDM, volume 3, section "11.11.4 Range Size and Alignment Requirement"
> > describes that: "the minimum range size is 4 KiB, the base address must be on
> > a 4 KiB boundary. For ranges greater than 4 KiB each range must be of length
> > 2^n and its base address must be alinged on a 2^n boundary where n is a value
> > equal or greatar then 12. The base-address alignment value cannot be less
> > than its length. For example, an 8-KiB range cannot be aligned on a
> > 4-KiB boundary. It must be aligned on at least an 8-KiB boundary"
> >
> > So to answer my own question: indeed, our framebuffer base address must be
> > aligned on a 2^n boundary, the size also has to be a power of 2. MTRR supports
> > fixed range sizes and variable range sizes, in case of the MMIO that does
> > not need to abide by the power of 2 rule as a fixed range size of 4 KiB
> > could be used although upon review ouf our own implemetnation its unclear if
> > that is what is used for 4 KiB sized MTRRs.
> >
> > Hence my arch_phys_add_wc(PCI BAR) as above.
> >
> >> > OK I think I get it now.
> >> >
> >> > And I take it this would hopefully only be used for non-PAT systems?
> >
> > Since we likely could care to use ioremap_x86_uc() on PAT systems as well we
> > could make the effective for both PAT and non-PAT obviously then. Later when
> > we get ioremap() to default to strong UC we could drop ioremap_x86_uc() as we'd
> > only need it as transitory until then -- that is unless we want perhaps a strong
> > UC ioremap primitive which is always following strong UC when available regardless
> > of these default transitions.
> >
> > The big issue I see here is simply the combinatorial issues, so I do think
> > its best to annotate these corner cases well and avoid them.
> >
> >> > Would there be a use case for PAT systems? I wonder if we can wrap
> >> > this under some APIs to make it clean and hide this dirty thing
> >> > behind the scenes, it seems a fragile and error prone and my hope
> >> > would be that we won't need more specialization in this area for
> >> > PAT systems.
> >>
> >> One potential complication is kernel vs. userspace mmap. MTRR applies to
> >> the physical address, but PAT applies to the virtual address, so with
> >> the WC MTRR you get WC for userspace "for free" as well.
> >
> > What is the performance impact of having the conversion being done by the
> > kernel? Has anyone done measurements? If significant can't the subsystem mmap()
> > cache the phys address for PAT? Shouldn't the TLB take care of those considerations
> > for us? If this is generally desirable shouldn't we just generalize the cache
> > for devices for O(1) access through a generic API?
>
> We're pretty much required to keep the PTE memory types consistent for
> aliasses of the same page.
Hrm, OK so overlapping ioremap() calls should be frowned upon?
I think it's important to clarify the few different scenarios we have
for atyfb, both for today when UC- is the default and for when UC becomes the
default. I'll also clarify what this series originally tried to do,
which the MTRR size requirements prevent, along with the
combinatorial issues that would also be present when and if UC becomes the
default. Finally I'll clarify what I am thinking we should do in light
of all this.
_______________________________________________________________________
| | |
|_______________________________________________________|_____________|
\______________________________________________________/ \____________/
Framebuffer (8 MiB) MMIO (4 KiB)
Currently we have:
Page_cache_mode's _PAGE_CACHE_MODE_ is removed below for brevity.
The atyfb PCI BAR is condensed to:
Framebuffer,MMIO
Keeping in mind:
Intel SDM, volume 3, section 11.5.2.1, table 11-6 (NonPAT combinatorial)
Intel SDM, volume 3, section 11.5.2.2, table 11-7 (PAT combinatorial)
Linux PCD, PWT bits:
PAT
|PCD
||PWT
|||
000 WB _PAGE_CACHE_MODE_WB
001 WC _PAGE_CACHE_MODE_WC
010 UC- _PAGE_CACHE_MODE_UC_MINUS
011 UC _PAGE_CACHE_MODE_UC
(*) below denotes grey area as per SDM, implementation-defined
(%) below denotes not possible due to size / base requirements of MTRRs
(+) below denotes combinatorial issue
Non-PAT systems use PCD, PWT values, their respective bit settings for
these are given although internally we use _PAGE_CACHE_MODE* on the
ioremap* calls for both non-PAT and PAT. For instance
_PAGE_CACHE_MODE_UC_MINUS is 10 for PCD=1, PWT=0.
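To make the bit encoding concrete, here is a minimal userspace sketch of the mapping listed above. The names (MODE_*, BIT_PWT, BIT_PCD, mode_to_bits) are made up for illustration and are not the kernel's; only the bit assignments are taken from the list above.

```c
#include <assert.h>

/* Hypothetical names for illustration; only the bit values follow the list above. */
enum cache_mode {
	MODE_WB,
	MODE_WC,
	MODE_UC_MINUS,
	MODE_UC,
	MODE_NUM
};

#define BIT_PWT 0x1u	/* right-hand digit of the "PCD,PWT" pair */
#define BIT_PCD 0x2u	/* left-hand digit of the "PCD,PWT" pair */

/* Return the PCD/PWT bits that correspond to a cache mode. */
static unsigned int mode_to_bits(enum cache_mode mode)
{
	static const unsigned int tbl[MODE_NUM] = {
		[MODE_WB]       = 0,
		[MODE_WC]       = BIT_PWT,
		[MODE_UC_MINUS] = BIT_PCD,
		[MODE_UC]       = BIT_PCD | BIT_PWT,
	};

	assert((unsigned int)mode < MODE_NUM);
	return tbl[mode];
}
```

With this, mode_to_bits(MODE_UC_MINUS) yields binary 10 (PCD=1, PWT=0), which is the "10" used in the non-PAT columns below.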
Today we have:
--------------------------------------------------------------------
Calls | Page_cache_mode | Effective memtype |
------------------------|---------------------|---------------------
| Non-PAT | PAT | Non-PAT | PAT |
--------------------------------------------------------------------
ioremap(MMIO) | xxx, 10 | xxx, UC- | xxx, UC | xxx, UC- |
ioremap(PCI BAR) | 10 , 10 | UC-, UC- | UC, UC | UC-, UC- |
MTRR WC(PCI BAR) | 10 , 10 | UC-, UC- | WC*, WC* | WC , WC |
MTRR UC(MMIO) | 10 , 10 | UC-, UC- | WC*, UC | WC , UC |
--------------------------------------------------------------------
If today we revert commit de33c442e and UC becomes default this would run into
the combinatorial issue:
--------------------------------------------------------------------
Calls | Page_cache_mode | Effective memtype |
------------------------|---------------------|---------------------
| Non-PAT | PAT | Non-PAT | PAT |
--------------------------------------------------------------------
ioremap(MMIO) | xxx, 11 | xxx, UC | xxx, UC | xxx, UC |
ioremap(PCI BAR) | 11 , 11 | UC , UC | UC, UC | UC , UC |
MTRR WC(PCI BAR) | 11 , 11 | UC, UC | UC+, UC+ | UC+, UC+ |
MTRR UC(MMIO) | 11 , 11 | UC, UC | UC+, UC | UC+, UC |
--------------------------------------------------------------------
Ideally we would like to do the following, but we can't because MTRRs require
powers of two for both size and base address. We'd have two steps, one with
mtrr_add(), and another with arch_phys_wc_add(). This is what this series was
proposing for atyfb.
With mtrr_add():
--------------------------------------------------------------------
Calls | Page_cache_mode | Effective memtype |
------------------------|---------------------|---------------------
| Non-PAT | PAT | Non-PAT | PAT |
--------------------------------------------------------------------
ioremap_nocache(MMIO) | xxx, 10 | xxx, UC- | xxx, UC | xxx, UC- |
ioremap_wc(fb) | 01 , 10 | WC , UC- | UC , UC | WC , UC- |
MTRR WC(fb) | 01 , 10 | UC-, WC | WC%*,UC | WC%, UC- |
--------------------------------------------------------------------
Then we'd change this to arch_phys_wc_add():
--------------------------------------------------------------------
Calls | Page_cache_mode | Effective memtype |
------------------------|---------------------|---------------------
| Non-PAT | PAT | Non-PAT | PAT |
--------------------------------------------------------------------
ioremap_nocache(MMIO) | xxx, 10 | xxx, UC- | xxx, UC | UC-, UC- |
ioremap_wc(fb) | 01 , 10 | WC , UC- | UC , UC | WC , UC- |
arch_phys_wc_add(fb) | 01 , 10 | WC , WC | WC%*,UC | WC , UC- |
--------------------------------------------------------------------
With the above code we also have to consider what happens if we
revert commit de33c442e and UC becomes the default: we'd then run into
both the size issue and a grey area:
With mtrr_add():
--------------------------------------------------------------------
Calls | Page_cache_mode | Effective memtype |
------------------------|---------------------|---------------------
| Non-PAT | PAT | Non-PAT | PAT |
--------------------------------------------------------------------
ioremap_nocache(MMIO) | xxx, 11 | xxx, UC | xxx, UC | xxx, UC |
ioremap_wc(fb) | 01 , 11 | WC , UC | UC , UC | WC , UC |
MTRR WC(fb) | 01 , 11 | WC , UC | WC%* ,UC | WC , UC |
--------------------------------------------------------------------
Then with arch_phys_wc_add():
--------------------------------------------------------------------
Calls | Page_cache_mode | Effective memtype |
------------------------|---------------------|---------------------
| Non-PAT | PAT | Non-PAT | PAT |
--------------------------------------------------------------------
ioremap_nocache(MMIO) | xxx, 11 | xxx, UC | xxx, UC | xxx, UC |
ioremap_wc(fb) | 01 , 11 | WC , UC | UC , UC | WC , UC |
arch_phys_wc_add(fb) | 01 , 11 | WC , UC | WC%*,UC | WC , UC |
--------------------------------------------------------------------
So what we *could* do then is add ioremap_uc() (always use strong UC) for the
full PCI BAR, then override the framebuffer area with WC, and finally use an
MTRR on the full PCI BAR, relying on the fact that strong UC won't let the MTRR
override the earlier UC on the MMIO area. There is a grey area here for non-PAT
systems but that is also the case as-is today.
--------------------------------------------------------------------
Calls | Page_cache_mode | Effective memtype |
------------------------|---------------------|---------------------
| Non-PAT | PAT | Non-PAT | PAT |
--------------------------------------------------------------------
ioremap_uc(PCI BAR) | 11 , 11 | UC , UC | UC , UC | UC , UC |
ioremap_wc(fb) | 01 , 11 | WC , UC | UC , UC | WC , UC |
MTRR_WC(PCI BAR) | 01 , 11 | WC , UC | WC*, UC | WC , UC |
--------------------------------------------------------------------
Finally with arch_phys_wc_add() we'd end up with:
--------------------------------------------------------------------
Calls | Page_cache_mode | Effective memtype |
------------------------|---------------------|---------------------
| Non-PAT | PAT | Non-PAT | PAT |
--------------------------------------------------------------------
ioremap_uc(PCI BAR) | 11 , 11 | UC , UC | UC , UC | UC , UC |
ioremap_wc(fb) | 01 , 11 | WC , UC | UC , UC | WC , UC |
arch_phys_wc_add(PCIBAR)| 01 , 11 | WC , UC | WC*, UC | WC , UC |
--------------------------------------------------------------------
In this case a revert of de33c442e won't have any effect as the driver
was already well prepared for it by using ioremap_uc().
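The PAT columns above can be reproduced with a small lookup. This is only a sketch, assuming table 11-7 reads as follows for the page-table types used in this thread: UC always wins, WC always wins, UC- yields to a WC MTRR and is UC otherwise, and WB defers to the MTRR. The names (memtype, pat_effective) are hypothetical, and the non-PAT grey areas marked with (*) are deliberately not modelled.

```c
#include <assert.h>

/* Hypothetical userspace model; not kernel code. */
enum memtype { MT_UC, MT_UC_MINUS, MT_WC, MT_WB };

/*
 * pte:  memory type selected through the page table (PAT)
 * mtrr: memory type of the MTRR covering the physical range
 */
static enum memtype pat_effective(enum memtype pte, enum memtype mtrr)
{
	assert(mtrr != MT_UC_MINUS);	/* UC- is not a valid MTRR type */

	switch (pte) {
	case MT_UC:
		return MT_UC;		/* strong UC cannot be overridden */
	case MT_WC:
		return MT_WC;		/* PAT WC wins over any MTRR type */
	case MT_UC_MINUS:
		return mtrr == MT_WC ? MT_WC : MT_UC;
	default:
		return mtrr;		/* WB defers to the MTRR */
	}
}
```

pat_effective(MT_UC_MINUS, MT_WC) gives WC, which is today's behaviour under the WC MTRR, while pat_effective(MT_UC, MT_WC) gives UC, which is why ioremap_uc() would keep the MMIO area UC even when a WC MTRR covers the full PCI BAR.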
> I think that the x86 pageattr code is
> supposed to take care of this. IOW, if everything is working right,
> then the supposedly uncached mmap should either fail, be promoted to
> WC, or cause the existing WC map to degrade to UC. The code is really
> overcomplicated right now.
Yeah, the aliasing implications of the above picture are not clear to me; someone
who is knee-deep in this can likely confirm whether there are any issues with the
above tables. Most importantly though, if we believe that the last two sets
above don't have any issues then I think we can move forward. Since we only
have a few drivers that need special handling I think it makes sense to treat
them specially and document this strategy for the "hole" work around.
Thoughts?
Luis
* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
2015-04-02 19:45 ` Luis R. Rodriguez
@ 2015-04-02 19:50 ` Andy Lutomirski
0 siblings, 0 replies; 400+ messages in thread
From: Andy Lutomirski @ 2015-04-02 19:50 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Mel Gorman, Vlastimil Babka, Ville Syrjälä,
Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner,
H. Peter Anvin, Juergen Gross, Jan Beulich, Borislav Petkov,
Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
Linus Torvalds, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
On Thu, Apr 2, 2015 at 12:45 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Wed, Apr 01, 2015 at 05:04:08PM -0700, Andy Lutomirski wrote:
>> On Wed, Apr 1, 2015 at 4:52 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> > On Sat, Mar 28, 2015 at 02:23:34PM +0200, Ville Syrjälä wrote:
>> >> On Sat, Mar 28, 2015 at 01:28:18AM +0100, Luis R. Rodriguez wrote:
>> >> > On Fri, Mar 27, 2015 at 03:02:10PM -0700, Andy Lutomirski wrote:
>> >> > > On Fri, Mar 27, 2015 at 2:56 PM, Ville Syrjälä <syrjala@sci.fi> wrote:
>> >> > > > On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
>> >> > > >> On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
>> >> > > >> > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> >> > > >> > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
>> >> > > >> > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
>> >> > > >> > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
>> >> > > >> > >> > index 8025624..8875e56 100644
>> >> > > >> > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
>> >> > > >> > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
>> >> > > >> > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
>> >> > > >> > >> >
>> >> > > >> > >> > #ifdef CONFIG_MTRR
>> >> > > >> > >> > par->mtrr_aper = -1;
>> >> > > >> > >> > - par->mtrr_reg = -1;
>> >> > > >> > >> > if (!nomtrr) {
>> >> > > >> > >> > - /* Cover the whole resource. */
>> >> > > >> > >> > - par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
>> >> > > >> > >> > + par->mtrr_aper = mtrr_add(info->fix.smem_start,
>> >> > > >> > >> > + info->fix.smem_len,
>> >> > > >> > >> > MTRR_TYPE_WRCOMB, 1);
>> >> > > >> > >>
>> >> > > >> > >> MTRRs need power of two size, so how is this supposed to work?
>> >> > > >> > >
>> >> > > >> > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
>> >> > > >> > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
>> >> > > >> > > is not standardized and by no means recorded as a requirement. Obviously
>> >> > > >> > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
>> >> > > >> > > will use mtrr_check() to verify the same requirement. Furthermore,
>> >> > > >> > > as per my commit log message:
>> >> > > >> >
>> >> > > >> > Whatever the code may or may not do, the x86 architecture uses
>> >> > > >> > power-of-two MTRR sizes. So I'm confused.
>> >> > > >>
>> >> > > >> There should be no confusion, I simply did not know that *was* the
>> >> > > >> requirement for x86, if that is the case we should add a check for that
>> >> > > >> and perhaps generalize a helper that does the power of two helper changes,
>> >> > > >> the cleanest I found was the vesafb driver solution.
>> >> > > >>
>> >> > > >> Thoughts?
>> >> > > >
>> >> > > > The vesafb solution is bad since you'll only end up covering only
>> >> > > > the first 4MB of the framebuffer instead of the almost 8MB you want.
>> >> > > > Which in practice will mean throwing away half the VRAM since you really
>> >> > > > don't want the massive performance hit from accessing it as UC. And that
>> >> > > > would mean giving up decent display resolutions as well :(
>> >> > > >
>> >> > > > And the other option of trying to cover the remainder with multiple ever
>> >> > > > smaller MTRRs doesn't work either since you'll run out of MTRRs very
>> >> > > > quickly.
>> >> > > >
>> >> > > > This is precisely why I used the hole method in atyfb in the first
>> >> > > > place.
>> >> > > >
>> >> > > > I don't really like the idea of any new mtrr code not supporting that
>> >> > > > use case, especially as these things tend to be present in older machines
>> >> > > > where PAT isn't an option.
>> >> > >
>> >> > > According to the Intel SDM, volume 3, section 11.5.2.1, table 11-6,
>> >> > > non-PAT CPUs that have a WC MTRR, PCD = 1, and PWT = 1 (aka UC) have
>> >> > > an effective memory type of UC.
>> >
>> > This is true but non-PAT systems that use just ioremap() will default to
>> > _PAGE_CACHE_MODE_UC_MINUS, not _PAGE_CACHE_MODE_UC, and _PAGE_CACHE_MODE_UC_MINUS
>> > on Linux has PCD = 1, PWT = 0. The list comes from:
>> >
>> > uint16_t __cachemode2pte_tbl[_PAGE_CACHE_MODE_NUM] = {
>> > [_PAGE_CACHE_MODE_WB ] = 0 | 0 ,
>> > [_PAGE_CACHE_MODE_WC ] = _PAGE_PWT | 0 ,
>> > [_PAGE_CACHE_MODE_UC_MINUS] = 0 | _PAGE_PCD,
>> > [_PAGE_CACHE_MODE_UC ] = _PAGE_PWT | _PAGE_PCD,
>> > [_PAGE_CACHE_MODE_WT ] = 0 | _PAGE_PCD,
>> > [_PAGE_CACHE_MODE_WP ] = 0 | _PAGE_PCD,
>> > };
>> >
>> > This can better be read here:
>> >
>> > PAT
>> > |PCD
>> > ||PWT
>> > |||
>> > 000 WB _PAGE_CACHE_MODE_WB
>> > 001 WC _PAGE_CACHE_MODE_WC
>> > 010 UC- _PAGE_CACHE_MODE_UC_MINUS
>> > 011 UC _PAGE_CACHE_MODE_UC
>> >
>> > On x86 ioremap() defaults to ioremap_nocache() and right now that uses
>> > _PAGE_CACHE_MODE_UC_MINUS not _PAGE_CACHE_MODE_UC. We have two cases
>> > to consider for non-PAT systems then:
>> >
>> > a) Right now as ioremap() and ioremap_nocache() default to _PAGE_CACHE_MODE_UC_MINUS
>> > on x86. In this case using a WC MTRR seems to use PWT=0, PCD=1, and
>> > table 11-6 on non-PAT systems seems to place this situation as
>> > "implementation defined" and not encouraged.
>> >
>> > b) when commit de33c442e "x86 PAT: fix performance drop for glx, use
>> > UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()"
>> > gets reverted and we use _PAGE_CACHE_MODE_UC by default. In this
>> > case on x86 for both ioremap() and ioremap_nocache() as they will
>> > both default to _PAGE_CACHE_MODE_UC we'll end up as you note with
>> > an effective memory type of UC.
>> >
>> > If I've understood this correctly then neither of these situations is good and
>> > it's just by chance that on some systems situation a) has led to proper WC.
>> >
>> > On a PAT system we have a bit different combinatorial results (based on Table
>> > 11-7):
>> >
>> > a) Right now ioremap() and ioremap_nocache() defaulting to
>> > _PAGE_CACHE_MODE_UC_MINUS yields + MTRR WC = WC
>> >
>> > b) When commit de33c442e gets reverted _PAGE_CACHE_MODE_UC + MTRR WC = UC
>> >
>> > So to be clear right now atyfb should work fine on PAT systems
>> > with de33c442e in place, once reverted as-is right now we'd end
>> > up with UC effective memory type.
>> >
>> > For both PAT and non-PAT systems when commit de33c442e gets reverted
>> > we'd end up with UC as the effective memory type for atyfb. Right
>> > now it should work on PAT systems and by chance it's suspected to work
>> > on non-PAT systems. We want to phase out MTRR though, especially to avoid
>> > all this insane combinatorial nightmare.
>> >
>> >> > > Hence my suggestion to add
>> >> > > ioremap_x86_uc and/or set_memory_x86_uc to punch a UC hole in an
>> >> > > otherwise WC MTRR-covered region.
>> >
>> > To be clear I think you mean then that ioremap_x86_uc() would help us avoid the
>> > jumps between combinatorial issues with MTRR on PAT / non-PAT systems before
>> > and after commit de33c442e gets reverted. So for instance if we had on the
>> > atyfb driver:
>> >
>> > ioremap_x86_uc(PCI BAR)
>> > ioremap_wc(framebuffer)
>> > arch_phys_add_wc(PCI BAR)
>> >
>> > On non-PAT systems on the MMIO region with PWT=1, PCD=1 we'd end up with UC.
>> > Sadly though since _PAGE_CACHE_WC on non-PAT has PWT=1, PCD=0, the WC
>> > MTRR that follows would mean we'd end up with another grey area (but
>> > similar to before, as technically the effective memory type is WC).
>> >
>> > On PAT systems the above would not use MTRRs but we'd be counting on
>> > overlapping memory types -- it's not clear if aliasing here is a problem.
>> >
>> > Also Intel SDM, volume 3, section "11.11.4 Range Size and Alignment Requirement"
>> > describes that: "the minimum range size is 4 KiB, the base address must be on
>> > a 4 KiB boundary. For ranges greater than 4 KiB each range must be of length
>> > 2^n and its base address must be aligned on a 2^n boundary where n is a value
>> > equal to or greater than 12. The base-address alignment value cannot be less
>> > than its length. For example, an 8-KiB range cannot be aligned on a
>> > 4-KiB boundary. It must be aligned on at least an 8-KiB boundary"
>> >
>> > So to answer my own question: indeed, our framebuffer base address must be
>> > aligned on a 2^n boundary, the size also has to be a power of 2. MTRR supports
>> > fixed range sizes and variable range sizes, in case of the MMIO that does
>> > not need to abide by the power of 2 rule as a fixed range size of 4 KiB
>> > could be used, although upon review of our own implementation it's unclear if
>> > that is what is used for 4 KiB sized MTRRs.
>> >
>> > Hence my arch_phys_add_wc(PCI BAR) as above.
>> >
>> >> > OK I think I get it now.
>> >> >
>> >> > And I take it this would hopefully only be used for non-PAT systems?
>> >
>> > Since we likely could care to use ioremap_x86_uc() on PAT systems as well we
>> > could make it effective for both PAT and non-PAT then. Later when
>> > we get ioremap() to default to strong UC we could drop ioremap_x86_uc() as we'd
>> > only need it as transitory until then -- that is unless we want perhaps a strong
>> > UC ioremap primitive which is always following strong UC when available regardless
>> > of these default transitions.
>> >
>> > The big issue I see here is simply the combinatorial issues, so I do think
>> > its best to annotate these corner cases well and avoid them.
>> >
>> >> > Would there be a use case for PAT systems? I wonder if we can wrap
>> >> > this under some APIs to make it clean and hide this dirty thing
>> >> > behind the scenes, it seems a fragile and error prone and my hope
>> >> > would be that we won't need more specialization in this area for
>> >> > PAT systems.
>> >>
>> >> One potential complication is kernel vs. userspace mmap. MTRR applies to
>> >> the physical address, but PAT applies to the virtual address, so with
>> >> the WC MTRR you get WC for userspace "for free" as well.
>> >
>> > What is the performance impact of having the conversion being done by the
>> > kernel? Has anyone done measurements? If significant can't the subsystem mmap()
>> > cache the phys address for PAT? Shouldn't the TLB take care of those considerations
>> > for us? If this is generally desirable shouldn't we just generalize the cache
>> > for devices for O(1) access through a generic API?
>>
>> We're pretty much required to keep the PTE memory types consistent for
>> aliases of the same page.
>
> Hrm, OK so overlapping ioremap() calls should be frowned upon?
>
> I think it's important to clarify the few different scenarios we have
> for atyfb, both for today when uc- is default and when uc becomes the
> default. I'll also clarify what this series originally tried to do
> but the issues that size requirements prohibit us to do along with
> combinatorial issues that would also be present when and if uc becomes
> default. Finally I'll clarify what I am thinking we should do in light
> of all this.
>
> _______________________________________________________________________
> | | |
> |_______________________________________________________|_____________|
>
> \______________________________________________________/ \____________/
>
> Framebuffer (8 MiB) MMIO (4 KiB)
>
> Currently we have:
>
> Page_cache_mode's _PAGE_CACHE_MODE_ is removed below for brevity.
> The atyfb PCI BAR is condensed to:
>
> Framebuffer,MMIO
>
> Keeping in mind:
>
> Intel SDM, volume 3, section 11.5.2.1, table 11-6 (NonPAT combinatorial)
> Intel SDM, volume 3, section 11.5.2.2, table 11-7 (PAT combinatorial)
>
> Linux PCD, PWT bits:
>
> PAT
> |PCD
> ||PWT
> |||
> 000 WB _PAGE_CACHE_MODE_WB
> 001 WC _PAGE_CACHE_MODE_WC
> 010 UC- _PAGE_CACHE_MODE_UC_MINUS
> 011 UC _PAGE_CACHE_MODE_UC
>
> (*) below denotes grey area as per SDM, implementation-defined
> (%) below denotes not possible due to size / base requirements of MTRRs
> (+) below denotes combinatorial issue
>
> Non-PAT systems use PCD, PWT values, their respective bit settings for
> these are given although internally we use _PAGE_CACHE_MODE* on the
> ioremap* calls for both non-PAT and PAT. For instance
> _PAGE_CACHE_MODE_UC_MINUS is 10 for PCD=1, PWT=0.
>
> Today we have:
>
> --------------------------------------------------------------------
> Calls | Page_cache_mode | Effective memtype |
> ------------------------|---------------------|---------------------
> | Non-PAT | PAT | Non-PAT | PAT |
> --------------------------------------------------------------------
> ioremap(MMIO) | xxx, 10 | xxx, UC- | xxx, UC | xxx, UC- |
> ioremap(PCI BAR) | 10 , 10 | UC-, UC- | UC, UC | UC-, UC- |
> MTRR WC(PCI BAR) | 10 , 10 | UC-, UC- | WC*, WC* | WC , WC |
> MTRR UC(MMIO) | 10 , 10 | UC-, UC- | WC*, UC | WC , UC |
> --------------------------------------------------------------------
>
> If today we revert commit de33c442e and UC becomes default this would run into
> the combinatorial issue:
>
> --------------------------------------------------------------------
> Calls | Page_cache_mode | Effective memtype |
> ------------------------|---------------------|---------------------
> | Non-PAT | PAT | Non-PAT | PAT |
> --------------------------------------------------------------------
> ioremap(MMIO) | xxx, 11 | xxx, UC | xxx, UC | xxx, UC |
> ioremap(PCI BAR) | 11 , 11 | UC , UC | UC, UC | UC , UC |
> MTRR WC(PCI BAR) | 11 , 11 | UC, UC | UC+, UC+ | UC+, UC+ |
> MTRR UC(MMIO) | 11 , 11 | UC, UC | UC+, UC | UC+, UC |
> --------------------------------------------------------------------
>
> We ideally would like to do the following but can't because of the restriction
> of having to use powers of two for both size and base address for MTRRs, we'd
> have two steps, one with mtrr_add, and another with arch_phys_add_wc(). This is
> what this series was proposing for atyfb.
>
> With mtrr_add():
>
> --------------------------------------------------------------------
> Calls | Page_cache_mode | Effective memtype |
> ------------------------|---------------------|---------------------
> | Non-PAT | PAT | Non-PAT | PAT |
> --------------------------------------------------------------------
> ioremap_nocache(MMIO) | xxx, 10 | xxx, UC- | xxx, UC | xxx, UC- |
> ioremap_wc(fb) | 01 , 10 | WC , UC- | UC , UC | WC , UC- |
> MTRR WC(fb) | 01 , 10 | UC-, WC | WC%*,UC | WC%, UC- |
> --------------------------------------------------------------------
>
> Then we'd change this to arch_phys_add_wc():
>
> --------------------------------------------------------------------
> Calls | Page_cache_mode | Effective memtype |
> ------------------------|---------------------|---------------------
> | Non-PAT | PAT | Non-PAT | PAT |
> --------------------------------------------------------------------
> ioremap_nocache(MMIO) | xxx, 10 | xxx, UC- | xxx, UC | UC-, UC- |
> ioremap_wc(fb) | 01 , 10 | WC , UC- | UC , UC | WC , UC- |
> arch_phys_add_wc(fb) | 01 , 10 | WC , WC | WC%*,UC | WC , UC- |
> --------------------------------------------------------------------
>
> With the above code as well we have to consider the issues if we
> revert commit de33c442e and UC becomes default, we'd run into then
> both the size issue and also a grey area:
>
> With mtrr_add():
>
> --------------------------------------------------------------------
> Calls | Page_cache_mode | Effective memtype |
> ------------------------|---------------------|---------------------
> | Non-PAT | PAT | Non-PAT | PAT |
> --------------------------------------------------------------------
> ioremap_nocache(MMIO) | xxx, 11 | xxx, UC | xxx, UC | xxx, UC |
> ioremap_wc(fb) | 01 , 11 | WC , UC | UC , UC | WC , UC |
> MTRR WC(fb) | 01 , 11 | WC , UC | WC%* ,UC | WC , UC |
> --------------------------------------------------------------------
>
> Then with arch_phys_add_wc():
>
> --------------------------------------------------------------------
> Calls | Page_cache_mode | Effective memtype |
> ------------------------|---------------------|---------------------
> | Non-PAT | PAT | Non-PAT | PAT |
> --------------------------------------------------------------------
> ioremap_nocache(MMIO) | xxx, 11 | xxx, UC | xxx, UC | xxx, UC |
> ioremap_wc(fb) | 01 , 11 | WC , UC | UC , UC | WC , UC |
> arch_phys_add_wc(fb) | 01 , 11 | WC , UC | WC%*,UC | WC , UC |
> --------------------------------------------------------------------
>
> So what we *could* do then if we add ioremap_uc() (use strong UC always),
> then override the framebuffer area with wc, and finally use MTRR on the
> full PCI BAR, relying on that strong UC won't let the MTRR override
> the earlier UC on the MMIO area. There is a grey area here for non-PAT
> systems but that is also the case as-is today.
>
> --------------------------------------------------------------------
> Calls | Page_cache_mode | Effective memtype |
> ------------------------|---------------------|---------------------
> | Non-PAT | PAT | Non-PAT | PAT |
> --------------------------------------------------------------------
> ioremap_uc(PCI BAR) | 11 , 11 | UC , UC | UC , UC | UC , UC |
> ioremap_wc(fb) | 01 , 11 | WC , UC | UC , UC | WC , UC |
> MTRR_WC(PCI BAR) | 01 , 11 | WC , UC | WC*, UC | WC , UC |
> --------------------------------------------------------------------
>
> Finally with the arch_phys_add_wc() we'd end up with:
>
> --------------------------------------------------------------------
> Calls | Page_cache_mode | Effective memtype |
> ------------------------|---------------------|---------------------
> | Non-PAT | PAT | Non-PAT | PAT |
> --------------------------------------------------------------------
> ioremap_uc(PCI BAR) | 11 , 11 | UC , UC | UC , UC | UC , UC |
> ioremap_wc(fb) | 01 , 11 | WC , UC | UC , UC | WC , UC |
> arch_phys_add_wc(PCIBAR)| 01 , 11 | WC , UC | WC*, UC | WC , UC |
> --------------------------------------------------------------------
>
> In this case a revert of de33c442e won't have any effect as the driver
> was already well prepared for it by using ioremap_uc().
>
>> I think that the x86 pageattr code is
>> supposed to take care of this. IOW, if everything is working right,
>> then the supposedly uncached mmap should either fail, be promoted to
>> WC, or cause the existing WC map to degrade to UC. The code is really
>> overcomplicated right now.
>
> Yeah aliasing things are not clear for the above picture for me, someone
> who is knee-deep in this can likely confirm of any issues with the above
> pictures. Most importantly though, if we believe that the last two sets
> above don't have any issues then I think we can move forward. Since we only
> have a few drivers that need special handling I think it makes sense to treat
> them specially and document this strategy for the "hole" work around.
>
Seems reasonable to me.
--Andy
> Thoughts?
>
> Luis
--
Andy Lutomirski
AMA Capital Management, LLC
* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
2015-03-27 21:56 ` Ville Syrjälä
2015-03-27 22:02 ` Andy Lutomirski
@ 2015-03-28 0:21 ` Luis R. Rodriguez
1 sibling, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-28 0:21 UTC (permalink / raw)
To: Ville Syrjälä,
Andy Lutomirski, Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar,
Thomas Gleixner, H. Peter Anvin, Juergen Gross, Jan Beulich,
Borislav Petkov, Suresh Siddha, venkatesh.pallipadi, Dave Airlie,
linux-kernel, Linux Fbdev development list, X86 ML, xen-devel,
Ingo Molnar, Linus Torvalds, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
On Fri, Mar 27, 2015 at 11:56:55PM +0200, Ville Syrjälä wrote:
> On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
> > On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
> > > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> > > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> > > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> > > >> > index 8025624..8875e56 100644
> > > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> > > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> > > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> > > >> >
> > > >> > #ifdef CONFIG_MTRR
> > > >> > par->mtrr_aper = -1;
> > > >> > - par->mtrr_reg = -1;
> > > >> > if (!nomtrr) {
> > > >> > - /* Cover the whole resource. */
> > > >> > - par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> > > >> > + par->mtrr_aper = mtrr_add(info->fix.smem_start,
> > > >> > + info->fix.smem_len,
> > > >> > MTRR_TYPE_WRCOMB, 1);
> > > >>
> > > >> MTRRs need power of two size, so how is this supposed to work?
> > > >
> > > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
> > > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
> > > > is not standardized and by no means recorded as a requirement. Obviously
> > > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
> > > > will use mtrr_check() to verify the same requirement. Furthermore,
> > > > as per my commit log message:
> > >
> > > Whatever the code may or may not do, the x86 architecture uses
> > > power-of-two MTRR sizes. So I'm confused.
> >
> > There should be no confusion, I simply did not know that *was* the
> > requirement for x86, if that is the case we should add a check for that
> > and perhaps generalize a helper that does the power of two helper changes,
> > the cleanest I found was the vesafb driver solution.
> >
> > Thoughts?
>
> The vesafb solution is bad since you'll only end up covering only
> the first 4MB of the framebuffer instead of the almost 8MB you want.
OK so the power of 2 requirement implies we *have* to use a large
MTRR that includes the MMIO region in the shared PCI BAR case?
Andy, Ville, are we 100% certain about this power of two requirement?
Is that for the base and size or just the size?
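If the requirement does hold for both base and size (minimum 4 KiB, size a power of two, base aligned to the size), a check would look roughly like the sketch below. This is plain userspace C with hypothetical helper names (mtrr_range_ok, mtrr_covering_size); it makes no claim about what mtrr_check() verifies today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MTRR_MIN_SIZE 4096ULL	/* smallest range: 4 KiB */

static bool is_pow2(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* True if base/size satisfy the power-of-two size and alignment rule. */
static bool mtrr_range_ok(uint64_t base, uint64_t size)
{
	if (size < MTRR_MIN_SIZE || !is_pow2(size))
		return false;
	return (base & (size - 1)) == 0;
}

/* Smallest valid MTRR size that covers len bytes. */
static uint64_t mtrr_covering_size(uint64_t len)
{
	uint64_t size = MTRR_MIN_SIZE;

	assert(len != 0 && len <= (1ULL << 62));
	while (size < len)
		size <<= 1;
	return size;
}
```

With a made-up BAR at 0xfd000000, a framebuffer of 8 MiB minus 4 KiB (0x7ff000) fails mtrr_range_ok(), and mtrr_covering_size() rounds it up to the full 8 MiB, which then also covers the MMIO page at the end of the BAR.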
Luis
* [PATCH v1 10/47] video: fbdev: atyfb: use arch_phys_wc_add() and ioremap_wc()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (8 preceding siblings ...)
2015-03-20 23:17 ` [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 11/47] IB/qib: add acounting for MTRR Luis R. Rodriguez
` (37 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
This driver uses the same area for MTRR as for the ioremap().
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available through PAT; to
take advantage of that, also ensure the ioremap'd area is requested
as write-combining.
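The driver-side pattern this conversion relies on can be sketched in userspace as follows. This is only a model: the model_* functions are hypothetical stand-ins, and the assumed contract is that arch_phys_wc_add() returns 0 when PAT makes an MTRR unnecessary, a positive cookie when an MTRR was added, and that arch_phys_wc_del() ignores cookies that are not positive.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not the real implementation. */
static bool model_pat_enabled;
static int model_mtrr_in_use;

/* Models arch_phys_wc_add(): returns 0 when PAT makes an MTRR unnecessary. */
static int model_phys_wc_add(unsigned long base, unsigned long size)
{
	(void)base;
	(void)size;
	if (model_pat_enabled)
		return 0;
	model_mtrr_in_use++;
	return 1;	/* any positive value stands for a valid cookie */
}

/* Models arch_phys_wc_del(): cookies <= 0 are silently ignored. */
static void model_phys_wc_del(int cookie)
{
	if (cookie > 0)
		model_mtrr_in_use--;
}

/* Driver-side pattern: no #ifdef CONFIG_MTRR and no cookie checks. */
static int model_probe_and_remove(bool pat)
{
	int cookie;

	model_pat_enabled = pat;
	cookie = model_phys_wc_add(0xe0000000UL, 0x800000UL);
	/* ... ioremap_wc() and normal driver work would happen here ... */
	model_phys_wc_del(cookie);
	return cookie;
}
```

Because the delete side tolerates a zero cookie, the teardown paths can call it unconditionally, which is what lets the #ifdef CONFIG_MTRR blocks go away.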
There are a few motivations for this:
a) Take advantage of PAT when available
b) Help bury MTRR code away; MTRR is architecture-specific and on
x86 it is replaced by PAT
c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
The conversion is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdeffery and to remove redundant things that
arch_phys_wc_add() already handles, such as the verbose message
when MTRR fails and doing nothing when we didn't get
an MTRR.
@ mtrr_found @
expression index, base, size;
@@
-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);
@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@
-mtrr_del(index, base, size);
+arch_phys_wc_del(index);
@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@
-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);
@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@
-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);
@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);
@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/aty/atyfb.h | 4 +---
drivers/video/fbdev/aty/atyfb_base.c | 41 +++++++++---------------------------
2 files changed, 11 insertions(+), 34 deletions(-)
diff --git a/drivers/video/fbdev/aty/atyfb.h b/drivers/video/fbdev/aty/atyfb.h
index 89ec439..63c4842 100644
--- a/drivers/video/fbdev/aty/atyfb.h
+++ b/drivers/video/fbdev/aty/atyfb.h
@@ -182,9 +182,7 @@ struct atyfb_par {
unsigned long irq_flags;
unsigned int irq;
spinlock_t int_lock;
-#ifdef CONFIG_MTRR
- int mtrr_aper;
-#endif
+ int wc_cookie;
u32 mem_cntl;
struct crtc saved_crtc;
union aty_pll saved_pll;
diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
index 8875e56..af278bb 100644
--- a/drivers/video/fbdev/aty/atyfb_base.c
+++ b/drivers/video/fbdev/aty/atyfb_base.c
@@ -98,9 +98,6 @@
#ifdef CONFIG_PMAC_BACKLIGHT
#include <asm/backlight.h>
#endif
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
/*
* Debug flags.
@@ -303,9 +300,7 @@ static struct fb_ops atyfb_ops = {
};
static bool noaccel;
-#ifdef CONFIG_MTRR
static bool nomtrr;
-#endif
static int vram;
static int pll;
static int mclk;
@@ -2628,14 +2623,9 @@ static int aty_init(struct fb_info *info)
aty_st_le32(BUS_CNTL, aty_ld_le32(BUS_CNTL, par) |
BUS_APER_REG_DIS, par);
-#ifdef CONFIG_MTRR
- par->mtrr_aper = -1;
- if (!nomtrr) {
- par->mtrr_aper = mtrr_add(info->fix.smem_start,
- info->fix.smem_len,
- MTRR_TYPE_WRCOMB, 1);
- }
-#endif
+ if (!nomtrr)
+ par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+ info->fix.smem_len);
info->fbops = &atyfb_ops;
info->pseudo_palette = par->pseudo_palette;
@@ -2763,13 +2753,8 @@ aty_init_exit:
/* restore video mode */
aty_set_crtc(par, &par->saved_crtc);
par->pll_ops->set_pll(info, &par->saved_pll);
+ arch_phys_wc_del(par->wc_cookie);
-#ifdef CONFIG_MTRR
- if (par->mtrr_aper >= 0) {
- mtrr_del(par->mtrr_aper, 0, 0);
- par->mtrr_aper = -1;
- }
-#endif
return ret;
}
@@ -3478,7 +3463,8 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
aty_fudge_framebuffer_len(info);
- info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
+ info->screen_base = ioremap_wc(info->fix.smem_start,
+ info->fix.smem_len);
if (info->screen_base == NULL) {
ret = -ENOMEM;
goto atyfb_setup_generic_fail;
@@ -3652,7 +3638,8 @@ static int __init atyfb_atari_probe(void)
* Map the video memory (physical address given)
* to somewhere in the kernel address space.
*/
- info->screen_base = ioremap(phys_vmembase[m64_num], phys_size[m64_num]);
+ info->screen_base = ioremap_wc(phys_vmembase[m64_num],
+ phys_size[m64_num]);
info->fix.smem_start = (unsigned long)info->screen_base; /* Fake! */
par->ati_regbase = ioremap(phys_guiregbase[m64_num], 0x10000) +
0xFC00ul;
@@ -3719,12 +3706,8 @@ static void atyfb_remove(struct fb_info *info)
aty_bl_exit(info->bl_dev);
#endif
-#ifdef CONFIG_MTRR
- if (par->mtrr_aper >= 0) {
- mtrr_del(par->mtrr_aper, 0, 0);
- par->mtrr_aper = -1;
- }
-#endif
+ arch_phys_wc_del(par->wc_cookie);
+
#ifndef __sparc__
if (par->ati_regbase)
iounmap(par->ati_regbase);
@@ -3840,10 +3823,8 @@ static int __init atyfb_setup(char *options)
while ((this_opt = strsep(&options, ",")) != NULL) {
if (!strncmp(this_opt, "noaccel", 7)) {
noaccel = 1;
-#ifdef CONFIG_MTRR
} else if (!strncmp(this_opt, "nomtrr", 6)) {
nomtrr = 1;
-#endif
} else if (!strncmp(this_opt, "vram:", 5))
vram = simple_strtoul(this_opt + 5, NULL, 0);
else if (!strncmp(this_opt, "pll:", 4))
@@ -4013,7 +3994,5 @@ module_param(comp_sync, int, 0);
MODULE_PARM_DESC(comp_sync, "Set composite sync signal to low (0) or high (1)");
module_param(mode, charp, 0);
MODULE_PARM_DESC(mode, "Specify resolution as \"<xres>x<yres>[-<bpp>][@<refresh>]\" ");
-#ifdef CONFIG_MTRR
module_param(nomtrr, bool, 0);
MODULE_PARM_DESC(nomtrr, "bool: disable use of MTRR registers");
-#endif
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 11/47] IB/qib: add acounting for MTRR
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (9 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 10/47] video: fbdev: atyfb: use arch_phys_wc_add() and ioremap_wc() Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 12/47] IB/qib: use arch_phys_wc_add() Luis R. Rodriguez
` (36 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
There is no good reason not to do usage counting on the region; we
eventually delete it as well.
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/infiniband/hw/qib/qib_wc_x86_64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/qib/qib_wc_x86_64.c b/drivers/infiniband/hw/qib/qib_wc_x86_64.c
index 81b225f..fe0850a 100644
--- a/drivers/infiniband/hw/qib/qib_wc_x86_64.c
+++ b/drivers/infiniband/hw/qib/qib_wc_x86_64.c
@@ -118,7 +118,7 @@ int qib_enable_wc(struct qib_devdata *dd)
if (!ret) {
int cookie;
- cookie = mtrr_add(pioaddr, piolen, MTRR_TYPE_WRCOMB, 0);
+ cookie = mtrr_add(pioaddr, piolen, MTRR_TYPE_WRCOMB, 1);
if (cookie < 0) {
{
qib_devinfo(dd->pcidev,
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 12/47] IB/qib: use arch_phys_wc_add()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (10 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 11/47] IB/qib: add acounting for MTRR Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 13/47] IB/ipath: add counting for MTRR Luis R. Rodriguez
` (35 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Rickard Strandqvist, Dennis Dalessandro, Mike Marciniszyn,
Roland Dreier, Ingo Molnar, Daniel Vetter, Bjorn Helgaas,
Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Michael S. Tsirkin,
Stefan Bader, konrad.wilk, ville.syrjala, david.vrabel, jbeulich,
toshi.kani, Roger Pau Monné,
xen-devel
From: "Luis R. Rodriguez" <mcgrof@suse.com>
This driver already makes use of ioremap_wc() on PIO buffers,
so convert it to use arch_phys_wc_add().
Cc: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: venkatesh.pallipadi@intel.com
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xensource.com
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/infiniband/hw/qib/qib_wc_x86_64.c | 31 ++++---------------------------
1 file changed, 4 insertions(+), 27 deletions(-)
diff --git a/drivers/infiniband/hw/qib/qib_wc_x86_64.c b/drivers/infiniband/hw/qib/qib_wc_x86_64.c
index fe0850a..6d61ef9 100644
--- a/drivers/infiniband/hw/qib/qib_wc_x86_64.c
+++ b/drivers/infiniband/hw/qib/qib_wc_x86_64.c
@@ -116,21 +116,9 @@ int qib_enable_wc(struct qib_devdata *dd)
}
if (!ret) {
- int cookie;
-
- cookie = mtrr_add(pioaddr, piolen, MTRR_TYPE_WRCOMB, 1);
- if (cookie < 0) {
- {
- qib_devinfo(dd->pcidev,
- "mtrr_add() WC for PIO bufs failed (%d)\n",
- cookie);
- ret = -EINVAL;
- }
- } else {
- dd->wc_cookie = cookie;
- dd->wc_base = (unsigned long) pioaddr;
- dd->wc_len = (unsigned long) piolen;
- }
+ dd->wc_cookie = arch_phys_wc_add(pioaddr, piolen);
+ if (dd->wc_cookie < 0)
+ ret = -EINVAL;
}
return ret;
@@ -142,18 +130,7 @@ int qib_enable_wc(struct qib_devdata *dd)
*/
void qib_disable_wc(struct qib_devdata *dd)
{
- if (dd->wc_cookie) {
- int r;
-
- r = mtrr_del(dd->wc_cookie, dd->wc_base,
- dd->wc_len);
- if (r < 0)
- qib_devinfo(dd->pcidev,
- "mtrr_del(%lx, %lx, %lx) failed: %d\n",
- dd->wc_cookie, dd->wc_base,
- dd->wc_len, r);
- dd->wc_cookie = 0; /* even on failure */
- }
+ arch_phys_wc_del(dd->wc_cookie);
}
/**
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 13/47] IB/ipath: add counting for MTRR
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (11 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 12/47] IB/qib: use arch_phys_wc_add() Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 14/47] IB/ipath: use __arch_phys_wc_add() Luis R. Rodriguez
` (34 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
There is no good reason not to do usage counting on the region; we
eventually delete it as well.
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/infiniband/hw/ipath/ipath_wc_x86_64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
index 4ad0b93..70c1f3a 100644
--- a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
+++ b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
@@ -127,7 +127,7 @@ int ipath_enable_wc(struct ipath_devdata *dd)
"(addr %llx, len=0x%llx)\n",
(unsigned long long) pioaddr,
(unsigned long long) piolen);
- cookie = mtrr_add(pioaddr, piolen, MTRR_TYPE_WRCOMB, 0);
+ cookie = mtrr_add(pioaddr, piolen, MTRR_TYPE_WRCOMB, 1);
if (cookie < 0) {
{
dev_info(&dd->pcidev->dev,
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 14/47] IB/ipath: use __arch_phys_wc_add()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (12 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 13/47] IB/ipath: add counting for MTRR Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 15/47] [media] media: ivtv: " Luis R. Rodriguez
` (33 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Rickard Strandqvist, Mike Marciniszyn, Roland Dreier,
Ingo Molnar, Linus Torvalds, Daniel Vetter, Bjorn Helgaas,
Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Michael S. Tsirkin,
Stefan Bader, konrad.wilk, ville.syrjala, david.vrabel, jbeulich,
toshi.kani, Roger Pau Monné,
xen-devel
From: "Luis R. Rodriguez" <mcgrof@suse.com>
This driver sadly does not have the MMIO registers and the areas
where WC is desired (PIO buffers in this case) properly split up,
and addressing a split is considerable work. As such, this
driver requires using the __arch_phys_wc_add() call to
ensure write combining is enforced using MTRR on x86
even when PAT is available.
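The three-way handling of the cookie that the diff below introduces can be summarized with a small userspace sketch. The enum and function names are hypothetical; the mapping (positive means an MTRR was set, zero means MTRR is not supported, negative means setting it failed) is read off this patch and is specific to the __arch_phys_wc_add() proposed in this series.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical labels for the three outcomes handled in the patch. */
enum wc_result { WC_ENABLED, WC_UNSUPPORTED, WC_FAILED };

/* Classify a cookie the way ipath_enable_wc() does in this patch. */
static enum wc_result classify_wc_cookie(int cookie)
{
	if (cookie > 0)
		return WC_ENABLED;	/* MTRR was set for the PIO buffers */
	if (cookie == 0)
		return WC_UNSUPPORTED;	/* system does not support MTRR */
	return WC_FAILED;		/* setting the MTRR failed */
}

/* Only a real failure is turned into an error code. */
static int wc_cookie_to_ret(int cookie)
{
	return classify_wc_cookie(cookie) == WC_FAILED ? -EINVAL : 0;
}
```

Note that the caller in ipath_init_one() resets a non-zero return to 0 anyway, so even a failure only costs performance.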
Cc: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: venkatesh.pallipadi@intel.com
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xensource.com
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/infiniband/hw/ipath/ipath_driver.c | 7 ++--
drivers/infiniband/hw/ipath/ipath_kernel.h | 4 +--
drivers/infiniband/hw/ipath/ipath_wc_x86_64.c | 47 ++++++++++-----------------
3 files changed, 20 insertions(+), 38 deletions(-)
diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
index bd0caed..464f39c 100644
--- a/drivers/infiniband/hw/ipath/ipath_driver.c
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -542,6 +542,7 @@ static int ipath_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
dd->ipath_kregbase = __ioremap(addr, len,
(_PAGE_NO_CACHE|_PAGE_WRITETHRU));
#else
+ /* XXX: split pio on a separate ioremap_wc() */
dd->ipath_kregbase = ioremap_nocache(addr, len);
#endif
@@ -587,12 +588,8 @@ static int ipath_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
ret = ipath_enable_wc(dd);
- if (ret) {
- ipath_dev_err(dd, "Write combining not enabled "
- "(err %d): performance may be poor\n",
- -ret);
+ if (ret)
ret = 0;
- }
ipath_verify_pioperf(dd);
diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
index e08db70..f0f9471 100644
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h
@@ -463,9 +463,7 @@ struct ipath_devdata {
/* offset in HT config space of slave/primary interface block */
u8 ipath_ht_slave_off;
/* for write combining settings */
- unsigned long ipath_wc_cookie;
- unsigned long ipath_wc_base;
- unsigned long ipath_wc_len;
+ int wc_cookie;
/* ref count for each pkey */
atomic_t ipath_pkeyrefs[4];
/* shadow copy of struct page *'s for exp tid pages */
diff --git a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
index 70c1f3a..88709c1 100644
--- a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
+++ b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
@@ -37,7 +37,6 @@
*/
#include <linux/pci.h>
-#include <asm/mtrr.h>
#include <asm/processor.h>
#include "ipath_kernel.h"
@@ -122,27 +121,26 @@ int ipath_enable_wc(struct ipath_devdata *dd)
}
if (!ret) {
- int cookie;
ipath_cdbg(VERBOSE, "Setting mtrr for chip to WC "
"(addr %llx, len=0x%llx)\n",
(unsigned long long) pioaddr,
(unsigned long long) piolen);
- cookie = mtrr_add(pioaddr, piolen, MTRR_TYPE_WRCOMB, 1);
- if (cookie < 0) {
- {
- dev_info(&dd->pcidev->dev,
- "mtrr_add() WC for PIO bufs "
- "failed (%d)\n",
- cookie);
- ret = -EINVAL;
- }
- } else {
- ipath_cdbg(VERBOSE, "Set mtrr for chip to WC, "
- "cookie is %d\n", cookie);
- dd->ipath_wc_cookie = cookie;
- dd->ipath_wc_base = (unsigned long) pioaddr;
- dd->ipath_wc_len = (unsigned long) piolen;
- }
+ dd->wc_cookie = __arch_phys_wc_add(pioaddr, piolen);
+ if (dd->wc_cookie <= 0) {
+ /*
+ * If MTRR is not available on an architecture
+ * or if it could not be enabled at run time
+ * folks who care should work towards the
+ * ioremap_wc() split.
+ */
+ if (!dd->wc_cookie)
+ ipath_dev_err(dd, "System does not support MTRR\n");
+ else {
+ ipath_dev_err(dd, "Setting mtrr failed on PIO buffers\n");
+ ret = -EINVAL;
+ }
+ } else
+ ipath_cdbg(VERBOSE, "Set mtrr for chip to WC\n");
}
return ret;
@@ -154,16 +152,5 @@ int ipath_enable_wc(struct ipath_devdata *dd)
*/
void ipath_disable_wc(struct ipath_devdata *dd)
{
- if (dd->ipath_wc_cookie) {
- int r;
- ipath_cdbg(VERBOSE, "undoing WCCOMB on pio buffers\n");
- r = mtrr_del(dd->ipath_wc_cookie, dd->ipath_wc_base,
- dd->ipath_wc_len);
- if (r < 0)
- dev_info(&dd->pcidev->dev,
- "mtrr_del(%lx, %lx, %lx) failed: %d\n",
- dd->ipath_wc_cookie, dd->ipath_wc_base,
- dd->ipath_wc_len, r);
- dd->ipath_wc_cookie = 0; /* even on failure */
- }
+ arch_phys_wc_del(dd->wc_cookie);
}
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 15/47] [media] media: ivtv: use __arch_phys_wc_add()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (13 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 14/47] IB/ipath: use __arch_phys_wc_add() Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 16/47] fusion: " Luis R. Rodriguez
` (32 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Andy Walls, Ingo Molnar, Daniel Vetter, Bjorn Helgaas,
Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Michael S. Tsirkin,
Stefan Bader, konrad.wilk, ville.syrjala, david.vrabel, jbeulich,
toshi.kani, Roger Pau Monné,
ivtv-devel, linux-media, xen-devel
From: "Luis R. Rodriguez" <mcgrof@suse.com>
Sadly this driver requires a bit of work in order
to use ioremap_wc() on the range currently used
for MTRR write-combining. We'd need to ensure two
ioremap() calls are done. Annotate this.
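The alignment arithmetic this driver performs before requesting write-combining can be tried out in userspace with the sketch below. It follows the size_shift loop in the diff, but the struct and function names are hypothetical, and fixed-width unsigned types are used here so that the shifts are well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the aligned range. */
struct wc_span {
	uint64_t start;
	uint64_t end;
};

/*
 * Find the highest set bit of buf_size, go one bit above it, and
 * round the start down and the end up to that block size.
 */
static struct wc_span ivtv_style_align(uint64_t pbase, uint32_t buf_size)
{
	struct wc_span span = { pbase, pbase };
	int size_shift = 31;
	uint64_t block;

	if (buf_size == 0)
		return span;	/* nothing to cover; avoids a negative shift */

	while (!(buf_size & (UINT32_C(1) << size_shift)))
		size_shift--;
	size_shift++;

	block = UINT64_C(1) << size_shift;
	span.start = pbase & ~(block - 1);
	span.end = (pbase + buf_size + block - 1) & ~(block - 1);
	return span;
}
```

For a buffer of 0x1A9000 bytes at 0xE1510000 the block size is 2MB, and the span grows to 0xE1400000..0xE1800000, so the MTRR covers more than the buffer itself.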
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: venkatesh.pallipadi@intel.com
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: ivtv-devel@ivtvdriver.org
Cc: linux-media@vger.kernel.org
Cc: xen-devel@lists.xensource.com
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/media/pci/ivtv/ivtvfb.c | 51 +++++++++++------------------------------
1 file changed, 14 insertions(+), 37 deletions(-)
diff --git a/drivers/media/pci/ivtv/ivtvfb.c b/drivers/media/pci/ivtv/ivtvfb.c
index 9ff1230..ceefa6f 100644
--- a/drivers/media/pci/ivtv/ivtvfb.c
+++ b/drivers/media/pci/ivtv/ivtvfb.c
@@ -44,10 +44,6 @@
#include <linux/ivtvfb.h>
#include <linux/slab.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
#include "ivtv-driver.h"
#include "ivtv-cards.h"
#include "ivtv-i2c.h"
@@ -155,12 +151,11 @@ struct osd_info {
/* Buffer size */
u32 video_buffer_size;
-#ifdef CONFIG_MTRR
/* video_base rounded down as required by hardware MTRRs */
unsigned long fb_start_aligned_physaddr;
/* video_base rounded up as required by hardware MTRRs */
unsigned long fb_end_aligned_physaddr;
-#endif
+ int wc_cookie;
/* Store the buffer offset */
int set_osd_coords_x;
@@ -1099,6 +1094,8 @@ static int ivtvfb_init_vidmode(struct ivtv *itv)
static int ivtvfb_init_io(struct ivtv *itv)
{
struct osd_info *oi = itv->osd_info;
+ /* Find the largest power of two that maps the whole buffer */
+ int size_shift = 31;
mutex_lock(&itv->serialize_lock);
if (ivtv_init_on_first_open(itv)) {
@@ -1132,29 +1129,16 @@ static int ivtvfb_init_io(struct ivtv *itv)
oi->video_pbase, oi->video_vbase,
oi->video_buffer_size / 1024);
-#ifdef CONFIG_MTRR
- {
- /* Find the largest power of two that maps the whole buffer */
- int size_shift = 31;
-
- while (!(oi->video_buffer_size & (1 << size_shift))) {
- size_shift--;
- }
- size_shift++;
- oi->fb_start_aligned_physaddr = oi->video_pbase & ~((1 << size_shift) - 1);
- oi->fb_end_aligned_physaddr = oi->video_pbase + oi->video_buffer_size;
- oi->fb_end_aligned_physaddr += (1 << size_shift) - 1;
- oi->fb_end_aligned_physaddr &= ~((1 << size_shift) - 1);
- if (mtrr_add(oi->fb_start_aligned_physaddr,
- oi->fb_end_aligned_physaddr - oi->fb_start_aligned_physaddr,
- MTRR_TYPE_WRCOMB, 1) < 0) {
- IVTVFB_INFO("disabled mttr\n");
- oi->fb_start_aligned_physaddr = 0;
- oi->fb_end_aligned_physaddr = 0;
- }
- }
-#endif
-
+ while (!(oi->video_buffer_size & (1 << size_shift)))
+ size_shift--;
+ size_shift++;
+ oi->fb_start_aligned_physaddr = oi->video_pbase & ~((1 << size_shift) - 1);
+ oi->fb_end_aligned_physaddr = oi->video_pbase + oi->video_buffer_size;
+ oi->fb_end_aligned_physaddr += (1 << size_shift) - 1;
+ oi->fb_end_aligned_physaddr &= ~((1 << size_shift) - 1);
+ oi->wc_cookie = __arch_phys_wc_add(oi->fb_start_aligned_physaddr,
+ oi->fb_end_aligned_physaddr -
+ oi->fb_start_aligned_physaddr);
/* Blank the entire osd. */
memset_io(oi->video_vbase, 0, oi->video_buffer_size);
@@ -1172,14 +1156,7 @@ static void ivtvfb_release_buffers (struct ivtv *itv)
/* Release pseudo palette */
kfree(oi->ivtvfb_info.pseudo_palette);
-
-#ifdef CONFIG_MTRR
- if (oi->fb_end_aligned_physaddr) {
- mtrr_del(-1, oi->fb_start_aligned_physaddr,
- oi->fb_end_aligned_physaddr - oi->fb_start_aligned_physaddr);
- }
-#endif
-
+ arch_phys_wc_del(oi->wc_cookie);
kfree(oi);
itv->osd_info = NULL;
}
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 16/47] fusion: use __arch_phys_wc_add()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (14 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 15/47] [media] media: ivtv: " Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 17/47] video: fbdev: vesafb: only support MTRR_TYPE_WRCOMB Luis R. Rodriguez
` (31 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Nagalakshmi Nandigama,
Praveen Krishnamoorthy, Sreekanth Reddy, Abhijit Mahajan,
Antonino Daplas, Tomi Valkeinen,
Jean-Christophe Plagniol-Villard, MPT-FusionLinux.pdl,
linux-scsi
From: "Luis R. Rodriguez" <mcgrof@suse.com>
If and when this gets enabled, the driver should also use
ioremap_wc() on the same area. That could require a bit of
work, as it would mean a split into two ioremap'd
areas. Annotate this.
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Nagalakshmi Nandigama <nagalakshmi.nandigama@avagotech.com>
Cc: Praveen Krishnamoorthy <praveen.krishnamoorthy@avagotech.com>
Cc: Sreekanth Reddy <sreekanth.reddy@avagotech.com>
Cc: Abhijit Mahajan <abhijit.mahajan@avagotech.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: MPT-FusionLinux.pdl@avagotech.com
Cc: linux-scsi@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/message/fusion/mptbase.c | 19 ++++---------------
drivers/message/fusion/mptbase.h | 2 +-
2 files changed, 5 insertions(+), 16 deletions(-)
diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c
index 187f836..c7b1a55 100644
--- a/drivers/message/fusion/mptbase.c
+++ b/drivers/message/fusion/mptbase.c
@@ -59,10 +59,6 @@
#include <linux/delay.h>
#include <linux/interrupt.h> /* needed for in_interrupt() proto */
#include <linux/dma-mapping.h>
-#include <asm/io.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
#include <linux/kthread.h>
#include <scsi/scsi_host.h>
@@ -2820,11 +2816,8 @@ mpt_adapter_dispose(MPT_ADAPTER *ioc)
pci_disable_device(ioc->pcidev);
pci_release_selected_regions(ioc->pcidev, ioc->bars);
-#if defined(CONFIG_MTRR) && 0
- if (ioc->mtrr_reg > 0) {
- mtrr_del(ioc->mtrr_reg, 0, 0);
- dprintk(ioc, printk(MYIOC_s_INFO_FMT "MTRR region de-registered\n", ioc->name));
- }
+#if 0
+ arch_phys_wc_del(ioc->wc_cookie);
#endif
/* Zap the adapter lookup ptr! */
@@ -4512,17 +4505,13 @@ PrimeIocFifos(MPT_ADAPTER *ioc)
ioc->req_frames_low_dma = (u32) (alloc_dma & 0xFFFFFFFF);
-#if defined(CONFIG_MTRR) && 0
+#if 0
/*
* Enable Write Combining MTRR for IOC's memory region.
* (at least as much as we can; "size and base must be
* multiples of 4 kiB"
*/
- ioc->mtrr_reg = mtrr_add(ioc->req_frames_dma,
- sz,
- MTRR_TYPE_WRCOMB, 1);
- dprintk(ioc, printk(MYIOC_s_DEBUG_FMT "MTRR region registered (base:size=%08x:%x)\n",
- ioc->name, ioc->req_frames_dma, sz));
+ ioc->wc_cookie = arch_phys_wc_add(ioc->req_frames_dma, sz);
#endif
for (i = 0; i < ioc->req_depth; i++) {
diff --git a/drivers/message/fusion/mptbase.h b/drivers/message/fusion/mptbase.h
index 8f14090..f0bff11 100644
--- a/drivers/message/fusion/mptbase.h
+++ b/drivers/message/fusion/mptbase.h
@@ -671,7 +671,7 @@ typedef struct _MPT_ADAPTER
u8 *HostPageBuffer; /* SAS - host page buffer support */
u32 HostPageBuffer_sz;
dma_addr_t HostPageBuffer_dma;
- int mtrr_reg;
+ int wc_cookie;
struct pci_dev *pcidev; /* struct pci_dev pointer */
int bars; /* bitmask of BAR's that must be configured */
int msi_enable;
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 17/47] video: fbdev: vesafb: only support MTRR_TYPE_WRCOMB
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (15 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 16/47] fusion: " Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 18/47] vidoe: fbdev: vesafb: add missing mtrr_del() for added MTRR Luis R. Rodriguez
` (30 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Linus Torvalds, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
No other video driver uses any MTRR type other than MTRR_TYPE_WRCOMB;
the other MTRR types were implemented and supported here for no
good reason. The ioremap() APIs are architecture-agnostic, and
at least on x86 PAT is a newer design that extends MTRRs and
can replace them in a much cleaner way: so long as the
proper ioremap_wc() or variant API is used, the right thing will
be done behind the scenes. This is the only driver left using the
other MTRR types, and since there is no good reason for it now,
rip them out.
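The retry loop that this driver keeps for MTRR_TYPE_WRCOMB can be modeled in userspace as below. This is a sketch only: model_mtrr_add() is a hypothetical stub that rejects a range when the base is not aligned to the size, which is an assumption about why mtrr_add() would return -EINVAL, and the other names are likewise made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u

/* Smallest power of two >= n (stands in for roundup_pow_of_two()). */
static uint64_t model_roundup_pow2(uint64_t n)
{
	uint64_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Hypothetical stub: rejects ranges whose base is not size-aligned. */
static int model_mtrr_add(uint64_t base, uint64_t size)
{
	return (base & (size - 1)) ? -EINVAL : 0;
}

/* Returns the size actually covered, or 0 if nothing could be added. */
static uint64_t model_vesafb_wc_size(uint64_t base, uint64_t size_total)
{
	uint64_t temp_size = model_roundup_pow2(size_total);
	uint64_t tried;
	int rc;

	/* Halve the size until the add succeeds or we drop below a page. */
	do {
		tried = temp_size;
		rc = model_mtrr_add(base, tried);
		temp_size >>= 1;
	} while (temp_size >= MODEL_PAGE_SIZE && rc == -EINVAL);

	return rc == 0 ? tried : 0;
}
```

With a framebuffer of almost 8MB at a base that is only 4MB-aligned, the model settles on a 4MB range, so only the first half of the framebuffer ends up write-combined.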
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/vesafb.c | 62 ++++++++++++--------------------------------
1 file changed, 16 insertions(+), 46 deletions(-)
diff --git a/drivers/video/fbdev/vesafb.c b/drivers/video/fbdev/vesafb.c
index d79a0ac..191156b 100644
--- a/drivers/video/fbdev/vesafb.c
+++ b/drivers/video/fbdev/vesafb.c
@@ -404,60 +404,30 @@ static int vesafb_probe(struct platform_device *dev)
* region already (FIXME) */
request_region(0x3c0, 32, "vesafb");
+ if (mtrr == 3) {
#ifdef CONFIG_MTRR
- if (mtrr) {
unsigned int temp_size = size_total;
- unsigned int type = 0;
+ int rc;
- switch (mtrr) {
- case 1:
- type = MTRR_TYPE_UNCACHABLE;
- break;
- case 2:
- type = MTRR_TYPE_WRBACK;
- break;
- case 3:
- type = MTRR_TYPE_WRCOMB;
- break;
- case 4:
- type = MTRR_TYPE_WRTHROUGH;
- break;
- default:
- type = 0;
- break;
- }
-
- if (type) {
- int rc;
-
- /* Find the largest power-of-two */
- temp_size = roundup_pow_of_two(temp_size);
+ /* Find the largest power-of-two */
+ temp_size = roundup_pow_of_two(temp_size);
- /* Try and find a power of two to add */
- do {
- rc = mtrr_add(vesafb_fix.smem_start, temp_size,
- type, 1);
- temp_size >>= 1;
- } while (temp_size >= PAGE_SIZE && rc == -EINVAL);
- }
- }
+ /* Try and find a power of two to add */
+ do {
+ rc = mtrr_add(vesafb_fix.smem_start, temp_size,
+ MTRR_TYPE_WRCOMB, 1);
+ temp_size >>= 1;
+ } while (temp_size >= PAGE_SIZE && rc == -EINVAL);
#endif
-
- switch (mtrr) {
- case 1: /* uncachable */
- info->screen_base = ioremap_nocache(vesafb_fix.smem_start, vesafb_fix.smem_len);
- break;
- case 2: /* write-back */
- info->screen_base = ioremap_cache(vesafb_fix.smem_start, vesafb_fix.smem_len);
- break;
- case 3: /* write-combining */
info->screen_base = ioremap_wc(vesafb_fix.smem_start, vesafb_fix.smem_len);
- break;
- case 4: /* write-through */
- default:
+ } else {
+#ifdef CONFIG_MTRR
+ if (mtrr && mtrr != 3)
+ WARN_ONCE(1, "Only MTRR_TYPE_WRCOMB (3) make sense\n");
+#endif
info->screen_base = ioremap(vesafb_fix.smem_start, vesafb_fix.smem_len);
- break;
}
+
if (!info->screen_base) {
printk(KERN_ERR
"vesafb: abort, cannot ioremap video memory 0x%x @ 0x%lx\n",
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 18/47] vidoe: fbdev: vesafb: add missing mtrr_del() for added MTRR
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (16 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 17/47] video: fbdev: vesafb: only support MTRR_TYPE_WRCOMB Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 19/47] video: fbdev: vesafb: use arch_phys_wc_add() Luis R. Rodriguez
` (29 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
The MTRR added was never being deleted.
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/vesafb.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)
diff --git a/drivers/video/fbdev/vesafb.c b/drivers/video/fbdev/vesafb.c
index 191156b..a2261d0 100644
--- a/drivers/video/fbdev/vesafb.c
+++ b/drivers/video/fbdev/vesafb.c
@@ -29,6 +29,10 @@
/* --------------------------------------------------------------------- */
+struct vesafb_par {
+ int wc_cookie;
+};
+
static struct fb_var_screeninfo vesafb_defined = {
.activate = FB_ACTIVATE_NOW,
.height = -1,
@@ -175,7 +179,16 @@ static int vesafb_setcolreg(unsigned regno, unsigned red, unsigned green,
static void vesafb_destroy(struct fb_info *info)
{
+#ifdef CONFIG_MTRR
+ struct vesafb_par *par = info->par;
+#endif
+
fb_dealloc_cmap(&info->cmap);
+
+#ifdef CONFIG_MTRR
+ if (par->wc_cookie >= 0)
+ mtrr_del(par->wc_cookie, 0, 0);
+#endif
if (info->screen_base)
iounmap(info->screen_base);
release_mem_region(info->apertures->ranges[0].base, info->apertures->ranges[0].size);
@@ -228,6 +241,7 @@ static int vesafb_setup(char *options)
static int vesafb_probe(struct platform_device *dev)
{
struct fb_info *info;
+ struct vesafb_par *par;
int i, err;
unsigned int size_vmode;
unsigned int size_remap;
@@ -297,8 +311,8 @@ static int vesafb_probe(struct platform_device *dev)
return -ENOMEM;
}
platform_set_drvdata(dev, info);
- info->pseudo_palette = info->par;
- info->par = NULL;
+ info->pseudo_palette = NULL;
+ par = info->par;
/* set vesafb aperture size for generic probing */
info->apertures = alloc_apertures(1);
@@ -407,17 +421,17 @@ static int vesafb_probe(struct platform_device *dev)
if (mtrr == 3) {
#ifdef CONFIG_MTRR
unsigned int temp_size = size_total;
- int rc;
/* Find the largest power-of-two */
temp_size = roundup_pow_of_two(temp_size);
/* Try and find a power of two to add */
do {
- rc = mtrr_add(vesafb_fix.smem_start, temp_size,
- MTRR_TYPE_WRCOMB, 1);
+ par->wc_cookie = mtrr_add(vesafb_fix.smem_start,
+ temp_size,
+ MTRR_TYPE_WRCOMB, 1);
temp_size >>= 1;
- } while (temp_size >= PAGE_SIZE && rc == -EINVAL);
+ } while (temp_size >= PAGE_SIZE && par->wc_cookie == -EINVAL);
#endif
info->screen_base = ioremap_wc(vesafb_fix.smem_start, vesafb_fix.smem_len);
} else {
@@ -462,6 +476,10 @@ static int vesafb_probe(struct platform_device *dev)
fb_info(info, "%s frame buffer device\n", info->fix.id);
return 0;
err:
+#ifdef CONFIG_MTRR
+ if (par->wc_cookie >= 0)
+ mtrr_del(par->wc_cookie, 0, 0);
+#endif
if (info->screen_base)
iounmap(info->screen_base);
framebuffer_release(info);
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 19/47] video: fbdev: vesafb: use arch_phys_wc_add()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (17 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 18/47] vidoe: fbdev: vesafb: add missing mtrr_del() for added MTRR Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 20/47] mtrr: avoid ifdef'ery with phys_wc_to_mtrr_index() Luis R. Rodriguez
` (28 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
This driver uses the same area for the MTRR as for ioremap_wc(); the
only difference is that it falls back to a smaller size if the MTRR
reservation fails. The ioremap_wc() API is already used to take
advantage of architecture write-combining when available.
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available.
There are a few motivations for this:
a) Take advantage of PAT when available
b) Help bury MTRR code away; MTRR is architecture specific and on
x86 it is replaced by PAT
c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
The conversion is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
remove the #ifdef'ery and the redundant code that
arch_phys_wc_add() already takes care of, such as the verbose
messages printed when adding the MTRR fails and the checks that
skip deletion when no MTRR was obtained.
@ mtrr_found @
expression index, base, size;
@@
-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);
@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@
-mtrr_del(index, base, size);
+arch_phys_wc_del(index);
@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@
-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);
@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@
-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);
@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);
@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/vesafb.c | 29 ++++++++---------------------
1 file changed, 8 insertions(+), 21 deletions(-)
diff --git a/drivers/video/fbdev/vesafb.c b/drivers/video/fbdev/vesafb.c
index a2261d0..5bc94d3 100644
--- a/drivers/video/fbdev/vesafb.c
+++ b/drivers/video/fbdev/vesafb.c
@@ -19,10 +19,9 @@
#include <linux/init.h>
#include <linux/platform_device.h>
#include <linux/screen_info.h>
+#include <linux/io.h>
#include <video/vga.h>
-#include <asm/io.h>
-#include <asm/mtrr.h>
#define dac_reg (0x3c8)
#define dac_val (0x3c9)
@@ -179,16 +178,10 @@ static int vesafb_setcolreg(unsigned regno, unsigned red, unsigned green,
static void vesafb_destroy(struct fb_info *info)
{
-#ifdef CONFIG_MTRR
struct vesafb_par *par = info->par;
-#endif
fb_dealloc_cmap(&info->cmap);
-
-#ifdef CONFIG_MTRR
- if (par->wc_cookie >= 0)
- mtrr_del(par->wc_cookie, 0, 0);
-#endif
+ arch_phys_wc_del(par->wc_cookie);
if (info->screen_base)
iounmap(info->screen_base);
release_mem_region(info->apertures->ranges[0].base, info->apertures->ranges[0].size);
@@ -419,7 +412,6 @@ static int vesafb_probe(struct platform_device *dev)
request_region(0x3c0, 32, "vesafb");
if (mtrr == 3) {
-#ifdef CONFIG_MTRR
unsigned int temp_size = size_total;
/* Find the largest power-of-two */
@@ -427,18 +419,16 @@ static int vesafb_probe(struct platform_device *dev)
/* Try and find a power of two to add */
do {
- par->wc_cookie = mtrr_add(vesafb_fix.smem_start,
- temp_size,
- MTRR_TYPE_WRCOMB, 1);
+ par->wc_cookie =
+ arch_phys_wc_add(vesafb_fix.smem_start,
+ temp_size);
temp_size >>= 1;
- } while (temp_size >= PAGE_SIZE && par->wc_cookie == -EINVAL);
-#endif
+ } while (temp_size >= PAGE_SIZE && par->wc_cookie < 0);
+
info->screen_base = ioremap_wc(vesafb_fix.smem_start, vesafb_fix.smem_len);
} else {
-#ifdef CONFIG_MTRR
if (mtrr && mtrr != 3)
WARN_ONCE(1, "Only MTRR_TYPE_WRCOMB (3) make sense\n");
-#endif
info->screen_base = ioremap(vesafb_fix.smem_start, vesafb_fix.smem_len);
}
@@ -476,10 +466,7 @@ static int vesafb_probe(struct platform_device *dev)
fb_info(info, "%s frame buffer device\n", info->fix.id);
return 0;
err:
-#ifdef CONFIG_MTRR
- if (par->wc_cookie >= 0)
- mtrr_del(par->wc_cookie, 0, 0);
-#endif
+ arch_phys_wc_del(par->wc_cookie);
if (info->screen_base)
iounmap(info->screen_base);
framebuffer_release(info);
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 20/47] mtrr: avoid ifdef'ery with phys_wc_to_mtrr_index()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (18 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 19/47] video: fbdev: vesafb: use arch_phys_wc_add() Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 21/47] ethernet: myri10ge: use arch_phys_wc_add() Luis R. Rodriguez
` (27 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Linus Torvalds, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
There is only one user, but since we are next going to bury
MTRR out of drivers' reach, expose this last piece of the
API to drivers in a general fashion, so that only io.h is
needed for access to the helpers.
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
arch/x86/include/asm/io.h | 2 ++
arch/x86/include/asm/mtrr.h | 5 -----
arch/x86/kernel/cpu/mtrr/main.c | 6 +++---
drivers/gpu/drm/drm_ioctl.c | 14 +-------------
include/linux/io.h | 6 ++++++
5 files changed, 12 insertions(+), 21 deletions(-)
diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index a144d05..5e3f1f2 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -346,6 +346,8 @@ extern int __must_check arch_phys_wc_add(unsigned long base,
unsigned long size);
extern void arch_phys_wc_del(int handle);
#define arch_phys_wc_add arch_phys_wc_add
+extern int arch_phys_wc_index(int handle);
+#define arch_phys_wc_index arch_phys_wc_index
#endif
#endif /* _ASM_X86_IO_H */
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index cade917..380bb4b 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -49,7 +49,6 @@ extern void mtrr_aps_init(void);
extern void mtrr_bp_restore(void);
extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
extern int amd_special_default_mtrr(void);
-extern int phys_wc_to_mtrr_index(int handle);
# else
static const int mtrr_enabled;
static inline u8 mtrr_type_lookup(u64 addr, u64 end)
@@ -86,10 +85,6 @@ static inline int mtrr_trim_uncached_memory(unsigned long end_pfn)
static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
{
}
-static inline int phys_wc_to_mtrr_index(int handle)
-{
- return -1;
-}
#define mtrr_ap_init() do {} while (0)
#define mtrr_bp_init() do {} while (0)
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 5ae830b..b68b671 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -607,7 +607,7 @@ void arch_phys_wc_del(int handle)
EXPORT_SYMBOL(arch_phys_wc_del);
/*
- * phys_wc_to_mtrr_index - translates arch_phys_wc_add's return value
+ * arch_phys_wc_index - translates arch_phys_wc_add's return value
* @handle: Return value from arch_phys_wc_add
*
* This will turn the return value from arch_phys_wc_add into an mtrr
@@ -617,14 +617,14 @@ EXPORT_SYMBOL(arch_phys_wc_del);
* in printk line. Alas there is an illegitimate use in some ancient
* drm ioctls.
*/
-int phys_wc_to_mtrr_index(int handle)
+int arch_phys_wc_index(int handle)
{
if (handle < MTRR_TO_PHYS_WC_OFFSET)
return -1;
else
return handle - MTRR_TO_PHYS_WC_OFFSET;
}
-EXPORT_SYMBOL_GPL(phys_wc_to_mtrr_index);
+EXPORT_SYMBOL_GPL(arch_phys_wc_index);
/*
* HACK ALERT!
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index a6d773a..e597cdd 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -36,9 +36,6 @@
#include <linux/pci.h>
#include <linux/export.h>
-#ifdef CONFIG_X86
-#include <asm/mtrr.h>
-#endif
static int drm_version(struct drm_device *dev, void *data,
struct drm_file *file_priv);
@@ -197,16 +194,7 @@ static int drm_getmap(struct drm_device *dev, void *data,
map->type = r_list->map->type;
map->flags = r_list->map->flags;
map->handle = (void *)(unsigned long) r_list->user_token;
-
-#ifdef CONFIG_X86
- /*
- * There appears to be exactly one user of the mtrr index: dritest.
- * It's easy enough to keep it working on non-PAT systems.
- */
- map->mtrr = phys_wc_to_mtrr_index(r_list->map->mtrr);
-#else
- map->mtrr = -1;
-#endif
+ map->mtrr = arch_phys_wc_index(r_list->map->mtrr);
mutex_unlock(&dev->struct_mutex);
diff --git a/include/linux/io.h b/include/linux/io.h
index ecc51c3..1676437 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -115,6 +115,12 @@ static inline void arch_phys_wc_del(int handle)
#define __arch_phys_wc_add arch_phys_wc_add
#endif
+#ifndef arch_phys_wc_index
+static inline int arch_phys_wc_index(int handle)
+{
+ return -1;
+}
+#define arch_phys_wc_index arch_phys_wc_index
#endif
#endif /* _LINUX_IO_H */
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 21/47] ethernet: myri10ge: use arch_phys_wc_add()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (19 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 20/47] mtrr: avoid ifdef'ery with phys_wc_to_mtrr_index() Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-21 7:08 ` Hyong-Youb Kim
2015-03-20 23:18 ` [PATCH v1 22/47] staging: sm750fb: use arch_phys_wc_add() and ioremap_wc() Luis R. Rodriguez
` (26 subsequent siblings)
47 siblings, 1 reply; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Hyong-Youb Kim, netdev,
Antonino Daplas, Jean-Christophe Plagniol-Villard,
Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
This driver already uses ioremap_wc() on the same range
so when write-combining is available that will be used
instead.
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Hyong-Youb Kim <hykim@myri.com>
Cc: netdev@vger.kernel.org
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 36 ++++++------------------
1 file changed, 8 insertions(+), 28 deletions(-)
diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
index 1412f5a..01e4069 100644
--- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
+++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
@@ -69,11 +69,7 @@
#include <net/ip.h>
#include <net/tcp.h>
#include <asm/byteorder.h>
-#include <asm/io.h>
#include <asm/processor.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
#include <net/busy_poll.h>
#include "myri10ge_mcp.h"
@@ -242,8 +238,7 @@ struct myri10ge_priv {
unsigned int rdma_tags_available;
int intr_coal_delay;
__be32 __iomem *intr_coal_delay_ptr;
- int mtrr;
- int wc_enabled;
+ int wc_cookie;
int down_cnt;
wait_queue_head_t down_wq;
struct work_struct watchdog_work;
@@ -1984,7 +1979,6 @@ myri10ge_get_ethtool_stats(struct net_device *netdev,
data[i] = ((u64 *)&link_stats)[i];
data[i++] = (unsigned int)mgp->tx_boundary;
- data[i++] = (unsigned int)mgp->wc_enabled;
data[i++] = (unsigned int)mgp->pdev->irq;
data[i++] = (unsigned int)mgp->msi_enabled;
data[i++] = (unsigned int)mgp->msix_enabled;
@@ -4040,14 +4034,7 @@ static int myri10ge_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
mgp->board_span = pci_resource_len(pdev, 0);
mgp->iomem_base = pci_resource_start(pdev, 0);
- mgp->mtrr = -1;
- mgp->wc_enabled = 0;
-#ifdef CONFIG_MTRR
- mgp->mtrr = mtrr_add(mgp->iomem_base, mgp->board_span,
- MTRR_TYPE_WRCOMB, 1);
- if (mgp->mtrr >= 0)
- mgp->wc_enabled = 1;
-#endif
+ mgp->wc_cookie = arch_phys_wc_add(mgp->iomem_base, mgp->board_span);
mgp->sram = ioremap_wc(mgp->iomem_base, mgp->board_span);
if (mgp->sram == NULL) {
dev_err(&pdev->dev, "ioremap failed for %ld bytes at 0x%lx\n",
@@ -4146,14 +4133,14 @@ static int myri10ge_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
goto abort_with_state;
}
if (mgp->msix_enabled)
- dev_info(dev, "%d MSI-X IRQs, tx bndry %d, fw %s, WC %s\n",
+ dev_info(dev, "%d MSI-X IRQs, tx bndry %d, fw %s, MTRR %s, WC Enabled\n",
mgp->num_slices, mgp->tx_boundary, mgp->fw_name,
- (mgp->wc_enabled ? "Enabled" : "Disabled"));
+ (mgp->wc_cookie > 0 ? "Enabled" : "Disabled"));
else
- dev_info(dev, "%s IRQ %d, tx bndry %d, fw %s, WC %s\n",
+ dev_info(dev, "%s IRQ %d, tx bndry %d, fw %s, MTRR %s, WC Enabled\n",
mgp->msi_enabled ? "MSI" : "xPIC",
pdev->irq, mgp->tx_boundary, mgp->fw_name,
- (mgp->wc_enabled ? "Enabled" : "Disabled"));
+ (mgp->wc_cookie > 0 ? "Enabled" : "Disabled"));
board_number++;
return 0;
@@ -4175,10 +4162,7 @@ abort_with_ioremap:
iounmap(mgp->sram);
abort_with_mtrr:
-#ifdef CONFIG_MTRR
- if (mgp->mtrr >= 0)
- mtrr_del(mgp->mtrr, mgp->iomem_base, mgp->board_span);
-#endif
+ arch_phys_wc_del(mgp->wc_cookie);
dma_free_coherent(&pdev->dev, sizeof(*mgp->cmd),
mgp->cmd, mgp->cmd_bus);
@@ -4220,11 +4204,7 @@ static void myri10ge_remove(struct pci_dev *pdev)
pci_restore_state(pdev);
iounmap(mgp->sram);
-
-#ifdef CONFIG_MTRR
- if (mgp->mtrr >= 0)
- mtrr_del(mgp->mtrr, mgp->iomem_base, mgp->board_span);
-#endif
+ arch_phys_wc_del(mgp->wc_cookie);
myri10ge_free_slices(mgp);
kfree(mgp->msix_vectors);
dma_free_coherent(&pdev->dev, sizeof(*mgp->cmd),
--
2.3.2.209.gd67f9d5.dirty
* Re: [PATCH v1 21/47] ethernet: myri10ge: use arch_phys_wc_add()
2015-03-20 23:18 ` [PATCH v1 21/47] ethernet: myri10ge: use arch_phys_wc_add() Luis R. Rodriguez
@ 2015-03-21 7:08 ` Hyong-Youb Kim
2015-03-27 20:36 ` Luis R. Rodriguez
0 siblings, 1 reply; 400+ messages in thread
From: Hyong-Youb Kim @ 2015-03-21 7:08 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied, linux-kernel, linux-fbdev, x86,
xen-devel, Luis R. Rodriguez, Ingo Molnar, Daniel Vetter,
Hyong-Youb Kim, netdev, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
On Fri, Mar 20, 2015 at 04:18:11PM -0700, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> This driver already uses ioremap_wc() on the same range
> so when write-combining is available that will be used
> instead.
>
[...]
> --- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
> +++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
[...]
> @@ -1984,7 +1979,6 @@ myri10ge_get_ethtool_stats(struct net_device *netdev,
> data[i] = ((u64 *)&link_stats)[i];
>
> data[i++] = (unsigned int)mgp->tx_boundary;
> - data[i++] = (unsigned int)mgp->wc_enabled;
> data[i++] = (unsigned int)mgp->pdev->irq;
> data[i++] = (unsigned int)mgp->msi_enabled;
> data[i++] = (unsigned int)mgp->msix_enabled;
You would have to delete "WC" from myri10ge_gstrings_main_stats too.
Something like below. Thanks.
@@ -1905,7 +1905,7 @@ static const char myri10ge_gstrings_main_stats[][ETH_GSTRING_LEN] = {
"tx_aborted_errors", "tx_carrier_errors", "tx_fifo_errors",
"tx_heartbeat_errors", "tx_window_errors",
/* device-specific stats */
- "tx_boundary", "WC", "irq", "MSI", "MSIX",
+ "tx_boundary", "irq", "MSI", "MSIX",
"read_dma_bw_MBs", "write_dma_bw_MBs", "read_write_dma_bw_MBs",
"serial_number", "watchdog_resets",
#ifdef CONFIG_MYRI10GE_DCA
* Re: [PATCH v1 21/47] ethernet: myri10ge: use arch_phys_wc_add()
2015-03-21 7:08 ` Hyong-Youb Kim
@ 2015-03-27 20:36 ` Luis R. Rodriguez
0 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 20:36 UTC (permalink / raw)
To: Hyong-Youb Kim
Cc: Luis R. Rodriguez, luto, mingo, tglx, hpa, jgross, JBeulich, bp,
suresh.b.siddha, venkatesh.pallipadi, airlied, linux-kernel,
linux-fbdev, x86, xen-devel, Ingo Molnar, Daniel Vetter,
Hyong-Youb Kim, netdev, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
On Sat, Mar 21, 2015 at 04:08:00PM +0900, Hyong-Youb Kim wrote:
> On Fri, Mar 20, 2015 at 04:18:11PM -0700, Luis R. Rodriguez wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >
> > This driver already uses ioremap_wc() on the same range
> > so when write-combining is available that will be used
> > instead.
> >
> [...]
> > --- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
> > +++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
> [...]
> > @@ -1984,7 +1979,6 @@ myri10ge_get_ethtool_stats(struct net_device *netdev,
> > data[i] = ((u64 *)&link_stats)[i];
> >
> > data[i++] = (unsigned int)mgp->tx_boundary;
> > - data[i++] = (unsigned int)mgp->wc_enabled;
> > data[i++] = (unsigned int)mgp->pdev->irq;
> > data[i++] = (unsigned int)mgp->msi_enabled;
> > data[i++] = (unsigned int)mgp->msix_enabled;
>
> You would have to delete "WC" from myri10ge_gstrings_main_stats too.
> Something like below. Thanks.
>
> @@ -1905,7 +1905,7 @@ static const char myri10ge_gstrings_main_stats[][ETH_GSTRING_LEN] = {
> "tx_aborted_errors", "tx_carrier_errors", "tx_fifo_errors",
> "tx_heartbeat_errors", "tx_window_errors",
> /* device-specific stats */
> - "tx_boundary", "WC", "irq", "MSI", "MSIX",
> + "tx_boundary", "irq", "MSI", "MSIX",
> "read_dma_bw_MBs", "write_dma_bw_MBs", "read_write_dma_bw_MBs",
> "serial_number", "watchdog_resets",
> #ifdef CONFIG_MYRI10GE_DCA
OK great thanks. Amended.
Luis
* [PATCH v1 22/47] staging: sm750fb: use arch_phys_wc_add() and ioremap_wc()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (20 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 21/47] ethernet: myri10ge: use arch_phys_wc_add() Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 23/47] staging: xgifb: " Luis R. Rodriguez
` (25 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
The same area used for ioremap() is used for the MTRR area.
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available; to take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.
There are a few motivations for this:
a) Take advantage of PAT when available
b) Help bury MTRR code away; MTRR is architecture specific and on
x86 it is replaced by PAT
c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
The conversion is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
remove the #ifdef'ery and the redundant code that
arch_phys_wc_add() already takes care of, such as the verbose
messages printed when adding the MTRR fails and the checks that
skip deletion when no MTRR was obtained.
@ mtrr_found @
expression index, base, size;
@@
-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);
@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@
-mtrr_del(index, base, size);
+arch_phys_wc_del(index);
@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@
-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);
@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@
-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);
@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);
@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/staging/sm750fb/sm750.c | 34 ++++------------------------------
drivers/staging/sm750fb/sm750.h | 3 ---
drivers/staging/sm750fb/sm750_hw.c | 3 +--
3 files changed, 5 insertions(+), 35 deletions(-)
diff --git a/drivers/staging/sm750fb/sm750.c b/drivers/staging/sm750fb/sm750.c
index aa0888c..ea59471 100644
--- a/drivers/staging/sm750fb/sm750.c
+++ b/drivers/staging/sm750fb/sm750.c
@@ -16,9 +16,6 @@
#include<linux/vmalloc.h>
#include<linux/pagemap.h>
#include <linux/console.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
#include <asm/fb.h>
#include "sm750.h"
#include "sm750_hw.h"
@@ -47,9 +44,7 @@ typedef int (*PROC_SPEC_INITHW)(struct lynx_share*,struct pci_dev*);
/* common var for all device */
static int g_hwcursor = 1;
static int g_noaccel = 0;
-#ifdef CONFIG_MTRR
static int g_nomtrr = 0;
-#endif
static const char * g_fbmode[] = {NULL,NULL};
static const char * g_def_fbmode = "800x600-16@60";
static char * g_settings = NULL;
@@ -1102,11 +1097,8 @@ static int lynxfb_pci_probe(struct pci_dev * pdev,
pr_info("share->revid = %02x\n",share->revid);
share->pdev = pdev;
-#ifdef CONFIG_MTRR
share->mtrr_off = g_nomtrr;
share->mtrr.vram = 0;
- share->mtrr.vram_added = 0;
-#endif
share->accel_off = g_noaccel;
share->dual = g_dualview;
spin_lock_init(&share->slock);
@@ -1134,22 +1126,9 @@ static int lynxfb_pci_probe(struct pci_dev * pdev,
goto err_map;
}
-#ifdef CONFIG_MTRR
- if(!share->mtrr_off){
- pr_info("enable mtrr\n");
- share->mtrr.vram = mtrr_add(share->vidmem_start,
- share->vidmem_size,
- MTRR_TYPE_WRCOMB,1);
-
- if(share->mtrr.vram < 0){
- /* don't block driver with the failure of MTRR */
- pr_err("Unable to setup MTRR.\n");
- }else{
- share->mtrr.vram_added = 1;
- pr_info("MTRR added succesfully\n");
- }
- }
-#endif
+ if (!share->mtrr_off)
+ share->mtrr.vram = arch_phys_wc_add(share->vidmem_start,
+ share->vidmem_size);
memset(share->pvMem,0,share->vidmem_size);
@@ -1250,10 +1229,7 @@ static void __exit lynxfb_pci_remove(struct pci_dev * pdev)
/* release frame buffer*/
framebuffer_release(info);
}
-#ifdef CONFIG_MTRR
- if(share->mtrr.vram_added)
- mtrr_del(share->mtrr.vram,share->vidmem_start,share->vidmem_size);
-#endif
+ arch_phys_wc_del(share->mtrr.vram);
// pci_release_regions(pdev);
iounmap(share->pvReg);
@@ -1297,10 +1273,8 @@ static int __init lynxfb_setup(char * options)
/* options that mean for any lynx chips are configured here */
if(!strncmp(opt,"noaccel",strlen("noaccel")))
g_noaccel = 1;
-#ifdef CONFIG_MTRR
else if(!strncmp(opt,"nomtrr",strlen("nomtrr")))
g_nomtrr = 1;
-#endif
else if(!strncmp(opt,"dual",strlen("dual")))
g_dualview = 1;
else
diff --git a/drivers/staging/sm750fb/sm750.h b/drivers/staging/sm750fb/sm750.h
index 0847d2b..5528912 100644
--- a/drivers/staging/sm750fb/sm750.h
+++ b/drivers/staging/sm750fb/sm750.h
@@ -51,13 +51,10 @@ struct lynx_share{
struct lynx_accel accel;
int accel_off;
int dual;
-#ifdef CONFIG_MTRR
int mtrr_off;
struct{
int vram;
- int vram_added;
}mtrr;
-#endif
/* all smi graphic adaptor got below attributes */
unsigned long vidmem_start;
unsigned long vidreg_start;
diff --git a/drivers/staging/sm750fb/sm750_hw.c b/drivers/staging/sm750fb/sm750_hw.c
index c44a50b..203a0a1 100644
--- a/drivers/staging/sm750fb/sm750_hw.c
+++ b/drivers/staging/sm750fb/sm750_hw.c
@@ -85,8 +85,7 @@ int hw_sm750_map(struct lynx_share* share,struct pci_dev* pdev)
}
#endif
- share->pvMem = ioremap(share->vidmem_start,
- share->vidmem_size);
+ share->pvMem = ioremap_wc(share->vidmem_start, share->vidmem_size);
if(!share->pvMem){
pr_err("Map video memory failed\n");
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 23/47] staging: xgifb: use arch_phys_wc_add() and ioremap_wc()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (21 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 22/47] staging: sm750fb: use arch_phys_wc_add() and ioremap_wc() Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-04-30 17:40 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 24/47] video: fbdev: arkfb: use arch_phys_wc_add() and pci_iomap_wc() Luis R. Rodriguez
` (24 subsequent siblings)
47 siblings, 1 reply; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
The same area used for ioremap() is used for the MTRR area.
Convert the driver from using the x86 specific MTRR code to
the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available through PAT. To
take advantage of that, also ensure the ioremap'd area is requested
as write-combining.
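The cookie returned by arch_phys_wc_add() can be pictured with a small
userspace model. This is only a sketch: the model_* names, the two-slot
table and the offset value are assumptions made for illustration, and none
of it is kernel code. It shows the three outcomes a driver can see (0 when
PAT makes an MTRR unnecessary, a positive handle when an MTRR was taken, a
negative errno on failure) and why the delete side can be called
unconditionally.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_WC_OFFSET 1000   /* keeps real handles away from 0 */
#define MODEL_NUM_MTRR  2      /* tiny table so exhaustion is easy to hit */

struct wc_model {
	bool pat_enabled;               /* PAT can provide WC on its own */
	bool slot_used[MODEL_NUM_MTRR]; /* which model MTRRs are taken */
};

/* Model of arch_phys_wc_add(): 0 = nothing to do, <0 = error, >0 = handle. */
int model_wc_add(struct wc_model *m)
{
	if (m->pat_enabled)
		return 0;
	for (int i = 0; i < MODEL_NUM_MTRR; i++) {
		if (!m->slot_used[i]) {
			m->slot_used[i] = true;
			return i + MODEL_WC_OFFSET;
		}
	}
	return -ENOSPC;
}

/* Model of arch_phys_wc_del(): cookies below 1 are silently ignored. */
void model_wc_del(struct wc_model *m, int cookie)
{
	if (cookie < 1)
		return;
	/* A positive cookie must be one that model_wc_add() handed out. */
	assert(cookie >= MODEL_WC_OFFSET &&
	       cookie < MODEL_WC_OFFSET + MODEL_NUM_MTRR);
	m->slot_used[cookie - MODEL_WC_OFFSET] = false;
}

/* Count occupied model MTRRs. */
int model_wc_in_use(const struct wc_model *m)
{
	int n = 0;

	for (int i = 0; i < MODEL_NUM_MTRR; i++)
		n += m->slot_used[i];
	return n;
}
```

Because a zero or negative cookie is a no-op on delete, a driver that
stores the return value needs neither a separate "added" flag nor a
">= 0" check before tearing down, which is what lets the conversions drop
that bookkeeping.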
There are a few motivations for this:
a) Take advantage of PAT when available
b) Help bury MTRR code away, as MTRR is architecture specific and on
x86 it's replaced by PAT
c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdefery and to remove redundant things which
arch_phys_wc_add() already addresses, such as the verbose message
printed when MTRR fails and doing nothing when we didn't get
an MTRR.
@ mtrr_found @
expression index, base, size;
@@
-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);
@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@
-mtrr_del(index, base, size);
+arch_phys_wc_del(index);
@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@
-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);
@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@
-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);
@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);
@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/staging/xgifb/XGI_main_26.c | 27 ++++++---------------------
1 file changed, 6 insertions(+), 21 deletions(-)
diff --git a/drivers/staging/xgifb/XGI_main_26.c b/drivers/staging/xgifb/XGI_main_26.c
index 74e8820..943d463 100644
--- a/drivers/staging/xgifb/XGI_main_26.c
+++ b/drivers/staging/xgifb/XGI_main_26.c
@@ -8,10 +8,7 @@
#include <linux/sizes.h>
#include <linux/module.h>
-
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
+#include <linux/pci.h>
#include "XGI_main.h"
#include "vb_init.h"
@@ -1770,7 +1767,7 @@ static int xgifb_probe(struct pci_dev *pdev,
}
xgifb_info->video_vbase = hw_info->pjVideoMemoryAddress =
- ioremap(xgifb_info->video_base, xgifb_info->video_size);
+ ioremap_wc(xgifb_info->video_base, xgifb_info->video_size);
xgifb_info->mmio_vbase = ioremap(xgifb_info->mmio_base,
xgifb_info->mmio_size);
@@ -2014,12 +2011,8 @@ static int xgifb_probe(struct pci_dev *pdev,
fb_alloc_cmap(&fb_info->cmap, 256, 0);
-#ifdef CONFIG_MTRR
- xgifb_info->mtrr = mtrr_add(xgifb_info->video_base,
- xgifb_info->video_size, MTRR_TYPE_WRCOMB, 1);
- if (xgifb_info->mtrr >= 0)
- dev_info(&pdev->dev, "Added MTRR\n");
-#endif
+ xgifb_info->mtrr = arch_phys_wc_add(xgifb_info->video_base,
+ xgifb_info->video_size);
if (register_framebuffer(fb_info) < 0) {
ret = -EINVAL;
@@ -2031,11 +2024,7 @@ static int xgifb_probe(struct pci_dev *pdev,
return 0;
error_mtrr:
-#ifdef CONFIG_MTRR
- if (xgifb_info->mtrr >= 0)
- mtrr_del(xgifb_info->mtrr, xgifb_info->video_base,
- xgifb_info->video_size);
-#endif /* CONFIG_MTRR */
+ arch_phys_wc_del(xgifb_info->mtrr);
error_1:
iounmap(xgifb_info->mmio_vbase);
iounmap(xgifb_info->video_vbase);
@@ -2059,11 +2048,7 @@ static void xgifb_remove(struct pci_dev *pdev)
struct fb_info *fb_info = xgifb_info->fb_info;
unregister_framebuffer(fb_info);
-#ifdef CONFIG_MTRR
- if (xgifb_info->mtrr >= 0)
- mtrr_del(xgifb_info->mtrr, xgifb_info->video_base,
- xgifb_info->video_size);
-#endif /* CONFIG_MTRR */
+ arch_phys_wc_del(xgifb_info->mtrr);
iounmap(xgifb_info->mmio_vbase);
iounmap(xgifb_info->video_vbase);
release_mem_region(xgifb_info->mmio_base, xgifb_info->mmio_size);
--
2.3.2.209.gd67f9d5.dirty
* Re: [PATCH v1 23/47] staging: xgifb: use arch_phys_wc_add() and ioremap_wc()
2015-03-20 23:18 ` [PATCH v1 23/47] staging: xgifb: " Luis R. Rodriguez
@ 2015-04-30 17:40 ` Luis R. Rodriguez
0 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-04-30 17:40 UTC (permalink / raw)
To: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
Juergen Gross, Jan Beulich, Borislav Petkov, Suresh Siddha,
venkatesh.pallipadi, Dave Airlie
Cc: linux-kernel, linux-fbdev, X86 ML, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
On Fri, Mar 20, 2015 at 4:18 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> The same area used for ioremap() is used for the MTRR area.
> Convert the driver from using the x86 specific MTRR code to
> the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
> will avoid MTRR if write-combining is available, in order to
> take advantage of that also ensure the ioremap'd area is requested
> as write-combining.
>
> There are a few motivations for this:
>
> a) Take advantage of PAT when available
>
> b) Help bury MTRR code away, MTRR is architecture specific and on
> x86 its replaced by PAT
>
> c) Help with the goal of eventually using _PAGE_CACHE_UC over
> _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
>
> The conversion done is expressed by the following Coccinelle
> SmPL patch, it additionally required manual intervention to
> address all the #ifdery and removal of redundant things which
> arch_phys_wc_add() already addresses such as verbose message
> about when MTRR fails and doing nothing when we didn't get
> an MTRR.
>
> @ mtrr_found @
> expression index, base, size;
> @@
>
> -index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
> +index = arch_phys_wc_add(base, size);
>
> @ mtrr_rm depends on mtrr_found @
> expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
> @@
>
> -mtrr_del(index, base, size);
> +arch_phys_wc_del(index);
>
> @ mtrr_rm_zero_arg depends on mtrr_found @
> expression mtrr_found.index;
> @@
>
> -mtrr_del(index, 0, 0);
> +arch_phys_wc_del(index);
>
> @ mtrr_rm_fb_info depends on mtrr_found @
> struct fb_info *info;
> expression mtrr_found.index;
> @@
>
> -mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
> +arch_phys_wc_del(index);
>
> @ ioremap_replace_nocache depends on mtrr_found @
> struct fb_info *info;
> expression base, size;
> @@
>
> -info->screen_base = ioremap_nocache(base, size);
> +info->screen_base = ioremap_wc(base, size);
>
> @ ioremap_replace_default depends on mtrr_found @
> struct fb_info *info;
> expression base, size;
> @@
>
> -info->screen_base = ioremap(base, size);
> +info->screen_base = ioremap_wc(base, size);
>
> Generated-by: Coccinelle SmPL
> Cc: Suresh Siddha <suresh.b.siddha@intel.com>
> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Antonino Daplas <adaplas@gmail.com>
> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> Cc: linux-fbdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Hey folks, can this be considered for merging?
Thanks,
Luis
* [PATCH v1 24/47] video: fbdev: arkfb: use arch_phys_wc_add() and pci_iomap_wc()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (22 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 23/47] staging: xgifb: " Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 25/47] video: fbdev: radeonfb: use arch_phys_wc_add() and ioremap_wc() Luis R. Rodriguez
` (23 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
Convert the driver from using the x86 specific MTRR code to
the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available through PAT. To
take advantage of that, also ensure the ioremap'd area is requested
as write-combining.
There are a few motivations for this:
a) Take advantage of PAT when available
b) Help bury MTRR code away, as MTRR is architecture specific and on
x86 it's replaced by PAT
c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
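Motivation c) is easier to follow with the way x86 combines a page's PAT
type with the MTRR covering it. The sketch below is a userspace lookup
over a subset of the types only; the enum and function names are invented
for this illustration. It encodes the usual combination rules: strong UC
always stays UC, UC- can be overridden to WC by a WC MTRR, and PAT WC is
WC regardless of the MTRR.

```c
#include <assert.h>

/* Subset of x86 memory types; the names are this sketch's own. */
enum mem_type { MT_UC, MT_UC_MINUS, MT_WC, MT_WB };

/* Effective type for a page given its PAT type and the covering MTRR type. */
enum mem_type effective_type(enum mem_type pat, enum mem_type mtrr)
{
	assert(mtrr != MT_UC_MINUS);	/* MTRRs have no UC- type */

	switch (pat) {
	case MT_UC:
		return MT_UC;				/* strong UC always wins */
	case MT_UC_MINUS:
		return mtrr == MT_WC ? MT_WC : MT_UC;	/* a WC MTRR may override */
	case MT_WC:
		return MT_WC;				/* PAT WC needs no MTRR */
	default:
		return mtrr;				/* PAT WB defers to the MTRR */
	}
}
```

Read this way, a driver that maps with UC- and then adds a WC MTRR only
gets write-combining through the override case. If ioremap_nocache()
moved to strong UC that override would stop working, which is presumably
why the mapping itself has to be requested as write-combining first.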
The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdefery and to remove redundant things which
arch_phys_wc_add() already addresses, such as the verbose message
printed when MTRR fails and doing nothing when we didn't get
an MTRR.
@ mtrr_found @
expression index, base, size;
@@
-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);
@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@
-mtrr_del(index, base, size);
+arch_phys_wc_del(index);
@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@
-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);
@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@
-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);
@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);
@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/arkfb.c | 36 +++++-------------------------------
1 file changed, 5 insertions(+), 31 deletions(-)
diff --git a/drivers/video/fbdev/arkfb.c b/drivers/video/fbdev/arkfb.c
index b305a1e..6a317de 100644
--- a/drivers/video/fbdev/arkfb.c
+++ b/drivers/video/fbdev/arkfb.c
@@ -26,13 +26,9 @@
#include <linux/console.h> /* Why should fb driver call console functions? because console_lock() */
#include <video/vga.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
struct arkfb_info {
int mclk_freq;
- int mtrr_reg;
+ int wc_cookie;
struct dac_info *dac;
struct vgastate state;
@@ -102,10 +98,6 @@ static const struct svga_timing_regs ark_timing_regs = {
static char *mode_option = "640x480-8@60";
-#ifdef CONFIG_MTRR
-static int mtrr = 1;
-#endif
-
MODULE_AUTHOR("(c) 2007 Ondrej Zajicek <santiago@crfreenet.org>");
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("fbdev driver for ARK 2000PV");
@@ -115,11 +107,6 @@ MODULE_PARM_DESC(mode_option, "Default video mode ('640x480-8@60', etc)");
module_param_named(mode, mode_option, charp, 0444);
MODULE_PARM_DESC(mode, "Default video mode ('640x480-8@60', etc) (deprecated)");
-#ifdef CONFIG_MTRR
-module_param(mtrr, int, 0444);
-MODULE_PARM_DESC(mtrr, "Enable write-combining with MTRR (1=enable, 0=disable, default=1)");
-#endif
-
static int threshold = 4;
module_param(threshold, int, 0644);
@@ -1002,7 +989,7 @@ static int ark_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
info->fix.smem_len = pci_resource_len(dev, 0);
/* Map physical IO memory address into kernel space */
- info->screen_base = pci_iomap(dev, 0, 0);
+ info->screen_base = pci_iomap_wc(dev, 0, 0);
if (! info->screen_base) {
rc = -ENOMEM;
dev_err(info->device, "iomap for framebuffer failed\n");
@@ -1057,14 +1044,8 @@ static int ark_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
/* Record a reference to the driver data */
pci_set_drvdata(dev, info);
-
-#ifdef CONFIG_MTRR
- if (mtrr) {
- par->mtrr_reg = -1;
- par->mtrr_reg = mtrr_add(info->fix.smem_start, info->fix.smem_len, MTRR_TYPE_WRCOMB, 1);
- }
-#endif
-
+ par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+ info->fix.smem_len);
return 0;
/* Error handling */
@@ -1092,14 +1073,7 @@ static void ark_pci_remove(struct pci_dev *dev)
if (info) {
struct arkfb_info *par = info->par;
-
-#ifdef CONFIG_MTRR
- if (par->mtrr_reg >= 0) {
- mtrr_del(par->mtrr_reg, 0, 0);
- par->mtrr_reg = -1;
- }
-#endif
-
+ arch_phys_wc_del(par->wc_cookie);
dac_release(par->dac);
unregister_framebuffer(info);
fb_dealloc_cmap(&info->cmap);
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 25/47] video: fbdev: radeonfb: use arch_phys_wc_add() and ioremap_wc()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (23 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 24/47] video: fbdev: arkfb: use arch_phys_wc_add() and pci_iomap_wc() Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 26/47] video: fbdev: gbefb: add missing mtrr_del() calls Luis R. Rodriguez
` (22 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
Convert the driver from using the x86 specific MTRR code to
the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available through PAT. To
take advantage of that, also ensure the ioremap'd area is requested
as write-combining.
There are a few motivations for this:
a) Take advantage of PAT when available
b) Help bury MTRR code away, as MTRR is architecture specific and on
x86 it's replaced by PAT
c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdefery and to remove redundant things which
arch_phys_wc_add() already addresses, such as the verbose message
printed when MTRR fails and doing nothing when we didn't get
an MTRR.
@ mtrr_found @
expression index, base, size;
@@
-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);
@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@
-mtrr_del(index, base, size);
+arch_phys_wc_del(index);
@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@
-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);
@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@
-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);
@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);
@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/aty/radeon_base.c | 29 ++++++-----------------------
drivers/video/fbdev/aty/radeonfb.h | 2 +-
2 files changed, 7 insertions(+), 24 deletions(-)
diff --git a/drivers/video/fbdev/aty/radeon_base.c b/drivers/video/fbdev/aty/radeon_base.c
index 26d80a4..922e8fc 100644
--- a/drivers/video/fbdev/aty/radeon_base.c
+++ b/drivers/video/fbdev/aty/radeon_base.c
@@ -85,10 +85,6 @@
#endif /* CONFIG_PPC_OF */
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
#include <video/radeon.h>
#include <linux/radeonfb.h>
@@ -271,9 +267,7 @@ static bool mirror = 0;
static int panel_yres = 0;
static bool force_dfp = 0;
static bool force_measure_pll = 0;
-#ifdef CONFIG_MTRR
static bool nomtrr = 0;
-#endif
static bool force_sleep;
static bool ignore_devlist;
#ifdef CONFIG_PMAC_BACKLIGHT
@@ -2260,8 +2254,8 @@ static int radeonfb_pci_register(struct pci_dev *pdev,
rinfo->mapped_vram = min_t(unsigned long, MAX_MAPPED_VRAM, rinfo->video_ram);
do {
- rinfo->fb_base = ioremap (rinfo->fb_base_phys,
- rinfo->mapped_vram);
+ rinfo->fb_base = ioremap_wc(rinfo->fb_base_phys,
+ rinfo->mapped_vram);
} while (rinfo->fb_base == NULL &&
((rinfo->mapped_vram /= 2) >= MIN_MAPPED_VRAM));
@@ -2359,11 +2353,9 @@ static int radeonfb_pci_register(struct pci_dev *pdev,
goto err_unmap_fb;
}
-#ifdef CONFIG_MTRR
- rinfo->mtrr_hdl = nomtrr ? -1 : mtrr_add(rinfo->fb_base_phys,
- rinfo->video_ram,
- MTRR_TYPE_WRCOMB, 1);
-#endif
+ if (!nomtrr)
+ rinfo->wc_cookie = arch_phys_wc_add(rinfo->fb_base_phys,
+ rinfo->video_ram);
if (backlight)
radeonfb_bl_init(rinfo);
@@ -2428,12 +2420,7 @@ static void radeonfb_pci_unregister(struct pci_dev *pdev)
#endif
del_timer_sync(&rinfo->lvds_timer);
-
-#ifdef CONFIG_MTRR
- if (rinfo->mtrr_hdl >= 0)
- mtrr_del(rinfo->mtrr_hdl, 0, 0);
-#endif
-
+ arch_phys_wc_del(rinfo->wc_cookie);
unregister_framebuffer(info);
radeonfb_bl_exit(rinfo);
@@ -2489,10 +2476,8 @@ static int __init radeonfb_setup (char *options)
panel_yres = simple_strtoul((this_opt+11), NULL, 0);
} else if (!strncmp(this_opt, "backlight:", 10)) {
backlight = simple_strtoul(this_opt+10, NULL, 0);
-#ifdef CONFIG_MTRR
} else if (!strncmp(this_opt, "nomtrr", 6)) {
nomtrr = 1;
-#endif
} else if (!strncmp(this_opt, "nomodeset", 9)) {
nomodeset = 1;
} else if (!strncmp(this_opt, "force_measure_pll", 17)) {
@@ -2552,10 +2537,8 @@ module_param(monitor_layout, charp, 0);
MODULE_PARM_DESC(monitor_layout, "Specify monitor mapping (like XFree86)");
module_param(force_measure_pll, bool, 0);
MODULE_PARM_DESC(force_measure_pll, "Force measurement of PLL (debug)");
-#ifdef CONFIG_MTRR
module_param(nomtrr, bool, 0);
MODULE_PARM_DESC(nomtrr, "bool: disable use of MTRR registers");
-#endif
module_param(panel_yres, int, 0);
MODULE_PARM_DESC(panel_yres, "int: set panel yres");
module_param(mode_option, charp, 0);
diff --git a/drivers/video/fbdev/aty/radeonfb.h b/drivers/video/fbdev/aty/radeonfb.h
index cb84604..61812db 100644
--- a/drivers/video/fbdev/aty/radeonfb.h
+++ b/drivers/video/fbdev/aty/radeonfb.h
@@ -340,7 +340,7 @@ struct radeonfb_info {
struct pll_info pll;
- int mtrr_hdl;
+ int wc_cookie;
u32 save_regs[100];
int asleep;
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 26/47] video: fbdev: gbefb: add missing mtrr_del() calls
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (24 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 25/47] video: fbdev: radeonfb: use arch_phys_wc_add() and ioremap_wc() Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 27/47] video: fbdev: gbefb: use arch_phys_wc_add() and devm_ioremap_wc() Luis R. Rodriguez
` (21 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
This driver never removed the MTRRs. Fix that.
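Why a missing delete matters can be shown with a simplified userspace
model of the MTRR table. It is a sketch under stated assumptions: the
model_* names, the eight-slot table and the per-slot user count are
illustrative and not the kernel's data structures. Adding the same range
again reuses its slot and bumps a user count, and the slot only becomes
free once every add has been matched by a delete.

```c
#include <assert.h>

#define MODEL_NUM_MTRR 8	/* assumption: a typical number of variable MTRRs */

struct mtrr_model {
	unsigned long base[MODEL_NUM_MTRR];
	unsigned long size[MODEL_NUM_MTRR];
	int users[MODEL_NUM_MTRR];	/* 0 means the slot is free */
};

/* Model of mtrr_add(): reuse a matching range, else take a free slot. */
int model_mtrr_add(struct mtrr_model *t, unsigned long base, unsigned long size)
{
	int free_slot = -1;

	for (int i = 0; i < MODEL_NUM_MTRR; i++) {
		if (t->users[i] && t->base[i] == base && t->size[i] == size) {
			t->users[i]++;
			return i;
		}
		if (!t->users[i] && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return -1;	/* table full */
	t->base[free_slot] = base;
	t->size[free_slot] = size;
	t->users[free_slot] = 1;
	return free_slot;
}

/* Model of mtrr_del(): drop one user; the slot frees when none remain. */
void model_mtrr_del(struct mtrr_model *t, int reg)
{
	if (reg < 0)
		return;		/* the add failed, nothing to undo */
	assert(reg < MODEL_NUM_MTRR && t->users[reg] > 0);
	t->users[reg]--;
}

/* Number of slots still occupied. */
int model_mtrr_busy(const struct mtrr_model *t)
{
	int n = 0;

	for (int i = 0; i < MODEL_NUM_MTRR; i++)
		n += t->users[i] > 0;
	return n;
}
```

In this model a probe/remove cycle that never calls the delete leaves the
slot busy after the driver is gone, and each further cycle raises the user
count. Storing the value returned by the add is what makes a matching
delete possible on both the error path and the remove path.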
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/gbefb.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/video/fbdev/gbefb.c b/drivers/video/fbdev/gbefb.c
index 6d9ef39..f48ea7e 100644
--- a/drivers/video/fbdev/gbefb.c
+++ b/drivers/video/fbdev/gbefb.c
@@ -38,6 +38,7 @@ static struct sgi_gbe *gbe;
struct gbefb_par {
struct fb_var_screeninfo var;
struct gbe_timing_info timing;
+ int wc_cookie;
int valid;
};
@@ -1199,7 +1200,8 @@ static int gbefb_probe(struct platform_device *p_dev)
}
#ifdef CONFIG_X86
- mtrr_add(gbe_mem_phys, gbe_mem_size, MTRR_TYPE_WRCOMB, 1);
+ info->wc_cookie = mtrr_add(gbe_mem_phys, gbe_mem_size,
+ MTRR_TYPE_WRCOMB, 1);
#endif
/* map framebuffer memory into tiles table */
@@ -1240,6 +1242,10 @@ static int gbefb_probe(struct platform_device *p_dev)
return 0;
out_gbe_unmap:
+#ifdef CONFIG_MTRR
+ if (info->wc_cookie >= 0)
+ mtrr_del(info->wc_cookie, 0, 0);
+#endif
if (gbe_dma_addr)
dma_free_coherent(NULL, gbe_mem_size, gbe_mem, gbe_mem_phys);
out_tiles_free:
@@ -1259,6 +1265,10 @@ static int gbefb_remove(struct platform_device* p_dev)
unregister_framebuffer(info);
gbe_turn_off();
+#ifdef CONFIG_MTRR
+ if (info->wc_cookie >= 0)
+ mtrr_del(info->wc_cookie, 0, 0);
+#endif
if (gbe_dma_addr)
dma_free_coherent(NULL, gbe_mem_size, gbe_mem, gbe_mem_phys);
dma_free_coherent(NULL, GBE_TLB_SIZE * sizeof(uint16_t),
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 27/47] video: fbdev: gbefb: use arch_phys_wc_add() and devm_ioremap_wc()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (25 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 26/47] video: fbdev: gbefb: add missing mtrr_del() calls Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 28/47] video: fbdev: intelfb: use arch_phys_wc_add() and ioremap_wc() Luis R. Rodriguez
` (20 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
Convert the driver from using the x86 specific MTRR code to
the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available through PAT. To
take advantage of that, also ensure the ioremap'd area is requested
as write-combining.
There are a few motivations for this:
a) Take advantage of PAT when available
b) Help bury MTRR code away, as MTRR is architecture specific and on
x86 it's replaced by PAT
c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdefery and to remove redundant things which
arch_phys_wc_add() already addresses, such as the verbose message
printed when MTRR fails and doing nothing when we didn't get
an MTRR.
@ mtrr_found @
expression index, base, size;
@@
-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);
@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@
-mtrr_del(index, base, size);
+arch_phys_wc_del(index);
@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@
-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);
@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@
-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);
@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);
@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/gbefb.c | 26 +++++++-------------------
1 file changed, 7 insertions(+), 19 deletions(-)
diff --git a/drivers/video/fbdev/gbefb.c b/drivers/video/fbdev/gbefb.c
index f48ea7e..ef81215 100644
--- a/drivers/video/fbdev/gbefb.c
+++ b/drivers/video/fbdev/gbefb.c
@@ -22,9 +22,6 @@
#include <linux/module.h>
#include <linux/io.h>
-#ifdef CONFIG_X86
-#include <asm/mtrr.h>
-#endif
#ifdef CONFIG_MIPS
#include <asm/addrspace.h>
#endif
@@ -1176,8 +1173,8 @@ static int gbefb_probe(struct platform_device *p_dev)
if (gbe_mem_phys) {
/* memory was allocated at boot time */
- gbe_mem = devm_ioremap_nocache(&p_dev->dev, gbe_mem_phys,
- gbe_mem_size);
+ gbe_mem = devm_ioremap_wc(&p_dev->dev, gbe_mem_phys,
+ gbe_mem_size);
if (!gbe_mem) {
printk(KERN_ERR "gbefb: couldn't map framebuffer\n");
ret = -ENOMEM;
@@ -1188,8 +1185,8 @@ static int gbefb_probe(struct platform_device *p_dev)
} else {
/* try to allocate memory with the classical allocator
* this has high chance to fail on low memory machines */
- gbe_mem = dma_alloc_coherent(NULL, gbe_mem_size, &gbe_dma_addr,
- GFP_KERNEL);
+ gbe_mem = dma_alloc_writecombine(NULL, gbe_mem_size,
+ &gbe_dma_addr, GFP_KERNEL);
if (!gbe_mem) {
printk(KERN_ERR "gbefb: couldn't allocate framebuffer memory\n");
ret = -ENOMEM;
@@ -1199,10 +1196,7 @@ static int gbefb_probe(struct platform_device *p_dev)
gbe_mem_phys = (unsigned long) gbe_dma_addr;
}
-#ifdef CONFIG_X86
- info->wc_cookie = mtrr_add(gbe_mem_phys, gbe_mem_size,
- MTRR_TYPE_WRCOMB, 1);
-#endif
+ info->wc_cookie = arch_phys_wc_add(gbe_mem_phys, gbe_mem_size);
/* map framebuffer memory into tiles table */
for (i = 0; i < (gbe_mem_size >> TILE_SHIFT); i++)
@@ -1242,10 +1236,7 @@ static int gbefb_probe(struct platform_device *p_dev)
return 0;
out_gbe_unmap:
-#ifdef CONFIG_MTRR
- if (info->wc_cookie >= 0)
- mtrr_del(info->wc_cookie, 0, 0);
-#endif
+ arch_phys_wc_del(info->wc_cookie);
if (gbe_dma_addr)
dma_free_coherent(NULL, gbe_mem_size, gbe_mem, gbe_mem_phys);
out_tiles_free:
@@ -1265,10 +1256,7 @@ static int gbefb_remove(struct platform_device* p_dev)
unregister_framebuffer(info);
gbe_turn_off();
-#ifdef CONFIG_MTRR
- if (info->wc_cookie >= 0)
- mtrr_del(info->wc_cookie, 0, 0);
-#endif
+ arch_phys_wc_del(info->wc_cookie);
if (gbe_dma_addr)
dma_free_coherent(NULL, gbe_mem_size, gbe_mem, gbe_mem_phys);
dma_free_coherent(NULL, GBE_TLB_SIZE * sizeof(uint16_t),
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 28/47] video: fbdev: intelfb: use arch_phys_wc_add() and ioremap_wc()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (26 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 27/47] video: fbdev: gbefb: use arch_phys_wc_add() and devm_ioremap_wc() Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 29/47] video: fbdev: matrox: " Luis R. Rodriguez
` (19 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
Although this driver gives the framebuffer layer a different
size for the framebuffer, it uses the entire aperture PCI BAR
size for the MTRR. Since the framebuffer is included in that
range and MTRR was used on the entire PCI BAR, WC will have
been preferred on that range as well. This propagates the
WC preference on the same entire PCI BAR.
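The containment argument above can be checked mechanically. What
follows is only a user-space sketch: range_contains(), the demo
function and the sample addresses are hypothetical and are not
taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when [inner_base, inner_base + inner_size) lies entirely
 * inside [outer_base, outer_base + outer_size). The comparison is
 * arranged so that no addition can overflow.
 */
bool range_contains(uint64_t outer_base, uint64_t outer_size,
		    uint64_t inner_base, uint64_t inner_size)
{
	if (inner_base < outer_base)
		return false;
	if (inner_size > outer_size)
		return false;
	/* Neither subtraction can underflow after the checks above. */
	return inner_base - outer_base <= outer_size - inner_size;
}

/* Hypothetical numbers: a 128 MiB aperture holding an 8 MiB framebuffer. */
void demo_range_contains(void)
{
	const uint64_t aperture_base = 0xd0000000ULL;
	const uint64_t aperture_size = 128ULL << 20;
	const uint64_t fb_size = 8ULL << 20;

	assert(range_contains(aperture_base, aperture_size,
			      aperture_base, fb_size));
}
```

If the framebuffer range passes this check against the aperture, a
WC setting applied to the whole aperture also covers the framebuffer.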
Convert the driver from using the x86 specific MTRR code to
the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available. To take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.
There are a few motivations for this:
a) Take advantage of PAT when available
b) Help bury MTRR code away; MTRR is architecture specific and on
x86 it's replaced by PAT
c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdeffery and to remove redundant things which
arch_phys_wc_add() already handles, such as the verbose message
when MTRR fails and doing nothing when we didn't get
an MTRR.
@ mtrr_found @
expression index, base, size;
@@
-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);
@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@
-mtrr_del(index, base, size);
+arch_phys_wc_del(index);
@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@
-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);
@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@
-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);
@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);
@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/intelfb/intelfb.h | 4 +---
drivers/video/fbdev/intelfb/intelfbdrv.c | 38 ++++----------------------------
2 files changed, 5 insertions(+), 37 deletions(-)
diff --git a/drivers/video/fbdev/intelfb/intelfb.h b/drivers/video/fbdev/intelfb/intelfb.h
index 6b51175..37f8339 100644
--- a/drivers/video/fbdev/intelfb/intelfb.h
+++ b/drivers/video/fbdev/intelfb/intelfb.h
@@ -285,9 +285,7 @@ struct intelfb_info {
/* use a gart reserved fb mem */
u8 fbmem_gart;
- /* mtrr support */
- int mtrr_reg;
- u32 has_mtrr;
+ int wc_cookie;
/* heap data */
struct intelfb_heap_data aperture;
diff --git a/drivers/video/fbdev/intelfb/intelfbdrv.c b/drivers/video/fbdev/intelfb/intelfbdrv.c
index b847d53..bbec737 100644
--- a/drivers/video/fbdev/intelfb/intelfbdrv.c
+++ b/drivers/video/fbdev/intelfb/intelfbdrv.c
@@ -124,10 +124,6 @@
#include <asm/io.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
#include "intelfb.h"
#include "intelfbhw.h"
#include "../edid.h"
@@ -411,33 +407,6 @@ module_init(intelfb_init);
module_exit(intelfb_exit);
/***************************************************************
- * mtrr support functions *
- ***************************************************************/
-
-#ifdef CONFIG_MTRR
-static inline void set_mtrr(struct intelfb_info *dinfo)
-{
- dinfo->mtrr_reg = mtrr_add(dinfo->aperture.physical,
- dinfo->aperture.size, MTRR_TYPE_WRCOMB, 1);
- if (dinfo->mtrr_reg < 0) {
- ERR_MSG("unable to set MTRR\n");
- return;
- }
- dinfo->has_mtrr = 1;
-}
-static inline void unset_mtrr(struct intelfb_info *dinfo)
-{
- if (dinfo->has_mtrr)
- mtrr_del(dinfo->mtrr_reg, dinfo->aperture.physical,
- dinfo->aperture.size);
-}
-#else
-#define set_mtrr(x) WRN_MSG("MTRR is disabled in the kernel\n")
-
-#define unset_mtrr(x) do { } while (0)
-#endif /* CONFIG_MTRR */
-
-/***************************************************************
* driver init / cleanup *
***************************************************************/
@@ -456,7 +425,7 @@ static void cleanup(struct intelfb_info *dinfo)
if (dinfo->registered)
unregister_framebuffer(dinfo->info);
- unset_mtrr(dinfo);
+ arch_phys_wc_del(dinfo->wc_cookie);
if (dinfo->fbmem_gart && dinfo->gtt_fb_mem) {
agp_unbind_memory(dinfo->gtt_fb_mem);
@@ -675,7 +644,7 @@ static int intelfb_pci_register(struct pci_dev *pdev,
/* Allocate memories (which aren't stolen) */
/* Map the fb and MMIO regions */
/* ioremap only up to the end of used aperture */
- dinfo->aperture.virtual = (u8 __iomem *)ioremap_nocache
+ dinfo->aperture.virtual = (u8 __iomem *)ioremap_wc
(dinfo->aperture.physical, ((offset + dinfo->fb.offset) << 12)
+ dinfo->fb.size);
if (!dinfo->aperture.virtual) {
@@ -772,7 +741,8 @@ static int intelfb_pci_register(struct pci_dev *pdev,
agp_backend_release(bridge);
if (mtrr)
- set_mtrr(dinfo);
+ dinfo->wc_cookie = arch_phys_wc_add(dinfo->aperture.physical,
+ dinfo->aperture.size);
DBG_MSG("fb: 0x%x(+ 0x%x)/0x%x (0x%p)\n",
dinfo->fb.physical, dinfo->fb.offset, dinfo->fb.size,
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 29/47] video: fbdev: matrox: use arch_phys_wc_add() and ioremap_wc()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (27 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 28/47] video: fbdev: intelfb: use arch_phys_wc_add() and ioremap_wc() Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 30/47] video: fbdev: neofb: " Luis R. Rodriguez
` (18 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
This driver uses the same ioremap()'d area for the MTRR.
Convert the driver from using the x86 specific MTRR code to
the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available. To take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.
There are a few motivations for this:
a) Take advantage of PAT when available
b) Help bury MTRR code away; MTRR is architecture specific and on
x86 it's replaced by PAT
c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdeffery and to remove redundant things which
arch_phys_wc_add() already handles, such as the verbose message
when MTRR fails and doing nothing when we didn't get
an MTRR.
@ mtrr_found @
expression index, base, size;
@@
-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);
@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@
-mtrr_del(index, base, size);
+arch_phys_wc_del(index);
@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@
-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);
@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@
-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);
@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);
@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/matrox/matroxfb_base.c | 36 +++++++++++-------------------
drivers/video/fbdev/matrox/matroxfb_base.h | 27 +---------------------
2 files changed, 14 insertions(+), 49 deletions(-)
diff --git a/drivers/video/fbdev/matrox/matroxfb_base.c b/drivers/video/fbdev/matrox/matroxfb_base.c
index 62539ca..2f70365 100644
--- a/drivers/video/fbdev/matrox/matroxfb_base.c
+++ b/drivers/video/fbdev/matrox/matroxfb_base.c
@@ -370,12 +370,9 @@ static void matroxfb_remove(struct matrox_fb_info *minfo, int dummy)
matroxfb_unregister_device(minfo);
unregister_framebuffer(&minfo->fbcon);
matroxfb_g450_shutdown(minfo);
-#ifdef CONFIG_MTRR
- if (minfo->mtrr.vram_valid)
- mtrr_del(minfo->mtrr.vram, minfo->video.base, minfo->video.len);
-#endif
- mga_iounmap(minfo->mmio.vbase);
- mga_iounmap(minfo->video.vbase);
+ arch_phys_wc_del(minfo->wc_cookie);
+ iounmap(minfo->mmio.vbase.vaddr);
+ iounmap(minfo->video.vbase.vaddr);
release_mem_region(minfo->video.base, minfo->video.len_maximum);
release_mem_region(minfo->mmio.base, 16384);
kfree(minfo);
@@ -1256,9 +1253,7 @@ static int nobios; /* "matroxfb:nobios" */
static int noinit = 1; /* "matroxfb:init" */
static int inverse; /* "matroxfb:inverse" */
static int sgram; /* "matroxfb:sgram" */
-#ifdef CONFIG_MTRR
static int mtrr = 1; /* "matroxfb:nomtrr" */
-#endif
static int grayscale; /* "matroxfb:grayscale" */
static int dev = -1; /* "matroxfb:dev:xxxxx" */
static unsigned int vesa = ~0; /* "matroxfb:vesa:xxxxx" */
@@ -1717,14 +1712,17 @@ static int initMatrox2(struct matrox_fb_info *minfo, struct board *b)
if (mem && (mem < memsize))
memsize = mem;
err = -ENOMEM;
- if (mga_ioremap(ctrlptr_phys, 16384, MGA_IOREMAP_MMIO, &minfo->mmio.vbase)) {
+
+ minfo->mmio.vbase.vaddr = ioremap_nocache(ctrlptr_phys, 16384);
+ if (!minfo->mmio.vbase.vaddr) {
printk(KERN_ERR "matroxfb: cannot ioremap(%lX, 16384), matroxfb disabled\n", ctrlptr_phys);
goto failVideoMR;
}
minfo->mmio.base = ctrlptr_phys;
minfo->mmio.len = 16384;
minfo->video.base = video_base_phys;
- if (mga_ioremap(video_base_phys, memsize, MGA_IOREMAP_FB, &minfo->video.vbase)) {
+ minfo->video.vbase.vaddr = ioremap_wc(video_base_phys, memsize);
+ if (!minfo->video.vbase.vaddr) {
printk(KERN_ERR "matroxfb: cannot ioremap(%lX, %d), matroxfb disabled\n",
video_base_phys, memsize);
goto failCtrlIO;
@@ -1772,13 +1770,9 @@ static int initMatrox2(struct matrox_fb_info *minfo, struct board *b)
minfo->video.len_usable = minfo->video.len;
if (minfo->video.len_usable > b->base->maxdisplayable)
minfo->video.len_usable = b->base->maxdisplayable;
-#ifdef CONFIG_MTRR
- if (mtrr) {
- minfo->mtrr.vram = mtrr_add(video_base_phys, minfo->video.len, MTRR_TYPE_WRCOMB, 1);
- minfo->mtrr.vram_valid = 1;
- printk(KERN_INFO "matroxfb: MTRR's turned on\n");
- }
-#endif /* CONFIG_MTRR */
+ if (mtrr)
+ minfo->wc_cookie = arch_phys_wc_add(video_base_phys,
+ minfo->video.len);
if (!minfo->devflags.novga)
request_region(0x3C0, 32, "matrox");
@@ -1947,9 +1941,9 @@ static int initMatrox2(struct matrox_fb_info *minfo, struct board *b)
return 0;
failVideoIO:;
matroxfb_g450_shutdown(minfo);
- mga_iounmap(minfo->video.vbase);
+ iounmap(minfo->video.vbase.vaddr);
failCtrlIO:;
- mga_iounmap(minfo->mmio.vbase);
+ iounmap(minfo->mmio.vbase.vaddr);
failVideoMR:;
release_mem_region(video_base_phys, minfo->video.len_maximum);
failCtrlMR:;
@@ -2443,10 +2437,8 @@ static int __init matroxfb_setup(char *options) {
nobios = !value;
else if (!strcmp(this_opt, "init"))
noinit = !value;
-#ifdef CONFIG_MTRR
else if (!strcmp(this_opt, "mtrr"))
mtrr = value;
-#endif
else if (!strcmp(this_opt, "inv24"))
inv24 = value;
else if (!strcmp(this_opt, "cross4MB"))
@@ -2515,10 +2507,8 @@ module_param(noinit, int, 0);
MODULE_PARM_DESC(noinit, "Disables W/SG/SD-RAM and bus interface initialization (0 or 1=do not initialize) (default=0)");
module_param(memtype, int, 0);
MODULE_PARM_DESC(memtype, "Memory type for G200/G400 (see Documentation/fb/matroxfb.txt for explanation) (default=3 for G200, 0 for G400)");
-#ifdef CONFIG_MTRR
module_param(mtrr, int, 0);
MODULE_PARM_DESC(mtrr, "This speeds up video memory accesses (0=disabled or 1) (default=1)");
-#endif
module_param(sgram, int, 0);
MODULE_PARM_DESC(sgram, "Indicates that G100/G200/G400 has SGRAM memory (0=SDRAM, 1=SGRAM) (default=0)");
module_param(inv24, int, 0);
diff --git a/drivers/video/fbdev/matrox/matroxfb_base.h b/drivers/video/fbdev/matrox/matroxfb_base.h
index 89a8a89a..09b02cd 100644
--- a/drivers/video/fbdev/matrox/matroxfb_base.h
+++ b/drivers/video/fbdev/matrox/matroxfb_base.h
@@ -44,9 +44,6 @@
#include <asm/io.h>
#include <asm/unaligned.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
#if defined(CONFIG_PPC_PMAC)
#include <asm/prom.h>
@@ -187,23 +184,6 @@ static inline void __iomem* vaddr_va(vaddr_t va) {
return va.vaddr;
}
-#define MGA_IOREMAP_NORMAL 0
-#define MGA_IOREMAP_NOCACHE 1
-
-#define MGA_IOREMAP_FB MGA_IOREMAP_NOCACHE
-#define MGA_IOREMAP_MMIO MGA_IOREMAP_NOCACHE
-static inline int mga_ioremap(unsigned long phys, unsigned long size, int flags, vaddr_t* virt) {
- if (flags & MGA_IOREMAP_NOCACHE)
- virt->vaddr = ioremap_nocache(phys, size);
- else
- virt->vaddr = ioremap(phys, size);
- return (virt->vaddr == NULL); /* 0, !0... 0, error_code in future */
-}
-
-static inline void mga_iounmap(vaddr_t va) {
- iounmap(va.vaddr);
-}
-
struct my_timming {
unsigned int pixclock;
int mnp;
@@ -449,12 +429,7 @@ struct matrox_fb_info {
int plnwt;
int srcorg;
} capable;
-#ifdef CONFIG_MTRR
- struct {
- int vram;
- int vram_valid;
- } mtrr;
-#endif
+ int wc_cookie;
struct {
int precise_width;
int mga_24bpp_fix;
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 30/47] video: fbdev: neofb: use arch_phys_wc_add() and ioremap_wc()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (28 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 29/47] video: fbdev: matrox: " Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 31/47] video: fbdev: s3fb: use arch_phys_wc_add() and pci_iomap_wc() Luis R. Rodriguez
` (17 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
Convert the driver from using the x86 specific MTRR code to
the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available. To take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.
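The cookie contract this relies on can be modelled in user space.
This is a sketch under stated assumptions, not the kernel
implementation: struct wc_model, model_wc_add(), model_wc_del() and
MODEL_WC_OFFSET are hypothetical names, and the offset value is only
my recollection of how the x86 code keeps MTRR-backed cookies
positive.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in; not a real kernel symbol. */
#define MODEL_WC_OFFSET 1000	/* keeps MTRR-backed cookies >= 1 */

struct wc_model {
	bool pat_enabled;	/* PAT can provide WC without an MTRR */
	int next_mtrr;		/* next free MTRR register, -1 if none */
	int mtrr_in_use;	/* MTRRs currently held */
};

/*
 * Returns 0 when no MTRR is needed, a positive cookie when one was
 * taken, or a negative value when none could be allocated.
 */
int model_wc_add(struct wc_model *m)
{
	if (m->pat_enabled)
		return 0;
	if (m->next_mtrr < 0)
		return -1;
	m->mtrr_in_use++;
	return m->next_mtrr++ + MODEL_WC_OFFSET;
}

/* Safe with any cookie: only positive cookies release an MTRR. */
void model_wc_del(struct wc_model *m, int cookie)
{
	if (cookie < 1)
		return;
	assert(cookie >= MODEL_WC_OFFSET);
	m->mtrr_in_use--;
}
```

Under this model a driver can store the cookie unconditionally and
call the delete helper unconditionally on teardown, which is why the
converted code can drop its own validity flags and #ifdef guards.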
There are a few motivations for this:
a) Take advantage of PAT when available
b) Help bury MTRR code away; MTRR is architecture specific and on
x86 it's replaced by PAT
c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdeffery and to remove redundant things which
arch_phys_wc_add() already handles, such as the verbose message
when MTRR fails and doing nothing when we didn't get
an MTRR.
@ mtrr_found @
expression index, base, size;
@@
-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);
@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@
-mtrr_del(index, base, size);
+arch_phys_wc_del(index);
@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@
-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);
@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@
-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);
@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);
@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/neofb.c | 26 +++++++-------------------
include/video/neomagic.h | 5 +----
2 files changed, 8 insertions(+), 23 deletions(-)
diff --git a/drivers/video/fbdev/neofb.c b/drivers/video/fbdev/neofb.c
index 44f99a6..db023a9 100644
--- a/drivers/video/fbdev/neofb.c
+++ b/drivers/video/fbdev/neofb.c
@@ -71,11 +71,6 @@
#include <asm/io.h>
#include <asm/irq.h>
#include <asm/pgtable.h>
-
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
#include <video/vga.h>
#include <video/neomagic.h>
@@ -1710,6 +1705,7 @@ static int neo_map_video(struct fb_info *info, struct pci_dev *dev,
int video_len)
{
//unsigned long addr;
+ struct neofb_par *par = info->par;
DBG("neo_map_video");
@@ -1723,7 +1719,7 @@ static int neo_map_video(struct fb_info *info, struct pci_dev *dev,
}
info->screen_base =
- ioremap(info->fix.smem_start, info->fix.smem_len);
+ ioremap_wc(info->fix.smem_start, info->fix.smem_len);
if (!info->screen_base) {
printk("neofb: unable to map screen memory\n");
release_mem_region(info->fix.smem_start,
@@ -1733,11 +1729,8 @@ static int neo_map_video(struct fb_info *info, struct pci_dev *dev,
printk(KERN_INFO "neofb: mapped framebuffer at %p\n",
info->screen_base);
-#ifdef CONFIG_MTRR
- ((struct neofb_par *)(info->par))->mtrr =
- mtrr_add(info->fix.smem_start, pci_resource_len(dev, 0),
- MTRR_TYPE_WRCOMB, 1);
-#endif
+ par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+ pci_resource_len(dev, 0));
/* Clear framebuffer, it's all white in memory after boot */
memset_io(info->screen_base, 0, info->fix.smem_len);
@@ -1754,16 +1747,11 @@ static int neo_map_video(struct fb_info *info, struct pci_dev *dev,
static void neo_unmap_video(struct fb_info *info)
{
- DBG("neo_unmap_video");
+ struct neofb_par *par = info->par;
-#ifdef CONFIG_MTRR
- {
- struct neofb_par *par = info->par;
+ DBG("neo_unmap_video");
- mtrr_del(par->mtrr, info->fix.smem_start,
- info->fix.smem_len);
- }
-#endif
+ arch_phys_wc_del(par->wc_cookie);
iounmap(info->screen_base);
info->screen_base = NULL;
diff --git a/include/video/neomagic.h b/include/video/neomagic.h
index bc5013e..91e225a 100644
--- a/include/video/neomagic.h
+++ b/include/video/neomagic.h
@@ -159,10 +159,7 @@ struct neofb_par {
unsigned char VCLK3NumeratorHigh;
unsigned char VCLK3Denominator;
unsigned char VerticalExt;
-
-#ifdef CONFIG_MTRR
- int mtrr;
-#endif
+ int wc_cookie;
u8 __iomem *mmio_vbase;
u8 cursorOff;
u8 *cursorPad; /* Must die !! */
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 31/47] video: fbdev: s3fb: use arch_phys_wc_add() and pci_iomap_wc()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (29 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 30/47] video: fbdev: neofb: " Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 32/47] video: fbdev: nvidia: use arch_phys_wc_add() and ioremap_wc() Luis R. Rodriguez
` (16 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
This driver uses the same area for MTRR as for the ioremap().
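The two areas coincide because the driver maps BAR 0 with a maxlen
of 0, which by the pci_iomap() convention asks for the whole BAR,
while the MTRR range uses pci_resource_len() of the same BAR. A
user-space sketch of that length rule follows; model_iomap_len() and
the demo values are hypothetical, not kernel functions or real
hardware sizes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Length a pci_iomap()-style call would map for a BAR of bar_len
 * bytes, or 0 when nothing can be mapped. A maxlen of 0 means
 * "no limit", so the rest of the BAR is mapped.
 */
uint64_t model_iomap_len(uint64_t bar_len, uint64_t offset, uint64_t maxlen)
{
	if (bar_len <= offset)
		return 0;
	bar_len -= offset;
	if (maxlen && bar_len > maxlen)
		bar_len = maxlen;
	return bar_len;
}

/* Hypothetical 8 MiB BAR. */
void demo_iomap_len(void)
{
	const uint64_t bar = 8ULL << 20;

	/* maxlen == 0: the mapping covers the whole BAR. */
	assert(model_iomap_len(bar, 0, 0) == bar);
	/* A non-zero maxlen clamps the mapping. */
	assert(model_iomap_len(bar, 0, 4096) == 4096);
}
```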
Convert the driver from using the x86 specific MTRR code to
the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available. To take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.
There are a few motivations for this:
a) Take advantage of PAT when available
b) Help bury MTRR code away; MTRR is architecture specific and on
x86 it's replaced by PAT
c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdeffery and to remove redundant things which
arch_phys_wc_add() already handles, such as the verbose message
when MTRR fails and doing nothing when we didn't get
an MTRR.
@ mtrr_found @
expression index, base, size;
@@
-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);
@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@
-mtrr_del(index, base, size);
+arch_phys_wc_del(index);
@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@
-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);
@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@
-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);
@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);
@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/s3fb.c | 35 ++++++-----------------------------
1 file changed, 6 insertions(+), 29 deletions(-)
diff --git a/drivers/video/fbdev/s3fb.c b/drivers/video/fbdev/s3fb.c
index f0ae61a..13b1090 100644
--- a/drivers/video/fbdev/s3fb.c
+++ b/drivers/video/fbdev/s3fb.c
@@ -28,13 +28,9 @@
#include <linux/i2c.h>
#include <linux/i2c-algo-bit.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
struct s3fb_info {
int chip, rev, mclk_freq;
- int mtrr_reg;
+ int wc_cookie;
struct vgastate state;
struct mutex open_lock;
unsigned int ref_count;
@@ -154,11 +150,7 @@ static const struct svga_timing_regs s3_timing_regs = {
static char *mode_option;
-
-#ifdef CONFIG_MTRR
static int mtrr = 1;
-#endif
-
static int fasttext = 1;
@@ -170,11 +162,8 @@ module_param(mode_option, charp, 0444);
MODULE_PARM_DESC(mode_option, "Default video mode ('640x480-8@60', etc)");
module_param_named(mode, mode_option, charp, 0444);
MODULE_PARM_DESC(mode, "Default video mode ('640x480-8@60', etc) (deprecated)");
-
-#ifdef CONFIG_MTRR
module_param(mtrr, int, 0444);
MODULE_PARM_DESC(mtrr, "Enable write-combining with MTRR (1=enable, 0=disable, default=1)");
-#endif
module_param(fasttext, int, 0644);
MODULE_PARM_DESC(fasttext, "Enable S3 fast text mode (1=enable, 0=disable, default=1)");
@@ -1168,7 +1157,7 @@ static int s3_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
info->fix.smem_len = pci_resource_len(dev, 0);
/* Map physical IO memory address into kernel space */
- info->screen_base = pci_iomap(dev, 0, 0);
+ info->screen_base = pci_iomap_wc(dev, 0, 0);
if (! info->screen_base) {
rc = -ENOMEM;
dev_err(info->device, "iomap for framebuffer failed\n");
@@ -1365,12 +1354,9 @@ static int s3_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
/* Record a reference to the driver data */
pci_set_drvdata(dev, info);
-#ifdef CONFIG_MTRR
- if (mtrr) {
- par->mtrr_reg = -1;
- par->mtrr_reg = mtrr_add(info->fix.smem_start, info->fix.smem_len, MTRR_TYPE_WRCOMB, 1);
- }
-#endif
+ if (mtrr)
+ par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+ info->fix.smem_len);
return 0;
@@ -1405,14 +1391,7 @@ static void s3_pci_remove(struct pci_dev *dev)
if (info) {
par = info->par;
-
-#ifdef CONFIG_MTRR
- if (par->mtrr_reg >= 0) {
- mtrr_del(par->mtrr_reg, 0, 0);
- par->mtrr_reg = -1;
- }
-#endif
-
+ arch_phys_wc_del(par->wc_cookie);
unregister_framebuffer(info);
fb_dealloc_cmap(&info->cmap);
@@ -1551,10 +1530,8 @@ static int __init s3fb_setup(char *options)
if (!*opt)
continue;
-#ifdef CONFIG_MTRR
else if (!strncmp(opt, "mtrr:", 5))
mtrr = simple_strtoul(opt + 5, NULL, 0);
-#endif
else if (!strncmp(opt, "fasttext:", 9))
fasttext = simple_strtoul(opt + 9, NULL, 0);
else
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 32/47] video: fbdev: nvidia: use arch_phys_wc_add() and ioremap_wc()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (30 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 31/47] video: fbdev: s3fb: use arch_phys_wc_add() and pci_iomap_wc() Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 33/47] video: fbdev: savagefb: " Luis R. Rodriguez
` (15 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
This driver uses the same area for MTRR and ioremap().
Convert the driver from using the x86 specific MTRR code to
the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available. To take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.
There are a few motivations for this:
a) Take advantage of PAT when available
b) Help bury MTRR code away; MTRR is architecture specific and on
x86 it's replaced by PAT
c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdeffery and to remove redundant things which
arch_phys_wc_add() already handles, such as the verbose message
when MTRR fails and doing nothing when we didn't get
an MTRR.
@ mtrr_found @
expression index, base, size;
@@
-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);
@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@
-mtrr_del(index, base, size);
+arch_phys_wc_del(index);
@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@
-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);
@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@
-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);
@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);
@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/nvidia/nv_type.h | 7 +------
drivers/video/fbdev/nvidia/nvidia.c | 37 ++++++------------------------------
2 files changed, 7 insertions(+), 37 deletions(-)
diff --git a/drivers/video/fbdev/nvidia/nv_type.h b/drivers/video/fbdev/nvidia/nv_type.h
index c03f7f5..6ff321a 100644
--- a/drivers/video/fbdev/nvidia/nv_type.h
+++ b/drivers/video/fbdev/nvidia/nv_type.h
@@ -148,12 +148,7 @@ struct nvidia_par {
u32 forceCRTC;
u32 open_count;
u8 DDCBase;
-#ifdef CONFIG_MTRR
- struct {
- int vram;
- int vram_valid;
- } mtrr;
-#endif
+ int wc_cookie;
struct nvidia_i2c_chan chan[3];
volatile u32 __iomem *REGS;
diff --git a/drivers/video/fbdev/nvidia/nvidia.c b/drivers/video/fbdev/nvidia/nvidia.c
index def0412..781f5e7 100644
--- a/drivers/video/fbdev/nvidia/nvidia.c
+++ b/drivers/video/fbdev/nvidia/nvidia.c
@@ -21,9 +21,6 @@
#include <linux/pci.h>
#include <linux/console.h>
#include <linux/backlight.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
#ifdef CONFIG_PPC_OF
#include <asm/prom.h>
#include <asm/pci-bridge.h>
@@ -80,9 +77,7 @@ static int paneltweak = 0;
static int vram = 0;
static int bpp = 8;
static int reverse_i2c;
-#ifdef CONFIG_MTRR
static bool nomtrr = false;
-#endif
#ifdef CONFIG_PMAC_BACKLIGHT
static int backlight = 1;
#else
@@ -1365,7 +1360,8 @@ static int nvidiafb_probe(struct pci_dev *pd, const struct pci_device_id *ent)
par->ScratchBufferStart = par->FbUsableSize - par->ScratchBufferSize;
par->CursorStart = par->FbUsableSize + (32 * 1024);
- info->screen_base = ioremap(nvidiafb_fix.smem_start, par->FbMapSize);
+ info->screen_base = ioremap_wc(nvidiafb_fix.smem_start,
+ par->FbMapSize);
info->screen_size = par->FbUsableSize;
nvidiafb_fix.smem_len = par->RamAmountKBytes * 1024;
@@ -1376,20 +1372,9 @@ static int nvidiafb_probe(struct pci_dev *pd, const struct pci_device_id *ent)
par->FbStart = info->screen_base;
-#ifdef CONFIG_MTRR
- if (!nomtrr) {
- par->mtrr.vram = mtrr_add(nvidiafb_fix.smem_start,
- par->RamAmountKBytes * 1024,
- MTRR_TYPE_WRCOMB, 1);
- if (par->mtrr.vram < 0) {
- printk(KERN_ERR PFX "unable to setup MTRR\n");
- } else {
- par->mtrr.vram_valid = 1;
- /* let there be speed */
- printk(KERN_INFO PFX "MTRR set to ON\n");
- }
- }
-#endif /* CONFIG_MTRR */
+ if (!nomtrr)
+ par->wc_cookie = arch_phys_wc_add(nvidiafb_fix.smem_start,
+ par->RamAmountKBytes * 1024);
info->fbops = &nvidia_fb_ops;
info->fix = nvidiafb_fix;
@@ -1447,13 +1432,7 @@ static void nvidiafb_remove(struct pci_dev *pd)
unregister_framebuffer(info);
nvidia_bl_exit(par);
-
-#ifdef CONFIG_MTRR
- if (par->mtrr.vram_valid)
- mtrr_del(par->mtrr.vram, info->fix.smem_start,
- info->fix.smem_len);
-#endif /* CONFIG_MTRR */
-
+ arch_phys_wc_del(par->wc_cookie);
iounmap(info->screen_base);
fb_destroy_modedb(info->monspecs.modedb);
nvidia_delete_i2c_busses(par);
@@ -1505,10 +1484,8 @@ static int nvidiafb_setup(char *options)
vram = simple_strtoul(this_opt+5, NULL, 0);
} else if (!strncmp(this_opt, "backlight:", 10)) {
backlight = simple_strtoul(this_opt+10, NULL, 0);
-#ifdef CONFIG_MTRR
} else if (!strncmp(this_opt, "nomtrr", 6)) {
nomtrr = true;
-#endif
} else if (!strncmp(this_opt, "fpdither:", 9)) {
fpdither = simple_strtol(this_opt+9, NULL, 0);
} else if (!strncmp(this_opt, "bpp:", 4)) {
@@ -1596,11 +1573,9 @@ MODULE_PARM_DESC(bpp, "pixel width in bits"
"(default=8)");
module_param(reverse_i2c, int, 0);
MODULE_PARM_DESC(reverse_i2c, "reverse port assignment of the i2c bus");
-#ifdef CONFIG_MTRR
module_param(nomtrr, bool, false);
MODULE_PARM_DESC(nomtrr, "Disables MTRR support (0 or 1=disabled) "
"(default=0)");
-#endif
MODULE_AUTHOR("Antonino Daplas");
MODULE_DESCRIPTION("Framebuffer driver for nVidia graphics chipset");
--
2.3.2.209.gd67f9d5.dirty
^ permalink raw reply related [flat|nested] 400+ messages in thread
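A note for readers following the conversion: the vram_valid flag and the CONFIG_MTRR guards can be dropped because of the cookie convention shared by arch_phys_wc_add() and arch_phys_wc_del(). The block below is a minimal user-space sketch of that convention, not the kernel implementation. Every model_* name is hypothetical, the 8-slot table is arbitrary, and the offset of 1000 is an assumption modelled on MTRR_TO_PHYS_WC_OFFSET in the x86 headers of this period.

```c
#include <errno.h>
#include <stdbool.h>

/* User-space model only; all model_* names are hypothetical. */
#define MODEL_WC_OFFSET 1000	/* assumed to mirror MTRR_TO_PHYS_WC_OFFSET */
#define MODEL_NUM_MTRR  8	/* arbitrary number of variable MTRR slots */

static bool model_pat_enabled;
static bool model_mtrr_used[MODEL_NUM_MTRR];

/* Stand-in for mtrr_add(): returns a slot index, or -ENOSPC when full. */
int model_mtrr_add(void)
{
	for (int i = 0; i < MODEL_NUM_MTRR; i++) {
		if (!model_mtrr_used[i]) {
			model_mtrr_used[i] = true;
			return i;
		}
	}
	return -ENOSPC;
}

/*
 * Returns 0 when no MTRR is needed (PAT handles write-combining),
 * a negative errno on failure, or slot + MODEL_WC_OFFSET on success.
 */
int model_phys_wc_add(void)
{
	int reg;

	if (model_pat_enabled)
		return 0;

	reg = model_mtrr_add();
	if (reg < 0)
		return reg;

	return reg + MODEL_WC_OFFSET;
}

/* Safe to call with 0 or a negative cookie: both are ignored. */
void model_phys_wc_del(int cookie)
{
	int reg = cookie - MODEL_WC_OFFSET;

	if (reg >= 0 && reg < MODEL_NUM_MTRR)
		model_mtrr_used[reg] = false;
}

/* Helper for inspection: how many model MTRR slots are occupied. */
int model_mtrr_in_use(void)
{
	int n = 0;

	for (int i = 0; i < MODEL_NUM_MTRR; i++)
		n += model_mtrr_used[i];
	return n;
}
```

Because a zero or negative cookie is ignored on deletion, a driver can store the return value and pass it back unconditionally on remove. That should also cover the nomtrr case in this patch, assuming par is zero-initialised when it is allocated, since wc_cookie then stays 0.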
* [PATCH v1 33/47] video: fbdev: savagefb: use arch_phys_wc_add() and ioremap_wc()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (31 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 32/47] video: fbdev: nvidia: use arch_phys_wc_add() and ioremap_wc() Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 34/47] video: fbdev: sisfb: " Luis R. Rodriguez
` (14 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
This driver uses the same area for MTRR as for the ioremap().
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available; in order to
take advantage of that, also ensure the ioremap'd area is requested
as write-combining.
There are a few motivations for this:
a) Take advantage of PAT when available
b) Help bury MTRR code away; MTRR is architecture-specific and on
x86 it is replaced by PAT
c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdeffery and to remove redundant things which
arch_phys_wc_add() already handles, such as the verbose message
when MTRR setup fails and doing nothing when we didn't get
an MTRR.
@ mtrr_found @
expression index, base, size;
@@
-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);
@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@
-mtrr_del(index, base, size);
+arch_phys_wc_del(index);
@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@
-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);
@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@
-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);
@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);
@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/savage/savagefb.h | 4 +---
drivers/video/fbdev/savage/savagefb_driver.c | 17 +++--------------
2 files changed, 4 insertions(+), 17 deletions(-)
diff --git a/drivers/video/fbdev/savage/savagefb.h b/drivers/video/fbdev/savage/savagefb.h
index 8ff4ab1..aba04af 100644
--- a/drivers/video/fbdev/savage/savagefb.h
+++ b/drivers/video/fbdev/savage/savagefb.h
@@ -213,9 +213,7 @@ struct savagefb_par {
void __iomem *vbase;
u32 pbase;
u32 len;
-#ifdef CONFIG_MTRR
- int mtrr;
-#endif
+ int wc_cookie;
} video;
struct {
diff --git a/drivers/video/fbdev/savage/savagefb_driver.c b/drivers/video/fbdev/savage/savagefb_driver.c
index 4dbf45f..6c77ab0 100644
--- a/drivers/video/fbdev/savage/savagefb_driver.c
+++ b/drivers/video/fbdev/savage/savagefb_driver.c
@@ -57,10 +57,6 @@
#include <asm/irq.h>
#include <asm/pgtable.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
#include "savagefb.h"
@@ -1775,7 +1771,7 @@ static int savage_map_video(struct fb_info *info, int video_len)
par->video.pbase = pci_resource_start(par->pcidev, resource);
par->video.len = video_len;
- par->video.vbase = ioremap(par->video.pbase, par->video.len);
+ par->video.vbase = ioremap_wc(par->video.pbase, par->video.len);
if (!par->video.vbase) {
printk("savagefb: unable to map screen memory\n");
@@ -1787,11 +1783,7 @@ static int savage_map_video(struct fb_info *info, int video_len)
info->fix.smem_start = par->video.pbase;
info->fix.smem_len = par->video.len - par->cob_size;
info->screen_base = par->video.vbase;
-
-#ifdef CONFIG_MTRR
- par->video.mtrr = mtrr_add(par->video.pbase, video_len,
- MTRR_TYPE_WRCOMB, 1);
-#endif
+ par->video.wc_cookie = arch_phys_wc_add(par->video.pbase, video_len);
/* Clear framebuffer, it's all white in memory after boot */
memset_io(par->video.vbase, 0, par->video.len);
@@ -1806,10 +1798,7 @@ static void savage_unmap_video(struct fb_info *info)
DBG("savage_unmap_video");
if (par->video.vbase) {
-#ifdef CONFIG_MTRR
- mtrr_del(par->video.mtrr, par->video.pbase, par->video.len);
-#endif
-
+ arch_phys_wc_del(par->video.wc_cookie);
iounmap(par->video.vbase);
par->video.vbase = NULL;
info->screen_base = NULL;
--
2.3.2.209.gd67f9d5.dirty
^ permalink raw reply related [flat|nested] 400+ messages in thread
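A note on motivation (c) in the commit message: whether a mapping ends up write-combined depends on how the page attribute and the MTRR covering the range combine. The sketch below is a simplified user-space model of that combination for the few types relevant here. It is an assumption-laden reading of the x86 memory-type rules, not kernel code, and all names (mem_type, model_effective_type) are hypothetical.

```c
/* User-space model only; names are hypothetical. */
enum mem_type {
	MT_UC,		/* strong uncached (_PAGE_CACHE_UC) */
	MT_UC_MINUS,	/* weak uncached (_PAGE_CACHE_UC_MINUS), page attribute only */
	MT_WC,		/* write-combining */
	MT_WB		/* write-back */
};

/*
 * Combine the page attribute of a mapping with the MTRR type of the
 * underlying physical range. The mtrr argument is expected to be
 * MT_UC, MT_WC or MT_WB; MT_UC_MINUS is not an MTRR type and is
 * treated as MT_UC here.
 */
enum mem_type model_effective_type(enum mem_type page, enum mem_type mtrr)
{
	if (mtrr == MT_UC_MINUS)
		mtrr = MT_UC;

	switch (page) {
	case MT_UC:
		return MT_UC;	/* strong UC cannot be overridden */
	case MT_UC_MINUS:
		/* weak UC yields to a write-combining MTRR */
		return mtrr == MT_WC ? MT_WC : MT_UC;
	case MT_WC:
		return MT_WC;	/* WC page attribute wins regardless of MTRR */
	default:
		return mtrr;	/* WB page attribute defers to the MTRR */
	}
}
```

Under this model a driver that maps with ioremap_nocache() and then adds a WC MTRR only gets write-combining while ioremap_nocache() means UC_MINUS. If it is later switched to strong UC, the MTRR has no effect, which is why the mapping itself is requested with ioremap_wc().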
* [PATCH v1 34/47] video: fbdev: sisfb: use arch_phys_wc_add() and ioremap_wc()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (32 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 33/47] video: fbdev: savagefb: " Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 35/47] video: fbdev: aty: " Luis R. Rodriguez
` (13 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
This driver uses the same area for MTRR as for the ioremap().
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available; in order to
take advantage of that, also ensure the ioremap'd area is requested
as write-combining.
There are a few motivations for this:
a) Take advantage of PAT when available
b) Help bury MTRR code away; MTRR is architecture-specific and on
x86 it is replaced by PAT
c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdeffery and to remove redundant things which
arch_phys_wc_add() already handles, such as the verbose message
when MTRR setup fails and doing nothing when we didn't get
an MTRR.
@ mtrr_found @
expression index, base, size;
@@
-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);
@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@
-mtrr_del(index, base, size);
+arch_phys_wc_del(index);
@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@
-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);
@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@
-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);
@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);
@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/sis/sis.h | 2 +-
drivers/video/fbdev/sis/sis_main.c | 27 ++++++---------------------
2 files changed, 7 insertions(+), 22 deletions(-)
diff --git a/drivers/video/fbdev/sis/sis.h b/drivers/video/fbdev/sis/sis.h
index 1987f1b7..ea1d1c9 100644
--- a/drivers/video/fbdev/sis/sis.h
+++ b/drivers/video/fbdev/sis/sis.h
@@ -458,7 +458,7 @@ struct sis_video_info {
unsigned char *bios_abase;
- int mtrr;
+ int wc_cookie;
u32 sisfb_mem;
diff --git a/drivers/video/fbdev/sis/sis_main.c b/drivers/video/fbdev/sis/sis_main.c
index fcf610e..e923038 100644
--- a/drivers/video/fbdev/sis/sis_main.c
+++ b/drivers/video/fbdev/sis/sis_main.c
@@ -53,9 +53,6 @@
#include <linux/types.h>
#include <linux/uaccess.h>
#include <asm/io.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
#include "sis.h"
#include "sis_main.h"
@@ -4130,13 +4127,13 @@ static void sisfb_post_map_vram(struct sis_video_info *ivideo,
if (*mapsize < (min << 20))
return;
- ivideo->video_vbase = ioremap(ivideo->video_base, (*mapsize));
+ ivideo->video_vbase = ioremap_wc(ivideo->video_base, (*mapsize));
if(!ivideo->video_vbase) {
printk(KERN_ERR
"sisfb: Unable to map maximum video RAM for size detection\n");
(*mapsize) >>= 1;
- while((!(ivideo->video_vbase = ioremap(ivideo->video_base, (*mapsize))))) {
+ while((!(ivideo->video_vbase = ioremap_wc(ivideo->video_base, (*mapsize))))) {
(*mapsize) >>= 1;
if((*mapsize) < (min << 20))
break;
@@ -6186,7 +6183,7 @@ static int sisfb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
goto error_2;
}
- ivideo->video_vbase = ioremap(ivideo->video_base, ivideo->video_size);
+ ivideo->video_vbase = ioremap_wc(ivideo->video_base, ivideo->video_size);
ivideo->SiS_Pr.VideoMemoryAddress = ivideo->video_vbase;
if(!ivideo->video_vbase) {
printk(KERN_ERR "sisfb: Fatal error: Unable to map framebuffer memory\n");
@@ -6254,8 +6251,6 @@ error_3: vfree(ivideo->bios_abase);
ivideo->SiS_Pr.VideoMemoryAddress += ivideo->video_offset;
ivideo->SiS_Pr.VideoMemorySize = ivideo->sisfb_mem;
- ivideo->mtrr = -1;
-
ivideo->vbflags = 0;
ivideo->lcddefmodeidx = DEFAULT_LCDMODE;
ivideo->tvdefmodeidx = DEFAULT_TVMODE;
@@ -6443,14 +6438,8 @@ error_3: vfree(ivideo->bios_abase);
printk(KERN_DEBUG "sisfb: Initial vbflags 0x%x\n", (int)ivideo->vbflags);
-#ifdef CONFIG_MTRR
- ivideo->mtrr = mtrr_add(ivideo->video_base, ivideo->video_size,
- MTRR_TYPE_WRCOMB, 1);
- if(ivideo->mtrr < 0) {
- printk(KERN_DEBUG "sisfb: Failed to add MTRRs\n");
- }
-#endif
-
+ ivideo->wc_cookie = arch_phys_wc_add(ivideo->video_base,
+ ivideo->video_size);
if(register_framebuffer(sis_fb_info) < 0) {
printk(KERN_ERR "sisfb: Fatal error: Failed to register framebuffer\n");
ret = -EINVAL;
@@ -6507,11 +6496,7 @@ static void sisfb_remove(struct pci_dev *pdev)
pci_dev_put(ivideo->nbridge);
-#ifdef CONFIG_MTRR
- /* Release MTRR region */
- if(ivideo->mtrr >= 0)
- mtrr_del(ivideo->mtrr, ivideo->video_base, ivideo->video_size);
-#endif
+ arch_phys_wc_del(ivideo->wc_cookie);
/* If device was disabled when starting, disable
* it when quitting.
--
2.3.2.209.gd67f9d5.dirty
^ permalink raw reply related [flat|nested] 400+ messages in thread
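A note on the sisfb_post_map_vram() hunk above: the function retries the mapping at progressively smaller sizes, and after this patch every attempt requests write-combining. The control flow of that retry loop can be sketched in user space as below. This mirrors only the loop visible in the hunk. The mapper is a hypothetical stand-in that succeeds up to a configurable limit, and all model_* names are invented for illustration.

```c
#include <stddef.h>

/* User-space model only; all model_* names are hypothetical. */
static size_t model_map_limit;	/* largest size the fake mapper accepts */

/* Stand-in for ioremap_wc(): nonzero on success. */
int model_try_map(size_t size)
{
	return size <= model_map_limit;
}

/*
 * Try to map mapsize bytes; on failure keep halving the request.
 * Returns the size that was mapped, or 0 if the request dropped
 * below min_mb megabytes without succeeding.
 */
size_t model_map_largest(size_t mapsize, size_t min_mb)
{
	size_t floor = min_mb << 20;

	if (mapsize < floor)
		return 0;

	if (model_try_map(mapsize))
		return mapsize;

	mapsize >>= 1;
	while (!model_try_map(mapsize)) {
		mapsize >>= 1;
		if (mapsize < floor)
			return 0;
	}
	return mapsize;
}
```

With a limit of 8 MiB and a 32 MiB request, the model fails at 32 and 16 MiB and succeeds at 8 MiB. With a limit below the floor it gives up and reports 0.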
* [PATCH v1 35/47] video: fbdev: aty: use arch_phys_wc_add() and ioremap_wc()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (33 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 34/47] video: fbdev: sisfb: " Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 36/47] video: fbdev: i810: " Luis R. Rodriguez
` (12 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available; in order to
take advantage of that, also ensure the ioremap'd area is requested
as write-combining.
There are a few motivations for this:
a) Take advantage of PAT when available
b) Help bury MTRR code away; MTRR is architecture-specific and on
x86 it is replaced by PAT
c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdeffery and to remove redundant things which
arch_phys_wc_add() already handles, such as the verbose message
when MTRR setup fails and doing nothing when we didn't get
an MTRR.
@ mtrr_found @
expression index, base, size;
@@
-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);
@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@
-mtrr_del(index, base, size);
+arch_phys_wc_del(index);
@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@
-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);
@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@
-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);
@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);
@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/aty/aty128fb.c | 36 ++++++------------------------------
1 file changed, 6 insertions(+), 30 deletions(-)
diff --git a/drivers/video/fbdev/aty/aty128fb.c b/drivers/video/fbdev/aty/aty128fb.c
index aedf2fb..f41955b 100644
--- a/drivers/video/fbdev/aty/aty128fb.c
+++ b/drivers/video/fbdev/aty/aty128fb.c
@@ -80,10 +80,6 @@
#include <asm/btext.h>
#endif /* CONFIG_BOOTX_TEXT */
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
#include <video/aty128.h>
/* Debug flag */
@@ -399,10 +395,7 @@ static int default_cmode = CMODE_8;
static int default_crt_on = 0;
static int default_lcd_on = 1;
-
-#ifdef CONFIG_MTRR
static bool mtrr = true;
-#endif
#ifdef CONFIG_FB_ATY128_BACKLIGHT
#ifdef CONFIG_PMAC_BACKLIGHT
@@ -456,9 +449,7 @@ struct aty128fb_par {
u32 vram_size; /* onboard video ram */
int chip_gen;
const struct aty128_meminfo *mem; /* onboard mem info */
-#ifdef CONFIG_MTRR
- struct { int vram; int vram_valid; } mtrr;
-#endif
+ int wc_cookie;
int blitter_may_be_busy;
int fifo_slots; /* free slots in FIFO (64 max) */
@@ -1725,12 +1716,10 @@ static int aty128fb_setup(char *options)
#endif
continue;
}
-#ifdef CONFIG_MTRR
if(!strncmp(this_opt, "nomtrr", 6)) {
mtrr = 0;
continue;
}
-#endif
#ifdef CONFIG_PPC_PMAC
/* vmode and cmode deprecated */
if (!strncmp(this_opt, "vmode:", 6)) {
@@ -2133,7 +2122,7 @@ static int aty128_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
par->vram_size = aty_ld_le32(CNFG_MEMSIZE) & 0x03FFFFFF;
/* Virtualize the framebuffer */
- info->screen_base = ioremap(fb_addr, par->vram_size);
+ info->screen_base = ioremap_wc(fb_addr, par->vram_size);
if (!info->screen_base)
goto err_unmap_out;
@@ -2170,15 +2159,9 @@ static int aty128_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
if (!aty128_init(pdev, ent))
goto err_out;
-#ifdef CONFIG_MTRR
- if (mtrr) {
- par->mtrr.vram = mtrr_add(info->fix.smem_start,
- par->vram_size, MTRR_TYPE_WRCOMB, 1);
- par->mtrr.vram_valid = 1;
- /* let there be speed */
- printk(KERN_INFO "aty128fb: Rage128 MTRR set to ON\n");
- }
-#endif /* CONFIG_MTRR */
+ if (mtrr)
+ par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+ par->vram_size);
return 0;
err_out:
@@ -2212,11 +2195,7 @@ static void aty128_remove(struct pci_dev *pdev)
aty128_bl_exit(info->bl_dev);
#endif
-#ifdef CONFIG_MTRR
- if (par->mtrr.vram_valid)
- mtrr_del(par->mtrr.vram, info->fix.smem_start,
- par->vram_size);
-#endif /* CONFIG_MTRR */
+ arch_phys_wc_del(par->wc_cookie);
iounmap(par->regbase);
iounmap(info->screen_base);
@@ -2625,8 +2604,5 @@ MODULE_DESCRIPTION("FBDev driver for ATI Rage128 / Pro cards");
MODULE_LICENSE("GPL");
module_param(mode_option, charp, 0);
MODULE_PARM_DESC(mode_option, "Specify resolution as \"<xres>x<yres>[-<bpp>][@<refresh>]\" ");
-#ifdef CONFIG_MTRR
module_param_named(nomtrr, mtrr, invbool, 0);
MODULE_PARM_DESC(nomtrr, "bool: Disable MTRR support (0 or 1=disabled) (default=0)");
-#endif
-
--
2.3.2.209.gd67f9d5.dirty
^ permalink raw reply related [flat|nested] 400+ messages in thread
* [PATCH v1 36/47] video: fbdev: i810: use arch_phys_wc_add() and ioremap_wc()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (34 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 35/47] video: fbdev: aty: " Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 37/47] video: fbdev: i740fb: use arch_phys_wc_add() and pci_ioremap_wc_bar() Luis R. Rodriguez
` (11 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
The same area used for MTRR is used for the ioremap() area.
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available; in order to
take advantage of that, also ensure the ioremap'd area is requested
as write-combining.
There are a few motivations for this:
a) Take advantage of PAT when available
b) Help bury MTRR code away; MTRR is architecture-specific and on
x86 it is replaced by PAT
c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdeffery and to remove redundant things which
arch_phys_wc_add() already handles, such as the verbose message
when MTRR setup fails and doing nothing when we didn't get
an MTRR.
@ mtrr_found @
expression index, base, size;
@@
-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);
@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@
-mtrr_del(index, base, size);
+arch_phys_wc_del(index);
@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@
-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);
@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@
-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);
@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);
@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/i810/i810.h | 3 +--
drivers/video/fbdev/i810/i810_main.c | 11 +++++++----
drivers/video/fbdev/i810/i810_main.h | 26 --------------------------
3 files changed, 8 insertions(+), 32 deletions(-)
diff --git a/drivers/video/fbdev/i810/i810.h b/drivers/video/fbdev/i810/i810.h
index 1414b73..7b1c002 100644
--- a/drivers/video/fbdev/i810/i810.h
+++ b/drivers/video/fbdev/i810/i810.h
@@ -199,7 +199,6 @@
#define HAS_FONTCACHE 8
/* driver flags */
-#define HAS_MTRR 1
#define HAS_ACCELERATION 2
#define ALWAYS_SYNC 4
#define LOCKUP 8
@@ -281,7 +280,7 @@ struct i810fb_par {
u32 ovract;
u32 cur_state;
u32 ddc_num;
- int mtrr_reg;
+ int wc_cookie;
u16 bltcntl;
u8 interlace;
};
diff --git a/drivers/video/fbdev/i810/i810_main.c b/drivers/video/fbdev/i810/i810_main.c
index bb674e4..025b882 100644
--- a/drivers/video/fbdev/i810/i810_main.c
+++ b/drivers/video/fbdev/i810/i810_main.c
@@ -41,6 +41,7 @@
#include <linux/resource.h>
#include <linux/unistd.h>
#include <linux/console.h>
+#include <linux/io.h>
#include <asm/io.h>
#include <asm/div64.h>
@@ -1816,7 +1817,9 @@ static void i810_init_device(struct i810fb_par *par)
u8 reg;
u8 __iomem *mmio = par->mmio_start_virtual;
- if (mtrr) set_mtrr(par);
+ if (mtrr)
+ par->wc_cookie = arch_phys_wc_add((u32) par->aperture.physical,
+ par->aperture.size);
i810_init_cursor(par);
@@ -1865,8 +1868,8 @@ static int i810_allocate_pci_resource(struct i810fb_par *par,
}
par->res_flags |= FRAMEBUFFER_REQ;
- par->aperture.virtual = ioremap_nocache(par->aperture.physical,
- par->aperture.size);
+ par->aperture.virtual = ioremap_wc(par->aperture.physical,
+ par->aperture.size);
if (!par->aperture.virtual) {
printk("i810fb_init: cannot remap framebuffer region\n");
return -ENODEV;
@@ -2096,7 +2099,7 @@ static void i810fb_release_resource(struct fb_info *info,
struct i810fb_par *par)
{
struct gtt_data *gtt = &par->i810_gtt;
- unset_mtrr(par);
+ arch_phys_wc_del(par->wc_cookie);
i810_delete_i2c_busses(par);
diff --git a/drivers/video/fbdev/i810/i810_main.h b/drivers/video/fbdev/i810/i810_main.h
index a25afaa..7bfaaad 100644
--- a/drivers/video/fbdev/i810/i810_main.h
+++ b/drivers/video/fbdev/i810/i810_main.h
@@ -60,32 +60,6 @@ static inline void flush_cache(void)
#define flush_cache() do { } while(0)
#endif
-#ifdef CONFIG_MTRR
-
-#include <asm/mtrr.h>
-
-static inline void set_mtrr(struct i810fb_par *par)
-{
- par->mtrr_reg = mtrr_add((u32) par->aperture.physical,
- par->aperture.size, MTRR_TYPE_WRCOMB, 1);
- if (par->mtrr_reg < 0) {
- printk(KERN_ERR "set_mtrr: unable to set MTRR\n");
- return;
- }
- par->dev_flags |= HAS_MTRR;
-}
-static inline void unset_mtrr(struct i810fb_par *par)
-{
- if (par->dev_flags & HAS_MTRR)
- mtrr_del(par->mtrr_reg, (u32) par->aperture.physical,
- par->aperture.size);
-}
-#else
-#define set_mtrr(x) printk("set_mtrr: MTRR is disabled in the kernel\n")
-
-#define unset_mtrr(x) do { } while (0)
-#endif /* CONFIG_MTRR */
-
#ifdef CONFIG_FB_I810_GTF
#define IS_DVT (0)
#else
--
2.3.2.209.gd67f9d5.dirty
^ permalink raw reply related [flat|nested] 400+ messages in thread
* [PATCH v1 37/47] video: fbdev: i740fb: use arch_phys_wc_add() and pci_ioremap_wc_bar()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (35 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 36/47] video: fbdev: i810: " Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 38/47] video: fbdev: kyrofb: " Luis R. Rodriguez
` (10 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available; in order to
take advantage of that, also ensure the ioremap'd area is requested
as write-combining.
There are a few motivations for this:
a) Take advantage of PAT when available
b) Help bury MTRR code away; MTRR is architecture-specific and on
x86 it is replaced by PAT
c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdeffery and to remove redundant things which
arch_phys_wc_add() already handles, such as the verbose message
when MTRR setup fails and doing nothing when we didn't get
an MTRR.
@ mtrr_found @
expression index, base, size;
@@
-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);
@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@
-mtrr_del(index, base, size);
+arch_phys_wc_del(index);
@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@
-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);
@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@
-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);
@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);
@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/i740fb.c | 35 ++++++-----------------------------
1 file changed, 6 insertions(+), 29 deletions(-)
diff --git a/drivers/video/fbdev/i740fb.c b/drivers/video/fbdev/i740fb.c
index a2b4204..452e116 100644
--- a/drivers/video/fbdev/i740fb.c
+++ b/drivers/video/fbdev/i740fb.c
@@ -27,24 +27,15 @@
#include <linux/console.h>
#include <video/vga.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
#include "i740_reg.h"
static char *mode_option;
-
-#ifdef CONFIG_MTRR
static int mtrr = 1;
-#endif
struct i740fb_par {
unsigned char __iomem *regs;
bool has_sgram;
-#ifdef CONFIG_MTRR
- int mtrr_reg;
-#endif
+ int wc_cookie;
bool ddc_registered;
struct i2c_adapter ddc_adapter;
struct i2c_algo_bit_data ddc_algo;
@@ -1040,7 +1031,7 @@ static int i740fb_probe(struct pci_dev *dev, const struct pci_device_id *ent)
goto err_request_regions;
}
- info->screen_base = pci_ioremap_bar(dev, 0);
+ info->screen_base = pci_ioremap_wc_bar(dev, 0);
if (!info->screen_base) {
dev_err(info->device, "error remapping base\n");
ret = -ENOMEM;
@@ -1144,13 +1135,9 @@ static int i740fb_probe(struct pci_dev *dev, const struct pci_device_id *ent)
fb_info(info, "%s frame buffer device\n", info->fix.id);
pci_set_drvdata(dev, info);
-#ifdef CONFIG_MTRR
- if (mtrr) {
- par->mtrr_reg = -1;
- par->mtrr_reg = mtrr_add(info->fix.smem_start,
- info->fix.smem_len, MTRR_TYPE_WRCOMB, 1);
- }
-#endif
+ if (mtrr)
+ par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+ info->fix.smem_len);
return 0;
err_reg_framebuffer:
@@ -1177,13 +1164,7 @@ static void i740fb_remove(struct pci_dev *dev)
if (info) {
struct i740fb_par *par = info->par;
-
-#ifdef CONFIG_MTRR
- if (par->mtrr_reg >= 0) {
- mtrr_del(par->mtrr_reg, 0, 0);
- par->mtrr_reg = -1;
- }
-#endif
+ arch_phys_wc_del(par->wc_cookie);
unregister_framebuffer(info);
fb_dealloc_cmap(&info->cmap);
if (par->ddc_registered)
@@ -1287,10 +1268,8 @@ static int __init i740fb_setup(char *options)
while ((opt = strsep(&options, ",")) != NULL) {
if (!*opt)
continue;
-#ifdef CONFIG_MTRR
else if (!strncmp(opt, "mtrr:", 5))
mtrr = simple_strtoul(opt + 5, NULL, 0);
-#endif
else
mode_option = opt;
}
@@ -1327,7 +1306,5 @@ MODULE_DESCRIPTION("fbdev driver for Intel740");
module_param(mode_option, charp, 0444);
MODULE_PARM_DESC(mode_option, "Default video mode ('640x480-8@60', etc)");
-#ifdef CONFIG_MTRR
module_param(mtrr, int, 0444);
MODULE_PARM_DESC(mtrr, "Enable write-combining with MTRR (1=enable, 0=disable, default=1)");
-#endif
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 38/47] video: fbdev: kyrofb: use arch_phys_wc_add() and pci_ioremap_wc_bar()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (36 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 37/47] video: fbdev: i740fb: use arch_phys_wc_add() and pci_ioremap_wc_bar() Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 39/47] video: fbdev: pm2fb: use arch_phys_wc_add() and ioremap_wc() Luis R. Rodriguez
` (9 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available through PAT. In
order to take advantage of that, also ensure the ioremap'd area
is requested as write-combining.
There are a few motivations for this:
a) Take advantage of PAT when available
b) Help bury MTRR code away; MTRR is architecture-specific and on
x86 it is replaced by PAT
c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdery and to remove redundant things which
arch_phys_wc_add() already handles, such as printing a verbose
message when adding an MTRR fails and doing nothing when we
didn't get an MTRR.
@ mtrr_found @
expression index, base, size;
@@
-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);
@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@
-mtrr_del(index, base, size);
+arch_phys_wc_del(index);
@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@
-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);
@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@
-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);
@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);
@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/kyro/fbdev.c | 33 +++++++++++----------------------
include/video/kyro.h | 4 +---
2 files changed, 12 insertions(+), 25 deletions(-)
diff --git a/drivers/video/fbdev/kyro/fbdev.c b/drivers/video/fbdev/kyro/fbdev.c
index 65041e1..5bb0153 100644
--- a/drivers/video/fbdev/kyro/fbdev.c
+++ b/drivers/video/fbdev/kyro/fbdev.c
@@ -22,9 +22,6 @@
#include <linux/pci.h>
#include <asm/io.h>
#include <linux/uaccess.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
#include <video/kyro.h>
@@ -84,9 +81,7 @@ static device_info_t deviceInfo;
static char *mode_option = NULL;
static int nopan = 0;
static int nowrap = 1;
-#ifdef CONFIG_MTRR
static int nomtrr = 0;
-#endif
/* PCI driver prototypes */
static int kyrofb_probe(struct pci_dev *pdev, const struct pci_device_id *ent);
@@ -570,10 +565,8 @@ static int __init kyrofb_setup(char *options)
nopan = 1;
} else if (strcmp(this_opt, "nowrap") == 0) {
nowrap = 1;
-#ifdef CONFIG_MTRR
} else if (strcmp(this_opt, "nomtrr") == 0) {
nomtrr = 1;
-#endif
} else {
mode_option = this_opt;
}
@@ -691,17 +684,16 @@ static int kyrofb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
currentpar->regbase = deviceInfo.pSTGReg =
ioremap_nocache(kyro_fix.mmio_start, kyro_fix.mmio_len);
+ if (!currentpar->regbase)
+ goto out_free_fb;
- info->screen_base = ioremap_nocache(kyro_fix.smem_start,
- kyro_fix.smem_len);
+ info->screen_base = pci_ioremap_wc_bar(pdev, 0);
+ if (!info->screen_base)
+ goto out_unmap_regs;
-#ifdef CONFIG_MTRR
if (!nomtrr)
- currentpar->mtrr_handle =
- mtrr_add(kyro_fix.smem_start,
- kyro_fix.smem_len,
- MTRR_TYPE_WRCOMB, 1);
-#endif
+ currentpar->wc_cookie = arch_phys_wc_add(kyro_fix.smem_start,
+ kyro_fix.smem_len);
kyro_fix.ypanstep = nopan ? 0 : 1;
kyro_fix.ywrapstep = nowrap ? 0 : 1;
@@ -745,8 +737,10 @@ static int kyrofb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
return 0;
out_unmap:
- iounmap(currentpar->regbase);
iounmap(info->screen_base);
+out_unmap_regs:
+ iounmap(currentpar->regbase);
+out_free_fb:
framebuffer_release(info);
return -EINVAL;
@@ -770,12 +764,7 @@ static void kyrofb_remove(struct pci_dev *pdev)
iounmap(info->screen_base);
iounmap(par->regbase);
-#ifdef CONFIG_MTRR
- if (par->mtrr_handle)
- mtrr_del(par->mtrr_handle,
- info->fix.smem_start,
- info->fix.smem_len);
-#endif
+ arch_phys_wc_del(par->wc_cookie);
unregister_framebuffer(info);
framebuffer_release(info);
diff --git a/include/video/kyro.h b/include/video/kyro.h
index c563968..b958c2e 100644
--- a/include/video/kyro.h
+++ b/include/video/kyro.h
@@ -35,9 +35,7 @@ struct kyrofb_info {
/* Useful to hold depth here for Linux */
u8 PIXDEPTH;
-#ifdef CONFIG_MTRR
- int mtrr_handle;
-#endif
+ int wc_cookie;
};
extern int kyro_dev_init(void);
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 39/47] video: fbdev: pm2fb: use arch_phys_wc_add() and ioremap_wc()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (37 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 38/47] video: fbdev: kyrofb: " Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 40/47] video: fbdev: pm3fb: " Luis R. Rodriguez
` (8 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
This driver uses the same area for MTRR as for the ioremap().
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available through PAT. In
order to take advantage of that, also ensure the ioremap'd area
is requested as write-combining.
There are a few motivations for this:
a) Take advantage of PAT when available
b) Help bury MTRR code away; MTRR is architecture-specific and on
x86 it is replaced by PAT
c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
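Motivation c) is easier to see with the way x86 combines the PAT type
of a mapping with the MTRR type covering the same range. The sketch
below encodes those rules as I understand them from the Intel SDM,
restricted to the types relevant here. The enum and function names
are made up for illustration and are not kernel symbols.

```c
#include <assert.h>

/* Memory types relevant to this discussion (illustrative names). */
enum mem_type { MT_UC, MT_UC_MINUS, MT_WC, MT_WB };

/*
 * Effective type of an access, given the PAT type of the mapping and
 * the MTRR type covering the physical range.
 */
static enum mem_type effective_type(enum mem_type pat, enum mem_type mtrr)
{
	assert(mtrr != MT_UC_MINUS);	/* UC- exists only as a PAT type */

	switch (pat) {
	case MT_UC:
		return MT_UC;		/* strong UC: an MTRR cannot override it */
	case MT_UC_MINUS:
		/* UC- behaves as UC unless a WC MTRR covers the range */
		return mtrr == MT_WC ? MT_WC : MT_UC;
	case MT_WC:
		return MT_WC;		/* WC comes from PAT alone */
	case MT_WB:
	default:
		return mtrr;		/* the MTRR decides */
	}
}
```

With a UC- mapping from ioremap_nocache() a WC MTRR still yields
write-combining, which is what these drivers relied on. If
ioremap_nocache() moves to strong UC, the same MTRR no longer has any
effect, so the framebuffer has to be mapped with ioremap_wc() instead.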
The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdery and to remove redundant things which
arch_phys_wc_add() already handles, such as printing a verbose
message when adding an MTRR fails and doing nothing when we
didn't get an MTRR.
@ mtrr_found @
expression index, base, size;
@@
-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);
@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@
-mtrr_del(index, base, size);
+arch_phys_wc_del(index);
@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@
-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);
@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@
-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);
@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);
@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/pm2fb.c | 31 +++++--------------------------
1 file changed, 5 insertions(+), 26 deletions(-)
diff --git a/drivers/video/fbdev/pm2fb.c b/drivers/video/fbdev/pm2fb.c
index 3b85b64..aa8d288 100644
--- a/drivers/video/fbdev/pm2fb.c
+++ b/drivers/video/fbdev/pm2fb.c
@@ -38,10 +38,6 @@
#include <linux/fb.h>
#include <linux/init.h>
#include <linux/pci.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
#include <video/permedia2.h>
#include <video/cvisionppc.h>
@@ -81,10 +77,7 @@ static char *mode_option;
static bool lowhsync;
static bool lowvsync;
static bool noaccel;
-/* mtrr option */
-#ifdef CONFIG_MTRR
static bool nomtrr;
-#endif
/*
* The hardware state of the graphics card that isn't part of the
@@ -100,7 +93,7 @@ struct pm2fb_par
u32 mem_control; /* MemControl reg at probe */
u32 boot_address; /* BootAddress reg at probe */
u32 palette[16];
- int mtrr_handle;
+ int wc_cookie;
};
/*
@@ -1637,21 +1630,16 @@ static int pm2fb_probe(struct pci_dev *pdev, const struct pci_device_id *id)
goto err_exit_mmio;
}
info->screen_base =
- ioremap_nocache(pm2fb_fix.smem_start, pm2fb_fix.smem_len);
+ ioremap_wc(pm2fb_fix.smem_start, pm2fb_fix.smem_len);
if (!info->screen_base) {
printk(KERN_WARNING "pm2fb: Can't ioremap smem area.\n");
release_mem_region(pm2fb_fix.smem_start, pm2fb_fix.smem_len);
goto err_exit_mmio;
}
-#ifdef CONFIG_MTRR
- default_par->mtrr_handle = -1;
if (!nomtrr)
- default_par->mtrr_handle =
- mtrr_add(pm2fb_fix.smem_start,
- pm2fb_fix.smem_len,
- MTRR_TYPE_WRCOMB, 1);
-#endif
+ default_par->wc_cookie = arch_phys_wc_add(pm2fb_fix.smem_start,
+ pm2fb_fix.smem_len);
info->fbops = &pm2fb_ops;
info->fix = pm2fb_fix;
@@ -1733,12 +1721,7 @@ static void pm2fb_remove(struct pci_dev *pdev)
struct pm2fb_par *par = info->par;
unregister_framebuffer(info);
-
-#ifdef CONFIG_MTRR
- if (par->mtrr_handle >= 0)
- mtrr_del(par->mtrr_handle, info->fix.smem_start,
- info->fix.smem_len);
-#endif /* CONFIG_MTRR */
+ arch_phys_wc_del(par->wc_cookie);
iounmap(info->screen_base);
release_mem_region(fix->smem_start, fix->smem_len);
iounmap(par->v_regs);
@@ -1791,10 +1774,8 @@ static int __init pm2fb_setup(char *options)
lowvsync = 1;
else if (!strncmp(this_opt, "hwcursor=", 9))
hwcursor = simple_strtoul(this_opt + 9, NULL, 0);
-#ifdef CONFIG_MTRR
else if (!strncmp(this_opt, "nomtrr", 6))
nomtrr = 1;
-#endif
else if (!strncmp(this_opt, "noaccel", 7))
noaccel = 1;
else
@@ -1847,10 +1828,8 @@ MODULE_PARM_DESC(noaccel, "Disable acceleration");
module_param(hwcursor, int, 0644);
MODULE_PARM_DESC(hwcursor, "Enable hardware cursor "
"(1=enable, 0=disable, default=1)");
-#ifdef CONFIG_MTRR
module_param(nomtrr, bool, 0);
MODULE_PARM_DESC(nomtrr, "Disable MTRR support (0 or 1=disabled) (default=0)");
-#endif
MODULE_AUTHOR("Jim Hague <jim.hague@acm.org>");
MODULE_DESCRIPTION("Permedia2 framebuffer device driver");
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 40/47] video: fbdev: pm3fb: use arch_phys_wc_add() and ioremap_wc()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (38 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 39/47] video: fbdev: pm2fb: use arch_phys_wc_add() and ioremap_wc() Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 41/47] video: fbdev: rivafb: " Luis R. Rodriguez
` (7 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
This driver uses the same area for MTRR as for the ioremap().
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available through PAT. In
order to take advantage of that, also ensure the ioremap'd area
is requested as write-combining.
There are a few motivations for this:
a) Take advantage of PAT when available
b) Help bury MTRR code away; MTRR is architecture-specific and on
x86 it is replaced by PAT
c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdery and to remove redundant things which
arch_phys_wc_add() already handles, such as printing a verbose
message when adding an MTRR fails and doing nothing when we
didn't get an MTRR.
@ mtrr_found @
expression index, base, size;
@@
-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);
@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@
-mtrr_del(index, base, size);
+arch_phys_wc_del(index);
@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@
-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);
@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@
-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);
@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);
@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/pm3fb.c | 30 ++++++------------------------
1 file changed, 6 insertions(+), 24 deletions(-)
diff --git a/drivers/video/fbdev/pm3fb.c b/drivers/video/fbdev/pm3fb.c
index 77b99ed..6ff5077 100644
--- a/drivers/video/fbdev/pm3fb.c
+++ b/drivers/video/fbdev/pm3fb.c
@@ -32,9 +32,6 @@
#include <linux/fb.h>
#include <linux/init.h>
#include <linux/pci.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
#include <video/pm3fb.h>
@@ -58,11 +55,7 @@
static int hwcursor = 1;
static char *mode_option;
static bool noaccel;
-
-/* mtrr option */
-#ifdef CONFIG_MTRR
static bool nomtrr;
-#endif
/*
* This structure defines the hardware state of the graphics card. Normally
@@ -76,7 +69,7 @@ struct pm3_par {
u32 video; /* video flags before blanking */
u32 base; /* screen base in 128 bits unit */
u32 palette[16];
- int mtrr_handle;
+ int wc_cookie;
};
/*
@@ -1374,8 +1367,8 @@ static int pm3fb_probe(struct pci_dev *dev, const struct pci_device_id *ent)
printk(KERN_WARNING "pm3fb: Can't reserve smem.\n");
goto err_exit_mmio;
}
- info->screen_base =
- ioremap_nocache(pm3fb_fix.smem_start, pm3fb_fix.smem_len);
+ info->screen_base = ioremap_wc(pm3fb_fix.smem_start,
+ pm3fb_fix.smem_len);
if (!info->screen_base) {
printk(KERN_WARNING "pm3fb: Can't ioremap smem area.\n");
release_mem_region(pm3fb_fix.smem_start, pm3fb_fix.smem_len);
@@ -1383,12 +1376,9 @@ static int pm3fb_probe(struct pci_dev *dev, const struct pci_device_id *ent)
}
info->screen_size = pm3fb_fix.smem_len;
-#ifdef CONFIG_MTRR
if (!nomtrr)
- par->mtrr_handle = mtrr_add(pm3fb_fix.smem_start,
- pm3fb_fix.smem_len,
- MTRR_TYPE_WRCOMB, 1);
-#endif
+ par->wc_cookie = arch_phys_wc_add(pm3fb_fix.smem_start,
+ pm3fb_fix.smem_len);
info->fbops = &pm3fb_ops;
par->video = PM3_READ_REG(par, PM3VideoControl);
@@ -1478,11 +1468,7 @@ static void pm3fb_remove(struct pci_dev *dev)
unregister_framebuffer(info);
fb_dealloc_cmap(&info->cmap);
-#ifdef CONFIG_MTRR
- if (par->mtrr_handle >= 0)
- mtrr_del(par->mtrr_handle, info->fix.smem_start,
- info->fix.smem_len);
-#endif /* CONFIG_MTRR */
+ arch_phys_wc_del(par->wc_cookie);
iounmap(info->screen_base);
release_mem_region(fix->smem_start, fix->smem_len);
iounmap(par->v_regs);
@@ -1533,10 +1519,8 @@ static int __init pm3fb_setup(char *options)
noaccel = 1;
else if (!strncmp(this_opt, "hwcursor=", 9))
hwcursor = simple_strtoul(this_opt + 9, NULL, 0);
-#ifdef CONFIG_MTRR
else if (!strncmp(this_opt, "nomtrr", 6))
nomtrr = 1;
-#endif
else
mode_option = this_opt;
}
@@ -1577,10 +1561,8 @@ MODULE_PARM_DESC(noaccel, "Disable acceleration");
module_param(hwcursor, int, 0644);
MODULE_PARM_DESC(hwcursor, "Enable hardware cursor "
"(1=enable, 0=disable, default=1)");
-#ifdef CONFIG_MTRR
module_param(nomtrr, bool, 0);
MODULE_PARM_DESC(nomtrr, "Disable MTRR support (0 or 1=disabled) (default=0)");
-#endif
MODULE_DESCRIPTION("Permedia3 framebuffer device driver");
MODULE_LICENSE("GPL");
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 41/47] video: fbdev: rivafb: use arch_phys_wc_add() and ioremap_wc()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (39 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 40/47] video: fbdev: pm3fb: " Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 42/47] video: fbdev: tdfxfb: " Luis R. Rodriguez
` (6 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
This driver uses the same area for MTRR as for the ioremap().
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available through PAT. In
order to take advantage of that, also ensure the ioremap'd area
is requested as write-combining.
There are a few motivations for this:
a) Take advantage of PAT when available
b) Help bury MTRR code away; MTRR is architecture-specific and on
x86 it is replaced by PAT
c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdery and to remove redundant things which
arch_phys_wc_add() already handles, such as printing a verbose
message when adding an MTRR fails and doing nothing when we
didn't get an MTRR.
@ mtrr_found @
expression index, base, size;
@@
-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);
@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@
-mtrr_del(index, base, size);
+arch_phys_wc_del(index);
@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@
-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);
@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@
-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);
@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);
@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
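The manual part of the conversion (dropping the validity flag and the
guarded delete) can be sketched in userspace as follows. This is only
a model under stated assumptions: the model_* functions stand in for
arch_phys_wc_add() and arch_phys_wc_del(), the counter stands in for
hardware state, and the cookie value is arbitrary.

```c
#include <assert.h>

/* Hypothetical counter standing in for MTRRs currently allocated. */
static int model_mtrrs_in_use;

/* Stand-in for arch_phys_wc_add(); use_mtrr models "PAT is unavailable". */
static int model_wc_add(int use_mtrr)
{
	if (!use_mtrr)
		return 0;		/* nothing allocated, nothing to undo */
	model_mtrrs_in_use++;
	return 1000;			/* arbitrary positive cookie for the sketch */
}

/* Stand-in for arch_phys_wc_del(): ignores cookies that hold no MTRR. */
static void model_wc_del(int cookie)
{
	if (cookie >= 1)
		model_mtrrs_in_use--;
}

/* One int replaces the old { vram, vram_valid } pair. */
struct model_par {
	int wc_cookie;
};

static void model_probe(struct model_par *par, int nomtrr, int use_mtrr)
{
	/* Set explicitly here; a zero-allocated par starts in this state. */
	par->wc_cookie = 0;
	if (!nomtrr)
		par->wc_cookie = model_wc_add(use_mtrr);
}

static void model_remove(struct model_par *par)
{
	/* No #ifdef and no validity check: a 0 cookie is simply ignored. */
	model_wc_del(par->wc_cookie);
}
```

The remove path is the same whether the user passed nomtrr, PAT was in
use, or an MTRR was actually allocated, which is why the per-driver
bookkeeping can go away.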
Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/riva/fbdev.c | 39 +++++++--------------------------------
drivers/video/fbdev/riva/rivafb.h | 4 +---
2 files changed, 8 insertions(+), 35 deletions(-)
diff --git a/drivers/video/fbdev/riva/fbdev.c b/drivers/video/fbdev/riva/fbdev.c
index be73727..854b86d 100644
--- a/drivers/video/fbdev/riva/fbdev.c
+++ b/drivers/video/fbdev/riva/fbdev.c
@@ -41,9 +41,6 @@
#include <linux/pci.h>
#include <linux/backlight.h>
#include <linux/bitrev.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
#ifdef CONFIG_PPC_OF
#include <asm/prom.h>
#include <asm/pci-bridge.h>
@@ -208,9 +205,7 @@ MODULE_DEVICE_TABLE(pci, rivafb_pci_tbl);
static int flatpanel = -1; /* Autodetect later */
static int forceCRTC = -1;
static bool noaccel = 0;
-#ifdef CONFIG_MTRR
static bool nomtrr = 0;
-#endif
#ifdef CONFIG_PMAC_BACKLIGHT
static int backlight = 1;
#else
@@ -2013,28 +2008,18 @@ static int rivafb_probe(struct pci_dev *pd, const struct pci_device_id *ent)
rivafb_fix.smem_len = riva_get_memlen(default_par) * 1024;
default_par->dclk_max = riva_get_maxdclk(default_par) * 1000;
- info->screen_base = ioremap(rivafb_fix.smem_start,
- rivafb_fix.smem_len);
+ info->screen_base = ioremap_wc(rivafb_fix.smem_start,
+ rivafb_fix.smem_len);
if (!info->screen_base) {
printk(KERN_ERR PFX "cannot ioremap FB base\n");
ret = -EIO;
goto err_iounmap_pramin;
}
-#ifdef CONFIG_MTRR
- if (!nomtrr) {
- default_par->mtrr.vram = mtrr_add(rivafb_fix.smem_start,
- rivafb_fix.smem_len,
- MTRR_TYPE_WRCOMB, 1);
- if (default_par->mtrr.vram < 0) {
- printk(KERN_ERR PFX "unable to setup MTRR\n");
- } else {
- default_par->mtrr.vram_valid = 1;
- /* let there be speed */
- printk(KERN_INFO PFX "RIVA MTRR set to ON\n");
- }
- }
-#endif /* CONFIG_MTRR */
+ if (!nomtrr)
+ default_par->wc_cookie =
+ arch_phys_wc_add(rivafb_fix.smem_start,
+ rivafb_fix.smem_len);
info->fbops = &riva_fb_ops;
info->fix = rivafb_fix;
@@ -2108,13 +2093,7 @@ static void rivafb_remove(struct pci_dev *pd)
unregister_framebuffer(info);
riva_bl_exit(info);
-
-#ifdef CONFIG_MTRR
- if (par->mtrr.vram_valid)
- mtrr_del(par->mtrr.vram, info->fix.smem_start,
- info->fix.smem_len);
-#endif /* CONFIG_MTRR */
-
+ arch_phys_wc_del(par->wc_cookie);
iounmap(par->ctrl_base);
iounmap(info->screen_base);
if (par->riva.Architecture == NV_ARCH_03)
@@ -2153,10 +2132,8 @@ static int rivafb_setup(char *options)
flatpanel = 1;
} else if (!strncmp(this_opt, "backlight:", 10)) {
backlight = simple_strtoul(this_opt+10, NULL, 0);
-#ifdef CONFIG_MTRR
} else if (!strncmp(this_opt, "nomtrr", 6)) {
nomtrr = 1;
-#endif
} else if (!strncmp(this_opt, "strictmode", 10)) {
strictmode = 1;
} else if (!strncmp(this_opt, "noaccel", 7)) {
@@ -2212,10 +2189,8 @@ module_param(flatpanel, int, 0);
MODULE_PARM_DESC(flatpanel, "Enables experimental flat panel support for some chipsets. (0 or 1=enabled) (default=0)");
module_param(forceCRTC, int, 0);
MODULE_PARM_DESC(forceCRTC, "Forces usage of a particular CRTC in case autodetection fails. (0 or 1) (default=autodetect)");
-#ifdef CONFIG_MTRR
module_param(nomtrr, bool, 0);
MODULE_PARM_DESC(nomtrr, "Disables MTRR support (0 or 1=disabled) (default=0)");
-#endif
module_param(strictmode, bool, 0);
MODULE_PARM_DESC(strictmode, "Only use video modes from EDID");
diff --git a/drivers/video/fbdev/riva/rivafb.h b/drivers/video/fbdev/riva/rivafb.h
index d9f107b..61fd37c 100644
--- a/drivers/video/fbdev/riva/rivafb.h
+++ b/drivers/video/fbdev/riva/rivafb.h
@@ -61,9 +61,7 @@ struct riva_par {
int FlatPanel;
struct pci_dev *pdev;
int cursor_reset;
-#ifdef CONFIG_MTRR
- struct { int vram; int vram_valid; } mtrr;
-#endif
+ int wc_cookie;
struct riva_i2c_chan chan[3];
};
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 42/47] video: fbdev: tdfxfb: use arch_phys_wc_add() and ioremap_wc()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (40 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 41/47] video: fbdev: rivafb: " Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 43/47] video: fbdev: vt8623fb: use arch_phys_wc_add() and pci_iomap_wc() Luis R. Rodriguez
` (5 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
This driver uses the same area for MTRR as for the ioremap().
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available through PAT. In
order to take advantage of that, also ensure the ioremap'd area
is requested as write-combining.
There are a few motivations for this:
a) Take advantage of PAT when available
b) Help bury MTRR code away; MTRR is architecture-specific and on
x86 it is replaced by PAT
c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdery and to remove redundant things which
arch_phys_wc_add() already handles, such as printing a verbose
message when adding an MTRR fails and doing nothing when we
didn't get an MTRR.
@ mtrr_found @
expression index, base, size;
@@
-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);
@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@
-mtrr_del(index, base, size);
+arch_phys_wc_del(index);
@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@
-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);
@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@
-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);
@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);
@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/tdfxfb.c | 41 ++++++-----------------------------------
include/video/tdfx.h | 2 +-
2 files changed, 7 insertions(+), 36 deletions(-)
diff --git a/drivers/video/fbdev/tdfxfb.c b/drivers/video/fbdev/tdfxfb.c
index f761fe3..621fa44 100644
--- a/drivers/video/fbdev/tdfxfb.c
+++ b/drivers/video/fbdev/tdfxfb.c
@@ -78,24 +78,6 @@
#define DPRINTK(a, b...) pr_debug("fb: %s: " a, __func__ , ## b)
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#else
-/* duplicate asm/mtrr.h defines to work on archs without mtrr */
-#define MTRR_TYPE_WRCOMB 1
-
-static inline int mtrr_add(unsigned long base, unsigned long size,
- unsigned int type, char increment)
-{
- return -ENODEV;
-}
-static inline int mtrr_del(int reg, unsigned long base,
- unsigned long size)
-{
- return -ENODEV;
-}
-#endif
-
#define BANSHEE_MAX_PIXCLOCK 270000
#define VOODOO3_MAX_PIXCLOCK 300000
#define VOODOO5_MAX_PIXCLOCK 350000
@@ -167,7 +149,6 @@ static int nopan;
static int nowrap = 1; /* not implemented (yet) */
static int hwcursor = 1;
static char *mode_option;
-/* mtrr option */
static bool nomtrr;
/* -------------------------------------------------------------------------
@@ -1454,8 +1435,8 @@ static int tdfxfb_probe(struct pci_dev *pdev, const struct pci_device_id *id)
goto out_err_regbase;
}
- info->screen_base = ioremap_nocache(info->fix.smem_start,
- info->fix.smem_len);
+ info->screen_base = ioremap_wc(info->fix.smem_start,
+ info->fix.smem_len);
if (!info->screen_base) {
printk(KERN_ERR "fb: Can't remap %s framebuffer.\n",
info->fix.id);
@@ -1473,11 +1454,9 @@ static int tdfxfb_probe(struct pci_dev *pdev, const struct pci_device_id *id)
printk(KERN_INFO "fb: %s memory = %dK\n", info->fix.id,
info->fix.smem_len >> 10);
- default_par->mtrr_handle = -1;
if (!nomtrr)
- default_par->mtrr_handle =
- mtrr_add(info->fix.smem_start, info->fix.smem_len,
- MTRR_TYPE_WRCOMB, 1);
+ default_par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+ info->fix.smem_len);
info->fix.ypanstep = nopan ? 0 : 1;
info->fix.ywrapstep = nowrap ? 0 : 1;
@@ -1566,9 +1545,7 @@ out_err_iobase:
#ifdef CONFIG_FB_3DFX_I2C
tdfxfb_delete_i2c_busses(default_par);
#endif
- if (default_par->mtrr_handle >= 0)
- mtrr_del(default_par->mtrr_handle, info->fix.smem_start,
- info->fix.smem_len);
+ arch_phys_wc_del(default_par->wc_cookie);
release_region(pci_resource_start(pdev, 2),
pci_resource_len(pdev, 2));
out_err_screenbase:
@@ -1604,10 +1581,8 @@ static void __init tdfxfb_setup(char *options)
nowrap = 1;
} else if (!strncmp(this_opt, "hwcursor=", 9)) {
hwcursor = simple_strtoul(this_opt + 9, NULL, 0);
-#ifdef CONFIG_MTRR
} else if (!strncmp(this_opt, "nomtrr", 6)) {
nomtrr = 1;
-#endif
} else {
mode_option = this_opt;
}
@@ -1633,9 +1608,7 @@ static void tdfxfb_remove(struct pci_dev *pdev)
#ifdef CONFIG_FB_3DFX_I2C
tdfxfb_delete_i2c_busses(par);
#endif
- if (par->mtrr_handle >= 0)
- mtrr_del(par->mtrr_handle, info->fix.smem_start,
- info->fix.smem_len);
+ arch_phys_wc_del(par->wc_cookie);
iounmap(par->regbase_virt);
iounmap(info->screen_base);
@@ -1677,10 +1650,8 @@ MODULE_PARM_DESC(hwcursor, "Enable hardware cursor "
"(1=enable, 0=disable, default=1)");
module_param(mode_option, charp, 0);
MODULE_PARM_DESC(mode_option, "Initial video mode e.g. '648x480-8@60'");
-#ifdef CONFIG_MTRR
module_param(nomtrr, bool, 0);
MODULE_PARM_DESC(nomtrr, "Disable MTRR support (default: enabled)");
-#endif
module_init(tdfxfb_init);
module_exit(tdfxfb_exit);
diff --git a/include/video/tdfx.h b/include/video/tdfx.h
index befbaf0..69674b9 100644
--- a/include/video/tdfx.h
+++ b/include/video/tdfx.h
@@ -196,7 +196,7 @@ struct tdfx_par {
u32 palette[16];
void __iomem *regbase_virt;
unsigned long iobase;
- int mtrr_handle;
+ int wc_cookie;
#ifdef CONFIG_FB_3DFX_I2C
struct tdfxfb_i2c_chan chan[2];
#endif
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 43/47] video: fbdev: vt8623fb: use arch_phys_wc_add() and pci_iomap_wc()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (41 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 42/47] video: fbdev: tdfxfb: " Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 44/47] video: fbdev: atmel_lcdfb: use ioremap_wc() for framebuffer Luis R. Rodriguez
` (4 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
This driver uses the same area for MTRR as for the ioremap().
Convert the driver from using the x86 specific MTRR code to
the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available (PAT on x86). In
order to take advantage of that, also ensure the ioremap'd area is
requested as write-combining.
There are a few motivations for this:
a) Take advantage of PAT when available
b) Help bury MTRR code away; MTRR is architecture specific and on
x86 it is replaced by PAT
c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdef'ery and to remove redundant things which
arch_phys_wc_add() already handles, such as the verbose message
printed when adding an MTRR fails and doing nothing on removal
when we didn't get an MTRR.
@ mtrr_found @
expression index, base, size;
@@
-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);
@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@
-mtrr_del(index, base, size);
+arch_phys_wc_del(index);
@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@
-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);
@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@
-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);
@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);
@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@
-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/vt8623fb.c | 31 ++++++-------------------------
1 file changed, 6 insertions(+), 25 deletions(-)
diff --git a/drivers/video/fbdev/vt8623fb.c b/drivers/video/fbdev/vt8623fb.c
index ea7f056..60f24828 100644
--- a/drivers/video/fbdev/vt8623fb.c
+++ b/drivers/video/fbdev/vt8623fb.c
@@ -26,13 +26,9 @@
#include <linux/console.h> /* Why should fb driver call console functions? because console_lock() */
#include <video/vga.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
struct vt8623fb_info {
char __iomem *mmio_base;
- int mtrr_reg;
+ int wc_cookie;
struct vgastate state;
struct mutex open_lock;
unsigned int ref_count;
@@ -99,10 +95,7 @@ static struct svga_timing_regs vt8623_timing_regs = {
/* Module parameters */
static char *mode_option = "640x480-8@60";
-
-#ifdef CONFIG_MTRR
static int mtrr = 1;
-#endif
MODULE_AUTHOR("(c) 2006 Ondrej Zajicek <santiago@crfreenet.org>");
MODULE_LICENSE("GPL");
@@ -112,11 +105,8 @@ module_param(mode_option, charp, 0644);
MODULE_PARM_DESC(mode_option, "Default video mode ('640x480-8@60', etc)");
module_param_named(mode, mode_option, charp, 0);
MODULE_PARM_DESC(mode, "Default video mode e.g. '648x480-8@60' (deprecated)");
-
-#ifdef CONFIG_MTRR
module_param(mtrr, int, 0444);
MODULE_PARM_DESC(mtrr, "Enable write-combining with MTRR (1=enable, 0=disable, default=1)");
-#endif
/* ------------------------------------------------------------------------- */
@@ -710,7 +700,7 @@ static int vt8623_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
info->fix.mmio_len = pci_resource_len(dev, 1);
/* Map physical IO memory address into kernel space */
- info->screen_base = pci_iomap(dev, 0, 0);
+ info->screen_base = pci_iomap_wc(dev, 0, 0);
if (! info->screen_base) {
rc = -ENOMEM;
dev_err(info->device, "iomap for framebuffer failed\n");
@@ -781,12 +771,9 @@ static int vt8623_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
/* Record a reference to the driver data */
pci_set_drvdata(dev, info);
-#ifdef CONFIG_MTRR
- if (mtrr) {
- par->mtrr_reg = -1;
- par->mtrr_reg = mtrr_add(info->fix.smem_start, info->fix.smem_len, MTRR_TYPE_WRCOMB, 1);
- }
-#endif
+ if (mtrr)
+ par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+ info->fix.smem_len);
return 0;
@@ -816,13 +803,7 @@ static void vt8623_pci_remove(struct pci_dev *dev)
if (info) {
struct vt8623fb_info *par = info->par;
-#ifdef CONFIG_MTRR
- if (par->mtrr_reg >= 0) {
- mtrr_del(par->mtrr_reg, 0, 0);
- par->mtrr_reg = -1;
- }
-#endif
-
+ arch_phys_wc_del(par->wc_cookie);
unregister_framebuffer(info);
fb_dealloc_cmap(&info->cmap);
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 44/47] video: fbdev: atmel_lcdfb: use ioremap_wc() for framebuffer
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (42 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 43/47] video: fbdev: vt8623fb: use arch_phys_wc_add() and pci_iomap_wc() Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 45/47] video: fbdev: geode gxfb: " Luis R. Rodriguez
` (3 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
The driver doesn't use mtrr_add() or arch_phys_wc_add(), but
since we know the framebuffer is already isolated on its own
ioremap() we can take advantage of write-combining for
performance where possible.
In this case there are a few motivations for this:
a) Take advantage of PAT when available
b) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/atmel_lcdfb.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/video/fbdev/atmel_lcdfb.c b/drivers/video/fbdev/atmel_lcdfb.c
index 94a8d04..abadc49 100644
--- a/drivers/video/fbdev/atmel_lcdfb.c
+++ b/drivers/video/fbdev/atmel_lcdfb.c
@@ -1266,7 +1266,8 @@ static int __init atmel_lcdfb_probe(struct platform_device *pdev)
goto stop_clk;
}
- info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
+ info->screen_base = ioremap_wc(info->fix.smem_start,
+ info->fix.smem_len);
if (!info->screen_base) {
ret = -ENOMEM;
goto release_intmem;
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 45/47] video: fbdev: geode gxfb: use ioremap_wc() for framebuffer
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (43 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 44/47] video: fbdev: atmel_lcdfb: use ioremap_wc() for framebuffer Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 46/47] video: fbdev: gxt4500: use pci_ioremap_wc_bar() " Luis R. Rodriguez
` (2 subsequent siblings)
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
The driver doesn't use mtrr_add() or arch_phys_wc_add(), but
since we know the framebuffer is already isolated on its own
ioremap() we can take advantage of write-combining for
performance where possible.
In this case there are a few motivations for this:
a) Take advantage of PAT when available
b) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/geode/gxfb_core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/video/fbdev/geode/gxfb_core.c b/drivers/video/fbdev/geode/gxfb_core.c
index 124d7c7..ec9fc9a 100644
--- a/drivers/video/fbdev/geode/gxfb_core.c
+++ b/drivers/video/fbdev/geode/gxfb_core.c
@@ -263,7 +263,8 @@ static int gxfb_map_video_memory(struct fb_info *info, struct pci_dev *dev)
info->fix.smem_start = pci_resource_start(dev, 0);
info->fix.smem_len = vram ? vram : gx_frame_buffer_size();
- info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
+ info->screen_base = ioremap_wc(info->fix.smem_start,
+ info->fix.smem_len);
if (!info->screen_base)
return -ENOMEM;
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 46/47] video: fbdev: gxt4500: use pci_ioremap_wc_bar() for framebuffer
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (44 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 45/47] video: fbdev: geode gxfb: " Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 47/47] mtrr: bury MTRR - unexport mtrr_add() and mtrr_del() Luis R. Rodriguez
2015-03-21 1:08 ` [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Andy Lutomirski
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
The driver doesn't use mtrr_add() or arch_phys_wc_add(), but
since we know the framebuffer is already isolated on its own
ioremap() we can take advantage of write-combining for
performance where possible.
In this case there are a few motivations for this:
a) Take advantage of PAT when available
b) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
drivers/video/fbdev/gxt4500.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/video/fbdev/gxt4500.c b/drivers/video/fbdev/gxt4500.c
index 135d78a..f19133a 100644
--- a/drivers/video/fbdev/gxt4500.c
+++ b/drivers/video/fbdev/gxt4500.c
@@ -662,7 +662,7 @@ static int gxt4500_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
info->fix.smem_start = fb_phys;
info->fix.smem_len = pci_resource_len(pdev, 1);
- info->screen_base = pci_ioremap_bar(pdev, 1);
+ info->screen_base = pci_ioremap_wc_bar(pdev, 1);
if (!info->screen_base) {
dev_err(&pdev->dev, "gxt4500: cannot map framebuffer\n");
goto err_unmap_regs;
--
2.3.2.209.gd67f9d5.dirty
* [PATCH v1 47/47] mtrr: bury MTRR - unexport mtrr_add() and mtrr_del()
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (45 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 46/47] video: fbdev: gxt4500: use pci_ioremap_wc_bar() " Luis R. Rodriguez
@ 2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-21 1:08 ` [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Andy Lutomirski
47 siblings, 0 replies; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
venkatesh.pallipadi, airlied
Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
Ingo Molnar, Daniel Vetter, Antonino Daplas,
Jean-Christophe Plagniol-Villard, Tomi Valkeinen
From: "Luis R. Rodriguez" <mcgrof@suse.com>
The crusade to replace mtrr_add() with the architecture agnostic
arch_phys_wc_add() is complete. This ensures that write-combining
implementations (PAT on x86) are taken advantage of instead of
using MTRR. With the crusade done, hide direct MTRR access from
drivers.
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
arch/x86/kernel/cpu/mtrr/main.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index b68b671..f0e19db 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -446,7 +446,6 @@ int mtrr_add(unsigned long base, unsigned long size, unsigned int type,
return mtrr_add_page(base >> PAGE_SHIFT, size >> PAGE_SHIFT, type,
increment);
}
-EXPORT_SYMBOL(mtrr_add);
/**
* mtrr_del_page - delete a memory type region
@@ -535,7 +534,6 @@ int mtrr_del(int reg, unsigned long base, unsigned long size)
return -EINVAL;
return mtrr_del_page(reg, base >> PAGE_SHIFT, size >> PAGE_SHIFT);
}
-EXPORT_SYMBOL(mtrr_del);
/**
* __arch_phys_wc_add - add a WC MTRR even if PAT is available
--
2.3.2.209.gd67f9d5.dirty
* Re: [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
` (46 preceding siblings ...)
2015-03-20 23:18 ` [PATCH v1 47/47] mtrr: bury MTRR - unexport mtrr_add() and mtrr_del() Luis R. Rodriguez
@ 2015-03-21 1:08 ` Andy Lutomirski
47 siblings, 0 replies; 400+ messages in thread
From: Andy Lutomirski @ 2015-03-21 1:08 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Juergen Gross,
Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
Dave Airlie, linux-kernel, Linux Fbdev development list, X86 ML,
xen-devel, Luis R. Rodriguez
On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> When a system has PAT support enabled you don't need to be
> using MTRRs. Andy had added arch_phys_wc_add() long ago to
> help with this but not all drivers were converted over. We
> have to take care to only convert drivers where we know that
> the proper ioremap_wc() API has been used. Doing this requires
> a bit of work on verifying the driver split out the ioremap'd
> areas -- and if not doing that ourselves. Verifying a driver
> uses the same areas can be hard but with a bit of love Coccinelle
> can help with that.
>
> We're motivated to change drivers for a few reasons:
>
> 1) Take advantage of PAT when available
>
> 2) Help with the goal of eventually using _PAGE_CACHE_UC over
> _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
Nice!
--Andy
* [PATCH v4 1/3] x86, stackvalidate: Compile-time stack frame pointer validation
2015-05-18 16:34 [PATCH v4 0/3] Compile-time stack frame pointer validation Josh Poimboeuf
@ 2015-05-18 16:34 ` Josh Poimboeuf
2015-05-18 16:34 ` [PATCH v4 2/3] x86: Make push/pop CFI macros arch-independent Josh Poimboeuf
` (2 subsequent siblings)
3 siblings, 0 replies; 400+ messages in thread
From: Josh Poimboeuf @ 2015-05-18 16:34 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
Cc: Michal Marek, Peter Zijlstra, x86, live-patching, linux-kernel
Frame pointer based stack traces aren't always reliable. One big reason
is that most asm functions don't set up the frame pointer.
Fix that by enforcing that all asm functions honor CONFIG_FRAME_POINTER.
This is done with a new stackvalidate host tool which is automatically
run for every compiled .S file and which validates that every asm
function does the proper frame pointer setup.
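To make the acceptance rule concrete, here is a small userspace sketch
of the counting check that arch_validate_function() in this patch
applies. It assumes the instructions have already been decoded into a
hypothetical enum (the fp_* names are invented for this example); the
real tool decodes raw bytes with the x86 instruction decoder.

```c
#include <stddef.h>

/* Hypothetical pre-decoded instruction kinds. */
enum fp_insn {
	FP_PUSH_BP,	/* push %rbp */
	FP_MOV_SP_BP,	/* mov %rsp, %rbp */
	FP_POP_BP,	/* pop %rbp, or leave */
	FP_RET,		/* ret */
	FP_OTHER,	/* anything else */
};

/* Returns 0 if frame pointer logic is present, 1 if it is missing. */
int fp_check(const enum fp_insn *insns, size_t n)
{
	int push = 0, mov = 0, pop = 0, ret = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		switch (insns[i]) {
		case FP_PUSH_BP:
			push++;
			break;
		case FP_MOV_SP_BP:
			mov++;
			break;
		case FP_POP_BP:
			pop++;
			break;
		case FP_RET:
			ret++;
			break;
		case FP_OTHER:
			break;
		}
	}

	/* One prologue, and every return preceded by a restore. */
	if (push != 1 || mov != 1 || !pop || !ret || pop != ret)
		return 1;

	return 0;
}
```

Note that the rule only counts instructions. It does not check their
order, so it is a heuristic rather than a proof that the frame is set
up correctly on every path.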
Also, to make sure somebody didn't forget to annotate their callable asm code
as a function, flag an error for any return instructions which are hiding
outside of a function. In almost all cases, return instructions are part of
callable functions and should be annotated as such so that we can validate
their frame pointer usage. A whitelist mechanism exists for those few return
instructions which are not actually in callable code.
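The stray-return check can be pictured roughly as follows. This is
only a sketch under assumptions: the struct, the function names and
the flat address whitelist are hypothetical and are not taken from
stackvalidate.c, which may organize this differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical function extent, covering [start, end). */
struct func_range {
	unsigned long start;
	unsigned long end;
};

static bool addr_in_funcs(const struct func_range *funcs, size_t nfuncs,
			  unsigned long addr)
{
	size_t i;

	for (i = 0; i < nfuncs; i++)
		if (addr >= funcs[i].start && addr < funcs[i].end)
			return true;
	return false;
}

static bool addr_whitelisted(const unsigned long *wl, size_t nwl,
			     unsigned long addr)
{
	size_t i;

	for (i = 0; i < nwl; i++)
		if (wl[i] == addr)
			return true;
	return false;
}

/* A return is flagged if it is outside every function and not whitelisted. */
bool ret_is_stray(const struct func_range *funcs, size_t nfuncs,
		  const unsigned long *wl, size_t nwl,
		  unsigned long ret_addr)
{
	return !addr_in_funcs(funcs, nfuncs, ret_addr) &&
	       !addr_whitelisted(wl, nwl, ret_addr);
}
```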
It currently only supports x86_64. It *almost* supports x86_32, but the
stackvalidate code doesn't yet know how to deal with 32-bit REL
relocations for the return whitelists. I tried to make the code generic
so that support for other architectures can be plugged in pretty easily.
As a first step, CONFIG_STACK_VALIDATION is disabled by default, and all
reported non-compliances result in warnings. Right now I'm seeing 200+
warnings. Once we get them all cleaned up, we can change the default to
CONFIG_STACK_VALIDATION=y and change the warnings to build errors so the
asm code can stay clean.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Michal Marek <mmarek@suse.cz>
---
MAINTAINERS | 6 +
arch/Kconfig | 3 +
arch/x86/Kconfig | 1 +
arch/x86/Makefile | 6 +-
lib/Kconfig.debug | 11 ++
scripts/Makefile | 1 +
scripts/Makefile.build | 22 ++-
scripts/stackvalidate/Makefile | 17 ++
scripts/stackvalidate/arch-x86.c | 134 +++++++++++++
scripts/stackvalidate/arch.h | 10 +
scripts/stackvalidate/elf.c | 352 ++++++++++++++++++++++++++++++++++
scripts/stackvalidate/elf.h | 56 ++++++
scripts/stackvalidate/list.h | 217 +++++++++++++++++++++
scripts/stackvalidate/stackvalidate.c | 226 ++++++++++++++++++++++
14 files changed, 1059 insertions(+), 3 deletions(-)
create mode 100644 scripts/stackvalidate/Makefile
create mode 100644 scripts/stackvalidate/arch-x86.c
create mode 100644 scripts/stackvalidate/arch.h
create mode 100644 scripts/stackvalidate/elf.c
create mode 100644 scripts/stackvalidate/elf.h
create mode 100644 scripts/stackvalidate/list.h
create mode 100644 scripts/stackvalidate/stackvalidate.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 78ea7b6..6d700bf 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9451,6 +9451,12 @@ L: stable@vger.kernel.org
S: Supported
F: Documentation/stable_kernel_rules.txt
+STACK VALIDATION
+M: Josh Poimboeuf <jpoimboe@redhat.com>
+S: Supported
+F: scripts/stackvalidate/
+F: arch/x86/include/asm/func.h
+
STAGING SUBSYSTEM
M: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
T: git git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
diff --git a/arch/Kconfig b/arch/Kconfig
index bec6666..a5c3f50 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -506,6 +506,9 @@ config HAVE_COPY_THREAD_TLS
normal C parameter passing, rather than extracting the syscall
argument from pt_regs.
+config HAVE_STACK_VALIDATION
+ bool
+
#
# ABI hall of shame
#
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c92fdcc..d60a2378a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -146,6 +146,7 @@ config X86
select ACPI_LEGACY_TABLES_LOOKUP if ACPI
select X86_FEATURE_NAMES if PROC_FS
select SRCU
+ select HAVE_STACK_VALIDATION if FRAME_POINTER && X86_64
config INSTRUCTION_DECODER
def_bool y
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 57996ee..c5598a0 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -180,9 +180,13 @@ KBUILD_CFLAGS += $(call cc-option,-mno-avx,)
KBUILD_CFLAGS += $(mflags-y)
KBUILD_AFLAGS += $(mflags-y)
-archscripts: scripts_basic
+archscripts: scripts_basic $(objtree)/arch/x86/lib/inat-tables.c
$(Q)$(MAKE) $(build)=arch/x86/tools relocs
+# this file is needed early by scripts/stackvalidate
+$(objtree)/arch/x86/lib/inat-tables.c:
+ $(Q)$(MAKE) $(build)=arch/x86/lib $@
+
###
# Syscall table generation
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index eb3997b..7bfaf80 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -332,6 +332,17 @@ config FRAME_POINTER
larger and slower, but it gives very useful debugging information
in case of kernel bugs. (precise oopses/stacktraces/warnings)
+
+config STACK_VALIDATION
+ bool "Enable kernel stack validation"
+ depends on HAVE_STACK_VALIDATION
+ default n
+ help
+ Add compile-time validations which help make kernel stack traces more
+ reliable. This includes checks to ensure that assembly functions
+ save, update and restore the frame pointer or the back chain pointer.
+
+
config DEBUG_FORCE_WEAK_PER_CPU
bool "Force weak per-cpu definitions"
depends on DEBUG_KERNEL
diff --git a/scripts/Makefile b/scripts/Makefile
index 2016a64..c882a91 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -37,6 +37,7 @@ subdir-y += mod
subdir-$(CONFIG_SECURITY_SELINUX) += selinux
subdir-$(CONFIG_DTC) += dtc
subdir-$(CONFIG_GDB_SCRIPTS) += gdb
+subdir-$(CONFIG_STACK_VALIDATION) += stackvalidate
# Let clean descend into subdirs
subdir- += basic kconfig package
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 01df30a..3b05833 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -253,6 +253,24 @@ define rule_cc_o_c
mv -f $(dot-target).tmp $(dot-target).cmd
endef
+ifdef CONFIG_STACK_VALIDATION
+stackvalidate = $(objtree)/scripts/stackvalidate/stackvalidate
+cmd_stackvalidate = \
+ case $(@) in \
+ arch/x86/purgatory/*) ;; \
+ *) $(stackvalidate) "$(@)"; \
+ esac;
+endif
+
+define rule_as_o_S
+ $(call echo-cmd,as_o_S) $(cmd_as_o_S); \
+ $(cmd_stackvalidate) \
+ scripts/basic/fixdep $(depfile) $@ '$(call make-cmd,as_o_S)' > \
+ $(dot-target).tmp; \
+ rm -f $(depfile); \
+ mv -f $(dot-target).tmp $(dot-target).cmd
+endef
+
# Built-in and composite module parts
$(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
$(call cmd,force_checksrc)
@@ -290,8 +308,8 @@ $(obj)/%.s: $(src)/%.S FORCE
quiet_cmd_as_o_S = AS $(quiet_modtag) $@
cmd_as_o_S = $(CC) $(a_flags) -c -o $@ $<
-$(obj)/%.o: $(src)/%.S FORCE
- $(call if_changed_dep,as_o_S)
+$(obj)/%.o: $(src)/%.S $(stackvalidate) FORCE
+ $(call if_changed_rule,as_o_S)
targets += $(real-objs-y) $(real-objs-m) $(lib-y)
targets += $(extra-y) $(MAKECMDGOALS) $(always)
diff --git a/scripts/stackvalidate/Makefile b/scripts/stackvalidate/Makefile
new file mode 100644
index 0000000..6027ec4
--- /dev/null
+++ b/scripts/stackvalidate/Makefile
@@ -0,0 +1,17 @@
+hostprogs-y := stackvalidate
+always := $(hostprogs-y)
+
+stackvalidate-objs := stackvalidate.o elf.o
+
+HOSTCFLAGS += -Werror
+HOSTLOADLIBES_stackvalidate := -lelf
+
+ifdef CONFIG_X86
+
+stackvalidate-objs += arch-x86.o
+
+HOSTCFLAGS_arch-x86.o := -I$(objtree)/arch/x86/lib/ -I$(srctree)/arch/x86/include/ -I$(srctree)/arch/x86/lib/
+
+$(obj)/arch-x86.o: $(srctree)/arch/x86/lib/insn.c $(srctree)/arch/x86/lib/inat.c $(srctree)/arch/x86/include/asm/inat_types.h $(srctree)/arch/x86/include/asm/inat.h $(srctree)/arch/x86/include/asm/insn.h $(objtree)/arch/x86/lib/inat-tables.c
+
+endif
diff --git a/scripts/stackvalidate/arch-x86.c b/scripts/stackvalidate/arch-x86.c
new file mode 100644
index 0000000..fbc0756
--- /dev/null
+++ b/scripts/stackvalidate/arch-x86.c
@@ -0,0 +1,134 @@
+#include <stdio.h>
+
+#define unlikely(cond) (cond)
+#include <asm/insn.h>
+#include <inat.c>
+#include <insn.c>
+
+#include "elf.h"
+#include "arch.h"
+
+static int is_x86_64(struct elf *elf)
+{
+ switch (elf->ehdr.e_machine) {
+ case EM_X86_64:
+ return 1;
+ case EM_386:
+ return 0;
+ default:
+ WARN("unexpected ELF machine type %d", elf->ehdr.e_machine);
+ return -1;
+ }
+}
+
+/*
+ * arch_validate_function() - Ensures the given asm function saves, sets up,
+ * and restores the frame pointer.
+ *
+ * The frame pointer prologue/epilogue should look something like:
+ *
+ * push %rbp
+ * mov %rsp, %rbp
+ * [ function body ]
+ * pop %rbp
+ * ret
+ *
+ * Return value:
+ * -1: bad instruction
+ * 1: missing frame pointer logic
+ * 0: validation succeeded
+ */
+int arch_validate_function(struct elf *elf, struct symbol *func)
+{
+ struct insn insn;
+ unsigned long addr, length;
+ int push, mov, pop, ret, x86_64;
+
+ push = mov = pop = ret = 0;
+
+ x86_64 = is_x86_64(elf);
+ if (x86_64 == -1)
+ return -1;
+
+ for (addr = func->start; addr < func->end; addr += length) {
+ insn_init(&insn, (void *)addr, func->end - addr, x86_64);
+ insn_get_length(&insn);
+ length = insn.length;
+ insn_get_opcode(&insn);
+ if (!length || !insn.opcode.got) {
+ WARN("%s+0x%lx: bad instruction", func->name,
+ addr - func->start);
+ return -1;
+ }
+
+ switch (insn.opcode.bytes[0]) {
+ case 0x55:
+ if (!insn.rex_prefix.nbytes)
+ /* push bp */
+ push++;
+ break;
+ case 0x5d:
+ if (!insn.rex_prefix.nbytes)
+ /* pop bp */
+ pop++;
+ break;
+ case 0xc9: /* leave */
+ pop++;
+ break;
+ case 0x89:
+ insn_get_modrm(&insn);
+ if (insn.modrm.bytes[0] == 0xe5)
+ /* mov sp, bp */
+ mov++;
+ break;
+ case 0xc3: /* ret */
+ ret++;
+ break;
+ }
+ }
+
+ if (push != 1 || mov != 1 || !pop || !ret || pop != ret) {
+ WARN("%s() is missing frame pointer logic. Please use FUNC_ENTER.",
+ func->name);
+ return 1;
+ }
+
+ return 0;
+}
+
+/*
+ * arch_is_return_insn() - Determines whether the instruction at the given
+ * address is a return instruction. Also returns the instruction length in
+ * *len.
+ *
+ * Return value:
+ * -1: bad instruction
+ * 0: no, it's not a return instruction
+ * 1: yes, it's a return instruction
+ */
+int arch_is_return_insn(struct elf *elf, unsigned long addr,
+ unsigned int maxlen, unsigned int *len)
+{
+ struct insn insn;
+ int x86_64;
+
+ x86_64 = is_x86_64(elf);
+ if (x86_64 == -1)
+ return -1;
+
+ insn_init(&insn, (void *)addr, maxlen, x86_64);
+ insn_get_length(&insn);
+ insn_get_opcode(&insn);
+ if (!insn.opcode.got)
+ return -1;
+
+ *len = insn.length;
+
+ switch (insn.opcode.bytes[0]) {
+ case 0xc2: case 0xc3: /* ret near */
+ case 0xca: case 0xcb: /* ret far */
+ return 1;
+ }
+
+ return 0;
+}
diff --git a/scripts/stackvalidate/arch.h b/scripts/stackvalidate/arch.h
new file mode 100644
index 0000000..3b91b1c
--- /dev/null
+++ b/scripts/stackvalidate/arch.h
@@ -0,0 +1,10 @@
+#ifndef _ARCH_H_
+#define _ARCH_H_
+
+#include "elf.h"
+
+int arch_validate_function(struct elf *elf, struct symbol *func);
+int arch_is_return_insn(struct elf *elf, unsigned long addr,
+ unsigned int maxlen, unsigned int *len);
+
+#endif /* _ARCH_H_ */
diff --git a/scripts/stackvalidate/elf.c b/scripts/stackvalidate/elf.c
new file mode 100644
index 0000000..a1419a5
--- /dev/null
+++ b/scripts/stackvalidate/elf.c
@@ -0,0 +1,352 @@
+/*
+ * elf.c - ELF access library
+ *
+ * Adapted from kpatch (https://github.com/dynup/kpatch):
+ * Copyright (C) 2013-2015 Josh Poimboeuf <jpoimboe@redhat.com>
+ * Copyright (C) 2014 Seth Jennings <sjenning@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include "elf.h"
+
+struct section *elf_find_section_by_name(struct elf *elf, const char *name)
+{
+ struct section *sec;
+
+ list_for_each_entry(sec, &elf->sections, list)
+ if (!strcmp(sec->name, name))
+ return sec;
+
+ return NULL;
+}
+
+static struct section *elf_find_section_by_index(struct elf *elf,
+ unsigned int index)
+{
+ struct section *sec;
+
+ list_for_each_entry(sec, &elf->sections, list)
+ if (sec->index == index)
+ return sec;
+
+ return NULL;
+}
+
+static struct symbol *elf_find_symbol_by_index(struct elf *elf,
+ unsigned int index)
+{
+ struct section *sec;
+ struct symbol *sym;
+
+ list_for_each_entry(sec, &elf->sections, list)
+ list_for_each_entry(sym, &sec->symbols, list)
+ if (sym->index == index)
+ return sym;
+
+ return NULL;
+}
+
+static int elf_read_sections(struct elf *elf)
+{
+ Elf_Scn *s = NULL;
+ struct section *sec;
+ size_t shstrndx, sections_nr;
+ int i;
+
+ if (elf_getshdrnum(elf->elf, &sections_nr)) {
+ perror("elf_getshdrnum");
+ return -1;
+ }
+
+ if (elf_getshdrstrndx(elf->elf, &shstrndx)) {
+ perror("elf_getshdrstrndx");
+ return -1;
+ }
+
+ for (i = 0; i < sections_nr; i++) {
+ sec = malloc(sizeof(*sec));
+ if (!sec) {
+ perror("malloc");
+ return -1;
+ }
+ memset(sec, 0, sizeof(*sec));
+
+ INIT_LIST_HEAD(&sec->symbols);
+ INIT_LIST_HEAD(&sec->relas);
+
+ list_add_tail(&sec->list, &elf->sections);
+
+ s = elf_getscn(elf->elf, i);
+ if (!s) {
+ perror("elf_getscn");
+ return -1;
+ }
+
+ sec->index = elf_ndxscn(s);
+
+ if (!gelf_getshdr(s, &sec->sh)) {
+ perror("gelf_getshdr");
+ return -1;
+ }
+
+ sec->name = elf_strptr(elf->elf, shstrndx, sec->sh.sh_name);
+ if (!sec->name) {
+ perror("elf_strptr");
+ return -1;
+ }
+
+ sec->data = elf_getdata(s, NULL);
+ if (!sec->data) {
+ perror("elf_getdata");
+ return -1;
+ }
+
+ if (sec->data->d_off != 0 ||
+ sec->data->d_size != sec->sh.sh_size) {
+ WARN("unexpected data attributes for %s", sec->name);
+ return -1;
+ }
+
+ sec->start = (unsigned long)sec->data->d_buf;
+ sec->end = sec->start + sec->data->d_size;
+ }
+
+ /* sanity check, one more call to elf_nextscn() should return NULL */
+ if (elf_nextscn(elf->elf, s)) {
+ WARN("section entry mismatch");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int elf_read_symbols(struct elf *elf)
+{
+ struct section *symtab;
+ struct symbol *sym;
+ struct list_head *entry, *tmp;
+ int symbols_nr, i;
+
+ symtab = elf_find_section_by_name(elf, ".symtab");
+ if (!symtab) {
+ WARN("missing symbol table");
+ return -1;
+ }
+
+ symbols_nr = symtab->sh.sh_size / symtab->sh.sh_entsize;
+
+ for (i = 0; i < symbols_nr; i++) {
+ sym = malloc(sizeof(*sym));
+ if (!sym) {
+ perror("malloc");
+ return -1;
+ }
+ memset(sym, 0, sizeof(*sym));
+
+ sym->index = i;
+
+ if (!gelf_getsym(symtab->data, i, &sym->sym)) {
+ perror("gelf_getsym");
+ goto err;
+ }
+
+ sym->name = elf_strptr(elf->elf, symtab->sh.sh_link,
+ sym->sym.st_name);
+ if (!sym->name) {
+ perror("elf_strptr");
+ goto err;
+ }
+
+ sym->type = GELF_ST_TYPE(sym->sym.st_info);
+ sym->bind = GELF_ST_BIND(sym->sym.st_info);
+
+ if (sym->sym.st_shndx > SHN_UNDEF &&
+ sym->sym.st_shndx < SHN_LORESERVE) {
+ sym->sec = elf_find_section_by_index(elf,
+ sym->sym.st_shndx);
+ if (!sym->sec) {
+ WARN("couldn't find section for symbol %s",
+ sym->name);
+ goto err;
+ }
+ if (sym->type == STT_SECTION)
+ sym->name = sym->sec->name;
+ } else
+ sym->sec = elf_find_section_by_index(elf, 0);
+
+ sym->start = sym->sec->start + sym->sym.st_value;
+ sym->end = sym->start + sym->sym.st_size;
+
+ /* sorted insert into a per-section list */
+ entry = &sym->sec->symbols;
+ list_for_each_prev(tmp, &sym->sec->symbols) {
+ struct symbol *s;
+
+ s = list_entry(tmp, struct symbol, list);
+
+ if (sym->start > s->start) {
+ entry = tmp;
+ break;
+ }
+
+ if (sym->start == s->start && sym->end >= s->end) {
+ entry = tmp;
+ break;
+ }
+ }
+ list_add(&sym->list, entry);
+ }
+
+ return 0;
+
+err:
+ free(sym);
+ return -1;
+}
+
+static int elf_read_relas(struct elf *elf)
+{
+ struct section *sec;
+ struct rela *rela;
+ int i;
+ unsigned int symndx;
+
+ list_for_each_entry(sec, &elf->sections, list) {
+ if (sec->sh.sh_type != SHT_RELA)
+ continue;
+
+ sec->base = elf_find_section_by_name(elf, sec->name + 5);
+ if (!sec->base) {
+ WARN("can't find base section for rela section %s",
+ sec->name);
+ return -1;
+ }
+
+ sec->base->rela = sec;
+
+ for (i = 0; i < sec->sh.sh_size / sec->sh.sh_entsize; i++) {
+ rela = malloc(sizeof(*rela));
+ if (!rela) {
+ perror("malloc");
+ return -1;
+ }
+ memset(rela, 0, sizeof(*rela));
+
+ list_add_tail(&rela->list, &sec->relas);
+
+ if (!gelf_getrela(sec->data, i, &rela->rela)) {
+ perror("gelf_getrela");
+ return -1;
+ }
+
+ rela->type = GELF_R_TYPE(rela->rela.r_info);
+ rela->addend = rela->rela.r_addend;
+ rela->offset = rela->rela.r_offset;
+ symndx = GELF_R_SYM(rela->rela.r_info);
+ rela->sym = elf_find_symbol_by_index(elf, symndx);
+ if (!rela->sym) {
+ WARN("can't find rela entry symbol %d for %s",
+ symndx, sec->name);
+ return -1;
+ }
+ }
+ }
+
+ return 0;
+}
+
+struct elf *elf_open(const char *name)
+{
+ struct elf *elf;
+
+ elf_version(EV_CURRENT);
+
+ elf = malloc(sizeof(*elf));
+ if (!elf) {
+ perror("malloc");
+ return NULL;
+ }
+ memset(elf, 0, sizeof(*elf));
+
+ INIT_LIST_HEAD(&elf->sections);
+
+ elf->name = strdup(name);
+ if (!elf->name) {
+ perror("strdup");
+ goto err;
+ }
+
+ elf->fd = open(name, O_RDONLY);
+ if (elf->fd == -1) {
+ perror("open");
+ goto err;
+ }
+
+ elf->elf = elf_begin(elf->fd, ELF_C_READ_MMAP, NULL);
+ if (!elf->elf) {
+ perror("elf_begin");
+ goto err;
+ }
+
+ if (!gelf_getehdr(elf->elf, &elf->ehdr)) {
+ perror("gelf_getehdr");
+ goto err;
+ }
+
+ if (elf_read_sections(elf))
+ goto err;
+
+ if (elf_read_symbols(elf))
+ goto err;
+
+ if (elf_read_relas(elf))
+ goto err;
+
+ return elf;
+
+err:
+ elf_close(elf);
+ return NULL;
+}
+
+void elf_close(struct elf *elf)
+{
+ struct section *sec, *tmpsec;
+ struct symbol *sym, *tmpsym;
+
+ list_for_each_entry_safe(sec, tmpsec, &elf->sections, list) {
+ list_for_each_entry_safe(sym, tmpsym, &sec->symbols, list) {
+ list_del(&sym->list);
+ free(sym);
+ }
+ list_del(&sec->list);
+ free(sec);
+ }
+ if (elf->name)
+ free(elf->name);
+ if (elf->fd > 0)
+ close(elf->fd);
+ if (elf->elf)
+ elf_end(elf->elf);
+ free(elf);
+}
diff --git a/scripts/stackvalidate/elf.h b/scripts/stackvalidate/elf.h
new file mode 100644
index 0000000..db5d5fa
--- /dev/null
+++ b/scripts/stackvalidate/elf.h
@@ -0,0 +1,56 @@
+#ifndef _ELF_H_
+#define _ELF_H_
+
+#include <gelf.h>
+#include "list.h"
+
+#define WARN(format, ...) \
+ fprintf(stderr, \
+ "%s: " format "\n", \
+ elf->name, ##__VA_ARGS__)
+
+struct section {
+ struct list_head list;
+ GElf_Shdr sh;
+ struct list_head symbols;
+ struct list_head relas;
+ struct section *base, *rela;
+ Elf_Data *data;
+ char *name;
+ int index;
+ unsigned long start, end;
+};
+
+struct symbol {
+ struct list_head list;
+ GElf_Sym sym;
+ struct section *sec;
+ char *name;
+ int index;
+ unsigned char bind, type;
+ unsigned long start, end;
+};
+
+struct rela {
+ struct list_head list;
+ GElf_Rela rela;
+ struct symbol *sym;
+ unsigned int type;
+ int offset;
+ int addend;
+};
+
+struct elf {
+ Elf *elf;
+ GElf_Ehdr ehdr;
+ int fd;
+ char *name;
+ struct list_head sections;
+};
+
+
+struct elf *elf_open(const char *name);
+struct section *elf_find_section_by_name(struct elf *elf, const char *name);
+void elf_close(struct elf *elf);
+
+#endif /* _ELF_H_ */
diff --git a/scripts/stackvalidate/list.h b/scripts/stackvalidate/list.h
new file mode 100644
index 0000000..25716b5
--- /dev/null
+++ b/scripts/stackvalidate/list.h
@@ -0,0 +1,217 @@
+#ifndef _LIST_H
+#define _LIST_H
+
+#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
+
+#define container_of(ptr, type, member) ({ \
+ const typeof(((type *)0)->member) *__mptr = (ptr); \
+ (type *)((char *)__mptr - offsetof(type, member)); })
+
+#define LIST_POISON1 ((void *) 0x00100100)
+#define LIST_POISON2 ((void *) 0x00200200)
+
+struct list_head {
+ struct list_head *next, *prev;
+};
+
+#define LIST_HEAD_INIT(name) { &(name), &(name) }
+
+#define LIST_HEAD(name) \
+ struct list_head name = LIST_HEAD_INIT(name)
+
+static inline void INIT_LIST_HEAD(struct list_head *list)
+{
+ list->next = list;
+ list->prev = list;
+}
+
+static inline void __list_add(struct list_head *new,
+ struct list_head *prev,
+ struct list_head *next)
+{
+ next->prev = new;
+ new->next = next;
+ new->prev = prev;
+ prev->next = new;
+}
+
+static inline void list_add(struct list_head *new, struct list_head *head)
+{
+ __list_add(new, head, head->next);
+}
+
+static inline void list_add_tail(struct list_head *new, struct list_head *head)
+{
+ __list_add(new, head->prev, head);
+}
+
+static inline void __list_del(struct list_head *prev, struct list_head *next)
+{
+ next->prev = prev;
+ prev->next = next;
+}
+
+static inline void __list_del_entry(struct list_head *entry)
+{
+ __list_del(entry->prev, entry->next);
+}
+
+static inline void list_del(struct list_head *entry)
+{
+ __list_del(entry->prev, entry->next);
+ entry->next = LIST_POISON1;
+ entry->prev = LIST_POISON2;
+}
+
+static inline void list_replace(struct list_head *old,
+ struct list_head *new)
+{
+ new->next = old->next;
+ new->next->prev = new;
+ new->prev = old->prev;
+ new->prev->next = new;
+}
+
+static inline void list_replace_init(struct list_head *old,
+ struct list_head *new)
+{
+ list_replace(old, new);
+ INIT_LIST_HEAD(old);
+}
+
+static inline void list_del_init(struct list_head *entry)
+{
+ __list_del_entry(entry);
+ INIT_LIST_HEAD(entry);
+}
+
+static inline void list_move(struct list_head *list, struct list_head *head)
+{
+ __list_del_entry(list);
+ list_add(list, head);
+}
+
+static inline void list_move_tail(struct list_head *list,
+ struct list_head *head)
+{
+ __list_del_entry(list);
+ list_add_tail(list, head);
+}
+
+static inline int list_is_last(const struct list_head *list,
+ const struct list_head *head)
+{
+ return list->next == head;
+}
+
+static inline int list_empty(const struct list_head *head)
+{
+ return head->next == head;
+}
+
+static inline int list_empty_careful(const struct list_head *head)
+{
+ struct list_head *next = head->next;
+
+ return (next == head) && (next == head->prev);
+}
+
+static inline void list_rotate_left(struct list_head *head)
+{
+ struct list_head *first;
+
+ if (!list_empty(head)) {
+ first = head->next;
+ list_move_tail(first, head);
+ }
+}
+
+static inline int list_is_singular(const struct list_head *head)
+{
+ return !list_empty(head) && (head->next == head->prev);
+}
+
+#define list_entry(ptr, type, member) \
+ container_of(ptr, type, member)
+
+#define list_first_entry(ptr, type, member) \
+ list_entry((ptr)->next, type, member)
+
+#define list_last_entry(ptr, type, member) \
+ list_entry((ptr)->prev, type, member)
+
+#define list_first_entry_or_null(ptr, type, member) \
+ (!list_empty(ptr) ? list_first_entry(ptr, type, member) : NULL)
+
+#define list_next_entry(pos, member) \
+ list_entry((pos)->member.next, typeof(*(pos)), member)
+
+#define list_prev_entry(pos, member) \
+ list_entry((pos)->member.prev, typeof(*(pos)), member)
+
+#define list_for_each(pos, head) \
+ for (pos = (head)->next; pos != (head); pos = pos->next)
+
+#define list_for_each_prev(pos, head) \
+ for (pos = (head)->prev; pos != (head); pos = pos->prev)
+
+#define list_for_each_safe(pos, n, head) \
+ for (pos = (head)->next, n = pos->next; pos != (head); \
+ pos = n, n = pos->next)
+
+#define list_for_each_prev_safe(pos, n, head) \
+ for (pos = (head)->prev, n = pos->prev; \
+ pos != (head); \
+ pos = n, n = pos->prev)
+
+#define list_for_each_entry(pos, head, member) \
+ for (pos = list_first_entry(head, typeof(*pos), member); \
+ &pos->member != (head); \
+ pos = list_next_entry(pos, member))
+
+#define list_for_each_entry_reverse(pos, head, member) \
+ for (pos = list_last_entry(head, typeof(*pos), member); \
+ &pos->member != (head); \
+ pos = list_prev_entry(pos, member))
+
+#define list_prepare_entry(pos, head, member) \
+ ((pos) ? : list_entry(head, typeof(*pos), member))
+
+#define list_for_each_entry_continue(pos, head, member) \
+ for (pos = list_next_entry(pos, member); \
+ &pos->member != (head); \
+ pos = list_next_entry(pos, member))
+
+#define list_for_each_entry_continue_reverse(pos, head, member) \
+ for (pos = list_prev_entry(pos, member); \
+ &pos->member != (head); \
+ pos = list_prev_entry(pos, member))
+
+#define list_for_each_entry_from(pos, head, member) \
+ for (; &pos->member != (head); \
+ pos = list_next_entry(pos, member))
+
+#define list_for_each_entry_safe(pos, n, head, member) \
+ for (pos = list_first_entry(head, typeof(*pos), member), \
+ n = list_next_entry(pos, member); \
+ &pos->member != (head); \
+ pos = n, n = list_next_entry(n, member))
+
+#define list_for_each_entry_safe_continue(pos, n, head, member) \
+ for (pos = list_next_entry(pos, member), \
+ n = list_next_entry(pos, member); \
+ &pos->member != (head); \
+ pos = n, n = list_next_entry(n, member))
+
+#define list_for_each_entry_safe_from(pos, n, head, member) \
+ for (n = list_next_entry(pos, member); \
+ &pos->member != (head); \
+ pos = n, n = list_next_entry(n, member))
+
+#define list_for_each_entry_safe_reverse(pos, n, head, member) \
+ for (pos = list_last_entry(head, typeof(*pos), member), \
+ n = list_prev_entry(pos, member); \
+ &pos->member != (head); \
+ pos = n, n = list_prev_entry(n, member))
+
+#endif /* _LIST_H */
diff --git a/scripts/stackvalidate/stackvalidate.c b/scripts/stackvalidate/stackvalidate.c
new file mode 100644
index 0000000..07f1110
--- /dev/null
+++ b/scripts/stackvalidate/stackvalidate.c
@@ -0,0 +1,226 @@
+/*
+ * stackvalidate.c
+ *
+ * Copyright (C) 2015 Josh Poimboeuf <jpoimboe@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * This tool automatically runs for every compiled .S file and validates that
+ * every asm function does the proper frame pointer setup.
+ *
+ * Also, to make sure somebody didn't forget to annotate their callable asm
+ * code as a function (e.g. via the FUNC_ENTER/FUNC_RETURN macros), it flags an
+ * error for any return instructions which are hiding outside of a function.
+ * In almost all cases, return instructions are part of callable functions and
+ * should be annotated as such so that we can validate their frame pointer
+ * usage.
+ *
+ * Whitelist mechanisms exist (RET_NOVALIDATE and FILE_NOVALIDATE) for those
+ * few return instructions which are not actually in callable code.
+ */
+
+#include <argp.h>
+#include <stdbool.h>
+
+#include "elf.h"
+#include "arch.h"
+
+int warnings;
+
+struct args {
+ char *args[1];
+};
+static const char args_doc[] = "file.o";
+static struct argp_option options[] = {
+ {0},
+};
+static error_t parse_opt(int key, char *arg, struct argp_state *state)
+{
+ /* Get the input argument from argp_parse, which we
+ know is a pointer to our args structure. */
+ struct args *args = state->input;
+
+ switch (key) {
+ case ARGP_KEY_ARG:
+ if (state->arg_num >= 1)
+ /* Too many arguments. */
+ argp_usage(state);
+ args->args[state->arg_num] = arg;
+ break;
+ case ARGP_KEY_END:
+ if (state->arg_num < 1)
+ /* Not enough arguments. */
+ argp_usage(state);
+ break;
+ default:
+ return ARGP_ERR_UNKNOWN;
+ }
+ return 0;
+}
+static struct argp argp = { options, parse_opt, args_doc, 0 };
+
+/*
+ * Check for the RET_NOVALIDATE macro.
+ */
+static bool is_ret_whitelisted(struct elf *elf, struct section *sec,
+ unsigned long offset)
+{
+ struct section *wlsec;
+ struct rela *rela;
+
+ wlsec = elf_find_section_by_name(elf,
+ ".rela__stackvalidate_whitelist_ret");
+ if (!wlsec)
+ return false;
+
+ list_for_each_entry(rela, &wlsec->relas, list)
+ if (rela->sym->type == STT_SECTION &&
+ rela->sym->index == sec->index && rela->addend == offset)
+ return true;
+
+ return false;
+}
+
+/*
+ * Check for the FILE_NOVALIDATE macro.
+ */
+static bool is_file_whitelisted(struct elf *elf)
+{
+ if (elf_find_section_by_name(elf, "__stackvalidate_whitelist_file"))
+ return true;
+
+ return false;
+}
+
+/*
+ * For the given collection of instructions which are outside an STT_FUNC
+ * function, ensure there are no (non-whitelisted) return instructions.
+ */
+static int validate_nonfunction(struct elf *elf, struct section *sec,
+ unsigned long start, unsigned long end)
+{
+ unsigned long addr;
+ unsigned int len;
+ int ret, warnings = 0;
+
+ for (addr = start; addr < end; addr += len) {
+ ret = arch_is_return_insn(elf, addr, end - addr, &len);
+ if (ret == -1)
+ return -1;
+
+ if (ret && !is_ret_whitelisted(elf, sec, addr - sec->start)) {
+ WARN("return instruction outside of a function at %s+0x%lx. Please use FUNC_ENTER.",
+ sec->name, addr - sec->start);
+ warnings++;
+ }
+ }
+
+ return warnings;
+}
+
+/*
+ * For the given section, ensure that:
+ *
+ * 1) all STT_FUNC functions do the proper frame pointer setup; and
+ * 2) any other instructions outside of STT_FUNC aren't return instructions
+ * (unless they're annotated with the RET_NOVALIDATE macro).
+ */
+static int validate_section(struct elf *elf, struct section *sec)
+{
+ struct symbol *func, *last_func;
+ struct symbol null_func = {};
+ int ret, warnings = 0;
+
+ if (!(sec->sh.sh_flags & SHF_EXECINSTR))
+ return 0;
+
+ if (list_empty(&sec->symbols)) {
+ WARN("%s: no symbols", sec->name);
+ return -1;
+ }
+
+ last_func = &null_func;
+ last_func->start = last_func->end = sec->start;
+ list_for_each_entry(func, &sec->symbols, list) {
+ if (func->type != STT_FUNC)
+ continue;
+
+ if (func->start != last_func->start &&
+ func->end != last_func->end &&
+ func->start < last_func->end) {
+ WARN("overlapping functions %s and %s",
+ last_func->name, func->name);
+ warnings++;
+ }
+
+ if (func->start > last_func->end) {
+ ret = validate_nonfunction(elf, sec, last_func->end,
+ func->start);
+ if (ret < 0)
+ return -1;
+
+ warnings += ret;
+ }
+
+ ret = arch_validate_function(elf, func);
+ if (ret < 0)
+ return -1;
+
+ warnings += ret;
+
+ last_func = func;
+ }
+
+ if (last_func->end < sec->end) {
+ ret = validate_nonfunction(elf, sec, last_func->end, sec->end);
+ if (ret < 0)
+ return -1;
+
+ warnings += ret;
+ }
+
+ return warnings;
+}
+
+int main(int argc, char *argv[])
+{
+ struct args args;
+ struct elf *elf;
+ struct section *sec;
+ int ret, warnings = 0;
+
+ argp_parse(&argp, argc, argv, 0, 0, &args);
+
+ elf = elf_open(args.args[0]);
+ if (!elf) {
+ fprintf(stderr, "error reading elf file %s\n", args.args[0]);
+ return 1;
+ }
+
+ if (is_file_whitelisted(elf))
+ return 0;
+
+ list_for_each_entry(sec, &elf->sections, list) {
+ ret = validate_section(elf, sec);
+ if (ret < 0)
+ return 1;
+
+ warnings += ret;
+ }
+
+ /* ignore warnings for now until we get all the asm code cleaned up */
+ return 0;
+}
--
2.1.0
* [PATCH v4 2/3] x86: Make push/pop CFI macros arch-independent
2015-05-18 16:34 [PATCH v4 0/3] Compile-time stack frame pointer validation Josh Poimboeuf
2015-05-18 16:34 ` [PATCH v4 1/3] x86, stackvalidate: " Josh Poimboeuf
@ 2015-05-18 16:34 ` Josh Poimboeuf
2015-05-18 16:34 ` [PATCH v4 3/3] x86, stackvalidate: Add asm frame pointer setup macros Josh Poimboeuf
2015-05-20 10:33 ` [PATCH v4 0/3] Compile-time stack frame pointer validation Ingo Molnar
3 siblings, 0 replies; 400+ messages in thread
From: Josh Poimboeuf @ 2015-05-18 16:34 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
Cc: Michal Marek, Peter Zijlstra, x86, live-patching, linux-kernel
The separate push{lq}_cfi and pop{lq}_cfi macros aren't needed. Push
and pop only come in one size per architecture, so the trailing 'q' or
'l' characters are redundant, and awkward to use in arch-independent
code.
Replace the push/pop CFI macros with architecture-independent versions:
push_cfi, pop_cfi, etc.
This change is purely cosmetic, with no resulting object code changes.
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
arch/x86/ia32/ia32entry.S | 60 ++++++------
arch/x86/include/asm/calling.h | 28 +++---
arch/x86/include/asm/dwarf2.h | 92 ++++++------------
arch/x86/include/asm/frame.h | 4 +-
arch/x86/kernel/entry_32.S | 214 ++++++++++++++++++++---------------------
arch/x86/kernel/entry_64.S | 96 +++++++++---------
arch/x86/lib/atomic64_386_32.S | 4 +-
arch/x86/lib/atomic64_cx8_32.S | 40 ++++----
arch/x86/lib/checksum_32.S | 42 ++++----
arch/x86/lib/cmpxchg16b_emu.S | 6 +-
arch/x86/lib/cmpxchg8b_emu.S | 6 +-
arch/x86/lib/msr-reg.S | 34 +++----
arch/x86/lib/rwsem.S | 40 ++++----
arch/x86/lib/thunk_32.S | 12 +--
arch/x86/lib/thunk_64.S | 36 +++----
15 files changed, 343 insertions(+), 371 deletions(-)
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 83e4ed2..7259bc9 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -124,19 +124,19 @@ ENTRY(ia32_sysenter_target)
CFI_REGISTER rip,r10
/* Construct struct pt_regs on stack */
- pushq_cfi $__USER32_DS /* pt_regs->ss */
- pushq_cfi %rbp /* pt_regs->sp */
+ push_cfi $__USER32_DS /* pt_regs->ss */
+ push_cfi %rbp /* pt_regs->sp */
CFI_REL_OFFSET rsp,0
- pushfq_cfi /* pt_regs->flags */
- pushq_cfi $__USER32_CS /* pt_regs->cs */
- pushq_cfi %r10 /* pt_regs->ip = thread_info->sysenter_return */
+ pushf_cfi /* pt_regs->flags */
+ push_cfi $__USER32_CS /* pt_regs->cs */
+ push_cfi %r10 /* pt_regs->ip = thread_info->sysenter_return */
CFI_REL_OFFSET rip,0
- pushq_cfi_reg rax /* pt_regs->orig_ax */
- pushq_cfi_reg rdi /* pt_regs->di */
- pushq_cfi_reg rsi /* pt_regs->si */
- pushq_cfi_reg rdx /* pt_regs->dx */
- pushq_cfi_reg rcx /* pt_regs->cx */
- pushq_cfi $-ENOSYS /* pt_regs->ax */
+ push_cfi_reg rax /* pt_regs->orig_ax */
+ push_cfi_reg rdi /* pt_regs->di */
+ push_cfi_reg rsi /* pt_regs->si */
+ push_cfi_reg rdx /* pt_regs->dx */
+ push_cfi_reg rcx /* pt_regs->cx */
+ push_cfi $-ENOSYS /* pt_regs->ax */
cld
sub $(10*8),%rsp /* pt_regs->r8-11,bp,bx,r12-15 not saved */
CFI_ADJUST_CFA_OFFSET 10*8
@@ -282,8 +282,8 @@ sysexit_audit:
#endif
sysenter_fix_flags:
- pushq_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
- popfq_cfi
+ push_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
+ popf_cfi
jmp sysenter_flags_fixed
sysenter_tracesys:
@@ -353,20 +353,20 @@ ENTRY(ia32_cstar_target)
movl %eax,%eax
/* Construct struct pt_regs on stack */
- pushq_cfi $__USER32_DS /* pt_regs->ss */
- pushq_cfi %r8 /* pt_regs->sp */
+ push_cfi $__USER32_DS /* pt_regs->ss */
+ push_cfi %r8 /* pt_regs->sp */
CFI_REL_OFFSET rsp,0
- pushq_cfi %r11 /* pt_regs->flags */
- pushq_cfi $__USER32_CS /* pt_regs->cs */
- pushq_cfi %rcx /* pt_regs->ip */
+ push_cfi %r11 /* pt_regs->flags */
+ push_cfi $__USER32_CS /* pt_regs->cs */
+ push_cfi %rcx /* pt_regs->ip */
CFI_REL_OFFSET rip,0
- pushq_cfi_reg rax /* pt_regs->orig_ax */
- pushq_cfi_reg rdi /* pt_regs->di */
- pushq_cfi_reg rsi /* pt_regs->si */
- pushq_cfi_reg rdx /* pt_regs->dx */
- pushq_cfi_reg rbp /* pt_regs->cx */
+ push_cfi_reg rax /* pt_regs->orig_ax */
+ push_cfi_reg rdi /* pt_regs->di */
+ push_cfi_reg rsi /* pt_regs->si */
+ push_cfi_reg rdx /* pt_regs->dx */
+ push_cfi_reg rbp /* pt_regs->cx */
movl %ebp,%ecx
- pushq_cfi $-ENOSYS /* pt_regs->ax */
+ push_cfi $-ENOSYS /* pt_regs->ax */
sub $(10*8),%rsp /* pt_regs->r8-11,bp,bx,r12-15 not saved */
CFI_ADJUST_CFA_OFFSET 10*8
@@ -506,12 +506,12 @@ ENTRY(ia32_syscall)
movl %eax,%eax
/* Construct struct pt_regs on stack (iret frame is already on stack) */
- pushq_cfi_reg rax /* pt_regs->orig_ax */
- pushq_cfi_reg rdi /* pt_regs->di */
- pushq_cfi_reg rsi /* pt_regs->si */
- pushq_cfi_reg rdx /* pt_regs->dx */
- pushq_cfi_reg rcx /* pt_regs->cx */
- pushq_cfi $-ENOSYS /* pt_regs->ax */
+ push_cfi_reg rax /* pt_regs->orig_ax */
+ push_cfi_reg rdi /* pt_regs->di */
+ push_cfi_reg rsi /* pt_regs->si */
+ push_cfi_reg rdx /* pt_regs->dx */
+ push_cfi_reg rcx /* pt_regs->cx */
+ push_cfi $-ENOSYS /* pt_regs->ax */
cld
sub $(10*8),%rsp /* pt_regs->r8-11,bp,bx,r12-15 not saved */
CFI_ADJUST_CFA_OFFSET 10*8
diff --git a/arch/x86/include/asm/calling.h b/arch/x86/include/asm/calling.h
index 1c8b50e..4abc60f 100644
--- a/arch/x86/include/asm/calling.h
+++ b/arch/x86/include/asm/calling.h
@@ -224,23 +224,23 @@ For 32-bit we have the following conventions - kernel is built with
*/
.macro SAVE_ALL
- pushl_cfi_reg eax
- pushl_cfi_reg ebp
- pushl_cfi_reg edi
- pushl_cfi_reg esi
- pushl_cfi_reg edx
- pushl_cfi_reg ecx
- pushl_cfi_reg ebx
+ push_cfi_reg eax
+ push_cfi_reg ebp
+ push_cfi_reg edi
+ push_cfi_reg esi
+ push_cfi_reg edx
+ push_cfi_reg ecx
+ push_cfi_reg ebx
.endm
.macro RESTORE_ALL
- popl_cfi_reg ebx
- popl_cfi_reg ecx
- popl_cfi_reg edx
- popl_cfi_reg esi
- popl_cfi_reg edi
- popl_cfi_reg ebp
- popl_cfi_reg eax
+ pop_cfi_reg ebx
+ pop_cfi_reg ecx
+ pop_cfi_reg edx
+ pop_cfi_reg esi
+ pop_cfi_reg edi
+ pop_cfi_reg ebp
+ pop_cfi_reg eax
.endm
#endif /* CONFIG_X86_64 */
diff --git a/arch/x86/include/asm/dwarf2.h b/arch/x86/include/asm/dwarf2.h
index de1cdaf..5af7e15 100644
--- a/arch/x86/include/asm/dwarf2.h
+++ b/arch/x86/include/asm/dwarf2.h
@@ -5,6 +5,8 @@
#warning "asm/dwarf2.h should be only included in pure assembly files"
#endif
+#include <asm/asm.h>
+
/*
* Macros for dwarf2 CFI unwind table entries.
* See "as.info" for details on these pseudo ops. Unfortunately
@@ -80,79 +82,39 @@
* what you're doing if you use them.
*/
#ifdef __ASSEMBLY__
-#ifdef CONFIG_X86_64
- .macro pushq_cfi reg
- pushq \reg
- CFI_ADJUST_CFA_OFFSET 8
- .endm
-
- .macro pushq_cfi_reg reg
- pushq %\reg
- CFI_ADJUST_CFA_OFFSET 8
- CFI_REL_OFFSET \reg, 0
- .endm
- .macro popq_cfi reg
- popq \reg
- CFI_ADJUST_CFA_OFFSET -8
- .endm
-
- .macro popq_cfi_reg reg
- popq %\reg
- CFI_ADJUST_CFA_OFFSET -8
- CFI_RESTORE \reg
- .endm
+#define STACK_WORD_SIZE __ASM_SEL(4,8)
- .macro pushfq_cfi
- pushfq
- CFI_ADJUST_CFA_OFFSET 8
+ .macro push_cfi reg
+ push \reg
+ CFI_ADJUST_CFA_OFFSET STACK_WORD_SIZE
.endm
- .macro popfq_cfi
- popfq
- CFI_ADJUST_CFA_OFFSET -8
- .endm
-
- .macro movq_cfi reg offset=0
- movq %\reg, \offset(%rsp)
- CFI_REL_OFFSET \reg, \offset
- .endm
-
- .macro movq_cfi_restore offset reg
- movq \offset(%rsp), %\reg
- CFI_RESTORE \reg
- .endm
-#else /*!CONFIG_X86_64*/
- .macro pushl_cfi reg
- pushl \reg
- CFI_ADJUST_CFA_OFFSET 4
- .endm
-
- .macro pushl_cfi_reg reg
- pushl %\reg
- CFI_ADJUST_CFA_OFFSET 4
+ .macro push_cfi_reg reg
+ push %\reg
+ CFI_ADJUST_CFA_OFFSET STACK_WORD_SIZE
CFI_REL_OFFSET \reg, 0
.endm
- .macro popl_cfi reg
- popl \reg
- CFI_ADJUST_CFA_OFFSET -4
+ .macro pop_cfi reg
+ pop \reg
+ CFI_ADJUST_CFA_OFFSET -STACK_WORD_SIZE
.endm
- .macro popl_cfi_reg reg
- popl %\reg
- CFI_ADJUST_CFA_OFFSET -4
+ .macro pop_cfi_reg reg
+ pop %\reg
+ CFI_ADJUST_CFA_OFFSET -STACK_WORD_SIZE
CFI_RESTORE \reg
.endm
- .macro pushfl_cfi
- pushfl
- CFI_ADJUST_CFA_OFFSET 4
+ .macro pushf_cfi
+ pushf
+ CFI_ADJUST_CFA_OFFSET STACK_WORD_SIZE
.endm
- .macro popfl_cfi
- popfl
- CFI_ADJUST_CFA_OFFSET -4
+ .macro popf_cfi
+ popf
+ CFI_ADJUST_CFA_OFFSET -STACK_WORD_SIZE
.endm
.macro movl_cfi reg offset=0
@@ -164,7 +126,17 @@
movl \offset(%esp), %\reg
CFI_RESTORE \reg
.endm
-#endif /*!CONFIG_X86_64*/
+
+ .macro movq_cfi reg offset=0
+ movq %\reg, \offset(%rsp)
+ CFI_REL_OFFSET \reg, \offset
+ .endm
+
+ .macro movq_cfi_restore offset reg
+ movq \offset(%rsp), %\reg
+ CFI_RESTORE \reg
+ .endm
+
#endif /*__ASSEMBLY__*/
#endif /* _ASM_X86_DWARF2_H */
diff --git a/arch/x86/include/asm/frame.h b/arch/x86/include/asm/frame.h
index 3b629f4..325e4e8 100644
--- a/arch/x86/include/asm/frame.h
+++ b/arch/x86/include/asm/frame.h
@@ -8,12 +8,12 @@
frame pointer later */
#ifdef CONFIG_FRAME_POINTER
.macro FRAME
- __ASM_SIZE(push,_cfi) %__ASM_REG(bp)
+ push_cfi %__ASM_REG(bp)
CFI_REL_OFFSET __ASM_REG(bp), 0
__ASM_SIZE(mov) %__ASM_REG(sp), %__ASM_REG(bp)
.endm
.macro ENDFRAME
- __ASM_SIZE(pop,_cfi) %__ASM_REG(bp)
+ pop_cfi %__ASM_REG(bp)
CFI_RESTORE __ASM_REG(bp)
.endm
#else
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 1c30976..7e88181 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -113,7 +113,7 @@
/* unfortunately push/pop can't be no-op */
.macro PUSH_GS
- pushl_cfi $0
+ push_cfi $0
.endm
.macro POP_GS pop=0
addl $(4 + \pop), %esp
@@ -137,12 +137,12 @@
#else /* CONFIG_X86_32_LAZY_GS */
.macro PUSH_GS
- pushl_cfi %gs
+ push_cfi %gs
/*CFI_REL_OFFSET gs, 0*/
.endm
.macro POP_GS pop=0
-98: popl_cfi %gs
+98: pop_cfi %gs
/*CFI_RESTORE gs*/
.if \pop <> 0
add $\pop, %esp
@@ -186,25 +186,25 @@
.macro SAVE_ALL
cld
PUSH_GS
- pushl_cfi %fs
+ push_cfi %fs
/*CFI_REL_OFFSET fs, 0;*/
- pushl_cfi %es
+ push_cfi %es
/*CFI_REL_OFFSET es, 0;*/
- pushl_cfi %ds
+ push_cfi %ds
/*CFI_REL_OFFSET ds, 0;*/
- pushl_cfi %eax
+ push_cfi %eax
CFI_REL_OFFSET eax, 0
- pushl_cfi %ebp
+ push_cfi %ebp
CFI_REL_OFFSET ebp, 0
- pushl_cfi %edi
+ push_cfi %edi
CFI_REL_OFFSET edi, 0
- pushl_cfi %esi
+ push_cfi %esi
CFI_REL_OFFSET esi, 0
- pushl_cfi %edx
+ push_cfi %edx
CFI_REL_OFFSET edx, 0
- pushl_cfi %ecx
+ push_cfi %ecx
CFI_REL_OFFSET ecx, 0
- pushl_cfi %ebx
+ push_cfi %ebx
CFI_REL_OFFSET ebx, 0
movl $(__USER_DS), %edx
movl %edx, %ds
@@ -215,29 +215,29 @@
.endm
.macro RESTORE_INT_REGS
- popl_cfi %ebx
+ pop_cfi %ebx
CFI_RESTORE ebx
- popl_cfi %ecx
+ pop_cfi %ecx
CFI_RESTORE ecx
- popl_cfi %edx
+ pop_cfi %edx
CFI_RESTORE edx
- popl_cfi %esi
+ pop_cfi %esi
CFI_RESTORE esi
- popl_cfi %edi
+ pop_cfi %edi
CFI_RESTORE edi
- popl_cfi %ebp
+ pop_cfi %ebp
CFI_RESTORE ebp
- popl_cfi %eax
+ pop_cfi %eax
CFI_RESTORE eax
.endm
.macro RESTORE_REGS pop=0
RESTORE_INT_REGS
-1: popl_cfi %ds
+1: pop_cfi %ds
/*CFI_RESTORE ds;*/
-2: popl_cfi %es
+2: pop_cfi %es
/*CFI_RESTORE es;*/
-3: popl_cfi %fs
+3: pop_cfi %fs
/*CFI_RESTORE fs;*/
POP_GS \pop
.pushsection .fixup, "ax"
@@ -289,24 +289,24 @@
ENTRY(ret_from_fork)
CFI_STARTPROC
- pushl_cfi %eax
+ push_cfi %eax
call schedule_tail
GET_THREAD_INFO(%ebp)
- popl_cfi %eax
- pushl_cfi $0x0202 # Reset kernel eflags
- popfl_cfi
+ pop_cfi %eax
+ push_cfi $0x0202 # Reset kernel eflags
+ popf_cfi
jmp syscall_exit
CFI_ENDPROC
END(ret_from_fork)
ENTRY(ret_from_kernel_thread)
CFI_STARTPROC
- pushl_cfi %eax
+ push_cfi %eax
call schedule_tail
GET_THREAD_INFO(%ebp)
- popl_cfi %eax
- pushl_cfi $0x0202 # Reset kernel eflags
- popfl_cfi
+ pop_cfi %eax
+ push_cfi $0x0202 # Reset kernel eflags
+ popf_cfi
movl PT_EBP(%esp),%eax
call *PT_EBX(%esp)
movl $0,PT_EAX(%esp)
@@ -385,13 +385,13 @@ sysenter_past_esp:
* enough kernel state to call TRACE_IRQS_OFF can be called - but
* we immediately enable interrupts at that point anyway.
*/
- pushl_cfi $__USER_DS
+ push_cfi $__USER_DS
/*CFI_REL_OFFSET ss, 0*/
- pushl_cfi %ebp
+ push_cfi %ebp
CFI_REL_OFFSET esp, 0
- pushfl_cfi
+ pushf_cfi
orl $X86_EFLAGS_IF, (%esp)
- pushl_cfi $__USER_CS
+ push_cfi $__USER_CS
/*CFI_REL_OFFSET cs, 0*/
/*
* Push current_thread_info()->sysenter_return to the stack.
@@ -401,10 +401,10 @@ sysenter_past_esp:
* TOP_OF_KERNEL_STACK_PADDING takes us to the top of the stack;
* and THREAD_SIZE takes us to the bottom.
*/
- pushl_cfi ((TI_sysenter_return) - THREAD_SIZE + TOP_OF_KERNEL_STACK_PADDING + 4*4)(%esp)
+ push_cfi ((TI_sysenter_return) - THREAD_SIZE + TOP_OF_KERNEL_STACK_PADDING + 4*4)(%esp)
CFI_REL_OFFSET eip, 0
- pushl_cfi %eax
+ push_cfi %eax
SAVE_ALL
ENABLE_INTERRUPTS(CLBR_NONE)
@@ -453,11 +453,11 @@ sysenter_audit:
/* movl PT_EAX(%esp), %eax already set, syscall number: 1st arg to audit */
movl PT_EBX(%esp), %edx /* ebx/a0: 2nd arg to audit */
/* movl PT_ECX(%esp), %ecx already set, a1: 3nd arg to audit */
- pushl_cfi PT_ESI(%esp) /* a3: 5th arg */
- pushl_cfi PT_EDX+4(%esp) /* a2: 4th arg */
+ push_cfi PT_ESI(%esp) /* a3: 5th arg */
+ push_cfi PT_EDX+4(%esp) /* a2: 4th arg */
call __audit_syscall_entry
- popl_cfi %ecx /* get that remapped edx off the stack */
- popl_cfi %ecx /* get that remapped esi off the stack */
+ pop_cfi %ecx /* get that remapped edx off the stack */
+ pop_cfi %ecx /* get that remapped esi off the stack */
movl PT_EAX(%esp),%eax /* reload syscall number */
jmp sysenter_do_call
@@ -493,7 +493,7 @@ ENDPROC(ia32_sysenter_target)
ENTRY(system_call)
RING0_INT_FRAME # can't unwind into user space anyway
ASM_CLAC
- pushl_cfi %eax # save orig_eax
+ push_cfi %eax # save orig_eax
SAVE_ALL
GET_THREAD_INFO(%ebp)
# system call tracing in operation / emulation
@@ -577,8 +577,8 @@ ldt_ss:
shr $16, %edx
mov %dl, GDT_ESPFIX_SS + 4 /* bits 16..23 */
mov %dh, GDT_ESPFIX_SS + 7 /* bits 24..31 */
- pushl_cfi $__ESPFIX_SS
- pushl_cfi %eax /* new kernel esp */
+ push_cfi $__ESPFIX_SS
+ push_cfi %eax /* new kernel esp */
/* Disable interrupts, but do not irqtrace this section: we
* will soon execute iret and the tracer was already set to
* the irqstate after the iret */
@@ -634,9 +634,9 @@ work_notifysig: # deal with pending signals and
#ifdef CONFIG_VM86
ALIGN
work_notifysig_v86:
- pushl_cfi %ecx # save ti_flags for do_notify_resume
+ push_cfi %ecx # save ti_flags for do_notify_resume
call save_v86_state # %eax contains pt_regs pointer
- popl_cfi %ecx
+ pop_cfi %ecx
movl %eax, %esp
jmp 1b
#endif
@@ -701,8 +701,8 @@ END(sysenter_badsys)
mov GDT_ESPFIX_SS + 7, %ah /* bits 24..31 */
shl $16, %eax
addl %esp, %eax /* the adjusted stack pointer */
- pushl_cfi $__KERNEL_DS
- pushl_cfi %eax
+ push_cfi $__KERNEL_DS
+ push_cfi %eax
lss (%esp), %esp /* switch to the normal stack segment */
CFI_ADJUST_CFA_OFFSET -8
#endif
@@ -731,7 +731,7 @@ ENTRY(irq_entries_start)
RING0_INT_FRAME
vector=FIRST_EXTERNAL_VECTOR
.rept (FIRST_SYSTEM_VECTOR - FIRST_EXTERNAL_VECTOR)
- pushl_cfi $(~vector+0x80) /* Note: always in signed byte range */
+ push_cfi $(~vector+0x80) /* Note: always in signed byte range */
vector=vector+1
jmp common_interrupt
CFI_ADJUST_CFA_OFFSET -4
@@ -759,7 +759,7 @@ ENDPROC(common_interrupt)
ENTRY(name) \
RING0_INT_FRAME; \
ASM_CLAC; \
- pushl_cfi $~(nr); \
+ push_cfi $~(nr); \
SAVE_ALL; \
TRACE_IRQS_OFF \
movl %esp,%eax; \
@@ -786,8 +786,8 @@ ENDPROC(name)
ENTRY(coprocessor_error)
RING0_INT_FRAME
ASM_CLAC
- pushl_cfi $0
- pushl_cfi $do_coprocessor_error
+ push_cfi $0
+ push_cfi $do_coprocessor_error
jmp error_code
CFI_ENDPROC
END(coprocessor_error)
@@ -795,14 +795,14 @@ END(coprocessor_error)
ENTRY(simd_coprocessor_error)
RING0_INT_FRAME
ASM_CLAC
- pushl_cfi $0
+ push_cfi $0
#ifdef CONFIG_X86_INVD_BUG
/* AMD 486 bug: invd from userspace calls exception 19 instead of #GP */
- ALTERNATIVE "pushl_cfi $do_general_protection", \
+ ALTERNATIVE "push_cfi $do_general_protection", \
"pushl $do_simd_coprocessor_error", \
X86_FEATURE_XMM
#else
- pushl_cfi $do_simd_coprocessor_error
+ push_cfi $do_simd_coprocessor_error
#endif
jmp error_code
CFI_ENDPROC
@@ -811,8 +811,8 @@ END(simd_coprocessor_error)
ENTRY(device_not_available)
RING0_INT_FRAME
ASM_CLAC
- pushl_cfi $-1 # mark this as an int
- pushl_cfi $do_device_not_available
+ push_cfi $-1 # mark this as an int
+ push_cfi $do_device_not_available
jmp error_code
CFI_ENDPROC
END(device_not_available)
@@ -832,8 +832,8 @@ END(native_irq_enable_sysexit)
ENTRY(overflow)
RING0_INT_FRAME
ASM_CLAC
- pushl_cfi $0
- pushl_cfi $do_overflow
+ push_cfi $0
+ push_cfi $do_overflow
jmp error_code
CFI_ENDPROC
END(overflow)
@@ -841,8 +841,8 @@ END(overflow)
ENTRY(bounds)
RING0_INT_FRAME
ASM_CLAC
- pushl_cfi $0
- pushl_cfi $do_bounds
+ push_cfi $0
+ push_cfi $do_bounds
jmp error_code
CFI_ENDPROC
END(bounds)
@@ -850,8 +850,8 @@ END(bounds)
ENTRY(invalid_op)
RING0_INT_FRAME
ASM_CLAC
- pushl_cfi $0
- pushl_cfi $do_invalid_op
+ push_cfi $0
+ push_cfi $do_invalid_op
jmp error_code
CFI_ENDPROC
END(invalid_op)
@@ -859,8 +859,8 @@ END(invalid_op)
ENTRY(coprocessor_segment_overrun)
RING0_INT_FRAME
ASM_CLAC
- pushl_cfi $0
- pushl_cfi $do_coprocessor_segment_overrun
+ push_cfi $0
+ push_cfi $do_coprocessor_segment_overrun
jmp error_code
CFI_ENDPROC
END(coprocessor_segment_overrun)
@@ -868,7 +868,7 @@ END(coprocessor_segment_overrun)
ENTRY(invalid_TSS)
RING0_EC_FRAME
ASM_CLAC
- pushl_cfi $do_invalid_TSS
+ push_cfi $do_invalid_TSS
jmp error_code
CFI_ENDPROC
END(invalid_TSS)
@@ -876,7 +876,7 @@ END(invalid_TSS)
ENTRY(segment_not_present)
RING0_EC_FRAME
ASM_CLAC
- pushl_cfi $do_segment_not_present
+ push_cfi $do_segment_not_present
jmp error_code
CFI_ENDPROC
END(segment_not_present)
@@ -884,7 +884,7 @@ END(segment_not_present)
ENTRY(stack_segment)
RING0_EC_FRAME
ASM_CLAC
- pushl_cfi $do_stack_segment
+ push_cfi $do_stack_segment
jmp error_code
CFI_ENDPROC
END(stack_segment)
@@ -892,7 +892,7 @@ END(stack_segment)
ENTRY(alignment_check)
RING0_EC_FRAME
ASM_CLAC
- pushl_cfi $do_alignment_check
+ push_cfi $do_alignment_check
jmp error_code
CFI_ENDPROC
END(alignment_check)
@@ -900,8 +900,8 @@ END(alignment_check)
ENTRY(divide_error)
RING0_INT_FRAME
ASM_CLAC
- pushl_cfi $0 # no error code
- pushl_cfi $do_divide_error
+ push_cfi $0 # no error code
+ push_cfi $do_divide_error
jmp error_code
CFI_ENDPROC
END(divide_error)
@@ -910,8 +910,8 @@ END(divide_error)
ENTRY(machine_check)
RING0_INT_FRAME
ASM_CLAC
- pushl_cfi $0
- pushl_cfi machine_check_vector
+ push_cfi $0
+ push_cfi machine_check_vector
jmp error_code
CFI_ENDPROC
END(machine_check)
@@ -920,8 +920,8 @@ END(machine_check)
ENTRY(spurious_interrupt_bug)
RING0_INT_FRAME
ASM_CLAC
- pushl_cfi $0
- pushl_cfi $do_spurious_interrupt_bug
+ push_cfi $0
+ push_cfi $do_spurious_interrupt_bug
jmp error_code
CFI_ENDPROC
END(spurious_interrupt_bug)
@@ -938,7 +938,7 @@ ENTRY(xen_sysenter_target)
ENTRY(xen_hypervisor_callback)
CFI_STARTPROC
- pushl_cfi $-1 /* orig_ax = -1 => not a system call */
+ push_cfi $-1 /* orig_ax = -1 => not a system call */
SAVE_ALL
TRACE_IRQS_OFF
@@ -977,7 +977,7 @@ ENDPROC(xen_hypervisor_callback)
# We distinguish between categories by maintaining a status value in EAX.
ENTRY(xen_failsafe_callback)
CFI_STARTPROC
- pushl_cfi %eax
+ push_cfi %eax
movl $1,%eax
1: mov 4(%esp),%ds
2: mov 8(%esp),%es
@@ -986,12 +986,12 @@ ENTRY(xen_failsafe_callback)
/* EAX == 0 => Category 1 (Bad segment)
EAX != 0 => Category 2 (Bad IRET) */
testl %eax,%eax
- popl_cfi %eax
+ pop_cfi %eax
lea 16(%esp),%esp
CFI_ADJUST_CFA_OFFSET -16
jz 5f
jmp iret_exc
-5: pushl_cfi $-1 /* orig_ax = -1 => not a system call */
+5: push_cfi $-1 /* orig_ax = -1 => not a system call */
SAVE_ALL
jmp ret_from_exception
CFI_ENDPROC
@@ -1197,7 +1197,7 @@ return_to_handler:
ENTRY(trace_page_fault)
RING0_EC_FRAME
ASM_CLAC
- pushl_cfi $trace_do_page_fault
+ push_cfi $trace_do_page_fault
jmp error_code
CFI_ENDPROC
END(trace_page_fault)
@@ -1206,23 +1206,23 @@ END(trace_page_fault)
ENTRY(page_fault)
RING0_EC_FRAME
ASM_CLAC
- pushl_cfi $do_page_fault
+ push_cfi $do_page_fault
ALIGN
error_code:
/* the function address is in %gs's slot on the stack */
- pushl_cfi %fs
+ push_cfi %fs
/*CFI_REL_OFFSET fs, 0*/
- pushl_cfi %es
+ push_cfi %es
/*CFI_REL_OFFSET es, 0*/
- pushl_cfi %ds
+ push_cfi %ds
/*CFI_REL_OFFSET ds, 0*/
- pushl_cfi_reg eax
- pushl_cfi_reg ebp
- pushl_cfi_reg edi
- pushl_cfi_reg esi
- pushl_cfi_reg edx
- pushl_cfi_reg ecx
- pushl_cfi_reg ebx
+ push_cfi_reg eax
+ push_cfi_reg ebp
+ push_cfi_reg edi
+ push_cfi_reg esi
+ push_cfi_reg edx
+ push_cfi_reg ecx
+ push_cfi_reg ebx
cld
movl $(__KERNEL_PERCPU), %ecx
movl %ecx, %fs
@@ -1263,9 +1263,9 @@ END(page_fault)
movl TSS_sysenter_sp0 + \offset(%esp), %esp
CFI_DEF_CFA esp, 0
CFI_UNDEFINED eip
- pushfl_cfi
- pushl_cfi $__KERNEL_CS
- pushl_cfi $sysenter_past_esp
+ pushf_cfi
+ push_cfi $__KERNEL_CS
+ push_cfi $sysenter_past_esp
CFI_REL_OFFSET eip, 0
.endm
@@ -1276,7 +1276,7 @@ ENTRY(debug)
jne debug_stack_correct
FIX_STACK 12, debug_stack_correct, debug_esp_fix_insn
debug_stack_correct:
- pushl_cfi $-1 # mark this as an int
+ push_cfi $-1 # mark this as an int
SAVE_ALL
TRACE_IRQS_OFF
xorl %edx,%edx # error code 0
@@ -1298,28 +1298,28 @@ ENTRY(nmi)
RING0_INT_FRAME
ASM_CLAC
#ifdef CONFIG_X86_ESPFIX32
- pushl_cfi %eax
+ push_cfi %eax
movl %ss, %eax
cmpw $__ESPFIX_SS, %ax
- popl_cfi %eax
+ pop_cfi %eax
je nmi_espfix_stack
#endif
cmpl $ia32_sysenter_target,(%esp)
je nmi_stack_fixup
- pushl_cfi %eax
+ push_cfi %eax
movl %esp,%eax
/* Do not access memory above the end of our stack page,
* it might not exist.
*/
andl $(THREAD_SIZE-1),%eax
cmpl $(THREAD_SIZE-20),%eax
- popl_cfi %eax
+ pop_cfi %eax
jae nmi_stack_correct
cmpl $ia32_sysenter_target,12(%esp)
je nmi_debug_stack_check
nmi_stack_correct:
/* We have a RING0_INT_FRAME here */
- pushl_cfi %eax
+ push_cfi %eax
SAVE_ALL
xorl %edx,%edx # zero error code
movl %esp,%eax # pt_regs pointer
@@ -1349,14 +1349,14 @@ nmi_espfix_stack:
*
* create the pointer to lss back
*/
- pushl_cfi %ss
- pushl_cfi %esp
+ push_cfi %ss
+ push_cfi %esp
addl $4, (%esp)
/* copy the iret frame of 12 bytes */
.rept 3
- pushl_cfi 16(%esp)
+ push_cfi 16(%esp)
.endr
- pushl_cfi %eax
+ push_cfi %eax
SAVE_ALL
FIXUP_ESPFIX_STACK # %eax == %esp
xorl %edx,%edx # zero error code
@@ -1372,7 +1372,7 @@ END(nmi)
ENTRY(int3)
RING0_INT_FRAME
ASM_CLAC
- pushl_cfi $-1 # mark this as an int
+ push_cfi $-1 # mark this as an int
SAVE_ALL
TRACE_IRQS_OFF
xorl %edx,%edx # zero error code
@@ -1384,7 +1384,7 @@ END(int3)
ENTRY(general_protection)
RING0_EC_FRAME
- pushl_cfi $do_general_protection
+ push_cfi $do_general_protection
jmp error_code
CFI_ENDPROC
END(general_protection)
@@ -1393,7 +1393,7 @@ END(general_protection)
ENTRY(async_page_fault)
RING0_EC_FRAME
ASM_CLAC
- pushl_cfi $do_async_page_fault
+ push_cfi $do_async_page_fault
jmp error_code
CFI_ENDPROC
END(async_page_fault)
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 4e0ed47..3f2c4b2 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -219,8 +219,8 @@ GLOBAL(system_call_after_swapgs)
movq PER_CPU_VAR(cpu_current_top_of_stack),%rsp
/* Construct struct pt_regs on stack */
- pushq_cfi $__USER_DS /* pt_regs->ss */
- pushq_cfi PER_CPU_VAR(rsp_scratch) /* pt_regs->sp */
+ push_cfi $__USER_DS /* pt_regs->ss */
+ push_cfi PER_CPU_VAR(rsp_scratch) /* pt_regs->sp */
/*
* Re-enable interrupts.
* We use 'rsp_scratch' as a scratch space, hence irq-off block above
@@ -229,20 +229,20 @@ GLOBAL(system_call_after_swapgs)
* with using rsp_scratch:
*/
ENABLE_INTERRUPTS(CLBR_NONE)
- pushq_cfi %r11 /* pt_regs->flags */
- pushq_cfi $__USER_CS /* pt_regs->cs */
- pushq_cfi %rcx /* pt_regs->ip */
+ push_cfi %r11 /* pt_regs->flags */
+ push_cfi $__USER_CS /* pt_regs->cs */
+ push_cfi %rcx /* pt_regs->ip */
CFI_REL_OFFSET rip,0
- pushq_cfi_reg rax /* pt_regs->orig_ax */
- pushq_cfi_reg rdi /* pt_regs->di */
- pushq_cfi_reg rsi /* pt_regs->si */
- pushq_cfi_reg rdx /* pt_regs->dx */
- pushq_cfi_reg rcx /* pt_regs->cx */
- pushq_cfi $-ENOSYS /* pt_regs->ax */
- pushq_cfi_reg r8 /* pt_regs->r8 */
- pushq_cfi_reg r9 /* pt_regs->r9 */
- pushq_cfi_reg r10 /* pt_regs->r10 */
- pushq_cfi_reg r11 /* pt_regs->r11 */
+ push_cfi_reg rax /* pt_regs->orig_ax */
+ push_cfi_reg rdi /* pt_regs->di */
+ push_cfi_reg rsi /* pt_regs->si */
+ push_cfi_reg rdx /* pt_regs->dx */
+ push_cfi_reg rcx /* pt_regs->cx */
+ push_cfi $-ENOSYS /* pt_regs->ax */
+ push_cfi_reg r8 /* pt_regs->r8 */
+ push_cfi_reg r9 /* pt_regs->r9 */
+ push_cfi_reg r10 /* pt_regs->r10 */
+ push_cfi_reg r11 /* pt_regs->r11 */
sub $(6*8),%rsp /* pt_regs->bp,bx,r12-15 not saved */
CFI_ADJUST_CFA_OFFSET 6*8
@@ -374,9 +374,9 @@ int_careful:
jnc int_very_careful
TRACE_IRQS_ON
ENABLE_INTERRUPTS(CLBR_NONE)
- pushq_cfi %rdi
+ push_cfi %rdi
SCHEDULE_USER
- popq_cfi %rdi
+ pop_cfi %rdi
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
jmp int_with_check
@@ -389,10 +389,10 @@ int_very_careful:
/* Check for syscall exit trace */
testl $_TIF_WORK_SYSCALL_EXIT,%edx
jz int_signal
- pushq_cfi %rdi
+ push_cfi %rdi
leaq 8(%rsp),%rdi # &ptregs -> arg1
call syscall_trace_leave
- popq_cfi %rdi
+ pop_cfi %rdi
andl $~(_TIF_WORK_SYSCALL_EXIT|_TIF_SYSCALL_EMU),%edi
jmp int_restore_rest
@@ -603,8 +603,8 @@ ENTRY(ret_from_fork)
LOCK ; btr $TIF_FORK,TI_flags(%r8)
- pushq_cfi $0x0002
- popfq_cfi # reset kernel eflags
+ push_cfi $0x0002
+ popf_cfi # reset kernel eflags
call schedule_tail # rdi: 'prev' task parameter
@@ -640,7 +640,7 @@ ENTRY(irq_entries_start)
INTR_FRAME
vector=FIRST_EXTERNAL_VECTOR
.rept (FIRST_SYSTEM_VECTOR - FIRST_EXTERNAL_VECTOR)
- pushq_cfi $(~vector+0x80) /* Note: always in signed byte range */
+ push_cfi $(~vector+0x80) /* Note: always in signed byte range */
vector=vector+1
jmp common_interrupt
CFI_ADJUST_CFA_OFFSET -8
@@ -807,8 +807,8 @@ native_irq_return_iret:
#ifdef CONFIG_X86_ESPFIX64
native_irq_return_ldt:
- pushq_cfi %rax
- pushq_cfi %rdi
+ push_cfi %rax
+ push_cfi %rdi
SWAPGS
movq PER_CPU_VAR(espfix_waddr),%rdi
movq %rax,(0*8)(%rdi) /* RAX */
@@ -823,11 +823,11 @@ native_irq_return_ldt:
movq (5*8)(%rsp),%rax /* RSP */
movq %rax,(4*8)(%rdi)
andl $0xffff0000,%eax
- popq_cfi %rdi
+ pop_cfi %rdi
orq PER_CPU_VAR(espfix_stack),%rax
SWAPGS
movq %rax,%rsp
- popq_cfi %rax
+ pop_cfi %rax
jmp native_irq_return_iret
#endif
@@ -838,9 +838,9 @@ retint_careful:
jnc retint_signal
TRACE_IRQS_ON
ENABLE_INTERRUPTS(CLBR_NONE)
- pushq_cfi %rdi
+ push_cfi %rdi
SCHEDULE_USER
- popq_cfi %rdi
+ pop_cfi %rdi
GET_THREAD_INFO(%rcx)
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
@@ -872,7 +872,7 @@ END(common_interrupt)
ENTRY(\sym)
INTR_FRAME
ASM_CLAC
- pushq_cfi $~(\num)
+ push_cfi $~(\num)
.Lcommon_\sym:
interrupt \do_sym
jmp ret_from_intr
@@ -974,7 +974,7 @@ ENTRY(\sym)
PARAVIRT_ADJUST_EXCEPTION_FRAME
.ifeq \has_error_code
- pushq_cfi $-1 /* ORIG_RAX: no syscall to restart */
+ push_cfi $-1 /* ORIG_RAX: no syscall to restart */
.endif
ALLOC_PT_GPREGS_ON_STACK
@@ -1091,14 +1091,14 @@ idtentry simd_coprocessor_error do_simd_coprocessor_error has_error_code=0
/* edi: new selector */
ENTRY(native_load_gs_index)
CFI_STARTPROC
- pushfq_cfi
+ pushf_cfi
DISABLE_INTERRUPTS(CLBR_ANY & ~CLBR_RDI)
SWAPGS
gs_change:
movl %edi,%gs
2: mfence /* workaround */
SWAPGS
- popfq_cfi
+ popf_cfi
ret
CFI_ENDPROC
END(native_load_gs_index)
@@ -1116,7 +1116,7 @@ bad_gs:
/* Call softirq on interrupt stack. Interrupts are off. */
ENTRY(do_softirq_own_stack)
CFI_STARTPROC
- pushq_cfi %rbp
+ push_cfi %rbp
CFI_REL_OFFSET rbp,0
mov %rsp,%rbp
CFI_DEF_CFA_REGISTER rbp
@@ -1215,9 +1215,9 @@ ENTRY(xen_failsafe_callback)
CFI_RESTORE r11
addq $0x30,%rsp
CFI_ADJUST_CFA_OFFSET -0x30
- pushq_cfi $0 /* RIP */
- pushq_cfi %r11
- pushq_cfi %rcx
+ push_cfi $0 /* RIP */
+ push_cfi %r11
+ push_cfi %rcx
jmp general_protection
CFI_RESTORE_STATE
1: /* Segment mismatch => Category 1 (Bad segment). Retry the IRET. */
@@ -1227,7 +1227,7 @@ ENTRY(xen_failsafe_callback)
CFI_RESTORE r11
addq $0x30,%rsp
CFI_ADJUST_CFA_OFFSET -0x30
- pushq_cfi $-1 /* orig_ax = -1 => not a system call */
+ push_cfi $-1 /* orig_ax = -1 => not a system call */
ALLOC_PT_GPREGS_ON_STACK
SAVE_C_REGS
SAVE_EXTRA_REGS
@@ -1422,7 +1422,7 @@ ENTRY(nmi)
*/
/* Use %rdx as our temp variable throughout */
- pushq_cfi %rdx
+ push_cfi %rdx
CFI_REL_OFFSET rdx, 0
/*
@@ -1478,18 +1478,18 @@ nested_nmi:
movq %rdx, %rsp
CFI_ADJUST_CFA_OFFSET 1*8
leaq -10*8(%rsp), %rdx
- pushq_cfi $__KERNEL_DS
- pushq_cfi %rdx
- pushfq_cfi
- pushq_cfi $__KERNEL_CS
- pushq_cfi $repeat_nmi
+ push_cfi $__KERNEL_DS
+ push_cfi %rdx
+ pushf_cfi
+ push_cfi $__KERNEL_CS
+ push_cfi $repeat_nmi
/* Put stack back */
addq $(6*8), %rsp
CFI_ADJUST_CFA_OFFSET -6*8
nested_nmi_out:
- popq_cfi %rdx
+ pop_cfi %rdx
CFI_RESTORE rdx
/* No need to check faults here */
@@ -1537,7 +1537,7 @@ first_nmi:
CFI_RESTORE rdx
/* Set the NMI executing variable on the stack. */
- pushq_cfi $1
+ push_cfi $1
/*
* Leave room for the "copied" frame
@@ -1547,7 +1547,7 @@ first_nmi:
/* Copy the stack frame to the Saved frame */
.rept 5
- pushq_cfi 11*8(%rsp)
+ push_cfi 11*8(%rsp)
.endr
CFI_DEF_CFA_OFFSET 5*8
@@ -1574,7 +1574,7 @@ repeat_nmi:
addq $(10*8), %rsp
CFI_ADJUST_CFA_OFFSET -10*8
.rept 5
- pushq_cfi -6*8(%rsp)
+ push_cfi -6*8(%rsp)
.endr
subq $(5*8), %rsp
CFI_DEF_CFA_OFFSET 5*8
@@ -1585,7 +1585,7 @@ end_repeat_nmi:
* NMI if the first NMI took an exception and reset our iret stack
* so that we repeat another NMI.
*/
- pushq_cfi $-1 /* ORIG_RAX: no syscall to restart */
+ push_cfi $-1 /* ORIG_RAX: no syscall to restart */
ALLOC_PT_GPREGS_ON_STACK
/*
diff --git a/arch/x86/lib/atomic64_386_32.S b/arch/x86/lib/atomic64_386_32.S
index 00933d5..aa17c69 100644
--- a/arch/x86/lib/atomic64_386_32.S
+++ b/arch/x86/lib/atomic64_386_32.S
@@ -15,12 +15,12 @@
/* if you want SMP support, implement these with real spinlocks */
.macro LOCK reg
- pushfl_cfi
+ pushf_cfi
cli
.endm
.macro UNLOCK reg
- popfl_cfi
+ popf_cfi
.endm
#define BEGIN(op) \
diff --git a/arch/x86/lib/atomic64_cx8_32.S b/arch/x86/lib/atomic64_cx8_32.S
index 082a851..c5dd086 100644
--- a/arch/x86/lib/atomic64_cx8_32.S
+++ b/arch/x86/lib/atomic64_cx8_32.S
@@ -57,10 +57,10 @@ ENDPROC(atomic64_xchg_cx8)
.macro addsub_return func ins insc
ENTRY(atomic64_\func\()_return_cx8)
CFI_STARTPROC
- pushl_cfi_reg ebp
- pushl_cfi_reg ebx
- pushl_cfi_reg esi
- pushl_cfi_reg edi
+ push_cfi_reg ebp
+ push_cfi_reg ebx
+ push_cfi_reg esi
+ push_cfi_reg edi
movl %eax, %esi
movl %edx, %edi
@@ -79,10 +79,10 @@ ENTRY(atomic64_\func\()_return_cx8)
10:
movl %ebx, %eax
movl %ecx, %edx
- popl_cfi_reg edi
- popl_cfi_reg esi
- popl_cfi_reg ebx
- popl_cfi_reg ebp
+ pop_cfi_reg edi
+ pop_cfi_reg esi
+ pop_cfi_reg ebx
+ pop_cfi_reg ebp
ret
CFI_ENDPROC
ENDPROC(atomic64_\func\()_return_cx8)
@@ -94,7 +94,7 @@ addsub_return sub sub sbb
.macro incdec_return func ins insc
ENTRY(atomic64_\func\()_return_cx8)
CFI_STARTPROC
- pushl_cfi_reg ebx
+ push_cfi_reg ebx
read64 %esi
1:
@@ -109,7 +109,7 @@ ENTRY(atomic64_\func\()_return_cx8)
10:
movl %ebx, %eax
movl %ecx, %edx
- popl_cfi_reg ebx
+ pop_cfi_reg ebx
ret
CFI_ENDPROC
ENDPROC(atomic64_\func\()_return_cx8)
@@ -120,7 +120,7 @@ incdec_return dec sub sbb
ENTRY(atomic64_dec_if_positive_cx8)
CFI_STARTPROC
- pushl_cfi_reg ebx
+ push_cfi_reg ebx
read64 %esi
1:
@@ -136,18 +136,18 @@ ENTRY(atomic64_dec_if_positive_cx8)
2:
movl %ebx, %eax
movl %ecx, %edx
- popl_cfi_reg ebx
+ pop_cfi_reg ebx
ret
CFI_ENDPROC
ENDPROC(atomic64_dec_if_positive_cx8)
ENTRY(atomic64_add_unless_cx8)
CFI_STARTPROC
- pushl_cfi_reg ebp
- pushl_cfi_reg ebx
+ push_cfi_reg ebp
+ push_cfi_reg ebx
/* these just push these two parameters on the stack */
- pushl_cfi_reg edi
- pushl_cfi_reg ecx
+ push_cfi_reg edi
+ push_cfi_reg ecx
movl %eax, %ebp
movl %edx, %edi
@@ -169,8 +169,8 @@ ENTRY(atomic64_add_unless_cx8)
3:
addl $8, %esp
CFI_ADJUST_CFA_OFFSET -8
- popl_cfi_reg ebx
- popl_cfi_reg ebp
+ pop_cfi_reg ebx
+ pop_cfi_reg ebp
ret
4:
cmpl %edx, 4(%esp)
@@ -182,7 +182,7 @@ ENDPROC(atomic64_add_unless_cx8)
ENTRY(atomic64_inc_not_zero_cx8)
CFI_STARTPROC
- pushl_cfi_reg ebx
+ push_cfi_reg ebx
read64 %esi
1:
@@ -199,7 +199,7 @@ ENTRY(atomic64_inc_not_zero_cx8)
movl $1, %eax
3:
- popl_cfi_reg ebx
+ pop_cfi_reg ebx
ret
CFI_ENDPROC
ENDPROC(atomic64_inc_not_zero_cx8)
diff --git a/arch/x86/lib/checksum_32.S b/arch/x86/lib/checksum_32.S
index 9bc944a..42c1f9f 100644
--- a/arch/x86/lib/checksum_32.S
+++ b/arch/x86/lib/checksum_32.S
@@ -51,8 +51,8 @@ unsigned int csum_partial(const unsigned char * buff, int len, unsigned int sum)
*/
ENTRY(csum_partial)
CFI_STARTPROC
- pushl_cfi_reg esi
- pushl_cfi_reg ebx
+ push_cfi_reg esi
+ push_cfi_reg ebx
movl 20(%esp),%eax # Function arg: unsigned int sum
movl 16(%esp),%ecx # Function arg: int len
movl 12(%esp),%esi # Function arg: unsigned char *buff
@@ -129,8 +129,8 @@ ENTRY(csum_partial)
jz 8f
roll $8, %eax
8:
- popl_cfi_reg ebx
- popl_cfi_reg esi
+ pop_cfi_reg ebx
+ pop_cfi_reg esi
ret
CFI_ENDPROC
ENDPROC(csum_partial)
@@ -141,8 +141,8 @@ ENDPROC(csum_partial)
ENTRY(csum_partial)
CFI_STARTPROC
- pushl_cfi_reg esi
- pushl_cfi_reg ebx
+ push_cfi_reg esi
+ push_cfi_reg ebx
movl 20(%esp),%eax # Function arg: unsigned int sum
movl 16(%esp),%ecx # Function arg: int len
movl 12(%esp),%esi # Function arg: const unsigned char *buf
@@ -249,8 +249,8 @@ ENTRY(csum_partial)
jz 90f
roll $8, %eax
90:
- popl_cfi_reg ebx
- popl_cfi_reg esi
+ pop_cfi_reg ebx
+ pop_cfi_reg esi
ret
CFI_ENDPROC
ENDPROC(csum_partial)
@@ -290,9 +290,9 @@ ENTRY(csum_partial_copy_generic)
CFI_STARTPROC
subl $4,%esp
CFI_ADJUST_CFA_OFFSET 4
- pushl_cfi_reg edi
- pushl_cfi_reg esi
- pushl_cfi_reg ebx
+ push_cfi_reg edi
+ push_cfi_reg esi
+ push_cfi_reg ebx
movl ARGBASE+16(%esp),%eax # sum
movl ARGBASE+12(%esp),%ecx # len
movl ARGBASE+4(%esp),%esi # src
@@ -401,10 +401,10 @@ DST( movb %cl, (%edi) )
.previous
- popl_cfi_reg ebx
- popl_cfi_reg esi
- popl_cfi_reg edi
- popl_cfi %ecx # equivalent to addl $4,%esp
+ pop_cfi_reg ebx
+ pop_cfi_reg esi
+ pop_cfi_reg edi
+ pop_cfi %ecx # equivalent to addl $4,%esp
ret
CFI_ENDPROC
ENDPROC(csum_partial_copy_generic)
@@ -427,9 +427,9 @@ ENDPROC(csum_partial_copy_generic)
ENTRY(csum_partial_copy_generic)
CFI_STARTPROC
- pushl_cfi_reg ebx
- pushl_cfi_reg edi
- pushl_cfi_reg esi
+ push_cfi_reg ebx
+ push_cfi_reg edi
+ push_cfi_reg esi
movl ARGBASE+4(%esp),%esi #src
movl ARGBASE+8(%esp),%edi #dst
movl ARGBASE+12(%esp),%ecx #len
@@ -489,9 +489,9 @@ DST( movb %dl, (%edi) )
jmp 7b
.previous
- popl_cfi_reg esi
- popl_cfi_reg edi
- popl_cfi_reg ebx
+ pop_cfi_reg esi
+ pop_cfi_reg edi
+ pop_cfi_reg ebx
ret
CFI_ENDPROC
ENDPROC(csum_partial_copy_generic)
diff --git a/arch/x86/lib/cmpxchg16b_emu.S b/arch/x86/lib/cmpxchg16b_emu.S
index 40a1725..b18f317 100644
--- a/arch/x86/lib/cmpxchg16b_emu.S
+++ b/arch/x86/lib/cmpxchg16b_emu.S
@@ -32,7 +32,7 @@ CFI_STARTPROC
# *atomic* on a single cpu (as provided by the this_cpu_xx class of
# macros).
#
- pushfq_cfi
+ pushf_cfi
cli
cmpq PER_CPU_VAR((%rsi)), %rax
@@ -44,13 +44,13 @@ CFI_STARTPROC
movq %rcx, PER_CPU_VAR(8(%rsi))
CFI_REMEMBER_STATE
- popfq_cfi
+ popf_cfi
mov $1, %al
ret
CFI_RESTORE_STATE
.Lnot_same:
- popfq_cfi
+ popf_cfi
xor %al,%al
ret
diff --git a/arch/x86/lib/cmpxchg8b_emu.S b/arch/x86/lib/cmpxchg8b_emu.S
index b4807fce..a4862d0 100644
--- a/arch/x86/lib/cmpxchg8b_emu.S
+++ b/arch/x86/lib/cmpxchg8b_emu.S
@@ -27,7 +27,7 @@ CFI_STARTPROC
# set the whole ZF thing (caller will just compare
# eax:edx with the expected value)
#
- pushfl_cfi
+ pushf_cfi
cli
cmpl (%esi), %eax
@@ -39,7 +39,7 @@ CFI_STARTPROC
movl %ecx, 4(%esi)
CFI_REMEMBER_STATE
- popfl_cfi
+ popf_cfi
ret
CFI_RESTORE_STATE
@@ -48,7 +48,7 @@ CFI_STARTPROC
.Lhalf_same:
movl 4(%esi), %edx
- popfl_cfi
+ popf_cfi
ret
CFI_ENDPROC
diff --git a/arch/x86/lib/msr-reg.S b/arch/x86/lib/msr-reg.S
index 3ca5218..046a560 100644
--- a/arch/x86/lib/msr-reg.S
+++ b/arch/x86/lib/msr-reg.S
@@ -14,8 +14,8 @@
.macro op_safe_regs op
ENTRY(\op\()_safe_regs)
CFI_STARTPROC
- pushq_cfi_reg rbx
- pushq_cfi_reg rbp
+ push_cfi_reg rbx
+ push_cfi_reg rbp
movq %rdi, %r10 /* Save pointer */
xorl %r11d, %r11d /* Return value */
movl (%rdi), %eax
@@ -35,8 +35,8 @@ ENTRY(\op\()_safe_regs)
movl %ebp, 20(%r10)
movl %esi, 24(%r10)
movl %edi, 28(%r10)
- popq_cfi_reg rbp
- popq_cfi_reg rbx
+ pop_cfi_reg rbp
+ pop_cfi_reg rbx
ret
3:
CFI_RESTORE_STATE
@@ -53,12 +53,12 @@ ENDPROC(\op\()_safe_regs)
.macro op_safe_regs op
ENTRY(\op\()_safe_regs)
CFI_STARTPROC
- pushl_cfi_reg ebx
- pushl_cfi_reg ebp
- pushl_cfi_reg esi
- pushl_cfi_reg edi
- pushl_cfi $0 /* Return value */
- pushl_cfi %eax
+ push_cfi_reg ebx
+ push_cfi_reg ebp
+ push_cfi_reg esi
+ push_cfi_reg edi
+ push_cfi $0 /* Return value */
+ push_cfi %eax
movl 4(%eax), %ecx
movl 8(%eax), %edx
movl 12(%eax), %ebx
@@ -68,9 +68,9 @@ ENTRY(\op\()_safe_regs)
movl (%eax), %eax
CFI_REMEMBER_STATE
1: \op
-2: pushl_cfi %eax
+2: push_cfi %eax
movl 4(%esp), %eax
- popl_cfi (%eax)
+ pop_cfi (%eax)
addl $4, %esp
CFI_ADJUST_CFA_OFFSET -4
movl %ecx, 4(%eax)
@@ -79,11 +79,11 @@ ENTRY(\op\()_safe_regs)
movl %ebp, 20(%eax)
movl %esi, 24(%eax)
movl %edi, 28(%eax)
- popl_cfi %eax
- popl_cfi_reg edi
- popl_cfi_reg esi
- popl_cfi_reg ebp
- popl_cfi_reg ebx
+ pop_cfi %eax
+ pop_cfi_reg edi
+ pop_cfi_reg esi
+ pop_cfi_reg ebp
+ pop_cfi_reg ebx
ret
3:
CFI_RESTORE_STATE
diff --git a/arch/x86/lib/rwsem.S b/arch/x86/lib/rwsem.S
index 2322abe..c630a80 100644
--- a/arch/x86/lib/rwsem.S
+++ b/arch/x86/lib/rwsem.S
@@ -34,10 +34,10 @@
*/
#define save_common_regs \
- pushl_cfi_reg ecx
+ push_cfi_reg ecx
#define restore_common_regs \
- popl_cfi_reg ecx
+ pop_cfi_reg ecx
/* Avoid uglifying the argument copying x86-64 needs to do. */
.macro movq src, dst
@@ -64,22 +64,22 @@
*/
#define save_common_regs \
- pushq_cfi_reg rdi; \
- pushq_cfi_reg rsi; \
- pushq_cfi_reg rcx; \
- pushq_cfi_reg r8; \
- pushq_cfi_reg r9; \
- pushq_cfi_reg r10; \
- pushq_cfi_reg r11
+ push_cfi_reg rdi; \
+ push_cfi_reg rsi; \
+ push_cfi_reg rcx; \
+ push_cfi_reg r8; \
+ push_cfi_reg r9; \
+ push_cfi_reg r10; \
+ push_cfi_reg r11
#define restore_common_regs \
- popq_cfi_reg r11; \
- popq_cfi_reg r10; \
- popq_cfi_reg r9; \
- popq_cfi_reg r8; \
- popq_cfi_reg rcx; \
- popq_cfi_reg rsi; \
- popq_cfi_reg rdi
+ pop_cfi_reg r11; \
+ pop_cfi_reg r10; \
+ pop_cfi_reg r9; \
+ pop_cfi_reg r8; \
+ pop_cfi_reg rcx; \
+ pop_cfi_reg rsi; \
+ pop_cfi_reg rdi
#endif
@@ -87,10 +87,10 @@
ENTRY(call_rwsem_down_read_failed)
CFI_STARTPROC
save_common_regs
- __ASM_SIZE(push,_cfi_reg) __ASM_REG(dx)
+ push_cfi_reg __ASM_REG(dx)
movq %rax,%rdi
call rwsem_down_read_failed
- __ASM_SIZE(pop,_cfi_reg) __ASM_REG(dx)
+ pop_cfi_reg __ASM_REG(dx)
restore_common_regs
ret
CFI_ENDPROC
@@ -122,10 +122,10 @@ ENDPROC(call_rwsem_wake)
ENTRY(call_rwsem_downgrade_wake)
CFI_STARTPROC
save_common_regs
- __ASM_SIZE(push,_cfi_reg) __ASM_REG(dx)
+ push_cfi_reg __ASM_REG(dx)
movq %rax,%rdi
call rwsem_downgrade_wake
- __ASM_SIZE(pop,_cfi_reg) __ASM_REG(dx)
+ pop_cfi_reg __ASM_REG(dx)
restore_common_regs
ret
CFI_ENDPROC
diff --git a/arch/x86/lib/thunk_32.S b/arch/x86/lib/thunk_32.S
index 5eb7150..bb370de 100644
--- a/arch/x86/lib/thunk_32.S
+++ b/arch/x86/lib/thunk_32.S
@@ -13,9 +13,9 @@
.globl \name
\name:
CFI_STARTPROC
- pushl_cfi_reg eax
- pushl_cfi_reg ecx
- pushl_cfi_reg edx
+ push_cfi_reg eax
+ push_cfi_reg ecx
+ push_cfi_reg edx
.if \put_ret_addr_in_eax
/* Place EIP in the arg1 */
@@ -23,9 +23,9 @@
.endif
call \func
- popl_cfi_reg edx
- popl_cfi_reg ecx
- popl_cfi_reg eax
+ pop_cfi_reg edx
+ pop_cfi_reg ecx
+ pop_cfi_reg eax
ret
CFI_ENDPROC
_ASM_NOKPROBE(\name)
diff --git a/arch/x86/lib/thunk_64.S b/arch/x86/lib/thunk_64.S
index f89ba4e9..39ad268 100644
--- a/arch/x86/lib/thunk_64.S
+++ b/arch/x86/lib/thunk_64.S
@@ -17,15 +17,15 @@
CFI_STARTPROC
/* this one pushes 9 elems, the next one would be %rIP */
- pushq_cfi_reg rdi
- pushq_cfi_reg rsi
- pushq_cfi_reg rdx
- pushq_cfi_reg rcx
- pushq_cfi_reg rax
- pushq_cfi_reg r8
- pushq_cfi_reg r9
- pushq_cfi_reg r10
- pushq_cfi_reg r11
+ push_cfi_reg rdi
+ push_cfi_reg rsi
+ push_cfi_reg rdx
+ push_cfi_reg rcx
+ push_cfi_reg rax
+ push_cfi_reg r8
+ push_cfi_reg r9
+ push_cfi_reg r10
+ push_cfi_reg r11
.if \put_ret_addr_in_rdi
/* 9*8(%rsp) is return addr on stack */
@@ -60,15 +60,15 @@
CFI_STARTPROC
CFI_ADJUST_CFA_OFFSET 9*8
restore:
- popq_cfi_reg r11
- popq_cfi_reg r10
- popq_cfi_reg r9
- popq_cfi_reg r8
- popq_cfi_reg rax
- popq_cfi_reg rcx
- popq_cfi_reg rdx
- popq_cfi_reg rsi
- popq_cfi_reg rdi
+ pop_cfi_reg r11
+ pop_cfi_reg r10
+ pop_cfi_reg r9
+ pop_cfi_reg r8
+ pop_cfi_reg rax
+ pop_cfi_reg rcx
+ pop_cfi_reg rdx
+ pop_cfi_reg rsi
+ pop_cfi_reg rdi
ret
CFI_ENDPROC
_ASM_NOKPROBE(restore)
--
2.1.0
* [PATCH v4 3/3] x86, stackvalidate: Add asm frame pointer setup macros
2015-05-18 16:34 [PATCH v4 0/3] Compile-time stack frame pointer validation Josh Poimboeuf
2015-05-18 16:34 ` [PATCH v4 1/3] x86, stackvalidate: " Josh Poimboeuf
2015-05-18 16:34 ` [PATCH v4 2/3] x86: Make push/pop CFI macros arch-independent Josh Poimboeuf
@ 2015-05-18 16:34 ` Josh Poimboeuf
2015-05-20 10:33 ` [PATCH v4 0/3] Compile-time stack frame pointer validation Ingo Molnar
From: Josh Poimboeuf @ 2015-05-18 16:34 UTC
To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
Cc: Michal Marek, Peter Zijlstra, x86, live-patching, linux-kernel

Add some helper macros for asm functions so that they can comply with
stackvalidate.

The FUNC_ENTER and FUNC_RETURN macros help asm functions save, set up,
and restore frame pointers.

The RET_NOVALIDATE and FILE_NOVALIDATE macros can be used to whitelist
the few locations which need a return instruction outside of a callable
function.
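
As a rough illustration (not part of this patch), a callable asm function
might be bookended with the macros as sketched below. The function name
example_add and its body are hypothetical, and the sketch assumes a 64-bit
build in which the first two integer arguments arrive in %edi and %esi.
It only assembles inside a kernel tree that carries asm/func.h.

```asm
#include <asm/func.h>

/*
 * Hypothetical helper, for illustration only: returns the sum of its
 * two integer arguments.  FUNC_ENTER emits ENTRY() and CFI_STARTPROC
 * and, with CONFIG_FRAME_POINTER, pushes and sets up the frame pointer.
 */
FUNC_ENTER(example_add)
	movl	%edi, %eax	/* first argument (x86-64 assumption) */
	addl	%esi, %eax	/* add second argument */
/*
 * No explicit "ret" here: FUNC_RETURN restores the frame pointer and
 * emits the return instruction itself, followed by CFI_ENDPROC and
 * ENDPROC().
 */
FUNC_RETURN(example_add)
```

Because FUNC_RETURN supplies the only return instruction, the return has to
sit at the end of the function, matching the restriction described in the
header comment of asm/func.h.
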
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
arch/x86/include/asm/func.h | 82 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 82 insertions(+)
create mode 100644 arch/x86/include/asm/func.h
diff --git a/arch/x86/include/asm/func.h b/arch/x86/include/asm/func.h
new file mode 100644
index 0000000..cd27ad4
--- /dev/null
+++ b/arch/x86/include/asm/func.h
@@ -0,0 +1,82 @@
+#ifndef _ASM_X86_FUNC_H
+#define _ASM_X86_FUNC_H
+
+#include <linux/linkage.h>
+#include <asm/dwarf2.h>
+#include <asm/asm.h>
+
+.macro FUNC_ENTER_NO_FP name
+ ENTRY(\name)
+ CFI_STARTPROC
+ CFI_DEF_CFA _ASM_SP, __ASM_SEL(4, 8)
+.endm
+
+.macro FUNC_RETURN_NO_FP name
+ CFI_DEF_CFA _ASM_SP, __ASM_SEL(4, 8)
+ ret
+ CFI_ENDPROC
+ ENDPROC(\name)
+.endm
+
+#ifdef CONFIG_FRAME_POINTER
+
+.macro FUNC_ENTER_FP name
+ FUNC_ENTER_NO_FP \name
+ push_cfi %_ASM_BP
+ CFI_REL_OFFSET _ASM_BP, 0
+ _ASM_MOV %_ASM_SP, %_ASM_BP
+ CFI_DEF_CFA_REGISTER _ASM_BP
+.endm
+
+.macro FUNC_RETURN_FP name
+ pop_cfi %_ASM_BP
+ CFI_RESTORE _ASM_BP
+ FUNC_RETURN_NO_FP \name
+.endm
+
+/*
+ * Every callable asm function should be bookended with FUNC_ENTER and
+ * FUNC_RETURN. They do proper frame pointer and DWARF CFI setups in order to
+ * achieve more reliable stack traces.
+ *
+ * For the sake of simplicity and correct DWARF annotations, use of the macros
+ * requires that the return instruction comes at the end of the function.
+ */
+#define FUNC_ENTER(name) FUNC_ENTER_FP name
+#define FUNC_RETURN(name) FUNC_RETURN_FP name
+
+/*
+ * RET_NOVALIDATE tells the stack validation script to whitelist the return
+ * instruction immediately after the macro. Only use it if you're completely
+ * sure you need a return instruction outside of a callable function.
+ * Otherwise, if the code can be called and you haven't annotated it with
+ * FUNC_ENTER/FUNC_RETURN, it will break stack trace reliability.
+ */
+.macro RET_NOVALIDATE
+ 163:
+ .pushsection __stackvalidate_whitelist_ret, "ae"
+ _ASM_ALIGN
+ .long 163b - .
+ .popsection
+.endm
+
+/*
+ * FILE_NOVALIDATE is like RET_NOVALIDATE except it whitelists the entire file.
+ * Use with extreme caution or you will silently break stack traces.
+ */
+.macro FILE_NOVALIDATE
+ .pushsection __stackvalidate_whitelist_file, "ae"
+ .long 0
+ .popsection
+.endm
+
+#else /* !FRAME_POINTER */
+
+#define FUNC_ENTER(name) FUNC_ENTER_NO_FP name
+#define FUNC_RETURN(name) FUNC_RETURN_NO_FP name
+#define RET_NOVALIDATE
+#define FILE_NOVALIDATE
+
+#endif /* FRAME_POINTER */
+
+#endif /* _ASM_X86_FUNC_H */
--
2.1.0
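For illustration only: the ".long 163b - ." entry emitted by RET_NOVALIDATE stores the distance from the whitelist entry itself to local label 163, which sits right before the whitelisted return instruction. Below is a minimal C sketch of how such a self-relative 32-bit value resolves once relocations have been applied. The names rel32_encode(), rel32_decode(), struct wl_entry and ret_is_whitelisted() are hypothetical and not part of the patch; the actual validation tool may well read the object file's relocation records instead of the stored value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * What ".long label - ." stores: the label's address minus the address of
 * the entry itself. Assumes the distance fits in a signed 32-bit value.
 */
static int32_t rel32_encode(uint64_t target, uint64_t entry_addr)
{
	if (target >= entry_addr)
		return (int32_t)(target - entry_addr);
	return -(int32_t)(entry_addr - target);
}

/* Recover the absolute address: entry address plus sign-extended offset. */
static uint64_t rel32_decode(uint64_t entry_addr, int32_t rel)
{
	return entry_addr + (uint64_t)(int64_t)rel;
}

/* Hypothetical in-memory view of one whitelist entry. */
struct wl_entry {
	uint64_t addr;	/* where the entry lives */
	int32_t rel;	/* value stored by ".long 163b - ." */
};

/* Return 1 if ret_addr is the target of some whitelist entry, else 0. */
static int ret_is_whitelisted(const struct wl_entry *tab, size_t n,
			      uint64_t ret_addr)
{
	for (size_t i = 0; i < n; i++)
		if (rel32_decode(tab[i].addr, tab[i].rel) == ret_addr)
			return 1;
	return 0;
}
```

Storing the offset relative to the entry keeps each record at 32 bits and independent of the final load address, which is presumably why the macro uses "163b - ." rather than an absolute pointer.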
^ permalink raw reply related [flat|nested] 400+ messages in thread
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-18 16:34 [PATCH v4 0/3] Compile-time stack frame pointer validation Josh Poimboeuf
` (2 preceding siblings ...)
2015-05-18 16:34 ` [PATCH v4 3/3] x86, stackvalidate: Add asm frame pointer setup macros Josh Poimboeuf
@ 2015-05-20 10:33 ` Ingo Molnar
2015-05-20 14:13 ` Josh Poimboeuf
3 siblings, 1 reply; 400+ messages in thread
From: Ingo Molnar @ 2015-05-20 10:33 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Michal Marek,
Peter Zijlstra, x86, live-patching, linux-kernel
* Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> In discussions around the live kernel patching consistency model RFC
> [1], Peter and Ingo correctly pointed out that stack traces aren't
> reliable. And as Ingo said, there's no "strong force" which ensures we
> can rely on them.
>
> So I've been thinking about how to fix that. My goal is to eventually
> make stack traces reliable. Or at the very least, to be able to detect
> at runtime when a given stack trace *might* be unreliable. But improved
> stack traces would broadly benefit the entire kernel, regardless of the
> outcome of the live kernel patching consistency model discussions.
>
> This patch set is just the first in a series of proposed stack trace
> reliability improvements. Future proposals will include runtime stack
> reliability checking, as well as compile-time and runtime DWARF
> validations.
>
> As far as I can tell, there are two main obstacles which prevent frame
> pointer based stack traces from being reliable:
>
> 1) Missing frame pointer logic: currently, most assembly functions don't
> set up the frame pointer.
Could you please paste here the output of what the new checks print
for x86/64 defconfig?
> As a first step, all reported non-compliances result in warnings.
> Right now I'm seeing 200+ warnings. Once we get them all cleaned
> up, we can change the warnings to build errors so the asm code can
> stay clean.
That's quite a bit ...
Thanks,
Ingo
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-20 10:33 ` [PATCH v4 0/3] Compile-time stack frame pointer validation Ingo Molnar
@ 2015-05-20 14:13 ` Josh Poimboeuf
2015-05-20 14:48 ` Ingo Molnar
0 siblings, 1 reply; 400+ messages in thread
From: Josh Poimboeuf @ 2015-05-20 14:13 UTC (permalink / raw)
To: Ingo Molnar
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Michal Marek,
Peter Zijlstra, x86, live-patching, linux-kernel
On Wed, May 20, 2015 at 12:33:39PM +0200, Ingo Molnar wrote:
>
> * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> > In discussions around the live kernel patching consistency model RFC
> > [1], Peter and Ingo correctly pointed out that stack traces aren't
> > reliable. And as Ingo said, there's no "strong force" which ensures we
> > can rely on them.
> >
> > So I've been thinking about how to fix that. My goal is to eventually
> > make stack traces reliable. Or at the very least, to be able to detect
> > at runtime when a given stack trace *might* be unreliable. But improved
> > stack traces would broadly benefit the entire kernel, regardless of the
> > outcome of the live kernel patching consistency model discussions.
> >
> > This patch set is just the first in a series of proposed stack trace
> > reliability improvements. Future proposals will include runtime stack
> > reliability checking, as well as compile-time and runtime DWARF
> > validations.
> >
> > As far as I can tell, there are two main obstacles which prevent frame
> > pointer based stack traces from being reliable:
> >
> > 1) Missing frame pointer logic: currently, most assembly functions don't
> > set up the frame pointer.
>
> Could you please paste here the output of what the new checks print
> for x86/64 defconfig?
Here are all 89 warnings from defconfig:
arch/x86/ia32/ia32entry.o: ia32_sysenter_target() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/ia32/ia32entry.o: return instruction outside of a function at .entry.text+0x52e. Please use FUNC_ENTER.
arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x359. Please use FUNC_ENTER.
arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x19be. Please use FUNC_ENTER.
arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x19e5. Please use FUNC_ENTER.
arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x1c21. Please use FUNC_ENTER.
arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x1ceb. Please use FUNC_ENTER.
arch/x86/kernel/acpi/wakeup_64.o: wakeup_long64() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/kernel/acpi/wakeup_64.o: do_suspend_lowlevel() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/platform/efi/efi_stub_64.o: efi_call() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x6b. Please use FUNC_ENTER.
arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0xc7. Please use FUNC_ENTER.
arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x110. Please use FUNC_ENTER.
arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x145. Please use FUNC_ENTER.
arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x1c4. Please use FUNC_ENTER.
arch/x86/realmode/rm/trampoline_64.o: return instruction outside of a function at .text+0x170. Please use FUNC_ENTER.
arch/x86/realmode/rm/trampoline_64.o: return instruction outside of a function at .text+0x176. Please use FUNC_ENTER.
arch/x86/kernel/head_64.o: return instruction outside of a function at .head.text+0x1a2. Please use FUNC_ENTER.
arch/x86/kernel/head_64.o: start_cpu0() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/kernel/head_64.o: early_idt_handler() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/realmode/rm/reboot.o: return instruction outside of a function at .text+0x2a. Please use FUNC_ENTER.
arch/x86/realmode/rm/copy.o: memcpy() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/realmode/rm/copy.o: memset() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/realmode/rm/copy.o: copy_from_fs() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/realmode/rm/copy.o: copy_to_fs() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/realmode/rm/bioscall.o: intcall() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/vdso/vdso32/int80.o: __kernel_sigreturn() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/vdso/vdso32/int80.o: __kernel_rt_sigreturn() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/vdso/vdso32/int80.o: __kernel_vsyscall() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/vdso/vdso32/syscall.o: __kernel_sigreturn() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/vdso/vdso32/syscall.o: __kernel_rt_sigreturn() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/vdso/vdso32/syscall.o: __kernel_vsyscall() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/vdso/vdso32/sysenter.o: __kernel_sigreturn() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/vdso/vdso32/sysenter.o: __kernel_rt_sigreturn() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/power/hibernate_asm_64.o: return instruction outside of a function at .text+0x69. Please use FUNC_ENTER.
arch/x86/power/hibernate_asm_64.o: return instruction outside of a function at .text+0x16d. Please use FUNC_ENTER.
arch/x86/lib/msr-reg.o: rdmsr_safe_regs() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/msr-reg.o: wrmsr_safe_regs() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/iomap_copy_64.o: __iowrite32_copy() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/clear_page_64.o: clear_page() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/clear_page_64.o: clear_page_orig() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/clear_page_64.o: clear_page_c_e() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/cmpxchg16b_emu.o: this_cpu_cmpxchg16b_emu() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/copy_page_64.o: copy_page() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/copy_page_64.o: copy_page_regs() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/copy_user_64.o: _copy_to_user() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/copy_user_64.o: _copy_from_user() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/copy_user_64.o: copy_user_generic_unrolled() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/copy_user_64.o: copy_user_generic_string() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/copy_user_64.o: copy_user_enhanced_fast_string() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/copy_user_64.o: __copy_user_nocache() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/copy_user_64.o: bad_from_user() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/csum-copy_64.o: csum_partial_copy_generic() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/getuser.o: __get_user_1() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/getuser.o: __get_user_2() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/getuser.o: __get_user_4() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/getuser.o: __get_user_8() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/getuser.o: return instruction outside of a function at .text+0xc5. Please use FUNC_ENTER.
arch/x86/lib/memcpy_64.o: memcpy() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/memcpy_64.o: __memcpy() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/memcpy_64.o: memcpy_erms() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/memcpy_64.o: memcpy_orig() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/memmove_64.o: memmove() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/memmove_64.o: __memmove() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/memmove_64.o: return instruction outside of a function at .altinstr_replacement+0x5. Please use FUNC_ENTER.
arch/x86/lib/memset_64.o: memset() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/memset_64.o: __memset() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/memset_64.o: memset_erms() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/memset_64.o: memset_orig() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/putuser.o: __put_user_1() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/putuser.o: __put_user_2() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/putuser.o: __put_user_4() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/putuser.o: __put_user_8() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/putuser.o: return instruction outside of a function at .text+0xc1. Please use FUNC_ENTER.
arch/x86/lib/rwsem.o: call_rwsem_down_read_failed() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/rwsem.o: call_rwsem_down_write_failed() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/rwsem.o: call_rwsem_wake() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/lib/rwsem.o: call_rwsem_downgrade_wake() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/boot/bioscall.o: intcall() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/boot/copy.o: memcpy() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/boot/copy.o: memset() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/boot/copy.o: copy_from_fs() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/boot/copy.o: copy_to_fs() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/boot/pmjump.o: protected_mode_jump() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/boot/pmjump.o: in_pm32() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/boot/compressed/head_64.o: return instruction outside of a function at .text+0x16e. Please use FUNC_ENTER.
arch/x86/boot/compressed/head_64.o: return instruction outside of a function at .text+0x172. Please use FUNC_ENTER.
arch/x86/boot/compressed/head_64.o: startup_32() is missing frame pointer logic. Please use FUNC_ENTER.
arch/x86/boot/header.o: die() is missing frame pointer logic. Please use FUNC_ENTER.
> > As a first step, all reported non-compliances result in warnings.
> > Right now I'm seeing 200+ warnings. Once we get them all cleaned
> > up, we can change the warnings to build errors so the asm code can
> > stay clean.
>
> That's quite a bit ...
Yeah, a Fedora-based config has over 200 warnings. Most of the
differences between the above 89 warnings for defconfig and the 200+ for
a Fedora config seem to be caused by xen, crypto and bpf.
--
Josh
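For readers skimming the list above: every line is one of two kinds of warning, either a callable function "is missing frame pointer logic" or there is a "return instruction outside of a function". A minimal C sketch for tallying the two kinds from such output follows. The names warn_kind, classify_warning() and tally_warnings() are hypothetical and not part of the patch set; the matching relies only on the message text shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum warn_kind { WARN_MISSING_FP, WARN_STRAY_RET, WARN_OTHER };

/* Classify one line of checker output by the message text it contains. */
static enum warn_kind classify_warning(const char *line)
{
	if (strstr(line, "is missing frame pointer logic"))
		return WARN_MISSING_FP;
	if (strstr(line, "return instruction outside of a function"))
		return WARN_STRAY_RET;
	return WARN_OTHER;
}

/* Count how many of the n lines fall into each kind. */
static void tally_warnings(const char *const *lines, size_t n,
			   size_t counts[3])
{
	counts[WARN_MISSING_FP] = 0;
	counts[WARN_STRAY_RET] = 0;
	counts[WARN_OTHER] = 0;

	for (size_t i = 0; i < n; i++)
		counts[classify_warning(lines[i])]++;
}
```

Feeding the build log through something like this gives a quick per-kind count, which helps when comparing configs such as defconfig against a distro config.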
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-20 14:13 ` Josh Poimboeuf
@ 2015-05-20 14:48 ` Ingo Molnar
2015-05-20 15:51 ` Josh Poimboeuf
` (2 more replies)
0 siblings, 3 replies; 400+ messages in thread
From: Ingo Molnar @ 2015-05-20 14:48 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Michal Marek,
Peter Zijlstra, x86, live-patching, linux-kernel, Linus Torvalds,
Andy Lutomirski, Denys Vlasenko, Brian Gerst, Peter Zijlstra,
Borislav Petkov, Andrew Morton
* Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Wed, May 20, 2015 at 12:33:39PM +0200, Ingo Molnar wrote:
> >
> > * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >
> > > In discussions around the live kernel patching consistency model RFC
> > > [1], Peter and Ingo correctly pointed out that stack traces aren't
> > > reliable. And as Ingo said, there's no "strong force" which ensures we
> > > can rely on them.
> > >
> > > So I've been thinking about how to fix that. My goal is to eventually
> > > make stack traces reliable. Or at the very least, to be able to detect
> > > at runtime when a given stack trace *might* be unreliable. But improved
> > > stack traces would broadly benefit the entire kernel, regardless of the
> > > outcome of the live kernel patching consistency model discussions.
> > >
> > > This patch set is just the first in a series of proposed stack trace
> > > reliability improvements. Future proposals will include runtime stack
> > > reliability checking, as well as compile-time and runtime DWARF
> > > validations.
> > >
> > > As far as I can tell, there are two main obstacles which prevent frame
> > > pointer based stack traces from being reliable:
> > >
> > > 1) Missing frame pointer logic: currently, most assembly functions don't
> > > set up the frame pointer.
> >
> > Could you please paste here the output of what the new checks print
> > for x86/64 defconfig?
>
> Here are all 89 warnings from defconfig:
>
> [...]
Yeah, so many of these seem to be 'leaf only' functions: functions
that don't ever call functions themselves.
So let's assume we always have CONFIG_FRAME_POINTER=y.
If they don't set up a frame pointer then they in essence won't show
up in the call chain - but normally they wouldn't because they call
nothing.
If they trigger an exception/fault or if they get hit by an interrupt
then I think we'll still correctly walk the stack - just those
functions might be missing from the deterministic call chain, right?
(it will still show up as a '?' entry.)
If they crash then we'll see them because the crashing RIP will be
printed.
So I'm wondering what the x86 policy here should be: to create frame
pointers in them or not. Cc:-ed a few more gents for thoughts.
Thanks,
Ingo
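To make the leaf-function question concrete, here is a minimal C sketch of a frame-pointer walk over a simulated stack. It is a model under stated assumptions, not the kernel's unwinder: array indices stand in for addresses, higher indices are older, BP_END is a made-up terminator, and fp_walk(), build_stack() and the RET_INTO_* values are hypothetical. In this model, when the leaf does not push a frame, the chain starts at its caller's frame, so the one return address the chain never reaches is the one pointing back into that caller; a stack scanner could still find it, but only as an unverified entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t word_t;	/* one simulated stack word */

#define STACK_WORDS 16
#define BP_END ((word_t)-1)	/* hypothetical end-of-chain marker */

/* Made-up return addresses for the chain top -> outer -> middle -> leaf. */
enum { RET_INTO_TOP = 0x100, RET_INTO_OUTER = 0x200, RET_INTO_MIDDLE = 0x300 };

/*
 * Follow the saved-frame-pointer chain:
 *   stack[bp]     = caller's saved frame pointer
 *   stack[bp + 1] = return address into the caller
 * Returns the number of return addresses written to out[].
 */
static size_t fp_walk(const word_t *stack, size_t nwords, word_t bp,
		      word_t *out, size_t max)
{
	size_t n = 0;

	while (bp != BP_END && bp + 1 < nwords && n < max) {
		word_t next = stack[bp];

		out[n++] = stack[bp + 1];
		if (next <= bp)		/* chain must move toward older frames */
			break;
		bp = next;
	}
	return n;
}

/*
 * Lay out the stack as if an interrupt hit inside leaf(). Returns the frame
 * pointer value at that moment: leaf's own frame if it set one up,
 * otherwise still middle's frame.
 */
static word_t build_stack(word_t stack[STACK_WORDS], int leaf_sets_up_frame)
{
	for (size_t i = 0; i < STACK_WORDS; i++)
		stack[i] = 0;

	stack[10] = RET_INTO_TOP;	/* pushed by the call to outer() */
	stack[9] = BP_END;		/* outer's frame: saved bp */
	stack[8] = RET_INTO_OUTER;	/* pushed by the call to middle() */
	stack[7] = 9;			/* middle's frame: saved bp */
	stack[6] = RET_INTO_MIDDLE;	/* pushed by the call to leaf() */

	if (!leaf_sets_up_frame)
		return 7;

	stack[5] = 7;			/* leaf's frame: saved bp */
	return 5;
}
```

With leaf_sets_up_frame set, the walk yields the return addresses into middle, outer and top; without it, only outer and top come out of the chain, and the interrupted RIP is the only direct evidence of the leaf.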
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-20 14:48 ` Ingo Molnar
@ 2015-05-20 15:51 ` Josh Poimboeuf
2015-05-20 16:09 ` Josh Poimboeuf
2015-05-20 16:03 ` Andy Lutomirski
2015-05-21 20:54 ` Josh Poimboeuf
2 siblings, 1 reply; 400+ messages in thread
From: Josh Poimboeuf @ 2015-05-20 15:51 UTC (permalink / raw)
To: Ingo Molnar
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Michal Marek,
Peter Zijlstra, x86, live-patching, linux-kernel, Linus Torvalds,
Andy Lutomirski, Denys Vlasenko, Brian Gerst, Peter Zijlstra,
Borislav Petkov, Andrew Morton
On Wed, May 20, 2015 at 04:48:10PM +0200, Ingo Molnar wrote:
>
> * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> > On Wed, May 20, 2015 at 12:33:39PM +0200, Ingo Molnar wrote:
> > >
> > > * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > >
> > > > In discussions around the live kernel patching consistency model RFC
> > > > [1], Peter and Ingo correctly pointed out that stack traces aren't
> > > > reliable. And as Ingo said, there's no "strong force" which ensures we
> > > > can rely on them.
> > > >
> > > > So I've been thinking about how to fix that. My goal is to eventually
> > > > make stack traces reliable. Or at the very least, to be able to detect
> > > > at runtime when a given stack trace *might* be unreliable. But improved
> > > > stack traces would broadly benefit the entire kernel, regardless of the
> > > > outcome of the live kernel patching consistency model discussions.
> > > >
> > > > This patch set is just the first in a series of proposed stack trace
> > > > reliability improvements. Future proposals will include runtime stack
> > > > reliability checking, as well as compile-time and runtime DWARF
> > > > validations.
> > > >
> > > > As far as I can tell, there are two main obstacles which prevent frame
> > > > pointer based stack traces from being reliable:
> > > >
> > > > 1) Missing frame pointer logic: currently, most assembly functions don't
> > > > set up the frame pointer.
> > >
> > > Could you please paste here the output of what the new checks print
> > > for x86/64 defconfig?
> >
> > Here are all 89 warnings from defconfig:
> >
> > arch/x86/ia32/ia32entry.o: ia32_sysenter_target() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/ia32/ia32entry.o: return instruction outside of a function at .entry.text+0x52e. Please use FUNC_ENTER.
> > arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x359. Please use FUNC_ENTER.
> > arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x19be. Please use FUNC_ENTER.
> > arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x19e5. Please use FUNC_ENTER.
> > arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x1c21. Please use FUNC_ENTER.
> > arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x1ceb. Please use FUNC_ENTER.
> > arch/x86/kernel/acpi/wakeup_64.o: wakeup_long64() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/kernel/acpi/wakeup_64.o: do_suspend_lowlevel() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/platform/efi/efi_stub_64.o: efi_call() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x6b. Please use FUNC_ENTER.
> > arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0xc7. Please use FUNC_ENTER.
> > arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x110. Please use FUNC_ENTER.
> > arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x145. Please use FUNC_ENTER.
> > arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x1c4. Please use FUNC_ENTER.
> > arch/x86/realmode/rm/trampoline_64.o: return instruction outside of a function at .text+0x170. Please use FUNC_ENTER.
> > arch/x86/realmode/rm/trampoline_64.o: return instruction outside of a function at .text+0x176. Please use FUNC_ENTER.
> > arch/x86/kernel/head_64.o: return instruction outside of a function at .head.text+0x1a2. Please use FUNC_ENTER.
> > arch/x86/kernel/head_64.o: start_cpu0() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/kernel/head_64.o: early_idt_handler() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/realmode/rm/reboot.o: return instruction outside of a function at .text+0x2a. Please use FUNC_ENTER.
> > arch/x86/realmode/rm/copy.o: memcpy() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/realmode/rm/copy.o: memset() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/realmode/rm/copy.o: copy_from_fs() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/realmode/rm/copy.o: copy_to_fs() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/realmode/rm/bioscall.o: intcall() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/vdso/vdso32/int80.o: __kernel_sigreturn() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/vdso/vdso32/int80.o: __kernel_rt_sigreturn() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/vdso/vdso32/int80.o: __kernel_vsyscall() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/vdso/vdso32/syscall.o: __kernel_sigreturn() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/vdso/vdso32/syscall.o: __kernel_rt_sigreturn() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/vdso/vdso32/syscall.o: __kernel_vsyscall() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/vdso/vdso32/sysenter.o: __kernel_sigreturn() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/vdso/vdso32/sysenter.o: __kernel_rt_sigreturn() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/power/hibernate_asm_64.o: return instruction outside of a function at .text+0x69. Please use FUNC_ENTER.
> > arch/x86/power/hibernate_asm_64.o: return instruction outside of a function at .text+0x16d. Please use FUNC_ENTER.
> > arch/x86/lib/msr-reg.o: rdmsr_safe_regs() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/msr-reg.o: wrmsr_safe_regs() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/iomap_copy_64.o: __iowrite32_copy() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/clear_page_64.o: clear_page() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/clear_page_64.o: clear_page_orig() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/clear_page_64.o: clear_page_c_e() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/cmpxchg16b_emu.o: this_cpu_cmpxchg16b_emu() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/copy_page_64.o: copy_page() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/copy_page_64.o: copy_page_regs() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/copy_user_64.o: _copy_to_user() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/copy_user_64.o: _copy_from_user() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/copy_user_64.o: copy_user_generic_unrolled() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/copy_user_64.o: copy_user_generic_string() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/copy_user_64.o: copy_user_enhanced_fast_string() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/copy_user_64.o: __copy_user_nocache() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/copy_user_64.o: bad_from_user() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/csum-copy_64.o: csum_partial_copy_generic() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/getuser.o: __get_user_1() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/getuser.o: __get_user_2() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/getuser.o: __get_user_4() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/getuser.o: __get_user_8() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/getuser.o: return instruction outside of a function at .text+0xc5. Please use FUNC_ENTER.
> > arch/x86/lib/memcpy_64.o: memcpy() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/memcpy_64.o: __memcpy() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/memcpy_64.o: memcpy_erms() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/memcpy_64.o: memcpy_orig() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/memmove_64.o: memmove() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/memmove_64.o: __memmove() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/memmove_64.o: return instruction outside of a function at .altinstr_replacement+0x5. Please use FUNC_ENTER.
> > arch/x86/lib/memset_64.o: memset() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/memset_64.o: __memset() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/memset_64.o: memset_erms() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/memset_64.o: memset_orig() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/putuser.o: __put_user_1() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/putuser.o: __put_user_2() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/putuser.o: __put_user_4() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/putuser.o: __put_user_8() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/putuser.o: return instruction outside of a function at .text+0xc1. Please use FUNC_ENTER.
> > arch/x86/lib/rwsem.o: call_rwsem_down_read_failed() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/rwsem.o: call_rwsem_down_write_failed() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/rwsem.o: call_rwsem_wake() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/lib/rwsem.o: call_rwsem_downgrade_wake() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/boot/bioscall.o: intcall() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/boot/copy.o: memcpy() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/boot/copy.o: memset() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/boot/copy.o: copy_from_fs() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/boot/copy.o: copy_to_fs() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/boot/pmjump.o: protected_mode_jump() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/boot/pmjump.o: in_pm32() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/boot/compressed/head_64.o: return instruction outside of a function at .text+0x16e. Please use FUNC_ENTER.
> > arch/x86/boot/compressed/head_64.o: return instruction outside of a function at .text+0x172. Please use FUNC_ENTER.
> > arch/x86/boot/compressed/head_64.o: startup_32() is missing frame pointer logic. Please use FUNC_ENTER.
> > arch/x86/boot/header.o: die() is missing frame pointer logic. Please use FUNC_ENTER.
>
> Yeah, so many of these seem to be 'leaf only' functions: functions
> that don't ever call functions themselves.

Yeah, good observation.

> So lets assume we always have CONFIG_FRAME_POINTERS=y.
>
> If they don't set up a frame pointer then they in essence won't show
> up in the call chain

It's actually the _caller_ of the asm function which gets skipped in the
trace.

(Though it doesn't really matter -- either way it's unreliable.)

> but normally they wouldn't because they call nothing.
>
> If they trigger an exception/fault or if they get hit by an interrupt
> then I think we'll still correctly walk the stack - just those
> functions might be missing from the deterministic call chain, right?
> (it will still show up as a '?' entry.)

Right. This patch set takes the more conservative approach of requiring
_all_ callable asm functions to have frame pointer logic. Which has the
benefit of getting rid of some of the cases where we need the '?' stack
entries.

> If they crash then we'll see them because the crashing RIP will be
> printed.
>
> So I'm wondering what the x86 policy here should be: to create frame
> pointers in them or not. Cc:-ed a few more gents for thoughts.

I agree that frame pointers aren't strictly necessary for leaf
functions.

I could easily relax the stackvalidate restrictions to exclude the
checking of leaf functions. In fact I think that would be more
consistent with how gcc does it, so maybe that's a more reasonable
approach.
--
Josh
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-20 15:51 ` Josh Poimboeuf
@ 2015-05-20 16:09 ` Josh Poimboeuf
0 siblings, 0 replies; 400+ messages in thread
From: Josh Poimboeuf @ 2015-05-20 16:09 UTC (permalink / raw)
To: Ingo Molnar
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Michal Marek,
Peter Zijlstra, x86, live-patching, linux-kernel, Linus Torvalds,
Andy Lutomirski, Denys Vlasenko, Brian Gerst, Peter Zijlstra,
Borislav Petkov, Andrew Morton
On Wed, May 20, 2015 at 10:51:56AM -0500, Josh Poimboeuf wrote:
> On Wed, May 20, 2015 at 04:48:10PM +0200, Ingo Molnar wrote:
> > Yeah, so many of these seem to be 'leaf only' functions: functions
> > that don't ever call functions themselves.
>
> Yeah, good observation.
>
> > So lets assume we always have CONFIG_FRAME_POINTERS=y.
> >
> > If they don't set up a frame pointer then they in essence won't show
> > up in the call chain
>
> It's actually the _caller_ of the asm function which gets skipped in the
> trace.
>
> (Though it doesn't really matter -- either way it's unreliable.)
>
> > but normally they wouldn't because they call nothing.
> >
> > If they trigger an exception/fault or if they get hit by an interrupt
> > then I think we'll still correctly walk the stack - just those
> > functions might be missing from the deterministic call chain, right?
> > (it will still show up as a '?' entry.)
>
> Right. This patch set takes the more conservative approach of requiring
> _all_ callable asm functions to have frame pointer logic. Which has the
> benefit of getting rid of some of the cases where we need the '?' stack
> entries.
>
> > If they crash then we'll see them because the crashing RIP will be
> > printed.
> >
> > So I'm wondering what the x86 policy here should be: to create frame
> > pointers in them or not. Cc:-ed a few more gents for thoughts.
>
> I agree that frame pointers aren't strictly necessary for leaf
> functions.
>
> I could easily relax the stackvalidate restrictions to exclude the
> checking of leaf functions. In fact I think that would be more
> consistent with how gcc does it, so maybe that's a more reasonable
> approach.

I remembered another reason why I went with the more conservative
approach of requiring frame pointers in leaf functions.

It's often hard to pin down where an asm function begins and where it
ends. For example, you might have something like:

  ENTRY(callable_asm_func)
          jmp label
  ENDPROC(callable_asm_func)

  label:
          call some_c_function
          ret

If we were to relax the stackvalidate restrictions then we'd miss that
kind of (surprisingly common) situation, where a function jumps outside
of its scope.

Then again, I guess it would be pretty easy to add checks for that.
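
As a rough illustration of what such a check might look like (this is
not code from the stackvalidate patches), the sketch below assumes the
tool has already decoded each function's ENTRY/ENDPROC address range and
the targets of its jump instructions. The names struct func_range,
struct jump_insn, in_range() and count_escaping_jumps() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: a function's extent as delimited by ENTRY/ENDPROC. */
struct func_range {
	unsigned long start;	/* address of the first instruction */
	unsigned long end;	/* one past the last instruction */
};

/* Hypothetical: one decoded jump instruction. */
struct jump_insn {
	unsigned long addr;	/* where the jump instruction lives */
	unsigned long target;	/* where it transfers control to */
};

static bool in_range(const struct func_range *f, unsigned long addr)
{
	return addr >= f->start && addr < f->end;
}

/*
 * Count the jumps which start inside the function but land outside of
 * it, like the "jmp label" in the example above.
 */
static size_t count_escaping_jumps(const struct func_range *f,
				   const struct jump_insn *jumps, size_t n)
{
	size_t i, escaping = 0;

	for (i = 0; i < n; i++) {
		if (in_range(f, jumps[i].addr) &&
		    !in_range(f, jumps[i].target))
			escaping++;
	}

	return escaping;
}
```

A real check would presumably also have to tell deliberate tail calls
into other functions apart from jumps to unannotated labels, which this
sketch does not attempt.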
--
Josh
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-20 14:48 ` Ingo Molnar
2015-05-20 15:51 ` Josh Poimboeuf
@ 2015-05-20 16:03 ` Andy Lutomirski
2015-05-20 16:25 ` Josh Poimboeuf
2015-05-20 17:27 ` Peter Zijlstra
2015-05-21 20:54 ` Josh Poimboeuf
2 siblings, 2 replies; 400+ messages in thread
From: Andy Lutomirski @ 2015-05-20 16:03 UTC (permalink / raw)
To: Ingo Molnar
Cc: Josh Poimboeuf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Michal Marek, Peter Zijlstra, X86 ML, live-patching,
linux-kernel, Linus Torvalds, Andy Lutomirski, Denys Vlasenko,
Brian Gerst, Peter Zijlstra, Borislav Petkov, Andrew Morton
On Wed, May 20, 2015 at 7:48 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
>> On Wed, May 20, 2015 at 12:33:39PM +0200, Ingo Molnar wrote:
>> >
>> > * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> >
>> > > In discussions around the live kernel patching consistency model RFC
>> > > [1], Peter and Ingo correctly pointed out that stack traces aren't
>> > > reliable. And as Ingo said, there's no "strong force" which ensures we
>> > > can rely on them.
>> > >
>> > > So I've been thinking about how to fix that. My goal is to eventually
>> > > make stack traces reliable. Or at the very least, to be able to detect
>> > > at runtime when a given stack trace *might* be unreliable. But improved
>> > > stack traces would broadly benefit the entire kernel, regardless of the
>> > > outcome of the live kernel patching consistency model discussions.
>> > >
>> > > This patch set is just the first in a series of proposed stack trace
>> > > reliability improvements. Future proposals will include runtime stack
>> > > reliability checking, as well as compile-time and runtime DWARF
>> > > validations.
>> > >
>> > > As far as I can tell, there are two main obstacles which prevent frame
>> > > pointer based stack traces from being reliable:
>> > >
>> > > 1) Missing frame pointer logic: currently, most assembly functions don't
>> > > set up the frame pointer.
>> >
>> > Could you please paste here the output of what the new checks print
>> > for x86/64 defconfig?
>>
>> Here are all 89 warnings from defconfig:
>>
>> [...]
>
> Yeah, so many of these seem to be 'leaf only' functions: functions
> that don't ever call functions themselves.
>
> So lets assume we always have CONFIG_FRAME_POINTERS=y.
>
> If they don't set up a frame pointer then they in essence won't show
> up in the call chain - but normally they wouldn't because they call
> nothing.
>
> If they trigger an exception/fault or if they get hit by an interrupt
> then I think we'll still correctly walk the stack - just those
> functions might be missing from the deterministic call chain, right?
> (it will still show up as a '?' entry.)

I've never quite understood what the '?' means.

>
> If they crash then we'll see them because the crashing RIP will be
> printed.
>
> So I'm wondering what the x86 policy here should be: to create frame
> pointers in them or not. Cc:-ed a few more gents for thoughts.
>

I think it would be nice to have full DWARF unwind support for
everything at some point. Unfortunately, I don't see any easy path to
getting there. It doesn't help that AFAIK no one has ever proposed a
usable in-kernel DWARF unwinder.

It also doesn't help that writing correct CFI annotations in inline
asm can be very complicated.

I think that ia64 manages to have complete unwind support. How did
they manage it?

If we had an unwinder, it would be relatively straightforward to write
something perf-based that would frequently check that we can unwind
all the way out of an NMI back to userspace and warn if we couldn't.

--Andy
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-20 16:03 ` Andy Lutomirski
@ 2015-05-20 16:25 ` Josh Poimboeuf
2015-05-20 16:39 ` Andy Lutomirski
` (2 more replies)
2015-05-20 17:27 ` Peter Zijlstra
1 sibling, 3 replies; 400+ messages in thread
From: Josh Poimboeuf @ 2015-05-20 16:25 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Ingo Molnar, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Michal Marek, Peter Zijlstra, X86 ML, live-patching,
linux-kernel, Linus Torvalds, Andy Lutomirski, Denys Vlasenko,
Brian Gerst, Peter Zijlstra, Borislav Petkov, Andrew Morton
On Wed, May 20, 2015 at 09:03:37AM -0700, Andy Lutomirski wrote:
> On Wed, May 20, 2015 at 7:48 AM, Ingo Molnar <mingo@kernel.org> wrote:
> > Yeah, so many of these seem to be 'leaf only' functions: functions
> > that don't ever call functions themselves.
> >
> > So lets assume we always have CONFIG_FRAME_POINTERS=y.
> >
> > If they don't set up a frame pointer then they in essence won't show
> > up in the call chain - but normally they wouldn't because they call
> > nothing.
> >
> > If they trigger an exception/fault or if they get hit by an interrupt
> > then I think we'll still correctly walk the stack - just those
> > functions might be missing from the deterministic call chain, right?
> > (it will still show up as a '?' entry.)
>
> I've never quite understood what the '?' means.

It basically means "here's a function address we found on the stack,
which may or may not have been called." It's needed because stack
walking isn't currently 100% reliable.

> > If they crash then we'll see them because the crashing RIP will be
> > printed.
> >
> > So I'm wondering what the x86 policy here should be: to create frame
> > pointers in them or not. Cc:-ed a few more gents for thoughts.
> >
>
> I think it would be nice to have full DWARF unwind support for
> everything at some point. Unfortunately, I don't see any easy path to
> getting there. It doesn't help that AFAIK no one has ever proposed a
> usable in-kernel DWARF unwinder.
>
> It also doesn't help that writing correct CFI annotations in inline
> asm can be very complicated.
>
> I think that ia64 manages to have complete unwind support. How did
> they manage it?
>
> If we had an unwinder, it would be relatively straightforward to write
> something perf-based that would frequently check that we can unwind
> all the way out of an NMI back to userspace and warn if we couldn't.

I agree that DWARF unwind support would be nice. I have some plans
about how to achieve that in future patch sets. It includes several
pieces:

- compile-time DWARF data validation (using some similar approaches to
  this patch set)

- run time DWARF data validation, including:
  - a DWARF unwinder which doesn't blindly trust any of the DWARF data
  - ensuring DWARF and frame pointer data are consistent with each other
  - ensuring it can walk all the way to the bottom of the stack
  - a DEBUG option which validates the stack periodically from an NMI
    and/or schedule()
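
To illustrate the "walk all the way to the bottom of the stack" idea,
here is a minimal sketch. It uses a frame pointer chain rather than
DWARF data, and a toy model in which the stack is an array of words, a
frame pointer is an index into that array, and the word at that index
holds the caller's frame pointer. The function name
walk_reaches_bottom() and the model itself are assumptions made for
illustration, not part of any proposed patch.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Follow the chain of saved frame pointers starting at 'bp' and report
 * whether it ends exactly at 'bottom'. No link is trusted blindly:
 * every frame pointer must lie inside the stack, and every hop must
 * move strictly toward the bottom (higher indices in this model), which
 * also guarantees that the loop terminates.
 */
static bool walk_reaches_bottom(const size_t *stack, size_t nwords,
				size_t bp, size_t bottom)
{
	while (bp != bottom) {
		size_t next;

		if (bp >= nwords)
			return false;	/* points outside the stack */

		next = stack[bp];
		if (next <= bp)
			return false;	/* loops or goes backwards */

		bp = next;
	}

	return true;
}
```

A periodic debug check could run something along these lines from an
NMI or from schedule() and warn whenever the walk fails.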
--
Josh
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-20 16:25 ` Josh Poimboeuf
@ 2015-05-20 16:39 ` Andy Lutomirski
2015-05-20 16:52 ` Borislav Petkov
2015-05-20 16:59 ` [PATCH v4 0/3] Compile-time stack frame pointer validation Linus Torvalds
2 siblings, 0 replies; 400+ messages in thread
From: Andy Lutomirski @ 2015-05-20 16:39 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Ingo Molnar, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Michal Marek, Peter Zijlstra, X86 ML, live-patching,
linux-kernel, Linus Torvalds, Andy Lutomirski, Denys Vlasenko,
Brian Gerst, Peter Zijlstra, Borislav Petkov, Andrew Morton
On Wed, May 20, 2015 at 9:25 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Wed, May 20, 2015 at 09:03:37AM -0700, Andy Lutomirski wrote:
>> On Wed, May 20, 2015 at 7:48 AM, Ingo Molnar <mingo@kernel.org> wrote:
>> > Yeah, so many of these seem to be 'leaf only' functions: functions
>> > that don't ever call functions themselves.
>> >
>> > So lets assume we always have CONFIG_FRAME_POINTERS=y.
>> >
>> > If they don't set up a frame pointer then they in essence won't show
>> > up in the call chain - but normally they wouldn't because they call
>> > nothing.
>> >
>> > If they trigger an exception/fault or if they get hit by an interrupt
>> > then I think we'll still correctly walk the stack - just those
>> > functions might be missing from the deterministic call chain, right?
>> > (it will still show up as a '?' entry.)
>>
>> I've never quite understood what the '?' means.
>
> It basically means "here's a function address we found on the stack,
> which may or may not have been called." It's needed because stack
> walking isn't currently 100% reliable.
>
>> > If they crash then we'll see them because the crashing RIP will be
>> > printed.
>> >
>> > So I'm wondering what the x86 policy here should be: to create frame
>> > pointers in them or not. Cc:-ed a few more gents for thoughts.
>> >
>>
>> I think it would be nice to have full DWARF unwind support for
>> everything at some point. Unfortunately, I don't see any easy path to
>> getting there. It doesn't help that AFAIK no one has ever proposed a
>> usable in-kernel DWARF unwinder.
>>
>> It also doesn't help that writing correct CFI annotations in inline
>> asm can be very complicated.
>>
>> I think that ia64 manages to have complete unwind support. How did
>> they manage it?
>>
>> If we had an unwinder, it would be relatively straightforward to write
>> something perf-based that would frequently check that we can unwind
>> all the way out of an NMI back to userspace and warn if we couldn't.
>
> I agree that DWARF unwind support would be nice. I have some plans
> about how to achieve that in future patch sets. It includes several
> pieces:
>
> - compile-time DWARF data validation (using some similar approaches to
> this patch set)
>
> - run time DWARF data validation, including:
> - a DWARF unwinder which doesn't blindly trust any of the DWARF data

Fantastic!

> - ensuring DWARF and frame pointer data are consistent with each other
> - ensuring it can walk all the way to the bottom of the stack
> - a DEBUG option which validates the stack periodically from an NMI
> and/or schedule()

We think alike :)

NMI will be much more interesting than schedule.

--Andy
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-20 16:25 ` Josh Poimboeuf
2015-05-20 16:39 ` Andy Lutomirski
@ 2015-05-20 16:52 ` Borislav Petkov
2015-05-21 10:16 ` Ingo Molnar
2015-05-20 16:59 ` [PATCH v4 0/3] Compile-time stack frame pointer validation Linus Torvalds
2 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-05-20 16:52 UTC (permalink / raw)
To: Josh Poimboeuf, Andy Lutomirski
Cc: Ingo Molnar, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Michal Marek, Peter Zijlstra, X86 ML, live-patching,
linux-kernel, Linus Torvalds, Andy Lutomirski, Denys Vlasenko,
Brian Gerst, Peter Zijlstra, Andrew Morton
On Wed, May 20, 2015 at 11:25:37AM -0500, Josh Poimboeuf wrote:
> > I've never quite understood what the '?' means.
>
> It basically means "here's a function address we found on the stack,
> which may or may not have been called." It's needed because stack
> walking isn't currently 100% reliable.
Yeah, that was not that trivial to figure out at the time:
        unsigned long
        print_context_stack(struct thread_info *tinfo,
        ...
                if (__kernel_text_address(addr)) {
                        if ((unsigned long) stack == bp + sizeof(long)) {
                                ops->address(data, addr, 1);
                                frame = frame->next_frame;
                                bp = (unsigned long) frame;
                        } else {
                                ops->address(data, addr, 0);
                        }
and that ops->address is
        print_trace_address()
        |-> printk_stack_address()
So if I'm understanding this correctly, if rBP+8 is equal to rSP, i.e.
return address is on the stack, then this frame got called.
Otherwise -> "?".
I might be missing something though...
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-20 16:52 ` Borislav Petkov
@ 2015-05-21 10:16 ` Ingo Molnar
2015-05-21 10:47 ` Borislav Petkov
2015-05-27 14:17 ` [tip:x86/debug] x86/Documentation: Adapt Ingo's " tip-bot for Borislav Petkov
0 siblings, 2 replies; 400+ messages in thread
From: Ingo Molnar @ 2015-05-21 10:16 UTC (permalink / raw)
To: Borislav Petkov
Cc: Josh Poimboeuf, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Michal Marek, Peter Zijlstra, X86 ML,
live-patching, linux-kernel, Linus Torvalds, Andy Lutomirski,
Denys Vlasenko, Brian Gerst, Peter Zijlstra, Andrew Morton
* Borislav Petkov <bp@alien8.de> wrote:
> On Wed, May 20, 2015 at 11:25:37AM -0500, Josh Poimboeuf wrote:
> > > I've never quite understood what the '?' means.
> >
> > It basically means "here's a function address we found on the stack,
> > which may or may not have been called." It's needed because stack
> > walking isn't currently 100% reliable.
>
> Yeah, that was not that trivial to figure out at the time:
>
>         unsigned long
>         print_context_stack(struct thread_info *tinfo,
>         ...
>
>                 if (__kernel_text_address(addr)) {
>                         if ((unsigned long) stack == bp + sizeof(long)) {
>                                 ops->address(data, addr, 1);
>                                 frame = frame->next_frame;
>                                 bp = (unsigned long) frame;
>                         } else {
>                                 ops->address(data, addr, 0);
>                         }
>
> and that ops->address is
>
>         print_trace_address()
>         |-> printk_stack_address()
>
> So if I'm understanding this correctly, if rBP+8 is equal to rSP, i.e.
> return address is on the stack, then this frame got called.
>
> Otherwise -> "?".
>
> I might be missing something though...
So this is how we are printing backtraces on x86:
We always scan the full kernel stack for return addresses stored on
the kernel stack(s) [*], from stack top to stack bottom, and print out
anything that 'looks like' a kernel text address.
If it fits into the frame pointer chain, we print it without a
question mark, knowing that it's part of the real backtrace.
If the address does not fit into our expected frame pointer chain we
still print it, but we print a '?'. It can mean two things:
- either the address is not part of the call chain: it's just stale
values on the kernel stack, from earlier function calls. This is
the common case.
- or it is part of the call chain, but the frame pointer was not set
up properly within the function, so we don't recognize it. See the
200+ assembly functions that Josh's build time validation found.
This way we will always print out the real call chain (plus a few more
entries), regardless of whether the frame pointer was set up correctly
or not - but in most cases we'll get the call chain right as well. The
entries printed are strictly in stack order, so you can deduce more
information from that as well.
The most important property of this method is that we _never_ lose
information: we always strive to print _all_ addresses on the stack(s)
that look like kernel text addresses, so if debug information is
wrong, we still print out the real call chain as well - just with more
question marks than ideal.
Thanks,
Ingo
[*] For things like IRQ stacks and ISTs we also scan those stacks, in
the right order, and try to cross from one stack into another
reconstructing the call chain. This works most of the time.
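The scan described above can be modelled in user space. What follows is a minimal sketch under stated assumptions, not the code in arch/x86/kernel/dumpstack.c: the stack is an array of words, saved frame pointers are stored as array indices rather than addresses, and TEXT_START/TEXT_END, looks_like_text(), scan_stack() and print_trace() are names made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical text range; the real check is __kernel_text_address(). */
#define TEXT_START 0x1000UL
#define TEXT_END   0x2000UL

struct trace_entry {
	unsigned long addr;
	int reliable;	/* 1: on the frame pointer chain, 0: printed with '?' */
};

int looks_like_text(unsigned long word)
{
	return word >= TEXT_START && word < TEXT_END;
}

/*
 * Walk every word of the toy stack, newest slot first. 'bp' is the index
 * of the slot holding the current saved frame pointer; the slot after it
 * holds that frame's return address. Each saved frame pointer is the index
 * of the next older one; an index >= nwords ends the chain.
 * Returns the number of entries written to 'out'.
 */
size_t scan_stack(const unsigned long *stack, size_t nwords, size_t bp,
		  struct trace_entry *out, size_t max_out)
{
	size_t i, n = 0;

	assert(stack != NULL && out != NULL);

	for (i = 0; i < nwords && n < max_out; i++) {
		if (!looks_like_text(stack[i]))
			continue;

		out[n].addr = stack[i];
		if (bp < nwords && i == bp + 1) {
			/* Fits the chain: real return address, follow the link. */
			out[n].reliable = 1;
			bp = stack[bp];
		} else {
			/* Stale value, or a frame without a proper frame pointer. */
			out[n].reliable = 0;
		}
		n++;
	}
	return n;
}

/* Print entries in stack order, marking the unreliable ones with '?'. */
void print_trace(const struct trace_entry *t, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		printf(" %s%#lx\n", t[i].reliable ? "" : "? ", t[i].addr);
}
```

With the toy stack { 0x1111, 4, 0x1234, 0x1abc, 7, 0x1500, 0xdead } and bp = 1, the words 0x1234 and 0x1500 sit on the chain, while the other two text-looking words, 0x1111 and 0x1abc, are reported with a '?'. Nothing that looks like text is dropped, which is the "never lose information" property.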
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-21 10:16 ` Ingo Molnar
@ 2015-05-21 10:47 ` Borislav Petkov
2015-05-21 11:11 ` Ingo Molnar
2015-05-27 14:17 ` [tip:x86/debug] x86/Documentation: Adapt Ingo's " tip-bot for Borislav Petkov
1 sibling, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-05-21 10:47 UTC (permalink / raw)
To: Ingo Molnar
Cc: Josh Poimboeuf, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Michal Marek, Peter Zijlstra, X86 ML,
live-patching, linux-kernel, Linus Torvalds, Andy Lutomirski,
Denys Vlasenko, Brian Gerst, Peter Zijlstra, Andrew Morton
On Thu, May 21, 2015 at 12:16:14PM +0200, Ingo Molnar wrote:
> So this is how we are printing backtraces on x86:
<snip useful info>
This is pretty useful info and the question about the '?' keeps popping
up.
How about I moved Documentation/x86/x86_64/kernel-stacks to
Documentation/x86/kernel-stacks and added that info to it?
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-21 10:47 ` Borislav Petkov
@ 2015-05-21 11:11 ` Ingo Molnar
2015-05-21 15:49 ` [PATCH 1/3] x86/documentation: Move kernel-stacks doc one level up Borislav Petkov
0 siblings, 1 reply; 400+ messages in thread
From: Ingo Molnar @ 2015-05-21 11:11 UTC (permalink / raw)
To: Borislav Petkov
Cc: Josh Poimboeuf, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Michal Marek, Peter Zijlstra, X86 ML,
live-patching, linux-kernel, Linus Torvalds, Andy Lutomirski,
Denys Vlasenko, Brian Gerst, Peter Zijlstra, Andrew Morton
* Borislav Petkov <bp@alien8.de> wrote:
> On Thu, May 21, 2015 at 12:16:14PM +0200, Ingo Molnar wrote:
> > So this is how we are printing backtraces on x86:
>
> <snip useful info>
>
> This is pretty useful info and the question about the '?' keeps popping
> up.
>
> How about I moved Documentation/x86/x86_64/kernel-stacks to
> Documentation/x86/kernel-stacks and added that info to it?
Yeah, please do!
Thanks,
Ingo
* [PATCH 1/3] x86/documentation: Move kernel-stacks doc one level up
2015-05-21 11:11 ` Ingo Molnar
@ 2015-05-21 15:49 ` Borislav Petkov
2015-05-21 15:49 ` [PATCH 2/3] x86/documentation: Remove STACKFAULT_STACK bulletpoint Borislav Petkov
2015-05-21 15:49 ` [PATCH 3/3] x86/documentation: Adapt Ingo's explanation on printing backtraces Borislav Petkov
0 siblings, 2 replies; 400+ messages in thread
From: Borislav Petkov @ 2015-05-21 15:49 UTC (permalink / raw)
To: Ingo Molnar
Cc: LKML, Josh Poimboeuf, Andy Lutomirski, Thomas Gleixner,
H. Peter Anvin, Michal Marek, Peter Zijlstra, X86 ML,
live-patching, Linus Torvalds, Denys Vlasenko, Brian Gerst,
Peter Zijlstra, Andrew Morton
From: Borislav Petkov <bp@suse.de>
... to Documentation/x86/ as it is going to collect more and not only
64-bit specific info.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: X86 ML <x86@kernel.org>
Cc: live-patching@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/x86/{x86_64 => }/kernel-stacks | 0
1 file changed, 0 insertions(+), 0 deletions(-)
rename Documentation/x86/{x86_64 => }/kernel-stacks (100%)
diff --git a/Documentation/x86/x86_64/kernel-stacks b/Documentation/x86/kernel-stacks
similarity index 100%
rename from Documentation/x86/x86_64/kernel-stacks
rename to Documentation/x86/kernel-stacks
--
2.3.5
* [PATCH 2/3] x86/documentation: Remove STACKFAULT_STACK bulletpoint
2015-05-21 15:49 ` [PATCH 1/3] x86/documentation: Move kernel-stacks doc one level up Borislav Petkov
@ 2015-05-21 15:49 ` Borislav Petkov
2015-05-21 15:49 ` [PATCH 3/3] x86/documentation: Adapt Ingo's explanation on printing backtraces Borislav Petkov
1 sibling, 0 replies; 400+ messages in thread
From: Borislav Petkov @ 2015-05-21 15:49 UTC (permalink / raw)
To: Ingo Molnar
Cc: LKML, Josh Poimboeuf, Andy Lutomirski, Thomas Gleixner,
H. Peter Anvin, Michal Marek, Peter Zijlstra, X86 ML,
live-patching, Linus Torvalds, Denys Vlasenko, Brian Gerst,
Peter Zijlstra, Andrew Morton
From: Borislav Petkov <bp@suse.de>
Update the documentation after
6f442be2fb22 ("x86_64, traps: Stop using IST for #SS").
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: X86 ML <x86@kernel.org>
Cc: live-patching@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/x86/kernel-stacks | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/Documentation/x86/kernel-stacks b/Documentation/x86/kernel-stacks
index e3c8a49d1a2f..c3c935b9d56e 100644
--- a/Documentation/x86/kernel-stacks
+++ b/Documentation/x86/kernel-stacks
@@ -1,3 +1,6 @@
+Kernel stacks on x86-64 bit
+---------------------------
+
Most of the text from Keith Owens, hacked by AK
x86_64 page size (PAGE_SIZE) is 4K.
@@ -56,13 +59,6 @@ If that assumption is ever broken then the stacks will become corrupt.
The currently assigned IST stacks are :-
-* STACKFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
-
- Used for interrupt 12 - Stack Fault Exception (#SS).
-
- This allows the CPU to recover from invalid stack segments. Rarely
- happens.
-
* DOUBLEFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
Used for interrupt 8 - Double Fault Exception (#DF).
--
2.3.5
* [PATCH 3/3] x86/documentation: Adapt Ingo's explanation on printing backtraces
2015-05-21 15:49 ` [PATCH 1/3] x86/documentation: Move kernel-stacks doc one level up Borislav Petkov
2015-05-21 15:49 ` [PATCH 2/3] x86/documentation: Remove STACKFAULT_STACK bulletpoint Borislav Petkov
@ 2015-05-21 15:49 ` Borislav Petkov
1 sibling, 0 replies; 400+ messages in thread
From: Borislav Petkov @ 2015-05-21 15:49 UTC (permalink / raw)
To: Ingo Molnar
Cc: LKML, Josh Poimboeuf, Andy Lutomirski, Thomas Gleixner,
H. Peter Anvin, Michal Marek, Peter Zijlstra, X86 ML,
live-patching, Linus Torvalds, Denys Vlasenko, Brian Gerst,
Peter Zijlstra, Andrew Morton
From: Borislav Petkov <bp@suse.de>
Hold it down for future reference, as the question about the question
mark in stack traces keeps popping up.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: X86 ML <x86@kernel.org>
Cc: live-patching@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20150521101614.GA10889@gmail.com
---
Documentation/x86/kernel-stacks | 44 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)
diff --git a/Documentation/x86/kernel-stacks b/Documentation/x86/kernel-stacks
index c3c935b9d56e..0f3a6c201943 100644
--- a/Documentation/x86/kernel-stacks
+++ b/Documentation/x86/kernel-stacks
@@ -95,3 +95,47 @@ The currently assigned IST stacks are :-
assumptions about the previous state of the kernel stack.
For more details see the Intel IA32 or AMD AMD64 architecture manuals.
+
+
+Printing backtraces on x86
+--------------------------
+
+The question about the '?' preceding function names in an x86 stacktrace
+keeps popping up; here's an in-depth explanation. It helps if the reader
+stares at print_context_stack() and the whole machinery in and around
+arch/x86/kernel/dumpstack.c.
+
+Adapted from Ingo's mail, Message-ID: <20150521101614.GA10889@gmail.com>:
+
+We always scan the full kernel stack for return addresses stored on
+the kernel stack(s) [*], from stack top to stack bottom, and print out
+anything that 'looks like' a kernel text address.
+
+If it fits into the frame pointer chain, we print it without a question
+mark, knowing that it's part of the real backtrace.
+
+If the address does not fit into our expected frame pointer chain we
+still print it, but we print a '?'. It can mean two things:
+
+ - either the address is not part of the call chain: it's just stale
+ values on the kernel stack, from earlier function calls. This is
+ the common case.
+
+ - or it is part of the call chain, but the frame pointer was not set
+ up properly within the function, so we don't recognize it.
+
+This way we will always print out the real call chain (plus a few more
+entries), regardless of whether the frame pointer was set up correctly
+or not - but in most cases we'll get the call chain right as well. The
+entries printed are strictly in stack order, so you can deduce more
+information from that as well.
+
+The most important property of this method is that we _never_ lose
+information: we always strive to print _all_ addresses on the stack(s)
+that look like kernel text addresses, so if debug information is wrong,
+we still print out the real call chain as well - just with more question
+marks than ideal.
+
+[*] For things like IRQ and IST stacks, we also scan those stacks, in
+ the right order, and try to cross from one stack into another
+ reconstructing the call chain. This works most of the time.
--
2.3.5
* [tip:x86/debug] x86/Documentation: Adapt Ingo's explanation on printing backtraces
2015-05-21 10:16 ` Ingo Molnar
2015-05-21 10:47 ` Borislav Petkov
@ 2015-05-27 14:17 ` tip-bot for Borislav Petkov
1 sibling, 0 replies; 400+ messages in thread
From: tip-bot for Borislav Petkov @ 2015-05-27 14:17 UTC (permalink / raw)
To: linux-tip-commits
Cc: hpa, brgerst, bp, peterz, torvalds, mingo, mmarek, luto,
dvlasenk, tglx, jpoimboe, luto, akpm, a.p.zijlstra, bp,
linux-kernel
Commit-ID: 113b5e3720e79ad938374163c1b8e295521dc9cf
Gitweb: http://git.kernel.org/tip/113b5e3720e79ad938374163c1b8e295521dc9cf
Author: Borislav Petkov <bp@suse.de>
AuthorDate: Tue, 26 May 2015 10:28:20 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:39:47 +0200
x86/Documentation: Adapt Ingo's explanation on printing backtraces
Hold it down for future reference, as the question about the
question mark in stack traces keeps popping up.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/1432628901-18044-18-git-send-email-bp@alien8.de
Link: http://lkml.kernel.org/r/20150521101614.GA10889@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
Documentation/x86/kernel-stacks | 44 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)
diff --git a/Documentation/x86/kernel-stacks b/Documentation/x86/kernel-stacks
index c3c935b..0f3a6c2 100644
--- a/Documentation/x86/kernel-stacks
+++ b/Documentation/x86/kernel-stacks
@@ -95,3 +95,47 @@ The currently assigned IST stacks are :-
assumptions about the previous state of the kernel stack.
For more details see the Intel IA32 or AMD AMD64 architecture manuals.
+
+
+Printing backtraces on x86
+--------------------------
+
+The question about the '?' preceding function names in an x86 stacktrace
+keeps popping up; here's an in-depth explanation. It helps if the reader
+stares at print_context_stack() and the whole machinery in and around
+arch/x86/kernel/dumpstack.c.
+
+Adapted from Ingo's mail, Message-ID: <20150521101614.GA10889@gmail.com>:
+
+We always scan the full kernel stack for return addresses stored on
+the kernel stack(s) [*], from stack top to stack bottom, and print out
+anything that 'looks like' a kernel text address.
+
+If it fits into the frame pointer chain, we print it without a question
+mark, knowing that it's part of the real backtrace.
+
+If the address does not fit into our expected frame pointer chain we
+still print it, but we print a '?'. It can mean two things:
+
+ - either the address is not part of the call chain: it's just stale
+ values on the kernel stack, from earlier function calls. This is
+ the common case.
+
+ - or it is part of the call chain, but the frame pointer was not set
+ up properly within the function, so we don't recognize it.
+
+This way we will always print out the real call chain (plus a few more
+entries), regardless of whether the frame pointer was set up correctly
+or not - but in most cases we'll get the call chain right as well. The
+entries printed are strictly in stack order, so you can deduce more
+information from that as well.
+
+The most important property of this method is that we _never_ lose
+information: we always strive to print _all_ addresses on the stack(s)
+that look like kernel text addresses, so if debug information is wrong,
+we still print out the real call chain as well - just with more question
+marks than ideal.
+
+[*] For things like IRQ and IST stacks, we also scan those stacks, in
+ the right order, and try to cross from one stack into another
+ reconstructing the call chain. This works most of the time.
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-20 16:25 ` Josh Poimboeuf
2015-05-20 16:39 ` Andy Lutomirski
2015-05-20 16:52 ` Borislav Petkov
@ 2015-05-20 16:59 ` Linus Torvalds
2015-05-20 17:20 ` Josh Poimboeuf
2015-05-21 7:52 ` Ingo Molnar
2 siblings, 2 replies; 400+ messages in thread
From: Linus Torvalds @ 2015-05-20 16:59 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Michal Marek, Peter Zijlstra, X86 ML,
live-patching, linux-kernel, Andy Lutomirski, Denys Vlasenko,
Brian Gerst, Peter Zijlstra, Borislav Petkov, Andrew Morton
On Wed, May 20, 2015 at 9:25 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Wed, May 20, 2015 at 09:03:37AM -0700, Andy Lutomirski wrote:
>>
>> I've never quite understood what the '?' means.
>
> It basically means "here's a function address we found on the stack,
> which may or may not have been called." It's needed because stack
> walking isn't currently 100% reliable.
It is often quite interesting and helpful, because it shows stale data
on the stack, giving clues about what happened just before.
Now, I'd like gcc to generally be better about not wasting so much
stack frame, so in that sense I'd like to see fewer '?' entries just
from a code quality standpoint, but when debugging those things, the
downside of "noise" is often cancelled by the upside of "ahh, it
happens after calling X".
So the "perfect stack frames" is actually not as great a thing as some
people want to make it seem.
Linus
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-20 16:59 ` [PATCH v4 0/3] Compile-time stack frame pointer validation Linus Torvalds
@ 2015-05-20 17:20 ` Josh Poimboeuf
2015-05-21 10:27 ` Ingo Molnar
2015-05-21 7:52 ` Ingo Molnar
1 sibling, 1 reply; 400+ messages in thread
From: Josh Poimboeuf @ 2015-05-20 17:20 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Michal Marek, Peter Zijlstra, X86 ML,
live-patching, linux-kernel, Andy Lutomirski, Denys Vlasenko,
Brian Gerst, Peter Zijlstra, Borislav Petkov, Andrew Morton
On Wed, May 20, 2015 at 09:59:18AM -0700, Linus Torvalds wrote:
> On Wed, May 20, 2015 at 9:25 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Wed, May 20, 2015 at 09:03:37AM -0700, Andy Lutomirski wrote:
> >>
> >> I've never quite understood what the '?' means.
> >
> > It basically means "here's a function address we found on the stack,
> > which may or may not have been called." It's needed because stack
> > walking isn't currently 100% reliable.
>
> It is often quite interesting and helpful, because it shows stale data
> on the stack, giving clues about what happened just before.
>
> Now, I'd like gcc to generally be better about not wasting so much
> stack frame, so in that sense I'd like to see fewer '?' entries just
> from a code quality standpoint, but when debugging those things, the
> downside of "noise" is often cancelled by the upside of "ahh, it
> happens after calling X".
>
> So the "perfect stack frames" is actually not as great a thing as some
> people want to make it seem.
Ok, I can see how looking at stale stack data could be useful for some
of the really tough problems.
But right now, the meaning of '?' is ambiguous. It could be stale data,
or it could be part of a frame for the current stack which was skipped
due to missing frame pointers or an exception.
If we can somehow make the stack unwinder reliable, then it would at
least allow us to remove the ambiguity of the '?' entries. And it would
reduce the "noise" for the majority of issues where we don't care about
stale stack data, and can simply ignore it.
--
Josh
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-20 17:20 ` Josh Poimboeuf
@ 2015-05-21 10:27 ` Ingo Molnar
0 siblings, 0 replies; 400+ messages in thread
From: Ingo Molnar @ 2015-05-21 10:27 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Linus Torvalds, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Michal Marek, Peter Zijlstra, X86 ML,
live-patching, linux-kernel, Andy Lutomirski, Denys Vlasenko,
Brian Gerst, Peter Zijlstra, Borislav Petkov, Andrew Morton
* Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Wed, May 20, 2015 at 09:59:18AM -0700, Linus Torvalds wrote:
> > On Wed, May 20, 2015 at 9:25 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > > On Wed, May 20, 2015 at 09:03:37AM -0700, Andy Lutomirski wrote:
> > >>
> > >> I've never quite understood what the '?' means.
> > >
> > > It basically means "here's a function address we found on the
> > > stack, which may or may not have been called." It's needed
> > > because stack walking isn't currently 100% reliable.
> >
> > It is often quite interesting and helpful, because it shows stale
> > data on the stack, giving clues about what happened just before.
> >
> > Now, I'd like gcc to generally be better about not wasting so much
> > stack frame, so in that sense I'd like to see fewer '?' entries
> > just from a code quality standpoint, but when debugging those
> > things, the downside of "noise" is often cancelled by the upside
> > of "ahh, it happens after calling X".
> >
> > So the "perfect stack frames" is actually not as great a thing as
> > some people want to make it seem.
>
> Ok, I can see how looking at stale stack data could be useful for
> some of the really tough problems.
And note that the tough problems are actually the ones where we need
that information the most. So any stack backtrace printing method must
be biased towards helping the difficult scenarios - not the trivial
crashes. That is one of the reasons why we are always printing the
question marks.
> But right now, the meaning of '?' is ambiguous. It could be stale
> data, or it could be part of a frame for the current stack which was
> skipped due to missing frame pointers or an exception.
Yes, of course. That's not a big problem as the actual symbolic
information will tell us a lot, which allows us to reconstruct the
real call chain, plus allows us to see any 'recent execution activity'
that might be on the stack as stale entries.
> If we can somehow make the stack unwinder reliable, then it would at
> least allow us to remove the ambiguity of the '?' entries. And it
> would reduce the "noise" for the majority of issues where we don't
> care about stale stack data, and can simply ignore it.
Yes, but note the above consideration - the probability distribution
of kernel bugs tends to have a _very_ long tail, with bugs that
sometimes take years to trigger and fix. Kernel developers upstream
and at distros tend to spend a disproportionately large amount of time
staring at difficult to decode bugs.
For that reason it is far more important to still stay maintainable
with those kinds of difficult bugs, than to make the resolution of
trivial, unambiguous crashes a tiny bit easier by printing fewer
'distractions'...
Also, note that the '?' entries have another role: they cross-check
the unwinder.
If you think we'll be able to do a perfect unwinder then think again:
debug info _will_ be messed up periodically, either by us or by
tooling, because right now no kernel code or other functionality
relies on perfect unwinding.
So this is not like C++ exception handling where broken unwinding will
break the code. This is something that is literally only visible in
kernel logs currently, as a slight anomaly.
So any x86 stack unwinder code must be fundamentally based on the idea
and expectation that stack unwinding is always going to be somewhat
imperfect and somewhat statistical.
Thanks,
Ingo
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-20 16:59 ` [PATCH v4 0/3] Compile-time stack frame pointer validation Linus Torvalds
2015-05-20 17:20 ` Josh Poimboeuf
@ 2015-05-21 7:52 ` Ingo Molnar
2015-05-21 12:12 ` Ingo Molnar
2015-05-26 23:06 ` Andi Kleen
1 sibling, 2 replies; 400+ messages in thread
From: Ingo Molnar @ 2015-05-21 7:52 UTC (permalink / raw)
To: Linus Torvalds
Cc: Josh Poimboeuf, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Michal Marek, Peter Zijlstra, X86 ML,
live-patching, linux-kernel, Andy Lutomirski, Denys Vlasenko,
Brian Gerst, Peter Zijlstra, Borislav Petkov, Andrew Morton
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Wed, May 20, 2015 at 9:25 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Wed, May 20, 2015 at 09:03:37AM -0700, Andy Lutomirski wrote:
> >>
> >> I've never quite understood what the '?' means.
> >
> > It basically means "here's a function address we found on the
> > stack, which may or may not have been called." It's needed
> > because stack walking isn't currently 100% reliable.
>
> It is often quite interesting and helpful, because it shows stale
> data on the stack, giving clues about what happened just before.
Yes, it's basically a zero-cost tracer: often showing a partial trace
of events that happened before.
> Now, I'd like gcc to generally be better about not wasting so much
> stack frame, so in that sense I'd like to see fewer '?' entries just
> from a code quality standpoint, but when debugging those things, the
> downside of "noise" is often cancelled by the upside of "ahh, it
> happens after calling X".
>
> So the "perfect stack frames" is actually not as great a thing as
> some people want to make it seem.
We should definitely also print out the '?' entries, they are very
useful especially when analyzing rare, difficult to reproduce,
sporadic bugs - which are usually the hardest to fix bugs.
The biggest long term plus of 'perfect stack frames' would not be to
skip the '?' entries (we don't want to skip them!), but to be able to
eventually build the kernel without frame pointers.
Especially on modern x86 CPUs with stack engines (latest Intel and AMD
CPUs) that keep ESP updates out of the later stages of execution
pipelines, going from RBP framepointers to direct ESP use is
beneficial to performance and compresses I$ footprint as well:
    text     data      bss       dec      hex  filename
12150606  2565544  1634304  16350454   f97cf6  linux-CONFIG_FRAME_POINTERS=n/vmlinux
13282884  2571744  1617920  17472548  10a9c24  linux-CONFIG_FRAME_POINTERS=y/vmlinux
Here's the I$ cachemiss rate with the 'vfs-mix' workload that I used
in the -falign-functions measurements, for CONFIG_FRAMEPOINTERS=y,
on Intel Sandy Bridge (best of 9x10 runs):
 #
 # CONFIG_FRAMEPOINTERS=y
 #
 Performance counter stats for 'system wide' (10 runs):
       728,328,347      L1-icache-load-misses    ( +- 0.08% ) (100.00%)
    11,891,931,664      instructions             ( +- 0.00% )
           300,023      context-switches         ( +- 0.00% )
       7.324048170 seconds time elapsed          ( +- 0.09% )
... and these are the I$ miss perf stats from running the same
workload on a CONFIG_FRAMEPOINTERS=n kernel:
 #
 # CONFIG_FRAMEPOINTERS is not set
 #
 Performance counter stats for 'system wide' (10 runs):
       687,758,078      L1-icache-load-misses    ( +- 0.10% ) (100.00%)
    10,984,908,013      instructions             ( +- 0.01% )
           300,021      context-switches         ( +- 0.00% )
       7.120867260 seconds time elapsed          ( +- 0.29% )
So if we disable frame pointers, then on this workload:
- the kernel text size is 9.3% smaller
- the number of instructions executed went down by about 8.2%
- the cachemiss rate went down by about 5.9%
- performance went up by about 2.8%.
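For reference, the percentages in this list can be recomputed from the raw numbers. The sketch below is only an illustration; pct_diff() and print_fp_cost() are names invented here. The quoted figures come out when the difference is divided by the value from the kernel without frame pointers. Dividing by the frame pointer value instead gives somewhat smaller numbers, for example about 8.5% rather than 9.3% for the text size.

```c
#include <assert.h>
#include <stdio.h>

/* Difference between the two kernels as a percentage of 'base'. */
double pct_diff(double with_fp, double without_fp, double base)
{
	assert(base > 0.0);
	return 100.0 * (with_fp - without_fp) / base;
}

/* Print the four comparisons, using the no-frame-pointer value as base. */
void print_fp_cost(void)
{
	printf("text size:      %.2f%%\n",
	       pct_diff(13282884.0, 12150606.0, 12150606.0));
	printf("instructions:   %.2f%%\n",
	       pct_diff(11891931664.0, 10984908013.0, 10984908013.0));
	printf("I$ load misses: %.2f%%\n",
	       pct_diff(728328347.0, 687758078.0, 687758078.0));
	printf("elapsed time:   %.2f%%\n",
	       pct_diff(7.324048170, 7.120867260, 7.120867260));
}
```

Stating the base alongside each percentage avoids the ambiguity between "n is X% smaller" and "y is X% larger", which differ by a few tenths of a percent here.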
The speedup is actually even better than 2.8%, if you look at average
execution time:
linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.324048170 seconds time elapsed ( +- 0.09% )
linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.470166715 seconds time elapsed ( +- 1.01% )
linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.365047474 seconds time elapsed ( +- 0.25% )
linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.828223324 seconds time elapsed ( +- 2.04% )
linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.427164489 seconds time elapsed ( +- 0.70% )
linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.385565350 seconds time elapsed ( +- 0.35% )
linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.560782318 seconds time elapsed ( +- 1.68% )
linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.399741309 seconds time elapsed ( +- 0.74% )
linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.303746766 seconds time elapsed ( +- 0.04% )
avg = 7.451609
linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.201498813 seconds time elapsed ( +- 0.86% )
linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.120867260 seconds time elapsed ( +- 0.29% )
linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.141642635 seconds time elapsed ( +- 0.15% )
linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.217213506 seconds time elapsed ( +- 0.85% )
linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.163046581 seconds time elapsed ( +- 0.56% )
linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.128939439 seconds time elapsed ( +- 0.23% )
linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.256172853 seconds time elapsed ( +- 0.82% )
linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.122946768 seconds time elapsed ( +- 0.23% )
linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.126018578 seconds time elapsed ( +- 0.18% )
avg = 7.164260
Then with framepointers disabled this workload gets faster by 4.0% on
average.
The average result is also pretty stable in the no-framepointers case,
while it fluctuates more in the framepointers case. (and this is why
the 'best runtime' favors the framepointers case - the average is
closer to reality.)
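The averages and the 4.0% figure can be reproduced from the nine elapsed times listed for each kernel. This is a small Python sketch under the same assumption as the mail appears to make (speedup expressed relative to the no-framepointers average); the variable names are made up for illustration, and the max-min spread is just one simple way to look at the fluctuation.

```python
from statistics import mean

# Elapsed times in seconds, copied from the res.txt lines above.
with_fp = [7.324048170, 7.470166715, 7.365047474, 7.828223324, 7.427164489,
           7.385565350, 7.560782318, 7.399741309, 7.303746766]
without_fp = [7.201498813, 7.120867260, 7.141642635, 7.217213506, 7.163046581,
              7.128939439, 7.256172853, 7.122946768, 7.126018578]

avg_fp = mean(with_fp)
avg_nofp = mean(without_fp)

# Speedup of the no-framepointers kernel, relative to its own average.
speedup_pct = (avg_fp / avg_nofp - 1.0) * 100.0

# Spread between the slowest and fastest run, as a crude stability measure.
spread_fp = max(with_fp) - min(with_fp)
spread_nofp = max(without_fp) - min(without_fp)

print(f"avg with framepointers:    {avg_fp:.6f} s (spread {spread_fp:.3f} s)")
print(f"avg without framepointers: {avg_nofp:.6f} s (spread {spread_nofp:.3f} s)")
print(f"speedup on average:        {speedup_pct:.1f}%")
```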
So the performance advantages of not doing framepointers are not
something we can ignore IMHO: but obviously performance isn't
everything - so if stack unwinding isn't robust, then we need and
want frame pointers.
Thanks,
Ingo
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-21 7:52 ` Ingo Molnar
@ 2015-05-21 12:12 ` Ingo Molnar
2015-05-26 23:06 ` Andi Kleen
1 sibling, 0 replies; 400+ messages in thread
From: Ingo Molnar @ 2015-05-21 12:12 UTC (permalink / raw)
To: Linus Torvalds
Cc: Josh Poimboeuf, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Michal Marek, Peter Zijlstra, X86 ML,
live-patching, linux-kernel, Andy Lutomirski, Denys Vlasenko,
Brian Gerst, Peter Zijlstra, Borislav Petkov, Andrew Morton
* Ingo Molnar <mingo@kernel.org> wrote:
> Especially on modern x86 CPUs with stack engines (latest Intel and
> AMD CPUs) that keeps ESP updates out of the later stages of
> execution pipelines, going from RBP framepointers to direct ESP use
> is beneficial to performance and compresses I$ footprint as well:
>
>     text    data     bss      dec     hex filename
> 12150606 2565544 1634304 16350454  f97cf6 linux-CONFIG_FRAME_POINTERS=n/vmlinux
> 13282884 2571744 1617920 17472548 10a9c24 linux-CONFIG_FRAME_POINTERS=y/vmlinux
Correction: I ran that with a 1-byte alignment patch still applied.
I reran all the numbers with the default 16-byte alignment as well,
and the gap between framepointers and no-framepointers became smaller,
but the various trends and conclusions still hold.
Here are the updated numbers:
    text    data     bss      dec     hex filename
13548564 2571744 1617920 17738228 10ea9f4 linux-CONFIG_FRAME_POINTERS=n/vmlinux
13797773 2571744 1617920 17987437 112776d linux-CONFIG_FRAME_POINTERS=y/vmlinux
> Here's the I$ cachemiss rate with the 'vfs-mix' workload that I used
> in the -falign-functions measurements gives this for
> CONFIG_FRAMEPOINTERS=y, on Intel Sandy Bridge (best of 9x10 runs):
>
> #
> # CONFIG_FRAMEPOINTERS=y
> #
> Performance counter stats for 'system wide' (10 runs):
>
> 728,328,347 L1-icache-load-misses ( +- 0.08% ) (100.00%)
> 11,891,931,664 instructions ( +- 0.00% )
> 300,023 context-switches ( +- 0.00% )
>
> 7.324048170 seconds time elapsed ( +- 0.09% )
Performance counter stats for 'system wide' (10 runs):
       701,525,006      L1-icache-load-misses     ( +- 0.06% ) (100.00%)
    11,891,793,196      instructions              ( +- 0.01% )
           300,036      context-switches          ( +- 0.00% )
       7.354372294 seconds time elapsed           ( +- 0.82% )
>
> ... and these are the I$ miss perf stats from running the same
> workload on a CONFIG_FRAMEPOINTERS=n kernel:
>
> #
> # CONFIG_FRAMEPOINTERS are not set
> #
> Performance counter stats for 'system wide' (10 runs):
>
> 687,758,078 L1-icache-load-misses ( +- 0.10% ) (100.00%)
> 10,984,908,013 instructions ( +- 0.01% )
> 300,021 context-switches ( +- 0.00% )
>
> 7.120867260 seconds time elapsed ( +- 0.29% )
Performance counter stats for 'system wide' (10 runs):
       685,107,089      L1-icache-load-misses     ( +- 0.08% ) (100.00%)
    10,983,861,590      instructions              ( +- 0.01% )
           300,031      context-switches          ( +- 0.00% )
       7.120738452 seconds time elapsed           ( +- 0.35% )
> So if we disable frame pointers, then on this workload:
>
> - the kernel text size is 9.3% smaller
> - the number of instructions executed went down by about 8.2%
> - the cachemiss rate went down by about 5.9%
> - performance went up by about 2.8%.
- the kernel text size is 1.8% smaller: with 16-byte alignment
there's quite a bit of extra free space the frame pointer code can
grow into, which reduces the size win.
- the number of instructions executed went down by about 8.2% (as
expected this is independent of alignment.)
- the cachemiss rate went down by about 2.7%: this is a smaller
win again, partly because of the 'free space' 16-byte alignment
gives us.
- the best 'time elapsed' numbers out of 10 runs show a speedup of
2.0% - close to the 2.8% with 1-byte alignment.
> The speedup is actually even better than 2.8%, if you look at
> average execution time:
>
> linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.324048170 seconds time elapsed ( +- 0.09% )
> linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.470166715 seconds time elapsed ( +- 1.01% )
> linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.365047474 seconds time elapsed ( +- 0.25% )
> linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.828223324 seconds time elapsed ( +- 2.04% )
> linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.427164489 seconds time elapsed ( +- 0.70% )
> linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.385565350 seconds time elapsed ( +- 0.35% )
> linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.560782318 seconds time elapsed ( +- 1.68% )
> linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.399741309 seconds time elapsed ( +- 0.74% )
> linux-CONFIG_FRAME_POINTERS=y/res.txt: 7.303746766 seconds time elapsed ( +- 0.04% )
>
> avg = 7.451609
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.300875812 seconds time elapsed ( +- 0.17% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.491652338 seconds time elapsed ( +- 1.33% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.307877300 seconds time elapsed ( +- 0.20% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.258946461 seconds time elapsed ( +- 0.23% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.295113779 seconds time elapsed ( +- 0.30% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.283375859 seconds time elapsed ( +- 0.21% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.319320205 seconds time elapsed ( +- 0.38% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.354372294 seconds time elapsed ( +- 0.82% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.308955558 seconds time elapsed ( +- 0.26% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.295267101 seconds time elapsed ( +- 0.26% )
avg=7.32
> linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.201498813 seconds time elapsed ( +- 0.86% )
> linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.120867260 seconds time elapsed ( +- 0.29% )
> linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.141642635 seconds time elapsed ( +- 0.15% )
> linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.217213506 seconds time elapsed ( +- 0.85% )
> linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.163046581 seconds time elapsed ( +- 0.56% )
> linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.128939439 seconds time elapsed ( +- 0.23% )
> linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.256172853 seconds time elapsed ( +- 0.82% )
> linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.122946768 seconds time elapsed ( +- 0.23% )
> linux-CONFIG_FRAME_POINTERS=n/res.txt: 7.126018578 seconds time elapsed ( +- 0.18% )
>
> avg = 7.164260
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.135061084 seconds time elapsed ( +- 0.39% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.132738388 seconds time elapsed ( +- 0.34% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.174334895 seconds time elapsed ( +- 0.32% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.215143851 seconds time elapsed ( +- 0.71% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.131166029 seconds time elapsed ( +- 0.19% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.270427197 seconds time elapsed ( +- 1.22% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.120738452 seconds time elapsed ( +- 0.35% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.168856127 seconds time elapsed ( +- 0.27% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.268637173 seconds time elapsed ( +- 1.28% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.178431781 seconds time elapsed ( +- 0.32% )
avg=7.18
> Then with framepointers disabled this workload gets faster by 4.0%
> on average.
With 16-byte alignment the average gets faster by 2.0% (7.32 vs. 7.18).
The conclusions are unchanged:
> The average result is also pretty stable in the no-framepointers
> case, while it fluctuates more in the framepointers case. (and this
> is why the 'best runtime' favors the framepointers case - the
> average is closer to reality.)
>
> So the performance advantages of not doing framepointers is not
> something we can ignore IMHO: but obviously performance isn't
> everything - so if stack unwinding is unrobust, then we need and
> want frame pointers.
Thanks,
Ingo
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-21 7:52 ` Ingo Molnar
2015-05-21 12:12 ` Ingo Molnar
@ 2015-05-26 23:06 ` Andi Kleen
1 sibling, 0 replies; 400+ messages in thread
From: Andi Kleen @ 2015-05-26 23:06 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, Josh Poimboeuf, Andy Lutomirski, Thomas Gleixner,
Ingo Molnar, H. Peter Anvin, Michal Marek, Peter Zijlstra,
X86 ML, live-patching, linux-kernel, Andy Lutomirski,
Denys Vlasenko, Brian Gerst, Peter Zijlstra, Borislav Petkov,
Andrew Morton
Ingo Molnar <mingo@kernel.org> writes:
>
> Especially on modern x86 CPUs with stack engines (latest Intel and AMD
> CPUs) that keeps ESP updates out of the later stages of execution
> pipelines, going from RBP framepointers to direct ESP use is
> beneficial to performance and compresses I$ footprint as well:
Note that Atom doesn't have this stack engine, so you'll likely
see even more difference there.
> So the performance advantages of not doing framepointers is not
> something we can ignore IMHO:
Agreed.
> but obviously performance isn't
> everything - so if stack unwinding is unrobust, then we need and
> want frame pointers.
It wasn't that bad in the old days with the approx stack traces. In
fact I bet it would be possible to write an automated tool that weeds
out many (most?) false positives automatically with a static
compile-time callgraph.
It would be good to at least make it easier to build without them
again. Currently it's very difficult because a lot of subsystems
force-select frame pointers.
-Andi
--
ak@linux.intel.com -- Speaking for myself only
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-20 16:03 ` Andy Lutomirski
2015-05-20 16:25 ` Josh Poimboeuf
@ 2015-05-20 17:27 ` Peter Zijlstra
2015-05-20 19:10 ` Jiri Kosina
1 sibling, 1 reply; 400+ messages in thread
From: Peter Zijlstra @ 2015-05-20 17:27 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Ingo Molnar, Josh Poimboeuf, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Michal Marek, X86 ML, live-patching,
linux-kernel, Linus Torvalds, Andy Lutomirski, Denys Vlasenko,
Brian Gerst, Borislav Petkov, Andrew Morton
On Wed, May 20, 2015 at 09:03:37AM -0700, Andy Lutomirski wrote:
> I think it would be nice to have full DWARF unwind support for
> everything at some point. Unfortunately, I don't see any easy path to
> getting there. It doesn't help that AFAIK no one has ever proposed a
> usable in-kernel DWARF unwinder.
There's a bit of history here; SuSE (iirc) actually has one, however:
https://lkml.org/lkml/2012/2/10/356
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-20 17:27 ` Peter Zijlstra
@ 2015-05-20 19:10 ` Jiri Kosina
0 siblings, 0 replies; 400+ messages in thread
From: Jiri Kosina @ 2015-05-20 19:10 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andy Lutomirski, Ingo Molnar, Josh Poimboeuf, Thomas Gleixner,
Ingo Molnar, H. Peter Anvin, Michal Marek, X86 ML, live-patching,
linux-kernel, Linus Torvalds, Andy Lutomirski, Denys Vlasenko,
Brian Gerst, Borislav Petkov, Andrew Morton
On Wed, 20 May 2015, Peter Zijlstra wrote:
> > I think it would be nice to have full DWARF unwind support for
> > everything at some point. Unfortunately, I don't see any easy path to
> > getting there. It doesn't help that AFAIK no one has ever proposed a
> > usable in-kernel DWARF unwinder.
>
> There's a bit of history here; SuSE (iirc) actually has one, however:
>
> https://lkml.org/lkml/2012/2/10/356
Oh absolutely, there are stories behind this :)
Just for the sake of completeness -- the current implementation can be
found in our public GIT repository, for a not-really-complete picture see
[1] [2] [3] [4].
It turned out to be rather useful on many occasions when debugging customer
reports, but I of course also understand what Linus is saying above. Bugs
in the unwinder can be *really* painful. Our experience so far has been
that it did pay off at the end of the day (and of course analyzing
stacktraces is our daily bread).
[1] http://kernel.suse.com/cgit/kernel-source/tree/patches.suse/stack-unwind?h=SLE12
[2] http://kernel.suse.com/cgit/kernel-source/tree/patches.suse/no-frame-pointer-select?h=SLE12
[3] http://kernel.suse.com/cgit/kernel-source/tree/patches.arch/stack-unwind-cfi_ignore-takes-more-arguments?h=SLE12
[4] http://kernel.suse.com/cgit/kernel-source/tree/patches.arch/x86_64-unwind-annotations?h=SLE12
--
Jiri Kosina
SUSE Labs
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-20 14:48 ` Ingo Molnar
2015-05-20 15:51 ` Josh Poimboeuf
2015-05-20 16:03 ` Andy Lutomirski
@ 2015-05-21 20:54 ` Josh Poimboeuf
2015-05-21 21:53 ` Andy Lutomirski
2015-05-21 22:01 ` Borislav Petkov
2 siblings, 2 replies; 400+ messages in thread
From: Josh Poimboeuf @ 2015-05-21 20:54 UTC (permalink / raw)
To: Ingo Molnar
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Michal Marek,
Peter Zijlstra, x86, live-patching, linux-kernel, Linus Torvalds,
Andy Lutomirski, Denys Vlasenko, Brian Gerst, Peter Zijlstra,
Borislav Petkov, Andrew Morton
On Wed, May 20, 2015 at 04:48:10PM +0200, Ingo Molnar wrote:
> Yeah, so many of these seem to be 'leaf only' functions: functions
> that don't ever call functions themselves.
>
> So let's assume we always have CONFIG_FRAME_POINTERS=y.
>
> If they don't set up a frame pointer then they in essence won't show
> up in the call chain - but normally they wouldn't because they call
> nothing.
>
> If they trigger an exception/fault or if they get hit by an interrupt
> then I think we'll still correctly walk the stack - just those
> functions might be missing from the deterministic call chain, right?
> (it will still show up as a '?' entry.)
>
> If they crash then we'll see them because the crashing RIP will be
> printed.
>
> So I'm wondering what the x86 policy here should be: to create frame
> pointers in them or not. Cc:-ed a few more gents for thoughts.
After removing the frame pointer checks for leaf functions, and adding a
check for all functions which jump outside of their scope, the number of
defconfig warnings dropped from 89 -> 47. The Fedora config warning
count dropped from 207 -> 83.
Here are the remaining 47 warnings for defconfig:
stackvalidate: arch/x86/ia32/ia32entry.o: ia32_sysenter_target() is missing frame pointer logic
stackvalidate: arch/x86/ia32/ia32entry.o: return instruction outside of a function at .entry.text+0x52e
stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x359
stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x19be
stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x19e5
stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x1c21
stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x1ceb
stackvalidate: arch/x86/kernel/acpi/wakeup_64.o: unsupported jump to outside of the function at wakeup_long64+0x15
stackvalidate: arch/x86/kernel/acpi/wakeup_64.o: do_suspend_lowlevel() is missing frame pointer logic
stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x6b
stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0xc7
stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x110
stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x145
stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x1c4
stackvalidate: arch/x86/kernel/head_64.o: return instruction outside of a function at .head.text+0x1a2
stackvalidate: arch/x86/kernel/head_64.o: early_idt_handler() is missing frame pointer logic
stackvalidate: arch/x86/platform/efi/efi_stub_64.o: efi_call() is missing frame pointer logic
stackvalidate: arch/x86/realmode/rm/trampoline_64.o: return instruction outside of a function at .text+0x170
stackvalidate: arch/x86/realmode/rm/trampoline_64.o: return instruction outside of a function at .text+0x176
stackvalidate: arch/x86/realmode/rm/reboot.o: return instruction outside of a function at .text+0x2a
stackvalidate: arch/x86/realmode/rm/copy.o: copy_from_fs() is missing frame pointer logic
stackvalidate: arch/x86/realmode/rm/copy.o: copy_to_fs() is missing frame pointer logic
stackvalidate: arch/x86/power/hibernate_asm_64.o: return instruction outside of a function at .text+0x69
stackvalidate: arch/x86/power/hibernate_asm_64.o: return instruction outside of a function at .text+0x16d
stackvalidate: arch/x86/lib/copy_user_64.o: unsupported jump to outside of the function at _copy_to_user+0x25
stackvalidate: arch/x86/lib/copy_user_64.o: unsupported jump to outside of the function at _copy_from_user+0x25
stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_1+0x14
stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_2+0x4
stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_4+0x4
stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_8+0x4
stackvalidate: arch/x86/lib/getuser.o: return instruction outside of a function at .text+0xc5
stackvalidate: arch/x86/lib/memmove_64.o: return instruction outside of a function at .altinstr_replacement+0x5
stackvalidate: arch/x86/lib/putuser.o: unsupported jump to outside of the function at __put_user_1+0x14
stackvalidate: arch/x86/lib/putuser.o: unsupported jump to outside of the function at __put_user_2+0x1b
stackvalidate: arch/x86/lib/putuser.o: unsupported jump to outside of the function at __put_user_4+0x1b
stackvalidate: arch/x86/lib/putuser.o: unsupported jump to outside of the function at __put_user_8+0x1b
stackvalidate: arch/x86/lib/putuser.o: return instruction outside of a function at .text+0xc1
stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_down_read_failed() is missing frame pointer logic
stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_down_write_failed() is missing frame pointer logic
stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_wake() is missing frame pointer logic
stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_downgrade_wake() is missing frame pointer logic
stackvalidate: arch/x86/boot/copy.o: copy_from_fs() is missing frame pointer logic
stackvalidate: arch/x86/boot/copy.o: copy_to_fs() is missing frame pointer logic
stackvalidate: arch/x86/boot/compressed/head_64.o: return instruction outside of a function at .text+0x16e
stackvalidate: arch/x86/boot/compressed/head_64.o: return instruction outside of a function at .text+0x172
stackvalidate: arch/x86/boot/compressed/head_64.o: startup_32() is missing frame pointer logic
stackvalidate: arch/x86/boot/pmjump.o: unsupported jump to outside of the function at in_pm32+0x1c
Note that only 13 of the 47 warnings are actually due to missing frame
pointer logic. The rest are ambiguous conditions which prevent
stackvalidate from being able to make sense of things: returning from
outside of a proper ELF function, or jumping from inside of a function
to outside of its scope.
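The breakdown can be checked mechanically by bucketing the warning lines by message type. The sketch below is an illustration only, not part of stackvalidate: the category labels and function names are hypothetical, and the sample holds just five lines copied from the listing above rather than all 47.

```python
import re
from collections import Counter

# A few lines copied from the defconfig listing; the full listing has 47.
SAMPLE = """\
stackvalidate: arch/x86/ia32/ia32entry.o: ia32_sysenter_target() is missing frame pointer logic
stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x359
stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_1+0x14
stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_wake() is missing frame pointer logic
stackvalidate: arch/x86/boot/pmjump.o: unsupported jump to outside of the function at in_pm32+0x1c
"""

# Category labels are hypothetical names chosen for this sketch.
PATTERNS = [
    ("missing_frame_pointer", re.compile(r"is missing frame pointer logic$")),
    ("return_outside_function", re.compile(r"return instruction outside of a function")),
    ("jump_outside_function", re.compile(r"unsupported jump to outside of the function")),
]


def classify(line: str) -> str:
    """Return the category label of one warning line."""
    for label, pattern in PATTERNS:
        if pattern.search(line):
            return label
    return "other"


def tally(text: str) -> Counter:
    """Count warnings per category, ignoring non-stackvalidate lines."""
    return Counter(classify(line) for line in text.splitlines()
                   if line.startswith("stackvalidate:"))


counts = tally(SAMPLE)
for label, n in counts.most_common():
    print(f"{label}: {n}")
```

Fed the full defconfig listing instead of the sample, the missing_frame_pointer bucket should come out at the 13 mentioned above.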
Similarly, in the Fedora config case, only 27 of the 83 warnings are for
missing frame pointer logic.
If there are no objections, I'll go with this approach in the next
version of the patch set.
Thanks!
--
Josh
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-21 20:54 ` Josh Poimboeuf
@ 2015-05-21 21:53 ` Andy Lutomirski
2015-05-22 14:53 ` Josh Poimboeuf
2015-05-21 22:01 ` Borislav Petkov
1 sibling, 1 reply; 400+ messages in thread
From: Andy Lutomirski @ 2015-05-21 21:53 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Ingo Molnar, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Michal Marek, Peter Zijlstra, X86 ML, live-patching,
linux-kernel, Linus Torvalds, Andy Lutomirski, Denys Vlasenko,
Brian Gerst, Peter Zijlstra, Borislav Petkov, Andrew Morton
On Thu, May 21, 2015 at 1:54 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Wed, May 20, 2015 at 04:48:10PM +0200, Ingo Molnar wrote:
>> Yeah, so many of these seem to be 'leaf only' functions: functions
>> that don't ever call functions themselves.
>>
>> So let's assume we always have CONFIG_FRAME_POINTERS=y.
>>
>> If they don't set up a frame pointer then they in essence won't show
>> up in the call chain - but normally they wouldn't because they call
>> nothing.
>>
>> If they trigger an exception/fault or if they get hit by an interrupt
>> then I think we'll still correctly walk the stack - just those
>> functions might be missing from the deterministic call chain, right?
>> (it will still show up as a '?' entry.)
>>
>> If they crash then we'll see them because the crashing RIP will be
>> printed.
>>
>> So I'm wondering what the x86 policy here should be: to create frame
>> pointers in them or not. Cc:-ed a few more gents for thoughts.
>
> After removing the frame pointer checks for leaf functions, and adding a
> check for all functions which jump outside of their scope, the number of
> defconfig warnings dropped from 89 -> 47. The Fedora config warning
> count dropped from 207 -> 83.
>
> Here are the remaining 47 warnings for defconfig:
>
> stackvalidate: arch/x86/ia32/ia32entry.o: ia32_sysenter_target() is missing frame pointer logic
> stackvalidate: arch/x86/ia32/ia32entry.o: return instruction outside of a function at .entry.text+0x52e
> stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x359
> stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x19be
> stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x19e5
> stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x1c21
> stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x1ceb
> stackvalidate: arch/x86/kernel/acpi/wakeup_64.o: unsupported jump to outside of the function at wakeup_long64+0x15
> stackvalidate: arch/x86/kernel/acpi/wakeup_64.o: do_suspend_lowlevel() is missing frame pointer logic
> stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x6b
> stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0xc7
> stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x110
> stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x145
> stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x1c4
> stackvalidate: arch/x86/kernel/head_64.o: return instruction outside of a function at .head.text+0x1a2
> stackvalidate: arch/x86/kernel/head_64.o: early_idt_handler() is missing frame pointer logic
> stackvalidate: arch/x86/platform/efi/efi_stub_64.o: efi_call() is missing frame pointer logic
> stackvalidate: arch/x86/realmode/rm/trampoline_64.o: return instruction outside of a function at .text+0x170
> stackvalidate: arch/x86/realmode/rm/trampoline_64.o: return instruction outside of a function at .text+0x176
> stackvalidate: arch/x86/realmode/rm/reboot.o: return instruction outside of a function at .text+0x2a
> stackvalidate: arch/x86/realmode/rm/copy.o: copy_from_fs() is missing frame pointer logic
> stackvalidate: arch/x86/realmode/rm/copy.o: copy_to_fs() is missing frame pointer logic
> stackvalidate: arch/x86/power/hibernate_asm_64.o: return instruction outside of a function at .text+0x69
> stackvalidate: arch/x86/power/hibernate_asm_64.o: return instruction outside of a function at .text+0x16d
> stackvalidate: arch/x86/lib/copy_user_64.o: unsupported jump to outside of the function at _copy_to_user+0x25
> stackvalidate: arch/x86/lib/copy_user_64.o: unsupported jump to outside of the function at _copy_from_user+0x25
> stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_1+0x14
> stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_2+0x4
> stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_4+0x4
> stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_8+0x4
> stackvalidate: arch/x86/lib/getuser.o: return instruction outside of a function at .text+0xc5
> stackvalidate: arch/x86/lib/memmove_64.o: return instruction outside of a function at .altinstr_replacement+0x5
> stackvalidate: arch/x86/lib/putuser.o: unsupported jump to outside of the function at __put_user_1+0x14
> stackvalidate: arch/x86/lib/putuser.o: unsupported jump to outside of the function at __put_user_2+0x1b
> stackvalidate: arch/x86/lib/putuser.o: unsupported jump to outside of the function at __put_user_4+0x1b
> stackvalidate: arch/x86/lib/putuser.o: unsupported jump to outside of the function at __put_user_8+0x1b
> stackvalidate: arch/x86/lib/putuser.o: return instruction outside of a function at .text+0xc1
> stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_down_read_failed() is missing frame pointer logic
> stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_down_write_failed() is missing frame pointer logic
> stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_wake() is missing frame pointer logic
> stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_downgrade_wake() is missing frame pointer logic
> stackvalidate: arch/x86/boot/copy.o: copy_from_fs() is missing frame pointer logic
> stackvalidate: arch/x86/boot/copy.o: copy_to_fs() is missing frame pointer logic
> stackvalidate: arch/x86/boot/compressed/head_64.o: return instruction outside of a function at .text+0x16e
> stackvalidate: arch/x86/boot/compressed/head_64.o: return instruction outside of a function at .text+0x172
> stackvalidate: arch/x86/boot/compressed/head_64.o: startup_32() is missing frame pointer logic
> stackvalidate: arch/x86/boot/pmjump.o: unsupported jump to outside of the function at in_pm32+0x1c
>
> Note that only 13 of the 47 warnings are actually due to missing frame
> pointer logic. The rest are ambiguous conditions which prevent
> stackvalidate from being able to make sense of things: returning from
> outside of a proper ELF function, or jumping from inside of a function
> to outside of its scope.
>
> Similarly, in the Fedora config case, only 27 of the 83 warnings are for
> missing frame pointer logic.
>
> If there are no objections, I'll go with this approach in the next
> version of the patch set.
I'm willing to review anything with "entry" in its filename.
--Andy
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-21 21:53 ` Andy Lutomirski
@ 2015-05-22 14:53 ` Josh Poimboeuf
0 siblings, 0 replies; 400+ messages in thread
From: Josh Poimboeuf @ 2015-05-22 14:53 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Ingo Molnar, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Michal Marek, Peter Zijlstra, X86 ML, live-patching,
linux-kernel, Linus Torvalds, Andy Lutomirski, Denys Vlasenko,
Brian Gerst, Peter Zijlstra, Borislav Petkov, Andrew Morton
On Thu, May 21, 2015 at 02:53:07PM -0700, Andy Lutomirski wrote:
> On Thu, May 21, 2015 at 1:54 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > After removing the frame pointer checks for leaf functions, and adding a
> > check for all functions which jump outside of their scope, the number of
> > defconfig warnings dropped from 89 -> 47. The Fedora config warning
> > count dropped from 207 -> 83.
> >
> > Here are the remaining 47 warnings for defconfig:
> >
> > stackvalidate: arch/x86/ia32/ia32entry.o: ia32_sysenter_target() is missing frame pointer logic
> > stackvalidate: arch/x86/ia32/ia32entry.o: return instruction outside of a function at .entry.text+0x52e
> > stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x359
> > stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x19be
> > stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x19e5
> > stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x1c21
> > stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x1ceb
> > stackvalidate: arch/x86/kernel/acpi/wakeup_64.o: unsupported jump to outside of the function at wakeup_long64+0x15
> > stackvalidate: arch/x86/kernel/acpi/wakeup_64.o: do_suspend_lowlevel() is missing frame pointer logic
> > stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x6b
> > stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0xc7
> > stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x110
> > stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x145
> > stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x1c4
> > stackvalidate: arch/x86/kernel/head_64.o: return instruction outside of a function at .head.text+0x1a2
> > stackvalidate: arch/x86/kernel/head_64.o: early_idt_handler() is missing frame pointer logic
> > stackvalidate: arch/x86/platform/efi/efi_stub_64.o: efi_call() is missing frame pointer logic
> > stackvalidate: arch/x86/realmode/rm/trampoline_64.o: return instruction outside of a function at .text+0x170
> > stackvalidate: arch/x86/realmode/rm/trampoline_64.o: return instruction outside of a function at .text+0x176
> > stackvalidate: arch/x86/realmode/rm/reboot.o: return instruction outside of a function at .text+0x2a
> > stackvalidate: arch/x86/realmode/rm/copy.o: copy_from_fs() is missing frame pointer logic
> > stackvalidate: arch/x86/realmode/rm/copy.o: copy_to_fs() is missing frame pointer logic
> > stackvalidate: arch/x86/power/hibernate_asm_64.o: return instruction outside of a function at .text+0x69
> > stackvalidate: arch/x86/power/hibernate_asm_64.o: return instruction outside of a function at .text+0x16d
> > stackvalidate: arch/x86/lib/copy_user_64.o: unsupported jump to outside of the function at _copy_to_user+0x25
> > stackvalidate: arch/x86/lib/copy_user_64.o: unsupported jump to outside of the function at _copy_from_user+0x25
> > stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_1+0x14
> > stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_2+0x4
> > stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_4+0x4
> > stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_8+0x4
> > stackvalidate: arch/x86/lib/getuser.o: return instruction outside of a function at .text+0xc5
> > stackvalidate: arch/x86/lib/memmove_64.o: return instruction outside of a function at .altinstr_replacement+0x5
> > stackvalidate: arch/x86/lib/putuser.o: unsupported jump to outside of the function at __put_user_1+0x14
> > stackvalidate: arch/x86/lib/putuser.o: unsupported jump to outside of the function at __put_user_2+0x1b
> > stackvalidate: arch/x86/lib/putuser.o: unsupported jump to outside of the function at __put_user_4+0x1b
> > stackvalidate: arch/x86/lib/putuser.o: unsupported jump to outside of the function at __put_user_8+0x1b
> > stackvalidate: arch/x86/lib/putuser.o: return instruction outside of a function at .text+0xc1
> > stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_down_read_failed() is missing frame pointer logic
> > stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_down_write_failed() is missing frame pointer logic
> > stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_wake() is missing frame pointer logic
> > stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_downgrade_wake() is missing frame pointer logic
> > stackvalidate: arch/x86/boot/copy.o: copy_from_fs() is missing frame pointer logic
> > stackvalidate: arch/x86/boot/copy.o: copy_to_fs() is missing frame pointer logic
> > stackvalidate: arch/x86/boot/compressed/head_64.o: return instruction outside of a function at .text+0x16e
> > stackvalidate: arch/x86/boot/compressed/head_64.o: return instruction outside of a function at .text+0x172
> > stackvalidate: arch/x86/boot/compressed/head_64.o: startup_32() is missing frame pointer logic
> > stackvalidate: arch/x86/boot/pmjump.o: unsupported jump to outside of the function at in_pm32+0x1c
> >
> > Note that only 13 of the 47 warnings are actually due to missing frame
> > pointer logic. The rest are ambiguous conditions which prevent
> > stackvalidate from being able to make sense of things: returning from
> > outside of a proper ELF function, or jumping from inside of a function
> > to outside of its scope.
> >
> > Similarly, in the Fedora config case, only 27 of the 83 warnings are for
> > missing frame pointer logic.
> >
> > If there are no objections, I'll go with this approach in the next
> > version of the patch set.
>
> I'm willing to review anything with "entry" in its filename.
Thanks. I think the "entry" warnings may be false positives, since that
code isn't called by any C kernel code. (Now that I'm ignoring leaf
functions, the ratio of false positives to true positives has gone up.)
The false positives for "return instruction outside of a function" can
be marked with the RET_NOVALIDATE macro to tell stackvalidate to ignore
the return instruction, or with FILE_NOVALIDATE to tell it to ignore the
entire file.
I can add some patches to fix the warnings. I'll put you on CC for the
"entry" changes.
--
Josh
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-21 20:54 ` Josh Poimboeuf
2015-05-21 21:53 ` Andy Lutomirski
@ 2015-05-21 22:01 ` Borislav Petkov
2015-05-22 14:32 ` Josh Poimboeuf
1 sibling, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-05-21 22:01 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Ingo Molnar, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Michal Marek, Peter Zijlstra, x86, live-patching, linux-kernel,
Linus Torvalds, Andy Lutomirski, Denys Vlasenko, Brian Gerst,
Peter Zijlstra, Andrew Morton
On Thu, May 21, 2015 at 03:54:25PM -0500, Josh Poimboeuf wrote:
> stackvalidate: arch/x86/lib/memmove_64.o: return instruction outside of a function at .altinstr_replacement+0x5
That must be something like this:
0000000000000000 <.altinstr_replacement>:
0: 48 89 d1 mov %rdx,%rcx
3: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
5: c3 retq
right?
In any case, anything with alternatives is probably a false positive
because even if instructions appear outside of the containing function,
they get patched in and are actually inside. Jump offsets get fixed up
properly too. Should, at least :-)
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-21 22:01 ` Borislav Petkov
@ 2015-05-22 14:32 ` Josh Poimboeuf
2015-05-22 21:18 ` Jiri Kosina
2015-05-23 8:37 ` Borislav Petkov
0 siblings, 2 replies; 400+ messages in thread
From: Josh Poimboeuf @ 2015-05-22 14:32 UTC (permalink / raw)
To: Borislav Petkov
Cc: Ingo Molnar, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Michal Marek, Peter Zijlstra, x86, live-patching, linux-kernel,
Linus Torvalds, Andy Lutomirski, Denys Vlasenko, Brian Gerst,
Peter Zijlstra, Andrew Morton
On Fri, May 22, 2015 at 12:01:58AM +0200, Borislav Petkov wrote:
> On Thu, May 21, 2015 at 03:54:25PM -0500, Josh Poimboeuf wrote:
> > stackvalidate: arch/x86/lib/memmove_64.o: return instruction outside of a function at .altinstr_replacement+0x5
>
> That must be something like this:
>
> 0000000000000000 <.altinstr_replacement>:
> 0: 48 89 d1 mov %rdx,%rcx
> 3: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
> 5: c3 retq
>
> right?
>
> In any case, anything with alternatives is probably a false positive
> because even if instructions appear outside of the containing function,
> they get patched in and are actually inside. Jump offsets get fixed up
> properly too. Should, at least :-)
Hm, alternatives do complicate things a bit. It *is* a false positive,
but not necessarily because it's part of an alternative instruction
block.
The above code would be patched into memmove(), which is a leaf function
because it doesn't call any other functions. Leaf functions don't need
frame pointer logic, so we can ignore them.
If instead the above code were patched into a non-leaf function, we'd
have to change it to restore the frame pointer before returning.
--
Josh
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-22 14:32 ` Josh Poimboeuf
@ 2015-05-22 21:18 ` Jiri Kosina
2015-05-22 22:22 ` Josh Poimboeuf
2015-05-23 8:37 ` Borislav Petkov
1 sibling, 1 reply; 400+ messages in thread
From: Jiri Kosina @ 2015-05-22 21:18 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Borislav Petkov, Ingo Molnar, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Michal Marek, Peter Zijlstra, x86, live-patching,
linux-kernel, Linus Torvalds, Andy Lutomirski, Denys Vlasenko,
Brian Gerst, Peter Zijlstra, Andrew Morton
On Fri, 22 May 2015, Josh Poimboeuf wrote:
> Hm, alternatives do complicate things a bit. It *is* a false positive,
> but not necessarily because it's part of an alternative instruction
> block.
>
> The above code would be patched into memmove(), which is a leaf function
> because it doesn't call any other functions. Leaf functions don't need
> frame pointer logic, so we can ignore them.
>
> If instead the above code were patched into a non-leaf function, we'd
> have to change it to restore the frame pointer before returning.
Is this really only a problem of alternatives? How about
dynamically-enabled tracepoints?
--
Jiri Kosina
SUSE Labs
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-22 21:18 ` Jiri Kosina
@ 2015-05-22 22:22 ` Josh Poimboeuf
0 siblings, 0 replies; 400+ messages in thread
From: Josh Poimboeuf @ 2015-05-22 22:22 UTC (permalink / raw)
To: Jiri Kosina
Cc: Borislav Petkov, Ingo Molnar, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Michal Marek, Peter Zijlstra, x86, live-patching,
linux-kernel, Linus Torvalds, Andy Lutomirski, Denys Vlasenko,
Brian Gerst, Peter Zijlstra, Andrew Morton
On Fri, May 22, 2015 at 11:18:57PM +0200, Jiri Kosina wrote:
> On Fri, 22 May 2015, Josh Poimboeuf wrote:
>
> > Hm, alternatives do complicate things a bit. It *is* a false positive,
> > but not necessarily because it's part of an alternative instruction
> > block.
> >
> > The above code would be patched into memmove(), which is a leaf function
> > because it doesn't call any other functions. Leaf functions don't need
> > frame pointer logic, so we can ignore them.
> >
> > If instead the above code were patched into a non-leaf function, we'd
> > have to change it to restore the frame pointer before returning.
>
> Is this really only a problem of alternatives? How about
> dynamically-enabled tracepoints?
I think tracepoints are only in C code, right? stackvalidate only
analyzes asm code, so it's not a concern for this patch set.
And I think tracepoints rely on normal call instructions, so they
shouldn't cause any problems with frame pointers as far as I can tell.
--
Josh
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
2015-05-22 14:32 ` Josh Poimboeuf
2015-05-22 21:18 ` Jiri Kosina
@ 2015-05-23 8:37 ` Borislav Petkov
1 sibling, 0 replies; 400+ messages in thread
From: Borislav Petkov @ 2015-05-23 8:37 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Ingo Molnar, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Michal Marek, Peter Zijlstra, x86, live-patching, linux-kernel,
Linus Torvalds, Andy Lutomirski, Denys Vlasenko, Brian Gerst,
Peter Zijlstra, Andrew Morton
On Fri, May 22, 2015 at 09:32:12AM -0500, Josh Poimboeuf wrote:
> If instead the above code were patched into a non-leaf function, we'd
> have to change it to restore the frame pointer before returning.
Not a problem, I think. One'll need to add the FP restoring before the
retq.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
^ permalink raw reply [flat|nested] 400+ messages in thread
* [PATCH 01/18] x86/kconfig: Simplify conditions for HAVE_ARCH_HUGE_VMAP
2015-05-26 8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
@ 2015-05-26 8:28 ` Borislav Petkov
2015-05-27 14:17 ` [tip:x86/mm] x86/mm/kconfig: " tip-bot for Toshi Kani
2015-05-26 8:28 ` [PATCH 02/18] x86/mtrr: Fix MTRR lookup to handle an inclusive entry Borislav Petkov
` (16 subsequent siblings)
17 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-05-26 8:28 UTC (permalink / raw)
To: Ingo Molnar; +Cc: X86-ML, LKML
From: Toshi Kani <toshi.kani@hp.com>
Simplify the conditions selecting HAVE_ARCH_HUGE_VMAP since X86_PAE
depends on X86_32 already.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: Elliott@hp.com
Cc: pebolle@tiscali.nl
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-mm <linux-mm@kvack.org>
Cc: x86-ml <x86@kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>
Link: http://lkml.kernel.org/r/1431714237-880-2-git-send-email-toshi.kani@hp.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
arch/x86/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 226d5696e1d1..4eb0b0ffae85 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -100,7 +100,7 @@ config X86
select IRQ_FORCED_THREADING
select HAVE_BPF_JIT if X86_64
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
- select HAVE_ARCH_HUGE_VMAP if X86_64 || (X86_32 && X86_PAE)
+ select HAVE_ARCH_HUGE_VMAP if X86_64 || X86_PAE
select ARCH_HAS_SG_CHAIN
select CLKEVT_I8253
select ARCH_HAVE_NMI_SAFE_CMPXCHG
--
1.9.0.258.g00eda23
^ permalink raw reply related [flat|nested] 400+ messages in thread
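The simplification in the patch above relies on one fact stated in its changelog: X86_PAE depends on X86_32. A minimal sketch in C of why the two conditions then select HAVE_ARCH_HUGE_VMAP identically follows. The function names old_cond(), new_cond() and conditions_equivalent() are hypothetical and exist only for this illustration; Kconfig itself does not evaluate anything this way.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition before the patch: X86_64 || (X86_32 && X86_PAE) */
static bool old_cond(bool x86_64, bool x86_32, bool pae)
{
	return x86_64 || (x86_32 && pae);
}

/* Condition after the patch: X86_64 || X86_PAE */
static bool new_cond(bool x86_64, bool pae)
{
	return x86_64 || pae;
}

/*
 * Enumerate every combination of the three symbols and skip the ones
 * Kconfig cannot produce (X86_PAE set without X86_32). Returns true
 * if the old and new conditions agree on all remaining combinations.
 */
static bool conditions_equivalent(void)
{
	for (int bits = 0; bits < 8; bits++) {
		bool x86_64 = (bits & 1) != 0;
		bool x86_32 = (bits & 2) != 0;
		bool pae    = (bits & 4) != 0;

		if (pae && !x86_32)
			continue;	/* excluded: X86_PAE depends on X86_32 */

		if (old_cond(x86_64, x86_32, pae) != new_cond(x86_64, pae))
			return false;
	}
	return true;
}
```

Calling conditions_equivalent() returns true: whenever X86_PAE is set, X86_32 is set as well, so the extra X86_32 term never changes the result.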
* [tip:x86/mm] x86/mm/kconfig: Simplify conditions for HAVE_ARCH_HUGE_VMAP
2015-05-26 8:28 ` [PATCH 01/18] x86/kconfig: Simplify conditions for HAVE_ARCH_HUGE_VMAP Borislav Petkov
@ 2015-05-27 14:17 ` tip-bot for Toshi Kani
0 siblings, 0 replies; 400+ messages in thread
From: tip-bot for Toshi Kani @ 2015-05-27 14:17 UTC (permalink / raw)
To: linux-tip-commits
Cc: dvlasenk, torvalds, hpa, toshi.kani, bp, akpm, tglx, linux-mm,
linux-kernel, bp, mingo, peterz, mcgrof, brgerst, luto
Commit-ID: 10455f64aff0d715dcdfb09b02393df168fe267e
Gitweb: http://git.kernel.org/tip/10455f64aff0d715dcdfb09b02393df168fe267e
Author: Toshi Kani <toshi.kani@hp.com>
AuthorDate: Tue, 26 May 2015 10:28:04 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:40:55 +0200
x86/mm/kconfig: Simplify conditions for HAVE_ARCH_HUGE_VMAP
Simplify the conditions selecting HAVE_ARCH_HUGE_VMAP since
X86_PAE depends on X86_32 already.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-2-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 226d569..4eb0b0f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -100,7 +100,7 @@ config X86
select IRQ_FORCED_THREADING
select HAVE_BPF_JIT if X86_64
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
- select HAVE_ARCH_HUGE_VMAP if X86_64 || (X86_32 && X86_PAE)
+ select HAVE_ARCH_HUGE_VMAP if X86_64 || X86_PAE
select ARCH_HAS_SG_CHAIN
select CLKEVT_I8253
select ARCH_HAVE_NMI_SAFE_CMPXCHG
^ permalink raw reply related [flat|nested] 400+ messages in thread
* [PATCH 02/18] x86/mtrr: Fix MTRR lookup to handle an inclusive entry
2015-05-26 8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
2015-05-26 8:28 ` [PATCH 01/18] x86/kconfig: Simplify conditions for HAVE_ARCH_HUGE_VMAP Borislav Petkov
@ 2015-05-26 8:28 ` Borislav Petkov
2015-05-27 14:18 ` [tip:x86/mm] x86/mm/mtrr: " tip-bot for Toshi Kani
2015-05-26 8:28 ` [PATCH 03/18] x86/mtrr: Fix MTRR state checks in mtrr_type_lookup() Borislav Petkov
` (15 subsequent siblings)
17 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-05-26 8:28 UTC (permalink / raw)
To: Ingo Molnar; +Cc: X86-ML, LKML
From: Toshi Kani <toshi.kani@hp.com>
When an MTRR entry is inclusive to a requested range, i.e. the
start and end of the request are not within the MTRR entry range
but the range contains the MTRR entry entirely:
range_start ... [mtrr_start ... mtrr_end] ... range_end
__mtrr_type_lookup() ignores such a case because both start_state
and end_state are set to zero.
This bug can cause the following issues:
1) reserve_memtype() tracks an effective memory type in case
a request type is WB (e.g. /dev/mem blindly uses WB). Failing
to track the effective type causes a subsequent request
to map the same range with the effective type to fail.
2) pud_set_huge() and pmd_set_huge() check if a requested range
has any overlap with MTRRs. Failing to detect an overlap may
cause a performance penalty or undefined behavior.
This patch fixes the bug by adding a new flag, 'inclusive',
to detect the inclusive case. This case is then handled in
the same way as end_state:1 since the first region is the same.
With this fix, __mtrr_type_lookup() handles the inclusive case
properly.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: Elliott@hp.com
Cc: pebolle@tiscali.nl
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-mm <linux-mm@kvack.org>
Cc: x86-ml <x86@kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>
Link: http://lkml.kernel.org/r/1431714237-880-3-git-send-email-toshi.kani@hp.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
arch/x86/kernel/cpu/mtrr/generic.c | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 5b239679cfc9..e202d26f64a2 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -154,7 +154,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
prev_match = 0xFF;
for (i = 0; i < num_var_ranges; ++i) {
- unsigned short start_state, end_state;
+ unsigned short start_state, end_state, inclusive;
if (!(mtrr_state.var_ranges[i].mask_lo & (1 << 11)))
continue;
@@ -166,19 +166,27 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
start_state = ((start & mask) == (base & mask));
end_state = ((end & mask) == (base & mask));
+ inclusive = ((start < base) && (end > base));
- if (start_state != end_state) {
+ if ((start_state != end_state) || inclusive) {
/*
* We have start:end spanning across an MTRR.
- * We split the region into
- * either
- * (start:mtrr_end) (mtrr_end:end)
- * or
- * (start:mtrr_start) (mtrr_start:end)
+ * We split the region into either
+ *
+ * - start_state:1
+ * (start:mtrr_end)(mtrr_end:end)
+ * - end_state:1
+ * (start:mtrr_start)(mtrr_start:end)
+ * - inclusive:1
+ * (start:mtrr_start)(mtrr_start:mtrr_end)(mtrr_end:end)
+ *
* depending on kind of overlap.
- * Return the type for first region and a pointer to
- * the start of second region so that caller will
- * lookup again on the second region.
+ *
+ * Return the type of the first region and a pointer
+ * to the start of next region so that caller will be
+ * advised to lookup again after having adjusted start
+ * and end.
+ *
* Note: This way we handle multiple overlaps as well.
*/
if (start_state)
--
1.9.0.258.g00eda23
^ permalink raw reply related [flat|nested] 400+ messages in thread
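The inclusive case described in the patch above can be reproduced outside the kernel. What follows is a minimal sketch in C, assuming a single variable MTRR given directly as a byte-granular base and mask; the real __mtrr_type_lookup() assembles both from the base_hi/base_lo and mask_hi/mask_lo register halves. struct overlap, classify() and needs_split() are hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>

/* The three flags computed per variable MTRR in the patched loop. */
struct overlap {
	unsigned short start_state, end_state, inclusive;
};

/*
 * Classify how the request [start, end) relates to one MTRR.
 * Assumption: base and mask are plain byte addresses/masks.
 */
static struct overlap classify(uint64_t start, uint64_t end,
			       uint64_t base, uint64_t mask)
{
	struct overlap o;

	end--;	/* make end inclusive, as the kernel function does */

	o.start_state = ((start & mask) == (base & mask));
	o.end_state   = ((end & mask) == (base & mask));
	o.inclusive   = ((start < base) && (end > base));
	return o;
}

/* Post-patch condition for splitting the request and looking up again. */
static int needs_split(struct overlap o)
{
	return (o.start_state != o.end_state) || o.inclusive;
}
```

For a 1 MiB MTRR at base 0x100000 (mask ~0xFFFFF) and a request of 0x0 to 0x400000, classify() yields start_state = 0, end_state = 0 and inclusive = 1. The pre-patch test start_state != end_state is false for that request, so the MTRR was skipped; with the inclusive flag, needs_split() is true.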
* [tip:x86/mm] x86/mm/mtrr: Fix MTRR lookup to handle an inclusive entry
2015-05-26 8:28 ` [PATCH 02/18] x86/mtrr: Fix MTRR lookup to handle an inclusive entry Borislav Petkov
@ 2015-05-27 14:18 ` tip-bot for Toshi Kani
0 siblings, 0 replies; 400+ messages in thread
From: tip-bot for Toshi Kani @ 2015-05-27 14:18 UTC (permalink / raw)
To: linux-tip-commits
Cc: mingo, tglx, dvlasenk, peterz, bp, luto, torvalds, toshi.kani,
akpm, mcgrof, hpa, brgerst, linux-mm, bp, linux-kernel
Commit-ID: 7f0431e3dc8953f41e9433581c1fdd7ee45860b0
Gitweb: http://git.kernel.org/tip/7f0431e3dc8953f41e9433581c1fdd7ee45860b0
Author: Toshi Kani <toshi.kani@hp.com>
AuthorDate: Tue, 26 May 2015 10:28:05 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:40:56 +0200
x86/mm/mtrr: Fix MTRR lookup to handle an inclusive entry
When an MTRR entry is inclusive to a requested range, i.e. the
start and end of the request are not within the MTRR entry range
but the range contains the MTRR entry entirely:
range_start ... [mtrr_start ... mtrr_end] ... range_end
__mtrr_type_lookup() ignores such a case because both
start_state and end_state are set to zero.
This bug can cause the following issues:
1) reserve_memtype() tracks an effective memory type in case
a request type is WB (e.g. /dev/mem blindly uses WB). Failing
to track the effective type causes a subsequent request
to map the same range with the effective type to fail.
2) pud_set_huge() and pmd_set_huge() check if a requested range
has any overlap with MTRRs. Failing to detect an overlap may
cause a performance penalty or undefined behavior.
This patch fixes the bug by adding a new flag, 'inclusive',
to detect the inclusive case. This case is then handled in
the same way as end_state:1 since the first region is the same.
With this fix, __mtrr_type_lookup() handles the inclusive case
properly.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-3-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/kernel/cpu/mtrr/generic.c | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 5b23967..e202d26 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -154,7 +154,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
prev_match = 0xFF;
for (i = 0; i < num_var_ranges; ++i) {
- unsigned short start_state, end_state;
+ unsigned short start_state, end_state, inclusive;
if (!(mtrr_state.var_ranges[i].mask_lo & (1 << 11)))
continue;
@@ -166,19 +166,27 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
start_state = ((start & mask) == (base & mask));
end_state = ((end & mask) == (base & mask));
+ inclusive = ((start < base) && (end > base));
- if (start_state != end_state) {
+ if ((start_state != end_state) || inclusive) {
/*
* We have start:end spanning across an MTRR.
- * We split the region into
- * either
- * (start:mtrr_end) (mtrr_end:end)
- * or
- * (start:mtrr_start) (mtrr_start:end)
+ * We split the region into either
+ *
+ * - start_state:1
+ * (start:mtrr_end)(mtrr_end:end)
+ * - end_state:1
+ * (start:mtrr_start)(mtrr_start:end)
+ * - inclusive:1
+ * (start:mtrr_start)(mtrr_start:mtrr_end)(mtrr_end:end)
+ *
* depending on kind of overlap.
- * Return the type for first region and a pointer to
- * the start of second region so that caller will
- * lookup again on the second region.
+ *
+ * Return the type of the first region and a pointer
+ * to the start of next region so that caller will be
+ * advised to lookup again after having adjusted start
+ * and end.
+ *
* Note: This way we handle multiple overlaps as well.
*/
if (start_state)
^ permalink raw reply related [flat|nested] 400+ messages in thread
* [PATCH 03/18] x86/mtrr: Fix MTRR state checks in mtrr_type_lookup()
2015-05-26 8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
2015-05-26 8:28 ` [PATCH 01/18] x86/kconfig: Simplify conditions for HAVE_ARCH_HUGE_VMAP Borislav Petkov
2015-05-26 8:28 ` [PATCH 02/18] x86/mtrr: Fix MTRR lookup to handle an inclusive entry Borislav Petkov
@ 2015-05-26 8:28 ` Borislav Petkov
2015-05-27 14:18 ` [tip:x86/mm] x86/mm/mtrr: " tip-bot for Toshi Kani
2015-05-26 8:28 ` [PATCH 04/18] x86/mtrr: Use symbolic define as a retval for disabled MTRRs Borislav Petkov
` (14 subsequent siblings)
17 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-05-26 8:28 UTC (permalink / raw)
To: Ingo Molnar; +Cc: X86-ML, LKML
From: Toshi Kani <toshi.kani@hp.com>
'mtrr_state.enabled' contains the FE (fixed MTRRs enabled)
and E (MTRRs enabled) flags in MSR_MTRRdefType. Intel SDM,
section 11.11.2.1, defines these flags as follows:
- All MTRRs are disabled when the E flag is clear.
The FE flag has no effect when the E flag is clear.
- The default type is enabled when the E flag is set.
- MTRR variable ranges are enabled when the E flag is set.
- MTRR fixed ranges are enabled when both E and FE flags
are set.
MTRR state checks in __mtrr_type_lookup() do not match the SDM.
Hence, this patch makes the following changes:
- The current code detects MTRRs disabled when both E and
FE flags are clear in mtrr_state.enabled. Fix to detect
MTRRs disabled when the E flag is clear.
- The current code does not check if the FE bit is set in
mtrr_state.enabled when looking at the fixed entries.
Fix to check the FE flag.
- The current code returns the default type when the E flag
is clear in mtrr_state.enabled. However, the default type
is UC when the E flag is clear. Remove the code as this
case is handled as MTRR disabled with the 1st change.
In addition, this patch defines the E and FE flags in
mtrr_state.enabled as follows.
- FE flag: MTRR_STATE_MTRR_FIXED_ENABLED
- E flag: MTRR_STATE_MTRR_ENABLED
print_mtrr_state() and x86_get_mtrr_mem_range() are also updated
accordingly.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: Elliott@hp.com
Cc: pebolle@tiscali.nl
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-mm <linux-mm@kvack.org>
Cc: x86-ml <x86@kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>
Link: http://lkml.kernel.org/r/1431714237-880-4-git-send-email-toshi.kani@hp.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
arch/x86/include/asm/mtrr.h | 4 ++++
arch/x86/kernel/cpu/mtrr/cleanup.c | 3 ++-
arch/x86/kernel/cpu/mtrr/generic.c | 15 ++++++++-------
3 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index f768f6298419..ef927948657c 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -127,4 +127,8 @@ struct mtrr_gentry32 {
_IOW(MTRR_IOCTL_BASE, 9, struct mtrr_sentry32)
#endif /* CONFIG_COMPAT */
+/* Bit fields for enabled in struct mtrr_state_type */
+#define MTRR_STATE_MTRR_FIXED_ENABLED 0x01
+#define MTRR_STATE_MTRR_ENABLED 0x02
+
#endif /* _ASM_X86_MTRR_H */
diff --git a/arch/x86/kernel/cpu/mtrr/cleanup.c b/arch/x86/kernel/cpu/mtrr/cleanup.c
index 5f90b85ff22e..70d7c93f4550 100644
--- a/arch/x86/kernel/cpu/mtrr/cleanup.c
+++ b/arch/x86/kernel/cpu/mtrr/cleanup.c
@@ -98,7 +98,8 @@ x86_get_mtrr_mem_range(struct range *range, int nr_range,
continue;
base = range_state[i].base_pfn;
if (base < (1<<(20-PAGE_SHIFT)) && mtrr_state.have_fixed &&
- (mtrr_state.enabled & 1)) {
+ (mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED) &&
+ (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
/* Var MTRR contains UC entry below 1M? Skip it: */
printk(BIOS_BUG_MSG, i);
if (base + size <= (1<<(20-PAGE_SHIFT)))
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index e202d26f64a2..b0599dbb899a 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -119,14 +119,16 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
if (!mtrr_state_set)
return 0xFF;
- if (!mtrr_state.enabled)
+ if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
return 0xFF;
/* Make end inclusive end, instead of exclusive */
end--;
/* Look in fixed ranges. Just return the type as per start */
- if (mtrr_state.have_fixed && (start < 0x100000)) {
+ if ((start < 0x100000) &&
+ (mtrr_state.have_fixed) &&
+ (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
int idx;
if (start < 0x80000) {
@@ -149,9 +151,6 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
* Look of multiple ranges matching this address and pick type
* as per MTRR precedence
*/
- if (!(mtrr_state.enabled & 2))
- return mtrr_state.def_type;
-
prev_match = 0xFF;
for (i = 0; i < num_var_ranges; ++i) {
unsigned short start_state, end_state, inclusive;
@@ -355,7 +354,9 @@ static void __init print_mtrr_state(void)
mtrr_attrib_to_str(mtrr_state.def_type));
if (mtrr_state.have_fixed) {
pr_debug("MTRR fixed ranges %sabled:\n",
- mtrr_state.enabled & 1 ? "en" : "dis");
+ ((mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED) &&
+ (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) ?
+ "en" : "dis");
print_fixed(0x00000, 0x10000, mtrr_state.fixed_ranges + 0);
for (i = 0; i < 2; ++i)
print_fixed(0x80000 + i * 0x20000, 0x04000,
@@ -368,7 +369,7 @@ static void __init print_mtrr_state(void)
print_fixed_last();
}
pr_debug("MTRR variable ranges %sabled:\n",
- mtrr_state.enabled & 2 ? "en" : "dis");
+ mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED ? "en" : "dis");
high_width = (__ffs64(size_or_mask) - (32 - PAGE_SHIFT) + 3) / 4;
for (i = 0; i < num_var_ranges; ++i) {
--
1.9.0.258.g00eda23
^ permalink raw reply related [flat|nested] 400+ messages in thread
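The E/FE rules quoted from the SDM in the patch above reduce to two bit tests. Below is a minimal sketch in C that reuses the two flag values the patch adds to asm/mtrr.h. The predicates mtrrs_enabled(), fixed_ranges_enabled() and old_mtrrs_enabled() are hypothetical helpers for illustration, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as defined by the patch for mtrr_state.enabled. */
#define MTRR_STATE_MTRR_FIXED_ENABLED	0x01	/* FE flag */
#define MTRR_STATE_MTRR_ENABLED		0x02	/* E flag */

/* Pre-patch check: MTRRs count as disabled only if E and FE are both clear. */
static int old_mtrrs_enabled(uint8_t enabled)
{
	return enabled != 0;
}

/* Post-patch check: the E flag alone decides whether MTRRs are enabled. */
static int mtrrs_enabled(uint8_t enabled)
{
	return (enabled & MTRR_STATE_MTRR_ENABLED) != 0;
}

/* Fixed ranges apply only when they exist and both E and FE are set. */
static int fixed_ranges_enabled(uint8_t enabled, int have_fixed)
{
	return have_fixed && mtrrs_enabled(enabled) &&
	       (enabled & MTRR_STATE_MTRR_FIXED_ENABLED) != 0;
}
```

The difference shows up when only FE is set (enabled == 0x01): old_mtrrs_enabled() reports the MTRRs as enabled, while mtrrs_enabled() and fixed_ranges_enabled() both report them as disabled, which is what the SDM text requires.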
* [tip:x86/mm] x86/mm/mtrr: Fix MTRR state checks in mtrr_type_lookup()
2015-05-26 8:28 ` [PATCH 03/18] x86/mtrr: Fix MTRR state checks in mtrr_type_lookup() Borislav Petkov
@ 2015-05-27 14:18 ` tip-bot for Toshi Kani
0 siblings, 0 replies; 400+ messages in thread
From: tip-bot for Toshi Kani @ 2015-05-27 14:18 UTC (permalink / raw)
To: linux-tip-commits
Cc: hpa, dvlasenk, linux-kernel, toshi.kani, peterz, bp, mcgrof,
mingo, brgerst, torvalds, luto, akpm, tglx, linux-mm, bp
Commit-ID: 9b3aca620883fc06636737c82a4d024b22182281
Gitweb: http://git.kernel.org/tip/9b3aca620883fc06636737c82a4d024b22182281
Author: Toshi Kani <toshi.kani@hp.com>
AuthorDate: Tue, 26 May 2015 10:28:06 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:40:56 +0200
x86/mm/mtrr: Fix MTRR state checks in mtrr_type_lookup()
'mtrr_state.enabled' contains the FE (fixed MTRRs enabled)
and E (MTRRs enabled) flags in MSR_MTRRdefType. Intel SDM,
section 11.11.2.1, defines these flags as follows:
- All MTRRs are disabled when the E flag is clear.
The FE flag has no effect when the E flag is clear.
- The default type is enabled when the E flag is set.
- MTRR variable ranges are enabled when the E flag is set.
- MTRR fixed ranges are enabled when both E and FE flags
are set.
MTRR state checks in __mtrr_type_lookup() do not match the SDM.
Hence, this patch makes the following changes:
- The current code detects MTRRs disabled when both E and
FE flags are clear in mtrr_state.enabled. Fix to detect
MTRRs disabled when the E flag is clear.
- The current code does not check if the FE bit is set in
mtrr_state.enabled when looking at the fixed entries.
Fix to check the FE flag.
- The current code returns the default type when the E flag
is clear in mtrr_state.enabled. However, the default type
is UC when the E flag is clear. Remove the code as this
case is handled as MTRR disabled with the 1st change.
In addition, this patch defines the E and FE flags in
mtrr_state.enabled as follows.
- FE flag: MTRR_STATE_MTRR_FIXED_ENABLED
- E flag: MTRR_STATE_MTRR_ENABLED
print_mtrr_state() and x86_get_mtrr_mem_range() are also updated
accordingly.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-4-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/include/asm/mtrr.h | 4 ++++
arch/x86/kernel/cpu/mtrr/cleanup.c | 3 ++-
arch/x86/kernel/cpu/mtrr/generic.c | 15 ++++++++-------
3 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index f768f62..ef92794 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -127,4 +127,8 @@ struct mtrr_gentry32 {
_IOW(MTRR_IOCTL_BASE, 9, struct mtrr_sentry32)
#endif /* CONFIG_COMPAT */
+/* Bit fields for enabled in struct mtrr_state_type */
+#define MTRR_STATE_MTRR_FIXED_ENABLED 0x01
+#define MTRR_STATE_MTRR_ENABLED 0x02
+
#endif /* _ASM_X86_MTRR_H */
diff --git a/arch/x86/kernel/cpu/mtrr/cleanup.c b/arch/x86/kernel/cpu/mtrr/cleanup.c
index 5f90b85..70d7c93 100644
--- a/arch/x86/kernel/cpu/mtrr/cleanup.c
+++ b/arch/x86/kernel/cpu/mtrr/cleanup.c
@@ -98,7 +98,8 @@ x86_get_mtrr_mem_range(struct range *range, int nr_range,
continue;
base = range_state[i].base_pfn;
if (base < (1<<(20-PAGE_SHIFT)) && mtrr_state.have_fixed &&
- (mtrr_state.enabled & 1)) {
+ (mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED) &&
+ (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
/* Var MTRR contains UC entry below 1M? Skip it: */
printk(BIOS_BUG_MSG, i);
if (base + size <= (1<<(20-PAGE_SHIFT)))
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index e202d26..b0599db 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -119,14 +119,16 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
if (!mtrr_state_set)
return 0xFF;
- if (!mtrr_state.enabled)
+ if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
return 0xFF;
/* Make end inclusive end, instead of exclusive */
end--;
/* Look in fixed ranges. Just return the type as per start */
- if (mtrr_state.have_fixed && (start < 0x100000)) {
+ if ((start < 0x100000) &&
+ (mtrr_state.have_fixed) &&
+ (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
int idx;
if (start < 0x80000) {
@@ -149,9 +151,6 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
* Look of multiple ranges matching this address and pick type
* as per MTRR precedence
*/
- if (!(mtrr_state.enabled & 2))
- return mtrr_state.def_type;
-
prev_match = 0xFF;
for (i = 0; i < num_var_ranges; ++i) {
unsigned short start_state, end_state, inclusive;
@@ -355,7 +354,9 @@ static void __init print_mtrr_state(void)
mtrr_attrib_to_str(mtrr_state.def_type));
if (mtrr_state.have_fixed) {
pr_debug("MTRR fixed ranges %sabled:\n",
- mtrr_state.enabled & 1 ? "en" : "dis");
+ ((mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED) &&
+ (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) ?
+ "en" : "dis");
print_fixed(0x00000, 0x10000, mtrr_state.fixed_ranges + 0);
for (i = 0; i < 2; ++i)
print_fixed(0x80000 + i * 0x20000, 0x04000,
@@ -368,7 +369,7 @@ static void __init print_mtrr_state(void)
print_fixed_last();
}
pr_debug("MTRR variable ranges %sabled:\n",
- mtrr_state.enabled & 2 ? "en" : "dis");
+ mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED ? "en" : "dis");
high_width = (__ffs64(size_or_mask) - (32 - PAGE_SHIFT) + 3) / 4;
for (i = 0; i < num_var_ranges; ++i) {
* [PATCH 04/18] x86/mtrr: Use symbolic define as a retval for disabled MTRRs
2015-05-26 8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
` (2 preceding siblings ...)
2015-05-26 8:28 ` [PATCH 03/18] x86/mtrr: Fix MTRR state checks in mtrr_type_lookup() Borislav Petkov
@ 2015-05-26 8:28 ` Borislav Petkov
2015-05-27 14:18 ` [tip:x86/mm] x86/mm/mtrr: " tip-bot for Toshi Kani
2015-05-26 8:28 ` [PATCH 05/18] x86/mtrr: Clean up mtrr_type_lookup() Borislav Petkov
` (13 subsequent siblings)
17 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-05-26 8:28 UTC (permalink / raw)
To: Ingo Molnar; +Cc: X86-ML, LKML
From: Toshi Kani <toshi.kani@hp.com>
mtrr_type_lookup() returns the literal value 0xFF when MTRRs are
disabled. This patch defines MTRR_TYPE_INVALID to clarify the meaning of
this value, and documents its usage.
Document the return values of the kernel virtual address mapping helpers
pud_set_huge(), pmd_set_huge(), pud_clear_huge() and pmd_clear_huge().
There is no functional change in this patch.
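For readers who want the caller-side rule in isolation, here is a
minimal sketch of the test that pud_set_huge() and pmd_set_huge() apply
to the lookup result. The constants match the values in
uapi/asm/mtrr.h after this patch; huge_mapping_allowed() is a
hypothetical name used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define MTRR_TYPE_WRBACK	6	/* existing value in uapi/asm/mtrr.h */
#define MTRR_TYPE_INVALID	0xff	/* symbolic name added by this patch */

/*
 * Hypothetical helper mirroring the check in pud_set_huge() and
 * pmd_set_huge(): a huge mapping is acceptable when the range is
 * write-back, or when MTRRs are disabled and cannot override the
 * page-table memory type.
 */
static bool huge_mapping_allowed(uint8_t mtrr_type)
{
	return mtrr_type == MTRR_TYPE_WRBACK ||
	       mtrr_type == MTRR_TYPE_INVALID;
}
```

Any other type, such as UC (0) or WC (1), makes the helpers return 0
without installing the huge mapping.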
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: Elliott@hp.com
Cc: pebolle@tiscali.nl
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-mm <linux-mm@kvack.org>
Cc: x86-ml <x86@kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>
Link: http://lkml.kernel.org/r/1431714237-880-5-git-send-email-toshi.kani@hp.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
arch/x86/include/asm/mtrr.h | 2 +-
arch/x86/include/uapi/asm/mtrr.h | 8 +++++++-
arch/x86/kernel/cpu/mtrr/generic.c | 14 ++++++-------
arch/x86/mm/pgtable.c | 42 +++++++++++++++++++++++++++++---------
4 files changed, 47 insertions(+), 19 deletions(-)
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index ef927948657c..bb03a547c1ab 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -55,7 +55,7 @@ static inline u8 mtrr_type_lookup(u64 addr, u64 end)
/*
* Return no-MTRRs:
*/
- return 0xff;
+ return MTRR_TYPE_INVALID;
}
#define mtrr_save_fixed_ranges(arg) do {} while (0)
#define mtrr_save_state() do {} while (0)
diff --git a/arch/x86/include/uapi/asm/mtrr.h b/arch/x86/include/uapi/asm/mtrr.h
index d0acb658c8f4..7528dcf59691 100644
--- a/arch/x86/include/uapi/asm/mtrr.h
+++ b/arch/x86/include/uapi/asm/mtrr.h
@@ -103,7 +103,7 @@ struct mtrr_state_type {
#define MTRRIOC_GET_PAGE_ENTRY _IOWR(MTRR_IOCTL_BASE, 8, struct mtrr_gentry)
#define MTRRIOC_KILL_PAGE_ENTRY _IOW(MTRR_IOCTL_BASE, 9, struct mtrr_sentry)
-/* These are the region types */
+/* MTRR memory types, which are defined in SDM */
#define MTRR_TYPE_UNCACHABLE 0
#define MTRR_TYPE_WRCOMB 1
/*#define MTRR_TYPE_ 2*/
@@ -113,5 +113,11 @@ struct mtrr_state_type {
#define MTRR_TYPE_WRBACK 6
#define MTRR_NUM_TYPES 7
+/*
+ * Invalid MTRR memory type. mtrr_type_lookup() returns this value when
+ * MTRRs are disabled. Note, this value is allocated from the reserved
+ * values (0x7-0xff) of the MTRR memory types.
+ */
+#define MTRR_TYPE_INVALID 0xff
#endif /* _UAPI_ASM_X86_MTRR_H */
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index b0599dbb899a..7b1491c6232d 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -104,7 +104,7 @@ static int check_type_overlap(u8 *prev, u8 *curr)
/*
* Error/Semi-error returns:
- * 0xFF - when MTRR is not enabled
+ * MTRR_TYPE_INVALID - when MTRR is not enabled
* *repeat == 1 implies [start:end] spanned across MTRR range and type returned
* corresponds only to [start:*partial_end].
* Caller has to lookup again for [*partial_end:end].
@@ -117,10 +117,10 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
*repeat = 0;
if (!mtrr_state_set)
- return 0xFF;
+ return MTRR_TYPE_INVALID;
if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
- return 0xFF;
+ return MTRR_TYPE_INVALID;
/* Make end inclusive end, instead of exclusive */
end--;
@@ -151,7 +151,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
* Look of multiple ranges matching this address and pick type
* as per MTRR precedence
*/
- prev_match = 0xFF;
+ prev_match = MTRR_TYPE_INVALID;
for (i = 0; i < num_var_ranges; ++i) {
unsigned short start_state, end_state, inclusive;
@@ -206,7 +206,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
continue;
curr_match = mtrr_state.var_ranges[i].base_lo & 0xff;
- if (prev_match == 0xFF) {
+ if (prev_match == MTRR_TYPE_INVALID) {
prev_match = curr_match;
continue;
}
@@ -220,7 +220,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
return MTRR_TYPE_WRBACK;
}
- if (prev_match != 0xFF)
+ if (prev_match != MTRR_TYPE_INVALID)
return prev_match;
return mtrr_state.def_type;
@@ -229,7 +229,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
/*
* Returns the effective MTRR type for the region
* Error return:
- * 0xFF - when MTRR is not enabled
+ * MTRR_TYPE_INVALID - when MTRR is not enabled
*/
u8 mtrr_type_lookup(u64 start, u64 end)
{
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 0b97d2c75df3..c30f9819786b 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -563,16 +563,22 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
}
#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+/**
+ * pud_set_huge - setup kernel PUD mapping
+ *
+ * MTRR can override PAT memory types with 4KiB granularity. Therefore,
+ * this function does not set up a huge page when the range is covered
+ * by a non-WB type of MTRR. MTRR_TYPE_INVALID indicates that MTRR are
+ * disabled.
+ *
+ * Returns 1 on success and 0 on failure.
+ */
int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
{
u8 mtrr;
- /*
- * Do not use a huge page when the range is covered by non-WB type
- * of MTRRs.
- */
mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
- if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != 0xFF))
+ if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
return 0;
prot = pgprot_4k_2_large(prot);
@@ -584,16 +590,22 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
return 1;
}
+/**
+ * pmd_set_huge - setup kernel PMD mapping
+ *
+ * MTRR can override PAT memory types with 4KiB granularity. Therefore,
+ * this function does not set up a huge page when the range is covered
+ * by a non-WB type of MTRR. MTRR_TYPE_INVALID indicates that MTRR are
+ * disabled.
+ *
+ * Returns 1 on success and 0 on failure.
+ */
int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
{
u8 mtrr;
- /*
- * Do not use a huge page when the range is covered by non-WB type
- * of MTRRs.
- */
mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE);
- if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != 0xFF))
+ if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
return 0;
prot = pgprot_4k_2_large(prot);
@@ -605,6 +617,11 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
return 1;
}
+/**
+ * pud_clear_huge - clear kernel PUD mapping when it is set
+ *
+ * Returns 1 on success and 0 on failure (no PUD map is found).
+ */
int pud_clear_huge(pud_t *pud)
{
if (pud_large(*pud)) {
@@ -615,6 +632,11 @@ int pud_clear_huge(pud_t *pud)
return 0;
}
+/**
+ * pmd_clear_huge - clear kernel PMD mapping when it is set
+ *
+ * Returns 1 on success and 0 on failure (no PMD map is found).
+ */
int pmd_clear_huge(pmd_t *pmd)
{
if (pmd_large(*pmd)) {
--
1.9.0.258.g00eda23
* [tip:x86/mm] x86/mm/mtrr: Use symbolic define as a retval for disabled MTRRs
2015-05-26 8:28 ` [PATCH 04/18] x86/mtrr: Use symbolic define as a retval for disabled MTRRs Borislav Petkov
@ 2015-05-27 14:18 ` tip-bot for Toshi Kani
0 siblings, 0 replies; 400+ messages in thread
From: tip-bot for Toshi Kani @ 2015-05-27 14:18 UTC (permalink / raw)
To: linux-tip-commits
Cc: bp, tglx, linux-mm, mcgrof, hpa, akpm, torvalds, linux-kernel,
brgerst, mingo, toshi.kani, dvlasenk, peterz, bp, luto
Commit-ID: 3d3ca416d9b0784cfcf244eeeba1bcaf421bc64d
Gitweb: http://git.kernel.org/tip/3d3ca416d9b0784cfcf244eeeba1bcaf421bc64d
Author: Toshi Kani <toshi.kani@hp.com>
AuthorDate: Tue, 26 May 2015 10:28:07 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:40:57 +0200
x86/mm/mtrr: Use symbolic define as a retval for disabled MTRRs
mtrr_type_lookup() returns the literal value 0xFF when MTRRs are
disabled. This patch defines MTRR_TYPE_INVALID to clarify the
meaning of this value, and documents its usage.
Document the return values of the kernel virtual address mapping
helpers pud_set_huge(), pmd_set_huge(), pud_clear_huge() and
pmd_clear_huge().
There is no functional change in this patch.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-5-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/include/asm/mtrr.h | 2 +-
arch/x86/include/uapi/asm/mtrr.h | 8 +++++++-
arch/x86/kernel/cpu/mtrr/generic.c | 14 ++++++-------
arch/x86/mm/pgtable.c | 42 +++++++++++++++++++++++++++++---------
4 files changed, 47 insertions(+), 19 deletions(-)
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index ef92794..bb03a54 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -55,7 +55,7 @@ static inline u8 mtrr_type_lookup(u64 addr, u64 end)
/*
* Return no-MTRRs:
*/
- return 0xff;
+ return MTRR_TYPE_INVALID;
}
#define mtrr_save_fixed_ranges(arg) do {} while (0)
#define mtrr_save_state() do {} while (0)
diff --git a/arch/x86/include/uapi/asm/mtrr.h b/arch/x86/include/uapi/asm/mtrr.h
index d0acb65..7528dcf 100644
--- a/arch/x86/include/uapi/asm/mtrr.h
+++ b/arch/x86/include/uapi/asm/mtrr.h
@@ -103,7 +103,7 @@ struct mtrr_state_type {
#define MTRRIOC_GET_PAGE_ENTRY _IOWR(MTRR_IOCTL_BASE, 8, struct mtrr_gentry)
#define MTRRIOC_KILL_PAGE_ENTRY _IOW(MTRR_IOCTL_BASE, 9, struct mtrr_sentry)
-/* These are the region types */
+/* MTRR memory types, which are defined in SDM */
#define MTRR_TYPE_UNCACHABLE 0
#define MTRR_TYPE_WRCOMB 1
/*#define MTRR_TYPE_ 2*/
@@ -113,5 +113,11 @@ struct mtrr_state_type {
#define MTRR_TYPE_WRBACK 6
#define MTRR_NUM_TYPES 7
+/*
+ * Invalid MTRR memory type. mtrr_type_lookup() returns this value when
+ * MTRRs are disabled. Note, this value is allocated from the reserved
+ * values (0x7-0xff) of the MTRR memory types.
+ */
+#define MTRR_TYPE_INVALID 0xff
#endif /* _UAPI_ASM_X86_MTRR_H */
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index b0599db..7b1491c 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -104,7 +104,7 @@ static int check_type_overlap(u8 *prev, u8 *curr)
/*
* Error/Semi-error returns:
- * 0xFF - when MTRR is not enabled
+ * MTRR_TYPE_INVALID - when MTRR is not enabled
* *repeat == 1 implies [start:end] spanned across MTRR range and type returned
* corresponds only to [start:*partial_end].
* Caller has to lookup again for [*partial_end:end].
@@ -117,10 +117,10 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
*repeat = 0;
if (!mtrr_state_set)
- return 0xFF;
+ return MTRR_TYPE_INVALID;
if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
- return 0xFF;
+ return MTRR_TYPE_INVALID;
/* Make end inclusive end, instead of exclusive */
end--;
@@ -151,7 +151,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
* Look of multiple ranges matching this address and pick type
* as per MTRR precedence
*/
- prev_match = 0xFF;
+ prev_match = MTRR_TYPE_INVALID;
for (i = 0; i < num_var_ranges; ++i) {
unsigned short start_state, end_state, inclusive;
@@ -206,7 +206,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
continue;
curr_match = mtrr_state.var_ranges[i].base_lo & 0xff;
- if (prev_match == 0xFF) {
+ if (prev_match == MTRR_TYPE_INVALID) {
prev_match = curr_match;
continue;
}
@@ -220,7 +220,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
return MTRR_TYPE_WRBACK;
}
- if (prev_match != 0xFF)
+ if (prev_match != MTRR_TYPE_INVALID)
return prev_match;
return mtrr_state.def_type;
@@ -229,7 +229,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
/*
* Returns the effective MTRR type for the region
* Error return:
- * 0xFF - when MTRR is not enabled
+ * MTRR_TYPE_INVALID - when MTRR is not enabled
*/
u8 mtrr_type_lookup(u64 start, u64 end)
{
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 0b97d2c..c30f981 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -563,16 +563,22 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
}
#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+/**
+ * pud_set_huge - setup kernel PUD mapping
+ *
+ * MTRR can override PAT memory types with 4KiB granularity. Therefore,
+ * this function does not set up a huge page when the range is covered
+ * by a non-WB type of MTRR. MTRR_TYPE_INVALID indicates that MTRR are
+ * disabled.
+ *
+ * Returns 1 on success and 0 on failure.
+ */
int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
{
u8 mtrr;
- /*
- * Do not use a huge page when the range is covered by non-WB type
- * of MTRRs.
- */
mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
- if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != 0xFF))
+ if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
return 0;
prot = pgprot_4k_2_large(prot);
@@ -584,16 +590,22 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
return 1;
}
+/**
+ * pmd_set_huge - setup kernel PMD mapping
+ *
+ * MTRR can override PAT memory types with 4KiB granularity. Therefore,
+ * this function does not set up a huge page when the range is covered
+ * by a non-WB type of MTRR. MTRR_TYPE_INVALID indicates that MTRR are
+ * disabled.
+ *
+ * Returns 1 on success and 0 on failure.
+ */
int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
{
u8 mtrr;
- /*
- * Do not use a huge page when the range is covered by non-WB type
- * of MTRRs.
- */
mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE);
- if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != 0xFF))
+ if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
return 0;
prot = pgprot_4k_2_large(prot);
@@ -605,6 +617,11 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
return 1;
}
+/**
+ * pud_clear_huge - clear kernel PUD mapping when it is set
+ *
+ * Returns 1 on success and 0 on failure (no PUD map is found).
+ */
int pud_clear_huge(pud_t *pud)
{
if (pud_large(*pud)) {
@@ -615,6 +632,11 @@ int pud_clear_huge(pud_t *pud)
return 0;
}
+/**
+ * pmd_clear_huge - clear kernel PMD mapping when it is set
+ *
+ * Returns 1 on success and 0 on failure (no PMD map is found).
+ */
int pmd_clear_huge(pmd_t *pmd)
{
if (pmd_large(*pmd)) {
* [PATCH 05/18] x86/mtrr: Clean up mtrr_type_lookup()
2015-05-26 8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
` (3 preceding siblings ...)
2015-05-26 8:28 ` [PATCH 04/18] x86/mtrr: Use symbolic define as a retval for disabled MTRRs Borislav Petkov
@ 2015-05-26 8:28 ` Borislav Petkov
2015-05-27 14:19 ` [tip:x86/mm] x86/mm/mtrr: " tip-bot for Toshi Kani
2015-05-26 8:28 ` [PATCH 06/18] x86/process: Drop repeated word from comment Borislav Petkov
` (12 subsequent siblings)
17 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-05-26 8:28 UTC (permalink / raw)
To: Ingo Molnar; +Cc: X86-ML, LKML
From: Toshi Kani <toshi.kani@hp.com>
MTRRs contain fixed and variable entries. mtrr_type_lookup() may
repeatedly call __mtrr_type_lookup() to handle a request that overlaps
with variable entries.
However, __mtrr_type_lookup() also handles the fixed entries, which
do not have to be repeated. Therefore, this patch creates separate
functions, mtrr_type_lookup_fixed() and mtrr_type_lookup_variable(), to
handle the fixed and variable ranges respectively.
The patch also updates the function headers to clarify the return values
and output argument. It updates comments to clarify that the repeating
is necessary to handle overlaps with the default type, since overlaps
with multiple entries alone can be handled without such repeating.
There is no functional change in this patch.
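The index arithmetic used for the fixed entries can be sketched on its
own as follows. The shifts and offsets are the ones in the patch; the
function name mtrr_fixed_index() and the -1 return for addresses at or
above 1MB are assumptions made for this example only (the patch itself
returns MTRR_TYPE_INVALID in that case and reads
mtrr_state.fixed_ranges[idx] otherwise).

```c
#include <stdint.h>

/*
 * Hypothetical helper: map a physical address below 1MB to its slot
 * in the 88-entry fixed-range table.
 *   0x00000 - 0x7FFFF :  8 entries of 64KB -> idx  0..7
 *   0x80000 - 0xBFFFF : 16 entries of 16KB -> idx  8..23
 *   0xC0000 - 0xFFFFF : 64 entries of  4KB -> idx 24..87
 * Returns -1 when the address is not covered by the fixed ranges.
 */
static int mtrr_fixed_index(uint64_t start)
{
	if (start >= 0x100000)
		return -1;

	if (start < 0x80000)
		return (int)(start >> 16);

	if (start < 0xC0000)
		return 1 * 8 + (int)((start - 0x80000) >> 14);

	return 3 * 8 + (int)((start - 0xC0000) >> 12);
}
```

Only 'start' is consulted, which is why the fixed lookup never needs
the repeat logic that the variable ranges require.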
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: Elliott@hp.com
Cc: pebolle@tiscali.nl
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-mm <linux-mm@kvack.org>
Cc: x86-ml <x86@kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>
Link: http://lkml.kernel.org/r/1431714237-880-6-git-send-email-toshi.kani@hp.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
arch/x86/kernel/cpu/mtrr/generic.c | 138 +++++++++++++++++++++++--------------
1 file changed, 86 insertions(+), 52 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 7b1491c6232d..e51100c49eea 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -102,55 +102,68 @@ static int check_type_overlap(u8 *prev, u8 *curr)
return 0;
}
-/*
- * Error/Semi-error returns:
- * MTRR_TYPE_INVALID - when MTRR is not enabled
- * *repeat == 1 implies [start:end] spanned across MTRR range and type returned
- * corresponds only to [start:*partial_end].
- * Caller has to lookup again for [*partial_end:end].
+/**
+ * mtrr_type_lookup_fixed - look up memory type in MTRR fixed entries
+ *
+ * Return the MTRR fixed memory type of 'start'.
+ *
+ * MTRR fixed entries are divided into the following ways:
+ * 0x00000 - 0x7FFFF : This range is divided into eight 64KB sub-ranges
+ * 0x80000 - 0xBFFFF : This range is divided into sixteen 16KB sub-ranges
+ * 0xC0000 - 0xFFFFF : This range is divided into sixty-four 4KB sub-ranges
+ *
+ * Return Values:
+ * MTRR_TYPE_(type) - Matched memory type
+ * MTRR_TYPE_INVALID - Unmatched
+ */
+static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
+{
+ int idx;
+
+ if (start >= 0x100000)
+ return MTRR_TYPE_INVALID;
+
+ /* 0x0 - 0x7FFFF */
+ if (start < 0x80000) {
+ idx = 0;
+ idx += (start >> 16);
+ return mtrr_state.fixed_ranges[idx];
+ /* 0x80000 - 0xBFFFF */
+ } else if (start < 0xC0000) {
+ idx = 1 * 8;
+ idx += ((start - 0x80000) >> 14);
+ return mtrr_state.fixed_ranges[idx];
+ }
+
+ /* 0xC0000 - 0xFFFFF */
+ idx = 3 * 8;
+ idx += ((start - 0xC0000) >> 12);
+ return mtrr_state.fixed_ranges[idx];
+}
+
+/**
+ * mtrr_type_lookup_variable - look up memory type in MTRR variable entries
+ *
+ * Return Value:
+ * MTRR_TYPE_(type) - Matched memory type or default memory type (unmatched)
+ *
+ * Output Argument:
+ * repeat - Set to 1 when [start:end] spanned across MTRR range and type
+ * returned corresponds only to [start:*partial_end]. Caller has
+ * to lookup again for [*partial_end:end].
*/
-static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
+static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
+ int *repeat)
{
int i;
u64 base, mask;
u8 prev_match, curr_match;
*repeat = 0;
- if (!mtrr_state_set)
- return MTRR_TYPE_INVALID;
-
- if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
- return MTRR_TYPE_INVALID;
- /* Make end inclusive end, instead of exclusive */
+ /* Make end inclusive instead of exclusive */
end--;
- /* Look in fixed ranges. Just return the type as per start */
- if ((start < 0x100000) &&
- (mtrr_state.have_fixed) &&
- (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
- int idx;
-
- if (start < 0x80000) {
- idx = 0;
- idx += (start >> 16);
- return mtrr_state.fixed_ranges[idx];
- } else if (start < 0xC0000) {
- idx = 1 * 8;
- idx += ((start - 0x80000) >> 14);
- return mtrr_state.fixed_ranges[idx];
- } else {
- idx = 3 * 8;
- idx += ((start - 0xC0000) >> 12);
- return mtrr_state.fixed_ranges[idx];
- }
- }
-
- /*
- * Look in variable ranges
- * Look of multiple ranges matching this address and pick type
- * as per MTRR precedence
- */
prev_match = MTRR_TYPE_INVALID;
for (i = 0; i < num_var_ranges; ++i) {
unsigned short start_state, end_state, inclusive;
@@ -186,7 +199,8 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
* advised to lookup again after having adjusted start
* and end.
*
- * Note: This way we handle multiple overlaps as well.
+ * Note: This way we handle overlaps with multiple
+ * entries and the default type properly.
*/
if (start_state)
*partial_end = base + get_mtrr_size(mask);
@@ -215,21 +229,18 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
return curr_match;
}
- if (mtrr_tom2) {
- if (start >= (1ULL<<32) && (end < mtrr_tom2))
- return MTRR_TYPE_WRBACK;
- }
-
if (prev_match != MTRR_TYPE_INVALID)
return prev_match;
return mtrr_state.def_type;
}
-/*
- * Returns the effective MTRR type for the region
- * Error return:
- * MTRR_TYPE_INVALID - when MTRR is not enabled
+/**
+ * mtrr_type_lookup - look up memory type in MTRR
+ *
+ * Return Values:
+ * MTRR_TYPE_(type) - The effective MTRR type for the region
+ * MTRR_TYPE_INVALID - MTRR is disabled
*/
u8 mtrr_type_lookup(u64 start, u64 end)
{
@@ -237,22 +248,45 @@ u8 mtrr_type_lookup(u64 start, u64 end)
int repeat;
u64 partial_end;
- type = __mtrr_type_lookup(start, end, &partial_end, &repeat);
+ if (!mtrr_state_set)
+ return MTRR_TYPE_INVALID;
+
+ if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
+ return MTRR_TYPE_INVALID;
+
+ /*
+ * Look up the fixed ranges first, which take priority over
+ * the variable ranges.
+ */
+ if ((start < 0x100000) &&
+ (mtrr_state.have_fixed) &&
+ (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
+ return mtrr_type_lookup_fixed(start, end);
+
+ /*
+ * Look up the variable ranges. Look of multiple ranges matching
+ * this address and pick type as per MTRR precedence.
+ */
+ type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
/*
* Common path is with repeat = 0.
* However, we can have cases where [start:end] spans across some
- * MTRR range. Do repeated lookups for that case here.
+ * MTRR ranges and/or the default type. Do repeated lookups for
+ * that case here.
*/
while (repeat) {
prev_type = type;
start = partial_end;
- type = __mtrr_type_lookup(start, end, &partial_end, &repeat);
+ type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
if (check_type_overlap(&prev_type, &type))
return type;
}
+ if (mtrr_tom2 && (start >= (1ULL<<32)) && (end < mtrr_tom2))
+ return MTRR_TYPE_WRBACK;
+
return type;
}
--
1.9.0.258.g00eda23
* [tip:x86/mm] x86/mm/mtrr: Clean up mtrr_type_lookup()
2015-05-26 8:28 ` [PATCH 05/18] x86/mtrr: Clean up mtrr_type_lookup() Borislav Petkov
@ 2015-05-27 14:19 ` tip-bot for Toshi Kani
2015-07-31 13:18 ` Peter Zijlstra
0 siblings, 1 reply; 400+ messages in thread
From: tip-bot for Toshi Kani @ 2015-05-27 14:19 UTC (permalink / raw)
To: linux-tip-commits
Cc: hpa, dvlasenk, bp, bp, mingo, luto, linux-mm, linux-kernel,
torvalds, mcgrof, toshi.kani, brgerst, peterz, akpm, tglx
Commit-ID: 0cc705f56e400764a171055f727d28a48260bb4b
Gitweb: http://git.kernel.org/tip/0cc705f56e400764a171055f727d28a48260bb4b
Author: Toshi Kani <toshi.kani@hp.com>
AuthorDate: Tue, 26 May 2015 10:28:08 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:40:57 +0200
x86/mm/mtrr: Clean up mtrr_type_lookup()
MTRRs contain fixed and variable entries. mtrr_type_lookup() may
repeatedly call __mtrr_type_lookup() to handle a request that
overlaps with variable entries.
However, __mtrr_type_lookup() also handles the fixed entries,
which do not have to be repeated. Therefore, this patch creates
separate functions, mtrr_type_lookup_fixed() and
mtrr_type_lookup_variable(), to handle the fixed and variable
ranges respectively.
The patch also updates the function headers to clarify the
return values and output argument. It updates comments to
clarify that the repeating is necessary to handle overlaps with
the default type, since overlaps with multiple entries alone can
be handled without such repeating.
There is no functional change in this patch.
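The order in which the reworked mtrr_type_lookup() picks a lookup path
can be modelled roughly as below. This is a simplified sketch, not the
kernel code: struct model_state, enum lookup_path and choose_path() are
hypothetical names, and the variable-range walk and the TOM2 check are
left out entirely.

```c
#include <stdbool.h>
#include <stdint.h>

#define MTRR_STATE_MTRR_FIXED_ENABLED	0x01	/* FE flag */
#define MTRR_STATE_MTRR_ENABLED		0x02	/* E flag */

/* Hypothetical result type: which branch the lookup would take. */
enum lookup_path { PATH_DISABLED, PATH_FIXED, PATH_VARIABLE };

/* Hypothetical stand-in for the fields of mtrr_state used here. */
struct model_state {
	bool state_set;		/* models mtrr_state_set */
	bool have_fixed;	/* models mtrr_state.have_fixed */
	uint8_t enabled;	/* models mtrr_state.enabled */
};

static enum lookup_path choose_path(const struct model_state *s,
				    uint64_t start)
{
	/* No state recorded, or E flag clear: report MTRRs disabled. */
	if (!s->state_set)
		return PATH_DISABLED;
	if (!(s->enabled & MTRR_STATE_MTRR_ENABLED))
		return PATH_DISABLED;

	/* Fixed ranges take priority, but only below 1MB with FE set. */
	if (start < 0x100000 && s->have_fixed &&
	    (s->enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
		return PATH_FIXED;

	/* Everything else goes to the variable ranges. */
	return PATH_VARIABLE;
}
```

In the patch, PATH_DISABLED corresponds to returning MTRR_TYPE_INVALID,
PATH_FIXED to mtrr_type_lookup_fixed(), and PATH_VARIABLE to the
possibly repeated calls to mtrr_type_lookup_variable().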
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-6-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/kernel/cpu/mtrr/generic.c | 138 +++++++++++++++++++++++--------------
1 file changed, 86 insertions(+), 52 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 7b1491c..e51100c 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -102,55 +102,68 @@ static int check_type_overlap(u8 *prev, u8 *curr)
return 0;
}
-/*
- * Error/Semi-error returns:
- * MTRR_TYPE_INVALID - when MTRR is not enabled
- * *repeat == 1 implies [start:end] spanned across MTRR range and type returned
- * corresponds only to [start:*partial_end].
- * Caller has to lookup again for [*partial_end:end].
+/**
+ * mtrr_type_lookup_fixed - look up memory type in MTRR fixed entries
+ *
+ * Return the MTRR fixed memory type of 'start'.
+ *
+ * MTRR fixed entries are divided into the following ways:
+ * 0x00000 - 0x7FFFF : This range is divided into eight 64KB sub-ranges
+ * 0x80000 - 0xBFFFF : This range is divided into sixteen 16KB sub-ranges
+ * 0xC0000 - 0xFFFFF : This range is divided into sixty-four 4KB sub-ranges
+ *
+ * Return Values:
+ * MTRR_TYPE_(type) - Matched memory type
+ * MTRR_TYPE_INVALID - Unmatched
+ */
+static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
+{
+ int idx;
+
+ if (start >= 0x100000)
+ return MTRR_TYPE_INVALID;
+
+ /* 0x0 - 0x7FFFF */
+ if (start < 0x80000) {
+ idx = 0;
+ idx += (start >> 16);
+ return mtrr_state.fixed_ranges[idx];
+ /* 0x80000 - 0xBFFFF */
+ } else if (start < 0xC0000) {
+ idx = 1 * 8;
+ idx += ((start - 0x80000) >> 14);
+ return mtrr_state.fixed_ranges[idx];
+ }
+
+ /* 0xC0000 - 0xFFFFF */
+ idx = 3 * 8;
+ idx += ((start - 0xC0000) >> 12);
+ return mtrr_state.fixed_ranges[idx];
+}
+
+/**
+ * mtrr_type_lookup_variable - look up memory type in MTRR variable entries
+ *
+ * Return Value:
+ * MTRR_TYPE_(type) - Matched memory type or default memory type (unmatched)
+ *
+ * Output Argument:
+ * repeat - Set to 1 when [start:end] spanned across MTRR range and type
+ * returned corresponds only to [start:*partial_end]. Caller has
+ * to lookup again for [*partial_end:end].
*/
-static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
+static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
+ int *repeat)
{
int i;
u64 base, mask;
u8 prev_match, curr_match;
*repeat = 0;
- if (!mtrr_state_set)
- return MTRR_TYPE_INVALID;
-
- if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
- return MTRR_TYPE_INVALID;
- /* Make end inclusive end, instead of exclusive */
+ /* Make end inclusive instead of exclusive */
end--;
- /* Look in fixed ranges. Just return the type as per start */
- if ((start < 0x100000) &&
- (mtrr_state.have_fixed) &&
- (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
- int idx;
-
- if (start < 0x80000) {
- idx = 0;
- idx += (start >> 16);
- return mtrr_state.fixed_ranges[idx];
- } else if (start < 0xC0000) {
- idx = 1 * 8;
- idx += ((start - 0x80000) >> 14);
- return mtrr_state.fixed_ranges[idx];
- } else {
- idx = 3 * 8;
- idx += ((start - 0xC0000) >> 12);
- return mtrr_state.fixed_ranges[idx];
- }
- }
-
- /*
- * Look in variable ranges
- * Look of multiple ranges matching this address and pick type
- * as per MTRR precedence
- */
prev_match = MTRR_TYPE_INVALID;
for (i = 0; i < num_var_ranges; ++i) {
unsigned short start_state, end_state, inclusive;
@@ -186,7 +199,8 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
* advised to lookup again after having adjusted start
* and end.
*
- * Note: This way we handle multiple overlaps as well.
+ * Note: This way we handle overlaps with multiple
+ * entries and the default type properly.
*/
if (start_state)
*partial_end = base + get_mtrr_size(mask);
@@ -215,21 +229,18 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
return curr_match;
}
- if (mtrr_tom2) {
- if (start >= (1ULL<<32) && (end < mtrr_tom2))
- return MTRR_TYPE_WRBACK;
- }
-
if (prev_match != MTRR_TYPE_INVALID)
return prev_match;
return mtrr_state.def_type;
}
-/*
- * Returns the effective MTRR type for the region
- * Error return:
- * MTRR_TYPE_INVALID - when MTRR is not enabled
+/**
+ * mtrr_type_lookup - look up memory type in MTRR
+ *
+ * Return Values:
+ * MTRR_TYPE_(type) - The effective MTRR type for the region
+ * MTRR_TYPE_INVALID - MTRR is disabled
*/
u8 mtrr_type_lookup(u64 start, u64 end)
{
@@ -237,22 +248,45 @@ u8 mtrr_type_lookup(u64 start, u64 end)
int repeat;
u64 partial_end;
- type = __mtrr_type_lookup(start, end, &partial_end, &repeat);
+ if (!mtrr_state_set)
+ return MTRR_TYPE_INVALID;
+
+ if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
+ return MTRR_TYPE_INVALID;
+
+ /*
+ * Look up the fixed ranges first, which take priority over
+ * the variable ranges.
+ */
+ if ((start < 0x100000) &&
+ (mtrr_state.have_fixed) &&
+ (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
+ return mtrr_type_lookup_fixed(start, end);
+
+ /*
+ * Look up the variable ranges. Look for multiple ranges matching
+ * this address and pick type as per MTRR precedence.
+ */
+ type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
/*
* Common path is with repeat = 0.
* However, we can have cases where [start:end] spans across some
- * MTRR range. Do repeated lookups for that case here.
+ * MTRR ranges and/or the default type. Do repeated lookups for
+ * that case here.
*/
while (repeat) {
prev_type = type;
start = partial_end;
- type = __mtrr_type_lookup(start, end, &partial_end, &repeat);
+ type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
if (check_type_overlap(&prev_type, &type))
return type;
}
+ if (mtrr_tom2 && (start >= (1ULL<<32)) && (end < mtrr_tom2))
+ return MTRR_TYPE_WRBACK;
+
return type;
}
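
For reference, the index arithmetic that mtrr_type_lookup_fixed() takes over
in the patch above can be checked outside the kernel. What follows is a
minimal userspace sketch, not kernel code: fixed_range_index() is a
hypothetical name, and it returns the table index (or -1 at or above 1 MiB)
instead of reading mtrr_state.fixed_ranges[].

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the index arithmetic of the fixed-range
 * lookup. Returns the index into the fixed-range type table, or -1 when
 * the address is at or above 1 MiB and the fixed ranges do not apply.
 */
static int fixed_range_index(uint64_t start)
{
	if (start >= 0x100000)
		return -1;

	/* 0x00000 - 0x7FFFF: 8 entries of 64 KiB each */
	if (start < 0x80000)
		return (int)(start >> 16);

	/* 0x80000 - 0xBFFFF: 16 entries of 16 KiB each */
	if (start < 0xC0000)
		return 1 * 8 + (int)((start - 0x80000) >> 14);

	/* 0xC0000 - 0xFFFFF: 64 entries of 4 KiB each */
	return 3 * 8 + (int)((start - 0xC0000) >> 12);
}
```

For example, fixed_range_index(0xA0000) evaluates to 8 + (0x20000 >> 14) = 16,
and the last valid address, 0xFFFFF, maps to index 87. Only 'start' decides
the result, which matches the "just return the type as per start" behaviour
of the code being moved.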
^ permalink raw reply related [flat|nested] 400+ messages in thread
* Re: [tip:x86/mm] x86/mm/mtrr: Clean up mtrr_type_lookup()
2015-05-27 14:19 ` [tip:x86/mm] x86/mm/mtrr: " tip-bot for Toshi Kani
@ 2015-07-31 13:18 ` Peter Zijlstra
2015-07-31 14:44 ` Borislav Petkov
0 siblings, 1 reply; 400+ messages in thread
From: Peter Zijlstra @ 2015-07-31 13:18 UTC (permalink / raw)
To: mingo, hpa, bp, dvlasenk, bp, akpm, brgerst, tglx, linux-mm,
luto, mcgrof, toshi.kani, torvalds, linux-kernel
Cc: linux-tip-commits
On Wed, May 27, 2015 at 07:19:05AM -0700, tip-bot for Toshi Kani wrote:
> +/**
> + * mtrr_type_lookup - look up memory type in MTRR
> + *
> + * Return Values:
> + * MTRR_TYPE_(type) - The effective MTRR type for the region
> + * MTRR_TYPE_INVALID - MTRR is disabled
> */
> u8 mtrr_type_lookup(u64 start, u64 end)
> {
> int repeat;
> u64 partial_end;
>
> + if (!mtrr_state_set)
> + return MTRR_TYPE_INVALID;
> +
> + if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
> + return MTRR_TYPE_INVALID;
> +
> + /*
> + * Look up the fixed ranges first, which take priority over
> + * the variable ranges.
> + */
> + if ((start < 0x100000) &&
> + (mtrr_state.have_fixed) &&
> + (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
> + return mtrr_type_lookup_fixed(start, end);
> +
> + /*
> + * Look up the variable ranges. Look for multiple ranges matching
> + * this address and pick type as per MTRR precedence.
> + */
> + type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
>
> /*
> * Common path is with repeat = 0.
> * However, we can have cases where [start:end] spans across some
> + * MTRR ranges and/or the default type. Do repeated lookups for
> + * that case here.
> */
> while (repeat) {
> prev_type = type;
> start = partial_end;
> + type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
>
> if (check_type_overlap(&prev_type, &type))
> return type;
> }
>
> + if (mtrr_tom2 && (start >= (1ULL<<32)) && (end < mtrr_tom2))
> + return MTRR_TYPE_WRBACK;
> +
> return type;
> }
So I got staring at this MTRR horror show because I _really_ _Really_
want to kill stop_machine_from_inactive_cpu().
But I wondered about these lookup functions, should they not have an
assertion that preemption is disabled?
Using these functions with preemption enabled is racy against MTRR
updates. And if that race is ok, at the very least explain that it is
indeed racy and why this is not a problem.
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [tip:x86/mm] x86/mm/mtrr: Clean up mtrr_type_lookup()
2015-07-31 13:18 ` Peter Zijlstra
@ 2015-07-31 14:44 ` Borislav Petkov
2015-07-31 15:08 ` Peter Zijlstra
0 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-07-31 14:44 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, hpa, dvlasenk, bp, akpm, brgerst, tglx, linux-mm, luto,
mcgrof, toshi.kani, torvalds, linux-kernel, linux-tip-commits
On Fri, Jul 31, 2015 at 03:18:02PM +0200, Peter Zijlstra wrote:
> Using these functions with preemption enabled is racy against MTRR
> updates. And if that race is ok, at the very least explain that it is
> indeed racy and why this is not a problem.
Right, so Luis has been working on burying direct MTRR access so
after that work is done, we'll be using only PAT for changing memory
attributes. Look at arch_phys_wc_add() and all those fbdev users of
mtrr_add() which get converted to that thing...
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [tip:x86/mm] x86/mm/mtrr: Clean up mtrr_type_lookup()
2015-07-31 14:44 ` Borislav Petkov
@ 2015-07-31 15:08 ` Peter Zijlstra
2015-07-31 15:27 ` Borislav Petkov
0 siblings, 1 reply; 400+ messages in thread
From: Peter Zijlstra @ 2015-07-31 15:08 UTC (permalink / raw)
To: Borislav Petkov
Cc: mingo, hpa, dvlasenk, bp, akpm, brgerst, tglx, linux-mm, luto,
mcgrof, toshi.kani, torvalds, linux-kernel, linux-tip-commits
On Fri, Jul 31, 2015 at 04:44:52PM +0200, Borislav Petkov wrote:
> On Fri, Jul 31, 2015 at 03:18:02PM +0200, Peter Zijlstra wrote:
> > Using these functions with preemption enabled is racy against MTRR
> > updates. And if that race is ok, at the very least explain that it is
> > indeed racy and why this is not a problem.
>
> Right, so Luis has been working on burying direct MTRR access so
> after that work is done, we'll be using only PAT for changing memory
> attributes. Look at arch_phys_wc_add() and all those fbdev users of
> mtrr_add() which get converted to that thing...
Drivers don't do those lookups afaict.
But it's things like set_memory_XX(), and afaict that's all buggy against
MTRR modifications.
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [tip:x86/mm] x86/mm/mtrr: Clean up mtrr_type_lookup()
2015-07-31 15:08 ` Peter Zijlstra
@ 2015-07-31 15:27 ` Borislav Petkov
2015-08-01 14:28 ` Luis R. Rodriguez
0 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-07-31 15:27 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, hpa, dvlasenk, bp, akpm, brgerst, tglx, linux-mm, luto,
mcgrof, toshi.kani, torvalds, linux-kernel, linux-tip-commits
On Fri, Jul 31, 2015 at 05:08:06PM +0200, Peter Zijlstra wrote:
> But it's things like set_memory_XX(), and afaict that's all buggy against
> MTRR modifications.
I think the idea is to not do any MTRR modifications at some point:
From Documentation/x86/pat.txt:
"... Ideally mtrr_add() usage will be phased out in favor of
arch_phys_wc_add() which will be a no-op on PAT enabled systems. The
region over which a arch_phys_wc_add() is made, should already have been
ioremapped with WC attributes or PAT entries, this can be done by using
ioremap_wc() / set_memory_wc()."
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [tip:x86/mm] x86/mm/mtrr: Clean up mtrr_type_lookup()
2015-07-31 15:27 ` Borislav Petkov
@ 2015-08-01 14:28 ` Luis R. Rodriguez
2015-08-01 16:33 ` Borislav Petkov
0 siblings, 1 reply; 400+ messages in thread
From: Luis R. Rodriguez @ 2015-08-01 14:28 UTC (permalink / raw)
To: Borislav Petkov, Toshi Kani
Cc: Peter Zijlstra, mingo, hpa, dvlasenk, bp, akpm, brgerst, tglx,
linux-mm, luto, torvalds, linux-kernel, linux-tip-commits
On Fri, Jul 31, 2015 at 05:27:13PM +0200, Borislav Petkov wrote:
> On Fri, Jul 31, 2015 at 05:08:06PM +0200, Peter Zijlstra wrote:
> > But it's things like set_memory_XX(), and afaict that's all buggy against
> > MTRR modifications.
>
> I think the idea is to not do any MTRR modifications at some point:
>
> From Documentation/x86/pat.txt:
>
> "... Ideally mtrr_add() usage will be phased out in favor of
> arch_phys_wc_add() which will be a no-op on PAT enabled systems. The
> region over which a arch_phys_wc_add() is made, should already have been
> ioremapped with WC attributes or PAT entries, this can be done by using
> ioremap_wc() / set_memory_wc()."
I need to update this documentation to remove set_memory_wc() there, as we've
learned with the MTRR --> PAT conversion that set_memory_wc() cannot be used on
IO memory; it can only be used for RAM. I am not sure I would call it broken
that you cannot use set_memory_*() on IO memory, though: that may have been by
design.
Luis
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [tip:x86/mm] x86/mm/mtrr: Clean up mtrr_type_lookup()
2015-08-01 14:28 ` Luis R. Rodriguez
@ 2015-08-01 16:33 ` Borislav Petkov
2015-08-01 16:39 ` Linus Torvalds
0 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-08-01 16:33 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Toshi Kani, Peter Zijlstra, mingo, hpa, dvlasenk, bp, akpm,
brgerst, tglx, linux-mm, luto, torvalds, linux-kernel,
linux-tip-commits
On Sat, Aug 01, 2015 at 04:28:20PM +0200, Luis R. Rodriguez wrote:
> I need to update this documentation to remove set_memory_wc() there, as we've
> learned with the MTRR --> PAT conversion that set_memory_wc() cannot be used on
> IO memory; it can only be used for RAM. I am not sure I would call it broken
> that you cannot use set_memory_*() on IO memory, though: that may have been by
> design.
Well, it doesn't really make sense to write-combine IO memory, does it?
My simplistic impression is that an IO range behind which there's a
device, cannot stomach any caching of IO as all commands/data accesses
need to happen as they get issued...
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [tip:x86/mm] x86/mm/mtrr: Clean up mtrr_type_lookup()
2015-08-01 16:33 ` Borislav Petkov
@ 2015-08-01 16:39 ` Linus Torvalds
2015-08-01 16:49 ` Borislav Petkov
0 siblings, 1 reply; 400+ messages in thread
From: Linus Torvalds @ 2015-08-01 16:39 UTC (permalink / raw)
To: Borislav Petkov
Cc: Luis R. Rodriguez, Toshi Kani, Peter Zijlstra, Ingo Molnar,
Peter Anvin, Denys Vlasenko, Borislav Petkov, Andrew Morton,
Brian Gerst, Thomas Gleixner, linux-mm, Andy Lutomirski,
Linux Kernel Mailing List, linux-tip-commits
On Sat, Aug 1, 2015 at 9:33 AM, Borislav Petkov <bp@alien8.de> wrote:
>
> Well, it doesn't really make sense to write-combine IO memory, does it?
Quite the reverse.
It makes no sense to write-combine normal memory (RAM), because caches
work and sane memory is always cache-coherent. So marking regular
memory write-combining is a sign of crap hardware (which admittedly
exists all too much, but hopefully goes away).
In contrast, marking MMIO memory write-combining is not a sign of crap
hardware - it's just a sign of things like frame buffers on the card
etc. Which very much wants write combining. So WC for MMIO at least
makes sense.
Yes, yes, I realize that "crap hardware" may actually be the more
common case, but still..
Linus
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [tip:x86/mm] x86/mm/mtrr: Clean up mtrr_type_lookup()
2015-08-01 16:39 ` Linus Torvalds
@ 2015-08-01 16:49 ` Borislav Petkov
2015-08-01 17:03 ` Linus Torvalds
0 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-08-01 16:49 UTC (permalink / raw)
To: Linus Torvalds
Cc: Luis R. Rodriguez, Toshi Kani, Peter Zijlstra, Ingo Molnar,
Peter Anvin, Denys Vlasenko, Borislav Petkov, Andrew Morton,
Brian Gerst, Thomas Gleixner, linux-mm, Andy Lutomirski,
Linux Kernel Mailing List, linux-tip-commits
On Sat, Aug 01, 2015 at 09:39:07AM -0700, Linus Torvalds wrote:
> Quite the reverse.
>
> It makes no sense to write-combine normal memory (RAM), because caches
> work and sane memory is always cache-coherent. So marking regular
> memory write-combining is a sign of crap hardware (which admittedly
> exists all too much, but hopefully goes away).
>
> In contrast, marking MMIO memory write-combining is not a sign of crap
> hardware - it's just a sign of things like frame buffers on the card
> etc. Which very much wants write combining. So WC for MMIO at least
> makes sense.
>
> Yes, yes, I realize that "crap hardware" may actually be the more
> common case, but still..
Hmm, ok.
My simplistic mental picture while thinking of this is the IO range
where you send the commands to the device and you don't really want to
delay those but they should reach the device as they get issued.
OTOH, your example with frame buffers really wants to WC because sending
down each write separately is plain dumb.
Ok, I see, so it can make sense to have WC IO memory, depending on the
range and what you're going to use it for, I guess...
Thanks.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
^ permalink raw reply [flat|nested] 400+ messages in thread
* Re: [tip:x86/mm] x86/mm/mtrr: Clean up mtrr_type_lookup()
2015-08-01 16:49 ` Borislav Petkov
@ 2015-08-01 17:03 ` Linus Torvalds
0 siblings, 0 replies; 400+ messages in thread
From: Linus Torvalds @ 2015-08-01 17:03 UTC (permalink / raw)
To: Borislav Petkov
Cc: Luis R. Rodriguez, Toshi Kani, Peter Zijlstra, Ingo Molnar,
Peter Anvin, Denys Vlasenko, Borislav Petkov, Andrew Morton,
Brian Gerst, Thomas Gleixner, linux-mm, Andy Lutomirski,
Linux Kernel Mailing List, linux-tip-commits
On Sat, Aug 1, 2015 at 9:49 AM, Borislav Petkov <bp@alien8.de> wrote:
>
> My simplistic mental picture while thinking of this is the IO range
> where you send the commands to the device and you don't really want to
> delay those but they should reach the device as they get issued.
Well, even for command streams, people often do go for a
write-combining approach, simply because it is *so* much more
efficient on the bus to buffer and burst things. The interface is set
up to not really "combine" things in the over-writing sense, but just
in the "combine continuous writes into bigger buffers on the CPU, and
then write it out as efficiently as possible" sense.
Of course, the device (and the driver) has to be designed properly for
that, and it makes sense only with certain kinds of models, but it can
actually be much more efficient to make the device interface be
something like "write 32-byte command packets to a circular
write-combining buffer" than it is to do things other ways. Back in
the days, that was one of the most efficient ways to try to fill up
the PCI bandwidth.
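
As an illustration only, the "write 32-byte command packets to a circular
write-combining buffer" model can be sketched in plain userspace C. This is
a simulation of the interface shape, not a driver: struct cmd_ring,
cmd_ring_push() and the ring depth are hypothetical, the array merely stands
in for a WC-mapped device window, and a real driver would also need flow
control against the device and a flush of the write-combining buffers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CMD_PACKET_SIZE 32u
#define CMD_RING_SLOTS   8u	/* hypothetical ring depth */

/* The slots array stands in for a write-combining mapped device window. */
struct cmd_ring {
	uint8_t  slots[CMD_RING_SLOTS][CMD_PACKET_SIZE];
	uint32_t head;		/* next slot the producer fills */
};

/*
 * Copy one whole packet into the next slot and advance the head, wrapping
 * at the end of the ring. Writing full, contiguous packets is what lets
 * the CPU buffer the stores and burst them out. Returns the slot used.
 */
static uint32_t cmd_ring_push(struct cmd_ring *ring,
			      const uint8_t packet[CMD_PACKET_SIZE])
{
	uint32_t slot = ring->head;

	memcpy(ring->slots[slot], packet, CMD_PACKET_SIZE);
	ring->head = (slot + 1) % CMD_RING_SLOTS;
	return slot;
}
```

The point of the sketch is only that the producer never overwrites a
partially written packet in place: each command is a fresh, sequential
32-byte write, so nothing is "combined" in the over-writing sense.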
There are other approaches too, of course, with the modern variation
tending to be "the device does all real accesses by reading over DMA,
and the only time you use IO accesses is for setup and as a 'start
your DMA transfers now' kind of interface". But write-combining MMIO
used to be a very common model for high-performance IO not that long
ago, because DMA didn't actually use to be all that efficient at all
(nasty behavior with caches and snooping etc back before the memory
controller was on-die and DMA accesses snooped caches directly). So
the "DMA is efficient even for smaller things" thing is relatively
recent.
Linus
^ permalink raw reply [flat|nested] 400+ messages in thread
* [PATCH 06/18] x86/process: Drop repeated word from comment
2015-05-26 8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
` (4 preceding siblings ...)
2015-05-26 8:28 ` [PATCH 05/18] x86/mtrr: Clean up mtrr_type_lookup() Borislav Petkov
@ 2015-05-26 8:28 ` Borislav Petkov
2015-05-27 14:16 ` [tip:sched/core] sched/x86: Drop repeated word from mwait_idle() comment tip-bot for Huang Rui
2015-05-26 8:28 ` [PATCH 07/18] x86/mm: Enhance MTRR checks in kernel mapping helpers Borislav Petkov
` (11 subsequent siblings)
17 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-05-26 8:28 UTC (permalink / raw)
To: Ingo Molnar; +Cc: X86-ML, LKML
From: Huang Rui <ray.huang@amd.com>
A single "default" is fine.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Link: http://lkml.kernel.org/r/1432022472-2224-5-git-send-email-ray.huang@amd.com
[ Fix another typo and reflow comment. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
---
arch/x86/kernel/process.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 6e338e3b1dc0..c648139d68d7 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -445,11 +445,10 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
}
/*
- * MONITOR/MWAIT with no hints, used for default default C1 state.
- * This invokes MWAIT with interrutps enabled and no flags,
- * which is backwards compatible with the original MWAIT implementation.
+ * MONITOR/MWAIT with no hints, used for default C1 state. This invokes MWAIT
+ * with interrupts enabled and no flags, which is backwards compatible with the
+ * original MWAIT implementation.
*/
-
static void mwait_idle(void)
{
if (!current_set_polling_and_test()) {
--
1.9.0.258.g00eda23
^ permalink raw reply related [flat|nested] 400+ messages in thread
* [tip:sched/core] sched/x86: Drop repeated word from mwait_idle() comment
2015-05-26 8:28 ` [PATCH 06/18] x86/process: Drop repeated word from comment Borislav Petkov
@ 2015-05-27 14:16 ` tip-bot for Huang Rui
0 siblings, 0 replies; 400+ messages in thread
From: tip-bot for Huang Rui @ 2015-05-27 14:16 UTC (permalink / raw)
To: linux-tip-commits
Cc: torvalds, hpa, luto, peterz, ray.huang, linux-kernel, dvlasenk,
bp, tglx, brgerst, mingo, bp
Commit-ID: 0fb0328d3458ff2d6ffbb280b75053c99a8a4b1f
Gitweb: http://git.kernel.org/tip/0fb0328d3458ff2d6ffbb280b75053c99a8a4b1f
Author: Huang Rui <ray.huang@amd.com>
AuthorDate: Tue, 26 May 2015 10:28:09 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:38:04 +0200
sched/x86: Drop repeated word from mwait_idle() comment
A single "default" is fine.
Signed-off-by: Huang Rui <ray.huang@amd.com>
[ Fix another typo and reflow comment. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1432022472-2224-5-git-send-email-ray.huang@amd.com
Link: http://lkml.kernel.org/r/1432628901-18044-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/kernel/process.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 6e338e3..c648139 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -445,11 +445,10 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
}
/*
- * MONITOR/MWAIT with no hints, used for default default C1 state.
- * This invokes MWAIT with interrutps enabled and no flags,
- * which is backwards compatible with the original MWAIT implementation.
+ * MONITOR/MWAIT with no hints, used for default C1 state. This invokes MWAIT
+ * with interrupts enabled and no flags, which is backwards compatible with the
+ * original MWAIT implementation.
*/
-
static void mwait_idle(void)
{
if (!current_set_polling_and_test()) {
^ permalink raw reply related [flat|nested] 400+ messages in thread
* [PATCH 07/18] x86/mm: Enhance MTRR checks in kernel mapping helpers
2015-05-26 8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
` (5 preceding siblings ...)
2015-05-26 8:28 ` [PATCH 06/18] x86/process: Drop repeated word from comment Borislav Petkov
@ 2015-05-26 8:28 ` Borislav Petkov
2015-05-27 14:19 ` [tip:x86/mm] x86/mm/mtrr: " tip-bot for Toshi Kani
2015-05-26 8:28 ` [PATCH 08/18] x86/mm/pat: Convert to pr_* usage Borislav Petkov
` (10 subsequent siblings)
17 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-05-26 8:28 UTC (permalink / raw)
To: Ingo Molnar; +Cc: X86-ML, LKML
From: Toshi Kani <toshi.kani@hp.com>
This patch adds the argument 'uniform' to mtrr_type_lookup(), which gets
set to 1 when a given range is covered uniformly by MTRRs, i.e. the
range is fully covered by a single MTRR entry or the default type.
Change pud_set_huge() and pmd_set_huge() to honor the 'uniform' flag to
see if it is safe to create a huge page mapping in the range.
This allows them to create a huge page mapping in a range covered by
a single MTRR entry of any memory type. It also detects a non-optimal
request properly. They continue to check with the WB type since it does
not effectively change the uniform mapping even if a request spans
multiple MTRR entries.
pmd_set_huge() logs a warning message for a non-optimal request so that
driver writers will be aware of such a case. Drivers should make a
mapping request aligned to a single MTRR entry when the range is covered
by MTRRs.
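
The resulting check in pud_set_huge() and pmd_set_huge() reduces to a
three-way test. Below is a minimal userspace sketch of that decision, under
stated assumptions: huge_mapping_allowed() and the DEMO_* constants are
hypothetical names, and the numeric values are chosen to mirror the kernel's
MTRR_TYPE_* definitions but are only illustrative here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the kernel's MTRR_TYPE_* constants. */
enum {
	DEMO_MTRR_UC      = 0,
	DEMO_MTRR_WC      = 1,
	DEMO_MTRR_WB      = 6,
	DEMO_MTRR_INVALID = 0xff,	/* MTRRs disabled */
};

/*
 * A huge-page mapping is allowed when MTRRs are disabled, when the range
 * is covered uniformly (single MTRR entry or the default type), or when
 * the effective type is WB, which does not alter the requested PAT type.
 */
static bool huge_mapping_allowed(uint8_t mtrr_type, uint8_t uniform)
{
	if (mtrr_type == DEMO_MTRR_INVALID)
		return true;
	if (uniform)
		return true;
	if (mtrr_type == DEMO_MTRR_WB)
		return true;
	return false;
}
```

This is the negation of the rejection condition in the patch: a request is
refused only when MTRRs are enabled, the range is not uniform, and the
effective type is not WB.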
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: Elliott@hp.com
Cc: pebolle@tiscali.nl
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-mm <linux-mm@kvack.org>
Cc: x86-ml <x86@kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>
Link: http://lkml.kernel.org/r/1431714237-880-7-git-send-email-toshi.kani@hp.com
[ Realign, flesh out comments, improve warning message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
---
arch/x86/include/asm/mtrr.h | 4 ++--
arch/x86/kernel/cpu/mtrr/generic.c | 40 ++++++++++++++++++++++++++++----------
arch/x86/mm/pat.c | 4 ++--
arch/x86/mm/pgtable.c | 38 +++++++++++++++++++++++-------------
4 files changed, 58 insertions(+), 28 deletions(-)
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index bb03a547c1ab..a31759e1edd9 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -31,7 +31,7 @@
* arch_phys_wc_add and arch_phys_wc_del.
*/
# ifdef CONFIG_MTRR
-extern u8 mtrr_type_lookup(u64 addr, u64 end);
+extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
extern void mtrr_save_fixed_ranges(void *);
extern void mtrr_save_state(void);
extern int mtrr_add(unsigned long base, unsigned long size,
@@ -50,7 +50,7 @@ extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
extern int amd_special_default_mtrr(void);
extern int phys_wc_to_mtrr_index(int handle);
# else
-static inline u8 mtrr_type_lookup(u64 addr, u64 end)
+static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
{
/*
* Return no-MTRRs:
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index e51100c49eea..f782d9b62cb3 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -147,19 +147,24 @@ static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
* Return Value:
* MTRR_TYPE_(type) - Matched memory type or default memory type (unmatched)
*
- * Output Argument:
+ * Output Arguments:
* repeat - Set to 1 when [start:end] spanned across MTRR range and type
* returned corresponds only to [start:*partial_end]. Caller has
* to lookup again for [*partial_end:end].
+ *
+ * uniform - Set to 1 when an MTRR covers the region uniformly, i.e. the
+ * region is fully covered by a single MTRR entry or the default
+ * type.
*/
static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
- int *repeat)
+ int *repeat, u8 *uniform)
{
int i;
u64 base, mask;
u8 prev_match, curr_match;
*repeat = 0;
+ *uniform = 1;
/* Make end inclusive instead of exclusive */
end--;
@@ -214,6 +219,7 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
end = *partial_end - 1; /* end is inclusive */
*repeat = 1;
+ *uniform = 0;
}
if ((start & mask) != (base & mask))
@@ -225,6 +231,7 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
continue;
}
+ *uniform = 0;
if (check_type_overlap(&prev_match, &curr_match))
return curr_match;
}
@@ -241,10 +248,15 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
* Return Values:
* MTRR_TYPE_(type) - The effective MTRR type for the region
* MTRR_TYPE_INVALID - MTRR is disabled
+ *
+ * Output Argument:
+ * uniform - Set to 1 when an MTRR covers the region uniformly, i.e. the
+ * region is fully covered by a single MTRR entry or the default
+ * type.
*/
-u8 mtrr_type_lookup(u64 start, u64 end)
+u8 mtrr_type_lookup(u64 start, u64 end, u8 *uniform)
{
- u8 type, prev_type;
+ u8 type, prev_type, is_uniform = 1, dummy;
int repeat;
u64 partial_end;
@@ -260,14 +272,18 @@ u8 mtrr_type_lookup(u64 start, u64 end)
*/
if ((start < 0x100000) &&
(mtrr_state.have_fixed) &&
- (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
- return mtrr_type_lookup_fixed(start, end);
+ (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
+ is_uniform = 0;
+ type = mtrr_type_lookup_fixed(start, end);
+ goto out;
+ }
/*
 * Look up the variable ranges. Look for multiple ranges matching
* this address and pick type as per MTRR precedence.
*/
- type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
+ type = mtrr_type_lookup_variable(start, end, &partial_end,
+ &repeat, &is_uniform);
/*
* Common path is with repeat = 0.
@@ -278,15 +294,19 @@ u8 mtrr_type_lookup(u64 start, u64 end)
while (repeat) {
prev_type = type;
start = partial_end;
- type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
+ is_uniform = 0;
+ type = mtrr_type_lookup_variable(start, end, &partial_end,
+ &repeat, &dummy);
if (check_type_overlap(&prev_type, &type))
- return type;
+ goto out;
}
if (mtrr_tom2 && (start >= (1ULL<<32)) && (end < mtrr_tom2))
- return MTRR_TYPE_WRBACK;
+ type = MTRR_TYPE_WRBACK;
+out:
+ *uniform = is_uniform;
return type;
}
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 35af6771a95a..372ad422c2c3 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -267,9 +267,9 @@ static unsigned long pat_x_mtrr_type(u64 start, u64 end,
* request is for WB.
*/
if (req_type == _PAGE_CACHE_MODE_WB) {
- u8 mtrr_type;
+ u8 mtrr_type, uniform;
- mtrr_type = mtrr_type_lookup(start, end);
+ mtrr_type = mtrr_type_lookup(start, end, &uniform);
if (mtrr_type != MTRR_TYPE_WRBACK)
return _PAGE_CACHE_MODE_UC_MINUS;
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index c30f9819786b..fb0a9dd1d6e4 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -566,19 +566,28 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
/**
* pud_set_huge - setup kernel PUD mapping
*
- * MTRR can override PAT memory types with 4KiB granularity. Therefore,
- * this function does not set up a huge page when the range is covered
- * by a non-WB type of MTRR. MTRR_TYPE_INVALID indicates that MTRR are
- * disabled.
+ * MTRRs can override PAT memory types with 4KiB granularity. Therefore, this
+ * function sets up a huge page only if any of the following conditions are met:
+ *
+ * - MTRRs are disabled, or
+ *
+ * - MTRRs are enabled and the range is completely covered by a single MTRR, or
+ *
+ * - MTRRs are enabled and the corresponding MTRR memory type is WB, which
+ * has no effect on the requested PAT memory type.
+ *
+ * Callers should try to decrease page size (1GB -> 2MB -> 4K) if the bigger
+ * page mapping attempt fails.
*
* Returns 1 on success and 0 on failure.
*/
int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
{
- u8 mtrr;
+ u8 mtrr, uniform;
- mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
- if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
+ mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
+ if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
+ (mtrr != MTRR_TYPE_WRBACK))
return 0;
prot = pgprot_4k_2_large(prot);
@@ -593,20 +602,21 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
/**
* pmd_set_huge - setup kernel PMD mapping
*
- * MTRR can override PAT memory types with 4KiB granularity. Therefore,
- * this function does not set up a huge page when the range is covered
- * by a non-WB type of MTRR. MTRR_TYPE_INVALID indicates that MTRR are
- * disabled.
+ * See text over pud_set_huge() above.
*
* Returns 1 on success and 0 on failure.
*/
int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
{
- u8 mtrr;
+ u8 mtrr, uniform;
- mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE);
- if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
+ mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
+ if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
+ (mtrr != MTRR_TYPE_WRBACK)) {
+ pr_warn_once("%s: Cannot satisfy [mem %#010llx-%#010llx] with a huge-page mapping due to MTRR override.\n",
+ __func__, addr, addr + PMD_SIZE);
return 0;
+ }
prot = pgprot_4k_2_large(prot);
--
1.9.0.258.g00eda23
^ permalink raw reply related [flat|nested] 400+ messages in thread
* [tip:x86/mm] x86/mm/mtrr: Enhance MTRR checks in kernel mapping helpers
2015-05-26 8:28 ` [PATCH 07/18] x86/mm: Enhance MTRR checks in kernel mapping helpers Borislav Petkov
@ 2015-05-27 14:19 ` tip-bot for Toshi Kani
0 siblings, 0 replies; 400+ messages in thread
From: tip-bot for Toshi Kani @ 2015-05-27 14:19 UTC (permalink / raw)
To: linux-tip-commits
Cc: luto, toshi.kani, torvalds, dvlasenk, brgerst, peterz, bp, tglx,
linux-mm, mcgrof, akpm, linux-kernel, hpa, bp, mingo
Commit-ID: b73522e0c1be58d3c69b124985b8ccf94e3677f7
Gitweb: http://git.kernel.org/tip/b73522e0c1be58d3c69b124985b8ccf94e3677f7
Author: Toshi Kani <toshi.kani@hp.com>
AuthorDate: Tue, 26 May 2015 10:28:10 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:40:58 +0200
x86/mm/mtrr: Enhance MTRR checks in kernel mapping helpers
This patch adds the argument 'uniform' to mtrr_type_lookup(),
which gets set to 1 when a given range is covered uniformly by
MTRRs, i.e. the range is fully covered by a single MTRR entry or
the default type.
Change pud_set_huge() and pmd_set_huge() to honor the 'uniform'
flag to see if it is safe to create a huge page mapping in the
range.
This allows them to create a huge page mapping in a range
covered by a single MTRR entry of any memory type. It also
detects a non-optimal request properly. They continue to check
with the WB type since it does not effectively change the
uniform mapping even if a request spans multiple MTRR entries.
pmd_set_huge() logs a warning message for a non-optimal request
so that driver writers will be aware of such a case. Drivers
should make a mapping request aligned to a single MTRR entry
when the range is covered by MTRRs.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
[ Realign, flesh out comments, improve warning message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-7-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-8-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/include/asm/mtrr.h | 4 ++--
arch/x86/kernel/cpu/mtrr/generic.c | 40 ++++++++++++++++++++++++++++----------
arch/x86/mm/pat.c | 4 ++--
arch/x86/mm/pgtable.c | 38 +++++++++++++++++++++++-------------
4 files changed, 58 insertions(+), 28 deletions(-)
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index bb03a54..a31759e 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -31,7 +31,7 @@
* arch_phys_wc_add and arch_phys_wc_del.
*/
# ifdef CONFIG_MTRR
-extern u8 mtrr_type_lookup(u64 addr, u64 end);
+extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
extern void mtrr_save_fixed_ranges(void *);
extern void mtrr_save_state(void);
extern int mtrr_add(unsigned long base, unsigned long size,
@@ -50,7 +50,7 @@ extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
extern int amd_special_default_mtrr(void);
extern int phys_wc_to_mtrr_index(int handle);
# else
-static inline u8 mtrr_type_lookup(u64 addr, u64 end)
+static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
{
/*
* Return no-MTRRs:
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index e51100c..f782d9b 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -147,19 +147,24 @@ static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
* Return Value:
* MTRR_TYPE_(type) - Matched memory type or default memory type (unmatched)
*
- * Output Argument:
+ * Output Arguments:
* repeat - Set to 1 when [start:end] spanned across MTRR range and type
* returned corresponds only to [start:*partial_end]. Caller has
* to lookup again for [*partial_end:end].
+ *
+ * uniform - Set to 1 when an MTRR covers the region uniformly, i.e. the
+ * region is fully covered by a single MTRR entry or the default
+ * type.
*/
static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
- int *repeat)
+ int *repeat, u8 *uniform)
{
int i;
u64 base, mask;
u8 prev_match, curr_match;
*repeat = 0;
+ *uniform = 1;
/* Make end inclusive instead of exclusive */
end--;
@@ -214,6 +219,7 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
end = *partial_end - 1; /* end is inclusive */
*repeat = 1;
+ *uniform = 0;
}
if ((start & mask) != (base & mask))
@@ -225,6 +231,7 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
continue;
}
+ *uniform = 0;
if (check_type_overlap(&prev_match, &curr_match))
return curr_match;
}
@@ -241,10 +248,15 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
* Return Values:
* MTRR_TYPE_(type) - The effective MTRR type for the region
* MTRR_TYPE_INVALID - MTRR is disabled
+ *
+ * Output Argument:
+ * uniform - Set to 1 when an MTRR covers the region uniformly, i.e. the
+ * region is fully covered by a single MTRR entry or the default
+ * type.
*/
-u8 mtrr_type_lookup(u64 start, u64 end)
+u8 mtrr_type_lookup(u64 start, u64 end, u8 *uniform)
{
- u8 type, prev_type;
+ u8 type, prev_type, is_uniform = 1, dummy;
int repeat;
u64 partial_end;
@@ -260,14 +272,18 @@ u8 mtrr_type_lookup(u64 start, u64 end)
*/
if ((start < 0x100000) &&
(mtrr_state.have_fixed) &&
- (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
- return mtrr_type_lookup_fixed(start, end);
+ (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
+ is_uniform = 0;
+ type = mtrr_type_lookup_fixed(start, end);
+ goto out;
+ }
/*
* Look up the variable ranges. Look of multiple ranges matching
* this address and pick type as per MTRR precedence.
*/
- type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
+ type = mtrr_type_lookup_variable(start, end, &partial_end,
+ &repeat, &is_uniform);
/*
* Common path is with repeat = 0.
@@ -278,15 +294,19 @@ u8 mtrr_type_lookup(u64 start, u64 end)
while (repeat) {
prev_type = type;
start = partial_end;
- type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
+ is_uniform = 0;
+ type = mtrr_type_lookup_variable(start, end, &partial_end,
+ &repeat, &dummy);
if (check_type_overlap(&prev_type, &type))
- return type;
+ goto out;
}
if (mtrr_tom2 && (start >= (1ULL<<32)) && (end < mtrr_tom2))
- return MTRR_TYPE_WRBACK;
+ type = MTRR_TYPE_WRBACK;
+out:
+ *uniform = is_uniform;
return type;
}
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 35af677..372ad42 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -267,9 +267,9 @@ static unsigned long pat_x_mtrr_type(u64 start, u64 end,
* request is for WB.
*/
if (req_type == _PAGE_CACHE_MODE_WB) {
- u8 mtrr_type;
+ u8 mtrr_type, uniform;
- mtrr_type = mtrr_type_lookup(start, end);
+ mtrr_type = mtrr_type_lookup(start, end, &uniform);
if (mtrr_type != MTRR_TYPE_WRBACK)
return _PAGE_CACHE_MODE_UC_MINUS;
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index c30f981..fb0a9dd 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -566,19 +566,28 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
/**
* pud_set_huge - setup kernel PUD mapping
*
- * MTRR can override PAT memory types with 4KiB granularity. Therefore,
- * this function does not set up a huge page when the range is covered
- * by a non-WB type of MTRR. MTRR_TYPE_INVALID indicates that MTRR are
- * disabled.
+ * MTRRs can override PAT memory types with 4KiB granularity. Therefore, this
+ * function sets up a huge page only if any of the following conditions are met:
+ *
+ * - MTRRs are disabled, or
+ *
+ * - MTRRs are enabled and the range is completely covered by a single MTRR, or
+ *
+ * - MTRRs are enabled and the corresponding MTRR memory type is WB, which
+ * has no effect on the requested PAT memory type.
+ *
+ * Callers should try to decrease page size (1GB -> 2MB -> 4K) if the bigger
+ * page mapping attempt fails.
*
* Returns 1 on success and 0 on failure.
*/
int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
{
- u8 mtrr;
+ u8 mtrr, uniform;
- mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
- if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
+ mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
+ if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
+ (mtrr != MTRR_TYPE_WRBACK))
return 0;
prot = pgprot_4k_2_large(prot);
@@ -593,20 +602,21 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
/**
* pmd_set_huge - setup kernel PMD mapping
*
- * MTRR can override PAT memory types with 4KiB granularity. Therefore,
- * this function does not set up a huge page when the range is covered
- * by a non-WB type of MTRR. MTRR_TYPE_INVALID indicates that MTRR are
- * disabled.
+ * See text over pud_set_huge() above.
*
* Returns 1 on success and 0 on failure.
*/
int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
{
- u8 mtrr;
+ u8 mtrr, uniform;
- mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE);
- if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
+ mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
+ if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
+ (mtrr != MTRR_TYPE_WRBACK)) {
+ pr_warn_once("%s: Cannot satisfy [mem %#010llx-%#010llx] with a huge-page mapping due to MTRR override.\n",
+ __func__, addr, addr + PMD_SIZE);
return 0;
+ }
prot = pgprot_4k_2_large(prot);
* [PATCH 08/18] x86/mm/pat: Convert to pr_* usage
2015-05-26 8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
` (6 preceding siblings ...)
2015-05-26 8:28 ` [PATCH 07/18] x86/mm: Enhance MTRR checks in kernel mapping helpers Borislav Petkov
@ 2015-05-26 8:28 ` Borislav Petkov
2015-05-27 14:19 ` [tip:x86/mm] x86/mm/pat: Convert to pr_*() usage tip-bot for Luis R. Rodriguez
2015-05-26 8:28 ` [PATCH 09/18] x86: Document Write Combining MTRR type effects on PAT / non-PAT pages Borislav Petkov
` (9 subsequent siblings)
17 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-05-26 8:28 UTC (permalink / raw)
To: Ingo Molnar; +Cc: X86-ML, LKML
From: "Luis R. Rodriguez" <mcgrof@suse.com>
Use pr_info() instead of the old printk to prefix the component where
things are coming from. With this readers will know exactly where the
message is coming from. We use pr_* helpers but define pr_fmt to the
empty string for easier grepping for those error messages.
We leave the users of dprintk() in place, this will print only when the
debugpat kernel parameter is enabled. We want to leave those enabled as
a debug feature, but also make them use the same prefix.
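How the empty pr_fmt and the explicit prefix combine can be tried out in user space. This is only a sketch under stated assumptions: pr_info_buf() and dprintk_buf() are hypothetical stand-ins that format into a buffer via snprintf() instead of calling printk(), and the ##__VA_ARGS__ form is a GNU extension accepted by gcc and clang.

```c
#include <stdio.h>

/* Empty prefix, as in the patch: the format string is passed through as-is. */
#undef pr_fmt
#define pr_fmt(fmt) "" fmt

/* Stand-in for pr_info(): formats into a buffer so the result can be inspected. */
#define pr_info_buf(buf, len, fmt, ...) \
	snprintf(buf, len, pr_fmt(fmt), ##__VA_ARGS__)

/* Models the flag set by the debugpat kernel parameter. */
static int pat_debug_enable;

/* Stand-in for dprintk(): same "x86/PAT: " prefix, active only when enabled. */
#define dprintk_buf(buf, len, fmt, ...) \
	do { \
		if (pat_debug_enable) \
			pr_info_buf(buf, len, "x86/PAT: " fmt, ##__VA_ARGS__); \
	} while (0)
```

Because the prefix is spelled out in each format string rather than hidden in pr_fmt, the full message text appears literally in the source and can be found with grep.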
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: x86@kernel.org
Cc: cocci@systeme.lip6.fr
Link: http://lkml.kernel.org/r/1430425520-22275-2-git-send-email-mcgrof@do-not-panic.com
[ Kill pr_fmt. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
---
arch/x86/mm/pat.c | 44 ++++++++++++++++++++++----------------------
arch/x86/mm/pat_internal.h | 2 +-
arch/x86/mm/pat_rbtree.c | 6 +++---
3 files changed, 26 insertions(+), 26 deletions(-)
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 372ad422c2c3..8c50b9bfa996 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -33,13 +33,16 @@
#include "pat_internal.h"
#include "mm_internal.h"
+#undef pr_fmt
+#define pr_fmt(fmt) "" fmt
+
#ifdef CONFIG_X86_PAT
int __read_mostly pat_enabled = 1;
static inline void pat_disable(const char *reason)
{
pat_enabled = 0;
- printk(KERN_INFO "%s\n", reason);
+ pr_info("x86/PAT: %s\n", reason);
}
static int __init nopat(char *str)
@@ -188,7 +191,7 @@ void pat_init_cache_modes(void)
pat_msg + 4 * i);
update_cache_mode_entry(i, cache);
}
- pr_info("PAT configuration [0-7]: %s\n", pat_msg);
+ pr_info("x86/PAT: Configuration [0-7]: %s\n", pat_msg);
}
#define PAT(x, y) ((u64)PAT_ ## y << ((x)*8))
@@ -211,8 +214,7 @@ void pat_init(void)
* switched to PAT on the boot CPU. We have no way to
* undo PAT.
*/
- printk(KERN_ERR "PAT enabled, "
- "but not supported by secondary CPU\n");
+ pr_err("x86/PAT: PAT enabled, but not supported by secondary CPU\n");
BUG();
}
}
@@ -347,7 +349,7 @@ static int reserve_ram_pages_type(u64 start, u64 end,
page = pfn_to_page(pfn);
type = get_page_memtype(page);
if (type != -1) {
- pr_info("reserve_ram_pages_type failed [mem %#010Lx-%#010Lx], track 0x%x, req 0x%x\n",
+ pr_info("x86/PAT: reserve_ram_pages_type failed [mem %#010Lx-%#010Lx], track 0x%x, req 0x%x\n",
start, end - 1, type, req_type);
if (new_type)
*new_type = type;
@@ -451,9 +453,9 @@ int reserve_memtype(u64 start, u64 end, enum page_cache_mode req_type,
err = rbt_memtype_check_insert(new, new_type);
if (err) {
- printk(KERN_INFO "reserve_memtype failed [mem %#010Lx-%#010Lx], track %s, req %s\n",
- start, end - 1,
- cattr_name(new->type), cattr_name(req_type));
+ pr_info("x86/PAT: reserve_memtype failed [mem %#010Lx-%#010Lx], track %s, req %s\n",
+ start, end - 1,
+ cattr_name(new->type), cattr_name(req_type));
kfree(new);
spin_unlock(&memtype_lock);
@@ -497,8 +499,8 @@ int free_memtype(u64 start, u64 end)
spin_unlock(&memtype_lock);
if (!entry) {
- printk(KERN_INFO "%s:%d freeing invalid memtype [mem %#010Lx-%#010Lx]\n",
- current->comm, current->pid, start, end - 1);
+ pr_info("x86/PAT: %s:%d freeing invalid memtype [mem %#010Lx-%#010Lx]\n",
+ current->comm, current->pid, start, end - 1);
return -EINVAL;
}
@@ -628,8 +630,8 @@ static inline int range_is_allowed(unsigned long pfn, unsigned long size)
while (cursor < to) {
if (!devmem_is_allowed(pfn)) {
- printk(KERN_INFO "Program %s tried to access /dev/mem between [mem %#010Lx-%#010Lx], PAT prevents it\n",
- current->comm, from, to - 1);
+ pr_info("x86/PAT: Program %s tried to access /dev/mem between [mem %#010Lx-%#010Lx], PAT prevents it\n",
+ current->comm, from, to - 1);
return 0;
}
cursor += PAGE_SIZE;
@@ -698,8 +700,7 @@ int kernel_map_sync_memtype(u64 base, unsigned long size,
size;
if (ioremap_change_attr((unsigned long)__va(base), id_sz, pcm) < 0) {
- printk(KERN_INFO "%s:%d ioremap_change_attr failed %s "
- "for [mem %#010Lx-%#010Lx]\n",
+ pr_info("x86/PAT: %s:%d ioremap_change_attr failed %s for [mem %#010Lx-%#010Lx]\n",
current->comm, current->pid,
cattr_name(pcm),
base, (unsigned long long)(base + size-1));
@@ -734,7 +735,7 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
pcm = lookup_memtype(paddr);
if (want_pcm != pcm) {
- printk(KERN_WARNING "%s:%d map pfn RAM range req %s for [mem %#010Lx-%#010Lx], got %s\n",
+ pr_warn("x86/PAT: %s:%d map pfn RAM range req %s for [mem %#010Lx-%#010Lx], got %s\n",
current->comm, current->pid,
cattr_name(want_pcm),
(unsigned long long)paddr,
@@ -755,13 +756,12 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
if (strict_prot ||
!is_new_memtype_allowed(paddr, size, want_pcm, pcm)) {
free_memtype(paddr, paddr + size);
- printk(KERN_ERR "%s:%d map pfn expected mapping type %s"
- " for [mem %#010Lx-%#010Lx], got %s\n",
- current->comm, current->pid,
- cattr_name(want_pcm),
- (unsigned long long)paddr,
- (unsigned long long)(paddr + size - 1),
- cattr_name(pcm));
+ pr_err("x86/PAT: %s:%d map pfn expected mapping type %s for [mem %#010Lx-%#010Lx], got %s\n",
+ current->comm, current->pid,
+ cattr_name(want_pcm),
+ (unsigned long long)paddr,
+ (unsigned long long)(paddr + size - 1),
+ cattr_name(pcm));
return -EINVAL;
}
/*
diff --git a/arch/x86/mm/pat_internal.h b/arch/x86/mm/pat_internal.h
index f6411620305d..a739bfc40690 100644
--- a/arch/x86/mm/pat_internal.h
+++ b/arch/x86/mm/pat_internal.h
@@ -4,7 +4,7 @@
extern int pat_debug_enable;
#define dprintk(fmt, arg...) \
- do { if (pat_debug_enable) printk(KERN_INFO fmt, ##arg); } while (0)
+ do { if (pat_debug_enable) pr_info("x86/PAT: " fmt, ##arg); } while (0)
struct memtype {
u64 start;
diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
index 6582adcc8bd9..63931080366a 100644
--- a/arch/x86/mm/pat_rbtree.c
+++ b/arch/x86/mm/pat_rbtree.c
@@ -160,9 +160,9 @@ success:
return 0;
failure:
- printk(KERN_INFO "%s:%d conflicting memory types "
- "%Lx-%Lx %s<->%s\n", current->comm, current->pid, start,
- end, cattr_name(found_type), cattr_name(match->type));
+ pr_info("x86/PAT: %s:%d conflicting memory types %Lx-%Lx %s<->%s\n",
+ current->comm, current->pid, start, end,
+ cattr_name(found_type), cattr_name(match->type));
return -EBUSY;
}
--
1.9.0.258.g00eda23
* [tip:x86/mm] x86/mm/pat: Convert to pr_*() usage
2015-05-26 8:28 ` [PATCH 08/18] x86/mm/pat: Convert to pr_* usage Borislav Petkov
@ 2015-05-27 14:19 ` tip-bot for Luis R. Rodriguez
0 siblings, 0 replies; 400+ messages in thread
From: tip-bot for Luis R. Rodriguez @ 2015-05-27 14:19 UTC (permalink / raw)
To: linux-tip-commits
Cc: mst, torvalds, mcgrof, mingo, airlied, linux-kernel, luto,
brgerst, awalls, peterz, hpa, bp, bp, jgross, bhelgaas, tglx,
dledford, dvlasenk, daniel.vetter
Commit-ID: 9e76561f6a8a1a1c4f3152a3fb403ef9d6cfc2ff
Gitweb: http://git.kernel.org/tip/9e76561f6a8a1a1c4f3152a3fb403ef9d6cfc2ff
Author: Luis R. Rodriguez <mcgrof@suse.com>
AuthorDate: Tue, 26 May 2015 10:28:11 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:40:59 +0200
x86/mm/pat: Convert to pr_*() usage
Use pr_info() instead of the old printk to prefix the component
where things are coming from. With this readers will know
exactly where the message is coming from. We use pr_* helpers
but define pr_fmt to the empty string for easier grepping for
those error messages.
We leave the users of dprintk() in place, this will print only
when the debugpat kernel parameter is enabled. We want to leave
those enabled as a debug feature, but also make them use the
same prefix.
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
[ Kill pr_fmt. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: cocci@systeme.lip6.fr
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Link: http://lkml.kernel.org/r/1430425520-22275-2-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-9-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/mm/pat.c | 44 ++++++++++++++++++++++----------------------
arch/x86/mm/pat_internal.h | 2 +-
arch/x86/mm/pat_rbtree.c | 6 +++---
3 files changed, 26 insertions(+), 26 deletions(-)
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 372ad42..8c50b9b 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -33,13 +33,16 @@
#include "pat_internal.h"
#include "mm_internal.h"
+#undef pr_fmt
+#define pr_fmt(fmt) "" fmt
+
#ifdef CONFIG_X86_PAT
int __read_mostly pat_enabled = 1;
static inline void pat_disable(const char *reason)
{
pat_enabled = 0;
- printk(KERN_INFO "%s\n", reason);
+ pr_info("x86/PAT: %s\n", reason);
}
static int __init nopat(char *str)
@@ -188,7 +191,7 @@ void pat_init_cache_modes(void)
pat_msg + 4 * i);
update_cache_mode_entry(i, cache);
}
- pr_info("PAT configuration [0-7]: %s\n", pat_msg);
+ pr_info("x86/PAT: Configuration [0-7]: %s\n", pat_msg);
}
#define PAT(x, y) ((u64)PAT_ ## y << ((x)*8))
@@ -211,8 +214,7 @@ void pat_init(void)
* switched to PAT on the boot CPU. We have no way to
* undo PAT.
*/
- printk(KERN_ERR "PAT enabled, "
- "but not supported by secondary CPU\n");
+ pr_err("x86/PAT: PAT enabled, but not supported by secondary CPU\n");
BUG();
}
}
@@ -347,7 +349,7 @@ static int reserve_ram_pages_type(u64 start, u64 end,
page = pfn_to_page(pfn);
type = get_page_memtype(page);
if (type != -1) {
- pr_info("reserve_ram_pages_type failed [mem %#010Lx-%#010Lx], track 0x%x, req 0x%x\n",
+ pr_info("x86/PAT: reserve_ram_pages_type failed [mem %#010Lx-%#010Lx], track 0x%x, req 0x%x\n",
start, end - 1, type, req_type);
if (new_type)
*new_type = type;
@@ -451,9 +453,9 @@ int reserve_memtype(u64 start, u64 end, enum page_cache_mode req_type,
err = rbt_memtype_check_insert(new, new_type);
if (err) {
- printk(KERN_INFO "reserve_memtype failed [mem %#010Lx-%#010Lx], track %s, req %s\n",
- start, end - 1,
- cattr_name(new->type), cattr_name(req_type));
+ pr_info("x86/PAT: reserve_memtype failed [mem %#010Lx-%#010Lx], track %s, req %s\n",
+ start, end - 1,
+ cattr_name(new->type), cattr_name(req_type));
kfree(new);
spin_unlock(&memtype_lock);
@@ -497,8 +499,8 @@ int free_memtype(u64 start, u64 end)
spin_unlock(&memtype_lock);
if (!entry) {
- printk(KERN_INFO "%s:%d freeing invalid memtype [mem %#010Lx-%#010Lx]\n",
- current->comm, current->pid, start, end - 1);
+ pr_info("x86/PAT: %s:%d freeing invalid memtype [mem %#010Lx-%#010Lx]\n",
+ current->comm, current->pid, start, end - 1);
return -EINVAL;
}
@@ -628,8 +630,8 @@ static inline int range_is_allowed(unsigned long pfn, unsigned long size)
while (cursor < to) {
if (!devmem_is_allowed(pfn)) {
- printk(KERN_INFO "Program %s tried to access /dev/mem between [mem %#010Lx-%#010Lx], PAT prevents it\n",
- current->comm, from, to - 1);
+ pr_info("x86/PAT: Program %s tried to access /dev/mem between [mem %#010Lx-%#010Lx], PAT prevents it\n",
+ current->comm, from, to - 1);
return 0;
}
cursor += PAGE_SIZE;
@@ -698,8 +700,7 @@ int kernel_map_sync_memtype(u64 base, unsigned long size,
size;
if (ioremap_change_attr((unsigned long)__va(base), id_sz, pcm) < 0) {
- printk(KERN_INFO "%s:%d ioremap_change_attr failed %s "
- "for [mem %#010Lx-%#010Lx]\n",
+ pr_info("x86/PAT: %s:%d ioremap_change_attr failed %s for [mem %#010Lx-%#010Lx]\n",
current->comm, current->pid,
cattr_name(pcm),
base, (unsigned long long)(base + size-1));
@@ -734,7 +735,7 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
pcm = lookup_memtype(paddr);
if (want_pcm != pcm) {
- printk(KERN_WARNING "%s:%d map pfn RAM range req %s for [mem %#010Lx-%#010Lx], got %s\n",
+ pr_warn("x86/PAT: %s:%d map pfn RAM range req %s for [mem %#010Lx-%#010Lx], got %s\n",
current->comm, current->pid,
cattr_name(want_pcm),
(unsigned long long)paddr,
@@ -755,13 +756,12 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
if (strict_prot ||
!is_new_memtype_allowed(paddr, size, want_pcm, pcm)) {
free_memtype(paddr, paddr + size);
- printk(KERN_ERR "%s:%d map pfn expected mapping type %s"
- " for [mem %#010Lx-%#010Lx], got %s\n",
- current->comm, current->pid,
- cattr_name(want_pcm),
- (unsigned long long)paddr,
- (unsigned long long)(paddr + size - 1),
- cattr_name(pcm));
+ pr_err("x86/PAT: %s:%d map pfn expected mapping type %s for [mem %#010Lx-%#010Lx], got %s\n",
+ current->comm, current->pid,
+ cattr_name(want_pcm),
+ (unsigned long long)paddr,
+ (unsigned long long)(paddr + size - 1),
+ cattr_name(pcm));
return -EINVAL;
}
/*
diff --git a/arch/x86/mm/pat_internal.h b/arch/x86/mm/pat_internal.h
index f641162..a739bfc 100644
--- a/arch/x86/mm/pat_internal.h
+++ b/arch/x86/mm/pat_internal.h
@@ -4,7 +4,7 @@
extern int pat_debug_enable;
#define dprintk(fmt, arg...) \
- do { if (pat_debug_enable) printk(KERN_INFO fmt, ##arg); } while (0)
+ do { if (pat_debug_enable) pr_info("x86/PAT: " fmt, ##arg); } while (0)
struct memtype {
u64 start;
diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
index 6582adc..6393108 100644
--- a/arch/x86/mm/pat_rbtree.c
+++ b/arch/x86/mm/pat_rbtree.c
@@ -160,9 +160,9 @@ success:
return 0;
failure:
- printk(KERN_INFO "%s:%d conflicting memory types "
- "%Lx-%Lx %s<->%s\n", current->comm, current->pid, start,
- end, cattr_name(found_type), cattr_name(match->type));
+ pr_info("x86/PAT: %s:%d conflicting memory types %Lx-%Lx %s<->%s\n",
+ current->comm, current->pid, start, end,
+ cattr_name(found_type), cattr_name(match->type));
return -EBUSY;
}
* [PATCH 09/18] x86: Document Write Combining MTRR type effects on PAT / non-PAT pages
2015-05-26 8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
` (7 preceding siblings ...)
2015-05-26 8:28 ` [PATCH 08/18] x86/mm/pat: Convert to pr_* usage Borislav Petkov
@ 2015-05-26 8:28 ` Borislav Petkov
2015-05-27 14:19 ` [tip:x86/mm] x86/mm/mtrr, pat: " tip-bot for Luis R. Rodriguez
2015-05-26 8:28 ` [PATCH 10/18] x86/mtrr: Avoid ifdeffery with phys_wc_to_mtrr_index() Borislav Petkov
` (8 subsequent siblings)
17 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-05-26 8:28 UTC (permalink / raw)
To: Ingo Molnar; +Cc: X86-ML, LKML
From: "Luis R. Rodriguez" <mcgrof@suse.com>
As part of the effort to phase out MTRR use, document write-combining
MTRR effects on pages with different non-PAT page attributes flags and
different PAT entry values. Extend arch_phys_wc_add() documentation
to clarify power of two sizes / boundary requirements as we phase out
mtrr_add() use.
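The size and boundary requirement can be checked before a driver requests a WC MTRR. The following is a minimal sketch, assuming the requirement means a power-of-two size with the base aligned to that size; is_power_of_two() and wc_range_ok() are hypothetical helper names and not part of the kernel API.

```c
#include <stdint.h>

/* Nonzero when x is a power of two. */
static int is_power_of_two(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical pre-check: the size must be a power of two and the
 * base must be aligned to that size.
 */
static int wc_range_ok(uint64_t base, uint64_t size)
{
	return is_power_of_two(size) && (base & (size - 1)) == 0;
}
```

For example, a 16MB region at 0xd0000000 passes the check, while the same size at 0xd0800000 or a 3MB region at any base does not.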
Lastly hint towards ioremap_uc() for corner cases on device drivers
working with devices with mixed regions where MTRR size requirements
would otherwise not enable write-combining effective memory types.
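The effective memory types tabulated in the pat.txt addition of this patch can also be written down as a small lookup table. This sketch only restates the four rows for a region covered by a WC MTRR as they appear in the patch; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* One row of the documentation table, for a region covered by a WC MTRR. */
struct wc_mtrr_effect {
	const char *pat_entry;		/* PAT entry value column */
	const char *eff_non_pat;	/* effective type on a non-PAT system */
	const char *eff_pat;		/* effective type on a PAT system */
	int impl_defined;		/* 1 where the non-PAT result is marked (*) */
};

static const struct wc_mtrr_effect wc_mtrr_effects[] = {
	{ "WB",  "WC", "WC", 0 },
	{ "WC",  "WC", "WC", 1 },
	{ "UC-", "WC", "UC", 1 },
	{ "UC",  "UC", "UC", 0 },
};

/* Returns the row for the given PAT entry value, or NULL if it is not listed. */
static const struct wc_mtrr_effect *wc_mtrr_lookup(const char *pat_entry)
{
	size_t i;

	for (i = 0; i < sizeof(wc_mtrr_effects) / sizeof(wc_mtrr_effects[0]); i++) {
		if (strcmp(wc_mtrr_effects[i].pat_entry, pat_entry) == 0)
			return &wc_mtrr_effects[i];
	}
	return NULL;
}
```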
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: linux-fbdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1430343851-967-3-git-send-email-mcgrof@do-not-panic.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
Documentation/x86/mtrr.txt | 18 +++++++++++++++---
Documentation/x86/pat.txt | 35 ++++++++++++++++++++++++++++++++++-
arch/x86/kernel/cpu/mtrr/main.c | 3 +++
3 files changed, 52 insertions(+), 4 deletions(-)
diff --git a/Documentation/x86/mtrr.txt b/Documentation/x86/mtrr.txt
index cc071dc333c2..860bc3adc223 100644
--- a/Documentation/x86/mtrr.txt
+++ b/Documentation/x86/mtrr.txt
@@ -1,7 +1,19 @@
MTRR (Memory Type Range Register) control
-3 Jun 1999
-Richard Gooch
-<rgooch@atnf.csiro.au>
+
+Richard Gooch <rgooch@atnf.csiro.au> - 3 Jun 1999
+Luis R. Rodriguez <mcgrof@do-not-panic.com> - April 9, 2015
+
+===============================================================================
+Phasing out MTRR use
+
+MTRR use is replaced on modern x86 hardware with PAT. Over time the only type
+of effective MTRR that is expected to be supported will be for write-combining.
+As MTRR use is phased out, device drivers should use arch_phys_wc_add(), which
+adds a WC MTRR on non-PAT systems and is a no-op on PAT enabled systems.
+
+For details refer to Documentation/x86/pat.txt.
+
+===============================================================================
On Intel P6 family processors (Pentium Pro, Pentium II and later)
the Memory Type Range Registers (MTRRs) may be used to control
diff --git a/Documentation/x86/pat.txt b/Documentation/x86/pat.txt
index cf08c9fff3cd..521bd8adc3b8 100644
--- a/Documentation/x86/pat.txt
+++ b/Documentation/x86/pat.txt
@@ -34,6 +34,8 @@ ioremap | -- | UC- | UC- |
| | | |
ioremap_cache | -- | WB | WB |
| | | |
+ioremap_uc | -- | UC | UC |
+ | | | |
ioremap_nocache | -- | UC- | UC- |
| | | |
ioremap_wc | -- | -- | WC |
@@ -102,7 +104,38 @@ wants to export a RAM region, it has to do set_memory_uc() or set_memory_wc()
as step 0 above and also track the usage of those pages and use set_memory_wb()
before the page is freed to free pool.
-
+MTRR effects on PAT / non-PAT systems
+-------------------------------------
+
+The following table provides the effects of using write-combining MTRRs when
+using ioremap*() calls on x86 for both non-PAT and PAT systems. Ideally
+mtrr_add() usage will be phased out in favor of arch_phys_wc_add() which will
+be a no-op on PAT enabled systems. The region over which an arch_phys_wc_add()
+is made should already have been ioremapped with WC attributes or PAT entries;
+this can be done by using ioremap_wc() / set_memory_wc(). Devices which
+combine areas of IO memory desired to remain uncacheable with areas where
+write-combining is desirable should consider use of ioremap_uc() followed by
+set_memory_wc() to white-list effective write-combined areas. Such use is
+nevertheless discouraged as the effective memory type is considered
+implementation defined, yet this strategy can be used as last resort on devices
+with size-constrained regions where MTRR write-combining would otherwise not
+be effective.
+
+----------------------------------------------------------------------
+MTRR Non-PAT PAT Linux ioremap value Effective memory type
+----------------------------------------------------------------------
+ Non-PAT | PAT
+ PAT
+ |PCD
+ ||PWT
+ |||
+WC 000 WB _PAGE_CACHE_MODE_WB WC | WC
+WC 001 WC _PAGE_CACHE_MODE_WC WC* | WC
+WC 010 UC- _PAGE_CACHE_MODE_UC_MINUS WC* | UC
+WC 011 UC _PAGE_CACHE_MODE_UC UC | UC
+----------------------------------------------------------------------
+
+(*) denotes implementation defined and is discouraged
Notes:
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index ea5f363a1948..04aceb7e6443 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -538,6 +538,9 @@ EXPORT_SYMBOL(mtrr_del);
* attempts to add a WC MTRR covering size bytes starting at base and
* logs an error if this fails.
*
+ * The caller should provide a power of two size on an equivalent
+ * power of two boundary.
+ *
* Drivers must store the return value to pass to mtrr_del_wc_if_needed,
* but drivers should not try to interpret that return value.
*/
--
1.9.0.258.g00eda23
* [tip:x86/mm] x86/mm/mtrr, pat: Document Write Combining MTRR type effects on PAT / non-PAT pages
2015-05-26 8:28 ` [PATCH 09/18] x86: Document Write Combining MTRR type effects on PAT / non-PAT pages Borislav Petkov
@ 2015-05-27 14:19 ` tip-bot for Luis R. Rodriguez
0 siblings, 0 replies; 400+ messages in thread
From: tip-bot for Luis R. Rodriguez @ 2015-05-27 14:19 UTC (permalink / raw)
To: linux-tip-commits
Cc: adaplas, sbsiddha, plagnioj, airlied, vbabka, dbueso, luto,
peterz, bp, corbet, tomi.valkeinen, hpa, mcgrof, brgerst,
daniel.vetter, jgross, mingo, bp, linux-kernel, dvlasenk,
torvalds, dave.hansen, tglx, mgorman, syrjala
Commit-ID: 2f9e897353fcb99effd6eff22f7b464f8e2a659a
Gitweb: http://git.kernel.org/tip/2f9e897353fcb99effd6eff22f7b464f8e2a659a
Author: Luis R. Rodriguez <mcgrof@suse.com>
AuthorDate: Tue, 26 May 2015 10:28:12 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:40:59 +0200
x86/mm/mtrr, pat: Document Write Combining MTRR type effects on PAT / non-PAT pages
As part of the effort to phase out MTRR use, document
write-combining MTRR effects on pages with different non-PAT
page attributes flags and different PAT entry values. Extend
arch_phys_wc_add() documentation to clarify power of two sizes /
boundary requirements as we phase out mtrr_add() use.
Lastly hint towards ioremap_uc() for corner cases on device
drivers working with devices with mixed regions where MTRR size
requirements would otherwise not enable write-combining
effective memory types.
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-fbdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1430343851-967-3-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-10-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
Documentation/x86/mtrr.txt | 18 +++++++++++++++---
Documentation/x86/pat.txt | 35 ++++++++++++++++++++++++++++++++++-
arch/x86/kernel/cpu/mtrr/main.c | 3 +++
3 files changed, 52 insertions(+), 4 deletions(-)
diff --git a/Documentation/x86/mtrr.txt b/Documentation/x86/mtrr.txt
index cc071dc..860bc3a 100644
--- a/Documentation/x86/mtrr.txt
+++ b/Documentation/x86/mtrr.txt
@@ -1,7 +1,19 @@
MTRR (Memory Type Range Register) control
-3 Jun 1999
-Richard Gooch
-<rgooch@atnf.csiro.au>
+
+Richard Gooch <rgooch@atnf.csiro.au> - 3 Jun 1999
+Luis R. Rodriguez <mcgrof@do-not-panic.com> - April 9, 2015
+
+===============================================================================
+Phasing out MTRR use
+
+MTRR use is replaced on modern x86 hardware with PAT. Over time the only type
+of effective MTRR that is expected to be supported will be for write-combining.
+As MTRR use is phased out device drivers should use arch_phys_wc_add() to make
+MTRR effective on non-PAT systems while a no-op on PAT enabled systems.
+
+For details refer to Documentation/x86/pat.txt.
+
+===============================================================================
On Intel P6 family processors (Pentium Pro, Pentium II and later)
the Memory Type Range Registers (MTRRs) may be used to control
diff --git a/Documentation/x86/pat.txt b/Documentation/x86/pat.txt
index cf08c9f..521bd8a 100644
--- a/Documentation/x86/pat.txt
+++ b/Documentation/x86/pat.txt
@@ -34,6 +34,8 @@ ioremap | -- | UC- | UC- |
| | | |
ioremap_cache | -- | WB | WB |
| | | |
+ioremap_uc | -- | UC | UC |
+ | | | |
ioremap_nocache | -- | UC- | UC- |
| | | |
ioremap_wc | -- | -- | WC |
@@ -102,7 +104,38 @@ wants to export a RAM region, it has to do set_memory_uc() or set_memory_wc()
as step 0 above and also track the usage of those pages and use set_memory_wb()
before the page is freed to free pool.
-
+MTRR effects on PAT / non-PAT systems
+-------------------------------------
+
+The following table provides the effects of using write-combining MTRRs when
+using ioremap*() calls on x86 for both non-PAT and PAT systems. Ideally
+mtrr_add() usage will be phased out in favor of arch_phys_wc_add() which will
+be a no-op on PAT enabled systems. The region over which an arch_phys_wc_add()
+call is made should already have been ioremapped with WC attributes or PAT entries;
+this can be done by using ioremap_wc() / set_memory_wc(). Devices which
+combine areas of IO memory desired to remain uncacheable with areas where
+write-combining is desirable should consider use of ioremap_uc() followed by
+set_memory_wc() to white-list effective write-combined areas. Such use is
+nevertheless discouraged as the effective memory type is considered
+implementation defined, yet this strategy can be used as last resort on devices
+with size-constrained regions where MTRR write-combining would otherwise
+not be effective.
+
+----------------------------------------------------------------------
+MTRR Non-PAT PAT Linux ioremap value Effective memory type
+----------------------------------------------------------------------
+ Non-PAT | PAT
+ PAT
+ |PCD
+ ||PWT
+ |||
+WC 000 WB _PAGE_CACHE_MODE_WB WC | WC
+WC 001 WC _PAGE_CACHE_MODE_WC WC* | WC
+WC 010 UC- _PAGE_CACHE_MODE_UC_MINUS WC* | UC
+WC 011 UC _PAGE_CACHE_MODE_UC UC | UC
+----------------------------------------------------------------------
+
+(*) denotes implementation defined and is discouraged
Notes:
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index ea5f363..04aceb7 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -538,6 +538,9 @@ EXPORT_SYMBOL(mtrr_del);
* attempts to add a WC MTRR covering size bytes starting at base and
* logs an error if this fails.
*
+ * The caller should provide a power of two size on an equivalent
+ * power of two boundary.
+ *
* Drivers must store the return value to pass to mtrr_del_wc_if_needed,
* but drivers should not try to interpret that return value.
*/
^ permalink raw reply related [flat|nested] 400+ messages in thread
* [PATCH 10/18] x86/mtrr: Avoid ifdeffery with phys_wc_to_mtrr_index()
2015-05-26 8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
` (8 preceding siblings ...)
2015-05-26 8:28 ` [PATCH 09/18] x86: Document Write Combining MTRR type effects on PAT / non-PAT pages Borislav Petkov
@ 2015-05-26 8:28 ` Borislav Petkov
2015-05-27 14:20 ` [tip:x86/mm] x86/mm/mtrr: Avoid #ifdeffery " tip-bot for Luis R. Rodriguez
2015-05-26 8:28 ` [PATCH 11/18] x86/mtrr: Generalize runtime disabling of MTRRs Borislav Petkov
` (7 subsequent siblings)
17 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-05-26 8:28 UTC (permalink / raw)
To: Ingo Molnar; +Cc: X86-ML, LKML
From: "Luis R. Rodriguez" <mcgrof@suse.com>
There is only one user, but since we are next going to bury MTRR out of
reach of drivers, expose this last piece of API to drivers in a general
fashion, so that only io.h is needed for access to the helpers.
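
The translation that arch_phys_wc_index() performs can be modelled in
userspace as below. This is only a sketch: model_phys_wc_index() is a
hypothetical name, and the offset value mirrors the
MTRR_TO_PHYS_WC_OFFSET definition visible in the main.c hunk of the
following patch in this series.

```c
/* Mirrors the offset defined in arch/x86/kernel/cpu/mtrr/main.c. */
#define MTRR_TO_PHYS_WC_OFFSET 1000

/*
 * Hypothetical userspace model of arch_phys_wc_index(): handles below
 * the offset (0 or a negative error code) map to -1, everything else
 * maps back to an MTRR register index.
 */
static int model_phys_wc_index(int handle)
{
	if (handle < MTRR_TO_PHYS_WC_OFFSET)
		return -1;

	return handle - MTRR_TO_PHYS_WC_OFFSET;
}
```

With CONFIG_MTRR disabled, or on other architectures, the generic
fallback in include/linux/io.h simply returns -1 for every handle.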
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Abhilash Kesavan <a.kesavan@samsung.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Cristian Stoica <cristian.stoica@freescale.com>
Cc: dri-devel@lists.freedesktop.org
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Link: http://lkml.kernel.org/r/1429722736-4473-1-git-send-email-mcgrof@do-not-panic.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
arch/x86/include/asm/io.h | 3 +++
arch/x86/include/asm/mtrr.h | 5 -----
arch/x86/kernel/cpu/mtrr/main.c | 6 +++---
drivers/gpu/drm/drm_ioctl.c | 14 +-------------
include/linux/io.h | 7 +++++++
5 files changed, 14 insertions(+), 21 deletions(-)
diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 4afc05ffa566..a2b97404922d 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -339,6 +339,9 @@ extern bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
#define IO_SPACE_LIMIT 0xffff
#ifdef CONFIG_MTRR
+extern int __must_check arch_phys_wc_index(int handle);
+#define arch_phys_wc_index arch_phys_wc_index
+
extern int __must_check arch_phys_wc_add(unsigned long base,
unsigned long size);
extern void arch_phys_wc_del(int handle);
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index a31759e1edd9..b94f6f64e23d 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -48,7 +48,6 @@ extern void mtrr_aps_init(void);
extern void mtrr_bp_restore(void);
extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
extern int amd_special_default_mtrr(void);
-extern int phys_wc_to_mtrr_index(int handle);
# else
static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
{
@@ -84,10 +83,6 @@ static inline int mtrr_trim_uncached_memory(unsigned long end_pfn)
static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
{
}
-static inline int phys_wc_to_mtrr_index(int handle)
-{
- return -1;
-}
#define mtrr_ap_init() do {} while (0)
#define mtrr_bp_init() do {} while (0)
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 04aceb7e6443..81baf5fee0e1 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -580,7 +580,7 @@ void arch_phys_wc_del(int handle)
EXPORT_SYMBOL(arch_phys_wc_del);
/*
- * phys_wc_to_mtrr_index - translates arch_phys_wc_add's return value
+ * arch_phys_wc_index - translates arch_phys_wc_add's return value
* @handle: Return value from arch_phys_wc_add
*
* This will turn the return value from arch_phys_wc_add into an mtrr
@@ -590,14 +590,14 @@ EXPORT_SYMBOL(arch_phys_wc_del);
* in printk line. Alas there is an illegitimate use in some ancient
* drm ioctls.
*/
-int phys_wc_to_mtrr_index(int handle)
+int arch_phys_wc_index(int handle)
{
if (handle < MTRR_TO_PHYS_WC_OFFSET)
return -1;
else
return handle - MTRR_TO_PHYS_WC_OFFSET;
}
-EXPORT_SYMBOL_GPL(phys_wc_to_mtrr_index);
+EXPORT_SYMBOL_GPL(arch_phys_wc_index);
/*
* HACK ALERT!
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 266dcd6cdf3b..0a957828b3bd 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -36,9 +36,6 @@
#include <linux/pci.h>
#include <linux/export.h>
-#ifdef CONFIG_X86
-#include <asm/mtrr.h>
-#endif
static int drm_version(struct drm_device *dev, void *data,
struct drm_file *file_priv);
@@ -197,16 +194,7 @@ static int drm_getmap(struct drm_device *dev, void *data,
map->type = r_list->map->type;
map->flags = r_list->map->flags;
map->handle = (void *)(unsigned long) r_list->user_token;
-
-#ifdef CONFIG_X86
- /*
- * There appears to be exactly one user of the mtrr index: dritest.
- * It's easy enough to keep it working on non-PAT systems.
- */
- map->mtrr = phys_wc_to_mtrr_index(r_list->map->mtrr);
-#else
- map->mtrr = -1;
-#endif
+ map->mtrr = arch_phys_wc_index(r_list->map->mtrr);
mutex_unlock(&dev->struct_mutex);
diff --git a/include/linux/io.h b/include/linux/io.h
index 986f2bffea1e..04cce4da3685 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -111,6 +111,13 @@ static inline void arch_phys_wc_del(int handle)
}
#define arch_phys_wc_add arch_phys_wc_add
+#ifndef arch_phys_wc_index
+static inline int arch_phys_wc_index(int handle)
+{
+ return -1;
+}
+#define arch_phys_wc_index arch_phys_wc_index
+#endif
#endif
#endif /* _LINUX_IO_H */
--
1.9.0.258.g00eda23
^ permalink raw reply related [flat|nested] 400+ messages in thread
* [tip:x86/mm] x86/mm/mtrr: Avoid #ifdeffery with phys_wc_to_mtrr_index()
2015-05-26 8:28 ` [PATCH 10/18] x86/mtrr: Avoid ifdeffery with phys_wc_to_mtrr_index() Borislav Petkov
@ 2015-05-27 14:20 ` tip-bot for Luis R. Rodriguez
0 siblings, 0 replies; 400+ messages in thread
From: tip-bot for Luis R. Rodriguez @ 2015-05-27 14:20 UTC (permalink / raw)
To: linux-tip-commits
Cc: matthias.bgg, plagnioj, akpm, hpa, luto, sbsiddha, will.deacon,
bp, dbueso, brgerst, vbabka, a.kesavan, mcgrof, adaplas, syrjala,
mgorman, dave.hansen, peterz, tomi.valkeinen, linux-kernel,
jgross, airlied, mingo, treding, daniel.vetter, tglx,
catalin.marinas, toshi.kani, cristian.stoica, torvalds, gregkh,
dvlasenk, bp
Commit-ID: 7d010fdf299929f9583ce5e17da629dcd83c36ef
Gitweb: http://git.kernel.org/tip/7d010fdf299929f9583ce5e17da629dcd83c36ef
Author: Luis R. Rodriguez <mcgrof@suse.com>
AuthorDate: Tue, 26 May 2015 10:28:13 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:41:00 +0200
x86/mm/mtrr: Avoid #ifdeffery with phys_wc_to_mtrr_index()
There is only one user, but since we are next going to bury MTRR
out of reach of drivers, expose this last piece of API to
drivers in a general fashion, so that only io.h is needed for
access to the helpers.
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Abhilash Kesavan <a.kesavan@samsung.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Cristian Stoica <cristian.stoica@freescale.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: dri-devel@lists.freedesktop.org
Link: http://lkml.kernel.org/r/1429722736-4473-1-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-11-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/include/asm/io.h | 3 +++
arch/x86/include/asm/mtrr.h | 5 -----
arch/x86/kernel/cpu/mtrr/main.c | 6 +++---
drivers/gpu/drm/drm_ioctl.c | 14 +-------------
include/linux/io.h | 7 +++++++
5 files changed, 14 insertions(+), 21 deletions(-)
diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 4afc05f..a2b9740 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -339,6 +339,9 @@ extern bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
#define IO_SPACE_LIMIT 0xffff
#ifdef CONFIG_MTRR
+extern int __must_check arch_phys_wc_index(int handle);
+#define arch_phys_wc_index arch_phys_wc_index
+
extern int __must_check arch_phys_wc_add(unsigned long base,
unsigned long size);
extern void arch_phys_wc_del(int handle);
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index a31759e..b94f6f6 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -48,7 +48,6 @@ extern void mtrr_aps_init(void);
extern void mtrr_bp_restore(void);
extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
extern int amd_special_default_mtrr(void);
-extern int phys_wc_to_mtrr_index(int handle);
# else
static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
{
@@ -84,10 +83,6 @@ static inline int mtrr_trim_uncached_memory(unsigned long end_pfn)
static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
{
}
-static inline int phys_wc_to_mtrr_index(int handle)
-{
- return -1;
-}
#define mtrr_ap_init() do {} while (0)
#define mtrr_bp_init() do {} while (0)
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 04aceb7..81baf5f 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -580,7 +580,7 @@ void arch_phys_wc_del(int handle)
EXPORT_SYMBOL(arch_phys_wc_del);
/*
- * phys_wc_to_mtrr_index - translates arch_phys_wc_add's return value
+ * arch_phys_wc_index - translates arch_phys_wc_add's return value
* @handle: Return value from arch_phys_wc_add
*
* This will turn the return value from arch_phys_wc_add into an mtrr
@@ -590,14 +590,14 @@ EXPORT_SYMBOL(arch_phys_wc_del);
* in printk line. Alas there is an illegitimate use in some ancient
* drm ioctls.
*/
-int phys_wc_to_mtrr_index(int handle)
+int arch_phys_wc_index(int handle)
{
if (handle < MTRR_TO_PHYS_WC_OFFSET)
return -1;
else
return handle - MTRR_TO_PHYS_WC_OFFSET;
}
-EXPORT_SYMBOL_GPL(phys_wc_to_mtrr_index);
+EXPORT_SYMBOL_GPL(arch_phys_wc_index);
/*
* HACK ALERT!
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 266dcd6..0a95782 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -36,9 +36,6 @@
#include <linux/pci.h>
#include <linux/export.h>
-#ifdef CONFIG_X86
-#include <asm/mtrr.h>
-#endif
static int drm_version(struct drm_device *dev, void *data,
struct drm_file *file_priv);
@@ -197,16 +194,7 @@ static int drm_getmap(struct drm_device *dev, void *data,
map->type = r_list->map->type;
map->flags = r_list->map->flags;
map->handle = (void *)(unsigned long) r_list->user_token;
-
-#ifdef CONFIG_X86
- /*
- * There appears to be exactly one user of the mtrr index: dritest.
- * It's easy enough to keep it working on non-PAT systems.
- */
- map->mtrr = phys_wc_to_mtrr_index(r_list->map->mtrr);
-#else
- map->mtrr = -1;
-#endif
+ map->mtrr = arch_phys_wc_index(r_list->map->mtrr);
mutex_unlock(&dev->struct_mutex);
diff --git a/include/linux/io.h b/include/linux/io.h
index 986f2bf..04cce4d 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -111,6 +111,13 @@ static inline void arch_phys_wc_del(int handle)
}
#define arch_phys_wc_add arch_phys_wc_add
+#ifndef arch_phys_wc_index
+static inline int arch_phys_wc_index(int handle)
+{
+ return -1;
+}
+#define arch_phys_wc_index arch_phys_wc_index
+#endif
#endif
#endif /* _LINUX_IO_H */
^ permalink raw reply related [flat|nested] 400+ messages in thread
* [PATCH 11/18] x86/mtrr: Generalize runtime disabling of MTRRs
2015-05-26 8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
` (9 preceding siblings ...)
2015-05-26 8:28 ` [PATCH 10/18] x86/mtrr: Avoid ifdeffery with phys_wc_to_mtrr_index() Borislav Petkov
@ 2015-05-26 8:28 ` Borislav Petkov
2015-05-27 14:20 ` [tip:x86/mm] x86/mm/mtrr: " tip-bot for Luis R. Rodriguez
2015-05-26 8:28 ` [PATCH 12/18] x86/mm/pat: Wrap pat_enabled Borislav Petkov
` (6 subsequent siblings)
17 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-05-26 8:28 UTC (permalink / raw)
To: Ingo Molnar; +Cc: X86-ML, LKML
From: "Luis R. Rodriguez" <mcgrof@suse.com>
It is possible to enable CONFIG_MTRR and CONFIG_X86_PAT and end up with
a system with MTRR functionality disabled but PAT functionality enabled.
This can happen, for instance, when the Xen hypervisor is used where
MTRRs are not supported but PAT is. This can happen on Linux as of commit
47591df50512 ("xen: Support Xen pv-domains using PAT")
by Juergen, introduced in v3.19.
Technically, we should assume the proper CPU bits would be set to
disable MTRRs but we can't always rely on this. At least on the Xen
Hypervisor, for instance, only X86_FEATURE_MTRR was disabled as of
Xen 4.4 through Xen commit 586ab6a [0], but not X86_FEATURE_K6_MTRR,
X86_FEATURE_CENTAUR_MCR, or X86_FEATURE_CYRIX_ARR for instance.
Roger Pau Monné has clarified though that although this is technically
true we will never support PVH on these CPU types so Xen has no need to
disable these bits on those systems. As per Roger, AMD K6, Centaur and
VIA chips don't have the necessary hardware extensions to allow running
PVH guests [1].
As per Toshi, it is also possible for the BIOS to disable MTRR support.
In such cases get_mtrr_state() would update the MTRR state as per the
BIOS, and we need to propagate this information as well.
x86 MTRR code relies on quite a few checks of mtrr_if being set to
determine whether MTRRs did get set up. Instead, let's provide a generic
getter for that. This also adds a few checks where there were none before,
which could potentially safeguard us against incorrect usage of
MTRR where this was not desirable.
Where possible match error codes as if MTRRs were disabled on
arch/x86/include/asm/mtrr.h.
Lastly, since disabling MTRRs can happen at run time and we could end up
with PAT enabled, best record now in our logs when MTRRs are disabled.
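
The gating pattern described above can be sketched as a small userspace
model. All model_* names are hypothetical. The pretend register index
and the "index + offset" success return are assumptions made for
illustration; only the early-return conditions are taken from the hunks
below.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_PHYS_WC_OFFSET 1000	/* stands in for MTRR_TO_PHYS_WC_OFFSET */

/* Hypothetical stand-ins for __mtrr_enabled and pat_enabled. */
static bool model_mtrr_enabled_flag;
static bool model_pat_enabled_flag;

/* Single getter that callers use instead of testing mtrr_if directly. */
static bool model_mtrr_enabled(void)
{
	return model_mtrr_enabled_flag;
}

/* Models mtrr_add(): refuse early when MTRRs are disabled at run time. */
static int model_mtrr_add(void)
{
	if (!model_mtrr_enabled())
		return -ENODEV;

	return 2;	/* assumption: pretend register 2 was allocated */
}

/* Models arch_phys_wc_add(): a no-op with PAT or without MTRRs. */
static int model_arch_phys_wc_add(void)
{
	int ret;

	if (model_pat_enabled_flag || !model_mtrr_enabled())
		return 0;	/* success, nothing to do */

	ret = model_mtrr_add();
	if (ret < 0)
		return ret;

	return ret + MODEL_PHYS_WC_OFFSET;
}
```

The point of the single getter is that a BIOS or hypervisor override
only has to flip one flag for every entry point to back off
consistently.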
[0] ~/devel/xen (git::stable-4.5)$ git describe --contains 586ab6a 4.4.0-rc1~18
[1] http://lists.xenproject.org/archives/html/xen-devel/2015-03/msg03460.html
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: venkatesh.pallipadi@intel.com
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: bhelgaas@google.com
Cc: xen-devel@lists.xensource.com
Link: http://lkml.kernel.org/r/1426893517-2511-3-git-send-email-mcgrof@do-not-panic.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
arch/x86/kernel/cpu/mtrr/generic.c | 4 +++-
arch/x86/kernel/cpu/mtrr/main.c | 39 ++++++++++++++++++++++++++++++--------
arch/x86/kernel/cpu/mtrr/mtrr.h | 2 +-
3 files changed, 35 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index f782d9b62cb3..3b533cf37c74 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -445,7 +445,7 @@ static void __init print_mtrr_state(void)
}
/* Grab all of the MTRR state for this CPU into *state */
-void __init get_mtrr_state(void)
+bool __init get_mtrr_state(void)
{
struct mtrr_var_range *vrs;
unsigned long flags;
@@ -489,6 +489,8 @@ void __init get_mtrr_state(void)
post_set();
local_irq_restore(flags);
+
+ return !!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED);
}
/* Some BIOS's are messed up and don't set all MTRRs the same! */
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 81baf5fee0e1..383efb26e516 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -59,6 +59,12 @@
#define MTRR_TO_PHYS_WC_OFFSET 1000
u32 num_var_ranges;
+static bool __mtrr_enabled;
+
+static bool mtrr_enabled(void)
+{
+ return __mtrr_enabled;
+}
unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
static DEFINE_MUTEX(mtrr_mutex);
@@ -286,7 +292,7 @@ int mtrr_add_page(unsigned long base, unsigned long size,
int i, replace, error;
mtrr_type ltype;
- if (!mtrr_if)
+ if (!mtrr_enabled())
return -ENXIO;
error = mtrr_if->validate_add_page(base, size, type);
@@ -435,6 +441,8 @@ static int mtrr_check(unsigned long base, unsigned long size)
int mtrr_add(unsigned long base, unsigned long size, unsigned int type,
bool increment)
{
+ if (!mtrr_enabled())
+ return -ENODEV;
if (mtrr_check(base, size))
return -EINVAL;
return mtrr_add_page(base >> PAGE_SHIFT, size >> PAGE_SHIFT, type,
@@ -463,8 +471,8 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
unsigned long lbase, lsize;
int error = -EINVAL;
- if (!mtrr_if)
- return -ENXIO;
+ if (!mtrr_enabled())
+ return -ENODEV;
max = num_var_ranges;
/* No CPU hotplug when we change MTRR entries */
@@ -523,6 +531,8 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
*/
int mtrr_del(int reg, unsigned long base, unsigned long size)
{
+ if (!mtrr_enabled())
+ return -ENODEV;
if (mtrr_check(base, size))
return -EINVAL;
return mtrr_del_page(reg, base >> PAGE_SHIFT, size >> PAGE_SHIFT);
@@ -548,7 +558,7 @@ int arch_phys_wc_add(unsigned long base, unsigned long size)
{
int ret;
- if (pat_enabled)
+ if (pat_enabled || !mtrr_enabled())
return 0; /* Success! (We don't need to do anything.) */
ret = mtrr_add(base, size, MTRR_TYPE_WRCOMB, true);
@@ -737,10 +747,12 @@ void __init mtrr_bp_init(void)
}
if (mtrr_if) {
+ __mtrr_enabled = true;
set_num_var_ranges();
init_table();
if (use_intel()) {
- get_mtrr_state();
+ /* BIOS may override */
+ __mtrr_enabled = get_mtrr_state();
if (mtrr_cleanup(phys_addr)) {
changed_by_mtrr_cleanup = 1;
@@ -748,10 +760,16 @@ void __init mtrr_bp_init(void)
}
}
}
+
+ if (!mtrr_enabled())
+ pr_info("MTRR: Disabled\n");
}
void mtrr_ap_init(void)
{
+ if (!mtrr_enabled())
+ return;
+
if (!use_intel() || mtrr_aps_delayed_init)
return;
/*
@@ -777,6 +795,9 @@ void mtrr_save_state(void)
{
int first_cpu;
+ if (!mtrr_enabled())
+ return;
+
get_online_cpus();
first_cpu = cpumask_first(cpu_online_mask);
smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
@@ -785,6 +806,8 @@ void mtrr_save_state(void)
void set_mtrr_aps_delayed_init(void)
{
+ if (!mtrr_enabled())
+ return;
if (!use_intel())
return;
@@ -796,7 +819,7 @@ void set_mtrr_aps_delayed_init(void)
*/
void mtrr_aps_init(void)
{
- if (!use_intel())
+ if (!use_intel() || !mtrr_enabled())
return;
/*
@@ -813,7 +836,7 @@ void mtrr_aps_init(void)
void mtrr_bp_restore(void)
{
- if (!use_intel())
+ if (!use_intel() || !mtrr_enabled())
return;
mtrr_if->set_all();
@@ -821,7 +844,7 @@ void mtrr_bp_restore(void)
static int __init mtrr_init_finialize(void)
{
- if (!mtrr_if)
+ if (!mtrr_enabled())
return 0;
if (use_intel()) {
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index df5e41f31a27..951884dcc433 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -51,7 +51,7 @@ void set_mtrr_prepare_save(struct set_mtrr_context *ctxt);
void fill_mtrr_var_range(unsigned int index,
u32 base_lo, u32 base_hi, u32 mask_lo, u32 mask_hi);
-void get_mtrr_state(void);
+bool get_mtrr_state(void);
extern void set_mtrr_ops(const struct mtrr_ops *ops);
--
1.9.0.258.g00eda23
^ permalink raw reply related [flat|nested] 400+ messages in thread
* [tip:x86/mm] x86/mm/mtrr: Generalize runtime disabling of MTRRs
2015-05-26 8:28 ` [PATCH 11/18] x86/mtrr: Generalize runtime disabling of MTRRs Borislav Petkov
@ 2015-05-27 14:20 ` tip-bot for Luis R. Rodriguez
0 siblings, 0 replies; 400+ messages in thread
From: tip-bot for Luis R. Rodriguez @ 2015-05-27 14:20 UTC (permalink / raw)
To: linux-tip-commits
Cc: bp, bp, brgerst, mingo, linux-kernel, sbsiddha, mgorman, airlied,
roger.pau, tomi.valkeinen, plagnioj, adaplas, dvlasenk,
dave.hansen, torvalds, luto, syrjala, toshi.kani, tglx,
stefan.bader, peterz, jgross, vbabka, hpa, mcgrof, dbueso,
daniel.vetter
Commit-ID: f9626104a5b6815ec7d65789dfb900af5fa51e64
Gitweb: http://git.kernel.org/tip/f9626104a5b6815ec7d65789dfb900af5fa51e64
Author: Luis R. Rodriguez <mcgrof@suse.com>
AuthorDate: Tue, 26 May 2015 10:28:14 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:41:01 +0200
x86/mm/mtrr: Generalize runtime disabling of MTRRs
It is possible to enable CONFIG_MTRR and CONFIG_X86_PAT and end
up with a system with MTRR functionality disabled but PAT
functionality enabled. This can happen, for instance, when the
Xen hypervisor is used where MTRRs are not supported but PAT is.
This can happen on Linux as of commit
47591df50512 ("xen: Support Xen pv-domains using PAT")
by Juergen, introduced in v3.19.
Technically, we should assume the proper CPU bits would be set
to disable MTRRs but we can't always rely on this. At least on
the Xen Hypervisor, for instance, only X86_FEATURE_MTRR was
disabled as of Xen 4.4 through Xen commit 586ab6a [0], but not
X86_FEATURE_K6_MTRR, X86_FEATURE_CENTAUR_MCR, or
X86_FEATURE_CYRIX_ARR for instance.
Roger Pau Monné has clarified though that although this is
technically true we will never support PVH on these CPU types so
Xen has no need to disable these bits on those systems. As per
Roger, AMD K6, Centaur and VIA chips don't have the necessary
hardware extensions to allow running PVH guests [1].
As per Toshi, it is also possible for the BIOS to disable MTRR
support. In such cases get_mtrr_state() would update the MTRR
state as per the BIOS, and we need to propagate this information
as well.
x86 MTRR code relies on quite a few checks of mtrr_if being
set to determine whether MTRRs did get set up. Instead, let's
provide a generic getter for that. This also adds a few checks
where there were none before, which could potentially safeguard
us against incorrect usage of MTRR where this was not
desirable.
Where possible match error codes as if MTRRs were disabled on
arch/x86/include/asm/mtrr.h.
Lastly, since disabling MTRRs can happen at run time and we
could end up with PAT enabled, best record now in our logs when
MTRRs are disabled.
[0] ~/devel/xen (git::stable-4.5)$ git describe --contains 586ab6a 4.4.0-rc1~18
[1] http://lists.xenproject.org/archives/html/xen-devel/2015-03/msg03460.html
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: bhelgaas@google.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: konrad.wilk@oracle.com
Cc: venkatesh.pallipadi@intel.com
Cc: ville.syrjala@linux.intel.com
Cc: xen-devel@lists.xensource.com
Link: http://lkml.kernel.org/r/1426893517-2511-3-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-12-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/kernel/cpu/mtrr/generic.c | 4 +++-
arch/x86/kernel/cpu/mtrr/main.c | 39 ++++++++++++++++++++++++++++++--------
arch/x86/kernel/cpu/mtrr/mtrr.h | 2 +-
3 files changed, 35 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index f782d9b..3b533cf 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -445,7 +445,7 @@ static void __init print_mtrr_state(void)
}
/* Grab all of the MTRR state for this CPU into *state */
-void __init get_mtrr_state(void)
+bool __init get_mtrr_state(void)
{
struct mtrr_var_range *vrs;
unsigned long flags;
@@ -489,6 +489,8 @@ void __init get_mtrr_state(void)
post_set();
local_irq_restore(flags);
+
+ return !!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED);
}
/* Some BIOS's are messed up and don't set all MTRRs the same! */
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 81baf5f..383efb2 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -59,6 +59,12 @@
#define MTRR_TO_PHYS_WC_OFFSET 1000
u32 num_var_ranges;
+static bool __mtrr_enabled;
+
+static bool mtrr_enabled(void)
+{
+ return __mtrr_enabled;
+}
unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
static DEFINE_MUTEX(mtrr_mutex);
@@ -286,7 +292,7 @@ int mtrr_add_page(unsigned long base, unsigned long size,
int i, replace, error;
mtrr_type ltype;
- if (!mtrr_if)
+ if (!mtrr_enabled())
return -ENXIO;
error = mtrr_if->validate_add_page(base, size, type);
@@ -435,6 +441,8 @@ static int mtrr_check(unsigned long base, unsigned long size)
int mtrr_add(unsigned long base, unsigned long size, unsigned int type,
bool increment)
{
+ if (!mtrr_enabled())
+ return -ENODEV;
if (mtrr_check(base, size))
return -EINVAL;
return mtrr_add_page(base >> PAGE_SHIFT, size >> PAGE_SHIFT, type,
@@ -463,8 +471,8 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
unsigned long lbase, lsize;
int error = -EINVAL;
- if (!mtrr_if)
- return -ENXIO;
+ if (!mtrr_enabled())
+ return -ENODEV;
max = num_var_ranges;
/* No CPU hotplug when we change MTRR entries */
@@ -523,6 +531,8 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
*/
int mtrr_del(int reg, unsigned long base, unsigned long size)
{
+ if (!mtrr_enabled())
+ return -ENODEV;
if (mtrr_check(base, size))
return -EINVAL;
return mtrr_del_page(reg, base >> PAGE_SHIFT, size >> PAGE_SHIFT);
@@ -548,7 +558,7 @@ int arch_phys_wc_add(unsigned long base, unsigned long size)
{
int ret;
- if (pat_enabled)
+ if (pat_enabled || !mtrr_enabled())
return 0; /* Success! (We don't need to do anything.) */
ret = mtrr_add(base, size, MTRR_TYPE_WRCOMB, true);
@@ -737,10 +747,12 @@ void __init mtrr_bp_init(void)
}
if (mtrr_if) {
+ __mtrr_enabled = true;
set_num_var_ranges();
init_table();
if (use_intel()) {
- get_mtrr_state();
+ /* BIOS may override */
+ __mtrr_enabled = get_mtrr_state();
if (mtrr_cleanup(phys_addr)) {
changed_by_mtrr_cleanup = 1;
@@ -748,10 +760,16 @@ void __init mtrr_bp_init(void)
}
}
}
+
+ if (!mtrr_enabled())
+ pr_info("MTRR: Disabled\n");
}
void mtrr_ap_init(void)
{
+ if (!mtrr_enabled())
+ return;
+
if (!use_intel() || mtrr_aps_delayed_init)
return;
/*
@@ -777,6 +795,9 @@ void mtrr_save_state(void)
{
int first_cpu;
+ if (!mtrr_enabled())
+ return;
+
get_online_cpus();
first_cpu = cpumask_first(cpu_online_mask);
smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
@@ -785,6 +806,8 @@ void mtrr_save_state(void)
void set_mtrr_aps_delayed_init(void)
{
+ if (!mtrr_enabled())
+ return;
if (!use_intel())
return;
@@ -796,7 +819,7 @@ void set_mtrr_aps_delayed_init(void)
*/
void mtrr_aps_init(void)
{
- if (!use_intel())
+ if (!use_intel() || !mtrr_enabled())
return;
/*
@@ -813,7 +836,7 @@ void mtrr_aps_init(void)
void mtrr_bp_restore(void)
{
- if (!use_intel())
+ if (!use_intel() || !mtrr_enabled())
return;
mtrr_if->set_all();
@@ -821,7 +844,7 @@ void mtrr_bp_restore(void)
static int __init mtrr_init_finialize(void)
{
- if (!mtrr_if)
+ if (!mtrr_enabled())
return 0;
if (use_intel()) {
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index df5e41f..951884d 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -51,7 +51,7 @@ void set_mtrr_prepare_save(struct set_mtrr_context *ctxt);
void fill_mtrr_var_range(unsigned int index,
u32 base_lo, u32 base_hi, u32 mask_lo, u32 mask_hi);
-void get_mtrr_state(void);
+bool get_mtrr_state(void);
extern void set_mtrr_ops(const struct mtrr_ops *ops);
* [PATCH 12/18] x86/mm/pat: Wrap pat_enabled
2015-05-26 8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
` (10 preceding siblings ...)
2015-05-26 8:28 ` [PATCH 11/18] x86/mtrr: Generalize runtime disabling of MTRRs Borislav Petkov
@ 2015-05-26 8:28 ` Borislav Petkov
2015-05-27 14:20 ` [tip:x86/mm] x86/mm/pat: Wrap pat_enabled into a function API tip-bot for Luis R. Rodriguez
2015-05-26 8:28 ` [PATCH 13/18] x86/mm/pat: Export pat_enabled() Borislav Petkov
` (5 subsequent siblings)
17 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-05-26 8:28 UTC (permalink / raw)
To: Ingo Molnar; +Cc: X86-ML, LKML
From: "Luis R. Rodriguez" <mcgrof@suse.com>
We use pat_enabled in x86-specific code to see if PAT is enabled or not
but we're granting full access to it even though readers do not need to
set it. If, for instance, we granted access to it to modules later they
then could override the variable setting... no bueno.
This renames pat_enabled to a new static variable __pat_enabled. Folks
are redirected to use pat_enabled() now.
Code that sets this can only be internal to pat.c. Apart from the early
kernel parameter "nopat" to disable PAT, we also have a few cases that
disable it later and make use of a helper pat_disable(). It is wrapped
under an ifdef, but since that code cannot run unless PAT was enabled it's
not required to wrap it with ifdefs, so unwrap that. Likewise, since "nopat"
doesn't really change non-PAT systems, just remove that ifdef as well.
Although we could add and use an early_param_off(), these helpers don't
use __read_mostly but we want to keep __read_mostly for __pat_enabled as
this is a hot path -- upon boot, for instance, a simple guest may see
~4k accesses to pat_enabled(). Since __read_mostly early boot params are
not that common we don't add a helper for them just yet.
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Link: http://lkml.kernel.org/r/1430425520-22275-3-git-send-email-mcgrof@do-not-panic.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
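Note, not part of the commit message: the accessor pattern described above
can be sketched in plain userspace C. This is only an illustration under
stated assumptions. PAT_BUILT_IN, pat_enabled_flag and nopat_sketch() are
hypothetical names standing in for IS_ENABLED(CONFIG_X86_PAT), __pat_enabled
and the "nopat" early_param handler, and printf() stands in for pr_info().

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_X86_PAT). */
#define PAT_BUILT_IN 1

/* File-local flag: nothing outside this file can write to it. */
static int pat_enabled_flag = PAT_BUILT_IN;

/* The only writer of the flag; printf() stands in for pr_info(). */
static void pat_disable(const char *reason)
{
	pat_enabled_flag = 0;
	printf("x86/PAT: %s\n", reason);
}

/* Stand-in for the "nopat" early parameter handler. */
int nopat_sketch(const char *str)
{
	(void)str;
	pat_disable("PAT support disabled.");
	return 0;
}

/* Read-only accessor that callers use instead of the variable. */
bool pat_enabled(void)
{
	return !!pat_enabled_flag;
}
```

Because the flag is static, exporting pat_enabled() later would hand out
read access only; there is no symbol a caller could assign to.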
arch/x86/include/asm/pat.h | 7 +------
arch/x86/kernel/cpu/mtrr/main.c | 2 +-
arch/x86/mm/iomap_32.c | 2 +-
arch/x86/mm/ioremap.c | 4 ++--
arch/x86/mm/pageattr.c | 2 +-
arch/x86/mm/pat.c | 33 +++++++++++++++------------------
arch/x86/pci/i386.c | 6 +++---
7 files changed, 24 insertions(+), 32 deletions(-)
diff --git a/arch/x86/include/asm/pat.h b/arch/x86/include/asm/pat.h
index 91bc4ba95f91..cdcff7f7f694 100644
--- a/arch/x86/include/asm/pat.h
+++ b/arch/x86/include/asm/pat.h
@@ -4,12 +4,7 @@
#include <linux/types.h>
#include <asm/pgtable_types.h>
-#ifdef CONFIG_X86_PAT
-extern int pat_enabled;
-#else
-static const int pat_enabled;
-#endif
-
+bool pat_enabled(void);
extern void pat_init(void);
void pat_init_cache_modes(void);
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 383efb26e516..e7ed0d8ebacb 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -558,7 +558,7 @@ int arch_phys_wc_add(unsigned long base, unsigned long size)
{
int ret;
- if (pat_enabled || !mtrr_enabled())
+ if (pat_enabled() || !mtrr_enabled())
return 0; /* Success! (We don't need to do anything.) */
ret = mtrr_add(base, size, MTRR_TYPE_WRCOMB, true);
diff --git a/arch/x86/mm/iomap_32.c b/arch/x86/mm/iomap_32.c
index 9ca35fc60cfe..3a2ec8790ca7 100644
--- a/arch/x86/mm/iomap_32.c
+++ b/arch/x86/mm/iomap_32.c
@@ -82,7 +82,7 @@ iomap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot)
* MTRR is UC or WC. UC_MINUS gets the real intention, of the
* user, which is "WC if the MTRR is WC, UC if you can't do that."
*/
- if (!pat_enabled && pgprot_val(prot) ==
+ if (!pat_enabled() && pgprot_val(prot) ==
(__PAGE_KERNEL | cachemode2protval(_PAGE_CACHE_MODE_WC)))
prot = __pgprot(__PAGE_KERNEL |
cachemode2protval(_PAGE_CACHE_MODE_UC_MINUS));
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index a493bb83aa89..82d63ed70045 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -234,7 +234,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size)
{
/*
* Ideally, this should be:
- * pat_enabled ? _PAGE_CACHE_MODE_UC : _PAGE_CACHE_MODE_UC_MINUS;
+ * pat_enabled() ? _PAGE_CACHE_MODE_UC : _PAGE_CACHE_MODE_UC_MINUS;
*
* Till we fix all X drivers to use ioremap_wc(), we will use
* UC MINUS. Drivers that are certain they need or can already
@@ -292,7 +292,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
*/
void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
{
- if (pat_enabled)
+ if (pat_enabled())
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
__builtin_return_address(0));
else
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 397838eb292b..70d221fe2eb4 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1571,7 +1571,7 @@ int set_memory_wc(unsigned long addr, int numpages)
{
int ret;
- if (!pat_enabled)
+ if (!pat_enabled())
return set_memory_uc(addr, numpages);
ret = reserve_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE,
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 8c50b9bfa996..484dce7f759b 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -36,12 +36,11 @@
#undef pr_fmt
#define pr_fmt(fmt) "" fmt
-#ifdef CONFIG_X86_PAT
-int __read_mostly pat_enabled = 1;
+static int __read_mostly __pat_enabled = IS_ENABLED(CONFIG_X86_PAT);
static inline void pat_disable(const char *reason)
{
- pat_enabled = 0;
+ __pat_enabled = 0;
pr_info("x86/PAT: %s\n", reason);
}
@@ -51,13 +50,11 @@ static int __init nopat(char *str)
return 0;
}
early_param("nopat", nopat);
-#else
-static inline void pat_disable(const char *reason)
+
+bool pat_enabled(void)
{
- (void)reason;
+ return !!__pat_enabled;
}
-#endif
-
int pat_debug_enable;
@@ -201,7 +198,7 @@ void pat_init(void)
u64 pat;
bool boot_cpu = !boot_pat_state;
- if (!pat_enabled)
+ if (!pat_enabled())
return;
if (!cpu_has_pat) {
@@ -402,7 +399,7 @@ int reserve_memtype(u64 start, u64 end, enum page_cache_mode req_type,
BUG_ON(start >= end); /* end is exclusive */
- if (!pat_enabled) {
+ if (!pat_enabled()) {
/* This is identical to page table setting without PAT */
if (new_type) {
if (req_type == _PAGE_CACHE_MODE_WC)
@@ -477,7 +474,7 @@ int free_memtype(u64 start, u64 end)
int is_range_ram;
struct memtype *entry;
- if (!pat_enabled)
+ if (!pat_enabled())
return 0;
/* Low ISA region is always mapped WB. No need to track */
@@ -625,7 +622,7 @@ static inline int range_is_allowed(unsigned long pfn, unsigned long size)
u64 to = from + size;
u64 cursor = from;
- if (!pat_enabled)
+ if (!pat_enabled())
return 1;
while (cursor < to) {
@@ -661,7 +658,7 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
* caching for the high addresses through the KEN pin, but
* we maintain the tradition of paranoia in this code.
*/
- if (!pat_enabled &&
+ if (!pat_enabled() &&
!(boot_cpu_has(X86_FEATURE_MTRR) ||
boot_cpu_has(X86_FEATURE_K6_MTRR) ||
boot_cpu_has(X86_FEATURE_CYRIX_ARR) ||
@@ -730,7 +727,7 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
* the type requested matches the type of first page in the range.
*/
if (is_ram) {
- if (!pat_enabled)
+ if (!pat_enabled())
return 0;
pcm = lookup_memtype(paddr);
@@ -844,7 +841,7 @@ int track_pfn_remap(struct vm_area_struct *vma, pgprot_t *prot,
return ret;
}
- if (!pat_enabled)
+ if (!pat_enabled())
return 0;
/*
@@ -872,7 +869,7 @@ int track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
{
enum page_cache_mode pcm;
- if (!pat_enabled)
+ if (!pat_enabled())
return 0;
/* Set prot based on lookup */
@@ -913,7 +910,7 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
pgprot_t pgprot_writecombine(pgprot_t prot)
{
- if (pat_enabled)
+ if (pat_enabled())
return __pgprot(pgprot_val(prot) |
cachemode2protval(_PAGE_CACHE_MODE_WC));
else
@@ -996,7 +993,7 @@ static const struct file_operations memtype_fops = {
static int __init pat_memtype_list_init(void)
{
- if (pat_enabled) {
+ if (pat_enabled()) {
debugfs_create_file("pat_memtype_list", S_IRUSR,
arch_debugfs_dir, NULL, &memtype_fops);
}
diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index 349c0d32cc0b..0a9f2caf358f 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -429,12 +429,12 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
* Caller can followup with UC MINUS request and add a WC mtrr if there
* is a free mtrr slot.
*/
- if (!pat_enabled && write_combine)
+ if (!pat_enabled() && write_combine)
return -EINVAL;
- if (pat_enabled && write_combine)
+ if (pat_enabled() && write_combine)
prot |= cachemode2protval(_PAGE_CACHE_MODE_WC);
- else if (pat_enabled || boot_cpu_data.x86 > 3)
+ else if (pat_enabled() || boot_cpu_data.x86 > 3)
/*
* ioremap() and ioremap_nocache() defaults to UC MINUS for now.
* To avoid attribute conflicts, request UC MINUS here
--
1.9.0.258.g00eda23
* [tip:x86/mm] x86/mm/pat: Wrap pat_enabled into a function API
2015-05-26 8:28 ` [PATCH 12/18] x86/mm/pat: Wrap pat_enabled Borislav Petkov
@ 2015-05-27 14:20 ` tip-bot for Luis R. Rodriguez
0 siblings, 0 replies; 400+ messages in thread
From: tip-bot for Luis R. Rodriguez @ 2015-05-27 14:20 UTC (permalink / raw)
To: linux-tip-commits
Cc: peterz, awalls, bp, tglx, dledford, brgerst, mingo, bhelgaas,
mcgrof, torvalds, jgross, dvlasenk, hpa, kyle, bp, cl,
linux-kernel, luto, airlied, mst, daniel.vetter
Commit-ID: cb32edf65bf2197a2d2226e94c7602dc92e295bb
Gitweb: http://git.kernel.org/tip/cb32edf65bf2197a2d2226e94c7602dc92e295bb
Author: Luis R. Rodriguez <mcgrof@suse.com>
AuthorDate: Tue, 26 May 2015 10:28:15 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:41:01 +0200
x86/mm/pat: Wrap pat_enabled into a function API
We use pat_enabled in x86-specific code to see if PAT is enabled
or not but we're granting full access to it even though readers
do not need to set it. If, for instance, we granted access to it
to modules later they then could override the variable
setting... no bueno.
This renames pat_enabled to a new static variable __pat_enabled.
Folks are redirected to use pat_enabled() now.
Code that sets this can only be internal to pat.c. Apart from
the early kernel parameter "nopat" to disable PAT, we also have
a few cases that disable it later and make use of a helper
pat_disable(). It is wrapped under an ifdef, but since that code
cannot run unless PAT was enabled it's not required to wrap it
with ifdefs, so unwrap that. Likewise, since "nopat" doesn't really
change non-PAT systems, just remove that ifdef as well.
Although we could add and use an early_param_off(), these
helpers don't use __read_mostly but we want to keep
__read_mostly for __pat_enabled as this is a hot path -- upon
boot, for instance, a simple guest may see ~4k accesses to
pat_enabled(). Since __read_mostly early boot params are not
that common we don't add a helper for them just yet.
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1430425520-22275-3-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-13-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/include/asm/pat.h | 7 +------
arch/x86/kernel/cpu/mtrr/main.c | 2 +-
arch/x86/mm/iomap_32.c | 2 +-
arch/x86/mm/ioremap.c | 4 ++--
arch/x86/mm/pageattr.c | 2 +-
arch/x86/mm/pat.c | 33 +++++++++++++++------------------
arch/x86/pci/i386.c | 6 +++---
7 files changed, 24 insertions(+), 32 deletions(-)
diff --git a/arch/x86/include/asm/pat.h b/arch/x86/include/asm/pat.h
index 91bc4ba..cdcff7f 100644
--- a/arch/x86/include/asm/pat.h
+++ b/arch/x86/include/asm/pat.h
@@ -4,12 +4,7 @@
#include <linux/types.h>
#include <asm/pgtable_types.h>
-#ifdef CONFIG_X86_PAT
-extern int pat_enabled;
-#else
-static const int pat_enabled;
-#endif
-
+bool pat_enabled(void);
extern void pat_init(void);
void pat_init_cache_modes(void);
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 383efb2..e7ed0d8 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -558,7 +558,7 @@ int arch_phys_wc_add(unsigned long base, unsigned long size)
{
int ret;
- if (pat_enabled || !mtrr_enabled())
+ if (pat_enabled() || !mtrr_enabled())
return 0; /* Success! (We don't need to do anything.) */
ret = mtrr_add(base, size, MTRR_TYPE_WRCOMB, true);
diff --git a/arch/x86/mm/iomap_32.c b/arch/x86/mm/iomap_32.c
index 9ca35fc..3a2ec87 100644
--- a/arch/x86/mm/iomap_32.c
+++ b/arch/x86/mm/iomap_32.c
@@ -82,7 +82,7 @@ iomap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot)
* MTRR is UC or WC. UC_MINUS gets the real intention, of the
* user, which is "WC if the MTRR is WC, UC if you can't do that."
*/
- if (!pat_enabled && pgprot_val(prot) ==
+ if (!pat_enabled() && pgprot_val(prot) ==
(__PAGE_KERNEL | cachemode2protval(_PAGE_CACHE_MODE_WC)))
prot = __pgprot(__PAGE_KERNEL |
cachemode2protval(_PAGE_CACHE_MODE_UC_MINUS));
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index a493bb8..82d63ed 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -234,7 +234,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size)
{
/*
* Ideally, this should be:
- * pat_enabled ? _PAGE_CACHE_MODE_UC : _PAGE_CACHE_MODE_UC_MINUS;
+ * pat_enabled() ? _PAGE_CACHE_MODE_UC : _PAGE_CACHE_MODE_UC_MINUS;
*
* Till we fix all X drivers to use ioremap_wc(), we will use
* UC MINUS. Drivers that are certain they need or can already
@@ -292,7 +292,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
*/
void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
{
- if (pat_enabled)
+ if (pat_enabled())
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
__builtin_return_address(0));
else
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 397838e..70d221f 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1571,7 +1571,7 @@ int set_memory_wc(unsigned long addr, int numpages)
{
int ret;
- if (!pat_enabled)
+ if (!pat_enabled())
return set_memory_uc(addr, numpages);
ret = reserve_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE,
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 8c50b9b..484dce7 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -36,12 +36,11 @@
#undef pr_fmt
#define pr_fmt(fmt) "" fmt
-#ifdef CONFIG_X86_PAT
-int __read_mostly pat_enabled = 1;
+static int __read_mostly __pat_enabled = IS_ENABLED(CONFIG_X86_PAT);
static inline void pat_disable(const char *reason)
{
- pat_enabled = 0;
+ __pat_enabled = 0;
pr_info("x86/PAT: %s\n", reason);
}
@@ -51,13 +50,11 @@ static int __init nopat(char *str)
return 0;
}
early_param("nopat", nopat);
-#else
-static inline void pat_disable(const char *reason)
+
+bool pat_enabled(void)
{
- (void)reason;
+ return !!__pat_enabled;
}
-#endif
-
int pat_debug_enable;
@@ -201,7 +198,7 @@ void pat_init(void)
u64 pat;
bool boot_cpu = !boot_pat_state;
- if (!pat_enabled)
+ if (!pat_enabled())
return;
if (!cpu_has_pat) {
@@ -402,7 +399,7 @@ int reserve_memtype(u64 start, u64 end, enum page_cache_mode req_type,
BUG_ON(start >= end); /* end is exclusive */
- if (!pat_enabled) {
+ if (!pat_enabled()) {
/* This is identical to page table setting without PAT */
if (new_type) {
if (req_type == _PAGE_CACHE_MODE_WC)
@@ -477,7 +474,7 @@ int free_memtype(u64 start, u64 end)
int is_range_ram;
struct memtype *entry;
- if (!pat_enabled)
+ if (!pat_enabled())
return 0;
/* Low ISA region is always mapped WB. No need to track */
@@ -625,7 +622,7 @@ static inline int range_is_allowed(unsigned long pfn, unsigned long size)
u64 to = from + size;
u64 cursor = from;
- if (!pat_enabled)
+ if (!pat_enabled())
return 1;
while (cursor < to) {
@@ -661,7 +658,7 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
* caching for the high addresses through the KEN pin, but
* we maintain the tradition of paranoia in this code.
*/
- if (!pat_enabled &&
+ if (!pat_enabled() &&
!(boot_cpu_has(X86_FEATURE_MTRR) ||
boot_cpu_has(X86_FEATURE_K6_MTRR) ||
boot_cpu_has(X86_FEATURE_CYRIX_ARR) ||
@@ -730,7 +727,7 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
* the type requested matches the type of first page in the range.
*/
if (is_ram) {
- if (!pat_enabled)
+ if (!pat_enabled())
return 0;
pcm = lookup_memtype(paddr);
@@ -844,7 +841,7 @@ int track_pfn_remap(struct vm_area_struct *vma, pgprot_t *prot,
return ret;
}
- if (!pat_enabled)
+ if (!pat_enabled())
return 0;
/*
@@ -872,7 +869,7 @@ int track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
{
enum page_cache_mode pcm;
- if (!pat_enabled)
+ if (!pat_enabled())
return 0;
/* Set prot based on lookup */
@@ -913,7 +910,7 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
pgprot_t pgprot_writecombine(pgprot_t prot)
{
- if (pat_enabled)
+ if (pat_enabled())
return __pgprot(pgprot_val(prot) |
cachemode2protval(_PAGE_CACHE_MODE_WC));
else
@@ -996,7 +993,7 @@ static const struct file_operations memtype_fops = {
static int __init pat_memtype_list_init(void)
{
- if (pat_enabled) {
+ if (pat_enabled()) {
debugfs_create_file("pat_memtype_list", S_IRUSR,
arch_debugfs_dir, NULL, &memtype_fops);
}
diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index 349c0d3..0a9f2ca 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -429,12 +429,12 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
* Caller can followup with UC MINUS request and add a WC mtrr if there
* is a free mtrr slot.
*/
- if (!pat_enabled && write_combine)
+ if (!pat_enabled() && write_combine)
return -EINVAL;
- if (pat_enabled && write_combine)
+ if (pat_enabled() && write_combine)
prot |= cachemode2protval(_PAGE_CACHE_MODE_WC);
- else if (pat_enabled || boot_cpu_data.x86 > 3)
+ else if (pat_enabled() || boot_cpu_data.x86 > 3)
/*
* ioremap() and ioremap_nocache() defaults to UC MINUS for now.
* To avoid attribute conflicts, request UC MINUS here
* [PATCH 13/18] x86/mm/pat: Export pat_enabled()
2015-05-26 8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
` (11 preceding siblings ...)
2015-05-26 8:28 ` [PATCH 12/18] x86/mm/pat: Wrap pat_enabled Borislav Petkov
@ 2015-05-26 8:28 ` Borislav Petkov
2015-05-27 14:21 ` [tip:x86/mm] " tip-bot for Luis R. Rodriguez
2015-05-26 8:28 ` [PATCH 14/18] x86/cpu: Strip any /proc/cpuinfo model name field whitespace Borislav Petkov
` (4 subsequent siblings)
17 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-05-26 8:28 UTC (permalink / raw)
To: Ingo Molnar; +Cc: X86-ML, LKML
From: "Luis R. Rodriguez" <mcgrof@suse.com>
Two Linux device drivers cannot work with PAT and the work required to
make them work is significant. There is not enough motivation to convert
these drivers over to use PAT properly; the compromise reached is to let
drivers that cannot be ported to PAT check if PAT was enabled and if
so fail on probe with a recommendation to boot with the "nopat" kernel
parameter.
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Link: http://lkml.kernel.org/r/1430425520-22275-4-git-send-email-mcgrof@do-not-panic.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
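Note, not part of the commit message: a driver-side check of the kind
described above might look roughly like the sketch below. It is a userspace
mock, not code from either affected driver. legacy_probe() and
set_pat_for_demo() are hypothetical, pat_enabled() is stubbed here rather
than taken from <asm/pat.h>, and returning -ENODEV is an assumption about
what such a driver would choose.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Stub for the exported helper; a real driver would call the kernel's. */
static bool pat_on = true;

bool pat_enabled(void)
{
	return pat_on;
}

/* Demo-only switch so both outcomes can be exercised. */
void set_pat_for_demo(bool on)
{
	pat_on = on;
}

/* Hypothetical probe routine for a driver that cannot work with PAT. */
int legacy_probe(void)
{
	if (pat_enabled()) {
		fprintf(stderr,
			"legacy: PAT is enabled, boot with \"nopat\" to use this driver\n");
		return -ENODEV;	/* assumed error code */
	}

	return 0;	/* the rest of the probe would continue here */
}
```

The check only reads the state, which is why exporting the accessor (and
not the underlying variable) is enough for this use.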
arch/x86/mm/pat.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 484dce7f759b..a1c96544099d 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -55,6 +55,7 @@ bool pat_enabled(void)
{
return !!__pat_enabled;
}
+EXPORT_SYMBOL_GPL(pat_enabled);
int pat_debug_enable;
--
1.9.0.258.g00eda23
* [tip:x86/mm] x86/mm/pat: Export pat_enabled()
2015-05-26 8:28 ` [PATCH 13/18] x86/mm/pat: Export pat_enabled() Borislav Petkov
@ 2015-05-27 14:21 ` tip-bot for Luis R. Rodriguez
0 siblings, 0 replies; 400+ messages in thread
From: tip-bot for Luis R. Rodriguez @ 2015-05-27 14:21 UTC (permalink / raw)
To: linux-tip-commits
Cc: mcgrof, torvalds, peterz, tglx, mst, awalls, brgerst, jgross,
hpa, linux-kernel, dvlasenk, dledford, bp, bp, luto,
daniel.vetter, airlied, bhelgaas, mingo
Commit-ID: fbe7193aa4787f27c84216d130ab877efc310d57
Gitweb: http://git.kernel.org/tip/fbe7193aa4787f27c84216d130ab877efc310d57
Author: Luis R. Rodriguez <mcgrof@suse.com>
AuthorDate: Tue, 26 May 2015 10:28:16 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:41:02 +0200
x86/mm/pat: Export pat_enabled()
Two Linux device drivers cannot work with PAT and the work
required to make them work is significant. There is not enough
motivation to convert these drivers over to use PAT properly;
the compromise reached is to let drivers that cannot be ported
to PAT check if PAT was enabled and if so fail on probe with a
recommendation to boot with the "nopat" kernel parameter.
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1430425520-22275-4-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-14-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/mm/pat.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 484dce7..a1c9654 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -55,6 +55,7 @@ bool pat_enabled(void)
{
return !!__pat_enabled;
}
+EXPORT_SYMBOL_GPL(pat_enabled);
int pat_debug_enable;
* [PATCH 14/18] x86/cpu: Strip any /proc/cpuinfo model name field whitespace
2015-05-26 8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
` (12 preceding siblings ...)
2015-05-26 8:28 ` [PATCH 13/18] x86/mm/pat: Export pat_enabled() Borislav Petkov
@ 2015-05-26 8:28 ` Borislav Petkov
2015-05-27 14:16 ` [tip:x86/cpu] x86/cpu: Strip any /proc/cpuinfo " tip-bot for Prarit Bhargava
2015-05-26 8:28 ` [PATCH 15/18] x86/documentation: Move kernel-stacks doc one level up Borislav Petkov
` (3 subsequent siblings)
17 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-05-26 8:28 UTC (permalink / raw)
To: Ingo Molnar; +Cc: X86-ML, LKML
From: Prarit Bhargava <prarit@redhat.com>
When comparing the 'model name' field of each core in /proc/cpuinfo it
was noticed that there is a whitespace difference between the cores'
model names.
After some quick investigation it was noticed that the model name fields
were actually different -- processor 0's model name field had trailing
whitespace removed, while the other processors did not.
Another way of seeing this behaviour is to convert spaces into
underscores in the output of /proc/cpuinfo,
[thetango@prarit ~]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
______1_model_name :_AMD_Opteron(TM)_Processor_6272
_____63_model_name :_AMD_Opteron(TM)_Processor_6272_________________
which shows the discrepancy.
This occurs because the kernel calls strim() on cpu 0's x86_model_id
field to output a pretty message to the console in print_cpu_info(),
and as a result strips the whitespace at the end of the ->x86_model_id
field.
But the ->x86_model_id field should be the same for all identical
CPUs in the box. Thus, we need to remove both leading and trailing
whitespace.
As a result, the print_cpu_info() output looks like
smpboot: CPU0: AMD Opteron(TM) Processor 6272 (fam: 15, model: 01, stepping: 02)
and the x86_model_id field is correct on all processors on AMD platforms:
_____64_model_name :_AMD_Opteron(TM)_Processor_6272
Output is still correct on an Intel box:
____144_model_name :_Intel(R)_Xeon(R)_CPU_E7-8890_v3_@_2.50GHz
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: lkml <linux-kernel@vger.kernel.org>
Link: http://lkml.kernel.org/r/1432050210-32036-1-git-send-email-prarit@redhat.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
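Note, not part of the commit message: the trim-then-shift idea behind the
memmove(..., strim(...), ...) call can be tried in userspace with the sketch
below. strim_sketch() is an approximation of the kernel's strim() written
for this note, and trim_in_place() is a hypothetical helper. Unlike the
patch, which copies a fixed 48 bytes inside the larger x86_model_id buffer,
this sketch copies only the trimmed string and its terminator.

```c
#include <ctype.h>
#include <string.h>

/*
 * Userspace approximation of the kernel's strim(): terminate the string
 * after its last non-space character, then return a pointer past any
 * leading whitespace.
 */
static char *strim_sketch(char *s)
{
	size_t len = strlen(s);

	while (len && isspace((unsigned char)s[len - 1]))
		len--;
	s[len] = '\0';

	while (isspace((unsigned char)*s))
		s++;

	return s;
}

/* Trim both ends of a NUL-terminated string in place. */
void trim_in_place(char *buf)
{
	char *start = strim_sketch(buf);

	/* Source and destination may overlap, hence memmove(), not memcpy(). */
	memmove(buf, start, strlen(start) + 1);
}
```

Doing the trim once, when the string is read, means every later reader sees
the same value regardless of which CPU's entry it looks at.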
arch/x86/kernel/cpu/common.c | 17 ++++-------------
1 file changed, 4 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a62cf04dac8a..41a8e9cb30bc 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -419,7 +419,6 @@ static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};
static void get_model_name(struct cpuinfo_x86 *c)
{
unsigned int *v;
- char *p, *q;
if (c->extended_cpuid_level < 0x80000004)
return;
@@ -431,18 +430,10 @@ static void get_model_name(struct cpuinfo_x86 *c)
c->x86_model_id[48] = 0;
/*
- * Intel chips right-justify this string for some dumb reason;
- * undo that brain damage:
+ * Remove leading whitespace on Intel processors and trailing
+ * whitespace on AMD processors.
*/
- p = q = &c->x86_model_id[0];
- while (*p == ' ')
- p++;
- if (p != q) {
- while (*p)
- *q++ = *p++;
- while (q <= &c->x86_model_id[48])
- *q++ = '\0'; /* Zero-pad the rest */
- }
+ memmove(c->x86_model_id, strim(c->x86_model_id), 48);
}
void cpu_detect_cache_sizes(struct cpuinfo_x86 *c)
@@ -1122,7 +1113,7 @@ void print_cpu_info(struct cpuinfo_x86 *c)
printk(KERN_CONT "%s ", vendor);
if (c->x86_model_id[0])
- printk(KERN_CONT "%s", strim(c->x86_model_id));
+ printk(KERN_CONT "%s", c->x86_model_id);
else
printk(KERN_CONT "%d86", c->x86);
--
1.9.0.258.g00eda23
* [tip:x86/cpu] x86/cpu: Strip any /proc/cpuinfo model name field whitespace
2015-05-26 8:28 ` [PATCH 14/18] x86/cpu: Strip any /proc/cpuinfo model name field whitespace Borislav Petkov
@ 2015-05-27 14:16 ` tip-bot for Prarit Bhargava
2015-05-27 17:07 ` Joe Perches
0 siblings, 1 reply; 400+ messages in thread
From: tip-bot for Prarit Bhargava @ 2015-05-27 14:16 UTC (permalink / raw)
To: linux-tip-commits
Cc: peterz, imammedo, torvalds, dvlasenk, mingo, brgerst, luto, bp,
tglx, bp, prarit, dave.hansen, linux-kernel, hpa, fenghua.yu
Commit-ID: adafb98da6a7af5e45362933a7dae6ab0e5076bf
Gitweb: http://git.kernel.org/tip/adafb98da6a7af5e45362933a7dae6ab0e5076bf
Author: Prarit Bhargava <prarit@redhat.com>
AuthorDate: Tue, 26 May 2015 10:28:17 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:38:24 +0200
x86/cpu: Strip any /proc/cpuinfo model name field whitespace
When comparing the 'model name' field of each core in
/proc/cpuinfo it was noticed that there is a whitespace
difference between the cores' model names.
After some quick investigation it was noticed that the model
name fields were actually different -- processor 0's model name
field had trailing whitespace removed, while the other
processors did not.
Another way of seeing this behaviour is to convert spaces into
underscores in the output of /proc/cpuinfo,
[thetango@prarit ~]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
______1_model_name :_AMD_Opteron(TM)_Processor_6272
_____63_model_name :_AMD_Opteron(TM)_Processor_6272_________________
which shows the discrepancy.
This occurs because the kernel calls strim() on cpu 0's
x86_model_id field to output a pretty message to the console in
print_cpu_info(), and as a result strips the whitespace at the
end of the ->x86_model_id field.
But, the ->x86_model_id field should be the same for all
identical CPUs in the box. Thus, we need to remove both leading
and trailing whitespace.
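To make the side effect concrete, here is a minimal user-space sketch, not the kernel code. strim_sketch() is a hypothetical stand-in that assumes strim() cuts trailing whitespace by writing a NUL into the caller's buffer, which is what the description above implies. Calling it on one CPU's copy of the string changes that copy and leaves the others padded:

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical user-space stand-in for the kernel's strim().
 * It overwrites the first trailing whitespace byte with a NUL (so the
 * caller's buffer is modified) and returns a pointer past any leading
 * whitespace.
 */
static char *strim_sketch(char *s)
{
	size_t len = strlen(s);

	/* Drop trailing whitespace by shortening the string in place. */
	while (len > 0 && isspace((unsigned char)s[len - 1]))
		len--;
	s[len] = '\0';

	/* Skip leading whitespace; the loop stops at the NUL. */
	while (isspace((unsigned char)*s))
		s++;

	return s;
}
```

With two identically padded buffers, calling strim_sketch() on the first one only (as print_cpu_info() did for CPU 0) leaves the two buffers different afterwards, which mirrors the 1-versus-63 split in the uniq -c output above.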
As a result, the print_cpu_info() output looks like
smpboot: CPU0: AMD Opteron(TM) Processor 6272 (fam: 15, model: 01, stepping: 02)
and the x86_model_id field is correct on all processors on AMD
platforms:
_____64_model_name :_AMD_Opteron(TM)_Processor_6272
Output is still correct on an Intel box:
____144_model_name :_Intel(R)_Xeon(R)_CPU_E7-8890_v3_@_2.50GHz
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1432050210-32036-1-git-send-email-prarit@redhat.com
Link: http://lkml.kernel.org/r/1432628901-18044-15-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/kernel/cpu/common.c | 17 ++++-------------
1 file changed, 4 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a62cf04..41a8e9c 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -419,7 +419,6 @@ static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};
static void get_model_name(struct cpuinfo_x86 *c)
{
unsigned int *v;
- char *p, *q;
if (c->extended_cpuid_level < 0x80000004)
return;
@@ -431,18 +430,10 @@ static void get_model_name(struct cpuinfo_x86 *c)
c->x86_model_id[48] = 0;
/*
- * Intel chips right-justify this string for some dumb reason;
- * undo that brain damage:
+ * Remove leading whitespace on Intel processors and trailing
+ * whitespace on AMD processors.
*/
- p = q = &c->x86_model_id[0];
- while (*p == ' ')
- p++;
- if (p != q) {
- while (*p)
- *q++ = *p++;
- while (q <= &c->x86_model_id[48])
- *q++ = '\0'; /* Zero-pad the rest */
- }
+ memmove(c->x86_model_id, strim(c->x86_model_id), 48);
}
void cpu_detect_cache_sizes(struct cpuinfo_x86 *c)
@@ -1122,7 +1113,7 @@ void print_cpu_info(struct cpuinfo_x86 *c)
printk(KERN_CONT "%s ", vendor);
if (c->x86_model_id[0])
- printk(KERN_CONT "%s", strim(c->x86_model_id));
+ printk(KERN_CONT "%s", c->x86_model_id);
else
printk(KERN_CONT "%d86", c->x86);
* Re: [tip:x86/cpu] x86/cpu: Strip any /proc/ cpuinfo model name field whitespace
2015-05-27 14:16 ` [tip:x86/cpu] x86/cpu: Strip any /proc/ cpuinfo " tip-bot for Prarit Bhargava
@ 2015-05-27 17:07 ` Joe Perches
2015-05-27 19:06 ` Borislav Petkov
0 siblings, 1 reply; 400+ messages in thread
From: Joe Perches @ 2015-05-27 17:07 UTC (permalink / raw)
To: luto, bp, peterz, dvlasenk, torvalds, imammedo, brgerst, mingo,
prarit, dave.hansen, fenghua.yu, hpa, linux-kernel, tglx, bp
Cc: linux-tip-commits
On Wed, 2015-05-27 at 07:16 -0700, tip-bot for Prarit Bhargava wrote:
> x86/cpu: Strip any /proc/cpuinfo model name field whitespace
[]
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> @@ -431,18 +430,10 @@ static void get_model_name(struct cpuinfo_x86 *c)
> c->x86_model_id[48] = 0;
>
> /*
> - * Intel chips right-justify this string for some dumb reason;
> - * undo that brain damage:
> + * Remove leading whitespace on Intel processors and trailing
> + * whitespace on AMD processors.
> */
> - p = q = &c->x86_model_id[0];
> - while (*p == ' ')
> - p++;
> - if (p != q) {
> - while (*p)
> - *q++ = *p++;
> - while (q <= &c->x86_model_id[48])
> - *q++ = '\0'; /* Zero-pad the rest */
> - }
> + memmove(c->x86_model_id, strim(c->x86_model_id), 48);
This code can memmove from beyond the x86_model_id field.
If the id was a single right justified char, to avoid overrunning
the field, it'd be safer moving only the actual string and
terminating 0 though this code is sub-optimal:
memmove(c->x86_model_id, strim(c->x86_model_id),
strlen(strim(c->x86_model_id)) + 1);
Maybe:
char *model = strim(c->x86_model_id);
memmove(c->x86_model_id, model, strlen(model) + 1);
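A minimal user-space sketch of that last suggestion, assuming nothing beyond standard C. trim_in_place() is a hypothetical name, and the trimming is written out by hand instead of calling the kernel's strim(). The memmove() length is the trimmed string plus its terminator, so the copy never reads past the NUL:

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: strip leading and trailing whitespace from buf
 * in place. Only the trimmed string and its terminating NUL are moved,
 * so the copy stays inside the original string.
 */
static void trim_in_place(char *buf)
{
	size_t len = strlen(buf);
	char *start = buf;

	/* Cut trailing whitespace. */
	while (len > 0 && isspace((unsigned char)buf[len - 1]))
		len--;
	buf[len] = '\0';

	/* Find the first non-whitespace byte (or the NUL). */
	while (isspace((unsigned char)*start))
		start++;

	/* Source and destination may overlap, hence memmove(). */
	memmove(buf, start, strlen(start) + 1);
}
```

For a right-justified single character such as "   X" this moves two bytes ('X' and the NUL) rather than a fixed 48.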
* Re: [tip:x86/cpu] x86/cpu: Strip any /proc/ cpuinfo model name field whitespace
2015-05-27 17:07 ` Joe Perches
@ 2015-05-27 19:06 ` Borislav Petkov
2015-05-27 19:16 ` Joe Perches
0 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-05-27 19:06 UTC (permalink / raw)
To: Joe Perches
Cc: luto, peterz, dvlasenk, torvalds, imammedo, brgerst, mingo,
prarit, dave.hansen, fenghua.yu, hpa, linux-kernel, tglx, bp,
linux-tip-commits
On Wed, May 27, 2015 at 10:07:34AM -0700, Joe Perches wrote:
> This code can memmove from beyond the x86_model_id field.
... in the theoretical case where some model ID has more than 64 - 48
preceding white spaces.
I guess we want to be prepared here for insane CPU model IDs coming from
virtualization.
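The arithmetic behind "64 - 48" can be sketched as follows. FIELD_SIZE is an assumption taken from the 64 in the sentence above, COPY_LEN is the fixed 48 passed to memmove() in the committed patch, and fixed_copy_overreads() is a hypothetical helper:

```c
#include <stdbool.h>
#include <stddef.h>

#define FIELD_SIZE 64	/* assumed size of x86_model_id, per "64 - 48" above */
#define COPY_LEN   48	/* fixed length passed to memmove() in the patch */

/*
 * Hypothetical helper: returns true if copying COPY_LEN bytes starting
 * at src_offset would read past the end of a FIELD_SIZE-byte field.
 */
static bool fixed_copy_overreads(size_t src_offset)
{
	return src_offset + COPY_LEN > FIELD_SIZE;
}
```

Under these assumptions a source offset of up to 16 stays inside the field, and 17 or more leading spaces would make the fixed-length copy read beyond it.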
> Maybe:
> char *model = strim(c->x86_model_id);
> memmove(c->x86_model_id, model, strlen(model) + 1);
Yes, and additionally limit that string length:
---
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index b35c777df6df..9d1fd48486d6 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -383,6 +383,9 @@ static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};
static void get_model_name(struct cpuinfo_x86 *c)
{
unsigned int *v;
+ const char *model;
+
+#define MODEL_ID_MAXLEN 48
if (c->extended_cpuid_level < 0x80000004)
return;
@@ -391,13 +394,15 @@ static void get_model_name(struct cpuinfo_x86 *c)
cpuid(0x80000002, &v[0], &v[1], &v[2], &v[3]);
cpuid(0x80000003, &v[4], &v[5], &v[6], &v[7]);
cpuid(0x80000004, &v[8], &v[9], &v[10], &v[11]);
- c->x86_model_id[48] = 0;
+ c->x86_model_id[MODEL_ID_MAXLEN] = 0;
/*
* Remove leading whitespace on Intel processors and trailing
* whitespace on AMD processors.
*/
- memmove(c->x86_model_id, strim(c->x86_model_id), 48);
+ model = strim(c->x86_model_id);
+
+ memmove(c->x86_model_id, model, strnlen(model, MODEL_ID_MAXLEN) + 1);
}
void cpu_detect_cache_sizes(struct cpuinfo_x86 *c)
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
* Re: [tip:x86/cpu] x86/cpu: Strip any /proc/ cpuinfo model name field whitespace
2015-05-27 19:06 ` Borislav Petkov
@ 2015-05-27 19:16 ` Joe Perches
2015-05-28 11:27 ` Prarit Bhargava
0 siblings, 1 reply; 400+ messages in thread
From: Joe Perches @ 2015-05-27 19:16 UTC (permalink / raw)
To: Borislav Petkov
Cc: luto, peterz, dvlasenk, torvalds, imammedo, brgerst, mingo,
prarit, dave.hansen, fenghua.yu, hpa, linux-kernel, tglx, bp,
linux-tip-commits
On Wed, 2015-05-27 at 21:06 +0200, Borislav Petkov wrote:
> On Wed, May 27, 2015 at 10:07:34AM -0700, Joe Perches wrote:
> > This code can memmove from beyond the x86_model_id field.
>
> ... in the theoretical case where some model ID has more than 64 - 48
> preceding white spaces.
>
> I guess we want to be prepared here for insane CPU model IDs coming from
> virtualization.
>
> > Maybe:
> > char *model = strim(c->x86_model_id);
> > memmove(c->x86_model_id, model, strlen(model) + 1);
>
> Yes, and additionally limit that string length:
>
> ---
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
[]
> @@ -383,6 +383,9 @@ static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};
> static void get_model_name(struct cpuinfo_x86 *c)
> {
> unsigned int *v;
> + const char *model;
> +
> +#define MODEL_ID_MAXLEN 48
>
> if (c->extended_cpuid_level < 0x80000004)
> return;
> @@ -391,13 +394,15 @@ static void get_model_name(struct cpuinfo_x86 *c)
> cpuid(0x80000002, &v[0], &v[1], &v[2], &v[3]);
> cpuid(0x80000003, &v[4], &v[5], &v[6], &v[7]);
> cpuid(0x80000004, &v[8], &v[9], &v[10], &v[11]);
> - c->x86_model_id[48] = 0;
> + c->x86_model_id[MODEL_ID_MAXLEN] = 0;
>
> /*
> * Remove leading whitespace on Intel processors and trailing
> * whitespace on AMD processors.
> */
> - memmove(c->x86_model_id, strim(c->x86_model_id), 48);
> + model = strim(c->x86_model_id);
> +
> + memmove(c->x86_model_id, model, strnlen(model, MODEL_ID_MAXLEN) + 1);
I don't see any value in the #define or strnlen over strlen as
it's guaranteed terminated by the = 0 above, but <shrug> thanks.
cheers, Joe
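The point about strnlen() versus strlen() can be checked with a small user-space sketch. bounded_len() is a hand-written stand-in for strnlen(), and lengths_always_agree() is a hypothetical helper; both assume a buffer that has a NUL at index 48, as the assignment in the patch guarantees:

```c
#include <stddef.h>
#include <string.h>

#define MODEL_ID_MAXLEN 48

/* Stand-in for strnlen(): length of s, but never more than max. */
static size_t bounded_len(const char *s, size_t max)
{
	size_t n = 0;

	while (n < max && s[n])
		n++;
	return n;
}

/*
 * Hypothetical check: returns 1 if the bounded and unbounded lengths
 * agree for every start offset in a buffer that is NUL-terminated at
 * index MODEL_ID_MAXLEN or earlier.
 */
static int lengths_always_agree(const char *buf)
{
	size_t off;

	for (off = 0; off <= MODEL_ID_MAXLEN; off++)
		if (bounded_len(buf + off, MODEL_ID_MAXLEN) != strlen(buf + off))
			return 0;
	return 1;
}
```

Because the string can never be longer than 48 bytes from any start offset, the bound never takes effect in this setup.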
* Re: [tip:x86/cpu] x86/cpu: Strip any /proc/ cpuinfo model name field whitespace
2015-05-27 19:16 ` Joe Perches
@ 2015-05-28 11:27 ` Prarit Bhargava
2015-05-28 11:32 ` Borislav Petkov
0 siblings, 1 reply; 400+ messages in thread
From: Prarit Bhargava @ 2015-05-28 11:27 UTC (permalink / raw)
To: Joe Perches
Cc: Borislav Petkov, luto, peterz, dvlasenk, torvalds, imammedo,
brgerst, mingo, dave.hansen, fenghua.yu, hpa, linux-kernel, tglx,
bp, linux-tip-commits
On 05/27/2015 03:16 PM, Joe Perches wrote:
> On Wed, 2015-05-27 at 21:06 +0200, Borislav Petkov wrote:
>> On Wed, May 27, 2015 at 10:07:34AM -0700, Joe Perches wrote:
>>> This code can memmove from beyond the x86_model_id field.
>>
>> ... in the theoretical case where some model ID has more than 64 - 48
>> preceding white spaces.
>>
>> I guess we want to be prepared here for insane CPU model IDs coming from
>> virtualization.
>>
>>> Maybe:
>>> char *model = strim(c->x86_model_id);
>>> memmove(c->x86_model_id, model, strlen(model) + 1);
>>
>> Yes, and additionally limit that string length:
>>
>> ---
>> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> []
>> @@ -383,6 +383,9 @@ static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};
>> static void get_model_name(struct cpuinfo_x86 *c)
>> {
>> unsigned int *v;
>> + const char *model;
>> +
>> +#define MODEL_ID_MAXLEN 48
>>
>> if (c->extended_cpuid_level < 0x80000004)
>> return;
>> @@ -391,13 +394,15 @@ static void get_model_name(struct cpuinfo_x86 *c)
>> cpuid(0x80000002, &v[0], &v[1], &v[2], &v[3]);
>> cpuid(0x80000003, &v[4], &v[5], &v[6], &v[7]);
>> cpuid(0x80000004, &v[8], &v[9], &v[10], &v[11]);
>> - c->x86_model_id[48] = 0;
>> + c->x86_model_id[MODEL_ID_MAXLEN] = 0;
>>
>> /*
>> * Remove leading whitespace on Intel processors and trailing
>> * whitespace on AMD processors.
>> */
>> - memmove(c->x86_model_id, strim(c->x86_model_id), 48);
>> + model = strim(c->x86_model_id);
>> +
>> + memmove(c->x86_model_id, model, strnlen(model, MODEL_ID_MAXLEN) + 1);
>
> I don't see any value in the #define or strnlen over strlen as
> it's guaranteed terminated by the = 0 above, but <shrug> thanks.
>
FWIW, I agree with Joe here and don't think the #define is necessary.
I will post a follow-up patch against tip on LKML shortly.
P.
> cheers, Joe
>
>
>
* Re: [tip:x86/cpu] x86/cpu: Strip any /proc/ cpuinfo model name field whitespace
2015-05-28 11:27 ` Prarit Bhargava
@ 2015-05-28 11:32 ` Borislav Petkov
2015-05-28 12:58 ` Borislav Petkov
0 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-05-28 11:32 UTC (permalink / raw)
To: Prarit Bhargava
Cc: Joe Perches, luto, peterz, dvlasenk, torvalds, imammedo, brgerst,
mingo, dave.hansen, fenghua.yu, hpa, linux-kernel, tglx, bp,
linux-tip-commits
On Thu, May 28, 2015 at 07:27:19AM -0400, Prarit Bhargava wrote:
> FWIW, I agree with Joe here and don't think the #define is necessary.
> I will post a follow-up patch against tip on LKML shortly.
No need, I have a better one:
---
From: Borislav Petkov <bp@suse.de>
Date: Tue, 26 May 2015 10:28:17 +0200
Subject: [PATCH] x86/cpu: Trim model id whitespace
We did try trimming whitespace surrounding the 'model name' field
in /proc/cpuinfo since reportedly some userspace uses it in string
comparisons and there were discrepancies:
[thetango@prarit ~]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
______1_model_name :_AMD_Opteron(TM)_Processor_6272
_____63_model_name :_AMD_Opteron(TM)_Processor_6272_________________
However, there were issues with overlapping buffers, string sizes and
non-byte-sized copies in the previous proposed solutions; see Link tags
below for the whole farce.
So, instead of diddling with this more, let's simply extend what was
there originally with trimming any present trailing whitespace. Final
result is really simple and obvious.
Testing with the most insane model IDs qemu can generate, looks good:
.model_id = " My funny model ID CPU ",
______4_model_name :_My_funny_model_ID_CPU
.model_id = "My funny model ID CPU ",
______4_model_name :_My_funny_model_ID_CPU
.model_id = " My funny model ID CPU",
______4_model_name :_My_funny_model_ID_CPU
.model_id = " ",
______4_model_name :__
.model_id = "",
______4_model_name :_15/02
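For readers who want to try the approach outside the kernel, here is a user-space sketch of the same idea. trim_model_id() is a hypothetical name, the logic follows the patch only in outline, and the buffer is assumed to be at least two bytes long so that the final store stays in bounds:

```c
#include <ctype.h>

/*
 * Hypothetical user-space version of the trimming loop.
 * p reads, q writes, and s remembers where the last non-whitespace
 * byte was written. Assumes id points at a buffer of at least 2 bytes.
 */
static void trim_model_id(char *id)
{
	char *p, *q, *s;

	p = q = s = id;

	/* Skip leading spaces. */
	while (*p == ' ')
		p++;

	while (*p) {
		/* Note where the last non-whitespace byte lands. */
		if (!isspace((unsigned char)*p))
			s = q;

		*q++ = *p++;
	}

	/* Terminate right after the last non-whitespace byte. */
	*(s + 1) = '\0';
}
```

In this sketch an all-space input comes out as a single space, because s never moves off the first byte; that appears consistent with the ":__" line in the test output above.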
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1432050210-32036-1-git-send-email-prarit@redhat.com
Link: http://lkml.kernel.org/r/1432628901-18044-15-git-send-email-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
---
arch/x86/kernel/cpu/common.c | 22 +++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 41a8e9cb30bc..351197cbbc8e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -5,6 +5,7 @@
#include <linux/module.h>
#include <linux/percpu.h>
#include <linux/string.h>
+#include <linux/ctype.h>
#include <linux/delay.h>
#include <linux/sched.h>
#include <linux/init.h>
@@ -419,6 +420,7 @@ static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};
static void get_model_name(struct cpuinfo_x86 *c)
{
unsigned int *v;
+ char *p, *q, *s;
if (c->extended_cpuid_level < 0x80000004)
return;
@@ -429,11 +431,21 @@ static void get_model_name(struct cpuinfo_x86 *c)
cpuid(0x80000004, &v[8], &v[9], &v[10], &v[11]);
c->x86_model_id[48] = 0;
- /*
- * Remove leading whitespace on Intel processors and trailing
- * whitespace on AMD processors.
- */
- memmove(c->x86_model_id, strim(c->x86_model_id), 48);
+ /* Trim whitespace */
+ p = q = s = &c->x86_model_id[0];
+
+ while (*p == ' ')
+ p++;
+
+ while (*p) {
+ /* Note the last non-whitespace index */
+ if (!isspace(*p))
+ s = q;
+
+ *q++ = *p++;
+ }
+
+ *(s + 1) = '\0';
}
void cpu_detect_cache_sizes(struct cpuinfo_x86 *c)
--
2.3.5
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
* Re: [tip:x86/cpu] x86/cpu: Strip any /proc/ cpuinfo model name field whitespace
2015-05-28 11:32 ` Borislav Petkov
@ 2015-05-28 12:58 ` Borislav Petkov
2015-05-28 16:57 ` H. Peter Anvin
0 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-05-28 12:58 UTC (permalink / raw)
To: Prarit Bhargava
Cc: Joe Perches, luto, peterz, dvlasenk, torvalds, imammedo, brgerst,
mingo, dave.hansen, fenghua.yu, hpa, linux-kernel, tglx, bp,
linux-tip-commits
On Thu, May 28, 2015 at 01:32:29PM +0200, Borislav Petkov wrote:
> + while (*p) {
> + /* Note the last non-whitespace index */
> + if (!isspace(*p))
> + s = q;
> +
> + *q++ = *p++;
This should be optimized to not copy if there's no preceding whitespace
and p == q:
From: Borislav Petkov <bp@suse.de>
Date: Tue, 26 May 2015 10:28:17 +0200
Subject: [PATCH] x86/cpu: Trim model id whitespace
We did try trimming whitespace surrounding the 'model name' field
in /proc/cpuinfo since reportedly some userspace uses it in string
comparisons and there were discrepancies:
[thetango@prarit ~]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
______1_model_name :_AMD_Opteron(TM)_Processor_6272
_____63_model_name :_AMD_Opteron(TM)_Processor_6272_________________
However, there were issues with overlapping buffers, string sizes and
non-byte-sized copies in the previous proposed solutions; see Link tags
below for the whole farce.
So, instead of diddling with this more, let's simply extend what was
there originally with trimming any present trailing whitespace. Final
result is really simple and obvious.
Testing with the most insane model IDs qemu can generate, looks good:
.model_id = " My funny model ID CPU ",
______4_model_name :_My_funny_model_ID_CPU
.model_id = "My funny model ID CPU ",
______4_model_name :_My_funny_model_ID_CPU
.model_id = " My funny model ID CPU",
______4_model_name :_My_funny_model_ID_CPU
.model_id = " ",
______4_model_name :__
.model_id = "",
______4_model_name :_15/02
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1432050210-32036-1-git-send-email-prarit@redhat.com
Link: http://lkml.kernel.org/r/1432628901-18044-15-git-send-email-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
---
arch/x86/kernel/cpu/common.c | 27 ++++++++++++++++++++++-----
1 file changed, 22 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 41a8e9cb30bc..18120a33a2c1 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -5,6 +5,7 @@
#include <linux/module.h>
#include <linux/percpu.h>
#include <linux/string.h>
+#include <linux/ctype.h>
#include <linux/delay.h>
#include <linux/sched.h>
#include <linux/init.h>
@@ -419,6 +420,7 @@ static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};
static void get_model_name(struct cpuinfo_x86 *c)
{
unsigned int *v;
+ char *p, *q, *s;
if (c->extended_cpuid_level < 0x80000004)
return;
@@ -429,11 +431,26 @@ static void get_model_name(struct cpuinfo_x86 *c)
cpuid(0x80000004, &v[8], &v[9], &v[10], &v[11]);
c->x86_model_id[48] = 0;
- /*
- * Remove leading whitespace on Intel processors and trailing
- * whitespace on AMD processors.
- */
- memmove(c->x86_model_id, strim(c->x86_model_id), 48);
+ /* Trim whitespace */
+ p = q = s = &c->x86_model_id[0];
+
+ while (*p == ' ')
+ p++;
+
+ while (*p) {
+ /* Note the last non-whitespace index: */
+ if (!isspace(*p))
+ s = q;
+
+ /* Only copy if p advanced due to whitespace: */
+ if (p != q)
+ *q = *p;
+
+ p++;
+ q++;
+ }
+
+ *(s + 1) = '\0';
}
void cpu_detect_cache_sizes(struct cpuinfo_x86 *c)
--
2.3.5
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
* Re: [tip:x86/cpu] x86/cpu: Strip any /proc/ cpuinfo model name field whitespace
2015-05-28 12:58 ` Borislav Petkov
@ 2015-05-28 16:57 ` H. Peter Anvin
2015-05-28 18:33 ` Borislav Petkov
0 siblings, 1 reply; 400+ messages in thread
From: H. Peter Anvin @ 2015-05-28 16:57 UTC (permalink / raw)
To: Borislav Petkov, Prarit Bhargava
Cc: Joe Perches, luto, peterz, dvlasenk, torvalds, imammedo, brgerst,
mingo, dave.hansen, fenghua.yu, linux-kernel, tglx, bp,
linux-tip-commits
Why?!
We are talking about 48 bytes run once per cpu. It isn't worth it to optimize, in fact the extra code size hurts more.
On May 28, 2015 5:58:19 AM PDT, Borislav Petkov <bp@alien8.de> wrote:
>On Thu, May 28, 2015 at 01:32:29PM +0200, Borislav Petkov wrote:
>> + while (*p) {
>> + /* Note the last non-whitespace index */
>> + if (!isspace(*p))
>> + s = q;
>> +
>> + *q++ = *p++;
>
>This should be optimized to not copy if there's no preceding whitespace
>and p == q:
>
>From: Borislav Petkov <bp@suse.de>
>Date: Tue, 26 May 2015 10:28:17 +0200
>Subject: [PATCH] x86/cpu: Trim model id whitespace
>
>We did try trimming whitespace surrounding the 'model name' field
>in /proc/cpuinfo since reportedly some userspace uses it in string
>comparisons and there were discrepancies:
>
>[thetango@prarit ~]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
> ______1_model_name :_AMD_Opteron(TM)_Processor_6272
> _____63_model_name :_AMD_Opteron(TM)_Processor_6272_________________
>
>However, there were issues with overlapping buffers, string sizes and
>non-byte-sized copies in the previous proposed solutions; see Link tags
>below for the whole farce.
>
>So, instead of diddling with this more, let's simply extend what was
>there originally with trimming any present trailing whitespace. Final
>result is really simple and obvious.
>
>Testing with the most insane model IDs qemu can generate, looks good:
>
> .model_id = " My funny model ID CPU ",
> ______4_model_name :_My_funny_model_ID_CPU
>
> .model_id = "My funny model ID CPU ",
> ______4_model_name :_My_funny_model_ID_CPU
>
> .model_id = " My funny model ID CPU",
> ______4_model_name :_My_funny_model_ID_CPU
>
> .model_id = " ",
> ______4_model_name :__
>
> .model_id = "",
> ______4_model_name :_15/02
>
>Cc: Andy Lutomirski <luto@amacapital.net>
>Cc: Brian Gerst <brgerst@gmail.com>
>Cc: Dave Hansen <dave.hansen@linux.intel.com>
>Cc: Denys Vlasenko <dvlasenk@redhat.com>
>Cc: Fenghua Yu <fenghua.yu@intel.com>
>Cc: H. Peter Anvin <hpa@zytor.com>
>Cc: Igor Mammedov <imammedo@redhat.com>
>Cc: Linus Torvalds <torvalds@linux-foundation.org>
>Cc: Peter Zijlstra <peterz@infradead.org>
>Cc: Thomas Gleixner <tglx@linutronix.de>
>Link: http://lkml.kernel.org/r/1432050210-32036-1-git-send-email-prarit@redhat.com
>Link: http://lkml.kernel.org/r/1432628901-18044-15-git-send-email-bp@alien8.de
>Signed-off-by: Borislav Petkov <bp@suse.de>
>---
> arch/x86/kernel/cpu/common.c | 27 ++++++++++++++++++++++-----
> 1 file changed, 22 insertions(+), 5 deletions(-)
>
>diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
>index 41a8e9cb30bc..18120a33a2c1 100644
>--- a/arch/x86/kernel/cpu/common.c
>+++ b/arch/x86/kernel/cpu/common.c
>@@ -5,6 +5,7 @@
> #include <linux/module.h>
> #include <linux/percpu.h>
> #include <linux/string.h>
>+#include <linux/ctype.h>
> #include <linux/delay.h>
> #include <linux/sched.h>
> #include <linux/init.h>
>@@ -419,6 +420,7 @@ static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};
> static void get_model_name(struct cpuinfo_x86 *c)
> {
> unsigned int *v;
>+ char *p, *q, *s;
>
> if (c->extended_cpuid_level < 0x80000004)
> return;
>@@ -429,11 +431,26 @@ static void get_model_name(struct cpuinfo_x86 *c)
> cpuid(0x80000004, &v[8], &v[9], &v[10], &v[11]);
> c->x86_model_id[48] = 0;
>
>- /*
>- * Remove leading whitespace on Intel processors and trailing
>- * whitespace on AMD processors.
>- */
>- memmove(c->x86_model_id, strim(c->x86_model_id), 48);
>+ /* Trim whitespace */
>+ p = q = s = &c->x86_model_id[0];
>+
>+ while (*p == ' ')
>+ p++;
>+
>+ while (*p) {
>+ /* Note the last non-whitespace index: */
>+ if (!isspace(*p))
>+ s = q;
>+
>+ /* Only copy if p advanced due to whitespace: */
>+ if (p != q)
>+ *q = *p;
>+
>+ p++;
>+ q++;
>+ }
>+
>+ *(s + 1) = '\0';
> }
>
> void cpu_detect_cache_sizes(struct cpuinfo_x86 *c)
--
Sent from my mobile phone. Please pardon brevity and lack of formatting.
* Re: [tip:x86/cpu] x86/cpu: Strip any /proc/ cpuinfo model name field whitespace
2015-05-28 16:57 ` H. Peter Anvin
@ 2015-05-28 18:33 ` Borislav Petkov
2015-05-28 20:39 ` H. Peter Anvin
0 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-05-28 18:33 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Prarit Bhargava, Joe Perches, luto, peterz, dvlasenk, torvalds,
imammedo, brgerst, mingo, dave.hansen, fenghua.yu, linux-kernel,
tglx, bp, linux-tip-commits
On Thu, May 28, 2015 at 09:57:15AM -0700, H. Peter Anvin wrote:
> Why?!
>
> We are talking about 48 bytes run once per cpu. It isn't worth it to
> optimize, in fact the extra code size hurts more.
I wanted to save us the redundant copying of the exact same bytes.
Because when there's no preceding whitespace, p and q point at the same
thing so we end up doing *p = *p.
OTOH, without the optimization, the code is even simpler.
I can remove it if you wanna - I don't care all that much.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
* Re: [tip:x86/cpu] x86/cpu: Strip any /proc/ cpuinfo model name field whitespace
2015-05-28 18:33 ` Borislav Petkov
@ 2015-05-28 20:39 ` H. Peter Anvin
0 siblings, 0 replies; 400+ messages in thread
From: H. Peter Anvin @ 2015-05-28 20:39 UTC (permalink / raw)
To: Borislav Petkov
Cc: Prarit Bhargava, Joe Perches, luto, peterz, dvlasenk, torvalds,
imammedo, brgerst, mingo, dave.hansen, fenghua.yu, linux-kernel,
tglx, bp, linux-tip-commits
On 05/28/2015 11:33 AM, Borislav Petkov wrote:
> On Thu, May 28, 2015 at 09:57:15AM -0700, H. Peter Anvin wrote:
>> Why?!
>>
>> We are talking about 48 bytes run once per cpu. It isn't worth it to
>> optimize, in fact the extra code size hurts more.
>
> I wanted to save us the redundant copying of the exact same bytes.
> Because when there's no preceding whitespace, p and q point at the same
> thing so we end up doing *p = *p.
>
> OTOH, without the optimization, the code is even simpler.
>
> I can remove it if you wanna - I don't care all that much.
>
Yes, please. Actually, with a test inside the loop the way you have it,
the resulting code will almost certainly be slower -- a redundant write
to an already dirty cache line is way cheaper than a branch.
-hpa
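As a sanity check that dropping the test changes nothing observable, here is a user-space sketch of both loop bodies. The function names are hypothetical and the code only follows the two posted versions in outline; it says nothing about speed, only that the results match:

```c
#include <ctype.h>
#include <string.h>

/* Variant without the test: always store, even when p == q. */
static void trim_always_copy(char *id)
{
	char *p, *q, *s;

	p = q = s = id;
	while (*p == ' ')
		p++;
	while (*p) {
		if (!isspace((unsigned char)*p))
			s = q;
		*q++ = *p++;
	}
	*(s + 1) = '\0';
}

/* Variant with the test: one extra branch per byte to skip self-stores. */
static void trim_copy_if_moved(char *id)
{
	char *p, *q, *s;

	p = q = s = id;
	while (*p == ' ')
		p++;
	while (*p) {
		if (!isspace((unsigned char)*p))
			s = q;
		if (p != q)
			*q = *p;
		p++;
		q++;
	}
	*(s + 1) = '\0';
}

/* Hypothetical helper: run both variants on copies of input and compare. */
static int same_result(const char *input)
{
	char a[64] = "", b[64] = "";

	/* Buffers are zero-filled, so copying at most 63 bytes keeps the NUL. */
	strncpy(a, input, sizeof(a) - 1);
	strncpy(b, input, sizeof(b) - 1);
	trim_always_copy(a);
	trim_copy_if_moved(b);
	return strcmp(a, b) == 0;
}
```

Both variants leave the same string behind for padded, unpadded, empty and all-space inputs, so the choice between them comes down to the code-size and branch-cost argument made above.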
* [PATCH 15/18] x86/documentation: Move kernel-stacks doc one level up
2015-05-26 8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
` (13 preceding siblings ...)
2015-05-26 8:28 ` [PATCH 14/18] x86/cpu: Strip any /proc/cpuinfo model name field whitespace Borislav Petkov
@ 2015-05-26 8:28 ` Borislav Petkov
2015-05-27 14:17 ` [tip:x86/debug] x86/Documentation: " tip-bot for Borislav Petkov
2015-05-26 8:28 ` [PATCH 16/18] x86/documentation: Remove STACKFAULT_STACK bulletpoint Borislav Petkov
` (2 subsequent siblings)
17 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-05-26 8:28 UTC (permalink / raw)
To: Ingo Molnar; +Cc: X86-ML, LKML
From: Borislav Petkov <bp@suse.de>
... to Documentation/x86/ as it is going to collect more and not only
64-bit specific info.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: X86 ML <x86@kernel.org>
Cc: live-patching@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/x86/{x86_64 => }/kernel-stacks | 0
1 file changed, 0 insertions(+), 0 deletions(-)
rename Documentation/x86/{x86_64 => }/kernel-stacks (100%)
diff --git a/Documentation/x86/x86_64/kernel-stacks b/Documentation/x86/kernel-stacks
similarity index 100%
rename from Documentation/x86/x86_64/kernel-stacks
rename to Documentation/x86/kernel-stacks
--
1.9.0.258.g00eda23
* [tip:x86/debug] x86/Documentation: Move kernel-stacks doc one level up
2015-05-26 8:28 ` [PATCH 15/18] x86/documentation: Move kernel-stacks doc one level up Borislav Petkov
@ 2015-05-27 14:17 ` tip-bot for Borislav Petkov
0 siblings, 0 replies; 400+ messages in thread
From: tip-bot for Borislav Petkov @ 2015-05-27 14:17 UTC (permalink / raw)
To: linux-tip-commits
Cc: dvlasenk, bp, jpoimboe, torvalds, luto, peterz, linux-kernel,
brgerst, mingo, a.p.zijlstra, tglx, akpm, hpa, luto, mmarek, bp
Commit-ID: 54fd15780526c47fa29a85b066cf69996be59a59
Gitweb: http://git.kernel.org/tip/54fd15780526c47fa29a85b066cf69996be59a59
Author: Borislav Petkov <bp@suse.de>
AuthorDate: Tue, 26 May 2015 10:28:18 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:39:44 +0200
x86/Documentation: Move kernel-stacks doc one level up
... to Documentation/x86/ as it is going to collect more and not
only 64-bit specific info.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/1432628901-18044-16-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
Documentation/x86/{x86_64 => }/kernel-stacks | 0
1 file changed, 0 insertions(+), 0 deletions(-)
diff --git a/Documentation/x86/x86_64/kernel-stacks b/Documentation/x86/kernel-stacks
similarity index 100%
rename from Documentation/x86/x86_64/kernel-stacks
rename to Documentation/x86/kernel-stacks
* [PATCH 16/18] x86/documentation: Remove STACKFAULT_STACK bulletpoint
2015-05-26 8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
` (14 preceding siblings ...)
2015-05-26 8:28 ` [PATCH 15/18] x86/documentation: Move kernel-stacks doc one level up Borislav Petkov
@ 2015-05-26 8:28 ` Borislav Petkov
2015-05-27 14:17 ` [tip:x86/debug] x86/Documentation: " tip-bot for Borislav Petkov
2015-05-26 8:28 ` [PATCH 17/18] x86/documentation: Adapt Ingo's explanation on printing backtraces Borislav Petkov
2015-05-26 8:28 ` [PATCH 18/18] x86/mce: Fix monarch timeout setting through the mce= cmdline option Borislav Petkov
17 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-05-26 8:28 UTC (permalink / raw)
To: Ingo Molnar; +Cc: X86-ML, LKML
From: Borislav Petkov <bp@suse.de>
Update the documentation after
6f442be2fb22 ("x86_64, traps: Stop using IST for #SS").
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: X86 ML <x86@kernel.org>
Cc: live-patching@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/x86/kernel-stacks | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/Documentation/x86/kernel-stacks b/Documentation/x86/kernel-stacks
index e3c8a49d1a2f..c3c935b9d56e 100644
--- a/Documentation/x86/kernel-stacks
+++ b/Documentation/x86/kernel-stacks
@@ -1,3 +1,6 @@
+Kernel stacks on x86-64 bit
+---------------------------
+
Most of the text from Keith Owens, hacked by AK
x86_64 page size (PAGE_SIZE) is 4K.
@@ -56,13 +59,6 @@ If that assumption is ever broken then the stacks will become corrupt.
The currently assigned IST stacks are :-
-* STACKFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
-
- Used for interrupt 12 - Stack Fault Exception (#SS).
-
- This allows the CPU to recover from invalid stack segments. Rarely
- happens.
-
* DOUBLEFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
Used for interrupt 8 - Double Fault Exception (#DF).
--
1.9.0.258.g00eda23
* [tip:x86/debug] x86/Documentation: Remove STACKFAULT_STACK bulletpoint
2015-05-26 8:28 ` [PATCH 16/18] x86/documentation: Remove STACKFAULT_STACK bulletpoint Borislav Petkov
@ 2015-05-27 14:17 ` tip-bot for Borislav Petkov
0 siblings, 0 replies; 400+ messages in thread
From: tip-bot for Borislav Petkov @ 2015-05-27 14:17 UTC (permalink / raw)
To: linux-tip-commits
Cc: tglx, jpoimboe, peterz, brgerst, mmarek, linux-kernel,
a.p.zijlstra, luto, bp, akpm, luto, hpa, bp, mingo, dvlasenk,
torvalds
Commit-ID: d724a9a52b0026ac6a05440c079c9a618acfd8cf
Gitweb: http://git.kernel.org/tip/d724a9a52b0026ac6a05440c079c9a618acfd8cf
Author: Borislav Petkov <bp@suse.de>
AuthorDate: Tue, 26 May 2015 10:28:19 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:39:46 +0200
x86/Documentation: Remove STACKFAULT_STACK bulletpoint
Update the documentation after
6f442be2fb22 ("x86_64, traps: Stop using IST for #SS").
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/1432628901-18044-17-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
Documentation/x86/kernel-stacks | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/Documentation/x86/kernel-stacks b/Documentation/x86/kernel-stacks
index e3c8a49..c3c935b 100644
--- a/Documentation/x86/kernel-stacks
+++ b/Documentation/x86/kernel-stacks
@@ -1,3 +1,6 @@
+Kernel stacks on x86-64 bit
+---------------------------
+
Most of the text from Keith Owens, hacked by AK
x86_64 page size (PAGE_SIZE) is 4K.
@@ -56,13 +59,6 @@ If that assumption is ever broken then the stacks will become corrupt.
The currently assigned IST stacks are :-
-* STACKFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
-
- Used for interrupt 12 - Stack Fault Exception (#SS).
-
- This allows the CPU to recover from invalid stack segments. Rarely
- happens.
-
* DOUBLEFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
Used for interrupt 8 - Double Fault Exception (#DF).
* [PATCH 17/18] x86/documentation: Adapt Ingo's explanation on printing backtraces
2015-05-26 8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
` (15 preceding siblings ...)
2015-05-26 8:28 ` [PATCH 16/18] x86/documentation: Remove STACKFAULT_STACK bulletpoint Borislav Petkov
@ 2015-05-26 8:28 ` Borislav Petkov
2015-05-26 8:28 ` [PATCH 18/18] x86/mce: Fix monarch timeout setting through the mce= cmdline option Borislav Petkov
17 siblings, 0 replies; 400+ messages in thread
From: Borislav Petkov @ 2015-05-26 8:28 UTC (permalink / raw)
To: Ingo Molnar; +Cc: X86-ML, LKML
From: Borislav Petkov <bp@suse.de>
Write it down for future reference, as the question about the question
mark in stack traces keeps popping up.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: X86 ML <x86@kernel.org>
Cc: live-patching@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20150521101614.GA10889@gmail.com
---
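Not part of the patch: a minimal userspace sketch of the scan that the
documentation text in this patch describes, loosely modelled on
print_context_stack(). The toy stack layout (saved frame pointers stored
as word indices), the text range TEXT_START/TEXT_END, and the names
looks_like_text() and scan_stack() are assumptions made for illustration;
none of this is kernel code.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical bounds standing in for the kernel text section. */
#define TEXT_START 0x1000UL
#define TEXT_END   0x2000UL

static int looks_like_text(unsigned long addr)
{
	return addr >= TEXT_START && addr < TEXT_END;
}

/*
 * Scan a toy stack from top (index 0) to bottom (index n - 1).
 * A frame is two adjacent words: the saved frame pointer (here the
 * index of the caller's frame) followed by the return address.
 * 'bp' is the index of the current frame's saved-frame-pointer slot.
 *
 * Every word that looks like a text address is written to 'out':
 * without '?' if it sits right above the current frame pointer,
 * with '?' otherwise. Returns the number of characters written.
 */
static size_t scan_stack(const unsigned long *stack, size_t n, size_t bp,
			 char *out, size_t outsz)
{
	size_t used = 0;

	if (outsz == 0)
		return 0;
	out[0] = '\0';

	for (size_t i = 0; i < n; i++) {
		unsigned long addr = stack[i];
		int len;

		if (!looks_like_text(addr))
			continue;

		if (i == bp + 1) {
			/* Fits the frame pointer chain: reliable entry. */
			len = snprintf(out + used, outsz - used,
				       " %#lx\n", addr);
			/* Follow the chain to the caller's frame. */
			bp = (size_t)stack[bp];
		} else {
			/* Stale value or broken frame: print with '?'. */
			len = snprintf(out + used, outsz - used,
				       " ? %#lx\n", addr);
		}
		if (len < 0 || (size_t)len >= outsz - used)
			break;	/* output buffer full */
		used += (size_t)len;
	}
	return used;
}
```

For the toy stack { 3, 0x1100, 0x1200, 6, 0x1300, 0xdead } scanned from
bp = 0, the entries 0x1100 and 0x1300 sit on the frame pointer chain and
0x1200 is reported with a '?'. Nothing is dropped, which is the property
the text stresses.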
Documentation/x86/kernel-stacks | 44 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)
diff --git a/Documentation/x86/kernel-stacks b/Documentation/x86/kernel-stacks
index c3c935b9d56e..0f3a6c201943 100644
--- a/Documentation/x86/kernel-stacks
+++ b/Documentation/x86/kernel-stacks
@@ -95,3 +95,47 @@ The currently assigned IST stacks are :-
assumptions about the previous state of the kernel stack.
For more details see the Intel IA32 or AMD AMD64 architecture manuals.
+
+
+Printing backtraces on x86
+--------------------------
+
+The question about the '?' preceding function names in an x86 stacktrace
+keeps popping up; here's an in-depth explanation. It helps if the reader
+stares at print_context_stack() and the whole machinery in and around
+arch/x86/kernel/dumpstack.c.
+
+Adapted from Ingo's mail, Message-ID: <20150521101614.GA10889@gmail.com>:
+
+We always scan the full kernel stack for return addresses stored on
+the kernel stack(s) [*], from stack top to stack bottom, and print out
+anything that 'looks like' a kernel text address.
+
+If it fits into the frame pointer chain, we print it without a question
+mark, knowing that it's part of the real backtrace.
+
+If the address does not fit into our expected frame pointer chain we
+still print it, but we print a '?'. It can mean two things:
+
+ - either the address is not part of the call chain: it's just stale
+ values on the kernel stack, from earlier function calls. This is
+ the common case.
+
+ - or it is part of the call chain, but the frame pointer was not set
+ up properly within the function, so we don't recognize it.
+
+This way we will always print out the real call chain (plus a few more
+entries), regardless of whether the frame pointer was set up correctly
+or not - but in most cases we'll get the call chain right as well. The
+entries printed are strictly in stack order, so you can deduce more
+information from that as well.
+
+The most important property of this method is that we _never_ lose
+information: we always strive to print _all_ addresses on the stack(s)
+that look like kernel text addresses, so if debug information is wrong,
+we still print out the real call chain as well - just with more question
+marks than ideal.
+
+[*] For things like IRQ and IST stacks, we also scan those stacks, in
+ the right order, and try to cross from one stack into another
+ reconstructing the call chain. This works most of the time.
--
1.9.0.258.g00eda23
* [PATCH 18/18] x86/mce: Fix monarch timeout setting through the mce= cmdline option
2015-05-26 8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
` (16 preceding siblings ...)
2015-05-26 8:28 ` [PATCH 17/18] x86/documentation: Adapt Ingo's explanation on printing backtraces Borislav Petkov
@ 2015-05-26 8:28 ` Borislav Petkov
2015-06-07 17:39 ` [tip:x86/core] " tip-bot for Xie XiuQi
17 siblings, 1 reply; 400+ messages in thread
From: Borislav Petkov @ 2015-05-26 8:28 UTC (permalink / raw)
To: Ingo Molnar; +Cc: X86-ML, LKML
From: Xie XiuQi <xiexiuqi@huawei.com>
Using "mce=1,10000000" on the kernel cmdline to change the monarch
timeout does not work. The cause is that get_option() already parses a
comma following the number, advances past it, and signals that with a
return value of 2. The old check for ',' therefore did not match and
monarch_timeout was never set. So test the return value instead of
checking for a second comma ourselves.
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1432120943-25028-1-git-send-email-xiexiuqi@huawei.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
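Not part of the patch: a small userspace sketch of why the old check
failed. get_option_model() is a simplified stand-in for the kernel's
get_option() (it uses strtol() and leaves out the range case), and
struct mce_cfg_model, parse_old() and parse_new() are hypothetical names
that only mirror the two versions of the code in the diff.

```c
#include <stdlib.h>

/*
 * Simplified model of the get_option() return convention:
 *   0 - no integer found
 *   1 - integer found, no comma after it
 *   2 - integer found, followed by a comma (which is consumed)
 */
static int get_option_model(char **str, int *pint)
{
	char *cur = *str;

	if (!cur || !*cur)
		return 0;
	*pint = (int)strtol(cur, str, 0);
	if (cur == *str)
		return 0;
	if (**str == ',') {
		(*str)++;	/* the comma is already skipped here */
		return 2;
	}
	return 1;
}

struct mce_cfg_model {
	int tolerant;
	int monarch_timeout;
};

/* Old logic: looks for a comma that has already been consumed. */
static void parse_old(char *str, struct mce_cfg_model *cfg)
{
	get_option_model(&str, &cfg->tolerant);
	if (*str == ',') {	/* false for "1,10000000" */
		++str;
		get_option_model(&str, &cfg->monarch_timeout);
	}
}

/* New logic: trust the return value instead. */
static void parse_new(char *str, struct mce_cfg_model *cfg)
{
	if (get_option_model(&str, &cfg->tolerant) == 2)
		get_option_model(&str, &cfg->monarch_timeout);
}
```

With the string "1,10000000", parse_old() sets only tolerant, while
parse_new() sets both tolerant and monarch_timeout.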
arch/x86/kernel/cpu/mcheck/mce.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index e535533d5ab8..e6580b9255de 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2008,11 +2008,8 @@ static int __init mcheck_enable(char *str)
else if (!strcmp(str, "bios_cmci_threshold"))
cfg->bios_cmci_threshold = true;
else if (isdigit(str[0])) {
- get_option(&str, &(cfg->tolerant));
- if (*str == ',') {
- ++str;
+ if (get_option(&str, &(cfg->tolerant)) == 2)
get_option(&str, &(cfg->monarch_timeout));
- }
} else {
pr_info("mce argument %s ignored. Please use /sys\n", str);
return 0;
--
1.9.0.258.g00eda23
* [tip:x86/core] x86/mce: Fix monarch timeout setting through the mce= cmdline option
2015-05-26 8:28 ` [PATCH 18/18] x86/mce: Fix monarch timeout setting through the mce= cmdline option Borislav Petkov
@ 2015-06-07 17:39 ` tip-bot for Xie XiuQi
0 siblings, 0 replies; 400+ messages in thread
From: tip-bot for Xie XiuQi @ 2015-06-07 17:39 UTC (permalink / raw)
To: linux-tip-commits
Cc: xiexiuqi, linux-kernel, dvlasenk, tglx, peterz, luto, mingo, hpa,
bp, brgerst, tony.luck, torvalds, bp
Commit-ID: 5c31b2800d8d3e735e5ecac8fc13d1cf862fd330
Gitweb: http://git.kernel.org/tip/5c31b2800d8d3e735e5ecac8fc13d1cf862fd330
Author: Xie XiuQi <xiexiuqi@huawei.com>
AuthorDate: Tue, 26 May 2015 10:28:21 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:39:14 +0200
x86/mce: Fix monarch timeout setting through the mce= cmdline option
Using "mce=1,10000000" on the kernel cmdline to change the
monarch timeout does not work. The cause is that get_option()
already parses a comma following the number, advances past it,
and signals that with a return value of 2. The old check for ','
therefore did not match and monarch_timeout was never set. So
test the return value instead of checking for a second comma
ourselves.
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1432120943-25028-1-git-send-email-xiexiuqi@huawei.com
Link: http://lkml.kernel.org/r/1432628901-18044-19-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/kernel/cpu/mcheck/mce.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 521e501..0cbcd31 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2014,11 +2014,8 @@ static int __init mcheck_enable(char *str)
else if (!strcmp(str, "bios_cmci_threshold"))
cfg->bios_cmci_threshold = true;
else if (isdigit(str[0])) {
- get_option(&str, &(cfg->tolerant));
- if (*str == ',') {
- ++str;
+ if (get_option(&str, &cfg->tolerant) == 2)
get_option(&str, &(cfg->monarch_timeout));
- }
} else {
pr_info("mce argument %s ignored. Please use /sys\n", str);
return 0;